Machine Translation

PangeaMT is Pangeanic’s own independent translation technology division, with a clear focus on customized, domain-specific Machine Translation (MT). Pangeanic has developed and used machine translation for many applications since it became a member of TAUS, thanks to access to millions of words of training corpus with which it was able to experiment. Machine translation has been part of the company culture since 2009, and since then machine translation services for corporations and even other translation companies have become part of Pangeanic’s range of services.

HISTORY

As a forward-thinking and technology-savvy LSP, Pangeanic won a post-editing contract in 2007 to work for the European Commission as MT output post-editors. It was at this time that we became acquainted with institutional user needs and (re-)evaluated several commercial MT products we had been using. Soon afterwards we decided to develop our own machine translation technology.

Pangeanic was quoted as the first language service provider to make commercial use of Moses in the EU’s Framework development program euromatrixplus.net (the second, more refined release of Moses). Since then, many presentations, awards and implementations have followed, and Pangeanic has made a name for itself as a leading machine translation implementation company. It also markets its machine translation services in areas beyond the translation industry and is heavily involved in two more EU machine translation R&D programs, EXPERT and Casmacat (User Group).

FOCUS

We began as keen followers of the statistically driven paradigm of machine translation. This worked very well for several related languages (Romance languages and English, German and Scandinavian languages). However, our links to Japanese industry soon brought requests to add Japanese and Chinese to our service portfolio. In 2011, Pangeanic developed hybrid machine translation services, which were included as part of the system features.

Pangeanic’s Syntax-Based Hybrid Machine Translation

FEATURES

Despite our Moses bias, we have been able to overcome many of Moses’s shortcomings in order to fit the needs of the translation industry: our solutions go beyond text-based MT and are capable of taking input and producing output in industry-standard formats such as TMX and XLIFF. PangeaMT provides API access to other translation platforms, so you do not need to change your translation environment, and you can benefit from adding your future translations in a virtuous re-training cycle. Using open standards means that you will never have to buy expensive TM software again. Our solutions keep you from being locked in by expensive upgrades year after year.

Another PangeaMT breakthrough is our inline mark-up parser. PangeaMT handles tags extremely efficiently. Statistical machine translation systems (as they come from open-source releases) usually produce plain text output because this is also the format they process. However, we are keen to see PangeaMT solutions in use and adapted to the most demanding language industry requirements, so we focused our effort on developing SMT engines capable of handling the in-line coding typical of other content formats used in localization production environments. Thanks to this parser, PangeaMT can identify in-lines without attempting to translate them, and it places them back in the resulting text, too. An in-line placeholder first copies and transfers all XML and code information to a separate module. The translation engine does its work, and the in-lines are then placed back into the translated segment. At the time of its release, our in-line parser constituted an innovation well above the level of maturity that well-known SMT systems had reached at that time.
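The placeholder idea described above can be illustrated with a minimal Python sketch. This is not PangeaMT's actual parser: the function names (mask_inlines, unmask_inlines, fake_translate), the __TAGn__ placeholder format and the tiny word table are all assumptions made purely for illustration. The sketch sets the tags aside, lets a stand-in "engine" translate the plain text, and then restores the tags.

```python
import re

# Matches XML-style inline tags such as <b>, </b> or <ph id="1"/>.
TAG_RE = re.compile(r"<[^<>]+>")


def mask_inlines(segment):
    """Replace each inline tag with a numbered placeholder and store the tags."""
    tags = []

    def _swap(match):
        tags.append(match.group(0))
        return f"__TAG{len(tags) - 1}__"

    return TAG_RE.sub(_swap, segment), tags


def unmask_inlines(translated, tags):
    """Put each stored tag back where its placeholder ended up."""
    return re.sub(r"__TAG(\d+)__", lambda m: tags[int(m.group(1))], translated)


def fake_translate(text):
    """Hypothetical stand-in for the MT engine: a three-word lookup table."""
    for source, target in (("Click", "Pulse"), ("Save", "Guardar"), ("now", "ahora")):
        text = text.replace(source, target)
    return text


masked, tags = mask_inlines("Click <b>Save</b> now.")
restored = unmask_inlines(fake_translate(masked), tags)
print(masked)    # Click __TAG0__Save__TAG1__ now.
print(restored)  # Pulse <b>Guardar</b> ahora.
```

A real engine may reorder or drop placeholders, so a production parser would also need to check that every placeholder survives translation; this sketch leaves that out.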

We keep learning and improving with every development commissioned by an existing or new client and for every new language combination. We therefore remain open to applying new hybridization techniques, even ad-hoc rules, that we research and implement ourselves or co-develop in conjunction with our clients. We are aware that some language combinations will require linguistically informed techniques as part of the pre- or post-processing phases. Correct word and phrase reordering in the MT output is not an easy goal to achieve, especially when the languages involved do not belong to closely related language families, or when one of the two languages has a very flexible, and therefore MT-challenging, word order (WO). Some language-specific fixing procedures may come in handy. In other cases, it may be useful to use one language as a pivot to train engines for languages that are closely related. These and other techniques may be used or taken as a basis for expanding our PangeaMT solution palette.

Please visit our machine translation division website to learn more about PangeaMT.

