Sometimes a professional translator needs to create a TMX file from Excel or another format in order to reuse bilingual material accumulated over the years. Or perhaps there is no reference material at hand, and the translator would like to build a topic- or field-specific translation memory as a reference for a job. This is useful in our profession when clients give us text to translate but no previous reference material and no terminology database. As professional translators, our job is to be as terminologically accurate as possible, and this can only be achieved if we have tools and reference material against which to check our work and upon which to base it.
But “Create TMX file from Excel” is not a feature most CAT tools provide out of the box. (I mention Excel because it is readily available on most desktop and laptop PCs and Macs, but any bilingual table will serve the purpose.) The example here comes from a Google Document in which a linguist had collected English-German game terminology.
Creating a TMX from a bilingual corpus
OK, so we have an aligned bilingual corpus in Excel (or any other delimited format) and we want to use that content in our favorite CAT tool. The process I’m about to show you works for all CAT tools, as they can all import TMX. Our goal is to turn an .xls or similar file into a format our CAT tool will read successfully.
Follow these steps to convert a bilingual text file to TMX. We are going to use a very handy, free, open-source tool called Olifant, developed as part of the Okapi Framework.
- Ensure you have an aligned corpus in Excel, with the leftmost column containing the source text and the next column containing the target text. If your corpus is not perfectly aligned, you may need to correct it by hand or try an alignment tool such as LF-Aligner.
- Paste the bilingual table into Notepad and save the file, ensuring the encoding is set to UTF-8. You now have a tab-delimited text file with the source and target text side by side on each line.
- Visit the download page to get Olifant. Unzip it, install it and launch it.
- Press Ctrl+N to create a new Translation Memory with a name of your choice. Add your target language code to the target field. Make sure to include the locale (for example, ES-MX for Spanish (Mexico), FR-CA for French (Canada), EN-GB for English (Great Britain), etc.).
- Go to File > Import and choose Tab-delimited files (.txt) from the drop-down menu. Locate the text file you saved in Notepad and click Open.
- In the Destination Field, set the Field Type of Column 1 to Text and its Language to EN-US (or whatever source language you’re working with). For Column 2, set the Field Type to Text again and enter your target language code in the Language field.
- Press OK and hit Save. Your bilingual corpus has been converted to a TMX file!
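For readers who prefer scripting, the same conversion can be roughly sketched in Python. This is only a minimal sketch under stated assumptions, not what Olifant does internally: it assumes a UTF-8 tab-delimited file with the source text in the first column and the target text in the second, and the function names, the `creationtool` value and the file names in the usage line are made up for illustration.

```python
import csv
import xml.etree.ElementTree as ET

# ElementTree writes this qualified name out as the "xml:lang" attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def rows_to_tmx(rows, src_lang, tgt_lang):
    """Build a TMX 1.4 tree from rows of [source, target] strings."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tsv2tmx-sketch",   # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "tab-delimited",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or one-column lines
        source, target = row[0].strip(), row[1].strip()
        if not source or not target:
            continue  # skip pairs with an empty side
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)


def convert(tsv_path, tmx_path, src_lang, tgt_lang):
    """Read a tab-delimited file and write it out as a TMX file."""
    # "utf-8-sig" also accepts the byte-order mark Notepad may add.
    with open(tsv_path, newline="", encoding="utf-8-sig") as f:
        # Assumption: cells contain no tabs or line breaks, so quote
        # characters are treated as ordinary text.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        tree = rows_to_tmx(reader, src_lang, tgt_lang)
    tree.write(tmx_path, encoding="UTF-8", xml_declaration=True)
```

With a hypothetical file called `glossary.txt`, a call such as `convert("glossary.txt", "glossary.tmx", "EN-US", "DE-DE")` would write one translation unit per usable line. It is still worth opening the result in Olifant or your CAT tool to check that it imports as expected.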
Olifant comes with powerful editing tools, including advanced Find/Replace. It is extremely useful for cleaning invalid segments out of TMs, and it lets you delete, add, merge and edit segments on the spot. We will discuss these features in future posts.
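To give a rough idea of what cleaning can mean before a table is even imported, here is a small Python sketch. It is not Olifant's own logic. It assumes that an "invalid" pair is one with an empty source or target, or an exact repeat of an earlier pair once stray whitespace is tidied up, and the function name `clean_pairs` is made up for illustration.

```python
def clean_pairs(pairs):
    """Return (source, target) pairs with empty and duplicate entries removed."""
    seen = set()
    cleaned = []
    for source, target in pairs:
        # Collapse runs of whitespace and trim both ends.
        source = " ".join(source.split())
        target = " ".join(target.split())
        if not source or not target:
            continue  # assumption: a pair with an empty side is invalid
        if (source, target) in seen:
            continue  # drop exact repeats, keeping the first occurrence
        seen.add((source, target))
        cleaned.append((source, target))
    return cleaned
```

The order of the remaining pairs is preserved, so the cleaned list can be written straight back to a tab-delimited file for import.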