Archive for the ‘Specific Translation and Localization Tools’ Category

Integrating OmegaT with MyMemory

Thursday, August 29th, 2013

MyMemory (created by an Italian company) provides an open translation memory (TM) server that anyone can access for free. There are some limits on the number of queries a user can run, but they should be sufficient for the average freelance translator. In addition to manual searches, MyMemory offers an open API that can be used to retrieve TM contents. The results are returned in either JSON or TMX format, and they also include a machine-translation match from Google.
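To give an idea of what such an API query looks like, here is a minimal sketch that only assembles a request URL and does not send it. It is not the plugin's actual code. The endpoint and the parameter names (`q`, `langpair`, `of`) reflect my understanding of the public MyMemory API and should be checked against the current documentation; the class and method names are made up for this example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class MyMemoryQuery {

    // Assumption: public "get" endpoint of the MyMemory API.
    static final String ENDPOINT = "https://api.mymemory.translated.net/get";

    /**
     * Builds a query URL for one source segment.
     * Assumed parameters: q = text to look up, langpair = "source|target",
     * of = output format ("json" or "tmx").
     */
    static String buildUrl(String text, String sourceLang, String targetLang, String format) {
        String langPair = sourceLang + "|" + targetLang;
        return ENDPOINT
                + "?q=" + URLEncoder.encode(text, StandardCharsets.UTF_8)
                + "&langpair=" + URLEncoder.encode(langPair, StandardCharsets.UTF_8)
                + "&of=" + URLEncoder.encode(format, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints the URL that a client would request for this segment.
        System.out.println(buildUrl("Hello World", "en", "de", "tmx"));
    }
}
```

Note that `URLEncoder.encode(String, Charset)` requires Java 10 or later; on older versions the overload taking the charset name as a string can be used instead.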

I have recently integrated the MyMemory TM server into OmegaT as a machine translation plugin. Initially, I went for the JSON format, which is easier to parse, but alas! I noticed too late that the JSON library I was using wasn’t compatible with OmegaT’s GPL v2 license (which has been updated to v3 in the meantime). So, I went for TMX instead, which requires some XML parsing and XPath rules to extract the right contents.
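The XPath step can be illustrated with a small, self-contained sketch using only the XML classes that ship with Java. This is not the plugin's actual code: the class name, the sample TMX and the exact XPath expression are assumptions chosen for the example, and a real response may use longer language codes (such as `en-GB`) that would need looser matching.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TmxMatchExtractor {

    // Hypothetical sample; a real TMX response will carry more attributes.
    static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<tmx version=\"1.4\">"
            + "<header creationtool=\"example\" creationtoolversion=\"1.0\" segtype=\"sentence\""
            + " o-tmf=\"example\" adminlang=\"en\" srclang=\"en\" datatype=\"plaintext\"/>"
            + "<body><tu>"
            + "<tuv xml:lang=\"en\"><seg>Hello World</seg></tuv>"
            + "<tuv xml:lang=\"de\"><seg>Hallo Welt</seg></tuv>"
            + "</tu></body></tmx>";

    /** Returns the text of every <seg> whose parent <tuv> has the given xml:lang. */
    static List<String> extractSegments(String tmx, String lang) {
        // The language code is pasted into the XPath expression, so keep it simple.
        if (!lang.matches("[A-Za-z0-9-]+")) {
            throw new IllegalArgumentException("Unexpected language code: " + lang);
        }
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(tmx)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Match the attribute by its qualified name to avoid namespace setup.
            String expr = "//tu/tuv[@*[name()='xml:lang']='" + lang + "']/seg";
            NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);

            List<String> segments = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                segments.add(nodes.item(i).getTextContent());
            }
            return segments;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse TMX", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractSegments(SAMPLE, "de"));
    }
}
```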

During testing we noticed that, more often than not, the results provided by the TM server were not that helpful. This is not too surprising, since meaningful matches can only be retrieved this way if someone else has already translated a very similar sentence for this particular language combination and uploaded it to MyMemory. Therefore, I separated the plugin into an MT plugin and a TM plugin. Both can be used independently. The TM contents could still be useful, of course, for instance in team projects. In any case, users should take care that they don’t violate any copyrights or non-disclosure agreements when sending contents to MyMemory. After all, anything you send to MyMemory will be available to the rest of the world.

The update is available in OmegaT v3.0.4 Update 2. Have fun playing around with the plugin and if you have any comments, send me a note: martin (at) wunderlich DOT com. Thank you.

HTML2TMX – A tool to grab bilingual content from the web and import into your CAT tool

Tuesday, April 16th, 2013

Recently, a user on the OmegaT mailing list described a problem that others might have faced before. Imagine you have come across a website that offers sentences in a bilingual table format, such as the search results provided by Linguee.

Or you might have some legacy content in an HTML file that you would like to use in your favourite CAT tool.

Well, after a bit of research, this problem turned into a little side project of mine and has led to a tool which I called “HTML2TMX” (please do let me know if you have a better name).

I have published the files for “HTML2TMX” here and here, and the source code (under LGPL) here.

The tool is written in Java, which means it runs on any platform (Mac, Linux, Windows…). It turns any HTML table into a TMX file in a two-step process:

- First, you need to tell the tool where to find the table; this can be a URL or a file on your local file system. HTML2TMX will then extract the header information from this table, so that you can select which column of the table maps to which language.

- Second, you run the tool again, this time providing the link to the table, the mapping information and the filename of the TMX file.
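The second step can be sketched as follows. This is not the HTML2TMX source, just a minimal illustration of the idea: it assumes the table rows have already been extracted from the HTML (that part is left out here) and shows how a column-to-language mapping turns each row into one TMX translation unit. The class and method names and the header values are invented for the example, and the language codes are assumed to be plain codes such as `en` or `de`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TableToTmx {

    /** Escapes the characters that are not allowed in XML element content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Converts table rows into a TMX 1.4 document.
     * rows:         one String[] per table row, one entry per cell
     * columnToLang: maps a column index to a language code
     * srcLang:      language code written to the header's srclang attribute
     */
    static String toTmx(List<String[]> rows, Map<Integer, String> columnToLang, String srcLang) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<tmx version=\"1.4\">\n");
        // Header values below are placeholders for this sketch.
        sb.append("  <header creationtool=\"TableToTmx-sketch\" creationtoolversion=\"0.1\"")
          .append(" segtype=\"sentence\" o-tmf=\"html-table\" adminlang=\"en\"")
          .append(" srclang=\"").append(srcLang).append("\" datatype=\"plaintext\"/>\n");
        sb.append("  <body>\n");
        for (String[] row : rows) {
            sb.append("    <tu>\n");
            for (Map.Entry<Integer, String> entry : columnToLang.entrySet()) {
                int column = entry.getKey();
                if (column >= row.length) {
                    continue; // skip cells missing from a short row
                }
                sb.append("      <tuv xml:lang=\"").append(entry.getValue()).append("\"><seg>")
                  .append(escape(row[column])).append("</seg></tuv>\n");
            }
            sb.append("    </tu>\n");
        }
        sb.append("  </body>\n");
        sb.append("</tmx>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
                new String[] {"cat & dog", "Katze & Hund"},
                new String[] {"house", "Haus"});
        Map<Integer, String> mapping = new LinkedHashMap<>();
        mapping.put(0, "en");
        mapping.put(1, "de");
        System.out.print(toTmx(rows, mapping, "en"));
    }
}
```

Building the XML by hand keeps the sketch short; for anything beyond a quick script, an XML writer from the standard library would be the safer choice.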

Once you have created the TMX file, you just need to import it into your preferred CAT tool. At the moment, HTML2TMX has the limitation of providing a command line interface only. The advantage is that you can use the tool in scripts of your own creation, but the downside is, of course, that many translators would probably prefer a user-friendly GUI. If there is enough interest (which you can express by sending me an email: martin AT wunderlich DOT com), I am more than happy to also create the GUI and perhaps even include this functionality as a new feature in OmegaT.


German GUI localization of OmegaT updated

Thursday, September 13th, 2012

I have brought the German version of OmegaT up to date. A lot has happened since the last update of the German GUI, so there were 200+ new segments (out of ca. 750 in total), which by itself speaks for the development of the tool.

In case you haven’t heard of it, OmegaT is probably the most popular open-source CAT tool (or TEnT, for “translation environment tool”, as Jost Zetzsche calls them). It is a Java application and therefore runs on all platforms: Windows, Mac, Linux, etc. OmegaT has been designed with openness in mind and therefore supports many formats that other tools have been ignoring, such as PO or OpenOffice. Open standards also play a big role in the architecture (TMX, XLIFF, SRX…), and the TM created by the tool is directly accessible in TMX form in the project folder.

Part of this openness is that everyone can contribute their localised version of the GUI and the documentation. If you would also like to join the OmegaT translation team, have a look at the website and/or join the mailing list.