Archive for the ‘Open Source’ Category

The true power of open-source software

Sunday, November 17th, 2013

Sometimes you need to be truly stuck in order to appreciate the capabilities of your vehicle. On a rough gravel bed, you will quickly find out how good the suspension and sturdy wheels of your bike really are. And likewise, when working on an urgent software development project, you will find out about the true power of open-source software when running into a library compatibility issue.

Let me explain, expand, expound and exposit:

Recently, I was working on adding SRX-based segmentation capabilities to an XLIFF converter. Rather than trying to reinvent the wheel, I decided to use the fantastic Okapi localization framework, which provides an SRX segmenter class (amongst many, many other useful things). It was easy enough to set up: Simply include the Okapi library jar in your Eclipse project, read through the excellent developer guide and off you go. Until that strange bug hits you, when trying to actually read in the first SRX file:

“ClassCastException: Cannot cast java.util.ArrayList (id=77) to org.w3c.dom.NodeList”.

“Very interesting”, you think to yourself, keeping the non-quotable exclamations well restrained inside your little developer brain. And then – the true power of open-source kicks in: Instead of trying to get hold of some tech support person and – if lucky – waiting for an updated release, I simply include the actual Okapi source code in my project. I can inspect the code, identify the problematic line, and modify it to make my own code work again.

In the end, it turned out that the XML parser library we were using was a bit out-of-date and that the return type of an XPath evaluation method was an ArrayList instead of the expected NodeList. Having written to the Okapi mailing list, I quickly got assistance from the ever so helpful Yves Savourel. However, in the meantime I was also able to continue on my project, after having modified the Okapi code, thanks to the openness of open-source software.

Sweet, isn’t it? Imagine what the situation might have been with a proprietary library…

Integrating OmegaT with MyMemory

Thursday, August 29th, 2013

MyMemory, created by an Italian company, provides an open translation memory (TM) server that can be accessed for free by anyone. There are some limitations on the number of queries a user can run, but for the average freelance translator it should be sufficient. In addition to manual searches, MyMemory offers an open API that can be used to retrieve TM contents. These results are returned in either JSON or TMX format and they also include a machine-translation match from Google.

I have recently integrated the MyMemory TM server into OmegaT as a machine translation plugin. Initially, I went for the JSON format, which is easier to parse, but alas! I noticed too late that the JSON library I was using wasn’t compatible with OmegaT’s GPL v2 license (which has been updated to v3 in the meantime). So, I went for TMX instead, which requires some XML parsing and XPath rules to extract the right contents.
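To give an idea of what the TMX route involves, here is a minimal sketch in Python. It is not the plugin's code (the plugin lives inside OmegaT, a Java application); it only illustrates the kind of XML parsing and XPath-style lookups needed, using the limited XPath subset of the standard library's ElementTree. The sample TMX is hand-written for illustration and is not an actual MyMemory response, and the function name `extract_matches` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xml:lang attribute carried by <tuv> elements.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hand-written sample for illustration only (assumption, not real server output).
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="sample" creationtoolversion="1"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Hello world</seg></tuv>
      <tuv xml:lang="de"><seg>Hallo Welt</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Good morning</seg></tuv>
      <tuv xml:lang="de"><seg>Guten Morgen</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>No German variant here</seg></tuv>
    </tu>
  </body>
</tmx>"""


def extract_matches(tmx_text, source_lang, target_lang):
    """Return (source, target) segment pairs found in a TMX document."""
    source_lang = source_lang.lower()
    target_lang = target_lang.lower()
    root = ET.fromstring(tmx_text)
    matches = []
    # Each <tu> (translation unit) holds one <tuv> per language.
    for tu in root.findall("./body/tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested in inline markup.
                segments[lang] = "".join(seg.itertext())
        # Keep only units that contain both requested languages.
        if source_lang in segments and target_lang in segments:
            matches.append((segments[source_lang], segments[target_lang]))
    return matches


if __name__ == "__main__":
    for source, target in extract_matches(SAMPLE_TMX, "en", "de"):
        print(source, "->", target)
```

Compared with reading a JSON object, this takes a few more lines, mainly because the language of each variant sits in a namespaced attribute rather than in a plain key.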

During testing we noticed that, more often than not, the results provided by the TM server were not that helpful. This is not too surprising, since retrieving meaningful matches this way is only possible if someone else has previously translated a very similar sentence for this particular language combination and uploaded it to MyMemory. Therefore, I separated the plugin into an MT plugin and a TM plugin. Both can be used independently. The TM contents could still be useful, of course, for instance in team projects. In any case, users should be careful when sending content to MyMemory that they don’t violate any copyrights or non-disclosure agreements. After all, anything you send to MyMemory will be available to the rest of the world.

The update is available in OmegaT v3.0.4 Update 2. Have fun playing around with the plugin and if you have any comments, send me a note: martin (at) wunderlich DOT com. Thank you.

HTML2TMX – A tool to grab bilingual content from the web and import into your CAT tool

Tuesday, April 16th, 2013

Recently, a user on the OmegaT mailing list described a problem that others might have faced before. Imagine you have come across a website that offers sentences in a bilingual table format, such as the search results provided by Linguee.

Or you might have some legacy content in an HTML file that you would like to use in your favourite CAT tool.

Well, after a bit of research, this problem turned into a little side project of mine and has led to a tool which I called “HTML2TMX” (please do let me know if you have a better name).

I have published the files for “HTML2TMX” along with the source code (under the LGPL).

The tool is written in Java, which means it runs on any platform (Mac, Linux, Windows…). It turns any HTML table into a TMX file in a two-step process:

- First, you need to tell the tool where to find the table; this can be a URL or a file on your local file system. HTML2TMX will then extract the header information from this table, so that you can select which column of the table maps to which language.

- Second, you run the tool again, this time providing the link to the table, the mapping information and the filename of the TMX file.
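The two steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the actual HTML2TMX code (which is written in Java); the names `list_headers` and `table_to_tmx`, the header attribute values and the sample table are assumptions made for the example. It also assumes a single, non-nested table whose first row holds the column headers.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hand-written sample table (assumption, for illustration only).
SAMPLE_HTML = """<table>
  <tr><th>English</th><th>German</th></tr>
  <tr><td>Good morning</td><td>Guten Morgen</td></tr>
  <tr><td>Thank you</td><td>Danke</td></tr>
  <tr><td>Untranslated</td><td></td></tr>
</table>"""


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def read_rows(html_text):
    parser = TableParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


def list_headers(html_text):
    """Step 1: show the header row so columns can be mapped to languages."""
    rows = read_rows(html_text)
    return rows[0] if rows else []


def table_to_tmx(html_text, column_langs, source_lang):
    """Step 2: build TMX text; column_langs maps column index -> language."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "html2tmx-sketch",  # placeholder values (assumed)
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "html",
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for row in read_rows(html_text)[1:]:  # skip the header row
        variants = [(lang, row[col])
                    for col, lang in sorted(column_langs.items())
                    if col < len(row) and row[col]]
        if len(variants) < 2:
            continue  # a unit needs at least two languages to be useful
        tu = ET.SubElement(body, "tu")
        for lang, text in variants:
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    print(list_headers(SAMPLE_HTML))
    print(table_to_tmx(SAMPLE_HTML, {0: "en", 1: "de"}, "en"))
```

Keeping the header lookup separate from the conversion mirrors the two runs of the tool: the first call only reports what the columns are, and the second does the work once the mapping is known.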

Once you have created the TMX, you just need to import it into your preferred CAT tool. At the moment, HTML2TMX has the limitation of providing a command-line interface only. The advantage is that you can use the tool in scripts of your own creation, but the downside is, of course, that many translators would probably prefer a user-friendly GUI. If there is enough interest (which you can express by sending me an email: martin AT wunderlich DOT com), I am more than happy to also create the GUI and perhaps even include this functionality as a new feature in OmegaT.


Un-conferencing again – remember the date: Friday, 19th of Oct., at Localization World, Seattle

Wednesday, September 19th, 2012

OK, I am biased on this one. As one of the co-organisers of the first two European localization un-conferences in Dublin (in 2009 and 2010), I can only say: Go to this event, if you are in the area! It is a phenomenal opportunity to put those valuable coffee break conversations into the centre. No sales talk, no powerpoints, just straightforward, down-to-earth exchanges with your peers. Have fun and let me know how it went.

German GUI localization of OmegaT updated

Thursday, September 13th, 2012

I have brought the German version of OmegaT up to date. A lot has happened since the last update of the German GUI and so there were 200+ new segments (out of ca. 750 in total) – this by itself speaks for the development of the tool.

In case you haven’t heard of it, OmegaT is probably the most popular open-source CAT tool (or TEnT for translation environment tools as Jost Zetzsche calls them). It is a Java application and therefore runs on all platforms – Windows, Mac, Linux, etc. OmegaT has been designed with openness in mind and therefore supports many formats that other tools have been ignoring, such as PO or OpenOffice. Also, open standards play a big role in the architecture (TMX, XLIFF, SRX…) and the TM created by the tool is directly accessible in TMX form in the projects folder.

Part of the openness means that everyone can contribute their localised version of the GUI and the documentation. If you would also like to join the OmegaT translation team, have a look at the website and/or join the mailing list.

Open-source scavenger hunt for Munich Schwabing

Friday, July 20th, 2012

As part of a private party, we organised a scavenger hunt in Munich’s Schwabing district. Kaiserplatz was both the start and the end point. In case anyone wants to arrange something similar, I am hereby publishing the materials as open source under the “Creative Commons CC BY-SA” licence.

The zip file contains:

- Game board (city map)

- Questions in German and English in OpenOffice format (odt)

- Picture series (modelled on the SZ-Sommerrätsel)

- Translation memories of the translation in TMX format (created with OmegaT)

- Answers to the questions

All personal data has been removed for privacy reasons. The following book was particularly helpful in creating the puzzle:

Bauer, Reinhard and Knuth Weidlich, Schwabing, Unverhau Verlag, 2nd ed. 1997

You can download the material here: Open-Source-Schnitzeljagd für München Schwabing

Please send feedback and questions to martin DOT wunderlich REMOVETHISWORD ÄT gmx DOT net.

(I have disabled the blog’s comment function because of the flood of spam)

Have fun!

I’ve been reminded of my old linklist of open-source tools for translators

Tuesday, May 3rd, 2011

The website, run by a German publishing house, is one of the most important sources of information for all things IT in Germany. They are also the makers behind the magazine “c’t” – a near must-read for German IT professionals. I am mentioning this here because I recently noticed that they have published a good article on open-source software for translators.

One of the links on the second page of the article points to my old link collection of open-source software, tools, and utilities for translators (also available in German). There is a reason why the site with the link collection isn’t active anymore and has been replaced by this blog: I haven’t had the time to properly maintain the list for the past few years. A lot has happened since it was last updated. However, quite a few of the links are still valid and the reader might discover something valuable. So, have a look around.

(And whenever I find the time, I will convert the stuff to a Wiki so that the maintenance work doesn’t rest on my shoulders alone :) )


Date set for localisation un-conference Dublin in 2011

Tuesday, February 8th, 2011

Save the date: 12th of May it is (Thursday). Details will follow soon; we are currently in the process of updating the website.