HTML2TMX – A tool to grab bilingual content from the web and import into your CAT tool

April 16th, 2013

Recently, a user on the OmegaT mailing list described a problem that others may have faced as well. Imagine you have come across a website that offers sentences in a bilingual table format, such as the search results provided by Linguee:

http://www.linguee.com/english-german/search?source=auto&query=babelfish

Or you might have some legacy content in an HTML file that you would like to use in your favourite CAT tool.

Well, after a bit of research, this problem turned into a little side project of mine and led to a tool that I have called “HTML2TMX” (please do let me know if you have a better name).

I have published the files for “HTML2TMX” here
http://www.martinwunderlich.com/download/HTML2TMX.zip
and here
http://f1.grp.yahoofs.com/v1/qPpWUUzYC6Z2o-_-NzWpv9mLWrM5pyFMSoltE_gKv22YFl6wsbzJo7fgAboVqmCcR9XgzZZUhYYll2L65FsRFGrDx8JpQTM/5-%20Macros%20and%20tools/Other%20things/HTML2TMX.zip

and the source code (under LGPL) here:
https://github.com/mwunderlich/HTML2TMX/tree/master/src

The tool is written in Java, which means it runs on any platform (Mac, Linux, Windows…). It turns any HTML table into a TMX file in a two-step process:

- First, you need to tell the tool where to find the table; this can be a URL or a file on your local file system. HTML2TMX will then extract the header information from this table, so that you can select which column of the table maps to which language.

- Second, you run the tool again, this time providing the link to the table, the mapping information and the filename for the TMX file.
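
To make the two steps more concrete, here is a minimal sketch of the general idea in Python. It is not HTML2TMX's actual code (the tool itself is written in Java), and the names `TableParser`, `read_headers` and `table_to_tmx` are made up for this illustration. It assumes a simple, non-nested table whose first row holds the column headers.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TableParser(HTMLParser):
    """Collects every table row in the document as a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html):
    parser = TableParser()
    parser.feed(html)
    return parser.rows


def read_headers(html):
    """Step 1: return the header row so the user can pick the columns."""
    rows = parse_rows(html)
    return rows[0] if rows else []


def table_to_tmx(html, mapping, source_lang):
    """Step 2: build a TMX document from the mapped columns.

    mapping: column header -> language code, e.g. {"English": "en"}.
    """
    rows = parse_rows(html)
    headers, body_rows = rows[0], rows[1:]
    columns = [(headers.index(name), lang) for name, lang in mapping.items()]

    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "html2tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "html",
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for row in body_rows:
        if len(row) < len(headers):
            continue  # skip incomplete rows
        tu = ET.SubElement(body, "tu")
        for index, lang in columns:
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = row[index]
    return ET.tostring(tmx, encoding="unicode")


sample = """<table>
<tr><th>English</th><th>German</th></tr>
<tr><td>Good morning</td><td>Guten Morgen</td></tr>
</table>"""

print(read_headers(sample))  # ['English', 'German']
print(table_to_tmx(sample, {"English": "en", "German": "de"}, "en"))
```

The split into two functions mirrors the two runs described above: the first call only reports the headers, and the second call needs the mapping that the user derived from them.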

Once you have created the TMX file, you just need to import it into your preferred CAT tool. At the moment, HTML2TMX has the limitation of providing a command line interface only. The advantage is that you can use the tool in scripts of your own creation, but the downside is, of course, that many translators would probably prefer a user-friendly GUI. If there is enough interest (which you can express by sending me an email: martin AT wunderlich DOT com), I am more than happy to also create the GUI and perhaps even include this functionality as a new feature in OmegaT.

Martin

Visualizing recipes – a novel approach by Sascha Wahlbrink

October 28th, 2012

(this post is about _cooking_ recipes; not design patterns or stuff like that :)

I recently came across a cookbook by Sascha Wahlbrink that caught my attention with its amazing new approach to recipes. Instead of the traditional textual approach with a simple list of steps to follow, it uses a diagram type that somewhat resembles UML activity diagrams. This new approach has a number of advantages:

  • You can see at a glance what utensils you will need during the preparation.
  • The complexity or simplicity of the recipe becomes obvious due to the diagram structure.
  • Stuff that is meant to happen in parallel in different places (or different pots and pans) is easily visible.
  • And finally, time gaps become obvious (e.g. you don’t want to see an instruction like “Now freeze for 4 hours in your freezer” just 10 minutes before your guests arrive).

 

I have given the approach a try by encoding a mildly complex pasta recipe found on the German “Chefkoch” community. The tool I used was “Dia”, an open-source alternative to M$ Visio. Here are the results (available under the Creative Commons license CC-BY-SA):

 

I have also tried creating a small library of the symbols in Dia, but this didn’t work, because Dia’s extension mechanism didn’t allow for dynamic text in custom shapes. Too bad. But you can easily copy and paste from the Source file provided above to replicate this approach and create your own recipes.

Enjoy!

WordPress commenting dis-allowed, due to spam

October 9th, 2012

I have had to switch off the commenting function here, due to the incredible amount of comment and trackback spam that I have been getting. If you would like to comment on any post, please send an email to martin ät wunderlich dot com. Thanks a lot.

Un-conferencing again – remember the date: Friday, 19th of Oct., at Localization World, Seattle

September 19th, 2012

OK, I am biased on this one. As one of the co-organisers of the first two European localization un-conferences in Dublin (in 2009 and 2010), I can only say: Go to this event if you are in the area! It is a phenomenal opportunity to put those valuable coffee break conversations into the centre. No sales talk, no PowerPoint slides, just straightforward, down-to-earth exchanges with your peers. Have fun and let me know how it went.

A new blog for localization geeks – espell labs blog

September 19th, 2012

The world of language is full of technology – at least in the area of localization and business translation. And, consequently, it is a fantastic playground for the geekily inclined. Most translation service providers maintain a small zoo of CATs and larger ones even employ professional CAT herders. Internal processes need to be automated and re-engineered to stay on top of the competition in an era of grossly underpaid translation services and not-so-generous profit margins.
In come the language-loving nerds to save the world. And some even write blogs, such as the newly started espell Labs blog. Have a look here.

German GUI localization of OmegaT updated

September 13th, 2012

I have brought the German version of OmegaT up to date. A lot has happened since the last update of the German GUI and so there were 200+ new segments (out of ca. 750 in total) – this by itself speaks for the development of the tool.

In case you haven’t heard of it, OmegaT is probably the most popular open-source CAT tool (or TEnT for translation environment tools as Jost Zetzsche calls them). It is a Java application and therefore runs on all platforms – Windows, Mac, Linux, etc. OmegaT has been designed with openness in mind and therefore supports many formats that other tools have been ignoring, such as PO or OpenOffice. Also, open standards play a big role in the architecture (TMX, XLIFF, SRX…) and the TM created by the tool is directly accessible in TMX form in the projects folder.

Part of the openness means that everyone can contribute their localised version of the GUI and the documentation. If you would also like to join the OmegaT translation team, have a look at the website and/or join the mailing list.

How to fix a cracked iPhone screen

August 5th, 2012

I recently dropped my iPhone in the street – really bad, really hard. The screen was cracked and looked somewhat like this:

The phone was still working, but with the cracked screen, it wasn’t in a satisfactory state. So, rather than buying a new iPhone (which would be a waste of money and resources, besides being very un-ecological), I looked around a bit and found some great instructions on how to replace the screen:
On Vimeo.
And on iFixit.

On eBay you can easily find repair sets, either with tools or just the required glass.

Here are some lessons learnt:
- When buying the repair set, make sure to get the right one for your iPhone model. There are some cable plugs that are different, for instance, from 3G to 3GS.
- You don’t necessarily need to buy a set with tools if you have a very small Phillips screwdriver at home.
- If you do buy the glass only, make sure it contains the required double-sided sticker tape, too. It was missing in my case and I used double-sided carpet tape instead. It works, but doesn’t hold the glass as tightly and the surface isn’t fully flush with the frame.

All in all, the repair didn’t take too long and it will take a lot less time the next time around. The cost was around 9 Euros for the glass – a lot less than a new iPhone. Plus you have the satisfaction of having fixed the iPhone yourself!

Open-source scavenger hunt for Munich Schwabing

July 20th, 2012

Hi,

As part of a private party, we organised a scavenger hunt in Munich Schwabing. The start and end point was Kaiserplatz. In case anyone wants to arrange something similar, I am publishing the materials as open source under the “Creative Commons CC BY-SA” license (see http://en.wikipedia.org/wiki/Creative_Commons#Types_of_Creative_Commons_licenses).

The zip file contains:

- Game board (city map)

- Questions in German and English in OpenOffice format (odt)

- Picture series (modelled on the SZ-Sommerrätsel, see www.sz.de)

- Translation memories of the translation in TMX format (created with OmegaT)

- Answers to the questions

All personal data has been removed for privacy reasons. The following book was particularly helpful when creating the puzzle:

Bauer, Reinhard and Knuth Weidlich, Schwabing, Unverhau Verlag, 2nd ed. 1997

You can download the material here: Open-source scavenger hunt for Munich Schwabing

Please send feedback and questions to martin DOT wunderlich REMOVETHISWORD ÄT gmx DOT net.

(I have disabled the blog’s comment function because of the flood of spam)

Have fun!

Countries I’ve visited – nice visualisation based on Google maps

May 13th, 2012


visited 42 states (18.6%)

Having fun with Google translate

May 19th, 2011

By now it has probably become common knowledge that you can use Google Translate to entertain yourself with some beatbox sounds. The Google Translate blog points to a few other very creative uses of the tool, such as ordering food in Hindi, singing a Taiwanese song, or understanding what your pet has been trying to tell you.

Cheers,

Martin