Machine Translation – Road to Hell or Heaven Sent?

The other day one of my mining clients rang me and said: “Tea, we have 1,000,000 words here in Brazilian Portuguese and I need them translated into English within a few days – I need to know what it says.”

In the past I would have quoted the fabulous words of Darryl Kerrigan, played by Michael Caton in the wonderful Australian film The Castle. Darryl’s abiding claim to fame and his response to almost anything was: “You’re dreaming.”

A million words? In a couple of days? Was my client out of his mind? I can work magic but there are limits! Well, you might think I was out of my mind but instead of channeling Darryl, I didn’t bat an eyelid and just said: “Sure, let’s draw up the schedule and get started ASAP.”

The text consisted of request-for-tender descriptions for the mining industry, and my client, a mining corporation, wanted to know if any of the tenders were relevant to them. Time was running out, and in the past they would simply have missed out on a potential tender opportunity.

Machine Translation (MT) combined with post editing was the only solution. It is the gold standard when high volumes are required at “for information only” quality.

Err… you mean… you pasted the text into Google Translate? No, surely not! For us language professionals, intelligent use of MT has absolutely nothing to do with Google Translate.

As I get asked daily about machine versus human translations, I thought I’d give you some handy pointers about MT, what it really is, when to use it and its benefits and pitfalls.

What is MT?

Machine translation (MT) is automated translation. It is the process by which computer software is used to translate a text from one natural language (such as Portuguese) to another (such as English).

Why is this so difficult?

To produce any translation, human or automated, the meaning of a text in the original language must be fully conveyed in the target language, i.e. the translation. While this seems straightforward on the surface, it is far more complex. Translation is not a mere word-for-word substitution. A translator must interpret and analyse all of the elements in the text and know how each word may influence another. This requires extensive expertise in grammar, syntax, semantics etc., in both languages, as well as familiarity with each local region.

What is Rule-based versus Statistical MT?

Rule-based machine translation relies on countless built-in linguistic rules and bilingual dictionaries for each language pair.

Statistical machine translation utilises statistical translation models whose parameters stem from the analysis of monolingual and bilingual corpora. You can also combine them and have a hybrid MT system.

For us, MT is mostly a customised engine fed with high quality data (e.g. millions of translated words of, say, mining terminology) hosted on a private server.

What’s the quality like?

You know the old principle: Garbage in – Garbage out. Your computer output is only as good as the quality of the data you have been feeding your engine. In our case, a couple of million words of previously translated mining material had been fed into the engine and still the text needed to be “cleaned up” by so-called “post editors”, qualified translators or linguists who are trained for post-editing.

So, it’s not publication quality?

I always say: Quality is what the client defines as quality. In this case, my client required the translation for “information only” purposes. The first part required “raw MT output only”. They needed to run a keyword search to isolate any relevant calls for tender. The quality was good enough for this, though certainly not for publication or easy reading. Once the small number of calls for tender that were at all potentially of interest had been identified, the rest could be discarded immediately, which left us with only 8% of the text to look at. These 80,000 words had to be “cleaned up” by our mining translators, i.e. the post editors, so that they made sense. This was so-called “Light Post Editing”, still for information purposes and hence still not publication quality, but it suited the purpose.
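To make the keyword triage described above a little more concrete, here is a minimal Python sketch. It is not the tooling we actually used; the function names, the sample tenders and the keyword list are all made up for illustration. It simply shows the idea of running keywords over raw MT output and keeping only the matching calls for tender for post editing.

```python
import re

def triage(documents, keywords):
    """Split raw MT output into (relevant, discarded) by whole-word keyword match."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    relevant, discarded = [], []
    for doc in documents:
        (relevant if pattern.search(doc) else discarded).append(doc)
    return relevant, discarded

def word_count(documents):
    """Rough word count: whitespace-separated tokens."""
    return sum(len(doc.split()) for doc in documents)

# Hypothetical raw MT output, one string per call for tender.
raw_mt = [
    "Call for tender: supply of conveyor belts for an iron ore mine.",
    "Call for tender: catering services for a municipal school.",
    "Call for tender: drilling rig maintenance at an open pit site.",
]
# Hypothetical keywords a mining client might care about.
keywords = ["iron ore", "drilling", "conveyor"]

relevant, discarded = triage(raw_mt, keywords)
share = word_count(relevant) / word_count(raw_mt)
print(f"{len(relevant)} of {len(raw_mt)} tenders kept, {share:.0%} of the words")
```

In our project the same kind of filter reduced 1,000,000 words to 8%, i.e. 80,000 words, which is what went on to light post editing.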

What do Post Editors do?

The linguists who “clean up” the raw MT output have specific instructions according to the degree of post editing (PE) required. Depending on whether light, medium or full PE is called for, they might just ensure correct use of terminology and not worry about style, grammar and preferences, or they may re-translate some sections. In this case, the “lightly post edited” section (the 8%) revealed that only two calls for tender were relevant. These two sections of 25,000 words received “Full Post Editing” and were given to the client, who could respond to them in time.

What kind of texts are suitable for MT?

Companies like Bosch and Siemens have been using MT+PE for many years. Their technical writers have specific pre-editing guidelines on how to write, so their documentation is suitable for MT in the first place: Avoid long sentences, use language logically and precisely (adhere to the literal meaning of words), avoid idioms, figurative language and cultural references, don’t omit words, and adhere strictly to punctuation rules.
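Some of these pre-editing guidelines lend themselves to automated checks. The following Python sketch flags overly long sentences in a draft. It is only an illustration: the 25-word limit, the naive sentence splitter and the sample text are my own assumptions and are not taken from any company's actual style guide.

```python
import re

MAX_WORDS = 25  # hypothetical limit, not taken from any real style guide

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def flag_long_sentences(text, max_words=MAX_WORDS):
    """Return (word_count, sentence) pairs for sentences over the limit."""
    flagged = []
    for sentence in split_sentences(text):
        count = len(sentence.split())
        if count > max_words:
            flagged.append((count, sentence))
    return flagged

# Hypothetical draft: one short instruction and one sentence that is too long.
draft = (
    "Switch off the pump. "
    "Before you open the housing, make sure that the pump has been switched off, "
    "that the pressure has been released, that the valve has been closed, and that "
    "the warning sign has been attached to the main control panel."
)

for count, sentence in flag_long_sentences(draft):
    print(f"{count} words: {sentence}")
```

A check like this only covers the “avoid long sentences” rule; idioms, figurative language and cultural references still need a human eye.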

When should you not even consider MT?

Yes, you guessed it. Literary and any kind of rhetorical texts are not suitable for MT. Neither are life sciences/medical and legal texts, marketing, sales and advertising copy, or corporate communication in general, just to name a few.

What’s a smart engine?

Your MT improves over time; therefore it is a long-term investment. Every time the post editors correct the machine translated text, this is fed back immediately into the MT engine and the output is improved instantly for the rest of the document.

How expensive is it?

This depends on the language combination, industry, volume and how much data, including previous TMs (translation memories) and existing translations, is available. You pay subscription fees for the MT engine plus post editing. The price per 100 words for MT output differs per language but can be as low as 20% of that of traditional translation. The real ROI does not come immediately. The time saving, however, is enormous.

So why not just use Google Translate? It’s free after all!

Well, it’s not free if you connect it to your system through a plug-in – but that’s not the issue. Firstly, Google owns all content that is passed through Google Translate, so your confidentiality goes out the window. We translators adhere to a strict code of ethics including confidentiality, and that would be lost immediately. Secondly, it’s not a private customised engine, but a general one that is fed with any data and unauthorised edits. It has come a long way and can show surprisingly good results in certain languages for certain sentences. It will always remain a fantastic option for non-professional use by private people who want to get the gist of something non-confidential.

Back to my client… well, the MT solution was heaven sent for them as they could submit the tenders on time; but it can surely be a road to hell if you rely on the raw MT output and you and your company lose credibility by providing ridiculous linguistic material.

And as far as The Castle is concerned – by keeping up with the latest technologies and international industry best practices, innovating constantly and being aware of what is fit for purpose, I don’t quote Darryl Kerrigan as much as I used to years ago.

You’re dreaming? Not anymore! I seem to find language solutions for the impossible these days.

Try me!

Written by Tea C. Dietterich, CEO, 2M Language Services.