Machine Translation through the ages: From the Cold War to deep learning

With the rise of Neural Machine Translation (NMT) in recent years and its application to numerous fields and situations, our idea of Machine Translation (MT) has changed: from a practical system that provides fast but poor-quality translations to one that can produce natural-sounding output and can be trained and customised through machine learning.

Research in machine translation and natural language processing has come a long way. Below is a look back at the systems and models, some more successful than others, that led to today’s NMT.

History of Machine Translation

1933: The origins

In 1933, the Soviet scientist Peter Troyanskii presented a “machine for the selection and printing of words when translating from one language to another” to the USSR’s Academy of Sciences. The invention was based on a typewriter, an old film camera and cards in four different languages.

The operator took the first word from the text, found the corresponding card, took a photo, and typed the word’s morphological characteristics (noun, plural, genitive) on the typewriter. Each of the typewriter’s keys encoded one of these features. The tape and the camera’s film were used simultaneously, producing a set of frames with words and their morphology.

Although it was a breakthrough, the invention was not considered useful, and it remained unknown until two Soviet scientists found Troyanskii’s patents in 1956.

1954: The Georgetown-IBM experiment

At the beginning of the Cold War, on January 7th 1954, at IBM’s New York headquarters, the IBM 701 computer automatically translated more than 60 sentences from Russian into English. In a historic first, the computer translated at a pace of two and a half lines per second.

However, the translated samples had been carefully selected and tested. The system was no more suitable for everyday use than a simple phrasebook. Nevertheless, it started a race for machine translation between countries such as the US, Germany, France and Japan.

From then on, the struggle to improve Machine Translation lasted for six decades and produced a series of different models, from Rule-Based Machine Translation (RBMT) to Neural Machine Translation (NMT).

The 70s: Rule-Based Machine Translation (RBMT)

The first rule-based machine translation concepts emerged in the ’70s. Scientists took inspiration from the work of interpreters, trying to program the very slow computers of the time to repeat those actions. The system consisted of a bilingual dictionary (for example, RU->EN) and a set of linguistic rules for both languages.

PROMT and Systran are the most famous RBMT systems, but even within the RBMT space there are some nuances and sub-models.

Direct Machine Translation

This is the most basic type of machine translation. It separates the text into words, translates them, slightly corrects the morphology and fine-tunes the syntax to make the sentence sound more or less right. The output was of quite poor quality, and this model is hardly useful.
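
As a rough illustration of why word-by-word output reads so poorly, here is a minimal sketch of the direct approach. The tiny German-English dictionary and the function name direct_translate are invented for this example and are not taken from any real system; the sketch also skips the morphology and syntax corrections that real direct systems attempted.

```python
# Toy direct (word-by-word) translation.
# The dictionary and function name are hypothetical, for illustration only.
GERMAN_TO_ENGLISH = {
    "ich": "I",
    "möchte": "want",
    "keine": "no",
    "persimonen": "persimmons",
    "essen": "eat",
}


def direct_translate(sentence, dictionary):
    """Translate each word independently, keeping the source word order."""
    words = sentence.lower().split()
    # Words missing from the dictionary are passed through unchanged.
    return " ".join(dictionary.get(word, word) for word in words)


print(direct_translate("Ich möchte keine Persimonen essen", GERMAN_TO_ENGLISH))
# prints: I want no persimmons eat
```

Every word is translated, yet the result keeps German word order and is not a natural English sentence.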

Transfer-based Machine Translation

The main difference from Direct Machine Translation is a preparation step that determines the structure of the sentence to be translated. The system was meant to manipulate whole constructions, not words, in order to generate better output. In theory.

Translation output was still of poor quality. Although the approach may have simplified the grammar rules, the system became too complex, because there are far more word constructions than single words.

Interlingua Machine Translation

In this model, the source language is first transformed into an intermediate universal representation (Interlingua), which can then be converted into any target language. This was the distinguishing feature of Interlingua Machine Translation: because the source text is converted into the intermediate representation only once, a single source text can be translated into several languages, which the transfer-based model could not do without a separate set of rules for each language pair.

As good as it sounds, it was extremely hard to create such a universal interlingua, and scientists who dedicated their whole lives to the task did not manage to do so. Thanks to them, however, we now have different levels of language representation, such as semantic, syntactic and morphological.
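
The "analyse once, generate many times" idea behind the interlingua approach can be sketched in a few lines. The concept labels (CAT, SLEEP), the vocabularies and the function names below are all hypothetical; a real interlingua would also have to capture grammar and meaning, which is exactly the part that proved so hard.

```python
# Toy interlingua pipeline: source text -> neutral concepts -> any target.
# Concept labels and vocabularies are hypothetical, for illustration only.
ANALYSERS = {
    "en": {"cat": "CAT", "sleeps": "SLEEP"},
}
GENERATORS = {
    "fr": {"CAT": "chat", "SLEEP": "dort"},
    "es": {"CAT": "gato", "SLEEP": "duerme"},
}


def to_interlingua(sentence, source_lang):
    """Map each source word to a language-neutral concept label."""
    return [ANALYSERS[source_lang][word] for word in sentence.lower().split()]


def from_interlingua(concepts, target_lang):
    """Render concept labels as words of the target language."""
    return " ".join(GENERATORS[target_lang][concept] for concept in concepts)


# The source sentence is analysed only once...
concepts = to_interlingua("cat sleeps", "en")
# ...and the same concepts are reused for every target language.
for lang in GENERATORS:
    print(lang, from_interlingua(concepts, lang))
# prints:
# fr chat dort
# es gato duerme
```

Adding a new target language only requires a new generator; the analysis of the source text stays the same.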

The 80s: Example-Based Machine Translation (EBMT)

Japan was highly interested in Machine Translation. It was recognised early on that the country’s shortage of English speakers would be a problem in the coming globalisation. Because the two languages are structured completely differently, rule-based English-Japanese machine translation is close to impossible.

In 1984, Makoto Nagao of Kyoto University had the idea of using ready-made phrases instead of repeated translation. Imagine we have to translate a simple sentence, “I’m going to the cinema”, and we have already translated a similar sentence, “I’m going to the theatre”, and we can find the word “cinema” in the dictionary.

All the system needs to do is identify the word that differs and translate it. EBMT was a major breakthrough: the machine can be fed with existing translations, without anyone having to spend years creating rules and exceptions.
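
The cinema/theatre example can be sketched as follows. This is only a toy under stated assumptions: French is chosen as the target language purely for illustration, the stored example and the word dictionary are invented, and the function name translate_by_example is hypothetical. Real EBMT systems match and recombine fragments in far more flexible ways.

```python
# Toy example-based translation: reuse a stored translation and swap the
# single word that differs. All data and names are hypothetical.
EXAMPLES = {
    "I'm going to the theatre": "Je vais au théâtre",
}
WORD_DICTIONARY = {"theatre": "théâtre", "cinema": "cinéma"}


def translate_by_example(sentence, examples, word_dictionary):
    """Build a translation from a stored example that differs by one word."""
    words = sentence.split()
    for source, target in examples.items():
        source_words = source.split()
        if len(source_words) != len(words):
            continue
        # Positions where the new sentence differs from the stored example.
        diffs = [i for i, (a, b) in enumerate(zip(source_words, words)) if a != b]
        if len(diffs) != 1:
            continue
        old_word, new_word = source_words[diffs[0]], words[diffs[0]]
        if old_word in word_dictionary and new_word in word_dictionary:
            # Swap the translation of the old word for that of the new word.
            return target.replace(word_dictionary[old_word], word_dictionary[new_word])
    return None  # no sufficiently similar example was found


print(translate_by_example("I'm going to the cinema", EXAMPLES, WORD_DICTIONARY))
# prints: Je vais au cinéma
```

No grammar rules are written anywhere: the stored example supplies the sentence structure, and the dictionary supplies the one missing word.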

The 90s: Statistical Machine Translation (SMT)

In the early 90s, at the IBM Research Center, a machine translation system was tested that was based not on rules and linguistics but on the analysis of previous translations and their patterns. Using statistics drawn from millions of previous translations, the machine identifies patterns and chooses the most likely translation. “If people translate that way, I will”, and so Statistical Machine Translation was born.

The method performed better than any previous one, and no linguists were needed. The more data the system had to calculate its statistics from, the better the output was.
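
The "if people translate that way, I will" principle can be sketched with simple counting. The word pairs below are invented stand-ins for statistics gathered from millions of human translations, and the function names are hypothetical. Real SMT systems also had to learn the word alignments themselves and used a language model to score the output, both of which are left out here.

```python
from collections import Counter, defaultdict

# Hypothetical word-aligned pairs, standing in for counts harvested
# from a large collection of human translations.
ALIGNED_PAIRS = [
    ("haus", "house"), ("haus", "house"), ("haus", "home"),
    ("haus", "house"), ("buch", "book"),
]


def build_translation_table(pairs):
    """Count how often each source word was translated as each target word."""
    table = defaultdict(Counter)
    for source_word, target_word in pairs:
        table[source_word][target_word] += 1
    return table


def most_likely_translation(word, table):
    """Return the most frequent translation and its relative frequency."""
    counts = table.get(word)
    if not counts:
        return None  # the word never appeared in the training data
    target, count = counts.most_common(1)[0]
    return target, count / sum(counts.values())


table = build_translation_table(ALIGNED_PAIRS)
print(most_likely_translation("haus", table))
# prints: ('house', 0.75)
```

With more aligned pairs the relative frequencies become more reliable, which is why more data led to better output.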

SMT went through different models over the years as it struggled with various aspects of language, such as word order or cases where new words need to be added to the output, e.g. “Ich möchte keine Persimonen essen” -> “I do not want to eat persimmons”. The system evolved from word-based SMT to phrase-based SMT, which was able to handle word reordering and came with a few additional lexical hacks.

Phrase-based SMT became the state of the art in Machine Translation and was used by all the high-profile online translators, such as Bing, Yandex and Google Translate. The last of these would revolutionise the Machine Translation world a few years later.

2016: The Neural Machine Translation (NMT) revolution

In November 2016, Google made a game-changing announcement: the launch of the Google Neural Machine Translation system (GNMT). The idea was similar to style transfer between images, as in programs like Prisma that turn a photo into a painting imitating a famous artist’s style. The neural network is taught to recognise the artist’s paintings so that it can stylise the picture accordingly.

If we can transfer a style to a photo, how about imposing a language on a source text? The idea was to translate while keeping the essence of the source text (just like the artist’s style). The source text is encoded into a set of specific features by one neural network and then decoded back into text in the target language by another network. Each network speaks a different language and knows nothing about the other, but both can understand the set of features extracted from the source text. This is quite similar to the idea of Interlingua Machine Translation.

Within a few years, NMT surpassed every previously developed system and, thanks to deep learning, it was able to improve without being explicitly taught how. Because the system relies on an intermediary set of features rather than directly on statistics from previous translations, it allows direct translation between languages where this was not possible before. For example, SMT usually worked with English as a pivot language, so to translate from French into Spanish the text had to be translated into English first and then into Spanish, adding more errors to the final output.

The future of Machine Translation

Whilst NMT certainly was a game-changer, output comparable to a high-quality human translation is still a few years away, despite the claims of some MT engine developers. The trend is towards customisation and domain-specific training of machine translation engines (e.g. training an engine to translate scientific content only), or even towards hybrid models that mix SMT and NMT to achieve higher-quality output. One thing is for sure: big tech players such as Amazon, Microsoft and Google are working hard on this, and we can expect to see new developments in the near future.

At 2M we work with a team of computational linguists who can customise NMT engines with training data to suit the domain and terminology of our clients. Our engines are securely hosted on our private servers, complying with the stringent cybersecurity regulations of the Australian Federal Police, Defence Department and QLD Government. We also use NMT output when assisting research institutions in translating vast amounts of content that would otherwise remain untranslated due to lack of budget and time.

We look at what is “fit for purpose” and at the intended use of the translated assets in order to determine the most suitable linguistic approach.

Graphics from original article by Vasily Zubarev.