In its 2018 annual trends report, Ericsson Consumer Lab revealed that 63% of surveyed consumers said they would like “earbuds that can translate in real time”. So-called “augmented hearing”, which includes real-time translation, is an emerging trend in what consumers expect from their next smart device.
Translation earbuds let the user listen to speech in a foreign language and hear a real-time interpretation in their native language through the earbuds, eliminating the need for intermediary methods such as human interpreting or manual translation.
How do translation earbuds work?
Various translation earbuds have been developed by multiple companies, all relying on different technologies and processes. Let’s take a look at the latest addition to the industry, Google Pixel Buds. These wireless earbuds make use of Google Assistant, a smart application which can speak to, understand and assist the user. One of the headline features is support for Google Translate, which is said to be able to translate up to 40 different languages. Impressive technology for under AUD 250.
Real-time speech translation consists of a chain of several distinct technologies – each of which has experienced rapid improvement in recent years. The chain, from input to output, goes like this:
Input conditioning: the earbuds record background noise and interference, effectively picking up a mixture of the user’s voice and other sounds. “Denoising” removes background sounds, while a voice activity detector (VAD) is used to turn the system on only when the correct person is speaking. Touch control further improves VAD accuracy.
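The VAD idea can be sketched very simply. The toy detector below flags a frame as speech when its energy clearly exceeds a noise floor; real earbuds use trained models and multiple microphones, and the numbers here are invented for illustration.

```python
# Toy energy-based voice activity detector (VAD), a minimal sketch of the
# input-conditioning stage. Frames whose energy clearly exceeds the noise
# floor are treated as speech; everything else is treated as background.
from typing import List

def frame_energy(samples: List[float]) -> float:
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in samples) / len(samples)

def detect_voice(frames: List[List[float]], noise_floor: float,
                 factor: float = 4.0) -> List[bool]:
    """Flag frames whose energy exceeds `factor` times the noise floor."""
    threshold = noise_floor * factor
    return [frame_energy(f) > threshold for f in frames]

# Example: two quiet frames surrounding one louder "speech" frame.
frames = [[0.01, -0.02, 0.01], [0.5, -0.6, 0.4], [0.02, 0.01, -0.01]]
print(detect_voice(frames, noise_floor=0.001))  # [False, True, False]
```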
Language identification (LID): this system uses machine learning to identify what language is being spoken within a couple of seconds. This is important because everything that follows is language specific. For language identification, phonetic characteristics alone are insufficient to distinguish languages (language pairs such as Ukrainian and Russian, or Urdu and Hindi, are virtually identical in their units of sound, or “phonemes”), so completely new acoustic representations had to be developed.
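The classification idea behind LID can be illustrated in miniature. Real systems classify learned acoustic features with neural networks; the sketch below instead scores character-bigram overlap of a text snippet against tiny hand-made language profiles, which are invented here purely for illustration.

```python
# Toy language identifier: score overlap of character bigrams against
# small per-language profiles and pick the best-matching language.
from collections import Counter

def bigrams(text: str) -> Counter:
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))

# Invented mini-profiles; real profiles would be built from large corpora.
PROFILES = {
    "en": bigrams("the quick brown fox jumps over the lazy dog and then there"),
    "de": bigrams("der schnelle braune fuchs springt ueber den faulen hund und dann"),
}

def identify(text: str) -> str:
    """Return the profile language sharing the most bigram mass with `text`."""
    grams = bigrams(text)
    def score(lang: str) -> int:
        profile = PROFILES[lang]
        return sum(min(grams[g], profile[g]) for g in grams)
    return max(PROFILES, key=score)

print(identify("where is the train station"))  # en
print(identify("wo ist der bahnhof bitte"))    # de
```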
Automatic speech recognition (ASR): ASR uses an acoustic model to convert the recorded speech into a string of phonemes and then language modelling is used to convert the phonetic information into words. By using spoken grammar rules, context, probability and a pronunciation dictionary, ASR systems fill in gaps of missing information and correct mistakenly recognised phonemes to infer a textual representation of what the speaker said.
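The language-modelling step can be sketched as a search over word segmentations. Given phonemes from the acoustic model, the toy decoder below uses a pronunciation dictionary and word probabilities (all invented for illustration) to pick the most probable word sequence; the classic ambiguity “I scream” versus “ice cream” shows why probabilities matter.

```python
# Minimal sketch of ASR language modelling: segment a phoneme sequence
# into dictionary words, scoring each segmentation with unigram word
# log-probabilities and keeping the best path (simple dynamic programming).
import math

PRONUNCIATIONS = {          # word -> toy phoneme string
    "ice":    "AY S",
    "cream":  "K R IY M",
    "I":      "AY",
    "scream": "S K R IY M",
}
LOG_PROB = {"ice": math.log(0.25), "cream": math.log(0.25),
            "I": math.log(0.3), "scream": math.log(0.1)}

def decode(phonemes: list) -> list:
    """Best-scoring segmentation of the phoneme list into dictionary words."""
    n = len(phonemes)
    best = [(-math.inf, [])] * (n + 1)   # best (score, words) ending at each index
    best[0] = (0.0, [])
    for i in range(n):
        score, words = best[i]
        if score == -math.inf:
            continue
        for word, pron in PRONUNCIATIONS.items():
            p = pron.split()
            if phonemes[i:i + len(p)] == p:
                j = i + len(p)
                cand = (score + LOG_PROB[word], words + [word])
                if cand[0] > best[j][0]:
                    best[j] = cand
    return best[n][1]

# "AY S K R IY M" could be "I scream" or "ice cream"; probabilities decide.
print(decode("AY S K R IY M".split()))  # ['ice', 'cream']
```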
Natural language processing: NLP performs machine translation from one language to another. This is not as simple as substituting nouns and verbs, but includes decoding the meaning of the input, and then re-encoding that meaning as output in a different language – with all the nuances and complexities that make second languages so hard for us to learn.
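Why substitution alone fails can be shown with one sentence. German puts the past participle at the end of the clause, so a word-for-word gloss yields wrong English order; even the tiny rule-based sketch below needs a reordering step. The vocabulary and the rule are invented for illustration, not how production MT works.

```python
# Toy illustration that translation is more than word substitution:
# glossing German word-for-word gives wrong English word order.
GLOSS = {"ich": "I", "habe": "have", "den": "the",
         "ball": "ball", "gesehen": "seen"}

def word_for_word(sentence: str) -> str:
    """Naive substitution: replace each German word with its gloss."""
    return " ".join(GLOSS[w] for w in sentence.lower().split())

def translate(sentence: str) -> str:
    """Gloss, then move a clause-final participle next to the auxiliary."""
    words = [GLOSS[w] for w in sentence.lower().split()]
    if words[-1] == "seen" and "have" in words:   # toy reordering rule
        words.insert(words.index("have") + 1, words.pop())
    return " ".join(words)

print(word_for_word("Ich habe den Ball gesehen"))  # I have the ball seen
print(translate("Ich habe den Ball gesehen"))      # I have seen the ball
```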
Speech synthesis or text-to-speech (TTS): almost the opposite of ASR, this synthesises natural sounding speech from a string of words (or phonetic information). The output is finally sent to TTS software for the target language, producing a compressed recording of the output. This is sent back to be played through the earbuds.
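One classic TTS approach, concatenative synthesis, can be sketched in a few lines: look up a stored recording for each unit and join them. Here short lists of numbers stand in for audio samples, and the unit bank is invented for illustration; real systems synthesise a waveform from phonetic and prosodic features.

```python
# Toy concatenative TTS: concatenate stored "recordings" (lists of numbers
# standing in for audio samples) for each word, with a gap between words.
UNIT_BANK = {
    "hello": [0.1, 0.3, 0.2],
    "world": [0.4, 0.1],
}
SILENCE = [0.0]   # short gap inserted between words

def synthesise(text: str) -> list:
    """Concatenate stored units for each word, separated by silence."""
    out = []
    for i, word in enumerate(text.lower().split()):
        if i:
            out += SILENCE
        out += UNIT_BANK[word]
    return out

print(synthesise("hello world"))  # [0.1, 0.3, 0.2, 0.0, 0.4, 0.1]
```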
Fierce competition in the wearable translator market
Even though the Google Pixel Buds have made a resounding entry into the market, several companies have been working on translation earbuds, or wearable translators, since 2014. The market comprises specialised translation-technology start-ups, technology giants and even search engines.
For example, German firm Bragi, which designs and develops smart earphones, launched the Dash, a translation earpiece, in 2014.
US-based company Timekettle has just finished a Kickstarter campaign to secure funding for its WT2 real-time translator project.
The Pilot Translating Earpiece made by Waverly Labs reportedly took more than AUD 6m in pre-orders in May.
An IBM Watson-powered earpiece went on sale in July 2017 for AUD 240. Lingmo International, an Australia-based start-up, launched the device at a UN event in Switzerland in June. It claims the device can handle eight languages.
The push to integrate translation in consumer devices is not limited to earphones. Baidu unveiled a device in September that translates speech in near real-time and, on December 6, rival Chinese search engine Sogou launched its own real-time speech translation system.
Other translation devices that debuted in 2017 in the market include the Ili hand-held translator, the Samsung flagship phone Galaxy Note 8 and Huawei’s top-of-the-line Huawei Mate 10 Pro.
How does this affect the language industry?
All these translation devices are based on machine translation (MT) engines. And whilst MT is improving every day, particularly with the rise of neural machine translation, none of these technologies is yet good enough to match the quality of translation and interpreting provided by certified translators and linguists. In an industry that powers the localisation strategies of global corporations as well as governments’ foreign affairs departments, the margin for error is very small.
After Google released its Pixel translation earbuds, reviewers largely agreed that the technology failed to deliver on its promises. A review in Wired UK said they “aren’t just bad, but utterly pointless”. The Guardian saw the “flawed” earbuds as “a missed opportunity”, while Gizmodo said, “it’s not even close to being good.”
Where Google fails, others are unlikely to succeed: any translation earpiece will only be as good as the underlying MT engine powering it. Add to that the need for accent recognition, natural-sounding speech output and high-quality translation, and a fluent conversation between two languages facilitated by technology is still a few years away.