Facebook Just Set A New Record In Translation And Why It Matters


Hieroglyphics are seen on the sarcophagus once belonging to the judge and prime minister Gemenefherbek of Sais at the Egyptian Museum in Turin, Italy, Tuesday, February 21, 2006. Photographer: Adam Berry/Bloomberg News

Research at Facebook just made it easier to translate between languages without many translation examples. For example, from Urdu to English.

Neural Machine Translation

Neural Machine Translation (NMT) is the field concerned with using AI to translate between languages, such as English and French. In 2015, researchers at the Montreal Institute for Learning Algorithms developed new AI techniques [1] which finally made machine-generated translations work well. Almost overnight, systems like Google Translate became dramatically better.

While that leap was significant, it still required having sentence pairs in both languages, for example, “I like to read” (English) and “me gusta leer” (Spanish). For language pairs like Urdu and English, where few such pairs exist, translation systems failed miserably. Since then, researchers have been building systems that can translate without sentence pairings, i.e., Unsupervised Neural Machine Translation (UNMT).

In the past year, researchers at Facebook, NYU, the University of the Basque Country, and Sorbonne Universités made dramatic advances that are finally enabling systems to translate without knowing that “house” means “casa” in Spanish.

Just a few days ago, Facebook AI Research (FAIR) published a paper [2] showing a dramatic improvement that allows translation from languages like Urdu into English. As the authors put it: “To give some idea of the level of advancement, an improvement of 1 BLEU point (a common metric for judging the accuracy of MT) is considered a remarkable achievement in this field; our methods showed an improvement of more than 10 BLEU points.”
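For intuition on what a BLEU point measures, here is a back-of-the-envelope, sentence-level sketch in plain Python. It is a simplification, not what the paper used: real evaluations compute corpus-level BLEU with standard tools, and the tiny smoothing constant below is an arbitrary choice to avoid log of zero.

```python
# Sentence-level BLEU sketch: geometric mean of modified n-gram precisions
# times a brevity penalty. Simplified for illustration only.
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # "Modified" precision: clip each n-gram's count by the reference count.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # smooth zeros
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # perfect match: 1.0
print(bleu("a cat sat", "the cat sat on the mat"))               # partial match: much lower
```

A score of 1.0 (reported as 100) means a perfect match, so a 10-point jump on this 0 to 100 scale is a large fraction of the remaining headroom.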

Why this matters

A test subject at a neuroscience lab of the Swiss Federal Institute of Technology in Lausanne (EPFL), Switzerland, wears a Brain-Computer Interface (BCI) hood that translates brain activity into signals that control a computer.

Labeled data is often the largest bottleneck in AI systems: generating it means paying humans to translate by hand, which can be prohibitively time consuming and expensive. The advances this recent paper highlights can provide new ways of training systems without needing that labeled data. Analogous examples include determining whether there’s a cat in a photo without any photos labeled “cat,” or building question-answering systems that are never told the correct answer.

From a social sciences perspective, it could allow us to translate documents written in lost languages, or allow new devices that can translate between rare languages in real-time, for example, Swahili and Belarusian.

We could also imagine abstracting this idea to translate between arbitrary domains. For example, “translate” between neural activity in the brain to videos on a screen, or performance of a stock given some news event, to projected performance of another stock given a similar news event.

How it works

Byte-pair encodings. William Falcon

Here I explain how the system works without getting into the nitty-gritty details of the math and AI principles.

Facebook’s system combines three core components developed in previous research:

  1. Byte-pair encodings [3]: Instead of giving the system whole words, they give it words in pieces. For example, the word “hello” might be given as three tokens: “he”, “ll”, “o”. This means the system can handle a word it has never seen before by composing the pieces it has learned.
  2. Language model: They train other neural networks to generate sentences that “sound good” in each language. For example, such a network might change the sentence “how is you” to “how are you”.
  3. Back-translation [4]: This is a trick where another neural network learns to translate backward. For example, to translate from Spanish to English, we’d also teach a neural network to translate from English to Spanish and use it to generate synthetic training data, thereby increasing the amount of data we have.
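The first component, byte-pair encoding, can be sketched in a few lines of plain Python on a toy corpus. This is a simplified illustration of the merge procedure from [3], not the production implementation: start from characters and repeatedly merge the most frequent adjacent pair of symbols.

```python
# Minimal byte-pair encoding (BPE) sketch on a toy corpus.
from collections import Counter

def get_pair_counts(words):
    """Count adjacent symbol pairs across all words (each word is a tuple of symbols)."""
    counts = Counter()
    for word, freq in words.items():
        for a, b in zip(word, word[1:]):
            counts[(a, b)] += freq
    return counts

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for word, freq in words.items():
        out, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

def learn_bpe(corpus, num_merges):
    # Start from individual characters, then greedily merge frequent pairs.
    words = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(words)
        if not counts:
            break
        best = counts.most_common(1)[0][0]
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words

merges, words = learn_bpe("hello hello help hell low lower", 4)
print(merges)  # learned merge rules, most frequent first
```

After a few merges, frequent fragments like “hel” emerge on their own, which is exactly how the system ends up working with word pieces rather than whole words.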

The rest of the system combines the above techniques through two approaches: a neural network-based system (NMT) [1] and a phrase-based system (PBSMT) [5]. While either approach alone improves translation quality, combining both produces the impressive new results.

Learning to close a cup

Illustration of a system learning to move pixels in one image to generate the second image, an analogy for matching probability distributions. William Falcon

The version of PBSMT used in this paper is one developed previously at FAIR [5]. That system learns a probability distribution over the phrases in each language and teaches another system to rotate the points of the second distribution until they match the first.

Example: Imagine having two images, one where a cup and a lid are next to each other and one where the lid is on the cup. The system would learn how to move the pixels around the image without a lid to generate the image with a lid.
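The rotation idea can be made concrete with a toy alignment between two point clouds, assuming NumPy is available. A caveat: the actual system [5] learns the mapping adversarially with no paired points at all, whereas this sketch cheats by using known pairs (the classic orthogonal Procrustes solution) purely to keep the illustration short.

```python
# Toy distribution alignment: recover the rotation W that maps cloud X onto cloud Y.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # first point cloud ("Spanish" phrases)
theta = np.pi / 4                        # hidden 45-degree rotation
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
Y = X @ R_true.T                         # second point cloud ("English" phrases)

# Orthogonal Procrustes: the best rotation is W = U V^T, from the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

print("max alignment error:", np.abs(X @ W - Y).max())
```

With real embeddings the clouds only match approximately, but the same principle applies: find the rotation that makes one language’s phrase distribution line up with the other’s.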

Nice parts of this research

FAIR researchers did an amazing job at making this work accessible.

  • This nice post has a slightly more technical description of the research.
  • Facebook also open-sourced the code, allowing anyone to build these systems.
  • Finally, the authors did a great job with an ablation study, which measures the effect of removing each component of the system on the final results. This step is often overlooked in research papers but tells us, as researchers, which parts of a new system are actually responsible for its improvements.


[1] Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. “Neural machine translation by jointly learning to align and translate.” arXiv preprint arXiv:1409.0473 (2014).

[2] Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, Marc’Aurelio Ranzato. “Phrase-Based & Neural Unsupervised Machine Translation.” arXiv preprint arXiv:1804.07755 (2018).

[3] Sennrich, Rico, Barry Haddow, and Alexandra Birch. “Neural machine translation of rare words with subword units.” arXiv preprint arXiv:1508.07909 (2015).

[4] Sennrich, Rico, Barry Haddow, and Alexandra Birch. “Improving neural machine translation models with monolingual data.” arXiv preprint arXiv:1511.06709 (2015).

[5] Conneau, Alexis, et al. “Word translation without parallel data.” arXiv preprint arXiv:1710.04087 (2017).
