Facebook has used artificial intelligence to improve its machine translation models, allowing the social media giant to recognize and translate 24 new languages. The new language translation pairs include Serbian and Belarusian to English in Europe; Zulu and Somali to English in Africa; and Cambodian and Mongolian to English in Asia.
More than 6 billion translations take place on Facebook every day. The new language pairs add to the more than 4,000 available on Facebook today, but are distinct in that they tackle a longstanding problem in machine translation.
Part of the challenge has been the inability to find parallel translations of the same text, for example a book available in both English and Pashto, in order to make a like-for-like comparison. As a result, Facebook reached a limit in its ability to carry out translations using supervised learning, the method used to train Translate, Facebook’s neural machine translation system, which was open-sourced and made available to the public earlier this year.
To overcome this, Facebook manually translated and labeled a selection of public Facebook posts.
“In total, millions of words were manually labeled in 25 languages,” according to a blog post that explains the news.
The system also draws on multilingual language modeling to identify words in a sentence and on back-translation to improve translations.
Both of these methods were also used in the unsupervised machine translation training techniques Facebook revealed last month. Those techniques are likewise being put into practice to translate more languages for which Facebook believes it doesn’t have enough data for more traditional approaches.
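For readers unfamiliar with back-translation, the general idea can be sketched in a few lines of Python. This is a minimal illustration of the technique as it is commonly described, not Facebook's implementation: sentences that exist only in the target language are run through a reverse translator, and each synthetic result is paired with the original sentence to create extra training data. The function names, the toy word lookup, and the English-to-Spanish example are all hypothetical and stand in for a real trained model.

```python
from typing import Callable, List, Tuple


def back_translate(
    target_sentences: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs from monolingual target text."""
    pairs = []
    for target in target_sentences:
        # Translate the real target sentence "backwards" into the source language.
        synthetic_source = reverse_translate(target)
        # The synthetic source is paired with the genuine target sentence.
        pairs.append((synthetic_source, target))
    return pairs


# Hypothetical toy reverse model: a word-by-word English -> Spanish lookup.
# A real system would use a trained target-to-source translation model here.
TOY_EN_TO_ES = {"the": "el", "cat": "gato", "sleeps": "duerme"}


def toy_reverse_translate(sentence: str) -> str:
    words = sentence.lower().split()
    # Unknown words are passed through unchanged.
    return " ".join(TOY_EN_TO_ES.get(word, word) for word in words)


monolingual_english = ["The cat sleeps"]
synthetic_pairs = back_translate(monolingual_english, toy_reverse_translate)
print(synthetic_pairs)
```

In practice the synthetic pairs would be mixed with whatever genuine parallel data exists and used to train the forward (source-to-target) model, which is what makes the approach attractive for languages with little translated text.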
Members of Facebook AI Research (FAIR) and the company's Facebook Applied Machine Learning division are actively working to apply unsupervised translation to language pairs like Urdu and English, FAIR Paris lab director Antoine Bordes told VentureBeat in a phone interview.
Methods like these could allow Facebook to recognize language pairs with only a small number of known translations and, Bordes believes, could even let Facebook decipher languages from another planet.
“We could go now on a planet where people speak a language that nobody else speaks — OK, the aliens — and you can actually go and try to have a decent translation of what is said there,” he said.
Also announced today: Facebook released a paper explaining how it made Rosetta, a computer vision system built to extract text from images on Facebook and Instagram in order to moderate content. Rosetta currently processes one billion images daily.