Microsoft taps AI techniques to bring Translator to 100 languages

Today, Microsoft announced that Microsoft Translator, its AI-powered text translation service, now supports more than 100 different languages and dialects. With the addition of 12 new languages including Georgian, Macedonian, Tibetan, and Uyghur, Microsoft claims that Translator can now make text and information in documents accessible to 5.66 billion people worldwide.

Translator isn't the first service to support more than 100 languages -- Google Translate reached that milestone in February 2016. (Amazon Translate supports only 71.) But Microsoft says that the new languages are underpinned by unique advances in AI and will be available in the Translator apps, Office, and Translator for Bing, as well as Azure Cognitive Services Translator and Azure Cognitive Services Speech.

"One hundred languages is a good milestone for us to achieve our ambition for everyone to be able to communicate regardless of the language they speak," Microsoft Azure AI chief technology officer Xuedong Huang said in a statement. "We can leverage [commonalities between languages] and use that ... to improve whole language famil[ies]."

Z-code

As of today, Translator supports the following new languages, which Microsoft says are natively spoken by 84.6 million people collectively:

Bashkir
Dhivehi
Georgian
Kyrgyz
Macedonian
Mongolian (Cyrillic)
Mongolian (Traditional)
Tatar
Tibetan
Turkmen
Uyghur
Uzbek (Latin)

Powering Translator's upgrades is Z-code, a part of Microsoft's larger XYZ-code initiative to combine AI models for text, vision, audio, and language in order to create AI systems that can speak, see, hear, and understand. The Z-code team comprises scientists and engineers from Azure AI and the Project Turing research group who focus on building multilingual, large-scale language models that support various production teams.

Z-code provides the framework, architecture, and models for text-based, multilingual AI language translation for whole families of languages. By sharing linguistic elements across similar languages and using transfer learning, which applies knowledge from one task to another related task, Microsoft claims it has dramatically improved the quality and reduced the cost of its machine translation capabilities.

With Z-code, Microsoft is using transfer learning to move beyond the most common languages and improve translation accuracy for "low-resource" languages, meaning languages with under 1 million sentences of training data. (Like all models, Microsoft's learn from examples in large datasets sourced from a mixture of public and private archives.) Approximately 1,500 known languages meet this criterion, which is why Microsoft developed a multilingual translation training process that marries language families and language models.
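To make the "low-resource" cutoff concrete, here is a minimal Python sketch of the definition above. The language names and sentence counts are hypothetical placeholders, not real corpus sizes, and the function name is an assumption made for illustration; it is not part of any Microsoft tooling.

```python
# Threshold taken from the definition in the article: under 1 million
# sentences of training data counts as "low-resource".
LOW_RESOURCE_THRESHOLD = 1_000_000


def is_low_resource(sentence_count, threshold=LOW_RESOURCE_THRESHOLD):
    """Return True if a language has fewer training sentences than the threshold."""
    return sentence_count < threshold


# Hypothetical corpus sizes, invented purely for illustration.
corpus_sizes = {"language_a": 250_000, "language_b": 40_000_000}

low_resource = sorted(
    name for name, count in corpus_sizes.items() if is_low_resource(count)
)
print(low_resource)
```

A language sitting exactly at 1 million sentences would not count as low-resource under this reading, since the definition says "under."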

Techniques like neural machine translation, rewriting-based paradigms, and on-device processing have led to quantifiable leaps in machine translation accuracy. But until recently, even the state-of-the-art algorithms lagged behind human performance. Efforts beyond Microsoft illustrate the magnitude of the problem -- the Masakhane project, which aims to render thousands of languages on the African continent automatically translatable, has yet to move beyond the data-gathering and transcription phase. Additionally, Common Voice, Mozilla's effort to build an open source collection of transcribed speech data, has vetted only dozens of languages since its 2017 launch.

Z-code language models are trained multilingually across many languages, and that knowledge is transferred between languages. Another round of training transfers knowledge between language tasks. For example, the models' translation skills ("machine translation") are used to help improve their ability to understand natural language ("natural language understanding").

In August, Microsoft said that a Z-code model with 10 billion parameters could achieve state-of-the-art results on machine translation and cross-lingual summarization tasks. In machine learning, parameters are internal configuration variables that a model uses when making predictions, and their values largely, though not entirely, determine the model's skill on a problem.
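For a sense of what a parameter count measures, the following Python sketch tallies the weights and biases in a small fully connected network. The layer sizes are arbitrary and chosen only for illustration; they say nothing about Z-code's actual architecture, which the article does not describe.

```python
def count_parameters(layer_sizes):
    """Count the weights and biases in a fully connected network.

    layer_sizes lists the width of each layer, input first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per input-output connection
        total += n_out         # one bias per output unit
    return total


# Hypothetical toy network: 512 inputs, 2,048 hidden units, 512 outputs.
toy_layers = [512, 2048, 512]
print(count_parameters(toy_layers))  # prints 2099712
```

Even this toy network has about 2.1 million parameters. A 10-billion-parameter model is several thousand times larger.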

Microsoft is also working to train a 200-billion-parameter version of the aforementioned benchmark-beating model. For reference, OpenAI's GPT-3, one of the world's largest language models, has 175 billion parameters.

Market momentum

Chief rival Google is also using emerging AI techniques to improve language-translation quality across its service. Not to be outdone, Facebook recently revealed a model that uses a combination of word-for-word translations and back-translations to outperform other systems for more than 100 language pairings. And in academia, MIT CSAIL researchers have presented an unsupervised model -- i.e., a model that learns from data that hasn't been explicitly labeled or categorized -- that can translate between texts in two languages without direct translational data between the two.

Of course, no machine translation system is perfect. Some researchers claim that AI-translated text is less "lexically" rich than human translations, and there's ample evidence that language models amplify biases present in the datasets they're trained on. AI researchers from MIT, Intel, and the Canadian initiative CIFAR have found high levels of bias in language models including BERT, XLNet, OpenAI's GPT-2, and RoBERTa. Beyond this, Google identified (and claims to have addressed) gender bias in the translation models underpinning Google Translate, particularly with regard to resource-poor languages like Turkish, Finnish, Persian, and Hungarian.

Microsoft, for its part, points to Translator's traction as evidence of the platform's sophistication. In a blog post, the company notes that thousands of organizations around the world use Translator for their translation needs, including Volkswagen.

"The Volkswagen Group is using the machine translation technology to serve customers in more than 60 languages -- translating more than 1 billion words each year," Microsoft's John Roach writes. "The reduced data requirements ... enable the Translator team to build models for languages with limited resources or that are endangered due to dwindling populations of native speakers."
