Google is making an effort to reduce perceived gender bias in Google Translate, it announced today. Starting this week, users who translate words and phrases in supported languages will get both feminine and masculine translations; “o bir doktor” in Turkish, for example, now yields “she is a doctor” and “he is a doctor” in English.
Currently, translations from English into French, Italian, Portuguese, or Spanish are supported. Translations of phrases and sentences from Turkish to English, as in the example above, will also show both gender equivalents. (In the Turkish language, the pronoun “o” covers every kind of singular third person.)
James Kuczmarski, product manager at Google Translate, said work has already begun on addressing non-binary gender translations.
“Over the course of this year, there’s been an effort across Google to promote fairness and reduce bias in machine learning,” he wrote in a blog post. “In the future, we plan to extend gender-specific translations to more languages, launch on other Translate surfaces like our iOS and Android apps, and address gender bias in features like query auto-complete.”
Today’s announcement comes shortly after Google blocked Smart Compose, a Gmail feature that automatically suggests sentences for users as they type, from suggesting gender-based pronouns. And it follows on the heels of social media posts purporting to show automated translation apps’ gender bias.
Users noted that words like “engineer” and “strong” in some foreign languages were more likely to be associated with corresponding male words in English: “o bir muhendis” in Google Translate became “he is an engineer,” while “o bir hemsire” was translated to “she is a nurse.” That’s far from the only example. Apple and Google’s predictive keyboards propose the gendered “policeman” to complete “police” and “salesman” for “sales.” And when Microsoft’s Bing translates “the table is soft” into German, it comes back with the feminine “die Tabelle,” which refers to a table of figures.
It’s an AI training problem, Kuczmarski explained. Word embedding, a common technique that maps each word to a numeric vector used to calculate the probability of a given word’s language pair, unavoidably picks up, and at worst amplifies, biases implicit in source text and dialogue. A 2016 study found that word embeddings trained on Google News articles tended to exhibit female and male gender stereotypes.
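To make the idea concrete, here is a minimal Python sketch of one way such bias can be measured: projecting a word's vector onto a "he minus she" direction. The three-dimensional vectors and the `gender_lean` helper are invented for illustration. They are assumptions, not real embeddings and not Google's method.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real embeddings have hundreds of dimensions and are learned from text.
TOY_EMBEDDINGS = {
    "he": [1.0, 0.1, 0.0],
    "she": [-1.0, 0.1, 0.0],
    "engineer": [0.4, 0.8, 0.2],
    "nurse": [-0.5, 0.7, 0.3],
    "table": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gender_lean(word, embeddings=TOY_EMBEDDINGS):
    """Hypothetical helper: similarity of a word to the 'he' - 'she' direction.

    Positive values lean toward 'he', negative toward 'she',
    and values near zero indicate no lean along this direction.
    """
    direction = [a - b for a, b in zip(embeddings["he"], embeddings["she"])]
    return cosine(embeddings[word], direction)


if __name__ == "__main__":
    for word in ("engineer", "nurse", "table"):
        print(f"{word}: {gender_lean(word):+.2f}")
```

In this toy data, "engineer" scores positive and "nurse" negative because the vectors were hand-built that way. In a trained model, a similar lean would come from patterns in the training text rather than from anyone's deliberate choice.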
“Google Translate learns from hundreds of millions of already-translated examples from the web,” Kuczmarski wrote. “Historically, it has provided only one translation for a query, even if the translation could have either a feminine or masculine form. So when the model produced one translation, it inadvertently replicated gender biases that already existed. For example: it would skew masculine for words like ‘strong’ or ‘doctor,’ and feminine for other words, like ‘nurse’ or ‘beautiful.’”
A gender-neutral approach to language translation is a part of Google’s larger effort to mitigate prejudice in AI systems. The Mountain View company uses tests developed by its AI ethics team to uncover bias, and has banned expletives, racial slurs, and mentions of business rivals and tragic events from its predictive technologies.