We can reduce gender bias in natural-language AI, but it will take a lot more work

Thanks to breakthroughs in natural language processing (NLP), machines can generate increasingly sophisticated representations of words. Every year, research groups release more powerful language models -- like the recently announced GPT-3, M2M-100, and mT5 -- that can write complex essays or translate text into multiple languages with better accuracy than previous iterations. However, since machine learning algorithms are what they eat (in other words, they function based on the training data they ingest), they inevitably pick up the human biases that exist in the language data itself.

This summer, GPT-3 researchers discovered inherent biases within the model’s results related to gender, race, and religion. Gender biases included the relationship between gender and occupation, as well as gendered descriptive words. For example, the algorithm predicted that 83% of 388 occupations were more likely to be associated with a male identifier. Descriptive words related to appearance, such as “beautiful” or “gorgeous,” were more likely to be associated with women.

When gender (and many other) biases are so rampant in our language and in the language data we have accumulated over time, how do we keep machines from perpetuating them?

What is bias in AI?

Generally speaking, bias is a prejudice for or against one person or group, typically in a way considered to be unfair. Bias in machine learning is defined as an error from incorrect assumptions in the algorithm or, more commonly, systemic prediction errors that arise from the distribution properties of the data used to train the ML model. In other words, the model consistently makes the same mistakes related to certain groups of individuals.
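
As a rough illustration of "consistently makes the same mistakes related to certain groups," the sketch below compares error rates per group. The function name, the record format, and the toy predictions are all invented for this example; they do not come from any real model or dataset.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Return the share of wrong predictions for each group.

    `records` is an iterable of (group, predicted, actual) triples.
    This format is an assumption made for the illustration.
    """
    errors = defaultdict(int)
    totals = defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        if predicted != actual:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Made-up outputs of a hypothetical occupation classifier.
toy_records = [
    ("women", "nurse", "doctor"),
    ("women", "doctor", "doctor"),
    ("women", "nurse", "engineer"),
    ("women", "nurse", "nurse"),
    ("men", "doctor", "doctor"),
    ("men", "engineer", "engineer"),
    ("men", "doctor", "nurse"),
    ("men", "engineer", "engineer"),
]

rates = error_rate_by_group(toy_records)
print(rates)
```

A large gap between the groups' error rates is one simple signal that the errors are systematic rather than random.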

In NLP, both kinds of bias are relevant. The pre-existing biases in our society affect the way we speak and write, and written words are ultimately used to train machine learning systems. When we train our models on biased data, that bias gets incorporated into the models, which preserves and reinforces existing biases.

This happens because machines consume language differently than humans. Simply put, words are represented by lists of numbers called word embeddings that encode information about the word’s meaning, usage, and other properties. Computers “learn” these values for every word after consuming training data of many millions of lines of text, where words are used in their natural contexts.

Since word embeddings are numbers, they can be visualized as coordinates in a space, and the distance between words (more precisely, the angle between them) is a way of measuring how similar they are semantically. These relationships can be used to generate analogies.
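
To make the idea of measuring the angle between words concrete, here is a minimal sketch using cosine similarity and a simple analogy lookup. The three-dimensional vectors are invented for illustration (real embeddings have hundreds of dimensions and are learned from text), and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Invented toy vectors; the values are assumptions, not learned embeddings.
toy_embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.1, 0.1],
}


def analogy(a, b, c, embeddings):
    """Answer 'a is to b as c is to ?' by finding the word closest to b - a + c."""
    target = [
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    candidates = [word for word in embeddings if word not in (a, b, c)]
    return max(
        candidates,
        key=lambda word: cosine_similarity(embeddings[word], target),
    )


print(analogy("man", "king", "woman", toy_embeddings))
```

The same arithmetic that recovers "queen" here is what surfaces biased analogies when occupation words sit closer to one gender than the other in a learned embedding space.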

Some terms, like king and queen, are inherently gendered. Other terms, such as those related to occupation, should not be intrinsically gendered. However, in the GPT-3 research example cited above, the machine guessed that professions demonstrating higher levels of education (such as banker or professor emeritus) were heavily male leaning, while professions such as midwife, nurse, receptionist, and housekeeper were heavily female leaning. Professions described with the qualifier “competent” also leaned heavily male. GPT-3 is not alone here: results like this appear again and again across different machine learning models and algorithms.

These are obviously not the ideal outcomes. Machine learning systems are no better than the data they consume. Most people assume that more data yields better-performing models, and often the easiest way to get more data is to use large, web-crawled datasets. Since the internet and other sources are made up of real human language, the data will naturally exhibit the same biases that humans do. Unfortunately, not enough attention is paid to the content within these web-crawled datasets.

Reducing AI’s gender bias

Not only are some of the analogies generated by machine learning models offensive, they are also inaccurate. If we want machine learning systems to be more accurate and fair, having humans in the loop is one of the best ways to reduce the risk of gender-biased training data. Humans can correct machines’ errors and provide feedback that helps refine algorithms over time. But certainly there are more fundamental steps that machine learning engineers can take to reduce gender bias in NLP systems.

One of the most intuitive methods is to modify the training data. If we know our models learn bias from data, perhaps de-biasing data is the best approach. One such technique is “gender-swapping,” where the training data is augmented so that for every gendered sentence, an additional sentence is created, replacing pronouns and gendered words with those of the opposite gender, and substituting names with entity placeholders. For example, “Mary hugged her brother Tom” would also create “NAME-1 hugged his sister NAME-2.”

This way, the training data becomes gender-balanced, and the model does not learn any gender characteristics associated with names. For example, this approach would prevent the model from producing gendered career analogies, because it would have seen computer programmers in male and female contexts an equal number of times.
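
A minimal sketch of gender-swapping augmentation might look like the following. The word list, the name list, and the function names are small hypothetical stand-ins; a real pipeline would need much larger lexicons, a named-entity recognizer, and handling for ambiguous words (for example, "her" can correspond to either "his" or "him", which this sketch ignores).

```python
import re

# Tiny illustrative swap table (an assumption, far from complete).
SWAPS = {
    "he": "she", "she": "he",
    "his": "her", "her": "his",
    "brother": "sister", "sister": "brother",
    "father": "mother", "mother": "father",
}

# Stand-in for a named-entity recognizer.
KNOWN_NAMES = {"Mary", "Tom"}


def gender_swap(sentence, swaps=SWAPS, names=KNOWN_NAMES):
    """Swap gendered words and replace known names with numbered placeholders."""
    placeholders = {}

    def replace(match):
        token = match.group(0)
        if token in names:
            if token not in placeholders:
                placeholders[token] = f"NAME-{len(placeholders) + 1}"
            return placeholders[token]
        swapped = swaps.get(token.lower())
        if swapped is None:
            return token
        # Preserve the capitalization of the original word.
        return swapped.capitalize() if token[0].isupper() else swapped

    return re.sub(r"[A-Za-z]+", replace, sentence)


def augment(corpus):
    """Keep every sentence and add a swapped copy when swapping changes it."""
    augmented = []
    for sentence in corpus:
        augmented.append(sentence)
        swapped = gender_swap(sentence)
        if swapped != sentence:
            augmented.append(swapped)
    return augmented


print(augment(["Mary hugged her brother Tom", "The sky is blue."]))
```

Sentences with no gendered words or names pass through unchanged, so only the gendered portion of the corpus is doubled.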

It is important to note that this approach is straightforward for English but much more challenging for other languages. For example, in Romance languages, such as French, Portuguese, or Spanish, there is no neutral grammatical gender. Adjectives and other modifiers in these languages express gender as well. As a result, a different approach is required.

Another method specific to machine translation that helps translations be more gender-accurate involves adding metadata to the sentences that stores the gender of the subject. For example, while the sentence “You are very nice” is gender-ambiguous in English, if the parallel Portuguese sentence was “Tu és muito simpática,” a tag could be added to the beginning of the English sentence so the model could learn the correct translation. After training, if someone requests a translation and supplies the desired gender tag, the model should return the correct one and not just the majority gender.
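
The tagging step could be sketched as follows. The tag format (`<F>` and `<M>`), the function name, and the data layout are assumptions made for illustration; the article does not specify them.

```python
# Hypothetical tag vocabulary; the actual tag format is not given in the text.
GENDER_TAGS = {"feminine": "<F>", "masculine": "<M>"}


def tag_source(sentence, gender):
    """Prefix a source sentence with a tag for the gender of its subject."""
    if gender not in GENDER_TAGS:
        raise ValueError(f"unknown gender label: {gender!r}")
    return f"{GENDER_TAGS[gender]} {sentence}"


# (English source, Portuguese target, gender expressed in the target)
parallel_examples = [
    ("You are very nice", "Tu és muito simpática", "feminine"),
    ("You are very nice", "Tu és muito simpático", "masculine"),
]

# Training pairs the translation model would see: tagged source, target.
training_pairs = [
    (tag_source(source, gender), target)
    for source, target, gender in parallel_examples
]

for source, target in training_pairs:
    print(source, "->", target)
```

At translation time, the same `tag_source` call would be applied to the user's input with the requested gender, so the model receives the tag it was trained to respond to.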

If a Hungarian-English system were trained in this way, we could ask it to translate “Ő egy orvos” and receive the translation “She is a doctor,” or “Ő egy nővér” and receive “He is a nurse.” To perform this at scale, an additional model would need to be trained that classifies the gender of each sentence and tags it accordingly, adding a layer of complexity. While these methods may reduce gender bias in NLP models, they are time-consuming to implement. In addition, they require linguistic information that might not be readily available or even possible to get.

Thankfully, this topic is becoming a fast-growing area of research. For example, in 2018 Google announced that Google Translate would return both feminine and masculine translations of single words when translating from English into four languages. Researchers from Bloomberg have recently collaborated on best practices for human annotation of language-based models. Many research organizations, like the Brookings Institution, are focused on ways to reduce consumer harms that come from biased algorithms, most recently with voice assistants and chatbots. Everything from hiring practices to loan applications to the criminal justice system can be affected by biased algorithms.

Despite these advances, there are more systemic problems that come from a lack of diversity in AI and tech as a whole. Overall, equal gender representation would increase the tech industry’s awareness of bias issues. If AI systems were built by everyone, they would be less biased and more inclusive of everyone as well.

Alon Lavie is VP of Language Technologies at Unbabel. Christine Maroti is AI Research Engineer at Unbabel.
