Facebook's AI detects gender bias in text

In a technical paper published this week, Facebook researchers describe a framework that decomposes gender bias in text along several dimensions, which they used to annotate data sets and evaluate gender bias classifiers. If the experimental results are any indication, the team's work might shed light on offensive language in terms of genderedness, and perhaps even control for gender bias in natural language processing (NLP) models.

All data sets, annotations, and classifiers will be released publicly, according to the researchers.

It's an open secret that AI systems and the corpora on which they're trained often reflect gender stereotypes and other biases; indeed, Google recently introduced gender-specific translations in Google Translate chiefly to address gender bias. Scientists have proposed a range of approaches to mitigate and measure this, most recently with a leaderboard, challenge, and set of metrics dubbed StereoSet. But few -- if any -- have come into wide use.

The Facebook team says its work considers how humans collaboratively and socially construct language and gender identities. That is, it accounts for (1) bias from the gender of the person being spoken about, (2) bias from the gender of the person being spoken to, and (3) bias from the gender of the speaker. In this way, the framework attempts to capture the fact that adjectives, verbs, and nouns describing women differ from those describing men; the way an addressee's gender affects how others converse with them; and the importance of gender to a person's identity.
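As a rough illustration of the three dimensions, a single annotated utterance could be represented as follows. This is a minimal sketch, not the researchers' data format: the class name `GenderAnnotation`, its field names, and the label strings are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed label set for illustration; the paper's exact label names may differ.
LABELS = {"masculine", "feminine", "neutral"}


@dataclass(frozen=True)
class GenderAnnotation:
    """One utterance labeled along the three dimensions (hypothetical schema)."""

    text: str
    about: Optional[str] = None  # gender of the person being spoken about
    to: Optional[str] = None     # gender of the person being spoken to
    as_: Optional[str] = None    # gender of the speaker

    def __post_init__(self):
        # Reject labels outside the assumed label set; None means "not annotated".
        for name in ("about", "to", "as_"):
            value = getattr(self, name)
            if value is not None and value not in LABELS:
                raise ValueError(f"unknown label for {name!r}: {value!r}")


# A sentence that makes clear the speaker is talking about a woman.
example = GenderAnnotation(
    text="Hey, I went for a coffee with my friend and her dog.",
    about="feminine",
)
```

Keeping the three dimensions as separate fields means a sentence can be labeled on one dimension and left unannotated on the others.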

Leveraging this framework and Facebook's ParlAI, an open source Python toolset for training and testing NLP models, the researchers developed classifiers that decompose bias over sentences into the dimensions -- bias from the gender of the person being discussed, etc. -- while including gender information that falls outside of the male-female binary. The team trained the classifiers on a range of text extracted from Wikipedia, Funpedia (a less formal version of Wikipedia), Yelp reviews, OpenSubtitles (dialogue from movies), LIGHT (chit-chat fantasy dialogue), and other sources, all of which were selected because they contained information about author and addressee gender that could inform the model's decision-making.

The researchers also created a specialized evaluation corpus -- MDGender -- by collecting conversations between two volunteer speakers, each of whom was provided with a persona description containing gender information and tasked with adopting that persona and having a conversation about sections of a biography from Wikipedia. Annotators were asked to rewrite each turn in the dialogue to make it clear they were speaking about a man or a woman, speaking as a man or a woman, and speaking to a man or a woman. For example, a response to “How are you today? I just got off work” might have been rewritten as “Hey, I went for a coffee with my friend and her dog.”

In experiments, the team evaluated the gender bias classifiers against MDGender, measuring the percentage accuracy for masculine, feminine, and neutral classes. They found that the best-performing model -- a so-called multitask model -- correctly decomposed sentences 77% of the time across all data sets and 81.82% of the time on Wikipedia only.
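Per-class percentage accuracy of the kind described here can be computed as in the sketch below. The function name `per_class_accuracy` and the toy labels are illustrative assumptions; the figures it produces have nothing to do with the paper's reported results.

```python
from collections import defaultdict


def per_class_accuracy(gold, predicted):
    """Return, for each gold label, the percentage of its examples predicted correctly."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    totals = defaultdict(int)   # number of examples per gold label
    correct = defaultdict(int)  # number of correct predictions per gold label
    for g, p in zip(gold, predicted):
        totals[g] += 1
        if g == p:
            correct[g] += 1

    return {label: 100.0 * correct[label] / totals[label] for label in totals}


# Toy data, invented purely for illustration.
gold = ["masculine", "feminine", "neutral", "feminine"]
predicted = ["masculine", "feminine", "masculine", "neutral"]
scores = per_class_accuracy(gold, predicted)
```

Reporting accuracy per class rather than as a single overall number makes it visible when a classifier does well on one class at the expense of another.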

In another set of tests, the researchers applied the best-performing classifier to control the genderedness of generated text, detect biased text in Wikipedia, and explore the interplay between offensive content and genderedness.

They report that training the classifier on a data set containing 250,000 text snippets from Reddit enabled it to generate gendered sentences on command, for instance "Awwww, that sounds wonderful" and "You can do it bro!" Separately, the model scored paragraphs from a set of biographies to identify which were masculine in the "about" dimension. It found that 74% skewed toward masculine, but the classifier was more confident in the femininity of pages about women, suggesting that women's biographies contained more gendered text. Lastly, after training and applying the classifier to a popular corpus of explicitly gendered words, they found that 25% of masculine words fell into "offensive" categories like "sexual connotation."

"In an ideal world, we would expect little difference between texts describing men, women, and people with other gender identities, aside from the use of explicitly gendered words, like pronouns or names. A machine learning model, then, would be unable to pick up on statistical differences among gender labels (i.e., gender bias), because such differences would not exist. Unfortunately, we know this is not the case," wrote the coauthors. "We provide a finer-grained framework for this purpose, analyze the presence of gender bias in models and data, and empower others by releasing tools that can be employed to address these issues for numerous text-based use-cases."
