Salesforce researchers claim new method mitigates AI models' gender bias

Researchers at Salesforce and the University of Virginia have proposed a new way to mitigate gender bias in word embeddings, the word representations used to train AI models to summarize, translate languages, and perform other prediction tasks. The team says correcting for certain regularities -- like word frequency in large data sets -- allows their method to "purify" the embeddings prior to inference, removing potentially gendered words.

Word embeddings capture semantic and syntactic meanings of words and relationships with other words, which is why they're commonly employed in natural language processing. But they've been criticized for inheriting gender bias, which ties the embedding of gender-neutral words to a certain gender. For example, while "brilliant" and "genius" are gender-neutral by definition, their embeddings are associated with "he," while "homemaker" and "sewing" are more closely associated with "she."
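One common way to make such an association concrete is to compare a word's cosine similarity to "he" with its similarity to "she." The sketch below is illustrative only: the three-dimensional vectors are invented for this example (real embeddings have hundreds of dimensions), and `gender_association` is a hypothetical helper, not a measure taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings, made up for illustration.
toy = {
    "he": [1.0, 0.2, 0.0],
    "she": [-1.0, 0.2, 0.0],
    "genius": [0.4, 0.9, 0.1],
    "homemaker": [-0.5, 0.8, 0.2],
}

def gender_association(word):
    """Positive means closer to "he"; negative means closer to "she"."""
    return cosine(toy[word], toy["he"]) - cosine(toy[word], toy["she"])

for word in ("genius", "homemaker"):
    print(word, round(gender_association(word), 3))
```

With these invented vectors, "genius" scores positive and "homemaker" negative, mirroring the pattern described above.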

Previous work has aimed to reduce gender bias by subtracting the component associated with gender from embeddings through a post-processing step. But while this alleviates gender bias in some settings, its effectiveness is limited because the gender bias can still be recovered post-debiasing.
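As a rough sketch of what "subtracting the component associated with gender" can look like, the snippet below takes the gender direction to be the difference between the vectors for "he" and "she" and removes a word's projection onto it. This is a simplification under stated assumptions: published methods typically estimate the direction from many word pairs, and the vectors and the `project_away` helper here are hypothetical.

```python
def project_away(vec, direction):
    """Return vec with its component along direction removed."""
    scale = sum(v * d for v, d in zip(vec, direction)) / sum(d * d for d in direction)
    return [v - scale * d for v, d in zip(vec, direction)]

# Hypothetical toy embeddings, made up for illustration.
he = [1.0, 0.2, 0.0]
she = [-1.0, 0.2, 0.0]
genius = [0.4, 0.9, 0.1]

# Assumed gender direction: the difference between one definitional pair.
gender_direction = [a - b for a, b in zip(he, she)]

debiased = project_away(genius, gender_direction)

# After the projection, the word has no component left along the direction.
residual = sum(v * d for v, d in zip(debiased, gender_direction))
print(debiased, residual)
```

The remaining coordinates are left unchanged, which is why this kind of post-processing can leave other gender-correlated signals in place.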

Salesforce's proposed alternative -- Double-Hard Debias -- transforms the embedding space into an ostensibly genderless one. That is, it first maps word embeddings into a "subspace" in which it can identify the dimension that encodes word-frequency information, which interferes with isolating the gender information. It then "projects away" the component along this dimension to obtain revised embeddings before applying a further debiasing step.
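The two-stage idea can be sketched as follows, with heavy caveats. The snippet assumes the frequency-related direction is already known and passes it in as `frequency_direction`; how the method actually finds that dimension is not shown here. The function names, the toy vectors, and the use of a single "he"/"she" pair for the gender direction are all assumptions made for illustration, not the authors' implementation.

```python
def project_away(vec, direction):
    """Return vec with its component along direction removed."""
    scale = sum(v * d for v, d in zip(vec, direction)) / sum(d * d for d in direction)
    return [v - scale * d for v, d in zip(vec, direction)]

def double_debias(embeddings, frequency_direction, targets):
    """Stage 1: drop the assumed frequency component from every vector.
    Stage 2: recompute a gender direction on the revised vectors and
    project it away from the target words."""
    revised = {w: project_away(v, frequency_direction) for w, v in embeddings.items()}
    gender_direction = [a - b for a, b in zip(revised["he"], revised["she"])]
    return {w: project_away(revised[w], gender_direction) for w in targets}

# Hypothetical toy embeddings; the third coordinate plays the role of frequency.
embeddings = {
    "he": [1.0, 0.2, 0.5],
    "she": [-1.0, 0.2, 0.3],
    "genius": [0.4, 0.9, 0.7],
}
frequency_direction = [0.0, 0.0, 1.0]  # assumed known for this sketch

result = double_debias(embeddings, frequency_direction, ["genius"])
print(result["genius"])
```

Recomputing the gender direction after the first stage is the point of the ordering: the direction is estimated from vectors that no longer carry the assumed frequency signal.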

To evaluate their approach, the researchers tested it against the WinoBias data set, which consists of pro-gender-stereotype and anti-gender-stereotype sentences. (For example, "The physician hired the secretary because he was overwhelmed with clients" versus "The physician hired the secretary because she was overwhelmed with clients.") The gap in a system's performance between the two sentence groups yields a "gender bias" score.
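A gap-style score of this kind can be illustrated with a small calculation. The sketch below uses plain accuracy as a stand-in metric, which is an assumption (the benchmark's own scoring may differ), and the predictions are invented; `accuracy` and `bias_gap` are hypothetical helper names.

```python
def accuracy(predictions, gold):
    """Percentage of predictions that match the gold labels."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)

def bias_gap(pro_preds, pro_gold, anti_preds, anti_gold):
    """Absolute performance difference between the two sentence groups."""
    return abs(accuracy(pro_preds, pro_gold) - accuracy(anti_preds, anti_gold))

# Invented example: which noun each pronoun is resolved to.
pro_gold = ["physician", "physician", "secretary", "secretary"]
pro_preds = ["physician", "physician", "secretary", "secretary"]   # 4 of 4 correct
anti_gold = ["physician", "physician", "secretary", "secretary"]
anti_preds = ["physician", "secretary", "secretary", "secretary"]  # 3 of 4 correct

print(bias_gap(pro_preds, pro_gold, anti_preds, anti_gold))  # 25.0
```

A system that handled both groups equally well would score 0 on this measure, regardless of how accurate it is overall.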

The researchers report that Double-Hard Debias reduced the bias score of embeddings obtained using the GloVe algorithm from 15 (on two types of sentences) to 7.7 while preserving the semantic information. They also claim that in a visualization (a t-SNE projection) designed to place similar embeddings close together and dissimilar ones far apart, Double-Hard Debias produced a more homogeneous mix of embeddings than other methods did.

It's worth noting that some experts believe bias can't be fully eliminated from word embeddings. In a recent meta-analysis from the Technical University of Munich, contributors claim there's "no such thing" as naturally occurring neutral text because the semantic content of words is always bound up with the sociopolitical context of a society.

Nonetheless, the Salesforce and University of Virginia team believe their technique measurably reduces the gender bias present in embeddings.

"We found that simple changes in word frequency statistics can have an undesirable impact on the debiasing methods used to remove gender bias from word embeddings," wrote the coauthors of the Double-Hard Debias paper. "[Our method] mitigates the negative effects that word frequency features can have on debiasing algorithms. We believe it is important to deliver fair and useful word embeddings, and we hope that this work inspires further research along this direction."
