Researchers expose biases in datasets used to train AI models

Artificial intelligence (AI) has a bias problem. Word embedding, a common technique that represents words as vectors, unavoidably picks up -- and at worst amplifies -- prejudices implicit in source text and dialogue. A 2016 study found, for instance, that word embeddings trained on Google News articles tended to exhibit female and male gender stereotypes.

Fortunately, researchers are making headway in addressing the problem -- or at least exposing its severity. In a paper published on the preprint server Arxiv.org ("What are the biases in my word embedding?"), scientists at Microsoft Research, Carnegie Mellon, and the University of Maryland describe an algorithm that can expose "offensive associations" related to sensitive issues like race and gender in publicly available embeddings, including supposedly "debiased" embeddings.

Their work builds on a University of California study that details a training solution capable of "preserv[ing] gender information" in word vectors while "compelling other dimensions to be free of gender influence."

"We consider the problem of Unsupervised Bias Enumeration (UBE), discovering biases automatically from an unlabeled data representation," the researchers wrote. "There are multiple reasons why one might want such an algorithm. First, social scientists can use it as a tool to study human bias ... Second, identifying bias is a natural step in 'debiasing' representations. Finally, it can help in avoiding systems that perpetuate these biases: problematic biases can raise red flags for engineers, while little or no bias can be a useful green light indicating that a representation is usable."

The team's model takes as input word embeddings and lists of target tokens, such as workplace versus family-themed words, and uses vector similarity across pairs of tokens to measure the strength of associations. Unsupervised -- i.e., without requiring sensitive groups, such as gender or race, to be prespecified -- it outputs "statistically significant" tests for racial, gender, religious, age, and other biases.
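To make the idea of measuring association strength with vector similarity concrete, here is a minimal sketch in Python. It is not the researchers' algorithm: the two-dimensional vectors, the placeholder tokens name_1 and name_2, and the association function are all hypothetical, chosen only to show how cosine similarity can compare one token against two themed word lists.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(word_vec, group_a, group_b):
    """Mean similarity to group_a minus mean similarity to group_b.

    A positive score means the word sits closer to group_a;
    a negative score means it sits closer to group_b.
    """
    mean_a = sum(cosine(word_vec, a) for a in group_a) / len(group_a)
    mean_b = sum(cosine(word_vec, b) for b in group_b) / len(group_b)
    return mean_a - mean_b

# Hypothetical 2-D toy vectors (assumption): real embeddings have
# hundreds of dimensions and are learned from text.
toy = {
    "office":   [0.9, 0.1],
    "salary":   [0.8, 0.2],
    "home":     [0.1, 0.9],
    "children": [0.2, 0.8],
    "name_1":   [0.7, 0.3],
    "name_2":   [0.3, 0.7],
}

workplace = [toy["office"], toy["salary"]]
family = [toy["home"], toy["children"]]

score_1 = association(toy["name_1"], workplace, family)
score_2 = association(toy["name_2"], workplace, family)

print(f"name_1 workplace-vs-family score: {score_1:+.3f}")
print(f"name_2 workplace-vs-family score: {score_2:+.3f}")
```

With these made-up values, name_1 scores positive (closer to the workplace words) and name_2 scores negative (closer to the family words). A real test would also need a significance check across many names and words, which this sketch leaves out.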

This confers a number of advantages over manual test design, the team says.

"It is not feasible to manually author all possible tests of interest. Domain experts normally create such tests, and it is unreasonable to expect them to cover all possible groups, especially if they do not know which groups are represented in their data ... [And] if a word embedding reveals no biases, this is evidence for lack of bias."

The model leverages two properties of word embeddings to produce the aforementioned tests, according to the team: "parallel" and "cluster" properties. The parallel property takes advantage of the fact that differences between similar token pairs, such as Mary--John and Queen--King, are often nearly parallel; differences between topic words that run parallel to differences between names may represent biases. The cluster property, meanwhile, refers to the fact that normalized vectors of names and words cluster into semantically meaningful groups -- for names, social structures such as gender, religion, and others, and for words, topics such as food, education, occupations, and sports.
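The parallel property can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the three-dimensional vectors are invented, and comparing difference vectors by cosine similarity is simply one plain way to check whether two directions are nearly parallel, not a description of the team's implementation.

```python
import math

def subtract(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity; values near 1.0 mean nearly parallel."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-D toy vectors (assumption), not taken from any
# real embedding.
toy = {
    "mary":       [1.0, 0.2, 0.5],
    "john":       [0.0, 0.2, 0.5],
    "queen":      [1.0, 0.9, 0.1],
    "king":       [0.1, 0.9, 0.1],
    "hostess":    [0.9, 0.4, 0.8],
    "cornerback": [0.1, 0.5, 0.7],
}

name_diff = subtract(toy["mary"], toy["john"])
royal_diff = subtract(toy["queen"], toy["king"])
topic_diff = subtract(toy["hostess"], toy["cornerback"])

# Two analogous pairs should differ along almost the same direction.
pair_alignment = cosine(name_diff, royal_diff)

# A topic-word difference aligned with a name difference is the kind
# of signal that may indicate a bias.
topic_alignment = cosine(name_diff, topic_diff)

print(f"Mary-John vs. Queen-King alignment: {pair_alignment:.3f}")
print(f"Mary-John vs. hostess-cornerback alignment: {topic_alignment:.3f}")
```

In this toy setup both alignments come out close to 1, because the vectors were constructed that way. In a real embedding, the strength of such alignment would have to be measured rather than assumed.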

To test the system, the researchers sourced sets of first names from a Social Security Administration (SSA) database and words from three publicly available word embeddings, taking care to remove first names whose embeddings reflect other uses, such as a month, verb, or U.S. state. They also recruited workers from Amazon's Mechanical Turk to determine whether biases uncovered by the algorithm were consistent with "(problematic) biases held by society at large."

The team's tool discovered that, in some of the word embedding datasets, words like hostess tended to be closer to volleyball than to cornerback, while cab driver was closer to cornerback than to volleyball. The human evaluators agreed -- in one case, they found 38 percent of race, age, and gender associations to be offensive.

"Unlike humans, where implicit tests are necessary to elicit socially unacceptable biases in a straightforward fashion, word embeddings can be directly probed to output hundreds of biases of varying natures, including numerous offensive and socially unacceptable biases," the team wrote. "The racist and sexist associations exposed in publicly available word embeddings raise questions about their widespread use."
