Microsoft researchers say NLP bias studies must consider role of social hierarchies like racism

As the recently released GPT-3 and several recent studies demonstrate, racial bias, as well as bias based on gender, occupation, and religion, can be found in popular language models. But a team of AI researchers wants the NLP bias research community to examine more closely the relationships between language, power, and social hierarchies like racism in its work. That's one of three major recommendations a recent study makes for NLP bias researchers.

Published last week, the work, which includes analysis of 146 NLP bias research papers, also concludes that the research field generally lacks clear descriptions of bias and fails to explain how, why, and to whom that bias is harmful. "Although these papers have laid vital groundwork by illustrating some of the ways that NLP systems can be harmful, the majority of them fail to engage critically with what constitutes 'bias' in the first place," the paper reads. "We argue that such work should examine the relationships between language and social hierarchies; we call on researchers and practitioners conducting such work to articulate their conceptualizations of 'bias' in order to enable conversations about what kinds of system behaviors are harmful, in what ways, to whom, and why; and we recommend deeper engagements between technologists and communities affected by NLP systems."

Authors suggest NLP researchers join other disciplines like sociolinguistics, sociology, and social psychology in examining social hierarchies like racism in order to understand how language is used to maintain social hierarchy, reinforce stereotypes, or oppress and marginalize people. They argue that recognizing the role language plays in maintaining social hierarchies like racism is critical to the future of NLP system bias analysis.

Researchers also argue that NLP bias research should be grounded in research that goes beyond machine learning in order to document connections between bias, social hierarchy, and language. "Without this grounding, researchers and practitioners risk measuring or mitigating only what is convenient to measure or mitigate, rather than what is most normatively concerning," the paper reads.

Each recommendation comes with a series of questions designed to spark future research with the recommendations in mind. The authors say the key question NLP bias researchers should ask is "How are social hierarchies, language ideologies, and NLP systems coproduced?" This question, the authors say, is in keeping with Ruha Benjamin's recent insistence that AI researchers consider the historical and social context of their work or risk becoming like IBM researchers who supported the Holocaust during World War II. Taking a historical perspective, the authors document the U.S. history of white people labeling the language of non-white speakers as deficient in order to justify violence and colonialism, and say language is still used today to justify enduring racial hierarchies.

"We recommend that researchers and practitioners similarly ask how existing social hierarchies and language ideologies drive the development and deployment of NLP systems, and how these systems therefore reproduce these hierarchies and ideologies," the paper reads.

The paper also recommends that NLP researchers and practitioners embrace participatory design and engage with communities impacted by algorithmic bias. To demonstrate how this approach applies to NLP bias research, the paper includes a case study of African-American English (AAE), negative perceptions in tech of how black people talk, and how language is used to reinforce anti-black racism.

The analysis focuses on text-based NLP and does not include assessments of bias in speech algorithms. An assessment released earlier this year found that automatic speech recognition systems from companies like Apple, Google, and Microsoft perform better for white speakers and worse for African Americans.

Notable exceptions to trends outlined in the paper include NLP bias surveys or frameworks, which tend to include clear definitions of bias, and papers on stereotyping, which tend to engage with relevant literature outside the NLP field. The paper heavily cites research by Jonathan Rosa and Nelson Flores that approaches language from what the authors describe as a raciolinguistic perspective to counteract white supremacy.

The paper was written by Su Lin Blodgett from the University of Massachusetts, Amherst, and Microsoft Research's Solon Barocas, Hal Daumé III, and Hanna Wallach. In other recent AI ethics work, in March, Wallach and Microsoft's Aether committee worked with machine learning practitioners to create a range of products and created an AI ethics checklist with collaborators from a dozen companies.
