Meta AI’s open-source system attempts to right gender bias in Wikipedia biographies

By this point, it’s become reflexive: When searching for something on Google, Wikipedia is the de facto go-to first page. The website is consistently among the top 10 most-visited websites in the world.

Yet not all changemakers and historical figures are equally represented on the dominant web encyclopedia. Just 20% of Wikipedia biographies are about women. That percentage drops further for women from intersectional groups: those in male-dominated fields such as the sciences, for example, or from historically underrepresented ethnic backgrounds.

This is indicative of the fact that “there’s a lot of societal bias on the internet in general,” said Meta AI researcher Angela Fan, who set out to explore this imbalance for her Ph.D. project as a computer science student at the Université de Lorraine, CNRS, in France. “AI models don’t cover everyone in the world equally.”

To address this, Fan teamed with her Ph.D. advisor, author and computer science researcher Claire Gardent, to build an open-source AI system that sources and writes first drafts of Wikipedia-style biographies. Today, they released their findings and methodologies in the paper, “Generating Full-Length Wikipedia Biographies: The Impact of Gender Bias on the Retrieval-Based Generation of Women Biographies.”

Meta AI has also open-sourced the model and corresponding dataset. These cover not only women in general, but also women in science and women located in Asia and Africa. The hope, Fan said, is that the open, reproducible science can complement existing efforts and provide a starting point for researchers to bring more representation to the web.

NLP battles gender bias

As Fan pointed out, the natural language processing (NLP) community has focused on combating gender bias in co-reference resolution, dialogue, detection of abusive language, machine translation and word embeddings. These studies have presented a variety of strategies, including data augmentation, additional data collection efforts, modified generation and fair evaluation.

In the case of Wikipedia, while efforts by such groups as the Wikimedia Foundation, WikiProject Women, and Women in Red – a Wikipedia editor community – have focused on de-biasing existing content, they haven’t addressed systemic challenges around the initial gathering of content and the factors that introduce bias in the first place, Fan said.

Meanwhile, factuality is one of the major problems in text generation and NLP. The process raises three key challenges, Fan said: How to gather relevant evidence, how to structure that information into well-formed text, and how to ensure that the generated text is factually correct.

The study’s model and dataset use AI to generate full biographies, instead of focusing on fixing or adding bits and pieces of content to existing profiles. The model writes a full biography section by section: first an introductory paragraph, then the subject’s early life, then their career. Each section follows three steps: a retrieval module selects relevant information from the web to write the section; a generation module writes the section’s text and predicts which section to write next; and a citation module lists relevant citations.
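The three steps described above can be pictured as a simple loop. The Python sketch below is illustrative only and is not the released model: the function names, the word-overlap retrieval, and the placeholder generator are all assumptions made for the example, and only the section order and the retrieve, generate, cite sequence come from the description above.

```python
from dataclasses import dataclass, field

# Section order as described in the article; everything else is a hypothetical stand-in.
SECTION_ORDER = ["Introduction", "Early life", "Career"]


@dataclass
class Section:
    heading: str
    text: str
    citations: list = field(default_factory=list)


def retrieve(heading, documents, top_k=2):
    """Toy retrieval: rank documents by how many heading words they contain."""
    words = set(heading.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(words & set(doc["text"].lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def generate(heading, evidence):
    """Placeholder generator: joins the evidence and names the next heading, if any."""
    text = " ".join(doc["text"] for doc in evidence)
    position = SECTION_ORDER.index(heading)
    has_next = position + 1 < len(SECTION_ORDER)
    next_heading = SECTION_ORDER[position + 1] if has_next else None
    return text, next_heading


def cite(evidence):
    """Citation step: list the sources of the evidence that was used."""
    return [doc["url"] for doc in evidence]


def write_biography(documents):
    """Write one section at a time until the generator predicts no further section."""
    sections, heading = [], SECTION_ORDER[0]
    while heading is not None:
        evidence = retrieve(heading, documents)
        text, next_heading = generate(heading, evidence)
        sections.append(Section(heading, text, cite(evidence)))
        heading = next_heading
    return sections
```

In the system the article describes, each of these stand-ins would be a learned component; the sketch only shows how the output of one step feeds the next.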

Fan and Gardent’s query consisted of three parts: the name of the person for whom the biography is generated, their occupation(s), and a section heading. They curated a dataset of 1,500 biographies about women, then analyzed the generated text to understand how differences in available web evidence affect generation. They evaluated the factuality, fluency, and quality of the generated texts using both automatic metrics and human evaluation of content and factuality.
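As a rough illustration of that three-part query, the Python sketch below bundles a name, one or more occupations, and a section heading into a single retrieval string. The class name, the separator, and the flattened format are assumptions made for the example; the article does not say how the query is actually encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiographyQuery:
    """Hypothetical container for the three query parts named in the article."""
    name: str
    occupations: tuple
    section_heading: str

    def as_text(self, separator=" | "):
        """Flatten the three parts into one string (the format is an assumption)."""
        parts = [self.name, ", ".join(self.occupations), self.section_heading]
        return separator.join(parts)


# Example subject chosen purely for illustration.
query = BiographyQuery("Ada Lovelace", ("mathematician", "writer"), "Early life")
```

Calling `query.as_text()` yields one string that a retrieval step could search the web with, and changing only the section heading produces a new query for the next section.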

The limitations of AI

As Fan explained, existing AI can write individual sentences fairly well, but stringing them into fully grammatical passages can be difficult, and producing an entire long-form document or article is even more difficult.

“The key challenge is generating long text,” said Gardent, who authored the book, “Deep Learning Approaches to Text Production,” and is affiliated with the Lorraine Research Laboratory in Computer Science, the French National Centre for Scientific Research, and the University of Lorraine. “That sounds very natural. But if you look at it in detail, it’s full of contradictions and redundancies, and factually it can be very wrong.”

This is because there often aren’t enough secondary sources to fact-check against. Alongside that are the challenges of multilingual NLP. Wikipedia supports 309 languages, but English is dominant, followed by French and German. From there, coverage drops off significantly because many languages – such as those spoken in Africa – are low-resource. “It’s important to measure not just the representation of one group, but how that interacts with other groups,” Fan said.

The goal is to have “language agnostic representation,” Gardent agreed. If numerous languages can be processed, they can be used to derive maximum information.

To measure factuality, the study also used natural language entailment as a high-level proxy. If two sentences entail each other in both directions, they are considered semantically equivalent, Fan explained.
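The two-directional check Fan describes can be sketched in a few lines of Python. The `toy_entails` function below is a deliberately crude, hypothetical stand-in (simple word containment) for a trained entailment model, which the article does not name; only the "both directions" rule comes from the description above.

```python
def toy_entails(premise, hypothesis):
    """Crude stand-in for an entailment model: every hypothesis word appears in the premise."""
    return set(hypothesis.lower().split()) <= set(premise.lower().split())


def semantically_equivalent(sentence_a, sentence_b, entails=toy_entails):
    """Two sentences count as equivalent only if each one entails the other."""
    return entails(sentence_a, sentence_b) and entails(sentence_b, sentence_a)
```

With this stand-in, "she studied physics in Paris" entails "she studied physics" but not the reverse, so the pair is not treated as equivalent. A real system would pass a model-backed function as `entails` instead.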

Ultimately, she emphasized that the model and dataset are just one small step in the process of righting long-standing, inherent bias. “Our model addresses just one piece of a multifaceted problem,” Fan said, “so there are additional areas where new techniques should be explored.”
