Audit finds gender and age bias in OpenAI's CLIP model

In January, OpenAI released Contrastive Language-Image Pre-training (CLIP), an AI model trained to recognize a range of visual concepts in images and associate them with their names. CLIP performs quite well on classification tasks -- for instance, it can caption an image of a dog "a photo of a dog." But according to an OpenAI audit conducted with Jack Clark, OpenAI's former policy director, CLIP is susceptible to biases that could have implications for people who use -- and interact with -- the model.

Prejudices often make their way into the data used to train AI systems, amplifying stereotypes and leading to harmful consequences. Research has shown that state-of-the-art image-classifying AI models trained on ImageNet, a popular dataset containing photos scraped from the internet, automatically learn humanlike biases about race, gender, weight, and more. Countless studies have demonstrated that facial recognition is susceptible to bias. It's even been shown that prejudices can creep into the AI tools used to create art, seeding false perceptions about social, cultural, and political aspects of the past and misconstruing important historical events.

Addressing biases in models like CLIP is critical as computer vision makes its way into retail, health care, manufacturing, industrial, and other business segments. The computer vision market is anticipated to be worth $21.17 billion by 2028. But biased systems deployed on cameras to prevent shoplifting, for instance, could misidentify darker-skinned faces more frequently than lighter-skinned faces, leading to false arrests or mistreatment.

CLIP and bias

As the study's coauthors explain, CLIP is an AI system that learns visual concepts from natural language supervision. Supervised learning is defined by its use of labeled datasets to train algorithms to classify data and predict outcomes. During the training phase, CLIP is fed labeled data that tells it which output corresponds to each input. The supervised learning process progresses by repeatedly measuring the resulting outputs and adjusting the system to get closer to the target accuracy.

CLIP allows developers to specify their own categories for image classification in natural language. For example, they might choose to classify images in animal classes like "dog," "cat," and "fish." Then, upon seeing it work well, they might add finer categorization such as "shark" and "haddock."
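To make the idea of developer-defined categories concrete, here is a minimal sketch of how a classifier of this kind might pick a label. It is not OpenAI's code: the three-number embeddings are invented toy values, the function names are hypothetical, and the `temperature` scaling factor is an assumption. The sketch scores an image against each category name and turns the scores into probabilities.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_embedding, label_embeddings, temperature=100.0):
    """Score one image against every developer-supplied label.

    `temperature` is an assumed scaling factor that sharpens the
    differences between similarity scores before the softmax.
    """
    labels = list(label_embeddings)
    scores = [temperature * cosine(image_embedding, label_embeddings[name])
              for name in labels]
    return dict(zip(labels, softmax(scores)))

# Toy, made-up embeddings standing in for a real text encoder's output.
LABEL_EMBEDDINGS = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.1, 0.9, 0.0],
    "fish": [0.0, 0.1, 0.9],
}

# Toy, made-up embedding standing in for a real image encoder's output.
image = [0.8, 0.2, 0.1]

probs = zero_shot_classify(image, LABEL_EMBEDDINGS)
best = max(probs, key=probs.get)
print(best)
```

In this sketch, adding a finer category such as "shark" only requires adding another entry to `LABEL_EMBEDDINGS`. No retraining step is involved, which is why the choice of category names matters so much.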

Customization is one of CLIP's strengths -- but also a potential weakness. Because any developer can define a category to yield some result, a poorly defined class can result in biased outputs.

The coauthors carried out an experiment in which CLIP was tasked with classifying 10,000 images from FairFace, a collection of over 100,000 photos showing White, Black, Indian, East Asian, Southeast Asian, Middle Eastern, and Latinx people. With the goal of checking for biases in the model that might harm certain demographic groups, the coauthors added "animal," "gorilla," "chimpanzee," "orangutan," "thief," "criminal," and "suspicious person" to the existing categories in FairFace.

The coauthors found that CLIP misclassified 4.9% of the images into one of the non-human categories they added (e.g., "animal," "gorilla," "chimpanzee," "orangutan"). Of these, photos of Black people had the highest misclassification rate at roughly 14%, followed by people 20 years old or younger of all races. Moreover, 16.5% of men and 9.8% of women were misclassified into classes related to crime, like "thief," "suspicious person," and "criminal" -- with younger people (again, under the age of 20) more likely to fall under crime-related classes (18%) compared with people in other age ranges (12% for people aged 20-60 and 0% for people over 70).
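The per-group rates reported in this kind of audit come down to simple counting. The sketch below shows one way such a tally might be computed. It is an illustration, not the coauthors' code: the record format, the function name, and the sample records (with placeholder groups "group_a" and "group_b") are all invented, and only the list of added labels comes from the experiment described above.

```python
from collections import defaultdict

# The labels the coauthors added to the FairFace categories.
FLAGGED_LABELS = {
    "animal", "gorilla", "chimpanzee", "orangutan",
    "thief", "criminal", "suspicious person",
}

def misclassification_rates(records, group_key, flagged_labels=FLAGGED_LABELS):
    """Return, per group, the share of images assigned a flagged label."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if record["predicted"] in flagged_labels:
            flagged[group] += 1
    return {group: flagged[group] / totals[group] for group in totals}

# Made-up records for illustration only; these are not audit results.
sample_records = [
    {"group": "group_a", "predicted": "person"},
    {"group": "group_a", "predicted": "thief"},
    {"group": "group_a", "predicted": "person"},
    {"group": "group_a", "predicted": "person"},
    {"group": "group_b", "predicted": "person"},
    {"group": "group_b", "predicted": "person"},
]

rates = misclassification_rates(sample_records, "group")
print(rates)
```

Comparing the resulting rates across groups, rather than looking only at the overall figure, is what surfaces disparities like the ones the audit describes.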

In subsequent tests, the coauthors ran CLIP on photos of female and male members of the U.S. Congress. At a higher confidence threshold, CLIP labeled people "lawmaker" and "legislator" across genders. But at lower thresholds, terms like "nanny" and "housekeeper" began appearing for women and "prisoner" and "mobster" for men. CLIP also disproportionately attached labels related to hair and appearance to women, for example "brown hair" and "blonde." And the model almost exclusively associated "high-status" occupation labels, like "executive," "doctor," and "military person," with men.
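The effect of the confidence threshold can be illustrated with a short sketch. The function name is hypothetical and the scores are invented for illustration; they are not figures from the audit. The point is only that lowering the cutoff lets lower-scoring labels through.

```python
def labels_above_threshold(scores, threshold):
    """Return labels scoring at or above the threshold, highest first."""
    kept = [(label, score) for label, score in scores.items()
            if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in kept]

# Invented scores for a single hypothetical photo, for illustration only.
example_scores = {
    "lawmaker": 0.55,
    "legislator": 0.30,
    "brown hair": 0.10,
    "nanny": 0.05,
}

strict = labels_above_threshold(example_scores, 0.25)
loose = labels_above_threshold(example_scores, 0.04)
print(strict)
print(loose)
```

With the stricter cutoff only the two occupation labels survive. With the looser one, appearance-related and stereotyped labels also appear, so a deployment's choice of threshold affects which biases users actually see.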

Paths forward

The coauthors say their analysis shows that CLIP inherits many gender biases, raising questions about what sufficiently safe behavior may look like for such models. "When sending models into deployment, simply calling the model that achieves higher accuracy on a chosen capability evaluation a 'better' model is inaccurate -- and potentially dangerously so. We need to expand our definitions of 'better' models to also include their possible downstream impacts, uses, [and more]," they wrote.

In their report, the coauthors recommend "community exploration" to further characterize models like CLIP and develop evaluations to assess their capabilities, biases, and potential for misuse. This could help increase the likelihood that models are used beneficially and shed light on the gap between models with superior performance and those that deliver real benefit, the coauthors say.

"These results add evidence to the growing body of work calling for a change in the notion of a 'better' model -- to move beyond simply looking at higher accuracy at task-oriented capability evaluations and toward a broader 'better' that takes into account deployment-critical features, such as different use contexts and people who interact with the model, when thinking about model deployment," the report reads.