Researchers show that computer vision algorithms pretrained on ImageNet exhibit multiple, distressing biases

State-of-the-art image-classifying AI models trained on ImageNet, a popular (but problematic) dataset containing photos scraped from the internet, automatically learn humanlike biases about race, gender, weight, and more. That's according to new research from scientists at Carnegie Mellon University and George Washington University, who developed what they claim is a novel method for quantifying biased associations between representations of social concepts (e.g., race and gender) and attributes in images. When compared with statistical patterns in online image datasets, the findings suggest models automatically learn bias from the way people are stereotypically portrayed on the web.

Companies and researchers regularly use machine learning models trained on massive internet image datasets. To reduce costs, many employ state-of-the-art models pretrained on large corpora to help achieve other goals, a powerful approach called transfer learning. A growing number of computer vision methods are unsupervised, meaning they leverage no labels during training; with fine-tuning, practitioners pair general-purpose representations with labels from specific domains to accomplish tasks like facial recognition, job candidate screening, autonomous driving, and online ad delivery.

Working from the hypothesis that image representations contain biases corresponding to stereotypes of groups in training images, the researchers adapted bias tests designed for contextualized word embeddings to the image domain. (Word embeddings are a language modeling technique in which words from a vocabulary are mapped to vectors of real numbers, enabling models to learn from them.) Their proposed benchmark -- the Image Embedding Association Test (iEAT) -- modifies word embedding tests to compare pooled image-level embeddings (i.e., vectors representing images). The goal is to measure the biases embedded during unsupervised pretraining by systematically comparing how strongly embeddings of social concepts associate with embeddings of attributes.
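To make the idea of an embedding association test concrete, here is a minimal sketch in the style of the word embedding tests the researchers adapted. It is not the authors' code. The function names, the two-dimensional toy vectors, the use of the sample standard deviation, and the randomly sampled permutation test are all assumptions made for illustration; real image embeddings would come from a pretrained model and have far more dimensions.

```python
import math
import random
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """How much closer embedding w sits to attribute set A than to attribute set B."""
    sim_a = mean([cosine(w, a) for a in attrs_a])
    sim_b = mean([cosine(w, b) for b in attrs_b])
    return sim_a - sim_b


def effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Difference in mean association between the two target sets,
    scaled by the standard deviation over all targets (sample stdev assumed)."""
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


def permutation_p_value(targets_x, targets_y, attrs_a, attrs_b,
                        n_samples=1000, seed=0):
    """Share of random re-partitions of the targets whose test statistic
    exceeds the observed one (a sampled, one-sided permutation test)."""
    scores = [association(w, attrs_a, attrs_b) for w in targets_x + targets_y]
    n_x = len(targets_x)
    observed = sum(scores[:n_x]) - sum(scores[n_x:])
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_samples):
        shuffled = scores[:]
        rng.shuffle(shuffled)
        if sum(shuffled[:n_x]) - sum(shuffled[n_x:]) > observed:
            exceed += 1
    return exceed / n_samples


# Hypothetical toy embeddings: two target groups and two attribute sets.
pleasant = [[1.0, 0.0], [0.9, 0.1]]
unpleasant = [[0.0, 1.0], [0.1, 0.9]]
group_x = [[0.8, 0.2], [0.7, 0.3]]
group_y = [[0.2, 0.8], [0.3, 0.7]]

print("effect size:", effect_size(group_x, group_y, pleasant, unpleasant))
print("p-value:", permutation_p_value(group_x, group_y, pleasant, unpleasant))
```

A positive effect size here means the first target group sits closer to the "pleasant" attribute vectors than the second group does; swapping the two groups flips the sign.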

To explore what kinds of biases may get embedded in image representations generated when class labels aren't available, the researchers focused on two computer vision models published this past summer: OpenAI's iGPT and Google's SimCLRv2. Both were pretrained on ImageNet 2012, which contains 1.2 million annotated images from Flickr and other photo-sharing sites spanning 1,000 object classes. And as the researchers explain, both learn to produce embeddings based on implicit patterns in the entire training set of image features.

The researchers compiled a representative set of image stimuli for categories like "age," "gender-science," "religion," "sexuality," "weight," "disability," "skin tone," and "race." For each, they drew representative images from Google Images, the open source CIFAR-100 dataset, and other sources.

In experiments, the researchers say they uncovered evidence iGPT and SimCLRv2 contain "significant" biases likely attributable to ImageNet's data imbalance. Previous research has shown that ImageNet unequally represents race and gender; for instance, the "groom" category shows mostly white people.

iGPT and SimCLRv2 each showed racial prejudices in terms of both valence (i.e., positive and negative emotions) and stereotyping. Embeddings from iGPT and SimCLRv2 exhibited bias for an Arab-Muslim iEAT benchmark measuring whether images of Arab Americans were considered more "pleasant" or "unpleasant" than others. iGPT was biased in a skin tone test comparing perceptions of faces of lighter and darker tones. (Lighter tones were seen by the model to be more "positive.") And both iGPT and SimCLRv2 associated white people with tools while associating Black people with weapons, a bias similar to that shown by Google Cloud Vision, Google's computer vision service, which was found to label images of dark-skinned people holding thermometers "gun."

Beyond racial prejudices, the coauthors report that gender and weight biases plague the pretrained iGPT and SimCLRv2 models. In a gender-career iEAT test estimating the closeness of the category "male" to attributes like "business" and "office" and of "female" to attributes like "children" and "home," embeddings from the models were stereotypical. In the case of iGPT, a gender-science benchmark designed to judge the relations of "male" with "science" attributes like math and engineering and "female" with "liberal arts" attributes like art showed similar bias. And iGPT displayed a bias toward lighter-weight people of all genders and races, associating thin people with pleasantness and overweight people with unpleasantness.

The researchers also report that the next-pixel prediction features of iGPT were biased against women in their tests. To demonstrate, they cropped portraits of women and men including Alexandria Ocasio-Cortez (D-NY) below the neck and used iGPT to generate different complete images. iGPT completions of regular, businesslike indoor and outdoor portraits of clothed women and men often featured large breasts and bathing suits; in six of the ten total portraits tested, at least one of the eight completions showed a bikini or low-cut top.

The results are unfortunately not surprising -- countless studies have shown that facial recognition is susceptible to bias. A paper last fall by University of Colorado, Boulder researchers demonstrated that AI from Amazon, Clarifai, Microsoft, and others maintained accuracy rates above 95% for cisgender men and women but misidentified trans men as women 38% of the time. Independent benchmarks of major vendors' systems by the Gender Shades project and the National Institute of Standards and Technology (NIST) have demonstrated that facial recognition technology exhibits racial and gender bias and have suggested that current facial recognition programs can be wildly inaccurate, misclassifying people upwards of 96% of the time.

However, efforts are underway to make ImageNet more inclusive and less toxic. Last year, the Stanford, Princeton, and University of North Carolina team behind the dataset used crowdsourcing to identify and remove derogatory words and photos. They also assessed the demographic and geographic diversity in ImageNet photos and developed a tool to surface more diverse images in terms of gender, race, and age.

"Though models like these may be useful for quantifying contemporary social biases as they are portrayed in vast quantities of images on the internet, our results suggest the use of unsupervised pretraining on images at scale is likely to propagate harmful biases," the Carnegie Mellon and George Washington University researchers wrote in a paper detailing their work, which hasn't been peer-reviewed. "Given the high computational and carbon cost of model training at scale, transfer learning with pre-trained models is an attractive option for practitioners. But our results indicate that patterns of stereotypical portrayal of social groups do affect unsupervised models, so careful research and analysis is needed before these models make consequential decisions about individuals and society."
