In facial recognition challenge, top-ranking algorithms show bias against Black women

Even the best facial recognition algorithms still struggle to recognize Black faces, particularly those of women. That's according to the results of a facial recognition and analysis competition held during the European Conference on Computer Vision 2020 (ECCV) in September, which show higher false-positive rates (i.e., misidentifications) and lower false-negative rates (i.e., fewer missed matches) for Black women, eyeglass wearers, and young children.

The goal of the ECCV challenge -- the 2020 ChaLearn Looking at People Fair Face Recognition and Analysis Challenge -- was to evaluate bias with respect to gender and skin tone in a set of facial recognition algorithms. Participants were asked to develop, test, and submit algorithmic methods with an eye to reduced bias. The challenge ran from April to July and included a development phase and a testing phase. According to the organizers, it attracted a total of 151 participants who submitted over 1,800 methods.

Notably, the competition was sponsored by AnyVision, a facial recognition vendor that recently raised $43 million from undisclosed investors. The company claims to have piloted its software -- which our own analysis shows exhibits racial bias -- in hundreds of sites around the world, including schools in Putnam County, Oklahoma and Texas City, Texas.

Each team was required to use the same dataset, which consisted of 152,917 photos of 6,139 males and females ranging in age from under 34 to over 65. AnyVision annotators labeled images according to age, skin color, and other attributes, with multiple annotators verifying the labels for accuracy before the dataset was divided into training, validation, and testing subsets.

For an added challenge, the organizers ensured photos in the dataset captured a range of head poses and showed "considerably" more white men than Black women, which they said better reflected conditions in the real world.

Teams were ranked by accuracy and the degree to which their algorithms exhibited recognition bias. When the top 10 methods compared photos of different people, women with dark complexions were most often discriminated against (45.5% of the time), whereas men with light skin tones were least impacted (12.6%). Moreover, many of the methods were stymied by pictures of people wearing glasses. After analyzing the results, the organizers found that young people captured by the dataset were less likely to wear glasses than older people (only 16% under the age of 35 were pictured wearing glasses), potentially contributing to bias.
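To make the per-group comparison concrete, here is a minimal Python sketch of how false-positive and false-negative rates can be broken out by demographic group for a face verification system. It is not the organizers' scoring code, and the challenge's actual ranking formula is not described here. The function names, the 0.5 similarity threshold, the group labels "A" and "B", and the toy scores are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def error_rates_by_group(pairs, threshold=0.5):
    """Compute per-group false-positive and false-negative rates.

    `pairs` is an iterable of (group, similarity, same_person) tuples,
    where `similarity` is a model's score for a pair of photos and
    `same_person` is the ground truth. The threshold is an assumption.
    """
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
    for group, similarity, same_person in pairs:
        predicted_match = similarity >= threshold
        c = counts[group]
        if same_person:
            c["pos"] += 1
            if not predicted_match:
                c["fn"] += 1  # missed match
        else:
            c["neg"] += 1
            if predicted_match:
                c["fp"] += 1  # two different people judged to be the same

    return {
        group: {
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
            "fnr": c["fn"] / c["pos"] if c["pos"] else None,
        }
        for group, c in counts.items()
    }


def fpr_gap(rates):
    """One simple (hypothetical) bias indicator: the spread in FPR across groups."""
    fprs = [r["fpr"] for r in rates.values() if r["fpr"] is not None]
    return max(fprs) - min(fprs)


# Toy data, invented for illustration only.
toy_pairs = [
    ("A", 0.9, True), ("A", 0.4, True), ("A", 0.7, False), ("A", 0.2, False),
    ("B", 0.8, True), ("B", 0.6, True), ("B", 0.3, False), ("B", 0.1, False),
]

rates = error_rates_by_group(toy_pairs)
print(rates)
print("FPR gap:", fpr_gap(rates))
```

In this toy data, group "A" has one false positive out of two different-person pairs while group "B" has none, so the gap in false-positive rates between the two groups is 0.5. A system that treated all groups equally would show a gap near zero.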

The results are unfortunately not surprising -- numerous studies have shown that facial recognition is susceptible to bias. A paper last fall by University of Colorado, Boulder researchers demonstrated that AI from Amazon, Clarifai, Microsoft, and others maintained accuracy rates above 95% for cisgender men and women but misidentified trans men as women 38% of the time. Independent benchmarks of major vendors' systems by the Gender Shades project and the National Institute of Standards and Technology (NIST) have demonstrated that facial recognition technology exhibits racial and gender bias. They have also suggested that current facial recognition programs can be wildly inaccurate, misclassifying people upwards of 96% of the time.

"The post-challenge analysis showed that top winning solutions applied a combination of different strategies to mitigate bias, such as face preprocessing, homogenization of data distributions, the use of bias-aware loss functions, and ensemble models, among others, suggesting there is not a general approach that works better for all the cases," the organizers concluded. "Despite the high accuracy, none of the methods was free of bias."

An AnyVision spokesperson provided this statement: "AnyVision understands that in computer vision, without proper safeguards, bias may exist based on race or ethnicity, gender, age, pose variation, change of appearance, and more. That’s why it is vital that these factors are accounted for when designing and developing algorithms and it is imperative to understand that bias in these algorithms is a function of the underlying datasets upon which they have been trained."