MIT researchers: Amazon's Rekognition shows gender and ethnic bias (updated)

Amazon's facial analysis software distinguishes gender among certain ethnicities less accurately than do competing services from IBM and Microsoft. That's the conclusion drawn by Massachusetts Institute of Technology researchers in a new study published today, which found that Rekognition, Amazon Web Services' (AWS) object detection API, fails to reliably determine the sex of female and darker-skinned faces in specific scenarios.

The study's coauthors claim that in experiments conducted over the course of 2018, Rekognition's facial analysis feature mistakenly identified pictures of women as men and darker-skinned women as men 19 percent and 31 percent of the time, respectively. By comparison, Microsoft's offering misclassified darker-skinned women as men 1.5 percent of the time.
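For readers unfamiliar with how such figures are derived, a per-group misclassification rate is simply the share of faces in a demographic group whose predicted label differs from the true one. The following is a minimal sketch of that calculation, not the researchers' actual code; the function name and the sample records are made up for illustration.

```python
from collections import defaultdict

def misclassification_rates(records):
    """Return the fraction of wrong predictions per group.

    records: iterable of (group, true_label, predicted_label) tuples.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        if predicted != truth:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}

# Made-up sample data, NOT figures from the study.
sample = [
    ("darker-skinned women", "female", "male"),
    ("darker-skinned women", "female", "female"),
    ("lighter-skinned men", "male", "male"),
    ("lighter-skinned men", "male", "male"),
]

rates = misclassification_rates(sample)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} misclassified")
```

Comparing these rates across groups, rather than reporting one overall accuracy number, is what exposes the kind of gap the study describes.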

Amazon disputes those findings. It says that internally, in tests of an updated version of Rekognition, it observed "no difference" in gender classification accuracy across all ethnicities. And it notes that the paper in question failed to make clear the confidence threshold -- i.e., the minimum confidence score that Rekognition's predictions must reach in order to be considered "correct" -- used in the experiments.
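To see why the threshold matters, consider a minimal sketch in which each prediction carries a confidence score and is only accepted when that score meets a chosen cutoff. The predictions, function names, and cutoffs below are hypothetical and are not real Rekognition output or the settings used in the study.

```python
from typing import List, Optional, Tuple

# Hypothetical (label, confidence in percent) pairs, for illustration only.
PREDICTIONS: List[Tuple[str, float]] = [
    ("Female", 99.4),
    ("Male", 62.0),
    ("Female", 85.5),
]

def accept(label: str, confidence: float, threshold: float) -> Optional[str]:
    """Return the label only if its confidence meets the threshold, else None."""
    return label if confidence >= threshold else None

def accepted_labels(
    predictions: List[Tuple[str, float]], threshold: float
) -> List[Optional[str]]:
    """Apply the same threshold to every prediction."""
    return [accept(label, conf, threshold) for label, conf in predictions]

print(accepted_labels(PREDICTIONS, 99.0))  # ['Female', None, None]
print(accepted_labels(PREDICTIONS, 80.0))  # ['Female', None, 'Female']
```

The same set of predictions yields different accepted results at different cutoffs, which is why an accuracy figure is hard to interpret without knowing the threshold behind it.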

In a statement provided to VentureBeat, Dr. Matt Wood, general manager of deep learning and AI at AWS, drew a distinction between facial analysis -- which is concerned with spotting faces in videos or images and assigning generic attributes to them -- and facial recognition, which matches an individual face to faces in videos and images.

"[F]acial analysis ... [is] usually used to help search a catalog of photographs," he said. "[F]acial recognition ... is a distinct and different feature from facial analysis and attempts to match faces that appear similar. This is the same approach used to unlock some phones, or authenticate somebody entering a building, or by law enforcement to narrow the field when attempting to identify a person of interest."

He pointed out that facial analysis can only find generic features, such as facial hair, smiles, frowns, and gender, and that it has "no knowledge" of features that make a face unique. That's in contrast to facial recognition, he noted, which focuses on "unique facial features" to match faces.

Dr. Wood added that it's "not possible" to draw conclusions about the accuracy of facial recognition based on results obtained using facial analysis and argued that the paper "[doesn't] represent how a customer would use" Rekognition.

"Using an up-to-date version of Amazon Rekognition with similar data downloaded from parliamentary websites and the Megaface dataset of [1 million] images, we found exactly zero false positive matches with the recommended 99 [percent] confidence threshold," Wood said. "Facial analysis and facial recognition are completely different in terms of the underlying technology and the data used to train them. Trying to use facial analysis to gauge the accuracy of facial recognition is ill-advised, as it's not the intended algorithm for that purpose."

It's the second time Amazon's been in hot water over Rekognition's alleged susceptibility to bias.

In a test last summer -- the accuracy of which Amazon disputes -- the American Civil Liberties Union demonstrated that Rekognition, when fed 25,000 mugshots from a "public source" and tasked with comparing them to official photos of members of Congress, misidentified 28 members of Congress as criminals. A disproportionate share of the false matches -- 38 percent -- were people of color.

That's not to suggest it's an isolated problem.

A study in 2012 showed that facial algorithms from vendor Cognitec performed 5 to 10 percent worse on African Americans than on Caucasians, and researchers in 2011 found that facial recognition models developed in China, Japan, and South Korea had difficulty distinguishing between Caucasian faces and those of East Asians. Last February, researchers at the MIT Media Lab found that facial analysis systems made by Microsoft, IBM, and Chinese company Megvii misidentified gender in up to 7 percent of lighter-skinned females, up to 12 percent of darker-skinned males, and up to 35 percent of darker-skinned females.

A separate study conducted by researchers at the University of Virginia found that two prominent research-image collections -- ImSitu and COCO, the latter of which is cosponsored by Facebook, Microsoft, and startup MightyAI -- displayed gender bias in their depiction of sports, cooking, and other activities. (Images of shopping, for example, were linked to women, while coaching was associated with men.)

Perhaps most infamously of all, in 2015 a software engineer reported that Google Photos' image classification algorithms identified African Americans as "gorillas."

But there are encouraging signs of progress.

In June, working with experts in artificial intelligence (AI) fairness, Microsoft revised and expanded the datasets it uses to train Face API, a Microsoft Azure API that provides algorithms for detecting, recognizing, and analyzing human faces in images. With new data across skin tones, genders, and ages, it was able to reduce error rates for men and women with darker skin by up to 20 times, and by 9 times for all women.

Amazon says it's continually working to improve the accuracy of Rekognition, most recently through a "significant update" in November 2018, and that it's making funding available for research projects and staff through the AWS Machine Learning Research Grants. (The company says it's now on its fourth significant version update of Rekognition.) And it says it's "interested" in establishing standardized tests for facial analysis and facial recognition, and in working with regulators on guidance for their use.

"We have provided funding for academic research in this area, have made significant investment on our own teams, and will continue to do so," Wood stated. "Many of these efforts have focused on improving facial recognition, facial analysis, the importance of high confidence levels in interpreting these results, the role of manual review, and standardized testing ... [W]e're grateful to customers and academics who contribute to improving these technologies ... We know that facial recognition technology, when used irresponsibly, has risks. This is true of a lot of technologies, computers included ... But, we remain optimistic about the good this technology will provide in society."

The results of the MIT study are scheduled to be presented next week at the Association for the Advancement of Artificial Intelligence's conference on Artificial Intelligence, Ethics, and Society in Honolulu, Hawaii.

Updated 1/26 at 6:11 p.m. Pacific: Added additional statements from Dr. Wood published in a blog post on Amazon's AWS blog.
