NIST benchmarks show facial recognition technology still struggles to identify Black faces

Every few months, the U.S. National Institute of Standards and Technology (NIST) releases the results of benchmark tests it conducts on facial recognition algorithms submitted by companies, universities, and independent labs. A portion of these tests focuses on demographic performance -- that is, how often the algorithms misidentify a Black man as a white man, a Black woman as a Black man, and so on. Stakeholders are quick to say that the algorithms are constantly improving with regard to bias, but a VentureBeat analysis reveals a different story. In fact, our findings cast doubt on the notion that facial recognition algorithms are becoming better at recognizing people of color.

That isn't surprising, as numerous studies have shown facial recognition algorithms are susceptible to bias. But the newest data point comes as some vendors push to expand their market share, aiming to fill the gap left by Amazon, IBM, Microsoft, and others with self-imposed moratoriums on the sale of facial recognition systems. In Detroit this summer, city subcontractor Rank One Computing began supplying facial recognition to local law enforcement over the objections of privacy advocates and protestors. Last November, Los Angeles-based TrueFace was awarded a contract to deploy computer vision tech at U.S. Air Force bases. And the list goes on.

Industrywide trends

NIST uses a mugshot corpus collected over 17 years to look for demographic errors in facial recognition algorithms. Specifically, it measures the rates at which:

White men are misidentified as Black men
White men are misidentified as different white men
Black men are misidentified as white men
Black men are misidentified as different Black men
White women are misidentified as Black women
White women are misidentified as different white women
Black women are misidentified as white women
Black women are misidentified as different Black women

NIST determines the error rate for each category -- also known as the false match rate (FMR) -- by recording how often an algorithm returns a wrong face for 10,000 mugshots. An FMR of 0.0001 implies one mistaken identity for every 10,000, while an FMR of 0.1 implies one mistake for every 10.
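The arithmetic behind those two examples can be sketched in a few lines of Python. This is a minimal illustration, not NIST's evaluation code: the function names `false_match_rate` and `one_in_n` are hypothetical, and the counts are made-up inputs chosen only to reproduce the two rates mentioned above.

```python
def false_match_rate(false_matches: int, comparisons: int) -> float:
    """Fraction of comparisons in which the algorithm returned a wrong face."""
    if comparisons <= 0:
        raise ValueError("comparisons must be positive")
    return false_matches / comparisons


def one_in_n(fmr: float) -> int:
    """Express an FMR as 'one mistaken identity for every N' comparisons."""
    if not 0 < fmr <= 1:
        raise ValueError("fmr must be in (0, 1]")
    return round(1 / fmr)


# Illustrative counts only (not NIST data): 1 and 1,000 wrong faces
# out of 10,000 mugshots give FMRs of 0.0001 and 0.1 respectively.
low = false_match_rate(1, 10_000)
high = false_match_rate(1_000, 10_000)

print(f"FMR {low}: one mistake for every {one_in_n(low):,}")
print(f"FMR {high}: one mistake for every {one_in_n(high):,}")
```

Read this way, a smaller FMR means more comparisons pass between mistaken identities, which is why a shift from 0.0025 to 0.005 amounts to a doubling of the error rate.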

To get a sense of whether FMRs have decreased or increased in recent years, we plotted the NIST-measured FMRs of algorithms from organizations with commercial deployments -- two algorithms per organization, an older submission and a newer one. Comparing the performance of the two gave us an idea of how bias changed over time.
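A comparison of this kind might look like the following sketch, which computes the relative change in FMR per demographic category between an older and a newer submission. It is an assumption about how such a comparison could be done, not the analysis code used for this article: `fmr_changes` is a hypothetical helper, and the FMR values are placeholders rather than NIST results.

```python
from typing import Dict


def fmr_changes(older: Dict[str, float], newer: Dict[str, float]) -> Dict[str, float]:
    """Relative FMR change per category; positive means the newer version errs more."""
    return {
        category: (newer[category] - older[category]) / older[category]
        for category in older
        if category in newer
    }


# Placeholder FMRs for one hypothetical vendor (NOT real NIST figures).
older = {"Black women": 0.004, "Black men": 0.002, "white women": 0.002, "white men": 0.001}
newer = {"Black women": 0.003, "Black men": 0.003, "white women": 0.002, "white men": 0.001}

for category, change in fmr_changes(older, newer).items():
    trend = "worse" if change > 0 else "better" if change < 0 else "unchanged"
    print(f"{category}: {change:+.0%} ({trend})")
```

Using relative rather than absolute change keeps categories with very different baseline FMRs comparable, though it says nothing about whether the remaining error rate is acceptable.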

NIST's benchmarks don't account for adjustments vendors make before the algorithms are deployed, and some vendors might never deploy the algorithms commercially. Because the algorithms submitted to NIST are often optimized for best overall accuracy, they're also not necessarily representative of how facial recognition systems behave in the wild. As the AI Now Institute notes in its recent report, while current standards like the NIST benchmarks "are a step in the right direction, it would be premature to rely on them to assess performance ... [because there] is currently no standard practice to document and communicate the histories and limits of benchmarking datasets ... and thus no way to determine their applicability to a particular system or suitability for a given context."

Still, the NIST benchmarks are perhaps the closest thing the industry has to an objective measure of facial recognition bias.

Rank One Computing

Rank One, whose facial recognition software is currently being used by the Detroit Police Department (DPD), improved across all demographic categories from November 2019 to July 2020, particularly with respect to the number of Black women it misidentifies. However, the FMRs of its latest algorithm remain high; NIST reports that Rank One's software misidentifies Black men between 1 and 2 times in 1,000 and Black women between 2 and 3 times in 1,000. That error rate could translate to substantial numbers, considering roughly 3.4 million of Detroit's over 4 million residents are Black (according to the 2018 census).

Perhaps predictably, Rank One's algorithm was involved in a wrongful arrest that some publications mistakenly characterized as the first of its kind in the U.S. (Following a firestorm of criticism, Rank One said it would add "legal means" to thwart misuse and the DPD pledged to limit facial recognition to violent crimes and home invasions.) In the case of the arrest, the DPD violated its own procedural rules, which restrict the use of the system to lead generation. But there's evidence of bias in the transparency reports from the DPD, which show that nearly all (96 out of 98) of the photos Detroit police officers have run through Rank One's software to date are of Black suspects.

Detroit's three-year, $1 million facial recognition technology contract with DataWorks Plus, a reseller of Rank One's algorithm, expired on July 24. But DataWorks agreed last year to extend its service contract through September 30. Beyond that, there's nothing preventing the city's IT department from servicing the software itself in perpetuity.

TrueFace

TrueFace's technology, which early next year will begin powering facial recognition and weapon identification systems on a U.S. Air Force base, became worse at identifying Black women from October 2019 to July 2020. The latest version of the algorithm has an FMR between 0.015 and 0.020 for misidentifying Black women compared with the previous version's FMR of between 0.010 and 0.015. U.S. Air Force Personnel Center statistics show there were more than 49,200 Black service members enlisted as of January 2020.

AnyVision and RealNetworks

AnyVision, which recently raised $43 million from undisclosed investors, told Wired its facial recognition software has been piloted in hundreds of sites around the world, including schools in Putnam County, Oklahoma and Texas City, Texas. RealNetworks offers facial recognition for military drones and body cameras through a subsidiary called SAFR. After the Parkland, Florida school shooting in 2018, SAFR made its facial recognition tech free to schools across the U.S. and Canada.

While AnyVision's and RealNetworks' algorithms misidentify fewer Black women than before, they perform worse with Black men. For the other demographic groups, their FMRs show little to no improvement.

An AnyVision spokesperson told VentureBeat that it's "inaccurate" to say the company's technology is biased based on the latest NIST results, but declined to provide details. "AnyVision's visual AI technology -- which was always ahead of the industry in terms of accuracy in the most challenging, real-world conditions -- has likewise made significant advances around bias in the field," the spokesperson said.

NtechLab

NtechLab's algorithm exhibits a comparable regression in FMR. The company, which gained notoriety for an app that allowed users to match pictures of people's faces to a Russian social network, recently received a $3.2 million contract to deploy its facial recognition tools throughout Moscow. NtechLab also has contracts in Saint Petersburg and in Jurmala, Latvia.

While the company's newest algorithm achieved reductions in FMR for white men and women, it performs worse with Black men than its predecessor. FMR in this category is closer to 0.005, up from just over 0.0025 in June 2019.

Gorilla Technologies

Another contender is Gorilla Technologies, which claims to have installed facial recognition technology in Taiwanese prisons. NIST data shows the company's algorithm became measurably worse at identifying Black women and men. The newest version of Gorilla's algorithm has an FMR of between 0.004 and 0.005 for misidentifying Black women and an FMR of between 0.001 and 0.002 for misidentifying white women.
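To put the two ranges quoted above side by side, the sketch below computes the smallest and largest disparity ratio they allow. The helper `ratio_bounds` is hypothetical and the calculation is only interval arithmetic on the published ranges; it assumes the true FMRs lie somewhere within them.

```python
from typing import Tuple

Range = Tuple[float, float]  # (low, high) FMR bounds


def ratio_bounds(group_a: Range, group_b: Range) -> Tuple[float, float]:
    """Smallest and largest possible ratio of group A's FMR to group B's FMR."""
    a_low, a_high = group_a
    b_low, b_high = group_b
    # Smallest ratio: A at its lowest, B at its highest; largest is the reverse.
    return a_low / b_high, a_high / b_low


black_women = (0.004, 0.005)  # range reported for the newest algorithm
white_women = (0.001, 0.002)  # range reported for the newest algorithm

low, high = ratio_bounds(black_women, white_women)
print(f"Black women are falsely matched {low:.1f}x to {high:.1f}x as often as white women")
```

Taken at face value, the ranges imply that the newest algorithm falsely matches Black women somewhere between two and five times as often as white women.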

Facial recognition algorithms are also often misused in the field, which tends to amplify their underlying biases. A report from Georgetown Law's Center on Privacy and Technology details how police feed facial recognition software flawed data, including composite sketches and pictures of celebrities who share physical features with suspects. The New York Police Department and others reportedly edit photos with blur effects and 3D modelers to make them more conducive to algorithmic face searches.

Whatever the reasons for the bias, an increasing number of cities and states have expressed concerns about facial recognition technology -- particularly in the absence of federal guidelines. Oakland and San Francisco in California; Portland, Oregon; and Somerville, Massachusetts are among the metros where law enforcement is prohibited from using facial recognition. In Illinois, companies must get consent before collecting biometric information, including face images. And in Massachusetts, lawmakers are considering a moratorium on government use of any biometric surveillance system in the state.

Congress, too, has put forth a bill -- the Facial Recognition and Biometric Technology Moratorium Act of 2020 -- that would sharply limit federal government officials' use of facial recognition systems. The bill's introduction follows the European Commission's consideration of a five-year moratorium on facial recognition in public places.

"Facial recognition is a uniquely dangerous form of surveillance. This is not just some Orwellian technology of the future -- it's being used by law enforcement agencies across the country right now, and doing harm to communities right now," Fight for the Future deputy director Evan Greer said earlier this year in a statement regarding proposed legislation. "Facial recognition is the perfect technology for tyranny. It automates discriminatory policing ... in our deeply racist criminal justice system. This legislation effectively bans law enforcement use of facial recognition in the United States. That's exactly what we need right now. We give this bill our full endorsement."