NIST benchmarks show facial recognition technology still struggles to identify Black faces

Every few months, the U.S. National Institute of Standards and Technology (NIST) releases the results of benchmark tests it conducts on facial recognition algorithms submitted by companies, universities, and independent labs. A portion of these tests focus on demographic performance — that is, how often the algorithms misidentify a Black man as a white man, a Black woman as a Black man, and so on. Stakeholders are quick to say that the algorithms are constantly improving with regard to bias, but a VentureBeat analysis reveals a different story. In fact, our findings cast doubt on the notion that facial recognition algorithms are becoming better at recognizing people of color.

That isn’t surprising, as numerous studies have shown facial recognition algorithms are susceptible to bias. But the newest data point comes as some vendors push to expand their market share, aiming to fill the gap left by Amazon, IBM, Microsoft, and others with self-imposed moratoriums on the sale of facial recognition systems. In Detroit this summer, city subcontractor Rank One began supplying facial recognition to local law enforcement over the objections of privacy advocates and protestors. Last November, Los Angeles-based Trueface was awarded a contract to deploy computer vision tech at U.S. Air Force bases. And the list goes on.

Industrywide trends

NIST uses a mugshot corpus collected over 17 years to look for demographic errors in facial recognition algorithms. Specifically, it measures the rates at which:

White men are misidentified as Black men
White men are misidentified as different white men
Black men are misidentified as white men
Black men are misidentified as different Black men
White women are misidentified as Black women
White women are misidentified as different white women
Black women are misidentified as white women
Black women are misidentified as different Black women

NIST determines the error rate for each category — also known as the false match rate (FMR) — by recording how often an algorithm returns a wrong face for 10,000 mugshots. An FMR of .0001 implies one mistaken identity for every 1,000, while an FMR of .1 implies one mistake for every 10.

To get a sense of whether FMRs have decreased or increased in recent years, we plotted the algorithms’ FMRs from organizations with commercial deployments, as measured by NIST — two algorithms per organization. Comparing the performance of the two algorithms provided us an idea of bias over time.

NIST’s benchmarks don’t account for adjustments vendors make before the algorithms are deployed, and some vendors might never deploy the algorithms commercially. Because the algorithms submitted to NIST are often optimized for best overall accuracy, they’re also not necessarily representative of how facial recognition systems behave in the wild. As the AI Now Institute notes in its recent report: While current standards like the NIST benchmarks “are a step in the right direction, it would be premature to rely on them to assess performance … [because there] is currently no standard practice to document and communicate the histories and limits of benchmarking datasets … and thus no way to determine their applicability to a particular system or suitability for a given context.”

Still, the NIST benchmarks are perhaps the closest thing the industry has to an objective measure of facial recognition bias.

Rank One Computing

Rank One Computing, whose facial recognition software is currently being used by the Detroit Police Department (DPD), improved across all demographic categories from November 2019 to July 2020, particularly with respect to the number of Black women it misidentifies. However, the FMRs of its latest algorithm remain high; NIST reports that Rank One’s software misidentifies Black men between 1 and 2 times in 1,000 and Black women between 2 and 3 times in 1,000. That error rate could translate to substantial numbers, considering roughly 3.4 million of Detroit’s over 4 million residents are Black (according to the 2018 census).