ImageNet creators find blurring faces for privacy has a 'minimal impact on accuracy'

The makers of ImageNet, one of the most influential datasets in machine learning, have released a version of the dataset that blurs people's faces in order to support privacy experimentation. Authors of a paper on the work say their research is the first known effort to analyze the impact blurring faces has on the accuracy of large-scale computer vision models. For this version, faces were detected automatically before they were blurred. Altogether, the altered dataset removes the faces of 562,000 people in more than a quarter-million images. The creators of a truncated version of the dataset, about 1.4 million images used for competitions, told VentureBeat they plan to retire the version without blurred faces and replace it with the face-blurred version.
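To make the detect-then-blur idea concrete, here is a minimal sketch of the blurring half only. It is not the ImageNet team's method, which this article does not describe in detail. It assumes a face detector has already returned a bounding box, and it applies a simple box blur to a grayscale image stored as a list of rows. The function name `blur_region`, the `(top, left, bottom, right)` box layout, and the `radius` parameter are all hypothetical choices made for illustration.

```python
def blur_region(image, box, radius=1):
    """Return a copy of `image` with the pixels inside `box` box-blurred.

    image:  list of rows, each a list of grayscale ints (0-255).
    box:    (top, left, bottom, right), with bottom/right exclusive.
            Hypothetical output of a face detector (not implemented here).
    radius: half-width of the averaging window (assumed parameter).
    """
    top, left, bottom, right = box
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the input stays untouched

    for y in range(max(top, 0), min(bottom, height)):
        for x in range(max(left, 0), min(right, width)):
            # Average over the neighborhood, clipped to the image bounds.
            # Values are read from the original image, not the partial output.
            ys = range(max(y - radius, 0), min(y + radius + 1, height))
            xs = range(max(x - radius, 0), min(x + radius + 1, width))
            total = sum(image[j][i] for j in ys for i in xs)
            out[y][x] = total // (len(ys) * len(xs))
    return out


# Usage: a 4x4 image with one bright pixel inside the (hypothetical) face box.
image = [
    [0, 0, 0, 0],
    [0, 255, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 200],
]
blurred = blur_region(image, box=(0, 0, 2, 2), radius=1)
```

Only pixels inside the box change, which is why objects that sit close to a face can lose detail along with it.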

"Experiments show that one can use the face-blurred version for benchmarking object recognition and for transfer learning with only marginal loss of accuracy," the team wrote in an update published on the ImageNet website late last week, together with a research paper on the work. "An emerging problem now is how to make sure computer vision is fair and preserves people's privacy. We are continually evolving ImageNet to address these emerging needs."

Computer vision systems can be used for everything from recognizing car accidents on freeways to fueling mass surveillance, and as ongoing controversies over facial recognition have shown, images of the human face are deeply personal.

Following experiments with object detection and scene detection benchmark tests using the modified dataset, the team reported in the paper that blurring faces can reduce accuracy by 13% to 60%, depending on the category -- but that this reduction has a "minimal impact on accuracy" overall. Some categories that involve blurring objects close to people's faces, like a harmonica or a mask, resulted in higher rates of classification errors.

"Through extensive experiments, we demonstrate that training on face-blurred does not significantly compromise accuracy on both image classification and downstream tasks, while providing some privacy protection. Therefore, we advocate for face obfuscation to be included in ImageNet and to become a standard step in future dataset creation efforts," the paper's coauthors write.

An assessment of the 1.4 million images included in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset found that 17% of the images contain faces, despite the fact that only three of 1,000 categories in the dataset mention people. In some categories, like "military uniform" and "volleyball," 90% of the images included faces of people. Researchers also found reduced accuracy in categories rarely related to human faces, like "Eskimo dog" and "Siberian husky."

"It is strange since most images in these two categories do not even contain human faces," the paper reads.

Coauthors include researchers who released ImageNet in 2009, among them Princeton University professor Jia Deng and Stanford University professor and former Google Cloud AI chief Fei-Fei Li. The original ImageNet paper has been cited tens of thousands of times since it was introduced at the Computer Vision and Pattern Recognition (CVPR) conference in 2009, and it has become one of the most influential research papers and datasets for the advancement of machine learning.

The ImageNet Large Scale Visual Recognition Challenge that took place from 2010 to 2017 is known for helping usher in the era of deep learning and leading to the spinoff of startups like Clarifai and MetaMind. Founded by Richard Socher, who helped Deng and Li assemble ImageNet, MetaMind was acquired by Salesforce in 2016. After helping establish the Einstein AI brand, Socher left his role as chief scientist at Salesforce last summer to launch a search engine startup.

The face-blurring version marks the second major ethical or privacy-related change to the dataset released 12 years ago. In a paper accepted for publication at the Fairness, Accountability, and Transparency (FAccT) conference in 2020, creators of the ImageNet dataset removed a majority of categories associated with people because the categories were found to be offensive.

That paper attributes racist, sexist, and politically charged predictions associated with ImageNet to issues like a lack of diversity in demographics represented in the dataset and use of the WordNet hierarchy for the words used to select and label images. A 2019 analysis found that roughly 40% of people in ImageNet photos are women, and about 1% are people over 60. It also found an overrepresentation of men between the ages of 18-40 and an underrepresentation of people with dark skin.

A few months after that paper was published, MIT took down another computer vision dataset, 80 Million Tiny Images, which is over a decade old and also used WordNet, after an audit by Vinay Prabhu and Abeba Birhane found racist and sexist labels and images. Following an NSFW analysis of 80 Million Tiny Images, the audit paper examines common shortcomings of large computer vision datasets and considers solutions for the computer vision community going forward.

Analysis of ImageNet in the paper found instances of co-occurrence of people and objects in ImageNet categories involving musical instruments, since those images often include people even if the label itself does not mention people. It also suggests the makers and managers of large computer vision datasets take steps toward reform, including the use of techniques to blur the faces of people found in datasets.

On Monday, Birhane and Prabhu urged coauthors to cite ImageNet critics whose ideas are reflected in the face-obfuscation paper, such as the popular ImageNet Roulette. In a blog post, the duo detail multiple attempts to reach the ImageNet team, and a spring 2020 presentation by Prabhu at HAI that included Fei-Fei Li about the ideas underlying Birhane and Prabhu's criticisms of large computer vision datasets.

"We'd like to clearly point out that the biggest shortcomings are the tactical abdication of responsibility for all the mess in ImageNet combined with systematic erasure of related critical work, that might well have led to these corrective measures being taken," the blog post reads. Coauthor and Princeton University assistant professor Olga Russakovsky told WIRED a citation of the paper will be included in an updated version of the paper. VentureBeat asked coauthors for additional comment about criticisms from Birhane and Prabhu but did not receive a response.

In other work critical of ImageNet, a few weeks after 80 Million Tiny Images was taken down, MIT researchers analyzed the ImageNet data collection pipeline and found "systematic shortcomings that led to reductions in accuracy." And a 2017 paper found that a majority of images included in the ImageNet dataset came from Europe and the United States, another example of poor representation of people from the Global South in AI.

ILSVRC is a subset of the larger ImageNet dataset, which contains over 14 million images across more than 20,000 categories. ILSVRC, ImageNet, and the recently modified version of ILSVRC were created with help from Amazon Mechanical Turk workers using photos scraped from Google Images.

In related news, a paper by researchers from Google, Mozilla Foundation, and the University of Washington analyzing datasets used for machine learning concludes that the machine learning research community needs to foster a culture change and recognize the privacy and property rights of individuals. In other news related to harm that can be caused by deploying AI, Stanford University and OpenAI convened experts from a number of fields last fall to critique GPT-3. The group concluded that creators of large language models, such as Google and OpenAI, have only a matter of months to set standards and address the societal impact of deploying such models.
