Facebook's AI reverse-engineers models used to generate deepfakes

Some experts have expressed concern that machine learning tools could be used to create deepfakes, or media that takes a person in an existing video, photo, or audio file and replaces them with someone else's likeness. The fear is that these fakes might be used to do things like sway opinion during an election or implicate an innocent person in a crime. Deepfakes have already been abused to generate pornographic material of actors and defraud a major energy producer.

While much of the discussion around deepfakes has focused on social media, pornography, and fraud, it's worth noting that deepfakes pose a threat to anyone portrayed in manipulated videos and their circle of trust. As a result, deepfakes represent an existential threat to businesses, particularly in industries that depend on digital media to make important decisions. The FBI earlier this year warned that deepfakes are a critical emerging threat targeting businesses.

To address this challenge, Facebook today announced a collaboration with researchers at Michigan State University (MSU) to develop a method of detecting deepfakes that relies on taking an AI-generated image and reverse-engineering the system used to create it. While this approach is not being used in production at Facebook, the company claims the technique will support deepfake detection and tracing efforts in "real-world" settings, where deepfakes themselves are the only information detectors have to work with.

A new way to detect deepfakes

Current methods of identifying deepfakes focus on distinguishing real from fake images and determining whether an image was generated by an AI model seen during training or not. For example, Microsoft recently launched a deepfake-combating solution in Video Authenticator, a tool that can analyze a still photo or video to provide a score for its level of confidence that the media hasn't been artificially manipulated. And the winners of Facebook's Deepfake Detection Challenge, which ended last June, produced a system that can pick out distorted videos with up to 82% accuracy.

But Facebook argues that solving the problem of deepfakes requires taking the discussion one step further. Reverse engineering isn't a new concept in machine learning -- current techniques can arrive at a model by examining its input and output data or examining hardware information like CPU and memory usage. However, these techniques depend on preexisting knowledge about the model itself, which limits their applicability in cases where such information is unavailable.

By contrast, Facebook and MSU's approach begins with attribution and then works on discovering the properties of the model used to generate the deepfake. By generalizing image attribution and tracing similarities between patterns across a collection of deepfakes, the system can ostensibly infer more about the generative model used to create a deepfake and tell whether a series of images originated from a single source.

How it works

The system begins by running a deepfake image through what the researchers call a fingerprint estimation network (FEN) that extracts details about the "fingerprint" left by the model that generated it. These fingerprints are unique patterns left on deepfakes that can be used to identify the generative models the deepfakes originated from.
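The FEN is a learned network, and its internals are not described here. As a rough intuition only, a fingerprint can be pictured as the faint pattern that remains once the visible image content is suppressed. The sketch below uses a deliberately simple stand-in technique, subtracting a 3x3 box blur from a grayscale image, and is not the researchers' method; the names `box_blur` and `residual_fingerprint` are hypothetical.

```python
def box_blur(img):
    """Return a 3x3 mean-filtered copy of a 2D list of floats (edges clamped)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp coordinates so border pixels reuse their nearest neighbor.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def residual_fingerprint(img):
    """Toy stand-in for a fingerprint: the image minus its blurred version.

    Assumption: this high-frequency residual only illustrates the idea of a
    content-independent pattern. The real FEN is a trained network.
    """
    blurred = box_blur(img)
    return [[p - b for p, b in zip(row, blurred_row)]
            for row, blurred_row in zip(img, blurred)]
```

A perfectly flat image yields an all-zero residual under this toy definition, while any fine-grained pattern survives the subtraction.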

The researchers estimated fingerprints using different constraints based on properties of deepfake fingerprints found in the wild. They used these constraints to generate a dataset of fingerprints, which they then tapped to train a model to detect fingerprints it hadn't seen before.

Facebook and MSU say their system can estimate both the network architecture of an algorithm used to create a deepfake and its training loss functions, which evaluate how the algorithm models its training data. It also reveals the features -- or the measurable pieces of data that can be used for analysis -- of the model used to create the deepfake.

To test this approach, the MSU research team put together a fake image dataset with 100,000 synthetic images generated from 100 publicly available models. Some of the open source projects had already released fake images, in which case the team randomly selected 1,000 deepfakes from the released set. In cases where no fake images were available, the researchers ran the project's released code to generate 1,000 images.
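The assembly procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the team's actual tooling: `collect_images` and `build_dataset` are hypothetical names, each model is represented as a (name, released images, generator function) tuple, and a project with fewer than 1,000 released images is treated as having none.

```python
import random

IMAGES_PER_MODEL = 1_000  # per-model quota stated in the article


def collect_images(released, generate, rng):
    """Return 1,000 images for one generative model.

    Assumption: if the project released at least 1,000 fake images, sample
    from them at random; otherwise fall back to generating fresh images.
    """
    if len(released) >= IMAGES_PER_MODEL:
        return rng.sample(released, IMAGES_PER_MODEL)
    return [generate() for _ in range(IMAGES_PER_MODEL)]


def build_dataset(models, seed=0):
    """Build a list of (model_name, image) pairs from (name, released, generate) tuples."""
    rng = random.Random(seed)  # fixed seed so the random selection is repeatable
    dataset = []
    for name, released, generate in models:
        for image in collect_images(released, generate, rng):
            dataset.append((name, image))
    return dataset
```

With 100 models and 1,000 images each, the result has 100 x 1,000 = 100,000 entries, matching the dataset size reported by the researchers.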

The researchers found that their approach performed "substantially better" than chance and was "competitive" with state-of-the-art methods for deepfake detection and attribution. Moreover, they say it could be applied to detect coordinated disinformation attacks where varied deepfakes are uploaded to different platforms but all originate from the same source.
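To make the coordinated-attack scenario concrete, the sketch below groups images whose fingerprints look alike, on the premise that images from the same generative model share a similar fingerprint. It is a minimal illustration, not the published method: fingerprints are assumed to be plain numeric vectors, the 0.9 cosine-similarity threshold is arbitrary, and `cosine_similarity` and `group_by_source` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_by_source(fingerprints, threshold=0.9):
    """Greedily group fingerprint indices by similarity to each group's first member.

    Assumption: a similarity at or above `threshold` means "probably the same
    generative model". The threshold value is illustrative only.
    """
    groups = []
    for i, fp in enumerate(fingerprints):
        for group in groups:
            if cosine_similarity(fingerprints[group[0]], fp) >= threshold:
                group.append(i)
                break
        else:
            # No existing group matched, so this image starts a new one.
            groups.append([i])
    return groups
```

For example, `group_by_source([[1, 0, 0], [0.99, 0.05, 0], [0, 1, 0], [0.02, 0.98, 0]])` places the first two images in one group and the last two in another, regardless of which platform each image was uploaded to.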

"Importantly, whereas the term deepfake is often associated with swapping someone's face -- their identity -- onto new media, the method we describe allows reverse engineering of any fake scene. In particular, it can help with detecting fake text in images," Facebook AI researcher Tal Hassner told VentureBeat via email. "Beyond detection of malicious attacks -- faces or otherwise -- our work can help improve AI methods designed for generating images: exploring the unlimited variability of model design in the same way that hardware camera designers improve their cameras. Unlike the world of cameras, however, generative models are new, and with their growing popularity comes a need to develop tools to study and improve them."

Looming threat

Since 2019, the number of deepfakes online has grown from 14,678 to 145,227, an uptick of roughly 900% year over year, according to Sentinel. Meanwhile, Forrester Research estimated in October 2019 that deepfake fraud scams would cost $250 million by the end of 2020. But businesses remain largely unprepared. In a survey conducted by data authentication startup Attestiv, fewer than 30% of executives say they've taken steps to mitigate fallout from a deepfake attack.
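The growth figure can be checked directly from the two counts quoted above. The helper name `percent_increase` is hypothetical; the inputs are the Sentinel numbers as reported.

```python
def percent_increase(old, new):
    """Percentage increase from `old` to `new`."""
    return (new - old) / old * 100.0


# Counts quoted in the article: 14,678 deepfakes growing to 145,227.
growth = percent_increase(14_678, 145_227)
print(f"{growth:.0f}%")  # prints "889%"
```

The exact increase is about 889%, or nearly a tenfold rise, which is consistent with the "roughly 900%" figure.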

Deepfakes are likely to remain a challenge, especially as media generation techniques continue to improve. Earlier this year, deepfake footage of actor Tom Cruise posted to an unverified TikTok account racked up 11 million views on the app and millions more on other platforms. When scanned through several of the best publicly available deepfake detection tools, the deepfakes avoided discovery, according to Vice.

Still, a growing number of commercial and open source efforts promise to put the deepfake threat to rest -- at least temporarily. Amsterdam-based Sensity offers a suite of monitoring products that purport to classify deepfakes uploaded on social media, video hosting platforms, and disinformation networks. Dessa has proposed techniques for improving deepfake detectors trained on datasets of manipulated videos. And Jigsaw, Google's internal technology incubator, released a large corpus of visual deepfakes that was incorporated into a benchmark made freely available to researchers developing synthetic video detection systems.

Facebook and MSU plan to open-source the dataset, code, and trained models used to create their system to facilitate research in various domains, including deepfake detection, image attribution, and reverse-engineering of generative models. "Deepfakes are becoming easier to produce and harder to detect. Companies, as well as individuals, should know that methods are being developed, not only to detect malicious deep fakes but also to make it harder for bad actors to get away with spreading them," Hassner added. "Our method provides new capabilities in detecting coordinated attacks and in identifying the origins of malicious deepfakes. In other words, this is a new forensic tool for those seeking to keep us safe online."