Last week at the Black Hat cybersecurity conference in Las Vegas, the Democratic National Committee tried to raise awareness of the dangers of AI-doctored videos by displaying a deepfaked video of DNC Chair Tom Perez. Deepfakes are videos that have been manipulated, using deep learning tools, to superimpose a person’s face onto a video of someone else.
As the 2020 presidential election draws near, there’s increasing concern over the potential threats deepfakes pose to the democratic process. In June, the U.S. House Permanent Select Committee on Intelligence held a hearing to discuss the threats of deepfakes and other AI-manipulated media. But there’s doubt over whether tech companies are ready to deal with deepfakes. Earlier this month, Rep. Adam Schiff, chairman of the House Intelligence Committee, expressed concern that Google, Facebook, and Twitter have no clear plan to deal with the problem.
Mounting fear over the potential onslaught of deepfakes has spurred a slate of projects and efforts to detect deepfakes and other image- and video-tampering techniques.
Deepfakes use neural networks to overlay the face of the target person on an actor in the source video. While neural networks can do a good job at mapping the features of one person’s face onto another, they don’t have any understanding of the physical and natural characteristics of human faces.
That’s why deepfakes can give themselves away by generating unnatural phenomena. One of the most notable artifacts is unblinking eyes. Before the neural networks that generate deepfakes can do their trick, their creators must train them by showing them examples. In the case of deepfakes, those examples are images of the target person. Since most pictures used in training show open eyes, the neural networks tend to create deepfakes that don’t blink, or that blink in unnatural ways.
Last year, researchers from the University at Albany published a paper on a technique for spotting this type of inconsistency in eye blinking. Interestingly, the technique uses deep learning, the same technology used to create the fake videos. The researchers found that neural networks trained on eye blinking videos could localize eye blinking segments in videos and examine the sequence of frames for unnatural movements.
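To make the idea concrete, here is a minimal sketch of a blink-rate check. It is not the researchers' neural network: it assumes some upstream step has already produced a per-frame eye-openness score between 0 (closed) and 1 (open), and the function names, the 0.2 closed-eye cutoff, and the 4-blinks-per-minute threshold are all illustrative assumptions.

```python
from typing import List


def count_blinks(openness: List[float], closed_below: float = 0.2) -> int:
    """Count blinks in a per-frame eye-openness series (0 = closed, 1 = open).

    A blink is counted once each time the score drops below `closed_below`
    after having been at or above it. The cutoff is an assumed value.
    """
    blinks = 0
    eye_closed = False
    for value in openness:
        if value < closed_below and not eye_closed:
            blinks += 1
            eye_closed = True
        elif value >= closed_below:
            eye_closed = False
    return blinks


def blinks_per_minute(openness: List[float], fps: float,
                      closed_below: float = 0.2) -> float:
    """Convert the blink count into a rate, given the video frame rate."""
    if not openness:
        return 0.0
    minutes = len(openness) / fps / 60.0
    return count_blinks(openness, closed_below) / minutes


def looks_suspicious(openness: List[float], fps: float,
                     min_rate: float = 4.0) -> bool:
    """Flag clips whose blink rate falls below an assumed minimum."""
    return blinks_per_minute(openness, fps) < min_rate


# Ten seconds of 30 fps video with three short blinks.
clip = [1.0] * 300
for start in (50, 150, 250):
    for i in range(start, start + 4):
        clip[i] = 0.05

print(count_blinks(clip), looks_suspicious(clip, fps=30))
```

A fixed threshold like this is far cruder than a model that learns what natural blinking looks like across a sequence of frames, but it shows what kind of signal such a detector is looking for.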
However, with the technology becoming more advanced every day, it’s just a matter of time until someone manages to create deepfakes that can blink naturally.
Tracking head movement
More recently, researchers at UC Berkeley developed an AI algorithm that detects face-swapped videos based on something that is much more difficult to fake: head and face gestures. Every person has unique head movements (e.g. nodding when stating a fact) and face gestures (e.g. smirking when making a point). Deepfakes inherit head and face gestures from the actor, not the targeted person.
A neural network trained on the head and face gestures of an individual would be able to flag videos that contain head gestures that don’t belong to that person. To test their model, the UC Berkeley researchers trained the neural network on real videos of world leaders. The AI was able to detect deepfaked videos of the same persons with 92% accuracy.
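The per-person idea can be sketched with a much simpler stand-in for the neural network: a statistical profile built only from real clips of one individual. In the sketch below, each clip is assumed to be summarized as a short list of gesture measurements (for example, a nod rate and a smirk intensity). Those features, the function names, and the threshold of 3.0 are hypothetical choices for illustration, not details from the researchers' work.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Profile = List[Tuple[float, float]]  # (mean, standard deviation) per feature


def fit_profile(real_clips: Sequence[Sequence[float]]) -> Profile:
    """Learn one person's typical gesture features from genuine clips only."""
    columns = list(zip(*real_clips))
    # Guard against a zero standard deviation so scoring never divides by 0.
    return [(mean(col), pstdev(col) or 1e-9) for col in columns]


def anomaly_score(profile: Profile, clip: Sequence[float]) -> float:
    """Average distance from the profile, in standard deviations."""
    return mean(abs(x - mu) / sigma for (mu, sigma), x in zip(profile, clip))


def is_suspicious(profile: Profile, clip: Sequence[float],
                  threshold: float = 3.0) -> bool:
    """Flag clips whose gestures sit far from the person's usual range."""
    return anomaly_score(profile, clip) > threshold


# Hypothetical features per clip: [nod rate, smirk intensity].
real_clips = [[0.50, 0.10], [0.55, 0.12], [0.45, 0.08], [0.52, 0.11]]
profile = fit_profile(real_clips)

print(is_suspicious(profile, [0.51, 0.10]))  # gestures match the person
print(is_suspicious(profile, [0.10, 0.40]))  # gestures come from someone else
```

Because the profile is fit to a single person, the same code has to be run again with fresh footage for every individual you want to protect.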
Head movement detection provides a robust protection method against deepfakes. However, unlike the eye-blinking detector, where you train your AI model once, the head movement detector needs to be trained separately for every individual. So while it’s suitable for public figures such as world leaders and celebrities, it’s less ideal for general-purpose deepfake detection.
When forgers tamper with an image or video, they do their best to make it look realistic. While image manipulation can be extremely hard to spot with the naked eye, it leaves behind some artifacts that a well-trained deep learning algorithm can detect.
Researchers at the University of California, Riverside, developed an AI model that detects tampering by examining the edges of the objects contained in images. The pixels at the boundaries of objects that are artificially inserted into or removed from an image contain special characteristics, such as unnatural smoothing and feathering.
The UCR researchers trained their model on a large dataset containing annotated examples of untampered and tampered images. The neural network was able to glean common patterns that define the difference between the boundaries of manipulated and non-manipulated objects in images. When presented with new images, the AI was able to detect and highlight manipulated objects.
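As a rough illustration of what "feathering" means at the pixel level, the sketch below measures how many pixels an edge takes to move from 10% to 90% of its full contrast along one row of grayscale values. This is a hand-written feature, not the learned model described above; the function name and the 10%/90% cutoffs are assumptions, and real photos also contain naturally soft edges, which is part of why a trained model is needed.

```python
from typing import Sequence


def transition_width(profile: Sequence[float],
                     low: float = 0.1, high: float = 0.9) -> int:
    """Count the pixels that sit strictly between `low` and `high` of the
    edge's full contrast. Assumes `profile` crosses a single edge."""
    darkest, brightest = min(profile), max(profile)
    span = brightest - darkest
    if span == 0:
        return 0  # flat region, no edge to measure
    normalized = [(p - darkest) / span for p in profile]
    return sum(1 for v in normalized if low < v < high)


# One row of pixels crossing an object boundary.
sharp_edge = [20, 20, 20, 200, 200, 200]
feathered_edge = [20, 20, 50, 110, 170, 200, 200]

print(transition_width(sharp_edge), transition_width(feathered_edge))
```

A boundary that is consistently softer than the other edges in the same image is the kind of pattern a model trained on tampered and untampered examples could pick up on.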
While the researchers tested this method on still images, it can potentially work on videos too. Deepfakes are essentially a series of manipulated image frames, so the same object manipulation artifacts exist in those individual frames, on the edges of the subject’s face.
Again, while this is an effective method to detect a host of different tampering techniques, it can become obsolete as deepfakes and other video-manipulation tools become more advanced.
Setting a baseline for the truth
While most efforts in the field focus on finding proof of tampering in videos, a different solution to fight deepfakes is to prove what’s true. This is the approach researchers at the UK’s University of Surrey used in Archangel, a project they are trialing with national archives in several countries.
Archangel combines neural networks and blockchain to establish a smart archive for storing videos so that they can be used in the future as a single source of truth. When a record is added to the archive, Archangel trains a neural network on various formats of the video. The neural network will then be able to tell whether a new video is the same as the original video or a tampered version.
Traditional fingerprinting methods verify the authenticity of files by comparing them at the byte level. This is not suitable for videos, whose byte structure changes when they are compressed in different formats. Neural networks, by contrast, learn and compare the visual features of the video, which makes the approach codec-agnostic.
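The difference between byte-level and content-level fingerprints can be shown with a toy example. The sketch below uses an "average hash," a simple perceptual hash, as a stand-in for the visual features a neural network would learn; it is not Archangel's method. A frame is represented as a small grid of grayscale values, and the re-encoding and tampering steps are simulated by hand.

```python
import hashlib
from typing import List

Frame = List[List[int]]  # grayscale pixel values, 0 to 255


def byte_fingerprint(frame: Frame) -> str:
    """Traditional fingerprint: a cryptographic hash of the raw bytes."""
    raw = bytes(p for row in frame for p in row)
    return hashlib.sha256(raw).hexdigest()


def average_hash(frame: Frame) -> List[int]:
    """Perceptual fingerprint: one bit per pixel, set when the pixel is
    brighter than the frame's average."""
    pixels = [p for row in frame for p in row]
    avg = sum(pixels) / len(pixels)
    return [1 if p > avg else 0 for p in pixels]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


original = [[10, 20, 200, 210],
            [15, 25, 205, 215],
            [10, 20, 200, 210],
            [15, 25, 205, 215]]

# Simulated lossy re-encoding: every pixel value shifts slightly.
reencoded = [[min(255, p + 3) for p in row] for row in original]

# Simulated tampering: a bright region is painted dark.
tampered = [row[:] for row in original]
for r in (0, 1):
    for c in (2, 3):
        tampered[r][c] = 12

print(byte_fingerprint(original) == byte_fingerprint(reencoded))
print(hamming(average_hash(original), average_hash(reencoded)))
print(hamming(average_hash(original), average_hash(tampered)))
```

The byte hash changes under a harmless re-encode, while the content-based fingerprint stays the same for the re-encode and moves only when the picture itself changes.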
To make sure these neural network fingerprints themselves are not compromised, Archangel stores them on a permissioned blockchain maintained by the national archives participating in the trial program. Adding records to the archive requires consensus among the participating organizations. This ensures that no single party can unilaterally decide which videos are authentic. Once Archangel launches publicly, anyone will be able to run a video against the neural networks to check its authenticity.
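The two properties described here, consensus before a record is added and tamper-evidence afterward, can be sketched with a small hash-chained ledger. Everything in this example is a simplification for illustration: the class name, the archive names, and the quorum rule are hypothetical, approvals are plain strings rather than cryptographic signatures, and the ledger lives in one process instead of being replicated across organizations.

```python
import hashlib
import json
from typing import Dict, List, Set

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


class PermissionedLedger:
    """Append-only list of fingerprints; only known members can approve."""

    def __init__(self, members: Set[str], quorum: int):
        self.members = set(members)
        self.quorum = quorum
        self.blocks: List[Dict] = []

    @staticmethod
    def _digest(body: Dict) -> str:
        encoded = json.dumps(body, sort_keys=True).encode()
        return hashlib.sha256(encoded).hexdigest()

    def add_record(self, fingerprint: str, approvals: Set[str]) -> bool:
        """Append a record only if enough known members approved it."""
        valid = self.members & set(approvals)  # ignore outsiders
        if len(valid) < self.quorum:
            return False
        body = {
            "fingerprint": fingerprint,
            "approved_by": sorted(valid),
            "prev": self.blocks[-1]["hash"] if self.blocks else GENESIS,
        }
        self.blocks.append({**body, "hash": self._digest(body)})
        return True

    def is_intact(self) -> bool:
        """Recompute every hash to detect edits to stored records."""
        prev = GENESIS
        for block in self.blocks:
            body = {k: v for k, v in block.items() if k != "hash"}
            if block["prev"] != prev or self._digest(body) != block["hash"]:
                return False
            prev = block["hash"]
        return True


archives = {"archive_a", "archive_b", "archive_c"}
ledger = PermissionedLedger(archives, quorum=3)

print(ledger.add_record("fp-1", {"archive_a", "outsider"}))  # too few approvals
print(ledger.add_record("fp-1", archives))                   # all members agree
print(ledger.is_intact())
```

Changing a stored fingerprint after the fact breaks the hash chain, so `is_intact()` would return `False` for a ledger that has been quietly edited.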
The downside of this method is that it requires a trained neural network per video. This can limit its use because training neural networks takes hours and requires considerable computing power. It is nonetheless suitable for sensitive videos such as Congressional records and speeches by high-profile figures that are more likely to become the subject of tampering.
A cat-and-mouse game
While it’s comforting to see these and other efforts help protect elections and individuals against deepfakes, they are up against a fast-developing technology. As deepfakes continue to become more sophisticated, it’s unclear whether defense and detection methods will be able to keep up.
Ben Dickson is a software engineer and the founder of TechTalks, a blog that explores the ways technology is solving and creating problems.