SenseTime researchers create a benchmark to test face forgery detectors

Face swapping is a category of deepfakes that extracts the faces of people in existing media and replaces them with other people's features, typically with AI and machine learning. It's been popularized by apps like MixBooth and Snapchat, and while the underlying techniques have enabled sophisticated image editing for legitimate purposes, they've also given rise to concerns about potential misuse or abuse.

Various groups have compiled manipulated media to support the development of face swapping detection methods, but the samples that have been released so far are relatively few in number or overly artificial. That's why researchers from SenseTime Research, the R&D division of Hong Kong-based tech startup SenseTime, partnered with Nanyang Technological University in Singapore to design a new large-scale benchmark for face forgery detection. They call it DeeperForensics-1.0, and they say it's the largest corpus of its kind with over 60,000 videos containing roughly 17.6 million frames.

According to the researchers, all source videos in DeeperForensics-1.0 were carefully selected for their quality and diversity. They're ostensibly more realistic than those in other data sets in that they're closer in kind to real-world detection scenarios, and in that they contain compression, blurriness, and transmission artifacts matching those found in the wild.

To build DeeperForensics-1.0, the researchers collected face data from 100 paid male and female actors of 26 different nationalities and ranging in age from 20 to 45, all of whom were instructed to turn their heads in nine lighting conditions and speak naturally with over 53 expressions. They ran these through an AI framework -- DeepFake Variational AutoEncoder, or DF-VAE -- using 1,000 YouTube videos as target videos, where each of the 100 actors' faces was swapped onto 10 targets. And they deliberately distorted each video in 35 different ways to simulate real-world scenarios, such that the final data set contained 50,000 unmanipulated videos and 10,000 manipulated videos.

"We find that the source faces play a much more critical role than the target faces in building a high-quality data set," wrote the researchers in a preprint paper detailing their work. "Specifically, the expressions, poses, and lighting conditions of source faces should be much richer in order to perform robust face swapping."

The researchers also created what they call a "hidden" test set within DeeperForensics-1.0 -- a set of 400 videos carefully selected to better imitate fake videos in real scenes. Curating the set involved collecting fake videos generated by unknown face-swapping methods and obscuring them with distortions commonly seen in real scenes, and subsequently choosing only videos that fooled at least 50 out of 100 human observers in a user study.
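The selection rule for the hidden set amounts to a simple threshold filter. The snippet below is a minimal sketch of that idea, not the researchers' code: the `Candidate` record, the `select_hidden_set` function, and the sample video IDs and vote counts are all hypothetical, and only the 50-out-of-100 threshold comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A fake video plus hypothetical user-study results."""
    video_id: str
    fooled: int           # observers who judged the fake video to be real
    observers: int = 100  # size of the user study described in the article


def select_hidden_set(candidates, min_fooled=50):
    """Keep only videos that fooled at least `min_fooled` observers."""
    return [c for c in candidates if c.fooled >= min_fooled]


# Made-up example data, for illustration only.
candidates = [
    Candidate("clip_a", fooled=72),
    Candidate("clip_b", fooled=49),
    Candidate("clip_c", fooled=50),
]
selected = select_hidden_set(candidates)
selected_ids = [c.video_id for c in selected]
```

In this made-up example, a video that fooled exactly 50 observers is kept because the threshold is inclusive ("at least 50"), while one that fooled 49 is dropped.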

To evaluate the quality of DeeperForensics-1.0 compared with other publicly available data sets, the researchers tasked 100 experts in computer vision with ranking the quality of a subset of videos contained within it. They report that DeeperForensics-1.0 came out ahead on average in terms of realism for its scale compared with FaceForensics++, Celeb-DF, and other popular deepfake detection corpora.

In future work, the research team intends to expand DeeperForensics gradually and work with the research community toward identifying evaluation metrics for face forgery detection methods.

The fight against deepfakes appears to be ramping up. Last summer, members of DARPA's Media Forensics program tested a prototypical system that could automatically detect AI-generated videos in part by looking for cues like unnatural blinking. Startups like Truepic, which raised an $8 million funding round in July, are experimenting with deepfakes "detection-as-a-service" business models. And in December 2019, Facebook together with the Partnership on AI, Microsoft, and academics launched the Deepfake Detection Challenge, which will offer millions of dollars in grants and awards to spur the development of deepfake-detecting systems.
