A system that can produce convincing photos out of whole cloth is the holy grail of image synthesis, and thanks to artificial intelligence (AI), researchers at Google subsidiary DeepMind and Heriot-Watt University believe they’ve come pretty darn close to creating it.
In a paper published this week on the preprint server arXiv.org (“Large Scale GAN Training for High Fidelity Natural Image Synthesis”), they describe AI that approximates food pics, landscapes, portraits, and candid pet photos with impressive consistency. In some cases, the novel samples are almost impossible to distinguish from those taken with a camera.
“In this work, we set out to close the gap in fidelity and variety between images generated by [our AI system] and real-world images from [our] dataset,” the researchers wrote. “We find that current … techniques are sufficient to … dramatically improve the state of the art.”
The trick was employing large, highly optimized generative adversarial networks (GANs), two-part neural networks consisting of a generator that produces samples and a discriminator that attempts to distinguish the generated samples from real-world ones. The team’s system, which they dubbed “BigGANs,” benefited from architectural tweaks, an increased batch size (2,048 images), and four times as many parameters (158 million), the learned values that determine the model’s behavior, compared to prior art.
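To make the generator-versus-discriminator tug-of-war concrete, here is a minimal sketch of the textbook GAN objective in plain Python. It assumes the discriminator outputs a probability between 0 and 1 that a sample is real; the function names are hypothetical, and this is the classic cross-entropy formulation rather than necessarily the exact loss the BigGAN authors trained with.

```python
import math


def discriminator_loss(real_scores, fake_scores):
    """Cross-entropy loss for the discriminator.

    real_scores: discriminator outputs in (0, 1) for real images.
    fake_scores: discriminator outputs in (0, 1) for generated images.
    The loss is low when real images score near 1 and fakes near 0.
    """
    real_term = sum(-math.log(s) for s in real_scores) / len(real_scores)
    fake_term = sum(-math.log(1.0 - s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def generator_loss(fake_scores):
    """Non-saturating generator loss.

    The loss is low when the discriminator scores generated images near 1,
    i.e. when the generator succeeds in fooling it.
    """
    return sum(-math.log(s) for s in fake_scores) / len(fake_scores)


# A discriminator that separates real from fake well has a lower loss
# than one that outputs 0.5 for everything.
confident = discriminator_loss([0.9, 0.8], [0.1, 0.2])
fooled = discriminator_loss([0.5, 0.5], [0.5, 0.5])
```

In training, the two losses are minimized in alternation, so each network improves only by exploiting the other’s weaknesses.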
An expanded number of channels (a proxy for model capacity) led to further gains, as did “truncation,” a technique that narrows the range of random inputs fed to the generator, trading some variety for images closer in appearance to those in the training dataset.
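The paper describes truncation as resampling any value in the generator’s random input vector whose magnitude exceeds a chosen threshold. A minimal sketch of that sampling step, assuming a standard normal input distribution, might look like the following; the function name and arguments are hypothetical, not taken from the authors’ code.

```python
import random


def truncated_latent(dim, threshold, rng=None):
    """Sample a latent vector with every entry in [-threshold, threshold].

    Each entry is drawn from a standard normal distribution and redrawn
    until its magnitude is within the threshold. Smaller thresholds keep
    inputs closer to zero, which favors fidelity over variety.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    rng = rng or random.Random()
    z = []
    for _ in range(dim):
        value = rng.gauss(0.0, 1.0)
        while abs(value) > threshold:
            value = rng.gauss(0.0, 1.0)  # reject and resample
        z.append(value)
    return z


# Example: a 128-dimensional input truncated at 0.5 (seeded for repeatability).
z = truncated_latent(128, 0.5, random.Random(0))
```

Lowering the threshold is the knob: very small values yield near-identical, typical-looking outputs, while large values recover the untruncated distribution.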
When it came to training the model, the researchers turned to ImageNet, an image database maintained by Stanford and Princeton. And to confirm their design was scalable, they next tapped JFT-300M, a dataset of 300 million real-world images labeled with 18,000 categories (two orders of magnitude larger than ImageNet).
All told, BigGANs took two days to train on 128 of Google’s Tensor Processing Units (TPUs), application-specific integrated circuits (ASICs) developed for machine learning. The results speak for themselves: The model achieved an Inception Score (IS) of 166.3 and a Fréchet Inception Distance (FID) of 9.6, two metrics used to evaluate generative models, improving on the previous best of 52.52 and 18.65, respectively. (Higher is better for IS; lower is better for FID.)
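For readers curious how a number like the Inception Score comes about, here is a minimal sketch. It assumes you already have, for each generated image, the class probabilities predicted by an image classifier (in practice, the Inception network); the inputs below are made-up toy values, and the function name is hypothetical.

```python
import math


def inception_score(probs):
    """Compute exp(mean KL(p(y|x) || p(y))) over a set of images.

    probs: list of per-image class-probability lists, each summing to 1.
    The score rises when each image is classified confidently (fidelity)
    and the predicted classes are spread evenly across images (variety).
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    total_kl = 0.0
    for p in probs:
        for j in range(k):
            if p[j] > 0:  # terms with zero probability contribute nothing
                total_kl += p[j] * math.log(p[j] / marginal[j])
    return math.exp(total_kl / n)


# Toy cases with two classes: confident and varied vs. uniformly unsure.
sharp_and_varied = inception_score([[1.0, 0.0], [0.0, 1.0]])
unsure = inception_score([[0.5, 0.5], [0.5, 0.5]])
```

FID works differently: it compares statistics of classifier features computed on real and generated images, which is why a lower value indicates a closer match.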
It’s not the first time GANs have been used to spin images from thin air. In September, Nvidia researchers developed an AI system that produces synthetic scans of brain cancer, and in August, a team at Carnegie Mellon demonstrated a model that could transfer a person’s recorded motion and facial expressions to a target subject in another photo or video.
Deepfakes (a portmanteau of “deep learning” and “fake”), a term that in some circles has come to refer to any AI-manipulated photo or video intended to deceive, have lawmakers taking notice. In September, members of Congress sent a letter to Director of National Intelligence Dan Coats asking for a report from intelligence agencies about the potential impact of deepfakes on democracy and national security.
Following the proliferation of deepfakes in the past year and fears of their potential impact, a number of efforts are underway to create AI capable of identifying them. In July, members of DARPA’s Media Forensics program embarked on exercises to automatically detect deepfakes or manipulated images and videos. And startups like Truepic, which raised an $8 million funding round in July, aim to offer deepfake detection services for a fee.