Researchers at MIT, Adobe Research, and Tsinghua University say they’ve developed a method, differentiable augmentation (DiffAugment), that improves the data efficiency of generative adversarial networks (GANs) by augmenting both real and fake data samples. In a preprint paper, they claim the technique stabilizes the networks during training, enabling them to generate high-fidelity images from as few as 100 examples without pretraining and to achieve state-of-the-art performance on popular benchmarks.

GANs — two-part AI models consisting of a generator that creates samples and a discriminator that attempts to distinguish the generated samples from real-world samples — have demonstrated impressive feats of media synthesis. Top-performing GANs can create realistic portraits of people who don’t exist, for instance, or snapshots of fictional apartment buildings. But their success so far has come at the cost of considerable computation and data: GANs rely heavily on large quantities of diverse, high-quality training samples, typically tens of thousands, and in some cases collecting such large-scale data sets takes months or years plus substantial annotation costs, if it’s possible at all.
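The adversarial setup described above can be sketched with toy stand-ins for the two networks. The `generator` and `discriminator` below are hypothetical placeholders rather than trained models; they exist only to show the two opposing objectives.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two networks: the "generator" maps noise to
# samples, the "discriminator" maps a sample to a probability of being
# real. Both are hypothetical placeholders, not trained models.
def generator(z):
    return 2.0 * z + 1.0                 # pretend generated samples

def discriminator(x):
    return 1.0 / (1.0 + np.exp(-x))      # sigmoid "realness" score

real = rng.normal(1.0, 0.5, size=8)      # samples from the data
fake = generator(rng.normal(size=8))     # samples from the generator

# Standard GAN objectives: the discriminator tries to score real high
# and fake low; the generator tries to make fakes score high.
d_loss = -np.mean(np.log(discriminator(real)) + np.log(1.0 - discriminator(fake)))
g_loss = -np.mean(np.log(discriminator(fake)))
```

Training alternates gradient steps on `d_loss` and `g_loss`, which is what makes the procedure adversarial: each network improves only by exploiting the other's weaknesses.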

As alluded to earlier, the researchers’ technique applies the same augmentations to both real images from the training data and fake images produced by the generator. (If the method augmented only the real images, the GAN would learn the augmented data distribution rather than the original one.) DiffAugment randomly shifts the images within their frame, masks them with a random square half the image size, and simultaneously adjusts the images’ brightness, color, and contrast values.
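Those three transform families can be sketched roughly in NumPy as below. This is illustrative only: the actual DiffAugment implements the transforms as differentiable tensor operations inside the training graph so that gradients flow back to the generator, and the function name and parameter ranges here are assumptions, not the paper's exact settings.

```python
import numpy as np

def diff_augment(x, rng):
    """Sketch of DiffAugment's transform families on a batch of images.
    x: array of shape (N, H, W, C), values roughly in [0, 1]."""
    n, h, w, c = x.shape
    # Color: random brightness, saturation, and contrast per image.
    x = x + rng.uniform(-0.5, 0.5, size=(n, 1, 1, 1))             # brightness
    ch_mean = x.mean(axis=3, keepdims=True)
    x = (x - ch_mean) * rng.uniform(0.0, 2.0, size=(n, 1, 1, 1)) + ch_mean  # saturation
    img_mean = x.mean(axis=(1, 2, 3), keepdims=True)
    x = (x - img_mean) * rng.uniform(0.5, 1.5, size=(n, 1, 1, 1)) + img_mean  # contrast
    # Translation: shift each image by up to 1/8 of its size, zero-padded.
    shift = rng.integers(-h // 8, h // 8 + 1, size=(n, 2))
    out = np.zeros_like(x)
    for i in range(n):
        dy, dx = shift[i]
        out[i,
            max(0, dy):h - max(0, -dy),
            max(0, dx):w - max(0, -dx)] = x[i,
                                            max(0, -dy):h - max(0, dy),
                                            max(0, -dx):w - max(0, dx)]
    x = out
    # Cutout: zero out a random square half the image size.
    for i in range(n):
        cy, cx = rng.integers(0, h), rng.integers(0, w)
        x[i, max(0, cy - h // 4):min(h, cy + h // 4),
             max(0, cx - w // 4):min(w, cx + w // 4)] = 0.0
    return x

aug = diff_augment(np.full((2, 8, 8, 3), 0.5), np.random.default_rng(0))
```

The key point is that the same `diff_augment` is applied before the discriminator sees either kind of image, i.e. the discriminator scores `diff_augment(real)` and `diff_augment(generator(z))`, so neither network can tell real from fake by the presence of augmentation artifacts.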


Above: With DiffAugment, GANs can generate high-fidelity images using only 100 Obama portraits, grumpy cats, or pandas from a data set. The cats and dogs were generated using 160 and 389 images, respectively.

In experiments conducted on the open source ImageNet and CIFAR-100 corpora, the researchers applied DiffAugment to two leading GANs: DeepMind’s BigGAN and Nvidia’s StyleGAN2. With pretraining, they report that on CIFAR-100 their method improved all the baselines by a “considerable margin,” independently of architecture, as measured by Fréchet Inception Distance (FID), a metric that feeds real and generated photos through a pretrained image recognition network and measures how similar the two sets’ extracted features are. More impressively, without pretraining and using only 100 images, the GANs achieved results on par with existing transfer learning algorithms in several image categories (namely “panda” and “grumpy cat”).
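Concretely, FID models each set of extracted features as a Gaussian and computes the distance between the two Gaussians. A simplified version, assuming diagonal covariances to avoid the matrix square root in the full formula, can be written as follows; the random vectors stand in for features that would, in practice, come from an Inception network.

```python
import numpy as np

def fid_diagonal(feats_real, feats_gen):
    """Fréchet distance between two feature sets, each modeled as a
    Gaussian with diagonal covariance. The full FID uses the matrix
    square root of the covariance product; this elementwise version
    is an illustrative simplification."""
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    var1, var2 = feats_real.var(axis=0), feats_gen.var(axis=0)
    return (np.sum((mu1 - mu2) ** 2)
            + np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2)))

# Random vectors standing in for Inception features of two image sets.
feats_a = np.random.default_rng(0).normal(0.0, 1.0, size=(500, 16))
feats_b = np.random.default_rng(1).normal(0.5, 1.0, size=(500, 16))
score_same = fid_diagonal(feats_a, feats_a)   # identical sets: ~0
score_diff = fid_diagonal(feats_a, feats_b)   # shifted set: larger
```

Lower is better: identical distributions score near zero, and the score grows as the generated features drift from the real ones, which is why a “considerable margin” on FID indicates visibly better samples.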

“StyleGAN2’s performance drastically degrades given less training data. With DiffAugment, we are able to roughly match its FID and outperform its Inception Score (IS) using only 20% training data,” the coauthors wrote. “Extensive experiments consistently demonstrate its benefits with different network architectures, supervision settings, and objective functions. Our method is especially effective when limited data is available.”

The code and models are freely available on GitHub.