We’ve all been there: You’re standing next to a smiling group of friends, ready with your smartphone to snap the perfect group selfie, but you’re forced to take a step back because your exposure settings are out of whack. Everything’s far too bright or too dim, and your camera’s finicky automatic settings aren’t helping.

Researchers at Chinese smartphone giant Xiaomi describe a solution to the exposure dilemma in a new paper (“DeepExposure: Learning to Expose Photos with Asynchronously Reinforced Adversarial Learning”) accepted at NeurIPS 2018 in Montreal. In it, they outline an AI system capable of segmenting an image into multiple “sub-images,” each associated with a local exposure, that it subsequently uses to retouch the original input photo.

“The accurate exposure is the key of capturing high-quality photos in computational photography, especially for mobile phones that are limited by sizes of camera modules,” the researchers wrote. “Inspired by luminosity masks usually applied by professional photographers, in this paper, we develop a novel algorithm for learning local exposures with deep reinforcement adversarial learning.”

The AI pipeline, dubbed DeepExposure, kicks things off with image segmentation. Next comes an “action-generating” stage, during which the low-resolution input image, the sub-images, and a direct fusion of those images are concatenated and processed by a policy network that computes local and global exposures for them. After the images pass through local and global value filters, the model completes a finishing step in which a value function evaluates the overall quality. Finally, the sub-images are blended together with the input image.
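
To make the segment-then-expose idea concrete, here is a minimal sketch in plain Python. It is not the researchers’ code: the function names, the luminance-threshold segmentation, and the exposure model (scaling each pixel by 2 raised to the number of stops) are all assumptions chosen for illustration, and the per-segment exposures are fixed by hand rather than chosen by a policy network.

```python
# Hypothetical sketch; none of these names come from the DeepExposure paper.

def segment_by_luminance(image, thresholds):
    """Split a grayscale image (rows of floats in [0, 1]) into binary masks,
    one per luminance band, loosely mimicking luminosity masks."""
    edges = [0.0] + list(thresholds) + [float("inf")]
    masks = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masks.append([[1.0 if lo <= p < hi else 0.0 for p in row] for row in image])
    return masks


def apply_exposure(pixel, stops):
    """Assumed exposure model: scale by 2**stops, then clip to [0, 1]."""
    return min(1.0, max(0.0, pixel * 2.0 ** stops))


def retouch(image, thresholds, local_stops, global_stops=0.0):
    """Apply one exposure per segment, plus a global exposure, and recombine."""
    masks = segment_by_luminance(image, thresholds)
    out = []
    for y, row in enumerate(image):
        out_row = []
        for x, p in enumerate(row):
            # Exactly one mask is 1.0 at each pixel, so this selects
            # the exposure assigned to that pixel's segment.
            stops = sum(m[y][x] * s for m, s in zip(masks, local_stops))
            out_row.append(apply_exposure(p, stops + global_stops))
        out.append(out_row)
    return out


# Brighten the shadows by one stop and darken the highlights by half a stop.
image = [[0.1, 0.2], [0.6, 0.9]]
result = retouch(image, thresholds=[0.5], local_stops=[1.0, -0.5])
```

In the system the article describes, the exposure values would come from the learned policy network, and the blending step is more involved than this per-pixel selection.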

The neural networks at play here are of the generative adversarial network (GAN) variety. Broadly speaking, GANs are two-part neural networks consisting of generators that produce samples and discriminators that attempt to distinguish between the generated samples and real-world samples. To train the discriminator, the researchers randomly chose small batches of machine-retouched and expert-retouched photos; contrast, illumination, and saturation features were extracted from each photo and concatenated with the RGB image to form the input.
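
The feature concatenation described above might look something like the following sketch. The article does not say how the three features are computed, so the definitions here are stand-ins: illumination is taken as Rec. 601 luma, saturation as HSV-style saturation, and contrast as each pixel’s distance from the mean luma. The function names are hypothetical.

```python
# Hypothetical sketch; the feature definitions are assumptions, not the paper's.

def luma(r, g, b):
    """Illumination proxy: Rec. 601 luma."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def hsv_saturation(r, g, b):
    """Saturation proxy: (max - min) / max, as in the HSV color model."""
    hi, lo = max(r, g, b), min(r, g, b)
    return 0.0 if hi == 0 else (hi - lo) / hi


def discriminator_input(image):
    """image: rows of (r, g, b) tuples in [0, 1].
    Returns rows of 6-tuples: (r, g, b, contrast, illumination, saturation)."""
    lumas = [[luma(*px) for px in row] for row in image]
    count = sum(len(row) for row in lumas)
    mean_luma = sum(sum(row) for row in lumas) / count
    out = []
    for row, luma_row in zip(image, lumas):
        out.append([
            # Contrast proxy: distance of this pixel's luma from the image mean.
            (r, g, b, abs(l - mean_luma), l, hsv_saturation(r, g, b))
            for (r, g, b), l in zip(row, luma_row)
        ])
    return out


image = [[(1.0, 0.0, 0.0), (0.5, 0.5, 0.5)]]
features = discriminator_input(image)
```

Each pixel ends up with six channels instead of three, which is the kind of stacked input a discriminator could then consume.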

They “taught” the AI system in Google’s TensorFlow framework on an Nvidia Tesla P40 GPU, and their corpus of choice was MIT-Adobe FiveK, a dataset containing 5,000 RAW photos (i.e., files with minimally processed data from the image sensor), each paired with retouched versions produced by five experts. Specifically, they used 2,000 unretouched images and 2,000 retouched images for training, and 1,000 RAW images for testing.
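
A split along those lines could be sketched as follows. The article gives only the counts, so the random shuffle, the seed, the assumption that the three groups come from disjoint photos, and the names `split_fivek`, `train_raw`, `train_retouched`, and `test_raw` are all illustrative.

```python
import random


def split_fivek(num_photos=5000, seed=0):
    """Partition photo indices into the three groups the article mentions.
    Assumes the groups are disjoint, which the counts (2,000 + 2,000 + 1,000)
    suggest but the article does not state outright."""
    ids = list(range(num_photos))
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    return {
        "train_raw": ids[:2000],            # unretouched inputs
        "train_retouched": ids[2000:4000],  # expert-retouched targets
        "test_raw": ids[4000:5000],         # held-out RAW photos
    }


splits = split_fivek()
```

Because the raw and retouched training images come from different photos under this assumption, the model never sees a before-and-after pair, which fits an adversarial setup where the discriminator judges style rather than pixel-level matches.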

In the researchers’ tests, DeepExposure outperformed state-of-the-art algorithms in key metrics, managing to consistently restore most details and styles in original images while enhancing brightness and colors.

“[Our] method bridges deep-learning methods and traditional methods of filtering: Deep-learning methods serve to learn parameters of filters, which makes more precise filtering of traditional methods,” the team wrote. “And traditional methods reduce the training time of deep-learning methods because filtering pixels is much faster than generating pixels with neural networks.”