
AI and machine learning hold incredible promise for post-production image editing, if new research is any indication. Engineers at Nvidia recently demoed an AI system — GauGAN — that creates convincingly lifelike landscape photographs from whole cloth, while Microsoft scientists last month proposed a framework capable of producing images and storyboards from natural language captions.

But what about AI that paints with common sense? Thanks to an enterprising team at the MIT-IBM Watson AI Lab, a collaboration between MIT and IBM to jointly pursue AI techniques over the next decade, such a system has made its way from the literature to the web. A publicly available tool, GAN Paint Studio, lets users upload any photograph and edit the appearance of depicted buildings, flora, and fixtures to their heart’s content. Impressively, it generalizes well enough that inserting a new object with one of the built-in tools realistically affects nearby objects (for instance, trees in the foreground occlude structures behind them).

“Right now machine learning systems are these black boxes that we don’t always know how to improve, kind of like those old TV sets that you have to fix by hitting them on the side,” said David Bau, a PhD student at MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) and a lead author on a related paper about the system. “This research suggests that, while it might be scary to open up the TV and take a look at all the wires, there’s going to be a lot of meaningful information in there.”

GAN Paint Studio

Above: Examples of edits performed by GAN Paint Studio.

So how’s it work? Given a photo as input, the machine learning system underlying GAN Paint Studio first rerenders it by finding a latent representation from which it can generate a photo that’s nearly identical to the original. As users tap the tool’s collection of image-editing settings to transform their photo, the system updates the latent representation according to each edit and renders the modified representation.
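To make the invert, edit, and rerender loop concrete, here is a minimal sketch in plain Python. It is not the researchers' implementation: the tiny linear `generate` function, the `WEIGHTS` matrix, and the choice to treat the first latent dimension as an "edit" are all assumptions made for illustration, standing in for a deep generator network.

```python
# Toy stand-in for a GAN generator: 4 "pixels" produced from a 2-number latent vector.
# WEIGHTS is a made-up matrix chosen for illustration, not anything from the real system.
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]

def generate(z):
    """Render a latent vector z into a flat list of pixel values."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in WEIGHTS]

def invert(target, steps=200, lr=0.05):
    """Search for a latent vector whose rendering matches `target`,
    using gradient descent on the squared reconstruction error."""
    z = [0.0] * len(WEIGHTS[0])
    for _ in range(steps):
        residual = [p - t for p, t in zip(generate(z), target)]
        # Gradient of sum(residual**2) with respect to z[j] is 2 * sum_i W[i][j] * residual[i].
        grad = [2 * sum(WEIGHTS[i][j] * residual[i] for i in range(len(WEIGHTS)))
                for j in range(len(z))]
        z = [zj - lr * g for zj, g in zip(z, grad)]
    return z

# A "photo" that this toy generator is able to reproduce.
photo = generate([0.5, -1.0])

z = invert(photo)                  # step 1: find the latent representation
edited = [z[0] + 1.0, z[1]]        # step 2: apply a hypothetical edit in latent space
rerendered = generate(edited)      # step 3: render the modified representation
```

The edit touches a single latent value, yet three of the four output pixels change, which is a loose analogue of how one edit in latent space can affect several parts of the rendered image.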



Developing the model required identifying units inside a GAN that correlate with object types (like doorways). A GAN is a two-part neural network consisting of a generator that produces samples and a discriminator that attempts to distinguish between the generated samples and real-world samples. The researchers tested the units individually to see if eliminating them would cause certain objects to disappear or appear, and over time isolated artifact-causing units to increase overall image quality.
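The ablate-and-measure idea described above can be sketched in a few lines of Python. Everything here is hypothetical: the three-unit "layer", the `PATTERNS` each unit paints, and the `door_score` function (a stand-in for a real object detector) are invented for illustration and are far simpler than the feature maps of an actual generator.

```python
# Each hidden unit paints its own pattern onto a 4-pixel image (made-up values).
ACTIVATIONS = [1.0, 0.8, 0.5]
PATTERNS = [
    [1.0, 1.0, 0.0, 0.0],  # unit 0: lights up pixels 0 and 1
    [0.0, 0.0, 1.0, 1.0],  # unit 1: lights up pixels 2 and 3
    [0.2, 0.0, 0.1, 0.0],  # unit 2: faint contribution to pixels 0 and 2
]

def render(ablated=()):
    """Sum every unit's contribution, skipping any unit listed in `ablated`."""
    image = [0.0] * len(PATTERNS[0])
    for unit, (activation, pattern) in enumerate(zip(ACTIVATIONS, PATTERNS)):
        if unit in ablated:
            continue  # a silenced unit contributes nothing
        for i, value in enumerate(pattern):
            image[i] += activation * value
    return image

def door_score(image):
    """Hypothetical detector: pretend pixels 2 and 3 are where a door appears."""
    return image[2] + image[3]

def rank_units_by_effect(score):
    """Ablate one unit at a time and record how far the score drops."""
    baseline = score(render())
    effects = {unit: baseline - score(render(ablated={unit}))
               for unit in range(len(ACTIVATIONS))}
    return sorted(effects.items(), key=lambda item: item[1], reverse=True)

ranking = rank_units_by_effect(door_score)
door_unit = ranking[0][0]  # the unit whose removal hurts the "door" score most
```

In this toy setup, silencing unit 1 removes almost all of the "door" signal while silencing unit 0 leaves it untouched, which mirrors the kind of evidence the researchers looked for when linking units to objects.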

“Whenever GANs generate terribly unrealistic images, the cause of these mistakes has previously been a mystery,” said paper coauthor and IBM research scientist Hendrik Strobelt. “We found that these mistakes are triggered by specific sets of [units] that we can silence to improve the quality of the image.”

As alluded to earlier, the system learned a few basic rules about the relationships among objects. It won’t put something where it doesn’t logically belong (like a window in the sky), and it creates different visuals depending on the context. For example, asking GAN Paint Studio to add doors to two different buildings won’t result in duplicate doors; they’ll likely look quite different from each other. And that’s just the tip of the iceberg. GAN Paint Studio can “turn on” bedside lamps that were previously switched off, restyle shrubbery for spring or autumn, install windows in apartment interiors, and add rooftop domes to buildings.


The team believes that more sophisticated GAN-powered painting tools could someday enable designers to tweak visuals on the fly and allow computer-graphics editors to quickly compose the arrangements of objects needed for particular pictures. They acknowledge the technology could be misused, of course, but they assert that ongoing research is the best way to prevent abuse.

“You need to know your opponent before you can defend against it,” said coauthor and MIT CSAIL postdoctoral researcher Jun-Yan Zhu. “This understanding may potentially help us detect fake images more easily.”

Zhu has a point. In mid-June, researchers at Adobe and the University of California, Berkeley detailed an AI program that recognizes when Photoshop’s Face Aware Liquify tool has been used to alter facial expressions. (They claim that in tests, it was able to spot manipulations with an accuracy as high as 99%, compared with the average untrained person’s success rate of 53%.) Separately, computer scientists at the University of Southern California’s Information Sciences Institute devised a system that’s able to suss out anomalous movements in fake videos generated by popular AI-powered apps.
