MIT researchers train AI to predict how humans paint works of art

MIT researchers have created an AI tool capable of generating time-lapse videos that predict how human artists use their hands to create watercolor or digital paintings. The AI is trained on time-lapse videos from Vimeo and YouTube of people making art. The probabilistic model can synthesize and predict moments in the painting process from just a single image of a finished artwork.

The network is meant to mimic the ability of skilled human artists to look at a piece of art and infer the series of brush strokes or steps a person took to put it together.

"Artists paint using unique combinations of brushes, strokes, and colors. There are often many possible ways to create a given painting. Our goal is to learn to capture this rich range of possibilities," researchers wrote in a paper describing the AI.

The authors characterize their work as distinct from other forms of precognitive AI that predict future frames in a video. Those approaches tend to focus on physical processes, like a flower blooming or human movement, and make predictions over relatively short time frames.

The data comprises 117 digital painting time-lapse videos, averaging four minutes each, and 116 watercolor painting time-lapse videos, averaging 20 minutes each. Both data sets focus on landscape and still life paintings.
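For a rough sense of scale, the per-video averages above imply a total amount of footage. The following is a back-of-the-envelope sketch in Python, assuming the reported averages hold exactly; the article does not state a total, so the result is only an estimate.

```python
# Figures reported in the article. Totals are estimates because the
# durations are averages, not exact per-video lengths.
digital_videos, digital_avg_minutes = 117, 4
watercolor_videos, watercolor_avg_minutes = 116, 20

digital_total = digital_videos * digital_avg_minutes            # minutes
watercolor_total = watercolor_videos * watercolor_avg_minutes   # minutes
total_hours = (digital_total + watercolor_total) / 60

print(f"digital: ~{digital_total} min, watercolor: ~{watercolor_total} min")
print(f"combined: ~{total_hours:.1f} hours")
```

Under that assumption, the watercolor set accounts for roughly five times as much footage as the digital set, despite containing one fewer video.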

Roughly 150 human evaluators were hired through Amazon's Mechanical Turk as part of the experiment. They compared videos generated by MIT's model with those produced by visual deprojection, a method for recovering missing frames from videos that was introduced at the Conference on Computer Vision and Pattern Recognition (CVPR) in 2019.

"We show that human evaluators almost always prefer our method to an existing video synthesis baseline and often find our results indistinguishable from time-lapses produced by real artists," the paper reads. "To the best of our knowledge, this is the first work that models and synthesizes distributions of videos of the past, given a single final frame."

The approach uses a frame interpolation method to predict the next sequence in generated time-lapse videos, along with various AI style transfer techniques. To curate the video data sets, a convolutional neural network removes any frames that include hands, paintbrushes, or shadows.
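The curation step described above amounts to running a detector over every frame and keeping only the frames it does not flag. The Python sketch below illustrates that filtering pattern only. The names `curate_frames` and `toy_occluder_detector` are hypothetical, and the detector is a simple dark-pixel heuristic standing in for the researchers' convolutional neural network, whose details the article does not give.

```python
from typing import Callable, List, Sequence

# A frame is modeled here as rows of grayscale intensities in [0, 1].
Frame = Sequence[Sequence[float]]


def curate_frames(frames: Sequence[Frame],
                  has_occluder: Callable[[Frame], bool]) -> List[Frame]:
    """Keep only the frames the detector does not flag."""
    return [frame for frame in frames if not has_occluder(frame)]


def toy_occluder_detector(frame: Frame,
                          dark_level: float = 0.2,
                          max_dark_fraction: float = 0.25) -> bool:
    """Hypothetical stand-in for the CNN: flag frames with many dark pixels."""
    pixels = [value for row in frame for value in row]
    if not pixels:
        return False  # nothing to inspect, so nothing to flag
    dark = sum(1 for value in pixels if value < dark_level)
    return dark / len(pixels) > max_dark_fraction


clean = [[0.9, 0.8], [0.7, 0.9]]       # bright canvas, nothing in the way
occluded = [[0.1, 0.05], [0.8, 0.9]]   # half the pixels are dark
kept = curate_frames([clean, occluded, clean], toy_occluder_detector)
print(len(kept))
```

Passing the detector in as a function keeps the filtering logic independent of the model, so a trained classifier could replace the toy heuristic without changing `curate_frames`.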

In other CVPR news, earlier today Microsoft CEO Satya Nadella reiterated the need for facial recognition regulation in prerecorded remarks, following the company's commitment last week to not sell the technology until federal regulation is introduced. A precognition workshop showcasing AI that predicts events based on visual data -- like MIT's painting predictor -- will take place Friday at CVPR 2020.
