Not to be outdone by apps like Prisma and Artisto, Google today unveiled new research that enables an application to apply an artistic style over a video and then switch to different artistic styles on demand. The work, which involves a type of artificial intelligence known as deep learning, suggests that Google wants to advance the state of the art after being inspired by apps that go beyond simple photo filters like those found in Instagram.
The release of the work comes a day after Facebook demonstrated applying styles to live video.
As was the case with Facebook, Google doesn’t have an app to release yet, but the search company says it will soon open-source the code, which will let people try out the technology as part of Google’s TensorFlow deep learning framework.
Google is applying artificial neural networks, the key element in deep learning and a component of a growing number of products from Google and other companies, to an approach called style transfer.
“Unlike previous approaches to fast style transfer, we feel that this method of modeling multiple styles at the same time opens the door to exciting new ways for users to interact with style transfer algorithms, not only allowing the freedom to create new styles based on the mixture of several others, but to do it in real-time,” Google senior research scientist Jon Shlens, Google software engineer Manjunath Kudlur, and former Google Brain intern Vincent Dumoulin wrote in a blog post.
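The article does not spell out how a single network can hold several styles or blend them. One plausible mechanism, assumed here purely for illustration, is to share the network across styles and give each style only its own scale and shift values for a normalization step. Mixing styles then amounts to averaging those values. The sketch below shows the idea for one feature channel; the style names, parameter values, and function names are all hypothetical.

```python
from statistics import fmean, pstdev

# Hypothetical per-style parameters for one feature channel: (scale, shift).
# In a real network these would be learned; the values here are made up.
STYLE_PARAMS = {
    "style_a": (2.0, 0.5),
    "style_b": (0.5, -1.0),
}

def blend_params(weights):
    """Return the weighted average of the per-style (scale, shift) pairs."""
    total = sum(weights.values())
    scale = sum(w * STYLE_PARAMS[name][0] for name, w in weights.items()) / total
    shift = sum(w * STYLE_PARAMS[name][1] for name, w in weights.items()) / total
    return scale, shift

def stylize_channel(values, weights, eps=1e-5):
    """Normalize one channel's activations, then apply the blended scale and shift."""
    mean = fmean(values)
    std = pstdev(values)
    scale, shift = blend_params(weights)
    return [scale * (v - mean) / (std + eps) + shift for v in values]

channel = [0.0, 1.0, 2.0, 3.0]
pure_a = stylize_channel(channel, {"style_a": 1.0})
half_mix = stylize_channel(channel, {"style_a": 0.5, "style_b": 0.5})
```

Because only the small per-style parameter pairs differ, switching or blending styles in this sketch changes a few numbers rather than swapping in a whole new network, which is in the spirit of the real-time mixing the researchers describe.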
The original implementation of style transfer was very slow. After uploading a single photo (not even a video), you would “still have plenty of time to go grab a cup of coffee before a result was available,” Shlens, Kudlur, and Dumoulin wrote. The technique has since been made much faster, fast enough for the real-time use the researchers describe.
The work builds on the DeepDream system from Google that went viral last year. Prisma certainly experienced a surge in popularity this year, and something similar could happen with Google’s new technology — so long as it becomes available relatively soon, while the Prisma idea is still fresh in people’s minds.
That could give Google a short-term gain, but the impact could be more significant in the long term. The trio explain in a paper:
We think this is an important problem that, if solved, would have both scientific and practical importance. First, style transfer has already found use in mobile applications, for which on-device processing is contingent upon the models having a reasonable memory footprint. More broadly, building a separate [network] for each style ignores the fact that individual paintings share many common visual elements and a true model that captures artistic style would be able to exploit and learn from such regularities. Furthermore, the degree to which an artistic styling model might generalize across painting styles would directly measure our ability to build systems that parsimoniously capture the higher level features and statistics of photographs and images (Simoncelli & Olshausen, 2001).
Read the full paper here.
Update on November 1: Google has now open-sourced the code for adding multiple styles to a single image. It’s available on GitHub here. The code for adding multiple styles to video will come later.