Engineers and researchers from Samsung’s AI Center in Moscow and Skolkovo Institute of Science and Technology have created a model that can generate realistic animated talking heads from images without relying on traditional methods, like 3D modeling.
Samsung opened AI research centers last year in Moscow, Cambridge, and Toronto.
“Effectively, the learned model serves as a realistic avatar of a person,” said engineer Egor Zakharov in a video explaining the results.
Well-known faces seen in the paper include Marilyn Monroe, Albert Einstein, Leonardo da Vinci’s Mona Lisa, and RZA from the Wu-Tang Clan. The technology, which focuses on synthesizing photorealistic head images from facial landmarks, could be applied to video games, video conferencing, or digital avatars like the kind now available on Samsung’s Galaxy S10. Facebook is also working on realistic avatars for its virtual reality initiatives.
Such tech could clearly also be used to create deepfakes.
Few-shot learning means the model can begin animating a face using just a few images of a person, or even a single image. Before it can animate previously unseen faces, the model is first meta-trained on VoxCeleb2, a data set of videos.
During training, the system learns three neural networks: an embedder network that maps video frames (together with their facial landmarks) to embedding vectors, a generator network that maps facial landmarks to synthesized video frames, and a discriminator network that assesses the realism and pose of the generated images.
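To make the division of labor concrete, here is a toy Python sketch of how data could flow between the three networks. It is not the researchers' code: each "network" is a single random linear layer, frames and landmark images are short lists of numbers, and combining the K input images by averaging their embeddings is an assumption made for illustration. All names (`Embedder`, `Generator`, `Discriminator`, `FRAME_DIM`, `EMBED_DIM`) are hypothetical.

```python
import math
import random

# Toy sizes (hypothetical): frames and landmark images are flat vectors of
# length FRAME_DIM; person embeddings have length EMBED_DIM.
FRAME_DIM, EMBED_DIM = 4, 3


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class Embedder:
    """Maps a (frame, landmarks) pair to an embedding vector."""

    def __init__(self, rng):
        self.w = random_matrix(EMBED_DIM, 2 * FRAME_DIM, rng)

    def __call__(self, frame, landmarks):
        return [math.tanh(x) for x in matvec(self.w, frame + landmarks)]

    def embed_person(self, frames, landmarks):
        """Average per-frame embeddings over the K available images (assumption)."""
        vecs = [self(f, l) for f, l in zip(frames, landmarks)]
        return [sum(col) / len(vecs) for col in zip(*vecs)]


class Generator:
    """Maps target landmarks plus a person embedding to a synthesized frame."""

    def __init__(self, rng):
        self.w = random_matrix(FRAME_DIM, FRAME_DIM + EMBED_DIM, rng)

    def __call__(self, landmarks, embedding):
        return matvec(self.w, landmarks + embedding)


class Discriminator:
    """Scores how realistic a frame looks for the given landmarks (0 to 1)."""

    def __init__(self, rng):
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(2 * FRAME_DIM)]

    def __call__(self, frame, landmarks):
        z = sum(a * b for a, b in zip(self.w, frame + landmarks))
        return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes the score to (0, 1)


rng = random.Random(0)
emb, gen, disc = Embedder(rng), Generator(rng), Discriminator(rng)

# K = 3 images of one person, with random stand-ins for pixels and landmarks.
K = 3
frames = [[rng.random() for _ in range(FRAME_DIM)] for _ in range(K)]
marks = [[rng.random() for _ in range(FRAME_DIM)] for _ in range(K)]

e_hat = emb.embed_person(frames, marks)                   # one vector per person
target_marks = [rng.random() for _ in range(FRAME_DIM)]  # new pose to render
fake = gen(target_marks, e_hat)                          # synthesized frame
score = disc(fake, target_marks)                         # realism given the pose
print("realism score:", round(score, 3))
```

Conditioning the discriminator on the landmarks, as in this sketch, is what lets it judge pose as well as realism: a frame only scores well if it looks real and matches the requested landmarks.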
“Crucially, the system is able to initialize the parameters of both the generator and the discriminator in a person-specific way so that training can be based on just a few images and done quickly, despite the need to tune tens of millions of parameters. We show that such an approach is able to learn highly realistic and personalized talking head models of new people and even portrait paintings,” coauthors said in a summary of the paper on arXiv.
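The idea of initializing person-specific parameters and then fine-tuning on a few images can be sketched under heavy simplifying assumptions. Here the generator is a toy linear map whose only person-specific parameters are a bias vector; that bias is initialized by projecting the person's embedding through a matrix (the name `proj_p` is hypothetical), and only the bias is updated by gradient descent on K = 2 image pairs. The real system tunes tens of millions of parameters and also personalizes the discriminator, which this sketch omits.

```python
import random

FRAME_DIM, EMBED_DIM = 4, 3  # toy sizes (hypothetical)


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def generate(shared_w, person_bias, landmarks):
    """Toy generator: shared (meta-learned) weights plus a person-specific bias."""
    return [x + b for x, b in zip(matvec(shared_w, landmarks), person_bias)]


def mse(shared_w, person_bias, pairs):
    """Mean squared error over the K (landmarks, real_frame) training pairs."""
    total = 0.0
    for marks, frame in pairs:
        out = generate(shared_w, person_bias, marks)
        total += sum((o - f) ** 2 for o, f in zip(out, frame))
    return total / (len(pairs) * FRAME_DIM)


def fine_tune(shared_w, person_bias, pairs, lr=0.1, steps=20):
    """Gradient descent on the person-specific bias only (a simplification)."""
    bias = list(person_bias)
    scale = 2.0 / (len(pairs) * FRAME_DIM)  # derivative of the MSE above
    for _ in range(steps):
        grad = [0.0] * FRAME_DIM
        for marks, frame in pairs:
            out = generate(shared_w, bias, marks)
            for j in range(FRAME_DIM):
                grad[j] += scale * (out[j] - frame[j])
        bias = [b - lr * g for b, g in zip(bias, grad)]
    return bias


rng = random.Random(1)
shared_w = [[rng.uniform(-0.5, 0.5) for _ in range(FRAME_DIM)] for _ in range(FRAME_DIM)]
proj_p = [[rng.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)] for _ in range(FRAME_DIM)]
e_hat = [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]  # stand-in person embedding

# Person-specific initialization: project the embedding instead of starting from zero.
init_bias = matvec(proj_p, e_hat)

# K = 2 training images of the new person (random stand-ins for real data).
pairs = [([rng.random() for _ in range(FRAME_DIM)],
          [rng.random() for _ in range(FRAME_DIM)]) for _ in range(2)]

before = mse(shared_w, init_bias, pairs)
tuned = fine_tune(shared_w, init_bias, pairs)
after = mse(shared_w, tuned, pairs)
print(f"loss before: {before:.4f}, after: {after:.4f}")
```

The intent of starting from a projected embedding rather than from zero is to begin near a good solution so that only a handful of gradient steps are needed. With random stand-in data, this sketch shows only the mechanics (fine-tuning lowers the reconstruction loss), not realistic faces.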
In other recent work on AI that mimics human faces, University of Washington researchers last year shared how they created ObamaNet, a lip-sync model based on Pix2Pix and trained on videos of the former U.S. president.
And University of California, Berkeley researchers last fall introduced a model that uses YouTube videos to train AI agents to dance or perform acrobatic moves, like backflips.