Nvidia introduces AI for generating video conference talking heads from 2D images

Nvidia AI researchers have introduced an AI model that generates talking heads for video conferences from a single 2D image. The team says the model can perform a wide range of manipulations, from rotating and moving a person's head to motion transfer and video reconstruction.

The AI treats the first frame of a video as a 2D photo, then uses an unsupervised learning method to extract 3D keypoints from the video. In addition to outperforming other approaches in tests on benchmark datasets, the AI achieves H.264-quality video using one-tenth of the bandwidth previously required.

Nvidia research scientists Ting-Chun Wang, Arun Mallya, and Ming-Yu Liu published a paper about the model Monday. Results show the latest AI model outperforms vid2vid, a few-shot GAN detailed in a paper published at NeurIPS last year with Wang listed as lead author and Liu a coauthor.

"By modifying the keypoint transformation only, we are able to generate free-view videos. By transmitting just the keypoint transformations, we can achieve much better compression ratios than existing methods," the paper reads. "By dramatically reducing the bandwidth and ensuring a more immersive experience, we believe this is an important step toward the future of video conferencing."
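To give a rough sense of why sending keypoints is so much cheaper than sending pixels, here is a minimal Python sketch. It is not the researchers' implementation: the keypoint count, the frame size, the toy keypoint layout, and the helper names `rotate_yaw` and `pack_keypoints` are all assumptions made for illustration. It rotates a set of 3D keypoints (a crude stand-in for a "free-view" head turn) and compares the size of the serialized keypoints with one uncompressed RGB frame.

```python
import math
import struct

# Hypothetical values for illustration only; none of them come from the paper.
NUM_KEYPOINTS = 20                      # assumed number of 3D keypoints per frame
FRAME_WIDTH, FRAME_HEIGHT = 512, 512    # assumed frame size in pixels


def rotate_yaw(keypoints, degrees):
    """Rotate 3D keypoints about the vertical (y) axis by the given angle."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in keypoints]


def pack_keypoints(keypoints):
    """Serialize keypoints as little-endian 32-bit floats (x, y, z per point)."""
    flat = [value for point in keypoints for value in point]
    return struct.pack(f"<{len(flat)}f", *flat)


# Toy keypoints laid out along a line; a real system would estimate them per frame.
keypoints = [(i / NUM_KEYPOINTS, 0.5, 0.0) for i in range(NUM_KEYPOINTS)]

# Changing only the keypoint transformation simulates a different viewing angle.
turned = rotate_yaw(keypoints, 30.0)
payload = pack_keypoints(turned)

# One uncompressed 8-bit RGB frame, for scale only.
raw_frame_bytes = FRAME_WIDTH * FRAME_HEIGHT * 3

print(f"keypoint payload: {len(payload)} bytes")
print(f"raw RGB frame:    {raw_frame_bytes} bytes")
```

The comparison here is against a raw frame, not against H.264, so it does not reproduce the bandwidth figures reported for the model. It only shows that a handful of keypoints per frame is a very small payload, with the receiver left to synthesize the actual image.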

The release of the model follows the October debut of Maxine, an Nvidia video conferencing service. In addition to offering virtual backgrounds like Zoom does, Maxine will deliver subtle AI-powered features like face alignment and noise reduction alongside more conspicuous ones, like a conversational AI avatar or live translation.

Video calls on Microsoft Teams and Zoom also use forms of AI to do things like blur backgrounds and power augmented reality animation and effects. The Nvidia paper was published a day before Salesforce acquired Slack for $27 billion, news that could shake up the enterprise communications landscape and fuel the feud between Microsoft Teams and Slack. Microsoft also introduced an update to the Teams calling experience today.

Nvidia is one of the best-known companies working on generative adversarial networks (GANs) like StyleGAN that can blur the line between reality and fakes. Such AI models have potential applications in entertainment and gaming, but also in disinformation and the creation of fake accounts. While widespread concerns that deepfakes would accelerate misinformation ahead of the U.S. presidential election in November were thankfully not borne out, GANs did enter the picture. This fall, Russian state actors used fake profile images generated with GANs as part of an effort to spread propaganda by creating a fake news outlet staffed by actual Russian writers. In a 2019 incident, AI-generated images were used to make a profile for Katie Jones, a fake person used to reach out to Washington, D.C. political influencers and think tank researchers.
