Nvidia trains AI to transform 2D images into 3D models

Nvidia Research has created an AI system that can predict the 3D properties of 2D images without any 3D training data. The work will be presented at the annual Conference on Neural Information Processing Systems (NeurIPS), where researchers from academia and industry share the latest in cutting-edge machine learning. Now in its 33rd year, the conference formerly known as NIPS takes place this week in Vancouver, Canada. With more than 13,000 participants, NeurIPS is the largest AI research conference of the year.

The work, which was conducted by researchers from Vector Institute, University of Toronto, Nvidia Research, and Aalto University, is detailed in the paper "Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer."

For next steps, Nvidia director of AI and paper coauthor Sanja Fidler told VentureBeat in a phone interview that the company may attempt to extend the differentiable rendering framework (DIB-R) to more complex tasks, like rendering 3D models for multiple objects or entire scenes. Such work could have applications in gaming, AR/VR, robotics, or object tracking systems.

"Imagine you can just take a photo and out comes a 3D model, which means that you can now look at that scene that you have taken a picture of [from] all sorts of different viewpoints. You can go inside it potentially, view it from different angles -- you can take old photographs in your photo collection and turn them into a 3D scene and inspect them like you were there, basically," she said.

A number of deep learning projects focused on 3D have already been developed, and Facebook AI Research and Google's DeepMind have also built AI that turns 2D images into 3D. But DIB-R is one of the first neural or deep learning architectures that can take 2D images and predict several key 3D properties together, such as shape, 3D geometry, color, and texture of the object, Fidler said.

"So there [are] quite a few previous works, but none of them really was able to predict all these key properties together. They're either focusing on just predicting geometry or perhaps color, but not ... shape, color, texture, and light. And this really completes -- not [a] fully complete, but [a] much more complete understanding of the object in a scene," she said.

A related work at NeurIPS attempts to predict the shape of people's faces based on the sound of their voice.

"I think this is a very interesting domain," Fidler said. "We didn't tackle it in this particular paper, but in terms of deep learning, it's another interesting input that you can provide to the neural architecture, and you can get really good 3D information. Nowadays, I think that's definitely valid."

DIB-R follows the release earlier this year of Kaolin, an Nvidia 3D deep learning library with a range of models to help people get started on 3D processing with neural nets.

Nvidia will present five papers at NeurIPS and will participate today in a number of co-located workshops, such as Queer in AI, Latinx in AI, Black in AI, and Women in Machine Learning.
