In late 2018, Facebook launched 3D Photos, a feature that leverages depth data to create images that appear three-dimensional and can be examined from different angles using virtual reality (VR) headsets, through Facebook on the web, or through Facebook's mobile apps. It initially required a depth map file on desktop or a dual-camera phone like the Galaxy Note10 or iPhone 11, but starting today, 3D Photos is compatible with any modern handset with a single camera: specifically, an iPhone 7 or newer, or a midrange or better Android device.
Facebook says that “state-of-the-art” machine learning techniques made the expanded phone support possible. Newly deployed AI models can infer the 3D structure of images without depth data, regardless of the images’ ages or origins. It even works with selfies, paintings, and complex scenes. “This advance makes 3D photo technology easily accessible for the first time to the many millions of people who use single-lens camera phones or tablets,” wrote Facebook in a blog post. “It also allows everyone to experience decades-old family photos and other treasured images in a new way, by converting them to 3D.”
Once posted, 3D photos are viewable by any Facebook user, as well as in VR through the Oculus Browser on Oculus Go or Firefox on the Oculus Rift. They can also be shared through Facebook Stories, where they disappear after 24 hours; as with 3D photos shared to the Facebook News Feed, you're able to see who has viewed, reacted to, and responded to them. But restrictions apply: 3D photos can't be edited, combined with other photos in a single post, or added to an album, and if you're posting a 3D photo from a Page, you won't be able to boost it or use it in advertisements.
The (data) science behind 3D Photos
Facebook says that improving 3D Photos required overcoming a range of technical challenges, including (but not limited to) training a model that correctly guesses how objects might look from different perspectives and that can run on typical mobile processors in “a fraction of a second.” The 3D Photos team settled on a convolutional neural network and trained it on millions of pairs of 3D images and their accompanying depth maps, after which they used building blocks inspired by FBNet — a family of models for resource-constrained environments — to optimize the model for mobile devices.
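The training setup described above (pairs of images and depth maps, with a model fitted to predict per-pixel depth) can be illustrated with a deliberately tiny stand-in. The sketch below substitutes a per-pixel linear model for Facebook's convolutional network and fits it by gradient descent on synthetic data; all names, numbers, and the data itself are hypothetical.

```python
import numpy as np

# Toy stand-in for the depth-estimation setup described above: a model is
# trained on (image, depth map) pairs to predict per-pixel depth. Facebook's
# model is a mobile-optimized CNN; here a single per-pixel linear map
# (depth ~ w * intensity + b) is fitted by gradient descent purely to
# illustrate the supervised training loop. Everything here is synthetic.

rng = np.random.default_rng(0)

# Synthetic "dataset": depth is a fixed linear function of pixel intensity.
images = rng.random((100, 8, 8))      # 100 grayscale 8x8 images in [0, 1)
true_w, true_b = -2.0, 3.0            # hidden relation: depth = -2*x + 3
depths = true_w * images + true_b     # ground-truth depth maps

w, b = 0.0, 0.0                       # model parameters to learn
lr = 0.5
for _ in range(200):
    pred = w * images + b             # forward pass
    err = pred - depths               # per-pixel error
    # Gradients of the mean-squared-error loss with respect to w and b
    w -= lr * 2 * np.mean(err * images)
    b -= lr * 2 * np.mean(err)

print(round(w, 2), round(b, 2))       # recovers roughly -2.0 and 3.0
```

The real system replaces the linear map with a deep convolutional encoder-decoder and millions of real image/depth pairs, but the supervised loop (predict, measure error against the depth map, update) has the same shape.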
To find the optimal architecture configuration, the 3D Photos team employed an automated process using an algorithm called ChamNet, which was developed by Facebook AI Research. ChamNet iteratively samples points from a search space to train an accuracy predictor, which accelerates the search for a model that maximizes accuracy while satisfying resource constraints. The search for the model underpinning the new 3D Photos took roughly three days using 800 Nvidia Tesla V100 graphics cards, according to Facebook.
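ChamNet's core idea, using a cheap accuracy predictor trained on a few sampled configurations to guide a search under resource constraints, can be sketched in a few lines. The code below is a toy illustration with invented accuracy and latency models; it is in the spirit of the algorithm, not Facebook's implementation.

```python
import itertools
import random

# Sketch of accuracy-predictor-guided architecture search in the spirit of
# ChamNet (not Facebook's actual code). Candidate configs are (num_layers,
# channel_width) pairs; "training" a candidate is simulated by a hidden
# scoring function, and latency by a simple cost model. A handful of sampled
# configs train a nearest-neighbor accuracy predictor, which then ranks the
# full search space under a latency budget.

random.seed(0)
SEARCH_SPACE = list(itertools.product(range(2, 12), range(8, 72, 8)))

def true_accuracy(layers, width):    # expensive "train and evaluate" stand-in
    return 1.0 - 1.0 / (layers * width / 64.0 + 1.0)

def latency_ms(layers, width):       # toy cost model: cost grows with size
    return 0.4 * layers * width / 8.0

# 1) Sample a few points and record their measured accuracy.
samples = random.sample(SEARCH_SPACE, 12)
measured = {cfg: true_accuracy(*cfg) for cfg in samples}

# 2) Cheap predictor: reuse the accuracy of the nearest sampled config.
def predicted_accuracy(cfg):
    nearest = min(measured,
                  key=lambda s: (s[0] - cfg[0])**2 + (s[1] - cfg[1])**2)
    return measured[nearest]

# 3) Search: maximize predicted accuracy subject to the latency constraint.
BUDGET_MS = 20.0
feasible = [cfg for cfg in SEARCH_SPACE if latency_ms(*cfg) <= BUDGET_MS]
best = max(feasible, key=predicted_accuracy)
print(best, round(latency_ms(*best), 1))
```

The predictor makes the search cheap because most candidates are never actually trained; only the handful of sampled points pay the full evaluation cost, which is why the real search still needed days on hundreds of GPUs rather than months.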
To reduce the number of bytes that had to be transferred to devices on first use, the 3D Photos team quantized the model's weights (the coefficients that connect neurons in a layered AI model) and activations (the functions that determine a model's output, accuracy, and efficiency), mapping their large values to 8-bit integers. (The quantized values require only a quarter of the storage taken up by the original weights and activations.) Quantization-aware training prevented drops in quality by simulating quantization during training, eliminating the gap between training and production, while 8-bit operators (constructs that behave like functions) provided higher throughput than the original, larger model's operators.
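The storage arithmetic behind 8-bit quantization is straightforward to demonstrate: 32-bit floats become 8-bit integers via a scale factor, a 4x reduction, at the cost of a bounded rounding error. The sketch below uses a simple symmetric per-tensor scheme, which is one common approach and not necessarily the exact scheme Facebook used.

```python
import numpy as np

# Minimal sketch of 8-bit weight quantization: float32 values are mapped to
# int8 with a per-tensor scale, cutting storage to a quarter, then mapped
# back ("dequantized") at inference time with a small rounding error. This
# illustrates the storage math only, not Facebook's actual scheme.

rng = np.random.default_rng(1)
weights = rng.normal(0.0, 0.1, size=4096).astype(np.float32)

# Symmetric quantization: map [-max|w|, +max|w|] onto [-127, 127].
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

dequantized = q.astype(np.float32) * scale
max_err = np.abs(weights - dequantized).max()

print(weights.nbytes // q.nbytes)   # 4: each weight shrinks from 4 bytes to 1
print(max_err <= scale / 2 + 1e-6)  # True: rounding error is at most half a step
```

Quantization-aware training addresses the `max_err` term: by applying this round-trip during training, the network learns weights that remain accurate after the precision loss.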
Facebook says that in the future, it intends to apply these techniques to depth estimation for videos taken with mobile devices. Additionally, it plans to explore leveraging depth estimation, surface normal estimation, and spatial reasoning in real-time apps like augmented reality. “Videos pose a noteworthy challenge, since each frame depth must be consistent with the next. But it is also an opportunity to improve performance, since multiple observations of the same objects can provide additional signal for highly accurate depth estimations,” wrote Facebook. “Beyond these potential new experiences, this work will help us better understand the content of 2D images more generally. Improved understanding of 3D scenes could also help robots navigate and interact with the physical world.”
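The frame-to-frame consistency requirement mentioned above is often expressed as an extra loss term. The hypothetical sketch below adds a temporal smoothness penalty (the mean squared difference between consecutive predicted depth maps) to a per-frame accuracy loss; it illustrates the idea only and is not Facebook's method.

```python
import numpy as np

# Hypothetical sketch of a video depth loss: a per-frame accuracy term plus a
# temporal-consistency penalty that discourages depth from jumping between
# consecutive frames. The weighting and formulation are illustrative.

def video_depth_loss(pred_depths, true_depths, weight=0.1):
    """pred_depths, true_depths: arrays of shape (frames, height, width)."""
    per_frame = np.mean((pred_depths - true_depths) ** 2)    # accuracy
    temporal = np.mean(np.diff(pred_depths, axis=0) ** 2)    # smoothness
    return per_frame + weight * temporal

# Three 4x4 predicted depth maps that flicker around a constant true depth.
frames = np.stack([np.full((4, 4), d) for d in (1.0, 1.1, 0.9)])
truth = np.ones((3, 4, 4))
print(round(float(video_depth_loss(frames, truth)), 4))
```

The same multiple-observations property Facebook highlights as an opportunity shows up here: the temporal term gives the model extra signal about each surface from neighboring frames, rather than treating every frame as an independent photo.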