A group of AI researchers from Facebook, Virginia Tech, and National Tsing Hua University in Taiwan say they've created a novel way to generate 3D photos that outperforms Facebook 3D Photos and other existing methods. Facebook 3D Photos launched in October 2018 for dual-camera smartphones like the iPhone X, which uses its TrueDepth camera to determine depth in photos. In the new research, the authors use a range of photos taken with an iPhone to demonstrate how their approach eliminates the blur and discontinuities other 3D methods introduce.
The method could make for better Facebook 3D Photos someday, but if it translates to other settings, it could also lead to more lifelike immersion in 3D digital environments, like virtual games and meetings, or applications in ecommerce or a future metaverse.
The new learning-based method can generate 3D photos from RGB-D imagery, like photos taken with an iPhone X. It also works with simpler 2D photos by using a pretrained depth estimation model. The authors applied their method to historic 20th-century images to demonstrate its effectiveness on 2D input.
The work also claims better performance than Nvidia's Xview and Local Light Field Fusion (LLFF), a model presented last year by a group of authors at the computer graphics conference SIGGRAPH.
Performance of 3D models was assessed using randomly sampled imagery from the RealEstate10K data set. Head-to-head demos of advanced 3D image generation methods are available on a website and in supplementary material created by authors Meng-Li Shih, Shih-Yang Su, Johannes Kopf, and Jia-Bin Huang.
Facebook, Microsoft, and Nvidia have all released tech to generate 3D objects from 2D images in recent months, but the new method relies heavily on inpainting, the process of predicting missing pixels in an image. Inpainting has been used to auto-crop Google Photos videos and to build better unsupervised generative adversarial networks.
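To illustrate the core idea of inpainting, predicting missing pixels from surrounding context, here is a deliberately simple diffusion-style fill. This is a toy sketch, not the learned GAN-based inpainting the paper uses; the function name and averaging strategy are illustrative assumptions.

```python
import numpy as np

def inpaint_naive(image, mask, iterations=50):
    """Fill masked (missing) pixels by repeatedly averaging their neighbors.

    image: 2D float array of pixel intensities.
    mask: boolean array, True where pixels are missing.
    A toy illustration of inpainting; real systems like the paper's use
    learned models (e.g. GANs) rather than simple neighbor averaging.
    """
    out = image.copy()
    out[mask] = 0.0  # zero out the unknown region before filling
    for _ in range(iterations):
        # Average each pixel with its four cardinal neighbors (edge-padded).
        padded = np.pad(out, 1, mode="edge")
        avg = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
               padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        out[mask] = avg[mask]  # only update the missing pixels
    return out
```

Repeated averaging diffuses known values into the hole; on smooth regions this converges to a plausible fill, though unlike a learned model it cannot hallucinate texture or structure.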
The cutting-edge 3D photo approach is detailed in a paper published on the preprint server arXiv. The authors say their work, which was motivated by EdgeConnect, a 2019 model that combines inpainting with generative adversarial networks (GANs), differs in that it applies inpainting to both color and depth value predictions. Another key difference is that the new learning method adapts to local depth complexity and does not require predetermining a fixed number of layers. Both Facebook 3D Photos and the experimental approach introduced in the recent paper rely on a layered depth image (LDI) representation.
“Each LDI pixel stores a color and a depth value. Unlike the original LDI work, we explicitly represent the local connectivity of pixels: Each pixel stores pointers to either zero or at most one direct neighbor in each of the four cardinal directions (left, right, top, bottom),” the paper reads. “Unlike most previous approaches, we do not require predetermining a fixed number of layers. Instead, our algorithm adapts by design to the local depth-complexity of the input and generates a varying number of layers across the image. We have validated our approach on a wide variety of photos captured in different situations.”
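The LDI pixel described in the quote can be sketched as a small data structure: a color, a depth value, and at most one neighbor pointer per cardinal direction. This is a hypothetical illustration of that description, not the authors' actual code; the class and function names are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class LDIPixel:
    """One layered-depth-image pixel, per the paper's description:
    a color, a depth value, and zero-or-one direct neighbor in each
    of the four cardinal directions. Names here are illustrative."""
    color: Tuple[int, int, int]           # (r, g, b)
    depth: float
    left: Optional["LDIPixel"] = None
    right: Optional["LDIPixel"] = None
    top: Optional["LDIPixel"] = None
    bottom: Optional["LDIPixel"] = None

def connect_horizontal(a: "LDIPixel", b: "LDIPixel") -> None:
    """Link two pixels as left/right neighbors; each direction holds
    at most one pointer, so re-linking overwrites any prior neighbor."""
    a.right = b
    b.left = a
```

Because connectivity is explicit and optional, links can simply be left unset across depth discontinuities, which is what lets the representation carry a varying number of layers across the image instead of a fixed layer count.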
The paper was accepted for publication at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), which will take place next month. Initially scheduled for June 16-18 in Seattle, CVPR will, like other major research conferences, move entirely online. According to the AI Index 2019 report, CVPR is one of the largest annual machine learning conferences for AI researchers.