Deep Dive: Why 3D reconstruction may be the next tech disruptor

Artificial intelligence (AI) systems must understand visual scenes in three dimensions to interpret the world around us. For that reason, image data plays an essential role in computer vision, and its richness significantly affects the quality and performance of vision models. Unlike widely available 2D data, 3D data is rich in scale and geometry information, giving machines an opportunity to understand their environment better.

Data-driven 3D modeling, or 3D reconstruction, is a growing computer vision domain increasingly in demand from industries including augmented reality (AR) and virtual reality (VR). Rapid advances in implicit neural representation are also opening up exciting new possibilities for virtual reality experiences.

3D reconstruction generates a 3D representation of an object or scene by combining a sparse set of images taken from arbitrary viewpoints. The method allows for accurate reconstruction of shapes with complex geometries, as well as higher-quality color reconstruction.

3D reconstruction in the era of digital reality

With the rise of digital experiences and emerging virtual concepts such as the metaverse, it’s critical to have tools that can create accurate 3D reconstructions from image data. Real-world applications of this technology allow users to virtually try on clothing while shopping in AR and VR, and help process medical image data. It can also be used for free-viewpoint video reconstruction, robotic mapping, reverse engineering and even reliving memorable moments from various perspectives. According to a SkyQuest survey, the global 3D reconstruction technology market will be worth $1,300 million by 2027.

3D reconstruction is now a priority for tech and ecommerce giants, as it not only lays the groundwork for a future presence in virtual worlds but also provides immediate, tangible business benefits in advertising and social commerce.

Recently, Shopify reported that merchants who add 3D content to their stores see a 94% conversion lift, a far greater impact than videos or photos, because 3D representations provide customers with details that images alone cannot.

To develop 3D reconstruction implementations, intelligent context-understanding systems must recognize an object's geometry as well as its foreground and background to accurately comprehend the depth of scenes and objects depicted in 2D photos and videos. Advanced deep learning techniques and increased availability of large training datasets have led to a new generation of methods for capturing 3D geometry and object structure from one or more images without the need for complex camera calibration procedures.
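
To make the link between depth and 3D geometry concrete, the following is a minimal sketch of how a per-pixel depth estimate can be lifted into 3D points using a pinhole camera model. It is an illustration, not any specific product's method: the function name, the toy depth map, and the intrinsics (fx, fy, cx, cy) are all assumptions chosen for the example. Learning-based systems typically predict the depth (and sometimes the intrinsics) rather than receiving them.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]

def backproject_depth(depth: List[List[float]], fx: float, fy: float,
                      cx: float, cy: float) -> List[Point3D]:
    """Lift each pixel (u, v) with depth z to a camera-space point (x, y, z).

    fx, fy are focal lengths in pixels; cx, cy is the principal point.
    These intrinsics are hypothetical values supplied by the caller.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip pixels without a valid depth estimate
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points

# Toy 2x2 depth map; 0.0 marks a pixel with no depth.
depth_map = [[2.0, 2.0],
             [0.0, 4.0]]
cloud = backproject_depth(depth_map, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

The result is a small point cloud in the camera's coordinate frame, which is the kind of raw geometry that later stages can turn into a mesh.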

How 3D reconstruction is aiding computer vision

Synthesizing 3D data from a single viewpoint is a fundamental function of human vision that computer vision algorithms struggle with. Furthermore, because 3D data is more expensive to acquire than 2D data, it has been difficult to obtain enough textured 3D data to train machine learning models to predict correct textures. To address these challenges, 3D reconstruction solutions aim to combine real and virtual objects seamlessly in AR without requiring large amounts of training data or being limited to a few perspectives.

One representative approach uses an end-to-end deep learning framework that takes a single RGB color image as input and converts the 2D image into a 3D mesh model in camera coordinates. Perceptual features are extracted from the 2D image and leveraged by a graph-based convolutional neural network, which produces a 3D mesh by progressively deforming an ellipsoid until it reaches a semantically correct and optimized geometry. The rough edges in the derived 3D model are then refined using a dense prediction transformer (DPT), which employs vision transformers to provide more fine-grained output.
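
The mesh-deformation idea can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the network described above: it uses an octahedron as a very coarse stand-in for the initial ellipsoid, and a single graph-convolution-style update in which each vertex is mixed with the mean of its neighbors. The scalar weights w_self and w_neigh are hypothetical; in a real graph convolutional network they would be learned matrices acting on per-vertex feature vectors that also include image features.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

def ellipsoid_octahedron(a: float, b: float, c: float):
    """Six vertices on the axes of an ellipsoid with semi-axes a, b, c."""
    verts: List[Vec3] = [(a, 0.0, 0.0), (-a, 0.0, 0.0),
                         (0.0, b, 0.0), (0.0, -b, 0.0),
                         (0.0, 0.0, c), (0.0, 0.0, -c)]
    opposite = {(0, 1), (2, 3), (4, 5)}
    neighbors: Dict[int, List[int]] = {i: [] for i in range(6)}
    # Every vertex is connected to all others except its opposite.
    for i, j in combinations(range(6), 2):
        if (i, j) not in opposite:
            neighbors[i].append(j)
            neighbors[j].append(i)
    return verts, neighbors

def graph_conv_step(verts: List[Vec3], neighbors: Dict[int, List[int]],
                    w_self: float, w_neigh: float) -> List[Vec3]:
    """One update: new_v = w_self * v + w_neigh * mean(neighbor positions)."""
    out: List[Vec3] = []
    for i, v in enumerate(verts):
        nbrs = [verts[j] for j in neighbors[i]]
        mean = tuple(sum(n[k] for n in nbrs) / len(nbrs) for k in range(3))
        out.append(tuple(w_self * v[k] + w_neigh * mean[k] for k in range(3)))
    return out

verts, neighbors = ellipsoid_octahedron(2.0, 1.0, 1.0)
deformed = graph_conv_step(verts, neighbors, w_self=0.5, w_neigh=0.5)
```

With these toy weights the update simply shrinks the shape toward its center; a trained network would instead predict updates that pull the mesh toward the object seen in the image, and would repeat the step over progressively finer meshes.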

Current implementations of 3D reconstruction

Meta recently released Implicitron, a 3D reconstruction architecture that enables fast prototyping and 3D reconstruction of objects. Implicitron uses multiple shape architectures to generate implicit shapes, where a renderer further analyzes the input image to convert the 2D input into a 3D model. To facilitate 3D experimentation, the model includes a plug-in and configuration system that allows users to define component implementations and enhance configurations in-between implementations.
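
To illustrate what "implicit shape plus renderer" means in general, here is a minimal sketch that does not use Implicitron or its API. The shape is defined implicitly by a signed distance function (a sphere, chosen for simplicity), and a basic sphere-tracing loop plays the role of the renderer by finding where a camera ray hits the surface. All names and parameter values are assumptions for this example; real systems replace the hand-written function with a learned neural representation.

```python
import math
from typing import Callable, Optional, Tuple

Vec3 = Tuple[float, float, float]

def sphere_sdf(p: Vec3, center: Vec3 = (0.0, 0.0, 3.0),
               radius: float = 1.0) -> float:
    """Signed distance from p to a sphere: negative inside, positive outside."""
    return math.dist(p, center) - radius

def sphere_trace(origin: Vec3, direction: Vec3,
                 sdf: Callable[[Vec3], float],
                 max_steps: int = 64, eps: float = 1e-4,
                 max_dist: float = 100.0) -> Optional[float]:
    """March along a unit-length ray; return the hit distance or None."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        d = sdf(p)
        if d < eps:      # close enough to the surface: report a hit
            return t
        t += d           # safe step: nothing is closer than d
        if t > max_dist:
            break
    return None

hit = sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), sphere_sdf)
miss = sphere_trace((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), sphere_sdf)
```

Here the first ray points at the sphere and returns a hit distance, while the second points away and returns None. Running one such ray per pixel yields an image of the implicit shape, which is the basic loop a differentiable renderer builds on.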

The open-source model "3Dification" takes a collection of videos as input and reconstructs the environment and scene they show using 3D reconstruction and 3D pose estimation methods, which lets it recalibrate camera angles and frames after the video has been captured. The processed output enables reliable re-identification in cases of challenging shot transitions, where camera viewpoints are not significant enough in certain scenes.

Future opportunities

3D research is critical for teaching systems how to understand all perspectives of objects, even when they are obstructed, hidden or have other optical challenges.

Further development of sustainable and feasible approaches will increase access for larger scientific communities and audiences while improving interoperability. Combining 3D reconstruction with other deep learning capabilities such as tactile sensing and natural language understanding can help AI systems understand three dimensions more intuitively, as humans do.
