Last month, a team of researchers from the University of Washington debuted an experimental technique that cuts characters straight out of a static 2D image and turns them into 3D animations that appear to step out into the real world, using AR as the medium. The Photo Wake-Up method can be applied to photos, posters, or even graffiti art, bringing the embedded Pinocchios to life so that they can freely walk, run, and jump out into reality.
“We believe the method not only enables new ways for people to enjoy and interact with photos, but also suggests a pathway to reconstructing a virtual avatar from a single image while providing insight into the state of the art of human modelling from a single photo,” the team told the MIT Technology Review in an interview in December.
The idea of reconstructing a 3D model from a 2D image is not actually that new. In fact, Andrew Ng’s Stanford 3D Reconstruction Group was toiling on it over a decade ago. There has, however, never been a better time (or a more urgent need) for the solution to emerge than now, and that has everything to do with the content supply problem facing the adolescent VR and AR industry.
Expanding the 3D model toolkit
At the moment, the majority of 3D models for VR and AR content are still created manually by 3D artists, which is time-intensive, costly, and not at all scalable. There is also a shortage of talent, which isn’t helped by the growing popularity of the gig economy. That is partly why a 3D repository like Sketchfab has become so popular as a marketplace that lets artists publish, share, discover, buy, and sell 3D, VR, and AR content.
Still, the creator community urgently needs to wean itself off its dependence on manual labor and make greater use of automation tools that provide shortcuts and “hacks” to speed up and elevate the creation process.
“For true immersive worlds, the best content creators use a plethora of techniques and software to create realistic assets,” says Jan-Michael Tressler, founder and CEO at Trnio. “I’ve seen artists use photogrammetry to capture an object, and use that initial 3D asset as a starting point. From there, the artist will simplify and optimize the asset to work in AR/VR engines.”
3D scanning technologies such as photogrammetry and volumetric capture keep improving in output quality and cutting more and more time from production cycles, making it easier to turn the real world itself into a direct source of 3D models. This kind of tech makes 3D creation look less intimidating to current creators, as well as to fence-sitters and potential newcomers.
“We believe most content creators aren’t exploring 3D content because they don’t have the tools,” says Charles Carriere, founder and President at New Orleans-based Scandy. “The fact is that almost all of the content being created right now is 2D because everyone has access to and experience with 2D tools – the most important being the phone camera. Give these highly-creative Snapchat, Instagram, and YouTube influencers tools to create in 3D and a platform to reach consumers, and you are going to see an explosion in high quality 3D content.”
But the timing is still off. Neither new talent nor advances in 3D scanning tech will deliver a solution fast or substantial enough to remedy the industry’s current shortfall in content supply. Fair or not, consumer and enterprise markets have short attention spans: they expect VR and AR content to be produced and iterated at roughly the pace they are accustomed to with 2D content.
2D-to-3D reconstruction to the rescue
That would be an unreasonable demand were it not for a new lineup of 3D reconstruction techniques, like Photo Wake-Up, that offer something close to a magic bullet: they could turn the entire universe of 2D images, assets, and libraries into a nearly inexhaustible supply of 3D models that can populate immersive worlds to the brim almost automatically. Indeed, they could allow the industry not only to catch up, but perhaps even to trigger a Cambrian explosion of immersive content.
And it is the AR retail and shopping space that will serve as the first beachhead. A team of veteran entrepreneurs, including Apple, Facebook, and PayPal alums and Stanford PhDs, came out of stealth this week as Threedy.ai, a deep tech startup that tackles the 3D content supply problem in what I consider one of the sectors most ripe and thirsty for a solution.
“You would think the manufacturers who created those products must have the 3D models somewhere, right? Turns out that even if one has access to those CAD files, the files can produce the 3D mesh, but it has no texture or material associated with it,” explains Nima Sarshar, co-founder and CEO at Threedy.ai. “There’s no unified AR model creation workflow to help with the fragmented space of 6+ major CAD vendors and 14+ file formats. A whole cottage industry exists for outsourcing manual texture wrapping.”
For example, the Houzz AR app lets you preview a piece of furniture in your home, checking how it fits your room according to specs like size and color. The challenge is that the leading AR shopping apps, such as Houzz, Wayfair, and Overstock, have 3D models for only a small percentage of their inventory. Houzz, for instance, has 3D models for about 3 percent of its dining room furniture category.
“We are aiming to become the ‘Getty Images’ of 3D models for commercial products,” says Sarshar. “Imagine a repository of 3D models for every item in Amazon’s catalog.”
The company’s first product, Threedy Convert, takes ordinary 2D product photos of home goods and furniture and automatically transforms them into high-quality 3D models using proprietary computational geometry and deep learning algorithms. The tech can be applied en masse to a growing number of product categories, often from just a single photo of the product, at a cost nearly two orders of magnitude lower than current solutions.
“Scanning can give you a higher quality, but it is still tedious and expensive. Also downsampling a high-poly model from scanner to a low-poly model suitable for XR is not easy,” says Sarshar. “The other problem is that for many e-commerce sites you simply don’t have the physical object at hand, and you only have limited unstructured photos of the product.”
It might be surprising to see a lower dimension come to the aid of a higher one, but the assist is a most welcome one. The tech isn’t a short-term band-aid; it represents an altogether new channel for “transmigrational” content that has arrived in immersive reality at exactly the right moment, as the VR and AR industry continues its march toward an inflection point that, by my estimation, could come this year.
Amir Bozorgzadeh is CEO at Virtuleap, the startup that enables the body to speak in VR and AR using neuroscience research and machine learning, so that companies and brands can know if their users are excited, angry, or bored out of their minds.