Synthesis AI debuts high-resolution text-to-3D capabilities with synthesis labs

Synthesis AI, a San Francisco-based startup specializing in synthetic data technologies, announced today that it has developed a new way to create realistic 3D digital humans from text prompts.

The company said its new text-to-3D technology, which is showcased in its online platform synthesis labs, uses generative artificial intelligence (AI) and visual effects pipelines to produce high-resolution, cinematic-quality digital humans that can be used for various applications such as gaming, virtual reality, film and simulation.

Synthesis AI claims it is the first company to demonstrate text-to-3D digital human synthesis at such a high level of quality and detail. The technology allows users to input text descriptions of the desired digital human, such as age, gender, ethnicity, hairstyle and clothing, then generate a 3D model that matches the specifications. Users can also edit the 3D model by changing text prompts or using sliders to adjust features like facial expressions and lighting.

The company said its text-to-3D technology was part of its broader mission to support advanced AI applications by providing perfectly labeled synthetic data to train machine learning (ML) models. Synthetic data is artificially generated data that mimics real data but does not contain any personal or sensitive information.

“The text-to-3D capability we’re showcasing in synthesis labs takes a programmatic, API-driven approach as its starting point, adds a dead-simple prompt-based user interface and outputs a high-resolution 3D model that can be used as synthetic data across a broad range of use cases that require digital humans,” Yashar Behzadi, CEO and founder of Synthesis AI, told VentureBeat. “Synthesis labs externalizes some of our research and development work with actual customers.”

This announcement follows the launch of synthesis humans and synthesis scenarios, which represent in-depth offerings of human-centric synthetic data currently available in the market.

Leveraging text-to-3D with generative AI

Synthesis AI has combined generative AI and cinematic VFX pipelines to produce perfectly labeled synthetic data to train ML models. The company says this is the first time text-to-3D digital human synthesis has been demonstrated in high-resolution cinematic quality. It expects the technology to speed up development and reduce the cost of 3D applications in industries including AR/VR, gaming, VFX, smart cities, virtual try-on (VTON), automotive, and industrial and manufacturing simulations.

The creation of 3D models is a multifaceted and intricate process that demands the interplay of several elements, including geometry, meshes, and texture layers. For seasoned gaming and VFX artists, starting with a human model for human-centric characters and scenes has historically been the go-to option. This approach is often faster and more straightforward than building a computer-generated human from scratch.

However, crafting high-quality human models is a challenging feat that requires specialized photogrammetry setups. These setups are designed to capture multiple angles of actual people under controlled settings to create raw 2D images. Images are then meticulously combined using a variety of hand-crafted and optimized tools to ensure optimal quality.

For text-to-3D digital human synthesis, the company took a different approach. It developed in-house models based on diffusion-based generative AI architectures that generate a diverse array of meshes governed by parameters such as gender, age and ethnicity. The texture layers are created by a separate generative model that offers fine-grained, independent control.

Merging these two components produces a complete, high-resolution 3D model.

A 3D model generated from a text prompt. Image source: Synthesis AI
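
The two-stage flow described above (one model for meshes, a separate one for textures, then a merge) can be illustrated with a minimal Python sketch. This is not Synthesis AI's code or API: every name here (`HumanSpec`, `generate_mesh`, `generate_textures`, `synthesize`) is hypothetical, and the two "models" are random stand-ins for the generative models the article mentions. The sketch only shows how keeping the stages separate lets a texture edit leave the geometry untouched.

```python
from dataclasses import dataclass
from itertools import product
import random


@dataclass(frozen=True)
class HumanSpec:
    """Hypothetical container for attributes a prompt might specify."""
    age: int
    gender: str
    ethnicity: str


@dataclass(frozen=True)
class Model3D:
    spec: HumanSpec
    vertices: tuple  # geometry from the stand-in mesh stage
    textures: dict   # named layers from the stand-in texture stage


def generate_mesh(spec: HumanSpec, seed: int) -> tuple:
    """Stand-in for a mesh model: returns 8 slightly jittered box corners."""
    rng = random.Random(f"mesh|{spec}|{seed}")
    return tuple(
        tuple(c + rng.uniform(-0.05, 0.05) for c in corner)
        for corner in product((0.0, 1.0), repeat=3)
    )


def generate_textures(spec: HumanSpec, seed: int, **overrides: float) -> dict:
    """Stand-in for a separate texture model; overrides edit single layers."""
    rng = random.Random(f"tex|{spec}|{seed}")
    layers = {name: rng.random() for name in ("skin", "hair", "clothing")}
    layers.update(overrides)
    return layers


def synthesize(spec: HumanSpec, seed: int = 0, **texture_overrides: float) -> Model3D:
    """Merge the two independently generated components into one model."""
    return Model3D(
        spec,
        generate_mesh(spec, seed),
        generate_textures(spec, seed, **texture_overrides),
    )


base = synthesize(HumanSpec(age=30, gender="female", ethnicity="East Asian"), seed=1)
edited = synthesize(base.spec, seed=1, hair=0.9)  # change only the hair layer
```

In this toy version, `edited` shares its vertices with `base` and differs only in the `hair` layer, which mirrors the independent control over textures that the company describes.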

“Creating a diverse set of humans is further complicated by the logistics of recruiting specific individuals and obtaining waivers,” Synthesis AI’s Behzadi told VentureBeat. “Starting with an inexpensively synthesized digital human is orders of magnitude faster and cheaper than either of those options. The text-to-3D capability enables on-demand generation of high-quality assets, saving weeks of time and thousands of dollars per model.”

The new text-to-3D offerings featured in synthesis labs introduce prompt-based input and editing, making the no-code 3D generative AI capabilities more accessible to all experience levels.

“For starters, prompt-based generation and iteration brings creative power to anyone capable of using a search engine,” said Behzadi. “However, we think the early adopters and power users will be technical artists across all forms of entertainment and media, as well as product managers in industrial and manufacturing software looking to populate 3D simulations with representative digital humans. These are both technical audiences, but likely don’t have advanced ML skills.”

Synthesis AI’s proprietary library of more than 100,000 digital humans (or IDs) is the underlying data used to train the models. The company’s other products, synthesis humans and synthesis scenarios, already draw on this library to supply leading computer vision teams with labeled training data for developing face ID capabilities, driver monitoring, avatars and more.

What's next for Synthesis AI?

The launch of synthesis labs represents a significant milestone in Synthesis AI’s journey to enable enterprise, industrial and public sector customers to simulate reality by synthesizing any person, place or object. Applications include simulation and synthetic data to train computer vision models in VFX, AR/VR, and media and content creation.

The new text-to-3D digital human capabilities will be available to a select group of beta testers starting in Q2 this year.

“Opening up the capability to external users will allow us to leverage community feedback to further refine the underlying generative models,” said Behzadi. “Reinforcement learning from human feedback (RLHF) is key to continually improving the performance of the underlying models and discovering edge cases.”

Behzadi said that by combining generative AI with cinematic visual effects pipelines, companies would be able to synthesize the world, including humans, environments and objects.

“We hope to continue to innovate and lower the bar for developers to create assets and synthetic data to drive the state-of-the-art forward in computer vision,” he added.
