Synthesis AI emerges from stealth with $4.5M to create synthetic face datasets

Synthesis AI, a synthetic data company, today emerged from stealth with the announcement that it closed a $4.5 million funding round. The startup says that the capital will allow it to expand its R&D team and develop new synthetic data technologies.

Self-driving vehicle companies alone spend billions of dollars per year collecting and labeling data, according to estimates. Third-party contractors enlist hundreds of thousands of human data labelers to draw and trace the annotations machine learning models need to learn. (A properly labeled dataset provides a ground truth that the models use to check their predictions for accuracy and continue refining their algorithms.) Curating these datasets to include the right distribution and frequency of samples becomes exponentially more difficult as performance requirements increase. And the pandemic has underscored how vulnerable these practices are, as contractors have been increasingly forced to work from home, prompting some companies to turn to synthetic data as an alternative.

Synthesis AI's platform leverages generative machine learning models, image rendering and composition, and other techniques to create and label images of objects, scenes, people, and environments. Customers can modify things like geometries, textures, lighting, image modalities, and camera locations to produce varied data for training computer vision models.

Synthesis AI offers datasets containing 10,000 to 200,000 scenes for common use cases including head poses and facial expressions, eye gazes, and near-infrared images. But what the company uniquely provides is an API that generates millions of images of realistic faces captured from different angles in a range of environments. Using the API, customers can submit a job in the cloud to synthesize terabytes of data.

Synthesis AI says its API covers tens of thousands of identities spanning genders, age groups, ethnicities, and skin tones. It procedurally generates modifications to faces to reflect changes in expressions and emotions, as well as motions like head turns and features such as head and facial hair. Built-in styles adorn subjects with accessories like glasses, sunglasses, hats and other headwear, headphones, and face masks. Other controls enable adjustments in camera optics, lighting, and post-processing.
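To make the idea of procedural variation concrete, here is a minimal sketch of how scene parameters of the kind described above might be sampled before rendering. It is not Synthesis AI's API: the `FaceSceneSpec` class, the `sample_specs` function, the field names, and the value ranges are all hypothetical, chosen only for illustration. The point it shows is that every attribute is chosen before an image exists, so the label for each image is known by construction.

```python
import random
from dataclasses import dataclass

# Hypothetical option lists; the accessory names echo the article,
# the expression names are illustrative assumptions.
ACCESSORIES = ["none", "glasses", "sunglasses", "hat", "headphones", "face mask"]
EXPRESSIONS = ["neutral", "smile", "frown", "surprise"]


@dataclass(frozen=True)
class FaceSceneSpec:
    """One scene description; a renderer (not shown) would consume it."""
    identity_id: int
    expression: str
    head_yaw_deg: float
    accessory: str
    light_intensity: float


def sample_specs(n, num_identities, seed=0):
    """Draw n scene specs uniformly at random from assumed parameter ranges."""
    rng = random.Random(seed)  # seeded so the same job can be reproduced
    return [
        FaceSceneSpec(
            identity_id=rng.randrange(num_identities),
            expression=rng.choice(EXPRESSIONS),
            head_yaw_deg=rng.uniform(-60.0, 60.0),
            accessory=rng.choice(ACCESSORIES),
            light_intensity=rng.uniform(0.2, 1.0),
        )
        for _ in range(n)
    ]


specs = sample_specs(5, num_identities=1000)
for spec in specs:
    print(spec)
```

Uniform sampling is only one possible choice here; a real system could weight the draws to hit a target distribution of attributes.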

Synthesis AI claims that its data is unbiased and "perfectly labeled," but the jury's out on the representativeness of synthetic data. In a study last January, researchers at Arizona State University showed that when an AI system trained on a dataset of images of engineering professors was tasked with creating faces, 93% were male and 99% were white. The system appeared to have amplified the dataset's existing biases -- 80% of the professors were male and 76% were white.
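The size of that amplification can be worked out from the four reported figures. The short sketch below does the arithmetic; the `amplification` helper and the two measures it returns are assumptions made for illustration and are not metrics taken from the study, and the inputs are the rounded percentages quoted above.

```python
# Percentages as reported in the article (integers avoid float rounding).
reported = {
    "male": {"training": 80, "generated": 93},
    "white": {"training": 76, "generated": 99},
}


def amplification(training_pct, generated_pct):
    """Return (percentage-point gain of the majority group,
    factor by which everyone else's share shrank)."""
    point_gain = generated_pct - training_pct
    shrink_factor = (100 - training_pct) / (100 - generated_pct)
    return point_gain, shrink_factor


results = {
    group: amplification(v["training"], v["generated"])
    for group, v in reported.items()
}

for group, (gain, shrink) in results.items():
    print(f"{group}: +{gain} points; everyone else's share shrank {shrink:.1f}x")
```

On these numbers, the share of non-male faces fell from 20% to 7% and the share of non-white faces fell from 24% to 1%, which is why the shift looks far larger from the minority side than the percentage-point gains alone suggest.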

On the other hand, startups like Hazy and Mostly AI say that they've developed methods for controlling the biases of data in ways that actually reduce harm. A recent study published by a group of Ph.D. candidates at Stanford claims the same -- the coauthors say their technique allows them to weight certain features as more important in order to generate a diverse set of images for computer vision training.

Despite competition from startups like Datagen and Parallel Domain, Synthesis AI says that "major" technology and handset manufacturers are already using its API to generate model training and test datasets. Among the early adopters is Affectiva, a company that builds AI it claims can understand emotions by analyzing facial expressions and speech.

"One of our teleconferencing customers leveraged synthetic data to create more robust facial segmentation models. By creating a very diverse set of data with more than 1,000 individuals with a wide variety of facial features, hairstyles, accessories, cameras, lighting, and environments, they were able to significantly improve the performance of their models," founder and CEO Yashar Behzadi told VentureBeat via email. "[Another one] of our customers is building a car driver and occupant sensing systems. They leveraged synthetic data of thousands of individuals in the car cabin across various situations and environments to determine the optimal camera placement and overall configuration to ensure the best performance."

In the future, 11-employee Synthesis AI plans to launch additional APIs to address different computer vision challenges. "It is inevitable that simulation and synthetic data will be used to develop computer vision AI," Behzadi continued. "To reach widespread adoption, we need to continue to build out 3D models to represent more of the real world and create scalable cloud-based systems to make the simulation platform available on-demand across a broad set of use cases."

Existing investors Bee Partners, PJC, iRobot Ventures, Swift Ventures, Boom Capital, Kubera VC, and Leta Capital contributed to the seed round announced today by Synthesis AI, which is based in San Francisco, California.
