Tonic.ai, which taps AI to generate synthetic tabular data, raises $35M

Tonic.ai, a company that mimics production data with fake data that can be used for quality assurance and testing, today announced that it raised $35 million in series B financing led by Insight Partners with participation from GGV Capital, Bloomberg Beta, William Smith from Octave, Heavybit, and Silicon Valley CISO Investments. The funds will be used to improve developer relations and support future platform development, with a focus on implementing machine learning and AI to synthesize test datasets, according to CEO Ian Coe.

The synthetic data market is growing at a rapid clip. According to one analysis, out of the 58 largest startups in the space, 45% were created in the last two years. In 2020, venture capital firms injected at least $78 million into these startups, a 78% increase from 2019 -- boosting the segment to over $210 million in value.

Founded in 2018 and with offices in San Francisco and Atlanta, Tonic provides enterprise tools for database deidentification, synthesis, subsetting, and more. The platform lets developers create synthetic versions of their data for use in development and testing while taking steps to protect customer privacy, Coe says.

"Andrew Colombi, Adam Kamor, Karl Hanson, [and I] were several business development engineers sitting on-site in an empty building trying to debug some failing code," Coe told VentureBeat via email. "We had a large, brilliant development team in Palo Alto eager to help us, but they had no way to send the developers the data that was causing all the problems. The data was confidential client data containing a myriad of personal identifiable information. [That's when we] came up with the idea to build a platform that assists in navigating around these issues."

Fake data

Synthetic data is annotated information generated as an alternative to real-world data, which it closely mirrors mathematically or statistically. And while the jury's out on its accuracy, some research suggests it can be as good for training a model as data based on actual objects, events, or people.

Tonic's platform leverages AI to preserve ratios, relationships, and dependencies within certain data. It applies differential privacy during data transformations to muffle the impact of outliers and provide mathematical guarantees of privacy. Moreover, Tonic allows columns to be linked and partitioned across tables or databases to mirror the data's complexity and ensure that inputs map to the same outputs. And it flags sensitive information to alert users to changes in up to tens of thousands of database rows and hundreds of tables.
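To make two of those ideas concrete, here is a minimal Python sketch of consistent masking (the same input always maps to the same output, even across tables) and of a differentially private count using the standard Laplace mechanism. It is an illustration under stated assumptions, not Tonic's implementation: the function names, the demo key, and the epsilon value are all hypothetical, and a production system would handle key management and privacy budgets far more carefully.

```python
import hashlib
import hmac
import random


def consistent_pseudonym(value: str, secret_key: bytes, prefix: str = "user") -> str:
    """Map an input to a fake token deterministically.

    The same (value, secret_key) pair always yields the same token, so a
    customer email masked in one table still joins to the same masked
    email in another. The name and prefix are hypothetical.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise (the classic Laplace mechanism).

    A counting query changes by at most 1 when one person is added or
    removed (sensitivity 1), so noise with scale 1/epsilon gives
    epsilon-differential privacy for this single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rate = epsilon  # rate = 1 / scale, and scale = 1 / epsilon
    # The difference of two independent exponential draws with the same
    # rate is a Laplace(0, 1/rate) sample.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


if __name__ == "__main__":
    key = b"demo-only-secret"  # assumption: a real key would come from a secrets store
    users_row = consistent_pseudonym("alice@example.com", key)
    orders_row = consistent_pseudonym("alice@example.com", key)
    print(users_row, orders_row, users_row == orders_row)

    rng = random.Random(42)
    print(private_count(1000, epsilon=0.5, rng=rng))
```

A smaller epsilon means more noise and stronger privacy. The keyed hash is what keeps relationships intact across tables, since every occurrence of a value is replaced by the same token.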

"We do use GANs (generative adversarial networks) when applying machine learning to automating the process of data synthesis," Coe said. "Developers need test data to test software in pre-production environments. In the largest software developments in the world, datasets and schemas are extremely large and complicated. This means that building scripts or manually creating test data sets are near impossible. Using production data is illegal under laws like HIPAA, PCI, and GDPR."

Tonic, which competes with Delphix, Gretel.ai, Mostly AI, and Hazy in the emerging synthetic data generation market, says it quadrupled its team size to 40 in the past year while growing revenue by more than 600%. The company currently counts eBay, The Motley Fool, Flexport, Dreambox, and Everlywell among its customers, as well as others in the health care, financial services, education, logistics, and ecommerce industries.

"We want to push the mass adoption of synthetic data within modern CI/CD pipelines," Coe continued. "Tonic offers the ability to mimic production data while maintaining the utility and behavior of the data for the developers, analysts, and DevOps teams."

In a June 2021 report on synthetic data, Gartner predicted that by 2030, most of the data used in AI will be artificially generated by rules, statistical models, simulations, or other techniques. If the current trend holds and companies like Tonic have their way, that could well be the case.