Data quality issues plague organizations, whether or not they’re actively investing in analytics. A 2017 Harvard Business Review survey found that 47% of newly created data records had at least one critical error and that only 3% were “acceptable” in terms of quality. According to Gartner, poor data quality costs organizations an average of $12.9 million per year. That’s because it can lead to issues downstream as data pipelines (the means of moving data from one place to another) become more complex over time.
A number of startups have emerged with platforms they claim can solve the data quality problem, like Data.World and Zaloni. Another is Superconductive, which maintains an open source framework, Great Expectations, that’s being used by brands including Vimeo, Heineken, and Calm for data management. In a show of confidence from investors, Superconductive today announced that it raised $40 million in a series B round led by Tiger Global with participation from Index, CRV, and Root Ventures, bringing the company’s total raised to $64.5 million.
Founded in 2017 by Abe Gong and Ben Castleton, Superconductive develops tools for data testing, documentation, and profiling. Castleton was a senior analyst at Massachusetts General Hospital and data engineering architect at Health Catalyst before joining Superconductive as a founding consultant. Gong previously headed data efforts at Aspire, which provides in-home nursing services to patients, and did consulting work for organizations in health and internet of things.
Superconductive’s Great Expectations, which the company claims averages around two million downloads per month, can automatically test and verify that data looks the way it should and maintains the right properties throughout its lineage. Designed to work against structured data in data warehouses (data management systems that support analytics) or relational databases (databases that store and provide access to data points), Great Expectations is modular and extensible, enabling customers to expand the framework.
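To make the idea of testing that data "looks the way it should" concrete, here is a minimal sketch in plain Python. It does not use the Great Expectations API; the `Expectation` class, the `not_null` and `between` helpers, the `validate` function, and the sample order records are all hypothetical names and data invented for illustration. It only shows the general pattern of declaring checks on columns and running them against records.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Expectation:
    """A named, per-row check on one column (hypothetical, for illustration)."""
    name: str
    column: str
    check: Callable[[Any], bool]


def not_null(column: str) -> Expectation:
    # Passes when the column value is present.
    return Expectation(f"{column} is not null", column, lambda v: v is not None)


def between(column: str, low: float, high: float) -> Expectation:
    # Passes when the value is present and within [low, high].
    return Expectation(
        f"{column} is between {low} and {high}",
        column,
        lambda v: v is not None and low <= v <= high,
    )


def validate(rows: List[Row], expectations: List[Expectation]) -> List[dict]:
    """Run every expectation over every row and report failing row indexes."""
    results = []
    for exp in expectations:
        failed = [
            i for i, row in enumerate(rows)
            if not exp.check(row.get(exp.column))
        ]
        results.append(
            {"expectation": exp.name, "success": not failed, "failed_rows": failed}
        )
    return results


# Made-up sample records standing in for a table in a warehouse.
rows = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": -5.0},
]
suite = [not_null("order_id"), not_null("amount"), between("amount", 0, 10_000)]
report = validate(rows, suite)
for result in report:
    print(result)
```

In this sketch the checks are declared separately from the data, so the same suite could be rerun at each stage of a pipeline to catch records that stop meeting it.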
“Great Expectations helps data teams eliminate pipeline debt,” the company writes on its website. “Software developers have long known that testing and documentation are essential for managing complex codebases. Great Expectations brings the same confidence, integrity, and acceleration to data science and data engineering teams.”
Great Expectations can generate documentation including Slack notifications, “data dictionaries,” and customized programming notebooks. The platform can also work with other engineering tools in production settings, ostensibly helping users to explore data and capture knowledge for future documentation and testing.
“For over a decade, software engineering teams have known that tests and documentation are the only effective way to beat technical debt in software. Great Expectations brings the same speed and confidence to the data engineering teams,” Gong told VentureBeat via email. “For managers of data teams, this translates to improved team productivity and morale, not to mention more reliable data products and analytics: fewer broken dashboards, fire drills to fix data-based client reports, mistaken analytics conclusions that need to be walked back. Data teams that eliminate pipeline debt are able to collaborate much more effectively.”
Great Expectations currently supports native execution in Pandas, SQL (structured query language), and Apache Spark. All orchestration in Great Expectations, meanwhile, is done in Python.
As the value of high-quality data comes into focus, investors are pouring a growing amount of money into startups tackling data infrastructure challenges. Overall investment in data management software vendors from January 2021 through mid-July 2021 was $2.2 billion, according to PitchBook, already above 2020’s $2.1 billion.
Betting on this trend, Superconductive recently launched Great Expectations Cloud, a managed software-as-a-service version of Great Expectations that’s currently in alpha. The long-term business model is to build tools on top of Great Expectations, Gong says, giving paying users access to new features and a say in the roadmap.
Competition aside, Superconductive, which has more than 40 employees, might not have to look hard for customers. Seventy-three percent of companies are investing in, or plan to invest in, DataOps, a set of practices for improving the quality of data analytics, according to a recent Nexla survey.
“The market potential for data quality is enormous, and the need for data teams is pressing. Sometimes there’s a gap between engineers’ understanding of the problem and executives’ understanding, [but] that gap is steadily closing as more companies embrace the imperative to become truly data driven,” Gong continued. “The open source growth is a matter of public record, and the community has nearly 6,000 members. [Great Expectations] is downloaded over 2.5 million times every month. We don’t report total customer counts, but a few notable brands using Great Expectations include Vimeo, Calm, Heineken, and Komodo Health.”