Data may well be the world’s most valuable resource today, given the role it plays in driving all manner of business decisions. But combining data from SaaS applications and other sources to unlock insights is a major undertaking, all the more difficult when it comes to real-time, low-latency data streaming.
New York-based Estuary is setting out to solve this problem with a “data operations platform” that combines the benefits of “batch” and “stream” data processing pipelines.
“There’s a Cambrian explosion of databases and other data tools which are extremely valuable for businesses but difficult to use,” Estuary cofounder and CEO David Yaffe told VentureBeat. “We help clients get their data out of their current systems and into these cloud-based systems without having to maintain infrastructure, in a way that’s optimized for each of them.”
To advance this goal, Estuary today announced it has raised $7 million in a seed round led by FirstMark Capital, with participation from a slew of angel investors, including Datadog CEO Olivier Pomel and Cockroach Labs CEO Spencer Kimball.
The state of play
Batch data processing, for the uninitiated, describes the concept of integrating data in batches at fixed intervals and can be used for things like processing last week’s sales data and compiling a departmental report. Stream data processing, on the other hand, is all about harnessing data in real time. This is particularly useful if a company wants to generate insights into sales as they’re happening, for example, or a customer support team needs all recent data about a customer, including their purchases and website interactions.
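To make the distinction concrete, here is a minimal Python sketch of the two models applied to the same sales records. It is an illustration only and has nothing to do with how Estuary, Flow, or Gazette are implemented; the record shape and the function names `batch_totals` and `stream_totals` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shape for illustration: (product, sale amount).
Sale = Tuple[str, float]


def batch_totals(sales: List[Sale]) -> Dict[str, float]:
    """Batch: process a complete, already-collected set of records in one pass."""
    totals: Dict[str, float] = defaultdict(float)
    for product, amount in sales:
        totals[product] += amount
    return dict(totals)


def stream_totals(sales: Iterable[Sale]) -> Iterator[Dict[str, float]]:
    """Stream: update the running totals and emit a snapshot as each record arrives."""
    totals: Dict[str, float] = defaultdict(float)
    for product, amount in sales:
        totals[product] += amount
        yield dict(totals)


if __name__ == "__main__":
    sales = [("widget", 10.0), ("gadget", 5.0), ("widget", 2.5)]

    # Batch: a single answer, available only after the whole interval is collected.
    print(batch_totals(sales))

    # Stream: an up-to-date answer after every individual sale.
    for snapshot in stream_totals(sales):
        print(snapshot)
```

Both approaches arrive at the same final totals. The difference is when an answer becomes available: once per interval in the batch case, after every event in the streaming case.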
While there has been significant progress in the batch data-processing sphere in terms of being able to extract data from SaaS systems with minimal engineering support, the same can’t be said for real-time data. “Engineers who work with lower-latency operational systems still have to manage and maintain a massive infrastructure burden,” Yaffe said. “At Estuary, we bring the best of both worlds to data integrations: the simplicity and data retention of batch systems and the [low] latency of streaming.”
All of the above is already possible using existing technologies, of course. If a company wants low-latency data capture, it can use open source tools such as Pulsar or Kafka to set up and manage its own infrastructure. Or it can use existing vendor-led tools like HVR, which Fivetran recently acquired, although that is mostly focused on capturing real-time data from databases, with limited support for SaaS applications.
But Estuary is offering a fully managed ELT (extract, load, transform) service the company says “combines both millisecond-latency and point-and-click simplicity,” bringing open source connectors similar to Airbyte to low-latency use cases.
“We’re creating a new paradigm,” Yaffe said. “So far, there haven’t been products to pull data from SaaS applications in real time — for the most part, this is a new concept. We are bringing, essentially, a millisecond-latency version of Airbyte, which works across SaaS, database, pub/sub, and filestores, to the market.”
There has been an explosion of activity across the data integration space of late, with dbt Labs raising $150 million to help analysts transform data in the warehouse and Airbyte closing a $26 million round of funding. Elsewhere, GitLab spun out an open source data integration platform called Meltano. Estuary is certainly in line with these players, but it’s aiming to set itself apart by focusing on both batch and stream data processing and covering more use cases in the process.
“It’s such a different focus that we don’t see ourselves as competitive with them, but some of the same use cases could be accomplished by either system,” Yaffe said.
The story so far
Yaffe was previously cofounder and CEO of Arbor, a data-focused marketing tech company he sold to LiveRamp in 2016. Arbor created Gazette, which now serves as the backbone of Estuary’s managed commercial service, Flow, currently in private beta.
Enterprises can use Gazette “as a replacement for Kafka,” according to Yaffe, and it has been entirely open source since 2018. Gazette builds a real-time data lake that stores data as regular files in the cloud and allows users to integrate with other tools. It can be a useful solution on its own, but using it as part of a holistic ELT toolset requires considerable engineering resources, which is where Flow comes into play. Companies use Flow to integrate all the systems they need to generate, process, and consume data, unifying the “batch versus streaming paradigms” to ensure a company’s current and future systems are “synchronized around the same datasets.”
Flow is source-available, meaning it offers many of the freedoms associated with open source, except its Business Source License (BSL) prevents developers from creating competing products from the source code. On top of that, Estuary licenses a fully managed version of Flow.
“Gazette is a great solution in comparison to what many companies are doing today, but it still requires talented engineering teams to build and operate applications that will move and process their data — we still think this is too much of a challenge compared to the simpler ergonomics of tooling within the batch space,” Yaffe explained. “Flow takes the concept of streaming [that] Gazette enables and makes it as simple as Fivetran for capturing data. The enterprise uses it to get that type of advantage without having to manage infrastructure or be experts in building and operating stream processing pipelines.”
While Estuary doesn’t publish its pricing, Yaffe said it charges based on the amount of input data Flow captures and processes each month. In terms of existing customers, Yaffe wasn’t at liberty to divulge specific names, but he did say the typical client operates in marketing tech or ad tech and that enterprises also use it to migrate data from on-premises databases to the cloud.