Open source data integration platform Airbyte has announced its first data lake integration, allowing users to replicate data from myriad sources to Amazon’s Simple Storage Service (S3). The San Francisco-based startup said it plans to soon support data lakes from “other cloud providers” — including Databricks’ open source Delta Lake.

Businesses of all sizes have an abundance of data spread across tools such as CRM, marketing, customer support, and product analytics. While accessing the data isn’t the problem, deriving meaningful insights from data stored in different locations and formats is — so businesses have to combine it in a centralized location and transform it into a common format that makes it easier to analyze.

From ETL to ELT

A typical process for achieving this is what’s known as “extract, transform, load” (ETL), which involves transforming the data before it arrives in a central data warehouse. This made more sense when on-premises storage was expensive, even though the transformation process could be painfully slow and users would often have to re-extract the data if their needs changed. The modern alternative, “extract, load, transform” (ELT), lets companies load the raw data first and transform it on demand once it’s already in the warehouse. This has been enabled by the lower cost of modern cloud-based storage and computation platforms such as Databricks, Snowflake, Google’s BigQuery, and Amazon’s Redshift.
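
To make the difference concrete, here is a minimal sketch in Python. It is an illustration only, not how Airbyte or any of the platforms named above work: an in-memory SQLite database stands in for the warehouse, and the records, table names, and function names (`etl`, `elt`, `raw_customers`, and so on) are all hypothetical.

```python
import sqlite3

# Hypothetical raw records, as they might come out of a CRM export.
RAW_RECORDS = [
    {"id": 1, "email": "Ada@Example.com", "plan": "pro", "mrr_cents": 4900},
    {"id": 2, "email": "bob@example.com", "plan": "free", "mrr_cents": 0},
    {"id": 3, "email": "CY@example.com", "plan": "pro", "mrr_cents": 4900},
]


def etl(conn, records):
    """ETL: shape the data in the pipeline, then load only the result."""
    conn.execute("CREATE TABLE etl_paying (id INTEGER, email TEXT, mrr_usd REAL)")
    rows = [
        (r["id"], r["email"].lower(), r["mrr_cents"] / 100)
        for r in records
        if r["mrr_cents"] > 0  # non-paying customers never reach the warehouse
    ]
    conn.executemany("INSERT INTO etl_paying VALUES (?, ?, ?)", rows)


def elt(conn, records):
    """ELT: load the raw records untouched, then transform inside the warehouse."""
    conn.execute(
        "CREATE TABLE raw_customers (id INTEGER, email TEXT, plan TEXT, mrr_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO raw_customers VALUES (:id, :email, :plan, :mrr_cents)", records
    )
    # The transformation is just SQL over the raw table, so it can be
    # rewritten later without extracting the data from the source again.
    conn.execute(
        """
        CREATE VIEW elt_paying AS
        SELECT id, lower(email) AS email, mrr_cents / 100.0 AS mrr_usd
        FROM raw_customers
        WHERE mrr_cents > 0
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn, RAW_RECORDS)
    elt(conn, RAW_RECORDS)
    print(conn.execute("SELECT * FROM etl_paying ORDER BY id").fetchall())
    print(conn.execute("SELECT * FROM elt_paying ORDER BY id").fetchall())
    # A new question (customers per plan) is answerable only on the ELT side,
    # because the ETL pipeline dropped the plan column before loading.
    print(conn.execute(
        "SELECT plan, COUNT(*) FROM raw_customers GROUP BY plan ORDER BY plan"
    ).fetchall())
```

Both paths produce the same table of paying customers. The difference shows up when requirements change: in the ELT version the raw table is still in the warehouse, so a new view is enough, whereas the ETL version would need the pipeline changed and the data extracted again.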

Airbyte is chiefly concerned with the “EL” part of ELT, though it also supports the transformation phase through integrations with third-party tools such as dbt. The company recently launched its Connector Development Kit (CDK) to enable businesses to create their own custom data source connectors, but it also offers dozens of prebuilt connectors. This makes it easier for companies to create data pipelines and transport their data from sources such as CRMs (e.g. Salesforce), databases (e.g. MySQL, PostgreSQL), and analytics (e.g. Amplitude) to destinations including databases (e.g. BigQuery), data warehouses (e.g. Snowflake), and now data lakes.

Data lakes and data warehouses serve very distinct purposes. The former house raw, unstructured data, which is more flexible but storage-intensive, while the latter are all about structured data that has already been processed and filtered for specific use cases, as determined by the company. Airbyte’s decision to support S3 makes sense, given that it needs to open itself to as many potential data integration scenarios as possible.

Above: Airbyte: Data replication

Open for business

Open source data integration tools have been big news of late. Last week GitLab announced it was spinning out Meltano, its open source ELT platform that aims to achieve something similar to Airbyte, as a standalone business. Moreover, as an independent business, Meltano has managed to attract some big-name investors, including Alphabet’s GV and WordPress founder Matt Mullenweg. Elsewhere, dbt Labs (formerly Fishtown Analytics) last week raised $150 million at a $1.5 billion valuation to build out its open source dbt data transformation tool, which Meltano and Airbyte leverage in their respective products.

Airbyte, for its part, has raised north of $31 million in the past few months, beginning with a $5.2 million seed raise in March and followed by a $26 million series A round less than three months later. It seems the open source data integration industry is heating up.

For now, Airbyte’s core product is the free and MIT-licensed community edition, though it eventually plans to go commercial through a hosted cloud incarnation, with an additional enterprise-grade offering in the works.
