GitLab today announced that it’s spinning out its open source ELT (extract, load, transform) platform Meltano as a standalone business, with financial backing from a number of notable VC and angel investors including Alphabet’s GV.
For context, modern data stacks typically incorporate various tools from ingestion to warehousing that enable companies to take raw data, move it between systems, and convert it into a more usable format that can be queried to generate insights. This data can be transformed prior to its arrival in the data warehouse, a process that is known as “extract, transform, load” (ETL) — this is generally seen as the “old school” way of doing things, in times when storage was more expensive and transforming the data could be painfully slow.
The modern alternative is ELT, in which raw data is loaded first and then transformed on demand inside the warehouse. This is faster but needs more processing power, such as that provided by cloud-based data warehouses like Databricks, Snowflake, Google’s BigQuery, and Amazon’s Redshift.
“A big challenge with [the old ETL way] is that if your business logic or transformations had to change, you had to re-extract all of the data again, which would slow down time to value,” Meltano CEO Douwe Maan told VentureBeat. “With the advent of cheaper storage solutions and ‘big data’ more broadly, the ELT pattern is more common.”
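To make Maan's point concrete, here is a minimal sketch of the ELT pattern. It is not Meltano's code or API: the in-memory SQLite database stands in for a cloud warehouse, and the records, table names, and function names (`extract`, `load_raw`, `transform`) are all hypothetical. The raw data is extracted and loaded once, and a change in business logic only requires re-running the transformation against what is already stored.

```python
import sqlite3

# Hypothetical source records; a real pipeline would pull these from an API or database.
SOURCE_ORDERS = [
    {"id": 1, "amount_cents": 1250, "status": "paid"},
    {"id": 2, "amount_cents": 800, "status": "refunded"},
    {"id": 3, "amount_cents": 4300, "status": "paid"},
]

extract_calls = 0  # counts how often the source system is hit


def extract():
    """Pull every record from the (simulated) source system."""
    global extract_calls
    extract_calls += 1
    return list(SOURCE_ORDERS)


def load_raw(conn, rows):
    """Store the records untransformed in the warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(id INTEGER PRIMARY KEY, amount_cents INTEGER, status TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO raw_orders VALUES (:id, :amount_cents, :status)",
        rows,
    )


def transform(conn, select_sql):
    """Rebuild the derived table from raw data already in the warehouse."""
    conn.execute("DROP TABLE IF EXISTS revenue")
    conn.execute(f"CREATE TABLE revenue AS {select_sql}")
    return conn.execute("SELECT total_cents FROM revenue").fetchone()[0]


conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse
load_raw(conn, extract())           # extract and load happen once

# First version of the business logic: count every order.
v1 = transform(conn, "SELECT SUM(amount_cents) AS total_cents FROM raw_orders")

# The logic changes: only paid orders count. No new extraction is needed.
v2 = transform(
    conn,
    "SELECT SUM(amount_cents) AS total_cents FROM raw_orders WHERE status = 'paid'",
)
```

In an ETL setup, by contrast, the filter would have been applied before loading, so the refunded order would never have reached the warehouse and a rule change would mean going back to the source.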
So what does Meltano do, exactly?
Let’s say a company has data spread across various CRM, marketing, customer support, and product analytics tools. Pooling that data might allow it to generate consumer purchasing trends and insights that wouldn’t be possible with individual data silos. But to achieve this, a company must combine this data in a centralized repository (i.e., a data warehouse) and transform it into a format that makes it easier to analyze. Or in another use case, a company might simply want to migrate a database from MongoDB to PostgreSQL.
That, essentially, is what Meltano achieves — it enables the data “extraction” by querying a database or SaaS application; the “loading” by transitioning the data into a warehouse or file storage system; and the “transformation” by restructuring it.
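The three stages can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Meltano itself is implemented: the CRM and support records are invented, SQLite stands in for the warehouse, and `extract`, `load`, and `transform` are hypothetical helper names.

```python
import sqlite3

# Hypothetical records living in two separate tools (data silos).
CRM_CUSTOMERS = [
    {"customer_id": 1, "name": "Acme", "plan": "pro"},
    {"customer_id": 2, "name": "Globex", "plan": "free"},
]
SUPPORT_TICKETS = [
    {"ticket_id": 10, "customer_id": 1},
    {"ticket_id": 11, "customer_id": 1},
    {"ticket_id": 12, "customer_id": 2},
]


def extract(source):
    """Extract: read the records out of a source system."""
    return [dict(record) for record in source]


def load(conn, table, rows):
    """Load: copy the records, unchanged, into a warehouse table."""
    columns = list(rows[0])
    placeholders = ", ".join(f":{column}" for column in columns)
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)


def transform(conn):
    """Transform: restructure the pooled data into an analysis-ready table."""
    conn.execute(
        """
        CREATE TABLE tickets_per_plan AS
        SELECT c.plan AS plan, COUNT(t.ticket_id) AS tickets
        FROM raw_crm_customers AS c
        LEFT JOIN raw_support_tickets AS t ON t.customer_id = c.customer_id
        GROUP BY c.plan
        """
    )


conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(conn, "raw_crm_customers", extract(CRM_CUSTOMERS))
load(conn, "raw_support_tickets", extract(SUPPORT_TICKETS))
transform(conn)

tickets_per_plan = dict(conn.execute("SELECT plan, tickets FROM tickets_per_plan"))
```

The resulting table answers a question neither tool could answer alone, namely how support load breaks down by subscription plan.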
There is no shortage of proprietary data integration tools out there, such as Google-owned Alooma and heavily VC-backed Matillion. However, as a community-driven open source project independent from GitLab, Meltano hopes to bring a more flexible, adaptable, and extensible platform to the data engineering realm, one that can be hosted wherever the user wants and accessed via their own orchestration tools or Meltano’s web-based interface.
“Most solutions right now are pay-to-play, which limits how many companies have access to high quality tooling,” Maan said. “Being proprietary also means that you would have to rely on a vendor to add extract and load capabilities for every source you might care about, of which there can be dozens. Being open source means the long-tail of integrations can be better served by a large community, since vendors typically only support about 150.”
Moreover, as an open source project, Meltano can be used by just about anyone for any purpose, from hobbyists to billion-dollar businesses. “We’ve seen others use it for personal data use cases, such as moving data between personal financial applications to track spending,” Maan added.
Though Meltano is an open source platform (released under a permissive MIT license) in its own right, it actually leans on a host of other open source tools including Singer, which is setting out to be the “open source standard” for writing data integration scripts with hundreds of pre-built connectors; dbt, a command-line tool for data transformation; and Apache Airflow for orchestration. Soon, Meltano will also lean on Apache Superset for data visualization.
As a side note, Dbt Labs, the company that maintains and monetizes the open source dbt project, announced a $150 million tranche of funding just today, and that is purely for the “transform” part of ELT. This gives some indication of the size of the market that Meltano is entering. While Meltano has the entire data lifecycle in view, its initial focus is on the first two stages of the data integration journey: extract and load.
“Data professionals more broadly are beginning to understand the value of open source for increased flexibility and extensibility, and open source communities for knowledge exchange,” Maan continued. “Dbt is a data transformation tool that is a pioneer in this space, as they’ve got a great open source product with a strong community around it. We believe this is possible for all parts of the data lifecycle, and we’re focusing heavily on the beginning stage of any data journey — extract and load.”
Meltano’s official launch as an independent business was accompanied by a $4.2 million seed funding round led by GV, alongside angel investments from WordPress founder Matt Mullenweg; early Google investor and founding board member Ram Shriram; and Max Beauchemin, who created Apache Airflow and Superset.
As a venture-backed business, Meltano will face some pressure to make money, much like the countless other commercial open source companies out there. For the time being, though, Meltano is laser-focused on working with and growing the community, and on pushing both Meltano and Singer as “favorite tools for solving data integration and general data lifecycle challenges,” according to Maan.
“Eventually we plan to offer both a SaaS solution and an enterprise edition with additional functionality, similar to how GitLab operates with their buyer-based open core model,” Maan added.
As for GitLab, why would it want to spin out Meltano in the first place? Surely it could flourish just as well under the wing of an established developer-focused company? According to Maan, it comes down to priorities: GitLab and Meltano have very different users and use cases in mind. Moreover, with GitLab gearing up to become a public company and Meltano really just starting out on its journey, the two entities are worlds apart.
“The primary reason is that GitLab is [so] focused on building a single application for the entire DevOps lifecycle, that we didn’t see Meltano becoming part of because of the very different markets and target audiences,” Maan explained. “As Meltano grew, it became clear that the two products would be best served by their own organizations instead of having GitLab try to cover both. The products are also in very different stages of development and growth, and Meltano needs to be able to operate like a startup to move as quickly as possible in the marketplace.”
While Meltano doesn’t have any paying enterprise customers yet, Maan said that he expects two of the project’s existing users — GitLab and Netlify — to become paying customers further down the road.