As enterprises continue to double down on data lakehouses, data and AI company Databricks is shifting gears with Delta Lake, the open-source framework serving as the foundation to store data and tables in its own lakehouse offering.
Today, at its annual conference, the lakehouse vendor announced the launch of Delta Lake 3.0, which features automatic support for competing Apache Iceberg and Hudi table formats. The move, the company says, will allow enterprise users to eliminate complicated integration work and focus on building truly open data lakehouses.
“Customers shouldn’t be limited by their choice of (table) format,” said Databricks cofounder and CEO Ali Ghodsi. “With this latest version of Delta Lake, we’re making it possible for users to easily work with whatever file formats they want, including Iceberg and Hudi, while still accessing Delta Lake’s industry-leading speed and scalability.”
Delta Lake 3.0 also includes Delta Kernel, an initiative that makes it easier to develop and maintain Delta connectors, and Liquid Clustering for cost-effective data clustering even as datasets grow.
Unification play from Databricks
After the initial rise of first-generation Apache Hive, three open table formats have largely dominated the data ecosystem: Delta Lake, Apache Iceberg and Apache Hudi.
Each of these formats has its own strengths and builds on common file formats such as Parquet to handle analytic workloads efficiently. Data platform vendors, however, have tended to focus on one primary table format (competitor Snowflake, for example, backs Iceberg) while offering connector support for the others. As a result, users had to choose one of the three and take on complicated integration work.
Now, with the release of Delta Lake 3.0, there’s no need to compromise anymore, according to Databricks. The company is adding Universal Format (UniForm), which offers automatic support for Iceberg and Hudi within Delta, enabling greater interoperability across ecosystems and making it possible for data originating elsewhere to be pulled into Delta Lake. Databricks’ support of the three formats keeps it firmly in the lead in the push toward openness and simplicity. Microsoft recently pushed forward with a commitment to Delta Lake with its new Microsoft Fabric offering.
With UniForm, data stored in Delta Lake can be read as if it were Iceberg or Hudi. The capability automatically generates the metadata that Iceberg or Hudi readers need, so users no longer have to pick a single format up front or convert between formats manually.
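As a rough illustration of what this looks like in practice, the sketch below builds the kind of SQL statement used to create a Delta table with UniForm switched on. It assumes the `delta.universalFormat.enabledFormats` table property described in the Delta Lake documentation; the helper name `uniform_ddl`, the table name and the columns are hypothetical, and the exact set of required properties and supported formats may differ between Delta Lake releases.

```python
def uniform_ddl(table: str, columns: dict[str, str], formats: list[str]) -> str:
    """Build a CREATE TABLE statement asking Delta to also emit
    metadata for the listed formats (hypothetical helper)."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    enabled = ",".join(formats)
    # Assumption: UniForm is enabled through this table property;
    # some releases require additional compatibility properties.
    return (
        f"CREATE TABLE {table} ({cols}) USING DELTA "
        f"TBLPROPERTIES ('delta.universalFormat.enabledFormats' = '{enabled}')"
    )


# Hypothetical table and columns, for illustration only.
ddl = uniform_ddl("sales", {"id": "INT", "amount": "DOUBLE"}, ["iceberg"])
print(ddl)
```

The resulting string could be passed to a Spark session with Delta Lake configured (for example via `spark.sql(ddl)`), after which an Iceberg-aware reader would be pointed at the same table.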
“With Delta Lake 3.0, Databricks is providing unification of metadata between these formats, while expanding access to a much broader ecosystem of connectors query tools,” Adam Ronthal, VP Analyst for data management and analytics at Gartner, told VentureBeat. “The biggest impact here will be in the ability to share metadata between these formats as part of a broader data ecosystem.”
What’s more in Delta Lake 3.0?
In addition to UniForm, Delta Lake 3.0 includes Delta Kernel and Delta Liquid Clustering.
Delta Kernel is designed to tackle the hassle of reworking Delta connectors with each new version or protocol change. With just one stable API, the offering will ensure that connectors are built against a core Delta library that implements the latest specifications. Meanwhile, Liquid Clustering introduces a flexible data layout technique that will provide cost-efficient data clustering as data grows, helping companies meet their read-and-write performance requirements.
“Delta Lake 3.0, including Universal Format and Kernel, underlines the open source community’s dedication to enhancing data reliability and delivering advanced analytics,” said Mike Dolan, SVP of projects at The Linux Foundation. “This release is a step forward in creating a community-driven ecosystem of data integrity, seamless collaboration and real-time analytics tools.”
According to statistics from Databricks, Delta Lake garners more than a billion downloads per year as well as regular feature updates from contributing engineers across businesses like AWS, Adobe, eBay, Twilio and Uber.
Databricks’ Data and AI Summit runs through June 29 in San Francisco.