Join top executives in San Francisco on July 11-12, to hear how leaders are integrating and optimizing AI investments for success. Learn More


Enterprises often rely on data warehouses and data lakes to handle big data for various purposes, from business intelligence to data science. But these architectures have limitations and tradeoffs that make them less than ideal for modern teams. A new approach, called a data lakehouse, aims to overcome these challenges by integrating the best features of both.

First, let’s talk about the underlying technology: A data warehouse is a system that consolidates structured business data from multiple sources for analysis and reporting, such as tracking sales trends or customer behavior. A data lake, on the other hand, is a broader repository that stores data in its raw or natural format, allowing for more flexibility and exploration for applications such as artificial intelligence and machine learning.

However, these architectures have drawbacks. Data warehouses can be costly, complex and rigid, requiring predefined schemas and transformations that may not suit all use cases. Data lakes can be messy, unreliable and hard to manage, lacking the quality and consistency that data warehouses provide.

A data lakehouse is a hybrid solution that tries to address these issues by combining the scalability and diversity of a data lake with the reliability and performance of a data warehouse. 


According to Adam Ronthal, a vice president analyst for data management and analytics at Gartner, the lakehouse architecture has two goals: “One, to provide the right level of data optimization required to serve its target audience, and two, to physically converge the data warehouse and the data lake environment.” He explained this concept in an interview with VentureBeat.

By moving to a data lakehouse, enterprises can benefit from a single platform that can serve multiple needs and audiences, without compromising on quality or efficiency. However, this transition also poses some challenges, such as ensuring compatibility, security and governance across different types of data and systems. Enterprises need to carefully plan and execute their migration strategy to avoid business disruption and achieve their desired outcomes.

How does a data lakehouse help?

When a company implements a data lakehouse, it allows the organization to store all of its data, from highly structured business records to messy, unstructured data like social media posts, in one repository. 

This unified approach enables teams to run both real-time dashboards and advanced machine learning applications on the same data, unlocking new insights and opportunities for data-driven decision-making across the organization.

Proponents argue that the data lakehouse model provides greater flexibility, scalability and cost savings compared to legacy architectures. When designed well, a data lakehouse allows for real-time analysis, data democratization, and improved business outcomes via data-driven decisions.

The hurdles of moving data to a lakehouse

While the benefits of a data lakehouse are clear, migrating existing data workloads is not a simple task. When data assets already reside in a legacy architecture and drive multiple business applications, migration can be expensive and time-consuming. It can also materially disrupt the operations that depend on that data, potentially leading to lost customers and revenue.

“If you have already moved a considerable amount of data into a data warehouse, you should develop a phased migration approach. This should minimize business disruption and prioritize data assets based on your analytics use cases,” Adrian Estala, field chief data officer at Starburst, told VentureBeat.

As part of this, Estala explains, a company should first establish a virtualization layer across existing warehouse environments, building virtual data products that reflect the current legacy warehouse schemas. Once these products are ready, it can use them to maintain existing solutions and ensure business continuity.
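To make the idea concrete, here is a minimal sketch of a virtualization layer, using Python's built-in sqlite3 module as a stand-in for real warehouse and lakehouse engines. The table names (legacy_sales, lakehouse_sales), the view name (sales_product) and the sample rows are hypothetical and chosen only for illustration; this is not how Starburst or any specific product implements it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical legacy warehouse table with illustrative sample rows.
conn.execute("CREATE TABLE legacy_sales (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO legacy_sales VALUES (?, ?, ?)",
    [(1, "west", 120.0), (2, "east", 80.0)],
)

# Virtual data product: a view that mirrors the legacy schema.
conn.execute(
    "CREATE VIEW sales_product AS "
    "SELECT order_id, region, amount FROM legacy_sales"
)


def total_by_region(connection):
    """Consumer query: it only knows about the data product, not the storage."""
    rows = connection.execute(
        "SELECT region, SUM(amount) FROM sales_product "
        "GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())


before = total_by_region(conn)

# Migration: copy the data to a new table (standing in for lakehouse storage),
# then repoint the view. The consumer query above is left unchanged.
conn.execute("CREATE TABLE lakehouse_sales AS SELECT * FROM legacy_sales")
conn.execute("DROP VIEW sales_product")
conn.execute(
    "CREATE VIEW sales_product AS "
    "SELECT order_id, region, amount FROM lakehouse_sales"
)
conn.execute("DROP TABLE legacy_sales")

after = total_by_region(conn)
```

In this sketch the consumer function returns the same result before and after the move, because it queries the data product rather than the underlying table.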

Then, Estala said, teams should prioritize which datasets to move based on cost, complexity or existing analytics use cases. Ronthal suggested a similar approach, recommending “continuous assessment and testing” so that migration proceeds gradually and the new architecture is verified against the organization’s needs.

“It’s primarily around finding out where the line of ‘good enough’ is,” the VP analyst noted. “I might start by taking my most complex data warehouse workloads and trying them on lakehouse architecture … My primary question becomes ‘can the lakehouse address these needs?’ If it cannot, I move to my next most complex workload until I find the line of good enough, and then I can make an assessment as to how viable the lakehouse architecture is for my specific needs.”
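The procedure Ronthal describes can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the Workload type, the numeric complexity score, the workload names and the runs_well_on_lakehouse check, which in practice would be a real test run against the candidate platform.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Workload:
    name: str
    complexity: int  # hypothetical score; higher means more complex


def find_good_enough_line(
    workloads: Iterable[Workload],
    runs_well_on_lakehouse: Callable[[Workload], bool],
) -> Optional[Workload]:
    """Try workloads from most to least complex and return the first one
    the lakehouse handles, or None if none of them pass."""
    ordered = sorted(workloads, key=lambda w: w.complexity, reverse=True)
    for workload in ordered:
        if runs_well_on_lakehouse(workload):
            return workload
    return None


# Illustrative workloads and a stand-in check (not real benchmark results).
candidates = [
    Workload("daily_sales_rollup", 6),
    Workload("regulatory_reporting", 9),
    Workload("ad_hoc_exploration", 3),
]
line = find_good_enough_line(candidates, lambda w: w.complexity <= 6)
```

With the stand-in check above, the most complex workload fails and the search stops at the next one down. Where that line falls relative to the full workload list is what informs the viability assessment.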

Once the workloads have been test-moved, data architects can build on this strategy and take over decisions about how data assets are moved, where they are placed and which open formats are used. This step need not be overly complex, since there are many established methods for moving data to the cloud, from the cloud or across clouds. The usual database migration practices also apply, from schema migration and quality assurance to application migration and security.
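As one example of the quality assurance step, a simple post-migration check compares the row count and an order-independent checksum of the source and target data. The sketch below is a generic illustration, not a method described by the interviewees; the function names and sample rows are hypothetical.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, digest) for an iterable of row tuples.

    Rows are sorted before hashing, so row order does not affect the result.
    """
    encoded = sorted(repr(tuple(row)).encode("utf-8") for row in rows)
    digest = hashlib.sha256()
    for item in encoded:
        digest.update(item)
        digest.update(b"\n")
    return len(encoded), digest.hexdigest()


def migration_matches(source_rows, target_rows):
    """True when both sides have the same row count and the same content."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)


# Illustrative data: the same rows in a different order, and a corrupted copy.
source = [(1, "west", 120.0), (2, "east", 80.0)]
target_ok = [(2, "east", 80.0), (1, "west", 120.0)]
target_bad = [(1, "west", 120.0), (2, "east", 8.0)]

ok = migration_matches(source, target_ok)
bad = migration_matches(source, target_bad)
```

A check like this catches dropped or altered rows but not schema or type drift, so it complements rather than replaces schema validation.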

“On the front end, the data consumers shouldn’t care, and if you’re really good, some of them should not even be aware that the data was moved. The back end should be completely abstracted. What they should notice is easier access to reusable data products and much greater agility for iterating through improvements to their data solutions,” Estala said.

A matter of return on investment

Moving to a lakehouse is not a decision to be taken lightly. It should be driven by clear business goals, such as improving data access and performance, and not by mere curiosity or novelty. If a company is satisfied with its current data warehouse and does not see any compelling benefits from switching to a lakehouse, it may be better off sticking with what works and allocating its resources to other areas. Otherwise, it may end up wasting time and money and raising doubts among its stakeholders.

The lakehouse may be the future of data analytics, but it is not a one-size-fits-all solution.
