Varada today extended its analytics platform to include the ability to rapidly add and remove nodes and clusters as workloads scale up and down.
One of the primary reasons organizations opt for a data warehouse deployed in the cloud over a data lake is performance. A data lake typically makes available a massive amount of data that is stored on inexpensive storage systems, usually on a cloud platform. Varada created the Varada Data Platform based on indexing technology that organizes data into nano blocks based on the type of data being queried and how it is structured. This approach allows end users to query data where it resides without needing to move it into a central data warehouse.
Version 3.0 of the platform employs Varada’s indexing engine to accelerate SQL queries using a scale-out architecture that enables a data lake to rival the performance of a cloud data warehouse while keeping the cost of consuming infrastructure resources down, Varada CEO Eran Vanounou told VentureBeat.
As the amount of data organizations store in the cloud steadily increases, a data lake can easily transform into a data swamp because the quantity of data that needs to be queried eventually degrades performance. Approaches have emerged that employ various types of distributed SQL engines to optimize queries across a narrower set of data residing in a data lake. In the case of Varada, the company claimed late last year that it had developed an adaptive engine that selects the optimal index for each dataset to achieve that goal. The platform also includes an observability capability that automatically determines when to index specific datasets based on usage.
The Varada Data Platform takes data lakes to the next level by providing capabilities that go beyond simply storing data in a cloud platform, Vanounou said. “The storage aspects of a data lake have already been solved,” he said.
The challenge now is finding the most efficient way to launch queries against a massive amount of data. The first wave of data lakes based on platforms such as Hadoop often resulted in organizations creating data swamps because there was no way to dynamically organize data to make it easier to query. Cloud data warehouses emerged as an alternative to enable IT teams to manage data more effectively. The drawback is that cloud data warehouses are more expensive to employ than a data lake that makes use of, for example, object-based cloud storage services. Providers of distributed SQL engines promise to provide most of the benefits of a data warehouse at a much lower total cost.
Unfortunately, many first-generation instances of the platforms employed to create data lakes didn’t live up to expectations. Providers of data lakes need to convince IT organizations that the next generation of these platforms has solved that issue. In the meantime, providers of data warehouses in the cloud continue to gain traction. However, providers of platforms for building data lakes note that a cloud data warehouse simply replaces a proprietary on-premises platform with a larger cloud platform. A data lake is designed to enable IT organizations to retain more control over which tools and applications can access their data.
It’s too early to say how this battle will play out. There’s a lot of cloud data warehouse momentum within enterprise IT organizations, which often prefer the path of least resistance when it comes to centralizing data management. But as it becomes apparent that the volume of data that needs to be accessed is going to expand exponentially, a day of reckoning for bringing the cost of managing all that data under control is not very far off.