How do you make a real-time database faster? Rockset has a few ideas

Real-time analytics database vendor Rockset today announced an update to its namesake platform that introduces a new architecture designed to help accelerate enterprise use cases. Separating two basic operations, data ingestion and query execution, is central to achieving the speedup.

Modern data platforms, including data lakehouses, have increasingly separated compute, where queries are executed, from storage, where data resides. But traditionally, the compute used for query execution hasn't been separated from the compute used for data ingestion.

For a real-time database, data needs to be ingested from all sources. Typically, the compute engine that handles ingest is the same one that runs queries. But this can lead to performance and latency issues, as well as challenges for executing real-time analytics queries on data.

Using the same compute for both ingest and query also means that in the cloud, an organization has to size a compute instance for both types of operations, rather than optimizing for each specific use case. With its latest update, Rockset is now separating the two operations in an approach it refers to as "compute-compute separation."

"With real-time analytics, the data never stops; you're processing incoming data all the time and also your queries never stop," Rockset cofounder and CEO Venkat Venkataramani told VentureBeat. "When compute is running on both ingestion and query processing 24/7, it can become too slow, too expensive and too cumbersome to operate — and we now eliminate all of those things."

Open-source RocksDB at the center of compute-compute separation

Rockset’s latest real-time analytics database update enables compute-compute separation. Image credit: Rockset

The team behind Rockset has its roots in Meta (formerly Facebook). Among the core technologies that Venkataramani and his cofounders helped build is the open-source RocksDB persistent key-value store.

RocksDB is at the foundation of Rockset, providing a base for database storage and ingestion. The new compute-compute separation capabilities also have their roots in new features found in RocksDB that Rockset is enabling in its commercial database platform.

Venkataramani explained that Rockset helped develop the RocksDB memtable replicator that can efficiently and reliably duplicate the memory state of data in RocksDB from one compute instance to another.
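The replication idea can be illustrated with a toy model. The sketch below is not RocksDB's or Rockset's API; the `Memtable`, `IngestNode` and `QueryNode` classes are hypothetical names invented for illustration, and the `strip().lower()` call merely stands in for whatever parsing and indexing work ingestion really does. It shows only the general shape: one node does the ingest work once and ships the resulting in-memory update to a node that only serves reads.

```python
from dataclasses import dataclass, field


@dataclass
class Memtable:
    """Toy in-memory key-value state (a stand-in for a real memtable)."""
    data: dict = field(default_factory=dict)
    last_seq: int = 0

    def apply(self, seq: int, key: str, value: str) -> None:
        # Apply updates in sequence order; ignore anything already seen.
        if seq <= self.last_seq:
            return
        self.data[key] = value
        self.last_seq = seq


class QueryNode:
    """Hypothetical reader: serves queries from replicated state only."""

    def __init__(self) -> None:
        self.memtable = Memtable()

    def get(self, key: str):
        return self.memtable.data.get(key)


class IngestNode:
    """Hypothetical writer: does the ingest work, then ships the result."""

    def __init__(self) -> None:
        self.memtable = Memtable()
        self.followers: list[QueryNode] = []
        self._seq = 0

    def write(self, key: str, raw_value: str) -> None:
        self._seq += 1
        value = raw_value.strip().lower()  # stand-in for ingest-side processing
        self.memtable.apply(self._seq, key, value)
        for follower in self.followers:
            # Replicate the already-processed update, not the raw input,
            # so the reader does not repeat the ingest work.
            follower.memtable.apply(self._seq, key, value)


ingest = IngestNode()
query = QueryNode()
ingest.followers.append(query)
ingest.write("user:1", "  ACTIVE ")
print(query.get("user:1"))  # active
```

In this toy version the update is pushed synchronously inside `write`; a real system would send it over the network and handle failures, which the sketch leaves out.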

"Now where one machine is doing writes and another machine is doing reads, they still can get real-time access to each other's state," Venkataramani explained. "The rest of the Rockset stack has already been built to leverage that in terms of data ingestion and SQL query processing."

Less duplication

Replicating the state of a compute instance is not the same as wholesale replication of the data in an attempt to enable both real-time ingestion and real-time queries. Venkataramani said that a simple "naïve" way of achieving compute-compute separation could be something as basic as using the replicas functionality in a relational database like PostgreSQL.

In the PostgreSQL replicas model, an organization can have a primary node performing data ingestion and a replica serving all queries. Venkataramani explained that, with that approach, the ingested data is duplicated. This means more data storage, more cost and some latency.

"The magic here is that we can do this without duplicating compute, and without duplicating storage," said Venkataramani.

What compute-compute separation enables for enterprise data analytics

With compute-compute separation, Venkataramani said an enterprise could have cloud compute instances that are optimized for actual use cases.

For example, some organizations might have fewer query compute needs and more data ingest, or vice versa. Without this model, Venkataramani said, organizations would often end up overprovisioning resources to meet the maximum requirement of both ingest and query.
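A toy calculation makes the overprovisioning point concrete. The numbers below are invented for illustration and are not Rockset figures; the model assumes a shared cluster is statically sized for the peak of both workloads at once, while separate pools are each resized hourly to follow their own load.

```python
# Hypothetical hourly load in abstract "compute units"; illustrative only.
ingest_load = [8, 8, 2, 2]
query_load = [1, 1, 6, 6]

# Shared compute, statically sized for the peak of both workloads together.
shared_size = max(ingest_load) + max(query_load)
shared_unit_hours = shared_size * len(ingest_load)

# Separate pools, each resized hourly to match its own load.
separate_unit_hours = sum(ingest_load) + sum(query_load)

print(shared_unit_hours, separate_unit_hours)  # 56 34
```

Under these assumptions the shared cluster consumes 56 unit-hours against 34 for the separated pools. The gap depends entirely on how far apart the two workloads' peaks are.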

The new Rockset update will also enable better overall reliability of applications with the separation of data ingest from query processing. The approach will also allow for concurrency scaling as query volume grows. Venkataramani explained that if an application is initially provisioned to handle 100 queries a second, but then demand spikes up to 500 queries a second, the isolated query compute engine can spin up new virtual compute instances to handle demand.
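The 100-to-500 queries-per-second example can be worked through as simple arithmetic. This is a minimal sketch, assuming each query instance handles a fixed number of queries per second; the `instances_needed` function and the capacity of 100 queries per second per instance are hypothetical and not taken from Rockset.

```python
import math


def instances_needed(qps: float, qps_per_instance: float) -> int:
    """Query instances required for a given load, never fewer than one."""
    return max(1, math.ceil(qps / qps_per_instance))


CAPACITY = 100  # assumed queries per second one instance can serve

print(instances_needed(100, CAPACITY))  # 1
print(instances_needed(500, CAPACITY))  # 5
```

Because only the query side scales in this model, the ingest compute keeps its original size while the spike is absorbed.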

"Even if there's a flash flood of data coming in from the data ingestion side, your application query processing will be completely isolated from that, which allows you to build more reliable applications," he said.
