Today, Redpanda Data, the data streaming platform formerly known as Vectorized, announced a successful $50 million series B round. The company’s main product offers a simpler mechanism for carrying events to the machines in a computing cluster, a process that can grow complicated when network outages or delays add confusion.
The company plans to use the funds for marketing, research and development of the main product, which the company estimates is running now on hundreds of thousands of machines just about a year after launch.
“The world has shifted to real-time first, raising the criticality of data streaming in the modern stack,” said Redpanda founder and CEO Alex Gallego. “We are empowering every developer to realize today the customer experiences of the future through data-intensive applications and services leveraging the full spectrum of data.”
Data streaming designed to streamline developers’ stacks
Creating a smoothly running mechanism for propagating new data to all machines that need to be informed can be a frustrating experience for developers, and the process has only become more essential as the complexity of the stack grows. Approaches like the popular microservice architecture only add to the challenge by splitting the tasks across many machines. The approach may help development by separating the work into clearly defined parts, but it only succeeds when the communication is bulletproof.
In the past, developers relied on tools like the open source Apache Kafka. While the results are trustworthy, developers routinely call the name choice prophetic because the software can be difficult to install, configure and maintain. The tool was part of a constellation of open source packages that all needed to be installed and run together. One of them, ZooKeeper, bears a name that is just as prophetic and was often a source of considerable pain and frustration.
Redpanda aims to remove this problem by offering a simpler and more automated mechanism that still supports the Kafka API. Developers can replace Kafka, send their data to Redpanda and save the overhead of maintaining five or more different processes that are part of Kafka. The company brags that its service can offer transactions that may be as much as six times faster with ten times lower latencies.
An important addition is a transformation layer that can apply WebAssembly code to the data as it moves through the pipelines. WebAssembly (WASM) is gaining popularity with web developers by offering faster code execution in browsers. Its appearance in a tool for backend processing shows it is gaining traction as a more general format for distributing and applying code.
“A lot of the work [for WASM] is frankly just mundane tasks,” Gallego said. “Let’s say you have a personal object and you have to remove the Social Security number. That’s just your job as an engineer. Sometimes you share your data with the partner and you’re like okay so that data stream but remove the social security numbers. For this style of one shot transformations, web assembly lets you change the shape of the data without changing your code.”
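The kind of one-shot transformation Gallego describes can be sketched in a few lines. This is only an illustration of the logic, written in plain Python rather than compiled to WebAssembly, and it does not use Redpanda's transform API. The field name `ssn`, the JSON encoding and the function names are assumptions made for the example.

```python
import json

# Hypothetical field name; the article does not say how the record is laid out.
SENSITIVE_FIELDS = {"ssn"}


def redact(record_bytes: bytes) -> bytes:
    """Return a copy of a JSON-encoded record with sensitive fields dropped."""
    record = json.loads(record_bytes)
    cleaned = {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
    return json.dumps(cleaned, sort_keys=True).encode("utf-8")


def transform_stream(records):
    """Apply redact() to each record as it passes through, one at a time."""
    for raw in records:
        yield redact(raw)
```

The producing application keeps writing full records. Only the copy shared with a partner passes through the transformation, which is the sense in which the shape of the data changes without a change to the application code.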
Stacked with compatibility
The Redpanda stack also extends the functionality by storing older data events from the stream in S3 buckets, either at Amazon or at other S3-compatible services. This allows the service to outgrow the storage limits of the machines hosting Kafka. The result is more than a transport layer with a longer history.
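A toy model may help show what tiering older events to object storage means for a reader of the stream. The sketch below is not Redpanda's implementation. The class `TieredLog`, its `hot_capacity` parameter and the in-memory list standing in for an S3 bucket are all assumptions made for illustration.

```python
class TieredLog:
    """Append-only log that moves its oldest records to a cold store."""

    def __init__(self, hot_capacity: int):
        self.hot_capacity = hot_capacity
        self.hot = []   # newest records, standing in for local disk
        self.cold = []  # older records, standing in for an S3 bucket

    def append(self, record) -> int:
        """Add a record and return its offset; existing records never change."""
        self.hot.append(record)
        # Offload the oldest hot records once local capacity is exceeded.
        while len(self.hot) > self.hot_capacity:
            self.cold.append(self.hot.pop(0))
        return len(self.cold) + len(self.hot) - 1

    def read(self, offset: int):
        """Fetch by offset through one code path, whichever tier holds it."""
        if offset < 0:
            raise IndexError(offset)
        if offset < len(self.cold):
            return self.cold[offset]
        return self.hot[offset - len(self.cold)]
```

Offsets stay stable as records move from the hot tier to the cold one, so a reader asks for an offset and never needs to know where the record lives.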
“It’s actually a ledger to a large extent because it’s immutable. It’s append only. You can’t change it. It’s ordered,” said Gallego, before adding that several cryptocurrency companies were relying on the product.
Redpanda is already seeing some companies rely on the data stream for both real-time processing of data as it arrives and historical aggregation, sometimes long after the fact. The tool allows multiple processing hubs to access the stream at different points in its history, all using the same basic code with the same API.
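The idea of several readers working from different points in one ordered stream can be illustrated with a minimal sketch. It uses a plain Python list as the stream and a hypothetical `Consumer` class. A real deployment would use a Kafka-compatible client instead.

```python
class Consumer:
    """Reads an append-only list of events starting from a chosen offset."""

    def __init__(self, log: list, start_offset: int = 0):
        self.log = log
        self.offset = start_offset

    def poll(self) -> list:
        """Return every event appended since the last poll."""
        events = self.log[self.offset:]
        self.offset = len(self.log)
        return events


log = [10, 20, 30]
history = Consumer(log, start_offset=0)      # replays from the beginning
live = Consumer(log, start_offset=len(log))  # only sees new arrivals

log.append(40)
total = sum(history.poll())  # aggregate over the full history
latest = live.poll()         # react to what just arrived
```

Both readers call the same `poll` method. The only difference between the historical job and the real-time one is the offset each starts from.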
“When [one customer] learned that we could do infinite data retention by tiering cold data to S3, or an S3-compatible API, they went from one terabyte to 12 petabytes of data,” said Gallego.
“The idea is that they can now record every single message and interaction for the next five years. Basically, two cents per gigabyte, so it’s extremely cost-effective for them to now store everything that happens. And they don’t have to have two code paths.”
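For a sense of scale, the quoted rate can be applied to the 12 petabytes mentioned earlier. This is back-of-the-envelope arithmetic only. It assumes decimal units, and the quote does not say what billing period the two cents covers, so the result is the cost per unit of whatever period that rate applies to.

```python
CENTS_PER_GB = 2       # "two cents per gigabyte" from the quote
GB_PER_PB = 1_000_000  # assumption: decimal units (1 PB = 10^6 GB)


def storage_cost_dollars(petabytes: int) -> int:
    """Cost of storing the given volume at the quoted per-gigabyte rate."""
    total_cents = petabytes * GB_PER_PB * CENTS_PER_GB
    return total_cents // 100


cost = storage_cost_dollars(12)  # 240000 dollars per billing period
```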
This series B round was led by GV with participation from Lightspeed Venture Partners (LSVP) and Haystack VC.
“Redpanda has built a streaming data infrastructure for developers from the ground up to deliver simplicity, performance, and reliability while enabling enterprises to continue to run the software stacks they’ve already invested in,” said Dave Munichiello, a general partner from GV who will be joining the Redpanda board. “Redpanda’s team brings together deep experience in streaming infrastructure with a concrete understanding of customer pain points with the status quo.”
VentureBeat's mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact.