Twitter is today releasing its Heron real-time stream-processing engine on GitHub under an open-source Apache license. Twitter first published a paper documenting Heron last year, but the software has remained proprietary until now.
Written in C++, Java, and Python, Heron is a successor to Storm, the stream-processing engine that Twitter open-sourced in 2011. According to Twitter, Heron offers considerable performance gains over Storm.
“Heron is a streaming system that was born out of the challenges we faced due to increases in volume and diversity of data being processed, as well as the number of use cases for real-time analytics,” Twitter engineering manager Karthik Ramasamy wrote in a blog post. “We needed a system that scaled better, was easier to debug, had better performance, was easier to deploy and manage, and worked in a shared multi-tenant cluster environment.”
Twitter has been using the software for more than two years. But it's not the only company working with Heron. Some startups have been using it, and Microsoft has built a version that runs on YARN, the cluster-management component of the open-source Hadoop big data framework, Ramasamy wrote. He added that use cases range from extract-transform-load (ETL) to ad bidding and even augmented reality.
Heron is just the latest open-source release from Twitter, which has previously shared Diffy, Scalding, and Summingbird. Other big web companies also regularly open-source their tooling. And Heron has competition: other real-time stream-processing systems available include Apache Flink and Apache Spark Streaming.
Ramasamy wrote that Twitter may eventually hand control of the Heron project to an independent foundation, such as the Apache Software Foundation.