Of course, Google has been busy. In the past few years, researchers have cooked up a complex and powerful new data-warehousing system called Mesa.
Mesa was born out of Google’s core business: Internet advertising. To serve its advertising customers and internal needs, Google collects detailed information about a given ad, and it has to record and process that data in real time, according to a new research paper on the system.
Mesa operates at great scale and takes in data in near real time. It “handles petabytes of data, processes millions of row updates per second, and serves billions of queries that fetch trillions of rows per day,” the paper’s authors wrote. Moreover, they wrote, Mesa can withstand the failure of a data center, as it’s “geo-replicated across multiple datacenters.”
It’s possible that Mesa will lead to a new cloud service available on the Google Cloud Platform. That could help the company further distinguish itself in its cloud warfare against Amazon Web Services, which has a data-warehousing service called Redshift, and Microsoft Azure. Both rivals, just like Google, regularly drop cloud prices and release new cloud services.
Such a development wouldn’t be too farfetched. After Google introduced the Dremel query system in a research paper, the company created BigQuery based on Dremel and made it available as a cloud service on the Google Cloud Platform.
Architecturally, Mesa’s developers made some important decisions about what to optimize for, and those decisions make it different from, say, Dremel:
“Mesa explores a new point in the design space with high scalability, strong consistency, and transactional guarantees by restricting the system to be only available for batched and controlled updates that are processed in near real-time.”
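To make that trade-off concrete, here is a minimal Python sketch of the general idea of batched, versioned updates. It is not Mesa's implementation: the `BatchedStore` class, its method names, and the integer "delta" model are all hypothetical, chosen only for illustration. The point it shows is that a whole batch is either committed as one new version or not applied at all, so a query never sees a half-applied update.

```python
from collections import defaultdict


class BatchedStore:
    """Toy store (hypothetical, not Mesa's API): updates land only as whole batches."""

    def __init__(self):
        self._committed = {}  # key -> aggregated value as of the latest version
        self.version = 0      # number of batches committed so far

    def commit_batch(self, updates):
        """Apply a batch of (key, delta) pairs atomically as the next version."""
        staged = defaultdict(int)
        for key, delta in updates:
            # Validate before touching committed state, so a bad row
            # rejects the entire batch.
            if not isinstance(delta, int):
                raise TypeError(f"delta for {key!r} must be an int")
            staged[key] += delta

        # Build the new state on the side, then swap it in one step.
        new_state = dict(self._committed)
        for key, total in staged.items():
            new_state[key] = new_state.get(key, 0) + total
        self._committed = new_state
        self.version += 1
        return self.version

    def query(self, key):
        """Return (version, value); reads only ever see committed batches."""
        return self.version, self._committed.get(key, 0)
```

In this sketch, committing `[("ad1", 3), ("ad2", 1), ("ad1", 2)]` produces version 1, and a batch containing an invalid row raises an error and leaves both the data and the version number unchanged. Restricting writes to controlled batches like this is what makes the consistency guarantee simple to reason about, at the cost of not supporting arbitrary single-row writes.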
The system also helps Google in ways that, for instance, the open-source Hive data-warehousing tool for Hadoop couldn’t. It also likely stands out from the Presto query engine, which Facebook developed in-house to meet latency challenges that Hive couldn’t handle. Facebook recently released Presto under an open-source license.
For one thing, Mesa might be particularly well suited to deployment across data centers around the world.
“The cloud computing paradigm in conjunction with a decentralized architecture has proven to be very useful to scale with growth in data and query load,” the paper’s authors wrote.
Jordan Novet contributed reporting.