
Yahoo’s former CTO thinks Hadoop’s data wizardry works best as a service

Above: Raymie Stata, CEO of Altiscale, on-stage at VentureBeat's DataBeat conference Tuesday.

Image Credit: Michael O'Donnell / VentureBeat

SAN FRANCISCO — Listen up, folks: You may be running Hadoop wrong.

Hadoop is open-source software for scalable, distributed computing. It has become the “operating system for big data,” but that doesn’t mean your company needs to run its own distribution, said Altiscale chief executive Raymie Stata onstage at VentureBeat’s DataBeat conference.

Altiscale is a cloud service built specifically to run the open-source software for other companies. Its idea is pretty simple: All businesses should be able to use Hadoop to collect, store, query, and organize their data as efficiently as any tech titan.

Why is Hadoop so important, you might ask? Because it’s cheap, flexible, and powerful compared to other methods of data storage, including many relational database management systems.

Stata formerly served as Yahoo’s chief technology officer. He was deep in the Hadoop project at Yahoo, which used the software primarily to build its web search index. After more than seven years at Yahoo, he was ready to move on — and saw a big need for cloud-based Hadoop distributions.

That’s because Stata saw lots of companies with huge Hadoop clusters that barely used their total computing resources, and others with capacity-constrained clusters that struggled to keep critical workloads running. (Hadoop clusters are special types of computational clusters designed to store and analyze large amounts of unstructured data in a distributed computing environment.)

Altiscale, like competitors Qubole and Xplenty, solves these issues by offering Hadoop as a service. It’s similar to a software-as-a-service (SaaS) approach: Each Altiscale customer gets a logically dedicated cluster, but behind the curtain sit Hadoop-specific elasticity clusters that can grow and shrink.

Keeping Hadoop running at scale is not the direct value add for Altiscale’s customers, which include 25 of the Fortune 50 companies, according to the company. The open-source software is increasingly popular in the digital media and advertising industries, which can use Hadoop to measure ad campaigns in real time, among other uses, Stata said. He argues that Altiscale’s solution offers those customers a good balance between performance and cost.

Marketshare, a marketing analytics company and an Altiscale customer, is certainly a fan of the service — but there’s still room to push the Hadoop ecosystem further, said Satya Ramachandran, its vice president of engineering.

Ramachandran is working with Altiscale to build a suite of applications around the service, such as data modeling tools.

“We’re really just running experiments, because we’ve got a real live Hadoop environment,” Ramachandran said. “We can just put [new] tools in front of customers and see what they think.

“Having customers like Marketshare that are using Hadoop for more than just [extract, transform, load processes] — seeing real data science get done and supporting that — is why we started the company. We’re excited to be doing it and we think that’s going to expand.”
