Pivotal nabs one of Cloudera's top talents, Hadoop project founder Shaposhnik

Three years after joining Cloudera, early Hadoop engineer Roman Shaposhnik has jumped to Pivotal, the EMC and VMware spinoff with a Hadoop distribution that competes with Cloudera's.

As senior manager of the open-source Hadoop platform team, Shaposhnik will be assembling a group of people to contribute to the ecosystem of open-source Hadoop technologies. Pivotal formally announced the hire today in a blog post.

The move comes as Cloudera and other Hadoop distribution vendors duke it out for customers, venture capital, and collaborations with cloud service providers.

Shaposhnik's arrival at Pivotal -- given his status as an architect of core Hadoop code, not to mention his 11-year stint at engineering wonderland Sun Microsystems -- could be coldly interpreted as an indication that Pivotal has a shot at distinguishing itself and pulling in big money in the big data market.

And that reading would be fair. Shaposhnik does in fact believe Pivotal has a sagacious vision for ushering in the future of application development and data analysis, as well as clear customer interest, as demonstrated in GE's $105 million investment in Pivotal.

In 2010, while many engineers around the world were still feeling their way around open-source Hadoop technology for storing and processing huge piles of data, Shaposhnik was leading efforts to integrate many pieces of it at Yahoo, a core Hadoop contributor.

Tools like Hive and Pig had emerged for querying and analyzing big data, but the tools didn't all hook in with each other well. And Shaposhnik was in the middle of it all at Yahoo. He and his team were known internally as HIT men, referring to their task of Hadoop integration testing.

"It was partially kind of true," he told me in an interview. "We'd come to all engineers who enjoyed freedom. (We said,) 'You know what? Start paying attention to APIs (application programming interfaces) and integration. Start labeling APIs as public ... and private.'"

Suddenly, different systems that might have been isolated were getting hooked up and tested. And that work spawned an initiative called Bigtop, which has since become an open-source project in its own right within the Apache Hadoop ecosystem.

Shaposhnik is credited as Bigtop's founder. He did that work after leaving Yahoo and joining Cloudera, one of a few companies providing support and services for elements in the Hadoop ecosystem.

Shaposhnik gives credit to Pivotal for fixing its eyes on developers, and not just thinking about how to sell its Hadoop-based portfolio to big companies.

Cloudera "just want[s] to get to the enterprise Hadoop distribution," Shaposhnik said. "That's fine; nothing wrong with that. But where are all the other bits and pieces? Where's my developer sort of aspects?" He cited as an example the Java application development framework Spring, which Pivotal is sponsoring.

"Pivotal is just way better positioned to get away from just talking about Hadoop," Shaposhnik said.

Alongside its Hadoop distribution, Pivotal now offers a commercially supported version of the Cloud Foundry Platform as a Service, on which developers can build and run applications quickly and easily. It also provides analytics tools that plug into Hadoop, as well as the RabbitMQ messaging service for moving data among applications.

Those four pieces make up an important unit for Pivotal, dubbed Pivotal One, to which many other Pivotal services connect. Pushing those products helps Pivotal diversify.

And that's why Shaposhnik isn't worried about the fact that nothing will stop other Hadoop distribution vendors from snapping up the technology he makes available for free under open-source licenses.

"Essentially the secret sauce is in the platform itself," he said. "The tremendous amount of value-add happens to be all the integrations (in) the platform. It's the integration between all these bits and pieces, rather than any particular set of technologies."

So he will keep working on integrating the many open-source software components related to Hadoop. For instance, he's interested in the way that tools for real-time analytics, such as Storm, stand to become more tightly connected with Hive and Pig.

But the focus on integration won't preclude him from introducing new technologies as well. As a matter of fact, he's interested in cooking up a better way to run text searches across the seas of unstructured data sitting in Hadoop's file system.

Companies including Cloudera have been trying to connect existing open-source search technologies such as Lucene and Elasticsearch to Hadoop, but that hasn't stopped Shaposhnik from thinking deeply about the challenge. "They don't integrate with Hadoop well enough," he said. "Unstructured search has to scale with my existing data sets."

It sounds like Shaposhnik will have no shortage of things to do at Pivotal. The question is whether his hiring will lead more prominent Hadoop figures to follow him to the company.
