Ex-Google, Yahoo, Facebook employees snub recession, launch Hadoop startup

Updated

Cloudera, a startup that has not yet launched, intends to help other companies adopt a promising software platform called Hadoop.

Hadoop is an open-source software project designed to let developers write and run applications that process huge amounts of data. While it could potentially improve a wide range of other software, the ecosystem supporting its implementation is still developing, which is where Cloudera hopes to make a place for itself.

Right in the middle of a downturn. It's the kind of move the Valley is all about.

Cloudera will help other companies "install, configure and run" Hadoop, either on a company's own servers or using Amazon's hosted Elastic Compute Cloud (EC2) service.

Its founding team includes ex-Yahoo engineering vice president Amr Awadallah, ex-Google engineering star Christophe Bisciglia, former Facebook data team leader Jeff Hammerbacher and entrepreneur Mike Olson. The company hasn't announced its investors yet, but Awadallah is currently an entrepreneur-in-residence at Facebook backer Accel Partners -- so presumably that firm is already on board.

More on Hadoop: It implements MapReduce, a programming model introduced by Google that divides applications into small blocks of work, and pairs it with the Hadoop Distributed File System (HDFS), which creates multiple replicas of data blocks and places them on various compute nodes. The benefits, as stated by the official Hadoop open-source site, include:

Scalable: Hadoop can reliably store and process petabytes.

Economical: It distributes the data and processing across clusters of commonly available computers. These clusters can number into the thousands of nodes.

Efficient: By distributing the data, Hadoop can process it in parallel on the nodes where the data is located. This makes it extremely rapid.

Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures.
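
To make the MapReduce idea concrete, here is a minimal single-process sketch in Python of the classic word-count job. It does not use Hadoop or any of its APIs; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. Each string in the input list stands in for one block of data that, on a real cluster, would live on a separate node.

```python
from collections import defaultdict


def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(mapped_blocks):
    """Shuffle: group all emitted values by key across every block."""
    grouped = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(blocks):
    # Assumption for illustration: each block would be mapped on the node
    # that stores it. Here the "nodes" are just iterations in one process.
    mapped = [map_phase(block) for block in blocks]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    blocks = ["hadoop stores data", "hadoop processes data in parallel"]
    print(word_count(blocks))
```

Because every map call touches only its own block, the map step can run on many machines at once, and a failed map task can simply be rerun on another copy of the same block. That independence is what the scalability and reliability claims above rest on.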

Hadoop is already in use at large companies like Yahoo. For a vitriolic yet informative take on the technology, see Ted Dziuba's earlier write-up at The Register.

[Update: Kyle Shank, Dziuba's partner in crime at search engine Pressflip.com, says their startup used Hadoop at first but didn't find it ultimately relevant. He adds: "Hadoop is most useful when you need to regularly process gigabytes of information that fit the MapReduce paradigm. Cloudera will be most useful to companies that have giga/terabytes of data and no idea what to do with it." That's certainly not most companies, but it fits the description of a few companies with lots of cash.]