Amazon wants your data in Elastic MapReduce

Amazon has announced a new way to process large amounts of data, dubbed Amazon Elastic MapReduce, that combines the company's cloud computing infrastructure with the open source Hadoop framework.

Hadoop is based on the MapReduce programming model made famous by Google: a large data set is split into chunks, the chunks are processed in parallel across many machines, and the partial results are then combined. For example, Yahoo runs Hadoop on a Linux cluster with more than 10,000 cores to produce data that's used in its web search. And companies are already running Hadoop on Amazon's Elastic Compute Cloud; that's what The New York Times did to convert its public domain archives into PDF documents.
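To make the model concrete, here is a minimal word-count sketch of the MapReduce idea in plain Python. It is an illustration only, not Hadoop's or Amazon's API: the function names (`split_into_chunks`, `map_chunk`, `shuffle`, `reduce_counts`, `word_count`) are made up for this example, and a thread pool stands in for the many machines a real cluster would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, chunk_size):
    """Break the input into smaller pieces that can be processed independently."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group emitted values by key so each word can be reduced on its own."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_counts(groups):
    """Reduce step: collapse each word's list of counts into a single total."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines, chunk_size=2, workers=4):
    """Run the split, map, shuffle, and reduce steps over a list of text lines."""
    chunks = split_into_chunks(lines, chunk_size)
    # Threads are a stand-in (an assumption of this sketch) for cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "the fox jumps"]
    print(word_count(sample))
```

Because each chunk is mapped without reference to any other, the map step can be spread over as many workers as are available; only the shuffle and reduce steps need to see results from every chunk.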

Still, implementing Hadoop isn't easy. That's why a startup called Cloudera wants to help companies use it, either in Amazon's cloud or on their own servers. And with Elastic MapReduce, Amazon is trying to make the process simpler too. Here's how it describes the service:

Using Elastic MapReduce, you can create, run, monitor, and control Hadoop jobs with point-and-click ease. You don't have to go out and buy scads of hardware. You don't have to rack it, network it, or administer it. You don't have to worry about running out of resources or sharing them with other members of your organization. You don't have to monitor it, tune it, or spend time upgrading the system or application software on it. You can run world-scale jobs anytime you would like, while remaining focused on your results.

This announcement adds to Amazon's already formidable range of web services. It also marks a move away from providing just bare-bones infrastructure and toward offering services on top of that infrastructure.

[photo: flickr/johnnie w@alker]
