Amazon launches RedShift for massive petabyte-scale data analysis in the cloud


Amazon today launched RedShift, a new service inside Amazon Web Services for analyzing massive, petabyte-scale data sets in the cloud.

The service, which has already been tested in beta by web startups such as Flipboard, big data stalwarts such as NASA, and massive streaming media service Netflix — an Amazon competitor — is available today in limited preview. It’s designed for companies with datasets in the hundreds of gigabytes to the petabyte range, and it’s intended to reduce the price of big data warehousing and analysis by an order of magnitude.

“Enterprises are tired of paying such high prices for their data warehouses, and smaller companies can’t afford to analyze the vast amount of data they collect (often throwing away 95 percent of their data),” Raju Gulabani, Amazon’s VP of database services, said in a statement.

With RedShift, Amazon promises to cut data storage costs to below $1,000 per terabyte per year, which it says is a tenth the price of most data warehousing solutions. Amazon is talking about data warehousing companies such as Teradata, but also about its own offerings, saying that RedShift costs just 10 percent as much as existing Amazon Web Services solutions.

On-demand pricing, Amazon announced, will start at $0.85 per hour for a 2-terabyte data warehouse and scale linearly up to a petabyte. In other words, more data doesn’t mean a better deal. Customers who purchase reserved instances to guarantee access get lower pricing: $0.228 per hour, which translates to under $1,000 per terabyte per year.
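A quick back-of-the-envelope check of the reserved-instance claim, using only the figures Amazon announced ($0.228 per hour for a 2-terabyte node):

```python
# Sanity-check Amazon's claim that reserved pricing works out to
# under $1,000 per terabyte per year.
HOURS_PER_YEAR = 24 * 365      # 8,760 hours
reserved_rate = 0.228          # dollars per hour, reserved instance
node_capacity_tb = 2           # terabytes in the entry-level node

annual_cost = reserved_rate * HOURS_PER_YEAR   # cost per node per year
cost_per_tb = annual_cost / node_capacity_tb   # cost per terabyte per year

print(f"${annual_cost:,.2f} per node/year -> ${cost_per_tb:,.2f} per TB/year")
```

The math checks out: roughly $998.64 per terabyte per year, just under the $1,000 figure quoted.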

But it’s not just about the cost. It’s also about speed.

“Our internal tests have shown over 10 times performance improvement when compared to standard relational data warehouses,” Gulabani added. “Having the ability to quickly analyze petabytes of data at a low cost changes the game for our customers.”

Customers can buy in at the 2TB level — a single RedShift node — and scale up to one hundred 16TB nodes, a total of 1.6 petabytes. Amazon, of course, handles all provisioning, backup, and maintenance tasks. All data is continuously backed up to Amazon S3, the company said.
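The scaling range described above can be sketched with the article’s own numbers (a single 2 TB node at the low end, one hundred 16 TB nodes at the top):

```python
# Cluster capacity range implied by the announcement.
min_capacity_tb = 1 * 2        # one entry-level 2 TB node
max_capacity_tb = 100 * 16     # one hundred 16 TB nodes

print(f"min: {min_capacity_tb} TB")
print(f"max: {max_capacity_tb} TB = {max_capacity_tb / 1000} PB")
```

One hundred 16 TB nodes is 1,600 terabytes, i.e. the 1.6 petabytes the article cites.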

Amazon worked with data analytics technology company ParAccel to build the new solution. ParAccel enables high-performance analysis of massive datasets — more than a thousand JOINs in a SQL database, for example — and features real-time mid-query integration with Hadoop and other technologies.

It looks like the data storage and management industry just changed again.

