Amazon’s outage in third day: debate over cloud computing’s future begins

As Amazon’s web services outage passes its third day, the debate over the future of cloud computing is underway. The outage is inflicting considerable losses on web sites such as Reddit and Quora as users turn elsewhere to get their social media needs met.

Amazon’s Elastic Compute Cloud (EC2) service hosts thousands of major web sites that rely on it to serve pages to users. And users rely on these sites to store their personal accounts and data remotely. So when the EC2 service goes down, so do the web sites, and that means users can’t log in to access their data. It’s a big hiccup for an industry that is projected to grow to $55 billion by 2014, according to market researcher IDC.

The duration of the outage has surprised many, since Amazon has a lot of backup computing infrastructure. If Amazon can’t safeguard the cloud, how can we rely on it? So the debate begins on the future of cloud computing and on what it will take for users and companies to put their trust in cloud vendors such as Amazon.

The good thing about the cloud is that it protects users when their own home computers crash and lose data. But the rotten part about the crash of the cloud is that millions upon millions of users become helpless, and any recovery of the data is beyond their control. Some sites spend the money to run mirror sites on other cloud vendors, so the sites can remain functional even if one cloud vendor goes down. But that’s an expensive measure that many web sites haven’t taken.

“This is a wake-up call for cloud computing,” Matthew Eastwood, an analyst for the research firm IDC, told the New York Times. “It will force a conversation in the industry.”

Corporations will have to decide what computer operations to put on a cloud operated by external vendors and how much they should keep inside their own internal data centers. They will also have to figure out the right policies for backup and recovery services. And they will have to decide whether to allocate more money to backup data centers in multiple locations.

That discussion, Eastwood said, will most likely center on what data and computer operations to send off to the cloud and what to keep inside the corporate walls. Netflix uses Amazon, but it hasn’t gone offline because it fully uses Amazon’s redundant cloud backup infrastructure. For most startups, that kind of redundancy is a luxury that is too expensive, despite the risks.

Amazon Web Services is supposed to offload all of those worries. It has the expertise, the scale, and the access to massive amounts of cheap computing power. Amazon clients that are still having trouble include Foursquare, Quora, Reddit and BigDoor. The New York Times said the current problem originated in one of Amazon’s data centers in Northern Virginia. BigDoor, which offers gamification services to big companies, went down because its backup and recovery services were limited to Amazon’s data center in Virginia. BigDoor restored service by Friday evening, but its web site was still down. Amazon has been cryptic about the cause and has said only that matters are improving but the problem is still not resolved.

George Reese of O’Reilly wrote that Amazon’s setback is a learning experience and is, for the most part, a shining example of how the cloud works properly. He argued that web site developers should have planned for this kind of outage and taken advantage of Amazon’s full backup capabilities. That seems like a strained attempt to see the glass as half full.

Eventually, the cloud will become like a utility. You will be able to get as much computing power as you want with the flip of a switch, and over time you won’t have to worry about outages as much. But we’re clearly not there yet.