Last week a fairly minor Amazon cloud outage knocked popular Q&A site Quora and part of Salesforce.com offline, and degraded service at web giants Pinterest and Dropbox. The incident was small compared to last year’s multi-day meltdown that took out Foursquare, Reddit, Scvngr, and others. But it was a good wake-up call: What are you doing to prepare your services for the next cloud failure?
Newvem, a startup focused on scanning and analyzing companies’ cloud services to optimize their use of Amazon cloud technology, recently did a study of its beta launch participants. The most shocking findings?
A full 40 percent of Amazon Web Services (AWS) users are not prepared for the next outage.
“At least 20 percent of users could easily have withstood the outage, had they implemented best practices,” Cameron Peron, Newvem’s vice president of marketing, told VentureBeat. “And only 60% of users back up all of their EBS volumes.” (EBS is Amazon’s Elastic Block Store, the service that provides the storage volumes attached to its cloud servers.)
“Amazon doesn’t make any promises to back up data,” Peron said. “The real issue is that many users are under the impression that their data is backed up … but in fact it isn’t due to mismanaged infrastructure configuration.”
The problem is not unique to beginners — enterprise users of cloud services have issues as well.
“Light, medium, and heavy cloud users are running clouds where on average 40 percent of their cloud — data, applications, and infrastructure — is NOT backed up and exposed to outage meltdown,” Peron said.
Part of the problem seems to be a perception that when you put services on the cloud, everything is managed for you. Amazon or Microsoft or Rackspace has my back: I don’t need to worry.
“Bottom line: clouds go wild,” said Peron. “The last thing that developers/operations should do is assume that the cloud works on its own.”
Instead, cloud services need to be carefully monitored and managed, with particular attention to what will happen in the event of a service disruption. Running services in multiple availability zones can help protect against outages, and careful data and application backups can prevent problems when restarting afterward.
Figure: “Unhealthy instances” are unresponsive servers within your cloud
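The two precautions mentioned above, backing up every EBS volume and spreading servers across availability zones, can be checked with a few lines of code. The following is a minimal sketch, not Newvem's method: the function names and the sample data are hypothetical, and the records are assumed to be shaped like the entries returned by the AWS SDK for Python (boto3) calls `describe_volumes`, `describe_snapshots`, and `describe_instances`.

```python
from collections import Counter


def volumes_without_snapshots(volumes, snapshots):
    """Return the IDs of volumes that have no snapshot at all."""
    backed_up = {snap["VolumeId"] for snap in snapshots}
    return [vol["VolumeId"] for vol in volumes if vol["VolumeId"] not in backed_up]


def zone_counts(instances):
    """Count how many instances run in each availability zone."""
    return Counter(inst["Placement"]["AvailabilityZone"] for inst in instances)


def is_multi_az(instances):
    """True if the instances are spread over at least two availability zones."""
    return len(zone_counts(instances)) >= 2


# Hypothetical sample data, for illustration only. With boto3 the lists
# would come from something like (assumption, pagination ignored):
#   ec2 = boto3.client("ec2")
#   volumes = ec2.describe_volumes()["Volumes"]
#   snapshots = ec2.describe_snapshots(OwnerIds=["self"])["Snapshots"]
volumes = [
    {"VolumeId": "vol-a", "AvailabilityZone": "us-east-1a"},
    {"VolumeId": "vol-b", "AvailabilityZone": "us-east-1a"},
    {"VolumeId": "vol-c", "AvailabilityZone": "us-east-1a"},
]
snapshots = [{"SnapshotId": "snap-1", "VolumeId": "vol-a"}]
instances = [
    {"InstanceId": "i-1", "Placement": {"AvailabilityZone": "us-east-1a"}},
    {"InstanceId": "i-2", "Placement": {"AvailabilityZone": "us-east-1a"}},
]

print("Volumes with no snapshot:", volumes_without_snapshots(volumes, snapshots))
print("Instances per zone:", dict(zone_counts(instances)))
print("Spread across zones:", is_multi_az(instances))
```

A check like this only shows whether a snapshot exists and where servers run. It says nothing about how recent the snapshot is or whether an application can actually fail over between zones, so it is a starting point rather than proof of readiness.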
In spite of the issues, Peron still feels that Amazon has the best cloud technology:
“Despite all of the hype, AWS is the most reliable cloud out there – period. We’ve seen many of our beta participants leaving other clouds for AWS because it provides Class A infrastructure, elasticity, flexibility, and great business model.”
One option for businesses that absolutely must have complete service reliability is hosting with multiple cloud providers. That’s becoming more of a possibility now that OpenStack is making some headway. OpenStack is an initiative to standardize cloud configurations, making it easier for companies to host on multiple clouds with a minimum of configuration and customization.
Peron agrees that this is a good idea, but it comes with a cost.
“As a best practice in availability, especially to protect a cloud from an outage, a multisite deployment with multi region configuration in AWS and/or between clouds would be one of the basics.” Due to the expenses of running multiple clouds, however, he suggests that “it’s cheaper running a multisite deployment within AWS as opposed to two clouds.”
Newvem is a 12-person company headquartered in Tel Aviv, Israel. It’s backed by a $4 million round of initial investment led by Greylock Partners, Index Ventures, and others.