The summer of downtime

Last year, 60 percent of information technology (IT) outages occurred during the summer months of June and July. This concentration suggests not only that outages are predictable but also that they can reasonably be avoided with the right protections. Even powerhouse companies like Amazon, Google and Salesforce.com were victims of downtime last year, indirectly taking down popular consumer properties like Netflix, Google Talk and Instagram along with them.

When infrastructure goes down at any time of the year, the ripple effects can be felt across an entire organization. ‘Always-on’ IT service is more crucial to running a business than ever before. IT infrastructure today is expected to perform better, faster and more consistently than at any other time in history, while at the same time adapting to an exponentially increasing rate of change.

It’s a cruel summer

Why are outages and downtime more prevalent during summer? Several factors combine to cripple the reliability of popular products and services. They include:

  • According to a recent survey, 44 percent of Americans take their vacations during this time – including IT professionals. With key staff out on vacation during summer, those left in the office are running on a “skeleton crew” and are sometimes less equipped to deal with sudden problems.
  • This increased travel also means more cellular and internet usage, and the resulting spike in user-generated traffic can overwhelm the system.
  • There is also a trend among enterprises towards adoption of cloud infrastructure services. These corporations use the summer months to aggressively migrate their data centers towards virtualization and cloud, further stressing the underlying infrastructure.

It is for all these reasons (and many more) that the summer months are such a crucial and challenging time for infrastructure performance. The reliability of the cloud is never tested more than during the summer, when workers go on vacation and require remote access. That is not to say that other forces such as breaches, weather patterns, software glitches, distracted administrators and human error don’t play a role in these outages. Whatever the cause, what matters is the ability to quickly determine the root cause, get your systems back online and prevent future service interruptions.

The new normal

Just last month, we saw a damaging software glitch that caused a system-wide computer failure at Southwest Airlines, resulting in the cancellation of close to 70 flights. This is the new normal for IT professionals across the country – increased adoption of new technologies driven by company executives leads to new and increased system usage and inevitably, increased problems.

In recent years, the adoption of virtualization, the move towards the cloud and the explosion of mobile devices have dramatically increased data volumes and complexity levels throughout the enterprise. The level of performance demanded from mission-critical applications is at an all-time high. Business leaders today expect IT professionals to ensure the speed and reliability of highly taxed infrastructure systems.

When an outage or problem does arise, the increased levels of abstraction only complicate the search for the root cause, the ‘ghost in the machine’ behind the underlying issue. No longer is an outage (or infrastructure latency) simply a concern for IT directors and enterprise executives. Society today expects business applications to be available 24/7, without delay. As we’ve become more reliant on technology to run our lives, we can expect that summer outages may affect everything from boarding a plane or booking a vacation to accessing online banking or posting our summer photos.

Ensuring a summer of uptime

This all raises the question: What can we do about it? If IT managers can reasonably predict that they are vulnerable to slowdowns or outages, what steps can they take to minimize these risks? Our customers have found that the following practices can significantly reduce the risk of summertime outages:

  • Keeping clear of the public cloud for mission-critical workloads. Public cloud vendors don’t always have solutions in place that give them in-depth visibility into the performance of the cloud infrastructure. Without granular performance metrics, it is challenging to provide the specific service level agreements (SLAs) that businesses require today. Most public cloud providers focus SLAs around availability, or uptime. Increasingly, businesses want SLAs tied to specific application and business requirements because they want assurances around application response times.
  • Instrumenting a private cloud for infrastructure performance monitoring. Infrastructure optimization tools can holistically assess the entire private cloud infrastructure, and provide the administrator with the real-time and historical data necessary to make intelligent decisions about input-output (I/O) capacity, utilization and performance for every layer of the infrastructure – network, server, storage and applications.
  • Ensuring the performance and availability of a disaster recovery site. Large organizations today need to replicate key applications and data across their environment for protection against natural or man-made disasters. These long-distance links that support a replication process are extremely important, but can be expensive. It’s critical to monitor their usage and identify any problems quickly.
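
To make the difference between an availability SLA and a response-time SLA concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Sample` record, the function names, the nearest-rank percentile and the thresholds are hypothetical and are not taken from any vendor's tooling or contract.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Sample:
    """One hypothetical probe result: latency in ms, or None if the probe failed."""
    latency_ms: Optional[float]


def availability(samples: Sequence[Sample]) -> float:
    """Fraction of probes that received any response at all."""
    if not samples:
        raise ValueError("no samples")
    ok = sum(1 for s in samples if s.latency_ms is not None)
    return ok / len(samples)


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (pct in (0, 100]) of a non-empty sequence."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_sla(samples: Sequence[Sample],
              min_availability: float,
              max_p95_ms: float) -> Dict[str, bool]:
    """Check an uptime target and a 95th-percentile latency target separately."""
    latencies = [s.latency_ms for s in samples if s.latency_ms is not None]
    return {
        "availability_ok": availability(samples) >= min_availability,
        "response_time_ok": bool(latencies)
        and percentile(latencies, 95) <= max_p95_ms,
    }


# Illustrative data only: every probe answers, but 2 of 20 are very slow.
samples = [Sample(100.0)] * 18 + [Sample(900.0)] * 2
result = meets_sla(samples, min_availability=0.999, max_p95_ms=500.0)
print(result)
```

With this made-up data the service is fully "up" yet misses the response-time target, which is the gap between uptime-only SLAs and the application-level assurances described above.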

As the summer of 2013 is upon us, now is the time for enterprises to evaluate the stress that the coming months will place on their infrastructure. To avoid outages, downtime or even infrastructure slowdowns, it’s crucial to put a comprehensive infrastructure monitoring or management program in place. The summer of downtime can be avoided, but only through system-wide awareness of physical, virtual and cloud computing environments.

John Gentry is VP of Marketing at Virtual Instruments.