5. Inappropriate communication about the outage.
When DreamHost customers were hit by a problem with the company’s billing system, DreamHost responded with what it thought was a humorous explanation, one built around Homer Simpson cartoons and jokes rather than an apology and a responsible account of what went wrong. The result was a legendary furor: customers savagely attacked DreamHost in online comments and on social media.

How to Avoid:

If an outage affects your customers and their ability to conduct business, take it seriously. Someone at your customer’s company selected you as a vendor, and that person’s judgment could be called into question because of your outage.

6. Missing any of the 5 elements to a successful outage communication/apology
“I’m so sorry that this is happening, but I cannot help you. Yes, I realize that not providing you with any useful information about why this is happening and what is being done to solve it, not giving you an ETA for resolution, and not telling you how we plan to prevent it from happening again or what we intend to do to compensate you for the trouble must be incredibly frustrating, and you have my deepest and heartfelt apologies for any inconvenience this is causing you. I know you depend on us, we value you as a customer and we take this very seriously, etc.”

Don’t do this. This mistake can be a symptom of not having a direct and open line of communication between your customer support and technical operations teams, or of softening apologies at the urging of your Legal or Finance departments.

How to Avoid:

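The 5 elements (why the outage is happening, what is being done to solve it, an ETA for resolution, how you plan to prevent it from happening again, and how you intend to compensate customers) are core to any well-formed apology. Including them will cost you far less, in the long run, than the revenue you’ll lose if your customers leave in large numbers because you mishandled the outage.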

7. Disaster Recovery That’s a Disaster
Companies make a number of mistakes when architecting a Disaster Recovery (DR) solution. The first and most obvious is not having DR in place at all. The second is architecting a solution without factoring in the increased load that lands on the secondary system when failover occurs. Most computer loads do not scale linearly. If two sites are each running a database at 40 percent load, that does not mean one site can handle the workload of both at 80 percent load. The combined load is more likely to be 120 percent, which means that in a DR event, the failover of one site will bring both sites down.
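To make that arithmetic concrete, here is a minimal sketch of how capacity-test results could be used to project load after a failover. It assumes utilization follows a simple power law in workload, which is only one plausible model; the function names and the measurements are hypothetical and not taken from any real system.

```python
import math

def fit_power_law(samples):
    """Fit utilization = a * workload**b by least squares in log-log space.

    samples: (workload, utilization) pairs from a capacity test, with
    utilization as a fraction of capacity (0.40 means 40 percent).
    """
    xs = [math.log(w) for w, _ in samples]
    ys = [math.log(u) for _, u in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx  # b > 1 means load grows faster than workload
    a = math.exp(mean_y - b * mean_x)
    return a, b

def utilization_after_failover(a, b, workload_per_site, sites=2):
    """Projected utilization of each surviving site after one site fails."""
    surviving = sites - 1
    return a * (workload_per_site * sites / surviving) ** b

# Hypothetical capacity-test measurements: (requests per second, utilization).
measurements = [(250, 0.13), (500, 0.40), (750, 0.76)]
a, b = fit_power_law(measurements)
projected = utilization_after_failover(a, b, workload_per_site=500)

print(f"scaling exponent: {b:.2f}")
print(f"projected utilization after failover: {projected:.0%}")
if projected >= 1.0:
    print("One site cannot absorb the other's workload; add headroom first.")
```

With these made-up numbers the fitted exponent is greater than 1, so a site that sits at 40 percent today is projected to exceed 100 percent once it inherits its partner’s traffic. Real systems may follow a different curve, which is why the measurements have to come from your own capacity tests.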

How to Avoid:
Run capacity tests on your systems, so you know your headroom and the pattern of how your performance scales with workload. Another approach is to have the DR site not active at all, but be an idle replica of a production site. Of course, this almost certainly means that it will be slightly misconfigured in a DR event – unless you take to heart the next mistake.

8. Expecting Perfection Without Practice
When Chesley “Sully” Sullenberger landed a US Airways jet in the Hudson River with no fatalities, he’d logged more than 20,000 hours of flight time and completed countless simulated emergency exercises. He put in the time in advance so that he knew exactly what needed to be done at each critical juncture. Yet many companies fail to test their plans, or fail to test them often enough to develop expertise at making them work. And when trouble starts, they’re not ready.

How to Avoid:
Form a Business Continuity Plan and test it multiple times. It’s far better for something to go wrong during a test than during an actual outage.

9. Diffusion of Responsibility
Researchers have shown that people are less likely to take action or feel a sense of responsibility for handling an emergency when they are part of a large group. Diffusion of responsibility is often used to explain why individuals in distress are less likely to receive assistance if a large group is present. In essence, the members of that group collectively decide that if others aren’t acting, it must not be that serious. This happens far less frequently when an individual confronts the same situation alone.

Companies often suffer from diffuse responsibility during outages. Issues are not clearly assigned to individuals to resolve, and groups point fingers at each other or cannot identify who is responsible. This is often the result of too many monitoring solutions or unclear escalation paths.

How to Avoid:
Assign clear responsibility in your outage response plan and include timelines for escalation. Also try to get all teams using a single monitoring platform, like LogicMonitor, which can automatically notify the correct person based on the type of issue being reported. Such a platform can also have escalation chains set up so that, if the issue is not resolved after a pre-defined period of time, escalation is automatic and notifications go out to the next person in the chain.
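The escalation-chain idea can be illustrated with a short sketch. This is not LogicMonitor’s API or configuration format; the issue types, contact names, and wait times below are all hypothetical, chosen only to show routing by issue type and time-based escalation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EscalationStep:
    contact: str       # who is notified at this stage (hypothetical names)
    wait_minutes: int  # time this contact has before the next stage is notified

# Hypothetical routing table: each issue type has its own ordered chain.
ESCALATION_CHAINS = {
    "database": [
        EscalationStep("dba-on-call", 15),
        EscalationStep("dba-lead", 30),
        EscalationStep("vp-engineering", 60),
    ],
    "network": [
        EscalationStep("network-on-call", 10),
        EscalationStep("network-lead", 20),
    ],
}

# Fallback so that no issue type is ever left without an owner.
DEFAULT_CHAIN = [
    EscalationStep("operations-on-call", 15),
    EscalationStep("operations-manager", 30),
]

def chain_for(issue_type):
    """Pick the escalation chain for an issue type, or the default chain."""
    return ESCALATION_CHAINS.get(issue_type, DEFAULT_CHAIN)

def contacts_to_notify(chain, minutes_unresolved):
    """Return every contact who should have been notified by now."""
    notified = []
    elapsed = 0
    for step in chain:
        if minutes_unresolved < elapsed:
            break  # this stage's turn has not come yet
        notified.append(step.contact)
        elapsed += step.wait_minutes
    return notified

print(contacts_to_notify(chain_for("database"), minutes_unresolved=20))
```

The point of the design is that ownership is never ambiguous: at any minute of an unresolved incident, the plan names exactly who has already been told and who is next.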

10. Suffering from the “Tyranny of the Urgent”
“Our dilemma goes deeper than shortage of time; it is basically a problem of priorities.”
– Charles E. Hummel, Tyranny of the Urgent

When I was writing this content, I reached out to a number of CEOs of rapidly-growing SaaS companies. I was surprised when one told me: “I’m interested in the topic, but I wouldn’t be able to make the time to attend the live Web Seminar.”

“Why?” I asked.

“Well… we aren’t having outages right now,” he responded.

Wrong answer.

It’s easy to get caught up in the tyranny of urgent priorities and spend all of your time firefighting. The glory! The heroics!

But often it’s the mismanagement of priorities that are important but not yet urgent that turns situations into 5-alarm fires and massively drains your organization’s resources. Ignoring the little things that you can do in advance to prevent outages is like refusing to spend a few minutes putting fresh batteries into your smoke detectors and keeping some fire extinguishers on hand. Being prepared and proactive is not as glorious. But it is so much smarter.

How to Avoid:
Make preventing outages a priority by requiring teams to spend a portion of their time on proactive steps that keep outages from ever occurring. Your shareholders and customers will thank you for it.

Improving management of outage incidents can produce better outcomes for your company’s employees, customers and shareholders. It won’t be easy. But it will be worth it. And it all starts with avoiding some basic mistakes that others have made before you.

As Otto von Bismarck once said, “Fools say that they learn by experience. I prefer to profit by others’ experience.”

Scott Barnett is a senior marketing director at LogicMonitor, which monitors servers and networks. He’s hosting a webinar on these 10 mistakes on October 2.