Disaster Recovery Is All About Imagination

IT Management | Guest Opinion | Kelly Lipp, Wednesday, December 2, 2009

Within our IT-centric world, we tend to forget that disaster recovery is more – much more – than getting mission-critical data restored. In fact, getting the data back might be the easiest part of the process. Tougher is knowing what is going to happen with that data after it is restored.

Take a large disaster, for example. How will your employees gain access to that data? What about your customers? What happens if your key users are no longer available to use the data? (After all, you have just experienced a disaster.) Many of our basic assumptions are probably not correct.

And what about the smaller, more common disasters? What happens if you lose a single mission-critical system? Have you thought about how you would exist without that system for some period of time? It is these kinds of disasters that surprisingly require the most thought.

Disaster recovery is about imagination. A simple exercise involving key business and IT folks sitting around a conference table and imagining what disaster could happen -- and what might happen afterward -- can set a business on the right course. If you give thought to something before it happens, the chances of a better reaction are higher.

For this exercise to be most effective, it is essential that you involve as many people outside of IT as possible. If this is an IT-only exercise, it will be much less effective. Use your other stakeholders. Their impressions are probably different from yours; different, but equally valuable.

The steps below will guide you in your imagination process.

1. Imagine the most likely events that will cause disruption within your data center.

For most of us there are perhaps two or three events that will wreck our ability to conduct our business. Some are geographical: hurricanes in the southeastern United States, earthquakes in California or tornadoes in the Midwest. Other problems, like water main breaks and fires, do not have a geographic component and can affect any of us.

Part of your exercise is to think about which of these could happen and to assess what the impact might be. Impacts include losing access to your data center or your entire site, or the unavailability of key personnel who cannot reach the site.

Some events are smaller than others. For instance, an event could be as simple as losing the telephone lines into your site. This is probably more likely than the hurricane and will cause as much disruption. Focus on these. They are much more likely to occur. Much of our disaster recovery planning involves worrying about things that will not happen while ignoring those that are much more probable. Granted, the Black Swan event, the highly unlikely event, will be devastating, but do not become too focused on it. The smaller disasters will hurt just as much and are more likely to happen.

Think of as many of these events as you can, contemplate each and rank them according to how likely they are.
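One way to capture the ranking during the meeting is a short script. The sketch below is only an illustration: the event names come from the examples above, but the likelihood numbers are made-up guesses, and the Event class and rank_by_likelihood function are hypothetical names, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    likelihood: float  # assumed rough yearly probability, 0.0 to 1.0


def rank_by_likelihood(events):
    """Return the events sorted from most likely to least likely."""
    return sorted(events, key=lambda e: e.likelihood, reverse=True)


# Illustrative guesses only; replace with your own team's estimates.
events = [
    Event("hurricane", 0.02),
    Event("telephone lines down", 0.30),
    Event("water main break", 0.05),
]

ranked = rank_by_likelihood(events)
for position, event in enumerate(ranked, start=1):
    print(f"{position}. {event.name} ({event.likelihood:.0%})")
```

The numbers do not need to be precise. The point is to force the group to agree on an order.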

A good template for the discussion might be “What would we do if…” Let your imagination run wild. The more of these you think of now, the more likely you are to recover from them when they happen.

2. Determine the business impact of these events.

Again, some events have a greater impact than others. It may be that the relatively small event, like losing Internet access, has a higher impact on your business than a hurricane, especially since it is more likely to occur. In some cases, a catastrophe like the hurricane will make it impossible to conduct business afterward.

Many events will be localized. You may lose your e-mail database. The rest of the data center is up and running but your mission-critical communications application is down. What is the impact of this?

Business impact has two components: the criticality of the application and how long it will be inaccessible. List both of these components during your exercise.

Good questions are, “How much will it hurt if it is down for an hour? How much if it is down for a day?”
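The two components can be written down side by side. The following is a minimal sketch under stated assumptions: the 1-to-5 criticality ratings are invented for illustration, and multiplying criticality by hours down is just one simple scoring rule, not an established formula.

```python
# Hypothetical criticality ratings: 1 = annoyance, 5 = business stops.
CRITICALITY = {"e-mail": 4, "Internet access": 5, "intranet wiki": 1}

# The two outage lengths asked about in the questions above.
OUTAGE_HOURS = {"an hour": 1, "a day": 24}


def impact_score(criticality, hours_down):
    """Assumed rule: impact grows with both criticality and time down."""
    return criticality * hours_down


for app, criticality in CRITICALITY.items():
    for label, hours in OUTAGE_HOURS.items():
        score = impact_score(criticality, hours)
        print(f"{app} down for {label}: score {score}")
```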

3. Rate the business impact from high to low.

There are lots of applications in most of our environments. Some are much more critical than others. Many of the things that can happen are simple annoyances while some can be devastating fairly quickly. Rate the impacts to your business.

Dollar impacts can often be elusive, but getting to this critical metric will be helpful in the later stages of the exercise. If you know how much one of these will cost, you will find it easier to gain funds to mitigate it. There may be a relatively inexpensive way to avoid the problem.
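A rough dollar figure can come from simple arithmetic. The sketch below assumes an expected-loss estimate (yearly probability times hourly cost times hours down); every number in it is a placeholder, and the function names are hypothetical.

```python
def expected_yearly_loss(yearly_probability, cost_per_hour, hours_down):
    """Assumed estimate: chance of the event times what one outage costs."""
    return yearly_probability * cost_per_hour * hours_down


def mitigation_pays_off(expected_loss, yearly_mitigation_cost):
    """True when the fix costs less per year than the loss it avoids."""
    return yearly_mitigation_cost < expected_loss


# Placeholder figures: a 30% yearly chance of losing the phone lines
# for 8 hours, at an assumed $2,000 of lost business per hour.
loss = expected_yearly_loss(0.30, 2_000, 8)
print(f"Expected yearly loss: ${loss:,.0f}")

# Compare against an assumed $1,200 per year for a backup line.
print("Backup line worth funding:", mitigation_pays_off(loss, 1_200))
```

Even a crude estimate like this gives the people holding the budget something concrete to weigh.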

4. Develop a comprehensive plan to recover from each event, starting with the high-impact events.

Pick the event that will have the most impact on your business. Imagine how you would maneuver around it.

Using e-mail as an example, it may be possible to use an alternative communications path. Perhaps most of your key employees have external e-mail accounts. Knowing their addresses and having a plan to switch communications to that path might be adequate to mitigate your e-mail outage.

The plan must be complete. Trying to plug the holes in your plan while in the middle of the outage does not work. The additional stress of knowing many are counting on you will not help your performance.

5. Develop the “Exist Without” process.

The outage will persist. What will you do while you cannot use that application? Will business come to a grinding halt?

It is essential to have a variety of plans in place based on the expected length of the outage. If the outage is short enough, perhaps you simply hunker down and wait it out. For longer outages, though, the business impact starts to be a problem. It is here that you need a well-thought-out Plan B.
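Picking a plan by expected outage length can be as simple as a lookup. This is a sketch only: the 2-hour and 24-hour thresholds and the plan descriptions are assumptions chosen for illustration, and each business would set its own.

```python
def choose_plan(expected_outage_hours):
    """Pick a response based on how long the outage is expected to last.

    The thresholds below are hypothetical; tune them per application.
    """
    if expected_outage_hours <= 2:
        return "wait it out"
    if expected_outage_hours <= 24:
        return "switch to the Plan B workaround"
    return "move to the long-term alternative process"


for hours in (1, 8, 72):
    print(f"{hours}h outage: {choose_plan(hours)}")
```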

Since this is an imagination exercise, you might as well think of many “exist without” scenarios. Some make more sense than others. Some are easier or harder to implement. Determine the best one to use and go with it.

6. Get back to “Business as Usual.”

Once the application is back online, you must transition back to your normal business plan. Again, having a developed plan is important. Unwind what you’ve done and move on.

Application outages and disasters, both big and small, are part of our IT fabric. It is what we do about them that matters. If we simply spend an hour or two imagining what we would do, we will be ahead of the curve when the disaster happens. Better yet, imagine how much better prepared you could be if you put a formal plan in place.

Time to let your imagination go wild.