Introduction to Disaster Recovery Planning

By Paul Chin

Originally published in Intranet Journal (24-Mar-2005)


Events over the last several years (Sept. 11; bush fires, earthquakes, and mudslides on the West Coast of North America; massive floods in China; and the Boxing Day tsunami in Southeast Asia) have forced us to rethink what we consider critical infrastructure. These events brought about the loss of essential services, causing widespread panic and confusion among populations unable to handle the emotional and physical stress of what was unfolding before them.

Almost all organizations, whether commercial or governmental, rely on some form of technology to manage the various parts of their operations. A disruption to the availability of any of these resources, even if only for a few hours, can have serious consequences for their ability to function at normal capacity. For organizations that provide mission-critical services, such as power plants, telecommunications facilities, and national defense agencies, disruptions must be kept to a minimum or, if possible, avoided altogether.

How an organization responds to threats during and after a crisis will determine whether it emerges on the other side intact or ceases operations entirely. This is where disaster recovery planning comes into play.


What's a Disaster Recovery Plan?

A disaster recovery plan (DRP), a term often used synonymously with business continuity plan (BCP), is a comprehensive set of measures and procedures put into place within an organization to ensure that essential, mission-critical resources and infrastructures are maintained or backed up by alternatives during various stages of a disaster.

A DRP must address three areas:

  1. Prevention (pre-disaster): The pre-planning required—using mirrored servers for mission critical systems, maintaining hot sites, training disaster recovery personnel—to minimize the overall impact of a disaster on systems and resources. This pre-planning also maximizes the ability of an organization to recover from a disaster (I discussed the issue of prevention in great detail in my two-part series "The Keys to Maintaining Intranet Integrity").
  2. Continuity (during a disaster): The process of maintaining core, mission-critical systems and resource "skeletons" (the bare minimum assets required to keep an organization in operational status) and/or initiating secondary hot sites during a disaster. Continuity measures prevent the whole organization from folding by preserving essential systems and resources.
  3. Recovery (post-disaster): The steps required for the restoration of all systems and resources to full, normal operational status. Organizations can cut down on recovery time by subscribing to quick-ship programs (third-party service providers who can deliver pre-configured replacement systems to any location within a fixed timeframe).
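
As a small illustration of the three areas above, the sketch below checks that a draft plan lists at least one measure for each phase. The `Phase` enum, the `missing_phases` helper, and the layout of `example_plan` are hypothetical conveniences, not part of any standard; the measures themselves are paraphrased from the list above.

```python
from enum import Enum


class Phase(Enum):
    """The three areas a DRP must address."""
    PREVENTION = "pre-disaster"
    CONTINUITY = "during a disaster"
    RECOVERY = "post-disaster"


# Hypothetical draft plan: each phase maps to the measures planned for it.
example_plan = {
    Phase.PREVENTION: [
        "mirror mission-critical servers",
        "maintain hot sites",
        "train disaster recovery personnel",
    ],
    Phase.CONTINUITY: [
        "keep resource skeletons running",
        "initiate secondary hot sites",
    ],
    Phase.RECOVERY: [],  # left empty on purpose so the check has something to find
}


def missing_phases(plan):
    """Return the phases for which the plan lists no measures."""
    return [phase for phase in Phase if not plan.get(phase)]


if __name__ == "__main__":
    for phase in missing_phases(example_plan):
        print(f"No measures defined for {phase.name.lower()} ({phase.value})")
```

Here the check would flag the recovery phase, since the example plan deliberately leaves it empty.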

Disaster recovery and business continuity planning, however, involves more than just a series of technology-based system recovery procedures. A DRP needs to include contingencies for the loss of:


The Objectives of Disaster Recovery Plans

A DRP is an insurance policy; you pray that you'll never need to use it, but you'll be glad you have it if you ever do. It enables an organization to respond efficiently to potential threats that may render all or part of its operations and resources unavailable. Unfortunately, according to a META Group article, only 20 percent of the Global 2000 currently claim to be prepared with some type of DRP.

So, why a DRP? It protects an organization in many ways:

In general, smaller operations that don't provide any essential services may find it cost-ineffective, or even unnecessary, to implement a full-scale DRP beyond keeping off-site backups and maintaining a basic set of server power-down procedures. Larger enterprises with mission-critical data and systems, however, must consider a more extensive solution in order to prevent a total collapse and cessation of operations.

But DRP necessity—as well as the size and scale of the DRP—really depends on the purpose of the organization and the importance of business continuity during a disaster.


Identifying and Prioritizing Core Resources

Disasters can come at any time and in any form. They can knock out an organization's internal systems for a few hours or they can bring about the complete destruction of its primary facility. Becoming aware of all the potential threats that can affect the availability of core resources, and developing action scenarios to respond to those threats, will be vital in implementing a DRP.

Threats can be classified as:

Once disaster strikes, there won't be any time for humming and hawing. The aim, during the first few minutes of a disaster, is to ensure that the organization is able to maintain its mission-critical functions during the crisis. Disaster recovery personnel must know exactly what resources (technology, information, personnel) need to be available and what can be powered down or redirected towards other more important tasks.

But preparing a DRP requires more than just the recovery teams. All departments need to be involved in identifying and prioritizing their resources in terms of:

The various department representatives and disaster recovery personnel can then work together in designing the appropriate DRP procedures to manage and recover departmental assets.
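
As a rough illustration of the prioritization exercise described above, the sketch below ranks an inventory of departmental resources and splits it into what must stay available and what can be powered down or redirected. Everything in it is an assumption made for the example: the `Resource` fields, the ranking rule (mission-critical first, then shortest tolerable downtime), and the sample inventory with its downtime figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    department: str
    mission_critical: bool
    max_downtime_hours: float  # hypothetical: how long the loss can be tolerated


def prioritize(resources):
    """Rank mission-critical resources first, then by shortest tolerable downtime."""
    return sorted(
        resources,
        key=lambda r: (not r.mission_critical, r.max_downtime_hours),
    )


def continuity_split(resources):
    """Split the ranked inventory into resources to keep up and resources to defer."""
    ranked = prioritize(resources)
    keep = [r for r in ranked if r.mission_critical]
    defer = [r for r in ranked if not r.mission_critical]
    return keep, defer


# Hypothetical inventory gathered from department representatives.
inventory = [
    Resource("intranet news page", "Communications", False, 72),
    Resource("payroll database", "Finance", True, 8),
    Resource("customer order system", "Sales", True, 1),
]

if __name__ == "__main__":
    keep, defer = continuity_split(inventory)
    print("Keep available:", [r.name for r in keep])
    print("Power down or redirect:", [r.name for r in defer])
```

A real inventory would likely use richer criteria agreed on by each department; the point of the sketch is only that the ranking is decided ahead of time rather than during the crisis.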


Tips for Writing and Implementing a DRP

It's important to understand that a DRP will be carried out during extremely stressful circumstances. Disaster recovery teams have to be able to mobilize quickly and be prepared to deal with three simultaneous events: the disaster itself, the failure and unavailability of systems and resources, and end-user confusion caused by both the disaster and the system failures.

The clarity of DRP procedures is crucial to effective execution, so make sure that they are direct and to the point, without any room for interpretation. Here are some tips on writing and implementing a DRP:

It's impossible to plan for every imaginable situation and combination of system failures. While a DRP will go a long way towards minimizing the need for on-the-spot decision-making during a disaster, there will be times when it's unavoidable. A small group of disaster recovery decision-makers, with appropriate alternates, should be appointed to act as coordinators during a crisis. This will help eliminate any uncertainties or arguments caused by unexpected circumstances. They must be able to make speedy decisions, work effectively under high stress, and be comfortable in a leadership role.


Running DRP Simulations

The key to maintaining and executing an effective DRP is preparedness, and the best way to prepare for potential disasters is through practice. This is accomplished by running real-time drills, or "war games," of various disaster scenarios. These war games allow an organization to measure two things: the effectiveness of the DRP, and the ability of the recovery teams to carry it out.

All the care and time in the world may go into designing the DRP, with procedures reviewed and revised on paper numerous times, but if those responsible for executing it are not prepared, it won't do the organization any good. Disaster recovery team members can use war games as a tool to familiarize themselves with the frenetic pace at which the procedures need to be carried out.

They need to be able to keep their emotions in check—much like doctors working in an emergency room—to ensure that DRP procedures are carried out efficiently in spite of the highly volatile environment. Discovering, in the midst of a real crisis, that key personnel are unable to handle the pressure and responsibility of disaster recovery will end up compromising the entire organization's ability to maintain its operation.
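
One way to keep the two measurements from a war game separate (the effectiveness of the plan and the ability of the team) is to record each drill step and review the results afterwards. The sketch below is only one possible bookkeeping scheme: the `DrillStep` fields, the target times, and the rule that an overrun counts against the team only when the written procedure was sufficient are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DrillStep:
    procedure: str
    target_minutes: float      # hypothetical time allowed for the step
    actual_minutes: float      # time the team actually took in the drill
    plan_was_sufficient: bool  # False if the team had to improvise


def review_drill(steps):
    """Separate weaknesses in the plan from overruns in its execution."""
    plan_gaps = [s.procedure for s in steps if not s.plan_was_sufficient]
    team_overruns = [
        s.procedure
        for s in steps
        if s.plan_was_sufficient and s.actual_minutes > s.target_minutes
    ]
    return {"plan_gaps": plan_gaps, "team_overruns": team_overruns}


# Hypothetical results from a single drill.
drill = [
    DrillStep("fail over to hot site", 30, 25, True),
    DrillStep("notify department heads", 10, 22, True),
    DrillStep("restore mail server from backup", 60, 95, False),
]

if __name__ == "__main__":
    report = review_drill(drill)
    print("Procedures to revise:", report["plan_gaps"])
    print("Steps to rehearse:", report["team_overruns"])
```

Steps listed under `plan_gaps` point to procedures that need rewriting, while those under `team_overruns` point to steps the team should rehearse again.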

War games contribute to disaster recovery by:


Final Thoughts

There are countless unpredictable events that can jeopardize an organization's ability to function. Some can be prevented by implementing appropriate system and resource redundancies; others can only be contained, held back to minimize their impact on the organization's core infrastructure.

Organizations need to identify their critical assets and maintain a comprehensive, thoroughly tested DRP because no one really knows when a disaster will strike. But should it ever happen, a DRP and a well-trained recovery team will ensure that the organization isn't caught completely off guard. And this is really what separates business continuity from business catastrophe.


Copyright © 2005 Paul Chin. All rights reserved.
Reproduction of this article in whole or part in any form without prior written permission of Paul Chin is prohibited.