Disaster Recovery Plans: Murphy's Law

Murphy's Law states, "If anything can go wrong, it will, at the worst possible time." Disaster recovery is not a proactive topic. It applies when safeguards, in whole or in part, have broken down and the organization is engaged in survival and business restoration activities.

Disaster recovery planning is a complex and labor-intensive process. It requires the redirection of valuable technical staff, processing resources, and, of course, funding. You can minimize the impact of such a project on scarce resources by treating the development and implementation of disaster recovery and business resumption as part of the organization's routine business planning activities. Approaches to this topic should be straightforward and uncomplicated. Remember KISS (keep it simple, stupid).

Recovery procedures are designed to prioritize the organization's resources to minimize disruptions, minimize financial losses, and ensure timely resumption of profitable operations.

Because no two organizations are structured alike, each plan should be built around the organization's own priority-ranked critical assets and the worst possible scenarios affecting them.

Experience Note 

Plan for the worst, but expect the best.

In all likelihood, organizations will experience a mixture of disaster consequences. Outages affecting manufacturing or sales may cost millions of dollars, while losses from outages affecting archived data files may be relatively small. The following questions should be asked of relevant business units in assessing the consequences of disasters and the need for recovery:

  1. At what time does the unavailability of a specific critical asset impact profitable operations?

  2. Does a specific asset generate revenue? If so, how much does it generate in 1 hour? 1 day? 2 days? 1 week? 1 month?

  3. What intangible losses (such as public confidence) would occur if a critical asset were unavailable for 1 hour? 1 day? 1 week? 1 month?

  4. How quickly can this critical asset be recovered in the event of an outage?

  5. How long did it take to recover this critical asset in the last disaster recovery test or in the last actual disaster?

  6. Do key employees understand how 1 hour of a critical asset's unavailability impacts profitability?

  7. Will employees need to be sent home because they cannot continue to work without a critical asset for 1 hour? 24 hours? 2 days? 1 week?

  8. Are you required by law, regulation, or stakeholders to make disaster recovery data available for audit? What are the liabilities of ignoring these requirements?
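Questions 1 and 2 above lend themselves to a simple calculation. The following is a minimal sketch, not a prescribed method: it assumes revenue is lost at a constant hourly rate once an outage exceeds the point at which profitable operations are affected. The names `CriticalAsset`, `revenue_per_hour`, and `tolerable_outage_hours` are hypothetical, and one month is approximated as 30 days.

```python
from dataclasses import dataclass
from typing import Dict

# Time horizons taken from the questions above, expressed in hours.
# "1 month" is approximated as 30 days (an assumption).
HORIZONS_HOURS = {
    "1 hour": 1,
    "1 day": 24,
    "2 days": 48,
    "1 week": 168,
    "1 month": 720,
}


@dataclass
class CriticalAsset:
    name: str
    revenue_per_hour: float        # hypothetical figure supplied by the business unit
    tolerable_outage_hours: float  # hours before profitable operations are affected


def outage_loss_table(asset: CriticalAsset) -> Dict[str, float]:
    """Estimate lost revenue per horizon, counting only hours beyond the tolerable outage."""
    return {
        label: max(0.0, hours - asset.tolerable_outage_hours) * asset.revenue_per_hour
        for label, hours in HORIZONS_HOURS.items()
    }


if __name__ == "__main__":
    order_entry = CriticalAsset("order entry", revenue_per_hour=1000.0,
                                tolerable_outage_hours=4)
    for label, loss in outage_loss_table(order_entry).items():
        print(f"{label:>8}: {loss:,.0f}")
```

A linear model ignores intangible losses (question 3) and seasonal swings in revenue, so the figures are best read as a lower bound for ranking assets rather than as a forecast.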

Sensible contingency plans should have the following core elements:

  1. Management resumption plans

  2. Emergency operation plans

  3. Emergency communications plans

Management resumption plans assume that a few key positions may have to be filled by any of several employees. Who would act in the CEO's place during a crisis if the CEO is unavailable? Several possibilities should be designated and trained so operations may continue as smoothly as possible. The most important aspect of this challenge is that a qualified candidate or set of candidates is trained to fill the position in an emergency. Having a set of qualified substitutes assures operational redundancy and reduces single points of failure.

Emergency operation plans include those processes needed to continue profitable operations. It is imperative that these operations begin as soon after the critical incident as possible. Specific teams of trained employees addressing critical needs should be developed, trained, and tested. This is a continuous process, because operations are dynamic, with constant changes in personnel, facilities, and data. It is important to rank required assets by priority so the most critical assets are recovered first, with less critical assets following.
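The priority ranking described above can be sketched as a simple ordered schedule. This is an illustration under stated assumptions, not part of any particular plan: the `Asset` fields and the 1-is-most-critical scale are hypothetical, and a single recovery team is assumed to restore assets one after another.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Asset:
    name: str
    priority: int          # 1 = most critical (hypothetical ranking scale)
    recovery_hours: float  # estimated time to restore, e.g. from the last test


def recovery_schedule(assets: List[Asset]) -> List[Tuple[str, float]]:
    """Return (asset name, cumulative hours until restored), most critical first.

    Assumes one team restores assets sequentially; parallel teams would
    shorten the timeline.
    """
    schedule = []
    elapsed = 0.0
    for asset in sorted(assets, key=lambda a: a.priority):
        elapsed += asset.recovery_hours
        schedule.append((asset.name, elapsed))
    return schedule


if __name__ == "__main__":
    assets = [
        Asset("archived data files", priority=3, recovery_hours=8),
        Asset("order entry", priority=1, recovery_hours=4),
        Asset("e-mail", priority=2, recovery_hours=2),
    ]
    for name, hours in recovery_schedule(assets):
        print(f"{name}: restored after {hours} hours")
```

Comparing the cumulative hours against how long each asset can be unavailable shows quickly whether the ranking, or the staffing, needs to change.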

Emergency communications can take many forms, depending on the size and needs of the organization. One of the most vital elements of an emergency communications plan is the ability to contact employees and see to their safety. A mechanism must be developed and in place by which all employees can be contacted and their safety determined. After that step has been taken, decisions such as how employees can be transported to an emergency relocation site can be made. Remember, human resources are the most important critical assets.

Every business function contains critical tasks that absolutely must be performed and secondary tasks that become important after the critical tasks have been completed. Of course, critical functions require critical assets. The longer the organization continues in its disaster recovery mode, the more likely it is that secondary tasks will be forced into critical ranking.

Organizations should document their critical tasks, along with information and business process flowcharts, in the form of an operations manual or procedures guide. This guide will provide immediate help to employees in day-to-day operations and will be of exceptional use during recovery operations. In the face of a disaster, the lack of an operations manual with policies and procedures invites arbitrary decisions based upon an employee's recollection or interpretation of a policy that was understood but never committed to writing.

Informal or unwritten functional responsibilities will cause confusion, ineffective task prioritization, and misunderstandings. Job descriptions must be formalized, as poorly defined or nonexistent positions make performance accountability impossible. A disaster is not the time to discover you do not know who is responsible for backing up the enterprise e-mail servers.

Position descriptions must be developed, maintained, and regularly reviewed with respect to job description, levels of authority, and reporting requirements. Manuals describing position policies and procedures will be introduced in legal proceedings. Count on it.

These manuals must be drafted with this end in mind. The content and tone of all policy documents should reflect the organization's professionalism, as they will form the basis of business operations during the period of the emergency.

Training employees in their duties and responsibilities before an actual disaster is critical to the recovery plan's success. Standardized employee training builds teamwork and shows how employees will perform during an actual disaster. Employees should be knowledgeable about their roles in an emergency and their specific responsibilities in disaster response and recovery. Practical training reduces development errors, improves procedures, and reduces miscommunications. The point behind practical training is the development of the employees' comfort, confidence, and performance.

Basic Employee Training

Employee recovery program training consists of these basic components:

  • Orientation. All new employees, regardless of position, should receive orientation training regarding the organization's philosophy, mission, reporting structure, chain of command, goals, and priorities.

  • Transfer or promotion. All employees, as they are transferred or promoted to a new position, should be thoroughly trained in their new emergency duties, policies, internal control mechanisms, and related responsibilities.

  • Disaster recovery and business resumption procedures. This training should include the organization's stance, reporting requirements, individual duties and responsibilities, and performance expectations in the event of an emergency. Cross-train employees so they may reasonably perform job duties of other employees. Care must be exercised in this area, as internal controls require a separation of duties and least privilege. Employees should possess only as much privilege as they need to perform their assigned tasks, and jobs must be separated from each other. If individual employees have too much privilege or there is insufficient separation of duties, fraud and abuse are the likely outcomes. Even so, there is no reason why a manager in Human Resources cannot be cross-trained as a sales manager and take this role in the event of an emergency.

  • Knowledge of the facility's emergency shut-off procedures. All employees must know the location of the shut-off devices for electricity, gas, and water. They should know how and when to shut off these utilities. Employees must know the location of fire extinguishing equipment, first-aid kits, survival equipment, and emergency supplies. Employees must know when and how to notify emergency personnel such as police, fire department, or emergency medical personnel.

  • Evacuation and emergency staging areas. All employees should know their assigned responsibilities for evacuating buildings, the location of staging areas, and what is expected of them, and they should be able to carry out those responsibilities.

  • Procedures for alerting personnel of an emergency. All employees should be trained in what actually constitutes an emergency, when they should notify inside and outside emergency personnel, and their responsibilities for all stages of the recovery effort.

  • Emergency processes. These are procedures established for identifying and reacting to disasters. It is important that designated and trained employees are in a position to identify the nature and extent of the disaster and react according to the organization's emergency/disaster plan.

  • Employee notification process. These procedures call for notifying relevant and designated employees in the event of a disaster. A notification list must include home addresses and telephone numbers, cellular telephone numbers, pager numbers, and mobile device e-mail addresses.

  • Emergency operations center. This section of the contingency plan should also include the processes for setting up an emergency operations center (EOC). An EOC may be as simple as a preselected location where telephone communications are available, or as elaborate as a specially designed facility with sophisticated communications equipment, including telephone, LAN, wireless wide-area computer network terminals, Internet, and radio equipment.

  • Emergency recovery processes. These procedures and instructions are relevant to the assessment of asset damage following a disaster. In this process, there is a specified means for determining whether a disaster exists, the extent of asset damage or destruction, and a mechanism for invoking the disaster recovery plan.

  • Business recovery processes. These are the procedures that assure the most-critical assets are recovered and implemented first. Through this part of the plan, critical profitable operations are restored first and other assets are restored in priority order, according to a specified timeline in compliance with the business disaster recovery plan. This is the section that prioritizes the restoration of assets according to the risk assessment.

  • Relocation procedures. Instructions for relocating the emergency operations are found here. Relocation sites may consist of a cold site, a warm site, or a hot site. Cold sites are locations where organizations may relocate with a minimum of equipment and services. Frequently, these sites require significant effort and delay before they can be activated, usually three to five days. Warm sites usually contain equipment and facilities requiring less effort, and are operational within 48 to 72 hours. Hot sites are equipped with redundant systems and applications so that minimum profitable operations can continue within 24 hours of the disaster. In essence, hot sites are fully equipped and are just waiting for backed-up data to be installed. Frequently, organizations form partnerships with similar entities to establish and equip mutual hot sites. The significant disadvantage of such an arrangement is that both partners may require the site at the same time.

  • Salvage operations. These are instructions for salvaging undamaged or partially damaged assets. Often these assets consist of components of facilities, computer and paper records, etc. This is the section that details the filing of insurance claims and the possibility of reoccupying the original business site.
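The cold, warm, and hot site tiers described under relocation procedures can be matched against how long an outage the organization can tolerate. The sketch below is illustrative only. It uses the upper bound of each activation range quoted above, and it assumes that cost rises from cold to warm to hot, which the text does not state. The function name and parameter are hypothetical.

```python
from typing import Optional

# Activation times from the relocation tiers described above, in hours.
# The upper bound of each quoted range is used as a conservative figure.
SITE_ACTIVATION_HOURS = {
    "hot": 24,    # minimum operations continue within 24 hours
    "warm": 72,   # operational within 48 to 72 hours
    "cold": 120,  # usually three to five days
}


def cheapest_adequate_site(max_tolerable_outage_hours: float) -> Optional[str]:
    """Pick the least-equipped site tier that still activates in time.

    Assumes cost rises from cold to warm to hot, so tiers are tried in that
    order. Returns None when even a hot site is too slow.
    """
    for tier in ("cold", "warm", "hot"):
        if SITE_ACTIVATION_HOURS[tier] <= max_tolerable_outage_hours:
            return tier
    return None


if __name__ == "__main__":
    for hours in (200, 72, 48, 12):
        print(f"tolerable outage {hours} h -> {cheapest_adequate_site(hours)}")
```

A result of `None` signals that relocation alone cannot meet the requirement and that redundancy must be built into normal operations instead.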






Critical Incident Management
ISBN: 084930010X
Year: 2004
Pages: 144
