Creating a Disaster Recovery Plan | Solaris System Management (New Riders Professional Library)

This section identifies the broad steps that are needed to create a disaster recovery plan based on thorough preparation and teamwork. The intention of this section is not to produce a specific recovery plan, but to provide information on the type of activity that should be carried out and the sort of data that an IT disaster recovery plan should contain.

A disaster recovery plan is not created during one short meeting held late on a Friday afternoon. It can take up to two years to create and implement a solid plan. This is a complex document that addresses a number of broad objectives, such as the following:

Protecting the lives of company employees (and maybe members of the general public)
Reducing the risk to the business
Protecting the company against potential legal backlash from stockholders
Maintaining consumer confidence
Recovering critical business functions

The following steps provide a broad outline of how to tackle the task of creating the disaster recovery plan.

Step 1: Obtaining Agreement and Support

Management support is probably the most fundamental of all the steps, and it can also be one of the most difficult to achieve. A disaster recovery strategy will be effective only if it has the backing of senior management. One problem is that a disaster recovery strategy yields no tangible return on investment, yet it can prove quite costly in terms of money and manpower. For this reason, management can be reluctant to approve the expenditure; it is often viewed as unnecessary overkill.

A justification in terms of potential impact (in hard cash terms) can go a long way toward obtaining the necessary support. For example, quantifying the actual revenue lost during, say, a 24-hour period of unavailability, taking into account the fact that the staff still have to be paid, tends to raise eyebrows in the higher echelons, especially when the business is turning over millions of dollars per day. Management buy-in is essential for one other good reason: the plan should be funded as an overhead, not out of the system manager's budget. That is, funding should come from a corporate budget or the company's contingency budget.
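The cost argument above is simple arithmetic, and it can help to show it worked through. The following is a minimal Python sketch with hypothetical figures: it assumes revenue is lost at a uniform rate over the day and that staff are paid for the whole outage while unable to work. A real estimate would need the company's own numbers and any other costs that apply.

```python
def outage_cost(daily_revenue, daily_payroll, hours_down):
    """Estimate the cost of an outage lasting `hours_down` hours.

    Assumption (for illustration only): revenue is lost evenly across
    the day, and payroll is still paid for the full outage period.
    """
    fraction_of_day = hours_down / 24
    lost_revenue = daily_revenue * fraction_of_day
    idle_payroll = daily_payroll * fraction_of_day
    return lost_revenue + idle_payroll

# Hypothetical business: $2,000,000 turnover and $150,000 payroll per day.
cost = outage_cost(daily_revenue=2_000_000, daily_payroll=150_000, hours_down=24)
print(f"Estimated cost of a 24-hour outage: ${cost:,.0f}")  # $2,150,000
```

Even a crude figure like this gives management a concrete number to weigh against the cost of preparing the plan.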

Step 2: Assembling a Committee

The creation of the disaster recovery strategy and associated plan should be run as a project, with a project manager coordinating the operation and guiding it through its various stages. The project manager needs to identify key areas of operation and assign staff members who are familiar with these areas to an emergency response committee. In an IT disaster recovery scenario, committee members could include the system manager, the network manager, a representative of the computer security department, and so on.

After assembling the committee, a recovery team should be identified, comprising key personnel and resources involved with the IT department. It is worth noting that this does not solely mean the computer staff. It needs to include, for example, an electrical engineer, the network supplier (telephones and data), and any suppliers that might be needed in the event of a disaster. The person elected to be the head of the recovery team must have sufficient authority within the company to make urgent decisions if it becomes necessary.

Step 3: Seeking Professional Advice

The design and implementation of a disaster recovery strategy is a complex issue and needs to be addressed properly to have the best chance of being effective. The employees of the company know the business, and a qualified expert knows about disaster planning; combining the two gives the best chance of identifying the most suitable option. For example, take a look at the organization Survive (http://www.survive.com), the leading forum for business continuity and recovery expertise.

Step 4: Carrying Out an Impact Analysis

An analysis of the business-critical systems and functions must be carried out to identify areas that are key to the survival of the organization. The analysis should answer questions such as these:

What is the impact if this business function is unavailable for an extended period of time?
What is the effect on this business if the database server were unavailable?
What is the financial impact of this business function?
Does this business function depend on any other business function?
What are the infrastructure dependencies of this business function? Does it require the Internet, a DNS server, or something else?
Does another business function depend on this one? If so, what is the financial impact on the dependent function?

These are just some example questions; in reality, a full analysis would have to include all the business functions, all the computer systems, the computer network, and communications equipment such as telephones, fax machines, and so on.
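The dependency questions above lend themselves to a simple model. As a sketch only, with hypothetical function and system names, the following Python records what each business function depends on directly, then walks that map in reverse to list everything affected, directly or indirectly, if one component fails.

```python
from collections import deque

# Hypothetical dependency map: each business function or system lists
# what it depends on directly. The names are illustrative only.
DEPENDS_ON = {
    "order_entry": ["database_server", "dns"],
    "invoicing": ["order_entry", "database_server"],
    "web_shop": ["order_entry", "internet", "dns"],
    "payroll": ["database_server"],
    "database_server": [],
    "dns": [],
    "internet": [],
}

def affected_by(failed, depends_on):
    """Return every function that directly or indirectly depends on `failed`."""
    # Invert the map so we can walk from a component to its dependents.
    dependents = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    # Breadth-first walk outward from the failed component.
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

print(sorted(affected_by("database_server", DEPENDS_ON)))
# ['invoicing', 'order_entry', 'payroll', 'web_shop']
```

The web shop appears in the result even though it does not use the database server directly, because it depends on order entry. Indirect dependencies of this kind are easy to miss in a purely manual analysis.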

Step 5: Carrying Out a Risk Assessment

An assessment of the different kinds of disaster should be carried out to determine whether the location or type of business makes the company prone to any particular type. For example, the company might be located in an area prone to earthquakes, or it might carry out work for the defense industry and therefore be a potential target for terrorist attack. The assessment should identify the elements posing the greatest risk and, conversely, those posing no risk; a company located on top of a hill, for example, would not be at risk from flooding.

Step 6: Collecting Required Information

Before the plan can be put down on paper, some necessary information must be obtained and collated. The sort of information that is needed includes the following:

A callout list for management and disaster response team members
A floor plan showing the location of computer equipment and associated computer network infrastructure
A list of vendor contacts to be called in the event of a disaster
An inventory of the hardware listed by type, model number, configuration information, original cost, date purchased, and associated software for each system, including the version number
An inventory of the software, listed by cost, date purchased, license key codes, details of the system acting as the license server, and number of licenses purchased
Copies of hardware and software maintenance contracts
An inventory of mobile telephones held by members of staff, along with copies of the agreements
Any special information that might be required, depending on the type of business

The majority of the information listed here would be better placed as a series of appendices to the disaster recovery plan itself, with references to them within the document. The advantage of organizing the plan in this way is that changes are much easier to make without affecting the entire plan; the relevant appendix can merely be updated.

Step 7: Creating the Plan

Keep it simple and as nontechnical as possible. The instructions contained within a plan might have to be implemented by a nontechnical person if key members of staff are not available or are incapacitated. The plan needs to identify what to do for the following time periods, clearly referencing the information already collated in Step 6.

Initial Response: The First 24 Hours

This is the most crucial period, during which the emergency authorities and the disaster recovery team will be contacted. Communication is vital so that key members of the organization are kept informed of the current situation. During this time, the alternate site should be activated if required, and hardware and software vendors should be contacted to arrange delivery of replacement systems and network infrastructure. A detailed log should be started to record events and actions; this material will provide vital details for the learning process after the disaster is over.

Of course, the company's employees also need to be informed of the situation so that they can be diverted to the alternate site or otherwise directed as appropriate to the particular incident. In larger companies, this can often be achieved via a tree network: managers are informed and contact their supervisors, who in turn contact the staff members for whom they are directly responsible.
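A tree network like this is straightforward to write down and check in advance. The following minimal Python sketch uses a hypothetical call tree (all names are placeholders, not part of any real callout list): it produces the cascade of calls level by level from the head of the tree, and reports anyone on a staff list whom the tree never reaches.

```python
from collections import deque

# Hypothetical call tree: each person lists the people they must phone.
# A real tree would be built from the callout list collected in Step 6.
CALL_TREE = {
    "recovery_team_head": ["it_manager", "ops_manager"],
    "it_manager": ["supervisor_a", "supervisor_b"],
    "ops_manager": ["supervisor_c"],
    "supervisor_a": ["staff_1", "staff_2"],
    "supervisor_b": ["staff_3"],
    "supervisor_c": ["staff_4", "staff_5"],
}

def call_sequence(tree, root):
    """Return (caller, callee) pairs in the order the calls cascade from root."""
    calls = []
    queue = deque([root])
    seen = {root}
    while queue:
        caller = queue.popleft()
        for callee in tree.get(caller, []):
            if callee not in seen:  # skip anyone listed under two callers
                seen.add(callee)
                calls.append((caller, callee))
                queue.append(callee)
    return calls

def unreached(tree, root, everyone):
    """Return the people on a staff list whom the tree never contacts."""
    contacted = {root} | {callee for _, callee in call_sequence(tree, root)}
    return set(everyone) - contacted

for caller, callee in call_sequence(CALL_TREE, "recovery_team_head"):
    print(f"{caller} -> {callee}")
```

Running `unreached` against the full staff list whenever people join or leave is one way to catch gaps in the tree before a real emergency exposes them.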

The Next 48 Hours

This is the interim period during which the replacement hardware should be delivered and installation can commence. Procedures for recovering critical systems need to be clearly identified, including the complete restoration from the most recent set of backup media.

A clear order of priority should be established, designed to restore the operating capability of the business as soon as possible. The instructions should be easy to follow because, as mentioned previously, the recovery might need to be undertaken by someone who is not part of the system administration team.
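Setting a restore priority is partly a dependency problem, because a system cannot usefully be restored before the systems it relies on. The following minimal Python sketch assumes hypothetical system names and priority numbers. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later) to ensure that prerequisites always come first, and a heap so that, among the systems whose prerequisites are already back, the most business-critical one is restored next.

```python
import heapq
from graphlib import TopologicalSorter

# Hypothetical systems, each mapped to the systems that must be running first.
REQUIRES = {
    "network": [],
    "dns": ["network"],
    "database_server": ["network"],
    "order_entry": ["database_server", "dns"],
    "payroll": ["database_server"],
    "web_shop": ["order_entry"],
}

# Hypothetical business priority: a lower number means restore sooner.
PRIORITY = {
    "network": 0,
    "dns": 1,
    "database_server": 1,
    "order_entry": 2,
    "web_shop": 3,
    "payroll": 4,
}

def restore_order(requires, priority):
    """Order systems so prerequisites come first and, among the systems
    whose prerequisites are already restored, the highest priority wins."""
    sorter = TopologicalSorter(requires)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    ready = []  # min-heap of (priority, name) for restorable systems
    order = []
    while sorter.is_active():
        for system in sorter.get_ready():
            heapq.heappush(ready, (priority.get(system, float("inf")), system))
        _, system = heapq.heappop(ready)
        order.append(system)
        sorter.done(system)  # may make dependent systems ready
    return order

for step, system in enumerate(restore_order(REQUIRES, PRIORITY), start=1):
    print(f"{step}. Restore {system}")
```

With these figures, payroll becomes restorable as soon as the database server is back, but it is deferred until the higher-priority web shop is running. The numbered output is the kind of plain sequence that a nontechnical person could follow.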

Resumption of Normal Business

Depending on the extent of the disaster, this period could last for days or weeks. The plan should identify the procedures needed to ensure business continuity, such as which staff members are required to work at the alternate site. It should also resolve issues such as what to do with excess staff who cannot be accommodated at the alternate site, perhaps by setting up a shift system in which employees work fewer hours but at different times of the day. Employees might have to travel farther to reach the alternate site, so there may be additional accommodation issues and travel expenses.

Return to Normalcy

This is when the original site is declared open again. The procedures must identify the sequence of events for relocating the business back to the original site. This includes taking final backups, restoring them to the new replacement systems at the original site, and clearing out the alternate site, making sure that it is left secure.

Step 8: Reviewing the Plan

When the disaster recovery plan has been created, the first thing to do is to have it reviewed, preferably by an independent party, for objective analysis. Comments and suggestions that result from the review should be addressed and, if necessary, the plan should be amended to reflect the decisions made. The disaster recovery plan should be reviewed at least once a year, and more frequently if changes are taking place within the organization.

Step 9: Rehearsing the Plan

The members of the disaster recovery team, the management, and key members of staff should all be made familiar with the contents of the plan. They should all know where copies of the plan are kept, particularly the off-site copies, because these will be required if the building burns down. A rehearsal will make the recovery team more confident of the procedures to follow, and it may reveal gaps in the plan so that the plan can be amended accordingly. A rehearsal will never be exactly like the real thing, but the better prepared the members are, the better they will react if a disaster does happen.

The best rehearsal, though, is a drill. For example, the system manager arrives at work in the morning only to be denied access to the computer systems and informed that they have all been destroyed. The senior system administrator has also been incapacitated as a result of the disaster (in reality, the administrator would be maintaining the systems while the drill is in progress). In this situation, the system manager would have to invoke the disaster recovery plan as if it were the real thing. The members of a disaster recovery team often learn the most from drills like this because problems that occur can be noted and rectified, adding to the value of the overall plan. The simulation also allows mistakes to be made, which is all part of the learning process, without the consequences they would have in a real disaster.
