Chapter 5: Preparing for Disaster

Sooner or later, systems fail. They are compromised and cannot be cleaned, hardware gives up the ghost, and forces beyond our control destroy machines and datacenters. When this happens, you may need to restore from a backup, bring up new systems quickly to replace the old, or move operations to remote sites. Speed may be of critical importance here, right alongside the need to provide platforms and systems that operate just like the ones that were doing the work before.

Understanding Disaster Recovery

Disaster recovery is the process by which you restore systems and operations after such a failure. However, it cannot occur unless you have made sound preparations. There should be three parts to your plan.

First, determine which systems are so critical that the business cannot remain solvent without them. While you should develop a restoration plan for all systems, these critical systems are the place to start. You may find yourself prioritizing your systems into more than just critical and the rest. That's fine. In case of a major disaster, you'll want to know which systems to bring up first, which should follow, and which can wait until later.
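One way to keep such a prioritized list usable during an emergency is to record it in a form a script can sort. The following is only a minimal sketch: the system names, the tier numbers, and the recovery_order helper are all hypothetical, invented here for illustration. It assumes tier 1 holds the systems the business cannot survive without, with higher numbers able to wait longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    tier: int  # assumption: 1 = most critical, higher numbers can wait longer


def recovery_order(systems):
    """Return systems sorted by tier, then by name within each tier."""
    return sorted(systems, key=lambda s: (s.tier, s.name))


# Hypothetical inventory; replace with the systems from your own assessment.
inventory = [
    System("intranet-wiki", 3),
    System("order-database", 1),
    System("mail-gateway", 2),
    System("payment-frontend", 1),
]

for system in recovery_order(inventory):
    print(f"tier {system.tier}: {system.name}")
```

Sorting by name within a tier is an arbitrary choice made for readability. In practice you would likely order systems inside a tier by their dependencies on one another, which this sketch does not attempt.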

Next, for each group of systems, prepare the procedures that will bring them back online, and train your IT staff to use them. Writing out the list of steps and practicing them will bring new requirements to light and make the actual recovery run more smoothly. Determine what type of offsite facilities might be necessary, and where backup machines are required. If your research uncovers a lack of proper backup methods and storage, add that to your list of processes to improve.

Finally, determine if current administration, installation, and maintenance processes can be improved to support rapid recovery efforts. If you find weaknesses, take the time now to correct them. Several common procedures can and should be changed.

