Planning for Disaster Recovery | Microsoft SQL Server 2000 Administrators Companion

3 4

To increase your system's uptime, it's not enough simply to cluster your servers (as explained in Chapter 12) or use RAID (as described in Chapter 5)—you must also plan for recovery before disaster strikes. Knowing how to perform good backups and restores of your database when you need to is vital, but you must also be prepared to rebuild your system from scratch if necessary. Being ready involves documenting and planning. In addition, you might find the use of the new SQL Server 2000 feature log shipping to provide sufficient recoverability. This feature allows transaction logs from the primary system to be applied to standby systems.

As a DBA, you should develop a plan for providing maximum uptime. This plan should include the following components:

Documenting the current configuration
Creating a fault-tolerant environment
Preparing for immediate recovery
Documenting the database recovery plan

All of these steps involve planning, creating documentation, or both. Often the recovery plan is not documented, and only the developer of the plan is able to execute it—this can be disastrous if that person is unavailable.

NOTE
Be sure you document your disaster recovery plan and designate a person to execute the plan if you are unavailable when disaster strikes.

Documenting the Current Configuration

If the steps involved in creating your current configuration are not well documented, problems can arise if the system needs to be reconstructed or even if new hardware is added to the system. Documenting the current configuration enables you to more quickly rebuild, reinstall, and reconfigure the system. Be sure you include the following information:

Hardware configuration Include the type and the amount of hardware, the RAID configuration, and other configuration details.
Software products installed Include complete information about all software that is installed on the server.

Creating a Fault-Tolerant Environment

As we saw in Chapter 5, the use of fault-tolerant RAID volumes can greatly reduce the chance of system failure. Disk drives are mechanical components and, as such, are subject to wear and tear. By using a fault-tolerant volume, you might be able to spare yourself the trouble of having to reinstall the operating system and SQL Server and to rebuild the database from backups, which can be a time-consuming and costly process (costly in terms of both labor and lost revenues caused by the system outage).

If one of your fault-tolerant disk drives fails, replace it as soon as possible. Eventually, another disk might fail, and some RAID levels can survive the loss of only one disk. Be prepared with spares, and be ready to use them.

Preparing for Immediate Recovery

Be ready to recover the database if necessary. Backups should be stored off-site to safeguard them against disasters such as fires or floods, but you should also keep a copy of the latest backups close at hand because you might not be able to retrieve off-site backups quickly. The use of a backup staging area is a popular way of keeping the latest backups on-site as well as improving backup and restore performance. A backup staging area is located on your network and consists of disks that store the latest database and transaction log backups. In case a restore is needed, you will not need to use tapes or retrieve them from off-site storage—the restore can begin immediately. This technique is useful if you need to perform a restore immediately, but it can be quite costly because you must devote hardware resources to storing the backups. You must evaluate your budget and your needs to determine whether a backup staging area is feasible for you.

Documenting the Database Recovery Plan

Because you might not be present when database recovery is necessary, you should carefully document the database recovery plan and share this information with others so that they can recover the system if necessary. The database recovery plan document can also be a helpful reminder if it's been a long time since your last database restore. In any event, keeping a written copy of the database recovery plan will make the restoration process run more smoothly.

Log Shipping

The Log Shipping feature of SQL Server 2000 allows for a standby system to be created and kept up to date by applying recent transaction logs to the standby system. The standby system is kept in a permanent recovery mode, continually having transactions logs applied to it. Even though this system is in recovery mode, read-only queries are allowed, thus enabling you to use the standby database for offloading reporting tasks.

In the event of a catastrophic failure on the primary server, the standby server can quickly and easily be converted into the new primary server. This feature has been used in the field by many people with SQL Server 7, but is now officially supported by Microsoft and usability features have been added. These features include the ability to setup and maintain the standby systems via the Database Maintenance Plan Wizard and the SQL Server Agent.