The Disaster Recovery Plan | Windows Server 2003 on Proliants. Deployment Techniques and Management Tools for System Administrators


The beauty of disaster recovery for AD, thanks in part to multimaster replication, is that the loss of a single DC is nearly inconsequential. Of course, you need at least two DCs per domain in your forest to leverage the fault-tolerance features of AD. Because every DC holds a copy of the AD, even though the copies might be at different versions at any given moment, it is highly unlikely that losing one DC would also lose data. A new DC can be brought up and populated with the AD data through replication, and you never have to touch a backup tape.

Multimaster replication is a two-edged sword, however. A delete operation can be initiated on any DC, and it ripples through the domain or forest within the latency period of the replication topology. Microsoft has provided tools and some improvements in Windows Server 2003 that help recover objects that were inadvertently deleted.

Of course, there is the issue of a true disaster destroying part or all of the AD. This could be anything from a disgruntled employee to a natural disaster destroying all DCs in the forest or domain. Perhaps one of the lessons we learned from the terrorist attacks on the World Trade Center in New York City on September 11, 2001, was that disasters can happen and the company's capability to stay in business might well depend on a plan to effect a recovery if such an event occurs.

Developing the Disaster Recovery Plan

A well-thought-out and tested Disaster Recovery Plan is key to a successful recovery of data. This chapter offers points to consider and describes the realities of restoring various components of AD. The components of the plan include:

Service Level Agreement (SLA) to define recovery requirements
Backup methods, including frequency, data, media, and storage
Restore methods
Testing and validation

Service Level Agreement (SLA)

In designing the Disaster Recovery Plan, the first step is to define the SLA. The SLA should define the frequency of backups (full daily, incremental, full weekly, and so on) and the recovery time. In fact, the recovery time might end up being the driving issue in determining how and how often backups are completed. For instance, if the SLA calls for a SQL database to be restored within four hours and a full backup of the database takes eight hours to restore, then you might consider breaking the backup job into smaller components, or perhaps splitting the database into smaller pieces to meet the restore requirements. It isn't hard to find companies that back up a terabyte of data only to find that it takes four days to restore, and of course they discover this only after a disaster has forced them to attempt the restoration.
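The arithmetic behind the SQL example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the text: restore time is assumed to scale linearly with data size, and the pieces are assumed to restore in parallel without competing for drives or bandwidth. The function names and the throughput figure are hypothetical.

```python
import math


def restore_hours(size_gb: float, throughput_gb_per_hour: float) -> float:
    """Estimated restore time, assuming throughput stays constant (an assumption)."""
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return size_gb / throughput_gb_per_hour


def pieces_needed(full_restore_hours: float, sla_hours: float) -> int:
    """Smallest number of equal pieces so that each piece restores within the SLA.

    Assumes the pieces are restored in parallel and do not slow each other down.
    """
    if sla_hours <= 0:
        raise ValueError("SLA window must be positive")
    return max(1, math.ceil(full_restore_hours / sla_hours))


if __name__ == "__main__":
    # Hypothetical numbers: 1000 GB restored at 125 GB per hour takes 8 hours.
    hours = restore_hours(1000, 125)
    print(f"Full restore: {hours:.1f} h, pieces for a 4 h SLA: {pieces_needed(hours, 4)}")
```

In practice the throughput should come from a timed test restore rather than from the speed of the backup job, because restores often run slower than backups.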

Backup Requirements

Backup requirements define how backups should take place and are secured, and should specify details such as

What is backed up: Candidates include files, directories, entire volumes, certain DCs or GCs, and user data. You can't back up everything, and not everything is important (for example, you don't need to back up every DC in your enterprise). Define what should be backed up.
Frequency: Should all files be backed up every day, or just certain files? Other options include a weekly full backup with daily incremental backups, or a combination in which some directories get a full backup every night and others don't.
Media: Define the media to use for the backup: tape, DVD, disk, and so on. This includes providing secure storage for the media, both to prevent a security breach by unauthorized personnel and to define the physical conditions needed to preserve the media safely.
Storage Location: There should be an off-site facility that holds backups of data that are as recent as possible, and perhaps an on-site location for the convenience of the IT staff.

Recovery Requirements

The recovery requirements specify the recovery time that is acceptable to the business. The recovery time might vary for different data. A mission-critical SQL database or Exchange server, for instance, might require a faster recovery than a DC in a domain with many DCs.

Testing and Validation

Testing and validation is undoubtedly the most critical step in the plan. The plan must be tested and validated, yet many administrators have never even tested their backups to see whether they can recover data. As a system administrator for a small company years ago, I decided to do a full backup of the system disk, user directories, and a few other areas every Friday night and do incremental backups on Saturday through Thursday. I kept the Friday tapes and did not recycle them. I recycled the daily tapes the following week because the current backup would always be Friday's tape plus the previous day's tape. The most I could lose was one day's data, which was the decision we had made. On the last Friday of each month, I kept the tape as a monthly backup tape and kept those for six months.
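The rotation described above can be sketched as a small Python helper that answers the question "which tapes do I need for a restore?" It rests on an assumption about the text: "Friday's tape plus the previous day" only works if each daily tape holds every change since Friday's full backup (cumulative). If each daily tape holds only that day's changes, every daily tape since Friday is needed, so the sketch covers both cases. All names are hypothetical.

```python
from datetime import date, timedelta

FRIDAY = 4  # date.weekday() counts Monday as 0


def last_full_backup(night: date) -> date:
    """Most recent Friday on or before the given backup night."""
    return night - timedelta(days=(night.weekday() - FRIDAY) % 7)


def restore_set(last_backup_night: date, cumulative: bool = True) -> list[date]:
    """Backup nights whose tapes are needed to restore to `last_backup_night`.

    cumulative=True  assumes each daily tape holds all changes since Friday.
    cumulative=False assumes each daily tape holds only that day's changes.
    """
    full = last_full_backup(last_backup_night)
    if last_backup_night == full:
        return [full]
    if cumulative:
        return [full, last_backup_night]
    days = (last_backup_night - full).days
    return [full + timedelta(days=i) for i in range(days + 1)]


def is_monthly_tape(night: date) -> bool:
    """True for the last Friday of a month, the tape kept as the monthly backup."""
    return night.weekday() == FRIDAY and (night + timedelta(days=7)).month != night.month


if __name__ == "__main__":
    wednesday = date(2024, 3, 6)
    print(restore_set(wednesday))                    # Friday full plus Wednesday
    print(restore_set(wednesday, cumulative=False))  # Friday plus every daily tape
```

The distinction matters for recycling: daily tapes can be reused the following week under either assumption, but a single bad daily tape costs more when the tapes are not cumulative.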

In addition to the backup, every morning I ran a test restore procedure: I put the tape in, ran a program I had created to get a directory of the tape, pulled some random files off it, and generated a report. Thus, I knew every day whether the backup tape was valid. If the media broke, I knew it at that point and replaced it. Make sure you validate your backup media, because your job and the company's future might depend on it.
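A daily spot check of this kind might look like the following Python sketch. It is not the author's program, which the text does not show. It uses a tar archive as a stand-in for the tape, lists its contents, reads a random sample of files in full, and returns a small report. The function name, report layout, and sample size are assumptions made for illustration.

```python
import random
import tarfile


def spot_check(archive_path: str, sample_size: int = 5, seed=None) -> dict:
    """List a backup archive, read a random sample of files, and report the result."""
    rng = random.Random(seed)
    report = {"listed": 0, "checked": [], "failed": []}
    with tarfile.open(archive_path) as tar:
        # Step 1: get a directory of the archive (regular files only).
        files = [m for m in tar.getmembers() if m.isfile()]
        report["listed"] = len(files)
        # Step 2: pull a random sample and read each file completely.
        for member in rng.sample(files, min(sample_size, len(files))):
            try:
                with tar.extractfile(member) as fh:
                    data = fh.read()
                if len(data) != member.size:
                    raise OSError("short read")
                report["checked"].append(member.name)
            except (tarfile.TarError, OSError) as exc:
                report["failed"].append((member.name, str(exc)))
    return report


if __name__ == "__main__":
    import sys

    result = spot_check(sys.argv[1])
    status = "VALID" if result["listed"] and not result["failed"] else "SUSPECT"
    print(f"{status}: {len(result['checked'])} of {result['listed']} files sampled, "
          f"{len(result['failed'])} failed")
```

A sample read proves only that the media is readable and the sampled files are intact. A periodic full restore to scratch space is still needed to confirm that recovery meets the SLA.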
