Chapter 27: Disaster Recovery of an Exchange Server 2003 Database | Microsoft Exchange Server 2003 Administrators Companion (Pro-Administrators Companion)

This chapter focuses on the backup and recovery of your Microsoft Exchange Server 2003 databases. You will learn how to perform a recovery of an Exchange Server 2003 database and become familiar with the differences between recovery in Microsoft Exchange 5.5 and recovery in Exchange Server 2003. You will also learn about a new feature in Windows Server 2003 named Volume Shadow Copy and how it integrates with Exchange Server 2003.

Backup and Restore Strategy

It goes without saying that you must have a good backup strategy before you can have a good recovery strategy. Implementing a solid plan and maintaining database consistency can improve the integrity of any Exchange Server 2003 database.

Your backup strategy determines your restore strategy. These operations cannot be planned separately. When you create a backup strategy, you should also consider how you would like to restore your databases. For example, ensure you have enough hard disk space to restore both the database and the log files. If you generate 2000 log files in a single week, at 5 MB per log file you’ve got 10 GB of information to (potentially) restore. Add to that your database sizes, and you’ll begin to see why you need to plan your restore strategy along with your backup strategy.
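The disk space arithmetic above can be sketched in a few lines. This is only an illustration: the function name and the 40 GB database size are assumptions made for the example, and gigabytes are counted as 1000 MB to match the rounding used in the text.

```python
LOG_FILE_MB = 5  # each Exchange Server 2003 transaction log file is 5 MB

def restore_space_gb(log_files, database_gb):
    """Estimate the disk space (in GB) needed to restore logs plus databases.

    Hypothetical helper for illustration; uses 1 GB = 1000 MB.
    """
    log_gb = log_files * LOG_FILE_MB / 1000
    return log_gb + database_gb

# 2000 log files alone come to 10 GB; a 40 GB database (an assumed size)
# raises the space to plan for to 50 GB.
print(restore_space_gb(2000, 0))
print(restore_space_gb(2000, 40))
```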

You can’t perform a restore without knowing that your backups are working. You should be verifying your backup jobs every day to ensure you can perform successful restores. Failure to verify backups is a common mistake because it is easy to assume that backup tapes are swapped and that data is backed up properly. Make it part of your daily routine to review all backup logs and to follow up on any errors or inconsistencies.

Understanding the Extensible Storage Engine (ESE), and the Web Storage System (WSS) that is built on it, is also important before you plan your backup and restore strategy. In Chapter 2, "Understanding Exchange Server Storage Architecture," we outlined how this database works. In this chapter, we dive a bit deeper into the parts of ESE that are relevant to the recovery of an ESE database, so be sure you are familiar with the concepts in Chapter 2.

Database GUID

Each ESE database has a globally unique identifier (GUID) that is assigned to the database and stored in Active Directory. This is important to understand, because if the GUIDs do not match at any point, the databases will not mount.

Mailbox GUID

Not only does each database have its own GUID, but each mailbox in the database has a unique GUID too. The mailbox GUID becomes an attribute of the user account in Active Directory to which the mailbox is assigned.

This is why you can disconnect and reconnect mailboxes between different user accounts. This also means that even though you can delete a user from Active Directory, the mailbox still exists in the database if you’ve configured a Deleted Mailbox Retention time. The default value is 30 days, meaning that when you delete a mail-enabled user account, the mailbox persists in the database for an additional 30 days after the user account is deleted.

Log File Signature

Each transaction log set has a unique signature that is written to the header of each transaction log in the set. If, for some reason, you delete all the log files in your transaction log set, when you restart the server, the ESE creates a new series of log files, starting with a generation number of one. Because log files can have the same name, the ESE stamps the header in each file series with a unique signature so that it can distinguish between different series of log files.

Circular Logging

Except on Exchange Server 2003 servers where recovery of information is not important, circular logging is a bad idea. Circular logging reduces the storage requirements for the transaction logs by reusing log files after the transactions in them are committed to the databases. The cost is that the older logs are no longer available to replay after a restore, so you can recover only to the point of your last full backup. Fortunately, circular logging is disabled by default.

Checksum

The checksum (also called a message hash) is a 4-byte value that is calculated and then added to each page in the database to verify the integrity of each page. The checksum itself doesn’t guarantee data integrity; instead, the recalculation of the checksum when the page is read into RAM ensures that the data being read from the database is identical to the data that was written to the database.

In terms of the overall page construction, the first 82 bytes of the database page contain the header information, which contains flags for the type of page as well as information about what kind of data the page contains. When a page is loaded into RAM, the checksum is calculated and the page number is verified. If the checksum doesn’t match the one that was written to the page when the page was written to the database, we can be sure that the page is damaged or corrupted. ESE will return an error, the database is stopped, and an event is logged informing you of the damage.

Note that ESE does not cause the damage to the page; it merely reports the damage to you. In nearly all instances, corruption of the database is the result of a malfunctioning hardware device or device driver. ESE itself does not cause page-level corruption. This corruption occurs when the data is written to the disk and is caused by your hardware or device drivers. This is why it is imperative that you ensure all your firmware and device drivers are using the latest patches and updates. Microsoft Product Support Services (PSS) will work with your hardware manufacturer to resolve any problems that might exist between your hardware and your Exchange Server 2003 database.
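The write-then-verify idea behind the page checksum can be sketched as follows. This is not ESE's actual algorithm or page layout: CRC-32 from Python's zlib module stands in for the real checksum, and the layout (a 4-byte checksum followed by the page body) and the function names are assumptions made for illustration. The 4 KB page size follows from the 16 pages per 64 KB chunk mentioned later in this chapter.

```python
import struct
import zlib

PAGE_SIZE = 4096      # 4 KB pages (16 pages per 64 KB chunk)
CHECKSUM_BYTES = 4    # the checksum is a 4-byte value

def build_page(payload: bytes) -> bytes:
    """Pad the payload to a full page and prepend a checksum of the body.

    Hypothetical layout; CRC-32 is a stand-in for the real checksum.
    """
    body_size = PAGE_SIZE - CHECKSUM_BYTES
    if len(payload) > body_size:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(body_size, b"\x00")
    return struct.pack("<I", zlib.crc32(body)) + body

def page_is_intact(page: bytes) -> bool:
    """Recalculate the checksum on read and compare it with the stored one."""
    (stored,) = struct.unpack("<I", page[:CHECKSUM_BYTES])
    return stored == zlib.crc32(page[CHECKSUM_BYTES:])

page = build_page(b"mailbox data")
# Flip one byte, as a faulty disk or driver might do while writing the page.
damaged = page[:100] + bytes([page[100] ^ 0xFF]) + page[101:]
print(page_is_intact(page), page_is_intact(damaged))
```

The point of the sketch is that the check happens on read: the stored value is only useful because it is recalculated and compared every time the page is loaded.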

Single Database Backup

In Exchange Server 2003, you can use the backup tool included with Windows Server 2003 to back up a single database. If you select an individual database for backup, the backup software will back up the .edb and .stm files for that database along with the needed transaction log files in the storage group. Best practice would be to back up the entire set of transaction logs when backing up an individual database.

You’ll also need Backup Operator permissions on the backup computer. Windows Server 2003 backup uses the permissions of the current logon to do the backup. Third-party backup utilities can function like Windows Server 2003 services, which use permissions from the service startup parameters. These are typically the permissions set in the LocalSystem account.

Domain and Configuration Partitions

When it comes to a full server recovery, you must understand that your Exchange Server 2003 configuration information, such as your Administrative Group or Routing Group configurations, is held in the configuration partition of Active Directory. Objects that are mail-enabled are in the domain partition, and if those objects have mailboxes, the mailboxes are stored in the Exchange databases.

Hence, remembering to back up your Active Directory System State data as well as your Exchange databases is critical. Both sets of information are required to perform a full system restoration.

You’ll also need to back up the Microsoft Internet Information Services (IIS) metabase. The metabase is a structure for storing IIS configuration settings, some of which pertain directly to your Exchange deployment, such as Microsoft Outlook Web Access and Microsoft Outlook Mobile Access synchronization. Failure to back up the IIS metabase will result in the need to rebuild or reinstall portions of Exchange Server 2003. You can view the metabase using utilities such as MetaEdit and Mdutil.

Don’t confuse the IIS metabase with the metabase update service in Exchange Server 2003. The metabase update service reads data from Active Directory and writes it into the local IIS metabase. When this service is notified by Active Directory that changes have occurred in the directory, the service gathers these changes and then updates the metabase automatically.

Backing up all the files on a local hard drive is common. However, the file system backup method is not the best method of backing up the IIS metabase because the metabase maintains dependencies on other components that are not saved using a straight file system backup. Also, the backup file might be undergoing modifications at the time of backup. The best method of backup is to perform a system state backup, which backs up the metabase.

Types of Backups

You can perform five basic types of backups with the Windows Server 2003 Backup utility (and with most other backup utilities). The key difference among these backup types is how each one handles the archive bit that is found in every Windows Server 2003 file. When a file is created or modified, the archive bit is set to on, as shown by the A in the Attributes column in Figure 27-1. After some types of backups run, the archive bit is set to off, which indicates that the file has been backed up. If, prior to the backup, a file’s attribute has been set to archive manually by an administrator, that file will be backed up with the others.

Figure 27-1: Contents of the Mdbdata folder, showing the archive bit set to on.

The five types of backups are as follows:

Normal: During a normal backup, all selected files are backed up, regardless of how their archive bit is set. After the backup, the archive bit is set to off for all files, indicating that those files have been backed up.
Copy: During a copy backup, all selected files are backed up, regardless of how their archive bit is set. After the backup, the archive bit is not changed in any file.
Incremental: During an incremental backup, all files for which the archive bit is on are backed up. After the backup, the archive bit is set to off for all files that were backed up.
Differential: During a differential backup, all files for which the archive bit is on are backed up. After the backup, the archive bit is not changed in any file.
Daily: During a daily backup, all files that changed on the day of the backup, which are identified by the modified date of the file and not the archive bit, are backed up, and the archive bit is not changed in any file.
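The archive-bit rules in the list above can be modeled with a short simulation. It is a sketch only: the dictionary layout, the function name, and the sample file names and dates are assumptions made for illustration, not part of any backup tool's interface.

```python
from datetime import date

def run_backup(kind, files, today):
    """Return the names backed up by one backup run and update archive bits.

    files maps a file name to {"archive": bool, "modified": date}.
    Hypothetical model of the five backup types described above.
    """
    if kind in ("normal", "copy"):
        selected = list(files)                       # archive bit is ignored
    elif kind in ("incremental", "differential"):
        selected = [n for n, f in files.items() if f["archive"]]
    elif kind == "daily":
        selected = [n for n, f in files.items() if f["modified"] == today]
    else:
        raise ValueError(f"unknown backup type: {kind}")
    if kind in ("normal", "incremental"):            # only these clear the bit
        for name in selected:
            files[name]["archive"] = False
    return selected

today = date(2004, 1, 5)
files = {
    "report.doc": {"archive": True, "modified": today},
    "budget.xls": {"archive": False, "modified": date(2004, 1, 1)},
}
print(run_backup("differential", files, today))  # changed file, bit left on
print(run_backup("incremental", files, today))   # same file, bit now cleared
print(run_backup("incremental", files, today))   # nothing left to back up
```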

Note
In this chapter, we will refer to a full backup, which is simply a normal backup with all Exchange-related items selected.

When you initially create a backup job, you manually select the files to be backed up. In most backup software programs, including the Windows Server 2003 Backup utility, these jobs can be saved and reused. In some cases, not all selected files are actually backed up. Normal and copy backups back up all selected files, but in the case of an incremental, differential, or daily backup, the selected files must also meet the selection criteria of the backup type, as just listed.

All five types of backups apply to Exchange Server 2003 data, although only three are commonly used: normal, differential, and incremental. Daily and copy backups normally apply only to file-level (Microsoft Word documents or Microsoft Excel spreadsheets) backups. Of course, none of this applies to an offline backup, which is the backup of databases while the store.exe process is stopped. Offline backups are a good way to get a current “snapshot” of the database while it is in a consistent state. However, the problem with performing an offline backup is that you have to bring all the stores down, something most environments are loath to do. Besides, with the advent of the Volume Shadow Copy (VSC), which we discuss later in this chapter, you’ll see that performing an offline backup is not the best choice in most scenarios.

The following list describes what happens with regard to Exchange Server 2003 during each type of (online) backup:

Normal: The selected Exchange stores are backed up, and the transaction logs for those stores are purged.
Copy: The selected Exchange stores are backed up, but the transaction logs are not flushed.
Daily: With respect to Exchange, a daily backup performs the same backup as a copy backup.
Differential: Only the transaction logs for the selected stores are backed up. Because differential backups are supposed to back up all changes to the stores since the last normal backup, the transaction logs are not flushed so that they can be backed up again during the next differential or normal backup.
Incremental: Only the transaction logs for the selected stores are backed up. Because incremental backups are supposed to back up only the changes to the stores since the last normal or incremental backup, the transaction logs are flushed.
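The list above can be condensed into a small lookup table. The sketch below is a simplification for illustration: the table and function names are hypothetical, log generations are represented as plain integers, and truncation is modeled as removing every log that existed when the backup ran.

```python
# What each online backup type copies and whether it flushes the logs,
# as described in the list above. "stores" means the database files;
# "logs" means only the transaction logs are copied.
EXCHANGE_BACKUP_BEHAVIOR = {
    "normal":       {"copies": "stores", "truncates_logs": True},
    "copy":         {"copies": "stores", "truncates_logs": False},
    "daily":        {"copies": "stores", "truncates_logs": False},
    "differential": {"copies": "logs",   "truncates_logs": False},
    "incremental":  {"copies": "logs",   "truncates_logs": True},
}

def logs_on_disk_after(kind, logs_on_disk):
    """Return the log generations still on disk once the backup finishes."""
    if EXCHANGE_BACKUP_BEHAVIOR[kind]["truncates_logs"]:
        return []
    return list(logs_on_disk)

print(logs_on_disk_after("differential", [101, 102, 103]))
print(logs_on_disk_after("incremental", [101, 102, 103]))
```

Seen this way, the difference between differential and incremental backups is simply whether the logs survive to be backed up again the next day.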

Backup Strategies

Given the five types of backups covered in the preceding section, most administrators use one of three strategies for backing up a server. These strategies all start with a full backup of the Exchange server, performed on a regular basis—for example, every Sunday. One strategy then continues with full backups daily, another involves performing an incremental backup on all other days of the week, and the last calls for performing a differential backup on all other days of the week.

Full daily backup: Every day of the week, complete a full backup of your Exchange server. If you follow any other backup strategy, you run the risk of having to revert to a backup that is several days or weeks old. For example, if the weekly full backup fails in a normal plus daily incremental backup strategy, you must go back to the previous week’s full backup and restore all of that week’s backups. Money spent on large-capacity backup systems (such as DLTs) is money well spent.
Normal plus daily incremental backup: On Sunday of each week, perform a full backup of all files on the Exchange server that you decide need to be backed up. On Monday, perform an incremental backup that backs up all files that have changed since the full backup. On Tuesday, perform another incremental backup that backs up all files that have changed since the last incremental backup on Monday. At the end of the week, you have performed a full backup and six incremental backups. To restore these backups, you would first restore the full backup and then restore each incremental backup, in order.
Normal plus daily differential backup: On Sunday of each week, perform a full backup of all files. On Monday, perform a differential backup that backs up all files that have changed since the full backup. On Tuesday, perform another differential backup that backs up all files that have changed since the last full backup, which occurred on Sunday. Each consecutive differential backup backs up all files that have changed since the last full backup. To restore these backups, you would first restore the full backup and then restore only the most recent differential backup.
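The restore order implied by each strategy can be expressed as a small function. This is a sketch under stated assumptions: the function name and the (label, kind) tuple format are hypothetical, and a given week is assumed to use either incremental or differential backups, not a mix of both.

```python
def restore_plan(backups):
    """Return the backup labels to restore, in order.

    backups is a chronological list of (label, kind) tuples, where kind is
    "normal", "incremental", or "differential". Hypothetical helper.
    """
    full_positions = [i for i, (_, kind) in enumerate(backups) if kind == "normal"]
    if not full_positions:
        raise ValueError("no full backup to start from")
    start = full_positions[-1]            # most recent full backup
    plan = [backups[start][0]]
    later = backups[start + 1:]
    incrementals = [label for label, kind in later if kind == "incremental"]
    differentials = [label for label, kind in later if kind == "differential"]
    if incrementals:
        plan.extend(incrementals)         # every incremental, in order
    elif differentials:
        plan.append(differentials[-1])    # only the latest differential
    return plan

week_incremental = [("Sun", "normal"), ("Mon", "incremental"), ("Tue", "incremental")]
week_differential = [("Sun", "normal"), ("Mon", "differential"), ("Tue", "differential")]
print(restore_plan(week_incremental))
print(restore_plan(week_differential))
```

The trade-off is visible in the output: incremental backups are smaller each night but lengthen the restore chain, while differential backups grow through the week but never need more than two restores.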

In all strategies, plan to use your transaction logs to your advantage. Every backup strategy must incorporate the role that transaction logs play in recovering data up to the point of the disaster. Remember, your transaction logs represent what will happen to your database in the future. Often they hold committed transactions that have yet to be written to the database. (Consult Chapter 2 for a good discussion of the transaction log architecture.)

When a disaster strikes your Exchange server, the information generated in your Exchange organization since the last backup can be recovered from the transaction logs. For instance, if your server finished a full backup last night at 11:30 P.M., and then at 4:30 P.M. today the disk containing one of your Exchange stores experienced a failure, you would recover today’s information from your transaction logs. This ability to recover assumes you have your transaction logs on a different physical disk from the store that experienced the failure. If the logs were on the same disk as the store, you would be able to recover only up to 11:30 last night, when the backup took place. Let’s continue this scenario under the premise that the logs are on a separate disk and the disk with the store experiences a failure. To recover, you do a full restore of last night’s backup from tape. Then when you start the store.exe process, store.exe attempts to replay all the transactions in the transaction logs back into the databases. When it is finished playing these transactions back into the database, the service will start and your databases will have been restored to the point in time when your disaster occurred.

When the store.exe process is started under normal (nonrecovery) conditions, such as during a proper shutdown and restart of the Exchange server, all transaction logs will be replayed unless the checkpoint file is available. Essentially, that file tells the store process which portions of the transaction logs have already been written to the databases and which have not. If the checkpoint file is available, only those portions of the transaction logs that were not previously written to the database are replayed, and only if the transactions in the logs are more recent than the transactions in the database.

You can now see why it is important to make sure that your transaction logs are sitting on a different spindle from your databases, preferably one that has some type of disk fault tolerance, such as mirroring or disk striping with parity. If you lose your databases, you can recover by using the combination of tape backup and transaction logs. If you lose your transaction logs but not your databases, perform a clean shutdown of the store.exe process and your database will be up to date because all the committed transactions in memory will be written to disk. Either way, guarding your transaction logs is essential.

The Backup Process

You begin the backup process by starting the backup application. The backup application makes calls to the Web Storage System with the type of backup desired, and then the backup procedure begins. WSS informs the ESE that it is entering a backup mode, and then a patch file (.PAT) is generated for each database in the backup (assuming this is a full backup). During an online, full backup, the database is open for business and transactions can still be entered into the databases. If a transaction causes a split operation across the backup boundary (the location in the .edb file that designates what has and has not been backed up), the affected page before the boundary is recorded in the .PAT file. A separate .PAT file is used for each database that is backed up, such as Priv1.pat, Pub1.pat, or Srs.pat. These files are seen only during the backup and restore processes. During differential or incremental backups, a patch file is not created.

When the ESE enters a backup mode, a new log file opens. For example, if Edb.log is the current open log file, Edb.log is closed and is renamed to the latest generation and a new Edb.log is opened. This indicates the point when the ESE can truncate the logs, after the backup is complete.

Also, when the backup begins, the backup application requests that ESE read the database and sequence the pages. After sequencing, the pages are grouped into 64 KB chunks (16 pages) and then loaded into RAM. ESE then verifies the checksum on each individual page to ensure data integrity. If any page has a calculated checksum that does not match the checksum that was recorded in the page when the page was written to disk, the backup application stops backing up the database and records an error message in the event logs. It does this to prevent the storage of damaged data. The benefit is that when you get a successful full, online backup of your Exchange databases using the Exchange agent from your software vendor, you can be certain that the database on your tape has complete integrity, because every page was read into RAM and had its checksum verified before it was copied to tape.

Once the backup has successfully completed and all the pages are read, backup copies the logs and patch files to the backup set. The log files are then truncated or deleted at the point when the new generation started at the beginning of the backup. The backup set closes, the ESE enters normal mode, and the backup is complete.

In an incremental or differential backup, only the log files are affected. Operations that involve patch files, checksums, or reading pages sequentially are not executed.

To recap, here are the steps of the backup process:

1. The backup starts, a synchronization point is fixed, and an empty patch file is created.
2. Edb.log is renamed to the next log number regardless of whether it is full, and a new Edb.log is created.
3. The backup for the current storage group begins.
4. A .PAT file is created for each database that is being backed up in the storage group, and the database header is written into the .PAT file.
5. During backup, split operations across the backup boundary are written into the .PAT file.
6. During backup, Windows Server 2003 Backup copies 64 KB of data at a time. Additional transactions are created and saved as normal. Each page’s checksum is calculated and compared to the checksum stored in that page to ensure data integrity.
7. Logs used during the backup process (those from the checkpoint forward) and the patch files are copied to tape.
8. The old logs on the disk are deleted.
9. The old patch files on the disk are deleted.
10. Backup finishes.
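The core of the copy phase, reading 16 pages at a time and refusing to continue when a page fails verification, can be sketched as below. This is an illustration rather than the behavior of any real backup agent: the names are hypothetical, CRC-32 stands in for the actual page checksum, the stored checksums are passed in as a separate list instead of being read from each page, and "tape" is simply a Python list.

```python
import zlib

PAGE_SIZE = 4096
PAGES_PER_CHUNK = 16  # 16 pages x 4 KB = 64 KB per chunk

class PageChecksumError(Exception):
    """Raised when a page's recalculated checksum does not match."""

def backup_database(pages, stored_checksums):
    """Copy pages to 'tape' in 64 KB chunks, verifying every page first.

    pages is a list of PAGE_SIZE byte strings; stored_checksums holds the
    checksum recorded for each page when it was written (CRC-32 here).
    """
    tape = []
    for start in range(0, len(pages), PAGES_PER_CHUNK):
        chunk = pages[start:start + PAGES_PER_CHUNK]
        for offset, page in enumerate(chunk):
            if len(page) != PAGE_SIZE:
                raise ValueError("unexpected page size")
            if zlib.crc32(page) != stored_checksums[start + offset]:
                # Stop the backup rather than store damaged data.
                raise PageChecksumError(f"page {start + offset} failed verification")
        tape.append(b"".join(chunk))
    return tape

pages = [bytes([i]) * PAGE_SIZE for i in range(40)]
checksums = [zlib.crc32(p) for p in pages]
tape = backup_database(pages, checksums)
print(len(tape), len(tape[0]))
```

Because the whole chunk is verified before it is appended, nothing from a damaged chunk ever reaches the backup set in this sketch.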

What we have been describing so far is the online backup process. There is another type of backup called the offline backup. Offline backups differ from online backups in that the database is stopped before the backup process starts, allowing you to save a copy of a consistent database file. Offline backups are always full backups because the database shuts down. An offline backup is always the less preferable choice, because you must dismount the database before performing it.

Restore Process Overview

Before you begin the restore process, the database or storage group must be dismounted and made inaccessible to users. You can do this by using the Exchange System Manager (ESM).

When a restore operation begins, the store informs the ESE that a restore process is starting and ESE enters restore mode. The backup agent copies the database from the tape directly to the database target path. Remember that the database is a file pair of the .EDB and .STM files. The associated log and patch files are copied to the server in a temporary location specified by you so that they aren’t saved to the same location as current files in the production environment. If you happen to select the production path as your temporary path, you can overwrite log files and cause a logical corruption of the current production database. So, ensure that your temporary path is not your production path.

After the log and patch files are restored to the temporary location, a new restore storage group starts specifically for the purpose of restoring the database. The database is then copied from tape to the temporary location (and into the restore storage group). Then the patch file data and the log files from the tape backup are copied into the database by the restore database engine.

This means that each transaction in each log file is treated as follows. Each transaction’s date and time stamp is read along with the page number in the database that the transaction references. Then the date and time stamp on the page in the database is read and compared to the date and time stamp of the transaction in the transaction log. If the transaction in the log has a more recent date and time stamp than the one on the page in the database, the transaction from the transaction log is written to the database. If the opposite is true (that is, the date and time stamp on the page in the database is more recent than the one on the transaction in the transaction log), ESE skips that transaction and moves on to the next transaction in the log.
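The comparison described above can be sketched as a simple replay loop. The data structures are assumptions made for illustration: the database is a dictionary of page number to (stamp, data), each log record is a (stamp, page number, data) tuple, and stamps are plain integers standing in for date and time stamps. A page that does not exist yet is treated as older than any transaction, which is also an assumption of this sketch.

```python
def replay_logs(database, transactions):
    """Replay log records into the database and return how many were applied.

    database maps page_number -> (stamp, data); transactions is an iterable
    of (stamp, page_number, data) in log order. Hypothetical model.
    """
    applied = 0
    for stamp, page_number, data in transactions:
        page_stamp, _ = database.get(page_number, (None, None))
        if page_stamp is None or stamp > page_stamp:
            database[page_number] = (stamp, data)   # log is newer: apply it
            applied += 1
        # Otherwise the page already reflects this change: skip it.
    return applied

database = {1: (10, "old"), 2: (30, "current")}
log = [(20, 1, "new"), (25, 2, "stale"), (40, 3, "fresh")]
print(replay_logs(database, log))
print(database)
```

Skipping records that are older than the page is what makes replay safe to repeat: running the same logs a second time applies nothing.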

Hence, ESE processes the current logs, bringing you back to the point at which your database became corrupted (assuming you have all the transaction logs available from the last full, online, successful backup to the point of the disaster). After this is complete, ESE performs some cleanup by deleting log and patch files from the temporary location and deleting the restore storage instance. Then the storage group is mounted into the production environment and your database is mounted too.

Restoring the Binary Files

Because the Exchange Server 2003 configuration information is held in the Configuration partition of Active Directory, you can recover an Exchange server more simply than in Exchange 5.5. If the Exchange Server 2003 server to which you are restoring files is a member server in a domain, be sure that Active Directory is running. Run Exchange System Manager and verify that a valid server object still exists for the Exchange Server 2003 server in Active Directory. If Active Directory does not exist, restore Active Directory prior to restoring Exchange Server 2003.

If the Exchange server you want to restore is also the domain controller, begin by restoring Active Directory on that computer. You can restore Exchange Server 2003 only after Active Directory is successfully restored. The security ID on the restored server must match the security ID of the original server. If the security IDs do not match, you cannot access Web Storage System until you restore only Web Storage System and then manually rebuild the Windows Server 2003 accounts.

Considerations of Different Restore Scenarios

Sometimes, you don’t need to restore an entire server or even an entire database. In this section, we’ll discuss the considerations of different restore scenarios. Specifically, we’ll look at the following:

Restoring online backups
Restoring offline backups
Restoring a single mailbox
Restoring a single database
Restoring a database to a different server
Restoring log files

Restoring Online Backups

Restoring an online backup is the preferred method of restoring an Exchange Server 2003 database because the transaction log entries can be replayed into the database during the restore process. An online restore uses the patch file along with the transaction logs to restore the database.

Restoring Offline Backups

If you need to replace hardware on your Exchange Server 2003 server, you might want to consider performing an offline backup and restore. Remember that database services are stopped during an offline backup, so those users whose mailboxes are homed in that storage group will not have e-mail services until the database is restarted.

You might also want to use an offline backup of your databases if, during an online backup, the backup process fails because you receive a -1018, -1022 or some other page-level corruption error. Doing an offline backup of the database allows you to take a snapshot of the database before you work on correcting the problem. The logic here is that if your work further corrupts the database, you can always fall back on the offline copy and try your efforts again.

One major problem with an offline backup is that the pages are not checked for integrity during the backup phase, as they are in an online backup operation. Also, the ability to replay transactions back into the database during the restore operation is not available to you. Essentially, restoring an offline backup restores you to the point where the store.exe process was stopped.

The process of restoring an offline backup is rather simple: copy the database to the correct location on your server and start the store services. Be sure you have the correct transaction log set that went with the database before mounting the database.

Restoring a Single Mailbox

If you need to recover a single mailbox that has passed the mailbox retention time period, you need to either use third-party software or restore the entire database to a recovery server. Because both of these operations can take a tremendous amount of time, we recommend setting your mailbox retention times to a number that exceeds nearly every mailbox restore scenario that you’ve experienced. To do this, open the ESM and navigate to the mailbox store on which you want to set the retention time. Open the store’s properties and click on the Limits tab (Figure 27-2). On the Limits tab, set the retention times for Keep Deleted Items For (Days) and Keep Deleted Mailboxes For (Days).

Figure 27-2: Setting the deleted mailbox retention time in the store properties.

Because mailboxes are merely an attribute of the user account, reconnecting a mailbox to a new account is easy. This is why, if you need to delete a user account, you retain the mailbox associated with that account for a considerable amount of time. Let it expire via the deleted mailbox retention time only after you know that you don’t need that mailbox anymore.

When a mailbox is deleted, it is marked with a red X in the System Manager interface. You can reconnect the mailbox to a new user account by right-clicking the mailbox and using the Reconnect feature.

Restoring a Single Database

If you need to restore a single database, dismount it using the Exchange System Manager and then restore the individual database. Notice that you don’t need to stop the store.exe process—it continues to run. Instead, you’ll just dismount the database over which you’ll need to restore and then restore the copy from your tape backup. The restore process creates a special restore storage group, and the database and transaction log files from the backup are restored in that storage group. After recovery, the consistent database is mounted into its original storage group by ESE.

Note

The format of transaction log files is revised in Exchange Server 2003. When you upgrade from Exchange 5.5 to Exchange Server 2003, the existing transaction log files are removed and a new log series is created. Because of the log format change, you cannot restore an Exchange 5.5 database to an Exchange Server 2003 server.

Restoring Databases to a Different Server

You can restore your databases to an Exchange Server 2003 server other than the one from which they were backed up. Use this method as a last resort to restore individual items or databases. The secondary server must meet the hardware requirements to run Exchange, it must not be connected to the network, and it must have enough disk space to restore the entire backup.

To restore a database to a different server, the database display name and the storage group display name must be the same. In addition, the organization name and administrative group name for the server to which you want to restore must match the server from which the database was backed up. You’ll also need to configure the current databases to allow them to be overwritten so that the new databases with the new signatures can overwrite them during the restore process.

This method restores only the ESE databases, so do not use it when you need to recover an entire server. After you copy or move your databases to a different server, you need to reconfigure permissions on the mailboxes before your users will be able to use them.

If you have a large number of mailboxes that need to be connected to their corresponding Active Directory accounts, you can use MBCONN (which is mbconn.exe, or the Mailbox Reconnect Tool, located in the \Support\utils\i386 directory on the Exchange Server CD). This tool is especially helpful when you have just replaced or added a new Exchange server to your Exchange organization. If you are familiar with the Exchange 5.5 DS/IS Consistency Adjuster, you’ll understand the concepts behind the Mbconn tool. It essentially performs the same functions as the DS/IS Consistency Adjuster.

Single Mailbox

Most third-party backup applications will back up individual mailboxes with backup selection granularity down to the item level (for example, you can back up a single calendar entry or a single message). If you are not using mailbox-level backups (or you are using the Windows Backup utility), the deleted mailbox retention period has expired, and you need to restore a single mailbox, you must use an offline server and restore the mailbox there. In most cases, you won’t need to do this because of the mailbox retention features, Dumpster and ExMerge. But let’s go over the steps in the event that you do.

First, ensure that the offline server is in a different Windows Server 2003 forest from your production servers. Second, the storage group that hosts the restored database must have the same display name as the storage group on the original production server. Third, the database you want to restore must have the same display name as the database on the original production server. The database name must be unique on the backup server in all storage groups. For example, if the database name is Priv.edb, there can only be one instance of a Priv.edb database on the secondary server. The organization name and the administrative group name must be the same.

To recover the mailbox, reconnect the mailbox to a dummy user account, and use ExMerge to create a .PST (personal store file) of the mailbox. Then import that information into the regular mailbox.