5.3 In depth: Exchange 2003 restore operation | Mission-Critical Microsoft Exchange 2003: Designing and Building Reliable Exchange Servers (HP Technologies)

One of the keys to a solid disaster-recovery strategy for Exchange is a thorough understanding of how Exchange’s database engine performs recovery operations. While there is a tremendous amount of information available on the basics of how Exchange restore operations work, much of it is inaccurate or not detailed enough for careful planning of your Exchange server recovery. In this section, I will take a closer look at how Exchange restore operations work at the database engine and API level. Note that while this discussion centers on Exchange 2000/2003, previous versions work similarly, although several minor variances exist between different versions and service pack levels of Exchange Server.

A restore operation is an administrator-initiated activity, and two important things must happen before a database can be restored. First, the database needs to be dismounted using the Exchange System Manager (ESM) MMC snap-in or by some other means, such as a script that does this via Windows Management Instrumentation (WMI) using Exchange’s Collaboration Data Objects for Exchange Management (CDOEXM). Second, the database must be configured (via ESM and so forth) to allow it to be overwritten by a restore operation; by default, the database cannot be overwritten, which is an important safety measure. Once these preparatory tasks are completed, the database is ready to be recovered.

5.3.1 Beginning the restore and copying the databases

By reading the beginning of the backup set, the backup application gets a list of databases that are available. Once the administrator selects the correct database to be recovered, the backup application begins by making ESE API calls to start the restore. First, the backup application asks the administrator for inputs, such as the server to restore to, the location, and a temporary directory for the log, patch (if applicable), and restore.env files (see Figure 5.4). The backup application makes the HrESERestoreOpen call to gather this information and then the HrESERestoreAddDatabase call once for each database that is going to be restored. At this point, ESE leaves it to the backup application to restore the needed database files to the proper locations. ESE does not get involved much in copying the database files to disk from the backup set. ESE allows the backup application to make Win32 file system calls directly to the operating system and copy the files.
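The order of operations described above can be summarized in a short sketch. It does not call ESE: `RestoreSession` and `begin_restore` are hypothetical names that simply record the sequence of steps, and the server name, directory, and database names in the usage note are made up for illustration. Only the two `HrESE...` call names come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class RestoreSession:
    """Hypothetical stand-in that only records call order; not a real ESE binding."""
    calls: list = field(default_factory=list)

    def call(self, name, detail=""):
        self.calls.append((name, detail))


def begin_restore(session, server, temp_dir, databases):
    # A single open call carries the administrator's inputs
    # (target server and temporary directory).
    session.call("HrESERestoreOpen", f"{server} -> {temp_dir}")

    # One add-database call for each database selected for restore.
    for db in databases:
        session.call("HrESERestoreAddDatabase", db)

    # The database files are then copied by the backup application itself,
    # using ordinary file system calls rather than ESE.
    for db in databases:
        session.call("copy database files (file system)", db)

    return session.calls
```

Calling `begin_restore(RestoreSession(), "EXCH01", r"C:\Temp\Restore", ["Mailbox Store", "Public Store"])` would record one open call, two add-database calls, and two file copies, in that order.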

Figure 5.4: Exchange 2003’s restore operation.

ESE stays out of this step because the database files were already checksummed when they were backed up; if the backup set is complete, the databases should be intact, and there is no reason for ESE to check database integrity again during the restore. Since the databases being restored are dismounted (i.e., not open files), it is much simpler and faster to have the backup application copy these files directly to disk.

5.3.2 Restore the log and patch files

Once again, at this point in the operation, the backup application does not need much help from ESE. The backup application simply calls HrESERestoreOpenFile for each log or patch file to be restored (ESE does use this call to create the metadata needed for the restore.env file) and copies these files to the temporary directory that the administrator specified at the start of the restore. Note that patch files are no longer used after Exchange 2000 SP2. The log and patch files are copied to the temporary directory because they must be kept separate from the log files in the production log file directory. This prevents naming conflicts or overlaps between the log files in the backup set and the log files already on disk.
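To see why the separation matters, consider a minimal sketch that compares the log file names in a backup set with those in the production directory. The function names and the sample file names and paths are illustrative assumptions, not taken from a real server.

```python
from pathlib import PureWindowsPath


def overlapping_logs(backup_logs, production_logs):
    """Return backup log names that also exist in production (case-insensitive)."""
    production = {name.lower() for name in production_logs}
    return sorted(name for name in backup_logs if name.lower() in production)


def restore_target(temp_dir, log_name):
    """Build the path a restored log would get inside the temporary directory."""
    return str(PureWindowsPath(temp_dir) / log_name)


# Illustrative file names only.
backup_set = ["E0000010.log", "E0000011.log"]
production = ["E0000011.log", "E0000012.log", "E00.log"]

# Names that would collide if the backup set were copied straight
# into the production log directory.
conflicts = overlapping_logs(backup_set, production)

# Restoring into a separate directory sidesteps the collision.
targets = [restore_target(r"C:\Temp\Restore", name) for name in backup_set]
```

Here `conflicts` contains `E0000011.log`, the one name present in both places, while every entry in `targets` lands under the temporary directory and leaves the production files untouched.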

5.3.3 The restore environment

Once all log and patch files have been recovered from the backup set, the backup application makes a call that was first introduced in Exchange 2000. As you may recall, in previous versions of Exchange the Restore_In_Progress key was created in the system registry during a recovery operation. This key contained information about the recovery operation in progress for the database engine (of which there is only one instance in Exchange 5.5 and previous versions). In Exchange 2000/2003, however, there are multiple instances of the database engine (storage groups), as well as concurrent recovery capabilities, and a single key in the registry will not suffice. This led to the advent of the RESTORE.ENV file (which stands for “Restore Environment”). The RESTORE.ENV file is created during recovery by the backup application when it calls HrESERestoreSaveEnvironment. ESE returns the necessary information (similar to that which was stored in the Restore_In_Progress key in previous versions of Exchange) to the backup application, and the RESTORE.ENV file is saved in the temporary location with the log and patch files. You can view the contents of the RESTORE.ENV file using the ESEUTIL program with the /CM switch. The log, patch, and RESTORE.ENV files will be used to complete the recovery operation in the next step.

5.3.4 Completing the restore and running hard recovery

Once all the backup sets that are necessary for the current recovery operation are copied, the backup application is ready to complete and terminate its activities and turn control back to the store process. The backup application calls HrESERestoreComplete and HrESERestoreClose to signal ESE that it is time to take over. At this point, you would think that the storage group that owns the database being recovered would take over and complete the recovery operation. However, this is not the case. The store process instantiates a separate ESE storage group specifically for the purpose and duration of completing the recovery operation. This recovery storage group then takes over and performs the hard recovery operation. Hard recovery is the process of applying patch files to the database, replaying log files from the backup set (located in the temporary directory), and replaying log files from the production log file directory. Once hard recovery completes successfully, the database is ready to be mounted and made available for users. It is important to note that hard recovery can be performed manually using the ESEUTIL program (/CC switch) if you are performing simultaneous restores or if hard recovery did not complete automatically for some reason (for example, if you forgot to check the “last backup set” checkbox after restoring the last set of log files). Once automatic hard recovery is completed, the recovery instance deletes the files in the temporary directory, terminates, and turns control over to the storage group that owns the database for normal operations. The owning storage group can then mount the database and begin to apply user transactions.
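The three phases of hard recovery can be laid out as an ordered plan. The sketch below is only a model of that ordering: `hard_recovery_plan` is a hypothetical function, the file names are illustrative, and it assumes that fixed-width log names sort in generation order and that only production logs later than the last restored log remain to be replayed. The text above states the order of the phases but not how ESE selects the production logs.

```python
def hard_recovery_plan(patch_files, restored_logs, production_logs):
    """Return the ordered steps of hard recovery as (action, file) pairs."""
    # Phase 1: apply any patch files to the database.
    steps = [("apply patch", name) for name in patch_files]

    # Phase 2: replay the logs restored to the temporary directory.
    restored = sorted(restored_logs)
    steps += [("replay restored log", name) for name in restored]

    # Phase 3: replay logs from the production log directory.
    # Assumption: only logs newer than the last restored one are needed.
    newest = restored[-1] if restored else ""
    steps += [
        ("replay production log", name)
        for name in sorted(production_logs)
        if name > newest
    ]
    return steps


plan = hard_recovery_plan(
    patch_files=[],
    restored_logs=["E0000011.log", "E0000010.log"],
    production_logs=["E0000011.log", "E0000012.log"],
)
```

With these inputs, `plan` replays the two restored logs in order and then the one production log that follows them, which is how a restored database is rolled forward past the point of the backup.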

Too often, Exchange disaster-recovery operations (backup and restore) are trivialized. However, it is of paramount importance that we as Exchange administrators understand exactly how these operations work and the impact they have on our ability to meet service level agreements for our Exchange deployments. This in-depth drill down into the internals of Exchange’s ESE and how it exposes these capabilities and performs these operations will serve you in your quest to provide the highest levels of availability for Exchange.