Chapter 5: Protecting and Recovering Exchange Data | Mission-Critical Microsoft Exchange 2003: Designing and Building Reliable Exchange Servers (HP Technologies)

I have spent a significant amount of time delving into Exchange storage technology in previous chapters. This is because of the critical role the Exchange storage engine plays in Exchange server reliability and data integrity. In the last chapter, I looked at why Exchange servers fail and the dependencies that Exchange has on other services and infrastructure. In this chapter, we will dig a bit deeper into the Exchange database engine in hopes of revealing even more to you about how critical this storage mechanism is to your data. Once again, at the core of the store process (STORE.EXE) and Exchange’s ESE lie the keys to backing up and recovering Exchange data.

5.1 Exchange backup/restore fundamentals

With the changes and new options implemented for Exchange 2000 (and carried forward to Exchange Server 2003), the disaster-recovery API that Microsoft provides as part of Exchange underwent a bit of a facelift. The ESE database engine in Exchange Server has always made an on-line backup API available via two DLLs called ESECLIB2.DLL and ESEBACK2.DLL. (For Exchange 5.5, the DLLs are ESECLIB1.DLL and ESEBACK.DLL.) This API allows Exchange to stay operational and service users while backup operations are performed. In previous versions of Exchange, since there was only one ESE instance, restore operations were performed off-line (the server is down until the restore is complete).

However, since Exchange 2000/2003 provides multiple ESE instances (storage groups), recovery operations can be underway within one storage group while another storage group continues to service users. This means the API had to be adapted to allow for concurrent operations and to manage multiple SGs, MDBs, and log file sets. In addition, there are new components to back up in Exchange 2000/2003, such as the Site Replication Service (SRS) and the Key Management Server (KMS); KMS is present in Exchange 2000 but has been removed in Exchange Server 2003. SRS and KMS were not included in backup operations for previous versions of Exchange. Finally, the ESE recovery API must allow for more granularity. You are now able to back up or restore an entire storage group (best practice) or an individual database (MDB). What’s more, while a database was one file (*.EDB) in previous versions, it is now a set that includes both the EDB and the STM file. The API underwent several changes in Exchange 2000 to accommodate these needs. In this section, we will take a look at how Exchange 2000/2003 performs backup and restore operations. Microsoft Exchange Server provides a specific backup API to allow backup products, including Windows Backup, to access the contents of the Exchange information stores while they are on-line.

5.1.1 Exchange backup types

The backup and restore functions of Exchange’s ESE provide three basic backup types: full, incremental, and differential, along with a copy variant of the full backup. Combined with Windows Server 2003’s VSS, Exchange Server 2003 also supports snapshot backups (more on this type of backup later).

A full backup (also called a normal backup) backs up the entire directory or information store and allows you to restore it from a single backup. An incremental backup backs up just the changes since the last full or incremental backup. These are simply the transaction logs that have accumulated since that backup. Restoring incremental backups requires the original full backup plus all the incremental backups (transaction logs) made since that time. A differential backup backs up the changes since the last full backup. Restoring a differential backup requires one differential backup and the original full backup. Appendix A provides pointers to backup and restore API functions and their specific uses.

On-line backup operations are fundamental to Exchange Server and enable you to back up databases without shutting down the entire server to perform a file-by-file type backup (off-line backup). While backup operations are in progress, all services continue to operate, and users can access their data on the Exchange server. Database pages that are cached in memory in the information store buffer pool continue to be updated and flushed to the database on disk. Transactions also continue to be written to the transaction log files, and the checkpoint file continues to advance. All in all, the backup and restore technology for Exchange 2000/2003 is very similar to previous versions of Exchange Server with one notable exception. Exchange 2000’s advent of multiple storage groups and multiple databases (MDBs) has a substantial impact on how the Exchange backup API works. Table 5.1 compares the backup types available for Exchange 2003.

Table 5.1: Exchange Server 2003 Backup Types
Backup Type	Files Included	Logs Truncated?	Restore Method
Normal (Full)	Database (EDB+STM) files, Log files, and Patch files (for Exchange 2000 SP1 and earlier versions)	Yes	Last normal backup
Incremental	Log files only	Yes	Last normal + all incremental backups
Differential	Log files only	No	Last normal + last differential
Copy	Database (EDB+STM) files, Log files, and Patch files (for Exchange 2000 SP1 and earlier versions)	No	Not applicable
Snapshot (Windows VSS)	Special (more on this type of backup later in the chapter)	Special*	Special*

5.1.2 Normal (full) and copy backups

The normal backup (also referred to as a full backup) is the fundamental unit of operation for most Exchange deployments. Regardless of the strategy you select for backup, the normal backup type will be part of your operational procedures. With a normal backup, both the database files and the log files are copied to tape. In addition, the log files are truncated or deleted once they have been copied to the backup media. The truncation point for the transaction log files is the current database checkpoint location. The normal backup operation is also important to database integrity, since the 4-KB database pages are checked for corruption only during normal backups, copy backups, and on-line database maintenance; incremental and differential backups do not check them. This is accomplished by verifying each page read to make sure that the page number requested is the page the database engine received. Next, each page’s CRC information (contained in the page header) is verified to ensure that the data contained in the page is valid. The normal backup is also important to the ESE Page Zeroing feature, which I will discuss later in this chapter. To restore from a normal backup, you only need to restore the complete set and allow the ESE database engine to replay any log files required for the database to be in a consistent state.

Similar to the normal backup is the copy backup. A copy backup differs in that it does not truncate or purge log files once they have been copied to tape. In addition, the copy backup does not update the database backup context information contained in the database file header. Copy backups are very useful for archival purposes or other scenarios in which you want to back up your Exchange databases but do not want to disrupt the normal backup schedule. A copy backup performs the same integrity checking and page zeroing (if enabled) as the normal backup.
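The two page checks performed during a normal or copy backup (the page returned is the page requested, and the checksum stored in the page header matches the page contents) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the header layout (a 4-byte checksum followed by a 4-byte page number) is hypothetical, and `zlib.crc32` stands in for whatever checksum ESE actually computes. The helper names `build_page` and `verify_page` are invented for this sketch and are not part of any Exchange API.

```python
import struct
import zlib

PAGE_SIZE = 4096                 # 4-KB pages, as described in the text
HEADER = struct.Struct("<II")    # hypothetical header: checksum, page number


def build_page(page_number: int, payload: bytes) -> bytes:
    """Build a page in the hypothetical layout used by this sketch."""
    body = payload.ljust(PAGE_SIZE - HEADER.size, b"\x00")
    if len(body) != PAGE_SIZE - HEADER.size:
        raise ValueError("payload too large for one page")
    checksum = zlib.crc32(struct.pack("<I", page_number) + body)
    return HEADER.pack(checksum, page_number) + body


def verify_page(requested: int, page: bytes) -> str:
    """Return 'ok', 'wrong page', or 'bad checksum' for a page read."""
    stored_checksum, stored_number = HEADER.unpack_from(page)
    # Check 1: the page returned is the page that was requested.
    if stored_number != requested:
        return "wrong page"
    # Check 2: the checksum in the header matches the page contents.
    body = page[HEADER.size:]
    if zlib.crc32(struct.pack("<I", stored_number) + body) != stored_checksum:
        return "bad checksum"
    return "ok"
```

A backup pass would call something like `verify_page` on every page it reads, which is why a normal or copy backup doubles as an integrity scan of the whole database, while a log-only backup cannot.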

5.1.3 Incremental and differential backups

The backup process in the case of an incremental or differential backup type is somewhat different. Again, as in the case of a normal backup operation, the truncation point marks the beginning of the backup. The current E0n.LOG file (n being the storage group number or designation) is renamed to E0nnnnnn.LOG, and a new generation is started in a new E0n.LOG. With the incremental and differential backup types, no database files are copied to tape. It is also worth mentioning that, since the databases are not copied, no page verification or checksumming is performed on the database to ensure integrity (the log file records themselves are checksummed during recovery to ensure their integrity). This is a notable point when selecting which backup strategy you will use for your Exchange deployment. Since an incremental or differential backup only operates on the log files, if circular logging is enabled, neither incremental nor differential backup operations will be capable of providing complete recovery. Like a normal backup, an incremental backup will delete log files up to the truncation point once they have been backed up. This is the key point of difference between incremental and differential backups. To restore from an incremental backup, you will need your last normal (full) backup set plus all incremental backup sets that have been made since. For Exchange 2000/2003, you must indicate when the last backup set has been restored in order for the ESE database instance to recover the database properly.

Like an incremental backup, the differential backup is concerned only with transaction log files. The point of difference from an incremental backup is that a differential backup does not delete the log files at the truncation point (i.e., the current checkpoint location). While the start of the backup operation is marked by closing and renaming the current E0n.LOG to E0nnnnnn.LOG and creating a new E0n.LOG generation, the log files are left intact. To restore from a differential backup set, as was the case with an incremental backup, the last normal (full) backup set is required. Next, this is combined with the latest differential backup set. Because differential backups leave the log files intact from one backup operation to the next, the latest differential backup contains all log files created since the last normal backup set. As was the case with the incremental backup recovery operation, the last backup set must be indicated for the ESE recovery storage instance to properly recover the database or storage group.
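The restore rules for the log-based backup types can be summarized as a small selection routine. The following Python sketch is a planning aid under stated assumptions, not a description of the backup API: `BackupSet` and `restore_chain` are hypothetical names, and the handling of schedules that mix incremental and differential sets is inferred from the truncation behavior described above (an incremental truncates the logs it captures, so it supersedes any earlier differential).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackupSet:
    label: str   # hypothetical name for the set, e.g. "Sun"
    kind: str    # "normal", "incremental", "differential", or "copy"


def restore_chain(history: List[BackupSet]) -> List[str]:
    """Return labels of the sets to restore, oldest first.

    `history` must be ordered oldest to newest.
    """
    normals = [i for i, b in enumerate(history) if b.kind == "normal"]
    if not normals:
        raise ValueError("no normal (full) backup in history")
    start = normals[-1]
    chain = [history[start].label]
    pending_differential = None
    for b in history[start + 1:]:
        if b.kind == "incremental":
            # Incremental sets truncate logs, so each one is required,
            # and it supersedes any differential taken before it.
            chain.append(b.label)
            pending_differential = None
        elif b.kind == "differential":
            # Differential sets leave logs in place, so only the
            # latest one matters.
            pending_differential = b.label
        # Copy backups do not update the backup context; ignored here.
    if pending_differential is not None:
        chain.append(pending_differential)
    return chain
```

For a week of Sunday normal backups followed by daily incrementals, the chain grows by one set per day; with daily differentials it never exceeds two sets. That trade-off (smaller nightly backups versus fewer sets to restore) is the practical difference between the two strategies.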

Brick-level versus normal backup

Customers have requested the ability to restore a single message, folder, or mailbox since the earliest days of Exchange. This is not possible using the normal backup API because data is read from the database in 4-KB pages and is written to tape in that manner. This means that the physical structures of the database are backed up devoid of the logical structure that is meaningful to individual mailboxes and public folders. There is no contextual information written along with the data, so a full restore is necessary before the data is reordered into a proper database structure. Several backup software vendors have attempted to provide single-message restore in their Exchange-compliant backup products. This mode requires that data be written to backup media with all its contextual information intact (such as what folder a message is in, whether it contains attachments, and so on), so the normal backup API cannot be used. Instead, a connection is made using the MAPI protocol in much the same way as a normal MAPI client, and data is read out in mailbox order. This is referred to as a “brick” backup. A brick restore is one in which a single item (mailbox, folder, or other item) is extracted from the backup media and inserted into the information store. Restoring a single item is much easier with this approach, but backup times are significantly longer because of the additional information that must be written out: brick backups do not use single-instance storage, and they have to expand the MAPI properties of each message, bloating the backup data. Typically, brick backups take four to five times longer than a normal backup. I do not recommend that you use a brick backup as the basis for your daily backup routine. Instead, if you use a product that supports brick backups, consider using this feature once a week and using a normal backup on the other days. Another possibility is to use such a product feature only for key personnel in the enterprise.

Microsoft may never provide brick-level backup for Exchange, but many third-party products offer some solution to this problem. With the ability to partition the information store into smaller units of manageability in Exchange 2000/2003 and to recover individual deleted items and mailboxes, brick-level backups may become less important.

5.1.4 Individual item recovery

Microsoft partly addressed the issue of single-item recovery with the deleted-item retention feature in Exchange 5.5, which has been carried over to Exchange 2000/2003. The most common reason people ask for single items to be restored is that they made a mistake and deleted something important that they should have kept. Deleted-item retention means that items are “soft” deleted initially and then “hard” deleted after a set period has elapsed. Soft deletion means that the item is marked as deleted in the database and is hidden from view. Hard deletion means that the item is permanently removed from the database. During the deleted-item retention period (between the time when the item is soft deleted and its hard deletion), it can be recovered and recalled to view using an option on Outlook 97, Outlook 98, Outlook 2000, Outlook XP, and Outlook 2003 clients (Outlook 8.0.3 onward), as well as OWA 2003. If your organization has not implemented this feature, help desk or administration staff must perform recovery. I believe that the deleted-item recovery feature implemented in Microsoft Exchange 5.5 covers most of the cases in which a brick-level restore might have been required. In addition, Exchange 2000/2003 extends this feature by adding the ability to recover individual mailboxes that have been deleted (called deleted-mailbox retention). Many companies have set item retention periods of between 7 and 14 days and have found that this eliminates the vast majority of requests for item recovery; after all, users typically figure out fairly quickly that they have deleted something important. However, you should be aware that implementing this feature will cause your database to grow. A conservative estimate is that you should expect an individual information store database to grow by between 10% and 15% for a retention period of 14 days. This percentage will vary from organization to organization and will largely depend on the usage pattern of the messaging server.

Of course, deleted-item retention does not help recover items that have passed out of the retention period; for that case, you will probably want to utilize a dedicated server (called an Exchange recovery server, discussed later in this chapter) or Exchange 2003’s Recovery Storage Group (RSG) for deleted-item recovery. This dedicated server must have sufficient disk space to hold the database being restored.
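The soft-delete and hard-delete states described above amount to a simple time-window test. The Python sketch below is a simplified model only: `item_state` is a hypothetical helper, and it treats an item as hard deleted the instant the retention period elapses, ignoring when the store actually gets around to purging expired items.

```python
from datetime import datetime, timedelta


def item_state(deleted_at: datetime, now: datetime, retention_days: int) -> str:
    """Classify a deleted item as 'soft-deleted' or 'hard-deleted'.

    A soft-deleted item is still inside the retention window and can be
    recovered by the user; a hard-deleted item can only come back from
    a backup.
    """
    if now < deleted_at:
        raise ValueError("now must not precede the deletion time")
    if now - deleted_at < timedelta(days=retention_days):
        return "soft-deleted"
    return "hard-deleted"
```

With a 14-day retention period, a message deleted three days ago is still recoverable from the client, while one deleted 20 days ago requires a restore to a recovery server or a Recovery Storage Group.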

5.1.5 A word to the wise about off-line backup and restore

When Microsoft developed Exchange Server, choices were made about the architecture and operation of the database engine, and an on-line API for backup and restore operations was developed. This ensured that the server could be operational 7 × 24 and that backup operations would not cause a server outage. Microsoft has worked hard to educate Exchange implementers about the benefits and necessity of performing on-line backup operations. When calling Microsoft PSS, you will be hard-pressed to find a sympathetic ear if your only means of Exchange Server recovery is an off-line backup. Microsoft has specific reasons for enforcing its recommendations for on-line backups. The main reason is that on-line backups, which utilize the ESE APIs, have awareness of the transactional nature of the Exchange database engine. If you simply treat the Exchange database as a file, no transactional integrity of the database is maintained. An on-line backup not only will back up the database, but will also back up log files and provide log file truncation. In addition, on-line backups for Exchange provide management on the restore side as well. When restoring backup sets created using on-line methods, the database engine is able to provide recovery up to the very last transaction recorded by utilizing the log files. All around, on-line backups are the preferable method. I recommend against the practice of off-line methods unless they are used as a mechanism for periodic archival of your databases.

Unfortunately, many have still chosen to use off-line methods to back up their Exchange servers. In an off-line scenario, all Exchange services are shut down in order to perform the backup or restore operation. Backup operations simply treat the Exchange information store databases as individual files in the file system. EDB and STM files are backed up just like any other file on the server. This can be very problematic for several reasons. First, since an off-line backup method does not utilize the ESE API, no integrity checking of the database is performed (unless you do it manually with ESEUTIL, ESEFILE, or ISINTEG). Remember, using an on-line method and performing a normal (or full) backup will verify each and every page of the database during the backup procedure. Another problem with the off-line backup method for Exchange is that the operator must take responsibility for managing database transaction log files. The transaction log files are required for successful recovery of the database up to the point of the last transaction that occurred. Suppose, for example, you needed to recover your Exchange 2000/2003 server (either an individual database or an entire storage group) from an off-line backup. If your backup was performed at midnight (12:00 A.M.), you would have a consistent copy of the data for that point in time, assuming the services were stopped or an individual database (store) was dismounted. In our example scenario, suppose a failure condition were to occur at 2:00 P.M. (14 hours later) and you were forced to recover the database. If the failure were mild enough (such as database corruption), you would be able to restore the database files (EDB+STM), but would not be able to play through the existing log files that have accumulated (representing real user data) since the database files were backed up.

The greatest potential for error with off-line backup methods occurs during the restore of an off-line database, because the database engine does not automatically play through the log files as it would normally do in the case of an on-line backup. There are certainly ways to accomplish the task, but it must be done through manual log file management and the use of scripts and “hacks” that attempt to mimic the on-line recovery operations. Realistically, there is no point to this exercise, since that is the purpose for which Microsoft designed the on-line backup APIs.
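The midnight-backup example above can be worked through as simple date arithmetic. In this Python sketch, `data_loss_window` is a hypothetical helper, the calendar date is arbitrary, and the failure time is treated as the time of the last logged transaction, which is a simplifying assumption.

```python
from datetime import datetime, timedelta


def data_loss_window(backup_time: datetime,
                     failure_time: datetime,
                     logs_replayed: bool) -> timedelta:
    """Return how much work is lost after a restore.

    If the transaction logs written since the backup are replayed, the
    database is brought forward to the last logged transaction (treated
    here as the failure time). Otherwise the restored database reflects
    only the moment the backup was taken.
    """
    if failure_time < backup_time:
        raise ValueError("failure_time must not precede backup_time")
    recovered_to = failure_time if logs_replayed else backup_time
    return failure_time - recovered_to
```

Calling it with a midnight backup and a 2:00 P.M. failure gives a 14-hour window when the logs are not replayed and a zero window when they are, which is the gap the on-line API closes automatically and the off-line approach leaves to manual log management.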

While you may have been able to devise methods of safe recovery using off-line methods in previous versions of Exchange, Exchange 2000/2003 makes it virtually impossible to enjoy success using off-line methods. In previous versions of Exchange (prior to Exchange 2000), the fact that there was a single private and a single public information store (PRIV.EDB and PUB.EDB) made it possible (although still prone to error) to implement successful disaster-recovery procedures based on off-line methods. There were only two database files and one ESE database engine instance (storage group) in previous versions of Exchange. Consider Exchange 2000/2003, however, in which you can configure up to four storage groups (even more in later releases) on a server, each with five databases. Further consider the fact that the database is now two files (*.EDB and *.STM) instead of one. Putting all of this together, you can see the difficulty in implementing a backup strategy based on off-line methods for Exchange 2000/2003 on a server with multiple storage groups and databases. Here is my bottom line for this discussion: I hope I have convinced you to stay away from an off-line approach and to use this method only for periodic archival or as an added measure of protection that is complementary to on-line, API-based backups. Off-line backups are, arguably, still useful as a last-ditch recovery tool before, for example, a major hardware upgrade: do a full backup to truncate the logs, shut down or dismount, and then do the full off-line backup. Whatever your preference, ensure that you understand the pitfalls of off-line backups with Exchange Server.
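To put numbers on the file-management burden described above, the following Python sketch counts what an off-line procedure has to track by hand. The function name `offline_file_inventory` is hypothetical, and the count covers only database files and log file sets, using the figures given in the text.

```python
def offline_file_inventory(storage_groups: int,
                           dbs_per_group: int,
                           files_per_db: int) -> dict:
    """Count the database files and log file sets an operator must manage."""
    return {
        "database_files": storage_groups * dbs_per_group * files_per_db,
        "log_file_sets": storage_groups,  # one log file set per storage group
    }


# Before Exchange 2000: one instance, two single-file databases.
legacy = offline_file_inventory(storage_groups=1, dbs_per_group=2, files_per_db=1)

# Exchange 2000/2003 at the limits cited in the text:
# four storage groups, five databases each, EDB + STM per database.
current = offline_file_inventory(storage_groups=4, dbs_per_group=5, files_per_db=2)
```

Here `legacy` works out to 2 database files and 1 log file set, while `current` works out to 40 database files and 4 log file sets, each of which must be matched to the right storage group during a manual restore.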