5.4 Supercharged disaster recovery for Exchange Server 2003 | Mission-Critical Microsoft Exchange 2003: Designing and Building Reliable Exchange Servers (HP Technologies)

The increasing importance of messaging and collaboration as a business-critical service has left Exchange administrators and implementers looking for new ways to supercharge the recoverability and availability of their Exchange servers. In addition, hardware and software vendors now support technologies that enable more rapid recovery of server data and applications. One such enabling technology is volume cloning, or volume snapshots. It is available in a wide variety of hardware and software implementations, ranging from full-blown “snapshot manager” products for Exchange to integration kits for customers and integrators who want to build their own Exchange recovery solution. In the past, regardless of how these technologies were delivered, Microsoft provided no native support for them in either the operating system or Exchange Server. With the advent of Windows Server 2003 and Exchange Server 2003, however, this technology is now available natively to Exchange administrators. In this section, I want to give you a taste of the power of these technologies and discuss the Windows Volume Shadow Copy Service (VSS) and what it means for Exchange Server disaster recovery and availability.

5.4.1 Snap/clone technology overview

Before taking another step, I should briefly visit the technologies that enable my discussions. Snapshot and cloning technology is not new to the computer industry, but it is relatively new to the Windows platform. This is partially due to the slow adoption rate of technologies like Storage Area Networks (SANs) and Network Attached Storage (NAS) in the Windows space. Cloning and snapshot technologies provide business continuance volumes (BCVs), thereby providing a much needed capability to the Windows space. BCVs are marketing terminology for snapshots and clones. Simply put, clones and snapshots provide a mechanism for data duplication and point-in-time copies that allows business continuance—thus, the term BCV. On the surface, clones and snapshots may appear to be the same technology. However, they are quite different in actual technical implementation.

Snapshots

A snapshot is a metadata mapping to volume blocks that represents the “picture” (thus, the word snapshot) of the data at the time the snapshot was created. This means that if you create a snapshot of your Exchange database volume, the snapshot represents the list of the blocks on disk that were used to store your Exchange database (and any other files on the volume) at the time the snapshot was created. Once a snapshot has been created for a volume, these original volume blocks must be maintained in order for the snapshot to stay intact; snapshot data cannot be moved to other blocks, although it is possible to add data in new blocks that are not on the snapshot list. This requirement forces changes to blocks on the volume to be copied out to another location in the storage pool, which is why these are also known as copy-on-write snapshots. From an Exchange viewpoint, this means that a change to a page in the Exchange database requires copy-out operations for the changed blocks if a snapshot exists for the volume on which the Exchange data resides: when a block of data is changed, the block is copied out and another block is allocated from free volume pool space. In this manner, the contents of the original set of volume blocks that represent the snapshot are preserved. Thus, after snapshot creation, the production data is a combination of original unchanged blocks (that are still part of the snapshot) and the changed (copied-out) blocks of data, while the snapshot contains the original set of blocks that represent the data view at the time the snapshot was created. From this description, it should be clear that a snapshot is not a complete redundant copy of the data, but a representation of the data at a point in time. Part of the snapshot is still part of the production data set and part represents out-of-date data (shown in Figure 5.5).
Due to the nature of snapshots, creation is relatively quick and simple—the volume block mapping is simply created and the snapshot exists. Based on the particular characteristics of snapshot technology (it is not a complete redundant copy of the data and is subject to disk failures), snapshots are somewhat less desirable than clones.

Figure 5.5: Snapshot technology illustrated.
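
The block bookkeeping described above can be made concrete with a small model. The following Python sketch is only an illustration under stated assumptions: the Volume class and its method names are invented for this example, blocks are plain strings, and a changed block is simply given a new location in the pool, as the description in this section suggests. Real storage products differ in where they keep the preserved and changed blocks.

```python
class Volume:
    """Toy volume: a logical-to-physical block map over a shared storage pool."""

    def __init__(self, data):
        self.pool = list(data)                   # physical blocks in the pool
        self.block_map = list(range(len(data)))  # logical index -> physical index
        self.snapshots = []                      # each snapshot is a frozen block map

    def create_snapshot(self):
        # Creating a snapshot copies only metadata (the map), never the data.
        snap = tuple(self.block_map)
        self.snapshots.append(snap)
        return snap

    def write(self, logical, value):
        physical = self.block_map[logical]
        if any(physical in snap for snap in self.snapshots):
            # The block belongs to a snapshot: leave it untouched and
            # allocate a new block from free pool space for the change.
            self.pool.append(value)
            self.block_map[logical] = len(self.pool) - 1
        else:
            # No snapshot references this block, so overwrite it in place.
            self.pool[physical] = value

    def read(self, logical):
        return self.pool[self.block_map[logical]]

    def read_snapshot(self, snap, logical):
        return self.pool[snap[logical]]


vol = Volume(["db-page-0", "db-page-1", "log-0"])
snap = vol.create_snapshot()
vol.write(1, "db-page-1-changed")
print(vol.read(1))                  # current production view of block 1
print(vol.read_snapshot(snap, 1))   # view of block 1 at snapshot time
```

Note that blocks 0 and 2 are shared by the production view and the snapshot. This is why the loss of the underlying disks takes the snapshot with it.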

Clones

Like snapshots, clones are not a recent development. Disk clones come from a foundation of RAID technology—specifically RAID0+1. A clone is, in actuality, just an additional member of a RAID0+1 mirrorset. Typically, we think of mirrored volumes as only having two members. The Windows Logical Disk Manager (LDM) provides two-way mirroring, as do most hardware RAID controllers. However, some software volume managers and high-end storage controllers allow for N-way mirrors, where N may range from 3 to 32. For example, if you have a RAID0+1 set with 3 disks mirrored to 3 disks, you have a two-member RAID0+1 set. By adding another 3 disks to the existing RAID0+1 set, you would create a three-member mirrorset (a triple mirror). Additional members could be added to the mirrorset as well. By creating multimember mirrorsets and separating members from the set, you gain the ability to create point-in-time clones of the mirrored volume. Unlike snapshots, a clone is a complete standalone copy of the data at a particular point in time.

To create a clone, one or more members of the RAID0+1 mirrorset are simply split off from the production set. The result is a production mirrorset that supports the application (a two-member RAID0+1 array) and one or more clones (each containing a copy of your data) that have been split off from the production data (as shown in Figure 5.6). In the event of data corruption or loss, clones can then be used to recover system data by replacing the production copy with one of the cloned copies. Because clones are a complete redundant copy of the data, they are extremely useful as rapid recovery mechanisms.

Figure 5.6: Clone technology illustrated.
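
To contrast clones with snapshots, here is a minimal Python sketch of a three-member mirrorset from which one member is split off as a clone. The MirrorSet class, its method names, and the rule that a split must leave two production members are assumptions made for this illustration and do not reflect any vendor's volume manager.

```python
class MirrorSet:
    """Toy N-way mirror: every member holds a full copy of the data."""

    def __init__(self, blocks, members=3):
        self.members = [list(blocks) for _ in range(members)]

    def write(self, index, value):
        # A write goes to every member still in the production set.
        for member in self.members:
            member[index] = value

    def split_clone(self):
        # Assumption: production must keep at least a two-member mirror.
        if len(self.members) < 3:
            raise ValueError("need at least three members to split one off")
        return self.members.pop()

    def restore_from(self, clone):
        # Replace the production copies with the point-in-time clone.
        self.members = [list(clone) for _ in self.members]


production = MirrorSet(["page-0", "page-1"], members=3)
clone = production.split_clone()      # point-in-time, standalone copy
production.write(0, "corrupted")      # later damage does not reach the clone
production.restore_from(clone)
print(production.members)
```

Unlike the snapshot, the clone shares no blocks with production once it has been split off, which is what makes it usable as a complete recovery copy.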

5.4.2 Building the foundation: Volume Shadow Copy Service

While clone and snapshot technologies have been available to some degree in the Windows space, Windows and its applications have not been able to take full advantage of them. This is mainly due to the lack of native support in the Windows operating system and the inability of applications like Exchange Server to function with such technologies. Third-party hardware and software developers have implemented snapshot and clone solutions with little or no exposure to, or integration with, the operating system and applications. This has led to the current status quo of varied and noninteroperable solutions from hardware and software vendors that are not supported by Microsoft. While Microsoft has acknowledged that these technologies are available, it has limited its support and put the primary support responsibility on the vendors of these solutions. The storage technology investments made by Microsoft in Windows Server 2003 are substantial, and VSS is one of those investments. Shadow copy is the term that Microsoft uses to describe the snapshot or clone technology discussed above. VSS provides a framework that makes snapshot and clone technologies available to applications and provides some operating system–level support for the synchronization and coordination required to implement them. These services can be used by Windows Server 2003 components (including AD and the Windows Certificate Server), by Microsoft and third-party applications, and by third-party backup, data integrity, and SAN products to build solutions that leverage snapshot or clone technology. The Windows VSS has three primary goals: (1) to provide application synchronization so that backup programs do not have to be intimately aware of how a particular application stores or recovers its data; (2) to provide a way for tools and utilities to discover and enumerate shadow copies; and (3) to provide a framework into which hardware and software vendors can plug interoperable shadow copy providers.
With these goals in mind, Windows Server 2003 delivers a robust architecture in which a hardware vendor can supply a shadow copy creation component (called a provider), an application developer can expose shadow copy “packages” through components called writers that provide XML-based metadata to VSS, and backup vendors can build applications (called requestors) that initiate backup and restore operations leveraging these components on a common infrastructure. Figure 5.7 illustrates the VSS architecture included with Windows Server 2003.

Figure 5.7: Windows Volume Shadow Copy Services architecture.

VSS providers

VSS exposes APIs that enable vendors to VSS-enable their solutions. In order for a particular vendor’s snapshot/clone technology to function within the VSS framework, each vendor must develop a VSS provider. Providers are the components that manage volumes and create clones and snapshots per a specific vendor’s technology and implementation—think of the provider as the agent that actually writes the shadow copy data on a particular storage platform. Typically, a provider is a process (some kernel-mode and user-mode code) that persists data about a physical shadow copy in order for that shadow copy to be exposed to the operating system and/or applications. Providers must be built regardless of whether the vendor’s solution is hardware based or software based. In the case of a software-based provider, the implementation is usually a user-mode process coupled with a kernel-mode device driver. Both types of solutions (hardware and software) and the implementation details of the provider are left to the discretion of the vendor as long as they follow the implementation rules of the VSS framework (this is what makes VSS supportable from a Microsoft perspective).

Windows Server 2003 includes a software-based shadow copy provider (implemented as a copy-on-write software snapshot) as part of the operating system; various other vendors have written providers that make their storage products compatible with VSS.

VSS writers

The most important player in the VSS framework is arguably the application. The application must carefully expose recovery “packages” that are specific to its technology, implementation, and disaster-recovery requirements and constraints. For example, since Exchange Server is built on a transacted database engine, it has requirements that are unique even when compared with applications similar in nature (such as SQL Server or Oracle). VSS writers are code and data embedded in applications, and in components of those applications, to enable VSS compatibility. Application writers respond to the shadow copy interface to ensure data integrity and consistency during shadow copy operations. Writers respond to requestors (via the VSS interface) by supplying writer metadata that includes the details of what is required to perform shadow copy operations for the specific application. When a requestor asks for a shadow copy, the writer prepares its data to be copied, normally by freezing incoming write requests and flushing any cached data that has not been written to disk. After those preparations are complete, the writer signals the VSS framework that it is safe to copy the application data; after the copy is finished, VSS notifies the writer that it is safe to resume normal I/O operations. The goal of this implementation is to ensure that no writes occur on the volume while the shadow copy is being created. A backup operation performed using VSS is a systematic and well-orchestrated process that involves the interaction of each of the components in the VSS framework. Figure 5.8 provides a generalized flow and interaction diagram of the backup operation using VSS technology.

Figure 5.8: Typical VSS Shadow Copy operation.
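
The freeze, copy, and resume sequence can be sketched in a few lines of Python. This is a simulation of the ordering only, not the VSS programming interface: ToyWriter, ToyProvider, and coordinate_shadow_copy are names invented for this example, and the real framework is driven through COM interfaces rather than direct method calls.

```python
class ToyWriter:
    """Stands in for an application writer such as the Exchange store."""

    def __init__(self, name, volumes):
        self.name = name
        self.volumes = list(volumes)
        self.cached_writes = ["dirty page"]  # data not yet flushed to disk
        self.frozen = False

    def metadata(self):
        # Tells the requestor which volumes hold this application's data.
        return {"writer": self.name, "volumes": list(self.volumes)}

    def freeze(self, log):
        self.cached_writes.clear()  # flush cached data to disk
        self.frozen = True          # hold incoming write requests
        log.append(f"{self.name}: frozen")

    def thaw(self, log):
        self.frozen = False         # resume normal I/O
        log.append(f"{self.name}: thawed")


class ToyProvider:
    """Stands in for a hardware or software shadow copy provider."""

    def create_shadow_copy(self, volumes, log):
        log.append("provider: shadow copy created for " + ", ".join(volumes))
        return {"volumes": list(volumes)}


def coordinate_shadow_copy(writers, provider):
    """Play the coordinating role on behalf of a requestor."""
    log = []
    volumes = sorted({v for w in writers for v in w.metadata()["volumes"]})
    frozen = []
    try:
        for writer in writers:
            writer.freeze(log)
            frozen.append(writer)
        shadow_copy = provider.create_shadow_copy(volumes, log)
    finally:
        # Writers are always released, even if the provider fails.
        for writer in frozen:
            writer.thaw(log)
    return shadow_copy, log


store = ToyWriter("exchange-store", ["E:", "F:"])
shadow_copy, log = coordinate_shadow_copy([store], ToyProvider())
for entry in log:
    print(entry)
```

The try/finally block reflects one design concern worth keeping in mind: an application should never be left frozen because a copy operation failed.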

VSS requestors

Backup and disaster-recovery solution vendors participate in the VSS framework by developing their applications to make use of the VSS architecture, APIs, and implementation rules. These vendors must develop VSS requestors. A requestor is a process or application (automated or GUI-based) that requests that one or more shadow copy sets be taken from one or more volumes. The requestor is the main process that communicates with the VSS interface; VSS coordinates requests by passing them to writers and providers as necessary. The requestor also communicates directly with writers to gather the backup components, files, and metadata that the writers manage. This allows a requestor to select the volumes that should be shadow-copied to complete the requirements of the backup operation. With the advent of VSS, I look to the day when all Windows backup solutions are based on VSS; those that are not will find it increasingly difficult to be supported by Microsoft.

5.4.3 Exchange Server 2003 support for VSS

Based on the earlier discussion, it should be clear that, to support the VSS framework, an application like Exchange Server must provide the VSS writer component. For previous versions of Exchange (4.0 through 2000), Microsoft has not provided and will not provide a writer and, therefore, does not support VSS for these versions. However, the Exchange Server 2003 release does provide VSS support for Exchange information store backup and recovery. In Exchange Server 2003, Microsoft has built the VSS writer functionality into the information store (STORE.EXE) process of Exchange Server. This writer provides the necessary support for VSS requestors to initiate backup operations for Exchange Server 2003.

Exchange Server 2003 backups using VSS

Traditional Exchange API-based backups focused on four backup types for Exchange databases: full, incremental, differential, and copy. However, the Exchange 2003 VSS writer supports only a full backup at the storage group (SG) level. VSS performs Exchange full backups at the SG level, even though the Exchange writer treats individual databases as separate components. VSS uses the AddComponent call to add each database component to the shadow copy set, which in the case of a full backup is the entire SG (i.e., databases and log files). In a full backup of an SG, VSS creates a complete shadow copy of all volumes that contain Exchange data; the shadow copy contains the database and transaction log files associated with that SG. In addition, as is the case with non-VSS full backups, VSS truncates the transaction log files after successfully creating and backing up the shadow copy. To truncate the transaction log files, the shadow copy set must include all databases. For this reason, Microsoft will use the metadata definition for the Exchange writer to force requestor applications to process only full backups that have all SG components (i.e., databases and log files) in the shadow copy set.

Exchange VSS full backup: VSS backups for Exchange are performed at the SG level. This is the case even though individual databases are treated as separate components. Each database VSS component is added (with the AddComponent call) to the shadow copy set (which, in the case of a full backup, is the entire storage group, including databases and log files). In a full backup of a storage group, a complete shadow copy is created of all the volumes (containing database and transaction log files) associated with that storage group. In addition, as is the case with non-VSS full backups, the transaction log files are truncated after successful shadow copy creation and backup. In order for the transaction log files to be truncated, all databases must be included in the shadow copy set. For this reason, Microsoft will force (via the metadata definition for the Exchange writer) requestor applications to process only full backups that have all storage group components (databases and logs) included in the shadow copy set.
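
The all-components rule lends itself to a simple check. The Python sketch below is a hypothetical illustration of the rule only: the dictionary layout, the component names, and both function names are invented for this example and do not reflect the Exchange writer's actual metadata format.

```python
def missing_components(storage_group, selected):
    """Return the SG components left out of the shadow copy set."""
    required = set(storage_group["databases"]) | {storage_group["logs"]}
    return sorted(required - set(selected))


def can_run_full_backup(storage_group, selected):
    # A full backup (and the log truncation that follows it) is allowed
    # only when every database and the log files are in the set.
    return not missing_components(storage_group, selected)


# Hypothetical storage group with two databases and one log component.
sg1 = {"databases": ["Mailbox Store 1", "Public Store 1"], "logs": "SG1 Logs"}

partial = ["Mailbox Store 1", "SG1 Logs"]
complete = ["Mailbox Store 1", "Public Store 1", "SG1 Logs"]

print(can_run_full_backup(sg1, partial), missing_components(sg1, partial))
print(can_run_full_backup(sg1, complete))
```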

Exchange Server 2003 recovery using VSS

Although VSS backup for Exchange 2003 is at the SG level, you can recover individual databases from the SG snapshot set—each shadow copy has the individual database files, plus the logs needed to reconstitute them. VSS-based restoration of an Exchange 2003 SG is useful when data in one or more databases in the SG is lost or corrupted, but the current log files remain intact on disk; when the current log files on disk are lost or corrupted, but the databases remain intact; or when databases and current log files within an SG are lost or corrupted.

In the context of Exchange 2003 and VSS, the backup application is responsible only for restoring data to disk. The Exchange 2003 database engine, not the requestor, is responsible for recovering the data to a consistent, up-to-date state by playing back the log files. To do so, the database engine activates its existing soft- or hard-recovery procedures after the VSS-aware backup application restores the transaction log files and databases. Once the restore is complete, Exchange 2003 remounts and restarts the SG, and then the database engine initiates recovery. The database engine determines that the state of the databases is not consistent with the end of the log files on disk and begins the recovery procedure.

Three Exchange 2003 data-restoration scenarios exist, but only two procedures for those scenarios exist. The roll-forward recovery and point-in-time recovery procedures for restoring data are the same whether you have lost only the SG’s log files or you have lost an SG’s log files and databases. You use the same procedure because the loss of the log files is a catastrophic failure in Exchange and requires restoring the entire SG. In either case, these recovery options follow a specific step-by-step process:

1. The affected storage group is taken off-line.
2. VSS-based recovery is initiated. This includes all of the volumes contained in the SG shadow copy set.
   - If one LUN is configured per SG, Exchange recovers all databases except those that are intact.
   - If multiple LUNs per SG are configured, Exchange recovers only the LUNs with the databases needing recovery from the shadow copy set.
3. Exchange performs an Extensible Storage Engine (ESE) hard recovery and replays the applicable log files for the databases being recovered, depending on whether a roll-forward recovery or a point-in-time recovery is occurring.
4. The storage group is remounted and resumes normal on-line operations.

Roll-forward recovery: In a roll-forward recovery, one or more databases in the SG are lost, but the log files are intact on the server at the time of the recovery. In this case, you can selectively restore each of the affected databases from a full backup of the SG. Within the context of the VSS framework, you select from the SG backup only those database components that correspond to the databases you want to restore. The VSS-aware backup application restores the databases, and then Exchange recovers the databases and brings them up to date from their state at the time of the snapshot by rolling forward through the transaction logs (Exchange hard recovery). The roll-forward recovery option lets you recover backed-up data as well as data that has accumulated (e.g., in transaction logs) since the last backup.

Point-in-time recovery: When the SG’s log-file volume has been damaged or lost, or the log files have been lost or damaged together with some or all of the SG’s databases, you must restore the log files from a previous backup, together with all the databases backed up at the time of the last full backup of the SG. You cannot recover to the point of the failure because the log files written since the last backup have been lost or damaged, so you can recover only to the point of the last full backup. This process is known as a point-in-time recovery. Because this option does not provide roll-forward capability, some data will be lost (namely, data written between the time of that backup and the time of the failure). To provide point-in-time recovery, you must restore the databases that you backed up at the time of the full backup as well as the log files from the full backup. In addition, you must recover all databases associated with the SG. You cannot assume that any of the databases were left in a transaction-consistent state when the log files were lost and the SG went off-line, because the loss of the transaction log is a fatal error that causes the store to shut down immediately with no guarantee of consistency. Therefore, to ensure that the databases are in a consistent state when you restart the SG, you must return the entire SG to its state at the time of the last full backup.
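
The choice between the two procedures comes down to whether the current log files survived. The following Python sketch captures that decision as described in this section; the function name, its parameters, and the shape of the returned plan are assumptions made for illustration and are not part of any Exchange or VSS tool.

```python
def plan_recovery(all_databases, lost_databases, logs_intact):
    """Pick a recovery procedure for one storage group (illustrative only)."""
    lost = set(lost_databases)
    unknown = lost - set(all_databases)
    if unknown:
        raise ValueError(f"not in this storage group: {sorted(unknown)}")

    if not logs_intact:
        # Losing the logs means the whole SG goes back to the last full backup.
        return {
            "procedure": "point-in-time",
            "restore_databases": sorted(all_databases),
            "restore_logs": True,
            "data_loss_expected": True,
        }
    if lost:
        # Logs survived: restore only the damaged databases, then replay logs.
        return {
            "procedure": "roll-forward",
            "restore_databases": sorted(lost),
            "restore_logs": False,
            "data_loss_expected": False,
        }
    return {
        "procedure": "none",
        "restore_databases": [],
        "restore_logs": False,
        "data_loss_expected": False,
    }


# Hypothetical storage group with three databases.
databases = ["MDB1", "MDB2", "PUB1"]
print(plan_recovery(databases, ["MDB2"], logs_intact=True))
print(plan_recovery(databases, [], logs_intact=False))
```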

Implications for Exchange administrators

As organizations move from previous versions of Windows (Windows NT 4.0 and Windows 2000) and Exchange (Exchange 4.0 through 2000) to Windows Server 2003 and Exchange Server 2003, the use of VSS-based backup and recovery will become a standard mechanism for Exchange disaster recovery. However, I must concede that VSS solutions are not yet proven or readily available. The non-VSS solutions that exist today allow snapshot and clone technologies to be used with Exchange Server, but these technologies have no native operating system or application support. As a result, support from Microsoft for these solutions is limited [see Table 5.2 for a listing of pertinent Microsoft Product Support Services (PSS) Knowledge Base (Q) articles]. Organizations must therefore rely on the vendors of these solutions for support, both for current non-VSS solutions and for future support and adoption of VSS solutions. At the time of this writing, third-party vendor support for VSS is still uncertain (for both VSS providers and requestors), but vendors such as EMC, HP, Hitachi, Veritas, and CommVault are leading the way with VSS-enabled solutions released or in beta. Yet it will only be through extensive testing and deployment of these solutions that we will understand the power and utility they bring us. Until Exchange Server 2003 is widely deployed on Windows Server 2003 and vendors have done their part to embrace Windows VSS, the jury is still out as to whether this technology will really help us make our Exchange servers more available. However, as you may imagine, the potential here is huge.

Table 5.2: Microsoft Knowledge Base Articles Discussing Snap/Clone Support for Previous Versions of Exchange Server
Microsoft Knowledge Base “Q” Article	Subject
Q237767	XADM: Understanding Offline and Snapshot Backups
Q311898	XADM: Hot Split Snapshot Backups of Exchange
Q296787	XADM: Off-line Backup and Restore Procedures for Exchange Server 4.0, 5.0, and 5.5
Q296788	XADM: Off-line Backup and Restore Procedures for Exchange 2000 Server

5.4.4 Exchange recovery servers

One of the most challenging aspects of an Exchange administrator’s job is disaster recovery for Exchange. You must be able to provide recovery for every scenario, including mailbox and message recovery, information store recovery, and complete Exchange server recovery. Providing individual brick-level mailbox and message recovery can be the most challenging of these scenarios. Many third-party solutions exist for mailbox or message-level recovery, but most are far from perfect. In addition, Microsoft does not really provide a solution either, instead leaving this gap open to third-party developers. However, there is a solution for this problem scenario that should be your best practice regardless of which version of Exchange Server you are running. This solution is called the Exchange recovery server, and it can be a very useful tool for every Exchange administrator.

The idea behind the Exchange recovery server is that you maintain a spare server in your environment that is available as a target location to perform recovery operations. Regardless of which version of Exchange you run, you can recover an information store from one server to another server. Once this information store is recovered to the recovery server, you can then perform recovery for complete mailboxes or just individual messages by extracting these items from the store using Outlook or programs like ExMerge. Of course, Exchange 5.5 provides recovery and retention for deleted items, and Exchange 2000 adds mailbox retention to this. However, since even these two mechanisms might not meet all of your disaster recovery requirements, it is nice to know that the recovery server option is available. Figure 5.9 illustrates the idea of how a recovery server is used to provide mailbox and individual item recovery for an Exchange deployment.

Figure 5.9: Exchange 2000 recovery server scenario.

Exchange 5.5 recovery servers

The recovery server capability is available for any version of Exchange. However, the configuration and setup of your recovery server will vary depending on whether you are using Exchange 5.5 (and earlier versions) or Exchange 2000. Let us start our discussion with Exchange 5.5 and earlier versions, since this is what the majority of you are using. With Exchange 5.5 and earlier versions of Exchange, the recovery server can be any server (member server or domain controller) in the same domain as the original server you are recovering. However, the recovery server must have a different name from that of the server being recovered (you don’t want this server to start participating in the Exchange organization, doing directory replication, and so forth). You configure this recovery server by installing it with the same organization and site names and hierarchy as the original server (but a different server name), without joining it to the production organization. The result is a server with a different name but the same site and organization names as the original. This might seem confusing, but the server will not interfere with your existing Exchange organization because it was not joined to it during installation. This allows the server to function in the environment without causing problems for the production Exchange 5.5 deployment (you can even use the same Exchange service account).

Once the recovery server is installed and properly configured, you can restore an information store to the server. However, the directory database on this server will not have any objects (remember, you did not join this server to the existing organization). You can easily remedy this either by manually creating mailbox objects in the directory that match the ones you want to recover [the Distinguished Name (DN) is the only attribute that must match] or by running the Exchange 5.5 DS/IS Consistency Adjuster (in the Exchange Administrator program) to automatically generate a mailbox object in the directory for each mailbox found in the information store. Either procedure links information store mailboxes to directory objects on the recovery server. From this point, you can simply install the Outlook client on the server or use tools such as ExMerge to extract mailbox data to personal store files (PSTs) and import the data back into the appropriate mailboxes on the production Exchange server.

Exchange 2000 recovery servers

Deploying a recovery server for Exchange 2000 operates on the same principles as in earlier versions. However, several things are different in Exchange 2000. The first issue has to do with Exchange 2000’s dependence on the Windows AD. Since you can have only one Exchange organization per AD forest (as things are today; hopefully, someday this will not be the case), we are forced to deploy a completely separate AD forest for our Exchange recovery server(s). This should not be that big a deal, since the recovery forest can exist right alongside our production forest, and it is a good idea to have a test environment available anyway. However, you will need to determine the administrative impact this has on your organization. The other key differences in Exchange 2000 involve the change from sites (in previous versions of Exchange Server) to administrative groups (in Exchange 2000) and the ability to have more than one information store per server. These differences add steps to the configuration of an Exchange 2000 recovery server, which I will discuss next.

The first step in deploying an Exchange 2000 recovery server is to deploy your recovery forest. You must have a separate forest when installing your Exchange recovery server or you will be forced to join the existing production Exchange organization. The recovery forest can have any naming convention and need not match the naming conventions in the production forest (it can even exist on the same network), as shown in Figure 5.9. Once you have your Exchange 2000 recovery forest deployed, you can install your Exchange 2000 recovery server into the forest with the same organization name as the production server you will be restoring (remember, this is the Exchange organization name, not the AD naming). The server name can be the same or different, as there will be no conflicts with the production server (other than potential DNS issues if it is on the same network). If you are going to keep the recovery forest up and running permanently, I recommend that you install a permanent recovery server into the recovery forest that maintains a permanent name and is the first server in the organization. I also recommend that the naming conventions and administrative group hierarchy match your production Exchange 2000 organization. This will be of huge benefit in the steps I discuss next. When you have a permanent server installed that maintains organization naming and hierarchy, a second recovery server can be installed for each incident to match the naming of the server being recovered. This will alleviate the issues with LegacyExchangeDN discussed next.

After the recovery forest and server(s) have been configured, you can perform a restore of the information store database into an administrative group with the same name as the one from which the database was taken. However, from a database and AD point of view, the LegacyExchangeDN values for the administrative group and the database must match. In addition, the storage group and database names must also match those on the original server. If you have taken the steps outlined earlier to maintain a permanent recovery server in the recovery forest (one that matches the naming and hierarchy of the production organization) and have installed a second recovery server with the same administrative group, storage group, and database as the production server, these values will match. However, in the event that you do not wish to maintain this extra recovery configuration, you can view and edit the LegacyExchangeDN value for the database and administrative group manually using an LDAP editing utility such as LDP, ADSI Edit (Windows 2000 Support Tools), or LDIFDE (installed by default in Windows 2000). In addition, Microsoft provides an unsupported tool called LegacyDN.EXE (available with Exchange 2000 SP1 or later) that provides an easy-to-use interface for changing this attribute (it was not really made for this purpose, which is why it is not supported). Regardless of which tool you choose, the Exchange 2000 organization name, administrative group name, storage group name, database name, and LegacyExchangeDN values for the production environment and the recovery server must all match for the database being restored. For more information on the procedure to modify LegacyExchangeDN values, see Appendix A: Changing the LegacyExchangeDN Attribute Values in the white paper Exchange Database Recovery at www.microsoft.com/technet/prodtechnol/exchange/support/dbrecovr.asp.
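
Because a single mismatched name is enough to stop a restore, it can help to compare the values side by side before you begin. The Python sketch below is a hypothetical pre-restore checklist, not an Exchange tool: the key names and all sample values are invented placeholders, the values would have to be collected by hand or with the utilities mentioned above, and the comparison is a plain exact match.

```python
# Values that must be identical on the production and recovery sides.
REQUIRED_MATCHES = (
    "organization",
    "administrative_group",
    "storage_group",
    "database",
    "admin_group_legacy_dn",
    "database_legacy_dn",
)


def find_mismatches(production, recovery):
    """Return the names of required values that differ or are missing."""
    return [
        key for key in REQUIRED_MATCHES
        if production.get(key) is None or production.get(key) != recovery.get(key)
    ]


# Placeholder values for illustration only.
production = {
    "organization": "ExampleOrg",
    "administrative_group": "First Administrative Group",
    "storage_group": "First Storage Group",
    "database": "Mailbox Store (MAIL01)",
    "admin_group_legacy_dn": "placeholder-admin-group-dn",
    "database_legacy_dn": "placeholder-database-dn",
}
recovery = dict(production, database="Mailbox Store (RECOVERY01)")

print(find_mismatches(production, recovery))
```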

Once you have restored the information store database to the recovery server, you can proceed to link mailbox objects to mailboxes similar to the Exchange 5.5 recovery scenario. However, for Exchange 2000 and AD, the procedure is different. In earlier versions of Exchange, the mailbox object is linked to a DN and only directory objects with the same DN are required to link mailboxes to directory objects. In Exchange 2000, this is not the case, and you must explicitly connect a mailbox in the database to a directory object. You can do this manually if you are recovering a small number of mailboxes by creating a user object that is not mailbox-enabled (using the AD Users and Computers MMC Snap-in). After creating a user object, it is a good idea to run the mailbox cleanup agent from the Exchange System Manager. Afterward, you should be able to see mailboxes in the restored database that have a red “X” indicating they are orphaned, or not connected to any user object. From this point, you merely right-click on the mailbox object, choose reconnect, and select the user object that you wish to connect to the mailbox. Of course, if you are recovering many mailboxes, you might want to use something like the Mailbox Reconnect Tool (MBConn—available on the Exchange 2000 CD) to save you some keystrokes and mouse clicks. Once mailboxes are connected to user objects, you can extract data from them in the same manner as we discussed in the Exchange 5.5 recovery server scenario (using Outlook, ExMerge, and so forth).
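
When many mailboxes need to be reconnected, it can help to draw up a reconnect plan before touching the server. The Python sketch below pairs orphaned mailboxes with user objects by display name. This matching rule is purely an assumption made for illustration and does not describe how MBConn or Exchange System Manager actually match mailboxes; the names used are hypothetical.

```python
def plan_reconnects(orphaned_mailboxes, user_objects):
    """Pair each orphaned mailbox with a user object of the same display name.

    Assumption: display names are compared without regard to case.
    Returns (plan, unmatched): plan is a list of (mailbox, user) pairs and
    unmatched lists the mailboxes for which no user object was found.
    """
    users_by_name = {name.lower(): name for name in user_objects}
    plan, unmatched = [], []
    for mailbox in orphaned_mailboxes:
        user = users_by_name.get(mailbox.lower())
        if user is None:
            unmatched.append(mailbox)
        else:
            plan.append((mailbox, user))
    return plan, unmatched


# Hypothetical names, for illustration only.
plan, unmatched = plan_reconnects(
    ["Ann Lee", "Bob Ray"],   # mailboxes shown as orphaned
    ["ann lee", "Cy Dunn"],   # user objects that are not mailbox-enabled
)
print(plan)       # [('Ann Lee', 'ann lee')]
print(unmatched)  # ['Bob Ray']
```

Any mailbox left in the unmatched list would still need a user object created for it before it can be reconnected.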

We have taken a closer look at the concept of a recovery server for your Exchange environment. As an Exchange administrator, you should take this concept to heart and implement Exchange recovery servers as a best practice in your environment—regardless of which version of Exchange you have. In addition to being an important recovery facility for your Exchange environment, the recovery server scenario also provides a great nonproduction testbed that is useful in other situations. If you are not using Exchange recovery servers as part of your everyday life as an Exchange administrator or encouraging your customers to use this best practice, you might consider the benefits this solution can bring to you and your customers. The recovery server concept may seem like extra trouble. However, if you desire to provide this level of disaster recovery for your Exchange users, the recovery server concept can be a lifesaver.

5.4.5 Leveraging Exchange Server 2003’s Recovery Storage Group feature

These steps in utilizing Exchange recovery servers may seem a bit cumbersome. That’s because they are! Microsoft has done a great job of documenting this procedure for recovering mailboxes and data, but it is very complex and somewhat prone to error. In fact, Microsoft Product Support Services (PSS) spends far too many cycles supporting Exchange recovery server methods. In response, the Exchange development team has heard the cries of Exchange administrators (and PSS support engineers …) and has devised a much simpler and easier-to-deploy mechanism for providing this functionality. It is my pleasure to introduce you to the Exchange Server 2003 Recovery Storage Group (RSG) (Figure 5.10)!

Figure 5.10: Exchange Server 2003’s Recovery Storage Group feature.

Bringing greater flexibility to the recovery of databases, mailboxes, and individual items, the Exchange Server 2003 RSG is a powerful new feature. The RSG is a special-purpose fifth storage group available on Exchange Server 2003 servers that exists alongside your production storage groups on the server. This means that even though a server is configured with four production storage groups, you can still add an RSG to the server. You can then use this RSG to recover databases from any server running Exchange 2000 SP3 or later that is in the same Administrative Group as the server with the RSG. After you have recovered a database to the RSG, the procedure for recovering mailbox data is much the same as in the Exchange 2000 recovery server scenario. You can use tools such as ExMerge to move data from the RSG to production storage groups. This allows you to recover an entire database or just a single mailbox. The RSG does have some caveats, however. You can only have one RSG per server, and the RSG does impose overhead on the server. In addition, if the server has the maximum number (four) of storage groups and an RSG configured, you will not be able to perform concurrent recovery operations on it (since a maximum of five ESE instances are possible per Exchange 2000/2003 server, there are no available instances to perform concurrent operations). Finally, RSGs only support the recovery of mailbox stores, not public folder stores (see Table 5.3 for more information). RSGs are created the same way you create regular storage groups, by selecting the server, right-clicking, and choosing New … Recovery Storage Group. (Yes, it is that simple!)
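
The RSG caveats above reduce to a little arithmetic and a short list of checks. The following Python sketch encodes only the limits stated in the text (five ESE instances per server, four production storage groups, mailbox stores only, same Administrative Group, Exchange 2000 SP3 or later). The function and parameter names are hypothetical and do not correspond to any Exchange API.

```python
MAX_ESE_INSTANCES = 5              # per Exchange 2000/2003 server, as stated in the text
MAX_PRODUCTION_STORAGE_GROUPS = 4  # production storage groups per server


def free_ese_instances(production_storage_groups, has_rsg):
    """Return how many ESE instances remain for concurrent recovery operations."""
    if not 0 <= production_storage_groups <= MAX_PRODUCTION_STORAGE_GROUPS:
        raise ValueError("a server holds between 0 and 4 production storage groups")
    used = production_storage_groups + (1 if has_rsg else 0)
    return MAX_ESE_INSTANCES - used


def rsg_restore_blockers(store_type, same_admin_group, exchange_2000_sp3_or_later):
    """List the reasons, if any, why a database cannot be restored to the RSG."""
    blockers = []
    if store_type != "mailbox":
        blockers.append("only mailbox stores can be recovered")
    if not same_admin_group:
        blockers.append("source server is in a different Administrative Group")
    if not exchange_2000_sp3_or_later:
        blockers.append("source server predates Exchange 2000 SP3")
    return blockers


print(free_ese_instances(4, has_rsg=True))                 # 0
print(rsg_restore_blockers("public folder", True, True))   # ['only mailbox stores can be recovered']
```

A result of zero free instances corresponds to the situation described above, in which a fully configured server with an RSG has no room left for concurrent recovery operations.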

Table 5.3: Exchange 2003 Recovery Storage Group Usage Scenarios
Scenario	Usage Description
Database/mailbox/item recovery	Useful for recovery of lost or deleted data from a user mailbox. A database from the same Administrative Group as the RSG can be recovered, and data can be extracted via tools such as ExMerge.
Rapid recovery	In the event of a catastrophic loss of a mailbox database, a “stub” mailbox database can be created and the logs copied to an alternate location. The RSG can then be used to recover the database to the point of failure while the stub database allows continued service for users. Once the database has been recovered, it can be swapped with the stub database, and new data from the stub database can be recovered via the RSG to the original production database.

The concept of an RSG is new to Exchange Server 2003 and promises to save Exchange administrators and Microsoft PSS hours of work. By avoiding the requirement to set up an entire recovery forest and deploy extra hardware and software (which, by the way, Microsoft required you to pay for …), the RSG offers huge wins. However, time will attest to the success of the RSG and whether the efforts of the Exchange development team were well invested. If you have ever struggled or been frustrated with the old way of doing things (Recovery Servers), Exchange Server 2003’s Recovery Storage Group is a welcome relief.

Power with responsibility

With the advent of multiple storage groups and databases in Exchange 2000/2003, the recovery API was stretched to accommodate new scenarios. You must be able to perform backup and restore operations for the entire server, a storage group, or an individual database. In addition, since these operations can be performed concurrently, ESE must be able to handle this as well. Exchange 2000/2003 offers a great deal of flexibility and additional availability that previous versions did not offer. For example, you could have four databases configured that each host 1,000 users (a total of 4,000 users). You can begin restore operations for one storage group or database without impacting the others. In our example, 3,000 users would remain on-line accessing their data, and only the 1,000 users on the database being restored would be affected. This, indeed, gives operators many more options and reduces the overall impact of restore operations. However, it also complicates procedures, requires better training, and has greater potential for error. Gather the knowledge you require to ensure that you are implementing solid disaster-recovery plans for Exchange servers. Understand the different backup strategies available for Exchange 2000/2003, and select the one that best suits your organization. Also, be sure to investigate how you can leverage Exchange Server 2003’s RSG feature. Finally, keep an eye on Exchange 2003’s support for Windows VSS and the resulting vendor solutions that become available. Also, stay tuned because many of the best practices and tricks of the trade for Exchange 2003 have not been discovered yet (although things aren’t that different from Exchange 2000). The power of Exchange 2000/2003 storage cannot be fully realized without properly understanding the disaster-recovery implications of your storage design choices.
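
The four-database example above is easy to generalize when you are weighing storage design choices. The following Python sketch computes how many users are affected and how many stay on-line during a restore. The database names are hypothetical, and the sketch assumes each user is hosted in exactly one database.

```python
def restore_impact(users_per_database, databases_being_restored):
    """Return (affected_users, online_users) while the given databases are restored."""
    affected = sum(users_per_database[name] for name in databases_being_restored)
    online = sum(users_per_database.values()) - affected
    return affected, online


# The example from the text: four databases with 1,000 users each.
users = {"DB1": 1000, "DB2": 1000, "DB3": 1000, "DB4": 1000}

affected, online = restore_impact(users, ["DB1"])
print(affected, online)  # 1000 3000
```

Running the same calculation against a single 4,000-user database shows the trade-off: every user would be affected for the duration of the restore.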