11.6 The Recovery Storage Group | Microsoft Exchange Server 2003 Administrators Pocket Consultant

During discussions about email outages, the concept of "dial tone" often comes up as a way to convey the urgent need to restore service, in the same way that phone companies keep telephone lines available. Of course, there is a world of difference between keeping the phone infrastructure running and bringing an email server back online, but the essential concept is similar: people depend on the service and want it restored as soon as possible after an outage. Restores often require hours of patient labor, a fact that users do not appreciate because they just want to access their email.

Before Exchange 2003, the reliance on the Store meant that any corruption, disk failure, or other outage that affected the ability of clients to connect to the Store rendered Exchange inoperative until administrators could fix the problem and get the Store databases back online. If the database is more than 10 GB, running ESEUTIL to check a database and make it consistent can take hours. If you have a corrupted database, the only way to solve the problem is to restore the database files from backup media, which leads to extended outages and loss of user confidence. The need to avoid long outages that affect Service-Level Agreements is the reason why administrators evolved techniques such as "Rapid Online, Phased Recovery." While these workarounds are effective, they do not address the underlying system deficiencies and usually require extra hardware.

Exchange 2003 supports a Recovery Storage Group (RSG) to allow administrators to recover information from a failed database without the need to use another server specially assigned for this purpose and installed in a separate AD forest and Exchange organization. Large organizations can easily justify keeping a recovery server around to be able to respond to problems quickly, but this is an excessive cost for small organizations, so the ability to do everything on a single server is welcome. The RSG is a special form of storage group that you can use only for recovery operations, attaching up to five databases for recovery purposes. You can use the RSG in addition to the maximum of four standard storage groups supported by Exchange. Users cannot access data in the mailboxes in the databases in the RSG, but an administrator can recover data from the mailboxes with the ExMerge utility. In addition, any databases connected to the RSG are invisible to system maintenance utilities, such as full-text indexing, backup, or antivirus programs.

As an example of how to use the RSG, assume that we encounter a problem with a database and want to restore service to users as quickly as possible (the dial-tone approach). We respond by creating a new database to allow users to continue working and then use the RSG to recover data from the most recent full backup set.

11.6.1 Encountering database problems

We begin with a database that is online and active. We also have a recent full online backup that we know is good, so we are in a position to restore if anything goes wrong. A problem now occurs and one of the database files is corrupted and ceases to function or goes offline. After we address any potential hardware problems, we then have to make a decision about how to handle the database:

Attempt to mount the database and see whether the problem still exists. Sometimes, a hardware problem does not affect the databases and Exchange's normal soft recovery process is enough to make the database consistent when the Store mounts the database. In any case, it is unwise to attempt any recovery procedure if a lingering hardware problem persists.
Immediately proceed to restore from backup media and make users wait until we complete the recovery process, including the replay of any outstanding transaction logs.
Assuming that the Store cannot mount the database and some indication exists that the database is OK but perhaps inconsistent, we could attempt to fix the database with the ESEUTIL utility. On a small server, where the database is less than 4 GB or so, this may be a good course of action, especially if you have contacted Microsoft PSS and they advise you to run ESEUTIL. Because it takes so long to run ESEUTIL on larger databases, it is best to get users back online and plan to work with the database afterward.
Immediately proceed to use the RSG.
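
The four options above can be written down as a small decision helper. This is only a sketch of the reasoning in the text: the function name, its parameters, the order in which the checks run, and the treatment of "less than 4 GB or so" as a fixed cutoff are all assumptions made for illustration, not a rule defined by Exchange or by Microsoft PSS.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    MOUNT = "mount the database and rely on soft recovery"
    RESTORE = "restore from backup while users wait"
    ESEUTIL = "repair the database with ESEUTIL"
    RSG = "create a temporary database and use the RSG"


# Assumption: "less than 4 GB or so" in the text is treated as a fixed cutoff.
ESEUTIL_SIZE_LIMIT_GB = 4.0


def choose_action(hardware_ok: bool, mounts: bool, looks_repairable: bool,
                  size_gb: float, users_can_wait: bool) -> Optional[Action]:
    """Return a suggested action, or None while a hardware fault persists."""
    if not hardware_ok:
        return None  # no recovery procedure until the hardware is fixed
    if mounts:
        return Action.MOUNT
    if looks_repairable and size_gb < ESEUTIL_SIZE_LIMIT_GB:
        return Action.ESEUTIL
    if users_can_wait:
        return Action.RESTORE
    return Action.RSG
```

A large database that will not mount, on a server where users cannot wait, falls through to the RSG option, which is the case the rest of this section follows.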

In this instance, we want to use the RSG but also preserve the original database for analysis and perhaps to be able to run ESEUTIL on another server. Microsoft PSS may ask you to keep a database for them to look at.

We therefore copy the problem database to a safe location, perhaps using a network location specially mounted for this purpose. After you are positive that you have copied the database safely, you can rename the original database file, and then proceed with the RSG-based recovery. The steps in the process are:

Create a temporary database to host temporary mailboxes for users.
Connect users to the temporary database to allow them to keep working during recovery operations.
Create the Recovery Storage Group and attach the problem database to it.
Recover the database files from the last good full backup plus any subsequent transaction logs.
Use the RSG to ensure that you can see the data in the mailboxes in the recovered database.
Switch the recovered database back into production. The temporary database is now under the control of the RSG.
Use ExMerge to export the contents of the mailboxes in the temporary database and merge them back into user mailboxes in the production database.
Close down the RSG and clean up the temporary files.
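
Before starting on these steps, the text asks you to copy the problem database to a safe location and to rename the original only when you are positive that the copy is good. A minimal sketch of that precaution follows. Comparing SHA-256 hashes is an assumption about what "positive" might mean in practice, and the function names and the ".damaged" suffix are hypothetical. The database must be dismounted so that the files are not in use, and you would call the function once for each file that belongs to the database (the EDB file and its streaming file).

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so that large databases do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preserve_and_rename(db_file: Path, safe_dir: Path,
                        suffix: str = ".damaged") -> Path:
    """Copy db_file to safe_dir, verify the copy, then rename the original."""
    safe_dir.mkdir(parents=True, exist_ok=True)
    copy_path = safe_dir / db_file.name
    shutil.copy2(db_file, copy_path)
    # Rename only after the copy is proven identical to the original.
    if sha256_of(db_file) != sha256_of(copy_path):
        raise IOError(f"copy of {db_file} does not match the original")
    renamed = db_file.with_name(db_file.name + suffix)
    db_file.rename(renamed)
    return renamed
```

Hashing a multi-gigabyte database twice takes time, so on a large server you may prefer a cheaper check; the point of the sketch is only that the rename happens after verification, never before.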

11.6.2 Creating a temporary database

The first step is to create a new database for users to access. When the Store attempts to mount the database, it detects that the database is no longer present (you copied and then renamed the original file). The Store then offers to create a new database (Figure 11.30). Note the hint that you can leave the database alone and proceed to restore it from a backup. This is the classic recovery action to restore a database after file corruption, but in this case, it will take us too long to recover from backup and users want immediate access to their mailboxes.

Figure 11.30: The Store offers to create a new database.

While we need a new Mailbox Store to restore service to users, we do not want to create a new Mailbox Store in the "production" location, because we will restore the backup copy of the database here. Therefore, we first create a temporary working directory to hold the new Mailbox Store for immediate use. The logic is that the temporary database is for short-term use and will only accumulate a small amount of data in its mailboxes, at least when compared with a production database. We therefore plan to switch users back to the production database as soon as we recover it from backup, and we will then recover whatever information they generate in the temporary database and merge it back into their mailboxes. The work to recover and merge happens behind the scenes. Users are aware that a problem exists and that they do not have access to their original mailboxes, but we should be able to switch the temporary database for production very soon after we recover the last good copy of the production database from backup, so users will only lose access to their original mailboxes for that time. In the interim, they can process messages using the temporary mailboxes.

Assuming that the volumes are available and the corruption is not due to hardware failure, we can place the database and transaction logs for the RSG on the same volumes that host the production files, as shown in Table 11.5.

Table 11.5: Locations of Production and RSG Files
	Production	RSG
Database	D:\Exchange\SG1	D:\Exchange\RSG
Transaction logs	F:\Exchange Transaction Logs\SG1	F:\Exchange Transaction Logs\RSG

Next, select the database and view its properties, going to the database page. Use the Browse option to change the paths for the database and transaction logs to the temporary directories and then press OK. Now remount the database. The new Mailbox Store is empty, so users have no access to data previously held in their mailboxes. As soon as you mount the temporary database, clients can log on and begin working again. However, Outlook clients should turn synchronization off and stop using cached Exchange mode with Outlook 2003, because they are no longer working with the same mailbox context. In both cases, the slave replica folders in the OST do not match the server folders in the temporary database. Clients can resume normal synchronization after you restore the production database and swap out the temporary version. If you use ESM to view the mailboxes in the database just after mounting, you see that none appears to exist. This is because the Store only creates a mailbox within a new database (which is what we have) when a client first connects, so you will see the mailboxes appear only after users connect and begin to use their mailboxes.

11.6.3 Creating the RSG

Users can now create new messages and receive incoming email, but they do not have access to any of the information in the corrupted database. At this point, we still have a problem for Outlook users who synchronize messages with the Store (such as those who work in cached Exchange mode), because the encrypted tokens in their OSTs do not match the tokens in their mailboxes in the new Store. We can resolve this issue in two ways: Either work with Outlook in purely online mode and avoid synchronization or use OWA. Synchronization returns to normal when we restore the original database from backup and return it to production. If Outlook users recreate their profile or delete their OST in an attempt to work around the problem, they will generate a lot of synchronization activity to rebuild their OSTs, which we want to avoid.

The next step is to create the Recovery Storage Group. The process is much like creating a normal Storage Group. Select the server, right-click, select New from the menu, and then select "Recovery Storage Group," as shown in Figure 11.31. When you create the RSG, ESM proposes a default location to hold files during the recovery process. Typically, this is a directory under the Exchange root directory. We want to recover the database into the production location, but we want to use a temporary work directory for the transaction logs and the system path, which holds files such as the checkpoint. Use the browse option (Figure 11.32) to change the locations. Table 11.5 indicates the locations used in this example. Click OK to proceed and ESM instantiates the RSG. At this point, the RSG is just a placeholder and no files exist in its directory. The next step is to tell ESM which database you want to add to the RSG. All of the databases in the RSG must come from the same original storage group, so if you have problems with databases in several storage groups, you need to use either a recovery server or an RSG on another production server.

Figure 11.31: Creating a new Recovery Storage Group.

Figure 11.32: Setting locations for the RSG.

To add a database to the RSG, right-click on the RSG and select the "Add Database to Recover" option. ESM then checks AD to discover what databases exist in the administrative group that the server belongs to, and displays the dialog shown in Figure 11.33. Select the database that you want to recover and click on OK.

Figure 11.33: Adding a database to the RSG.

We now have an association between the RSG and a database, in the same way as we have an association between a normal storage group and its databases. If we examine the properties of the RSG, we see that the database locations point to the directory we defined for the RSG. No files exist there yet, because we have not restored the backup. In fact, when you examine the properties, you will see that ESM generates default names for the databases, so they do not match the databases you want to restore. You can change these names to match the original databases with the browse option. As you can see in Figure 11.34, the RSG database locations now point to the location of the original production databases and the names now match the original databases.

Figure 11.34: Adding a database to the RSG is completed.

11.6.4 Restoring the database

Restoring the backup copy of the failed database is the next step. In this instance, we use NTBACKUP to restore a copy from disk, but the same principle holds for other backup utilities and media types. We specify a temporary directory where the restore operation places the transaction logs from the backup set. The "Last Restore Set" checkbox is set to tell NTBACKUP that we only want to process this set, meaning that the ESE will replay transaction logs at the end of the restore process to make the database fully consistent. (See Figure 11.35.)

Figure 11.35: Restoring the failed database from the backup set.

After a successful restore, the backup copy of the failed Mailbox Store is in the production directory. Go to ESM and mount the database in the RSG, then refresh the RSG entry to be able to list the mailboxes in the recovered database, as shown in Figure 11.36. All of the mailboxes have a red cross next to their names, because ESM cannot associate the mailboxes with AD accounts. End users cannot log on to these mailboxes and administrators cannot perform operations such as creating new mailboxes in the RSG, but you can recover their contents, which is the major benefit. We now have users connected to a temporary database that holds little data (because it has not been in use much). The recovered database in the RSG holds far more data, so we now proceed to swap the two databases. The effect will be to provide users with all their data up to the time of the backup that we restored, while we move the data that they created after the problem happened into the RSG. Later, we will use the ExMerge utility to recover the data from the RSG and merge it back into the production database.

Figure 11.36: Viewing the mailboxes in the restored RSG.

Exchange provides no user interface to swap databases, so we have to do it behind the scenes by editing database properties. These properties exist in the Microsoft Exchange configuration container in the AD, so we have to edit them with the ADSIEDIT utility.

Before we move anything, we have to dismount both databases to prevent any user access while we swap files. In addition, take the opportunity to set the "This database can be overwritten by a restore" checkbox for both databases so that the Store accepts that the underlying files have changed when you remount the databases after editing their locations. To edit the location for a database:

Start ADSIEDIT and open the configuration naming context.
Navigate through Services | Microsoft Exchange | Administrative Groups to the server you are working with and then to the database in the storage group.
Select the database and view its properties.
Navigate to MsExchEDBFile (the location of the EDB database) and change its value to point to the correct location.
Perform the same operation for MsExchSLVFile (the location of the streaming file).
Select the database in the Recovery Storage Group and perform the same changes.
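
Because the swap is a manual edit of two attributes on two objects, it can help to work out the intended values before typing them into ADSIEDIT. The sketch below is an in-memory model only and does not connect to AD. The class, its field names, and the priv1.edb and priv1.stm file names are hypothetical; the directories are the ones used in this example.

```python
from dataclasses import dataclass


@dataclass
class DatabaseObject:
    name: str
    edb_file: str  # stands in for the MsExchEDBFile attribute
    slv_file: str  # stands in for the MsExchSLVFile attribute


def swap_locations(production: DatabaseObject, recovery: DatabaseObject) -> None:
    """Exchange the file locations of the two database objects in place."""
    production.edb_file, recovery.edb_file = recovery.edb_file, production.edb_file
    production.slv_file, recovery.slv_file = recovery.slv_file, production.slv_file


# State before the swap: users work in the temporary database, and the
# restored backup sits in the RSG at the original production location.
production = DatabaseObject("Production Mailbox Store",
                            r"D:\Exchange\RSG\priv1.edb",
                            r"D:\Exchange\RSG\priv1.stm")
recovery = DatabaseObject("RSG database",
                          r"D:\Exchange\SG1\priv1.edb",
                          r"D:\Exchange\SG1\priv1.stm")

swap_locations(production, recovery)
for db in (production, recovery):
    print(db.name, db.edb_file, db.slv_file)
```

After the swap, the production object points at the SG1 directory and the RSG object points at the RSG directory, so users see the restored data and the temporary data becomes available for recovery.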

After making the changes, go to ESM and remount the databases. You should now be able to list the contents of the production database and view the mailbox contents at the time of the backup. Clients should be able to connect to their mailboxes and work normally. You should also be able to view the contents of the mailboxes in the recovery database, which holds the information that users generated while the temporary database was online. Table 11.6 summarizes the movements of databases during the RSG procedure, while Figure 11.37 shows how to use ADSIEDIT to change a database location.

Table 11.6: Database Locations during RSG Operations
Situation	Production Mailbox Store Location	RSG Database Location
Before problem happened	D:\Exchange\SG1	-
Bring temporary database online	D:\Exchange\RSG	-
Restore backup copy to RSG	-	D:\Exchange\SG1
Switch databases	D:\Exchange\SG1	D:\Exchange\RSG
Recover user data with ExMerge	D:\Exchange\SG1	D:\Exchange\RSG

Figure 11.37: Editing database locations with ADSIEDIT.

11.6.5 Using ExMerge to recover mailbox data

We now need to extract information from the temporary database and merge it back into the production database to provide users with a complete copy of their work before and after the problem occurred. As in a recovery server situation, you can retrieve data from mailboxes in the RSG with the ExMerge utility. It is important to use the version of ExMerge provided in the \support\utils\i386\exmerge folder on the Exchange 2003 server CD, since this version is able to bind to the mailboxes in the RSG with an appropriate level of security. Before attempting to run ExMerge, copy Exmerge.exe and Exmerge.ini into the \exchsrvr\bin directory to ensure that the executable can find exchmem.dll.

As with other ExMerge recovery operations, you have a choice between a one-step process and a two-step process to recover data from mailboxes. A one-step process is most appropriate when the recovery and target servers are different. In this case, the two databases are available on the same system, so we can use the two-step process (extract and then import using interim PST files as the transfer mechanism). Make sure that you have sufficient disk space available for ExMerge to create the PSTs, planning on the basis that the PSTs will occupy roughly twice the size of the reported mailboxes.
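
The disk space rule of thumb is simple arithmetic, sketched here as a pre-flight check. The function names and the sample mailbox sizes are hypothetical, and the factor of 2.0 is just the "roughly twice" planning figure from the text rather than a measured value.

```python
import shutil
from typing import Iterable, List


def pst_space_needed_mb(mailbox_sizes_mb: Iterable[float],
                        overhead_factor: float = 2.0) -> float:
    """Estimate PST space as the reported mailbox sizes times a factor."""
    return sum(mailbox_sizes_mb) * overhead_factor


def has_enough_space(mailbox_sizes_mb: List[float], work_dir: str) -> bool:
    """Compare the estimate with the free space on the volume of work_dir."""
    free_mb = shutil.disk_usage(work_dir).free / (1024 * 1024)
    return free_mb >= pst_space_needed_mb(mailbox_sizes_mb)


# Three mailboxes reported at 100 MB, 250 MB, and 50 MB (sample figures).
sizes = [100, 250, 50]
print(pst_space_needed_mb(sizes))  # 400 MB of mailboxes, doubled
```

For the sample figures, the estimate is 800 MB, so the temporary directory for the PSTs should sit on a volume with at least that much free space.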

When ExMerge starts, it prompts for a server to connect to, and then the database from which you want to recover mailboxes. You want to export data from the mailboxes in the temporary database that is currently active, because you will eventually switch the restored database back into production; you are now saving information that users create while using the temporary database. For this reason, you should run ExMerge at a quiet time, when users are inactive.

After selecting the database, you can then select all or some of the mailboxes in the database; ExMerge will then begin to export mailbox data to the PSTs in the temporary directory. When this stage is complete, a separate PST exists for each mailbox. Figure 11.38 illustrates how to select the RSG database, how to select mailboxes in the database, and how ExMerge reports its progress.

Figure 11.38: Running ExMerge to recover mailboxes.

11.6.6 Cleaning up

After you have recovered data from the temporary database and merged it back into the production mailboxes, there is no reason to keep the RSG active. To clean up:

Dismount the temporary database in the RSG.
Delete the database from the RSG.
Delete the files from the RSG directory.
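
For the last step, it is worth listing what is in the RSG directories before deleting anything. The sketch below defaults to a dry run that only reports the files. The function names are hypothetical, and the directories you pass in are assumed to be the RSG locations from Table 11.5; run it only after you have dismounted the database and deleted it from the RSG.

```python
from pathlib import Path
from typing import Iterable, List


def rsg_files(rsg_dirs: Iterable[str]) -> List[Path]:
    """Return every file under the given RSG directories, sorted."""
    found: List[Path] = []
    for directory in rsg_dirs:
        root = Path(directory)
        if root.is_dir():
            found.extend(p for p in root.rglob("*") if p.is_file())
    return sorted(found)


def clean_rsg(rsg_dirs: Iterable[str], dry_run: bool = True) -> List[Path]:
    """List the RSG files, and delete them only when dry_run is False."""
    files = rsg_files(rsg_dirs)
    if not dry_run:
        for path in files:
            path.unlink()
    return files
```

Calling clean_rsg with the database and transaction log directories of the RSG returns the list of files for review; passing dry_run=False performs the deletion and leaves the empty directories in place.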

It is also best practice to take a full online backup immediately after you merge data from the last mailbox in the temporary database back into the production database.

It is possible that you want to keep the RSG online for some time for test or other purposes. In this case, you probably want backup utilities to ignore the RSG and not include it in regular backup operations. To force the Backup API to ignore the contents of the RSG, create a DWORD registry value in the following location:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\MSExchangeIS\ParametersSystem

The value is called Recovery SG Override and you set it to 1 (0x00000001) to force the Backup API to ignore the RSG.
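
If you prefer to prepare the change as a file rather than type it into the registry editor, the setting can be rendered in the standard .reg text format. The sketch below only builds the text. The function name is hypothetical, and saving the result and importing it on the Exchange server is left to you.

```python
KEY = (r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services"
       r"\MSExchangeIS\ParametersSystem")
VALUE_NAME = "Recovery SG Override"


def render_reg(enabled: bool = True) -> str:
    """Return .reg file text that sets the override to 1 (or 0 to disable)."""
    data = 1 if enabled else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY}]",
        f'"{VALUE_NAME}"=dword:{data:08x}',
        "",
    ]
    return "\r\n".join(lines)


print(render_reg())
```

Rendering the value with enabled=False produces the same file with a data value of 0, which gives you a matching file to undo the change when you remove the RSG.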
