Local Continuous Replication | Mastering Microsoft Exchange Server 2007 SP1

Local continuous replication (LCR) is one of the most promising features of Exchange 2007. The key point of LCR is to allow you to have a nearly completely replicated copy of the current database on the local server. In a situation where the production database is no longer functioning, the administrator can switch from the production database to the LCR database.

LCR is enabled on a per-storage group basis and the storage group cannot have more than one mailbox or public folder database. When first enabled, LCR creates a seeded copy of the current database; at the point of creation, the LCR database that was seeded will be the same as the production database. As transactions are committed to the production database, the transaction logs are filled. When a transaction log fills up, LCR copies the transaction log file to the LCR transaction log location for that particular storage group. The log is then replayed into the LCR copy of the database; Microsoft calls the LCR database the "passive" copy of the database. At any given time, the LCR database should be within one transaction log of being completely synchronized. If the database is dismounted, the LCR database becomes fully synchronized.

Tip

Local continuous replication is one of the reasons that the Exchange database transaction log file size was reduced from 5MB to 1MB. This ensures that transactions are committed to the LCR database more quickly.

The advantage of this is that it reduces the amount of time necessary to restore a database from backup to practically no time at all. This will allow you to safely support larger and larger database sizes and still maintain good recoverability and recovery times. Still, supporting LCR is not a license to have 5TB mailbox databases, since you still have to worry about a situation in which you might have to completely rebuild the server or the entire disk subsystem. Databases still have to be backed up to alternate media even if you do have LCR copies. Further, the database size should not be so large that nightly online maintenance cannot be completed at least once every week.

Microsoft recommends mailbox databases of no more than 100GB without LCR and mailbox databases of up to 200GB with LCR.

Tip

Local continuous replication provides you with a locally backed-up copy of databases. If the entire server fails or must be rebuilt, you still have to restore data from alternate media. Keep this in mind when planning for database sizes.

One additional possible advantage to using LCR is that you can streamline your backup process. Streaming backups and volume shadow copy (VSS) backups of production databases can adversely affect performance during the backup windows. Backup windows have to be precisely calculated in order to ensure that online maintenance is completed at least once a week for each database.

An alternate backup approach for Exchange is to use LCR to keep a completely synchronized copy of the production database and then to use a VSS backup of the LCR database. As long as the LCR copy of the database is on different spindles from the production database, the VSS backup should not noticeably affect the I/O on the production database disk.

Requirements for Local Continuous Replication

There are some requirements that you need to plan for when you implement LCR. These include ensuring that you have adequate server capacity and that your storage groups are configured properly. Here are some tips when planning to implement LCR:

Implementing LCR will generally increase CPU and memory usage on a server by at least 30 to 40 percent. Do not implement LCR on a server that is already on the border of having performance problems. Move mailboxes or server roles to another server to lighten the load on a mailbox server on which you are planning to enable LCR.
For the best level of recoverability and performance, LCR databases and transaction logs should be on separate physical disks or separate logical units (LUNs if you are using a SAN) from the production databases and logs.
Sufficient disk capacity must exist for LCR databases and transaction logs. If you have 500GB of disk space allocated to the production databases, plan on the same amount for the LCR databases. The disks that host the LCR databases and transaction logs should be capable of the same I/O capacity as the disks that host the production databases.
On heavily loaded mailbox servers, you may run out of drive letter capacity when adding LCR databases and transaction logs. Volume mount points can be used in this instance.
Storage groups can have no more than one mailbox or public folder database each.
Only one public folder database in an entire Exchange organization can be replicated with LCR. The LCR solution for organizations with more than one public folder database is to use public folder replication.
It is more efficient to start using LCR immediately after you create a storage group and mailbox database. Enabling LCR for an existing storage group and database will take longer if the database file size is large.

If you can't meet the prerequisites for LCR, then you should look at improving the capacity of your server resources and configuration prior to starting. If a server is not configured with the proper capacity, you will find that you make performance problems worse.
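As a sketch of the tip above about starting LCR as soon as a storage group and database are created, the following EMS commands create both with their LCR paths already defined. To our understanding, the New-StorageGroup and New-MailboxDatabase cmdlets accept a -HasLocalCopy parameter for this purpose; verify it with Get-Help on your server before relying on it. The storage group name, database name, and every path shown here are hypothetical.

```powershell
# Hypothetical names and paths; adjust for your own server and disks.
$server = 'HNLEX03'
$sgName = 'Sales Mailboxes SG'

# Create the storage group with LCR log and system paths on a separate disk (E:).
New-StorageGroup -Server $server -Name $sgName `
    -LogFolderPath 'D:\SalesSG' -SystemFolderPath 'D:\SalesSG' `
    -HasLocalCopy:$true `
    -CopyLogFolderPath 'E:\SalesSG-LCR' -CopySystemFolderPath 'E:\SalesSG-LCR'

# Create the single database in that storage group, with a path for the LCR copy.
New-MailboxDatabase -Name 'Sales Mailboxes' -StorageGroup "$server\$sgName" `
    -HasLocalCopy:$true `
    -EdbFilePath 'D:\SalesSG\Sales Mailboxes.edb' `
    -CopyEdbFilePath 'E:\Sales-Mailboxes-LCR\Sales Mailboxes.edb'

# Mount the new database so that log generation and replication can begin.
Mount-Database "$server\$sgName\Sales Mailboxes"
```

Because the database is empty at this point, there is almost nothing to seed, which is why this route is quicker than enabling LCR on a large existing database.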

Configuring Local Continuous Replication

Configuring LCR is pretty simple to do and it can be done via the Exchange Management Console (EMC) or the Exchange Management Shell (EMS). We will take you through a configuration of LCR using the EMC and then cover the necessary EMS steps to accomplish the same tasks.

To enable a storage group to use LCR via the EMC, you can use a wizard. In the EMC, locate the storage group in the Server Configuration work center and under the Mailbox subcontainer. Highlight the server name in the Results pane and then locate the storage group in the Work pane. Select the Enable Local Continuous Replication task in the Actions pane; this runs the Enable Storage Group Local Continuous Replication Wizard. The Introduction page of this wizard is shown in Figure 6.15.

Figure 6.15: Starting the Enable Storage Group Local Continuous Replication Wizard

All you have to do on the Introduction page of the wizard is confirm the storage group name and confirm that there is only a single database in the storage group. The database list is labeled "Database Names," but the wizard will stop you later if the storage group has more than one database.

Just like creating a storage group, you must define the location of the transaction log files and the system files. On the Set Paths page (shown in Figure 6.16), you must specify the LCR paths for the transaction logs and system files. Ideally, these paths should be on a separate physical disk from the original transaction logs, the original database, and the LCR database.

Figure 6.16: Specifying LCR paths for transaction logs and system paths

The screen capture shown in Figure 6.17 shows the Engineering Mailboxes page of the Enable Storage Group Local Continuous Replication Wizard. This page is unique to the database contained in the storage group you have selected and thus will usually be different depending on the database that appears in the Database Names field.

Figure 6.17: Specifying a path for the LCR database

On the Engineering Mailboxes page, the only thing you can specify is the location of the LCR database. Ideally, like the LCR path for the system and transaction log files, this path should be on a separate physical disk from the original database and transaction log files as well as the LCR transaction log and system files.

The Enable page of the wizard simply shows the configuration summary of what tasks are about to be performed.

Once the Enable button is clicked on the Enable page, the Enable-DatabaseCopy and Enable-StorageGroupCopy cmdlets are used to enable LCR for this storage group and database. The following are the commands that are actually executed by the wizard:

 Enable-DatabaseCopy -Identity 'HNLEX03\Engineering Mailboxes SG\Engineering Mailboxes' -CopyEdbFilePath 'D:\Engineers-Mailboxes-LCR\Engineering Mailboxes.edb'
 Enable-StorageGroupCopy -Identity 'CN=Engineering Mailboxes SG,CN=InformationStore,CN=HNLEX03,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=Volcano Surfboards,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=volcanosurfboards,DC=com' -CopyLogFolderPath 'D:\EngineersSG-LCR' -CopySystemFolderPath 'D:\EngineersSG-LCR'

Notice that the fully qualified distinguished name was used in the Enable-StorageGroupCopy command to identify the storage group. To help you decipher what was done, here is a summary of the configuration parameters used to enable LCR for the storage group and database:

Storage group name	Engineering Mailboxes SG
Database name	Engineering Mailboxes
LCR transaction logs path	D:\EngineersSG-LCR
LCR system files path	D:\EngineersSG-LCR
LCR database path	D:\Engineers-Mailboxes-LCR
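If you run these commands yourself rather than through the wizard, the long distinguished name should not be required. The sketch below assumes that Enable-StorageGroupCopy accepts the shorter server\storage group form of the identity, which is the form that appears in the Identity property reported by the EMS elsewhere in this section. The values are the ones from the summary above.

```powershell
# Short identities in Server\StorageGroup and Server\StorageGroup\Database form.
$sg = 'HNLEX03\Engineering Mailboxes SG'
$db = "$sg\Engineering Mailboxes"

# Define where the LCR (passive) copy of the database will live.
Enable-DatabaseCopy -Identity $db `
    -CopyEdbFilePath 'D:\Engineers-Mailboxes-LCR\Engineering Mailboxes.edb'

# Define the LCR transaction log and system file paths and turn on replication.
Enable-StorageGroupCopy -Identity $sg `
    -CopyLogFolderPath 'D:\EngineersSG-LCR' `
    -CopySystemFolderPath 'D:\EngineersSG-LCR'
```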

Managing Local Continuous Replication

Once LCR is enabled for a storage group, there are a few management tasks that you may need to perform. Management tasks for LCR include checking the health of a storage group's replication, suspending replication, resuming replication, and resynchronizing (aka reseeding) the entire database.

Health Checks

Now that storage group replication is enabled, you can confirm that it is working in a number of different ways. The first is just to look at the listing of storage group and database names in the Work pane. Notice for the Engineering Mailboxes SG that the value in the Copy Status column is Healthy.

Here are the possible status values you may see in the Copy Status column (or in the SummaryCopyStatus property when you use the EMS cmdlet):

Healthy means that LCR is working normally and data is replicating and being committed to the LCR copy of the database.
Disabled means that LCR is not configured.
Suspended means that an operator has temporarily stopped replication.
Seeding means that the production database is being copied to the LCR location.
Failed means that something has failed during the replication process and there may be problems with the configuration, logs, or database. Consult the event logs.
Not Supported means that the current configuration does not support or allow LCR.

Using the EMS cmdlet Get-StorageGroupCopyStatus, you can retrieve more useful and detailed information about the LCR status of a particular storage group. Here is an example:

 Get-StorageGroupCopyStatus "Engineering Mailboxes SG" | Format-List
 Identity                      : HNLEX03\Engineering Mailboxes SG
 StorageGroupName              : Engineering Mailboxes SG
 SummaryCopyStatus             : Healthy
 CCRTargetNode                 :
 Failed                        : False
 FailedMessage                 :
 Seeding                       : False
 Suspend                       : False
 SuspendComment                :
 CopyQueueLength               : 2
 ReplayQueueLength             : 9
 LatestAvailableLogTime        : 12/10/2006 11:39:27 PM
 LastCopyNotificationedLogTime : 12/10/2006 11:39:27 PM
 LastCopiedLogTime             : 12/10/2006 11:39:26 PM
 LastInspectedLogTime          : 12/10/2006 11:38:45 PM
 LastReplayedLogTime           : 12/10/2006 11:33:18 PM
 LastLogGenerated              : 1768
 LastLogCopyNotified           : 1768
 LastLogCopied                 : 1767
 LastLogInspected              : 1766
 LastLogReplayed               : 1757
 LatestFullBackupTime          : 11/25/2006 12:05:57 PM
 LatestIncrementalBackupTime   : 12/4/2006 8:46:09 PM
 SnapshotBackup                : False
 IsValid                       : True
 ObjectState                   : Unchanged
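The SummaryCopyStatus, CopyQueueLength, and ReplayQueueLength properties in this output lend themselves to a simple scripted health check. The following is a minimal sketch, not a Microsoft-supplied tool: the function name Test-LcrCopyStatus and both thresholds are our own hypothetical choices, and you would tune the thresholds to your server's normal log generation rate.

```powershell
# Hypothetical helper: returns one message per problem found, or nothing if all is well.
function Test-LcrCopyStatus {
    param(
        $Status,                     # an object such as Get-StorageGroupCopyStatus returns
        [int]$MaxCopyQueue = 5,      # hypothetical threshold for logs waiting to be copied
        [int]$MaxReplayQueue = 20    # hypothetical threshold for logs waiting to be replayed
    )
    # Compare as text so the check works whether the property is a string or an enum.
    if ("$($Status.SummaryCopyStatus)" -ne 'Healthy') {
        "Copy status is $($Status.SummaryCopyStatus)"
    }
    if ($Status.CopyQueueLength -gt $MaxCopyQueue) {
        "Copy queue length $($Status.CopyQueueLength) exceeds $MaxCopyQueue"
    }
    if ($Status.ReplayQueueLength -gt $MaxReplayQueue) {
        "Replay queue length $($Status.ReplayQueueLength) exceeds $MaxReplayQueue"
    }
}

# Usage on an Exchange server:
# Get-StorageGroupCopyStatus "Engineering Mailboxes SG" |
#     ForEach-Object { Test-LcrCopyStatus $_ }
```

With the values shown in the sample output (Healthy, a copy queue of 2, and a replay queue of 9), this check would report nothing under the default thresholds.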

Suspending and Resuming Replication

There is really not much that you need to do to an LCR database once it is replicating. If you have to do maintenance on the disk on which LCR is running or if you want to stop all replication, you can highlight the storage group that is running LCR and click Suspend Local Continuous Replication in the Actions pane. When you choose to suspend LCR, you are prompted for a reason. Simply type a reason and click Yes.

To resume LCR on the storage group, select the storage group and then click Resume Local Continuous Replication in the Actions pane. You will be prompted to confirm that this is what you want to do and you will see the reason that LCR was suspended previously.

You can accomplish the same thing using the Suspend-StorageGroupCopy and Resume-StorageGroupCopy cmdlets:

 Suspend-StorageGroupCopy "Engineering Mailboxes SG" -SuspendComment "LCR disk maintenance on December 12" -Confirm:$False

Once LCR is suspended, you can confirm it also using the Get-StorageGroupCopyStatus cmdlet, as you can see in this example:

 Get-StorageGroupCopyStatus "Engineering Mailboxes SG" | FL Identity,StorageGroupName,SummaryCopyStatus,SuspendComment
 Identity          : HNLEX03\Engineering Mailboxes SG
 StorageGroupName  : Engineering Mailboxes SG
 SummaryCopyStatus : Suspended
 SuspendComment    : LCR disk maintenance on December 12

The Application event log will contain event ID 2083 from the MSExchangeRepl service indicating that replication for the storage group has been suspended:

 Event Type: Information
 Event Source: MSExchangeRepl
 Event Category: Action
 Event ID: 2083
 Date: 12/10/2006
 Time: 11:50:58 PM
 Computer: HNLEX03
 Description: Replication for storage group HNLEX03\Engineering Mailboxes SG has been suspended.

Tip

If you suspend LCR during busy times for your server, expect to have a lot of logs that need to be replayed when you resume LCR.

When you are ready to resume LCR for that storage group, you can use the Resume-StorageGroupCopy cmdlet:

 Resume-StorageGroupCopy "Engineering Mailboxes SG"

The Resume-StorageGroupCopy cmdlet starts the log files copying and replaying once again and it clears the SuspendComment property. When LCR is restarted, you will see the following event information in the Application event log:

 Event Type: Information
 Event Source: MSExchangeRepl
 Event Category: Action
 Event ID: 2084
 Date: 12/11/2006
 Time: 8:04:24 AM
 Computer: HNLEX03
 Description: Replication for storage group HNLEX03\Engineering Mailboxes SG has been resumed.

You will also see events in the event log indicating that the log files have started copying (event ID 2114) and that log files have started replaying (event ID 2115). These are normal and expected:

 Event Type: Information
 Event Source: MSExchangeRepl
 Event Category: Service
 Event ID: 2115
 Date: 12/11/2006
 Time: 8:05:08 AM
 Computer: HNLEX03
 Description: The replication instance for storage group Engineering Mailboxes SG has started replaying logfiles. Logfiles up to generation 1788 have been replayed.
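Because a backlog of logs may have built up while LCR was suspended, you may want to watch the queues drain after resuming. The following is one possible sketch that resumes replication and then polls Get-StorageGroupCopyStatus. The queue threshold, the polling interval, and the number of checks are hypothetical values, not recommendations.

```powershell
$sg = 'Engineering Mailboxes SG'
$maxQueue  = 5     # hypothetical: treat 5 or fewer waiting logs as "caught up"
$maxChecks = 60    # hypothetical: stop watching after 60 checks (about 30 minutes)

Resume-StorageGroupCopy $sg

for ($i = 1; $i -le $maxChecks; $i++) {
    $status = Get-StorageGroupCopyStatus $sg

    # Report progress on each pass.
    "{0}: {1}, copy queue {2}, replay queue {3}" -f (Get-Date),
        $status.SummaryCopyStatus, $status.CopyQueueLength, $status.ReplayQueueLength

    # Stop watching if replication has failed; the event log will have details.
    if ($status.Failed) {
        Write-Warning "Replication failed: $($status.FailedMessage)"
        break
    }

    # Stop once both queues are at or below the threshold.
    if ($status.CopyQueueLength -le $maxQueue -and $status.ReplayQueueLength -le $maxQueue) {
        break
    }

    Start-Sleep -Seconds 30
}
```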

Resynchronizing Local Continuous Replication

Under some circumstances, it may become necessary to resynchronize the database or to manually resume replication. This operation is also called reseeding. This may be necessary if you created an LCR database before the original database was created, if you have performed an offline defragmentation of the original database, if the LCR database gets deleted, or if the LCR database becomes corrupted.

The only way to resynchronize the database is to use the EMS cmdlet Update-StorageGroupCopy. Prior to running this cmdlet, you should suspend LCR for the storage group that you are working on and then delete the LCR files (database and transaction log files) unless you are planning to use the -DeleteExistingFiles parameter. Here is an example:

 Update-StorageGroupCopy "Engineering Mailboxes SG" -DeleteExistingFiles
 Confirm
 Continuous replication seeding found an obsolete checkpoint 'D:\EngineersSG-LCR\E02.chk' file for storage group copy 'Engineering Mailboxes SG'. The checkpoint file will be deleted, and then the database will be seeded if you confirm now.
 [Y] Yes  [A] Yes to All  [N] No  [L] No to All  [S] Suspend  [?] Help (default is "Y"):y
 Confirm
 Continuous replication seeding found an existing target database 'D:\Engineers-Mailboxes-LCR\Engineering Mailboxes.edb' for storage group copy 'Engineering Mailboxes SG'. This target database will be deleted, before seeding starts, if you confirm.
 [Y] Yes  [A] Yes to All  [N] No  [L] No to All  [S] Suspend  [?] Help (default is "Y"):y

This operation can take a fairly significant amount of time if the production database is large. Remember that it is making a copy of the production database. If you do this during a period when the load on the Exchange server is typical, you may affect your end users' response times when they are using the server. We recommend you perform this operation during off-hours or periods of low usage.
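Putting the reseeding steps together, a complete pass might look like the following sketch. It uses only the cmdlets and parameters discussed in this section. The suspend comment is arbitrary, and the final resume is conditional because we are assuming that seeding may restart replication on its own.

```powershell
$sg = 'Engineering Mailboxes SG'

# 1. Suspend replication before touching the LCR files.
Suspend-StorageGroupCopy $sg -SuspendComment 'Reseeding LCR copy' -Confirm:$False

# 2. Reseed. -DeleteExistingFiles removes the old LCR database and checkpoint,
#    prompting for confirmation as shown in the example above.
Update-StorageGroupCopy $sg -DeleteExistingFiles

# 3. Resume replication only if the copy is still reported as suspended.
$status = Get-StorageGroupCopyStatus $sg
if ("$($status.SummaryCopyStatus)" -eq 'Suspended') {
    Resume-StorageGroupCopy $sg
}

# 4. Confirm the result.
Get-StorageGroupCopyStatus $sg |
    Format-List Identity,SummaryCopyStatus,Seeding,CopyQueueLength,ReplayQueueLength
```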

Recovery Using Local Continuous Replication

The reason you put LCR into operation in the first place is to allow you to very quickly bring online a backup copy of the database. You would only need to do this if the production database has become corrupted. Database corruption is a tough topic to try to address in just a few lines, but we should state clearly here that as long as the production database is on a separate physical disk from the transaction logs, the LCR transaction logs, and the LCR copy of the database, the corruption should not extend to the LCR copy of the database.

How will you know that your production database is corrupted? We can think of a few situations:

Normal or full backups of the production database fail. Online backups of the database using Exchange-aware backup software will perform a page-by-page check of the database as it backs it up. If a page-level error is detected, the backup halts, the error is logged to the backup log, and the error is logged to the Application event log.
If corruption is detected during normal operations (for example, if the database engine reads a page of data that is corrupted), Exchange confirms that the page in the database is bad and logs an event to the Event Viewer.
The database will not mount or reports errors when you try to mount it.

Monitoring for potential errors in your production databases is something you should do regularly, or you should configure your monitoring system to monitor for specific errors in either the Application event log (such as the one shown in Figure 6.18) or the backup logs such as the one shown here:

 Backup started on 12/11/2006 at 8:47 PM.
 The 'Microsoft Information Store' returned 'Error returned from an ESE function call (d). ' from a call to 'HrESEBackupRead()' additional data '-'
 The 'Microsoft Information Store' returned 'Error returned from an ESE function call (d). ' from a call to 'HrESEBackupRead()' additional data '-'
 The operation was ended.
 Backup completed on 12/11/2006 at 8:47 PM.
 Directories: 0

Figure 6.18: Errors found when Exchange Server reads a corrupted page from the database

Note in the case of the error shown in Figure 6.18 that the database was mounted and functioning. The error did not interfere with the normal functioning of the database but was rather a single page in the database that could not be read properly. This error was probably due to the disk subsystem, device driver, or firmware. It is unlikely that the problem would extend to the LCR copy of the database.
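If you do not have a monitoring system, even a small script can surface errors like this one. The following is a rough sketch that filters Application event log entries for recent errors from the database engine. The function name is hypothetical, and the source names ESE and MSExchangeIS are our assumption about where such errors are logged; check the Source column of the events on your own servers and adjust the list.

```powershell
# Hypothetical helper: keep only recent Error entries from the listed event sources.
function Select-DatabaseErrorEvent {
    param(
        [object[]]$Entries,                              # entries such as Get-EventLog returns
        [string[]]$Sources = @('ESE', 'MSExchangeIS'),   # assumed source names; adjust as needed
        [int]$Hours = 24                                 # how far back to look
    )
    $cutoff = (Get-Date).AddHours(-$Hours)
    $Entries | Where-Object {
        "$($_.EntryType)" -eq 'Error' -and
        $Sources -contains $_.Source -and
        $_.TimeGenerated -ge $cutoff
    }
}

# Usage on the Exchange server:
# Select-DatabaseErrorEvent (Get-EventLog -LogName Application -Newest 2000) |
#     Format-List TimeGenerated,Source,EventID,Message
```

A script such as this complements, rather than replaces, a review of the backup logs, since a failed backup may be the first sign of a bad page.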

If you realize that your production database is corrupted, you can manually switch the LCR database into production. This is done using the Restore-StorageGroupCopy cmdlet. Before we do an example, let's look at the current location of the live database and logs as well as the locations of the LCR files. Here are two quick ways to retrieve this information using the EMS:

 [PS] C:\>Get-StorageGroup "Engineering Mailboxes SG" | FL name,*path*
 Name                 : Engineering Mailboxes SG
 LogFolderPath        : D:\EngineersSG
 SystemFolderPath     : D:\EngineersSG
 CopyLogFolderPath    : D:\EngineersSG-LCR
 CopySystemFolderPath : D:\EngineersSG-LCR
 [PS] C:\>Get-MailboxDatabase "Engineering Mailboxes" | FL name,*path*
 Name            : Engineering Mailboxes
 CopyEdbFilePath : D:\Engineers-Mailboxes-LCR\Engineering Mailboxes.edb
 EdbFilePath     : D:\EngineersSG\Engineering Mailboxes.edb

There are two steps to switching over to using an LCR database instead of the original production database. The production database must be dismounted and then the LCR database/log locations are swapped out. There are two approaches to "swapping out" the database. The first (and desired) approach is to copy the LCR database to the production database location. Here is an example:

 [PS] C:\>Dismount-Database "engineering mailboxes" -Confirm:$False
 [PS] C:\>Restore-StorageGroupCopy "Engineering Mailboxes SG"
       Base name: e02
       Log file: D:\EngineersSG\E0200000774.log
       Csv file: D:\EngineersSG-LCR\IgnoredLogs\q5cfbb2m.koe
       Base name: e02
       Log file: D:\EngineersSG-LCR\E0200000774.log
       Csv file: D:\EngineersSG-LCR\IgnoredLogs\5p52d1ni.kxz
 Integrity check passed for log file: D:\EngineersSG-LCR\inspector\E0200000775.log
 Integrity check passed for log file: D:\EngineersSG-LCR\inspector\E0200000776.log
 Integrity check passed for log file: D:\EngineersSG-LCR\inspector\E0200000777.log
 Integrity check passed for log file: D:\EngineersSG-LCR\inspector\E0200000778.log
 Integrity check passed for log file: D:\EngineersSG-LCR\inspector\E0200000779.log
 Integrity check passed for log file: D:\EngineersSG-LCR\inspector\E02.log
 WARNING: Restore-StorageGroupCopy on Engineering Mailboxes SG was successful. All logs were successfully copied.

Once this is executed, you must manually copy the LCR database into the production location. You can do this by just copying the files, or if the paths are the same but the drive letters are different, you could simply reassign the drive letters. For example, if the production database is on D:\EngineeringMailboxes and the LCR database is on E:\EngineeringMailboxes, you could reassign the letters so that the volume holding the LCR copy becomes the D: drive. The advantage to this approach is that the documented locations of all of the database and storage group files remain the same. The downside is that the only Exchange data that could be on the D: and E: drives in this example would be the one database that is being swapped out.

The other way to swap out the database files is simply to swap out the locations; this is done with the Restore-StorageGroupCopy cmdlet and the -ReplaceLocations option. Here is an example:

 [PS] C:\>Restore-StorageGroupCopy "Engineering Mailboxes SG" -ReplaceLocations
       Base name: e02
       Log file: D:\EngineersSG\E020000001F.log
       Csv file: D:\EngineersSG-LCR\IgnoredLogs\raobyk4o.lqt
       Base name: e02
       Log file: D:\EngineersSG-LCR\E020000001F.log
       Csv file: D:\EngineersSG-LCR\IgnoredLogs\hosrmoec.5v1
 Integrity check passed for log file: D:\EngineersSG-LCR\inspector\E02.log
 WARNING: The Restore-StorageGroupCopy operation for storage group copy Engineering Mailboxes SG was successful, and production paths were updated. All logs were successfully copied.

The database can now be remounted, but it is now in use in a different location. We can confirm this with the Get-StorageGroup and the Get-MailboxDatabase cmdlets. Notice also that LCR has been disabled for this storage group after the Restore-StorageGroupCopy cmdlet was run:

 [PS] C:\>Get-StorageGroup "Engineering Mailboxes SG" | FL Name,*path*,HasLocalCopy
 Name                 : Engineering Mailboxes SG
 LogFolderPath        : D:\EngineersSG-LCR
 SystemFolderPath     : D:\EngineersSG-LCR
 CopyLogFolderPath    :
 CopySystemFolderPath :
 HasLocalCopy         : False
 [PS] C:\>Get-MailboxDatabase "Engineering Mailboxes" | fl Name,HasLocalCopy,*path*
 Name            : Engineering Mailboxes
 HasLocalCopy    : False
 CopyEdbFilePath :
 EdbFilePath     : D:\Engineers-Mailboxes-LCR\Engineering Mailboxes.edb

The Microsoft online documentation makes a very good point that if you use the -ReplaceLocations parameter, you should make an effort to update your documentation to reflect the new database location or move the database back to the original location. Otherwise, your documentation will be out-of-date and other administrators may be confused as to why the production databases are in folders that have LCR in their name.
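For reference, the whole -ReplaceLocations switch-over can be sketched as one short sequence. The dismount, restore, and verification commands are the ones shown earlier in this section. The Mount-Database step is our addition to bring the database back online, and the sequence assumes that the restore completes without errors.

```powershell
$sg = 'Engineering Mailboxes SG'
$db = 'Engineering Mailboxes'

# 1. Take the damaged production database offline.
Dismount-Database $db -Confirm:$False

# 2. Make the LCR copy the production copy by swapping the configured paths.
Restore-StorageGroupCopy $sg -ReplaceLocations

# 3. Bring the database back online from its new location.
Mount-Database $db

# 4. Confirm the new paths, and note that HasLocalCopy is now False.
Get-StorageGroup $sg | Format-List Name,*path*,HasLocalCopy
Get-MailboxDatabase $db | Format-List Name,HasLocalCopy,*path*
```

Since LCR is disabled at the end of this sequence, the storage group has no passive copy until you enable LCR again with new copy paths.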