8.5 Successfully deploying Exchange clusters


After providing a bit of background on clustering and Exchange Server's historical support of clusters, I now turn my focus solely to Exchange 2000/2003 deployment. I will look at the various configuration scenarios and best practices in the hope of giving you a good starting point for your own Exchange cluster implementations.

8.5.1 Decision points for Exchange clusters

When deciding whether to invest in a clustered Exchange Server environment, several issues must be considered and weighed in order to provide an ROI justification. Each organization will need to identify and analyze these issues introspectively. This section provides some decision points and questions an organization should consider before choosing to deploy Microsoft Exchange 2000/2003 servers in a clustered environment. The intention is to give the reader valuable guidance that will aid in selecting clustered solutions based on qualitative thinking, rather than technical or marketing hype.

First, the leading causes of downtime must be considered. Most research shows that the leading causes of downtime can be ranked in the following order:

  1. Infrastructure

  2. Software failures

  3. Operational/administrative

  4. Hardware failures

Interestingly enough, most researchers agree that the least likely component to fail in your environment is hardware. This corroborates what hardware vendors hear from customers and see in service return rates. The leading causes of downtime are more often related to events and activities outside of the server. Issues such as poorly trained personnel, building outages (power or air conditioning), or flawed backup and restore procedures typically account for more downtime than hardware-related events, such as hard drive failures. In addition, software failures caused by the interaction of the operating system with third-party tools or drivers are often downtime culprits as well. Regardless of what research indicates, each organization should determine its own leading causes of downtime and relate them to the actual causes of downtime in its Exchange Server environment. With this information in hand, we can step through several key decision points that will assist you in determining whether deploying Exchange Server clusters is right for your organization.

What are our availability requirements for Exchange Server?

Many organizations can tolerate hours or even days of downtime for their messaging and collaboration environment. Others need 99.999% availability. Since MSCS can typically provide a maximum of 99.99% reliability (speaking from experience), organizations requiring a higher degree of availability may find that their requirements cannot be met within the limits of current technology. The bottom line is that 99.999% availability is not easily achieved for Exchange Server deployments today.

Can we resolve most of our downtime issues by investment in other areas?

Hardware vendors provide many hardware technologies, such as RAID disk arrays and redundant power supplies and fans that are either standard or optional across their server product line. However, many customers choose not to invest in or implement these features. The high-availability requirements of your organization may be attainable via investment in off-the-shelf technologies available from your server hardware vendor. Illustrating this point further, you may also choose to invest in personnel training or procedural redesign to alleviate problems in these areas that are causing downtime. Looking to make investments in other areas may give you the opportunity to focus on the real issues for your deployment rather than simply throwing technology (i.e., clustering) at your downtime issues.

Does clustering solve our leading causes of downtime?

After evaluating the leading causes of downtime in your Exchange Server environment, you must determine whether the implementation of clustering would address these issues. The design goals for clustering support in Microsoft Exchange Server include protection from hardware failure and increased availability. If your organization's leading downtime causes are in areas such as operational/administrative, software, or infrastructure failures, investment in clustering technology may not reduce downtime. On the other hand, if you, like most organizations, have a high scheduled-to-unscheduled downtime ratio (like Microsoft OTG's ratio of 6:1), clustering may be able to provide significant relief, because rolling upgrades and fail over allow much planned maintenance to occur with only brief interruption. Exchange Server clustering can only protect you from issues such as hardware failures or localized outages and software errors (issues that affect only the primary node in the cluster). As research indicates, hardware failures are often the least frequent cause of downtime. If you choose to implement clustering as a means of protecting against all software failures, for example, you are likely to be disappointed.

What is the effect of the increased levels of complexity that clustering introduces?

MSCS is an additional software component, and the extra hardware configuration it requires creates higher levels of complexity. Can your operational staff tolerate the increased complexity, training requirements, and frustration factor? To administer a cluster, operators must be familiar with clustering concepts and learn to use the MSCS Cluster Administrator utility. The shared disk or SAN system required for clustering demands specialized knowledge and training, without which there are many pitfalls lurking for the unwary. Performing disaster recovery for a cluster is much more complex than for a stand-alone server, and procedures for fail over and failback operations will also need to be developed. Clearly, Exchange Server is already a very complex environment. The question is whether your organization wants to increase that complexity by adding clustering to the equation.

Is the investment in clustering justified by the ROI?

For most organizations, this is the question that everything else comes down to. After weighing all the issues above, each organization has to answer it. Many will decide that their messaging and collaboration environment is mission-critical in nature and warrants the utmost investment in every capability available to increase availability. Others will decide that investments in other areas make more sense. It really comes down to whether the additional availability achieved through clustering Exchange Server can be justified in your organization.

These decision points are only a beginning. Each organization should evaluate the question of high availability for Exchange Server based on organizational requirements and SLAs for its user/customer base. The issue is not whether clustering is good or bad technology but whether clustering addresses the leading causes of downtime within a particular organization. Selecting clustering technology with the assumption that it will solve all your downtime issues (including those not related to hardware failures) will only lead to disappointment based on false expectations.

8.5.2 Starting off right with storage design for Exchange clusters

One of the most challenging, but most important, parts of deploying Exchange 2000/2003 clusters is storage planning. With Exchange Server 5.5 clusters, only one virtual server technically existed in the cluster, and storage allocation from the shared cluster storage was simplified. With Exchange 2000/2003, the support of multiple storage groups per node significantly complicates cluster deployment and management. Regardless of this challenge, storage design must be done right the first time for a successful implementation. The success and popularity of SAN technology as a shared storage mechanism for Windows clusters will flatten the learning curve for storage design and allocation in a clustered environment. When planning your deployment of Exchange 2000/2003 in a cluster, ensure that you are familiar with the setup and configuration of SAN technology in a cluster, as well as SAN features and options such as data replication and BCVs that potentially enable even more compelling mission-critical scenarios for Exchange deployments.

When configuring Exchange 2000/2003 in a clustered environment, you need to plan carefully the volumes you want to share between the member nodes in the cluster. In fact, "share" is not the most appropriate word ("apportion" is better), because Microsoft clustering for Windows Server works on a shared-nothing model. The first step is to take a backwards approach to the hardware design and setup for a cluster: start with the Exchange configuration and work backwards. For example, if you plan on deploying a four-node cluster running Exchange 2000/2003, decide the user-load-per-node requirements for the entire cluster first. As an example, suppose you want to support 7,500 users on a four-node Active/Passive (N+1) Exchange 2000/2003 cluster. Evenly dividing these users across the three active nodes would yield 2,500 users per active node. You could then design each cluster node to meet the performance and scalability requirements for 2,500 users. The next step would be to determine the fail over scenarios required by the user and cluster configuration. Exchange 2000/2003 limits the number of storage groups per node to four. This means that each cluster node can never have more than four Exchange storage groups running on it at any time. This limit is particularly important in a cluster fail over condition: if a failure has occurred and an Exchange virtual server has moved to another node, the total number of storage groups on that node is still limited to four. Hopefully, this limitation will be removed in a later release of Exchange Server (in the Kodiak time frame) when technologies like 64-bit memory addressing are available. In the meantime, clusters must be designed with this limitation in mind.
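To make the sizing arithmetic and the four-storage-group limit concrete, here is a minimal planning sketch in Python. The function name and the idea of checking the design programmatically are my own illustration, not anything shipped with Exchange; the numbers come from the 7,500-user and Active/Active examples discussed in this chapter.

    # Hypothetical planning sketch: validate an Exchange 2000/2003 cluster design
    # against the four-storage-groups-per-node limit, before and after a single
    # node failure. Numbers below are illustrative, not prescriptive.

    MAX_SGS_PER_NODE = 4

    def validate_design(total_users, active_nodes, passive_nodes, sgs_per_evs):
        users_per_node = total_users / active_nodes
        # Worst case after one active node fails: its EVS (and storage groups)
        # moves to a surviving node. In an N+1 design a passive node absorbs it;
        # in Active/Active the surviving active node carries both EVSs.
        if passive_nodes > 0:
            worst_case_sgs = sgs_per_evs        # clean passive node takes the EVS
        else:
            worst_case_sgs = 2 * sgs_per_evs    # surviving active node doubles up
        ok = sgs_per_evs <= MAX_SGS_PER_NODE and worst_case_sgs <= MAX_SGS_PER_NODE
        print(f"{users_per_node:.0f} users per active node; "
              f"{worst_case_sgs} storage groups on the busiest node after fail over: "
              f"{'OK' if ok else 'exceeds the four-SG limit'}")
        return ok

    # 7,500 users on a four-node N+1 cluster (3 active + 1 passive), one SG per EVS
    validate_design(7500, active_nodes=3, passive_nodes=1, sgs_per_evs=1)

    # 4,000 users on a two-node Active/Active cluster, two SGs per EVS
    validate_design(4000, active_nodes=2, passive_nodes=0, sgs_per_evs=2)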

Another important factor in Exchange cluster storage design is whether you choose to deploy an Active/Passive or Active/Active cluster design. Microsoft strongly recommends Active/Passive clustering due to this configuration's superior tolerance for the virtual memory fragmentation issues discussed earlier. Microsoft cites two key reasons for its preference for Active/Passive clustering.

  1. Scalability— Active/Passive Exchange clusters scale better since you size servers in the same manner and fashion as stand-alone nonclustered Exchange servers. In addition, Active/Passive clusters can support as many as eight cluster nodes in a plethora of configuration options. Active/Active clusters for Exchange 2000 are limited to two nodes and 1,900 concurrent MAPI user connections per node (not a hard limit, but a theoretical limit based on VM fragmentation). For Exchange 2003, the user connection limit for Active/Active clusters is expected to rise to only 2,500 based on the relatively minor improvements to the VM fragmentation issue.

  2. Reliability— Active/Passive clusters are more reliable since they are not hampered by VM fragmentation issues and virtual servers always fail over to a clean node in the cluster (assuming that multiple node failures have not occurred).

The choice of Active/Passive versus Active/Active Exchange clusters has definite impacts on storage design. If you choose Active/Passive, your storage design constraints (nodes, storage groups, connections) are somewhat less restrictive, whereas if you choose Active/Active you must bear the burden of VM fragmentation and the issues that multiple storage groups and databases add to this problem.

Once you have considered the per-node storage group limitations, you can determine how many users per storage group will be configured. Again, since one Exchange virtual server can contain multiple storage groups, take care to ensure that the four-SG-per-node rule is not exceeded during both normal and fail over conditions. In the 7,500-user cluster example, let's keep it simple and plan for one virtual server per node and one storage group per virtual server (a ratio of 1:1). This means that one Exchange virtual server and a single storage group would service all 2,500 users on each node.

Continuing to work backward, we can now begin to plan storage requirements and configuration for each cluster node. Using well-known best practices for maximizing disk I/O is the best approach here. Remember the rule of thumb from previous chapters: separate sequential from random I/O. An Exchange 2000/2003 database actually consists of two files: the property store (*.EDB) and the streaming store (*.STM). The property store is a B-tree-structured database file that is accessed in a random I/O fashion. The streaming store is structured in clusters of 64-KB runs and is typically accessed in a random manner. In addition, these files have different access characteristics depending on the type of clients that will be supported. For MAPI clients, the streaming store is not utilized. For Internet Protocol clients (such as IMAP, POP3, HTTP, and SMTP), the streaming store is the primary storage location, with certain properties being stored in the property store. Each Exchange storage group has one set of shared transaction logs and can be configured with multiple database files (an *.EDB and *.STM pairing). Using our rule of thumb, each storage group should have a dedicated disk volume (preferably RAID1 or 0+1) on which to store the transaction log files, since they are accessed in a strictly sequential manner. Depending on the clients supported, you may also choose to separate the streaming store and the property store onto separate physical volumes. However, based on the cost-effectiveness of such a configuration, most deployments will typically place both the property store and the streaming store on the same volume (configured as RAID5 or 0+1 for maximum performance and data protection). Table 8.8 identifies each Exchange database component and the best practices that should be followed for optimal design. Of course, storage design for any Exchange server involves more than just performance optimization. When planning storage for Exchange clusters, don't forget to factor in management and administrative overhead, disaster-recovery concerns, and cost issues.

Table 8.8: Optimal Placement of Exchange Database Files (Performance Viewpoint)

  • Transaction log files (*.LOG): Sequential I/O. Dedicate a RAID1 or 0+1 array to each storage group for transaction logs.

  • Property store (*.EDB): Random I/O (4 KB). Dedicate a RAID1, 0+1, or 5 array to each storage group for the property store. Can be combined with the streaming store if no or few Internet Protocol clients are supported. For heavy I/O environments, a separate array for each property store in a storage group (up to five databases can be configured) may be necessary (don't forget to weigh the cost, management, and disaster-recovery factors).

  • Streaming store (*.STM): Mostly large (32–350 KB) random I/O. Dedicate a RAID1, 0+1, or 5 array to each storage group for the streaming store. Can be combined with the property store if no or few Internet Protocol clients are supported. For Internet Protocol clients in heavy I/O environments, a separate array for each streaming store in a storage group may be necessary. However, this may double storage requirements in a cluster.
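As a companion to Table 8.8, the following small sketch (Python; the function, volume names, and storage group count are illustrative assumptions, while the RAID choices simply mirror the table) emits a per-storage-group volume plan that keeps sequential log I/O away from random database I/O.

    # Illustrative sketch: generate a per-storage-group volume plan that follows
    # the placement guidance in Table 8.8 (logs on a dedicated RAID1/0+1 array,
    # EDB and STM together on a RAID5 or 0+1 array unless separated for IP clients).

    def volume_plan(storage_groups, separate_stm=False):
        plan = []
        for sg in range(1, storage_groups + 1):
            plan.append((f"SG{sg} transaction logs (*.LOG)", "RAID1 or 0+1", "sequential I/O"))
            if separate_stm:
                plan.append((f"SG{sg} property store (*.EDB)", "RAID1, 0+1, or 5", "random 4-KB I/O"))
                plan.append((f"SG{sg} streaming store (*.STM)", "RAID1, 0+1, or 5", "large random I/O"))
            else:
                plan.append((f"SG{sg} databases (*.EDB + *.STM)", "RAID5 or 0+1", "random I/O"))
        return plan

    for volume, raid, pattern in volume_plan(storage_groups=2):
        print(f"{volume:<35} {raid:<18} {pattern}")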

Drive letter limitations for Exchange clusters

In large and complex Exchange cluster configurations, drive letters are at a premium because Windows servers are limited to 26 lettered drives. To alleviate this issue, Microsoft introduced mount points. Mount points have been around in the UNIX world for years and offer a method of adding additional "root" locations for file system access. In large clusters, using mount points is practically a design requirement: it prevents the Windows drive-letter limit from hampering large-scale cluster designs for Exchange. By utilizing mount points, multiple disk volumes for an Exchange virtual server or storage group can be integrated behind a single drive letter, greatly simplifying cluster management and fail over while leaving plenty of drive letters for potential future growth at the cluster-node, virtual-server, or storage-group level. Figure 8.10 provides an example Exchange 2003 cluster design utilizing mount points.

Figure 8.10: Exchange 2003 Cluster design utilizing mount points.

When you have determined the number of users per node, the number of virtual servers per node, and the number of storage groups per virtual server, you can begin to design your cluster shared-storage configuration using the storage design best practices and recommendations previously discussed. Figures 8.11 and 8.12 show two sample cluster designs based on our earlier examples: a 7,500-user, four-node (N+1 Active/Passive) cluster and a 4,000-user, two-node (Active/Active) cluster. These are only examples; you will need to determine the actual design for your deployment based on many factors.

Figure 8.11: 4-Node (N+1) Exchange 2003 Cluster configuration (7,500 users).

Figure 8.12: 2-Node (A/A) Exchange 2003 Cluster configuration (4,000 users).

8.5.3 Exchange cluster installation

Microsoft Windows Server 2003 does not provide a brand-new version of the clustering software, only minor enhancements to the Microsoft Cluster Server that shipped with Windows 2000. In order to set up a cluster, you need to run Windows Server 2003 on all machines that will be cluster member nodes. You need to install the first cluster member and make it part of a domain (as a member server). For a simple test configuration, you may want to promote the first node to a DC (using DCPROMO) and create your own forest. In general, it is not a good idea to use a member of a cluster as a Windows (AD) DC or GC server: when that server goes down, it can affect the Exchange services on the other members because the AD service is no longer available. The general best practices defined for clustered configurations still apply: keep it simple, and select one or two servers external to the cluster as AD domain controllers and GC servers. (Don't plan on trying to cluster your DCs or GCs either; that practice is not supported.)

Before you can configure the cluster service, you need to ensure that the shared disk drives are configured properly. The disks must have a signature and a partition (remember the lettering scheme) and must be of the type "basic." If your disks are not basic, Cluster Server will not use them. To change dynamic disks into basic disks, use the Windows storage management MMC snap-in and select the Revert to Basic Disk option. Note that, if a partition or volume is already defined on the drive, this option is grayed out. To enable the option, you must wipe out the existing volumes or partitions on the particular disk.

When you are ready, run the Cluster Administrator to enable the cluster service on each of the cluster members. You should make sure that you have an operational cluster before going any further. This means that all cluster nodes are configured and disk resources are allocated and configured according to the storage design criteria previously determined. In addition, the cluster quorum resource must be established (it is required to create the cluster) and easily managed. A best practice is to uniquely identify the quorum disk (using the "Q" drive, for example) and to identify one IP address and network name for the management of the cluster. For more information on Windows Cluster Service installation, setup, and configuration, please see www.microsoft.com/windowsserver2003/technologies/clustering/default.mspx. Test your cluster installation to make sure that basic fail over operations work properly and that cluster resources remain accessible no matter which node(s) are active. Setting up a simple file server on your cluster is a great way to test this. Don't proceed with your Exchange installation until you are confident that basic clustering is working right. Finally, get the Microsoft Cluster Diagnostics Tool (Clusdiag) from the Windows Server Resource Kit; it is a great tool for ensuring that your cluster is configured properly and for diagnosing cluster problems.

Installing Exchange 2000/2003 Server in a cluster

When installing Exchange 2000/2003 in a cluster, the most notable change from Exchange 5.5 is the placement of installed files. It is no longer necessary to define a cluster group or to place Exchange binary files on a shared cluster disk. The reason is quite simple: Now that Exchange can run in Active/Active mode on all members of a cluster, it is necessary for each server in the cluster to have a local copy of the binary files. During installation on a cluster node, Exchange 2000/2003 will discover the cluster configuration and proceed to install a cluster-aware version of the product locally on each cluster node.

Apart from the cluster installation notification dialog box, the installation proceeds as a normal installation and includes the schema update to the AD, if this has not been done already (of course, you should already have completed the forestprep and domainprep operations!). After that, you can install Exchange on the remaining nodes of the cluster. As much as possible, you should attempt to select the same options that you used for the first member installation. Again, you will be forced by the setup program to install on the local system drive. During the installation, the Exchange setup will recognize the cluster configuration and the fact that the organization already exists. Do not attempt to run installations on multiple cluster nodes at the same time; this is certain to cause you pain. After the installation of Exchange 2000/2003 on the cluster node is complete, the cluster node must be restarted before proceeding.

Creating Exchange 2000 virtual servers

Step 1: Creating the Exchange Resource Group

Before you can use the Exchange System Manager interface, you need to create a group that contains a minimum of the following:

  • One IP address.

  • One network name. This is the name under which the Exchange Server will appear in the Exchange 2000/2003 organization.

  • One or more disk resources that will be used to store transaction logs, databases, and temporary files.

  • The Exchange System Attendant (SA) resource. In fact, adding the System Attendant resource will result in the creation of all the other resources needed for an Exchange 2000/2003 virtual server. These resources will be created by the Exchange Cluster Admin component (EXCLUADM.DLL) using Cluster Administrator. The dependency chain among these resources is sketched below.
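The dependency chain can be pictured as a small graph. The sketch below (Python; the resource and disk names are hypothetical, and the function is only a planning aid) models it the way the text describes it: the network name depends on the IP address, the System Attendant depends on the network name plus every physical disk the virtual server will use, and the remaining Exchange resources created automatically depend on the System Attendant.

    # Hypothetical sketch of the resource dependency chain in an Exchange virtual
    # server group. Names are invented; dependencies mirror the text above.

    dependencies = {
        "IP Address (132.192.1.11)": [],
        "Network Name (RED-MSG-VS1)": ["IP Address (132.192.1.11)"],
        "Physical Disk N:": [],
        "Physical Disk O:": [],
        "Exchange System Attendant": ["Network Name (RED-MSG-VS1)",
                                      "Physical Disk N:", "Physical Disk O:"],
        "Exchange Information Store": ["Exchange System Attendant"],
    }

    def bring_online_order(deps):
        """Return an order in which resources can come on-line (dependencies first)."""
        ordered, seen = [], set()
        def visit(res):
            if res in seen:
                return
            for dep in deps[res]:
                visit(dep)
            seen.add(res)
            ordered.append(res)
        for res in deps:
            visit(res)
        return ordered

    print("\n".join(bring_online_order(dependencies)))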

Step 2: Adding the System Attendant resource

It is imperative to create the SA resource in order to get Exchange 2000/2003 running in a cluster. You will be prompted for the resource dependencies of the SA (the network name and any physical disk resources you want the Exchange virtual server to utilize). In addition, the path of the data directory will also be required. Initially, the default drive and directory can be selected, and changes can be made later to reflect the actual physical volumes on which transaction logs and database files will reside. Most important, all disk resources that the virtual server will utilize must be included as resource dependencies during creation of the SA resource; that means that you must already have physical disk resources for each volume so that you can specify them as dependencies of the SA. This is why configuring storage before the cluster is important! Next, if more than one exists, you will be prompted for the administrative group and the routing group to which the Exchange virtual server (named after the network name resource of the Exchange group) will belong. The SA cluster resource configuration during Exchange virtual server creation is shown in Figure 8.13.

Figure 8.13: Configuring Exchange system attendant resource dependencies.

The Cluster Administrator will then proceed to create the rest of the Exchange resources and put them into the Exchange resource group.

Figures 8.14 and 8.15 show the alternative views of an Exchange cluster available in Cluster Administrator and Exchange System Manager. Once resources are created, you can bring the entire resource group on-line and thereby make the Exchange virtual server available to clients. From this point on, you can use the Exchange System Manager MMC snap-in to manage the Exchange virtual servers in the cluster. Using the Exchange System Manager interface, you should be able to view the Exchange virtual server and note that the default databases have been created on the common volume. You can then modify the configuration for the location of the storage group transaction logs or database files as necessary. Remember, only the physical disk volumes that were included as resource dependencies for the virtual server (at creation time) will be available for use; the cluster's storage system may have other physical disks that Exchange cannot see or use. As with Exchange 5.5 clusters, you should not attempt to stop, pause, or start the services other than through the Cluster Administrator; do not use the Services Control Panel, the Exchange System Manager, or the command-line interface. Clients (whether MAPI or Internet Protocol based) must be configured to connect to the virtual server name, not the names or addresses of the individual nodes.

Figure 8.14: Exchange Cluster View from Cluster Administrator.

Figure 8.15: Exchange Cluster view from Exchange System Manager.

8.5.4 Migrating or upgrading to Exchange 2003 clusters

If you have already deployed Exchange 2000 clusters, Microsoft offers a relatively simple in-place upgrade path to Exchange 2003 (see the in-place/rolling upgrade strategy below). For Exchange 5.5 cluster users, Microsoft offers no supported method of going directly from Exchange 5.5 to Exchange 2003. If you are in this scenario, your best approach is to pursue the mailbox relocation strategy discussed next.

In-place/rolling upgrade strategy (Exchange 2000 to Exchange 2003)

One of the great things about having already deployed Exchange 2000 clusters is that you can utilize the rolling upgrade feature of MSCS. A rolling upgrade allows an administrator to continue operation of the Exchange services with only minimal interruption, while upgrading each node using a "swing" or "rolling" succession approach. Upgrading an Exchange 2000 cluster to an Exchange 2003 cluster requires that you upgrade each individual cluster node and virtual server to Exchange 2003 (one at a time), and you can do this while keeping virtual servers running (almost). To do this, all your Exchange 2000 nodes and virtual servers must already be upgraded to Exchange 2000 SP3. To accomplish the in-place/rolling upgrade, simply adapt this six-step procedure to your cluster's configuration:

  1. Ensure that the node you wish to upgrade has no Exchange virtual server (EVS) running on it by moving/failing the EVS over to another node in the cluster.

  2. Perform the Exchange 2003 upgrade procedure by running Exchange 2003 setup on the node you wish to upgrade. Note that you must restart this node after the upgrade is complete.

  3. Take the EVS from the node that you upgraded (now running on another cluster node) off-line using Cluster Administrator.

  4. Use Cluster Administrator to move the EVS (while still off-line) to the node you have just upgraded to Exchange 2003.

  5. On the upgraded node (to which you have just moved the EVS), open Cluster Administrator, right-click on the Exchange System Attendant resource, and select Upgrade Exchange Virtual Server from the menu.

  6. Next, using Cluster Administrator, select the EVS cluster group, right-click and select Bring Online.

Once you have completed this procedure for the first node and EVS in your cluster, the same procedure can be adapted and repeated for the remaining nodes and EVSs in the cluster. Please note that this process is not reversible, and it is highly advisable to have completed a backup of Exchange data as well as a system state backup for each node before performing this procedure.

Move mailbox strategy (Exchange 5.5 to Exchange 2000/2003)

Another strategy available to Exchange administrators who are upgrading to Exchange 2003 from Exchange 5.5 is the mailbox relocation approach. With this strategy, an Exchange 2000/2003 cluster is added to the same site as an existing Exchange 5.5 cluster. Since the site replication services (SRS), which are required for an Exchange 2000/2003 server to interact in an Exchange 5.5 site, are not supported on cluster nodes, you need to have another Exchange 2000/2003 server in the Exchange 5.5 site. After the node is up, user mailboxes and public folders are moved from the Exchange 5.5 cluster (or other Exchange 5.5 servers for that matter) to the Exchange 2000/2003 cluster. This can be accomplished directly via the Exchange Administrator program (Move Mailbox) or via tools such as ExMerge. This strategy is palatable since it allows a phased or gradual migration from the Exchange 5.5 cluster environment to the Exchange 2000/2003 cluster environment. In addition, since no upgrade is actually being performed, the procedure is less risky and requires a less complex back-out plan. The disadvantage to this approach is that it requires the additional investment of hardware, software, and support resources to have two parallel systems in operation. Once all user and public data is transferred from the Exchange 5.5 cluster to the Exchange 2000 cluster, the Exchange 5.5 cluster or other servers can be decommissioned and Exchange services removed (assuming no other caveats exist, such as the Exchange 5.5 cluster being the first in the site, and so forth). Since there is still a high installed base of Exchange 5.5, most organizations will find this strategy attractive, despite the additional cost, because of the easier migration and lower risk.

8.5.5 Exchange cluster management best practices and lessons learned

Honestly, there just are not a lot of cluster installations out there in the Exchange world. With the improvements in Exchange 2000 and further improvements in Exchange 2003, clustered Exchange servers will become more common. What I provide in the following is experience gained from dealing with organizations that have made the choice to cluster their Exchange servers.

Cluster management tools and monitoring

The important questions asked when deploying Exchange 2000/2003 in a clustered configuration center on whether the administrator experience will be the same as with stand-alone Exchange 2000/2003. Administrators are justifiably concerned about the administrative and operational aspects of Exchange in a cluster. Realizing this concern, Microsoft designed Exchange 2000/2003 clustering to have minimal administration and management differences compared with a nonclustered configuration; the goal was to make management of Exchange in a cluster no different from a stand-alone Exchange server. With Exchange 2000/2003's dependence on the Windows AD, this is easier to accomplish, since so much Exchange configuration information is maintained in the AD, and the necessary administrative differences become fairly intuitive to system managers. Tasks such as adding and deleting users, managing storage, and other administrative work are no different in a clustered environment. Disaster recovery for clusters, while leveraging the same mechanisms as nonclustered servers, requires some additional measures discussed in a later section.

The major differences in managing Exchange 2000/2003 in a cluster have to do with the cluster resource management of the Exchange services running in the cluster. Cluster administration of resources does entail a learning curve for successfully managing any application in a cluster. Prior to deploying Exchange clusters, you should ensure that your operations and system-management staff understand the idiosyncrasies of Windows clusters and services. Also, clustering tends to get more complicated in direct proportion to the number of cluster nodes. When more than two cluster nodes are configured, the complexities of managing Exchange clusters multiply; managing multinode clusters, allocating their shared storage, and planning fail over scenarios will be the most challenging tasks. You can cope with this challenge and complexity through a thorough understanding of Windows clustering and Exchange 2000/2003 cluster implementation, and by testing and piloting an Exchange cluster before putting it into production.

Management of the Exchange 2000/2003 virtual servers is achieved via the Exchange System Manager MMC snap-in. When you open the snap-in, virtual servers appear the same as stand-alone servers. You can then create additional storage groups or databases by directly managing the virtual servers via the Exchange System Manager (ESM). Each virtual server can run on any member of the cluster on which it has been authorized. Don't forget the per-node storage group limitation (four per node). By default, a single storage group, called "First Storage Group," is created for each virtual server, along with one mailbox store per virtual server. One public store per cluster (created in the first Exchange virtual server configured in the cluster) is also created by default. Each storage group's databases, temporary files, and transaction logs are located on a disk resource specified at virtual server creation. As mentioned previously, the locations of all database files can be changed via ESM.

Monitoring Exchange 2000/2003 clusters has become a bit easier than it was for earlier versions of Exchange. This is particularly true if you utilize Microsoft Operations Manager (MOM). Shortly after the release of Exchange 2000, Microsoft made the MOM management pack for Exchange 2000 available. This management pack includes important rule-based and service-based monitoring instrumentation for both clustered and stand-alone Exchange servers. For Exchange Server 2003, the MOM management pack is available with your Exchange license and is included on the Exchange CD. Included in this management pack are the rules and instrumentation necessary to successfully monitor Exchange clusters and ensure that key events and conditions (such as those in Table 8.5) are proactively given the proper visibility. We will discuss proactive monitoring and management for Exchange further in Chapter 10.

Exchange cluster design

As I discussed earlier, you must make a fundamental decision at the beginning of your cluster design process: whether you will use an Active/Active configuration (and live with the caveats and limitations) or an Active/Passive configuration (which Microsoft favors and highly recommends). For some, the choice may be simple (go with what Microsoft will stand behind). For others, business requirements and justifications may warrant the Active/Active scenario. Table 8.9 presents a summary of the supported Exchange 2003 cluster design scenarios.

Table 8.9: Exchange 2003 Cluster Design Scenarios

Active/Active (two-node Active/Active only)

  • Two-node cluster maximum

  • One virtual server per node

  • Two storage groups per node

  • 2,000 connection limit/40% CPU (variable*)

Active/Passive, N+1 (examples: 2 Active/1 Passive, 3 Active/1 Passive, 7 Active/1 Passive)

  • Passive node reserved for virtual server fail over in the event of a primary node failure

  • Eight-node cluster limit

  • Sizing and configuration guidelines similar to a stand-alone server

Active/Passive, N+2 (examples: 2 Active/2 Passive, 4 Active/2 Passive, 6 Active/2 Passive)

  • Two passive nodes reserved for virtual server fail over in the event of a primary node failure

  • Multiple passive nodes also enable other scenarios

  • Eight-node cluster limit

  • Sizing and configuration guidelines similar to a stand-alone server

Active/Passive, N+N+N (examples: 2 Active/1 Passive/1 Auxiliary**, 4 Active/2 Passive/2 Auxiliary, 6 Active/1 Passive/1 Auxiliary)

  • Configuration provides both passive nodes for fail over and auxiliary nodes for uses such as maintenance, disaster-recovery operations, and excess capacity

  • Eight-node cluster limit

  • Sizing and configuration guidelines similar to a stand-alone server; auxiliary nodes may be scaled-back configurations for cost savings

*NOTE: 2,000-connection limit is theoretical based on virtual memory allocation and availability, as well as CPU utilization on each active node.

**Auxiliary nodes are used for backups and other operations and maintenance purposes.

The cluster design scenarios for Exchange Server 2003 offer a great deal of flexibility for Active/Passive configurations. As becomes obvious, two-node Active/Active configurations have severe limitations that may ultimately stifle their adoption for most customers.

A note on geo-clustering for Exchange 2003

Microsoft has announced no specific support for Exchange 2003 using geographically dispersed cluster nodes. However, Windows Server 2003 does support this functionality via features such as the majority node set clustering configuration (which allows for distributed local quorum drives across multiple nodes). While the Windows development team has built some specific capabilities into Windows Server 2003 to support geo-clustering, the Exchange development team has not followed suit and provides no support in Exchange 2003 for this functionality. In fact, Exchange Server is unaware of this cluster configuration option and will function the same way in ordinary or "stretched" clusters. From a best-practices point of view, I cannot recommend this configuration for Exchange 2003, since little testing and deployment data is available (and Microsoft does not support it for Exchange). However, I submit the following guidelines, based on MSCS limitations and the Exchange information store's transacted storage nature, if you choose to explore this configuration for your Exchange clusters (a simple check against these guidelines is sketched after the list):

  1. Cluster nodes must be on the same subnet (this is an MSCS requirement).

  2. There is a 50-km distance limitation.

  3. Data replication is synchronous between sites.

  4. Replication latencies are less than 500 ms.
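Purely as an illustration of the guidelines above, the following sketch (Python; the function and parameter names are invented, while the thresholds come straight from the list) shows how a proposed stretched-cluster configuration might be sanity-checked before any serious design work:

    # Illustrative check of a proposed geographically dispersed cluster against
    # the guidelines listed above. Thresholds mirror the text; names are invented.

    def check_geo_cluster(same_subnet, distance_km, replication_mode, latency_ms):
        problems = []
        if not same_subnet:
            problems.append("cluster nodes are not on the same subnet (MSCS requirement)")
        if distance_km > 50:
            problems.append(f"site separation of {distance_km} km exceeds the 50-km guideline")
        if replication_mode.lower() != "synchronous":
            problems.append("data replication between sites is not synchronous")
        if latency_ms >= 500:
            problems.append(f"replication latency of {latency_ms} ms is not under 500 ms")
        return problems or ["configuration is within the stated guidelines"]

    for issue in check_geo_cluster(same_subnet=True, distance_km=35,
                                   replication_mode="synchronous", latency_ms=8):
        print(issue)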

We discussed storage-data replication in Chapter 7; however, I do want to mention these limitations here in the context of Exchange clustering. Keep in mind that these configurations are highly dependent on hardware vendor technologies and configurations. Please ensure that you work closely with your storage vendor if you are considering storage replication or geographic clustering for Exchange.

Use standardized and simplified IP addressing and naming

In a clustered scenario, the cluster nodes, as well as the services they host, will require IP addresses and unique names. MSCS requires that the IP addresses allocated for the cluster, nodes, and services be static (they cannot be assigned via DHCP). All nodes and services must have IP addresses preallocated before setup and installation are performed. These addresses should be structured in a manner that allows for simplified configuration and management of nodes and services. Likewise, naming for Exchange virtual servers and nodes should allow for simplified configuration and management. Table 8.10 illustrates a good strategy for IP addressing and naming for an example two-node Exchange 2000/2003 cluster.

Table 8.10: Cluster IP Addressing and Naming

  • MAGPIES (cluster management name): 132.192.1.100

  • HECKLE (node), hosting virtual server RED-MSG-VS1: HECKLE = 132.192.1.10; RED-MSG-VS1 = 132.192.1.11

  • JECKLE (node), hosting virtual server RED-MSG-VS2: JECKLE = 132.192.1.20; RED-MSG-VS2 = 132.192.1.21

Notice that the IP addresses for the node names and virtual server names are closely correlated, following an easily memorable scheme for administrators and operators.
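One simple way to enforce such a scheme is to derive the names and addresses programmatically. The sketch below (Python; the subnet, the host-number convention, and the names are only examples modeled on Table 8.10) assigns each node an address ending in a round number and gives its virtual server the next address up.

    # Example sketch of a standardized addressing/naming scheme modeled on Table 8.10.
    # The subnet, names, and numbering convention are illustrative assumptions.

    SUBNET = "132.192.1"

    def cluster_addressing(cluster_name, nodes, evs_prefix="RED-MSG-VS"):
        plan = {cluster_name: f"{SUBNET}.100"}          # cluster management name
        for i, node in enumerate(nodes, start=1):
            plan[node] = f"{SUBNET}.{i * 10}"           # node address ends in a round number
            plan[f"{evs_prefix}{i}"] = f"{SUBNET}.{i * 10 + 1}"  # its EVS is one address up
        return plan

    for name, address in cluster_addressing("MAGPIES", ["HECKLE", "JECKLE"]).items():
        print(f"{name:<15} {address}")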

Resource ownership and fail over

Each cluster resource will automatically be configured with all cluster nodes that have Exchange 2000/2003 installed as possible owners. However, if resources are created before all nodes have joined the cluster (or before Exchange 2000/2003 has been installed on them), those nodes will not be listed as possible owners, and unless they are added manually, resources will be prevented from failing over to them. Take care when configuring hardware (disk and network), addressing (IP and network name), and Exchange (SA) resources to ensure that all nodes are included as possible owners. In addition, when configuring fail over and failback scenarios, list nodes in the desired order in the Preferred Owners list (for each virtual server resource group) in the Cluster Administrator. Although this situation can be remedied with manual administrator configuration, proper planning will ensure proper fail over and failback operations by default once your cluster is configured and operational.

Removing Exchange virtual servers and binaries from a cluster

When removing Exchange from a cluster, take care not to interfere with the operations of other nodes. When removing an Exchange virtual server, first take the cluster group off-line. Next, the Exchange resources must be deleted; removing the System Attendant resource will remove all the other resources (based on dependency). Once the virtual server is removed from the cluster, the server can be deleted in Exchange System Manager. Finally, to remove the Exchange binary files from the cluster node, you must run the Exchange setup program and select the Remove option. When prompted, do not remove the Exchange cluster resources unless this is the last cluster node running Exchange services. Other important points include the following:

  • Do not delete the EVS with the MTA instance unless it is the last EVS remaining in the cluster. One virtual server in the cluster is created by default with an MTA instance for support of legacy options and connectors.

  • If the Microsoft Search instance for a virtual server is deleted, the entire virtual server must be recreated in order for content indexing to function for the virtual server.

Designing storage before configuring the cluster

With Exchange 2000/2003's support for Active/Active clustering and multiple storage groups and databases, storage design can be quite complex. As discussed earlier in this chapter, consider all aspects before you configure your cluster. Since Exchange 2000/2003 storage groups are a subset of an EVS in a cluster, each storage group must fail over with the virtual server. This has implications for storage design. If you choose, based on performance considerations, to allocate separate physical storage arrays for transaction log files and database files, each storage group will have a minimum of two arrays (one for logs and one for databases) that must provide the independence and granularity necessary to facilitate fail over. For example, if you have two storage groups and they share a common log file array (i.e., RAID1), they must necessarily be part of the same virtual server, since the array will be the unit of fail over as a cluster resource. The implications of granularity, fail over, and virtual server-to-storage group mapping must be carefully planned. Another consideration for Exchange 2000/2003 is the four-storage-group-per-node limitation: whether before or after fail over, each cluster node will only support a maximum of four storage groups. Failure to follow this rule will result in issues and potential failures for EVSs.

The creation of disks and the allocation of databases to individual array sets is also a complex process. MSCS supports only basic disks as cluster resources. Take care to ensure that shared storage disk resources, once configured, are initialized as the "basic" disk type before the first cluster node is configured. After creating the cluster, all shared disk resources must be managed by the cluster in order for Exchange to use them. Along with the disk types ("basic" versus "dynamic"), the use and limitations of drive letters in clusters are also an important consideration. As discussed earlier in this chapter, if you are considering large cluster configurations (four or more nodes and more than four storage groups in the cluster, for example), you should definitely utilize mount points. The use of mount points can aid in cluster management and administration, as well as provide for future growth, by limiting the number of drive letters that are used. A best practice is to assign a drive letter to each storage group and to mount all other volumes associated with that storage group as mount points beneath that drive letter, as in the example shown in Table 8.11. In this example, we are utilizing only five drive letters for a configuration that would require 12 drive letters without the use of mount points.

Table 8.11: Utilizing Mount Points for Exchange Clusters

  • Drive letter N$: Data volume for SG 1
      Mount point of N$: Transaction log volume for SG 1

  • Drive letter O$: Data volume for SG 2
      Mount point of O$: Transaction log volume for SG 2

  • Drive letter P$: Data volume for SG 3
      Mount point of P$: Transaction log volume for SG 3

  • Drive letter Q$: Data volume for SG 4
      Mount point of Q$: Transaction log volume for SG 4

  • Drive letter R$: Backup/maintenance volume for SG 1
      Mount point of R$: Backup/maintenance volume for SG 2
      Mount point of R$: Backup/maintenance volume for SG 3
      Mount point of R$: Backup/maintenance volume for SG 4
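To show how the layout in Table 8.11 conserves drive letters, here is a brief sketch (Python; the volume descriptions and per-storage-group layout are simply restated from the table, and the counting function is my own illustration) that compares the lettered volumes needed with and without mount points.

    # Illustrative sketch based on Table 8.11: each storage group gets one lettered
    # data volume with its transaction log volume mounted beneath it, and a single
    # lettered backup/maintenance volume (for SG 1) hosts mount points for the
    # remaining storage groups' backup volumes.

    STORAGE_GROUPS = 4

    def drive_letters_without_mount_points():
        # data + transaction log + backup/maintenance volume per storage group
        return STORAGE_GROUPS * 3

    def drive_letters_with_mount_points():
        layout = {}
        for sg, letter in zip(range(1, STORAGE_GROUPS + 1), "NOPQ"):
            layout[letter] = [f"SG {sg} data volume (root)",
                              f"SG {sg} transaction log volume (mount point)"]
        layout["R"] = ["SG 1 backup/maintenance volume (root)"] + [
            f"SG {sg} backup/maintenance volume (mount point)"
            for sg in range(2, STORAGE_GROUPS + 1)]
        return layout

    print("Without mount points:", drive_letters_without_mount_points(), "drive letters")
    layout = drive_letters_with_mount_points()
    print("With mount points:", len(layout), "drive letters")
    for letter, volumes in layout.items():
        print(f"  {letter}: " + "; ".join(volumes))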

A final word of advice about storage design: when creating EVSs, only one data location can be specified. In order to allocate databases and log files to specific physical array sets, you must use Exchange System Manager after the virtual server has been created. In the Exchange System Manager MMC snap-in, database and log file locations can be configured on each node. Ensure that you have included all necessary disk resources in the EVS group so that they can be utilized by Exchange. It is possible for cluster disk resources to be added to the resource group and configured for EVSs at a later time; however, at a minimum, one disk resource must be available.

8.5.6 Exchange cluster disaster recovery

A solid and consistent tape-backup strategy should be an integral part of your high-availability strategy for deploying Exchange Server. In a cluster environment, there are additional cluster-specific configuration issues that should be addressed to help you choose the method that best suits your high-availability requirements. For example, backup of an EVS on a cluster cannot be resumed midstream after a fail over. Most of these issues result from non-cluster-aware tape backup software and also relate to performing automated, scheduled backups. Since the recommended method for Exchange Server backup and restore is on-line (the backup software communicates with the information store via an API), disaster-recovery scenarios for Exchange Server clusters can be more complex than for stand-alone Exchange servers. In addition, whether the server is local or remote to the backup device, backup software can add other complexities to clustered Exchange Server environments.

One scenario is to provide the capability for both cluster nodes to perform local backup and restore operations. In this case, a backup device and software tool are installed on each cluster node. The benefit of this strategy is that backup performance is increased because the backup device is in the same server as the disks. In this scenario, whichever server is currently running the Exchange services can be backed up with a local agent or backup software tool such as NT Backup. However, since the backup software may not be cluster aware, errors will occur if the backup software attempts to back up an information store that is not on the local server (depending on which cluster node Exchange Server is currently running on). Conversely, the backup software may not be configured to back up an information store that has been failed over from the other cluster node, resulting in a missed backup opportunity.

The other scenario is a LAN-based backup strategy. Here, the information store from the Exchange server in the cluster is backed up over the network via a backup agent (provided by the backup software vendor). The obvious drawback to this method is that performance may suffer due to the slower throughput capabilities of the network. The backup server could be one of the servers in the cluster, or it could be a separate server that accesses the cluster over the client LAN or a dedicated disaster-recovery LAN. If the backup server is not a member of the cluster, a failure of this server will cause all data backup from the cluster to fail. In addition, since the remote backup software does not know which node the Exchange services are running on, the backup could fail because the target information store was not available. If the server in the cluster being backed up fails, the tape backup software temporarily halts. Because the autoreconnect behavior of the backup agent varies by vendor, the backup software may or may not be able to reconnect and continue with the backup of the Exchange Server information store once the cluster fail over is complete. If the tape backup software is running on a cluster member and that server fails, the backup software agent may be configurable as a cluster group to fail over to the other server. This presents problems that cannot be overcome by non-cluster-aware tape software: since the backup software is unaware of the cluster, the only behavior that can be configured after a failure is for the backup software to switch to the partner server and restart the backup from the beginning. One of the drawbacks of non-cluster-aware tape software is that, if it is halted during a backup, it does not typically keep a log that can be used to resume the backup from the middle; it can only start again from the very beginning. In order to provide the best disaster-recovery scenario for Exchange Server on MSCS, consult with your backup software vendor regarding support for MSCS and Exchange 2000/2003.

Exchange 2000/2003 offers greatly improved clustering capabilities over previous versions. Clustering is now a viable option for significantly increasing availability and for server consolidation. While the new capabilities are worth investigating, they also create additional complexity. Take care at every deployment phase, particularly in planning and design. By starting with a solid understanding of how Microsoft Cluster Server is implemented and how Exchange 2000/2003 leverages it, you have a foundation on which to build a successful deployment. With an acceptance and understanding of the limits of the available technology, you can successfully deploy Exchange Server on MSCS, significantly increase system availability, and protect against hardware failures. Clustering Exchange Server can help your organization meet its high-availability requirements by protecting servers from critical failures that could not be tolerated in a nonclustered scenario.



