As emphasized in Lesson 1, only one node can gain access to a particular disk at any given time. This prevents running a particular virtual Exchange 2000 server on more than one node concurrently. Furthermore, the Windows 2000 Cluster service does not support moving running applications between nodes; hence, during a failover, clients lose their connections and need to reconnect. Nevertheless, Microsoft Outlook 2000 and Exchange 2000 Server are designed to handle these shortcomings gracefully. For example, you don't need to restart Outlook 2000 to reconnect; simply switching to another folder in your mailbox (say, from Inbox to Contacts) does the job. In addition, Exchange 2000 Enterprise Server supports multiple storage groups in the Information Store, which form the basis of static load balancing in clustered Exchange 2000 systems.
This lesson focuses on the optimal configuration of clustered systems by means of load-balancing mechanisms. Load balancing allows you to run similar services on multiple nodes, thus making better use of the available hardware than suggested in Figure 7.3.
At the end of this lesson, you will be able to:
Estimated time to complete this lesson: 45 minutes
To best utilize the hardware resources available in a cluster, most organizations implement combined application servers that provide more than one kind of client/server service to their users (see Figure 7.5). If one node fails, its application instances (represented as virtual server resource groups) are moved to one of the remaining nodes. This may reduce the performance of that node somewhat, but the cluster quickly continues to provide the complete set of application services, which is probably more important than a temporary performance decrease.
Figure 7.5 An example of a clustered multiapplication server
It may be desirable to dedicate individual clusters to one application type. For instance, you might want to configure a two-node cluster for Microsoft SQL Server 2000 and a separate four-node cluster for Exchange 2000 Server only. In this case, you need to configure multiple virtual servers of the same type per cluster and distribute them across the nodes, thus providing static load balancing. This configuration is often referred to as an active/active configuration. Keep in mind that each virtual server requires access to its own disk resources, meaning one or more dedicated sets of physical disks.
NOTE
Exchange 2000 Enterprise Server supports active/active clustering.
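The static load balancing described above amounts to assigning each virtual server a preferred owner node so that the load is spread across the cluster. The following Python sketch illustrates the idea with a simple round-robin assignment; it is an illustrative model only, not a Cluster service API, and the server and node names are hypothetical.

```python
# Illustrative sketch: statically distributing virtual Exchange servers
# (resource groups) across cluster nodes in round-robin fashion.
# This models the planning step, not an actual Cluster service call.

def distribute(virtual_servers, nodes):
    """Assign each virtual server a preferred owner node, round-robin."""
    assignment = {}
    for i, vs in enumerate(virtual_servers):
        assignment[vs] = nodes[i % len(nodes)]
    return assignment

servers = ["EXVS1", "EXVS2", "EXVS3"]
nodes = ["NodeA", "NodeB", "NodeC", "NodeD"]
print(distribute(servers, nodes))
# {'EXVS1': 'NodeA', 'EXVS2': 'NodeB', 'EXVS3': 'NodeC'}
```

With three virtual servers on four nodes, one node remains without a virtual server, which corresponds to the hot spare configuration discussed later in this lesson.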
To maximize the use of all available servers in a cluster while maintaining a failover solution, you can configure multiple virtual servers and distribute them across your nodes. Virtual servers are resource groups, and resource groups contain resources, such as an IP address, a network name, and a disk. However, a resource can belong to only one resource group, and therefore only one virtual server can own it. In other words, if you want to configure four virtual servers in a four-node cluster, you will need four separate physical disks. Because the Cluster service requires access to the quorum disk, you can configure n - 1 virtual Exchange 2000 servers (see Figure 7.6), where n represents the number of physical disks on the shared storage.
Figure 7.6 A four-node cluster with three virtual Exchange 2000 servers
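The n - 1 rule can be expressed as a one-line calculation. This is a trivial sketch, but it makes the quorum reservation explicit; the function name is hypothetical.

```python
def max_virtual_servers(physical_disks):
    """Maximum virtual Exchange 2000 servers for a given number of
    shared physical disks: one disk is reserved for the quorum
    resource, and each virtual server needs its own dedicated disk."""
    return max(physical_disks - 1, 0)

print(max_virtual_servers(4))  # -> 3, as in Figure 7.6
```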
Tip
Theoretically, you can configure one virtual Exchange 2000 server per physical disk, including the disk containing the quorum resource. However, Microsoft does not recommend adding Exchange 2000 Server services to the virtual server representing the cluster (that is, owning the quorum disk). Defining dedicated virtual servers for Exchange 2000 simplifies service maintenance, such as taking a virtual Exchange 2000 system offline.
The configuration shown in Figure 7.6 corresponds to a fully loaded system with one hot spare node. When all nodes are online, Node D does not own a virtual Exchange 2000 server. This node is the hot spare, which assumes the role of an Exchange 2000 server if another node in the cluster fails or is unavailable for maintenance. Provided that you don't run other applications, such as SQL Server 2000, on the hot spare, a single node failure will not affect system performance.
The hot spare configuration has the disadvantage of an idle server system when every node is operational. If you don't want to invest in this extra server, configure a three-node cluster with four disks, as shown in Figure 7.7 (or add a fifth disk to your four-node cluster). The disadvantage of a full-load cluster is that users might experience performance losses if one node has to take over the load of a failed member. Hence, it is a good idea to operate all nodes in the cluster at less than their maximum capacity.
Figure 7.7 A full-load active cluster with three virtual Exchange 2000 servers
To avoid measurable performance losses, you would have to operate the nodes below the following theoretical limits:
However, in most cases it will be acceptable to operate the nodes of a full-load cluster at their maximum capabilities because node failures or failovers due to maintenance should seldom occur and temporary performance losses are usually not critical. Note that central processing unit (CPU) utilization is not a perfect measure of a node's capacity; a node will begin to experience performance degradation if it operates above 70 percent peak CPU utilization for an extended period of time.
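The headroom calculation can be illustrated with a deliberately simplified model: assume each node runs one virtual server of roughly equal load, and that a failed node's entire resource group moves to a single surviving node (resource groups are not split). Under those assumptions, the survivor's load roughly doubles after a failover, which bounds the acceptable steady-state load. This is a back-of-the-envelope sketch, not a sizing formula from the product documentation.

```python
def max_steady_state_load(threshold=0.70, groups_after_failover=2):
    """Simplified model: if a node must absorb a failed peer's entire
    virtual server, its post-failover load is roughly the sum of both
    loads. To stay under the degradation threshold (70% peak CPU),
    steady-state load per node must stay under threshold / groups."""
    return threshold / groups_after_failover

# To remain below 70% CPU after absorbing one equally loaded peer,
# each node should run at no more than 35% in normal operation.
print(max_steady_state_load())  # -> 0.35
```

In practice, real workloads are not perfectly symmetric, which is one reason the text notes that briefly exceeding these limits after a failover is usually tolerable.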
When designing your clustered Exchange 2000 environment, keep in mind that limitations apply to your virtual servers. Several Exchange 2000 Server components are not supported in a cluster, and others can only run in an active/passive configuration. The Message Transfer Agent (MTA), for instance, cannot run on more than one node in the cluster, implicitly enforcing an active/passive configuration. The same restriction applies to the Chat Service.
The Exchange 2000 components support the following cluster configurations:
Important
Failover and failback are cluster-specific procedures that move resource groups (with all their associated resources) between nodes. Failover is the transfer of resource groups from a failed or decommissioned node to an available node in the cluster. Failback describes the process of moving the resource groups back when the node that was offline is online again (see Figure 7.8).
Figure 7.8 Failover and failback of virtual Exchange 2000 servers
A failover can occur in two situations: Either you trigger it manually for maintenance reasons or the Cluster service initiates it automatically in case of a resource failure on the node owning the resource. If a resource fails, the Resource Manager first attempts a resource restart on the local node. If this does not correct the problem, the Resource Manager will take the resource group offline along with its dependent resources and inform the Failover Manager that the affected group should be moved to another node and restarted there.
The Failover Manager is now responsible for deciding where to move the resource group. It communicates with its counterparts on the remaining active nodes to arbitrate the ownership of the resource group. This arbitration relies on the node preference list that you can specify when creating resources in Cluster Administrator. The arbitration can also take into account other factors such as the capabilities of each node, the current load, and application information. After a new node is determined for the resource group, all nodes update their cluster databases to track which node owns the resource group. At this point, the new owner of the resource group turns control of the resources within the resource group over to its Resource Manager. If multiple resource groups are affected, for instance, because of a total node failure, the process is repeated for all of these groups.
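The core of the arbitration described above — walking the node preference list and picking the first eligible survivor — can be sketched as follows. This is a simplified model: the real Failover Manager may also weigh node capabilities, current load, and application information, and the function and node names here are hypothetical.

```python
def choose_failover_node(preference_list, online_nodes, failed_node):
    """Return the first node in the preference list that is online and
    is not the failed node; None if no eligible node remains.
    Simplified model of Failover Manager arbitration."""
    for node in preference_list:
        if node != failed_node and node in online_nodes:
            return node
    return None

# Resource group preferred owners, in order, as set in Cluster Administrator.
prefs = ["NodeA", "NodeD", "NodeB"]
online = {"NodeB", "NodeC", "NodeD"}

print(choose_failover_node(prefs, online, failed_node="NodeA"))
# -> 'NodeD': NodeA failed, so the next preferred online node wins
```

After the new owner is chosen, every node updates its cluster database to record the new ownership, which is why the whole cluster agrees on where each resource group lives.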
If you have specified a preferred owner for a resource group, and this node comes back online, the Failover Manager will fail back the resource group to the recovered or restarted node. The Cluster service protects against continuous resource failures, which can result from repeated failback to a node that has not recovered correctly, by limiting the number of failback attempts. Likewise, you can configure specific hours of the day during which the failback of a group is prohibited, for instance, during peak business hours.
By default, resource groups are set not to fail back automatically when the original node is recovered. Without manual configuration of a failback policy, groups continue to run on the alternate node after the failed node comes back online.
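The two failback safeguards just described — a permitted time window and a retry limit — combine into a simple policy check, sketched below. The function, parameter names, and default values are illustrative; the actual settings are configured per resource group in Cluster Administrator.

```python
def failback_allowed(hour, window=(20, 6), attempts=0, max_attempts=3):
    """Allow failback only within the permitted hours and while the
    retry limit has not been reached. window=(start, end) is in
    24-hour time and wraps past midnight when start > end, e.g.
    (20, 6) means 8 P.M. to 6 A.M., keeping peak business hours free."""
    start, end = window
    if start > end:                      # window wraps past midnight
        in_window = hour >= start or hour < end
    else:
        in_window = start <= hour < end
    return in_window and attempts < max_attempts

print(failback_allowed(22))               # True: off-peak hours
print(failback_allowed(10))               # False: peak business hours
print(failback_allowed(22, attempts=3))   # False: retry limit reached
```

Note that this models an opt-in policy: as the text states, by default no automatic failback occurs at all until you configure one.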