As emphasized in Lesson 1, only one node can gain access to a particular disk at any given time. This prevents running a particular virtual Exchange 2000 server on more than one node concurrently. Furthermore, the Windows 2000 Cluster service does not support moving running applications between nodes; hence, during a failover, clients lose their connections and need to reconnect. Nevertheless, Microsoft Outlook 2000 and Exchange 2000 Server are designed to handle these shortcomings gracefully. For example, you don't need to restart Outlook 2000 to reconnect; simply switching to another folder in your mailbox (say, from Inbox to Contacts) does the job. In addition, Exchange 2000 Enterprise Server supports multiple storage groups in the Information Store, which form the basis of static load balancing in clustered Exchange 2000 systems.
This lesson focuses on the optimal configuration of clustered systems by means of load-balancing mechanisms. Load balancing allows you to run similar services on multiple nodes, thus making better use of the available hardware than suggested in Figure 7.3.
At the end of this lesson, you will be able to:
Estimated time to complete this lesson: 45 minutes
To best utilize the hardware resources available in a cluster, most organizations implement combined application servers that provide more than one kind of client/server service to their users (see Figure 7.5). If one node fails, its application instances (represented as virtual server resource groups) are moved to one of the remaining nodes. This may reduce the performance of that node somewhat, but the cluster quickly continues to provide the complete set of application services, which is probably more important than a temporary performance decrease.
Figure 7.5 An example of a clustered multiapplication server
It may be desirable to dedicate individual clusters to one application type. For instance, you might want to configure a two-node cluster for Microsoft SQL Server 2000 and a separate four-node cluster for Exchange 2000 Server only. In this case, you need to configure multiple virtual servers of the same type per cluster and distribute them across the nodes, thus providing static load balancing. This configuration is often referred to as an active/active configuration. Keep in mind that each virtual server requires access to its own disk resources, meaning one or more dedicated sets of physical disks.
NOTE
Exchange 2000 Enterprise Server supports active/active clustering.
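The static load balancing described above amounts to assigning each virtual server a preferred owner node so that the load is spread across the cluster. The following Python sketch illustrates the idea with a simple round-robin assignment; it is an illustrative model only, not a Cluster service API, and the server and node names are hypothetical.

```python
# Illustrative sketch: statically distributing virtual Exchange servers
# (resource groups) across cluster nodes in round-robin fashion.
# This models the planning step, not an actual Cluster service call.

def distribute(virtual_servers, nodes):
    """Assign each virtual server a preferred owner node, round-robin."""
    assignment = {}
    for i, vs in enumerate(virtual_servers):
        assignment[vs] = nodes[i % len(nodes)]
    return assignment

servers = ["EXVS1", "EXVS2", "EXVS3"]
nodes = ["NodeA", "NodeB", "NodeC", "NodeD"]
print(distribute(servers, nodes))
# {'EXVS1': 'NodeA', 'EXVS2': 'NodeB', 'EXVS3': 'NodeC'}
```

With three virtual servers on four nodes, one node remains without a virtual server, which corresponds to the hot spare configuration discussed later in this lesson.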
To maximize the use of all available servers in a cluster while maintaining a failover solution, you can configure multiple virtual servers and distribute them across your nodes. Virtual servers are resource groups, and resource groups contain resources, such as an IP address, a network name, and a disk. However, a resource can belong to only one resource group, and therefore only one virtual server can own it. In other words, if you want to configure four virtual servers in a four-node cluster, you will need four separate physical disks. Because the Cluster service requires access to the quorum disk, you can configure n - 1 virtual Exchange 2000 servers (see Figure 7.6), where n represents the number of physical disks on the shared storage.
Figure 7.6 A four-node cluster with three virtual Exchange 2000 servers
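The n - 1 rule can be expressed as a one-line calculation. This is a trivial sketch, but it makes the quorum reservation explicit; the function name is hypothetical.

```python
def max_virtual_servers(physical_disks):
    """Maximum virtual Exchange 2000 servers for a given number of
    shared physical disks: one disk is reserved for the quorum
    resource, and each virtual server needs its own dedicated disk."""
    return max(physical_disks - 1, 0)

print(max_virtual_servers(4))  # -> 3, as in Figure 7.6
```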
Tip
Theoretically, you can configure one virtual Exchange 2000 server per physical disk, including the disk containing the quorum resource. However, Microsoft does not recommend adding Exchange 2000 Server services to the virtual server representing the cluster (that is, owning the quorum disk). Defining dedicated virtual servers for Exchange 2000 simplifies service maintenance, such as taking a virtual Exchange 2000 system offline.
The configuration shown in Figure 7.6 corresponds to a fully loaded system with one hot spare node. When all nodes are online, Node D does not own a virtual Exchange 2000 server. This node is the hot spare, which assumes the role of an Exchange 2000 server if another node in the cluster fails or is unavailable for maintenance. Provided that you don't run other applications, such as SQL Server 2000, on the hot spare, a single node failure will not affect system performance.
The hot spare configuration has the disadvantage of an idle server system when every node is operational. If you don't want to invest in this extra server, configure a three-node cluster with four disks, as shown in Figure 7.7 (or add a fifth disk to your four-node cluster). The disadvantage of a full-load cluster is that users might experience performance losses if one node has to take over the load of a failed member. Hence, it is a good idea to operate all nodes in the cluster at less than their maximum capacity.
Figure 7.7 A full-load active cluster with three virtual Exchange 2000 servers
To avoid measurable performance losses, you would have to operate the nodes below the following theoretical limits:
However, in most cases it will be acceptable to operate the nodes of a full-load cluster at their maximum capabilities because node failures or failovers due to maintenance should seldom occur and temporary performance losses are usually not critical. Note that central processing unit (CPU) utilization is not a perfect measure of a node's capacity; a node will begin to experience performance degradation if it operates above 70 percent peak CPU utilization for an extended period of time.
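The headroom calculation can be illustrated with a deliberately simplified model: assume each node runs one virtual server of roughly equal load, and that a failed node's entire resource group moves to a single surviving node (resource groups are not split). Under those assumptions, the survivor's load roughly doubles after a failover, which bounds the acceptable steady-state load. This is a back-of-the-envelope sketch, not a sizing formula from the product documentation.

```python
def max_steady_state_load(threshold=0.70, groups_after_failover=2):
    """Simplified model: if a node must absorb a failed peer's entire
    virtual server, its post-failover load is roughly the sum of both
    loads. To stay under the degradation threshold (70% peak CPU),
    steady-state load per node must stay under threshold / groups."""
    return threshold / groups_after_failover

# To remain below 70% CPU after absorbing one equally loaded peer,
# each node should run at no more than 35% in normal operation.
print(max_steady_state_load())  # -> 0.35
```

In practice, real workloads are not perfectly symmetric, which is one reason the text notes that briefly exceeding these limits after a failover is usually tolerable.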
When designing your clustered Exchange 2000 environment, keep in mind that limitations apply to your virtual servers. Several Exchange 2000 Server components are not supported in a cluster, and others can only run in an active/passive configuration. The Message Transfer Agent (MTA), for instance, cannot run on more than one node in the cluster, implicitly enforcing an active/passive configuration. The same restriction applies to the Chat Service.
The Exchange 2000 components support the following cluster configurations:
Important
Failover and failback are cluster-specific procedures that move resource groups (with all their associated resources) between nodes. Failover is the transfer of resource groups from a failed or decommissioned node to an available node in the cluster. Failback describes the process of moving the resource groups back when the node that was offline is online again (see Figure 7.8).
Figure 7.8 Failover and failback of virtual Exchange 2000 servers
A failover can occur in two situations: Either you trigger it manually for maintenance reasons or the Cluster service initiates it automatically in case of a resource failure on the node owning the resource. If a resource fails, the Resource Manager first attempts a resource restart on the local node. If this does not correct the problem, the Resource Manager will take the resource group offline along with its dependent resources and inform the Failover Manager that the affected group should be moved to another node and restarted there.
The Failover Manager is now responsible for deciding where to move the resource group. It communicates with its counterparts on the remaining active nodes to arbitrate the ownership of the resource group. This arbitration relies on the node preference list that you can specify when creating resources in Cluster Administrator. The arbitration can also take into account other factors such as the capabilities of each node, the current load, and application information. After a new node is determined for the resource group, all nodes update their cluster databases to track which node owns the resource group. At this point, the new owner of the resource group turns control of the resources within the resource group over to its Resource Manager. If multiple resource groups are affected, for instance, because of a total node failure, the process is repeated for all of these groups.
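The core of the arbitration described above — walking the node preference list and picking the first eligible survivor — can be sketched as follows. This is a simplified model: the real Failover Manager may also weigh node capabilities, current load, and application information, and the function and node names here are hypothetical.

```python
def choose_failover_node(preference_list, online_nodes, failed_node):
    """Return the first node in the preference list that is online and
    is not the failed node; None if no eligible node remains.
    Simplified model of Failover Manager arbitration."""
    for node in preference_list:
        if node != failed_node and node in online_nodes:
            return node
    return None

# Resource group preferred owners, in order, as set in Cluster Administrator.
prefs = ["NodeA", "NodeD", "NodeB"]
online = {"NodeB", "NodeC", "NodeD"}

print(choose_failover_node(prefs, online, failed_node="NodeA"))
# -> 'NodeD': NodeA failed, so the next preferred online node wins
```

After the new owner is chosen, every node updates its cluster database to record the new ownership, which is why the whole cluster agrees on where each resource group lives.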
If you have specified a preferred owner for a resource group, and this node comes back online, the Failover Manager will fail back the resource group to the recovered or restarted node. The Cluster service protects against continuous resource failures, which can result from repeated failback to a node that has not recovered correctly, by limiting the number of failback attempts. Likewise, you can configure specific hours of the day during which the failback of a group is prohibited, for instance, during peak business hours.
By default, resource groups are set not to fail back automatically when the original node is recovered. Without manual configuration of a failback policy, groups continue to run on the alternate node after the failed node comes back online.
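The two failback safeguards just described — a permitted time window and a retry limit — combine into a simple policy check, sketched below. The function, parameter names, and default values are illustrative; the actual settings are configured per resource group in Cluster Administrator.

```python
def failback_allowed(hour, window=(20, 6), attempts=0, max_attempts=3):
    """Allow failback only within the permitted hours and while the
    retry limit has not been reached. window=(start, end) is in
    24-hour time and wraps past midnight when start > end, e.g.
    (20, 6) means 8 P.M. to 6 A.M., keeping peak business hours free."""
    start, end = window
    if start > end:                      # window wraps past midnight
        in_window = hour >= start or hour < end
    else:
        in_window = start <= hour < end
    return in_window and attempts < max_attempts

print(failback_allowed(22))               # True: off-peak hours
print(failback_allowed(10))               # False: peak business hours
print(failback_allowed(22, attempts=3))   # False: retry limit reached
```

Note that this models an opt-in policy: as the text states, by default no automatic failback occurs at all until you configure one.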