Server Cluster Architecture

   

Server clusters are based on a shared-nothing model of cluster architecture. This model refers to how servers in a cluster manage and use local and common cluster devices and resources.

Shared-Nothing Cluster

In the shared-nothing cluster, each server owns and manages its local devices. Devices common to the cluster, such as a common disk array and connection media, are selectively owned and managed by a single server at any given time.

The shared-nothing model makes it easier to manage disk devices and standard applications. This model does not require any special cabling or applications and enables server clusters to support standard Windows Server 2003-based and Windows 2000-based applications and disk resources.

Local Storage Devices and Media Connections

Server clusters use the standard Windows Server 2003 and Windows 2000 Server drivers for local storage devices and media connections. Server clusters support several connection media for the external common devices that need to be accessible by all servers in the cluster.

External storage devices that are common to the cluster must be small computer system interface (SCSI) devices; server clusters support standard PCI-based SCSI connections as well as SCSI over Fibre Channel and SCSI buses with multiple initiators. Fibre Channel connections are simply SCSI devices hosted on a Fibre Channel bus instead of a SCSI bus. Conceptually, Fibre Channel encapsulates SCSI commands within the Fibre Channel transport, which makes it possible to use the SCSI commands that server clusters are designed to support. These commands, Reserve/Release and Bus Reset, function the same over Fibre Channel as over standard SCSI interconnect media.

Figure 13-1 illustrates the components of a two-node server cluster, which can comprise servers running either Windows Server 2003, Enterprise Edition, or Windows 2000 Advanced Server, with shared storage device connections using SCSI or SCSI over Fibre Channel.

Figure 13-1. This diagram shows a two-node server cluster running Windows Server 2003, Enterprise Edition.

graphics/f13xo01.jpg

Windows Server 2003, Datacenter Edition, supports clusters of two to eight nodes and requires device connections using Fibre Channel, as shown in Figure 13-2.

Figure 13-2. This diagram shows a four-node server cluster running Windows Server 2003, Datacenter Edition.

graphics/f13xo02.jpg

Virtual Servers

One of the benefits of clusters is that applications and services running on a server cluster can be exposed to users and workstations as virtual servers. The following items describe virtual servers from several perspectives:

  • Physical view.

    To users and clients, connecting to an application or service running as a clustered virtual server appears to be the same process as connecting to a single physical server. In fact, the connection to a virtual server can be hosted by any node in the cluster. The user or client application will not know which node is actually hosting the virtual server. Services or applications that are not accessed by users or client applications can run on a cluster node without being managed as a virtual server. Multiple virtual servers representing multiple applications can be hosted in a cluster, as illustrated in Figure 13-3.

    Figure 13-3. This diagram shows a physical view of virtual servers under server clusters.

    graphics/f13xo03.jpg

    Figure 13-3 illustrates a two-node cluster with four virtual servers; two virtual servers exist on each node. Server clusters manage the virtual server as a resource group, with each virtual server resource group containing two resources: an IP address and a network name that is mapped to the IP address.

  • Client view.

    Application client connections to a virtual server are made by a client session that knows only the IP address that the cluster service publishes as the address of the virtual server. The client view is simply a view of individual network names and IP addresses. Using the example of a two-node cluster supporting four virtual servers, Figure 13-4 illustrates the client view of the cluster nodes and four virtual servers.

    As shown in Figure 13-4, the client sees only the IP addresses and names and does not see information about the physical location of any of the virtual servers. This allows server clusters to provide highly available support for the applications running as virtual servers.

    Figure 13-4. This diagram shows a client view of server cluster virtual servers.

    graphics/f13xo04.jpg

  • Application or server failure.

    In the event of an application or server failure, the cluster service moves the entire virtual server resource group to another node in the cluster. When such a failure occurs, the client will detect a failure in its session with the application and attempt to reconnect in exactly the same manner as the original connection. It will be able to do this successfully because the cluster service simply maps the published IP address of the virtual server to a surviving node in the cluster during recovery operations. The client session can reestablish the connection to the application without needing to know that the application is now physically hosted on a different node in the cluster. (A minimal client-side sketch of this reconnect behavior appears after this list.)

    Note that while this provides high availability of the application or service, session state information related to the failed client session is lost unless the application is designed or configured to store client session data on disk for retrieval during application recovery. Server clusters enable high availability but do not provide application fault tolerance unless the application itself supports fault-tolerant transaction behavior.

  • DHCP.

    The Microsoft DHCP service is an example of an application that stores client data and can recover from failed client sessions. DHCP client IP address reservations are saved in the DHCP database. If the DHCP server resource fails, the DHCP database can be moved to an available node in the cluster, and the service can be restarted there with the client data restored from the database.
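
The client view described above can be made concrete with a short sketch. The following Python fragment is purely illustrative: the virtual server name sqlvs1, the port, and the retry limits are hypothetical, and the only behavior taken from the text is that the client reconnects to the same published name after a failure, without knowing which node now hosts the virtual server.

```python
import socket
import time

VIRTUAL_SERVER = "sqlvs1"   # hypothetical published network name of a virtual server
PORT = 1433                 # hypothetical service port
RETRY_DELAY_SECONDS = 5
MAX_ATTEMPTS = 12


def connect_to_virtual_server():
    """Connect to the published virtual server name, retrying after a failure.

    The client always targets the same name; after a failover the cluster maps
    that name and IP address to a surviving node, so a retry succeeds without
    the client knowing which physical node is answering.
    """
    for _ in range(MAX_ATTEMPTS):
        try:
            return socket.create_connection((VIRTUAL_SERVER, PORT), timeout=10)
        except OSError:
            # The session failed or the group is mid-failover; wait and retry.
            time.sleep(RETRY_DELAY_SECONDS)
    raise ConnectionError(f"could not reach {VIRTUAL_SERVER} after {MAX_ATTEMPTS} attempts")
```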

Resources

A resource represents a physical object or an instance of running code: a disk, an IP address, an MSMQ queue, a COM object, and so on. From a management perspective, resources can be independently started and stopped, and each one can be monitored to ensure that it is healthy. From the cluster service perspective, a resource can be in a number of states, as follows (a simple model of these states is sketched after the list):

  • Off line.

    The resource is shut down or out of service.

  • Started.

    The resource is loaded into memory and is capable of being brought on line as required by the resource manager.

  • On line.

    The resource is functioning correctly and is capable of servicing requests.

  • Failed.

    The resource is no longer functional and could not be restarted.
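
As a rough illustration of the state list above, the following Python sketch models the four states as an enumeration. The transition table is an assumption inferred from these descriptions; it is not the Cluster service's actual state machine, which has more states and rules.

```python
from enum import Enum, auto


class ResourceState(Enum):
    """Resource states as described above (illustrative model only)."""
    OFFLINE = auto()  # shut down or out of service
    STARTED = auto()  # loaded into memory, can be brought on line on demand
    ONLINE = auto()   # functioning correctly and servicing requests
    FAILED = auto()   # no longer functional and could not be restarted


# Transitions implied by the descriptions above; the real Cluster service uses a
# richer state machine than this sketch shows.
ALLOWED_TRANSITIONS = {
    ResourceState.OFFLINE: {ResourceState.STARTED},
    ResourceState.STARTED: {ResourceState.ONLINE, ResourceState.OFFLINE},
    ResourceState.ONLINE: {ResourceState.OFFLINE, ResourceState.FAILED},
    ResourceState.FAILED: set(),  # a failed resource could not be restarted
}


def can_transition(current: ResourceState, target: ResourceState) -> bool:
    """Return True if the sketched state machine allows the transition."""
    return target in ALLOWED_TRANSITIONS[current]
```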

Resources and Dependencies

As just described, an application actually consists of multiple pieces. Some pieces can be code, and others can be physical resources required by the application. These pieces are related in various ways: for example, an application that writes to a disk cannot come on line until the disk is available. If the disk fails, the application by definition cannot continue to run, because it writes to that disk.

Dependencies can be set up among resources to express these relationships. In Figure 13-5, the SQL resource has a start order dependency on a disk resource and a network name resource. The network name resource in turn has a dependency on an IP address resource. If the user attempts to bring the SQL resource on line when the IP address and network name resources are off line and the disk resource is on line, the IP address resource is brought on line first, followed by the network name resource, and finally the SQL resource. Resources that have no dependencies between them, such as the network name and the disk in Figure 13-5, have no defined startup order and can be started in parallel.

Figure 13-5. This diagram shows dependencies among resources.

graphics/f13xo05.jpg
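
The start order implied by Figure 13-5 can be illustrated with a small dependency sort. This is only a conceptual sketch using the resource names from the figure; it is not how the failover manager is implemented.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Dependencies from Figure 13-5: SQL depends on the disk and the network name,
# and the network name depends on the IP address.
DEPENDENCIES = {
    "SQL": {"Disk", "Network Name"},
    "Network Name": {"IP Address"},
    "Disk": set(),
    "IP Address": set(),
}


def online_batches(dependencies):
    """Yield batches of resources whose dependencies are satisfied.

    Each batch can be brought on line in parallel; later batches wait for
    earlier ones, which reproduces the start order described for Figure 13-5.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    while sorter.is_active():
        ready = sorter.get_ready()
        yield list(ready)
        sorter.done(*ready)


if __name__ == "__main__":
    for batch in online_batches(DEPENDENCIES):
        print("bring on line:", batch)
    # e.g. ['Disk', 'IP Address'] in parallel, then ['Network Name'], then ['SQL']
```

Running the sketch brings the disk and IP address on line first (in parallel), then the network name, and finally SQL, matching the order described above.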

A resource group is a collection of one or more resources that are managed and monitored as a single unit. A resource group can be started or stopped. If a resource group is started, each resource in the group is started (taking into account any start order defined by the dependencies among resources in the group). If a resource group is stopped, all the resources in the group are stopped (taking into account any stop order resource dependencies). Dependencies among resources cannot span a group. In other words, the set of resources within a group is an autonomous unit that can be started and stopped independently of any other group. A group is a single indivisible unit, and in a cluster environment it cannot span the nodes of a cluster: it's restricted to a single node. In clusters that support failover applications, the group is the unit of failover. Figure 13-6 represents the SQL resources placed together in a group, called SQL Group.

Figure 13-6. This diagram shows resources making up a SQL Group.

graphics/f13xo06.jpg

Resource groups are logical collections of cluster resources. Typically, a resource group is made up of logically related resources such as applications and their associated peripherals and data. However, resource groups can contain cluster entities that are related only by administrative needs, such as an administrative collection of virtual server names and IP addresses. A resource group can be owned by only one node at a time, and individual resources within a group must exist on the node that currently owns the group. In any given instance, different servers in the cluster cannot own different resources in the same resource group.

Each resource group has an associated clusterwide policy that specifies the server on which the group prefers to run and the server to which the group should move in case of a failure. Each group also has a network service name and address to enable network clients to bind to the services provided by the resource group. In the event of a failure, resource groups can be failed over or moved as atomic units from the failed node to another available node in the cluster.

Each resource in a group might depend on other resources in the cluster. Dependencies are relationships among resources that indicate which resources need to be started and be available before another resource can be started. For example, a database application might depend on the availability of a disk, an IP address, and a network name to be able to start and provide services to other applications and clients. Resource dependencies are identified using cluster resource group properties and enable the Cluster service to control the order in which resources are brought on and off line. The scope of any identified dependency is limited to resources within the same resource group. Cluster-managed dependencies cannot extend beyond the resource group because resource groups can be brought on line and off line and moved independently.
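
A minimal sketch of the rule that dependencies cannot extend beyond a resource group might look like the following. The ResourceGroup type, its fields, and the validate method are hypothetical; only the constraint itself and the SQL Group example come from the text and Figure 13-6.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceGroup:
    """Illustrative model of a resource group, not the Cluster service's own types."""
    name: str
    resources: set = field(default_factory=set)
    # Mapping of resource -> set of resources it depends on.
    dependencies: dict = field(default_factory=dict)

    def validate(self):
        """Check that every dependency stays within this group, as described above."""
        for resource, deps in self.dependencies.items():
            outside = deps - self.resources
            if resource not in self.resources or outside:
                raise ValueError(
                    f"dependencies of {resource!r} must not extend beyond the group"
                )


# Example: the SQL Group from Figure 13-6.
sql_group = ResourceGroup(
    name="SQL Group",
    resources={"SQL", "Disk", "Network Name", "IP Address"},
    dependencies={"SQL": {"Disk", "Network Name"}, "Network Name": {"IP Address"}},
)
sql_group.validate()  # passes: every dependency is inside the group
```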

Failover Policies

Failover is the mechanism that single-instance applications and the individual partitions of a partitioned application typically employ for high availability. (The term pack has been coined to describe a highly available single-instance application or partition.) In a two-node cluster, defining failover policies is trivial. If one node fails, the only option is to fail over to the remaining node. As the size of a cluster increases, different failover policies are possible, and each one has different characteristics:

  • Failover pairs.

    In a large cluster, failover policies can be defined such that each application is set to fail over between two nodes. The simple example illustrated in Figure 13-7 shows two applications, App1 and App2, in a four-node cluster.

    Figure 13-7. This diagram illustrates failover with two applications in a four-node cluster.

    graphics/f13xo07.jpg

    This configuration has the following pros and cons:

    • Good for clusters that are supporting heavyweight applications such as databases. This configuration ensures that in the event of failure, two applications will not be hosted on the same node.

    • Very easy to plan capacity. Each node is sized based on the application that it will need to host (just like a two-node cluster hosting one application).

    • Easy to determine effect of a node failure on availability and performance of the system.

    • Provides the flexibility of a larger cluster. In the event that a node is taken out for maintenance, the buddy for a given application can be changed dynamically (which might result in a standby policy; see the next main bulleted item for details).

    • In simple configurations such as this one, only 50 percent of the capacity of the cluster is in use.

    • Administrator intervention might be required in the event of multiple failures.

      Server clusters support failover pairs on all versions of Windows by limiting the possible owner list for each resource to a given pair of nodes.

  • Hot standby server.

    To reduce the overhead of failover pairs, the spare node for each pair can be consolidated into a single node. This provides a hot standby server that is capable of picking up the work in the event of a failure, as illustrated in Figure 13-8.

    Figure 13-8. This diagram illustrates a failover configuration using a hot standby server.

    graphics/f13xo08.jpg

    The standby server configuration has the following pros and cons:

    • Good for clusters that are supporting heavyweight applications such as databases. This configuration ensures that in the event of a single failure, two applications will not be hosted on the same node.

    • Easy to plan capacity. Each node is sized based on the application that it will need to host; the spare is sized to be the maximum of the other nodes.

    • Easy to determine effect of a node failure on availability and performance of the system.

    • The configuration is targeted at handling only a single failure at a time.

    • Doesn't really handle multiple failures well. This might be an issue during scheduled maintenance, when the spare might be in use.

    Windows Clustering supports standby servers today using a combination of the possible owners list and the preferred owners list. The preferred node should be set to the node on which the application will run by default, and the possible owners for a given resource should be set to the preferred node and the spare node.

  • N+I.

    Standby server works well for four-node clusters in some configurations; however, its ability to handle multiple failures is limited. N+I configurations are an extension of the standby server concept in which N nodes host applications and I nodes act as spares, as illustrated in Figure 13-9.

    Figure 13-9. This diagram illustrates an N+I spare node configuration.

    graphics/f13xo09.jpg

    An N+I configuration has the following pros and cons:

    • Good for clusters that are supporting heavyweight applications such as databases or Microsoft Exchange. This configuration ensures that in the event of a failure an application instance will fail over to a spare node, not one that is already in use.

    • Easy to plan capacity. Each node is sized based on the application that it will need to host.

    • Easy to determine effect of a node failure on availability and performance of the system.

    • Works well for multiple failures.

    • Does not really handle multiple applications running in the same cluster well. This policy is best suited to applications running on a dedicated cluster.

    Windows Clustering supports N+I scenarios in Windows Server 2003 using a cluster group public property, AntiAffinityClassName. This property can contain an arbitrary string of characters. In the event of a failover, if a group being failed over has a nonempty string in the AntiAffinityClassName property, the failover manager will check all other nodes.

    If any nodes (in the possible owners list for the resource) are not hosting a group with the same value in AntiAffinityClassName, those nodes are considered a good target for failover. If all nodes in the cluster are hosting groups that contain the same value in the AntiAffinityClassName property, the preferred node list is used to select a failover target. (A conceptual sketch of this selection logic appears after this list.)

  • Failover ring.

    Failover rings allow each node in the cluster to run an application instance. In the event of a failure, the application on the failed node is moved to the next node in sequence, as shown in Figure 13-10.

    Figure 13-10. This diagram illustrates a failover configuration using a failover ring.

    graphics/f13xo10.jpg

    This configuration has the following pros and cons:

    • Good for clusters that are supporting several small application instances wherein the capacity of any node is large enough to support several at the same time.

    • Easy to predict effect on performance of a node failure.

    • Easy to plan capacity for a single failure.

    • Does not work well for all cases of multiple failures. If node 1 fails, node 2 will host two application instances and nodes 3 and 4 will each host one application instance. If node 2 then fails, node 3 will be hosting three application instances and node 4 will be hosting one instance.

    • Not well suited to heavyweight applications because multiple instances might end up being hosted on the same node, even if there are lightly loaded nodes.

    Failover rings are supported by server clusters using Windows Server 2003. This is done by defining the order of failover for a given group using the preferred owner list. A node order should be chosen, and then the preferred node list should be set up with each group starting at a different node.

  • Random.

    In large clusters or even four-node clusters that are running several applications, defining specific failover targets or policies for each application instance can be extremely cumbersome and error prone. The best policy in some cases is to allow the target to be chosen at random, with a statistical probability that this will spread the load around the cluster in the event of a failure.

    A random failover policy has the following pros and cons:

    • Good for clusters that are supporting several small application instances wherein the capacity of any node is large enough to support several at the same time.

    • Does not require an administrator to decide where any given application should fail over to.

    • As long as there are sufficient applications or the applications are partitioned finely enough, provides a good mechanism to statistically load-balance the applications across the cluster in the event of a failure.

    • Works well for multiple failures.

    • Well tuned to handling multiple applications or many instances of the same application running in the same cluster.

    • Can be difficult to plan capacity. There is no real guarantee that the load will be balanced across the cluster.

    • Not easy to predict effect on performance of a node failure.

    • Not well suited to heavyweight applications because multiple instances might end up being hosted on the same node, even if there are lightly loaded nodes.

    Windows Clustering in Windows Server 2003 randomizes the failover target in the event of node failure. Each resource group that has an empty preferred owners list will be failed over to a random node in the cluster in the event that the node currently hosting it fails.

  • Customized control.

    In some cases, specific nodes might be preferred for a given application instance. A configuration that ties applications to nodes has the following pros and cons:

    • Administrator has full control over what happens when a failure occurs.

    • Capacity planning is easy since failure scenarios are predictable.

    • With many applications running in a cluster, defining a good policy for failures can be extremely complex.

    • It's very hard to plan for multiple cascaded failures.
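
The selection behavior described across these policies, restricting candidates to the possible owners, preferring nodes that are not already hosting a group with the same AntiAffinityClassName value, and then falling back to the preferred node list or a random choice, can be sketched as follows. The function and its parameter shapes are hypothetical; only the roles of the owner lists and the AntiAffinityClassName check are taken from the descriptions above.

```python
import random


def choose_failover_node(group, nodes_up, groups_by_node, possible_owners, preferred_owners):
    """Pick a failover target for a group (conceptual sketch, not the real failover manager).

    group            -- dict of group properties; 'AntiAffinityClassName' is '' if unset
    nodes_up         -- nodes currently up and running in the cluster
    groups_by_node   -- mapping of node -> list of group dicts it currently hosts
    possible_owners  -- nodes allowed to host the group (failover pairs and standby use this)
    preferred_owners -- ordered preferred node list for the group ([] means pick at random)
    """
    candidates = [n for n in nodes_up if n in possible_owners]
    if not candidates:
        return None  # nowhere to fail over; the group stays off line

    # N+I anti-affinity: prefer candidates not already hosting a group with the
    # same (nonempty) AntiAffinityClassName value.
    class_name = group.get("AntiAffinityClassName", "")
    if class_name:
        free = [
            n for n in candidates
            if all(g.get("AntiAffinityClassName", "") != class_name
                   for g in groups_by_node.get(n, []))
        ]
        if free:
            candidates = free

    # Otherwise fall back to the preferred node list, or pick a candidate at random.
    for node in preferred_owners:
        if node in candidates:
            return node
    return random.choice(candidates)
```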

Preferred Node List

Windows Clustering provides full control over the order of failover by using the preferred node list feature. The full semantics of the preferred node list are summarized in Table 13-1.

Table 13-1. Preferred Node List

  • Preferred node list contains all nodes in the cluster

    Move group to best possible node (initiated via administrator): The group is moved to the highest node in the preferred node list that is up and running in the cluster.

    Failover resulting from node or group failure: The group is moved to the next node on the preferred node list.

  • Preferred node list contains a subset of the nodes in the cluster

    Move group to best possible node (initiated via administrator): The group is moved to the highest node in the preferred node list that is up and running in the cluster. If no nodes in the preferred node list are up and running, the group is moved to a random node.

    Failover resulting from node or group failure: The group is moved to the next node on the preferred node list. If the node that was hosting the group is the last on the list or was not in the preferred node list, the group is moved to a random node.

  • Preferred node list is empty

    Move group to best possible node (initiated via administrator): The group is moved to a random node.

    Failover resulting from node or group failure: The group is moved to a random node.
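
Read as an algorithm, Table 13-1 amounts to the following conceptual sketch. The function name and parameters are hypothetical, the "highest" node is interpreted as the first entry in the list, and the real Cluster service applies additional checks that are not modeled here.

```python
import random


def select_target(preferred_list, nodes_up, current_node, admin_move):
    """Apply the Table 13-1 rules for a group currently hosted on current_node.

    preferred_list -- the group's preferred node list (possibly empty)
    nodes_up       -- nodes that are up and running in the cluster
    current_node   -- node hosting the group before the move or failure
    admin_move     -- True for an administrator-initiated move, False for a failover
    """
    others = [n for n in nodes_up if n != current_node]
    if not others:
        return None  # no surviving node to move to

    if not preferred_list:
        # Empty list: both a move and a failover go to a random node.
        return random.choice(others)

    if admin_move:
        # Move: the highest (first) node in the preferred list that is up and
        # running; if none of the listed nodes are up, a random node.
        for node in preferred_list:
            if node in nodes_up:
                return node
        return random.choice(others)

    # Failover: the next node on the preferred list after the current one; if the
    # current node is last on the list or not on it, a random node.
    if current_node in preferred_list:
        for node in preferred_list[preferred_list.index(current_node) + 1:]:
            if node in nodes_up:
                return node
    return random.choice(others)
```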


   