Server Clusters

A server cluster is a group of independent nodes that work together as a single system. The nodes share a common cluster database that enables recovery if any node fails. A server cluster uses a jointly connected resource, generally a disk array on a shared SCSI bus, which is available to all nodes in the cluster. Each Windows 2000 Advanced Server node in the cluster must have access to the array and must be able to communicate at all times with the other nodes in the cluster.

Windows 2000 supports server clusters only on machines running Advanced Server. Additionally, as shipped, it supports only two-node clusters using a shared disk resource, via either Fibre Channel or a shared SCSI bus. Both nodes of the cluster must be running TCP/IP for networking and should have at least one dedicated network interconnect available. To avoid a single point of failure, a second network interconnect is highly recommended.

Server Cluster Concepts

To understand and implement server clusters, it is important to understand several new concepts and their ramifications, as well as specialized meanings for certain terms.

Networks (Interconnects)

A cluster has two distinct types of networks: the private network that's used to maintain communications between nodes in the cluster and the public network that clients of the cluster use to connect to the services of the cluster. Each of these networks can share the same network card and physical network cabling, but it is a good practice to keep them separate. Having them separate gives you an alternate path for interconnection between the nodes of the cluster. Since the interconnect between the nodes of a cluster is a potential single point of failure, it should always be redundant. The cluster service will use all available networks, both private and public, to maintain communications between nodes.

In the event of a failure of communications between nodes of the cluster, the nodes are partitioned and each node attempts to gain control of the quorum resource (discussed later under "Resources") and thus the shared disk. One node will shut down, while the other will attempt to maintain the processes of the cluster. However, since there is no guarantee that the node with a working network card will be the one that gains control, it is possible that all services from the cluster will be unavailable to clients.

Nodes

A node is a member of a server cluster. It must be running Windows 2000 Advanced Server and Windows Clustering. It must also be running TCP/IP, must be connected to the shared cluster storage device, and must have at least one network interconnect to the other nodes in the cluster.

Groups

Groups are the units of failover. Each group contains one or more resources. Should any of the resources within the group fail, all will fail over together according to the failover policy defined for the group. A group can be owned by only one node at a time. All resources within the group run on the same node. If a resource within the group fails and must be moved to an alternate node, all other resources in that group must be moved as well. When the cause of failure on the originating node is resolved, the group will fail back to its original location, based on the failback policy for the group.
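The group behavior described above can be modeled in a few lines. The following is a minimal sketch for illustration only, not the cluster service's API; the names Group, fail_over, and fail_back are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """A failover group: a set of resources owned by exactly one node."""
    name: str
    resources: list[str]
    preferred_owner: str
    owner: str = ""

    def __post_init__(self) -> None:
        # A group starts out on its preferred owner unless told otherwise.
        if not self.owner:
            self.owner = self.preferred_owner


def fail_over(group: Group, nodes: list[str]) -> str:
    """Move the whole group (all of its resources) to another node."""
    candidates = [n for n in nodes if n != group.owner]
    if not candidates:
        raise RuntimeError("no other node can take ownership of the group")
    # Ownership is a property of the group, so every resource moves together.
    group.owner = candidates[0]
    return group.owner


def fail_back(group: Group) -> str:
    """Return the group to its preferred owner once that node is healthy."""
    group.owner = group.preferred_owner
    return group.owner
```

Because ownership is recorded on the group rather than on each resource, the sketch cannot represent a group whose resources are split across nodes, which mirrors the rule that all resources in a group run on the same node.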

Resources

Any physical or logical entity that can be brought online or offline can be a server cluster resource. It must be able to be owned by only one node at a time and will be managed as part of the cluster. The quorum resource is a special resource. It is the repository of the configuration data of the cluster and the recovery logs that allow recovery of the cluster in the event of a failure. The quorum resource must be able to be controlled by a single node, it must provide physical storage for the recovery logs and cluster database, and it must use the NTFS file system. The only resource type supported for a quorum resource is the Physical Disk resource as shipped with Windows 2000 (this and other resource types are described in the next section), but it is possible that other quorum resource types will be developed and certified by third parties.

Types of Resources

Windows 2000 Advanced Server includes several different resource types; the sections that follow examine each of these resource types and the role they play in a server cluster. The available cluster resource types are

  • Physical Disk
  • DHCP
  • WINS
  • Print Spooler
  • File Share
  • Internet Protocol
  • Network Name
  • Generic Application
  • Generic Service

Physical Disk

The Physical Disk resource type is the central resource type required as a minimum for all server clusters. It is used for the quorum resource that controls what node in the cluster is in control of all other resources. The Physical Disk resource type is used to manage a shared cluster storage device. It has the same drive letter on all cluster servers.

DHCP and WINS

The DHCP service provides IP addresses and various other TCP/IP settings to clients, while the WINS service provides dynamic resolution of NetBIOS names to IP addresses. Both can be run as a resource of the cluster, providing for high availability of these critical services to network clients. In order for failover to work correctly, the DHCP and WINS databases must reside on the shared cluster storage.

Print Spooler

The Print Spooler resource type lets you cluster print services, making them fault tolerant and saving a tremendous number of help desk calls when the print server fails. It will also avoid the problem of people simply clicking the Print button over and over when there's a problem, resulting in a very long and repetitious print queue.

In order to be clustered, a printer must be connected to the server via the network. Obviously, you can't connect the printer to a local port such as a parallel or USB port directly attached to one of the nodes of the cluster. The client can address the printer either by name or by IP address, just as it would a nonclustered printer on the network.

In the event of a failover, all jobs that are currently spooled to the printer are restarted. Jobs that are in the process of spooling from the client are discarded.

File Share

You can use a server cluster to provide a high-availability file server using the File Share resource type. The File Share resource type lets you manage your shared file systems in three different ways:

  • As a standard file share with only the top-level folder visible as a share name.
  • As shared subdirectories, where the top-level folder and each of its immediate subfolders are shared with separate names. This makes it extremely easy to manage users' home directories, for example.
  • As a stand-alone DFS root. You cannot, however, use a cluster server File Share resource as part of a fault-tolerant DFS root.

Internet Protocol and Network Name

The Internet Protocol resource type is used to manage the IP addresses of the cluster. When an Internet Protocol resource is combined with a Network Name resource and one or more applications, you can create a virtual server. Virtual servers allow clients to continue to use the same name to access the cluster even after a failover has occurred. No client-side management is required, since to the client the virtual server is unchanged.

Generic Application

The Generic Application resource type allows you to manage regular, cluster-unaware applications in the cluster. A cluster-unaware application that is to be used in a cluster must, as a minimum,

  • Be able to store its data in a configurable location
  • Use TCP/IP to connect to clients
  • Have clients that can reconnect in the event of an intermittent network failure

When you install a generic, cluster-unaware application, you have two choices: you can install it onto the shared cluster storage, or you can install it individually on each node of the cluster. The first method is certainly easier, since you install the application only once for the whole cluster. However, if you use this method you won't be able to perform a rolling upgrade of the application, since it appears only once. (A rolling upgrade is an upgrade of the application in which the workload is moved to one server while the application on the other server is upgraded and then the roles are reversed to upgrade the first server.)

To give yourself the ability to perform rolling upgrades on the application, you need to install a copy onto each node of the cluster. You will need to place it in the same folder and path on each node. This method uses more disk space than installing onto the shared cluster storage, but it permits you to perform rolling upgrades, upgrading each node of the cluster separately.
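The order of operations in a rolling upgrade can be sketched as follows. This is an illustrative outline that only produces a list of steps; the function name rolling_upgrade_steps and the node names are hypothetical, and nothing here touches a real cluster.

```python
def rolling_upgrade_steps(nodes: list[str]) -> list[str]:
    """List the steps of a rolling upgrade, one node at a time."""
    if len(nodes) < 2:
        raise ValueError("a rolling upgrade needs at least two nodes")
    steps = []
    for target in nodes:
        # Pick any other node to carry the workload while target is upgraded.
        stand_in = next(n for n in nodes if n != target)
        steps.append(f"move workload from {target} to {stand_in}")
        steps.append(f"upgrade application on {target}")
    return steps
```

For a two-node cluster this yields four steps: move the workload off the first node, upgrade it, then reverse the roles for the second node. The sketch also shows why a single copy on shared storage rules this out: there is no second copy left running while the first is upgraded.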

Generic Service

Finally, server clusters support one additional type of resource—the Generic Service resource. This is the most basic resource type, but it does allow you to manage your Windows 2000 services as a cluster resource.

Defining Failover and Failback

Windows 2000 Server clusters allow you to define the failover and failback policies for each group or virtual server. This ability enables you to tune the exact behavior of each application or group of applications to balance the need for high availability against the overall resources available to the cluster in a failure situation. Also, when the failed node becomes available again, your failback policy will determine whether the failed resource is immediately returned to the restored node, is maintained at the failed-over node, or migrates back to the restored node at some predetermined point in the future. These options allow you to plan for the disruption caused when a shift in node ownership occurs, limiting the impact by timing it for off-hours.
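The three failback choices described above (stay on the failed-over node, return immediately, or return during a predetermined period) can be expressed as a small decision function. This is a sketch under assumed names; FailbackPolicy, should_fail_back, and the hour-based window parameters are hypothetical and are not the cluster service's own settings.

```python
from enum import Enum


class FailbackPolicy(Enum):
    PREVENT = "prevent"      # group stays on the failed-over node
    IMMEDIATE = "immediate"  # group returns as soon as the node is restored
    WINDOW = "window"        # group returns only during an off-hours window


def should_fail_back(policy: FailbackPolicy, hour: int,
                     window_start: int = 0, window_end: int = 0) -> bool:
    """Decide whether a group may fail back at the given hour (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if policy is FailbackPolicy.PREVENT:
        return False
    if policy is FailbackPolicy.IMMEDIATE:
        return True
    if window_start <= window_end:
        # Window within one day, e.g. 1 to 5; equal bounds mean an empty window.
        return window_start <= hour < window_end
    # Window that wraps past midnight, e.g. 22 to 4.
    return hour >= window_start or hour < window_end
```

With a window from 22 to 4, a node restored at noon keeps its groups on the other node until 22:00, so the ownership shift happens outside working hours.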

Configuring a Server Cluster

When planning your server cluster, you'll need to think ahead to what your goal is for the cluster and what you can reasonably expect from it. Server clusters provide for extremely high availability and resource load balancing, but you need to make sure your hardware, applications, and policies are appropriate.

High Availability with Load Balancing

You can configure your cluster with static load balancing, in which some applications run on one node while others run on another node. If one node fails, the applications or resources on the failed node will fail over to run on the other node, providing high availability of your resources while balancing the load across the cluster. In the event of failure, you will have a reduced load capacity, and you should implement procedures either to limit the load by reducing performance or availability, or to not provide some less-critical services during a failure.

Maximum Availability Without Load Balancing

By configuring one node as a "hot spare," you can provide maximum availability for critical applications. This scenario requires that your server nodes be sufficiently powerful to run the entire load of the cluster by themselves, and it certainly has the greatest hardware cost. But if one node fails, the other node takes over all processing for the cluster, and there is no reduced capacity for the applications.

Partial Failover (Load Shedding)

You can configure your cluster so that critical applications are protected in a failure situation but noncritical ones simply run as though they were on a stand-alone server. If the server on which they are running fails, the noncritical applications or resources are unavailable until the node is recovered. The critical applications on the node, however, are set to fail over to the other node of the cluster. You may even have applications on the remaining node that are set to shut down if the other node fails, allowing you to maintain a high level of performance and availability of your most critical applications while shedding the load from less critical applications and services when necessary. This strategy can be very effective when you must, for example, service certain critical applications or users under any and all circumstances but can allow other applications and users with a lower priority to temporarily fail.
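A partial failover plan can be checked on paper before it is configured. The sketch below is a hypothetical model, with App and apps_after_failure as assumed names: it marks each application as critical (fails over) or not, optionally flags applications to be shut down when the partner node fails, and reports what is still running after a failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    node: str                      # node that normally runs the application
    fails_over: bool               # critical: moves to the surviving node
    shed_on_failure: bool = False  # stopped on the survivor to free capacity


def apps_after_failure(apps: list[App], failed_node: str) -> list[str]:
    """Return the names of applications still running after a node fails."""
    running = []
    for app in apps:
        if app.node == failed_node:
            # Only critical applications move; the rest wait for recovery.
            if app.fails_over:
                running.append(app.name)
        elif not app.shed_on_failure:
            # Applications on the survivor keep running unless they are shed.
            running.append(app.name)
    return running
```

Running the function once for each node as the failed node gives the two service lists that users can expect during a failure, which is a useful input to the capacity planning discussed later.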

Virtual Server Only

You can create a server cluster that has only a single node, which allows you to take advantage of the virtual server concept to simplify the management and look of the resources on your network. Having a single node doesn't give you any additional protection against failure or any additional load balancing over that provided by simply running a single stand-alone server, but it allows you to easily manage groups of resources as a virtual server.

This scenario is an effective way to stage an implementation. You create the initial virtual server, putting your most important resources on it in a limited fashion. Then, when you're ready, you add an additional node to the server cluster and define your failover and failback policies, giving you a high-availability environment with minimal disruption to your user community. In this scenario, you can space hardware purchases over a longer period while providing services in a controlled test environment.

Planning the Capacity of a Server Cluster

Capacity planning for a server cluster can be a complicated process. You need to thoroughly understand the applications that will be running on your cluster and make some hard decisions about exactly which applications you can live without and which ones must be maintained under all circumstances. You'll also need a clear understanding of the interdependencies of the resources and applications you'll be supporting.

The first step is to quantify your groups or virtual servers. Make a comprehensive list of all applications in your environment, and then determine which ones will need to fail over and which ones can be allowed to fail but still should be run on a virtual server.

Next determine the dependencies of these applications and what resources they need in order to function. This information allows you to group dependent applications and resources in the same group or virtual server. Keep in mind that a resource can't span groups, so if multiple applications depend on a resource, such as a Web server, they must all reside in the same group or on the same virtual server as the Web server and thus will share the same failover and failback policies.

A useful mechanism for getting a handle on your dependencies is to list all your applications and resources and draw a dependency tree for each major application or resource. This will help you visualize not only the resources that your application is directly dependent on, but also the second-hand and third-hand dependencies that might not be obvious at first glance. For example, a cluster that is used as a high-availability file server uses the File Share resource. And it makes perfect sense that this File Share resource is dependent on the Physical Disk resource. It's also dependent on the Network Name resource. However, the Network Name resource is dependent on the IP resource. Thus, although the File Share resource isn't directly dependent on the IP resource, when you draw the dependency tree you will see that they all need to reside in the same group or on the same virtual server. Figure 15-2 illustrates this dependency tree.

Figure 15-2. The dependency tree for a File Share resource.
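The same dependency tree can be written down as data and walked to find every direct and indirect dependency. This is a minimal sketch: the function name all_dependencies and the dictionary layout are assumptions for illustration, while the resource names come from the File Share example.

```python
def all_dependencies(resource: str,
                     depends_on: dict[str, list[str]]) -> set[str]:
    """Return every resource that the given resource needs, at any depth."""
    seen: set[str] = set()
    stack = [resource]
    while stack:
        current = stack.pop()
        for dep in depends_on.get(current, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)  # follow second- and third-level links
    return seen


# Direct dependencies from the File Share example.
depends_on = {
    "File Share": ["Physical Disk", "Network Name"],
    "Network Name": ["Internet Protocol"],
}

same_group = all_dependencies("File Share", depends_on) | {"File Share"}
```

Here same_group contains all four resources, including Internet Protocol, which File Share does not depend on directly. Everything in that set must be placed in the same group or virtual server.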

Finally, as you're determining your cluster capacity, you need to plan for the effect of a failover. Each server must have sufficient capacity to handle the additional load imposed on it when a node fails and it is required to run the applications or resources that were owned by the failed node.

The disk capacity for the shared cluster storage must be sufficient to handle all the applications that will be running in the cluster as well as to provide the storage that the cluster itself requires for the quorum resource. Be sure to provide enough RAM and CPU capacity on each node of the cluster so that the failure of one node won't overload the other node to the point that it too fails. This possibility can also be managed to some extent by determining your real service requirements for different applications and user communities and reducing the performance or capacity of those that are less essential during a failure. However, such planned load shedding may not be sufficient and frequently takes a significant amount of time to be accomplished, so give yourself some margin to handle that initial surge during failover.
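A rough check of failover headroom can be sketched as follows. The function name overloaded_nodes, the margin parameter, and the figures in the usage note are hypothetical; real sizing depends on measured RAM and CPU demand for your own applications.

```python
def overloaded_nodes(capacity: dict[str, float],
                     own_load: dict[str, float],
                     failover_load: dict[str, float],
                     margin: float = 0.0) -> list[str]:
    """List nodes that could not absorb a failed partner's failover load.

    capacity      -- what each node can carry (same unit as the loads)
    own_load      -- what each node carries in normal operation
    failover_load -- the load each node hands over when it fails
    margin        -- fraction of capacity held back for the failover surge
    """
    overloaded = []
    for survivor in capacity:
        usable = capacity[survivor] * (1.0 - margin)
        for failed in capacity:
            if failed == survivor:
                continue
            demand = own_load[survivor] + failover_load[failed]
            if demand > usable:
                overloaded.append(survivor)
                break  # one failing scenario is enough to flag the node
    return overloaded
```

For example, with two nodes of capacity 8 each, where NodeA normally carries 4 and NodeB carries 3, and NodeB would hand over 5 on failure while NodeA would hand over 3, the function flags NodeA: 4 plus 5 exceeds 8. Either NodeA needs more capacity or some of NodeB's load must be shed rather than failed over.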



Microsoft Windows 2000 Server Administrator's Companion, Vol. 1
ISBN: 1572318198
Year: 2000
Pages: 366
