Making Server Clustering Part of Your High-Availability Plan


EXAM 70-293 OBJECTIVE 4.11

Certain circumstances require an application to be operational more consistently than standard hardware would allow. Databases and mail servers often have this need. What if it were possible to have more than one server ready to run the critical application? What if there were a software component that automatically managed the operation of the application so that, if one server experienced a failure, another server would automatically take over and keep the application running? Such a technology exists, and it’s called server clustering.

The basic idea of server clustering has been around for many years on other computing platforms. Microsoft initially released its server cluster technology as part of Windows NT 4.0 Enterprise Edition. It supported two nodes and a limited number of applications. Server clustering was further refined with the release of the Windows 2000 Advanced Server and Datacenter Server Editions. Server clusters were simpler to create, and more applications were available. In addition, some publishers began to make their applications “cluster-aware,” so that they installed and operated more easily on a server cluster. Now, with the release of Windows Server 2003, we see another level of improvement in server clustering technology. Windows Server 2003 supports larger clusters and more robust configurations, and server clusters are easier to create and manage. Features that were available only in the Datacenter Edition of Windows 2000 are now available in the Enterprise Edition of Windows Server 2003.

Terminology and Concepts

Although the term has already been used, a more formal definition of a server cluster is in order. For our purposes, a server cluster is a group of independent servers that work together to increase application availability to client systems and that appear to clients under one common name. The independent servers that make up a server cluster are individually called nodes. Nodes in a server cluster monitor each other’s status through a communication mechanism called a heartbeat. The heartbeat is a series of messages that allow the server cluster nodes to detect communication failures and, if necessary, perform a failover operation. A failover is the process by which resources are stopped on one node and started on another.

Cluster Nodes

A server cluster node is an independent server. This server must be running Windows 2000 Advanced Server, Windows 2000 Datacenter Server, Windows Server 2003 Enterprise Edition, or Windows Server 2003 Datacenter Edition. The two editions of Windows Server 2003 cannot be used in the same server cluster, but either can exist in a server cluster with a Windows 2000 Advanced Server node. Since Windows Server 2003 Datacenter Edition is available only through original equipment manufacturers (OEMs), this chapter deals with server clusters constructed with the Enterprise Edition of Windows Server 2003 unless specifically stated otherwise.

A server cluster node should be a robust system. When designing your server cluster, do not overlook applying fault-tolerant concepts to the individual nodes. Using individual fault-tolerant components to build fault-tolerant nodes to build fault-tolerant server clusters can be described as “fault tolerance in depth.” This approach will increase overall reliability and make your life easier.

A server cluster consists of anywhere between one and eight nodes. These nodes do not necessarily need to have identical configurations, although that is a frequent design element. Each node in a server cluster can be configured to have a primary role that is different from the other nodes in the server cluster. This allows you to have better overall utilization of the server cluster if each node is actively providing services. A node is connected to one or more storage devices, which contain disks that house information about the server cluster. Each node also contains one or more separate network interfaces that provide client communications and support heartbeat communications.

Cluster Groups

The smallest unit of service that a server cluster can provide is a resource. A resource is a physical or logical component that can be managed on an individual basis and can be independently activated or deactivated (called bringing the resource online or offline). A resource can be owned by only one node at a time.

There are several predefined (called “standard”) types of resources known to Windows Server 2003. Each type is used for a specific purpose. The following are some of the most common standard resource types:

  • Physical Disk Represents and manages disks present on a shared cluster storage device. Can be partitioned like a regular disk. Can be assigned a drive letter or used as an NTFS mounted drive.

  • IP Address Manages an IP address.

  • Network Name Manages a unique NetBIOS name on the network, separate from the NetBIOS name of the node on which the resource is running.

  • Generic Service Manages a Windows operating system service as a cluster resource. Helps ensure that the service operates in one place at one time.

  • Generic Script Manages a script as a cluster resource (new to Windows Server 2003).

  • File Share Creates and manages a Windows file share as a cluster resource.

Other standard resource types allow you to manage clustered print servers, Dynamic Host Configuration Protocol (DHCP) servers, Windows Internet Name Service (WINS) servers, and generic noncluster-aware applications. (It is also possible to create new resource types through the use of dynamic link library files.)

Individual resources are combined to form cluster groups. A cluster group is a collection of cluster resources that defines the relationships of the resources within the group to each other and defines the unit of failover: if one resource moves to another node, all resources in the group move with it. As with individual resources, a cluster group can be owned by only one node at a time. To use an analogy from chemistry, resources are atoms and groups are compounds. The cluster group is the primary unit of administration in a server cluster. Similar or interdependent resources are combined into the same group; a resource cannot depend on another resource that is not in the same cluster group. Most cluster groups are designed around either an application or a storage unit. It is in this way that individual applications or disks in a server cluster are controlled independently of other applications or disks.
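To make the atoms-and-compounds analogy concrete, the following cluster.exe sketch shows how a file-share group and its dependent resources might be built from the command line. This is a hedged example: the cluster, group, and resource names are placeholders, the resources would still need their private properties (IP address, share path, and so on) set before being brought online, and the exact resource-command options should be confirmed with cluster res /? on your own nodes.

  REM Create an empty group to hold the related resources
  cluster /cluster:cluster1 group "File Share Group" /create

  REM Create the resources that make up the virtual server
  cluster /cluster:cluster1 resource "FS IP Address" /create /group:"File Share Group" /type:"IP Address"
  cluster /cluster:cluster1 resource "FS Network Name" /create /group:"File Share Group" /type:"Network Name"
  cluster /cluster:cluster1 resource "FS Share" /create /group:"File Share Group" /type:"File Share"

  REM Dependencies must stay within the group: the network name depends on the
  REM IP address, and the file share depends on the network name
  cluster /cluster:cluster1 resource "FS Network Name" /adddep:"FS IP Address"
  cluster /cluster:cluster1 resource "FS Share" /adddep:"FS Network Name"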

Failover and Failback

If a resource on a node fails, the cluster service will first attempt to reactivate the resource on the same node. If unable to do so, the cluster service will move the cluster group to another node in the server cluster. This process is called a failover. A failover can be triggered manually by the administrator or automatically by a node failure. A failover can involve multiple nodes if the server cluster is configured this way, and each group can have different failover policies defined.

A failback is the corollary of a failover. When the original node that hosted the failed-over resource(s) comes back online, the cluster service can return the cluster group to operation on the original node. This failback policy can be defined individually for a cluster group or disabled entirely. Failback is usually performed at times of low utilization to avoid impacting clients, and it can be set to follow specific schedules.
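Both operations can also be driven from the command line. The sketch below is hedged: the group and node names are placeholders, and the failback property names (AutoFailbackType, FailbackWindowStart, FailbackWindowEnd) should be verified against the output of cluster group <name> /prop on your own cluster.

  REM Manually fail a group over to another node
  cluster /cluster:cluster1 group "File Share Group" /moveto:NODE2

  REM Allow automatic failback, but only during a low-utilization window (10 P.M. to 11 P.M.)
  cluster /cluster:cluster1 group "File Share Group" /prop AutoFailbackType=1 FailbackWindowStart=22 FailbackWindowEnd=23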

Cluster Services and Name Resolution

A server cluster appears to clients as one common name, regardless of the number of nodes in the server cluster. It is for this reason that the server cluster name must be unique on your network. Ensure that the server cluster name is different from the names of other server clusters, domain names, servers, and workstations on your network. The server cluster will register its name with the WINS and DNS servers configured on the node running the default cluster group.

Individual applications that run on a server cluster can (and should) be configured to run in separate cluster groups. The applications must also have unique names on the network and will also automatically register with WINS and DNS. Do not use static WINS entries for your resources. Doing so will prevent an update to the WINS registered address in the event of a failover.

How Clustering Works

Each node in a server cluster is connected to one or more storage devices. These storage devices contain one or more disks. If the server cluster contains two nodes, you can use either a SCSI interface to the storage devices or a Fibre Channel interface. For three or more node server clusters, Fibre Channel is recommended. If you are using a 64-bit edition of Windows Server 2003, Fibre Channel is the required interface, regardless of the number of nodes.

Fibre Channel has many benefits over SCSI. Fibre Channel is faster and easily expands beyond two nodes. Fibre Channel cabling is simpler, and Fibre Channel automatically configures itself. However, Fibre Channel is also more expensive than SCSI, requires more components, and can be more complicated to design and manage.

On any server cluster, there is something called the quorum resource. The quorum resource is used to determine the state of the server cluster. The node that controls the quorum resource controls the server cluster, and only one node at a time can own the quorum resource. This prevents a situation called split-brain, which occurs when more than one node believes it controls the server cluster and behaves accordingly. Split-brain was a problem that occurred in the early development of server cluster technologies. The introduction of the quorum resource solved this problem.
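The quorum resource can be inspected, and relocated if necessary, with the /quorumresource option of cluster.exe (shown later in Figure 9.16). This is a hedged sketch; the cluster name, disk resource name, path, and log size are examples only.

  REM Display the current quorum resource, path, and maximum log size
  cluster /cluster:cluster1 /quorumresource

  REM Move the quorum to a different clustered disk and raise the log size to 8MB
  cluster /cluster:cluster1 /quorumresource:"Disk Q:" /path:Q:\MSCS /maxlogsize:8192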

Cluster Models

There are three basic server cluster design models available to choose from: single node, single quorum, and majority node set. Each is designed to fit a specific set of circumstances. Before you begin designing your server cluster, make sure you have a thorough understanding of these models.

Exam Warning

Make sure you understand the differences between the server cluster models and the circumstances in which each is normally used.

Single Node

A single-node server cluster model is primarily used for development and testing purposes. As its name implies, it consists of one node. An external disk resource may or may not be present. If an external disk resource is not present, the local disk is configured as the cluster storage device, and the server cluster configuration is kept there.

Failover is not possible with this server cluster model, because there is only one node. However, as with any server cluster model, it is possible to create multiple virtual servers. (A virtual server is a cluster group that contains its own dedicated IP address, network name, and services and is indistinguishable from other servers from a client’s perspective.) Figure 9.1 illustrates the structure of a single-node server cluster.

Figure 9.1: Single Node Server Cluster

If a resource fails, the cluster service will attempt to automatically restart any applications and dependent resources. This can be useful when applied to applications that do not have built-in restart capabilities but would benefit from that capability.

Some applications that are designed for use on server clusters will not work on a single-node cluster model. Microsoft SQL Server and Microsoft Exchange Server are two examples. Applications like these require the use of one of the other two server cluster models.

Single Quorum Device

The single quorum device server cluster model is the most common and will likely continue to be the most heavily used. It has been around since Microsoft first introduced its server clustering technology.

This type of server cluster contains two or more nodes, and each node is connected to the cluster storage devices. There is a single quorum device (a physical disk) that resides on the cluster storage device. There is a single copy of the cluster configuration and operational state, which is stored on the quorum resource.

Each node in the server cluster can be configured to run different applications or to act simply as a hot-standby device waiting for a failover to occur. Figure 9.2 illustrates the structure of a single quorum device server cluster with two nodes.

Figure 9.2: Single Quorum Device Server Cluster

Majority Node Set

The majority node set (MNS) model is new in Windows Server 2003. Each node in the server cluster may or may not be connected to a shared cluster storage device. Each node maintains its own copy of the server cluster configuration data, and the cluster service is responsible for ensuring that this configuration data remains consistent across all nodes. Synchronization of quorum data occurs over Server Message Block (SMB) file shares. This communication is unencrypted. Figure 9.3 illustrates the structure of the MNS model.

Figure 9.3: A Majority Node Set Server Cluster

This model is normally used as part of an OEM predesigned or preconfigured configuration. It has the ability to support geographically distributed server clusters. When used in geographically dispersed configurations, network latency becomes an issue. You must ensure that the round-trip network latency is a maximum of 500 milliseconds (ms), or you will experience availability problems.

The behavior of an MNS server cluster differs from that of a single quorum device server cluster. In a single quorum device server cluster, one node can fail and the server cluster can still function. This is not necessarily the case in an MNS cluster. To avoid split-brain, a majority of the nodes must be active and available for the server cluster to function. In essence, this means that more than half of the nodes must be operational at all times for the server cluster to remain operational. Table 9.1 illustrates this relationship.

Table 9.1: Majority Node Set Server Cluster Failure Tolerance

Number of Nodes in     Maximum Node Failures before     Nodes Required to Continue
MNS Server Cluster     Complete Cluster Failure         Cluster Operations

        1                          0                               1
        2                          0                               2
        3                          1                               2
        4                          1                               3
        5                          2                               3
        6                          2                               4
        7                          3                               4
        8                          3                               5
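The values in Table 9.1 follow from simple integer arithmetic: the number of nodes required is the node count divided by two (discarding any remainder) plus one. As a quick sanity check (a hedged sketch; the node count is an example), the same figures can be reproduced at a command prompt:

  REM Majority math for an MNS cluster (integer division truncates)
  set /a NODES=5
  set /a REQUIRED=NODES/2+1
  set /a TOLERATED=NODES-REQUIRED
  echo With %NODES% nodes: %REQUIRED% required to continue, %TOLERATED% failures tolerated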

Server Cluster Deployment Options

EXAM 70-293 OBJECTIVE 4.3

When you use either the single quorum device model or MNS model, there are a variety of ways that you can configure your clustered applications to act during a failover operation. The choices vary with the number of nodes in your server cluster, and each has advantages and disadvantages.

These deployment options are not always mutually exclusive. In a server cluster with several nodes and multiple cluster groups, it is possible that some groups will use one deployment option while other groups use a different one. Consider these options carefully when you design larger server clusters.

Exam Warning

Expect questions related to the cluster deployment options. A good understanding of each deployment option, how the options are configured, and the advantages/disadvantages of each will help you on the exam.

N-Node Failover Pairs

The N-node failover pairs deployment option specifies that two nodes, and only two nodes, may run the application. This is the simplest option and is, in essence, the only option available in a two-node server cluster. If configured in a larger server cluster with three or more nodes, the application will not be able to function if both nodes are not operational. In larger server clusters made up of nodes with different processing capabilities or capacities, you can use this option to limit an application to running on only the nodes capable of adequately servicing the application.

An N-node failover pair is configured by specifying the two nodes in the Possible Owners property for the cluster resource, as shown in Figure 9.4. You can set the Possible Owners property using the server cluster administrative tools described in the “Server Cluster Administration” section later in this chapter. Every cluster resource has a Possible Owners property that can be configured or left blank.

Figure 9.4: Setting the Possible Owners Property
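The same restriction can be applied or checked from the command line. This is a hedged sketch: the resource and node names are placeholders, and the resource-command options should be confirmed with cluster res /? on your nodes.

  REM Show which nodes may currently own the resource
  cluster /cluster:cluster1 resource "SQL IP Address" /listowners

  REM Restrict the failover pair to NODEA and NODEB by removing the remaining nodes
  cluster /cluster:cluster1 resource "SQL IP Address" /removeowner:NODEC
  cluster /cluster:cluster1 resource "SQL IP Address" /removeowner:NODED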

Figure 9.5 illustrates an N-node failover configuration in a server cluster with four nodes—A, B, C and D—in its normal operational state. Nodes A and B are configured as a failover pair, and nodes C and D are also a failover pair. Assorted virtual servers are active and are spread among the nodes.

Figure 9.5: N-Node Failover, Initial State

Figure 9.6 shows the same server cluster as Figure 9.5, but after two of the nodes failed. As you can see, node B has taken ownership of the virtual servers that were operating on its failover partner (node A). Node C has also taken ownership of node D’s virtual servers.

Figure 9.6: N-Node Failover, Failed State

Note that Figures 9.5 and 9.6 depict a single quorum device server cluster. An MNS server cluster with four nodes could not operate with two failed nodes. The storage devices and interconnects have been removed from the images for clarity.

Hot-Standby Server/N+I

The hot-standby server/N+I deployment option is possible on server clusters with two or more nodes and is sometimes referred to as an active/passive design. In this design, you specify one node in the server cluster as a hot spare. This hot-spare node is normally idle or lightly loaded. It acts as the failover destination for other nodes in the cluster.

The main advantage of this option is cost savings. If a two-node server cluster is configured with one node running the application(s) (the N or active node) and one node standing idle, waiting for a failover (the I or passive node), the overhead cost in hardware is 50 percent. In an eight-node server cluster with seven N (active) nodes and one I (passive) node, the overhead cost is about 15 percent.

This option is not limited to a single hot-spare node. An eight-node server cluster could be configured with one N node and seven I nodes or any other possible combination. In these configurations, the overhead cost savings would be quite a bit less or nonexistent.

Configure this option by setting the Preferred Owners property of the group to the N node(s), as shown in Figure 9.7, and the Possible Owners of the resources to the N and I nodes. As mentioned earlier, the Possible Owners property is a property of the individual resource. The Preferred Owner property, however, applies only to cluster groups. Both the Possible Owners and Preferred Owners properties are configured via the server cluster administrative tools, which are covered in the “Server Cluster Administration” section later in this chapter.

Figure 9.7: Setting the Preferred Owners Property
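From the command line, the same configuration amounts to setting the group's preferred owner to the active node while leaving the standby node among the possible owners of the resources. A hedged sketch with placeholder group, resource, and node names (confirm the group-command syntax with cluster group /?):

  REM NODEA is the preferred (active) owner of the group
  cluster /cluster:cluster1 group "Exchange Group" /setowners:NODEA

  REM Verify that the standby node is still listed as a possible owner of the resources
  cluster /cluster:cluster1 resource "Exchange IP Address" /listowners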

Figure 9.8 illustrates a four-node server cluster configured with three active (N) nodes and one passive (I) node in its normal operational state. Each active node supports various virtual servers.

Figure 9.8: Hot-Standby/N+I Configuration, Initial State

Figure 9.9 shows the same server cluster as Figure 9.8, but after the failure of two of the nodes. The virtual servers that were operating on the failed nodes have failed over to the I node. Again, if this were an MNS server cluster, there would not be enough nodes operating to support the server cluster: the MNS cluster would have failed when the second node failed, although the virtual servers from the first failed node would already have failed over to the I node. Again, note that the storage devices and interconnects have been removed from both images.

Figure 9.9: Hot Standby/N+I Configuration, Failed State

Failover Ring

A failover ring is mainly used when all nodes in a server cluster are active. When a failover occurs, applications are moved to the next node in line. This mode is possible if all nodes in the server cluster have enough excess capacity to support additional applications beyond what they normally run. If a node is operating at peak utilization, a failover to that node may reduce performance for all applications running on that node after the failover.

The order of failover is defined by the order the nodes appear in the Preferred Owner list (see Figure 9.7). The default node for the application is listed first. A failover will attempt to move the cluster group to each node on the list, in order, until the group successfully starts.
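For example (a hedged sketch with placeholder group and node names; the order of the nodes given to /setowners is significant), a three-node ring can be expressed by rotating the Preferred Owner list for each group:

  REM Each group lists its home node first, followed by the next nodes in the ring
  cluster /cluster:cluster1 group "VS1" /setowners:NODEA,NODEB,NODEC
  cluster /cluster:cluster1 group "VS2" /setowners:NODEB,NODEC,NODEA
  cluster /cluster:cluster1 group "VS3" /setowners:NODEC,NODEA,NODEB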

It is possible to limit the size of the failover ring by not specifying all the cluster nodes on the Preferred Owner list. In effect, this combines the N+I and failover ring options to produce a hybrid option. This hybrid option reduces the N+I overhead cost to zero, but you need to make sure that enough capacity is present to support your applications.

Figure 9.10 illustrates an eight-node server cluster in a failover ring configuration in its initial state. To simplify the diagram, each node is running one virtual server. (The configuration of the failover ring in this scenario is very simple: each node fails over to the next node, with the last node set to fail over to the first, and so on.) Storage devices and interconnects have been removed for clarity.

Figure 9.10: Failover Ring Configuration, Initial State

Figure 9.11 illustrates the failover ring configuration after the server cluster has experienced a failure of half of its nodes. Notice how node F has picked up the virtual servers from nodes D and E, and how node A has picked up the virtual server from node H. Again, if this were an MNS server cluster, there would not be enough nodes left operational for the server cluster to function. As usual, storage devices and interconnects have been removed from the image for clarity.

Figure 9.11: Failover Ring Configuration, Failed State

Random

The random deployment option makes the cluster service determine the destination for a failover. This option is used in large server clusters where each node is active and it is difficult to specify an order of failover because of the needs and complexity of the environment. When adopting this option, it is important to make sure that each node has sufficient excess capacity to handle additional load. Otherwise, a failover may reduce performance for applications running on a node that is at or near peak capacity.

This mode is configured by not defining a Preferred Owner for the resource group. The cluster service will attempt to determine a suitable node for the application in the event of a failover. Figure 9.12 illustrates a random failover configuration in the initial state.

Figure 9.12: Random Configuration, Initial State

It shows a server cluster of eight nodes, each supporting two virtual servers, in its normal operating state.

Figure 9.13 shows the same configuration after this server cluster has experienced a failure of three of its nodes. Notice how the virtual servers have been distributed seemingly at random to the surviving nodes. If this were an MNS server cluster, it would still be functioning.

Figure 9.13: Random Configuration, Failed State

Server Cluster Administration

After a server cluster is operational, it must be administered. There are two tools provided to you to accomplish this: Cluster Administrator, an interactive graphical utility, and Cluster.exe, provided for use at the command line and in scripts or batch files.

Using the Cluster Administrator Tool

To access the Cluster Administrator utility, select Start | Administrative Tools | Cluster Administrator. The Cluster Administrator utility, shown in Figure 9.14, allows you to create a new server cluster, add nodes to an existing server cluster, and perform administrative tasks on a server cluster.

Figure 9.14: The Cluster Administrator Window

At the Open Connection to Cluster dialog box, shown in Figure 9.15, you can enter the name of a server cluster or browse for it.


Figure 9.15: The Open Connection Dialog Box

If you wish to create a new server cluster, select Create new cluster in the Action drop-down list box and click OK. This will start the New Server Cluster Wizard, which will step you through the process of creating a new server cluster. Selecting Add nodes to cluster in the Action drop-down list will start the Add Nodes Wizard. This Wizard lets you add nodes to an existing server cluster.

Using Command-Line Tools

Cluster.exe is the command-line utility you can use to create or administer a server cluster. It has all of the capabilities of the Cluster Administrator graphical utility and more. Cluster.exe has numerous options. Figure 9.16 shows the syntax of the cluster.exe command and the options you can use with it.

CLUSTER /LIST[:domain-name]

CLUSTER /CHANGEPASS[WORD] /?
CLUSTER /CHANGEPASS[WORD] /HELP
CLUSTER /CLUSTER:clustername1[,clustername2[,...]]
        /CHANGEPASS[WORD][:newpassword[,oldpassword]] <options>
<options> =
  [/FORCE] [/QUIET] [/SKIPDC] [/TEST] [/VERB[OSE]] [/UNATTEND[ED]] [/?] [/HELP]

CLUSTER [/CLUSTER:]cluster-name <options>
<options> =
  /CREATE [/NODE:node-name] [/VERB[OSE]] [/UNATTEND[ED]] [/MIN[IMUM]]
    /USER:domain\username | username@domain [/PASS[WORD]:password]
    /IPADDR[ESS]:xxx.xxx.xxx.xxx[,xxx.xxx.xxx.xxx,network-connection-name]
  /ADD[NODES][:node-name[,node-name ...]] [/VERB[OSE]] [/UNATTEND[ED]]
    [/MIN[IMUM]] [/PASSWORD:service-account-password]

CLUSTER [[/CLUSTER:]cluster-name] <options>
<options> =
  /CREATE [/NODE:node-name] /WIZ[ARD] [/MIN[IMUM]]
    [/USER:domain\username | username@domain] [/PASS[WORD]:password]
    [/IPADDR[ESS]:xxx.xxx.xxx.xxx]
  /ADD[NODES][:node-name[,node-name ...]] /WIZ[ARD] [/MIN[IMUM]]
    [/PASSWORD:service-account-password]
  /PROP[ERTIES] [<prop-list>]
  /PRIV[PROPERTIES] [<prop-list>]
  /PROP[ERTIES][:propname[,propname ...] /USEDEFAULT]
  /PRIV[PROPERTIES][:propname[,propname ...] /USEDEFAULT]
  /REN[AME]:cluster-name
  /QUORUM[RESOURCE][:resource-name] [/PATH:path] [/MAXLOGSIZE:max-size-kbytes]
  /SETFAIL[UREACTIONS][:node-name[,node-name ...]]
  /LISTNETPRI[ORITY]
  /SETNETPRI[ORITY]:net[,net ...]
  /REG[ADMIN]EXT:admin-extension-dll[,admin-extension-dll ...]
  /UNREG[ADMIN]EXT:admin-extension-dll[,admin-extension-dll ...]
  /VER[SION]
  NODE [node-name] node-command
  GROUP [group-name] group-command
  RES[OURCE] [resource-name] resource-command
  {RESOURCETYPE|RESTYPE} [resourcetype-name] resourcetype-command
  NET[WORK] [network-name] network-command
  NETINT[ERFACE] [interface-name] interface-command

<prop-list> =
  name=value[,value ...][:<format>] [name=value[,value ...][:<format>] ...]

<format> =
  BINARY|DWORD|STR[ING]|EXPANDSTR[ING]|MULTISTR[ING]|SECURITY|ULARGE

CLUSTER /?
CLUSTER /HELP

Note: With the /CREATE, /ADDNODES, and /CHANGEPASSWORD options, you will be
      prompted for passwords not provided on the command line unless you also
      specify the /UNATTENDED option.

Figure 9.16: Cluster.exe Command Options

The following are some of the tasks that are impossible to do with Cluster Administrator or that are easier to perform with Cluster.exe (hedged command sketches follow the list):

  • Changing the password on the cluster service account

  • Creating a server cluster or adding a node to a server cluster from a script

  • Creating a server cluster as part of an unattended setup of Windows Server 2003

  • Performing operations on multiple server clusters at the same time
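The following hedged sketches correspond to the tasks above and follow the syntax shown in Figure 9.16. The cluster names, node names, account, and addresses are placeholders, and passwords are deliberately omitted so that cluster.exe prompts for them.

  REM Change the cluster service account password on two clusters at once
  cluster /cluster:cluster1,cluster2 /changepassword /verbose

  REM Create a new server cluster from a script (you will be prompted for the password)
  cluster /cluster:cluster1 /create /node:NODE1 /user:CONTOSO\ClusterAdmin /ipaddress:192.168.1.50,255.255.255.0,Public

  REM Add two nodes to the existing server cluster
  cluster /cluster:cluster1 /addnodes:NODE2,NODE3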

Recovering from Cluster Node Failure

EXAM 70-293 OBJECTIVE 4.3.2

It is reasonable to assume that on any server cluster, you will eventually have a component failure or need to take part of the server cluster offline for service. A properly designed and maintained server cluster should weather these events without a loss of service. But what if something causes an entire node to fail? For example, if a local hard disk in the node crashes, how do you recover?

Many of the same basic administrative tasks performed on nonclustered servers apply to clustered ones. Following the same practices will help prevent unplanned downtime and assist in restoring service when service is lost:

  • Have good documentation Proper and complete documentation is the greatest asset you can have when trying to restore service. Configuration and contact information should also be included in your documentation.

  • Perform regular backups and periodically test restores Clusters need to be backed up just like any other computer system. Periodically testing a restore will help keep the process fresh and help protect against hardware, media, and some software failures. (A hedged ntbackup sketch follows this list.)

  • Perform Automated System Recovery (ASR) backups When performing an ASR backup on your server cluster, make sure that one node owns the quorum resource during the ASR backup. If you need an ASR restore, this will be a critical component.

  • Develop performance baselines A performance baseline should be developed for each node and the server cluster as a whole. This will help you determine if your server cluster is not performing properly or is being outgrown.
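As one concrete example of the backup guidance above (a hedged sketch; the job name and backup path are placeholders), the built-in ntbackup utility can capture a node's System State, which on a cluster node includes the cluster configuration database. This does not replace ASR backups or backups of the clustered disks.

  REM Back up this node's System State to a local backup file
  ntbackup backup systemstate /J "Node1 System State" /F "D:\Backups\Node1-SystemState.bkf"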

If a node experiences a failure, any groups that were on the failed node should be moved to another node (unless you are using the single-node model). You should then repair the failed components in the node in the same way as you would repair any computer system.

If repairing the node involves the replacement of the boot and/or system drives, you may need to do an ASR restore. As a precaution, you should physically disconnect the node from the cluster’s shared storage devices first. Once the ASR restore is complete, shut down the node, reconnect it to the shared storage devices, and boot the node.

Server Clustering Best Practices

There are many ways to accomplish the setup and operation of a server cluster, but some methods are more reliable than others. Microsoft has published a number of “Best Practices” documents relating to its products and technologies, and server clusters are no exception.

Head of the Class...
Preparation Is Key

One of the great secrets of successfully building server clusters is extensive preparation. This can (and probably will) be tedious and time-consuming, but it will make your installation much more likely to succeed.

In addition to a good design and thorough documentation, appropriate hardware preparation is critical. Ensure that all the hardware components work correctly and that their firmware is up-to-date. If you are using identically configured nodes, make sure that they are installed identically, even down to the slots in which the expansion cards are installed.

As an example, I was once required to create four clustered configurations in four days. Assembling and configuring the hardware and updating firmware took three and a half days. This was time well spent, because the actual installation of the cluster services and software took only three hours to complete on all four server clusters.


Hardware Issues

The foundation of your server cluster is the hardware. It is critical to build reliable nodes at the hardware level. You cannot build high availability from unreliable or unknown components.

Compatibility List

Microsoft’s position, ever since it first released its cluster technology, has been that the hardware components used in a server cluster, and the entire server cluster configuration itself, must be listed on the Hardware Compatibility List (HCL) in order to receive support. With the introduction of Windows XP, Microsoft changed from the HCL to the Windows Catalog. Windows Server 2003-compatible hardware is listed in the Windows Server Catalog, but the concept and support requirements remain the same as they were with the HCL.

In order to receive technical support from Microsoft, ensure that your entire hardware configuration is listed as compatible in the Windows Server Catalog. Using unlisted hardware does not mean you cannot make the hardware work; it simply means that you cannot call Microsoft for help if the need arises.

Network Interface Controllers

A server cluster requires at least two network interfaces to function: one for the public network (where client requests come from) and one for the private interconnect network (for the heartbeat). Since a single private interconnect would present a single point of failure, it is good practice to have at least two interconnects. Do not use a teamed configuration for the interconnects. A teamed configuration binds two or more physical interfaces together into one logical interface; because the team still appears to the cluster as a single interface, the single point of failure remains.

Network controllers should be identical. This includes not only the manufacturer and model, but also the firmware and drivers. Using identical controllers will also simplify the design of your server cluster and make troubleshooting easier.

Change the default name of each network interface to a descriptive name. Use Heartbeat, Interconnect, or Private for the interconnect interface. Similarly, use Public, Primary, or some similar name for the public interfaces. You should configure these names identically on each node. Following this procedure will make identifying and troubleshooting network issues much easier.
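Interfaces can be renamed in the Network Connections folder or from the command line. A hedged netsh sketch follows; the original connection names are examples and will differ on your nodes.

  REM Rename the heartbeat adapter so it is easy to identify on every node
  netsh interface set interface name="Local Area Connection 2" newname="Heartbeat"

  REM Rename the client-facing adapter
  netsh interface set interface name="Local Area Connection" newname="Public"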

Storage Devices

No single resource in a server cluster requires more planning and preparation than shared storage. Poor planning can make management tasks quite difficult. Planning cluster disk resources requires attention to numerous details.

First, thorough planning must be done for the acquisition of the shared disk hardware. Develop capacity requirements and design disk layouts. Dynamic disks, volume sets, remote storage, removable storage, and software-based RAID cannot be used for shared cluster disks. Plan on using hardware RAID, and purchase extra hard disks for use as RAID hot spares.

If a single RAID controller is part of the design (likely in a single-node cluster), make sure that you keep an identical spare RAID controller on hand. The spare should be the exact brand and model and have the same firmware version as your production RAID controller.

If you are using Fibre Channel-based controllers, consider using multiple Fibre Channel host bus adapters (HBAs) configured in either a load-balanced or failover configuration. This will increase the cost of the cluster, but fault-tolerance will also increase. Before purchasing redundant HBAs, make sure that they are of the same brand, model, and firmware version. Also, ensure that the hardware vendor includes any necessary drivers or software to support the redundant HBA configuration.

If you are using SCSI-based controllers, ensure that each SCSI adapter on the shared storage bus is configured with a different SCSI ID. Also ensure that the shared SCSI bus is properly terminated. If either of these tasks is not done properly, data could be lost, hardware could be damaged, or the second cluster node may not properly join the cluster.

Use caution with write caching of shared disks. If power fails or a failover occurs before data is completely written to disk, data can be corrupted or lost. Disable write caching in Device Manager by clearing the Enable write caching on the disk check box on the Policies tab in the Properties of the drive, shown in Figures 9.17 and 9.18. If the RAID controller supports write caching, either disable the feature or ensure that battery backup for the cache or an alternate power supply for the controller is available.

Figure 9.17: Accessing Disk Drive Properties in Device Manager

Figure 9.18: Disabling Write Caching on a Drive through Device Manager

When starting the installation of the first node, ensure that the first node is the only node on the shared storage bus. This must be done to properly partition and format the drives in the shared storage. Until the cluster service is installed, other nodes can access the shared disks and cause data corruption.

If you are using a sophisticated disk system for shared cluster storage, use the features of the system to create logical drives that your nodes will access. This step is necessary because the disk is the smallest unit of storage that is recognized as a cluster resource. All of the partitions on a disk move with the disk between cluster nodes.

Once the first node is booted, format your shared drives. Only the NTFS file system is supported on clustered disks. The quorum drive should be created first. A minimum of 500MB should be assigned to the quorum drive, and no applications should reside on it. Partition and format the rest of your clustered drives as planned. Assign drive letters as you normally would, as shown in Figure 9.19, and document them. You can assign any drive letters that are not already in use, but it is a good idea to adopt the convention of assigning the quorum drive the same drive letter each time you create a cluster—Q (for quorum) is a good choice. Once you have assigned drive letters, you will need to match these drive-letter assignments on each node in the cluster.

Figure 9.19: Configuring Clustered Disks in Disk Management
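The same work can be done from the command line. The hedged transcript below prepares the quorum drive with diskpart and format; the disk number is an example and must match your shared storage layout, and it should be run on the first node only, while that node is the only one attached to the shared bus.

  C:\> diskpart
  DISKPART> list disk
  DISKPART> select disk 1
  DISKPART> create partition primary
  DISKPART> assign letter=Q
  DISKPART> exit
  C:\> format Q: /FS:NTFS /V:Quorum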

In addition to drive-letter assignments, you also have the option of using NTFS mounted drives. A mounted drive does not use a drive letter, but appears as a folder on an existing drive. Mounted drives on clustered storage must be mounted to a drive residing on shared storage in the same cluster group and are dependent on this “root” disk.

Planning sufficient allocation of disk space for your applications is critical. Since you cannot use dynamic disks on shared storage without using third-party tools, it is difficult to increase the size of clustered disks. Be sure to allow for data growth when initially sizing your partitions. This is a situation where it is better to allocate a few megabytes too many than a few kilobytes too few.

If you plan on using the generic script resource type, make sure the script file resides on a local disk, not a shared disk. It is possible for errant scripts to be the cause of a failover, and if a script resides on a clustered disk, the script “disappears” from under the node executing it. By keeping the scripts on a local disk, they remain available to the node at all times, and the appropriate error-checking logic can be used when errors are encountered.

Power-Saving Features

Windows Server 2003 includes power-management features that allow you to reduce the power consumed by your servers. This is very useful on laptop computers and some small servers, but can cause serious problems if used on clustered servers. If more than one node were to enter a standby or hibernation state, the server cluster could fail.

The power-saving options in Windows Server 2003 must be disabled for server clusters. Nodes should be configured to use the Always On power scheme, as shown in Figure 9.20. To access this option, select Start | Control Panel | Power Options. Using this power scheme will prevent the system from shutting down its hard drives and attempting to enter a standby or hibernation state.

Figure 9.20: Enabling the Always On Power Scheme
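The same scheme can be applied from the command line with the powercfg utility. This is a hedged sketch; the scheme name must match the name displayed in Power Options on your system.

  REM Activate the Always On power scheme on this node and confirm the change
  powercfg /setactive "Always On"
  powercfg /query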

Cluster Network Configuration

EXAM 70-293 OBJECTIVE 4.3

Communications are a critical part of server cluster operations. Nodes must communicate with each other directly over the interconnects in order to determine each other’s health and, if necessary, initiate a failover. Nodes must also communicate with client systems over the public network to provide services. Both of these networks require proper planning.

When referring to server clusters, there are four types of networks:

  • Internal cluster communications only (private network) Used by nodes to handle their communication requirements only. No clients are present on this network. This network should be physically separated from other networks and must have good response times (less than 500 ms) in order to avoid availability problems.

  • Client access only (public network) Used to service client requests only. No internal cluster communication occurs over this network.

  • All communications (mixed network) Can handle both categories of communications traffic. Normally, this network acts as a backup to a private network, but that is not required.

  • Nonclustered network (disabled) Unavailable for use by the cluster for either servicing clients or for internal communications.

When you create the server cluster through the New Server Cluster Wizard, it will detect the different networks configured in the server. You will be asked to select the role each network will have in the server cluster. Select Internal cluster communications only (private network) for the interconnect(s), as shown in Figure 9.21, instead of accepting the default value (which will mix the server cluster heartbeat traffic with client communication traffic).

Figure 9.21: Configuring Interconnect Networks

If you are using only a single interconnect, you should configure at least one public network interface with the All communications (mixed network) setting, as shown in Figure 9.22. This allows the server cluster to have a backup path for internal server cluster communications, if one is needed. If you have multiple interconnects configured, you should set the public interfaces to the Client access only (public network) setting.

Figure 9.22: Configuring Public Networks

Multiple Interconnections

At least one interconnect between nodes is required. Node status messages are passed over this communication path. If this path becomes unavailable, a failover may be initiated. Because of this, multiple interconnects are recommended.

If your server cluster is configured with multiple interconnects, the reliability of the interconnects goes up. If a heartbeat message on one interconnect path goes unanswered, the node will attempt to use the other interconnect paths before initiating a failover. As with most components in a high-availability system, redundancy is good.

When using multiple interconnects, follow the same rules previously stated for configuration, but try to avoid using multiple ports on the same multiport network interface card (NIC). If the card fails, you will lose the interconnect. If you are using two dual-port cards, try to configure the system to use one port on each card for interconnects and the other port for your public network.

Node-to-Node Communication

The interconnects are used by the nodes to determine each other’s status. This communication is unencrypted and frequent. Normal client activity does not occur on this network, so you should not have client-type services assigned to the network interface used for interconnects. Windows Server 2003 normally attaches the following services to each network interface:

  • Client for Microsoft Networks

  • Network Load Balancing

  • File and Printer Sharing for Microsoft Networks

  • Internet Protocol (TCP/IP)

You should uncheck the first three services from each interconnect interface (the properties of a network interface are accessible via Start | Control Panel | Network Connections). Only TCP/IP should be assigned. Figure 9.23 shows a properly configured interconnect interface.

Figure 9.23: Configuring an Interconnect Interface

You should also make sure that the Network Priority property of the server cluster is configured with the interconnect(s) given highest priority, as shown in Figure 9.24. This ensures that internal cluster communication attempts are made on the interconnects first. To access this property, in Cluster Administrator, right-click the server cluster name and select Properties.

Figure 9.24: Setting the Network Priority Property of the Cluster
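The same priority can be reviewed and changed from the command line with the /listnetpriority and /setnetpriority options shown in Figure 9.16. A hedged sketch; the cluster and network names are examples and must match the names defined in your cluster.

  REM Show the current internal communication priority of the cluster networks
  cluster /cluster:cluster1 /listnetpriority

  REM Place the interconnect ahead of the public network
  cluster /cluster:cluster1 /setnetpriority:Interconnect,Public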

Binding Order

Binding is the process of linking the various communications components together in the proper order to establish the communications path. To configure the binding order of communication protocols and services to the network interfaces, select Start | Control Panel | Network Connections. Click the Advanced menu and select Advanced Settings…. When establishing the order of network connections, you should ensure that the public interfaces appear highest on the list, followed by the interconnects, and then any other interfaces. Figure 9.25 shows this binding order.

Figure 9.25: Setting the Proper Binding Order of Interfaces

Adapter Settings

All network interfaces in a server cluster should be manually set for speed and duplex mode. Do not allow the network adapters to attempt to auto-negotiate these settings. If the controllers negotiate differently, your communications can be disrupted. Also, in many cases, a crossover cable is used on the interconnects. In these cases, an auto-negotiation may fail entirely, and the interconnect may never be established, affecting cluster operation.

As mentioned earlier, teamed network adapters must not be used for interconnects. However, they are perfectly acceptable for the public network interfaces. A failover or load-balanced configuration increases redundancy and reliability.

TCP/IP Settings

Static IP addresses (along with the relevant DNS and WINS information) should be used on public network interfaces. For the interconnects, you must use static IP addresses.

It is also a good practice to assign private IP addresses on the interconnects from a different address class than the one used on your public network. For example, if you are using class A addresses (10.x.x.x) on your public interface, you could use class C addresses (192.168.x.x) on your interconnects. Following this practice lets you identify at a glance which type of network you are troubleshooting just by looking at the address class. Using addresses this way is not required, but it does prove useful.

Finally, you should not configure IP gateway, DNS, or WINS addresses on your interconnect interfaces. Name resolution is usually not required on interconnects and, if configured, could cause conflicts with name resolution on your public interfaces. All public interfaces must reside on the same IP subnet. Likewise, all interconnect interfaces must reside on the same IP subnet.
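A hedged netsh sketch of such an interconnect configuration follows; the connection name and addresses are examples. The address is static, and no gateway, DNS, or WINS entries are supplied.

  REM Static address only; no default gateway, DNS, or WINS on the heartbeat network
  netsh interface ip set address name="Heartbeat" source=static addr=192.168.10.1 mask=255.255.255.0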

The Default Cluster Group

Every server cluster has at least one cluster group: the default. This group contains the following resources:

  • Quorum disk (which contains the quorum resource and logs)

  • Cluster IP address

  • Cluster name (which creates the virtual server)

When designing your server cluster, you should not plan on using these resources for anything other than system administration. If this group is offline for any reason, cluster operation can be compromised. Do not install applications on the quorum drive or in the default cluster group.

Security

Security is a consideration for any computer system. Server clusters are no exception. In fact, because they often contain critical information, they should usually be more closely guarded than a standard server.

Physical Security

Nodes should be kept in controlled environments and behind locked doors. More downtime is caused by accident than by intent. It is obvious that you would not want an unhappy or ex-employee to have access to your computer systems, but what about the curious user? Both can lead to the same end.

When setting up physical security, do not forget to include the power systems, network switches and routers, keyboards, mice, and monitors. Unauthorized access to any of these can lead to an unexpected outage.

Public/Mixed Networks

It is a good idea to isolate critical server clusters behind firewalls if possible. A properly configured firewall will also allow you to control the network traffic your server cluster encounters.

If there are infrastructure servers (DNS, WINS, and so on) that are relied on to access the server cluster, make sure that those servers are secured as well. If, for example, name resolution fails, it is possible that clients will not be able to access the server cluster even though it is fully operational.

Private Networks

The traffic on the private interconnect networks is meant to be used and accessed by nodes only. If high traffic levels disrupt or delay heartbeat messages, the server cluster may interpret this as a node failure and initiate a failover. For this reason, it is a good idea to place the interconnects on their own switch or virtual LAN (VLAN) and to not mix heartbeats with other traffic.

Do not place infrastructure servers (DNS, WINS, DHCP, and so on) on the same subnet as the interconnects. These services are not used by the interconnects and may cause the conflicts you wish to avoid.

Remote Administration of Cluster Nodes

Administration of your server cluster should be limited to a few controlled and trusted nodes. The administrative tools are quite powerful and could be used intentionally or accidentally to cause failovers, service stoppages, resource stoppages, or node evictions.

Use of Terminal Services on nodes is debatable. Terminal Services works just fine on nodes and actually includes some benefits. Evaluate your administrative, security, and operational needs to determine if installing Terminal Services on your nodes is appropriate for your situation.

The Cluster Service Account

The account that the cluster service uses must be a domain-level account and configured to be a member of the local Administrators group on each node. This account should not be a member of the Domain Admins group. Using an account that has elevated domain-level privileges would present a strong security risk if the cluster service account were to become compromised.

Do not use the cluster service account for administration, and be sure to configure it so that it can log on to only cluster nodes. Use different cluster service accounts for each cluster in your environment. This limits the scope of a security breach in the event that one occurs. If any of the applications running on your server cluster require accounts for operation, create and assign accounts specifically for the applications. Do not use the cluster service account for running applications. Doing so would make your cluster vulnerable to a malfunctioning application.

If you are required to permanently evict (forcibly remove) a node from a server cluster, you should manually remove the cluster service account from the appropriate local security groups on the evicted node. The cluster administrative tools will not automatically remove this account. Leaving this account with elevated permissions on an evicted node can expose you to security risks for both the evicted node and your domain.
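For example (a hedged sketch; the domain and account names are placeholders), the account can be stripped from the local Administrators group on the evicted node with net localgroup:

  REM Run on the evicted node to remove the cluster service account's local rights
  net localgroup Administrators CONTOSO\ClusterSvc /delete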

Another possible method of securing a server cluster is to create a domainlet. A domainlet is a domain created just to host a server cluster. Each node in the server cluster is a domain controller of the domain. A domainlet allows you to better define and control the security boundary for the cluster. There are advantages and disadvantages to this approach. (For more information about domainlets, visit Microsoft’s Web site.)

Client Access

Use the security features built into Windows Server 2003 and Active Directory (AD) to secure the applications and data on your server cluster. Turn on and use the auditing features of the operating system to see what activity is occurring on your server cluster.

Administrative Access

In larger organizations, it may be possible to have a different group of personnel responsible for administering clusters than those who perform other administrative tasks. Evaluate this possibility in your organization. If this strategy is adopted, assign these cluster administrators to a domain group and make that group a member of the appropriate local groups on the nodes. Also, assign NTFS permissions in a similar manner.

Cluster Data Security

As with any server, data should be accessed in a controlled manner. You do not want users accessing, deleting, or corrupting data. Assign appropriate NTFS file system permissions on a server cluster, just as you would assign them on a stand-alone server.

Disk Resource Security

Use NTFS permissions to ensure that only members of the Administrators group and the cluster service account can access the quorum disk. If you use scripts and the generic script resource type, you should assign appropriate NTFS Execute permissions to the scripts. A buggy script, or one run in an unplanned or uncontrolled manner, may cause data loss or a service outage.
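A hedged cacls sketch for the quorum drive follows; the account names are placeholders. The /E switch edits the existing ACL rather than replacing it, so any remaining unwanted entries would still need to be revoked with /R.

  REM Grant Full Control on the quorum drive to Administrators and the cluster service account
  cacls Q:\ /T /E /G Administrators:F CONTOSO\ClusterSvc:F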

Cluster Configuration Log File Security

When a cluster is created or a node is added to a cluster using the wizard, a file containing critical information about the cluster is placed in the %systemroot%\System32\LogFiles\Cluster\ directory, unless you do not have administrative permissions on the node; in that case, the file is placed in the %temp% directory. The log file, ClCfgSrv.log, should have NTFS permissions that allow access to only the Administrators group and the cluster service account.

Exercise 9.01: Creating a New Cluster


This exercise will walk you through the steps of creating a server cluster. Only the creation of the first node is covered. Each server cluster and network configuration is unique, so you will need to substitute your own TCP/IP addresses and account names and adjust this process to fit your hardware. (A hedged command-line equivalent appears after the final step.)

  1. Properly assemble your hardware. Ensure that only this first node is connected to and can access the shared storage unit(s).

  2. Assign friendly names to your network interfaces and configure them with static IP addresses.

  3. Log on to your domain with an account capable of creating user accounts. Open Active Directory Users and Computers. In the Users container, create an account called ClusterAdmin matching the settings shown in Figures 9.26 and 9.27. Close Active Directory Users and Computers.

    Figure 9.26: Create a New Cluster Service User Account

    Figure 9.27: Assign a Password and Properties to New Cluster Service User Account

  4. Log on to your first cluster node and start Cluster Administrator by selecting Start | Administrative Tools | Cluster Administrator.

  5. When the Open Connection to Cluster dialog box is presented (Figure 9.28), select Create new cluster from the Action drop-down box and click OK.

    Figure 9.28: Open Connection to Cluster

  6. The New Server Cluster Wizard will start, as shown in Figure 9.29. Click Next.

    Figure 9.29: The New Server Cluster Wizard’s Welcome Window

  7. Select your domain in the Domain drop-down list and enter cluster1 in the Cluster name text box, as shown in Figure 9.30. Click Next.

    Figure 9.30: Specify the Cluster Name and Domain

  8. Enter the name of the computer that will become your first node in the Computer name text box, as shown in Figure 9.31, and click Next.

    Figure 9.31: Select the Computer Name

  9. The Analyzing Configuration window will appear, as shown in Figure 9.32, while the configuration of the node is verified. You can click the View Log… button to see the history of actions the Wizard has performed, or click the Details… button to see the most recent task.

    Figure 9.32: Analyzing the Configuration of the Cluster Node

  10. When the analysis is completed, the Analyzing Configuration window will show the tasks completed, as shown in Figure 9.33. Click the plus signs (+) to see the details behind each step. When you’re finished examining the details, click Next.

    Figure 9.33: Finished Analyzing the Configuration of the Cluster Node

  11. You are asked what IP address you want assigned to the server cluster, as shown in Figure 9.34. Enter the appropriate IP Address and click Next.

    Figure 9.34: Enter the Cluster IP Address

  12. In the Cluster Service Account window, shown in Figure 9.35, enter the User name, Password, and Domain for the cluster service account you created in step 3. Then click Next.

    Figure 9.35: Enter the Cluster Service Account Information

  13. The Wizard will display the proposed server cluster configuration, as shown in Figure 9.36. Review the information.

    Figure 9.36: Review the Proposed Cluster Configuration

  14. Click the Quorum… button. Select the correct quorum disk for your configuration from the drop-down list, as shown in Figure 9.37, and select OK.

    Figure 9.37: Select the Quorum Disk

  15. The wizard will now create the server cluster, as shown in Figure 9.38. As the configuration progresses, you can click View Log… or Details… to see what the wizard is doing.

    Figure 9.38: Creating the Cluster

  16. When the wizard finishes creating the server cluster, the Creating the Cluster window will show the tasks completed, as shown in Figure 9.39. Click the plus signs (+) to see details about each step performed. Click Next.

    Figure 9.39: Completed Cluster Creation

  17. The wizard informs you that the server cluster is created, as shown in Figure 9.40. You can click View Log… to examine all of the activity involved in the creation. Click Finish to exit the wizard.

    Figure 9.40: The Wizard’s Final Window

  18. The Cluster Administrator utility appears. As shown in Figure 9.41, it displays the server cluster you just created.

    Figure 9.41: The Newly Created Cluster

  19. Right-click the server cluster name (CLUSTER1) and select Properties. Click the Network Priority tab and move Interconnect to the top of the list, as shown in Figure 9.42. Click Apply.

    Figure 9.42: Change Network Priorities

  20. Examine the Quorum and Security tabs to become familiar with the default settings on these tabs. When you have finished reviewing the configuration of these tabs, click OK. Then close Cluster Administrator.
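For reference, a hedged command-line equivalent of this procedure is shown below, following the syntax in Figure 9.16. Substitute your own cluster name, node name, domain, account, addresses, and network connection name; the password is omitted so that you are prompted for it.

  REM Create cluster1 on the first node using the ClusterAdmin account from step 3
  cluster /cluster:cluster1 /create /node:NODE1 /user:CONTOSO\ClusterAdmin /ipaddress:192.168.1.50,255.255.255.0,Public /verbose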




