Microsoft Cluster Server (MSCS) Improvements


This section describes some of the new cluster functionality in Windows Server 2003.

New Quorum Schemes

A cluster is a group of servers that work together to serve an application to a user community. Clusters have been around for a long time, and MSCS is only one of the many cluster implementations available. For a cluster to work, the members must guarantee data integrity. There are many aspects to guaranteeing data integrity; one of them is quorum. Quorum guarantees that when communication is not available between some cluster members, the cluster will not partition into two independent subclusters performing unsynchronized access to the data.

Until Windows Server 2003, MSCS quorum was 100% reliant on a shared quorum disk. No matter how many servers were available, if the quorum disk was not functional, the cluster could not proceed. With Windows Server 2003, Microsoft now allows multiple quorum schemes. The added schemes allow much greater flexibility and also align the Microsoft offering with most of the other cluster offerings.

Local Quorum

This quorum scheme is actually not new, but it's now fully documented and supported. It supports only single-node clusters and, by default, stores the quorum information on the boot partition in a subdirectory of the %WinDir%\Cluster\ directory. The resource type of "Local Quorum" is implemented by clusres.dll.

Microsoft has long suggested the single-node cluster as a solution for a misbehaving application that needs to be restarted automatically when it dies. It is now possible to implement this elegant solution using the standard cluster configuration wizard.

Single-node clusters are also a great solution for application developers or system Administrators who need to practice on a cluster, but do not have access to a full-blown shared storage cluster.

note

When upgrading an existing cluster from Windows NT 4.0 to Windows Server 2003, the "Local Quorum" resource is not defined. This is documented in Microsoft KB article 812544, "Local Quorum Resource Not Available After Upgrade to Windows Server 2003," and is not much of a problem because a Windows NT 4.0 cluster should be a two-node cluster and cannot make use of the feature.


Majority Node Set

This quorum scheme introduces the concept of majority to the quorum requirement. As mentioned earlier, the previous quorum schemes were all or nothing. Majority Node Set actually counts cluster members, and allows a cluster to function if a majority of the members are accounted for. It's not possible to assign weight to cluster members; each node can have at most one vote. The resource type "Majority Node Set" is implemented by clusres.dll.

Whereas the other quorum schemes are disk-based, this quorum resource is actually network-based. All participating cluster members have a special hidden network share that is used to contain a replica of the quorum data. The network share is also located on the boot partition in a subdirectory of the %WinDir%\Cluster\ directory for each participating cluster member. The network share is created when the quorum resource is created, and is managed by the server service. All the file shares that form a replica set are kept synchronized via CRS (Consistency for Replica Set). The cluster software still accesses the quorum device, but the accesses are rerouted through an IFS (Installable File System) to CRS, which guarantees consistency among all the members of the replica set. If changes cannot be verifiably propagated to a majority of the members, the change is lost, and because a majority of the nodes cannot be accessed, quorum is lost.

It is important to note that even though a Majority Node Set quorum-based cluster does not require any shared storage and has no dependency on a single quorum device, its reliability now depends on many other network components. Most notably, Majority Node Set uses SMB (Server Message Block) for network communication. SMB is an old protocol that was not initially designed for fault tolerance or complicated network infrastructures. Microsoft has gone to great lengths to make it as resilient as possible, but it would be a mistake to use a Majority Node Set quorum scheme solely because it removes the dependence on shared storage.

A few unexpected oddities about Majority Node Set quorum-based clusters include the following:

  • A node does not have to be an active participant in the cluster for its vote to be counted. A vote is counted when the server service successfully exports the hidden SMB share and the share is successfully accessed by the counting node, not because the cluster service is up.

  • Conversely, a node can be an active cluster member, running applications and providing clustered services, and yet not be participating toward quorum. This can happen if the hidden share cannot be accessed by other members of the cluster. Anything from a server service failure to a network problem to a permission problem could cause this situation.

  • The hidden share is created only once, when the resource is initially configured. The server service automatically makes it available every time it starts. However, if the share is accidentally deleted, starting or stopping the cluster service (directly or via a reboot) will not re-create the share.

Now that you understand the resource behavior, it's important to understand when using it is beneficial and when it is not. Note that switching quorum schemes is extremely easy and does not involve downtime. The steps to switch quorum schemes are as follows (a CLI sketch appears after the steps):

1. Create the new quorum resource. Use the New Resource option in the Cluster Administrator and create the resource. It is recommended that the resource be created in the cluster group.

2. Change the quorum scheme. Use the Quorum tab of the cluster property sheet. If step 1 was performed successfully, the new quorum resource should be available in the Quorum Resource pull-down list.

3. Optionally, delete or move the old quorum resource to another cluster group as appropriate.
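The same change can also be scripted. Here is a hedged sketch: the TestCluster name and the "Majority Node Set" resource name are illustrative, and the /QuorumResource switch should be confirmed against CLUSTER /? on your build before relying on it.

 C:\> cluster /cluster:TestCluster /quorumresource:"Majority Node Set"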

Because of this simplicity, Majority Node Set quorum is a great tool to use when a disk used as the quorum disk needs to be replaced. Before Windows Server 2003, it would have required multiple outages. In Windows Server 2003, the steps are simple and require no downtime:

1. Create the new Majority Node Set resource in the same group as the existing Physical Disk quorum resource.

2. Change the quorum scheme.

3. Delete the Physical Disk resource.

4. Replace the disk or perform whatever action is needed.

5. Format and assign a drive letter to the new disk.

6. Create a Physical Disk resource for the new disk and bring it online.

7. Change the quorum scheme back to the Physical Disk.

8. Delete the Majority Node Set resource.

Another use for Majority Node Set is in geo-clusters (geographically dispersed clusters). Geo-clusters are clusters that are divided among distant sites for disaster tolerance. First, it should be noted that a geo-cluster distributed over two sites benefits from having an odd number of nodes. A primary site is given a majority of the nodes and a secondary site is given a minority of the nodes. This configuration allows the master site to survive the loss of the slave site or of intersite communication without human intervention. Should the master site go down, the slave site can regain quorum with human intervention. See TechNet article 258078, "Cluster Service Startup Options," for information on the /ForceQuorum startup switch.
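For reference, forcing quorum on the surviving minority site is done when restarting the cluster service. A hedged sketch follows; NodeA and NodeB are placeholder node names, and the exact switch syntax should be confirmed against the article referenced above.

 C:\> net start clussvc /forcequorum:NodeA,NodeB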

Additional Information

You can configure a cluster with multiple quorum-capable resources. Just as you can have multiple Physical Disk resources, you can now have a Majority Node Set resource, a Physical Disk, and a Local Quorum in the same cluster group. However, at any one time, only one resource can be the active quorum device. The quorum device is one of the cluster properties.

Changing the current quorum resource is quite simple and does not require any downtime. From the Quorum tab, in the Quorum Resource pull-down list, pick the resource you want to change to. Note that the resource has to be online, but it does not have to be in the same resource group as the previous quorum resource, nor does it have to be in the cluster group. However, a best practice suggests keeping all quorum-capable resources in the cluster group.

Finally, Microsoft has automated the quorum resource selection during cluster setup, and it is important to note how setup selects the quorum resource. The quorum resource can be changed after setup completes, but it's so much easier to get it right the first time! Setup's first choice for quorum is a shared Physical Disk of at least 500MB, and setup will use the Local Quorum scheme only if it cannot find any suitable shared physical disk. Setup never selects Majority Node Set as a default quorum scheme. The setup quorum selection process comprises three steps:

1. Find all shared Physical Disks that are larger than 500MB and pick the smallest.

2. If no disk is larger than 500MB, find all disks larger than 50MB and pick the largest.

3. If no disk is larger than 50MB, select Local Quorum.

note

After Local Quorum is selected, it must be manually changed before additional nodes can be added.


The 500MB size was selected as the most efficient size for an NTFS partition. Note that it is still recommended that

  • The quorum disk should not have any other logical partition besides the one used for quorum.

  • No other applications should be configured to use the quorum disk.

  • The quorum resource must be in the cluster group.

New Application-Monitor Resources

Windows Server 2003 provides new support to enhance and simplify running noncluster-aware applications on a cluster. This section will cover the details of using this new functionality.

Generic Script Resource

Prior to Windows Server 2003, there were only two ways a cluster-unaware application could be integrated with a server cluster:

  • Write an application-specific resource DLL in a high-level language, which required C or C++ skills. Unfortunately, this method is expensive, and has discouraged many vendors or in-house developers from supporting MSCS.

  • Use the Generic Application resource. This method is extremely simple, but does not offer any application-specific way to actually verify that the application is functioning.

Because of the limitations of these two methods, very few applications were made cluster-aware. With Windows Server 2003, it's now possible to use a script to monitor an application and make it cluster-aware. Any scripting language supported by Windows Scripting Host will work. This means that, out of the box, Windows Server 2003 can support an application monitor written in JScript (JS) or VBScript (VBS).

Support for the feature is provided via the Generic Script resource. At the same time, Microsoft migrated IIS cluster support from an application monitor written in C to one using a Generic Script resource with the monitor written in VBS. Therefore, the IIS application monitor is an excellent example that can be used by anyone trying to get started. If you want to get a starting point for a monitor written in JS, check out the SDK (Software Developer's Kit) from the Microsoft Web site.

When configuring the resource, all you need is the location of the script. If the script is on a shared disk, you should make the Generic Script resource have a dependency on the Physical Disk. However, the script can be on a local disk as long as the file path is the same on all cluster members.

The only requirement imposed on the script is that it must have seven entry points plus the main routine. The seven entry points must be

  • Open() : Called when the resource is created. There are no input parameters, and it returns no value. The application monitor needs to perform all of its "first time" work here. This is where Registry checkpoints or crypto-checkpoints (checkpoints are files that contain Registry or cryptographic information that needs to be replicated across cluster nodes) need to be created. The application itself should not be started; that will occur in the Online() method. The resources in the dependency tree are not guaranteed to be online and should not be relied on. However, they are guaranteed to be created.

  • Online() : Invoked to bring the resource online. It requires no input parameters and returns no value. This method is called every time the resource comes online. When this method returns, the application should be running.

  • LooksAlive() : Invoked every LooksAlive period. The default value is five seconds. The method must return either True or False to indicate the state of the application. Because this method executes often, it should have a very small footprint and not perform any CPU-intensive tasks. However, a cluster is very forgiving. If the method takes too long, it will not fail the resource, but the resource monitor will keep the same number of LooksAlive calls between each IsAlive.

  • IsAlive() : Invoked every IsAlive period unless LooksAlive takes too long, as previously explained. The default value is 60 seconds. The method must return either True or False to indicate the state of the application. See the following "Access to WMI Providers" section for an example of VBS implementation. Note that, as explained previously, if the LooksAlive method takes longer to execute than the LooksAlive period, the next LooksAlive is delayed, and eventually IsAlive is also delayed.

  • Offline() : Called when the resource is brought offline. This routine should shut down the application quickly and cleanly.

  • Close() : Invoked when the cluster Generic Script resource is deleted. It should dereference objects and clean up the environment. The system should be left in the same state as it was found when the original Open() method was called. Any checkpoints should be removed.

  • Terminate() : Should immediately shut down the application. It is invoked when the Offline method does not work properly or when the IsAlive or LooksAlive methods have returned False .

As noted, only IsAlive() and LooksAlive() are required to do any real work; they must return either True or False.

A few helper methods are also provided:

  • You can make entries in the cluster.log using the Resource.LogInformation method.

  • It is possible to create Registry REG_SZ values using Resource.AddProperty .

  • The REG_SZ value can then be used as a permanent variable using its name.

  • It is possible to delete Registry REG_SZ values using Resource.DeleteProperty .

  • The name given to the resource when it was configured is also available via Resource.Name .

The Generic Script resource is extremely useful when cluster awareness is needed for an existing simple application.
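As a starting point, here is a minimal skeleton built from the entry points just described. It is only a sketch: LooksAlive() and IsAlive() simply return True and are placeholders for real application health checks, and the Resource.LogInformation calls merely mark where the real start, stop, and checkpoint logic would go.

 REM Minimal Generic Script resource skeleton
 Function Open( )
     Resource.LogInformation "Open: create Registry or crypto checkpoints here"
 End Function

 Function Online( )
     Resource.LogInformation "Online: start the application here"
 End Function

 Function LooksAlive( )
     REM Runs every LooksAlive period (default five seconds); keep it cheap
     LooksAlive = True
 End Function

 Function IsAlive( )
     REM Runs every IsAlive period (default 60 seconds); do the real check here
     IsAlive = True
 End Function

 Function Offline( )
     Resource.LogInformation "Offline: stop the application cleanly here"
 End Function

 Function Close( )
     Resource.LogInformation "Close: undo whatever Open did and remove checkpoints"
 End Function

 Function Terminate( )
     Resource.LogInformation "Terminate: force the application down immediately"
 End Function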

Shadow Volume Scheduler

Windows Server 2003 introduces the Volume Shadowing feature, which is Microsoft's implementation of snapshot technology. This is not a replacement for a good backup strategy, but it is a powerful tool to allow users to recover documents that they have accidentally damaged or lost. Because of the design of the feature, it is only available over SMB (Server Message Block) file shares. Because clusters are great file servers, it's a great feature to enable on a cluster file server.

The feature is cluster-aware on its own because it is built into NTFS; however, the schedule-based shadow creation requires a cluster resource in the same group as the Physical Disk being shadowed. To configure a clustered ShadowCopy volume, follow these steps:

1. Configure the cluster, and verify its operations.

2. Put all Physical Disks resources involved in the same cluster group.

3. Create the VS (virtual server) and the initial file share.

4. Make sure the file share has a dependency on the Physical Disk(s) and the network name.

5. Use the standard shadow copy management tools to manage the schedule.

warning

Do not use Cluster Administrator to create or manage the Shadow Volume schedule.


The feature is very simple to use as long as you follow a few simple rules:

  • Always configure the resource group with the Physical Disk from Cluster Administrator before configuring the shadowing from the Explorer volume properties.

  • Always ensure that the Physical Disk is online before configuring shadowing.

  • Never manage or modify the Volume Shadowing scheduler resource from Cluster Administrator; always use the Explorer interface.

In a cluster with multiple Physical Disk resources in multiple cluster groups, each cluster group will have at most one shadow copy scheduler resource configured. The resource will handle all disks in the group.
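To verify that scheduled shadow copies are actually being created, you can query the node that currently owns the disk with vssadmin. A hedged sketch; the S: drive letter is a placeholder for your shared volume:

 C:\> vssadmin list shadows /for=S: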

note

A U.S. university thought enough of the Volume Shadowing feature to justify upgrading its file share cluster to Windows Server 2003. The university now performs scheduled snapshots twice daily. The users are able to recover files they accidentally damaged without having to request a restore from the MIS department, at a significant saving of help desk time. The MIS department still performs regular backups.


Message Queue Triggers

Windows Server 2003 has improved cluster awareness of MSMQ. You can now have multiple message queuing resources configured in a cluster as long as they are in different cluster resource groups. It is also possible to cluster Message Queue Triggers. The feature is not configured by default, and needs to be selected through Add/Remove Programs, Add/Remove Windows Components, Application Server, Message Queuing.

Whereas the dependency on Microsoft Distributed Transaction Coordinator (MSDTC) was removed from the MSMQ resource, MSDTC is still required for the MSMQ Trigger resource. Note that MSDTC cluster maintenance was also greatly simplified in Windows Server 2003. Clustered instances of MSDTC can be completely managed from the Cluster Administrator. The CLI (command-line interface) utility, comclust.exe, is now obsolete.

WMI (Windows Management Instrumentation) Management

As mentioned in Chapter 10, "System Administration," cluster is one of the components that was given a WMI provider. This section is limited to an introduction of cluster management with WMI. The cluster WMI provider exposes 36 classes under the root\MSCluster path. This section introduces most of these classes and provides examples of how to use them. The examples are in VBS and WMIC, but the classes can be accessed from many other programming languages.

It is important to remember that the majority of the properties are generic properties inherited from parent classes and might not contain information that is particularly relevant to managing a cluster. However, all the required information is present in some properties. Similarly, some methods were defined when the initial cluster WMI provider was designed but were never implemented after it was realized they had no purpose. If you require more information, the bulk of the cluster WMI provider is documented in the platform SDK.

MSCluster_Cluster

Instances of this class represent a cluster. Because a node can be part of only one MSCS cluster at any time, this class can have at most one instance. To view the names of the class's 28 properties using WMIC, use the command:

 C:\> WMIC /NameSpace:\\root\MSCluster PATH MSCluster_Cluster GET /?

note

The commands used here and throughout this chapter are one-line commands; ignore the book typesetting that breaks the line. There is no dash character in the class names, just an underscore.


To view an individual property, you could use something like:

 C:\> WMIC /NameSpace:\\root\MSCluster PATH MSCluster_Cluster GET QuorumLogFileSize

In this particular example, the command would display the current quorum log file size.

The instance also has two methods. Rename allows you to rename the cluster and takes care of everything that has to be done. SetQuorumResource allows changing the quorum resource on the fly, as explained in the prior section.

Through association classes, it's possible to access the following (a short VBS sketch follows the list):

  • NICs in the cluster using MSCluster_ClusterToNetworkInterface

  • Networks used by the cluster via MSCluster_ClusterToNetwork

  • All servers that have been added to the cluster, even when down using MSCluster_ClusterToNode

  • The resource that is currently used for quorum using MSCluster_ClusterToQuorumResource

  • All the configured resources via MSCluster_ClusterToResource

  • All the resource groups using MSCluster_ClusterToResourceGroup

  • The configured resource type via MSCluster_ClusterToResourceType
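For example, here is a hedged VBS sketch that walks the MSCluster_ClusterToNode association to list every node that has ever been added to the cluster. MYCLUSTER is a placeholder for your cluster name, and the sketch assumes the cluster name is the class key, as in the later ASSOCIATORS examples in this section.

 REM List every node of the cluster via the MSCluster_ClusterToNode association
 DIM oNodeSet, oNode
 SET oNodeSet = GetObject( "winmgmts://./root/MSCluster" ).ExecQuery( _
     "ASSOCIATORS OF {MSCluster_Cluster='MYCLUSTER'} " & _
     "WHERE AssocClass=MSCluster_ClusterToNode" )
 FOR EACH oNode IN oNodeSet
     WScript.Echo oNode.Name
 NEXT
 SET oNodeSet = NOTHING
 SET oNode = NOTHING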

MSCluster_Event

Six event classes provide access to cluster events in real time. The names are self-descriptive:

  • MSCluster_EventGroupStateChange

  • MSCluster_EventObjectAdd

  • MSCluster_EventObjectRemove

  • MSCluster_EventPropertyChange

  • MSCluster_EventResourceStateChange

  • MSCluster_EventStateChange

With the event classes, it becomes simple to trigger some specific actions based on events. For example, one could e-mail a warning when a certain resource goes offline. Or even simpler, it is possible to log all cluster events to a command console in real time. The following simple VBS does just that for all resource state changes:

 REM
 REM    Display all resource change events as they occur
 REM
 DIM oEvents, oItem
 SET oEvents = GetObject( "winmgmts://./root/MSCluster" ).ExecNotificationQuery( _
     "Select * from MSCluster_EventResourceStateChange" )
 WScript.Echo ( "EventResourceStateChange:" )
 DO
     SET oItem = oEvents.NextEvent
     WITH oItem
         WScript.Echo DATE & " " & TIME
         WScript.Echo ( "EventGroup=" & .EventGroup )
         WScript.Echo ( "EventNewState=" & .EventNewState )
         WScript.Echo ( "EventNode=" & .EventNode )
         WScript.Echo ( "EventObjectName=" & .EventObjectName )
         WScript.Echo ( "EventObjectPath=" & .EventObjectPath )
         WScript.Echo ( "EventObjectType=" & .EventObjectType )
         WScript.Echo ( "EventTypeMajor=" & .EventTypeMajor )
         WScript.Echo ( "EventTypeMinor=" & .EventTypeMinor )
     END WITH
 LOOP
 SET oEvents = NOTHING
 SET oItem = NOTHING

Obviously, you might want to add logic to filter the changes and display only when resources fail, but as you see, the coding is very simple and effective.

MSCluster_NetworkInterface

Instances of this class represent a NIC used by the cluster software, but not necessarily a NIC that is on the current cluster member. It is worth noting that a NIC represents a hardware device, and as such does not include any software configuration information.

MSCluster_Network

Unlike the NIC, the network represents a software configuration. A TCP/IP network is defined by an address and a subnet. This in turn determines which NICs are used to service traffic on that network. To easily see the networks used by your cluster, you could use the following command:

 C:\>WMIC /NameSpace:\\root\MSCluster PATH MSCluster_Network GET Name,Address,AddressMask

It is also extremely simple to use the association class MSCluster_NetworkToNetworkInterface to find which NICs are used for which network.

The only method available is Rename . It renames the network and also renames the NICs on the cluster members to match the new name.

MSCluster_Node

Instances of this class represent a server that has joined the cluster. The server does not have to be up and participating in the cluster to be included in the instance set. There are 29 properties for this class, including the node name and the software version.

Association classes also exist to associate a cluster node with the NIC ( MSCluster_NodeToNetworkInterface ), with the groups online on this server ( MSCluster_NodeToActiveGroup ), with the resources running on this server ( MSCluster_NodeToActiveResource ), and with the service account used by the cluster service on this node ( MSCluster_NodeToHostedService ).

MSCluster_Resource

Resources are the entities that perform the actual work of monitoring an application and ensuring high availability. Each instance of this class represents one configured resource in the cluster. The class has 33 properties that provide you access to everything from the ResourceName to the IsAlive timer to whether the resource is quorum-capable.

Furthermore, because different resource types have different properties, this class contains a pointer to a list of resource-specific properties. For example, all resources have a LooksAlivePollInterval property, but only resources of the type "IP Address" have an EnableNetBios property; it would make no sense for a Physical Disk resource!

Here is an example of looking at the LooksAlivePollInterval from the command prompt:

 C:\>WMIC /NameSpace:\\root\MSCluster PATH MSCluster_Resource WHERE Name="Cluster IP Address" GET LooksAlivePollInterval

And here is an example of using a VBS to get the address of the same resource:

 REM
 REM    Display the cluster main IP address
 REM
 DIM oResourceSet, oResource, oPropertySet, oProperty, strQuery
 strQuery = "Select * From MSCluster_Resource " & _
            "where Name=""Cluster IP Address"""
 SET oResourceSet = GetObject( _
            "winmgmts://./root/MSCluster" ).ExecQuery( strQuery )
 FOR EACH oResource IN oResourceSet
     SET oPropertySet = oResource.PrivateProperties
     FOR EACH oProperty IN oPropertySet.Properties_
         IF oProperty.Name = "Address" THEN
             WScript.Echo( "Cluster IP Address is " & oProperty.Value )
             EXIT FOR
         END IF
     NEXT
 NEXT
 SET oResourceSet = NOTHING
 SET oResource = NOTHING
 SET oPropertySet = NOTHING
 SET oProperty = NOTHING

See Appendix E, "Sample VBS," for a sample VBS that lists all properties of all resources in a cluster. The class also has 13 methods:

  • AddCryptoCheckpoint

  • AddDependency

  • AddRegistryCheckpoint

  • BringOnline

  • CreateResource

  • DeleteResource

  • FailResource

  • MoveToNewGroup

  • RemoveCryptoCheckpoint

  • RemoveDependency

  • RemoveRegistryCheckpoint

  • Rename

  • TakeOffline

note

The BringOnline and TakeOffline methods might return a WBEM_E_GENERIC error if a timeout is not provided. However, you can ignore the error.
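To sidestep that quirk, pass the timeout explicitly. Here is a hedged VBS sketch; the single TimeOut parameter (in seconds) and the 60-second value are assumptions to verify against the platform SDK.

 REM Bring the "Cluster IP Address" resource online with an explicit timeout
 DIM oResSet, oRes
 SET oResSet = GetObject( "winmgmts://./root/MSCluster" ).ExecQuery( _
     "Select * From MSCluster_Resource Where Name=""Cluster IP Address""" )
 FOR EACH oRes IN oResSet
     oRes.BringOnline 60    ' assumed TimeOut parameter, in seconds
 NEXT
 SET oResSet = NOTHING
 SET oRes = NOTHING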


MSCluster_ResourceGroup

This class represents cluster groups. Each instance has 14 defined properties, mostly dealing with failover and failback. There are 6 methods available:

  • BringOnline

  • CreateGroup

  • DeleteGroup

  • MoveToNewNode

  • Rename

  • TakeOffline

note

The BringOnline and TakeOffline methods have the same tendency of returning a WBEM_E_GENERIC error if an optional timeout parameter is not provided.


There are two association classes. MSCluster_ResourceGroupToResource relates resource groups with the resources in that group, and MSCluster_ResourceGroupToPreferredNode relates a resource group to the preferred server node list.

MSCluster_ResourceType

The feature-set of each resource is implemented through a resource type. The type defines the capability of each resource. The base OS contains support for 15 different resource types. However, as cluster-aware applications are installed, new resource types are defined.

The class has 15 properties and 2 methods. Note that the list of resource-specific properties is not available from the resource type, only from instances of resources of that particular type. There is a single association class, MSCluster_ResourceTypeToResource, that associates resources with their resource type.

Here is an example VBS that uses two association classes to relate a specified resource group to its resources and then associate those resources to their resource type:

 REM
 REM    Given a resource group name, find all the resources
 REM    and their types in that group using associators
 REM
 OPTION EXPLICIT
 ON ERROR RESUME NEXT
 DIM refWMIServices, oResourceSet, oResource
 DIM oResourceTypeSet, oResourceType
 DIM strQuery, strInput, nCount
 WScript.Echo ( "Enter the name of the resource group to look for:" )
 strInput = WScript.StdIn.ReadLine
 SET refWMIServices = GetObject( "winmgmts://./root/MSCluster" )
 strQuery = "ASSOCIATORS OF {MSCluster_ResourceGroup='" & strInput & _
     "'} WHERE AssocClass=MSCluster_ResourceGroupToResource"
 SET oResourceSet = refWMIServices.ExecQuery( strQuery )
 nCount = oResourceSet.Count
 IF Err.Number <> 0 THEN
     WScript.Echo ( "Resource group " & strInput & " was not found!" )
     WScript.Quit
 END IF
 WScript.Echo ( "Got " & nCount & " resources in " & strInput & ":" )
 FOR EACH oResource IN oResourceSet
     WITH oResource
         strQuery = "ASSOCIATORS OF {MSCluster_Resource='" & .Name & _
                "'} WHERE AssocClass=MSCluster_ResourceTypeToResource"
         SET oResourceTypeSet = refWMIServices.ExecQuery( strQuery )
         FOR EACH oResourceType IN oResourceTypeSet
             WScript.Echo ( "Resource Name --> " & .Name & _
                            " of type --> " & oResourceType.Name )
         NEXT
     END WITH
 NEXT
 SET oResourceSet = NOTHING
 SET oResource = NOTHING
 SET oResourceTypeSet = NOTHING
 SET oResourceType = NOTHING

MSCluster_Service

This class describes the service account that is used to run the cluster service. The class will have as many instances as there are instances of the MSCluster_Node class.

Application Classes

Some cluster-aware applications also have WMI providers. For example, Exchange offers management access to most of its functionality and even has a class specific to clustered Exchange instances. The repository path is Root/CIMV2/Applications/Exchange , and the class is ExchangeClusterResource . To see what properties are available, use this command:

 WMIC /NAMESPACE:\\root\CIMV2\Applications\Exchange PATH ExchangeClusterResource GET /?:FULL

WMIC Management

WMIC offers much of the same functionality as the pre-Windows Server 2003 CLUSTER.EXE command. However, with WMIC, the same syntax applies to anything that has a WMI provider, unlike the CLUSTER.EXE command, which is cluster-specific and requires the Administrator to learn yet another command and syntax.
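For example, listing every configured resource and its state can be done either way. A hedged sketch: the /cluster:TestCluster qualifier is illustrative, and cluster.exe operates on the local cluster when it is omitted.

 C:\> cluster /cluster:TestCluster resource
 C:\> WMIC /NameSpace:\\root\MSCluster PATH MSCluster_Resource GET Name,State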

Update to Spooler Resource

The spooler resource has seen some steady improvements since the initial release of Windows NT 4.0. With Windows Server 2003, the print drivers are now automatically replicated between the nodes and kept synchronized. Also, when configuring a new cluster printer, you don't have to manually load the driver by using a dummy printer. You configure the printer from the virtual server, and the cluster software handles it all.

A new directory is created on the Physical Disk that the Spooler resource depends on. The directory is called \PrinterDrivers and resides next to the \Spool directory. The print drivers and the color management drivers are replicated between this directory on the shared disk and the local directory %WinDir%\System32\Spool\Drivers\GUID-For-Spooler-Resource on the boot partition.

This functionality is actually generic enough that any driver that is different between the shared drive and the local server spool directory will get updated. This in turn implies that it supports updating the printer driver via SUS (Software Update Services). When the driver is updated, a copy of the new driver is placed on the shared disk, and the updates are propagated to servers as the spooler resource comes online on those servers. It can also "heal" a spooler problem caused by a corrupted printer driver by restoring the correct driver the next time the spooler resource comes online.

Update to Cluster Management Tools

A lot of other miscellaneous cluster updates shipped with Windows Server 2003. None is drastic, but together they greatly improve the reliability and ease of maintenance.

Configuration Wizard

Cluster configuration has been isolated into a standalone wizard. On SKUs that support clustering, it's no longer necessary to go through Add/Remove Programs to add the clustering software because it's always there. You can start the cluster configuration by using the command CLUSTER /CREATE /WIZARD, by going through the Cluster Administrator UI (user interface) by selecting File, New, Cluster, or by selecting Create New Cluster when you are prompted for an action.

The wizard has been streamlined and asks fewer questions. It automatically attempts to find the appropriate quorum resource and doesn't require the network mask for the cluster IP address. The algorithm used to select a quorum was explained under "Additional Information" in the "New Quorum Schemes" section earlier in this chapter.

If you are using StorageWorks shared storage, you should carve a disk of about 550MB from the controller. This will allow for the file system overhead and still leave you with a quorum disk of 500+MB.

The cluster wizard also tests the feasibility of your request before attempting to perform the request. If a problem is found during the analysis phase, you can fix the problem and restart the analysis phase without canceling out of the wizard. The analysis phase has a color-coded display:

  • Green : Everything is fine.

  • Red : A showstopper was found.

  • Blue : Analysis is still in progress.

Furthermore, each step of the analysis is visible and if a warning or error is found, you can get more information about it by double-clicking the entry.

Similarly, adding nodes to the cluster has been simplified and has received the same enhancements. You can add multiple nodes at a time, and during the analysis phase, all nodes are tested. If a problem is identified, the node with the problem is flagged. You can continue and omit that node or fix the problem with that node and restart the analysis phase.

Another new feature of the cluster configuration wizard is the Advanced button found on the Select Computers screen. This gives you access to a prompt to disable enumeration of your shared storage. This option should only be used when you have a complex SAN infrastructure that causes the cluster wizard to misbehave. It should not be used to hide SAN configuration problems, but rather after you have verified your SAN is correctly configured but is still causing the cluster configuration wizard confusion. Using this option results in the following:

  • A cluster is created without shared storage using a local quorum scheme.

  • A new node is added without checking that it can access the shared storage.

  • You need to manually create Physical Disk resources for all disks to be managed by the cluster. The shared disks will not be protected until this is done.

New Log Files

To help troubleshoot cluster-installation problems, the cluster setup wizard now produces a log file named ClCfgSrv.Log. Oddly, the file is stored in %WinDir%\System32\LogFiles\Cluster, which is not the same directory as the other cluster-related log files. That file contains information about every operation performed during cluster configuration. Setup operations can involve multiple nodes; for example, a node that runs the wizard and nodes that are being added. Each of those nodes will contain the log file and the information that is relevant to that particular node.

Internally, the cluster keeps track of objects using GUIDs (Globally Unique Identifiers). GUIDs can make it hard to follow the events recorded in the log files. To help, a new file called %WinDir%\Cluster\Cluster.OML is now created. The file records all cluster GUIDs along with the human-readable name and properties of each object. It is important to note that when you want to use that file, it is best to search starting at the bottom and working your way up to find the most current properties for a GUID.

Beware that even though the new log files have a format similar to cluster.log, unlike cluster.log their time stamps are not GMT but always local time.

New Disaster Recovery Tools

Shared-storage problems are the most common reason for downtime. In Windows Server 2003, Microsoft introduces a couple of tools to help recover faster from those issues. There are two separate issues to be dealt with when losing a Physical Disk:

  • If it is the quorum disk, you lose quorum information and the cluster is down.

  • If it is a data disk, you lose the signature and drive letter and the resource group is down.

The two new tools, both of which can be found in the resource kit, are clusterrecovery.exe and confdisk.exe.

  • Confdisk.exe : Actually not cluster-specific; it is part of the new ASR (Automated System Recovery) that replaces the ERD. The ASR.SIF file is used to save the disk configuration. Then, when a disk is lost, the same configuration file is used to restore the configuration. The configuration includes the disk signature, disk partitions with size, and the associated drive letters. ConfDisk.exe is extremely fast, and you should modify your backup procedures to automatically use it to create an ASR.SIF before every backup so that the ASR.SIF is included in the backup.

  • ClusterRecovery.exe : Cluster-specific; it uses the information in the Registry to replace the disk instead of the ASR.SIF file. The utility is UI-based and will ask you what disk you want to replace and the new disk name. So unlike ConfDisk.exe, you must format the disk and assign it some drive letter (not the same as the old disk). This utility also handles rebuilding the quorum information if the disk also happens to be the quorum disk.

note

Neither utility provides an automatic backup/restore of the data, so a good solid backup policy must still be used to protect data.


Changing the Cluster Service Password

With increased hacker activities, many Administrators are faced with corporate mandates to change passwords at regular intervals. In the past, changing the cluster service account password could be painful and time-consuming. The password had to be changed in the account database, and then all the cluster servers had to have the password updated for the service account. In Windows Server 2003, Microsoft added a new CLI command to automate that process. The command is Cluster /ChangePass and has the following limitations:

  • The command is not transactional, so if an error occurs while it's making the many updates, you need to manually take over and clean up.

  • If you have a generic service account that is used by services other than cluster, those other services will need to be handled manually.

You should use that command in two steps: first, using the /Test qualifier to verify that the command has a high probability of success, and then a second time without the /Test after all the issues reported by using /Test have been resolved. This reduces the chances of the command failing midway through the multiple updates.
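For example, a hedged sketch of the two-step approach; the cluster name and passwords are placeholders, and the new-password,old-password argument order is an assumption to verify against the command's built-in help:

 C:\> cluster /cluster:TestCluster /changepass:N3wPassw0rd,0ldPassw0rd /test
 C:\> cluster /cluster:TestCluster /changepass:N3wPassw0rd,0ldPassw0rd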

New cluster.log Features and Limitations

The cluster log file has also been improved. Every entry now includes a severity level. The three defined severity levels are INFO , WARN , and ERR . This greatly simplifies initial troubleshooting by allowing you to first concentrate on all ERR entries. Note that it isn't possible to completely ignore the other severities. For example, all entries made by a Generic Script resource will be flagged as INFO .
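Because the severity appears in every entry, ordinary text tools can pre-filter the log. A hedged sketch, assuming the default cluster.log location and that the ERR string appears literally in each entry:

 C:\> findstr /C:"ERR" %WinDir%\Cluster\cluster.log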

Another great addition is a GMT to local time stamp. Every time the cluster service starts, it logs an entry that shows the local time and the GMT time. Even though converting back and forth between GMT and local time should be easy, the addition of daylight savings time and other oddities caused many of us to waste time looking at the wrong entries in the cluster log. The drawback of logging this only at service startup is that if your cluster is truly high availability and you happen to be running chatty cluster applications (such as Exchange Server 2000 or SQL 2000), your cluster log can easily roll over and not include a service startup when you need to use the cluster.log.

Hang Detection

One of the most frustrating cluster problems is when a cluster node appears to hang and yet the cluster resources do not failover. The source of the problem is usually a hung or deadlocked application that causes the server to become unresponsive to user mode applications, yet allows the cluster heartbeat messages and IsAlive handling to continue being processed so that the cluster services never notice that anything is wrong. Even though the source of the problem is a misbehaving application, cluster is designed to handle failed applications.

Windows Server 2003 introduces two new hang troubleshooting features that allow Administrators to attempt to isolate the problem. Note that hang detection has been back-ported to Windows 2000; see Microsoft KB article 815267, "How to Enable User Mode Hang Detection on a Server Cluster," for details. The first feature is to make the cluster service report its status to the cluster network driver. If the cluster service is prevented from reporting its status, the network driver initiates a failure. The other check is to force the resource monitors to verify that resource functions do not stall forever (deadlock). If the resource monitor ever detects such an event, it will terminate itself and cause the failover manager to take action.

Cluster service hang detection is enabled through two cluster properties stored in the Registry:

  • ClusSvcHeartbeatTimeout : Determines how long the cluster service can miss reporting its status before it is determined to be hung. The default value is 60 seconds.

  • HangRecoveryAction : Sets what action to take once the cluster service is determined to be hung. The valid options are 0 to disable, 1 to log an event 1129 in the System Event Log, 2 to terminate the cluster service and log event 1128, and 3 to crash the machine with a STOP 0x0000009E (USER_MODE_HEALTH_MONITOR).

Deadlock hang detection is enabled through four cluster properties:

  • EnableResourceDllDeadlockDetection : Can be 0 (disabled) or 1 (enabled). Default is disabled.

  • ResourceDllDeadlockTimeout : DWORD that contains the timeout in seconds that the resource monitor will wait to declare a deadlock. Minimum is 180 seconds, and default is 240 seconds.

  • ResourceDllDeadlockThreshold : DWORD with number of times the Failover Manager will allow a deadlock within the ResourceDllDeadlockPeriod . After that limit is exceeded, Failover Manager informs the Membership Manager. Defaults to 3.

  • ResourceDllDeadlockPeriod : DWORD of time in seconds during which Failover Manager allows ResourceDllDeadlockThreshold deadlocks in the resource monitor. Minimum is 180 seconds, and default is 1,800 seconds.

The cluster command should be used to set those parameters. Enable it using the following command:

 cluster.exe /cluster:TestCluster /prop EnableResourceDllDeadlockDetection=1 
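Similarly, the cluster service hang-detection action can be set the same way. A hedged sketch, mirroring the /prop form used above (value 2 terminates the cluster service and logs event 1128, per the list above):

 C:\> cluster.exe /cluster:TestCluster /prop HangRecoveryAction=2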

note

Some of those troubleshooting actions can be quite drastic and will affect other applications running on the cluster besides the ones that are hung.


AD Support

Windows 2000 SP3 introduced new functionality to the cluster server by allowing the VS to get registered in the AD (Active Directory) as a CO (Computer Object). In pre-Windows 2000 SP2, only MSMQ registered the VS as a CO. The lack of AD integration prevented Kerberos authentication for cluster access and required the older, less secure NTLM (NT LAN Manager) authentication. The functionality was added in the network name resource. However, prior to Windows Server 2003, the only interface to the functionality was via CLI. Windows Server 2003 introduced a GUI (Graphical User Interface) to control AD integration.

Best Practices

There are many considerations when updating high-availability systems. Careful planning and procedures that minimize risk may not always be the simplest or cheapest option, but due diligence in planning and execution will minimize complications. This section presents guidelines to help avoid them.

Server Consolidation

With larger servers becoming available and Microsoft pushing the scale-up concept, many sites have investigated server consolidation as a means to reduce cost. However, extreme care must be exercised when performing server consolidation using a cluster. The two tend to have opposite effects. Clustering is a high-availability option, whereas server consolidation is a cost-containment option that often decreases availability.

The general rule is to reduce complexity to increase availability. The best way to reduce complexity is to reduce the number of applications running on a server. That is why high-availability systems are best dedicated to a single application.

However, application consolidation onto a cluster is sometimes required. When doing so, care must be exercised to consider the many aspects of application compatibility:

  • Troubleshooting compatibility : Some applications transfer data through a server, but do not store much information on the server. When problems develop in those applications, a reboot is often the fastest recovery method. However, for applications that store and maintain a great amount of data, a reboot is often the last desperate step in problem resolution.

  • Backup scheduling : Some applications have different peak demand schedules, and therefore have different backup window schedules. For example, payroll applications often run at off hours and would be most impacted by the off-hour backup requirements of other applications.

  • Political compatibility : Great care must be exercised when multiple political entities (departments) decide to pool their resources to fund equipment for a cluster. Political wrangling can be the source of many impasses. Expectations must be set correctly as to what is and is not possible as far as resource scheduling and partitioning are concerned.

  • Memory requirements : Earlier in the chapter, incompatibilities between Exchange and SQL memory requirements were discussed. Unfortunately, this can apply to a lot of other applications.

  • Management compatibilities : After applications share a server, the Administrators that are privileged can affect any of the applications on the server. Just because a DBA (database Administrator) is qualified to manage an SQL database does not mean you want that person to control the stock purchase Web page.

  • Resource requirements : Sometimes, you have multiple small applications that do not have large CPU or memory requirements, but are extremely demanding of the I/O subsystem. Consolidating such applications on a server might require separate I/O subsystems for each application, and that might not be a viable option on the size of server that the CPU and memory requirements would dictate.

As a rule, it's better to consolidate multiple similar applications to reduce the number of components (such as multiple SQL databases onto one server); however, if the applications are too similar (multiple real-time applications), they can also interfere with one another.

Miscellaneous

Windows Server 2003 also contains a host of other improvements. Here is a simple list that might help you decide if this is a solution for you:

  • Terminal Services is now supported in a cluster in application mode. This does not imply that the terminal sessions are cluster-aware and will be moved from node to node, just that the server farm will appear to recover much quicker from failure.

    note

    Just because Microsoft allows clustering of Terminal Services, it does not mean it makes sense. NLB clustering might be more appropriate.


  • A Terminal Server session directory can be installed on a cluster to provide high availability to the terminal session directory. The session directory is a front-end process used to load balance Terminal Server sessions onto a Terminal Server farm.

  • Windows clustering does not support mixed architecture (IA32 versus IA64) clustering. The application monitor architecture that uses a resource DLL prevents different architectures from clustering together.

  • Crossover cables and general network cable problems are better handled with DisableDHCPMediaSense set to 1 on Enterprise and Datacenter Editions by default. This prevents the TCP/IP stack from being unloaded when a network cable is not detected. (A reg.exe sketch for setting this value appears after this list.)

  • To reduce the intracluster network traffic, clusters with more than two nodes try to switch from using unicast to multicast for heartbeats. ClusNet determines whether the infrastructure allows communication via multicast. If, for any reason, multicast communication does not work, ClusNet reverts back to targeted unicast. The cluster also attempts to make use of a MADCAP (Multicast Address Dynamic Client Allocation Protocol) server: MADCAP is to multicast what DHCP is to unicast. Windows 2000 and newer DHCP servers can also be configured as MADCAP servers.

  • Client Side Caching (CSC is offline file access also called Offline Folders) is now supported. Prior to Windows Server 2003, CSC was not cluster aware and manual tweaking was required to use it in a clustered environment.

  • It is possible to instantiate multiple DFS (Distributed File System) roots in the same cluster. DFS roots are great tools for hiding topology and presenting a unified namespace for clients to access.

  • On a cluster with more than two nodes, it's now possible to move a group by selecting Best Possible, and the cluster automatically figures out which machine to pick based on the preferred owner list. (See the Help and Support Center and Microsoft KB article 299631, "Failover Behavior on Clusters of Three or More Nodes.")

  • On clusters with SAN-based shared storage, if your SAN vendor supports the new StorPort architecture, cluster will use targeted LUN (Logical Unit Number) resets instead of SCSI bus resets for arbitration. This makes arbitration much less disruptive. Furthermore, the StorPort architecture allows your cluster to make full use of the SAN potential that was not possible with the SCSIport architecture.
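Referring back to the media-sense bullet above, the value can be set or verified with reg.exe on editions where it is not enabled by default. A hedged sketch; verify the exact key path for your build, and note that a reboot is typically required for the change to take effect:

 C:\> reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v DisableDHCPMediaSense /t REG_DWORD /d 1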
