Windows Clustering

   

The Windows Clustering technologies have been enhanced with additional facilities that enable a wider range of server cluster scenarios and topologies to be deployed.

The many new and improved features for server clusters cover a wide range of categories: general improvements, installation, resources, network enhancements, storage, operations, and supporting and troubleshooting.

General Improvements

General improvements in the Cluster service for Windows Server 2003 include the following:

  • Larger cluster sizes.

    Windows Server 2003, Enterprise Edition, now supports eight-node clusters (formerly two), and Datacenter Edition now supports eight-node clusters (formerly four). Larger cluster sizes provide much more flexibility in how applications can be deployed on a server cluster: applications that support multiple instances can run more instances across more nodes, and multiple applications can be deployed on a single server cluster with much more control over what happens if a node fails or is taken down for maintenance.

  • 64-bit support.

    The 64-bit versions of Windows Server 2003 support the Cluster service, and 64-bit computing addresses large memory needs. SQL Server 2000 Enterprise Edition (64-bit) is one example of an application that can use the increased memory space of the 64-bit versions of Windows Server 2003 (up to 4 TB, compared with the 64 GB maximum of Windows 2000 Datacenter Server) while at the same time taking advantage of clustering. This provides a powerful platform for the most compute-intensive applications while ensuring their high availability.

  • High availability.

    The Terminal Server Session Directory service can be made highly available through failover.

  • Cluster installation wizard.

    The cluster installation wizard provides validation and verification. It allows generic scripting to make applications highly available.

  • Majority node set clusters.

    Windows Server 2003 has an optional quorum resource that does not require a disk on a shared bus for the quorum device. This feature is designed to be built into larger end-to-end solutions by OEMs, independent hardware vendors (IHVs), and other software vendors rather than to be deployed by end users specifically, although this is possible for experienced users. Scenarios in which majority node set (MNS) clusters add value include

    • Geographically dispersed clusters. This mechanism provides a single Microsoft-supplied quorum resource that is independent of any storage solution for a geographically dispersed or multisite cluster. There is a separate cluster hardware compatibility list (HCL) for geographic clusters.

    • Highly available devices with no shared disks. These low-cost or appliance-like devices use techniques other than shared disks, such as log shipping or software disk or file-system replication and mirroring, to make data available on multiple nodes in the cluster.

    Windows Server 2003 provides no mechanism to mirror or replicate user data across the nodes of an MNS cluster. So while building clusters with no shared disks at all is possible, making the application data highly available and redundant across machines is an application-specific issue. MNS clusters provide the following benefits:

    • Storage abstraction.

      Frees up the storage subsystem to manage data replication among multiple sites in the most appropriate way without having to worry about a shared quorum disk; at the same time, the concept of a single virtual cluster is supported.

    • No shared disks.

      Some scenarios require tightly consistent cluster features yet don't require shared disks: for example, (a) clusters wherein the application keeps data consistent between nodes, such as database log shipping and file replication for relatively static data; and (b) clusters that host applications that have no persistent data but need to cooperate in a tightly coupled way to provide consistent volatile state.

    • Enhanced redundancy.

      If the shared quorum disk is corrupted in any way, the entire cluster goes off line. With majority node sets, corruption of the quorum on one node does not bring the entire cluster off line.

Installation

Improvements in the server cluster installation process include the following:

  • Installation by default.

    Cluster files are placed on the nodes when Windows Server 2003 is installed, so you need only configure a cluster by launching Cluster Administrator or by scripting the configuration with Cluster.exe. In addition, third-party quorum resources can be preinstalled and then selected during server cluster configuration rather than requiring additional resource-specific procedures. All server cluster configurations can be deployed the same way. These features provide the following benefits:

    • Easier administration.

      You no longer need to provide installation media to install the Cluster service.

    • No reboot.

      You no longer need to reboot after you install or uninstall the Cluster service.

  • Preconfiguration analysis.

    The installation process analyzes and verifies the hardware and software configuration and identifies potential problems, providing a comprehensive and easy-to-read report on any potential configuration issues before the server cluster is created. This ensures that any known incompatibilities are detected prior to configuration. For example, Services for Macintosh (SFM), Network Load Balancing (NLB), dynamic disks, and addresses issued using Dynamic Host Configuration Protocol (DHCP) are not supported with the Cluster service.

  • Default values.

    Installation creates a server cluster that conforms to best practices by using default values and heuristics. For newly created server clusters, the default values are often the most appropriate configuration. Server cluster creation asks far fewer setup questions; data is collected, and the code makes decisions about the configuration. The goal is to get a default server cluster up and running that can then be customized with the server cluster administration tools if required. Multiple nodes can also be added to a server cluster in a single operation, which makes it quicker and easier to create multinode server clusters.

  • Extensible architecture.

    Extensible architecture allows applications and system components to take part in server cluster configuration. For example, an application can be installed before a server is clustered, and the application can participate in (or even block) that node joining the server cluster. This allows applications to set up server cluster resources or change their configuration as part of server cluster installation rather than as a separate task afterward.

  • Remote administration.

    Remote administration allows full remote creation and configuration of the server cluster. New server clusters can be created and nodes can be added to an existing server cluster from a remote management station. In addition, drive letter changes and physical disk resource failover are updated to Terminal Server client sessions. This allows for better remote administration via Terminal Services.

  • Command-line tools.

    Server cluster creation and configuration can be scripted through the Cluster.exe command-line tool. This makes it much easier to automate the process of creating a cluster.
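
    For example, the following commands are a minimal sketch of scripting common configuration tasks with Cluster.exe. The cluster, group, and resource names are hypothetical, and a file share resource normally also needs dependencies on a physical disk and a network name before it will come on line:

        REM Create a group, add a file share resource, and bring it on line
        cluster MyCluster group "FileGroup" /create
        cluster MyCluster res "UserShare" /create /group:"FileGroup" /type:"File Share"
        cluster MyCluster res "UserShare" /priv Path="F:\Users"
        cluster MyCluster res "UserShare" /priv ShareName="Users"
        cluster MyCluster res "UserShare" /online

        REM Query the state of all groups in the cluster
        cluster MyCluster group /status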

  • Simpler uninstallation.

    Uninstalling the Cluster service from a node is now a one-step process of evicting the node; previous versions required eviction and then uninstallation. The process is much more efficient than before because you need only evict the node through Cluster Administrator or Cluster.exe, and the node is unconfigured for cluster support. Also, a new switch for Cluster.exe will force the uninstallation if getting into Cluster Administrator is problematic.

  • Local quorum.

    If a node isn't attached to a shared disk, the Cluster service will automatically configure a local quorum resource. It's also possible to create a local quorum resource once the Cluster service is running. This makes it easy to create a test cluster on a local PC for trying out cluster applications or for getting familiar with the Cluster service; you don't need special cluster hardware that has been certified on the Microsoft Cluster HCL to run a test cluster. Local quorum is supported only for one-node clusters, and the use of hardware not certified on the HCL isn't supported for production environments. In the event that you lose all your shared disks, one option for getting a temporary cluster working (for example, while you wait for new hardware) is to start the Cluster service with the /fixquorum switch. After doing this, create a local quorum resource and set it as your quorum, as sketched after the list below:

    • For a print cluster, you can point the spool folder to the local disk.

    • For a file share, you can point the file share resource to the local disk, where backup data has been restored.
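
    As a sketch, the recovery procedure looks like the following; the resource and group names are the defaults and may differ in your cluster:

        REM Start the Cluster service on one node in fix-quorum mode
        net start clussvc /fixquorum

        REM Create a local quorum resource and make it the cluster quorum
        cluster res "Local Quorum" /create /group:"Cluster Group" /type:"Local Quorum"
        cluster /quorum:"Local Quorum"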

  • Quorum selection.

    You no longer need to select which disk is going to be used as the quorum resource. It's automatically configured on the smallest disk that is larger than 50 MB and formatted using NTFS. The end user no longer has to worry about which disk to use for the quorum. The option to move the quorum resource to another disk is available during setup or after the cluster has been configured.

  • Active Directory.

    The Cluster service now has much tighter integration with the Active Directory directory service, including a virtual computer object, Kerberos authentication, and a default location for services such as Microsoft Message Queuing (MSMQ) to publish service control points. Because a cluster virtual server is published as a computer object in Active Directory, users can access the virtual server just as they can access any other Windows-based server.

    The only roles for the virtual server computer object in Windows Server 2003 are to allow Kerberos authentication for services hosted on a virtual server and to allow cluster-aware and Active Directory-aware services (such as MSMQ) to publish service provider information specific to the virtual server on which they are hosted. Following are further details of Kerberos authentication and of publishing services:

    • Kerberos authentication.

      Kerberos authentication allows a user to be authenticated against a server without ever having to send the user's password. Instead, the user presents a ticket that grants access to the server. This contrasts with NTLM authentication, used by Windows Clustering in Windows 2000, which sends a hash of the user's password over the network. In addition, Kerberos supports mutual authentication of client and server and allows delegation of authentication across multiple machines. To have Kerberos authentication for the virtual server in a mixed-mode cluster (for example, Windows 2000 and Windows Server 2003), you must be running Windows 2000 with Service Pack 3 or a later version; otherwise, NTLM will be used for all authentication. A configuration sketch follows this list.

    • Publishing services.

      Now that the Cluster service is Active Directory “aware, it can integrate with other services that publish information about a service in Active Directory. For example, MSMQ 2.0 can publish information about public queues in Active Directory so that users can easily find their nearest queue. Windows Server 2003 now extends this service to allow clustered public queue information to be published in Active Directory. Cluster integration does not make any changes to the Active Directory schema.
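
    As a configuration sketch, Kerberos authentication for a virtual server is enabled through the RequireKerberos private property of its Network Name resource. The resource name below is hypothetical, and cycling the resource off line and back on line (required for the change to take effect) also cycles its dependent resources:

        REM Take the network name off line, enable Kerberos, bring it back on line
        cluster res "VS1 Network Name" /offline
        cluster res "VS1 Network Name" /priv RequireKerberos=1
        cluster res "VS1 Network Name" /online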

    Caution

    Although the network name server cluster resource publishes a computer object in Active Directory, that computer object should not be used for administrative tasks such as applying Group Policy.


Resources

Improvements in server cluster resources include the following:

  • Printer configuration.

    Windows Clustering now provides a much simpler configuration process for setting up clustered printers. To set up a clustered print server, you need to configure only the spooler resource in Cluster Administrator and then connect to the virtual server to configure the ports and print queues. This is an improvement over previous versions of Windows Clustering, in which you had to repeat the configuration steps on each node in the cluster, including installing printer drivers.
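
    As an illustrative sketch, the spooler resource can also be created from the command line. The names below are hypothetical; the spooler must depend on a physical disk and a network name resource:

        cluster res "ClusterSpooler" /create /group:"PrintGroup" /type:"Print Spooler"
        cluster res "ClusterSpooler" /adddep:"Disk P:"
        cluster res "ClusterSpooler" /adddep:"PrintServer Name"
        cluster res "ClusterSpooler" /online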

  • MSDTC configuration.

    The Microsoft Distributed Transaction Coordinator (MSDTC) can now be configured once and replicated to all nodes. In previous versions, the Comclust.exe utility had to be run on each node in order to cluster the MSDTC. It's now possible to configure MSDTC as a resource type, assign it to a resource group, and have it automatically configured on all cluster nodes. Also, once configured, when new nodes are added to the cluster, DTC is set up on the new node automatically.
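
    A minimal sketch of creating the clustered MSDTC resource with Cluster.exe follows; the dependency names are hypothetical, and MSDTC depends on a physical disk and a network name:

        cluster res "MSDTC" /create /group:"Cluster Group" /type:"Distributed Transaction Coordinator"
        cluster res "MSDTC" /adddep:"Disk Q:"
        cluster res "MSDTC" /adddep:"Cluster Name"
        cluster res "MSDTC" /online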

  • Scripting.

    You can make existing applications server cluster-aware by using scripting (Visual Basic Scripting Edition and JScript) rather than by writing resource DLLs in C or Visual C++. Scripting makes it much simpler to write resource plug-ins for specific applications so that they can be monitored and controlled in a server cluster. Scripting supports resource-specific properties, which enable a resource script to store cluster-wide configuration data that can be used and managed in the same way as any other resource. Adding to the script can also enhance health checking: for example, you can start with a simple generic script and later extend it to check whether the application is providing the desired service.
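
    For example, a script can be registered with the cluster by creating a Generic Script resource that points at the script file. The sketch below assumes the private property naming the script is ScriptFilepath; the resource, group, and path names are hypothetical:

        cluster res "AppMonitor" /create /group:"AppGroup" /type:"Generic Script"
        REM (assumes ScriptFilepath is the Generic Script private property)
        cluster res "AppMonitor" /priv ScriptFilepath="C:\Scripts\appmonitor.vbs"
        cluster res "AppMonitor" /online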

  • MSMQ triggers.

    The Cluster service has enhanced the MSMQ resource type to allow multiple instances on the same cluster. MSMQ triggers let you have multiple clustered message queues running at the same time, which provides increased performance (in the case of Active/Active MSMQ clusters) and flexibility. Note that you can have only one MSMQ resource per cluster group.

Network Enhancements

Networking improvements for Windows Clustering include the following:

  • Enhanced network failover.

    The Cluster service now supports enhanced logic for failover when a complete loss of internal (heartbeat) communication has occurred. The network state for public communication of all nodes is now taken into account. In Windows 2000, if node A owned the quorum disk and lost all network interfaces (for example, public and heartbeat), it would retain control of the cluster, even though no one could communicate with it and another node might have had a working public interface. Windows Server 2003 cluster nodes now take the state of public interfaces into account prior to arbitrating for control of the cluster.

  • Media sense detection.

    When using the Cluster service, the TCP/IP stack no longer gets unloaded by default if network connectivity is lost, as it did in Windows 2000, so there is no longer any need to set the DisableDHCPMediaSense registry key. In Windows 2000, if network connectivity was lost, the TCP/IP stack was unloaded, which meant that all resources that depended on IP addresses were taken off line. Also, when the networks came back on line, their network role reverted to the default setting (for example, client and private). With Media Sense disabled by default, the network role is preserved and all IP address-dependent resources are kept on line.

  • Multicast heartbeat.

    Multicast heartbeats are allowed between nodes in a server cluster. Multicast heartbeat is automatically selected if the cluster is large enough and if the network infrastructure can support multicasting between the cluster nodes. Although the multicast parameters can be controlled manually, a typical configuration requires no administration tasks or tuning to enable this feature. If multicast communication fails for any reason, the internal communications will revert to unicast. All internal communications are signed and secure. Using multicast reduces the amount of traffic in a cluster subnet. This can be particularly beneficial in clusters of more than two nodes or in geographically dispersed clusters.

Storage

Improvements in storage when using Windows Clustering include the following:

  • Resizing of clustered disks.

    Clustered disks can be resized dynamically by using the command-line tool DiskPart, provided the underlying storage infrastructure is capable of extending a logical unit dynamically. If you increase the size of a shared disk, the Cluster service will now dynamically adjust to it. This is particularly helpful for storage area networks (SANs), where volume sizes can change easily to avoid disk-full situations.
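
    For example, after the logical unit has been extended on the storage array, a sketch of growing the volume is to feed DiskPart a script on the node that owns the disk; the volume number is hypothetical:

        rem extend-vol.txt -- run with: diskpart /s extend-vol.txt
        select volume 3
        extend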

  • Volume mount points.

    Volume mount points are now supported on shared disks (excluding the quorum) and will work properly on failover if configured correctly. Volume mount points (in Windows 2000 or later versions) are directories that point to specified disk volumes in a persistent manner: for example, you can configure C:\Data to point to a disk volume. They alleviate the need to associate each disk volume with a drive letter, thereby overcoming the 26-drive-letter limitation. For example, without volume mount points, you would have to create a G drive to map the data volume to. Now that the Cluster service supports volume mount points, you have much greater flexibility in how you map your shared disk namespace. The directory that hosts the volume mount point must be NTFS because the underlying mechanism uses NTFS reparse points. However, the file system that is being mounted can be file allocation table (FAT), FAT32, NTFS, Compact Disc File System (CDFS), or Universal Disc File System (UDFS).
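
    As a sketch, a mount point on a clustered disk can be created with the Mountvol tool; the paths and volume GUID below are placeholders (run mountvol with no arguments to list the actual volume names):

        REM Create the NTFS folder that will host the mount point
        md F:\Data

        REM Mount the target volume at that folder
        mountvol F:\Data \\?\Volume{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}\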

  • Client-side caching.

    Client-side caching (CSC) is now supported for clustered file shares. CSC for clustered file shares enables a client to cache data stored on a clustered share. The client works on a local copy of the data, which is uploaded back to the server cluster when the file is closed. This allows any failure of a server in the server cluster, and subsequent failover of the file share service, to be hidden from the client.

  • Distributed File System.

    Distributed File System (DFS) has had a number of improvements, including multiple stand-alone roots, independent root failover, and support for Active/Active configurations. DFS allows multiple file shares on different machines to be aggregated into a common namespace: for example, \\dfsroot\share1 and \\dfsroot\share2 are actually aggregated from \\server1\share1 and \\server2\share2. New clustering benefits include

    • Multiple stand-alone roots.

      Previous versions supported only one clustered stand-alone root. You can now have multiple clustered stand-alone roots, giving you much greater flexibility in planning your distributed file system namespace: for example, multiple DFS roots on the same virtual server or multiple DFS roots on different virtual servers.

    • Independent failover.

      Granular failover control is available for each DFS root. This lets you configure failover settings on an individual basis, which results in faster failover times.

    • Active/Active configurations.

      You can now have multiple stand-alone roots running actively on multiple nodes.

  • Encrypting File System.

    With Windows Server 2003, the Encrypting File System (EFS) is supported on clustered file shares. This allows data to be stored in an encrypted format on clustered disks.

  • Storage area networks.

    Clustering has been optimized for SANs, including targeted device resets and shared storage buses:

    • Targeted device resets.

      The server cluster software now issues a special control code when releasing disk drives during arbitration. This control code can be used in conjunction with host bus adapter (HBA) drivers, which support the extended Windows Server 2003 feature set, to selectively reset devices on the SAN rather than perform a full bus reset. This ensures that the server cluster has much lower impact on the SAN fabric.

    • Shared storage bus.

      Shared disks can be located on the same storage bus as the boot, page-file, and dump-file disks. This allows a clustered server to have a single storage bus (or a single redundant storage bus). This feature is disabled by default because of configuration restrictions; it can and should be enabled only by OEMs and IHVs for specific and qualified solutions and is not a general-purpose feature exposed to end users.

Operations

Improvements in operations for Windows Clustering include the following:

  • Backup and restore.

    You can restore the cluster configuration to the local cluster node, or you can restore the cluster configuration to all nodes in the cluster. Node restoration is also built into Automated System Recovery (ASR). A command-line backup sketch follows this list.

    • Backup and restore.

      Backup (NTBackup.exe) in Windows Server 2003 has been enhanced to enable seamless backups and restores of the local cluster database and to make it possible to restore the configuration locally and to all nodes in a cluster.

    • Automated system recovery.

      ASR can completely restore a cluster in a variety of scenarios, including damaged or missing system files, complete operating system reinstallation as a result of hardware failure, a damaged cluster database, or changed disk signatures (including shared).
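
    A sketch of backing up the System State (which includes the local cluster database) with NTBackup from the command line follows; the job name and backup file path are hypothetical:

        ntbackup backup systemstate /j "Cluster node backup" /f "D:\Backups\node1-systemstate.bkf"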

  • Group affinity support.

    Group affinity support allows an application to describe itself as an N+I application, meaning that the application runs actively on N nodes of the server cluster with I spare nodes available if an active node fails. In the event of a failure, the failover manager tries to fail the application over to a spare node rather than to a node that is currently running the application.

  • Node eviction.

    Evicting a node from a server cluster no longer requires a reboot to clean up the server cluster state, so a node can be moved from one server cluster to another without having to reboot. In the event of a catastrophic failure, the server cluster configuration can be force-cleaned regardless of the server cluster state (see the sketch after this list). The following benefits ensue:

    • Increased availability.

      Eliminating reboots increases the uptime of the system.

    • Disaster recovery.

      In the event of a node failure, the cluster can be cleaned up easily.
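
    The sketch below shows a normal eviction and, for a node that has failed catastrophically, the forced cleanup of its cluster state; the cluster and node names are hypothetical:

        REM Evict a node cleanly; no reboot is required
        cluster MyCluster node Node3 /evict

        REM Force-clean the cluster state on a node that cannot be evicted normally
        cluster node Node3 /forcecleanup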

  • Rolling upgrades.

    Rolling upgrades allow one node in a cluster to be taken off line for upgrading while other nodes in the cluster continue to function on an older version. Rolling upgrades are supported in Windows 2000 and Windows Server 2003, although there is no support for rolling upgrades from a Microsoft Windows NT 4.0 cluster to a Windows Server 2003 cluster. An upgrade from Windows NT 4.0 is supported, but the cluster will have to be taken off line during the upgrade.

  • Password change.

    Using Windows Server 2003, you can change the Cluster service account password on the domain as well as on each local node without having to take the cluster off line. If multiple clusters use the same Cluster service account, you can change their passwords simultaneously. In Microsoft Windows NT 4.0 and Microsoft Windows 2000, to change the Cluster service account password, you have to stop the Cluster service on all nodes before you can make the change.
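
    A sketch of the new password-change operation follows; the cluster names and passwords are placeholders, and the switch updates the Cluster service account password on every node of each listed cluster without stopping the service:

        cluster /cluster:Cluster1,Cluster2 /changepassword:NewP@ssw0rd,OldP@ssw0rd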

  • Resource deletion.

    Resources can be deleted in Cluster Administrator or with Cluster.exe without taking them off line first. In previous versions, you first had to take a resource off line before you could delete it. Now the Cluster service will take resources off line automatically and then delete them.
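
    For example, a single command now takes the resource off line and deletes it (the resource name is hypothetical):

        cluster res "OldShare" /delete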

  • WMI support.

    Server clusters provide WMI support for the following (a query sketch appears after this list):

    • Cluster control and management functions.

      These include starting and stopping resources, creating new resources and dependencies, and other functions.

    • Application and cluster state information.

      WMI can be used to query whether applications are on line and whether cluster nodes are up and running, as well as to request a variety of other status information.

    • Cluster state change events.

      Cluster state change events are propagated via WMI to allow applications to subscribe to WMI events that show when an application has failed, when an application is restarted, when a node fails, and other occurrences.

    • Better management.

      WMI support enables server clusters to be managed as part of an overall WMI environment.
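
    As a sketch, the cluster WMI classes live in the root\MSCluster namespace and can be queried with the built-in Wmic tool; the class and property names below are assumptions based on the MSCluster schema:

        REM List each resource and its current state
        wmic /namespace:\\root\mscluster path MSCluster_Resource get Name,State

        REM List cluster nodes and their state
        wmic /namespace:\\root\mscluster path MSCluster_Node get Name,State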

Supporting and Troubleshooting

Improvements in support and troubleshooting when working with Windows Clustering include the following:

  • Offline/failure reason codes.

    These codes provide additional information to the resource as to why the application was taken off line or why it failed. Reason codes enable an application to use different semantics if it or one of its dependencies has failed, as opposed to the administrator specifically moving the group to another node in the server cluster.

  • Software tracing.

    The Cluster service now has a feature called software tracing that produces more information to help with troubleshooting cluster issues. This is a new debugging method that allows Microsoft to debug the Cluster service without loading checked-build versions of the DLLs or symbol files.

  • Cluster logs.

    A number of improvements have been made to the Cluster service log files, including a setup log, error levels (info, warn, err), local server time entries, and GUID-to-resource-name mapping.

    • Setup log.

      During configuration of the Cluster service, a separate setup log (%SystemRoot%\system32\Logfiles\Cluster\ClCfgSrv.log) is created to assist in troubleshooting.

    • Error levels.

      This makes it easy to highlight just the entries that require action (for example, err).

    • Local server time stamp.

      This assists in comparing event log entries with cluster logs.

  • Event Log.

    Additional events are written to the event log, not only indicating error cases but also showing when resources are successfully failed over from one node to another. These improvements enable event log parsing and management tools to track successful failovers rather than just catastrophic failures.

  • Clusdiag.

    A new tool named Clusdiag is available in the Windows Server 2003 Resource Kit. Clusdiag offers the following capabilities:

    • Better troubleshooting.

      Clusdiag makes it more straightforward to read and correlate cluster logs across multiple cluster nodes and to debug cluster issues.

    • Validation and testing.

      Clusdiag allows users to run stress tests on the server, storage, and clustering infrastructure. As a result, it can be used as a validation and test tool before a cluster is put into production.

  • CHKDSK log.

    The Cluster service creates a CHKDSK log whenever CHKDSK is run on a shared disk. This allows a system administrator to find out about and react to any issues that were discovered during the CHKDSK process.

  • Disk corruption.

    When disk corruption is suspected, the Cluster service reports the results of CHKDSK in event logs and creates a log in %SystemRoot%\Cluster. Results are logged in the Application event log and in Cluster.log. In addition, Cluster.log references a log file (for example, %windir%\CLUSTER\CHKDSK_DISK2_SIGE9443789.LOG) in which detailed CHKDSK output is recorded.


   