The Cluster service in Windows 2000 Advanced Server and Windows 2000 Datacenter Server provides a foundation for server clusters. When one server in a cluster fails or is taken offline, another server in the cluster takes over the operations of the failed server. Clients using server resources experience little or no interruption of their work because support for resources is moved from one server to the other. When implementing Windows clustering into a network design, you must consider many factors and prepare the environment that supports the clusters. For example, you must select which applications to run on a server cluster, and you must determine failover policies for resource groups. This lesson focuses on those aspects of planning a server cluster that you should consider when designing your network.
You should consider a number of steps when planning a server cluster, including identifying network risks, choosing applications to run on the cluster, choosing a domain model, choosing a cluster model, planning resource groups, determining failover policies, planning fault-tolerant storage, and determining capacity requirements. This section discusses each of these steps.
With Windows 2000, you can use server clusters to provide increased availability. However, server clusters aren’t designed to protect all components of your workflow in all circumstances. For example, clusters aren’t an alternative to backing up data; they protect the availability of data only, not the data itself.
When you configure a cluster, you should identify any possible single points of failure in your network environment. In general, you should try to minimize those points of failure and provide mechanisms that will maintain service when a failure occurs.
Windows 2000 Advanced Server and Windows 2000 Datacenter Server include built-in features (in addition to the Cluster service) that protect certain computer and network processes during failure. These features include two redundant array of independent disks (RAID) implementations: mirroring (RAID-1) and striping with parity (RAID-5). You should note, however, that software implementations of RAID are used to protect a computer’s internal drives, not the external storage used by the cluster.
To further increase the availability of network resources and prevent the loss of data, do the following:
You can adapt many, but not all, applications to run on a server cluster, and you don’t need to set up every adaptable application as a cluster resource. Two criteria determine whether an application can adapt to server clustering failover mechanisms: the application must use TCP/IP, and it must be able to specify where its application data is stored.
In addition to these specifications, client applications that connect to the server application must be able to retry and recover from temporary network failures. During failover, client applications experience a temporary loss of network connectivity. If the client application is configured to recover from temporary network connection problems, it’s able to continue operating after a server failover.
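The retry behavior described above can be sketched as follows. This is a minimal illustration, not part of any Windows or Cluster service API: the function name `call_with_retry`, its parameters, and the default delays are all assumptions chosen for the example. The sketch retries an operation that raises `ConnectionError`, waiting a little longer after each failed attempt, which is one common way for a client to ride out the brief loss of connectivity during failover.

```python
import time


def call_with_retry(operation, attempts=5, initial_delay=0.5, backoff=2.0,
                    sleep=time.sleep):
    """Run operation(); on ConnectionError, wait and try again.

    All names and defaults here are hypothetical. `sleep` is injectable
    so that the waiting behavior can be observed without real delays.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: report the failure to the caller
            sleep(delay)      # give the cluster time to complete failover
            delay *= backoff  # wait longer before the next attempt
```

A client would wrap each server request in `call_with_retry`. The number of attempts and the delays should be long enough to cover the time your resource groups typically need to come online on the other node.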
Applications that can be failed over can be divided into two groups: those that support the Cluster API and those that don’t. Applications that support the Cluster API are defined as cluster-aware. These applications can register with the Cluster service to receive status and notification information, and they can use the Cluster API to administer clusters. Applications that don’t support the Cluster API are defined as cluster-unaware. If cluster-unaware applications meet the TCP/IP and remote-storage criteria, you can still use them in a cluster and often configure them to fail over.
In either case, applications that keep significant state information in memory aren’t the best applications for clustering because information that’s not stored on disk is lost at failover.
Nodes in a server cluster must belong to the same domain. The cluster nodes, which must be configured with Windows 2000 Advanced Server or Windows 2000 Datacenter Server, can be either member servers or domain controllers. If you configure your cluster nodes as domain controllers, you must account for the additional overhead that’s incurred by the domain controller services. If you configure the cluster nodes as member servers, the cluster’s availability depends on the availability of the domain controller, which must be high.
In large networks running on Windows 2000, domain controllers can require substantial resources to replicate the directory and authenticate clients. For this reason, to maximize performance, you should not install many applications, such as Microsoft SQL Server and Message Queuing, on domain controllers. However, if you have a very small network in which account information rarely changes and in which users don’t log on and off frequently, you can use domain controllers as cluster nodes.
Server clusters can be categorized into different configuration models. You should choose a cluster model that best matches your organization’s needs. Cluster models are discussed in more detail in Lesson 3, "Choosing a Server Cluster Model."
You can take six steps to organize your applications and other resources into groups. This section reviews each of these steps.
Make a list of all applications that will run on the cluster nodes, regardless of whether or not you plan to use them with the Cluster service. You can determine your capacity needs by adding up the resources necessary for each resource group and the resources necessary for those applications and services that will run independently of the Cluster service.
Determine which applications on your list can use failover and which applications will reside on cluster nodes but won’t use failover (because it’s inconvenient, unnecessary, or impossible to configure). Although you don’t set failover policies for these applications or arrange them in groups, they still use a portion of the server capacity.
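The addition described in these two steps can be sketched as a small calculation. The application names and memory figures below are invented for illustration, and the sketch assumes the worst case in which a single node must host every resource group after failover while also running its own non-clustered applications.

```python
def node_ram_needed_mb(group_ram_mb, independent_ram_mb):
    """Add the RAM for every resource group to the RAM for applications
    that run on the node independently of the Cluster service."""
    return sum(group_ram_mb.values()) + sum(independent_ram_mb.values())


# Hypothetical figures, for illustration only.
groups = {"SQL Server group": 1024, "File and Print group": 256}
independent = {"Antivirus": 64, "Backup agent": 128}

total = node_ram_needed_mb(groups, independent)  # 1024 + 256 + 64 + 128
```

The same pattern applies to other resources, such as disk space or processor load. Only the units change.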
Before clustering an application, review the application license or check with the application vendor. Each application vendor sets its own licensing policies for applications running on clusters.
Determine which hardware, connections, and operating system software a server cluster can protect in your network environment. For example, the Cluster service can fail over print spoolers to protect client access to printing services and fail over file-server resources to maintain client access to files. Both kinds of resources affect capacity; for example, each requires random access memory (RAM) to service its clients.
Once you have a complete list of all the resources, determine which ones are your core resources, and then determine which ones support the core resources. For example, a SQL Server resource would be your core resource, and Network Name, IP Address, and Disk resources would support the SQL Server resource. All these resources must be in the same group to ensure that the Cluster service keeps interdependent resources together at all times.
Once you’ve listed all your resources and their dependencies, you’re ready to make a preliminary decision about how to group these resources. In many cases resource groupings are very apparent because dependencies restrict how you can group some resources.
When grouping together resources, you should adhere to these guidelines:
For example, if several applications depend on a particular resource, you must include all of those applications with that resource in a single group. Suppose that a Web-server application provides access to Web pages and that those Web pages present result sets that clients obtain by querying a SQL-database application through Hypertext Markup Language (HTML) forms. If you put the Web server and the SQL database in the same group, the data for both core applications can reside on a specific disk volume. Because both applications exist within the same group, you can also create an IP address and network name specifically for this resource group.
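The rule that interdependent resources must share a group can be checked mechanically. The following sketch is an illustration only: the resource names are hypothetical, and the function is not part of the Cluster service. It treats each dependency as a link between two resources and collects everything that is linked, directly or through other resources, into one preliminary group.

```python
from collections import defaultdict


def group_by_dependency(dependencies):
    """Return preliminary groups: resources connected by any chain of
    dependencies end up in the same group.

    `dependencies` maps each resource to the resources it depends on.
    """
    neighbours = defaultdict(set)
    for resource, needs in dependencies.items():
        neighbours[resource]  # make sure resources with no links appear
        for needed in needs:
            neighbours[resource].add(needed)
            neighbours[needed].add(resource)

    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            current = stack.pop()
            members.append(current)
            for other in neighbours[current]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(members))
    return groups


# Hypothetical resources: the Web server and SQL database share a disk.
example = {
    "Web Server": {"Disk S:", "Network Name A"},
    "SQL Database": {"Disk S:"},
    "Network Name A": {"IP Address A"},
    "File Share": {"Disk T:", "Network Name B"},
    "Network Name B": {"IP Address B"},
}
preliminary_groups = group_by_dependency(example)
```

In this example the shared disk forces the Web server and the SQL database into one group, while the file share and its supporting resources form a second group. You can still merge groups for administrative convenience, but you can’t split a group that the dependencies hold together.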
When not restricted by resource dependencies, you can organize groups by administrative convenience. For example, you might put file-sharing and print-spooling resources (along with their dependencies) into one group because viewing those particular applications as a single entity makes it easier to administer the network. You can give this group a unique name for the part of your organization it serves, such as Accounting File and Print. Whenever you need to intervene with that department’s file- and print-sharing activities, you’d look for this group in Cluster Administrator.
After you list the resources that you want to group together, assign a different name to each group and create a dependency tree. A dependency tree is useful for visualizing the dependency relationships between resources.
To create a dependency tree, first write down all the resources in a particular group. Then draw an arrow from each resource to every resource on which it directly depends.
A direct dependency between resource A and resource B means that no intermediary resources are between the two resources. An indirect dependency occurs when a transitive relationship exists between resources. For example, if resource A depends on resource B and resource B depends on resource C, there’s an indirect dependency between resource A and resource C, rather than a direct one.
Figure 4.6 shows the resources in a final grouping assignment in a dependency tree.
Figure 4.6 - A simple dependency tree
In Figure 4.6 the File Share resource depends on the Network Name resource, which in turn depends on the IP Address resource. However, the File Share resource doesn’t directly depend on the IP Address resource.
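The distinction between direct and indirect dependencies can be expressed in a few lines. This sketch models only the three relationships described in the text for Figure 4.6. The function names are assumptions made for the example and aren’t part of the Cluster API.

```python
def all_dependencies(direct, resource):
    """Every resource reached by following dependency arrows from `resource`."""
    found = set()
    stack = list(direct.get(resource, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(direct.get(current, ()))
    return found


def indirect_dependencies(direct, resource):
    """Dependencies that are reached only through an intermediary resource."""
    return all_dependencies(direct, resource) - set(direct.get(resource, ()))


# The arrows described for Figure 4.6.
figure_4_6 = {
    "File Share": {"Network Name"},
    "Network Name": {"IP Address"},
    "IP Address": set(),
}

indirect = indirect_dependencies(figure_4_6, "File Share")
```

Here `indirect` contains only the IP Address resource, which matches the figure: the File Share resource reaches the IP Address resource through the Network Name resource rather than directly.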
You must assign failover policies for each group of resources in your cluster. These policies determine exactly how a group behaves when failover occurs. You can choose which policies are most appropriate for each resource group you set up.
Failover policies for groups include three settings: Failover Timing, Preferred Node, and Failback Timing.
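One way to reason about the three settings (Failover Timing, Preferred Node, and Failback Timing) is to write them down as a small data structure. The following sketch is a planning aid under stated assumptions, not the Cluster service’s actual configuration format: the class, field, and function names are hypothetical, and the sketch assumes that Failover Timing is expressed as a number of restart attempts and that Failback Timing is either immediate or limited to a window of hours.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FailoverPolicy:
    # Failover Timing: restarts to try on the current node before failing over.
    restart_attempts: int = 0
    # Preferred Node: where the group should run when possible (None = no preference).
    preferred_node: Optional[str] = None
    # Failback Timing: (start_hour, end_hour) window, or None for immediate failback.
    failback_window: Optional[Tuple[int, int]] = None


def action_on_failure(policy, restarts_so_far):
    """Decide whether to restart the resource in place or fail the group over."""
    if restarts_so_far < policy.restart_attempts:
        return "restart"
    return "failover"


def should_fail_back(policy, current_node, preferred_online, hour):
    """Decide whether the group should return to its preferred node now."""
    if policy.preferred_node is None or current_node == policy.preferred_node:
        return False
    if not preferred_online:
        return False
    if policy.failback_window is None:
        return True
    start, end = policy.failback_window
    return start <= hour < end


# Hypothetical policy: try three restarts first, no preferred node.
example_policy = FailoverPolicy(restart_attempts=3)
```

A group with no preferred node never fails back in this model, because there is no node to return to. A group with a preferred node and a failback window moves back only during the hours you allow, which keeps the second interruption away from peak usage.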
Many groups include disk resources for disks on shared buses. In some cases, these are simple physical disks, but in other cases they’re complex disk subsystems containing multiple disks. Almost all resource groups depend on the disks on the shared buses. An unrecoverable failure of a disk resource results in certain failure of all groups that depend on that resource.
For these reasons, you might decide to use special methods to protect your disks and disk subsystems from failures. One common solution is the use of a hardware-based RAID solution. RAID support ensures the high availability of data contained on disk sets in your clusters. Some of these hardware-based solutions are considered fault tolerant, which means that data isn’t lost if a member of the disk set fails. You might also use a storage area network (SAN), which can be located on- or off-site.
You can’t use software fault-tolerant disk sets for cluster storage.
The Microsoft Windows Hardware Compatibility List contains many different hardware RAID configurations for clusters. Because many hardware RAID solutions provide power, bus, and cable redundancy within a single cabinet and track the state of each component in the hardware RAID firmware, they provide data availability with multiple redundancy, protecting against multiple points of failure. Hardware RAID solutions can also use an onboard processor and cache. Windows 2000 can use these disks as standard disk resources.
When implementing hardware RAID, you should use redundant RAID controllers to make sure that the controller won’t be a single point of failure.
A SAN is a high-speed, special-purpose network (or subnetwork) that interconnects different kinds of data storage devices with an associated data server on behalf of a larger network of users. Typically, a SAN is part of the overall network of computing resources, and it’s usually clustered in close proximity to other computing resources. However, a SAN can extend to remote locations for backup and archival storage, using WAN carrier technologies such as Asynchronous Transfer Mode (ATM) or Synchronous Optical Network (SONET).
SANs support disk mirroring, backup and restore, the archival and retrieval of archived data, data migration from one storage device to another, and the sharing of data among different servers in a network. SANs can incorporate subnetworks with network-attached storage systems.
After you assess your clustering needs, you’re ready to determine how many servers you need and with what specifications, such as memory and hard disk storage. Capacity planning for clusters is discussed in Chapter 7, "Capacity Planning."
The process of planning your server configuration has several steps. In each of these steps you must decide which configuration is best suited to your organization. Table 4.3 describes the decisions that you must make for each of these steps.
Table 4.3 Planning Your Server Cluster
Step | Description |
---|---|
Identifying network risks | When implementing clusters and the environment in which they’re located, you should minimize the number of single points of failure and provide mechanisms that maintain service when a failure occurs or that minimize the amount of unscheduled downtime. |
Choosing applications to run on the cluster | The server applications must use TCP/IP and be able to specify where application data is stored. Client applications that connect to the server applications must be able to retry the connection and recover from temporary network failures. Server applications that keep significant state information in memory aren’t good candidates for clustering. |
Choosing a domain model | You can configure nodes in a cluster as member servers or domain controllers. In either case the nodes must belong to the same domain. If you configure the nodes as member servers, the availability of the cluster depends on the availability of the domain controller. |
Choosing a server cluster model | Server clusters can be categorized into different configuration models. Clustering models are discussed in more detail in Lesson 3, "Choosing a Server Cluster Model." |
Planning the resource groups | You should follow six steps when organizing resource groups: listing applications, sorting applications, listing other resources, listing dependencies, making preliminary grouping decisions, and making final grouping decisions. |
Determining failover policies for groups | You can assign failover policies to each group of resources in a cluster. Failover policies include three settings: Failover Timing, Preferred Node, and Failback Timing. |
Planning fault-tolerant storage | You should protect the clustering shared storage from failures; however, you can’t use software fault-tolerant disks in that storage. Hardware-based RAID and SAN, along with redundant controllers, offer a highly available solution for cluster data. |
Determining capacity requirements | After you assess your clustering needs, you should determine your capacity requirements. This process is discussed in Chapter 7, "Capacity Planning." |
When planning a Windows 2000 Advanced Server or Windows 2000 Datacenter Server cluster, you should adhere to the following guidelines:
Northwind Traders imports gift items from Southeast Asia into the United States. The company sells these items to wholesale outlets in the United States and Europe. The company is setting up a Web-based system that will allow wholesale customers to place orders online. The site’s goal is to be available all day, every day to accommodate various time zones and work schedules. The network includes a database that contains customer, product, and order information. Northwind Traders plans to use the Cluster service in Windows 2000 Advanced Server to provide highly available data.
Before implementing the cluster, Northwind Traders will use the planning process outlined in this lesson to determine how to set up the cluster. The first step is to ensure that any single point of failure in the network is eliminated. The Web site and its network infrastructure will use redundancy throughout the network to achieve high availability. For example, redundant LANs and power sources will be used to prevent failure.
Northwind Traders is using SQL Server 2000 to manage the database because SQL Server uses TCP/IP and the application is able to specify where application data is stored. A resource group will be created that contains SQL Server and any dependent resources. The failover policies for the group will be configured as follows: the Failover Timing setting will be configured to first try restarting resources before failover occurs. A preferred node, however, won’t be designated. The servers are configured as member servers, so the domain controller services for that domain are designed to be highly available. Each cluster node will utilize redundant Fibre Channel host bus adapters (HBAs) to connect to a SAN. Each Fibre Channel HBA will be cross-connected to separate switches for redundancy. Each switch will also be connected to redundant Fibre Channel controllers on the external Fibre Channel storage array. The storage array itself should already have redundant internal components and built-in fault tolerance. This configuration eliminates any single point of failure throughout the entire SAN.
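You can verify a claim such as "no single point of failure" by modeling the connections and removing one component at a time. The sketch below is an illustration under stated assumptions: the component names (`Node1`, `HBA1`, `Switch1`, and so on) are hypothetical labels for the parts described above, only one cluster node is modeled, and the storage array is treated as the destination because its internal redundancy is assumed.

```python
from collections import defaultdict


def build_graph(links):
    """Turn a list of (a, b) connections into an undirected adjacency map."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def reachable(graph, start, goal, removed=None):
    """True if `goal` can be reached from `start` without using `removed`."""
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        if current == goal:
            return True
        for nxt in graph[current]:
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def single_points_of_failure(links, start, goal):
    """Components whose loss alone cuts every path from `start` to `goal`."""
    graph = build_graph(links)
    components = set(graph) - {start, goal}
    return sorted(c for c in components
                  if not reachable(graph, start, goal, removed=c))


# Hypothetical labels for the SAN described above, seen from one node.
san_links = [
    ("Node1", "HBA1"), ("Node1", "HBA2"),
    ("HBA1", "Switch1"), ("HBA2", "Switch2"),
    ("Switch1", "ControllerA"), ("Switch1", "ControllerB"),
    ("Switch2", "ControllerA"), ("Switch2", "ControllerB"),
    ("ControllerA", "Array"), ("ControllerB", "Array"),
]
spofs = single_points_of_failure(san_links, "Node1", "Array")
```

For this topology `spofs` is an empty list, which agrees with the design goal. If both HBAs were cabled to the same switch, that switch would appear in the result.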
Figure 4.7 shows how the two servers are connected to the corporate network and the SAN. Notice that dual NICs are used for network connectivity: one for client communication and one for the private cluster communication.
Figure 4.7 - Cluster configuration with SAN
When implementing Windows clustering into a network design, you must plan the configuration of specific components within the Cluster service and prepare the environment that supports the clusters. You should minimize the number of single points of failure in your environment and provide mechanisms that maintain service when a failure occurs. Clustering applications must use TCP/IP and be able to specify where the application data is stored. You should assign the failover policies for each group of resources in your cluster to determine how a group behaves. Nodes in a server cluster can be either member servers or domain controllers, and server clusters can be categorized into different configuration models. You should follow six steps when organizing resource groups: listing applications, sorting applications, listing other resources, listing dependencies, making preliminary grouping decisions, and making final grouping decisions. You can use hardware-based RAID and redundant controllers to make cluster storage fault tolerant. After you assess your clustering needs, you’re ready to determine how many servers you need and with what specifications, such as memory and hard disk storage.