The primary administrative tasks on a cluster are:
Because of their scope and complexity, certain tasks that can be viewed as administrative—such as forcing cluster synchronization or modifying a member's load balancing configuration—are covered later in this chapter.
To add a server to a cluster, use the Add Cluster Member wizard; its dialog is similar to the New Cluster Wizard but has fewer steps. The wizard steps identify the new member, gather credentials if required, analyze the server's configuration, and add the server to the cluster.
NOTE
Before you add a server to the cluster, you need to assess its hardware configuration. You should do this for two reasons: first, to verify that it meets the minimum configuration requirements to be a cluster member; and second, to determine whether its processing capabilities are adequate. Don't forget, there is the potential for any member to be pressed into service as a cluster controller. Use the existing controller's configuration as a guideline for evaluating this server.
As trivial as a welcome page may seem—users do tend to ignore it and click Next—let's start with this page because it presents important information.
NOTE
You can add a server to a cluster by invoking the wizard from the server that you want to add, from the cluster controller, or remotely from a computer outside the cluster. In any case, you must supply the appropriate administrative credentials to connect to any server that you're not logged on to, whether that's the cluster controller or the potential new member.
Welcome to the Add Cluster Member Wizard
In addition to telling you what the wizard does, the opening page provides these setup warnings:
Server Name and Credentials
On this page, you specify the server to add, either by browsing the network or by entering the server's name or IP address. To continue, you have to provide explicit credentials for an account that has administrative privileges on that server.
Controller
Virtually identical to the Server Name and Credentials page, this is where you provide the name of the cluster controller for the target cluster. Unless you're working on the controller, you will have to provide administrative credentials.
Analyzing Server Configuration
During this analysis phase, the wizard checks the configuration of the server you want to add as well as the target cluster controller. The following information is gathered:
Load Balancing Options
If the cluster that you're joining already has NLB installed, the network adapter selection list appears dimmed. If not, you'll have to specify the network adapter that you want to use for load balancing. The load balancing cases described for the cluster creation process also apply in this case. The screen capture in Figure 4.4 illustrates the load balancing options that are available.
Figure 4.4 Available load balancing options when adding a cluster member
Two items should be noted on the Cluster Member Options page shown in Figure 4.4: the settings for Automatically synchronize this cluster member and Bring this cluster member online, both of which are enabled by default. There are cases where you will not want the member to be synchronized to the controller and/or brought online for load balancing immediately, for example, after a staged deployment or when you want to test new content with live users.
Finish
During this phase, the wizard generates setup XML for the new member, updates the cluster membership list on the controller and new member, generates member and cluster configuration settings, and returns a success or failure notification. The final step is synchronization, in which cluster controller content and settings are replicated to the new member. The new member is brought online for load balancing by default, but you can defer this step until later if you prefer.
Figure 4.5 illustrates the network-level configurations that occur when a server is added to an NLB cluster.
Figure 4.5 Network-level configurations as a result of adding a cluster member
Two items are of particular interest in the illustration shown in Figure 4.5. First, the static IP address on the controller's front-end network adapter, which is used for load balancing, is bound to the front-end network adapter on the new member. This cluster IP address is used for servicing all incoming TCP/UDP requests according to the NLB port-rule settings for a given port; for example, HTTP requests on port 80. On a COM+ routing cluster, NLB uses this address to service incoming RPC activation requests. Second, if NLB is used for load balancing, the media access control address of the controller's front-end network adapter is assigned as the media access control address for the front-end network adapter on the new member. This is why, at the Ethernet level, all cluster members can "hear" inbound TCP/UDP requests that are sent to the cluster IP address.
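To make the shared-MAC behavior concrete, the following sketch (purely illustrative) models the idea that every member receives each inbound frame and a single member accepts it based on a port rule and a deterministic filter. The hashing shown here is an assumption for illustration; it is not Application Center's or NLB's actual filtering algorithm, and the server names are examples.

```python
# Illustrative only: this is not Application Center's or NLB's actual filtering
# algorithm. It models why sharing the cluster MAC address matters: every member
# receives each inbound frame, and exactly one member accepts it.
import zlib
from dataclasses import dataclass

@dataclass
class Member:
    host_id: int   # unique priority within the cluster
    name: str

def owning_member(members: list[Member], client_ip: str,
                  client_port: int, dest_port: int) -> Member:
    """Return the single member that should service this TCP/UDP request.

    All members "hear" the frame because they share the cluster MAC address;
    each runs the same deterministic filter, and only the owner responds.
    """
    if dest_port != 80:
        # Example port rule: only HTTP (port 80) is load balanced here.
        raise ValueError("no port rule covers this port")
    bucket = zlib.crc32(f"{client_ip}:{client_port}".encode()) % len(members)
    return members[bucket]

cluster = [Member(1, "ACSRV01"), Member(2, "ACSRV02"), Member(3, "ACSRV03")]
print(owning_member(cluster, "203.0.113.25", 49152, 80).name)
```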
NLB and network adapter media access control addresses
In unicast mode, NLB overwrites the network adapter's media access control address with its own virtual media access control address by using the registry. Some network adapter drivers do not allow their media access control address to be overwritten in the registry. The workaround is to use multicast mode, which adds a virtual media access control address alongside the network adapter's existing media access control address, or to use a different network adapter that allows its media access control address to be overwritten in the registry. Because the Application Center user interface doesn't enable you to create a multicast cluster, you have to do this manually. The following steps are required:
- Manually configure NLB on the controller before creating a cluster.
- Choose Keep existing settings when running the cluster creation wizard.
When you add a member to the cluster, the multicast settings are replicated to the new member.
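The following sketch contrasts the two modes. The 02-bf/03-bf prefixes follow NLB's conventional derivation of the virtual media access control address from the cluster IP address; treat them as an assumption here, because the essential point is that unicast mode replaces the adapter's address while multicast mode adds one alongside it.

```python
# Sketch, not a configuration tool. The 02-bf/03-bf prefixes follow NLB's
# conventional derivation of the virtual MAC from the cluster IP address;
# treat them as an assumption. The point is the behavioral difference
# between the two modes.

def virtual_mac(cluster_ip: str, multicast: bool) -> str:
    octets = [int(o) for o in cluster_ip.split(".")]
    prefix = 0x03 if multicast else 0x02
    return "-".join(f"{b:02x}" for b in [prefix, 0xBF, *octets])

adapter_mac = "00-50-56-ab-cd-ef"    # the adapter's own address (example)
cluster_ip = "192.168.1.200"         # example cluster IP address

# Unicast mode: the virtual MAC replaces the adapter's address (via the registry),
# which some network adapter drivers refuse to allow.
unicast_macs = [virtual_mac(cluster_ip, multicast=False)]

# Multicast mode: the virtual MAC is added, so the adapter keeps its own address.
multicast_macs = [adapter_mac, virtual_mac(cluster_ip, multicast=True)]

print(unicast_macs)     # ['02-bf-c0-a8-01-c8']
print(multicast_macs)   # ['00-50-56-ab-cd-ef', '03-bf-c0-a8-01-c8']
```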
The IP addresses on the back end are dynamically allocated by DHCP and are used for transmitting cluster heartbeats as well as for content replication.
NOTE
DHCP-assigned addresses are not mandatory on the back-end adapter; you can choose to use static IP addresses on the back end instead.
Application Center cluster heartbeats
The cluster controller sends an Internet Control Message Protocol (ICMP) ping to every member at 2-second intervals. The cluster controller makes a call to the name resolution service to determine the appropriate IP address to ping. If this fails or the service is turned off, the controller calls the Windows Sockets API (Winsock) function GetHostByName() for each member to determine the IP address to ping. Each member has 1 second in which to respond to the ping. If a member doesn't respond to more than 2 consecutive pings, it is assumed to be "Dead" from a networking perspective. Its status switches back to Alive if it starts responding again and does so for 3 consecutive pings. This heartbeat, transmitted over the back-end network adapters, doesn't do any health or performance checking at the application level; it simply verifies that a server can communicate at the network level. For more information about ICMP and/or GetHostByName(), see the Platform Software Development Kit.
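A minimal sketch of the bookkeeping described above follows. The ping itself is stubbed out (the real heartbeat uses ICMP together with name resolution or GetHostByName()); the thresholds mirror the 2-second interval, the Dead state after more than 2 consecutive missed pings, and the return to Alive after 3 consecutive responses.

```python
# Sketch of the heartbeat bookkeeping described above; the ping itself is a stub.
# The real controller probes each member with ICMP, resolving addresses through
# the name resolution service or GetHostByName().
import time

PING_INTERVAL_S = 2     # controller pings every member at 2-second intervals
RESPONSE_TIMEOUT_S = 1  # each member has 1 second in which to respond
DEAD_AFTER_MISSES = 3   # "doesn't respond to more than 2 consecutive pings"
ALIVE_AFTER_HITS = 3    # back to Alive after 3 consecutive responses

def ping(member: str, timeout_s: float = RESPONSE_TIMEOUT_S) -> bool:
    """Stand-in for the real ICMP echo; replace with an actual probe."""
    raise NotImplementedError

class MemberStatus:
    def __init__(self, name: str):
        self.name = name
        self.alive = True
        self.misses = 0
        self.hits = 0

    def record(self, responded: bool) -> None:
        if responded:
            self.misses = 0
            self.hits += 1
            if not self.alive and self.hits >= ALIVE_AFTER_HITS:
                self.alive = True
        else:
            self.hits = 0
            self.misses += 1
            if self.alive and self.misses >= DEAD_AFTER_MISSES:
                self.alive = False   # "Dead" from a networking perspective only

def heartbeat_loop(members: list[MemberStatus]) -> None:
    while True:
        for m in members:
            m.record(ping(m.name))
        time.sleep(PING_INTERVAL_S)
```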
You can launch the Remove Cluster Member dialog box from the individual member's node in the MMC. If the member is still online, you'll be prompted with a warning to that effect. In the case of a Web-based cluster, online members are actively servicing HTTP requests, so they should be set offline before you remove them from the cluster. You can, however, simply force a member's removal without any draining period.
WARNING
If you choose to remove a member without first setting it offline, there is a strong potential for terminating client connections in mid-session. Any work that these users are doing may be lost.
If you're initiating a member's removal from a different cluster member, you will have to connect to the target member by using an account that has administrative privileges on that member.
After the preliminary identification and validation is completed, a component is called that carries out the following tasks:
NOTE
If there is more than one cluster member, Remove Cluster Member is unavailable for the cluster controller node in the member tree.
The final step in the removal process is the execution of a component that cleans up the member that was removed. This component:
TIP
If you do end up in a situation where a server becomes unstable or inoperable, you should remove it from the cluster and run the command-line tool CLUSTER /CLEAN against the server to clean up all of the cluster configuration settings. After you have a clean server, you can reinstall Application Center and add the server back into the cluster.
You can restart any cluster member whose node has focus in the console tree by using Restart Cluster Member (All Tasks). This action forces a warm restart of the specified member.
Whenever various Application Center services have to be restarted because of a Service Control Manager net start or a member restart, the following sequence of events occurs:
If the member being restarted isn't the controller but the controller was found, the next set of activities is added to the restart sequence:
If NLB is configured on the cluster, an additional set of start-up actions is triggered. These actions are:
At this point the restart sequence executes some final tasks before finishing the server restart:
Changing the designated controller for a cluster is a fairly simple process from the user's perspective—it consists of selecting a member node in the console tree (assuming that the user is connected to the controller) and launching the Designate as Controller command. Alternatively, if the cluster controller is down, you can connect directly to the member that you want to designate as the controller and invoke the preceding command.
TIP
Prior to promoting a member to controller status, you should do a full synchronization of the member to the current controller.
There are two situations that can exist when you decide to explicitly change the designated cluster controller:
The Controller Is Up and Running
In this scenario you've decided to promote a member to controller status even though the current controller is up and running. (Reasons for making this change may be to add more memory or replace one of the network adapters.)
You should not change the controller if one of the following operations is in progress:
WARNING
If you launch the Designate as Controller command while one of the previously described operations is in progress, the controller change will fail.
If none of these operations is in progress, the following sequence of events occurs, involving the current controller (S1) and the member that will become the controller (S2).
In the first step, the administration program verifies that S2 can be contacted. If not, the operation is stopped. The next step involves a call to S1 to see if the cluster is in a controller-less state. If it is, the processing described in the following section, "The Controller Is Not Available," occurs.
If the current controller (S1) can be contacted and appears to be functioning normally, S1:
If S1 fails before completing the preceding step, an error is returned to the administration program and no timers are started. If an S1 failure occurs after notifying some of the members, these members will have started their timers. As soon as these members are notified of the S1 failure, they expire their timers and wait to be notified of a controller recovery or controller change.
Assuming there isn't a failure, S1:
If control is successfully transferred to the new controller, S2:
If S2 fails between responding to the COM call from S1 and the firing of the new controller event, the members expire their timers and revert to regarding S1 as the controller. If S2 fails after telling only a subset of the members that it's taking over as the controller, the timer expires on the members that haven't been told about the controller change—they switch back to regarding S1 as the controller. Members on which the timer expires fire an event/alert that tells the administrator what has happened.
If controller transfer is successful to this point, the call from S1 to S2 returns and S1 changes its local pointer to reference S2 as the cluster controller. When the S1 cluster reference change is saved, the controller change is finished. (If S2 fails before this reference is changed, the administrator is notified and the controller change has to be redone.)
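The sequence can be modeled roughly as follows. The timer handling, method names, and failure hooks are illustrative assumptions; only the ordering (notify the members, hand control to S2, S2 announces itself, S1 repoints its controller reference) comes from the description above.

```python
# Simplified model of the controller-change handshake. The timer handling, method
# names, and failure hooks are illustrative assumptions; only the ordering comes
# from the description above.

class ClusterMember:
    def __init__(self, name: str, controller: str):
        self.name = name
        self.controller = controller   # who this member believes is the controller
        self.timer_running = False

    def pending_change_notified(self) -> None:
        self.timer_running = True      # wait for the new controller to announce itself

    def new_controller_announced(self, new_controller: str) -> None:
        self.timer_running = False
        self.controller = new_controller

    def timer_expired(self) -> None:
        # S2 never announced itself: keep regarding S1 as the controller and
        # raise an event/alert so the administrator knows what happened.
        self.timer_running = False
        print(f"{self.name}: controller change did not complete; "
              f"still using {self.controller}")

def designate_controller(s1: str, s2: str, members: list[ClusterMember]) -> str:
    # 1. Verify that S2 can be contacted; if not, the operation stops here.
    # 2. S1 tells every member a change is coming; the members start their timers.
    for m in members:
        m.pending_change_notified()
    # 3. S1 makes the COM call handing control to S2; S2 announces itself to
    #    every member, which cancels the timers.
    for m in members:
        m.new_controller_announced(s2)
    # 4. S1 changes its local pointer to reference S2 as the cluster controller.
    return s2
```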
There are additional special cases that can happen during the course of a controller change:
The Controller Is Not Available
When the controller (S) is not available and the cluster is in a controller-less state, in your role as administrator you have to designate another cluster member as the controller.
In order to disband a cluster, you have to remove each cluster member, leaving the cluster controller as the last member to remove. As noted earlier, the option to remove the cluster controller is not available unless it's the only cluster member.
Although this approach may seem tedious when you're faced with disbanding a large cluster, it's actually an excellent feature from a production perspective. If a cluster, regardless of its size, could be disabled with a single command, the potential for wreaking havoc in a production cluster would be frightening. (For a script example that illustrates how you can remove a group of members from a cluster by using a single batch file, see Chapter 11, "Working with the Command-Line Tool and Scripts.")
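Chapter 11 contains the supported scripted approach; as a rough illustration only, a loop like the following removes the ordinary members first and the controller last. The remove_member() helper and the server names are hypothetical placeholders, not Application Center commands.

```python
# Rough illustration only; Chapter 11 contains the supported command-line and
# scripting approach. remove_member() and the server names are hypothetical
# placeholders, not Application Center commands.

def remove_member(member: str) -> None:
    """Hypothetical stand-in for whatever removal command or API you use."""
    raise NotImplementedError

members = ["ACSRV02", "ACSRV03", "ACSRV04"]   # every member except the controller
controller = "ACSRV01"

# Remove the ordinary members first; the controller can be removed only after it
# is the last remaining member of the cluster.
for m in members:
    remove_member(m)
remove_member(controller)
```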
When you initiate the removal of the cluster controller from the cluster, the following activities occur: