After you have planned and implemented your new high availability server solution, you need to perform routine monitoring and management tasks to keep it operating correctly. For the most part, high availability solutions are prone to the same sorts of problems that plague regular servers: hardware failures, service stoppages, data corruption, and so on. Recall that the purpose of implementing a high availability solution is not to prevent these events from occurring (although that would be nice) but to minimize their impact on the client experience. In this section we examine the process of recovering from the failure of a cluster node, as well as some of the tools that can be used to monitor Network Load Balancing implementations.

Recovering from Failed Cluster Nodes
If you've done all your planning and implementing right up to now, you are ready for the day when something, anything, happens that renders one of your cluster nodes inoperable. As mentioned previously, high availability servers are still subject to the same sorts of problems and failures that plague any server; the difference is that because you have implemented a high availability solution, your clients will continue to have access to the required applications and services and should not, under most circumstances, even notice that something terrible has happened behind the scenes. It's a nice feeling knowing that even if disaster strikes, your clients can carry on as if nothing ever happened. However, you cannot afford to rest on your laurels when disaster does strike; you need to get that failed cluster node back online and into the cluster in short order. How you do this depends on the exact nature of the problem at hand. In most cases, when an MSCS cluster node has failed, you either need to rebuild it (hardware failure) or restore it from an earlier backup set (software failure or corruption). In either case, you must first evict the node from the cluster. To evict a node from a cluster, open the Cluster Administrator and connect to the cluster in question. Locate the node to be evicted and right-click it. From the context menu, select Evict Node, as shown in Figure 5.37. Note that you cannot evict a node on which the Cluster Service is still running.

Figure 5.37. You may need to evict a cluster node for a variety of reasons.
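The eviction described above can also be scripted around the cluster.exe command-line tool, the counterpart of Cluster Administrator. The following is a minimal sketch, assuming the typical column layout of cluster node /status output (which may differ by service pack); the node names and sample output are invented for illustration.

```python
# Sketch: identify failed nodes by parsing "cluster node /status" output,
# then print the cluster.exe command that would evict each one.
# ASSUMPTION: the status column is the last whitespace-separated field.

def find_down_nodes(status_output: str) -> list[str]:
    """Return node names whose status column reads 'Down'."""
    down = []
    for line in status_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[-1].lower() == "down":
            down.append(parts[0])
    return down

# Invented sample output resembling cluster.exe's report.
sample = """Node           Node ID    Status
-------------- ---------- ---------------------
NODE1          1          Up
NODE2          2          Down
"""

for node in find_down_nodes(sample):
    # "cluster node <name> /evict" removes the node from the cluster;
    # run it only after you have diagnosed the failure.
    print(f"cluster node {node} /evict")
```

In production you would feed the function real output captured from the tool rather than a hard-coded sample, but the parsing logic is the same.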
Evicting the last remaining node in an MSCS cluster removes the entire cluster itself, so be careful not to do so unless that is your intention. The eviction process is fairly abrupt, but it poses no problem for an already failed node because that node is no longer providing service to clients. After the cluster node has been evicted, you can rebuild it or perform a restoration on it as required. Should you need to rebuild the cluster node, you must ensure that its configuration exactly matches that of the node it is replacing. The IP address, local drive letters, computer name, and domain membership are all critical to successfully joining the newly created node to the cluster. In the event that you need to perform a restoration from a previous backup set, you can do so as discussed in Chapter 6, "Monitoring and Maintaining Server Availability." In a worst-case scenario in which you cannot evict a cluster node that is still operating but is experiencing problems with the Cluster Service, you can initiate a manual removal of the Cluster Service from the node by issuing the command cluster node nodename /forcecleanup from the command line, as shown in Figure 5.38.

Figure 5.38. If nothing else works to evict the cluster node, you can initiate a manual removal of the Cluster Service.
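Because a rebuilt node must match its predecessor on the settings listed above, a small pre-flight check can save a failed join attempt. This is a hypothetical sketch: the setting names and node values are invented, and gathering the live values from the rebuilt machine is left out.

```python
# Sketch: compare a rebuilt node's configuration against the node it
# replaces before attempting to join it to the cluster.
# ASSUMPTION: the keys below mirror the items the text calls critical.

REQUIRED_KEYS = ("ip_address", "drive_letters", "computer_name", "domain")

def config_mismatches(expected: dict, actual: dict) -> list[str]:
    """Return the names of critical settings that do not match."""
    return [k for k in REQUIRED_KEYS if expected.get(k) != actual.get(k)]

# Invented example values for a node being replaced and its rebuild.
old_node = {"ip_address": "192.168.1.11", "drive_letters": ["C", "Q"],
            "computer_name": "NODE2", "domain": "CORP"}
rebuilt  = {"ip_address": "192.168.1.11", "drive_letters": ["C", "Q"],
            "computer_name": "NODE2", "domain": "CORP"}

print(config_mismatches(old_node, rebuilt))  # an empty list means safe to rejoin
```

Any non-empty result tells you exactly which setting to correct before the join.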
After a cluster node has failed, you should also monitor the remaining cluster nodes to ensure that they are not adversely affected or overloaded as a result. This situation can easily occur when Active/Active clustering is in use. Chapter 6 discusses monitoring server performance in Windows Server 2003. Lastly, after a cluster node has failed, you should verify that any configured failovers have occurred properly. If they have not, you need to manually move the resource group by right-clicking it and selecting Move Group, as shown in Figure 5.39.

Figure 5.39. You may have to manually move a resource group if the failover has not occurred properly for some reason.
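Checking for stranded resource groups can likewise be scripted around cluster.exe, whose cluster group "name" /moveto:node command is the command-line counterpart of the Move Group menu action. The sketch below assumes you have already captured each group's current owner; the group and node names are invented.

```python
# Sketch: after a node failure, list the resource groups still owned by the
# failed node and print the cluster.exe commands that would move them.
# ASSUMPTION: `owners` maps group name -> current owner node, as you might
# collect from "cluster group /status".

def groups_needing_move(owners: dict[str, str], failed_node: str) -> list[str]:
    """Return groups whose current owner is the failed node."""
    return [g for g, owner in owners.items() if owner == failed_node]

# Invented ownership snapshot taken after NODE2 failed.
owners = {"Cluster Group": "NODE1", "SQL Group": "NODE2", "File Share": "NODE2"}

for group in groups_needing_move(owners, "NODE2"):
    # Move each stranded group to a surviving node.
    print(f'cluster group "{group}" /moveto:NODE1')
```

If the list comes back empty, failover completed on its own and no manual move is needed.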
Monitoring Network Load Balancing
When it comes to monitoring your NLB clusters, there is really not a whole lot to do. You should, as a standard administrative practice, perform basic performance monitoring on each of your NLB cluster nodes. You should monitor the following items:
Using the Performance console to monitor and baseline servers is discussed at length in Chapter 6 and is not discussed here.

You can, however, also perform some monitoring of your NLB cluster and NLB cluster hosts from the command line using the wlbs.exe command. For those of you pointing out that the Windows Load Balancing Service was retired with Windows NT 4.0, you are very much correct; Microsoft has simply kept the WLBS acronym around for good measure. In reality, wlbs.exe and nlb.exe are identical in every way; therefore, we discuss nlb.exe. The nlb.exe command has the following basic syntax: nlb command [remote options]. A complete listing of all available NLB commands can be found in the Windows Server 2003 help files or online at www.microsoft.com/technet/prodtechnol/windowsserver2003/proddocs/entserver/nlb_command.asp. From a monitoring point of view, we focus only on the commands outlined in Table 5.2.

EXAM TIP

Using nlb.exe remotely The strength of the nlb.exe command is that it can be used to manage NLB clusters and cluster nodes remotely across a LAN or WAN if desired. To run the nlb.exe command from a remote computer, you must enable remote control for the NLB cluster. Enabling remote control presents security risks to the NLB cluster, such as data tampering, denial of service (DoS), and unintentional data disclosure to attackers. Remote control should be used only from a trusted computer inside the same firewall as the NLB cluster or over a VPN if outside the firewall. If you choose to enable remote control despite these risks, you should take steps to protect the NLB cluster from attack. The default User Datagram Protocol (UDP) control ports for the cluster, 1717 and 2504 at the cluster VIP, should be protected by a firewall. You must also ensure that you have configured a strong remote control password.

Table 5.2. The nlb.exe Monitoring-Specific Commands
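A simple way to use nlb.exe for monitoring is to run its query command periodically and flag hosts whose convergence state looks wrong. The sketch below is a heuristic only: the response text it checks is assumed from typical wlbs/nlb query output and may vary by Windows version, and the sample string is invented.

```python
# Sketch: classify "nlb query" output as healthy or not.
# ASSUMPTION: a healthy host's response mentions that it has "converged",
# while failure text (e.g., inability to converge) does not.

def cluster_converged(query_output: str) -> bool:
    """Heuristic health check on an nlb/wlbs query response."""
    text = query_output.lower()
    return "converged" in text and "unable" not in text

# Invented sample response from a healthy two-host cluster.
sample = "Host 1 converged with host(s) 2 as part of the cluster."
print(cluster_converged(sample))
```

A real monitoring script would capture the output of nlb query (locally, or remotely with the cluster's remote control password) and alert when this check fails repeatedly.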