It's likely that your directory service is (or will be) a vital part of your computing infrastructure. Users may depend on it for things such as login, personalization, and address books. Applications may depend on it for things such as access control and e-mail delivery. Failure or unavailability of the directory can result in downtime for users and applications, translating into lost time and money.

By monitoring your directory service, you can learn of outages as soon as they occur. With more sophisticated, proactive monitoring strategies, you can also anticipate problems before they result in an outage or degraded service. Information you gather from this type of monitoring can be used to fine-tune your directory server software. Proactive monitoring alerts you to the need to change directory configuration parameters to optimize performance for common queries. It can also provide data that can help you optimize your management procedures. For example, an increase in the number of updates handled by your directory may signal the need for more frequent backups.

A monitoring system consists of four conceptual modules (see Figure 19.1):
Figure 19.1. A Conceptual Overview of Monitoring
The monitoring system shown in Figure 19.1 is a conceptual model that we use to frame our discussion of directory monitoring. Any of the modules' functions could be performed by humans or a software program; however, if you use a commercially available network management system (NMS), you'll probably find that it implements some or all of these functions for you.

The most basic type of monitoring detects when the directory (or a part of it) is unavailable, perhaps because a server machine has crashed or has become unreachable because of a network failure. These directory failures are hard failures; that is, a part of the directory has failed completely.

Other types of directory problems can result in degraded performance. For example, looping electronic mail can cause the load on the directory to increase dramatically as the messaging servers attempt to deliver the looping mail. A more advanced monitoring tool could conceivably detect the increased load on the directory and alert a system administrator, who could take corrective action. A complete monitoring system should be able to detect hard failures and also detect when performance drops below an acceptable level.

In addition to detecting hard failures and unacceptable performance degradation, a well-designed monitoring solution provides you with valuable information on performance trends. Such proactive monitoring can help you anticipate problems before they become serious enough for your users to notice.

Methods of Monitoring

There are several ways to monitor a directory service. Following are the various types of monitoring you should consider:
Later in this chapter, we discuss each of these five approaches in detail and provide specific examples.

General Monitoring Principles

Before we discuss specific monitoring methods, let's take a moment to introduce some general principles that apply to all methods.

Monitor Unobtrusively

You should always understand the implications of your monitoring strategy. A poorly designed monitoring system may adversely affect performance if it places a heavy load on the directory service. In general, you should strive to make the monitoring as unobtrusive as possible while still providing the information you need.

How do you make your monitoring unobtrusive? Use the most lightweight method available that gives you the needed information. For example, if you probe the directory, retrieving a single entry is probably sufficient; it's unnecessary to retrieve many entries. You should also probe no more often than necessary to achieve your desired responsiveness. For example, you can discover problems sooner if you probe the directory every five seconds, but that may be overkill; it's probably reasonable to probe every minute, or even every five minutes, depending on the level of service you are expected to provide to your users.

One Failure Can Cause Other Failures

Be aware that a single failure may trigger other alerts in your monitoring system. For example, if one of a set of replicated servers becomes unavailable, the load on the remaining servers may increase as clients reapportion themselves among the remaining replicas.

Keep a Problem History

You should strive to design your monitoring system so that it provides a reliable history of problems and summarized usage data. For example, if you use a commercial NMS that logs alerts in a standard format, you might periodically extract the directory-related alerts and archive them in a central location.
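The extraction step just described can be sketched in a few lines of Python. This is only an illustration: the log format (one alert per line, beginning with a YYYY-MM-DD date), the sample lines, and the function names are assumptions made for the example, not the format of any particular NMS.

```python
from collections import Counter

def extract_directory_alerts(lines, keyword="directory"):
    """Keep only the alert lines that mention the directory service.

    The keyword match is a hypothetical convention; a real NMS may tag
    alerts with a source or category field instead.
    """
    return [line for line in lines if keyword in line.lower()]

def summarize_by_day(alert_lines):
    """Count alerts per day, assuming each line starts with a YYYY-MM-DD date."""
    return Counter(line.split()[0] for line in alert_lines if line.strip())

# Made-up sample alerts in the assumed format, for illustration only.
sample = [
    "2024-01-05 09:12:03 directory server ldap1 unreachable",
    "2024-01-05 09:13:40 mail queue length high",
    "2024-01-06 14:02:11 Directory response time above threshold",
]

alerts = extract_directory_alerts(sample)   # lines to archive centrally
summary = summarize_by_day(alerts)          # per-day counts for trend analysis
```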
These extracted logs can help you identify trends that you can use to plan for expansion and to demonstrate your ever-improving reliability figures to management (or hide the figures if they don't show improvement!). You can also summarize each day's directory logs and save the summary in an archive. Over time, the archived data becomes increasingly useful for capacity planning.

Have a Plan

Finally, for every type of failure you can anticipate, you should create a written action plan to share with all operators and support personnel who might be the first to learn of a failure. It's also a good idea to have a default action plan to be followed in the event of an unanticipated error. Action plans are covered in more detail in the Taking Action section of this chapter.
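To tie these principles together, here is a minimal Python sketch of one lightweight probe whose result is classified and matched to an action plan. Everything named here is an assumption for illustration: `probe` stands for whatever single-entry read your own directory client performs, the two-second threshold is arbitrary, and the plan texts are placeholders for your written plans.

```python
import time
from typing import Callable

# Hypothetical threshold; choose one that reflects your service levels.
SLOW_SECONDS = 2.0

# Placeholder plan texts; in practice these would point to written plans.
ACTION_PLANS = {
    "hard_failure": "Follow the written plan for an unreachable directory server.",
    "degraded": "Follow the written plan for degraded directory performance.",
}
DEFAULT_PLAN = "Follow the default plan for unanticipated errors."

def classify(probe: Callable[[], object], slow_seconds: float = SLOW_SECONDS) -> str:
    """Run one lightweight probe and classify the outcome.

    Any exception raised by the probe is treated as a hard failure;
    a probe that succeeds but takes too long is treated as degraded.
    """
    start = time.monotonic()
    try:
        probe()
    except Exception:
        return "hard_failure"
    elapsed = time.monotonic() - start
    return "degraded" if elapsed > slow_seconds else "ok"

def plan_for(status: str) -> str:
    """Look up the plan for a problem status, falling back to the default plan."""
    return ACTION_PLANS.get(status, DEFAULT_PLAN)
```

A scheduler would call `classify` no more often than your service level requires (every minute, say, rather than every five seconds) and call `plan_for` only when the status is not "ok".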