An Introduction to Monitoring

It's likely that your directory service is (or will be) a vital part of your computing infrastructure. Users may depend on it for things such as login and address books, and applications may depend on it for things such as access control and email delivery. Failure or unavailability of the directory can result in downtime for users and applications, which translates into lost time and money.

By monitoring your directory service, you can learn of outages as soon as they occur. With more sophisticated, proactive monitoring strategies, you can also anticipate problems before they result in an outage or degraded service.

Information you gather from this type of monitoring can be used to fine-tune your directory server software. For example, proactive monitoring may alert you to the need to change directory configuration parameters to optimize performance for common queries. Proactive monitoring can also provide data that helps you optimize your management procedures. For example, an increase in the number of updates handled by your directory may signal the need for more frequent backups.

A monitoring system consists of three conceptual modules (see Figure 18.1): Device and Application Probing, Event Correlation, and Notification.

Figure 18.1 A conceptual overview of monitoring.
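As a rough illustration of how the three modules might fit together, the Python sketch below wires a probing step, an event correlation step, and a notification step into one cycle. Everything in it is an assumption made for illustration: the names `ProbeResult`, `correlate`, and `run_cycle` are hypothetical, the probes are plain callables standing in for real device or application checks, and the correlation rule (report only a change of state) is one simple possibility rather than a description of how any particular product behaves.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ProbeResult:
    """Outcome of one probe (hypothetical structure)."""
    target: str  # name of the server or application that was probed
    ok: bool     # True if the probe got a valid answer


def correlate(previous_ok: Optional[bool], result: ProbeResult) -> Optional[str]:
    """Event correlation, under an assumed rule: emit an event only when the
    state changes, so a server that stays down produces a single event."""
    if previous_ok is None:
        previous_ok = True  # assumption: a target never seen before was healthy
    if previous_ok == result.ok:
        return None
    state = "available again" if result.ok else "UNAVAILABLE"
    return f"{result.target} is {state}"


def run_cycle(probes: Dict[str, Callable[[], bool]],
              last_state: Dict[str, bool],
              notify: Callable[[str], None]) -> None:
    """Run every probe once, correlate with the last known state, and notify."""
    for target, probe in probes.items():
        try:
            ok = bool(probe())          # probing module
        except Exception:
            ok = False                  # a probe that raises counts as a failure
        event = correlate(last_state.get(target), ProbeResult(target, ok))
        last_state[target] = ok
        if event is not None:
            notify(event)               # notification module


if __name__ == "__main__":
    state: Dict[str, bool] = {}
    # Three cycles against a hypothetical server "ldap1": down, still down, up.
    run_cycle({"ldap1": lambda: False}, state, print)
    run_cycle({"ldap1": lambda: False}, state, print)
    run_cycle({"ldap1": lambda: True}, state, print)
```

In this sketch the second cycle prints nothing, because the server's state has not changed since the first cycle. Passing `notify` as a parameter keeps the notification step replaceable: `print` here, but a pager or email function would fit the same slot.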
The monitoring system shown in Figure 18.1 is a conceptual model that we use to frame our discussion of directory monitoring. Any of the modules' functions could be performed by humans or by a software program; however, if you use a commercially available network management system (NMS), you'll probably find that it implements all of these functions for you.

The most basic type of monitoring detects when the directory (or a part of it) is unavailable, perhaps because a server machine has crashed or has become unreachable as a result of a network failure. These directory failures are hard failures; that is, a part of the directory has failed completely.

Other types of directory problems can result in degraded performance. For example, looping electronic mail can cause the load on the directory to increase dramatically as the messaging servers attempt to deliver the looping mail. A more advanced monitoring tool could conceivably detect the increased load on the directory and alert a system administrator, who could take corrective action. A complete monitoring system should be able to detect hard failures and also detect when performance drops below an acceptable level.

In addition to detecting hard failures and unacceptable performance degradation, a well-designed monitoring solution also provides you with valuable information on performance trends. Such proactive monitoring can help you anticipate problems before they become serious enough for your users to notice.

Methods of Monitoring

There are a number of ways to monitor a directory service. The types of monitoring you should consider are monitoring with SNMP, monitoring with LDAP, monitoring operating system performance data, log file analysis, and indirect monitoring.
Later in this chapter we discuss each of these five approaches in detail and provide specific examples of each.

General Monitoring Principles

Before we discuss specific monitoring methods, let's take a moment to introduce some general principles that apply to all methods.

Monitoring Unobtrusively

You should always understand the implications of your monitoring strategy. A poorly designed monitoring system can adversely affect performance if it places a heavy load on the directory service. In general, you should strive to make the monitoring as unobtrusive as possible while still providing the information you need.

How do you make your monitoring unobtrusive? Use the most lightweight method available that still gives you the information you need. For example, if you probe the directory, retrieving a single entry is probably sufficient; it's unnecessary to retrieve many entries. You should also probe no more often than necessary to achieve your desired responsiveness. For example, you can discover problems sooner if you probe the directory every five seconds, but that may be overkill; it's probably reasonable to probe every minute, or even every five minutes, depending on the level of service you are expected to provide to your users.

One Failure Causing Other Failures

A single failure may also trigger other alerts in your monitoring system. For example, if one of a set of replicated servers becomes unavailable, the load on the remaining servers may increase as clients reapportion themselves among the remaining replicas. If this occurs, you can try to reduce the load on the remaining servers by disabling noncritical applications or by bringing additional replicas online. In any case, such an event signals the need for additional capacity to provide some headroom should it happen again.
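The two trade-offs described above can be put in rough numbers. The Python sketch below is illustrative only: the function names are hypothetical, the figure of 900 requests per second spread over three replicas is invented for the example, and the model assumes that load redistributes evenly among the surviving replicas, which real clients may not do.

```python
SECONDS_PER_DAY = 86_400


def probes_per_day(interval_seconds: int) -> int:
    """Number of probes sent to one server per day at a fixed interval."""
    return SECONDS_PER_DAY // interval_seconds


def load_per_replica(total_load: float, replicas: int, failed: int = 0) -> float:
    """Load carried by each surviving replica, assuming an even split."""
    remaining = replicas - failed
    if remaining <= 0:
        raise ValueError("no replicas left to carry the load")
    return total_load / remaining


# Probe frequency: a shorter interval finds problems sooner but costs more probes.
for interval in (5, 60, 300):
    print(f"probe every {interval:>3}s: {probes_per_day(interval):>6} probes/day, "
          f"an outage may go unnoticed for up to about {interval}s")

# Replica failure: hypothetical 900 requests/s shared by 3 replicas.
before = load_per_replica(900, replicas=3)
after = load_per_replica(900, replicas=3, failed=1)
print(f"per-replica load rises from {before:.0f} to {after:.0f} requests/s "
      f"({(after / before - 1) * 100:.0f}% more)")
```

Under these assumptions, moving from a five-second to a one-minute probe interval cuts probe traffic by a factor of twelve, and losing one of three replicas raises the load on each survivor by half. That second figure is the kind of headroom the remaining servers would need to absorb a single failure without raising alerts of their own.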
Keeping a Problem History

You should strive to design your monitoring system so that it provides you with a reliable history of problems. For example, if you use a commercial network management system that logs alerts in a standard format, you might periodically extract the directory-related alerts and archive them in a central location. These extracted logs can help you identify trends that you can use to plan for expansion, and to demonstrate your ever-improving reliability figures to management (or hide the figures if they don't show improvement!).

Having a Plan

Finally, for every type of failure you can anticipate, you should create a written action plan to share with all operators and support personnel who might be the first to learn of a failure. It's also a good idea to have a default action plan to be followed when an unanticipated error occurs. Action plans are covered in more detail in the "Taking Action" section of this chapter.
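The last two principles, keeping a problem history and having a plan for each failure type with a default for everything else, can be sketched together in a few lines of Python. This is a minimal illustration under stated assumptions: the failure type names, the plan texts, and the functions `plan_for`, `record`, and `problems_per_month` are all hypothetical, and a real history would be extracted from your NMS alert logs rather than built in memory.

```python
from collections import Counter
from datetime import date
from typing import List, Tuple

# Hypothetical action plans, keyed by failure type. Real plans would be
# written documents; short strings stand in for them here.
ACTION_PLANS = {
    "server_down": "Restart the directory server; if that fails, page the administrator.",
    "slow_response": "Look for runaway clients; consider disabling noncritical applications.",
}

# Default plan for failures nobody anticipated.
DEFAULT_PLAN = "Record what you observed and escalate to the directory administrator."

History = List[Tuple[date, str]]


def plan_for(failure_type: str) -> str:
    """Return the plan for a known failure type, or the default plan."""
    return ACTION_PLANS.get(failure_type, DEFAULT_PLAN)


def record(history: History, day: date, failure_type: str) -> str:
    """Append the problem to the history and return the plan to follow."""
    history.append((day, failure_type))
    return plan_for(failure_type)


def problems_per_month(history: History) -> Counter:
    """Count recorded problems per calendar month, as a simple trend view."""
    return Counter(f"{day.year}-{day.month:02d}" for day, _ in history)


if __name__ == "__main__":
    history: History = []
    print(record(history, date(2002, 1, 5), "server_down"))
    print(record(history, date(2002, 1, 20), "disk_full"))  # unanticipated type
    print(record(history, date(2002, 2, 1), "slow_response"))
    print(dict(problems_per_month(history)))
```

Recording every problem through the same function that looks up the plan means the history cannot silently miss failures that fell through to the default plan, and those are often the ones that most deserve a written plan of their own.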
2002, O'Reilly & Associates, Inc.