Prehistory and Early Electronic Directories | Understanding and Deploying LDAP Directory Services (2nd Edition)

An Introduction to Monitoring

It's likely that your directory service is (or will be) a vital part of your computing infrastructure. Users may depend on it for things such as login and address books, and applications may depend on it for such things as access control and email delivery. Failure or unavailability of the directory can result in downtime for users and applications, which translates into lost time and money. By monitoring your directory service, you can learn of outages as soon as they occur.

With more sophisticated, proactive monitoring strategies, you can also anticipate problems before they result in an outage or degraded service. Information you gather from this type of monitoring can be used to fine-tune your directory server software. For example, proactive monitoring may alert you to the need to change directory configuration parameters to optimize performance for common queries. Proactive monitoring can also provide data that helps you optimize your management procedures. For example, an increase in the number of updates handled by your directory may signal the need for more frequent backups.

A monitoring system consists of three conceptual modules (see Figure 18.1), described in the following list:

Figure 18.1 A conceptual overview of monitoring.

The Device and Application Probing module is responsible for periodically checking the status of the monitored devices, hosts, and applications. When a device fails a test, an event is generated that describes the device and the test that was performed.
The Event Correlation module is fed these events and correlates them to determine the root cause, and then it suppresses any events that might have occurred as a consequence of other events. For example, a network router might fail, temporarily making all devices, hosts, and applications beyond it inaccessible. Events for those devices would be suppressed because they are most likely side effects of the router failure rather than independent problems. After suppressing any inappropriate events, the module constructs one or more alerts and sends them to the Notification module.
The Notification module receives alerts and notifies the appropriate persons who can remedy the problem. Alternatively, the Notification module might arrange for an automated system to take some remedial action such as restarting a server process or rebooting a failed server.
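
The flow between the three modules can be illustrated with a small sketch. This is not how any particular NMS works; the Event class, the upstream dependency map, and the device names are assumptions made up for the illustration. The correlation step simply drops any event whose device sits behind another device that has also failed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    device: str  # device, host, or application that failed a test
    test: str    # description of the test that failed


def correlate(events, upstream):
    """Return only the events that are not explained by an upstream failure.

    `upstream` is a hypothetical map from each device to the device it is
    reached through; devices with no dependency are absent or map to None.
    """
    failed = {e.device for e in events}
    alerts = []
    for e in events:
        parent = upstream.get(e.device)
        seen = set()          # guards against cycles in the map
        suppressed = False
        # Walk up the dependency chain looking for a failed ancestor.
        while parent is not None and parent not in seen:
            if parent in failed:
                suppressed = True
                break
            seen.add(parent)
            parent = upstream.get(parent)
        if not suppressed:
            alerts.append(e)
    return alerts


def notify(alerts, send=print):
    """Hand each alert to a delivery function (print stands in for a pager)."""
    for a in alerts:
        send(f"ALERT: {a.device} failed test '{a.test}'")


if __name__ == "__main__":
    upstream = {"ldap1": "router-a", "mail1": "router-a"}
    events = [
        Event("router-a", "ping"),
        Event("ldap1", "ldap search"),
        Event("mail1", "smtp connect"),
    ]
    notify(correlate(events, upstream))
```

In this example only the router event becomes an alert, because the other two devices are reached through the failed router.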

The monitoring system shown in Figure 18.1 is a conceptual model that we use to frame our discussion of directory monitoring. Any of the modules' functions could be performed by humans or a software program; however, if you use a commercially available network management system (NMS), you'll probably find that it implements all of these functions for you.

The most basic type of monitoring detects when the directory (or a part of it) is unavailable, perhaps because a server machine has crashed or has become unreachable as a result of a network failure. These directory failures are hard failures; that is, a part of the directory has failed completely.

Other types of directory problems can result in degraded performance. For example, looping electronic mail can cause the load on the directory to increase dramatically as the messaging servers attempt to deliver the looping mail. A more advanced monitoring tool could conceivably detect the increased load on the directory and alert a system administrator, who could take corrective action. A complete monitoring system should be able to detect hard failures and also detect when performance drops below an acceptable level.

In addition to detecting hard failures and unacceptable performance degradation, a well-designed monitoring solution also provides you with valuable information on performance trends. Such proactive monitoring can help you anticipate problems before they become serious enough for your users to notice.

Methods of Monitoring

There are a number of ways to monitor a directory service. Following are the various types of monitoring you should consider:

Monitoring with Simple Network Management Protocol (SNMP). Although SNMP has found its widest application in the management of networking hardware such as switches, hubs, and routers, it is also possible to use SNMP to monitor and manage application processes running on server computers. SNMP allows a management application to monitor the status of an entity on the network. It's also possible for a management application to be asynchronously notified via the SNMP trap mechanism when some sort of problem occurs (if a server process terminates unexpectedly, for example).
Probing the directory via LDAP. One of the most straightforward and useful ways to monitor your directory service is to probe it by connecting to it as a client and issuing LDAP requests. For example, a simple probing tool might connect to a directory server and issue a search request for a given entry. If the entry is returned within a reasonable span of time, the directory is considered functional. If not, the probing tool can report a failure.
Monitoring operating system performance data. Most modern operating systems (OSs) provide tools to query their operating parameters. This type of information can help you identify when your directory server performance is suffering because of an OS problem.
Indirect monitoring. Monitoring the applications that use the directory gives you more of an end-user view of the reliability and responsiveness of your system.
Log file analysis. You can automatically scan the directory service's log files for messages that indicate an error condition, and you can watch for conditions that signal a performance problem. Log file analysis is also a good way to perform proactive monitoring, in which you identify undesirable performance trends and telltale signs of impending problems before they are noticed by your users.
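
As a rough illustration of the LDAP probing approach described in the list above, the following sketch times a single search and classifies the result. The search argument is a placeholder for a call into whatever LDAP client library you use, the slow_after threshold of two seconds is an arbitrary assumption, and the extra "slow" result is an addition that goes slightly beyond the simple pass/fail probe described above.

```python
import time


def probe(search, slow_after=2.0):
    """Run one search and return (status, elapsed_seconds, detail).

    `search` is any zero-argument callable that performs a single-entry
    LDAP search and returns the entry, or None if it was not found. It is
    a stand-in for your LDAP client code; configure that client's own
    network timeout so the call cannot hang indefinitely.
    """
    start = time.monotonic()
    try:
        entry = search()
    except Exception as exc:  # connection refused, timeout, protocol error
        return ("failed", time.monotonic() - start, str(exc))
    elapsed = time.monotonic() - start
    if entry is None:
        return ("failed", elapsed, "entry not found")
    if elapsed > slow_after:
        # The entry came back, but not within a reasonable span of time.
        return ("slow", elapsed, "")
    return ("ok", elapsed, "")


if __name__ == "__main__":
    # A fake search that always succeeds, used here instead of a real server.
    status, elapsed, detail = probe(lambda: {"cn": "probe entry"})
    print(status, f"{elapsed:.3f}s", detail)
```

Passing the search in as a callable keeps the timing and classification logic independent of any particular client library and makes it easy to exercise with a fake.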

Later in this chapter we discuss each of these five approaches in detail and provide specific examples of each.

General Monitoring Principles

Before we discuss specific monitoring methods, let's take a moment to introduce some general principles that apply to all methods.

Monitoring Unobtrusively

You should always understand the implications of your monitoring strategy. It's possible for a poorly designed monitoring system to adversely affect performance if it places a heavy load on the directory service. In general, you should strive to make the monitoring as unobtrusive as possible while still providing the information you need.

How do you make your monitoring unobtrusive? You should use the available method that is the most lightweight but gives you the needed information. For example, if you probe the directory, retrieving a single entry is probably sufficient; it's unnecessary to retrieve many entries. You should also perform the probe no more often than necessary to implement your desired responsiveness. For example, you can discover problems sooner if you probe the directory every five seconds, but that may be overkill; it's probably reasonable to probe every minute, or even every five minutes, depending on the level of service you are expected to provide to your users.
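
The trade-off between probe frequency and responsiveness can be put into numbers. The sketch below is only back-of-the-envelope arithmetic: the 10-second probe timeout is an assumed value, and the worst-case figure is an approximation that ignores the time taken by correlation and notification.

```python
SECONDS_PER_DAY = 86_400


def probes_per_day(interval_seconds):
    """Number of probe requests one monitor sends to one server per day."""
    return SECONDS_PER_DAY // interval_seconds


def worst_case_detection_delay(interval_seconds, timeout_seconds):
    """Approximate longest time before a hard failure is noticed.

    A failure just after a successful probe goes unseen until the next
    probe starts, and that probe must then run into its timeout.
    """
    return interval_seconds + timeout_seconds


if __name__ == "__main__":
    for interval in (5, 60, 300):
        print(
            f"every {interval:>3}s: {probes_per_day(interval):>6} probes/day, "
            f"failure noticed within ~{worst_case_detection_delay(interval, 10)}s"
        )
```

At a five-second interval this works out to 17,280 probes per day; at one minute, 1,440; at five minutes, 288. The longer intervals cost a few minutes of detection time at most, which is often acceptable.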

One Failure Causing Other Failures

A single failure may also trigger other alerts in your monitoring system. For example, if one of a set of replicated servers becomes unavailable, the load on the remaining servers may increase as clients reapportion themselves among the remaining replicas. If this occurs, you can try to reduce the load on the remaining servers by disabling noncritical applications or by bringing additional replicas online. In any case, such an event signals the need for additional capacity to provide some headroom should it happen again.
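
The headroom question can be estimated with simple arithmetic. The sketch below assumes, purely for illustration, that load is spread evenly across identical replicas and that clients redistribute themselves evenly when a replica is lost; real deployments rarely behave this neatly.

```python
def utilization_after_failure(replicas, utilization, failed=1):
    """Per-server utilization once `failed` replicas are gone.

    `utilization` is the fraction of capacity (0.0 to 1.0) each replica
    uses while all replicas are healthy.
    """
    remaining = replicas - failed
    if remaining <= 0:
        raise ValueError("no replicas left to absorb the load")
    return utilization * replicas / remaining


def max_safe_utilization(replicas, failed=1):
    """Highest normal utilization that still fits after losing `failed` replicas."""
    if failed >= replicas:
        raise ValueError("cannot lose every replica")
    return (replicas - failed) / replicas


if __name__ == "__main__":
    after = utilization_after_failure(replicas=3, utilization=0.60)
    print(f"after one failure: {after:.0%} per remaining server")
    print(f"stay below {max_safe_utilization(3):.0%} to survive one failure")
```

Under these assumptions, three replicas that each run at 60 percent of capacity would run at about 90 percent after one is lost, which leaves little margin and is likely to raise performance alerts of its own.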

Keeping a Problem History

You should strive to design your monitoring system so that it provides you with a reliable history of problems. For example, if you use a commercial network management system that logs alerts in a standard format, you might periodically extract the directory-related alerts and archive them in a central location. These extracted logs can help you identify trends that you can use to plan for expansion and to demonstrate your ever-improving reliability figures to management (or hide the figures if they don't show improvement!).
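
Extracting a problem history might look something like the following sketch. The alert log format here (CSV with timestamp, source, severity, and message columns, and ISO-style timestamps) is invented for the example, as are the host names; a real NMS will have its own export format.

```python
import csv
from collections import Counter


def directory_alerts(lines, directory_hosts):
    """Keep only alert rows whose source is one of the directory servers.

    `lines` is any iterable of CSV text lines with a header row of
    timestamp,source,severity,message (an assumed format).
    """
    reader = csv.DictReader(lines)
    return [row for row in reader if row["source"] in directory_hosts]


def alerts_per_month(rows):
    """Count alerts by the YYYY-MM prefix of their timestamp."""
    return Counter(row["timestamp"][:7] for row in rows)


if __name__ == "__main__":
    sample = [
        "timestamp,source,severity,message",
        "2002-03-14T02:10:00,ldap1,critical,search probe timed out",
        "2002-03-20T11:45:00,router-a,critical,ping failed",
        "2002-04-02T09:00:00,ldap2,warning,search probe slow",
    ]
    rows = directory_alerts(sample, {"ldap1", "ldap2"})
    for month, count in sorted(alerts_per_month(rows).items()):
        print(month, count)
```

Monthly counts like these are one simple way to spot a trend; the archived rows themselves preserve the detail needed to investigate individual incidents later.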

Having a Plan

Finally, for every type of failure you can anticipate, you should create a written action plan to share with all operators and support personnel who might be the first to learn of a failure. It's also a good idea to have a default action plan to be followed when an unanticipated error occurs. Action plans are covered in more detail in the "Taking Action" section of this chapter.

Understanding and Deploying LDAP Directory Services, 2002 New Riders Publishing

2002, O'Reilly & Associates, Inc.