Introduction to Monitoring

It's likely that your directory service is (or will be) a vital part of your computing infrastructure. Users may depend on it for things such as login, personalization, and address books. Applications may depend on it for such things as access control and e-mail delivery. Failure or unavailability of the directory can result in downtime for users and applications, translating into lost time and money. By monitoring your directory service, you can learn of outages as soon as they occur.

With more sophisticated, proactive monitoring strategies, you can also anticipate problems before they result in an outage or degraded service. Information you gather from this type of monitoring can be used to fine-tune your directory server software. For example, proactive monitoring can alert you when directory configuration parameters need to be changed to optimize performance for common queries. It can also provide data that helps you optimize your management procedures: an increase in the number of updates handled by your directory, for instance, may signal the need for more frequent backups.

A monitoring system consists of four conceptual modules (see Figure 19.1):

  1. The device and application probing module. This module is responsible for periodically checking the status of the monitored devices, hosts, and applications. When a device fails a test, an event is generated that describes the device and the test that was performed. This module may also log the response time for each probe, for use in performance analysis.

  2. The event correlation module. This module is fed events, analyzes them to determine the root cause, and suppresses any events that are merely side effects of other events. For example, a network router might fail, temporarily making all devices, hosts, and applications beyond it inaccessible. Alerts for those devices would be suppressed, because they are probably false alarms caused by the router failure rather than separate problems. After suppressing any inappropriate events, the module constructs one or more alerts and sends them to the notification module.

  3. The notification module. This module receives alerts and notifies the appropriate people who can remedy the problem. Alternatively, this module might arrange for an automated system to take remedial action, such as restarting a server process or rebooting a failed server.

  4. The performance analysis module. This module receives the raw data from the device and application probing module. The raw data generally includes probe response times and usage data collected from devices. The performance analysis module may also read and interpret application logs collected from servers.
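The root-cause suppression performed by the event correlation module can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular NMS works: it assumes a hand-maintained map from each monitored node to the upstream device it is reached through (the `UPSTREAM` dictionary and all node names are hypothetical), and it suppresses any failure event whose path back to the monitoring station contains another failed device.

```python
from dataclasses import dataclass

# Hypothetical topology: each node maps to the device it is reached through.
# None means the node is directly reachable from the monitoring station.
UPSTREAM = {
    "router-1": None,
    "ldap-1": "router-1",
    "ldap-2": "router-1",
    "ldap-3": None,
}


@dataclass(frozen=True)
class Event:
    node: str  # device, host, or application that failed a probe
    test: str  # description of the probe that failed


def upstream_failed(node, failed, upstream):
    """True if any device between the monitoring station and `node` has failed."""
    hop, seen = upstream.get(node), set()
    while hop is not None and hop not in seen:  # `seen` guards against loops in the map
        if hop in failed:
            return True
        seen.add(hop)
        hop = upstream.get(hop)
    return False


def correlate(events, upstream=UPSTREAM):
    """Turn raw events into alerts, keeping only probable root causes."""
    failed = {e.node for e in events}
    return [
        f"ALERT: {e.node} failed {e.test}"
        for e in events
        if not upstream_failed(e.node, failed, upstream)
    ]


if __name__ == "__main__":
    events = [
        Event("router-1", "ping"),
        Event("ldap-1", "LDAP search"),
        Event("ldap-2", "LDAP search"),
        Event("ldap-3", "LDAP search"),
    ]
    for alert in correlate(events):
        print(alert)
```

With the router down, the events for `ldap-1` and `ldap-2` are suppressed, and only the router and the independently failed `ldap-3` produce alerts. Real correlation engines use far richer topology and timing information; this sketch only shows the basic idea.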

Figure 19.1. A Conceptual Overview of Monitoring

The monitoring system shown in Figure 19.1 is a conceptual model that we use to frame our discussion of directory monitoring. Any of the modules' functions could be performed by humans or a software program; however, if you use a commercially available network management system (NMS), you'll probably find that it implements some or all of these functions for you.

The most basic type of monitoring detects when the directory (or a part of it) is unavailable, perhaps because a server machine has crashed or has become unreachable because of a network failure. These directory failures are hard failures; that is, a part of the directory has failed completely.

Other types of directory problems can result in degraded performance. For example, looping electronic mail can cause the load on the directory to increase dramatically as the messaging servers attempt to deliver the looping mail. A more advanced monitoring tool could conceivably detect the increased load on the directory and alert a system administrator, who could take corrective action. A complete monitoring system should be able to detect hard failures and also detect when performance drops below an acceptable level.
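A monitoring tool that distinguishes hard failures from degraded performance needs some rule for telling them apart. One simple rule, sketched below, classifies each probe result against a response-time threshold. The sketch assumes each probe yields either an elapsed time in seconds or `None` when the server did not answer; the 2-second default threshold is purely illustrative.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    OK = "ok"
    DEGRADED = "degraded"       # the server answered, but too slowly
    HARD_FAILURE = "down"       # the server did not answer at all


def classify(response_time: Optional[float], threshold: float = 2.0) -> Status:
    """Classify one probe result.

    response_time is the elapsed time in seconds, or None if the probe
    failed outright (connection refused, timeout, entry not returned).
    The default threshold is an illustrative value; pick one that matches
    the level of service you have promised your users.
    """
    if response_time is None:
        return Status.HARD_FAILURE
    if response_time > threshold:
        return Status.DEGRADED
    return Status.OK


if __name__ == "__main__":
    for sample in (0.05, 3.5, None):
        print(sample, classify(sample).value)
```

Because a single slow response can be noise, a practical tool might report degradation only after several consecutive slow probes; that refinement is left out here to keep the rule visible.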

In addition to detecting hard failures and unacceptable performance degradation, a well-designed monitoring solution provides you with valuable information on performance trends. Such proactive monitoring can help you anticipate problems before they become serious enough for your users to notice.

Methods of Monitoring

There are several ways to monitor a directory service. Following are the various types of monitoring you should consider:

  • Monitoring with Simple Network Management Protocol (SNMP). SNMP is a network protocol that allows a management application to monitor the state of managed devices on your network. Although SNMP is most widely used to manage networking hardware such as switches, hubs, and routers, it can also be used to monitor and manage application processes running on server computers. In addition to polling an entity for its status, a management application can be notified asynchronously, via the SNMP trap mechanism, when a problem occurs (for example, when a server process terminates unexpectedly). We'll discuss SNMP in more detail later in this chapter.

  • Probing the directory via Lightweight Directory Access Protocol (LDAP). One of the most straightforward and useful ways to monitor your directory service is to probe it by connecting to it as a client and issuing LDAP requests. For example, a simple probing tool might connect to a directory server and issue a search request for a given entry. If the entry is returned within a reasonable span of time, the directory is considered functional. If not, the probing tool can report a failure.

  • Monitoring operating system performance data. Most modern operating systems (OSs) provide tools for querying their operating parameters, such as CPU load, memory usage, and disk and network activity. This type of information can help you identify when your directory server's performance is suffering because of an OS problem.

  • Indirect monitoring. Monitoring the applications that use the directory provides more of an end-user view of the reliability and responsiveness of your system.

  • Analyzing log files. You can automatically scan the directory service's log files for messages that indicate an error condition, and you can watch for conditions that signal a performance problem. Log file analysis is also a good way to perform proactive monitoring, in which you identify undesirable performance trends and telltale signs of impending problems before they are noticed by your users. Finally, log file analysis is the best way to understand how your directory's usage patterns are changing over time. This information is invaluable during capacity planning.
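The LDAP probing approach can be sketched as a small harness that times one search and reports a failure if the search raises an error, does not finish within a deadline, or does not return the expected entry. To avoid inventing any client library's API, this sketch assumes you supply the `search` callable yourself, wrapping whatever LDAP client you use; the entry DN in the usage lines and the 5-second timeout are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    ok: bool
    elapsed: Optional[float]  # seconds; None if no answer within the timeout
    detail: str


def probe(search: Callable[[str], bool], dn: str, timeout: float = 5.0) -> ProbeResult:
    """Run one probe against the directory.

    `search` is a caller-supplied (hypothetical) function that connects to
    the server, searches for the single entry `dn`, and returns True if the
    entry came back.
    """
    # Not used as a context manager: `with` would wait for a hung search.
    pool = ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    try:
        found = pool.submit(search, dn).result(timeout=timeout)
    except FutureTimeout:
        return ProbeResult(False, None, f"no answer within {timeout}s")
    except Exception as exc:  # connection refused, protocol error, ...
        return ProbeResult(False, time.monotonic() - start, f"error: {exc}")
    finally:
        pool.shutdown(wait=False)  # don't block on a search that is still running
    elapsed = time.monotonic() - start
    if not found:
        return ProbeResult(False, elapsed, f"entry {dn} not returned")
    return ProbeResult(True, elapsed, "ok")


if __name__ == "__main__":
    # A stand-in search that always succeeds, for demonstration only.
    print(probe(lambda dn: True, "uid=probe,dc=example,dc=com"))
```

Note that a search that never returns still occupies a worker thread after the probe gives up, so a real probe should also set a network timeout in the LDAP client itself. The measured `elapsed` values are the raw response-time data that a performance analysis module would consume.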

Later in this chapter, we discuss each of these five approaches in detail and provide specific examples.

General Monitoring Principles

Before we discuss specific monitoring methods, let's take a moment to introduce some general principles that apply to all methods.

Monitor Unobtrusively

You should always understand the implications of your monitoring strategy. A poorly designed monitoring system may adversely affect performance if it places a heavy load on the directory service. In general, you should strive to make the monitoring as unobtrusive as possible while still providing the information you need.

How do you make your monitoring unobtrusive? Use the most lightweight method that gives you the information you need. For example, if you probe the directory, retrieving a single entry is probably sufficient; there's no need to retrieve many entries. You should also probe no more often than necessary to achieve the responsiveness you require. Probing every five seconds lets you discover problems sooner, but that may be overkill; probing every minute, or even every five minutes, is probably reasonable, depending on the level of service you are expected to provide to your users.
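The trade-off between responsiveness and load can be made concrete with a little arithmetic. Assuming one probe per server per interval, and that a failure is noticed only when a probe times out, a failure that begins just after a successful probe goes undetected for up to one full interval plus the probe timeout, while each server must answer 86,400 divided by the interval probes per day. The 5-second timeout used below is an illustrative assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def probe_cost(interval_s: float, timeout_s: float, servers: int = 1):
    """Return (worst-case detection delay in seconds, total probes per day).

    Assumes one probe per server per interval and that a failure is only
    noticed when a probe times out.
    """
    worst_delay = interval_s + timeout_s
    probes_per_day = servers * SECONDS_PER_DAY / interval_s
    return worst_delay, probes_per_day


if __name__ == "__main__":
    for interval in (5, 60, 300):
        delay, count = probe_cost(interval, timeout_s=5)
        print(f"every {interval:>3}s: detect within {delay:.0f}s, {count:.0f} probes/day")
```

Probing every 5 seconds costs 17,280 probes per server per day, compared with 1,440 at one-minute intervals and 288 at five-minute intervals, while the worst-case detection delay grows from 10 seconds to 65 and 305 seconds respectively.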

One Failure Can Cause Other Failures

A single failure may also trigger other alerts in your monitoring system. For example, if one of a set of replicated servers becomes unavailable, clients reapportion themselves among the remaining replicas, and the resulting increase in load may push those replicas past their performance thresholds, generating alerts of their own.
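The replica example can be quantified with a rough estimate. Assuming the total load stays the same and is spread evenly over the surviving replicas, if n replicas each run at utilization u and k of them fail, each survivor runs at about u * n / (n - k). For instance, four replicas at 60% each leave the remaining three at about 80% when one fails. The 80% alert threshold below is hypothetical.

```python
def surviving_load(n_replicas: int, utilization: float, failed: int = 1) -> float:
    """Estimated utilization of each surviving replica after `failed` replicas go down.

    Assumes the total load is unchanged and is shared evenly by the survivors.
    """
    survivors = n_replicas - failed
    if survivors <= 0:
        raise ValueError("no replicas left to carry the load")
    return utilization * n_replicas / survivors


ALERT_THRESHOLD = 0.80  # hypothetical utilization at which a load alert fires

if __name__ == "__main__":
    for n in (2, 3, 5, 8):
        after = surviving_load(n, utilization=0.60)
        flag = "would trigger load alerts" if after >= ALERT_THRESHOLD else "ok"
        print(f"{n} replicas at 60%: survivors at {after:.0%} ({flag})")
```

The smaller the replica set, the larger the jump, which is why a failure in a small replica set is especially likely to set off secondary load alerts.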

Keep a Problem History

You should strive to design your monitoring system so that it provides a reliable history of problems and summarized usage data. For example, if you use a commercial NMS that logs alerts in a standard format, you might periodically extract the directory-related alerts and archive them in a central location. These extracted logs can help you identify trends that you can use to plan for expansion, and to demonstrate your ever-improving reliability figures to management (or hide the figures if they don't show improvement!). You can also summarize each day's directory logs and save the summary in an archive. Over time, the archived data becomes increasingly useful for capacity planning.
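Summarizing a day's directory logs can be as simple as counting operations and errors. The sketch below assumes a simplified, hypothetical access-log format in which operation lines carry a keyword such as `SRCH` or `MOD` and result lines carry `err=N` and `etime=N`; real directory servers each use their own log formats, so the regular expressions would need adjusting.

```python
import json
import re
from collections import Counter

# Hypothetical log format, for example:
#   conn=12 op=3 SRCH base="dc=example,dc=com" scope=2 filter="(uid=bjensen)"
#   conn=12 op=3 RESULT err=0 etime=0
OP_RE = re.compile(r"\bop=\d+ (SRCH|ADD|MOD|DEL|BIND|UNBIND)\b")
RESULT_RE = re.compile(r"\bRESULT err=(\d+) etime=(\d+)")


def summarize(lines):
    """Reduce one day's log lines to a small summary suitable for archiving."""
    ops, errors, etimes = Counter(), Counter(), []
    for line in lines:
        if m := OP_RE.search(line):
            ops[m.group(1)] += 1
        elif m := RESULT_RE.search(line):
            code, etime = int(m.group(1)), int(m.group(2))
            if code != 0:
                errors[code] += 1
            etimes.append(etime)
    return {
        "operations": dict(ops),
        "errors_by_code": {str(k): v for k, v in errors.items()},
        "results": len(etimes),
        "max_etime": max(etimes, default=0),
    }


sample = [
    'conn=1 op=0 BIND dn="" method=128',
    "conn=1 op=0 RESULT err=0 etime=0",
    'conn=1 op=1 SRCH base="dc=example,dc=com" scope=2 filter="(uid=bjensen)"',
    "conn=1 op=1 RESULT err=0 etime=1",
    'conn=2 op=0 MOD dn="uid=bjensen,dc=example,dc=com"',
    "conn=2 op=0 RESULT err=50 etime=0",
]

if __name__ == "__main__":
    print(json.dumps(summarize(sample), sort_keys=True))
```

Writing one such JSON summary per day to an archive keeps the history small while preserving the operation mix, error counts, and worst response times needed to spot trends.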

Have a Plan

Finally, for every type of failure you can anticipate, you should create a written action plan to share with all operators and support personnel who might be the first to learn of a failure. It's also a good idea to have a default action plan to be followed in the event of an unanticipated error. Action plans are covered in more detail in the Taking Action section of this chapter.

Understanding and Deploying LDAP Directory Services (2nd Edition)
ISBN: 0672323168
Year: 2002
Pages: 242