Notification Techniques | Understanding and Deploying LDAP Directory Services (2nd Edition)

The whole point of monitoring your directory is to provide the best possible service for your users by anticipating problems before they cause a failure and detecting and repairing failures quickly. If your automated monitoring tools detect a problem, they need to notify someone so that appropriate action can be taken.

In this section we present some general principles you should follow when planning your notification strategy, and we suggest some notification methods you might use. The possibilities range from simple manual systems to sophisticated automated systems.

Basic Notification Principles

An effective notification system accomplishes four goals when a failure of the directory service is detected:

It notifies the people responsible for fixing the directory system.
It notifies the people who administer affected systems, such as electronic mail servers.
It notifies the people affected by the outage.
It notifies each person in an appropriate way.

As soon as a problem is detected, your system should notify the person who can fix it. Depending on your organization and its policies, this person might be a rotating "firefighter" who is on call and has responsibility for taking care of any serious emergencies, or it might be the person who deployed the directory service. This type of notification is typically urgent; the person should be telephoned or paged if the directory is a critical 24x7 service.

The notification system should also send a notification to whoever administers systems that might be affected by the directory failure. For example, if the directory server provides service to several electronic mail servers, the administrators of these servers may want to know about the directory server failure, especially if they receive complaints from users. This type of notification is advisory; it might take the form of an e-mail message to the administrator, or a special Web page on your intranet might list known outages. If you use e-mail as a notification method, be aware that e-mail delivery itself might be delayed by the directory outage.

Your users may also want to receive or obtain some sort of notification when the directory service is unavailable, along with an estimate of when the directory will be back in service. For this purpose, you could maintain a Web page listing all known outages so that end users can learn for themselves about system failures (instead of calling the Help Desk). You can also publish your system maintenance messages in ways that don't depend on the network itself, perhaps via a recorded telephone message. Making this type of information directly available to end users improves user satisfaction and lowers Help Desk costs.

The type of notification should be appropriate for the intended recipient. If you have a support person on 24-hour call (and she is receiving on-call pay), it's entirely appropriate to page her at 4 A.M. and tell her about the failure of a critical directory server. On the other hand, if your organization is experimenting with directory services and nobody is on call, be judicious in your use of intrusive notification methods such as pages and telephone calls. The last thing you want to do when piloting a directory service is to alienate the people you need to make the project work! For end users and system administrators responsible for dependent systems, an on-demand notification system such as a Web page or recorded telephone message is usually appropriate.

An important point to remember is that you should avoid "crying wolf." In other words, your notification system should try to get someone's attention only when there is a real problem. If it generates false alarms, the credibility of future notification messages that your system generates will be suspect. For example, suppose a network router fails, disconnecting your directory server from the rest of the network, including your monitoring station. You shouldn't drag the directory administrator out of bed in the middle of the night; there's nothing she can do about the problem (unless, of course, she also manages the router). Whoever or whatever looks at multiple alerts from your monitoring system needs to have the ability to analyze those alerts and determine the root cause.
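The root-cause filtering described above might be sketched as follows. This is only an illustration under stated assumptions: the dependency map, the router name router-a.example.com, and the function name root_causes are all hypothetical, and a real monitoring product would have its own way of expressing dependencies.

```python
from typing import Dict, List, Set

# Hypothetical dependency map: each monitored item lists the items that must
# be working for the monitoring station to reach it.
DEPENDS_ON: Dict[str, List[str]] = {
    "ldap2.example.com": ["router-a.example.com"],
    "router-a.example.com": [],
}


def root_causes(failed: Set[str], depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return only the failed items that have no failed upstream dependency."""

    def has_failed_upstream(item: str, seen: Set[str]) -> bool:
        for parent in depends_on.get(item, []):
            if parent in seen:
                continue  # guard against cycles in the map
            seen.add(parent)
            if parent in failed or has_failed_upstream(parent, seen):
                return True
        return False

    return {item for item in failed if not has_failed_upstream(item, {item})}


# Both the router and the directory server appear to be down, but only the
# router is reported as the item that needs attention.
alerts = {"ldap2.example.com", "router-a.example.com"}
to_notify = root_causes(alerts, DEPENDS_ON)
```

With a filter like this in front of the pager, the router's administrator is notified and the directory administrator is left to sleep, unless the directory server fails on its own.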

When a notification message does need to be generated, the message should be tailored to the intended audience. For a directory administrator, we suggest that the notification message describe the test that failed, in what manner it failed, and when it failed. For example, if the directory is monitored via SNMP and a "server down" trap is generated, the message should state exactly that fact, as well as the time the trap was received by the NMS. The following message would be appropriate for a directory administrator:

 Directory server ldap2.example.com SNMP trap (server down) at 02:32:23

For end users, however, the message should be more descriptive of the effect the problem will have. For example:

 Directory server ldap2.example.com is unavailable as of 2:32 AM

If the person or system generating the alert message has knowledge of services that depend on the failed server, an even more informative message, like the following, can be generated:

 Directory server ldap2.example.com is unavailable as of 8:45 AM. Electronic mail for users with accounts on mail2.example.com and mail4.example.com will be inaccessible. Service should be restored by 10:00 AM.
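As a rough sketch, the three sample messages above could be produced by a pair of small formatting functions, one per audience. The function and parameter names here (admin_message, user_message, mail_servers, restore_by) are assumptions made for illustration and do not come from any particular monitoring package.

```python
from datetime import datetime
from typing import Optional, Sequence


def clock(t: datetime) -> str:
    """Format a time the way the end-user samples do, e.g. '2:32 AM'."""
    hour = t.hour % 12 or 12
    suffix = "AM" if t.hour < 12 else "PM"
    return f"{hour}:{t.minute:02d} {suffix}"


def join_names(names: Sequence[str]) -> str:
    """Join one or more names as 'a', 'a and b', or 'a, b and c'."""
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + " and " + names[-1]


def admin_message(server: str, test: str, result: str, when: datetime) -> str:
    """Terse message for the directory administrator: which test, how, when."""
    return f"Directory server {server} {test} ({result}) at {when:%H:%M:%S}"


def user_message(server: str, when: datetime,
                 mail_servers: Sequence[str] = (),
                 restore_by: Optional[datetime] = None) -> str:
    """Message for end users: describes the effect rather than the test."""
    text = f"Directory server {server} is unavailable as of {clock(when)}"
    if not mail_servers and restore_by is None:
        return text
    text += "."
    if mail_servers:
        text += (f" Electronic mail for users with accounts on "
                 f"{join_names(mail_servers)} will be inaccessible.")
    if restore_by is not None:
        text += f" Service should be restored by {clock(restore_by)}."
    return text
```

Keeping the two audiences in separate functions makes it easy to change the wording for one without affecting the other.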

Ideally, your notification system should allow the person fixing the problem to annotate any status information you provide. For example, if your notification messages are generated by an operator in a 24x7 network operations center, a directory administrator repairing a failed directory server should be able to call the center, give the operator an estimated time for the repair, and expect the operator in turn to update any status messages via the Web, automatic phone messages, or other means.

Notification Methods

There are several methods you might use to notify people of problems with your directory service. If you already have a monitoring and notification infrastructure in your organization, it probably makes sense to use that instead of developing something new. The following sections might provide some new ideas you can incorporate into an existing notification system or a custom system built from scratch.

One approach is to have an operator sit at a monitoring console and watch for alerts. If an alert is generated for a directory service, the operator can restart the directory service or call an appropriate person. With this type of manual approach, the operator needs to have clear procedures to follow, especially if he is not an expert on directory systems. The procedures should state when it is appropriate to restart the directory and when an expert should be called. Some sort of audit trail should also be generated, especially if the operator repairs the problem without expert intervention. For example, if the directory service fails at the same time each day and is simply rebooted by the operations staff, the deployment staff needs to learn about this so that it can analyze and remedy the problem.

Another approach is to use or build a software program that performs the notification function. Conceptually, the program receives an alert that names a device, server, or application, which is looked up in a table and mapped to a set of people and notification methods. Such a table might look like Figure 19.8. This excerpt from a larger notification table indicates that when a failure is detected with server ldap-hq.example.com, one of two actions is to be taken, depending on the time of day. Between 8 A.M. and 5 P.M. (0800-1700), a page is sent to the telephone number 1-800-555-1234 using PIN (pager identification number) 9511, and e-mail is sent to bjensen@example.com. Between 5 P.M. and 8 A.M., a different pager is signaled, and e-mail is sent to oncall-directory@example.com. A similar procedure is in effect for the LDAP server ldap-cleveland.example.com. (Our sample Perl scripts at the end of the chapter include a simple, table-driven notification package.)

Figure 19.8. A Portion of a Notification Table
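The table lookup described above can be sketched in a few lines of Python. This is not the Perl package mentioned in the text, only a minimal illustration of the idea. The Rule structure and the function names are assumptions, and the after-hours pager entry is a placeholder because the text does not give that number.

```python
from dataclasses import dataclass
from datetime import time
from typing import List, Tuple

Action = Tuple[str, str]  # (method, destination)


@dataclass(frozen=True)
class Rule:
    server: str
    start: time  # window start, inclusive
    end: time    # window end, exclusive; start > end wraps past midnight
    actions: Tuple[Action, ...]


# Entries for ldap-hq.example.com as described in the text. The after-hours
# pager destination is a placeholder, not a real value from the table.
TABLE: List[Rule] = [
    Rule("ldap-hq.example.com", time(8, 0), time(17, 0),
         (("page", "1-800-555-1234 PIN 9511"),
          ("email", "bjensen@example.com"))),
    Rule("ldap-hq.example.com", time(17, 0), time(8, 0),
         (("page", "AFTER-HOURS-PAGER-PLACEHOLDER"),
          ("email", "oncall-directory@example.com"))),
]


def in_window(now: time, start: time, end: time) -> bool:
    """True if now falls in [start, end), allowing windows that span midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def actions_for(server: str, now: time, table: List[Rule] = TABLE) -> List[Action]:
    """Look up every notification action that applies to this server right now."""
    found: List[Action] = []
    for rule in table:
        if rule.server == server and in_window(now, rule.start, rule.end):
            found.extend(rule.actions)
    return found
```

For example, actions_for("ldap-hq.example.com", time(9, 30)) returns the daytime page and the e-mail to bjensen@example.com, while the same call with time(2, 32) returns the after-hours entries. A server that does not appear in the table yields an empty list, which a real system would probably treat as a configuration error worth reporting.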

Testing Your Notification System

In the same way that the Emergency Broadcast System in the United States is periodically tested to ensure its functionality, you should test your notification system from time to time. There are several approaches you might use.

The notification system itself might periodically issue a special alert that causes a special test notification to be sent. Any notification generated, such as pages or e-mail messages, should be obviously marked as a test. For example, a test of the e-mail notification method might be worded like this:

 Testing automatic notification system. No action is required on your part.

In addition, it may be prudent to test the notification tables by injecting all possible alerts into the system and checking that the correct notifications are made. Of course, you need to schedule such a test well in advance and inform all the people who will receive notification during this process. You also need to take special precautions to watch for actual failures that occur during the test.
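One way to check the notification table itself, without paging anyone, is to inject a test alert for every entry and record where each notification would have gone. The sketch below assumes a simplified table that maps a server name to a list of destinations; the names notify, check_routes, and TEST_BANNER are hypothetical. Note that it verifies only the table, not actual delivery of pages or e-mail, which still requires a scheduled live test.

```python
from typing import Callable, Dict, List, Tuple

TEST_BANNER = ("Testing automatic notification system. "
               "No action is required on your part.")

Sender = Callable[[str, str], None]  # (destination, message)


def notify(server: str, routes: Dict[str, List[str]], send: Sender,
           test: bool = False) -> None:
    """Send one message per destination configured for the failed server."""
    if test:
        body = f"{TEST_BANNER} (test alert for {server})"
    else:
        body = f"Directory server {server} is unavailable"
    for destination in routes.get(server, []):
        send(destination, body)


def check_routes(routes: Dict[str, List[str]],
                 expected: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Inject a test alert for every server; return the ones routed incorrectly."""
    mismatches: Dict[str, List[str]] = {}
    for server in sorted(set(routes) | set(expected)):
        sent: List[Tuple[str, str]] = []
        # A recording sender stands in for the real pager or e-mail gateway.
        notify(server, routes, lambda d, m: sent.append((d, m)), test=True)
        got = [destination for destination, _ in sent]
        if got != expected.get(server, []):
            mismatches[server] = got
    return mismatches
```

An empty result from check_routes means every injected alert reached exactly the destinations you expected. Any other result lists the servers whose entries need attention.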