Selecting and Developing Monitoring Tools

   

As you set out to design your directory monitoring system, you have two main alternatives. You can choose an NMS package, such as IBM Tivoli NetView, Computer Associates' Unicenter, Hewlett-Packard's HP OpenView, or Aprisma's SPECTRUM; or you can develop your own set of tools to monitor the directory. Which approach should you use?

NMS packages have historically been used to monitor SNMP-enabled network devices such as routers, hubs, and switches. These packages typically have excellent data archiving and reporting capabilities; allow the definition of customized alerts; and offer event correlation capabilities, which permit the NMS to suppress spurious alerts. In addition, many NMSs can send notifications directly via e-mail, telephone, and pager. If your directory server software supports monitoring via SNMP, monitoring it with an NMS is a natural choice. This is especially true if you are already using an NMS to monitor the rest of your network. Even if your directory server does not support SNMP monitoring, you can install an SNMP agent such as Concord's eHealth SystemEDGE on each directory server. SystemEDGE can perform a variety of checks on the health of the operating system and directory server processes. We'll discuss this approach shortly.

If you do not already use an NMS in your organization and you cannot justify the cost of purchasing and deploying one, it makes sense to develop a set of tools that perform the directory monitoring function. A simple set of Perl scripts, for example, can be used to perform extensive monitoring of your directory service. Later in this chapter we offer some general design hints for developing a set of tools and show you how you can develop notification methods.

In the following sections we discuss the NMS and custom monitoring approaches in detail.

Monitoring Your Directory with SNMP and an NMS

If your directory server supports SNMP, or if you use SNMP agent software on your servers, you can monitor it with one of many commercially available NMS software packages. Before we discuss this topic, however, let's examine how SNMP works.

Introduction to SNMP

Simple Network Management Protocol is an Internet standard protocol for exchanging management information. In a typical SNMP installation, a managed device runs an SNMP agent, and the management station runs an SNMP manager application. The manager may request information from the agent with an SNMP GET request, and the agent responds with a GET response containing the requested data.

Figure 19.2 shows a manager requesting a piece of information from a managed device. The NMS has issued a GET request to read a piece of management information from the managed device, an Ethernet hub. The device returns the requested information (sysUpTime, the elapsed time since the device's management agent was last initialized) to the NMS.

Figure 19.2. An SNMP Manager Issues a GET Request to a Managed Device

SNMP also allows a manager to send a SET request to an agent. This request instructs the agent to modify its operational status in some way. Figure 19.3 shows a manager issuing a SET request to a managed device. This SET request sets the system.sysContact.0 Management Information Base (MIB) variable to the string bjensen@example.com, which represents the e-mail address of the person responsible for the managed device (we'll discuss the MIB shortly).

Figure 19.3. An SNMP Manager Issues a SET Request to a Managed Device

Caution

Older SNMP standards do not provide secure authentication, so SNMP SET commands are rarely enabled in managed devices.


Managed devices can also generate spontaneous messages called traps, which indicate an exceptional condition. In Figure 19.4, the managed device has encountered an error and has generated a trap that it has sent to the management system. The NMS may then choose to take some action in response.

Figure 19.4. A Network Device Generates an SNMP Trap

SNMP can be used to manage a wide range of devices on a network, including switches, routers, and hubs. But how does an SNMP management application know what types of data are made available by a particular agent? The collection of available management information for all devices in the universe is described in the MIB, a huge tree of management information. Each type of device has its own branch of the tree, and the leaves of the tree represent the actual parameters that may be managed. The structure of the MIB isn't particularly important to our discussion, however. Suffice it to say that the MIB assigns a unique identifier to every possible parameter on every type of device you might want to monitor. These identifiers let managers and agents refer precisely and unambiguously to operating parameters of every device on the network.
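The GET, SET, and trap exchanges described above can be sketched with a toy, in-memory model. This is only an illustration: the class names ToyAgent and ToyManager and the stored values are hypothetical, and no real SNMP packets are sent. The two object identifiers are the standard MIB-II identifiers for sysUpTime and sysContact.

```python
# Toy, in-memory model of SNMP GET, SET, and trap handling.
# ToyAgent and ToyManager are hypothetical names used for illustration;
# nothing here speaks the real SNMP wire protocol.

SYS_UPTIME = "1.3.6.1.2.1.1.3.0"   # system.sysUpTime.0
SYS_CONTACT = "1.3.6.1.2.1.1.4.0"  # system.sysContact.0


class ToyManager:
    """Stands in for the NMS: it only records the traps it receives."""

    def __init__(self):
        self.traps = []

    def receive_trap(self, source, message):
        # A real NMS would correlate the event and possibly notify someone.
        self.traps.append((source, message))


class ToyAgent:
    """Stands in for the agent on a managed device."""

    def __init__(self, name, manager, allow_set=False):
        self.name = name
        self.manager = manager
        self.allow_set = allow_set  # SET is off by default, as on many devices
        # The agent's view of the MIB: unique identifier -> current value.
        self.mib = {SYS_UPTIME: 123456, SYS_CONTACT: ""}

    def get(self, oid):
        """Answer a GET request with the current value of one variable."""
        return self.mib[oid]

    def set(self, oid, value):
        """Apply a SET request, if this agent permits SET at all."""
        if not self.allow_set:
            raise PermissionError("SET is disabled on this agent")
        self.mib[oid] = value

    def raise_trap(self, message):
        """Send an unsolicited trap to the manager."""
        self.manager.receive_trap(self.name, message)
```

For example, after creating nms = ToyManager() and hub = ToyAgent("hub-1", nms, allow_set=True), calling hub.get(SYS_UPTIME) plays the role of the GET in Figure 19.2, hub.set(SYS_CONTACT, "bjensen@example.com") the SET in Figure 19.3, and hub.raise_trap("interface error") the trap in Figure 19.4.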

When a vendor creates a new piece of networking hardware and wants to make it manageable via SNMP, it needs to create agent software for the device and document the MIB variables that the device supports. If the device performs a common function for which variables already exist in the MIB, no additional MIB variable needs to be defined. If, on the other hand, the device provides new functionality that is to be monitored via SNMP, the vendor needs to assign and document new MIB variables that correspond to these new functions.

In a typical enterprise, one or more NMS stations monitor large numbers of network devices. These management stations poll the devices and record the collected data in a database. Other parts of the NMS display the data graphically. For example, an NMS might offer a pictorial view of a vendor's router. If one of the interfaces on the router encounters many errors in a short time, the NMS package might color the view red to indicate a problem and alert an on-call person by e-mail or pager. NMS software makes the task of managing hundreds or thousands of network devices much easier by automating the data collection and analysis processes.

Directory Servers and the Directory Server Monitoring MIB

Although SNMP is most often used to monitor networking hardware such as hubs, switches, and routers, it can also be used to monitor applications software such as an LDAP server. Figure 19.5 shows how Netscape Directory Server 6 can be monitored with SNMP.

Figure 19.5. Monitoring Netscape Directory Server via SNMP

The host that runs Netscape Directory Server must run an SNMP master agent (on Unix platforms the master agent is included with Netscape Administration Server; on Windows the master agent is included with the operating system). This master agent routes all incoming communication from the NMS to the appropriate subagent (there might be more than one subagent if more than one monitored server is running on the host). The subagent processes the SNMP request and returns the result to the master agent, which routes it to the NMS. Similarly, a subagent may send a trap to the master agent, which forwards it to the NMS.

Netscape Directory Server supports a directory server monitoring MIB that is similar to a subset of the directory server monitoring MIB defined in RFC 2605.

The MIB variables supported by the Netscape server are divided into two sections, or tables: the operations table and the entries table. The operations table provides information about the operations processed by the server, and the entries table provides information about the entries stored by the server. All the counters described in the table are reset whenever the directory server is restarted. The MIB variables supported by Netscape Directory Server are listed in Table 19.1.

Table 19.1. SNMP MIB Variables Supported by Netscape Directory Server

MIB Variable              Description

Operations (all totals are since server startup)

dsAnonymousBinds          The number of anonymous bind operations processed
dsUnauthBinds             The number of unauthenticated (anonymous) bind operations processed
dsSimpleAuthBinds         The number of bind operations that used simple authentication
dsStrongAuthBinds         The number of bind operations that used strong authentication (SSL or a strong SASL mechanism such as HTTP digest) to authenticate the client's identity
dsBindSecurityErrors      The number of bind requests that failed because of invalid credentials
dsInOps                   The number of operations forwarded to this directory server from another
dsCompareOps              The number of compare operations processed
dsAddEntryOps             The number of add operations processed
dsRemoveEntryOps          The number of delete operations processed
dsModifyEntryOps          The number of modify operations processed
dsModifyRDNOps            The number of modify RDN operations processed
dsSearchOps               The number of search operations (of any scope) processed
dsOneLevelSearchOps       The number of scope=onelevel search operations processed
dsWholeSubtreeSearchOps   The number of scope=subtree search operations processed
dsReferrals               The number of referrals returned to clients
dsSecurityErrors          The number of operations forwarded to this directory server but rejected because of security problems
dsErrors                  The number of requests that could not be processed because of errors

Entries

dsaCacheEntries           The number of entries contained in the server's entry cache
dsaCacheHits              The number of operations serviced from the entry cache

In addition to making these MIB variables available via SNMP GET requests, the Netscape Directory Server SNMP subagent generates an enterprise-specific trap whenever the server starts up and whenever it shuts down, either normally or abnormally. You can make use of this information by configuring your NMS to generate an alert when parameters exceed some preset limits. You may also want to configure your NMS to warn you if it receives a "server down" trap, indicating that the server has shut down for some reason.
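Because the counters in Table 19.1 are totals since server startup, an alert rule usually works on the change between two polls rather than on the raw value. The following is a minimal sketch of such a rule, assuming the NMS or a script has already fetched two successive samples of a counter such as dsBindSecurityErrors; the function names and the per-minute limit are hypothetical.

```python
def counter_delta(previous, current):
    """Return the increase between two samples of a since-startup counter.

    If the counter went backwards, assume the server was restarted and the
    counter began again from zero, so the current value is the whole increase.
    """
    if current < previous:
        return current
    return current - previous


def check_bind_errors(previous, current, interval_seconds, max_per_minute):
    """Return an alert string if the error rate exceeds the limit, else None."""
    delta = counter_delta(previous, current)
    per_minute = delta * 60.0 / interval_seconds
    if per_minute > max_per_minute:
        return "ALERT: %.1f bind security errors/min (limit %.1f)" % (
            per_minute, max_per_minute)
    return None
```

Treating a decrease as a restart is a simplification; the startup and shutdown traps described above are a more direct signal that the counters have been reset.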

More information about using SNMP with Netscape Directory Server can be found in the Netscape Directory Server Administrator's Guide.

Monitoring Your Directory Server Using Host-Based SNMP Agents

It's also possible to install a general-purpose SNMP agent on all of your directory server hosts. These types of agents can monitor the overall health of your servers and report back to your NMS over SNMP. For example, the agent can be configured to monitor disk space usage. It can make the current disk utilization data available to your NMS via SNMP polling, and it can be configured to generate an SNMP trap when the disk utilization exceeds a predefined threshold.

An example of this type of agent is Concord's eHealth SystemEDGE. SystemEDGE can do all of the following:

  • Verify that your directory server process is still running, and generate an SNMP trap if it is not.

  • Watch for specific error messages to appear in a log file or the Windows event log, and generate an SNMP trap if the message is logged.

  • Monitor system usage parameters such as disk utilization, virtual memory usage, and CPU utilization, and generate an SNMP trap when thresholds are exceeded.

  • Record historical data on system usage.

This type of monitoring is very useful because it provides information that the directory server monitoring MIB does not.
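The threshold logic that such an agent applies to disk utilization can be sketched in a few lines. This is not how SystemEDGE is implemented or configured; it is a hypothetical stand-alone check that returns a message where a real agent would send an SNMP trap. The function names are illustrative.

```python
import shutil


def disk_utilization(path):
    """Return the percentage of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_disk(path, threshold_percent, read_utilization=disk_utilization):
    """Return a message if utilization exceeds the threshold, else None.

    `read_utilization` can be replaced with a stub so the rule can be
    exercised without filling a real disk.
    """
    percent = read_utilization(path)
    if percent > threshold_percent:
        return "disk %s at %.0f%% (threshold %.0f%%)" % (
            path, percent, threshold_percent)
    return None
```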

Monitoring Your Directory with Custom Probing Tools

If you don't have an NMS but still need to monitor your directory service, you can construct a simple set of tools using off-the-shelf components such as command-line LDAP tools or PerLDAP. In this section we suggest some general design principles for developing your own monitoring tools.

We suggest you probe your directory by performing the same general types of operations that your users and directory-enabled applications perform. For example, you might choose to probe your directory by periodically attempting to read an entry from the directory (see Figure 19.6).

Figure 19.6. Probing the Directory via LDAP

When you probe the directory in this manner, the expected result is that a single entry will be returned. If a different result is obtained, the LDAP server is unavailable, unreachable because of a network failure, or malfunctioning in some manner. Table 19.2 summarizes the most common failure modes for a monitoring client.

How do you know when to declare a failure? Think about the error conditions you consider to be critical. If the directory server is completely unresponsive, this is obviously a serious condition that must be remedied. But what if the server responds slowly? Part of your development effort involves setting thresholds. For example, you might decide that a response time of longer than three seconds for a simple search constitutes a serious degradation of performance and calls for notification of the appropriate person.

If one of the first two error conditions in Table 19.2 is encountered, the fault may lie with the network rather than the server. You can confirm this condition by sending an Internet Control Message Protocol (ICMP) echo request (a ping) to the router closest to the directory server. If the router isn't able to receive a ping, you really can't draw any conclusions about the state of the directory server. About the best you can do is to declare its state "unknown."

Table 19.2. Types of Monitoring Failures

Result                                                      Explanation

No route to host.                                           The directory server host cannot be contacted. A network failure or DNS lookup failure is the most likely cause.
Connection times out.                                       The directory server host is down, or the network between the monitoring tool and the directory server is down.
Connection refused.                                         No directory server is running on the host.
An LDAP error code other than LDAP_SUCCESS(0) is returned.  The directory server encountered an error while servicing the search request.
Server responds, but too slowly.                            The directory server is experiencing a problem or misconfiguration that is degrading its performance, or the server is simply overloaded.

How you handle a server whose state is "unknown" is up to you. You can consider it a failure and notify the appropriate people, or you can ignore the error condition on the assumption that the server is running but unreachable. Another option is to place the monitoring tool on the same network as the server(s) being monitored. This makes it much less likely that the monitoring host will lose contact with the monitored server. Of course, a network failure will then cause you to lose contact with the monitoring host, but the historical data it collects will be more complete.
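Several of the failure modes in Table 19.2 can be told apart at the TCP level, before any LDAP request is sent. The sketch below is a simplified illustration under that assumption: it only opens a connection to the LDAP port, so it cannot observe an LDAP error code (that requires an LDAP client library), and the result strings, function names, and default timeout are hypothetical. The three-second limit is the example threshold mentioned earlier.

```python
import errno
import socket
import time

SLOW_SECONDS = 3.0  # example threshold from the text


def classify_error(exc):
    """Map a connection exception to a result like those in Table 19.2."""
    if isinstance(exc, socket.gaierror):
        return "no route to host"  # DNS lookup failure
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "connection timed out"
    if isinstance(exc, ConnectionRefusedError):
        return "connection refused"
    if isinstance(exc, OSError) and exc.errno in (
            errno.EHOSTUNREACH, errno.ENETUNREACH):
        return "no route to host"
    return "other error"


def probe_port(host, port=389, timeout=5.0):
    """Open a TCP connection to the LDAP port and classify the outcome."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        return classify_error(exc)
    elapsed = time.monotonic() - start
    return "slow" if elapsed > SLOW_SECONDS else "ok"


def server_state(result, router_reachable):
    """Combine a probe result with the outcome of pinging the nearest router.

    For the first two failure modes, an unreachable router means the network
    may be at fault, so the server's state is reported as "unknown".
    """
    if result in ("ok", "slow"):
        return result
    network_like = result in ("no route to host", "connection timed out")
    if network_like and not router_reachable:
        return "unknown"
    return "down"
```

A caller would run probe_port against each directory server, ping the router by whatever means is available, and pass both outcomes to server_state before deciding whether to notify anyone.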

In addition to simple searches, you might consider test probes that approximate the types of activity your users and directory-enabled applications generate. For example, if a common activity is for your users to update their personal information (such as home telephone number), you might develop a script that replaces the homeTelephoneNumber attribute of a special test entry. If you're not sure what types of test probes would approximate user activity, examine your server logs.
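One way to build such an update probe from command-line tools is to generate an LDIF change record and feed it to an ldapmodify utility. The helper below is a sketch under that assumption; the function name, the test entry's DN, and the telephone number are hypothetical, and the command-line options for ldapmodify vary by implementation, so they are not shown.

```python
def build_modify_ldif(dn, attribute, value):
    """Return an LDIF change record that replaces one attribute value.

    Assumes `dn` and `value` are plain ASCII with no leading space or other
    characters that LDIF would require to be base64 encoded.
    """
    return "\n".join([
        "dn: %s" % dn,
        "changetype: modify",
        "replace: %s" % attribute,
        "%s: %s" % (attribute, value),
        "-",
        "",  # ensures the record ends with a newline
    ])


# Hypothetical test entry reserved for monitoring.
PROBE_DN = "uid=probe,ou=People,dc=example,dc=com"
```

Writing a value that changes on every run, such as a number derived from the current time, lets a later search confirm that the update actually took effect.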

Tip

The sample Perl script included at the end of this chapter implements a test like those described here: It searches for an entry and reports an error if the entry cannot be retrieved. The script is "smart"; that is, it avoids generating an alert if the directory server becomes unreachable because of a network failure.


We recommend that you use a modular approach when designing custom monitoring tools. In a modular approach, you separate the actual monitoring processes from the policy decisions about what constitutes a failure and who should be notified. One way to accomplish such a separation is to have the probing modules simply write their test results to a log file in a standard format, then have another process read and interpret the log file and implement a notification policy. For example, you might implement a policy stating that a directory server at a remote location must exhibit slow performance twice in any five-minute period before an alert is raised (doing this requires saving state information between probes). Figure 19.7 depicts such an architecture.

Figure 19.7. Separation of Monitoring Functions and Alert/Notification Functions
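The policy side of this architecture can be sketched as a small reader of the probe log. The log format assumed here (one line per probe: a timestamp in seconds, a host name, and a status word) is hypothetical, as are the function names. The rule implemented is the example from the text: a server must be slow twice within five minutes before an alert is raised.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 300  # five minutes
SLOW_LIMIT = 2        # slow results needed within the window


def parse_line(line):
    """Split a well-formed log line into (timestamp, host, status)."""
    timestamp, host, status = line.split()
    return int(timestamp), host, status


def alerts_for(lines):
    """Yield (timestamp, host) whenever a host reaches SLOW_LIMIT slow
    results within WINDOW_SECONDS."""
    # State saved between probes: recent slow timestamps for each host.
    recent_slow = defaultdict(deque)
    for line in lines:
        timestamp, host, status = parse_line(line)
        if status != "slow":
            continue
        window = recent_slow[host]
        window.append(timestamp)
        # Discard slow results that have aged out of the window.
        while window and timestamp - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) >= SLOW_LIMIT:
            yield timestamp, host
```

Because the probes only write results and this reader only applies policy, the threshold or the notification rule can change without touching the probing code. As written, the reader repeats the alert on every further slow result inside the window; suppressing repeats would be another policy decision.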

Log File Analysis

Directory server log files can provide an effective way to monitor your servers. Most server software logs warning and error messages to a well-known location. On Unix systems, software often logs messages using the syslog facility. On Windows systems, messages are often logged to the Windows event log and may be viewed with the event viewer utility. Netscape Directory Server logs error messages to a file rather than using the system-provided logging facilities.

If you use the SystemEDGE agent described earlier in this chapter, you can configure it to watch for specific error messages to appear in the log and generate a trap if they do. Freely available tools like swatch can also be used to watch log files and take action in response to particular error messages.
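The core of a log watcher is a list of patterns, each paired with an action. The sketch below illustrates that idea only; it is not swatch's configuration syntax, and the patterns and action names are hypothetical, since the actual error message text depends on your server software.

```python
import re

# Hypothetical patterns; replace with messages your server actually logs.
PATTERNS = [
    (re.compile(r"database .* corrupt", re.IGNORECASE), "page"),
    (re.compile(r"disk full|no space left", re.IGNORECASE), "page"),
    (re.compile(r"\bwarning\b", re.IGNORECASE), "email"),
]


def actions_for(lines, patterns=PATTERNS):
    """Return (action, line) for each line matching a pattern.

    Patterns are tried in order and only the first match counts, so the
    more severe patterns should be listed first.
    """
    matches = []
    for line in lines:
        for regex, action in patterns:
            if regex.search(line):
                matches.append((action, line.rstrip("\n")))
                break
    return matches
```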

Log file analysis is also the centerpiece of performance monitoring and analysis, which is discussed later in this chapter.

Operating System Performance Data

OS performance data can be a valuable aid in identifying the underlying causes of performance trends. For example, you can discover the total amount of virtual memory in use on the system or the amount of free disk space on a particular disk or disk partition on virtually any OS. These types of information can help you identify when your directory server performance is suffering because of an OS problem. For example, if your OS constantly writes to its virtual memory paging space, the physical memory may be insufficient for the applications running on the machine. You can remedy this problem by reducing the number of applications running on the machine (by moving them to another machine), or by adding additional RAM to the machine.

OS performance data can be monitored by an SNMP agent that obtains it and makes it available via SNMP, as mentioned earlier. This capability allows you to monitor the server's operating system health from your central NMS, if you have one. If you have no NMS, you can write custom scripts that periodically sample system performance parameters and log the sampled data to files for later analysis.
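A custom sampling script of this kind can be quite small. The sketch below assumes that free disk space and the one-minute load average are the parameters of interest and that comma-separated lines are an acceptable log format; the function names and the column order are hypothetical.

```python
import csv
import os
import shutil
import time


def load_average():
    """Return the one-minute load average, or None where it is unavailable."""
    try:
        return os.getloadavg()[0]
    except (AttributeError, OSError):
        return None  # for example, on platforms without getloadavg


def sample(path="/"):
    """Take one sample of system performance parameters."""
    usage = shutil.disk_usage(path)
    return {
        "time": int(time.time()),
        "disk_free_bytes": usage.free,
        "load_1min": load_average(),
    }


def append_sample(stream, row):
    """Write one sample as a CSV line to an open text stream."""
    writer = csv.writer(stream)
    writer.writerow([row["time"], row["disk_free_bytes"], row["load_1min"]])
```

Run from a scheduler such as cron, with the stream opened in append mode, this produces a file that can be graphed or scanned for trends later.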

Monitoring Synchronization Processes and Data Quality

If your directory is synchronized from an authoritative data source such as a PeopleSoft human resources database, it's a good idea to monitor the quality of your data. For example, you can automatically spot-check a few entries randomly and verify that the data in the directory matches the data in the HR system. Discrepancies indicate that the synchronization process is not working correctly. You can also monitor the quality of data for which the directory itself is the authoritative data source. For example, you might check whether e-mail addresses in the directory are in the correct format. More information on monitoring data quality via data validation techniques can be found in Chapter 18, Maintaining Data.
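Both checks can be sketched on data that has already been extracted from the two systems. The example assumes each source has been loaded into a dictionary keyed by user ID; the function names, the attribute names, and the deliberately loose e-mail pattern are hypothetical.

```python
import random
import re

# Loose format check: something@something.something, with no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def badly_formatted_mail(entries):
    """Return the IDs of entries whose mail value is missing or malformed."""
    return [uid for uid, attrs in sorted(entries.items())
            if not EMAIL_RE.match(attrs.get("mail", ""))]


def spot_check(directory, hr, attributes, sample_size, rng=random):
    """Compare a random sample of HR records against the directory.

    Returns (uid, attribute) pairs where the directory value differs from
    the authoritative HR value or the entry is missing from the directory.
    """
    uids = rng.sample(sorted(hr), min(sample_size, len(hr)))
    mismatches = []
    for uid in uids:
        for attr in attributes:
            if directory.get(uid, {}).get(attr) != hr[uid].get(attr):
                mismatches.append((uid, attr))
    return mismatches
```

A nonempty result from spot_check suggests the synchronization process needs attention, while badly_formatted_mail covers data for which the directory itself is authoritative.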

Indirect Monitoring

Another approach that should be part of your monitoring strategy is indirect monitoring. In this approach you don't monitor the individual services such as your directory. Instead you construct monitoring tools that measure availability and response times for the things that matter directly to your end users. For example, if you provide an electronic mail service that depends on the directory, you might periodically measure the time it takes to send an e-mail message from one test user to another.

Indirect monitoring is the real acid test for your servers and network because any single part of your complex, interconnected systems can be at fault. In combination with monitoring the individual devices, servers, and server processes that make up your directory-enabled application, indirect monitoring allows you to quickly identify and repair problems, and ultimately improve the reliability and performance of all your directory-enabled applications.

   


Understanding and Deploying LDAP Directory Services
Understanding and Deploying LDAP Directory Services (2nd Edition)
ISBN: 0672323168
EAN: 2147483647
Year: 2002
Pages: 242
