Selecting and Developing Monitoring Tools

As you set out to design your directory monitoring system, you have two main alternatives to choose from. You can choose an NMS package, such as IBM's Tivoli TME 10, Computer Associates' CA/Unicenter TNG, Hewlett-Packard's OpenView, IBM's NetView, or Cabletron's Spectrum; or you can choose to develop your own set of tools to monitor the directory. Which approach should you use?

NMS packages have historically been used to monitor SNMP-enabled network devices such as routers, hubs, and switches. These packages typically have excellent data archiving and reporting capabilities; allow the definition of customized alerts; and offer event-correlation capabilities, which permit the NMS to suppress spurious alerts. Additionally, many NMSs can directly perform notification via email, telephone, and pager.

If your directory server software supports monitoring via SNMP, monitoring it with an NMS is a natural choice. This is especially true if you already are using an NMS to monitor the rest of your network.

If you do not already use an NMS in your organization, and if you cannot justify the cost of purchasing and deploying one, it makes sense to develop a set of tools that perform the directory monitoring function. A simple set of Perl scripts, for example, can be used to perform extensive monitoring of your directory service. We offer some general design hints for developing a set of tools and show you how you can develop notification methods later in the chapter.

If you have (or plan to deploy) an NMS, but your directory does not support monitoring via SNMP, you might use a hybrid approach in which custom-developed probing tools are integrated into your NMS. (The NMS software, of course, must support this.) The advantage of a hybrid system is that you still benefit from the event correlation, logging, and notification services provided by your NMS software.
Even if your directory service supports SNMP monitoring, you may find it beneficial to use this hybrid approach because it allows you to monitor the directory in the same way your users access it.

In the following sections, we discuss the NMS and custom monitoring approaches in greater detail.

Monitoring Your Directory with SNMP and a Network Management System

If your directory server supports SNMP, you can monitor it with one of many commercially available NMS software packages. Before we discuss this topic, however, let's examine how SNMP works.

An Introduction to SNMP

SNMP is an Internet standard protocol for exchanging management information. In a typical SNMP installation, a managed device runs an SNMP agent, and the management station runs an SNMP manager application. The manager may request information from the agent with an SNMP GET request, and the agent responds with a GET response containing the requested data. Figure 18.2 shows a manager requesting a piece of information from a managed device.

Figure 18.2 An SNMP manager requests information from a managed device.

In Figure 18.2, the NMS has issued a request to read a piece of management information from the managed device, an Ethernet hub. The device returns the requested information (sysUpTime, the time elapsed since the device's management agent was last initialized, which in practice usually means since the device was powered on) to the NMS.

SNMP also allows a manager to send a SET request to an agent. This instructs the agent to modify its operational status in some way. Figure 18.3 shows a manager issuing a SET request to a managed device. This SET request sets the system.sysContact.0 Management Information Base (MIB) variable to the string bjensen@airius.com, which represents the email address of the person responsible for the managed device.

Figure 18.3 An SNMP manager issues a SET request to a managed device.

Warning: Older SNMP standards do not provide secure authentication, so SNMP SET commands are rarely enabled in managed devices.
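The GET and SET exchanges described above can be pictured with a small in-memory model. The following Python sketch is not an SNMP implementation and sends nothing over the network; the ToyAgent class, its method names, and the allow_set flag are hypothetical names introduced only for illustration. The two numeric identifiers are the standard MIB-II object identifiers for sysUpTime.0 and sysContact.0.

```python
import time

# Standard MIB-II object identifiers (system group).
SYS_UPTIME = "1.3.6.1.2.1.1.3.0"   # sysUpTime.0
SYS_CONTACT = "1.3.6.1.2.1.1.4.0"  # sysContact.0


class ToyAgent:
    """Hypothetical stand-in for the SNMP agent on a managed device."""

    def __init__(self, allow_set=False):
        self._started = time.monotonic()
        self._writable = {SYS_CONTACT: ""}
        # Assumption: SET is off by default, mirroring the warning above.
        self.allow_set = allow_set

    def get(self, oid):
        """Answer a GET request with the current value of one variable."""
        if oid == SYS_UPTIME:
            # sysUpTime is expressed in hundredths of a second.
            return int((time.monotonic() - self._started) * 100)
        if oid in self._writable:
            return self._writable[oid]
        raise KeyError(f"no such variable: {oid}")

    def set(self, oid, value):
        """Apply a SET request, if this agent accepts them at all."""
        if not self.allow_set:
            raise PermissionError("SET requests are disabled on this agent")
        if oid not in self._writable:
            raise KeyError(f"variable is not writable: {oid}")
        self._writable[oid] = value
```

With allow_set=True, calling set(SYS_CONTACT, "bjensen@airius.com") followed by get(SYS_CONTACT) returns the new contact string, which mirrors the exchange in Figure 18.3. An agent created with the default settings rejects the same SET request.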
Managed devices can also generate spontaneous messages called traps, which indicate some exceptional condition. In Figure 18.4, the managed device has encountered some error and has generated a trap that it has sent to the management station. As before, the NMS may choose to take some action upon receiving a trap from a managed device.

Figure 18.4 A network device generates an SNMP trap.

SNMP can be used to manage a wide range of devices on a network, including switches, routers, and hubs. But how does the manager know what types of data are made available by a particular agent? The collection of available management information for all devices in the universe is described in the MIB, a huge tree of management information. Each type of device has its own branch of the tree, and the leaves of the tree represent the actual parameters that may be managed. The actual structure of the MIB isn't particularly important to our discussion, however. Suffice it to say that the MIB assigns a unique identifier to every possible parameter on every type of device you might want to monitor. This lets managers and agents refer precisely and unambiguously to operating parameters of every device on the network.

When a vendor creates a new piece of networking hardware and wants to make it manageable via SNMP, it needs to create agent software for the device and document the MIB variables that the device supports. If the device performs some common function for which variables already exist in the MIB, no additional MIB variable needs to be defined. If, on the other hand, the device provides new functionality that is to be monitored via SNMP, the vendor needs to assign and document new MIB variables that correspond to these new functions.

In a typical enterprise, one or more NMS stations monitor large numbers of network devices. These management stations poll the devices and record the collected data in a database. Other parts of the NMS display the data graphically.
For example, an NMS might offer a pictorial view of a vendor's router. If one of the interfaces on the router encounters a large number of errors in a short time, the NMS package might color the view red to indicate a problem and alert an on-call person by email or pager. NMS software makes the task of managing hundreds or thousands of network devices much easier by automating the data collection and analysis process.

SNMP and Directory Servers

Although SNMP is most often used to monitor networking hardware such as hubs, switches, and routers, it can also be used to monitor application software such as an LDAP server. Figure 18.5 shows how Netscape Directory Server 3.0 can be monitored using SNMP.

Figure 18.5 Monitoring Netscape Directory Server via SNMP.

The host that runs the Netscape Directory Server must run an SNMP master agent (on UNIX platforms, the master agent is included with the Netscape Administration Server; on Windows NT, the master agent is included with the operating system). This master agent routes all incoming communication from the NMS to the appropriate subagent (there might be more than one subagent if more than one monitored server is running on the host). The subagent processes the SNMP request and returns the result to the master agent, which routes it to the NMS. Similarly, a subagent may send a trap to the master agent, which forwards it to the NMS.

Netscape Directory Server supports a subset of the MADMAN MIB defined in Internet Draft draft-ietf-madman-dsa-mib-1-07.txt, which is a product of the Mail and Directory Management (MADMAN) working group in the IETF. As of this writing, this MIB is about to become a proposed standard and will be published as an Internet RFC.

The MIB variables supported by the Netscape server are divided into three sections, or tables: Operations, Entries, and Server Interaction. All the counters described in the table are reset whenever the directory server is restarted.
The MIB variables supported by Netscape Directory Server are listed in Table 18.1. (Note that any MIB variable present in the Internet Draft but absent from Table 18.1 is not supported by Netscape Directory Server.)

Table 18.1. SNMP MIB variables supported by Netscape Directory Server
In addition to making these MIB variables available via SNMP GET requests, the Netscape Directory Server SNMP subagent generates an enterprise-specific trap whenever the server starts up and whenever it shuts down, either normally or abnormally.

You can make use of this information by configuring your NMS to generate an alert when parameters exceed some preset limits. You may also want to configure your NMS to warn you if it receives a "server down" trap, indicating that the server has shut down for some reason. More information about using SNMP with Netscape Directory Server can be found in the Netscape Directory Server Administrator's Guide.

Monitoring Your Directory with Custom Probing Tools

If you don't have an NMS but still need to monitor your directory service, a simple set of tools can be constructed using off-the-shelf components such as command-line LDAP tools or PerLDAP. In this section, we suggest some general design principles for developing your own monitoring tools.

We suggest you probe your directory by performing the same general types of operations that your users and directory-enabled applications perform. For example, you might choose to probe your directory by periodically attempting to read an entry from the directory (see Figure 18.6).

Figure 18.6 Probing the directory via LDAP.

When you probe the directory in this manner, the expected result is that a single entry will be returned. If some other result is obtained, the LDAP server is either unavailable, unreachable because of a network failure, or malfunctioning in some manner. Table 18.2 summarizes the most common failure modes for a monitoring client.

Table 18.2. Types of monitoring failures
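A monitoring client can often tell connection-level failure modes apart from the error raised when it tries to connect. The Python sketch below checks only whether a TCP connection to the directory server's port can be opened; it does not perform an LDAP search, which would require an LDAP client library. The function names and failure labels are assumptions chosen for illustration and are not taken from Table 18.2. Port 389 is the standard LDAP port, and the three-second timeout is an arbitrary default.

```python
import errno
import socket


def classify_failure(exc):
    """Map an exception from a connection attempt to a short failure label."""
    if isinstance(exc, ConnectionRefusedError):
        return "connection refused"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    if isinstance(exc, OSError) and exc.errno in (
        errno.EHOSTUNREACH,
        errno.ENETUNREACH,
    ):
        return "no route to host"
    return "other error"


def probe_port(host, port=389, timeout=3.0):
    """Try to open a TCP connection; return 'ok' or a failure label."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return classify_failure(exc)
```

A result of "ok" shows only that something is listening on the port. A complete probe would go on to bind and read a test entry, and would treat any unexpected LDAP result code as a failure as well.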
How do you know when to declare a failure? Think about the error conditions you consider to be critical. If the directory server is completely unresponsive, this is obviously a serious condition that must be remedied. But what if the server responds slowly? Part of your development effort involves setting thresholds. For example, you might decide that a response time of longer than three seconds for a simple search constitutes a serious degradation of performance and calls for notification of the appropriate person.

If one of the first two error conditions in Table 18.2 is encountered, the fault may lie with the network rather than the server. You can distinguish this condition by sending an ICMP echo request (a ping) to the router closest to the directory server. If the router does not answer the ping, the network path itself is broken, and you really can't draw any conclusions about the state of the directory server. About the best you can do is declare its state "unknown." How you handle this situation is up to you. You can consider it a failure and notify the appropriate people, or you can ignore the error condition on the assumption that the server is running but unreachable.

In addition to simple searches, you might consider test probes that approximate the types of activity your users and directory-enabled applications generate. For example, if a common activity is for your users to update their personal information (such as home telephone number), you might develop a script that replaces the homeTelephoneNumber attribute of a special test entry. It's a good idea to spend some time examining any access logs your server generates; this way, you can tally the various types of operations it handles and perform similar operations in your probing tests.

Tip: The sample Perl script included at the end of this chapter implements a test like those described here: It searches for an entry and reports an error if the entry cannot be retrieved.
The script is "smart": it avoids generating an alert if the directory server becomes unreachable because of a network failure.

We recommend that you use a modular approach when designing custom monitoring tools. In a modular approach, you separate the actual monitoring processes from the policy decisions about what constitutes a failure and who should be notified. One way to accomplish this is to have the probing modules simply write their test results to a log file in a standard format, then have another process read and interpret the log file and implement a notification policy. For example, you might implement a policy stating that a directory server at a remote location must exhibit slow performance twice in any five-minute period before an alert is raised (doing this requires saving state information between probes). Figure 18.7 depicts such an architecture.

Figure 18.7 Separation of monitoring functions and alert/notification functions.

Suppose that you have an existing NMS deployed, but your directory server is not manageable via SNMP. How can you integrate monitoring of your server into your NMS? Fortunately, many NMS software packages include application management functions, which include the ability to monitor and manage application packages. Although not a trivial task, monitoring of a directory server can be performed with one of the application management-capable packages. IBM's Tivoli TME 10 and Computer Associates' CA/Unicenter TNG are two products that include these capabilities. Netscape SuiteSpot servers can be managed and monitored using the Tivoli TME 10 Module for SuiteSpot, and Lotus Notes/Domino servers can be managed and monitored with TME 10 and the TME 10 Module for Notes and Domino.

Log File Analysis

Directory server log files can provide an effective way to monitor your servers. Most server software logs warning and error messages to a well-known location. On UNIX systems, software often logs messages using the syslog facility.
On Windows NT systems, messages are often logged to the NT Event Log and may be viewed using the Event Viewer utility. Netscape Directory Server logs error messages to a file rather than using the system-provided logging facilities.

A number of available tools can be used to monitor the directory server logs for error messages. These utilities can be configured to trigger an alert when, for example, a message matching a particular pattern appears in the log file.

Automatically analyzing log files can also be useful in identifying performance trends that indicate a problem. For example, Netscape Directory Server 4.0 writes to its access log the elapsed time each operation takes to complete. You might choose to develop a tool that keeps a running average of the elapsed times reported for the last 100 operations. If the average elapsed time exceeds some threshold, an alert can be generated. Log file analysis is also the centerpiece of a proactive monitoring strategy, which is discussed later in this chapter.

Operating System Performance Data

OS performance data can be a valuable aid in identifying the underlying causes of performance trends. For example, you can discover the total amount of virtual memory in use on the system or the amount of free disk space on a particular disk or disk partition on virtually any OS. These types of information can help you identify when your directory server performance is suffering because of some OS problem. For example, if you note that your OS constantly writes to its virtual memory paging space, this is an indication that there is insufficient physical memory for the applications running on the machine. You can remedy this problem by reducing the number of applications running on the machine (by moving them to another machine), or by adding additional RAM to the machine.

OS performance data can be monitored by writing custom scripts that periodically check the system.
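As one illustration of such a script, the Python sketch below checks free disk space with the standard library's shutil.disk_usage call. The function names, the 10 percent default threshold, and the message format are assumptions made for this example; a real deployment would choose its own limits and run the check on a schedule.

```python
import shutil


def disk_free_fraction(path="/"):
    """Return the fraction of the file system holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def check_disk(path="/", min_free=0.10):
    """Return (ok, message); ok is False when free space is below min_free."""
    free = disk_free_fraction(path)
    ok = free >= min_free
    status = "OK" if ok else "ALERT"
    return ok, f"{status}: {free:.0%} of disk space free on {path}"
```

The message string could be appended to the same log file that the directory probes write to, so that a single notification process interprets both kinds of result.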
Alternatively, you can install an SNMP agent that obtains this OS data and makes it available via SNMP. Then, you can monitor the server's operating system health from your central NMS, if you have one.

Indirect Monitoring

Another approach that should be part of your monitoring strategy is indirect monitoring. In this approach, you don't monitor the individual services such as your directory. Instead, you construct monitoring tools that measure availability and response times for the things that matter directly to your end users. For example, if you provide an electronic mail service that depends on the directory, you might periodically measure the time it takes to send an electronic mail message from one user to another.

Indirect monitoring is the real acid test for your servers and network because any single part of your complex, interconnected systems can be at fault. When you combine it with monitoring of the individual devices, servers, and server processes that make up your directory-enabled application, you can quickly identify and repair problems and ultimately improve the reliability and performance of all your directory-enabled applications.
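An indirect probe can be sketched as a function that times some end-to-end action and classifies the outcome. In the Python sketch below, the timed_probe function and its parameters are hypothetical names; the action passed in (for example, a routine that sends a test message and waits for it to arrive) is something you would supply yourself. The three-second default echoes the example threshold given earlier for a simple search and is only a starting point.

```python
import time


def timed_probe(action, slow_after=3.0, clock=time.monotonic):
    """Run action() and return (status, elapsed_seconds, detail).

    status is 'ok', 'slow' (completed but took longer than slow_after
    seconds), or 'failed' (the action raised an exception).
    """
    start = clock()
    try:
        action()
    except Exception as exc:
        # Any exception from the action counts as a failed probe.
        return "failed", clock() - start, str(exc)
    elapsed = clock() - start
    status = "slow" if elapsed > slow_after else "ok"
    return status, elapsed, ""
```

Keeping the probe free of any notification logic follows the modular design recommended in this chapter: the probe only reports what it measured, and a separate process decides whether anyone should be alerted.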
2002, O'Reilly & Associates, Inc.