Monitoring Network Services | UNIX Fault Management: A Guide for System Administrators


A variety of applications can be accessed over the Internet or via corporate intranets. For instance, a client application may use the network to access an application on a server system. An application that runs on the server and answers such requests is referred to as a network service.

What users care about is the availability and response time of the network service. This section discusses tools for monitoring network services, including some of the more common ones: the Domain Name System (DNS), Network Information Service (NIS), and Network File System (NFS). Other applications are discussed in Chapter 7.

Many network services are started on demand by the inetd daemon, which listens for client requests and starts the appropriate server process. You can use ps to check whether inetd is running; if it is not, try to restart it by executing /usr/sbin/inetd as superuser. You can use netstat -a to list the active sockets on a system, including those on which services are listening for connections.

The TCP or UDP port associated with each service is listed in the file /etc/services. If a service is not running, check this file to see whether the service is configured. The service should also be listed in the file /etc/inetd.conf. The inetd daemon writes errors and informational messages to /var/adm/syslog/syslog.log when services are started or stopped.
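Both configuration checks can be scripted. The following Python sketch assumes the traditional whitespace-separated layouts (name, port/protocol, and aliases in /etc/services; name, socket type, protocol, wait status, user, server path, and arguments in /etc/inetd.conf). The function names are hypothetical, and the sketch only reports what is configured, not whether inetd is actually running.

```python
from pathlib import Path


def _fields(text):
    """Yield the whitespace-separated fields of each non-comment line."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if line:
            yield line.split()


def services_ports(text, name):
    """Return the port/protocol pairs that /etc/services text assigns to name (or an alias)."""
    return [f[1] for f in _fields(text)
            if len(f) >= 2 and (f[0] == name or name in f[2:])]


def inetd_configured(text, name):
    """True if an uncommented /etc/inetd.conf line starts with the service name."""
    return any(f[0] == name for f in _fields(text))


def _read(path):
    try:
        return Path(path).read_text()
    except FileNotFoundError:
        return ""  # treat a missing file as "nothing configured"


def check_service(name, services="/etc/services", inetd_conf="/etc/inetd.conf"):
    """Return (port/protocol list, listed-in-inetd.conf flag) for one service."""
    return (services_ports(_read(services), name),
            inetd_configured(_read(inetd_conf), name))
```

For example, check_service("tftp") returns the ports listed for tftp and whether inetd is configured to start it. A commented-out inetd.conf line counts as not configured, which matches how inetd treats it.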

Other important network services, such as NFS and HTTP, run all the time (or should). The rest of this section describes monitoring tools specific to each network service.

Monitoring DHCP/BOOTP Servers

The Bootstrap Protocol (BOOTP) allows systems to obtain their network configuration information, such as their IP address, from a remote server system. The BOOTP server daemon is bootpd; after receiving a BOOTP reply, a client may use the Trivial File Transfer Protocol (TFTP) to download a boot file from the server. Make sure that both bootpd and TFTP are configured as Internet services in /etc/services and /etc/inetd.conf.

When bootpd receives a BOOTP request, it checks whether its database contains information for the client. If it does, it replies with that information; otherwise, it may forward the request to another system. You may need a network analyzer to verify that the key parameters the client sets in the BOOTP request are correct.
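To see what an analyzer checks, the sketch below builds and decodes the fixed-format BOOTP message defined in RFC 951 (requests go to UDP port 67 on the server; replies return to port 68 on the client). It is an illustration only: the function names are hypothetical, vendor-specific data in the vend field is not decoded, and actually sending a request requires binding port 68, which normally needs superuser privileges.

```python
import socket
import struct

# RFC 951 BOOTP message layout (300 bytes, network byte order):
# op, htype, hlen, hops, xid, secs, flags/unused,
# ciaddr, yiaddr, siaddr, giaddr, chaddr[16], sname[64], file[128], vend[64]
BOOTP_FORMAT = "!BBBBIHH4s4s4s4s16s64s128s64s"
BOOTREQUEST, BOOTREPLY = 1, 2
HTYPE_ETHERNET = 1


def build_bootp_request(mac: bytes, xid: int, ciaddr: str = "0.0.0.0") -> bytes:
    """Build a BOOTREQUEST for an Ethernet client; the caller chooses xid."""
    if len(mac) != 6:
        raise ValueError("expected a 6-byte Ethernet hardware address")
    zero = bytes(4)
    return struct.pack(BOOTP_FORMAT,
                       BOOTREQUEST, HTYPE_ETHERNET, len(mac), 0,  # op, htype, hlen, hops
                       xid, 0, 0,                                 # xid, secs, flags
                       socket.inet_aton(ciaddr), zero, zero, zero,
                       mac, b"", b"", b"")  # struct pads chaddr, sname, file, vend with zeros


def decode_bootp(packet: bytes) -> dict:
    """Decode the fixed BOOTP fields that an analyzer would display."""
    if len(packet) < 300:
        raise ValueError("truncated BOOTP message")
    (op, _htype, hlen, hops, xid, _secs, _flags, ciaddr, yiaddr, siaddr,
     giaddr, chaddr, sname, bootfile, _vend) = struct.unpack(BOOTP_FORMAT, packet[:300])
    return {
        "op": "request" if op == BOOTREQUEST else "reply" if op == BOOTREPLY else op,
        "xid": xid,
        "hops": hops,
        "chaddr": ":".join(f"{b:02x}" for b in chaddr[:hlen]),
        "ciaddr": socket.inet_ntoa(ciaddr),
        "yiaddr": socket.inet_ntoa(yiaddr),  # address the server assigns
        "siaddr": socket.inet_ntoa(siaddr),
        "giaddr": socket.inet_ntoa(giaddr),  # nonzero when a relay forwarded it
        "sname": sname.rstrip(b"\0").decode("ascii", "replace"),
        "file": bootfile.rstrip(b"\0").decode("ascii", "replace"),
    }
```

Comparing the decoded chaddr against the hardware address configured for the client is a quick way to spot a request the server will not recognize.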

Client information is kept in the file /etc/bootptab. If BOOTP is working in general but not for a specific system, check this file to make sure that system's configuration information is correct. You can also test the server with bootpquery, which sends test BOOTP requests to a BOOTP server. In this way, you can verify that the service starts in response to a query. By specifying the client address in the query, and by enabling replies to this client to be broadcast (a configuration option in /etc/bootptab), you can see the actual reply sent by the BOOTP server.

Dynamic Host Configuration Protocol (DHCP) is a superset of BOOTP. Errors related to both BOOTP and DHCP can be written to the system log file. If you are troubleshooting a problem, make sure that bootpd is started with the -d 2 option, so that informational messages are also written to the system log file.

DHCP allows clients to obtain their IP addresses from a pool of available addresses. The /etc/bootptab file is used for BOOTP clients and for DHCP clients with fixed IP addresses. Clients that obtain addresses from a pool are configured in /etc/dhcptab.

On HP-UX, dhcptools can be used to trace DHCP requests and replies, or to examine the internal state of the DHCP daemon. This helps you find configuration problems: for example, the operator can determine the total number of IP addresses available and whether any are still free to be given out to clients.
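dhcptools reports these figures directly from the daemon. Purely to make the arithmetic concrete, here is a hypothetical Python helper that takes a pool's first and last address plus the addresses already leased or reserved, and computes how many remain. Where the lease list would come from is left open, since the daemon's internal state format is not described here.

```python
import ipaddress


def pool_availability(first, last, in_use):
    """Return (available, total) for the inclusive IPv4 range first..last.

    in_use is any iterable of address strings (leased or reserved);
    addresses outside the pool are ignored and duplicates count once.
    """
    lo = ipaddress.IPv4Address(first)
    hi = ipaddress.IPv4Address(last)
    if hi < lo:
        raise ValueError("pool end precedes pool start")
    total = int(hi) - int(lo) + 1
    used = {ip for ip in map(ipaddress.IPv4Address, in_use) if lo <= ip <= hi}
    return total - len(used), total
```

An available count approaching zero is the condition to alert on, because new clients in that pool will then be refused an address.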

Note that monitoring the server status is difficult, because DHCP can shut itself down when it has nothing to do.

Monitoring DNS/NIS Name Servers

Domain Name System (DNS) and Network Information Service (NIS) are two services for providing name resolution. NIS has additional capabilities, such as maintaining consistent configuration information across multiple systems.

An easy way to check these services is to make a query to the server application. In addition to checking the service availability, you can get a snapshot of the response time. Note that in this and other examples, the monitoring of the service is being done from the client system.
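A minimal Python sketch of such a check resolves a name through the client's normal resolver and records how long the answer took. Because socket.gethostbyname follows the system's resolver configuration, the result reflects whichever service (DNS, NIS, or local files) answered; the host name to test is up to you, and the function name is hypothetical.

```python
import socket
import time
from typing import Optional, Tuple


def timed_lookup(hostname: str) -> Tuple[Optional[str], float]:
    """Resolve hostname through the system resolver and time the call.

    Returns (IPv4 address or None on failure, elapsed milliseconds).
    """
    start = time.perf_counter()
    try:
        address = socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        address = None
    return address, (time.perf_counter() - start) * 1000.0
```

Run it repeatedly to build a picture of typical response time; a local cache, if one is configured, can make repeated lookups of the same name much faster than the first.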

The configuration file /etc/resolv.conf contains the IP addresses of name servers being used by the local system. An example is shown in Listing 6-16. You may want to ping the servers listed in this file if you are having trouble with name resolution.
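A ping shows only that a host is reachable. The Python sketch below goes one step further: it reads the nameserver entries from resolv.conf text and can send each server a single DNS query over UDP (RFC 1035 message format), reporting the response code and round-trip time. It assumes IPv4 servers and plain UDP on port 53, does not retry, and does not parse the answer records; the function names are hypothetical.

```python
import random
import socket
import struct
import time


def nameservers(resolv_conf: str) -> list:
    """Return nameserver addresses from resolv.conf text, in order."""
    servers = []
    for line in resolv_conf.splitlines():
        for mark in ("#", ";"):  # both characters start comments
            line = line.split(mark, 1)[0]
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def build_dns_query(name: str, qid: int) -> bytes:
    """Build a recursive query for the A record of name."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # RD bit set, one question
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.rstrip(".").split(".")) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def probe(server: str, name: str, timeout: float = 2.0):
    """Send one query to server port 53; return (rcode, ms) or None."""
    qid = random.randrange(0x10000)
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            start = time.perf_counter()
            sock.sendto(build_dns_query(name, qid), (server, 53))
            reply, _ = sock.recvfrom(512)
            elapsed = (time.perf_counter() - start) * 1000.0
    except OSError:  # timeout, unreachable network, or refused
        return None
    if len(reply) < 12 or struct.unpack("!H", reply[:2])[0] != qid:
        return None  # not a reply to our query
    return reply[3] & 0x0F, elapsed  # rcode 0 = no error, 3 = name does not exist
```

For each address returned by nameservers(), probe() with a fully qualified name tells you whether that particular server answers and how quickly, which isolates a failing server that the system resolver might silently skip.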

Name Service Switch

The Name Service Switch enables you to specify which name service should be used for name resolution and other queries. You may want to check the /etc/nsswitch.conf file to see how name resolution is being done on your system. The nsquery command can be used to test the behavior of the Name Service Switch.
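When reading /etc/nsswitch.conf by hand, the line that matters for name resolution is the hosts entry. The hypothetical Python helper below extracts the source order for a database, skipping [STATUS=action] criteria. An empty result means the database has no entry, in which case the system uses a built-in default that differs between platforms; nsquery remains the way to see what the switch actually does.

```python
import re


def lookup_order(nsswitch_conf: str, database: str = "hosts") -> list:
    """Return the sources listed for database, ignoring [STATUS=action] criteria."""
    for line in nsswitch_conf.splitlines():
        line = line.split("#", 1)[0]  # strip comments
        if ":" not in line:
            continue
        db, sources = line.split(":", 1)
        if db.strip() == database:
            sources = re.sub(r"\[[^\]]*\]", " ", sources)  # drop bracketed criteria
            return sources.split()
    return []
```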

nslookup

nslookup can be used to determine whether a name server is aware of a particular host name. Listing 6-17 shows a name server successfully resolving the system name to an IP address.

Monitoring FTP

File Transfer Protocol (FTP) is used for transferring files efficiently between systems. An easy way to check whether FTP is working is to use ftp to transfer an arbitrary file. Along with an indication of the success of the request, ftp tells you the transfer rate. While connected, you can check the status of the remote FTP session by using the status FTP command, as shown in Listing 6-18.
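The same check can be automated. This Python sketch uses the standard ftplib module to log in, retrieve one file in binary mode without saving it, and compute the transfer rate from the bytes received and the elapsed time. The host, file name, and anonymous credentials are assumptions to replace with your own, and the rate covers only the data transfer, not the login.

```python
import ftplib
import time


def kbytes_per_second(nbytes: int, seconds: float) -> float:
    """Transfer rate in Kbytes/s (1 Kbyte = 1024 bytes)."""
    return nbytes / 1024.0 / seconds if seconds > 0 else float("inf")


def ftp_probe(host: str, remote_file: str, user: str = "anonymous",
              passwd: str = "anonymous@", timeout: float = 30.0):
    """Download remote_file in binary mode, discarding the data.

    Returns (bytes received, Kbytes/s). ftplib raises one of
    ftplib.all_errors if the connection, login, or transfer fails.
    """
    received = 0

    def count(block: bytes) -> None:
        nonlocal received
        received += len(block)

    with ftplib.FTP(host, timeout=timeout) as ftp:
        ftp.login(user, passwd)
        start = time.perf_counter()
        ftp.retrbinary("RETR " + remote_file, count)
        elapsed = time.perf_counter() - start
    return received, kbytes_per_second(received, elapsed)
```

Wrapping a call in try/except ftplib.all_errors lets a monitoring script record a failed check instead of stopping.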

Listing 6-16 Checking name servers in /etc/resolv.conf.

 # more /etc/resolv.conf
 domain cup.hp.com
 nameserver 15.27.217.36 # cup44ux
 nameserver 15.13.168.63 # hpperf1

Listing 6-17 Using nslookup to get a system's IP address.

 # nslookup gsyview1
 Name Server:  cup44ux.cup.hp.com
 Address:  15.27.217.36

 Name:    gsyview1.cup.hp.com
 Address:  15.13.174.132

ftp denies access to the local user accounts listed in /etc/ftpusers, so be sure to check this file if ftp is not working for certain users.
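To confirm quickly whether a particular account is affected, the file can be read programmatically. The hypothetical Python helpers below take the first field of each non-comment line as an account name; treating a missing /etc/ftpusers as denying no one is an assumption about the ftpd in use.

```python
from pathlib import Path


def ftp_denied_users(ftpusers_text: str) -> set:
    """Return the account names listed in ftpusers text (first field per line)."""
    denied = set()
    for line in ftpusers_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if fields:
            denied.add(fields[0])
    return denied


def is_ftp_denied(user: str, path: str = "/etc/ftpusers") -> bool:
    """True if user appears in the ftpusers file at path."""
    try:
        text = Path(path).read_text()
    except FileNotFoundError:
        return False  # assumption: a missing file denies no one
    return user in ftp_denied_users(text)
```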

Monitoring NFS

The Network File System is meant to give users a transparent view of remote filesystems, making them appear to be attached locally. NFS can be used to share filesystems, which saves disk space and makes management easier: all NFS clients share the single copy of a file that is stored on the NFS server. The downside of this transparency is that users may not be aware of the network traffic they cause by reading a simple file, nor that others may be accessing the same files simultaneously.

Numerous things can go wrong with NFS. Processes can die or hang. The key processes for NFS are nfsd, automount, lockd, statd, and portmap. The status of these processes can be checked with the ps command.
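The ps check can be scripted. This Python sketch runs ps -e, takes the command name from the last column, and reports which of the daemons named above are missing. Daemon names differ between systems (for example rpc.lockd or rpcbind), so the list, and the stripping of an rpc. prefix, are assumptions to adjust; the function names are hypothetical.

```python
import subprocess

# Daemon names from the text; adjust for your platform.
NFS_DAEMONS = ("nfsd", "automount", "lockd", "statd", "portmap")


def running_commands(ps_output: str) -> set:
    """Command names from `ps -e` output (last column), minus path and 'rpc.' prefix."""
    names = set()
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if fields:
            name = fields[-1].rsplit("/", 1)[-1]
            names.add(name[4:] if name.startswith("rpc.") else name)
    return names


def missing_daemons(ps_output: str, required=NFS_DAEMONS) -> list:
    """Return the required daemons that do not appear in the ps output."""
    running = running_commands(ps_output)
    return [d for d in required if d not in running]


def check_nfs_daemons() -> list:
    """Run `ps -e` and return the required NFS daemons that are not running."""
    out = subprocess.run(["ps", "-e"], capture_output=True, text=True, check=True).stdout
    return missing_daemons(out)
```

An empty list means every listed daemon is present. A process that is present may still be hung, which this check cannot detect.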

NFS requests are processed by NFS server daemons (nfsd), which read requests from a single UDP socket. The number of daemons is adjustable, and the optimal number depends on system load and NFS activity. If too few daemons are running, the kernel may drop arriving NFS requests to avoid overflowing the buffer space reserved for these packets. You can detect this problem by checking the number of UDP socket overflows reported by netstat -s. Increasing the number of daemons should reduce socket overflows and improve NFS performance. However, no tools exist to tell you how many daemons to add, and an excess of NFS daemons may show up as an increasing load average on the server. Also, because the overflow count is a system-wide statistic, the overflows may come from another UDP application. Use netstat to check current UDP activity before you assume that any problems are due to NFS.
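Because the overflow counter is cumulative since boot, what matters is how fast it grows. A Python sketch of that measurement samples netstat -s twice and divides the difference by the interval. The counter's wording is an assumption (the default pattern matches lines such as "12 socket overflows" after the udp: heading) and differs between UNIX variants; the function names are hypothetical.

```python
import re
import subprocess
import time

# Assumed wording of the UDP counter in `netstat -s`; pass your own pattern if needed.
OVERFLOW_PATTERN = r"(\d+)\s+socket overflows"


def udp_overflows(netstat_s: str, pattern: str = OVERFLOW_PATTERN) -> int:
    """Extract the cumulative UDP socket-overflow count from `netstat -s` text."""
    start = netstat_s.lower().find("udp:")  # search from the udp section onward
    text = netstat_s[start:] if start >= 0 else netstat_s
    match = re.search(pattern, text)
    if match is None:
        raise ValueError("overflow counter not found; adjust the pattern")
    return int(match.group(1))


def overflow_rate(interval: float = 60.0, pattern: str = OVERFLOW_PATTERN) -> float:
    """Sample `netstat -s` twice, interval seconds apart; return overflows per second."""
    def sample() -> int:
        out = subprocess.run(["netstat", "-s"], capture_output=True,
                             text=True, check=True).stdout
        return udp_overflows(out, pattern)

    before = sample()
    time.sleep(interval)
    return (sample() - before) / interval
```

A rate that stays above zero while NFS is busy, and drops to zero when it is idle, points at nfsd rather than another UDP application.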

Listing 6-18 Output from FTP status command showing status of remote machine.

 ftp> status
 Connected to hpssc16.cup.hp.com.
 No proxy connection.
 Mode: stream; Type: binary; Form: non-print; Structure: file
 Verbose: on; Bell: off; Prompting: on; Globbing: on
 Store unique: off; Receive unique: off
 Case: off; CR stripping: on
 Ntrans: off
 Nmap: off
 Hash mark printing: off; Use of PORT cmds: on
 ftp>

If you are responsible for monitoring NFS, you should also check the status of exported filesystems, and of the disks holding them, to see whether they are full or unavailable. Check NFS statistics to see whether NFS timeouts have been reported. MeasureWare collects statistics on the number and rate of NFS requests during a time interval. BMC PATROL provides a tool with its UNIX KM to monitor NFS activity and Remote Procedure Calls (RPCs).

nfsstat

The nfsstat output can give you an idea of the type of NFS workload occurring in your environment. This can be useful if you want to establish benchmarks to measure your typical server load.

nfsstat can report information for an NFS client, an NFS server, or both. The example in Listing 6-19 shows the statistics for an NFS client.

The bad transaction identifier (badxid) count indicates the number of times a reply was received for which no request was outstanding. This can indicate that client requests are timing out too quickly and being retransmitted, so that the server ends up answering the same request more than once. If the client has many bad XID packets and timeouts, increase the timeout value (the timeo mount option) when mounting the filesystem.
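To watch these counters over time, the client RPC figures can be pulled out of nfsstat -c output. This Python sketch assumes the layout shown in Listing 6-19 (a "Client rpc:" line followed by a row of names and a row of values); other systems lay the output out differently. The thresholds in should_raise_timeo are hypothetical values standing in for "many", not published guidance.

```python
def client_rpc_stats(nfsstat_c: str) -> dict:
    """Parse the 'Client rpc:' header/value rows of `nfsstat -c` output."""
    rows = [line.split() for line in nfsstat_c.splitlines() if line.strip()]
    for i, fields in enumerate(rows):
        if fields == ["Client", "rpc:"] and i + 2 < len(rows):
            names, values = rows[i + 1], rows[i + 2]
            return dict(zip(names, (int(v) for v in values)))
    raise ValueError("no 'Client rpc:' section found")


def should_raise_timeo(stats: dict, min_timeouts: int = 10, ratio: float = 0.5) -> bool:
    """Hypothetical rule of thumb: many timeouts, with bad XIDs making up
    a large share of them, suggests the client gives up too soon."""
    timeouts = stats.get("timeout", 0)
    return timeouts >= min_timeouts and stats.get("badxid", 0) >= ratio * timeouts
```

Applied to the client in Listing 6-19, the rule returns False, since no timeouts or bad XIDs were recorded.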

A large number of symbolic link resolutions on the server may indicate that the client has configured inappropriate names as mount points.

Listing 6-19 Output showing a client's NFS statistics.

 # nfsstat -c

 Client rpc:
 calls    badcalls  retrans   badxid   timeout   wait       newcred
 200      0         0         0        0         0          0

 Client nfs:
 calls    badcalls  nclget    nclsleep
 200      0         200       0
 null     getattr   setattr   root     lookup    readlink   read
 0  0%    133 66%   0  0%     0  0%    0  0%     0  0%      0  0%
 wrcache  write     create    remove   rename    link       symlink
 0  0%    0  0%     0  0%     0  0%    0  0%     0  0%      0  0%
 mkdir    rmdir     readdir   statfs
 0  0%    0  0%     0  0%     67 33%
 #

NetMetrix for NFS

NetMetrix provides an extension for monitoring the NFS service. Monitors are used to capture the network traffic, after which NFS service performance can be calculated. Graphs can show information such as the load, response time, number of retransmissions, and number of errors.

HA NFS

An HA NFS toolkit is available as an add-on to the MC/ServiceGuard product. NFS server failures can be detected and reported to MC/ServiceGuard, with NFS service restarted on another node. NFS locks are not maintained during the failover.

Monitoring Remote Connectivity

rlogin and telnet provide remote login capability, allowing users on client systems to log in to UNIX systems. For rlogin, the daemon started by inetd is rlogind; for telnet, it is telnetd. The who command shows the users who are logged in to a system. Use ps -ef | grep rlogind to see how many users are connected through rlogin and when each session started.
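A ps | grep pipeline can also match the grep process itself. A Python alternative asks ps for just the elapsed time and command name of every process (the POSIX -o etime= and -o comm= fields) and lists the rlogind and telnetd sessions with how long each has been running. This assumes one daemon process per session, as when inetd starts them; on HP-UX the -o option may require the UNIX95 environment variable to be set, and some systems name the daemons in.rlogind or in.telnetd, which the sketch normalizes. The function names are hypothetical.

```python
import subprocess

REMOTE_DAEMONS = ("rlogind", "telnetd")


def etime_seconds(etime: str) -> int:
    """Convert a ps etime value ([[dd-]hh:]mm:ss) to seconds."""
    days, _, rest = etime.rpartition("-")
    parts = [int(p) for p in rest.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours/minutes with zeros
    hours, minutes, seconds = parts
    return (int(days) if days else 0) * 86400 + hours * 3600 + minutes * 60 + seconds


def remote_sessions(ps_output: str) -> dict:
    """Map each daemon to its session ages in seconds, from `etime comm` lines."""
    sessions = {d: [] for d in REMOTE_DAEMONS}
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue
        comm = fields[1].strip().rsplit("/", 1)[-1]
        if comm.startswith("in."):
            comm = comm[3:]  # in.telnetd -> telnetd
        if comm in sessions:
            sessions[comm].append(etime_seconds(fields[0]))
    return sessions


def current_remote_sessions() -> dict:
    """Run ps and report the current remote login sessions."""
    out = subprocess.run(["ps", "-e", "-o", "etime=", "-o", "comm="],
                         capture_output=True, text=True, check=True).stdout
    return remote_sessions(out)
```

The length of each list is the number of sessions, and the values answer the "for how long" question directly instead of requiring you to subtract start times.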

Monitoring Web Servers

Monitoring Internet services is a new phenomenon, and tools to monitor Web servers are just beginning to emerge. Netscape's SuiteSpot server software is being bundled with HP OpenView technology for Internet monitoring and fault management for high-end, enterprise-level HP-UX users.

The HP OpenView Internet Service Manager provides tools to manage Web resource utilization and performance. Monitoring is done from an IT/O management station. The following UNIX-based Internet services can be managed: Netscape Enterprise Server, Netscape FastTrack Server, Netscape News, Netscape Proxy Server, Netscape Mail, NCSA Web Server, and Apache Web Server. Internet Service Manager can also identify broken HTML links.
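Whatever product you use, the basic questions are whether the server answers and how quickly. A minimal Python sketch using the standard urllib.request module fetches one URL, treats any HTTP status (even an error page) as proof that the server responded, and times the whole exchange. The URL is yours to choose, the function name is hypothetical, and the sketch deliberately bypasses proxy settings so that the timing reflects the server itself; it does not reproduce how any of the products named above work.

```python
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple

# Connect directly, ignoring any proxy settings, so the timing reflects the server.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check_http(url: str, timeout: float = 10.0) -> Tuple[Optional[int], float]:
    """Fetch url and return (HTTP status or None if unreachable, elapsed ms)."""
    start = time.perf_counter()
    try:
        with _opener.open(url, timeout=timeout) as response:
            response.read()  # include the body transfer in the timing
            status = response.status
    except urllib.error.HTTPError as err:  # 4xx/5xx still prove the server answered
        status = err.code
    except OSError:  # URLError, refused connection, or timeout
        status = None
    return status, (time.perf_counter() - start) * 1000.0
```

A status of None means the server could not be reached at all, while a 5xx status means it is up but failing; the two call for different responses from an operator.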

CompuWare also has some tools, such as EcoScope and Single View, which can be used to monitor Web servers. Network traffic can be analyzed at the protocol level to capture HTTP packets. Single View can then report on the traffic load, with specialized Internet utilization reports for Web (HTTP), secure Web (HTTPS), and other protocols.
