Monitoring Networking and Transport Protocols | UNIX Fault Management: A Guide for System Administrators


Networking problems can be difficult to diagnose because networking involves a variety of hardware and software components. The previous section focused on the physical cable and LAN card. This section shows tools for checking the behavior of the networking protocols, primarily those in the TCP/IP protocol suite.

This section describes monitoring for the network and transport layers of the OSI seven-layer networking model. These layers include the protocols for data transmission, routing, sequencing, and flow control. In a LAN environment, error detection and recovery are the responsibility of the transport layer. A variety of key resources need to be monitored at these layers, such as the IP addresses being used by each client and server, the number of active connections to a server, and error statistics for each protocol.

Using SNMP Instrumentation

The MIB-II was discussed earlier because of its ability to report status information for each network interface. The MIB-II also contains information for some key networking protocols, such as IP, ICMP (Internet Control Message Protocol), TCP, ARP, and UDP.

If you have access to a MIB Browser, you may want to browse this MIB to see which statistics you can access. For example, for IP, you can check how much data is being sent, received, or discarded, and get an idea of the fragmentation, reassembly, or delivery errors that are occurring. For ICMP, you can determine the number of times a destination was unreachable. For TCP, you can look at the number of connection attempts and see the current connections. For UDP, you can see the number of open sockets.

This section is brief, because very similar information is provided by the netstat command, which is discussed in detail later in this section. Unless you are monitoring the MIB variables of multiple systems through a network management product, you probably want to use the netstat command.

Using Standard Commands and Tools

Common UNIX commands for obtaining network and transport protocol information are included in this section.

ping

The ping command uses ICMP echo to test connectivity to a remote node and record round-trip times and routes for network packets. The percentage of packets lost is also shown. Because ICMP uses IP for its data transmission, ping not only tests the link itself, but it also tests for correct network protocol behavior. Transport protocols such as TCP and UDP are not involved in a ping. Listing 6-6 shows an example of the ping command. Note that ping can be used with an IP address or system name. If used with a name, ping can determine whether the name resolution service is working properly.

If the ICMP sequence numbers are not in numerical order, a hardware problem may be present. If no response is made to a ping, a variety of problems are possible. If other systems can be pinged, especially systems with the same route, then the remote system might have failed. If no system can be pinged, it could be a sign that the local system is not connected to the network. The ping command may fail for other reasons, such as the failure of an intermediate system or a misconfigured routing table.

Listing 6-6 Output from ping command.

    cancan#ping chacha
    PING chacha.cup.hp.com: 64 byte packets
    64 bytes from 15.13.173.94: icmp_seq=0. time=0. ms
    64 bytes from 15.13.173.94: icmp_seq=1. time=0. ms
    64 bytes from 15.13.173.94: icmp_seq=2. time=1. ms
    64 bytes from 15.13.173.94: icmp_seq=3. time=0. ms
    ----chacha.cup.hp.com PING Statistics----
    4 packets transmitted, 4 packets received, 0% packet loss
    round-trip (ms)  min/avg/max = 0/0/1
    cancan#
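The checks described above (packet loss and out-of-order ICMP sequence numbers) can be automated. The following is a minimal sketch in Python, assuming ping output in the format of Listing 6-6; the function name `summarize_ping` is hypothetical, and other ping implementations format their output differently.

```python
import re

# Sample text copied from Listing 6-6.
SAMPLE = """\
PING chacha.cup.hp.com: 64 byte packets
64 bytes from 15.13.173.94: icmp_seq=0. time=0. ms
64 bytes from 15.13.173.94: icmp_seq=1. time=0. ms
64 bytes from 15.13.173.94: icmp_seq=2. time=1. ms
64 bytes from 15.13.173.94: icmp_seq=3. time=0. ms
----chacha.cup.hp.com PING Statistics----
4 packets transmitted, 4 packets received, 0% packet loss
round-trip (ms)  min/avg/max = 0/0/1
"""


def summarize_ping(text):
    """Return (sequence numbers, in-order flag, loss percent) for ping output."""
    seqs = [int(s) for s in re.findall(r"icmp_seq=(\d+)", text)]
    # Replies are in order if every sequence number is larger than the last.
    in_order = all(a < b for a, b in zip(seqs, seqs[1:]))
    stats = re.search(r"(\d+) packets transmitted, (\d+) packets received", text)
    if stats is None:
        raise ValueError("no statistics line found")
    sent, received = int(stats.group(1)), int(stats.group(2))
    loss = 100.0 * (sent - received) / sent if sent else 0.0
    return seqs, in_order, loss


seqs, in_order, loss = summarize_ping(SAMPLE)
print(seqs, in_order, loss)
```

Computing the loss from the transmitted and received counts, rather than reading the printed percentage, keeps the check usable when the percentage is rounded.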

One downside of the ping command is that one end of the communication must be the system you are logged into. The communication problem may be between two other systems. With NNM's Remote Ping tool, you can specify an arbitrary pair of systems to ping each other.

ifconfig

The ifconfig command is used to configure network interfaces and it can be used to determine whether IP has been set up correctly. The output, shown in Listing 6-7, provides information such as the IP address associated with the interface and the subnet mask.

The operational state is included in the output.

Listing 6-7 Output from ifconfig command.

    cancan#ifconfig lan2
    lan2: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST>
            inet 15.13.173.93 netmask fffff800 broadcast 15.13.175.255
    cancan#
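One way to confirm that the address, netmask, and broadcast values in Listing 6-7 agree with each other is to recompute the network from the first two and compare. This is a sketch using Python's standard `ipaddress` module; the helper name `check_interface` is hypothetical.

```python
import ipaddress


def check_interface(addr, hex_mask, broadcast):
    """Return (network, True if the broadcast address matches the netmask)."""
    # ifconfig prints the netmask in hexadecimal, e.g. fffff800.
    mask = ipaddress.IPv4Address(int(hex_mask, 16))
    iface = ipaddress.IPv4Interface(f"{addr}/{mask}")
    network = iface.network
    return network, network.broadcast_address == ipaddress.IPv4Address(broadcast)


# Values taken from Listing 6-7.
network, consistent = check_interface("15.13.173.93", "fffff800", "15.13.175.255")
print(network, consistent)

# The ping target from Listing 6-6 should be on the same subnet.
print(ipaddress.IPv4Address("15.13.173.94") in network)
```

For the values shown, the netmask fffff800 corresponds to a 21-bit prefix, so the interface sits on network 15.13.168.0/21 and the configured broadcast address is consistent.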

arp

The Address Resolution Protocol (ARP) is used when a sending system knows the IP address of the destination, but not the station address. An ARP request is broadcast on the network, asking the destination or intermediate router to respond with its station address. Recently used IP-address-to-station-address mappings are stored in an ARP cache.

The arp command can be used to display the current entries in the ARP cache. This can be a useful tool if you think that two systems may be using the same IP address. You can check the MAC address on the remote system and compare it to the value in the local system's ARP cache. You may also want to use a LAN analyzer to see whether multiple ARP replies are being received by the local system. Note that in the arp output in Listing 6-8, one station address (0:60:3E:81:E:A0) appears multiple times. This system is a router supporting multiple network interfaces.

Listing 6-8 Output from arp command showing remote station addresses.

    # arp -a
    hpovsg.cup.hp.com (15.13.169.163) at 8:0:9:78:2d:eb ether
    cup44ux.cup.hp.com (15.27.217.36) at 0:60:3e:81:e:a0 ether
    hpssc16.cup.hp.com (15.13.169.233) at 8:0:9:4a:18:61 ether
    hphamc1.cup.hp.com (15.13.174.176) at 8:0:9:84:71:94 ether
    hphamc2.cup.hp.com (15.13.174.177) at 8:0:9:84:21:e8 ether
    c3p0.cup.hp.com (15.13.171.249) at 8:0:9:e3:92:69 ether
    wayne.cup.hp.com (15.13.171.185) at 8:0:9:87:bb:6d ether
    chewy.cup.hp.com (15.13.172.249) at 0:60:b0:59:ae:14 ether
    perseus.cup.hp.com (15.61.200.218) at 0:60:3e:81:e:a0 ether
    hpssc02.cup.hp.com (15.13.169.219) at 8:0:9:82:2d:e8 ether
    clue.cup.hp.com (15.13.168.123) at 8:0:9:92:e9:7f ether
    cup44ux.cup.hp.com (15.13.168.124) at 8:0:9:5a:f9:e5 ether
    chacha.cup.hp.com (15.13.173.94) at 8:0:9:c3:fa:7d ether
    #
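Spotting a station address that appears more than once is easier with a small script than by eye. The sketch below, in Python, groups ARP cache entries by station address; it assumes the `arp -a` line format shown in Listing 6-8, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# A few entries copied from Listing 6-8.
SAMPLE = """\
hpovsg.cup.hp.com (15.13.169.163) at 8:0:9:78:2d:eb ether
cup44ux.cup.hp.com (15.27.217.36) at 0:60:3e:81:e:a0 ether
perseus.cup.hp.com (15.61.200.218) at 0:60:3e:81:e:a0 ether
chacha.cup.hp.com (15.13.173.94) at 8:0:9:c3:fa:7d ether
"""

ARP_LINE = re.compile(r"^(\S+) \((\d+\.\d+\.\d+\.\d+)\) at ([0-9a-fA-F:]+)")


def group_by_station_address(text):
    """Map each station address to the (host, IP) entries that resolve to it."""
    by_mac = defaultdict(list)
    for line in text.splitlines():
        match = ARP_LINE.match(line.strip())
        if match:
            host, ip, mac = match.groups()
            by_mac[mac.lower()].append((host, ip))
    return dict(by_mac)


def shared_station_addresses(text):
    """Return only the station addresses that serve more than one IP address."""
    return {mac: entries
            for mac, entries in group_by_station_address(text).items()
            if len(entries) > 1}


shared = shared_station_addresses(SAMPLE)
print(shared)
```

A shared station address is not necessarily an error: in Listing 6-8 it identifies the router. The script only narrows down which entries deserve a closer look.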

Another way to find out which computer system is associated with a given station address is to use NNM, which has a Locate menu option. By specifying the station address using this menu option, NNM searches its object database to find the corresponding system.

The arp command can also be used to add or delete entries from the ARP cache. This is useful if you want to use ping to diagnose IP and ARP failures.

netstat

The netstat command can be used to show information about network interfaces and protocols. Statistics can be shown for the TCP, UDP, IP, ICMP, ARP, and other protocols; for example, you can determine the total number of TCP connections requested and accepted. netstat can also show all active TCP connections and their current TCP state, and it can display all the IP addresses currently configured on the system. Information can be displayed on the network traffic sent on a configured network interface and its configured IP addresses. Internal data structures, such as socket structures and routing tables, can also be shown.

Listing 6-9 shows how netstat can be used to get statistics about network interfaces. These statistics are also available from the lanadmin command, but lanadmin doesn't provide IP address information. During periods of normal operation, the error rates shown here should be very low (much less than 1 percent). Note that, in this example, MC/ServiceGuard is being used on the system, and multiple IP addresses are bound to the same network interface.

On HP-UX, use netstat -in to make sure that two network interfaces are not using the same network address (or subnetwork address), because this is not supported on HP-UX.
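The same-network check can be scripted. The following Python sketch reads `netstat -in` output in the format of Listing 6-9 and reports any network address used by more than one interface. It assumes that names such as lan2:1 are additional addresses on the physical interface lan2 and should not count as separate interfaces; that interpretation, and the function name, are assumptions rather than documented behavior.

```python
from collections import defaultdict

# Rows copied from Listing 6-9 (abbreviated).
SAMPLE = """\
Name     Mtu  Network     Address          Ipkts Ierrs    Opkts Oerrs  Coll
lo0      4136 127.0.0.0   127.0.0.1      4693211     0  4693219     0     0
lan3     1500 15.13.168.0 15.13.168.61  30925479     0  7106611     0     0
lan2     1500 192.5.1.0   192.5.1.61      873225    15  1555027   382 20374
lan2:1   1500 192.5.1.0   192.5.1.3           16     0        0     0     0
lan1*    1500 none        none                 0     0        0     0     0
"""


def networks_shared_by_interfaces(text):
    """Return {network: [interfaces]} for networks used by 2+ physical interfaces."""
    users = defaultdict(set)
    for line in text.splitlines()[1:]:          # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[2] == "none":
            continue                            # unconfigured interface
        # Assumption: "lan2:1" is an extra address on lan2; "*" is a status marker.
        physical = fields[0].rstrip("*").split(":")[0]
        users[fields[2]].add(physical)
    return {net: sorted(names) for net, names in users.items() if len(names) > 1}


conflicts = networks_shared_by_interfaces(SAMPLE)
print(conflicts)
```

For the sample rows the result is empty, because lan2 and its additional addresses are treated as one interface.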

With netstat, you can see protocol statistics for each of the key networking protocols. Listings 6-10 through 6-13 show sample output for IP, ICMP, TCP, and UDP, respectively.

ICMP provides numerous useful error statistics. Destination unreachable errors can be caused by many things, including an unknown destination name, routing problem, or bad hardware. A source quench is a request to a system to reduce its transmission rate because of congestion. Routing redirect messages are used to request that a system use an alternate route to a destination network.

Listing 6-9 Output from netstat showing configured IP addresses.

    bass:/>netstat -in
    Name     Mtu  Network     Address          Ipkts Ierrs    Opkts Oerrs  Coll
    lo0      4136 127.0.0.0   127.0.0.1      4693211     0  4693219     0     0
    lan3     1500 15.13.168.0 15.13.168.61  30925479     0  7106611     0     0
    lan2     1500 192.5.1.0   192.5.1.61      873225    15  1555027   382 20374
    lan2:1   1500 192.5.1.0   192.5.1.3           16     0        0     0     0
    lan2:2   1500 192.5.1.0   192.5.1.9           16     0        0     0     0
    lan2:3   1500 192.5.1.0   192.5.1.7           16     0        0     0     0
    lan2:4   1500 192.5.1.0   192.5.1.5           16     0        0     0     0
    lan2:5   1500 192.5.1.0   192.5.1.19          16     0        0     0     0
    lan2:6   1500 192.5.1.0   192.5.1.15          16     0        0     0     0
    lan2:7   1500 192.5.1.0   192.5.1.13          16     0        0     0     0
    lan2:8   1500 192.5.1.0   192.5.1.17          16     0        0     0     0
    lan2:9   1500 192.5.1.0   192.5.1.29          16     0        0     0     0
    lan1*    1500 none        none                 0     0        0     0     0
    bass:/>
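The guideline that error rates should stay well below 1 percent can be checked against the counters in Listing 6-9. This Python sketch computes input and output error percentages per interface; the 1 percent threshold comes from the text, while the function names and the output layout are illustrative only.

```python
# Rows copied from Listing 6-9 (abbreviated).
SAMPLE = """\
Name     Mtu  Network     Address          Ipkts Ierrs    Opkts Oerrs  Coll
lo0      4136 127.0.0.0   127.0.0.1      4693211     0  4693219     0     0
lan3     1500 15.13.168.0 15.13.168.61  30925479     0  7106611     0     0
lan2     1500 192.5.1.0   192.5.1.61      873225    15  1555027   382 20374
lan1*    1500 none        none                 0     0        0     0     0
"""

THRESHOLD_PERCENT = 1.0


def percent(errors, packets):
    """Errors as a percentage of packets; 0.0 when no packets were counted."""
    return 100.0 * errors / packets if packets else 0.0


def error_rates(text):
    """Return {interface: (input error %, output error %)}."""
    rates = {}
    for line in text.splitlines()[1:]:          # skip the header row
        fields = line.split()
        if len(fields) != 9:
            continue
        ipkts, ierrs, opkts, oerrs = (int(v) for v in fields[4:8])
        rates[fields[0]] = (percent(ierrs, ipkts), percent(oerrs, opkts))
    return rates


rates = error_rates(SAMPLE)
for name, (in_pct, out_pct) in rates.items():
    flag = "CHECK" if max(in_pct, out_pct) >= THRESHOLD_PERCENT else "ok"
    print(f"{name:8s} in={in_pct:.4f}% out={out_pct:.4f}% {flag}")
```

In the sample, lan2 is the only interface with errors, and both of its rates are far below the threshold. The counters are cumulative, so a rate computed from the difference between two samples is a better indicator of current behavior.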

Listing 6-10 Using netstat to show IP statistics.

    # netstat -p ip
    ip:
            112231915 total packets received
            0 bad header checksums
            0 with size smaller than minimum
            0 with data size < data length
            0 with header length < data size
            0 with data length < header length
            0 illegal ip source address
            178 ip version unsupported
            193 fragments received
            0 fragments dropped (dup or out of space)
            1 fragment dropped after timeout
            0 packets forwarded
            2382091 packets not forwardable
            0 redirects sent

Listing 6-11 Using netstat to show ICMP statistics.

    # netstat -p icmp
    icmp:
            7368 calls to generate an ICMP error message
            0 errors not generated because old message was ICMP
            0 errors not generated 'cuz old message was broadcast
            Output histogram:
                    echo reply: 215963
                    destination unreachable: 7368
            64 messages with bad code fields
            21 messages < minimum length
            0 messages with a bad checksum
            0 messages with bad length
            Input histogram:
                    echo reply: 12231
                    destination unreachable: 182468
                    source quench: 35
                    routing redirect: 108
                    echo: 215963
                    time exceeded: 18
                    address mask request: 124
            215963 responses sent

Listing 6-12 Using netstat to show TCP statistics.

    #netstat -p tcp
    tcp:
            275906 packets sent
                    7346 data packets (948597 bytes)
                    0 data packets (0 bytes) retransmitted
                    12967 ack-only packets (1627 delayed)
                    0 URG only packets
                    22 window probe packets
                    1 window update packet
                    255570 control packets
            523932 packets received
                    14736 acks (for 912101 bytes)
                    3299 duplicate acks
                    0 acks for unsent data
                    12785 packets (854136 bytes) received in-sequence
                    3 completely duplicate packets (0 bytes)
                    0 packets with some dup. data (0 bytes duped)
                    2837 out-of-order packets (0 bytes)
                    2 packets (0 bytes) of data after window
                    0 window probes
                    55 window update packets
                    0 packets received after close
                    0 discarded for bad checksums
                    0 discarded for bad header offset fields
                    0 discarded because packet too short
            249910 connection requests
            2842 connection accepts
            5677 connections established (including accepts)
            255585 connections closed (including 14 drops)
            247075 embryonic connections dropped
            15772 segments updated rtt (of 265683 attempts)
            0 retransmit timeouts
                    0 connections dropped by rexmit timeout
            0 persist timeouts
            0 keepalive timeouts
                    0 keepalive probes sent
                    0 connections dropped by keepalive

You can monitor TCP statistics over time to see whether new connections are being established, for example.
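Monitoring over time amounts to taking two snapshots of the counters and comparing them. The Python sketch below parses "count description" lines in the style of Listing 6-12 and reports which counters changed. The first snapshot uses values from the listing; the second snapshot is invented for illustration and is not real output.

```python
import re

COUNTER_LINE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")

# Values taken from Listing 6-12.
BEFORE = """\
tcp:
        7346 data packets (948597 bytes)
        249910 connection requests
        2842 connection accepts
        5677 connections established (including accepts)
"""

# Hypothetical later snapshot, made up for this example.
AFTER = """\
tcp:
        7400 data packets (955000 bytes)
        249950 connection requests
        2842 connection accepts
        5690 connections established (including accepts)
"""


def parse_counters(text):
    """Return {description: count} for each 'count description' line."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            # Replace embedded numbers (such as byte counts) with "N" so the
            # same counter has the same key in every snapshot.
            key = re.sub(r"\d+", "N", match.group(2))
            counters[key] = int(match.group(1))
    return counters


def counter_deltas(before, after):
    """Return the counters whose values changed between two snapshots."""
    old, new = parse_counters(before), parse_counters(after)
    return {key: new[key] - old[key]
            for key in new if key in old and new[key] != old[key]}


deltas = counter_deltas(BEFORE, AFTER)
print(deltas)
```

Only the leading count on each line is tracked; values inside parentheses, such as byte totals, are ignored by this sketch.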

netstat -a can be used to show the state of TCP connections and UDP sockets. You can determine, for example, whether a connection is fully established or the server is still waiting for a connect request from a client. The output will show the name of the network service if it is registered in /etc/services.

The netstat command can also be used to show the amount of memory in use and dedicated to network packets. If insufficient memory is available for IP fragment reassembly on the destination or intermediate systems, fragments are dropped and the data must be retransmitted, which degrades network performance. You can check the number of fragments being dropped by using netstat -sp ip. The ndd command can be used to display and change the ip_reass_mem_limit value.

nettl

nettl is HP-UX's host-based facility for tracing network packets leaving and coming into the system. Tracing is available for links such as IEEE 802.3 and FDDI, as well as upper-level protocols such as TCP/IP and the OSI transport layer. nettl also logs information about network events.

To use nettl, you first need to start the tracing facility. You then need to start a specific trace, using nettl -tn. You can list the specific networking subsystems that you want included in the trace, such as the network or transport layer.

Listing 6-13 Using netstat to show UDP statistics.

    # netstat -p udp
    udp:
            0 incomplete headers
            0 bad data length fields
            0 bad checksums
            2217 socket overflows
            0 data discards

A trace file is specified when starting a trace. Depending on the size of the output, nettl may create two trace files. A suffix of .TRC0 is appended to the most recent trace file, and the older file will have a .TRC1 extension. Listing 6-14 shows how to start and stop a trace of the ICMP traffic sent for a ping request.

After the trace data is collected, you can format the output by using netfmt. The formatted data can also be written to a file. Listing 6-15 shows a portion of the formatted file that resulted from the preceding trace.

Because of its impact on system performance, you should use nettl only when you can't diagnose a problem through other tools or when the system has significant excess capacity.

Listing 6-14 ICMP/ping trace example.

    # nettl -start
    # nettl -tn pduin pduout -e ns_ls_icmp -f pingtrc
    # ping hoffman
    PING hpssc16.cup.hp.com: 64 byte packets
    64 bytes from 15.13.169.233: icmp_seq=0. time=2. ms
    64 bytes from 15.13.169.233: icmp_seq=1. time=2. ms
    64 bytes from 15.13.169.233: icmp_seq=2. time=2. ms
    ----hpssc16.cup.hp.com PING Statistics----
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip (ms)  min/avg/max = 2/2/2
    # nettl -tf -e all
    # netfmt -Nnl -f pingtrc.TRC0 > pingfmt
    # nettl -stop

Listing 6-15 ICMP/ping formatted trace output.

    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ARPA/9000 NETWORKING^^^^^^^^^^^^^^^^^^^^^^^@#%
      Timestamp            : Tue Jan 26 PST 1999 10:41:05.634676
      Process ID           : [ICS]              Subsystem        : NS_LS_ICMP
      User ID ( UID )      : -1                 Trace Kind       : PDU IN TRACE
      Device ID            : -1                 Path ID          : -1
      Connection ID        : 0
      Location             : 00123
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ------------------------------- ICMP Header -------------------------------
    type: ECHOREPLY          chksum: 0xffe3         id: 10715       seq: 1
    code: none
    ------------------------------- User Data ---------------------------------
       0: 36 ae 0c 41 00 09 a8 44 08 09 0a 0b 0c 0d 0e 0f  6..A...D........
      16: 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f  ................
      32: 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f   !"#$%&'()*+,-./
      48: 30 31 32 33 34 35 36 37 -- -- -- -- -- -- -- --  01234567........
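The User Data in the trace can be examined further. Many ping implementations place the send time at the start of the echo payload so that the round-trip time can be computed when the reply returns. The Python sketch below assumes that the first eight bytes of the User Data are two big-endian 32-bit values, seconds since January 1, 1970 UTC followed by microseconds. This layout is an assumption and is not stated in the trace itself.

```python
import struct
from datetime import datetime, timezone

# First eight bytes of the User Data shown in Listing 6-15.
payload_head = bytes.fromhex("36 ae 0c 41 00 09 a8 44")

# Assumption: two unsigned 32-bit big-endian fields, seconds then microseconds.
seconds, microseconds = struct.unpack(">II", payload_head)

sent_at = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
    microsecond=microseconds
)
print(seconds, microseconds)
print(sent_at.isoformat())
```

Under that assumption the payload decodes to 18:41:05.632900 UTC on January 26, 1999, which is 10:41:05.632900 PST. That is about 2 ms before the 10:41:05.634676 timestamp recorded for the incoming echo reply, which is consistent with the 2 ms round-trip times reported in Listing 6-14.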
