9.1 Ping | Linux Kernel in a Nutshell (In a Nutshell (OReilly))

Perhaps the first tool that most administrators reach for when debugging a network problem is the ping program. It can tell you if a machine is alive on the network, and it can print statistics on the network conditions from your machine to another. Though the ping program is relatively simple, it has a few subtleties that are often overlooked.

Ping comes installed with every Unix operating system, so there is no need to build it yourself unless you wish to use a different version than the one you have installed. Be aware that the program tends to use different options and has different default behavior on different systems.

9.1.1 How Ping Works

The ping program operates by sending an Internet Control Message Protocol (ICMP) ^[1] echo message (an ICMP message whose type is echo ) to a remote host. When a networked device, such as the remote host, receives the ICMP echo message, it responds with an ICMP echo reply to the sending host. The ping program waits to receive this ICMP echo reply message and uses the fact that it has arrived, the amount of time it took to arrive , and other data to report statistics back to the user .

^[1] ICMP is a part of the Internet Protocol and is used for sending messages about errors and other control information at the IP layer.

Using a ping test on a device is similar to using sonar on a submarine. Your submarine sends out a loud ping and then waits to hear the response from the sound bouncing off other objects. You want to know if the sound returned at all ( otherwise there's nothing out there), and if it does, you want to know how long it took to make the trip.

In the simplest case, the ping program can be used as a test to see if a machine is reachable on the network. One or more ICMP echo messages are sent, and if the device responds within a reasonable amount of time, the ping program will indicate that the machine is alive:

 Solaris% ping workstation.example.com workstation.example.com is alive Solaris% ping client.example.com no answer from client.example.com

Note that one difference in the ping program behavior on different platforms is already relevant. The Solaris version of ping performs this simple alive-or-dead test by default, while the Linux version will send continuous echo requests unless you specifically ask it not to.

If the test is successful, what have you learned? You know that the ICMP packet was sent from your workstation to the remote host and that the remote host was able to send an ICMP packet back to your workstation. If the test fails, however, you do not know exactly where the problem is. It may be that your ICMP echo packets are not reaching the remote machine, or it may be that the remote machine is receiving the packets but the responses are not reaching your workstation. This may be or may not be expected behavior. Some sites administratively block ICMP traffic so that even if a host is on the network, you will not be able to ping it. On rare occasion, you may find a host where the operating system has been modified to ignore ICMP echo messages while other parts of the system will respond normally to network requests. Typically, however, you can expect a host operating under normal conditions to respond to pings , especially if you know it did at some point in the past.

One common problem that can be diagnosed with the ping program is a machine whose netmask or gateway is misconfigured. If either of these pieces of information is incorrect, you may not be able to successfully ping the machine from a different network, but you can ping it from a host (or router) that is on the same network. ^[2] Details on using a router to ping a host are presented later in this section.

^[2] The definition of "network" is a little fuzzy here, and "subnet" might be a better word. It is essentially any set of machines that can communicate directly without going through an IP router.

You can diagnose other problems by running ping in a different mode. By repeatedly sending ICMP echo request packets, a continuous ping test can report results as they change in real time. Linux will use this behavior by default; on Solaris you must use the -s option:

 Solaris% ping -s server1.example.com PING server1.example.com: 56 data bytes 64 bytes from server1.example.com (192.0.2.3): icmp_seq=0. time=14. ms 64 bytes from server1.example.com (192.0.2.3): icmp_seq=1. time=14. ms 64 bytes from server1.example.com (192.0.2.3): icmp_seq=2. time=14. ms 64 bytes from server1.example.com (192.0.2.3): icmp_seq=3. time=13. ms 64 bytes from server1.example.com (192.0.2.3): icmp_seq=4. time=13. ms 64 bytes from server1.example.com (192.0.2.3): icmp_seq=5. time=13. ms 64 bytes from server1.example.com (192.0.2.3): icmp_seq=6. time=13. ms 64 bytes from server1.example.com (192.0.2.3): icmp_seq=7. time=14. ms 64 bytes from server1.example.com (192.0.2.3): icmp_seq=8. time=14. ms 64 bytes from server1.example.com (192.0.2.3): icmp_seq=9. time=13. ms 64 bytes from server1.example.com (192.0.2.3): icmp_seq=10. time=14. ms ^C ----server1.example.com PING Statistics---- 11 packets transmitted, 11 packets received, 0% packet loss round-trip (ms) min/avg/max = 13/13/14

Once every second, the ping program sends an ICMP echo packet. For each ICMP echo reply the program receives, a single line of output is printed. When the user sends a break (by typing <ctrl>-C), the program terminates and reports the cumulative statistics. Included in the statistics is the packet loss rate, which is the percentage of ICMP packets sent for which there was never a corresponding response. Though applications will tolerate low levels of packet loss, any amount of packet loss indicates a network problem. A solid local network should have 0% packet loss.

The last field on each line printed in a continuous ping test indicates the time period from when an ICMP echo packet was sent and the corresponding ICMP echo reply was received. This is called the round-trip-time ( RTT ). In this case, the average RTT is 13 milliseconds , as you can see from the last line of the output. What is a normal value for the RTT? The answer depends on what you are testing. If the remote host is next door on the same physical network, you would expect a low RTT, say 0 “3 ms. If instead the remote host is across many networks and on the other side of the world, you should not be surprised to see 150 ms or larger RTTs.

Because the RTT is a very high-level measurement of latency, a large RTT value can be the result of any number of factors and it does not immediately indicate that something is broken. For example, some transmission media such as satellite links are expected to have a high latency. ^[3] One common condition that causes high RTT times is when the CPU of the host being pinged is too busy to respond quickly to ICMP requests. Many routers will prioritize other tasks over responding to ICMP when the CPU becomes bogged down with tasks . If this happens, the RTTs to the router will be much higher than normal but not because of any problem with the network itself.

^[3] It takes light a little while to make it all the way up to the satellite and back, of course.

You will also want to note whether the RTT values are consistent or erratic. Very large and erratic changes in RTTs can be a sign of congestion, high collision rate, route flapping, or other network problems.

Options for Ping

Though each ping program is different, most will let you change the default options to facilitate more interesting testing. The most important option is the size of the ICMP packets sent. The default size of the data portion sent by most ping programs is 56 bytes. With 28 bytes added for the IP and ICMP headers, the full IP packet ends up being 84 bytes long, which is still a relatively small packet. ^[4] Since a number of network problems will not present themselves unless larger packets are used, you will frequently want to instruct the ping program to send more data in a single packet. On Solaris, the packet size is specified as an additional argument after the hostname; on Linux, you must use the -s option (not to be confused with the Solaris -s option, which requests a continuous ping).

^[4] You will notice that the ping program reports 64 bytes received; this is referring to the combined ICMP header and data, but not the IP header. The IP header adds 20 bytes, for a total packet length of 84 bytes.

 Solaris% ping -s client.example.com 1450 PING client.example.com: 1450 data bytes 1458 bytes from CLIENT.EXAMPLE.COM (192.0.2.114): icmp_seq=0. time=1. ms 1458 bytes from CLIENT.EXAMPLE.COM (192.0.2.114): icmp_seq=1. time=1. ms 1458 bytes from CLIENT.EXAMPLE.COM (192.0.2.114): icmp_seq=2. time=1. ms 1458 bytes from CLIENT.EXAMPLE.COM (192.0.2.114): icmp_seq=3. time=1. ms 1458 bytes from CLIENT.EXAMPLE.COM (192.0.2.114): icmp_seq=4. time=1. ms 1458 bytes from CLIENT.EXAMPLE.COM (192.0.2.114): icmp_seq=5. time=1. ms 1458 bytes from CLIENT.EXAMPLE.COM (192.0.2.114): icmp_seq=6. time=1. ms 1458 bytes from CLIENT.EXAMPLE.COM (192.0.2.114): icmp_seq=7. time=1. ms 1458 bytes from CLIENT.EXAMPLE.COM (192.0.2.114): icmp_seq=8. time=1. ms 1458 bytes from CLIENT.EXAMPLE.COM (192.0.2.114): icmp_seq=9. time=1. ms ^C ----client.example.com PING Statistics---- 10 packets transmitted, 10 packets received, 0% packet loss round-trip (ms) min/avg/max = 1/1/1

Because Ethernet can support packets as large as 1500 bytes, we choose to send packets with 1450 bytes of data, which leaves a little room for protocol headers. Note that if one of the links between us and client.example.com has a smaller maximum transmission unit (MTU) than 1500 bytes, the packets will need to be fragmented before transmission, which may have unexpected results on your test. If the problem you are debugging is possibly an MTU problem, the results will be relevant, but if it is not an MTU problem, they may confuse the issue. A thorough ping test will test each link at a variety of packet sizes.

Other options that many ping programs will allow include changing the time interval between pings, the IP time-to-live (TTL) value, the IP source address and the IP type of service field. While these are occasionally useful options, they are not often needed to help diagnose network problems.

Pinging from Network Devices

Most managed network devices have ping software built in, which allows you to run a ping test from many different places on your network. The IOS ping that runs on Cisco routers, for example, is illustrated here:

 router# ping client Translating "client"...domain server (192.0.2.160) [OK] Type escape sequence to abort. Sending 5, 100-byte ICMP Echos to 192.0.2.114, timeoout is 2 se... !!!!! Success rate is 100 percent (5/5), round-trip min/avg/maz = 1/1...

As with the other ping programs, the Cisco IOS ping will allow you to set more interesting options. If you run the ping command with no arguments, you will be prompted for the more advanced features:

 router# ping Protocol [ip]: Target IP address: 192.0.2.114 Repeat count [5]: 50 Datagram size [100]: 1450 Timeout in seconds [2]: Extended commands [n]: y Source address or interface: Type of service [0]: Set DF bit in IP header? [no]: Validate reply data? [no]: yes Data pattern [0xABCD]: Loose, Strict, Record, Timestamp, Verbose[none]: Sweep range of sizes [n]: Type escape sequence to abort. Sending 50, 1450-byte ICMP Echos to 192.0.2.114, timeoout is 2 ... !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Success rate is 100 percent (50/50), round-trip min/avg/max = 1...

Setting the don't-fragment (DF) bit in the IP header can help debug MTU and fragmentation problems. Validating that the ICMP data field is returned as sent can help detect data corruption on the wire. These and all of the other extended options are available only from enable mode on the router.

Running an Effective Ping Test

Using the ping program to effectively diagnose a problem frequently requires more thought than simply running the program once against a problematic host. A good ping test will explore different possible problems and attempt to eliminate anything unrelated to the true problem. If ping performance is poor to a particular host, is it also as bad to other hosts on the same network? What about hosts on networks between yours and that of the remote host? Does the problem exhibit itself for large packets only? Is the problem transient or consistent? Use these questions as a guide to begin your debugging.