41.2. Common Troubleshooting Commands

The following sections discuss commands you can use to troubleshoot network problems.

41.2.1. ping

ping can do more than just determine basic connectivity. You can also use it to discover the quality of a network connection. If users are complaining about a spotty network connection, using ping in the right way can give you a reasonably accurate idea of how much of a problem exists.

An ordinary ping, which sends one packet per second, will usually not reveal an intermittent problem like this one. However, if you use ping to generate a flood of packets, you can get a fairly accurate idea of how intermittent the connection really is. As root, use the -f option to generate a ping flood, as shown here:

 root@james:~ # ping -f albion
 PING albion.stangernet.com (192.168.2.57) 56(84) bytes of data.
 .........................................................................................
 .........................................................................................
 .........................................................................................
 .............
 --- albion.stangernet.com ping statistics ---
 433 packets transmitted, 153 received, 64% packet loss, time 4833ms
 rtt min/avg/max/mdev = 2.470/2.859/6.359/0.623 ms, ipg/ewma 11.189/3.012 ms
 root@james:~ #

Notice that the output reports 64% packet loss. Generally, a packet loss rate of 1% to 2% is tolerable for all but the most sensitive applications. Rates above 2%, and especially above 5%, are generally too high for a reliable network connection that is doing any real work (e.g., an X Window System session or a database connection).
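ping reports the loss itself, but the arithmetic is simply (transmitted - received) / transmitted. The following Python sketch pulls both counts out of an iputils-style statistics line and sorts the result using the 2% and 5% rules of thumb above. The function names and category labels are illustrative only, not part of any tool.

```python
import re

# Matches the counts in a ping summary such as:
# "433 packets transmitted, 153 received, 64% packet loss, time 4833ms"
STATS_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")


def packet_loss(summary: str) -> float:
    """Return the packet loss percentage computed from a ping summary line."""
    match = STATS_RE.search(summary)
    if match is None:
        raise ValueError("no ping statistics found")
    sent, received = (int(n) for n in match.groups())
    if sent == 0:
        return 0.0
    return 100.0 * (sent - received) / sent


def rate_loss(loss_pct: float) -> str:
    """Classify loss using the rough 2% / 5% thresholds from the text."""
    if loss_pct <= 2.0:
        return "tolerable"  # fine for all but the most sensitive applications
    if loss_pct <= 5.0:
        return "too high"   # unreliable for X sessions, database connections, etc.
    return "severe"


line = "433 packets transmitted, 153 received, 64% packet loss, time 4833ms"
loss = packet_loss(line)
print(f"{loss:.1f}% -> {rate_loss(loss)}")  # prints: 64.7% -> severe
```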

So far, ping has not helped you determine whether this system is experiencing a hardware problem or a software problem. But now that you know some sort of problem exists, you can begin forming hypotheses. Steps to take might include:

Make sure that other systems are not experiencing the same problem. This may involve verifying that the hub or switch is working properly.
Send a flood of packets to additional systems to make sure that the problem does not reside on the remote system.
Check the physical connection on the local system, as well as to the hub or switch.
Make sure that the driver on the local system matches the hardware.
Check the NIC's subnet mask.
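The first two steps come down to comparing loss figures from floods sent to several systems. This Python sketch applies a rough heuristic, which is an assumption of the example and not a rule from any tool. If every target shows loss, suspect the local side. If only some targets show loss, suspect those remote systems or the path to them. Apart from albion, the host names are hypothetical.

```python
from typing import Dict


def localize_loss(loss_by_host: Dict[str, float], threshold: float = 2.0) -> str:
    """Suggest where to look, given packet loss percentages per flooded host."""
    lossy = sorted(host for host, loss in loss_by_host.items() if loss > threshold)
    if not lossy:
        return "no significant loss"
    if len(lossy) == len(loss_by_host):
        # Every target is affected: check the local NIC, driver, cable, hub/switch.
        return "suspect local system or its switch"
    # Only some targets are affected: the problem is probably at their end.
    return "suspect remote: " + ", ".join(lossy)


# Hypothetical results from flooding three hosts on the same subnet
print(localize_loss({"albion": 64.0, "hostb": 0.0, "hostc": 0.5}))
# prints: suspect remote: albion
```

The heuristic only makes sense with several targets. With a single flooded host, any loss will point at the local side.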

Tip: Intrusion detection systems (IDS), described in Chapter 40, cannot tell the difference between an authorized, well-intended ping flood and one that is intended as an attack. If necessary, warn your security team that you are conducting ping floods before you create one.

If the system can't connect to a remote network such as the Internet, ping the router. Don't stop at the router interface on your own subnet. Ping the interface on the far side of the router, then move on to pinging hosts beyond the router. Understand, however, that many system administrators use access control lists to block pinging across routers and switches, so an unanswered ping does not by itself prove that a path is down.
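Working outward hop by hop is easy to script. This Python sketch assumes Linux iputils ping. There, -c sets the packet count, -W sets the reply timeout in seconds, and the exit status is 0 when at least one reply arrives. The function names are hypothetical. A hop filtered by an access control list will look just as unreachable as a broken one.

```python
import subprocess
from typing import Callable, Iterable, Optional


def ping_ok(host: str, count: int = 3, timeout_s: int = 2) -> bool:
    """Return True if iputils ping received at least one reply from host."""
    result = subprocess.run(
        ["ping", "-n", "-c", str(count), "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_unreachable(
    path: Iterable[str], pinger: Callable[[str], bool] = ping_ok
) -> Optional[str]:
    """Ping each address in order, nearest first; return the first that fails."""
    for host in path:
        if not pinger(host):
            return host
    return None
```

Consider first_unreachable(["192.0.2.1", "198.51.100.1", "203.0.113.10"]), which uses hypothetical documentation addresses. It pings the near side of a router, then its far side, then a host beyond it, and returns the first address that does not answer, or None. Where ICMP is filtered, the pinger parameter lets you substitute another test, such as a TCP connection attempt.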

Finally, when using ping, consider the following:

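Use the -I option to choose the correct interface: Many systems have multiple Ethernet interfaces. Specifying the source interface ensures that your test packets leave through the interface you actually want to test.
Use the -n option if name resolution has failed: With -n, ping prints numeric addresses only and does not try to look up hostnames, so a broken name service cannot slow down or confuse the results. Specify the target by IP address as well.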
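When ping is driven from a script, these options just become extra arguments. The following Python sketch assembles the argument list. The helper name and the interface name eth1 are hypothetical.

```python
from typing import List, Optional


def build_ping_cmd(
    target: str,
    interface: Optional[str] = None,
    numeric: bool = False,
    count: Optional[int] = None,
) -> List[str]:
    """Assemble a ping command line using the options discussed above."""
    cmd = ["ping"]
    if numeric:
        cmd.append("-n")            # numeric output only, no name lookups
    if interface:
        cmd += ["-I", interface]    # send from this interface
    if count is not None:
        cmd += ["-c", str(count)]   # stop after this many packets
    cmd.append(target)
    return cmd


print(" ".join(build_ping_cmd("192.168.2.57", interface="eth1", numeric=True, count=5)))
# prints: ping -n -I eth1 -c 5 192.168.2.57
```

The resulting list can be passed directly to subprocess.run().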

41.2.2. telnet and netcat

You have already learned that you can use telnet and netcat (nc on some systems) to query ports and gather information. In a troubleshooting situation, however, you will see many different messages and errors, and not all of them are equally meaningful. Most of them can still be quite useful. Table 41-2 lists the most common responses.

Table 41-2. Responses to telnet and netcat queries
Response	Explanation
"Name or service not known" or "No route to host"	No system exists with that IP address or name.
"Connection refused"	Confirms that a remote system is up and responding. However, the port you have attempted to connect to is not open or is blocked by an iptables or ipchains rule. You have nevertheless found a live system.
"Name or service not known" or "Forward host lookup failed: Unknown host"	A DNS error indicating that no host by this name exists. This does not mean that the host does not exist at all. The name server is simply reporting that this name does not exist. Try connecting to this host by IP address. Possibly useful when troubleshooting DNS.
Connection hangs for a moment, then is dropped with no explanation	An application such as TCP wrappers has processed the connection, then dropped it. Useful when troubleshooting TCP wrappers configuration or in determining problems with nonworking services.
Connection seems to hang indefinitely	Usually implies that telnet or netcat has connected to a port and the service is waiting for input. Note that if you do not wait long enough, you can mistake a connection that is about to be dropped (see the previous row) for a successful one. Wait 4 or 5 seconds before concluding that you have made a connection.
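The same distinctions can be made programmatically. This Python sketch opens a TCP connection much as telnet or netcat would. It then waits, five seconds by default in line with the advice in the table, to see whether the service sends a greeting, closes the connection, or sits waiting for input. The result labels belong to this sketch; they are not messages printed by telnet or netcat.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 5.0) -> str:
    """Connect to host:port and describe what happened, telnet/netcat style."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.gaierror:
        return "name lookup failed"      # DNS could not resolve the name
    except ConnectionRefusedError:
        return "connection refused"      # host answered; port closed or rejected
    except socket.timeout:
        return "no answer (timed out)"   # e.g. packets silently dropped
    except OSError as exc:
        return f"error: {exc.strerror or exc}"  # e.g. "No route to host"
    with sock:
        try:
            data = sock.recv(1024)       # wait up to `timeout` for a greeting
        except socket.timeout:
            return "connected, waiting for input"
        except ConnectionResetError:
            return "connected, then reset by remote side"
    if not data:
        # e.g. TCP wrappers accepting the connection and then dropping it
        return "connected, then closed by remote side"
    return "connected, banner: " + data.decode("utf-8", errors="replace").strip()
```

POP-3 and SMTP servers greet the client first, so they typically produce a "banner" result. An HTTP server waits for a request and typically produces "waiting for input".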

Once you have made a connection using telnet or netcat, you can then type in commands and send them to the listening port. Sometimes, that port may not respond at all. At other times, the port drops the connection immediately or returns gibberish.

In some cases, the port may allow an interactive session. SMTP, POP-3, and IMAP servers allow you to open a session and send commands. Many system administrators have memorized the commands needed to send and receive email using nothing more than a telnet client or netcat. Following is an example of how you can use netcat to read email from a POP-3 server:

 # netcat mail.company.com 110
 Trying 214.27.208.3...
 Connected to mail.company.com.
 Escape character is '^]'.
 +OK (rwcrpxc59) Maillennium POP3/PROXY server #65
 USER lpicprofessional
 +OK
 PASS passedexam1
 +OK ready
 LIST
 +OK 1 messages (31227)
 1 31227
 .
 RETR 1
 From: certification@lpi.org
 Subject: Congratulations
 Congratulations upon achieving LPIC 2 status.
 QUIT

In this sequence, netcat was used to connect to port 110 (the standard POP-3 port), and the user proceeded to enter a series of commands to read an email. First, the user issued the USER and PASS commands to authenticate to the remote system. Then, the LIST command was issued to see if any emails were waiting. In this particular session, one message was waiting.

To read the email message, the user simply typed RETR 1. The contents of the message were then displayed, giving good news in this case. If multiple email messages existed, the user could have typed RETR 3 to read the third message, or RETR 41 to read the 41st message. To end the session, the user typed QUIT. A session using telnet would use the identical POP-3 commands.
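Python's standard poplib module sends the same POP-3 commands for you. This sketch replays the session above: USER, PASS, LIST, RETR 1, QUIT. The function name is hypothetical. Like the manual session, it sends the password in clear text over port 110.

```python
import poplib
from typing import List


def read_first_message(host: str, port: int, user: str, password: str) -> List[str]:
    """Log in, list messages, and return the lines of message 1 (if any)."""
    conn = poplib.POP3(host, port, timeout=10)
    try:
        conn.user(user)                 # sends USER
        conn.pass_(password)            # sends PASS
        _, listing, _ = conn.list()     # sends LIST; one b"num size" per message
        if not listing:
            return []
        _, lines, _ = conn.retr(1)      # sends RETR 1
        return [line.decode("utf-8", errors="replace") for line in lines]
    finally:
        conn.quit()                     # sends QUIT
```

Calling read_first_message("mail.company.com", 110, "lpicprofessional", "passedexam1") would mirror the transcript above.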

You can also communicate with web servers using telnet or netcat. Following is a simple HTTP session using netcat:

 $ netcat stageserver.company.com 80
 GET /
 <html>
 <head>
 <title>Web site</title>
 </head>
 <body bgcolor="teal">
 <p>Placeholder for Web site.</p>
 </body>
 </html>
 $

First, netcat was used to connect to the web server named stageserver.company.com. The HTTP GET / command then returned the default web page that a standard web client would normally read. Instead of using the GET / command, you can simply type in gibberish. Many web servers, especially if they are still using default settings, will reveal the server version and other information:

 $ netcat james 80
 asdf
 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
 <html><head>
 <title>501 Method Not Implemented</title>
 </head><body>
 <h1>Method Not Implemented</h1>
 <p>asdf to /index.html not supported.<br />
 </p>
 <hr>
 <address>Apache/2.0.53 (Ubuntu) PHP/4.3.10-10ubuntu4.3 Server at james.stangernet.com Port 80</address>
 </body></html>
 $

Here, netcat was used to connect to a private web server maintained by the author at the host system james. In response to the gibberish entered by the user, the server issued a response that included not only the version of Apache Server, but also the server operating system and the fact that PHP is enabled. Not all daemons will respond with useful information, however.
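Typing gibberish is not the only route. Many servers also name themselves in the Server response header. This Python sketch does what netcat does: it opens a raw socket, sends a minimal HEAD request, and reads the status line and that header. The function name is hypothetical. Administrators can suppress or alter the header, so treat the result as a hint.

```python
import socket
from typing import Optional, Tuple


def http_server_header(host: str, port: int = 80, timeout: float = 5.0) -> Tuple[str, Optional[str]]:
    """Send a minimal HEAD request; return (status line, Server header or None)."""
    request = f"HEAD / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        while True:                 # with HTTP/1.0 the server closes when done
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    lines = b"".join(chunks).decode("iso-8859-1").split("\r\n")
    server = None
    for line in lines[1:]:
        if not line:                # a blank line ends the headers
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "server":
            server = value.strip()
    return lines[0], server
```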

41.2.3. ifconfig

The ifconfig command can be quite helpful during troubleshooting if you take the time to read all the information it provides. In addition to standard networking information (e.g., the IP address and subnet mask), the typical ifconfig output tells you the following:

Whether or not the interface is up (the UP flag)
Whether it supports broadcast and multicast (the BROADCAST and MULTICAST flags)
The number of packets received and transmitted since last activation
The number of errors and overruns
The number of bytes received and transmitted
The interrupt (IRQ) and base I/O address the card uses

Here is an example of ifconfig output:

 $ ifconfig eth0
 eth0      Link encap:Ethernet  HWaddr 00:80:5F:EA:86:8F
           inet addr:24.17.140.230  Bcast:255.255.255.255  Mask:255.255.252.0
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:44354070 errors:0 dropped:0 overruns:0 frame:0
           TX packets:3078006 errors:0 dropped:0 overruns:0 carrier:0
           collisions:113575 txqueuelen:100
           RX bytes:1730626695 (1650.4 Mb)  TX bytes:553335663 (527.7 Mb)
           Interrupt:11 Base address:0x6100

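The information gathered here can help you narrow down both hardware and software errors. Climbing error, overrun, frame, or carrier counts, or a large number of collisions (113,575 in this example), suggest a physical problem such as a bad cable, a failing NIC, or a duplex mismatch with the switch. An incorrect address or netmask, or a missing UP flag, points instead to a configuration problem.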
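Reading these counters by eye is error-prone, so here is a Python sketch that collects them from ifconfig text and expresses collisions as a share of transmitted packets. It assumes the older net-tools layout shown above ("RX packets:N errors:N ..."). Newer net-tools versions print the counters in a different layout that this parser will not match. The function names are hypothetical.

```python
import re
from typing import Dict


def nic_counters(ifconfig_text: str) -> Dict[str, int]:
    """Collect RX/TX counters and collisions from net-tools ifconfig output."""
    counters: Dict[str, int] = {}
    for line in ifconfig_text.splitlines():
        line = line.strip()
        direction = re.match(r"(RX|TX) packets:", line)
        if direction:
            prefix = direction.group(1).lower() + "_"   # "rx_" or "tx_"
        elif line.startswith("collisions:"):
            prefix = ""
        else:
            continue                                     # address, flag, byte lines
        for key, value in re.findall(r"(\w+):(\d+)", line):
            counters[prefix + key] = int(value)
    return counters


def collision_rate(counters: Dict[str, int]) -> float:
    """Collisions as a fraction of transmitted packets."""
    sent = counters.get("tx_packets", 0)
    return counters.get("collisions", 0) / sent if sent else 0.0


sample = """\
          RX packets:44354070 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3078006 errors:0 dropped:0 overruns:0 carrier:0
          collisions:113575 txqueuelen:100
"""
c = nic_counters(sample)
print(c["rx_errors"], c["tx_errors"], f"{collision_rate(c):.2%}")  # prints: 0 0 3.69%
```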

41.2.4. traceroute

Don't underestimate the usefulness of the traceroute command. Don't be too confident that you know everything about traceroute, either. For the exam, be able to identify each element of traceroute output. Consider the following example:

 # traceroute 213.236.195.41
 traceroute to 213.236.195.41 (213.236.195.41), 30 hops max, 38 byte packets
  1  linpro-intra-gw (80.232.36.129)  0.212 ms  0.154 ms  0.133 ms
  2  tott (80.232.38.218)  0.931 ms  0.783 ms  1.209 ms
  3  tdc-A100M-0225-hsrp.linpro.net (80.232.38.220)  1.471 ms  1.505 ms  1.678 ms
  4  212.37.252.2 (212.37.252.2)  1.469 ms  1.834 ms  2.457 ms
  5  pos3-0.622M.osl-nyd-cr1.ip.teledanmark.no (213.236.195.41)  2.043 ms *  2.906 ms

In the output, notice that each hop shows three latency times, because traceroute sends three probe packets to each hop by default. If you were to ping these systems, you would receive similar times. If routes change dynamically (for example, under the control of a routing daemon), subsequent runs of traceroute could discover different hosts.

An asterisk means that no reply arrived within the timeout period you have specified. Either the packet was lost, or a router has been configured not to send the ICMP messages traceroute relies on. The default timeout period for traceroute is five seconds.

Sometimes, you may see flags such as !N or !X in place of the latency information traceroute usually provides. The !N flag means that the network cannot be reached, and the related !H flag means that the host cannot be reached. The !X flag means that communication is administratively prohibited. A router has been configured to block the probes, but its administrator was kind enough to have it send back a message informing traceroute about the prohibition.
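To pick out each element of a hop line, here is a Python sketch that splits one line into hop number, hostname, address, round-trip times (None for each asterisk), and any ! flags. It assumes one responding router per hop. Hops where different routers answer different probes are not handled. The function names are hypothetical.

```python
from typing import Optional


def _to_float(token: str) -> Optional[float]:
    try:
        return float(token)
    except ValueError:
        return None


def parse_hop(line: str) -> Optional[dict]:
    """Parse one traceroute hop line; return None for non-hop lines."""
    tokens = line.split()
    if not tokens or not tokens[0].isdigit():
        return None                       # e.g. the "traceroute to ..." header
    hop = {"hop": int(tokens[0]), "host": None, "addr": None,
           "rtts": [], "flags": []}
    for tok in tokens[1:]:
        if tok == "*":
            hop["rtts"].append(None)      # no reply within the timeout
        elif tok == "ms":
            continue
        elif tok.startswith("!"):
            hop["flags"].append(tok)      # e.g. !N, !H, !X
        elif tok.startswith("(") and tok.endswith(")"):
            hop["addr"] = tok[1:-1]
        elif hop["host"] is None:
            hop["host"] = tok
        else:
            value = _to_float(tok)
            if value is not None:
                hop["rtts"].append(value) # round-trip time in ms
    return hop


hop = parse_hop("  5  pos3-0.622M.osl-nyd-cr1.ip.teledanmark.no (213.236.195.41)  2.043 ms *  2.906 ms")
lost = hop["rtts"].count(None)
print(hop["rtts"], f"{lost} of {len(hop['rtts'])} probes lost")
# prints: [2.043, None, 2.906] 1 of 3 probes lost
```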

41.2.5. netstat and route

You already know that netstat is useful for checking open connections, as well as viewing the routing table. We also discussed the route command in Chapter 19. The output of each command is slightly different, and this difference might be important in a troubleshooting situation.

Consider the following netstat output:

 $ netstat -r
 Kernel IP routing table
 Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
 localnet        *               255.255.255.0   U       500   0       0 eth0
 default         system1234.stan 0.0.0.0         UG      5000  0       0 eth0
 $

For the sake of comparison, consider the following route output:

 $ route
 Kernel IP routing table
 Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
 localnet        *               255.255.255.0   U     0      0        0 eth0
 default         system1234.stan 0.0.0.0         UG    0      0        0 eth0
 $

The information from the two commands seems identical, but there are subtle differences. The netstat output includes the Maximum Segment Size (MSS), the TCP window size (Window), and the Initial Round Trip Time (IRTT, shown as irtt). The route command does not report these values by default.
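To see the difference concretely, you can parse each table by its own header row and compare the column names. This Python sketch uses the localnet rows from the two outputs above; the function name is hypothetical.

```python
from typing import Dict, List


def parse_routing_table(text: str) -> List[Dict[str, str]]:
    """Turn `netstat -r` or `route` output into one dict per route."""
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Destination":
            header = fields                 # column names differ per command
        elif header and len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


NETSTAT = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
localnet        *               255.255.255.0   U       500   0       0 eth0
"""
ROUTE = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
localnet        *               255.255.255.0   U     0      0        0 eth0
"""

netstat_cols = set(parse_routing_table(NETSTAT)[0])
route_cols = set(parse_routing_table(ROUTE)[0])
print("only in netstat -r:", sorted(netstat_cols - route_cols))
# prints: only in netstat -r: ['MSS', 'Window', 'irtt']
print("only in route:", sorted(route_cols - netstat_cols))
# prints: only in route: ['Metric', 'Ref', 'Use']
```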

The MSS value indicates the largest amount of TCP data (in bytes) that the system will place in a single segment on this route, so that packets do not need to be fragmented. Generally, you want the MSS value to be less than the Maximum Transmission Unit (MTU), which is 1500 bytes for Ethernet. With 20-byte IPv4 and TCP headers, the largest MSS that fits is 1460 bytes. A value of 0 means that the default is used, which is 536 bytes for Linux systems. The previous output shows an MSS of 500 for the local network route, so the system is likely functioning well in this regard.
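The check is simple header arithmetic: the MSS must fit inside the MTU after the IP and TCP headers are subtracted. This Python sketch assumes IPv4 and TCP headers without options (20 bytes each) and treats an MSS of 0 as the 536-byte default given above. The names are hypothetical.

```python
IPV4_HEADER = 20          # bytes, assuming no IP options
TCP_HEADER = 20           # bytes, assuming no TCP options
LINUX_DEFAULT_MSS = 536   # used when the MSS column shows 0 (per the text)


def max_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


def mss_fits(mss: int, mtu: int = 1500) -> bool:
    """True if a route's MSS (0 = kernel default) fits within the MTU."""
    effective = LINUX_DEFAULT_MSS if mss == 0 else mss
    return effective <= max_mss(mtu)


print(max_mss(1500))               # prints: 1460
print(mss_fits(500), mss_fits(0))  # prints: True True
```

The 536-byte default follows from the same formula. 576 bytes is the minimum datagram size every IPv4 host must accept, and 576 minus 40 gives 536.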

The IRTT value is the initial round-trip time estimate (in milliseconds) that the kernel uses for TCP connections over this route before it has measured any actual round trips. On our system the value is 0, which means that the system is using the default value (300 milliseconds).

The route command provides the Metric and Use fields, which netstat does not. The Metric field indicates the distance to a destination, usually counted in hops. It is not used by recent Linux kernels, though if you use a routing daemon, you may need to read this value. Knowledge of routing daemons is not required for the LPI exams.

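The Use field indicates the number of lookups for the particular route. If you use route's -C option, which displays the kernel's routing cache, Use shows the number of cache hits: the number of times the cache successfully looked up the route. If you use route's -F option, which displays the forwarding table and is the default, Use shows the number of cache misses.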