Troubleshooting Network Problems

   


Network problems can manifest themselves in many ways. Some are very difficult to diagnose, while others become apparent very quickly. The system manager, although not normally responsible for the network, must know about network issues and how they can affect the systems for which he is responsible.

The majority of larger installations make use of network management software, such as Solstice Domain Manager (discussed in Chapter 14), in one form or another, so most of the problems that will be encountered should be dealt with by this software. However, configuring a network management product requires enough knowledge to understand exactly which events are being monitored and to determine the remedial action to take if one of them occurs.

This section looks at some of the basic troubleshooting tools used to determine the status of a system's network capability and also to diagnose network-related problems.

ifconfig

The ifconfig command is probably the first command to be run when diagnosing a network problem. From the information returned by the command, it is possible to verify that the network interface is functioning correctly, that the IP and broadcast addresses are correct, and that the network mask being used is also correct. Listing 13.2 contains the result from running the ifconfig command with the -a flag to show all network interfaces.

Listing 13.2 Sample Output from the Command ifconfig -a Showing the Status of All Connected Network Interfaces
 taurus# ifconfig -a
 lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST> mtu 8232
         inet 127.0.0.1 netmask ffffff00
 le0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
         inet 210.127.8.3 netmask ffffff00 broadcast 210.127.8.255
         ether 8:0:20:a:a1:2a
 taurus#
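As a cross-check on the values in Listing 13.2, the broadcast address can be recomputed from the IP address and the hexadecimal netmask that ifconfig prints. The following is a minimal Python sketch and is not part of the original text; the function name expected_broadcast is an illustrative assumption.

```python
import ipaddress


def expected_broadcast(ip, hex_netmask):
    """Return the broadcast address implied by an IP address and a hex netmask.

    hex_netmask is the eight-digit form that ifconfig prints, e.g. "ffffff00".
    """
    # Convert the hex netmask to dotted-decimal form (ffffff00 -> 255.255.255.0).
    mask = ipaddress.IPv4Address(int(hex_netmask, 16))
    # strict=False allows a host address to be given instead of the network address.
    network = ipaddress.IPv4Network(f"{ip}/{mask}", strict=False)
    return str(network.broadcast_address)


if __name__ == "__main__":
    # Values for le0 taken from Listing 13.2.
    print(expected_broadcast("210.127.8.3", "ffffff00"))
```

If the value printed differs from the broadcast address that ifconfig reports, either the netmask or the broadcast address is probably misconfigured.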

ping

The ping command is another extremely popular network monitoring tool. It is used for a number of purposes:

  • To determine whether a known remote system can be contacted

  • To see if a remote system is visible on the network and can respond to requests

  • To see the round-trip time of sending a data packet to a remote host

  • To establish the amount (if any) of packet loss being suffered on a communications link between two hosts

Listing 13.3 contains sample output from two different executions of the ping command. The first one merely establishes whether the remote host is responding to requests. The second sends a fixed-size data packet and records the time taken to send it, as well as overall statistics on round-trip times and packet loss.

Listing 13.3 Two Options from the ping Command, One to Establish the Status of a Remote System and One to Determine Transmission Times and Reliability
 leo# ping taurus
 taurus is alive
 [ /export/home/john ]
 leo#
 leo# ping -s taurus
 PING taurus: 56 data bytes
 64 bytes from taurus (210.127.8.3): icmp_seq=0. time=15. ms
 64 bytes from taurus (210.127.8.3): icmp_seq=1. time=6. ms
 64 bytes from taurus (210.127.8.3): icmp_seq=2. time=5. ms
 64 bytes from taurus (210.127.8.3): icmp_seq=3. time=5. ms
 64 bytes from taurus (210.127.8.3): icmp_seq=4. time=6. ms
 64 bytes from taurus (210.127.8.3): icmp_seq=5. time=5. ms
 64 bytes from taurus (210.127.8.3): icmp_seq=6. time=6. ms
 64 bytes from taurus (210.127.8.3): icmp_seq=7. time=6. ms
 64 bytes from taurus (210.127.8.3): icmp_seq=8. time=5. ms
 64 bytes from taurus (210.127.8.3): icmp_seq=9. time=5. ms
 64 bytes from taurus (210.127.8.3): icmp_seq=10. time=5. ms
 64 bytes from taurus (210.127.8.3): icmp_seq=11. time=6. ms
 64 bytes from taurus (210.127.8.3): icmp_seq=12. time=5. ms
 64 bytes from taurus (210.127.8.3): icmp_seq=13. time=5. ms
 64 bytes from taurus (210.127.8.3): icmp_seq=14. time=5. ms
 64 bytes from taurus (210.127.8.3): icmp_seq=15. time=7. ms
 64 bytes from taurus (210.127.8.3): icmp_seq=16. time=5. ms
 64 bytes from taurus (210.127.8.3): icmp_seq=17. time=5. ms
 ^C
 taurus PING Statistics
 18 packets transmitted, 18 packets received, 0% packet loss
 round-trip (ms)   min/avg/max = 5/5/15
 leo#
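When ping -s output has been saved to a file, the round-trip figures can be recomputed from the individual reply lines. The following Python sketch is illustrative only; the name summarize_ping is hypothetical, and the sketch assumes reply lines in the format of Listing 13.3. Because only replies are visible, the number of packets sent is estimated from the highest sequence number, so losses at the very end of a run would go unnoticed.

```python
import re

# Matches reply lines in the style of Listing 13.3, for example:
#   64 bytes from taurus (210.127.8.3): icmp_seq=0. time=15. ms
REPLY = re.compile(r"icmp_seq=(\d+)\.\s+time=(\d+(?:\.\d*)?)\s+ms")


def summarize_ping(output):
    """Summarise the reply lines found in captured `ping -s` output."""
    replies = [(int(seq), float(ms)) for seq, ms in REPLY.findall(output)]
    if not replies:
        return {"sent": 0, "received": 0, "loss_pct": None}
    times = [ms for _, ms in replies]
    # Assumption: sequence numbers start at 0 and the last probe was answered,
    # so the highest sequence number seen implies how many probes were sent.
    sent = max(seq for seq, _ in replies) + 1
    received = len(replies)
    return {
        "sent": sent,
        "received": received,
        "loss_pct": 100.0 * (sent - received) / sent,
        "min_ms": min(times),
        "avg_ms": sum(times) / len(times),
        "max_ms": max(times),
    }
```

A gap in the sequence numbers shows up as a nonzero loss_pct, which is a quick way to spot intermittent loss in a long capture.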

With the ever-increasing use of firewalls, it is possible that the network administrator has blocked the protocol that ping uses, namely the Internet Control Message Protocol (ICMP). If this protocol is blocked, any ping messages that must pass through the firewall will fail, indicating (perhaps falsely) that a system is down.
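Where ICMP may be filtered, one way to cross-check a failed ping is to attempt a TCP connection to a service that the remote host is known to offer. The following Python sketch is an assumption-laden illustration rather than a technique taken from the original text; the function name tcp_reachable and the default timeout are hypothetical choices.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, tcp_reachable("taurus", 23) would test the Telnet port. A result of False does not prove that the host is down, because the firewall may filter that port as well.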

netstat

The netstat command has a wide range of uses. It can be used to monitor the state of a network interface, to determine which network connections are established (or hung) with remote systems, to provide information based on specific network protocols, and also to display the internal routing table.

As an example, consider Listing 13.4, which contains the first of two samples of output from the netstat command. The output is from a Sun Enterprise 250 running Solaris 2.6, where the import of an Oracle database was taking an unacceptable amount of time, yet all network connections appeared to be working as expected. When the netstat -i command was run with an interval of 5 seconds, a high number of input errors was apparent.

Listing 13.4 Sample Output from the Command netstat -i Showing the Abnormally High Number of Input Errors on the Network Interface
 leo# netstat -i 5
     input   hme0      output               input  (Total)    output
 packets  errs     packets errs  colls   packets  errs     packets errs  colls
 44845867 25576336 3627474 204   122543  44952869 25576336 3734476 204   122543
 108      44       28      0     0       108      44       28      0     0
 99       48       25      0     0       99       48       25      0     0
 182      77       26      0     0       182      77       26      0     0
 154      86       26      0     0       154      86       26      0     0
 179      113      25      0     0       179      113      25      0     0
 89       42       27      0     0       89       42       27      0     0
 101      47       28      0     0       101      47       28      0     0
 111      38       26      0     0       111      38       26      0     0
 136      55       25      0     0       136      55       25      0     0
 150      59       35      0     1       150      59       35      0     1
 leo#

Some searching on the SunSolve database revealed a patch for the symptoms. That patch was duly installed and fixed the problem, as displayed in Listing 13.5, which shows the same network interface following the patch installation.

Listing 13.5 Sample Output from the Command netstat -i Showing That the Problem Is Resolved
 leo# netstat -i 5
     input   hme0      output               input  (Total)    output
 packets  errs     packets errs  colls   packets  errs     packets errs  colls
 8623134  0        707459  103   37491   8633461  0        717786  103   37491
 138      0        21      0     0       138      0        21      0     0
 64       0        22      0     0       64       0        22      0     0
 106      0        21      0     0       106      0        21      0     0
 92       0        21      0     0       92       0        21      0     0
 150      0        24      0     0       150      0        24      0     0
 82       0        21      0     0       82       0        21      0     0
 128      0        21      0     0       128      0        21      0     0
 114      0        21      0     0       114      0        21      0     0
 93       0        21      0     0       93       0        21      0     0
 124      0        22      0     0       126      0        24      0     0
 109      0        29      0     0       111      0        31      0     0
 leo#

This example is all the more interesting because it did not initially appear to be a network problem at all. It first presented itself as a performance issue, because the import took much longer to complete than was expected.
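The difference between Listings 13.4 and 13.5 can be expressed as an input error percentage for each 5-second interval. The Python sketch below is illustrative and not from the original text; the function names are hypothetical, and it assumes ten numeric columns per row, as in the listings. Note that the first numeric row of netstat -i output with an interval is a cumulative total since boot, not a 5-second sample.

```python
def parse_interval_rows(output):
    """Return the numeric rows of `netstat -i <interval>` output as tuples of ints."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Header lines and shell prompts contain non-numeric fields and are skipped.
        if len(fields) == 10 and all(f.isdigit() for f in fields):
            rows.append(tuple(int(f) for f in fields))
    return rows


def input_error_pct(row):
    """Input errors as a percentage of input packets for the first interface."""
    packets, errs = row[0], row[1]
    return 100.0 * errs / packets if packets else 0.0


if __name__ == "__main__":
    sample = """\
    input   hme0      output               input  (Total)    output
packets  errs     packets errs  colls   packets  errs     packets errs  colls
44845867 25576336 3627474 204   122543  44952869 25576336 3734476 204   122543
108      44       28      0     0       108      44       28      0     0
"""
    for row in parse_interval_rows(sample):
        print(f"{row[0]:>10} packets  {input_error_pct(row):5.1f}% input errors")
```

On a healthy interface, such as the one in Listing 13.5, every interval row would report 0.0.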

traceroute

The traceroute command does exactly as you would expect: It traces the path taken to get from one host to another. It displays information about each of the "hops" along the way. This command is extremely useful when trying to determine why two hosts are incapable of communicating because it will indicate routers along the way that are not responding. Listing 13.6 shows an example of the traceroute command.

Listing 13.6 The traceroute Command Showing the Path Taken to Reach a Remote Host
 leo# traceroute taurus
 traceroute to taurus (210.127.8.3), 30 hops max, 40 byte packets
  1  leo-router (209.127.8.1)  3 ms  2 ms  2 ms
  2  bb1-gate-x (188.101.25.67)  22 ms  21 ms  18 ms
  3  bb2-gate-a (187.100.80.10)  32 ms  29 ms  17 ms
  4  bb5-area-xconn-alpha (192.150.100.68)  7 ms  5 ms  3 ms
  5  taurus-router (210.127.8.1)  6 ms  4 ms  3 ms
  6  taurus (210.127.8.3)  7 ms  5 ms  3 ms
 leo#
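Saved traceroute output can be scanned for hops that did not respond. The following Python sketch is illustrative only; the names parse_hops and unresponsive_hops are hypothetical, and it assumes the line format of Listing 13.6, with an asterisk printed for each probe that timed out.

```python
import re

HOP = re.compile(r"^\s*(\d+)\s+(.*)$")          # hop number, then the rest of the line
HOST = re.compile(r"^(\S+)\s+\(([\d.]+)\)")      # host name and address in parentheses
RTT = re.compile(r"(\d+(?:\.\d+)?)\s*ms")        # one round-trip time


def parse_hops(output):
    """Parse traceroute output into one dictionary per hop."""
    hops = []
    for line in output.splitlines():
        m = HOP.match(line)
        if not m:
            continue  # banner line, shell prompt, or blank line
        number, rest = int(m.group(1)), m.group(2)
        host = HOST.match(rest)
        # Only look for times after the host field, so digits in names are ignored.
        timings = rest[host.end():] if host else rest
        hops.append({
            "hop": number,
            "host": host.group(1) if host else None,
            "address": host.group(2) if host else None,
            "rtts_ms": [float(t) for t in RTT.findall(timings)],
            "timeouts": rest.count("*"),
        })
    return hops


def unresponsive_hops(hops):
    """Return the numbers of hops for which no probe was answered."""
    return [h["hop"] for h in hops if not h["rtts_ms"]]
```

The first unresponsive hop is usually the best place to start investigating, although some routers simply decline to answer traceroute probes while still forwarding traffic.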

snoop

The snoop command is a powerful network command that captures packets on a network interface. The captured packets can be displayed on the screen as they occur or can be saved to a file for later analysis. The snoop command requires superuser privilege to run because it puts the network interface into promiscuous mode so that all packets can be captured. Listing 13.7 demonstrates the type of information that can be gathered with this command. The example shows a simple Telnet connection being established by user john with his password of john1.

It is worth noting that snoop will capture all traffic on a nonswitched network, including traffic between other systems that has nothing to do with the problem being investigated. On a switched network, however, snoop will capture only packets sent to or from the system on which the command is being run; network traffic between other systems will not be captured.

The command is extremely useful, however, particularly when trying to see if acknowledgements are being received from a remote host.

Listing 13.7 The snoop Command Can Even Capture Password Information That Is Transmitted Across the Network
 34   0.04790   aries -> taurus   TELNET C port=60321
 35   0.00006  taurus -> aries    TELNET R port=60321 73 775login:
 36   0.00045   aries -> taurus   TELNET C port=60321
 37   0.00015  taurus -> aries    TELNET R port=60321
 38   0.04937   aries -> taurus   TELNET C port=60321
 39   0.60130   aries -> taurus   TELNET C port=60321 j
 40   0.00021  taurus -> aries    TELNET R port=60321 j
 41   0.04841   aries -> taurus   TELNET C port=60321
 42   0.01563   aries -> taurus   TELNET C port=60321 o
 43   0.00019  taurus -> aries    TELNET R port=60321 o
 44   0.04418   aries -> taurus   TELNET C port=60321
 45   0.11145   aries -> taurus   TELNET C port=60321 h
 46   0.00013  taurus -> aries    TELNET R port=60321 h
 47   0.04839   aries -> taurus   TELNET C port=60321
 48   0.12778   aries -> taurus   TELNET C port=60321 n
 49   0.00012  taurus -> aries    TELNET R port=60321 n
 50   0.04208   aries -> taurus   TELNET C port=60321
 51   0.31836   aries -> taurus   TELNET C port=60321
 52   0.00021  taurus -> aries    TELNET R port=60321
 53   0.04148   aries -> taurus   TELNET C port=60321
 54   0.00004  taurus -> aries    TELNET R port=60321 Password:
 55   0.04991   aries -> taurus   TELNET C port=60321
 56   0.53745   aries -> taurus   TELNET C port=60321 j
 57   0.09865  taurus -> aries    TELNET R port=60321
 58   0.00022   aries -> taurus   TELNET C port=60321 o
 59   0.09976  taurus -> aries    TELNET R port=60321
 60   0.03078   aries -> taurus   TELNET C port=60321 h
 61   0.09923  taurus -> aries    TELNET R port=60321
 62   0.07719   aries -> taurus   TELNET C port=60321 n
 63   0.09280  taurus -> aries    TELNET R port=60321
 64   0.09994   aries -> taurus   TELNET C port=60321 1
 65   0.10005  taurus -> aries    TELNET R port=60321
 66   0.12767   aries -> taurus   TELNET C port=60321
 67   0.00041  taurus -> aries    TELNET R port=60321
 68   0.04594   aries -> taurus   TELNET C port=60321
 69   0.00007  taurus -> aries    TELNET R port=60321 Last login: Wed Jan 3 19:05:21 from aries
 70   0.04989   aries -> taurus   TELNET C port=60321
 71   0.00005  taurus -> aries    TELNET R port=60321 Sun Microsystems Inc
 72   0.04991   aries -> taurus   TELNET C port=60321
 73   0.01966  taurus -> aries    TELNET R port=60321 [ /export/home/john
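When checking whether a remote host is answering at all, it can help to tally the captured packets by direction. The Python sketch below works on saved summary lines in the format of Listing 13.7; it is an illustration only, and the names direction_counts and one_way_only are hypothetical.

```python
import re
from collections import Counter

# Packet number, time delta, source, destination, then the protocol summary.
PACKET = re.compile(r"^\s*(\d+)\s+([\d.]+)\s+(\S+?)\s*->\s*(\S+)\s+(.*)$")


def direction_counts(output):
    """Count captured packets for each (source, destination) pair."""
    counts = Counter()
    for line in output.splitlines():
        m = PACKET.match(line)
        if m:
            counts[(m.group(3), m.group(4))] += 1
    return counts


def one_way_only(counts, local, remote):
    """True if packets went from local to remote but none came back."""
    return counts[(local, remote)] > 0 and counts[(remote, local)] == 0
```

If one_way_only(counts, "aries", "taurus") returned True for a capture, requests would be leaving aries but no replies or acknowledgements would be arriving from taurus.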

lsof

All the commands mentioned previously in this section are bundled with the standard installation of the Solaris operating environment. This one, lsof, is not bundled, but it is freely available and can be downloaded from a number of sites, such as http://www.sunfreeware.com.

The lsof command displays the files that are opened by processes running on the system. It is extremely useful when trying to determine why a file system cannot be unmounted. The information provided is often sufficient for the administrator to identify the offending process and take the appropriate action.

Listing 13.8 shows the files held open by user john, who has simply logged on to the system.

Listing 13.8 The lsof Command Shows How Many Files a User Opens Merely by Logging On and Running a Single Shell
 aries# lsof -u john
 COMMAND  PID    USER  FD   TYPE  DEVICE  SIZE/OFF  NODE    NAME
 ksh      16282  john  cwd  VDIR  32,56   1024      10176   /export/home/john
 ksh      16282  john  txt  VREG  32,6    192764    36481   /usr/bin/ksh
 ksh      16282  john  txt  VREG  32,6    1115940   23161   /usr/lib/libc.so.1
 ksh      16282  john  txt  VREG  32,6    832236    22968   /usr/lib/libnsl.so.1
 ksh      16282  john  txt  VREG  32,6    17252     18373   /usr/platform/sun4u/lib/libc_psr.so.1
 ksh      16282  john  txt  VREG  32,6    19876     22890   /usr/lib/libmp.so.2
 ksh      16282  john  txt  VREG  32,6    56988     22908   /usr/lib/libsocket.so.1
 ksh      16282  john  txt  VREG  32,6    4600      23169   /usr/lib/libdl.so.1
 ksh      16282  john  txt  VREG  32,6    183060    22768   /usr/lib/ld.so.1
 ksh      16282  john  0u   VCHR  24,0    0t670181  135213  /devices/pseudo/pts@0:0->ttcompat->ldterm->ptem->pts
 ksh      16282  john  1u   VCHR  24,0    0t670181  135213  /devices/pseudo/pts@0:0->ttcompat->ldterm->ptem->pts
 ksh      16282  john  2u   VCHR  24,0    0t670181  135213  /devices/pseudo/pts@0:0->ttcompat->ldterm->ptem->pts
 ksh      16282  john  63u  VREG  32,56   4406      10192   /export/home/john/.sh_history
 aries#
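To find out which processes are preventing a file system from being unmounted, saved lsof output can be filtered by path. The following Python sketch is illustrative and not from the original text; the name holders is hypothetical, and it assumes that every row carries all nine columns shown in Listing 13.8, which is not guaranteed for every file type.

```python
def holders(lsof_output, mount_point):
    """Return sorted (command, pid) pairs with files open at or below mount_point."""
    prefix = mount_point.rstrip("/") + "/"
    found = set()
    for line in lsof_output.splitlines():
        # Split into at most nine fields so that NAME keeps any embedded spaces.
        fields = line.split(None, 8)
        if len(fields) < 9 or not fields[1].isdigit():
            continue  # header line, prompt, or a row with missing columns
        command, pid, name = fields[0], int(fields[1]), fields[8]
        if name == mount_point or name.startswith(prefix):
            found.add((command, pid))
    return sorted(found)
```

Applied to Listing 13.8 with a mount point of /export/home, this would report the ksh process 16282, because its current directory and its history file both lie under that path.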

   


Solaris System Management
Solaris System Management (New Riders Professional Library)
ISBN: 073571018X
EAN: 2147483647
Year: 2001
Pages: 101
Authors: John Philcox
