The purpose of every troubleshooting model is to find a common denominator for all possible issues and to offer a common problem-solving approach. The objective is to reduce a complex problem to a few points of failure, or even a single one, and then to restore the functionality of the whole system. You are faced with the question: which is better, analysis or synthesis?

Troubleshooting Models

There is more than one Cisco troubleshooting model, and models are defined in numerous books and conferences. One such model is defined in the book Cisco Internetwork Troubleshooting (by Laura Chappell and Dan Farkas, Cisco Press, 1999), where an eight-step problem-solving model is recommended. Some of the commonly accepted troubleshooting steps and tips, recommended by the Cisco troubleshooting society, are available on Cisco.com. Some of the more relevant practices from these resources follow. The following are basic troubleshooting steps:
NOTE
In my experience, if you understand the design and configuration of the network, gathering the right information is the key to finding a resolution. Knowing the baseline performance of the network and asking the right questions is 50 percent of the solution.

The following are basic troubleshooting tips:
Based on these sources, many models find common ground in the actions of understanding the baseline and topology, gathering information, asking the right questions, learning tools, implementing solutions, and documenting changes. These are the main steps of the troubleshooting process. Later in the discussion, you learn about some common and some Cisco-specific (Cisco IOS-based) troubleshooting tools, which are designed to ease the process.
There is no universal troubleshooting tool. The choice of the right tool depends on your hands-on experience. Every tool provides information about one or more aspects of the network, and it is good practice to use several tools, or a combination of tools, in the troubleshooting process. Nevertheless, some tools that address issues such as latency measurement, performance measurement, or reachability of a certain hop can provide more generally applicable results. Some issues, such as how to measure the performance of the network, have more than one answer, and it is recommended to apply a comparative approach whenever possible. If it is possible to use the same tool (for example, the ping utility), compare the results obtained from different technologies or segments of the network.

The Baseline

The baseline is the expected behavior of the system that you are about to troubleshoot. The baseline is an important factor, but it is often hidden at the beginning of the troubleshooting process. In general, the baseline includes performance characteristics, number and type of hops, and expected round-trip times (RTTs). It might include some yes-or-no questions about features such as compression, multicast, Voice over IP (VoIP), quality of service (QoS), and so on. The baseline should be specific to the technology and architecture in use. You should set expectations before starting the troubleshooting process. One key factor is to have a map of the network topology. The following sections discuss aspects of the network baseline and tools that help you establish the baseline.

Network Topology

At a minimum, you need to have a documented network topology. For Internet connections, Internet service providers (ISPs) apply different methods to prevent end users from discovering the topology, assuming that this is one of the first steps in preventing denial of service (DoS) attacks from hackers. One recommended way to discover the network topology is to run the Cisco Discovery Protocol (CDP).
CDP is a Cisco proprietary protocol that runs on the data link layer (Layer 2) between directly connected, adjacent devices (neighbors), including routers, bridges, switches, access servers, and virtually all Cisco IOS devices. CDP works not only on IOS-based devices but also on non-IOS devices such as IP phones and Aironet access points (APs). CDP is enabled by default on all broadcast interfaces. It is useful for debugging connectivity issues and building topology maps. The high-level data link control (HDLC) protocol type 0x2000 is assigned to CDP. It uses the Layer 2 multicast Media Access Control (MAC) address 01-00-0C-CC-CC-CC to send and receive periodic messages and to collect data. Using a multicast address keeps CDP packets from being routed, because Cisco routers and some switches do not forward this traffic unless specifically configured to do so. If any information has changed since the last packet was received, the new information is cached, regardless of the Time To Live (TTL) value. In this way, CDP provides quick state discovery. The recommended commands for topology discovery are as follows:

Router# show cdp neighbors
Router# show cdp neighbors detail

The output from these commands returns specific details, including device type, IP address, active interface (Ethernet, serial), port type, and port number. The first command provides short output, and the second command provides detailed information about adjacent topology specifics, as shown in Example 4-1.

Example 4-1. The show cdp neighbors detail Command Output

Router# show cdp neighbors detail
<output omitted>
Device ID: access-gateway.cisco.com
Entry address(es):
  IP address: 10.10.0.7
Platform: cisco Catalyst 6000, Capabilities: Router Switch IGMP
Interface: FastEthernet0/0, Port ID (outgoing port): FastEthernet3/21
Holdtime : 136 sec
Version :
Cisco Internetwork Operating System Software
IOS (tm) c6sup1_rp Software (c6sup1_rp-JK2SV-M), Version 12.1(8b)E9,
Copyright 1986-2002 by cisco Systems, Inc.
Compiled Sun 17-Feb-02 11:22 by erlang
advertisement version: 2
VTP Management Domain: ''
Native VLAN: 10
Duplex: full
<output omitted>

Because CDP output exposes extensive information, CDP should be disabled on interfaces that you cannot control or that are user-owned or user-requested, such as dialer interfaces. To disable CDP on a dialer interface, configure the following command:

Router(config-if)# no cdp enable

Another source of topology information can be the routing protocol. All routing protocols provide information about their neighbors, and protocols that maintain a topology database or table, such as Open Shortest Path First (OSPF) and Enhanced Interior Gateway Routing Protocol (EIGRP), also provide topology information. As long as all devices run a routing protocol (instead of static routing), these commands can be informative. Neighbor information is usually the more concise of the two. The following command is an example:

Router# show ip eigrp neighbors detail

The topology output is usually detailed and relatively complex. It is not discussed in this book. It is up to you to know what type of information is required and what topology command to use. [1]

It is common for a consulting engineer to start troubleshooting in a new environment where the topology of the network is not available. Tools used in combination, such as the telnet, ping, cdp, and traceroute commands, produce information from which you can create a network topology before starting the troubleshooting process. Overall, the collected baseline information provides you not only with topology information, but also with IOS versions, types of devices, IP addressing conventions, and both routed and routing protocols.

Performance Characteristics and Path Characteristics

The baseline is related to the expected behavior of the system, including performance characteristics, latency, and expected RTTs. A useful tool with many options for establishing baseline performance is the path characteristics (PathChar or Pchar) utility.
Pchar is designed to baseline network performance. The utility enables you to stress-test each hop and to collect detailed information. It is based on the path characterization model of Van Jacobson. Pchar measures network performance on a per-hop and a total-path basis. It supports IPv4 and IPv6, and it is useful in isolating performance problems. Some of the drawbacks are that the latency figures might not be accurate when measuring application performance, and that Internet Control Message Protocol (ICMP) messages can be filtered or answered differently along the path (typical in the case of satellite-based connections). The utility runs on UNIX, Linux, and Solaris platforms. The options are listed in Table 4-1. The following shows the usage of Pchar:

UNIX-host:/users/pnedeltc> ./pchar
Usage: ./pchar [-a analysis] [-b burst] [-c] [-d debuglevel] [-g gap]
       [-G gaptype] [-h] [-H hops] [-I increment] [-m mtu] [-n]
       [-p protocol] [-P port] [-q] [-R reps] [-s hop] [-t timeout]
       [-T tos] [-v] [-V] [-w file] -r file host]

Table 4-1. Path Characteristics (Pchar) Options
Table 4-2 shows some performance measurements from Cisco users in the U.S. who connect to the corporate network over different technologies.

Table 4-2. Pchar and a Baseline Measurement
Usually, the bottleneck is at the last hop in the test, but not always. In the following examples (see Figure 4-3 and Example 4-2), the Cisco 350 AP is connected to an 804 ISDN router, which in turn is calling a 7206VXR core router. A Microsoft Windows 2000 laptop (IP: 161.70.209.86) is associated with the AP (IP: 161.70.209.82), and the speed between them is 11 Mbps.

Example 4-2. Pchar Discovers a Bottleneck

UNIX-host:/users/pnedeltc> ./pchar -p ipv4icmp -v -R 3 -s 8 10.70.209.86
! -p protocol ipv4icmp; -v verbose; -R 3 repetitions per hop; -s starting from hop 8
pchar to 10.70.209.86 (10.70.209.86) using ICMP/IPv4 (raw sockets)
Using raw socket input
Packet size increments from 32 to 1500 by 32
46 test(s) per repetition
3 repetition(s) per hop
 7: 10.71.86.86 (UNIX-Host)
    Partial loss:      0 / 138 (0%)
    Partial char:      rtt = 1.359319 ms, (b = -0.000003 ms/B), r2 = 0.000210
                       stddev rtt = 0.021433, stddev b = 0.000027
    Partial queueing:  avg = 0.000052 ms (0 bytes)
 8: 10.70.192.243 (Cisco-isdn.cisco.com)
    Partial loss:      0 / 138 (0%)
    Partial char:      rtt = 25.581711 ms, (b = 0.072820 ms/B), r2 = 0.999766
                       stddev rtt = 0.135654, stddev b = 0.000168
    Partial queueing:  avg = 0.029341 ms (402 bytes)
    Hop char:          rtt = 24.222391 ms, bw = 109.855819 Kbps
    Hop queueing:      avg = 0.029288 ms (402 bytes)
 9: 10.70.209.81 (pnedeltc-isdn.cisco.com)
! 6% loss:
    Partial loss:      9 / 138 (6%)
    Partial char:      rtt = 25.566884 ms, (b = 0.075063 ms/B), r2 = 0.999448
                       stddev rtt = 0.387928, stddev b = 0.000276
    Partial queueing:  avg = 0.029208 ms (402 bytes)
    Hop char:          rtt = --.--- ms, bw = 3566.545092 Kbps
    Hop queueing:      avg = -0.000133 ms (0 bytes)
! The connection between 10.70.209.81 and 10.70.209.86 is wireless 11 Mbps.
10: 10.70.209.86 (pnedeltc-isdn5.cisco.com)
    Path length:       10 hops
    Path char:         rtt = 25.566884 ms r2 = 0.999448
! This bottleneck matches the bandwidth of
! the core router 7206-isdn (see step 8):
    Path bottleneck:   109.855819 Kbps
    Path pipe:         351 bytes
    Path queueing:     average = 0.029208 ms (402 bytes)
    Start time:        Fri Dec 21 07:48:33 2001
    End time:          Fri Dec 21 07:51:43 2001
executor:/users/pnedeltc>

Figure 4-3. Functional Model for Bottleneck Discovery
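The bw values in Example 4-2 can be cross-checked against the per-byte slopes (b) that Pchar reports for each partial path. The following Python sketch is not part of Pchar. It assumes the usual pathchar-style relationship, in which the bandwidth of a hop is the inverse of the increase in slope between two consecutive hops, and the function name hop_bandwidth_kbps is hypothetical. The slope values are copied from the example output.

```python
def hop_bandwidth_kbps(prev_slope_ms_per_byte, curr_slope_ms_per_byte):
    """Estimate hop bandwidth from two consecutive partial-path slopes.

    Assumption (not stated in the text): the extra per-byte delay added
    by a hop is its serialization time, so bandwidth = 8 bits / delta.
    Returns None when the slope does not increase (no usable estimate).
    """
    delta_ms_per_byte = curr_slope_ms_per_byte - prev_slope_ms_per_byte
    if delta_ms_per_byte <= 0:
        return None
    # 8 bits per byte; bits per millisecond is numerically equal to Kbps.
    return 8.0 / delta_ms_per_byte


if __name__ == "__main__":
    # Slopes (b, in ms/B) reported for hops 7, 8, and 9 in Example 4-2.
    slopes = {7: -0.000003, 8: 0.072820, 9: 0.075063}

    estimates = {
        hop: hop_bandwidth_kbps(slopes[hop - 1], slopes[hop])
        for hop in (8, 9)
    }
    for hop, kbps in estimates.items():
        print(f"hop {hop}: about {kbps:.2f} Kbps")

    # The path bottleneck is the slowest hop on the path.
    bottleneck_hop = min(estimates, key=estimates.get)
    print(f"bottleneck at hop {bottleneck_hop}")
```

Under this assumption, the estimate for hop 8 lands within a fraction of a Kbps of the reported 109.855819 Kbps, and hop 9 lands close to the reported 3566.545092 Kbps. The small differences come from the rounding of b in the printed output.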
Pchar is run from 10.71.86.86 toward 10.70.209.86 with the options -p protocol ipv4icmp; -v verbose; -R 3 repetitions per hop; -s starting from hop 8. Because of the configuration options, the utility performs an initial reachability test and assigns hop 7 to the source (IP: 10.71.86.86), hop 8 to the core ISDN router (IP: 10.70.192.243), and hop 9 to the 804 ISDN router (IP: 10.70.209.81), and then reaches the final hop, the Cisco AP 350 (IP: 10.70.209.86). The interesting spots are the level of loss (about 6 percent in hop 9) and the comparison between hop 8 and the last report, where the bottleneck is discovered to be at hop 8 (see the message bw = 109.855819 Kbps).

NOTE
In Example 4-2, you can see stddev rtt = 0.387928. This measurement is the standard deviation of the RTT. The standard deviation is the most commonly used measure of spread. The calculation uses the mean and the variance. If the RTT values are 1, 2, and 3, the mean is 2 and the variance is s = [(1 - 2)^2 + (2 - 2)^2 + (3 - 2)^2] / 3 ≈ 0.667. The standard deviation is the square root of the variance, which in this case is s^(1/2) ≈ 0.816.

The Pchar utility has two limitations: the utility has difficulties when the hop is a multipath type, and when there are transparent hops on the path because of bridges or ATM links.
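The arithmetic in the note about the standard deviation can be verified with a few lines of Python. This is only a minimal sketch of the population variance and standard deviation for the sample RTT values 1, 2, and 3 used in the note. The function names are illustrative, and nothing here reflects how Pchar computes its own stddev figures internally.

```python
import math


def population_variance(samples):
    """Mean of the squared deviations from the mean (divide by N)."""
    mean = sum(samples) / len(samples)
    return sum((x - mean) ** 2 for x in samples) / len(samples)


def population_stddev(samples):
    """Square root of the population variance."""
    return math.sqrt(population_variance(samples))


if __name__ == "__main__":
    rtts_ms = [1.0, 2.0, 3.0]  # sample RTT values from the note
    print(round(population_variance(rtts_ms), 3))  # 0.667
    print(round(population_stddev(rtts_ms), 3))    # 0.816
```

The divisor is the number of samples (3), which matches the calculation in the note. A sample standard deviation, which divides by N - 1, would give a different value.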