Troubleshooting Models and the Baseline | Troubleshooting Remote Access Networks (CCIE Professional Development)

The purpose of every troubleshooting model is to find a common denominator for all possible issues and to offer a common approach to solving them. The objective is to reduce a complex problem to a few points of failure, or even to one, and then to restore the functionality of the whole system. You are faced with the question: which is better, analysis or synthesis?

Troubleshooting Models

More than one Cisco troubleshooting model has been defined in books and at conferences. One such model is defined in the book Cisco Internetwork Troubleshooting (by Laura Chappell and Dan Farkas, Cisco Press, 1999), where an eight-step problem-solving model is recommended. Some of the commonly accepted troubleshooting steps and tips, recommended by the Cisco troubleshooting community, are available on Cisco.com. Some of the more relevant practices from these resources follow.

The following are basic troubleshooting steps:

1.	Identify how it should work.
2.	Review your topology.
3.	Identify the symptoms.
4.	Ask the right questions.
5.	Develop a plan of attack.
6.	Document your actions.

NOTE

In my troubleshooting experience, once you understand the design and configuration of the network, gathering the right information is the key to finding a resolution. Knowing the baseline performance of the network and asking the right questions is 50 percent of the solution.

The following are basic troubleshooting tips:

Don't panic.
Understand your network.
Develop network baselines.
Gather the right information from your users.
Work methodically and document all of your actions.
Learn how to effectively use tools.
Figure out what tools and what options work best for each problem.
Use access lists when enabling debug commands.

The sources share common ground: understanding the baseline and topology, gathering information, asking the right questions, learning the tools, implementing solutions, and documenting changes. These are the main steps of the troubleshooting process. Later in the discussion, you learn about some common and some Cisco-specific (Cisco IOS-based) troubleshooting tools, which are designed to ease the process.

IOS Commands

You should assume that all troubleshooting techniques described in this book are Cisco IOS-based. Regardless of your expertise in IOS, remember that IOS is constantly developing. IOS syntax offers many options and can sometimes be complex. You must continue to study IOS on a regular basis and keep apprised of changes. One of the best sources of information is Cisco IOS Releases: The Complete Reference by Mack Coulibaly (Cisco Press, 2000); another is the Cisco software center at www.cisco.com/public/sw-center/.

Short Format (Truncation) of IOS Commands

Typing the full version of commands is rare in Cisco-based troubleshooting. For example, most engineers won't type show running-config. Instead, they simply type show run or even sh run. Using such truncated commands saves time. Become familiar with the short forms of commands so that you can work more quickly and understand others when they use them. A review of IOS terminology and commands at www.cisco.com is highly recommended.
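IOS accepts a truncated keyword as long as the typed prefix identifies exactly one keyword at that position in the command. The following Python sketch illustrates that idea only. The keyword lists are a small hypothetical subset chosen for illustration, not the real IOS parser tables, and the function names are made up for this example.

```python
def expand(prefix, keywords):
    """Return the one keyword that prefix abbreviates, or raise ValueError."""
    if prefix in keywords:                  # a full keyword is never ambiguous
        return prefix
    matches = [k for k in keywords if k.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"invalid input: {prefix!r}")
    raise ValueError(f"ambiguous command: {prefix!r} could be {sorted(matches)}")


def expand_command(line, levels):
    """Expand each word of line against the keyword list for its position."""
    return " ".join(expand(word, kws) for word, kws in zip(line.split(), levels))


# Hypothetical, heavily reduced keyword sets (not the real IOS parser tables).
LEVELS = [
    ["show", "send", "set"],                    # candidates for the first word
    ["running-config", "route-map", "cdp"],     # candidates for the second word
]

print(expand_command("sh run", LEVELS))         # show running-config
```

With these lists, a lone s is rejected as ambiguous because it could mean show, send, or set, which is why sh is the shortest usable form in this sketch.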

Device Abbreviations Used in This Book

The device and technology abbreviations in this book are meant to be self-explanatory. For example, 5300-Dial means Cisco Access Server 5300 for dial, 804-isdn means Cisco 804 ISDN router for a remote user, and 1602-frame means Cisco 1602 router for Frame Relay service.

There is no universal troubleshooting tool. The choice of the right tool depends on your hands-on experience. Every tool provides information about one or more aspects of the network, and it is good practice to use several tools in combination during the troubleshooting process. Nevertheless, some tools for tasks such as latency measurement, performance measurement, or testing the reachability of a certain hop give widely comparable results. Some questions, such as how to measure the performance of the network, have more than one answer, so apply a comparative approach whenever possible. If you can use the same tool (for example, the ping utility), compare the results obtained from different technologies or segments of the network.

The Baseline

The baseline is the expected behavior of the system that you are about to troubleshoot. The baseline is an important factor, but it is hidden at the beginning of the troubleshooting process. In general, the baseline includes performance characteristics, the number and type of hops, and expected round-trip times (RTTs). It might include some yes-or-no questions, such as whether compression, multicast, Voice over IP (VoIP), quality of service (QoS), and so on are in use. The baseline should be technology and architecture based. You should set expectations before starting the troubleshooting process. One key factor is to have a map of the network topology. The following sections discuss aspects of the network baseline and tools that help you establish the baseline.

Network Topology

At a minimum, you need to have a documented network topology. For Internet connections, Internet service providers (ISPs) apply different methods to prevent end users from discovering the topology, on the assumption that this is one of the first steps in preventing denial of service (DoS) attacks from hackers.

One recommended way to discover the network topology is to run the Cisco Discovery Protocol (CDP). CDP is a Cisco proprietary protocol that runs at the data link layer (Layer 2) between directly connected devices (neighbors), including routers, bridges, switches, access servers, and virtually all Cisco IOS devices. CDP works not only on IOS-based devices, but also on non-IOS devices such as IP phones and Aironet access points (APs). CDP is enabled by default on all broadcast interfaces. It is useful for debugging connectivity issues and building topology maps.

The high-level data link control (HDLC) protocol type 0x2000 is assigned to CDP. It uses the Layer 2 multicast Media Access Control (MAC) address 01-00-0C-CC-CC-CC to send and receive periodic messages and to collect data. Using a multicast address keeps CDP traffic from being routed, because Cisco routers and some switches do not forward this traffic unless specifically configured. If any information has changed since the last packet was received, the new information is cached, regardless of the Time To Live (TTL) value. In this way, CDP provides quick state discovery. The recommended commands for topology discovery are as follows:
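On Ethernet, the addressing described here can be checked directly in a captured frame: the destination is the six-octet multicast address 01-00-0C-CC-CC-CC, and the payload starts with an LLC/SNAP header (AA-AA-03), the Cisco OUI (00-00-0C), and protocol type 0x2000. The following Python sketch tests a raw frame for that pattern. The function name and the sample frame are illustrative assumptions, and the sketch ignores VLAN-tagged frames.

```python
import struct

CDP_MAC = bytes.fromhex("01000ccccccc")                    # multicast destination
SNAP_CISCO_CDP = bytes.fromhex("aaaa03" "00000c" "2000")   # LLC + Cisco OUI + type


def is_cdp_frame(frame: bytes) -> bool:
    """Return True if an untagged 802.3 frame carries CDP (SNAP type 0x2000)."""
    if len(frame) < 22:                    # 14-byte MAC header + 8-byte LLC/SNAP
        return False
    dst = frame[0:6]
    (length,) = struct.unpack("!H", frame[12:14])
    # A value of 1500 or less is an 802.3 length field, not an EtherType.
    return dst == CDP_MAC and length <= 1500 and frame[14:22] == SNAP_CISCO_CDP


# Made-up frame: CDP destination, arbitrary source MAC, length, SNAP, 4 payload bytes.
payload = b"\x02\xb4\x00\x00"
frame = (CDP_MAC + bytes.fromhex("00112233aabb")
         + struct.pack("!H", len(SNAP_CISCO_CDP) + len(payload))
         + SNAP_CISCO_CDP + payload)
print(is_cdp_frame(frame))
```

Checking the protocol type as well as the address matters because other Cisco Layer 2 protocols share the same multicast address and differ only in the SNAP type.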

 Router# show cdp neighbors
 Router# show cdp neighbors detail

The output from these commands returns specific details including device type, IP address, active interface (Ethernet, serial), port type, and port number. The first command provides short output, and the second command provides detailed information about adjacent topology specifics, as shown in Example 4-1.

Example 4-1. The show cdp neighbors detail Command Output

 Router# show cdp neighbors detail
 <output omitted>
 Device ID: access-gateway.cisco.com
 Entry address(es):
   IP address: 10.10.0.7
 Platform: cisco Catalyst 6000,  Capabilities: Router Switch IGMP
 Interface: FastEthernet0/0,  Port ID (outgoing port): FastEthernet3/21
 Holdtime : 136 sec
 Version :
 Cisco Internetwork Operating System Software
 IOS (tm) c6sup1_rp Software (c6sup1_rp-JK2SV-M), Version 12.1(8b)E9,
 Copyright (c) 1986-2002 by cisco Systems, Inc.
 Compiled Sun 17-Feb-02 11:22 by erlang
 advertisement version: 2
 VTP Management Domain: ''
 Native VLAN: 10
 Duplex: full
 <output omitted>
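When you collect this output from many devices, it helps to reduce each entry to the few fields a topology map needs. The following Python sketch pulls those fields out of one entry with regular expressions. The sample text is taken from Example 4-1; the field names and the function name are assumptions made for this illustration, and the patterns are tuned only to the layout shown there.

```python
import re

# One neighbor entry, copied from the fields shown in Example 4-1.
SAMPLE = """\
Device ID: access-gateway.cisco.com
Entry address(es):
  IP address: 10.10.0.7
Platform: cisco Catalyst 6000,  Capabilities: Router Switch IGMP
Interface: FastEthernet0/0,  Port ID (outgoing port): FastEthernet3/21
Holdtime : 136 sec
"""

# Field name (our choice) -> pattern that captures its value.
FIELDS = {
    "device_id": r"Device ID:\s*(\S+)",
    "ip_address": r"IP address:\s*(\S+)",
    "platform": r"Platform:\s*([^,]+),",
    "local_interface": r"Interface:\s*([^,\s]+),",
    "remote_port": r"Port ID \(outgoing port\):\s*(\S+)",
    "holdtime_sec": r"Holdtime\s*:\s*(\d+)",
}


def parse_cdp_entry(text):
    """Return a dict of the fields found in one 'show cdp neighbors detail' entry."""
    entry = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, text)
        if match:
            entry[name] = match.group(1).strip()
    return entry


for key, value in parse_cdp_entry(SAMPLE).items():
    print(f"{key}: {value}")
```

Fields that are missing from an entry are simply left out of the result, so the sketch tolerates partial output.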

Because CDP output reveals extensive information, CDP should be disabled on interfaces that you do not control or that are user-owned or user-requested, such as dialer interfaces. To disable CDP on a dialer interface, configure the following command:

 Router(config-if)#  no cdp enable

Another source of topology information can be the routing protocol. All routing protocols maintain information about their neighbors, and link-based routing protocols, such as Open Shortest Path First (OSPF) and Enhanced Interior Gateway Routing Protocol (EIGRP), also maintain topology information. As long as all devices run a routing protocol (instead of static routing), the related commands can be informative. Neighbor information is usually the more concise of the two. The following command is an example:

 Router#  show ip eigrp neighbors detail

The topology output is usually detailed and relatively complex. It is not discussed in this book. It is up to you to know what type of information is required and what topology command to use. ^[1]

It is common for a consulting engineer to start troubleshooting in a new environment where the topology of the network is not available. Tools used in combination, such as the telnet, ping, cdp, and traceroute commands, produce information that can create a network topology before starting the troubleshooting process.
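Once neighbor records have been gathered device by device, assembling them into a map is mostly bookkeeping. The following Python sketch builds a simple adjacency table from such records. The first record reflects Example 4-1, with Router standing in for the local device; the second record and all function and variable names are hypothetical.

```python
from collections import defaultdict


def build_topology(links):
    """Map each device to a sorted list of (local_if, neighbor, neighbor_if)."""
    topo = defaultdict(set)
    for dev, dev_if, nbr, nbr_if in links:
        topo[dev].add((dev_if, nbr, nbr_if))
        topo[nbr].add((nbr_if, dev, dev_if))   # the same link seen from the far end
    return {dev: sorted(adj) for dev, adj in topo.items()}


LINKS = [
    # (local device, local interface, neighbor, neighbor port)
    ("Router", "FastEthernet0/0", "access-gateway.cisco.com", "FastEthernet3/21"),
    ("Router", "Serial0/0", "1602-frame", "Serial0"),   # hypothetical record
]

for device, adjacency in sorted(build_topology(LINKS).items()):
    print(device, adjacency)
```

Using a set per device means that a link reported by both of its endpoints is stored only once.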

Overall, the collected baseline information provides you not only with topology information, but also with IOS versions, types of devices, IP addressing conventions, and both routed and routing protocols.

Performance Characteristics and Path Characteristics

The baseline is related to the expected behavior of the system, including performance characteristics, latency, and expected RTTs. A useful and multioptional tool for establishing baseline performance is the path characteristics (PathChar or Pchar) utility.

Pchar is designed to baseline the network performance. The utility enables you to stress test each hop and provides the ability to collect detailed information. It is based on the path characterization model of Van Jacobson. Pchar measures network performance on a per-hop and a total path basis. It supports IPv4 and IPv6, and it is useful in isolating performance problems. Some of the drawbacks are that the latency figures might not be accurate when measuring application performance, and Internet Control Message Protocol (ICMP) messages can be filtered or respond differently (typical in the case of satellite-based connections).
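The core idea behind this kind of per-hop measurement is that the RTT to a hop grows roughly linearly with probe size, so a straight-line fit yields a fixed delay (the intercept) and a per-byte cost (the slope). The following Python sketch shows an ordinary least-squares fit of that form. It is a minimal illustration on synthetic, noise-free data, not Pchar's own code, and the function name and sample values are assumptions.

```python
def fit_line(sizes, rtts):
    """Ordinary least-squares fit of rtt = a + b * size; returns (a, b)."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(rtts) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, rtts))
    b = sxy / sxx                  # slope: extra milliseconds per extra byte
    a = mean_y - b * mean_x        # intercept: delay of a zero-length probe
    return a, b


# Synthetic probes: sizes 32, 64, ... up to 1500 bytes, with an assumed
# 24 ms fixed delay plus 0.0728 ms per byte and no measurement noise.
sizes = list(range(32, 1501, 32))
rtts = [24.0 + 0.0728 * s for s in sizes]

a, b = fit_line(sizes, rtts)
print(f"fixed delay ~ {a:.3f} ms, slope ~ {b:.4f} ms/B")
```

Real measurements are noisy, which is one reason a tool of this kind repeats each probe size several times per hop before fitting.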

The utility runs on UNIX, Linux, and Solaris platforms. The options are listed in Table 4-1. The following shows the usage of Pchar:

 UNIX-host:/users/pnedeltc> ./pchar
 Usage: ./pchar [-a analysis] [-b burst] [-c] [-d debuglevel] [-g gap]
        [-G gaptype] [-h] [-H hops] [-I increment] [-m mtu] [-n]
        [-p protocol] [-P port] [-q] [-R reps] [-s hop] [-t timeout]
        [-T tos] [-v] [-V] [-w file] [-r file | host]

Table 4-1. Path Characteristics (Pchar) Options

Pchar Option	Description
-a analysis	Set analysis type (default is lsq): lsq (least sum of squares linear fit), kendall (linear fit using Kendall's test statistic), lms (least median of squares linear fit), lmsint (least median of squares linear fit, integer computations)
-b burst	Burst size (default is 1)
-c	Ignore route changes (useful for load-balancing situations)
-d debuglevel	Set debugging output level
-g gap	Inter-test gap in seconds (default is 0.25)
-G gaptype	Inter-test gap type (default is fixed): fixed (fixed gap), exp (exponentially distributed random)
-H hops	Maximum number of hops (default is 30)
-h	Print this help information
-I increment	Packet size increment (default is 32)
-l host	Set origin address of probes (defaults to hostname)
-m mtu	Maximum transmission unit (MTU): maximum packet size to check (default is 1500)
-M mode	Operational mode (defaults to pchar): pchar (path characteristics), trout (tiny traceroute)
-n	Don't resolve addresses to hostnames
-p protocol	Network protocol that Pchar uses (default is ipv4udp): ipv4udp, ipv4raw, ipv4icmp, ipv4tcp, ipv6icmp, ipv6udp
-P port	Starting port number (default is 32768)
-q	Quiet output
-r file	Read data from a file (- for stdin)
-R reps	Repetitions per hop (default is 32)
-s hop	Starting hop number (default is 1)
-S	Do Simple Network Management Protocol (SNMP) queries at each hop to determine the next-hop interface characteristics
-t timeout	ICMP timeout in seconds (default is 3)
-T tos	Set IP type of service (ToS) field (default is )
-v	Verbose output
-V	Print version information
-w file	Write data to file (- for stdout)
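If you run the same baseline test against many hosts, it is convenient to assemble the command line in one place. The following Python sketch builds an argument list from a handful of the options in Table 4-1 and reproduces the invocation used in Example 4-2. The function and parameter names are hypothetical, only the flags listed in the table are used, and nothing is executed.

```python
# Protocol names as listed for the -p option in Table 4-1.
PROTOCOLS = ("ipv4udp", "ipv4raw", "ipv4icmp", "ipv4tcp", "ipv6icmp", "ipv6udp")


def pchar_args(host, protocol=None, reps=None, start_hop=None,
               verbose=False, numeric=False, binary="./pchar"):
    """Build an argument list for pchar; options left unset keep their defaults."""
    if protocol is not None and protocol not in PROTOCOLS:
        raise ValueError(f"unknown protocol: {protocol!r}")
    args = [binary]
    if protocol is not None:
        args += ["-p", protocol]
    if verbose:
        args.append("-v")
    if numeric:
        args.append("-n")              # do not resolve addresses to hostnames
    if reps is not None:
        args += ["-R", str(reps)]      # repetitions per hop
    if start_hop is not None:
        args += ["-s", str(start_hop)]
    args.append(host)
    return args


cmd = pchar_args("10.70.209.86", protocol="ipv4icmp", reps=3,
                 start_hop=8, verbose=True)
print(" ".join(cmd))   # ./pchar -p ipv4icmp -v -R 3 -s 8 10.70.209.86
```

The resulting list can be passed to subprocess.run on a host where the utility is installed.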

Table 4-2 shows some performance measurements from Cisco users of different technologies in the U.S. that connect to the corporate network.

Table 4-2. Pchar and a Baseline Measurement

Technology	Speed	RTT	Path Bottleneck
Dial	56 kbps	125.6 ms	31 kbps
ISDN	128 kbps	22.8 ms	118 kbps
Frame (FT1)	128 kbps	13.8 ms	118 kbps
Frame (56K)	56 kbps	61.2 ms	52.8 kbps
Asymmetric DSL (ADSL)	~600 kbps down, ~250 kbps up	11.6 ms	568.5 kbps
StarBand Satellite	~600 kbps down, ~128 kbps up	813.5 ms	42 kbps
Cable Modem	10 Mbps	13.6 ms	7912 kbps
Sprint BroadBand	~500 kbps down, ~128 kbps up	76.6 ms	720 kbps
Ethernet	10 Mbps	1.2 ms	~
Fiber Distributed Data Interface (FDDI)	100 Mbps	120 us	~
T1	1.54 Mbps	4.5 ms	1509 kbps
T3	45 Mbps	267 us	~
OC-3	150 Mbps	80 us	~
OC-12	622 Mbps	19 us	~
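One way to apply a comparative approach to Table 4-2 is to relate each measured path bottleneck to the nominal speed of the technology. The following Python sketch does this for the rows that list a single nominal speed and a numeric bottleneck. The ratio is only an illustration introduced here, not a metric defined by Pchar, and the T1 row uses the 1.54 Mbps figure exactly as printed in the table.

```python
# (technology, nominal speed in kbps, measured path bottleneck in kbps), Table 4-2
ROWS = [
    ("Dial", 56, 31),
    ("ISDN", 128, 118),
    ("Frame (FT1)", 128, 118),
    ("Frame (56K)", 56, 52.8),
    ("Cable Modem", 10_000, 7912),
    ("T1", 1540, 1509),
]


def efficiency(nominal_kbps, bottleneck_kbps):
    """Fraction of the nominal speed that the measured bottleneck reaches."""
    return bottleneck_kbps / nominal_kbps


for tech, nominal, measured in ROWS:
    print(f"{tech:12s} {efficiency(nominal, measured):6.1%}")
```

A ratio well below the others, as for the dial row, is a hint that a technology deserves a closer look before you treat its figure as the baseline.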

Usually, but not always, the bottleneck is at the last hop in the test. In the following example (see Figure 4-3 and Example 4-2), a Cisco 350 AP is connected to an 804 ISDN router, which in turn calls a 7206VXR core router. A Microsoft Windows 2000 laptop (IP: 10.70.209.86) is associated with the AP (IP: 10.70.209.82), and the speed between them is 11 Mbps.

Example 4-2. Pchar Discovers a Bottleneck

 UNIX-host:/users/pnedeltc> ./pchar -p ipv4icmp -v -R 3 -s 8 10.70.209.86
 ! -p protocol ipv4icmp; -v verbose; -R 3 repetitions per hop;
 ! -s starting from hop 8
 pchar to 10.70.209.86 (10.70.209.86) using ICMP/IPv4 (raw sockets)
 Using raw socket input
 Packet size increments from 32 to 1500 by 32
 46 test(s) per repetition
 3 repetition(s) per hop
  7: 10.71.86.86 (UNIX-Host)
     Partial loss:      0 / 138 (0%)
     Partial char:      rtt = 1.359319 ms, (b = -0.000003 ms/B), r2 = 0.000210
                        stddev rtt = 0.021433, stddev b = 0.000027
     Partial queueing:  avg = 0.000052 ms (0 bytes)
  8: 10.70.192.243 (Cisco-isdn.cisco.com)
     Partial loss:      0 / 138 (0%)
     Partial char:      rtt = 25.581711 ms, (b = 0.072820 ms/B), r2 = 0.999766
                        stddev rtt = 0.135654, stddev b = 0.000168
     Partial queueing:  avg = 0.029341 ms (402 bytes)
     Hop char:          rtt = 24.222391 ms, bw = 109.855819 Kbps
     Hop queueing:      avg = 0.029288 ms (402 bytes)
  9: 10.70.209.81 (pnedeltc-isdn.cisco.com)
 ! 6% loss:
     Partial loss:      9 / 138 (6%)
     Partial char:      rtt = 25.566884 ms, (b = 0.075063 ms/B), r2 = 0.999448
                        stddev rtt = 0.387928, stddev b = 0.000276
     Partial queueing:  avg = 0.029208 ms (402 bytes)
     Hop char:          rtt = --.--- ms, bw = 3566.545092 Kbps
     Hop queueing:      avg = -0.000133 ms (0 bytes)
 ! The connection between 10.70.209.81 and 10.70.209.86 is wireless 11 Mbps.
 10: 10.70.209.86 (pnedeltc-isdn5.cisco.com)
     Path length:       10 hops
     Path char:         rtt = 25.566884 ms r2 = 0.999448
 ! This bottleneck matches the bandwidth of
 ! the core router 7206-isdn (see step 8):
     Path bottleneck:   109.855819 Kbps
     Path pipe:         351 bytes
     Path queueing:     average = 0.029208 ms (402 bytes)
     Start time:        Fri Dec 21 07:48:33 2001
     End time:          Fri Dec 21 07:51:43 2001
 executor:/users/pnedeltc>

Figure 4-3. Functional Model for Bottleneck Discovery

Pchar is run from 10.71.86.86 toward 10.70.209.86 with the options -p ipv4icmp (protocol), -v (verbose), -R 3 (three repetitions per hop), and -s 8 (start from hop 8). Because of these options, the utility performs an initial reachability test and assigns hop 7 to the source (IP: 10.71.86.86), hop 8 to the core ISDN router (IP: 10.70.192.243), and hop 9 to the 804 ISDN router (IP: 10.70.209.81), and then reaches the final hop, the Cisco AP 350 (IP: 10.70.209.86). The interesting spots are the level of loss, about 6 percent at hop 9, and the comparison between hop 8 and the final report, which shows that the bottleneck is at hop 8 (see the message bw = 109.855819 Kbps).
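The figures in Example 4-2 can be cross-checked by hand. The per-hop bandwidth appears to follow from the change in the slope b between consecutive hops (8 bits divided by the extra milliseconds per byte), and the path pipe appears to be the bottleneck bandwidth multiplied by the path RTT. The following Python sketch reproduces both from the printed values. These relations are inferred from the numbers in the example rather than taken from the Pchar source, and the function names are made up for this illustration.

```python
def hop_bandwidth_kbps(b_prev, b_curr):
    """Hop bandwidth from the slopes (ms per byte) of two consecutive hops."""
    delta = b_curr - b_prev        # extra ms needed per extra byte on this hop
    return 8.0 / delta             # bits per millisecond equals kbps


def pipe_bytes(bottleneck_kbps, rtt_ms):
    """Bandwidth-delay product of the path, in bytes."""
    return bottleneck_kbps * rtt_ms / 8.0


# Slopes b as printed in Example 4-2 for hops 7, 8, and 9.
b7, b8, b9 = -0.000003, 0.072820, 0.075063

print(f"hop 8: {hop_bandwidth_kbps(b7, b8):.2f} Kbps")
print(f"hop 9: {hop_bandwidth_kbps(b8, b9):.0f} Kbps")
print(f"pipe:  {pipe_bytes(109.855819, 25.566884):.0f} bytes")
```

The results, roughly 109.86 Kbps for hop 8, about 3567 Kbps for hop 9, and about 351 bytes for the pipe, agree with the reported values to within the rounding of the printed slopes.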

NOTE

In Example 4-2, you can see stddev rtt = 0.387928. This measurement is the standard deviation of the RTT, which is the most commonly used measure of spread. It is computed from the mean and the variance of the samples. If the RTT samples are 1, 2, and 3, the mean is 2 and the variance is $s = [(1 - 2)^2 + (2 - 2)^2 + (3 - 2)^2] / 3 \approx 0.667$. The standard deviation is the square root of the variance, which in this case is $s^{1/2} \approx 0.816$.
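The same arithmetic can be checked with the Python standard library. Because the note divides by the number of samples (3), the matching functions are the population variance and population standard deviation; the sample values are the ones used in the note.

```python
import statistics

rtts = [1, 2, 3]                            # RTT samples from the note

mean = statistics.mean(rtts)
variance = statistics.pvariance(rtts)       # population variance, divides by n
stddev = statistics.pstdev(rtts)            # square root of the variance

print(f"mean = {mean}, variance = {variance:.3f}, stddev = {stddev:.3f}")
```

The sample variance (statistics.variance), which divides by n - 1 instead, would give 1.0 for the same data, so it is worth knowing which convention a tool uses before comparing figures.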

The Pchar utility has two limitations: it has difficulties when a hop is of a multipath type and when there are transparent hops on the path because of bridges or ATM links.