This scenario represents one of the most common and, at the same time, most difficult issues to pinpoint: performance expectations are not met, or the service works well for a period of time but then exchanges no data whatsoever for short periods. Some of the most common reasons for these problems are flapping lines and traffic shaping issues.

Flapping Lines

Problems that result from flapping lines are linked to the history of the PVC, as reported by the Frame Relay commands. As previously discussed, the Frame Relay router requests the status of all PVCs on the interface during the periodic polling cycles of LMI, typically every sixth polling cycle. The resulting full-status message response contains information on every PVC that is configured on that physical interface. The information includes the recent history of the PVC and its availability (inactive or active).

The term flapping lines refers to the situation when the service continually changes its state between active and inactive; that is, it is flapping. The user tries to exchange data, but for a period of time the service is unavailable: the user cannot ping, telnet, or reach the other party's router or any other IP address. After a while, the service comes back up, and the cycle repeats. The typical reports that indicate a flapping line are shown in Example 18-23.

Example 18-23.
The History of the DLCI, Including the Relative Time When the Service Was Created and the Last Time the PVC Status Was Changed

1602-frame#show frame-relay pvc

PVC Statistics for interface Serial0 (Frame Relay DTE)

              Active     Inactive      Deleted       Static
  Local          1            0            0            0
  Switched       0            0            0            0
  Unused         0            0            0            0

DLCI = 74, DLCI USAGE = LOCAL, PVC STATUS = ACTIVE, INTERFACE = Serial0.74

  input pkts 42832         output pkts 49616        in bytes 17904175
  out bytes 9379033        dropped pkts 62          in FECN pkts 0
  in BECN pkts 0           out FECN pkts 0          out BECN pkts 0
  in DE pkts 42832         out DE pkts 0
  out bcast pkts 2580      out bcast bytes 777846
  pvc create time 3w2d, last time pvc status changed 00:15:53
! If the PVC was created 3w2d ago, you must identify what causes
! the pvc status to be changed so often.
1602-frame#
! Use the show service-module command to verify how the line parameters
! are reported:
1602-frame#show service-module
Module type is 4-wire Switched 56K in DDS mode,
Receiver has no alarms.
Current line rate is 56 Kbits/sec and role is DSU side,
Last clearing of alarm counters 1d19h
! This report matches the previous one
    oos/oof : 120, last occurred 00:15:53
    loss of signal : 0,
    loss of sealing current: 0,
! The last time CSU/DSU was looped back (from you, or from
! the service provider) in order to test the connection
    CSU/DSU loopback : 107, last occurred 14:11:29
    loopback from remote : 0,
    DTE loopback : 0,
    line loopback : 0,
1602-frame#

If you check the serial interfaces, you see a high volume of errors in the input portion of the statistics, as shown in Example 18-24.

Example 18-24.
To See the Number of Input and Output Errors and Their Type, Use the show interfaces Command in Enabled Mode

1602-frame#show interfaces
Serial0 is up, line protocol is up
  Hardware is QUICC Serial (with onboard CSU/DSU)
  MTU 1500 bytes, BW 1544 Kbit, DLY 20000 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation FRAME-RELAY, loopback not set
  Keepalive set (10 sec)
  LMI enq sent 15696, LMI stat recvd 15107, LMI upd recvd 0, DTE LMI up
  LMI enq recvd 0, LMI stat sent 0, LMI upd sent 0
  LMI DLCI 0 LMI type is ANSI Annex D frame relay DTE
<output omitted>
  5 minute input rate 2000 bits/sec, 4 packets/sec
  5 minute output rate 2000 bits/sec, 3 packets/sec
     59692 packets input, 19967056 bytes, 0 no buffer
     Received 0 broadcasts, 0 runts, 6 giants, 0 throttles
! Look at the large number of errors, especially the number of aborts.
     201794 input errors, 11665 CRC, 159475 frame, 0 overrun, 0 ignored, 30654 abort
     65338 packets output, 9595695 bytes, 0 underruns
! The number of interface resets is extremely high as well
     0 output errors, 0 collisions, 11137 interface resets
     0 output buffer failures, 0 output buffers swapped out
     0 carrier transitions
     DCD=up DSR=up DTR=up RTS=up CTS=up

This output shows that you are dealing with second-layer problems, which are the reason that the service went down. To confirm which layer of the service is affected, first rule out the first layer. The output from the show service-module command shows that the DTE never loses the signal (loss of signal : 0). Also, there are no carrier transitions (0 carrier transitions), which would be typical when the line is out of sync. Obviously, the first layer is not affected; thus, focus on the protocol layer and its components and determine which one is causing the service to go down. Recall how LMI works: it reports the DTE down, then increments the counter and resets the interface. Check the number of interface resets to confirm that the counters were incremented.
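The layer-by-layer reasoning above can be sketched in a few lines of Python. This is only an illustration of the logic in the text, not a general diagnostic rule: the counter names, the regular expressions, and the helper functions parse_counters and suspected_layer are assumptions made for this sketch, and the sample text is an excerpt of the counters in Example 18-24.

```python
import re

# Excerpt of the counters from Example 18-24 (show interfaces).
SAMPLE = """
201794 input errors, 11665 CRC, 159475 frame, 0 overrun, 0 ignored, 30654 abort
0 output errors, 0 collisions, 11137 interface resets
0 carrier transitions
"""

# Hypothetical mapping: names and patterns are chosen for this sketch only.
PATTERNS = {
    "crc": r"(\d+) CRC",
    "frame": r"(\d+) frame\b",
    "abort": r"(\d+) abort",
    "interface_resets": r"(\d+) interface resets",
    "carrier_transitions": r"(\d+) carrier transitions",
}

def parse_counters(text):
    """Return the counters named in PATTERNS, defaulting to 0 if absent."""
    counters = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        counters[name] = int(match.group(1)) if match else 0
    return counters

def suspected_layer(counters, loss_of_signal=0):
    """Mirror the reasoning in the text: physical events point to layer 1,
    framing errors together with interface resets point to layer 2."""
    if counters["carrier_transitions"] > 0 or loss_of_signal > 0:
        return "layer 1"
    framing_errors = counters["crc"] + counters["frame"] + counters["abort"]
    if framing_errors > 0 and counters["interface_resets"] > 0:
        return "layer 2"
    return "undetermined"

counters = parse_counters(SAMPLE)
print(counters)
# Loss of signal is 0 in the show service-module output of Example 18-23.
print(suspected_layer(counters, loss_of_signal=0))
```

With the counters from this case, the sketch reports layer 2, which agrees with the conclusion drawn from the reports.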
The number of interface resets does not track the carrier transitions, but rather the CRC and frame errors, which leads you to conclude that you are dealing with second-layer issues. With that in mind, verify how the other party, the core router, reports the status of the PVC. The core router is configured for DLCI = 74 in Serial4/1:0. Its status is shown in Example 18-25.

Example 18-25. Verifying the Status of PVC 74 on the Core Router

7206-frame#show frame-relay pvc 74

PVC Statistics for interface Serial4/1:0 (Frame Relay DTE)

DLCI = 74, DLCI USAGE = LOCAL, PVC STATUS = ACTIVE, INTERFACE = Serial4/1:0.74

  input pkts 167039        output pkts 157077       in bytes 26896506
  out bytes 79023826       dropped pkts 116         in FECN pkts 0
  in BECN pkts 0           out FECN pkts 0          out BECN pkts 0
  in DE pkts 167039        out DE pkts 0
  out bcast pkts 9748      out bcast bytes 2924400
  pvc create time 3w0d, last time pvc status changed 00:17:09
  cir 56000    bc 56000    be 0    byte limit 875    interval 125
  mincir 28000    byte increment 875    Adaptive Shaping none
  pkts 156956    bytes 78972226    pkts delayed 51275    bytes delayed 56957021
  shaping inactive
  traffic shaping drops 0
  Queuing strategy: fifo
  Output queue 0/40, 116 drop, 51275 dequeued
7206-frame#

There is nothing unusual on the core side that affects or reflects the issues that the remote side is experiencing. Further actions to fix the service include the following steps on the remote user's router:
After the successful resolution of this issue, it is good practice to check the status of the service from both sides the next day with the following commands:

1602-frame#show frame-relay pvc 74
7206-frame#show frame-relay pvc 74

The core router reports the output shown in Example 18-26.

Example 18-26. The Core Router Report for the Troubled PVC 74

PVC Statistics for interface Serial0 (Frame Relay DTE)

              Active     Inactive      Deleted       Static
  Local          1            0            0            0
  Switched       0            0            0            0
  Unused         0            0            0            0

DLCI = 74, DLCI USAGE = LOCAL, PVC STATUS = ACTIVE, INTERFACE = Serial4/1:0.74

  input pkts 186908        output pkts 175377       in bytes 29752740
  out bytes 89258825       dropped pkts 116         in FECN pkts 0
  in BECN pkts 0           out FECN pkts 0          out BECN pkts 0
  in DE pkts 186908        out DE pkts 0
  out bcast pkts 11055     out bcast bytes 3316500
! The last time the pvc status has changed is 17:44:54.
  pvc create time 3w1d, last time pvc status changed 17:44:54
  cir 56000    bc 56000    be 0    byte limit 875    interval 125
  mincir 28000    byte increment 875    Adaptive Shaping none
  pkts 175256    bytes 89207225    pkts delayed 57828    bytes delayed 64737774
  shaping inactive
  traffic shaping drops 0
  Queuing strategy: fifo
  Output queue 0/40, 116 drop, 57828 dequeued
7206-frame#

The last PVC status change now dates from when the fix was implemented (last time pvc status changed 17:44:54). If the number and type of errors are not incrementing from where they were before the changes were implemented, you can consider the case closed: the troubleshooting actions have corrected the problem.

Traffic Shaping Issues

Another important cause of performance issues is related to traffic shaping settings. The two basic cases are no traffic shaping and wrong traffic shaping; both affect the performance of the service equally. The configuration for traffic shaping is covered in Chapter 16. It is a necessary feature if you need to prioritize different types of traffic by tuning the timers and counters, that is, by configuring Bc, Be, and timing intervals.
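As a rough illustration of how the timing interval and the per-interval byte credit relate to the shaped rate, the following Python sketch converts the two values into bits per second. It assumes that the interval is reported in milliseconds and that the byte increment is the number of bytes credited in each interval, which is consistent with the Interval (ms) and Increment (bytes) column headings of show traffic-shape. The function name shaped_rate_bps is hypothetical and is not part of any Cisco tool.

```python
def shaped_rate_bps(byte_increment, interval_ms):
    """Bits per second implied by crediting byte_increment bytes
    once every interval_ms milliseconds (assumed semantics)."""
    return byte_increment * 8 * 1000 // interval_ms

# Values reported for PVC 74 in Example 18-25:
# byte increment 875, interval 125, cir 56000.
rate = shaped_rate_bps(875, 125)
print(rate)           # 56000
print(rate == 56000)  # True: the credit per interval matches the CIR
```

Under these assumptions, 875 bytes every 125 ms corresponds to 56,000 bps, which is the CIR reported for PVC 74.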
NOTE

The Enhanced Local Management Interface (ELMI) is an interesting Cisco feature that enables Frame Relay quality of service (QoS) autosensing. It is turned on with the frame-relay qos-autosense command in interface configuration mode (7206-frame(config-if)#). With this command enabled, the router can perform dynamic traffic shaping. The feature enables the automated exchange of Frame Relay QoS parameter information between the Cisco router and the Cisco switch (BPX/MGX and IGX platforms). The router uses the configurable QoS values from the switch to establish traffic shaping. More about this Cisco IOS feature can be found at www.cisco.com.

Because of the use of traffic shaping, performance issues are recognized in three typical scenarios:
High RTT Numbers

An example of the first case is when the user and the core router are located in the same area, but the test is performed from a remote geographic location. The local carriers are X and Y, and the IXC (long-distance carrier) carries the traffic with one-way latency of about 80 ms. The local loop latency is definitely lower than the two-way latency of the long-distance carrier, given the locations and distance. If you assume that the two local loops (core side and the remote user's side) each have latency equal to the one-way latency of the long-distance carrier (3 x 80), you can expect an RTT of about 240 ms. Now, to check the actual results, perform a ping with 64 bytes, which means that there is no need for fragmentation/defragmentation (see Example 18-27).

Example 18-27. A Ping Test Performed to Find Out the RTT

UNIX.cisco.com:/users/pnedeltc> ping 1602-frame
PING 1602-frame.cisco.com: 56 data bytes
64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=0. time=6191. ms
64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=1. time=7424. ms
64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=2. time=9101. ms
64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=3. time=11448. ms
64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=4. time=13887. ms
64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=5. time=16244. ms
64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=6. time=18760. ms
64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=7. time=22428. ms
64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=8. time=27199. ms
64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=9. time=34426. ms
..................
64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=376. time=114440. ms
^C
----1602-frame.cisco.com PING Statistics----
493 packets transmitted, 135 packets received, 72% packet loss
round-trip (ms) min/avg/max = 6191/112398/133134

As you can see from the output, the RTT values are inconsistent and significantly exceed your expectations. If you trace the path (see Example 18-28), you see that the long-distance provider carries the trace to the remote site in 80 ms; however, the local service provider carries the trace within a local area in 200 ms. There is a problem with performance on the local loop.

Example 18-28. A Trace Route Test, Performed to Find the Highest RTT

Starting trace - Aug 27, 2001 10:19:40
Tracing to 1602-frame [10.84.11.73]....

Hops  IP Address       RTT(ms)  DNS Name
1     161.71.86.2      0        hop-dtb-gw1.cisco.com
2     161.71.241.153   0        hop-sbb4-gw1.cisco.com
3     161.71.241.37    0        hop-rbb-gw3.cisco.com
4     161.69.7.217     0        hop-rbb-gw1.cisco.com
5     161.69.7.158     0        hop-gb4-g0-0.cisco.com
6     161.68.86.58     81       hop-sj-pos.cisco.com
7     10.184.5.89      80       hop-rbb-gw1.cisco.com
! The trace reaches the other end, which is the core router.
8     10.84.5.222      80       7206-frame.cisco.com
! The trace reaches the remote user's router
9     10.84.11.73      200      1602-frame.cisco.com
Host reached
! The last hop is 200 ms RTT

The tests of the user's router show no buffer or hardware failures. However, if you check whether traffic shaping is applied to DLCI = 60, you can see that DLCI = 60 is not listed, as shown in Example 18-29.

Example 18-29.
Verifying Whether DLCI = 60 Is Listed Among the DLCIs with Traffic Shaping Applied

7206-frame#show traffic-shape

Interface   Se3/0:0
       Access Target    Byte   Sustain   Excess    Interval  Increment Adapt
VC     List   Rate      Limit  bits/int  bits/int  (ms)      (bytes)   Active
38            56000     875    7000      0         125       875       -
34            56000     875    7000      0         125       875       -
33            56000     875    7000      0         125       875       -
32            56000     875    7000      0         125       875       -
22            56000     875    7000      0         125       875       -
16            56000     875    7000      0         125       875       -

Interface   Se3/0:0.17
       Access Target    Byte   Sustain   Excess    Interval  Increment Adapt
VC     List   Rate      Limit  bits/int  bits/int  (ms)      (bytes)   Active
17            56000     875    7000      0         125       875       -

Interface   Se3/0:0.18
       Access Target    Byte   Sustain   Excess    Interval  Increment Adapt
VC     List   Rate      Limit  bits/int  bits/int  (ms)      (bytes)   Active
18            56000     875    7000      0         125       875       -

Interface   Se3/0:0.20
       Access Target    Byte   Sustain   Excess    Interval  Increment Adapt
VC     List   Rate      Limit  bits/int  bits/int  (ms)      (bytes)   Active
20            56000     875    7000      0         125       875       -

Interface   Se3/0:0.23
       Access Target    Byte   Sustain   Excess    Interval  Increment Adapt
VC     List   Rate      Limit  bits/int  bits/int  (ms)      (bytes)   Active
23            56000     875    7000      0         125       875       -

Interface   Se3/0:0.24
       Access Target    Byte   Sustain   Excess    Interval  Increment Adapt
VC     List   Rate      Limit  bits/int  bits/int  (ms)      (bytes)   Active
24            56000     875    7000      0         125       875       -
<output omitted>
Interface   Se3/0:0.62
       Access Target    Byte   Sustain   Excess    Interval  Increment Adapt
VC     List   Rate      Limit  bits/int  bits/int  (ms)      (bytes)   Active
62            384000    6000   384000    0         125       6000      -

At first, this case might point you to Scenario 1: local loop problems or flapping links. After working with the LEC, however, you might conclude that this is a traffic-shaping issue. The required fix is to implement the appropriate traffic-shaping map class.

Slow Performance

The second traffic shaping issue arises when performance is lower than the user expects. In this scenario, the service is provisioned for an access rate of 384 kbps, but its performance characteristics are closer to those of a 56-kbps circuit.
The service is operational, the serial lines do not report any errors, and both the remote user's and the core router's configurations are set up correctly and report normal status. The output in Example 18-30 shows the DLCI = 98 parameters.

Example 18-30. Verifying the Serial 4/0:0.98 Configuration on the Core Router

7206-frame#show interfaces serial 4/0:0.98
Serial4/0:0.98 is up, line protocol is up
  Hardware is Multichannel T1
  Description: 1604-frame: 10.21.56.8/29 : 23161309 : 3844600235
  Interface is unnumbered. Using address of Loopback2 (171.68.88.1)
  MTU 1500 bytes, BW 256 Kbit, DLY 20000 usec,
     reliability 255/255, txload 24/255, rxload 5/255
  Encapsulation FRAME-RELAY

The output from the show frame-relay pvc 98 command on the core router is shown in Example 18-31.

Example 18-31. Verifying if PVC 98 Has Traffic Shaping Applied to It

7206-frame#show frame-relay pvc 98

DLCI = 98, DLCI USAGE = UNUSED, PVC STATUS = ACTIVE, INTERFACE = Serial4/0:0

  input pkts 167755        output pkts 167552       in bytes 13750582
  out bytes 189232810      dropped pkts 71          in FECN pkts 0
  in BECN pkts 0           out FECN pkts 0          out BECN pkts 0
  in DE pkts 167755        out DE pkts 0
  out bcast pkts 16392     out bcast bytes 4967386
  pvc create time 5d16h, last time pvc status changed 5d16h
! These are the parameters that define the performance.
  cir 28000    bc 7000    be 0    limit 875    interval 125
  mincir 28000    byte increment 875    Adaptive Shaping none
  pkts 167481    bytes 189226266    pkts delayed 128118    bytes delayed 171119678
  shaping inactive
  traffic shaping drops 0
  Serial4/0:0.98 dlci 98 is first come first serve default queuing
  Output queue 0/40, 71 drop, 128118 dequeued

Next, you need to take some measurements from the interfaces Ethernet0 and Serial1 of the remote user's router, then ping from the core router or from any server in the same area with a packet size of 3000 bytes.
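One reason a large ping is a useful probe is that the time needed to clock a packet onto the line grows as the effective rate drops. The following Python sketch is a back-of-the-envelope estimate, not a measurement. It assumes one direction only and ignores IP and Frame Relay headers, fragmentation of the 3000-byte packet at the 1500-byte MTU, and queuing delay. The function name serialization_ms is illustrative.

```python
def serialization_ms(payload_bytes, rate_bps):
    """Milliseconds needed to transmit payload_bytes at rate_bps,
    ignoring all protocol overhead and queuing (simplifying assumption)."""
    return payload_bytes * 8 * 1000 / rate_bps

# The two rates discussed in this scenario: the provisioned 384 kbps
# and the 56 kbps that the observed performance resembles.
for rate in (384000, 56000):
    print(rate, round(serialization_ms(3000, rate), 1), "ms")
```

Under these assumptions, a 3000-byte packet takes 62.5 ms at 384 kbps and roughly 429 ms at 56 kbps, so the difference between the two cases is easy to see in the ping results.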
Measure the end user's five-minute rate by entering the following two commands, as shown in Example 18-32:

1602-frame#show interfaces ethernet 0 | include 5 min
1602-frame#show interfaces serial 1 | include 5 min

Example 18-32. Measuring the Input and Output Rate on Ethernet0 and Serial1 Interfaces of a Remote User's Router

1602-frame#show interfaces ethernet 0 | include 5 min
  5 minute input rate 7000 bits/sec, 13 packets/sec
  5 minute output rate 8000 bits/sec, 18 packets/sec
1602-frame#show interfaces serial 1 | include 5 min
  5 minute input rate 4000 bits/sec, 22 packets/sec
  5 minute output rate 6000 bits/sec, 14 packets/sec
1602-frame#

It is a good idea to monitor the RXD and TXD, and the reliability reports (reliability 255/255) of the 1602-frame router:

Serial1 is up, line protocol is up
  Hardware is QUICC Serial (with FT1 CSU/DSU WIC)
  MTU 1500 bytes, BW 128 Kbit, DLY 20000 usec,
     reliability 255/255, txload 7/255, rxload 85/255

If you go back to review the previous outputs, you will notice that for a 384-kbps circuit, the core router reports the following:

  pvc create time 5d16h, last time pvc status changed 5d16h
  cir 28000    bc 7000    be 0    limit 875    interval 125
  mincir 28000    byte increment 875    Adaptive Shaping none

The new configuration inherits the default settings of Serial4/0:0, where traffic shaping is defined with no classes. The fix is easy to apply. First, create a map class, as shown in Example 18-33.

Example 18-33. Example for Class Definition Called class-384-new

map-class frame-relay class-384-new
 no frame-relay adaptive-shaping
 frame-relay cir 384000
 frame-relay bc 384000
 frame-relay be 128000
 frame-relay mincir 256000

Next, apply the map class, as shown in Example 18-34.

Example 18-34.
The Class class-384-new Is Applied to the Interface

interface Serial4/0:0.98 point-to-point
 description 1604-frame frame: 10.21.56.8/29 : 23161309 : 3844600235
 bandwidth 384
 ip unnumbered Loopback2
 no ip route-cache
 frame-relay class class-384-new
 frame-relay interface-dlci 98 IETF

Repeat the status commands and compare the results, as shown in Example 18-35.

Example 18-35. Check the Status of PVC 98

7206-frame#show frame-relay pvc 98

PVC Statistics for interface Serial4/0:0 (Frame Relay DTE)

DLCI = 98, DLCI USAGE = LOCAL, PVC STATUS = ACTIVE, INTERFACE = Serial4/0:0.98

  input pkts 536361        output pkts 659230       in bytes 100680624
  out bytes 165617202      dropped pkts 68          in FECN pkts 0
  in BECN pkts 0           out FECN pkts 0          out BECN pkts 0
  in DE pkts 536361        out DE pkts 0
  out bcast pkts 23237     out bcast bytes 1886266
  pvc create time 2d04h, last time pvc status changed 1d05h
! The CIR now is 384000, bc=384000, be=128000.
  cir 384000    bc 384000    be 128000    limit 4000    interval 125
  mincir 256000    byte increment 4000    Adaptive Shaping none
  pkts 659213    bytes 122881018    pkts delayed 20415    bytes delayed 19232012
  shaping inactive
  traffic shaping drops 0
  Serial4/0:0.98 dlci 98 is first come first serve default queuing
  Output queue 0/40, 29 drop, 20415 dequeued

Finally, repeat the ping test and compare the results, as shown in Example 18-36.

Example 18-36. Measuring the Input and Output Rate on Ethernet0 and Serial1 Interfaces of the Remote User's Router, After Implementing the Map Class

1602-frame#show interfaces ethernet 0 | include 5 min
  5 minute input rate 8000 bits/sec, 15 packets/sec
  5 minute output rate 192000 bits/sec, 30 packets/sec
1602-frame#show interfaces serial 1 | include 5 min
  5 minute input rate 70000 bits/sec, 29 packets/sec
  5 minute output rate 6000 bits/sec, 16 packets/sec

The performance has improved significantly.

Flapping Routes

Flapping routes occur during the convergence process when there is instability in the network.
Different routing protocols pose different requirements for Frame Relay, but sometimes even a lack of memory on the core router, as the number of subscribed users increases, can cause this issue. One symptom of network instability is when trace commands use different paths to reach the destination, indicating either a slow convergence process or a change in the topology and a related change in the routing table. All the routers in the network must converge on the new topology when changes occur in the network. Toward this end, they begin sharing routing information, and each update nullifies the previous decision and triggers another update to the other routers. These routers, in turn, adjust their own routing tables and generate new updates, which causes flapping routes.

Dealing with this situation is far more complex than it appears and requires additional troubleshooting. One possible solution is powering down the affected routers and slowly allowing convergence in your network, one router at a time. For more information, check www.cisco.com.

Powering down production routers is not always feasible. Also, depending on the size and configuration of the routers, bringing them back router by router might not work. In extremely large networks with large routing tables, it might be necessary to shut down all interfaces, power cycle the box, and bring it back online one interface at a time.
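The symptom mentioned above, traces that take different paths to the same destination, can be checked by comparing the hop lists of repeated traces. The following Python sketch is one minimal way to do that, assuming the hop addresses have already been collected from each trace. The function names are illustrative, the first hop list reuses the last three hops of Example 18-28, and the address 10.184.5.93 in the second list is hypothetical.

```python
from collections import Counter

def path_counts(traces):
    """Count how many times each distinct hop sequence was observed."""
    return Counter(tuple(hops) for hops in traces)

def is_path_stable(traces):
    """True when every trace followed the same hop sequence."""
    return len(path_counts(traces)) <= 1

# Last three hops from Example 18-28.
first = ["10.184.5.89", "10.84.5.222", "10.84.11.73"]
# Hypothetical second trace that enters through a different hop.
second = ["10.184.5.93", "10.84.5.222", "10.84.11.73"]

print(is_path_stable([first, first]))   # True
print(is_path_stable([first, second]))  # False
```

A changing path does not prove that routes are flapping, because load balancing or a single legitimate topology change produces the same symptom, so treat the result only as a prompt for further troubleshooting.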