Scenario 3: Performance Issues from Flapping Lines and Traffic Shaping Issues


This scenario represents one of the most common and, at the same time, most difficult to pinpoint issues: performance expectations are not met, or the service is good for a period of time but there is no data exchange whatsoever for short periods of time. Some of the most common reasons for these problems are flapping lines and traffic shaping issues.

Flapping Lines

Problems that result from flapping lines are linked to the history of the PVC, as reported by the Frame Relay commands. As previously discussed, the Frame Relay router requests the status of all PVCs on the interface during the periodic LMI polling, typically on every sixth polling cycle. The resulting full-status message response contains information on every PVC that is configured on that physical interface. The information includes the recent history of the PVC and its availability (inactive or active). The term flapping lines refers to the situation in which the service continually changes its state between active and inactive; that is, it flaps. The user tries to exchange data, but for a period of time the service is unavailable: the user cannot ping, telnet, or reach the other party's router or any other IP address. After a while, the service comes back up, and the cycle repeats.

The typical reports that indicate a flapping line are shown in Example 18-23.

Example 18-23. The History of the DLCI, Including the Relative Time When the Service Was Created and the Last Time the PVC Status Was Changed
 1602-frame#show frame-relay pvc
 PVC Statistics for interface Serial0 (Frame Relay DTE)
               Active     Inactive      Deleted       Static
   Local           1            0             0             0
   Switched        0            0             0             0
   Unused          0            0             0             0
 DLCI = 74, DLCI USAGE = LOCAL, PVC STATUS = ACTIVE, INTERFACE = Serial0.74
   input pkts 42832         output pkts 49616        in bytes 17904175
   out bytes 9379033        dropped pkts 62          in FECN pkts 0
   in BECN pkts 0           out FECN pkts 0          out BECN pkts 0
   in DE pkts 42832         out DE pkts 0
   out bcast pkts 2580       out bcast bytes 777846
   pvc create time 3w2d, last time pvc status changed 00:15:53
 ! If the PVC was created 3w2d ago, you must identify what causes
 ! the pvc status to be changed so often.
 1602-frame#
 ! Use the show service-module command to verify how the line parameters
 ! are reported:
 1602-frame#show service-module
 Module type is 4-wire Switched 56K in DDS mode,
 Receiver has no alarms.
 Current line rate is 56 Kbits/sec and role is DSU side,
 Last clearing of alarm counters 1d19h
 ! This report matches the previous one
     oos/oof                :   120, last occurred 00:15:53
     loss of signal         :    0,
     loss of sealing current:    0,
 ! The last time CSU/DSU was looped back (from you, or from
 ! the service provider) in order to test the connection
     CSU/DSU loopback       :    107, last occurred 14:11:29
     loopback from remote   :    0,
     DTE loopback           :    0,
     line loopback          :    0,
 1602-frame#

If you check the serial interfaces, you see a high volume of errors on the input portion of the statistics, as shown in Example 18-24.

Example 18-24. To See the Number of Input and Output Errors and Their Type, Use the show interfaces Command in Enabled Mode
 1602-frame#show interfaces
 Serial0 is up, line protocol is up
   Hardware is QUICC Serial (with onboard CSU/DSU)
   MTU 1500 bytes, BW 1544 Kbit, DLY 20000 usec,
      reliability 255/255, txload 1/255, rxload 1/255
   Encapsulation FRAME-RELAY, loopback not set
   Keepalive set (10 sec)
   LMI enq sent  15696, LMI stat recvd 15107, LMI upd recvd 0, DTE LMI up
   LMI enq recvd 0, LMI stat sent  0, LMI upd sent  0
   LMI DLCI 0  LMI type is ANSI Annex D  frame relay DTE
 <output omitted>
   5 minute input rate 2000 bits/sec, 4 packets/sec
   5 minute output rate 2000 bits/sec, 3 packets/sec
      59692 packets input, 19967056 bytes, 0 no buffer
      Received 0 broadcasts, 0 runts, 6 giants, 0 throttles
 ! Look to the extensive number of errors and especially number of aborts.
      201794 input errors, 11665 CRC, 159475 frame, 0 overrun, 0 ignored, 30654 abort
      65338 packets output, 9595695 bytes, 0 underruns
 ! The number of interface resets is extremely high as well
      0 output errors, 0 collisions, 11137 interface resets
      0 output buffer failures, 0 output buffers swapped out
      0 carrier transitions
      DCD=up  DSR=up  DTR=up  RTS=up  CTS=up

This output shows that you are dealing with second-layer problems, which are the reason that the service went down. To confirm that the second layer and not the first is affected, examine the output from the show service-module command, which shows that the DTE never loses the signal (loss of signal : 0). Also, no carrier transitions are reported (0 carrier transitions); carrier transitions would be typical if the line were out of sync. The first layer is therefore not affected, so focus on the protocol layer and its components and determine which one is causing the service to go down. Recall the way that LMI works: it reports the DTE down, then increments the counter and resets the interface. Check the number of interface resets to confirm that the counters were incremented. The high number of interface resets is not accompanied by carrier transitions, but by CRC and frame errors, which leads you to conclude that you are dealing with second-layer issues. With that in mind, verify how the other party, the core router, reports the status of the PVC. The core router is configured for DLCI = 74 on Serial4/1:0. Its status is shown in Example 18-25.

Example 18-25. Verifying the Status of PVC 74 on the Core Router
 7206-frame#show frame-relay pvc 74
 PVC Statistics for interface Serial4/1:0 (Frame Relay DTE)
 DLCI = 74, DLCI USAGE = LOCAL, PVC STATUS = ACTIVE, INTERFACE = Serial4/1:0.74
   input pkts 167039        output pkts 157077       in bytes 26896506
   out bytes 79023826       dropped pkts 116         in FECN pkts 0
   in BECN pkts 0           out FECN pkts 0          out BECN pkts 0
   in DE pkts 167039        out DE pkts 0
   out bcast pkts 9748      out bcast bytes 2924400
   pvc create time 3w0d, last time pvc status changed 00:17:09
   cir 56000     bc 56000     be 0         byte limit 875    interval 125
   mincir 28000     byte increment 875   Adaptive Shaping none
   pkts 156956    bytes 78972226  pkts delayed 51275     bytes delayed 56957021
   shaping inactive
   traffic shaping drops 0
   Queuing strategy: fifo
   Output queue 0/40, 116 drop, 51275 dequeued
 7206-frame#

There is nothing unusual from the core side to affect or reflect the issues that the remote side is experiencing.
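The layer-by-layer reasoning applied to Example 18-24 can be sketched in a few lines of Python. This is only an illustrative rule of thumb, not a Cisco tool: the function names counter and suspect_layer are hypothetical, and the counter lines are copied from the example output.

```python
import re

# Counter lines copied from Example 18-24.
SHOW_INTERFACES = """\
201794 input errors, 11665 CRC, 159475 frame, 0 overrun, 0 ignored, 30654 abort
0 output errors, 0 collisions, 11137 interface resets
0 carrier transitions
"""

def counter(text, label):
    """Return the integer that precedes `label` in the text, or None."""
    match = re.search(r"(\d+) " + re.escape(label), text)
    return int(match.group(1)) if match else None

def suspect_layer(text, loss_of_signal):
    """Hypothetical rule of thumb that mirrors the reasoning in the text."""
    # Physical-layer symptoms: carrier transitions or loss of signal.
    physical = counter(text, "carrier transitions") or loss_of_signal
    # Framing symptoms: CRC, frame, and abort errors on input.
    framing = sum(counter(text, name) for name in ("CRC", "frame", "abort"))
    if physical:
        return "layer 1"
    if framing:
        return "layer 2"
    return "no errors"

# loss of signal is 0 in the show service-module report of Example 18-23.
print(suspect_layer(SHOW_INTERFACES, loss_of_signal=0))
```

With the counters from the example, the CRC, frame, and abort errors add up to the reported total of input errors, and the sketch points at the second layer.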

Further actions to fix the service include the following steps on the remote user's router:

  • Check the performance parameters of the router with #show processes cpu.

  • Check for possible hardware problems using #show diag module_number.

  • Check for failing buffers with #show buffers fail.

  • Work with the service provider to eliminate the failing device or equipment.

After the successful resolution of this issue, it is good practice to check the status of the service from both sides the next day with the following commands:

 1602-frame#show frame-relay pvc 74
 7206-frame#show frame-relay pvc 74

The core router reports the output shown in Example 18-26.

Example 18-26. The Core Router Report for the Troubled PVC 74
 PVC Statistics for interface Serial0 (Frame Relay DTE)
               Active     Inactive      Deleted       Static
   Local          1            0            0           0
   Switched       0            0            0           0
   Unused         0            0            0           0
 DLCI = 74, DLCI USAGE = LOCAL, PVC STATUS = ACTIVE, INTERFACE = Serial4/1:0.74
   input pkts 186908        output pkts 175377       in bytes 29752740
   out bytes 89258825       dropped pkts 116         in FECN pkts 0
   in BECN pkts 0           out FECN pkts 0          out BECN pkts 0
   in DE pkts 186908        out DE pkts 0
   out bcast pkts 11055     out bcast bytes 3316500
 ! The last time the pvc status has changed is 17:44.54.
   pvc create time 3w1d, last time pvc status changed 17:44:54
   cir 56000     bc 56000     be 0         byte limit 875    interval 125
   mincir 28000     byte increment 875   Adaptive Shaping none
   pkts 175256    bytes 89207225  pkts delayed 57828     bytes delayed 64737774
   shaping inactive
   traffic shaping drops 0
   Queuing strategy: fifo
   Output queue 0/40, 116 drop, 57828 dequeued
 7206-frame#

The last PVC status change now dates from when the fix was implemented (last time pvc status changed 17:44:54), and the status has not changed since. If the number and type of errors are not incrementing from where they were before the changes were implemented, you can consider the case closed: the troubleshooting actions have corrected the problem.
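Comparing the PVC creation time with the time of the last status change is easy to automate. The following is a minimal sketch, assuming only the duration formats that appear in this chapter (such as 3w2d, 5d16h, and 00:15:53); the function names and the one-hour threshold are hypothetical choices for illustration.

```python
import re

# Seconds per unit for the compact forms seen in this chapter (3w2d, 5d16h).
UNIT_SECONDS = {"w": 7 * 86400, "d": 86400, "h": 3600}

def ios_duration_seconds(text):
    """Convert '3w2d', '5d16h' or '00:15:53' to seconds."""
    if ":" in text:
        hours, minutes, seconds = (int(part) for part in text.split(":"))
        return hours * 3600 + minutes * 60 + seconds
    pairs = re.findall(r"(\d+)([wdh])", text)
    return sum(int(count) * UNIT_SECONDS[unit] for count, unit in pairs)

def pvc_times(line):
    """Return (age of the PVC, time since last status change) in seconds."""
    match = re.search(
        r"pvc create time ([\w:]+), last time pvc status changed ([\w:]+)", line
    )
    return ios_duration_seconds(match.group(1)), ios_duration_seconds(match.group(2))

def recently_changed(line, threshold_seconds=3600):
    """True if the status changed within the (hypothetical) threshold."""
    _, since_change = pvc_times(line)
    return since_change < threshold_seconds

before = "pvc create time 3w2d, last time pvc status changed 00:15:53"
after = "pvc create time 3w1d, last time pvc status changed 17:44:54"
print(recently_changed(before), recently_changed(after))
```

For the two lines taken from Examples 18-23 and 18-26, the first is flagged as recently changed and the second is not. A single recent change does not prove flapping; repeated checks that keep showing a short interval do.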

Traffic Shaping Issues

Another important cause of performance issues is related to traffic shaping settings. The two basic cases are no traffic shaping and wrong traffic shaping; both equally affect performance of the service.

The configuration for traffic shaping is covered in Chapter 16. It is a necessary feature if you need to prioritize different types of traffic by trimming the timers and counters, or configuring Bc, Be, and timing intervals.

NOTE

The Enhanced Local Management Interface (ELMI) is an interesting Cisco feature that enables Frame Relay quality of service (QoS) autosensing. It is turned on and off with the 7206-frame(config-if)# frame-relay qos-autosense command and, when enabled, it can perform dynamic traffic shaping. The feature enables the automated exchange of Frame Relay QoS parameter information between the Cisco router and the Cisco switch (BPX/MGX and IGX platforms). The router uses the configurable QoS values from the switch to establish traffic shaping. More about this Cisco IOS feature can be found at www.cisco.com.


Performance issues related to traffic shaping are recognized in three typical scenarios:

  • Unacceptably high round-trip time (RTT)

  • A line that was provisioned for a certain access rate provides lower-than-expected performance

  • Flapping routes

High RTT Numbers

An example of the first case is when the user and the core router are located in the same area, but the test is performed from a remote geographic location. The local carriers are X and Y, and the IXC (long-distance carrier) carries the traffic with a one-way latency of about 80 ms. Given the locations and distance, the local loop latency is certainly lower than the two-way latency of the long-distance carrier. Even if you assume that the two local loops (the core side and the remote user's side) each add about the same 80 ms, the total is 3 x 80 ms, so you can expect an RTT of about 240 ms. Now, to check the actual results, perform a ping with 64 bytes, which is small enough that no fragmentation or reassembly is needed (see Example 18-27).

Example 18-27. A Ping Test Performed to Find Out the RTT
 UNIX.cisco.com:/users/pnedeltc> ping 1602-frame
 PING 1602-frame.cisco.com: 56 data bytes
 64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=0. time=6191. ms
 64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=1. time=7424. ms
 64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=2. time=9101. ms
 64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=3. time=11448. ms
 64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=4. time=13887. ms
 64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=5. time=16244. ms
 64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=6. time=18760. ms
 64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=7. time=22428. ms
 64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=8. time=27199. ms
 64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=9. time=34426. ms
 ..................
 64 bytes from 1602-frame.cisco.com (10.84.11.73): icmp_seq=376. time=114440. ms
 ^C
 ----1602-frame.cisco.com PING Statistics----
 493 packets transmitted, 135 packets received, 72% packet loss
 round-trip (ms)  min/avg/max = 6191/112398/133134

As you can see from the output, the RTT values are inconsistent and significantly exceed your expectations.
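To put numbers on how far the measurement is from the expectation, the ping summary of Example 18-27 can be compared with the estimated RTT of about 240 ms. The sketch below is illustrative only; the function name parse_summary and the constant EXPECTED_RTT_MS are hypothetical, and the summary lines are copied from the example.

```python
import re

# Summary lines copied from Example 18-27.
SUMMARY = """\
493 packets transmitted, 135 packets received, 72% packet loss
round-trip (ms)  min/avg/max = 6191/112398/133134
"""

# Expected RTT from the estimate in the text (3 x 80 ms).
EXPECTED_RTT_MS = 240

def parse_summary(text):
    """Return loss percentage and min/avg/max RTT from a ping summary."""
    sent, received = map(
        int,
        re.search(r"(\d+) packets transmitted, (\d+) packets received", text).groups(),
    )
    rtt_min, rtt_avg, rtt_max = map(
        int, re.search(r"min/avg/max = (\d+)/(\d+)/(\d+)", text).groups()
    )
    loss_percent = 100 * (sent - received) // sent  # truncated, as ping reports it
    return {"loss": loss_percent, "min": rtt_min, "avg": rtt_avg, "max": rtt_max}

stats = parse_summary(SUMMARY)
print(stats["loss"], stats["min"] // EXPECTED_RTT_MS)
```

Even the best round trip in the sample is more than 25 times the expected value, and the computed loss agrees with the 72% that ping reports.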

If you trace the path (see Example 18-28), you see that the long-distance provider carries the trace to the remote site in 80 ms; however, the local service provider carries the trace within a local area in 200 ms. There is a problem with performance on the local loop.

Example 18-28. A Trace Route Test, Performed to Find the Highest RTT
 Starting trace - Aug 27, 2001  10:19:40
 Tracing to 1602-frame [10.84.11.73]....
 Hops    IP Address                  RTT(ms)        DNS Name
 1       161.71.86.2                 0              hop-dtb-gw1.cisco.com
 2       161.71.241.153              0              hop-sbb4-gw1.cisco.com
 3       161.71.241.37               0              hop-rbb-gw3.cisco.com
 4       161.69.7.217                0              hop-rbb-gw1.cisco.com
 5       161.69.7.158                0              hop-gb4-g0-0.cisco.com
 6       161.68.86.58                81             hop-sj-pos.cisco.com
 7       10.184.5.89                 80             hop-rbb-gw1.cisco.com
 ! The trace reaches the other end, which is the core router.
 8       10.84.5.222                 80             7206-frame.cisco.com
 ! The trace reaches the remote user's router
 9       10.84.11.73                 200            1602-frame.cisco.com
 Host reached
 ! The last hop is 200 ms RTT

The tests of the user's router show no buffer or hardware failures. However, if you check whether traffic shaping is applied to DLCI = 60, you can see that DLCI = 60 is not listed, as shown in Example 18-29.

Example 18-29. Verifying if DLCI = 60 Is Listed Among DLCIs, with Applied Traffic Shaping
 7206-frame#show traffic-shape
 Interface   Se3/0:0
        Access Target    Byte   Sustain   Excess    Interval  Increment Adapt
 VC     List   Rate      Limit  bits/int  bits/int  (ms)      (bytes)   Active
 38            56000     875    7000      0         125       875       -
 34            56000     875    7000      0         125       875       -
 33            56000     875    7000      0         125       875       -
 32            56000     875    7000      0         125       875       -
 22            56000     875    7000      0         125       875       -
 16            56000     875    7000      0         125       875       -
 Interface   Se3/0:0.17
        Access Target    Byte   Sustain   Excess    Interval  Increment Adapt
 VC     List   Rate      Limit  bits/int  bits/int  (ms)      (bytes)   Active
 17            56000     875    7000      0         125       875       -
 Interface   Se3/0:0.18
        Access Target    Byte   Sustain   Excess    Interval  Increment Adapt
 VC     List   Rate      Limit  bits/int  bits/int  (ms)      (bytes)   Active
 18            56000     875    7000      0         125       875       -
 Interface   Se3/0:0.20
        Access Target    Byte   Sustain   Excess    Interval  Increment Adapt
 VC     List   Rate      Limit  bits/int  bits/int  (ms)      (bytes)   Active
 20            56000     875    7000      0         125       875       -
 Interface   Se3/0:0.23
        Access Target    Byte   Sustain   Excess    Interval  Increment Adapt
 VC     List   Rate      Limit  bits/int  bits/int  (ms)      (bytes)   Active
 23            56000     875    7000      0         125       875       -
 Interface   Se3/0:0.24
        Access Target    Byte   Sustain   Excess    Interval  Increment Adapt
 VC     List   Rate      Limit  bits/int  bits/int  (ms)      (bytes)   Active
 24            56000     875    7000      0         125       875       -
 <output omitted>
 Interface   Se3/0:0.62
        Access Target    Byte   Sustain   Excess    Interval  Increment Adapt
 VC     List   Rate      Limit  bits/int  bits/int  (ms)      (bytes)   Active
 62            384000    6000   384000    0         125       6000      -

Your first conclusion about this case might point to Scenario 1, local loop problems, or flapping links. After working with the LEC, however, you might conclude that this is a traffic-shaping issue, because no traffic shaping is applied to DLCI = 60. The required fix is to implement the appropriate traffic-shaping map class.
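When reading the show traffic-shape output in Example 18-29, it helps to see how the Target Rate, Interval, and Increment columns relate. The sketch below simply checks that arithmetic against the rows shown (rate multiplied by interval, converted to bytes). It is not a statement of how IOS derives every column, and the function name bytes_per_interval is hypothetical.

```python
def bytes_per_interval(rate_bps, interval_ms):
    """Bytes that can be sent in one shaping interval at the given rate."""
    bits = rate_bps * interval_ms // 1000  # bits sent during one interval
    return bits // 8                       # convert bits to bytes

# (VC, target rate in bps, interval in ms, increment in bytes) from Example 18-29.
ROWS = [
    (38, 56000, 125, 875),
    (17, 56000, 125, 875),
    (62, 384000, 125, 6000),
]

for vc, rate, interval, increment in ROWS:
    computed = bytes_per_interval(rate, interval)
    print(vc, computed, computed == increment)
```

A 56-kbps VC is therefore credited 875 bytes every 125 ms, and the 384-kbps VC on Se3/0:0.62 is credited 6000 bytes in the same interval.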

Slow Performance

The second traffic shaping issue arises when performance is lower than the user expects. In this scenario, the service is provisioned for an access rate of 384 kbps, but the performance characteristics are closer to those of a 56-kbps circuit. The service is operational, the serial lines do not report any errors, and both the remote user's and the core router's configurations are set up correctly and report normal status. The output in Example 18-30 shows the DLCI = 98 parameters.

Example 18-30. Verifying the Serial 4/0:0.98 Configuration on the Core Router
 7206-frame#show interfaces serial 4/0:0.98
 Serial4/0:0.98 is up, line protocol is up
   Hardware is Multichannel T1
   Description: 1604-frame: 10.21.56.8/29 : 23161309 : 3844600235
   Interface is unnumbered. Using address of Loopback2 (171.68.88.1)
   MTU 1500 bytes, BW 256 Kbit, DLY 20000 usec,
      reliability 255/255, txload 24/255, rxload 5/255
   Encapsulation FRAME-RELAY

The output from 7206-frame#show frame-relay pvc 98 is shown in Example 18-31.

Example 18-31. Verifying if PVC 98 Has Traffic Shaping Applied to It
 7206-frame#show frame-relay pvc 98
 DLCI = 98, DLCI USAGE = UNUSED, PVC STATUS = ACTIVE, INTERFACE = Serial4/0:0
   input pkts 167755        output pkts 167552       in bytes 13750582
   out bytes 189232810      dropped pkts 71          in FECN pkts 0
   in BECN pkts 0           out FECN pkts 0          out BECN pkts 0
   in DE pkts 167755        out DE pkts 0
   out bcast pkts 16392      out bcast bytes 4967386
   pvc create time 5d16h, last time pvc status changed 5d16h
 ! These are the parameters, defining the performance.
   cir 28000     bc 7000      be 0         limit 875    interval 125
   mincir 28000     byte increment 875   Adaptive Shaping none
   pkts 167481    bytes 189226266 pkts delayed 128118    bytes delayed 171119678
   shaping inactive
   traffic shaping drops 0
   Serial4/0:0.98 dlci 98 is first come first serve default queuing
   Output queue 0/40, 71 drop, 128118 dequeued

Next, you need to take some measurements from the interfaces Ethernet0 and Serial1 of the remote user's router, then ping from the core router or from any server in the same area with a packet size of 3000 bytes. Measure the end user's five-minute rate by entering the following two commands, as shown in Example 18-32:

 1602-frame#show interfaces ethernet 0 | include 5 min
 1602-frame#show interfaces serial 1 | include 5 min

Example 18-32. Measuring the Input and Output Rate on Ethernet0 and Serial1 Interfaces of a Remote User's Router
 1602-frame#show interfaces ethernet 0 | include 5 min
   5 minute input rate 7000 bits/sec, 13 packets/sec
   5 minute output rate 8000 bits/sec, 18 packets/sec
 1602-frame#show interfaces serial 1 | include 5 min
   5 minute input rate 4000 bits/sec, 22 packets/sec
   5 minute output rate 6000 bits/sec, 14 packets/sec
 1602-frame#

It is a good idea to monitor the RXD and TXD, and the reliability reports (reliability 255/255) of the 1602-frame router:

 Serial1 is up, line protocol is up
   Hardware is QUICC Serial (with FT1 CSU/DSU WIC)
   MTU 1500 bytes, BW 128 Kbit, DLY 20000 usec,
      reliability 255/255, txload 7/255, rxload 85/255

If you go back to review the previous outputs, you will notice that for a 384-kbps circuit, the core router reports the following:

 pvc create time 5d16h, last time pvc status changed 5d16h
 cir 28000     bc 7000      be 0         limit 875    interval 125
 mincir 28000     byte increment 875   Adaptive Shaping none

The new config inherits the default settings of Serial4/0:0, where traffic shaping is defined with no classes. The fix is easy to apply.

First, create a map class, as shown in Example 18-33.

Example 18-33. Example for Class Definition Called class-384-new
 map-class frame-relay class-384-new
  no frame-relay adaptive-shaping
  frame-relay cir 384000
  frame-relay bc 384000
  frame-relay be 128000
  frame-relay mincir 256000

Next, apply the map class, as shown in Example 18-34.

Example 18-34. The Class class-384-new Is Applied to the Interface
 interface Serial4/0:0.98 point-to-point
  description 1604-frame frame: 10.21.56.8/29 : 23161309 : 3844600235
  bandwidth 384
  ip unnumbered Loopback2
  no ip route-cache
  frame-relay class class-384-new
  frame-relay interface-dlci 98 IETF

Repeat the status commands and compare the results, as shown in Example 18-35.

Example 18-35. Check the Status of PVC 98
 7206-frame#show frame-relay pvc 98
 PVC Statistics for interface Serial4/0:0 (Frame Relay DTE)
 DLCI = 98, DLCI USAGE = LOCAL, PVC STATUS = ACTIVE, INTERFACE = Serial4/0:0.98
   input pkts 536361        output pkts 659230       in bytes 100680624
   out bytes 165617202      dropped pkts 68          in FECN pkts 0
   in BECN pkts 0           out FECN pkts 0          out BECN pkts 0
   in DE pkts 536361        out DE pkts 0
   out bcast pkts 23237      out bcast bytes 1886266
   pvc create time 2d04h, last time pvc status changed 1d05h
 ! The CIR now is 384000, bc=384000, be=128000.
   cir 384000    bc 384000 be 128000         limit 4000   interval 125
   mincir 256000    byte increment 4000  Adaptive Shaping none
   pkts 659213    bytes 122881018 pkts delayed 20415     bytes delayed 19232012
   shaping inactive
   traffic shaping drops 0
   Serial4/0:0.98 dlci 98 is first come first serve default queuing
   Output queue 0/40, 29 drop, 20415 dequeued

Finally, repeat the ping test and compare the results, as shown in Example 18-36.

Example 18-36. Measuring the Input and Output Rate on Ethernet0 and Serial1 Interfaces of the Remote User's Router, After Implementing the Map Class
 1602-frame#show interfaces ethernet 0 | include 5 min
   5 minute input rate 8000 bits/sec, 15 packets/sec
   5 minute output rate 192000 bits/sec, 30 packets/sec
 1602-frame#show interfaces serial 1 | include 5 min
   5 minute input rate 70000 bits/sec, 29 packets/sec
   5 minute output rate 6000 bits/sec, 16 packets/sec

The performance has improved significantly.
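The size of the improvement can be read directly from the five-minute rates in Examples 18-32 and 18-36. The following sketch, with the hypothetical helper rates, parses the Serial1 lines from both examples and compares the input rate before and after the map class was applied.

```python
import re

# Serial1 rate lines copied from Example 18-32 (before) and Example 18-36 (after).
SERIAL1_BEFORE = """\
5 minute input rate 4000 bits/sec, 22 packets/sec
5 minute output rate 6000 bits/sec, 14 packets/sec
"""
SERIAL1_AFTER = """\
5 minute input rate 70000 bits/sec, 29 packets/sec
5 minute output rate 6000 bits/sec, 16 packets/sec
"""

def rates(block):
    """Return (input_bps, output_bps) from the '5 minute ... rate' lines."""
    found = dict(re.findall(r"5 minute (input|output) rate (\d+) bits/sec", block))
    return int(found["input"]), int(found["output"])

in_before, _ = rates(SERIAL1_BEFORE)
in_after, _ = rates(SERIAL1_AFTER)
gain = in_after / in_before
print(in_before, in_after, gain)
```

The Serial1 input rate rises from 4000 bits/sec to 70000 bits/sec, a factor of 17.5 under the same test. These are five-minute averages, so repeat the measurement under a comparable load before drawing firm conclusions.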

Flapping Routes

Flapping routes occur during the convergence process when there is instability in the network. Different routing protocols pose different requirements for Frame Relay, but sometimes even a lack of memory on the core router when the number of subscribed users increases can cause this issue. One symptom of network instability is when trace commands use different paths to reach the destination, indicating either a slow convergence process, or a change in the topology and a related change in the routing table.

All the routers in the network must converge on the new topology when changes occur in the network. Toward this end, they begin sharing routing information, and each update nullifies the previous decision and triggers another update to the other routers. These routers, in turn, adjust their own routing tables and generate new updates, which cause flapping routes. Dealing with this situation is far more complex than it appears, and requires additional troubleshooting. One possible solution is powering down the affected routers and slowly allowing convergence in your network, one router at a time. For more information, check www.cisco.com.

Powering down production routers is not always feasible. Also, depending on the size and configuration of the routers, bringing them back router by router might not work. In extremely large networks, with large routing tables, it might be necessary to shut down all interfaces, power cycle the box, and bring it back online one interface at a time.




Troubleshooting Remote Access Networks (CCIE Professional Development)
ISBN: 1587050765
Year: 2002
Pages: 235
