Step 4: Create an Action Plan

Troubleshooting WAN performance issues can be very difficult. As any good and honest network engineer will tell you, without the proper tools, solving these types of problems is generally a hit-or-miss process. However, solid experience in the trenches of network troubleshooting can help isolate and eventually resolve performance problems. The action plan was to identify and correct the reason the router was dropping so many packets. From our past experience and with help from Cisco documentation, we knew that drops can be corrected by increasing the output queue on the WAN interface or by turning off WFQ (Weighted Fair Queuing) with the no fair-queue command.
Step 5: Implement the Action Plan

Hold Queues

In an effort to control the number of drops on Router C and Router D, the hold queues were increased to 300 on both routers. Our experience with other routers in our network showed this to work well. The steps to do this were:

    ROUTER_C#conf t
    Enter configuration commands, one per line. End with CNTL/Z.
    ROUTER_C(config-if)#hold-queue 300 out
    ROUTER_C(config-if)#no fair-queue

In this particular situation, the change had negligible performance impact, and we found that at times performance actually got somewhat worse. It appeared that we were now dropping fewer frames and, as a result, saturating an already overloaded circuit with even more traffic. It soon became clear that we needed to better understand the traffic flow between Downtown and Headquarters. We reasoned that if we could determine the source and destination IP addresses being passed, we might be able to isolate the problem.

IP Accounting

Because we had no access to a sniffer or probe on these circuits, we enabled IP accounting on the serial interface of Router C to determine whether a particular address was generating the majority of the traffic. Router C was chosen because it was closest to the impacted users Downtown, and we believed that it could provide the most relevant information for this problem. The steps to do this were as follows:

    ROUTER_C# conf t
    Enter configuration commands, one per line. End with CNTL/Z.
    ROUTER_C(config)#int s0/0:0.1
    ROUTER_C(config-if)#ip accounting

The results showed that a tremendous amount of the utilization (approximately 30 percent) was coming from a system on the LAN segment. In particular, these packets were directed broadcasts from this device (destination 177.2.4.255). Directed broadcasts are a special type of broadcast, used often in the MS Windows WINS environment. Directed broadcasts can create a problem if an excessive number of them traverse the WAN environment.
By contrast, normal broadcasts (that is, destination 255.255.255.255) stay on the local LAN and do not impact WAN routing or performance. The relevant show ip accounting output is as follows:

    ROUTER_C# show ip accounting
    Source       Destination    Packets    Bytes
    177.1.1.7    177.2.4.255    100322     53732122
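When the accounting table holds many entries, picking out the heaviest source/destination pair by eye becomes tedious. The following is a minimal Python sketch, not part of the original case study, that parses text in the four-column layout shown above and ranks the pairs by byte count. The names parse_ip_accounting, top_talkers, and AccountingRow are hypothetical, and the sketch assumes every data row has exactly four whitespace-separated fields with numeric packet and byte counters.

```python
from typing import List, NamedTuple


class AccountingRow(NamedTuple):
    source: str
    destination: str
    packets: int
    byte_count: int


def parse_ip_accounting(text: str) -> List[AccountingRow]:
    """Parse four-column accounting rows; skip prompts, headers, and blanks."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: a data row is exactly four fields, the last two numeric.
        if len(fields) == 4 and fields[2].isdigit() and fields[3].isdigit():
            rows.append(
                AccountingRow(fields[0], fields[1], int(fields[2]), int(fields[3]))
            )
    return rows


def top_talkers(rows: List[AccountingRow], n: int = 5) -> List[AccountingRow]:
    """Return the n source/destination pairs with the highest byte counts."""
    return sorted(rows, key=lambda r: r.byte_count, reverse=True)[:n]


# The single row reported in the case study.
SAMPLE = """\
ROUTER_C# show ip accounting
Source       Destination    Packets    Bytes
177.1.1.7    177.2.4.255    100322     53732122
"""

rows = parse_ip_accounting(SAMPLE)
for row in top_talkers(rows):
    average_size = row.byte_count / row.packets  # average bytes per packet
    print(row.source, row.destination, row.packets, row.byte_count,
          round(average_size, 1))
```

Sorting by bytes rather than packets matches the goal here, which was to find what was consuming circuit bandwidth; sorting by packets would be the better choice when chasing per-packet processing load.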
Step 6: Gather Results

Site documentation showed us that the Ethernet segments connected to the Downtown site were multi-netted (that is, they had multiple logical IP segments on the same physical wire). Figure 8-7 shows the Downtown LAN topology.

IP Accounting Data Analysis

Because Router C and Router D both had been configured in an OSPF stub area, they automatically forwarded any packets for unknown destinations through their default route (the serial interface), namely to Router A. The impact was an extremely high traffic load on Router C's and Router D's WAN links to Router A. Previously, this had been seen on the 0K CIR links to Router B. The problem was resolved once secondary addresses that correctly reflected the multi-netted configuration of the Downtown location were put on Router C and Router D.
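The forwarding behavior described above can be illustrated with a small Python sketch built on the standard ipaddress module. It is a deliberate simplification and not how IOS makes forwarding decisions: a destination that falls inside a network configured on the router's interfaces is treated as local, and anything else follows the default route. The 177.1.1.0/24 primary network is an assumption inferred from the source host 177.1.1.7 (the case study does not give the primary address or mask); 177.2.4.1 255.255.255.0 is the secondary address used in this case study. The function name forwarding_decision is hypothetical.

```python
import ipaddress


def forwarding_decision(dest, connected_networks):
    """Return 'local' if dest lies in a connected network, else 'default-route'."""
    addr = ipaddress.ip_address(dest)
    for net in connected_networks:
        if addr in net:
            return "local"
    return "default-route"


directed_broadcast = "177.2.4.255"

# Assumption: the Ethernet interface's primary network is 177.1.1.0/24.
primary_only = [ipaddress.ip_network("177.1.1.0/24")]

# The secondary address from the case study puts 177.2.4.0/24 on the interface.
secondary = ipaddress.ip_interface("177.2.4.1/255.255.255.0").network
with_secondary = primary_only + [secondary]

print("before:", forwarding_decision(directed_broadcast, primary_only))
print("after: ", forwarding_decision(directed_broadcast, with_secondary))

# 177.2.4.255 is the broadcast address of the secondary network.
print(secondary.broadcast_address == ipaddress.ip_address(directed_broadcast))
```

Before the secondary address exists, the router has no connected network containing 177.2.4.255, so the packet follows the default route across the WAN; afterward, the same destination resolves to a locally attached network.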
The commands to put the secondary IP addresses on Router C and Router D were as follows:

    ROUTER_C#conf t
    Enter configuration commands, one per line. End with CNTL/Z.
    ROUTER_C(config)#int e0
    ROUTER_C(config-if)#ip address 177.2.4.1 255.255.255.0 secondary

When these secondary addresses were in place, Routers C and D knew to keep localized broadcast traffic local. IP accounting was run again on the routers, which confirmed that directed broadcasts were no longer being propagated through them, because both Router C and Router D now had correct IP addresses for all local multi-netted networks. Performance greatly improved, as evidenced by the now stabilized number of drops and by PVC utilization returning to normal levels. We phoned the users who had reported the initial problem, and they confirmed that performance was back to normal.

Case Study Conclusion and Design Tips

An essential approach in troubleshooting and eventually correcting this problem was to follow a structured troubleshooting methodology. Using the seven steps to troubleshooting as a guide, we corrected these problems in an orderly fashion.