Case Study: Using ping and traceroute to Isolate a BGP Meltdown Emergency


The network administrator of Super E-commerce Company reported a performance issue with its existing e-commerce network. The company network has a mix of Windows and Linux servers in the data center, which is located in the corporate office. According to the network administrators, only external customers were having trouble placing online orders. Many of them complained of long response times, and some said they had to log in several times because their sessions were being reset. Neither the test team (located on the campus) nor the employees experienced the same issue.

Because campus users were unaffected, the issue is most likely related to the WAN connection. However, the cause could be anything from a defective cable to a routing issue at the ISP.

After reaching the customer's site, the author was provided with the network layout, shown in Figure 2-9.

Figure 2-9. Logical Network Layout for Super E-commerce Co.


Based on the discussion in this chapter, the author used the following step-by-step approach for troubleshooting:

Step 1.

Connect the laptop to the server virtual LAN (VLAN) and use the ipconfig /all command to determine the IP address, default gateway, and DNS information as follows:

  Ethernet adapter Local Area Connection:

        Connection-specific DNS Suffix  . : superecommerceinc.com
        Description . . . . . . . . . . . : Broadcom 370x Controller
        Physical Address. . . . . . . . . : 00-0c-66-er-35-A6
        Dhcp Enabled. . . . . . . . . . . : Yes
        Autoconfiguration Enabled . . . . : Yes
        IP Address. . . . . . . . . . . . : 192.168.10.103
        Subnet Mask . . . . . . . . . . . : 255.255.255.0
        Default Gateway . . . . . . . . . : 192.168.10.1
        DHCP Server . . . . . . . . . . . : 192.168.10.8
        DNS Servers . . . . . . . . . . . : 192.168.10.9
        Lease Obtained. . . . . . . . . . : Wednesday, July 20, 2005 11:30:53 PM
        Lease Expires . . . . . . . . . . : Wednesday, July 27, 2005 11:30:53 PM
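When the same check has to be repeated on several machines, the values that matter here (IP address, default gateway, DNS server) can be pulled out of the ipconfig /all text automatically. The following Python sketch assumes the older Windows "Label . . . : value" layout shown above; the function name parse_ipconfig and the trimmed SAMPLE text are illustrative, not part of any Windows tool.

```python
import re

# A trimmed copy of the Step 1 output (older Windows "Label . . . : value" layout).
SAMPLE = """\
Ethernet adapter Local Area Connection:

        IP Address. . . . . . . . . . . . : 192.168.10.103
        Subnet Mask . . . . . . . . . . . : 255.255.255.0
        Default Gateway . . . . . . . . . : 192.168.10.1
        DNS Servers . . . . . . . . . . . : 192.168.10.9
"""

# Label text, then the run of dot leaders and spaces, then ": value".
FIELD = re.compile(r"^\s*([A-Za-z][A-Za-z0-9 -]*?)[ .]*:\s*(.*?)\s*$")


def parse_ipconfig(text: str) -> dict:
    """Map each labelled ipconfig field to its value, skipping section headers."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        # Headers such as "Ethernet adapter ...:" have an empty value and are skipped.
        if match and match.group(2):
            fields[match.group(1)] = match.group(2)
    return fields


info = parse_ipconfig(SAMPLE)
print(info["IP Address"], info["Default Gateway"], info["DNS Servers"])
# → 192.168.10.103 192.168.10.1 192.168.10.9
```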

Step 2.

After collecting the network information, use the traceroute command (tracert on Windows) to identify the routing hops, as follows:

  c:\windows\system32>tracert www.cisco.com

  Tracing route to www.cisco.com [198.133.219.25]
  over a maximum of 30 hops:

    1    <1 ms    <1 ms    <1 ms  192.168.10.1
    2    50 ms     9 ms     9 ms  12.12.12.1
    3    29 ms     9 ms     9 ms  12.44.197.161
    4     8 ms     8 ms     9 ms  12.244.67.78
    5    16 ms    16 ms    16 ms  www.cisco.com [198.133.219.25]
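The tracert output can also be parsed so that each hop's round-trip times are easier to compare. This Python sketch is a hypothetical helper, not a standard tool; it assumes the Windows output format above, treats "<1 ms" as 1 ms (an upper bound), and records a timed-out probe ("*") as None. Taking the best of the three probes filters out one-off spikes such as the 50 ms first probe on hop 2.

```python
import re

SAMPLE = """\
Tracing route to www.cisco.com [198.133.219.25]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  192.168.10.1
  2    50 ms     9 ms     9 ms  12.12.12.1
  3    29 ms     9 ms     9 ms  12.44.197.161
  4     8 ms     8 ms     9 ms  12.244.67.78
  5    16 ms    16 ms    16 ms  www.cisco.com [198.133.219.25]
"""

# One probe is "<1 ms", "N ms", or "*" (timed out).
PROBE = r"(<?\d+ ms|\*)"
# Hop number, three probes, then the host name and/or address.
HOP_LINE = re.compile(r"^\s*(\d+)" + (r"\s+" + PROBE) * 3 + r"\s+(.+?)\s*$")


def probe_ms(token):
    """Convert one probe token to milliseconds: '<1 ms' -> 1.0, '*' -> None."""
    if token == "*":
        return None
    return float(token.lstrip("<").split()[0])


def parse_tracert(text):
    """Return (hop_number, [rtt1, rtt2, rtt3], host) for every hop line."""
    hops = []
    for line in text.splitlines():
        match = HOP_LINE.match(line)
        if match:  # header and blank lines do not start with a hop number
            number, p1, p2, p3, host = match.groups()
            hops.append((int(number), [probe_ms(p) for p in (p1, p2, p3)], host))
    return hops


def best_rtt(probes):
    """Lowest answered RTT for a hop, or None if every probe timed out."""
    answered = [p for p in probes if p is not None]
    return min(answered) if answered else None


for number, probes, host in parse_tracert(SAMPLE):
    print(f"hop {number}: best {best_rtt(probes)} ms  {host}")
```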

Step 3.

On the Windows-based laptop, open four command-prompt sessions and tile them so that all four can be watched at once, as shown in Figure 2-10. Run one continuous ping in each session, as follows:

Session 1 uses the ping 192.168.10.103 -t command. The continuous ping monitors the network interface of the laptop.

Session 2 uses the ping 192.168.10.1 -t command. The continuous ping monitors the default gateway for the Server-Farm VLAN.

Session 3 uses the ping 192.168.1.1 -t command. The continuous ping monitors the inside interface of the firewall.

Session 4 uses the ping 12.12.12.1 -t command. The continuous ping monitors the Ethernet interface of the edge router.
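Tiling four windows works, but the same concurrent view can also be produced by one script that launches the four pings in parallel and labels each output line. This is a minimal Python sketch under stated assumptions: it relies on the system ping command (-t on Windows; Linux and macOS ping run continuously by default), and the TARGETS labels and the helper names ping_command, watch, and monitor are illustrative.

```python
import platform
import subprocess
import threading
import time

# The four Step 3 targets; the dictionary labels are illustrative.
TARGETS = {
    "S1 laptop NIC": "192.168.10.103",
    "S2 server-farm gateway": "192.168.10.1",
    "S3 firewall inside": "192.168.1.1",
    "S4 edge router Ethernet": "12.12.12.1",
}


def ping_command(target: str, system: str) -> list:
    """Build a continuous-ping command line for the given OS name."""
    if system == "Windows":
        # Windows ping stops after 4 echoes unless -t is given.
        return ["ping", "-t", target]
    # Linux and macOS ping keep running until interrupted by default.
    return ["ping", target]


def watch(label: str, target: str) -> None:
    """Run one continuous ping and prefix each output line with its label."""
    cmd = ping_command(target, platform.system())
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            print(f"[{label}] {line.rstrip()}", flush=True)


def monitor(targets: dict) -> None:
    """Ping every target concurrently until Ctrl+C is pressed."""
    for label, target in targets.items():
        threading.Thread(target=watch, args=(label, target), daemon=True).start()
    try:
        while True:
            time.sleep(1)  # keep the main thread alive; worker threads do the pinging
    except KeyboardInterrupt:
        print("Monitoring stopped.")
```

Calling monitor(TARGETS) in a terminal starts all four pings at once; press Ctrl+C to stop. Because every line carries its session label, a single scrolling log replaces the four tiled windows.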

Figure 2-10. Concurrent View of Multiple Sessions


Observation

All sessions show good response times at first, but a pattern emerges when they are observed over a longer period. Session 4 (the continuous ping to the edge router at 12.12.12.1) times out for approximately 4 seconds every 30 seconds. Apart from these timeouts on the Ethernet interface of the edge router, all other responses are consistently less than 1 ms.
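Spotting a periodic outage by eye requires watching the window for minutes at a time. Given a per-second record of which pings were answered, the outage windows and the spacing between them can be computed directly. The Python sketch below assumes one ping sample per second, and its input is a synthetic trace built to mirror the observation (4-second outages every 30 seconds), not captured data.

```python
def outage_windows(replies):
    """Return (start_second, duration) for each run of missed replies."""
    windows = []
    start = None
    for second, ok in enumerate(replies):
        if not ok and start is None:
            start = second  # an outage begins
        elif ok and start is not None:
            windows.append((start, second - start))  # the outage ends
            start = None
    if start is not None:  # trace ended during an outage
        windows.append((start, len(replies) - start))
    return windows


def start_intervals(windows):
    """Seconds between the starts of consecutive outage windows."""
    return [later[0] - earlier[0] for earlier, later in zip(windows, windows[1:])]


# Synthetic 120-second trace (not real capture data): replies fail for
# 4 seconds starting at second 10 of every 30-second cycle.
trace = [not (10 <= s % 30 < 14) for s in range(120)]
windows = outage_windows(trace)
print(windows)
print(start_intervals(windows))
```

A constant interval between outage starts, as here, is what points toward a scheduled or periodic cause rather than random loss.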

Conclusions

Based on the observations in the previous section, the conclusions are as follows:

  • The internal network, including the firewall, is functioning properly.

  • The problem is related to the edge router.

  • Because the timeouts recur at regular intervals, the problem is probably related to a software process on the router that runs every 30 seconds.
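The first two conclusions follow from a simple rule: walk the monitoring points from the laptop outward, and the fault lies at or beyond the first point that drops replies. The Python sketch below encodes that rule for the four targets from Step 3; the loss ratios are illustrative values chosen to match the observation (roughly 4 missed one-second samples out of every 30), and first_failing_hop is a hypothetical helper name. The rule only localizes the fault to a segment; it says nothing about the root cause, which is why the third conclusion remains a hypothesis.

```python
# Step 3 monitoring points, ordered from the laptop outward toward the WAN.
PATH = [
    ("laptop NIC", "192.168.10.103"),
    ("server-farm gateway", "192.168.10.1"),
    ("firewall inside interface", "192.168.1.1"),
    ("edge router Ethernet", "12.12.12.1"),
]


def first_failing_hop(path, loss):
    """Return the first (name, address) with a loss ratio above zero, or None."""
    for name, address in path:
        if loss[address] > 0.0:
            return name, address
    return None


# Illustrative loss ratios matching the observation: only the edge router drops replies.
observed = {
    "192.168.10.103": 0.0,
    "192.168.10.1": 0.0,
    "192.168.1.1": 0.0,
    "12.12.12.1": 4 / 30,
}
print(first_failing_hop(PATH, observed))
```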

Actions

Telnet into the edge router and issue a show proc cpu command to verify the CPU load, as follows:

  edge-router# show proc cpu
  CPU utilization for five seconds: 3%/0%; one minute: 82%; five minutes: 91%
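The same summary line can be checked programmatically when many routers have to be polled. This Python sketch parses the IOS line shown above; in IOS, the number after the slash in the five-second field is the share of CPU time spent at interrupt level. The 75% threshold in the hypothetical sustained_high_load helper is an arbitrary assumption. The longer averages are used because a single five-second sample (3% here) can plausibly fall between bursts of periodic work.

```python
import re
from typing import Dict, Optional

# Matches the IOS summary line, e.g.
# "CPU utilization for five seconds: 3%/0%; one minute: 82%; five minutes: 91%"
CPU_LINE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def parse_cpu(line: str) -> Optional[Dict[str, int]]:
    """Extract the four utilization figures, or return None if the line does not match."""
    match = CPU_LINE.search(line)
    if match is None:
        return None
    total_5s, interrupt_5s, one_min, five_min = (int(g) for g in match.groups())
    return {
        "five_seconds": total_5s,
        "interrupt_five_seconds": interrupt_5s,
        "one_minute": one_min,
        "five_minutes": five_min,
    }


def sustained_high_load(stats: Dict[str, int], threshold: int = 75) -> bool:
    """Flag high load from the longer averages (threshold is an assumption)."""
    return stats["one_minute"] >= threshold or stats["five_minutes"] >= threshold


SAMPLE = "CPU utilization for five seconds: 3%/0%; one minute: 82%; five minutes: 91%"
stats = parse_cpu(SAMPLE)
print(stats, sustained_high_load(stats))
```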

The one-minute and five-minute averages show a high CPU load. Further investigation reveals that the edge router was incorrectly configured with the no ip route-cache command, which disables fast switching and forces the router to process-switch every packet through the CPU. At the same time, the edge router is receiving full Border Gateway Protocol (BGP) routes from the ISP every 30 seconds. While the router processes these routes, the CPU has no capacity left for process switching, so data traffic stops being forwarded.

After the ip route-cache command was added to the configuration, the router fast-switched data traffic even during periods of high CPU load, and the problem was solved.
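To see why fast switching makes the periodic BGP work harmless to data traffic, consider a deliberately simplified Python model. It sends one packet per second and assumes the CPU is fully occupied by BGP processing for 4 seconds of every 30-second cycle (the 4-second figure is taken from the observed timeouts). Process-switched packets need the CPU and are dropped while it is busy; fast-switched packets are modeled as unaffected. Real IOS behavior is more nuanced (for example, the route cache must first be populated), so treat this as an illustration only.

```python
def forwarded(seconds: int, switching: str, bgp_period: int = 30, bgp_busy: int = 4) -> int:
    """Count packets forwarded in a toy model that sends one packet per second.

    Assumption: the CPU is fully busy with BGP for the first `bgp_busy`
    seconds of every `bgp_period`-second cycle.
    """
    if switching not in ("process", "fast"):
        raise ValueError("switching must be 'process' or 'fast'")
    count = 0
    for s in range(seconds):
        cpu_busy = s % bgp_period < bgp_busy
        # Process switching needs the CPU for every packet; the fast path
        # (route cache) is modeled as unaffected by the busy CPU.
        if switching == "fast" or not cpu_busy:
            count += 1
    return count


print("process-switched:", forwarded(60, "process"), "of 60")
print("fast-switched:   ", forwarded(60, "fast"), "of 60")
```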

Once again, the super ping command saves the world!



Network Administrators Survival Guide
ISBN: 1587052113
Year: 2006
Pages: 106
