Troubleshooting EIGRP | Routing TCP/IP (Vol. 11998)

Troubleshooting the exchange of IGRP or RIP route information is a reasonably simple procedure. Routing updates are either propagated or they are not, and they either contain accurate information or they do not. The added complexity of EIGRP means an added complexity to the troubleshooting procedure. Neighbor tables and adjacencies must be verified, the query/response procedure of DUAL must be followed, and the influences of VLSM on automatic summarization must be considered.

This section's case study describes a sequence of events that typically can be used when pursuing an EIGRP problem. Following the case study is a discussion of an occasional cause of instabilities in larger EIGRP internets.

Case Study: A Missing Neighbor

Figure 8.45 shows a small EIGRP internetwork. Users are complaining that subnet 192.168.16.224/28 is unreachable. An examination of the route tables reveals that something is wrong at router Grissom (Figure 8.46).

Figure 8.45. Subnet 192.168.16.224/28 is not reachable through Grissom in this example of an EIGRP internetwork.

graphics/08fig45.gif

^[17]

^[17] When troubleshooting an internetwork, it is a good practice to verify that the addresses of all router interfaces belong to the correct subnet.

Figure 8.46. The route tables of Shepard and Grissom show that Grissom's EIGRP process is not advertising or receiving routes on subnet 192.168.16.16/28.

graphics/08fig46.gif

The following observations are made from the two route tables of Figure 8.46:

Shepard does not have subnets 192.168.16.40/30 and 192.168.16.224/28 in its route table, although Grissom does.
Grissom's route table does not contain any of the subnets that should be advertised by Glenn or Shepard.
Shepard's route table contains the subnets advertised by Glenn (and Glenn's table contains the subnets advertised by Shepard, although its route table is not included in the figure).

The conclusion to be drawn from these observations is that Grissom is not advertising or receiving routes correctly over subnet 192.168.16.16/28.

Among the possible causes, the simplest causes should be examined first. These are:

An incorrect interface address or mask
An incorrect EIGRP process ID
A missing or incorrect network statement

In this case, there are no EIGRP or address configuration errors.
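The first of these checks, that an interface address and mask place the interface on the intended subnet, can be sketched with Python's standard ipaddress module. This is only an illustration: the function name on_subnet is hypothetical, the address is Grissom's Ethernet address as it appears later in this case study, and the mistyped mask in the second call is invented to show a failure.

```python
import ipaddress

def on_subnet(address: str, mask: str, subnet: str) -> bool:
    """Return True if address/mask places the interface on the expected subnet."""
    interface = ipaddress.ip_interface(f"{address}/{mask}")
    return interface.network == ipaddress.ip_network(subnet)

# Grissom's Ethernet address and mask, checked against the shared subnet.
print(on_subnet("192.168.16.19", "255.255.255.240", "192.168.16.16/28"))  # True

# A mistyped mask (hypothetical) yields a different network, so the check fails.
print(on_subnet("192.168.16.19", "255.255.255.252", "192.168.16.16/28"))  # False
```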

Next, the neighbor tables should be examined. Looking at the neighbor tables at Grissom, Shepard, and Glenn (Figure 8.47), two facts stand out:

Grissom (192.168.16.19) is in its neighbors' tables, but its neighbors are not in Grissom's neighbor table.
The entire internetwork has been up for more than five hours; this information is reflected in the uptime statistic for all neighbors except Grissom. However, Grissom's uptime shows approximately one minute.

Figure 8.47. The neighbor tables of Grissom, Shepard, and Glenn.

graphics/08fig47.gif

If Grissom is in Shepard's neighbor table, Shepard must be receiving Hellos from it. Grissom, however, is apparently not receiving Hellos from Shepard. Without this two-way exchange of Hello packets, an adjacency will not be established and route information will not be exchanged.

A closer examination of Shepard's and Glenn's neighbor tables reinforces this hypothesis:

The SRTT for Grissom is 0, indicating that a packet has never made the round-trip.
The RTO for Grissom has increased to five and eight seconds, respectively.
There is a packet enqueued for Grissom (Q Cnt).
The sequence number recorded for Grissom is 0, indicating that no reliable packets have ever been received from it.

These factors indicate that the two routers are trying to send a packet reliably to Grissom, but are not receiving an ACK.
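The indicators just listed can be collected into a simple check. The following is a minimal sketch, not a parser for real router output: the NeighborEntry fields and the function name are hypothetical, and the numeric values are illustrative apart from the zero SRTT, the zero sequence number, the queued packet, and the five-second RTO described above.

```python
from dataclasses import dataclass

@dataclass
class NeighborEntry:
    """One row of a neighbor table, reduced to the fields discussed above."""
    address: str
    uptime_s: int
    srtt_ms: int
    rto_ms: int
    q_cnt: int
    seq_num: int

def one_way_symptoms(entry: NeighborEntry) -> list[str]:
    """List the signs that reliable packets to this neighbor are never acknowledged."""
    symptoms = []
    if entry.srtt_ms == 0:
        symptoms.append("SRTT is 0: no packet has completed a round trip")
    if entry.q_cnt > 0:
        symptoms.append("Q Cnt is nonzero: packets are waiting to be sent")
    if entry.seq_num == 0:
        symptoms.append("sequence number is 0: no reliable packet received")
    return symptoms

# Grissom as seen from Shepard; uptime of about one minute is taken from the text.
grissom = NeighborEntry("192.168.16.19", uptime_s=60, srtt_ms=0,
                        rto_ms=5000, q_cnt=1, seq_num=0)
# A healthy neighbor with invented values, for contrast.
healthy = NeighborEntry("192.168.16.18", uptime_s=18000, srtt_ms=12,
                        rto_ms=200, q_cnt=0, seq_num=47)

print(len(one_way_symptoms(grissom)))  # 3
print(one_way_symptoms(healthy))       # []
```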

In Figure 8.48, debug eigrp packets is used at Shepard to get a better look at what is happening. By itself, this command displays all EIGRP packet types, so a second debug command is used with it: debug ip eigrp neighbor 75 192.168.16.19. This command adds a filter to the first command. It tells debug eigrp packets to display only IP packets of EIGRP 75 (the process ID of the routers in Figure 8.45) and only those packets that concern neighbor 192.168.16.19 (Grissom).

Figure 8.48. The command debug ip eigrp neighbor is used to control the packets displayed by debug eigrp packets.

graphics/08fig48.gif

Figure 8.48 shows that Hello packets are being received from Grissom. It also shows that Shepard is attempting to send updates to Grissom; Grissom is not acknowledging them. After the 16th retry, the message "Retransmission retry limit exceeded" is displayed. This exceeded limit accounts for the low uptime shown for Grissom in the neighbor tables: when the retransmission retry limit is exceeded, Grissom is removed from the neighbor table. But because Hellos are still being received from Grissom, it quickly reappears in the table and the process begins again.

Figure 8.49 shows the output from debug eigrp neighbors at Shepard. This command is not IP specific, but instead shows EIGRP neighbor events. Here, two instances of the events described in the previous paragraph are displayed: Grissom is declared dead as the retransmission limit is exceeded but is immediately "revived" when its next Hello is received.

Figure 8.49. Debug eigrp neighbors displays neighbor events.

graphics/08fig49.gif

Although Figure 8.48 shows that update packets are being sent to Grissom, observation of EIGRP packets at that router shows that they are not being received (Figure 8.50). Because Grissom is successfully exchanging Hellos with Cooper, Grissom's EIGRP process must be working. Suspicion therefore falls on Grissom's Ethernet interface. An inspection of the configuration file shows that an access list is configured as an incoming filter on E0:

 
 interface Ethernet0
  ip address 192.168.16.19 255.255.255.240
  ip access-group 150 in
 !
 !
 access-list 150 permit tcp any any established
 access-list 150 permit tcp any host 192.168.16.238 eq ftp
 access-list 150 permit tcp host 192.168.16.201 any eq telnet
 access-list 150 permit tcp any host 192.168.16.230 eq pop3
 access-list 150 permit udp any any eq snmp
 access-list 150 permit icmp any 192.168.16.224 0.0.0.15

Figure 8.50. Grissom is exchanging Hellos with Cooper via interface S0 and is sending Hellos out E0. However, Grissom is not receiving any EIGRP packets on interface E0.

graphics/08fig50.gif

When EIGRP packets are received at Grissom's E0 interface, they are first filtered through access list 150. They match no entry on the list and are therefore dropped by the implicit deny that ends every access list. The problem is resolved (Figure 8.51) by adding the following entry to the access list:

 
 access-list 150 permit eigrp 192.168.16.16 0.0.0.15 any
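Why the original list drops EIGRP, and why the new entry fixes it, can be illustrated with a rough model of first-match evaluation. The sketch below makes several simplifying assumptions: port and "established" qualifiers are omitted, because they can only narrow a match and EIGRP is neither TCP nor UDP; wildcard masks are written as equivalent prefix lengths; and the neighbor source address 192.168.16.17 is a hypothetical address on the shared subnet. The helper names permit and is_permitted are invented for the example. The destination 224.0.0.10 is the EIGRP multicast address.

```python
import ipaddress
from typing import NamedTuple

ANY = "0.0.0.0/0"

class Entry(NamedTuple):
    protocol: str
    source: ipaddress.IPv4Network
    destination: ipaddress.IPv4Network

def permit(protocol: str, source: str, destination: str) -> Entry:
    """Build one permit entry from a protocol and two networks."""
    return Entry(protocol, ipaddress.ip_network(source),
                 ipaddress.ip_network(destination))

# Access list 150 as configured on Grissom, without port qualifiers.
ACL_150 = [
    permit("tcp", ANY, ANY),                   # established
    permit("tcp", ANY, "192.168.16.238/32"),   # eq ftp
    permit("tcp", "192.168.16.201/32", ANY),   # eq telnet
    permit("tcp", ANY, "192.168.16.230/32"),   # eq pop3
    permit("udp", ANY, ANY),                   # eq snmp
    permit("icmp", ANY, "192.168.16.224/28"),
]

# The entry added to resolve the problem.
EIGRP_FIX = permit("eigrp", "192.168.16.16/28", ANY)

def is_permitted(acl: list[Entry], protocol: str, source: str, destination: str) -> bool:
    """Return True on the first matching permit entry; otherwise implicit deny."""
    src = ipaddress.ip_address(source)
    dst = ipaddress.ip_address(destination)
    for entry in acl:
        if entry.protocol == protocol and src in entry.source and dst in entry.destination:
            return True
    return False  # nothing matched: the implicit deny drops the packet

# An EIGRP packet from a neighbor on the Ethernet, before and after the fix.
print(is_permitted(ACL_150, "eigrp", "192.168.16.17", "224.0.0.10"))                # False
print(is_permitted(ACL_150 + [EIGRP_FIX], "eigrp", "192.168.16.17", "224.0.0.10"))  # True
```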

Figure 8.51. When an entry is added to the access list to permit EIGRP packets, Grissom's neighbor and route tables show that it now has routes to all subnets.

graphics/08fig51.gif

Stuck-in-Active Neighbors

When a route goes active and queries are sent to neighbors, the route will remain active until a reply is received for every query. But what happens if a neighbor is dead or otherwise incapacitated and cannot reply? The route would stay permanently active. The active timer is designed to prevent this situation. The timer is set when a query is sent. If the timer expires before a reply to the query is received, the route is declared stuck-in-active, the neighbor is presumed dead, and it is flushed from the neighbor table ^[18]. The SIA route and any other routes via that neighbor are eliminated from the route table. DUAL will be satisfied by considering the neighbor to have replied with an infinite metric.

^[18] As mentioned previously, the default active time is 3 minutes and can be changed with the command timers active-time.
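The behavior of the active timer can be summarized in a few lines. This is a simplified sketch of the logic described above, not the actual DUAL implementation: the function name and its arguments are hypothetical, the neighbor address is a sample value, and the exact moment of expiry is an assumption.

```python
INFINITE_METRIC = float("inf")
DEFAULT_ACTIVE_TIME_S = 180  # the 3-minute default active time

def expire_active_route(pending_replies, elapsed_s, active_time_s=DEFAULT_ACTIVE_TIME_S):
    """Return (neighbors to flush, replies DUAL assumes) for one active route.

    pending_replies is the set of neighbors that have not yet answered the query.
    """
    if elapsed_s < active_time_s or not pending_replies:
        # Timer still running, or every reply has arrived: nothing to flush.
        return set(), {}
    flushed = set(pending_replies)
    # Each silent neighbor is treated as having replied with an infinite metric.
    assumed = {neighbor: INFINITE_METRIC for neighbor in flushed}
    return flushed, assumed

print(expire_active_route({"10.1.2.1"}, elapsed_s=41))   # (set(), {})
print(expire_active_route({"10.1.2.1"}, elapsed_s=180))  # ({'10.1.2.1'}, {'10.1.2.1': inf})
```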

In reality, this sequence of events should never happen. The loss of Hellos should identify a disabled neighbor long before the active timer expires.

But what happens in large EIGRP networks where a query might, like the bunny in the battery advertisement, keep going and going? Remember that queries cause the diffusing calculation to grow larger, whereas replies cause it to grow smaller (refer to Figure 8.10). Queries must eventually reach the edge of the internetwork, and replies must eventually begin coming back, but if the diameter of the diffusing calculation grows large enough, an active timer may expire before all replies are received. The result, flushing a legitimate neighbor from the neighbor table, is obviously destabilizing.

When neighbors mysteriously disappear from neighbor tables and then reappear, or users complain of intermittently unreachable destinations, SIA routes may be the culprit. Checking the error logs of routers is a good way to find out whether SIAs have occurred (Figure 8.52).

Figure 8.52. The final entry of this error log shows a SIA message.

graphics/08fig52.gif

When chasing the cause of SIAs, close attention should be paid to the topology table in routers. If routes can be "caught" in the active state, the neighbors from whom replies have not yet been received should be noted. For example, Figure 8.53 shows a topology table in which several routes are active. Notice that most of them have been active for 15 seconds and that one (10.6.1.0) has been active for 41 seconds.

Figure 8.53. This topology table shows several active routes, all of which are waiting for a reply from neighbor 10.1.2.1.

graphics/08fig53.gif

Notice also that in each case, the neighbor 10.1.2.1 has its reply status flag (r) set. That is the neighbor from which replies have not yet been received. There may be no problem with the neighbor itself or with the link to the neighbor, but this information points to the direction within the internetwork topology in which the investigation should proceed.
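When many routes are active at once, tallying the neighbors with outstanding replies makes the direction of the problem easier to see. The sketch below assumes the topology table has already been reduced to a mapping from each active route to the neighbors whose reply status flag is set. Apart from 10.6.1.0 and 10.1.2.1, the routes and neighbors shown are hypothetical placeholders, as is the function name.

```python
from collections import Counter

def pending_reply_counts(active_routes):
    """Count, per neighbor, the active routes still waiting for its reply."""
    counts = Counter()
    for neighbors in active_routes.values():
        counts.update(neighbors)
    return counts

# Active route -> neighbors with the reply status flag (r) set.
active_routes = {
    "10.6.1.0": ["10.1.2.1"],
    "10.3.1.0": ["10.1.2.1"],              # hypothetical route
    "10.4.1.0": ["10.1.2.1", "10.1.5.2"],  # hypothetical route and second neighbor
}

# The neighbor named most often marks the direction in which to investigate.
print(pending_reply_counts(active_routes).most_common(1))  # [('10.1.2.1', 3)]
```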

Common causes of SIAs in larger EIGRP internetworks are heavily congested, low-bandwidth data links and routers with low memory or overutilized CPUs. The problem will be exacerbated if these limited resources must handle very large numbers of queries.

The careless adjustment of the bandwidth parameter on interfaces may be another cause of SIAs. Recall that EIGRP is designed to use no more than 50% of the available bandwidth of a link. This restriction means that EIGRP's pacing is keyed to the configured bandwidth. If the bandwidth is set artificially low in an attempt to manipulate routing choices, the EIGRP process may be starved. If IOS 11.2 or later is being run, the command ip bandwidth-percent eigrp may be used to adjust the percentage of bandwidth used.

Note

Changing the percentage of bandwidth used by EIGRP

For example, suppose that an interface is connected to a 56K serial link, but the bandwidth is set to 14K. EIGRP would limit itself to 50% of this amount, or 7K. The following commands adjust the EIGRP bandwidth percent to 200%, that is, 200% of 14K, which is 50% of the actual bandwidth of the 56K link:

 
 interface Serial3
  ip address 172.18.107.210 255.255.255.240
  bandwidth 14
  ip bandwidth-percent eigrp 1 200
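The arithmetic behind this configuration can be verified with a short calculation. The function name below is hypothetical; the figures are the ones used in the example above.

```python
def eigrp_bandwidth_kbps(configured_kbps, percent=50):
    """Bandwidth EIGRP may use, as a percentage of the configured bandwidth."""
    return configured_kbps * percent / 100

print(eigrp_bandwidth_kbps(14))       # 7.0  (default 50% of the configured 14K)
print(eigrp_bandwidth_kbps(14, 200))  # 28.0 (200% of the configured 14K)
print(eigrp_bandwidth_kbps(56))       # 28.0 (50% of the actual 56K link)
```

The last two results are equal, which is why a setting of 200% restores EIGRP to half of the real link capacity.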

Increasing the active timer period with the timers active-time command may help avoid SIAs in some situations, but this step should not be taken without careful consideration of the effects it may have on reconvergence.

A good internetwork design is the best solution to instabilities such as SIA routes. By using a combination of intelligent address assignment, route filtering, default routes, and summarization, boundaries may be constructed in a large EIGRP internetwork to restrict the size and scope of diffusing computations. Chapter 13, "Route Filtering," includes an example of such a design.