Troubleshooting EIGRP | Routing TCP/IP, Volume 1 (2nd Edition)

Troubleshooting the exchange of RIP route information is a reasonably simple procedure. Routing updates are either propagated or they are not, and they either contain accurate information or they do not. EIGRP's added complexity carries over into the troubleshooting procedure. Neighbor tables and adjacencies must be verified, the query/response procedure of DUAL must be followed, and the influences of VLSM on automatic summarization must be considered.

This section's case study describes a sequence of steps that can typically be followed when pursuing an EIGRP problem. Following the case study is a discussion of an occasional cause of instabilities in larger EIGRP internets.

Case Study: A Missing Neighbor

Figure 7-38 shows a small EIGRP network. Users are complaining that subnet 192.168.16.224/28 is unreachable. An examination of the route tables reveals that something is wrong at router Grissom (Example 7-42).^[18]

^[18] When troubleshooting a network, it is a good practice to verify that the addresses of all router interfaces belong to the correct subnet.
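The check recommended in footnote 18 is easy to script. The following Python sketch uses the standard ipaddress module; the interface list is built from addresses that appear in this case study, and Grissom's Serial0 address (.41) is an inference from its neighbor 192.168.16.42 on 192.168.16.40/30 rather than something shown directly.

```python
import ipaddress

# Addresses from this case study. Grissom's Serial0 address (.41) is inferred
# from its neighbor 192.168.16.42 on 192.168.16.40/30.
interfaces = {
    ("Grissom", "Ethernet0"): ("192.168.16.19/28", "192.168.16.16/28"),
    ("Grissom", "Serial0"): ("192.168.16.41/30", "192.168.16.40/30"),
    ("Shepard", "Ethernet0"): ("192.168.16.18/28", "192.168.16.16/28"),
    ("Glenn", "Ethernet0"): ("192.168.16.17/28", "192.168.16.16/28"),
}

def misaddressed(table):
    """Return the interfaces whose address/mask does not fall on the intended subnet."""
    return [
        name
        for name, (addr, intended) in table.items()
        if ipaddress.ip_interface(addr).network != ipaddress.ip_network(intended)
    ]

print(misaddressed(interfaces))  # []

# A mistyped address (or mask) puts the interface on a different subnet:
typo = {("Grissom", "Ethernet0"): ("192.168.16.35/28", "192.168.16.16/28")}
print(misaddressed(typo))  # [('Grissom', 'Ethernet0')]
```

A wrong mask is caught the same way: 192.168.16.19/24 yields network 192.168.16.0/24, which does not equal 192.168.16.16/28.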

Figure 7-38. Subnet 192.168.16.224/28 is not reachable through Grissom in this example of an EIGRP network.

Example 7-42. The route tables of Shepard and Grissom show that Grissom's EIGRP process is not advertising or receiving routes on subnet 192.168.16.16/28.

Grissom#show ip route
Codes: C - connected, S - static, I - IGRP, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
       i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, * - candidate default
       U - per-user static route, o - ODR

Gateway of last resort is not set

     192.168.16.0/24 is variably subnetted, 3 subnets, 2 masks
C       192.168.16.40/30 is directly connected, Serial0
C       192.168.16.16/28 is directly connected, Ethernet0
D       192.168.16.224/28 [90/2195456] via 192.168.16.42, 01:07:26, Serial0
_______________________________________________________________________________
Shepard#show ip route
Codes: C - connected, S - static, I - IGRP, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
       i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, * - candidate default
       U - per-user static route, o - ODR

Gateway of last resort is not set

     192.168.16.0/24 is variably subnetted, 4 subnets, 2 masks
C       192.168.16.36/30 is directly connected, Serial0
C       192.168.16.16/28 is directly connected, Ethernet0
D       192.168.16.192/28 [90/2297856] via 192.168.16.38, 01:07:20, Serial0
D       192.168.16.128/28 [90/307200] via 192.168.16.17, 01:07:20, Ethernet0

The following observations are made from the two route tables of Example 7-42:

Shepard does not have subnets 192.168.16.40/30 and 192.168.16.224/28 in its route table, although Grissom does.
Grissom's route table does not contain any of the subnets that should be advertised by Glenn or Shepard.
Shepard's route table contains the subnets advertised by Glenn (and Glenn's table contains the subnets advertised by Shepard, although its route table is not included in Example 7-42).

The conclusion to be drawn from these observations is that Grissom is not advertising or receiving routes correctly over subnet 192.168.16.16/28.
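The observations above amount to a set comparison between two route tables. A minimal Python sketch, assuming the route lines are copied from Example 7-42 with the header and codes removed; the regular expression only extracts the source code and prefix of each route.

```python
import re

# Route-table lines copied from Example 7-42 (header and codes omitted).
grissom = """
C       192.168.16.40/30 is directly connected, Serial0
C       192.168.16.16/28 is directly connected, Ethernet0
D       192.168.16.224/28 [90/2195456] via 192.168.16.42, 01:07:26, Serial0
"""
shepard = """
C       192.168.16.36/30 is directly connected, Serial0
C       192.168.16.16/28 is directly connected, Ethernet0
D       192.168.16.192/28 [90/2297856] via 192.168.16.38, 01:07:20, Serial0
D       192.168.16.128/28 [90/307200] via 192.168.16.17, 01:07:20, Ethernet0
"""

# One route per line: a source code (C, D, ...) followed by a prefix.
ROUTE = re.compile(r"^([A-Z]+)\s+(\d+\.\d+\.\d+\.\d+/\d+)", re.MULTILINE)

def prefixes(table):
    return {prefix for _code, prefix in ROUTE.findall(table)}

only_grissom = prefixes(grissom) - prefixes(shepard)
only_shepard = prefixes(shepard) - prefixes(grissom)
print(sorted(only_grissom))  # ['192.168.16.224/28', '192.168.16.40/30']
print(sorted(only_shepard))  # ['192.168.16.128/28', '192.168.16.192/28', '192.168.16.36/30']
```

Only the shared Ethernet subnet, 192.168.16.16/28, appears in both tables, and it is directly connected on both routers, which is consistent with no routes crossing that subnet in either direction.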

Among the possible causes, the simplest should be examined first:

An incorrect interface address or mask
An incorrect EIGRP process ID
A missing or incorrect network statement

In this case, there are no EIGRP or address configuration errors.
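The network-statement check reduces to wildcard-mask matching: bits set in the wildcard are "don't care," and all other bits of the interface address must equal the statement's address. The following Python sketch illustrates this. The network statements in it are hypothetical, because the case study does not show the routers' EIGRP configurations; the process ID 75 is the one reported in the routers' neighbor tables.

```python
import ipaddress

def network_statement_covers(stmt_addr, wildcard, iface_addr):
    """True if iface_addr matches 'network <stmt_addr> <wildcard>':
    wildcard bits are ignored, all other bits must be equal."""
    a = int(ipaddress.IPv4Address(stmt_addr))
    w = int(ipaddress.IPv4Address(wildcard))
    i = int(ipaddress.IPv4Address(iface_addr))
    care = ~w & 0xFFFFFFFF
    return (a & care) == (i & care)

# Hypothetical "network 192.168.16.0"; for a Class C network this behaves
# like wildcard 0.0.0.255 and covers Grissom's Ethernet0 address.
print(network_statement_covers("192.168.16.0", "0.0.0.255", "192.168.16.19"))  # True
# A mistyped statement would leave Ethernet0 out of EIGRP:
print(network_statement_covers("192.168.61.0", "0.0.0.255", "192.168.16.19"))  # False
# A statement narrowed to the serial subnet only would also miss Ethernet0:
print(network_statement_covers("192.168.16.40", "0.0.0.3", "192.168.16.19"))   # False

# The process ID (autonomous system number) must also match on every router.
process_ids = {"Grissom": 75, "Shepard": 75, "Glenn": 75}
print(len(set(process_ids.values())) == 1)  # True
```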

Next, the neighbor tables should be examined. Looking at the neighbor tables at Grissom, Shepard, and Glenn (Example 7-43), two facts stand out:

Grissom (192.168.16.19) is in its neighbors' tables, but its neighbors are not in Grissom's neighbor table.
The entire network has been up for more than five hours; this information is reflected in the uptime statistic for all neighbors except Grissom. However, Grissom's uptime shows approximately one minute.

Example 7-43. Shepard and Glenn see Grissom as a neighbor, but Grissom does not see them. This suggests that Shepard and Glenn are receiving Hellos from Grissom, but Grissom is not receiving Hellos from Shepard and Glenn.

Grissom#show ip eigrp neighbors
IP-EIGRP neighbors for process 75
H   Address           Interface   Hold    Uptime    SRTT    RTO    Q    Seq
                                  (sec)             (ms)          Cnt   Num
0   192.168.16.42     Se0           11    05:27:11   23     200    0    8
___________________________________________________________________________
Shepard#show ip eigrp neighbors
IP-EIGRP neighbors for process 75
H   Address           Interface   Hold    Uptime    SRTT    RTO    Q    Seq
                                  (sec)             (ms)          Cnt   Num
1   192.168.16.19     Et0           12    00:01:01    0    5000    1    0
2   192.168.16.17     Et0           11    05:27:33    8     200    0    6
0   192.168.16.38     Se0           14    05:27:34   22     200    0    10
___________________________________________________________________________
Glenn#show ip eigrp neighbors
IP-EIGRP neighbors for process 75
H   Address           Interface   Hold    Uptime    SRTT    RTO    Q    Seq
                                  (sec)             (ms)          Cnt   Num
1   192.168.16.19     Et0           14    00:00:59    0    8000    1    0
2   192.168.16.18     Et0           10    05:30:11    9      20    0    7
0   192.168.16.130    Et1           12    05:30:58    6      20    0    7

If Grissom is in Shepard's neighbor table, Shepard must be receiving Hellos from it. Grissom, however, is apparently not receiving Hellos from Shepard. Without this two-way exchange of Hello packets, an adjacency will not be established and route information will not be exchanged.
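The reasoning in the preceding paragraph can be applied mechanically: if router A lists B as a neighbor but B does not list A, Hellos are flowing from B to A but not from A to B. A minimal Python sketch, assuming each router is keyed by its own Ethernet0 address (Glenn .17, Shepard .18, Grissom .19) and that only the Ethernet0 neighbors from Example 7-43 are included.

```python
# Ethernet0 neighbors from Example 7-43, keyed by each router's own address.
heard_from = {
    "192.168.16.17": {"192.168.16.18", "192.168.16.19"},  # Glenn
    "192.168.16.18": {"192.168.16.17", "192.168.16.19"},  # Shepard
    "192.168.16.19": set(),                                # Grissom
}

def one_way_pairs(table):
    """Pairs (a, b) where a hears Hellos from b, but b does not hear a."""
    return sorted(
        (a, b)
        for a, nbrs in table.items()
        for b in nbrs
        if b in table and a not in table[b]
    )

for a, b in one_way_pairs(heard_from):
    print(f"{a} hears Hellos from {b}, but {b} does not hear {a}")
```

Both one-way pairs point at 192.168.16.19, which narrows the problem to Grissom's receive path rather than to Shepard or Glenn.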

A closer examination of Shepard's and Glenn's neighbor tables reinforces this hypothesis:

The SRTT for Grissom is 0, indicating that a packet has never made the round trip.
The RTO for Grissom has increased to five and eight seconds, respectively.
There is a packet enqueued for Grissom (Q Cnt).
The sequence number recorded for Grissom is 0, indicating that no reliable packets have ever been received from it.

These factors indicate that the two routers are trying to send a packet reliably to Grissom, but are not receiving an ACK.
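The four symptoms listed above can be expressed as a simple checklist over a parsed neighbor-table row. This Python sketch uses the Grissom and Glenn rows from Shepard's table in Example 7-43; the NeighborRow type and the 5000 ms RTO threshold are assumptions made for the illustration, not IOS values.

```python
from dataclasses import dataclass

@dataclass
class NeighborRow:
    address: str
    srtt_ms: int
    rto_ms: int
    q_cnt: int
    seq: int

RTO_ALARM_MS = 5000  # assumed threshold for "RTO has backed off"; not an IOS constant

def unacked_symptoms(row):
    """List the symptoms of a neighbor that sends Hellos but never ACKs."""
    symptoms = []
    if row.srtt_ms == 0:
        symptoms.append("SRTT 0: no packet has made the round trip")
    if row.rto_ms >= RTO_ALARM_MS:
        symptoms.append(f"RTO backed off to {row.rto_ms} ms")
    if row.q_cnt > 0:
        symptoms.append(f"{row.q_cnt} packet(s) queued for the neighbor")
    if row.seq == 0:
        symptoms.append("Seq 0: no reliable packet ever received from it")
    return symptoms

# Rows for Grissom and Glenn from Shepard's neighbor table.
grissom = NeighborRow("192.168.16.19", srtt_ms=0, rto_ms=5000, q_cnt=1, seq=0)
glenn = NeighborRow("192.168.16.17", srtt_ms=8, rto_ms=200, q_cnt=0, seq=6)

for row in (grissom, glenn):
    print(row.address, unacked_symptoms(row) or "healthy")
```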

In Example 7-44, debug eigrp packets is used at Shepard to get a better look at what is happening. On its own, this command displays all EIGRP packet types, so a second debug command is used with it: debug ip eigrp neighbor 75 192.168.16.19. This command adds a filter to the first one, telling debug eigrp packets to display only IP packets of EIGRP 75 (the process ID of the routers in Figure 7-38) and only those packets that concern neighbor 192.168.16.19 (Grissom).

Example 7-44. The command debug ip eigrp neighbor is used to control the packets displayed by debug eigrp packets.

Shepard#debug eigrp packets
EIGRP Packets debugging is on
    (UPDATE, REQUEST, QUERY, REPLY, HELLO, IPXSAP, PROBE, ACK)
Shepard#debug ip eigrp neighbor 75 192.168.16.19
IP Neighbor target enabled on AS 75 for 192.168.16.19
IP-EIGRP Neighbor Target Events debugging is on
EIGRP: Sending UPDATE on Ethernet0 nbr 192.168.16.19, retry 14, RTO 5000
   AS 75, Flags 0x1, Seq 22/0 idbQ 1/0 iidbQ un/rely 0/0 peerQ un/rely 0/1 serno 1-4
EIGRP: Received HELLO on Ethernet0 nbr 192.168.16.19
   AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/1
EIGRP: Sending UPDATE on Ethernet0 nbr 192.168.16.19, retry 15, RTO 5000
   AS 75, Flags 0x1, Seq 22/0 idbQ 1/0 iidbQ un/rely 0/0 peerQ un/rely 0/1 serno 1-4
EIGRP: Received HELLO on Ethernet0 nbr 192.168.16.19
   AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/1
EIGRP: Sending UPDATE on Ethernet0 nbr 192.168.16.19, retry 16, RTO 5000
   AS 75, Flags 0x1, Seq 22/0 idbQ 1/0 iidbQ un/rely 0/0 peerQ un/rely 0/1 serno 1-4
EIGRP: Received HELLO on Ethernet0 nbr 192.168.16.19
   AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/1
EIGRP: Retransmission retry limit exceeded
EIGRP: Received HELLO on Ethernet0 nbr 192.168.16.19
   AS 75, Flags 0x0, Seq 0/0 idbQ 0/0
EIGRP: Enqueueing UPDATE on Ethernet0 nbr 192.168.16.19
   iidbQ un/rely 0/1 peerQ un/rely 0/0 serno 1-4
EIGRP: Sending UPDATE on Ethernet0 nbr 192.168.16.19
   AS 75, Flags 0x1, Seq 23/0 idbQ 1/0 iidbQ un/rely 0/0 peerQ un/rely 0/1 serno 1-4

Example 7-44 shows that Hello packets are being received from Grissom. It also shows that Shepard is attempting to send updates to Grissom, but Grissom is not acknowledging them. After the 16th retry, the message "Retransmission retry limit exceeded" is displayed. This exceeded limit accounts for the low uptime shown for Grissom in the neighbor tables: when the retransmission retry limit is exceeded, Grissom is removed from the neighbor table. But because Hellos are still being received from Grissom, it quickly reappears in the table and the process begins again.
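The cycle just described can be reproduced with a toy simulation. This Python sketch is a simplified model, not IOS code: the retry limit of 16 is the value seen in Example 7-44, the event strings only loosely mirror the debug messages, and whether Grissom ever acknowledges is reduced to a single flag.

```python
RETRY_LIMIT = 16  # retries observed in Example 7-44 before the neighbor is dropped

def shepard_view(cycles, grissom_acks):
    """Toy model of Shepard's side of the exchange.

    Each Hello from Grissom (re)creates the neighbor; Shepard then sends an
    UPDATE reliably. Without an ACK it retransmits until the retry limit is
    exceeded and drops the neighbor, and the next Hello starts the cycle again.
    """
    events = []
    for _ in range(cycles):
        events.append("New peer")              # a Hello arrives from Grissom
        events.append("Sending UPDATE")
        if grissom_acks:
            events.append("ACK received")
            return events                      # adjacency completes; no flapping
        for retry in range(1, RETRY_LIMIT + 1):
            events.append(f"Sending UPDATE, retry {retry}")
        events.append("Retransmission retry limit exceeded")
        events.append("Neighbor went down")
    return events

flapping = shepard_view(cycles=3, grissom_acks=False)
print(flapping.count("Neighbor went down"))  # 3
print(shepard_view(cycles=3, grissom_acks=True))
# ['New peer', 'Sending UPDATE', 'ACK received']
```

The model makes the key point visible: as long as Hellos get through but ACKs do not, the neighbor is recreated and torn down indefinitely, and its uptime never grows past one retry cycle.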

Example 7-45 shows the output from debug eigrp neighbors at Shepard. This command is not IP specific, but instead shows EIGRP neighbor events. Here, two instances of the events described in the previous paragraph are displayed: Grissom is declared dead as the retransmission limit is exceeded but is immediately "revived" when its next Hello is received.

Example 7-45. debug eigrp neighbors displays neighbor events.

Shepard#debug eigrp neighbors
EIGRP Neighbors debugging is on
Shepard#
EIGRP: Retransmission retry limit exceeded
EIGRP: Holdtime expired
EIGRP: Neighbor 192.168.16.19 went down on Ethernet0
EIGRP: New peer 192.168.16.19
EIGRP: Retransmission retry limit exceeded
EIGRP: Holdtime expired
EIGRP: Neighbor 192.168.16.19 went down on Ethernet0
EIGRP: New peer 192.168.16.19
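When debug output like Example 7-45 is captured over a longer period, counting down and up events per neighbor quickly exposes a flapping adjacency. A minimal Python sketch, assuming the message formats are exactly those shown in Example 7-45; other IOS releases may word them differently.

```python
import re
from collections import Counter

# Output of "debug eigrp neighbors" from Example 7-45.
log = """\
EIGRP: Retransmission retry limit exceeded
EIGRP: Holdtime expired
EIGRP: Neighbor 192.168.16.19 went down on Ethernet0
EIGRP: New peer 192.168.16.19
EIGRP: Retransmission retry limit exceeded
EIGRP: Holdtime expired
EIGRP: Neighbor 192.168.16.19 went down on Ethernet0
EIGRP: New peer 192.168.16.19
"""

DOWN = re.compile(r"Neighbor (\S+) went down on (\S+)")
UP = re.compile(r"New peer (\S+)")

downs = Counter(m.group(1) for m in DOWN.finditer(log))
ups = Counter(m.group(1) for m in UP.finditer(log))

# A neighbor that repeatedly goes down and immediately comes back is flapping.
for nbr in sorted(downs):
    print(f"{nbr}: {downs[nbr]} down / {ups[nbr]} up events")
```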

Although Example 7-44 shows that update packets are being sent to Grissom, observation of EIGRP packets at that router shows that they are not being received (Example 7-46).

Example 7-46. Grissom is exchanging Hellos with Cooper via interface S0 and is sending Hellos out E0. However, Grissom is not receiving any EIGRP packets on interface E0.

Grissom#debug eigrp packets
EIGRP Packets debugging is on
    (UPDATE, REQUEST, QUERY, REPLY, HELLO, IPXSAP, PROBE, ACK)
Grissom#
EIGRP: Sending HELLO on Serial0
   AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0
EIGRP: Received HELLO on Serial0 nbr 192.168.16.42
   AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/0
EIGRP: Sending HELLO on Ethernet0
   AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0
EIGRP: Sending HELLO on Serial0
   AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0
EIGRP: Received HELLO on Serial0 nbr 192.168.16.42
   AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/0
EIGRP: Sending HELLO on Ethernet0
   AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0
EIGRP: Sending HELLO on Serial0
   AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0
EIGRP: Sending HELLO on Ethernet0
   AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0
EIGRP: Received HELLO on Serial0 nbr 192.168.16.42
   AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/0
EIGRP: Sending HELLO on Serial0
   AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0
EIGRP: Sending HELLO on Ethernet0
   AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0

Because Grissom is successfully exchanging Hellos with Cooper, Grissom's EIGRP process must be working. Suspicion therefore falls on Grissom's Ethernet interface. An inspection of the configuration file (Example 7-47) shows that an access list is configured as an incoming filter on E0.

Example 7-47. An incoming access-list is denying EIGRP packets.

interface Ethernet0
 ip address 192.168.16.19 255.255.255.240
 ip access-group 150 in
!
!
access-list 150 permit tcp any any established
access-list 150 permit tcp any host 192.168.16.238 eq ftp
access-list 150 permit tcp host 192.168.16.201 any eq telnet
access-list 150 permit tcp any host 192.168.16.230 eq pop3
access-list 150 permit udp any any eq snmp
access-list 150 permit icmp any 192.168.16.224 0.0.0.15

When EIGRP packets arrive at Grissom's E0 interface, they are first filtered through access list 150. They match none of its entries and are therefore dropped by the implicit deny at the end of the list. The problem is resolved (Example 7-48) by adding the following entry to the access list:

access-list 150 permit eigrp 192.168.16.16 0.0.0.15 any
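Why the EIGRP packets were dropped follows from how an extended access list is evaluated: entries are checked in order, the first match decides, and anything that matches nothing is denied. The following Python sketch models access list 150 under a stated simplification: port and "established" checks are omitted, which makes the TCP and UDP entries broader than the real ones, but none of them could ever match an EIGRP packet anyway. The Ace class and evaluate function are illustrative names, not an IOS or library API.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class Ace:
    """One access-list entry, reduced to protocol, source, and destination."""
    action: str
    protocol: str              # "ip" would match any protocol
    src: str = "0.0.0.0/0"     # "any"
    dst: str = "0.0.0.0/0"     # "any"

    def matches(self, proto, src, dst):
        return (self.protocol in ("ip", proto)
                and ipaddress.ip_address(src) in ipaddress.ip_network(self.src)
                and ipaddress.ip_address(dst) in ipaddress.ip_network(self.dst))

def evaluate(acl, proto, src, dst):
    for ace in acl:                    # first match wins
        if ace.matches(proto, src, dst):
            return ace.action
    return "deny"                      # implicit deny at the end of every list

acl_150 = [
    Ace("permit", "tcp"),                           # ... any any established
    Ace("permit", "tcp", dst="192.168.16.238/32"),  # ... eq ftp
    Ace("permit", "tcp", src="192.168.16.201/32"),  # ... eq telnet
    Ace("permit", "tcp", dst="192.168.16.230/32"),  # ... eq pop3
    Ace("permit", "udp"),                           # ... eq snmp
    Ace("permit", "icmp", dst="192.168.16.224/28"), # 192.168.16.224 0.0.0.15
]

# An EIGRP Hello from Shepard to the EIGRP multicast address 224.0.0.10.
hello = ("eigrp", "192.168.16.18", "224.0.0.10")
print(evaluate(acl_150, *hello))  # deny

# The fix: permit eigrp 192.168.16.16 0.0.0.15 any
acl_150.append(Ace("permit", "eigrp", src="192.168.16.16/28"))
print(evaluate(acl_150, *hello))  # permit
```

Note that the added entry only admits EIGRP from the 192.168.16.16/28 subnet; EIGRP packets from any other source are still caught by the implicit deny.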

Example 7-48. When an entry is added to the access list to permit EIGRP packets, Grissom's neighbor and route tables show that it now has routes to all subnets.

Grissom#show ip eigrp neighbors
IP-EIGRP neighbors for process 75
H   Address                 Interface    Hold Uptime    SRTT   RTO   Q   Seq
                                         (sec)          (ms)        Cnt  Num
2   192.168.16.17           Et0             10 00:06:20    4   200   0   41
1   192.168.16.18           Et0             14 00:06:24   15   200   0   85
0   192.168.16.42           Se0             10 06:22:56   22   200   0   12
Grissom#show ip route
Codes: C - connected, S - static, I - IGRP, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
       i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, * - candidate default
       U - per-user static route, o - ODR

Gateway of last resort is not set

   192.168.16.0/24 is variably subnetted, 6 subnets, 2 masks
C    192.168.16.40/30 is directly connected, Serial0
D    192.168.16.36/30 [90/2195456] via 192.168.16.18, 00:06:27, Ethernet0
C    192.168.16.16/28 is directly connected, Ethernet0
D    192.168.16.224/28 [90/2195456] via 192.168.16.42, 00:06:12, Serial0
D    192.168.16.192/28 [90/2323456] via 192.168.16.18, 00:06:27, Ethernet0
D    192.168.16.128/28 [90/307200] via 192.168.16.17, 00:06:12, Ethernet0
Grissom#

Stuck-in-Active Neighbors

When a route goes active and queries are sent to neighbors, the route will remain active until a reply is received for every query. But what happens if a neighbor is dead or otherwise incapacitated and cannot reply? The route would stay permanently active. The active timer and SIA-retransmit timer are designed to prevent this situation. Both the active timer and the SIA-retransmit timer are set when a query is sent. If the SIA-retransmit timer is not supported by the router's IOS (IOS versions earlier than 12.2[4.1]), only the active timer is used. If the timers expire before a reply to the query is received, the route is declared stuck-in-active, the neighbor is presumed dead, and it is flushed from the neighbor table.^[19] The SIA route and any other routes via that neighbor are eliminated from the route table. DUAL will be satisfied by considering the neighbor to have replied with an infinite metric.

^[19] As mentioned previously, the default active time is three minutes. It can be changed with the command timers active-time.

In reality, this sequence of events should never happen. The loss of Hellos should identify a disabled neighbor long before the active timer expires.
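The active-timer-only behavior described above can be sketched in a few lines of Python. This is a simplified model, not IOS logic: it uses the default active time of 180 seconds, and the second neighbor (10.1.3.1), its reply time, and its metric are hypothetical values chosen for illustration.

```python
ACTIVE_TIME = 180  # seconds; the default active time

def collect_replies(queries, active_time=ACTIVE_TIME):
    """queries maps neighbor -> (seconds until its reply or None, reported metric).

    Returns the metrics DUAL works with and the neighbors reset as
    stuck-in-active under the active-timer-only behavior.
    """
    metrics, reset = {}, []
    for nbr, (delay, metric) in queries.items():
        if delay is not None and delay < active_time:
            metrics[nbr] = metric
        else:
            reset.append(nbr)             # SIA: neighbor flushed from the table
            metrics[nbr] = float("inf")   # treated as an infinite-metric reply
    return metrics, reset

metrics, reset = collect_replies({
    "10.1.2.1": (None, 0),        # never replies
    "10.1.3.1": (4, 409600),      # hypothetical neighbor replying after 4 s
})
print(reset)    # ['10.1.2.1']
print(metrics)  # {'10.1.2.1': inf, '10.1.3.1': 409600}
```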

But what happens in large EIGRP networks where a query might, like the bunny in the battery advertisement, keep going and going? Remember that queries cause the diffusing calculation to grow larger, whereas replies cause it to grow smaller (refer to Figure 7-6). Queries must eventually reach the edge of the network, and replies must eventually begin coming back, but if the diameter of the diffusing calculation grows large enough, an active timer might expire before all replies are received. The result, flushing a legitimate neighbor from the neighbor table, is obviously destabilizing.
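A rough model shows why the diameter of the diffusing calculation matters. In the following Python sketch, a query travels a number of hops outward and the last reply returns along the same path, with each hop adding a fixed delay for queueing, CPU, and pacing. Both the uniform per-hop delay and the 4-second value used below are hypothetical; real networks are far less uniform.

```python
ACTIVE_TIME = 180.0  # seconds; the default active time

def reply_time(depth, per_hop_delay):
    """Toy model: `depth` hops out for the query, `depth` hops back for the reply."""
    return 2 * depth * per_hop_delay

def outcome(depth, per_hop_delay, active_time=ACTIVE_TIME):
    t = reply_time(depth, per_hop_delay)
    return "SIA: neighbor reset" if t >= active_time else f"replies in {t:.0f} s"

# With a (hypothetical) 4 s per hop, a deep enough computation outlives the timer.
for depth in (3, 10, 30):
    print(depth, outcome(depth, per_hop_delay=4.0))
```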

When neighbors mysteriously disappear from neighbor tables and then reappear, or users complain of intermittently unreachable destinations, SIA routes might be the culprit. Checking the error logs of routers is a good way to find out whether SIAs have occurred (Example 7-49).

Example 7-49. The final entry of this error log shows an SIA message.

Gagarin#show logging
Syslog logging: enabled (0 messages dropped, 0 flushes, 0 overruns)
    Console logging: level debugging, 3369 messages logged
    Monitor logging: level debugging, 0 messages logged
    Trap logging: level informational, 71 message lines logged
    Buffer logging: level debugging, 3369 messages logged

Log Buffer (4096 bytes):
   ...
   ...
   ...
DUAL: dual_rcvupdate(): 10.51.1.0/24 via 10.1.2.1 metric 409600/128256
DUAL: Find FS for dest 10.51.1.0/24. FD is 4294967295, RD is 4294967295 found
DUAL: RT installed 10.51.1.0/24 via 10.1.2.1
DUAL: Send update about 10.51.1.0/24. Reason: metric chg
DUAL: Send update about 10.51.1.0/24. Reason: new if
DUAL: dual_rcvupdate(): 10.52.1.0/24 via 10.1.2.1 metric 409600/128256
DUAL: Find FS for dest 10.52.1.0/24. FD is 4294967295, RD is 4294967295 found
%DUAL-3-SIA: Route 10.11.1.0/24 stuck-in-active state in IP-EIGRP 1. Cleaning up
Gagarin#
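When logs are collected from many routers, SIA events can be picked out with a simple pattern match. A minimal Python sketch, assuming the message text is exactly as shown in Example 7-49; the wording may differ in other IOS releases, so the pattern should be checked against real logs before being relied on.

```python
import re

# Tail of the log buffer from Example 7-49.
log = """\
DUAL: dual_rcvupdate(): 10.52.1.0/24 via 10.1.2.1 metric 409600/128256
DUAL: Find FS for dest 10.52.1.0/24. FD is 4294967295, RD is 4294967295 found
%DUAL-3-SIA: Route 10.11.1.0/24 stuck-in-active state in IP-EIGRP 1. Cleaning up
"""

SIA = re.compile(r"%DUAL-3-SIA: Route (\S+) stuck-in-active state in IP-EIGRP (\d+)")

for m in SIA.finditer(log):
    print(f"SIA for {m.group(1)} in EIGRP process {m.group(2)}")
```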

When chasing the cause of SIAs, close attention should be paid to the topology tables of the routers. If routes can be "caught" in the active state, the neighbors from which replies have not yet been received should be noted. For example, Example 7-50 shows a topology table in which several routes are active. Notice that most of them have been active for 15 seconds and that one (10.6.1.0) has been active for 41 seconds.

Example 7-50. This topology table shows several active routes, all of which are waiting for a reply from neighbor 10.1.2.1.

Gagarin#show ip eigrp topology
IP-EIGRP Topology Table for process 1

Codes: P - Passive, A - Active, U - Update, Q - Query, R - Reply,
       r - Reply Status

A 10.11.1.0/24, 0 successors, FD is 3072128000, Q
    1 replies, active 00:00:15, query-origin: Local origin
    Remaining replies:
         via 10.1.2.1, r, Ethernet0
A 10.10.1.0/24, 0 successors, FD is 3584128000, Q
    1 replies, active 00:00:15, query-origin: Local origin
    Remaining replies:
         via 10.1.2.1, r, Ethernet0
A 10.9.1.0/24, 0 successors, FD is 4096128000, Q
    1 replies, active 00:00:15, query-origin: Local origin
    Remaining replies:
         via 10.1.2.1, r, Ethernet0
A 10.2.1.0/24, 1 successors, FD is Inaccessible, Q
    1 replies, active 00:00:15, query-origin: Local origin
    Remaining replies:
         via 10.1.2.1, r, Ethernet0
P 10.1.2.0/24, 1 successors, FD is 281600
         via Connected, Ethernet0
A 10.6.1.0/24, 0 successors, FD is 3385160704, Q
    1 replies, active 00:00:41, query-origin: Local origin
    Remaining replies:
         via 10.1.2.1, r, Ethernet0
A 10.27.1.0/24, 0 successors, FD is 3897160704, Q
 --More--

Notice also that in each case, the neighbor 10.1.2.1 has its reply status flag (r) set. That is the neighbor from which replies have not yet been received. There might be no problem with the neighbor itself or with the link to the neighbor, but this information points to the direction within the network topology in which the investigation should proceed.
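On a router with many active routes, grouping them by the neighbor whose reply is outstanding shows at a glance which direction to investigate. The following Python sketch parses an excerpt of Example 7-50; it assumes the line layout shown there (entry lines starting with P or A, and "via <address>, r," for neighbors with the reply-status flag set) and is not a general parser for every IOS release.

```python
import re
from collections import defaultdict

# Excerpt of Example 7-50.
topology = """\
A 10.11.1.0/24, 0 successors, FD is 3072128000, Q
    1 replies, active 00:00:15, query-origin: Local origin
    Remaining replies:
         via 10.1.2.1, r, Ethernet0
P 10.1.2.0/24, 1 successors, FD is 281600
         via Connected, Ethernet0
A 10.6.1.0/24, 0 successors, FD is 3385160704, Q
    1 replies, active 00:00:41, query-origin: Local origin
    Remaining replies:
         via 10.1.2.1, r, Ethernet0
"""

def waiting_on(text):
    """Map each neighbor flagged 'r' to the active routes (and active times)
    still waiting for its reply."""
    result = defaultdict(list)
    route = active = None
    for line in text.splitlines():
        entry = re.match(r"([PA]) (\S+),", line)
        if entry:                                   # start of a new topology entry
            route = entry.group(2) if entry.group(1) == "A" else None
            active = None
            continue
        timer = re.search(r"active (\d\d:\d\d:\d\d)", line)
        if timer:
            active = timer.group(1)
        pending = re.match(r"\s+via (\S+), r,", line)
        if pending and route:
            result[pending.group(1)].append((route, active))
    return dict(result)

print(waiting_on(topology))
# {'10.1.2.1': [('10.11.1.0/24', '00:00:15'), ('10.6.1.0/24', '00:00:41')]}
```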

Common causes of SIAs in larger EIGRP networks are heavily congested, low-bandwidth data links and routers with low memory or overutilized CPUs. The problem will be exacerbated if these limited resources must handle very large numbers of queries.

The careless adjustment of the bandwidth parameter on interfaces might be another cause of SIAs. Recall that EIGRP is designed to use no more than 50 percent of the available bandwidth of a link. This restriction means that EIGRP's pacing is keyed to the configured bandwidth. If the bandwidth is set artificially low in an attempt to manipulate routing choices, the EIGRP process might be starved. If IOS 11.2 or later is being run, the command ip bandwidth-percent eigrp may be used to adjust the percentage of bandwidth used.

For example, suppose that an interface is connected to a 56K serial link, but the bandwidth is set to 14K. EIGRP would limit itself to 50 percent of this amount, or 7K. The commands in Example 7-51 raise the EIGRP bandwidth percentage to 200 percent: 200 percent of 14K is 28K, which is 50 percent of the actual bandwidth of the 56K link.

Example 7-51. Router configuration adjusts the percentage of the configured bandwidth that EIGRP will use.

interface Serial 3
 ip address 172.18.107.210 255.255.255.240
 bandwidth 14
 ip bandwidth-percent eigrp 1 200
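The arithmetic behind Example 7-51 is short enough to write out. This Python sketch only restates the calculation from the text: EIGRP's budget is a percentage of the configured bandwidth statement (50 percent by default), not of the physical line rate. Values are in kbps.

```python
def eigrp_budget_kbps(configured_bw_kbps, percent=50):
    """Bandwidth EIGRP paces itself to: a percentage of the configured bandwidth."""
    return configured_bw_kbps * percent / 100

link_kbps = 56        # actual line rate
configured_kbps = 14  # "bandwidth 14"

print(eigrp_budget_kbps(configured_kbps))       # 7.0  (default 50 percent)
print(eigrp_budget_kbps(configured_kbps, 200))  # 28.0 (ip bandwidth-percent eigrp 1 200)
print(eigrp_budget_kbps(link_kbps))             # 28.0 (50 percent of the real link)

# Percentage that restores 50 percent of the real link rate:
print(50 * link_kbps / configured_kbps)         # 200.0
```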

Increasing the active timer period with the timers active-time command might help avoid SIAs in some situations, but this step should not be taken without careful consideration of the effects it might have on reconvergence.

A new timer, the SIA-retransmit timer, and two new EIGRP packet types, SIA-query and SIA-reply, help to minimize SIAs and to confine the neighbor reset to the link that is actually having trouble responding to queries.

Consider the network in Figure 7-39. From router Mercury, EIGRP will route traffic to network 172.16.100.0 via Apollo. Vostok is not a feasible successor because the metric from Vostok to 172.16.100.0 is too high: its reported distance is not lower than Mercury's feasible distance. Vostok routes traffic to 172.16.100.0 via Mercury and Apollo. Soyuz, in turn, is not a feasible successor for Vostok because the metric from Soyuz to 172.16.100.0 is too high.

Figure 7-39. Mercury does not list Vostok as a feasible successor to 172.16.100.0/24.
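The feasibility condition that rules Vostok out can be stated in one line: a neighbor qualifies only if its reported distance is lower than this router's feasible distance. The following Python sketch applies it to Mercury. Figure 7-39 gives no metric values, so the numbers used here are hypothetical; only their relationship (Vostok's reported distance being too high) comes from the text.

```python
def feasible_successors(fd, reported):
    """Neighbors whose reported distance (RD) is lower than our feasible distance (FD).
    The current successor always satisfies this condition."""
    return [nbr for nbr, rd in reported.items() if rd < fd]

# Hypothetical metrics for Mercury's view of 172.16.100.0/24.
mercury_fd = 2_172_416                       # best path, via Apollo
reported = {"Apollo": 28_160, "Vostok": 2_684_416}

print(feasible_successors(mercury_fd, reported))  # ['Apollo']

# When the link to Apollo fails, nothing qualifies, so the route goes active.
remaining = {n: rd for n, rd in reported.items() if n != "Apollo"}
print(feasible_successors(mercury_fd, remaining) or "no feasible successor: go active")
```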

When the link between Mercury and Apollo fails, as shown in Figure 7-40, Mercury places the address 172.16.100.0 (and any other address known via neighbor Apollo) into the Active state and sends a query to Vostok. Vostok also places the address into the Active state and sends a query to Soyuz. The Active timers are set, and so are the SIA-retransmit timers. The SIA-retransmit timer is set to one-half the value of the Active timer, typically 90 seconds.

Figure 7-40. SIA-queries and SIA-replies are used to avoid SIA conditions.

When the SIA-retransmit timer expires, Mercury sends an SIA-query to Vostok, and Vostok sends an SIA-query to Soyuz. Vostok responds to the SIA-query from Mercury with an SIA-reply, and Mercury resets its Active timer and SIA-retransmit timer. As long as SIA-replies are received, and assuming no reply to the original query has arrived, the routers will send up to three SIA-queries before resetting a neighbor. So a neighbor that keeps responding to SIA-queries is not declared stuck-in-active and reset until six minutes have passed, assuming the default Active time of 180 seconds. This gives a large network ample time to respond to queries.
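The timeline just described can be checked with a small Python calculation. It follows the text's description (SIA-retransmit interval of half the 180-second Active time, up to three SIA-queries) and assumes the original query is never answered; the exact sequencing inside IOS may differ in detail.

```python
ACTIVE_TIME = 180                  # default Active time, seconds
SIA_RETRANSMIT = ACTIVE_TIME // 2  # 90 seconds
MAX_SIA_QUERIES = 3

def reset_time(answered, interval=SIA_RETRANSMIT, max_queries=MAX_SIA_QUERIES):
    """Seconds from the original query until the neighbor is reset, assuming the
    original query is never answered and the neighbor answers its first
    `answered` SIA-queries, then goes silent."""
    t = 0
    for sent in range(1, max_queries + 1):
        t += interval             # SIA-retransmit timer expires: send an SIA-query
        if sent > answered:
            return t + interval   # no SIA-reply within one interval: reset
    return t + interval           # all SIA-queries used: reset at the next expiry

for n in range(4):
    print(n, reset_time(n))
```

A neighbor that answers every SIA-query is reset only at 360 seconds (six minutes), while one that answers none is reset at 180 seconds.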

But suppose there is a problem on the link from Vostok to Soyuz that lets enough Hellos through to keep the adjacency up, yet prevents Vostok from receiving an SIA-reply within the SIA-retransmit time. If no SIA-reply is received within 90 seconds of an SIA-query, and no response to the original query has been received, Vostok will reset neighbor Soyuz and reply to Mercury's original query that the address is unreachable.

The SIA-retransmit timer does two things. If neighbors are responding to SIA-queries, large networks are given more time to respond to address queries. If neighbors are not responding, the neighbor is reset. Only the router that is not receiving responses from its neighbor will reset the adjacency. Before the SIA-retransmit timer was introduced, any router that did not receive a response to an active query after the Active timer expired would reset the neighbor adjacency, even if the problem was somewhere downstream in the network.
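Which router resets which adjacency can be captured in a short Python sketch. It encodes only the SIA-query/SIA-reply behavior as described in the text, for a simple chain of routers, and assumes that no router upstream of the reset has another path to the destination, so each one passes an "unreachable" reply back toward the query originator. The function name and chain representation are hypothetical.

```python
def sia_outcome(chain, silent):
    """chain: query path starting at the originator, e.g. Mercury -> Vostok -> Soyuz.
    silent: the router that stops answering SIA-queries.
    Returns the adjacency that is reset and the replies flowing back upstream."""
    i = chain.index(silent)
    if i == 0:
        raise ValueError("the originator has no upstream neighbor to reset it")
    resetter = chain[i - 1]
    resets = [(resetter, silent)]
    # Routers upstream of the resetter kept receiving SIA-replies; each one now
    # receives an 'unreachable' reply (assuming no alternate path exists).
    replies = [(chain[k + 1], chain[k], "unreachable") for k in range(i - 2, -1, -1)]
    return resets, replies

resets, replies = sia_outcome(["Mercury", "Vostok", "Soyuz"], silent="Soyuz")
print(resets)   # [('Vostok', 'Soyuz')]
print(replies)  # [('Vostok', 'Mercury', 'unreachable')]
```

Under the older active-timer-only behavior, Mercury, whose own query was never answered, would instead have reset its adjacency with Vostok, even though the problem lay further downstream.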

A good network design is the best solution to instabilities such as SIA routes. By using a combination of intelligent address assignment, route filtering, default routes, stub routing, and summarization, boundaries may be constructed in a large EIGRP network to restrict the size and scope of diffusing computations. Chapter 13 includes an example of such a design.