Troubleshooting EIGRP
Troubleshooting the exchange of RIP route information is a reasonably simple procedure. Routing updates are either propagated or they are not, and they either contain accurate information or they do not. The added complexity of EIGRP means an added complexity to the troubleshooting procedure. Neighbor tables and adjacencies must be verified, the query/response procedure of DUAL must be followed, and the influences of VLSM on automatic summarization must be considered.
This section's case study describes a sequence of steps that typically can be used when pursuing an EIGRP problem. Following the case study is a discussion of an occasional cause of instabilities in larger EIGRP internets.
Case Study: A Missing Neighbor
Figure 7-38 shows a small EIGRP network. Users are complaining that subnet 192.168.16.224/28 is unreachable. An examination of the route tables reveals that something is wrong at router Grissom (Example 7-42).
Example 7-42. The route tables of Shepard and Grissom show that Grissom's EIGRP process is not advertising or receiving routes on subnet 192.168.16.16/28.
Grissom#show ip route
Codes: C - connected, S - static, I - IGRP, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, * - candidate default
U - per-user static route, o - ODR
Gateway of last resort is not set
192.168.16.0/24 is variably subnetted, 3 subnets, 2 masks
C 192.168.16.40/30 is directly connected, Serial0
C 192.168.16.16/28 is directly connected, Ethernet0
D 192.168.16.224/28 [90/2195456] via 192.168.16.42, 01:07:26, Serial0
_______________________________________________________________________________
Shepard#show ip route
Codes: C - connected, S - static, I - IGRP, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, * - candidate default
U - per-user static route, o - ODR
Gateway of last resort is not set
192.168.16.0/24 is variably subnetted, 4 subnets, 2 masks
C 192.168.16.36/30 is directly connected, Serial0
C 192.168.16.16/28 is directly connected, Ethernet0
D 192.168.16.192/28 [90/2297856] via 192.168.16.38, 01:07:20, Serial0
D 192.168.16.128/28 [90/307200] via 192.168.16.17, 01:07:20, Ethernet0
The following observations are made from the two route tables of Example 7-42:
- Shepard does not have subnets 192.168.16.40/30 and 192.168.16.224/28 in its route table, although Grissom does.
- Grissom's route table does not contain any of the subnets that should be advertised by Glenn or Shepard.
- Shepard's route table contains the subnets advertised by Glenn (and Glenn's table contains the subnets advertised by Shepard, although its route table is not included in Example 7-42).
The conclusion to be drawn from these observations is that Grissom is not advertising or receiving routes correctly over subnet 192.168.16.16/28.
Among the possible causes, the simplest should be examined first:
- An incorrect interface address or mask
- An incorrect EIGRP process ID
- A missing or incorrect network statement
In this case, there are no EIGRP or address configuration errors.
Next, the neighbor tables should be examined. Looking at the neighbor tables at Grissom, Shepard, and Glenn (Example 7-43), two facts stand out:
- Grissom (192.168.16.19) is in its neighbors' tables, but its neighbors are not in Grissom's neighbor table.
- The entire network has been up for more than five hours; this information is reflected in the uptime statistic for all neighbors except Grissom. However, Grissom's uptime shows approximately one minute.
Example 7-43. Shepard and Glenn see Grissom as a neighbor, but Grissom does not see them. This suggests that Shepard and Glenn are receiving Hellos from Grissom, but Grissom is not receiving Hellos from Shepard and Glenn.
Grissom#show ip eigrp neighbors
IP-EIGRP neighbors for process 75
H Address Interface Hold Uptime SRTT RTO Q Seq
(sec) (ms) Cnt Num
0 192.168.16.42 Se0 11 05:27:11 23 200 0 8
___________________________________________________________________________
Shepard#show ip eigrp neighbors
IP-EIGRP neighbors for process 75
H Address Interface Hold Uptime SRTT RTO Q Seq
(sec) (ms) Cnt Num
1 192.168.16.19 Et0 12 00:01:01 0 5000 1 0
2 192.168.16.17 Et0 11 05:27:33 8 200 0 6
0 192.168.16.38 Se0 14 05:27:34 22 200 0 10
___________________________________________________________________________
Glenn#show ip eigrp neighbors
IP-EIGRP neighbors for process 75
H Address Interface Hold Uptime SRTT RTO Q Seq
(sec) (ms) Cnt Num
1 192.168.16.19 Et0 14 00:00:59 0 8000 1 0
2 192.168.16.18 Et0 10 05:30:11 9 20 0 7
0 192.168.16.130 Et1 12 05:30:58 6 20 0 7
If Grissom is in Shepard's neighbor table, Shepard must be receiving Hellos from it. Grissom, however, is apparently not receiving Hellos from Shepard. Without this two-way exchange of Hello packets, an adjacency will not be established and route information will not be exchanged.
A closer examination of Shepard's and Glenn's neighbor tables reinforces this hypothesis:
- The SRTT for Grissom is 0, indicating that a packet has never made the round trip.
- The RTO for Grissom has increased to five and eight seconds, respectively.
- There is a packet enqueued for Grissom (Q Cnt).
- The sequence number recorded for Grissom is 0, indicating that no reliable packets have ever been received from it.
These factors indicate that the two routers are trying to send a packet reliably to Grissom, but are not receiving an ACK.
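The symptoms listed above can also be checked mechanically. The following is a minimal Python sketch, not a Cisco tool: it assumes the neighbor table has been captured as text in the column layout of Example 7-43, and the names SAMPLE, ROW, and suspect_neighbors are hypothetical. It flags any neighbor whose SRTT is 0, whose queue count is nonzero, and whose sequence number is 0.

```python
import re

# Shepard's neighbor table from Example 7-43, captured as plain text.
SAMPLE = """\
H Address Interface Hold Uptime SRTT RTO Q Seq
(sec) (ms) Cnt Num
1 192.168.16.19 Et0 12 00:01:01 0 5000 1 0
2 192.168.16.17 Et0 11 05:27:33 8 200 0 6
0 192.168.16.38 Se0 14 05:27:34 22 200 0 10
"""

# One table row: handle, address, interface, hold, uptime, SRTT, RTO, Q Cnt, Seq.
ROW = re.compile(
    r"^\s*(?P<h>\d+)\s+(?P<addr>\d+\.\d+\.\d+\.\d+)\s+(?P<intf>\S+)\s+"
    r"(?P<hold>\d+)\s+(?P<uptime>\S+)\s+(?P<srtt>\d+)\s+(?P<rto>\d+)\s+"
    r"(?P<qcnt>\d+)\s+(?P<seq>\d+)\s*$"
)


def suspect_neighbors(table_text):
    """Return addresses of neighbors that have never acknowledged a reliable packet."""
    suspects = []
    for line in table_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header lines and anything else that is not a table row
        never_round_tripped = int(m["srtt"]) == 0
        packet_waiting = int(m["qcnt"]) > 0
        nothing_received = int(m["seq"]) == 0
        if never_round_tripped and packet_waiting and nothing_received:
            suspects.append(m["addr"])
    return suspects


print(suspect_neighbors(SAMPLE))
```

Run against Shepard's table, the sketch singles out 192.168.16.19 (Grissom). The three conditions are combined with a logical AND here as a design choice; a looser check might flag a neighbor on any one of them.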
In Example 7-44, debug eigrp packets is used at Shepard to get a better look at what is happening. All EIGRP packet types will be displayed, but a second debug command is used with it: debug ip eigrp neighbor 75 192.168.16.19. This command adds a filter to the first command. It tells debug eigrp packets to display only IP packets of EIGRP 75 (the process ID of the routers in Figure 7-38) and only those packets that concern neighbor 192.168.16.19 (Grissom).
Example 7-44. The command debug ip eigrp neighbor is used to control the packets displayed by debug eigrp packets.
Shepard#debug eigrp packets
EIGRP Packets debugging is on
(UPDATE, REQUEST, QUERY, REPLY, HELLO, IPXSAP, PROBE, ACK)
Shepard#debug ip eigrp neighbor 75 192.168.16.19
IP Neighbor target enabled on AS 75 for 192.168.16.19
IP-EIGRP Neighbor Target Events debugging is on
EIGRP: Sending UPDATE on Ethernet0 nbr 192.168.16.19, retry 14, RTO 5000
AS 75, Flags 0x1, Seq 22/0 idbQ 1/0 iidbQ un/rely 0/0 peerQ un/rely 0/1 serno 1-4
EIGRP: Received HELLO on Ethernet0 nbr 192.168.16.19
AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/1
EIGRP: Sending UPDATE on Ethernet0 nbr 192.168.16.19, retry 15, RTO 5000
AS 75, Flags 0x1, Seq 22/0 idbQ 1/0 iidbQ un/rely 0/0 peerQ un/rely 0/1 serno 1-4
EIGRP: Received HELLO on Ethernet0 nbr 192.168.16.19
AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/1
EIGRP: Sending UPDATE on Ethernet0 nbr 192.168.16.19, retry 16, RTO 5000
AS 75, Flags 0x1, Seq 22/0 idbQ 1/0 iidbQ un/rely 0/0 peerQ un/rely 0/1 serno 1-4
EIGRP: Received HELLO on Ethernet0 nbr 192.168.16.19
AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/1
EIGRP: Retransmission retry limit exceeded
EIGRP: Received HELLO on Ethernet0 nbr 192.168.16.19
AS 75, Flags 0x0, Seq 0/0 idbQ 0/0
EIGRP: Enqueueing UPDATE on Ethernet0 nbr 192.168.16.19 iidbQ un/rely 0/1 peerQ un/rely 0/0 serno 1-4
EIGRP: Sending UPDATE on Ethernet0 nbr 192.168.16.19
AS 75, Flags 0x1, Seq 23/0 idbQ 1/0 iidbQ un/rely 0/0 peerQ un/rely 0/1 serno 1-4
Example 7-44 shows that Hello packets are being received from Grissom. It also shows that Shepard is attempting to send updates to Grissom, but Grissom is not acknowledging them. After the 16th retry, the message "Retransmission retry limit exceeded" is displayed. This exceeded limit accounts for the low uptime shown for Grissom in the neighbor tables. When the retransmission retry limit is exceeded, Grissom is removed from the neighbor table. But because Hellos are still being received from Grissom, it quickly reappears in the table and the process begins again.
Example 7-45 shows the output from debug eigrp neighbors at Shepard. This command is not IP specific, but instead shows EIGRP neighbor events. Here, two instances of the events described in the previous paragraph are displayed: Grissom is declared dead as the retransmission limit is exceeded but is immediately "revived" when its next Hello is received.
Example 7-45. debug eigrp neighbors displays neighbor events.
Shepard#debug eigrp neighbors
EIGRP Neighbors debugging is on
Shepard#
EIGRP: Retransmission retry limit exceeded
EIGRP: Holdtime expired
EIGRP: Neighbor 192.168.16.19 went down on Ethernet0
EIGRP: New peer 192.168.16.19
EIGRP: Retransmission retry limit exceeded
EIGRP: Holdtime expired
EIGRP: Neighbor 192.168.16.19 went down on Ethernet0
EIGRP: New peer 192.168.16.19
Although Example 7-44 shows that update packets are being sent to Grissom, observation of EIGRP packets at that router shows that they are not being received (Example 7-46).
Example 7-46. Grissom is exchanging Hellos with Cooper via interface S0 and is sending Hellos out E0. However, Grissom is not receiving any EIGRP packets on interface E0.
Grissom#debug eigrp packets
EIGRP Packets debugging is on
(UPDATE, REQUEST, QUERY, REPLY, HELLO, IPXSAP, PROBE, ACK)
Grissom#
EIGRP: Sending HELLO on Serial0
AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0
EIGRP: Received HELLO on Serial0 nbr 192.168.16.42
AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/0
EIGRP: Sending HELLO on Ethernet0
AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0
EIGRP: Sending HELLO on Serial0
AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0
EIGRP: Received HELLO on Serial0 nbr 192.168.16.42
AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/0
EIGRP: Sending HELLO on Ethernet0
AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0
EIGRP: Sending HELLO on Serial0
AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0
EIGRP: Sending HELLO on Ethernet0
AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0
EIGRP: Received HELLO on Serial0 nbr 192.168.16.42
AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/0
EIGRP: Sending HELLO on Serial0
AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0
EIGRP: Sending HELLO on Ethernet0
AS 75, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0
Because Grissom is successfully exchanging Hellos with Cooper, Grissom's EIGRP process must be working. Suspicion therefore falls on Grissom's Ethernet interface. An inspection of the configuration file (Example 7-47) shows that an access list is configured as an incoming filter on E0.
Example 7-47. An incoming access list is denying EIGRP packets.
interface Ethernet0
ip address 192.168.16.19 255.255.255.240
ip access-group 150 in
!
!
access-list 150 permit tcp any any established
access-list 150 permit tcp any host 192.168.16.238 eq ftp
access-list 150 permit tcp host 192.168.16.201 any eq telnet
access-list 150 permit tcp any host 192.168.16.230 eq pop3
access-list 150 permit udp any any eq snmp
access-list 150 permit icmp any 192.168.16.224 0.0.0.15
When EIGRP packets are received at Grissom's E0 interface, they are first filtered through access list 150. They do not match any entry on the list and are therefore dropped by the implicit deny at the end of the list. The problem is resolved (Example 7-48) by adding the following entry to the access list:
access-list 150 permit eigrp 192.168.16.16 0.0.0.15 any
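The effect of the added entry can be sanity-checked with a short Python sketch. This is a deliberately simplified model, not IOS behavior in full: only the protocol keyword and the source address/wildcard of each entry are modeled (destination and port fields are ignored), and the names wildcard_match, ACL_150, NEW_ENTRY, and permitted are hypothetical.

```python
import ipaddress


def wildcard_match(addr, base, wildcard):
    """True if addr matches base under a Cisco-style wildcard mask (1 bits = don't care)."""
    a = int(ipaddress.IPv4Address(addr))
    b = int(ipaddress.IPv4Address(base))
    w = int(ipaddress.IPv4Address(wildcard))
    care = ~w & 0xFFFFFFFF  # bits that must be equal
    return (a & care) == (b & care)


ANY = ("0.0.0.0", "255.255.255.255")

# (protocol, source base, source wildcard) for each entry of access list 150
# in Example 7-47. Destination and port fields are intentionally omitted.
ACL_150 = [
    ("tcp", *ANY),
    ("tcp", *ANY),
    ("tcp", "192.168.16.201", "0.0.0.0"),
    ("tcp", *ANY),
    ("udp", *ANY),
    ("icmp", *ANY),
]

# The entry added in the text: permit eigrp 192.168.16.16 0.0.0.15 any
NEW_ENTRY = ("eigrp", "192.168.16.16", "0.0.0.15")


def permitted(acl, proto, src):
    """First-match evaluation; falling off the end models the implicit deny."""
    for entry_proto, base, wildcard in acl:
        if entry_proto == proto and wildcard_match(src, base, wildcard):
            return True
    return False


for neighbor in ("192.168.16.17", "192.168.16.18"):
    print(neighbor,
          "before:", permitted(ACL_150, "eigrp", neighbor),
          "after:", permitted(ACL_150 + [NEW_ENTRY], "eigrp", neighbor))
```

Because the model ignores destinations and ports, it is only suitable for asking whether an EIGRP packet from a given source could match any entry at all. Under that simplification, packets from Glenn (192.168.16.17) and Shepard (192.168.16.18) match nothing in the original list and match the new entry once it is added.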
Example 7-48. When an entry is added to the access list to permit EIGRP packets, Grissom's neighbor and route tables show that it now has routes to all subnets.
Grissom#show ip eigrp neighbors
IP-EIGRP neighbors for process 75
H Address Interface Hold Uptime SRTT RTO Q Seq
(sec) (ms) Cnt Num
2 192.168.16.17 Et0 10 00:06:20 4 200 0 41
1 192.168.16.18 Et0 14 00:06:24 15 200 0 85
0 192.168.16.42 Se0 10 06:22:56 22 200 0 12
Grissom#show ip route
Codes: C - connected, S - static, I - IGRP, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, * - candidate default
U - per-user static route, o - ODR
Gateway of last resort is not set
192.168.16.0/24 is variably subnetted, 6 subnets, 2 masks
C 192.168.16.40/30 is directly connected, Serial0
D 192.168.16.36/30 [90/2195456] via 192.168.16.18, 00:06:27, Ethernet0
C 192.168.16.16/28 is directly connected, Ethernet0
D 192.168.16.224/28 [90/2195456] via 192.168.16.42, 00:06:12, Serial0
D 192.168.16.192/28 [90/2323456] via 192.168.16.18, 00:06:27, Ethernet0
D 192.168.16.128/28 [90/307200] via 192.168.16.17, 00:06:12, Ethernet0
Grissom#
Stuck-in-Active Neighbors
When a route goes active and queries are sent to neighbors, the route will remain active until a reply is received for every query. But what happens if a neighbor is dead or otherwise incapacitated and cannot reply? The route would stay permanently active. The active timer and SIA-retransmit timer are designed to prevent this situation. Both the active timer and the SIA-retransmit timer are set when a query is sent. If the SIA-retransmit timer is not supported by the router's IOS (IOS versions earlier than 12.2[4.1]), only the active timer is used. If the timers expire before a reply to the query is received, the route is declared stuck-in-active, the neighbor is presumed dead, and it is flushed from the neighbor table.
The SIA route and any other routes via that neighbor are eliminated from the route table. DUAL will be satisfied by considering the neighbor to have replied with an infinite metric.
In reality, this sequence of events should never happen. The loss of Hellos should identify a disabled neighbor long before the active timer expires.
But what happens in large EIGRP networks where a query might, like the bunny in the battery advertisement, keep going and going? Remember that queries cause the diffusing calculation to grow larger, whereas replies cause it to grow smaller (refer to Figure 7-6). Queries must eventually reach the edge of the network, and replies must eventually begin coming back, but if the diameter of the diffusing calculation grows large enough, an active timer might expire before all replies are received. The result, flushing a legitimate neighbor from the neighbor table, is obviously destabilizing.
When neighbors mysteriously disappear from neighbor tables and then reappear, or users complain of intermittently unreachable destinations, SIA routes might be the culprit. Checking the error logs of routers is a good way to find out whether SIAs have occurred (Example 7-49).
Example 7-49. The final entry of this error log shows a SIA message.
Gagarin#show logging
Syslog logging: enabled (0 messages dropped, 0 flushes, 0 overruns)
Console logging: level debugging, 3369 messages logged
Monitor logging: level debugging, 0 messages logged
Trap logging: level informational, 71 message lines logged
Buffer logging: level debugging, 3369 messages logged
Log Buffer (4096 bytes):
...
...
...
DUAL: dual_rcvupdate(): 10.51.1.0/24 via 10.1.2.1 metric 409600/128256
DUAL: Find FS for dest 10.51.1.0/24. FD is 4294967295, RD is 4294967295 found
DUAL: RT installed 10.51.1.0/24 via 10.1.2.1
DUAL: Send update about 10.51.1.0/24. Reason: metric chg
DUAL: Send update about 10.51.1.0/24. Reason: new if
DUAL: dual_rcvupdate(): 10.52.1.0/24 via 10.1.2.1 metric 409600/128256
DUAL: Find FS for dest 10.52.1.0/24. FD is 4294967295, RD is 4294967295 found
%DUAL-3-SIA: Route 10.11.1.0/24 stuck-in-active state in IP-EIGRP 1. Cleaning up
Gagarin#
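When many routers must be checked, log output can be scanned for the SIA message rather than read by eye. The following is a minimal Python sketch that assumes the log text has already been collected from the router; the message format is taken from the final entry of Example 7-49, and the names SIA_RE, find_sia_events, and LOG are hypothetical.

```python
import re

# Pattern modeled on the %DUAL-3-SIA line shown in Example 7-49.
SIA_RE = re.compile(
    r"%DUAL-3-SIA: Route (\S+) stuck-in-active state in IP-EIGRP (\d+)"
)


def find_sia_events(log_text):
    """Return (route, process_id) for every SIA message found in the log text."""
    return [(m.group(1), int(m.group(2))) for m in SIA_RE.finditer(log_text)]


# Two lines excerpted from the log buffer in Example 7-49.
LOG = """\
DUAL: Send update about 10.51.1.0/24. Reason: new if
%DUAL-3-SIA: Route 10.11.1.0/24 stuck-in-active state in IP-EIGRP 1. Cleaning up
"""

print(find_sia_events(LOG))
```

Other IOS releases may word the message differently, so the pattern should be treated as an assumption to verify against the logs actually being searched.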
When chasing the cause of SIAs, close attention should be paid to the topology table in routers. If routes can be "caught" in the active state, the neighbors from whom replies have not yet been received should be noted. For example, Example 7-50 shows a topology table in which several routes are active. Notice that most of them have been active for 15 seconds and that one (10.6.1.0) has been active for 41 seconds.
Example 7-50. This topology table shows several active routes, all of which are waiting for a reply from neighbor 10.1.2.1.
Gagarin#show ip eigrp topology
IP-EIGRP Topology Table for process 1
Codes: P - Passive, A - Active, U - Update, Q - Query, R - Reply
r - Reply Status
A 10.11.1.0/24, 0 successors, FD is 3072128000, Q
1 replies, active 00:00:15, query-origin: Local origin
Remaining replies:
via 10.1.2.1, r, Ethernet0
A 10.10.1.0/24, 0 successors, FD is 3584128000, Q
1 replies, active 00:00:15, query-origin: Local origin
Remaining replies:
via 10.1.2.1, r, Ethernet0
A 10.9.1.0/24, 0 successors, FD is 4096128000, Q
1 replies, active 00:00:15, query-origin: Local origin
Remaining replies:
via 10.1.2.1, r, Ethernet0
A 10.2.1.0/24, 1 successors, FD is Inaccessible, Q
1 replies, active 00:00:15, query-origin: Local origin
Remaining replies:
via 10.1.2.1, r, Ethernet0
P 10.1.2.0/24, 1 successors, FD is 281600
via Connected, Ethernet0
A 10.6.1.0/24, 0 successors, FD is 3385160704, Q
1 replies, active 00:00:41, query-origin: Local origin
Remaining replies:
via 10.1.2.1, r, Ethernet0
A 10.27.1.0/24, 0 successors, FD is 3897160704, Q
--More--
Notice also that in each case, the neighbor 10.1.2.1 has its reply status flag (r) set. That is the neighbor from which replies have not yet been received. There might be no problem with the neighbor itself or with the link to the neighbor, but this information points to the direction within the network topology in which the investigation should proceed.
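The same reading of the topology table can be sketched in Python. This is an illustrative parser only, assuming output in the layout of Example 7-50; the names SAMPLE and pending_replies are hypothetical. For each active route it reports the neighbor whose reply is still outstanding and how long the route has been active.

```python
import re
from collections import Counter

# Excerpt of the topology table from Example 7-50.
SAMPLE = """\
A 10.11.1.0/24, 0 successors, FD is 3072128000, Q
    1 replies, active 00:00:15, query-origin: Local origin
    Remaining replies:
    via 10.1.2.1, r, Ethernet0
P 10.1.2.0/24, 1 successors, FD is 281600
    via Connected, Ethernet0
A 10.6.1.0/24, 0 successors, FD is 3385160704, Q
    1 replies, active 00:00:41, query-origin: Local origin
    Remaining replies:
    via 10.1.2.1, r, Ethernet0
"""


def pending_replies(text):
    """Return (route, neighbor, seconds_active) for each outstanding reply."""
    pending = []
    route = None        # the active route currently being read, if any
    active_for = None   # seconds the current route has been active
    for line in text.splitlines():
        s = line.strip()
        m = re.match(r"^([AP])\s+(\S+?),", s)
        if m:
            # A new table entry starts; only active (A) routes are tracked.
            route = m.group(2) if m.group(1) == "A" else None
            active_for = None
            continue
        if route is None:
            continue
        m = re.search(r"active (\d+):(\d+):(\d+)", s)
        if m:
            hours, minutes, seconds = map(int, m.groups())
            active_for = hours * 3600 + minutes * 60 + seconds
            continue
        m = re.match(r"^via (\d+\.\d+\.\d+\.\d+), r,", s)
        if m:
            pending.append((route, m.group(1), active_for))
    return pending


waiting = pending_replies(SAMPLE)
for route, neighbor, secs in waiting:
    print(f"{route} waiting on {neighbor} for {secs}s")
print(Counter(neighbor for _, neighbor, _ in waiting).most_common(1))
```

Counting outstanding replies per neighbor is one way to see at a glance which direction the investigation should take; in this excerpt every outstanding reply is owed by 10.1.2.1.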
Common causes of SIAs in larger EIGRP networks are heavily congested, low-bandwidth data links and routers with low memory or overutilized CPUs. The problem will be exacerbated if these limited resources must handle very large numbers of queries.
The careless adjustment of the bandwidth parameter on interfaces might be another cause of SIAs. Recall that EIGRP is designed to use no more than 50 percent of the available bandwidth of a link. This restriction means that EIGRP's pacing is keyed to the configured bandwidth. If the bandwidth is set artificially low in an attempt to manipulate routing choices, the EIGRP process might be starved. If IOS 11.2 or later is being run, the command ip bandwidth-percent eigrp may be used to adjust the percentage of bandwidth used.
For example, suppose that an interface is connected to a 56K serial link, but the bandwidth is set to 14K. EIGRP would limit itself to 50 percent of this amount, or 7K. The commands in Example 7-51 adjust the EIGRP bandwidth percent to 200 percent. Two hundred percent of 14K is 28K, which is 50 percent of the actual bandwidth of the 56K link.
Example 7-51. Router configuration adjusts the percentage of the configured bandwidth that EIGRP will use.
interface Serial 3
ip address 172.18.107.210 255.255.255.240
bandwidth 14
ip bandwidth-percent eigrp 1 200
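The arithmetic behind this configuration is simple enough to check in a few lines of Python. The function name eigrp_budget_kbps is hypothetical, and the sketch models only the percentage calculation described in the text, not EIGRP's actual packet pacing.

```python
def eigrp_budget_kbps(configured_bw_kbps, percent=50):
    """Bandwidth EIGRP may use, as a percentage of the configured bandwidth."""
    return configured_bw_kbps * percent / 100


actual_link_kbps = 56
configured_kbps = 14

default_budget = eigrp_budget_kbps(configured_kbps)        # default 50 percent
adjusted_budget = eigrp_budget_kbps(configured_kbps, 200)  # ip bandwidth-percent eigrp 1 200
target_budget = eigrp_budget_kbps(actual_link_kbps)        # 50 percent of the real link

print(default_budget, adjusted_budget, target_budget)
```

The adjusted budget equals the target, which is the point of choosing 200 percent: the artificially low bandwidth statement continues to influence route selection, while EIGRP is allowed the share of the physical link it would have had with an accurate bandwidth setting.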
Increasing the active timer period with the timers active-time command might help avoid SIAs in some situations, but this step should not be taken without careful consideration of the effects it might have on reconvergence.
A new timer, the SIA-retransmit timer, and two new EIGRP packet types, SIA-query and SIA-reply, help to minimize SIAs and to push the reset of the neighbor to the link that is actually having the problem responding to queries.
Consider the network in Figure 7-39. From router Mercury, EIGRP will route traffic to network 172.16.100.0 via Apollo. Vostok is not a feasible successor because the metric from Vostok to 172.16.100.0 is too high. Vostok routes traffic to 172.16.100.0 via Mercury and Apollo. Soyuz is not a feasible successor because the metric from Soyuz to 172.16.100.0 is too high.
When the link between Mercury and Apollo fails, as shown in Figure 7-40, Mercury places the address 172.16.100.0 (and any other address known via neighbor Apollo) into Active, and sends a query to Vostok. Vostok also places the address into Active state, and sends a query to Soyuz. The Active timers are set. In addition, the SIA-retransmit timers are set. The SIA-retransmit timer is set to one-half the value of the Active timer, typically 90 seconds.
When the SIA-retransmit timer expires, Mercury sends an SIA-query to Vostok. Vostok sends an SIA-query to Soyuz. Vostok responds to the SIA-query from Mercury with an SIA-reply. Mercury resets the Active timer and the SIA-retransmit timer. The routers will send up to three SIA-queries (assuming no reply has been received from the original address query) as long as SIA-replies are received, before resetting a neighbor. So as long as a neighbor router responds to the SIA-queries, it won't be declared stuck-in-active and reset for six minutes, assuming a default Active time of 180 seconds. This gives ample time for a large network to respond to queries.
But suppose there is a problem on the link from Vostok to Soyuz that is allowing enough Hellos to get through to keep the neighbors active, but the SIA-reply is not received by Vostok within the SIA-retransmit time. If no SIA-reply is received within 90 seconds of an SIA-query, and no response to the original address query has been received, Vostok will reset neighbor Soyuz and reply to Mercury's original query that the address is unreachable.
The SIA-retransmit timer does two things. If neighbors are responding to SIA-queries, large networks are given more time to respond to address queries. If neighbors are not responding, the neighbor is reset. Only the router that is not receiving responses from its neighbor will reset the adjacency. Before the SIA-retransmit timer was introduced, any router that did not receive a response to an active query after the Active timer expired would reset the neighbor adjacency, even if the problem was somewhere downstream in the network.
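The six-minute figure quoted earlier follows from the timer values given in the text, and a small Python sketch makes the arithmetic explicit. It is a simplified model under stated assumptions: the SIA-retransmit interval is half the Active time, every SIA-query is answered with an SIA-reply, and the original query is never answered. The function name sia_schedule is hypothetical, and actual IOS timing may differ in detail.

```python
def sia_schedule(active_time=180, max_sia_queries=3):
    """Return (events, reset_deadline), in seconds after the original query.

    Assumes each SIA-query is answered by an SIA-reply, so the neighbor is
    not reset until one retransmit interval after the last SIA-query.
    """
    retransmit = active_time / 2  # SIA-retransmit timer: half the Active timer
    events = [
        (retransmit * n, f"SIA-query {n} sent")
        for n in range(1, max_sia_queries + 1)
    ]
    reset_deadline = retransmit * (max_sia_queries + 1)
    return events, reset_deadline


events, deadline = sia_schedule()
for when, what in events:
    print(f"t={when:>5.0f}s  {what}")
print(f"t={deadline:>5.0f}s  neighbor reset if the original query is still unanswered")
```

With the default Active time of 180 seconds, the SIA-queries fall at 90, 180, and 270 seconds, and the reset deadline at 360 seconds, or six minutes. If an SIA-reply is missed, the text indicates that the reset happens at the end of that 90-second interval instead.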
A good network design is the best solution to instabilities such as SIA routes. By using a combination of intelligent address assignment, route filtering, default routes, stub routing, and summarization, boundaries may be constructed in a large EIGRP network to restrict the size and scope of diffusing computations. Chapter 13 includes an example of such a design.