9.2. Security and Reliability Features
OSPF and IS-IS have a number of features (some of them part of the protocol specifications, some of them extensions of the protocol, and some of them inherent characteristics of the protocol) that increase both the security and the reliability of the protocol.
9.2.1. Inherent Security
IS-IS has one significant security advantage over OSPF, which is that the protocol messages themselves are not carried in IP packets. Because of this, IS-IS cannot be attacked by sending faked protocol messages from an external source. Attacks on IS-IS require physical access to a link or router, or logical access such as Telnet or SNMP to a router running IS-IS.
In cases where OSPF accepts only IP packets addressed to the multicast AllSPFRouters address (224.0.0.5), the protocol is safe from faked packets sent from outside the network because the address has a link-local scope, and routers do not forward packets with such a destination address. Unfortunately, RFC 2328 requires only OSPF point-to-point interfaces to be limited to accepting this destination address. Interfaces to all other OSPF network types can accept unicast packets and so can be reached from external sources if not protected by other means.
An inherent security feature normally mentioned in association with OSPF but that also applies to IS-IS is "fightback."[2] Fightback is the result of normal protocol behavior in which a router, seeing an LSA or LSP that it supposedly originated but which does not match its own LS database information, will originate a new LSA or LSP or will attempt to flush the bogus LSA or LSP. So if an attacker tries to send a spoofed LSA or LSP that pretends to be from a legitimate router on the network, the effect of the bogus PDU is limited because the legitimate router will eventually see it and take measures to remove it.
Attacks against sequence numbers or age values and attacks that attempt to inject false link state information should all trigger a fightback.
Because of the fightback behavior, an attack that lobs just a few bogus LSAs or LSPs into a network will not be very effective. The attacker must send persistent PDUs to defeat the fightback behavior, but this in turn increases the exposure of the attacker. If the attacker is willing to accept the exposure, or can otherwise hide himself, the persistent PDUs will overcome the fightback and create sufficient thrashing of SPF processes and routing information to effectively disrupt the domain, resulting in a successful denial of service. The fightback behavior also can be circumvented by PDUs originated by a "phantom" router or if the target router can be fooled into thinking the bogus PDUs come from a legitimate but partitioned router.[3] So you should not rely too much on fightback to make your network safe.
9.2.2. Authentication
Enabling authentication is one of the two most important measures you can take to secure any routing protocol (the other is good filtering practice). Authentication, which is supported by both OSPF and IS-IS, is simply a mechanism by which two neighbors prove their identity to each other by using a shared secret. No protocol message is accepted from a neighbor unless the message is correctly authenticated. As a result, no logical attack involving sending spoofed messages can be launched unless the attacker can learn the shared secret.
Authentication is also useful in preventing certain non-malicious errors. Specifically, it can prevent routers from mistakenly joining an OSPF domain. For example, a service provider might accidentally enable OSPF on a customer-facing external link. At the same time, the customer might through accident or ignorance have OSPF running on his external link to the service provider. If both interfaces on the link happen to have the same AID, OSPF can create an adjacency, and the two domains can merge. If the provider and customer are lucky, nothing more than an undesirable situation will arise. If they are unlucky, address conflicts, incorrect routing, security breaches, and network outages will result. Although such a scenario might seem far-fetched, these mistakes can and do occur.
9.2.2.1. Authentication Types
Authentication is accomplished by means of either a simple password shared between neighbors or an MD5 (Message Digest version 5)[4] cryptographic checksum. When simple password authentication is used, two neighbors share a password, and all messages exchanged between them must contain the password. Any message that does not include the correct password is dropped. Although it is better than no authentication at all, this mechanism is not secure. The password is carried in the messages in clear text, so if an attacker can gain access to a link and "sniff" a protocol message, he can easily read the password.
Figures 9.1 and 9.2 show protocol analyzer captures of OSPF and IS-IS Hellos, respectively, that are using simple password authentication. The password in both is easily read.
When MD5 authentication is used, neighbors share a password, called a key, but the key is never exchanged between the neighbors. Instead, the neighbor originating a packet computes a 128-bit digital fingerprint called a hash or message digest by running a mathematical algorithm using a combination of the packet contents and the key. The hash is then added to the packet, and the packet is transmitted. The receiving neighbor, knowing the same secret key, runs the same computation against the packet contents and the key. If the resulting hash is identical to the hash contained in the packet, the packet is accepted. If the authentication fails, the packet is dropped.[5]
Note that MD5, as described in RFC 1321, is an algorithm only for computing a digest of some data string; it hashes data and does not encrypt it. The adaptation of this algorithm to hash together a combination of a key and a message for authentication is called Hashed Message Authentication Code (HMAC-MD5).[6] Although IS-IS documentation correctly references HMAC-MD5, OSPF documentation consistently refers to just MD5 authentication (undoubtedly a legacy of having supported message/key hashing authentication years earlier than IS-IS did). The cryptographic authentication used by both OSPF and IS-IS is HMAC-MD5, so do not let the documentation mislead you into thinking they differ.
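The keyed-hash exchange described above can be sketched with Python's standard hmac and hashlib modules. This is only an illustration of the idea, not either protocol's exact procedure: the function names, the key, and the packet bytes are hypothetical, and each protocol's specification defines precisely which bytes are hashed and how the key is combined with them.

```python
import hashlib
import hmac


def make_digest(key: bytes, packet: bytes) -> bytes:
    """Return a 16-byte (128-bit) HMAC-MD5 digest of the packet under the shared key."""
    return hmac.new(key, packet, hashlib.md5).digest()


def verify(key: bytes, packet: bytes, received_digest: bytes) -> bool:
    """Recompute the digest locally and compare it with the one carried in the packet."""
    return hmac.compare_digest(make_digest(key, packet), received_digest)


# Hypothetical values for illustration only.
shared_key = b"example-key"
packet = b"hypothetical protocol message bytes"

digest = make_digest(shared_key, packet)          # originator side
accepted = verify(shared_key, packet, digest)     # receiver with the same key
rejected = verify(b"wrong-key", packet, digest)   # receiver with a different key
```

The key itself never appears in the transmitted data; only the digest does, which is why a sniffed packet does not reveal the shared secret.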
Figures 9.3 and 9.4 again show captured OSPF and IS-IS Hellos, but this time MD5 authentication is used. The shared passwords are again stan for OSPF and ollie for IS-IS, but those names are nowhere to be found in the packets. Instead, there are 16-byte (128-bit) message digests that appear to be random numbers.
Yet another reason for using MD5 authentication is that it provides stronger error detection than either the default OSPF message checksums or the optional IS-IS checksums. See Section 9.2.3 for more information.
9.2.2.2. OSPF Authentication
Prior to RFC 2178, OSPF authentication had only an area scope. That is, authentication had to be enabled on all routers and over all links in an area or not at all. RFC 2178 changes that requirement, so that now authentication can be enabled on a link scope; that is, authentication can occur over a single link without having to be enabled on other links in an area. Although authentication is highly encouraged on all links, this change does give you some flexibility in accommodating routers that might not support OSPF authentication. (But that, in turn, raises the question of the wisdom of allowing any OSPF router that does not support authentication into your network.)
The OSPF authentication type and authentication data are carried in the header of every OSPF message (Figure 9.5). The AuType field indicates the type of authentication used: 0 for Null (no authentication), 1 for simple password, and 2 for cryptographic (MD5) authentication.
Figure 9.5. The OSPF header format when null or simple password authentication is used.
When Null authentication is used, receiving routers ignore the 64-bit authentication field. Therefore, the field can contain anything; most implementations will set it to all 0s. If simple password authentication is used, the password is carried in the authentication field. Because the field is 64 bits in length, and an ASCII character is 8 bits, the password can be up to 8 characters. If the password is fewer than 8 characters, 0s are appended to pad the field out to 64 bits.
If MD5 authentication is used, the header format changes as shown in Figure 9.6. The 128-bit message digest (hash) is appended to the end of the message. The Authentication Data length field specifies, in bytes, the length of the appended message digest. Because the hash is always 128 bits (16 bytes), the value of this field should always be 16. The message digest is not considered a part of the OSPF message, and is not accounted for in the Packet Length field of the OSPF header. But as a part of the overall IP packet payload, it is accounted for in the Total Length field of the IP packet header.
Figure 9.6. The OSPF header format when MD5 authentication is used.
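The two uses of the 64-bit authentication field can be sketched in Python as follows. The function names are hypothetical. The layout of the MD5 variant (two zero bytes, a 1-byte Key ID, a 1-byte Authentication Data length, and a 4-byte cryptographic sequence number) is my reading of RFC 2328 and should be checked against the RFC before being relied on.

```python
import struct

AUTH_FIELD_LEN = 8   # the authentication field is 64 bits
MD5_DIGEST_LEN = 16  # the appended digest is always 128 bits


def encode_simple_password(password: str) -> bytes:
    """Pad an ASCII password with zero bytes to fill the 64-bit field."""
    raw = password.encode("ascii")
    if len(raw) > AUTH_FIELD_LEN:
        raise ValueError("simple password is limited to 8 characters")
    return raw.ljust(AUTH_FIELD_LEN, b"\x00")


def encode_md5_auth_field(key_id: int, crypto_seq: int) -> bytes:
    """Pack the 64-bit field for MD5 authentication (assumed layout, see lead-in)."""
    # "!" = network byte order; H = 16 zero bits, B = Key ID,
    # B = Authentication Data length, I = cryptographic sequence number.
    return struct.pack("!HBBI", 0, key_id, MD5_DIGEST_LEN, crypto_seq)


padded = encode_simple_password("stan")
md5_field = encode_md5_auth_field(1, 0x414764D7)
```

Both functions return exactly 8 bytes, so the header size is the same whichever authentication type is configured; only the MD5 case adds 16 bytes after the end of the OSPF message.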
OSPF has a nice feature for changing keys without disrupting the adjacency between neighbors. You can configure multiple keys on an interface and assign each a numeric identifier between 1 and 255. For every message the router sends on the interface, it sends a copy authenticated by each key with the identifier carried in the Key ID field. Neighbors look at the key ID and, if they have a key with the same identifier, use that key for authentication. If there is no matching identifier, the message is dropped. Thus, when you are changing keys, messages continue to be authenticated using the old key. After all neighbors have been configured with the new key, and messages are being authenticated with that key, you can go back and delete the old key. OSPF MD5 authentication also includes a cryptographic sequence number, which is used to protect against replay attacks. A replay attack is one in which authenticated OSPF packets are copied off of a link and then replayed onto the link at a later time to disrupt or confuse communication between two OSPF neighbors. The cryptographic sequence number is a 32-bit number that the router associates with a neighbor; the router increments the number regularly, and whenever an OSPF message is sent to the neighbor the current value of the number is added to the Cryptographic Sequence Number field. The neighbor, upon receipt of a message, remembers the number. If a subsequent message is received with a cryptographic sequence number less than the current known value, the message is dropped. The idea is that if an OSPF message has been sniffed from the link and replayed at a later time, its sequence number should no longer be valid. RFC 2328 does not specify a period for incrementing the cryptographic sequence number, leaving that decision up to individual implementers, but it suggests incrementing based on a simple counter or the system clock. 
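The receive-side sequence number check described above can be sketched as follows. This is a minimal illustration with hypothetical class and method names; a real implementation keeps this value in the neighbor data structure alongside much other state.

```python
class NeighborAuthState:
    """Tracks the last cryptographic sequence number accepted from one neighbor."""

    def __init__(self) -> None:
        self.last_seq = 0

    def accept(self, seq: int) -> bool:
        """Drop a message whose sequence number is less than the last known value."""
        if seq < self.last_seq:
            return False      # looks like a replay of an older message
        self.last_seq = seq   # remember the most recent number seen
        return True


neighbor = NeighborAuthState()
results = [neighbor.accept(s) for s in (100, 100, 101, 99)]
```

Here the first three messages are accepted (an equal number is not "less than" the known value) and the last one, carrying an older number, is dropped.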
A potential problem with this sequence is that there is no provision for a rollover procedure from the maximum value to 0. When a router reaches the maximum value (2^32 - 1), it resets the number to 0. However, then subsequent messages have values less than the last known number, and the neighbor drops the messages. Because the messages (particularly Hellos) are dropped, the adjacency times out when the RouterDeadInterval expires. When the neighbor changes the router's state to Down, it resets the expected sequence number for the router to 0. Then when the router begins sending messages to reestablish the adjacency, the neighbor will accept them. In reality, rollover should not be a problem in a normally functioning network. With a 32-bit sequence number starting at 0, if the sequence number is incremented once per second it will take more than 135 years to reach the maximum value.
Although cryptographic sequence numbers can prevent some replay attacks, they are not foolproof and in fact can be exploited for a disruptive attack. Notice in Figure 9.3 that the captured packet's cryptographic sequence number, 0x414764d7, is clearly displayed. An attacker could modify the sequence number of a captured message, increasing the number by a significant amount. The message can then be replayed onto the link. The target neighbor will accept these packets with the "more recent" number and begin dropping the legitimate messages, causing the adjacency to fail. When that happens, the attacker can continue to replay the high-numbered messages onto the link, usurping the legitimate messages.
The Authentication Type and Authentication information are stored in the interface data structure (except for the cryptographic sequence number, when used, which is stored in the neighbor data structure). As this implies, you can configure authentication differently on each interface. However, this is usually overkill.
Unless you have reason to mistrust a specific neighbor, it is more manageable to use the same authentication key throughout the OSPF domain.
9.2.2.3. IS-IS Authentication
The IS-IS authentication type and authentication data are carried in the Authentication Information TLV (Figure 9.7). The TLV type is 10,[7] and the TLV can be carried in all IS-IS PDU types. ISO 10589 only specifies clear-text password authentication, but because the writers recognized that other authentication methods would be desirable they included an Authentication Type field to specify the method used and a variable Value field that can accommodate a wide range of authentication data.
Figure 9.7. The IS-IS Authentication Information TLV.
The currently assigned values of the authentication type field are 1 (clear-text password), 54 (HMAC-MD5), and 255 (private authentication).
The other possible values of the field are reserved for future use. There is no null or "no authentication" value for IS-IS, as there is for OSPF, because the Authentication Information TLV is optional. If authentication is not configured, the TLV is not included in any IS-IS PDUs. Type 255 authentication is, as the name says, for privately developed authentication mechanisms.
When type 1 clear-text password authentication is used, the Authentication Value field carries the ASCII representation of the password. Because the TLV Length field is 1 byte, the maximum length that can be specified is 255 bytes. One of those bytes is taken by the Authentication Type field, and each ASCII character of a clear-text password is 1 byte, so the largest password the TLV can carry is 254 characters, although some implementations might limit you to a smaller password length.
IS-IS HMAC-MD5 cryptographic authentication, authentication type 54, is specified in RFC 3567.[8] As with OSPF, the algorithm takes as input the message to be sent and a secret key, and creates a 128-bit cryptographic hash. The hash is carried in the Authentication Value field of the Authentication Information TLV; receiving routers run the same algorithm against the message contents using their own key, and compare the resulting hash with the hash in the Authentication Information TLV. If the originator and receiver have the same key, the hashes should match and the message is authenticated. If the hashes do not match, the message is rejected.
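The encoding of the Authentication Information TLV can be sketched in Python as below. The function and constant names are hypothetical, and the sketch assumes that the 1-byte Length field counts the Authentication Type byte plus the Authentication Value. The digest is a placeholder of 16 zero bytes; computing a real one over a PDU is outside the scope of this sketch.

```python
import struct

AUTH_TLV_TYPE = 10        # Authentication Information TLV
AUTH_TYPE_CLEARTEXT = 1   # clear-text password
AUTH_TYPE_HMAC_MD5 = 54   # HMAC-MD5


def build_auth_tlv(auth_type: int, value: bytes) -> bytes:
    """Encode type, length, authentication type, and authentication value."""
    length = 1 + len(value)  # Authentication Type byte + value (assumption, see lead-in)
    if length > 255:
        raise ValueError("value does not fit in a 1-byte TLV length")
    return struct.pack("!BBB", AUTH_TLV_TYPE, length, auth_type) + value


cleartext_tlv = build_auth_tlv(AUTH_TYPE_CLEARTEXT, b"ollie")
placeholder_digest = bytes(16)  # stands in for a 16-byte HMAC-MD5 result
hmac_tlv = build_auth_tlv(AUTH_TYPE_HMAC_MD5, placeholder_digest)
```

With a 16-byte value the encoded TLV is 19 bytes long: one byte each for the type, length, and Authentication Type fields, followed by the digest.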
The Checksum and Remaining Lifetime LSP fields are set to 0 by both the originator and the receiver before calculating the hash, to negate any influence that a change in either field during transmission might have on the resulting hash. Changing the values of these fields to 0 is only for the authentication algorithm; the actual values of the fields are stored separately. The total size of the Authentication Information TLV when HMAC-MD5 authentication is used is 19 bytes (a 1-byte type field, 1-byte length field, 1-byte Authentication Type field, and a 16-byte Authentication Value field).
IS-IS HMAC-MD5 authentication does not have a sequencing mechanism like that used by OSPF. So if an attacker can gain physical access to an IS-IS link, it is possible to run a replay attack; IS-IS would then need to rely on fightback characteristics to resist the attack.
IS-IS authentication has three possible scopes: Link, Area, and Domain.
When Link authentication scope is enabled, the Authentication Information TLV is carried in Hello PDUs. When Area authentication is enabled, the TLV is carried in all L1 LSPs and SNPs; and when Domain authentication is enabled, the TLV is carried in all L2 LSPs and SNPs. Some IS-IS implementations might not include a separate Link authentication scope, and instead include L1 Hellos in the Area scope and L2 Hellos in the Domain scope. For each scope supported, you have the option of using the same or separate keys. For example, with the Link scope you can use separate keys on each interface, and with the Area scope you can use separate keys in each area. All three authentication scopes, and authentication of all IS-IS PDUs, should be supported by any implementation, but in reality support can vary. To compensate for this fact some IS-IS implementations will allow you more detailed control over what is authenticated. For example, you might be able to authenticate Hellos, LSPs, and CSNPs but ignore authentication of PSNPs. Juniper Networks JUNOS, for instance, provides the following options that can be enabled for L1, L2, or both:
Such intentionally reduced support should be used only when absolutely necessary, to accommodate a system within the domain that does not include full IS-IS authentication. The wisest approach, of course, is to ensure that all systems you install in your network have full HMAC-MD5 authentication support.
Like OSPF, IS-IS allows you to gracefully install or change authentication without breaking adjacencies, but instead of using a key-id scheme it exploits a basic characteristic of the protocol: If an unknown TLV is received, it is ignored. So if an IS-IS router on which authentication is not enabled receives a PDU with a type 10 TLV, the TLV is ignored, and the PDU is accepted. Some implementations therefore allow you to send authenticated IS-IS PDUs while accepting PDUs whether they are authenticated or not. You can enable this option on all routers in the affected scope, enable or change the keys, and then disable the option on all routers so that only authenticated PDUs are accepted. (The Cisco Systems IOS command is isis authentication send-only and the Juniper Networks JUNOS command is no-authentication-check.) This scheme proves particularly useful for enabling authentication on an operational IS-IS network without scheduling downtime. For changing keys, it appears to be more operationally intense than the OSPF key-id method; but as recommended previously, regular key changes should be performed with scripts, and scripts can easily incorporate this procedure.
9.2.3. Checksums
The OSPF message header (Figure 9.8) includes a Checksum field for helping to verify the integrity of the message. The checksum algorithm is the same one used in most IP headers: The message, except for the 64-bit Authentication field, is divided into 16-bit sections, and the one's complement sum of all these sections is calculated.
The one's complement of that sum is then calculated (so that the result is the one's complement of the one's complement sum) and included in the Checksum field before transmission.[9] Receiving OSPF routers make the same calculation and compare the results. Conflicting checksum values indicate that an error has occurred during transmission.
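The IP-style checksum just described can be sketched in a few lines of Python. The function name is hypothetical, and the sketch covers only the generic algorithm over a byte string; it does not skip the OSPF Authentication field, which a real OSPF implementation would exclude.

```python
def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum of all 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries above bit 15 back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


words = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(words)

# A receiver that sums the data together with the transmitted checksum gets 0.
residual = internet_checksum(words + checksum.to_bytes(2, "big"))

# Reordering the 16-bit words does not change the result.
reordered = internet_checksum(bytes.fromhex("f6f7f4f5f2030001"))
```

The last line hints at a limitation of this algorithm: because addition is commutative, swapping whole 16-bit words leaves the checksum unchanged.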
Figure 9.8. The header used by all five OSPF message types includes an IP-style 16-bit checksum field.
This checksum algorithm is weaker than other error-detection algorithms, such as the cyclical redundancy check (CRC) used by some data-link protocols. It cannot, for instance, detect multiple canceling bit errors or the reordering of bytes. Interestingly, the checksum algorithm used with OSPF LSAs is not the IP-style one's complement checksum, but is instead an ISO-style Fletcher checksum.[10] Fletcher checksums are also based on one's complement arithmetic but use a more complicated algorithm than IP-style checksums, producing error detection on par with CRC.
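The position sensitivity that distinguishes a Fletcher checksum from a plain sum can be seen in a short Python sketch. This is a generic Fletcher-16 over bytes with modulus 255 and a hypothetical function name. It computes only the two running sums; it does not reproduce the check-byte placement that the OSPF and ISO specifications define for LSAs and LSPs.

```python
def fletcher16(data: bytes) -> int:
    """Generic Fletcher-16: two running sums, each taken modulo 255."""
    c0 = 0  # simple sum of the bytes
    c1 = 0  # sum of the running sums, which makes byte position matter
    for byte in data:
        c0 = (c0 + byte) % 255
        c1 = (c1 + c0) % 255
    return (c1 << 8) | c0


original = fletcher16(b"\x01\x02")
swapped = fletcher16(b"\x02\x01")
```

The two inputs contain the same bytes in a different order, yet the results differ, because the second running sum weights each byte by its position in the message.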
Unlike OSPF, the IS-IS PDU header (Figure 9.9) does not have a Checksum field. So whereas LSPs have a 16-bit Fletcher checksum, Hellos and SNPs must rely on the underlying data-link error detection, if any. To address the concerns about this data-link dependence, RFC 3358 adds an optional checksumming capability.[11] When this option is supported, a Checksum TLV (Figure 9.10) is added to Hellos and SNPs, a 16-bit Fletcher checksum is calculated over the full contents of the PDU, and the result is carried in the Value field of the TLV. The TLV type is 12, and the length of the Value field is 2 bytes.
Figure 9.9. The IS-IS PDU header does not have a Checksum field.
Figure 9.10. The optional IS-IS Checksum TLV.
If a router supporting the optional checksum capability receives a Hello or SNP with a Checksum TLV and the checksum fails, the router rejects the message. However, if it receives a message that does not contain a Checksum TLV, the router accepts the message. This provides backward compatibility with systems that do not support the option. Systems that do not support the option ignore the Checksum TLV and accept otherwise-valid messages containing it.
If MD5 authentication is used, and a transmission error causes a change to an OSPF or IS-IS message, the authentication check will fail and the receiver will reject the message. MD5 is stronger at detecting errors than either IP-style checksums or Fletcher checksums, providing yet another reason for using MD5 authentication in your OSPF or IS-IS network. Therefore, the standard checksum procedures of OSPF and the optional checksum procedures of IS-IS change when MD5 authentication is enabled. When OSPF MD5 authentication is enabled, the router does not calculate a checksum and sets the checksum field in the header to 0x0000. Similarly, if an IS-IS router supports optional checksums and HMAC-MD5 authentication is enabled, it either sets the checksum value of the Checksum TLV to 0x0000 or it does not include the TLV in the message at all.
This procedural change is particularly important for IS-IS because the originator sets the value of the Checksum TLV to 0 before calculating the MD5 hash. Receiving systems that do not support the checksum option will ignore the Checksum TLV and accept the packet, but will include the TLV in the MD5 calculation, causing an authentication failure.
9.2.4. Graceful Restart
One of the first things we all learned about routing is that it consists of two basic processes: path determination and packet forwarding.
Modern high-performance routers implement them as separate physical components, with their own processors and memory, as depicted in Figure 9.11. When routing protocol messages are received, the packet forwarding module (of which the router interfaces are a part) sends the messages to the route processing module. The routing protocols running on the route processing module create a routing information database (RIB). The best path to each destination in the RIB is chosen, and this information is used to form the forwarding information database (FIB), which the route processing module sends to the packet forwarding module. The packet forwarding module then forwards according to the information in the FIB without having to directly consult the route processing module.
Figure 9.11. A conceptual model of high-performance routers in which discrete components perform route processing and packet forwarding.
This delegation of the two basic processes to separate physical components provides performance advantages during times of heavy load: The path determination component can process many routes during times of severe network change without taking resources from the packet forwarding component, and the packet forwarding component can handle peak traffic loads without taking resources from the path determination component.
A side effect of this architecture is that so long as the network topology does not change the packet forwarding module can continue to forward packets based on the FIB even if the route processing module stops operating. This is the basis of graceful restart, also called nonstop forwarding: If the routing protocol stops and restarts for some reason, such as a software error causing a protocol reset, a switchover to a backup route processing module, or a manual reset as part of operational maintenance, the router can continue forwarding packets based on the FIB created before the restart.
Thus, graceful restart contributes to area stability both by maintaining forwarding paths during a restart and by reducing the LSA/LSP flooding and SPF/FIB churn normally accompanying a router restart. The contingency is that if the network topology changes while the routing protocol is down, the accuracy of the FIB can no longer be assumed and forwarding must stop until the restart is complete. Any routing protocol can support graceful restart; this section examines the details of graceful restart for OSPF and IS-IS.
9.2.4.1. OSPF Graceful Restart[12]
Under normal OSPF procedures, when a router restarts, all its adjacencies are broken. If the restart is planned, the router breaks its adjacencies by flushing all LSAs it originated. If the restart is unplanned, the router's neighbors break the adjacencies when they cease receiving Hellos. When a neighbor detects the restart, it refloods its LSAs, indicating that its links to the restarting router are no longer available. With all neighbors following this procedure, traffic is rerouted around the restarting router, avoiding the potential of routing loops or black holes resulting from a loss of synchronization and a possibly corrupted FIB. Graceful restart modifies these procedures so that, for a limited time, the restarting router remains in the forwarding path of any routes that passed through the router before the restart. The key conditions that enable OSPF graceful restart are:
When an OSPF router begins a graceful restart, it sends a Grace LSA that indicates to its neighbors the time, in seconds, that the neighbors should continue to treat the router as fully adjacent (that is, in a state of full database synchronization) and the reason for the restart. During this time, called the grace period, the neighbors supporting this graceful restart are called helpers and their state is helper mode. The helper neighbor is responsible for detecting topological changes during the grace period and responding appropriately.
9.2.4.1.1. Planned Restarts
A planned restart is one in which the OSPF process is administratively restarted. The protocol has the opportunity, in this situation, to notify its neighbors that it is restarting gracefully. The administrator, as a part of the restart request, can specify the grace period or can accept the default grace period. For example, JUNOS has a default OSPF grace period of 180 seconds. The grace period, whether default or specified, should be less than the LSRefreshTime (1800 seconds) so that the LSAs the router originated before restart do not age out of the LS databases.
When a graceful restart is requested, the restarting router first records the cryptographic sequence numbers for each restarting interface. It then issues a Grace LSA to its neighbors on each restarting interface. The router does not flush its LSAs from area databases as it would under normal restart procedures. The router records the grace period, and begins its restart.
The graceful restart ends when any of the following occurs:
When graceful restart ends, the restarting router re-originates its type 1 LSA and (if it is the designated router) its type 2 LSA. It flushes its Grace LSAs, reruns its routing calculations, and updates its FIB. Invalid FIB entries are removed; invalid locally originated LSAs are flushed; and type 3, 4, 5, and 7 LSAs are reflooded as necessary. In some circumstances, a neighbor will not enter helper mode, even if it is helper capable. For example, if there are LSAs in the neighbor's LS retransmission list for the restarting router other than periodically refreshed LSAs (an LSA change indicates a topological change) the neighbor will not enter helper mode. A router can act as helper for multiple restarting neighbors, but cannot enter helper mode if it is itself restarting. A helper neighbor exits helper mode when any of the following occurs:
When a router exits helper mode, it refloods its type 1 LSA. If the OSPF network type of the link to the restarting neighbor is broadcast, the router recalculates the DR and, if it is the DR, it refloods its type 2 LSA.
Note that a neighbor that does not support graceful restart will ignore the Grace LSA. This neighbor will follow normal OSPF procedures, reflooding its type 1 LSA, indicating that the link to the restarting router is no longer available. This changed LSA causes any neighbors of the restarting router that are in helper mode to exit helper mode, and the restarting router to exit graceful restart by the rules stated in the above bulleted lists. This behavior permits backward compatibility, but also means that for graceful restart to be fully effective all routers should support it.
9.2.4.1.2. Unplanned Restarts
Unplanned restarts are the result of such anomalies as routing process failures and unexpected switchover to a backup route processor. Procedures for an unplanned graceful restart are the same as for a planned restart, except that the restarting router sends its Grace LSAs after the restart rather than before. The Grace LSAs must be sent on all OSPF interfaces before Hellos are sent, and with the restart reason set to 0 or 3 (see the following subsection). An unplanned graceful restart is successful only if the neighbor's RouterDeadInterval does not expire during the restart period. If this timer expires, the neighboring router originates a new LSA, stopping the graceful restart process.
A concern with unplanned restarts is that a software crash causing the restart could corrupt the FIB. As a result, RFC 3623 leaves it to the implementer to decide whether to support unplanned graceful restarts.
9.2.4.1.3. The Grace LSA
The Grace LSA is a type 9 Opaque LSA. (Opaque LSAs are discussed in Section 10.1.2.) This LSA type has link-local scope, meaning it is never flooded beyond a directly connected neighbor. The Opaque Type is 3 and the Opaque ID is 0.
Figure 9.12 shows the format of the Grace LSA. The information in the LSA is contained in three TLVs:
Figure 9.12. The Grace LSA.
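The TLV body of a Grace LSA can be sketched in Python as below. The TLV type numbers and lengths used here (Grace Period, type 1, 4 bytes; Graceful Restart Reason, type 2, 1 byte; IP Interface Address, type 3, 4 bytes) and the 16-bit type and length fields with padding to a 4-byte boundary are my reading of RFC 3623, not values taken from the text above, so treat them as assumptions to verify. The function names and the example address are hypothetical, and the LSA header is omitted.

```python
import socket
import struct

# Assumed TLV type codes (see lead-in).
TLV_GRACE_PERIOD = 1
TLV_RESTART_REASON = 2
TLV_INTERFACE_ADDRESS = 3


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """16-bit type, 16-bit length, then the value padded to a 4-byte boundary."""
    header = struct.pack("!HH", tlv_type, len(value))
    padding = b"\x00" * (-len(value) % 4)
    return header + value + padding


def build_grace_lsa_body(grace_period: int, reason: int, interface_ip: str) -> bytes:
    """Concatenate the three TLVs carried in the body of a Grace LSA."""
    return (
        encode_tlv(TLV_GRACE_PERIOD, struct.pack("!I", grace_period))
        + encode_tlv(TLV_RESTART_REASON, struct.pack("!B", reason))
        + encode_tlv(TLV_INTERFACE_ADDRESS, socket.inet_aton(interface_ip))
    )


# 180-second grace period, reason 3, documentation-range address.
body = build_grace_lsa_body(180, 3, "192.0.2.1")
```

Each TLV occupies 8 bytes here; the 1-byte reason value is followed by 3 bytes of padding, while the length field still records 1.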
9.2.4.1.4. Cisco Systems NSF
Cisco Systems signals its Non-Stop Forwarding (NSF) capability as described in the Internet drafts "OSPF Restart Signaling,"[13] "OSPF Link-Local Signaling,"[14] and "OSPF Out-of-Band LSDB Resynchronization."[15] Of these, only the first draft deals directly with NSF. The last two drafts describe mechanisms that can be exploited for support of NSF.
The Internet draft "OSPF Link-Local Signaling" describes an Extended Options TLV for OSPF Hellos. NSF capabilities are signaled between neighbors with a Restart Signal (RS) bit, which is 0x00000002 in the Extended Options TLV. After a route processor switchover, a Cisco Systems router sets the RS bit in its Hellos to inform neighbors that it is restarting in NSF mode (like an unplanned graceful restart) and that it would like the neighbor to preserve the existing adjacency. When a neighbor supporting the Cisco NSF capability receives a Hello containing an Extended Options TLV with the RS bit set, it ignores the neighbor list in Hellos received from the restarting router. This is to prevent the neighbor from generating a 1-Way Received event, which would normally break the adjacency, if it does not see itself listed in the Hello of a restarting router.
Graceful restart-capable routers that do not support Cisco NSF ignore the RS bit and do not respond in kind. Therefore, Cisco Systems routers treat GR-capable neighbors as non-NSF routers and follow standard OSPF procedures during restarts. Cisco Systems routers will acknowledge Grace LSAs generated by GR-capable neighbors, but they do not become GR helper neighbors. As a result, GR-capable routers revert to standard OSPF procedures during restart when peered with a Cisco Systems router. Therefore, although graceful restart and Cisco Systems NSF are not interoperable, their respective signaling causes no difficulties for peering. Further, each router can successfully support the restart of a like neighbor (NSF to NSF or GR to GR) behind the peering.
9.2.4.2. IS-IS Graceful Restart[16]
The normal reaction of an IS-IS router to a restarting neighbor is similar to OSPF's. When the holding timer associated with the restarting neighbor expires, the router declares the adjacency down and floods LSPs to indicate the adjacency change. The routers in the L1 area or L2 subdomain (depending on whether the broken adjacency was L1 or L2) run an SPF calculation to account for the change. When the router resumes receiving Hellos from the restarting neighbor, the adjacency is reestablished and the SRM flags for the link are set on the LSPs in the database; if the link is point to point, one or more CSNPs are also sent to the neighbor. LSPs are again flooded to other neighbors to indicate the adjacency change, and SPF is again run in the area or subdomain.
As with OSPF, IS-IS graceful restart modifies these procedures to exploit a separation of the control and forwarding processors in a router. Defined in RFC 3847, IS-IS graceful restart uses a new TLV, called the Restart TLV and carried in Hellos, to provide the necessary signaling. Similar to the way OSPF graceful restart differentiates between planned and unplanned restarts, IS-IS graceful restart differentiates between restarting and starting routers:
Although RFC 3847 does not use the terms helper and helper mode, these OSPF terms can be usefully applied to IS-IS graceful restart. That is, a helper is a router that understands and can support the graceful restart of a neighbor, and the router is in helper mode when it is in the process of supporting a restarting neighbor. RFC 3847 does define restart mode, which you might be tempted to equate with helper mode. But there is a difference: Where helper mode refers to the state of a router when it is assisting a gracefully restarting neighbor, restart mode is a neighbor state by which a router views a restarting neighbor.
9.2.4.2.1. The Restart TLV
Any IS-IS router supporting graceful restart capability indicates its support by including a Restart TLV (Figure 9.13) in its Hellos. If a router that does not support graceful restart receives Hellos containing this TLV, the router ignores the TLV.
Figure 9.13. The Restart TLV.
9.2.4.2.2. Timers
Three timers are defined by RFC 3847 to manage IS-IS graceful restarts:
9.2.4.2.3. Restarts
When a restart begins, the restarting router sends Hellos on all IS-IS interfaces with the RR flag set and starts timers T1, T2, and T3. When a helper neighbor receives an RR, it knows to attempt to maintain the adjacency to the restarting router. The helper sends a Hello with the RA flag set and the RR flag cleared; the remaining time field set to the present value of its holding timer for the adjacency; and, if the adjacency's interface is LAN, the restarting neighbor ID set to the SysID of the restarting router. This last parameter ensures a restarting router can differentiate an RA meant for it from an RA meant for another restarting router on the same broadcast network. If the interface to the restarting router is point to point, or if the interface is LAN and the helper is the DIS, the helper sends the necessary CSNPs to describe its database.
The restarting router adjusts the period of T3 to the lowest value of the remaining time fields of the received RAs from neighbors indicating an adjacency state of Up to the router. When a CSNP or a complete set of CSNPs and the RA are received, the T1 for the receiving interface is stopped. When the router has synchronized its database with all neighbors, T2 and T3 are stopped, the router performs its SPF calculations and updates its FIB as needed, and floods its LSPs.
If T1 expires before an RA and CSNP are received on the associated interface, another RR is sent and the timer is restarted. If T3 expires, the restarting router floods its LSPs with the OL bit set to indicate an incomplete database synchronization. If T2 expires, the router runs SPF, updates its FIB as needed, and floods its LSPs. If the LSPs have already been flooded with the OL bit set due to an expired T3, the bit is cleared in the newly flooded LSPs. Note that the SA flag is not used during restarts, and remains clear throughout the process.
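The timer rules above can be summarized in a short Python sketch. It is only an illustration of the behavior as described in this section, not an implementation of RFC 3847: the class and function names (ReceivedRA, adjusted_t3, t1_can_stop, on_timer_expiry) and the action strings are hypothetical, and real timers, Hellos, and LSP flooding are reduced to plain values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReceivedRA:
    """Illustrative view of a Hello received with the RA flag set."""
    interface: str
    remaining_time: int   # helper's holding-timer value, in seconds
    adjacency_up: bool    # helper reports the adjacency state as Up


def adjusted_t3(current_t3: int, acks: List[ReceivedRA]) -> int:
    """Return the lowest remaining time among RAs reporting an Up adjacency.

    If no such RA has been received, T3 is left unchanged (an assumption).
    """
    candidates = [a.remaining_time for a in acks if a.adjacency_up]
    return min(candidates) if candidates else current_t3


def t1_can_stop(ra_received: bool, csnp_set_complete: bool) -> bool:
    """T1 on an interface stops once both the RA and the CSNP(s) have arrived."""
    return ra_received and csnp_set_complete


def on_timer_expiry(timer: str, ol_already_flooded: bool = False) -> List[str]:
    """Map a timer expiry on the restarting router to the actions in the text."""
    if timer == "T1":
        return ["send Hello with RR set", "restart T1"]
    if timer == "T3":
        return ["flood LSPs with OL bit set"]
    if timer == "T2":
        actions = ["run SPF", "update FIB", "flood LSPs"]
        if ol_already_flooded:
            actions.append("clear OL bit in newly flooded LSPs")
        return actions
    raise ValueError(f"unknown timer: {timer}")


if __name__ == "__main__":
    acks = [
        ReceivedRA("ge-0/0/0", 27, True),
        ReceivedRA("ge-0/0/1", 9, False),   # ignored: adjacency not Up
        ReceivedRA("ge-0/0/2", 30, True),
    ]
    print(adjusted_t3(60, acks))
    print(on_timer_expiry("T2", ol_already_flooded=True))
```

Keeping the expiry handling in one function makes the asymmetry visible: T1 is per interface and simply retries, while T3 and T2 are router-wide and change what is flooded.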
When the restart is complete, the RR, RA, and SA flags are cleared in Hellos sent by the restarted router.
9.2.4.2.4. Starts
A router signals a start by sending on each of its IS-IS interfaces a Hello with the RR and RA flags cleared and the SA flag set to tell its helper neighbors to suppress advertisement of their adjacencies to the starting router. At the same time the SA is sent, the starting router starts timers T1 and T2. T3 is not used during starts. When the state of an adjacency from the starting router to a neighbor transitions to Up, the starting router sends its LSPs to the neighbor but with the OL bit set. When a CSNP and RA are received from the neighbor, T1 for that interface is stopped. And when either synchronization with all neighbors is complete or T2 expires, the starting router runs its SPF, updates its FIB, and floods its LSPs with the OL bit cleared. Hellos are sent with the SA bit cleared, telling helper neighbors to no longer suppress their adjacencies to the started router.
Notice that the RR flag is not set initially. But if T1 expires, the timer is restarted and Hellos are sent with both the RR and SA flags set. As with restarts, when the start is complete the RR, RA, and SA flags are cleared in Hellos sent by the started router.
9.2.4.2.5. Interaction with Neighbors That Do Not Support Graceful Restart
Routers that do not support graceful restart ignore the Restart TLV, and so during starts or restarts proceed with normal IS-IS procedures of transitioning the adjacency state to Down, flooding LSPs, and then attempting to reinitialize the adjacency. When a starting or restarting router receives a Hello with no Restart TLV on a point-to-point interface, indicating that the neighbor does not support graceful restart, it stops the T1 timer for that interface. Normal IS-IS operation means that CSNPs might or might not be received on the interface, so synchronization is considered complete for this neighbor, whether it really is or not.
This does not apply to LAN interfaces, however, where some neighbors might be restart capable and others might not. So if a Hello is received with no Restart TLV, T1 continues running. However, if no restart-capable neighbors exist on the LAN link, it would be undesirable for T1 to continually expire and be restarted. Therefore, RFC 3847 recommends that the timer not be restarted after some number of expirations, and that normal Hellos be sent after that. The RFC leaves it to the implementers to specify the maximum number of T1 expirations.
9.2.5. Bidirectional Forwarding Detection
Another capability made possible by the architectural separation of route processing and packet forwarding is bidirectional forwarding detection (BFD).[17] This is a simple Hello protocol for verifying bidirectional communication to a neighboring router's packet forwarding module. Specifically, it detects failures in the forwarding path to next hops. It can detect these failures in the subsecond range and notify routing protocols of the failure, thus augmenting and improving the routing protocols' own failure-detection abilities.
BFD can send and receive its control (Hello) packets in the millisecond range, which gives routing protocols a way to quickly detect transport failures in the packet forwarding module, link, or interfaces. This is particularly important on physical media such as Ethernet that do not provide fast failure detection within the data-link procedures. BFD can also detect unidirectional failures such as can occasionally occur in an Ethernet switch.
Traditionally, attempts are made to decrease failure detection times by reducing the Hello intervals of the routing protocol. But this approach has distinct limitations. The architectural constraints of OSPF, for example, prevent the protocol from detecting loss of 2-way communication with peers in less than 2 seconds. Some IS-IS implementations allow Hello intervals as short as 333 milliseconds, but this is not universally supported. And because routing protocol Hellos are processed in the route processor, significantly reducing the Hello interval can increase the CPU load.
BFD is designed to run on the packet forwarding module. Because of its independence from the route processor and any individual routing protocol, BFD can establish sessions for multiple upper-layer protocols and across multiple connections between peers. This separation from the control plane also means BFD enhances the robustness of graceful restart.
9.2.5.1. BFD Functional Model
BFD has two operating modes and an adjunct function:
BFD uses a three-way handshake similar to that used by the OSPF Hello protocol to verify bidirectional communication. When the "I Hear You" field of the BFD control packet is non-zero in both directions, bidirectional communication is considered verified and the BFD session is established. Because multiple BFD sessions can be active on a single link, a discriminator is used to identify and demultiplex control packets for each session. Three parameters control the exchange of control packets:
The two timers and the detection multiplier are continuously negotiated, are independent in each direction, and can be changed at any time. Each system advertises the interval at which it would like to transmit control packets and the minimum interval at which it is willing to receive them. The agreed-upon transmit and receive periods are jittered up to 25 percent to prevent synchronization on multi-access links.
9.2.5.2. The BFD Control Packet
BFD Control packets are encapsulated as appropriate to the transmission link between the neighboring systems. When the packet is encapsulated in IPv4 or IPv6, the TTL (or IPv6 Hop Limit) field is set to 255. This helps to prevent attacks against the protocol originating from off the link. The packets are always unicast, and hence BFD sessions are always point to point. Figure 9.14 shows the packet format.
Figure 9.14. The BFD Control packet.
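The interval negotiation from Section 9.2.5.1 and the TTL convention described above can be illustrated with a minimal Python sketch. Several points here are assumptions rather than statements from the text: the parameter names are hypothetical, the agreed transmit interval is taken to be the larger of what the sender would like and what the receiver will accept, the jitter is applied as a reduction of up to 25 percent, and the receiver is assumed to drop any packet whose TTL is not 255.

```python
import random


def negotiated_tx_interval_us(local_desired_tx_us: int,
                              remote_min_rx_us: int) -> int:
    """Pick the transmit interval for one direction of a session.

    Assumption: the sender may not transmit faster than the peer is willing
    to receive, so the slower (larger) of the two intervals wins.
    """
    return max(local_desired_tx_us, remote_min_rx_us)


def jittered_interval_us(interval_us: int, rng: random.Random) -> float:
    """Shorten the interval by a random 0 to 25 percent (assumed direction)."""
    reduction = rng.uniform(0.0, 0.25)
    return interval_us * (1.0 - reduction)


def accept_control_packet(ttl: int) -> bool:
    """Accept only packets that cannot have crossed a router.

    A packet sent with TTL 255 that has been forwarded even once arrives
    with a lower value, so anything other than 255 is treated as off-link.
    """
    return ttl == 255


if __name__ == "__main__":
    rng = random.Random()
    base = negotiated_tx_interval_us(50_000, 100_000)
    print("agreed interval (us):", base)
    print("next transmit in (us):", round(jittered_interval_us(base, rng)))
    print("TTL 254 accepted:", accept_control_packet(254))
```

Because each direction is negotiated independently, the same functions would be called a second time with the roles of the two systems swapped.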