Best Practices | Cisco Network Security Troubleshooting Handbook

High availability in IPsec is very important for both LAN-to-LAN and Remote Access VPN connections. It is beyond the scope of this chapter to present every best practice for designing a robust IPsec VPN network (that is the job of a design guide). However, this section describes the features available to help you configure resiliency for both LAN-to-LAN and Remote Access VPN connections.

Resiliency for IPsec can be achieved in one of two ways:

Stateful failover
Stateless failover

Stateful Failover

The Stateful Failover feature was introduced in Cisco IOS Software Releases 12.2S and 12.3(11)T. It enables a router to continue processing and forwarding VPN traffic after an outage. The primary IPsec head-end router replicates its full IPsec session and state information to a backup head-end router, which can take over the active IPsec sessions should the primary head-end router fail. With Stateless Failover (discussed in the following section), there is no way to maintain an IPsec session through a failed IPsec router: sessions must time out and connectivity must be re-established, causing network downtime.

IPsec Stateful Failover is implemented using the Stateful Switchover (SSO) and Hot Standby Router Protocol (HSRP) features. HSRP provides network redundancy for IP networks. It monitors both the inside and outside interfaces, so if either interface goes down, the whole router is considered to be down, and ownership of the Internet Key Exchange (IKE) and IPsec security associations (SAs) passes to the standby router (which transitions to the HSRP active state). SSO allows the active and standby routers to share IKE and IPsec state information so that each router has enough information to become the active router at any time. Before you configure Stateful Failover, ensure that both routers have identical hardware, the same software version, and the same configuration. Configuring Stateful Failover for IPsec involves configuring HSRP with a virtual IP address and enabling SSO to replicate the IKE and IPsec SA information.
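The division of labor between HSRP and SSO can be pictured with a small Python model. This is a minimal sketch, not Cisco code: the `HeadEnd` class, its `sadb` dictionary, the `replicate` and `failover` helpers, and the router names `Doha-1`/`Doha-2` are all hypothetical. It illustrates that HSRP decides which router is active (based on both tracked interfaces), while SSO ensures that the standby already holds the IKE/IPsec state when it takes over.

```python
from dataclasses import dataclass, field


@dataclass
class HeadEnd:
    """Hypothetical model of an IPsec head-end router in an HSRP pair."""
    name: str
    inside_up: bool = True
    outside_up: bool = True
    # peer address -> SA state; a stand-in for the IKE/IPsec SADB
    sadb: dict = field(default_factory=dict)

    def healthy(self) -> bool:
        # HSRP tracks both interfaces: if either is down, the router is down.
        return self.inside_up and self.outside_up


def replicate(active: HeadEnd, standby: HeadEnd) -> None:
    """SSO-style replication: the standby keeps a copy of the active SA state."""
    standby.sadb = dict(active.sadb)


def failover(active: HeadEnd, standby: HeadEnd) -> HeadEnd:
    """Return the router that owns the IKE/IPsec SAs (the HSRP active router)."""
    if active.healthy():
        return active
    if standby.healthy():
        return standby
    raise RuntimeError("neither head-end router is available")


primary = HeadEnd("Doha-1", sadb={"20.1.1.1": "IKE+IPsec SAs"})
backup = HeadEnd("Doha-2")
replicate(primary, backup)
primary.outside_up = False          # the outside interface fails
owner = failover(primary, backup)
print(owner.name, owner.sadb)       # Doha-2 {'20.1.1.1': 'IKE+IPsec SAs'}
```

Because the standby already holds the replicated entry, the session continues without a new IKE negotiation. In the stateless designs that follow, the standby's `sadb` would be empty and every SA would have to be rebuilt.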

For more detailed information on this feature, refer to the following link:

http://www.cisco.com/univercd/cc/td/doc/product/software/ios123/123newft/123t/123t_11/gt_topht.htm

Stateless Failover

In Stateless Failover, the state information (IKE and IPsec SAs) is not replicated to the backup peer. Therefore, the remote peer must be able to do the following:

1.	Detect a connectivity failure.
2.	Once a connectivity failure is detected, reconnect with another peer that has connectivity to the same private network as the failed peer.

Loss of IP connectivity can be caused by many factors, such as device failure, local link failure, or loss of connectivity to the service provider (SP). The next few sections discuss in more detail how to achieve Stateless Failover on Cisco IOS routers.

Loss of Connection Detection Mechanism

Presently, Cisco IOS software offers the following mechanisms to detect the loss of IP connectivity for IPsec VPN implementation:

IKE Keepalives
Dead peer detection (DPD)
Dynamic routing protocols
GRE keepalives

IKE keepalive

The default lifetime is 24 hours for an IKE SA and 1 hour for an IPsec SA. If IP connectivity to a peer is lost, there is no standards-based mechanism for either type of SA to detect the loss; it is noticed only when a quick mode (QM) negotiation fails. This means that with the default lifetime settings, an IPsec peer might forward data into a "black hole" for up to an hour before the QM timeout.
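The following Python sketch works through that arithmetic. It assumes, for simplicity, that the QM renegotiation starts exactly when the IPsec SA expires (real implementations rekey somewhat earlier), so the result only approximates how long a peer keeps sending into the black hole. The function name is hypothetical.

```python
IKE_SA_LIFETIME_S = 24 * 60 * 60    # default IKE SA lifetime: 24 hours
IPSEC_SA_LIFETIME_S = 60 * 60       # default IPsec SA lifetime: 1 hour


def blackhole_seconds(failure_after_s: float,
                      ipsec_lifetime_s: int = IPSEC_SA_LIFETIME_S) -> float:
    """Seconds until a failure is noticed when no keepalives are configured.

    failure_after_s is how long after the current IPsec SA was negotiated the
    connectivity was lost. Simplification: the failure is detected only when
    the next QM negotiation fails, assumed to start at SA expiry.
    """
    elapsed = failure_after_s % ipsec_lifetime_s
    return ipsec_lifetime_s - elapsed


print(blackhole_seconds(0))      # 3600: worst case, a full hour
print(blackhole_seconds(3000))   # 600: failure late in the SA lifetime
```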

Another problem is that an SA may become stalled, or "dead," in the security association database (SADB). This occurs when a peer has lost connectivity and there is no mechanism to clear the SA out of the SADB until its lifetime expires. When the "dead" peer later tries to re-establish connectivity, the connection is refused because an SA already exists with that peer. IKE keepalives help maintain the SADB and prevent this from occurring.

By default, keepalive packets are sent every 10 seconds; this value is configurable. Once three packets are missed, the IPsec connection is considered down, or "dead," and the old SA is cleared.
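A minimal Python sketch of that behavior follows, using the 10-second interval and three-miss threshold given above. `KeepaliveMonitor` and its methods are hypothetical names, and the dictionary stands in for the SADB. The point is that detection now takes about 30 seconds instead of up to an hour, and the stale SA is removed so that a new one can be negotiated.

```python
class KeepaliveMonitor:
    """Hypothetical model of IKE keepalive tracking for one peer."""

    def __init__(self, peer: str, sadb: dict, interval_s: int = 10, max_missed: int = 3):
        self.peer = peer
        self.sadb = sadb              # peer -> SA state (stand-in for the SADB)
        self.interval_s = interval_s
        self.max_missed = max_missed
        self.missed = 0

    def tick(self, reply_received: bool) -> bool:
        """Process one keepalive interval; return True if the peer is declared dead."""
        if reply_received:
            self.missed = 0
            return False
        self.missed += 1
        if self.missed >= self.max_missed:
            self.sadb.pop(self.peer, None)   # clear the stale SA
            return True
        return False

    def detection_time_s(self) -> int:
        return self.interval_s * self.max_missed


sadb = {"30.1.1.1": "IKE+IPsec SAs"}
mon = KeepaliveMonitor("30.1.1.1", sadb)
results = [mon.tick(False) for _ in range(3)]
print(results, sadb, mon.detection_time_s())   # [False, False, True] {} 30
```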

To re-establish connectivity, the IPsec device must have at least two IPsec peers defined in its crypto map (see Example 6-40). The router tries each peer in sequence until a connection is established or it runs out of peers, at which point it rolls back to the top of the list.
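The peer-selection logic can be sketched in Python as follows. `try_connect` is a hypothetical callback standing in for a full IKE/IPsec negotiation, and `max_attempts` is an illustrative bound added only so the sketch terminates; the text above describes the router simply rolling back to the top of the list.

```python
from itertools import cycle
from typing import Callable, List, Optional


def connect_with_backup_peers(peers: List[str],
                              try_connect: Callable[[str], bool],
                              max_attempts: int) -> Optional[str]:
    """Try peers in crypto map order, wrapping to the top after the last one."""
    for attempt, peer in enumerate(cycle(peers)):
        if attempt >= max_attempts:
            break
        if try_connect(peer):
            return peer
    return None


tried = []
reachable = {"40.1.1.1"}            # assume the first peer (30.1.1.1) is down


def try_connect(peer: str) -> bool:
    tried.append(peer)
    return peer in reachable


print(connect_with_backup_peers(["30.1.1.1", "40.1.1.1"], try_connect, max_attempts=4))
print(tried)   # ['30.1.1.1', '40.1.1.1']
```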

IKE keepalives are supported on both Cisco IOS routers and PIX firewalls. DPD was originally supported only by the VPN client, but it is now supported on all platforms.

Example 6-40. IKE Keepalive Configuration

Dhaka# show running-config
crypto isakmp policy 10
 authentication pre-share
!
! IKE Keepalive is configured with the following command.
crypto isakmp keepalive 10
!
crypto isakmp key cisco1234 address 30.1.1.1
crypto isakmp key cisco1234 address 40.1.1.1
!
crypto ipsec transform-set strong esp-3des esp-sha-hmac
!
crypto map to_Doha 10 ipsec-isakmp
 set peer 30.1.1.1
 set peer 40.1.1.1
 set transform-set strong
 match address 101
!
interface Ethernet0/0
 ip address 10.1.1.1 255.255.255.0
!
interface ethernet1/0
 ip address 20.1.1.1 255.255.255.0
 crypto map to_Doha
!
ip route 0.0.0.0 0.0.0.0 ethernet1/0
ip route 10.1.0.0 255.255.0.0 ethernet1/0
access-list 101 permit ip 10.1.1.0 0.0.0.255 10.1.2.0 0.0.0.255
Dhaka#

Dead Peer Detection (DPD)

The major problem with the original IKE keepalives is that they do not scale well in large hub-and-spoke networks, where the hub must process many keepalive messages from its various peers. Moreover, keepalive messages are tied to the IKE SA and consequently are handled at the process level, which degrades performance.

One way to address these performance issues is to reduce the number of keepalive messages exchanged without significantly increasing the failover time. DPD does this by sending a keepalive message only when necessary: it tracks existing traffic and communication with the peer, and it does not send a keepalive until a "worry state" occurs. Each peer defines its own worry state, which is usually caused by a lack of traffic from the other peer. This means that a router does not have to tell its peer that it is alive at every specified interval. A DPD message is sent only when traffic needs to be sent out and the preconfigured time has elapsed without any traffic from the peer. This dramatically reduces the number of messages exchanged compared with IKE keepalives.
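To make the savings concrete, the Python sketch below counts messages over one hour, assuming a 10-second keepalive interval and a 10-second DPD idle threshold (both illustrative values). The DPD logic is simplified and hypothetical: a probe is sent only when there is outbound traffic and nothing has been received from the peer for the idle threshold, and the peer is assumed to acknowledge immediately.

```python
def keepalive_messages(duration_s: int, interval_s: int) -> int:
    """Periodic keepalives: one message every interval, regardless of traffic."""
    return duration_s // interval_s


def dpd_messages(duration_s: int, idle_s: int, inbound: set, outbound: set) -> int:
    """On-demand DPD: probe only when we must send and the peer has been silent.

    inbound/outbound are the seconds at which packets are received from / sent to the peer.
    """
    last_heard = 0
    probes = 0
    for t in range(duration_s):
        if t in inbound:
            last_heard = t
        if t in outbound and t - last_heard >= idle_s:   # "worry state"
            probes += 1
            last_heard = t   # assume the acknowledgment arrives immediately
    return probes


HOUR = 3600
busy = set(range(HOUR))                          # two-way traffic every second
print(keepalive_messages(HOUR, 10))              # 360
print(dpd_messages(HOUR, 10, busy, busy))        # 0: traffic proves liveness
print(dpd_messages(HOUR, 10, set(), set()))      # 0: nothing to send, no probes
```

A peer that receives nothing while still trying to send keeps triggering probes, which is exactly the case in which a failure needs to be detected.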

The configuration for DPD is the same as for IKE keepalives. Note that the DPD feature was first introduced in Cisco IOS Software Release 12.2(8)T.

Unlike keepalives, DPD works in both site-to-site deployments and Remote Access VPN Client deployments. Starting with Release 3.6, the Cisco VPN (Unity) Client supports stateless failover to multiple peers with DPD, as long as you are using auto-initiation and a stored password.

Dynamic Routing Protocol

In a GRE-over-IPsec deployment, the dynamic routing protocol can be used to detect a dead peer. If Hello packets are missed, the neighbor relationship goes down and the routes learned through that neighbor disappear. The router can then send traffic through an alternate peer.

GRE keepalive

In a GRE-over-IPsec deployment, IPsec tunnel failure can be detected with GRE keepalives. If the GRE keepalives are missed, the tunnel interface goes down, and traffic is carried instead by the alternate tunnel interface over the alternate IPsec tunnel.
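Both of these mechanisms reduce to the same idea: a tunnel whose liveness messages stop arriving is removed from consideration, and traffic follows the best remaining tunnel. The Python sketch below models that, assuming a retry count of 3 (an illustrative value) and using a hypothetical `preference` field as a stand-in for the routing metric.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GreTunnel:
    name: str
    preference: int      # lower is preferred; stand-in for the routing metric
    retries: int = 3     # missed keepalives/Hellos before the tunnel goes down
    missed: int = 0

    @property
    def up(self) -> bool:
        return self.missed < self.retries

    def keepalive(self, answered: bool) -> None:
        self.missed = 0 if answered else self.missed + 1


def active_tunnel(tunnels: List[GreTunnel]) -> Optional[GreTunnel]:
    """Pick the most preferred tunnel that is still up."""
    return min((t for t in tunnels if t.up), key=lambda t: t.preference, default=None)


t0 = GreTunnel("Tunnel0", preference=1)
t1 = GreTunnel("Tunnel1", preference=2)
for _ in range(3):
    t0.keepalive(answered=False)            # the primary stops answering
print(active_tunnel([t0, t1]).name)         # Tunnel1
t0.keepalive(answered=True)                 # the primary recovers
print(active_tunnel([t0, t1]).name)         # Tunnel0
```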

Stateless Failover Mechanism Options

Now that you are familiar with the link failure detection mechanisms, it is time to explore the options available once a link failure occurs and is detected by one of the mechanisms discussed in the preceding section. This section explores some of the options for reconnecting with the standby device when a link failure occurs.

Backup Peer for Basic LAN-to-LAN IPsec

With either IKE keepalives or DPD configured, you can define multiple IPsec peers under a single crypto map. The first peer in the list is used to build the IPsec tunnel; if it fails, the second peer in the list is tried. In Example 6-40 (under "IKE keepalive" in the previous section), the following configuration defines multiple peers under the same crypto map statement:

crypto map to_Doha 10 ipsec-isakmp
! With this configuration, the router always tries to build the IPsec tunnel with peer
! 30.1.1.1. If this peer is down for any reason, peer 40.1.1.1 is tried.
 set peer 30.1.1.1
 set peer 40.1.1.1
 set transform-set strong
 match address 101

Hot Standby Router Protocol (HSRP) and Reverse Route Injection (RRI)

HSRP and RRI together can provide a robust resilience mechanism for hub-and-spoke connections. HSRP is used for IP redundancy, to fail over between two devices when an interface or link becomes unusable. HSRP tracks the state of the router's interfaces and IP connectivity, which provides a mechanism to switch between the primary and secondary devices on a failure. HSRP is now closely coupled with IPsec to track state changes and provide a better solution for stateless IPsec failover, and it is now possible to use the HSRP virtual IP address (VIP) as a peer endpoint address. HSRP is an active-standby solution: while the primary device is active, no traffic passes through the secondary. However, it is possible to define multiple active HSRP groups, allowing half the peers to be active on one device and the other half on the second device. A side benefit of HSRP is that the remote peer has a much simpler configuration, because only one peer needs to be defined: the HSRP VIP. In the event of a failure, all SAs must be re-established.
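The multiple-group arrangement can be sketched in Python. The router names `Doha-A`/`Doha-B`, the second VIP `30.1.1.4`, and the spoke name `Spoke-2` are hypothetical. Each group lists its routers in priority order, and the first available router owns that group's VIP. The second function shows which spokes have to rebuild their SAs after a failure, since HSRP alone does not replicate IPsec state.

```python
from typing import Dict, List, Optional


def vip_owners(groups: Dict[str, List[str]],
               router_up: Dict[str, bool]) -> Dict[str, Optional[str]]:
    """For each HSRP VIP, return the highest-priority router that is up."""
    return {vip: next((r for r in routers if router_up[r]), None)
            for vip, routers in groups.items()}


def spokes_to_rebuild(spoke_vip: Dict[str, str],
                      before: Dict[str, Optional[str]],
                      after: Dict[str, Optional[str]]) -> List[str]:
    """Spokes whose VIP moved to another router must re-establish all SAs."""
    return sorted(s for s, vip in spoke_vip.items() if before[vip] != after[vip])


groups = {"30.1.1.3": ["Doha-A", "Doha-B"],    # group 1: Doha-A preferred
          "30.1.1.4": ["Doha-B", "Doha-A"]}    # group 2: Doha-B preferred
spoke_vip = {"Dhaka": "30.1.1.3", "Spoke-2": "30.1.1.4"}

before = vip_owners(groups, {"Doha-A": True, "Doha-B": True})
after = vip_owners(groups, {"Doha-A": False, "Doha-B": True})
print(before)                                        # each router active for one group
print(spokes_to_rebuild(spoke_vip, before, after))   # ['Dhaka']
```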

So, as you can see, HSRP provides a redundancy guarantee for spokes. However, it can also create problems. When failover occurs, devices behind the head-end routers need to know which head-end currently owns the active IPsec SAs; otherwise, those devices may send replies to the wrong router. RRI was developed to correct the routing behavior of the upstream devices and to ensure that return traffic gets back to the active peer.

RRI allows for static routes to be automatically inserted into the routing process for those networks and hosts protected by a remote tunnel endpoint. These protected hosts and networks are known as remote proxy identities. Each route is created based on the remote proxy network and mask, with the next hop to this network being the remote tunnel endpoint. Using the remote VPN router as the next hop forces traffic through the crypto process to be encrypted. Once the static route is created on the VPN router, this information is then propagated to upstream devices, allowing them to determine which is the correct VPN router to receive returning traffic to maintain IPsec state flows. RRI works with both static and dynamic crypto maps.
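The route that RRI derives from a remote proxy identity can be sketched in Python with the standard `ipaddress` module. The function names are hypothetical, and the result is rendered as an equivalent `ip route` line purely for readability. The values come from Example 6-41, where the remote proxy network is 10.1.1.0/24 behind the Dhaka tunnel endpoint 20.1.1.1.

```python
import ipaddress
from typing import Dict, Tuple

Route = Tuple[str, str]   # (destination prefix, next hop)


def rri_route(remote_proxy: str, tunnel_endpoint: str) -> Route:
    """Build the route RRI would create: remote proxy network -> remote tunnel endpoint."""
    net = ipaddress.ip_network(remote_proxy)       # e.g. "10.1.1.0/24"
    hop = ipaddress.ip_address(tunnel_endpoint)
    return str(net), str(hop)


def as_ios_static(route: Route) -> str:
    """Render the route in 'ip route <network> <mask> <next-hop>' form."""
    net = ipaddress.ip_network(route[0])
    return f"ip route {net.network_address} {net.netmask} {route[1]}"


table: Dict[str, str] = {}
prefix, hop = rri_route("10.1.1.0/24", "20.1.1.1")
table[prefix] = hop                    # created when the peer establishes its IPsec connection
print(as_ios_static((prefix, hop)))    # ip route 10.1.1.0 255.255.255.0 20.1.1.1
```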

In Figure 6-6, if you have two routers in parallel on the Doha site, and if you want to configure HSRP and RRI, both of the Doha routers must be configured in exactly the same way. The important configuration elements are the interface standby group information, interface crypto map command, crypto map, keepalives, and routing as shown in Example 6-41.

Example 6-41. Configuration of Primary Router on the Doha Site

Doha# show running-config
crypto isakmp policy 10
 authentication pre-share
!
crypto isakmp key cisco1234 address 20.1.1.1
!
crypto isakmp keepalive 10
!
crypto ipsec transform-set strong esp-3des esp-sha-hmac
!
! Here you have just a single crypto map; the redundancy keyword on the
! interface ties it to the HSRP VIP (30.1.1.3).
crypto map to-Dhaka 10 ipsec-isakmp
 set peer 20.1.1.1
 set transform-set strong
 match address ipsec
 reverse-route
!
interface FastEthernet0/0
 ip address 30.1.1.1 255.255.255.0
 standby 1 ip 30.1.1.3
 standby 1 priority 100
 standby 1 preempt
 standby 1 name VPNHA
 standby 1 track FastEthernet0/1
 crypto map to-Dhaka redundancy VPNHA
!
interface FastEthernet0/1
 ip address 10.1.2.1 255.255.255.0
!
router eigrp 100
 ! Redistribute the RRI-created static routes to the upstream devices.
 redistribute static
 network 10.0.0.0
!
ip route 0.0.0.0 0.0.0.0 FastEthernet0/0
!
ip access-list extended ipsec
 permit ip 10.1.2.0 0.0.0.255 10.1.1.0 0.0.0.255
Doha#

The following is a listing of some important points about HSRP and RRI implementation:

Although HSRP and RRI provide a mechanism to failover to a secondary device, keepalives are still required to track the state of each peer and to maintain the health of the SADB when a failure occurs. DPD is highly recommended for this solution.
HSRP does not replicate IPsec state information to the standby router; new SAs need to be built with the secondary device.
HSRP is limited to LAN-based media because its Hello packets are multicast between peers on a shared segment. This means that HSRP and a crypto map cannot be configured together on an edge device that has only WAN interfaces.
RRI requires a dynamic routing protocol (such as EIGRP or OSPF) to propagate the static routes, which are created each time a peer establishes an IPsec connection.
HSRP and GRE do not work together when tying GRE to the same physical interface on which HSRP is running. The main issue is the way the HSRP VIP address is advertised.
If there is a device behind the head-end routers that does not support routing or can have only a single default route, such as a firewall, it is desirable to configure HSRP on both LAN interfaces.
It is also important to set the preempt timers to the same value on both routers. Newer HSRP code includes changes that prevent the original primary router from taking back the connections once it comes back online, which would otherwise cause the IPsec tunnels to be rebuilt for no reason.
HSRP works for both site-to-site and remote access VPNs.

Generic Routing Encapsulation (GRE) Tunnels over IPsec

To attain redundancy with GRE over IPsec, the remote peer should be configured with two GRE tunnels: one to the primary head-end VPN router and the other to the backup VPN router. Both GRE tunnels are secured with IPsec, and each has its own IKE SA and two IPsec SAs. Because GRE can carry multicast and broadcast traffic, it is possible, and very desirable, to run a routing protocol over these virtual links. Once a routing protocol is configured, failover happens automatically: the Hello or keepalive packets sent by the routing protocol over the GRE tunnels detect loss of connectivity. In other words, if the primary GRE tunnel is lost, the remote site detects the event through the loss of routing protocol Hello packets. Once virtual-link loss is detected, the routing protocol chooses the next best path, which in this case is the backup GRE tunnel.

Therefore, the second part of VPN resiliency comes from the automatic behavior of the routing protocol. Because the backup GRE tunnel is always up and secured (main mode (MM) and QM have already been negotiated), the failover time is determined by the Hello packet mechanism and the convergence time of the routing protocol. Aside from providing a failover mechanism, GRE tunnels also make it possible to encrypt multicast and broadcast packets and non-IP protocols. EIGRP is the recommended routing protocol here because it performs better than OSPF for GRE over IPsec. In Figure 6-6, if you have a second router in Doha and one of the tunnels goes down, packets flow through the other tunnel, so the Dhaka LAN should still be able to access resources on the Doha LAN.

Example 6-42 shows the configuration of Dhaka. This router is configured with GRE, EIGRP, and IPsec. The important configuration elements are the redundant peers and multiple crypto map entries, the GRE tunnel interfaces, the bandwidth setting on the backup GRE interface, the routing protocol, and the multiple access control lists (ACLs).

Example 6-42. Configuration of GRE over IPsec for Dhaka

Dhaka# show running-config
crypto isakmp policy 10
 authentication pre-share
!
crypto isakmp key cisco1234 address 30.1.1.1
crypto isakmp key cisco1234 address 30.1.1.2
!
crypto ipsec transform-set strong esp-3des esp-sha-hmac
!
! Two crypto map entries: one for the primary and the other for the secondary router.
crypto map to_Doha 10 ipsec-isakmp
 set peer 30.1.1.1
 set transform-set strong
 match address gre1
crypto map to_Doha 20 ipsec-isakmp
 set peer 30.1.1.2
 set transform-set strong
 match address gre2
!
interface Tunnel0
 ip address 1.1.1.1 255.255.255.0
 tunnel source Ethernet1/0
 tunnel destination 30.1.1.1
 crypto map to_Doha
!
interface Tunnel1
 ip address 10.4.2.1 255.255.255.0
 bandwidth 5
 tunnel source Ethernet1/0
 tunnel destination 30.1.1.2
 crypto map to_Doha
!
interface Ethernet0/0
 ip address 10.1.1.1 255.255.255.0
!
interface Ethernet1/0
 ip address 20.1.1.1 255.255.255.0
 crypto map to_Doha
!
router eigrp 100
 network 10.0.0.0
 network 1.1.1.0
!
ip route 0.0.0.0 0.0.0.0 Ethernet1/0
!
ip access-list extended gre1
 permit gre host 20.1.1.1 host 30.1.1.1
!
ip access-list extended gre2
 permit gre host 20.1.1.1 host 30.1.1.2
Dhaka#

Note

To control which tunnel is primary and which is secondary, you can use the delay and bandwidth interface commands. If you specify neither, traffic is load-shared across both tunnels, and each tunnel acts as a backup for the other.
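Why the bandwidth command steers traffic can be seen from the classic EIGRP composite metric with default K values (K1 = K3 = 1, K2 = K4 = K5 = 0), in which the lowest bandwidth along the path and the cumulative delay are the only inputs. The Python sketch below compares the two tunnels, considering only the tunnel hop. The 100-kbit/s bandwidth on Tunnel0 and the equal delay on both tunnels are hypothetical values chosen for illustration; the 5-kbit/s setting on Tunnel1 comes from Example 6-42.

```python
def eigrp_metric(min_bw_kbps: int, total_delay_usec: int) -> int:
    """Classic EIGRP metric with default K values: 256 * (10^7/BW + delay/10)."""
    bw_term = 10**7 // min_bw_kbps        # inverse of the slowest link, in kbit/s
    delay_term = total_delay_usec // 10   # delay counted in tens of microseconds
    return 256 * (bw_term + delay_term)


DELAY_USEC = 500_000   # hypothetical, identical delay on both tunnels

primary = eigrp_metric(100, DELAY_USEC)   # Tunnel0, hypothetical bandwidth
backup = eigrp_metric(5, DELAY_USEC)      # Tunnel1, "bandwidth 5"
print(primary, backup)                    # 38400000 524800000
print("Tunnel0 preferred" if primary < backup else "Tunnel1 preferred")
```

If both tunnels keep identical bandwidth and delay, the metrics are equal and EIGRP installs both routes, which produces the load-sharing behavior described in the Note.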