5.5 IP Security architecture (IPSec)

The technologies discussed so far offer the capability to tunnel protocols and user data but do not inherently provide privacy. To satisfy the requirements for scalable, flexible, interoperable VPNs, a major initiative by the IETF Security Working Group has produced a standard called the IP Security Architecture (IPSec). IPSec provides a generic mechanism for the secure transfer of IP-based application data in both public and private networks. IPSec is designed to enable client and gateway products from many vendors to interoperate freely and therefore creates true end-to-end security across multivendor networks such as the Internet. Specifically, as follows:

IPSec works just above the IP layer, protecting IP datagrams; therefore, any IP-based protocol or application can be automatically secured (unlike protocol-specific mechanisms such as SSL and S-HTTP).
IPSec is supported as an IPv4 option and as an IPv6 extension header. All IPv6 implementations are required to support IPSec; IPv4 implementations are strongly recommended to do so. Since IPSec works at Layer 3, this enables IPSec to operate transparently over any existing IP infrastructures; existing higher-level protocols and applications do not require any modification.
IPSec supports standards-based encryption, integrity, and authentication schemes. IPSec is not tied to any particular ciphers and has the flexibility to support emerging, more powerful algorithms as required. Keyed hash algorithms, such as HMAC, combined with traditional hash algorithms, such as MD5 or SHA, are supported for packet authentication. Digital certificates are also fully supported.
IPSec specifies IKE, a public-key (Diffie-Hellman) approach to automatic key management, although other automated key distribution techniques may be used; for example, KDC-based systems such as Kerberos and other public-key systems such as SKIP could be employed. Automatic key management is a powerful feature that greatly eases deployment and promotes scalability. Manual distribution of keys is also supported.
IPSec enables the network administrator to control the granularity at which a security service is offered. For example, a single encrypted tunnel could be created to transport all the traffic between sites, or a separate encrypted tunnel could be established per TCP connection between sites.

It is important to understand that IPSec is limited to tunneling IP applications only; for tunneling non-IP applications it should be used in conjunction with a multiprotocol encapsulation technique such as GRE or L2TP. Having said that, protocols such as NetWare IPX, AppleTalk, OSI, and Banyan VINES will happily run over IP instead of their native Layer 3 stacks (e.g., OSI transport class 4 uses IP protocol number 29; VINES uses IP protocol number 83), and where sites are running pure IP at Layer 3, these protocols can be readily tunneled over IPSec.

One of the major strengths of IPSec is that it was developed through open consensus over several years via the IETF (in contrast to technologies such as PPTP, which required several subsequent fixes after security flaws were discovered). IPSec offers flexible building blocks from which robust, secure VPNs can be constructed. Most leading VPN vendors already support IPSec. IPSec is typically implemented on hosts (PCs, servers, etc.) or security gateways (routers, firewalls, or VPN devices), in one of three formats, as follows:

Native IP implementation is integrated directly into the native IP stack source code and is applicable to both hosts and security gateways.
Bump-in-the-Stack (BITS) implementation is implemented transparently, between the native IP stack and the local network drivers. Access to the IP stack source code is not required, so this approach may be appropriate for use with legacy systems (usually employed in hosts).
Hardware assist is another transparent implementation sometimes referred to as a Bump-in-the-Wire (BITW) implementation, applicable to both gateways and hosts. Here, an outboard crypto-processor card is deployed. This is a common design feature of military and high-end commercial network security systems. Usually the BITW device is IP addressable. When supporting a single host, it may be quite analogous to a BITS implementation, but in supporting a router or firewall, it must operate as a security gateway.

This section provides a concise technical review of IPSec and its operations. For further information on IPSec and its components, the interested reader is referred to [54–60].

5.5.1 IPSec concepts and terminology

The IPSec framework comprises three main components, responsible for authentication, privacy, and key management, as follows:

IP Authentication Header (AH) provides data integrity and data origin authentication and replay protection for IP datagrams. Authentication is applied to the whole IP datagram, including the initial IP header.
IP Encapsulating Security Payload (ESP) provides data confidentiality (encryption), data origin authentication, data integrity, and replay protection for IP payloads. Here, authentication is applied to the IP datagram beyond the initial IP header.
Internet Key Exchange (IKE) provides automated setting up of security associations and automated generation and refresh of cryptographic keys.

Fundamental to IPSec is the concept of a Security Association (SA). Both AH and ESP make use of SAs, and a major function of IKE is the establishment and maintenance of SAs. We will discuss each of the above components in turn, after a brief discussion of SAs.

Security Associations (SA)

A Security Association (SA) is an abstraction denoting a secure logical connection between two or more IPSec entities. An SA describes the agreed-upon security services that peer entities will use in order to communicate securely (i.e., an SA is the embodiment of the negotiated security policy between two devices, incorporating the cryptographic algorithms used, keying information, and identifying the participating parties).

There are two basic types of SA: the IKE SA (used for key and SA management) and general-purpose IPSec SAs (used for data transmission). The IKE SA is bidirectional and must be established before any data transfer is possible. IPSec SAs are unidirectional (simplex). A full IPSec VPN, therefore, comprises at least three SAs: an IKE SA plus two general-purpose SAs for bidirectional data communication (i.e., an inbound and outbound SA from the SA initiator's perspective). An SA can be used either for AH or ESP and may be in either transport or tunnel mode. Figure 5.14 illustrates the basic SA concepts. In Figure 5.14, we see that each VPN comprises an IKE SA plus inbound and outbound (relative to the initiator) general-purpose SAs for data transmission. The allocation of SPIs is arbitrary; however, it is recommended that new SPIs should differ from those recently used.

Figure 5.14: IKE and IPSec Security Associations (SAs) for two VPNs.

Each SA is identified by a 32-bit number sitting directly above the IP layer called the Security Parameter Index (SPI). The SPI is normally allocated dynamically but is fixed for the duration of an SA. The SPI cannot be encrypted, since it is required to identify the associated SA. A fully qualified (i.e., unique) SA is identified by the combination of three elements:

Security Association Components

Security Parameter Index (SPI)—This is a 32-bit value used to identify different SAs with the same destination address and security protocol. The SPI is carried in the header of the security protocol (AH or ESP). It has only local significance, as defined by the creator of the SA. The SPI values in the range 1 through 255 are reserved by the IANA. The SPI value of 0 must be used for local implementation-specific purposes only. Generally, the SPI is selected by the destination system during the SA establishment.
IP Destination Address—This address can be a unicast, broadcast, or multicast address. However, currently, SA management mechanisms are defined only for unicast addresses.
Security Protocol—This can be either AH or ESP but not both.

An SA protects traffic within it via either AH or ESP, but not both. For a connection that requires protection from both AH and ESP, a pair of SAs must be configured in each direction, as described in section 5.5.4.
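The identifying triple and the SA count for combined AH and ESP protection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `SAIdentifier`, the helper `sas_for_ah_plus_esp`, and the sequential SPI allocation starting at 256 are hypothetical conveniences, not part of any IPSec specification or product.

```python
from dataclasses import dataclass

AH, ESP = 51, 50  # IP protocol numbers assigned to AH and ESP


@dataclass(frozen=True)
class SAIdentifier:
    """The triple that uniquely identifies one (unidirectional) SA."""
    spi: int          # 32-bit Security Parameter Index
    destination: str  # IP destination address of the SA
    protocol: int     # AH or ESP, never both

    def __post_init__(self):
        # SPI 0 is kept for local use, so it is rejected here.
        if not 0 < self.spi <= 0xFFFFFFFF:
            raise ValueError("SPI must be a nonzero 32-bit value")
        if self.protocol not in (AH, ESP):
            raise ValueError("protocol must be AH or ESP")


def sas_for_ah_plus_esp(local, remote, first_spi=256):
    """List the SAs needed to protect traffic with AH and ESP in both directions."""
    spi = first_spi  # values 1-255 are reserved, so start above them (assumed scheme)
    sas = []
    for destination in (remote, local):   # outbound SA, then inbound SA
        for protocol in (AH, ESP):        # one SA per security protocol
            sas.append(SAIdentifier(spi, destination, protocol))
            spi += 1
    return sas
```

Calling `sas_for_ah_plus_esp("192.0.2.1", "198.51.100.7")` yields four distinct identifiers: an AH and an ESP SA toward the remote peer, and the same pair toward the local system. In a real deployment the SPIs would normally be chosen by each destination system rather than allocated sequentially.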

IPSec databases

In order to keep track of SA connection states and the policies in place, an IPSec implementation maintains two databases, as follows:

Security Policy Database (SPD)—This defines the security services offered to IP traffic, based on factors such as source/destination address, inbound/outbound, and so on. The SPD contains an ordered list of policy entries for inbound and/or outbound traffic. Entries in this database are similar to firewall rules or access control lists. For example, individual entries might specify that traffic to 10.0.0.0 should not go through IPSec processing, traffic to 140.168.0.0 may be discarded, and all other traffic must be processed by IPSec.
Security Association Database (SAD)—This contains parameter information about each SA, such as AH/ESP algorithms and keys, sequence numbers, protocol mode, and SA lifetime. For outbound processing an SPD entry points to an entry in the SAD (i.e., the SPD determines which SA is applied for a given packet). For inbound processing, the SAD determines how the packet must be processed.
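The ordered, first-match behavior of the SPD can be illustrated with a short Python sketch based on the example entries above. The prefix lengths (/8 and /16), the `Action` names, and the fallback of discarding unmatched traffic are assumptions made for illustration; the text gives only the network numbers.

```python
import ipaddress
from enum import Enum


class Action(Enum):
    BYPASS = "bypass"    # forward without IPSec processing
    DISCARD = "discard"  # drop the packet
    PROTECT = "protect"  # apply IPSec (AH or ESP)


# Ordered policy list: the first matching entry wins.
# The prefix lengths are assumed for illustration.
SPD = [
    (ipaddress.ip_network("10.0.0.0/8"), Action.BYPASS),
    (ipaddress.ip_network("140.168.0.0/16"), Action.DISCARD),
    (ipaddress.ip_network("0.0.0.0/0"), Action.PROTECT),
]


def lookup_policy(dst):
    """Return the action of the first SPD entry whose network contains dst."""
    addr = ipaddress.ip_address(dst)
    for network, action in SPD:
        if addr in network:
            return action
    return Action.DISCARD  # nothing matched; dropping is an assumed default
```

For outbound traffic, a `PROTECT` result would then lead to the SAD entry that holds the algorithms and keys for the selected SA. Because the list is ordered, placing the catch-all entry last is what allows the two more specific rules to take effect.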

Tunnel and transport mode

IPSec defines two modes of operation for security associations, transport and tunnel mode, as follows:

Transport mode—This is typically a security association between two hosts. In IPv4 the IPSec header appears immediately after the IP header and any options. In IPv6 the IPSec header appears after the base IP header and extensions but may appear before or after destination options. In both cases the IPSec header appears before any higher-layer protocols (such as TCP or UDP). In transport mode the original IP datagram is secured via IPSec, and therefore registered IP addresses are required if the datagram is to traverse public networks such as the Internet.
Tunnel mode—In this mode a new IP datagram is constructed and the original IP datagram is fully encapsulated as payload. IPSec tunneling is modeled after [61]. It was originally designed for Mobile IP, an architecture that allows a mobile host to keep its home IP address even if attached to remote or foreign subnets. Tunnel mode is required whenever either end of an SA is operating as a security gateway, where the IPSec traffic is in transit. The IP destination address in the encapsulating header is set to the tunnel endpoint, which is not necessarily the same as the original packet destination. For example, a typical application would be to have two security gateways with an AH tunnel configured to authenticate all traffic between two sites over the Internet.

Figure 5.15 illustrates SAs in both transport and tunnel mode.

Figure 5.15: SAs are defined in either transport mode or tunnel mode.

The advantages of transport mode are that it adds only a few bytes of overhead per datagram and has a lower processing overhead. Furthermore, since the IP header is passed in the clear, this enables service providers to offer special processing (such as quality of service) at quite a granular level (based on information in the header such as source and destination address, ToS bits, etc.). By contrast, tunneling obfuscates such details, exposing only gateway-to-gateway information. Even so, anything above Layer 3 is likely to be encrypted, so application-specific flow differentiation is not possible. One significant drawback of transport mode is that the IP header is open to abuse by malicious users (i.e., traffic or topology analysis can be performed in preparation for a subsequent denial-of-service or spoof attack). Another disadvantage is that the so-called mutable fields within the IP header are not authenticated. Tunnel mode offers complete protection of the encapsulated IP datagram; a malicious user can determine only the tunnel end points and the encapsulation protocol.

Tunnel mode also offers the possibility of using private (i.e., unregistered) addressing schemes and can be used instead of NAT in certain scenarios. A major advantage of tunnel mode is that it enables intermediate network devices (such as a router or firewall) to act as a kind of IPSec proxy for hosts that are not yet IPSec enabled. For example, two sites could be connected using an IPSec VPN via IPSec-enabled routers. Clients and servers on those sites have no IPSec capability; however, IPSec processing can be applied transparently on specified traffic between sites based on access control lists (i.e., the end systems are completely unaware of the gateways—no explicit connection must be made). This enables existing IP networks to reap the benefits of secure VPNs with minimal disruption.

Reference [54] states that a host must support both transport and tunnel mode, whereas a security gateway is required to support only tunnel mode. For all transit traffic security gateways must use tunnel mode; however, gateways frequently use transport mode also (e.g., transport mode may also be used for gateway-to-gateway management traffic such as SNMP or ICMP, where the gateway is effectively operating as a host). It is also common practice for host-to-host connections to be in tunnel mode.

5.5.2 Authentication Header (AH)

The IP Authentication Header (AH) provides integrity and data origin authentication for IP datagrams and also offers optional protection against session replay. Note that replay protection must be implemented by any IPSec-compliant system (regardless of whether it is used or not). Data integrity is assured by a checksum (ICV) generated by a message authentication code (e.g., MD5 or SHA-1); data origin authentication is assured by including a secret shared key in the data to be authenticated; and replay protection is provided by use of a sequence number field within the AH header. In the IPSec vocabulary, these three distinct functions are collectively referred to as authentication.

AH services are connectionless, operating on a per-packet basis. AH authenticates all of the IP datagram, with the exception of a small number of mutable fields in the IP header. These fields are typically modified in transit and cannot be predicted by the receiver. The mutable fields include those shown in the following chart:

IP version 4	IP version 6
Type of Service (TOS)	Class
Flags and fragment offset	Flow label
Time to Live (TTL)	Hop limit
Header checksum

If these fields absolutely must be protected, then IPSec tunnel mode must be used. In tunnel mode the original IP header and associated data are encapsulated inside another IP header, so only the outer IP wrapper changes in transit. The payload of the IP packet (including the original IP header) is considered immutable and can, therefore, be fully protected by AH (or indeed ESP authentication).
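The handling of the IPv4 mutable fields during authentication can be sketched as follows. This is a minimal illustration that assumes a plain 20-byte IPv4 header without options; the function name `zero_mutable_fields` is hypothetical. The fields listed in the chart are set to zero before the header is fed into the integrity calculation, so changes made to them in transit do not invalidate the result.

```python
import struct


def zero_mutable_fields(header: bytes) -> bytes:
    """Return a copy of a 20-byte IPv4 header with its mutable fields zeroed."""
    if len(header) != 20:
        raise ValueError("sketch handles only a 20-byte header without options")
    h = bytearray(header)
    h[1] = 0                  # Type of Service
    h[6:8] = b"\x00\x00"      # Flags and fragment offset
    h[8] = 0                  # Time to Live
    h[10:12] = b"\x00\x00"    # Header checksum
    return bytes(h)


def ipv4_header(ttl, checksum):
    """Build an illustrative header; all values other than ttl/checksum are fixed."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0x10, 40,        # version/IHL, TOS, total length
        0x1234, 0x4000,        # identification, flags/fragment offset
        ttl, 51, checksum,     # TTL, protocol (51 = AH), header checksum
        bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7]),
    )


sent = ipv4_header(ttl=64, checksum=0xBEEF)
received = ipv4_header(ttl=63, checksum=0xBFEF)  # a router decremented the TTL
same_input = zero_mutable_fields(sent) == zero_mutable_fields(received)
```

Here `sent` and `received` differ on the wire, yet both produce the same input to the authentication calculation. The checksum values are placeholders rather than correctly computed checksums, since the field is zeroed in either case.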

AH processing is only performed on unfragmented IP packets. An IP packet that already has AH applied may be fragmented by intermediate routers but must be reassembled at the destination before passing up to the AH process, or else it will be discarded by the AH process. This prevents overlapping fragment attacks (such as teardrop). Packets that fail authentication are discarded and are not delivered to upper layers to reduce the chance of denial-of-service attacks. (See Figure 5.16.) Note that the patchy authentication of the IP header with AH indicates that certain fields are not authenticated (i.e., so-called mutable fields that are likely to change in transit).

Figure 5.16: The shaded area represents the scope of authentication for IPSec transport and tunnel mode using either the AH or the ESP protocol.

AH header format

AH is identified by protocol number 51, assigned by the IANA. AH can be used either in transport mode or in tunnel mode. In transport mode AH is inserted immediately following the original IP header. If the datagram already has IPSec header(s), then the AH header is inserted before these. In tunnel mode AH is applied to the encapsulating IP header and payload, so the original IP header is fully protected. Note that early IPSec implementations may not support AH in tunnel mode. AH is an integral part of IPv6. In an IPv6 environment, AH is considered a mandatory end-to-end payload and it appears after hop-by-hop, routing, and fragmentation extension headers. The destination options extension headers may appear either before or after the AH header. The current AH header format and operation are described in [55]. The header format is illustrated in Figure 5.17.

Figure 5.17: AH header format.

AH Header Fields

Next Header—The next header is an 8-bit mandatory field that shows the data type carried in the payload—for example, an upper-level protocol identifier such as TCP. The values are chosen from the set of IP protocol numbers defined by the IANA.
Payload length—This field is 8 bits long and contains the length of the AH header expressed in 32-bit words, minus 2. It does not relate to the actual payload length of the IP packet as a whole. If default options are used, the value is 4 (three 32-bit fixed words plus three 32-bit words of authentication data minus 2).
Reserved—This field is reserved for future use. Its length is 16 bits and must be set to 0.
Security Parameter Index (SPI)—This field is 32 bits in length. It is used to distinguish among different SAs terminating at the same destination and using the same IPsec protocol. An SPI has only local significance. SPI zero is reserved for local implementation-specific applications and should not appear on the wire.
Sequence number—This 32-bit field is a monotonically increasing counter that is used for replay protection. Replay protection is optional; however, this field is mandatory. The sender always includes this field, and it is at the discretion of the receiver to process it or not. At the establishment of an SA the sequence number is initialized to zero. The first packet transmitted using the SA has a sequence number of one. Sequence numbers are not allowed to repeat. Thus, the maximum number of IP packets that can be transmitted on any given SA is 2³² - 1. After the highest sequence number is used, a new SA and consequently a new key is established. Antireplay is enabled at the sender by default. If, upon SA establishment, the receiver chooses not to use it, the sender does not process this field further. Note that the antireplay mechanism is normally not used with manual key management. Note also that the original AH specification did not discuss the concept of sequence numbers; older IPSec implementations may, therefore, not provide replay protection.
Authentication data—This is a variable-length field containing a checksum called the Integrity Check Value (ICV). The ICV is calculated on a per-packet basis using an algorithm selected at the SA initialization. The authentication data length is an integral multiple of 32 bits. The ICV is used by the receiver to verify the integrity of the incoming packet. In theory any MAC algorithm can be used to calculate the ICV. The current specification requires that HMAC-MD5-96 and HMAC-SHA-1-96 be supported. The original AH specification required keyed MD5, and in practice keyed SHA-1 was also used. Implementations usually support two to four algorithms. When performing the ICV calculation, the mutable fields are treated as zero.
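The field layout above can be made concrete with a short Python sketch that packs an AH header and computes a 96-bit ICV with HMAC-SHA-1. It is an illustration under assumptions, not a conforming implementation: the function names are hypothetical, `ip_header` is expected to have its mutable fields already zeroed, and the ICV field itself is treated as zero while the MAC is computed.

```python
import hashlib
import hmac
import struct

AH_FIXED_WORDS = 3  # Next Header/Payload Length/Reserved, SPI, Sequence Number
ICV_BYTES = 12      # HMAC-SHA-1-96 keeps the first 96 bits of the digest


def build_ah(next_header, spi, seq, key, ip_header, payload):
    """Return a 24-byte AH header authenticating ip_header and payload."""
    if not 1 <= seq <= 0xFFFFFFFF:
        # The first packet uses 1 and the counter may not wrap.
        raise ValueError("sequence number out of range; a new SA is required")
    payload_len = AH_FIXED_WORDS + ICV_BYTES // 4 - 2  # length in 32-bit words, minus 2
    fixed = struct.pack("!BBHII", next_header, payload_len, 0, spi, seq)
    mac = hmac.new(key, ip_header + fixed + bytes(ICV_BYTES) + payload, hashlib.sha1)
    return fixed + mac.digest()[:ICV_BYTES]


def verify_ah(ah, key, ip_header, payload):
    """Recompute the ICV and compare it with the one carried in the header."""
    fixed, icv = ah[:12], ah[12:]
    mac = hmac.new(key, ip_header + fixed + bytes(len(icv)) + payload, hashlib.sha1)
    return hmac.compare_digest(icv, mac.digest()[:ICV_BYTES])
```

With these defaults the Payload Length byte works out to 4, matching the value quoted for the default options (three fixed words plus three words of authentication data, minus 2). A receiver that gets `False` from `verify_ah` would discard the packet without passing it to upper layers.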

5.5.3 Encapsulating Security Payload (ESP)

The IP Encapsulating Security Payload (ESP) provides data confidentiality using encryption. ESP can also optionally provide data origin authentication, connectionless (per packet) integrity, and replay protection. When ESP is used to provide authentication functions, it uses the same algorithms used by the AH protocol; however, the scope differs (as illustrated in Figure 5.18). The required services are configurable at SA establishment, with the following restrictions:

Integrity checking and authentication are always enabled together (i.e., they are mutually inclusive).
Replay protection is selectable only if authentication is enabled.
Replay protection is selected only by the receiver.

Figure 5.18: ESP header and trailer.

Encryption is selectable regardless of the other service options. It is strongly recommended that if encryption is enabled, authentication should also be enabled. If encryption alone is enabled, then a malicious user could forge packets and mount cryptanalytic attacks. This is practically impossible when integrity check and authentication are enabled. Although authentication and encryption are optional, at least one of them must be enabled; otherwise, it makes no sense to use ESP at all. As with AH, ESP processing is applied only to unfragmented IP packets. If both encryption and authentication are selected, the receiver first authenticates the packet. If authentication fails, the packet is discarded to avoid unnecessary processing and reduce the risk of a denial-of-service attack.

The cryptographic algorithms implemented in IPSec are called transforms. For example, the DES algorithm used in ESP is called the ESP DES-CBC transform [62]. IPSec defines the interfaces into the Public Key Infrastructure (PKI) but does not explicitly specify which PKI; it defines two competing standards: Simple Key Management for Internet Protocol (SKIP) and Internet Security Association Key Management Protocol (ISAKMP). Both are so-called authenticated Diffie-Hellman Key Exchange Algorithms (D-H KEAs). Typically, the authentication of the D-H public key material is provided by X.509 certificates, and, therefore, SKIP and ISAKMP do not define the full scope of a PKI and require an interface into an external certificate request-generation technology (such as provided by [18–21]).

ESP packet format

ESP is identified by protocol number 50, assigned by the IANA. The current ESP packet format is described in [56]. The format of the ESP packet is more complicated than that of the AH packet. Actually there is not only an ESP header but also an ESP trailer and ESP authentication data. The payload is encapsulated between the header and the trailer, hence the name of the protocol.

ESP Header Fields

Security Parameter Index (SPI)—as defined for AH.
Sequence Number—as defined for AH.
Payload Data—The Payload Data field is mandatory. It consists of a variable number of bytes of data described by the Next Header field. This field is encrypted with the cryptographic algorithm selected during SA establishment. If the algorithm requires initialization vectors, these are also included here. The ESP specification requires support for the DES algorithm in CBC mode (DES-CBC transform). Often other encryption algorithms are supported, such as triple-DES.
Padding—Most encryption algorithms require that the input data must be an integral number of blocks. Also, the resulting ciphertext (including the Padding, Pad Length, and Next Header fields) must terminate on a 4-byte boundary, so that the Next Header field is right-aligned. That's why this variable-length field is included. It can be used to hide the length of the original messages, too. However, this could adversely impact the effective bandwidth. Padding is an optional field. Note that encryption covers the Payload Data, Padding, Pad Length, and Next Header fields.
Pad Length—This 8-bit field contains the number of the preceding padding bytes. It is always present, and the value of 0 indicates no padding.
Next Header—as defined for AH.
Authentication Data—This field is variable in length and contains the ICV calculated for the ESP packet from the SPI to the Next Header field inclusive. The Authentication Data field is optional. It is included only when integrity check and authentication have been selected at SA initialization time. The ESP specifications require two authentication algorithms to be supported: HMAC with MD5 and HMAC with SHA-1. Often the simpler keyed versions are supported by the IPSec implementations. Note that the IP header is not covered by the ICV. Note also that the original ESP specification [63] discusses the concept of authentication within ESP in conjunction with the encryption transform. That is, there is no Authentication Data field and it is left to the encryption transform to eventually provide authentication.
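The padding rule can be illustrated with a small Python sketch that builds the ESP trailer (Padding, Pad Length, Next Header) for a given payload length. The function name `esp_trailer` and the default 8-byte block size (as used by DES-CBC) are assumptions for illustration, and the choice of pad bytes 1, 2, 3, ... is likewise treated here as an assumed convention rather than something stated in the text above.

```python
import math


def esp_trailer(payload_len, next_header, block_size=8):
    """Return Padding + Pad Length + Next Header for a payload of payload_len bytes."""
    # The encrypted data must fill whole cipher blocks and end on a 4-byte boundary.
    align = block_size * 4 // math.gcd(block_size, 4)
    # Two trailer bytes (Pad Length, Next Header) always follow the padding.
    pad_len = -(payload_len + 2) % align
    if pad_len > 255:
        raise ValueError("Pad Length is an 8-bit field")
    padding = bytes(range(1, pad_len + 1))  # assumed pad byte pattern 1, 2, 3, ...
    return padding + bytes([pad_len, next_header])


trailer = esp_trailer(13, 6)  # 13-byte payload carrying TCP (protocol 6)
```

For a 13-byte payload the sketch adds a single pad byte, so the payload plus trailer occupies 16 bytes, which is two DES blocks. A 14-byte payload needs no padding at all, and the Pad Length byte is then 0.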

Since both ESP and AH provide authentication, one might question why ESP authentication does not cover the IP header, doing away with the need for AH altogether. There are several reasons for this, including the following:

ESP requires strong cryptographic algorithms to be implemented, and these may be subject to restrictive regulations in some countries, leading to deployment problems with ESP-based solutions. However, authentication is not regulated and AH can be deployed internationally without restriction.
For many applications only authentication is required. AH is lighter and potentially more scalable than ESP because of the simpler format and lower processing overhead, and in these cases it makes sense to use AH.
The two protocols combined enable finer control and more flexibility for an IPSec network. By nesting AH and ESP, for example, one can implement IPSec tunnels that leverage the strengths of both.

As with AH, ESP can be used in two ways: transport mode and tunnel mode. In transport mode the original IP datagram is taken and the ESP header is inserted immediately after the IP header. If the datagram already has IPSec header(s), then the ESP header is inserted before any of those. The ESP trailer and the optional authentication data are appended to the payload. ESP in transport mode provides neither authentication nor encryption for the IP header. This is a disadvantage, since false packets might be delivered for ESP processing. In tunnel mode ESP offers complete protection if both encryption and authentication are selected, since the original IP datagram becomes the payload data for the new ESP packet (only the encapsulating IP header is not protected). As with AH, ESP is an integral part of IPv6. In an IPv6 environment, ESP is considered an end-to-end payload and it appears after hop-by-hop, routing, and fragmentation extension headers. The destination options extension headers could appear either before or after the ESP header.

5.5.4 Combining IPSec protocols

IP packets transmitted over an individual SA are protected either by AH or ESP but not both. Where the security policy requires a combination of services for a particular traffic flow, it is necessary to employ multiple SAs. The term SA bundle refers to a sequence of SAs through which traffic must be processed in order to satisfy the security policy. The order of the sequence is defined by the policy. Note that the SAs that comprise a bundle may terminate at different end points (e.g., one SA may extend between a mobile host and a security gateway and a second, nested SA may extend to a host behind the gateway). The use of SA bundles means that AH and ESP may be applied alone, in combination with the other, or even nested within another instance. With these combinations, authentication and/or encryption can be provided between a pair of communicating hosts, between a pair of communicating firewalls, or between a host and a firewall. Given the two modes of each protocol, there are a number of possible combinations. Fortunately, only a few combinations make sense. Reference [55] describes the mandatory combinations that must be supported by each IPSec implementation. Other combinations may also be supported, but this could affect interoperability.

SA bundles

As indicated previously, if AH and ESP are required in combination, then multiple SAs must be established in each direction. This group of SAs is referred to as an SA bundle. There are two models for SA bundle creation, as follows:

Transport adjacency—AH and ESP are applied in transport mode to the same IP datagram, resulting in two distinct SA pairs: one for AH and one for ESP. This method is practical for only one level of combination; there is no advantage in further nesting.
Iterated (nested) tunneling—AH and ESP are applied in tunnel mode in sequence (i.e., nested). After each application a new IP datagram is created and the next protocol is applied to it. This method has no limit in the nesting levels, though one would normally not nest more than three levels (protocol and processing overheads increase with nesting, and there is the likelihood of fragmentation). Each SA in the bundle can originate or terminate at different nodes along the path. Reference [54] defines three possibilities: all end-points are identical, one endpoint is the same, or no endpoints are the same. Support for only the latter two is required.

If this were not complicated enough, both transport and nested bundles can be combined. For example, an IP packet with transport adjacency IPSec headers can be sent through nested tunnels. When designing an IPSec VPN, however, it is recommended that you limit the number of times IPSec processing is applied to avoid overburdening the gateways and limiting scalability (this will depend somewhat on the vendor equipment architecture selected). Two stages are sufficient for most applications; it is unlikely that further processing beyond three stages has any real benefit. Note that in order to create an SA bundle in which SAs have different end-points, at least one level of tunneling must be applied (transport adjacency does not allow for multiple source-destination addresses, because only one IP header is present).

The general principle of the combined use is that IPSec processing upon packet reception should start with authentication followed by decryption. Using this principle, the sender first applies ESP and then AH to outbound traffic (in fact, this sequence is mandated for transport mode IPSec processing). A more subtle issue with combined SAs is whether ESP authentication should be enabled when AH authentication is in use. In practice enabling ESP authentication makes sense only when the ESP SA extends beyond the AH SA (e.g., an encrypted transport connection between two hosts that traverses an AH tunnel between two gateways). In such cases it is strongly recommended that ESP authentication be enabled to avoid potential spoofing attacks. Finally, if packets are received where the origin is unknown, they should be discarded without performing decryption to avoid wasteful processing and reduce the likelihood of a denial-of-service attack.

5.5.5 The Internet Key Exchange Protocol (IKE)

Although the use of security associations is fundamental to IPSec, IPSec does not have an inherent mechanism for creating SAs. The IETF chose to divide functionality into two parts: IPSec provides packet-level processing, while the Internet Key Management Protocol (IKMP) negotiates SAs. After investigating several alternatives (including SKIP and Photuris), the IETF selected the Internet Key Exchange (IKE) as its standard for configuring IPSec SAs [59]. IKE (previously referred to as ISAKMP/Oakley) supports automated negotiation of SAs and automated generation and refresh of cryptographic keys. Strictly speaking, the terms IKE and ISAKMP are not interchangeable; IKE is a hybrid protocol incorporating features from the following:

ISAKMP provides a framework for authentication and key exchange but does not define them. ISAKMP is designed to be key exchange independent; that is, it supports many different key exchanges. Refer to [57, 58] for further details.
Oakley describes a series of key exchanges (called modes) and the services provided by each (e.g., Perfect Forward Secrecy [PFS] for keys, identity protection, and authentication). Refer to [60] for further details.
SKEME describes a versatile key exchange technique that provides anonymity, repudiability, and quick key refreshment. Refer to [64] for further details.

In practice IKE creates an authenticated, secure tunnel between two entities and then negotiates the SA for IPSec. This process requires that the two entities authenticate themselves to each other and establish shared keys. Note that it is not necessary to use IKE, but manually configuring SAs is a laborious and maintenance-intensive process for anything other than a small lab network. In practice IKE will be used for most real-world applications to enable scalable, rapid deployment.

IKE defines a standardized framework to support negotiation of security associations, initial generation of all cryptographic keys, and subsequent refresh of these keys. Oakley is the mandatory key management protocol within the IKE framework. The ability to perform these functions with little or no manual configuration of machines becomes critical as a VPN grows in size. In addition, the IKE methods have been designed with the explicit goal of providing protection against several well-known exposures, as follows:

Denial of Service (DoS)—The messages are constructed with unique cookies, which can be used to quickly identify and reject invalid messages without the need to execute processor-intensive cryptographic operations.
Man in the Middle (MITM)—Protection is provided against the common attacks, such as deletion of messages, modification of messages, reflecting messages back to the sender, replaying old messages, and redirection of messages to unintended recipients.
Perfect Forward Secrecy (PFS)—Compromise of past keys provides no useful clues for breaking any other key, whether it occurred before or after the compromised key. That is, each refreshed key will be derived without any dependence on predecessor keys.
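The cookie check described under Denial of Service can be sketched as follows. This is only one plausible construction, assuming the cookie is a keyed hash over the peer's address and port using a local secret; the names `make_cookie` and `cookie_is_valid` are hypothetical, and the 8-byte length matches the ISAKMP cookie field.

```python
import hashlib
import hmac
import os

# Hypothetical per-host secret; a real implementation would rotate it periodically.
LOCAL_SECRET = os.urandom(32)


def make_cookie(peer_ip: str, peer_port: int, secret: bytes = LOCAL_SECRET) -> bytes:
    """Derive an 8-byte cookie bound to the peer's address and a local secret."""
    material = f"{peer_ip}:{peer_port}".encode()
    return hmac.new(secret, material, hashlib.sha256).digest()[:8]


def cookie_is_valid(cookie: bytes, peer_ip: str, peer_port: int,
                    secret: bytes = LOCAL_SECRET) -> bool:
    """Cheap check performed before any expensive public key operation."""
    return hmac.compare_digest(cookie, make_cookie(peer_ip, peer_port, secret))
```

A message whose cookie does not match the claimed source address can be dropped after a single hash, before any Diffie-Hellman computation is attempted.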

Operation

IKE requires that all information exchanges must be both encrypted and authenticated so that no one can eavesdrop on the keying material, and the keying material will be exchanged only among authenticated parties. This is required because the IKE procedures deal with initializing the keys, so they must be capable of running over links where no security can be assumed to exist. Hence, the IKE protocols use the most complex and processor-intensive operations in the IPSec protocol suite.

Peer entities must be authenticated to each other before the IKE SA can be established. IKE is very flexible in this regard, supporting multiple authentication methods. The two entities must agree on a common authentication protocol through a negotiation process. At the time of writing the following mechanisms are generally implemented:

Preshared keys—The same key is preinstalled on each IPSec host. IKE peers authenticate each other by computing and sending a keyed hash of data that includes the preshared key. If the receiving peer is able to independently create the same hash using its preshared key, it knows that both parties must share the same secret, thus authenticating the other party.
Digital signatures—Each IPSec device digitally signs a set of data and sends it to the other party. This method is similar to the previous one, except that it provides nonrepudiation. Currently both the RSA public key algorithm and the Digital Signature Standard (DSS) are supported.
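The preshared key method can be sketched in a few lines. This is a deliberately simplified illustration: the real IKE hash covers specific exchange fields and is keyed with material derived from the preshared key, whereas the hypothetical `auth_hash` below simply keys an HMAC with the preshared key directly.

```python
import hashlib
import hmac


def auth_hash(preshared_key: bytes, exchange_data: bytes, identity: bytes) -> bytes:
    """Keyed hash over data both peers have seen, plus the sender's identity."""
    return hmac.new(preshared_key, exchange_data + identity, hashlib.sha256).digest()


def verify_peer(preshared_key: bytes, exchange_data: bytes, identity: bytes,
                received_hash: bytes) -> bool:
    """Recompute the hash locally; a match shows the peer holds the same secret."""
    expected = auth_hash(preshared_key, exchange_data, identity)
    return hmac.compare_digest(expected, received_hash)
```

Either peer could have produced the hash, which is why this method, unlike digital signatures, offers no nonrepudiation.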

Both digital signature and public key cryptography require the use of digital certificates to validate the public-private key mapping. IKE allows the certificate to be accessed independently (e.g., through DNSSEC) or by having the two devices explicitly exchange certificates as part of IKE.

The integrity of any cryptography-based solution depends more on keeping keys secret than it does on the strength of the cryptographic algorithms used. With IPSec both parties use a shared session key, negotiated via Diffie-Hellman, to encrypt the IKE tunnel. IPSec employs a set of very robust Oakley exchange protocols, using a two-phase approach, as illustrated in Figure 5.19. In main mode, messages 1 and 2 negotiate the characteristics of the SAs. Messages 3 and 4 exchange nonces and also execute a D-H exchange to establish a master key (SKEYID). Messages 1 through 4 flow in the clear for the initial Phase 1 exchange, and they are unauthenticated. Messages 5 and 6 exchange the required information for mutually authenticating peer identities. The payloads of these messages are now protected by the encryption and keying material established with messages 1 through 4. Note that if aggressive mode is used instead of main mode, then only three messages are required, but the peer identities are generally no longer protected. Quick mode takes three or four messages only (depending upon whether the commit bit is set).

Figure 5.19: Phase 1 and Phase 2 IKE and IPSec SA negotiations.
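The role of messages 3 and 4 in main mode can be sketched as follows. The sketch assumes a toy 127-bit prime (far too small for real use; IKE uses the much larger Oakley groups) and HMAC-SHA-256 as the pseudo-random function. The SKEYID formula shown is the form RFC 2409 gives for signature-based authentication; other authentication methods use different inputs. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets

P = 2**127 - 1  # toy prime for illustration only
G = 3


def prf(key: bytes, data: bytes) -> bytes:
    """Assumed pseudo-random function: HMAC-SHA-256."""
    return hmac.new(key, data, hashlib.sha256).digest()


def dh_keypair():
    """Return (private exponent, public value g^x mod p)."""
    private = secrets.randbelow(P - 3) + 2  # in the range [2, P-2]
    return private, pow(G, private, P)


def dh_shared(private: int, peer_public: int) -> bytes:
    """Compute g^xy mod p from our private exponent and the peer's public value."""
    return pow(peer_public, private, P).to_bytes(16, "big")


def skeyid_for_signatures(nonce_i: bytes, nonce_r: bytes, shared: bytes) -> bytes:
    # SKEYID = prf(Ni_b | Nr_b, g^xy) for signature authentication.
    return prf(nonce_i + nonce_r, shared)
```

Each side sends only its public value and nonce; both arrive at the same SKEYID without the shared secret ever crossing the wire.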

Phase I—Initializing SAs with IKE

In Phase 1 IKE initially establishes SAs and exchanges keys between two systems that wish to communicate securely. Throughout this section we refer to the parties involved as Host 1 (H1, the initiator) and Host 2 (H2, the responder). This set of negotiations establishes a master secret from which all cryptographic keys will subsequently be derived for protecting data traffic. In the most general case, public key cryptography is used to establish an IKE SA between systems and to establish the keys that will be used to protect the IKE messages that will flow in the subsequent Phase 2 negotiations. Phase 1 is concerned only with establishing the protection suite for the IKE messages themselves; it does not establish any SAs or keys for protecting user data.

The SAs that protect the IKE messages are set up during the Phase 1 exchanges. Since we are starting cold (no previous keys or SAs have been negotiated between H1 and H2), the Phase 1 exchanges will use the IKE identity protect exchange (also known as Oakley main mode).

Six messages are needed to complete the exchange, as illustrated in Figure 5.19. As an alternative to main mode IKE supports an option called aggressive mode, used for expedited SA connections (only three messages are exchanged, at the cost of identity protection). IKE also offers a solution when the remote host's IP address is not known in advance. IKE allows a remote host to identify itself by a permanent identifier, such as a name or an e-mail address. The IKE Phase 1 exchanges will then authenticate the remote host's permanent identity using public key cryptography, as follows:

Certificates create a binding between the permanent identifier and a public key. Therefore, IKE certificate-based Phase 1 message exchanges can authenticate the remote host's permanent identity.
Since the IKE messages are carried within IP datagrams, the IKE peer (e.g., a firewall or destination host) can associate the remote host's dynamic IP address with its authenticated permanent identity.

Phase 2—Initializing protocol SAs for data transfer

Upon successful completion of Phase 1, H1 will initiate the Oakley Phase 2 message exchanges (known as Oakley quick mode) to define the IPSec SAs and keys used to protect IP datagrams exchanged between users. Phase 2 exchanges are relatively simple, since a secure channel has already been established; all that is required is to negotiate the SAs and keys that will protect user data exchanges. In practice Phase 2 negotiations occur much more frequently than Phase 1 negotiations; for example, a typical application of a Phase 2 negotiation is to refresh the cryptographic keys once every two to three minutes. All IKE Phase 2 payloads, except the IKE header itself, must be encrypted using the algorithm agreed upon during Phase 1 negotiations. Phase 2 authentication is achieved through the use of several cryptographically based hash functions. The input to the hash functions is derived partly from Phase 1 information (SKEYID) and partly from information exchanged in Phase 2. Authentication is based on certificates, but the Phase 2 process itself does not use certificates directly (it uses the SKEYID_a material from Phase 1, which was authenticated by certificates). Oakley quick mode comes in two forms, as follows:

Without a key exchange attribute, quick mode can be used to refresh the cryptographic keys but does not provide the property of Perfect Forward Secrecy (PFS).
With a key exchange attribute, quick mode can be used to refresh the cryptographic keys in a way that provides PFS. This is accomplished by including an exchange of public D-H values within messages 1 and 2.

Note that although PFS is highly desirable in cryptography, the specifications treat PFS as optional. They mandate that a system must be capable of handling the key exchange field when it is present in a quick mode message but do not require a system to include the field within the message.
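The difference between the two quick mode forms can be sketched as a single key derivation with an optional input. The sketch follows the KEYMAT construction in RFC 2409, assuming HMAC-SHA-256 as the pseudo-random function; `quick_mode_keymat` and its parameter names are hypothetical.

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes) -> bytes:
    """Assumed pseudo-random function: HMAC-SHA-256."""
    return hmac.new(key, data, hashlib.sha256).digest()


def quick_mode_keymat(skeyid_d: bytes, protocol: int, spi: bytes,
                      nonce_i: bytes, nonce_r: bytes,
                      qm_dh_shared: bytes = b"") -> bytes:
    """Derive IPSec keying material for one SA.

    Without PFS: prf(SKEYID_d, protocol | SPI | Ni_b | Nr_b)
    With PFS:    prf(SKEYID_d, g(qm)^xy | protocol | SPI | Ni_b | Nr_b)
    """
    data = qm_dh_shared + bytes([protocol]) + spi + nonce_i + nonce_r
    return prf(skeyid_d, data)
```

Without the quick mode D-H value, every refreshed key depends only on SKEYID_d and nonces that crossed the wire, so an attacker who later learns SKEYID_d can recompute all of them. Mixing in a fresh, ephemeral D-H secret removes that dependency.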

Performance issues

IKE Phase 1 uses public key cryptographic operations, which are very processor intensive. Phase 2 uses the less processor-intensive symmetric key cryptography. However, a single IKE Phase 1 negotiation can protect several subsequent Phase 2 negotiations, so in practice Phase 1 negotiations are relatively infrequent. As a general rule one might expect Phase 1 negotiations to be executed once a day or maybe once a week for relatively stable fixed VPNs, with Phase 2 negotiations executed once every few minutes. Clearly, dial-up VPNs must execute Phase 1 once for each call, when the connection is first established.
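To put rough numbers on this ratio, the following sketch counts how many Phase 2 negotiations a single Phase 1 negotiation covers. The lifetimes are illustrative assumptions taken from the ranges quoted above, not recommended settings, and the function name is hypothetical.

```python
def phase2_per_phase1(phase1_lifetime_s: int, phase2_lifetime_s: int) -> int:
    """Number of complete Phase 2 rekeys that fit within one Phase 1 lifetime."""
    return phase1_lifetime_s // phase2_lifetime_s


DAY = 24 * 60 * 60
WEEK = 7 * DAY

# Phase 1 once a day, Phase 2 every three minutes.
daily = phase2_per_phase1(DAY, 3 * 60)    # 480
# Phase 1 once a week, Phase 2 every three minutes.
weekly = phase2_per_phase1(WEEK, 3 * 60)  # 3360
```

Under these assumptions the expensive public key operations are amortized over hundreds or thousands of inexpensive symmetric rekeys.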

Using IKE with remote access

The key difference for remote access applications is the use of Oakley to identify the remote host by name, rather than by its dynamically assigned IP address. Once the host identity has been authenticated and the IP address binding is known, the rest of the procedures are identical to those described previously. As indicated, Phase I is only executed once, when the dial-up connection is first initiated. Assuming remote host H1 is dialing into a home network, the key points to note are the following:

H1's dynamically assigned IP address is placed in the IP header of all IKE messages.
H1's permanent identifier (e.g., its e-mail address) is placed in the ID field of the IKE Phase 1 messages.
H1's certificate used in the IKE exchange must be associated with H1's permanent identifier.
H1's dynamically assigned IP address is used in traffic-bearing datagrams (the destination IP address that is used together with the SPI and protocol type to identify the relevant IPSec SA for inbound processing).
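The points above can be sketched as two lookup tables: one binding the authenticated permanent identifier to the current dynamic address, and one selecting the SA from the (destination address, SPI, protocol) triple. The class and method names are hypothetical, and the addresses come from the documentation range.

```python
class RemoteAccessState:
    """Toy model of the identity binding and SA lookup used for remote access."""

    def __init__(self):
        self.identity_to_ip = {}  # permanent identifier -> dynamic IP address
        self.sad = {}             # (dest IP, SPI, protocol) -> SA parameters

    def bind_identity(self, permanent_id: str, dynamic_ip: str) -> None:
        # Called once Phase 1 has authenticated the permanent identifier.
        self.identity_to_ip[permanent_id] = dynamic_ip

    def install_sa(self, dest_ip: str, spi: int, protocol: str, sa) -> None:
        self.sad[(dest_ip, spi, protocol)] = sa

    def lookup_sa(self, dest_ip: str, spi: int, protocol: str):
        # Inbound processing selects the SA by this triple; None means no match.
        return self.sad.get((dest_ip, spi, protocol))
```

When the host dials in again and receives a different address, only the binding and the SA entries change; the permanent identifier and its certificate stay the same.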

IKE, ISAKMP, Oakley, and SKEME embody many important concepts, and we have skimmed somewhat over the details here. For further information about IKE, the interested reader is referred to [57–60, 64].

5.5.6 Design considerations

IPSec is powerful and very flexible; it offers a wide variety of connectivity scenarios, including gateway to gateway, host to host, and host to gateway (where gateways could be firewalls, routers, dedicated VPN appliances, switches, and RAS/NAS boxes; and hosts could comprise servers, PCs, workstations, laptops, and possibly even PDAs and mobile phones in the future). IPSec's initial applications are to secure communications across public and private networks; however, it may also be used to secure communications across the LAN. Note that all IPv6 devices will incorporate IPSec as standard. Some of the key applications currently being deployed include the following:

Branch office connectivity over the Internet—Promises dramatic savings on long-haul wide area costs for large internetworks and enables small and medium-sized companies to rapidly deploy low-cost private networks over the Internet. This assumes that the QoS offered by service providers meets business needs.
Remote access over the Internet—Promises major savings on dial charges for mobile workers and SOHO users. Users dial in to their local ISP and are typically tunneling securely through to the corporate network.
Extranet and intranet connectivity with partners—IPSec's strong authentication and encryption support means that VPNs can be established to suppliers or partners, each with unique security policies assigned.
E-commerce security—Since IPSec encrypts and authenticates at the Network Layer, it can be used with other e-commerce protocols such as SSL and SET to enhance security.

IPSec presents excellent opportunities for service providers to provision secure managed services. For dedicated IVPNs the service provider would typically supply, configure, and manage the CPE devices used to construct a VPN. There are already a number of VPN appliances on the market targeted specifically for this application. Providers can bring this service to market very rapidly, since IPSec's transparency means that IVPNs can be deployed without any modification to the service provider infrastructure. Providers can offer basic QoS if they build their IVPN service over a premium IP service (possibly mapped over ATM or Frame Relay). It is also relatively straightforward to provide dial support using IPsec client software on the remote user PCs. Again, this requires no change to the provider infrastructure. We will now examine the most common deployment scenarios for IPSec.

Example design 1: simple end-to-end host security

As shown in Figure 5.20, two hosts are connected through the Internet (or an intranet) without any IPSec gateway between them (standard routing is employed). In this case the hosts may use ESP, AH, or a combination in either transport or tunnel mode. Figure 5.20 illustrates the concept with possible encapsulation formats.

Figure 5.20: Simple end-to-end security using a host IPSec stack only. Note that IPorg is the original IP header; IPtun is the new IP header created in IPSec tunnel mode.

Example design 2: basic VPN support

Figure 5.21 illustrates a very simple VPN created between two IPSec gateways, G1 and G2. In this case the hosts are not required to support IPSec. The gateways are required to support only tunnel mode, either with AH or ESP. This configuration would allow private network addresses to be handled over a public IP network such as the Internet. However, bear in mind that the addressing schemes need to be consistent (e.g., they should not overlap).

Figure 5.21: Basic VPN security using gateway IPSec stacks between two intranets. Note that IPorg is the original IP header; IPtun is the new IP header created in IPSec tunnel mode.
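The requirement that the private addressing schemes behind the two gateways do not overlap can be checked mechanically before a tunnel is configured. A minimal sketch using Python's standard `ipaddress` module follows; the site names and prefixes are illustrative assumptions.

```python
import ipaddress


def overlapping_pairs(site_prefixes: dict) -> list:
    """Return every pair of sites whose private prefixes overlap."""
    nets = {name: ipaddress.ip_network(prefix)
            for name, prefix in site_prefixes.items()}
    names = sorted(nets)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if nets[a].overlaps(nets[b])]


# Two intranets behind gateways G1 and G2 with distinct private ranges.
consistent = overlapping_pairs({"G1": "10.1.0.0/16", "G2": "10.2.0.0/16"})
# G2's range falls inside G1's, so traffic could not be routed unambiguously.
conflicting = overlapping_pairs({"G1": "10.1.0.0/16", "G2": "10.1.128.0/17"})
```

An empty result means the tunnel can carry both private ranges without renumbering or address translation.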

It may be desirable to configure tunnels between gateways that combine both AH and ESP support. Some products support this capability, with the encapsulation order configurable via the tunnel policy. Note that this is often termed combined tunneling and should not be confused with iterated tunneling (since the SA bundle comprising the tunnel has identical end points, it would be inefficient to perform iterated tunneling in this case). Instead, one IPSec protocol is applied in tunnel mode and the other in transport mode, which can be conceptually thought of as a combined AH-ESP tunnel. An equivalent approach is to IP tunnel the original datagram and then apply transport adjacency IPSec processing to it. The result is that we have an outer IP header followed by the IPSec headers in the order set by the tunnel policy, and then the original IP packet.
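The resulting header order can be sketched as a simple list. The labels IPtun and IPorg follow the figure captions; the function name is hypothetical, and the ESP trailer and authentication data are omitted for brevity.

```python
def combined_tunnel_layout(policy_order):
    """Return the header order for a combined AH-ESP tunnel.

    policy_order lists the two IPSec protocols in the order set by the
    tunnel policy, outermost first.
    """
    if sorted(policy_order) != ["AH", "ESP"]:
        raise ValueError("a combined tunnel uses exactly one AH and one ESP header")
    return ["IPtun"] + list(policy_order) + ["IPorg", "payload"]


# Policy places AH outside ESP, so the whole tunneled packet is authenticated.
layout = combined_tunnel_layout(["AH", "ESP"])
```

Only one outer IP header is added, which is what distinguishes this from iterated tunneling, where each tunnel would add its own.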

Example design 3: end-to-end security with VPN support

Figure 5.22 illustrates a scenario that is a combination of Designs 1 and 2. Both hosts and gateways are required to support IPSec. The security gateways must be configurable to enable IPSec traffic (including ISAKMP traffic) to be passed for hosts behind them. In this situation one would normally configure the gateways to use AH in tunnel mode, protecting the host traffic, which is set to ESP transport mode.

Figure 5.22: End-to-end security. Both gateways and hosts support IPSec.

Note that for enhanced security we could use a combined AH-ESP tunnel between the gateways (if supported), so that the ultimate destination addresses would be encrypted, the entire packet traversing the Internet would be authenticated, and the encapsulated data double encrypted. This is the only case where three stages of IPSec processing might be useful; however, the performance impact is likely to be considerable.

Example design 4: remote access

Figure 5.23 illustrates a secure remote access design in which a remote host (H1) crosses the Internet to reach a server (H2) in an organization protected by a firewall (FW). The remote host would typically use a PPP dial-in connection to an ISP RAS. The ISP would then route onward using conventional routing. Between the remote host H1 and the firewall, only tunnel mode is required. Between the hosts either tunnel mode or transport mode can be used, with the same choices as in Design 1. A typical configuration would be to use AH in tunnel mode between H1 and FW and ESP in transport mode between H1 and H2 (note that early IPSec implementations that do not support AH in tunnel mode cannot implement this design). Note that in this case, the sender must apply the transport header before the tunnel header. Therefore, the management interface to the IPSec implementation must support configuration of the SPD and SAD to ensure this ordering of IPSec header application. It may also be possible to create a combined AH-ESP tunnel between the remote host, H1, and the gateway, FW. In this case H1 can access the whole intranet using just one SA bundle.

Figure 5.23: Remote access.
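The ordering requirement for the typical remote access configuration (transport header first, then tunnel header) can be sketched by modeling a packet as a list of header labels. The helper names are hypothetical, and trailers are omitted.

```python
def apply_transport(packet, protocol):
    """Insert an IPSec header directly after the existing IP header."""
    ip_header, *rest = packet
    return [ip_header, protocol] + rest


def apply_tunnel(packet, protocol, src, dst):
    """Wrap the whole packet in a new IP header followed by an IPSec header."""
    return [f"IP({src}->{dst})", protocol] + packet


original = ["IP(H1->H2)", "payload"]
# Step 1: ESP in transport mode between H1 and H2.
inner = apply_transport(original, "ESP")
# Step 2: AH in tunnel mode between H1 and the firewall FW.
on_the_wire = apply_tunnel(inner, "AH", "H1", "FW")
```

The firewall strips the outer IP and AH headers after verification and forwards the inner packet, still ESP-protected, to H2. Reversing the two steps at the sender would produce a packet that the firewall could not process in this way.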

While the combination of the IPSec protocols in theory leads to a large number of possibilities, in practice only a few (such as those presented previously) are commonly used. Use of other, optional combinations may adversely affect interoperability [54].