10.4 Internet Protocol Security (IPSec)

Layer 2 tunneling protocols such as PPTP and L2TP are not the current state of the art in IP VPN technology. That is the domain of the Internet Protocol Security (IPSec) suite of protocols. By combining tunneling and encryption, many users are able to realize cost savings, flexibility, and confidentiality for their data.

IPSec is a mechanism used to encrypt data. Using encryption and tunnels, a virtual private network can be created over the Internet. As mentioned, this is often a preferable solution for companies, and the first reason is cost. Internet access alone is generally cheaper than a leased layer 2 VPN through a service provider. Cost should not be the only consideration, however. Some service providers are marketing their layer 2 solutions at much lower prices to prevent customers from making the switch to a pure IP VPN solution. They do so because the second major advantage of an IPSec VPN is that the service can be provisioned completely independently of the service provider. This freedom is another powerful incentive for those who are frustrated by the sometimes long provisioning cycles of service providers.

The primary disadvantage of IPSec for some time has been interoperability. IPSec is a complicated set of protocols spanning several RFCs. In fact, the primary criticism of IPSec has been that it is too complicated, with too many options to consider when deploying it. These options and disagreements about the standards for implementations have led many vendors to develop incompatible IPSec solutions. This landscape has changed in recent years and, although interoperability is not perfect yet, an understanding of the IPSec protocol and the most commonly encountered options will go a long way in ensuring you have a painless VPN rollout.

To understand where to use IPSec in our networks, we first must discuss how IPSec operates. To protect traffic, IPSec uses two different types of protocol headers. The first is the Authentication Header (AH), which, as its name suggests, is used to ensure the integrity of data that has been transmitted but does not encrypt it. The other header is the Encapsulating Security Payload (ESP). ESP is the header that is used to encrypt data and provides the confidentiality service that we commonly associate with a VPN. The IPSec standards also call for two modes of operation: tunnel mode and transport mode. Of the two, tunnel mode is more common, but we will examine uses for each mode.

We will discuss the AH and ESP headers in more detail shortly, but for now let us introduce the concepts so that we can establish a baseline vocabulary for further discussions. While IPSec is commonly associated with encrypted data, this is only one of the features that the protocol suite offers. The AH only ensures that the data that has been transmitted has not been altered. That means that if Host A sends a message to Host B over a shared network, someone could sniff the data and see the contents of the packet. Use of the AH would mean, however, that the person in the middle could not alter the contents of that packet without alerting the receiving end. Furthermore, if proper authentication of the session has occurred, Host B would have a very high assurance that it was indeed communicating with Host A.

What is the point of all this IPSec complexity if data cannot be encrypted? Is that not what we want IPSec for in the first place? Readers in countries other than the United States might more quickly appreciate this; in some places, the encryption of data is illegal. In this case, the AH at least provides the assurance that if someone is observing your data, they cannot change it without alerting you. In short, the AH provides integrity, but not confidentiality.

The ESP, on the other hand, lives up to what we commonly associate with the confidentiality requirements of a VPN. The payload of the packet is encrypted and is encapsulated by the ESP header and an ESP trailer. Typically, the ESP header provides confidentiality, but not integrity. That means you can be fairly sure that someone has not been able to read your data, but you cannot be completely sure that someone has not sent you a fake encrypted packet inserted into your communications with a remote host. However remote this possibility may be, ESP does have an authentication option that can be configured if required. It should also be noted that ESP does not need to provide encryption at all. Null encryption can be used. That means that the ESP header is attached to data, but the data is not encrypted. Remember this if you are ever troubleshooting an IPSec connection.

In addition to the two major header types, IPSec also uses two modes of operation, each of which is discussed in some detail here, along with suggested implementations of each mode.

The easiest mode to work with if you are new to IPSec is the transport mode. In this case, the data being transmitted is simply encapsulated in an AH or ESP header and transmitted. In a sense, IPSec just becomes another header layer in the IP suite of services and, as data is passed down the stack, it is encapsulated just as it is at any other layer. Exhibit 10 shows an example of TCP data being protected by an ESP header in transport mode (upper) and the AH providing integrity to a packet (lower). In this case, the IP header on the outside of the packet carries the same IP addresses that the originating host placed in it.

Tunnel mode means that an IP packet carrying user data is tunneled inside another IP packet. If you recall the previous discussion of tunneling and encapsulation around PPTP and L2TP, you will see that the same process is occurring here. Exhibit 11 shows a tunneled packet.

Exhibit 10: ESP and AH Transport Mode

Exhibit 11: ESP Tunnel Mode

In the case of PPTP, the "tunnel header" field was a GRE|PPP header combination. In the case of L2TP, the tunnel header was an L2TP|PPP header. IPSec is not interested in the extension of a layer 2 header, so the tunnel header is an AH or ESP header.
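The difference between the two modes is easiest to see as an ordering of headers. The following is a minimal sketch, not a packet builder: the function name `layers` and the label strings are invented for illustration, and a TCP payload is assumed.

```python
def layers(mode, protocol="ESP"):
    """Return the on-the-wire header order for a protected TCP segment.

    Illustrative only: the labels are descriptive strings, not real headers.
    """
    if protocol not in ("AH", "ESP"):
        raise ValueError("protocol must be 'AH' or 'ESP'")
    inner = ["TCP header", "data"]
    if mode == "transport":
        # The original IP header stays in front; IPSec slots in behind it.
        stack = ["IP header (original)", protocol + " header"] + inner
    elif mode == "tunnel":
        # The whole original packet becomes the payload of a new IP packet.
        stack = ["IP header (new, outer)", protocol + " header",
                 "IP header (original, inner)"] + inner
    else:
        raise ValueError("mode must be 'transport' or 'tunnel'")
    if protocol == "ESP":
        # ESP, unlike AH, also appends a trailer after the protected data.
        stack.append("ESP trailer")
    return stack


for mode in ("transport", "tunnel"):
    print(mode, "->", " | ".join(layers(mode)))
```

In both modes the IPSec header sits directly behind the outermost IP header; tunnel mode simply adds a second, inner IP header behind it.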

There are several significant elements of tunnel mode operation. The first is that the IP headers on the packet do not need to match. This means that Host A can send out normal unencrypted data, and a router or another VPN gateway device could accept the packet and then tunnel it inside another IP packet over the Internet using an IP address from the gateway. The packet that the Internet sees has the IP address given to it by the VPN gateway device. Eventually, another VPN gateway device would receive the packet and remove the outermost IP header and then decrypt the data. The now-unencrypted IP packet will have the IP address of Host A. This unencrypted packet can now be forwarded on to Host B as a normal IP packet. This process is diagramed in Exhibit 12.

Exhibit 12: Gateway-to-Gateway ESP Tunnel Mode

While the data is traveling over the local LAN, it is unencrypted and viewable to anyone with a sniffer. When it travels out over the Internet, the packet is encrypted by the gateways. Using this method, entire networks behind the gateways can communicate with each other using only one IPSec connection between the gateways. Anyone monitoring traffic on the Internet would only see a lot of traffic passing between the two gateways, but would not be able to read any of the data nor have any clues as to which LAN devices on each network were communicating with each other. This ability to multiplex many user sessions into a single tunnel is a powerful ability of IPSec. The traffic patterns between internal LAN hosts are hidden to any interloper.

Because the internal IP header is not visible to any Internet traffic, these addresses can also be in the private range. My 192.168.1.0/24 network could be connected to your 10.0.0.0/8 network with no problems, as long as the gateways that we were using had at least one public address to "wrap" the encrypted traffic between our networks.
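The addressing behavior of gateway-to-gateway tunnel mode can be sketched in a few lines. This is a toy model under stated assumptions: encryption is omitted entirely, the `Packet` class and the `encapsulate` and `decapsulate` helpers are hypothetical, and the two gateway addresses are placeholders taken from the documentation address ranges.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: object  # bytes for user data, or a nested Packet when tunneled


def encapsulate(inner, gateway_local, gateway_remote):
    """Wrap the whole inner packet in a new packet addressed gateway to gateway.

    A real gateway would also encrypt the inner packet; that step is left out.
    """
    return Packet(gateway_local, gateway_remote, inner)


def decapsulate(outer):
    """Strip the outer header and recover the original packet unchanged."""
    return outer.payload


# Host A on 192.168.1.0/24 sends to Host B on 10.0.0.0/8.
inner = Packet("192.168.1.10", "10.1.2.3", b"user data")
# Placeholder public gateway addresses (assumed for this example).
outer = encapsulate(inner, "198.51.100.1", "203.0.113.1")

print("Internet sees:", outer.src, "->", outer.dst)
print("Host B receives:", decapsulate(outer).src, "->", decapsulate(outer).dst)
print("Inner source is private:",
      ipaddress.ip_address(inner.src) in ipaddress.ip_network("192.168.1.0/24"))
```

Only the outer addresses are ever visible in transit, which is why the inner addresses are free to come from private ranges.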

Taking this concept one step further, it also means that a remote client could connect to a company gateway using a public Internet address, as is done with L2TP or PPTP solutions. Inside the tunnel, a virtual interface could be created for the remote client and the remote client could be assigned a company internal or private address to use for communications over the tunnel. This allows the remote gateway to configure the client using extensions to DHCP and eases the configuration of roaming IPSec clients. From the point of view of the virtual tunnel interface, the remote client looks like it is part of the company subnet.

Together, transport mode and tunnel mode cover just about all the requirements for connectivity that could possibly be presented. Of the two, tunnel mode is the more commonly employed, for three reasons. The first reason is that the majority of IPSec implementations are of the gateway-to-gateway type, although this is changing as IPSec alone becomes more widely available for host-to-gateway configurations. This type of deployment naturally fits with tunneled data for reasons explained above. All traffic between the gateways can be encrypted with a single IPSec tunnel. This greatly increases the simplicity of deployment and improves the performance of the IPSec gateways.

Remote clients also commonly use tunnel mode because of the ability to use private addresses on the internally tunneled IP header. This greatly simplifies firewall configuration for tunneled packets and address allocation issues for network administrators.

Using tunnel mode with remote clients also has another important benefit that is explained in greater depth shortly; it works better with network address translation (NAT). There are a number of reasons why IPSec and NAT do not work well together; but let it suffice for now that of all the problems, the ones with tunnel mode are more easily fixed than those with transport mode. Because a popular use of IPSec VPNs is with SOHO (small office/home office) users who connect from their home networks using NAT to the corporate office, NAT issues were a serious impediment to the adoption of IPSec as the VPN protocol of choice.

Transport mode is best suited for direct data transfers between two hosts. For this reason, it is less often seen in host-to-gateway implementations, because the purpose of these connections is not to access the gateway itself but to access services behind the gateway. Transport mode, however, does have an application for LAN hosts. On a local network, there is no need for the additional overhead of another IP packet, and the benefit of hiding inside another IP packet would be questionable. The LAN, however, is where data is most likely to be sniffed or otherwise exposed. If this is a concern in your organization, then the IPSec transport mode would be an ideal solution.

Having discussed the basics of IPSec, we can now begin to look at the protocol in a bit more detail. While a general overview of IPSec is helpful, when considering interoperability, troubleshooting, and implementation choices, it is best to know the protocol suite to some level of detail.

10.4.1 Authentication Header (AH)

We begin this discussion by examining the authentication header (AH). This header provides integrity, but not confidentiality of data. While use of the AH is not widespread, it will provide the opportunity to examine several common elements of IPSec headers. The first element to understand about AH is the integrity it provides to IP packets. This will be an important issue when we examine the operation of other protocols that like to change packet headers and data, such as NAT.

AH, unlike the ESP authentication option, provides integrity to the entire IP packet, including the IP header itself. Thus, if anyone tried to spoof the IP header or even modify information in it, then the authentication of the packet would fail.

A careful examination (see Exhibit 13) of the operation of the IP header through multiple hops will show that the IP header itself, during normal operation, is not a static entity. If the IP header is changing throughout the lifetime of the packet, how can authentication on the IP header itself be performed? Although the type of service (TOS), flags, fragment offset, time-to-live (TTL), and header checksum values can change as the packet travels from hop to hop, the AH does not include those fields in its calculations. So, AH does not authenticate every bit of a packet, but it would be difficult to modify the TTL or checksum without affecting the deliverability of the packet to begin with.
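One way to picture how the mutable fields are left out of the calculation is to zero them before hashing, so that sender and receiver hash identical bytes no matter what the routers in between have changed. The sketch below assumes a 20-octet IPv4 header with no options; `zero_mutable_fields` and `build_header` are hypothetical helpers, and the checksum values are placeholders rather than computed checksums.

```python
import struct


def zero_mutable_fields(ip_header: bytes) -> bytes:
    """Return a copy of an option-less IPv4 header with mutable fields zeroed."""
    if len(ip_header) < 20:
        raise ValueError("IPv4 header must be at least 20 octets")
    header = bytearray(ip_header[:20])
    header[1] = 0                  # type of service
    header[6:8] = b"\x00\x00"      # flags and fragment offset
    header[8] = 0                  # time-to-live
    header[10:12] = b"\x00\x00"    # header checksum
    return bytes(header)


def build_header(tos, ttl, checksum, src, dst):
    """Hypothetical helper: pack a 20-octet IPv4 header carrying AH (protocol 51)."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, tos, 44, 0x1234, 0,
                       ttl, 51, checksum, bytes(src), bytes(dst))


at_sender = build_header(0, 64, 0xABCD, [192, 168, 1, 10], [10, 0, 0, 1])
after_hop = build_header(0, 63, 0xACCD, [192, 168, 1, 10], [10, 0, 0, 1])
spoofed = build_header(0, 64, 0xABCD, [192, 168, 1, 99], [10, 0, 0, 1])

# A decremented TTL and updated checksum do not disturb the protected bytes...
print(zero_mutable_fields(at_sender) == zero_mutable_fields(after_hop))
# ...but a changed source address does.
print(zero_mutable_fields(at_sender) == zero_mutable_fields(spoofed))
```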

Exhibit 13: AH Format

The AH is always found immediately after the outermost IP header. ^[6] Protocol ID 51 in the IP header indicates to the receiving device that the next header is an AH. Like most protocols, the AH has a field that indicates what the protocol header after the AH header will be. This allows the receiver to correctly interpret the bits as the packet is read. ^[7] The next header can be any IP protocol. One of the primary advantages of IPSec over other IP-based solutions like SSL is that IPSec works on every IP packet, not just those of a particular higher-layer protocol.

Because the authentication data at the end of the header is variable in length, the length of the AH needs to be described to the receiver so that it is aware of when the AH information ends and the next header begins. This is the purpose of the Payload Length field (Payload Len.), which describes the length of the AH in 32-bit words (four octets), minus 2. Note that the 32-bit word is the same unit used by the header length fields of the IP and TCP headers.

The Reserved field is currently not used and should be set to all zeros.
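The fixed part of the AH can be unpacked with a few lines of code, which may help when reading a packet capture. This is a sketch under assumptions: it follows the RFC 2402 convention that the Payload Length field holds the AH length in 32-bit words minus 2, the name `parse_ah` is invented here, and the sample packet uses a 12-octet all-zero ICV purely as a stand-in.

```python
import struct
from collections import namedtuple

AH = namedtuple("AH", "next_header payload_len reserved spi sequence icv")


def parse_ah(data: bytes) -> AH:
    """Split the start of `data` into the AH fields described in the text."""
    if len(data) < 12:
        raise ValueError("truncated AH")
    # Next Header (1), Payload Len (1), Reserved (2), SPI (4), Sequence (4)
    next_header, payload_len, reserved, spi, sequence = struct.unpack(
        "!BBHII", data[:12])
    total_len = (payload_len + 2) * 4  # whole AH, in octets
    if total_len < 12 or len(data) < total_len:
        raise ValueError("inconsistent AH length")
    # Whatever follows the 12 fixed octets, up to total_len, is the ICV.
    return AH(next_header, payload_len, reserved, spi, sequence,
              data[12:total_len])


# Next header 6 (TCP), 12-octet ICV: (12 + 12) / 4 - 2 = 4.
sample = struct.pack("!BBHII", 6, 4, 0, 0x1001, 1) + bytes(12) + b"tcp segment"
ah = parse_ah(sample)
print(ah.next_header, hex(ah.spi), ah.sequence, len(ah.icv))
```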

The "security parameters index" (SPI) is the first significant new term that we find in the AH. It is worth a thorough discussion because the concept of the SPI is critical to the operation of the ESP and IPSec in general. Consider the following example. A VPN device is terminating a number of IPSec connections. Each one of these connections has a number of parameters associated with it, such as the encryption algorithm to be used, the keys, the mode (tunnel/transport), the header option used with that neighbor (AH/ESP), and any initialization vectors that may need to be shared. This data is all kept internally to the VPN device in a security association database (SAD). The SAD contains information specific to each security association (SA) that the gateway currently has active. There needs to be some way for the VPN gateway to associate an incoming packet with the agreements that the gateway has previously made with a remote host. Depending on the implementation, the VPN gateway may also have multiple associations and multiple connections with a single remote host, thus eliminating the possibility of simply using the source IP address of the packet to distinguish between them. Instead, each packet is labeled with a security parameters index. When the VPN gateway receives a packet with an AH header and needs to check the security association to make sure the integrity is valid, the SPI is the value that tells the gateway which SA the packet should be checked against. The SPI is a connection identifier for the IPSec protocol.

SPIs are unidirectional. That is, the SPI carried in packets from Host A to Host B is different from the SPI carried in packets from Host B to Host A, because each direction is a separate security association. As previously seen in the discussion surrounding PPTP and L2TP, keeping these values locally significant reduces the complexity of making sure that values are unique. When an SA is set up, the receiving host chooses the SPI that the sender must place in packets sent to it, so the receiver can simply pick a value that it is not already using. Between the destination IP address of the packet, the header in use (AH or ESP), and the SPI, every IPSec packet can be uniquely identified and referenced in the security database on the VPN gateway.
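Conceptually, the security database is a lookup table, and the SPI is part of its key. The sketch below is a deliberately small model: it keys the table, as an assumption taken from RFC 2401, on the destination address, the security protocol, and the SPI, and every name in it (`SecurityAssociation`, `add_sa`, `lookup_sa`, the addresses, the keys) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

AH, ESP = 51, 50  # IP protocol numbers for the two IPSec headers


@dataclass
class SecurityAssociation:
    mode: str                      # "tunnel" or "transport"
    auth_alg: str                  # for example "hmac-sha1"
    auth_key: bytes
    enc_alg: Optional[str] = None  # None for AH, which does not encrypt


sad = {}  # (destination address, protocol, SPI) -> SecurityAssociation


def add_sa(dst, protocol, spi, sa):
    key = (dst, protocol, spi)
    if key in sad:
        raise ValueError("SPI already in use for this destination and protocol")
    sad[key] = sa


def lookup_sa(dst, protocol, spi):
    """Return the matching SA, or None if the packet should be discarded."""
    return sad.get((dst, protocol, spi))


# Two separate associations terminating on the same gateway address.
add_sa("203.0.113.1", ESP, 0x1001,
       SecurityAssociation("tunnel", "hmac-sha1", b"key-one", "3des"))
add_sa("203.0.113.1", ESP, 0x1002,
       SecurityAssociation("tunnel", "hmac-sha1", b"key-two", "3des"))

print(lookup_sa("203.0.113.1", ESP, 0x1002).auth_key)
print(lookup_sa("203.0.113.1", AH, 0x1002))
```

Because the SPI is part of the key, two associations with the same peer never collide, which is the point the text makes about not relying on IP addresses alone.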

Back to the AH itself; the sequence number field is used to protect against replay attacks. If an attacker is able to capture a packet from Host A to Host B, he could not change it without affecting the integrity of the packet, but he could send the packet again. In some cases, this attack may have the desired effect. IPSec peers can use a sliding window algorithm that essentially says, "Only packets in this sequence number range will be accepted at this point." The window is sliding because as packets are received, the expected sequence numbers will likewise increase. This is an optional feature and not all implementations have the interpretation of received sequence numbers enabled. Its use does, however, increase the integrity offered by the header because not only can individual packet tampering be identified, but any potential alterations of the traffic flow can potentially be detected.
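The sliding window can be sketched with a bitmap that remembers which recent sequence numbers have already been seen. The class below is one possible arrangement, not the required implementation: the name `ReplayWindow` and the default window size of 64 are assumptions, and a real receiver would advance the window only after the packet's integrity check has also succeeded.

```python
class ReplayWindow:
    """Track which of the most recent `size` sequence numbers have been seen."""

    def __init__(self, size=64):
        self.size = size
        self.highest = 0  # highest sequence number accepted so far
        self.bitmap = 0   # bit i set means (highest - i) was already seen

    def check_and_update(self, seq):
        """Return True and record `seq` if it is new and inside the window."""
        if seq <= 0:
            return False  # senders start counting at 1
        if seq > self.highest:
            shift = seq - self.highest
            if shift >= self.size:
                self.bitmap = 1  # everything previously seen falls out
            else:
                mask = (1 << self.size) - 1
                self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = seq  # the window slides forward
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False  # older than the left edge of the window
        if self.bitmap & (1 << offset):
            return False  # duplicate: a replayed packet
        self.bitmap |= 1 << offset
        return True


window = ReplayWindow(size=4)
for seq in (1, 3, 2, 2, 10, 6, 7):
    print(seq, "accepted" if window.check_and_update(seq) else "rejected")
```

Out-of-order packets such as sequence number 2 arriving after 3 are still accepted once; only duplicates and packets that have fallen off the trailing edge of the window are refused.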

The actual work of the AH is found in the authentication data field. This is the integrity check value (ICV), which is essentially a hash-based message authentication code (HMAC) computed with either the MD5 or SHA-1 algorithm. The ICV covers everything except the mutable fields in the IP header and the ICV field itself in the AH, which are treated as zero for the calculation. Otherwise, the contents of the entire packet are protected against tampering by this value.

When an AH packet is received by a remote device, the packet is first evaluated against the existing security association database (SAD) by looking up the packet's destination address, security protocol (AH or ESP), and SPI in the database. If no existing security association is found for the packet, then the packet is discarded and the event logged. If the SA does exist and the sequence number is within the expected range, then the receiving host runs an identical hash on the packet. If the SA tells the host that the sender is using HMAC-SHA-1 to calculate the ICV, then the receiver will use the same algorithm. If the computed value matches the ICV in the received packet, then the packet is valid and further processing can occur.
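The "identical hash" step can be illustrated with Python's standard `hmac` module. This is a sketch under assumptions: the function names are invented, the key and data are placeholders, and the truncation of the HMAC-SHA-1 output to 96 bits follows the common HMAC-SHA-1-96 convention rather than anything stated above. The `covered` argument stands for the packet bytes with the mutable fields and the ICV field already zeroed.

```python
import hashlib
import hmac

ICV_LEN = 12  # 96 bits, assuming the HMAC-SHA-1-96 convention


def compute_icv(key: bytes, covered: bytes) -> bytes:
    """Compute the truncated HMAC-SHA-1 over the protected bytes."""
    return hmac.new(key, covered, hashlib.sha1).digest()[:ICV_LEN]


def verify_icv(key: bytes, covered: bytes, received_icv: bytes) -> bool:
    """Recompute the ICV as the receiver would and compare in constant time."""
    return hmac.compare_digest(compute_icv(key, covered), received_icv)


shared_key = b"k" * 20  # placeholder for the key held in the SA
packet = b"immutable header fields and payload"
icv = compute_icv(shared_key, packet)

print(verify_icv(shared_key, packet, icv))          # untouched packet
print(verify_icv(shared_key, packet + b"x", icv))   # altered in transit
print(verify_icv(b"wrong key", packet, icv))        # sender lacks the key
```

Because the hash is keyed, a match shows both that the packet was not altered and that it came from a holder of the shared key.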

10.4.2 Encapsulating Security Payload (ESP)

The Encapsulating Security Payload (ESP) header is primarily used to provide confidentiality to transmitted data. It can also, however, provide some origin authentication in the manner of the AH and anti-replay protection when using sequence numbers. While the packet format is somewhat more complex in order to accommodate this, many of the concepts were already covered in the discussion of the AH and will be familiar.

The ESP header (Exhibit 14) is inserted in the same position as the AH in an IP packet. When the protocol field of the outermost IP header is set to 50, the next header is an ESP header. Unlike AH, ESP also uses at least one trailer, and possibly two. The ESP header itself contains the SPI and the sequence number. The first ESP trailer, which is always included, has information on any padding that might have been inserted into the packet to facilitate the encryption algorithms and the next header value for the encrypted data itself. If the authentication option is chosen, then ESP will include a separate trailer carrying that information.

Exhibit 14: ESP Header Format

The first field of the ESP header is the SPI. As with the AH, the SPI is a connection identifier. It is used by the security association database (SAD) on the host to map a packet to the specific connection information in the security association (SA). Use of the SPI ensures that each IPSec connection can be identified based upon the destination IP address, the header in use (AH or ESP), and the SPI.

The use of sequence numbers is mandated by the ESP RFC, but only by the sender. The receiver can choose to accept or discard the delivered sequence information. If the anti-replay protections are being used, which is generally a default setting, then the information is processed. The sliding window is used to ensure that only packets within an acceptable range are processed.

Payload data is either the entire tunneled IP packet (in tunnel mode) or the transport-layer header and other upper-layer data (in transport mode). From this field to the end of the next header field, all data is encrypted using one of a number of encryption algorithms.

Some encryption algorithms only work on blocks of data of a given size. Because the upper layers that send the data are unaware of these requirements, there is sometimes the need to pad the data to give the encryption algorithms the proper amount of data to encrypt. If there is a pad, then the receiver of the data needs to know how many octets at the end of a packet are part of that pad so that they can be removed before forwarding the real data up the stack. This accommodation to the encryption algorithms is responsible for the presence of both the padding field and the padding length field.

Padding can also be used for other reasons. IP protocols generally require that data be sent aligned on four octet boundaries. Regardless of the needs of the encryption protocol, the upper layer data may need to be padded to meet these requirements.
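The padding arithmetic is short enough to show directly. The sketch below assumes that the payload, the padding, and the two one-octet trailer fields (pad length and next header) together must fill a whole number of cipher blocks and end on a four-octet boundary; the helper names `esp_pad` and `esp_unpad` are invented, and filling the pad with the octets 1, 2, 3, ... follows the default scheme in RFC 2406.

```python
def esp_pad(payload: bytes, next_header: int, block_size: int = 8) -> bytes:
    """Append padding, pad length, and next header, ready for encryption."""
    align = max(block_size, 4)  # a block multiple, and at least 4-octet aligned
    pad_len = -(len(payload) + 2) % align
    padding = bytes(range(1, pad_len + 1))  # 1, 2, 3, ... (RFC 2406 default)
    return payload + padding + bytes([pad_len, next_header])


def esp_unpad(plaintext: bytes):
    """Reverse esp_pad after decryption: return (payload, next_header)."""
    pad_len, next_header = plaintext[-2], plaintext[-1]
    payload = plaintext[:len(plaintext) - 2 - pad_len]
    return payload, next_header


padded = esp_pad(b"hello", next_header=6, block_size=8)
print(len(padded), padded)
print(esp_unpad(padded))
```

Padding every packet out to a fixed, larger size, as a defense against traffic analysis, would use the same trailer fields with a larger pad.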

Perhaps the most interesting use of padding, however, is to further hide data. Assume that you were able to sniff data encrypted using ESP over a network. Of course, the data itself would not be accessible to you without the proper decryption key, but a little knowledge of the TCP/IP suite, and the length of the packets themselves would enable you to make some pretty good guesses about the function of the hosts that were passing data back and forth. Padding can be used to reduce this information by padding all packets to a uniform size. Traffic analysis is still possible but becomes more difficult.

The final field of the ESP trailer is common to most TCP/IP protocols. This is the indication of the next header in the packet. The position of the next header field is interesting, however. We see that it is placed after the data itself; that is, the receiver of the data will not be able to tell what the next header is until it has already read the packet. While not intuitive, it makes sense if we consider the decryption process by a remote receiver. To encrypt as much data as possible, the ESP covers all data in the packet except the SPI, sequence number, and any authentication data that may be present. The only reason that these fields are not encrypted as well is that they are required either to determine whether the packet is valid or to associate the packet with the keys that are used to decrypt it in the first place.

Once the receiver deems the packet a valid ESP packet, the decryption process occurs. The front of the payload is decrypted first and stored as the rest of the packet is decrypted. When the user data has been entirely decrypted, the next header field is decrypted. This gives the IPSec device the information it needs to immediately forward the packet up the protocol stack without having to revisit the information at the front of the packet. After all, the reason that most next header fields are included at the front of the packet is only for convenience's sake. Packets are read as they are received. In the case of an encrypted packet, nothing can be read until it is decrypted anyway. Placing the next header at the end of the packet allows the packet to be decrypted a bit at a time and then a forwarding decision made without having to re-read the entire packet.

While authentication is an option in the ESP protocol, it is commonly used to ensure that encrypted packets have integrity in addition to confidentiality. When network address translation (NAT) is in use, the ESP authentication trailer is the only way to apply authentication to a packet. Note that, according to the packet format, the authentication mechanism of the ESP protocol, unlike AH, does not authenticate anything in the IP header of the packet. Instead, only the ESP header (the SPI and sequence number) and the encrypted data that follows it are authenticated. In most cases, the authentication that ESP provides is enough, although there is no reason that ESP and AH could not be used in combination for certain very sensitive installations.
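To make the coverage concrete, the sketch below splits a raw ESP packet (everything after the IP header) into the parts discussed here. It assumes the authentication option is in use with a 12-octet ICV; the function name `split_esp` and the sample bytes are invented for illustration.

```python
import struct


def split_esp(esp: bytes, icv_len: int = 12) -> dict:
    """Separate an ESP packet into header fields, ciphertext, and ICV."""
    if len(esp) < 8 + icv_len:
        raise ValueError("truncated ESP packet")
    spi, sequence = struct.unpack("!II", esp[:8])  # sent in the clear
    end = len(esp) - icv_len
    return {
        "spi": spi,
        "sequence": sequence,
        # Payload, padding, pad length, and next header, all encrypted.
        "ciphertext": esp[8:end],
        # The ICV covers the ESP header plus the ciphertext, not the IP header.
        "authenticated": esp[:end],
        "icv": esp[end:],
    }


sample = struct.pack("!II", 0x2002, 7) + b"\xaa" * 16 + b"\xbb" * 12
parts = split_esp(sample)
print(hex(parts["spi"]), parts["sequence"],
      len(parts["ciphertext"]), len(parts["icv"]))
```

Nothing in the outer IP header appears in the authenticated span, which is why a NAT device can rewrite addresses without invalidating the ESP authentication data.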

The authentication process for the ESP protocol, other than what information is protected, is the same as what occurs in the AH protocol. That is, a hash-based message authentication code (HMAC) is used with a common hashing algorithm such as MD5 or SHA-1. The value in the authentication data field of the ESP trailer is simply the keyed hash of the covered information or, in IPSec terms, the integrity check value (ICV).

Use of the authentication trailer is advantageous in an ESP environment because it reduces the effect of a denial-of-service attack on an IPSec device. Consider this: you wish to deny service to the legitimate users of a VPN gateway. Knowing that the decryption process is the most intensive portion of receiving IPSec data, you send a series of invalid packets that look like they should contain encrypted information but actually are just garbage characters. Because you know that an SPI stays the same through the duration of a connection and that sequence numbers increment by one per packet, capturing a few packets will give you a valid SPI and anti-replay window in which to send packets. Without attempting to decrypt these packets, the receiver has no way of knowing whether or not the packets are valid. Because the authentication option uses a hash-based message authentication code keyed with a secret shared only by the two peers, the origin of the packet can be confirmed. Presumably, your forged packets would not carry a valid ICV and would be discarded.

This, of course, would only work if the authentication data was examined prior to or in parallel with the decryption process. In the typical genius of protocol designers, this is exactly what happens. Packets that do not pass the authentication test are discarded without attempts at decryption.
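The ordering matters more than the details, so the sketch below uses a stand-in for the cipher and simply counts how often it is invoked. Everything here is assumed for illustration: the function `receive_esp`, the placeholder key, the 12-octet truncated HMAC-SHA-1 ICV, and `fake_decrypt`, which only reverses the bytes so that the example needs no cryptography library.

```python
import hashlib
import hmac
import struct

ICV_LEN = 12  # assumed truncated HMAC-SHA-1 length


def receive_esp(esp: bytes, auth_key: bytes, decrypt):
    """Authenticate first; call the expensive `decrypt` only on success."""
    if len(esp) < 8 + ICV_LEN:
        return None
    covered, icv = esp[:-ICV_LEN], esp[-ICV_LEN:]
    expected = hmac.new(auth_key, covered, hashlib.sha1).digest()[:ICV_LEN]
    if not hmac.compare_digest(expected, icv):
        return None  # discarded before any decryption work is done
    return decrypt(covered[8:])  # skip the SPI and sequence number


decrypt_calls = []


def fake_decrypt(ciphertext: bytes) -> bytes:
    """Stand-in for a real cipher; records each call it receives."""
    decrypt_calls.append(ciphertext)
    return ciphertext[::-1]


key = b"shared-auth-key"  # placeholder for the key held in the SA
header = struct.pack("!II", 0x1001, 5)  # SPI and sequence number
body = b"ciphertext bytes"
genuine = header + body + hmac.new(key, header + body,
                                   hashlib.sha1).digest()[:ICV_LEN]
forged = header + b"garbage garbage!" + bytes(ICV_LEN)

forged_result = receive_esp(forged, key, fake_decrypt)
calls_after_forged = len(decrypt_calls)
genuine_result = receive_esp(genuine, key, fake_decrypt)

print(forged_result, calls_after_forged)
print(genuine_result, len(decrypt_calls))
```

The forged packet costs the receiver one hash computation and nothing more, which is the denial-of-service benefit described above.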

So far we have covered the basics of an IPSec connection. When considering the basics, we can see why some have criticized IPSec for being too complicated. We have discussed options for using the transport mode versus tunnel mode and for using the authentication header versus the encapsulating security payload. It does not help that an ESP transport mode packet can be tunneled inside an ESP tunnel, or that the ESP tunnel can also have AH applied to it to ensure the integrity of the IP header. Alternatively, you could just apply the ESP authentication option, or both. The flexibility of IPSec leaves many heads spinning. Fortunately for those of us who need to implement the protocol, there are only a few combinations of options that make sense for daily use. Unless IPSec is going to be used on a LAN, ESP in tunnel mode is the option most often seen. There are some VPN vendors that support only this option, with no chance of configuring others.

Although we may have sorted out these options, we are only a small way into the overall discussion of IPSec operation.

^[6]One confusing element of IPSec in general is that the standards describe the ability to use AH and ESP together in the same packet. While possible, this is not widely employed and will not be further discussed here.

^[7]One of my colleagues likes to describe a host reading a packet using a Pac Man analogy. The Pac Man eats the bits one at a time — but always likes to know what the next meal (header) is to get psyched up for it.