10.6 Internet Key Exchange (IKE)

It is best to start the examination of IKE at a high level and then revisit it to examine the details. IKE has two main phases of operation. The first one is conveniently known as phase one and the second as phase two. During the first phase, no secure channel exists between the two Internet hosts. The primary goal of phase one is to create an encrypted session between the two hosts so that the exchanges of IPSec-specific information can then occur in phase two.

In phase one, IKE operates in one of two modes: main mode or aggressive mode. The primary differences between the two modes are the number of packets exchanged, the amount of protection given to the exchange itself, and the ability to verify the identity of the remote host in a secure manner.

Main mode is the original specification of IKE. This does not, however, mean that it is the default mode of operation for a given VPN device. By default, some VPN gateways expect the packet exchanges to occur in aggressive mode and others expect them to occur in main mode. Both sides need to agree for the IKE protocol exchange to go any further than the first packet. Once the secure channel has been created, IPSec negotiations will begin. This will be a series of IPSec-specific proposals between the two hosts. To put it nontechnically, it is one side saying to another, "OK, I can support the AES algorithm with 128-bit keys with an SHA-1 160-bit hash. I would like to do this in tunnel mode using ESP if you could. If you cannot support that, then here is my next idea…."

Once phase one has been completed using either main or aggressive mode, a secure channel exists between the two remote hosts across the network. Phase two then uses that channel to agree on the IPSec specifics, such as the encryption algorithms, and to exchange the keying material that will be used to encrypt data.

Once this combination of events has occurred, IPSec is ready to begin the actual transmission of data. Now that we have a high-level view of what needs to go on with IKE, let us look at the specifics of the matter and see where things can go wrong and what we need to be careful of.

When we stop to consider the ways that one host can authenticate itself to another, two general cases arise for both main mode and aggressive mode. The first method is to use digital signatures. The second is to use public key encryption. Both methods imply, of course, that there is some way to share the digital signatures or the public keys. If the infrastructure does not exist to support this, we can simply pre-share the keys. This is another way of saying that each side will agree on a password to use in the process. The danger of pre-shared keys is that the encryption will only be as strong as the password. It is, however, a very simple way to exchange authentication information from one host to another and is very commonly used in gateway-to-gateway implementations.

Each of these authentication processes has a separate phase-one IKE packet exchange. It would not be beneficial to go through each one of them in detail — that is what RFC 2409 does; but it would be helpful to examine one of the packet exchanges. Although the exact information in the packets is different for each method of authentication, there are enough similarities between them so that lessons drawn from one can be applied to others. In our case, we will examine the protocol exchange used during public key authentication.

While IPSec itself uses IP protocol numbers 50 (ESP) and 51 (AH) to identify IPSec traffic, IKE runs over UDP on port 500. Normally, both the source and destination ports of an IKE packet are 500; but if NAT is used, the source port may be different. This is also the first point of failure when setting up a VPN connection. If IKE never has a chance to initiate, odds are that either the network itself is broken with regard to routing, IP addresses, etc., or there is a firewall that is preventing UDP port 500 from being forwarded. Some broadband service providers, eager to market their own VPN services to SOHO users, will actively filter out UDP port 500 to prevent home users from successfully establishing an IKE exchange with the branch office. Before committing to a VPN solution as a countermeasure for your remote connectivity needs, make sure that your users will not be using such a service provider.
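The protocol and port numbers above are what a firewall rule or packet filter has to match. As a minimal sketch (the function name `classify` and its labels are our own, chosen only for illustration), the distinction can be expressed like this:

```python
# IP protocol numbers and the IKE port discussed in the text.
IPPROTO_UDP = 17
IPPROTO_ESP = 50
IPPROTO_AH = 51
IKE_PORT = 500


def classify(ip_protocol, dst_port=None):
    """Label a packet as "ESP", "AH", "IKE", or "other" (hypothetical helper)."""
    if ip_protocol == IPPROTO_ESP:
        return "ESP"
    if ip_protocol == IPPROTO_AH:
        return "AH"
    # Only the destination port is checked, because NAT may rewrite
    # the source port of an IKE packet.
    if ip_protocol == IPPROTO_UDP and dst_port == IKE_PORT:
        return "IKE"
    return "other"
```

A filter that passes ESP but drops anything `classify` labels "IKE" would produce exactly the symptom described: the VPN never gets past its first packet.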

The first decision that needs to be made when initiating an IKE connection is which mode will be used in phase one of the connection. The first example we will examine here is based on main mode negotiations (Exhibit 15).

Exhibit 15: Main Mode IKE Operation

The first two packets in Exhibit 15 are simply the "hello" between the two hosts. During this process, the security association (SA) that is going to be used for the remainder of the IKE process is negotiated. The SA is a term that we have seen before associated with IPSec. While IPSec does use the concept of the SA, it is not an idea that is unique to IPSec. IKE, like IPSec, has a number of variables that need to be kept track of during the key exchange process. Some of the information that needs to be associated with a particular IKE session will include the encryption algorithm to be used in the phase one setup, any type of hash algorithm included, authentication, and typically Diffie-Hellman information.

To create a secure channel over an insecure medium, without compromising the integrity of the session, the Diffie-Hellman protocol is used. Discussed in a previous chapter, Diffie-Hellman is a protocol that can be used to exchange keys over an insecure medium to create a temporary secure channel. It is the second packet exchange of IKE that exchanges the Diffie-Hellman values, the identification of the remote side (normally its IP address), and a nonce. Nonce is a new term, but a simple concept. A nonce is simply another term for a pseudo-random number ^[8] that each side generates and then encrypts. The theory is this: if Host A is able to encrypt a nonce with Host B's public key, the only host on the Internet that should be able to decrypt that nonce and send it back to Host A would, of course, be Host B. This exchange proves to each side of the connection that the other side is who they claim to be and that the packets are not being intercepted and modified as in a man-in-the-middle attack.
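To make the Diffie-Hellman step concrete, here is a toy sketch of the arithmetic. The modulus `P` and base `G` are illustrative assumptions and are far too small for real use; IKE uses much larger standardized groups, and the function names are our own.

```python
import secrets

# Toy public parameters, for illustration only.
P = 2**61 - 1  # a Mersenne prime
G = 3          # public base


def dh_keypair():
    """Return (private, public), where public = G^private mod P."""
    private = secrets.randbelow(P - 3) + 2  # random value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def dh_shared(private, peer_public):
    """Combine our private value with the peer's public value."""
    return pow(peer_public, private, P)


# Each host sends only its public value across the insecure medium.
a_private, a_public = dh_keypair()
b_private, b_public = dh_keypair()
# Both sides arrive at the same secret without ever transmitting it.
same_secret = dh_shared(a_private, b_public) == dh_shared(b_private, a_public)
```

An eavesdropper sees `a_public` and `b_public` but neither private value, which is what allows a temporary secure channel to be built over an insecure medium.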

Once the Diffie-Hellman values have been exchanged and the identity of the remote side established, a channel protected by the secret keys derived from the Diffie-Hellman process has been established. Now that nobody else can see what is going on, the two parties can begin to transfer the important information. The final packet in the phase one exchange is a hash of several pieces of information, including the keys used in the protocol exchange, the Diffie-Hellman information, cookies from the ISAKMP header, proposals exchanged during the first packet exchange, and an identification payload. This information is not actually sent, but is hashed. The logic behind this is that this is all information that only the two legitimate participants in the IKE session would know in its entirety. By sending the hash of this information to each side, a final check is done on the authenticity of the remote participants.
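The final hash can be sketched as a keyed hash over the values listed above. The field ordering below follows our reading of the HASH_I definition in RFC 2409; the use of HMAC-SHA-1, the function names, and the placeholder byte strings in the usage are assumptions for illustration, not a wire-accurate implementation.

```python
import hashlib
import hmac


def auth_hash(skeyid, own_dh, peer_dh, own_cookie, peer_cookie, sa_proposals, own_id):
    """Keyed hash over the values both legitimate parties already know."""
    data = own_dh + peer_dh + own_cookie + peer_cookie + sa_proposals + own_id
    return hmac.new(skeyid, data, hashlib.sha1).digest()


def verify_peer(skeyid, received, peer_dh, own_dh, peer_cookie, own_cookie,
                sa_proposals, peer_id):
    """Recompute the hash the peer should have sent and compare the two."""
    expected = auth_hash(skeyid, peer_dh, own_dh, peer_cookie, own_cookie,
                         sa_proposals, peer_id)
    return hmac.compare_digest(expected, received)
```

Only the hash crosses the wire. A receiver that holds the same inputs recomputes it locally, so a party that does not know the keying material or that altered an earlier packet cannot produce a matching value.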

As mentioned, this process is slightly different, depending on the authentication method being used. For example, in the case of pre-shared keys, there is no ability to encrypt information using the remote host's public key. The end result of each option, however, remains the same. At the end of phase one of the IKE session, an encrypted session has been established that will allow the exchange of IPSec SA information in phase two of IKE.

Aggressive mode is similar to main mode except that it does not provide the identity protection exchange, which is the purpose of the second and third packet exchanges in main mode. While the extra steps of verifying identity are not performed prior to creating a secure connection, the advantage of aggressive mode is that it can accomplish the phase one exchange in only three packets. The exchange for public keys is shown in Exhibit 16.

Exhibit 16: IKE Aggressive Mode

We see that much of the same information is exchanged in terms of IKE security association information, Diffie-Hellman keys, and identification of the local host encrypted with public keys, but identity protection for the participants is not established because the identities are exchanged before a secure channel has been established between the two parties.

The distinction between aggressive mode and main mode is important to understand — not necessarily because the packet format is important, but because the two formats are incompatible with each other. There is no place in the ISAKMP header for the negotiation of the two different modes to occur. This means that each side must be preconfigured with the appropriate modes in order to communicate. Furthermore, we noted that there are at least three different ways to authenticate remote hosts to each other: using digital signatures, public keys, and shared secrets. The packet exchange for each of these is slightly different. As with the mode, there is no opportunity to describe how the authentication should occur in the ISAKMP header itself, so the authentication modes must be predefined as well.

If the phase-one IKE transaction should fail, it is generally for one of the above two reasons. There is a third possibility, however; during the establishment of the phase one encrypted session, a number of variables need to be agreed upon between the remote sides. These include things such as the Diffie-Hellman key length and the hash algorithm that is going to be used to hash data in the payloads. If these values are different between the two hosts, then the phase-one authentication attempts are doomed to fail.

To summarize IKE phase one, its goal is to create an encrypted channel between two hosts so that IPSec security association information can be transferred. Common trouble spots that prevent this from happening include differing modes configured on the two hosts, different authentication methods configured on the two hosts, and the hosts not agreeing upon the algorithms used to create the secure proposal.
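When phase one fails, a practical first step is to compare the two gateways' settings field by field. The sketch below assumes a simplified configuration record; the class name, field names, and example values are hypothetical and do not correspond to any particular vendor's options.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PhaseOneConfig:
    mode: str          # "main" or "aggressive"
    auth_method: str   # "signature", "public-key", or "pre-shared"
    encryption: str    # cipher protecting the phase-one channel
    hash_alg: str      # hash used in the payloads
    dh_group: int      # Diffie-Hellman group (determines key length)


def phase_one_mismatches(local, remote):
    """Return the names of the settings on which the two sides disagree."""
    return [f.name for f in fields(local)
            if getattr(local, f.name) != getattr(remote, f.name)]


branch = PhaseOneConfig("main", "pre-shared", "3DES", "SHA-1", 2)
home = PhaseOneConfig("aggressive", "pre-shared", "3DES", "SHA-1", 1)
problems = phase_one_mismatches(branch, home)  # ['mode', 'dh_group']
```

An empty list means that none of the common trouble spots named above applies, and the problem probably lies elsewhere, such as a filtered UDP port 500.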

Phase two negotiations establish the parameters that are to be used during the actual transfer of data and the re-keying process. This means that secret keys to be used during the encryption process and other session parameters that need to be established are exchanged. By this point, all traffic between the negotiating hosts is encrypted.

Among the more important parameters to be negotiated are those that govern re-keying. A common configuration option is whether or not to configure perfect forward secrecy (PFS). Perfect forward secrecy means that new keys used for the encryption and decryption of data are generated from scratch with no reuse of prior keying material. Normal re-keying, in an effort to be efficient, may use some of the same data from key to key. This implies that someone who is able to compromise one key might be able to compromise any keys related to the compromised one. With PFS, this possibility is eliminated. The disadvantage of configuring PFS is that each re-key requires a fresh Diffie-Hellman exchange, which is more computationally expensive than normal re-keying. Generally speaking, however, PFS provides greater assurances of confidentiality and is preferred. If a VPN gateway seems to suffer from performance issues, try disabling PFS and observe the result.

Any PFS that may be desired is a product of the phase-two negotiations. With no PFS, as illustrated in Exhibit 17, the keying material is derived from the phase one negotiation. If PFS is required, new keying material will be generated and exchanged during the phase two negotiations. Phase two of the IKE process is described in Exhibit 17 in packet format.

Exhibit 17: Phase Two IKE Exchange

In the exchange in Exhibit 17, the first hash value from the initiator is a reiteration of much of the information that was established in the phase-one exchange. The SA information exchanged in phase two, however, relates to the IPSec SA that is being established. Again, the nonce is used to prevent replay attacks and the key material is the Diffie-Hellman material used to establish the key material for the IPSec encryption. To establish authenticity for the exchange, the IDs of the participants are also included. The packet from the responder echoes much of the initiator's material in its own hash and then provides the same material as the initiator for the SA establishment. Finally, the initiator hashes the information received from the responder and resends it as a final verification of the SA that has just been established.
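The difference between re-keying with and without PFS can be sketched from the way RFC 2409 derives the IPSec keying material, as we read it: without PFS the key is computed from phase-one material (called SKEYID_d in the RFC) plus the nonces; with PFS a fresh Diffie-Hellman secret from the phase-two exchange is mixed in first. HMAC-SHA-1 and all example byte strings below are assumptions for illustration.

```python
import hashlib
import hmac


def keymat(skeyid_d, protocol, spi, nonce_i, nonce_r, pfs_secret=b""):
    """Derive IPSec keying material, optionally mixing in a fresh DH secret."""
    # With PFS, pfs_secret is the new phase-two Diffie-Hellman shared secret;
    # without PFS it is empty and only phase-one material protects the key.
    data = pfs_secret + bytes([protocol]) + spi + nonce_i + nonce_r
    return hmac.new(skeyid_d, data, hashlib.sha1).digest()


skeyid_d = b"phase-one-derived-secret"
spi = bytes.fromhex("0a0b0c0d")
without_pfs = keymat(skeyid_d, 50, spi, b"nonce-i", b"nonce-r")
with_pfs = keymat(skeyid_d, 50, spi, b"nonce-i", b"nonce-r",
                  pfs_secret=b"fresh-dh-secret")
```

In the first case, anyone who later learns `skeyid_d` and has captured the nonces can recompute the key. In the second, they would also need the fresh Diffie-Hellman secret, which is the extra computation that PFS pays for.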

Failures that occur at phase two are difficult to decipher. Remember that the packets are encrypted, so you cannot tell what is going on by capturing them in a packet sniffer. But knowing the IKE process will help you quickly debug the problem. We know that if phase one has been established successfully, the two sides have authenticated to each other and have IKE operating in the same mode with the same authentication method. If there is a failure in phase two, it must mean one of two things. The first is that the IPSec proposals that are being exchanged are incompatible. This means that the initiator is saying, "I would like to do IPSec in this way…" and the responder is replying, "I don't understand any of those methods." The other common failure point is that one side is requesting PFS and the other side is not. Both of these failures are a matter of examining the configuration options on the VPN devices themselves.
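The two phase-two failure cases can be written down as a short check. This is only a sketch of the reasoning above: the function name, the tuple layout of a proposal, and the returned messages are all hypothetical.

```python
def diagnose_phase_two(initiator_proposals, responder_supported,
                       initiator_pfs, responder_pfs):
    """Return a short diagnosis of a phase-two negotiation (illustrative only)."""
    if initiator_pfs != responder_pfs:
        return "PFS mismatch"
    # Proposals are tried in the initiator's order of preference.
    for proposal in initiator_proposals:
        if proposal in responder_supported:
            return "ok: " + "/".join(proposal)
    return "no common IPSec proposal"


offers = [("ESP", "AES-128", "SHA-1", "tunnel"),
          ("ESP", "3DES", "SHA-1", "tunnel")]
supported = {("ESP", "3DES", "SHA-1", "tunnel")}
result = diagnose_phase_two(offers, supported, True, True)
```

Here the responder rejects the first offer and accepts the second, so `result` is `"ok: ESP/3DES/SHA-1/tunnel"`. With an empty `supported` set, or with differing PFS settings, the function reports the corresponding failure instead.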

10.6.1 Just Fast Keying (JFK)

IKE is tried and tested and found in most IPSec implementations. Over time, three key complaints about IKE's operation have been voiced. The first is that it takes too many exchanges. Using main mode, there is a minimum of four exchanges across the two phases (three in phase one and one in phase two), for a total of nine packets, each of which must cross the network before the next can be sent. Even if serialization delay due to the bandwidth of the links were not an issue, the end-to-end delay from one end of the United States to the other is bounded at a minimum of 30 ms by the laws of physics. The number of exchanges increases the time it takes to initiate the IPSec connection.

The second complaint is that, because IKE requires the server to maintain information about each connection request, IKE can become the victim of denial-of-service attacks in the same manner as a TCP SYN attack. This means that attackers can send many requests for key exchanges, yet never follow through with them. Because the servers are holding information about connections that will never be used, legitimate users have fewer resources available for their own connections.

Finally, IKE is complicated. This is a common criticism of the IPSec suite of protocols as a whole. We have seen in our discussions that there are several places where IKE sessions must agree before the session is allowed to complete. They must agree on the authentication method, the hashing algorithm to be used, Diffie-Hellman key lengths, perfect forward secrecy, and IPSec proposals. This wealth of configurable information makes the protocol robust, but most have found that many of the options go unused and that, in the end, only the number of configuration errors is increased.

To address these concerns, another keying protocol has been proposed in the IETF. Known as Just Fast Keying (JFK), this newer key exchange protocol attempts to counter the implementation issues that haunt IKE.

JFK addresses each of the criticisms of IKE. The first is that JFK reduces the overall packet count between devices. As the exchange below shows, the packet exchange is reduced to just two packets per device. The first exchange of packets is to exchange the Diffie-Hellman information that will provide the information needed for encrypted packets. The second exchange of packets contains all the information needed for the establishment of an IPSec session.

To keep the exchange brief and to lessen the chance of misconfiguration, JFK supports fewer options than IKE. The benefit of this is that the negotiation of options does not need to occur at the start of the key exchange.

To lessen the chance for denial-of-service (DoS) type attacks utilizing resources on the server side, with JFK, it is the client that is responsible for maintaining the state of the key exchange during packet transfer. It is not until the last packet is sent from the responder to the initiator that the responder needs to start keeping track of the IPSec connection information. As an additional step to protect against DoS attacks, it is the client that is burdened with the computationally expensive process of generating the session key to be used with the IPSec SA.
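One general way for a responder to avoid holding per-request state is to hand the initiator a token that only the responder could have produced and to verify it when it comes back. The sketch below shows that idea in isolation; it is not the JFK message format, and the names, the choice of HMAC-SHA-256, and the example addresses are assumptions.

```python
import hashlib
import hmac
import secrets

# Secret known only to the responder; in practice it would be rotated periodically.
SERVER_SECRET = secrets.token_bytes(32)


def make_cookie(initiator_ip, initiator_nonce, secret=SERVER_SECRET):
    """Compute a token the responder can later verify without storing it."""
    data = initiator_ip.encode() + b"|" + initiator_nonce
    return hmac.new(secret, data, hashlib.sha256).digest()


def cookie_is_valid(cookie, initiator_ip, initiator_nonce, secret=SERVER_SECRET):
    """Recompute the token from the returned values and compare the two."""
    expected = make_cookie(initiator_ip, initiator_nonce, secret)
    return hmac.compare_digest(cookie, expected)


nonce = secrets.token_bytes(16)
cookie = make_cookie("192.0.2.10", nonce)             # sent back to the initiator
accepted = cookie_is_valid(cookie, "192.0.2.10", nonce)
```

Until a valid token returns, the responder has allocated nothing for the request, so a flood of half-finished exchanges costs it only one keyed hash apiece.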

To ease configuration issues, in the second packet in the exchange, the responder directs the initiator as to which proposals are acceptable for the encryption of traffic. This reduces the complexity of the exchange by putting the burden on the initiator, which is acting as a client to match the responder's configuration options. Additionally, the number of options has been reduced with regard to authentication and negotiation to reduce the overall complexity of the protocol.

Based on the advantages presented by JFK, it seems that this would be a protocol with which you should become at least casually familiar as you evaluate VPN solutions.

^[8] "Pseudo-random" itself is a term that requires some explanation. Computers have a difficult time coming up with true random numbers. To indicate this for the detail oriented, the term "pseudo-random" is used to indicate a number with the "appearance" of randomness. Try using this term at your next party. It will be a big hit.