Per-Packet Key Mixing | Real 802.11 Security: Wi-Fi Protected Access and 802.11i

The key-mixing function creates a new key for every packet transmitted. It was introduced for two reasons:

As a way to protect against RC4 weak key attacks. The recommended defense (discard the first 256 bytes of key stream) was not possible in existing deployed hardware.
As a way to incorporate the extra bits of the extended IV.

The approach is to combine the session key, the IV and the source MAC address together with a hash function to produce a mixed key. Including the source MAC address provides added protection against forgery and also separates out the key space of the two communicating devices that share the session key.

Performing a hash operation for every packet to be encrypted or decrypted is a major overhead and hard work for the low-power MAC processor typical in earlier Wi-Fi LAN systems. To ease the burden, the mixing has been divided into two phases. During phase 1, the session key and source address are hashed. The result remains constant during the session. In phase 2, performed for every packet, the IV is mixed with the result of the first phase to produce an encryption key. This key is then used to initialize the RC4 encryption hardware.

Note that even the second part of the key-mixing computation can be performed in advance because the IV increases monotonically. Therefore, a station knows which values of IV will be coming up shortly and could precompute a number of keys in advance. This step avoids the need for a real-time computation when a packet is received. The two phases of key mixing are shown in Figure 11.3. As you can see in this figure, 104 bits of mixed key material are needed to form the total RC4 key of 128 bits when the IV is added.

There is nothing very glamorous about the computations for the key mixing but let's quickly look at the algorithm.

The inputs to the computation are abbreviated:

TSC = TKIP Sequence Counter (48 bits)
TSCU = Upper 32 bits of TSC (32 bits)
TSCL = Lower 16 bits of TSC (16 bits)
TA = Transmitter address, the MAC address of the encrypting station (48 bits)
TK = The temporal session key (128 bits)
P1K = Output of the first phase (80 bits)
P2K = Output of the second phase (128 bits); this becomes the RC4Key.

The two phases can be written as the following functions:

P1K Phase1(TA, TSCU, TK)
P2K Phase2(P1K, TSCL, TK)

When the system starts up or a new key exchange occurs, the TSC is reset to 0. The system typically computes the value of P1K right away and stores it for use in generating P2K. P1K needs to be recomputed every time TSCU changes, which is every 65536 packets. There is no reason why the next value of P1K can't be computed in advance so there will be no delay when TCSU actually changes.

Substitution Table or S-Box

Both phase 1 and phase 2 require a byte substitution table or S-box. The substitution table is to the computer what logarithm tables used to be to schoolchildren before calculators.^[3] You take a value and look up a corresponding value in a table. The calculations have been done in advance to determine the correct values in the table. A typical substitution table for a byte value is 256 bytes long so there is one entry for each value of the byte.

^[3] This is true for the S-box of RC-4, which is generated by computation. However, unlike log-tables, S-boxes for other algorithms may contain values that are not mathematically generated.

However, key mixing uses 16-bit word values. A full substitution table for a 16-bit value would be 2¹⁶ or 65,336 words long a total of 128Kbytes! However, this full table is needed only if you want to be able to have a reversible function and create a second table that will "undo" the substitution and get you back where you started. You do not need such a table for hashing functions; in fact, this type of reversal is something you want to prevent. Therefore, the key-mixing algorithm uses a partial substitution table with 512 word entries, which you can think of as two tables, each with 256 words. To make the substitution for a 16-bit word X, we use the upper byte of X as an index into the first table and the lower byte of X as an index into the second table. Then we exclusive OR the two words from the tables to produce a final 16-bit word substitution. This is denoted in the algorithm by the function:

 i = S[j]

where i is the result of substituting for j.

The 512 values for the substitution table are listed in the standard. The same substation tables must be used by everyone for this approach to work.

Phase 1 Computation

The output of phase 1 is only 80 bits long (not 128), but it uses all 128 bits of the temporal key in the computation. The result is computed in an array of five 16-bit words called P1K₀, P1K₁, P1K₂, P1K₃, and P1K₄. The following terminology is used in the algorithm:

TSC₁ = bits 16 31 of the TSC (the middle 16 bits)
TSC₂ = bits 32 47 of the TSC (the upper 16 bits)
TA_n = n^th byte of the encrypting station's MAC address
(TA₀ = lowest byte, TA₅ = highest byte)
TK_n = n^th byte of the temporal key
(TK₀ = lowest byte, TK₅ = highest byte)

the expression XY is used to denote combining two bytes into a 16-bit word so that:

XY = 256*X + Y

S[ ] denotes the result from the 16-bit substitution table:

 PHASE1_STEP1   P1K₀ = TSC₁   P1K₁ = TSC₂   P1K₂ = TA₁  TA₀   P1K₃ = TA₃  TA₂   P1K₄ = TA₅  TA₄ PHASE1_STEP2:   FOR i = 0 to 3   BEGIN      P1K₀ = P1K₀ + S[ P1K₄   (TK₁  TK₀) ]      P1K₁ = P1K₁ + S[ P1K₀   (TK₅  TK₄) ]      P1K₂ = P1K₂ + S[ P1K₁   (TK₉  TK₈) ]      P1K₃ = P1K₃ + S[ P1K₂   (TK₁₃  TK₁₂) ]      P1K₄ = P1K₄ + S[ P1K₃   (TK₁  TK₀) ]  +  i      P1K₀ = P1K₀ + S[ P1K₄   (TK₃  TK₂) ]      P1K₁ = P1K₁ + S[ P1K₀   (TK₇  TK₆) ]      P1K₂ = P1K₂ + S[ P1K₁   (TK₁₁  TK₁₀) ]      P1K₃ = P1K₃ + S[ P1K₂   (TK₁₅  TK₁₄) ]      P1K₄ = P1K₄ + S[ P1K₃   (TK₃  TK₂) ]  +  2*i  +  1 END

Although this is quite a bit of computation certainly more than in phase 2 the arithmetic comprises entirely shifts, adds, and exclusive OR operations.

Phase 2 Computation

On the face of it, phase 2 looks more complicated than phase 1. However, although there are more steps, there is no repeating loop in the computation. The result is computed in an array of six 16-bits words called: PPK₀, PPK₁, PPK₂, PPK₃, and PPK₄. The following terminology is used in the algorithm:

P1K_n = output words from phase 1
TSC₀ = bits 0 - 15 of the TSC (the lower 16 bits)
TK_n = n^th byte of the temporal key
(TK₀ = lowest byte, TK₅ = highest byte)

The expression XY is used to denote combining two bytes into a 16-bit word so that:

XY = 256*X + Y

The expression >>>(word) means that the 16-bit word is rotated one place right.

The expression (word) means that the 16-bit word is shifted one place right.

S[ ] denotes the result from the 16-bit substitution table.

RC4Key_n means the n^th byte of the RC4 key used for encryption.

 PHASE2,STEP1:      PPK₀ = P1K₀      PPK₁ = P1K₁      PPK₂ = P1K₂      PPK₃ = P1K₃      PPK₄ = P1K₄      PPK₅ = P1K₅ + TSC₀ PHASE2,STEP2:      PPK₀ = PPK₀ +  S[ PPK₅   (TK₁  TK₀) ]      PPK₁ = PPK₁ +  S[ PPK₀   (TK₃  TK₂) ]      PPK₂ = PPK₂ +  S[ PPK₁   (TK₅  TK₄) ]      PPK₃ = PPK₃ +  S[ PPK₂   (TK₇  TK₆) ]      PPK₄ = PPK₄ +  S[ PPK₃   (TK₉  TK₈) ]      PPK₅ = PPK₅ +  S[ PPK₄   (TK₁₁  TK₁₀) ]      PPK₀ = PPK₀ +  >>>(PPK₅   (TK₁₃  TK₁₂))      PPK₁ = PPK₁ +  >>>(PPK₀   (TK₁₅  TK₁₄))      PPK₂ = PPK₂ +  >>>(PPK₁)      PPK₃ = PPK₃ +  >>>(PPK₂)      PPK₄ = PPK₄ +  >>>(PPK₃)      PPK₅ = PPK₅ +  >>>(PPK₄) PHASE2,STEP3:      RC4Key₀ =  UpperByte(TSC₀)      RC4Key₁ =  (UpperByte (TSC₀) | 0x20) & 0x7F      RC4Key₂ =  LowerByte(TSC₀)      RC4Key₃ =  LowerByte ((PPK₅     (TK₁  TK₀))      RC4Key₄ =  LowerByte (PPK₀)      RC4Key₅ =  UpperByte (PPK₀)      RC4Key₆ =  LowerByte (PPK₁)      RC4Key₇ =  UpperByte (PPK₁)      RC4Key₈ =  LowerByte (PPK₂)      RC4Key₉ =  UpperByte (PPK₂)      RC4Key₁₀ = LowerByte (PPK₃)      RC4Key₁₁ = UpperByte (PPK₃)      RC4Key₁₂ = LowerByte (PPK₄)      RC4Key₁₃ = UpperByte (PPK₄)      RC4Key₁₄ = LowerByte (PPK₅)      RC4Key₁₅ = UpperByte (PPK₅)

The final output of phase 2 is an array of 16 bytes containing the RC4 key to be used in encryption. This can be loaded into the legacy WEP encryption engine prior to processing the MPDU for transmission. The first 3 bytes of this key are transmitted as the WEP IV field (24 bits) and contain the lower 16 bits of the TKIP IV value and the TSC. The second byte of the WEP IV is a repeat of the first byte, except that bit 5 is forced to 1 and bit 4 is forced to 0. Forcing these bits prevents generation of the major class of weak keys. This byte is ignored on receipt.