5.2. The AES Cipher[2]
The Rijndael proposal for AES defined a cipher in which the block length and the key length can be independently specified to be 128, 192, or 256 bits. The AES specification uses the same three key size alternatives but limits the block length to 128 bits. A number of AES parameters depend on the key length (Table 5.3). In the description of this section, we assume a key length of 128 bits, which is likely to be the one most commonly implemented.
Rijndael was designed to have the following characteristics:
Figure 5.1 shows the overall structure of AES. The input to the encryption and decryption algorithms is a single 128-bit block. In FIPS PUB 197, this block is depicted as a square matrix of bytes. This block is copied into the State array, which is modified at each stage of encryption or decryption. After the final stage, State is copied to an output matrix. These operations are depicted in Figure 5.2a. Similarly, the 128-bit key is depicted as a square matrix of bytes. This key is then expanded into an array of key schedule words; each word is four bytes and the total key schedule is 44 words for the 128-bit key (Figure 5.2b). Note that the ordering of bytes within a matrix is by column. So, for example, the first four bytes of a 128-bit plaintext input to the encryption cipher occupy the first column of the in matrix, the second four bytes occupy the second column, and so on. Similarly, the first four bytes of the expanded key, which form a word, occupy the first column of the w matrix. Figure 5.1. AES Encryption and DecryptionFigure 5.2. AES Data StructuresBefore delving into details, we can make several comments about the overall AES structure:
We now turn to a discussion of each of the four stages used in AES. For each stage, we describe the forward (encryption) algorithm, the inverse (decryption) algorithm, and the rationale for the stage. This is followed by a discussion of key expansion. As was mentioned in Chapter 4, AES uses arithmetic in the finite field GF(28), with the irreducible polynomial[3] m(x) = x8 + x4 + x3 + x + 1. The developers of Rijndael give as their motivation for selecting this one of the 30 possible irreducible polynomials of degree 8 that it is the first one on the list given in [LIDL94].
Substitute Bytes TransformationForward and Inverse TransformationsThe forward substitute byte transformation, called SubBytes, is a simple table lookup (Figure 5.4a). AES defines a 16 x 16 matrix of byte values, called an S-box (Table 5.4a), that contains a permutation of all possible 256 8-bit values. Each individual byte of State is mapped into a new byte in the following way: The leftmost 4 bits of the byte are used as a row value and the rightmost 4 bits are used as a column value. These row and column values serve as indexes into the S-box to select a unique 8-bit output value. For example, the hexadecimal value[4] {95} references row 9, column 5 of the S-box, which contains the value {2A}. Accordingly, the value {95} is mapped into the value {2A}.
Figure 5.4. AES Byte-Level Operations |
({02} · {87}) | ({03} · {6E}) | {46} | {A6} | = {47} |
{87} | ({02} · {6E}) | ({03} · {46}) | {A6} | = {37} |
{87} | {6E} | ({02} · {46} | ({03} · {A6}) | = {94} |
({03} · {87}) | {6E} | {46} | ({02} · {A6} | = {ED} |
For the first equation, we have {02} · {87} = (0000 1110) (0001 1011) = (0001 0101); and {03} · {6E} = {6E} ({02} · {6E}) = (0110 1110) (1101 1100) = (1011 0010). Then
{02} · {87} | = | 0001 0101 | ||
{03} · {6E} | = | 1011 0010 | ||
{46} | = | 0100 0110 | ||
{A6} | = | 1010 0110 | ||
0100 0111 = {47} |
The other equations can be similarly verified.
The inverse mix column transformation, called InvMixColumns, is defined by the following matrix multiplication:
Equation 5-5
It is not immediately clear that Equation (5.5) is the inverse of Equation (5.3). We need to show that:
which is equivalent to showing that:
Equation 5-6
That is, the inverse transformation matrix times the forward transformation matrix equals the identity matrix. To verify the first column of Equation (5.6), we need to show that:
({0E} · {02}) {0B} {0D} ({09} · {03}) = {01}
({09} · {02}) {0E} {0B} ({0D} · {03}) = {00}
({0D} · {02}) {09} {0E} ({0B} · {03}) = {00}
({0B} · {02}) {0D} {09} ({0E} · {03}) = {00}
For the first equation, we have {0E} · {02}) 00011100; and {09} · {03} = {09} ({09} · {02}) = 00001001 00010010 = 00011011. Then
{0E} · {02} | = | 00011100 | |
{0B} | = | 00001011 | |
{0D} | = | 00001101 | |
{09} · {03} | = | 00011011 | |
00000001 |
The other equations can be similarly verified.
The AES document describes another way of characterizing the MixColumns transformation, which is in terms of polynomial arithmetic. In the standard, MixColumns is defined by considering each column of State to be a four-term polynomial with coefficients in GF(28). Each column is multiplied modulo (x4 + 1) by the fixed polynomial a(x), given by
Equation 5-7
Appendix 5A demonstrates that multiplication of each column of State by a(x) can be written as the matrix multiplication of Equation (5.3). Similarly, it can be seen that the transformation in Equation (5.5) corresponds to treating each column as a four-term polynomial and multiplying each column by b(x), given by
Equation 5-8
It can readily be shown that b(x) = a1 (x) mod (x4 + 1).
The coefficients of the matrix in Equation (5.3) are based on a linear code with maximal distance between code words, which ensures a good mixing among the bytes of each column. The mix column transformation combined with the shift row transformation ensures that after a few rounds, all output bits depend on all input bits. See [DAEM99] for a discussion.
In addition, the choice of coefficients in MixColumns, which are all {01}, {02}, or {03}, was influenced by implementation considerations. As was discussed, multiplication by these coefficients involves at most a shift and an XOR. The coefficients in InvMixColumns are more formidable to implement. However, encryption was deemed more important than decryption for two reasons:
For the CFB and OFB cipher modes (Figures 6.5 and 6.6; described in Chapter 6), only encryption is used.
As with any block cipher, AES can be used to construct a message authentication code (Part Two), and for this only encryption is used.
In the forward add round key transformation, called AddRoundKey, the 128 bits of State are bitwise XORed with the 128 bits of the round key. As shown in Figure 5.4b, the operation is viewed as a columnwise operation between the 4 bytes of a State column and one word of the round key; it can also be viewed as a byte-level operation. The following is an example of AddRoundKey:
The first matrix is State, and the second matrix is the round key.
The inverse add round key transformation is identical to the forward add round key transformation, because the XOR operation is its own inverse.
The add round key transformation is as simple as possible and affects every bit of State. The complexity of the round key expansion, plus the complexity of the other stages of AES, ensure security.
The AES key expansion algorithm takes as input a 4-word (16-byte) key and produces a linear array of 44 words (176 bytes). This is sufficient to provide a 4-word round key for the initial AddRoundKey stage and each of the 10 rounds of the cipher. The following pseudocode describes the expansion:
|
The key is copied into the first four words of the expanded key. The remainder of the expanded key is filled in four words at a time. Each added word w[i] depends on the immediately preceding word, w[i 1], and the word four positions back,w[i 4]. In three out of four cases, a simple XOR is used. For a word whose position in the w array is a multiple of 4, a more complex function is used. Figure 5.6 illustrates the generation of the first eight words of the expanded key, using the symbol g to represent that complex function. The function g consists of the following subfunctions:
RotWord performs a one-byte circular left shift on a word. This means that an input word [b0, b1, b2, b3] is transformed into [b1, b2, b3, b0].
SubWord performs a byte substitution on each byte of its input word, using the S-box (Table 5.4a).
The result of steps 1 and 2 is XORed with a round constant, Rcon[j].
The round constant is a word in which the three rightmost bytes are always 0. Thus the effect of an XOR of a word with Rcon is to only perform an XOR on the leftmost byte of the word. The round constant is different for each round and is defined as Rcon[j] = (RC[j], 0, 0, 0), with RC[1] = 1, RC[j] = 2 · RC[j - 1] and with multiplication defined over the field GF(28). The values of RC[j] in hexadecimal are
j | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
RC[j] | 01 | 02 | 04 | 08 | 10 | 20 | 40 | 80 | 1B | 36 |
For example, suppose that the round key for round 8 is
EA D2 73 21 B5 8D BA D2 31 2B F5 60 7F 8D 29 2F
Then the first 4 bytes (first column) of the round key for round 9 are calculated as follows:
i (decimal) | temp | After RotWord | After SubWord | Rcon (9) | After XOR with Rcon | w[i 4] | w[i] = temp w[i 4] |
36 | 7F8D292F | 8D292F7F | 5DA515D2 | 1B000000 | 46A515D2 | EAD27321 | AC7766F3 |
The Rijndael developers designed the expansion key algorithm to be resistant to known cryptanalytic attacks. The inclusion of a round-dependent round constant eliminates the symmetry, or similarity, between the ways in which round keys are generated in different rounds. The specific criteria that were used are as follows [DAEM99]:
Knowledge of a part of the cipher key or round key does not enable calculation of many other round key bits
An invertible transformation [i.e., knowledge of any Nk consecutive words of the Expanded Key enables regeneration the entire expanded key (Nk = key size in words)]
Speed on a wide range of processors
Usage of round constants to eliminate symmetries
Diffusion of cipher key differences into the round keys; that is, each key bit affects many round key bits
Enough nonlinearity to prohibit the full determination of round key differences from cipher key differences only
Simplicity of description
The authors do not quantify the first point on the preceding list, but the idea is that if you know less than Nk consecutive words of either the cipher key or one of the round keys, then it is difficult to reconstruct the remaining unknown bits. The fewer bits one knows, the more difficult it is to do the reconstruction or to determine other bits in the key expansion.
As was mentioned, the AES decryption cipher is not identical to the encryption cipher (Figure 5.1). That is, the sequence of transformations for decryption differs from that for encryption, although the form of the key schedules for encryption and decryption is the same. This has the disadvantage that two separate software or firmware modules are needed for applications that require both encryption and decryption. There is, however, an equivalent version of the decryption algorithm that has the same structure as the encryption algorithm. The equivalent version has the same sequence of transformations as the encryption algorithm (with transformations replaced by their inverses). To achieve this equivalence, a change in key schedule is needed.
Two separate changes are needed to bring the decryption structure in line with the encryption structure. An encryption round has the structure SubBytes, ShiftRows, MixColumns, AddRoundKey. The standard decryption round has the structure InvShiftRows, InvSubBytes, AddRoundKey, InvMixColumns. Thus, the first two stages of the decryption round need to be interchanged, and the second two stages of the decryption round need to be interchanged.
InvShiftRows affects the sequence of bytes in State but does not alter byte contents and does not depend on byte contents to perform its transformation. InvSubBytes affects the contents of bytes in State but does not alter byte sequence and does not depend on byte sequence to perform its transformation. Thus, these two operations commute and can be interchanged. For a given State Si,
InvShiftRows [InvSubBytes (Si)] = InvSubBytes [InvShiftRows (Si)]
The transformations AddRoundKey and InvMixColumns do not alter the sequence of bytes in State. If we view the key as a sequence of words, then both AddRoundKey and InvMixColumns operate on State one column at a time. These two operations are linear with respect to the column input. That is, for a given State Si and a given round key wj:
InvMixColumns (Si j) = [InvMixColumns (Si)] [InvMixColumns (j)]
To see this, suppose that the first column of State Si is the sequence (y0, y1, y2, y3) and the first column of the round key wj is (k0, k1, k2, k3). Then we need to show that
Let us demonstrate that for the first column entry. We need to show that:
[{0E} · (y0 k0)] [{0B} · (y1 k1)] [{0D} · (y2 k2)] [{09} · (y3 k3)]
= [{0E} · y0] [{0B} · y1] [{0D} · y2] [{09} · y3]
[[{0E} · k0] ] [{0B} · k1] [{0D} · k2] [{09} · k3]
This equation is valid by inspection. Thus, we can interchange AddRoundKey and InvMixColumns, provided that we first apply InvMixColumns to the round key. Note that we do not need to apply InvMixColumns to the round key for the input to the first AddRoundKey transformation (preceding the first round) nor to the last AddRoundKey transformation (in round 10). This is because these two AddRoundKey transformations are not interchanged with InvMixColumns to produce the equivalent decryption algorithm.
Figure 5.7 illustrates the equivalent decryption algorithm.
The Rijndael proposal [DAEM99] provides some suggestions for efficient implementation on 8-bit processors, typical for current smart cards, and on 32-bit processors, typical for PCs.
AES can be implemented very efficiently on an 8-bit processor. AddRoundKey is a bytewise XOR operation. ShiftRows is a simple byte shifting operation. SubBytes operates at the byte level and only requires a table of 256 bytes.
The transformation MixColumns requires matrix multiplication in the field GF(28), which means that all operations are carried out on bytes. MixColumns only requires multiplication by {02} and {03}, which, as we have seen, involved simple shifts, conditional XORs, and XORs. This can be implemented in a more efficient way that eliminates the shifts and conditional XORs. Equation Set (5.4) shows the equations for the MixColumns transformation on a single column. Using the identity {03} · x = ({02} · x)
Equation 5-9
Equation Set (5.9) is verified by expanding and eliminating terms.
The multiplication by {02} involves a shift and a conditional XOR. Such an implementation may be vulnerable to a timing attack of the sort described in Section 3.4. To counter this attack and to increase processing efficiency at the cost of some storage, the multiplication can be replaced by a table lookup. Define the 256-byte table X2, such that X2[i] = {02} · i. Then Equation Set (5.9) can be rewritten as
Tmp = so, j j s2, s3, s'0, j = s0, j Tmp so, j j]
s'1, c = s1, j Tmp s1, j j]
s'2, c = s2, j Tmp s2, j j]
s'3, j = s3, j Tmp s3, j j]
The implementation described in the preceding subsection uses only 8-bit operations. For a 32-bit processor, a more efficient implementation can be achieved if operations are defined on 32-bit words. To show this, we first define the four transformations of a round in algebraic form. Suppose we begin with a State matrix consisting of elements ai,j and a round key matrix consisting of elements ki,j. Then the transformations can be expressed as follows:
SubBytes | bi,j = S[ai,j] |
ShiftRows | |
MixColumns | |
AddRoundKey |
In the ShiftRows equation, the column indices are taken mod 4. We can combine all of these expressions into a single equation:
In the second equation, we are expressing the matrix multiplication as a linear combination of vectors. We define four 256-word (1024-byte) tables as follows:
Thus, each table takes as input a byte value and produces a column vector (a 32-bit word) that is a function of the S-box entry for that byte value. These tables can be calculated in advance.
We can define a round function operating on a column in the following fashion:
As a result, an implementation based on the preceding equation requires only four table lookups and four XORs per column per round, plus 4 Kbytes to store the table. The developers of Rijndael believe that this compact, efficient implementation was probably one of the most important factors in the selection of Rijndael for AES.