Section 11.4. Hash Functions

[Page 334 (continued)]

11.4. Hash Functions

A hash value h is generated by a function H of the form

h = H(M)

where M is a variable-length message and H(M) is the fixed-length hash value. The hash value is appended to the message at the source at a time when the message is assumed or known to be correct. The receiver authenticates that message by recomputing the hash value. Because the hash function itself is not considered to be secret, some means is required to protect the hash value (Figure 11.5).

We begin by examining the requirements for a hash function to be used for message authentication. Because hash functions are typically quite complex, it is useful to examine some very simple hash functions to get a feel for the issues involved. We then look at several approaches to hash function design.

[Page 335]

Requirements for a Hash Function

The purpose of a hash function is to produce a "fingerprint" of a file, message, or other block of data. To be useful for message authentication, a hash function H must have the following properties (adapted from a list in [NECH92]):

H can be applied to a block of data of any size.
H produces a fixed-length output.
H(x) is relatively easy to compute for any given x, making both hardware and software implementations practical.
For any given value h, it is computationally infeasible to find x such that H(x) = h. This is sometimes referred to in the literature as the one-way property.
For any given block x, it is computationally infeasible to find y y) = H(x). This is sometimes referred to as weak collision resistance.
It is computationally infeasible to find any pair (x, y) such that H(x) = H(y). This is sometimes referred to as strong collision resistance.^[2]
^[2] Unfortunately, these terms are not used consistently. Alternate terms used in the literature include one-way hash function: (properties 4 and 5); collision-resistant hash function: (properties 4, 5, and 6); weak one-way hash function: (properties 4 and 5); strong one-way hash function: (properties 4, 5, and 6). The reader must take care in reading the literature to determine the meaning of the particular terms used.

The first three properties are requirements for the practical application of a hash function to message authentication.

The fourth property, the one-way property, states that it is easy to generate a code given a message but virtually impossible to generate a message given a code. This property is important if the authentication technique involves the use of a secret value (Figure 11.5e). The secret value itself is not sent; however, if the hash function is not one way, an attacker can easily discover the secret value: If the attacker can observe or intercept a transmission, the attacker obtains the message M and the hash code C = H(S_AB||M). The attacker then inverts the hash function to obtain S_AB||M = H¹(C). Because the attacker now has both M and S_AB||M, it is a trivial matter to recover S_AB.

The fifth property guarantees that an alternative message hashing to the same value as a given message cannot be found. This prevents forgery when an encrypted hash code is used (Figures 11.5b and c). For these cases, the opponent can read the message and therefore generate its hash code. However, because the opponent does not have the secret key, the opponent should not be able to alter the message without detection. If this property were not true, an attacker would be capable of the following sequence: First, observe or intercept a message plus its encrypted hash code; second, generate an unencrypted hash code from the message; third, generate an alternate message with the same hash code.

The sixth property refers to how resistant the hash function is to a type of attack known as the birthday attack, which we examine shortly.

[Page 336]

Simple Hash Functions

All hash functions operate using the following general principles. The input (message, file, etc.) is viewed as a sequence of n-bit blocks. The input is processed one block at a time in an iterative fashion to produce an n-bit hash function.

One of the simplest hash functions is the bit-by-bit exclusive-OR (XOR) of every block. This can be expressed as follows:

C_i = b_i1 i1 ... im

where

C_i	= ith bit of the hash code, 1 n
m	= number of n-bit blocks in the input
b_ij	= ith bit in jth block
	= XOR operation

This operation produces a simple parity for each bit position and is known as a longitudinal redundancy check. It is reasonably effective for random data as a data integrity check. Each n-bit hash value is equally likely. Thus, the probability that a data error will result in an unchanged hash value is 2ⁿ. With more predictably formatted data, the function is less effective. For example, in most normal text files, the high-order bit of each octet is always zero. So if a 128-bit hash value is used, instead of an effectiveness of 2¹²⁸, the hash function on this type of data has an effectiveness of 2¹¹².

A simple way to improve matters is to perform a one-bit circular shift, or rotation, on the hash value after each block is processed. The procedure can be summarized as follows:

Initially set the n-bit hash value to zero.
Process each successive n-bit block of data as follows:
1. Rotate the current hash value to the left by one bit.
2. XOR the block into the hash value.

This has the effect of "randomizing" the input more completely and overcoming any regularities that appear in the input. Figure 11.7 illustrates these two types of hash functions for 16-bit hash values.

Figure 11.7. Two Simple Hash Functions (This item is displayed on page 337 in the print version)

Although the second procedure provides a good measure of data integrity, it is virtually useless for data security when an encrypted hash code is used with a plaintext message, as in Figures 11.5b and c. Given a message, it is an easy matter to produce a new message that yields that hash code: Simply prepare the desired alternate message and then append an n-bit block that forces the new message plus block to yield the desired hash code.

Although a simple XOR or rotated XOR (RXOR) is insufficient if only the hash code is encrypted, you may still feel that such a simple function could be useful when the message as well as the hash code are encrypted (Figure 11.5a). But you must be careful. A technique originally proposed by the National Bureau of Standards used the simple XOR applied to 64-bit blocks of the message and then an encryption of the entire message that used the cipher block chaining (CBC) mode. We can define the scheme as follows: Given a message consisting of a sequence of 64-bit blocks X₁, X₂,..., X_N, define the hash code C as the block-by-block XOR of all blocks and append the hash code as the final block:

[Page 337]

C = X_N+1 = X₁ ... N

Next, encrypt the entire message plus hash code, using CBC mode to produce the encrypted message Y₁, Y₂,..., Y_N+1. [JUEN85] points out several ways in which the ciphertext of this message can be manipulated in such a way that it is not detectable by the hash code. For example, by the definition of CBC (Figure 6.4), we have

X₁	= IV D(Y₁)
X_i	= Y_i1 D(Y_i)
X_N+1	= Y_N D(Y_N+1)

[Page 338]

But X_N+1 is the hash code:

X_N+1	= X₁ ... N
	= [IV D(Y₁)] [ D(Y₂)] ... [ ... D (Y_N)]

Because the terms in the preceding equation can be XORed in any order, it follows that the hash code would not change if the ciphertext blocks were permuted.

Birthday Attacks

Suppose that a 64-bit hash code is used. One might think that this is quite secure. For example, if an encrypted hash code C is transmitted with the corresponding unencrypted message M (Figure 11.5b or 11.5c), then an opponent would need to find an M' such that H(M') = H(M) to substitute another message and fool the receiver. On average, the opponent would have to try about 2⁶³ messages to find one that matches the hash code of the intercepted message [see Appendix 11A, Equation (11.1)].

However, a different sort of attack is possible, based on the birthday paradox (Appendix 11A). Yuval proposed the following strategy [YUVA79]:

The source, A, is prepared to "sign" a message by appending the appropriate m-bit hash code and encrypting that hash code with A's private key (Figure 11.5c).
The opponent generates 2^m/2 variations on the message, all of which convey essentially the same meaning. The opponent prepares an equal number of messages, all of which are variations on the fraudulent message to be substituted for the real one.
The two sets of messages are compared to find a pair of messages that produces the same hash code. The probability of success, by the birthday paradox, is greater than 0.5. If no match is found, additional valid and fraudulent messages are generated until a match is made.
The opponent offers the valid variation to A for signature. This signature can then be attached to the fraudulent variation for transmission to the intended recipient. Because the two variations have the same hash code, they will produce the same signature; the opponent is assured of success even though the encryption key is not known.

Thus, if a 64-bit hash code is used, the level of effort required is only on the order of 2³² [see Appendix 11A, Equation (11.7)].

The generation of many variations that convey the same meaning is not difficult. For example, the opponent could insert a number of "space-space-backspace" character pairs between words throughout the document. Variations could then be generated by substituting "space-backspace-space" in selected instances. Alternatively, the opponent could simply reword the message but retain the meaning. Figure 11.8 [DAVI89] provides an example.

Figure 11.8. A Letter in 2³⁷ Variations [DAVI89] (This item is displayed on page 339 in the print version)

The conclusion to be drawn from this is that the length of the hash code should be substantial. We discuss this further in Section 11.5.

[Page 339]

Block Chaining Techniques

A number of proposals have been made for hash functions based on using a cipher block chaining technique, but without the secret key. One of the first such proposals was that of Rabin [RABI78]. Divide a message M into fixed-size blocks M₁, M₂,..., M_N and use a symmetric encryption system such as DES to compute the hash code G as follows:

H_o	= initial value
H_i	= E(M_i, H_i, H_i1)
G	= H_N

[Page 340]

This is similar to the CBC technique, but in this case there is no secret key. As with any hash code, this scheme is subject to the birthday attack, and if the encryption algorithm is DES and only a 64-bit hash code is produced, then the system is vulnerable.

Furthermore, another version of the birthday attack can be used even if the opponent has access to only one message and its valid signature and cannot obtain multiple signings. Here is the scenario; we assume that the opponent intercepts a message with a signature in the form of an encrypted hash code and that the unencrypted hash code is m bits long:

Use the algorithm defined at the beginning of this subsection to calculate the unencrypted hash code G.
Construct any desired message in the form Q₁, Q₂,..., Q_N2.
Compute for H_i = E(Q_i, H_i1) for 1 (
Generate 2^m/2 random blocks; for each block X, compute E(X, H_N2). Generate an additional 2^m/2 random blocks; for each block Y, compute D(Y, G), where D is the decryption function corresponding to E.
Based on the birthday paradox, with high probability there will be an X and Y such that E(X, H_N2) = D(Y, G).
Form the message Q₁, Q₂,..., Q_N2, X, Y. This message has the hash code G and therefore can be used with the intercepted encrypted signature.

This form of attack is known as a meet-in-the-middle attack. A number of researchers have proposed refinements intended to strengthen the basic block chaining approach. For example, Davies and Price [DAVI89] describe the following variation:

H_i = E(M_i, H_i1) i1

Another variation, proposed in [MEYE88],

H_i = E(H_i1, M_i) i

However, both of these schemes have been shown to be vulnerable to a variety of attacks [MIYA90]. More generally, it can be shown that some form of birthday attack will succeed against any hash scheme involving the use of cipher block chaining without a secret key provided that either the resulting hash code is small enough (e.g., 64 bits or less) or that a larger hash code can be decomposed into independent subcodes [JUEN87].

Thus, attention has been directed at finding other approaches to hashing. Many of these have also been shown to have weaknesses [MITC92]. We examine two strong hash functions in Chapter 12.