Appendix 11A Mathematical Basis of the Birthday Attack

[Page 346 (continued)]

In this appendix, we derive the mathematical justification for the birthday attack. We begin with a related problem and then look at the problem from which the name "birthday attack" is derived.

The Birthday Paradox

The birthday paradox is often presented in elementary probability courses to demonstrate that probability results are sometimes counterintuitive. The problem can be stated as follows: What is the minimum value of k such that the probability is greater than 0.5 that at least two people in a group of k people have the same birthday? Ignore February 29 and assume that each birthday is equally likely. To answer, let us define

P(n, k) = Pr[at least one duplicate in k items, with each item able to take on one of n equally likely values between 1 and n]

Thus, we are looking for the smallest value of k such that P(365, k) 0.5. It is easier first to derive the probability that there are no duplicates, which we designate as Q(365, k 365, then it is impossible for all values to be different. So we assume 365. Now consider the number of different ways, k values with no duplicates. We may choose any of the 365 values for the first item, any of the remaining 364 numbers for the second item, and so on. Hence, the number of different ways is

Equation 11-2

If we remove the restriction that there are no duplicates, then each item can be any of 365 values, and the total number of possibilities is 365^k. So the probability of no duplicates is simply the fraction of sets of values that have no duplicates out of all possible sets of values:

and

Equation 11-3

This function is plotted in Figure 11.10. The probabilities may seem surprisingly large to anyone who has not considered the problem before. Many people would guess that to have a probability greater than 0.5 that there is at least one duplicate, the number of people in the group would have to be about 100. In fact, the number is 23, with P(365, 23) = 0.5073. For k = 100, the probability of at least one duplicate is 0.9999997.

Figure 11.10. The Birthday Paradox (This item is displayed on page 348 in the print version)

Perhaps the reason that the result seems so surprising is that if you consider a particular person in a group, the probability that some other person in the group has the same birthday is small. But the probability that we are concerned with is the probability that any pair of people in the group has the same birthday. In a group of 23, there are (23(23 1))/2 = 253 different pairs of people. Hence the high probabilities.

Useful Inequality

Before developing a generalization of the birthday problem, we derive an inequality that will be needed:

Equation 11-4

[Page 348]

Figure 11.11 illustrates the inequality. To see that the inequality holds, note that the lower line is the tangent to e^x at x = 0. at The slope of that line is just the derivative of e^x at x = 0;

Figure 11.11. A Useful Inequality

[Page 349]

The tangent is a straight line of the form ax + b, with a = 1, and the tangent at x = o must equal e^o Thus, the tangent is the function (1 x), confirming the inequality of Equation (11.4). Further, note that for small x, we have (1 x) e^{The General Case of Duplications}

The birthday problem can be generalized to the following problem: Given a random variable that is an integer with uniform distribution between 1 and n and a selection of k instances (k n, k), that there is at least one duplicate? The birthday problem is just the special case with n = 365. By the same reasoning as before, we have the following generalization of Equation (11.3):

Equation 11-5

We can rewrite as

Using the inequality of Equation (11.4):

Now let us pose the question: What value of k is required such that P(n, k) 0.5? To satisfy the requirement, we have

For large k, we can replace k x (k 1) by k², and we get

Equation 11-6

As a reality check, for n = 365, we get which is very close to the correct answer of 23.

We can now state the basis of the birthday attack in the following terms. Suppose we have a function H, with 2^m possible outputs (i.e., an m-bit output). If H is applied to k random inputs, what must be the value of k so that there is the probability of at least one duplicate [i.e., H(x) = H(y) for some inputs x, y)]? Using the approximation in Equation (11.6):

[Page 350]

Equation 11-7

Overlap between Two Sets

There is a problem related to the general case of duplications that is also of relevance for our discussions. The problem is this: Given an integer random variable with uniform distribution between 1 and n and two sets of k instances (k n, k), that the two sets are not disjoint; that is, what is the probability that there is at least one value found in both sets?

Let us call the two sets X and Y, with elements {x₁, x₂,..., x_k} and {y₁, y₂,..., y_k}, respectively. Given the value of x₁, the probability that y₁ = x₁ is just 1/n, and therefore probability that does not match x₁ is [1 (1/n)]. If we generate the k random values in Y, the probability that none of these values is equal to is [1 (1/n)]^k. Thus, the probability that there is at least one match to x₁ is 1 [1 (1/n)]^k.

To proceed, let us make the assumption that all the elements of X are distinct. If n is large and if k is also large (e.g., on the order of ), then this is a good approximation. In fact, there may be a few duplications, but most of the values will be distinct. With that assumption, we can make the following derivation:

Using the inequality of Equation (11.4):

R(n, k) > 1 (e^1/n)^k2

R(n, k) > 1 (e^k2/n)

Let us pose the question: What value of k is required such that R(n, k) > 0.5? To satisfy the requirement, we have

Equation 11-8

We can state this in terms related to birthday attacks as follows. Suppose we have a function H, with 2^m possible outputs (i.e., an m-bit output). Apply H to k random inputs to produce the set X and again to k additional random inputs to produce the set Y. What must be the value of k so that there is the probability of at least 0.5 that there is a match between the two sets (i.e., H(x) = H(y) for some inputs x X, Y)? Using the approximation in Equation (11.8):

Related Problem

The Birthday Paradox

Figure 11.10. The Birthday Paradox (This item is displayed on page 348 in the print version)

Useful Inequality

Figure 11.11. A Useful Inequality

Overlap between Two Sets