Cryptography: Theory and Practice by Douglas Stinson CRC Press, CRC Press LLC ISBN: 0849385210 Pub Date: 03/17/95 |
In the previous section, we discussed the concept of perfect secrecy. We restricted our attention to the special situation where a key is used for only one encryption. We now want to look at what happens as more and more plaintexts are encrypted using the same key, and how likely it is that a cryptanalyst will be able to carry out a successful ciphertext-only attack, given sufficient time.
The basic tool in studying this question is the idea of entropy, a concept from information theory introduced by Shannon in 1948. Entropy can be thought of as a mathematical measure of information or uncertainty, and is computed as a function of a probability distribution.
Suppose we have a random variable X which takes on a finite set of values according to a probability distribution p(X). What is the information gained by an event which takes place according to distribution p(X)? Equivalently, if the event has not (yet) taken place, what is the uncertainty about the outcome? This quantity is called the entropy of X and is denoted by H(X).
These ideas may seem rather abstract, so let's look at a more concrete example. Suppose our random variable X represents the toss of a coin. The probability distribution is p(heads) = p(tails) = 1/2. It would seem reasonable to say that the information, or entropy, of a coin toss is one bit, since we could encode heads by 1 and tails by 0, for example. In a similar fashion, the entropy of n independent coin tosses is n, since the n coin tosses can be encoded by a bit string of length n.
As a slightly more complicated example, suppose we have a random variable X that takes on three possible values x1, x2, x3 with probabilities 1/2, 1/4, 1/4 respectively. The most efficient encoding of the three possible outcomes is to encode x1 as 0, to encode x2 as 10, and to encode x3 as 11. Then the average number of bits in an encoding of X is

1/2 × 1 + 1/4 × 2 + 1/4 × 2 = 3/2.
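This average can be verified with a short computation. The following sketch (the dictionary layout and variable names are ours, not from the text) weights each codeword length by its probability:

```python
# Prefix code from the example: x1 -> 0, x2 -> 10, x3 -> 11,
# with probabilities 1/2, 1/4, 1/4.
probs = {"x1": 0.5, "x2": 0.25, "x3": 0.25}
codes = {"x1": "0", "x2": "10", "x3": "11"}

# Average number of bits = sum over outcomes of p(x) * (length of x's codeword).
avg_bits = sum(probs[x] * len(codes[x]) for x in probs)
print(avg_bits)  # 1.5
```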
The above examples suggest that an event which occurs with probability 2^-n can be encoded as a bit string of length n. More generally, we could imagine that an event occurring with probability p might be encoded by a bit string of length approximately -log2 p. Given an arbitrary probability distribution p1, p2, ..., pn for a random variable X, we take the weighted average of the quantities -log2 pi to be our measure of information. This motivates the following formal definition.
DEFINITION 2.3 Suppose X is a random variable which takes on a finite set of values according to a probability distribution p(X). Then the entropy of this probability distribution is defined to be the quantity

H(X) = -Σx p(x) log2 p(x).

If the possible values of X are xi, 1 ≤ i ≤ n, and we write pi = p(xi), then we have

H(X) = -Σ(i=1 to n) pi log2 pi.
REMARK Observe that log2 pi is undefined if pi = 0. Hence, entropy is sometimes defined to be the relevant sum over all the non-zero probabilities. Since lim(x→0) x log2 x = 0, there is no real difficulty with allowing pi = 0 for some i. However, we will implicitly assume that, when computing the entropy of a probability distribution pi, the sum is taken over the indices i such that pi ≠ 0. Also, we note that the choice of two as the base of the logarithms is arbitrary: another base would only change the value of the entropy by a constant factor.
Note that if pi = 1/n for 1 ≤ i ≤ n, then H(X) = log2 n. Also, it is easy to see that H(X) ≥ 0, and H(X) = 0 if and only if pi = 1 for some i and pj = 0 for all j ≠ i.
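The definition and the basic facts just noted can be checked numerically. Here is a minimal sketch (the function name `entropy` is ours, not from the text); following the remark above, it sums only over the non-zero probabilities:

```python
from math import log2

def entropy(probs):
    """H(X) = sum over nonzero p_i of -p_i * log2(p_i)."""
    return sum(-p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.25]))  # 1.5, matching the three-outcome example
print(entropy([0.25] * 4))         # 2.0 = log2(4), the uniform case pi = 1/n
print(entropy([1.0, 0.0, 0.0]))    # 0.0: a certain outcome carries no information
```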
Let us look at the entropy of the various components of a cryptosystem. We can think of the key as being a random variable K that takes on values according to the probability distribution p(K), and hence we can compute the entropy H(K). Similarly, we can compute entropies H(P) and H(C) associated with plaintext and ciphertext probability distributions, respectively.
To illustrate, we compute the entropies of the cryptosystem of Example 2.1.
Example 2.1 (Cont.)
We compute H(P) as follows:

H(P) = -(1/4) log2 (1/4) - (3/4) log2 (3/4) ≈ 0.81.
Similar calculations yield H(K) = 1.5 and H(C) ≈ 1.85.
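As a sketch, these three entropies can be recomputed numerically. The distributions below are assumed from Example 2.1, which is not restated in this section (plaintext probabilities 1/4 and 3/4; key probabilities 1/2, 1/4, 1/4; the induced ciphertext distribution 1/8, 7/16, 1/4, 3/16):

```python
from math import log2

def entropy(probs):
    """H = sum over nonzero p of -p * log2(p)."""
    return sum(-p * log2(p) for p in probs if p > 0)

# Distributions assumed from Example 2.1 (not restated in this section).
plaintext = [1/4, 3/4]               # p(a), p(b)
key = [1/2, 1/4, 1/4]                # p(K1), p(K2), p(K3)
ciphertext = [1/8, 7/16, 1/4, 3/16]  # induced distribution on the ciphertexts

print(round(entropy(plaintext), 2))   # 0.81
print(entropy(key))                   # 1.5
print(round(entropy(ciphertext), 2))  # 1.85
```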