2.2 ENTROPY OF AN INFORMATION SOURCE

What is information? How do we measure information? These are fundamental questions for which Shannon provided the answers. We can say that we received some information if there is a "decrease in uncertainty." Consider an information source that produces two symbols, A and B. The source has sent A, B, B, A, and now we are waiting for the next symbol. Which symbol will it produce? If it produces A, the uncertainty that existed while we were waiting is gone, and we say that "information" is produced. Note that we are using the term "information" from a communication theory point of view; it has nothing to do with the "usefulness" of the information.

Shannon proposed a formula to measure information. The information measure is called the entropy of the source. If a source produces N symbols, and if all the symbols are equally likely to occur, the entropy of the source is given by

H = log2N bits/symbol

For example, assume that a source produces the English letters (in this chapter, we will refer to the English letters A to Z and the space, totaling 27, as symbols), and all these symbols are produced with equal probability. In such a case, the entropy is

H = log2 27 = 4.75 bits/symbol
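As a quick illustration (this Python sketch is not from the text, and the function name is only illustrative), the formula can be evaluated directly for the two sources discussed above:

import math

# Entropy of a source whose N symbols are all equally likely: H = log2(N).
def entropy_equally_likely(n_symbols):
    return math.log2(n_symbols)

print(entropy_equally_likely(2))    # two symbols A and B: 1.0 bit/symbol
print(entropy_equally_likely(27))   # 26 letters plus space: about 4.75 bits/symbol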

The information source may not produce all the symbols with equal probability. For instance, in English the letter "E" has the highest frequency (and hence the highest probability of occurrence), and the other letters occur with different probabilities. In general, if a source produces the (i)th symbol with a probability of P(i), the entropy of the source is given by

H = -Σ P(i) log2P(i) bits/symbol

where the summation is over all the symbols produced by the source.

If a large text of English is analyzed, and the probabilities of all the symbols (or letters) are obtained and substituted in the formula, then the entropy works out to be approximately

H = 4.07 bits/symbol
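A rough Python sketch of this calculation is given below. It is not from the text, the function name is illustrative, and the value it prints depends entirely on the sample analyzed; only for a large body of English does it approach the figure quoted above.

import math
from collections import Counter

def first_order_entropy(text):
    # Keep only the 27 symbols used in this chapter: A to Z and the space.
    symbols = [c for c in text.upper() if ("A" <= c <= "Z") or c == " "]
    counts = Counter(symbols)
    total = len(symbols)
    # H = -sum of P(i) log2 P(i) over the symbols that actually occur.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

print(first_order_entropy("THIS IS A SMALL SAMPLE OF ENGLISH TEXT"))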

Note 

Consider the following sentence: "I do not knw wheter this is undrstandble." Even though several letters are missing from this sentence, you can make out what it says. In other words, there is a lot of redundancy in English text.

This is called the first-order approximation for calculating the entropy of the information source. In English, each letter depends on the letters that precede it. For instance, the letter "Q" is almost always followed by the letter "U". If we consider the probabilities of two symbols taken together (aa, ab, ac, ad, ..., ba, bb, and so on), it is called the second-order approximation. So, in the second-order approximation, we have to consider the conditional probabilities of digrams (pairs of symbols). The second-order entropy of a source producing English letters can be worked out to be

H = 3.36 bits/symbol

The third-order entropy of a source producing English letters can be worked out to be

H = 2.77 bits/symbol

As you consider the higher orders, the entropy goes down.
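To make the second-order approximation concrete, here is a rough Python sketch (not from the text; the names are illustrative). It estimates the average uncertainty of a symbol given the symbol before it, using digram counts from a sample text. On a large English corpus the estimate approaches the second-order figure quoted above; on a short string it will be much lower.

import math
from collections import Counter

def second_order_entropy(text):
    # Keep only A to Z and the space, as elsewhere in this chapter.
    symbols = [c for c in text.upper() if ("A" <= c <= "Z") or c == " "]
    digrams = Counter(zip(symbols, symbols[1:]))   # counts of pairs (i, j)
    firsts = Counter(symbols[:-1])                 # counts of i as the first symbol of a pair
    total = sum(digrams.values())
    # H = -sum over all pairs of P(i, j) log2 P(j | i)
    h = 0.0
    for (i, j), n in digrams.items():
        p_pair = n / total            # joint probability P(i, j)
        p_next = n / firsts[i]        # conditional probability P(j | i)
        h -= p_pair * math.log2(p_next)
    return h

print(second_order_entropy("THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG"))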

Example

If a source produces the (i)th symbol with a probability of P(i), the entropy of the source is given by H = -Σ P(i) log2P(i) bits/symbol.

As another example, consider a source that produces four symbols with probabilities of 1/2, 1/4, 1/8, and 1/8, with all symbols independent of each other. The entropy of the source is H = (1/2)log2 2 + (1/4)log2 4 + (1/8)log2 8 + (1/8)log2 8 = 7/4 bits/symbol.
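This result is easy to check numerically; the short sketch below (not from the text) applies the entropy formula to these probabilities:

import math

# Four independent symbols with probabilities 1/2, 1/4, 1/8, 1/8.
probabilities = [1/2, 1/4, 1/8, 1/8]
H = -sum(p * math.log2(p) for p in probabilities)
print(H)   # 1.75 bits/symbol, that is, 7/4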

Note 

As you consider the higher-order probabilities, the entropy of the source goes down. For example, the third-order entropy of a source producing English letters is 2.77 bits/symbol; that is, on average each letter can be represented by 2.77 bits when its dependence on the previous two letters is taken into account.


