2.4 SHANNON'S THEOREMS

In a digital communication system, the aim of the designer is to convert any information into a digital signal, pass it through the transmission medium, and, at the receiving end, reproduce the digital signal exactly. To achieve this objective, the two important requirements are:

  1. To code any type of information into digital format. Note that the world is analog—voice signals are analog, images are analog. We need to devise mechanisms to convert analog signals into digital format. If the source produces symbols (such as A, B), we also need to convert these symbols into a bit stream. This coding has to be done efficiently so that the smallest number of bits is required for coding.

  2. To ensure that the data sent over the channel is not corrupted. We cannot eliminate the noise introduced on the channels, and hence we need to introduce special coding techniques to overcome the effect of noise.

These two aspects were addressed by Claude Shannon in his classic paper "A Mathematical Theory of Communication," published in 1948 in the Bell System Technical Journal, which laid the foundation of information theory. Shannon addressed them through his source coding theorem and channel coding theorem.

Note 

Shannon's source coding theorem addresses how the symbols produced by a source have to be encoded efficiently. Shannon's channel coding theorem addresses how to encode the data to overcome the effect of noise.

2.4.1 Source Coding Theorem

The source coding theorem states that "the number of bits required to uniquely describe an information source can be approximated to the information content as closely as desired."

Consider again the source that produces the English letters. Its information content, or entropy, is 4.07 bits/symbol. According to Shannon's source coding theorem, the symbols can be coded in such a way that, on average, only 4.07 bits are required per symbol. But what should the coding technique be? Shannon does not tell us! Shannon's theory only sets a limit on the minimum number of bits required. This is a very important limit; communication engineers have struggled to approach it for the past 50 years.

Consider a source that produces two symbols A and B with equal probability.

Symbol    Probability    Code Word
A         0.5            1
B         0.5            0

The two symbols can be coded as above: A is represented by 1 and B by 0. We require 1 bit/symbol.

Now consider a source that produces these same two symbols. But instead of coding A and B directly, we can code the pairs AA, AB, BA, and BB. The probabilities of these pair symbols and the associated code words are shown here:

Symbol    Probability    Code Word
AA        0.45           0
AB        0.45           10
BA        0.05           110
BB        0.05           111

Here the strategy in assigning the code words is that the symbols with high probability are given short code words and symbols with low probability are given long code words.

Note 

Assigning short code words to high-probability symbols and long code words to low-probability symbols results in efficient coding.

In this case, the average number of bits required per symbol can be calculated using the formula

Average length = Σ P(i) × L(i)

where P(i) is the probability of the ith symbol and L(i) is the length of its code word. For this example, the value is (1 × 0.45 + 2 × 0.45 + 3 × 0.05 + 3 × 0.05) = 1.65 bits/symbol. The entropy of the source is 1.469 bits/symbol.
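As a quick check, this calculation can be reproduced in a few lines of C (a minimal sketch; the probabilities and code-word lengths simply restate the table above):

#include <stdio.h>
#include <math.h>

/* A minimal sketch: average code-word length and entropy of the pair code.
   Compile with a C99 compiler and link the math library (-lm). */
int main(void)
{
    double p[]   = {0.45, 0.45, 0.05, 0.05};  /* P(AA), P(AB), P(BA), P(BB) */
    int    len[] = {1, 2, 3, 3};              /* lengths of 0, 10, 110, 111 */
    double avg = 0.0, entropy = 0.0;
    int i;

    for (i = 0; i < 4; i++) {
        avg     += p[i] * len[i];       /* sum of P(i) * L(i)      */
        entropy -= p[i] * log2(p[i]);   /* - sum of P(i) log2 P(i) */
    }
    printf("average length = %.3f bits/symbol\n", avg);      /* 1.650 */
    printf("entropy        = %.3f bits/symbol\n", entropy);  /* 1.469 */
    return 0;
}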

So, if the source produces the symbols in the following sequence:

A A B A B A A B B B

then grouping the symbols into pairs (AA, BA, BA, AB, BB) and applying the code words gives the bit stream

0 110 110 10 111

This encoding scheme, on average, requires 1.65 bits/symbol. If we code the pairs directly, without taking the probabilities into consideration, the coding scheme would be:

Symbol    Code Word
AA        00
AB        01
BA        10
BB        11

Hence, we require 2 bits/symbol. The encoding mechanism that takes the probabilities into consideration is therefore a better coding technique. The theoretical limit on the number of bits/symbol is the entropy, which is 1.469 bits/symbol. The entropy of the source also determines the minimum channel capacity needed to carry the source's output.
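The encoding step itself is a simple table lookup. The following C fragment (a minimal sketch; the names are illustrative) maps each pair symbol to its variable-length code word and reproduces the bit stream shown above. On average this code needs 1.65 bits per pair, against 2 bits per pair for the fixed-length code.

#include <stdio.h>
#include <string.h>

/* A minimal sketch: encode a sequence of pair symbols using the
   variable-length code from the table (AA->0, AB->10, BA->110, BB->111). */
int main(void)
{
    const char *symbols[]    = {"AA", "AB", "BA", "BB"};
    const char *code_words[] = {"0", "10", "110", "111"};
    const char *sequence[]   = {"AA", "BA", "BA", "AB", "BB"};  /* A A B A B A A B B B */
    size_t n = sizeof(sequence) / sizeof(sequence[0]);
    size_t i, j;

    for (i = 0; i < n; i++)
        for (j = 0; j < 4; j++)
            if (strcmp(sequence[i], symbols[j]) == 0)
                printf("%s ", code_words[j]);
    printf("\n");                        /* prints: 0 110 110 10 111 */
    return 0;
}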

As we keep considering the higher-order entropies, we can reduce the bits/symbol further and perhaps approach the limit set by Shannon.

Based on this theory, it is estimated that English text cannot be compressed to less than 1.5 bits/symbol even if you use sophisticated coders and decoders.

This theorem provides the basis for coding information (text, voice, video) into the minimum possible bits for transmission over a channel. We will study the details of source coding in Chapter 4, "Coding of Text, Voice, Image, and Video Signals."

Note 

The source coding theorem states that "the number of bits required to uniquely describe an information source can be approximated to the information content as closely as desired."

2.4.2 Channel Coding Theorem

Shannon's channel coding theorem states that "the error rate of data transmitted over a bandwidth-limited noisy channel can be reduced to an arbitrarily small amount if the information rate is lower than the channel capacity."

This theorem is the basis for error correcting codes, with which we can achieve virtually error-free transmission. Again, Shannon only showed that 'good' coding mechanisms exist; he did not specify what the coding mechanism should be! According to Shannon, channel coding may introduce additional delay in transmission, but, using appropriate coding techniques, we can overcome the effect of channel noise.

Consider the example of a source producing the symbols A and B, with A coded as 1 and B as 0.

Symbols produced:   A  B  B  A  B
Bit stream:         1  0  0  1  0

Now, instead of transmitting this bit stream directly, we can transmit the bit stream

 111000000111000 

that is, we repeat each bit three times. Now, let us assume that the received bit stream is

 101000010111000 

Two errors have been introduced by the channel. But we can still decode the data correctly at the receiver: since the receiver knows that each bit is transmitted three times, it takes a majority vote over each group of three received bits, so the second bit is decoded as 1 and the eighth bit as 0. This is error correction. This scheme is called a rate 1/3 error correcting code. Codes that can correct errors in this way are called Forward Error Correcting (FEC) codes.
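A minimal C sketch of this rate 1/3 repetition code is given below (the function names are illustrative). The encoder repeats each bit three times; the decoder takes a majority vote over each group of three received bits, so any single error per group is corrected.

#include <stdio.h>
#include <string.h>

/* Encoder: repeat every data bit three times. */
static void encode(const char *data, char *coded)
{
    size_t i, j = 0;
    for (i = 0; data[i] != '\0'; i++) {
        coded[j++] = data[i];
        coded[j++] = data[i];
        coded[j++] = data[i];
    }
    coded[j] = '\0';
}

/* Decoder: majority vote over each group of three received bits.
   Assumes the received string length is a multiple of three. */
static void decode(const char *received, char *data)
{
    size_t i, j = 0;
    for (i = 0; received[i] != '\0'; i += 3) {
        int ones = (received[i]   == '1') +
                   (received[i+1] == '1') +
                   (received[i+2] == '1');
        data[j++] = (ones >= 2) ? '1' : '0';
    }
    data[j] = '\0';
}

int main(void)
{
    char coded[64], decoded[32];

    encode("10010", coded);
    printf("transmitted: %s\n", coded);          /* 111000000111000 */

    decode("101000010111000", decoded);          /* two bits corrupted */
    printf("decoded:     %s\n", decoded);        /* 10010, errors corrected */
    return 0;
}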

Ever since Shannon published his historic paper, there has been a tremendous amount of research into error correcting codes. We will discuss error detection and correction in Chapter 5, "Error Detection and Correction."

For the past 50 years, communication engineers have struggled to approach the theoretical limits set by Shannon, and they have made considerable progress. Take the case of the line modems that we use for transmission of data over telephone lines. The evolution of line modems from V.26 (2400 bps data rate, 1200 Hz bandwidth) through V.27 (4800 bps, 1600 Hz) and V.32 (9600 bps, 2400 Hz) to V.34 (28,800 bps, 3400 Hz) indicates the progress in source coding and channel coding techniques, with Shannon's theory as the foundation.

Note 

Shannon's channel coding theorem states that "the error rate of data transmitted over a bandwidth-limited noisy channel can be reduced to an arbitrarily small amount if the information rate is lower than the channel capacity."

Note 

Source coding is used mainly to reduce the redundancy in the signal, whereas channel coding is used to introduce redundancy to overcome the effect of noise.

Summary

In this chapter, we studied Shannon's theory of communication. Shannon introduced the concept of entropy of an information source to measure the number of bits required to represent the symbols produced by the source. He also defined channel capacity, which is related to the bandwidth and signal-to-noise ratio. Based on these two measures, he formulated the source coding theorem and the channel coding theorem. The source coding theorem states that "the number of bits required to uniquely describe an information source can be approximated to the information content as closely as desired." The channel coding theorem states that "the error rate of data transmitted over a bandwidth-limited noisy channel can be reduced to an arbitrarily small amount if the information rate is lower than the channel capacity." A good conceptual understanding of these two theorems is important for every communication engineer.

References

  • C.E. Shannon. "A Mathematical Theory of Communication." Bell System Technical Journal, Vol. 27, 1948.

    Every communications engineer must read this paper. Shannon is considered the father of modern communications. You have to be very good at mathematics to understand this paper.

  • W. Gappmair. "Claude E. Shannon: The 50th Anniversary of Information Theory." IEEE Communications Magazine, Vol. 37, No. 4, April 1999.

    This paper gives a brief biography of Shannon and a brief overview of the importance of Shannon's theory.

  • cm.bell-labs.com/cm/ms/what/shannonday/paper.html

    You can download Shannon's original paper from this link.

Questions

  1. Draw the block diagram of a communication system and explain the function of each block.

  2. What is the entropy of an information source? Illustrate with examples.

  3. What is source coding? What is the difference between lossless coding and lossy coding?

  4. Explain the concept of channel capacity with an example.

  5. What is channel coding? Explain the concept of error correcting codes.

Exercises

  1. A source produces 42 symbols with equal probability. Calculate the entropy of the source.

  2. A source produces two symbols A and B with probabilities of 0.6 and 0.4, respectively. Calculate the entropy of the source.

  3. The ASCII code is used to represent characters in the computer. Is it an efficient coding technique from Shannon's point of view? If not, why?

  4. An information source produces English symbols (letters A to Z and space). Using the first-order model, calculate the entropy of the information source. You need to enter a large English text with the 27 symbols, calculate the frequency of occurrence of each symbol, and then calculate the entropy. The answer should be close to 4.07 bits/symbol.

  5. In the above example, using the second-order model, calculate the entropy. You need to calculate the frequencies taking two symbols at a time, such as aa, ab, ac, etc. The entropy should be close to 3.36 bits/symbol.

Answers

  1. For a source that produces 42 symbols with equal probability, the entropy of the source is H = log2 42 = 5.39 bits/symbol.

  2. For a source that produces two symbols A and B with probabilities of 0.6 and 0.4, respectively, the entropy is H = −{0.6 log2 0.6 + 0.4 log2 0.4} = 0.971 bits/symbol.

  3. In ASCII, each character is represented by seven bits. The frequency of occurrence of the English letters is not taken into consideration at all. If the frequencies were taken into consideration, the most frequently occurring letters would be represented by short code words (such as 2 bits) and the less frequently occurring letters by long code words. According to Shannon's theory, therefore, ASCII is not an efficient coding technique. Note, however, that an efficient coding technique involves a lot of additional processing, which causes delay in decoding the text.

  4. You can write a program that obtains the frequency of occurrence of the English letters. The program takes a text file as input and produces the frequency of occurrence for all the letters and spaces. You can ignore the punctuation marks. You need to convert all letters either into capital letters or small letters. Based on the frequencies, if you apply Shannon's formula for entropy, you will get a value close to 4.07 bits/symbol.
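     A minimal C sketch of such a program is shown below (the file name sample.txt is only an example); it counts the 27 symbols in a text file and applies Shannon's entropy formula:

#include <stdio.h>
#include <ctype.h>
#include <math.h>

/* A minimal sketch: first-order entropy of English text (A-Z and space).
   Compile with a C99 compiler and link the math library (-lm). */
int main(void)
{
    long count[27] = {0};
    long total = 0;
    int c, i;
    double entropy = 0.0;
    FILE *fp = fopen("sample.txt", "r");   /* example file name */

    if (fp == NULL) {
        perror("sample.txt");
        return 1;
    }
    while ((c = fgetc(fp)) != EOF) {
        if (isalpha(c))      { count[toupper(c) - 'A']++; total++; }
        else if (c == ' ')   { count[26]++; total++; }
        /* punctuation and other characters are ignored */
    }
    fclose(fp);

    for (i = 0; i < 27; i++) {
        if (count[i] > 0) {
            double p = (double)count[i] / total;
            entropy -= p * log2(p);
        }
    }
    printf("H = %.2f bits/symbol\n", entropy);  /* close to 4.07 for large English text */
    return 0;
}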

  5. You can modify the above program to calculate the frequencies of two-letter combinations (aa, ab, ac, ..., ba, bb, ..., zy, zz). Again, if you apply the formula, you will get a value close to 3.36 bits/symbol.

Projects

  1. Simulate a digital communication system in C language. (a) Write a program to generate a continuous bit stream of 1s and 0s. (b) Simulate the medium by changing a 1 to a 0 and a 0 to a 1 at random places in the bit stream using a random number generator. (c) Calculate the bit error rate (= number of errors/number of bits transmitted). A minimal sketch appears after this list.

  2. Develop a file compression utility using the second-order approximation described in this chapter. The coding should be done by taking the frequencies of occurrence of combinations of two characters, as in Exercise 5.
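For Project 1, the following C sketch may serve as a starting point (NUM_BITS and FLIP_PROB are assumed parameters, not values given in the text):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* A minimal sketch for Project 1: generate random bits, flip each one
   with a small probability to model channel noise, and report the
   resulting bit error rate. */
#define NUM_BITS  100000
#define FLIP_PROB 0.01          /* assumed probability that the channel corrupts a bit */

int main(void)
{
    long errors = 0;
    long i;

    srand((unsigned)time(NULL));
    for (i = 0; i < NUM_BITS; i++) {
        int sent = rand() % 2;                          /* (a) generate a bit       */
        int received = sent;
        if ((double)rand() / RAND_MAX < FLIP_PROB)      /* (b) corrupt it at random */
            received = !received;
        if (received != sent)                           /* (c) count the errors     */
            errors++;
    }
    printf("bit error rate = %.5f\n", (double)errors / NUM_BITS);
    return 0;
}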


