17.6. Compression Methods Without Loss

Some types of data, including text, images, and video, may contain redundant or repeated elements. If so, those elements can be eliminated and replaced by codes from which the original data can be reconstructed during decoding. In this section, we focus on techniques that incur no loss during compression:

  • Arithmetic encoding

  • Run-length encoding

  • Huffman encoding

  • Lempel-Ziv encoding

Here, we ignore arithmetic encoding and consider only the last three encoding techniques.

17.6.1. Run-Length Encoding

One of the simplest data-compression techniques is run-length encoding. This technique is fairly effective for compressing plain text and numeric data, especially in facsimile systems. With run-length coding, a run of repeated characters can be replaced by a compression flag Cc, followed by the length of the run and the repeated character.

Example.

Assume a system that represents b as a blank. Find the compressed version of the following sentence:

THISSSSSS b IS bbbb AN b EXAMPLE b OF b RUNLENGTH b CODE

Solution.

Replacing each run with Cc, the run length, and the repeated character, the compressed version of the sentence becomes:

THICc6S b IS Cc4b AN b EXAMPLE b OF b RUNLENGTH b CODE

Clearly, the more runs of repeated characters the text contains, and the longer those runs are, the smaller the compression ratio Cr becomes, as shown in Table 17.1. The statistics in this table are based on a 1,000-character piece of text.
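The scheme described above can be sketched in Python. This is a minimal illustration, not the book's implementation: the marker character `#` stands in for the flag Cc, and the minimum run length of 4 is an assumed threshold (shorter runs would grow under the 3-character Cc-count-character encoding).

```python
import re

def rle_encode(text, marker="#", min_run=4):
    # Replace every run of min_run or more identical characters with
    # marker + run length + character (the marker plays the role of Cc).
    # Assumes the source text contains neither the marker nor digit
    # sequences that could be mistaken for a run length.
    out, i = [], 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1
        run = j - i
        out.append(f"{marker}{run}{text[i]}" if run >= min_run else text[i] * run)
        i = j
    return "".join(out)

def rle_decode(coded, marker="#"):
    # Expand each marker + count + character group back into the run.
    return re.sub(re.escape(marker) + r"(\d+)(.)",
                  lambda m: m.group(2) * int(m.group(1)), coded)

original = "THISSSSSS IS    AN EXAMPLE OF RUNLENGTH CODE"
compressed = rle_encode(original)
ratio = len(compressed) / len(original)   # compression ratio Cr < 1
```

Because `rle_decode(compressed)` recovers the original string exactly, the method is lossless; the ratio of compressed length to original length is the compression ratio Cr tabulated in Table 17.1.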

Table 17.1. Statistics obtained for run-length compression of a 1,000-character piece of text

Number of Repeated Characters | Average Length of Repeated Characters | Compression Ratio Cr
            10                |                  4                    |        0.99
            10                |                 10                    |        0.93
            20                |                  4                    |        0.98
            20                |                 10                    |        0.85
            30                |                  4                    |        0.97
            30                |                 10                    |        0.79

17.6.2. Huffman Encoding

Huffman encoding is an efficient frequency-dependent coding technique. With this algorithm, source outputs with smaller probabilities are encoded with longer code words, and more probable outputs with shorter ones.

This technique reduces the total number of bits, leading to an efficient compression of data. The algorithm that implements this technique is as follows.

Begin Huffman Encoding Algorithm
  1. Sort the outputs of the source in decreasing order of their probabilities: for example, 0.52, 0.30, 0.08, 0.05, 0.05.

  2. Merge the two least probable outputs into a single output whose probability is the sum of the corresponding probabilities, such as 0.05 + 0.05 = 0.10.

  3. If two outputs remain, go to the next step; otherwise, go to step 1.

  4. Assign 0 and 1 as code bits to the two remaining outputs on the diagram.

  5. Working backward, whenever an output is the result of merging two outputs, append 0 and 1, respectively, to their code words; stop when every original source output has a code word.

Example.

Design a Huffman encoder for a source generating {a1, a2, a3, a4, a5} with probabilities {0.05, 0.05, 0.08, 0.30, 0.52}.

Solution.

Following the algorithm, as shown in Figure 17.11, the source outputs {a1, a2, a3, a4, a5} are encoded as 1100, 1101, 111, 10, and 0, respectively.

Figure 17.11. Huffman encoding
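The merge-and-label procedure can be sketched in Python using a heap. This is a minimal illustration under my own conventions: the exact 0/1 labels depend on which merged branch receives which bit, but the code-word lengths match those in the example above.

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """Build Huffman code words for a {symbol: probability} mapping."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial code word}).
    # The tie-breaker keeps equal probabilities from comparing the dicts.
    tiebreak = count()
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Step 2: merge the two least probable outputs.
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        # Step 5: prepend 0 to one branch and 1 to the other.
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

codes = huffman_codes({"a1": 0.05, "a2": 0.05, "a3": 0.08,
                       "a4": 0.30, "a5": 0.52})
# Code lengths: a5 -> 1 bit, a4 -> 2, a3 -> 3, a1 and a2 -> 4 bits,
# matching the lengths of 0, 10, 111, 1100, and 1101 in the example.
```

Note that less probable symbols end up with longer code words, and the resulting code is prefix-free, so a bit stream can be decoded unambiguously.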

17.6.3. Lempel-Ziv Encoding

Lempel-Ziv codes are independent of the source statistics. This coding technique is used, for example, by the UNIX compress utility. The algorithm that converts a string of bits into a Lempel-Ziv code is summarized as follows.

Begin Lempel-Ziv Encoding Algorithm

  1. The sequence of source outputs is parsed into phrases of varying length: at each step, identify the shortest phrase that has not appeared so far. Note that all phrases are distinct, and the phrases grow longer as the encoding process proceeds.

  2. Phrases are encoded using code words of equal length. If k1 is the number of bits needed to describe a location and k2 is the number of phrases, we must have

     k1 = ⌈log2 k2⌉.

  3. Each phrase is encoded as the k1-bit location of its prefix (the phrase minus its last bit), followed by that last bit.

Example.

For the following string of bits, find the encoded Lempel-Ziv words:

11110111011000001010010001111010101100

Solution.

Implementing step 1 on the string, there are 14 phrases, as follows:

1, 11, 10, 111, 0, 110, 00, 001, 01, 0010, 0011, 1101, 010, 1100

Thus, k2 = 14 and k1 = ⌈log2 14⌉ = 4. Table 17.2 shows the encoded words as steps 2 and 3 are applied to the parser output.

Table 17.2. An example of the Lempel-Ziv coding process

Parser Output | Location | Encoded Output
      1       |   0001   |     00001
     11       |   0010   |     00011
     10       |   0011   |     00010
    111       |   0100   |     00101
      0       |   0101   |     00000
    110       |   0110   |     00100
     00       |   0111   |     01010
    001       |   1000   |     01111
     01       |   1001   |     01011
   0010       |   1010   |     10000
   0011       |   1011   |     10001
   1101       |   1100   |     01101
    010       |   1101   |     10010
   1100       |   1110   |     01100
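The parsing and encoding steps can be sketched in Python. This is a minimal LZ78-style illustration under my own conventions (the function name and the details of the location numbering are assumptions, not taken from the text):

```python
def lz_encode(bits):
    """Parse a bit string into new phrases and encode each phrase as a
    fixed-length prefix location followed by the phrase's last bit."""
    # Step 1: parse the input into the shortest phrases not seen before.
    # Assumes the final phrase is itself new, as in the example here.
    phrases = {}            # phrase -> location (order of first appearance)
    parsed = []
    i = 0
    while i < len(bits):
        j = i + 1
        while j <= len(bits) and bits[i:j] in phrases:
            j += 1
        phrase = bits[i:j]
        phrases[phrase] = len(phrases) + 1
        parsed.append(phrase)
        i = j
    # Step 2: k1 = ceil(log2 k2) bits per location field.
    k1 = max(1, (len(phrases) - 1).bit_length())
    # Step 3: code word = location of the phrase's prefix (0 if the
    # prefix is empty), followed by the phrase's last bit.
    return parsed, [format(phrases.get(p[:-1], 0), f"0{k1}b") + p[-1]
                    for p in parsed]

parsed, codes = lz_encode("11110111011000001010010001111010101100")
# 14 phrases; the first code word is 00001 and the last is 01100.
```

Running this on the example string reproduces the 14 phrases of step 1 and the 5-bit encoded words of Table 17.2.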




Computer and Communication Networks
ISBN: 0131389106
Year: 2007
Pages: 211
Authors: Nader F. Mir
