17.6. Compression Methods Without Loss

Some types of data, including text, images, and video, may contain redundant or repeated elements. If so, those elements can be eliminated and replaced by codes from which the original data can be reconstructed during decoding. In this section, we focus on techniques that incur no loss during compression:

  • Arithmetic encoding

  • Run-length encoding

  • Huffman encoding

  • Lempel-Ziv encoding

Here, we ignore arithmetic encoding and consider only the last three encoding techniques.

17.6.1. Run-Length Encoding

One of the simplest data-compression techniques is run-length encoding. This technique is fairly effective for compressing plain text and numeric data, especially in facsimile systems. With run-length coding, a run of repeated characters can be replaced by a compression flag Cc, followed by the length of the run and the repeated character.

Example.

Assume a system that represents b as a blank. Find the compressed version of the following sentence:

THISSSSSS b IS bbbb AN b EXAMPLE b OF b RUNLENGTH b CODE

Solution.

Replacing each run with Cc, the run length, and the repeated character, the compressed version of the sentence becomes:

THICc6S b IS Cc4b AN b EXAMPLE b OF b RUNLENGTH b CODE

Clearly, the more runs of repeated characters the text contains, and the longer those runs are, the smaller the compression ratio Cr becomes, as shown in Table 17.1. The statistics in this table are based on a 1,000-character piece of text.
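The scheme described above can be sketched in Python. This is a minimal illustration, not the book's implementation: the marker character `#` stands in for the flag Cc, and the minimum run length of 4 is an assumed threshold (shorter runs would grow under the 3-character Cc-count-character encoding).

```python
import re

def rle_encode(text, marker="#", min_run=4):
    # Replace every run of min_run or more identical characters with
    # marker + run length + character (the marker plays the role of Cc).
    # Assumes the source text contains neither the marker nor digit
    # sequences that could be mistaken for a run length.
    out, i = [], 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1
        run = j - i
        out.append(f"{marker}{run}{text[i]}" if run >= min_run else text[i] * run)
        i = j
    return "".join(out)

def rle_decode(coded, marker="#"):
    # Expand each marker + count + character group back into the run.
    return re.sub(re.escape(marker) + r"(\d+)(.)",
                  lambda m: m.group(2) * int(m.group(1)), coded)

original = "THISSSSSS IS    AN EXAMPLE OF RUNLENGTH CODE"
compressed = rle_encode(original)
ratio = len(compressed) / len(original)   # compression ratio Cr < 1
```

Because `rle_decode(compressed)` recovers the original string exactly, the method is lossless; the ratio of compressed length to original length is the compression ratio Cr tabulated in Table 17.1.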

Table 17.1. Statistics obtained for run-length compression of a 1,000-character piece of text

Number of Repeated Characters | Average Length of Repeated Characters | Compression Ratio Cr
            10                |                  4                    |        0.99
            10                |                 10                    |        0.93
            20                |                  4                    |        0.98
            20                |                 10                    |        0.85
            30                |                  4                    |        0.97
            30                |                 10                    |        0.79

17.6.2. Huffman Encoding

Huffman encoding is an efficient frequency-dependent coding technique. With this algorithm, source outputs with smaller probabilities are encoded with longer code words, and more probable outputs with shorter ones.

This technique reduces the total number of bits, leading to an efficient compression of data. The algorithm that implements this technique is as follows.

Begin Huffman Encoding Algorithm
  1. Sort the outputs of the source in decreasing order of their probabilities: for example, 0.52, 0.30, 0.08, 0.05, 0.05.

  2. Merge the two least probable outputs into a single output whose probability is the sum of the corresponding probabilities, such as 0.05 + 0.05 = 0.10.

  3. If two outputs remain, go to the next step; otherwise, go to step 1.

  4. Assign 0 and 1 as code bits to the two remaining outputs on the diagram.

  5. Working backward, whenever an output is the result of merging two outputs, append 0 and 1, respectively, to their code words; stop when every original source output has a code word.

Example.

Design a Huffman encoder for a source generating {a1, a2, a3, a4, a5} with probabilities {0.05, 0.05, 0.08, 0.30, 0.52}.

Solution.

Following the algorithm, as shown in Figure 17.11, the source outputs {a1, a2, a3, a4, a5} are encoded as 1100, 1101, 111, 10, and 0, respectively.

Figure 17.11. Huffman encoding
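The merge-and-label procedure can be sketched in Python using a heap. This is a minimal illustration under my own conventions: the exact 0/1 labels depend on which merged branch receives which bit, but the code-word lengths match those in the example above.

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """Build Huffman code words for a {symbol: probability} mapping."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial code word}).
    # The tie-breaker keeps equal probabilities from comparing the dicts.
    tiebreak = count()
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Step 2: merge the two least probable outputs.
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        # Step 5: prepend 0 to one branch and 1 to the other.
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

codes = huffman_codes({"a1": 0.05, "a2": 0.05, "a3": 0.08,
                       "a4": 0.30, "a5": 0.52})
# Code lengths: a5 -> 1 bit, a4 -> 2, a3 -> 3, a1 and a2 -> 4 bits,
# matching the lengths of 0, 10, 111, 1100, and 1101 in the example.
```

Note that less probable symbols end up with longer code words, and the resulting code is prefix-free, so a bit stream can be decoded unambiguously.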

17.6.3. Lempel-Ziv Encoding

Lempel-Ziv codes are independent of the source statistics. This coding technique is used, for example, by the UNIX compress utility. The algorithm that converts a string of bits into a Lempel-Ziv code is summarized as follows.

Begin Lempel-Ziv Encoding Algorithm

  1. The sequence of source outputs is parsed into phrases of varying length: at each step, identify the shortest phrase that has not appeared so far. Note that all phrases are distinct, and the phrases grow longer as the encoding process proceeds.

  2. Phrases are encoded using code words of equal length. If k1 is the number of bits needed to describe a location and k2 is the number of phrases, we must have

     k1 = ⌈log2 k2⌉.

  3. Each phrase is encoded as the k1-bit location of its prefix (the phrase minus its last bit), followed by that last bit.

Example.

For the following string of bits, find the encoded Lempel-Ziv words:

11110111011000001010010001111010101100

Solution.

Implementing step 1 on the string, there are 14 phrases, as follows:

1, 11, 10, 111, 0, 110, 00, 001, 01, 0010, 0011, 1101, 010, 1100

Thus, k2 = 14 and k1 = ⌈log2 14⌉ = 4. Table 17.2 shows the encoded words as steps 2 and 3 are applied to the parser output.

Table 17.2. An example of the Lempel-Ziv coding process

Parser Output | Location | Encoded Output
      1       |   0001   |     00001
     11       |   0010   |     00011
     10       |   0011   |     00010
    111       |   0100   |     00101
      0       |   0101   |     00000
    110       |   0110   |     00100
     00       |   0111   |     01010
    001       |   1000   |     01111
     01       |   1001   |     01011
   0010       |   1010   |     10000
   0011       |   1011   |     10001
   1101       |   1100   |     01101
    010       |   1101   |     10010
   1100       |   1110   |     01100
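The parsing and encoding steps can be sketched in Python. This is a minimal LZ78-style illustration under my own conventions (the function name and the details of the location numbering are assumptions, not taken from the text):

```python
def lz_encode(bits):
    """Parse a bit string into new phrases and encode each phrase as a
    fixed-length prefix location followed by the phrase's last bit."""
    # Step 1: parse the input into the shortest phrases not seen before.
    # Assumes the final phrase is itself new, as in the example here.
    phrases = {}            # phrase -> location (order of first appearance)
    parsed = []
    i = 0
    while i < len(bits):
        j = i + 1
        while j <= len(bits) and bits[i:j] in phrases:
            j += 1
        phrase = bits[i:j]
        phrases[phrase] = len(phrases) + 1
        parsed.append(phrase)
        i = j
    # Step 2: k1 = ceil(log2 k2) bits per location field.
    k1 = max(1, (len(phrases) - 1).bit_length())
    # Step 3: code word = location of the phrase's prefix (0 if the
    # prefix is empty), followed by the phrase's last bit.
    return parsed, [format(phrases.get(p[:-1], 0), f"0{k1}b") + p[-1]
                    for p in parsed]

parsed, codes = lz_encode("11110111011000001010010001111010101100")
# 14 phrases; the first code word is 00001 and the last is 01100.
```

Running this on the example string reproduces the 14 phrases of step 1 and the 5-bit encoded words of Table 17.2.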




Computer and Communication Networks
ISBN: 0131389106
Year: 2007
Pages: 211
Authors: Nader F. Mir
