5.2 Lossy compression

5.2 Lossy compression

In addition to lossless compression, the JPEG standard defines three lossy compression modes. These are called the baseline sequential mode, the progressive mode and the hierarchical mode. These modes are all based on the discrete cosine transform (DCT) to achieve a substantial compression while producing a reconstructed image with high visual fidelity. The main difference between these modes is the way in which the DCT coefficients are transmitted.

The simplest DCT-based coding is referred to as the baseline sequential process and it provides capability that is sufficient for many applications. The other DCT-based processes which extend the baseline sequential process to a broader range of applications are referred to as extended DCT-based processes. In any extended DCT-based decoding processes, baseline decoding is required to be present in order to provide a default decoding capability.

5.2.1 Baseline sequential mode compression

Baseline sequential mode compression is usually called baseline coding for short. In this mode, an image is partitioned into 8 × 8 nonoverlapping pixel blocks from left to right and top to bottom. Each block is DCT coded, and all the 64 transform coefficients are quantised to the desired quality. The quantised coefficients are immediately entropy coded and output as part of the compressed image data, thereby minimising coefficient storage requirements.

Figure 5.3 illustrates the JPEG's baseline compression algorithm. Each eight-bit sample is level shifted by subtracting 28-1=7 = 128 before being DCT coded. This is known as DC level shifting. The 64 DCT coefficients are then uniformly quantised according to the step size given in the application-specific quantisation matrix. The use of a quantisation matrix allows different weighting to be applied according to the sensitivity of the human visual system to a coefficient of the frequency.

click to expand
Figure 5.3: Block diagram of a baseline JPEG encoder

Two examples of quantisation tables are given in Tables 5.1 and 5.2. [5]. These tables are based on psychovisual thresholding and are derived empirically using luminance and chrominance with a 2:1 horizontal subsampling. These tables may not be suitable for any particular application, but they give good results for most images with eight-bit precision.

Table 5.1: Luminance Q table

16

11

10

16

24

40

51

61

12

12

14

19

26

58

60

55

14

13

16

24

40

57

69

56

14

17

22

29

51

87

80

62

18

22

37

56

68

109

103

77

24

35

55

64

81

104

113

92

49

64

78

87

103

121

120

101

72

92

95

98

112

100

103

99

If the elements of the quantisation tables of luminance and chrominance are represented by Q(u, v), then a quantised DCT coefficient with horizontal and vertical spatial frequencies of u and v, Fq(u, v), is given by:

(5.1) 

Table 5.2: Chrominance Q table

17

18

24

47

99

99

99

99

18

21

26

66

99

99

99

99

24

26

56

99

99

99

99

99

47

66

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

where F(u, v) is the transform coefficient value prior to quantisation, and . means rounding the division to the nearest integer. At the decoder the quantised coefficients are inverse quantised by:

(5.2) 

to reconstruct the quantised coefficients.

A quality factor q_JPEG is normally used to control the elements of the quantisation matrix Q(u, v) [6]. The range of q_JPEG percentage values is between 1 and 100 per cent. The JPEG quantisation matrices of Tables 5.1 and 5.2 are used for q_JPEG = 50, for the luminance and chrominance, respectively. For other quality factors, the elements of the quantisation matrix, Q(u, v), are multiplied by the compression factor a, defined as [6]:

(5.3) 

subject to the condition that the minimum value of the modified quantisation matrix elements, αQ(u, v), is 1. For 100 per cent quality, q_JPEG = 100, that is lossless compression; all the elements of αQ(u, v) are set to 1.

After quantisation, the DC (commonly referred to as (0,0)) coefficient and the 63 AC coefficients are coded separately as shown in Figure 5.3. The DC coefficients are DPCM coded with prediction of the DC coefficient from the previous block, as shown in Figure 5.4, i.e. DIFF = DCi - DCi-1. This separate treatment from the AC coefficients is to exploit the correlation between the DC values of adjacent blocks and to code them more efficiently as they typically contain the largest portion of the image energy. The 63 AC coefficients starting from coefficient AC(1,0) are run length coded following a zigzag scan as shown in Figure 5.4.

click to expand
Figure 5.4: Preparing of the DCT coefficients for entropy coding

The adoption of a zigzag scanning pattern is to facilitate entropy coding by encountering the most likely nonzero coefficients first. This is due to the fact that, for most natural scenes, the image energy is mainly concentrated in a few low frequency transform coefficients.

5.2.2 Run length coding

Entropy coding of the baseline encoder is accomplished in two stages. The first stage is the translation of the quantised DCT coefficients into an intermediate set of symbols. In the second stage, variable length codes are assigned to each symbol. For the JPEG standard a symbol is structured in two parts: a variable length code (VLC) for the first part, normally referred to as symbol-1, followed by a binary representation of the amplitude for the second part, symbol-2.

5.2.2.1 Coding of DC coefficients

Instead of assigning individual variable length codewords (e.g. Huffman code) to each DIFF, the DIFF values are categorised based on the magnitude range called CAT. The CAT is then variable length coded. Table 5.3 shows the categories for the range of amplitudes in the baseline JPEG. Since the DCT coefficient values are in the range -2047 to 2047, there are 11 categories for nonzero coefficients. Category zero is not used for symbols; it is used for defining the end of block (EOB) code.

Table 5.3: The category (CAT) of the baseline encoder

CAT

Range

0

-

1

-1, 1

2

-3, -2, 2,3

3

-7,...-4, 4,...7

4

-15, ...-8, 8, ...15

5

-31, ...-16, 16,...31

6

-63, ...-32, 32, ...63

7

-127, ... -64, 64, ... 127

8

-255, ...-128, 128, ...255

9

-511,... -256, 256, ...511

10

-1023, ...-512, 512, ...1023

11

-2047,... -1024, 1024,... 2047

The CAT after being VLC coded is appended with additional bits to specify the actual DIFF values (amplitude) within the category. Here CAT is symbol-1 and the appended bits represent symbol-2.

When the DIFF is positive the appended bits are just the lower order bits of the DIFF. When it is negative, the appended bits become the lower order bits of DIFF-1. The lower order bits start from the point where the most significant bit of the appended bit sequence is one for positive differences and zero for negative differences. For example: for DIFF = 6 = 0000 ... 00110, the appended bits start from 1, hence it would be 110. This is because DIFF is positive and the most significant bit of the appended bits should be 1. Also, since 6 is in the range of 4 to 7 (Table 5.3) the value of CAT is 3. From Table B.1 of Appendix B, the codeword for CAT = 3 is 100. Thus the overall codeword for DIFF = 6 is 100110, where 100 is the VLC code of CAT (symbol-1) and 110 is the appended codeword (symbol-2).

For a negative DIFF, such as DIFF = -3, first of all -3 is in the range of -3 to -2, thus from Table 5.3, CAT = 2, and its VLC code from Table B.1 of Appendix B is 011. However, to find the appended bits, DIFF - 1 = -4 = 1111 ... 100, where the lower order bits are 00. Note the most significant bit of the appended bits is 0. Thus the codeword becomes 01100.

5.2.2.2 Coding of AC coefficients

For each nonzero AC coefficient in zigzag scan order, symbol-1 is described as a two-dimensional event of (RUN, CAT), sometimes called (RUN, SIZE). For the baseline encoder, CAT is the category for the amplitude of a nonzero coefficient in the zigzag order, and RUN is the number of zeros preceding this nonzero coefficient. The maximum length of run is limited to 15. Encoding of runs greater than 15 is done by a special symbol (15, 0), which is a run length of 15 zero coefficients followed by a coefficient of zero amplitude. Hence it can be interpreted as the extension symbol with 16 zero coefficients. There can be up to three consecutive (15, 0) symbols before the terminating symbol-1 followed by a single symbol-2. For example a (RUN = 34, CAT = 5) pair would result in three symbols a, b, and c, with a = (15, 0), b = (15, 0) and c = (2, 5).

An end of block (EOB) is designated to indicate that the rest of the coefficients of the block in the zigzag scanning order are quantised to zero. The EOB symbol is represented by (RUN = 0, CAT = 0).

The AC code table for symbol-1 consists of one Huffman codeword (maximum length 16 bits, not including additional bits) for each possible composite event. Table B.2 of Appendix B shows the codewords for all possible combinations of RUN and CAT of symbol-1 [4]. The format of the additional bits (symbol-2) is the same as in the coding of DIFF in DC coefficients. For the kth AC coefficient in the zigzag scan order, ZZ(k), the additional bits are either the lower order bits of ZZ(k) when ZZ(k) is positive, or the lower order bits of ZZ(k) - 1, when ZZ(k) is negative. In order to clarify this, let us look at a simple example.

Example: the quantised DCT coefficients of a luminance block are shown in Figure 5.5. If the DC coefficient in the previous luminance block was 29, find the codewords for coding of the DC and AC coefficients.

click to expand
Figure 5.5: Quantised DCT coefficients of a luminance block

Codeword for the DC coefficient

DIFF = 31 - 29 = 2. From Table 5.3, CAT = 2 and according to Table B.1 of Appendix B, the Huffman code for this value of CAT is 011. To find the appended bits, since DIFF = 2 > 0, then 2 = 000 ... 0010. Thus the appended bits are 10. Hence the overall codeword for coding the DC coefficient is 01110.

Codewords for the AC coefficients

Scanning starts from the first nonzero AC coefficient that has a value of 18. From Table 5.3, the CAT value for 18 is 5, and since there is no zero value AC coefficient before it, then RUN = 0. Hence, symbol-1 is (0, 5). From Table B.2 of Appendix B, the codeword for (0, 5) is 11010. The symbol-2 is the lower order bits of ZZ(k) = 18 = 000 ... 010010, which is 10010. Thus the first AC codeword is 1101010010.

The next nonzero AC coefficient in the zigzag scan order is -21. Since it has no zero coefficient before it and it is in the range of -31 to -16, then it has a RUN = 0, and CAT = 5. Thus symbol-1 of this coefficient is (0, 5), which again from Table B.2 of Appendix B, has a codeword of 11010. For symbol-2, since -21 < 0, then ZZ(k) - 1 = -21 - 1 = -22 = 111... 1101010, and symbol-2 becomes 01010 (note the appended bits start from where the most significant bit is 0). Thus the overall codeword for the second nonzero AC coefficient is 1101001010.

The third nonzero AC coefficient in the scan is -13, which has one zero coefficient before it. Then RUN = 1 and, from its range, CAT is 4. From Table B.2 of Appendix B, the codeword for (RUN = 1, CAT = 4) is 111110110. To find symbol-2, ZZ(k) -1 = -13 - 1 = -14 = 111... 110010. Thus symbol-2 = 0010, and the whole codeword becomes 1111101100010.

The fourth and the final nonzero AC coefficient is 5 (CAT = 3), which is preceded by three zeros (RUN = 3). Thus symbol-1 is (3, 3), which, from Table B.2 of Appendix B, has a codeword of 111111110101. For symbol-2, since ZZ(k) = 5 = 000 ... 00101, then the lower order bits are 101, and the whole codeword becomes 111111110101101.

Since 5 is the last nonzero AC coefficient, then the encoding terminates here and the end of block (EOB) code is transmitted which is defined as the (0, 0) symbol with no appended bits. From Table B.2 of Appendix B, its codeword is 1010.

5.2.2.3 Entropy coding

For coding of the magnitude categories or run length events, the JPEG standard specifies two alternative entropy coding methods, namely Huffman coding and arithmetic coding. Huffman coding procedures use Huffman tables, and the type of table is determined by the entropy table specifications, shown in Figure 5.3. Arithmetic coding methods use arithmetic coding conditioning tables, which may also be determined by the entropy table specification. There can be up to four different Huffman and arithmetic coding tables for each DC and AC coefficient. No default values for Huffman tables are specified, so the applications may choose tables appropriate for their own environment. Default tables are defined for the arithmetic coding conditioning. Baseline sequential coding uses Huffman coding, and extended DCT-based and lossless processes may use either Huffman or arithmetic coding (see Table 5.4).

Table 5.4: Summary: essential characteristics of coding process

Baseline process (required for all DCT-based decoders)

  • DCT-based process

  • source image: 8-bit samples within each component

  • sequential

  • Huffman coding: 2 AC and 2 DC tables

  • decoders shall process scans with 1, 2, 3 and 4 components

  • interleaved and noninterleaved scans

Extended DCT-based processes

  • DCT-based process

  • source image: 8-bit or 12-bit samples

  • sequential or progressive

  • Huffman or arithmetic coding: 4 AC and 4 DC tables

  • decoder shall process scans with 1, 2, 3 and 4 components

  • interleaved and noninterleaved scans

Lossless process

  • predictive process (not DCT-based)

  • source image; N-bit samples (2 N 16)

  • sequential

  • Huffman or arithmetic coding: 4 tables

  • decoders shall process scans with 1, 2, 3 and 4 components

  • interleaved and noninterleaved scans

Hierarchical processes

  • multiple layers (nondifferential and differential)

  • uses extended DCT-based or lossless processes

  • decoders shall process scans with 1, 2, 3 and 4 components

  • interleaved and noninterleaved scans

In arithmetic coding of AC coefficients, the length of zero run is no longer limited to 15; it can go up to the end of the block (e.g. 62). Also, arithmetic coding may be made adaptive to increase the coding efficiency. Adaptive means that the probability estimates for each context are developed based on a prior coding decision for that context. The adaptive binary arithmetic coder may use a statistical model to improve encoding efficiency. The statistical model defines the contexts which are used to select the conditional probability estimates used in the encoding and decoding procedures.

5.2.3 Extended DCT-based process

The baseline encoder only supports basic coding tools, which are sufficient for most image compression applications. These include input image with eight-bit/pixel precision, Huffman coding of the run length, and sequential transmission. If other modes, or any input image precision, are required, and in particular if arithmetic coding is employed to achieve higher compression, then the term extended DCT-based process is applied to the encoder. Table 5.4 summarises all the JPEG supported coding modes.

Figure 5.6 illustrates the reconstruction of a decoded image in a sequential mode (baseline or extended). As mentioned, as soon as a block of pixels is coded, its 64 coefficients are quantised, coded and transmitted. The receiver after decoding the coefficients, inverse quantisation and inverse transformation, sequentially adds them to the reconstructed image. Depending on the channel rate, it might take some time to reconstruct the whole image. In Figure 5.6, reconstructed images at 25, 50, 75 and 100 per cent of the image are shown.

click to expand
Figure 5.6: Reconstructed images in sequential mode

In the progressive mode, the quantised coefficients are stored in the local buffer and transmitted later. There are two procedures by which the quantised coefficients in the buffer may be partially encoded within a scan. Firstly, for the highest image quality (lowest quantisation step size) only a specified band of coefficients from the zigzag scanned sequence need to be coded. This procedure is called spectral selection, since each band typically contains coefficients which occupy a lower or higher part of the frequency spectrum for the 8 × 8 block. Secondly, the coefficients within the current band need not be encoded to their full accuracy within each scan (coarser quantisation). On a coefficient's first encoding, a specified number of the most significant bits are encoded first. In subsequent scans, the less significant bits are then encoded. This procedure is called successive approximation or bit plane encoding. Either procedure may be used separately, or they may be mixed in flexible combinations.

Figure 5.7 shows the reconstructed image quality with the first method. In this Figure, the first image is reconstructed from the DC coefficient only, with its full quantised precision. The second image is made up of DC (coefficient 0) plus the AC coefficients 1 and 8, according to the zigzag scan order. That is, after receiving the two new AC coefficients, a new image is reconstructed from these coefficients and the previously received DC coefficients. The third image is made up of coefficients 0, 1, 8, 16, 9, 2, 3, 10, 17 and 24. In the last image, all the significant coefficients (up to EOB) are included.

click to expand
Figure 5.7: Image reconstruction in progressive mode

5.2.4 Hierarchical mode

In the hierarchical mode, an image is coded as a sequence of layers in a pyramid. Each lower size image provides prediction for the next upper layer. Except for the top level of the pyramid, for each luminance and colour component at the lower levels the difference between the source components and the reference reconstructed image is coded. The coding of the differences may be done using only DCT-based processes, only lossless processes, or DCT-based processes with a final lossless process for each component.

Downsampling and upsampling filters, similar to those of Figures 2.4 and 2.6, may be used to provide a pyramid of spatial resolution, as shown in Figure 5.8. The hierarchical coder including the downsampling and upsampling filters is shown in Figure 5.9.

click to expand
Figure 5.8: Hierarchical multiresolution encoding

click to expand
Figure 5.9: A three-level hierarchical encoder

In this Figure, the image is lowpass filtered and subsampled by 4:1, in both directions, to give a reduced image size 1/16. The baseline encoder then encodes the reduced image. The decoded image at the receiver may be interpolated by 1:4 to give the full size image for display. At the encoder, another baseline encoder encodes the difference between the subsampled input image by 2:1 and the 1:2 upsampled decoded image. By repeating this process, the image is progressively coded, and at the decoder it is progressively built up. The bit rate at each level depends on the quantisation step size at that level. Finally, for lossless reversibility of the coded image, the difference between the input image and the latest decoded image is lossless entropy coded (no quantisation).

As we see, the hierarchical mode offers a progressive representation similar to the progressive DCT-based mode, but it is useful in environments which have multiresolution requirements. The hierarchical mode also offers the capability of progressive transmission to a final lossless stage, as shown in Figure 5.9.

5.2.5 Extra features

In coding of colour pictures, encoding is called noninterleaved if all blocks of a colour component are coded before beginning to code the next component. Encoding is interleaved if the encoder compresses a block of 8 × 8 pixels from each component in turn, considering the image format. For example, with the 4:4:4 format, one block from each luminance and two chrominance components are coded. In the 4:2:0 format, the encoder codes four luminance blocks before coding one block from each of Cb and Cr.

The encoder is also flexible to allow different blocks within a single component to be compressed using different quantisation tables resulting in a variation in the reconstructed image quality, based on their perceived relative importance. This method of coding is called region of interest (ROI) coding. Also, the standard can allow different regions within a single image block to be compressed at different rates.



Standard Codecs(c) Image Compression to Advanced Video Coding
Standard Codecs: Image Compression to Advanced Video Coding (IET Telecommunications Series)
ISBN: 0852967101
EAN: 2147483647
Year: 2005
Pages: 148
Authors: M. Ghanbari

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net