10.9 Coding of still images

MPEG-4 also supports coding of still images with high coding efficiency as well as spatial and SNR scalability. The coding principle is based on the discrete wavelet transform, which was described at some length in Chapter 4. After quantisation, the lowest subband is coded with differential pulse code modulation (DPCM), and the higher bands are coded with a variant of the embedded zero tree wavelet (EZW) algorithm [18]. The quantised DPCM and zero tree data are then entropy coded with an arithmetic coder. Figure 10.25 shows a block diagram of the still image encoder.

Figure 10.25: Block diagram of a wavelet-based still image encoder

The following sections describe each part of the encoder.

10.9.1 Coding of the lowest band

The wavelet coefficients of the lowest band are coded independently of the other bands. They are uniformly quantised and then DPCM coded. The prediction for coding a wavelet coefficient w_x is taken from one of its neighbouring coefficients, w_A or w_C, according to:

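$$
w_{prd} =
\begin{cases}
w_C & \text{if } |w_A - w_B| < |w_B - w_C| \\
w_A & \text{otherwise}
\end{cases}
\tag{10.15}
$$

where the diagonal neighbour w_B is used only to choose between w_A and w_C.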

The difference between the actual wavelet coefficient w_x and its predicted value w_prd is coded. The positions of the neighbouring coefficients are shown in Figure 10.26.

Figure 10.26: Prediction for coding the lowest band coefficients
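The lowest band coding can be sketched in Python as follows. This is a minimal sketch, assuming that w_A, w_B and w_C are the left, upper-left and upper neighbours of w_x, that the gradient rule commonly used for MPEG-4 DC prediction selects between them (predict from w_C if |w_A - w_B| < |w_B - w_C|, otherwise from w_A), and that neighbours outside the band count as zero. The function names `predict`, `dpcm_lowest_band` and `inverse_dpcm` and the parameter `step` are hypothetical. Prediction is applied to the already quantised values, so a decoder scanning in the same raster order can form identical predictions.

```python
import numpy as np


def predict(q: np.ndarray, r: int, c: int) -> int:
    """Predict q[r, c] from its causal neighbours (assumed MPEG-4 style rule).

    A = left, B = upper-left, C = above; neighbours outside the band are 0.
    """
    a = int(q[r, c - 1]) if c > 0 else 0
    b = int(q[r - 1, c - 1]) if r > 0 and c > 0 else 0
    up = int(q[r - 1, c]) if r > 0 else 0
    # A small change between A and B hints at vertical correlation, so use C.
    return up if abs(a - b) < abs(b - up) else a


def dpcm_lowest_band(band: np.ndarray, step: float):
    """Uniformly quantise the band, then DPCM code it in raster order."""
    q = np.round(band / step).astype(int)
    residual = np.zeros_like(q)
    for r in range(q.shape[0]):
        for c in range(q.shape[1]):
            residual[r, c] = q[r, c] - predict(q, r, c)
    return q, residual


def inverse_dpcm(residual: np.ndarray) -> np.ndarray:
    """Decoder side: rebuild the quantised band from the residuals."""
    q = np.zeros_like(residual)
    for r in range(residual.shape[0]):
        for c in range(residual.shape[1]):
            # Only already-decoded neighbours are used, so this matches the encoder.
            q[r, c] = residual[r, c] + predict(q, r, c)
    return q
```

Because the encoder predicts from quantised values, `inverse_dpcm(residual)` reproduces the quantised band exactly.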

The DPCM-coded coefficients are encoded with an adaptive arithmetic coder. First, the minimum value of the coefficients in the band is found. This value, known as band_offset, is subtracted from all the coefficients so that their lower bound becomes zero. The maximum value of the resulting coefficients, band_max_value, is also calculated. Both values are included in the bit stream.
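The offset step is simple enough to show directly. In this sketch the names `offset_band` and `restore_band` are hypothetical, and band_max_value is assumed to be measured after the shift, so that the arithmetic coder model covers the range 0 to band_max_value.

```python
def offset_band(coeffs):
    """Shift DPCM-coded values so that the smallest becomes 0 (sketch)."""
    band_offset = min(coeffs)
    shifted = [v - band_offset for v in coeffs]
    band_max_value = max(shifted)  # assumed: measured after the shift
    return band_offset, band_max_value, shifted


def restore_band(shifted, band_offset):
    """Decoder side: undo the shift using the transmitted band_offset."""
    return [v + band_offset for v in shifted]
```

For example, `offset_band([-3, 2, 0, 5])` gives a band_offset of -3, a band_max_value of 8 and the shifted values `[0, 5, 3, 8]`.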

For adaptive arithmetic coding [19], the arithmetic coder model is initialised at the start of coding with a uniform distribution over the range 0 to band_max_value. After each quantised, DPCM-coded coefficient is arithmetic coded, it is added to the distribution. As encoding progresses, the model thus adapts itself to the distribution of the coded coefficients, which is what makes the arithmetic coding adaptive.
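The output length of an arithmetic coder approaches the sum of -log2 p over the coded symbols, so the effect of the adaptive model can be sketched without writing the coder itself. This minimal sketch assumes a simple frequency-count model: every symbol in 0..band_max_value starts with a count of 1, and the count of each symbol is incremented by 1 after it is coded. The class `AdaptiveModel`, the function `ideal_code_length` and the increment of 1 are illustrative assumptions, not the standard's exact model.

```python
import math


class AdaptiveModel:
    """Frequency-count model over the symbols 0..band_max_value."""

    def __init__(self, band_max_value: int):
        # Uniform initial distribution: every symbol starts with count 1.
        self.counts = [1] * (band_max_value + 1)
        self.total = band_max_value + 1

    def probability(self, symbol: int) -> float:
        return self.counts[symbol] / self.total

    def update(self, symbol: int) -> None:
        # After coding a symbol, add it to the distribution.
        self.counts[symbol] += 1
        self.total += 1


def ideal_code_length(symbols, band_max_value: int) -> float:
    """Bits an ideal arithmetic coder would spend with the adaptive model."""
    model = AdaptiveModel(band_max_value)
    bits = 0.0
    for s in symbols:
        bits -= math.log2(model.probability(s))
        model.update(s)
    return bits
```

The first symbol always costs log2(band_max_value + 1) bits, but a skewed source soon costs far less than that per symbol, because its frequent values accumulate probability.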

10.9.2 Coding of higher bands

For efficient compression of higher bands as well as for a wide range of scalability, the higher order wavelet coefficients are coded with the embedded zero tree wavelet (EZW) algorithm first introduced by Shapiro [18]. Details of this coding technique were given in Chapter 4. Here we show how it is used within the MPEG-4 still image coding algorithm.

Figure 10.27 shows a multiscale zero tree coding algorithm, based on EZW, for coding the higher bands. The wavelet coefficients are first quantised with a quantiser Q₀. The quantised coefficients are scanned using the zero tree concept (exploiting similarities among bands of the same orientation) and then entropy coded with an arithmetic coder (AC). The generated bits form the first portion of the bit stream, the base layer data BS₀. The quantised coefficients of the base layer are inverse quantised and subtracted from the input wavelet coefficients, and the residual quantisation distortions are requantised by another quantiser, Q₁. These are then zero tree scanned (ZTS) and entropy coded to form the second portion of the bit stream, BS₁. The procedure is repeated for all the quantisers, Q₀ to Q_N, to generate N + 1 layers of the bit stream.

Figure 10.27: A multiscale encoder of higher bands
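The layered structure of Figure 10.27 can be sketched as below, leaving out the zero tree scan and the arithmetic coder. This minimal sketch assumes a uniform dead-zone quantiser that maps |x| < q to zero (a dead zone of width 2q) and reconstructs a nonzero index at the midpoint of its interval; these choices, the step sizes and the function names are illustrative assumptions.

```python
import math


def quantise(x: float, q: float) -> int:
    """Dead-zone uniform quantiser: |x| < q (a zone of width 2q) maps to 0."""
    return int(math.copysign(math.floor(abs(x) / q), x))


def dequantise(idx: int, q: float) -> float:
    """Reconstruct a nonzero index at the midpoint of its interval."""
    return 0.0 if idx == 0 else math.copysign((abs(idx) + 0.5) * q, idx)


def multilayer_encode(coeffs, steps):
    """Return one list of indices per layer, Q0 (coarsest) first."""
    residual = list(coeffs)
    layers = []
    for q in steps:
        idx = [quantise(r, q) for r in residual]
        layers.append(idx)  # each layer would then be zero tree scanned and AC coded
        # Subtract what this layer reconstructs; the next layer codes the rest.
        residual = [r - dequantise(i, q) for r, i in zip(residual, idx)]
    return layers


def multilayer_decode(layers, steps, n_layers):
    """Reconstruct from the first n_layers layers only (SNR scalability)."""
    rec = [0.0] * len(layers[0])
    for idx, q in zip(layers[:n_layers], steps[:n_layers]):
        rec = [r + dequantise(i, q) for r, i in zip(rec, idx)]
    return rec
```

Decoding only the first k layers gives a coarser reconstruction; in this sketch the error after layer k always stays below that layer's step size.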

The quantisers used in each layer are uniform, with a dead zone of twice the quantiser step size of that layer. The step size of each layer is specified by the encoder in the bit stream. Since quantisation is multilayer, the step size of a lower layer is several times that of its immediate upper layer. This is because, for a linear quantiser $Q_i$ with step size $q_i$, the maximum residual quantisation distortion is $q_i$ for coefficients that fall in the dead zone of $Q_i$ and $q_i/2$ for those that are quantised. Hence, for the next layer quantiser $Q_{i+1}$, with step size $q_{i+1}$, to be efficient, $q_{i+1}$ should be several times smaller than $q_i$. If $q_{i+1} = q_i/2$, then $Q_{i+1}$ becomes a bilevel quantiser.
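Why halving the step size gives a bilevel quantiser can be checked numerically. In this hypothetical check, random inputs are quantised by Q_i, and the largest index magnitude that the next quantiser produces on the residuals is recorded. The dead-zone quantiser definition, the input range and the function names are assumptions of the sketch.

```python
import math
import random


def quantise(x: float, q: float) -> int:
    """Dead-zone uniform quantiser: |x| < q (a zone of width 2q) maps to 0."""
    return int(math.copysign(math.floor(abs(x) / q), x))


def dequantise(idx: int, q: float) -> float:
    """Reconstruct a nonzero index at the midpoint of its interval."""
    return 0.0 if idx == 0 else math.copysign((abs(idx) + 0.5) * q, idx)


def max_next_layer_index(q_i: float, q_next: float, trials: int = 10000) -> int:
    """Largest |index| the next-layer quantiser produces on Q_i residuals."""
    rng = random.Random(0)  # fixed seed for a repeatable check
    worst = 0
    for _ in range(trials):
        x = rng.uniform(-100.0, 100.0)
        residual = x - dequantise(quantise(x, q_i), q_i)  # |residual| < q_i
        worst = max(worst, abs(quantise(residual, q_next)))
    return worst
```

With q_{i+1} = q_i/2, every residual index is 0 or ±1 (one bit of magnitude plus a sign), since |residual| < q_i means |residual|/(q_i/2) < 2. A finer step such as q_i/4 needs several levels.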

The number of quantisers indicates the number of SNR-scalable layers, and the quantiser step size in each layer determines the granularity of SNR scalability at that layer. For the finest granularity of SNR scalability, all the layers can use a bilevel (one bit) quantiser. In this case, for optimum coding efficiency, the quantiser step size of each layer is exactly twice that of its immediate upper layer. Multistage quantisation in this mode becomes quantisation by successive approximation, or bit plane encoding, as described in Chapter 4. Here, the number of quantisers equals the number of bit planes required to represent the wavelet transform coefficients. With bilevel quantisation, the maximum number of bit planes, rather than the quantiser step sizes, is specified in the bit stream.
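Bit plane encoding can be illustrated on integer coefficient magnitudes. In this minimal sketch the function names are hypothetical and signs are assumed to be sent separately. The number of planes is what would be signalled in place of step sizes, and each successive plane halves the effective step size.

```python
def bit_planes(mags):
    """Split non-negative integer magnitudes into bit planes, MSB first."""
    n_planes = max(1, max(mags).bit_length())  # signalled in place of step sizes
    planes = [[(m >> p) & 1 for m in mags] for p in range(n_planes - 1, -1, -1)]
    return n_planes, planes


def from_planes(planes):
    """Rebuild magnitudes; each extra plane doubles the precision."""
    values = [0] * len(planes[0])
    for plane in planes:
        values = [2 * v + b for v, b in zip(values, plane)]
    return values
```

For the magnitudes `[5, 0, 12, 3]`, four planes are needed. Decoding only the first two planes gives `[1, 0, 3, 0]`, which is the magnitudes in units of 4, the coarse approximation a decoder would hold at that point.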

As Figure 10.27 shows, the quantised coefficients are zero tree scanned (ZTS) to exploit similarities among bands of the same orientation. Zero tree coding takes advantage of the principle that if a wavelet coefficient in a lower frequency band is insignificant, then all the wavelet coefficients of the same orientation at the same spatial location are also likely to be insignificant. A zero tree exists at a node when its coefficient is zero and all of the node's children are zero trees. The wavelet trees are efficiently represented and coded by scanning each tree from its root in the lowest band through its children, assigning a symbol to each state of the tree.
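The recursive definition of a zero tree translates directly into code. This sketch uses a hypothetical `Node` class holding a quantised coefficient and its children (normally four) in the next finer band of the same orientation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    value: int  # quantised wavelet coefficient
    children: list[Node] = field(default_factory=list)


def is_zero_tree(node: Node) -> bool:
    """True if the coefficient is zero and every child is itself a zero tree."""
    return node.value == 0 and all(is_zero_tree(c) for c in node.children)
```

A single nonzero coefficient anywhere below a node is enough to stop that node from being a zero tree.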

If a multilevel quantiser is used, each node is assigned one of three symbols: zero tree root (ZT), value zero tree root (VZ) or value (V). A zero tree root symbol denotes a coefficient that is the root of a zero tree. When such a symbol is coded, the tree need not be scanned further, because all the coefficients in it are known to be zero. A value zero tree root symbol denotes a node whose coefficient is nonzero and all four of whose children are zero tree roots; the scan of this tree can also stop at this symbol. A value symbol identifies a coefficient, zero or nonzero, some of whose descendants are nonzero. The symbols and the quantised coefficients are then entropy coded with an adaptive arithmetic coder.
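The three-symbol scan can be sketched as a depth-first traversal. This is a minimal sketch with hypothetical names. It does not model the standard's actual scan order or how the standard signals coefficients in the finest band, which have no children; here a leaf is simply coded as ZT when zero and V otherwise.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    value: int  # quantised wavelet coefficient
    children: list[Node] = field(default_factory=list)


def is_zero_tree(node: Node) -> bool:
    return node.value == 0 and all(is_zero_tree(c) for c in node.children)


def scan(node: Node, symbols: list) -> list:
    """Append (symbol, value) pairs for the tree rooted at node."""
    if is_zero_tree(node):
        symbols.append(("ZT", 0))  # whole tree is zero: stop here
    elif node.children and all(is_zero_tree(c) for c in node.children):
        symbols.append(("VZ", node.value))  # nonzero value, zero tree children: stop
    else:
        symbols.append(("V", node.value))  # some descendant is nonzero: go down
        for c in node.children:
            scan(c, symbols)
    return symbols
```

For a root of 5 whose first child is 0 with a single nonzero child 3, and whose other three children are 0, the scan emits V 5, V 0, V 3 and then ZT three times.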

When a bilevel quantiser is used, the value of each coefficient is either 0 or 1, so different types of symbol can be defined, depending on the implementation. Since multilayer bilevel quantisation is in fact quantisation by successive approximation, this mode is exactly the same as symbol coding in EZW. There we defined four symbols, +, -, ZT and Z, where ZT is a zero tree root, Z is an isolated zero within a tree, and + and - mark coefficients that are significant with positive or negative sign (see section 4.5 for details).
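For comparison, one EZW dominant pass at threshold T can be sketched as below. In Shapiro's EZW, a coefficient whose magnitude reaches T is coded as + or - according to its sign. An insignificant coefficient whose descendants are all insignificant is a zero tree root, and any other insignificant coefficient is an isolated zero. This sketch uses hypothetical names and traverses depth-first rather than in EZW's subband scan order. It also omits the refinement pass and the skipping of coefficients found significant in earlier passes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    value: int
    children: list[Node] = field(default_factory=list)


def insignificant_tree(node: Node, T: int) -> bool:
    """True if the node and all its descendants are below the threshold."""
    return abs(node.value) < T and all(insignificant_tree(c, T) for c in node.children)


def dominant_pass(node: Node, T: int, symbols: list) -> list:
    """Append EZW dominant-pass symbols (+, -, ZT, Z) for one threshold."""
    if abs(node.value) >= T:
        symbols.append("+" if node.value > 0 else "-")
    elif insignificant_tree(node, T):
        symbols.append("ZT")  # descendants are implied insignificant
        return symbols
    else:
        symbols.append("Z")  # isolated zero: some descendant is significant
    for c in node.children:
        dominant_pass(c, T, symbols)
    return symbols
```

With T = 32, a root of 10 with children 50 and 1 gives Z, +, ZT. The root is insignificant but has a significant child, so it is an isolated zero rather than a zero tree root.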

In order to achieve both spatial and SNR scalability, two different scanning methods are employed in this scheme. For spatial scalability, the wavelet coefficients are scanned from subband to subband, starting from the lowest frequency band and ending at the highest frequency band. For SNR scalability, the wavelet coefficients are scanned from quantiser to quantiser, that is, layer by layer. The scanning method is signalled in the bit stream.
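The two scanning methods differ only in the order in which the data of each (band, layer) pair are emitted. In this minimal sketch with hypothetical names, bands are numbered from the lowest frequency upwards and layers from Q₀ upwards.

```python
def packet_order(n_bands: int, n_layers: int, mode: str) -> list[tuple[int, int]]:
    """Return (band, layer) pairs in transmission order."""
    if mode == "spatial":  # band by band, lowest frequency band first
        return [(band, layer) for band in range(n_bands) for layer in range(n_layers)]
    if mode == "snr":  # quantiser by quantiser, coarsest layer first
        return [(band, layer) for layer in range(n_layers) for band in range(n_bands)]
    raise ValueError(f"unknown scan mode: {mode}")
```

Truncating a spatially ordered stream leaves the lower bands at full precision, which gives a lower resolution image. Truncating an SNR-ordered stream leaves every band at coarser precision, which gives a full resolution image at a lower SNR.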

10.9.3 Shape adaptive wavelet transform

Shape adaptive wavelet (SA-wavelet) coding is used for compression of arbitrarily shaped textures. SA-wavelet coding differs from regular wavelet coding mainly in its treatment of the boundaries of an arbitrarily shaped texture. It ensures that the number of wavelet coefficients to be coded is exactly the same as the number of pixels in the arbitrarily shaped region, and that coding efficiency at the object boundaries is the same as in the middle of the region. When the object boundary is rectangular, SA-wavelet coding reduces to regular wavelet coding.

The shape information of an arbitrarily shaped region is used in performing the SA-wavelet transform as follows. Within each region, the first row of pixels belonging to that region and the first segment of consecutive pixels in that row are identified. Depending on whether the segment starts at an odd or even coordinate, and whether the number of pixels in the segment is odd or even, the appropriate arrangements for 2:1 downsampling and symmetric extension are made [3].
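The parity-dependent split of a segment can be sketched for one row. This sketch assumes a global-coordinate subsampling convention: low-pass coefficients are kept at even positions and high-pass coefficients at odd positions. It omits the symmetric extension at the segment ends and any special handling of one-pixel segments, and the function name is hypothetical.

```python
def sa_split(start: int, length: int) -> tuple[list[int], list[int]]:
    """Positions of low-pass and high-pass outputs for one row segment."""
    positions = range(start, start + length)
    low = [p for p in positions if p % 2 == 0]  # assumed: even -> low-pass
    high = [p for p in positions if p % 2 == 1]  # assumed: odd -> high-pass
    return low, high
```

Under this convention, a 5-pixel segment starting at an even coordinate yields 3 low-pass and 2 high-pass coefficients, while the same segment starting at an odd coordinate yields 2 and 3. In every case the total equals the number of pixels in the segment.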

Coding of the SA-wavelet coefficients is the same as coding of regular wavelet coefficients, except that a modification is needed to handle partial wavelet trees, which contain coefficients corresponding to pixels outside the shape boundary. Such wavelet coefficients are called out nodes of the wavelet trees. The lowest band is coded as for the regular wavelet, but the out nodes are not coded. For the higher bands, regular zero tree coding is applied to any wavelet tree without out nodes. For a partial tree, a minor modification of regular zero tree coding is needed to deal with the out nodes.

If an entire branch of a partial tree consists only of out nodes, no coding is needed for this branch, because the shape information available to the decoder indicates this case. If a parent node is not an out node, all of its out-node children are set to zero, so that the out nodes do not affect the status of the parent node as a zero tree root or isolated zero; at the decoder, the shape information identifies these zero values as out nodes. If the parent node is an out node and not all of its children are out nodes, there are two possible cases. In the first case, the children that are not out nodes are all zero. This case is treated as a zero tree root, and there is no need to go down the tree; the shape information indicates which children are zeros and which are out nodes. In the second case, at least one of the children that are not out nodes is nonzero. Here the out-node parent is set to zero, the shape information tells the decoder that it is an out node, and coding continues further down the tree. Hence no separate symbol is needed for out nodes.
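The out-node rules amount to treating every out node as a zero coefficient and skipping branches that lie entirely outside the shape. The sketch below applies ZT, VZ and V scanning in that way. The `out` flag would in practice be derived from the decoded shape, and all names are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    value: int
    out: bool = False  # True if the coefficient lies outside the shape
    children: list[Node] = field(default_factory=list)


def val(n: Node) -> int:
    return 0 if n.out else n.value  # out nodes are treated as zeros


def all_out(n: Node) -> bool:
    return n.out and all(all_out(c) for c in n.children)


def is_zero_tree(n: Node) -> bool:
    return val(n) == 0 and all(is_zero_tree(c) for c in n.children)


def scan(n: Node, symbols: list) -> list:
    """Zero tree scan (ZT, VZ, V) that respects out nodes."""
    if all_out(n):
        return symbols  # branch entirely outside the shape: nothing is coded
    if is_zero_tree(n):
        symbols.append(("ZT", 0))  # includes an out parent whose in-shape children are zero
    elif n.children and all(is_zero_tree(c) for c in n.children):
        symbols.append(("VZ", val(n)))  # out children count as zero trees here
    else:
        symbols.append(("V", val(n)))  # an out parent is sent as 0; go down
        for c in n.children:
            scan(c, symbols)
    return symbols
```

An out parent with a nonzero in-shape child is sent as V 0 followed by its child, which is the second case above, while the value stored in any out node never influences the symbols.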