7.7 Coding of pictures

Since the encoder was described in terms of the macroblock as its basic unit, the picture types can be defined in terms of their macroblock types. Each of these picture types is defined in the following.

7.7.1 I-pictures

In I-pictures all the macroblocks are intra coded. There are two intra macroblock types: one that uses the current quantiser scale, intra-d, and the other that defines a new value for the quantiser scale, intra-q. Intra-d is the default type, used when the quantiser scale is not changed. Although these two types could be identified with a single bit (0 or 1), so that no variable length code would be required, the standard has foreseen some possible extensions to the macroblock types in the future. For this reason, they are VLC coded: intra-d is assigned the code 1, and intra-q the code 01. Codes that start with 0 therefore remain open for extensions. The policy of making the coding tables open in this way was adopted by the MPEG group video committee in developing the international standard. The advantage of future extensions was judged to be worth the slight coding inefficiency.

If the macroblock type is intra-q, then the macroblock overhead contains an extra five bits, to define the new quantiser scale between 1 and 31. For intra-d macroblocks, no quantiser scale is transmitted and the decoder uses the previously set value. Therefore the encoder may prefer to use as many intra-d types as possible. However, when the encoding rate is to be adjusted, which normally causes a new quantiser to be defined, the type is changed to intra-q. Note that in H.261 the bit rate is controlled at the start of either a GOB or a row of a GOB, so if there is any intra-q in a GOB, it must be the first MB in that GOB or in a row of that GOB. In I-pictures of MPEG-1, any of the macroblocks can be intra-q.
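The type code and the optional quantiser bits described above can be sketched as follows. This is a minimal illustration only: the function name is hypothetical, the bits are returned as a text string for readability, and the other fields that surround the macroblock type in a real bit stream are not modelled.

```python
def intra_mb_type_bits(new_quantiser_scale=None):
    """Return the type bits of an I-picture macroblock as a string of 0s and 1s.

    new_quantiser_scale is None for intra-d, or an integer 1..31 for intra-q.
    (Hypothetical helper; only the type code and quantiser bits are shown.)
    """
    if new_quantiser_scale is None:
        # intra-d: code '1', the decoder keeps the previously set quantiser scale
        return "1"
    if not 1 <= new_quantiser_scale <= 31:
        raise ValueError("quantiser scale must be between 1 and 31")
    # intra-q: code '01' followed by the new quantiser scale in five bits
    return "01" + format(new_quantiser_scale, "05b")
```

An intra-d macroblock therefore costs one bit of type overhead and an intra-q macroblock seven bits, which is why the encoder prefers intra-d whenever the quantiser scale is unchanged.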

Each block within the MB is DCT coded and the coefficients are divided by the quantiser step size and rounded to the nearest integer. The quantiser step size is derived from the multiplication of the quantisation weighting matrix and the quantiser parameter (1 to 31). Thus the quantiser step size is different for different coefficients and may change from MB to MB. The only exception is the DC coefficients, which are treated differently. Because the eye is sensitive to luminance and chrominance errors over large areas, the accuracy of each DC value should be high and fixed. The quantiser step size for the DC coefficient is therefore fixed to eight. Since in the quantisation weighting matrix the DC weighting element is eight, the quantiser parameter for the DC coefficient is effectively always 1, irrespective of the quantisation parameter used for the remaining AC coefficients.
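The quantisation rule of this paragraph can be sketched as below. The sketch takes the description literally (AC step size equal to weight times quantiser parameter, DC step size fixed at eight) and omits the exact scaling and rounding arithmetic of the standard. The function names are hypothetical, index 0 is assumed to hold the DC coefficient, and rounding half away from zero is an assumption.

```python
DC_STEP = 8  # fixed DC step size, as stated in the text


def round_div(value, step):
    """Divide an integer by a positive step, rounding to the nearest integer
    (halves rounded away from zero, which is an assumption of this sketch)."""
    magnitude = (2 * abs(value) + step) // (2 * step)
    return magnitude if value >= 0 else -magnitude


def quantise_intra_block(coeffs, weights, quantiser_parameter):
    """Quantise one intra block.

    coeffs and weights are sequences in the same order, with index 0 taken
    to be the DC coefficient; quantiser_parameter is in the range 1..31.
    """
    if not 1 <= quantiser_parameter <= 31:
        raise ValueError("quantiser parameter must be between 1 and 31")
    indices = []
    for k, (coeff, weight) in enumerate(zip(coeffs, weights)):
        # DC ignores the quantiser parameter; AC uses weight x parameter
        step = DC_STEP if k == 0 else weight * quantiser_parameter
        indices.append(round_div(coeff, step))
    return indices
```

Changing the quantiser parameter in this sketch alters only the AC indices; the DC index is the same for every parameter value.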

Due to the strong correlation between the DC values of blocks within a picture, the DC indices are coded losslessly by DPCM. Such a correlation does not exist among the AC coefficients, and hence they are coded independently. The prediction for the DC coefficients of luminance blocks follows the coding order of blocks within a macroblock and the raster scan order. For example, in the macroblocks of 4:2:0 format pictures shown in Figure 7.13, the DC coefficient of block Y₂ is used as a prediction for the DC coefficient of block Y₃. The DC coefficient of block Y₃ is a prediction for the DC coefficient of Y₀ of the next macroblock. For the chrominance, the prediction is the DC coefficient of the corresponding block in the previous macroblock.
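The prediction order described here can be sketched as follows. This is an illustration under stated assumptions: the function name and the dictionary layout are hypothetical, and the starting value of the predictors (initial_prediction) is a free parameter because this paragraph does not specify it.

```python
def dc_differentials(macroblocks, initial_prediction=0):
    """Return the DPCM differences of the DC indices of a run of macroblocks.

    Each macroblock is a dict with 'Y' (DC indices of Y0..Y3 in coding order),
    'Cb' and 'Cr' (one DC index each). The layout is hypothetical.
    """
    # One running predictor per component
    prediction = {"Y": initial_prediction,
                  "Cb": initial_prediction,
                  "Cr": initial_prediction}
    result = []
    for mb in macroblocks:
        y_diffs = []
        for dc in mb["Y"]:
            # Luminance: each block is predicted from the previously coded
            # luminance block, so Y3 of one MB predicts Y0 of the next
            y_diffs.append(dc - prediction["Y"])
            prediction["Y"] = dc
        diffs = {"Y": y_diffs}
        for name in ("Cb", "Cr"):
            # Chrominance: predicted from the same component of the previous MB
            diffs[name] = mb[name] - prediction[name]
            prediction[name] = mb[name]
        result.append(diffs)
    return result
```

Because neighbouring DC indices are usually close, most of the differences are small numbers, which are cheaper to code than the indices themselves.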

Figure 7.13: Positions of luminance and chrominance blocks within a macroblock in 4:2:0 format

The differentially coded DC coefficient and the remaining AC coefficients are zigzag scanned, in the same manner as was explained for the H.261 coefficients in Chapter 6. A copy of the coded picture is stored in the frame store to be used for the prediction of the next P-picture and of the past or future B-pictures.
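As a reminder of the scanning order, the conventional zigzag pattern for a square block can be generated as sketched below. The function names are hypothetical, and the code assumes the usual pattern that starts at the DC position and alternates direction along the anti-diagonals; the normative scan table is the one defined in the standard.

```python
def zigzag_order(n=8):
    """Return the flat indices (row * n + column) of an n x n block in
    conventional zigzag order, starting from the DC position."""
    order = []
    for s in range(2 * n - 1):            # s = row + column of an anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)         # even diagonals run bottom-left to top-right
        order.extend(r * n + (s - r) for r in rows)
    return order


def zigzag_scan(block):
    """Read a square block (a list of rows) out in zigzag order."""
    n = len(block)
    return [block[i // n][i % n] for i in zigzag_order(n)]
```

For an 8 × 8 block the order begins 0, 1, 8, 16, 9, 2, so the low frequency coefficients come first and the long runs of zeros at high frequencies are grouped at the end of the scan.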

7.7.2 P-pictures

As in I-pictures, each P-picture is divided into slices, which are in turn divided into macroblocks and then blocks for coding. Coding of P-pictures is more complex than of I-pictures, since motion compensated blocks may be constructed. For inter macroblocks, the difference between the motion compensated macroblock and the current macroblock is partitioned into blocks, and then DCT transformed and coded.

Decisions on the type of a macroblock, or whether motion compensation should be used or not, are similar to those for H.261 (see Chapter 6). Other H.261 coding tools, such as differential encoding of motion vectors, coded block pattern, zigzag scan, nature of variable length coding etc. are similar. In fact, coding of P-pictures is the same as coding each frame in H.261 with two major differences:

Motion estimation has a half pixel precision and, due to larger distances between the P-frames, the motion estimation range is much larger.
In MPEG-1 all intra MBs use the quantisation weighting matrix, whereas in H.261 all MBs use a flat matrix. Also in MPEG-1 the DC coefficients of the intra MBs of P-pictures are predictively coded like those of I-pictures, with the exception that the prediction value is fixed at 128 × 8 if the previous macroblock is not intra coded.
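The reset of the DC prediction in P-pictures can be sketched for the luminance blocks as follows. The function name and the data layout are hypothetical, and starting the first macroblock from the reset value is an assumption of this sketch rather than something stated above.

```python
RESET_PREDICTION = 128 * 8  # fixed prediction value given in the text


def p_picture_luma_dc_differences(macroblocks):
    """Return the luminance DC differences of the intra MBs in a P-picture run.

    Each entry of macroblocks is either a list of four luminance DC values
    (an intra MB) or None (a macroblock that is not intra coded).
    """
    prediction = RESET_PREDICTION  # assumed starting value
    result = []
    for mb in macroblocks:
        if mb is None:
            # The previous MB is not intra: the next intra MB starts from 128 x 8
            prediction = RESET_PREDICTION
            result.append(None)
            continue
        diffs = []
        for dc in mb:
            diffs.append(dc - prediction)
            prediction = dc
        result.append(diffs)
    return result
```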

Locally decoded P-pictures are stored in the frame store for further prediction. Note that, if B-pictures are used, two buffer stores are needed to store two prediction pictures.

7.7.3 B-pictures

As in I and P-pictures, B-pictures are divided into slices, which in turn are divided into macroblocks for coding. Due to the possibility of bidirectional motion compensation, coding is more complex than for P-pictures. Thus the encoder has more decisions to make than in the case of P-pictures. These are: how to divide the picture into slices, determining the best motion vectors to use, deciding whether to use forward, backward or interpolated motion compensation or to code intra, and how to set the quantiser scale. These make processing of B-pictures computationally very intensive. Note that motion compensation is the most costly operation in the codecs, and for every macroblock both forward and backward motion compensations have to be performed.

The encoder does not need to store decoded B-pictures, since they are not used for prediction. Hence B-pictures can be coded with larger distortions. In this regard, to reduce the slice overhead, larger slices (fewer slices in the picture) may be chosen.

In P-pictures, as for H.261, there are eight different types of macroblock. In B-pictures, due to backward motion compensation and interpolation of forward and backward motion compensation, the number of macroblock types is about 14. Figure 7.14 shows the flow chart for macroblock type decisions in B-pictures.

Figure 7.14: Selection of macroblock types in B-pictures

The decision on the macroblock type starts with the selection of a motion compensation mode based on the minimisation of a cost function. The cost function is the mean squared error or the mean absolute error of the luminance difference between the motion compensated macroblock and the current macroblock. The encoder first calculates the best motion compensated macroblock from the previous anchor picture, for forward motion compensation. It then calculates the best motion compensated macroblock from the future anchor picture, for backward motion compensation. Finally, the two motion compensated macroblocks are averaged to produce the interpolated macroblock. The encoder then selects the mode that gives the smallest error with respect to the current macroblock. In the event of a tie, the interpolated mode is chosen.
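The mode selection just described can be sketched as follows, using the mean absolute error as the cost function. The function names are hypothetical, the macroblocks are taken as flat sequences of luminance samples, and the rounding used when averaging is an assumption. Ties that involve the interpolated mode resolve to it; the order between forward and backward in any other tie is arbitrary in this sketch.

```python
def mean_absolute_error(current, prediction):
    """Mean absolute luminance difference between two equally sized macroblocks."""
    return sum(abs(c - p) for c, p in zip(current, prediction)) / len(current)


def select_prediction_mode(current, forward, backward):
    """Pick 'forward', 'backward' or 'interpolated' for a B-picture macroblock.

    forward and backward are the best motion compensated macroblocks from the
    previous and future anchor pictures, given as flat sequences of samples.
    """
    # Interpolated prediction: average of the two predictions (rounding assumed)
    interpolated = [(f + b + 1) // 2 for f, b in zip(forward, backward)]
    # min() returns the first minimum, so listing 'interpolated' first
    # makes it the winner whenever it ties for the lowest cost
    candidates = [("interpolated", interpolated),
                  ("forward", forward),
                  ("backward", backward)]
    best = min(candidates, key=lambda c: mean_absolute_error(current, c[1]))
    return best[0]
```

Note that the motion searches themselves, which dominate the computation, are assumed to have been done before this function is called.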

Another difference between macroblock types in B and P-pictures is in the definition of noncoded and skipped macroblocks. In P-pictures, the skipped MB is the one in which none of its blocks has any significant DCT coefficient (cbp (coded block pattern) = 0), and the motion vector is also zero. The first and the last MB in a slice cannot be declared skipped. They are treated as noncoded.

A noncoded MB in P-pictures is the one in which none of its blocks has any significant DCT coefficient (cbp = 0), but the motion vector is nonzero. Thus the first and the last MB in a slice, which could otherwise be skipped, are noncoded with the motion vector set to zero. In H.261 the noncoded MB was called motion vector only (MC).

In B-pictures, the skipped MB again has all zero DCT coefficients, but the motion vector and the type of prediction mode (forward, backward or interpolated) are exactly the same as those of its previous MB. Similar to P-pictures, the first and the last MB in a slice cannot be declared skipped, and are in fact treated as noncoded.

The noncoded MB in B-pictures has all of its DCT coefficients zero (cbp = 0), but either its motion vector or its prediction mode (or both) is different from that of its previous MB.
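The distinction between skipped and noncoded macroblocks in P and B-pictures can be summarised in a short sketch. The function names, the argument layout and the label "coded" for macroblocks with cbp ≠ 0 are hypothetical; only the rules stated above are implemented.

```python
def classify_p_macroblock(cbp, motion_vector, first_or_last_in_slice):
    """Classify a P-picture macroblock as 'coded', 'skipped' or 'noncoded'."""
    if cbp != 0:
        return "coded"        # at least one block has significant coefficients
    if motion_vector == (0, 0) and not first_or_last_in_slice:
        return "skipped"      # nothing to send: zero vector, no coefficients
    # Nonzero vector, or the first/last MB of a slice (sent with a zero vector)
    return "noncoded"


def classify_b_macroblock(cbp, motion_vector, mode,
                          previous_motion_vector, previous_mode,
                          first_or_last_in_slice):
    """Classify a B-picture macroblock as 'coded', 'skipped' or 'noncoded'.

    mode is 'forward', 'backward' or 'interpolated'.
    """
    if cbp != 0:
        return "coded"
    same_as_previous = (motion_vector == previous_motion_vector
                        and mode == previous_mode)
    if same_as_previous and not first_or_last_in_slice:
        return "skipped"      # repeats the vector and mode of the previous MB
    return "noncoded"         # vector or mode changed, or first/last MB of a slice
```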

7.7.4 D-pictures

D-pictures contain only low frequency information, and are coded as the DC coefficients of the blocks. They are intended to be used for fast visible search modes. A bit is transmitted for the macroblock type, although there is only one type. In addition there is a bit denoting the end of the macroblock. D-pictures are not part of the constrained bit stream.