7.4 Encoder

As mentioned, the international standard does not specify the design of the video encoders and decoders. It only specifies the syntax and semantics of the bit stream and the signal processing at the encoder/decoder interface. Options are therefore left open to video codec manufacturers to trade off cost, speed, picture quality and coding efficiency. As a guideline, Figure 7.7 shows a block diagram of an MPEG-1 encoder. Again, it is similar to the generic codec of Chapter 3 and the H.261 codec of Chapter 6. For simplicity the coding flags shown in the H.261 codec are omitted, although they also exist.

Figure 7.7: A simplified MPEG-1 video encoder

The main differences between this encoder and that defined in H.261 are:

Frame reordering: at the input of the encoder, coding of B-pictures is postponed until the anchor I- and P-pictures have been coded.
Quantisation: the DCT coefficients of intraframe coded macroblocks are subjectively weighted, so that the quantisation distortion follows the perceived visibility of coding distortions.
Motion estimation: the search range is extended and the search precision is increased to half a pixel. B-pictures use bidirectional motion compensation.
No loop filter.
Frame store and predictors: to hold two anchor pictures for prediction of B-pictures.
Rate regulator: needed because there is more than one type of picture, each generating a different bit rate.
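
The frame reordering in the first item can be illustrated with a minimal Python sketch. It assumes the picture types are given as a string in display order (a representation chosen here for illustration; the function name coding_order is hypothetical) and simply holds back each B-picture until the next anchor picture has been emitted.

```python
def coding_order(display_types):
    """Reorder picture indices from display order to coding order.

    display_types is a string such as "IBBPBBP", one letter per picture
    in display order. B-pictures are held back until the following
    anchor picture (I or P) has been placed in the coding order.
    """
    order = []
    pending_b = []
    for index, picture_type in enumerate(display_types):
        if picture_type == "B":
            # A B-picture needs its future anchor first, so postpone it.
            pending_b.append(index)
        else:
            # Anchor picture: code it, then the B-pictures waiting for it.
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    # Assumption of this sketch: trailing B-pictures with no following
    # anchor are simply emitted last.
    order.extend(pending_b)
    return order


print(coding_order("IBBPBBP"))  # [0, 3, 1, 2, 6, 4, 5]
```

For the display sequence I B B P B B P, the pictures are coded in the order 0, 3, 1, 2, 6, 4, 5, so each B-picture follows both of its anchors.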

Before describing how each picture type is coded, and the main differences between this codec and H.261, we can describe the codec on a macroblock basis, since the macroblock is the basic unit of coding. Within each picture, macroblocks are coded in sequence from left to right and from top to bottom. Since the 4:2:0 image format is used, the six blocks of 8 × 8 pixels, four luminance and one of each chrominance component, are coded in turn. Note that the picture area covered by the four luminance blocks is the same as that covered by each of the chrominance blocks.
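
As an illustration of this macroblock structure, the following Python sketch extracts the six 8 × 8 blocks of one macroblock from a 4:2:0 picture. The planes are assumed to be plain lists of rows with dimensions that are multiples of 16, and the function names are hypothetical; none of this is prescribed by the standard.

```python
def macroblock_blocks(y, cb, cr, mb_row, mb_col):
    """Return the six 8x8 blocks of one macroblock: Y0, Y1, Y2, Y3, Cb, Cr.

    y is the luminance plane as a list of rows. cb and cr are the
    chrominance planes, at half the luminance resolution in each
    direction (4:2:0).
    """
    def block(plane, top, left):
        return [row[left:left + 8] for row in plane[top:top + 8]]

    top, left = 16 * mb_row, 16 * mb_col
    # Four luminance blocks: top-left, top-right, bottom-left, bottom-right.
    luma = [block(y, top + dy, left + dx) for dy in (0, 8) for dx in (0, 8)]
    # One 8x8 block per chrominance component covers the same 16x16 area.
    chroma = [block(cb, 8 * mb_row, 8 * mb_col),
              block(cr, 8 * mb_row, 8 * mb_col)]
    return luma + chroma


def macroblocks_in_raster_order(width, height):
    """Yield (mb_row, mb_col) from left to right, then top to bottom."""
    for mb_row in range(height // 16):
        for mb_col in range(width // 16):
            yield mb_row, mb_col
```

Each chrominance block is addressed with offsets of 8 rather than 16, which reflects the fact that a single 8 × 8 chrominance block covers the same picture area as the four luminance blocks together.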

First, for a given macroblock, the coding mode is chosen. This depends on the picture type, the effectiveness of motion compensated prediction in that local region and the nature of the signal within the block. Secondly, depending on the coding mode, a motion compensated prediction of the contents of the block is formed from the past and/or future reference pictures. This prediction is subtracted from the actual data in the current macroblock to form an error signal. Thirdly, this error signal is divided into 8 × 8 blocks and a DCT is performed on each block. The resulting two-dimensional 8 × 8 block of DCT coefficients is quantised and scanned in zigzag order to convert it into a one-dimensional string of quantised DCT coefficients. Fourthly, the side information for the macroblock, including the type, block pattern, motion vector and address, is coded alongside the DCT coefficients. For maximum efficiency, all the data are variable length coded. The DCT coefficients are run length coded with the generation of events, as discussed for H.261.
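
The second and third steps can be sketched for a single 8 × 8 block as follows. This is only an illustration under simplifying assumptions: a uniform quantiser with a single step size is used in place of the weighting matrices of the standard, every coefficient (including DC) is treated alike, the events are left as (run, level) pairs rather than variable length codes, and all function names are hypothetical.

```python
import math

N = 8


def dct_2d(block):
    """Forward 8x8 DCT with orthonormal scaling (direct, unoptimised form)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def zigzag_order():
    """Return the (row, col) positions of an 8x8 block in zigzag scan order."""
    return sorted(
        ((r, c) for r in range(N) for c in range(N)),
        # Walk the anti-diagonals; the direction alternates on each one.
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def run_level_events(scanned):
    """Convert a scanned coefficient string into (run, level) events + EOB."""
    events = []
    run = 0
    for level in scanned:
        if level == 0:
            run += 1
        else:
            events.append((run, level))
            run = 0
    events.append("EOB")  # trailing zeros are represented by end-of-block
    return events


def code_block(current, prediction, step):
    """Prediction error -> DCT -> quantise -> zigzag scan -> events."""
    error = [[current[r][c] - prediction[r][c] for c in range(N)]
             for r in range(N)]
    coeffs = dct_2d(error)
    # Assumption: plain uniform quantiser, rounding to the nearest level.
    quantised = [[int(round(value / step)) for value in row] for row in coeffs]
    scanned = [quantised[r][c] for r, c in zigzag_order()]
    return run_level_events(scanned)
```

For a flat error block, only the DC coefficient survives quantisation, so the whole block reduces to a single event followed by the end-of-block marker.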

A consequence of using different picture types and variable length coding is that the overall bit rate is highly variable. In applications that involve a fixed rate channel, a FIFO buffer is used to match the encoder output to the channel. The status of this buffer may be monitored to control the number of bits generated by the encoder. Controlling the quantiser parameter is the most direct way of controlling the bit rate. The international standard specifies an abstract model of the buffering system (the video buffering verifier) in order to limit the maximum variability in the number of bits that are used for a given picture. This ensures that a bit stream can be decoded with a buffer of known size (see section 7.8).
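
The buffer feedback described above can be sketched as follows. The linear mapping from buffer fullness to quantiser parameter is an assumption made for illustration, since the standard does not prescribe a rate control method, and the function names are hypothetical. The range 1 to 31 corresponds to the 5-bit quantiser scale of MPEG-1.

```python
def quantiser_scale(buffer_bits, buffer_size):
    """Map buffer occupancy to a quantiser parameter in the range 1..31.

    Assumption: a simple linear mapping. A fuller buffer gives a coarser
    quantiser, so fewer bits are generated.
    """
    fullness = buffer_bits / buffer_size
    return max(1, min(31, round(31 * fullness)))


def update_buffer(buffer_bits, picture_bits, channel_bits_per_picture):
    """Add the bits of one coded picture and drain one picture period.

    The occupancy is clamped at zero; in this sketch the caller is
    expected to check for overflow against the buffer size.
    """
    return max(0, buffer_bits + picture_bits - channel_bits_per_picture)
```

In this sketch, a large I-picture raises the buffer occupancy, which coarsens the quantiser for the pictures that follow until the channel has drained the excess.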