10.11 Scalability

We have covered scalability in the standard codecs several times. Because MPEG-4 is a content-based codec, its scalability can differ from that of the other standard codecs. However, since MPEG-4 is also used as a frame-based video codec, the scalability methods discussed so far can be used in this codec as well. Here we introduce two further methods of scalability that are defined only for MPEG-4.

10.11.1 Fine granularity scalability

The scalability methods we have seen so far, SNR, spatial and temporal, are normally carried out in two or a small number of layers. These layers give only coarse steps in quality, spatial resolution and temporal resolution. For natural scenes a few spatial and temporal resolutions are quite acceptable, as there is no practical use for more resolution levels of these kinds. Hence, for frame-based video, MPEG-4 also recommends spatial and temporal scalability as discussed so far.

On the other hand, SNR scalability has the potential to be represented in many more quality levels, and each increment in quality is both practical and appreciated by the human observer. Thus in MPEG-4, instead of SNR scalability, a closely related method named fine granularity scalability (FGS) is recommended. In this method the base layer is coded similarly to the base layer of an SNR scalable coder, namely by coding the video at a given frame rate with a relatively large quantiser step size. The difference between the original DCT coefficients and the quantised coefficients of the base layer (the base layer quantisation distortion) is then represented in bit planes, rather than being quantised with a finer quantiser step size as is done in the SNR scalable coder. Starting from the highest bit plane that contains nonzero bits, each bit plane is successively coded using run length coding, on a block by block basis. The codewords for the run lengths can be derived from either Huffman or arithmetic coding. Typically, different codebooks are used for different bit planes, because the run length distributions differ from one bit plane to another.
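The bit plane representation described above can be illustrated with a minimal Python sketch for a single block. It is not the normative MPEG-4 procedure: the function names, the truncating uniform quantiser, the separate handling of signs (only magnitudes are split into planes) and the simplified run symbols (the number of zeros before each 1 bit, with no end-of-plane flag and no entropy coding) are all assumptions made for illustration.

```python
def base_layer_residual(coeffs, step):
    """Base layer quantisation distortion for one block of DCT coefficients.

    Assumes a uniform quantiser that truncates towards zero (illustrative only).
    """
    levels = [int(c / step) for c in coeffs]            # base layer indices
    return [c - l * step for c, l in zip(coeffs, levels)]


def bit_planes(residual):
    """Split residual magnitudes into bit planes, most significant plane first.

    Signs are ignored here; a real coder would send them separately.
    """
    mags = [abs(r) for r in residual]
    top = max(mags, default=0).bit_length()             # highest nonzero plane
    return [[(m >> p) & 1 for m in mags] for p in range(top - 1, -1, -1)]


def run_lengths(plane):
    """Number of zeros preceding each 1 bit in a bit plane (simplified symbols)."""
    runs, run = [], 0
    for bit in plane:
        if bit:
            runs.append(run)
            run = 0
        else:
            run += 1
    return runs


def magnitudes_from_planes(received, total_planes, n):
    """Rebuild residual magnitudes from the first len(received) bit planes."""
    mags = [0] * n
    for i, plane in enumerate(received):
        shift = total_planes - 1 - i
        for k, bit in enumerate(plane):
            mags[k] |= bit << shift
    return mags


if __name__ == "__main__":
    coeffs = [45, -22, 9, 0, 3, -1, 0, 0]               # hypothetical coefficients
    residual = base_layer_residual(coeffs, step=16)
    planes = bit_planes(residual)
    for plane in planes:
        print(plane, run_lengths(plane))
    # Truncating the enhancement stream after any plane still gives a usable,
    # coarser approximation of the residual magnitudes.
    print(magnitudes_from_planes(planes[:2], len(planes), len(coeffs)))
```

Because the planes are sent from most to least significant, the enhancement bit stream can be cut after any plane and the decoder still refines the base layer, which is where the fine granularity comes from.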

10.11.2 Object-based scalability

In the various scalability methods described so far, including FGS, the scalability operation is applied to the entire frame. In object-based scalability, the scalability operations are applied to individual objects. Of particular interest to MPEG-4 is object-based temporal scalability (OTS), where the frame rate of a selected object is enhanced so that its motion is smoother than that of the remaining area. That is, temporal scalability is applied to some objects to increase their frame rates relative to the other objects in the frames.

MPEG-4 defines two types of OTS. In type-1, video object layer 0 (VOL0) comprises both the background and the object of interest. The higher frame rate for the object of interest is coded in VOL1, as shown in Figure 10.35 for a predictively coded enhancement layer and in Figure 10.36 for a bidirectionally coded enhancement layer.

Figure 10.35: OTS enhancement structure of type-1, with predictive coding of VOL

Figure 10.36: OTS enhancement structure of type-1, with bidirectional coding of VOL

In type-2 OTS, the background is separated from the object of interest, as shown in Figure 10.37. The background, VO0, is sent at a low frame rate without scalability. The object of interest is coded at two frame rates, with the base layer in VOL0 and the enhancement layer in VOL1, as shown in the figure.

Figure 10.37: OTS enhancement structure of type-2
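The difference between the two OTS types can be summarised by listing which layer carries which time instants. The Python sketch below does this under stated assumptions: the frame rates are hypothetical integers, the enhancement layer is assumed to carry only the instants not already present in the base layer, and the function names and dictionary keys are illustrative rather than taken from the standard.

```python
from fractions import Fraction


def sample_times(rate_hz, duration_s):
    """Time instants (in seconds) of the frames at an integer rate_hz."""
    return [Fraction(k, rate_hz) for k in range(rate_hz * duration_s)]


def ots_schedule(ots_type, base_hz, enh_hz, duration_s, background_hz=None):
    """Map each layer to the time instants it carries (illustrative only).

    base_hz is the base layer frame rate, enh_hz the full rate reached when
    VOL1 is decoded as well. background_hz applies to type-2 only.
    """
    base = sample_times(base_hz, duration_s)
    base_set = set(base)
    # VOL1 adds only the instants missing from the base layer.
    extra = [t for t in sample_times(enh_hz, duration_s) if t not in base_set]
    if ots_type == 1:
        # VOL0 holds background and object together; VOL1 holds the object only.
        return {"VOL0": base, "VOL1": extra}
    if ots_type == 2:
        # Background is a separate, non-scalable object at its own low rate.
        bg_hz = background_hz if background_hz is not None else base_hz
        return {"VO0": sample_times(bg_hz, duration_s),
                "VOL0": base, "VOL1": extra}
    raise ValueError("ots_type must be 1 or 2")


if __name__ == "__main__":
    for name, times in ots_schedule(2, 10, 30, 1, background_hz=5).items():
        print(name, len(times), "frames")
```

With a hypothetical 10 Hz base layer and a 30 Hz enhanced rate, VOL1 carries the 20 instants per second that fall between base layer frames, so only the object of interest is displayed at the higher rate.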