7.2 Preprocessing

The source material for video coding may exist in a variety of forms, such as computer files or live video in CCIR-601 format [4]. If the source is CCIR-601, then, since MPEG-1 is intended for coding video at VCR resolutions, the SIF format is normally used. These source pictures must be processed prior to coding. In Chapter 2 we explained how CCIR-601 video is converted to SIF format, and we also discussed the conversion methodology for film sources. Computer source files that are not already in SIF format have to be converted too. In MPEG-1, a further preprocessing step is required to reorder the input pictures for coding. This is called picture reordering.

7.2.1 Picture reordering

Because of the conflicting requirements of random access and highly efficient coding, the MPEG committee suggested that not all pictures of a video sequence should be coded in the same way. It identified four types of picture in a video sequence. The first type is the I-pictures, which are coded without reference to any other picture. They provide access points to the coded sequence for decoding. These pictures are intraframe coded as in JPEG, with moderate compression. The second type is the P-pictures, which are predictively coded with reference to the previous I or P-picture. They are themselves used as a reference (anchor) for coding of future pictures. Coding of these pictures is very similar to H.261. The third type is the B-pictures, or bidirectionally coded pictures, which may use past pictures, future pictures or a combination of both in their predictions. This increases the motion compensation efficiency, since occluded parts of moving objects may be better compensated for from the future frame. B-pictures are never themselves used as predictions. This property, which is unique to MPEG, has two important implications:

  1. Since B-pictures are not used for predictions of future frames, they can be coded with the highest possible compression without any side effects. By contrast, if a picture that serves as a prediction is coarsely coded, its coding distortions are transferred to the next frame. That frame then needs more bits to remove the inherited distortions, and the overall bit rate may increase rather than decrease.

  2. In applications such as transmission of video over packet networks, B-pictures may be discarded (e.g. due to buffer overflow) without affecting the next decoded pictures [5]. Note that if any part of an H.261 picture, or of an I or P-picture in MPEG, is corrupted during transmission, the effect will propagate until the affected area is refreshed [6].

Figure 7.3 illustrates the relationship between these three types of picture. Since B-pictures use I and P-pictures as predictions, they have to be coded later. This requires reordering the incoming picture order, which is carried out at the preprocessor.

Figure 7.3: An example of MPEG-1 GOP

The fourth picture type is the D-picture. These pictures are intraframe coded, with only the DC coefficients retained. Hence the picture quality is poor, and they are normally used only for applications such as fast forward. D-pictures are not part of the GOP, hence they are not present in a sequence containing any other picture type.
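As a rough illustration of why D-pictures look so coarse: the DC coefficient of a DCT block is proportional to the average value of the block, so retaining only the DC coefficients amounts to replacing each block by its mean. The sketch below assumes 8 × 8 blocks and ignores quantisation; the function name `dc_only` is ours and is not taken from the standard.

```python
def dc_only(picture, block=8):
    """Approximate a DC-only picture: each block x block tile is replaced
    by its rounded mean value. `picture` is a list of rows of samples.
    Quantisation of the DC coefficient is ignored (assumption)."""
    height = len(picture)
    width = len(picture[0])
    out = [[0] * width for _ in range(height)]
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            tile = [picture[y][x] for y in rows for x in cols]
            mean = round(sum(tile) / len(tile))
            for y in rows:
                for x in cols:
                    out[y][x] = mean
    return out


# Two 8 x 8 blocks side by side: a flat one and a highly detailed one.
row = [10] * 8 + [0, 200] * 4
coarse = dc_only([row[:] for _ in range(8)])
```

The flat block is unchanged, whereas all the detail in the second block collapses to a single grey level, which is the kind of loss that makes D-pictures suitable only for fast visual search.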

7.3 Video structure

7.3.1 Group of pictures (GOP)

Since in the H.261 standard successive frames are all coded in a similar way, a picture is the top level of the coding hierarchy. In MPEG-1, due to the existence of several picture types, a group of pictures, called a GOP, is the highest level of the hierarchy. A GOP is a series of one or more pictures intended to assist random access into the picture sequence. The first coded picture in the group is an I-picture. It is followed by an arrangement of P and B-pictures, as shown in Figure 7.3.

The GOP length is normally defined as the distance between I-pictures, which is represented by the parameter N in the standard codecs. The distance between anchor pictures (from an I or P-picture to the next P-picture) is represented by M. In Figure 7.3, N = 12 and M = 3. The group of pictures may be of any length, but there should be at least one I-picture in each GOP. Applications requiring random access, fast forward play or fast and normal reverse play may use short GOPs. A GOP may also start at a scene cut or in other cases where motion compensation is not effective. The number of consecutive B-pictures is variable, and neither a P nor a B-picture needs to be present. For most applications, a GOP in the SIF-625/50 format has N = 12 and M = 3. In SIF-525/60, the values are 15 and 3, respectively.
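The roles of N and M can be made concrete with a small sketch that lists the picture types of one GOP. It assumes that positions are counted in display order starting from the I-picture and that N is a multiple of M; the function name `gop_pattern` is ours.

```python
def gop_pattern(n, m):
    """Picture types of one GOP in display order, counted from its I-picture.

    n: distance between I-pictures (GOP length, N in the text).
    m: distance between anchor pictures (M in the text).
    Assumes n is a multiple of m, as in the N = 12, M = 3 example.
    """
    types = []
    for k in range(n):
        if k == 0:
            types.append("I")      # one I-picture opens the GOP
        elif k % m == 0:
            types.append("P")      # every m-th picture is an anchor
        else:
            types.append("B")      # the pictures in between are B-pictures
    return "".join(types)


sif_625_gop = gop_pattern(12, 3)   # typical SIF-625/50 setting
sif_525_gop = gop_pattern(15, 3)   # typical SIF-525/60 setting
intra_only = gop_pattern(1, 1)     # a GOP with neither P nor B-pictures
```

With N = 12 and M = 3 this gives one I-picture, three P-pictures and eight B-pictures per GOP, i.e. M - 1 = 2 B-pictures between successive anchors.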

The encoding or transmission order of pictures differs from the display or incoming picture order. In Figure 7.3, B-pictures 1 and 2 are encoded after P-picture 0 and I-picture 3 have been encoded. Also in this figure, B-pictures 13 and 14 are part of the next GOP. Although the display order is 0, 1, 2, ..., 11, the encoding order is 3, 1, 2, 6, 4, 5, .... This reordering introduces a delay of several frames at the encoder (equal to the number of B-pictures between the anchor I and P-pictures). The same amount of delay is introduced at the decoder, in putting the transmission/decoding sequence back into its original order. This delay inevitably limits the application of MPEG-1 for telecommunications.
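The reordering rule can be sketched in a few lines: each anchor (I or P) picture is moved ahead of the B-pictures that precede it in display order, since those B-pictures need it as a future reference. The picture types below follow the description of Figure 7.3 (P-picture 0, B-pictures 1 and 2, I-picture 3 and so on); the function name `coding_order` and the flushing of trailing B-pictures at the end of the list are our assumptions.

```python
def coding_order(display_types):
    """Return the display indices of the pictures in coding order.

    display_types: sequence of 'I', 'P' or 'B', in display order.
    B-pictures are held back until the next anchor has been emitted.
    """
    order = []
    pending_b = []                    # B-pictures waiting for their future anchor
    for index, ptype in enumerate(display_types):
        if ptype == "B":
            pending_b.append(index)
        else:                         # I or P: emit the anchor, then the held B-pictures
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)           # assumption: flush B-pictures left at the end
    return order


# Pictures 0..12 as described for Figure 7.3: P0 B1 B2 I3 B4 B5 P6 ...
figure_types = "PBBIBBPBBPBBP"
figure_order = coding_order(figure_types)
```

After the leading P-picture 0, the result continues 3, 1, 2, 6, 4, 5, ..., in agreement with the encoding order quoted above. The number of pictures held in `pending_b` (two here) is the reordering delay in frames.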

7.3.2 Picture

All three main picture types, I, P and B, have the same SIF size with 4:2:0 format. In SIF-625 the luminance part of each picture has 360 pixels per line and 288 lines at 25 Hz, and each chrominance part has 180 pixels per line and 144 lines at 25 Hz. In SIF-525, the luminance has 360 pixels per line and 240 lines at 30 Hz, and each chrominance has 180 pixels per line and 120 lines at 30 Hz. For 4:2:0 format images, the luminance and chrominance samples are positioned as shown in Figure 2.3.
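To get a feel for these figures, the following sketch computes the uncompressed bit rate of the two SIF formats. It assumes 8 bits per sample, which is not stated in this section, and the function name `raw_bit_rate` is ours.

```python
def raw_bit_rate(width, height, frame_rate, bits_per_sample=8):
    """Uncompressed bit rate (bit/s) of a 4:2:0 picture sequence.

    In 4:2:0 each of the two chrominance components has half the
    luminance resolution both horizontally and vertically.
    bits_per_sample = 8 is an assumption, not taken from the text.
    """
    luminance = width * height
    chrominance = 2 * (width // 2) * (height // 2)
    return (luminance + chrominance) * bits_per_sample * frame_rate


sif_625_rate = raw_bit_rate(360, 288, 25)
sif_525_rate = raw_bit_rate(360, 240, 30)
```

Under this assumption both formats come to the same raw rate of 31 104 000 bit/s, since the smaller number of lines in SIF-525 is exactly offset by its higher picture rate.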

7.3.3 Slice

Each picture is divided into groups of macroblocks, called slices. In H.261 such a group was called a GOB. The reason for defining a slice is the same as that for defining a GOB, namely resetting the variable length code to prevent channel errors from propagating through the picture. Slices can have different sizes within a picture, and the division in one picture need not be the same as the division in any other picture.

Slices can begin and end at any macroblock in a picture, but with some constraints. The first slice must begin at the top left of the picture (the first macroblock) and the last slice must end at the bottom right macroblock (the last macroblock) of the picture, as shown in Figure 7.4. Therefore the minimum number of slices per picture is one, and the maximum number is equal to the number of macroblocks (e.g. 396 in SIF-625).

[Figure: a picture divided into 18 horizontal slices, one per row of macroblocks; each slice n (n = 1 to 18) begins at the left edge of row n and ends at its right edge]

Figure 7.4: An example of slice structure for SIF-625 pictures

Each slice starts with a slice start code, followed by a code that defines its position and a code that sets the quantisation step size. Note that in H.261 the quantisation step sizes were set at each GOB or row of GOBs, but in MPEG-1 they can be set at any macroblock (see below). Therefore, in MPEG-1 the main reason for defining slices is not to reset the quantiser, but to prevent the effects of channel error propagation. If the coded data is corrupted, and the decoder detects it, the decoder can search for the next slice start code and resume decoding from that point. Only the part of the picture from the start of the error to the next slice is then degraded. Therefore in a noisy environment it is desirable to have as many slices as possible. On the other hand, each slice carries an overhead of its own, the slice start code (minimum of 32 bits), which adds to the total bit rate. For example, if we use the slice structure of Figure 7.4, where there is one slice for each row of MBs, then for SIF-625 video there are 18 slices per picture, and with 25 Hz video the slice overhead can be 32 × 18 × 25 = 14 400 bit/s.
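The same arithmetic can be repeated for other slice structures. The sketch below uses the 32-bit minimum quoted above and evaluates the two extremes for SIF-625 (one slice per picture, and one slice per macroblock) alongside the one-slice-per-row case; the function name `slice_overhead` is ours.

```python
def slice_overhead(slices_per_picture, frame_rate, bits_per_slice=32):
    """Minimum slice overhead in bit/s: one start code of at least
    32 bits per slice, for every picture in the sequence."""
    return bits_per_slice * slices_per_picture * frame_rate


one_slice_per_picture = slice_overhead(1, 25)      # fewest possible slices
one_slice_per_row = slice_overhead(18, 25)         # structure of Figure 7.4
one_slice_per_macroblock = slice_overhead(396, 25) # most possible slices
```

The overhead thus ranges from 800 bit/s to 316 800 bit/s for SIF-625, which is why the number of slices has to be traded off against error resilience.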

To optimise the slice structure, that is, to give a good immunity from channel errors and at the same time to minimise the slice overhead, one might use short slices for macroblocks with significant energy (such as intra-MB), and long slices for less significant ones (e.g. macroblocks in B-pictures). Figure 7.5 shows a slice structure where in some parts the slice length extends beyond several rows of macroblocks, and in some cases is less than one row.

Figure 7.5: Possible arrangement of slices in SIF-625

7.3.4 Macroblock

Slices are divided into macroblocks of 16 × 16 pixels, similar to the division of a GOB into macroblocks in H.261. Macroblocks in turn are divided into blocks for coding. In Chapter 6, we gave a detailed description of how a macroblock is coded, starting from its type, mode of selection, the blocks within the MB, their positional addresses and finally the block pattern. Since MPEG-1 is also a macroblock-based codec, most of these rules are used in MPEG-1. However, because MPEG-1 uses slices rather than GOBs, and several picture types rather than the single picture format of H.261, there are bound to be variations in the coding. We first give a general account of these differences and then, in the following section, more details about the macroblocks in the various picture types.

The first difference is that, since a slice has a raster scan structure, macroblocks are addressed in raster scan order. The top left macroblock in a picture has address 0, the next one on the right has address 1 and so on. If there are M macroblocks in a picture (e.g. M = 396), then the bottom right macroblock has address M - 1. To reduce the address overhead, macroblocks are relatively addressed by transmitting the difference between the address of the current macroblock and that of the previously coded macroblock. This difference is called the macroblock address increment. In I-pictures, since all the macroblocks are coded, the macroblock address increment is always 1. The exception is the first coded macroblock of each slice, for which the previous macroblock address is set to that of the right-hand macroblock of the previous row. At the beginning of each picture this address is set to -1. If a slice does not start at the left edge of the picture (see the slice structure of Figure 7.5), then the macroblock address increment for the first macroblock in the slice will be larger than one. For example, in the slice structures of Figures 7.4 and 7.5 there are 22 macroblocks per row. For Figure 7.4, at the start of slice two, the macroblock address is set to 21, which is the address of the macroblock at the right-hand edge of the top row of macroblocks. In Figure 7.5, if the first slice contains 30 macroblocks, eight of them would be in the second row, so the address of the first macroblock in the second slice would be 30 and the macroblock address increment would be nine (30 - 21). For further reduction of address overhead, macroblock address increments are VLC coded.
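The addressing rule can be checked with a short sketch that reproduces the two examples above. It takes the raster-scan addresses of the coded macroblocks of one slice and returns their address increments; the function name `address_increments` is ours, and the default of 22 macroblocks per row is the SIF value used in the text.

```python
def address_increments(coded_addresses, mbs_per_row=22):
    """Macroblock address increments for the coded macroblocks of one slice.

    coded_addresses: raster-scan addresses of the coded macroblocks,
    in increasing order. At the start of the slice the previous address
    is reset to the right-hand macroblock of the previous row, which is
    -1 for a slice that starts in the top row of the picture.
    """
    first_row = coded_addresses[0] // mbs_per_row
    previous = first_row * mbs_per_row - 1
    increments = []
    for address in coded_addresses:
        increments.append(address - previous)
        previous = address
    return increments


top_of_picture = address_increments([0, 1, 2])       # first slice of a picture
slice_two_fig_7_4 = address_increments([22, 23, 24])  # slice starting at a left edge
slice_two_fig_7_5 = address_increments([30, 31, 32])  # slice starting mid-row
```

For the slice of Figure 7.5 that starts at address 30, the first increment is 30 - 21 = 9, and all later increments in an I-picture are 1.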

There is no code to indicate a macroblock address increment of zero. This is why the macroblock address is set to -1 rather than zero at the top of the picture. The first macroblock will have an increment of one, making its address equal to zero.

7.3.5 Block

Finally, the smallest part of the picture structure is the block of 8 × 8 pixels, for both luminance and chrominance components. DCT coding is applied at this block level. Figure 7.6 illustrates the whole structure of partitioning a video sequence, from the GOP level at the top to the smallest unit, the block, at the bottom.

Figure 7.6: MPEG-1 coded video structure
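As a numerical summary of the lower levels of this hierarchy, the sketch below counts the macroblocks and blocks in one SIF-625 picture. It uses the 22 macroblocks per row and 18 rows quoted earlier, and assumes that each 4:2:0 macroblock holds four 8 × 8 luminance blocks and one 8 × 8 block for each of the two chrominance components; the function name `blocks_per_picture` is ours.

```python
LUMINANCE_BLOCKS_PER_MB = 4    # a 16 x 16 luminance area holds four 8 x 8 blocks
CHROMINANCE_BLOCKS_PER_MB = 2  # assumption for 4:2:0: one 8 x 8 block per chrominance component


def blocks_per_picture(mbs_per_row, mb_rows):
    """Return (macroblocks, blocks) for one picture."""
    macroblocks = mbs_per_row * mb_rows
    blocks = macroblocks * (LUMINANCE_BLOCKS_PER_MB + CHROMINANCE_BLOCKS_PER_MB)
    return macroblocks, blocks


sif_625_macroblocks, sif_625_blocks = blocks_per_picture(22, 18)
```

Under these assumptions a SIF-625 picture contains 396 macroblocks and 2376 blocks, each of which is DCT coded.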