2.3 Image format

CCIR-601 defines an image format of studio quality. For other applications, images with different resolutions and dimensions may be preferred. For example, in videoconferencing or videotelephony, small images with lower resolutions require much less bandwidth than studio or broadcast video, while the resulting image quality is still quite acceptable for the application. HDTV, on the other hand, calls for larger images with improved luminance and chrominance resolutions.

2.3.1 SIF images

In most cases the video sources to be coded by standard video codecs are CCIR-601 digitised video signals taken directly from the camera. It is therefore logical to relate the picture resolutions and dimensions of the various applications to those of CCIR-601. The first set of images derived from CCIR-601 comprises the lower resolution images used for storage applications.

A lower resolution counterpart of CCIR-601 is an image sequence with half the CCIR-601 resolution in each direction. That is, for each CCIR-601 standard, the active part of the image is halved in the horizontal, vertical and temporal dimensions. This format is called the source input format, or SIF [3]. The resulting picture is noninterlaced (progressive). The chrominance samples share the same block boundaries as the luminance samples, as shown in Figure 2.2. For every four luminance samples, Y, there is one pair of chrominance samples, C_b and C_r.

Figure 2.2: Positioning of luminance and chrominance samples (dotted lines indicate macroblock boundaries)

Thus, for the European standard, the SIF picture resolution becomes 360 pixels per line, 288 lines per picture and 25 pictures per second. For North America and the Far East, these values are 360, 240 and 30, respectively.
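The halving rule can be expressed directly. This is a minimal Python sketch: the `Format` type and `sif_from_ccir601` are hypothetical names, and the source parameters are assumed nominal active values (720 × 576 at 50 fields/s and 720 × 480 at 60 fields/s, ignoring the 59.94 Hz offset). It only halves the dimensions and does none of the filtering discussed below.

```python
from typing import NamedTuple


class Format(NamedTuple):
    width: int    # active luminance pixels per line
    lines: int    # active lines per picture
    rate: float   # temporal rate: fields/s for CCIR-601, pictures/s for SIF


# Nominal active CCIR-601 parameters (assumed values for illustration).
CCIR601 = {
    "625/50": Format(width=720, lines=576, rate=50),
    "525/60": Format(width=720, lines=480, rate=60),
}


def sif_from_ccir601(src: Format) -> Format:
    """Halve the horizontal, vertical and temporal dimensions."""
    return Format(src.width // 2, src.lines // 2, src.rate / 2)


for name, fmt in CCIR601.items():
    print(name, sif_from_ccir601(fmt))
# 625/50 Format(width=360, lines=288, rate=25.0)
# 525/60 Format(width=360, lines=240, rate=30.0)
```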

One way of converting the source video rate (temporal resolution) is to use only odd or even fields. Another method is to take the average values of the two fields. Discarding one field normally introduces aliasing artefacts, but simple averaging blurs the picture. For better SIF picture quality more sophisticated methods of rate conversion are required, which inevitably demand more processing power. The horizontal and vertical resolutions are halved after filtering and subsampling of the video source.
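The two simple rate conversion methods can be sketched on a frame stored as a list of lines, with the two interlaced fields on alternate lines. The function names and the rounding used in the average are illustrative assumptions; neither function attempts the more sophisticated conversion mentioned above.

```python
from typing import List

Picture = List[List[int]]  # rows of 8-bit luminance samples


def keep_one_field(frame: Picture, top: bool = True) -> Picture:
    """Discard one field: keep lines 0, 2, 4, ... (top) or 1, 3, 5, ... (bottom)."""
    start = 0 if top else 1
    return [row[:] for row in frame[start::2]]


def average_fields(frame: Picture) -> Picture:
    """Average each top-field line with the bottom-field line below it (rounded)."""
    out: Picture = []
    for top_row, bottom_row in zip(frame[0::2], frame[1::2]):
        out.append([(a + b + 1) // 2 for a, b in zip(top_row, bottom_row)])
    return out
```

For the four-line frame `[[10, 20], [30, 40], [50, 60], [70, 80]]`, `keep_one_field` returns `[[10, 20], [50, 60]]` and `average_fields` returns `[[20, 30], [60, 70]]`. The first keeps the full sharpness of one field at the cost of aliasing; the second mixes two time instants and vertical positions, which blurs.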

Since in CCIR-601 the chrominance bandwidth is half that of the luminance, each chrominance component has half as many pixels per line as the luminance, while the frame rate and the number of lines per frame are the same. This is normally referred to as the 4:2:2 image format. Figure 2.3 shows the luminance and chrominance components for the 4:2:2 image format. As the figure shows, in the scanning (horizontal) direction there is a pair of chrominance samples for every alternate luminance sample, and chrominance samples are present on every line. For SIF pictures, there is a pair of chrominance samples for every four luminance pixels, as shown in the figure.

Figure 2.3: Sampling pattern for 4:2:2 (CCIR-601) and 4:2:0 SIF

Thus, in SIF, the horizontal and vertical resolutions of luminance are half of the source resolutions, but for chrominance, although the horizontal resolution is halved, the vertical resolution has to be one quarter of that of the source chrominance. This is called the 4:2:0 format.

The lowpass filters used for filtering the source video differ for luminance and chrominance. The luminance filter is a seven-tap filter with coefficients:

$$h_Y = \frac{1}{256}\left[-29,\ 0,\ 88,\ 138,\ 88,\ 0,\ -29\right] \qquad (2.3)$$

Using a power of two for the divisor allows a simple hardware implementation.

For the chrominance, the filter is a four-tap filter with coefficients:

$$h_C = \frac{1}{8}\left[1,\ 3,\ 3,\ 1\right] \qquad (2.4)$$

Since this filter has an even number of taps, the filtered chrominance samples lie horizontally midway between the luminance samples, with a phase shift of half a sample. These filters are not part of the international standard, and other filters may be used. Figure 2.4 illustrates the subsampling and lowpass filtering of CCIR-601 format video into SIF format.
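The horizontal filtering and 2:1 subsampling can be sketched in Python as below. It assumes the coefficients [-29, 0, 88, 138, 88, 0, -29]/256 for (2.3) and [1, 3, 3, 1]/8 for (2.4), replicates edge pixels at the line ends, and implements the power-of-two division as a rounded right shift. The output phase of each filter (the `offset` argument) is an assumption, and the same function would be applied down the columns for vertical subsampling.

```python
from typing import List, Sequence

# Assumed coefficients for (2.3) and (2.4): numerators and log2 of the divisor.
LUMA_DOWN = ((-29, 0, 88, 138, 88, 0, -29), 8)   # divisor 256
CHROMA_DOWN = ((1, 3, 3, 1), 3)                   # divisor 8


def decimate(line: Sequence[int], taps: Sequence[int], shift: int, offset: int) -> List[int]:
    """Lowpass filter a line and keep every second output sample.

    Output k uses input samples 2k+offset ... 2k+offset+len(taps)-1; samples
    beyond either end of the line are replaced by the nearest edge sample.
    Division by 2**shift is done with a rounded right shift.
    """
    n = len(line)
    half = 1 << (shift - 1)
    out = []
    for k in range((n + 1) // 2):
        acc = 0
        for j, t in enumerate(taps):
            i = min(max(2 * k + offset + j, 0), n - 1)   # edge replication
            acc += t * line[i]
        out.append(min(255, max(0, (acc + half) >> shift)))   # clip to 8 bits
    return out


def luma_down(line: Sequence[int]) -> List[int]:
    taps, shift = LUMA_DOWN
    return decimate(line, taps, shift, offset=-3)   # centred on input sample 2k


def chroma_down(line: Sequence[int]) -> List[int]:
    taps, shift = CHROMA_DOWN
    return decimate(line, taps, shift, offset=-1)   # centred midway between 2k and 2k+1
```

A 720-pixel luminance line gives 360 output pixels and a 360-pixel CCIR-601 chrominance line gives 180, matching the SIF widths before the macroblock trimming described next.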

Figure 2.4: Conversion of CCIR-601 to SIF

Note that CCIR-601 has 720 luminance pixels per line. Hence the horizontal resolutions of SIF luminance and chrominance would be 360 and 180 pixels, respectively. In the standard codecs the coding unit is the 16 × 16 pixel macroblock, and 360 is not divisible by 16. Therefore four pixels are removed from each of the leftmost and rightmost sides of the SIF luminance, leaving 352 pixels per line (correspondingly, two pixels are removed from each side of the chrominance, leaving 176).

The preprocessing into SIF format is not normative, and other preprocessing steps and other resolutions may be used. The picture size need not even be a multiple of 16. In this case a video coder adds padding pixels to the right or bottom edges of the picture. For example, a horizontal resolution of 360 pixels could be coded by adding eight pixels to the right edge of each horizontal row, bringing the total to 368. Now 23 macroblocks would be coded in each row. The decoder would discard the extra padding pixels after decoding, giving the final decoded resolution of 360 pixels.
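The padding arithmetic in this example can be checked with a short sketch. The helper names are hypothetical, and filling the padding by replicating the last pixel is an assumption, since the text does not say what value the padding pixels take.

```python
from typing import List

MB_SIZE = 16


def padded_size(size: int, mb: int = MB_SIZE) -> int:
    """Round a picture dimension up to the next multiple of the macroblock size."""
    return -(-size // mb) * mb


def pad_right(line: List[int], mb: int = MB_SIZE) -> List[int]:
    """Encoder side: extend a line to a macroblock multiple.

    Replicating the last pixel is an assumption; the text only says that
    padding pixels are added.
    """
    extra = padded_size(len(line), mb) - len(line)
    return line + [line[-1]] * extra


def crop(line: List[int], width: int) -> List[int]:
    """Decoder side: discard the padding after decoding."""
    return line[:width]


width = 360
print(padded_size(width), padded_size(width) // MB_SIZE)   # 368 23
```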

The sampling format of 4:2:0 should not be confused with that of the 4:1:1 format used in some digital VCRs. In this format chrominance has the same vertical resolution as luminance, but horizontal resolution is one quarter. This can be represented with the sampling pattern shown in Figure 2.5. Note that 4:1:1 has the same number of pixels as 4:2:0!

Figure 2.5: Sampling pattern of the 4:1:1 image format

2.3.2 Conversion from SIF to CCIR-601 format

A SIF picture is converted to its corresponding CCIR-601 format by spatial upsampling, as shown in Figure 2.6. Zeros are inserted between samples and a linear-phase finite impulse response (FIR) filter is then applied [3]. A filter that can be used for upsampling the luminance is a seven-tap FIR filter with the impulse response:

$$g_Y = \frac{1}{256}\left[-12,\ 0,\ 140,\ 256,\ 140,\ 0,\ -12\right] \qquad (2.5)$$

Figure 2.6: Upsampling and filtering from SIF to CCIR-601 format (a) luminance signals (b) chrominance signals

At the ends of the lines some special technique, such as replicating the last pixel, must be used. Note that the DC gain of this filter is two. This compensates for the zeros inserted between alternate samples, so that the upsampled values retain their nominal maximum value of 255.

According to CCIR recommendation 601, the chrominance samples need to be cosited with the luminance samples 1, 3, 5 ... In order to achieve the proper location, the upsampling filter should have an even number of taps, as given by:

$$g_C = \frac{1}{4}\left[1,\ 3,\ 3,\ 1\right] \qquad (2.6)$$

Note again, the filter has a gain of two.
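Zero insertion followed by filtering can be sketched as follows, assuming the coefficients [-12, 0, 140, 256, 140, 0, -12]/256 for (2.5) and [1, 3, 3, 1]/4 for (2.6). Line ends are handled by replicating the first and last pixels, as suggested above; the tap offsets that fix the output phase, and the rounding and clipping to 8 bits, are illustrative choices rather than anything mandated by the text.

```python
from typing import List, Sequence

# Assumed coefficients for (2.5) and (2.6): numerators and log2 of the divisor.
# Both have a DC gain of two to compensate for the inserted zeros.
LUMA_UP = ((-12, 0, 140, 256, 140, 0, -12), 8)   # divisor 256
CHROMA_UP = ((1, 3, 3, 1), 2)                     # divisor 4


def upsample2(line: Sequence[int], taps: Sequence[int], shift: int, offset: int) -> List[int]:
    """Double the number of samples by zero insertion followed by FIR filtering.

    The line ends are handled by replicating the first and last pixels.
    Output n uses zero-stuffed samples n+offset ... n+offset+len(taps)-1.
    """
    pad = 2                                   # enough edge copies for these tap lengths
    ext = [line[0]] * pad + list(line) + [line[-1]] * pad
    stuffed: List[int] = []
    for s in ext:
        stuffed.extend((s, 0))                # insert a zero after every sample
    half = 1 << (shift - 1)
    out = []
    for n in range(2 * pad, 2 * pad + 2 * len(line)):
        acc = sum(t * stuffed[n + offset + j] for j, t in enumerate(taps))
        out.append(min(255, max(0, (acc + half) >> shift)))   # clip to 8 bits
    return out


def luma_up(line: Sequence[int]) -> List[int]:
    taps, shift = LUMA_UP
    return upsample2(line, taps, shift, offset=-3)


def chroma_up(line: Sequence[int]) -> List[int]:
    taps, shift = CHROMA_UP
    return upsample2(line, taps, shift, offset=-1)
```

With the luminance filter, every second output sample reproduces an input sample exactly (the centre tap is 256/256 and the taps at ±2 are zero), while the samples in between are interpolated from four neighbours. For example, `luma_up([0, 200])` gives `[0, 100, 200, 209]`; the last value shows the slight overshoot caused by the negative taps at a sharp edge.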

The SIF may be reconstructed by inserting four black pixels into each end of the horizontal luminance line in the decoded bitmap, and two grey pixels (value of 128) to each of the horizontal chrominance lines. The luminance SIF may then be upsampled horizontally and vertically. The chrominance SIF should be upsampled once horizontally and twice vertically, as shown in Figure 2.6b.
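The width restoration step can be written down directly. The function name is hypothetical, and the black level of 16 is an assumption based on the CCIR-601 luminance range; a bitmap stored with full-range values would use 0 instead.

```python
from typing import List, Tuple

BLACK_Y = 16   # assumed CCIR-601 black level; use 0 for full-range bitmaps
GREY_C = 128   # neutral chrominance value, as given in the text


def restore_sif_width(
    y_line: List[int], cb_line: List[int], cr_line: List[int]
) -> Tuple[List[int], List[int], List[int]]:
    """Add 4 black luminance pixels and 2 grey chrominance pixels at each end."""
    y = [BLACK_Y] * 4 + y_line + [BLACK_Y] * 4
    cb = [GREY_C] * 2 + cb_line + [GREY_C] * 2
    cr = [GREY_C] * 2 + cr_line + [GREY_C] * 2
    return y, cb, cr
```

A decoded 352-pixel luminance line and 176-pixel chrominance lines become 360 and 180 pixels, ready for horizontal upsampling to the 720-pixel CCIR-601 line.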

2.3.3 CIF image format

For worldwide videoconferencing, a video codec has to cope with both the European (625-line, 50 Hz) and the North American and Far Eastern (525-line, 60 Hz) versions of CCIR-601. Hence, CCIR-601 video sources in these two formats have to be converted to a common format. The picture resolutions also need to be reduced so that they can be coded at lower bit rates.

In CCIR-601 both the 625/50 and 525/60 standards have 720 pixels per line, so half of this value, 360 pixels per line, was chosen as the horizontal resolution. For the vertical and temporal resolutions, values intermediate between the two standards were chosen, such that the combined vertical × temporal resolution is one quarter of that of CCIR-601. The 625/50 system has the greater vertical resolution: its active picture area is 576 lines, and half of this is 288 lines. The 525/60 system, on the other hand, has the greater temporal resolution, so its half rate is 30 Hz. The combination of 288 lines and 30 Hz gives the required vertical × temporal resolution. This is illustrated in Figure 2.7.

Figure 2.7: Spatio-temporal relation in CIF format

Such an intermediate selection, taking the vertical resolution from one standard and the temporal resolution from the other, gives rise to the name common intermediate format (CIF). A CIF picture therefore has a luminance component with 360 pixels per line, 288 lines per picture and 30 (precisely 29.97) pictures per second [4]. The colour components are at half the spatial resolution of luminance, with 180 pixels per line and 144 lines per picture. Their temporal resolution is the same as that of luminance, 29.97 Hz.

In CIF format, as in SIF, pictures are progressive (noninterlaced), and the chrominance samples share the same block boundaries as the luminance samples, as shown in Figure 2.2. Also like SIF, the image format is 4:2:0, and down-conversion and up-conversion filters similar to those shown in Figures 2.4 and 2.6 can be applied to CIF images. Note the differences between SIF-625 and CIF, and between SIF-525 and CIF: in the former the only difference is the number of pictures per second, while in the latter it is the number of lines per picture.

2.3.4 SubQCIF, QSIF, QCIF

For certain applications, such as video over mobile networks or videotelephony, the frame rate can be reduced. Commonly used reduced frame rates for CIF and SIF-525 are 15, 10 and 7.5 frames/s; for SIF-625 they are 12.5 and 8.3 frames/s. To balance the spatio-temporal resolutions, the spatial resolutions of the images are normally reduced as well, nominally by halving in each direction. The resulting formats are called quarter-SIF (QSIF) and quarter-CIF (QCIF) for SIF and CIF, respectively. Conversion of SIF or CIF to QSIF or QCIF (or vice versa) can be carried out in the same way as converting CCIR-601 to SIF or CIF, using the same filter banks shown in Figures 2.4 and 2.6. Lower frame rate QSIF and QCIF images are normally used for very low bit rate video.
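The reduced rates quoted above are integer fractions of the source rate (30/2, 30/3, 30/4 and 25/2, 25/3), so they can be reached by keeping one picture in every two, three or four. This minimal sketch uses hypothetical function names and shows only the simplest form of frame dropping.

```python
from typing import List, Sequence


def reduced_rates(base_rate: float, factors: Sequence[int]) -> List[float]:
    """Picture rates obtained by keeping one picture in every n."""
    return [base_rate / n for n in factors]


def keep_one_in(pictures: Sequence, n: int) -> list:
    """Frame dropping: keep pictures 0, n, 2n, ..."""
    return list(pictures[::n])


print(reduced_rates(30, (2, 3, 4)))   # [15.0, 10.0, 7.5]
print(reduced_rates(25, (2, 3)))      # 12.5 and about 8.33
```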

Certain applications, such as video over mobile networks, demand even smaller image sizes. SubQCIF is the smallest standard image size, with the horizontal and vertical picture resolutions of 128 pixels by 96 pixels, respectively. The frame rate can be very low (e.g. five frames/s) to suit the channel rate. The image format in this case is 4:2:0, and hence the chrominance resolution is half the luminance resolution in each direction.

2.3.5 HDTV

Currently there is no European standard for HDTV. North American and Far Eastern HDTV has a nominal resolution of twice that of the 525-line CCIR-601 format. Hence the filter banks of Figures 2.4 and 2.6 can also be used for image size conversion. Also, since higher chrominance bandwidth is desired in HDTV, the chrominance bandwidth can be made equal to that of luminance, or half of it. Hence there will be up to a pair of chrominance pixels for every luminance pixel, and the image format can be 4:2:2 or even 4:4:4. In most cases HDTV is progressive, to improve vertical resolution.

It is common practice to define an image format in terms of the relation between 8 × 8 pixel blocks and a 16 × 16 pixel macroblock. The concepts of macroblock and block will be explained in Chapter 6. Figure 2.8 shows how the luminance and chrominance blocks are defined in the 4:2:0, 4:2:2 and 4:4:4 image formats.

Figure 2.8: Macroblock structures (a) 4:2:0 (b) 4:2:2 (c) 4:4:4
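The block counts that follow from these structures can be computed from the chrominance subsampling factors. In this small sketch (hypothetical names), every format has four luminance blocks per macroblock, and each chrominance component contributes one, two or four blocks for 4:2:0, 4:2:2 and 4:4:4, respectively.

```python
from typing import Dict, Tuple

# Chrominance subsampling factors (horizontal, vertical) relative to luminance.
SUBSAMPLING: Dict[str, Tuple[int, int]] = {
    "4:2:0": (2, 2),
    "4:2:2": (2, 1),
    "4:4:4": (1, 1),
}


def blocks_per_macroblock(fmt: str, mb: int = 16, blk: int = 8) -> Tuple[int, int, int]:
    """Return (luminance blocks, blocks per chrominance component, total)."""
    sx, sy = SUBSAMPLING[fmt]
    luma = (mb // blk) ** 2
    chroma = (mb // sx // blk) * (mb // sy // blk)
    return luma, chroma, luma + 2 * chroma


for fmt in SUBSAMPLING:
    print(fmt, blocks_per_macroblock(fmt))
# 4:2:0 (4, 1, 6)
# 4:2:2 (4, 2, 8)
# 4:4:4 (4, 4, 12)
```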

For a more detailed representation of image formats, especially discriminating 4:2:0 from 4:1:1, one can relate the horizontal and vertical resolutions of the chrominance components to those of luminance as shown in Table 2.1. Note, the luminance resolution is the same as the number of pixels in each scanning direction.

Table 2.1: Percentage of each chrominance component resolution with respect to luminance in the horizontal and vertical directions
Image format	Horizontal [%]	Vertical [%]
4:4:4	100	100
4:2:2	50	100
4:2:0	50	50
4:1:1	25	100
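Applying Table 2.1 to a concrete picture size makes it easy to check that 4:1:1 and 4:2:0 carry the same number of samples. The sketch below uses a 720 × 576 luminance picture purely as an example; the names are hypothetical.

```python
from typing import Dict, Tuple

# Table 2.1: chrominance resolution as a percentage of luminance (horizontal, vertical).
TABLE_2_1: Dict[str, Tuple[int, int]] = {
    "4:4:4": (100, 100),
    "4:2:2": (50, 100),
    "4:2:0": (50, 50),
    "4:1:1": (25, 100),
}


def chroma_size(fmt: str, width: int, lines: int) -> Tuple[int, int]:
    """Pixels per line and lines per picture of each chrominance component."""
    h, v = TABLE_2_1[fmt]
    return width * h // 100, lines * v // 100


def samples_per_picture(fmt: str, width: int, lines: int) -> int:
    """Luminance samples plus both chrominance components."""
    cw, cl = chroma_size(fmt, width, lines)
    return width * lines + 2 * cw * cl


for fmt in TABLE_2_1:
    print(fmt, chroma_size(fmt, 720, 576), samples_per_picture(fmt, 720, 576))
```

For that size, both 4:2:0 (chrominance 360 × 288) and 4:1:1 (chrominance 180 × 576) give 622,080 samples per picture, against 829,440 for 4:2:2 and 1,244,160 for 4:4:4; the two formats differ only in how the chrominance samples are distributed between the horizontal and vertical directions.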

2.3.6 Conversion from film

Sometimes the sources available for compression consist of film material, which has a nominal frame rate of 24 pictures per second. This rate can be converted to 30 pictures per second by the 3:2 pulldown technique [3]. In this mode, digitised pictures are shown alternately for three and two television field times, generating 60 fields per second. This alternation may not be exact, since the actual frame rate in the 525/60 system is 29.97 frames per second. Editing and splicing of compressed video after the conversion might also have changed the pulldown timing. A sophisticated encoder might detect the duplicated fields, average them to reduce digitisation noise, and code the result at the original rate of 24 pictures per second. This should give a significant improvement in quality over coding at 30 pictures per second, for two reasons. First, when coding at 24 pictures per second the bit rate budget per frame is larger than at 30 pictures per second. Second, direct coding at 30 pictures per second destroys the 3:2 pulldown timing and gives a jerky appearance to the final decoded video.
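The 3:2 cadence and its removal can be sketched as follows. Fields are represented only by the index of the film frame they come from and their parity, so repeated fields are found trivially; a real encoder has to detect them by comparing field content and, as noted above, may average the repeats. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Field = Tuple[int, int]   # (film frame index, parity: 0 = top, 1 = bottom)


def pulldown_32(num_frames: int) -> List[Field]:
    """Film frames alternately contribute three and two fields; parity alternates."""
    fields: List[Field] = []
    parity = 0
    for frame in range(num_frames):
        for _ in range(3 if frame % 2 == 0 else 2):
            fields.append((frame, parity))
            parity ^= 1
    return fields


def inverse_pulldown(fields: Sequence[Field]) -> List[int]:
    """Drop repeated fields and return the original film frame sequence."""
    frames: List[int] = []
    for frame, _ in fields:
        if not frames or frames[-1] != frame:
            frames.append(frame)
    return frames


seq = pulldown_32(24)
print(len(seq))                   # 60 fields from 24 film frames
print([f for f, _ in seq[:10]])   # [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
```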

2.3.7 Temporal resampling

Since the picture rates are limited to those commonly used in the television industry, the same techniques used there may be applied. For example, conversion from 24 pictures per second to 60 fields per second can be achieved by 3:2 pulldown. Video coded at 25 pictures per second can be converted to 50 fields per second by displaying the original decoded lines in the odd CCIR-601 fields and the interpolated lines in the even fields. Video coded at 29.97 or 30 pictures per second may be converted to twice that field rate by the same method.
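The conversion of 25 pictures/s to 50 fields/s can be sketched as below. Here the interpolated line is the rounded average of a decoded line and the line below it, with the last line replicated; this interpolation filter and the placement of the interpolated lines are assumptions, since the text does not specify them. The names are hypothetical.

```python
from typing import List

Picture = List[List[int]]   # rows of 8-bit samples


def picture_to_fields(picture: Picture) -> List[Picture]:
    """Split one decoded progressive picture into an odd and an even field.

    Odd field: the decoded lines unchanged.
    Even field: each line is the rounded average of a decoded line and the one
    below it; the last line is replicated at the bottom edge.
    """
    odd = [row[:] for row in picture]
    even: Picture = []
    for i, row in enumerate(picture):
        below = picture[min(i + 1, len(picture) - 1)]
        even.append([(a + b + 1) // 2 for a, b in zip(row, below)])
    return [odd, even]


def to_fields(pictures: List[Picture]) -> List[Picture]:
    """Two fields per decoded picture, e.g. 25 pictures/s -> 50 fields/s."""
    fields: List[Picture] = []
    for p in pictures:
        fields.extend(picture_to_fields(p))
    return fields
```

For a two-line picture `[[10], [30]]`, the odd field is `[[10], [30]]` and the even field is `[[20], [30]]`.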