Video Basics

Before discussing the fundamentals of video compression, let us look at how video signals are generated. Their characteristics will help us to understand how they can be exploited for bandwidth reduction without actually introducing perceptual distortions. In this regard, we will first look at image formation and colour video. Interlaced/progressive video is explained, and its impact on the signal bandwidth and display units is discussed. Representation of video in digital form and the need for bit rate reductions will be addressed. Finally, the image formats to be coded for various applications and their quality assessments will be analysed.

Analogue video

Scanning

Video signals are normally generated at the output of a camera by scanning a two-dimensional moving scene and converting it into a one-dimensional electric signal. A moving scene is a collection of individual pictures or images, where each scanned picture generates a frame of the picture. Scanning starts at the top left corner of the picture and ends at the bottom right.

The choice of number of scanned lines per picture is a trade-off between the bandwidth, flicker and resolution. Increasing the number of scanning lines per picture increases the spatial resolution. Similarly, increasing the number of pictures per second will increase the temporal resolution. There is a lower limit to the number of pictures per second, below which flicker becomes perceptible. Hence, flicker-free, high resolution video requires larger bandwidth.

If a frame is formed by a single scan of a picture, the scanning is called progressive. Alternatively, two pictures may be scanned at two different times and their lines interleaved to form a frame, so that consecutive lines of the frame belong to alternate fields. In this case, each scanned picture is called a field, and the scanning is called interlaced. Figure 2.1 shows progressive and interlaced frames.

Figure 2.1: Progressive and interlaced frames
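The relation between a frame and its two fields can be sketched in a few lines of Python. This is only an illustration: the frame is modelled as a list of scan lines, an even number of lines is assumed, and the function names are hypothetical.

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into its two interlaced fields."""
    top_field = frame[0::2]      # lines 0, 2, 4, ...
    bottom_field = frame[1::2]   # lines 1, 3, 5, ...
    return top_field, bottom_field


def weave_fields(top_field, bottom_field):
    """Interleave the lines of two fields back into a single frame."""
    frame = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame.append(top_line)
        frame.append(bottom_line)
    return frame


# A toy 6-line, 4-pixel frame; each field carries half of the lines.
frame = [[row * 10 + col for col in range(4)] for row in range(6)]
top, bottom = split_fields(frame)
rebuilt = weave_fields(top, bottom)
```

Each field holds half the lines of the frame, which is why doubling the field rate leaves the number of lines per second unchanged.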

The concept behind interlaced scanning is to trade off vertical spatial resolution against temporal resolution. For instance, slow moving objects can be perceived with higher vertical resolution, since there are not many changes between the successive fields. At the same time, the human eye does not perceive flicker since the objects are displayed at field rates. For fast moving objects, although vertical resolution is reduced, the human eye is not sensitive to spatial resolutions at high display rates. Therefore, the bandwidth of television signals is halved without significant loss of picture resolution. Usually, in interlaced video, the number of lines per field is half the number of lines per frame, and the number of fields per second is twice the number of frames per second. Hence, the number of lines per second remains fixed.

It should be noted that if high spatio-temporal resolution video is required, for example in high definition television (HDTV), then the progressive mode should be used. Although interlaced video is a good trade-off in television, it may not be suitable for computer displays, owing to the closeness of the screen to the viewer and the type of material normally displayed, such as text and graphs. If television pictures were to be used with computers, the result would be annoying interline flicker, line crawling etc. To avoid these problems, computers use noninterlaced (also called progressive or sequential) displays with refresh rates higher than 50/60 frames per second, typically 72 frames/s.

Colour components

During the scanning, a camera generates three primary colour signals called red, green and blue, the so-called RGB signals. These signals may be further processed for transmission and storage. For compatibility with black and white video, and because the three colour signals are highly correlated, a new set of signals in a different colour space is generated. These are called colour systems, and the three standards are NTSC, PAL and SECAM [1]. We will concentrate on the PAL system as an example, although the basic principles involved in the other systems are very similar.

The colour space in PAL is represented by YUV, where Y represents the luminance and U and V represent the two colour components. The basic YUV colour space can be generated from gamma-corrected RGB (referred to in equations as R'G'B') components as follows:

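$$
\begin{aligned}
Y &= 0.299R' + 0.587G' + 0.114B' \\
U &= -0.147R' - 0.289G' + 0.436B' = 0.492(B' - Y) \\
V &= 0.615R' - 0.515G' - 0.100B' = 0.877(R' - Y)
\end{aligned}
\tag{2.1}
$$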

In the PAL system the luminance bandwidth is normally 5 MHz, although in PAL system-I, used in the UK, it is 5.5 MHz. The bandwidth of each colour component is only 1.5 MHz, because the human eye is less sensitive to colour resolution. For this reason, in most image processing applications, such as motion estimation, decisions on the types of block to be coded or not coded (see Chapter 6) are made on the luminance component only. The decision is then extended to the corresponding colour components. Note that for higher quality video, such as high definition television (HDTV), the luminance and chrominance components may have the same bandwidth, but nevertheless all the decisions are made on the luminance components. In some applications the chrominance bandwidth may be reduced much further than the ratio of 1.5/5 MHz.
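As a rough illustration of the conversion from gamma-corrected RGB to YUV, the sketch below uses the commonly quoted PAL weights (0.299, 0.587 and 0.114 for the luminance, and scale factors 0.492 and 0.877 for the colour differences). These coefficients are an assumption of the sketch, the inputs are taken to be normalised to the range 0 to 1, and the function name is hypothetical.

```python
def rgb_to_yuv(r, g, b):
    """Convert gamma-corrected R'G'B' (each 0.0 to 1.0) to Y, U, V.

    The weights are the commonly quoted PAL values (assumed here).
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance
    u = 0.492 * (b - y)                     # scaled blue colour difference
    v = 0.877 * (r - y)                     # scaled red colour difference
    return y, u, v


white = rgb_to_yuv(1.0, 1.0, 1.0)   # luminance 1, both colour components close to 0
red = rgb_to_yuv(1.0, 0.0, 0.0)     # low luminance, large positive V
```

Any grey level (R' = G' = B') gives U = V = 0, which is why a black and white receiver can use the Y signal alone.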

Digital video

The process of digitising analogue video involves the three basic operations of filtering, sampling and quantisation. The filtering operation is employed to avoid the aliasing artefacts of the follow-up sampling process. The filtering applied to the luminance can be different to that for chrominance, owing to different bandwidth requirements.

Filtered luminance and chrominance signals are sampled to generate a discrete time signal. The minimum rate at which each component can be sampled is its Nyquist rate and corresponds to twice the signal bandwidth. For a PAL system this is in the range of 10–11 MHz. However, due to the requirement to make the sampling frequency a harmonic of the analogue signal line frequency, the sampling rate for broadcast quality signals has been recommended by CCIR to be 13.5 MHz, under recommendation CCIR-601 [2]. This is close to three times the PAL subcarrier frequency. The chrominance sampling frequency has also been defined to be half the luminance sampling frequency. Finally, sampled signals are quantised to eight-bit resolution, suitable for video broadcasting applications.

It should be noted that the colour space recommended by CCIR-601 is very close to that of the PAL system. The precise luminance (Y) and chrominance (Cb and Cr) equations under this recommendation are:

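$$
\begin{aligned}
Y &= 0.257R' + 0.504G' + 0.098B' + 16 \\
C_b &= -0.148R' - 0.291G' + 0.439B' + 128 \\
C_r &= 0.439R' - 0.368G' - 0.071B' + 128
\end{aligned}
\tag{2.2}
$$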

The slight departure from the PAL parameters is due to the requirement that, in the digital range, Y should take values in the range of 16–235 quantum levels. Also, the normally AC chrominance components of U and V are centred on the grey level 128, and the range is defined from 16 to 240. The reasons for these modifications are:

  1. to reduce the granular noise of all three signals in later stages of processing
  2. to make chrominance values positive to ease processing operations (e.g. storage).
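A minimal sketch of the eight-bit CCIR-601 conversion is given below. The coefficients are the commonly quoted ones for eight-bit R'G'B' inputs in the range 0 to 255 and are an assumption of this sketch; rounding to the nearest integer and the function name are also illustrative choices.

```python
def rgb_to_ycbcr601(r, g, b):
    """Convert 8-bit gamma-corrected R'G'B' (0..255) to rounded (Y, Cb, Cr).

    Coefficients are the commonly quoted CCIR-601 values (assumed here).
    """
    y = 0.257 * r + 0.504 * g + 0.098 * b + 16
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    cr = 0.439 * r - 0.368 * g - 0.071 * b + 128
    return round(y), round(cb), round(cr)


black = rgb_to_ycbcr601(0, 0, 0)         # (16, 128, 128)
white = rgb_to_ycbcr601(255, 255, 255)   # (235, 128, 128)
red = rgb_to_ycbcr601(255, 0, 0)         # (82, 90, 240)
```

Black and white map to the ends of the 16 to 235 luminance range with both chrominance components at the grey level 128, while saturated colours reach the chrominance limits of 16 and 240.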

Note that despite the unique definition for Y, Cb and Cr, the CCIR-601 standard for European broadcasting is different from that for North America and the Far East. In the former, the number of lines per frame is 625 and the number of frames per second is 25. In the latter these values are 525 and 30, respectively. The number of samples per active line, called picture elements (pixels), is 720 for both systems. In the 625-line system, the total number of pixels per line, including the horizontal blanking, is 13.5 MHz times 64 μs, equal to 864 pixels. Note also that despite the differences in the number of lines and frame rates, the number of pixels generated per second under both CCIR-601/625 and CCIR-601/525 is the same. This is because in digital television we are interested in the active parts of the picture. The number of active television lines per frame in CCIR-601/625 is 576, so the total number of pixels per second is 720 × 576 × 25 = 10 368 000. In CCIR-601/525 the number of active lines is 480, and the total number of pixels per second is 720 × 480 × 30 = 10 368 000.

The total bit rate is then calculated by considering that each of the two chrominance components has half as many pixels as the luminance, and with eight bits per pixel, the total bit rate becomes 10 368 000 × 2 × 8 = 165 888 000 bits/s. Had we included all the horizontal and vertical blanking, then the total bit rate would be 13.5 × 10⁶ × 2 × 8 = 216 Mbit/s. Either of these values is much greater than the equivalent analogue bandwidth, hence there is a strong need for video compression to reduce the digital bit rate. In the following chapters we will show how such a huge bit rate can be compressed down to less than 10 Mbit/s, without noticeable effect on picture quality.
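The arithmetic above can be checked with a short sketch. The function name and its parameters are hypothetical; the factor of 2 samples per luminance pixel reflects one luminance sample plus two chrominance components at half the luminance rate.

```python
def video_bit_rate(samples_per_line, lines, pictures_per_second,
                   samples_per_luma_pixel=2.0, bits_per_sample=8):
    """Bit rate in bit/s for a given raster.

    samples_per_luma_pixel counts the luminance sample plus its share of
    both chrominance components (2.0 when each chrominance component has
    half as many samples as the luminance).
    """
    luma_samples_per_second = samples_per_line * lines * pictures_per_second
    return luma_samples_per_second * samples_per_luma_pixel * bits_per_sample


rate_625 = video_bit_rate(720, 576, 25)    # active picture, 625-line system
rate_525 = video_bit_rate(720, 480, 30)    # active picture, 525-line system
rate_total = video_bit_rate(864, 625, 25)  # 625-line system including blanking
```

Both active-picture rates come to 165 888 000 bit/s, and the full raster of 864 samples × 625 lines × 25 Hz equals the 13.5 MHz sampling rate, giving 216 Mbit/s.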

Image format

CCIR-601 is based on an image format for studio quality. For other applications, images with various degrees of resolution and dimensions might be preferred. For example, in videoconferencing or videotelephony, small image sizes with lower resolutions require much less bandwidth than the studio or broadcast video, and at the same time the resultant image quality is quite acceptable for the application. On the other hand, for HDTV, larger image sizes with improved luminance and chrominance resolutions are preferred.

SIF images

In most cases the video sources to be coded by standard video codecs are produced by CCIR-601 digitised video signals direct from the camera. It is then logical to relate picture resolutions and dimensions of various applications to those of CCIR-601. The first sets of images related to CCIR-601 are the lower resolution images for storage applications.

A lower resolution than CCIR-601 would be an image sequence with half the CCIR-601 resolutions in each direction. That is, in each CCIR-601 standard, active parts of the image in the horizontal, vertical and temporal dimensions are halved. For this reason it is called the source input format, or SIF [3]. The resultant picture is noninterlaced (progressive). The positions of the chrominance samples share the same block boundaries with those of the luminance samples, as shown in Figure 2.2. For every four luminance samples, Y, there will be one pair of chrominance components, Cb and Cr.

Figure 2.2: Positioning of luminance and chrominance samples (dotted lines indicate macroblock boundaries)

Thus, for the European standard, the SIF picture resolution becomes 360 pixels per line, 288 lines per picture and 25 pictures per second. For North America and the Far East, these values are 360, 240 and 30, respectively.

One way of converting the source video rate (temporal resolution) is to use only odd or even fields. Another method is to take the average values of the two fields. Discarding one field normally introduces aliasing artefacts, but simple averaging blurs the picture. For better SIF picture quality more sophisticated methods of rate conversion are required, which inevitably demand more processing power. The horizontal and vertical resolutions are halved after filtering and subsampling of the video source.
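The two simple rate conversion methods mentioned above can be sketched as follows. This is an illustration only: the frame is a list of scan lines of integer pixel values with an even number of lines, the top field is taken to be the even-numbered lines, and the function names are hypothetical.

```python
def drop_field(frame):
    """Keep only one field (the even-numbered lines) of an interlaced frame."""
    return [list(line) for line in frame[0::2]]


def average_fields(frame):
    """Average each pair of adjacent lines, one from each field."""
    averaged = []
    for top_line, bottom_line in zip(frame[0::2], frame[1::2]):
        # Integer average with rounding to the nearest level
        averaged.append([(a + b + 1) // 2 for a, b in zip(top_line, bottom_line)])
    return averaged


frame = [[10, 10], [20, 20], [30, 30], [40, 40]]
kept = drop_field(frame)          # [[10, 10], [30, 30]]
blended = average_fields(frame)   # [[15, 15], [35, 35]]
```

Dropping a field keeps sharp lines but discards half of the vertical information without filtering, whereas averaging acts as a crude two-tap lowpass filter, which is the source of the blurring.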

Considering that in CCIR-601 the chrominance bandwidth is half that of the luminance, the number of pixels per line of each chrominance component is half that of the luminance, but their frame rates and the number of lines per frame are equal. This is normally referred to as the 4:2:2 image format. Figure 2.3 shows the luminance and chrominance components for the 4:2:2 image format. As the Figure shows, in the scanning direction (horizontal) there is a pair of chrominance samples for every alternate luminance sample, but the chrominance components are present in every line. For SIF pictures, there is a pair of chrominance samples for every four luminance pixels, as shown in the Figure.

Figure 2.3: Sampling pattern for 4:2:2 (CCIR 601) and 4:2:0 SIF

Thus, in SIF, the horizontal and vertical resolutions of luminance will be half of the source resolutions, but for the chrominance, although horizontal resolution is halved, the vertical resolution has to be one quarter. This is called 4:2:0 format.

The lowpass filters used for filtering the source video are different for the luminance and chrominance components. The luminance filter is a seven-tap filter with characteristics:

$$
\frac{1}{256}\,[\,-29 \quad 0 \quad 88 \quad 138 \quad 88 \quad 0 \quad -29\,]
\tag{2.3}
$$

Use of a power of two for the divisor allows a simple hardware implementation.

For the chrominance the filter characteristic is a four-tap filter of the type:

$$
\frac{1}{8}\,[\,1 \quad 3 \quad 3 \quad 1\,]
\tag{2.4}
$$

Hence, the chrominance samples have to be placed at a horizontal position in the middle of the luminance samples, with a phase shift of half a sample. These filters are not part of the international standard, and other filters may be used. Figure 2.4 illustrates the subsampling and lowpass filtering of the CCIR-601 format video into SIF format.

Figure 2.4: Conversion of CCIR-601 to SIF
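The luminance path of this conversion (lowpass filtering followed by 2:1 subsampling) might be sketched as below. The tap values (-29, 0, 88, 138, 88, 0, -29 with a divisor of 256) are the ones commonly quoted for this seven-tap prefilter and are assumed here; as noted above, the filters are not normative. Edge replication, rounding to the nearest level and the function name are further assumptions of the sketch.

```python
LUMA_DECIMATION_TAPS = (-29, 0, 88, 138, 88, 0, -29)  # divisor 256 (assumed taps)


def decimate_luma(line, start=0):
    """Lowpass filter a line and keep every second sample.

    The filter is centred on samples start, start + 2, ...; samples
    beyond either end of the line are replaced by the nearest edge sample.
    """
    last = len(line) - 1
    output = []
    for centre in range(start, len(line), 2):
        acc = 0
        for k, tap in enumerate(LUMA_DECIMATION_TAPS):
            index = min(max(centre + k - 3, 0), last)   # replicate at edges
            acc += tap * line[index]
        value = (acc + 128) // 256                      # divide by 256 with rounding
        output.append(min(max(value, 0), 255))          # keep within 8 bits
    return output


ccir_line = [128, 128, 128, 120, 60, 50, 180, 154, 198, 205,
             105, 61, 93, 208, 250, 190, 128, 128, 128]
sif_samples = decimate_luma(ccir_line, start=3)[:7]
```

With the pixel values of problem 7 and the filter first centred on the fourth pixel, this gives 94, 73, 194, 184, 50, 204 and 207.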

Note that the number of luminance pixels per line of CCIR-601 is 720. Hence the horizontal resolutions of SIF luminance and chrominance should be 360 and 180, respectively. However, the coding unit in the standard codecs is a macroblock of 16 × 16 pixels, and 360 is not divisible by 16. Therefore four pixels are removed from each of the leftmost and rightmost sides of SIF, leaving 352 pixels per line.

The preprocessing into SIF format is not normative, and other preprocessing steps and other resolutions may be used. The picture size need not even be a multiple of 16. In this case a video coder adds padding pixels to the right or bottom edges of the picture. For example, a horizontal resolution of 360 pixels could be coded by adding eight pixels to the right edge of each horizontal row, bringing the total to 368. Now 23 macroblocks would be coded in each row. The decoder would discard the extra padding pixels after decoding, giving the final decoded resolution of 360 pixels.
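The padding arithmetic in this example can be sketched as follows; the function name and return convention are hypothetical.

```python
def pad_to_macroblocks(size, macroblock=16):
    """Return (padded_size, padding, macroblock_count) for one picture dimension."""
    count = -(-size // macroblock)      # ceiling division
    padded = count * macroblock
    return padded, padded - size, count


width_360 = pad_to_macroblocks(360)   # (368, 8, 23)
width_352 = pad_to_macroblocks(352)   # (352, 0, 22)
height_288 = pad_to_macroblocks(288)  # (288, 0, 18)
```

A 360-pixel line therefore needs eight padding pixels and 23 macroblocks, whereas a 352-pixel line fits exactly into 22 macroblocks.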

The sampling format of 4:2:0 should not be confused with that of the 4:1:1 format used in some digital VCRs. In this format chrominance has the same vertical resolution as luminance, but horizontal resolution is one quarter. This can be represented with the sampling pattern shown in Figure 2.5. Note that 4:1:1 has the same number of pixels as 4:2:0!

Figure 2.5: Sampling pattern of 4:1:1 image format

Conversion from SIF to CCIR 601 format

A SIF is converted to its corresponding CCIR-601 format by spatial upsampling as shown in Figure 2.6. A linear phase finite impulse response (FIR) filter is applied after the insertion of zeros between samples [3]. A filter that can be used for upsampling the luminance is a seven-tap FIR filter with the impulse response of:

$$
\frac{1}{256}\,[\,-12 \quad 0 \quad 140 \quad 256 \quad 140 \quad 0 \quad -12\,]
\tag{2.5}
$$

Figure 2.6: Upsampling and filtering from SIF to CCIR-601 format (a) luminance signals (b) chrominance signals

At the end of the lines some special techniques such as replicating the last pixel must be used. Note that the DC response of this filter has a gain of two. This is due to the inserted alternate zeros in the upsampled samples, such that the upsampled values retain their maximum nominal value of 255.

According to CCIR recommendation 601, the chrominance samples need to be cosited with the luminance samples 1, 3, 5 ... In order to achieve the proper location, the upsampling filter should have an even number of taps, as given by:

$$
\frac{1}{4}\,[\,1 \quad 3 \quad 3 \quad 1\,]
\tag{2.6}
$$

Note again, the filter has a gain of two.

The SIF may be reconstructed by inserting four black pixels into each end of the horizontal luminance line in the decoded bitmap, and two grey pixels (value of 128) to each of the horizontal chrominance lines. The luminance SIF may then be upsampled horizontally and vertically. The chrominance SIF should be upsampled once horizontally and twice vertically, as shown in Figure 2.6b.
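Horizontal upsampling of the luminance by zero insertion and filtering could look like the sketch below. The taps (-12, 0, 140, 256, 140, 0, -12 with a divisor of 256) are assumed values chosen to be consistent with the DC gain of two described above. The ends of the line are handled by replicating the first and last samples, which is one of several possible choices, so values close to the ends depend on that choice. The function name is hypothetical.

```python
LUMA_INTERPOLATION_TAPS = (-12, 0, 140, 256, 140, 0, -12)  # divisor 256 (assumed taps)


def upsample_luma(samples):
    """Double the horizontal resolution by zero insertion and FIR filtering."""
    pad = 2
    # Replicate the first and last samples so the filter has support at the ends
    padded = [samples[0]] * pad + list(samples) + [samples[-1]] * pad
    stuffed = []
    for value in padded:
        stuffed += [value, 0]            # insert a zero after every sample
    first = 2 * pad                      # position of samples[0] in `stuffed`
    last = first + 2 * (len(samples) - 1)
    output = []
    for centre in range(first, last + 1):
        acc = sum(tap * stuffed[centre + k - 3]
                  for k, tap in enumerate(LUMA_INTERPOLATION_TAPS))
        value = (acc + 128) // 256       # divide by 256 with rounding
        output.append(min(max(value, 0), 255))
    return output


sif_samples = [94, 73, 194, 184, 50, 204, 207]
ccir_samples = upsample_luma(sif_samples)   # 13 samples
```

Because the centre tap is 256 and its immediate neighbours fall on inserted zeros, the original samples pass through unchanged, and only the in-between positions are interpolated.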

CIF image format

For worldwide videoconferencing, a video codec has to cope with CCIR-601 video in both the European (625 line, 50 Hz) and the North American and Far Eastern (525 line, 60 Hz) formats. Hence, CCIR-601 video sources from these two different formats have to be converted to a common format. The picture resolutions also need to be reduced, to be able to code them at lower bit rates.

Considering that in CCIR-601 the number of pixels per line in both the 625/50 and 525/60 standards is 720, half of this value, 360 pixels/line, was chosen as the horizontal resolution. For the vertical and temporal resolutions, a value intermediate between the two standards was chosen such that the combined vertical × temporal resolutions were one quarter of that of CCIR-601. The 625/50 system has the greater vertical resolution. Since the active picture area is 576 lines, half of this value is 288 lines. On the other hand, the 525/60 system has the greater temporal resolution, so that the half rate is 30 Hz. The combination of 288 lines and 30 Hz gives the required vertical × temporal resolution. This is illustrated in Figure 2.7.

Figure 2.7: Spatio-temporal relation in CIF format

Such an intermediate selection of vertical resolution from one standard and temporal resolution from the other leads to the adopted name common intermediate format (CIF). Therefore a CIF picture has a luminance with 360 pixels per line, 288 lines per picture and 30 (precisely 29.97) pictures per second [4]. The colour components are at half the spatial resolution of luminance, with 180 pixels per line and 144 lines per picture. Temporal resolutions of colour components are the same as for the luminance at 29.97 Hz.

In CIF format, like SIF, pictures are progressive (noninterlaced), and the positions of the chrominance samples share the same block boundaries with those of the luminance samples, as shown in Figure 2.2. Also like SIF, the image format is 4:2:0, and down-conversion and up-conversion filters similar to those shown in Figures 2.4 and 2.6 can be applied to CIF images. Note the difference between SIF-625 and CIF, and between SIF-525 and CIF. In the former the only difference is in the number of pictures per second, while in the latter they differ in the number of lines per picture.

SubQCIF, QSIF, QCIF

For certain applications, such as video over mobile networks or videotelephony, it is possible to reduce the frame rate. Known reduced frame rates for CIF and SIF-525 are 15, 10 and 7.5 frames/s. These rates for SIF-625 are 12.5 and 8.3 frames/s. To balance the spatio-temporal resolutions, the spatial resolutions of the images are normally reduced, nominally by halving in each direction. These are called quarter-SIF (QSIF) and quarter-CIF (QCIF) for SIF and CIF formats, respectively. Conversion of SIF or CIF to QSIF and QCIF (or vice versa) can be carried out with a similar method to converting CCIR-601 to SIF and CIF, respectively, using the same filter banks shown in Figures 2.4 and 2.6. Lower frame rate QSIF and QCIF images are normally used for very low bit rate video.

Certain applications, such as video over mobile networks, demand even smaller image sizes. SubQCIF is the smallest standard image size, with the horizontal and vertical picture resolutions of 128 pixels by 96 pixels, respectively. The frame rate can be very low (e.g. five frames/s) to suit the channel rate. The image format in this case is 4:2:0, and hence the chrominance resolution is half the luminance resolution in each direction.

HDTV

Currently there is no European standard for HDTV. The North American and Far Eastern HDTV has a nominal resolution of twice the 525-line CCIR-601 format. Hence, the filter banks of Figures 2.4 and 2.6 can also be used for image size conversion. Also, since in HDTV higher chrominance bandwidth is desired, it can be made equal to the luminance, or half of it. Hence there will be up to a pair of chrominance pixels for every luminance pixel, and the image format can be made 4:2:2 or even 4:4:4. In most cases HDTV is progressive, to improve vertical resolution.

It is common practice to define image format in terms of relations between 8 × 8 pixel blocks with a macroblock of 16 × 16 pixels. The concept of macroblock and block will be explained in Chapter 6. Figure 2.8 shows how blocks of luminance and chrominance in various 4:2:0, 4:2:2 and 4:4:4 image formats are defined.

Figure 2.8: Macroblock structures (a) 4:2:0 (b) 4:2:2 (c) 4:4:4

For a more detailed representation of image formats, especially discriminating 4:2:0 from 4:1:1, one can relate the horizontal and vertical resolutions of the chrominance components to those of luminance as shown in Table 2.1. Note, the luminance resolution is the same as the number of pixels in each scanning direction.

Table 2.1: Percentage of each chrominance component resolution with respect to luminance in the horizontal and vertical directions

Image format    Horizontal [%]    Vertical [%]
4:4:4           100               100
4:2:2           50                100
4:2:0           50                50
4:1:1           25                100
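The ratios of Table 2.1 translate directly into chrominance picture dimensions. The sketch below is illustrative only; the dictionary and function names are hypothetical, and each format is stored as the horizontal and vertical subsampling divisors.

```python
# Horizontal and vertical subsampling divisors of each chrominance component
CHROMA_DIVISORS = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
    "4:1:1": (4, 1),
}


def chroma_size(luma_width, luma_height, image_format):
    """Width and height of one chrominance component for a given format."""
    horizontal, vertical = CHROMA_DIVISORS[image_format]
    return luma_width // horizontal, luma_height // vertical


size_420 = chroma_size(360, 288, "4:2:0")   # (180, 144)
size_411 = chroma_size(360, 288, "4:1:1")   # (90, 288)
```

Although the shapes differ, both 4:2:0 and 4:1:1 carry 25 920 samples per chrominance component for a 360 × 288 luminance picture, which is why the two formats have the same number of pixels.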

Conversion from film

Sometimes sources available for compression consist of film material, which has a nominal frame rate of 24 pictures per second. This rate can be converted to 30 pictures per second by the pulldown technique [3]. In this mode digitised pictures are shown alternately for three and two television field times, generating 60 fields per second. This alternation may not be exact, since the actual frame rate in the 525/60 system is 29.97 frames per second. Editing and splicing of compressed video after the conversion might also have changed the pulldown timing. A sophisticated encoder might detect the duplicated fields, average them to reduce digitisation noise and code the result at the original 24 pictures per second rate. This should give a significant improvement in quality over coding at 30 pictures per second. This is because, first of all, when coding at 24 pictures per second the bit rate budget per frame is larger than that for 30 pictures per second. Secondly, direct coding of 30 pictures per second destroys the 3:2 pulldown timing and gives a jerky appearance to the final decoded video.
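The 3:2 cadence can be illustrated with a short sketch. It only models how many field times each film picture occupies; field parity (odd or even lines) and the 29.97 Hz timing are deliberately ignored, and the function name is hypothetical.

```python
def pulldown_3_2(film_pictures):
    """Repeat film pictures alternately for three and two field times."""
    fields = []
    for index, picture in enumerate(film_pictures):
        repeats = 3 if index % 2 == 0 else 2
        fields.extend([picture] * repeats)
    return fields


cadence = pulldown_3_2(["A", "B", "C", "D"])
# ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'D', 'D']
one_second = pulldown_3_2(list(range(24)))   # 24 film pictures -> 60 fields
```

Every pair of film pictures fills five field times, so 24 pictures per second produce exactly 60 fields per second.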

Temporal resampling

Since the picture rates are limited to those commonly used in the television industry, the same techniques may be applied. For example, conversion from 24 pictures per second to 60 fields can be achieved by the technique of 3:2 pulldown. Video coded at 25 pictures per second can be converted to 50 fields per second by displaying the original decoded lines in the odd CCIR-601 fields, and the interpolated lines in the even fields. Video coded at 29.97 or 30 pictures per second may be converted to a field rate twice as large using the same method.

Picture quality assessment

Conversion of digital pictures from one format to another, as well as their compression for bit rate reduction, introduces some distortions. It is of great importance to know whether the introduced distortion is acceptable to the viewers. Traditionally this has been done by subjective assessments, where the degraded pictures are shown to a group of subjects and their views on the perceived quality or distortions are sought.

Over the years, many subjective assessment methodologies have been developed and validated. Among them are: the double stimulus impairment scale (DSIS), where the subjects are asked to rate the impairment of the processed picture with respect to the reference unimpaired picture, and the double stimulus continuous quality scale (DSCQS), where the order of the presentation of the reference and processed pictures is unknown to the subjects. The subjects will then give a score between 1 and 100 containing adjectival guidelines placed at 20 point intervals (1–20 = bad, 21–40 = poor, 41–60 = fair, 61–80 = good and 81–100 = excellent) for each picture, and their difference is an indication of the quality [5]. Pictures are presented to the viewers for about ten seconds and the average of the viewers' scores, defined as the mean opinion score (MOS), is a measure of video quality. At least 20–25 nonexpert viewers are required to give a reliable MOS, excluding the outliers.

These methods are usually used in the assessment of still images. For video evaluation single stimulus continuous quality evaluation (SSCQE) is preferred, where the time-varying picture quality of the processed video without reference is evaluated by the subjects [5]. In this method subjects are asked to continuously evaluate the video quality of a set of video scenes. The judgement criteria are the five scales used in the DSCQS above. Since video sequences are long, they are segmented into ten-second shots, and for each video segment an MOS is calculated.

Although these methods give reliable indications of the perceived image quality, they are unfortunately time consuming and expensive. An alternative is objective measurements, or video quality metrics, which employ mathematical models to mimic the behaviour of the human visual system.

In 1997 the Video Quality Experts Group (VQEG), formed from experts of ITU-R study group 6 and ITU-T study group 9, undertook this task [6]. They are considering three methods for the development of the video quality metric. In the first method, called the full reference (FR-TV) model, both the processed and the reference video segments are fed to the model and the outcome is a quantitative indicator of the video quality. In the second method, called the reduced reference (RR-TV) model, some features extracted from the spatio-temporal regions of the reference picture (e.g. mean and variance of pixels, colour histograms etc.) are made available to the model. The processed video is then required to generate similar statistics in those regions. In the third model, called no reference (NR-TV), or single ended, the processed video without any information from the reference picture excites the model. All these models should be validated with the SSCQE methods for various video segments. Early results indicate that these methods compared with the SSCQE perform satisfactorily, with a correlation coefficient of 0.8–0.9 [7].

Until any of these quality metrics become standards, it is customary to use the simplest form of objective measurement, which is the ratio of the peak-to-peak signal to the root-mean-squared processing noise. This is referred to as the peak-to-peak signal-to-noise ratio (PSNR) and defined as:

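$$
\mathrm{PSNR} = 10\log_{10}\frac{255^2}{\dfrac{1}{N}\sum_{i}\sum_{j}\left(Y_{ref}(i,j) - Y_{prc}(i,j)\right)^2}
\tag{2.7}
$$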

where Yref(i, j) and Yprc(i, j) are the pixel values of the reference and processed images, respectively, and N is the total number of pixels in the image. In this equation, the peak signal with an eight-bit resolution is 255, and the noise is the square of the pixel-to-pixel difference (error) between the reference image and the image under study. Although it has been claimed that in some cases the PSNR's accuracy is doubtful, its relative simplicity makes it a very popular choice.
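A minimal PSNR calculation along these lines is sketched below, with the images given as lists of rows of eight-bit pixel values. The function name and the choice to return infinity for identical images are assumptions of the sketch.

```python
import math


def psnr(reference, processed, peak=255):
    """PSNR in dB between two equally sized images (lists of pixel rows)."""
    squared_error = 0
    pixel_count = 0
    for ref_row, prc_row in zip(reference, processed):
        for ref_pixel, prc_pixel in zip(ref_row, prc_row):
            squared_error += (ref_pixel - prc_pixel) ** 2
            pixel_count += 1
    mse = squared_error / pixel_count      # mean-squared error over N pixels
    if mse == 0:
        return math.inf                    # identical images
    return 10 * math.log10(peak ** 2 / mse)


reference = [[100, 100], [100, 100]]
processed = [[101, 99], [100, 100]]
quality = psnr(reference, processed)       # mean-squared error of 0.5
```

Here two of the four pixels differ by one level, giving a mean-squared error of 0.5 and a PSNR of just over 51 dB.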

Perhaps the main criticism against the PSNR is that the human interpretation of the distortions at different parts of the video can be different. Although it is hoped that the variety of interpretations can be included in the objective models, there are still some issues that not only the simple PSNR but also more sophisticated objective models may fail to address. For example, if a small part of a picture in a video is severely degraded, this hardly affects the PSNR or any objective model parameters (depending on the area of distortion), but this distortion attracts the observers' attention, and the video looks as bad as if a larger part of the picture was distorted. This type of distortion is very common in video, where due to a single bit error, blocks of 16 × 16 pixels might be erroneously decoded. This has almost no significant effect on PSNR, but can be viewed as an annoying artefact. In this case there will be a large discrepancy between the objective and subjective test results.
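The weakness described here can be made concrete with a small calculation. The figures are hypothetical: a 360 × 288 luminance picture in which either every pixel is wrong by 5 levels, or a single 16 × 16 block is wrong by 100 levels while the rest of the picture is perfect.

```python
import math


def psnr_from_mse(mse, peak=255):
    """PSNR in dB for a given mean-squared error."""
    return 10 * math.log10(peak ** 2 / mse)


WIDTH, HEIGHT = 360, 288

# Case 1: every pixel of the picture is off by 5 levels
mse_uniform = 5 ** 2

# Case 2: one 16 x 16 block is off by 100 levels, all other pixels are exact
mse_block = (16 * 16 * 100 ** 2) / (WIDTH * HEIGHT)

psnr_uniform = psnr_from_mse(mse_uniform)
psnr_block = psnr_from_mse(mse_block)
```

Both cases come out at about 34 dB, although a viewer would be likely to judge them very differently: the first as a mild overall degradation, the second as one conspicuously damaged block.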

On the other hand one may argue that under similar conditions if one system has a better PSNR than the other, then the subjective quality can be better but not worse. This is the main reason that PSNR is still used in comparing the performance of various video codecs. However, in comparing codecs, PSNR or any objective measure should be used with great care, to ensure that the types of coding distortion are not significantly different from each other. For instance, objective results from the blockiness distortion produced by the block-based video codecs can be different from the picture smearing distortion introduced by the filter-based codecs. The fact is that even the subjects may interpret these distortions differently. It appears that expert viewers prefer blockiness distortion to smearing, and nonexperts' views are opposite!

In addition to the abovementioned problems of subjective and objective measurements of video quality, the impact of people's expectation of video quality cannot be ignored. As technology progresses and viewers become more familiar with digital video, their level of expectation of video quality can grow. Hence, a quality that today might be regarded as 'good' may be rated as 'fair' or 'poor' tomorrow. For instance, watching a head-and-shoulders video coded at 64 kbit/s by the early prototypes of the video codecs in the mid 1980s was very fascinating. This was despite the fact that pictures were coded at one or two frames per second, and waving hands in front of the camera would freeze the picture for a few seconds, or cause a complete picture break-up. But today, even 64 kbit/s coded video at 4–5 frames per second, without picture freeze, does not look attractive. As another example, most people might be quite satisfied with the quality of the broadcast TV at home, both analogue and digital, but if they watch football spectators on a broadcast TV side by side with an HDTV video, they then realise how much information they are missing. These are all indications that people's expectations of video quality in the future will be higher. Thus video codecs either have to be more sophisticated or more channel bandwidth should be assigned to them. Fortunately, with the advances in digital technology and growth in network bandwidth, both are feasible and in the future we will witness better quality video services.

Problems

1. 

In a PAL system, determine the values of the three colour primaries R, G and B for the following colours: red, green, blue, yellow, cyan, magenta and white.

2. 

Calculate the luminance and chrominance values of the colours in problem 1, if they are digitised into eight bits, according to CCIR-601 specification.

3. 

Calculate the horizontal scanning line frequency for CCIR-601/625 and CCIR-601/525 line systems and hence their periods.

4. 

CCIR-601/625 video is normally digitised at 13.5 MHz sampling rate. Find the number of pixels per scanning line. If there are 720 pixels in the active part of the horizontal scanning, find the duration of horizontal scanning fly-back (i.e. horizontal blanking interval).

5. 

Repeat problem 4 for CCIR-601/525.

6. 

Find the bit rate per second of the following video formats (only active pixels are considered):

  1. CCIR-601/625; 4:2:2
  2. CCIR-601/525; 4:2:2
  3. SIF/625; 4:2:0
  4. SIF/525; 4:2:0
  5. CIF
  6. SIF/625; 4:1:1
  7. SIF/525; 4:1:1
  8. QCIF(15 Hz)
  9. subQCIF(10 Hz)

7. 

The luminance values of a set of pixels in a CCIR-601 video are: 128; 128; 128; 120; 60; 50; 180; 154; 198; 205; 105; 61; 93; 208; 250; 190; 128; 128; 128. They are filtered and downsampled by 2:1 into SIF format. Find the luminance values of seven SIF samples, starting from the fourth pixel of CCIR-601.

8. 

The luminance values of the SIF pixels in problem 7 are upsampled and filtered into CCIR-601 format. Find the reconstructed CCIR-601 format pixel values. Calculate the PSNR of the reconstructed samples.

9. 

A pure sinusoid is linearly quantised into n bits:

  1. show that the signal-to-quantisation noise ratio (SNR) in dB is given by SNR = 6n + 1.78,
  2. find such an expression, for peak SNR (PSNR),
  3. calculate the minimum bit per pixel required for quantising video, such that PSNR is better than 58 dB.

      hint: the mean-squared quantisation error of a uniformly quantised waveform with step size Δ is Δ²/12.

Answers

1.

     r    g    b    y    c    m    w
R    1    0    0    1    0    1    1
G    0    1    0    1    1    0    1
B    0    0    1    0    1    1    1

2.

      r    g    b    y    c    m    w
Y     82   145  41   210  170  107  235
Cb    90   54   240  16   166  202  128
Cr    240  34   110  146  16   222  128

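3.

625 × 25 = 15 625 Hz, with a line period of 1/15 625 = 64 μs; and 525 × 30 = 15 750 Hz, with a line period of 1/15 750 ≈ 63.5 μs

4.

13.5 MHz × 64 μs = 864 pixels; 864 - 720 = 144 pixels; 144/13.5 MHz ≈ 10.7 μs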

5. 

857 pixels; 857 - 720 = 137 pixels; 10 μs

6. 

  1. 720 × 576 × 25 × 2 × 8 = 166 Mbit/s
  2. 720 × 480 × 30 × 2 × 8 = 166 Mbit/s
  3. 360 × 288 × 25 × 1.5 × 8 = 31 Mbit/s
  4. 360 × 240 × 30 × 1.5 × 8 = 31 Mbit/s
  5. 37 Mbit/s
  6. 31 Mbit/s
  7. 31 Mbit/s
  8. 4.7 Mbit/s
  9. 1.5 Mbit/s

7. 

94 73 194 184 50 204 207

8. 

94 82 73 132 194 201 184 109 50 121 204 222 207 PSNR = 20.2 dB

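9. 

  1. A sinusoid with amplitude A has a peak-to-peak range of 2A = 2ⁿΔ ⇒ Δ = A × 2^(1-n). The mean power of the sinusoid is A²/2 and the quantisation noise power is Δ²/12, hence

$$
\mathrm{SNR} = 10\log_{10}\frac{A^2/2}{\Delta^2/12} = 10\log_{10}\left(1.5 \times 2^{2n}\right) \approx 6n + 1.78\ \mathrm{dB}
$$

  2. Peak-to-peak power of the sinusoid, (2A)², is eight times (9 dB) higher than its mean power A²/2 ⇒ PSNR = 10.78 + 6n

  3. 10.78 + 6n ≥ 58 ⇒ n ≥ 8 bits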
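The result can be checked numerically with the sketch below, which evaluates the exact expressions rather than the rounded 6n approximation. The function names are hypothetical.

```python
import math


def snr_db(bits):
    """SNR of a full-scale sinusoid quantised uniformly to `bits` bits."""
    # Signal power A^2/2 over noise power D^2/12, with D = A * 2**(1 - bits),
    # simplifies to 1.5 * 2**(2 * bits).
    return 10 * math.log10(1.5 * 2 ** (2 * bits))


def psnr_db(bits):
    """PSNR: the peak-to-peak power (2A)^2 is 8 times the mean power A^2/2."""
    return snr_db(bits) + 10 * math.log10(8)


def min_bits_for_psnr(target_db):
    """Smallest number of bits whose PSNR reaches the target."""
    bits = 1
    while psnr_db(bits) < target_db:
        bits += 1
    return bits


required_bits = min_bits_for_psnr(58)
```

Seven bits give a PSNR of just under 53 dB and eight bits just under 59 dB, so eight bits are the minimum for a PSNR better than 58 dB.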

References

1 NETRAVALI, A.N., and HASKELL, B.G.: 'Digital pictures, representation and compression and standards' (Plenum Press, New York, 1995, 2nd edn)

2 CCIR Recommendation 601: 'Digital methods of transmitting television information'. Recommendation 601, encoding parameters of digital television for studios

3 MPEG-1: 'Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s'. ISO/IEC 11172-2: video, November 1991

4 OKUBO, S.: 'Video codec standardisation in CCITT study group XV', Signal Process. Image Commun., 1989, pp.45–54

5 Recommendation ITU-R BT.500 (revised): 'Methodology for the subjective assessment of the quality of television pictures'

6 VQEG: The Video Quality Experts Group, RRNR-TV Group Test Plan, draft version 1.4, 2000

7 TAN, K.T., GHANBARI, M., and PEARSON, D.E.: 'An objective measurement tool for MPEG video quality', Signal Process., 1998, 7, pp.279–294

Standard Codecs: Image Compression to Advanced Video Coding (IET Telecommunications Series)
ISBN: 0852967101
EAN: 2147483647
Year: 2005
Pages: 148
Authors: M. Ghanbari
