2.4 Picture quality assessment

Conversion of digital pictures from one format to another, as well as their compression for bit rate reduction, introduces some distortions. It is of great importance to know whether the introduced distortion is acceptable to the viewers. Traditionally this has been done by subjective assessments, where the degraded pictures are shown to a group of subjects and their views on the perceived quality or distortions are sought.

Over the years, many subjective assessment methodologies have been developed and validated. Among them are the double stimulus impairment scale (DSIS), where subjects are asked to rate the impairment of the processed picture with respect to the unimpaired reference picture, and the double stimulus continuous quality scale (DSCQS), where the order of presentation of the reference and processed pictures is unknown to the subjects. The subjects give each picture a score between 1 and 100 on a scale with adjectival guidelines placed at 20-point intervals (1–20 = bad, 21–40 = poor, 41–60 = fair, 61–80 = good and 81–100 = excellent), and the difference between the two scores is an indication of the quality [5]. Pictures are presented to the viewers for about ten seconds, and the average of the viewers' scores, defined as the mean opinion score (MOS), is a measure of video quality. At least 20–25 nonexpert viewers, excluding outliers, are required for a reliable MOS.
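By way of illustration, the following Python sketch averages a panel's ratings into an MOS and screens out outlier viewers with a simple z-score rule. The function name, the two-standard-deviation threshold and the ratings themselves are assumptions made for the example; they are not the formal screening procedure of the ITU recommendations.

```python
import numpy as np

def mean_opinion_score(scores, z_thresh=2.0):
    """Average viewer ratings (1-100 scale) into an MOS, discarding viewers
    whose rating deviates strongly from the panel mean (illustrative rule only)."""
    scores = np.asarray(scores, dtype=float)
    mean, std = scores.mean(), scores.std()
    if std == 0:
        return mean                                   # all viewers agree
    keep = np.abs(scores - mean) <= z_thresh * std    # simple outlier screen
    return scores[keep].mean()

# Example: 22 viewers rate one processed sequence on the DSCQS scale
ratings = [72, 68, 75, 70, 66, 74, 71, 69, 73, 12,    # one obvious outlier
           70, 76, 67, 72, 74, 71, 69, 68, 73, 70, 75, 71]
print(f"MOS = {mean_opinion_score(ratings):.1f}")
```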

These methods are usually used in the assessment of still images. For video evaluation, single stimulus continuous quality evaluation (SSCQE) is preferred, where the time-varying picture quality of the processed video is evaluated by the subjects without a reference [5]. In this method subjects are asked to evaluate the video quality of a set of video scenes continuously. The judgement criteria are the five adjectival ranges used in the DSCQS above. Since video sequences are long, they are segmented into ten-second segments, and an MOS is calculated for each segment.
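As a rough illustration of that segmentation step, the sketch below splits one viewer's continuous score trace into ten-second segments and returns a mean score per segment. The helper name, the 2 Hz sampling rate and the synthetic trace are assumptions made purely for the example.

```python
import numpy as np

def segment_scores(trace, sample_rate_hz=2.0, segment_s=10.0):
    """Split a continuous score trace (slider position over time, 1-100)
    into fixed-length segments and return the mean score of each segment."""
    per_seg = int(sample_rate_hz * segment_s)
    n_segs = len(trace) // per_seg
    trace = np.asarray(trace[:n_segs * per_seg], dtype=float)
    return trace.reshape(n_segs, per_seg).mean(axis=1)

# Example: a 60-second sequence sampled at 2 Hz gives six segment scores
rng = np.random.default_rng(1)
trace = np.clip(70 + rng.normal(0, 8, 120), 1, 100)
print(segment_scores(trace).round(1))
```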

Although these methods give reliable indications of perceived image quality, they are unfortunately time consuming and expensive. An alternative is objective measurement, or video quality metrics, which employ mathematical models to mimic the behaviour of the human visual system.

In 1997 the Video Quality Experts Group (VQEG), formed from experts of ITU-R Study Group 6 and ITU-T Study Group 9, undertook this task [6]. The group is considering three methods for the development of the video quality metric. In the first method, called the full reference (FR-TV) model, both the processed and the reference video segments are fed to the model and the outcome is a quantitative indicator of the video quality. In the second method, called the reduced reference (RR-TV) model, some features extracted from the spatio-temporal regions of the reference picture (e.g. mean and variance of pixels, colour histograms etc.) are made available to the model, and the processed video is then required to generate similar statistics in those regions. In the third model, called no reference (NR-TV), or single ended, the model is driven by the processed video alone, without any information from the reference picture. All these models should be validated against SSCQE results for various video segments. Early results indicate that, compared with SSCQE, these methods perform satisfactorily, with a correlation coefficient of 0.8–0.9 [7].
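A simple way to see what such a correlation coefficient means in practice is to correlate a model's outputs with subjective MOS values over a set of test sequences, as in the sketch below. The scores are invented purely for illustration, and the VQEG evaluation itself involves more than this single Pearson coefficient.

```python
import numpy as np

# Hypothetical data: subjective MOS for eight test sequences and the
# corresponding predictions of an objective quality model (both on 1-100)
mos   = np.array([78, 64, 51, 42, 85, 70, 58, 33], dtype=float)
model = np.array([70, 68, 47, 52, 80, 62, 66, 38], dtype=float)

# Pearson correlation between objective predictions and subjective scores;
# values of 0.8-0.9 would be regarded as satisfactory in the text above
r = np.corrcoef(model, mos)[0, 1]
print(f"correlation coefficient = {r:.2f}")
```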

Until any of these quality metrics becomes a standard, it is customary to use the simplest form of objective measurement, which is the ratio of the peak-to-peak signal to the root-mean-squared processing noise. This is referred to as the peak-to-peak signal-to-noise ratio (PSNR) and is defined as:

\[
\text{PSNR} = 10 \log_{10} \frac{255^2}{\dfrac{1}{N} \sum_{i,j} \left[ Y_\text{ref}(i,j) - Y_\text{prc}(i,j) \right]^2}
\tag{2.7}
\]

where Yref(i, j) and Yprc(i, j) are the pixel values of the reference and processed images, respectively, and N is the total number of pixels in the image. In this equation, the peak signal for an eight-bit resolution is 255, and the noise is the mean of the squared pixel-to-pixel differences (errors) between the reference image and the image under study. Although it has been claimed that in some cases the PSNR's accuracy is doubtful, its relative simplicity makes it a very popular choice.
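As a minimal sketch of equation (2.7), the following Python function computes the PSNR between two eight-bit luminance images. The use of NumPy, the function name and the handling of the zero-error case are implementation choices for the example, not part of the definition.

```python
import numpy as np

def psnr(y_ref, y_prc, peak=255.0):
    """Peak-to-peak signal-to-noise ratio of a processed image against a
    reference, as in equation (2.7): 10*log10(peak^2 / mean squared error)."""
    y_ref = np.asarray(y_ref, dtype=np.float64)
    y_prc = np.asarray(y_prc, dtype=np.float64)
    mse = np.mean((y_ref - y_prc) ** 2)        # mean squared pixel error
    if mse == 0:
        return float('inf')                    # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Example: a synthetic reference frame versus a version with additive noise
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(288, 352)).astype(np.float64)
prc = np.clip(ref + rng.normal(0, 3, ref.shape), 0, 255)
print(f"PSNR = {psnr(ref, prc):.1f} dB")       # roughly 38-39 dB for this noise level
```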

Perhaps the main criticism of the PSNR is that human interpretation of distortions in different parts of the video can differ. Although it is hoped that this variety of interpretations can be included in the objective models, there are still some issues that not only the simple PSNR but also the more sophisticated objective models may fail to address. For example, if a small part of a picture in a video is severely degraded, this hardly affects the PSNR or the parameters of any objective model (depending on the area of the distortion), but the degradation attracts the observers' attention, and the video looks as bad as if a larger part of the picture had been distorted. This type of distortion is very common in video, where, due to a single bit error, blocks of 16 × 16 pixels may be erroneously decoded. This has only a small effect on the PSNR, but is perceived as an annoying artefact. In such cases there will be a large discrepancy between the objective and subjective test results.
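The sketch below illustrates this numerically, with a synthetic random frame standing in for real video, so the figures only illustrate the arithmetic, not how either picture would actually look. Low-level noise spread over the whole frame and a single ruined 16 × 16 block give comparable PSNR values, even though the second artefact would be far more annoying to a viewer. The noise level and block position are arbitrary choices, and the psnr helper from the previous sketch is repeated so the snippet is self-contained.

```python
import numpy as np

def psnr(ref, prc, peak=255.0):
    mse = np.mean((np.asarray(ref, float) - np.asarray(prc, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(288, 352)).astype(float)   # CIF-sized luminance plane

# Distortion A: low-level noise spread over the whole frame (hardly noticeable)
noisy = np.clip(ref + rng.normal(0, 7, ref.shape), 0, 255)

# Distortion B: one 16x16 block decoded wrongly, e.g. after a single bit error
blocky = ref.copy()
blocky[100:116, 200:216] = 0

print(f"PSNR, noise over whole frame: {psnr(ref, noisy):.1f} dB")
print(f"PSNR, one corrupted block   : {psnr(ref, blocky):.1f} dB")
```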

On the other hand, one may argue that, under similar conditions, if one system has a better PSNR than another, then its subjective quality can be better but not worse. This is the main reason that the PSNR is still used in comparing the performance of various video codecs. However, in comparing codecs, the PSNR or any other objective measure should be used with great care, to ensure that the types of coding distortion are not significantly different from each other. For instance, objective results for the blockiness distortion produced by block-based video codecs can differ from those for the picture-smearing distortion introduced by filter-based codecs. Even the subjects may interpret these distortions differently: it appears that expert viewers prefer blockiness distortion to smearing, while nonexperts take the opposite view!

In addition to the abovementioned problems of subjective and objective measurement of video quality, the impact of people's expectations of video quality cannot be ignored. As technology progresses and viewers become more familiar with digital video, their expectations of video quality grow. Hence a quality that might be regarded as 'good' today may be rated 'fair' or 'poor' tomorrow. For instance, watching a head-and-shoulders video coded at 64 kbit/s by the early prototype video codecs in the mid 1980s was very fascinating, despite the fact that pictures were coded at one or two frames per second, and waving a hand in front of the camera would freeze the picture for a few seconds or cause a complete picture break-up. But today, even 64 kbit/s coded video at 4–5 frames per second, without picture freeze, does not look attractive. As another example, most people may be quite satisfied with the quality of broadcast TV at home, both analogue and digital, but if they watch football spectators on broadcast TV side by side with an HDTV picture, they realise how much information they are missing. These are all indications that people's expectations of video quality will be higher in the future. Thus video codecs either have to become more sophisticated or more channel bandwidth must be assigned to them. Fortunately, with advances in digital technology and growth in network bandwidth, both are feasible, and in the future we will witness better quality video services.


