3. Full-Reference Quality Assessment using Structural Distortion Measures


The paradigm of error-sensitivity-based image and video quality assessment treats any kind of image distortion as a certain type of error. Since different error structures have different effects on perceived image quality, the effectiveness of this approach depends on how well the structures of the errors are understood and represented. Linear channel decomposition is the most commonly used way to decompose the error signals into a set of elementary components, and the visual error sensitivity models for these elementary components are relatively easy to obtain from psychovisual experiments. As described in Section 2.5, because linear channel decomposition methods cannot fully decorrelate the structures of the signal, the decomposed coefficients still exhibit strong correlations with each other, and it has been argued there that the Minkowski error metric cannot capture these structural correlations. The error-sensitivity-based paradigm therefore relies on a very powerful masking model, which must cover the various intra- and inter-channel interactions between the decomposed coefficients. Current knowledge about visual masking effects is still limited. At present, it is not clear whether building a comprehensive masking model is possible at all; even if it were, the model would likely be very complicated.

In this section, we propose an alternative way to think about image quality assessment: it is not necessary to consider the difference between an original image and a distorted image as a certain type of error. What we will now describe as structural distortion measurement may lead to more efficient and more effective image quality assessment methods.

3.1 New Philosophy

In [8] and [68], a new philosophy in designing image and video quality metrics has been proposed:

The main function of the human visual system is to extract structural information from the viewing field, and the human visual system is highly adapted for this purpose. Therefore, a measurement of structural distortion should be a good approximation of perceived image distortion.

The new philosophy can be better understood by comparison with the error-sensitivity-based philosophy:

First, a major difference of the new philosophy from the error-sensitivity-based philosophy is the switch from error measurement to structural distortion measurement. Although error and structural distortion sometimes agree with each other, in many circumstances the same amount of error may lead to significantly different structural distortions. A good example is given in Figures 41.8 and 41.9, where the original "Lena" image is altered with a wide variety of distortions: impulsive salt-pepper noise, additive Gaussian noise, multiplicative speckle noise, mean shift, contrast stretching, blurring, and heavy JPEG compression. We tuned all the distorted images to yield the same MSE relative to the original one, except for the JPEG compressed image, which has a slightly smaller MSE. It is interesting to see that images with nearly identical MSE have drastically different perceptual quality. Our subjective evaluation results show that the contrast-stretched and the mean-shifted images provide very high perceptual quality, while the blurred and the JPEG compressed images have the lowest subjective scores [7,68]. This is not surprising under the new philosophy: the structural change from the original to the contrast-stretched and mean-shifted images is trivial, whereas the blurred and JPEG compressed images are structurally modified very significantly.
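Equalizing the MSE across distortions, as done for Figures 41.8 and 41.9, is straightforward for distortions controlled by a single strength parameter. The sketch below is a minimal illustration under assumptions of our own (floating-point images, no clipping to [0, 255], and a random array standing in for "Lena"); it is not the procedure used in [7,68], and the helper names are hypothetical. A mean shift by c has an MSE of exactly c², so MSE = 225 corresponds to c = 15; additive noise and a contrast stretch about the mean each need one scale factor.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two equal-size arrays."""
    return np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2)

def add_noise_at_mse(x, noise, target_mse):
    """Scale a noise field so that x + s * noise has exactly the target MSE w.r.t. x."""
    s = np.sqrt(target_mse / np.mean(noise ** 2))
    return x + s * noise

def stretch_at_mse(x, target_mse):
    """Contrast stretch about the mean, y = m + a (x - m), with a chosen so MSE = target.

    y - x = (a - 1)(x - m), so MSE = (a - 1)^2 * var(x) (population variance),
    which gives a = 1 + sqrt(target / var(x)).
    """
    m = x.mean()
    a = 1.0 + np.sqrt(target_mse / np.var(x))
    return m + a * (x - m)

rng = np.random.default_rng(0)
x = rng.uniform(50, 200, size=(64, 64))  # hypothetical stand-in for an 8-bit image
target = 225.0

y_gauss = add_noise_at_mse(x, rng.standard_normal(x.shape), target)
y_shift = x + np.sqrt(target)            # a mean shift by c gives MSE = c**2, so c = 15
y_stretch = stretch_at_mse(x, target)

print(mse(x, y_gauss), mse(x, y_shift), mse(x, y_stretch))  # each approximately 225
```

Blurring and JPEG compression have no such closed form; their strength parameter would have to be searched numerically, which this sketch does not show.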

Second, another important difference of the new philosophy is that it considers image degradation as perceived structural information loss. For example, in Figure 41.9, the contrast-stretched image has better quality than the JPEG compressed image simply because almost all the structural information of the original image is preserved, in the sense that the original image can be recovered via a simple pointwise inverse linear luminance transform. By contrast, much of the information in the original image is permanently lost in the JPEG compressed image. Treating structural information loss as a prediction of visual perception rests on the assumption that the HVS functions in a similar way: it has adapted to extract structural information and to detect changes in it. An error-sensitivity-based approach, on the other hand, estimates perceived errors to represent image degradation. If such an approach worked as intended, it would report a significant perceptual error for the contrast-stretched image, because its difference (in terms of error) from the original image is easily discerned.
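The claim that a contrast-stretched image keeps the structural information of the original, while a blurred one does not, can be checked numerically: fit the best pointwise linear map from the distorted image back to the original and look at the error that remains. This is a minimal sketch under assumptions of our own (a random array as a hypothetical stand-in image, a 5 × 5 box blur rather than the blur used in the chapter, hypothetical helper names). Because the fit uses the original image, it tests whether a pointwise linear inverse exists; it is not a blind restoration method.

```python
import numpy as np

def best_linear_recovery_mse(x, y):
    """MSE left after the best pointwise linear map a*y + b fitted back towards x."""
    a, b = np.polyfit(y.ravel(), x.ravel(), deg=1)  # least-squares slope and intercept
    return np.mean((a * y + b - x) ** 2)

def box_blur(img, k=5):
    """Simple k x k box blur with edge padding (illustrative only)."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

rng = np.random.default_rng(1)
x = rng.uniform(0, 255, size=(64, 64))       # hypothetical stand-in image
stretched = 1.3 * (x - x.mean()) + x.mean()  # pointwise linear contrast stretch
blurred = box_blur(x)

print(best_linear_recovery_mse(x, stretched))  # floating-point noise level
print(best_linear_recovery_mse(x, blurred))    # large: no gain/offset undoes blurring
```

For the stretched image the residual is essentially zero, whereas for the blurred image it stays large, because spatial averaging mixes neighboring pixels in a way that no single gain and offset can reverse.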

Third, the new philosophy uses a top-down approach, which starts from the very top level by simulating the hypothesized functionality of the overall HVS. By comparison, the error-sensitivity-based philosophy uses a bottom-up approach, which attempts to simulate the function of each relevant component in the HVS and combine them, in the hope that the combined system will perform similarly to the overall HVS.

How to apply the new philosophy to create a concrete image and video quality assessment method is an open issue. There may be very different implementations, depending on how the concepts of "structural information" and "structural distortion" are interpreted and quantified. Generally speaking, there are two ways of implementing a quality assessment algorithm based on the new philosophy. The first is to develop a feature description framework for natural images that covers most of the useful structural information of an image signal; within such a framework, structural information changes between the original and the distorted signals can be quantified. The second is to design a structure comparison method that directly compares the structural similarity or structural difference between the original and the distorted signals. As a first attempt to implement the new philosophy, a simple image quality indexing approach was proposed in [7,68]; it follows the second approach.

3.2 An Image Quality Indexing Approach

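Let $x = \{x_i \mid i = 1, 2, \dots, N\}$ and $y = \{y_i \mid i = 1, 2, \dots, N\}$ be the original and the test image signals, respectively. The proposed quality index is defined as:

$$Q = \frac{4\,\sigma_{xy}\,\bar{x}\,\bar{y}}{\left(\sigma_x^2 + \sigma_y^2\right)\left[(\bar{x})^2 + (\bar{y})^2\right]} \tag{41.4}$$

where

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i,$$

$$\sigma_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2, \qquad \sigma_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2,$$

$$\sigma_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right).$$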

The dynamic range of Q is [-1, 1]. The best value 1 is achieved if and only if $y_i = x_i$ for all $i = 1, 2, \dots, N$. The lowest value of -1 occurs when $y_i = 2\bar{x} - x_i$ for all $i = 1, 2, \dots, N$.
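Equation (41.4) translates almost directly into code. The following is a minimal NumPy sketch that computes one global index for a pair of signals; it is not the MATLAB implementation referenced in [69], and the function name `quality_index` is hypothetical. The normalization of the variances and covariance (N or N - 1) cancels between numerator and denominator, so it does not affect Q. The index is undefined (0/0) when both signals are constant or both have zero mean, a case this sketch does not handle.

```python
import numpy as np

def quality_index(x, y):
    """Global quality index Q of Eq. (41.4) for two equal-size signals."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    n = x.size
    mx, my = x.mean(), y.mean()
    var_x = np.sum((x - mx) ** 2) / (n - 1)
    var_y = np.sum((y - my) ** 2) / (n - 1)
    cov_xy = np.sum((x - mx) * (y - my)) / (n - 1)
    return 4 * cov_xy * mx * my / ((var_x + var_y) * (mx ** 2 + my ** 2))

x = np.array([52.0, 80.0, 120.0, 200.0, 31.0, 95.0])  # hypothetical 1-D signal
print(quality_index(x, x))                  # approximately 1.0 (best value)
print(quality_index(x, 2 * x.mean() - x))   # approximately -1.0 (lowest value)
print(quality_index(x, x + 10.0))           # below 1: a mean shift lowers Q
```

The minimum is reached by reflecting the signal about its mean, $y_i = 2\bar{x} - x_i$: the means and contrasts then match while the correlation is -1. Plain negation, $y_i = -x_i$, would instead give Q = 1, because the negative correlation and the negative mean term cancel.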

This quality index models any distortion as a combination of three factors: loss of correlation, mean distortion and contrast distortion. In order to understand this, we rewrite the definition of Q as the product of three components:

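$$Q = \frac{\sigma_{xy}}{\sigma_x\,\sigma_y} \cdot \frac{2\,\bar{x}\,\bar{y}}{(\bar{x})^2 + (\bar{y})^2} \cdot \frac{2\,\sigma_x\,\sigma_y}{\sigma_x^2 + \sigma_y^2} \tag{41.5}$$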

The first component is the correlation coefficient between x and y, which measures the degree of linear correlation between x and y; its dynamic range is [-1, 1]. The best value 1 is obtained when $y_i = a x_i + b$ for all $i = 1, 2, \dots, N$, where a and b are constants and a > 0. We consider the linear correlation coefficient a very important factor in comparing the structures of two signals. Notice that a pointwise linearly changed signal can be recovered exactly with a simple pointwise inverse linear transform; in this sense, the "structural information" is preserved. Furthermore, a decrease in the linear correlation coefficient gives a quantitative measure of how much the signal has been changed nonlinearly. Even if x and y are perfectly linearly correlated, however, there may still be relative distortions between them, which are evaluated by the second and third components. The second component, with a range of [0, 1], measures how similar the mean values of x and y are. It equals 1 if and only if $\bar{x} = \bar{y}$. $\sigma_x$ and $\sigma_y$ can be viewed as rough estimates of the contrast of x and y, so the third component measures how similar the contrasts of the images are. Its range of values is also [0, 1], where the best value 1 is achieved if and only if $\sigma_x = \sigma_y$.
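The factorization in Eq. (41.5) can be checked numerically. This sketch (hypothetical function name, random test signal) computes the three factors separately and illustrates the behavior described above: a contrast stretch about the mean leaves the correlation and mean terms at 1 and lowers only the contrast term, while a mean shift lowers only the mean term.

```python
import numpy as np

def quality_components(x, y):
    """Return the three factors of Eq. (41.5): correlation, mean and contrast similarity."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(ddof=1), y.std(ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    correlation = sxy / (sx * sy)
    mean_similarity = 2 * mx * my / (mx ** 2 + my ** 2)
    contrast_similarity = 2 * sx * sy / (sx ** 2 + sy ** 2)
    return correlation, mean_similarity, contrast_similarity

rng = np.random.default_rng(2)
x = rng.uniform(20, 230, size=1000)          # hypothetical test signal
stretched = 1.5 * (x - x.mean()) + x.mean()  # contrast stretch about the mean
shifted = x + 15.0                           # mean shift

for name, y in [("stretched", stretched), ("shifted", shifted)]:
    c, m, s = quality_components(x, y)
    print(f"{name}: corr={c:.4f} mean={m:.4f} contrast={s:.4f} Q={c * m * s:.4f}")
```

For the gain of 1.5 used here, the contrast term is 2 · 1.5 / (1 + 1.5²) = 3 / 3.25 ≈ 0.923, independent of the signal, and the product of the three factors equals Q from Eq. (41.4).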

Image signals are generally non-stationary, and image quality is often spatially variant, yet in practice it is usually desirable to evaluate an entire image with a single overall quality value. It is therefore reasonable to measure statistical features locally and then combine them. We apply our quality measurement method to local regions using a sliding window approach. Starting from the top-left corner of the image, a sliding window of size B × B moves pixel by pixel horizontally and vertically through all the rows and columns of the image until the bottom-right corner is reached. At the j-th step, the local quality index $Q_j$ is computed within the sliding window. If there are a total of M steps, the overall quality index is given by

$$Q = \frac{1}{M}\sum_{j=1}^{M} Q_j \tag{41.6}$$
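The sliding-window procedure and the pooling in Eq. (41.6) can be sketched with NumPy's `sliding_window_view` (available from NumPy 1.20). The following choices are assumptions of this sketch, not details from the chapter: window statistics are obtained from local sums of x, y, x², y², and xy, and windows where a denominator vanishes follow a convention of our own (both windows constant: mean term only; both windows zero-mean: correlation times contrast; both: $Q_j = 1$). The function names are hypothetical, and this is not a port of the MATLAB code in [69].

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _window_sums(a, block):
    """Sum of `a` over every block x block window, stepping one pixel at a time."""
    return sliding_window_view(a, (block, block)).sum(axis=(-2, -1))

def quality_index_map(x, y, block=8):
    """Local quality index Q_j (Eq. 41.4) at each of the M sliding-window positions."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = block * block
    sx, sy = _window_sums(x, block), _window_sums(y, block)
    mx, my = sx / n, sy / n
    # Unbiased (N - 1) window statistics; the normalization cancels in Q anyway.
    var_x = (_window_sums(x * x, block) - sx * sx / n) / (n - 1)
    var_y = (_window_sums(y * y, block) - sy * sy / n) / (n - 1)
    cov_xy = (_window_sums(x * y, block) - sx * sy / n) / (n - 1)

    num = 4 * cov_xy * mx * my
    den_var = var_x + var_y
    den_mean = mx ** 2 + my ** 2
    den = den_var * den_mean

    q = np.ones_like(num)  # both denominators zero: both windows are all zeros, identical
    ok = den != 0
    q[ok] = num[ok] / den[ok]
    flat = (den_var == 0) & (den_mean != 0)       # both windows constant: mean term only
    q[flat] = 2 * mx[flat] * my[flat] / den_mean[flat]
    zero_mean = (den_mean == 0) & (den_var != 0)  # both zero-mean: correlation x contrast
    q[zero_mean] = 2 * cov_xy[zero_mean] / den_var[zero_mean]
    return q

def quality_index_image(x, y, block=8):
    """Overall index Q = (1/M) * sum_j Q_j over all window positions (Eq. 41.6)."""
    return float(quality_index_map(x, y, block).mean())

rng = np.random.default_rng(3)
x = rng.uniform(0, 255, size=(64, 64))        # hypothetical stand-in image
noisy = x + rng.normal(0, 15, size=x.shape)
print(quality_index_image(x, x))      # approximately 1.0
print(quality_index_image(x, noisy))  # below 1
```

Calling `quality_index_image(original, distorted, block=8)` corresponds to the window size B = 8 used for Figures 41.8 and 41.9. An H × W image yields M = (H - B + 1)(W - B + 1) window positions.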

It has been shown that many image quality assessment algorithms work consistently well if the distorted images being compared are created from the same original image with the same type of distortion (e.g., JPEG compression). In fact, for such comparisons, the MSE or PSNR is usually sufficient to produce useful quality evaluations. However, the effectiveness of image quality assessment models degrades significantly when they are used to compare distorted images that originate from different original images and different types of distortion. Cross-image and cross-distortion tests are therefore very useful for evaluating the effectiveness of an image quality metric.

The images in Figures 41.8 and 41.9 are good examples for testing the cross-distortion capability of a quality assessment algorithm. Clearly, the MSE performs very poorly in this case. The quality indices of the images, computed with the sliding window size fixed at B = 8, are given in Figures 41.8 and 41.9. The results exhibit surprising consistency with the subjective measures. In fact, the ranks given by the quality index are the same as the mean subjective ranks of our subjective evaluations [7,68]. We noticed that many subjects regarded the contrast-stretched image as having better quality than the mean-shifted image, and even better than the original image. This is not surprising, because contrast stretching is often used for image enhancement and can increase the visual quality of the original image. However, if we assume that the original image is the perfect one (as our quality measurement method does), then it is fair to give the mean-shifted image a higher quality score.

Figure 41.8: Evaluation of "Lena" images with different types of noise. Top-left: Original "Lena" image, 512 × 512, 8 bits/pixel; Top-right: Impulsive salt-pepper noise contaminated image, MSE = 225, Q = 0.6494; Bottom-left: Additive Gaussian noise contaminated image, MSE = 225, Q = 0.3891; Bottom-right: Multiplicative speckle noise contaminated image, MSE = 225, Q = 0.4408.

Figure 41.9: Evaluation of "Lena" images with different types of distortions. Top-left: Mean shifted image, MSE = 225, Q = 0.9894; Top-right: Contrast stretched image, MSE = 225, Q = 0.9372; Bottom-left: Blurred image, MSE = 225, Q = 0.3461; Bottom-right: JPEG compressed image, MSE = 215, Q = 0.2876.

In Figures 41.10 and 41.11, different images with the same distortion types are employed to test the cross-image capability of the quality index. In Figure 41.10, three different images are blurred, such that they have almost the same MSE with respect to their original ones. In Figure 41.11, three other images are compressed using JPEG, and the JPEG compression quantization steps are selected so that the three compressed images have similar MSE in comparison with their original images. Again, the MSE has very poor correlation with perceived image quality in these tests, and the proposed quality indexing algorithm delivers much better consistency with visual evaluations.

Figure 41.10: Evaluation of blurred image quality. Top-left: Original "Woman" image; Top-right: Blurred "Woman" image, MSE = 200, Q = 0.3483; Middle-left: Original "Man" image; Middle-right: Blurred "Man" image, MSE = 200, Q = 0.4123; Bottom-left: Original "Barbara" image; Bottom-right: Blurred "Barbara" image, MSE = 200, Q = 0.6594.

Figure 41.11: Evaluation of JPEG compressed image quality. Top-left: Original "Tiffany" image; Top-right: Compressed "Tiffany" image, MSE = 165, Q = 0.3709; Middle-left: Original "Lake" image; Middle-right: Compressed "Lake" image, MSE = 167, Q = 0.4606; Bottom-left: Original "Mandrill" image; Bottom-right: Compressed "Mandrill" image, MSE = 163, Q = 0.7959.

Interested readers may refer to [69] for more demonstrative images and an efficient MATLAB implementation of the proposed quality indexing algorithm.

The proposed quality indexing method is only a rudimentary implementation of the new paradigm. Although it gives promising results in the current limited tests, more extensive experiments are needed to validate and optimize the algorithm. Stronger theoretical and experimental connections to human visual perception also need to be established. Another important issue to be explored is how to apply the method to video quality assessment. In [70], the quality index was calculated frame by frame for a video sequence and combined with other image distortion features such as blocking to produce a video quality measure.
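The frame-by-frame use of the index mentioned for [70] can be sketched as follows. This covers only the per-frame part: [70] also combined the index with other distortion features such as blocking, which are not modeled here, and the plain average over frames is our assumption rather than the pooling used in [70]. The function names are hypothetical, and windows with a zero denominator are simply scored as 1, a cruder rule than a careful implementation would use.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def frame_quality(x, y, block=8):
    """Mean local quality index of one frame (Eqs. 41.4 and 41.6)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = block * block

    def wsum(a):
        # Sum over every block x block window, stepping one pixel at a time.
        return sliding_window_view(a, (block, block)).sum(axis=(-2, -1))

    sx, sy = wsum(x), wsum(y)
    mx, my = sx / n, sy / n
    var_x = (wsum(x * x) - sx * sx / n) / (n - 1)
    var_y = (wsum(y * y) - sy * sy / n) / (n - 1)
    cov_xy = (wsum(x * y) - sx * sy / n) / (n - 1)
    num = 4 * cov_xy * mx * my
    den = (var_x + var_y) * (mx ** 2 + my ** 2)
    # Simplification: windows with a zero denominator keep the value Q_j = 1.
    q = np.ones_like(num)
    np.divide(num, den, out=q, where=den != 0)
    return float(q.mean())

def video_quality(ref_frames, test_frames, block=8):
    """Per-frame index pooled by a plain average over frames (an assumed pooling rule)."""
    if len(ref_frames) != len(test_frames):
        raise ValueError("sequences must have the same number of frames")
    per_frame = [frame_quality(r, t, block) for r, t in zip(ref_frames, test_frames)]
    return float(np.mean(per_frame)), per_frame

rng = np.random.default_rng(4)
ref = [rng.uniform(0, 255, size=(32, 32)) for _ in range(3)]  # hypothetical frames
test = [f + rng.normal(0, 20, size=f.shape) for f in ref]
score, per_frame = video_quality(ref, test)
print(round(score, 3), [round(s, 3) for s in per_frame])
```

Keeping the per-frame scores alongside the pooled value makes it possible to try other pooling rules, or to combine the index with additional features as in [70], without recomputing the frames.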




Handbook of Video Databases. Design and Applications
Handbook of Video Databases: Design and Applications (Internet and Communications)
ISBN: 084937006X
Year: 2003
Pages: 393
