9.7 Protection against error

H.263 provides error protection, robustness and resilience to allow access to video information over a wide range of transmission media. In particular, due to the rapid growth of mobile communications, it is extremely important that video information can be accessed via wireless networks. This implies a need for useful operation of video compression algorithms in a very error-prone environment at low bit rates (i.e. less than 64 kbit/s).

In the previous sections we studied the two important coding tools of VLC and resynchronisation markers in the H.263 codec. The former spreads the errors, and the latter tries to confine them into a small area. In this section, we introduce some more useful tools that can enhance the video quality beyond what we have seen so far. Some of these are recommended as options (or annexes) and some as postprocessing tools that can be implemented at the decoder without the help of the encoder. They can be used either together or individually, to improve video quality.

9.7.1 Forward error correction

Forward error correction is the simplest and most effective means of improving video quality in the event of channel errors. It is based on adding redundancy bits, known as parity bits, to a group of data bits according to some rule. At the receiver, the decoder, invoking the same rule, can detect whether any error has occurred, and in certain cases even correct it. However, for video data, error correction is not as important as error detection.

The forward error correction for H.263 is the same as for H.261, and is optional [25-H]. However, since the main usage of H.263 will be in a mobile environment with poor error characteristics, forward error correction is particularly important. In most cases (e.g. the GSM system), the error correction will be an integral part of the transmission channel. If it is not, or if additional protection is required, then it should be built into the H.263 system.

To allow the video data and error correction parity information to be identified by the decoder, an error correction framing pattern is included. This pattern consists of multiframes of eight frames, each frame comprising 1 framing bit, 1 fill indicator (FI) bit, 492 bits of coded data and 18 parity bits. One bit from each of the eight frames provides the frame alignment pattern (S1S2S3S4S5S6S7S8) = (00011011), which helps the decoder to resynchronise itself after the occurrence of errors.

The error detection/correction code is a BCH (511, 493) [16]. The parity is calculated against a code of 493 bits, comprising a bit fill indicator (FI) and 492 bits of coded video data. The generator polynomial is given by:

g(x) = (x^9 + x^4 + 1)(x^9 + x^6 + x^4 + x^3 + 1)    (9.12)

The parity bits are calculated by dividing the 493 bits of video data (including the fill bit), left shifted by 18 bits, by this generator polynomial. Since the generator is a 19-bit polynomial, the remainder is an 18-bit binary number (that is why the data bits had to be shifted 18 bits to the left), to be used as the parity bits. For example, for the input data of 01111...11 (493 bits), the resulting parity bits are 011011010100011011 (18 bits). The encoder appends these 18 bits to the 493 data bits, and the whole 511 bits are sent to the receiver as a block of data. This 511-bit block is exactly divisible by the generator polynomial, leaving a remainder of zero. Thus the receiver can perform a similar division, and any nonzero remainder is an indication of channel error. This is a very robust form of error detection, since bursts of errors can also be detected.
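The parity computation above is ordinary polynomial division over GF(2), and can be sketched in a few lines of Python, treating integers as bit vectors. The generator is the BCH (511, 493) polynomial quoted in the H.261 specification; the bit-ordering conventions here are illustrative rather than those of the standard:

```python
def mod2div(dividend: int, divisor: int) -> int:
    """Remainder of polynomial division over GF(2) (bits = coefficients)."""
    width = divisor.bit_length()
    while dividend.bit_length() >= width:
        dividend ^= divisor << (dividend.bit_length() - width)
    return dividend

def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)) polynomial multiplication."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

# g(x) = (x^9 + x^4 + 1)(x^9 + x^6 + x^4 + x^3 + 1): the 19-bit generator
F1 = (1 << 9) | (1 << 4) | 1
F2 = (1 << 9) | (1 << 6) | (1 << 4) | (1 << 3) | 1
G = clmul(F1, F2)

def bch_parity(data: int) -> int:
    """18 parity bits for 493 data bits (fill indicator + 492 coded bits)."""
    return mod2div(data << 18, G)

data = (1 << 492) - 1             # example data: a 0 followed by 492 ones
parity = bch_parity(data)
codeword = (data << 18) | parity  # the 511-bit transmitted block

assert mod2div(codeword, G) == 0               # error free: zero remainder
assert mod2div(codeword ^ (1 << 100), G) != 0  # a bit error leaves a remainder
```

Appending the computed remainder makes the 511-bit block divisible by g(x), so any nonzero remainder at the receiver signals a channel error.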

9.7.2 Back channel

The impact of errors on interframe coded pictures becomes objectionable when the error propagates through the picture sequence. Errors affecting only one video frame are easily tolerated by viewers, especially at high frame rates. To improve the quality of video services, propagation of errors through the picture frames must be prevented. A simple method is for the decoder, on detecting errors in the bit stream (e.g. by the means of section 9.7.1), to ask the encoder to code that part of the picture in the next frame in intra mode. This is called forced updating, and of course requires a back channel from the decoder to the encoder.

Since intraframe coded macroblocks (MB) generate more bits than interframe coded ones, forced updating may not be very effective. In particular, in normal interframe coding, only a small number of MBs in a GOB are coded. Forced updating encodes all the MBs in the GOB (including the noncoded MBs) in intra mode, which increases the bit rate significantly. This can have the side effect of impairing video quality in the subsequent frames. Moreover, if errors occur in more than one GOB, the situation becomes much worse, since the encoder can exceed its bit rate budget and drop some picture frames. This results in picture jerkiness, which is equally annoying.

A better way of preventing propagation of errors is to ask the encoder to change its prediction to an error-free picture. For example, if an error occurs in frame N, then in coding the next frame (frame N + 1) the encoder uses frame N - 1, which is free of error at the decoder. This of course requires additional picture buffers at both the encoder and the decoder.

The optional reference picture selection mode of H.263 uses additional picture memory at the encoder to perform such a task [25-N]. The amount of additional picture memory accommodated in the decoder may be signalled by external means to help memory management at the encoder. The source encoder for this mode is similar to the generic interframe coder, but several picture memories are provided in order that the encoder may keep a copy of several past pictures, as shown in Figure 9.21.

Figure 9.21: An encoder with multiple reference pictures

The source encoder selects one of the picture memories, according to the backward channel messages returned GOB by GOB, to suppress the temporal error propagation due to interframe coding. The information signalling which picture has been selected for prediction is included in the encoded bit stream. The decoder for this mode also has several additional picture memories, in which it stores correctly decoded pictures together with their temporal reference (TR) information. If the TRP field exists in the forward message, the decoder uses the stored picture whose TR equals TRP as the reference for interframe decoding, instead of the last decoded picture. When the picture whose TR equals TRP is not available at the decoder, the decoder may send a forced intra update signal to the encoder.

A positive acknowledgment (ACK) or a negative acknowledgment (NACK) is returned depending on whether the decoder successfully decodes a GOB.

Both forced intra updating and multiple reference picture modes require a back channel from the decoder to the encoder. If a back channel cannot be provided, multiple reference pictures can still be used to alleviate error propagation. For example, the encoder may always use an average of the two previous frames for prediction. Of course, in an error-free environment the compression efficiency is not as good as for single-frame prediction, but the robustness against channel errors is good.

Figure 9.22 illustrates the efficiency of multiple reference pictures in preventing error propagation. In the figure, errors occur at frame 30, where the picture quality drops by 3 dB (graph E) compared with the nonerror case (NE). With the back channel (BC), the encoder in coding frame 31 uses prediction from frame 29 instead of frame 30, and the picture quality is not very different from the nonerror case. Without the back channel, with a prediction from the average of the two previous frames (2F + E), the picture quality improves. However, the improvement is not very significant, since the quality of the picture without error (2F), due to the nonoptimum prediction, is not as good as in the nonerror mode (NE).

Figure 9.22: Use of multiple reference pictures with and without back channel

9.7.3 Data partitioning

Although the individual bits of the VLC coded symbols in a bit stream are equally susceptible to channel errors, the impact of an error on the symbols is unequal. Between two resynchronisation markers, symbols that appear earlier in the bit stream suffer less from errors than those that come later. This is due to the cumulative impact of VLC errors on the decoding of subsequent data. To illustrate the extent of this unequal susceptibility, consider a segment of VLC coded video data between two resynchronisation markers. Assume that the segment has N symbols with an average VLC length of L bits/symbol, and a channel with a bit error rate of P. If any of the first L bits of the bit stream (those immediately after the first marker) is in error, the first symbol is in error; this happens with a probability of approximately LP. The probability that the second symbol in the bit stream is in error then becomes 2LP, since any error in the first L bits also affects the second symbol. Hence the probability that the last symbol in the bit stream is in error will be NLP, since every error ahead of this symbol can change its value. Thus the last symbol is N times more likely to be in error than the first symbol in the bit stream.
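The growth of the symbol error probability along the segment can be checked numerically; the values of N, L and P below are illustrative assumptions, not figures from the standard:

```python
# Illustrative figures: a segment of N = 100 symbols, average VLC length
# L = 4 bits/symbol, channel bit error rate P = 1e-5.
N, L, P = 100, 4, 1e-5

# For small P, symbol k (1-based) is wrongly decoded with probability ~ k*L*P,
# since any error in the k*L bits up to and including it corrupts it.
p_symbol = [k * L * P for k in range(1, N + 1)]

# the last symbol is N times more vulnerable than the first
assert abs(p_symbol[-1] / p_symbol[0] - N) < 1e-9
```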

In applications where some video data are more important than others, like the macroblock addresses (as distinct from interframe DCT coefficients), bringing the important data ahead of the less important data can significantly reduce the side effects of channel errors. This form of partitioning the VLC coded data into segments of various importance is called data partitioning, which is one of the optional modes of H.263 [25-V]. Note that this form of data partitioning is different from the data partitioning used as a layering technique, described in section 8.5.2. There, through the priority break point, the DCT coefficients were divided into two parts: the lower frequency coefficients along with the other data comprised the base layer, and the high frequency DCT coefficients formed the second layer. Inclusion of the priority break points and other overheads increased the bit rate by about 3-4 per cent (see Figure 8.25). Here, by contrast, the entire set of data in a GOB is partitioned and ordered according to the importance of its contribution to video quality, without any additional overhead. For example, within a GOB, the order of importance of data can be: coding status of MBs, motion vectors, block pattern, quantiser parameter, DC coefficients, AC coefficients. Thus it is also possible to extract all the DC coefficients of the blocks in a GOB and send them ahead of all the AC coefficients.
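Conceptually, the partitioning is just a stable reordering of a GOB's symbols by importance class. The sketch below uses illustrative class names following the text, not the exact Annex V bit stream syntax:

```python
# Illustrative importance classes, most important first.
ORDER = ["mb_status", "mv", "block_pattern", "quant", "dc", "ac"]
RANK = {cls: i for i, cls in enumerate(ORDER)}

def partition(symbols):
    """Reorder (class, payload) symbols so the most important come first.
    sorted() is stable, so symbols of one class keep their original order."""
    return sorted(symbols, key=lambda s: RANK[s[0]])

gob = [("mv", "MB0"), ("dc", "MB0"), ("ac", "MB0"),
       ("mv", "MB1"), ("dc", "MB1"), ("ac", "MB1")]
# after partitioning: both MVs, then both DC coefficients, then both AC
assert [c for c, _ in partition(gob)] == ["mv", "mv", "dc", "dc", "ac", "ac"]
```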

To appreciate the importance of data partitioning in protecting video against channel errors, Figure 9.23 shows two snapshots of a video sequence with and without data partitioning. It was assumed that in data partitioning only the DCT coefficients were subjected to errors, but in the normal mode the bit error could affect any bit of the data. This is a plausible assumption, since the important data normally comprise a small fraction of the bit stream and can be heavily protected against errors. The important data can also use a reversible variable length code (RVLC), such that some of the corrupted data can be retrieved. In fact, Annex V on data partitioning recommends RVLC for the slice header (including the macroblock type) and the motion vectors [25-V]. The DCT coefficients, according to this recommendation, use normal VLC. The better picture quality with data partitioning than with normal coding, shown in Figure 9.23, justifies this decision. It also shows the relative insignificance of the DCT coefficients, as their loss hardly affects the picture quality. It should be noted that, in this picture, all the macroblocks were interframe coded. Had there been any intraframe coded macroblock, its loss would have been noticeable.

Figure 9.23: Effects of errors (a) with data partitioning (b) without data partitioning

Table 9.3 compares the normal VLC and the RVLC for the combined macroblock type and block pattern (MCBPC). Note that the RVLC is symmetric and uses more bits than the normal VLC. Hence its use should be avoided, unless it is vital to prevent drastic image degradation.

Table 9.3: VLC and RVLC bits of MCBPC

Index   MB type         CBPC   Normal VLC   RVLC
0       3 (intra)       00     1            1
1       3               01     001          010
2       3               10     010          0110
3       3               11     011          01110
4       4 (intra + Q)   00     0001         00100
5       4               01     000001       011110
6       4               10     000010       001100
7       4               11     000011       0111110

Table 9.4 shows the average number of bits used in an experiment for each slice of a QSIF size salesman image test sequence (picture in Figure 9.23). The last column is the average bits/slice in normal coding of the sequence, for all nine slices. For data partitioning, the second column is the slice overhead (including the macroblock types and resynchronisation markers), the third column is the motion vector overhead and the fourth column is the number of bits used for the DCT coefficients. The sum of all the bits in data partitioning is given in the fifth column.

Table 9.4: Number of bits per slice for data partitioning

Slice No   Slice header   MV   Coeff   SUM    Normal
1          52             30   211     293    269
2          63             34   506     603    571
3          45             42   748     835    803
4          48             42   1025    1115   1083
5          45             71   959     1075   1043
6          41             46   844     931    899
7          48             34   425     507    475
8          51             32   408     491    459
9          38             24   221     283    251

First, since the sum of the slice header and motion vectors is only 8-28 per cent of the data (less for more active slices), they can easily be protected without significantly increasing the total bit rate. Secondly, comparing the total number of bits in data partitioning with normal coding (columns 5 and 6), we see that data partitioning uses about 3-12 per cent more bits than normal coding. Considering that this increase is due to the use of RVLC for only the header and the motion vectors, plus some extra resynchronisation markers at the end of the important data, had we used RVLC for all the bits the increase in bit rate would have been much higher. Hence, given that the DCT coefficients do not contribute much to image quality and that RVLC needs more bits than VLC, it is wise not to use RVLC for the DCT coefficients, as Annex V recommends [25-V]. It should be noted that the main cause of the unpleasant appearance of the picture without data partitioning (Figure 9.23b) is errors in the important data of the bit stream, such as the MB addresses and motion vectors. When the coding status of an MB is wrongly conveyed to the decoder, visual information is misplaced. Also, in the nondata partitioning mode, since the VLC coded data, MB addresses and motion vectors are mixed, any bit error easily causes invalid codewords and a large area of the picture will be in error, as shown in Figure 9.23b.
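The percentages quoted above can be verified directly from the figures in Table 9.4:

```python
# Rows of Table 9.4: (slice header, MV, DCT coefficients, SUM, normal)
slices = [
    (52, 30, 211, 293, 269), (63, 34, 506, 603, 571),
    (45, 42, 748, 835, 803), (48, 42, 1025, 1115, 1083),
    (45, 71, 959, 1075, 1043), (41, 46, 844, 931, 899),
    (48, 34, 425, 507, 475), (51, 32, 408, 491, 459),
    (38, 24, 221, 283, 251),
]

# fraction of each partitioned slice occupied by the header + motion vectors
important = [(h + mv) / s for h, mv, c, s, n in slices]
# extra bits used by data partitioning relative to normal coding
overhead = [(s - n) / n for h, mv, c, s, n in slices]

print(f"header+MV share: {min(important):.1%} to {max(important):.1%}")
print(f"partitioning overhead: {min(overhead):.1%} to {max(overhead):.1%}")
```

The first range comes out at roughly 8-28 per cent and the second at roughly 3-12 per cent, matching the text.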

Note that data partitioning is only used for P and B-pictures, because for I-pictures, DCT coefficients are all important and their absence degrades picture quality significantly.

9.7.4 Error detection by postprocessing

In the error correction/detection section 9.7.1 we saw that, with the help of parity bits, the decoder can detect an erroneous bit stream. In data communications, the decoder normally ignores the entire segment of bits and requests retransmission. Due to the delay sensitive nature of visual services, retransmission is never used in video communication. The decoder can, however, decode part of the bit stream, up to the point where it finds an invalid codeword. Hence part of the corrupted bit stream can be recovered, limiting the damaged area.

However, the decoder still cannot identify the exact location of the error (if this were possible, it could have corrected it!). What is certain is that the bits after the invalid codeword up to the next resynchronisation marker are not decodable, as shown in Figure 9.24.

Figure 9.24: Error in a bit stream

It is to be expected that several symbols will be wrongly decoded before the decoder finds an invalid codeword. In some cases the entire data may be decodable without encountering an invalid codeword, although this rarely happens. For example, the grey parts of the slices in Figure 9.23b are where the decoder gave up decoding after encountering invalid codewords. Figure 9.23b also shows wrongly decoded blocks of pixels, where the decoder could still carry on decoding beyond these blocks. Hence, in the parts that are decodable, the correctly decoded data cannot be separated from the wrongly decoded data, unless some form of processing is carried out on the decoded pixels.

A simple and efficient method of separating correctly decoded blocks from wrongly decoded ones is to test for pixel continuity at the macroblock (MB) boundaries. For nonerroneous pictures, due to the high interpixel correlation, pixel differences at the MB borders are normally small, whereas those due to errors create large differences. As shown in Figure 9.25a, for every decoded MB, the average of the upper and lower pixel differences at the MB boundaries is calculated as:

BD = (1/N) Σ_{i=1}^{N} |x_i − y_i|    (9.13)

Figure 9.25: Pixels at the boundary of (a) a macroblock (b) four blocks

where N is the total number of pixels at the upper and lower borders of the MB, and x_i and y_i are the pairs of adjacent pixels on either side of the border.

The boundary difference (BD) of each MB is then compared against a threshold; MBs whose BD is larger than the threshold are most likely to have been erroneously decoded. Since, due to texture or edges in the image, there may be some inherent discontinuity at the MB boundaries, the boundary threshold can be made dependent on the local image statistics. For example, the mean value of the boundary differences of all the MBs in the slice, or the slice above, with some tolerance (a few times the standard deviation added to the mean) can be used as the threshold. Experiments show that the mean plus four times the standard deviation is a good value for the threshold [17]. The boundary difference, BD, can be calculated separately for the luminance and each of the colour differences. A macroblock is flagged as possibly erroneously decoded if any of these boundary differences so indicates.
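A minimal sketch of the macroblock boundary test of eqn. 9.13, assuming the luminance frame is stored as a NumPy array (function names and the synthetic test frame are illustrative):

```python
import numpy as np

def boundary_difference(frame, mb_row, mb_col, mb=16):
    """Mean absolute difference across the top and bottom borders of a
    macroblock (a sketch of eqn 9.13 for the luminance component)."""
    r0, c0 = mb_row * mb, mb_col * mb
    diffs = []
    if r0 > 0:                     # top border: first MB row vs the row above
        diffs.append(np.abs(frame[r0, c0:c0 + mb].astype(int)
                            - frame[r0 - 1, c0:c0 + mb].astype(int)))
    if r0 + mb < frame.shape[0]:   # bottom border: last MB row vs the row below
        diffs.append(np.abs(frame[r0 + mb - 1, c0:c0 + mb].astype(int)
                            - frame[r0 + mb, c0:c0 + mb].astype(int)))
    return float(np.concatenate(diffs).mean())

frame = np.add.outer(np.arange(64), np.arange(64))  # smooth synthetic frame
corrupt = frame.copy()
corrupt[16:32, 16:32] = 255                         # a wrongly decoded MB
# the corrupted MB shows a far larger boundary difference than a clean one
assert boundary_difference(corrupt, 1, 1) > boundary_difference(frame, 1, 1)
```

In a real detector the BDs of the MBs in a slice would be pooled and each compared against mean + 4 × standard deviation, as described above.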

Another method is to calculate the boundary differences around the 8 x 8 pixel block boundaries, as shown in Figure 9.25b. In the 4:2:0 image format, each MB has four luminance blocks and one block of each chrominance component, and hence this boundary difference is applied only to the luminance blocks.

In a similar fashion to the boundary difference of eqn. 9.13, the block boundary difference is calculated on the inner and outer pixels of the blocks, as shown in Figure 9.25b. Again, if any of the four block boundary difference values indicates a discontinuity, the macroblock is most likely to have been erroneously decoded. Combining the boundary differences of the macroblock (Figure 9.25a) and the blocks (Figure 9.25b) increases the reliability of detection.

Assuming that these methods can detect an erroneously decoded MB, then once the first erroneous MB in a slice is found, and provided that the error occurred only in the bits of this MB, it is generally possible to retrieve the remaining data. Here, after identifying the first erroneous MB, some of the bits are skipped and decoding is performed on the remaining bits. The process is continued until the remaining bits up to the next resynchronisation marker are completely decodable (no invalid codeword is encountered). In doing so, even parts of the slice/GOB that were not decodable before are now decoded, and the erroneous part of the GOB can be confined to one MB.

If errors occur in more than one MB, then it may not be possible to achieve perfect decoding (no invalid codeword) up to the next resynchronisation marker. Thus, in general, when decoding proceeds up to the next resynchronisation marker, the number of erroneous MBs is counted. This number should be less than the number of erroneous MBs in the previous run. The process ends when further skipping of bits and decoding does not reduce the number of erroneous macroblocks in the GOB any further.
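The skip-and-redecode idea can be illustrated with a toy prefix code; the code table below is made up for the illustration, not an H.263 VLC table:

```python
CODE = {"0": "a", "10": "b", "110": "c", "111": "d"}  # made-up prefix code

def decode(bits):
    """Decode a bit string with the toy table; None if it ends mid-codeword
    (the analogue of hitting an invalid codeword)."""
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in CODE:
            out.append(CODE[cur])
            cur = ""
    return out if cur == "" else None

def recover(bits):
    """Skip a growing number of leading bits until the tail decodes cleanly,
    mimicking the step-by-step skipping of Figure 9.26."""
    for skip in range(len(bits) + 1):
        symbols = decode(bits[skip:])
        if symbols is not None:
            return skip, symbols

assert decode("010110111") == ["a", "b", "c", "d"]
# one extra error bit makes the tail undecodable until 7 bits are skipped
assert recover("0101101111") == (7, ["d"])
```

A real decoder would additionally re-run the boundary difference test after each skip, stopping when the count of erroneous MBs no longer falls.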

Figure 9.26 shows the decoded pictures at each stage of this step-by-step skipping and decoding of the bits. For the purpose of demonstration, only one bit error was introduced in the bit stream between the resynchronisation markers of some of the slices. The first picture shows the erroneous picture without any postprocessing. The second picture shows the reconstructed picture after the first round of bit skipping in each slice, and so on. As we see, at each stage the erroneous area (number of erroneous MBs) is reduced, until further processing no longer reduces the number of erroneous MBs (there is little difference between pictures d and e). There is only one erroneous MB in each slice of the final picture (Figure 9.26e), which can easily be concealed.

Figure 9.26: Step-by-step decoding and skipping of bits in the bit stream

In the above example it was assumed that a single bit error, or a burst of errors, affected only one MB. If errors affect more than one MB, then at the end more than one MB will be in error, and of course it will take more time to find these erroneous MBs in the decoding. This is because, after finding the first erroneous MB, since there are further erroneous MBs to follow, perfect decoding (encountering no invalid codeword) is not possible. Experiments show that in most cases all the macroblocks between the first and the last erroneous MB in a slice will be in error. However, it is still possible to recover some of the macroblocks, which without this sort of processing was not possible.

9.7.5 Error concealment

If any of the error resilience methods mentioned so far or their combinations is not sufficient to produce satisfactory picture quality, then one may try to hide the image degradation from the viewer. This is called error concealment.

The main idea behind error concealment is to replace the damaged pixels with pixels from parts of the video that have maximum resemblance. In general, pixel substitution may come from the same frame or from the previous frame. These are called intraframe and interframe error concealment, respectively [18].

9.7.5.1 Intraframe error concealment

In intraframe error concealment, pixels of an erroneous MB are replaced by those of a neighbouring MB with some form of interpolation. For example, pixels at the macroblock boundary may be directly replaced by the pixels from the other side of the border, and for the other pixels, the average of the neighbouring pixels inversely weighted by their distances may be substituted.

An efficient method of intraframe error concealment is shown in the block diagram of Figure 9.27. A block of pixels larger than a macroblock (preferably 48 × 48 pixels, equivalent to 3 × 3 MBs), encompassing the MB to be concealed, is fast Fourier transformed (FFT). The pixels of the MB to be concealed are initially filled with grey level values. The FFT coefficients are two-dimensionally lowpass filtered (LPF) to remove the discontinuity due to these inserted pixels. The resultant lowpass filtered coefficients are then inverse fast Fourier transformed (IFFT) to reconstruct a replica of the input pixels. Due to the lowpass filtering, the reconstructed pixels are similar but not identical to the input pixels. The extent of the dissimilarity depends on the cutoff frequency of the lowpass filter: the lower the cutoff frequency, the stronger the influence of the neighbouring pixels on the concealed MB. The centre MB at the output now replaces the centre MB at the input, and the whole process of FFT, LPF and IFFT is repeated. The process is iterated several times, and at each iteration the cutoff frequency of the LPF is gradually increased. To improve the quality of the concealment, the lowpass filter can be made directional, based on the characteristics of the surrounding pixels. The process is terminated when the difference between the pixels of the concealed MB at the input and output is less than a threshold.
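A simplified version of this iterative FFT/LPF/IFFT loop is sketched below; the rectangular lowpass filter, fixed iteration count and growth schedule of the cutoff are illustrative choices, not those of any standard:

```python
import numpy as np

def conceal(block, mask, iters=8):
    """Iterative FFT lowpass concealment (a sketch of the Figure 9.27 idea).
    block: 2-D pixel array containing the damaged MB; mask: True where lost."""
    x = block.astype(float).copy()
    x[mask] = 128.0                            # fill the lost MB with mid-grey
    rows, cols = x.shape
    fy = np.abs(np.fft.fftfreq(rows) * rows)   # integer frequency indices
    fx = np.abs(np.fft.fftfreq(cols) * cols)
    for it in range(iters):
        F = np.fft.fft2(x)
        # rectangular lowpass whose cutoff is raised at every iteration
        cutoff = (it + 1) * rows / (2 * iters)
        keep = (fy[:, None] <= cutoff) & (fx[None, :] <= cutoff)
        y = np.real(np.fft.ifft2(F * keep))
        x[mask] = y[mask]                      # update only the concealed pixels
    return x
```

On a smooth test image, the concealed pixels end up much closer to the original than the initial grey fill, while the surrounding pixels are left untouched.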

Figure 9.27: An example of intraframe error concealment

This form of error concealment assumes an isolated erroneous MB surrounded by eight immediate nonerroneous neighbours. This is suitable for JPEG or motion JPEG coded pictures, where the error is localised (see Figure 5.17), or for interframe coded pictures if, by means of postprocessing, the error is confined to one MB (e.g. Figure 9.26e). For video, where there is a danger of errors elsewhere in the same slice/GOB, the pixels of the slices above and below should be used, and the two MBs to the right and left treated as if they were in error. This impairs the performance of the concealment, and may not be satisfactory. For video, a more suitable approach is interframe error concealment, explained in the following.

9.7.5.2 Interframe error concealment

In interframe error concealment, pixels from the previous frame are substituted for the pixels of the MB to be concealed, as shown in Figure 9.28. This can be either direct substitution, or substitution of their motion compensated version, using an estimated motion vector. Obviously, due to movement, motion compensated substitution is better. The performance of this method depends on how accurately the motion vector for concealment is estimated. In the following, several methods of estimating this motion vector are explained, and their error concealment fidelities are compared against each other.

Figure 9.28: A grid of 3 × 3 macroblocks in the current and previous frame

Zero mv

Direct substitution of pixels from the MB of the previous frame at the same spatial position of the MB to be concealed (zero motion vector). This is the simplest method of substitution, and is effective in the picture background, or in the foreground with slow motion.

Previous mv

The estimated motion vector is the same as the motion vector of the spatially similar MB in the previous frame. This method, which assumes uniform motion of objects, performs well most of the time, but eventually it is bound to fail.

Top mv

The estimated motion vector is the same as the motion vector of the MB above the wanted MB (e.g. MB number 2 of Figure 9.28). Similarly, the motion vector of the MB below (e.g. MB number 5) may be used. Since these two MBs are the closest to the current MB, their MVs are expected to have the highest similarity. This method is as simple as direct substitution (zero mv) and previous mv.

Mean mv

The average of the motion vectors of the six immediate neighbours is taken as the estimated mv. The mean values of the horizontal displacement, x_0, and the vertical displacement, y_0, are taken separately:

x_0 = (1/6) Σ_{i=1}^{6} x_i,   y_0 = (1/6) Σ_{i=1}^{6} y_i    (9.14)

where x_i and y_i are the horizontal and vertical components of motion vector i, mv_i = (x_i, y_i). Note that, due to the averaging, small perturbations in the neighbouring motion vector components cancel each other, so the estimated motion vector can differ from every one of the neighbouring motion vectors. This method of error concealment may therefore not produce a smooth picture. The discontinuity at the MB boundaries produces a blocking artefact that appears very annoying. Hence this method is not good for parts of the picture with motion in various directions, such as the movement of the lips and eyes of a talking head.

Majority mv

The majority of the motion vectors are grouped together, and their mean or another representative value is taken as the estimated motion vector:

x_0 = (1/N) Σ_{i=1}^{N} x_i,   y_0 = (1/N) Σ_{i=1}^{N} y_i    (9.15)

where N out of the six motion vectors point in almost the same direction. Since, in general, all the motion vectors can differ from each other, to find the majority the motion vectors should first be vector quantised; the majority is then found among the quantised vectors and its representative value taken from the original values. This method works well for rigid body movement, where the neighbouring motion vectors normally move in the same direction. However, since there are only six neighbouring motion vectors, a definite majority among them cannot always be found reliably. Hence for nonrigid movement, such as that of lips and eyes, this method may not work well.

Vector median mv

The median of a group of vectors is the vector in the group that has the smallest total Euclidean distance from all the others. Thus, among the six neighbouring motion vectors mv_1-mv_6 of Figure 9.28, the jth motion vector, mv_j, is the median if:

dist_j = Σ_{k=1}^{6} ||mv_j − mv_k||    (9.16)

such that for all motion vectors mv_k, 1 ≤ k ≤ 6, the distance of vector j, dist_j, is less than the distance of vector k, dist_k [19].

This method is expected to produce a good result, because the median vector has the smallest total distance from, and hence the largest correlation with, all the other vectors; and since the macroblock to be concealed lies at the centre of its neighbours and has the highest correlation with them, its motion vector should share this property of the median vector. This good performance is achieved at a higher computational cost. First, the Euclidean distance of each vector from the five other vectors must be calculated, requiring 6 × 5 = 30 vector distance calculations (or 15, exploiting symmetry). Then, for each vector, the five distances are averaged to represent its average distance from the others. Finally, these averages are rank ordered to find the minimum.
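A direct implementation of the vector median is straightforward; the function name and the test vectors below are illustrative:

```python
import math

def vector_median(mvs):
    """Return the vector with the smallest summed Euclidean distance to all
    the others (the vector median of the neighbouring motion vectors)."""
    return min(mvs, key=lambda a: sum(math.hypot(a[0] - b[0], a[1] - b[1])
                                      for b in mvs))

# five coherent neighbouring motion vectors and one outlier: the median
# stays with the coherent cluster and ignores the outlier
mvs = [(0, 0), (0, 0), (0, 1), (1, 0), (1, 1), (10, 3)]
assert vector_median(mvs) == (0, 0)
```

Unlike the mean of eqn. 9.14, the result is always one of the actual neighbouring vectors, so a single wild vector cannot drag the estimate away from the cluster.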

To compare the relative error concealment performance of each method, four head-and-shoulders type image sequences at QCIF resolution were subjected to channel errors. In the event of an error, the whole GOB was concealed by each of the above mentioned methods. This is because, due to the VLC coding, a single bit error may render the remaining bits up to the next GOB undecodable, as shown in the erroneous picture of Figure 9.23. Tables 9.5 and 9.6 summarise the quality of these error concealment methods for QCIF video at 5 and 12.5 frames/s, respectively. To show just the impact of error concealment, measurements were carried out only on the concealed areas.

Table 9.5: PSNR [dB] of various error concealment methods at 5 frames/s

64 kbit/s, 5 frames/s, QCIF

Type        Seq-1    Seq-2    Seq-3    Seq-4
Zero        17.04    18.15    14.54    13.08
Previous    17.28    18.34    14.53    13.48
Top         19.27    21.08    17.04    16.25
Average     19.18    21.74    17.51    16.18
Majority    19.35    21.83    17.89    16.61
Median      19.87    22.52    18.29    16.89
No errors   22.57    26.94    20.85    19.88

Table 9.6: PSNR [dB] of the various error concealment methods at 12.5 frames/s

64 kbit/s, 12.5 frames/s, QCIF

Type        Seq-1    Seq-2    Seq-3    Seq-4
Zero        20.62    22.11    18.64    16.62
Previous    22.49    22.19    18.19    16.53
Top         22.97    25.16    20.97    20.04
Average     22.92    25.99    21.24    20.08
Majority    23.33    26.32    21.57    20.36
Median      24.36    26.72    22.16    20.81
No errors   26.12    29.69    23.93    23.04

As the tables show, the vector median method gives the best result at both high and low frame rates, with the majority method second best. In all cases, the performance of the average method is as poor as that of the simple top method and, in some cases, even poorer (seq-1 and seq-4 of Table 9.5). The poor performance of the previous mv method means that motion is not uniform; this is particularly evident at the low frame rate of 5 frames/s. However, all the methods are superior to zero motion, implying that loss concealment with an estimated motion vector improves picture quality.

Also, note that, since the quality measurements were carried out on the error concealed areas, the performance at the lower frame rate is poorer than at the higher frame rate. That is, as the frame rate is reduced, the estimated motion vector becomes less similar to the actual motion vector. Despite this, estimating the motion vector by any of the methods gives better performance than not estimating it (zero motion vector). Figure 9.29 shows an accumulated erroneous picture of seq-3 at 5 frames/s alongside its concealed version using the median vector method [19].

Figure 9.29: An erroneous picture along with its error concealed version

Bidirectional mv

If B-pictures are present in the group of pictures (GOP), then, due to the stronger relation between the motion vector of a B-picture and that of its anchor P or I-picture, a better estimate of the motion vector can be made. As an example, consider a GOP with N = ∞ and M = 2, that is, an image sequence made of alternating P and B-pictures, as shown in Figure 9.30.


Figure 9.30: A group of alternate P and B-pictures

To estimate a missing motion vector for a P-picture, say P31, the available motion vectors at the same spatial coordinates of the B-picture can be used, with the following substitutions:

  • if only B23 is available, then P31 = 2 × B23

  • if only B21 is available, then P31 = -2 × B21

  • if both B23 and B21 are available, then P31 = B23 - B21

  • if neither of them is available, then set P31 = 0

To estimate a missing motion vector of a B-picture, simply halve that of the P-picture: B23 = P31/2 or B21 = -P31/2.
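The substitution rules above can be sketched as follows. This is an illustrative sketch only: motion vectors are assumed to be (x, y) tuples, `None` marks an unavailable vector, and the function name is hypothetical.

```python
# Estimate a lost P-picture motion vector (P31) from the co-located
# B-picture vectors B23 and B21, following the four rules in the text.

def estimate_p_vector(b23=None, b21=None):
    """Return P31 estimated from whichever B-picture vectors survive."""
    if b23 is not None and b21 is not None:
        return (b23[0] - b21[0], b23[1] - b21[1])   # P31 = B23 - B21
    if b23 is not None:
        return (2 * b23[0], 2 * b23[1])             # P31 = 2 * B23
    if b21 is not None:
        return (-2 * b21[0], -2 * b21[1])           # P31 = -2 * B21
    return (0, 0)                                   # neither available

print(estimate_p_vector(b23=(1, 2)))        # (2, 4)
print(estimate_p_vector(b21=(1, 2)))        # (-2, -4)
print(estimate_p_vector((1, 2), (-1, -1)))  # (2, 3)
```

The factors of 2 reflect the fact that with M = 2 the P-picture's reference is two frame intervals away, while each B-picture vector spans only one.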

Here we have used the simple previous mv estimation method, explained earlier. Although in the tests of Tables 9.5 and 9.6 (image sequences made of P-pictures only) this method did not perform well, the relation between P and B-pictures here is strong enough that the method does work well. For example, using MPEG-1 video we have achieved about 3–4 dB improvement over the majority method [20]. The amount of improvement is picture dependent, and it appears that for QCIF images coded with H.263 at least 1 dB improvement over the majority method can be achieved. Interested readers should consult [20] for further details.

9.7.5.3 Loss concealment

In transmission of video over packet networks such as IP, ATM or wireless packet networks, the video data is packed into the payload of the packets. In this transmission mode two types of distortion may occur. One is an error in the payload, which results in erroneous reception of the bit stream, similar to the effect of channel errors. The other is packet loss, caused either by an error in the packet header or by the packet being queued in a congested network: excessively delayed packets are of no use and will be discarded, either by the switching nodes (routers) or by the receiver itself.

Detection, correction and concealment of errors in the packet payload are similar to the methods mentioned previously. For packet loss the methods are slightly different. First, the decoder discovers that a packet is missing by examining the packet sequence numbers. Second, when a packet is lost, unlike with channel errors, no part of its video data is decodable. Hence, loss concealment is even more vital to video over packet networks than error concealment is in a nonpacket transport environment.
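The loss detection step can be sketched as below. This is illustrative only: real transports such as RTP carry 16-bit sequence numbers that wrap around, which this toy function ignores.

```python
# Detect lost packets from the sequence numbers of the packets that
# actually arrived (toy model: non-wrapping integer sequence numbers).

def missing_packets(received_seq_numbers):
    """Return sequence numbers absent between the first and last received."""
    received = set(received_seq_numbers)
    lo, hi = min(received), max(received)
    return [n for n in range(lo, hi + 1) if n not in received]

print(missing_packets([10, 11, 13, 14, 17]))  # [12, 15, 16]
```

Each missing sequence number tells the decoder which slice of the bit stream to conceal rather than decode.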

Considering that in the coding of video, in particular at low bit rates, not all parts of the picture are coded, the best concealment for noncoded macroblocks is a direct copy of the previous macroblock without any motion compensation (i.e. zero mv). For those which are coded, as Tables 9.5 and 9.6 show, a motion compensated macroblock gives a better result. However, the information as to which macroblock was or was not coded is not available at the decoder, and any attempt to replace a noncoded area by a motion compensated macroblock will degrade the image quality rather than improve it. Our simulations show that replacing a noncoded MB with an estimated motion compensated MB would degrade the quality of the pixels in that MB by 7–10 dB [21].

Therefore, for proper loss concealment, the coded and noncoded macroblocks should be discriminated from each other. A noncoded MB should be directly copied from the previous frame, but a coded one should be motion compensated with an estimated motion vector (using any of the estimation methods of section 9.7.5.2). A decision on the coding status of a missing MB can be made from the coding status of the MB at the same spatial location in the previous frame. Investigations show that if an MB is coded, it is about 70 per cent certain that it will be coded in the next frame [21]; likewise, if an MB is not coded, it is 90 per cent certain that it will not be coded in the next frame. Thus, whether a lost MB should be replaced by direct substitution or by its motion compensated version can be decided from the coding status of that MB in the previous frame. We call this method of loss concealment selective loss concealment.
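The selective decision can be sketched as a small function. The data structure and function name are illustrative assumptions, not part of H.263: we simply assume the decoder records, per macroblock index, whether the co-located MB of the previous frame was coded.

```python
# Selective loss concealment: choose the concealment mode for a lost
# macroblock from the coding status of the co-located MB in the
# previous frame (~70% of coded MBs stay coded, ~90% of noncoded MBs
# stay noncoded, per the statistics quoted in the text).

def concealment_mode(mb_index, coded_in_previous_frame):
    """Return which replacement to use for the lost macroblock."""
    if coded_in_previous_frame[mb_index]:
        return "motion_compensated"  # use an estimated motion vector
    return "direct_copy"             # zero mv copy from previous frame

status = {0: True, 1: False}         # coding status of previous frame's MBs
print(concealment_mode(0, status))   # motion_compensated
print(concealment_mode(1, status))   # direct_copy
```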

To demonstrate the image enhancement due to loss concealment, the Salesman test image sequence, coded at 144 kbit/s and 10 Hz, was exposed to channel errors at a bit error rate of 10^-2, using the channel error model given in Appendix E [22].

Figure 9.31 shows the objective quality of the entire decoded picture sequence with loss and with loss concealment. As the figure shows, while the quality of the decoded video is impaired by more than 10 dB due to loss, loss concealment enhances the degraded image quality by around 7 dB. Figure 9.31 also shows the improvement of selective concealment over full concealment, where a lost macroblock is always replaced by the motion compensated previous macroblock, irrespective of whether the macroblock was coded or not.

Figure 9.31: Quality of decoded video with and without loss concealment at a bit error ratio of 10^-2

9.7.5.4 Selection of best estimated motion vector

Although Tables 9.5 and 9.6 show that some methods of estimating a lost motion vector are better than others, they represent the average quality over the entire video sequence. Had we compared these methods on a macroblock by macroblock basis, there could be situations in which the overall best method does not perform well. The reason is that the quality of such error/loss concealment depends on the directions and values of the motion vectors surrounding that macroblock. What makes for poor error/loss concealment is that the motion compensated replacement macroblock exhibits some pixel discontinuity, which makes the reconstructed picture look blocky and is very disturbing.

To improve the error/loss concealed image quality, one may apply all the above motion estimation methods and test for image discontinuity around each reconstructed macroblock. The method that gives the least discontinuity is then chosen. The methods introduced in section 9.7.4 can be used as a discontinuity measure.
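A minimal sketch of this selection step follows, using a simple sum-of-absolute-differences boundary measure (one plausible discontinuity measure; the exact measures of section 9.7.4 may differ). Blocks are plain lists of pixel rows here, and all names are illustrative; a real codec would read these pixels from the decoded frame buffer.

```python
# Choose among candidate concealment blocks by boundary discontinuity:
# sum the absolute differences between each candidate's border pixels
# and the adjacent pixels of the correctly decoded neighbouring
# macroblocks, then keep the candidate with the smoothest fit.

def boundary_discontinuity(block, top_row, left_col):
    """Sum of absolute differences along the top and left borders."""
    d = sum(abs(block[0][j] - top_row[j]) for j in range(len(top_row)))
    d += sum(abs(block[i][0] - left_col[i]) for i in range(len(left_col)))
    return d

def best_candidate(candidates, top_row, left_col):
    """Pick the candidate block with the least border discontinuity."""
    return min(candidates,
               key=lambda b: boundary_discontinuity(b, top_row, left_col))

smooth = [[10, 10], [10, 10]]   # candidate matching its surroundings
blocky = [[90, 90], [90, 90]]   # candidate that would look blocky
print(best_candidate([smooth, blocky], top_row=[11, 12], left_col=[9, 10]))
# → [[10, 10], [10, 10]]
```

In practice each candidate would be the macroblock produced by one of the motion vector estimation methods (previous, top, average, majority, median), so the decoder effectively picks, per macroblock, whichever estimation method blends best with its neighbours.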

Standard Codecs: Image Compression to Advanced Video Coding (IET Telecommunications Series), M. Ghanbari, ISBN 0852967101, 2005.