7.6 Motion estimation

In Chapter 3, block matching motion estimation/compensation and its application in standard codecs were discussed in detail. We also introduced some fast search methods for estimation, which can be used in software-based codecs. As we saw, motion estimation in H.261 was optional. This was mainly due to the assumption that, since motion compensation reduces correlation, DCT coding of the motion compensated error may not be efficient. Investigations since the publication of H.261 have shown that this is not the case. What is expected from the DCT is to remove the spatial correlation within a small area of 8 × 8 pixels. Measurements of the correlation between adjacent error pixels have shown that it remains strong, so motion compensation does not impair the potential of the DCT for spatial redundancy reduction. Hence motion estimation has become an integral part of all the later video codecs, such as MPEG-1, MPEG-2, H.263, H.26L and MPEG-4. These will be explained in the relevant chapters.

The strategy for motion estimation in MPEG-1 differs from that of H.261 in four main respects:

  • motion estimation is an integral part of the codec

  • motion search range is much larger

  • higher precision of motion compensation is used

  • B-pictures can benefit from bidirectional motion compensation.

These features are described in the following sections.

7.6.1 Larger search range

In H.261, if motion compensation is used, the search is always carried out between consecutive frames. Also, H.261 is normally used for head-and-shoulders pictures, where the motion speed tends to be very small. In contrast, MPEG-1 is used mainly for coding of films with much larger movements and activities. Moreover, in the search for motion in P-pictures, since they might be several frames apart from their anchor, the search range becomes many times larger. For example, in a GOP structure with M = 3, where there are two B-pictures between the anchor pictures, the displacement between anchor pictures can be three times that between consecutive pictures. Thus in MPEG-1 we expect a much larger search range. Considering that in full search block matching the number of search positions for a maximum motion speed of w is (2w + 1)², tripling the search range multiplies the number of positions by almost nine, which makes full search motion estimation prohibitively expensive.
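The growth in search positions can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name and the base range of w = 5 are assumptions chosen for the example, not values from the text.

```python
def full_search_positions(w: int) -> int:
    """Candidate displacements in a full search over +/- w pixels in each direction."""
    return (2 * w + 1) ** 2


if __name__ == "__main__":
    w = 5  # illustrative range only, not a value taken from the text
    single = full_search_positions(w)       # range for consecutive pictures
    tripled = full_search_positions(3 * w)  # range for anchors three frames apart
    print(single, tripled, round(tripled / single, 2))
```

As w grows, the ratio approaches nine, since both counts are dominated by the squared term.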

In Chapter 3 we introduced some fast search methods such as logarithmic step searches and hierarchical motion estimation. Although the hierarchical method can be used here, at the cost of one or more additional levels of hierarchy, use of a logarithmic search may not be feasible. This is because such methods are prone to being trapped in local minima, and with a large search range the global minimum can be very far away from the local minimum on which the search converges, so causing the estimation to fail [8].

One way of alleviating this problem is to use a telescopic search method. This is unique to MPEG with B-pictures. In this method, rather than searching directly for the motion between the anchor pictures, the search is carried out on all the consecutive pictures, including B-pictures. The final motion vector between the anchor pictures is then the sum of all the intermediate motion vectors, as shown in Figure 7.9. Note that since we are now searching for motion in successive pictures, the search range is smaller, and even fast search methods can be used.


Figure 7.9: Telescopic motion search
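The telescopic idea can be sketched in a few lines. This is one possible reading of the method, not the book's implementation: the function names, the SAD matching criterion, the use of a small full search at each step, and the synthetic test frames are all assumptions made for illustration. Vectors here point from the block in the later picture to its match in the earlier picture.

```python
def make_frame(size, ox, oy, n):
    """Synthetic frame (assumption): zero background with one textured n x n object."""
    frame = [[0] * size for _ in range(size)]
    for y in range(n):
        for x in range(n):
            frame[oy + y][ox + x] = 100 + 10 * y + x
    return frame


def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
               for y in range(n) for x in range(n))


def full_search(cur, ref, cx, cy, n, w):
    """Best (dx, dy) within +/- w for the block at (cx, cy) of cur, searched in ref."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate block falls outside the picture
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def telescopic_search(frames, bx, by, n, w):
    """Track a block from frames[-1] back to frames[0] one picture at a time.

    Each step searches only +/- w between successive pictures; the vector
    between the two anchor pictures is the sum of the intermediate vectors.
    """
    total_dx = total_dy = 0
    for k in range(len(frames) - 1, 0, -1):
        dx, dy = full_search(frames[k], frames[k - 1],
                             bx + total_dx, by + total_dy, n, w)
        total_dx += dx
        total_dy += dy
    return total_dx, total_dy


if __name__ == "__main__":
    # I, B1, B2, P: the object moves 2 pixels to the right in every picture.
    frames = [make_frame(32, 8 + 2 * k, 10, 4) for k in range(4)]
    print(telescopic_search(frames, 14, 10, 4, 2))  # (-6, 0)
```

In this example a direct search of ±2 pixels between the two anchor pictures could not reach the true displacement of six pixels, whereas three successive ±2 searches can.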

7.6.2 Motion estimation with half pixel precision

In the search process with a half pixel resolution, normal block matching with integer pixel positions is carried out first. Then eight new positions, with a distance of half a pixel around the final integer pixel, are tested. Figure 7.10 shows a part of the search area, where the coordinate marked A has been found as the best integer pixel position at the first stage.


Figure 7.10: Subpixel search positions, around pixel coordinate A

In testing the eight subpixel positions, pixels of the macroblock in the previous frame are interpolated, according to the positions to be searched. For subpixel positions, marked with h in the middle of the horizontal pixels, the interpolation is:

(7.1) 

where the division is truncated. For the subpixels in the vertical midpoints, the interpolated values for the pixels are:

(7.2) 

and for subpixels in the corner (centre of four pixels), the interpolation is:

(7.3) 
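The three interpolation cases described above can be sketched as follows. This is a minimal illustration that follows the text's statement that the division is truncated; the function name and the convention of addressing positions in half-pixel units are assumptions made for the example.

```python
def sample_half_pel(ref, x2, y2):
    """Value of picture `ref` at position (x2 / 2, y2 / 2).

    x2 and y2 are non-negative coordinates in half-pixel units (an
    assumption of this sketch), so odd values fall between integer pixels.
    Integer division floors, which equals truncation for non-negative pixels.
    """
    x0, y0 = x2 // 2, y2 // 2
    x1, y1 = x0 + (x2 % 2), y0 + (y2 % 2)
    if x2 % 2 == 0 and y2 % 2 == 0:
        return ref[y0][x0]                        # integer pixel position
    if y2 % 2 == 0:
        return (ref[y0][x0] + ref[y0][x1]) // 2   # horizontal midpoint
    if x2 % 2 == 0:
        return (ref[y0][x0] + ref[y1][x0]) // 2   # vertical midpoint
    return (ref[y0][x0] + ref[y0][x1]
            + ref[y1][x0] + ref[y1][x1]) // 4     # centre of four pixels


if __name__ == "__main__":
    ref = [[10, 20],
           [30, 41]]
    print(sample_half_pel(ref, 1, 0))  # 15
    print(sample_half_pel(ref, 0, 1))  # 20
    print(sample_half_pel(ref, 1, 1))  # 25
```

To test one of the eight half pixel candidates, every pixel of the reference block would be sampled this way at the candidate offset before the matching error is computed.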

Note that in subpixel precision motion estimation, the range of the motion vectors' addresses is increased by 1 bit for each of the horizontal and vertical directions. Thus the motion vector overhead may be increased by two bits per vector (in practice, due to variable length coding, this might be less than two bits). Despite this increase in motion vector overhead, the gain in motion compensation efficiency outweighs the extra bits, and the overall bit rate is reduced. Figure 7.11 shows the motion compensated error, with and without half pixel precision, for two consecutive frames of the Claire sequence. The motion compensated error has been magnified by a factor of four for better representation. It can be seen that with half pixel precision there are fewer blocking artefacts and, in general, the motion compensated errors are smaller.

Figure 7.11: Motion compensated prediction error (a) with half pixel precision (b) without half pixel precision

To reduce the motion vector overhead further, differential coding is used: each motion vector is predicted from that of the previously coded macroblock and only the difference is coded. The prediction vector at the start of each slice and at each intra coded macroblock is set to zero. Note that predictively coded macroblocks with no motion vectors also set the prediction vector to zero. The motion vector prediction errors are then variable length coded.
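The reset rules for the prediction vector can be sketched as below. This is an illustration under stated assumptions: the macroblock representation (a tuple of kind, vector and slice-start flag), the kind labels and the function name are hypothetical, and the variable length coding stage is omitted.

```python
def mv_differences(macroblocks):
    """Return the motion vector prediction errors for a sequence of macroblocks.

    Each macroblock is (kind, mv, starts_slice), a hypothetical representation:
      kind is "inter" (has a vector), "no_mv" (predicted, no vector) or "intra";
      mv is an (x, y) tuple or None; starts_slice marks the first macroblock
      of a slice.
    """
    pred = (0, 0)
    diffs = []
    for kind, mv, starts_slice in macroblocks:
        if starts_slice:
            pred = (0, 0)              # prediction is reset at each slice start
        if kind == "inter":
            diffs.append((mv[0] - pred[0], mv[1] - pred[1]))
            pred = mv                  # this vector predicts the next one
        else:
            pred = (0, 0)              # intra or no-vector macroblock resets it
    return diffs


if __name__ == "__main__":
    mbs = [
        ("inter", (3, 1), True),
        ("inter", (4, 1), False),
        ("intra", None, False),
        ("inter", (4, 2), False),
        ("no_mv", None, False),
        ("inter", (1, 0), False),
        ("inter", (2, 0), True),
    ]
    print(mv_differences(mbs))
    # [(3, 1), (1, 0), (4, 2), (1, 0), (2, 0)]
```

When neighbouring macroblocks move together, as in the first two entries, the differences are small and cost few bits after variable length coding.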

7.6.3 Bidirectional motion estimation

B-pictures have access to both past and future anchor pictures. They can then use either the past frame, called forward motion estimation, or the future frame, called backward motion estimation, as shown in Figure 7.12.

Figure 7.12: Motion estimation in B-pictures

Such an option increases the motion compensation efficiency, particularly when there are occluded objects in the scene. In fact, one of the reasons for the introduction of B-pictures was that the forward motion estimation used in H.261 and in P-pictures cannot compensate for the background uncovered by moving objects.

From the forward and backward motion vectors, the coder can choose the forward, the backward or the combined motion compensated prediction. In the last case, a weighted average of the forward and backward motion compensated pictures is calculated. The weight is inversely proportional to the distance of the B-picture from each of its anchor pictures. For example, in the GOP structure of I, B1, B2, P, the bidirectionally interpolated motion compensated picture for B1 would be two-thirds of the forward motion compensated pixels from the I-picture and one-third of the backward motion compensated pixels from the P-picture. This ratio is reversed for B2. Note that B-pictures do not use motion compensation from each other, since they are not used as predictors. Also note that the motion vector overhead in B-pictures is much greater than in P-pictures. The reason is that for B-pictures there are more macroblock types, which increases the macroblock type overhead, and for the bidirectionally motion compensated macroblocks two motion vectors have to be sent.
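The distance-based weighting described above can be sketched as follows. The function and parameter names are hypothetical, and rounding to the nearest integer is an assumption of this sketch, since the text does not say how the weighted average is rounded.

```python
def interpolate_bidirectional(forward_pred, backward_pred, dist_past, dist_future):
    """Weighted average of forward and backward motion compensated blocks.

    Weights are inversely proportional to the distance from each anchor:
    the forward prediction is weighted by dist_future / (dist_past + dist_future)
    and the backward prediction by dist_past / (dist_past + dist_future).
    Rounding to nearest is an assumption made for this example.
    """
    total = dist_past + dist_future
    return [
        [(dist_future * f + dist_past * b + total // 2) // total
         for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(forward_pred, backward_pred)
    ]


if __name__ == "__main__":
    fwd = [[90, 30]]   # prediction from the past anchor (I-picture)
    bwd = [[30, 90]]   # prediction from the future anchor (P-picture)
    print(interpolate_bidirectional(fwd, bwd, 1, 2))  # B1: [[70, 50]]
    print(interpolate_bidirectional(fwd, bwd, 2, 1))  # B2: [[50, 70]]
```

For B1, which is one frame from the I-picture and two from the P-picture, the forward prediction receives a weight of two-thirds, matching the example in the text.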

7.6.4 Motion range

When B-pictures are present, because the distance between a picture and its anchor varies, the search range for motion estimation is expected to differ between picture types. For example, with M = 3, P-pictures are three frames apart from their anchor pictures. B1-pictures are only one frame from their past anchor and two frames from their future anchor, and for B2-pictures the distances are reversed. Hence the motion range for P-pictures is larger than the backward motion range of B1-pictures, which is itself larger than their forward motion range. For normal scenes, the maximum search range for P-pictures is usually taken as 11 pixels/3 frames, and the forward and backward motion ranges for B1-pictures are 3 pixels/frame and 7 pixels/2 frames, respectively. For B2-pictures these values become 7 and 3.

It should be noted that motion estimation for B-pictures, which requires both forward and backward motion vectors, might be expected to be more processing demanding than that for P-pictures. However, because of the larger motion range for P-pictures, the latter can be more costly than the former. For example, if the full search method is used, the number of search operations for P-pictures will be (2 × 11 + 1)² = 529. For the forward and backward motion vectors of B1-pictures this value will be (2 × 3 + 1)² = 49 and (2 × 7 + 1)² = 225, respectively. For B2-pictures, the forward and backward motion estimation costs become 225 and 49, respectively. Thus, while the motion estimation cost for P-pictures in this example is 529, the cost for a B-picture is about 49 + 225 = 274, which is less. For motion estimation with half pixel accuracy, 8 and 16 more operations have to be added to these values for P and B-pictures, respectively. For more active pictures, where the search ranges for both P and B-pictures are larger, the gap in motion estimation cost becomes wider.
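The figures quoted above, including the half pixel additions, can be reproduced with a short calculation. The function names are hypothetical; the ranges of 11, 3 and 7 pixels are the ones given in the text for M = 3.

```python
def search_cost(w, half_pixel=False):
    """Full search operations for one vector with range +/- w.

    Half pixel refinement adds the eight positions around the best
    integer position.
    """
    cost = (2 * w + 1) ** 2
    if half_pixel:
        cost += 8
    return cost


def picture_costs(half_pixel=False):
    """Per-block search cost for each picture type with M = 3."""
    return {
        "P": search_cost(11, half_pixel),
        "B1": search_cost(3, half_pixel) + search_cost(7, half_pixel),
        "B2": search_cost(7, half_pixel) + search_cost(3, half_pixel),
    }


if __name__ == "__main__":
    print(picture_costs())                 # {'P': 529, 'B1': 274, 'B2': 274}
    print(picture_costs(half_pixel=True))  # {'P': 537, 'B1': 290, 'B2': 290}
```

A B-picture needs two vectors, so its half pixel refinement adds 16 operations rather than 8, yet its total remains well below that of a P-picture.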



Standard Codecs: Image Compression to Advanced Video Coding (IET Telecommunications Series)
ISBN: 0852967101
Year: 2005
Pages: 148
Authors: M. Ghanbari
