10.8 Coding of synthetic objects

Synthetic images form a subset of computer graphics that can be supported by MPEG-4. Of particular interest in synthetic images is the animation of head-and-shoulders or cartoon-like images. Animation of synthetic faces was studied in the first phase of MPEG-4, and three-dimensional body animation was addressed in the second phase of MPEG-4 development [2].

The animation parameters are derived from a two-dimensional mesh, which is a tessellation of a two-dimensional region into polygonal patches. The vertices of the polygonal patches are referred to as the node points or vertices of the mesh. In coding of the objects, these points are moved according to the movement of the body, head, eyes, lips and changes in the facial expressions. A two-dimensional mesh matched to the Claire image is shown in Figure 10.23a. Since the number of nodes representing the movement can be very small, this method of coding, known as model-based coding, requires a very low bit rate, possibly in the range of 10–100 bit/s [14].

Figure 10.23: (a) a two-dimensional mesh (b) the mapped texture

In order to make synthetic images look more natural, the texture of the objects is mapped onto the two-dimensional mesh, as shown in Figure 10.23b. For coding of the animated images, the triangular patches in the current frame are deformed by the movement of the node points so that they match the triangular patches, or facets, in the reference frame. The texture inside each patch in the reference frame is then warped onto the current frame using a parametric mapping defined as a function of the node point motion vectors.

For triangular meshes, affine mapping with six parameters (three node points or vertices) is a common choice [15]. Its linear form implies that texture mapping can be accomplished with low computational complexity. This mapping can model a general form of motion including translation, rotation, scaling, reflection and shear, and preserves straight lines. This implies that the original two-dimensional motion field can be compactly represented by the motion of the node points, from which a continuous, piecewise affine motion field can be reconstructed. At the same time, the mesh structure constrains movements of adjacent image patches. Therefore, meshes are well suited to representing mildly deformable but spatially continuous motion fields.
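As an illustration of why three node points fix the six affine parameters, the following minimal sketch solves for them from one triangle's vertex correspondences and then maps an interior point. The parameter ordering (x' = a1·x + a2·y + a3, y' = a4·x + a5·y + a6), the function names and the sample coordinates are assumptions made for this example; they are not taken from the MPEG-4 specification.

```python
def affine_from_triangle(src, dst):
    """Return (a1, a2, a3, a4, a5, a6) such that, for each vertex pair,
    x' = a1*x + a2*y + a3 and y' = a4*x + a5*y + a6.
    src and dst are lists of three (x, y) node points (hypothetical layout)."""
    (x1, y1), (x2, y2), (x3, y3) = src
    # Determinant of [[x1, y1, 1], [x2, y2, 1], [x3, y3, 1]]
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if det == 0:
        raise ValueError("source node points are collinear")

    def solve(u1, u2, u3):
        # Cramer's rule for the 3x3 system with right-hand side (u1, u2, u3)
        a = (u1 * (y2 - y3) - y1 * (u2 - u3) + (u2 * y3 - u3 * y2)) / det
        b = (x1 * (u2 - u3) - u1 * (x2 - x3) + (x2 * u3 - x3 * u2)) / det
        c = (x1 * (y2 * u3 - y3 * u2) - y1 * (x2 * u3 - x3 * u2)
             + u1 * (x2 * y3 - x3 * y2)) / det
        return a, b, c

    a1, a2, a3 = solve(dst[0][0], dst[1][0], dst[2][0])  # x' coordinates
    a4, a5, a6 = solve(dst[0][1], dst[1][1], dst[2][1])  # y' coordinates
    return (a1, a2, a3, a4, a5, a6)


def apply_affine(params, point):
    """Map one (x, y) position with the six affine parameters."""
    a1, a2, a3, a4, a5, a6 = params
    x, y = point
    return (a1 * x + a2 * y + a3, a4 * x + a5 * y + a6)


# Illustrative node points only: a patch that is scaled and translated.
src_nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst_nodes = [(2.0, 3.0), (4.0, 3.0), (2.0, 6.0)]
params = affine_from_triangle(src_nodes, dst_nodes)
inside = apply_affine(params, (0.5, 0.5))
```

Because every position inside the patch is mapped by the same six numbers, only the node point motion needs to be transmitted; the motion of the texture inside the patch follows from it.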

However, if the movement is more complex, like the motion of lips, then affine modelling may fail. For example, Figure 10.24 shows the reconstructed picture of Claire, after nine frames of affine modelling. The accumulated error due to model failure around the lips is very evident.

Figure 10.24: Reconstructed model-based image with the affine transform

For larger, more complex motion that requires more severe patch deformation, one can use quadrilateral mappings with eight degrees of freedom. Bilinear and perspective mappings are of this kind, and have better deformation capability than affine mapping [16, 17].
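The eight degrees of freedom of a bilinear mapping can be seen in a small sketch that maps a unit square onto an arbitrary quadrilateral: each of the four corners contributes two free coordinates. The unit-square parameterisation (u, v), the corner ordering and the function name are assumptions chosen for this example rather than a form given in the text.

```python
def bilinear_map(quad, u, v):
    """Map (u, v) in the unit square onto a quadrilateral patch.
    quad lists the images of the corners (0,0), (1,0), (0,1), (1,1)
    in that order (hypothetical ordering). The four corner positions
    supply the eight degrees of freedom."""
    p00, p10, p01, p11 = quad
    # Bilinear weights; they sum to 1 for any (u, v)
    w00 = (1 - u) * (1 - v)
    w10 = u * (1 - v)
    w01 = (1 - u) * v
    w11 = u * v
    x = w00 * p00[0] + w10 * p10[0] + w01 * p01[0] + w11 * p11[0]
    y = w00 * p00[1] + w10 * p10[1] + w01 * p01[1] + w11 * p11[1]
    return (x, y)


# Illustrative patch only: the fourth corner is pulled away from (4, 4),
# so the patch is not a parallelogram.
quad = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (6.0, 6.0)]
corner = bilinear_map(quad, 1.0, 1.0)
centre = bilinear_map(quad, 0.5, 0.5)
```

In this example the fourth corner moves independently of the other three. A single affine map fixed by the first three corners would send (1, 1) to (4, 4), so it could not reproduce this deformation.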