This section presents an example of a fourth-generation video coding technique based on recognition and reconstruction of visual data. The approach should be regarded as a validation of fourth-generation video coding techniques and as a framework worth exploring; it is not intended as a complete new video coding scheme. For more details on eigenface coding approaches using adaptive techniques, see , .
The scheme is based on the well-known eigenspace concepts used in face recognition systems, adapted here to video compression. To simplify the visual content, assume that we are interested in coding faces in a videoconference session, and that automatic tools for detecting a face in a video sequence are available . Experiments then show that a face can be well represented by a very small number of coefficients, obtained by projecting the face onto a previously defined eigenspace. The face image can thus be reconstructed (decoded), up to a certain quality, from only these few coded coefficients .
Our coding technique is based on a face recognition approach that has been modified for the coding application . It assumes that a set of training images is known in advance for each person appearing in the video sequence. Once these training images have been collected (usually from an image database or from a video sequence), a Principal Component Analysis (PCA) is performed for each individual on that person's training set. This yields one PCA decomposition for each person whose face is to be coded. The PCA is carried out before the encoding process begins.
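The per-person training stage described above can be sketched as follows. This is a minimal illustration, not the implementation used for the reported results: it assumes NumPy, the standard mean-centered eigenface formulation, and an SVD-based PCA. The function name `train_eigenfaces`, the labels `person_a` and `person_b`, and the random arrays standing in for real training views are all hypothetical.

```python
import numpy as np

def train_eigenfaces(training_faces, num_components):
    """Compute the mean face and the leading eigenfaces for one person.

    training_faces: array of shape (n_images, height, width), already
    normalized to a common size.
    Returns (mean_face, eigenfaces) with shapes (h*w,) and (k, h*w).
    """
    n_images = training_faces.shape[0]
    # Flatten every image into one row vector.
    X = training_faces.reshape(n_images, -1).astype(np.float64)
    mean_face = X.mean(axis=0)
    centered = X - mean_face
    # The right singular vectors of the centered data are the eigenvectors
    # of its covariance matrix, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    k = min(num_components, vt.shape[0])
    return mean_face, vt[:k]

# Hypothetical data: random arrays stand in for five 50x70 training views
# (70 rows by 50 columns) of each person.
rng = np.random.default_rng(0)
training_sets = {
    "person_a": rng.random((5, 70, 50)),
    "person_b": rng.random((5, 70, 50)),
}

# One PCA per person, computed before any encoding takes place.
# After mean-centering, n training images span at most n - 1 directions,
# so only four eigenfaces are kept here.
models = {name: train_eigenfaces(faces, num_components=4)
          for name, faces in training_sets.items()}
```

Each entry of `models` holds what the encoder and decoder would both need for one person: a mean face and a small set of orthonormal eigenfaces.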
After the PCA, the face to be coded is projected onto, and reconstructed from, each of the sets of eigenvectors (called eigenfaces) obtained in the PCA stage. If the reconstruction error for a specific set of eigenfaces is below a threshold, the face is said to match the training set that generated that set of eigenfaces. In this case the recognized face is coded by quantizing only the most important coefficients used in the reconstruction. The face image has to be normalized to a fixed size before the PCA and denormalized at the decoder. The eigenfaces of each person must clearly be transmitted to the decoder beforehand. However, this can be done with conventional still image coding techniques such as JPEG, and it does not significantly increase the bit rate.
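The projection, matching, and quantization steps just described might look like the sketch below. Several details are assumptions made for illustration only, since the text does not specify them: mean squared error as the reconstruction error, picking the lowest-error person when several fall below the threshold, a uniform quantizer with step size `step`, and synthetic noise-based faces in place of real data. All function names are hypothetical.

```python
import numpy as np

def fit_eigenfaces(faces, k):
    """PCA for one person: returns the mean face and k eigenfaces (rows)."""
    X = faces.reshape(len(faces), -1).astype(np.float64)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def encode_face(face, models, threshold, step=1.0):
    """Try every person's eigenspace and keep the best reconstruction.

    Returns (person, quantized_coefficients), or None when no eigenspace
    reconstructs the face with mean squared error below `threshold`.
    """
    x = face.reshape(-1).astype(np.float64)
    best = None
    for person, (mean, eigenfaces) in models.items():
        coeffs = eigenfaces @ (x - mean)        # projection
        recon = mean + eigenfaces.T @ coeffs    # reconstruction
        mse = np.mean((x - recon) ** 2)
        if mse < threshold and (best is None or mse < best[0]):
            best = (mse, person, coeffs)
    if best is None:
        return None
    _, person, coeffs = best
    # Uniform scalar quantization; `step` is an illustrative choice.
    return person, np.round(coeffs / step).astype(np.int32)

def decode_face(person, q_coeffs, models, shape, step=1.0):
    """Rebuild the face from the person label and quantized coefficients."""
    mean, eigenfaces = models[person]
    return (mean + eigenfaces.T @ (q_coeffs * step)).reshape(shape)

# Synthetic stand-in data: each person is a fixed base image plus mild noise.
rng = np.random.default_rng(0)
shape = (70, 50)  # rows x columns of a normalized 50x70 face

def synthetic_views(base, n):
    return base + rng.normal(0.0, 2.0, size=(n,) + shape)

bases = {name: rng.uniform(0, 255, size=shape)
         for name in ("person_a", "person_b")}
models = {name: fit_eigenfaces(synthetic_views(base, 5), k=4)
          for name, base in bases.items()}

threshold = 25.0
new_face = synthetic_views(bases["person_a"], 1)[0]
person, q_coeffs = encode_face(new_face, models, threshold)
decoded = decode_face(person, q_coeffs, models, shape)

# A face unrelated to any training set is rejected rather than coded.
stranger = rng.uniform(0, 255, size=shape)
rejected = encode_face(stranger, models, threshold)
```

Only the person label and the handful of integers in `q_coeffs` would need to be sent per face; the decoder already holds the eigenfaces. Size normalization and denormalization are omitted from this sketch.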
Figure 39.2 shows five views of the image Ana, and Figure 39.3 shows five views of the image José Mari. These images come from the test sequences accepted in MPEG-7.
Figure 39.2: Five training views of the image Ana.
Figure 39.3: Five training views of the image José Mari.
Figure 39.4 shows the original image Ana, the reconstruction of the detected face obtained from the eigenvectors and projection coefficients of the PCA of Ana's training images, and the resulting error image. Figure 39.5 shows the equivalent result for José Mari. Only five real numbers were used to decode each of the images shown, which gives a very high compression ratio. The image size is 50x70 pixels, and the original image was encoded at 8 bits/pixel. The compression ratio of the decoded images is 350.
Figure 39.4: Decoded (reconstructed) image Ana. Left: original image. Center: reconstructed image (compression factor 350). Right: error image.
Figure 39.5: Decoded (reconstructed) image José Mari. Left: original image. Center: reconstructed image (compression factor 350). Right: error image.
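As a quick check on the quoted compression factor, the arithmetic can be worked through from the figures given in the text. The resulting 16 bits per coefficient is an inference from those figures, not a value stated in the source, and the calculation ignores the one-off cost of transmitting the eigenfaces.

```python
width, height = 50, 70     # image size in pixels, from the text
bits_per_pixel = 8         # original encoding, from the text
num_coefficients = 5       # real numbers sent per face, from the text
compression_ratio = 350    # reported compression factor

original_bits = width * height * bits_per_pixel
coded_bits = original_bits // compression_ratio
bits_per_coefficient = coded_bits // num_coefficients  # inferred, not stated

print(original_bits, coded_bits, bits_per_coefficient)  # 28000 80 16
```

Under this reading, each face costs 80 bits, against 28,000 bits for the uncompressed image.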
These results suggest that image coding based on recognition and reconstruction may be the next step forward in video coding. Good object models will be needed, however, to encode arbitrary kinds of objects with this approach.