Our system for video object classification consists of two components: a segmentation module and a classification module (cf. Figure 23.1).
Figure 23.1: Architecture of the video object classification system.
Based on motion cues, the camera motion within the scene is first determined (motion estimation), and a background image for the entire sequence is constructed (background mosaic). During the construction process, parts belonging to foreground objects are removed by temporal filtering. Object segmentation is then performed by evaluating the differences between the current frame and the reconstructed background mosaic (segmentation).
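The text does not specify the particular temporal filter or difference measure used. A minimal sketch of the idea, assuming the frames are already motion-compensated, a per-pixel temporal median as the filter that suppresses foreground objects, and a fixed difference threshold (both choices are illustrative, not the system's actual parameters):

```python
import numpy as np

def background_mosaic(frames):
    """Estimate a background image for a motion-compensated sequence by
    per-pixel temporal median filtering; foreground objects that cover a
    pixel in only a minority of frames are thereby removed."""
    stack = np.stack(frames, axis=0)          # shape (T, H, W)
    return np.median(stack, axis=0)

def segment_objects(frame, background, threshold=25.0):
    """Return a binary object mask by thresholding the absolute difference
    between the current frame and the reconstructed background mosaic.
    The threshold value is a placeholder, not taken from the source."""
    diff = np.abs(frame.astype(np.float64) - background)
    return diff > threshold
```

For example, a small bright object moving across an otherwise static dark sequence is filtered out of the background estimate, so differencing recovers it as the foreground mask in each frame.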
The object masks determined by the segmentation algorithm are fed forward to the classification module. For each mask, an efficient shape-based representation is calculated (contour description). This description is then matched against pre-calculated object descriptions stored in a database (matching). The final classification of the object is obtained by integrating the matching results over a number of successive frames. This makes the approach more reliable, since a few unrecognizable single object views in the video carry little weight relative to the whole sequence. Moreover, it enables an automatic description of object behavior.
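The chapter does not fix the concrete contour description, distance measure, or integration rule at this point. The following sketch stands in for them with common choices: scale-normalized Fourier descriptor magnitudes as the shape representation, nearest-neighbor matching under the Euclidean distance, and a majority vote over successive frames as the temporal integration (all three are assumptions for illustration):

```python
import numpy as np
from collections import Counter

def contour_descriptor(contour, n_coeffs=8):
    """Compact shape description of a closed contour (given as an (N, 2)
    array of boundary points) via the magnitudes of its Fourier
    coefficients, normalized for scale. This is one possible shape-based
    representation, not necessarily the one used in the system."""
    z = contour[:, 0] + 1j * contour[:, 1]    # complex boundary signal
    mags = np.abs(np.fft.fft(z)[1:n_coeffs + 1])
    return mags / (mags[0] + 1e-12)           # scale-normalized

def match(descriptor, database):
    """Nearest-neighbor match against pre-calculated descriptions,
    stored as a dict mapping class label to descriptor."""
    return min(database,
               key=lambda label: np.linalg.norm(descriptor - database[label]))

def classify_sequence(per_frame_labels):
    """Integrate per-frame matching results by majority vote, so that a
    few unrecognizable single views do not change the final decision."""
    return Counter(per_frame_labels).most_common(1)[0][0]
```

With this integration rule, a sequence matched as, say, "car" in most frames is classified as "car" even if a few intermediate views match a wrong database entry.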