In this chapter, we presented various requirements for generating video summaries. Existing work in this field relies, to varying degrees, on the following aspects of videos:
The video-structure level, which expresses how the video is organized in terms of sequences, scenes, and shots. Such structure also helps to capture the temporal properties of a video.
The semantic level, which expresses the semantics of parts of the video. Various forms of semantics may be extracted either automatically or manually.
The signal level, which represents features related to the video content in terms of colors/textures and motion.
Much work has been done in these different fields. Nevertheless, we have shown that a more powerful formalism is needed to facilitate the dynamic creation of adaptive video summaries. This is why we proposed the VISU model. This model is based on both a stratum data model and the conceptual graph formalism to represent video content. We also exploit the cinematographic structure of the video and some low-level features, transparently to the user, during the generation of summaries.
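To make the combination of the two formalisms concrete, the following is a minimal sketch, not VISU's actual data structures: it assumes a stratum is a time interval over the video carrying an annotation, and represents that annotation as a small conceptual graph (typed concepts linked by labeled relations). All class and field names here are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    """A typed concept node, e.g. [Person: John]."""
    type: str
    referent: str = "*"  # "*" denotes a generic (unnamed) instance

@dataclass(frozen=True)
class Relation:
    """A labeled conceptual relation linking two concepts."""
    label: str
    source: Concept
    target: Concept

@dataclass
class Stratum:
    """A time interval of the video carrying a conceptual-graph annotation."""
    start: float  # seconds
    end: float    # seconds
    graph: list   # list of Relation edges forming the conceptual graph

# Annotate seconds 12-18 with the graph [Act: Run] -> (agent) -> [Person: John]
john = Concept("Person", "John")
run = Concept("Act", "Run")
stratum = Stratum(12.0, 18.0, [Relation("agent", run, john)])
```

A query over such annotations can then be matched against the graphs of the strata whose intervals intersect the portion of the video under consideration.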
One original aspect of the VISU model is that it allows the expression of rich queries to guide video summary creation. Such rich queries can only be fulfilled by the system if the representation model effectively supports complex annotations; this is why we chose the conceptual graph formalism as the basis for the annotation structure in VISU. Our approach also makes use of well-known values from Information Retrieval, namely term frequency and inverse document frequency. These values are known to be effective for retrieval, and we apply them to annotations.
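As a sketch of how such weighting could apply to annotations (the concept labels and the exact tf-idf variant are assumptions, not VISU's actual implementation), each annotated segment can play the role of a "document" and each concept label that of a "term":

```python
import math
from collections import Counter

def tf_idf_weights(segment_concepts):
    """Compute tf-idf weights for concept labels annotating video segments.

    segment_concepts: one list of concept labels per segment
    (each segment plays the role of a 'document').
    Returns one {concept: weight} dict per segment.
    """
    n_segments = len(segment_concepts)
    # Document frequency: in how many segments does each concept occur?
    df = Counter()
    for concepts in segment_concepts:
        df.update(set(concepts))
    weights = []
    for concepts in segment_concepts:
        tf = Counter(concepts)
        weights.append({
            c: (tf[c] / len(concepts)) * math.log(n_segments / df[c])
            for c in tf
        })
    return weights

# Illustrative annotations: "goal" occurs in only one segment, so it is
# weighted higher there than the ubiquitous "player".
segs = [["player", "ball"], ["player", "goal"], ["player", "ball"]]
w = tf_idf_weights(segs)
```

A concept appearing in every segment thus receives zero weight, while rare, discriminative concepts dominate, which is the behavior one wants when selecting segments for a summary.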
In the future, the proposed model will be extended to support temporal relations between annotations. We also plan to build on this work to propose a graphical user interface for generating query expressions automatically.