4. Adaptive Video Summarization

At present, most research efforts focus on developing and implementing individual summarization methods independently. Little research is currently dedicated to the development of adaptive summarization techniques. Some systems offer the user several summarization methods and let the user choose the most appropriate one. As stated above, there are various types of digital video, e.g., music video, home video, movies, news broadcasts, and surveillance video. The most effective summarization method is the one that allows the user to gain the maximum understanding of the original video. The level of understanding required varies with the type of video. For example, surveillance video may require a highly detailed summary of a video sequence, whereas a news broadcast may need only a brief keyframe summary of the anchorperson and story shots. The required level of understanding is determined by the user's information need. When the type of video is unknown before analysis, how does one determine the best summarization method to use? The solution to this problem involves creating a composite summarization approach that adapts to the different classes of video. This section describes current research in this area and offers suggestions for further research.

The Físchlár system is an online video management system that allows for the browsing, indexing, and viewing of television programs. Summarization is facilitated by organizing the video content into shots, groups of shots, scenes, story lines, and objects and events. Browsing in the Físchlár system is facilitated via six browser interfaces [1, 30, 49]. A scrollbar browser allows a user to scroll through the available keyframes. The advantage of this browser is that it is easy to use. However, viewing large documents with large numbers of keyframes can be overwhelming to the user. A slideshow browser presents each keyframe in turn at intervals of approximately two seconds. The advantage of this browser is that it preserves the temporal nature of the video. Its disadvantage is that a slideshow can be too long for large video sequences, causing users to lose track of their location in the video. A timeline browser allows a user to see a fixed number of keyframes (24) at a time in temporal order. The advantage of this type of browser is that the user can view the keyframes in sets. A ToolTip is also used to display the start and end time of each segment within the set of keyframes. An overview/detail browser displays a number of significant keyframes selected by the scene-level segmentation. The user can then view more detail by selecting a keyframe, which displays the timeline browser for that particular segment of video. A dynamic overview browser allows a user to view a condensed set of significant keyframes. When the mouse moves over a significant keyframe, the display flips through the detailed keyframes within that segment. A small timeline appears beneath each keyframe to show the user the current position in temporal order. This technique allows a user to view the keyframes without major screen changes; however, the time that it takes to view the detailed keyframes of each significant keyframe may be too long. The hierarchical browser groups keyframes in a hierarchical tree structure that the user can navigate. The top level of the tree represents the entire program, and each subsequent level represents a further segmentation of the level above it.
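As an illustration of the timeline browser described above, the following minimal sketch groups keyframes into fixed sets of 24 in temporal order and formats the start and end time that a ToolTip might show. It is not taken from the Físchlár implementation; the `Keyframe` class, its fields, and the function names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List

PAGE_SIZE = 24  # keyframes shown at a time, as stated in the text


@dataclass
class Keyframe:
    # Hypothetical record; the field names are assumptions for this sketch.
    frame_id: int
    start_s: float  # start time (seconds) of the segment the keyframe represents
    end_s: float    # end time (seconds) of that segment


def timeline_pages(keyframes: List[Keyframe], page_size: int = PAGE_SIZE) -> List[List[Keyframe]]:
    """Sort keyframes by start time and split them into fixed-size sets."""
    ordered = sorted(keyframes, key=lambda k: k.start_s)
    return [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]


def tooltip(kf: Keyframe) -> str:
    """Format the segment start and end time for display."""
    return f"{kf.start_s:.1f}s - {kf.end_s:.1f}s"


# Usage: 50 keyframes, each representing a 2-second segment.
frames = [Keyframe(i, 2.0 * i, 2.0 * i + 2.0) for i in range(50)]
pages = timeline_pages(frames)
page_sizes = [len(p) for p in pages]
```

With 50 keyframes the sketch yields two full sets of 24 and a final set of 2, which shows why the user sees the video as a sequence of manageable sets rather than one long strip.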

The Informedia Digital Video Library System [40, 45] utilizes closed caption text, speech processing, and scene detection to automatically segment news and documentary video. One aspect of its summarization method is the partitioning of the video into shots and keyframes. Browsing in this system is facilitated by filmstrips, which consist of thumbnail images along with the video segment metadata. The thumbnail is selected based on the similarity between the text associated with the frame and the user query. The user can use the browsing interface to view the metadata and jump to a particular point in the video segment for playback. During playback, the text-based transcript is displayed with keywords highlighted.
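The query-based thumbnail selection can be sketched as follows. The source does not state which similarity measure Informedia uses, so this example assumes a simple Jaccard overlap between the query terms and the text associated with each frame; the function names and the sample data are hypothetical.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_thumbnail(frames: List[Tuple[int, str]], query: str) -> int:
    """Return the id of the frame whose associated text best matches the query.

    `frames` holds (frame_id, associated_text) pairs; ties go to the earliest frame.
    """
    q = tokenize(query)
    best_id, _ = max(frames, key=lambda item: jaccard(q, tokenize(item[1])))
    return best_id


# Usage with made-up transcript snippets.
frames = [
    (10, "weather forecast for the weekend"),
    (42, "election results announced tonight"),
    (77, "sports highlights"),
]
chosen = select_thumbnail(frames, "election results")
```

A production system would more plausibly weight terms (for example by rarity) rather than treat them equally; the unweighted overlap is used here only to keep the sketch short.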

We advocate the development of new video summarization algorithms based on the composition of multiple existing summarization algorithms. Key to the success of such an endeavour is ascertaining what constitutes a good choice of keyframes. Unlike segmentation, it is not obvious what the ground truth is for any given set of data. For this task we propose a controlled user study in which users are provided with a simple interface for selecting keyframes from a corpus of test content. From the data gained in this study and from user survey material, we will create a mean keyframe selection for the corpus and analyze the performance of existing algorithms relative to that selection. The key measures of performance are: 1) whether the algorithms under test produce similar volumes of keyframes over similar regions, and 2) how close the algorithmically selected keyframes are to those selected by the test population. These measures of performance can then be correlated with the internal measures of each algorithm and with the characteristics of the video content, so as to identify the appropriate classes of content for each summarization method.
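The two proposed measures could be computed along the following lines. This is a minimal sketch under stated assumptions: keyframes are represented by frame numbers, "regions" are half-open frame ranges given by a list of boundaries, and closeness is taken as the mean distance from each algorithmic keyframe to its nearest user-selected keyframe. The text does not fix any of these definitions, and all function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def counts_per_region(keyframes: Sequence[int], boundaries: Sequence[int]) -> List[int]:
    """Count keyframes in each half-open region [boundaries[i], boundaries[i+1])."""
    counts = [0] * (len(boundaries) - 1)
    for f in keyframes:
        for i in range(len(counts)):
            if boundaries[i] <= f < boundaries[i + 1]:
                counts[i] += 1
                break
    return counts


def volume_difference(algo: Sequence[int], ref: Sequence[int],
                      boundaries: Sequence[int]) -> List[int]:
    """Measure 1: per-region difference in keyframe volume (algorithm minus reference)."""
    a = counts_per_region(algo, boundaries)
    r = counts_per_region(ref, boundaries)
    return [x - y for x, y in zip(a, r)]


def mean_nearest_distance(algo: Sequence[int], ref: Sequence[int]) -> float:
    """Measure 2: mean distance (in frames) from each algorithmic keyframe
    to the nearest reference keyframe. Both inputs must be non-empty."""
    ref_sorted = sorted(ref)
    total = 0
    for f in algo:
        i = bisect_left(ref_sorted, f)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = ref_sorted[max(i - 1, 0):i + 1]
        total += min(abs(f - r) for r in candidates)
    return total / len(algo)


# Usage with made-up frame numbers.
algo_keyframes = [10, 55, 120]
user_keyframes = [12, 50, 60, 118]
regions = [0, 50, 100, 150]
diff = volume_difference(algo_keyframes, user_keyframes, regions)
closeness = mean_nearest_distance(algo_keyframes, user_keyframes)
```

In this example the algorithm produces one keyframe fewer than the users in the middle region, and its keyframes lie on average three frames from the nearest user choice. Note that the distance measure is asymmetric: it does not penalize user-selected keyframes that the algorithm misses, which is why it is paired with the volume measure.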