Chapter 12: Adaptive Video Segmentation and Summarization | Handbook of Video Databases: Design and Applications (Internet and Communications)

Charles B. Owen and John K. Dixon
Department of Computer Science and Engineering
Michigan State University
East Lansing, Michigan, USA
cbowen,dixonjoh@cse.msu.edu

1. Introduction

Many techniques have been developed for segmentation and summarization of digital video. The variety of methods is partly due to the fact that different methods work better on different classes of content. Histogram-based segmentation works best on color video with clean cuts; motion-based summarization works best on video with moving cameras and a minimum of disjoint motion. Recognizing that there is no single best solution for either of these problems has led to the ideas in this chapter for integrating the variety of existing algorithms into a common framework, creating a composite solution that is sensitive to the class of content under analysis, the performance of the algorithms on the specific content, and the best combination of the results. This chapter presents a survey of the existing techniques used to perform video segmentation and summarization and highlights ways in which a composite solution can be developed that adapts to the underlying video content.

As massive quantities of digital video accumulate in corporate libraries, public archives, and home collections, locating content in that video continues to be a significant problem. A key element of any method that attempts to index digital video is effective segmentation and summarization of the video. Video segmentation involves decomposing a video sequence into shots. Summarization involves developing an abstraction for the video sequence and creating a user interface that facilitates browsing of the abstraction. Indexing methods require minimization of redundancy for effective performance. Ideally, an indexing method would be given a compact summarization of the content, with only salient elements subject to analysis. Moreover, given the limited performance of indexing methods and the questionable ability of humans to pose exact queries, it is essential that results be presented in a way that allows for very fast browsing by the user. The goal of adaptive video segmentation and summarization is to create algorithmic solutions for each area of analysis that are independent of the video class, non-heuristic in nature, and useful for both indexing and browsing purposes. Such tools can go a long way towards unlocking these massive vaults of digital video.

Great progress has been made on the segmentation and summarization of digital video. However, it is common for this research to focus on specific classes of video. Major projects have analyzed CNN Headline News, music videos, and late-night comedians [1–7]. This work answered many questions about how to analyze video where the structure is known in advance. However, any general solution must work for a wide variety of content without the input of manually collected structural knowledge. This fact has been well known in the mature document analysis community for many years. Researchers have attempted to apply ideas gleaned from that community, together with new approaches derived from analysis of past segmentation and structural analysis results, to develop solutions that are adaptive to the content and able to function for video ranging from commercially edited motion pictures, to home movies produced by amateurs, to field content captured in military settings.

The primary goal of adaptive video segmentation is to apply known algorithms based on motion, color, and statistical characteristics in parallel, with instrumentation of the algorithms, content analysis, and results fusion to determine a best segmentation with maximum accuracy and performance. The text document retrieval community has recognized the value of adaptive application of multiple algorithmic search mechanisms. This strategy can also be applied to shot segmentation.
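The fusion step described above can be illustrated with a minimal sketch. It is not the chapter's method: it assumes, purely for illustration, that each algorithm has already produced a score in [0, 1] for every frame transition, and that the adaptive part is expressed as a per-algorithm weight reflecting how well that algorithm is believed to perform on the current content. The function name `fuse_shot_boundaries`, the detector names, the weights, and the threshold are all hypothetical.

```python
from typing import Dict, List


def fuse_shot_boundaries(scores: Dict[str, List[float]],
                         weights: Dict[str, float],
                         threshold: float = 0.5) -> List[int]:
    """Fuse per-transition scores from several detectors by weighted mean.

    scores[name][i] is the (assumed) confidence of detector `name` that a
    shot boundary lies between frame i and frame i + 1.
    weights[name] is an assumed measure of how much that detector is
    trusted on the content being analyzed.
    Returns the transition indices whose fused score reaches `threshold`.
    """
    names = list(scores)
    if not names:
        return []
    length = len(scores[names[0]])
    if any(len(scores[n]) != length for n in names):
        raise ValueError("all detectors must score the same transitions")
    total = sum(weights[n] for n in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    boundaries = []
    for i in range(length):
        # Weighted mean of the detector scores for transition i.
        fused = sum(weights[n] * scores[n][i] for n in names) / total
        if fused >= threshold:
            boundaries.append(i)
    return boundaries


# Hypothetical scores from a color-based and a motion-based detector.
example_scores = {
    "color": [0.1, 0.9, 0.2, 0.7],
    "motion": [0.0, 0.8, 0.1, 0.2],
}
equal = fuse_shot_boundaries(example_scores, {"color": 1.0, "motion": 1.0})
color_heavy = fuse_shot_boundaries(example_scores, {"color": 3.0, "motion": 1.0})
```

With equal weights only the transition that both detectors agree on is reported; raising the weight of the color detector also admits the transition that it alone scores highly. A weighted mean is only one possible fusion rule, chosen here because it is easy to inspect.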

Results fusion and adaptation techniques can likewise be applied to video summarization. One of the critical tools of any indexing and browsing environment is effective summarization. Video to be indexed must be presented to an indexing system with a minimum of redundancy, both to avoid duplicate retrieval results and to maximize the disparity in the indexing space. Similarly, search results must be presented as compact summaries that allow users to choose the correct result or adapt the search as quickly as possible. Again, many different approaches for video summarization exist. This toolbox of approaches can be used as the basis for an adaptive solution for video summarization that draws on the strengths of the different approaches in different application classes.
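As a rough illustration of what presenting video "with a minimum of redundancy" can mean in practice, the sketch below greedily keeps a frame only when it differs sufficiently from every frame already kept. It is one simple strategy among many and is not taken from the chapter. It assumes each frame has already been reduced to a small feature vector (for example, a normalized color histogram); the function names, the L1 distance, and the `min_distance` parameter are hypothetical choices.

```python
from typing import List, Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_keyframes(features: Sequence[Sequence[float]],
                     min_distance: float) -> List[int]:
    """Greedily pick frame indices that are mutually at least min_distance apart.

    A frame is kept only if its feature vector differs from every
    previously kept frame by at least `min_distance`, so near-duplicates
    of already selected frames are skipped.
    """
    kept: List[int] = []
    for i, feature in enumerate(features):
        if all(l1_distance(feature, features[k]) >= min_distance for k in kept):
            kept.append(i)
    return kept


# Hypothetical two-bin histograms: frames 0, 1, 4 look alike, as do 2 and 3.
example_features = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
    [1.0, 0.0],
]
keyframes = select_keyframes(example_features, min_distance=0.5)
```

Here the selection keeps one representative from each group of similar frames. An adaptive system could tune `min_distance`, or swap the distance measure, according to the class of content.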