In this chapter, we have presented three video summarization systems that create motion video summaries of the original video from different perspectives. The audio-centric summarization system summarizes the input video by selecting video segments that contain semantically important spoken sentences; the image-centric summarization system creates a summary by eliminating duplicates and redundancies while preserving the visually rich content of the given video; and the audio-visual summarization system composes a summary by decoupling the audio and image tracks of the input video, summarizing the two tracks separately, and then integrating the two summaries with a loose alignment. Together, these three summarization systems cover a variety of video programs and customer needs. Audio-centric summarization is applicable to video programs in which information is conveyed mainly by speech and the visual content is subordinate to the audio; image-centric summarization is useful when the visual content delivers the main theme of the video program; and audio-visual summarization is suitable when the user wants to maximize the coverage of both the audio and the visual content of the original video without sacrificing either.
Video summarization is the process of creating a concise form of the original video in which important content is preserved and redundancy is eliminated. The process is highly subjective because different people may have different opinions on what constitutes the important content of a given video. This is especially true of entertainment videos such as movies and dramas, which people watch and appreciate from many different perspectives. For example, given an action movie with a love story, some viewers may be touched by the faithful love between the main characters, some may be impressed by the action scenes, and some may be interested only in a particular star. These different interests reflect differences in individuals' ages, personal tastes, cultural backgrounds, social ranks, and so on. The three video summarization systems presented in this chapter strive to summarize a given video program from three different aspects. They cover certain information abstraction needs, but they cannot satisfy the whole spectrum of users' requirements. An ultimate video summarization system would be one that is able to learn the viewpoint and preferences of a particular user and create summaries accordingly. With such system intelligence, a personalized content abstraction service will become available, and a wide range of information intelligence requirements will be fulfilled.