Chapter 8: A Temporal Multi-Resolution Approach to Video Shot Segmentation

Tat-Seng Chua, A. Chandrashekhara, and HuaMin Feng
Department of Computer Science
National University of Singapore, Singapore
{chuats, chandra, fenghm}

1. Introduction

The rapid accumulation of digital video in archives has given rise to many video applications, which in turn require a range of video-processing techniques. These applications must be able to represent, index, store and retrieve video efficiently. Since video is a time-based medium, it is important to break video streams into basic temporal units, called shots [45]. This temporal segmentation is the first and most basic step in the structured modeling of video [35][2], and it is useful in most video applications. When computers attempt to understand video, shots serve as the basic building blocks from which the whole story is constructed, and understanding a video usually requires understanding the relationships between its shots [6]. In a video retrieval system, indexing the shots is likewise an unavoidable step [13][5]. A good algorithm for the temporal segmentation of video is therefore helpful in all of these systems.

The temporal partitioning of video is generally called video segmentation or shot boundary detection [13][45][42]. To partition a video, a segmentation algorithm must detect the joins between consecutive shots in the video stream and locate their positions. These joins are created during video editing and fall into two types, depending on the technique involved [12]. If the video editor simply concatenates two shots, the join is an abrupt transition, called a CUT. If the editor uses a special technique such as fade in/out, wipe, dissolve or morphing to make the join appear visually smooth, the join is a gradual transition, called a GT. Detecting both the type and the location of a shot transition is a complex task. GTs are much more difficult to handle than CUTs because they vary widely in temporal duration and are generated by a variety of editing techniques.
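The difference between the two transition types can be illustrated with a minimal sketch. It assumes each frame is summarized by a single scalar feature (for example, mean intensity) and models a dissolve as a linear cross-fade; both are simplifying assumptions made for illustration, and the function names are hypothetical.

```python
def make_cut(shot_a, shot_b):
    """Abrupt transition (CUT): the two shots are concatenated directly."""
    return shot_a + shot_b


def make_dissolve(shot_a, shot_b, length):
    """Gradual transition (GT): assumed linear cross-fade over `length` frames."""
    blend = []
    for i in range(1, length + 1):
        alpha = i / (length + 1)  # blend weight rises from 0 towards 1
        blend.append((1 - alpha) * shot_a[-1] + alpha * shot_b[0])
    return shot_a + blend + shot_b


def max_step(signal):
    """Largest absolute difference between consecutive frames."""
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))


# Hypothetical per-frame feature values for two static shots.
shot_a = [10.0] * 5
shot_b = [90.0] * 5

cut = make_cut(shot_a, shot_b)
dissolve = make_dissolve(shot_a, shot_b, length=7)

print("largest frame-to-frame change, CUT:", max_step(cut))
print("largest frame-to-frame change, GT: ", max_step(dissolve))
```

In this toy signal the CUT concentrates the whole change between two adjacent frames, while the dissolve spreads the same change over several small steps, which is why a single frame-to-frame threshold tends to miss GTs.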

Most research uses different techniques to detect different types of GT. However, the various types of video transitions can all be modeled as a temporal multi-resolution edge phenomenon. The temporal resolution of a video stream can be high (i.e., the original video stream) or low (i.e., obtained by temporal sub-sampling of the video frames), and different types of video transitions differ only in their characteristics in the temporal scale space. For example, longer GTs cannot be observed at a high temporal resolution but are apparent at a lower temporal resolution of the same video stream. We therefore claim that the transition between video shots is a multi-resolution phenomenon, and we use information across resolutions to detect and locate both CUT and GT transition points. Since wavelets are well known for their ability to model sharp discontinuities and to process signals according to scale [4], we use Canny-like B-spline wavelets in this multi-resolution analysis.
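The observation that a long GT is hidden at high temporal resolution but visible at a lower one can be sketched with plain frame differencing at several temporal strides. This is not the wavelet-based analysis used in this chapter; it is a simplified stand-in, and the scalar per-frame feature, the linear ramp and the threshold value are all assumptions chosen for illustration.

```python
def stride_diff(signal, stride):
    """Absolute difference between frames that are `stride` frames apart."""
    return [abs(signal[i + stride] - signal[i])
            for i in range(len(signal) - stride)]


# Hypothetical feature signal: a static shot, a long linear GT, a static shot.
ramp_len = 16
signal = (
    [10.0] * 20
    + [10.0 + 80.0 * i / ramp_len for i in range(1, ramp_len)]
    + [90.0] * 20
)

threshold = 40.0  # assumed detection threshold, not taken from the chapter

for stride in (1, 4, 16):
    peak = max(stride_diff(signal, stride))
    print(f"stride {stride:2d}: peak difference {peak:5.1f}, "
          f"exceeds threshold: {peak > threshold}")
```

At stride 1 (the original resolution) every difference inside the ramp stays small, so the transition does not stand out. Once the stride is comparable to the length of the GT, the same transition produces a difference as large as a CUT would, which is the intuition behind examining several temporal resolutions together.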

While various concepts of video segmentation are explained and elaborated in other chapters of this book, this chapter presents a temporal multi-resolution analysis (TMRA) framework for video shot segmentation. Section 2 gives a short review of shot segmentation techniques. Section 3 reviews the basic theory behind video representation and multi-resolution analysis. Section 4 discusses the TMRA framework. Section 5 gives a brief account of simulation results. Section 6 concludes with a discussion of future work in this area.

Handbook of Video Databases: Design and Applications (Internet and Communications)
ISBN: 084937006X
Year: 2003
Pages: 393