Rong Zhao
Department of Computer Science
State University of New York
Stony Brook, NY 11794-4400, USA
<rzhao@cs.sunysb.edu>
William I. Grosky
Department of Computer and Information Science
University of Michigan-Dearborn
Dearborn, MI 48128-1491, USA
<wgrosky@umich.edu>
The emergence of multimedia technology, coupled with the rapidly expanding image and video collections on the World Wide Web, has attracted significant research effort toward tools for effective retrieval and management of visual information. Video data is available and used in many application domains, such as security, digital libraries, distance learning, advertising, electronic publishing, broadcasting, interactive TV, and video-on-demand entertainment. As the old saying goes, "a picture is worth a thousand words." If each video document is considered a set of still images, and such a set may contain hundreds, thousands, or even more images, it is not hard to imagine how difficult it can be to find specific information in video documents. The sheer volume of video data now available presents researchers with a daunting challenge: How can we organize and use all these video documents both effectively and efficiently? How can we represent and locate meaningful information in video documents and extract knowledge from them? There is an urgent need for tools that help us index, annotate, browse, and search video documents. Video retrieval depends on the availability of a representation scheme for video content, and how such a scheme is defined depends mostly on the indexing mechanism applied to the data. Indexing video documents manually is clearly impractical because it is far too time-consuming.
However, the state of the art in computer science is not yet mature enough to provide a method that is both automatic and able to cope with these problems with intelligence comparable to that of human beings. As a result, existing video management techniques and the users of video collections are typically at cross-purposes. While these techniques normally retrieve video documents based on low-level visual features, users usually have a more abstract and conceptual notion of what they are looking for. The difficulty of making low-level features correspond to high-level abstractions is one aspect of the semantic gap [22] between content-based analysis and retrieval methods and concept-based users [9, 28, 38, 39, 40].
In this chapter, we attempt to bridge this semantic gap in content-based video retrieval, with a special focus on video shot detection. We introduce a novel technique for spatial color indexing, the color anglogram, which is invariant to rotation, scaling, and translation. We present the results of a study that seeks to transform low-level visual features into a higher level of meaning by applying this technique to video shot detection. To further our exploration of the semantic gap, this chapter also considers a second technique, latent semantic indexing (LSI), which has been used for textual information retrieval for many years. In textual retrieval, LSI is used to determine clusters of co-occurring keywords, sometimes called concepts, so that a query that uses a particular keyword can retrieve documents that may not contain this keyword but do contain other keywords from the same concept cluster. Here, we examine the use of this technique for video shot detection, hoping to uncover the semantic correlation between video frames. Experimental results show that LSI, together with the color anglogram, is able to extract the underlying semantic structure of video content, thus helping to improve shot detection performance significantly.
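As a rough illustration of the textual behavior of LSI described above, the following minimal sketch builds a toy term-document matrix, truncates its singular value decomposition to rank k, and compares a one-keyword query with every document in the reduced space. The vocabulary, the documents, the function name `lsi_query_similarities`, and the fold-in step q^T U_k S_k^{-1} are assumptions chosen for illustration (the fold-in is the form commonly used in the LSI literature). The sketch is not the formulation applied to video frames in this chapter, which is developed in Section 4.

```python
import numpy as np


def lsi_query_similarities(A, query, k):
    """Cosine similarity between a query and each document in a rank-k LSI space.

    A is a term-document matrix (rows = terms, columns = documents) and
    query is a term-count vector with one entry per row of A.
    """
    # Thin SVD: A = U @ diag(s) @ Vt, singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T

    # Fold the query into the k-dimensional concept space (assumed standard form).
    q_hat = (query @ U_k) / s_k

    # Each row of V_k holds the concept-space coordinates of one document.
    numerators = V_k @ q_hat
    denominators = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat)
    return numerators / denominators


# Hypothetical toy data: 5 terms (rows) by 5 documents (columns).
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0, 0],  # car
    [0, 1, 1, 0, 0],  # automobile
    [1, 1, 1, 0, 0],  # engine
    [0, 0, 0, 1, 1],  # flower
    [0, 0, 0, 1, 1],  # petal
], dtype=float)

# Query consisting of the single keyword "car".
query = np.zeros(len(terms))
query[terms.index("car")] = 1.0

sims = lsi_query_similarities(A, query, k=2)
for doc_index, sim in enumerate(sims):
    print(f"document {doc_index}: similarity {sim:.3f}")
```

In this toy collection, document 1 contains "automobile" and "engine" but not "car". Because "car" co-occurs with both of those terms in other documents, the rank-2 space places document 1 close to the query, while the two documents about flowers remain unrelated to it.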
The remainder of this chapter is organized as follows. Section 2 briefly reviews related work on visual feature indexing and its application to video shot detection. Section 3 describes the color anglogram technique. Section 4 introduces the theoretical background of latent semantic indexing. Section 5 presents a comparison and evaluation of the experimental results. Section 6 contains the conclusions, along with proposed future work.