In this chapter, we have presented the results of our work on narrowing the gap between low-level features and high-level concepts in the domain of video shot detection. We introduced a novel spatial color indexing technique, the color anglogram, which is invariant to rotation, scaling, and translation. We also examined a dimension reduction technique, latent semantic indexing (LSI), which has been used for textual information retrieval for many years. In textual retrieval, LSI is used to determine clusters of co-occurring keywords, sometimes called concepts, so that a query containing a particular keyword can also retrieve documents that do not contain this keyword but do contain other keywords from the same cluster. In this chapter, we applied this technique to uncover the semantic correlation between video frames.
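The keyword-cluster behavior of LSI described above can be illustrated with a minimal sketch. It assumes a toy term-document matrix and a rank-2 truncated SVD computed with NumPy; the vocabulary, the documents, and the helper names (`lsi_similarities`, `cosine`) are hypothetical and chosen only for illustration. They are not the data or the implementation used in our experiments.

```python
import numpy as np

# Hypothetical toy vocabulary and term-document matrix (rows: terms, columns: documents).
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # flower
    [0, 0, 1, 0],   # petal
], dtype=float)

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def lsi_similarities(A, query, k):
    """Compare a query with every document in a rank-k latent space."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]                 # basis of the k strongest "concepts"
    q_latent = U_k.T @ query       # query projected into concept space
    docs_latent = U_k.T @ A        # documents projected into concept space
    return [cosine(q_latent, docs_latent[:, j]) for j in range(A.shape[1])]

# Query containing only the keyword "car".
query = np.zeros(len(terms))
query[terms.index("car")] = 1.0

raw = [cosine(query, A[:, j]) for j in range(A.shape[1])]   # plain keyword matching
latent = lsi_similarities(A, query, k=2)                    # matching after LSI
```

In this toy case, document 1 contains "automobile" and "engine" but not "car", so its raw similarity to the query is zero. In the latent space it becomes highly similar to the query, because "car" and "automobile" both co-occur with "engine", while the flower documents remain unrelated.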
First, the experimental results show that latent semantic indexing is able to correlate semantically similar visual features, whether color or spatial color, to construct higher-level concept clusters. Using LSI to discover the underlying semantic structure of video contents is a promising approach to enabling content-based video analysis and retrieval systems to interpret video contents at a more meaningful level. Because LSI narrows the semantic gap, the retrieval process can better reflect human perception. Second, the results indicate that the color anglogram, our spatial color indexing technique, is more accurate than the color histogram in capturing and emphasizing meaningful features in the video contents. Its invariance to rotation, scaling, and translation also provides better tolerance to object and camera movements, and thus helps improve performance when more complex shot transitions, especially gradual transitions, are involved. Finally, by comparing the experimental results, we confirmed that the integration of color anglogram and LSI provides a fairly reliable and effective shot detection technique. Considering that these results are consistent with those obtained in our previous studies in other application areas [38, 39, 40], we believe that combining the strengths of the two techniques in bridging the semantic gap can help bring content-based image/video analysis and retrieval to a new level.
To further improve the performance of our video shot detection techniques, a more in-depth study of threshold selection is necessary. Although it is unlikely that manual threshold selection by the user can be eliminated entirely, minimizing these interactions is crucial to improving the effectiveness and efficiency of analyzing very large video databases. It would also be interesting to explore the similarity among multiple frames to tackle the problems posed by complex gradual transitions.
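As one illustration of how manual threshold selection might be reduced, the following sketch derives a local threshold from the frame-to-frame distances themselves. It uses a common sliding-window heuristic (mean plus a multiple of the standard deviation of neighboring distances), which is an assumption made here for illustration and not the thresholding procedure used in our experiments. The function name `detect_cuts`, the parameters `window` and `alpha`, and the sample distances are all hypothetical.

```python
from statistics import mean, pstdev

def detect_cuts(distances, window=3, alpha=3.0):
    """Flag index i as a shot boundary when distances[i] exceeds a local threshold.

    The threshold is mean + alpha * std of the distances within `window`
    positions on either side of i, excluding distances[i] itself.
    """
    cuts = []
    for i, d in enumerate(distances):
        lo = max(0, i - window)
        hi = min(len(distances), i + window + 1)
        neighbors = distances[lo:i] + distances[i + 1:hi]
        if len(neighbors) < 2:
            continue  # too little context to estimate a local threshold
        threshold = mean(neighbors) + alpha * pstdev(neighbors)
        if d > threshold:
            cuts.append(i)
    return cuts

# Hypothetical distances between consecutive frames (for example, in the reduced LSI space).
distances = [0.02, 0.03, 0.02, 0.04, 0.90, 0.03,
             0.02, 0.03, 0.02, 0.85, 0.03, 0.02]
cuts = detect_cuts(distances)   # indices 4 and 9 are flagged
```

Such a scheme does not remove user input entirely, since `window` and `alpha` must still be chosen, but these two parameters may transfer across videos more readily than a single fixed distance threshold.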
To extend our study of video analysis and retrieval, we propose to use the anglogram technique to represent shape features and then to integrate these features into the framework of our shot detection techniques. One of the strengths of latent semantic indexing is that different features can easily be integrated into a single feature vector and treated as similar components. Hence, ostensibly, we can expand the feature vector by adding more features without difficulty. We also plan to apply various clustering techniques, along with our shot detection methods, to develop a hierarchical classification scheme.
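The feature integration described above can be sketched as follows, under stated assumptions. Each feature type is represented as a matrix with one column per frame; the blocks are stacked into a single feature-by-frame matrix and reduced with a truncated SVD. The per-block column normalization is an assumption added here so that no single feature type dominates, and the random matrices are placeholders for real color and shape anglogram features. The names `combine_and_reduce` and `l2_normalize_columns` are hypothetical.

```python
import numpy as np

def l2_normalize_columns(M):
    """Scale each column (one frame) to unit length; all-zero columns are left unchanged."""
    norms = np.linalg.norm(M, axis=0, keepdims=True)
    return M / np.where(norms > 0, norms, 1.0)

def combine_and_reduce(feature_blocks, k):
    """Stack feature blocks (each n_features_i x n_frames) and keep k latent dimensions.

    Returns an (n_frames x k) matrix of frame coordinates in the latent space.
    """
    stacked = np.vstack([l2_normalize_columns(B) for B in feature_blocks])
    U, s, Vt = np.linalg.svd(stacked, full_matrices=False)
    return (s[:k, None] * Vt[:k, :]).T

# Placeholder data: 6 frames, 12 color features and 8 shape features per frame.
rng = np.random.default_rng(0)
color_features = rng.random((12, 6))
shape_features = rng.random((8, 6))

reduced = combine_and_reduce([color_features, shape_features], k=3)
```

Adding a further feature type in this sketch only requires appending another block to the list, which reflects the ease of extension noted above.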