In the remainder of the Handbook, some of the world's leading experts in this field examine the state of the art, ongoing research, and open issues in designing video database systems.
Section II presents concepts and techniques for video modeling and representation. In Chapter 2, Garg, Naphade, and Huang discuss the need for a semantic index associated with a video program and the difficulty of bridging the gap between low-level media features and high-level semantics. They view semantic video indexing as a multimedia understanding problem, in which the co-occurrence of semantic concepts in a video scene always has a context, and they propose a novel learning architecture and algorithm, describing its application to the specific problem of detecting complex audiovisual events. In Chapter 3, Vasconcelos reviews ongoing efforts to develop statistical models that characterize semantically relevant aspects of video and presents a system that relies on such models to achieve semantic characterization. In Chapter 4, Eleftheriadis and Hong introduce Flavor (Formal Language for Audio-Visual Object Representation), an object-oriented language for bitstream-based media representation. The very active field of summarization and understanding of sports videos is the central topic of Chapter 5, where Assfalg, Bertini, Colombo, and Del Bimbo report their work on automatic semantic annotation of generic sports videos, and of Chapter 6, where Ekin and Tekalp propose a generic integrated semantic-syntactic event model to describe sports video, particularly soccer video, for search applications.
Section III describes techniques and algorithms used for video segmentation and summarization. In Chapter 7, Ardizzone and La Cascia review existing shot boundary detection techniques and propose a new neural network-based segmentation technique that does not require explicit threshold values to detect either abrupt or gradual transitions. In Chapter 8, Chua, Chandrashekhara, and Feng present a temporal multi-resolution analysis (TMRA) framework for video shot segmentation. In Chapter 9, Smith, Wactlar, and Christel describe the creation of video summaries and visualization systems through multimodal feature analysis, combining multiple forms of image, audio, and language information, and show results of evaluations and user studies conducted within the Informedia Project at Carnegie Mellon University. In Chapter 10, Gong presents three video content summarization systems developed by multidisciplinary researchers at NEC USA C&C Research Laboratories. These systems produce three kinds of motion video summaries: (1) audio-centric summaries, (2) image-centric summaries, and (3) audio-visual content summaries. In Chapter 11, Mulhem, Gensel, and Martin present their work on the VISU model, which makes it possible both to annotate videos with high-level semantic descriptions and to query these descriptions to generate video summaries. This discussion is followed by a broad overview of adaptive video segmentation and summarization approaches, presented by Owen and Dixon in Chapter 12. In Chapter 13, Owen, Zhou, Tang, and Xiao describe augmented imagery and its applications as a powerful tool for enhancing or repurposing content in video databases, including a number of interesting case studies. In Chapter 14, Ahmed and Karmouch present a new algorithm for video indexing, segmentation, and key framing, called the binary penetration algorithm, and show its extension to a video Web service that supports multiple video formats.
Concluding the Section, in Chapter 15, Zhao and Grosky revisit the semantic gap problem and introduce a novel technique for spatial color indexing, color anglogram, and its use in conjunction with a dimension reduction technique, latent semantic indexing (LSI), to uncover the semantic correlation between video frames.
Section IV examines tools and techniques for designing and interacting with video databases. In Chapter 16, Hjelsvold and Vdaygiri present the results of their work developing two interactive video database applications: HotStreams™, a system for delivering and managing personalized video content, and TEMA (Telephony Enabled Multimedia Applications), a platform for developing Internet-based multimedia applications for Next Generation Networks (NGN). In Chapter 17, Huang, Chokkareddy, and Prabhakaran introduce the topic of animation databases and present a toolkit for animation creation and editing. In Chapter 18, Djeraba, Hafri, and Bachimont explore different video exploration strategies adapted to user requirements and profiles, and introduce the notion of probabilistic prediction and path analysis using Markov models. Concluding the Section, in Chapter 19, Picariello, Sapino, and Subrahmanian introduce AVE! (Algebraic Video Environment), the first algebra for querying video.
The challenges behind audio and video indexing and retrieval are discussed in Section V. The Section starts with a survey of the state of the art in audio content indexing and retrieval by Liu and Wang (Chapter 20). In Chapter 21, Ortega-Binderberger and Mehrotra discuss the important concept of relevance feedback and some of the techniques that have been successfully applied to multimedia search and retrieval. In Chapter 22, Santini proposes a novel approach to structuring and organizing video data using experience units and discusses some of its philosophical implications. In Chapter 23, Farin, Haenselmann, Kopf, Kühne, and Effelsberg describe their work on a system for video object classification. In Chapter 24, Li, Tang, Ip, and Chan advocate a web-based hybrid approach to video retrieval that integrates the query-based (database) approach with the content-based retrieval paradigm, and discuss the main issues involved in developing such a web-based video database management system supporting hybrid retrieval, using their VideoMAP* project as an example. In Chapter 25, Zhang and Chen examine the semantic gap problem in depth. The emergence of MPEG-7 and its impact on the design of video databases is the central topic of Chapter 26, where Smith discusses the topic in great technical detail and provides examples of MPEG-7-compatible descriptions. In Chapter 27, Satoh and Katayama discuss issues and approaches for indexing large-scale (terabyte- to petabyte-order) video archives, and report their work on Name-It, a system that automatically associates faces and names in news videos by integrating image understanding, natural language processing, and artificial intelligence technologies. The next two chapters cover the important problem of similarity measures in video database systems.
In Chapter 28, Cheung and Zakhor discuss the problem of video similarity measurement and propose a randomized first-order video summarization technique called the Video Signature (ViSig) method, whereas in Chapter 29, Traina and Traina Jr. discuss techniques for searching multimedia data types by similarity in databases storing large sets of multimedia data and present a flexible architecture for building content-based image retrieval into relational databases. At the end of the Section, in Chapter 30, Zhou and Huang review existing relevance feedback techniques and present a variant of discriminant analysis that is suited to small-sample learning problems.
In Section VI we focus on video communications, particularly streaming, the technological challenges behind the transmission of video across communication networks, and the role played by emerging video compression algorithms. In Chapter 31, Hua and Tantaoui present several cost-effective techniques to achieve scalable video streaming, particularly for video-on-demand (VoD) systems. In Chapter 32, Shahabi and Zimmermann report their work on designing, implementing, and evaluating Yima, a scalable real-time streaming architecture. In Chapter 33, Zhang, Aygün, and Song present the design strategies of NetMedia, a middleware for client-server distributed multimedia applications that provides services to support synchronized presentations of multimedia data to higher-level applications. In Chapter 34, Apostolopoulos, Tan, and Wee examine the challenges that make simultaneous delivery and playback of video difficult, and explore algorithms and systems that enable streaming of pre-encoded or live video over packet networks such as the Internet. They provide a comprehensive tutorial and overview of video streaming and communication applications, challenges, problems, possible solutions, and protocols. In Chapter 35, Ghandeharizadeh and Kim discuss the continuous display of video objects using heterogeneous disk subsystems and quantify the tradeoffs associated with alternative multi-zone techniques when extended to a configuration consisting of heterogeneous disk drives. In Chapter 36, Vetro and Kalva discuss the technologies, standards, and challenges that define and drive universal multimedia access (UMA). In Chapter 37, Basu, Little, Ke, and Krishnan look at the dynamic stream clustering problem and present the results of simulations of heuristic and approximate algorithms for clustering in interactive VoD systems.
In Chapter 38, Lienhart, Kozintsev, Chen, Holliman, Yeung, Zaccarin, and Puri offer an overview of the key questions in distributed video management, storage and retrieval, and delivery and analyze the technical challenges, some current solutions, and future directions. Concluding the Section, Torres and Delp provide a summary of the state of the art and trends in the fields of video coding and compression in Chapter 39.
Section VII provides the reader with the necessary background to understand video processing techniques and how they relate to the design and implementation of video databases. In Chapter 40, Wee, Shen, and Apostolopoulos present several compressed-domain image and video processing algorithms designed to achieve high performance with computational efficiency, with emphasis on transcoding algorithms for bitstreams produced by compression methods based on the block discrete cosine transform (DCT) and motion-compensated prediction, such as the predominant image and video coding standards in use today. In Chapter 41, Wang, Sheikh, and Bovik discuss the very important, yet largely unexplored, topic of image and video quality assessment. Concluding the Section, in Chapter 42, Doërr and Dugelay discuss the challenges of extending digital watermarking, the art of hiding information in a robust and invisible manner, to video data.
In addition to the several projects, prototypes, and commercial products mentioned throughout the Handbook, Section VIII presents detailed accounts of three projects in this field: an electronic clipping service under development at AT&T Labs, described by Gibbon, Begeja, Liu, Renger, and Shahraray in Chapter 43; a multi-modal two-level classification framework for story segmentation in news videos, presented by Chaisorn, Chua, and Lee in Chapter 44; and the Video Scout system, developed at Philips Research Labs and presented by Dimitrova, Jasinschi, Agnihotri, Zimmerman, McGee, and Li in Chapter 45.
Finally, Section IX assembles answers from some of the best-known researchers in the field to questions about the state of the art and future research directions in this dynamic and exciting area.