As mentioned earlier, we have been developing an extended version of VideoMAP, a "stand-alone" video retrieval system (VRS) that supports hybrid retrieval of videos through both query-based and content-based access . Its architecture is depicted in Fig. 24.1 and is divided into several components: video acquisition, video analysis, video indexing, and video retrieval.
The main processing flows for supporting hybrid video retrieval in a stand-alone VRS are as follows:
Video acquisition: Multimedia data are distributed extensively over global networks, and videos, as an important type of multimedia, can be obtained from a variety of sources.
Video analysis: The acquired video data is initially only a sequence of frames, without structure or annotation, so a series of content-analysis processes must be carried out to support video indexing and retrieval. There are four modules:
Segmentation and Clustering Component (SCC): Detect camera movements in the video, cluster semantically related segments into scenes, and determine which module is needed for structuring video objects.
Feature Extracting Component (FEC): Extract visual feature vectors, such as color, texture, and shape, along with other object features from the video.
Semantic Definition and Extraction Component (SDEC): Define key-frame semantics, then extract and associate/map semantic meanings to low-level image properties.
Semantic Modeling Component (SMC): Organize video segments into semantically meaningful hierarchies with annotated feature indices.
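The four modules above form a pipeline from raw frames to an annotated hierarchy. The following is only an illustrative sketch of that flow, not the system's actual code: frames are abstracted as scalar intensities, and all function names, the shot-boundary threshold, and the semantic labels are hypothetical.

```python
# Hypothetical sketch of the analysis pipeline (SCC -> FEC -> SDEC -> SMC).
# Frames are abstracted as scalar intensities for brevity.

def scc_segment(frames, threshold=0.5):
    """SCC: split the frame sequence into segments at large inter-frame changes."""
    segments, current = [], []
    current.append(frames[0])
    for prev, cur in zip(frames, frames[1:]):
        if abs(cur - prev) > threshold:   # crude shot-boundary test
            segments.append(current)
            current = []
        current.append(cur)
    segments.append(current)
    return segments

def fec_features(segment):
    """FEC: extract a (toy) visual feature vector for one segment."""
    return {"mean": sum(segment) / len(segment),
            "span": max(segment) - min(segment)}

def sdec_annotate(features):
    """SDEC: map low-level features to a coarse semantic label."""
    return "dynamic" if features["span"] > 0.3 else "static"

def smc_organize(frames):
    """SMC: organize segments into a semantically annotated structure."""
    hierarchy = []
    for seg in scc_segment(frames):
        feats = fec_features(seg)
        hierarchy.append({"frames": seg,
                          "features": feats,
                          "semantics": sdec_annotate(feats)})
    return hierarchy

video = [0.1, 0.15, 0.9, 0.95, 1.0, 0.2]
structure = smc_organize(video)   # three annotated segments
```

Each entry in `structure` couples a segment with its feature vector and semantic label, which is the input the indexing step below works on.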
Video index: The process of video analysis typically yields a hierarchical video structure. Indexes are generated for the scene hierarchy, the visual features, and the video segment hierarchies. These indexes are not entirely independent of one another; for example, at the "Segment" and "Keyframe" levels of the video hierarchy, indexes for visual and semantic features are also maintained.
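The interleaving of structural and feature indexes can be pictured with a nested structure like the one below; the field names and values are hypothetical, chosen only to show visual and semantic indexes attached at the Segment and Keyframe levels.

```python
# Hypothetical illustration of the video hierarchy with feature indexes
# attached at the Segment and Keyframe levels.
video_index = {
    "video_id": "v001",
    "scenes": [{
        "scene_id": "s1",
        "segments": [{
            "segment_id": "g1",
            "visual_index": {"color_hist": [0.2, 0.5, 0.3]},  # visual features
            "semantic_index": ["outdoor"],                    # semantic labels
            "keyframes": [{
                "frame_no": 42,
                "visual_index": {"texture": 0.7},
                "semantic_index": ["building"],
            }],
        }],
    }],
}

def lookup_keyframes(index, label):
    """Walk video -> scene -> segment -> keyframe, collecting keyframes
    whose semantic index carries the given label."""
    hits = []
    for scene in index["scenes"]:
        for segment in scene["segments"]:
            for kf in segment["keyframes"]:
                if label in kf["semantic_index"]:
                    hits.append(kf["frame_no"])
    return hits
```

A lookup such as `lookup_keyframes(video_index, "building")` shows why the indexes are not independent: answering a semantic query requires traversing the structural hierarchy.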
Video retrieval: Video retrieval supports two retrieval formats: CAROL/ST retrieval, the original format, which mainly uses the semantic annotations and spatio-temporal relationships of the video; and content-based retrieval, the prevailing format, which mainly uses the visual information inherent in the video content.
Our "stand-alone" VRS extends a conventional OODB to define video objects through a specific hierarchy (video➜scene➜segment➜keyframe). In addition, it includes a CBR mechanism to build indexes on the visual features of these objects. Their class attributes, methods, and corresponding relations form a complex network (or a "map"). More details are described in .
Figure 24.1: A "Stand-alone" VRS architecture.
In the hybrid approach, the query-processing component supports three kinds of queries: text-based search, content-based search, and hybrid search. The search paths of these three kinds of queries are described in . In our approach, objects, attributes, and methods are stored in separate classes, so the query processor can search the object database through different entry points. After integration with CBR, these search paths become applicable to our hybrid approach. Here we show some query examples (cf. Figure 24.5 and Figure 24.6). Detailed query syntax and additional features for query processing are described in .
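The three entry points can be sketched as three search functions over one object database: a text search over semantic annotations (in the spirit of CAROL/ST), a content search over feature vectors (in the spirit of CBR), and a hybrid search that intersects their results. This is a minimal assumed sketch; the object layout, the distance measure, and the `max_dist` threshold are all illustrative inventions, not the system's actual query syntax.

```python
# Assumed sketch of the three query entry points over one object database.

def text_search(objects, keyword):
    """Entry point 1: text-based search over semantic annotations."""
    return {o["id"] for o in objects if keyword in o["annotations"]}

def content_search(objects, query_vec, max_dist=0.2):
    """Entry point 2: content-based search over visual feature vectors."""
    def dist(a, b):
        # Chebyshev distance as a stand-in similarity measure.
        return max(abs(x - y) for x, y in zip(a, b))
    return {o["id"] for o in objects if dist(o["features"], query_vec) <= max_dist}

def hybrid_search(objects, keyword, query_vec):
    """Entry point 3: hybrid search combining both conditions."""
    return text_search(objects, keyword) & content_search(objects, query_vec)

db = [
    {"id": "kf1", "annotations": ["sunset", "beach"], "features": [0.9, 0.4]},
    {"id": "kf2", "annotations": ["beach"],           "features": [0.1, 0.2]},
]
```

For example, `hybrid_search(db, "beach", [0.85, 0.45])` keeps only objects satisfying both the annotation and the feature condition, narrowing the text-based result set with the content-based one.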