In this section, we briefly describe our prototype work in terms of the implementation environment, the actual user interface and language facilities, and sample results.
Since the first VideoMAP prototype, our experimental prototyping has been based on Microsoft Visual C++ (as the host programming language) and the NeoAccess OODB toolkit (as the underlying database engine). Here we illustrate the main user facilities supported by VideoMAP, which is also the kernel of our web-based prototype, VideoMAP*. It currently runs on Windows and offers a graphical user interface supporting two main kinds of activities: video editing and video retrieval.
When a user invokes the video editing function, a screen comes up for uploading a new video into the system and annotating its semantics (cf. Figure 24.5). For example, the user can name the video object and assign some basic descriptions to get started.
Figure 24.5: Annotating the segments after video segmentation.
A Video Segmentation sub-module helps decompose the whole video stream into segments and identify the keyframes. Further, the Feature Extraction module calculates the visual features of the media object. By reviewing the video abstraction structure composed of the segments and keyframes, the user can annotate the semantics according to his/her understanding and preference (cf. Figure 24.5).
Our prototype also provides an interface for the user to issue queries using its query language (i.e., CAROL/ST with CBR). All kinds of video objects, such as "Scene," "Segment," and "Keyframe," can be retrieved by specifying their semantics or visual information. Figure 24.6 shows a sample query issued by the user, which is validated by a compiler sub-module before execution. The retrieved video objects are then returned to the user in the form of a tree (cf. Figure 24.6), whose nodes can not only be played but also be used subsequently to formulate new queries in an iterative manner.
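The abstraction structure described above can be pictured with a small data model. The following is only an illustrative sketch in Python, not the VideoMAP implementation (which is written in Visual C++ on top of NeoAccess); the class names, field names, and the `annotate` and `find_by_annotation` helpers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Keyframe:
    """A representative frame of a segment (hypothetical structure)."""
    frame_index: int
    annotation: str = ""


@dataclass
class Segment:
    """A contiguous run of frames produced by video segmentation."""
    start_frame: int
    end_frame: int
    keyframes: List[Keyframe] = field(default_factory=list)
    annotation: str = ""


@dataclass
class Video:
    """A video object with a name, a basic description, and its segments."""
    name: str
    description: str = ""
    segments: List[Segment] = field(default_factory=list)


def annotate(segment: Segment, text: str) -> None:
    """Attach a free-text semantic annotation to a segment."""
    segment.annotation = text


def find_by_annotation(video: Video, keyword: str) -> List[Segment]:
    """Return the segments whose annotation contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [s for s in video.segments if needle in s.annotation.lower()]
```

In this sketch the annotation is plain text attached to each segment; a keyword lookup over those annotations is the simplest form of the semantic retrieval that the query facilities build on.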
Figure 24.6: Query facilities.
As described earlier, our system supports three primary types of queries, namely: (1) query by semantic information, (2) query by visual information, and (3) query by both semantic and visual information (the "hybrid" type). For type (1), VideoMAP supports retrieving all kinds of video objects (Video, Scene, Segment, Keyframe, Feature) based on the semantic annotations input earlier by the editor/operator (cf. section 5.1.1). Figure 24.7 shows the interface for users to specify a semantic query and for displaying the resulting video.
Figure 24.7: Query by semantic information.
For type (2), users can specify queries that involve visual features and their similarity measurement. Visual similarity considers features such as color and texture. Users can specify the query by using either an individual feature or a combination of features [5]. Figure 24.8 illustrates the interaction sessions in which the user specifies a visual feature-based query (using the "similar-to" operator) and the resulting video is displayed. Finally, for type (3), users can issue queries that involve both semantic and visual features of the targeted video objects. These "heterogeneous" features call for different similarity measurement functions, and their integration is being devised in the context of the VideoMAP* prototype system.
Figure 24.8: Query by visual information.
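To make the idea of combining individual visual features concrete, the sketch below scores a candidate against a query as a weighted average of per-feature similarities. It is an illustration under stated assumptions, not the similarity function used by VideoMAP: the source does not specify the measures, so histogram intersection is assumed here for every feature, and the function names, feature names, weights, and threshold are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Features = Dict[str, Sequence[float]]


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [0, 1] for two histograms that each sum to 1 (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(x, y) for x, y in zip(a, b))


def combined_similarity(query: Features, candidate: Features,
                        weights: Dict[str, float]) -> float:
    """Weighted average of per-feature similarities over the weighted features."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = 0.0
    for name, weight in weights.items():
        score += weight * histogram_intersection(query[name], candidate[name])
    return score / total


def similar_to(query: Features, candidates: Dict[str, Features],
               weights: Dict[str, float], threshold: float) -> List[Tuple[str, float]]:
    """Return (name, score) pairs at or above the threshold, best match first."""
    scored = [(name, combined_similarity(query, feats, weights))
              for name, feats in candidates.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Giving a single feature a non-zero weight corresponds to querying by an individual feature, while several non-zero weights correspond to querying by a combination.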
The experimental prototype of our web-based video query system (VideoMAP*) is built on top of Microsoft Windows 2000. The Query Client Applet is developed using Borland's JBuilder 6 Personal, while the Query Server is implemented using Microsoft Visual C++ 6.0. The kernel of the query server is based on the original standalone VideoMAP system. The client ORB is developed using IONA's ORBacus 4.1 for Java, and the server ORB is developed using IONA's ORBacus 4.1 for C++. The client uses the Java Media Framework to display video query results, while the server still uses the same object-oriented database engine, NeoAccess, to store object data.
Figure 24.9 shows the GUI of the query system. There are two processes: the Query Client Applet and the Query Server. The client (left side) provides several functions, such as Login, DBList, DBCluster, Schema, Query Input, Query Result, and Video Display. Some important server messages and query debug messages are shown in the Server/Debug MessageBox. The server console (right side) shows a list of connection messages.
Figure 24.9: GUI of the Web-based video query system (VideoMAP*).
Figure 24.10 shows a query that extracts all scene objects created in a cluster of databases. The result returned from the query server is in an XML-like format, which contains the type of the result object, the name of the object, a brief description of the object, the video link, and the temporal information of the video. Owing to its expressiveness and flexibility, XML is very suitable for Web-based data presentation, in addition to being the standard for multimedia data description (e.g., MPEG-7). More specifically, video data can be separated into an XML data description and the raw videos. The data description can be easily distributed over the Web, while the server keeps its own raw video files. A user can therefore display the video result by connecting to the web server, which reduces the load on the query server when there are multiple user connections.
Figure 24.10: A client's query processed by the server (VideoMAP*).
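As an illustration of how a client might consume such a result, the sketch below parses a small description carrying the five pieces of information listed above. The element and attribute names, the sample URL, and the time values are all hypothetical, since the exact VideoMAP* result format is not given here; the sketch also assumes well-formed XML, whereas the actual format is only described as XML-like.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

# Hypothetical result document; the real VideoMAP* tag names are not specified.
SAMPLE_RESULT = """<result>
  <object type="Scene">
    <name>opening</name>
    <description>Opening scene</description>
    <video href="http://example.org/videos/demo.mpg"/>
    <time start="0" end="120"/>
  </object>
</result>"""


def parse_result(xml_text: str) -> List[Dict[str, Union[str, int, None]]]:
    """Extract type, name, description, video link, and temporal span per object."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        video = obj.find("video")
        time = obj.find("time")
        objects.append({
            "type": obj.get("type"),
            "name": obj.findtext("name"),
            "description": obj.findtext("description"),
            # Only the link travels with the description; the raw video stays on the server.
            "video_link": video.get("href") if video is not None else None,
            # The unit of the temporal values (frames or seconds) is an assumption left open.
            "start": int(time.get("start")) if time is not None else None,
            "end": int(time.get("end")) if time is not None else None,
        })
    return objects
```

Because the description holds only a link to the raw video, a client can fetch and play the video from the web server without involving the query server again.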