Video Scout consists of an archiving module and a retrieval module (see Figure 45.1). Scout takes input from the encoded video, the electronic program guide (EPG) metadata, and the user profile in order to derive high-level information. This process involves the following:
Pre-selects programs to record based on matches between the EPG metadata and the user profile.
De-multiplexes and decodes the TV program.
Unimodal Analysis Engine: Analyzes the individual audio, visual, and transcript streams.
Multimodal Bayesian Engine: Classifies segments based on integration of the visual, audio, and transcript streams.
Personalizer: Matches requests in user profile with indexed content.
User interface: Allows users to access video segments organized by either TV program or individual segment topic.
User profile: Collects data from the user interface and passes it to the archiving module.
Figure 45.1: System Diagram.
An overview of the system architecture used for archiving is given in Figure 45.2. The system consists of a host PC with a TriMedia board (a specialized media processor). Note that archiving could be performed during program recording either at the set-top box or at the service provider. In the latter case, content descriptions can be encoded in XML, either in a proprietary format or using MPEG-7, for later use in retrieval. As shown in Figure 45.2, the program for analysis is selected based on the EPG data and the user's interests. Selected fields of the EPG data are matched to the user profile. When a match occurs, the relevant program information is written to a program text file. The MPEG-2 video for the selected program is then retrieved and decoded in software. During the decoding process, the individual visual, audio, and transcript data are separated out.
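The pre-selection step described above can be sketched in Python. This is a minimal illustration only, not the system's actual code: the `EpgEntry` and `UserProfile` field names, and the choice of which EPG fields to compare, are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class EpgEntry:
    # Hypothetical subset of EPG fields; a real guide carries many more.
    title: str
    genre: str
    description: str = ""


@dataclass
class UserProfile:
    # Hypothetical profile layout: lowercase terms the user cares about.
    titles: set = field(default_factory=set)
    genres: set = field(default_factory=set)
    keywords: set = field(default_factory=set)


def matches(entry, profile):
    """Return True if any of the selected EPG fields matches the profile."""
    if entry.title.lower() in profile.titles:
        return True
    if entry.genre.lower() in profile.genres:
        return True
    words = set(entry.description.lower().split())
    return bool(words & profile.keywords)


def preselect(guide, profile):
    """Keep only the programs worth recording and analyzing."""
    return [entry for entry in guide if matches(entry, profile)]


guide = [
    EpgEntry("Evening News", "News", "markets and weather update"),
    EpgEntry("Cooking Hour", "Lifestyle", "pasta recipes"),
    EpgEntry("Late Movie", "Drama", "a detective story"),
]
profile = UserProfile(genres={"news"}, keywords={"detective"})
selected = preselect(guide, profile)
print([entry.title for entry in selected])
```

In this sketch a single matching field is enough to select a program; a stricter policy (for example, requiring both a genre and a keyword match) would only change `matches`.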
Figure 45.2: Block diagram of the system architecture displaying the PC and TriMedia sides. Items within the dashed line comprise the Multimodal Integration Framework.
The Unimodal Analysis Engine (UAE) reads the separated visual and audio streams from the MPEG-2 video. In addition, it creates the transcript stream by extracting the closed caption data. From the visual stream, the UAE generates probabilities for cuts and performs videotext and face detection. From the audio stream, the UAE segments the audio signal and generates probabilities for seven audio categories: silence, speech, music, noise, speech with background music, speech with speech, and speech with noise. From the transcript stream, the UAE looks for the single, double, and triple arrow markers (">", ">>", ">>>") in the closed-caption text and generates probabilities indicating the category of a segment (e.g., "economy", "weather"). In addition, the UAE produces a descriptive summary for each program. These probabilities and the summary are then stored for each program. The Multimodal Bayesian Engine combines the input from the Unimodal Analysis Engine and delivers high-level inferences to the retrieval module.
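As an illustration of the transcript analysis, the following Python sketch splits caption text at arrow markers and assigns category probabilities. It rests on several assumptions: that ">>>" marks a new story and ">>" a new speaker (a common closed-captioning convention, not something the text above specifies), and that categories are scored by simple keyword counts. The keyword lists are hypothetical, and the UAE's actual classifier is not described here.

```python
import re

# Assumed marker convention: ">>>" starts a new story, ">>" a new speaker.
STORY_MARK = ">>>"
SPEAKER_MARK = ">>"

# Hypothetical keyword lists, for illustration only.
CATEGORY_KEYWORDS = {
    "economy": {"stocks", "market", "inflation"},
    "weather": {"rain", "sunny", "forecast"},
}


def split_stories(captions):
    """Split closed-caption text into stories at each '>>>' marker."""
    parts = captions.split(STORY_MARK)
    return [part.strip() for part in parts if part.strip()]


def split_speakers(story):
    """Split one story (no '>>>' inside) into speaker turns at '>>'."""
    parts = story.split(SPEAKER_MARK)
    return [part.strip() for part in parts if part.strip()]


def category_probabilities(text):
    """Normalize keyword hit counts into a probability per category."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = {
        category: sum(word in keywords for word in words)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    total = sum(counts.values())
    if total == 0:
        # No evidence at all: fall back to a uniform distribution.
        return {category: 1.0 / len(counts) for category in counts}
    return {category: n / total for category, n in counts.items()}


captions = (
    ">>> Stocks fell as the market reacted to inflation. "
    ">> Analysts expect a rebound. "
    ">>> Rain in the forecast for tomorrow."
)
stories = split_stories(captions)
for story in stories:
    print(len(split_speakers(story)), category_probabilities(story))
```

Per-story probabilities of this kind are the sort of unimodal output that a later fusion stage could combine with the audio and visual evidence.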
An overview of the retrieval module is shown in Figure 45.3. The retrieval module contains the Personalizer, which performs the personalized segment selection, and the User Interface, which lets the user interact with the system. The Personalizer looks for matches between the indexed segments and the user profile. When a match is found, the segment is flagged as "not watched". The user interface allows users to enter their profiles. These profiles use a magnet metaphor, attracting the content that users request. In addition, the user interface allows users to access and play back whole programs or video segments organized by topic.
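The Personalizer's matching and flagging behavior might look roughly like the Python sketch below. The `Segment` fields, the "unmatched" default status, and the exact-topic comparison are assumptions chosen for illustration; only the "not watched" flag comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical index record for one classified video segment.
    program: str
    topic: str
    start_s: float
    end_s: float
    status: str = "unmatched"  # assumed default before personalization


def personalize(segments, requested_topics):
    """Flag segments whose topic is requested in the profile."""
    wanted = {topic.lower() for topic in requested_topics}
    for segment in segments:
        if segment.topic.lower() in wanted:
            segment.status = "not watched"
    return [s for s in segments if s.status == "not watched"]


def group_by_topic(segments):
    """Organize segments by topic, as the user interface presents them."""
    grouped = {}
    for segment in segments:
        grouped.setdefault(segment.topic, []).append(segment)
    return grouped


index = [
    Segment("Evening News", "economy", 0.0, 95.0),
    Segment("Evening News", "weather", 95.0, 140.0),
    Segment("Morning Show", "economy", 30.0, 80.0),
]
flagged = personalize(index, ["Economy"])
by_topic = group_by_topic(flagged)
print({topic: len(items) for topic, items in by_topic.items()})
```

Grouping by `program` instead of `topic` would give the other view mentioned above, in which segments are organized by TV program.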
Figure 45.3: Overview of Video Scout retrieval application.
The user profile contains implicit and explicit user requests for content. These requests can be for whole TV programs as well as for topic-based video segments. Implicit requests are based on inferences from viewing history. The profile contains program titles, genres, actors, descriptions, etc. In addition, the profile contains specific topics that users are interested in. For example, a profile might contain a request for financial news about specific companies. In this case, Scout would store individual news stories and not whole programs. The profile performs the function of a database query in two ways. First, the Analysis Engine uses the profile to determine which programs to record. Second, the Personalizer uses the profile to select specific segments from the entire indexed and recorded content.
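The text does not say how implicit requests are inferred from viewing history. As one possible sketch, the Python example below treats any genre watched at least a threshold number of times as an implicit request. The `min_views` threshold, the dictionary layout, and the function names are all assumptions made for the example.

```python
from collections import Counter


def infer_implicit_genres(viewing_history, min_views=3):
    """Treat genres watched at least min_views times as implicit requests.

    The threshold rule is an assumption, not the system's actual method.
    """
    counts = Counter(genre.lower() for genre in viewing_history)
    return {genre for genre, n in counts.items() if n >= min_views}


def build_profile(explicit_topics, viewing_history):
    """Combine explicit topic requests with inferred genre preferences."""
    return {
        "explicit_topics": {topic.lower() for topic in explicit_topics},
        "implicit_genres": infer_implicit_genres(viewing_history),
    }


history = ["News", "News", "Sports", "News", "Drama", "Sports"]
profile = build_profile(["Financial news"], history)
print(profile)
```

A profile built this way can serve both query roles described above: the genre set guides which programs to record, and the topic set guides which indexed segments to select.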