8. Conclusions

The arrival of personal TV products in the consumer market has created an opportunity to experiment with content-based retrieval research and deploy it in advanced products. In this paper we have described a multimodal content-based approach to personal TV recorders. Our system employs a three-layer multimodal integration framework to analyze, characterize, and retrieve whole programs and video segments based on user profiles. This framework has been applied to both narrative and non-narrative programs, although the current implementation focuses on non-narrative content. We believe that our research gives consumers a new level of access to television programs: video content is personalized at the sub-program level using multimodal integration.

Personalized content-based access to video can be applied in both of the television-of-the-future scenarios mentioned in the introduction: VOD and HDR. In the VOD scenario, the archiving can be performed at the service provider end and the descriptions sent down to the set-top box. In the HDR scenario, the archiving can also be performed in the set-top box, as we have shown with our current research and implementation.

Video Scout is currently implemented on a PC and a Philips TriCodec board. The PC has a 600 MHz Pentium III processor and 256 MB of RAM and runs Windows NT. The TriCodec board has a 100 MHz TriMedia TM1000 processor and 4 MB of RAM. On this low-end platform, the visual, audio, and transcript analyses extract their features in less time than the duration of the TV program, while the Bayesian engine takes less than one minute per hour of TV program to produce the segmentation and classification information. The retrieval application runs on the PC. However, it is not computationally intensive and can therefore migrate onto the TriMedia. Users can access Scout using a Philips Pronto programmable remote control.
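The timing figures above can be read as upper bounds on a real-time factor, that is, processing time divided by program duration. The following is a minimal sketch of that arithmetic only; the function name `realtime_factor` and the variable names are hypothetical and are not part of Video Scout.

```python
def realtime_factor(processing_seconds: float, program_seconds: float) -> float:
    """Return processing time divided by program duration.

    A value below 1.0 means the stage runs faster than real time.
    """
    if program_seconds <= 0:
        raise ValueError("program_seconds must be positive")
    return processing_seconds / program_seconds


ONE_HOUR = 3600.0  # seconds in a one-hour TV program

# Stated bound: the Bayesian engine needs less than one minute per hour of program.
bayes_upper_bound = realtime_factor(60.0, ONE_HOUR)

# Stated bound: feature extraction takes less than the program length,
# so its real-time factor is below 1.0 (the exact value is not reported).
feature_upper_bound = realtime_factor(ONE_HOUR, ONE_HOUR)

print(f"Bayesian engine real-time factor is below {bayes_upper_bound:.4f}")
print(f"Feature extraction real-time factor is below {feature_upper_bound:.1f}")
```

Under these bounds, the Bayesian engine uses less than 1/60 of the program duration, so feature extraction dominates the processing cost.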

In the future, we plan to explore (i) the use of different retrieval features, (ii) multimodal integration for narrative content, and (iii) delivery of a full-fledged system capable of responding to users' personalization needs. We also plan to fully integrate our face learning and person identification methods into the current system.