Digital video is now becoming prevalent and will likely become the major source of information and entertainment in the next few decades. With decreasing production and storage costs and faster networking, it will soon be possible to access massive video repositories. It is, however, not clear that such repositories will be useful in the absence of powerful search paradigms and intuitive user interfaces. So far, most research on video analysis in the context of large databases has focused on low-level processes (such as colour or texture characterization) that are not likely to solve the problem.
In this chapter, we have presented an alternative view, which is directly targeted at recovering video semantics. We have argued for a principled formulation based on Bayesian inference, and showed that it has great potential for both low-level tasks, such as shot segmentation, and high-level semantic characterization. In particular, it was shown that principled models of video structure can make a significant difference at all levels. These ideas were embodied in the BMoViES system to illustrate the fact that, once a semantic representation is in place, problems such as retrieval, summarization, visualization, and browsing become significantly simpler to address.