Over the past few years, there has been a spectacular increase in the number of applications that need to organize, store and retrieve video data. Such applications largely fall into three categories.
Annotated video databases: News organizations such as CNN and the BBC have vast archives of video that they need to store, query and retrieve. Much of this data is stored using textual annotations that are often painstakingly hand-created. In other cases, such annotations are derived from free text that accompanies the video (e.g., transcripts of what is said in the news program). This body of textual annotations is used to store the video and to retrieve part or all of it.
Image processed video databases: There are numerous applications where the volume of video data being produced is very high, and where the time frame within which such video must be queryable is very small. Such applications include surveillance video (e.g., in banks, airports, etc.). In such cases, users must be able to take action rapidly - for instance, when a known terrorist is spotted at an airport, this fact must immediately be made available to the appropriate law enforcement authorities before it is too late to act. If the terrorist has previously been seen at various banks, the ability to correlate the airport surveillance video with the bank surveillance videos would greatly enhance the ability to track the financial aspects of such crimes.
Military applications also fall into this category. For instance, Predator video used for airborne surveillance in military operations represents information that must be acted upon expeditiously. The luxury of waiting until someone views the video and creates textual annotations is not an option in such cases, as decisions must be made in soft real time based on the data contained in such videos.
Hybrid video databases: As the name implies, databases of this kind contain both types of data. For example, a news organization with a video of a terrorist activity may want to correlate the terrorist strike with its existing video archives, which are indexed using textual annotations.
Clearly, the first two types of databases are special cases of hybrid video databases. Hence, in this paper, we focus on hybrid video databases, as they subsume both of the above possibilities.
In this paper, we describe the AVE! video database system that we have developed. AVE! stands for Algebraic Video Environment. Classical relational databases [22,23] take queries expressed in a declarative query language (SQL and the relational calculus are examples) and convert them into a relational algebra query. The relational algebra query is then optimized using rewrite rules that hold in the algebra - such rules make statements of the form "algebraic query expression 1 equals (or returns a subset of) algebraic query expression 2." Without the formal definition of a relational-algebra-style algebra for video databases, there is little hope of building effective video databases that scale to large numbers of users and large volumes of data.
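To make the notion of a rewrite rule concrete, a standard equivalence exploited by relational query optimizers (this is the classical selection-pushdown rule, not a rule specific to AVE!) is:

```latex
\sigma_{\theta}(R \bowtie S) \;=\; (\sigma_{\theta}\, R) \bowtie S,
\qquad \text{whenever the condition } \theta \text{ refers only to attributes of } R .
```

The right-hand side filters $R$ before the join, typically reducing the size of the intermediate result; an optimizer applies such equivalences to transform the initial algebraic query into a cheaper but equivalent one.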
This paper represents a first step towards this much broader goal. It describes an algebra that may be used to query video data when human-created annotations of the video are available, when video analysis and image processing programs are available, or both. As such, our algebra is general and applies to most kinds of video database applications. To our knowledge, it is the first algebra for querying video (the only prior algebra for video is not for querying video, but for composing videos). In addition, we have implemented a prototype system called AVE! that demonstrates our theoretical model.