Many features have been proposed for image and video retrieval. For images, commonly used features include color, shape, texture and color layout; a comprehensive review can be found in . Traditional video retrieval systems apply the same feature set to each frame, supplemented by some temporal analysis, e.g., key shot detection . Recently, many new approaches have been introduced to improve these features. Some of them are based on temporal or spatio-temporal analysis, i.e., better ways to group frames and to select key frames, including integration with other media such as audio and text. Motion-based and object-based features are another active research topic. Compared with color, shape and texture, motion-based and object-based features are more natural to human perception and therefore describe content at a higher level.
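As a concrete instance of the color features mentioned above, the following minimal sketch computes a quantized joint RGB histogram for an image and ranks a small database by histogram intersection, a standard similarity measure for normalized color histograms. Representing an image as a flat list of `(R, G, B)` tuples, the 4-levels-per-channel quantization, and the function names are illustrative assumptions, not details of any system cited in this section.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Normalized joint RGB histogram with bins_per_channel**3 bins."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins_per_channel-1.
        qr, qg, qb = (c * bins_per_channel // 256 for c in (r, g, b))
        hist[(qr * bins_per_channel + qg) * bins_per_channel + qb] += 1
    total = len(pixels)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] for normalized histograms; 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_images(query: Sequence[float], database: Dict[str, List[float]]) -> List[str]:
    """Return database keys ordered from most to least similar to the query."""
    return sorted(database,
                  key=lambda name: histogram_intersection(query, database[name]),
                  reverse=True)


if __name__ == "__main__":
    red = color_histogram([(255, 0, 0)] * 4)
    blue = color_histogram([(0, 0, 255)] * 4)
    mixed = color_histogram([(255, 0, 0)] * 2 + [(0, 0, 255)] * 2)
    print(rank_images(red, {"blue": blue, "mixed": mixed, "red": red}))
    # ['red', 'mixed', 'blue']
```

A frame-based video system of the traditional kind described above could reuse the same descriptor on each key frame.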
Traditional video analysis methods are often shot-based. Shot detection methods fall into several categories, e.g., pixel-based, statistics-based, transform-based, feature-based and histogram-based . After shot detection, key frames can be extracted in various ways . Although key frames can be used directly for retrieval , many researchers have studied better ways to organize video structure. In , Yeung et al. developed scene transition graphs (STGs) to illustrate the scene flow of movies. Aigrain et al. proposed using explicit models of video documents, or rules related to editing techniques and film theory . Statistical approaches such as Hidden Markov Models (HMMs)  and unsupervised clustering  have also been proposed. When audio, text and other accompanying content are available, grouping can be done jointly . There has also been considerable research on extracting captions from video clips, and these captions can likewise help retrieval .
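To make the histogram-based category concrete, here is a minimal sketch of cut detection followed by key-frame selection. It assumes grayscale frames given as non-empty nested lists of 0..255 intensities, declares a shot boundary wherever the L1 distance between consecutive frame histograms exceeds a fixed threshold, and picks the middle frame of each shot as its key frame. The threshold of 0.5, the 16-bin histogram and the middle-frame rule are illustrative assumptions, not the methods of the works cited above.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[int]]  # grayscale frame: rows of 0..255 intensities


def gray_histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalized intensity histogram of one (non-empty) frame."""
    hist = [0.0] * bins
    count = 0
    for row in frame:
        for v in row:
            hist[v * bins // 256] += 1
            count += 1
    return [h / count for h in hist]


def detect_shots(frames: Sequence[Frame], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Start a new shot wherever the L1 distance (range 0..2) between
    consecutive frame histograms exceeds `threshold`.
    Returns half-open (start, end) frame index ranges."""
    if not frames:
        return []
    hists = [gray_histogram(f) for f in frames]
    starts = [0]
    for i in range(1, len(frames)):
        diff = sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i]))
        if diff > threshold:
            starts.append(i)
    ends = starts[1:] + [len(frames)]
    return list(zip(starts, ends))


def key_frames(shots: Sequence[Tuple[int, int]]) -> List[int]:
    """One simple key-frame rule: take the middle frame of each shot."""
    return [(start + end - 1) // 2 for start, end in shots]


if __name__ == "__main__":
    dark = [[10] * 4 for _ in range(4)]
    bright = [[200] * 4 for _ in range(4)]
    video = [dark] * 3 + [bright] * 3 + [dark] * 2
    shots = detect_shots(video)
    print(shots)              # [(0, 3), (3, 6), (6, 8)]
    print(key_frames(shots))  # [1, 4, 6]
```

A single threshold only catches abrupt cuts; gradual transitions such as fades and dissolves need more elaborate schemes (e.g., twin thresholds).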
Motion is one of the most significant differences between video and still images, and motion analysis has been very popular for video retrieval. On the one hand, motion can help locate interesting objects in the video, as in the work of Courtney , Ferman et al. , Gelgon and Bouthemy , and Ma and Zhang . On the other hand, motion can be used directly as a feature, which Nelson and Polana  termed "temporal texture." This work was extended by Otsuka et al. , Bouthemy and Fablet , Szummer and Picard , and others.
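Both uses of motion can be illustrated with plain frame differencing, shown here as a minimal sketch rather than the method of any work cited above. `motion_mask` and `moving_region_bbox` localize the region that changed between two frames (motion used to find objects), while `motion_energy` summarizes a clip by its average per-pixel intensity change (motion used directly as a feature). The published temporal-texture features are considerably richer than this single number; the function names, the threshold of 15 and the grayscale nested-list frame format are all assumptions for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Frame = Sequence[Sequence[int]]  # grayscale frame: rows of 0..255 intensities


def motion_mask(prev: Frame, cur: Frame, threshold: int = 15) -> List[List[bool]]:
    """Mark pixels whose intensity changed by more than `threshold`."""
    return [[abs(c - p) > threshold for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(prev, cur)]


def moving_region_bbox(mask: List[List[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """Inclusive (top, left, bottom, right) box around changed pixels, or None."""
    coords = [(y, x) for y, row in enumerate(mask) for x, moved in enumerate(row) if moved]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return (min(ys), min(xs), max(ys), max(xs))


def motion_energy(frames: Sequence[Frame]) -> float:
    """Mean absolute intensity change per pixel, averaged over consecutive frame pairs."""
    if len(frames) < 2:
        return 0.0
    per_pair = []
    for prev, cur in zip(frames, frames[1:]):
        diffs = [abs(c - p) for row_p, row_c in zip(prev, cur)
                 for p, c in zip(row_p, row_c)]
        per_pair.append(sum(diffs) / len(diffs))
    return sum(per_pair) / len(per_pair)


if __name__ == "__main__":
    f0 = [[0] * 5 for _ in range(5)]
    # A 2x2 bright "object" appears at rows 1-2, columns 1-2.
    f1 = [[200 if 1 <= y <= 2 and 1 <= x <= 2 else 0 for x in range(5)] for y in range(5)]
    print(moving_region_bbox(motion_mask(f0, f1)))  # (1, 1, 2, 2)
    print(motion_energy([f0, f1]))                  # 32.0
```

Frame differencing assumes a static camera; with camera motion, the global (dominant) motion would first have to be compensated.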
If objects can be segmented easily, object-based analysis of video sequences is one of the most attractive approaches. With improvements in computer vision technology, many object-based approaches have been proposed recently. To name a few: in , Courtney developed a system that detects moving objects in a closed environment based on motion detection.
Zhong and Chang  applied color segmentation to separate images into homogeneous regions and tracked these regions over time for content-based video query. Deng and Manjunath  proposed a spatio-temporal segmentation and region-tracking scheme for video representation. Chang et al. proposed Semantic Visual Templates (SVTs), a personalized view of concepts composed of interactively created templates/objects.
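The segment-then-track idea can be sketched in a toy form that is much simpler than the cited schemes. Coarse color quantization plus breadth-first flood fill stands in for a real color segmentation algorithm, producing 4-connected regions of uniform quantized color. Matching each region to the nearest same-color region in the previous frame (by centroid distance) stands in for a real tracker. The `Region` type, the function names and the quantization level are hypothetical choices made for this illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]            # (R, G, B), each channel in 0..255
ColorFrame = Sequence[Sequence[Pixel]]  # rows of RGB pixels


@dataclass
class Region:
    color: Tuple[int, ...]          # quantized color label
    size: int                       # number of pixels
    centroid: Tuple[float, float]   # (row, col)


def quantize(pixel: Pixel, levels: int = 4) -> Tuple[int, ...]:
    """Coarse color label: each channel mapped to 0..levels-1."""
    return tuple(c * levels // 256 for c in pixel)


def segment_regions(frame: ColorFrame, levels: int = 4) -> List[Region]:
    """Split a frame into 4-connected regions of identical quantized color."""
    h = len(frame)
    w = len(frame[0]) if h else 0
    visited = [[False] * w for _ in range(h)]
    regions: List[Region] = []
    for sy in range(h):
        for sx in range(w):
            if visited[sy][sx]:
                continue
            color = quantize(frame[sy][sx], levels)
            visited[sy][sx] = True
            queue = deque([(sy, sx)])
            pixels = []
            while queue:  # breadth-first flood fill over same-color neighbors
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not visited[ny][nx]
                            and quantize(frame[ny][nx], levels) == color):
                        visited[ny][nx] = True
                        queue.append((ny, nx))
            n = len(pixels)
            centroid = (sum(y for y, _ in pixels) / n, sum(x for _, x in pixels) / n)
            regions.append(Region(color, n, centroid))
    return regions


def match_regions(prev: List[Region], cur: List[Region]) -> List[Optional[int]]:
    """For each region in `cur`, index of the nearest same-color region in `prev`
    (squared centroid distance), or None if no region has that color."""
    matches: List[Optional[int]] = []
    for r in cur:
        best, best_d = None, float("inf")
        for i, p in enumerate(prev):
            if p.color != r.color:
                continue
            d = (p.centroid[0] - r.centroid[0]) ** 2 + (p.centroid[1] - r.centroid[1]) ** 2
            if d < best_d:
                best, best_d = i, d
        matches.append(best)
    return matches


if __name__ == "__main__":
    BLACK, RED = (0, 0, 0), (255, 0, 0)

    def frame_with_block(col: int) -> List[List[Pixel]]:
        # 4x4 black frame with a 2x2 red block in rows 0-1, starting at `col`.
        return [[RED if y < 2 and col <= x < col + 2 else BLACK for x in range(4)]
                for y in range(4)]

    r1 = segment_regions(frame_with_block(0))
    r2 = segment_regions(frame_with_block(1))
    print(match_regions(r1, r2))  # [1, 0]
```

The matched centroids over successive frames form region trajectories, which is the kind of information an object-based query could then search over.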
This chapter is not intended to cover in depth the features used in state-of-the-art video retrieval systems; readers are referred to Sections II, III and V for more detailed information.