In this chapter, we emphasized two important aspects of video indexing: analysing and organizing video information. As a research effort in video analysis, we described Name-It, which automatically associates faces and names in news videos by integrating state-of-the-art techniques from image processing, natural language processing, and artificial intelligence. For video information organization, we developed the SR-tree for efficient nearest-neighbor search of multidimensional information. We then showed that these two aspects are closely related. Name-It extracts meaningful face-name associations by finding coincident high-density regions among different feature spaces, while similarity retrieval with nearest-neighbor search returns meaningful results when the feature space has a skewed distribution. Based on these facts, we studied the search for meaningful information in video archives, and developed the distinctiveness detection method and its application to nearest-neighbor search.
One promising future direction for video indexing may be to integrate the above three approaches to further enhance the mining of meaningful information from realistic-scale video archives. The approaches presented in this chapter are especially important when archives are huge and have skewed distributions. Our observations imply that real video archives do have skewed distributions in the corresponding feature spaces, and at the same time an archive must be large to be of practical use. Based on this idea, we are currently developing a broadcast video archive system. The system acquires digital video from seven terrestrial channels available in the Tokyo area, 24 hours a day, in a fully automated way. It can thus provide quite large video archives (currently 7,000 hours of video on a 10 TB disk array) covering diverse types of programs. We are testing our technologies with this system, both to enhance them further and to realize the mining of meaningful information from realistic-scale video archives.