Chapter 26: Video Indexing and Retrieval using MPEG-7

John R. Smith
IBM T. J. Watson Research Center
30 Saw Mill River Road
Hawthorne, NY 10532 USA


1. Introduction

Numerous requirements are driving the need for more effective methods for searching and retrieving video based on content. Contributing factors include the growing amount of unstructured multimedia data, the broad penetration of the Internet, and new on-line multimedia applications related to distance learning, entertainment, e-commerce, consumer digital photos, music download, mobile media, and interactive digital television. The MPEG-7 multimedia content description standard addresses some aspects of these problems by standardizing a rich set of tools for describing multimedia content in XML [31],[32]. However, MPEG-7 standardizes neither the methods for extracting descriptions nor those for matching and searching. The extraction and use of MPEG-7 descriptions remain a challenge for future research, innovation, and industry competition. As a result, techniques need to be developed for analyzing, extracting, and indexing information from video using the MPEG-7 standard [22],[36],[37],[44].

1.1. Video Indexing and Retrieval

The problem of video indexing and retrieval using MPEG-7 involves two processes: (1) producing or extracting MPEG-7 descriptions from video content, and (2) searching for video content based on the MPEG-7 descriptions. In MPEG-7 pull applications, the user is actively seeking multimedia content or information, and the benefit of MPEG-7 is that queries can be based on standardized descriptions. While content-based video retrieval is useful for many applications such as multimedia databases, intelligent media services, and personalization, many applications also require an interface at the semantic level. Ideally, video retrieval draws on descriptions of scenes, objects, events, people, places, and so forth, along with descriptions of features, speech transcripts, closed captions, and so on. MPEG-7 provides rich metadata for describing many aspects of video content, including the semantics of real-world scenes related to the video content.

The overall application environment for MPEG-7 video indexing and retrieval is illustrated in Figure 26.1. The environment involves a user seeking information from a digital media repository. The repository stores video content along with corresponding MPEG-7 metadata descriptions. The MPEG-7 metadata potentially gives descriptions of semantics (e.g., people, places, events, objects, and scenes), features (e.g., color, texture, motion, melody, and timbre), and other immutable attributes of the digital media (e.g., titles, authors, and dates). The user may be provided with different means for searching the digital media repository, such as issuing text or keyword queries, selecting examples of the content being sought, or selecting models or illustrating the query through features. Another aspect of the environment involves the access and delivery of video from the digital media repository. Given the rich MPEG-7 descriptions, the digital media content can potentially be adapted to the user environment. For example, the video content can be summarized to produce a personalized presentation according to the user's preferences, device capabilities, usage context, and so on.

Figure 26.1: MPEG-7 applications include pull-type applications such as multimedia database searching, push-type applications such as multimedia content filtering, and universal multimedia access.
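
The pull-type search environment described above can be illustrated with a small sketch. The following Python code is not part of MPEG-7, which does not standardize search; it is a toy model, under stated assumptions, of a repository whose records carry title, semantic, and feature metadata, queried by keyword and by example. The names VideoRecord, keyword_search, and query_by_example, the three-element color vectors, and the use of Euclidean distance are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class VideoRecord:
    """Toy stand-in for a video plus its metadata description (hypothetical)."""
    video_id: str
    title: str                # immutable attribute
    semantic_labels: set      # semantics: people, places, events, ...
    color_feature: list       # feature: illustrative 3-bin color vector


def keyword_search(repo, keyword):
    """Return ids of records whose title or semantic labels match the keyword."""
    kw = keyword.lower()
    return [
        r.video_id
        for r in repo
        if kw in r.title.lower() or kw in {label.lower() for label in r.semantic_labels}
    ]


def query_by_example(repo, example_feature, k=2):
    """Rank records by Euclidean distance between feature vectors (an assumption)."""
    ranked = sorted(repo, key=lambda r: math.dist(r.color_feature, example_feature))
    return [r.video_id for r in ranked[:k]]


REPO = [
    VideoRecord("v1", "Beach sunset", {"beach", "sunset"}, [0.8, 0.1, 0.1]),
    VideoRecord("v2", "Soccer match", {"soccer", "stadium"}, [0.1, 0.8, 0.1]),
    VideoRecord("v3", "Forest hike", {"forest", "people"}, [0.2, 0.7, 0.1]),
]

if __name__ == "__main__":
    print(keyword_search(REPO, "soccer"))
    print(query_by_example(REPO, [0.1, 0.75, 0.15]))
```

In this sketch the keyword query operates on semantic and attribute metadata, while the query by example operates on feature metadata, mirroring the two kinds of description distinguished in the text.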

1.1.1. MPEG-7 Standard Elements

MPEG-7 is a standard developed by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), which specifies a "Multimedia Content Description Interface." MPEG-7 provides a standardized representation of multimedia metadata in XML. MPEG-7 describes multimedia content at a number of levels, including features, structure, semantics, models, collections, and other immutable metadata related to multimedia description. The objective of MPEG-7 is to provide an interoperable metadata system that is also designed to allow fast and efficient indexing, searching, and filtering of multimedia based on content. The MPEG-7 standard specifies an industry-standard schema using the XML Schema Language. The schema comprises Description Schemes (DS) and Descriptors. Overall, the MPEG-7 schema defines over 450 simple and complex types. MPEG-7 descriptions are expressed in XML, but the standard also provides a binary compression system that allows them to be stored and transmitted more efficiently. MPEG-7 descriptions can be stored as files or within databases independent of the multimedia data, embedded within the multimedia streams, or broadcast along with the multimedia data.
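
Because MPEG-7 descriptions are XML documents, they can be read with ordinary XML tooling. The sketch below parses a small MPEG-7-style description with Python's standard xml.etree.ElementTree module and pulls out the title and content type. The XML fragment is a simplified illustration that has not been validated against the MPEG-7 schema, the title text is invented, and the helper names extract_titles and content_types are assumptions of this example.

```python
import xml.etree.ElementTree as ET

MPEG7_NS = "urn:mpeg:mpeg7:schema:2001"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Simplified, illustrative description; not validated against the MPEG-7 schema.
DESCRIPTION_XML = f"""<Mpeg7 xmlns="{MPEG7_NS}" xmlns:xsi="{XSI_NS}">
  <Description xsi:type="ContentEntityType">
    <MultimediaContent xsi:type="VideoType">
      <Video>
        <CreationInformation>
          <Creation>
            <Title>Soccer highlights</Title>
          </Creation>
        </CreationInformation>
      </Video>
    </MultimediaContent>
  </Description>
</Mpeg7>
"""


def extract_titles(xml_text):
    """Return the text of every Title element in the MPEG-7 namespace."""
    root = ET.fromstring(xml_text)
    ns = {"mpeg7": MPEG7_NS}
    return [title.text for title in root.findall(".//mpeg7:Title", ns)]


def content_types(xml_text):
    """Return the xsi:type attribute of every MultimediaContent element."""
    root = ET.fromstring(xml_text)
    tag = f"{{{MPEG7_NS}}}MultimediaContent"
    return [el.get(f"{{{XSI_NS}}}type") for el in root.iter(tag)]


if __name__ == "__main__":
    print(extract_titles(DESCRIPTION_XML))
    print(content_types(DESCRIPTION_XML))
```

A production system would more likely validate descriptions against the schema before indexing them; that step is omitted here to keep the sketch dependent only on the standard library.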

The constructs are defined as follows:

  • The Description Definition Language (DDL) is the language specified in MPEG-7 for defining the syntax of Description Schemes and Descriptors. The DDL is based on the XML Schema Language.

  • Description Schemes (DS) are description tools defined using DDL that describe entities or relationships pertaining to multimedia content. Description Schemes specify the structure and semantics of their components, which may be Description Schemes, Descriptors, or datatypes. Examples of Description Schemes include: MovingRegion DS, CreationInformation DS, and Object DS.

  • Descriptors (D) are description tools defined using DDL that describe features, attributes, or groups of attributes of multimedia content. Example Descriptors include: ScalableColor D, SpatioTemporalLocator D, and AudioSpectrumFlatness D.

  • A Feature is defined as a distinctive characteristic of multimedia content that signifies something to a human observer, such as the "color" or "texture" of an image. This distinguishes Descriptors from Features as follows: if color is considered to be a feature of an image, then the ScalableColor D can be used to describe that color feature.

  • Data (Essence, Multimedia Data) is defined as a representation of multimedia in a formalized manner suitable for communication, interpretation, or processing by automatic means. For example, the data can correspond to an image or video.

The MPEG-7 standard specifies the Description Definition Language (DDL) and the set of Description Schemes (DS) and Descriptors that comprise the MPEG-7 schema. However, MPEG-7 is also extensible in that the DDL can be used to define new DSs and Descriptors and to extend the standard MPEG-7 DSs and Descriptors. The MPEG-7 schema is defined in such a way that a custom Descriptor can be used together with the standardized MPEG-7 DSs and Descriptors, for example, to include a medical image texture Descriptor within an MPEG-7 image description.
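
The containment relationship between Description Schemes and Descriptors, and the idea of mixing a custom Descriptor with standardized ones, can be sketched as a small data model. This is an in-memory analogy in Python, not the DDL itself, which is based on XML Schema. The class names, the ImageDescription container, the MedicalTexture Descriptor, and all numeric values are hypothetical and chosen only to mirror the medical image example in the text.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Descriptor:
    """Describes a feature or attribute of the content (toy model)."""
    name: str
    values: list
    standardized: bool = True   # False marks a custom, non-standard Descriptor


@dataclass
class DescriptionScheme:
    """Groups components, which may be other schemes or Descriptors (toy model)."""
    name: str
    components: List[Union["DescriptionScheme", Descriptor]] = field(default_factory=list)


def collect_descriptors(ds):
    """Walk a scheme recursively and return its Descriptors in document order."""
    found = []
    for component in ds.components:
        if isinstance(component, Descriptor):
            found.append(component)
        else:
            found.extend(collect_descriptors(component))
    return found


# Hypothetical image description mixing a standardized and a custom Descriptor.
IMAGE_DESCRIPTION = DescriptionScheme(
    "ImageDescription",
    [
        DescriptionScheme("VisualFeatures", [Descriptor("ScalableColor", [12, 7, 3])]),
        Descriptor("MedicalTexture", [0.42, 0.17], standardized=False),
    ],
)

if __name__ == "__main__":
    for d in collect_descriptors(IMAGE_DESCRIPTION):
        kind = "standard" if d.standardized else "custom"
        print(d.name, kind)
```

The standardized flag is only a convenience of this sketch; in MPEG-7 itself a custom Descriptor would be distinguished by being defined in an extension schema written in the DDL.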

1.1.2. Outline

In this chapter, we examine the application of MPEG-7 for video indexing and retrieval. The chapter is organized as follows. In Section 2, we introduce the MPEG-7 standard and review its elements, including description tools and classification schemes, with examples. In Section 3, we identify MPEG-7 description tools for video indexing and retrieval and give example descriptions. In Section 4, we discuss video searching. Finally, in Section 5, we examine future directions and draw conclusions.

