Brief Introduction on MPEG-7 Audio

MPEG-7 is an ISO/IEC standard developed by MPEG (Moving Picture Experts Group) [25], the committee that also developed the well-known MPEG-1, MPEG-2, and MPEG-4 standards. MPEG-7, formally named "Multimedia Content Description Interface," is a standard for describing multimedia content data in a way that supports some degree of interpretation of the information's meaning; such descriptions can be passed on to, or accessed by, a device or computer code. The MPEG-7 standard supports a broad range of applications (for example, multimedia digital libraries, broadcast media selection, multimedia editing, and home entertainment devices). MPEG-7 is also intended to make the web as searchable for multimedia content as it is for text today.

The main elements of the MPEG-7 standard are: 1) Description Tools, composed of Descriptors (D), which define the syntax and semantics of each feature (metadata element), and Description Schemes (DS), which specify the structure and semantics of the relationships between their components; these components may be both Descriptors and Description Schemes. 2) A Description Definition Language (DDL), which defines the syntax of the MPEG-7 Description Tools and allows the creation of new DSs and Ds as well as the extension and modification of existing DSs. 3) System tools, which support binary coded representation for efficient storage and transmission, transmission mechanisms, multiplexing of descriptions, synchronization of descriptions with content, and management and protection of intellectual property in MPEG-7 descriptions.

MPEG-7 consists of eight parts: 1) Systems, 2) Description Definition Language, 3) Visual, 4) Audio, 5) Multimedia Description Schemes, 6) Reference Software, 7) Conformance Testing, and 8) Extraction and use of descriptions. MPEG-7 audio provides structures, in conjunction with the Multimedia Description Schemes part of the standard, for describing audio content.

The MPEG-7 audio standard comprises a set of descriptors that can be divided roughly into two classes: low-level (generic) tools and high-level (application-specific) tools. Figure 20.5 shows an overview of these audio descriptors.

Figure 20.5: Overview of audio framework in MPEG-7 audio.

1.18 Low Level Tools

Low level tools may be used in a variety of applications. Besides the Silence Descriptor, there are six groups of low level audio descriptors.

Basic Audio Descriptors include the AudioWaveform Descriptor, which describes the audio waveform envelope, and the AudioPower Descriptor, which depicts the temporally smoothed instantaneous power.
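As a rough illustration of these two descriptors, the following sketch computes a minimum/maximum envelope and a mean-square power value for each frame of a sampled signal. The function names, the non-overlapping framing, and the `hop` parameter are assumptions made for this example; they are not taken from the standard's extraction procedure.

```python
from typing import List, Tuple


def split_frames(samples: List[float], hop: int) -> List[List[float]]:
    """Split the signal into consecutive, non-overlapping frames of `hop` samples.

    The last frame may be shorter than `hop`.
    """
    if hop <= 0:
        raise ValueError("hop must be a positive number of samples")
    return [samples[i:i + hop] for i in range(0, len(samples), hop)]


def audio_waveform(samples: List[float], hop: int) -> List[Tuple[float, float]]:
    """Waveform envelope: (minimum, maximum) sample value of each frame."""
    return [(min(frame), max(frame)) for frame in split_frames(samples, hop)]


def audio_power(samples: List[float], hop: int) -> List[float]:
    """Temporally smoothed power: mean of the squared samples of each frame."""
    return [sum(x * x for x in frame) / len(frame)
            for frame in split_frames(samples, hop)]
```

Both functions reduce the signal to one value (or value pair) per frame, which is why such descriptors are compact enough to store alongside the content.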

Basic Spectral Descriptors are derived from a single time-frequency analysis of the audio signal. This group comprises the AudioSpectrumEnvelope Descriptor, which is a log-frequency spectrum; the AudioSpectrumCentroid Descriptor, which describes the center of gravity of the log-frequency power spectrum; the AudioSpectrumSpread Descriptor, which represents the second moment of the log-frequency power spectrum; and the AudioSpectrumFlatness Descriptor, which indicates the flatness of the spectrum within a number of frequency bands.
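The centroid, spread, and flatness measures can be sketched as follows, given a power spectrum that has already been computed. The 1 kHz reference frequency for the log-frequency axis, the use of the square root of the second central moment as the spread, and the geometric-to-arithmetic mean ratio as the flatness measure are assumptions of this illustration; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def log_centroid_and_spread(freqs_hz: Sequence[float],
                            power: Sequence[float],
                            ref_hz: float = 1000.0) -> Tuple[float, float]:
    """Return (centroid, spread) of a power spectrum on a log2 frequency axis.

    Both values are in octaves relative to `ref_hz` (an assumed reference).
    The spread is the power-weighted RMS deviation from the centroid.
    """
    total = sum(power)
    if total <= 0.0:
        raise ValueError("power spectrum must have positive total power")
    octaves = [math.log2(f / ref_hz) for f in freqs_hz]
    centroid = sum(o * p for o, p in zip(octaves, power)) / total
    variance = sum((o - centroid) ** 2 * p
                   for o, p in zip(octaves, power)) / total
    return centroid, math.sqrt(variance)


def band_flatness(band_power: Sequence[float]) -> float:
    """Geometric mean divided by arithmetic mean of the power values in a band.

    Values near 1 indicate a flat (noise-like) band, values near 0 a peaky
    (tone-like) band.
    """
    if not band_power:
        raise ValueError("band must contain at least one coefficient")
    if min(band_power) <= 0.0:
        return 0.0  # a zero coefficient forces the geometric mean to zero
    n = len(band_power)
    geometric = math.exp(sum(math.log(p) for p in band_power) / n)
    arithmetic = sum(band_power) / n
    return geometric / arithmetic
```

For example, a spectrum with equal power at 500 Hz and 2 kHz has its log-frequency centroid at 1 kHz (0 octaves) and a spread of one octave.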

The Signal Parameter Descriptors group consists of two descriptors: the AudioFundamentalFrequency Descriptor, which describes the fundamental frequency of an audio signal, and the AudioHarmonicity Descriptor, which represents the harmonicity of a signal.
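One common way to obtain both quantities is autocorrelation analysis; the sketch below uses it purely as an illustration. The choice of method, the function names, and the `f_min`/`f_max` search limits are assumptions of this example, and the height of the normalized autocorrelation peak serves only as a stand-in for a harmonicity measure.

```python
import math
from typing import Sequence, Tuple


def normalized_autocorrelation(x: Sequence[float], lag: int) -> float:
    """Correlation between the signal and a copy of itself delayed by `lag`."""
    head = x[:len(x) - lag]
    tail = x[lag:]
    num = sum(u * v for u, v in zip(head, tail))
    den = math.sqrt(sum(u * u for u in head) * sum(v * v for v in tail))
    return num / den if den > 0.0 else 0.0


def estimate_f0(x: Sequence[float], sample_rate: float,
                f_min: float, f_max: float) -> Tuple[float, float]:
    """Return (fundamental frequency in Hz, autocorrelation peak height).

    The lag with the highest normalized autocorrelation inside the range
    implied by [f_min, f_max] is taken as the period of the signal.
    """
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(len(x) - 1, int(sample_rate / f_min))
    if lag_min > lag_max:
        raise ValueError("signal too short for the requested frequency range")
    best_lag = max(range(lag_min, lag_max + 1),
                   key=lambda lag: normalized_autocorrelation(x, lag))
    return sample_rate / best_lag, normalized_autocorrelation(x, best_lag)
```

A peak height close to 1 suggests a strongly periodic (harmonic) signal, while noise-like signals give much lower values.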

There are two Descriptors in the Timbral Temporal Descriptors group. The LogAttackTime Descriptor characterizes the attack of a sound, and the TemporalCentroid Descriptor represents where in time the energy of a signal is focused.
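Both descriptors can be sketched from a sampled energy envelope. In this illustration the attack is assumed to start when the envelope first reaches a small fraction of its peak and to end at the peak itself; the 2% threshold, the function names, and the base-10 logarithm are assumptions of the example rather than values taken from the text.

```python
import math
from typing import Sequence


def log_attack_time(envelope: Sequence[float], envelope_rate: float,
                    start_fraction: float = 0.02) -> float:
    """log10 of the time (in seconds) from attack start to the envelope peak.

    `envelope_rate` is the number of envelope values per second.
    """
    peak = max(envelope)
    if peak <= 0.0:
        raise ValueError("envelope must contain positive energy")
    peak_index = list(envelope).index(peak)
    start_index = next(i for i, v in enumerate(envelope)
                       if v >= start_fraction * peak)
    duration = (peak_index - start_index) / envelope_rate
    if duration <= 0.0:
        raise ValueError("attack is shorter than one envelope sample")
    return math.log10(duration)


def temporal_centroid(envelope: Sequence[float], envelope_rate: float) -> float:
    """Energy-weighted mean time of the envelope, in seconds."""
    total = sum(envelope)
    if total <= 0.0:
        raise ValueError("envelope must contain positive energy")
    return sum(i * v for i, v in enumerate(envelope)) / (total * envelope_rate)
```

A percussive sound yields a small (strongly negative) log attack time and an early temporal centroid; a slowly swelling sound yields larger values for both.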

The Timbral Spectral Descriptors group has five components. The SpectralCentroid Descriptor is the power-weighted average of the frequency of the bins in the linear power spectrum. The HarmonicSpectralCentroid Descriptor is the amplitude-weighted mean of the harmonic peaks of the spectrum. The HarmonicSpectralDeviation Descriptor indicates the spectral deviation of log-amplitude components from a global spectral envelope. The HarmonicSpectralSpread Descriptor describes the amplitude-weighted standard deviation of the harmonic peaks of the spectrum. Finally, the HarmonicSpectralVariation Descriptor is the normalized correlation between the amplitudes of the harmonic peaks in two subsequent time-slices of the signal.
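Three of these measures follow directly from the wording above once the harmonic peaks (frequency and amplitude pairs) have been detected. The sketch below assumes that peak detection has already been done, and its function names are hypothetical. The normative definitions may apply additional normalization, and some formulations report one minus the correlation as the variation; the code follows the wording of the paragraph.

```python
import math
from typing import Sequence


def harmonic_centroid(freqs_hz: Sequence[float], amps: Sequence[float]) -> float:
    """Amplitude-weighted mean frequency of the harmonic peaks."""
    total = sum(amps)
    if total <= 0.0:
        raise ValueError("harmonic amplitudes must have a positive sum")
    return sum(f * a for f, a in zip(freqs_hz, amps)) / total


def harmonic_spread(freqs_hz: Sequence[float], amps: Sequence[float]) -> float:
    """Amplitude-weighted standard deviation of the harmonic peak frequencies."""
    centroid = harmonic_centroid(freqs_hz, amps)
    variance = sum(a * (f - centroid) ** 2
                   for f, a in zip(freqs_hz, amps)) / sum(amps)
    return math.sqrt(variance)


def harmonic_correlation(amps_prev: Sequence[float],
                         amps_next: Sequence[float]) -> float:
    """Normalized correlation of harmonic amplitudes in two adjacent frames."""
    num = sum(a * b for a, b in zip(amps_prev, amps_next))
    den = math.sqrt(sum(a * a for a in amps_prev) *
                    sum(b * b for b in amps_next))
    return num / den if den > 0.0 else 0.0
```

A steady tone keeps the same harmonic amplitude pattern from frame to frame, so the correlation stays close to 1; a sound whose timbre changes quickly gives lower values.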

The last group of low level descriptors is Spectral Basis Descriptors. It includes the AudioSpectrumBasis Descriptor, which is a series of basis functions that are derived from the singular value decomposition of a normalized power spectrum, and the AudioSpectrumProjection Descriptor, which represents low-dimensional features of a spectrum after projection upon a reduced rank basis.
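The basis-and-projection idea can be sketched in a few lines. To keep the example free of external libraries, power iteration with deflation on the matrix X^T X is used here as a stand-in for a full singular value decomposition; it approximates the leading right singular vectors only. The per-frame unit-norm scaling, the function names, and the iteration count are assumptions of this illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalize_rows(x: Matrix) -> Matrix:
    """Scale every frame (row) to unit Euclidean norm; all-zero rows are kept."""
    out = []
    for row in x:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm for v in row] if norm > 0.0 else list(row))
    return out


def spectrum_basis(x: Matrix, rank: int, iterations: int = 200) -> Matrix:
    """Approximate the first `rank` right singular vectors of X (frames x bands)."""
    n = len(x[0])
    # Gram matrix X^T X; its eigenvectors are the right singular vectors of X.
    c = [[sum(r[i] * r[j] for r in x) for j in range(n)] for i in range(n)]
    basis: Matrix = []
    for _component in range(rank):
        start_norm = math.sqrt(sum((1.0 / (i + 1)) ** 2 for i in range(n)))
        v = [1.0 / ((i + 1) * start_norm) for i in range(n)]  # arbitrary unit start
        for _step in range(iterations):
            w = [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm < 1e-12:  # no energy left in the deflated matrix
                break
            v = [t / norm for t in w]
        cv = [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(a * b for a, b in zip(v, cv))
        # Deflation: remove the component just found before the next search.
        for i in range(n):
            for j in range(n):
                c[i][j] -= eigenvalue * v[i] * v[j]
        basis.append(v)
    return basis


def project(x: Matrix, basis: Matrix) -> Matrix:
    """Low-dimensional features: inner product of each frame with each basis vector."""
    return [[sum(a * b for a, b in zip(row, vec)) for vec in basis] for row in x]
```

With a basis of rank k, each frame of the spectrum is reduced from its original number of bands to k projection coefficients, which is the dimensionality reduction the two descriptors are meant to provide.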

1.19 High Level Tools

High level tools are specialized for domain-specific applications. Five sets of high level tools, roughly corresponding to application areas, are integrated in the standard.

The Audio Signature Description Scheme statistically summarizes the AudioSpectrumFlatness Descriptor as a condensed representation of an audio signal. It provides a unique content identifier for robust automatic identification of audio signals.
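The idea of a statistical summary can be illustrated as follows: per-band flatness values are grouped into blocks of consecutive frames, and each block is reduced to a mean and a variance per band. The block size, the choice of mean and variance as the statistics, and the function name are assumptions of this sketch, not parameters quoted from the standard.

```python
from statistics import mean, pvariance
from typing import List, Sequence, Tuple


def summarize_flatness(flatness_frames: Sequence[Sequence[float]],
                       block_size: int) -> List[Tuple[List[float], List[float]]]:
    """Summarize per-band flatness over non-overlapping blocks of frames.

    `flatness_frames[t][b]` is the flatness of band b in frame t.
    Returns one (means, variances) pair per complete block; an incomplete
    final block is dropped.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    summary = []
    for start in range(0, len(flatness_frames) - block_size + 1, block_size):
        block = flatness_frames[start:start + block_size]
        per_band = list(zip(*block))  # regroup values band by band
        means = [mean(band) for band in per_band]
        variances = [pvariance(band) for band in per_band]
        summary.append((means, variances))
    return summary
```

Two recordings of the same material can then be compared by a distance between their summary sequences, which is far cheaper than comparing the signals themselves.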

Musical Instrument Timbre Description Tools describe perceptual features of instrument sounds with a reduced set of Descriptors. They relate to notions such as the "attack," "brightness," and "richness" of a sound.

Melody Description Tools include a rich representation for monophonic melodic information to facilitate efficient, robust, and expressive melodic similarity matching. The scheme includes a MelodyContour Description Scheme for extremely terse, efficient melody contour representation, and a MelodySequence Description Scheme for a more verbose, complete, expressive melody representation.
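A terse contour representation can be sketched by quantizing the interval between successive notes into a small number of steps. The five-level scale from -2 to +2 and the 50-cent and 250-cent thresholds used below are assumptions chosen for illustration, as are the function names; the input is assumed to be a list of note pitches in Hz that has already been extracted from a monophonic melody.

```python
import math
from typing import List, Sequence


def interval_cents(f_prev_hz: float, f_next_hz: float) -> float:
    """Signed pitch interval between two notes, in cents (100 per semitone)."""
    return 1200.0 * math.log2(f_next_hz / f_prev_hz)


def contour_step(cents: float, small: float = 50.0, large: float = 250.0) -> int:
    """Quantize an interval into one of five contour values: -2, -1, 0, 1, 2."""
    size = abs(cents)
    magnitude = 0 if size < small else (1 if size < large else 2)
    return magnitude if cents > 0 else -magnitude


def melody_contour(pitches_hz: Sequence[float]) -> List[int]:
    """Contour of a note sequence: one quantized step per pair of adjacent notes."""
    return [contour_step(interval_cents(a, b))
            for a, b in zip(pitches_hz, pitches_hz[1:])]
```

Because only the direction and rough size of each interval are kept, the contour is robust to transposition and to small intonation errors in a sung or hummed query, at the cost of discarding exact intervals and rhythm.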

General Sound Recognition and Indexing Description Tools are a collection of tools for indexing and categorization of general sounds, with immediate application to sound effects.

Spoken Content Description Tools allow detailed description of words spoken within an audio stream.

MPEG-7 aims to make multimedia content more searchable than it is today. The set of descriptors standardized in MPEG-7 audio makes it possible to develop content retrieval tools, that is, systems that can access many different audio archives in a uniform way. The descriptors are also useful for content creators editing content and for content distributors selecting and filtering it. Typical applications of MPEG-7 audio include archiving and retrieval of large-scale audio content (radio, TV broadcast, movies, music), audio content distribution, education, and surveillance.

Handbook of Video Databases: Design and Applications (Internet and Communications)
ISBN: 084937006X
Year: 2003
Pages: 393