5.2 Video Compression Standards

Streaming of video over today's wired and wireless networks depends heavily on international video compression standards. There are numerous video compression systems that do not use open standards, such as RealNetworks' RealVideo and Microsoft's Windows Media Player, but they are not discussed in this chapter, as details of their inner workings are not publicly available. Standardization in the video compression space has been done primarily by two standards bodies, the International Organization for Standardization (ISO) and the International Telecommunication Union (ITU), previously CCITT. The video communications standards of the highest past, current, and future interest are H.261, H.263, MPEG-1, MPEG-2, MPEG-4, and JVT. These video compression standards are described briefly here, with their particular features relevant to wireless streaming highlighted. More details on the video compression standards themselves can be found in Puri and Chen ^[1] and Rao and Hwang. ^[2]

5.2.1 H.261

ITU-T H.261, "Video codec for audiovisual services at p × 64 kbps," is the ancestor of all of the popular video compression standards in use today. H.261 was designed for video telephony and video conferencing, for use over one or more dedicated ISDN lines. The standardization effort for H.261 began with the establishment in December 1984 of CCITT Study Group XV, Specialist Group on Coding for Visual Telephony. In March 1989, the p × 64 kbps specification was frozen. Final standardization was established in December 1990.

Like the other video compression standards that follow it, H.261 uses block-based motion estimation and compensation and block-based transform and quantization. Both intracoded frames and intercoded frames are allowed in H.261. Intercoded frames are encoded with respect to a prediction formed from a previously coded frame. A Discrete Cosine Transform (DCT) is applied to 8 × 8 pixel blocks, and the resulting transform coefficients are quantized and entropy coded using variable length coding (VLC) techniques. Macroblocks are arranged into Groups of Blocks (GOBs). Pictures and GOBs begin with unique start codes, which can be used as resynchronization points when transmission errors occur.
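The transform and quantization step can be illustrated with a minimal Python sketch. It assumes an orthonormal 8 × 8 DCT and a plain uniform quantizer with a single step size; the function names and the step size are illustrative choices, not the exact quantizer rules of H.261.

```python
import math

N = 8  # transform block size

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * acc
    return out

def quantize(coeffs, step):
    """Uniform quantization (a simplification): divide by the step and round."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

# A flat block has all of its energy in the DC coefficient.
flat = [[100] * N for _ in range(N)]
levels = quantize(dct_2d(flat), step=16)
```

For the flat block, only `levels[0][0]` is nonzero, which is why smooth image regions cost few bits after entropy coding.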

5.2.2 MPEG-1

Work on MPEG began in 1988. ISO/IEC JTC1/SC29 IS 11172, "Coded representation of picture, audio, and multimedia/hypermedia information," became an international standard in November 1992. MPEG was originally designed for digital storage applications with a target bit rate of about 1.5 Mbps, but has been applied to a wide spectrum of applications, including video streaming over the Internet.

Like H.261, MPEG allows intracoded frames ("I" frames) and intercoded frames ("P" frames); in addition, MPEG introduced bidirectionally coded frames ("B" frames). B frames are predicted using a frame before and a frame after the coded frame, and can be coded using relatively few bits. In MPEG, B frames are never used in coding other pictures. This disposable property of B frames can be important when MPEG is streamed over lossy networks. MPEG also improved intracoding by adding a quantization matrix, and improved intercoding by allowing motion estimation at half pel resolution. Any number of consecutive macroblocks, in scan order, can be grouped into a slice. Slices begin with unique slice start codes, which can serve as resynchronization points.
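Because no other picture depends on a B frame, a streaming server can discard B frames to meet a bit budget without breaking the prediction chain. The following Python sketch illustrates the idea; the function name, the frame sizes, and the budget are hypothetical values chosen for the example.

```python
def thin_stream(frames, max_bits):
    """Drop B frames, in order, until the total size fits within max_bits.

    frames is a list of (frame_type, bits) tuples. I and P frames are always
    kept, because later frames are predicted from them.
    """
    total = sum(bits for _, bits in frames)
    keep = [True] * len(frames)
    for i, (ftype, bits) in enumerate(frames):
        if total <= max_bits:
            break
        if ftype == "B":
            keep[i] = False
            total -= bits
    kept = [f for f, k in zip(frames, keep) if k]
    return kept, total

# Hypothetical group of pictures with sizes in kilobits.
gop = [("I", 120), ("B", 20), ("B", 20), ("P", 60),
       ("B", 20), ("B", 20), ("P", 60)]
sent, bits = thin_stream(gop, max_bits=270)
```

If the budget cannot be met by dropping B frames alone, this sketch simply returns the I and P frames; a real server would need a further policy for that case.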

In addition to providing a video compression standard, MPEG also provided an audio compression standard and a systems standard. MPEG video can be carried either as a video elementary stream or as a program stream.

5.2.3 MPEG-2

Work on MPEG-2 began in 1990, and the video coding portion became an international standard in November 1994. Entitled "Generic coding of moving pictures and associated audio," it was standardized as ISO/IEC 13818-2 and ITU-T H.262. MPEG-2 was targeted at higher bit rate applications than MPEG-1, including standard definition television (SDTV) and high definition television (HDTV).

MPEG-2 builds on MPEG-1 coding techniques by adding tools for interlaced picture coding and methods of scalability. MPEG-2 was the first standard to introduce the concept of profiles and levels, to describe interoperability points. Each profile includes a group of tools that compliant decoders must support. Each level sets limits on the pixel dimensions and frame rates that a decoder must support. MPEG-2 defined seven profiles: Simple, Main, SNR, Spatial, High, 4:2:2, and Multiview. MPEG-2 defined four levels: High, High1440, Main, and Low.
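The role of levels as decoder limits can be sketched in Python as a simple lookup. The width, height, and frame rate bounds below are the values commonly tabulated for MPEG-2 levels and are reproduced here as an assumption to be checked against the standard; real conformance also involves bit rate and sample rate limits that this sketch omits. The function name is hypothetical.

```python
# Assumed upper bounds per level: (max width, max height, max frames/s).
LEVELS = {
    "Low": (352, 288, 30),
    "Main": (720, 576, 30),
    "High1440": (1440, 1152, 60),
    "High": (1920, 1152, 60),
}

def lowest_level(width, height, fps):
    """Return the lowest level whose limits cover the given video format."""
    for name in ("Low", "Main", "High1440", "High"):
        max_w, max_h, max_fps = LEVELS[name]
        if width <= max_w and height <= max_h and fps <= max_fps:
            return name
    return None  # format exceeds every level in the table

sdtv_level = lowest_level(720, 480, 30)
hdtv_level = lowest_level(1920, 1080, 30)
```

A decoder that advertises a given profile and level can then accept or reject a stream by comparing the stream's declared level against its own.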

The methods of scalability that MPEG-2 provides are spatial scalability, SNR scalability, temporal scalability, and data partitioning. Scalable video encoding techniques can be of great use for video streaming when used in conjunction with Unequal Error Protection (UEP), as described in Section 5.4 of this chapter. The bit rates used in MPEG-2 video coding are generally higher than are used for Internet or wireless video streaming.

5.2.4 H.263

Design of ITU-T H.263, "Video coding for low bit rate communication," began in 1993, and the Version 1 standard was published in March 1996. H.263 was designed as an extension of H.261, and greatly increased compression efficiency over H.261. H.263 added some of the tools from MPEG-1 and MPEG-2, as well as some original tools. The tools added in H.263 that improve coding efficiency include half pel motion compensation, median prediction of motion vectors, improved entropy coding, unrestricted motion vectors, and more efficient coding of macroblock and block signaling overhead.
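Median prediction of motion vectors can be sketched as follows. The sketch assumes the three candidate predictors are the motion vectors of the left, above, and above-right neighbors, takes the median separately for each component, and ignores the special rules that apply at picture and GOB boundaries; the function names are illustrative.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return sorted((a, b, c))[1]

def predict_mv(left, above, above_right):
    """Component-wise median of three neighboring motion vectors (x, y)."""
    return (median3(left[0], above[0], above_right[0]),
            median3(left[1], above[1], above_right[1]))

def mv_difference(mv, left, above, above_right):
    """Difference between the actual vector and its prediction.

    Only this difference needs to be entropy coded.
    """
    px, py = predict_mv(left, above, above_right)
    return (mv[0] - px, mv[1] - py)

# Vectors in half pel units; the outlier (10, -4) does not affect the median.
pred = predict_mv((2, 0), (3, 1), (10, -4))
diff = mv_difference((3, 1), (2, 0), (3, 1), (10, -4))
```

Because neighboring blocks tend to move together, the coded differences are usually small, and the median makes the prediction robust to a single outlying neighbor.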

Version 2, also called H.263+, was standardized in September 1997. Version 3, or H.263++, was standardized in January 1998. Version 2 added several features for error resilience, including a slice-structured mode, reference picture selection, and temporal, spatial, and SNR scalability. Version 3 added data partitioning and reversible variable length coding for additional error resilience.

H.263 is commonly used in videoconferencing over dedicated telecommunications lines, as well as over IP.

5.2.5 MPEG-4

Design of the MPEG-4 standard, "Coding of audio-visual objects," began in 1993. Its initial version, ISO/IEC 14496, was finalized in October 1998 and became an international standard in the first months of 1999. The fully backward compatible extensions under the title of MPEG-4 Version 2 were frozen at the end of 1999, and achieved formal international standard status in early 2000.

Relative to the preexisting video compression standards, MPEG-4 added object-based coding and improved video compression efficiency. According to Koenen, ^[3] MPEG-4 provides standardized ways to:

Represent units of aural, visual, or audiovisual content, called "media objects." These media objects can be of natural or synthetic origin, which means they could be recorded with a camera or microphone, or generated with a computer.
Describe the composition of these objects to create compound media objects that form audiovisual scenes.
Multiplex and synchronize the data associated with media objects, so that they can be transported over network channels providing a QoS appropriate for the nature of the specific media objects.
Interact with the audiovisual scene generated at the receiver's end.

MPEG-4 provides many profiles; for natural video alone, there are 12 profiles:

Simple Visual Profile
Simple Scalable Visual Profile
Core Visual Profile
Main Visual Profile
N-Bit Visual Profile
Advanced Real-Time Simple Profile
Core Scalable Profile
Advanced Coding Efficiency
Advanced Simple Profile
Fine Granularity Scalability Profile
Simple Studio Profile
Core Studio Profile

Because of the large number of profiles for MPEG-4, interoperability has been difficult. The most commonly used profile is the Simple Profile.

MPEG-4 has several tools to improve error resilience, including reversible variable length coding and several methods of scalability. Fine Grain Scalability, in particular, is well suited for use with Unequal Error Protection for video streaming over lossy networks. Li ^[4] describes MPEG-4 Fine Grain Scalability in detail, and compares its use with SNR scalability and simulcast.
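The property that makes fine granularity scalability attractive for lossy channels is that the enhancement data can be cut off at almost any point and still be useful. A minimal Python sketch of the underlying bit-plane idea follows. It assumes non-negative integer residual values and sends the most significant bit-plane first; sign handling, the DCT, and entropy coding are omitted, and the function names are hypothetical.

```python
def to_bitplanes(values, num_planes):
    """Split non-negative integers into bit-planes, most significant first."""
    return [[(v >> p) & 1 for v in values]
            for p in range(num_planes - 1, -1, -1)]

def from_bitplanes(planes, num_planes, count):
    """Rebuild values from however many leading bit-planes were received."""
    values = [0] * count
    for i, plane in enumerate(planes):
        shift = num_planes - 1 - i
        values = [v | (bit << shift) for v, bit in zip(values, plane)]
    return values

residual = [5, 0, 3, 6]            # enhancement-layer residual magnitudes
planes = to_bitplanes(residual, 3)

coarse = from_bitplanes(planes[:1], 3, len(residual))  # one plane received
better = from_bitplanes(planes[:2], 3, len(residual))  # two planes received
exact = from_bitplanes(planes, 3, len(residual))       # all planes received
```

Each additional plane refines the reconstruction, so stronger protection can be given to the base layer and the leading planes, and weaker protection to the rest.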

MPEG4IP ^[5] is an open source package designed to enable developers to create streaming servers and clients that are standards-based and free from proprietary technology. MPEG4IP uses the MPEG-4 Simple Profile.

5.2.6 JVT

In 2001, ISO and ITU-T joined forces to develop the JVT (Joint Video Team) standard. This effort was originally begun in the ITU-T as H.26L. Committee Draft status was reached in May 2002. JVT is scheduled to become an international standard in February 2003, and will be called H.264 by the ITU and MPEG-4 Part 10 by ISO.

JVT provides many of the tools found in H.263 and its extended versions H.263+ and H.263++, but at an improved coding efficiency. JVT is claimed to provide the same visual quality as MPEG-4 Advanced Simple Profile at half the bit rate. ^[6] JVT uses a 4 × 4 block integer transform and motion blocks of a variety of sizes. JVT's May 2002 Committee Draft defines two profiles: Baseline and Main.
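An integer transform can be computed exactly with additions, subtractions, and shifts, so encoder and decoder cannot drift apart through rounding. The Python sketch below uses the 4 × 4 core transform matrix of the published H.264 standard; whether the May 2002 Committee Draft used exactly these coefficients is an assumption here, and the scaling that the standard folds into quantization is omitted.

```python
# Core forward transform matrix (as in published H.264; assumed for the draft).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

def matmul(a, b):
    """Product of two 4 x 4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_4x4(block):
    """Compute CF * block * CF^T using integer arithmetic only."""
    return matmul(matmul(CF, block), transpose(CF))

flat = [[10] * 4 for _ in range(4)]
coeffs = forward_4x4(flat)
```

As with the DCT, a flat block produces a single nonzero (DC) coefficient, here without any floating-point arithmetic.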

JVT's May 2002 Committee Draft does not include scalability, although it is intended for use in video streaming applications. Flexible Macroblock Ordering can improve performance over lossy networks, by allowing slices to be formed from nonneighboring macroblocks; in other words, to put neighboring macroblocks into different slices. Therefore, if one slice is unavailable at the decoder due to packet loss, neighboring macroblocks from other slices can be used to perform spatial concealment of the missing data. In JVT, pictures not used to predict other pictures are known as disposable pictures and are indicated in picture headers. In previous coding standards, B pictures were the only pictures to have this characteristic, while in JVT bipredictively coded pictures are not required to be disposable. Indication of the disposable nature of a picture in the picture header effectively allows temporal scalability, which can be used with Unequal Error Protection.
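The benefit of Flexible Macroblock Ordering for concealment can be illustrated with a checkerboard assignment of macroblocks to two slices. This Python sketch is an illustration under simplifying assumptions: exactly two slices per picture, a checkerboard pattern, and hypothetical function names; it does not reproduce the syntax of the draft standard.

```python
def checkerboard_map(mb_cols, mb_rows):
    """Assign each macroblock to slice 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(mb_cols)] for y in range(mb_rows)]

def usable_neighbors(mb_map, lost_slice, x, y):
    """Horizontal and vertical neighbors of (x, y) outside the lost slice."""
    rows, cols = len(mb_map), len(mb_map[0])
    found = []
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < cols and 0 <= ny < rows and mb_map[ny][nx] != lost_slice:
            found.append((nx, ny))
    return found

mb_map = checkerboard_map(4, 3)
# Macroblock (1, 1) belongs to slice 0; suppose slice 0 is lost.
helpers = usable_neighbors(mb_map, lost_slice=0, x=1, y=1)
```

With this pattern, every lost interior macroblock still has all four of its horizontal and vertical neighbors available, whereas with scan-order slices an entire run of adjacent macroblocks would be lost together.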

Table 5.1 provides a list of the video compression standards and the bit rate ranges that they were originally designed for. All of these video compression standards share the property that they use interframe prediction. A video frame is predicted from a previous frame, and only the differences are transmitted. This means that if transmission errors occur, the errors will persist for many frames. In general, macroblocks or entire frames are intracoded at regular intervals to limit the length of time an error can persist.

Table 5.1: Video Compression Standards
Standard	Bit Rate Range
H.261	64 to 384 kbps
H.263	64 kbps to 1 Mbps
MPEG-1	1 to 1.5 Mbps
MPEG-2	2 to 15 Mbps
MPEG-4	64 kbps to 2 Mbps
JVT	32 kbps to ?
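The error persistence described before Table 5.1 can be illustrated with a small Python sketch. It assumes that each frame is predicted only from the frame immediately before it, that every frame whose index is a multiple of a fixed intra period is fully intracoded, and that the decoder performs no concealment; the function name and the numbers are hypothetical.

```python
def corrupted_frames(num_frames, intra_period, lost_frame):
    """Indices of frames that show the error caused by damage to lost_frame.

    The error propagates through interframe prediction until the next
    intracoded frame, which does not depend on any earlier frame.
    """
    bad = []
    for n in range(lost_frame, num_frames):
        if n > lost_frame and n % intra_period == 0:
            break  # an intracoded frame stops the propagation
        bad.append(n)
    return bad

# Damage to frame 5 with an intracoded frame every 12 frames.
affected = corrupted_frames(num_frames=30, intra_period=12, lost_frame=5)
```

Shortening the intra period limits how long an error persists, at the cost of spending more bits on intracoded frames.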

^[1]Puri, A. and Chen, T., Multimedia Systems, Standards, and Networks, Marcel Dekker, New York, 2000.

^[2]Rao, K. and Hwang, J., Techniques and Standards for Image, Video and Audio Coding, Prentice Hall, New York, 1996.

^[3]Koenen, R., Overview of the MPEG-4 standard, ISO/IEC JTC1/SC29/WG11 N4668, March 2002.

^[4]Li, W., Overview of fine granularity scalability in MPEG-4 video standard, IEEE Trans. Circuits Syst. Video Technol., 11 (3), 301–317, 2001.

^[5]MPEG4IP: open source, open standards, open streaming, http://www.mpeg4ip.net.

^[6]Wiegand, T., JVT coding, Workshop on multimedia convergence (IP Cablecom/MEDIACOM 2004/Interactivity in Multimedia), ITU Headquarters, Geneva, Switzerland, March 12–15, 2002, www.itu.int/itudoc/itu-t/workshop/converge/s6am-p3_pp4.ppt.