Digital video compression techniques have played an important role in the world of telecommunications and multimedia systems, where bandwidth is still a valuable commodity. Hence, video coding techniques are of prime importance for reducing the amount of information needed for a picture sequence without losing much of its quality, as judged by human viewers. Modern compression techniques involve very complex electronic circuits, and their cost can only be kept to an acceptable level by high volume production of LSI chips. Standardisation of video compression techniques is therefore essential.
Straightforward PCM coding of TV signals at 140 Mbit/s was introduced in the 1970s. It conformed with the digital hierarchy used mainly for multichannel telephony but the high bit rate restricted its application to TV programme distribution and studio editing. Digital TV operation for satellites became attractive since the signals were compatible with the time-division multiple access systems then coming into use. Experimental systems in 1980 used bit rates of about 45 Mbit/s for NTSC signals and 60 Mbit/s for the PAL standard.
An analogue videophone system tried out in the 1960s had not proved viable, but by the 1970s it was realised that visual speaker identification could substantially improve a multiparty discussion, and videoconference services were considered. This provided the impetus for the development of low bit rate video coding. With the technology available in the 1980s, the COST211 video codec, based on differential pulse code modulation (DPCM), was standardised by CCITT under the H.120 standard. The codec's target bit rate was 2 Mbit/s for Europe and 1.544 Mbit/s for North America, suitable for their respective first levels of digital hierarchy. However, the image quality, although having very good spatial resolution (owing to the nature of DPCM, which works on a pixel-by-pixel basis), had very poor temporal quality. It was soon realised that, in order to improve the image quality without exceeding the target bit rate, fewer than one bit should be used to code each pixel. This was only possible if a group of pixels were coded together, such that the number of bits per pixel became fractional. This led to the design of so-called block-based codecs.
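The pixel-by-pixel nature of DPCM can be sketched in a few lines of Python. This is an illustrative toy, not taken from H.120 (the step size and sample values are invented): each pixel is predicted from the previously reconstructed pixel, and only the quantised prediction error is coded, so at least one symbol must be spent per pixel. This is precisely the limitation that motivated coding groups of pixels together.

```python
def dpcm_encode(pixels, step=8):
    # Each pixel is predicted from the previously reconstructed pixel;
    # only the uniformly quantised prediction error is transmitted.
    indices, prev = [], 0
    for p in pixels:
        e = p - prev                # prediction error
        q = round(e / step)         # uniform quantisation
        indices.append(q)
        prev += q * step            # track the decoder's reconstruction
    return indices

def dpcm_decode(indices, step=8):
    # Mirror of the encoder loop: accumulate dequantised errors.
    out, prev = [], 0
    for q in indices:
        prev += q * step
        out.append(prev)
    return out

samples = [100, 104, 110, 118, 116, 90]
rec = dpcm_decode(dpcm_encode(samples))
```

Because the predictor uses the reconstructed (not the original) previous pixel, quantisation errors do not accumulate: each reconstructed value stays within half a quantiser step of the input.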
During the late 1980s study period, of the 15 block-based videoconferencing proposals submitted to the telecommunication standardisation sector of the International Telecommunication Union (ITU-T formerly CCITT), 14 were based on the discrete cosine transform (DCT) and only one on vector quantisation (VQ). The subjective quality of video sequences presented to the panel showed hardly any significant differences between the two coding techniques. In parallel to the ITU-T's investigation during 1984–1988, the Joint Photographic Experts Group (JPEG) was also interested in compression of static images. They chose the DCT as the main unit of compression, mainly due to the possibility of progressive image transmission. JPEG's decision undoubtedly influenced the ITU-T in favouring DCT over VQ. By now there was worldwide activity in implementing the DCT in chips and on DSPs.
By the late 1980s it was clear that the recommended ITU-T videoconferencing codec would use a combination of interframe DPCM, for minimum coding delay, and the DCT. The codec showed greatly improved picture quality over H.120. In fact, the image quality for videoconferencing applications was found reasonable at 384 kbit/s or higher, and good quality was possible at significantly higher bit rates of around 1 Mbit/s. This effort, although originally directed at video coding at 384 kbit/s, was later extended to systems based on multiples of 64 kbit/s (p × 64 kbit/s, where p can take values from 1 to 30). The standard definition was completed in late 1989 and is officially called the H.261 standard (the coding method is often referred to as 'p × 64').
The success of H.261 was a milestone for low bit rate coding of video at reasonable quality. In the early 1990s, the Motion Picture Experts Group (MPEG) started investigating coding techniques for storage of video on media such as CD-ROMs. The aim was to develop a video codec capable of compressing highly active video such as movies, on hard discs, with a performance comparable to that of VHS home video cassette recorders (VCRs). In fact, the basic framework of the H.261 standard was used as a starting point in the design of the codec. The first generation of MPEG, called the MPEG-1 standard, was capable of accomplishing this task at 1.5 Mbit/s. Since, for storage of video, encoding and decoding delays are not a major constraint, one can trade delay for compression efficiency. For example, in the temporal domain a DCT might be used rather than DPCM, or DPCM might be used but with much improved motion estimation, such that the motion compensation removes temporal correlation. This latter option was adopted within MPEG-1.
It is ironic that in the development of H.261 motion compensation was thought to be optional, since it was believed that, after motion compensation, little would be left to be decorrelated by the DCT. However, later research showed that efficient motion compensation can reduce the bit rate. For example, it is difficult to compensate for the uncovered background unless one looks ahead at the movement of the objects. This was the main principle in MPEG-1, where the motion in most picture frames is estimated from both past and future frames, and this proved to be very effective.
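At the heart of motion compensation is block matching. The following NumPy sketch is purely illustrative (block size, search range and the test frames are invented, and real codecs use faster search strategies): it exhaustively searches a window of the reference frame for the displacement that minimises the sum of absolute differences (SAD) with the current block.

```python
import numpy as np

def full_search(ref, cur_block, top, left, search=4):
    # Exhaustive block matching: try every displacement (dy, dx) within
    # +/- search pixels and keep the one minimising the SAD between the
    # current block and the displaced reference block.
    n = cur_block.shape[0]
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > ref.shape[0] or x + n > ref.shape[1]:
                continue  # candidate falls outside the reference frame
            sad = int(np.abs(ref[y:y + n, x:x + n].astype(int)
                             - cur_block.astype(int)).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad

# Toy demo: the current block is a copy of a reference-frame region
# displaced by (2, 1), so a perfect match (SAD = 0) exists there.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(16, 16), dtype=np.uint8)
cur_block = ref[5:13, 6:14]                  # true location (5, 6)
mv, sad = full_search(ref, cur_block, top=3, left=5)
```

A backward (future-frame) search of exactly the same form is what lets a bidirectional predictor handle uncovered background: the newly revealed area is simply matched in the future frame where it is visible.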
These days, MPEG-1 decoders/players are becoming commonplace for multimedia on computers. MPEG-1 decoder plug-in hardware boards (e.g. MPEG magic cards) have been around for a few years, and software MPEG-1 decoders are now available with the release of new operating systems or multimedia extensions for PC and Mac platforms. Since in all standard video codecs only the decoders have to comply with the standard syntax, software-based coding has added extra flexibility that might even improve the performance of MPEG-1 in the future.
Although MPEG-1 was optimised for typical applications using noninterlaced video of 25 frames/s (in European format) or 30 frames/s (in North America) at bit rates in the range of 1.2–1.5 Mbit/s (for image quality comparable to home VCRs), it can certainly be used at higher bit rates and resolutions. Early versions of MPEG-1 for interlaced video, such as those used in broadcast, were called MPEG-1+. Broadcasters, who were initially reluctant to use any compression on video, fairly soon adopted a new generation of MPEG, called MPEG-2, for coding of interlaced video at bit rates of 4–9 Mbit/s. MPEG-2 is now well on its way to making a significant impact in a range of applications such as digital terrestrial broadcasting, digital satellite TV, digital cable TV, digital versatile disc (DVD) and many others. In November 1998, OnDigital of the UK started terrestrial broadcasting of BBC and ITV programmes in MPEG-2 coded digital forms, and almost at the same time several satellite operators such as Sky-Digital launched MPEG-2 coded television pictures direct to homes.
Since in MPEG-2 the number of bidirectionally predicted pictures is at the discretion of the encoder, this number may be chosen for an acceptable coding delay. This technique may then be used for telecommunication systems. For this reason ITU-T has also adopted MPEG-2 under the generic name of H.262 for telecommunications. H.262/MPEG-2, apart from coding high resolution and higher bit rate video, also has the interesting property of scalability, such that from a single MPEG-2 bit stream two or more video images at various spatial, temporal or quality resolutions can be extracted. This scalability is very important for video networking applications. For example, in applications such as video on demand, multicasting etc., the client may wish to receive video of his/her own quality choice, or in networking applications during network congestion less essential parts of the bit stream can be discarded without significantly impairing the received video pictures.
Following the MPEG-2 standard, coding of high definition television (HDTV) was seen to be the next requirement. This became known as MPEG-3. However, the versatility of MPEG-2, being able to code video of any resolution, left no place for MPEG-3, and hence it was abandoned. Although Europe has been slow in deciding whether to use HDTV, broadcast of HDTV with MPEG-2 compression in the USA has already started. It is foreseen that in the USA by the year 2014 the existing transmission of analogue NTSC video will cease and HDTV/MPEG-2 will be the only terrestrial broadcasting format.
After so much development on MPEG-1 and 2, one might wonder what is next. Certainly we have not yet addressed the question of sending video at very low bit rates, such as 64 kbit/s or less. This of course depends on the demand for such services. However, there are signs that in the very near future such demands may arise. For example, currently, owing to a new generation of modems allowing bit rates of 56 kbit/s or so over public switched telephone networks (PSTN), videophones at such low bit rates are needed. In the near future there will be demands for sending video over mobile networks, where the channel capacity is very scarce. In fact, the wireless industry is the main driving force behind low bit rate image/video compression. For instance, during the months of June and July 2002, about two million picture-phone sets were sold in Japan alone. A picture-phone is a digital photo camera that grabs still pictures, compresses them and sends them as a text file over the mobile network. On the video front, in October 2002 the Japanese company NTT DoCoMo announced the launch of its first handheld mobile video codec. The codec is the size of today's mobile phones, at a price of almost 350 US dollars.
To fulfil this goal, the MPEG group started working on a very low bit rate video codec, under the name of MPEG-4. Before achieving acceptable image quality at such bit rates, new demands arose. These were mainly caused by the requirements of multimedia, where there was a considerable demand for coding of multiviewpoint scenes, graphics and synthetic as well as natural scenes. Applications such as virtual studio and interactive video were the main driving forces. Ironically, critics say that since MPEG-4 could not deliver the very low bit rate codec that had been promised the goal posts have been moved.
Work on very low bit rate systems, driven by the requirements of PSTN and mobile applications, was carried out by ITU-T, and a new video codec named H.263 was devised to fulfil the goal of MPEG-4. This codec, an extension of H.261 that exploits lessons learned from the MPEG developments, is sophisticated enough to code small-dimensioned video pictures at low frame rates within 10–64 kbit/s. Over the years the compression efficiency of this codec has been improved steadily through several iterations and amendments; along the way it has been renamed H.263+ and then H.263++ to indicate the improvements. Owing to the very effective coding strategy used in this codec, the recommendation even defines its application to very high resolution images such as HDTV, albeit at higher bit rates.
Before leaving the subject of MPEG-4, I should add that today's effort on MPEG-4 is on functionality, since this is what makes MPEG-4 distinct from other coders. In MPEG-4 images are coded as objects, and the generated bit stream is scalable. This provides the possibility of interacting with video, choosing the parts that are of interest. Moreover, natural images can be mixed with synthetic video, in what is called a virtual studio. For coding of synthetic objects, MPEG-4 defines a new method based on models of the objects. It also uses the wavelet transform for coding of still images. However, MPEG-4, as part of its functionality for coding of natural images, uses a technique similar to H.263; hence it is now equally capable of coding video at very low bit rates.
The fruitful outcome of the MPEG-2/H.262 video codec under the joint effort of MPEG and the ITU encouraged the two standards bodies to collaborate further. In 1997 the ISO/IEC MPEG group joined the video coding experts group of the ITU-T and formed a Joint Video Team (JVT) to work on very low bit rate video. The project was called H.26L, with L standing for long-term objectives. The JVT objective is to create a single video coding standard to outperform the most optimised H.263 and MPEG-4 video codecs. The H.26L development is an ongoing activity, with the first version of the standard finalised at the end of the year 2002. In the end the H.26L codec will be called H.264 by the ITU-T community and MPEG-4 version 10 by the ISO/IEC MPEG group.
As we have seen, the video coding standards have evolved under the two brand names of H.26x and MPEG-x. The H.26x codecs are recommended by the telecommunication standardisation sector of the International Telecommunication Union (ITU-T). The ITU-T recommendations have been designed for telecommunications applications, such as videoconferencing and videotelephony. The MPEG-x products are the work of the International Organisation for Standardisation and the International Electrotechnical Commission, Joint Technical Committee number 1 (ISO/IEC JTC1). The MPEG standards have been designed mostly to address the needs of video storage (e.g. CD-ROM, DVD), broadcast TV and video streaming (e.g. video over the Internet). For the most part the two standardisation committees have worked independently on different standards. However, there have been exceptions, where their joint work resulted in standards such as H.262/MPEG-2 and H.26L. Figure 1.1 summarises the evolution of video coding standards by the two organisations and their joint effort from the beginning of 1984 until now (2003). The Figure also shows the evolution of still image coding under the joint work of ITU and ISO/IEC, best known as the JPEG group.
Figure 1.1: Evolution of video coding standards by the ITU-T and ISO/IEC committees
It should be noted that MPEG activity is not just confined to the compression of audio-visual contents. The MPEG committee has also been active in the other aspects of audio-visual information. For example, work on object or content-based coding of MPEG-4 has brought new requirements, in particular searching for content in image databases. Currently, a working group under MPEG-7 has undertaken to study these requirements. The MPEG-7 standard builds on the other standards, such as MPEG-1, 2 and 4. Its main function is to define a set of descriptors for multimedia databases to look for specific image/video clips, using image characteristics such as colour, texture and information about the shape of objects. These pictures may be coded by either of the standard video codecs, or even in analogue forms.
With the advances made on content description in MPEG-7 and coding and compression of contents under MPEG-4, it is now time to provide customers with efficient access to these contents. There is a need to produce specifications of standardised interfaces and protocols that allow customers to access the wide variety of content providers. This is the task undertaken by MPEG-21, under the name of the multimedia framework.
In this book, we start by reviewing briefly the basics of video, including scanning, the formation of colour components in various video formats and the quality evaluation of video. At the end of each chapter a few problems have been designed, either to cover some specific parts of the book in greater depth or for a better appreciation of those parts. Principles of the video compression techniques used in the standard codecs are given in Chapter 3. These include the three fundamental elements of compression: spatial, temporal and intersymbol redundancy reductions. The discrete cosine transform (DCT), as the core element of all the standard codecs, and its fast implementation are presented. Quantisation of the DCT coefficients for bit rate reduction is given. The most important element of temporal redundancy reduction, namely motion compensation, is discussed in this Chapter. Two variable length coding techniques for reduction of the entropy of the symbols, namely Huffman and arithmetic coding, are described. Special attention is paid to arithmetic coding, because of its role and importance in most recent video codecs. The Chapter ends with an overview of a generic interframe video codec, which is used in the following chapters to describe the various standard codecs.
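As a small taste of what Chapter 3 covers, the 2-D DCT and coefficient quantisation can be sketched with NumPy. The block contents (a smooth ramp) and the quantiser step of 16 are arbitrary illustrative choices:

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis matrix; C @ block @ C.T gives the 2-D DCT
    # of an n x n block, and C.T @ coef @ C inverts it exactly.
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] *= 1 / np.sqrt(2)
    return C * np.sqrt(2 / n)

C = dct_matrix()
block = 4.0 * np.add.outer(np.arange(8), np.arange(8))  # smooth ramp
coef = C @ block @ C.T              # energy packs into few coefficients
quantised = np.round(coef / 16)     # coarse uniform quantisation
rec = C.T @ (quantised * 16) @ C    # dequantise and inverse transform
```

Because the block is smooth, the transform packs all its energy into the first row and column of coefficients; after quantisation most of the 64 values are zero, which is exactly what the subsequent variable length (entropy) coding exploits.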
Due to the importance of wavelet coding in the new generation of standard codecs, Chapter 4 is specifically devoted to the description of the basic principles of wavelet-based image coding. The three well known techniques for compression of wavelet-based image coding (EZW, SPIHT and EBCOT) are presented. Their relative compression efficiencies are compared against each other.
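To give a flavour of what Chapter 4 builds on, one level of the simplest wavelet decomposition (the Haar filter pair, chosen here purely for brevity; practical codecs use longer filters) can be written directly in NumPy:

```python
import numpy as np

def haar2d(img):
    # One level of the 2-D Haar transform: averaging/differencing along
    # rows and then columns splits the image into an approximation (LL)
    # and three detail subbands (LH, HL, HH), each a quarter of the size.
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # vertical low-pass
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # vertical high-pass
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    hl = (a[:, 0::2] - a[:, 1::2]) / 2.0
    lh = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh
```

In smooth image regions the three detail subbands are close to zero; coders such as EZW and SPIHT owe much of their efficiency to representing these trees of near-zero coefficients very cheaply.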
Coding of still pictures, under the Joint Photographic Experts Group (JPEG), is presented in Chapter 5. Lossless and lossy compression versions of JPEG are described, as is baseline JPEG and its extension with sequential and progressive modes. The Chapter also includes a new standard for still image coding, under JPEG2000. Potential for improving the picture quality under this new codec and its new functionalities are described.
Chapter 6 describes the H.261 video codec for teleconferencing applications. The structure of picture blocks and the concept of the macroblock as the basic unit of coding are defined. Selection of the best macroblock type for efficient coding is presented. The Chapter examines the efficiency of zigzag scanning of the DCT coefficients for coding. The efficiency of two-dimensional variable length coding of zigzag scanned DCT coefficients is compared with one-dimensional variable length codes.
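The zigzag scan that Chapter 6 examines can be generated programmatically. This illustrative snippet orders the 64 positions of an 8 × 8 block diagonal by diagonal (low to high spatial frequency), reversing the direction of travel on alternate diagonals:

```python
def zigzag_order(n=8):
    # Zigzag scan order for an n x n coefficient block: sort positions by
    # anti-diagonal (i + j), reversing direction on alternate diagonals.
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

scan = zigzag_order()
# scan begins (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), ...
```

Visiting the low-frequency coefficients first means that, after quantisation, a block typically ends in a long run of zeros that a single end-of-block code can represent.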
Chapter 7 explains the MPEG-1 video coding technique for storage applications. The concept of group of pictures for flexible access to compressed video is explained. Differences between MPEG-1 and H.261, as well as the similarities, are highlighted. These include the nature of motion compensation and various forms of coding of picture types used in this codec. Editing, pause, fast forward and fast reverse picture tricks are discussed.
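The interplay between picture types and delay can be seen in a toy reordering function (the GOP pattern here is one common illustrative example, not mandated by MPEG-1): since B pictures are predicted from both neighbouring anchors, each anchor (I or P) must be transmitted before the B pictures that precede it in display order.

```python
def coding_order(gop):
    # Convert a display-order GOP string into coding/transmission order:
    # every B picture is held back until its future anchor has been sent.
    out, held_b = [], []
    for i, pic in enumerate(gop):
        if pic == 'B':
            held_b.append(i)
        else:                       # 'I' or 'P' anchor
            out.append(i)
            out.extend(held_b)
            held_b = []
    return out

display = 'IBBPBBP'
coded = ''.join(display[i] for i in coding_order(display))
# coded == 'IPBBPBB'
```

Holding back two B pictures means the encoder must buffer two frames before each anchor is coded, which is the delay-for-compression trade-off that makes B pictures acceptable for storage but a tunable cost for telecommunication.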
Chapter 8 is devoted to coding of high quality moving pictures with the MPEG-2 standard. The concepts of level and profile, with their applications, are defined. The two main concepts of interlacing and scalability, which distinguish this codec from MPEG-1, are given. The best predictions for the nonscalable codecs from the fields, frames and/or their combinations are discussed. On scalability, the three fundamental scalable codecs (spatial, SNR and temporal) are analysed, and the quality of some pictures coded using these methods is contrasted. Layered coding is contrasted against scalability, and the additional overhead due to scalability/layering is also compared against the nonscalable encoder. The Chapter ends with transmission of MPEG-2 coded video for broadcast applications and video over ATM networks, as well as its storage on the digital versatile disc (DVD).
Chapter 9 discusses H.263 video coding for very low bit rate applications. The fundamental differences and similarities between this codec and H.261 and MPEG-1/2 are highlighted. Special interest is paid to the importance of motion compensation in this codec. Methods of improving the compression efficiency of this codec under various optional modes are discussed. Rather than describing all the annexes (optional modes) one by one, we have tried to group them into meaningful categories, and the importance of each category, as well as of the options themselves, is described. The relative compression performance of this codec with a limited set of options is compared with that of the other video codecs. Since H.263 is an attractive video encoding tool for mobile applications, particular attention is paid to the transmission of H.263 coded video over unreliable channels. In this regard, error correction for transmission of video over mobile networks is discussed. Methods of improving the robustness of the codecs against channel errors are given, and methods for post processing and concealment of erroneous video are explained. Finally, the Chapter ends with the introduction of an improved successor of this codec under the name of H.26L. This codec is the joint work of the ITU-T and MPEG video coding expert groups and will be called H.264 by the ITU and MPEG-4 version 10 by the MPEG group.
In Chapter 10 a new method of video coding based on the image content is presented. The level and profiles set out for this codec are outlined. The concept of image plane that enables users to interact with the individual objects and change their characteristics is introduced. Methods for segmenting video frames into objects and their extractions are explained. Coding of arbitrary shaped objects with a particular emphasis on coding of their shapes is studied. The shape-adaptive DCT as a natural coding scheme for these objects is analysed.
Coding of synthetic objects with model-based coding and of still images with the wavelet transform is introduced. It is shown how video can be coded with the wavelet transform, and its quality is compared against H.263. Performance of frame-based MPEG-4 is also compared against H.263 for some channel error rates, using mobile and fixed network environments. The Chapter ends with scalability defined for content-based coding.
The book ends with a Chapter on content description, search and video browsing under the name MPEG-7. Various content search methods exploiting visual information such as colour, texture, shape and motion are described. Some practical examples for video search by textures and shapes are given. The Chapter ends with a brief look at the multimedia framework under MPEG-21, to define standards for easy and efficient use of contents by customers.