10.12 MPEG-4 versus H.263

There is a strong similarity between simple profile MPEG-4 and H.263, such that a simple profile MPEG-4 decoder can decode a bit stream generated by a core H.263 encoder. In fact, most of the annexes introduced into H.263 after the year 2000 were parts of the profiles designed for MPEG-4 in the mid 1990s. However, the compatibility does not fully hold in the reverse direction: an H.263 decoder cannot in general decode a bit stream of even the simplest profile of MPEG-4.

The main source of incompatibility lies in the systems function of MPEG-4. Like its predecessor MPEG-2, MPEG-4 transports compressed audio-visual data in a packet structure. In MPEG-2 we saw that packetised elementary streams (PES) of audio, video and data are multiplexed into a transport packet stream. In MPEG-4, due to the existence of many objects (up to 256 visual objects plus audio and data), the transport stream becomes particularly important. Even if only one object is coded (frame-based mode), the same transport stream should be used. In particular, for transmission of MPEG-4 video over mobile networks, the packetised system has to add extra error resilience to the MPEG-4 bit stream. It is this packetised format of MPEG-4 that makes it nondecodable by an H.263 codec.

In MPEG-4 the elementary objects of the audio-visual media are fed to the synchronisation layer (SL), to generate SL-packets. An SL-packet has a resynchronisation marker, similar to the GOB and slice resynchronisation markers in H.263. However, in MPEG-4 the resynchronisation marker is inserted after a certain number of bits have been coded, whereas in H.263 the markers are added after a certain number of macroblocks. Since the number of bits generated per macroblock is variable, the distances in bits between the resynchronisation markers in H.263 are variable, while those of MPEG-4 are essentially fixed. Because a channel error can corrupt the data up to the next marker, a fixed marker distance bounds the damage each error can cause. This helps to reduce the effect of channel errors in MPEG-4, hence making it more error resilient.
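The difference between the two marker placement rules can be illustrated with a minimal sketch. It assumes, for illustration only, that markers always fall on macroblock boundaries, that a frame has 396 macroblocks, and that each macroblock produces a random number of bits; the function names, the 22-macroblock group size and the 4000-bit target are hypothetical and not taken from either standard.

```python
import random

def markers_every_n_macroblocks(mb_bits, n_mb):
    """Bit distances between markers placed after every n_mb macroblocks."""
    return [sum(mb_bits[i:i + n_mb]) for i in range(0, len(mb_bits), n_mb)]

def markers_every_n_bits(mb_bits, target_bits):
    """Bit distances between markers placed at the first macroblock
    boundary reached once at least target_bits have been coded."""
    distances, acc = [], 0
    for bits in mb_bits:
        acc += bits
        if acc >= target_bits:
            distances.append(acc)
            acc = 0
    if acc:  # leftover bits at the end of the frame
        distances.append(acc)
    return distances

random.seed(0)
# Hypothetical frame: 396 macroblocks, 10 to 400 bits each.
mb_bits = [random.randint(10, 400) for _ in range(396)]

mb_spaced = markers_every_n_macroblocks(mb_bits, 22)   # macroblock-count rule
bit_spaced = markers_every_n_bits(mb_bits, 4000)       # bit-count rule

print("macroblock-count rule:", min(mb_spaced), "to", max(mb_spaced), "bits")
print("bit-count rule:       ", min(bit_spaced[:-1]), "to", max(bit_spaced[:-1]), "bits")
```

Under these assumptions every complete segment produced by the bit-count rule lies between the target and the target plus one macroblock's worth of bits, whereas the segment lengths of the macroblock-count rule depend entirely on the picture content.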

A copy of the picture header is repeated in every SL-packet, to make the packets independent of each other. Each medium may generate more than one SL-packet. For example, if scalability is used, several interrelated packets are generated. Packets are then multiplexed into the output stream. The sync layer is unaware of the transport or delivery layer. It is interfaced to the delivery layer through the delivery multimedia integration framework (DMIF) application interface (DAI). The DAI is network independent but demands session setup and stream control functions. It also enables setting up quality of service for each stream.

The transport or delivery layer is delivery aware but unaware of media. MPEG-4 does not define any specific delivery layer. It relies mainly on the existing transport layers, such as RTP for the Internet, the MPEG-2 transport stream for wired and wireless networks, or ATM for B-ISDN networks.

There are also some small differences in the use of the coding tools in the two codecs. For example, the reversible VLC (RVLC) used in the simple profile of MPEG-4 is not exactly the same as the RVLC used in data partitioning of Annex V of H.263. In the former, RVLC is also used for DCT coefficients, and in the latter it is used for the non-DCT data, e.g. motion vectors, macroblock addresses etc. In Chapter 9 we showed that RVLC, due to its higher overhead over the conventional VLC, is not viable for the DCT coefficients. However, in the experiments of Chapter 9 macroblocks of the P-pictures were all interframe coded. Had any macroblock been intraframe coded, its retrieval through RVLC would have improved the picture quality.
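The reversibility property that both standards exploit can be sketched with a toy code table. The table below is hypothetical (it is not the RVLC table of MPEG-4 or of H.263 Annex V); it is chosen so that no codeword is a prefix or a suffix of another, which is what allows decoding from either end of the bit stream.

```python
# Hypothetical symmetric (palindromic) code table, for illustration only.
CODE = {"a": "0", "b": "11", "c": "101", "d": "1001"}

def encode(symbols):
    """Concatenate the codewords of the given symbols."""
    return "".join(CODE[s] for s in symbols)

def decode(bits, table):
    """Decode bit by bit; valid because the table is prefix-free.
    Any incomplete trailing codeword is silently dropped."""
    inverse = {word: sym for sym, word in table.items()}
    out, word = [], ""
    for bit in bits:
        word += bit
        if word in inverse:
            out.append(inverse[word])
            word = ""
    return out

def decode_backward(bits):
    """Decode from the end of the stream using the reversed codewords;
    valid because the table is also suffix-free."""
    reversed_table = {sym: word[::-1] for sym, word in CODE.items()}
    return decode(bits[::-1], reversed_table)[::-1]

message = list("abcdabdc")
bits = encode(message)
print(decode(bits, CODE) == message)      # forward decoding
print(decode_backward(bits) == message)   # backward decoding
```

If the start of a segment is damaged, a decoder can jump to the next resynchronisation marker and decode backwards; here `decode_backward(bits[-7:])` recovers the last two symbols without any of the preceding bits. The price is a longer average codeword than an optimal one-directional VLC would need, which is the overhead discussed above.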

These differences are significant enough to ask which of H.263 and MPEG-4 is more suitable for video over mobile networks. In the late 1990s, an international collaboration under the 3GPP (3rd Generation Partnership Project) project set up an extensive investigation to compare the performance of these two codecs [24]. The outcome of one of the experiments is shown in Figure 10.38.

Figure 10.38: Comparison between MPEG-4 and H.263

In this experiment wireless channels of 64 kbit/s were used, of which 7.6 kbit/s were assigned to the speech signal. The remaining bits were the target video bit rate for coding the Overtime video test sequence with each of the two codecs, including the overheads. The channel error rates were set to 10^-6, 2 × 10^-4 and 10^-3, for both fixed and mobile environments, identified by F and M in the figure, respectively. The H.263 codec was equipped with annexes D, F, I, J and N (see list of H.263 annexes in Chapter 9 [25]) and the MPEG-4 codec was the simple profile (see section 10.1). The quality of the reconstructed pictures after error concealment was subjectively evaluated. The average of the viewers' scores (see section 2.4), the mean opinion score (MOS), is plotted in Figure 10.38 against the tested error rates.
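To get a feel for these figures, the video bit budget and the average number of bit errors hitting the video stream each second can be worked out as below. This is a rough sketch that assumes the quoted error rates are bit error rates and that errors are independent and uniformly spread; real mobile channels tend to produce bursts, so the numbers are only indicative. The function and variable names are illustrative.

```python
CHANNEL_KBPS = 64.0   # total channel rate from the experiment
SPEECH_KBPS = 7.6     # share assigned to speech

def expected_bit_errors_per_second(rate_kbps, error_rate):
    """Average number of corrupted bits per second, assuming independent
    bit errors (an assumption, not a property of the tested channels)."""
    return rate_kbps * 1000 * error_rate

video_kbps = CHANNEL_KBPS - SPEECH_KBPS
print(f"video budget: {video_kbps:.1f} kbit/s")

for error_rate in (1e-6, 2e-4, 1e-3):
    errors = expected_bit_errors_per_second(video_kbps, error_rate)
    print(f"error rate {error_rate:g}: about {errors:.2f} bit errors per second")
```

At 10^-6 the video stream sees on average one bit error roughly every 18 seconds, so compression efficiency dominates the perceived quality; at 10^-3 it sees dozens of errors every second, so error resilience dominates.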

At the low error rate of 10^-6, H.263 outperforms MPEG-4 for both mobile and fixed environments. Thus it can be concluded that H.263 is a more compression efficient encoder than MPEG-4. Much of this is due to the use of RVLC in MPEG-4, since, as we have seen in Table 9.3, RVLC is less compression efficient than the conventional VLC. Other overheads, such as the packet headers and the more frequent resynchronisation markers, add to this inefficiency. On the other hand, at high error rates, the superior error resilience of MPEG-4 over H.263 compensates for its compression inefficiency, and in some cases the decoded video under MPEG-4 is perceived to be better than that of H.263.

Considering the above analysis, one cannot say for sure whether H.263 is better or worse than MPEG-4. In fact, what makes a bigger impact on the experimental results is not the basic H.263 or MPEG-4 definitions but the annexes or options of these two codecs. For example, had data partitioning (Annex V) been used in H.263, it would have performed as well as MPEG-4 at high error rates. Unfortunately, data partitioning in H.263 was introduced in 2001, and this study was carried out in 1997. Hence the 3GPP project fails to show the true difference between H.263 and MPEG-4.