2.12 Introduction to Compression

Compression, bit rate reduction and data reduction are all terms meaning basically the same thing in this context. In essence the same (or nearly the same) information is carried using a smaller quantity or rate of data. In transmission systems, compression allows a reduction in bandwidth and will generally result in a reduction in cost, or make possible some process that would be impracticable without it. If a given bandwidth is available to an uncompressed signal, compression allows faster-than-real-time transmission in the same bandwidth. Alternatively, compression allows a better-quality signal to be carried in a given bandwidth.

Compression is summarized in Figure 2.34. It will be seen in (a) that the data rate is reduced at source by the compressor. The compressed data is then passed through a communication channel and returned to the original rate by the expander. The ratio between the source data rate and the channel data rate is called the compression factor. The term coding gain is also used. Sometimes a compressor and expander in series are referred to as a compander. The compressor may equally well be referred to as a coder and the expander a decoder in which case the tandem pair may be called a codec.
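As a worked illustration, with the rates assumed for the example rather than taken from the text, a studio signal of 270 Mbit/s carried in a 6 Mbit/s channel gives:

$$\text{compression factor} = \frac{\text{source data rate}}{\text{channel data rate}} = \frac{270\ \text{Mbit/s}}{6\ \text{Mbit/s}} = 45:1$$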

Figure 2.34: In (a) a compression system consists of compressor or coder, a transmission channel and a matching expander or decoder. The combination of coder and decoder is known as a codec. (b) MPEG is asymmetrical since the encoder is much more complex than the decoder.

In audio and video compression, where the encoder is more complex than the decoder the system is said to be asymmetrical as in (b). The encoder needs to be algorithmic or adaptive whereas the decoder is 'dumb' and only carries out actions specified by the incoming bitstream. This is advantageous in applications such as broadcasting where the number of expensive complex encoders is small but the number of simple inexpensive decoders is large. In point-to-point applications the advantage of asymmetrical coding is not so great.

Although there are many different coding techniques, all of them fall into one of two basic categories: lossless and lossy coding. In lossless coding, the data from the expander is identical bit for bit with the original source data. Lossless coding is generally restricted to compression factors of around 2:1. A lossless coder cannot guarantee a particular compression factor and the channel used with it must be able to function with the variable output data rate.
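As an illustrative sketch only (run-length coding is just one simple lossless technique, not one singled out by the text), the following shows why a lossless coder cannot promise a fixed compression factor: the output size depends entirely on how redundant the input happens to be.

```python
# Minimal run-length coder: lossless, but with a data-dependent output size.

def rle_encode(data: bytes) -> bytes:
    """Encode as (count, value) pairs; counts are limited to 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

def rle_decode(encoded: bytes) -> bytes:
    """Expand (count, value) pairs back to the original data, bit for bit."""
    out = bytearray()
    for count, value in zip(encoded[::2], encoded[1::2]):
        out += bytes([value]) * count
    return bytes(out)

easy = bytes([0]) * 1000          # highly redundant input
hard = bytes(range(256)) * 4      # input with no runs at all

for name, src in (("easy", easy), ("hard", hard)):
    enc = rle_encode(src)
    assert rle_decode(enc) == src  # lossless: identical bit for bit
    print(name, len(src), "->", len(enc), "bytes")
```

The redundant input shrinks dramatically, while the input with no runs actually expands, which is why the channel must tolerate a variable output rate.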

In lossy coding, data from the expander is not identical bit for bit with the source data and as a result comparison of the input with the output is bound to reveal a difference. Lossy codecs are not suitable for computer data, but are used in MPEG as they allow greater compression factors than lossless codecs. Successful lossy codecs are those in which the errors are arranged so that a human viewer or listener finds them subjectively difficult to detect. Thus lossy codecs must be based on an understanding of psycho-acoustic and psycho-visual perception and are often called perceptive coders.

In perceptive coding, the greater the compression factor required, the more accurately must the human senses be modelled. Perceptive coders can be forced to operate at a fixed compression factor. This is convenient for practical transmission applications where a fixed data rate is easier to handle than a variable rate. Source data that results in poor compression factors on a given codec is described as difficult. The result of a fixed compression factor is that the subjective quality can vary with the 'difficulty' of the input material. Perceptive codecs should not be concatenated indiscriminately especially if they use different algorithms.
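A rough sketch of how a coder can be forced to a fixed rate, with zlib standing in for a real entropy coder and all signal values and the rate budget invented for the example: the quantizer step is raised until the coded size fits the budget, so 'difficult' material ends up more coarsely quantized and the subjective quality varies.

```python
import zlib
import numpy as np

def code_to_budget(samples: np.ndarray, budget_bytes: int):
    """Raise the quantizer step until the coded size fits the budget."""
    step = 1
    while True:
        quantized = np.round(samples / step).astype(np.int16)
        coded = zlib.compress(quantized.tobytes())
        if len(coded) <= budget_bytes or step >= 4096:
            return step, len(coded)
        step *= 2

rng = np.random.default_rng(0)
easy = 1000.0 * np.sin(np.arange(4096) * 2.0 * np.pi / 64.0)  # repeats exactly every 64 samples
hard = rng.normal(0.0, 1000.0, 4096)                          # noise-like, 'difficult' material

for name, signal in (("easy", easy), ("hard", hard)):
    step, size = code_to_budget(signal, budget_bytes=2000)
    print(f"{name}: quantizer step {step}, coded size {size} bytes")
```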

Although the adoption of digital techniques is recent, compression itself is as old as television. Figure 2.35 shows some of the compression techniques used in traditional television systems.

Figure 2.35: Compression is as old as television. (a) Interlace is a primitive way of halving the bandwidth. (b) Colour difference working invisibly reduces colour resolution. (c) Composite video transmits colour in the same bandwidth as monochrome.

One of the oldest techniques is interlace, which has been used in analog television from the very beginning as a primitive way of reducing bandwidth. Interlace is not without its problems, particularly in motion rendering. MPEG-2 supports interlace simply because legacy interlaced signals exist and there is a requirement to compress them. This should not be taken to mean that it is a good idea.

The generation of colour difference signals from RGB in video represents an application of perceptive coding. The human visual system (HVS) sees no change in quality although the bandwidth of the colour difference signals is reduced. This is because human perception of detail in colour changes is much less than in brightness changes. This approach is sensibly retained in MPEG.
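The following sketch (assuming BT.601 luma weights and 2:1 chroma subsampling in each direction, which MPEG calls 4:2:0) shows the saving: three full-resolution components become one full-resolution luminance signal plus two quarter-resolution colour difference signals.

```python
import numpy as np

def rgb_to_ycbcr_420(rgb: np.ndarray):
    """rgb: H x W x 3 float array in 0..1, with H and W even."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b   # luminance, full resolution
    cb = (b - y) * 0.564                     # colour difference signals
    cr = (r - y) * 0.713
    # Halve the colour-difference resolution by averaging 2x2 blocks.
    cb_sub = cb.reshape(cb.shape[0] // 2, 2, cb.shape[1] // 2, 2).mean(axis=(1, 3))
    cr_sub = cr.reshape(cr.shape[0] // 2, 2, cr.shape[1] // 2, 2).mean(axis=(1, 3))
    return y, cb_sub, cr_sub

rgb = np.random.rand(4, 4, 3)
y, cb, cr = rgb_to_ycbcr_420(rgb)
# Three full-resolution components become one full plus two quarter-resolution ones:
print(y.size + cb.size + cr.size, "samples instead of", rgb.size)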

Composite video systems such as PAL, NTSC and SECAM are all analog compression schemes that embed a subcarrier in the luminance signal so that colour pictures are available in the same bandwidth as monochrome. In comparison with a progressive scan RGB picture, interlaced composite video has a compression factor of 6:1: three full-bandwidth components are carried in the bandwidth of one, and interlace halves the bandwidth again.

In a sense MPEG-2 can be considered to be a modern digital equivalent of analog composite video as it has most of the same attributes. For example, the eight-field sequence of PAL subcarrier that makes editing difficult has its equivalent in the GOP (group of pictures) of MPEG.

In a PCM digital system the bit rate is the product of the sampling rate and the number of bits in each sample and this is generally constant. Nevertheless the information rate of a real signal varies. In all real signals, part of the signal is obvious from what has gone before or what may come later and a suitable receiver can predict that part so that only the true information actually has to be sent. If the characteristics of a predicting receiver are known, the transmitter can omit parts of the message in the knowledge that the receiver has the ability to recreate it. Thus all encoders must contain a model of the decoder.
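A minimal sketch of this idea using simple previous-sample prediction (a DPCM-style predictor chosen for illustration, not MPEG's actual prediction scheme): the encoder runs the same predictor the decoder will use and transmits only the unpredictable residual.

```python
# The encoder contains a model of the decoder: both use the same predictor.

def encode(samples):
    prediction = 0                         # decoder starts from the same state
    residuals = []
    for s in samples:
        residuals.append(s - prediction)   # send only the unpredictable part
        prediction = s                     # previous sample predicts the next
    return residuals

def decode(residuals):
    prediction = 0
    out = []
    for r in residuals:
        s = prediction + r
        out.append(s)
        prediction = s
    return out

samples = [10, 11, 12, 12, 13, 50, 51, 52]   # mostly predictable, one surprise
residuals = encode(samples)
assert decode(residuals) == samples
print(residuals)   # small values where the signal is predictable, large at the jump
```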

The difference between the information rate and the overall bit rate is known as the redundancy. Compression systems are designed to eliminate as much of that redundancy as practicable or perhaps affordable. One way in which this can be done is to exploit statistical predictability in signals. The information content or entropy of a sample is a function of how different it is from the predicted value. Most signals have some degree of predictability. A sine wave is highly predictable, because all cycles look the same. According to Shannon's theory, any signal that is totally predictable carries no information. In the case of the sine wave this is clear because it represents a single frequency and so has no bandwidth.
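The following sketch estimates first-order Shannon entropy for a quantized sine wave and for random noise (the signals are invented for the example). Prediction collapses the entropy of the predictable signal but not that of the noise, which anticipates the point made below about noise being difficult.

```python
import math
import random
from collections import Counter

def entropy_bits_per_sample(values):
    """First-order (Shannon) entropy of a list of discrete values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def prediction_residuals(values):
    """What remains after predicting each sample from the previous one."""
    return [b - a for a, b in zip(values, values[1:])]

random.seed(0)
n = 10000
sine  = [round(8 * math.sin(2 * math.pi * i / 50)) for i in range(n)]  # predictable
noise = [random.randint(-8, 8) for _ in range(n)]                      # unpredictable

for name, sig in (("sine", sine), ("noise", noise)):
    print(f"{name}: raw {entropy_bits_per_sample(sig):.2f} bits/sample, "
          f"residual {entropy_bits_per_sample(prediction_residuals(sig)):.2f} bits/sample")
```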

At the opposite extreme a signal such as noise is completely unpredictable and as a result all codecs find noise difficult. There are two consequences of this characteristic. First, a codec designed using the statistics of real material should not be tested with random noise because it is not a representative test. Second, a codec that performs well with clean source material may perform badly with source material containing superimposed noise. Most practical compression units require some form of preprocessing before the compression stage proper, and appropriate noise reduction should be incorporated into the preprocessing if noisy signals are anticipated. It will also be necessary to restrict the degree of compression applied to noisy signals.

All real signals fall mid-way between the extremes of total predictability and total unpredictability or noisiness. If the bandwidth (set by the sampling rate) and the dynamic range (set by the word length) of the transmission system are used to delineate an area, this sets a limit on the information capacity of the system. Figure 2.36(a) shows that most real signals only occupy part of that area. The signal may not contain all frequencies, or it may not have full dynamics at certain frequencies.

Figure 2.36: (a) A perfect coder removes only the redundancy from the input signal and results in subjectively lossless coding. If the remaining entropy is beyond the capacity of the channel some of it must be lost and the codec will then be lossy. An imperfect coder will also be lossy as it fails to keep all entropy. (b) As the compression factor rises, the complexity must also rise to maintain quality. (c) High compression factors also tend to increase latency or delay through the system.

Entropy can be thought of as a measure of the actual area occupied by the signal. This is the area that must be transmitted if there are to be no subjective differences or artefacts in the received signal. The remaining area is called the redundancy because it adds nothing to the information conveyed. Thus an ideal coder could be imagined which miraculously sorts out the entropy from the redundancy and only sends the former. An ideal decoder would then recreate the original impression of the information quite perfectly.

As the ideal is approached, the coder complexity and the latency or delay both rise. Figure 2.36(b) shows how complexity increases with compression factor and (c) shows how increasing the codec latency can improve the compression factor. Obviously we would have to provide a channel that could accept whatever entropy the coder extracts in order to have transparent quality. As a result, moderate coding gain that only removes redundancy need not cause artefacts and results in systems described as subjectively lossless.

If the channel capacity is not sufficient for that, then the coder will have to discard some of the entropy and with it useful information. Larger coding gain that removes some of the entropy must result in artefacts. It will also be seen from Figure 2.36 that an imperfect coder will fail to separate the redundancy and may discard entropy instead, resulting in artefacts at a suboptimal compression factor.

Giving each signal its own variable-rate transmission is unrealistic in broadcasting, where fixed channel allocations exist. The variable-rate requirement can be met by combining several compressed channels into one constant-rate transmission in a way that flexibly allocates data rate between the channels. Provided the material is unrelated, the probability of all channels reaching peak entropy at once is very small and so those channels that are at one instant passing easy material will free up transmission capacity for those channels that are handling difficult material. This is the principle of statistical multiplexing.
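A toy sketch of the principle (all rates invented for the example): a fixed total bit rate is shared among several channels in proportion to their instantaneous demand, so easy material frees capacity for difficult material while the multiplex rate stays constant.

```python
def allocate(total_rate_mbps, demands_mbps):
    """Share the fixed total rate among channels in proportion to their demands."""
    total_demand = sum(demands_mbps)
    return [total_rate_mbps * d / total_demand for d in demands_mbps]

total_rate = 20.0                    # one constant-rate transmission
# Instantaneous demand of four programme channels at two moments in time:
moment_a = [2.0, 3.0, 2.5, 2.5]      # all channels passing easy material
moment_b = [8.0, 2.0, 1.5, 1.5]      # channel 1 is handling difficult material

for demands in (moment_a, moment_b):
    shares = allocate(total_rate, demands)
    print([round(s, 2) for s in shares], "Mbit/s, total", round(sum(shares), 2))
```

In both moments the shares sum to the same 20 Mbit/s multiplex rate; only the division between channels changes.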

Lossless codes are less common for audio and video coding where perceptive codes are permissible. The perceptive codes often obtain a coding gain by shortening the word length of the data representing the signal waveform. This must increase the noise level and the trick is to ensure that the resultant noise is placed at frequencies where human senses are least able to perceive it. As a result although the received signal is measurably different from the source data, it can appear the same to the human listener or viewer at moderate compression factors. As these codes rely on the characteristics of human sight and hearing, they can only fully be tested subjectively.
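A simple sketch of word-length shortening (the signal and word lengths are invented for the example): dropping least significant bits raises the requantizing noise floor. Deciding where that noise can be placed so the viewer or listener cannot perceive it is the job of the perceptual model, which is not attempted here.

```python
import math

def requantize(sample: int, bits: int) -> int:
    """Shorten a 16-bit sample to the given word length by dropping LSBs."""
    shift = 16 - bits
    return (sample >> shift) << shift

signal = [int(20000 * math.sin(2 * math.pi * i / 100)) for i in range(1000)]

for bits in (16, 12, 8):
    error = [s - requantize(s, bits) for s in signal]
    rms = math.sqrt(sum(e * e for e in error) / len(error))
    print(f"{bits:2d} bits: rms requantizing noise {rms:8.1f}")
```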

The compression factor of such codes can be set at will by choosing the bit rate of the compressed data. Whilst mild compression will be undetectable, with greater compression factors, artefacts become noticeable. Figure 2.36 shows this to be inevitable from entropy considerations.


