6.8 Handling Differences in Audio Signal Rate and Resolution

Because of the variety of sampling rates in use in digital audio, and the increasing use of sample resolutions beyond 16 bits, it is important to ensure that the audio signal retains optimum sound quality when it is converted digitally from one rate or resolution to another. Sample rate conversion has already been covered to some extent earlier in this chapter, since it is closely related to synchronization. There is little more to add here, except that two devices whose sampling rates differ by more than a tiny amount cannot normally be interconnected digitally; a sample rate convertor must be used between them. Differences in sample word length, however, are covered in more detail below.

The standard two-channel interface allows for up to 24 bits of audio per sample, linearly encoded in two's complement fixed-point form. Until recently only 16 of these were normally used, with the MSB of the 16-bit sample in the bit 27 position of the subframe and the remaining bits set to zero. The question now arises of how to cope with signals of, say, 18- or 20-bit resolution when they are digitally connected to devices of lower resolution. A number of techniques can be used to reduce a 20-bit signal to 16 bits. These range from straightforward truncation, through bit-shifting and redithering at the new resolution, to more sophisticated methods that round the truncation error intelligently using noise shaping. Professional digital audio devices may in future incorporate their own procedures for handling signals of higher resolution than their internal architecture allows, but at present some kind of external processing is normally needed at the point of interconnection.
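The following Python sketch illustrates the justification convention just described. It assumes the 24-bit mode, in which the audio word occupies time slots 4 to 27 of the subframe (the auxiliary bits carrying the audio LSBs), and that a shorter word is MSB-justified so that its MSB lands in slot 27 and its unused LSBs are zero. The function and constant names are hypothetical, chosen only for illustration. The last check shows that a receiver which simply reads the top 16 bits of a 20-bit word is performing truncation.

```python
# Hypothetical helpers illustrating MSB-justification in the AES3 audio field.
AUDIO_FIELD_BITS = 24   # time slots 4-27 when the aux bits carry audio
FIRST_AUDIO_SLOT = 4    # the LSB of the 24-bit field occupies time slot 4

def justify(sample: int, word_length: int) -> int:
    """Left-justify a two's complement sample of word_length bits in the
    24-bit field so that its MSB sits in time slot 27; unused LSBs are zero."""
    lo, hi = -(1 << (word_length - 1)), (1 << (word_length - 1)) - 1
    if not lo <= sample <= hi:
        raise ValueError("sample out of range for this word length")
    return (sample << (AUDIO_FIELD_BITS - word_length)) & 0xFFFFFF

def read(field: int, word_length: int) -> int:
    """Recover a sample as a receiver of word_length bits would: sign-extend
    the 24-bit field, then keep only its top word_length bits."""
    if field & 0x800000:          # sign bit (slot 27) set: negative value
        field -= 1 << AUDIO_FIELD_BITS
    return field >> (AUDIO_FIELD_BITS - word_length)

def slot_of_field_bit(bit: int) -> int:
    """Subframe time slot carrying bit `bit` (0 = LSB) of the 24-bit field."""
    return FIRST_AUDIO_SLOT + bit

print(hex(justify(1, 16)))              # 0x100: the 8 LSBs of the field are zero
print(slot_of_field_bit(23))            # 27: the MSB position
print(hex(read(justify(0x12345, 20), 16)))  # 0x1234: the 4 LSBs are discarded
```

Because every word length shares the same MSB position, a 16-bit receiver can read a 20-bit transmission without any rescaling; it simply loses the bits below its own resolution.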

Truncation is the worst possible solution: it simply discards the least significant bits of the word. Without redithering, the result is very unpleasant low-level distortion, because the truncation error is correlated with the signal. If a 20-bit source were connected digitally to a 16-bit destination without any intermediate processing, the four LSBs would normally just be truncated.
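A short Python sketch makes the problem visible. It truncates a hypothetical low-level sine (1.5 LSBs of the 16-bit word in amplitude, held as 20-bit samples, 48 samples per cycle; all values chosen only for illustration) by discarding the four LSBs. Python's `>>` on a negative integer is an arithmetic shift, which matches ignoring the low bits of a two's complement word: it always rounds towards minus infinity.

```python
import math

def truncate(sample: int, drop_bits: int) -> int:
    """Discard the drop_bits LSBs of a two's complement sample.
    The arithmetic shift rounds towards minus infinity."""
    return sample >> drop_bits

# Hypothetical test signal: one 16-bit LSB equals 16 steps of the 20-bit word.
PERIOD = 48
src = [round(1.5 * 16 * math.sin(2 * math.pi * n / PERIOD)) for n in range(PERIOD)]
out = [truncate(s, 4) for s in src]

# Truncation error in 20-bit LSBs: always between -15 and 0, never positive.
err = [o * 16 - s for o, s in zip(out, src)]

print(sorted(set(out)))       # → [-2, -1, 0, 1]: a coarse, asymmetric staircase
print(sum(err) / len(err))    # negative: the error also adds a DC offset
```

The sine collapses to four asymmetric levels and the error follows the waveform exactly, which is why it is heard as distortion rather than as noise.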

Adding dither noise in the digital domain at the point where resolution is reduced is an effective remedy for this distortion, and it has been implemented in some digital interface processors and in professional digital mixers. Some editors also offer a choice of dithering algorithms for this purpose. The process randomizes the quantizing error that results from word-length reduction by adding a pseudo-random number sequence of controlled amplitude and spectrum to the incoming audio data. In addition, if the programme material has not used the full dynamic range of the digital signal (if headroom has been left, for example), it may be possible to bit-shift the 20-bit samples upwards before redithering and truncating. That way more of the MSBs are used and less of the information contained in the LSBs is lost. This is achieved by a simple increase of gain in the digital domain prior to the 16-bit transfer, each one-bit shift corresponding to a gain of about 6 dB.
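Both steps can be sketched in Python as follows. This is a minimal illustration, not any particular product's algorithm: it assumes white dither with a triangular probability density (TPDF) of ±1 LSB peak at the new resolution, rounding to the nearest output step, and clipping to the 16-bit range. The function names and the test value are hypothetical. A steady input of a quarter of one 16-bit LSB disappears completely when truncated, but survives on average, as a noise-like signal, when redithered.

```python
import math
import random

def bit_shift_gain(samples, shift, word_bits=20):
    """Raise the level by `shift` bits (about 6.02 dB per bit) before
    word-length reduction. Only valid if the programme left enough headroom."""
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    shifted = [s << shift for s in samples]
    if any(not lo <= s <= hi for s in shifted):
        raise ValueError("not enough headroom for this shift: samples would clip")
    return shifted

def redither(sample, drop_bits, rng, out_bits=16):
    """Reduce word length by `drop_bits`, adding white TPDF dither with a
    peak amplitude of +/-1 LSB of the new (shorter) word before rounding."""
    step = 1 << drop_bits                          # one output LSB in input LSBs
    dither = rng.random() - rng.random()           # triangular PDF on (-1, 1)
    y = math.floor(sample / step + dither + 0.5)   # round to nearest output step
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    return max(lo, min(hi, y))                     # dither may push a near-FS sample over

# A steady 20-bit value of 4 is a quarter of one 16-bit LSB.
rng = random.Random(1)
x, n = 4, 20000
undithered = [x >> 4 for _ in range(n)]            # plain truncation: always 0
dithered = [redither(x, 4, rng) for _ in range(n)]
print(sum(undithered) / n)   # 0.0: the low-level signal has vanished
print(sum(dithered) / n)     # close to 0.25: preserved on average
```

Shaping the dither or the requantization error spectrally (noise shaping) is a refinement of the same idea and is not shown here.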

The 1992 revision of the AES3 interface standard makes provision for a much more careful definition of the sample word length, so that receiving devices can optimize the transfer of data from a transmitter of different resolution. Standardization work has also gone on within the EBU to determine how analog signal levels should relate to digital signal levels, especially since 20-bit recording can be used on the audio tracks of digital video recorders. The conclusion reached was that correct implementation of channel status byte 2 could not be relied upon in all devices using the AES3 interface, especially older equipment. The only practical course was therefore to use a fixed relationship between analog and digital levels, irrespective of the number of bits¹³.

Originally EBU Recommendation R-64 specified that analog alignment level (corresponding to a meter reading of PPM4, or 0 dBu electrically) should be set to read 12 dB below full scale (i.e. -12 dBFS) on a digital system. This was based on the dynamic range available from typical 16-bit convertors, and assumed that the level of the finished programme would be well controlled. Since then 16-bit convertor technology has improved, and because the same alignment had to be used for 20-bit systems, the new recommendation specifies alignment level as 18 dB below full scale (-18 dBFS). This provides an additional 6 dB of operational headroom.
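The arithmetic behind the two alignments can be sketched in Python. With alignment level at 0 dBu, full scale corresponds to +12 dBu under the original figure and +18 dBu under the revised one, which is where the extra 6 dB of headroom comes from. To show the cost in noise, the sketch assumes the ideal quantizing-noise figure of 6.02N + 1.76 dB for a full-scale sine with an N-bit convertor; real convertors fall short of this, so the numbers are upper bounds rather than measured values. The function names are hypothetical.

```python
def full_scale_dbu(alignment_dbfs: float, alignment_dbu: float = 0.0) -> float:
    """Analog level (dBu) that drives the digital system to 0 dBFS."""
    return alignment_dbu - alignment_dbfs

def dbu_to_dbfs(level_dbu: float, alignment_dbfs: float = -18.0,
                alignment_dbu: float = 0.0) -> float:
    """Digital level of an analog signal under a fixed alignment."""
    return level_dbu - full_scale_dbu(alignment_dbfs, alignment_dbu)

def ideal_snr_db(bits: int) -> float:
    """Ideal full-scale-sine to quantizing-noise ratio (6.02N + 1.76 dB)."""
    return 6.02 * bits + 1.76

for label, align in (("original -12 dBFS", -12.0), ("revised -18 dBFS", -18.0)):
    fs = full_scale_dbu(align)
    for bits in (16, 20):
        # Distance from alignment level down to the ideal quantizing-noise floor
        below = ideal_snr_db(bits) + align
        print(f"{label}: full scale = +{fs:.0f} dBu, "
              f"{bits}-bit noise floor {below:.1f} dB below alignment")
```

Under these ideal assumptions, moving from -12 to -18 dBFS costs a 16-bit system 6 dB of range below alignment level (about 80 dB instead of 86 dB), while a 20-bit system still leaves over 100 dB.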