4.3 Standard Two-Channel Interface Principles

Common to all the international standards for a two-channel interface is the data format of the subframe containing samples of audio data for each channel. There are two principal electrical approaches used for the standard two-channel interface: one is unbalanced and uses relatively low voltages, the other is balanced and uses higher voltages. AES3-ID-2001 also describes an unbalanced coaxial link for use over distances beyond 100 m (see section 4.3.6).

4.3.1 Data Format

The interface is serial and self-clocking. That is to say, two channels of audio data are carried in multiplexed fashion over the same communications channel, and the data is combined with a clock signal in such a way that the clock may be extracted at the receiver and used to synchronize reception. As shown in Figure 4.1, one frame of data is divided into two subframes, handling channels 1 and 2 respectively. Channels 1 and 2 may be independent mono signals or the left and right channels of a stereo pair. They are separately identified by the preamble that occupies the first four bit periods (time slots) of each subframe. Samples of channels 1 and 2 are transmitted alternately and in real time, such that two subframes are transmitted within the time period of one audio sample; the data rate of the interface therefore depends on the prevailing audio sampling rate.
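The dependence of the data rate on sampling rate can be checked numerically. The sketch below assumes 32 time slots per subframe, which is the total of the fields listed in the next paragraph (4 preamble, 4 auxiliary, 20 audio and the V, U, C and P bits), and two subframes per frame. The function name is hypothetical.

```python
SLOTS_PER_SUBFRAME = 32   # 4 preamble + 4 aux + 20 audio + V, U, C, P
SUBFRAMES_PER_FRAME = 2   # channel 1 and channel 2

def interface_rates(fs_hz: int) -> dict:
    """Return frame rate, bit rate and bi-phase coding clock for a sampling rate."""
    frame_rate = fs_hz                                   # one frame per sample period
    bit_rate = frame_rate * SUBFRAMES_PER_FRAME * SLOTS_PER_SUBFRAME
    coding_clock = 2 * bit_rate                          # two half-bit cells per bit
    return {"frame_rate": frame_rate, "bit_rate": bit_rate, "coding_clock": coding_clock}

for fs in (44_100, 48_000, 96_000):
    r = interface_rates(fs)
    print(f"{fs} Hz: {r['bit_rate'] / 1e6:.4f} Mbit/s, "
          f"coding clock {r['coding_clock'] / 1e6:.4f} MHz")
```

At 48 kHz this gives 3.072 Mbit/s and a 6.144 MHz coding clock. That is 128 times the frame rate, the same multiple that appears in the impedance specification discussed in section 4.3.3.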

Figure 4.1: Format of the standard two-channel interface frame.

The subframe format consists of a sync preamble, four auxiliary bits (which may be used for additional audio resolution), 20 audio sample bits in linear two's complement form, a validity bit (V), a user bit (U), a channel status bit (C) and a parity bit (P). The audio data is transmitted least significant bit first, and any unused LSBs are set to zero; thus the MSB of the audio sample, whatever the resolution, is always in the MSB position. The remaining non-audio bits are discussed in later sections.
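The sketch below assembles the non-preamble part of one subframe from these fields. It numbers the 32 time slots 0 to 31, as the AES3 standard itself does: preamble 0 to 3, auxiliary 4 to 7, audio 8 to 27, then V, U, C and P in 28 to 31. It also assumes that the auxiliary bits are sent LSB first. The function name and its arguments are hypothetical. Shorter words are left-justified, so the MSB always lands in slot 27 and unused LSBs are zero, as described above.

```python
def pack_subframe_payload(sample: int, word_length: int = 20, aux: int = 0,
                          v: int = 0, u: int = 0, c: int = 0) -> list:
    """Return time slots 4..31 of one subframe as 28 bits (preamble excluded)."""
    if not 1 <= word_length <= 20:
        raise ValueError("this sketch only handles words of 1 to 20 bits")
    limit = 1 << (word_length - 1)
    if not -limit <= sample < limit:
        raise ValueError("sample does not fit the stated word length")
    # Left-justify into the 20-bit field (two's complement); unused LSBs become 0.
    field = (sample << (20 - word_length)) & 0xFFFFF
    aux_bits = [(aux >> i) & 1 for i in range(4)]         # slots 4-7
    audio_bits = [(field >> i) & 1 for i in range(20)]    # slots 8-27, LSB first
    bits = aux_bits + audio_bits + [v, u, c]              # slots 4-30
    bits.append(sum(bits) % 2)                            # slot 31: even parity
    return bits

payload = pack_subframe_payload(-1, word_length=16)
print(payload[4:8], payload[23])  # [0, 0, 0, 0] 1  (unused audio LSBs, then the MSB)
```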

The data is combined with a clock signal of twice the bit rate using a simple coding scheme known as bi-phase mark, in which a transition occurs at the boundary of each bit cell (see Figure 4.2). An additional transition is introduced in the middle of any bit cell set to binary '1'. This scheme eliminates almost all DC content from the signal, which makes transformer coupling possible if necessary, and it tolerates phase inversion of the data signal. (Only the presence of a transition matters, not its direction.) This channel code is the same as that used for SMPTE/EBU timecode.
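The following is a minimal sketch of the coding rule. It assumes each bit cell is represented by two half-cell logic levels, and the function names are hypothetical. The decoder only checks whether the two halves of a cell differ. For that reason, an inverted signal, for example one with pins 2 and 3 swapped, decodes to the same data.

```python
def biphase_mark_encode(bits, start_level=0):
    """Encode bits as a list of half-cell levels (two per bit)."""
    level = start_level
    cells = []
    for bit in bits:
        level ^= 1                # transition at every bit-cell boundary
        cells.append(level)
        if bit:
            level ^= 1            # extra mid-cell transition for a '1'
        cells.append(level)
    return cells

def biphase_mark_decode(cells):
    """Recover bits: a '1' is a cell whose two halves differ."""
    return [int(cells[i] != cells[i + 1]) for i in range(0, len(cells) - 1, 2)]

data = [1, 0, 1, 1, 0]
line = biphase_mark_encode(data)
inverted = [1 - c for c in line]              # phase-inverted line signal
print(line)                                   # [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
print(biphase_mark_decode(inverted) == data)  # True
```

A practical receiver would also verify the boundary transitions and use them to recover the clock. This sketch assumes cell alignment is already known.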

Figure 4.2: An example of the bi-phase mark channel code.

As shown in Figure 4.3, there are three possible subframe preambles. They occupy the first four time slots and deliberately violate the rules of the modulation scheme to provide a clearly recognizable sync point when the data is decoded. Because of these violations, the preambles cannot be confused with the data portion of the subframe. In AES3 these are called 'X', 'Y' and 'Z' preambles, but in IEC 60958 they are primarily labelled 'M', 'W' and 'B'. As the diagram shows, the X and Y preambles identify subframes of channels 1 and 2 respectively. The Z preamble replaces the X preamble once every 192 frames to mark the beginning of a new channel status block (see section 4.8). Since the parity bit that ends the previous subframe is set for even parity, the transition at the start of each preamble will always be in the same (positive) direction. A phase-inverted preamble must nonetheless be decoded correctly.
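The sketch below tabulates the three preambles as eight half-cell levels each, in the form that follows a low line level (the normal case given even parity). It identifies a received preamble in either polarity. The patterns are those given in AES3, and the helper names are hypothetical. Each pattern contains a run of three equal half-cells. Bi-phase mark data can never produce such a run, because every bit-cell boundary forces a transition after at most two equal halves.

```python
# Half-cell levels for each preamble when the preceding level is 0.
PREAMBLES = {
    "X": (1, 1, 1, 0, 0, 0, 1, 0),  # channel 1 subframe (IEC 60958 'M')
    "Y": (1, 1, 1, 0, 0, 1, 0, 0),  # channel 2 subframe (IEC 60958 'W')
    "Z": (1, 1, 1, 0, 1, 0, 0, 0),  # channel 1, start of block (IEC 60958 'B')
}

def longest_run(cells):
    """Length of the longest run of identical half-cell levels."""
    best = run = 1
    for prev, cur in zip(cells, cells[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

def identify_preamble(cells):
    """Return 'X', 'Y' or 'Z' for a matching 8-cell pattern in either polarity, else None."""
    cells = tuple(cells)
    for name, pattern in PREAMBLES.items():
        if cells == pattern or cells == tuple(1 - c for c in pattern):
            return name
    return None

print(identify_preamble((0, 0, 0, 1, 1, 0, 1, 1)))   # Y (received phase-inverted)
print([longest_run(p) for p in PREAMBLES.values()])  # [3, 3, 3]
```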

Figure 4.3: Three different preambles (X, Y and Z) are used to synchronize a receiver at the starts of subframes.

The parity bit is set so that the number of ones in the subframe, excluding the preamble, is even. It can therefore detect single-bit errors (in fact any odd number of errors) but cannot correct them. The scheme cannot detect an even number of errors in the subframe, because parity appears to be correct in that case. As discussed in section 6.9, there are more effective ways of detecting poor links than using the parity bit.
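The limitation is easy to demonstrate. The sketch below uses arbitrary example bits and hypothetical helper names. It flips one bit and then two bits in a 28-bit payload (time slots 4 to 31) and checks parity each time.

```python
def parity_ok(slots_4_to_31):
    """True if the 28 non-preamble bits contain an even number of ones."""
    return sum(slots_4_to_31) % 2 == 0

def flip(bits, *positions):
    """Return a copy of bits with the given positions inverted (simulated errors)."""
    out = list(bits)
    for p in positions:
        out[p] ^= 1
    return out

data = [1, 0, 1, 1, 0] * 5 + [0, 1]        # 27 arbitrary bits: slots 4-30
subframe = data + [sum(data) % 2]          # append even-parity bit (slot 31)
print(parity_ok(subframe))                 # True
print(parity_ok(flip(subframe, 3)))        # False: single error detected
print(parity_ok(flip(subframe, 3, 17)))    # True: double error goes unnoticed
```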

4.3.2 Audio Resolution

In normal operation only the 20-bit audio field of the subframe carries audio data. This is adequate for most professional and consumer purposes. If necessary, however, the standard allows the four auxiliary bits to be replaced by additional audio LSBs, which raises the maximum resolution to 24 bits. AES3-1992 provides a facility within the channel status data for signalling the actual number of audio bits used in the transmitted data, so that receiving equipment can adjust to decode them appropriately. This is of considerable importance in ensuring optimum transfer of audio quality between devices of different resolutions during post-production, as discussed in section 6.8.
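When the auxiliary bits carry audio, a 24-bit sample is effectively split in two. Its four LSBs go in the auxiliary slots and its upper 20 bits go in the normal audio field. The following is a minimal sketch, assuming two's complement integers and hypothetical function names. It also shows that a receiver ignoring the auxiliary bits sees the sample truncated to 20 bits.

```python
def split_24bit(sample: int) -> tuple:
    """Split a signed 24-bit sample into (aux nibble, 20-bit audio field)."""
    if not -(1 << 23) <= sample < (1 << 23):
        raise ValueError("sample out of 24-bit range")
    word = sample & 0xFFFFFF           # 24-bit two's complement pattern
    return word & 0xF, word >> 4       # 4 LSBs to aux, 20 MSBs to audio field

def join_24bit(aux: int, audio20: int) -> int:
    """Reassemble the signed 24-bit sample at a 24-bit-capable receiver."""
    word = (audio20 << 4) | aux
    return word - (1 << 24) if word & (1 << 23) else word

def as_20bit(audio20: int) -> int:
    """Interpret the audio field alone, as a 20-bit-only receiver would."""
    return audio20 - (1 << 20) if audio20 & (1 << 19) else audio20

s = 0x123456
aux, audio = split_24bit(s)
print(hex(aux), hex(audio))          # 0x6 0x12345
print(join_24bit(aux, audio) == s)   # True
print(as_20bit(audio) == s >> 4)     # True: plain truncation of the 4 LSBs
```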

In consumer formats, the category code that describes the source device (see section 4.8.6) may also imply a fixed audio word length, because certain categories only operate at a particular resolution. The Compact Disc, for example, always uses a 16-bit word length.

4.3.3 Balanced Electrical Interface

All the standards referring to a professional or 'broadcast use' interface specify a balanced electrical interface conforming to CCITT Rec. V.11¹³. There are distinct similarities between this and the RS-422A standard¹⁴. The two are not identical, although RS-422 drivers and receivers are used in many cases. Figure 4.4 shows a circuit designed for better isolation and electrical balance than the basic CCITT specification, as suggested in AES3-1992. Transformers are not a mandatory feature of all the standards, but they are advisable because they provide true electrical isolation between devices and help to reduce electromagnetic interference problems. (Manufacturers often connect an RS-422 driver directly between the two legs of the source, which makes it balanced but not floating. Alternatively, an RS-485 driver is used. This is a tri-state version of RS-422 with a typical output voltage of 4 V ±5%, and it goes to a high-impedance state when turned off.) The standards specify the conventional audio three-pin XLR connector (IEC 268-12), with pin 1 as the shield and pins 2 and 3 carrying the balanced data signal. Polarity is not really important, since the channel code is designed to allow phase inversion. By convention, however, pin 2 is '+' and pin 3 is '-'.

Figure 4.4: Recommended electrical circuit for use with the standard two-channel interface.

The original AES3 standard allowed up to four receivers to be connected to one transmitter, but this is now regarded as inadvisable because of the resulting impedance mismatch. Originally the standard called for a transmitter output impedance of 110 ohms ±20% over the range 0.1 to 6 MHz and a receiver impedance of 250 ohms. AES3-1992 changed this so that the receiver's impedance should now be the same as that of the transmitter and the transmission line. Amendment 3 (1999) modifies the specification to accommodate frame rates higher than originally envisaged, which are increasingly common as a result of high sampling frequencies such as 96 kHz. The standard now specifies that impedance should be maintained within the defined limits between 100 kHz and 128 times the maximum frame rate. Only one receiver should be connected across each line, and distribution amplifiers should be used to feed large numbers of receivers from a single source. The cable's characteristic impedance, originally specified as between 90 and 120 ohms, is now also specified as 110 ohms. The cable should be a balanced, screened pair. Standard audio cables are often used successfully, but for large installations and long distances it is worth considering cable with better-controlled characteristics, in order to improve the reliability and integrity of the link (see section 6.7.2). This is especially true at sampling frequencies above 48 kHz, where the choice of cable and the maximum cable length become increasingly critical.
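To make the frequency range concrete, the sketch below computes the band over which the 110 ohm impedance must hold for a given maximum frame rate. It also checks a measured value against the ±20% tolerance quoted above. Whether that tolerance applies unchanged in every revision of the standard is an assumption here. The function names are hypothetical.

```python
NOMINAL_OHMS = 110.0
TOLERANCE = 0.20   # assumed ±20%, taken from the original source-impedance figure

def impedance_band_hz(max_frame_rate_hz: float) -> tuple:
    """Frequency range (low, high) over which the impedance limits apply."""
    return 100e3, 128 * max_frame_rate_hz

def impedance_within_tolerance(z_ohms: float) -> bool:
    """True if a measured impedance lies within 110 ohms ±20%."""
    return abs(z_ohms - NOMINAL_OHMS) <= TOLERANCE * NOMINAL_OHMS

for fs in (48_000, 96_000, 192_000):
    low, high = impedance_band_hz(fs)
    print(f"{fs} Hz frames: {low / 1e3:.0f} kHz to {high / 1e6:.3f} MHz")
print(impedance_within_tolerance(95.0), impedance_within_tolerance(140.0))  # True False
```

At a 48 kHz frame rate the upper limit is 6.144 MHz, close to the original 6 MHz figure. At 96 kHz the upper limit doubles to 12.288 MHz.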

Driver voltage levels differ between the original and later versions of the standard. AES3-1985 and all the related standards specified a peak-to-peak amplitude of between 3 and 10 volts, measured across a 110 ohm resistor without the connecting cable present. The 1992 revision changed this to between 2 and 7 volts, to conform more closely to the specifications of the RS-422 driver chips used in many systems. (RS-422A in fact specifies that receiver inputs should not be damaged by voltages of less than 12 volts.)

At the receiving end, the standards all indicate that the data should be decoded correctly provided that the eye pattern (see section 3.2.4) of the received data is no worse than that shown in Figure 4.5. This implies a minimum peak-to-peak amplitude of 200 mV and allows a certain amount of jitter in the time domain to be tolerated. Without equalization, the balanced interface should be capable of error-free communication over distances of at least 100 m at a 48 kHz sampling frequency, and often further. The achievable distance depends to some extent on the type of cable, the electromagnetic environment, the integrity of the transmission line, the frame rate and the quality of the data recovery in the receiver. All other factors being equal, maximum cable lengths should be expected to be shorter at high frame rates. Receivers vary widely in their ability to lock to an unstable data signal that has been distorted over the link, so an interconnect that works badly with one receiver may be satisfactory with another. Devices are available that give some indication of the quality of the received data signal, so that the user can tell how close the link is to failure (see section 6.9).

Figure 4.5: The minimum eye pattern acceptable for correct decoding of standard two-channel data.

The signal can be equalized at the receiver to compensate for high-frequency losses over long links, and the standards suggest the curve shown in Figure 4.6 for use at the 48 kHz sampling frequency. It has been suggested¹⁵, however, that as cable lengths increase the loss characteristic approaches a second-order curve before problems occur, and that a second-order equalization characteristic is therefore often more effective.

Figure 4.6: EQ characteristic recommended by the AES to improve reception in the case of long lines (basic sampling rate).

4.3.4 Unbalanced Electrical Interface

The unbalanced interface described in this section is commonly found on consumer and semi-professional equipment. It has also become widely used as a stereo interface on computer sound cards, probably because of the compact size of the connector. The unbalanced electrical interface specified originally in IEC 958 and EIAJ CP-340/1201 is not a feature of professional standards such as AES3. IEC 958 did not originally state explicitly that the unbalanced interface was intended for consumer use; it simply called it 'unbalanced line (two-wire transmission)'. Nevertheless, operational convention and the origin of the SPDIF interface on which it was based established that the unbalanced two-wire interface, terminating in RCA phono connectors, was for consumer applications. Interestingly, EIAJ CP-340 noted that the unbalanced interface and the optical fibre interface applied only to Type II (consumer) transmissions, although it did not say that the balanced interface was only for professional purposes. These confusions are resolved in IEC 60958, which in Part 3 clearly indicates the use of the unbalanced or optical interfaces for consumer applications.

The unbalanced interface is shown in Figure 4.7. IEC 60958 (1999) specifies a source impedance of 75 ohms ±20% for this interface, between 0.1 and 6 MHz, and a termination impedance of 75 ohms ±5%. Like AES3, it is being revised to account for higher sampling frequencies and will therefore state an upper limit of 128 times the maximum frame rate. It specifies a characteristic cable impedance of 75 ohms ±35%. The cable is normally a standard audio coaxial cable, and the interface is typically used to interconnect consumer equipment over the sorts of distances involved in hi-fi systems. The standard does not specify a maximum length over which communication can be expected to succeed. It does, however, give an eye pattern limit for correct decoding, essentially the same as that of AES3, and specifies a minimum peak-to-peak input voltage at the receiver of 200 mV. A significant difference from the balanced interface is that the source signal amplitude should be only 0.5 V ±20% peak-to-peak, much lower than that of the balanced interface. Video-type 75 ohm coaxial cable, though, exhibits very low losses below about 10 MHz, so significant distances should be achievable before the signal level falls below the specified minimum.

Figure 4.7: The consumer electrical interface (transformer and capacitor are optional but may improve the electrical characteristics of the interface).

It used to be said by some that because the unbalanced interface was a coaxial transmission line with well-controlled impedances it formed a better link than the balanced interface. This was always offset by the advantages of a balanced line in rejecting interference and the higher voltages used in the balanced interface. Now that the balanced interface specifies source and termination impedances to be the same, requires point-to-point connection, and recommends 110 ohm cable (rather than anything between 90 and 120 ohms), the balanced interface has the benefits of a good transmission line as well as its other advantages.

4.3.5 Optical Interface

An optical interface was introduced as a possibility in IEC 958 but was left 'under consideration'. Surprisingly, perhaps, this still seems to be the case in IEC 60958. The interface was specified more explicitly in EIAJ CP-340 (or CP-1201), where it applies only to Type II data. It consists of a transmitter with a wavelength of 660 nm ±30 nm and a power of between -15 and -21 dBm. Receivers should still interpret the data correctly when the optical input power is -27 dBm. The connector indicated conforms to the specification laid out in EIAJ RCZ-6901.
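Expressed as a link budget, these figures leave between 6 dB (weakest permitted transmitter) and 12 dB (strongest) for losses in the fibre and connectors. The sketch below shows that arithmetic under simple assumptions: it ignores ageing, temperature and any margin a designer would normally hold back. The function names are hypothetical.

```python
def dbm_to_microwatts(p_dbm: float) -> float:
    """Convert optical power in dBm (relative to 1 mW) to microwatts."""
    return 1000.0 * 10 ** (p_dbm / 10)

RX_MIN_DBM = -27.0   # receiver must still decode correctly at this input power

def loss_budget_db(tx_dbm: float) -> float:
    """Allowable fibre and connector loss before the receiver limit is reached."""
    return tx_dbm - RX_MIN_DBM

for tx in (-15.0, -21.0):
    print(f"tx {tx} dBm = {dbm_to_microwatts(tx):.1f} uW, "
          f"loss budget {loss_budget_db(tx):.0f} dB")
```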

The optical interface is typically found in consumer equipment such as DAT recorders, CD players, computer sound cards, stand-alone convertors, and amplifiers with built-in D/A convertors. It usually uses an LED transmitter (see section 1.7) and a fibre optic cable connected to a photodetector in the receiver. The 'TOSLink' style of fibre optic interface is popular in consumer equipment. It is driven from a TTL-level (0 to 5 volt) unbalanced source, with a data format identical to that of the electrical interface. The advantages of optical links in rejecting interference were stated in section 1.7, but cheap optical interfaces also carry risks. Their limited bandwidth and high dispersion may actually result in a poorer transmission channel than a normal electrical interface, causing a high degree of timing instability in the positions of data transitions (see Chapter 6). For a comprehensive introduction to fibre optics in audio the reader is referred to Ajemian and Grundy¹⁶.

4.3.6 Coaxial Interface

A coaxial method of transmission for the professional AES3 interface, described in AES-3ID¹⁷, uses 75 ohm video-style coaxial cable to carry digital audio signals over distances up to around 1000 m. A similar but not identical description is found in SMPTE 276M¹⁸. A signal level similar to that of a video signal (1 volt) is used. The signal is not, however, formatted to look like a video waveform; it still uses essentially the same bi-phase mark channel code as AES3. Conversion between the balanced form of AES3 and this coaxial form is easy, and a number of manufacturers make balanced-to-coax adaptors. Note, though, that after transformation from 1 volt/75 ohms back to 110 ohms, the voltage level is much lower than might be expected from a standard AES3 balanced output stage. Some simple conversion networks are illustrated in the information document.
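The remark about voltage level can be checked with ideal-transformer arithmetic. Matching 75 ohms to 110 ohms calls for a turns ratio of √(110/75) ≈ 1.21. A 1 V peak-to-peak coaxial signal therefore becomes only about 1.2 V on the balanced side, well below the 2 to 7 V expected from an AES3-1992 balanced driver. The sketch assumes a lossless, perfectly matched transformer, and the function name is hypothetical.

```python
import math

def matched_output_vpp(v_in_pp: float, z_in: float = 75.0, z_out: float = 110.0) -> float:
    """Output voltage of an ideal transformer that matches z_in to z_out.

    Power is conserved, so V_out / V_in = sqrt(z_out / z_in).
    """
    return v_in_pp * math.sqrt(z_out / z_in)

v_balanced = matched_output_vpp(1.0)
print(f"{v_balanced:.2f} V p-p")   # 1.21 V p-p
print(v_balanced < 2.0)            # True: below the AES3-1992 minimum of 2 V
```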

The advantages of this interface include easy distribution of audio within a television studio environment using video distribution amplifiers and cabling. It also has better electromagnetic radiation characteristics than the balanced twisted pair, as discussed in Rorden and Graham¹⁹. Tests of such an interface have been successful; in one test, equalized video lines carried an AES-format digital audio signal over a distance of more than 800 miles (1300 km) without any noticeable corruption.

4.3.7 Multipin Connector

A multipin connector version of AES3 is described in the information document AES2-ID²⁰, for use where there is not enough space for multiple XLR connectors. Such a connector allows multichannel interfacing without going to the lengths of implementing the MADI standard (see section 4.11). It could be a lower-cost solution than MADI when fewer than 56 channels are needed. The described configuration carries 16 channels of audio data on a single 50-pin D-type connector.