7.3 Basis of Voice Coding

Voice signals in a telephone are analog signals in the frequency range of 300 Hz to 3.4 kHz. The landline digital voice network converts these analog signals into digital signals using the PCM (pulse code modulation) ^[9] scheme. In this scheme, the voice-band analog signal is sampled at 8 kHz, which satisfies the Nyquist sampling criterion f_s > 2 × bandwidth. Each sample is digitized to 8 bits by one of several quantization techniques. Thus, the wireline PCM system produces a 64-kbps stream in which one 8-bit voice sample is generated every 125 microseconds. Quantizing the samples introduces error. Successive quantization errors of voice samples can be assumed to be uncorrelated random noise. The quantization error is therefore treated as noise and characterized by the signal-to-quantization-noise ratio (SQR), defined as

SQR = E[X^2(t)] / E[(Y(t) - X(t))^2]

where E[.] denotes the expectation, X(t) is the analog input signal at time t, and Y(t) is the decoded output signal at time t. The magnitude of the error Y(t) - X(t) is at most q/2, where q is the size of the quantization interval. If all quantization intervals have the same size and the input analog signal is a sinusoid, the SQR in dB is

SQR(dB) = 10.8 + 20 log10(v/q)

where v is the RMS (root mean square) amplitude of the input. The SQR increases with the signal amplitude, so uniform quantization penalizes small-amplitude signals. More efficient coding can be achieved by making the quantization intervals nonuniform: small for low-amplitude samples and larger for high-amplitude samples. The process of companding ^[10] (compressing and then expanding the signal) is used to achieve this nonuniform quantization. The companding law used in North America and Japan for PCM is called μ-law, and the companding law recommended by the ITU for Europe and the rest of the world is called A-law. The 64-kbps voice-coding standard is issued by the ITU as Recommendation G.711. ^[11] In addition, the ITU developed Recommendations G.726 and G.727 on 40-, 32-, 24-, and 16-kbps adaptive differential PCM (ADPCM) coding and Recommendation G.722 on 7-kHz wideband ADPCM coding at 56 to 64 kbps.
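The effect of companding on small signals can be checked numerically. The following is a minimal sketch, not the G.711 encoder itself: it assumes an 8-bit mid-rise uniform quantizer spanning [-1, 1], the continuous μ-law curve with μ = 255 (G.711 uses a segmented approximation), and a sampled sinusoid as input. All function names are hypothetical and chosen only for illustration.

```python
import math

MU = 255.0      # assumed mu-law parameter (continuous curve, not segmented G.711)
Q = 2.0 / 256   # interval size q of an assumed 8-bit quantizer spanning [-1, 1]

def uniform_quantize(x, q):
    """Mid-rise uniform quantizer: map x to the centre of its interval of size q."""
    return (math.floor(x / q) + 0.5) * q

def mu_compress(x, mu=MU):
    """Continuous mu-law compressor for x in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_expand(y, mu=MU):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def sqr_db(x, y):
    """SQR = E[X^2] / E[(Y - X)^2] in dB, using sample means as expectations."""
    signal = sum(a * a for a in x) / len(x)
    noise = sum((b - a) ** 2 for a, b in zip(x, y)) / len(x)
    return 10.0 * math.log10(signal / noise)

def formula_sqr_db(amplitude, q):
    """SQR(dB) = 10.8 + 20 log10(v/q) for a sinusoid with RMS value v."""
    v = amplitude / math.sqrt(2.0)
    return 10.8 + 20.0 * math.log10(v / q)

def sine(amplitude, n=8000, cycles=37.3):
    """n samples of a sinusoid; the non-integer cycle count spreads the phases."""
    return [amplitude * math.sin(2.0 * math.pi * cycles * k / n) for k in range(n)]

def uniform_sqr(amplitude):
    """Measured SQR of the sinusoid after uniform quantization."""
    x = sine(amplitude)
    return sqr_db(x, [uniform_quantize(s, Q) for s in x])

def companded_sqr(amplitude):
    """Measured SQR when the sinusoid is compressed, quantized, then expanded."""
    x = sine(amplitude)
    y = [mu_expand(uniform_quantize(mu_compress(s), Q)) for s in x]
    return sqr_db(x, y)

for a in (1.0, 0.1, 0.01):
    print(f"amplitude {a}: formula {formula_sqr_db(a, Q):.1f} dB, "
          f"uniform {uniform_sqr(a):.1f} dB, mu-law {companded_sqr(a):.1f} dB")
```

Under these assumptions, the uniform quantizer should track the formula closely for a full-scale sinusoid and lose roughly 20 dB for every tenfold drop in amplitude, while the companded path keeps the SQR far more constant across amplitudes.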

The introduction of digital wireless communication in the early 1990s brought a new challenge: G.711 and G.721 are not usable in wireless networks because the bandwidth of wireless links is very limited. The early digital wireless access standards could support only 108 kbps over a 200-kHz bandwidth in GSM and about 24 kbps over 30 kHz in the North American IS-54 TDMA digital standard. Lower-rate voice coders were therefore needed for wireless voice communications. In addition, the voice coder had to provide robust communication over fading channels. Several solutions emerged, such as the code excited linear prediction (CELP) codec at 6.5 kbps, called the half-rate codec; CDMA, for example, was introduced with a full-rate 13-kbps CELP codec. The ETSI (European Telecommunications Standards Institute) introduced regular pulse excited long-term prediction (RPE-LTP) speech coding at 13 kbps with a frame size of 20 milliseconds and no look-ahead delay. The ANSI (American National Standards Institute) proposed vector-sum excited linear predictive coding (VSELP) at 7.95 kbps for the TDMA IS-54 system, with a frame size of 20 milliseconds and a look-ahead delay of 5 milliseconds. The RPE-LTP codec allows GSM to support eight simultaneous voice calls within its 200-kHz bandwidth, and the VSELP codec allows three voice calls within the 30-kHz bandwidth of the North American IS-54 TDMA system. New coders with improved performance were introduced in subsequent GSM networks. They include full rate, ^[12] half rate, enhanced full rate, ^[13] adaptive multirate, ^[14] and RECOVC (recognition-compatible voice coding) speech transcoding. Other coders include G.728 16-kbps speech coding using low-delay code excited linear prediction and G.729 ^[15] 8-kbps speech coding using conjugate-structure algebraic code excited linear prediction.
The Qualcomm code excited linear prediction (QCELP) codec is a variable-bit-rate codec with rates of 8.5, 4.0, 2.0, and 0.8 kbps, a frame size of 20 milliseconds, and a look-ahead delay of 5 milliseconds. Beyond this bit-rate reduction, a single call can also be provisioned for half-rate coding by taking advantage of the silence periods in a conversation. At present, a number of lower-bit-rate coding techniques are under consideration for wireless voice communications.
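The bit rates and frame sizes quoted in this section can be related by simple arithmetic: a codec running at R bps with a frame of T milliseconds produces R × T / 1000 bits per frame. The sketch below works this out for the rates given in the text and compares each with 64-kbps PCM. The names are hypothetical, and the results are only the products of the quoted figures, not frame formats taken from the standards.

```python
PCM_SAMPLE_RATE_HZ = 8000   # sampling rate quoted in the text
PCM_BITS_PER_SAMPLE = 8     # bits per PCM sample quoted in the text

def pcm_bit_rate_bps(sample_rate_hz=PCM_SAMPLE_RATE_HZ, bits=PCM_BITS_PER_SAMPLE):
    """Bit rate of a PCM stream: samples per second times bits per sample."""
    return sample_rate_hz * bits

def sample_period_us(sample_rate_hz=PCM_SAMPLE_RATE_HZ):
    """Time between successive samples, in microseconds."""
    return 1_000_000 / sample_rate_hz

def bits_per_frame(rate_bps, frame_ms):
    """Bits produced by a codec in one frame of the given duration."""
    return rate_bps * frame_ms / 1000

# (name, bit rate in bps, frame size in ms), all values as quoted in the text
CODECS = [
    ("GSM RPE-LTP", 13000, 20),
    ("IS-54 VSELP", 7950, 20),
    ("QCELP 8.5", 8500, 20),
    ("QCELP 4.0", 4000, 20),
    ("QCELP 2.0", 2000, 20),
    ("QCELP 0.8", 800, 20),
]

pcm_rate = pcm_bit_rate_bps()
print(f"PCM: {pcm_rate} bps, one sample every {sample_period_us():.0f} microseconds")
for name, rate_bps, frame_ms in CODECS:
    ratio = pcm_rate / rate_bps
    print(f"{name}: {bits_per_frame(rate_bps, frame_ms):.0f} bits per frame, "
          f"{ratio:.1f}x lower rate than PCM")
```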

^[9]Bellamy, J.C., Digital Telephony, Wiley Interscience, New York, 1991, pp. 98–142.

^[10]Bellamy, J.C., Digital Telephony, Wiley Interscience, New York, 1991, pp. 98–142.

^[11]International Telecommunication Union, Pulse Code Modulation of Voice Frequencies, Recommendation G.711, Telecom Standardization Sector, Geneva, Switzerland, 1988.

^[12]GSM 06.10, Digital Cellular Telecommunications System (Phase 2+): Full Rate Speech Transcoding, Version 7.0.1, Release 1998.

^[13]GSM 06.60, Digital Cellular Telecommunications System (Phase 2+): Enhanced Full Rate (EFR) Speech Transcoding, Version 7.0.1, Release 1998.

^[14]GSM 06.90, Digital Cellular Telecommunications System (Phase 2+): Adaptive Multi-Rate (AMR) Speech Transcoding, Version 7.1.0, Release 1998.

^[15]International Telecommunication Union, Coding of Speech at 8 kbit/s Using Conjugate-Structure Algebraic Code-Excited Linear-Prediction (CS-ACELP), Recommendation G.729, Telecom Standardization Sector, Geneva, Switzerland, Mar. 1996.