17.2. Digital Voice and Compression

Our discussion starts with voice as a simple real-time signal. We first review the fundamentals of voice digitization and sampling.

17.2.1. Signal Sampling

In the process of digitizing a signal, analog signals first go through a sampling process, as shown in Figure 17.2. The sampling function is required in the process of converting an analog signal to digital bits. However, acquiring samples from an analog signal and eliminating the unsampled portions of the signal may result in some permanent loss of information. In other words, sampling resembles a lossy information-compression process.

Figure 17.2. Overview of digital voice process

Sampling techniques are of several types:

  • Pulse amplitude modulation (PAM), which translates sampled values to pulses with corresponding amplitudes

  • Pulse width modulation (PWM), which translates sampled values to pulses with corresponding widths

  • Pulse position modulation (PPM), which translates sampled values to identical pulses whose positions relative to the sampling points correspond to the sampled values

PAM is a practical and commonly used sampling method; PPM is the best modulation technique but is expensive. PWM is normally used in analog remote-control systems. The sampling rate in any of these schemes obeys the Nyquist theorem, according to which at least two samples per period of the highest-frequency component of the signal are needed in order to reconstruct its spectrum:

Equation 17.1

    f_S ≥ 2 f_H

where f_H is the highest-frequency component of a signal, and f_S is the sampling rate.
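The Nyquist bound can be checked with a short sketch. The function below (illustrative, not from the text) returns the apparent frequency of a sampled tone: when f_S < 2 f_H, the tone folds (aliases) back into the band [0, f_S/2].

```python
def alias_frequency(f_signal, f_sample):
    """Apparent frequency of a real tone after sampling at f_sample.

    A component above f_sample / 2 folds back (aliases) into the
    band [0, f_sample / 2]."""
    folded = f_signal % f_sample
    return folded if folded <= f_sample / 2 else f_sample - folded

# Telephony voice is band-limited to about 3.4 kHz and sampled at
# 8 kHz, which satisfies f_S >= 2 f_H, so the tone is preserved:
print(alias_frequency(3400, 8000))   # 3400
# Sampling the same tone at 5 kHz violates the Nyquist bound:
print(alias_frequency(3400, 5000))   # 1600
```

The 8 kHz rate used here is the standard telephony sampling rate for a roughly 4 kHz voice band.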

17.2.2. Quantization and Distortion

Samples are real numbers (decimal-point values as well as integer values), and thus infinitely many bits may be required for the transmission of a raw sample. The transmission of infinitely many bits occupies infinite bandwidth and is not practical to implement. In practice, sampled values are rounded off to the available quantized levels. However, rounding off the values loses information and generates distortion. A measure is needed to analyze this distortion. The distortion measure should show how far apart a signal denoted by x(t) is from its reproduced version, denoted by x̂(t). The distortion measure of a single source sample is based on the difference between the sample X_i and its corresponding quantized value X̂_i; it is denoted by d(X_i, X̂_i) and is widely known as squared-error distortion:

Equation 17.2

    d(X_i, X̂_i) = (X_i - X̂_i)^2

Note that quantization is noninvertible, since the lost information cannot be recovered. The distortion measure of n samples is based on the values of source samples obtained at the sampler output. As a result, the collection of n samples forms a random process:

Equation 17.3

    X = (X_1, X_2, ..., X_n)

Similarly, the reconstructed signal at the receiver can be viewed as a random process:

Equation 17.4

    X̂ = (X̂_1, X̂_2, ..., X̂_n)

The distortion between these two sequences is the average over their components:

Equation 17.5

    d(X, X̂) = (1/n) Σ_{i=1}^{n} d(X_i, X̂_i)

Note that d(X, X̂) is itself a random variable, since it takes on random values. Thus, the total distortion between the two sequences is defined as the expected value of d(X, X̂):

Equation 17.6

    D = E[d(X, X̂)]

If all samples are expected to have approximately the same distortion, denoted by E[d(X_i, X̂_i)] = E[d(X, X̂)], then by using squared-error distortion, we obtain the total distortion:

Equation 17.7

    D = E[(X_i - X̂_i)^2]
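The average and expected distortions can be checked empirically. The sketch below (illustrative; the Gaussian source parameters and the step size are assumed, not from the text) estimates the squared-error distortion of a uniform quantizer by averaging over many samples; for a fine quantizer with step Δ, a well-known approximation gives D ≈ Δ²/12.

```python
import random

def quantize(x, step):
    """Round x to the nearest quantized level (uniform quantizer)."""
    return step * round(x / step)

def avg_distortion(samples, step):
    """Empirical average squared-error distortion between the source
    samples and their quantized versions."""
    return sum((x - quantize(x, step)) ** 2 for x in samples) / len(samples)

# Hypothetical Gaussian source (zero mean, sigma = 20) and step = 5:
random.seed(1)
samples = [random.gauss(0, 20) for _ in range(100_000)]
D = avg_distortion(samples, 5.0)
# For a fine uniform quantizer, D approaches step^2 / 12, about 2.08
print(round(D, 2))
```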
Let R be the minimum number of bits required to reproduce a source while guaranteeing that the distortion stays below a certain distortion bound D_b. Clearly, if D decreases, R must increase. If X is represented by R bits, the total number of different values X_i can take is 2^R. Each single-source output is quantized into N levels, and each level 1, 2, ..., N is encoded into a binary sequence. Let ℛ = {ℛ_1, ..., ℛ_k, ..., ℛ_N} be the set of quantization regions, and let x̂_k be the quantized value belonging to region ℛ_k. Note that x̂_k is a quantized version of X_k. Apparently, R = log_2 N bits are required to encode N quantized levels. Figure 17.3 shows a model of N-level quantization: for the regions ℛ_1 = (-∞, a_1], ℛ_2 = (a_1, a_2], ..., ℛ_N = (a_{N-1}, +∞), the quantized values are x̂_1, x̂_2, ..., x̂_N, respectively. We can use the definition of expected value to obtain D, as follows:

Equation 17.8

    D = E[(X - X̂)^2] = Σ_{k=1}^{N} ∫_{ℛ_k} (x - x̂_k)^2 f_X(x) dx

where f_X(x) is the probability density function of X.
Figure 17.3. N -level quantization


Typically, a distortion bound, denoted by D_b, is defined by designers to ensure that D ≤ D_b.

Example.

Consider that each sample of a sampled source is a Gaussian random variable with a given probability density function. We want eight levels of quantization over the regions with boundaries {a_1 = -60, a_2 = -40, ..., a_7 = 60} and a corresponding set of quantized values {x̂_1, ..., x̂_8}. Assuming that the distortion bound for this signal is D_b = 7.2, find out how far the real distortion, D, is from D_b.

Solution.

Since N = 8, the number of bits required per sample is R = log_2 8 = 3. Substituting the given density, boundaries, and quantized values into Equation (17.8) and evaluating the eight integrals, D can be developed further.

We note here that the total distortion resulting from eight-level quantization is 16.64, which is considerably greater than the given distortion bound of 7.2.

The conclusion in the example implies that the quantization chosen for that source may not be optimal. A possible reason may be inappropriate choices of the boundaries (a_1, ..., a_7) and/or the quantized values x̂_k. This in turn means that R is not optimal.
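Equation (17.8) can be evaluated numerically for any choice of boundaries and levels. The sketch below is illustrative only: the zero-mean Gaussian density with σ = 30 and the midpoint levels are assumptions (the example's actual density and quantized values are not reproduced here), so the result will not match the 16.64 of the example unless the same parameters are used.

```python
import math

def gaussian_pdf(x, sigma=30.0):
    """Zero-mean Gaussian density (sigma is an assumed value)."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def region_distortion(a_lo, a_hi, x_hat, pdf, steps=20_000):
    """Midpoint-rule integral of (x - x_hat)^2 f_X(x) over [a_lo, a_hi]."""
    h = (a_hi - a_lo) / steps
    return sum((a_lo + (i + 0.5) * h - x_hat) ** 2 * pdf(a_lo + (i + 0.5) * h) * h
               for i in range(steps))

# Hypothetical 8-level quantizer: boundaries 20 apart, levels at midpoints.
# +-1e3 stands in for +-infinity (the Gaussian tail there is negligible).
bounds = [-1e3, -60, -40, -20, 0, 20, 40, 60, 1e3]
levels = [-70, -50, -30, -10, 10, 30, 50, 70]
D = sum(region_distortion(bounds[k], bounds[k + 1], levels[k], gaussian_pdf)
        for k in range(len(levels)))
print(round(D, 2))
```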

Optimal Quantizers

Let Δ be the length of each region, equal to a_{i+1} - a_i. Thus, the regions can be restated as (-∞, a_1], (a_1, a_1 + Δ], ..., (a_{N-1}, +∞). Clearly, the upper region can also be written as (a_1 + (N - 2)Δ, +∞). Then, the total distortion can be rewritten as

Equation 17.9

    D = ∫_{-∞}^{a_1} (x - x̂_1)^2 f_X(x) dx + Σ_{i=1}^{N-2} ∫_{a_1+(i-1)Δ}^{a_1+iΔ} (x - x̂_{i+1})^2 f_X(x) dx + ∫_{a_1+(N-2)Δ}^{+∞} (x - x̂_N)^2 f_X(x) dx

For D to be optimized, we must have ∂D/∂a_1 = 0 and ∂D/∂Δ = 0, and also ∂D/∂x̂_1 = 0, ..., ∂D/∂x̂_N = 0.

The result of solving these N + 2 equations can be summarized as follows (for a source density symmetric about zero). For N even:

Equation 17.10

    a_i = (i - N/2)Δ,  i = 1, ..., N - 1

(so that zero is a decision boundary), and

Equation 17.11

    x̂_i = E[X | X ∈ ℛ_i] = ∫_{ℛ_i} x f_X(x) dx / ∫_{ℛ_i} f_X(x) dx

For N odd:

Equation 17.12

    a_i = (i - N/2)Δ,  i = 1, ..., N - 1

(so that the boundaries fall at half-integer multiples of Δ), and

Equation 17.13

    x̂_i = E[X | X ∈ ℛ_i],  with the middle value x̂_{(N+1)/2} = 0



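These optimality conditions can also be solved numerically. The sketch below is illustrative (not from the text): it applies Lloyd's iteration for a general, not necessarily uniform, squared-error quantizer, alternating the level condition (each level is the centroid of its region) with the boundary condition (each boundary is the midpoint of adjacent levels), on samples from an assumed zero-mean Gaussian source.

```python
import random

def lloyd_max(samples, n_levels, iters=50):
    """Alternate the two optimality conditions: levels at region
    centroids, boundaries at midpoints of adjacent levels."""
    lo, hi = min(samples), max(samples)
    # Start from equally spaced levels over the sample range:
    levels = [lo + (k + 0.5) * (hi - lo) / n_levels for k in range(n_levels)]
    bounds = []
    for _ in range(iters):
        bounds = [(levels[k] + levels[k + 1]) / 2 for k in range(n_levels - 1)]
        buckets = [[] for _ in range(n_levels)]
        for x in samples:
            buckets[sum(x > b for b in bounds)].append(x)
        # Centroid condition; keep the old level if a region is empty:
        levels = [sum(b) / len(b) if b else levels[k]
                  for k, b in enumerate(buckets)]
    return levels, bounds

random.seed(7)
samples = [random.gauss(0, 30) for _ in range(20_000)]
levels, bounds = lloyd_max(samples, 8)
# Empirical distortion of the optimized 8-level quantizer:
D = sum(min((x - v) ** 2 for v in levels) for x in samples) / len(samples)
print(round(D, 2))
```

For a symmetric source density the converged boundaries and levels come out symmetric about zero, matching the structure of the closed-form results above.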

Computer and Communication Networks
ISBN: 0131389106
Year: 2007
Pages: 211
Authors: Nader F. Mir