Squeezing Your Voice into a Smaller Package
Once our analog waveforms have been digitized, we might want to save WAN bandwidth by compressing them. The processes of encoding and decoding these waveforms are defined by a coder decoder (CODEC). Let's consider a few forms of encoding used by various CODECs:
Pulse code modulation (PCM)
Doesn't actually compress the analog waveform. Rather, PCM performs sampling and quantization (as described earlier) without any compression. G.711 is the CODEC that uses PCM.
Adaptive Differential PCM (ADPCM)
Instead of encoding an entire sample, ADPCM can send the difference between the current sample and the previous sample. G.726 is an example of an ADPCM CODEC.
Conjugate Structure Algebraic Code Excited Linear Prediction (CS-ACELP)
Dynamically builds a codebook based on speech patterns. It then uses a look-ahead buffer to see whether the next sample matches a pattern already in the codebook. If it does, then the codebook location can be sent instead of the actual sample. G.729 is an example of a CS-ACELP CODEC.
To help visualize how CS-ACELP works, imagine you and I are having a conversation across a digital circuit, and you notice that frequently in my speech pattern, I make the sound "ing" (for example, as in the words routing and reading). Instead of digitizing the "ing" sound, you make an entry in a book that you have (called a codebook) that describes what "ing" sounds like. I also make the same entry in my codebook. In the future, instead of me having to digitize and transmit all of the 1s and 0s that describe how "ing" sounds, I simply send you the location of that sound in your codebook. For example, instead of sending the "ing" sound, I might tell you, as a metaphor, that I'm sending the sound located on page 51, line 30 in your codebook. You look up that location in your codebook and find the binary code for making the "ing" sound. The advantage is that instead of sending the actual sound, I'm only sending you the location of that sound in your codebook, which takes up far less bandwidth than sending the actual sound.
As mentioned earlier, this codebook is built dynamically, based on the speech patterns of the conversation. Therefore, it's probably safe to assume that the codebook built during a conversation with Barney the Purple Dinosaur would be significantly different from the codebook built during a conversation with the Osbourne family.
The purpose of the look-ahead buffer used by G.729 is to collect voice patterns in a buffer and attempt to match those voice patterns with a pattern already defined in the local codebook. In fact, in the Cisco VoIP environment, G.729 is the most popular CODEC for sending voice traffic over the WAN, primarily because of its high quality and low bandwidth requirements. To transmit the actual digitized voice, G.729 only requires 8 kbps, compared to the 64 kbps of bandwidth required by G.711. CS-ACELP is designed to encode speech patterns. Therefore, other audio sources (for example, music on hold) might experience more quality degradation than human speech.
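To make the codebook metaphor a little more concrete, here is a minimal Python sketch of the "send the location instead of the sound" idea. It is a toy illustration, not the G.729 algorithm: the four-entry `codebook`, the `closest_entry` helper, and the sample values are all hypothetical, and the codebook is fixed here for simplicity instead of being built during the conversation.

```python
def closest_entry(codebook, pattern):
    """Return the index of the codebook entry nearest to pattern.

    Nearness is measured by squared error, an assumption made for this toy.
    """
    def distance(entry):
        return sum((a - b) ** 2 for a, b in zip(entry, pattern))

    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))


# Hypothetical codebook that both ends of the call are assumed to share.
codebook = [
    (0, 0, 0, 0),
    (10, 20, 10, 0),
    (-5, -15, -5, 0),
    (30, 10, -10, -30),
]

# A short run of samples sitting in the sender's buffer.
pattern = (9, 21, 11, 1)

# The sender transmits only this small index, not the samples themselves.
index = closest_entry(codebook, pattern)

# The receiver looks up the same location in its own copy of the codebook.
reconstructed = codebook[index]
print(index, reconstructed)
```

The receiver gets an approximation of the original samples, which is one way to see why a speech-oriented codebook might do a poorer job with music on hold.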
Low-Delay Code Excited Linear Prediction (LDCELP)
Is very similar to CS-ACELP. However, LDCELP uses a smaller codebook, resulting in less delay, but it requires more bandwidth. The G.728 CODEC is an example of an LDCELP CODEC.
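Returning to the ADPCM entry in the list above, the "send the difference" idea can be sketched in a few lines of Python. This is plain differential coding only, under the assumption that samples are small integers. It leaves out the adaptive step size and quantization that a real ADPCM CODEC such as G.726 applies, and the function names are hypothetical.

```python
def delta_encode(samples):
    """Return the first sample followed by successive differences."""
    if not samples:
        return []
    deltas = [samples[0]]
    for previous, current in zip(samples, samples[1:]):
        deltas.append(current - previous)
    return deltas


def delta_decode(deltas):
    """Rebuild the original samples by accumulating the differences."""
    samples = []
    running_value = 0
    for delta in deltas:
        running_value += delta
        samples.append(running_value)
    return samples


samples = [100, 102, 105, 104, 104, 101]
encoded = delta_encode(samples)
print(encoded)                # [100, 2, 3, -1, 0, -3]
print(delta_decode(encoded))  # [100, 102, 105, 104, 104, 101]
```

Because neighboring voice samples tend to be close in value, the differences are usually small numbers, and small numbers can be described with fewer bits than full samples.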
Working with Cisco products, you will normally use G.711 (which requires 64 kbps of bandwidth for voice payload) in the local area network (LAN) environment and G.729 (which requires 8 kbps of bandwidth for voice payload) over the WAN. G.729 does have a couple of variants. Although all flavors of G.729 require 8 kbps of bandwidth to transmit voice, G.729a uses a less complex algorithm, which saves processor resources with very little quality degradation. G.729b enables voice activity detection (VAD).
What is VAD, and why do we need it? Let's say that you and I are talking with each other on the phone, and you step away for a moment. During that time, neither of us is talking. However, the VoIP network is still carrying that "silence," and silence takes up just as much bandwidth as regular speech. VAD, however, can detect when the conversation stops. By default on Cisco routers, after 250 milliseconds (ms) (that is, one-fourth of a second) of silence, the router stops sending the silence, thus freeing up bandwidth. To take a line from Simon and Garfunkel, VAD does not send "the sound of silence." Although the amount of bandwidth saved by VAD varies based on speech patterns, a 30 percent bandwidth savings is typical.
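The 250 ms rule can be sketched as a simple counter. The Python below is a rough model under stated assumptions, not how a Cisco router implements VAD: it assumes one packet every 20 ms, assumes something else has already labeled each packet as speech or silence, and ignores any comfort-noise traffic. The function name and constants are hypothetical.

```python
PACKET_MS = 20     # assumed packetization interval
HANGOVER_MS = 250  # silence threshold quoted in the text


def packets_sent(packets_are_speech, packet_ms=PACKET_MS, hangover_ms=HANGOVER_MS):
    """Count packets transmitted when long silences are suppressed.

    Silence keeps being sent until it has lasted longer than hangover_ms.
    """
    sent = 0
    silence_ms = 0
    for is_speech in packets_are_speech:
        if is_speech:
            silence_ms = 0
        else:
            silence_ms += packet_ms
        if silence_ms <= hangover_ms:
            sent += 1
    return sent


# Two seconds of speech, one second of silence, two more seconds of speech.
call = [True] * 100 + [False] * 50 + [True] * 100
sent = packets_sent(call)
savings_percent = 100 * (len(call) - sent) / len(call)
print(sent, len(call), round(savings_percent, 1))
```

In this made-up call, the first 12 silent packets still go out (240 ms of silence), and the remaining 38 are suppressed. The savings on a real call depends entirely on how much silence the conversation contains.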
In addition to the type of compression used, the size of the final voice packet depends on several factors, such as:
Is the voice traffic being transmitted across a Frame Relay, Asynchronous Transfer Mode (ATM), or Ethernet network?
Is the voice traffic being sent over a virtual private network (VPN)?
Is the header information being compressed?
Is the digitized voice being compressed? For example, G.711 does not compress voice, but other CODECs (for example, G.729 and G.723) do compress voice.
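To see how these factors combine, here is a back-of-the-envelope Python sketch of per-call bandwidth. The header sizes are assumptions based on commonly cited figures (40 bytes of IP/UDP/RTP headers, 2 bytes when RTP header compression is used without a UDP checksum, 18 bytes of Ethernet overhead, and 6 bytes for PPP or Frame Relay); your network's values might differ, and the function names are hypothetical.

```python
IP_UDP_RTP_BYTES = 40  # 20 (IP) + 8 (UDP) + 12 (RTP)
CRTP_BYTES = 2         # assumed compressed header size, no UDP checksum
L2_OVERHEAD_BYTES = {  # assumed Layer 2 overhead per packet
    "ethernet": 18,
    "ppp": 6,
    "frame_relay": 6,
}


def packet_bytes(payload_bytes, media, crtp=False, extra_bytes=0):
    """Size of one voice packet on the wire, in bytes."""
    header_bytes = CRTP_BYTES if crtp else IP_UDP_RTP_BYTES
    return payload_bytes + header_bytes + L2_OVERHEAD_BYTES[media] + extra_bytes


def bandwidth_kbps(codec_kbps, payload_bytes, media, crtp=False, extra_bytes=0, calls=1):
    """Total bandwidth, in kbps, for a number of simultaneous calls."""
    packets_per_second = codec_kbps * 1000 / (payload_bytes * 8)
    size = packet_bytes(payload_bytes, media, crtp, extra_bytes)
    return size * 8 * packets_per_second * calls / 1000


g729_ethernet = bandwidth_kbps(8, 20, "ethernet")       # G.729, 20-byte payload
g711_ethernet = bandwidth_kbps(64, 160, "ethernet")     # G.711, 160-byte payload
g729_ppp_crtp = bandwidth_kbps(8, 20, "ppp", crtp=True)
print(g729_ethernet, g711_ethernet, g729_ppp_crtp)
```

Under these assumptions, the 8 kbps G.729 payload grows to roughly 31 kbps on Ethernet once headers are counted, which is why header compression and the choice of media matter so much on a slow WAN link. The `extra_bytes` parameter is where tunneling or VPN overhead would go.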
Although several Cisco courses offer ways to determine the required bandwidth for a call, I prefer to use the Cisco web-based Voice Bandwidth Calculator, located at the following URL:
A Cisco Connection Online (CCO) account is required in order to use this tool. You can register for a CCO account by pointing your web browser to the Cisco home page at http://www.cisco.com and clicking the
Let's walk through how to use this incredibly useful tool to determine the required bandwidth for a call:
Open up a web browser to
From the initial screen, as shown in Figure 2-8, select the CODEC being used (typically G.711 on a high-speed LAN or G.729 on a lower-speed WAN), the voice protocol (VoIP for our discussion), and the number of simultaneous calls you need to support. Then click the
Figure 2-8. Voice Bandwidth Calculator Screen 1
Complete the next screen, as shown in Figure 2-9, by entering the voice payload size (the number of bytes of encoded voice carried in each packet, which is typically 20 bytes for the G.729 CODEC), clicking a checkbox if you are using RTP header compression (discussed in Chapter 6, "Why Quality Matters"), selecting your media access (for example, Frame Relay, Ethernet, or PPP), and indicating any additional overhead that increases the size of the packet (for example, tunneling or security overhead). Then click the
Figure 2-9. Voice Bandwidth Calculator Screen 2
From the final screen, as shown in Figure 2-10, note the Total Bandwidth (including Overhead) value. This value gives you the amount of bandwidth required to support the number of simultaneous calls you indicated, with the network characteristics you specified.
Figure 2-10. Voice Bandwidth Calculator Screen 3
Thus far in the chapter, you have seen that when sending your voice traffic across the network, you can save bandwidth by using a CODEC that compresses the voice traffic. However, there is a tradeoff. If you reduce your bandwidth requirement, you might need to sacrifice some of the voice quality. To best determine whether this "bandwidth for quality" tradeoff is acceptable, you need a way to compare quality. Fortunately, you have your choice of various voice quality measurements:
Mean Opinion Score (MOS)
Relies on human listeners to judge the quality of voice after it passes through the CODEC being evaluated. MOS values range from 1, for unsatisfactory quality, to 5, for no noticeable quality degradation. For toll-quality voice, however, an MOS value in the range of 4 is appropriate. The G.711 CODEC has an MOS value of 4.1. Accompanied by a significant bandwidth savings, G.729 has an MOS of 3.92, while the less processor-intensive G.729a has an MOS of 3.9. However, the challenge with MOS is that at its essence, it is based on opinion.
Perceptual Speech Quality Measurement (PSQM)
Digitally measures the difference between the original signal and the signal after it passes through a CODEC.
Perceptual Evaluation of Speech Quality (PESQ)
Digitally measures quality, like PSQM, but attempts to match the more familiar MOS values. For example, if a particular CODEC has an MOS score of 4.1, then it should also have a PESQ value of approximately 4.1.
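One way to think about the "bandwidth for quality" tradeoff is to pick the lowest-bandwidth CODEC that still meets a quality target. The Python sketch below does that using only the payload bandwidth and MOS figures quoted in this chapter. The `cheapest_codec` function and the idea of a fixed MOS target are illustrative assumptions, not a Cisco tool or recommendation.

```python
# Payload bandwidth and MOS values as quoted in this chapter.
CODECS = {
    "G.711": {"payload_kbps": 64, "mos": 4.1},
    "G.729": {"payload_kbps": 8, "mos": 3.92},
    "G.729a": {"payload_kbps": 8, "mos": 3.9},
}


def cheapest_codec(min_mos):
    """Return the lowest-bandwidth CODEC whose MOS meets min_mos.

    Ties on bandwidth go to the higher MOS. Returns None if nothing qualifies.
    """
    candidates = [
        (info["payload_kbps"], -info["mos"], name)
        for name, info in CODECS.items()
        if info["mos"] >= min_mos
    ]
    if not candidates:
        return None
    return min(candidates)[2]


print(cheapest_codec(3.9))  # G.729
print(cheapest_codec(4.0))  # G.711
print(cheapest_codec(4.5))  # None
```

Notice how a small change in the quality target (3.9 versus 4.0) produces an eightfold jump in payload bandwidth, which is the tradeoff in a nutshell.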