6.2. Voice Channels
A VoIP softswitch has two main functions: call management (or switching), which is covered in the next chapter, and voice transmission.
Voice transmission (the packaging, transmittal, receipt, and reconstruction of digitized voice data) occurs inside virtualized pathways across the TCP/IP network. Many softPBX systems, Asterisk included, call these pathways channels. The word channel means different things to different vendors, though; keep that in mind as you read VoIP documentation. It also means different things at different layers: the RTP protocol has media channels, which are streams of sound or video data, while a path across the network for call signaling is also sometimes called a channel.
For this chapter, we'll give the word channel a wide definition: the complete virtualized transport that carries the mouth-to-ear analog signal over a great distance using networked software.
There are several steps in the process of transmitting voice sounds over a channel: sampling, digitizing, encoding, transport, decoding, and playback. Usually, each step occurs once per packet of voice data. Complex applications like conference calling, surveillance, or overhead paging may handle these steps in unique ways, but for this chapter, we'll concentrate on how the process works for a standard, two-party, point-to-point phone call.
6.2.1. Sampling and Digitizing
Digital-to-analog conversion (DAC) and analog-to-digital conversion (ADC) are the processes that convert sound from the format in which it is heard (analog sound waves) into the format that VoIP uses to carry it (digital streams) and back again. These processes are necessary in order for inherently analog devices, namely human ears, to use digital sound signals. In the world of traditional telephony, the process is fairly simple, because variations in DAC techniques are driven by the requirements of different data links and devices and by regional standards variations.
The DAC processes employed in Voice over IP aren't tied to the data link, so they can vary greatly: different DAC, digitizing, and compression techniques are used in different circumstances. Sometimes the data link's properties, like bandwidth capacity and latency, are factors in the selection of these techniques, but not always. DAC is required in all telephony environments, even where VoIP isn't used, because just about every traditional telephony system employs digital carriers and analog sound reproduction devices like speakers and transducers .
DAC involves quantizing, or digital "sampling," of sounds; filtering for bandwidth preservation; and signal compression for bandwidth efficiency. Pulse code modulation (PCM) is the most common sampling technique used to turn audible sounds into digital signals. We'll deal with DAC subjects in greater detail later.
6.2.1.1. The 64 kbps channel
To connect a phone call, a traditional telephone, whether analog or digital, requires a loop with enough quality for 64 kilobits per second of digital throughput. In fact, 64 kbps is the fixed line speed of any POTS line. Analog and (most) digital telephone systems offer similar sound clarity because they operate at the same sampling frequency, 8,000 Hz. This frequency, when combined with a sampling resolution of 8 bits, requires 64 kbps of bandwidth, hence the 64 kbps line speed.
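The arithmetic behind the 64 kbps figure is simple enough to check directly. This quick sketch (plain Python, no VoIP libraries) just multiplies the PCM sampling rate by the sample resolution:

```python
# G.711/PCM parameters used by POTS lines and most digital telephony.
SAMPLE_RATE_HZ = 8000   # samples taken per second
SAMPLE_BITS = 8         # resolution of each sample

# Bandwidth needed for one uncompressed voice channel.
channel_bps = SAMPLE_RATE_HZ * SAMPLE_BITS
print(channel_bps)  # 64000 bits per second, i.e., the 64 kbps channel
```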
It should suffice for now that each concurrent voice conversation, analog or digital, requires a link capable of a speed, or bandwidth,[*] of 64 kbps. As you'll discover, the 64 kbps channel is a baseline unit for dealing with sizing issues on your VoIP network. It's important to look at bandwidth conservation methods in relationship to 64 kbps, the "standard unit" of voice bandwidth.
6.2.2. Framing
Framing is the real-time process of dividing a stream of digital sound information into manageable, equal-sized hunks for transport over the network. Consider Figure 6-1, which shows a representation of 60 milliseconds of digitized sound. It's divided into three frames, each 20 milliseconds in duration. At this rate, it takes 50 frames to represent 1 second of digitized sound.
Figure 6-1. Framing is the process of dividing a digital stream into equal-sized hunks
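To make the framing idea concrete, here's a small sketch that divides 60 ms of 8-bit PCM (one byte per sample at 8,000 Hz) into the three 20 ms frames shown in Figure 6-1:

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20

# 60 ms of digitized sound: at 8,000 Hz and 8 bits, one byte per sample.
stream = bytes(480)  # 0.060 s * 8000 samples/s = 480 samples

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000   # 160 samples per frame
frames = [stream[i:i + samples_per_frame]
          for i in range(0, len(stream), samples_per_frame)]

print(len(frames))        # 3 frames, as in Figure 6-1
print(1000 // FRAME_MS)   # 50 frames per second of audio
```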
6.2.2.1. Digital versus packet-based
Unlike the sound signals on an analog phone line, those transmitted and received by an IP endpoint are digital. This makes them more akin to the signals carried over a traditional voice T1 or ISDN circuit. But unlike a digital phone company circuit, VoIP calls are also packet based. This means that the sound frames are carried across the network in units that are also used to carry other kinds of data; in VoIP's case, UDP datagrams.
The PSTN offers a way of providing far higher call capacity than a single POTS line: 24 simultaneous calls using two pairs of wire, rather than one pair per simultaneous call. This high-density technology is called T1. It's often used to provide links between PBX systems. For instance, one could use a T1 circuit to link PBXs in separate buildings at disparate locations. T1 provides a far more economical way of allowing many simultaneous calls between users at opposing locations than does POTS. The technique is called multiplexing. Even denser multiplexed voice circuits can carry more voice channels: DS3, which supports 672 individual channels, and OC (optical carrier) circuits are also used to multiplex and link between switches. These circuits tend to be quite expensive. In voice applications, DS3s and higher are most likely to show up in call center environments or as trunks between PSTN switches. In data applications, DS3s and OC circuits are often used by ISPs and application service providers that need very high-capacity Internet connectivity.
VoIP provides an even more economical way of linking those PBXs together. If 100 calls are to occur at the same time using a PBX, then roughly 6.4 mbps of composite bandwidth is required (100 calls at 64 kbps each). This would require five T1s. But VoIP encoding techniques allow for significant compression of the sound sample used to represent the spoken voice on the network, so that far fewer physical links are required in this instance.
It's possible to reduce a 64 kbps voice call down to 44 kbps without a noticeable reduction in sound quality, a feat that, setting aside the concept of overhead (which we'll cover later), is quite common with VoIP compression methods. Now, that link between PBXs uses only about 4.4 mbps, and needs only three T1 circuits instead of five, resulting in a much cheaper trunk.
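The T1 counts in the last two paragraphs can be verified with a few lines of Python. This sketch uses the nominal 1.544 mbps T1 line rate and ignores framing overhead inside the T1 itself:

```python
from math import ceil

T1_BPS = 1_544_000          # nominal T1 line rate
CALLS = 100                 # simultaneous calls between the two PBXs

def t1s_needed(per_call_bps: int) -> int:
    """Smallest number of T1 circuits that can carry the composite load."""
    return ceil(CALLS * per_call_bps / T1_BPS)

print(t1s_needed(64_000))   # uncompressed: 6.4 mbps -> 5 T1s
print(t1s_needed(44_000))   # compressed:   4.4 mbps -> 3 T1s
```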
6.2.3. Codecs
The algorithms VoIP uses to encode sound data, and sometimes to decrease bandwidth requirements, are called codecs. In order to get three T1 circuits to do the work of five in a voice application, bandwidth-conserving codecs are used.
Codecs, short for coder/decoders, are algorithms for packaging multimedia data in order to stream it, or transport it in real time, over the network. There are dozens of codecs for audio and video. We'll be talking about audio codecs, since they are the most common on VoIP networks.
Most of the codecs in use on VoIP networks were defined by ITU-T recommendations of the G series (transmission systems and media). A few are well suited to very high fidelity applications like music streaming, but most are suitable only for the spoken word. Those are the ones we'll be concentrating on.
Telephony audio codecs break down into two groups: PCM codecs, which are the basic 64 kbps codecs based on pulse code modulation, and vocoders, which go a step beyond the essential PCM algorithm by restructuring the digital representation of PCM into a more portable format. Here are the codecs you'll see most often:
Each of the codecs has pros and cons. G.711 is great on data links where there's plenty of capacity and very little latency, like Ethernet. It's also highly resilient to errors. But you wouldn't want to use it on a 56 kbps frame relay link, because there would not be enough bandwidth. Conversely, the codecs that provide compression do so at a loss, or degradation in quality, of the sound. That's why some call them "lossy" codecs.
6.2.3.1. Codec packet rates
Besides the bits that represent sound data, all data packets carry bits used for routing and sometimes for error correction. These "overhead bits" have no direct benefit to voice applications, other than allowing the lower layers to function: things like Ethernet headers, IP routing headers, and other information necessary for transport of the packet. When longer durations of sound are carried by each packet, these overhead items don't have to be transmitted as often, because fewer packets are required to transport the same sound. The net result of decreasing overhead is that the application uses the network more efficiently. Reducing overhead is crucial, just as it is in a business plan, because overhead, while necessary, provides no direct benefit to the application (or to the business).
The knee-jerk way to lower overhead in a VoIP network is to reduce the number of packets per second used to transmit the sound. But this increases the impact of network errors on the voice call. So there needs to be some balance between what's acceptable overhead and what's acceptable resiliency to errors. This is where a diversity of available codecs can help. Different codecs have different packet rates and overhead ratios, which gives VoIP system builders a way to fine-tune their network's voice bandwidth economy.
The packet rate is the number of packets per second (pps) required to transmit the sound. Again, different audio codecs use different rates. The gap between transmitted packets is called the packet interval, and it is inversely proportional to the packet rate: the shorter the packet interval, the more packets are required per second. Some codecs, especially those that use very advanced CELP algorithms, require a longer duration of audio at a time (say, 30 ms rather than 20 ms) in order to encode and decode. The packet interval has the most obvious effect on overhead: the shorter it is, the more overhead is required to transmit the sound, and the longer it is, the less overhead is required.
But with longer packet intervals comes increased lag (see Figure 6-2). The longer the interval, the longer the lag between the time the sound is spoken and the time it is encoded, transported, decoded, and played back for the listener. An IP packet isn't transmitted until it is completely constructed, so a VoIP sound frame can't travel across the network until it's completely encoded. A 30 ms sound frame takes half again as long to fill as a 20 ms one, inflicting 10 ms more lag. As with all networked apps, lag is bad. It's especially bad in VoIP.
Figure 6-2. Longer packet intervals cause lag, but decrease overhead
Long packet intervals have another drawback: the greater the duration of sound carried by each packet, the greater the chance that a listener will notice a negative effect on the sound if a packet is dropped due to congestion or a network error. Dropping a packet carrying 20 ms of sound is almost imperceptible with the G.711 codec, but dropping a 60 ms packet is quite obtrusive. Since VoIP sound frames are carried in "unreliable" UDP datagrams, dropped packets aren't retransmitted. Even if TCP packets were used instead of UDP, error awareness and retransmission would take so long that, by the time the retransmitted packet arrived at the receiving phone, it would be hopelessly out of sequence.
Consider that 8,000 samples per second are required for a basic voice signal at 8 bits per sample. Now, assuming a 20 ms packet interval (1/50th of a second), you can see that it takes a minimum of 1,280 bits of G.711 data in each packet to adequately carry the sound:
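In code form, restating the figures just given (64 kbps of PCM sliced into 20 ms packets):

```python
SAMPLE_RATE_HZ = 8000
SAMPLE_BITS = 8
INTERVAL_MS = 20                         # 20 ms packet interval

# Payload bits per packet: PCM bit rate times the packet interval.
bits_per_packet = SAMPLE_RATE_HZ * SAMPLE_BITS * INTERVAL_MS // 1000
packets_per_second = round(1000 / INTERVAL_MS)

print(bits_per_packet)      # 1280 bits of G.711 payload in each packet
print(packets_per_second)   # 50 packets per second
```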
Mathematically, increasing the sound data in each packet means a reduction in packet overhead. Figure 6-3 illustrates a very simplified cross-section of a VoIP packet carrying 20 ms of G.711 data.
Figure 6-3. TCP/IP adds overhead to a VoIP channel; this IP packet carries 20 ms of sound

Following the previous example, increasing the packet interval to 30 ms (1/33rd of a second) reduces the number of packets required per second, raising the bit count per packet and reducing the amount of overhead required to transmit the sound:
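Repeating the same arithmetic at a 30 ms interval shows the shift: fewer packets per second, with more payload bits in each one:

```python
SAMPLE_RATE_HZ = 8000
SAMPLE_BITS = 8
INTERVAL_MS = 30                         # 30 ms packet interval

bits_per_packet = SAMPLE_RATE_HZ * SAMPLE_BITS * INTERVAL_MS // 1000
packets_per_second = round(1000 / INTERVAL_MS)

print(bits_per_packet)      # 1920 bits of G.711 payload in each packet
print(packets_per_second)   # roughly 33 packets per second
```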
Generally, on Ethernet-to-Ethernet calls, the use of G.711 with a 20 ms packet interval is encouraged, because a 100 mbps data link can support hundreds of simultaneous 64 kbps calls without congestion, and a dropped packet at a 20 ms interval is almost imperceptible.
On calls that cross low-bandwidth links, it's up to the administrator to balance between latency, possible reductions in sound quality incurred by using a compression codec, and network congestion.
Up until this point, we've been talking about the overhead of each packet merely as it relates to the amount of voice payload it carries, and with good reason: codec selection and framing are the things over which you, as an administrator, have the most control.
But packet overhead is affected by the network and data link layers, too. Ethernet frames have a different size and different overhead than ATM cells or frame relay frames. Network overhead is addressed in the following section.
Different codecs have different bandwidth requirements. Table 6-1 shows the characteristics of the most popular VoIP codecs.
Table 6-1. VoIP codec characteristics
6.2.3.2. The T1 carrier versus VoIP
The T1's 24 DS0 channels are each never-ending streams of digitized voice information. In reality, though, the T1 circuit itself is one big stream of binary digits that uses TDM (time-division multiplexing) to divide the T1 into those 24 DS0 channels. Each is assigned a time slice of the big stream, and each time slice is further divided into frames, as shown in Figure 6-1. A voice T1 uses the same amount of bandwidth no matter how many calls are in progress: roughly 1.54 mbps. Trunking with T1s is very stable and predictable as a result.
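The 1.54 mbps figure is just the sum of the 24 multiplexed DS0s plus the T1's own 8 kbps framing channel, a relationship that's easy to confirm:

```python
DS0_BPS = 64_000       # one TDM voice channel
CHANNELS = 24          # DS0s multiplexed into a T1
FRAMING_BPS = 8_000    # T1 framing overhead

t1_bps = CHANNELS * DS0_BPS + FRAMING_BPS
print(t1_bps)          # 1544000 -> roughly 1.54 mbps, calls or no calls
```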
VoIP frees the system builder from requirements traditionally imposed on the lower OSI layers of the network. In a T1, the transport and data link layers are defined together as a bundled carrier, and you have to use the G.711 PCM codec on all the channels, yielding 24 simultaneous voice channels in the available bandwidth. VoIP lets you pick and choose the codec, packet interval, and transport technologies you want and thus gives you ultimate control. Using the G.729A codec and a T1, you could conceivably trunk hundreds of calls at once.
Unlike G.711 traffic on a T1, VoIP's "carrier" is TCP/IP. So VoIP can traverse Ethernet, T1s, DSL lines, cable internet lines, POTS lines, frame relay networks, virtual private networks (VPNs), microwave radio, satellite connections, ATM, and just about any other link. If IP can go there, VoIP can go there, just with varying levels of quality.
6.2.3.3. Voice packet structure
The layered appearance of a VoIP packet is similar to that of other types of networked applications that run within the TCP/IP protocol: the lower layers encapsulate the higher layers recursively.
The lowest layer, shown leftmost in Figure 6-3, is the Internet Protocol (IP) packet header. It contains routing information so that the packet can be handled correctly by the devices responsible for carrying it across the network. It also contains a flag that indicates which protocol of the TCP/IP suite this packet is carrying: TCP, UDP, or something else. Voice packets are almost always UDP. Among other things, the IP header may include a Type of Service flag that allows routers and switches to treat it with a certain priority based on its sensitivity to delay. At a minimum, the IP packet header is 160 bits in length.
The payload of the IP packet is the UDP datagram, whose header is 64 bits long. Its first 32 bits contain the source and destination ports of the traffic it carries in its payload (16 bits each), followed by 16 bits describing the length of the datagram in bytes and 16 bits for an optional checksum.
The payload of the UDP packet is the RTP packet, whose header is 96 bits long. It contains information about the sequence and timing of the packet within the greater data stream.
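Summed up, those three headers (IP, UDP, RTP) are the fixed per-packet cost of the TCP/IP "carrier." A quick sketch using the header sizes just given, and the 1,280-bit G.711 payload from the earlier calculation:

```python
IP_HEADER_BITS = 160    # minimum IPv4 header (20 bytes)
UDP_HEADER_BITS = 64    # fixed UDP header (8 bytes)
RTP_HEADER_BITS = 96    # fixed RTP header (12 bytes)
PAYLOAD_BITS = 1280     # 20 ms of G.711 sound data

overhead_bits = IP_HEADER_BITS + UDP_HEADER_BITS + RTP_HEADER_BITS
overhead_share = overhead_bits / (overhead_bits + PAYLOAD_BITS)

print(overhead_bits)    # 320 bits (40 bytes) of headers on every packet
print(overhead_share)   # 0.2 -> 20% of the packet is overhead
```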
6.2.3.4. Real-Time Transport Protocol
The Real-Time Transport Protocol (RTP) defines a simple way of sending and receiving encoded media streams [*] in connectionless sessions. It provides headers that afford VoIP systems an easy way of discriminating between multiple sessions on the same host. Remember that the codec merely describes how the digitized sample is encoded, compressed, and decoded. RTP is responsible for transporting the encoded sound data within a UDP datagram. RTP was designed for use outside the realm of telephony, too: streaming audio and video for entertainment and education are common with RTP.
RTP supports mixing several streams into a single session in order to support applications like conference calling. It doesn't, however, provide adequate controls for defining multiplexed voice pathways that are normally associated with telephony, like trunks. This is the responsibility of the softPBX and its signaling protocols. Control of RTP's media sessions, and collection of data relevant to those sessions, is accomplished by RTP's sister, RTCP (Real-Time Transport Control Protocol). Together, RTP and RTCP provide:
For the VoIP administrator, RTP is largely invisible. Most VoIP frameworks and system-building tools, including Asterisk and OpenH323, implement RTP so seamlessly that the administrator rarely has to worry about its inner workings. If you are interested in RTP, check out Internet RFC (Request for Comments) 1889, published by the IETF's Network Working Group and since superseded by RFC 3550.
As shown in Figure 6-3, the only part of each VoIP packet not considered overhead is the payload of the RTP packet, which is encoded sound data.
Ethernet is a physical and logical data link specification that makes provisions for error detection among locally connected network devices. Ethernet packets, called frames, are typically no larger than about 1,500 bytes, or 12,000 bits. VoIP packets are very rarely larger than 250 bytes, or 2,000 bits. Not accounting for Ethernet overhead, the packet in Figure 6-3 is 1,600 bits long, a rather small packet.
Like RTP, UDP, and IP, Ethernet adds some bulk to each packet. The overhead Ethernet imposes is 176 bits for its header and 128 bits for its CRC "footer," a bumper at the end of each Ethernet frame that provides an error-detection mechanism used by the network interfaces on participating hosts. Figure 6-4 shows an Ethernet VoIP frame.
The total size of a G.711 Ethernet VoIP frame is 1,904 bits. At a standard packet interval of 20 ms and 50 pps, a voice call digitized using plain-vanilla PCM at a rate of 64 kbps consumes a healthy amount of overhead: specifically, 15.2 kbps of Ethernet overhead and 16 kbps of combined RTP, UDP, and IP overhead.
When you add in the payload to all that overhead, an Ethernet-transported voice channel using the G.711 codec requires 95.2 kbps of bandwidth.
While a G.729A voice channel requires only 8 kbps of bandwidth to frame the sound stream, the overhead of IP, UDP, RTP, and Ethernet adds 31.2 kbps, putting the total bandwidth consumption of a G.729A call at 39.2 kbps. Table 6-2 shows the total Ethernet bandwidth consumed by several of the most popular codecs.
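The totals in the last two paragraphs come from a single formula: the codec's payload rate plus the per-packet overhead times the packet rate. A sketch, assuming the 20 ms interval and the TCP/IP and Ethernet header sizes given above:

```python
PPS = 50                  # packets per second at a 20 ms interval
IP_UDP_RTP_BITS = 320     # IP + UDP + RTP headers, per packet
ETHERNET_BITS = 304       # Ethernet header plus CRC "footer," per frame

def ethernet_call_bps(codec_bps: int) -> int:
    """Total Ethernet bandwidth for one voice channel."""
    return codec_bps + (IP_UDP_RTP_BITS + ETHERNET_BITS) * PPS

print(ethernet_call_bps(64_000))  # G.711:  95200 -> 95.2 kbps
print(ethernet_call_bps(8_000))   # G.729A: 39200 -> 39.2 kbps
```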
Figure 6-4. An Ethernet-encapsulated 20 ms VoIP packet
Table 6-2. VoIP codec bandwidth consumption
Ethernet isn't the only data link suitable for carrying VoIP packets: ATM, frame relay, point-to-point circuits, and other technologies can be used, and each introduces its own overhead factors.
6.2.4. Decoding and Playback
When a VoIP packet is received, it is decoded according to the codec employed to encode it. It is then played back on the analog hardware of the receiving endpoint (a speaker) while undergoing DAC, or digital-to-analog conversion. Decoding generally takes about as much processing power as encoding, depending on the codec employed.
Most IP phones and ATAs support several codecs, as shown in Table 6-3. All support G.711 using both the μ-law and A-law scales, and a majority support G.729A, though with variance in the quality and completeness of their implementations. It's fair to say that G.711 and G.729A are the two most popular VoIP codecs in use today.
Table 6-3. Codecs supported by some leading VoIP endpoint devices
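G.711's μ-law and A-law are companding schemes: they give quiet sounds finer quantization than loud ones before the 8-bit sample is stored. The sketch below implements the continuous μ-law curve (μ = 255), not the exact segmented lookup that real G.711 codecs use, just to show the shape of the compression:

```python
from math import log, copysign

MU = 255.0  # companding parameter for North American/Japanese G.711

def mu_law_compress(x: float) -> float:
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    if x == 0.0:
        return 0.0
    return copysign(log(1 + MU * abs(x)) / log(1 + MU), x)

# Quiet samples are boosted before quantization, preserving their detail:
print(round(mu_law_compress(0.01), 3))  # ~0.228: small signal expanded
print(mu_law_compress(1.0))             # 1.0: full scale is unchanged
```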
6.2.4.1. Things that degrade playback quality
Several factors can degrade the quality of audio transmitted over the network:
In Project 3.1, a SIP endpoint was used to connect through the Asterisk server to a demonstration server on the Internet via the IAX signaling protocol. Though the voice channel from the SIP phone to the Asterisk server was encoded using G.711 μ-law, the cross-Internet voice channel to the demonstration server was encoded using GSM, as shown in Figure 6-5.
Figure 6-5. A call path that uses two codecs
When a call path requires more than one codec (different ones on the two legs of the call, as in Figure 6-5), the softPBX, or another specialized server device called a gateway, must transform each leg of the call in real time, or transcode it. Certain connectivity media don't provide enough bandwidth to facilitate G.711 from end to end. A 64 kbps circuit, for example, can't carry a G.711 call, because the call requires more than 64 kbps once IP packet overhead is accounted for. So bandwidth-conserving codecs have their uses, but not all endpoints support every codec. Transcoding is the solution.
Transcoding is a processing-intensive task, so it's a good idea to minimize the number of codecs that you support as standards on your network. A few conference calls with three or four codecs apiece could be a real handful. Cisco and other commercial vendors recommend G.711 for local calls over Ethernet, and G.729A over low-bandwidth WAN connections. The softPBX will insert itself into the path of the call in order to negotiate the appropriate codec on each leg and then perform transcoding.
6.2.4.2. Call paths
Even though the softPBX is the central call-management and -signaling element on the VoIP network, it doesn't always sit in the call path. One of the purposes of SIP, and of other signaling protocols, is to allow endpoints to discover which codecs their peers support, so that, when beginning a call, both endpoints can use the same one. Another purpose is to allow for multiple pathways through the voice network based on the capabilities of each endpoint and the preferences of the administrator. These pathways are known as call paths.
For example, an IP phone can place a call through the softPBX, and the softPBX can act as a proxy for the sound signals, receiving them from the caller and sending them milliseconds later to the receiver. In this case, the softPBX may or may not be transcoding, but it is a point in the call path. You could call this a softPBX call path, or a proxied call path.
But an IP phone placing a call needn't always have its call path cross through the softPBX. In fact, in most commercial VoIP softPBX implementations, this isn't the preferred method. Indeed, with Cisco's CallManager, it isn't even possible out of the box, with exceptions for a few centralized applications like conferencing, music-on-hold, and bridging. In these setups, the softPBX sets up the call using a signaling protocol, and then the phones themselves communicate the sound data directly to each other in UDP bursts. The big advantage of an independent call path is that it incurs less processing load on the softPBX. One disadvantage is that it's impossible to run centralized applications that deal with the sound stream in the call, like, say, a clandestine call-recording application.
When transcoding is employed, the call path always crosses the softPBX or another gateway device that speaks all the necessary codecs. Fortunately, transcoding tends only to be used when a medium other than Ethernet is being used for connectivity, and a codec besides G.711 is employed for that leg of the call path.
So, the call path determination is affected by several issues:
Commercial vendors support automatic selection of a call path to varying degrees. As indicated earlier, some don't support a softPBX call path at all, except to deliver centralized applications like conferencing, and not for transcoding. Others support automated negotiation of the call path during call setup signaling.
Asterisk falls into the latter group. It allows either kind of path for SIP calls (except a conference call) according to the administrator's design. Project 6.1 describes how to enable an independent call path using a SIP feature called Reinvite.
6.2.4.3. Silence suppression and comfort noise generation
When nobody is speaking, there's a great opportunity to save bandwidth, because during periods of silence, no sound data needs to be transmitted over the network, right?
Several codecs have taken this idea to heart. GSM, G.723.1, and others support silence suppression, a technique that suspends the packet stream during periods of silence. In order to create a seamless experience for the person listening to that silence, silence suppression is usually accompanied by comfort noise generation: a small amount of white noise. This white noise is created by the listener's endpoint, rather than being transmitted to her over the network.