Chapter 6. Replacing the Voice Circuit with VoIP
In circuit-switched voice networks, every time a call is placed, the network establishes a dedicated pathway from the calling endpoint through the network to the receiving endpoint. This
This chapter describes the software and hardware elements of the voice loop as it exists in Voice over IP so you can get the most out of it. Indeed, a VoIP admin can do more to improve the quality and economics of his network by "tweaking the loop" than he can by fiddling with any other aspect of the network.
6.1. The "Dumb" Transport
In a VoIP network, each loop, or pathway, from caller to receiver is virtualized and controlled using software. So, during times of silence, for instance, the call's
LAN and WAN data links, each capable of carrying TCP/IP, are just systems for moving bits, and the low
Compare that to the trusty old voice T1, which is a rigidly defined data link designed with DS0 voice channels in mind: each channel is fixed at 64 kbps, and there can be 24 simultaneous phone calls per T1.
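The DS0 and T1 figures come from simple arithmetic. A DS0 carries 8,000 samples per second at 8 bits per sample, which gives 64 kbps. A T1 multiplexes 24 of those channels and adds one framing bit per 193-bit frame, or 8 kbps, for a 1.544 Mbps line rate. The Python sketch below reproduces those numbers. The `t1s_needed` helper is a hypothetical illustration for trunk sizing; it ignores signaling channels and packet overhead.

```python
import math

SAMPLE_RATE_HZ = 8_000       # PCM samples per second for telephone-quality voice
BITS_PER_SAMPLE = 8          # one 8-bit sample per sample period
DS0_CHANNELS_PER_T1 = 24
T1_FRAMING_BPS = 8_000       # one framing bit per T1 frame, 8,000 frames per second


def ds0_bps() -> int:
    """Bit rate of a single DS0 voice channel."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE


def t1_payload_bps() -> int:
    """Bits per second available to the T1's 24 voice channels."""
    return DS0_CHANNELS_PER_T1 * ds0_bps()


def t1_line_bps() -> int:
    """Total T1 line rate, including the framing bits."""
    return t1_payload_bps() + T1_FRAMING_BPS


def t1s_needed(calls: int, kbps_per_call: float) -> int:
    """Hypothetical helper: whole T1 circuits needed for `calls` concurrent
    calls at `kbps_per_call`, ignoring signaling and packet overhead."""
    demand_bps = calls * kbps_per_call * 1000
    return math.ceil(demand_bps / t1_payload_bps())


print(ds0_bps())            # 64000
print(t1_line_bps())        # 1544000
print(t1s_needed(100, 64))  # 5
print(t1s_needed(100, 44))  # 3
```

The last two lines are the comparison made in the compression discussion later in this chapter: 100 uncompressed calls need five T1s, while the same calls at 44 kbps fit in three.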
Of course, all of VoIP's economy and flexibility come at a price: sophistication of infrastructure. Many traditional telephone technicians aren't hip to what TCP/IP
6.2. Voice Channels
A VoIP softswitch has two main functions: call management (or switching), which is covered in the
Voice transmission (the packaging, transmittal, receiving, and reconstruction of digitized voice data) occurs inside virtualized
For this chapter, we'll give the word channel a wide definition: the complete virtualized transport that takes the mouth-to-ear analog signal and transports it over a great distance using networked software. There are several steps in the process of transmitting voice sounds over a channel: sampling, digitizing, encoding, transport, decoding, and playback. Usually, each step occurs once per packet of voice data. Complex applications like conference calling, surveillance, or overhead paging may handle these steps in a unique way, but for this chapter, we'll concentrate on how the process works for a standard, two-party, point-to-point phone call.
6.2.1. Sampling and Digitizing
Digital-to-analog conversion (DAC) and
The DAC processes employed in Voice over IP aren't tied to the data link, so they can vary greatly: different DAC, digitizing, and compression techniques are used in different circumstances. Sometimes the data link's properties, like bandwidth capacity and latency, are factors in the selection of these techniques, but not always. DAC is required in all telephony environments, even where VoIP isn't used, because just about every traditional telephony system employs digital
DAC includes quantizing, or digital "sampling," of sounds, filtering for bandwidth preservation, and signal compression for bandwidth efficiency. Pulse code modulation (PCM) is the most common sampling technique used to
6.2.1.1 The 64 kbps channel
To connect a phone call, a traditional telephone, whether analog or digital, requires a loop with enough quality for 64 kilobits per second of digital throughput. In fact, 64 kbps is the fixed line speed of any POTS line. Analog and (most) digital telephone systems offer similar sound clarity because they
It should suffice for now that each concurrent voice conversation, analog or digital, requires a link capable of a speed or bandwidth [*] of 64 kbps. As you'll discover, the 64 kbps channel is a baseline unit for dealing with sizing issues on your VoIP network. It's important to look at bandwidth conservation
6.2.2. Encoding
6.2.2.1 Framing
Framing is the real-time process of dividing a stream of digital sound information into manageable, equal-sized hunks.
Figure 6-1. Framing is the process of dividing a digital stream into equal-sized hunks
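At the 8,000-samples-per-second rate used for telephone voice, a 20 ms frame holds 160 samples. The Python sketch below splits a buffer of 8-bit samples into equal-sized frames. The 20 ms default and the choice to drop a trailing partial frame are assumptions made for illustration; a real encoder would hold the leftover samples until more sound arrives.

```python
SAMPLE_RATE_HZ = 8_000  # telephone-quality PCM, one byte per sample


def samples_per_frame(frame_ms: int) -> int:
    """Number of samples that fit in one frame of the given duration."""
    return SAMPLE_RATE_HZ * frame_ms // 1000


def frame_stream(samples: bytes, frame_ms: int = 20) -> list[bytes]:
    """Split a stream of 8-bit samples into equal-sized frames.
    Any trailing partial frame is dropped in this sketch."""
    n = samples_per_frame(frame_ms)
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


one_second = bytes(8_000)          # one second of silence
frames = frame_stream(one_second)  # 20 ms frames
print(len(frames), len(frames[0]))  # 50 160
```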
6.2.2.2 Digital versus packet based
Unlike an analog phone line, the sound signals transmitted and received by an IP endpoint are digital. This makes them more akin to those carried over a traditional voice T1 or ISDN circuit. But unlike a digital phone company circuit, VoIP calls are also packet based. This means that the sound frames are carried across the network in units that are also used to carry other kinds of data: in VoIP's case, UDP datagrams.
6.2.2.3 Multiplexing
The PSTN offers a way of providing far higher call capacity than a single POTS line: 24 simultaneous calls using two pairs of wire rather than one pair per simultaneous call. This
6.2.2.4 Compression
VoIP provides an even more economical way of linking those PBXs together. If 100 calls are to occur at the same time using a PBX, then
It's possible to reduce a 64 kbps voice call down to 44 kbps without a noticeable reduction in sound quality, a feat that, setting aside the concept of overhead (which we'll cover later), is quite common with VoIP compression methods. Now, that link between PBXs uses only about 4.4 Mbps (100 × 44 kbps) and needs only three T1 circuits instead of five, resulting in a much cheaper trunk. The algorithms VoIP uses to encode sound data, and sometimes to decrease bandwidth requirements, are called codecs. In order to get three T1 circuits to do the work of five in a voice application, bandwidth-conserving codecs are used.
6.2.2.5 Codecs
Codecs, short for
Most of the codecs in use on VoIP networks were defined by ITU-T recommendations of the G series (transmission systems and media). A few are well suited to very high-fidelity applications like music streaming, but most are suitable only for the spoken word. They're the ones we'll be
Telephony audio codecs break down into two groups: those that are based on pulse code modulation and those that restructure the digital representation of PCM into a more portable format. So the two groups of telephony codecs are PCM codecs, which are the basic 64 kbps codecs, and vocoders , which are the codecs that go a step beyond the essential PCM algorithm. Here are the codecs you'll see most often:
Each of the codecs has some pros and cons. G.711 is great on data links where there's plenty of capacity and very little latency, like Ethernet. It's also highly resilient to errors. But you wouldn't want to use it on a 56 kbps frame relay link because there would not be enough bandwidth. On the other hand, the codecs that provide compression do so at a loss, or degradation in quality, to the sound. That's why some call them "lossy" codecs.
6.2.2.6 Codec packet rates
Besides the bits that represent data, all data packets carry bits used for routing and sometimes for error correction. These "overhead bits" have no direct benefit to voice applications, other than allowing the lower layers to function: things like Ethernet headers, IP routing headers, and other information necessary for transport of the packet. When longer durations of sound are carried by each packet, these overhead items don't have to be transmitted as often, because fewer packets are required to transport the same sound. The net result of
The packet rate is the number of packets required per second (pps) of sound transmitted. Again, different audio codecs use different rates. The gap between transmitted packets is called the packet interval, and it is
But with longer packet intervals comes increased lag (see Figure 6-2). The longer the interval, the longer the lag will be between the time the sound is spoken and the time it is encoded, transported, decoded, and
Figure 6-2. Longer packet intervals cause lag, but decrease overhead
Long packet intervals have another drawback: the greater the duration of sound carried by each packet, the greater the chance that a listener will notice a negative effect on the sound if a packet is dropped due to congestion or a network error. Dropping a packet carrying 20 ms of sound is almost imperceptible with the G.711 codec, but dropping a 60 ms packet is quite obtrusive. Since VoIP sound frames are carried in "unreliable" UDP datagrams, dropped packets aren't retransmitted. Even if TCP packets were used instead of UDP, error awareness and retransmission would take so long that, by the time the retransmitted packet arrived at the receiving phone, it would be hopelessly out of sequence.
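The trade-off described above is easy to tabulate. This sketch assumes G.711 (8,000 samples per second, 8 bits per sample) and shows, for several packet intervals, how many packets per second are sent and how much sound each one carries. Each packet also adds at least one interval of delay, since the sender must collect that much sound before it can send, and a dropped packet loses that much sound outright. The function names are hypothetical.

```python
SAMPLE_RATE_HZ = 8_000     # G.711 samples per second
G711_BITS_PER_SAMPLE = 8


def packet_rate_pps(interval_ms: float) -> float:
    """Packets per second needed at a given packet interval."""
    return 1000 / interval_ms


def g711_payload_bytes(interval_ms: int) -> int:
    """Bytes of G.711 sound carried in each packet at this interval."""
    return SAMPLE_RATE_HZ * interval_ms // 1000 * G711_BITS_PER_SAMPLE // 8


for interval in (10, 20, 30, 60):
    # Longer intervals: fewer packets, but more delay and more sound lost per drop.
    print(f"{interval} ms: {packet_rate_pps(interval):.1f} pps, "
          f"{g711_payload_bytes(interval)} bytes of sound per packet")
```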
Consider that 8,000 samples per second are required for a basic voice signal at 8 bits per sample. Now,
Mathematically, increasing the sound data in each packet means a reduction of packet overhead. Figure 6-3 illustrates a very simplified cross-section of a VoIP packet carrying 20 ms of G.711 data.
Figure 6-3. TCP/IP adds overhead to a VoIP channel; this IP packet carries 20 ms of sound
Following the previous example, increasing the packet interval to 30 ms (1/33rd of a second) results in a reduction in the number of packets required per second, raising the bit count per packet and reducing the amount of overhead required to transmit the sound.
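Here is that arithmetic worked out for G.711. The sketch assumes a 20-byte (160-bit) IPv4 header with no options; together with the 64-bit UDP header and the 96-bit RTP header described later in this chapter, that accounts for the 1,600-bit packet in Figure 6-3 (1,280 bits of sound plus 320 bits of headers). Data link overhead is ignored here, and the names are hypothetical.

```python
from typing import NamedTuple

IP_HEADER_BITS = 160    # IPv4 header without options (20 bytes)
UDP_HEADER_BITS = 64
RTP_HEADER_BITS = 96
HEADER_BITS = IP_HEADER_BITS + UDP_HEADER_BITS + RTP_HEADER_BITS  # 320 bits per packet


class CallBandwidth(NamedTuple):
    packet_bits: int     # sound payload plus IP/UDP/RTP headers
    pps: float           # packets per second
    overhead_bps: float  # header bits sent per second
    total_bps: float     # everything sent per second, at the IP layer


def ip_layer_bandwidth(codec_bps: int, interval_ms: int) -> CallBandwidth:
    pps = 1000 / interval_ms
    payload_bits = codec_bps * interval_ms // 1000
    packet_bits = payload_bits + HEADER_BITS
    return CallBandwidth(packet_bits, pps, HEADER_BITS * pps, packet_bits * pps)


for ms in (20, 30):
    b = ip_layer_bandwidth(64_000, ms)
    print(ms, b.packet_bits, round(b.overhead_bps), round(b.total_bps))
# 20 1600 16000 80000
# 30 2240 10667 74667
```

Moving from 20 ms to 30 ms packets cuts the header cost of a G.711 call from 16 kbps to about 10.7 kbps, at the price of the extra delay and loss sensitivity described above.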
On calls that cross low-bandwidth links, it's up to the administrator to balance between latency, possible reductions in sound quality incurred by using a compression codec, and network congestion.
Up until this point, we've been talking about the overhead of each packet merely as it
But packet overhead is affected by the network and data link layers, too. Ethernet frames have a different
Different codecs have different bandwidth requirements. Table 6-1 shows the characteristics of the most popular VoIP codecs.
Table 6-1. VoIP codec characteristics
6.2.3. Transport
6.2.3.1 The T1 carrier versus VoIP
The T1's 24 DS0 channels are each
VoIP
Unlike G.711 traffic on a T1, VoIP's "carrier" is TCP/IP. So VoIP can traverse Ethernet, T1s, DSL lines, cable internet lines, POTS lines, frame relay networks, virtual private networks (VPNs), microwave radio, satellite connections, ATM, and just about any other link. If IP can go there, VoIP can go there, just with varying levels of quality.
6.2.3.2 Voice packet structure
The layered appearance of a VoIP packet is similar to that of other types of networked applications that run within the TCP/IP protocol: the lower layers encapsulate the higher layers recursively.
The
The payload of the IP packet is the UDP packet, whose header is 64 bits long. Its first 32 bits contain the source and destination ports of the UDP traffic it carries in its payload, followed by 16 bits giving the length of the datagram (header plus payload) in bytes and 16 bits for an optional checksum.
The payload of the UDP packet is the RTP packet, whose header is 96 bits long. It contains information about the sequence and timing of the packet within the greater data stream.
6.2.3.3 Real-Time Transport Protocol
The Real-Time Transport Protocol (RTP) defines a simple way of sending and receiving encoded media streams [*] in connectionless sessions. It provides headers that afford VoIP systems an easy way of discriminating between multiple sessions on the same host. Remember that the codec merely describes how the digitized sample is encoded, compressed, and decoded. RTP is responsible for transporting the encoded sound data within a UDP datagram. RTP was designed for use outside the realm of telephony, too: streaming audio and video for entertainment and education are common with RTP.
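To make the UDP header's 64 bits concrete, here is a Python sketch that packs one with the standard library's `struct` module. The header is four 16-bit big-endian fields: source port, destination port, length, and checksum. The port numbers and payload size are illustrative assumptions; 172 bytes is 160 bytes of 20 ms G.711 sound plus a 12-byte RTP header. A checksum of zero means "not computed," which UDP permits over IPv4.

```python
import struct

UDP_HEADER_BYTES = 8


def udp_header(src_port: int, dst_port: int, payload_len: int, checksum: int = 0) -> bytes:
    """Pack an 8-byte UDP header: four 16-bit fields in network byte order.
    The length field counts the header plus the payload."""
    length = UDP_HEADER_BYTES + payload_len
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)


# Arbitrary example ports; payload = 12-byte RTP header + 160 bytes of G.711.
hdr = udp_header(16384, 16386, 172)
print(len(hdr) * 8)                     # 64
print(struct.unpack("!HHHH", hdr)[2])   # 180
```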
RTP supports mixing several streams into a single session in order to support applications like conference calling. It doesn't, however, provide adequate controls for defining multiplexed voice pathways that are normally associated with telephony, like trunks. This is the responsibility of the softPBX and its signaling protocols. Control of RTP's media sessions, and collection of data relevant to those sessions, is accomplished by RTP's sister, RTCP (Real-Time Transport Control Protocol). Together, RTP and RTCP provide:
For the VoIP administrator, RTP is largely invisible. Most VoIP frameworks and system-building tools, including Asterisk and Open H.323, implement RTP so seamlessly that the administrator rarely has to worry about its inner workings. If you are interested in RTP, check out Internet RFC (Request for Comments) 1889, published by the IETF's Network Working Group, and its successor, RFC 3550.
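Even if RTP stays invisible in practice, building its 96-bit header by hand shows what the sequence and timing fields do. This is a minimal sketch, assuming RTP version 2 with no padding, no header extension, and no contributing sources. Payload type 0 is the static type assigned to G.711 μ-law (PCMU). The SSRC value and the starting sequence number and timestamp of zero are arbitrary choices for readability; real senders pick random initial values. The function names are hypothetical.

```python
import struct

RTP_VERSION = 2
PT_PCMU = 0  # static RTP payload type for G.711 mu-law


def rtp_header(seq: int, timestamp: int, ssrc: int,
               payload_type: int = PT_PCMU, marker: bool = False) -> bytes:
    """Pack a minimal 12-byte RTP header (no padding, extension, or CSRCs)."""
    byte0 = RTP_VERSION << 6                            # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)  # M bit + 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def packet_headers(count: int, samples_per_packet: int = 160,
                   ssrc: int = 0x1234ABCD) -> list[bytes]:
    """Headers for consecutive 20 ms G.711 packets: the sequence number rises
    by 1 per packet, the timestamp by the number of samples each one carries."""
    return [rtp_header(i, i * samples_per_packet, ssrc) for i in range(count)]


hdrs = packet_headers(3)
print(len(hdrs[0]) * 8)                               # 96
print([struct.unpack("!BBHII", h)[3] for h in hdrs])  # [0, 160, 320]
```

The receiver uses the sequence number to spot lost or reordered packets and the timestamp to schedule playback, which is the "sequence and timing" information the header carries.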
As shown in Figure 6-3, the only part of each VoIP packet not
6.2.3.4 Ethernet
Ethernet is a physical and logical data link specification that makes provisions for error detection on locally connected network devices. Ethernet packets, called frames, are typically less than 1,500 bytes, or about 12,000 bits. VoIP packets are very rarely larger than 250 bytes, or 2,000 bits. Not accounting for Ethernet overhead, the packet in Figure 6-3 is 1,600 bits long, a rather small packet.
Like RTP, UDP, and IP, Ethernet adds some bulk to each packet. The overhead Ethernet imposes is 176 bits for its header and 128 for its CRC "footer"a
The total size of a G.711 Ethernet VoIP frame is 1,904 bits. At a standard packet interval of 20 ms and 50 pps, a voice call digitized using plain-
When you add in the payload to all that overhead, an Ethernet-transported voice channel using the G.711 codec requires 95.2 kbps of bandwidth.
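The 95.2 kbps figure, and the G.729A figure discussed next, can be reproduced in a few lines of Python. This sketch simply takes the chapter's numbers as given: 320 bits of IP, UDP, and RTP headers per packet (as in Figure 6-3), plus 176 bits of Ethernet header and 128 bits of trailer, at a 20 ms interval (50 pps). The function name is hypothetical.

```python
IP_UDP_RTP_BITS = 160 + 64 + 96  # 320 header bits per packet, as in Figure 6-3
ETH_HEADER_BITS = 176            # Ethernet header figure used in this chapter
ETH_TRAILER_BITS = 128           # Ethernet CRC "footer" figure used in this chapter


def ethernet_bandwidth_kbps(codec_kbps: float, interval_ms: int = 20) -> float:
    """Per-call bandwidth on Ethernet, counting every layer's overhead."""
    pps = 1000 / interval_ms
    payload_bits = codec_kbps * interval_ms  # kbps x ms = bits
    frame_bits = payload_bits + IP_UDP_RTP_BITS + ETH_HEADER_BITS + ETH_TRAILER_BITS
    return frame_bits * pps / 1000


print(ethernet_bandwidth_kbps(64))  # G.711: 95.2
print(ethernet_bandwidth_kbps(8))   # G.729A: 39.2 (8 kbps of sound + 31.2 kbps overhead)
```

The overhead is the same 624 bits per packet for both codecs, so it dominates the low-rate codec: roughly four-fifths of a G.729A channel's bandwidth is headers.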
While a G.729A voice channel requires only 8 kbps of bandwidth to frame the sound stream, the overhead of IP, UDP, RTP, and Ethernet adds 31.2 kbps,
Figure 6-4. An Ethernet-encapsulated 20 ms VoIP packet
Table 6-2. VoIP codec bandwidth consumption
Ethernet isn't the only data link suitable for carrying VoIP packets: ATM, frame relay, point-to-point circuits, and other technologies can be used, and each introduces its own overhead factors.
6.2.4. Decoding and Playback
When a VoIP packet is received, it is decoded according to the codec employed to encode it. It is then played back on the analog hardware of the receiving endpoint (a speaker) while undergoing DAC, or digital-to-analog conversion. Decoding generally takes about as much processing power as encoding, depending on the codec employed.
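Decoding mirrors encoding. G.711's μ-law scale is a good illustration: each sample is compressed logarithmically before being quantized to 8 bits, so quiet sounds keep finer resolution than they would with evenly spaced steps, and the decoder applies the inverse curve. This sketch uses the continuous μ-law formula with μ = 255. Real G.711 uses a segmented, table-driven approximation of this curve, so these codes are not bit-compatible with actual G.711. The function names are hypothetical.

```python
import math

MU = 255.0


def mu_law_compress(x: float) -> float:
    """Continuous mu-law companding of a sample in [-1.0, 1.0]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode(x: float) -> int:
    """Compand, then quantize to a signed 8-bit code (-127..127)."""
    return round(mu_law_compress(x) * 127)


def decode(code: int) -> float:
    """Map a code back to a linear sample value."""
    return mu_law_expand(code / 127)


def linear_roundtrip(x: float) -> float:
    """Plain 8-bit linear quantization, for comparison."""
    return round(x * 127) / 127


for x in (0.01, 0.5):
    print(x, decode(encode(x)), linear_roundtrip(x))
```

For a quiet sample of 0.01, the μ-law round trip stays within a fraction of a percent, while linear 8-bit quantization is off by about 21 percent; for loud samples both are close.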
Most IP phones and ATAs support several codecs, as shown in Table 6-3. All support G.711 using both the μ-law and A-law scales, and a majority support G.729A, though with variance in quality and completeness of their implementation. It's fair to say that G.711 and G.729A are the two most popular VoIP codecs in use today.
Table 6-3. Codecs supported by some leading VoIP endpoint devices
6.2.4.1 Things that degrade playback quality
Several factors can degrade the quality of audio transmitted over the network:
6.2.4.2 Transcoding
In Project 3.1, a SIP endpoint was used to connect through the Asterisk server to a demonstration server on the Internet via the IAX signaling protocol. Though the voice channel from the SIP phone to the Asterisk server was encoded using G.711 μ-law, the cross-Internet voice channel to the demonstration server was encoded using GSM, as shown in Figure 6-5.
Figure 6-5. A call path that uses two codecs
When a call path requires more than one codec (different ones on the two endpoints of the call, as in the case of Figure 6-5), the softPBX or another specialized server device called a gateway must convert the audio from one codec to the other in real time, or transcode it, between the legs of the call. Certain connectivity media don't provide enough bandwidth to carry G.711 from end to end. A 64 kbps circuit, for example, can't carry a G.711 call, because the call requires more than 64 kbps once IP packet overhead is accounted for. So bandwidth-conserving codecs have their uses, but not all endpoints support every codec. Transcoding is the solution. Transcoding is a processing-intensive task, so it's a good idea to minimize the number of codecs that you support as standards on your network. A few conference calls with three or four codecs apiece could be a real handful. Cisco and other commercial vendors recommend G.711 for local calls over Ethernet, and G.729A over low-bandwidth WAN connections. The softPBX will insert itself into the path of the call in order to negotiate the appropriate codec on each leg and then perform transcoding.
6.2.4.3 Call paths
Even though the softPBX is the central call-management and -signaling element on the VoIP network, it doesn't always sit in the call path. One of the purposes of SIP, and other signaling protocols, is to allow endpoints to discover what codecs their peers support, so that, when beginning a call, both endpoints can use the same one. Another purpose of SIP is to allow for multiple pathways through the voice network based on the capabilities of each endpoint and the preferences of the administrator. These pathways are known as call paths.
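The codec-matching decision described above can be sketched in a few lines. This is a simplified model, not SIP itself: in practice SIP endpoints exchange their codec lists through SDP offers and answers, and each product applies its own policy. Here each endpoint is assumed to list its codecs in order of preference, and the softPBX is assumed to transcode between any two codecs it supports. The function names and codec labels are hypothetical.

```python
from typing import Optional


def pick_codec(preferred: list[str], supported: list[str]) -> Optional[str]:
    """Return the first codec in `preferred` that also appears in `supported`,
    or None if the two lists share nothing."""
    available = set(supported)
    for codec in preferred:
        if codec in available:
            return codec
    return None


def plan_call(caller: list[str], callee: list[str],
              pbx_codecs: list[str]) -> tuple[str, str, str]:
    """Decide whether both legs can share one codec or the softPBX must transcode.
    Returns (mode, caller_leg_codec, callee_leg_codec)."""
    common = pick_codec(caller, callee)
    if common is not None:
        return ("shared", common, common)
    # No common codec: the softPBX sits in the call path, one codec per leg.
    leg_a = pick_codec(caller, pbx_codecs)
    leg_b = pick_codec(callee, pbx_codecs)
    if leg_a is None or leg_b is None:
        raise ValueError("no codec path available for this call")
    return ("transcode", leg_a, leg_b)


print(plan_call(["G.711u", "G.729A"], ["G.729A"], ["G.711u", "G.729A"]))
print(plan_call(["G.711u"], ["GSM"], ["G.711u", "GSM", "G.729A"]))
```

The second call mirrors Figure 6-5: a G.711 μ-law SIP phone reaching a GSM-only peer gets `("transcode", "G.711u", "GSM")`, which is why the softPBX had to sit in that call path.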
For example, an IP phone can place a call through the softPBX, and the softPBX can act as a proxy for the sound signals, receiving them from that caller and sending them milliseconds later to the receiver. In this case, the softPBX may or may not be transcoding, but it is a point in the call path. You could call this softPBX call path or a
But an IP phone placing a call doesn't always have to route its call path through the softPBX. In fact, in most commercial VoIP softPBX implementations, this isn't the preferred method. Indeed, with Cisco's CallManager, it isn't even possible out of the box, with exceptions for a few centralized applications like conferencing,
When transcoding is employed, the call path always crosses the softPBX or another gateway device that speaks all the necessary codecs. Fortunately, transcoding tends to be used only when a medium other than Ethernet is being used for connectivity, and a codec besides G.711 is employed for that leg of the call path. So call path determination is affected by several issues:
Commercial vendors support automatic selection of a call path to varying degrees. As indicated earlier, some don't support a softPBX call path at all, unless its purpose is to deliver conferencing applications and not
Asterisk
6.2.4.4 Silence suppression and