Understanding IP Protocols for VoIP

VoIP protocols come in several flavors, but two of them lead the pack in popularity. The H.323 protocol was established by the International Telecommunication Union (ITU), and is geared for multimedia applications. The Session Initiation Protocol (more commonly known as SIP) was established by the Internet Engineering Task Force (IETF), and is quickly becoming the protocol of choice for many carriers. Both of these protocols are used as a basis for transmitting VoIP calls and use the same coder-decoders (codecs), such as G.711 or G.729, to transmit and receive the calls.

Comparing H.323 and SIP

Despite the fact that its name isn’t very user friendly, H.323 is the incumbent protocol, having gained market share before SIP did. In fact, the name isn’t the only part of the protocol that’s hard to understand. Because H.323 is so complex, many companies have turned to SIP, which is an easier protocol to administer and manage. Comparing H.323 with SIP, we see that:

SIP is structurally faster than H.323. It takes less time to set up calls, requiring only one invite message, versus the eight messages for H.323. Speed of call setup is an important factor for companies that use VoIP for telemarketing campaigns. If you place hundreds of thousands of calls a day, and every call means revenue, every second counts.
SIP is an easy-to-read, text-based protocol. H.323 uses a binary encoding that is difficult to read and comprehend. The text-based nature of SIP enables it to be easily dissected and analyzed, directly as it is transmitted from your server.
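To see what "text-based" buys you, here is a minimal Python sketch that pulls apart a SIP INVITE using nothing but string operations. The message layout follows the general SIP request format, but every address, tag, and ID in the sample is made up for illustration, and the helper name parse_sip_request is my own, not part of any SIP library.

```python
# A sample SIP INVITE. All hosts, tags, and IDs are invented illustration values.
SAMPLE_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc33.example.org;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "To: Bob <sip:bob@example.com>\r\n"
    "From: Alice <sip:alice@example.org>;tag=1928301774\r\n"
    "Call-ID: a84b4c76e66710@pc33.example.org\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_sip_request(raw):
    """Split a SIP request into (method, request_uri, headers).

    Simplified: repeated headers overwrite each other and folded
    (multi-line) headers are not handled.
    """
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # The first line looks like: METHOD request-uri SIP-version
    method, request_uri, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, request_uri, headers


method, uri, headers = parse_sip_request(SAMPLE_INVITE)
print(method, uri)
print(headers["Call-ID"])
```

The same job on a binary-encoded H.323 message would need a dedicated decoder before you could read a single field.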

Tip If you have a choice of VoIP setup, I recommend using SIP over H.323. The industry is moving away from binary-based protocols, so if you were to use H.323, you might need to upgrade sooner rather than later.

Ethereal is an extremely helpful tool for troubleshooting VoIP issues, allowing you to see every message sent through the hardware it is installed on. This information includes the IP addresses being used for transmission, as well as the call setup and teardown for easy troubleshooting. The software is available for free at www.ethereal.com.

Understanding the structure of a VoIP call

Every VoIP call consists of four completely separate data streams, each governed by the parent protocol, which is generally either H.323 or SIP (see “Comparing H.323 and SIP”). The four streams are established in pairs, and each pair has one stream traveling in each direction between the two ends of the call.

One transmission set handles the setup, maintenance, and teardown (signaling) of the call, and the other set concerns itself with only the transport of voice data. Here are the details:

One pair of data streams handles signaling data to establish, maintain, and tear down the call. H.245 is the most commonly used protocol for H.323 and Session Description Protocol (SDP) is the favorite with SIP. These protocols were designed to negotiate the signaling of a voice call over VoIP. There are differences in how the signaling is managed in SIP versus H.323, but by design, they are structured to keep the signaling removed from the data stream that holds the voice portion of the call.
The other set of data streams sends the voice data. The Real-Time Transport Protocol (RTP) is the means by which your voice or video is sent from the originating phone or videoconferencing system to the system receiving the call. The signaling stream tells the VoIP equipment how to package the data and how to convert it on the far end of the call. RTP is the carrier pigeon that takes your message along the path determined by the signaling protocols.
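To make the voice stream a little less abstract, the sketch below reads the 12-byte fixed header that starts every RTP packet. The field layout is the standard RTP one; the sample packet is built by hand with made-up values, and the function name parse_rtp_header is just my label for the illustration.

```python
import struct


def parse_rtp_header(packet):
    """Decode the 12-byte fixed RTP header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two single bytes, a 16-bit sequence number,
    # a 32-bit timestamp, and a 32-bit source identifier (SSRC).
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": first >> 6,
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "csrc_count": first & 0x0F,
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,  # 0 = G.711 mu-law, 8 = G.711 a-law, 18 = G.729
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hand-built sample: version 2, payload type 0, followed by 160 bytes of
# silence standing in for 20 ms of G.711 audio. All values are invented.
sample = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x12345678) + bytes(160)
header = parse_rtp_header(sample)
print(header["version"], header["payload_type"], header["sequence"])
```

Notice that nothing in the header says who is calling whom. That information lives in the signaling streams, which is exactly the separation described above.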

Reading in the Ethereal world

Ethereal is a software package that enables your VoIP switch to capture packets, as long as the switch is using an OS (operating system) into which Ethereal can be installed (Ethereal runs on all popular platforms, including Windows, Unix, and Linux systems).

Ethereal only captures packets that flow through the computer on which it is installed, so if you have Ethereal loaded on a server behind your router, you will only be able to record packets passed between your router and your IP phones, and not the packets between your router and your carrier.

Remember Figure 15-2 shows the four paths taken in a standard VoIP call. If any one of the connections doesn’t terminate to the correct location, the call is impaired. Terminating the signaling stream to the wrong IP address on an unknown router prevents the call from being established, and terminating the RTP at the wrong location either prevents any sound from being transmitted, or results in one-way audio.

Figure 15-2: The four paths of a typical VoIP call.

Figure 15-3 demonstrates VoIP’s flexibility: When signaling and RTP streams can operate independently, you have a whole lot of options. In the figure, one phone (IP Phone A) is calling another phone (IP Phone B). The RTP stream is being reinvited to the media server that provides voicemail for IP Phone B.

Figure 15-3: The voice portion of this call in the RTP stream is being sent to a media server. The signaling protocol never connects between either of the phones and the media server.

The interesting thing about this call is that the media portion of the call (that is, the voice part) in the RTP stream is being sent to a media server, and the signaling protocol doesn’t establish a connection between either of the phones and the media server.

Tip You can configure VoIP protocols to try several different phones (a cellphone, another office phone, or a home phone). For example, say you decide that after five rings, if the call isn’t answered, the call is sent to the next phone on the list. Such features are commonly referred to as reinvite features.
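The ring-then-move-on behavior in this tip boils down to walking a list in order. The Python sketch below only simulates that decision; it doesn't talk to any phone system, and the names ring_in_order, hunt_list, and answers_within are hypothetical stand-ins for whatever your VoIP platform calls these settings.

```python
def ring_in_order(phones, answers_within, max_rings=5):
    """Return the first phone that answers within max_rings, or None.

    answers_within maps a phone label to the number of rings it takes
    that phone to answer; a missing entry means it never answers.
    """
    for phone in phones:
        rings_needed = answers_within.get(phone)
        if rings_needed is not None and rings_needed <= max_rings:
            return phone
        # Not answered in time: move the call to the next phone on the list.
    return None


hunt_list = ["office", "cellphone", "home"]
# In this made-up scenario nobody picks up at the office.
print(ring_in_order(hunt_list, {"cellphone": 3, "home": 2}))
```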

Technical Stuff Signaling data isn’t sent in a constant stream; the signaling protocols instead tend to transmit information on an as-needed basis. As long as the gateways at either end of the call function properly, there is no need for the overhead to baby-sit the call. After the call is established, the signaling handles the maintenance and housekeeping of the call while it is active, but doesn’t transmit nearly as much information as is sent in the RTP stream.

Don’t go rogue on me

Warning! If the SDP portion of the call is incorrect, someone may end up sending the RTP stream to an unknown IP address. A breakaway stream or rogue RTP is sent to some unlucky IP address in the world. The constant barrage of data may not be dangerous to the server receiving the rogue RTP, but it can be annoying. Performing interoperability tests with your carrier is crucial to working out configuration issues that would otherwise make your calls go to dead air.
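One way to catch a rogue stream before it leaves the building is to read the SDP and check where it says the audio should go. The sketch below does that for a simplified case: a single IPv4 connection line and a single audio line. The sample SDP and the "allowed" network are invented documentation addresses, and the function names are my own.

```python
import ipaddress

# Made-up SDP body; 192.0.2.0/24 is a range reserved for documentation.
SAMPLE_SDP = (
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.10\r\n"
    "s=-\r\n"
    "c=IN IP4 192.0.2.10\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0 18\r\n"
)


def rtp_destination(sdp):
    """Return (ip, port) where the sender of this SDP expects audio RTP.

    Simplified: if several c= or m=audio lines appear, the last one wins.
    """
    ip = None
    port = None
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            ip = line.split()[2]
        elif line.startswith("m=audio "):
            port = int(line.split()[1])
    if ip is None or port is None:
        raise ValueError("SDP has no IPv4 connection line or no audio media line")
    return ip, port


def is_expected(ip, allowed_networks):
    """True if ip falls inside any of the networks you expect RTP to use."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(net) for net in allowed_networks)


ip, port = rtp_destination(SAMPLE_SDP)
print(ip, port, is_expected(ip, ["192.0.2.0/24"]))
```

A check like this is no substitute for interoperability testing with your carrier, but it can flag an obviously wrong address early.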

Understanding your voice choices

Both H.323 and SIP protocols use the same equipment called codecs (an abbreviated name for coder-decoder) to convert your voice into digital code so it can be transmitted. Codecs are built into Analog Telephone Adapters (ATAs), such as the ones used with Internet VoIP services like Vonage, and into the VoIP gateways that may connect your standard phone service to your VoIP carrier.

Codecs use industry-standard methods to convert voice signals into digital data. Converting your voice to digital code is a delicate process, and three concerns dominate when that data is transmitted:

Compression ratio: This ratio identifies how much a codec can reduce the volume of data sent to transmit the RTP stream of a VoIP call. The goal is reducing the amount of bandwidth you use to transmit your VoIP calls, enabling you to send more calls over the same bandwidth.
Call quality: Calls can suffer from clipping, jitter, and latency to greater or lesser degrees, depending on the compression ratio and packetization process.
Packetization delay: When a call is converted to or from VoIP, the process causes a delay. The length of this delay affects call quality.

As you can see, these three elements are inextricably related. Every codec affects call quality and generates a packetization delay. As a result, when it comes to choosing a codec, your options are limited by your bandwidth and call quality requirements.
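A rough calculation shows how these elements pull against each other. The Python sketch below estimates one-way bandwidth for a call from the codec bit rate and how many milliseconds of audio go into each packet. It assumes 40 bytes of IP, UDP, and RTP headers per packet (20 + 8 + 12) and ignores any link-layer overhead, so the results won't exactly match per-call figures quoted elsewhere. The function name call_bandwidth_kbps is my own.

```python
def call_bandwidth_kbps(codec_kbps, packet_ms, header_bytes=40):
    """Estimate the one-way bandwidth of an RTP stream, in kbps.

    codec_kbps:   bit rate of the encoded audio (64 for G.711, 8 for G.729)
    packet_ms:    milliseconds of audio packed into each packet
    header_bytes: per-packet overhead; 40 assumes IPv4 + UDP + RTP only
    """
    packets_per_second = 1000 / packet_ms
    payload_bytes = codec_kbps * 1000 / 8 * packet_ms / 1000
    total_bits_per_second = (payload_bytes + header_bytes) * 8 * packets_per_second
    return total_bits_per_second / 1000


# Larger packets waste less bandwidth on headers but add more delay.
for packet_ms in (10, 20, 40):
    print(packet_ms, call_bandwidth_kbps(64, packet_ms), call_bandwidth_kbps(8, packet_ms))
```

The header cost is fixed per packet, so the more aggressively a codec compresses the audio, the larger the share of the stream that is overhead. Stretching the packet interval claws some of that back at the price of a longer packetization delay.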

After comparing these elements, and looking at all the codecs, a few codecs stand out and are more commonly used by carriers than the others. I discuss them in the following sections.

Using uncompressed VoIP with G.711

G.711 is the garden-variety, uncompressed codec for VoIP. If you want to send faxes or have poor call quality on another codec, G.711 is your codec of choice. This codec gives you the highest quality and is the most common in the industry.

This codec is the easiest to use, as the software is widely available on the Internet for free and all carriers support it.

Technical Stuff G.711 comes in both a domestic U.S. version and a European version. The difference between the versions is minimal, but they’re different enough to prevent them from working together efficiently. The version used in the U.S. and Japan is called G.711µ (referred to as G.711 mu-law, or sometimes u-law), and the European version is called G.711a (often referred to as G.711 a-law).
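If you're curious how small "minimal" is, the sketch below evaluates the textbook companding curves behind the two versions, using the usual parameters of 255 for mu-law and 87.6 for a-law. Treat it as an illustration only: real G.711 equipment uses 8-bit segmented approximations of these curves rather than the smooth formulas, and the function names are mine.

```python
import math

MU = 255.0  # mu-law parameter
A = 87.6    # a-law parameter


def mu_law(x):
    """Continuous mu-law compression of a sample x in the range -1..1."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def a_law(x):
    """Continuous a-law compression of a sample x in the range -1..1."""
    magnitude = abs(x)
    if magnitude < 1 / A:
        y = A * magnitude / (1 + math.log(A))
    else:
        y = (1 + math.log(A * magnitude)) / (1 + math.log(A))
    return math.copysign(y, x)


# Both curves agree at silence and at full scale but differ in between.
for sample in (0.0, 0.01, 0.1, 0.5, 1.0):
    print(sample, round(mu_law(sample), 3), round(a_law(sample), 3))
```

Because the same input maps to different output values, audio encoded with one version has to be converted before equipment expecting the other version can play it correctly.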

Remember The greatest drawback to this codec is the fact that, because the data isn’t compressed, it uses up more bandwidth to transmit one voice call than you would use if you sent it out using the traditional Time Division Multiplexing (TDM) protocol. A non-VoIP call uses 64 kbps to transmit a single conversation, where the G.711 codec uses about 84 kbps. If you use this codec, your T-1 line, which in a non-VoIP scenario could handle 24 simultaneous calls, is maxed out at about 18 simultaneous calls. Is this the price you pay for quality? Maybe the benefits of VoIP outweigh the need for extra bandwidth, but you should definitely bear this issue in mind.

Compressed VoIP with G.729

G.729 is a compression codec that allows VoIP to shine. It takes the 84 kbps required to handle a call with G.711 and reduces it to only 20 kbps, roughly a 4:1 reduction overall. The overall figure is lower than the codec’s own ratio because the packet overhead that travels with the call isn’t compressed; the voice portion itself is compressed at a ratio of 8:1. You can make almost 77 calls on the same T-1 of bandwidth where you were restricted to 18 with the uncompressed codec.
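The call counts here are simple division, and it's worth seeing where they come from. The sketch below takes the per-call figures in this section at face value (84 kbps for G.711, 20 kbps for G.729) and assumes the usable capacity of a T-1 is its 24 channels of 64 kbps each, or 1,536 kbps. That capacity figure is my assumption about how the numbers were derived.

```python
# Assumed usable T-1 capacity: 24 channels of 64 kbps each.
T1_USABLE_KBPS = 24 * 64


def calls_on_t1(per_call_kbps):
    """How many calls of the given size fit in a T-1 (fractional result)."""
    return T1_USABLE_KBPS / per_call_kbps


g711_calls = calls_on_t1(84)  # uncompressed, per-call figure from the text
g729_calls = calls_on_t1(20)  # compressed, per-call figure from the text

print(round(g711_calls, 1))   # a little over 18
print(round(g729_calls, 1))   # a little under 77
```

In practice you can only place whole calls, so round these down when you size a circuit.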

Remember Faxes don’t work when sent over G.729. The codec was designed to compress the audio portion of voice calls by removing the inaudible tones. Unfortunately, inaudible tones are what make fax machines work. The tones are used to transmit data over the wires. If the inaudible tones aren’t transmitted, faxes are doomed to failure on this codec. Don’t fear; if you need to send compressed faxes, the next codec is for you.

Compressing your fax calls

The T.38 codec was built to resolve the faxing problem found on G.729. The T.38 Fax over IP (FoIP) protocol (oh, yes, there’s a cute acronym for everything these days) doesn’t actually compress the audio at all. Instead it converts the fax to a tagged image file format (TIFF) image and sends it, along with its own description information. The transmission is less likely to fail because the data can be re-sent if it gets lost. There may be provisions to duplicate the information if you want to add a second layer of protection to the transmission.

Tip The only shortcoming of this codec is that it’s pretty new. Not all carriers are updated with it yet, so you may have to wait until it is provided. Until then, you can always mix and match your codecs, sending faxes over G.711 and using G.729 for voice calls only.