In the Data-Networking Community



Over the years, data-networking engineers have developed precise rules for how a data packet is constructed, and how each side behaves when it sends and receives data packets. These rules are called protocols. Although many protocols for data networking have been developed during the past 50 years, since the rise of the Internet, the Internet Protocol (IP) has become the most important protocol.

IP has proved to be remarkably scalable and adaptable. That is why IP networking has become ubiquitous, changing the way people think about transferring data and communicating. Over the past few years, the word convergence has drawn a lot of attention to the IP-networking industry. Convergence refers to taking different types of data—voice, video, and application data—and transferring them over the same IP network.

Data-Networking Standards

Just as the ITU has been influential in the creation of standards in the telephony community, the Internet Engineering Task Force (IETF) has led the standardization efforts in the data-networking community.

New data-networking techniques go through a rigorous trial phase consisting of study, implementation, and review to verify their stability and robustness. Those that pass these critical examinations are known by their Request For Comments (RFC) number, because the RFC stage is often the last step in the transition from a draft standard to an approved standard.

Each of the components of the Internet Protocol discussed in the following sections—known by names such as TCP, UDP, and RTP—has one or more corresponding RFCs that describe its operation.

How VoIP Works

Voice over IP is simply the transfer of voice conversations as data over an IP network. Unlike traditional circuit-switched calls on the PSTN, VoIP calls are "packet switched." In a packet-switched environment, multiple computer devices share a single data network. They communicate by sending packets of data to one another, each packet containing addressing information that specifies the source and target computers. The contents of these packets (that is, their payload) are snippets of the voice conversation. The packets within a single transmission can take different paths from end to end across a data network.

With a VoIP call, the call setup portion of the calling sequence has to be simulated—dial tone, ringing, busy signals. The audio portion of the call needs to be converted from analog to digital, cut into packets, sent across the network in packet format, reassembled, and converted from digital back to analog. Encoders and decoders at either end do the conversion from analog to digital and back. (An explanation of how they work is provided a bit later.)

Here is what happens when a VoIP call is made:

1. The caller picks up the telephone handset and hears a dial tone.

2. The caller dials a telephone number, which is mapped to the IP address of the callee.

3. Call setup protocols are invoked to locate the callee and send a signal to produce a ring.

4. The destination phone rings, indicating to the callee that a call has arrived.

5. The callee picks up the telephone handset and begins a two-way conversation. The audio transmission is encoded using a codec and travels over the IP network using a voice streaming protocol.

6. The conversation ends, call teardown occurs, and billing is performed.
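The six steps above can be read as a small state machine. The following is only an illustrative sketch: the state names and event names are hypothetical and are not taken from any call setup protocol.

```python
from enum import Enum, auto


class CallState(Enum):
    IDLE = auto()
    DIAL_TONE = auto()
    LOCATING = auto()
    RINGING = auto()
    IN_CALL = auto()
    ENDED = auto()


# (current state, event) -> next state. Event names are made up for this sketch.
TRANSITIONS = {
    (CallState.IDLE, "off_hook"): CallState.DIAL_TONE,           # step 1
    (CallState.DIAL_TONE, "digits_dialed"): CallState.LOCATING,  # steps 2 and 3
    (CallState.LOCATING, "callee_found"): CallState.RINGING,     # step 4
    (CallState.RINGING, "answered"): CallState.IN_CALL,          # step 5
    (CallState.IN_CALL, "hang_up"): CallState.ENDED,             # step 6
}


def run_call(events):
    """Walk a call through the listed events and return the final state."""
    state = CallState.IDLE
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not valid in state {state.name}")
        state = TRANSITIONS[key]
    return state
```

Feeding the events in the order of the numbered steps ends in the ENDED state; an event that arrives out of order (for example, "answered" before any digits are dialed) raises an error.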

VoIP Components

This chapter has already discussed the building blocks of the PSTN. To transfer voice data on the same network with e-mail and web traffic, a new set of components is required. Some of the most important components are the following:

Codecs
TCP/IP and VoIP protocols
IP telephony servers and PBXs
VoIP gateways and routers
IP phones and softphones

Codecs

A codec (which stands for compressor/decompressor or coder/decoder) is the hardware or software that samples analog sound, converts it to digital bits, and outputs it at a predetermined data rate. Some codecs perform compression to save bandwidth. There are dozens of available codecs, each with its own characteristics.

Codecs have odd-looking names that correspond to the name of the ITU standard that describes their operation. For example, the codecs named G.711u and G.711a convert from analog to digital and back with relatively high quality. As with most things digital, higher quality implies more bits, so these two codecs use more bandwidth than lower-quality codecs.
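To give a feel for why G.711 needs 64 kbps, the sketch below uses the continuous mu-law companding curve (the "u" in G.711u) together with the usual telephony sampling parameters of 8000 samples per second and 8 bits per sample. These parameters and the formula are not spelled out in this chapter, and the real codec uses a piecewise-linear approximation with a different bit layout, so treat this as an approximation rather than a G.711 implementation.

```python
import math

MU = 255  # companding constant of the mu-law curve


def mu_law_compress(x: float) -> float:
    """Map a sample in [-1.0, 1.0] onto the companded range [-1.0, 1.0]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def quantize_8bit(y: float) -> int:
    """Map a companded value in [-1.0, 1.0] to an integer code from 0 to 255."""
    return round((y + 1) / 2 * 255)


SAMPLE_RATE_HZ = 8000   # assumed telephony sampling rate
BITS_PER_SAMPLE = 8     # one companded code per sample
DATA_RATE_BPS = SAMPLE_RATE_HZ * BITS_PER_SAMPLE  # 64000 bits per second
```

The companding step spends more of the 256 available codes on quiet samples than on loud ones, which is one reason the audio quality stays relatively high at 8 bits per sample.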

Lower-speed codecs, such as G.726, G.729, and those in the G.723.1 family, consume less network bandwidth. However, low-speed codecs impair the quality of the audio much more than high-speed codecs because low-speed codecs apply lossy compression—compression that loses some of the original data. Fewer bits are sent, so the receiving side does its best to approximate what the original audio sounded like, but it is not a high-fidelity re-creation.

Table 1-1 lists some common VoIP codecs. For each codec, the codec's data rate is shown, as well as the time needed by the codec to do the analog-to-digital and digital-to-analog conversions. The middle column in the table shows the rate at which the codec generates its output. The Packetization Delay column refers to the delay a codec introduces as it converts from analog to digital and back. You will see in later chapters that this fixed amount of delay can affect the quality of the call as perceived by the listeners.

Table 1-1. Six Common Codecs Used in VoIP

Codec Name	Nominal Data Rate	Packetization Delay
G.711u	64.0 kbps	1.0 ms
G.711a	64.0 kbps	1.0 ms
G.726-32	32.0 kbps	1.0 ms
G.729	8.0 kbps	25.0 ms
G.723.1 MPMLQ	6.3 kbps	67.5 ms
G.723.1 ACELP	5.3 kbps	67.5 ms

Codecs use sophisticated techniques for coding and compression. You will see names that stretch the limits of your math background, like Multi-Pulse Maximum Likelihood Quantization (MPMLQ) and Algebraic Code Excited Linear Predictive (ACELP) compression. The names suggest how the codecs do their job; these topics are beyond the scope of this book.

Packet loss concealment (PLC) is an additional feature available with the G.711u or G.711a codecs. PLC techniques reduce or mask the effects of data loss during a telephone conversation. PLC does not add delay or have bad side effects, but it makes the G.711 codecs more expensive to manufacture. Codecs with PLC can provide a dramatic improvement in the voice quality when data loss occurs.

TCP/IP Protocols

The TCP/IP family of protocols forms the basis of the Internet and most current corporate networks. Computer programs send and receive data over an IP network by making program calls to the TCP/IP software, known as the protocol stack, in their local computer. The TCP/IP stack in the local computer exchanges information with the TCP/IP stack in the target computer to accomplish the transfer of data from one side to the other. The information they exchange includes the size of the chunks of data (the datagram size), identifying data associated with each datagram (the datagram header), and what should occur if a datagram is lost or damaged in transit.

The Internet Protocol determines how datagrams are transferred across an IP network from the sending program to the receiving program. Datagrams are the units sent and received by the two sides, and they move in hops, or segments, across a network. Each hop has its own network characteristics; for example, some hops may be Fast Ethernet hops, whereas others could be slower, broadband connections. To optimize the performance of the hops, devices on the network may perform datagram fragmentation, cutting large datagrams into smaller pieces, called packets, which need to be reassembled into the original datagrams by the receiving computer.
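Fragmentation and reassembly can be sketched in a few lines. The function and field names below are hypothetical, and the sketch ignores the IP header itself; the one detail borrowed from IPv4 is that fragment offsets are counted in 8-byte units, so every piece except the last is a multiple of 8 bytes long.

```python
def fragment(datagram: bytes, max_payload: int):
    """Cut a datagram into (offset, more_fragments, chunk) pieces."""
    step = max_payload - (max_payload % 8)  # keep offsets on 8-byte boundaries
    if step <= 0:
        raise ValueError("max_payload must be at least 8 bytes")
    pieces = []
    for offset in range(0, len(datagram), step):
        chunk = datagram[offset:offset + step]
        more_fragments = offset + step < len(datagram)
        pieces.append((offset, more_fragments, chunk))
    return pieces


def reassemble(pieces):
    """Rebuild the original datagram, whatever order the pieces arrived in."""
    ordered = sorted(pieces, key=lambda piece: piece[0])
    return b"".join(chunk for _offset, _more, chunk in ordered)
```

Because each piece carries its own offset, the receiving computer can put the datagram back together even when the pieces arrive out of order.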

When a datagram arrives at an IP router or switch in a network, the router or switch decides where the datagram should go in its next hop and forwards it along. The section "Working on the Problem Areas" in Chapter 3, "Planning for VoIP," returns to this discussion of hops through the network, but for now, suffice it to say that too much time spent going through one or more of the hops can delay the datagrams and add variation in the delay time, making the telephone conversation sound poor.

The current version of the Internet Protocol, called IPv4, has been around since RFC 791 was published in the early 1980s. It is remarkable that despite all the changes in computer networks, the underlying protocol has not changed very much. However, IPv4 has some limitations, which have led to a new, improved version known as IPv6. IPv6 seeks to provide a larger address space to prevent the current Internet from running out of available IP addresses. The protocol details that are discussed in this book all apply to IPv4. IPv6 is currently being tested by the major network vendors and is being deployed in some newer networks. Look for IPv6 networks to eventually replace the current IPv4 infrastructure.

Sending and receiving application programs communicate via two related protocols when they contact their TCP/IP stack:

Transmission Control Protocol (TCP)— When making calls to the TCP interface, the sending program wants to make sure that the receiving program gets everything that is sent—that is, it wants to avoid data being lost, duplicated, or out of order. TCP is known as a connection-oriented protocol because the two sides of the data exchange maintain strong tracking of everything that is sent and received. For example, your browser uses the TCP interface when fetching web pages—you don't want to see holes or out-of-order pieces of data on the screen, so your browser and the web server program work together to make sure everything is received intact.
User Datagram Protocol (UDP)— When using UDP, the sending application has no assurance of delivery, and it is willing to deal with that. UDP is called a connectionless protocol, which means that when using this protocol, the two sides don't acknowledge receiving any data to make sure everything arrived intact. Think about a stock ticker running across the bottom of your computer screen. If a datagram is lost, causing one of the quotes to be lost, it is not catastrophic because another will come along shortly—a stock ticker application is a good example of a program that uses UDP to send data.
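The difference between the two interfaces shows up directly in socket code. The sketch below sends one message each way over the loopback address using Python's standard socket module; the function names are hypothetical, and a real application would run sender and receiver on different machines.

```python
import socket


def udp_roundtrip(payload: bytes, timeout: float = 2.0) -> bytes:
    """Send one datagram over UDP: no connection and no acknowledgment."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
        receiver.settimeout(timeout)      # a lost datagram would never arrive
        sender.sendto(payload, receiver.getsockname())
        data, _sender_address = receiver.recvfrom(2048)
        return data


def tcp_roundtrip(payload: bytes, timeout: float = 2.0) -> bytes:
    """Send the same bytes over TCP: connect first, then stream reliably."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        listener.settimeout(timeout)
        address = listener.getsockname()
        with socket.create_connection(address, timeout=timeout) as client:
            connection, _client_address = listener.accept()
            with connection:
                connection.settimeout(timeout)
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)  # signal end of data
                chunks = []
                while True:
                    chunk = connection.recv(4096)
                    if not chunk:
                        break
                    chunks.append(chunk)
                return b"".join(chunks)
```

The UDP version needs no connection setup at all, which is part of what makes it attractive for time-sensitive traffic; the cost is that the receiver has to cope with anything that goes missing.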

The datagrams that the application assembles contain protocol-specific information. The TCP or UDP portion of an individual datagram is nested inside an IP wrapper. For example, a UDP header describes how the payload of a UDP datagram is to be decoded. In turn, the IP header contains information such as the network addresses of the sender and the receiver. Figure 1-6 shows IP packets and their header format. (Refer to RFC 791 for more information.)

Figure 1-6. IP Packets and Their Header Format

Whether the protocol is TCP or UDP, the header of every IP packet contains several standard fields. You will encounter these fields throughout this book as VoIP is discussed:

TOS (Type Of Service)— The TOS byte can be used to mark the priority of a packet. It is generally set to zero, which means that the devices in the network that examine the packet give their best effort in delivering it from one side of the network to the other. By setting this byte to a nonzero value, an application can request improved handling for a packet, making it less likely to be dropped or delayed. The first 6 bits of this byte are also known as the Differentiated Services (or DiffServ) field.
TTL (Time To Live)— Each time a packet takes one hop in its path across a network, the number in the TTL byte is reduced by one. If a device receives a packet with a zero in its TTL byte, it discards the packet. A TTL of zero means the packet has lived too long (that is, it has taken too many hops), indicating a problem with the network or with the packet. The TTL keeps packets from circling an IP network forever.
Checksum— A checksum is used to detect changes made to the header bits during transmission. The sending side feeds the bits of the IP header through a fixed calculation and writes the final result into the Checksum field. The receiving side similarly passes the header bits it receives through the same calculation. If its results match the checksum that was sent, the receiving side can be reasonably confident that no header bits were changed accidentally during the transmission. Otherwise, it should discard the packet. This checksum verifies the integrity of the IP header only, not of the payload.
Source Address and Destination Address— These fields are the 4-byte IP addresses of the sending and receiving applications. These 4 bytes are traditionally written in dotted notation, like 192.168.123.158.
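The fields above can be pulled out of a raw header with a few lines of code. The sketch below assumes the standard IPv4 header layout from RFC 791; the function names and the sample header bytes are illustrative only.

```python
import socket
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum over the 16-bit words of the header."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse_ipv4_header(packet: bytes) -> dict:
    """Extract the fields discussed in the text from the start of a packet."""
    (version_ihl, tos, _total_length, _identification, _flags_fragment,
     ttl, protocol, checksum, source, destination) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])
    header_length = (version_ihl & 0x0F) * 4
    return {
        "version": version_ihl >> 4,
        "tos": tos,
        "dscp": tos >> 2,                    # first 6 bits of the TOS byte
        "ttl": ttl,
        "protocol": protocol,                # 6 is TCP, 17 is UDP
        "checksum": checksum,
        # Summing a valid header, checksum field included, yields zero.
        "checksum_ok": ipv4_checksum(packet[:header_length]) == 0,
        "source": socket.inet_ntoa(source),
        "destination": socket.inet_ntoa(destination),
    }


# Illustrative 20-byte header: a UDP packet from 192.168.0.1 to 192.168.0.199.
SAMPLE_HEADER = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
```

Parsing SAMPLE_HEADER reports a TOS of zero (best effort), a TTL of 64, and a valid checksum; flipping any bit of the header makes checksum_ok false.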

These definitions scratch the surface of an extremely complex subject. To obtain more detailed information, you should seek out some of the excellent books that explain TCP/IP comprehensively. A few recommended titles are provided at the end of this chapter.

VoIP Protocols

Application programs build their own families of higher-layer protocols on top of the lower-layer protocols they use for transport and other tasks. Placing a VoIP telephone call on a data network involves the call setup—the VoIP equivalent of getting a dial tone, dialing a phone number, getting a ring or a busy signal at the far end, and picking up the phone to answer the call—and then the telephone conversation. VoIP protocols are required during both phases:

Call setup protocols— Several higher-layer protocols can accomplish call setup and takedown, including H.323, SIP, SCCP, MGCP, and Megaco/H.248 (these are described in the next section). The programs that implement the call setup protocol use TCP and UDP to exchange data during the call setup and takedown phases.
Voice streaming protocols— The exchange of encoded voice data occurs after call setup (and before call takedown), using two data flows—one in each direction—to let both participants speak at the same time. Each of these two data flows uses a higher-layer protocol called Real-Time Transport Protocol (RTP), which is encapsulated in UDP as it travels through the network. Figure 1-7 illustrates the two sets of VoIP protocols.

Figure 1-7. Two Sets of High-Level Protocols, for Call Setup and for Conversation

The following sections describe the call setup and voice streaming protocols in more depth.

Call Setup Protocols

Call setup protocols use TCP and UDP to transfer data during the setup and takedown phases of a telephone call. They handle functions like the mapping of phone numbers to IP addresses, generating dial tones and busy signals, ringing the callee, and hanging up. There are two families of call setup protocols: one set from the telephony community and the other from the data-networking community.

The call setup protocols H.323 and Media Gateway Control Protocol (MGCP) come from the telephony community by way of the ITU. H.323 is widely deployed and, among the call setup protocols, has been around the longest period of time. H.323 is actually a family of telephony-based standards for multimedia, including voice and videoconferencing. MGCP is the less flexible version, for use with inexpensive devices like home telephones.

The family of H.323 protocols has been refined for many years. As a result, it is robust and flexible, but the cost of this robustness is that it has high overhead: A calling session includes lots of handshakes and data exchanges for each function performed.

Session Initiation Protocol (SIP) and Media Gateway Control (Megaco) are lightweight protocols developed by the IETF in the data-networking community. SIP, in particular, represents typical data-networking logic, which asks: Why use a heavyweight protocol (such as H.323) when a lightweight protocol (such as SIP) will get the job done most of the time? SIP is a current "industry darling"—it is supported by Cisco and Nortel, and Microsoft ships SIP client interfaces with its Windows XP operating system.

In addition to the standardized call setup protocols, vendors have provided their own proprietary protocols. One example of this is Skinny Client Control Protocol (SCCP). SCCP provides a simple, lightweight call setup protocol for Cisco devices.

Although the H.323 family of call setup protocols is predominant today, it is likely that all the protocols discussed here (H.323, MGCP, Megaco, SIP, and SCCP) will be used by VoIP equipment in varying degrees for the foreseeable future.

Voice Streaming Protocols

Widely used for streaming audio and video, RTP is designed for applications that send data in one direction with no acknowledgments. The header of each RTP datagram contains a time stamp, so the application receiving the datagram can reconstruct the timing of the original data. It also contains a sequence number so that the receiving side can deal with missing, duplicate, or out-of-order datagrams.
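As a rough illustration of what the sequence number makes possible, the sketch below sorts a batch of received datagrams and reports which ones are missing or duplicated. The function name and the (sequence number, payload) tuple format are hypothetical, and the sketch ignores the wraparound of the 16-bit sequence counter that a real receiver must handle.

```python
def reorder(received):
    """Return (payloads in order, missing sequence numbers, duplicated ones).

    `received` is a list of (sequence_number, payload) tuples in arrival order.
    """
    by_sequence = {}
    duplicates = []
    for sequence, payload in received:
        if sequence in by_sequence:
            duplicates.append(sequence)      # already seen: drop the copy
        else:
            by_sequence[sequence] = payload
    if not by_sequence:
        return [], [], []
    first, last = min(by_sequence), max(by_sequence)
    missing = [s for s in range(first, last + 1) if s not in by_sequence]
    ordered = [by_sequence[s] for s in sorted(by_sequence)]
    return ordered, missing, duplicates
```

A voice application would typically not wait for the missing datagrams, because a late snippet of audio is of no use; it would play what it has and conceal the gap.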

The two RTP streams carrying the bidirectional conversation are the important elements in determining the quality of the voice conversations. It is helpful to understand the composition of the RTP datagrams, which carry the voice data. Figure 1-8 shows the RTP header format. (Refer to RFC 1889 for more information.)

Figure 1-8. Header Used for RTP Follows the UDP Header in Each Datagram

All the fields related to RTP sit inside the UDP payload. So, like UDP, RTP is a connectionless protocol. The software that creates RTP datagrams is not commonly part of the TCP/IP protocol stack, so applications are written to add and recognize an additional 12-byte header in each UDP datagram. The sender fills in each header, which contains four important fields:

RTP Payload Type— Indicates which codec to use. The codec conveys the type of data (such as voice, audio, or video) and how it is encoded.
Sequence Number— Helps the receiving side reassemble the data and detect lost, out-of-order, and duplicate datagrams.
Time Stamp— Used to reconstruct the timing of the original audio or video. It also helps the receiving side determine variations in datagram arrival times, known as jitter.
Source ID— Lets the software at the receiving side distinguish among multiple, simultaneous incoming streams.

It is the time stamp that brings real value to RTP. An RTP sender puts a time stamp in each datagram it sends. The receiving side of an RTP application notes when each datagram actually arrives and compares this to the time stamp. If the time between datagram arrivals is the same as when they were sent, there is no variation. However, depending on network conditions, there could be lots of variation in datagram arrival times. The receiving side can easily calculate this jitter using the time stamp.
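The 12-byte header and the jitter calculation can both be sketched briefly. The code below assumes the fixed RTP header layout (a version and flags byte, a marker and payload type byte, then the sequence number, time stamp, and source ID) and uses the smoothed interarrival jitter estimate from the RTP specification, with a gain of 1/16. The function names are hypothetical, and times are plain numbers in milliseconds rather than RTP clock units.

```python
import struct

RTP_HEADER = struct.Struct("!BBHII")  # 1 + 1 + 2 + 4 + 4 = 12 bytes


def pack_rtp_header(payload_type, sequence, timestamp, source_id, marker=False):
    """Build a 12-byte RTP header with no padding, extension, or CSRC entries."""
    first_byte = 2 << 6                                  # RTP version 2
    second_byte = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_HEADER.pack(first_byte, second_byte, sequence & 0xFFFF,
                           timestamp & 0xFFFFFFFF, source_id & 0xFFFFFFFF)


def unpack_rtp_header(datagram: bytes) -> dict:
    """Read the four fields discussed in the text from a UDP payload."""
    first_byte, second_byte, sequence, timestamp, source_id = \
        RTP_HEADER.unpack(datagram[:RTP_HEADER.size])
    return {
        "version": first_byte >> 6,
        "payload_type": second_byte & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "source_id": source_id,
    }


def interarrival_jitter(send_times, arrival_times):
    """Smoothed jitter: compare spacing at arrival with spacing at sending."""
    jitter = 0.0
    for i in range(1, len(send_times)):
        spacing_sent = send_times[i] - send_times[i - 1]
        spacing_received = arrival_times[i] - arrival_times[i - 1]
        difference = abs(spacing_received - spacing_sent)
        jitter += (difference - jitter) / 16             # running average
    return jitter
```

If datagrams sent 20 ms apart also arrive 20 ms apart, the estimate stays at zero regardless of how long the trip took; only changes in the spacing contribute to jitter.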

Real bandwidth consumption by VoIP calls is higher than it first appears. The accumulation of headers can add a lot of overhead, depending on the size of the data payload. For example, a typical payload size when using the G.729 codec is 20 bytes, which means that the codec outputs 20-byte chunks of the VoIP call at a predetermined rate specific to that codec. With RTP, two-thirds of the datagram is the header, because the total header overhead consists of the following:

RTP (12 bytes) + UDP (8 bytes) + IP (20 bytes) = 40 bytes

The G.729 codec has a data rate of 8 kbps. When sent at 20-ms intervals, its payload size is 20 bytes per datagram. To this, add the 40 bytes of RTP, UDP, and IP headers and any additional Layer 2 headers. For example, Ethernet drivers generally add 18 more bytes. The Bandwidth Required column in Table 1-2 shows a more accurate picture of actual bandwidth usage for some common codecs on an Ethernet network.

Some IP phones let you set the "delay between packets" or "speech packet length," which is the rate at which the sender delivers datagrams into the network. For example, at 64 kbps, a 20-ms speech datagram implies that the sending side creates a 160-byte datagram payload every 20 ms. A simple equation relates the codec speed, the delay between voice datagrams, and the datagram payload size:

Payload size (in bytes) = [Codec speed (in bits/sec) * datagram delay (ms)] / [8 (bits/byte) * 1000 (ms/sec)]

In this example:

160 bytes = (64,000 * 20)/8000

For a given data rate, increasing the delay causes the datagrams to get larger, because the datagrams are sent less frequently to transport the same quantity of data. A delay of 30 ms at a data rate of 64 kbps would require sending 240-byte datagrams.
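The relationship between codec speed, datagram delay, and payload size is easy to check in code. This is a direct transcription of the equation above; the function name is hypothetical.

```python
def payload_bytes(codec_bps: int, delay_ms: int) -> float:
    """Payload size in bytes for a codec speed (bits/sec) and delay (ms)."""
    bits_per_byte = 8
    ms_per_second = 1000
    return codec_bps * delay_ms / (bits_per_byte * ms_per_second)
```

Calling payload_bytes(64_000, 20) gives 160 bytes and payload_bytes(64_000, 30) gives 240 bytes, matching the two examples in the text.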

Table 1-2. Common Codec Attributes

Codec	Nominal Data Rate	Packetization Delay	Typical Datagram Spacing	Bandwidth Required
G.711u	64.0 kbps	1.0 ms	20 ms	87.2 kbps
G.711a	64.0 kbps	1.0 ms	20 ms	87.2 kbps
G.726-32	32.0 kbps	1.0 ms	20 ms	55.2 kbps
G.729	8.0 kbps	25.0 ms	20 ms	31.2 kbps
G.723.1 MPMLQ	6.3 kbps	67.5 ms	30 ms	21.9 kbps
G.723.1 ACELP	5.3 kbps	67.5 ms	30 ms	20.8 kbps
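The Bandwidth Required column can be reproduced from the other columns. The sketch below adds the 40 bytes of RTP, UDP, and IP headers and the 18 bytes of Ethernet overhead mentioned earlier to each payload. Rounding the payload up to a whole number of bytes is an assumption made here; it happens to reproduce the table's values for the G.723.1 codecs.

```python
import math

RTP_UDP_IP_HEADER_BYTES = 12 + 8 + 20   # 40 bytes, as listed in the text
ETHERNET_OVERHEAD_BYTES = 18            # typical Ethernet addition from the text


def required_kbps(codec_bps: int, spacing_ms: int) -> float:
    """Approximate on-the-wire bandwidth of one voice stream, in kbps."""
    payload = math.ceil(codec_bps * spacing_ms / 8000)   # whole bytes (assumed)
    frame_bytes = payload + RTP_UDP_IP_HEADER_BYTES + ETHERNET_OVERHEAD_BYTES
    return frame_bytes * 8 / spacing_ms                  # bits per ms equals kbps


CODECS = {
    # name: (nominal data rate in bits/sec, typical datagram spacing in ms)
    "G.711u": (64_000, 20),
    "G.726-32": (32_000, 20),
    "G.729": (8_000, 20),
    "G.723.1 MPMLQ": (6_300, 30),
    "G.723.1 ACELP": (5_300, 30),
}
```

For G.729, the 20-byte payload grows to a 78-byte frame, so a codec that nominally needs 8 kbps occupies roughly 31 kbps on an Ethernet network.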

Now that you understand more about the types of protocols VoIP uses, you are ready to move on to the discussion of the next set of VoIP components.

IP Telephony Servers and PBXs

Many data-networking transactions are based on the client/server model of computing. Client computers make requests for services to server computers, which perform those services and return the results. You are probably familiar with web servers, e-mail servers, and database servers, all of which perform client/server transactions.

Adding voice data to IP networks requires yet another set of servers that are designed to provide voice services in innovative ways. An IP PBX typically serves as the core IP telephony server. On the PSTN, the PBX is often a closed-box system—it provides all the voice functions and features you need, but usually in a proprietary manner. Management of the closed-box platform is left up to the PBX vendor. With VoIP, an IP PBX can be built on a PC platform running on an operating system such as Microsoft Windows, Linux, or Sun Solaris. Although parts of the IP PBX are inherently proprietary, the platforms can be managed through vendor application programming interfaces (APIs) and through the standard APIs provided by the operating system.

An IP PBX provides functions and features similar to those that a traditional PBX provides. Although the standard PBX of the PSTN offers multiple features developed over decades, such as call transfer and call forwarding, IP PBXs are already providing the same kinds of features and more, and their development is advancing quickly. Cisco CallManager is an example of a full-featured IP PBX.

Other IP telephony servers provide new and interesting services. For example, unified messaging—the convergence of voice mail and e-mail—can be considered a benefit of a VoIP implementation. Unified messaging servers also run on PC platforms and talk to e-mail servers and IP PBXs to provide message access in a variety of ways.

Figure 1-9 shows key components in a VoIP deployment.

Figure 1-9. VoIP Network and Its Typical Components

Another new concept introduced along with IP telephony servers is clustering, in which several of these servers are grouped together in a cluster to offer increased scalability, reliability, and redundancy. Clustered servers function together and can be managed as a unit, providing combined processing power while logically appearing as a single server. Clustering is not available with traditional PBXs in the PSTN.

Another type of server, the gatekeeper, is used by the H.323 protocol to provide call admission control (CAC) and other management functions, such as address lookup, for multimedia services. The gatekeeper uses a set of signaling flows, RAS (registration, admission, and status), to work with VoIP devices. The CAC function of a gatekeeper can be especially important for networks with limited bandwidth, because the gatekeeper can track the number of calls in progress and restrict calls based on current bandwidth consumed. The goal of CAC is to limit new calls (or reroute them to the PSTN) if they may adversely impact the quality of calls that are already in progress on the VoIP network.
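The bandwidth-based admission decision can be sketched as follows. The class and method names are hypothetical, and the sketch models only the bookkeeping; it does not implement the RAS signaling that a real H.323 gatekeeper uses.

```python
class Gatekeeper:
    """Toy call admission control based on bandwidth already in use."""

    def __init__(self, link_kbps: float):
        self.link_kbps = link_kbps
        self.active_calls = {}               # call id -> bandwidth in kbps

    def used_kbps(self) -> float:
        return sum(self.active_calls.values())

    def admit(self, call_id: str, call_kbps: float) -> bool:
        """Admit the call only if it fits in the remaining bandwidth."""
        if self.used_kbps() + call_kbps > self.link_kbps:
            return False                     # caller could be rerouted to the PSTN
        self.active_calls[call_id] = call_kbps
        return True

    def release(self, call_id: str) -> None:
        """Free the bandwidth when a call ends."""
        self.active_calls.pop(call_id, None)
```

On a 200-kbps link, two G.711 calls at 87.2 kbps each are admitted and a third is refused until one of the first two ends, which protects the quality of the calls already in progress.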

Video streaming and videoconferencing servers also deserve some mention here. Although not directly related to VoIP, video servers can eventually take advantage of the converged network infrastructure that is needed for VoIP. Because of its higher bandwidth requirements, video over IP presents a new set of challenges that make VoIP look easy!

VoIP Gateways, Routers, and Switches

VoIP gateways and IP routers move RTP voice datagrams through an IP network. VoIP gateways provide a connection between the VoIP network and the PSTN, so these devices play a key role in the migration path toward VoIP. Although networks that have exclusively VoIP phones are growing, there are still instances when it is necessary to connect to the PSTN to place calls to PSTN users. VoIP gateways must use the SS7 protocol to signal switches in the PSTN when a phone call is originating from the VoIP network and the callee is in the PSTN. In addition, VoIP gateways may provide conversion between different codecs, which is called transcoding. If a codec other than G.711, say G.729, is used on the VoIP network, the voice data must be converted to G.711 before being transferred to the PSTN.

In a corporate environment, VoIP gateways can interconnect with traditional PBXs to provide a migration path and allow for staged VoIP deployments. Gateways are typically capable of speaking a large number of different protocols. These complex devices handle the variety of signaling and data protocols that are required to communicate between the VoIP network and the PSTN.

By examining the IP packet headers, IP routers make the decisions necessary to move packets to the next router and hop along the path to the destination. Tracing the route of a voice packet through the network can be useful for problem identification and diagnosis; techniques for this are discussed in later chapters. Router technology itself is well understood but is not discussed in detail in this book.

Figure 1-10 shows an expanded VoIP network with a connection to the PSTN.

Figure 1-10. VoIP Network with Its VoIP Gateways Connected to the PSTN

Ethernet switches look at the link-layer header information to move packets from source to destination. These switches have become the cornerstone of many campus LANs, providing high-speed network access to the desktop. Many router-like functions are beginning to appear in switches to further blur the line between router and switch. Many VoIP implementations recommend using switch functionality to create virtual LANs (VLANs) for VoIP traffic. IP phones are typically connected directly to Ethernet switches.

IP Phones and Softphones

To make VoIP work, analog audio must be converted to digital datagrams. You know this is performed by codecs, but where does the conversion take place? Where are the codecs located?

If you are using older analog telephones, the codecs are located in the IP PBX. Incoming calls are digitized there, before being forwarded onto the IP network.

Alternatively, the codecs can be located in the telephones themselves. These new digital telephones are called IP phones. Rather than having a four-line telephone connector in the back, they usually have an Ethernet LAN connection. An IP phone makes data connections to an IP telephony server, which does the call setup processing.

There is yet another choice. Your computer can serve in the role of the IP phone on your desk. You plug a headset and microphone into the computer's audio card. The computer's CPU runs the software doing the codec processing, and the computer has a LAN connection into the data network. As with an IP phone, your computer, or softphone, probably relies on an IP telephony server to do call setup processing.

