Understanding VoIP Requirements

The increased efficiency of IP networks and the ability to statistically multiplex voice traffic with data packets allow companies to maximize their return on investment (ROI) in data network infrastructures. Decreased cost and an increase in the availability of differentiated services are two major reasons companies are evaluating the implementation of VoIP.

As demand for voice services in the IP network expands, it is important to understand the components and functionality that must be present for a successful implementation. Several protocols and tools are available for carrying voice in a data network. In defining the VoIP protocol stack, you must understand at which layer these tools and protocols reside and how they interact with other layers. When voice is packaged into IP packets, additional headers are created to carry voice-specific information. These headers can create significant additional overhead in the IP network.

Understanding which protocols to use and knowing how to limit overhead is crucial in carrying voice efficiently across an IP network.

Business Case for VoIP

Business advantages that are driving implementations of VoIP networks have changed over time. Starting with simple media convergence, these advantages have evolved to include the convergence of call-switching intelligence and the total user experience.

Originally, ROI calculations centered on toll-bypass and converged-network savings. Although these savings are still relevant today, advances in voice technologies allow organizations and service providers to differentiate their product offerings by providing advanced features.

VoIP business drivers include the following:

Cost savings: Traditional time-division multiplexing (TDM), which is used in the public switched telephone network (PSTN) environment, dedicates 64 kbps of bandwidth to each voice channel. This approach wastes bandwidth whenever there is no voice to transmit, and a substantial amount of equipment is needed to combine 64 kbps channels into high-speed links for transport across the network. VoIP instead shares bandwidth among multiple logical connections: packet telephony statistically multiplexes voice traffic alongside data traffic, which makes more efficient use of the bandwidth and reduces bandwidth requirements. This consolidation results in substantial savings on capital equipment and operations costs.
Flexibility: The sophisticated functionality of IP networks allows organizations to be flexible in the types of applications and services they provide to their customers and users. Service providers can easily segment customers, which helps them offer different applications, custom services, and rates that depend on the traffic volume needs of the customer and other factors.
Advanced features: Current VoIP applications provide advanced features such as the following:
- Advanced call routing: When multiple paths exist to connect a call to its destination, some of these paths might be preferred over others based on cost, distance, quality, partner handoffs, traffic load, or various other considerations. Least-cost routing and time-of-day routing are two examples of advanced call routing that can be implemented to determine the best possible route for each call.

- Unified messaging: Unified messaging improves communications and boosts productivity. It delivers this advantage by providing a single user interface to messages that have been delivered over a variety of media. For example, users can read their e-mail, hear their voice mail, and view fax messages by accessing a single inbox.

- Integrated information systems: Organizations are using VoIP to effect business process transformation. Centralized call control, geographically dispersed virtual contact centers, and access to resources and self-help tools are examples of VoIP technology that have enabled organizations to draw from a broad range of resources to service customers.

- Long-distance toll bypass: Long-distance toll bypass is an attractive solution for organizations that are charged long-distance fees for a significant number of calls between sites. In this case, it might be more cost effective to use VoIP to place those calls across the IP network. If the IP WAN becomes congested, the calls can overflow into the PSTN, ensuring that there is no degradation in voice quality.

- Encryption: Security mechanisms in the IP network allow the administrator to ensure that IP conversations are secure. Encryption of sensitive signaling header fields and the message body protects the packet in case of unauthorized packet interception.

- Customer relationship: The ability to provide customer support through multiple channels, such as telephone, chat, and e-mail, builds solid customer satisfaction and loyalty. A pervasive IP network allows organizations to provide contact-center agents with consolidated and up-to-date customer records along with the related customer communication. Access to this information allows quick problem solving, which, in turn, builds strong customer relationships.
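The least-cost and time-of-day routing ideas mentioned under advanced call routing can be sketched in a few lines of Python. This is only an illustration: the route names, per-minute costs, and hours of operation are hypothetical values, and a production dial plan would be configured in the call-control system rather than hand-coded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str               # hypothetical route label
    cost_per_minute: float  # illustrative cost, any currency
    open_hour: int          # first hour (0-23) in which the route may be used
    close_hour: int         # hour (1-24) at which the route stops being used


def usable(route, hour):
    """Time-of-day routing: is this route allowed at the given hour?"""
    return route.open_hour <= hour < route.close_hour


def least_cost_route(routes, hour):
    """Least-cost routing: cheapest route that is usable at the given hour."""
    candidates = [r for r in routes if usable(r, hour)]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.cost_per_minute)


# Hypothetical route table used only for this example.
ROUTES = [
    Route("carrier-a", 0.03, 8, 18),
    Route("carrier-b", 0.05, 0, 24),
    Route("carrier-c", 0.02, 18, 24),
]

for hour in (3, 10, 20):
    print(hour, least_cost_route(ROUTES, hour).name)
```

Other criteria from the list, such as quality or traffic load, could be folded into the sort key in the same way.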

VoIP Functional Components

In the traditional PSTN telephony network, all the elements that are required to complete the call are transparent to the end user. Migration to VoIP necessitates an awareness of these required elements and a thorough understanding of the protocols and components that provide the same functionality in an IP network.

Required VoIP functionality includes the following features:

Signaling
Database services
Bearer channel control
CODECs

The following sections describe each required functional component.

Signaling

Signaling is the ability to generate and exchange control information to establish, monitor, and release connections between two endpoints. Voice signaling requires the ability to provide supervisory, address, and alerting functionality between nodes. PSTN uses Signaling System 7 (SS7) to transport control messages in an out-of-band signaling network. VoIP presents several options for signaling, including H.323, Session Initiation Protocol (SIP), Megaco/H.248, and Media Gateway Control Protocol (MGCP). Some VoIP gateways are also capable of initiating SS7 signaling directly to the PSTN network.

Signaling protocols are classified either as peer-to-peer or client/server architectures. SIP and H.323 are examples of peer-to-peer signaling protocols where the end devices or gateways contain the intelligence to initiate and terminate calls and interpret call control messages. Megaco/H.248 and MGCP are examples of client/server protocols where the endpoints or gateways do not contain call control intelligence but send or receive event notifications to the server commonly referred to as the call agent. For example, when an MGCP gateway detects that a telephone has gone off hook, the gateway does not know to automatically provide a dial tone. The gateway sends an event notification to the call agent, telling the agent that an off-hook condition has been detected. The call agent then notifies the gateway to provide a dial tone.
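The off-hook example can be modeled as a short Python sketch of the client/server split. The class names, event strings, and the endpoint label are assumptions made for illustration; this is not the MGCP wire format, which exchanges commands such as Notify and NotificationRequest.

```python
class CallAgent:
    """Holds the call-control intelligence (the server side)."""

    def handle_event(self, endpoint, event):
        # The agent, not the gateway, decides what happens next.
        if event == "off-hook":
            return "play-dial-tone"
        if event == "on-hook":
            return "release-connection"
        return "ignore"


class Gateway:
    """Has no call-control logic: it reports events and obeys instructions."""

    def __init__(self, agent):
        self.agent = agent
        self.actions = []  # log of (endpoint, instruction) pairs carried out

    def detect(self, endpoint, event):
        instruction = self.agent.handle_event(endpoint, event)
        self.actions.append((endpoint, instruction))
        return instruction


gateway = Gateway(CallAgent())
print(gateway.detect("aaln/1", "off-hook"))  # hypothetical endpoint name
```

In a peer-to-peer protocol such as SIP or H.323, the decision logic inside `CallAgent.handle_event` would instead live in the endpoint or gateway itself.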

Database Services

Access to services such as 1-800 numbers or caller ID requires the ability to query a database to determine whether the call can be placed or the information can be made available. Database services include access to billing information, calling name (CNAM) delivery, toll-free database services (1-8xx), and calling card services. VoIP service providers can differentiate their services by providing access to numerous and unique database services. For example, to simplify fax access to mobile users, a provider might build a service that converts fax to e-mail. Another example might be to provide a call notification service that places outbound calls with prerecorded messages at specific times to notify users of such events as school closures, wake-up calls, or appointment reminders.

Bearer Channel Control

Bearer channels are the channels that carry voice calls. Proper supervision of these channels requires that the appropriate call connect and call disconnect signaling be passed between end devices. Correct signaling ensures that the channel is allocated to the current voice call and that the channel is properly de-allocated when either side terminates the call. These connect and disconnect messages are carried in SS7 within the PSTN network, and in SIP, H.323, Megaco/H.248, or MGCP within an IP network.

CODECs

Coder-decoders (CODECs) provide the coding and decoding translation between analog and digital facilities. Each CODEC type defines the method of voice coding and the compression mechanism that is used to convert the voice stream. The PSTN uses TDM to carry each voice call. Each voice channel reserves 64 kbps of bandwidth and uses the G.711 CODEC to convert the analog voice wave to a TDM voice stream. G.711 creates a 64 kbps digitized voice stream. In VoIP design, CODECs often compress voice beyond the 64 kbps voice stream to allow more efficient use of network resources. The most widely used CODEC in the WAN environment is G.729, which compresses the voice stream (that is, the voice payload only) to 8 kbps.
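The bit rates quoted above can be checked with simple arithmetic. The following Python sketch assumes the standard G.711 parameters (8000 samples per second, 8 bits per sample) and a 20 ms packetization interval, which is a common default but still an assumption here; the function names are illustrative.

```python
def pcm_bit_rate(sample_rate_hz=8000, bits_per_sample=8):
    """Bit rate of an uncompressed PCM stream such as G.711."""
    return sample_rate_hz * bits_per_sample


def payload_bytes(bit_rate_bps, packet_interval_ms=20):
    """Voice payload carried in one packet at the given CODEC bit rate."""
    # bits per packet = rate * interval (s); divide by 8 for bytes
    return bit_rate_bps * packet_interval_ms // 8000


G711_BPS = pcm_bit_rate()  # 64 kbps
G729_BPS = 8000            # 8 kbps, as stated in the text

print(G711_BPS, payload_bytes(G711_BPS), payload_bytes(G729_BPS))
```

At a 20 ms interval, G.711 therefore carries 160 bytes of voice per packet and G.729 carries 20 bytes.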

VoIP Protocols

VoIP employs a variety of protocols to set up a call, tear down a call, and send information (for example, the actual spoken voice) during a call. The following are the major VoIP protocols:

H.323: An ITU standard protocol for interactive conferencing. H.323 was originally designed for multimedia in a connectionless environment, such as a LAN. H.323 serves as an umbrella of standards that define all aspects of synchronized voice, video, and data transmission. H.323 defines end-to-end call signaling.
Media Gateway Control Protocol (MGCP): A method for PSTN gateway control or thin device control. Specified in RFC 2705, MGCP defines a protocol to control VoIP gateways that are connected to external call-control devices, referred to as call agents. MGCP provides the signaling capability for less-expensive edge devices, such as gateways, that might not have implemented a full voice-signaling protocol such as H.323. For example, any time an event such as an off-hook condition occurs at the voice port of a gateway, the voice port reports that event to the call agent. The call agent then signals that device to provide a service, such as dial-tone signaling.
Megaco/H.248: A joint Internet Engineering Task Force (IETF) and ITU standard that is based on the original MGCP standard. Megaco defines a single gateway control approach that works with multiple gateway applications including PSTN gateways, ATM interfaces, analog-like and telephone interfaces, interactive voice response (IVR) servers, and others. Megaco provides full call control intelligence and implements call level features such as transfer, conference, call forward, and hold. The basic operation of Megaco is very similar in nature to MGCP. However, Megaco provides more flexibility by interfacing with a wider variety of applications and gateways.
Session Initiation Protocol (SIP): A detailed protocol that specifies the commands and responses to set up and tear down calls. SIP also details features such as security, proxy, and transport (TCP or User Datagram Protocol [UDP]) services. SIP and its partner protocols, Session Announcement Protocol (SAP) and Session Description Protocol (SDP), can provide announcements and information about multicast sessions to users on a network. SIP defines end-to-end call signaling between devices. SIP is a text-based protocol that borrows many elements of HTTP, using the same transaction request and response model, and similar header and response codes. It also adopts a modified form of the URL-addressing scheme used within e-mail that is based on Simple Mail Transfer Protocol (SMTP).
Real-Time Transport Protocol (RTP): An IETF standard media-streaming protocol. RTP carries the voice payload across the network. RTP provides sequence numbers and time stamps for the orderly processing of voice packets. In addition to voice packets, RTP can also carry streaming video packets.
RTP Control Protocol (RTCP): Provides out-of-band control information for an RTP flow. Every RTP flow has a corresponding RTCP flow that reports statistics on the call. RTCP is used for quality of service (QoS) reporting.
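To make the text-based, HTTP-like nature of SIP concrete, the following Python sketch assembles a minimal INVITE request and parses a response status line. The addresses, host name, Call-ID, branch, and tag values are placeholders, the helper names are hypothetical, and the request carries no SDP body; it is a simplified illustration rather than a complete SIP user agent.

```python
def build_invite(caller, callee, local_host, call_id, branch, tag):
    """Return a minimal SIP INVITE (no SDP body) as a CRLF-delimited string."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",  # request line, similar to HTTP
        f"Via: SIP/2.0/UDP {local_host};branch={branch}",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag={tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Length: 0",
    ]
    # Headers end with an empty line, as in HTTP.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_status_line(line):
    """Split a response status line such as 'SIP/2.0 180 Ringing'."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


message = build_invite(
    caller="alice@example.com",
    callee="bob@example.net",
    local_host="pc1.example.com",
    call_id="abc123@pc1.example.com",
    branch="z9hG4bK-demo",  # branch values begin with the z9hG4bK cookie
    tag="1234",
)
print(message)
print(parse_status_line("SIP/2.0 180 Ringing"))
```

The addresses use the user@host form familiar from e-mail, and the numeric response codes follow the same class structure as HTTP status codes.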

Successfully integrating connection-oriented voice traffic in a connectionless IP network requires enhancements to the signaling stack. In some ways, the user voice protocol must make the connectionless network appear more connection oriented through the use of sequence numbers. Table 5-1 provides examples of how various VoIP components and protocols map to the seven-layer OSI model.

Table 5-1. Mapping VoIP Components and Protocols to the OSI Model
OSI Layer	VoIP Component and Protocol
Application	IP Communicator, CallManager, and human speech
Presentation	CODECs
Session	H.323, SIP, MGCP, and Megaco
Transport	RTP and UDP (media); TCP and UDP (signaling)
Network	IP
Data link	Any data link technology that supports the transport of IP packets. Examples include Frame Relay, ATM, Ethernet, Point-to-Point Protocol (PPP), Multilink PPP (MLP), and High-Level Data Link Control (HDLC).
Physical	Any physical technology that supports the transport of the data link frames or cells listed in the preceding row. Examples include Category 5 unshielded twisted-pair (UTP), T1, E1, ISDN BRI, and ISDN PRI.

Applications such as Cisco IP Communicator and CallManager provide the interface for users to originate voice at their PCs or laptops and convert and compress the voice before passing it to the network. If a gateway is used, a standard telephone becomes the interface to users, and human speech becomes the application.

CODECs define how voice is compressed. Users can configure which CODEC to use or negotiate a CODEC according to what is available.

The VoIP components that reside at the session layer are the signaling methods. H.323 and SIP define end-to-end call-signaling methods. MGCP and Megaco/H.248 define a method to separate the signaling function from the voice call function. This last approach is referred to as client/server architecture for voice signaling. The client/server architecture uses a call agent to control signaling on behalf of the endpoint devices, such as gateways. The central control device participates in the call setup only. Voice traffic still flows directly from endpoint to endpoint.

A constant in VoIP implementation is that voice uses RTP inside UDP to carry the voice payload across the network. IP voice packets can reach the destination out of order and unsynchronized. The packets must be reordered and resynchronized before playing them out to the user. Because UDP does not provide services such as sequence numbers or time stamps, RTP provides the sequencing functionality.

Once voice packets have been encapsulated at the transport layer, they are ready for transmission across an IP network. This network layer IP traffic can be transmitted across nearly any data link and physical layer technology that is capable of carrying IP packets.

VoIP Service Considerations

In traditional telephony networks, dedicated bandwidth for each voice stream provides voice with a guaranteed delay across the network. Because bandwidth is guaranteed in the TDM environment, there is no variable delay (jitter). Configuring voice in a data network requires network services with low delay, minimal jitter, and minimal packet loss. Bandwidth requirements must be properly calculated based on the CODEC that is used and the number of concurrent connections. QoS must be configured to minimize jitter and loss of voice packets. The PSTN offers uptime of 99.999 percent, also known as the five nines of availability. A system that is up 99.999 percent of the time experiences only about five minutes of downtime (roughly 5.26 minutes) in an entire year. To match the availability of the PSTN, the IP network must be designed with redundancy and failover mechanisms. Additionally, security policies must be established to address both network stability and voice-stream security.
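The five-nines figure is easy to verify. The short Python sketch below converts an availability percentage into minutes of downtime per year; it assumes a 365-day year, and the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent):
    """Minutes per year that a system at this availability is unavailable."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for availability in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(availability)
    print(f"{availability}% -> {minutes:.2f} minutes per year")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why matching PSTN availability generally calls for redundant hardware, links, and power.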

Table 5-2 lists the issues associated with implementing VoIP in a converged network and solutions that address these issues.

Table 5-2. Issues and Solutions for VoIP in a Converged Network
Issue	Solution
Latency	Increase bandwidth. Choose a different CODEC type. Fragment data packets. Prioritize voice packets.
Jitter	Use dejitter buffers.
Bandwidth	Calculate bandwidth requirements, including voice payload, overhead, and data.
Packet loss	Design the network to minimize congestion. Prioritize voice packets. Drop lower priority traffic more aggressively than voice traffic.
Reliability	Provide redundancy for hardware, links, and power (uninterruptible power supply [UPS]). Perform proactive network management.
Security	Secure the network infrastructure, call-processing systems, endpoints, and applications.

RTP and RTCP

Real-Time Transport Protocol (RTP) provides end-to-end network transport functions intended for applications transmitting real-time payloads, such as audio and video. Those functions include payload-type identification, sequence numbering, time stamping, and delivery monitoring.

RTP typically runs on top of UDP to use the multiplexing and checksum services of UDP. Although RTP is often used for unicast sessions, it was primarily designed for multicast sessions. In addition to the roles of sender and receiver, RTP also defines the roles of translator and mixer to support multicast requirements.

RTP is a critical component of VoIP because it enables the destination device to reorder and retime the voice packets before they are played out to the user. An RTP header contains a time stamp and a sequence number, which allow the receiving device to buffer packets and smooth out jitter by synchronizing them to play back a continuous stream of sound. RTP uses sequence numbers to properly order the packets. However, RTP does not request retransmission if a packet is lost. Rather, a voice-enabled router can use a loss-concealment algorithm to interpolate approximately what the lost packet would have sounded like. This synthetically generated packet can then be sent in place of the dropped packet. Although this loss-concealment approach minimizes the impact of a single dropped voice packet, multiple voice packets dropped in succession result in poor voice quality as perceived by the listener.
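The sequence number and time stamp live in the 12-byte fixed RTP header. As a minimal sketch, the Python code below packs and unpacks that fixed header with the standard struct module. It assumes no padding, no header extension, and no CSRC entries, and the function names and sample values are illustrative.

```python
import struct

# Fixed RTP header: two flag bytes, 16-bit sequence number,
# 32-bit time stamp, 32-bit SSRC, all in network byte order.
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    byte0 = 2 << 6  # version 2; padding, extension, and CSRC count all zero
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_HEADER.pack(
        byte0, byte1, sequence & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF
    )


def unpack_rtp_header(data):
    byte0, byte1, sequence, timestamp, ssrc = RTP_HEADER.unpack(data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Payload type 18 is the static RTP payload type for G.729.
header = pack_rtp_header(payload_type=18, sequence=1, timestamp=160, ssrc=0x1234)
print(len(header), unpack_rtp_header(header))
```

A receiver would read `sequence` to restore packet order and `timestamp` to decide when each packet should be played out.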

While RTP streams the actual audio, RTP Control Protocol (RTCP) monitors the quality of the data distribution and provides control information.

RTCP provides the following feedback on current network conditions:

RTCP provides a mechanism for hosts involved in an RTP session to exchange information about monitoring and controlling the session. RTCP monitors the quality of elements such as packet count, packet loss, delay, and interarrival jitter. RTCP transmits packets as a percentage of session bandwidth, but at a specific rate of at least every five seconds.
The RTP standard states that the Network Time Protocol (NTP) time stamp is based on synchronized clocks. The corresponding RTP time stamp starts from a randomly chosen value and advances with the sampling clock of the media. The sender of the data includes both the NTP and RTP time stamps in its RTCP packets.
When a voice stream is assigned UDP port numbers, RTP is typically assigned an even-numbered port, and RTCP is assigned the next odd-numbered port. Each voice call has four ports assigned: RTP plus RTCP in the transmit direction and RTP plus RTCP in the receive direction.
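The interarrival jitter value that RTCP reports is a running estimate rather than a single measurement. The Python sketch below follows the smoothing rule given in the RTP specification, where each new transit-time difference D moves the estimate by (|D| - J) / 16. The function name and the sample packet values are assumptions for illustration, and both time values are expressed in RTP time stamp units.

```python
def interarrival_jitter(packets):
    """Estimate jitter from (rtp_timestamp, arrival_time) pairs.

    Both values must be in the same units (RTP time stamp ticks).
    """
    jitter = 0.0
    previous_transit = None
    for rtp_timestamp, arrival_time in packets:
        transit = arrival_time - rtp_timestamp
        if previous_transit is not None:
            difference = abs(transit - previous_transit)
            # Move the estimate 1/16 of the way toward the new sample.
            jitter += (difference - jitter) / 16
        previous_transit = transit
    return jitter


# Hypothetical samples: the third packet arrives 20 ticks late.
samples = [(0, 100), (160, 260), (320, 440)]
print(interarrival_jitter(samples))
```

Because of the 1/16 gain, a single late packet nudges the estimate only slightly, whereas sustained variation in delay drives it steadily upward.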

RTP and RTCP Application

As voice packets are placed on the network to reach a destination, they might take one or more paths to reach their destination. Each path might have a different length and transmission speed, which results in the packets being out of order when they arrive at their destination. As the packets were placed on the wire at the source of the call, RTP tagged the packets with a time stamp and sequence number. At the destination, RTP can reorder the packets and send them to the digital signal processor (DSP) at the same pace as they were placed on the wire at the source.
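The reordering step can be illustrated with a small buffer that releases packets strictly in sequence-number order. This Python sketch is deliberately simplified: the class name is hypothetical, and it ignores 16-bit sequence wraparound, lost packets, and playout timing, all of which a real dejitter buffer must handle.

```python
import heapq


class ReorderBuffer:
    """Hold out-of-order packets until the next expected one arrives."""

    def __init__(self, first_sequence):
        self.next_sequence = first_sequence
        self.heap = []  # min-heap of (sequence, payload)

    def push(self, sequence, payload):
        heapq.heappush(self.heap, (sequence, payload))

    def pop_ready(self):
        """Return the payloads that can be played in order right now."""
        ready = []
        while self.heap and self.heap[0][0] == self.next_sequence:
            _, payload = heapq.heappop(self.heap)
            ready.append(payload)
            self.next_sequence += 1
        return ready


buffer = ReorderBuffer(first_sequence=1)
buffer.push(2, "second")   # arrives early, must wait
print(buffer.pop_ready())  # nothing can be played yet
buffer.push(1, "first")
print(buffer.pop_ready())  # both packets are released in order
```

The time stamp, not shown here, would then determine the pace at which the released payloads are handed to the DSP.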

Note

For more information on RTP, refer to RFC 1889, "RTP: A Transport Protocol for Real-Time Applications," which you can find at ftp://ftp.rfc-editor.org/in-notes/rfc1889.txt. RFC 1889 has since been obsoleted by RFC 3550, which carries the same title.

Throughout the duration of each RTP call, the RTCP report packets are generated at least every five seconds. In the event of poor network conditions, a call might be disconnected due to high packet loss. When viewing packets using a packet analyzer, a network administrator could check information in the RTCP header, which includes packet count, octet count, number of packets lost, and jitter. The RTCP header information might shed light on why a call was disconnected.

RTP Header Compression

RTP, a Layer 4 protocol, is encapsulated inside UDP, another Layer 4 protocol. This UDP segment is then encapsulated inside an IP packet. The combined IP, UDP, and RTP header overhead is 40 bytes (20 bytes for the IP header, 8 bytes for UDP, and 12 bytes for RTP). However, in a default G.729 implementation, the voice payload is only 20 bytes, half the size of the header. Fortunately, Cisco offers a feature called RTP header compression, which reduces the 40-byte header down to only two or four bytes, as illustrated in Figure 5-1.
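The effect of the header sizes quoted above on per-call bandwidth can be worked out directly. The Python sketch below assumes 50 packets per second, which corresponds to a 20-byte G.729 payload at 8 kbps, and it deliberately excludes Layer 2 overhead; the function name is illustrative.

```python
def voice_bandwidth_bps(payload_bytes, header_bytes, packets_per_second=50):
    """Per-call bandwidth at Layer 3, excluding data link overhead."""
    return (payload_bytes + header_bytes) * 8 * packets_per_second


G729_PAYLOAD = 20  # bytes per packet, from the text

uncompressed = voice_bandwidth_bps(G729_PAYLOAD, 40)  # full IP/UDP/RTP headers
compressed_2 = voice_bandwidth_bps(G729_PAYLOAD, 2)   # 2-byte compressed header
compressed_4 = voice_bandwidth_bps(G729_PAYLOAD, 4)   # 4-byte compressed header

print(uncompressed, compressed_2, compressed_4)
```

Under these assumptions a G.729 call needs 24 kbps with full headers but only 8.8 to 9.6 kbps with compressed headers, before any data link overhead is added.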

Figure 5-1. RTP Header Compression

Note

RTP header compression is often abbreviated as cRTP.

RTP header compression technology does not actually compress the header. Rather, cRTP makes the observation that most information contained in the IP, UDP, and RTP headers does not change during a conversation. For example, the source and destination IP addresses, the source and destination UDP port numbers, and the RTP payload type fields do not change during a conversation. Therefore, instead of transmitting this redundant information in each and every packet, cRTP allows routers at each end of a link to cache this information and only send such header information as UDP checksums and a session context ID (CID), which identifies the RTP session to which a particular packet belongs.
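The caching idea can be modeled with a toy compressor and decompressor pair. This Python sketch only illustrates the principle of sending static fields once and referring to them afterward by context ID; the class names, field names, and packet dictionaries are hypothetical, and the code does not reproduce the actual cRTP encoding, which also delta-encodes the changing fields.

```python
STATIC_FIELDS = ("src_ip", "dst_ip", "src_port", "dst_port", "payload_type")
DYNAMIC_FIELDS = ("sequence", "timestamp")


class Compressor:
    def __init__(self):
        self.contexts = {}  # tuple of static field values -> context ID

    def compress(self, header):
        key = tuple(header[name] for name in STATIC_FIELDS)
        if key not in self.contexts:
            # First packet of a session: send everything and assign a CID.
            cid = len(self.contexts)
            self.contexts[key] = cid
            return {"type": "full", "cid": cid, "header": dict(header)}
        # Later packets: send only the CID and the fields that change.
        packet = {"type": "compressed", "cid": self.contexts[key]}
        packet.update({name: header[name] for name in DYNAMIC_FIELDS})
        return packet


class Decompressor:
    def __init__(self):
        self.contexts = {}  # context ID -> last full header seen

    def decompress(self, packet):
        if packet["type"] == "full":
            self.contexts[packet["cid"]] = dict(packet["header"])
            return dict(packet["header"])
        header = dict(self.contexts[packet["cid"]])  # restore cached fields
        header.update({name: packet[name] for name in DYNAMIC_FIELDS})
        return header


first = {"src_ip": "10.1.1.1", "dst_ip": "10.2.2.2", "src_port": 16384,
         "dst_port": 16386, "payload_type": 18, "sequence": 1, "timestamp": 160}
second = dict(first, sequence=2, timestamp=320)

compressor, decompressor = Compressor(), Decompressor()
for original in (first, second):
    sent = compressor.compress(original)
    print(sent["type"], decompressor.decompress(sent) == original)
```

Both ends of the link must hold the same context for a CID, which is why the feature is configured on the interfaces at both ends of the link.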

An administrator can configure cRTP for an interface, using the ip rtp header-compression [passive] command. This command should be entered, in interface configuration mode, for the interfaces on both ends of a link. Also, note the optional passive keyword. The passive keyword tells an interface not to send compressed headers unless it first receives a compressed header. Therefore, the passive keyword should only be entered on one of the two interfaces connected to a link. If the passive keyword were entered on both interfaces connected to a link, cRTP would not function because neither interface would initiate the sending of compressed headers.
