The increased efficiency of IP networks and the ability to statistically multiplex voice traffic with data packets allow companies to maximize their return on investment (ROI) in data network infrastructures. Decreased cost and an increase in the availability of differentiated services are two major reasons companies are evaluating the implementation of VoIP.

As demand for voice services in the IP network expands, it is important to understand the components and functionality that must be present for a successful implementation. Several protocols and tools are available for carrying voice in a data network. In defining the VoIP protocol stack, you must understand at which layer these tools and protocols reside and how they interact with other layers.

When voice is packaged into IP packets, additional headers are created to carry voice-specific information. These headers can create significant additional overhead in the IP network. Understanding which protocols to use and knowing how to limit overhead is crucial in carrying voice efficiently across an IP network.

Business Case for VoIP

Business advantages that are driving implementations of VoIP networks have changed over time. Starting with simple media convergence, these advantages have evolved to include the convergence of call-switching intelligence and the total user experience. Originally, ROI calculations centered on toll-bypass and converged-network savings. Although these savings are still relevant today, advances in voice technologies allow organizations and service providers to differentiate their product offerings by providing advanced features. VoIP business drivers include the following:
VoIP Functional Components

In the traditional PSTN telephony network, all the elements that are required to complete the call are transparent to the end user. Migration to VoIP necessitates an awareness of these required elements and a thorough understanding of the protocols and components that provide the same functionality in an IP network. Required VoIP functionality includes the following features:
The following sections describe each required functional component.

Signaling

Signaling is the ability to generate and exchange control information to establish, monitor, and release connections between two endpoints. Voice signaling requires the ability to provide supervisory, address, and alerting functionality between nodes. The PSTN uses Signaling System 7 (SS7) to transport control messages in an out-of-band signaling network. VoIP presents several options for signaling, including H.323, Session Initiation Protocol (SIP), Megaco/H.248, and Media Gateway Control Protocol (MGCP). Some VoIP gateways are also capable of initiating SS7 signaling directly to the PSTN network.

Signaling protocols are classified as either peer-to-peer or client/server architectures. SIP and H.323 are examples of peer-to-peer signaling protocols, where the end devices or gateways contain the intelligence to initiate and terminate calls and interpret call control messages. Megaco/H.248 and MGCP are examples of client/server protocols, where the endpoints or gateways do not contain call control intelligence but send event notifications to, or receive them from, a server commonly referred to as the call agent. For example, when an MGCP gateway detects that a telephone has gone off hook, the gateway does not know to automatically provide a dial tone. The gateway sends an event notification to the call agent, telling the agent that an off-hook condition has been detected. The call agent then instructs the gateway to provide a dial tone.

Database Services

Access to services such as 1-800 numbers or caller ID requires the ability to query a database to determine whether the call can be placed or the information can be made available. Database services include access to billing information, calling name (CNAM) delivery, toll-free database services (1-8xx), and calling card services. VoIP service providers can differentiate their services by providing access to numerous and unique database services.
For example, to simplify fax access to mobile users, a provider might build a service that converts fax to e-mail. Another example might be to provide a call notification service that places outbound calls with prerecorded messages at specific times to notify users of such events as school closures, wake-up calls, or appointment reminders.

Bearer Channel Control

Bearer channels are the channels that carry voice calls. Proper supervision of these channels requires that the appropriate call connect and call disconnect signaling be passed between end devices. Correct signaling ensures that the channel is allocated to the current voice call and that the channel is properly de-allocated when either side terminates the call. These connect and disconnect messages are carried in SS7 within the PSTN network, and in SIP, H.323, Megaco/H.248, or MGCP within an IP network.

CODECs

Coder-decoders (CODECs) provide the coding and decoding translation between analog and digital facilities. Each CODEC type defines the method of voice coding and the compression mechanism that is used to convert the voice stream.

The PSTN uses TDM to carry each voice call. Each voice channel reserves 64 kbps of bandwidth and uses the G.711 CODEC to convert the analog voice wave to a TDM voice stream. G.711 creates a 64 kbps digitized voice stream. In VoIP design, CODECs often compress voice beyond the 64 kbps voice stream to allow more efficient use of network resources. The most widely used CODEC in the WAN environment is G.729, which compresses the voice stream (that is, the voice payload only) to 8 kbps.

VoIP Protocols

VoIP employs a variety of protocols to set up a call, tear down a call, and send information (for example, the actual spoken voice) during a call. The following are the major VoIP protocols:
Successfully integrating connection-oriented voice traffic in a connectionless IP network requires enhancements to the signaling stack. In some ways, the user voice protocol must make the connectionless network appear more connection-oriented through the use of sequence numbers. Table 5-1 provides examples of how various VoIP components and protocols map to the seven-layer OSI model.
Applications such as Cisco IP Communicator and CallManager provide the interface for users to originate voice at their PCs or laptops and convert and compress the voice before passing it to the network. If a gateway is used, a standard telephone becomes the interface to users, and human speech becomes the application.

CODECs define how voice is compressed. Users can configure which CODEC to use or negotiate a CODEC according to what is available.

The VoIP components that reside at the session layer are the signaling methods. H.323 and SIP define end-to-end call-signaling methods. MGCP and Megaco/H.248 define a method to separate the signaling function from the voice call function. This last approach is referred to as client/server architecture for voice signaling. The client/server architecture uses a call agent to control signaling on behalf of the endpoint devices, such as gateways. The central control device participates in the call setup only. Voice traffic still flows directly from endpoint to endpoint.

A constant in VoIP implementation is that voice uses RTP inside UDP to carry the voice payload across the network. IP voice packets can reach the destination out of order and unsynchronized. The packets must be reordered and resynchronized before they are played out to the user. Because UDP does not provide services such as sequence numbers or time stamps, RTP provides the sequencing functionality.

Once voice packets have been encapsulated at the transport layer, they are ready for transmission across an IP network. This network layer IP traffic can be transmitted across nearly any data-link and physical layer technology that is capable of transmitting data.

VoIP Service Considerations

In traditional telephony networks, dedicated bandwidth for each voice stream provides voice with a guaranteed delay across the network. Because bandwidth is guaranteed in the TDM environment, there is no variable delay (jitter).
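In an IP network, by contrast, packets can arrive out of order, which is why the receiver relies on RTP sequence numbers. The following Python sketch illustrates the idea only. The `VoicePacket` class and `reorder_for_playout` function are hypothetical names invented for this example, the sample ignores 16-bit sequence-number wraparound, and the time stamps assume 8 kHz sampling with 20 ms of audio per packet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoicePacket:
    """Hypothetical, simplified stand-in for an RTP packet."""
    seq: int        # RTP-style sequence number (wraparound ignored here)
    timestamp: int  # RTP-style time stamp in sample units
    payload: bytes


def reorder_for_playout(received):
    """Sort packets by sequence number and report any gaps.

    Returns (ordered_packets, missing_sequence_numbers).
    Duplicate sequence numbers keep only the first copy received.
    """
    if not received:
        return [], []
    by_seq = {}
    for pkt in received:
        by_seq.setdefault(pkt.seq, pkt)
    ordered = [by_seq[s] for s in sorted(by_seq)]
    first, last = ordered[0].seq, ordered[-1].seq
    # Any sequence number in the range that never arrived counts as lost.
    missing = [s for s in range(first, last + 1) if s not in by_seq]
    return ordered, missing


# Packets as they might arrive: out of order, with sequence number 4 lost.
arrivals = [
    VoicePacket(seq=3, timestamp=320, payload=b"c"),
    VoicePacket(seq=1, timestamp=0, payload=b"a"),
    VoicePacket(seq=5, timestamp=640, payload=b"e"),
    VoicePacket(seq=2, timestamp=160, payload=b"b"),
]

ordered, missing = reorder_for_playout(arrivals)
print([p.seq for p in ordered])  # [1, 2, 3, 5]
print(missing)                   # [4]
```

A real playout buffer would also use the time stamps to pace delivery and would hand the gap at sequence number 4 to a loss-concealment routine rather than simply reporting it.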
Configuring voice in a data network requires network services with low delay, minimal jitter, and minimal packet loss. Bandwidth requirements must be properly calculated based on the CODEC that is used and the number of concurrent connections. QoS must be configured to minimize jitter and loss of voice packets.

The PSTN offers uptime of 99.999 percent, also known as the five nines of availability. A system that is up 99.999 percent of the time experiences only about five minutes of downtime in an entire year. To match the availability of the PSTN, the IP network must be designed with redundancy and failover mechanisms. Additionally, security policies must be established to address both network stability and voice-stream security.

Table 5-2 lists the issues associated with implementing VoIP in a converged network and solutions that address these issues.
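As a quick check of the five-nines figure quoted above, the short Python sketch below converts an availability percentage into minutes of downtime per year. The function name `downtime_minutes_per_year` is made up for this illustration, and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_percent):
    """Minutes of downtime per year implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Five nines works out to a little over five minutes per year.
print(round(downtime_minutes_per_year(99.999), 2))  # 5.26
```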
RTP and RTCP

Real-Time Transport Protocol (RTP) provides end-to-end network transport functions intended for applications transmitting real-time payloads, such as audio and video. Those functions include payload-type identification, sequence numbering, time stamping, and delivery monitoring.

RTP typically runs on top of UDP to use the multiplexing and checksum services of UDP. Although RTP is often used for unicast sessions, it was primarily designed for multicast sessions. In addition to the roles of sender and receiver, RTP also defines the roles of translator and mixer to support multicast requirements.

RTP is a critical component of VoIP because it enables the destination device to reorder and retime the voice packets before they are played out to the user. An RTP header contains a time stamp and sequence number, which allow the receiving device to buffer packets and compensate for jitter and variations in latency by synchronizing the packets to play back a continuous stream of sound.

RTP uses sequence numbers to properly order the packets. However, RTP does not request retransmission if a packet is lost. Rather, a voice-enabled router can use a loss-concealment algorithm to interpolate approximately what the lost packet would have sounded like. This synthetically generated packet can then be played in place of the dropped packet. Although this loss-concealment approach minimizes the impact of a single dropped voice packet, multiple voice packets dropped in succession result in poor voice quality as perceived by the listener.

While RTP streams the actual audio, RTP Control Protocol (RTCP) monitors the quality of the data distribution and provides control information. RTCP provides the following feedback on current network conditions:
RTP and RTCP Application

As voice packets are placed on the network, they might take one or more paths to reach their destination. Each path might have a different length and transmission speed, which can result in packets arriving out of order. When the packets were placed on the wire at the source of the call, RTP tagged them with a time stamp and sequence number. At the destination, RTP can reorder the packets and send them to the digital signal processor (DSP) at the same pace as they were placed on the wire at the source.

Note: For more information on RTP, refer to RFC 1889, "RTP: A Transport Protocol for Real-Time Applications," which you can find at ftp://ftp.rfc-editor.org/in-notes/rfc1889.txt.

Throughout the duration of each RTP call, the RTCP report packets are generated at least every five seconds. In the event of poor network conditions, a call might be disconnected due to high packet loss. When viewing packets using a packet analyzer, a network administrator could check information in the RTCP header, which includes packet count, octet count, number of packets lost, and jitter. The RTCP header information might shed light on why a call was disconnected.

RTP Header Compression

RTP, a Layer 4 protocol, is encapsulated inside UDP, another Layer 4 protocol. This UDP segment is then encapsulated inside an IP packet. The combined IP, UDP, and RTP header overhead is 40 bytes. However, in a default G.729 implementation, the voice payload is only 20 bytes, half the size of the header. Fortunately, Cisco offers a feature called RTP header compression, which reduces the 40-byte header down to only two or four bytes, as illustrated in Figure 5-1.

Figure 5-1. RTP Header Compression
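To put the overhead figures above in perspective, the following Python sketch estimates per-call bandwidth for G.729 with and without header compression. It is a rough illustration under stated assumptions: the function name `ip_bandwidth_bps` is invented for this example, the packet rate is derived from the 8 kbps codec rate and the 20-byte payload (50 packets per second), and Layer 2 framing overhead is ignored.

```python
def ip_bandwidth_bps(codec_bps, payload_bytes, header_bytes):
    """Per-call bandwidth in bits per second, excluding Layer 2 overhead."""
    # Each packet carries payload_bytes of codec output, so the packet
    # rate follows from the codec bit rate.
    packets_per_second = codec_bps / (payload_bytes * 8)
    return (payload_bytes + header_bytes) * 8 * packets_per_second


G729_BPS = 8000      # G.729 payload rate from the text
PAYLOAD_BYTES = 20   # default G.729 payload size from the text

# Full 40-byte IP/UDP/RTP header versus 2- or 4-byte compressed headers.
print(int(ip_bandwidth_bps(G729_BPS, PAYLOAD_BYTES, 40)))  # 24000
print(int(ip_bandwidth_bps(G729_BPS, PAYLOAD_BYTES, 4)))   # 9600
print(int(ip_bandwidth_bps(G729_BPS, PAYLOAD_BYTES, 2)))   # 8800
```

Under these assumptions, the uncompressed headers triple the bandwidth of an 8 kbps voice stream, while compressed headers add roughly 10 to 20 percent.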
Note: RTP header compression is often abbreviated as cRTP.

RTP header compression technology does not actually compress the header. Rather, cRTP makes the observation that most information contained in the IP, UDP, and RTP headers does not change during a conversation. For example, the source and destination IP addresses, the source and destination UDP port numbers, and the RTP payload type fields do not change during a conversation. Therefore, instead of transmitting this redundant information in each and every packet, cRTP allows routers at each end of a link to cache this information and send only such header information as UDP checksums and a session context ID (CID), which identifies the RTP session to which a particular packet belongs.

An administrator can configure cRTP for an interface, using the ip rtp header-compression [passive] command. This command should be entered, in interface configuration mode, for the interfaces on both ends of a link. Also, note the optional passive keyword. The passive keyword tells an interface not to send compressed headers unless it first receives a compressed header. Therefore, the passive keyword should only be entered on one of the two interfaces connected to a link. If the passive keyword were entered on both interfaces connected to a link, cRTP would not function because neither interface would initiate the sending of compressed headers.
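The caching idea behind cRTP can be sketched as a toy model in Python. This is not the actual cRTP wire format or Cisco's implementation. The `Compressor` and `Decompressor` classes, the dictionary-based "headers," and the sample addresses and ports are all hypothetical, and sending the sequence number explicitly in each compressed packet is a simplification made for readability.

```python
# Fields that stay constant for the life of a call (per the text above).
STATIC_FIELDS = ("src_ip", "dst_ip", "src_port", "dst_port", "payload_type")


class Compressor:
    """Toy sender side: replaces cached static fields with a context ID."""

    def __init__(self):
        self._cid_by_flow = {}  # static-field tuple -> context ID (CID)

    def compress(self, header):
        key = tuple(header[f] for f in STATIC_FIELDS)
        if key not in self._cid_by_flow:
            # First packet of a flow: send the full header so the far end
            # can cache the static fields under a new CID.
            cid = len(self._cid_by_flow)
            self._cid_by_flow[key] = cid
            return {"type": "full", "cid": cid, "header": dict(header)}
        # Later packets: send only the CID and the fields that change.
        return {
            "type": "compressed",
            "cid": self._cid_by_flow[key],
            "seq": header["seq"],
            "udp_checksum": header["udp_checksum"],
        }


class Decompressor:
    """Toy receiver side: rebuilds full headers from the cached context."""

    def __init__(self):
        self._static_by_cid = {}

    def decompress(self, packet):
        if packet["type"] == "full":
            hdr = packet["header"]
            self._static_by_cid[packet["cid"]] = {f: hdr[f] for f in STATIC_FIELDS}
            return dict(hdr)
        header = dict(self._static_by_cid[packet["cid"]])
        header["seq"] = packet["seq"]
        header["udp_checksum"] = packet["udp_checksum"]
        return header


def make_header(seq):
    """Build an illustrative header; all values are made up for the demo."""
    return {
        "src_ip": "10.1.1.1", "dst_ip": "10.2.2.2",
        "src_port": 16384, "dst_port": 16386,
        "payload_type": 18,  # G.729 in the standard RTP audio profile
        "seq": seq, "udp_checksum": 0x1000 + seq,
    }


compressor, decompressor = Compressor(), Decompressor()
for seq in (1, 2, 3):
    original = make_header(seq)
    on_the_wire = compressor.compress(original)
    restored = decompressor.decompress(on_the_wire)
    print(seq, on_the_wire["type"], restored == original)
```

In this model only the first packet of a flow carries the static fields. Every later packet carries the CID plus the changing fields, and the receiver still reconstructs the complete header.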