18.3. Real-Time Media Transport Protocols

In real-time applications, a stream of data is sent at a constant rate. This data must be delivered to the appropriate application on the destination system, using real-time protocols. The most widely applied protocol for real-time transmission is the Real-Time Transport Protocol (RTP), together with its companion protocol, the Real-Time Control Protocol (RTCP).

RTP is built on top of the existing UDP stack, since UDP by itself cannot provide any timing information. The problems with using TCP for real-time applications can be identified easily. First, real-time applications may use multicasting for data delivery, and TCP, as an end-to-end protocol, is not suited for multicast distribution. Second, TCP uses a retransmission strategy for lost packets, so retransmitted packets arrive late and out of order, and real-time applications cannot afford these delays. Third, TCP does not maintain timing information for packets, which real-time applications require.

18.3.1. Real-Time Transport Protocol (RTP)

The real-time transport protocol (RTP) provides some basic functionalities common to real-time applications and includes some functions specific to each application. RTP runs on top of a transport protocol such as UDP. As noted in Chapter 8, UDP is used for port addressing in the transport layer and for providing such transport-layer functionalities as multiplexing and error detection. UDP does not reorder packets, so sequencing is left to RTP. RTP provides application-level framing by adding application-layer headers to datagrams. The application breaks the data into smaller units, called application data units (ADUs). Lower layers in the protocol stack, such as the transport layer, preserve the structure of the ADU.

Real-time applications, such as voice and video, can tolerate a certain amount of packet loss and do not always require data retransmission. The mechanism RTP uses typically informs a source about the quality of delivery. The source then adapts its sending rate accordingly. If the rate of packet loss is very high, the source might switch to a lower-quality transmission, thereby placing less load on the network. A real-time application can also decide for itself which data to send in place of a lost unit. Thus, recent data can be sent instead of retransmitted old data. This approach is more practical in voice and video applications. If a portion of an ADU is lost, the application is unable to process the data, and the entire ADU would have to be retransmitted.
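The rate adaptation described above can be sketched as a small feedback rule. This is only an illustration: the quality tiers, the thresholds, and the name adapt_tier are hypothetical and are not defined by RTP, which leaves the adaptation policy to the application.

```python
# Hypothetical quality tiers (bit rates in kb/s), ordered from best to worst.
TIERS_KBPS = [64, 32, 16, 8]


def adapt_tier(tier, loss_fraction, high=0.10, low=0.02):
    """Return the tier index to use after one delivery-quality report.

    tier          -- index into TIERS_KBPS currently in use
    loss_fraction -- fraction of packets reported lost (0.0 to 1.0)
    high, low     -- assumed thresholds for stepping down or up
    """
    if loss_fraction > high and tier < len(TIERS_KBPS) - 1:
        return tier + 1   # heavy loss: switch to a lower-quality transmission
    if loss_fraction < low and tier > 0:
        return tier - 1   # light loss: move back toward higher quality
    return tier           # otherwise keep the current tier
```

A sender would call adapt_tier once per feedback report and encode subsequent data at TIERS_KBPS[tier].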

Real-Time Session and Data Transfer

The TCP/IP and OSI models divide the network functionalities, based on a layered architecture. Each layer performs distinct functions, and the data flows sequentially between layers. The layered architecture may restrict the implementation of certain functions out of the layered order. Integrated layer processing, by contrast, dictates a tighter coupling among layers. RTP is used to transfer data within sessions in real time. A session is a logical connection between an active client and an active server and is defined by the following entities:

RTP port number, which represents the destination port address of the RTP session. Since RTP runs over UDP, the destination port address is available in the UDP header.
IP address of the RTP entity involved in the RTP session. This address can be either a unicast or a multicast address.

RTP can use relays for data transmission. A relay is an intermediate system that acts as both a sender and a receiver. Suppose that two systems are separated by a firewall that prevents them from direct communication. A relay in this context is used to handle data flow between the two systems. A relay can also be used to convert the data format from one system into a form that the other system can process easily. Relays are of two types: mixer and translator.

A mixer relay is an RTP relay that combines the data from two or more RTP entities into a single stream of data. A mixer can either retain or change the data format. The mixer provides timing information for the combined stream of data and acts as the source of timing synchronization. Mixers can be used to combine audio streams in real-time applications and can be used to service systems that may not be able to handle multiple RTP streams.
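As a rough illustration of what combining audio streams involves, the sketch below sums one frame of samples from each source into a single frame. It assumes that every source delivers time-aligned frames of equal length containing 16-bit linear PCM samples; the function name mix_frames is hypothetical, and a real mixer would also rewrite the RTP headers and timing information.

```python
from typing import List


def mix_frames(frames: List[List[int]]) -> List[int]:
    """Combine equal-length frames of 16-bit samples into one frame."""
    if not frames:
        return []
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must have the same length")
    mixed = []
    for samples in zip(*frames):
        total = sum(samples)
        # Clip to the signed 16-bit range so loud passages do not wrap around.
        mixed.append(max(-32768, min(32767, total)))
    return mixed
```

For example, mix_frames([[1, 2], [3, 4]]) gives [4, 6], and two samples of 30000 are clipped to 32767.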

The translator is a device that generates one or more RTP packets for each incoming RTP packet. The format of the outgoing packet may be different from that of the incoming packet. A translator relay can be used in video applications in which a high-quality video signal is converted to a lower-quality signal in order to service receivers that support a lower data rate. Such a relay can also be used to transfer packets between RTP entities separated by an application-level firewall. Translators can sometimes be used to forward an incoming multicast packet to multiple destinations.

RTP Packet Header

RTP contains a fixed header and an application-specific variable-length header field. Figure 18.8 shows the RTP header format. The RTP header fields are:

Version (V), a 2-bit field indicating the protocol version.
Padding (P), a 1-bit field that indicates the existence of a padding field at the end of the payload. Padding is required in applications that require the payload to be a multiple of some length.
Extension (X), a 1-bit field indicating the use of an extension header for RTP.
Contributing source count (CSC), a 4-bit field that indicates the number of contributing source identifiers.
Marker (M), a 1-bit field indicating boundaries in a stream of data traffic. For video applications, this field can be used to indicate the end of a frame.
Payload type, a 7-bit field specifying the type of RTP payload. This field identifies the encoding of the payload, including any compression applied.
Sequence number, a 16-bit field that a sender uses to identify a particular packet within a sequence of packets. This field is used to detect packet loss and for packet reordering.
Timestamp, a 32-bit field enabling the receiver to recover timing information. This field indicates the time at which the first byte of data in the payload was generated.
Synchronization source identifier, a randomly generated 32-bit field used to identify the RTP source in an RTP session.
Contributing source identifier, an optional field in the header to indicate the contributing sources for the data. The number of identifiers present is given by the contributing source count field.

Figure 18.8. Packet format for the real-time transport protocol

Overall, the fixed part of an RTP header is 12 bytes long and is prepended to the payload of each packet prepared for a multimedia application.
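The field layout listed above can be made concrete with a short sketch that builds and parses the 12-byte fixed header using Python's standard struct module. The function names and the dictionary keys are illustrative choices, and the sketch ignores the optional contributing source identifiers and any extension header.

```python
import struct

# Network byte order: two single bytes, a 16-bit sequence number,
# a 32-bit timestamp, and a 32-bit synchronization source identifier.
RTP_FIXED_HEADER = struct.Struct("!BBHII")


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False,
                    version=2, padding=False, extension=False, csrc_count=0):
    """Build the 12-byte fixed RTP header."""
    byte0 = ((version & 0x03) << 6) | (int(padding) << 5) \
        | (int(extension) << 4) | (csrc_count & 0x0F)
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_FIXED_HEADER.pack(byte0, byte1, seq & 0xFFFF,
                                 timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data):
    """Parse the first 12 bytes of an RTP packet into named fields."""
    byte0, byte1, seq, timestamp, ssrc = RTP_FIXED_HEADER.unpack(data[:12])
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": byte0 & 0x0F,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

A packet is then the header followed by the payload, for example pack_rtp_header(0, 1, 160, 0x12345678) + payload.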

18.3.2. Real-Time Control Protocol (RTCP)

The Real-Time Control Protocol (RTCP) also runs on top of UDP. RTCP performs several functions, using multicasting to provide feedback about the data quality to all session members. The session multicast members can thus get an estimate of the performance of other members in the current active session. Senders can send reports about data rates and the quality of data transmission. Receivers can send information about packet-loss rate, jitter variations, and any other problems they might encounter. Feedback from a receiver can also enable a sender to diagnose a fault. By looking at the reports from all receivers, a sender can determine whether a problem is confined to a single RTP entity or is global.

RTCP performs source identification. RTCP packets contain some information to identify the source of the control packet. The rate of RTCP packets must also be kept to less than 5 percent of the total session traffic. Thus, this protocol carries out "rate control" of RTCP packets. At the same time, all session members must be able to evaluate the performance of all other session members. As the number of active members in a session increases, the transmission rates of the control packets must be reduced. RTCP is also responsible for session control and can provide some session-control information, if necessary.
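The effect of the 5 percent limit can be sketched with a simple calculation: if all members share the control bandwidth equally, the interval between reports from any one member grows in proportion to the number of members. The function below is a simplified illustration under that assumption. Its name and parameters are hypothetical, the 5-second floor is an assumed default, and a full implementation (as specified in RFC 3550) also treats senders and receivers separately and randomizes the interval.

```python
def rtcp_interval(members, avg_rtcp_bytes, session_bw_bps,
                  rtcp_fraction=0.05, min_interval_s=5.0):
    """Return the seconds each member should wait between RTCP packets.

    members         -- number of active session members
    avg_rtcp_bytes  -- average size of an RTCP packet, in bytes
    session_bw_bps  -- total session bandwidth, in bits per second
    rtcp_fraction   -- share of session bandwidth reserved for RTCP
    min_interval_s  -- assumed lower bound on the interval
    """
    rtcp_bw_bps = rtcp_fraction * session_bw_bps
    # All members together may emit rtcp_bw_bps bits per second,
    # so each member waits long enough for everyone to take a turn.
    interval = members * avg_rtcp_bytes * 8 / rtcp_bw_bps
    return max(interval, min_interval_s)
```

With 100-byte reports on a 64 kb/s session, 10 members stay at the 5-second floor, whereas 100 members must each wait about 25 seconds between reports.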

Packet Type and Format

RTCP transmits control information by combining a number of RTCP packets in a single UDP datagram. The RTCP packet types are sender reports (SR), receiver reports (RR), source descriptor (SDES), goodbye (BYE), and application-specific types. Figure 18.9 shows some RTCP packet formats. The fields common to all packet types are as follows:

Version, a 2-bit field that indicates the current version.
Padding, a 1-bit field that indicates the existence of padding bytes at the end of the control data.
Count, a 5-bit field that indicates the number of SR or RR reports or the number of source items in the SDES packet.
Packet type, an 8-bit field that indicates the type of RTCP packet. (Five RTCP packet types were specified earlier.)
Length, a 16-bit field that represents the length of the packet in 32-bit words minus 1.
Synchronization source identifier, a field common to the SR and RR packet types; it indicates the source of the RTCP packets.
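The common fields listed above occupy the first four bytes of every RTCP packet and can be decoded as in the following sketch. The function name and dictionary keys are illustrative. The numeric packet-type codes (200 through 204) are those assigned in RFC 3550 and are not given in this section.

```python
import struct

# Packet-type codes as assigned in RFC 3550.
RTCP_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP"}


def parse_rtcp_header(data):
    """Decode the 4-byte header shared by all RTCP packet types."""
    byte0, packet_type, length = struct.unpack("!BBH", data[:4])
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "count": byte0 & 0x1F,
        "packet_type": RTCP_TYPES.get(packet_type, str(packet_type)),
        # The length field counts 32-bit words minus 1, header included.
        "length_bytes": (length + 1) * 4,
    }
```

Because several RTCP packets are combined in one UDP datagram, a receiver can use length_bytes to step from one packet to the next.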

Figure 18.9. Format of the SR packet in RTCP

Figure 18.9 also shows a typical format of a sender report. The report consists of the common header fields and a block of sender information. The sender report may also contain zero or more receiver report blocks, as shown in the figure. The fields in the sender information block are:

NTP timestamp, a 64-bit field that indicates when a sender report was sent. The sender can use this field in combination with the timestamp field returned in receiver reports to estimate the round-trip delay to receivers.
RTP timestamp, a 32-bit field that marks the same instant as the NTP timestamp but in the units of the RTP data-packet timestamps, so that a receiver can relate RTP data packets from a particular source to that clock.
Sender's packet count, a 32-bit field that represents the number of RTP data packets transmitted by the sender in the current session.
Sender's byte count, a 32-bit field that represents the number of RTP data octets transmitted by the sender in the current session.

The SR packet includes zero or more RR blocks. One receiver block is included for each sender from which the member has received data during the session. The RR block includes the following fields:

SSRC_n, a 32-bit field that identifies the source in the report block, where n is the index of the source being reported on.
Fraction lost, an 8-bit field indicating the fraction of data packets lost from source SSRC_n since the last SR or RR report was sent.
Cumulative number of packets lost, a 24-bit field that represents the total number of RTP data packets lost from the source identified by SSRC_n in the current active session.
Extended highest sequence number received, a 32-bit field whose 16 least-significant bits indicate the highest sequence number for packets received from source SSRC_n. The 16 most-significant bits indicate the number of times that the sequence number has wrapped back to zero.
Interarrival jitter, a 32-bit field used to indicate the jitter variations at the receiver for the source SSRC_n.
Last SR timestamp, a 32-bit field indicating the timestamp for the last SR packet received from the source SSRC_n.
Delay since last SR, a 32-bit field indicating the delay between the arrival time of the last SR packet from the source SSRC_n and the transmission of the current report block.
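The last two fields let a sender estimate the round-trip delay mentioned earlier: it subtracts the last SR timestamp and the delay since last SR from the time at which the report block arrives. The sketch below follows the convention of RFC 3550, which is an assumption relative to this section: all three quantities are taken as 32-bit values in units of 1/65536 second, with the last SR timestamp being the middle 32 bits of the 64-bit NTP timestamp. The function names are illustrative.

```python
def ntp_middle32(ntp_seconds, ntp_fraction):
    """Return the middle 32 bits of a 64-bit NTP timestamp.

    ntp_seconds  -- upper 32 bits (whole seconds)
    ntp_fraction -- lower 32 bits (fraction of a second)
    """
    return ((ntp_seconds & 0xFFFF) << 16) | (ntp_fraction >> 16)


def round_trip_seconds(arrival, last_sr, delay_since_last_sr):
    """Estimate the round-trip delay seen by the sender of the SR.

    All arguments are 32-bit values in units of 1/65536 second:
    arrival is the time the report block was received, and the other
    two are copied from the report block.
    """
    # Work modulo 2**32 so a wrap of the 16-bit seconds part is harmless.
    units = (arrival - last_sr - delay_since_last_sr) & 0xFFFFFFFF
    return units / 65536.0
```

For instance, with arrival 0xB7108000, last SR timestamp 0xB7052000, and delay since last SR 0x00054000, the estimate is 6.125 seconds.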

Receivers in RTCP can provide feedback about the quality of reception through a receiver report. A receiver that is also a sender during a session also sends sender reports.

18.3.3. Estimation of Jitter in Real-Time Traffic

The jitter factor is a measure of the variation in delay experienced by RTP packets in a given session. The average jitter can be estimated at the receiver. Let us define the following parameters at the receiver:

t_i	Timestamp of the RTP data packet i indicated by the source.
α_i	Arrival time of the RTP data packet i at the receiver.
d_i	Measure of difference between the interarrival time of RTP packets at the receiver and the spacing of their departure from the source. This value represents the difference in packet spacing at source and receiver.
E[i]	Estimate of average jitter until the time of packet i arrival.

The difference interval d_i is given by

Equation 18.1

\[ d_i = (\alpha_i - \alpha_{i-1}) - (t_i - t_{i-1}) \]

The estimated average jitter until the time of packet i arrival is given by

Equation 18.2

\[ E[i] = k \bigl( E[i-1] + |d_i| \bigr) \]

where k is a normalizing coefficient. The interarrival jitter value indicated in the sender report provides useful information on network conditions to the sender and the receiver. The jitter value can be used as an estimate to calculate the variation of network congestion.
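A running jitter estimate of this kind can be sketched in a few lines. The difference interval is computed exactly as d_i is defined above, from packet spacing at the receiver and at the source. For the averaging step, the sketch swaps in the estimator that RFC 3550 specifies, which moves the estimate toward |d_i| by a gain of 1/16 per packet; this gain is an assumption here and is not necessarily the normalizing coefficient k of this section. The function name is illustrative, and timestamps and arrival times are assumed to be in the same units.

```python
def interarrival_jitter(timestamps, arrivals, gain=1.0 / 16.0):
    """Return the running jitter estimate after each packet from the second on.

    timestamps -- source timestamps t_i of the packets, in arrival order
    arrivals   -- arrival times of the same packets at the receiver
    gain       -- smoothing gain (1/16 as in RFC 3550; an assumption here)
    """
    jitter = 0.0
    estimates = []
    for i in range(1, len(timestamps)):
        # Difference between spacing at the receiver and spacing at the source.
        d = (arrivals[i] - arrivals[i - 1]) - (timestamps[i] - timestamps[i - 1])
        # Move the estimate a fraction of the way toward |d|.
        jitter += gain * (abs(d) - jitter)
        estimates.append(jitter)
    return estimates
```

Packets that arrive with the same spacing they were sent with leave the estimate at zero, while a single packet delayed by 16 units raises it to 1.0.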

RTP packet-sequence numbers help a receiver put packets in order and detect lost packets. When packets arrive out of order at the receiver, the sequence number can be used to reassemble the data stream. Consider Figure 18.10. When certain packets are lost, the gaps are filled in with previously received data packets. As shown in the figure, packet D is replayed twice, since packet C is lost. This mechanism can help reduce the pips and clicks in voice signals owing to lost packets. This reconstructed data stream is sent to the receiver, with the lost packets replaced by previously received packets. Because no retransmission has to be awaited, this can significantly reduce the latency.
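Gap filling of this kind can be sketched as follows. The sketch makes one particular choice, replaying the most recent packet received before the gap, which may differ from the choice shown in the figure. It also assumes that sequence numbers do not wrap within the window being reconstructed. The function name and the dictionary representation of the received packets are hypothetical.

```python
def conceal_losses(received, first_seq, last_seq):
    """Rebuild a playout sequence, replaying earlier data over gaps.

    received  -- dict mapping sequence number to payload for packets that arrived
    first_seq -- first sequence number of the window to play out
    last_seq  -- last sequence number of the window (inclusive)
    """
    output = []
    last_payload = None
    for seq in range(first_seq, last_seq + 1):
        if seq in received:
            last_payload = received[seq]
        # A missing packet reuses the latest payload seen so far;
        # the entry is None if nothing has arrived yet.
        output.append(last_payload)
    return output
```

With packets 1, 2, and 4 received and packet 3 lost, the playout sequence repeats the payload of packet 2 in the third slot.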