4.8 Multicast support protocols

Several new protocols have been developed to improve support of real-time multimedia applications, including the Real-time Transport Protocol (RTP), the Real-time Control Protocol (RTCP), and the Real-Time Streaming Protocol (RTSP). These protocols are so called because they are designed to support applications with tight delivery constraints (e.g., bounded aggregate transit delay and packet interarrival time). Although the protocols are designed from the ground up with large multicast applications in mind, they also support unicast traffic. Figure 4.15 illustrates where this new breed of protocols sits within the IP model.

Figure 4.15: IP multicast and real-time application support protocols in context. For a description of the Streaming Protocol (ST2) and associated higher-level protocols refer to [30].

4.8.1 Real-time Transport Protocol (RTP)

RTP is an IETF protocol that provides support for real-time unicast and multicast network applications such as interactive audio and video, although many other applications, such as interactive distributed simulation and control and measurement applications, are also likely candidates. Commercial implementations of RTP and applications that use RTP are currently available for a number of platforms. RTP is also used by the VAT application on the MBone (as described in section 4.7.2). RTP is summarized as follows:

RTP provides real-time, end-to-end delivery services using existing transport protocols. RTP typically runs on top of UDP (although not mandated), utilizing its multiplexing and checksum services and supplementing these with its own more specialized functions geared toward real-time applications.
RTP supports simultaneous data transfer to multiple destinations if required (assuming multicasting is supported by the underlying transport). Unlike legacy protocols, RTP embodies the concept of time as part of its delivery mechanism and is now the core protocol for real-time transport on both IP and hybrid MPOA networks. Although RTP traffic is time constrained, some variation in packet interarrival times is to be expected, because most of the supported networks are packet switched.
The RTP header includes fields for sequence numbering (16 bit) and time stamping (32 bit). These fields are set at the source, enabling the receiver to reconstruct the application data flow accordingly. The timing information is necessary to synchronize and display audio and video data, and the sequence numbers allow the receiver to determine whether packets have been lost or have arrived out of order. For some applications (e.g., a video or audio feed), limited packet loss is not necessarily a problem as long as the real-time characteristics of the source data can be reconstructed. The source of a stream of RTP packets is called the synchronization source, or SSRC.
The RTP header specifies the payload type, thus allowing multiple data and compression types. For example, for an audio application the RTP header in each packet indicates what type of audio encoding is employed (e.g., PCM, ADPCM, or LPC). By including encoding details in each packet, senders have the potential to change the encoding during a conference (e.g., to accommodate a new participant that is connected through a low-bandwidth link or to react to indications of network congestion).
RTP is a best-effort protocol; it does not guarantee timely delivery nor does it provide any quality-of-service guarantees. Packets are not guaranteed to arrive, and when they do they are not guaranteed to be in sequence. QoS facilities are assumed to be provided by protocols such as RSVP.
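
The header fields described above can be illustrated with a minimal Python sketch. It assumes the 12-byte fixed header layout of the RTP specification [31] (version, padding, extension, CSRC count, marker, payload type, sequence number, timestamp, SSRC) and ignores any CSRC list or header extension. The names RtpHeader, parse_rtp_header, and build_rtp_header are hypothetical helpers introduced only for this illustration.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int     # identifies the media encoding
    sequence_number: int  # 16 bits, used to detect loss and reordering
    timestamp: int        # 32 bits, used to reconstruct timing
    ssrc: int             # 32-bit synchronization source identifier


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header (CSRC list and extensions ignored)."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least a 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


def build_rtp_header(payload_type: int, seq: int, timestamp: int,
                     ssrc: int, marker: bool = False) -> bytes:
    """Encode a fixed header with version 2, no padding, extension, or CSRCs."""
    b0 = 2 << 6
    b1 = (0x80 if marker else 0) | (payload_type & 0x7F)
    return struct.pack("!BBHII", b0, b1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
```

A receiver built along these lines would compare successive sequence_number values to detect loss or reordering and use timestamp to schedule playout. Because payload_type travels in every packet, a change of encoding in mid-session is visible to the receiver immediately.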

RTP has a highly flexible protocol architecture designed for scalability, and it is anticipated that RTP services will be integrated within the application framework rather than implemented as a separate layer. Unlike conventional protocols, in which additional functions might be accommodated by making the protocol more general, or by adding an option mechanism that would require parsing, RTP is intended to be tailored through modifications and/or additions to the headers as needed. Examples are given in the IETF RTP specification [31], and a companion specification [32] defines a set of payload type codes and their mapping to payload formats (e.g., media encodings).

An RTP session is defined by a particular pair of destination transport addresses (actually a network address plus a port pair for RTP and RTCP). By convention RTP data should be carried on an even UDP port number, and its associated RTCP packets should be carried on the next higher odd port number. Applications may use any such UDP port pair (e.g., the port pair may be allocated randomly by a session management program). A fixed port pair cannot be allocated, because multiple applications are likely to run on the same host, and there are some operating systems that do not allow multiple processes to use the same UDP port with different multicast addresses. However, port numbers 5004 and 5005 have been registered for use as the default pair [32]. The destination transport address pair may be common for all participants (e.g., for IP multicast) or may be different for each (e.g., individual unicast network addresses plus a common port pair). In a multimedia session, each media type (audio, video, etc.) is carried in a separate RTP session with its own RTCP packets. Multiple RTP sessions are distinguished by different port number pairs and/or different multicast addresses.
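
The even/odd port convention can be sketched as follows. This is a minimal illustration, not a definitive implementation: rtp_rtcp_ports is a hypothetical helper, and rounding an odd base port down to the next lower even number is an assumption about how an application might apply the convention when it is handed a single port number.

```python
from typing import Tuple


def rtp_rtcp_ports(base_port: int) -> Tuple[int, int]:
    """Return the (RTP, RTCP) UDP port pair derived from one base port.

    RTP uses the even port; RTCP uses the next higher odd port.
    Assumption: an odd base port is rounded down to the even port below it.
    """
    if not 2 <= base_port <= 65535:
        raise ValueError("base port must be in the range 2..65535")
    rtp_port = base_port - (base_port % 2)  # force an even RTP port
    return rtp_port, rtp_port + 1


# The registered default pair mentioned in the text
default_pair = rtp_rtcp_ports(5004)
```

In a multimedia session an application would call such a helper once per media type, so that audio and video each receive their own RTP session and RTCP port.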

Consider, for example, an audio conference held by a working group. Since members of the working group join and leave during the conference, it is useful to know who is participating at any moment and how well they are receiving the audio data. For that purpose, each instance of the audio application in the conference periodically multicasts a reception report plus the name of its user on the RTCP (control) port. The reception report indicates how well the current speaker is being received and may be used to control adaptive encodings. In addition to the user name, other identifying information may also be included, subject to control bandwidth limits. A site sends the RTCP BYE packet [31] when it leaves the conference.

Mixers and translators

RTP takes into account the likelihood of wide variations in service characteristics across large public networks. Two special intermediate devices (relays) are defined as follows:

Mixers: Instead of forcing all users to suffer a reduced-quality application feed appropriate for the lowest common denominator of the bandwidth available, an RTP-level relay called a mixer may be placed near the low-bandwidth area. This mixer resynchronizes incoming packets into a single packet stream and reconstructs the appropriate packet interarrival spacing for users with different bandwidths. These packets might be unicast to a single recipient or multicast on a different address to multiple recipients.
Translators: Some receivers may have ample bandwidth but may not be directly reachable via IP multicast. For example, they might be behind a firewall that blocks IP multicast packets. For these sites an RTP-level relay called a translator may be used. Two translators are installed, one on either side of the firewall, with the outside one tunneling all multicast packets through a secure connection to the translator inside the firewall. The translator inside the firewall transmits these packets as multicasts to a multicast group restricted to the site's internal network.

Mixers and translators may be designed for a variety of purposes. An example is a video mixer, which scales the images of individuals in separate video streams and composites them into one video stream to simulate a group scene. Other examples of translation include the connection of a group of hosts speaking only IP-UDP to a group of hosts that understand only ST-II, or the packet-by-packet encoding translation of video streams from individual sources without resynchronization or mixing. Details of the operation of mixers and translators are given in [31]. The interested reader should refer to [31–33] for further details on RTP.

4.8.2 Real-time Control Protocol (RTCP)

RTCP is the control protocol defined by the IETF to work in cooperation with RTP to provide periodic state information via control packets sent to all participants in a session, using the same packet distribution mechanism as for data. The underlying protocol must, therefore, be capable of multiplexing both data and control packets (typically this is achieved by using separate port numbers in UDP). Feedback of information to the application server can be used both to control performance and for diagnostic purposes. RTCP performs the following functions:

State information feedback: The main function of RTCP is to provide information to an application regarding the quality of data distribution, in particular the flow and congestion conditions being experienced by RTP. Each RTCP packet contains transmitter and/or receiver reports that include useful statistics (number of packets transmitted, number of packets lost, interarrival jitter, etc.).
Identify RTP sources: Each SSRC (i.e., RTP source; see section 4.8.1) is identified by a transport-level identifier within RTCP called the canonical name (CNAME). Since an SSRC may change during an RTP session, perhaps because an application is restarted, the CNAME is used to keep track of all participants in the session. Receivers use the CNAME to identify related data streams within a set of related RTP sessions (e.g., to synchronize audio and video streams).
Control RTCP transmission rates: In order to constrain control traffic overheads on networks, and to allow RTP to scale, control traffic is limited to at most 5 percent of the overall session traffic. This limit is enforced by calculating the rate at which RTCP packets are transmitted as a function of the number of participants; since every participant sends control packets to everyone else, each one can independently count the participants and work out its own rate.
Convey session control information: RTCP can optionally be used as a method for conveying a minimal amount of information to all session participants.
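
The rate control function in the list above can be sketched in a few lines of Python. This is a deliberately simplified model: it applies the 5 percent limit and scales the reporting interval with the number of participants, but it omits the separate treatment of senders and receivers and the randomization of intervals that the RTP specification [31] describes. The function name rtcp_interval and its parameters are hypothetical, and the 5-second floor is an assumed default.

```python
def rtcp_interval(session_bw_bps: float, members: int,
                  avg_rtcp_packet_bytes: float,
                  rtcp_fraction: float = 0.05,
                  min_interval_s: float = 5.0) -> float:
    """Seconds each participant should wait between its RTCP packets.

    Simplified sketch: the whole RTCP share is divided evenly among all
    participants, so the interval grows linearly with the group size.
    """
    if members < 1 or session_bw_bps <= 0 or avg_rtcp_packet_bytes <= 0:
        raise ValueError("members, bandwidth, and packet size must be positive")
    # Share of the session bandwidth reserved for control traffic, in bytes/s
    rtcp_bw_bytes_per_s = session_bw_bps * rtcp_fraction / 8.0
    # Time needed for every member to send one average-sized RTCP packet
    interval = members * avg_rtcp_packet_bytes / rtcp_bw_bytes_per_s
    # Assumed floor so that small groups do not report too frequently
    return max(interval, min_interval_s)
```

With a 64-kbps session and 100-byte control packets, a group of 10 participants stays at the assumed floor, whereas a group of 1,000 participants would each report only about once every 250 seconds, which keeps the aggregate control traffic within the 5 percent budget.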

Experiments with IP multicasting have demonstrated the value of user feedback from RTCP to diagnose distribution faults. Reception quality feedback is useful for transmitters, receivers, and third-party monitors. For example, a transmitter can regulate its transmission rate based on such feedback; a receiver can determine whether problems are local, regional, or global; and network managers can use RTCP statistics to evaluate the performance of their networks for multicast distribution.
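
One of the reception statistics mentioned earlier, interarrival jitter, can be estimated with a simple running filter. The sketch below follows the smoothing rule given in the RTP specification [31], in which each new sample moves the estimate by one-sixteenth of the difference. The function name update_jitter and the assumption that arrival times are already expressed in RTP timestamp units are illustrative choices, not part of the original text.

```python
from typing import Optional, Tuple


def update_jitter(jitter: float, prev_transit: Optional[int],
                  arrival: int, rtp_timestamp: int) -> Tuple[float, int]:
    """Update the running interarrival jitter estimate for one packet.

    arrival and rtp_timestamp are assumed to be in the same clock units.
    Returns the new jitter estimate and the transit value to pass back
    in as prev_transit for the next packet.
    """
    transit = arrival - rtp_timestamp
    if prev_transit is None:
        # First packet: nothing to compare against yet
        return jitter, transit
    # Change in relative transit time between consecutive packets
    d = abs(transit - prev_transit)
    # Exponential smoothing with gain 1/16
    jitter += (d - jitter) / 16.0
    return jitter, transit


# Packets sent 160 units apart; the third arrives 16 units late
jitter, transit = 0.0, None
for arrival, ts in [(1000, 0), (1160, 160), (1336, 320)]:
    jitter, transit = update_jitter(jitter, transit, arrival, ts)
```

A receiver would place the current estimate in its reception reports, allowing a transmitter or third-party monitor to observe rising jitter as an early indication of congestion.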

4.8.3 Real-Time Streaming Protocol (RTSP)

Real-Time Streaming Protocol (RTSP) is designed to provide a robust protocol for streaming multimedia in one-to-many applications over unicast and multicast and to support interoperability between clients and servers from different vendors. RTSP can be used with RSVP to set up and manage reserved-bandwidth streaming sessions. RTSP is currently a draft IETF specification, although products using RTSP are available today.

Traditionally, high-volume multimedia data are delivered in maximum-sized packets to optimize bandwidth (reducing the protocol-to-data ratio to maximize efficiency). From the user perspective this often means waiting for large amounts of data to download before a media file can be accessed. The concept behind streaming is to break up data into packets optimized in length for the available bandwidth between the client and server. Once a client has buffered enough incoming packets, it can simultaneously process these packets while receiving and decompressing other incoming packets. This enables the user to start using multimedia applications almost immediately, without having to wait for the whole media file to be downloaded. Typical data sources for streaming applications include both live data feeds and stored clips.

RTSP is designed to work with RTP to both control and deliver real-time content. Although RTSP can be used with unicast, in the near future its use may help smooth the transition for environments moving from unicast to IP multicasting with RTP. RTSP is considered more of a framework than a protocol. It is intended to control multiple data delivery sessions and to provide a means for choosing delivery channels (such as UDP, TCP, and IP multicast) and delivery mechanisms based on RTP. Control mechanisms such as session establishment and licensing issues are being addressed. The interested reader should refer to [34] for further details.