3.4 Data Delivery | ITV Handbook: Technologies and Standards

3.4.1 MPEG-based Methods

3.4.1.1 MPEG-2 Architecture

The MPEG-2 ISO/IEC 13818-1 Standard for encoding/decoding of video streams contains an early version of the DSM-CC ISO/IEC 13818-6 standard developed for the delivery of multimedia broadband services. DSM-CC covers a number of distinct protocol areas:

Network session and resource control
Configuration of a client
Downloading to a client
VCR-like control of the video stream
Generic interactive application services
Generic broadcast application services ”data and user- user object carousel
Switched digital broadcast channel change

To deliver data using MPEG-2, there is a need to introduce various encapsulations , which are essentially methods of packaging data in MPEG packets. This includes DSM-CC data and object carousels, asynchronous, synchronous, and synchronized streaming protocols, and data piping protocols.

Intuitively, a data carousel is a chunk of files retransmitted periodically to enable download in presence of random tuning. The DSM-CC data carousel is a collection of MPEG-2 tables carrying modules, each of which is designed for delivery of a single file. The object carousel is built on top of the data carousel, and carries multiple objects within modules, where an object could be a file or a directory.

The delivery of data may be asynchronous, synchronous, or synchronized. With asynchronous delivery, there is no requirement that the processing and presentation of the data is coordinated with that of the video. Asynchronous data elementary streams (i.e., packets) do not carry any MPEG-2 system time stamps, and the delivery of the data is not governed by a delivery clock.

With synchronous data delivery, there is a requirement that data and video are coordinated. Synchronous elementary data streams use two types of time stamps: Program Clock Reference (PCR) and Presentation time stamp (PTS). PCRs are used to enable receivers to reconstruct the clock used by the emitter to construct the transport stream. PTSs located in the header bytes of the MPEG-2 PES packets, specify the time instants at which the delivery of the first byte of the PES payload should start. The intention of the standard is to provide a metered delivery mechanism.

Synchronized data delivery differs from synchronous data delivery in that there is a requirement that the data has been fully decoded and reconstructed at the time specified by the PTS, namely when the reconstructed clock strikes a value that equals that of the PTS specified. Therefore, synchronized delivery provides the tightest integration between data and video. Support of synchronized delivery is complicated, as it requires receivers to start decoding sufficiently early to complete decoding before the PTS. Due to the difficulties of this implementation, the ATSC Trigger standard introduced a method for decoupling a short synchronization trigger, carried in a single packet, from the data it refers to, carried in a large number of packets.

3.4.1.2 MPEG-4 Architecture

The MPEG-4 architecture defines the layers of TransMux, DMIF, Sync, and elementary stream, as depicted in Figure 3.15. The first multiplexing layer is managed according to the Delivery Multimedia Integration Framework (DMIF) specification, part 6 of the MPEG-4 standard. This multiplex may be embodied by the MPEG-defined FlexMux tool, which allows grouping of elementary streams with a low multiplexing overhead. Multiplexing at this layer may be used, for example, to group elementary streams with similar Quality of Service (QoS) requirements, reduce the number of network connections or the end to end delay.

Figure 3.15. The layered MPEG-4 architecture.

The Transport Multiplexing (TransMux) layer in Figure 3.15 models the layer that offers transport services matching the requested QoS. Only the interface to this layer is specified by MPEG-4, whereas the concrete mapping of the data packets and control signaling must be done in collaboration with the bodies that have jurisdiction over the respective transport protocol. Any suitable existing transport protocol stack, such as MPEG-2 Transport Stream or UDP/IP, over a suitable link layer may become a specific TransMux instance. The choice is left to the end user or service provider, and this allows MPEG-4 to be used in a wide variety of operation environments.

3.4.2 IP-based Transport

Today, the most popular data delivery methods are based on IP version 4 [IPv4]. Figure 3.16 depicts the relationship between the various iTV related protocols built on top of the IP protocol. The Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP) are built directly on top of IP. The FTP, HTTP, Secure-FTP (S-FTP), and RTSP are all built on top of TCP. The Secure HTTP (HTTPS) [HTTPS] is built on top of the Transport Layer Security (TLS) protocol, which is built on top of TCP. The RTP, Trivial-FTP (TFTP), and Dynamic Host Configuration Protocol (DHCP) are built on top of UDP.

Figure 3.16. A subset of the IP protocol stack that is important for iTV.

3.4.2.1 IP

The IP is concerned with two basic issues: addressing and fragmentation [IPv4].

Addressing : The Internet modules use the addresses carried in the header to transmit datagrams to their destinations. The selection of a transmission path is called routing.
Fragmentation : The Internet modules use fields in the Internet header to fragment and reassemble Internet datagrams when necessary for transmission through small packet networks.

IP serves as a method of communication between hosts , each of which must have modules that know how to receive, send, and process IP packets, also known as Internet datagrams. The packets are regarded as unrelated and independent of each other, and there are no logical or virtual connections defined. The modules share common rules for interpreting address fields and fragmenting and assembling Internet datagrams. In addition, these modules share common procedures for making routing decisions.

Four key mechanisms are employed:

Type of service : This is used to indicate the quality of the service desired. The type of service indication is used by gateways to select the actual transmission parameters for a particular network, the network to be used for the next hop, or the next gateway when routing a datagram.
Time to live : The TTL is an indication of an upper bound on the lifetime of an Internet datagram. It is set by the sender of the datagram and reduced at the points along the route where it is processed . If the time to live reaches zero before the Internet datagram reaches its destination, the datagram is destroyed . The time to live can be thought of as a self-destruct time limit.
Options : These provide for control functions needed or useful in some situations but unnecessary for the most common communications. The options include provisions for time stamps, security, and special routing.
Header checksum : It provides verification that the information used in processing datagram has been transmitted correctly. The data may contain errors. If the header checksum fails, the Internet datagram is discarded at once by the entity that detects the error.

IP does not provide a reliable communication facility. There are no acknowledgments either end-to-end or hop-by-hop . There is no error control for data, only a header checksum. There are no retransmissions. There is no flow control. Errors detected may be reported via the Internet Control Message Protocol (ICMP), which is implemented in the IP module [ICMP].

3.4.2.2 TCP

The Transmission Control Protocol (TCP) is a connection-oriented, end-to-end reliable protocol designed for use as a highly reliable host-to-host protocol between hosts in packet-switched computer communication networks [TCP]. Very few assumptions are made about the reliability of the communication protocols below the TCP layer, allowing it to operate above a wide spectrum of communication systems. To provide reliability of service on top of a less reliable IP, TCP introduces facilities in the following areas:

Basic data transfer : The TCP is able to transfer a continuous stream of data, between two hosts, in both directions.
Reliability : The TCP can recover from a situation in which data is damaged, lost, duplicated , or delivered out of order. This is achieved by assigning sequence numbers and requiring a positive Acknowledgment (ACK) from the receiving host. If the ACK is not received within a time-out interval, the data is retransmitted. At the receiver, the sequence numbers are used to correctly order segments that may be received out of order and to eliminate duplicates. Damage is handled by adding a checksum to each segment transmitted, checking it at the receiver, and discarding damaged segments.
Flow control : The TCP provides a means for the receiver to govern the amount of data sent by the sender. This is achieved by returning, with every ACK, a window indicating a range of acceptable sequence numbers beyond the last segment successfully received. This window also indicates the maximum number of segments that the sender may transmit before waiting for an ACK.
Multiplexing : The TCP provides a set of addresses or ports within each host to allow for multiple simultaneous users of a TCP communication facility. These addresses, concatenated with the IP network and host addresses, uniquely identify each connection. It is useful to publish fixed port numbers to allow public access. Discovering unpublished ports requires some search mechanisms.
Connections : The TCP requires the maintenance of status information for each data stream. This status information, which includes sockets, sequence numbers, and window sizes, is called a connection. When two hosts wish to communicate, they must first establish a TCP connection by initializing the status information on each side. When their communication is complete, the connection is terminated or closed to free the resources for other uses. Since connections must be established between unreliable hosts and over the unreliable IP, a handshake mechanism with clock-based sequence numbers is used to avoid erroneous initialization of connections.

3.4.2.3 UDP

This User Datagram Protocol (UDP) was finalized in 1980 [UDP]. It was developed to standardize a datagram mode of packet-switched communication in the environment of an interconnected set of computer networks. UDP assumes that the IP is used as the underlying protocol. The protocol does not guarantee delivery and duplicate protection; applications requiring ordered reliable delivery of streams of data should use the TCP.

The UDP implementation must be able to interface with the IP implementation to determine the source and destination IP addresses and the protocol field from the header. One possible UDP/IP interface could return the whole datagram including the entire header in response to a receive operation. Such an interface could also allow the UDP to pass a full datagram, complete with header, to the IP to send. The IP would verify certain fields for consistency and compute the header checksum.

3.4.2.4 DHCP

The DHCP provides a communications startup framework for passing configuration information to hosts on a TCP/IP network [DHCP]. DHCP is based on the Bootstrap Protocol (BOOTP), adding the capability of automatic allocation of reusable network addresses and additional configuration options. DHCP consists of two components : a protocol for delivering host-specific configuration parameters from a DHCP server to a host and a mechanism for allocation of network addresses to hosts.

DHCP is built on a client-server model, where designated DHCP server hosts allocate network addresses and deliver configuration parameters to dynamically configured hosts. In the context of DHCP, the term server (typically a headend) refers to a host providing initialization parameters through DHCP, and the term client (typically a set-top box) refers to a host requesting initialization parameters from a DHCP server.

DHCP supports three mechanisms for IP address allocation. In automatic allocation , DHCP assigns a permanent IP address to a client. In dynamic allocation , DHCP assigns an IP address to a client for a limited period of time (or until the client explicitly relinquishes the address). In manual allocation , a client's IP address is assigned by the network administrator, and DHCP is used simply to convey the assigned address to the client. A particular network will use one or more of these mechanisms, depending on the policies of the network administrator.

Dynamic allocation is the only one of the three mechanisms that allows automatic reuse of an address that is no longer needed by the client to which it was assigned. Thus, dynamic allocation is particularly useful for assigning an address to a client that will be connected to the network only temporarily or for sharing a limited pool of IP addresses among a group of clients that do not need permanent IP addresses. It is therefore best suited for iTV applications in which a spontaneous temporary connection may be established, e.g., for the purpose of performing a TV-commerce transaction. Dynamic allocation may also be a good choice for assigning an IP address to a new client being permanently connected to a network where IP addresses are sufficiently scarce that it is important to reclaim them when old clients are retired .

3.4.2.5 HTTP

The HTTP is a generic, stateless application-level protocol [HTTP1.0]. HTTP has been in use by the World-Wide Web global information initiative since 1990. HTTP/1.0 did not sufficiently take into consideration the effects of hierarchical proxies, caching, the need for persistent connections, or virtual hosts. The HTTP/1.1 protocol was developed to address those issues [HTTP1.1].

The HTTP protocol is a request-response protocol. A client sends a request to the server in the form of a request method, a Uniform Resource Identifier (URI) [URI], and protocol version, followed by a message containing request modifiers, client information, and possible body content over a connection with a server. The server responds with a status line, including the message's protocol version and a success or error code, followed by a message containing server information, entity meta information, and possible entity-body content.

HTTP/1.1 introduced the URI to unify the Uniform Resource Location (URL) and Name (URN) [URI]. Messages are passed in a format similar to that used by Internet mail as defined by Multipurpose Internet Mail Extensions (MIME) [MIME].

HTTP messages are either requests from a client to a server or responses from a server to a client. Both types of messages consist of a start line (which is either a request-line or status-line), zero or more header fields (also known as headers), an empty line indicating the end of the header fields, and possibly a message body.

In many cases, HTTP communication is initiated by a browser and consists of a request to be applied to a resource on some origin server. In the simplest case, this may be accomplished via a single connection between the browser and the origin server.

A more complicated situation occurs when one or more intermediaries are present in the request-response chain. There are 3 common forms of intermediary: proxy, gateway, and tunnel. A proxy is a forwarding agent, receiving requests for a URI in its absolute form, rewriting all or part of the message, and forwarding the reformatted request toward the server identified by the URI. A gateway is a receiving agent, acting as a layer above some other server(s) and, if necessary, translating requests to the underlying server's protocol. A tunnel acts as a relay point between two connections without changing the messages; tunnels are used when the communication needs to pass through an intermediary (e.g., a firewall) even when the intermediary cannot understand the contents of the messages.

A request or response message that travels the whole chain will pass through multiple separate connections. Some HTTP communication options may apply only to the connection with the nearest , nontunnel neighbor, only to the end points of the chain, or to all connections along the chain.

Any party to the communication that is not acting as a tunnel may employ an internal cache for handling requests. The effect of a cache is that the request-response chain is shortened if one of the participants along the chain has a cached response applicable to that request. There are a wide variety of architectures and configurations of caches and proxies currently being experimented with or deployed across the Internet. These systems include national hierarchies of proxy caches to save transoceanic bandwidth, systems that broadcast or multicast cache entries, and organizations that distribute subsets of cached data via CD-ROM.

In HTTP/1.0, most implementations used a new connection for each request/response exchange. In HTTP/1.1, a connection may be used for multiple exchanges, although connections may be closed spontaneously for a variety of reasons.

Whereas HTTP communication usually takes place over TCP/IP connections, HTTP can be implemented on top of any other protocol that features a reliable transport and provides delivery guarantees . The mapping of the HTTP/1.1 request and response structures onto the transport data units of the protocol in question is outside the scope of the HTTP protocol specification.

3.4.2.6 FTP

The FTP has had a long evolution over the years [FTP]. Its first widespread use was as a protocol for file transfer between hosts on the ARPANET, allowing the convenient use of remote file storage capabilities. By July 1973, considerable changes from the last versions of FTP were made, but the general structure remained the same. By the time the new official specification reflected these changes, many implementations already deployed based on the older specification were not updated. In 1975, RFC 686 entitled Leaving Well Enough Alone discussed the differences between all of the early and later versions of FTP. Motivated by the transition from the Network Control Protocol (NCP) to the TCP as the underlying protocol, RFC was developed as the specification of FTP for use on top of TCP.

According to the final version of the FTP model, defined in RFC 959, the protocol interpreter initiates the control connection using the Telnet protocol. Subsequently, standard FTP commands are transmitted to the server process via the Telnet control connection. Standard replies are sent from the server to the client over the control connection in response to the commands.

The FTP commands specify the parameters for the data connection (data port, transfer mode, representation type, and structure) and the nature of file system operation (store, retrieve, append, delete, etc.). The client should listen on the specified data port, and the server initiate the data connection and data transfer in accordance with the specified parameters.

Files are transferred only via the data connection. The control connection is used for the transfer of commands, which describe the functions to be performed, and the replies to these commands. The mechanics of transferring data consist of setting up the data connection to the appropriate ports and choosing the parameters for transfer; both the client and the server have a default data port.

There are three modes: one that formats the data and allows for restart procedures; one that also compresses the data for efficient transfer; and the pass-through mode which moves the data with little or no processing. Two processes are used, one at the client and one at the server. One of the two must be passive and the other is active. The passive data transfer process listens on the data port prior to sending a transfer request command. The FTP request command determines the direction of the data transfer. The server, on receiving the transfer request, initiates the data connection to the port. When the connection is established, the data transfer begins, and the server sends a confirming reply to the client.

FTP requires that the control connections be open while data transfer is in progress. It is the responsibility of the user to request the closing of the control connections when finished using the FTP service, and it is the server that takes the action. The server may abort data transfer if the control connections are closed without command.

3.4.2.7 TFTP

The Trivial FTP (TFTP) is the simplest file transfer protocol in widespread use [TFTP]. As opposed to FTP, it has been designed to be implemented on top of the Internet UDP. This protocol is very restrictive , to simplify implementation, and therefore, it lacks most of the features of regular FTP. It only enables reading from and writing to a remote server. It cannot list directories, and currently has no provisions for user authentication.

This protocol is used by all modern set-top boxes to seamlessly download software patches and upgrades. Commonly, three modes of transfer are specified by the TFTP standard:

Netascii , as defined in the USA Standard Code for Information Interchange with the modifications specified in the Telnet Protocol Specification
Octet , which is essentially the binary mode raw 8 bit bytes
Mail mode , which is obsolete and should not be implemented or used

Any transfer begins with a request to read or write a file, which also serves to request a connection. If the server grants the request, the connection is opened and the file is sent in fixed length blocks of 512 bytes. Each data packet contains one block of data, and must be acknowledged by an ACK packet before the next packet can be sent. A data packet of less than 512 bytes signals termination of a transfer. If a packet gets lost in the network, the intended recipient will time-out and may retransmit its last packet (which may be data or an ACK), causing the sender of the lost packet to retransmit that lost packet. The sender has to keep just one packet on hand for retransmission, because the lock step ACK guarantees that all older packets have been received. Both machines involved in a transfer are considered senders and receivers. One sends data and receives ACKs, the other sends ACKs and receives data.

Most errors cause termination of the connection. An error is signaled by sending an error packet. This packet is not acknowledged nor retransmitted (i.e., a TFTP server or user may terminate after sending an error message), so the other end of the connection may not get it. Therefore time-outs are used to detect such a termination when the error packet has been lost. Errors are caused by three types of events: not being able to satisfy the request (e.g., file not found, access violation, or no such user), receiving a packet that cannot be explained by a delay or duplication in the network (e.g., an incorrectly formed packet), and losing access to a necessary resource (e.g., disk full or access denied during a transfer). TFTP recognizes only one error condition that does not cause termination, the source port of a received packet being incorrect. In this case, an error packet is sent to the originating host.

3.4.2.8 RTP

The RTP, defined in RFC 1889, was developed by the Audio-Video transport working group and finalized in 1996 [RTP]. It provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video, or simulation data, over multicast or unicast network services. RTP does not address resource reservation and does not guarantee QoS for real-time services. Although RTP is designed to be independent of the underlying transport layer, it is usually carried over UDP.

RTP was originally designed with three application scenarios in mind:

Simple multicast audio conference : The audio conferencing application used by each conference participant sends audio data in small millisecond chunks. Each chunk of audio data is preceded by an RTP header, and both are contained in a UDP packet. The Internet, like other packet networks, occasionally loses and reorders packets and delays them by variable amounts of time. To cope with these impairments, the RTP header contains timing information and a sequence number that allow the receivers to reconstruct the timing produced by the source, so that in this example, chunks of audio are contiguously played out the speaker. This timing reconstruction is performed separately for each source of RTP packets in the conference. The sequence number can also be used by the receiver to estimate how many packets are being lost.
Audio and video conference : When both audio and video media are used in a conference, they are transmitted as separate RTP sessions using two different UDP port pairs or multicast addresses. There is no direct coupling at the RTP level between the audio and video sessions, except that a user participating in both sessions should use the same distinguished (canonical) name for both so that the sessions can be associated by the application.
Mixers and translators : Consider a situation in which participants in one area are connected through a low-speed link to the majority of the conference participants who enjoy high-speed network access. Instead of forcing everyone to use a lower-bandwidth, reduced-quality audio encoding, an RTP-level relay called a mixer may be placed near the low bandwidth area. This mixer resynchronizes incoming audio packets to reconstruct the constant millisecond spacing generated by the sender, mixes these reconstructed audio streams into a single stream, translates the audio encoding to a lower bandwidth and forwards the lower bandwidth packet stream across the low-speed link. These packets might be unicast to a single recipient or multicast on a different address to multiple recipients. The RTP header includes a means for mixers to identify the sources that contributed to a mixed packet so that correct talker indication can be provided at the receivers.

RTP is designed to allow an application to scale automatically over session sizes ranging from a few participants to thousands. For example, in an audio conference the data traffic is inherently self-limiting because only one or two people will speak at a time, so with multicast distribution the data rate on any given link remains relatively constant, independent of the number of participants. However, the control traffic is not self-limiting. If the reception reports from each participant were sent at a constant rate, the control traffic would grow linearly with the number of participants. Therefore, the rate must be scaled down.

For each session, it is assumed that the data traffic is subject to an aggregate limit called the session bandwidth to be divided among the participants. This bandwidth might be reserved and the limit enforced by the network, or it might just be a reasonable share. The session bandwidth may be chosen based on some cost or a priori knowledge of the available network bandwidth for the session. It is somewhat independent of the media encoding, but the encoding choice may be limited by the session bandwidth. The session bandwidth parameter is expected to be supplied by a session management application when it invokes a media application, but media applications may also set a default based on the single-sender data bandwidth for the encoding selected for the session. The application may also enforce bandwidth limits based on multicast scope rules or other criteria.

Bandwidth calculations for control and data traffic include lower-layer transport and network protocols (e.g., UDP, IP) because that is what the resource reservation system would need to know. The control traffic should be limited to a small and known fraction of the session bandwidth: small so that the primary function of the transport protocol to carry data is not impaired, and known so that the control traffic can be included in the bandwidth specification given to a resource reservation protocol and so that each participant can independently calculate its share. It is suggested that the fraction of the session bandwidth allocated to control data traffic be fixed at 5%. Although the value of this and other constants in the interval calculation is not critical, it is critical that all participants in the session use the same values so the same interval will be calculated. Therefore, these constants should be fixed for a particular profile.

RTP receivers provide reception quality feedback using report packets that may take one of two forms depending upon whether or not the receiver is also a sender. The only difference between the sender report and receiver report forms, besides the packet type code, is that the sender report includes a 20-byte sender information section for use by active senders. The sender report is issued if a site has sent any data packets during the interval since issuing the last report or the previous one, otherwise the receiver report is issued.

Cumulative packet counts are used in both the sender information and receiver report blocks so that differences can be calculated between any two reports to make measurements over both short and long time periods, and to provide resilience against the loss of a report. The difference between the last two reports received can be used to estimate the recent quality of the distribution. Time stamps are included so that rates can be calculated from these differences over the interval between two reports. The time stamp used is independent of the clock rate for the data encoding. It is therefore possible to implement encoding- and profile-independent quality monitors .

As an example, consider the packet loss rate over the interval between two reception reports. The difference in the cumulative number of packets lost gives the number lost during that interval. The difference in the extended last sequence numbers received gives the number of packets expected during the interval. The ratio of these two is the packet loss fraction over the interval. This ratio should equal the fraction lost field if the two reports are consecutive, but otherwise not. The loss rate per second can be obtained by dividing the loss fraction by the difference in the time stamps, expressed in seconds. The number of packets received is the number of packets expected minus the number lost. The number of packets expected can also be used to judge the statistical validity of any loss estimates. For example, a loss of 1 out of 5 packets has a lower significance than 200 out of 1000.

RTP could be used, for example, to deliver MPEG-4 video over the Internet (see Figure 3.17). Video is compressed using MPEG-4 encoders, encapsulated onto RTP, which in turn is carried over an IP network. The weak link of this protocol stack is the binding of MPEG-4 onto RTP, due to issues related to timing and binding of multiple elementary streams, each delivered with a possibly different bit rate.

Figure 3.17. Delivering video through MPEG-4 compression over RTP and the Internet.

3.4.2.9 RTSP

The RTSP, defined in RFC 2326, is an application-level protocol for control over the delivery of data with real-time properties [RTSP]. RTSP provides an extensible framework to enable controlled, on-demand delivery of real-time data, such as audio and video. Sources of data can include both live data feeds and stored clips. This protocol is intended to control multiple data delivery sessions; provide a means for choosing delivery channels such as UDP, multicast UDP, and TCP; and provide a means for choosing delivery mechanisms based on RTP. The set of streams to be controlled is defined by a presentation description. This memorandum does not define a format for a presentation description.

There is no notion of an RTSP connection; instead, a server maintains a session labeled by an identifier. An RTSP session is in no way tied to a transport-level connection such as a TCP connection. During an RTSP session, an RTSP client can open and close many reliable transport connections to the server to issue RTSP requests. Alternatively, it may use a connectionless transport protocol such as UDP.

The streams controlled by RTSP may use RTP, but the operation of RTSP does not depend on the transport mechanism used to carry continuous media.

The protocol is intentionally similar in syntax and operation to HTTP/1.1 so that extension mechanisms to HTTP can, in most cases, also be added to RTSP. However, RTSP differs in a number of important aspects from HTTP:

RTSP introduces a number of new methods and has a different protocol identifier.
An RTSP server needs to maintain state by default in almost all cases, as opposed to the stateless nature of HTTP.
Both an RTSP server and client can issue requests.
Data is carried out-of- band by a different protocol (with some exceptions).
RTSP is defined to use ISO 10646 (UTF-8) rather than ISO 8859-1, consistent with current HTML internationalization efforts.
The Request-URI always contains the absolute URI and puts the host name in a separate header field to ensure backward compatibility with a historical blunder, HTTP/1.1.

RTSP supports the following operations:

Retrieval of media from media server : The client can request a presentation description via HTTP or some other method. If the presentation is being multicast, the presentation description contains the multicast addresses and ports to be used for the continuous media. If the presentation is to be sent only to the client via unicast, the client provides the destination for security reasons.
Invitation of a media server to a conference : A media server can be invited to join an existing conference, either to play back media into the presentation or to record all or a subset of the media in a presentation. This mode is useful for distributed teaching applications. Several parties in the conference may take turns pushing the remote control buttons .
Addition of media to an existing presentation : Particularly for live presentations, it is useful if the server can tell the client about additional available media.

Each presentation and media stream may be identified by an RTSP URL. The overall presentation and the properties of the media the presentation is made up of are defined by a presentation description file, the format of which is outside the scope of this specification. The presentation description file may be obtained by the client using HTTP or other means such as Email and is not necessarily stored on the media server.

The presentation description file contains a description of the media streams making up the presentation, including their encoding, language, and other parameters that enable the client to choose the most appropriate combination of media. In this presentation description, each media stream that is individually controllable by RTSP is identified by an RTSP URL, which points to the media server handling that particular media stream and names the stream stored on that server. Several media streams can be located on different servers; for example, audio and video streams can be split across servers for load sharing. The description also enumerates which transport methods the server is capable of.

To correlate RTSP requests with a stream, RTSP servers needs to maintain session state whose transitions are described in RFC 2326 and reproduced in Table 3.2. The following commands are central to the allocation and usage of stream resources on the server:

Setup : Causes the server to allocate resources for a stream and start an RTSP session.
Play and Record : Starts data transmission on a stream allocated via SETUP.
Pause : Temporarily halts a stream without freeing server resources.
Teardown : Frees resources associated with the stream. The RTSP session ceases to exist on the server.

RTSP requires that these commands are issued via a reliable protocol such as the TCP.

3.4.2.10 Bridging RTSP and DSM-CC

The MPEG-2 counterpart of RTSP is the DSM-CC ISO/IEC 13818-6, which is currently used in existing Cable deployments. It supports sessions and enables fine control over MPEG streams, including various play speeds (e.g., using trick modes). It is possible to bridge between an RTSP session and a DSM-CC session using a gateway (see Figure 3.18). On the one hand, the RTSP client interacts with the RTSP gateway using RTSP sessions. On the other hand, the DSM-CC media server interacts with the RTSP gateway using DSM-CC sessions.

Figure 3.18. RTSP Gateways can bridge RTSP sessions with MPEG-2 DSM-CC sessions.

Table 3.2. RTSP Session State Transitions

State	Message Sent	State after Response
Init	Setup	Ready
Init	Teardown	Init
Ready	Play	Playing
	Record	Recording
	Teardown	Init
	Setup	Ready
Playing	Pause	Ready
	Teardown	Init
	Play	Playing
	Setup	Playing (changed transport)
Recording	Pause	Ready
	Teardown	Init
	Record	Recording
	Setup	Recording (changed transport)

3.4.2.11 Carriage over AAL5/ATM

Experiences with various applications over the TCP indicates that larger Maximum Transmission Unit (MTU) sizes for the IP over ATM AAL5 networks tend to give better performance than those with smaller MTU. Routers can sometimes perform better with larger packet sizes because most of the performance costs in routers are associated with the number of packets handled rather than the number of bytes transferred. It is therefore highly desirable to reduce fragmentation in the network and thereby enhance performance by having the IP MTU for ATM Adaptation Layer 5 (AAL5) be reasonably large.

Following RFC 1209, which specifies the IP MTU over Switched Multimegabit Data Service (SMDS) to be 9180 octets, to simultaneously increase interpretability and reduce IP fragmentation, the default IP MTU for use with ATM AAL5 was set to 9180. Implementations that support Switched Virtual Circuits (SVC) must attempt to negotiate the MTU using the ATM signaling protocol. When the calling party wants to use a different value than the default, it includes the AAL parameters with the desired value for the Maximum CPCS-SDU Size field as part of the SETUP message of the ATM Signaling Protocol. The called party responds using the same information elements and identifiers in its CONNECT message response. When the called party receives the SETUP message, it processes it as follows :

If it is able to accept the ATM MTU values proposed by the SETUP message, it includes an AAL Parameters information element in its response.
If it wishes a smaller ATM MTU size than that proposed, it indicates that through the AAL Parameters information elements in the CONNECT message responding to the original SETUP message.
In most other cases it responds to the call with result in the message "AAL Parameters cannot be supported."