3.4.1 MPEG-based Methods

3.4.1.1 MPEG-2 Architecture

The MPEG-2 ISO/IEC 13818-1 Standard for encoding/decoding of video streams contains an early version of the DSM-CC ISO/IEC 13818-6 standard developed for the delivery of multimedia broadband services. DSM-CC covers a number of distinct protocol areas:
To deliver data using MPEG-2, there is a need to introduce various encapsulations, which are essentially methods of packaging data in MPEG packets. These include DSM-CC data and object carousels; asynchronous, synchronous, and synchronized streaming protocols; and data piping protocols. Intuitively, a data carousel is a set of files retransmitted periodically to enable download in the presence of random tuning. The DSM-CC data carousel is a collection of MPEG-2 tables carrying modules, each of which is designed for delivery of a single file. The object carousel is built on top of the data carousel, and carries multiple objects within modules, where an object could be a file or a directory.

The delivery of data may be asynchronous, synchronous, or synchronized. With asynchronous delivery, there is no requirement that the processing and presentation of the data be coordinated with that of the video. Asynchronous data elementary streams (i.e., packets) do not carry any MPEG-2 system time stamps, and the delivery of the data is not governed by a delivery clock.

With synchronous data delivery, there is a requirement that data and video be coordinated. Synchronous elementary data streams use two types of time stamps: Program Clock Reference (PCR) and Presentation Time Stamp (PTS). PCRs are used to enable receivers to reconstruct the clock used by the emitter to construct the transport stream. PTSs, located in the header bytes of the MPEG-2 PES packets, specify the time instants at which the delivery of the first byte of the PES payload should start. The intention of the standard is to provide a metered delivery mechanism.

Synchronized data delivery differs from synchronous data delivery in that there is a requirement that the data has been fully decoded and reconstructed at the time specified by the PTS, namely when the reconstructed clock reaches the value of the PTS.
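The difference between the three delivery modes can be illustrated with a little arithmetic. The following is a minimal sketch, not part of any standard API: the function name and the decode_seconds parameter are assumptions made for illustration, PTS values are taken to count ticks of a 90 kHz clock, and wraparound of the PTS counter is ignored.

```python
import math
from typing import Optional

PTS_CLOCK_HZ = 90_000  # PTS values are assumed to count ticks of a 90 kHz clock


def latest_start_tick(pts_ticks: int, mode: str,
                      decode_seconds: float = 0.0) -> Optional[int]:
    """Return the reconstructed-clock value at which handling must begin.

    asynchronous: no time stamps govern delivery, so there is no deadline.
    synchronous:  delivery of the first payload byte starts at the PTS.
    synchronized: decoding must already be complete at the PTS, so it has to
                  start at least decode_seconds earlier (hypothetical input).
    """
    if mode == "asynchronous":
        return None
    if mode == "synchronous":
        return pts_ticks
    if mode == "synchronized":
        return pts_ticks - math.ceil(decode_seconds * PTS_CLOCK_HZ)
    raise ValueError(f"unknown delivery mode: {mode}")


if __name__ == "__main__":
    pts = 900_000  # ten seconds on a 90 kHz clock
    for mode in ("asynchronous", "synchronous", "synchronized"):
        print(mode, latest_start_tick(pts, mode, decode_seconds=0.5))
```

In this sketch only the synchronized mode forces the receiver to act ahead of the PTS, which is the source of the extra implementation burden.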
Therefore, synchronized delivery provides the tightest integration between data and video. Support of synchronized delivery is complicated, as it requires receivers to start decoding sufficiently early to complete decoding before the PTS. Because of this implementation difficulty, the ATSC Trigger standard introduced a method for decoupling a short synchronization trigger, carried in a single packet, from the data it refers to, carried in a large number of packets.

3.4.1.2 MPEG-4 Architecture

The MPEG-4 architecture defines the layers of TransMux, DMIF, Sync, and elementary stream, as depicted in Figure 3.15. The first multiplexing layer is managed according to the Delivery Multimedia Integration Framework (DMIF) specification, part 6 of the MPEG-4 standard. This multiplex may be embodied by the MPEG-defined FlexMux tool, which allows grouping of elementary streams with a low multiplexing overhead. Multiplexing at this layer may be used, for example, to group elementary streams with similar Quality of Service (QoS) requirements, or to reduce the number of network connections or the end-to-end delay.

Figure 3.15. The layered MPEG-4 architecture.
The Transport Multiplexing (TransMux) layer in Figure 3.15 models the layer that offers transport services matching the requested QoS. Only the interface to this layer is specified by MPEG-4, whereas the concrete mapping of the data packets and control signaling must be done in collaboration with the bodies that have jurisdiction over the respective transport protocol. Any suitable existing transport protocol stack, such as MPEG-2 Transport Stream or UDP/IP, over a suitable link layer may become a specific TransMux instance. The choice is left to the end user or service provider, and this allows MPEG-4 to be used in a wide variety of operating environments.

3.4.2 IP-based Transport

Today, the most popular data delivery methods are based on IP version 4 [IPv4]. Figure 3.16 depicts the relationship between the various iTV-related protocols built on top of the IP protocol. The Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP) are built directly on top of IP. The FTP, HTTP, Secure-FTP (S-FTP), and RTSP are all built on top of TCP. The Secure HTTP (HTTPS) [HTTPS] is built on top of the Transport Layer Security (TLS) protocol, which is built on top of TCP. The RTP, Trivial-FTP (TFTP), and Dynamic Host Configuration Protocol (DHCP) are built on top of UDP.

Figure 3.16. A subset of the IP protocol stack that is important for iTV.
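The layering described for Figure 3.16 can be written down as a small lookup table. This is only an illustrative sketch: the RUNS_ON mapping restates the relationships listed in the text, and the stack_for helper is a hypothetical name introduced here.

```python
# Each protocol maps to the protocol it is built directly on top of,
# as listed in the text for Figure 3.16.
RUNS_ON = {
    "TCP": "IP",
    "UDP": "IP",
    "FTP": "TCP",
    "HTTP": "TCP",
    "S-FTP": "TCP",
    "RTSP": "TCP",
    "TLS": "TCP",
    "HTTPS": "TLS",
    "RTP": "UDP",
    "TFTP": "UDP",
    "DHCP": "UDP",
}


def stack_for(protocol: str) -> list[str]:
    """Return the chain of protocols from the given one down to IP."""
    if protocol != "IP" and protocol not in RUNS_ON:
        raise KeyError(f"protocol not in the Figure 3.16 subset: {protocol}")
    stack = [protocol]
    while stack[-1] in RUNS_ON:
        stack.append(RUNS_ON[stack[-1]])
    return stack


if __name__ == "__main__":
    for name in ("HTTPS", "RTSP", "TFTP"):
        print(" over ".join(stack_for(name)))
```

Walking the table makes the split visible: the file transfer and control protocols sit on TCP, while RTP, TFTP, and DHCP sit on UDP.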
3.4.2.1 IP

The IP is concerned with two basic issues: addressing and fragmentation [IPv4].
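As a rough illustration of the fragmentation issue, the sketch below splits a payload so that each piece fits a link MTU. It assumes the usual IPv4 conventions (a 20-byte header without options, fragment offsets counted in 8-byte units, and a more-fragments flag on every fragment but the last); the function name and tuple layout are hypothetical.

```python
def fragment(payload_len: int, mtu: int, header_len: int = 20):
    """Split a payload of payload_len bytes into IPv4-style fragments.

    Returns a list of (offset_in_8_byte_units, data_length, more_fragments).
    Every fragment except the last carries a multiple of 8 data bytes.
    """
    max_data = (mtu - header_len) // 8 * 8  # largest 8-byte-aligned data size
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    offset = 0
    while offset < payload_len:
        length = min(max_data, payload_len - offset)
        more = offset + length < payload_len
        fragments.append((offset // 8, length, more))
        offset += length
    return fragments


if __name__ == "__main__":
    # A 4000-byte payload crossing a link with a 1500-byte MTU.
    for frag in fragment(4000, 1500):
        print(frag)
```

The receiver can reassemble the datagram from the offsets alone, which is why fragments may arrive in any order.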
IP serves as a method of communication between hosts, each of which must have modules that know how to receive, send, and process IP packets, also known as Internet datagrams. The packets are regarded as unrelated and independent of each other, and there are no logical or virtual connections defined. The modules share common rules for interpreting address fields and fragmenting and assembling Internet datagrams. In addition, these modules share common procedures for making routing decisions. Four key mechanisms are employed:
IP does not provide a reliable communication facility. There are no acknowledgments, either end-to-end or hop-by-hop. There is no error control for data, only a header checksum. There are no retransmissions. There is no flow control. Errors detected may be reported via the Internet Control Message Protocol (ICMP), which is implemented in the IP module [ICMP].

3.4.2.2 TCP

The Transmission Control Protocol (TCP) is a connection-oriented, end-to-end reliable protocol designed for use as a highly reliable host-to-host protocol between hosts in packet-switched computer communication networks [TCP]. Very few assumptions are made about the reliability of the communication protocols below the TCP layer, allowing it to operate above a wide spectrum of communication systems. To provide reliability of service on top of a less reliable IP, TCP introduces facilities in the following areas:
3.4.2.3 UDP

The User Datagram Protocol (UDP) was finalized in 1980 [UDP]. It was developed to standardize a datagram mode of packet-switched communication in the environment of an interconnected set of computer networks. UDP assumes that the IP is used as the underlying protocol. The protocol guarantees neither delivery nor duplicate protection; applications requiring ordered reliable delivery of streams of data should use the TCP. The UDP implementation must be able to interface with the IP implementation to determine the source and destination IP addresses and the protocol field from the header. One possible UDP/IP interface could return the whole datagram including the entire header in response to a receive operation. Such an interface could also allow the UDP to pass a full datagram, complete with header, to the IP to send. The IP would verify certain fields for consistency and compute the header checksum.

3.4.2.4 DHCP

The DHCP provides a communications startup framework for passing configuration information to hosts on a TCP/IP network [DHCP]. DHCP is based on the Bootstrap Protocol (BOOTP), adding the capability of automatic allocation of reusable network addresses and additional configuration options. DHCP consists of two components: a protocol for delivering host-specific configuration parameters from a DHCP server to a host and a mechanism for allocation of network addresses to hosts. DHCP is built on a client-server model, where designated DHCP server hosts allocate network addresses and deliver configuration parameters to dynamically configured hosts. In the context of DHCP, the term server (typically a headend) refers to a host providing initialization parameters through DHCP, and the term client (typically a set-top box) refers to a host requesting initialization parameters from a DHCP server.

DHCP supports three mechanisms for IP address allocation. In automatic allocation, DHCP assigns a permanent IP address to a client.
In dynamic allocation, DHCP assigns an IP address to a client for a limited period of time (or until the client explicitly relinquishes the address). In manual allocation, a client's IP address is assigned by the network administrator, and DHCP is used simply to convey the assigned address to the client. A particular network will use one or more of these mechanisms, depending on the policies of the network administrator.

Dynamic allocation is the only one of the three mechanisms that allows automatic reuse of an address that is no longer needed by the client to which it was assigned. Thus, dynamic allocation is particularly useful for assigning an address to a client that will be connected to the network only temporarily or for sharing a limited pool of IP addresses among a group of clients that do not need permanent IP addresses. It is therefore best suited for iTV applications in which a spontaneous temporary connection may be established, e.g., for the purpose of performing a TV-commerce transaction. Dynamic allocation may also be a good choice for assigning an IP address to a new client being permanently connected to a network where IP addresses are sufficiently scarce that it is important to reclaim them when old clients are retired.

3.4.2.5 HTTP

The HTTP is a generic, stateless application-level protocol [HTTP1.0]. HTTP has been in use by the World-Wide Web global information initiative since 1990. HTTP/1.0 did not sufficiently take into consideration the effects of hierarchical proxies, caching, the need for persistent connections, or virtual hosts. The HTTP/1.1 protocol was developed to address those issues [HTTP1.1].

The HTTP protocol is a request-response protocol. A client sends a request to the server in the form of a request method, a Uniform Resource Identifier (URI) [URI], and protocol version, followed by a message containing request modifiers, client information, and possible body content over a connection with a server.
The server responds with a status line, including the message's protocol version and a success or error code, followed by a message containing server information, entity meta information, and possible entity-body content. HTTP/1.1 introduced the URI to unify the Uniform Resource Locator (URL) and Name (URN) [URI]. Messages are passed in a format similar to that used by Internet mail as defined by Multipurpose Internet Mail Extensions (MIME) [MIME]. HTTP messages are either requests from a client to a server or responses from a server to a client. Both types of messages consist of a start line (which is either a request-line or status-line), zero or more header fields (also known as headers), an empty line indicating the end of the header fields, and possibly a message body.

In many cases, HTTP communication is initiated by a browser and consists of a request to be applied to a resource on some origin server. In the simplest case, this may be accomplished via a single connection between the browser and the origin server. A more complicated situation occurs when one or more intermediaries are present in the request-response chain. There are three common forms of intermediary: proxy, gateway, and tunnel. A proxy is a forwarding agent, receiving requests for a URI in its absolute form, rewriting all or part of the message, and forwarding the reformatted request toward the server identified by the URI. A gateway is a receiving agent, acting as a layer above some other server(s) and, if necessary, translating requests to the underlying server's protocol. A tunnel acts as a relay point between two connections without changing the messages; tunnels are used when the communication needs to pass through an intermediary (e.g., a firewall) even when the intermediary cannot understand the contents of the messages. A request or response message that travels the whole chain will pass through multiple separate connections.
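The message layout described earlier in this subsection (a start line, zero or more header fields, an empty line, and an optional body) can be sketched as a small parser. This is a simplified illustration rather than a conforming HTTP implementation: it ignores folded and repeated headers, and the host name and path in the usage example are hypothetical.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP message into (start_line, headers, body).

    The empty line (CRLF CRLF) marks the end of the header fields.
    Header names are lower-cased because they are case-insensitive.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


if __name__ == "__main__":
    # Hypothetical request for an iTV program guide page.
    request = (b"GET /guide/index.html HTTP/1.1\r\n"
               b"Host: example.com\r\n"
               b"\r\n")
    start_line, headers, body = parse_http_message(request)
    method, uri, version = start_line.split(" ")
    print(method, uri, version, headers, body)
```

The same function handles a response, where the start line is a status line such as "HTTP/1.1 200 OK" instead of a request line.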
Some HTTP communication options may apply only to the connection with the nearest, nontunnel neighbor, only to the end points of the chain, or to all connections along the chain. Any party to the communication that is not acting as a tunnel may employ an internal cache for handling requests. The effect of a cache is that the request-response chain is shortened if one of the participants along the chain has a cached response applicable to that request. There is a wide variety of architectures and configurations of caches and proxies currently being experimented with or deployed across the Internet. These systems include national hierarchies of proxy caches to save transoceanic bandwidth, systems that broadcast or multicast cache entries, and organizations that distribute subsets of cached data via CD-ROM.

In HTTP/1.0, most implementations used a new connection for each request/response exchange. In HTTP/1.1, a connection may be used for multiple exchanges, although connections may be closed spontaneously for a variety of reasons. Whereas HTTP communication usually takes place over TCP/IP connections, HTTP can be implemented on top of any other protocol that features a reliable transport and provides delivery guarantees. The mapping of the HTTP/1.1 request and response structures onto the transport data units of the protocol in question is outside the scope of the HTTP protocol specification.

3.4.2.6 FTP

The FTP has had a long evolution over the years [FTP]. Its first widespread use was as a protocol for file transfer between hosts on the ARPANET, allowing the convenient use of remote file storage capabilities. By July 1973, considerable changes from the last versions of FTP were made, but the general structure remained the same. By the time the new official specification reflected these changes, many implementations already deployed based on the older specification were not updated.
In 1975, RFC 686, entitled Leaving Well Enough Alone, discussed the differences between all of the early and later versions of FTP. Motivated by the transition from the Network Control Protocol (NCP) to the TCP as the underlying protocol, an RFC was developed as the specification of FTP for use on top of TCP.

According to the final version of the FTP model, defined in RFC 959, the protocol interpreter initiates the control connection using the Telnet protocol. Subsequently, standard FTP commands are transmitted to the server process via the Telnet control connection. Standard replies are sent from the server to the client over the control connection in response to the commands. The FTP commands specify the parameters for the data connection (data port, transfer mode, representation type, and structure) and the nature of file system operation (store, retrieve, append, delete, etc.). The client should listen on the specified data port, and the server initiates the data connection and data transfer in accordance with the specified parameters.

Files are transferred only via the data connection. The control connection is used for the transfer of commands, which describe the functions to be performed, and the replies to these commands. The mechanics of transferring data consist of setting up the data connection to the appropriate ports and choosing the parameters for transfer; both the client and the server have a default data port. There are three modes: one that formats the data and allows for restart procedures; one that also compresses the data for efficient transfer; and the pass-through mode, which moves the data with little or no processing.

Two processes are used, one at the client and one at the server. One of the two must be passive and the other active. The passive data transfer process listens on the data port prior to sending a transfer request command. The FTP request command determines the direction of the data transfer.
The server, on receiving the transfer request, initiates the data connection to the port. When the connection is established, the data transfer begins, and the server sends a confirming reply to the client. FTP requires that the control connections be open while data transfer is in progress. It is the responsibility of the user to request the closing of the control connections when finished using the FTP service, and it is the server that takes the action. The server may abort data transfer if the control connections are closed without command.

3.4.2.7 TFTP

The Trivial FTP (TFTP) is the simplest file transfer protocol in widespread use [TFTP]. As opposed to FTP, it has been designed to be implemented on top of UDP. To simplify implementation, the protocol is very restrictive and therefore lacks most of the features of regular FTP. It only enables reading from and writing to a remote server. It cannot list directories, and currently has no provisions for user authentication. This protocol is used by all modern set-top boxes to seamlessly download software patches and upgrades. Commonly, three modes of transfer are specified by the TFTP standard:
Any transfer begins with a request to read or write a file, which also serves to request a connection. If the server grants the request, the connection is opened and the file is sent in fixed-length blocks of 512 bytes. Each data packet contains one block of data, and must be acknowledged by an ACK packet before the next packet can be sent. A data packet of less than 512 bytes signals termination of a transfer. If a packet gets lost in the network, the intended recipient will time out and may retransmit its last packet (which may be data or an ACK), causing the sender of the lost packet to retransmit that lost packet. The sender has to keep just one packet on hand for retransmission, because the lock-step ACK guarantees that all older packets have been received. Both machines involved in a transfer are considered senders and receivers. One sends data and receives ACKs; the other sends ACKs and receives data.

Most errors cause termination of the connection. An error is signaled by sending an error packet. This packet is neither acknowledged nor retransmitted (i.e., a TFTP server or user may terminate after sending an error message), so the other end of the connection may not get it. Therefore, time-outs are used to detect such a termination when the error packet has been lost. Errors are caused by three types of events: not being able to satisfy the request (e.g., file not found, access violation, or no such user), receiving a packet that cannot be explained by a delay or duplication in the network (e.g., an incorrectly formed packet), and losing access to a necessary resource (e.g., disk full or access denied during a transfer). TFTP recognizes only one error condition that does not cause termination: the source port of a received packet being incorrect. In this case, an error packet is sent to the originating host.

3.4.2.8 RTP

The RTP, defined in RFC 1889, was developed by the Audio-Video Transport working group and finalized in 1996 [RTP].
It provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video, or simulation data, over multicast or unicast network services. RTP does not address resource reservation and does not guarantee QoS for real-time services. Although RTP is designed to be independent of the underlying transport layer, it is usually carried over UDP. RTP was originally designed with three application scenarios in mind:
RTP is designed to allow an application to scale automatically over session sizes ranging from a few participants to thousands. For example, in an audio conference the data traffic is inherently self-limiting because only one or two people will speak at a time, so with multicast distribution the data rate on any given link remains relatively constant, independent of the number of participants. However, the control traffic is not self-limiting. If the reception reports from each participant were sent at a constant rate, the control traffic would grow linearly with the number of participants. Therefore, the rate must be scaled down. For each session, it is assumed that the data traffic is subject to an aggregate limit called the session bandwidth to be divided among the participants. This bandwidth might be reserved and the limit enforced by the network, or it might just be a reasonable share. The session bandwidth may be chosen based on some cost or a priori knowledge of the available network bandwidth for the session. It is somewhat independent of the media encoding, but the encoding choice may be limited by the session bandwidth. The session bandwidth parameter is expected to be supplied by a session management application when it invokes a media application, but media applications may also set a default based on the single-sender data bandwidth for the encoding selected for the session. The application may also enforce bandwidth limits based on multicast scope rules or other criteria. Bandwidth calculations for control and data traffic include lower-layer transport and network protocols (e.g., UDP, IP) because that is what the resource reservation system would need to know. 
The control traffic should be limited to a small and known fraction of the session bandwidth: small so that the primary function of the transport protocol to carry data is not impaired, and known so that the control traffic can be included in the bandwidth specification given to a resource reservation protocol and so that each participant can independently calculate its share. It is suggested that the fraction of the session bandwidth allocated to control traffic be fixed at 5%. Although the value of this and other constants in the interval calculation is not critical, it is critical that all participants in the session use the same values so the same interval will be calculated. Therefore, these constants should be fixed for a particular profile.

RTP receivers provide reception quality feedback using report packets that may take one of two forms depending upon whether or not the receiver is also a sender. The only difference between the sender report and receiver report forms, besides the packet type code, is that the sender report includes a 20-byte sender information section for use by active senders. The sender report is issued if a site has sent any data packets during the interval since issuing the last report or the previous one; otherwise, the receiver report is issued.

Cumulative packet counts are used in both the sender information and receiver report blocks so that differences can be calculated between any two reports to make measurements over both short and long time periods, and to provide resilience against the loss of a report. The difference between the last two reports received can be used to estimate the recent quality of the distribution. Time stamps are included so that rates can be calculated from these differences over the interval between two reports. The time stamp used is independent of the clock rate for the data encoding. It is therefore possible to implement encoding- and profile-independent quality monitors.
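The effect of fixing the control fraction at 5% can be sketched with a simplified calculation. This is an illustration only, not the full interval algorithm of the RTP specification: it omits any minimum interval, any split between senders and receivers, and any randomization, and the function and parameter names are assumptions.

```python
CONTROL_FRACTION = 0.05  # share of the session bandwidth reserved for control traffic


def report_interval(session_bw_bps: float, participants: int,
                    avg_report_bits: float) -> float:
    """Seconds between reports from one participant (simplified sketch).

    All participants together may use only CONTROL_FRACTION of the session
    bandwidth, so each one must wait longer as the session grows.
    """
    control_bw_bps = session_bw_bps * CONTROL_FRACTION
    return participants * avg_report_bits / control_bw_bps


if __name__ == "__main__":
    # Hypothetical 64 kbit/s audio session with 800-bit reports.
    for n in (10, 100, 1000):
        print(n, "participants:", round(report_interval(64_000, n, 800), 1), "s")
```

Because the per-participant interval grows linearly with the number of participants, the aggregate control traffic stays at the same fraction of the session bandwidth regardless of session size.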
As an example, consider the packet loss rate over the interval between two reception reports. The difference in the cumulative number of packets lost gives the number lost during that interval. The difference in the extended last sequence numbers received gives the number of packets expected during the interval. The ratio of these two is the packet loss fraction over the interval. This ratio should equal the fraction lost field if the two reports are consecutive, but otherwise not. The loss rate per second can be obtained by dividing the loss fraction by the difference in the time stamps, expressed in seconds. The number of packets received is the number of packets expected minus the number lost. The number of packets expected can also be used to judge the statistical validity of any loss estimates. For example, a loss of 1 out of 5 packets has a lower significance than 200 out of 1000.

RTP could be used, for example, to deliver MPEG-4 video over the Internet (see Figure 3.17). Video is compressed using MPEG-4 encoders and encapsulated in RTP, which in turn is carried over an IP network. The weak link of this protocol stack is the binding of MPEG-4 onto RTP, due to issues related to timing and binding of multiple elementary streams, each delivered with a possibly different bit rate.

Figure 3.17. Delivering video through MPEG-4 compression over RTP and the Internet.
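The packet loss computation described earlier in this subsection can be sketched as follows. The ReceptionReport class and its field names are hypothetical simplifications of a reception report block, and the time stamp is assumed to be already converted to seconds.

```python
from dataclasses import dataclass


@dataclass
class ReceptionReport:
    cumulative_lost: int       # cumulative number of packets lost
    extended_highest_seq: int  # extended last sequence number received
    timestamp_s: float         # report time stamp, in seconds


def interval_loss(prev: ReceptionReport, curr: ReceptionReport) -> dict:
    """Derive loss statistics for the interval between two reports."""
    lost = curr.cumulative_lost - prev.cumulative_lost
    expected = curr.extended_highest_seq - prev.extended_highest_seq
    elapsed = curr.timestamp_s - prev.timestamp_s
    if expected <= 0 or elapsed <= 0:
        raise ValueError("reports must be in order and span some packets")
    fraction = lost / expected
    return {
        "lost": lost,
        "expected": expected,
        "received": expected - lost,
        "loss_fraction": fraction,
        # As in the text: loss fraction divided by the elapsed seconds.
        "loss_per_second": fraction / elapsed,
    }


if __name__ == "__main__":
    earlier = ReceptionReport(cumulative_lost=10, extended_highest_seq=1000, timestamp_s=0.0)
    later = ReceptionReport(cumulative_lost=30, extended_highest_seq=1200, timestamp_s=5.0)
    print(interval_loss(earlier, later))
```

Because the counters are cumulative, the two reports need not be consecutive; a lost report only widens the interval being measured.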
3.4.2.9 RTSP

The RTSP, defined in RFC 2326, is an application-level protocol for control over the delivery of data with real-time properties [RTSP]. RTSP provides an extensible framework to enable controlled, on-demand delivery of real-time data, such as audio and video. Sources of data can include both live data feeds and stored clips. This protocol is intended to control multiple data delivery sessions; provide a means for choosing delivery channels such as UDP, multicast UDP, and TCP; and provide a means for choosing delivery mechanisms based on RTP.

The set of streams to be controlled is defined by a presentation description. The RTSP specification does not define a format for a presentation description. There is no notion of an RTSP connection; instead, a server maintains a session labeled by an identifier. An RTSP session is in no way tied to a transport-level connection such as a TCP connection. During an RTSP session, an RTSP client can open and close many reliable transport connections to the server to issue RTSP requests. Alternatively, it may use a connectionless transport protocol such as UDP.

The streams controlled by RTSP may use RTP, but the operation of RTSP does not depend on the transport mechanism used to carry continuous media. The protocol is intentionally similar in syntax and operation to HTTP/1.1 so that extension mechanisms to HTTP can, in most cases, also be added to RTSP. However, RTSP differs in a number of important aspects from HTTP:
RTSP supports the following operations:
Each presentation and media stream may be identified by an RTSP URL. The overall presentation and the properties of the media the presentation is made up of are defined by a presentation description file, the format of which is outside the scope of the RTSP specification. The presentation description file may be obtained by the client using HTTP or other means such as e-mail and is not necessarily stored on the media server. The presentation description file contains a description of the media streams making up the presentation, including their encoding, language, and other parameters that enable the client to choose the most appropriate combination of media. In this presentation description, each media stream that is individually controllable by RTSP is identified by an RTSP URL, which points to the media server handling that particular media stream and names the stream stored on that server. Several media streams can be located on different servers; for example, audio and video streams can be split across servers for load sharing. The description also enumerates which transport methods the server is capable of.

To correlate RTSP requests with a stream, RTSP servers need to maintain session state whose transitions are described in RFC 2326 and reproduced in Table 3.2. The following commands are central to the allocation and usage of stream resources on the server:
RTSP requires that these commands be issued via a reliable protocol such as the TCP.

3.4.2.10 Bridging RTSP and DSM-CC

The MPEG-2 counterpart of RTSP is the DSM-CC ISO/IEC 13818-6, which is currently used in existing cable deployments. It supports sessions and enables fine control over MPEG streams, including various play speeds (e.g., using trick modes). It is possible to bridge between an RTSP session and a DSM-CC session using a gateway (see Figure 3.18). On the one hand, the RTSP client interacts with the RTSP gateway using RTSP sessions. On the other hand, the DSM-CC media server interacts with the RTSP gateway using DSM-CC sessions.

Figure 3.18. RTSP Gateways can bridge RTSP sessions with MPEG-2 DSM-CC sessions.
Table 3.2. RTSP Session State Transitions
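Session state transitions of this kind can be sketched as a lookup-driven state machine. The following is a minimal sketch assuming the server state machine described in RFC 2326 (states Init, Ready, Playing, and Recording, driven by SETUP, PLAY, RECORD, PAUSE, and TEARDOWN); the class and its names are hypothetical, and details such as error responses and redirects are omitted.

```python
# (current state, request method) -> next state; assumed from RFC 2326.
TRANSITIONS = {
    ("Init", "SETUP"): "Ready",
    ("Init", "TEARDOWN"): "Init",
    ("Ready", "PLAY"): "Playing",
    ("Ready", "RECORD"): "Recording",
    ("Ready", "SETUP"): "Ready",
    ("Ready", "TEARDOWN"): "Init",
    ("Playing", "PLAY"): "Playing",
    ("Playing", "PAUSE"): "Ready",
    ("Playing", "SETUP"): "Playing",
    ("Playing", "TEARDOWN"): "Init",
    ("Recording", "RECORD"): "Recording",
    ("Recording", "PAUSE"): "Ready",
    ("Recording", "SETUP"): "Recording",
    ("Recording", "TEARDOWN"): "Init",
}


class RtspSession:
    """Server-side session labeled by an identifier, not by a connection."""

    def __init__(self, session_id: str):
        self.session_id = session_id
        self.state = "Init"

    def handle(self, method: str) -> str:
        """Apply a request method and return the new state."""
        key = (self.state, method)
        if key not in TRANSITIONS:
            raise ValueError(f"{method} is not valid in state {self.state}")
        self.state = TRANSITIONS[key]
        return self.state


if __name__ == "__main__":
    session = RtspSession("12345678")  # hypothetical session identifier
    for method in ("SETUP", "PLAY", "PAUSE", "PLAY", "TEARDOWN"):
        print(method, "->", session.handle(method))
```

Keeping the state in the session object rather than in a connection reflects the point made above that an RTSP session is not tied to any transport-level connection.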
3.4.2.11 Carriage over AAL5/ATM

Experience with various applications over the TCP indicates that larger Maximum Transmission Unit (MTU) sizes for IP over ATM AAL5 networks tend to give better performance than smaller ones. Routers can sometimes perform better with larger packet sizes because most of the performance costs in routers are associated with the number of packets handled rather than the number of bytes transferred. It is therefore highly desirable to reduce fragmentation in the network and thereby enhance performance by having the IP MTU for ATM Adaptation Layer 5 (AAL5) be reasonably large. Following RFC 1209, which specifies the IP MTU over Switched Multimegabit Data Service (SMDS) to be 9180 octets, the default IP MTU for use with ATM AAL5 was set to 9180 octets, both to increase interoperability and to reduce IP fragmentation.

Implementations that support Switched Virtual Circuits (SVC) must attempt to negotiate the MTU using the ATM signaling protocol. When the calling party wants to use a different value than the default, it includes the AAL parameters with the desired value for the Maximum CPCS-SDU Size field as part of the SETUP message of the ATM Signaling Protocol. The called party responds using the same information elements and identifiers in its CONNECT message response. When the called party receives the SETUP message, it processes it as follows: