Delivering Streaming Media

To accommodate multimedia applications requiring delay-sensitive delivery, use real-time media transfer protocols such as Real-Time Transport Protocol (RTP), and its partner control protocol Real-Time Control Protocol (RTCP). To provide additional control to the media transfer, use signaling protocols such as Real-Time Streaming Protocol (RTSP), SIP, and H.323. You also can specify these signaling protocols to use Session Description Protocol (SDP), XML, or SMIL to supply the information related to the session to the participants.

Table 9-3 outlines the real-time protocols discussed in this chapter.

Table 9-3. Streaming Media Protocols
Protocol	Usage	Related Standards, RFCs, and W3C Documents	Protocol Versions Available
Real-Time Transport Protocol (RTP)	Media Transport protocol	RTP Standard RFC 3550 (obsoletes RFC 1889) RFC 3551 RTP Audio/Video Profile (extends RFC 3550)	No version numbering
Real-Time Control Protocol (RTCP)	RTP control for reporting, timing and statistics calculation.	Defined within RFC 3550	No version numbering
Real-Time Streaming Protocol (RTSP)	Signaling protocol for controlling RTP sessions. Provides VCR-like controls. Uses TCP port 554.	RFC 2326 (April 1998)	Version 1.0
Session Initiation Protocol (SIP)	Signaling protocol for inviting participants to a session.	RFC 3261 (obsoletes RFC 2543)	No version numbering
H.323	Signaling protocol for inviting participants to a session. Originally meant for fixed bandwidth (e.g., ISDN) conferencing, but now used for scalable real-time video conferencing, such as Microsoft NetMeeting.	Not an Internet Draft RFC. Managed by ITU-T	Version 5 (2003)
Session Description Protocol (SDP)	Used by RTSP and SIP to describe the session to participants.	RFC 2327	No version numbering
SMIL	Used for rich multimedia presentations which integrate streaming audio and video with images, text, or any other media type.	http://www.w3.org/AudioVideo/ http://www.w3.org/TR/SMIL2/M/	Version 2

Transferring Streaming Media with the Real-Time Transport Protocol

RTP provides the transport for audio and visual media transmission over an IP network. The Layer 4 transport can be over UDP or TCP, but more often UDP is used as the transport protocol. For real-time applications, UDP provides less packet delay than TCP. Recall from Chapter 2, "Exploring the Network Layers," that delays occur using TCP and are associated with retransmissions from packet loss and the TCP slow start congestion control algorithm. Furthermore, most real-time applications prefer to conceal packet loss rather than retransmit lost packets. As such, RTP provides mechanisms to handle network issues, such as jitter and packet loss, on its own at the application layer of the OSI model.

Note

You can scale RTP by using IP Multicast Layer 3 forwarding, provided that you enable IP Multicast features in your network infrastructure, such as PIM-SM and Bidir-PIM. You also can use unicast-UDP to transport RTP sessions.

RTP is flexible as it specifies the transport mechanism, not the payload formats and algorithms for the underlying real-time media. RTP can transport a number of video formats, such as MPEG-4, H.261, JPEG compressed video, and many more. However, each payload format is normally specified separately in its respective RFCs or ITU document, in order to provide format-specific header values and controls. For example, in the case of H.261, RFC 2032 specifies Negative Acknowledgements to control video flow and handle retransmission of lost packets.

Because RTP typically runs over UDP, which neither retransmits lost packets nor reorders out-of-sequence packets, RTP packets include timestamps and sequence information in their application headers. RTP uses the timing information to synchronize different sources involved in a multimedia presentation, such as audio and video. For example, lip movements require synchronization with the voice of someone presenting over a video conference or corporate communication. RTP also includes sequence numbers to determine lost or out-of-sequence packets. In contrast to the mechanism TCP uses to retransmit and reorder in the event of errors, RTP uses sequence numbers to detect packet loss in order to conceal rather than correct errors. The assumption is that a video frame received out-of-sequence is better discarded than displayed to the viewer out-of-sequence.
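The following Python sketch illustrates how a receiver might read the sequence number and timestamp from an RTP packet and count the packets skipped between two arrivals. It assumes the 12-byte fixed header layout from RFC 3550 and in-order arrival; the names RtpHeader, parse_rtp_header, and count_missing are hypothetical and chosen only for this illustration.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Unpack the 12-byte fixed RTP header (layout assumed from RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=first >> 6,          # top two bits of the first byte
        payload_type=second & 0x7F,  # low seven bits; the top bit is the marker
        sequence=sequence,
        timestamp=timestamp,
        ssrc=ssrc,
    )


def count_missing(previous_seq: int, current_seq: int) -> int:
    """Packets skipped between two arrivals, allowing for 16-bit wraparound.

    Assumes packets arrive in order; a reordered packet would show up as a
    very large gap and needs separate handling.
    """
    return (current_seq - previous_seq - 1) % 65536
```

A receiver that sees a nonzero gap can conceal the loss, for example by repeating the previous frame, rather than asking the sender to retransmit.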

Note

Although RTP uses a short playout buffer, the buffer is meant only to alleviate packet jitter and is not suitable for buffering, retransmitting, and reordering packets.

RTCP is the protocol within RTP for session monitoring and control but does not provide any delivery guarantee. RTCP maintains the state of RTP sessions using a unique identifier, called CNAME, for each group of RTP-UDP connections. RTCP uses this state to group the different feeds into a single multimedia session. Based on the session that the RTP-UDP connections belong to, synchronization can take place using the timing information that is included in the session information embedded within each RTP and RTCP UDP connection. However, the timestamps in the RTP packets originating from different servers may skew from one another, making synchronization difficult. As a result, RTCP provides a reference clock (or wallclock) to reconcile timestamps from different RTP streams for synchronization purposes. You can derive the RTCP wallclock from an external Network Time Protocol (NTP) source. NTP is accurate enough to provide time resolution appropriate for any of today's streaming media applications. The streaming media application can use the RTCP time reference to calculate jitter, data packet rates, and clock skew in the individual RTP connections.

RTCP also supports congestion control through reporting on the quality of stream reception. Participants send RTCP sender reports (SR) and receiver reports (RR) periodically (for example, every 5 seconds) to indicate the quality of the stream. Based on these reports, participants can calculate statistics, such as number of lost packets, round trip times, and inter-arrival jitter. Optionally, RTCP also can send participant information such as participant name, e-mail address, phone number, and location in source description (SDES) packets.
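As an illustration of the inter-arrival jitter statistic, the following Python sketch applies the running estimator described in RFC 3550, in which each new sample moves the estimate by one-sixteenth of the difference. The function names are hypothetical, and the sketch assumes that send and receive times are already expressed in the same units.

```python
def update_jitter(jitter: float, prev_transit: float, transit: float) -> float:
    """One step of the RFC 3550 estimator: J = J + (|D| - J) / 16."""
    d = abs(transit - prev_transit)
    return jitter + (d - jitter) / 16.0


def interarrival_jitter(send_times, recv_times) -> float:
    """Run the estimator over paired send and receive times."""
    jitter = 0.0
    prev_transit = None
    for sent, received in zip(send_times, recv_times):
        transit = received - sent  # relative transit time for this packet
        if prev_transit is not None:
            jitter = update_jitter(jitter, prev_transit, transit)
        prev_transit = transit
    return jitter
```

If every packet takes the same time to cross the network, the estimate stays at zero; variation in transit time raises it gradually, so a single late packet does not dominate the result.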

RTCP sends packets periodically to all participants in the session, using a different port number than the RTP streams. This way, all participants can evaluate the total number of participants. Packets are sent using the same distribution as the RTP streams, either UDP unicast or multicast. RTCP traffic normally does not exceed 5 percent of the total session bandwidth, with at least 25 percent of that reserved for reports from active senders.
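To make the bandwidth figures concrete, the following Python sketch splits a session's bandwidth using the 5 percent and 25 percent values given above. The function name and the 1000-kbps figure in the usage note are assumptions for illustration only.

```python
def rtcp_bandwidth_budget(session_kbps: float,
                          rtcp_fraction: float = 0.05,
                          sender_share: float = 0.25):
    """Return (total RTCP, sender portion, remaining portion) in kbps."""
    rtcp_kbps = session_kbps * rtcp_fraction
    sender_kbps = rtcp_kbps * sender_share
    return rtcp_kbps, sender_kbps, rtcp_kbps - sender_kbps
```

For a hypothetical 1000-kbps session, this works out to roughly 50 kbps of RTCP traffic, of which about 12.5 kbps is set aside for senders.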

Table 9-4 lists the available RTCP commands.

Table 9-4. Available RTCP Commands
Packet Code	Description
SR	Sender Report: transmit and receive reports for active senders to all participants
RR	Receiver Report: transmit and receive reports for receivers to all participants
SDES	Source description: describe the source of session, including the CNAME
BYE	Explicit leave
APP	Application specific extensions

RTP organizes data into payloads such that each packet contains an independently decodable unit. If possible, each frame of a video feed is compressed and sent in a single packet, so that the user can decode the packet as it arrives on the network. If a source sends a single frame across multiple packets, the RTP timestamp is the same for each packet.

Note

RTP uses the UDP port range 16384-32767. RTP uses the even numbers in this range; RTCP uses the odd numbers within the range.
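The even and odd pairing in the preceding note can be sketched in Python as follows. The helper name rtcp_port_for is hypothetical, and the sketch assumes that the RTCP port is the odd number immediately above the even RTP port.

```python
RTP_PORT_RANGE = range(16384, 32768)  # 16384 through 32767 inclusive


def rtcp_port_for(rtp_port: int) -> int:
    """Return the RTCP port paired with an even RTP port in the range."""
    if rtp_port not in RTP_PORT_RANGE:
        raise ValueError(f"{rtp_port} is outside the RTP port range")
    if rtp_port % 2 != 0:
        raise ValueError(f"{rtp_port} is odd; RTP uses the even ports")
    return rtp_port + 1
```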

Real-time Data Control with Real Time Streaming Protocol

RTSP acts as a TV remote control, enabling the recipient to use functions, such as play, pause, record, fast-forward, and rewind, to control the delivery of media from the origin server to clients.

RTSP is similar to HTTP with the following major exceptions:

RTSP maintains session state by default. RTSP requires session state for normal operations, whereas HTTP requires state only for session-stickiness.
Servers can contact clients using an existing persistent connection.
RTSP does not transport the stream. The data RTSP transports is the meta-information for the stream.
RTSP clients must specify the host portion of the URL within the RTSP commands, whereas HTTP allows clients to specify the relative path of the URL within the requests and use a Host: header to specify the host portion of the URL.

Table 9-5 lists the common RTSP messages that clients and servers use.

Table 9-5. Available RTSP Control Messages
Method	Method Direction (i.e., Client Direction Server)	Description
DESCRIBE	->	Requests the description file of a session.
ANNOUNCE	<->	The client publishes a new description file to a server using the ANNOUNCE method. If sent to the client by the server during a real-time session, an update to a description file is sent.
SETUP	->	Sent by the client to a server to specify the transport protocol to be used in the session, for example, Real-Time Transport Protocol/Audio Video Profile (RTP/AVP).
PAUSE	->	Causes the server to temporarily suspend the media. Not yet used for live feeds.
TEARDOWN	->	Requests the server to stop the session.
PLAY	->	Tells the server to start sending the streaming data over the transport indicated in the SETUP method.
RECORD	->	Indicates to the server to store the streaming media content, useful in live streaming.
REDIRECT	<-	Informs the client that it must connect to another location.
SET_PARAMETER	<->	Assigns a value of a session-specific parameter. These parameters are dependent on the implementation.
GET_PARAMETER	<->	Requests a value from the server of a session-specific parameter.
OPTIONS	<->	Identical to the HTTP 1.1 OPTIONS method. The server must return the available methods for the URI specified.

A client must have information about the following components to request and receive a stream.

Media types: The media types in use, such as audio, video, and whiteboards
Network protocols used: The network protocols in use, such as RTP, UDP, and IP
Codecs: Information to determine the codec algorithms in use, such as H.261 and MPEG4
IP addresses and ports: The multicast or unicast IP addresses that are in use, whether TCP or UDP is used as the transport protocol, and the TCP or UDP port numbers that are in use

RTSP can use Session Description Protocol (SDP), SMTP, XML, or SMIL to inform clients of this information. Figure 9-3 shows a typical RTSP flow in which the client uses HTTP to request a description of the streaming content from the server. You identify the streaming content by URL in the same way that HTTP identifies web content. The server responds with a detailed SDP description of the streaming media, including the media types, transport protocols, the multicast or unicast IP addresses of the sources, and codecs.

Figure 9-3. Sample RTSP Flow for Controlling a Multimedia Session

You also can send SDP description files to clients using RTSP in response to the DESCRIBE method. Example 9-4 is a description file in SDP format for an on-demand Windows streaming media session within an .asf file containing three streams: audio, video, and whiteboard.

Note

The W3C defines custom XML tags that enable you to implement the SDP fields given in Example 9-4 using SMIL. To use the custom tags, you need to define the SDP namespace <smil xmlns:sdp="http://www.w3.org/AudioVideo/1998/08/draft-hoschka-smilsdp-00"> in your SMIL file.

Example 9-4. Sample Session Description File to Describe a WMT Stream

 v=0                                                   - protocol version
 o=sdaros 2890844526 2890842807 IN IP4 10.1.2.11       - owner information
 s=Content Networking Fundamentals                     - session name
 i=A Seminar on Content Networking Concepts            - session description
 u=http://www.cisco.com/cn.pdf                         - URL of user-friendly description
 e=sdaros@ssdl.com (Silvano Da Ros)                    - owner e-mail
 c=IN IP4 10.1.3.11/127                                - IP address of
 t=2873397496 2873404696                               - time range of session
 a=recvonly                                            - direction of the session
 m=audio 3456 RTP/AVP 0                                - media types and port numbers
 m=video 2232 RTP/AVP 31
 m=whiteboard 32416 UDP WB
 a=orient:portrait                                     - Orientation = landscape or portrait.
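As a rough illustration of how a client might read a description such as Example 9-4, the following Python sketch splits an SDP file into session-level fields and per-media sections. The function name parse_sdp and the dictionary layout are hypothetical. The sketch also assumes that the annotations to the right of each line in the example are removed first, and that each m= line carries a single port number.

```python
def parse_sdp(text: str) -> dict:
    """Group SDP lines into session-level fields and per-media sections."""
    session = {"fields": [], "media": []}
    current = session["fields"]  # lines before the first m= are session-level
    for raw in text.splitlines():
        line = raw.strip()
        if len(line) < 2 or line[1] != "=":
            continue  # skip blank or malformed lines
        key, value = line[0], line[2:]
        if key == "m":
            # m=<media> <port> <transport> <format list>
            kind, port, proto, fmt = value.split(maxsplit=3)
            media = {"kind": kind, "port": int(port), "proto": proto,
                     "format": fmt, "fields": []}
            session["media"].append(media)
            current = media["fields"]  # later lines belong to this media
        else:
            current.append((key, value))
    return session
```

Applied to the description in Example 9-4, a parser of this kind would report three media sections (audio, video, and whiteboard) along with their port numbers, which the client then uses when it sets up the session.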

Note

If you are using RealNetworks or WMT, you can package the SDP file directly into the proprietary container file headers.

When the client receives the SDP file, it sends a SETUP method to the server to initialize the requested session. The SETUP includes the transport and the port numbers that the client requires.

 SETUP rtsp://ssdl.com/test.asf RTSP/1.0
 CSeq: 101
 Transport: RTP/AVP;unicast;client_port=2301-3202

The server responds with an RTSP 200 OK that includes a sequence number for the current method, and an identifier for the client to use in subsequent RTSP methods as a reference to the session. The server confirms the transport (RTP/AVP;unicast) and client ports (client_port), and informs the client as to the server ports (server_port) to use for the RTP connection, within the Transport: RTSP header.

 RTSP/1.0 200 OK
 CSeq: 101
 Date: 23 Aug 2005 15:35:06 GMT
 Session: 47112344
 Transport: RTP/AVP;unicast;
   client_port=4589-4589;server_port=6256-6257
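The SETUP exchange above can be sketched in Python as plain text handling. The helper names build_request, parse_response, and transport_params are hypothetical, and the sketch assumes HTTP-style messages with CRLF line endings and unfolded headers; it is not a complete RTSP client.

```python
def build_request(method: str, url: str, cseq: int, headers=None) -> str:
    """Format an RTSP/1.0 request with a CSeq header and optional extras."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_response(text: str):
    """Return (status code, reason phrase, headers keyed by lowercase name)."""
    head = text.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


def transport_params(value: str) -> dict:
    """Split a Transport: header value into its semicolon-separated parts."""
    params = {}
    for part in value.split(";"):
        name, sep, val = part.strip().partition("=")
        params[name] = val if sep else True
    return params
```

A client would keep the Session value from the parsed response and include it as a header in each later request, which is how the server ties those requests to the session.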

By keeping track of session state, a participant may send many RTSP messages over short-lived TCP connections, throughout the timeline of the presentation. By providing the session identifier in every RTSP request or response, an RTSP session can span multiple TCP connections. RTSP supports pipelining as well and works with either unicast or multicast.

The client then sends a PLAY method notifying the server that it should start the RTP session. In this example, the server sends the three streams that are indicated in the SDP file over three independent UDP streams. The client and server use a fourth RTCP TCP stream to synchronize the three streams. The client and server establish these four streams transparently to RTSP. The client includes the time range of the session that it wishes to view in the PLAY method.

 PLAY rtsp://ssdl.com/test.asf RTSP/1.0
 CSeq: 102
 Session: 47112344
 Range: npt=1-40

Note

RTSP reuses HTTP Basic and Digest authentication to authenticate users.

The server responds with a 200 OK indicating to the client that it should initiate the three RTP UDP streams and the single RTCP TCP connection to the server.

 RTSP/1.0 200 OK
 CSeq: 102

When the client wants to stop or pause the session, it sends a TEARDOWN or PAUSE method to the server. Figure 9-4 illustrates the finite state machine (FSM) that RTSP uses to transition between idle, ready, and playing/recording.

Figure 9-4. RTSP State Diagram
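A state machine of this kind can be sketched in Python as a lookup table. The transitions listed here are an assumption based on the client state machine in RFC 2326, using the state names idle, ready, playing, and recording; they are not copied from Figure 9-4, and the function names are hypothetical.

```python
# (current state, method) -> next state; a simplified subset of transitions
TRANSITIONS = {
    ("idle", "SETUP"): "ready",
    ("ready", "PLAY"): "playing",
    ("ready", "RECORD"): "recording",
    ("playing", "PAUSE"): "ready",
    ("recording", "PAUSE"): "ready",
    ("ready", "TEARDOWN"): "idle",
    ("playing", "TEARDOWN"): "idle",
    ("recording", "TEARDOWN"): "idle",
}


def next_state(state: str, method: str) -> str:
    """Return the state reached after a successful method, or raise."""
    try:
        return TRANSITIONS[(state, method)]
    except KeyError:
        raise ValueError(f"{method} is not valid in state {state}") from None


def run(methods, state: str = "idle") -> str:
    """Apply a sequence of methods and return the final state."""
    for method in methods:
        state = next_state(state, method)
    return state
```

Keeping the transitions in a table makes an out-of-order request, such as PLAY before SETUP, fail explicitly instead of being silently ignored.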

Note

RTSP supports cache control mechanisms in a similar manner to HTTP.

If your clients are behind a Cisco PIX firewall performing NAT, you can use the PIX application recognition (fixup) feature to rewrite private NAT addresses in the payload with registered public IP addresses. To enable RTSP fixup, use the command fixup protocol rtsp. If instead your Cisco IOS router is performing NAT, you can use NBAR to recognize/rewrite RTSP. To enable NAT RTSP support on your router, use the ip nat service rtsp port global configuration command.

Fast-Forwarding and Rewinding a Stream with RTSP

RTSP clients can use the Scale: header in conjunction with the PLAY method to indicate the speed and direction in which the server should stream the media to the client. For example, if the client sets the header to Scale: 2, it receives the stream at twice the normal rate. With Scale: 0.5, the server sends the stream at half the normal rate. Negative Scale: values instruct the server to deliver the stream in the reverse direction. That is, the client requests to rewind the stream.
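One way to picture the effect of the Scale: header is to track the position in the media as wall-clock time passes. The following Python sketch assumes a simple linear model in which the position advances by the scale value for every elapsed second and stops at the start of the stream; the function name media_position is hypothetical.

```python
def media_position(start_seconds: float, scale: float, elapsed_seconds: float) -> float:
    """Position in the media after elapsed wall-clock time at a given scale.

    A negative scale moves backward; the position never goes below zero.
    """
    return max(0.0, start_seconds + scale * elapsed_seconds)
```

Under this model, starting 60 seconds into the stream and playing for 10 seconds reaches the 80-second mark with Scale: 2, the 65-second mark with Scale: 0.5, and the 50-second mark with Scale: -1.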

The client also uses the Range: header in the PLAY method to seek through various parts of the stream. The Range: header takes three different timestamp types as parameters.

Normal Play Time (NPT): NPT starts at time zero and increases as the session proceeds. NPT is equivalent to the time you see on your VCR or DVD display while watching a movie. For example, 00:19:03.3 indicates the session has progressed for just over 19 minutes (19 minutes and 3.3 seconds).
Society of Motion Picture and Television Engineers (SMPTE) relative time codes: SMPTE relative time codes enable direct labeling of the 30 individual frames sampled every second during a video stream. For example, 00:19:02:20 indicates the 20th frame of second 2 in minute 19.
Absolute time: Absolute time expresses the range values in Greenwich Mean Time (GMT), or Coordinated Universal Time (UTC) as specified by ISO 8601 timestamps. For example, 20061016T144523.45Z expresses the timestamp from October 16, 2006, at 14:45:23.45 GMT.

Table 9-6 gives samples of each timestamp type.

Table 9-6. Samples of NPT, SMPTE, and UTC Timecodes
Timestamp Type	Sample Ranges	Description of Seek
NPT	Range: npt=00:07:33- Range: npt=00:07:33-00:18:23	Start at 00:07:33; with no end-time specified. Start at 00:07:33; end at 00:18:23
SMPTE	Range: smpte=00:07:33:15-	Start at frame 15 of 00:07:33; with no end-time specified.
UTC	Range: clock=20061016T144523.45Z-20061016T145013.15Z	Start October 16, 2006, at 14:45:23.45 GMT; End October 16, 2006, at 14:50:13.15 GMT
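As a small illustration of the NPT form, the following Python sketch converts the value of a Range: header into start and end times in seconds. The function names are hypothetical. The sketch handles only the plain-seconds and hh:mm:ss forms of NPT, and does not cover the now keyword, SMPTE time codes, or absolute time.

```python
def npt_to_seconds(value: str) -> float:
    """Convert an NPT time (plain seconds or hh:mm:ss[.fraction]) to seconds."""
    if ":" not in value:
        return float(value)
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_npt_range(header_value: str):
    """Return (start, end) in seconds; end is None for an open-ended range."""
    unit, _, span = header_value.partition("=")
    if unit.strip() != "npt":
        raise ValueError("only npt ranges are handled in this sketch")
    start, _, end = span.strip().partition("-")
    return (npt_to_seconds(start) if start else None,
            npt_to_seconds(end) if end else None)
```

For example, the first NPT sample in Table 9-6 has no end time, so the end value comes back as None and the server plays until the end of the stream or until the client pauses it.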

Using Quality of Service and IP Multicast with Streaming Media

Streaming video normally produces constant data rates with frames arriving at constant intervals. However, as you learned in Chapter 6, "Ensuring Content Delivery with Quality of Service," networks can introduce packet loss, packet delay, and jitter between frame inter-arrival times. To reduce the effect of these network-related issues on your applications, you should use the following QoS features that you learned in Chapter 6, to prioritize your streaming media applications over less-critical applications.

Classifying and Marking Streaming Applications: You can classify and mark your streaming content using the schemes you learned in Chapter 6. For example, you can use the RTP, RTCP, and RTSP protocol classification mechanism in NBAR to classify and mark streaming media packets. You can then apply CBWFQ to the marked packets to give priority to the streaming applications in your network.
Class-Based Weighted Fair Queuing (CBWFQ) with Low Latency Queuing (LLQ): You can configure a number of queues for your marked streaming protocols, such as RTSP and RTCP, and specify the overall bandwidth of each application. You also can allocate a priority queue (PQ) to your transport protocols, such as RTP and RTCP, to ensure that streaming content has priority over less-critical traffic.
Resource Reservation Protocol (RSVP) and Traffic Policing: Your eyes are not able to notice frame discontinuity, or flicker, at rates greater than 30 frames/sec (fps). Indeed, movies that you watch in theaters display at 24 fps. Encoding video frames at rates between 10 and 30 fps can deliver data rates ranging from 40 kbps to over 1 Mbps per stream. If your streaming applications support RSVP, you can enable RSVP in your network to allow receivers to signal their desired resource requirements per streaming flow. Alternatively, you can enable traffic policing to give flows appropriate levels of bandwidth to avoid having other less critical applications flood your WAN links.
Note

To reduce the amount of bandwidth required by RTP on serial links of 2 Mbps or less, you can compress RTP headers using the ip rtp header-compression configuration command.
Layer 2 QoS: To ensure that your Layer 2 network does not drop audio/video traffic, you can enable the Layer 2 QoS features you learned in Chapter 6.

Most streaming applications are unidirectional, so you can enable PIM-SM to scale the application on your network. However, if your application requires bidirectional communication, you can enable Bidir-PIM, as you learned in Chapter 5, "IP Multicast Content Delivery."
