Generating RTP Packets


As compressed frames are generated, they are passed to the RTP packetization routine. Each frame has an associated timestamp, from which the RTP timestamp is derived. If the payload format supports fragmentation, large frames are fragmented to fit within the maximum transmission unit of the network (this is typically needed only for video). Finally, one or more RTP packets are generated for each frame, each including media data and any required payload header. The format of the media packet and payload header is defined by the payload format specification for the codec used. The critical parts of the packet generation process are assigning timestamps to frames, fragmenting large frames, and generating the payload header. These issues are discussed in more detail in the sections that follow.
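The last step of packetization is assembling the RTP header, any payload header, and the media data into a packet. The following is a minimal sketch, assuming the 12-octet fixed RTP header defined in RFC 3550 with no padding, no header extension, and no CSRC list. The function name build_rtp_packet and its parameters are illustrative and not taken from any particular library.

```python
import struct

RTP_VERSION = 2


def build_rtp_packet(payload_type: int, seq: int, timestamp: int, ssrc: int,
                     payload: bytes, marker: bool = False,
                     payload_header: bytes = b"") -> bytes:
    """Build an RTP packet using the 12-octet fixed header of RFC 3550.

    Assumes no padding, no header extension, and no CSRC list.
    The payload header (if the payload format defines one) sits between
    the RTP header and the media data.
    """
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type must fit in 7 bits")
    first = RTP_VERSION << 6                  # V=2, P=0, X=0, CC=0
    second = (int(marker) << 7) | payload_type
    header = struct.pack("!BBHII", first, second,
                         seq & 0xFFFF,             # 16-bit sequence number
                         timestamp & 0xFFFFFFFF,   # 32-bit media timestamp
                         ssrc & 0xFFFFFFFF)        # 32-bit source identifier
    return header + payload_header + payload
```

For example, build_rtp_packet(96, 1, 3003, 0x12345678, b"abc", marker=True) yields a 15-octet packet whose first two octets are 0x80 and 0xE0.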

In addition to the RTP data packets that directly represent the media frames, the sender may generate error correction packets and may reorder frames before transmission. These processes are described in Chapters 8, Error Concealment, and 9, Error Correction. After the RTP packets have been sent, the buffered media data corresponding to those packets is eventually freed. The sender must not discard data that might be needed for error correction or in the encoding process. This requirement may mean that the sender must buffer data for some time after the corresponding packets have been sent, depending on the codec and error correction scheme used.
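One way to honor the requirement that the sender keep data it may still need is a bounded history of recently sent packets. The sketch below assumes a hypothetical error correction scheme that only ever references the most recent `depth` packets; the class name SentPacketHistory is illustrative. Real schemes may need time-based or codec-driven retention instead.

```python
from collections import OrderedDict


class SentPacketHistory:
    """Hypothetical retention buffer for packets that have been sent.

    Keeps the most recent `depth` packets so that an error correction
    scheme can still reference them after transmission; older packets
    are released automatically.
    """

    def __init__(self, depth: int):
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self.depth = depth
        self._packets = OrderedDict()   # sequence number -> packet bytes

    def add(self, seq: int, packet: bytes) -> None:
        # Drop any stale entry first so a reused (wrapped) sequence
        # number is treated as the newest packet.
        self._packets.pop(seq, None)
        self._packets[seq] = packet
        # Release the oldest packets once the retention depth is exceeded.
        while len(self._packets) > self.depth:
            self._packets.popitem(last=False)

    def get(self, seq: int):
        """Return the retained packet, or None if it has been released."""
        return self._packets.get(seq)

    def __len__(self) -> int:
        return len(self._packets)
```

The depth is a trade-off: a larger history supports stronger repair at the cost of memory.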

Timestamps and the RTP Timing Model

The RTP timestamp represents the sampling instant of the first octet of data in the frame. It starts from a random initial value and increments at a media-dependent rate.
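Because the timestamp is a 32-bit field, it wraps around, and all arithmetic on it is modulo 2^32. A minimal sketch of a sender-side media clock follows, assuming a random initial value as recommended by RFC 3550; the class name RtpTimestampClock is hypothetical.

```python
import secrets


class RtpTimestampClock:
    """Sketch of an RTP media clock: random 32-bit start, modular increments."""

    def __init__(self, clock_rate: int, initial=None):
        self.clock_rate = clock_rate  # nominal media clock rate, in Hz
        # Start from a random 32-bit value unless one is supplied (e.g. for tests).
        if initial is None:
            self.value = secrets.randbits(32)
        else:
            self.value = initial & 0xFFFFFFFF

    def advance(self, ticks: int) -> int:
        """Advance by `ticks` media-clock units, wrapping modulo 2**32."""
        self.value = (self.value + ticks) & 0xFFFFFFFF
        return self.value
```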

During capture of a live media stream, the sampling instant is simply the time when the media is captured from the video frame grabber or audio sampling device. If the audio and video are to be synchronized, care must be taken to ensure that the processing delay in the different capture devices is accounted for, but otherwise the concept is straightforward. For most audio payload formats, the RTP timestamp increment for each frame is equal to the number of samples (not octets) read from the capture device. A common exception is MPEG audio, including MP3, which uses a 90kHz media clock for compatibility with other MPEG content. For video, the RTP timestamp is incremented by a nominal per-frame value for each frame captured, depending on the clock and frame rate. The majority of video formats use a 90kHz clock because that gives integer timestamp increments for common video formats and frame rates. For example, if sending at the NTSC standard rate of (approximately) 29.97 frames per second using a payload format with a 90kHz clock, the RTP timestamp is incremented by exactly 3,003 per frame.
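The increments above are simply clock ticks per frame. The worked arithmetic below uses exact fractions to show why a 90kHz clock gives integer steps; it assumes the NTSC rate is exactly 30000/1001 frames per second and that the audio clock rate equals its sampling rate. The helper name timestamp_increment is hypothetical.

```python
from fractions import Fraction


def timestamp_increment(clock_rate: int, frame_duration: Fraction) -> Fraction:
    """Nominal RTP timestamp increment per frame: clock ticks per frame."""
    return Fraction(clock_rate) * frame_duration


# Audio: 20 ms frames of 8 kHz audio; clock rate equals the sampling rate.
audio_step = timestamp_increment(8000, Fraction(20, 1000))     # 160 samples
# Video: NTSC at 30000/1001 (about 29.97) frames per second on a 90 kHz clock.
ntsc_step = timestamp_increment(90000, Fraction(1001, 30000))  # exactly 3003
# Video: 25 frames per second on a 90 kHz clock.
pal_step = timestamp_increment(90000, Fraction(1, 25))          # exactly 3600
```

A clock rate that did not divide evenly would force the sender to alternate between two increments to keep the average correct.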

For prerecorded content streamed from a file, the timestamp gives the time of the frame in the playout sequence, plus a constant random offset. As noted in Chapter 4, RTP Data Transfer Protocol, the clock from which the RTP timestamp is derived must increase in a continuous and monotonic fashion irrespective of seek operations or pauses in the presentation. This means that the timestamp does not always correspond to the time offset of the frame from the start of the file; rather, it measures time along the playback timeline since playback started.
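One way to keep timestamps continuous across seeks and pauses is to re-anchor the mapping from file position to playback time whenever playback (re)starts. The sketch below assumes the application tracks playback time in seconds since playback began; the class PlaybackTimestamper and its methods are hypothetical.

```python
from fractions import Fraction


class PlaybackTimestamper:
    """Hypothetical mapper from file positions to RTP timestamps.

    Timestamps follow the playback timeline (time since playback began),
    not the position in the file, so they stay continuous and monotonic
    across pauses and seeks.
    """

    def __init__(self, clock_rate: int, offset: int):
        self.clock_rate = clock_rate
        self.offset = offset                # random initial value
        self.anchor_playback = Fraction(0)  # playback time (s) at last (re)start
        self.anchor_media = Fraction(0)     # file position (s) at last (re)start

    def restart(self, playback_time: Fraction, media_position: Fraction) -> None:
        """Call when playback resumes after a pause or a seek."""
        self.anchor_playback = playback_time
        self.anchor_media = media_position

    def timestamp(self, media_position: Fraction) -> int:
        """RTP timestamp for the frame at `media_position` seconds into the file."""
        playback = self.anchor_playback + (media_position - self.anchor_media)
        return (self.offset + int(playback * self.clock_rate)) & 0xFFFFFFFF
```

For example, after playing the first 10 seconds, pausing for 5 seconds, and seeking to 60 seconds, a call to restart(Fraction(15), Fraction(60)) makes the frame at 60 seconds carry the offset plus 15 seconds of clock ticks, not 60.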

Timestamps are assigned per frame. If a frame is fragmented into multiple RTP packets, each of the packets making up the frame will have the same timestamp.

The RTP specification makes no guarantee as to the resolution, accuracy, or stability of the media clock. The sender is responsible for choosing an appropriate clock, with sufficient accuracy and stability for the chosen application. The receiver knows the nominal clock rate but typically has no other knowledge regarding the precision of the clock. Applications should be robust to variability in the media clock, both at the sender and at the receiver, unless they have specific knowledge to the contrary.

The timestamps in RTP data packets and in RTCP sender reports represent the timing of the media at the sender: the timing of the sampling process, and the relation between the sampling process and a reference clock. A receiver is expected to reconstruct the timing of the media from this information. Note that the RTP timing model says nothing about when the media data is to be played out. The timestamps in data packets give the relative timing, and RTCP sender reports provide a reference for interstream synchronization, but RTP says nothing about the amount of buffering that may be needed at the receiver, or about the decoding time of the packets.
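As an illustration of how a receiver can recover the sender's timing, the sketch below maps an RTP timestamp onto the sender's reference clock using the (NTP time, RTP timestamp) pair from an RTCP sender report. It assumes the NTP timestamp has already been converted to seconds as a float; the function name is hypothetical. Note that the result is the sender's sampling time, not a playout time; choosing the playout delay remains up to the receiver.

```python
def rtp_to_sender_time(rtp_ts: int, sr_rtp_ts: int, sr_ntp_seconds: float,
                       clock_rate: int) -> float:
    """Map an RTP timestamp onto the sender's reference clock.

    Uses the (NTP time, RTP timestamp) pair from the most recent RTCP
    sender report. The difference is taken modulo 2**32 and interpreted
    as signed, so it works across timestamp wraparound.
    """
    diff = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return sr_ntp_seconds + diff / clock_rate
```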

Although the timing model is well defined by RTP, the specification makes no mention of the algorithms used to reconstruct the timing at a receiver. This is intentional: The design of playout algorithms depends on the needs of the application and is an area where vendors may differentiate their products.

Fragmentation

Frames that exceed the network maximum transmission unit (MTU) must be fragmented into several RTP packets before transmission, as shown in Figure 6.4. Each fragment has the timestamp of the frame and may have an additional payload header to describe the fragment.
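The simplest form of fragmentation splits a frame at fixed byte offsets. The sketch below assumes IPv4 (20-octet header, no options), UDP (8 octets), and an RTP header with no CSRCs or extension (12 octets). Each fragment carries the frame's timestamp, and the last fragment is flagged because many video payload formats set the RTP marker bit on the last packet of a frame. The function name fragment_frame is hypothetical.

```python
def fragment_frame(frame: bytes, timestamp: int, mtu: int = 1500,
                   payload_header_len: int = 0):
    """Split one media frame into fragments that fit within the MTU.

    Returns (timestamp, data, is_last) tuples. Every fragment carries the
    same timestamp, since timestamps are assigned per frame.
    """
    # Room left for media data after IPv4, UDP, RTP, and payload headers.
    max_payload = mtu - 20 - 8 - 12 - payload_header_len
    if max_payload <= 0:
        raise ValueError("MTU too small for headers")
    chunks = [frame[i:i + max_payload]
              for i in range(0, len(frame), max_payload)] or [b""]
    return [(timestamp, chunk, i == len(chunks) - 1)
            for i, chunk in enumerate(chunks)]
```

With the default 1500-octet MTU, a 3000-octet frame becomes fragments of 1460, 1460, and 80 octets, all with the same timestamp.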

Figure 6.4. Fragmentation of a Media Frame into Several RTP Packets

graphics/06fig04.gif

The fragmentation process is critical to the quality of the media in the presence of packet loss. The ability to decode each fragment independently is desirable; otherwise loss of a single fragment will result in the entire frame being discarded (a loss multiplier effect we wish to avoid). Payload formats that may require fragmentation typically define rules by which the payload data may be split in appropriate places, along with payload headers to help the receiver use the data in the event of some fragments being lost. These rules require support from the encoder to generate fragments that both obey the packing rules of the payload format and fit within the network MTU.
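When the encoder emits independently decodable units (for example, video slices), the sender can split packets only at unit boundaries. The sketch below greedily packs such units into payloads that fit a given size; it assumes the payload format allows several whole units in one packet, which real formats may restrict or may require extra headers for. The function name pack_units is hypothetical.

```python
def pack_units(units, max_payload: int):
    """Greedily pack independently decodable units into packet payloads.

    `units` is a list of byte strings, each assumed to be decodable on its
    own. Packets are split only at unit boundaries, so losing one packet
    does not make the others useless.
    """
    packets, current, size = [], [], 0
    for unit in units:
        if len(unit) > max_payload:
            # The encoder must produce smaller units; we refuse to split one.
            raise ValueError("unit exceeds the payload size limit")
        if current and size + len(unit) > max_payload:
            packets.append(b"".join(current))
            current, size = [], 0
        current.append(unit)
        size += len(unit)
    if current:
        packets.append(b"".join(current))
    return packets
```

Units of 600, 600, 600, and 300 octets with a 1400-octet limit become two packets of 1200 and 900 octets.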

If the encoder cannot produce appropriately sized fragments, the sender may have to use an arbitrary fragmentation. Fragmentation can be accomplished by the application at the RTP layer, or by the network using IP fragmentation. If some fragments of an arbitrarily fragmented frame are lost, it is likely that the entire frame will have to be discarded, significantly impairing quality (Handley and Perkins [33] describe these issues in more detail).
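The loss multiplier can be quantified. Assuming each packet is lost independently with probability p, and that losing any one of n fragments forces the frame to be discarded, the frame is unusable with probability 1 - (1 - p)^n. The function name below is illustrative.

```python
def frame_loss_probability(packet_loss: float, fragments: int) -> float:
    """Probability that an arbitrarily fragmented frame must be discarded.

    Assumes independent packet loss and that losing any single fragment
    makes the whole frame unusable.
    """
    return 1.0 - (1.0 - packet_loss) ** fragments
```

At 5% packet loss, a frame sent in a single packet is lost 5% of the time, but a frame split into 4 arbitrary fragments is lost about 18.5% of the time, and one split into 10 fragments about 40% of the time.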

When multiple RTP packets are generated for each frame, the sender must choose between sending the packets in a single burst and spreading their transmission across the framing interval. Sending the packets in a single burst reduces the end-to-end delay but may overwhelm the limited buffering capacity of the network or receiving host. For this reason it is recommended that the sender spread the packets out in time across the framing interval. This issue is important mostly for high-rate senders, but it is good practice for other implementations as well.
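A simple way to spread transmission is to space a frame's packets evenly across its framing interval. The sketch below only computes send times; the function name pacing_schedule is hypothetical, and a real sender would also have to cope with timer granularity and scheduling jitter.

```python
def pacing_schedule(frame_start: float, frame_interval: float, num_packets: int):
    """Spread a frame's packets evenly across its framing interval.

    Returns one send time (in seconds) per packet. The first packet goes
    out at the start of the interval instead of bursting all packets at once.
    """
    if num_packets < 1:
        raise ValueError("need at least one packet")
    gap = frame_interval / num_packets
    return [frame_start + i * gap for i in range(num_packets)]
```

For a 25 frames-per-second stream (a 40 ms framing interval) and 4 packets per frame, the packets are sent 10 ms apart. The cost is up to one framing interval of added delay for the last packet of each frame.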

Payload Format-Specific Headers

In addition to the RTP header and the media data, packets often contain an additional payload-specific header. This header is defined by the RTP payload format specification in use, and provides an adaptation layer between RTP and the codec output.

Typical uses of the payload header are to adapt codecs that were not designed for use over lossy packet networks to work on IP, to otherwise provide error resilience, or to support fragmentation. Well-designed payload headers can greatly enhance the performance of a payload format, and implementers should take care both to generate these headers correctly and to use the information they carry to repair the effects of packet loss at the receiver.
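As a concrete example of a simple payload header, the MPEG audio payload format of RFC 2250 prefixes each packet's data with 4 octets: 16 bits that must be zero, followed by a 16-bit byte offset of the packet's data within the audio frame. The offset lets a receiver place a fragment even if earlier fragments were lost. The sketch below builds such payloads; the function name mpeg_audio_payloads is hypothetical.

```python
import struct


def mpeg_audio_payloads(frame: bytes, max_data: int):
    """Split one MPEG audio frame into RTP payloads with the RFC 2250
    MPEG audio-specific header: 16 must-be-zero bits, then the 16-bit
    byte offset of this piece within the frame.
    """
    if max_data < 1:
        raise ValueError("max_data must be positive")
    payloads = []
    for offset in range(0, len(frame), max_data):
        if offset > 0xFFFF:
            raise ValueError("fragment offset does not fit in 16 bits")
        header = struct.pack("!HH", 0, offset)  # MBZ, Frag_offset
        payloads.append(header + frame[offset:offset + max_data])
    return payloads
```

A 2560-octet frame with max_data of 1000 yields three payloads whose headers carry offsets 0, 1000, and 2000.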

The section titled Payload Headers in Chapter 4, RTP Data Transfer Protocol, discusses the use of payload headers in detail.



RTP
RTP: Audio and Video for the Internet
ISBN: 0672322498
EAN: 2147483647
Year: 2003
Pages: 108
Authors: Colin Perkins

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net