The RTP Data Transfer Packet


The format of an RTP data transfer packet is illustrated in Figure 4.2. There are four parts to the packet:

  1. The mandatory RTP header

  2. An optional header extension

  3. An optional payload header (depending on the payload format used)

  4. The payload data itself

Figure 4.2. An RTP Data Transfer Packet

graphics/04fig02.gif

The entire RTP packet is contained within a lower-layer payload, typically UDP/IP.

Header Elements

The mandatory RTP data packet header is typically 12 octets in length, although it may contain a contributing source list, which can expand the length by 4 to 60 additional octets. The fields in the mandatory header are the payload type, sequence number, time-stamp, and synchronization source identifier. In addition, there is a count of contributing sources, a marker for interesting events, support for padding and a header extension, and a version number.
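As a concrete illustration of this layout, the following sketch parses the fixed header from a network buffer. It is not taken from the RTP specification; the struct and function names are invented for this example, and a real implementation would perform further validity checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parsed fields of the mandatory 12-octet RTP header. */
struct rtp_header {
    unsigned version, padding, extension, csrc_count, marker, payload_type;
    uint16_t sequence;
    uint32_t timestamp, ssrc;
};

/* Returns 0 on success, -1 if the buffer is too short or the
 * version is not 2. */
static int rtp_parse_header(const uint8_t *buf, size_t len,
                            struct rtp_header *h)
{
    if (len < 12)
        return -1;                         /* shorter than the fixed header */
    h->version      = (buf[0] >> 6) & 0x03;
    h->padding      = (buf[0] >> 5) & 0x01;
    h->extension    = (buf[0] >> 4) & 0x01;
    h->csrc_count   =  buf[0]       & 0x0f;
    h->marker       = (buf[1] >> 7) & 0x01;
    h->payload_type =  buf[1]       & 0x7f;
    h->sequence     = (uint16_t)((buf[2] << 8) | buf[3]);
    h->timestamp    = ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
                      ((uint32_t)buf[6] <<  8) |  (uint32_t)buf[7];
    h->ssrc         = ((uint32_t)buf[8] << 24) | ((uint32_t)buf[9] << 16) |
                      ((uint32_t)buf[10] << 8) |  (uint32_t)buf[11];
    if (h->version != 2)
        return -1;                         /* part of a packet validity check */
    if (len < 12u + 4u * h->csrc_count)
        return -1;                         /* CSRC list overruns the buffer */
    return 0;
}
```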

PAYLOAD TYPE
graphics/04inf01.gif

The payload type, or PT, field of the RTP header identifies the media transported by an RTP packet. The receiving application examines the payload type to determine how to treat the data: for example, passing it to a particular decompressor. The exact interpretation of the payload type field is defined by an RTP profile, which binds the payload type numbers to payload format specifications, or by a non-RTP means.

Many applications operate under the RTP profile for audio and video conferences with minimal control (RFC 1890). 7 This profile (commonly called the audio/video profile) defines a table of default mappings between the payload type number and payload format specifications. Examples of these static assignments are shown in Table 4.1 (this is not a complete list; the profile defines additional assignments). In addition to the static assignments, out-of-band signaling (for example, using SIP, RTSP, SAP, or H.323) may be used to define the mapping. Payload types in the range 96 to 127 are reserved for dynamic assignment in this manner when the audio/video profile is being used; other profiles may specify different ranges.

Payload formats are named in terms of the MIME namespace. This namespace was originally defined for e-mail, to identify the content of attachments, but it has since become a general namespace for media formats and is used in many applications. The use of MIME types with RTP is relatively new (payload type names originally occupied a separate namespace), but it is a powerful feature, providing a central repository of transport and encoding options for each type of media.

Table 4.1. Examples of Static Payload Type Assignments

Payload Type Number   Payload Format   Specification   Description

0                     AUDIO/PCMU       RFC 1890        ITU G.711 µ-law audio
3                     AUDIO/GSM        RFC 1890        GSM full-rate audio
8                     AUDIO/PCMA       RFC 1890        ITU G.711 A-law audio
12                    AUDIO/QCELP      RFC 2658        PureVoice QCELP audio
14                    AUDIO/MPA        RFC 2250        MPEG audio (e.g., MP3)
26                    VIDEO/JPEG       RFC 2435        Motion JPEG video
31                    VIDEO/H261       RFC 2032        ITU H.261 video
32                    VIDEO/MPV        RFC 2250        MPEG I/II video

All payload formats should now have a MIME type registration. Newer payload formats include it in their specification; a group registration for the older ones is in progress. 51 The complete list of MIME types is maintained online at http://www.iana.org/assignments/media-types.

Whether the payload type assignment is static or dynamic, it is necessary to describe the session to the application so that the application knows which payload types are to be used. A common means of describing sessions is the Session Description Protocol (SDP). 15 A sample session description might be as follows:

 v=0
 o=bloggs 2890844526 2890842807 IN IP4 10.45.1.82
 s=-
 e=j.bloggs@example.com (Joe Bloggs)
 c=IN IP4 224.2.17.12/127
 t=2873397496 2873404696
 m=audio 49170 RTP/AVP 0
 m=video 51372 RTP/AVP 98
 a=rtpmap:98 H263-1998/90000

Of interest in our discussion of RTP are the c= and m= lines, which communicate addresses and ports for the RTP session and define the profile and payload types in use, and the a=rtpmap: line, which makes a dynamic payload type assignment.

The example describes two RTP sessions: Audio is being sent to the IPv4 multicast group 224.2.17.12 on port 49170 with time-to-live 127, and video is being sent to the same multicast group on port 51372. Both audio and video use RTP/AVP as their transport; this is RTP transport using the RTP profile for audio and video conferences with minimal control. 7

The payload type used for audio is 0. This is a static assignment in the profile, the payload format for AUDIO/PCMU. The payload type for video is 98, which is mapped to the payload format for VIDEO/H263-1998 by the a=rtpmap: line. By referencing the table of MIME type assignments, we find that the definition of VIDEO/H263-1998 is in RFC 2429. 22
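Extracting such a dynamic mapping from an SDP line is mechanical. The following is a minimal sketch (the function name is invented for this example, and it ignores any optional encoding parameters, such as a channel count, that may follow the clock rate):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse "a=rtpmap:<payload type> <encoding name>/<clock rate>",
 * e.g. "a=rtpmap:98 H263-1998/90000". Returns 0 on success. */
static int parse_rtpmap(const char *line, int *pt, char enc[32],
                        unsigned *clock_rate)
{
    /* %31[^/] reads the encoding name up to the '/' separator */
    return sscanf(line, "a=rtpmap:%d %31[^/]/%u", pt, enc, clock_rate) == 3
        ? 0 : -1;
}
```

Applied to the example above, this yields payload type 98, encoding name H263-1998, and a 90,000-Hz clock.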

Although SDP is a common solution for describing RTP sessions, nothing in RTP requires SDP to be used. For example, applications based on ITU recommendation H.323 use RTP for their media transport but use a different mechanism (H.245) for describing sessions.

There has been some debate on the merits of static versus dynamic assignment of payload type numbers to payload formats, encouraged perhaps by the long list of static assignments in the audio/video profile and the perceived complexity of the signaling needed for dynamic assignments.

When RTP was new and its use was the subject of experimentation with simple payload formats, static payload type assignments made sense. A receiver could decode the RTP payload on the basis of the payload type number only, because the codecs required no additional configuration, and the absence of signaling simplified development of these new applications.

However, as designers have gained experience with RTP and applied it to more complex payload formats, it has become clear that the practice of making static assignments is flawed.

Most of the payload formats in use today require some configuration in addition to payload type assignment, requiring the use of signaling; and emerging applications, such as voice-over-IP and video-on-demand, require signaling for user location, authentication, and payment. Because signaling is required anyway, the incentive for static payload type assignment is lost.

Requiring dynamic assignment also avoids problems due to depletion of the payload type space; there are only 127 possible static assignments, and the number of possible payload formats far exceeds that. Dynamic assignment allows for only those formats needed for the duration of a session to be bound to payload type numbers.

Accordingly, the policy of the IETF Audio/Video Transport working group is that no more static assignments will be made, and that applications should signal their payload type usage out of band.

The choice of payload format has various other implications: It defines the rate of the RTP media clock, and the format of any payload header and the payload itself. For static assignments, the clock rate is specified in the profile; dynamic assignments must indicate the clock rate along with the mapping between payload type and payload format. For example, in the previous session description the a=rtpmap: line specifies a 90,000-Hz clock for the VIDEO/H263-1998 payload format. Most payload formats operate with a limited set of clock rates, with the payload format specification defining which rates are valid.

An RTP session is not required to use only a single payload format; multiple payload formats can be used within a session, with the different formats being identified by different payload types. The format can change at any time within a session, and as long as the mapping from payload type to payload format has been communicated in advance, there is no requirement for signaling before the change occurs. An example might be encoding of DTMF tones within a voice-over-IP session, to support the "Press 0 to speak to an operator" style of automated service, in which one format is used for speech and another for the tones.

Even though multiple payload formats may be used within a session, the payload type is not intended to be used to multiplex different classes of media. For example, if both audio and video are being sent by an application, they should be sent as two different RTP sessions, on different addresses/ports, rather than being sent as a single RTP session and demultiplexed by the payload type. This separation of media allows applications to request different network quality of service for the different media, and it is also required for correct operation of the RTP control protocol.

SEQUENCE NUMBER
graphics/04inf02.gif

The RTP sequence number is used to identify packets, and to provide an indication to the receiver if packets are being lost or delivered out of order. It is not used to schedule playout of the packets (that is the purpose of the timestamp), although it does allow the receiver to reconstruct the order in which packets were sent.

The sequence number is an unsigned 16-bit integer, which increases by one with each data packet sent and wraps around to zero when the maximum value is reached. An important consequence of the 16-bit space is that sequence number wrap-around happens relatively often: A typical voice-over-IP application sending audio in 20-millisecond packets will wrap the sequence number approximately every 20 minutes.
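The arithmetic behind that estimate is easily checked: at one packet every 20 milliseconds, the 65,536 packets of one sequence number cycle last roughly 21 minutes. A throwaway helper (invented for this example):

```c
#include <assert.h>

/* Minutes until the 16-bit sequence number wraps, given the
 * packet interval in milliseconds (truncating division). */
static unsigned seq_wrap_minutes(unsigned packet_interval_ms)
{
    return (65536u * packet_interval_ms) / 60000u;
}
```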

This means that applications should not rely on sequence numbers as unique packet identifiers. Instead, it is recommended that they use an extended sequence number, 32 bits or wider, to identify packets internally, with the lower 16 bits being the sequence number from the RTP packet and the upper 16 being a count of the number of times the sequence number has wrapped around:

 extended_seq_num = seq_num + (65536 * wrap_around_count) 

Because of possible packet loss or reordering, maintaining the wrap-around counter (wrap_around_count) is not a simple matter of incrementing a counter when the sequence number wraps to zero. The RTP specification has an algorithm for maintaining the wrap-around counter:

 uint16_t udelta = seq - max_seq;
 if (udelta < max_dropout) {
     if (seq < max_seq) {
         // The sequence number wrapped around
         wrap_around_count++;
     }
     max_seq = seq;
 } else if (udelta <= 65535 - max_misorder) {
     // The sequence number made a very large jump
     if (seq == bad_seq) {
         // Two sequential packets received; assume the
         // other side has restarted without telling us
         ...
     } else {
         bad_seq = seq + 1;
     }
 } else {
     // Duplicate or misordered packet
     ...
 }

Note that all calculations are done with modulo arithmetic and 16-bit unsigned quantities. Both seq and max_seq are the unextended sequence numbers from the RTP packets. The RTP specification recommends that max_misorder = 100 and max_dropout = 3000.

If the extended sequence number is calculated immediately on reception of a packet and used thereafter, most of the application can be made unaware of sequence number wrap-around. The ability to hide the wrap-around greatly simplifies loss detection and concealment, packet reordering, and maintenance of statistics. Unless the packet rate is very high, the wrap-around time for a 32-bit sequence number is such that most applications can ignore the possibility. For example, the voice-over-IP example given earlier will take over two years to wrap around the extended sequence number.
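Putting the pieces together, a simplified, runnable version of the extended sequence number bookkeeping might look as follows. This is a sketch, not the full algorithm from the RTP specification: the restart-detection branch using bad_seq is omitted, very large jumps are extended against the current cycle without updating state, and the names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DROPOUT  3000   /* values recommended by the RTP specification */
#define MAX_MISORDER  100

struct seq_state {
    uint16_t max_seq;   /* highest sequence number seen so far */
    uint32_t cycles;    /* wrap count, pre-shifted: count * 65536 */
};

static uint32_t extend_seq(struct seq_state *s, uint16_t seq)
{
    uint16_t udelta = (uint16_t)(seq - s->max_seq);  /* modulo-2^16 */
    if (udelta < MAX_DROPOUT) {
        /* in order, possibly after a small gap of lost packets */
        if (seq < s->max_seq)
            s->cycles += 0x10000;        /* the sequence number wrapped */
        s->max_seq = seq;
        return s->cycles + seq;
    }
    if (udelta > 65535u - MAX_MISORDER) {
        /* slightly old packet: reordered or duplicated; it may belong
         * to the previous cycle if the wrap has only just happened */
        if (seq > s->max_seq && s->cycles > 0)
            return (s->cycles - 0x10000) + seq;
        return s->cycles + seq;
    }
    /* very large jump: probably a restart; the full RTP algorithm
     * tracks bad_seq here to confirm before resynchronizing */
    return s->cycles + seq;
}
```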

If the packet rate is very high, a 32-bit extended sequence number may wrap around while the application is running. When designing applications for such environments, you must either use a larger extended sequence number (for example, 64 bits) to avoid the problem, or build the application to handle wrap-around by performing all calculations on sequence numbers using 32-bit modulo arithmetic. Incorrect operation during wrap-around of the sequence number is a common problem, especially when packets are lost or reordered around the time of wrap-around.

The initial value of the sequence number should be chosen randomly, rather than starting from zero. This precaution is intended to make known plain-text attacks on an encrypted RTP stream more difficult. Use of a random initial sequence number is important even if the source does not encrypt, because the stream may pass through an encrypting translator that is not known to the source, and adding the random offset in the translator is not trivial (because sequence numbers are reported in RTCP reception report packets; see Chapter 5, RTP Control Protocol). A common implementation problem is to assume that the sequence numbers start from zero; receivers should be able to play out a stream irrespective of the initial sequence number (this capability is also needed to handle late joins).

The sequence number should always follow a continuous sequence, increasing by one for each packet sent, and never jumping forward or backward (except for wrap-around, of course). This requirement should apply across changes in payload format regardless of how the media is generated. For example, when you're splicing together video clips, perhaps to insert advertisements, the RTP sequence number space must be continuous, and it must not be reset at the start of each clip. This has implications for the design of streaming media servers because they cannot rely on sequence numbers stored with a media file and must generate sequence numbers on the fly.

The primary use of the sequence number is loss detection. A gap in the sequence number space indicates to the receiver that it must take action to recover or conceal the missing data. This is discussed in more detail in Chapters 8, Error Concealment, and 9, Error Correction.

A secondary use of the sequence number is to allow reconstruction of the order in which packets were sent. A receiver does not necessarily care about this (because many payload formats allow for decoding of packets in any order), but sorting the packets into order as they are received may make loss detection easier. The design of playout buffer algorithms is discussed in more detail in Chapter 6, Media Capture, Playout, and Timing.

TIMESTAMP
graphics/04inf03.gif

The RTP timestamp denotes the sampling instant for the first octet of media data in a packet, and it is used to schedule playout of the media data. The timestamp is a 32-bit unsigned integer that increases at a media-dependent rate and wraps around to zero when the maximum value is exceeded. With typical video codecs, a clock rate of 90kHz is used, corresponding to a wrap-around of approximately 13 hours; with 8kHz audio the interval is approximately 6 days.
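Those wrap-around intervals follow directly from the clock rates; a small helper (invented for this example) makes the arithmetic explicit:

```c
#include <assert.h>

/* Seconds until the 32-bit timestamp wraps, for a given media
 * clock rate in hertz: 2^32 ticks divided by the rate. */
static unsigned ts_wrap_seconds(unsigned clock_hz)
{
    return (unsigned)(4294967296.0 / clock_hz);
}
```

For a 90kHz video clock this gives about 47,722 seconds (roughly 13 hours); for an 8kHz audio clock, about 536,871 seconds (roughly 6 days).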

The initial value of the timestamp is randomly chosen, rather than starting from zero. As with the sequence number, this precaution is intended to make known plain-text attacks on an encrypted RTP stream more difficult. Use of a random initial timestamp is important even if the source does not encrypt, because the stream may pass through an encrypting translator that is not known to the source. A common implementation problem is to assume that the timestamp starts from zero. Receivers should be able to play out a stream irrespective of the initial timestamp and be prepared to handle wrap-around; because the timestamp does not start at zero, a wrap-around could occur at any time.

Timestamp wrap-around is a normal part of RTP operation and should be handled by all applications. The use of extended timestamps, perhaps 64-bit values, can make most of the application unaware of the wrap-around. Extended timestamps are not recommended, though, because 64-bit arithmetic is often inefficient on today's processors.

A better design performs all timestamp calculations using 32-bit modulo arithmetic. This approach allows differences between timestamps to be calculated, provided that the packets compared are within half the timestamp space of each other.
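The modulo comparison described above can be written in one line in C; the subtraction is performed modulo 2^32, and reinterpreting the result as a signed value yields the correct sign even across a wrap (the function name is invented for this example):

```c
#include <assert.h>
#include <stdint.h>

/* Signed difference a - b between two RTP timestamps, valid as
 * long as the true difference is less than half the 32-bit
 * timestamp space (2^31 ticks). */
static int32_t ts_diff(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b);
}
```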

The timestamp is derived from a media clock that must increase in a linear and monotonic fashion (except for wrap-around, of course), producing a single timeline for each RTP session. This is true irrespective of the means by which the media stream is generated.

An example is useful to clarify the implications of the way in which the timestamp increases: When audio clips are spliced together within a single RTP session, the RTP timestamps must form a continuous sequence and must not be reset at the start of each clip. These requirements are illustrated in Figure 4.3, which shows that an RTP receiver cannot tell from the RTP headers that a change has occurred.

Figure 4.3. Formation of a Continuous Timestamp Sequence across Two Clips That Have Been Spliced Together

graphics/04fig03.gif

The same is true when a fast-forward or rewind operation occurs: The timestamps must form a continuous sequence and not jump around. This requirement is evident in the design of the Real-Time Streaming Protocol (RTSP), 14 which includes the concept of "normal play time" representing the time index into the stream. Because the continuity of RTP timestamps must be maintained, an RTSP server has to send an updated mapping between RTP timestamps and the normal play time during a seek operation.

The continuity of RTP timestamps has implications for the design of streaming media servers. The servers cannot rely on timestamps (or sequence numbers) stored with a media file but must generate them on the fly, taking into account seek operations within the media and the duration of any previous data that has been played out within the RTP session.

The requirement for a media clock that increases in a linear and monotonic fashion does not necessarily imply that the order in which media data is sampled is the order in which it is sent. After media frames have been generated (and hence have obtained their timestamps), they may be reordered before packetization. As a result, packets may be transmitted out of timestamp order, even though the sequence number order is maintained. The receiver has to reconstruct the timestamp order to play out the media.

An example is MPEG video, which contains both key frames and delta-encoded frames predicted from them, either forward in time (P-frames) or bidirectionally (B-frames). When B-frames are used, they are predicted in part from a later frame and hence must be delayed and sent out of order. The result is that the RTP stream will have non-monotonically increasing timestamps. 12 Another example is the use of interleaving to reduce the effects of burst loss (see the section titled Interleaving in Chapter 8, Error Concealment). In all cases, a single timeline, which the receiver must reconstruct to play out the media, is retained.

Timestamps on RTP packets are not necessarily unique within each wrap-around cycle. If two packets contain data from the same sampling instant, they will have the same timestamp. Duplication of timestamps typically occurs when a large video frame is split into multiple RTP packets for transmission (the packets will have different sequence numbers but the same timestamp).
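The relationship between the two fields during fragmentation can be sketched as follows. This is an illustration, with invented names: each packet of a frame takes the next sequence number, all share the frame's timestamp, and (per the audio/video profile for video) the marker bit is set only on the last packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct packet_hdr { uint16_t seq; uint32_t ts; int marker; };

/* Split one video frame of frame_len octets into packets of at
 * most max_payload octets each. Returns the packet count, or 0
 * if out_cap is too small. */
static size_t packetize_frame(size_t frame_len, size_t max_payload,
                              uint16_t *next_seq, uint32_t frame_ts,
                              struct packet_hdr *out, size_t out_cap)
{
    size_t n = (frame_len + max_payload - 1) / max_payload;  /* ceiling */
    if (n == 0 || n > out_cap)
        return 0;
    for (size_t i = 0; i < n; i++) {
        out[i].seq    = (*next_seq)++;   /* wraps modulo 2^16 */
        out[i].ts     = frame_ts;        /* same sampling instant */
        out[i].marker = (i == n - 1);    /* last packet of the frame */
    }
    return n;
}
```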

The nominal rate of the media clock used to generate timestamps is defined by the profile and/or payload format in use. For payload formats with static payload type assignments, the clock rate is implicit when the static payload type is used (it is specified as part of the payload type assignment). The dynamic assignment process must specify the rate along with the payload type (see the section titled Payload Type earlier in this chapter). The chosen rate must be sufficient to perform lip synchronization with the desired accuracy, and to measure variation in network transit time. The clock rate may not be chosen arbitrarily; most payload formats define one or more acceptable rates.

Audio payload formats typically use the sampling rate as their media clock, so the clock increases by one for each full sample read. There are two exceptions: MPEG audio uses a 90kHz clock for compatibility with non-RTP MPEG transport; and G.722, a 16kHz speech codec, uses an 8kHz media clock for backward compatibility with RFC 1890, which mistakenly specified 8kHz instead of 16kHz.

Video payload formats typically use a 90kHz clock, for compatibility with MPEG and because doing so yields integer timestamp increments for the typical 24Hz, 25Hz, 29.97Hz, and 30Hz frame rates and the 50Hz, 59.94Hz, and 60Hz field rates in widespread use today. Examples include PAL (Phase Alternating Line) and NTSC (National Television Standards Committee) television, plus HDTV (High-Definition Television) formats.

It is important to remember that RTP makes no guarantee as to the resolution, accuracy, or stability of the media clock (those properties are considered application dependent, and outside the scope of RTP), and in general, all that is known is its nominal rate. Applications should be able to cope with variability in the media clock, both at the sender and at the receiver, unless they have specific knowledge to the contrary.

In some cases it may be possible to define the resolution, accuracy, and stability of the media clock, and to use this knowledge to simplify application design. This is typically possible only when a single entity controls both sender and receiver, or when both are designed to a profile with a strict clock specification.

The process by which a receiver reconstructs the correct timing of a media stream based on the timestamps is described in Chapter 6, Media Capture, Playout, and Timing.

SYNCHRONIZATION SOURCE
graphics/04inf04.gif

The synchronization source (SSRC) identifies participants within an RTP session. It is an ephemeral, per-session identifier that is mapped to a long-lived canonical name, CNAME, through the RTP control protocol (see the section titled RTCP SDES: Source Description, in Chapter 5, RTP Control Protocol).

The SSRC is a 32-bit integer, chosen randomly by participants when they join the session. Having chosen an SSRC identifier, the participant uses it in the packets it sends out. Because SSRC values are chosen locally, two participants can select the same value. Such collisions may be detected when one application receives a packet from another that contains the SSRC identifier chosen for itself.

If a participant detects a collision between the SSRC it is using and that chosen by another participant, it must send an RTCP BYE for the original SSRC (see the section titled RTCP BYE: Membership Control, in Chapter 5, RTP Control Protocol) and select another SSRC for itself. This collision detection mechanism ensures that the SSRC is unique for each participant within a session.
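The collision-handling logic reduces to a small check on packet reception. The sketch below uses invented names; fresh_ssrc stands in for a value drawn from a high-quality random source, and sending the RTCP BYE is elided.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if a collision with our own SSRC was detected and
 * our identifier was replaced; 0 otherwise. */
static int handle_ssrc_collision(uint32_t *our_ssrc, uint32_t pkt_ssrc,
                                 uint32_t fresh_ssrc)
{
    if (pkt_ssrc != *our_ssrc)
        return 0;               /* no collision */
    /* ... send an RTCP BYE for *our_ssrc here ... */
    *our_ssrc = fresh_ssrc;     /* rechoose and continue */
    return 1;
}
```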

It is important that a high-quality source of randomness is used to generate the SSRC, and that collision detection is implemented. In particular, the seed for the random number generator should not be based on the time at which the session is joined or on the transport addresses of the session, because collisions can result if multiple participants join at once.

All packets with the same SSRC form part of a single timing and sequence number space, so a receiver must group packets by SSRC for playback. If a participant generates multiple streams in one RTP session ”for example, from separate video cameras ”each must be identified as a different SSRC so that the receivers can distinguish which packets belong to each stream.

CONTRIBUTING SOURCES
graphics/04inf05.gif

Under normal circumstances, RTP data is generated by a single source, but when multiple RTP streams pass through a mixer or translator, multiple data sources may have contributed to an RTP data packet. The list of contributing sources (CSRCs) identifies participants who have contributed to an RTP packet but were not responsible for its timing and synchronization. Each contributing source identifier is a 32-bit integer, corresponding to the SSRC of the participant who contributed to this packet. The length of the CSRC list is indicated by the CC field in the RTP header.

Packets containing a CSRC list are produced by the operation of an RTP mixer, as described later in this chapter, in the section titled Mixers. When receiving a packet containing a CSRC list, the SSRC is used to group packets for playout in the usual manner, and each CSRC is added to the list of known participants. Each participant identified by a CSRC will have a corresponding stream of RTP control protocol packets, providing fuller identification of the participant.

MARKER
graphics/04inf06.gif

The marker (M) bit in the RTP header is used to mark events of interest within a media stream; its precise meaning is defined by the RTP profile and media type in use.

For audio streams operating under the RTP profile for audio and video conferences with minimal control, the marker bit is set to one to indicate the first packet sent after a period of silence, and otherwise set to zero. A marker bit set to one serves as a hint to the application that this may be a good time to adjust its playout point, because a small variation in the length of a silence period is not usually noticeable to listeners (whereas a change in the playout point while audio is being played is audible).

For video streams operating under the RTP profile for audio and video conferences with minimal control, the marker bit is set to one to indicate the last packet of a video frame, and otherwise set to zero. If set to one, the marker serves as a hint that the application can begin decoding the frame, rather than waiting for the following packet (which will have a different timestamp) to detect that the frame should be displayed.

In all cases, the marker bit provides only a hint to the application, which should be designed to operate even if packets with the marker set are lost. For audio streams, it is usually possible to intuit the end of a silent period because the relationship between sequence number and timestamp changes. The start of a video frame can be detected by a change in the timestamp. An application can use these observations to operate with reduced performance if the packets containing the marker bit are lost.

It is possible for an RTP profile to specify that additional marker bits exist, at the expense of a smaller payload type field. For example, a profile could mandate two marker bits and a six-bit payload type. No current profiles use this feature.

PADDING
graphics/04inf07.gif

The padding (P) bit in the RTP header is used to indicate that the payload has been padded out past its natural length. If padding is added to an RTP packet, the P bit is set and the last octet of the payload is filled with a count of the number of padding octets. Padding is rarely used, but it is needed for some encryption schemes that work with particular block sizes, and to adapt a payload format to a fixed-capacity channel.

As an example of the use of padding, Figure 4.4 shows a GSM audio frame packetized in RTP that has been padded out to 48 octets from its natural length of 45 octets (33 for the GSM frame, 12 for the RTP header). This padding might be needed if the packet were encrypted with the Data Encryption Standard (DES), 56 which requires 8-octet (64-bit) blocks.

Figure 4.4. An RTP Packet Carrying a GSM Audio Frame, to Which Three Padding Octets Have Been Added

graphics/04fig04.gif
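The padding arithmetic in this example can be sketched as follows. The function name is invented for illustration, and the caller must ensure the buffer has room for the padding octets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pad an RTP packet out to a multiple of block_size octets, as an
 * encryption scheme such as DES (8-octet blocks) would require.
 * Returns the new packet length. */
static size_t rtp_add_padding(uint8_t *pkt, size_t len, size_t block_size)
{
    size_t pad = (block_size - len % block_size) % block_size;
    if (pad == 0)
        return len;                     /* already aligned; P bit stays clear */
    pkt[0] |= 0x20;                     /* set the P bit in the first octet */
    for (size_t i = 0; i < pad - 1; i++)
        pkt[len + i] = 0;               /* padding octets */
    pkt[len + pad - 1] = (uint8_t)pad;  /* last octet holds the padding count */
    return len + pad;
}
```

For the 45-octet GSM packet of Figure 4.4, this yields a 48-octet packet whose final octet carries the count of 3.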

VERSION NUMBER

graphics/04inf08.gif

Each RTP packet contains a version number, indicated by the V field. The current version of RTP is version 2. There are no plans to define additional versions, and the previous versions of RTP are not in widespread use. The only meaningful use of the version number field is as part of a packet validity check.

Header Extensions

graphics/04inf09.gif

RTP allows for the possibility that extension headers, signaled by the X bit being set to one, are present after the fixed RTP header, but before any payload header and the payload itself. The extension headers are of variable length, but they start with a 16-bit type field followed by a 16-bit length field (which counts the length of the extension in 32-bit words, excluding the initial 32 bits), allowing the extension to be ignored by receivers that do not understand it.
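Skipping an unrecognized extension is then a matter of reading the length field. Following RFC 3550's definition (the length counts 32-bit words, excluding the four-octet prefix), a sketch with an invented name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Total octets occupied by a header extension starting at ext,
 * so an unrecognized extension can be skipped. Returns 0 if the
 * extension would overrun the remaining buffer. */
static size_t rtp_ext_octets(const uint8_t *ext, size_t remaining)
{
    if (remaining < 4)
        return 0;
    size_t words = ((size_t)ext[2] << 8) | ext[3];  /* length field */
    size_t total = 4 + 4 * words;                   /* prefix + body */
    return total <= remaining ? total : 0;
}
```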

Extension headers provide for experiments that require more header information than that provided by the fixed RTP header. They are rarely used; extensions that require additional, payload format-independent header information are best written as a new RTP profile. If additional headers are required for a particular payload format, they should not use a header extension and instead should be carried in the payload section of the packet as a payload header.

Although header extensions are extremely rare, robust implementations should be prepared to process packets containing an unrecognized header extension by ignoring the extension.

Payload Headers

graphics/04inf10.gif

The mandatory RTP header provides information that is common to all payload formats. In many cases a payload format will need more information for optimal operation; this information forms an additional header that is defined as part of the payload format specification. The payload header is included in an RTP packet following the fixed header and any CSRC list and header extension. Often the definition of the payload header constitutes the majority of a payload format specification.

The information contained in a payload header can be either static (the same for every session using a particular payload format) or dynamic. The payload format specification will indicate which parts of the payload header are static and which are dynamic and must be configured on a per-session basis. Those parts that are dynamic are usually configured through SDP, 15 with the a=fmtp: attribute used to define "format parameters," although other means are sometimes used. The parameters that can be specified fall into three categories:

  1. Those that affect the format of the payload header, signaling the presence or absence of header fields, their size, and their format. For example, some payload formats have several modes of operation, which may require different header fields for their use.

  2. Those that do not affect the format of the payload header but do define the use of various header fields. For example, some payload formats define the use of interleaving and require header fields to indicate the position within the interleaving sequence.

  3. Those that affect the payload format in lieu of a payload header. For example, parameters may specify the frame size for audio codecs, or the video frame rate.

Features of the payload format that do not change during a session usually are signaled out of band, rather than being included in the payload header. This reduces overheads during the session, at the expense of additional signaling complexity. The syntax and use of format parameters are usually specified as part of the payload format specification.

The primary reason for specifying payload headers is to provide error resilience for those formats that were not designed for use over lossy packet networks. The first example of this was the payload format for H.261 video, as discussed in RFC 2032 and RFC 2736. 9,33 More recent examples are the more loss-tolerant payload formats for MP3 and AMR (Adaptive Multi-Rate) audio. 38,41 The issue of error resilience is discussed further in Chapters 8, Error Concealment, and 9, Error Correction.

The RTP payload format for H.261 video offers an interesting lesson in design for error resilience. The H.261 codec allows groups of video blocks to be up to 3 kilobytes long. The original version of the payload format specified that each group of blocks should be inserted directly into an RTP packet, or if too large, should be arbitrarily split across packets. However, this approach leaves circumstances in which packets arrive at the receiver and must be discarded because the previous packet was lost, and the partial group of blocks is not independently decodable. This is a loss multiplier effect that we want to avoid.

In fact, the group of blocks is not the smallest unit in H.261 video. There are smaller units called macro-blocks, but they are not identifiable without parsing from the start of the group of blocks. However, including additional information at the start of each packet makes it possible to reinstate information that would normally be found by parsing from the start of the group of blocks. This technique is used to define a payload format that splits the H.261 stream on macro-block boundaries if it exceeds the network MTU.

This is a less obvious packetization for H.261, but it does mean that a smart decoder can reconstruct valid H.261 video from a stream of RTP packets that has experienced loss, without having to discard any of the data that arrived. It shows the advantages of error-resilient design of payload formats.

Payload Data

graphics/04inf11.gif

One or more frames of media payload data, directly following any payload header, make up the final part of an RTP packet (other than padding, if needed). The size and format of the payload data depend on the payload format and format parameters chosen during session setup.

Many payload formats allow for multiple frames of data to be included in each packet. There are two ways in which a receiver can determine how many frames are present:

  1. In many cases frames are of a fixed size, and it is possible to determine the number present by inspecting the size of the packet.

  2. Other payload formats include an identifier in each encapsulated frame that indicates the size of the frame. An application needs to parse the encapsulated frames to determine the number of frames and their start points. This is usually the case when the frames can be of variable size.

Usually no limit on the number of frames that may be included is specified. Receivers are expected to handle reception of packets with a range of sizes: The guidelines in the audio/video profile suggest up to 200 milliseconds worth of audio, in multiples of the frame size, and video codecs should handle both fragmented and complete frames.
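The first, fixed-size case reduces to a divisibility check. A minimal sketch (invented name), using the 33-octet GSM full-rate frame as an example:

```c
#include <assert.h>
#include <stddef.h>

/* Number of fixed-size frames in a payload, or -1 if the payload
 * length is not a whole number of frames (malformed packet). */
static int frames_in_payload(size_t payload_len, size_t frame_size)
{
    if (frame_size == 0 || payload_len % frame_size != 0)
        return -1;
    return (int)(payload_len / frame_size);
}
```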

There are two key issues to consider when you're choosing the amount of payload data to include in each packet: the maximum transmission unit (MTU) of the network path that will be traversed, and the latency induced by waiting for more data to be produced to fill a longer packet.

Packets that exceed the MTU will be either fragmented or dropped. It is clearly undesirable if oversize packets are dropped; less obvious are the problems due to fragmentation. A fragmented packet will be reassembled at the receiver, provided that all fragments arrive. If any fragment is lost, the entire packet must be discarded even though some parts of it were correctly received. The result is a loss multiplier effect, which can be avoided if the packets are sized appropriately, and if the payload format is designed such that each packet can be independently decoded (as discussed in relation to payload headers).

Latency is another concern because a packet cannot be sent until the last octet of data it will contain is produced. The data at the start of the packet is delayed until the complete packet is ready. In many applications, the latency concern provides a tighter constraint on the application than the MTU does.
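The two constraints can be combined into a simple bound on the number of frames per packet. The sketch below uses invented names and illustrative parameter values; a real sender would also account for IP fragmentation and variable frame sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Frames to bundle per packet, limited by both the path MTU
 * (minus per-packet header overhead) and a latency budget. */
static size_t frames_per_packet(size_t mtu, size_t overhead_octets,
                                size_t frame_octets, unsigned frame_ms,
                                unsigned latency_budget_ms)
{
    size_t by_mtu     = (mtu - overhead_octets) / frame_octets;
    size_t by_latency = latency_budget_ms / frame_ms;
    return by_mtu < by_latency ? by_mtu : by_latency;
}
```

For 33-octet, 20-ms GSM frames over a 1,500-octet MTU with 40 octets of IP/UDP/RTP headers, the MTU would allow 44 frames per packet, but a 200-ms latency budget caps the packet at 10 frames: as noted above, latency is usually the tighter constraint.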



RTP: Audio and Video for the Internet
ISBN: 0672322498
Year: 2003
Authors: Colin Perkins
