Robust Header Compression | RTP: Audio and Video for the Internet

As noted earlier, CRTP does not work well over links with loss and long round-trip times, such as many cellular radio links. Each lost packet causes several subsequent packets to be lost because the context is out of sync during at least one link round-trip time.

In addition to reducing the quality of the media stream, the loss of multiple packets wastes bandwidth because some packets that have been sent are simply discarded, and because a full header packet must be sent to refresh the context. Robust Header Compression (ROHC) ³⁷ was designed to solve these problems, providing compression suitable for use with third-generation cellular systems. ROHC gets significantly better performance than CRTP over such links, at the expense of additional complexity of implementation.

Observation of the changes in header fields within a media stream shows that they fall into three categories:

Some fields are static, or mostly static. Examples include the RTP SSRC, UDP ports, and IP addresses. These fields can be sent once when the connection is established, and either they never change or they change very infrequently.
Some fields change in a predictable manner with each packet sent, except for occasional sudden changes. Examples include the RTP timestamp and sequence number, and (often) the IPv4 ID field. During periods when these fields are predictable, there is usually a constant relation between them. When sudden changes occur, often only a single field changes unpredictably.
Some fields are unpredictable, having essentially random values, and have to be communicated as is, with no compression. The main example is the UDP checksum.

ROHC operates by establishing mapping functions between the RTP sequence number and the other predictable fields, then reliably transferring the RTP sequence number and the unpredictable header fields. These mapping functions form part of the compression context, along with the values of static fields that are communicated at startup or when those fields change.

The main differences between ROHC and CRTP come from the way they handle the second category: fields that usually change in a predictable manner. In CRTP, the value of the field is implicit and the packet contains an indication that it changed in the predictable fashion. In ROHC, the value of a single key field (the RTP sequence number) is explicitly included in all packets, and an implicit mapping function is used to derive the other fields.
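To make the idea of an implicit mapping function concrete, the following minimal sketch derives the RTP timestamp from the sequence number using a linear relation held in the context. The names Context, ts_stride, and derive_timestamp are hypothetical, chosen for illustration only; they are not taken from the ROHC specification, and a real context tracks more fields than are shown here.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical per-stream context: a static field plus a linear mapping."""
    ssrc: int       # static field, communicated once at startup
    sn_ref: int     # reference RTP sequence number
    ts_ref: int     # RTP timestamp that accompanied sn_ref
    ts_stride: int  # timestamp increment per packet (assumed constant)


def derive_timestamp(ctx: Context, sn: int) -> int:
    """Derive the 32-bit RTP timestamp implied by a 16-bit sequence number."""
    delta_sn = (sn - ctx.sn_ref) % (1 << 16)  # sequence numbers wrap at 2^16
    return (ctx.ts_ref + delta_sn * ctx.ts_stride) % (1 << 32)


# Example: 20 ms audio frames at 8 kHz advance the timestamp by 160 per packet.
ctx = Context(ssrc=0x1234ABCD, sn_ref=1000, ts_ref=160000, ts_stride=160)
print(derive_timestamp(ctx, 1003))  # 160480
```

Only the sequence number has to travel in each packet; as long as the stride stays constant, the timestamp follows from the stored context.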

Operation of ROHC: States and Modes

ROHC has three states of operation, depending on how much context has been transferred:

The system starts in initialization and refresh state, much like the full header mode of CRTP. This state conveys the necessary information to set up the context, enabling the system to enter first- or second-order compression state.
First-order compression state allows the system to efficiently communicate irregularities in the media stream (changes in the context) while still keeping much of the compression efficiency. In this state, only a compressed representation of the RTP sequence number, the context identifier, and a reduced representation of the changed fields are conveyed.
Second-order state is the highest compression level, when the entire header is predictable from the RTP sequence number and stored context. Only a compressed representation of the RTP sequence number and a (possibly implicit) context identifier are included in the packet, giving a header that can be as small as one octet.

If any unpredictable fields are present, such as the UDP checksum, then both first- and second-order compression schemes communicate those fields unchanged. As expected, the result is a significant reduction in the compression efficiency. For simplicity, this description omits further mention of these fields, although they will always be conveyed.

The compressor starts in initialization and refresh state, sending full headers to the decompressor. It will move to either first- or second-order state, sending compressed headers, after it is reasonably sure that the decompressor has correctly received enough information to set up the context.

The system can operate in one of three modes: unidirectional, bidirectional optimistic, and bidirectional reliable. Depending on the mode chosen, the compressor will transition from the initialization and refresh state to either first- or second-order compression state, according to a timeout or an acknowledgment:

Unidirectional mode. No feedback is possible, and the compressor transitions to first- or second-order state after a predetermined number of packets has been sent.
Bidirectional optimistic mode. The compressor transitions to the first- or second-order state after a predetermined number of packets has been sent, much as in unidirectional mode, or when an acknowledgment is received.
Bidirectional reliable mode. The compressor transitions to the first- or second-order state on receipt of an acknowledgment.
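The three rules just listed can be summarized in a few lines of code. This is a simplified sketch rather than the procedure from the ROHC specification: the names Mode and may_leave_ir are hypothetical, and the default threshold of three packets is an arbitrary stand-in for the "predetermined number of packets."

```python
from enum import Enum


class Mode(Enum):
    UNIDIRECTIONAL = "unidirectional"
    OPTIMISTIC = "bidirectional optimistic"
    RELIABLE = "bidirectional reliable"


def may_leave_ir(mode: Mode, packets_sent: int, ack_received: bool,
                 threshold: int = 3) -> bool:
    """Return True if the compressor may leave initialization and refresh state.

    `threshold` is an assumed value for this sketch, not a specified constant.
    """
    if mode is Mode.UNIDIRECTIONAL:
        # No feedback channel: rely on the packet count alone.
        return packets_sent >= threshold
    if mode is Mode.OPTIMISTIC:
        # Either an acknowledgment or the packet count is sufficient.
        return ack_received or packets_sent >= threshold
    # Reliable mode: only an acknowledgment allows the transition.
    return ack_received


print(may_leave_ir(Mode.OPTIMISTIC, packets_sent=1, ack_received=True))  # True
print(may_leave_ir(Mode.RELIABLE, packets_sent=10, ack_received=False))  # False
```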

The choice of unidirectional or bidirectional feedback depends on the characteristics of the link between compressor and decompressor. Some network links may not support a (convenient) back channel for feedback messages, forcing unidirectional operation. In most cases, though, one of the bidirectional modes can be used, allowing the receiver to communicate its state to the sender.

The compressor starts by assuming unidirectional operation. The decompressor will choose to send feedback if the link supports it, depending on the loss patterns of the link. Receipt of feedback messages informs the compressor that bidirectional operation is desired. The choice between optimistic and reliable mode is made by the decompressor and depends on the capacity of the back channel and the loss characteristics of the link. Reliable mode sends more feedback but is more tolerant of loss.

It's important to keep the difference between ROHC states and modes clear. The state determines the type of information sent in each packet: full headers, partial updates, or fully compressed. The mode determines how and when feedback is sent from the decompressor: (1) never, (2) when there is a problem, or (3) always.

Typically the system transitions from initialization and refresh state to the second-order state after context has been established. It then remains in second-order state until loss occurs, or until a context update is needed because of changes in the stream characteristics.

If loss occurs, the system's behavior depends on the mode of operation. If one of the bidirectional modes was chosen, the decompressor will send feedback causing the compressor to enter the first-order state and send updates to repair the context. This process corresponds to the sending of a context refresh message in CRTP, causing the compressor to generate a full header packet. If unidirectional mode is used, the compressor will periodically transition to lower states, to refresh the context at the decompressor.

The compressor will also transition to the first-order state when it is necessary to convey a change in the mapping for one of the predictable fields, or an update to one of the static fields. This process corresponds to the sending of a compressed packet containing an updated delta field or a full header packet in CRTP. Depending on the mode of operation, the change to first-order state may cause the decompressor to send feedback indicating that it has correctly received the new context.
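The downward transitions described in the last few paragraphs can be sketched as a single decision function. This is a deliberately simplified model with hypothetical names (State, next_state); in particular, the text says only that a unidirectional compressor periodically falls back to "lower states," and the choice of the initialization and refresh state here is an assumption made for the sake of the example.

```python
from enum import Enum


class State(Enum):
    IR = "initialization and refresh"  # full headers
    FO = "first order"                 # partial context updates
    SO = "second order"                # fully compressed headers


def next_state(state: State, bidirectional: bool,
               loss_feedback: bool = False,
               context_changed: bool = False,
               refresh_due: bool = False) -> State:
    """One simplified step of the compressor's transitions out of SO."""
    if state is not State.SO:
        return state  # upward transitions are not modeled in this sketch
    if context_changed:
        return State.FO  # a mapping or static field must be updated
    if bidirectional and loss_feedback:
        return State.FO  # decompressor reported a damaged context
    if not bidirectional and refresh_due:
        return State.IR  # assumed target of the periodic refresh
    return State.SO


print(next_state(State.SO, bidirectional=True, loss_feedback=True).name)  # FO
print(next_state(State.SO, bidirectional=False, refresh_due=True).name)   # IR
```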

Operation of ROHC: Robustness and Compression Efficiency

If the compressed link is reliable, ROHC and CRTP have similar compression efficiency, although ROHC is somewhat more complex. For this reason, dial-up modem links do not typically use ROHC, because CRTP is less complex and yields comparable performance.

When there is packet loss on the compressed link, the performance advantage of ROHC becomes apparent, because of its flexibility in sending partial context updates and its robust encoding of compressed values. The capability to send partial context updates allows ROHC to update the context in cases in which CRTP would have to send a full header packet. ROHC can also reduce the size of a context update when there is loss on the link. Both of these capabilities give improved performance, compared to CRTP.

The combination of robust encoding of compressed values and sequence number-driven operation is also a key factor. As noted earlier, the ROHC context contains a mapping between the RTP sequence number and the other predictable header fields. Second-order compressed packets convey the sequence number using a window-based least-significant bit (W-LSB) encoding, and the other fields are derived from this. It is largely the use of W-LSB encoding that gives ROHC its robustness to packet loss.

Standard LSB encoding transmits the k least-significant bits of the field value, instead of the complete field. On receiving these k bits, and given the previously transmitted value V_ref, the decompressor can derive the original value of the field, provided that it is within a range known as the interpretation interval.

The interpretation interval is the 2^k values surrounding V_ref, offset by a parameter p so that it covers the range V_ref - p to V_ref + 2^k - 1 - p. The parameter p is chosen on the basis of the characteristics of the field being transported and conveyed to the decompressor during initialization, forming part of the context. Possible choices include the following:

If the field value is expected to increase, p = -1.
If the field value is expected to increase or stay the same, p = 0.
If the field value is expected to vary slightly from a fixed value, p = 2^(k-1) - 1.
If the field value is expected to undergo small negative changes and large positive changes (for example, the RTP timestamp of a video stream using B-frames), then p = 2^(k-2) - 1.
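As a quick check on the definition, the bounds of the interpretation interval, V_ref - p and V_ref + 2^k - 1 - p, can be computed directly. The function name interpretation_interval is a hypothetical one used only for this sketch, and the values 984 and k = 4 are arbitrary sample inputs.

```python
from typing import Tuple


def interpretation_interval(v_ref: int, k: int, p: int) -> Tuple[int, int]:
    """Return the inclusive (low, high) range that k bits can represent."""
    return v_ref - p, v_ref + (1 << k) - 1 - p


k = 4
# Field expected to increase: the interval starts just above the reference.
print(interpretation_interval(984, k, -1))                  # (985, 1000)
# Field expected to increase or stay the same: the reference is included.
print(interpretation_interval(984, k, 0))                   # (984, 999)
# Small negative and large positive changes: p = 2^(k-2) - 1.
print(interpretation_interval(984, k, (1 << (k - 2)) - 1))  # (981, 996)
```

In every case the interval contains exactly 2^k values; the parameter p only slides it relative to the reference.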

As an example, consider the transport of sequence numbers, in which the last transmitted value V_ref = 984 and k = 4 least-significant bits are sent as the encoded form. Assume also that p = -1, giving an interpretation interval ranging between 985 and 1,000, as shown in Figure 11.5. The next value sent is 985 (in binary: 1111011001), which is encoded as 9 (in binary: 1001, the 4 least-significant bits of the original value). On receiving the encoded value, the decompressor takes V_ref and replaces the k least-significant bits with those received, restoring the original value.

Figure 11.5. An Example of LSB Encoding

LSB encoding will work provided that the encoded value is within the interpretation interval. If a single packet were lost in the preceding example and the next value received by the decoder were 10 (in binary: 1010), then the restored value would be 986, which is correct. If 2^k or more consecutive packets were lost, however, the next value would fall outside the interpretation interval, and the decoder would have no way of knowing the correct value to decode.
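The worked example can be reproduced with a short sketch. The function names lsb_encode and lsb_decode are hypothetical. The decoder here selects the one value inside the interpretation interval whose k least-significant bits match those received; this gives the same answer as bit replacement in the example, and it also copes with the case in which the value has crossed a multiple of 2^k since the reference was sent.

```python
def lsb_encode(value: int, k: int) -> int:
    """Keep only the k least-significant bits of the value."""
    return value & ((1 << k) - 1)


def lsb_decode(bits: int, v_ref: int, k: int, p: int) -> int:
    """Return the value in the interpretation interval whose k LSBs match."""
    low = v_ref - p  # lower bound of the interpretation interval
    # Step forward from `low` to the first value with matching low bits.
    return low + ((bits - low) % (1 << k))


V_REF, K, P = 984, 4, -1

print(lsb_encode(985, K))             # 9
print(lsb_decode(9, V_REF, K, P))     # 985
print(lsb_decode(10, V_REF, K, P))    # 986: one packet lost, still correct

# 1001 lies outside the interval 985..1000, so decoding goes wrong.
print(lsb_decode(lsb_encode(1001, K), V_REF, K, P))  # 985, not 1001
```

The last line shows the failure mode: once the true value leaves the interval, the decoder silently returns a wrong value that happens to share the same low-order bits.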

The window-based variant of LSB encoding, W-LSB, maintains the interpretation interval as a sliding window, advancing when the compressor is reasonably sure that the decompressor has received a particular value. Confidence that the window can advance is obtained by various means: In bidirectional reliable mode, the decompressor sends acknowledgments; in bidirectional optimistic mode, the window advances after a period of time, unless the decompressor sends a negative acknowledgment; and in unidirectional mode, the window simply advances after a period of time.
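One way to picture the sliding window is as a list of values that the compressor has sent but cannot yet assume were received, any of which might be the reference the decompressor is holding. The sketch below chooses enough bits for the new value to decode correctly against every candidate reference. It is a simplified illustration with hypothetical names (WlsbCompressor, bits_needed, acknowledge), a fixed p, and no handling of wraparound; the acknowledge method stands in for whatever evidence, acknowledgment or timeout, lets the window advance.

```python
def bits_needed(v_ref: int, value: int, p: int, max_k: int = 16) -> int:
    """Smallest k (at least 1) whose interval around v_ref contains value."""
    for k in range(1, max_k + 1):
        if v_ref - p <= value <= v_ref + (1 << k) - 1 - p:
            return k
    raise ValueError("value cannot be encoded relative to this reference")


def lsb_decode(bits: int, v_ref: int, k: int, p: int) -> int:
    """Return the value in the interpretation interval whose k LSBs match."""
    low = v_ref - p
    return low + ((bits - low) % (1 << k))


class WlsbCompressor:
    def __init__(self, initial: int, p: int):
        self.p = p
        self.window = [initial]  # candidate references at the decompressor

    def encode(self, value: int):
        """Return (bits, k), decodable against any reference in the window."""
        k = max(bits_needed(ref, value, self.p) for ref in self.window)
        self.window.append(value)
        return value & ((1 << k) - 1), k

    def acknowledge(self, value: int) -> None:
        """Advance the window: references older than `value` are dropped."""
        self.window = self.window[self.window.index(value):]


comp = WlsbCompressor(initial=984, p=-1)
for v in (985, 986, 987):
    print(v, comp.encode(v))  # (1, 1), then (0, 1), then (3, 2)
comp.acknowledge(986)         # decompressor is known to have 986
print(988, comp.encode(988))  # (0, 1): fewer bits after the window advances
```

While old references remain in the window, more bits are needed to stay decodable; once the window advances, the encoding shrinks again. The cost of robustness is therefore a slightly larger compressed header during periods of uncertainty.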

The advantage of W-LSB encoding is that loss of a small number of packets within the window will not cause the decompressor to lose synchronization. This robustness allows a ROHC decompressor to continue operation without requesting feedback in cases when a CRTP decompressor would fail and need a context update. The result is that ROHC is much less susceptible to the loss multiplier effect than CRTP: A single packet loss on the link will cause a single loss at the output of a ROHC decompressor, whereas a CRTP decompressor must often wait for a context update before it can continue decompression.
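A back-of-the-envelope model gives a feel for the size of the loss multiplier. It assumes, as a rough approximation, that a CRTP decompressor discards every packet that arrives during one link round-trip time after a loss, and that ROHC loses only the packet dropped on the link. The function name and the sample figures (a 120 ms round trip, 20 ms packets) are hypothetical and are not measurements.

```python
import math


def crtp_packets_lost(rtt_ms: float, packet_interval_ms: float) -> int:
    """Packets lost per link loss: the lost packet plus one RTT of discards."""
    return 1 + math.ceil(rtt_ms / packet_interval_ms)


ROHC_PACKETS_LOST = 1  # loss within the window does not desynchronize

# Hypothetical cellular link: 120 ms round trip, one packet every 20 ms.
print(crtp_packets_lost(120, 20))  # 7
print(ROHC_PACKETS_LOST)           # 1
```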