5.4 Streaming Video over the Internet

Because the protocols typically used to stream compressed video over the Internet, UDP and RTP, do not guarantee end-to-end delivery, packet losses introduce errors into the decoded video and reduce the quality perceived by viewers. Because all of the common video compression standards use interframe coding, those errors propagate from frame to frame and can therefore have a large impact on video quality.

Consider a typical application, with video encoded at 30 fps, and an intracoded frame occurring every 15 frames or every half a second. If packet loss occurs in the transmission of the intracoded (I) frame, a visible error can persist for half a second, until a new I frame is transmitted. An error persisting for half a second is quite noticeable to a viewer. As shown in Boyce, ^[14] packet loss rates as low as 3 percent can translate into frame error rates as high as 30 percent. Figure 5.1 shows frame error rates from sample traces of MPEG video data transmitted over the public Internet at 384 kbps, with I frames occurring every 15 frames. Frame error rate is defined by counting the percentage of decoded frames that are affected by a packet loss.

Figure 5.1: Frame error rate versus packet loss rate for MPEG video data.
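The amplification from packet loss rate to frame error rate can be illustrated with a deliberately simplified model. The sketch below is not the measurement method behind Figure 5.1; it assumes independent packet losses, a fixed number of packets per frame, and a group of pictures in which every frame depends on the one before it (B frames, which do not propagate errors, are ignored). The function name and its parameters are hypothetical.

```python
def expected_frame_error_rate(packet_loss_rate, gop_size=15, packets_per_frame=1):
    """Expected fraction of frames affected by loss under a simplified model.

    Assumptions (not taken from the cited measurements): losses are
    independent, and a frame is affected if any packet of that frame or of
    any earlier frame since the last I frame was lost.
    """
    # Probability that all packets of a single frame arrive.
    frame_intact = (1.0 - packet_loss_rate) ** packets_per_frame
    # Frame i (1-based, counted from the I frame) is clean only if frames 1..i are intact.
    clean = sum(frame_intact ** i for i in range(1, gop_size + 1))
    return 1.0 - clean / gop_size


if __name__ == "__main__":
    for loss in (0.01, 0.03, 0.05):
        print(loss, expected_frame_error_rate(loss))
```

Under these assumptions a 3 percent packet loss rate with one packet per frame already affects roughly a fifth of the frames, and the fraction grows as frames are split across more packets.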

Error concealment techniques applied at the decoder can reduce the visual impact of packet losses. An overview of error concealment techniques for video compression is provided in Wang and Zhu. ^[15] These techniques generally copy information from spatial or temporal neighbors to reduce the visual effect of packet losses. Error concealment techniques are most effective at relatively low error rates. To protect video quality from higher loss rates, it is necessary to involve the transmitting as well as the receiving end. A good overview of error control techniques involving both the sender and receiver ends is also provided in Wang and Zhu. ^[16] A summary of approaches to streaming video over the Internet can be found in Wu et al. ^[17]

Because the visual effects of packet losses persist until an intracoded macroblock is received, an encoder can choose to perform intracoding more frequently to protect against packet loss. However, this comes with a visual-quality penalty at a given bit rate, as intercoding is generally considerably more efficient than intracoding. More-sophisticated techniques can reduce the coding efficiency penalty by allowing the intra update rates for different image regions to vary according to various channel conditions and image characteristics. ^[18]

Alternatively, reference picture selection, such as that available in H.263+, can be used in networks with NAK feedback capability. ^[19] Instead of encoding a picture using intracoding after detecting a network transmission error, this approach eliminates the persistence of the error effects by intercoding the picture with respect to a previously coded picture, which has been decoded and stored at the decoder.

Scalable video coding can be used to improve the quality of video streamed over lossy networks. With scalable video coding, a base and one or more enhancement layers are encoded, and it is expected that the base layer alone should provide at least a minimally acceptable quality representation of the video. For networks that possess paths with different levels of QoS, the base layer is transmitted with a higher level of QoS than the enhancement layer. In Aravind and coworkers, ^[20] the performance of different types of MPEG-2 scalability over lossy networks was described. In Receiver Driven Layered Multicast, ^[21] scalable video coding is used with IP multicast, and each layer of video is transmitted in a separate multicast group. Clients can join as many multicast groups as may fit in their available bandwidth.
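The rule that a client joins as many multicast groups as fit in its available bandwidth can be sketched as follows. This is only the static selection step, under the assumption that layers are cumulative (layer i is useful only if all lower layers are also received) and that the client already knows its available bandwidth; the function name and the rates are hypothetical.

```python
def layers_to_join(layer_rates_kbps, available_kbps):
    """Return how many layers (base layer first) fit in the available bandwidth.

    layer_rates_kbps lists the rate of each multicast group, starting with
    the base layer. Layers are assumed cumulative, so selection stops at the
    first layer that does not fit.
    """
    joined = 0
    total = 0.0
    for rate in layer_rates_kbps:
        if total + rate > available_kbps:
            break
        total += rate
        joined += 1
    return joined


if __name__ == "__main__":
    rates = [128, 128, 256]  # hypothetical base layer plus two enhancement layers
    for bandwidth in (100, 300, 512):
        print(bandwidth, layers_to_join(rates, bandwidth))
```

A return value of zero means that even the base layer does not fit, in which case no acceptable representation of the video can be received.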

For streaming applications, where a small amount of additional delay can be tolerated, Forward Error Correction (FEC) or Forward Erasure Correction (FXC) can protect against packet loss. Using media-independent FEC, well-known information theory techniques can be applied to streaming video. In Rosenberg and Schulzrinne, ^[22] several variations of XOR operations are used to create parity packets from one or more data packets.

More-complex techniques such as Reed Solomon (RS) coding also can be used. In RS coding, the original information bytes are transmitted, as well as additional parity bytes. When an RS(n,k) codeword is constructed from byte data, h parity bytes are created from k information bytes, and all n = k + h bytes are transmitted. Such a Reed Solomon decoder can correct up to h/2 byte errors, or up to h byte erasures, where an erasure is defined as an error in a known position.

Because in wired IP networks packets are generally lost completely rather than being transmitted with bit errors, FEC for video streaming over IP networks is applied across packets. When RS coding is applied, k information packets of length l bytes are coded using l RS codewords. For each RS codeword, k information bytes are taken from k different packets (one from each packet), the constructed parity bytes are placed into separate parity packets, and all n = k + h packets are transmitted. Because RTP sequence numbers make it possible to determine whether a given packet is lost, every loss is an erasure, and an RS(n,k) code can protect against up to h = n - k packet losses. Figure 5.2 shows an example of an RS(5,3) code applied to IP data. For this example, three information packets are RS encoded, yielding two parity packets, and the 3 + 2 = 5 packets are transmitted. The three original information packets can be recovered perfectly if no more than two of the five transmitted packets are lost.

Figure 5.2: Reed Solomon (5,3) code applied to IP data.
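The simplest form of the XOR parity mentioned above can be sketched in a few lines: one parity packet is the byte-wise XOR of k data packets, and any single lost packet is rebuilt by XORing the parity packet with the survivors. This illustrates the idea only; it is not the RTP payload format of the cited RFC, and the function names are hypothetical. Packets are assumed to have equal length (shorter ones would be padded).

```python
def xor_parity(packets):
    """Return the byte-wise XOR of a list of equal-length packets."""
    length = len(packets[0])
    parity = bytearray(length)
    for pkt in packets:
        if len(pkt) != length:
            raise ValueError("packets must have equal length (pad if needed)")
        for i, byte in enumerate(pkt):
            parity[i] ^= byte
    return bytes(parity)


def recover_single_loss(received, parity):
    """Rebuild the one missing packet.

    received is the list of data packets in order, with the lost packet
    replaced by None.
    """
    missing = [i for i, pkt in enumerate(received) if pkt is None]
    if len(missing) != 1:
        raise ValueError("a single parity packet repairs exactly one loss")
    survivors = [pkt for pkt in received if pkt is not None]
    repaired = list(received)
    # XOR of the survivors and the parity cancels everything but the lost packet.
    repaired[missing[0]] = xor_parity(survivors + [parity])
    return repaired


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]
    parity = xor_parity(data)
    print(recover_single_loss([data[0], None, data[2]], parity))
```

A single parity packet of this kind cannot repair two losses within the same group, which is the limitation that RS coding removes.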

Because RS coding is systematic, i.e., the original information bytes themselves are transmitted, if all k information bytes are received, no computations are needed at the receiver to reconstruct the original information bytes. A key advantage of RS coding over simple parity is its ability to protect against several consecutive errors, depending on the parameter choices.
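To make the packet-level scheme concrete, here is a minimal sketch of a systematic Reed Solomon style erasure code over GF(2^8). It is one possible construction (polynomial evaluation with Lagrange interpolation), not necessarily the one used in the cited works; the field polynomial 0x11D and all function names are assumptions made for illustration. For each byte position, the k information bytes are treated as values of a polynomial at points 0..k-1, and the parity bytes are its values at points k..n-1, so any k received packets determine the rest.

```python
# GF(2^8) log/antilog tables built from the primitive polynomial 0x11D.
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    # b is never zero here because the evaluation points are distinct.
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]


def interpolate(xs, ys, t):
    """Evaluate at t the polynomial through the points (xs[j], ys[j])."""
    result = 0
    for j, xj in enumerate(xs):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != j:
                num = gf_mul(num, t ^ xm)  # subtraction is XOR in GF(2^8)
                den = gf_mul(den, xj ^ xm)
        result ^= gf_mul(ys[j], gf_div(num, den))
    return result


def rs_encode(info_packets, h):
    """Return h parity packets for k equal-length information packets."""
    k, length = len(info_packets), len(info_packets[0])
    xs = list(range(k))
    parity = [bytearray(length) for _ in range(h)]
    for pos in range(length):  # one codeword per byte position
        ys = [pkt[pos] for pkt in info_packets]
        for p in range(h):
            parity[p][pos] = interpolate(xs, ys, k + p)
    return [bytes(p) for p in parity]


def rs_recover(received, k):
    """received maps packet index (0..n-1) to payload for packets that arrived."""
    if all(i in received for i in range(k)):
        return [received[i] for i in range(k)]  # systematic: nothing to compute
    if len(received) < k:
        raise ValueError("more than h packets lost; block is unrecoverable")
    xs = sorted(received)[:k]
    length = len(received[xs[0]])
    out = []
    for i in range(k):
        if i in received:
            out.append(received[i])
        else:
            out.append(bytes(
                interpolate(xs, [received[x][pos] for x in xs], i)
                for pos in range(length)))
    return out


if __name__ == "__main__":
    info = [b"abc", b"def", b"ghi"]
    packets = info + rs_encode(info, 2)      # RS(5,3): 3 information + 2 parity
    arrived = {0: packets[0], 3: packets[3], 4: packets[4]}  # packets 1 and 2 lost
    print(rs_recover(arrived, 3))
```

The first branch of `rs_recover` reflects the systematic property: when all k information packets arrive, they are returned without any field arithmetic. A production implementation would use a precomputed matrix rather than interpolating per byte.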

Varying amounts of packet loss protection can be achieved by varying the RS(n,k) parameters. The trade-off between delay and error protection capability affects the choice of the n, k parameters. As n and k increase for protection against a burst of length h, the overhead rate h/k decreases, but the delay in the system increases. In Rizzo, ^[23] codes for any parameter values n, k up to 255 can be generated using the same generator polynomial, such that as the value of n increases, the parity bytes generated for lower values of n are unchanged. For example, the first 9 bytes of a (10,5) code are the same as would be used in a (9,5) code. This type of FEC code was used with multicast in Rhee et al. ^[24] to achieve variable levels of error protection for different users. Several multicast groups transmit different numbers of parity packets, and individual receivers join as many of the multicast groups as needed to achieve the level of error protection appropriate for their network connection. FEC is well suited to multicast, because the same parity packets can be used to protect against different losses in the separate multicast transmission paths.
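The overhead side of this trade-off is simple arithmetic, and the protection side can be estimated under a stated loss model. The sketch below assumes independent packet losses with probability p, which is an assumption made here for illustration (losses on real networks tend to be bursty); the function names are hypothetical. A block is unrecoverable when more than h = n - k of its n packets are lost.

```python
from math import comb


def overhead_rate(n, k):
    """Parity overhead h/k of an RS(n,k) code applied across packets."""
    return (n - k) / k


def block_loss_probability(n, k, p):
    """Probability that more than n - k of n packets are lost.

    Assumes each packet is lost independently with probability p.
    """
    h = n - k
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(h + 1, n + 1))


if __name__ == "__main__":
    # Fixed h = 2 parity packets, growing block size.
    for k in (3, 6, 12):
        n = k + 2
        print(f"RS({n},{k}) overhead={overhead_rate(n, k):.3f} "
              f"P(unrecoverable)={block_loss_probability(n, k, 0.03):.6f}")
```

For a fixed h, larger blocks lower the overhead but spread the same two parity packets over more data, and the receiver may have to wait for up to n packets before a lost one can be rebuilt.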

FEC and scalability can be combined to achieve Unequal Error Protection (UEP). The overhead rates can be reduced by applying more error protection to the more-important layers of a scalable video stream than to the less-important layers, while maintaining the best possible received video quality in the presence of channel loss. In Priority Encoding Transmission (PET), ^[25] different layers of scalable video compressed data can be placed in the same packets and given different levels of protection.

In the High Priority Protection method (HiPP), ^[26] UEP is accomplished using an MPEG-2-like data partitioning to divide a compressed video stream into two partitions, a high-priority partition and a low-priority partition. Overhead parity data for the video stream is created by applying forward erasure correction coding to only the high-priority partition of the video stream. The high- and low-priority data and parity data are arranged into the same packets and are sent over a single channel. The packetization method used maximizes resistance to burst losses, while minimizing delay and overhead. The HiPP method is discussed in more detail in Section 5.6.

^[14]Boyce, J., Packet loss resilient transmission of MPEG video over the Internet, Signal Processing: Image Communication, pp. 7–24, September 1999.

^[15]Wang, Y. and Zhu, Q.F., Error control and concealment for video communication: a review, Proc. IEEE, 86 (5), 974–997, 1998.

^[16]Wang, Y. and Zhu, Q.F., Error control and concealment for video communication: a review, Proc. IEEE, 86 (5), 974–997, 1998.

^[17]Wu, D. et al., Streaming video over the Internet: approaches and directions, IEEE Trans. Circuits Syst. Video Technol., 11 (3), 282–300, 2001.

^[18]Liao, J. and Villasenor, J., Adaptive intra block update for robust transmission of H.263, IEEE Trans. Circuits Syst. Video Technol., 10 (1), 30, 2002.

^[19]Fukunaga, S., Nakai, T., and Inoue, H., Error resilient video coding by dynamic replacing of reference pictures, Proc. IEEE Global Telecommun. Conf. (GLOBECOM), Vol. 3, London, pp. 1503–1508.

^[20]Aravind, R., Civanlar, M., and Reibman, A., Packet loss resilience of MPEG-2 scalable video coding algorithms, IEEE Trans. Circuits Syst. Video Technol., 6 (5), 426–435, 1996.

^[21]Jacobson, V., McCanne, S., and Vetterli, M., Receiver-driven layered multicast, Proc. ACM SIGCOMM '96, Stanford, CA, August 1996, pp. 117–130.

^[22]Rosenberg, J. and Schulzrinne, H., An RTP payload format for generic forward error correction, RFC 2733, http://www.faqs.org/rfcs/rfc2733.html.

^[23]Rizzo, L., Effective erasure codes for reliable computer communication protocols, Comput. Commun. Rev., 27 (2), 24–36, 1997.

^[24]Rhee, I. et al., Layered multicast recovery, Technical report TR-99-09, NCSU, Computer Science Dept., February 1999.

^[25]Albanese, A. et al., Priority encoding transmission, Proc. 35th Ann. IEEE Symp. Foundations of Computer Science, November 1994, pp. 604–612.

^[26]Boyce, J., Packet loss resilient transmission of MPEG video over the Internet, Signal Processing: Image Communication, pp. 7–24, September 1999.