5. Distributed Delivery Mechanisms

Once the requested video has been located, the P2P system optimises its delivery. In conventional file sharing systems media data is typically delivered asynchronously using standard network routing methods, and reliable network protocols such as TCP. This, however, may not be the best solution in situations when real-time streaming is desired and/or video transcoding is required. In the following sections we describe mechanisms that facilitate the delivery of video data over heterogeneous networks to heterogeneous peers. Section 5.1 describes our novel scheme of dynamic transcoding along a network path, while Section 5.2 focuses on robust video unicast and multicast in wireless LANs.

5.1 Optimal Delivery with Dynamic Transcoding

In a network in which data is shared and proliferated, typically multiple copies of a given file will exist throughout the system. When the data of interest is multimedia, in general, and video in particular, a larger array of possibilities arises. For example, different versions of the same video are likely to exist throughout the network at multiple bit rates and resolutions. Furthermore, content can be transcoded or transformed, even on the fly, e.g., to generate versions at new bit rates and resolutions or to incorporate additional redundancy for error resiliency over noisy channels—in other words, to make the content adaptive to client device capabilities, storage capacities, and bandwidth constraints.

In such a scenario, obtaining an exact version of data from a far-away source may be a poorer choice than obtaining a different version of the data from a nearby source and transcoding it into the required version. The "best" means of getting a version of an object (e.g., a video or audio clip) on the network can be estimated by a metric that attempts to rank and prioritize available paths to an object. In this context, a path consists of both a conventional network route and an optional sequence of transformations throughout the network that are necessary to deliver the object to the destination node so as to satisfy the requester's constraints. We will define a cost function to measure the aggregate computational complexity and transmission overhead required to deliver a given object. MAPS attempts to minimize the cost function over the set of possible paths to find the most efficient sequence of operations to employ.

To support cost-function-based optimization, nodes exchange periodic link and node state information through intermittent broadcast updates and as additional information appended to query response packets. Node state information includes an indication of the node's level of activity (based on current reservations and any queued best-effort requests), its transcoding capabilities (reflective of computational power), and an advertised cost associated with each transformation type. Each node assigns its own values to transformations' costs. The cost function may therefore vary from node to node, e.g., a node may penalize computation more than storage depending on its capabilities, or vice versa. Since transcoding is potentially a lengthy operation, MAPS uses a resource-reservation type approach, so that nodes do not become overloaded with concurrent requests.

The cost of the most efficient sequence of operations to obtain a version of an object O satisfying some set of required parameter constraints P at a given node N_D can be expressed (recursively) as follows:

where { Nk } is the set of "neighbouring" nodes (viz. peer nodes from which the object may be delivered to N_D, including itself), { P_j } is the set of possible constraints for which a version of the object can be delivered to the local node N_D from its neighbouring nodes, c_t(N_k, N_D, O, P_j) is the cost of transmission of the intermediate object from the neighbouring node N_k to the local node N_D, and c_x(N_D, O, P, P_j) is the cost of locally converting a version of O possessing initial parameters P_j to one that satisfies the node's target constraints P.

For example, if a node N_D has a version of the object that satisfies its target constraints P, then the cost of retrieving it, c(N_D, R, P), is zero; c_t(N_D, N_D, R, P) is zero since no transmission is required, and c_x(N_D, R, P, P) is zero since P=P_j making transcoding unnecessary. On the other hand, if N_D does not have a version of the resource that satisfies the constraints P, then the cost c (N_D, R, P) can include a transcoding cost at node N_D, the cost of transmission from a peer, as well as the cost of any similar operations (i.e. other transcoding and transmissions) required on its neighbours to obtain the needed object. The best way to obtain a media object can be determined by minimizing the preceding cost function.

Figure 38.10 shows how path selection is determined for an example where a node Peer A requests an MPEG-4 version of a stream ("example") from the network. The requesting node constructs a directed graph representing the relationships between participating nodes (viz. nodes containing the requested stream, and nodes capable of transforming the content to the needed form), and their advertised costs for carrying out any needed transformations. Transmission and transcoding costs are expressed as edge weights, while peer nodes and content transformations are mapped to vertices. While Peer B transcodes "example" from MPEG-2 to MPEG-4, a virtual vertex (Peer B) is created.

click to expand
Figure 38.10: Example of transcoding/path selection graph construction for delivery of a video stream "example" to a client Peer A possessing only an MPEG-4 decoder. The content is available in MPEG-2 format from Peer C, or directly as an MPEG-4 sequence from Peer D. The requesting node compares the cost of obtaining the clip along several possible paths— directly from D, which involves a cost c_t(D, A) of transmission and from C to B to A, which involves the cost of transmission, c_t(C, B) + c_t(B, A), plus the cost of transcoding at B, c_x("example", MPEG-2 to MPEG-4). Because Peer A can only decode MPEG-4, it is useless to get an MPEG-2 version of the object to Peer A. In this case, which cannot satisfy the required constraints, some edges are marked with infinite cost.

In reality, minimizing the proceeding cost function is computationally infeasible. Each peer node in the network can potentially be used to process different versions of the object. The graph of the problem will become more complicated if continuous transcoding of different bitrates is allowed. In practice, we simplify the formulation by some assumptions. We trade the optimality of the cost with the computational complexity of evaluating the formulation. To find a computationally feasible solution to the problem of transcoding and routing of multimedia data we assume that the transcoding, if needed, is performed firstly in a single peer on the network, and, secondly, in only one peer. Thus, for a network consisting of n peers, there are only n possible transcoding positions. For each transcoding position N_k, we calculate the total cost of taking the route as

where the delivery cost from Node N_k to Node N_D C_t(…) can be found by using well-known network routing optimisation via dynamic programming. In other words, we simplify the optimization problem to two network routing problems - one from the source to the transcoder and the other one from the transcoder to the destination.

After we have the cost of transcoding at each note, we minimize the overall cost by:

i.e., after solving at most n dynamic programming problems, the optimal path can be determined. This approach is possible in the situations where N is not restrictively large. Note that most of the times only a limited number of nodes in the peer network are able to actually handle transcoding efficiently, and, therefore, the actual number of transcoding nodes is significantly smaller than N.

The path MAPS uses to deliver a video is determined by minimizing the preceding cost function. One point to note here is that although "cost" is advertised based on a local perspective by peers, the cumulative cost of an operation is inherently global in nature as it spans multiple nodes. The assumption is that peers behave "fairly," by trading individual for global benefits.

5.2 Robust Video Unicast and Multicast in Wireless Lans

With wireless networks gaining prominence and acceptance, especially the LANs based on the IEEE 802.11 standards, it is foreseeable that wireless streaming of audio/video will be a critical part of the P2P infrastructure. Therefore, we investigate in detail the specific features of real-time video streaming in wireless LANs.

There are two major challenges for video streaming over wireless LANs: (1) fluctuations in channel quality, and (2) higher bit/packet error rates compared with wired links. In addition to wireless channel-related challenges, when there are multiple clients, we have the problem of heterogeneity among receivers, since each user will have different channel conditions, power limitations, processing capabilities, etc., and only limited feedback channel capabilities. (This is the case when, for example, multiple peers request a resource that is a real-time streaming.) In the following we analyze both the single-user and multi-user cases in detail, and propose practical solutions to address the aforementioned challenges.

There has been a substantial amount of prior work in the area of video streaming. The single user scenario has been addressed by a number of researchers (see, e.g., [1] [20] [3] and references therein). In the following we introduce a new method that combines the reliability and fixed delay advantages of Forward Error Control (FEC) coding with the bandwidth-conserving channel-adaptive properties of Automatic Repeat ReQuest (ARQ) protocol. In a multicast scenario, to tackle the problem of heterogeneity and to ensure graceful quality degradation the use of multi-resolution based scalable bit streams has been previously suggested in [19][25]. However, such a bit stream is sensitive to the position of packet loss, i.e., the received quality is a function of which packets are erased. To overcome this problem, the Priority Encoding Transmission scheme was proposed in [1], which allows for different resolution layers to be protected by different channel codes based on their importance. Substantial work has been done towards finding an algorithm for optimizing the amount of parity bits used to protect each resolution layer [20][21]. A near-optimal, O(N) (where N is the number of packets) complexity, algorithm was proposed in [21] to solve this problem.

5.2.1 802.11 Wireless LAN as Packet Erasure Channel

To address the problem of wireless video streaming we need to model the communication channel. The IEEE 802.11 Media Access Control (MAC)/Logical Link Control (LLC) and physical (PHY) layers represent two lower layers in the Open System Interconnect (OSI) reference model, i.e., the Data link and Physical layers. In real life, we do not have direct access to the Physical (or even MAC) layer. Further, most of the successful wireless networks adopt the Internet Protocol (IP) as a network layer simplifying the integration of wireless networks into the Internet networks. In this scenario, user applications see the wireless channel as an IP packet channel with erasures — much like the wired Ethernet. Therefore, in designing our algorithms (which run at the application layer) we model the wireless network channel as a packet erasure channel at the network layer level. There is a very close connection between the problem discussed in Section 3.2 and the problem we consider here. In fact having multiple unreliable servers streaming data to a client is essentially analogous to having a single server sending information over a packet erasure channel. It is not a surprise, therefore, that we use a similar MDFEC-based solution for streaming over wireless LANs.

In order to reliably communicate over packet erasure channels, it is necessary to exert some form of error control. Asynchronous communication protocols, such as ARQ, are reliable but have unbounded delay. In synchronous protocols, the data are transmitted with a bounded delay but generally not in a channel adaptive manner. To provide for some measure of reliability, Forward Error Control coding is employed (see also S=section 3.2). If the number of erased packets is less than the decoding threshold for the FEC code, the original data can be recovered perfectly. However, FEC techniques cannot guarantee that the receiver receives all the packets without error. Popular Reed-Solomon (RS) codes are described by two numbers (n,k), where n is the length of the codeword and k is the number of data symbols in the codeword. Each symbol is drawn from a finite field of 2^s elements, where s (we use 8) is the number of bits to be represented in each symbol. The total number of words in the code equals 2^s - 1. RS codes can be used to correct errors, erasures, or both. Particularly efficient decoding algorithms based on Vandermonde matrices [22] exist if only erasures are to be corrected. In this case, each parity symbol can correct any one missing data symbol. This means that we can recover the original codeword, and hence the original data, if at least k of the original n symbols is received.

5.2.2 Robust Video Unicast over Wireless LANs

In a unicast scenario there is only one recipient of the video data. The cost function to be optimized is a reconstruction distortion subject to rate and delay constraints. The first approach we describe is the purely FEC-based MDFEC algorithm [21], which assumes as input a scalable video bit stream. On the other hand, the Hybrid ARQ protocol [17] is a combination of FEC and ARQ techniques and does not assume a scalable video bit stream as input, and therefore is readily applicable to existing video content stored on DVDs and VCDs. The problem we address in this section is that of finding the parameters for source and channel coding schemes for a single server and a single client, to maximize the overall data quality (or, equivalently, minimize the distortion) subject to a communication delay constraint.

MDFEC

MDFEC is a transcoding mechanism to convert a prioritized multiresolution bit stream into a non-prioritized multiple description bit stream (see Figure 38.11) using efficient FEC codes.

click to expand
Figure 38.11: Mapping progressive stream to packets using MDFEC codes. Progressive stream (top) is partitioned by R_i and encoded with (N,i) codes. R_i corresponds to layer i in Figure 38.5.

Let d be an N-dimensional distortion vector (also called the distortion profile) where d_k reflects the distortion attained when k out of N packets are received. The progressive bit stream is marked at N different positions (that form N resolution layers), which correspond to achieving the distortion levels d_k as shown in Figure 38.11. The i^th resolution layer is split into i equal parts and an (N,i) RS code is applied to it to form the N packets as shown in Figure 38.11. Since every packet contains information from all the N resolution layers, they are of equal priority. The RS code ensures that the i^th resolution layer can be decoded on the reception of at least i packets. Since the distortion-rate function D(r) for a source is a one-to-one function of the rate r, finding the N dimensional distortion vector d corresponds to finding the rate partition R = (R₁,…,R_N) of the multiresolution bit stream (see Figure 38.11). A fast, near-optimal algorithm, of complexity O(N) (where N is the number of packets), based on Lagrangian principles, to solve this problem is described in [21].

Hybrid ARQ

MDFEC method is an attractive solution but it requires progressive video input. Here we propose a way to combine the ARQ and FEC error control methods to improve the performance of unicast communications of single resolution video over packet erasure channels. Hybrid ARQ schemes have been extensively studied in the literature for various communication channels. We do not attempt to survey all of them (the reader is referred to [15] for a textbook treatment) but rather we propose a scheme that specifically addresses the problem of video streaming over 802.11 networks. The idea is illustrated in Figure 38.12. We start by splitting our multimedia data into "packet groups," consisting of k packets each, and then, for each packet group, append n-k RS parity packets to the group as in the FEC coding scheme described above. However, unlike in the pure FEC scheme, we initially send only the first k data packets to the receiver. Then transmitter starts sending parity packets until one of the following two events occurs: either an acknowledgment from the receiver arrives, or the deadline for the transmission is reached. Once at least k packets are received intact, the receiver sends an acknowledgment. Once the acknowledgment is received, the transmitter continues with the next k data packets. One significant advantage of this algorithm is that it does not break down even when acknowledgments are lost. Instead, the transmitter simply assumes that more parity is needed.

click to expand
Figure 38.12: Hybrid ARQ algorithm data flow.

The Hybrid ARQ scheme is a general algorithm and can be adjusted to fit specific cases as is appropriate. For further details on Hybrid ARQ algorithm and its theoretical performance evaluation we refer to [17].

5.2.3 Robust Video Multicast over Wireless LANs

ARQ-based schemes are less appropriate for the multicast case for two reasons: ACK explosions and the requirement to retransmit different packets to all users. For significant packet loss rates, each user will require frequent packet replacement, and different users are most likely to require different packets. To respond to requests by multiple users, we may have to resend a significant fraction of the original data even for small loss rates. However, for small multicast networks, the hybrid ARQ scheme can alleviate the problem of sending different correction packets to each user. Because each parity packet can replace any missing data packet, there is no need for each user to identify which packet is missing. Instead, each user can simply transmit an acknowledgment when it receives enough parity to decode the transmitted data. When acknowledgment packets have been received from all known multicast users, the transmitter can move on to the next packet group.

An alternative attractive solution is based on the MDFEC approach. In order to address the multiuser setup we propose to minimize the following criterion: Δ(R) = max_i (E[d_ii (R)] - E[d_i]_min), where R is the rate partition as defined above, E[d_i]_min is the minimum expected distortion for the i-th client, achieved by using the optimal coding scheme when it is the only client, and E[d_ii (R)] is the expected distortion for the particular coding scheme being used. Such an overall quality criterion is fair in the sense that it minimizes the maximum penalty that any client suffers. It then can be shown that the problem of finding the optimal partitions R is the problem of minimizing the intersection of convex surfaces [17]. For example, in a two-client case our solution is to use the single user version of MDFEC algorithm for each client, and then find the lowest intersection of two distortion-rate surfaces (convex) as the point of equal disappointment. Similarly, for more than two users we can perform the outlined optimization pair-wise and select the lowest point among all possible pairs.

To actually find the rate partition that maximizes the overall quality criterion, we use a simplex optimization method although other standard techniques may be used. While the computational complexity of the simplex algorithm may be high, we do not expect that the algorithm will be run for a very large number of users. This is due to the fact that the algorithm runs at each access point in the wireless LAN network and we do not expect many users present in the vicinity of any access point.

Table 38.4 shows the difference in the MSE penalty (Δ(R)) for users with packet erasure channels having different probabilities p_i for the case when the parameters were optimized for the worst channel case and for the proposed minimax criterion. In the first case the user 1 suffers significant quality degradation (compared to a system optimized for user 1 only), while in the second case users 1 and 3 "share" the largest degradation in MSE.

Table 38.4: Comparison of penalty in distortion for three users for MDFEC designed for the worst-case channel, and for MDFEC designed for the minimax cost.
user	p_i	E[d_i]_min	Worst case MSE penalty	Minimax MSE penalty
1	0.113	137.5	32.3	7.0
2	0.156	162.9	14.9	0.3
3	0.197	193.5	0	7.0

To summarize this section, we point out that media streaming over wireless networks in a single or multiple user scenario poses specific problems to P2P delivery mechanisms that can be efficiently addressed by channel coding for packet erasures (FEC, ARQ, and their combinations) and progressive source coding.