Video — a composition of moving pictures and audio clips — possess many more dimensions of searchable attributes than text or web documents. Consequently, a video-centric P2P system must allow users to search for video based on a large and diverse set of syntactic and semantic attributes. While costly manual annotations may exist in special — but rare — cases, a video-catering peer service must be able to extract information automatically from media content to assist search, browsing, and indexing.
MAPS Distributed Search subsystem therefore provides
A toolbox to extract syntactic and semantic attributes automatically from media (including MPEG-7 audio and video descriptors), and
A powerful search mechanism beyond hash keys, file
Enhanced search specification:
For search, the MAPS Media Search subsystem allows each node to have different search capabilities. Besides standard search
Automatic extraction of Meta-descriptions: The MAPS Distributed Search subsystem exploits two sources for acquiring Meta information:
Meta information encoded in the media files or the media files
Information derived by automatically analyzing the media content.
The most prominent examples of standardized meta information extracted by MAPS from media files are ID3 tags [11] from audio clips, e.g., describing album, title, and
Furthermore, by extracting semantic information automatically from media, our MAPS Video Content Analysis module supports search beyond standard Meta information. Currently, the MAPS Video Content Analysis module incorporates several types of automatic information extraction, for example, visible text in images and video (see Figure 38.8) [13], faces and their
Figure 38.8:
Example of text extraction in
Figure 38.9:
Sample results of
Once the
In a network in which data is shared and proliferated, typically multiple copies of a given file will exist throughout the system. When the data of interest is multimedia, in general, and video in particular, a larger array of possibilities arises. For example, different versions of the same video are likely to exist throughout the network at multiple bit rates and resolutions. Furthermore, content can be transcoded or transformed, even on the fly, e.g., to generate versions at new bit rates and
In such a scenario, obtaining an exact version of data from a far-away source may be a poorer choice than obtaining a different version of the data from a nearby source and transcoding it into the required version. The "best" means of getting a version of an object (e.g., a video or audio clip) on the network can be estimated by a metric that attempts to rank and prioritize available paths to an object. In this context, a path consists of both a conventional network route and an optional sequence of transformations throughout the network that are necessary to deliver the object to the destination node so as to
To support cost-function-based optimization, nodes exchange periodic link and node state information through intermittent broadcast updates and as additional information appended to query response packets. Node state information includes an indication of the node's level of activity (based on current
The cost of the most efficient sequence of operations to obtain a version of an object
O
where { Nk } is the set of "neighbouring" nodes (viz. peer nodes from which the object may be delivered to N D , including itself), { P j } is the set of possible constraints for which a version of the object can be delivered to the local node N D from its neighbouring nodes, c t (N k , N D , O, P j ) is the cost of transmission of the intermediate object from the neighbouring node N k to the local node N D , and c x (N D , O, P, P j ) is the cost of locally converting a version of O possessing initial parameters P j to one that satisfies the node's target constraints P .
For example, if a node
N
D
has a version of the object that satisfies its target constraints
P
, then the cost of retrieving it,
c(N
D
, R, P)
, is zero;
c
t
(N
D
, N
D
, R, P)
is zero since no transmission is required, and
c
x
(N
D
, R, P, P)
is zero since
P=P
j
making transcoding unnecessary. On the other hand, if
N
D
does not have a version of the resource that satisfies the constraints
P
, then the cost
c (N
D
, R, P)
can include a transcoding cost at node
N
D
, the cost of transmission from a peer, as well as the cost of any similar operations (i.e. other transcoding and transmissions) required on its neighbours to obtain the needed object. The best way to obtain a media object can be determined by minimizing the
Figure 38.10 shows how path selection is determined for an example where a node Peer A requests an MPEG-4 version of a stream ("example") from the network. The requesting node constructs a directed graph representing the relationships between participating nodes (viz. nodes containing the requested stream, and nodes capable of transforming the content to the needed form), and their advertised costs for carrying out any needed transformations. Transmission and transcoding costs are expressed as edge weights, while peer nodes and content transformations are mapped to vertices. While Peer B transcodes "example" from MPEG-2 to MPEG-4, a virtual vertex (Peer B) is created.
Figure 38.10:
Example of transcoding/path selection graph construction for delivery of a video stream "example" to a client Peer A possessing only an MPEG-4 decoder. The content is available in MPEG-2 format from Peer C, or directly as an MPEG-4 sequence from Peer D. The requesting node
In reality, minimizing the proceeding cost function is computationally infeasible. Each peer node in the network can potentially be used to process different versions of the object. The graph of the problem will become more complicated if continuous transcoding of different bitrates is allowed. In practice, we simplify the formulation by some assumptions. We trade the optimality of the cost with the computational complexity of evaluating the formulation. To find a
where the delivery cost from Node
N
k
to Node
N
D
C
t
(
…
) can be found by using well-known network routing
After we have the cost of transcoding at each note, we minimize the overall cost by:
i.e., after solving at most n dynamic programming problems, the optimal path can be determined. This approach is possible in the situations where N is not restrictively large. Note that most of the times only a limited number of nodes in the peer network are able to actually handle transcoding
The path MAPS uses to deliver a video is determined by minimizing the preceding cost function. One point to note here is that although "cost" is advertised based on a local perspective by peers, the cumulative cost of an operation is
With wireless networks gaining prominence and acceptance,
There are two major challenges for video streaming over wireless LANs: (1) fluctuations in channel quality, and (2) higher bit/packet error rates compared with wired links. In addition to wireless channel-
There has been a substantial amount of prior work in the area of video streaming. The single user scenario has been addressed by a number of researchers (see, e.g., [1] [20] [3] and references therein). In the following we introduce a new method that combines the reliability and fixed delay advantages of Forward Error Control (FEC) coding with the bandwidth-conserving channel-adaptive properties of Automatic Repeat ReQuest (ARQ) protocol. In a multicast scenario, to tackle the problem of heterogeneity and to ensure
To address the problem of wireless video streaming we need to model the communication channel. The IEEE 802.11 Media Access Control (MAC)/Logical Link Control (LLC) and physical (PHY) layers represent two lower layers in the Open System Interconnect (OSI) reference model, i.e., the Data link and Physical layers. In real life, we do not have direct access to the Physical (or even MAC) layer. Further, most of the successful wireless networks adopt the Internet Protocol (IP) as a network layer simplifying the integration of wireless networks into the Internet networks. In this scenario, user applications see the wireless channel as an IP packet channel with erasures — much like the wired Ethernet. Therefore, in designing our algorithms (which run at the application layer) we model the wireless network channel as a packet erasure channel at the network layer level. There is a very close connection between the problem discussed in Section 3.2 and the problem we consider here. In fact having multiple unreliable servers streaming data to a client is
In order to reliably communicate over packet erasure channels, it is necessary to exert some form of error control. Asynchronous communication protocols, such as ARQ, are reliable but have unbounded delay. In synchronous protocols, the data are transmitted with a bounded delay but
In a unicast scenario there is only one recipient of the video data. The cost function to be optimized is a reconstruction distortion subject to rate and delay constraints. The first approach we describe is the purely FEC-based MDFEC algorithm [21], which assumes as input a scalable video bit stream. On the other hand, the Hybrid ARQ protocol [17] is a combination of FEC and ARQ techniques and does not assume a scalable video bit stream as input, and therefore is readily
MDFEC is a transcoding mechanism to convert a prioritized multiresolution bit stream into a non-prioritized multiple description bit stream (see Figure 38.11) using efficient FEC codes.
Figure 38.11:
Mapping progressive stream to packets using MDFEC codes. Progressive stream (top) is partitioned by R
i
and encoded with (N,i) codes. R
i
corresponds to layer i in Figure 38.5.
Let
d
be an N-dimensional distortion vector (also called the distortion profile) where d
k
reflects the distortion attained when k out of N packets are received. The progressive bit stream is marked at N different positions (that form N resolution layers), which
MDFEC method is an attractive solution but it requires progressive video input. Here we propose a way to combine the ARQ and FEC error control methods to improve the performance of unicast communications of single resolution video over packet erasure channels. Hybrid ARQ schemes have been extensively studied in the literature for various communication channels. We do not attempt to survey all of them (the reader is referred to [15] for a
Figure 38.12:
Hybrid ARQ algorithm data flow.
The Hybrid ARQ scheme is a general algorithm and can be adjusted to fit specific cases as is appropriate. For further details on Hybrid ARQ algorithm and its theoretical performance evaluation we refer to [17].
ARQ-based schemes are less appropriate for the multicast case for two reasons: ACK explosions and the requirement to retransmit different packets to all users. For significant packet loss rates, each user will require frequent packet replacement, and different users are most likely to require different packets. To respond to requests by multiple users, we may have to resend a significant fraction of the original data even for small loss rates. However, for small multicast networks, the hybrid ARQ scheme can alleviate the problem of sending different correction packets to each user. Because each parity packet can replace any missing data packet, there is no need for each user to identify which packet is missing. Instead, each user can simply transmit an acknowledgment when it receives enough parity to decode the transmitted data. When acknowledgment packets have been received from all known multicast users, the transmitter can move on to the next packet group.
An alternative attractive solution is based on the MDFEC approach. In order to address the multiuser setup we propose to minimize the following criterion:
Δ
(
R
) = max
i
(E[d
i
i (
R
)] - E[d
i
]
min
), where
R
is the rate partition as defined above, E[d
i
]
min
is the minimum expected distortion for the i-th client, achieved by using the optimal coding scheme when it is the only client, and E[d
i
i (
R
)] is the expected distortion for the particular coding scheme being used. Such an overall quality criterion is fair in the sense that it minimizes the maximum penalty that any client suffers. It then can be shown that the problem of finding the optimal partitions
R
is the problem of minimizing the intersection of convex surfaces [17]. For example, in a two-client case our solution is to use the single user version of MDFEC algorithm for each client, and then find the lowest intersection of two distortion-rate surfaces (convex) as the point of equal disappointment. Similarly, for more than two users we can perform the outlined optimization pair-wise and select the
To actually find the rate partition that maximizes the overall quality criterion, we use a simplex optimization method although other standard techniques may be used. While the computational complexity of the simplex algorithm may be high, we do not expect that the algorithm will be run for a very large number of users. This is due to the fact that the algorithm runs at each access point in the wireless LAN network and we do not expect many users present in the vicinity of any access point.
Table 38.4 shows the difference in the MSE penalty ( Δ ( R )) for users with packet erasure channels having different probabilities p i for the case when the parameters were optimized for the worst channel case and for the proposed minimax criterion. In the first case the user 1 suffers significant quality degradation (compared to a system optimized for user 1 only), while in the second case users 1 and 3 "share" the largest degradation in MSE.
|
user |
p i |
E[d i ] min |
Worst case MSE penalty |
Minimax MSE penalty |
|---|---|---|---|---|
|
1 |
0.113 |
137.5 |
32.3 |
7.0 |
|
2 |
0.156 |
162.9 |
14.9 |
0.3 |
|
3 |
0.197 |
193.5 |
|
7.0 |
To summarize this section, we point out that media streaming over wireless networks in a single or multiple user scenario poses specific problems to P2P delivery mechanisms that can be efficiently addressed by channel coding for packet erasures (FEC, ARQ, and their combinations) and progressive source coding.