3. Distributed Storage

The computing devices in a P2P network differ significantly in their computing power and link bandwidth capabilities. This heterogeneity must be adequately considered by advanced media archiving techniques, so that the content can be seamlessly served to the end-clients. Section 3.1 explains how multiple versions of the same video at different bitrates can be created and stored efficiently. It is followed by the VISDOM framework [18] - a fully distributed video coding and transmission scheme. VISDOM demonstrates that content can be replicated in disparate locations in order to provide load balancing of servers and fault tolerance. By nature, its transmission scheme falls semantically under Section 5; however, it is presented here together with the coding scheme for clarity. Finally, Section 3.3 addresses how individual peers can deal with their limited storage capacity in a distributed fashion.

3.1 Multi-Rate Differential Encoding

In the P2P scenario of a distributed video database, video data has to be encoded and stored to accommodate for the different connectivity and bandwidth of the network, and computing capabilities of the clients. As described in the previous section, transcoding is the principal means through which video content is adapted to the client specifications. By default, our P2P system keeps the transcoded content for future retrieval since transcoding is a very expensive operation; however, storage is also a precious resource. Therefore, Section 3.1.2 describes our approach of saving storage space by differentially encoding the transcoded bit stream of the video clip with respect to the compressed bit stream of the same clip, but at a different bit rate.

This computational trade-off can also be done at encoding time. Instead of encoding a clip at a single rate and transcoding it at other rates at later times, it is possible to simultaneously encode it at different rates that are likely to be targeted rates by the users. Compared to transcoding, this encoding is better from a rate-distortion point of view since it avoids the successive re-quantization done in transcoding. Although the amount of storage gets increased, it is possible to combine the multi-rate encoder with the differential encoder mentioned above to compensate for this increase.

3.1.1 Multi-Rate Encoding of Video

Figure 38.3 shows the block diagrams of the multi-rate encoder we have developed [27]. The encoder is used to generate N streams at different rates from the same video sequence. For each frame f_n, the encoder computes the compressed data for all streams. The first stream is the reference stream, and the others are dependent streams. There are three main characteristics to this encoder. First, most of the processing is done in the DCT domain. This avoids going back and forth from the image space to the DCT space and therefore can significantly reduce memory operations. Second, motion estimation is done once. As this is the most computationally heavy block of the encoder, this significantly reduces the computational load. Finally, with this architecture, motion compensation has to be performed in the DCT domain.

click to expand
Figure 38.3: Block diagram of multi-rate encoder.

Fast algorithms for motion compensation in the DCT domain (MC-DCT) can be used within our encoder. Also, since we are sequentially encoding a video sequence at different rates, we have the possibility to re-use data that was computed for the reference stream to reduce computation, and also determine how much loss we incur if no motion compensation or only part of it is performed on the dependent stream. The adaptation of the motion compensation is done on a macro block level and three modes can be used. The first of these modes is the usual mode for which the previously reconstructed frame of the same stream is motion compensated to generate the current predicted frame. In the second mode, the motion compensation is computed on the difference between the (n-1)^th reconstructed frames of the current dependent stream and the reference stream. Typically, this difference will be small which, in the DCT domain, translates by a higher number of small or zero coefficients. An appropriate implementation of the MC-DCT can then take advantage of this by not computing the motion compensated DCT coefficients where we expect them to be zero or small. This can be determined by using the quantized residuals of the reference stream . Of course, if a DCT coefficient of the motion compensated frame is not correctly computed at the encoder, it will contribute to a drift (mismatch) error at the decoding time. Finally, in the third mode, we simply use the data from the motion compensated frame of the reference stream. This is more appropriate for B frames since the mismatch errors that are introduced do not propagate to other frames.

3.1.2 Differential Encoding of DCT Coefficients

Figure 38.4 shows a high level diagram of the differential encoder. We assume that a video sequence is simultaneously encoded at multiple rates or quality levels to generate n independent standard compliant (e.g., MPEG-2) video streams. This can be done using the encoder described in Section 3.1.1. One of the video streams is chosen to be the reference stream, in this example, the jth stream, and the other single layer streams are encoded without loss with respect to that reference stream to form differentially encoded streams. The output of the differential video encoder is therefore a reference stream, which is a standard compliant video stream, and a number of differentially encoded streams.

click to expand
Figure 38.4: High-level representation of the differential video encoder (top) and decoder (bottom).

As illustrated in the bottom of Figure 38.4, those differential bit streams can then be used to reconstruct without loss the single layer video streams when the reference stream is available. We emphasize that the video stream generated by the differential decoder is one of the streams generated by the multi-rate encoder, and therefore, it has the rate-distortion characteristics of the single layer encoder used to generate it. This attribute differentiates this approach from scalable codecs and transcoders.

The differential lossless encoder we have developed works at the level of the quantized DCT coefficients. Basically, for a given block of 8x8 pixels, the values of the DCT coefficients of the reference stream are used to predict the values of the DCT coefficients of the target stream. The difference between the predicted and actual values of the target stream DCT coefficients is losslessly encoded using an entropy encoder. Coefficients from inter- and intra-coded blocks are treated differently as we briefly explain next. For intra-coded blocks, the predicted coefficient values of the target stream are simply the quantized coefficient values of the reference stream which are re-quantized with the quantization parameters used for the target stream. The same simple prediction mode could be used for inter-coded blocks. For inter-coded blocks, however, the DCT coefficients encode the difference between the video frame and the motion compensated prediction. As the motion compensated data is not the same at different bit rates, the motion compensated difference will vary with different bit streams. The inter-coefficient predictor we use takes into account the mismatch between the motion compensated frame difference of both streams. It adds to the quantized coefficient values of the reference stream the difference between the motion compensated coefficients of the reference and target streams.

Using the multi-rate encoder described previously, we encoded video sequences without rate control at multiple bit rates. Table 38.1 shows the results for the sequence Football (CIF, 352x240, 30fps) encoded at 5 different rates from 750Kbits/sec to 2Mbits/sec using a quantization parameter value ranging from 7 to 17. The results show that the differential encoding of the coefficients allows saving 30% of the storage space when compared to independently storing the five bit streams at a cost of slightly increasing the CPU load (required to perform differential decoding). The results also show that the performance is independent of the choice of the reference stream. In this specific case, the reference stream was alternatively chosen to be the stream encoded at 1.5 Mbits/sec, 1.2 Mbits/sec, and 1 Mbits/sec with very small difference in performance.

Table 38.1: Bitrate of differentially encoded streams and storage savings with respect to independent coding of the streams for the CIF sequence Football.
Quant. scale value	Bit rate of differential and reference streams (bits/sec)			Bit rate of independently coded streams
Quant. scale value	Ref Q=9	Ref Q=11	Ref Q=13	Bit rate of independently coded streams
7	1,035,357	1,200,314	1,336,536	1,957,853
9	1,492,394	785,261	926,537	1,492,394
11	670,705	1,119,814	625,793	1,193,814
13	626,852	536,158	992,703	992,703
17	435,082	586,105	481,483	741,834
Total bit rate	4,260,390	4,301,652	4,363,052	6,378,599
Savings	33.2%	32.6%	31.6%

3.2 Distributed Video Coding and Transmission

Storage of popular content at a single node in a distributed video database can lead to fragility in the system due to the "single point of failure" phenomenon. Situations when a large number of clients attempt to access data from this single popular storage can lead to excessive loading and consequently content node failure.

A simple and reasonable, but not necessary efficient and always sufficient approach to combat the above problem and to ensure reliability is to replicate or mirror the data at multiple locations, as it usually happens in P2P networks. Therefore, we have developed the VISDOM framework [18], which uses algorithms for efficient coding and transmission of video content that are distributed in nature. VISDOM enables download of real-time content in parallel from multiple sources, with no need for any communication between them, by leveraging the latest advances in the fields of video compression, distributed source coding theory, and networking. Although VISDOM is distributed, there is no loss in performance with respect to an equivalent centralized system. It also leads to load balancing of servers and fault tolerance. That is, if there is a link/server outage, there will be graceful quality degradation at the client end (the decoded quality decreases smoothly with the reduction in the number of received packets).

click to expand
Figure 38.4: VISDOM system block diagram.

The VISDOM approach seeks to deliver a nearly constant perceptual quality to the client. Given that the receiver is connected to multiple sources, VISDOM accomplishes this task while minimizing the net aggregate bandwidth used from all the sources put together. The striking aspect of VISDOM approach is that the net aggregate bandwidth used in VISDOM is the same as that would be used if content were to be streamed from a single source. In this sense, there is no loss in performance with respect to a centralized system even though the VISDOM system is distributed in nature. In fact, the natural diversity effect that arises because of streaming from multiple sources provides load balancing and fault tolerance automatically. The three main aspects of VISDOM, namely video compression, distributed coding, and networking, are operating in synergism leading to a satisfying end user experience. A block diagram of the system is shown in Figure 38.4.

Video Compression: We use a multi-resolution encoded video bit-stream, such as H.263+ or MPEG4-FGS [6]. The advantage of using such a bit-stream is that a single stream can be served over a dynamic network with varying bandwidths and/or cater to diverse receivers.

Robust (Distributed) Coding: A multi-resolution bit stream, as described above, is sensitive to the position of packet loss, i.e., a loss of more "important" bits early on in the bit stream can render the remaining bits useless from the point of view of decoding. Such a prioritized bit stream is inherently mismatched to the Internet, which offers equal treatment to all packets. So, a transcoding mechanism to transform a prioritized bit stream into an un-prioritized one, in which the decoded quality is only a function of the number of packets received, is required. The MDFEC (Multiple Descriptions through Forward Error Correction codes) algorithm offers a computationally efficient solution to this problem by trading off bandwidth for quality [21]. The MDFEC algorithm is discussed in more details in Section 5.2. Essentially, the algorithm uses Forward Error Correction (FEC) ^[10] codes of varying strengths to differentially protect (the more important parts of the bit stream are protected more than the less important parts) the video bit stream. The parameters for encoding are decided based on both the state of the transmission channel as well as the rate-distortion characteristics of the source content. Given these two factors, these parameters are optimized so as to give the best end-to-end performance using a computationally efficient, Lagrangian approach based algorithm. The computational ease of the approach enables dynamic adaptation to changes in network/content characteristics.

Figure 38.5 illustrates the operation of the MDFEC framework. The source content that has been encoded in the multi-resolution format in N layers is unequally protected using channel codes of varying strength. The first layer is protected using an n=N, k = 1 channel code, the second layer is protected using n=N, k =2 code and so on. Thus, with the reception of any one packet, quality commensurate with the first multi-resolution layer is decoded and with the reception of any two packets quality commensurate with the first two layers is obtained and so on. Unlike the multi-resolution bit stream where the decoded quality is sensitive to the position of packet loss, in the MDFEC stream the decoded quality is dependent only on the number of losses.

click to expand
Figure 38.5: MDFEC— The arrangement of FECs and video layers enables successful decoding of layer 1 with reception of any one packet and layers 1 and 2 with the reception of any two packets and so on. The amount of data and protection for each layer can be matched to the transmission channel at hand.

While MDFEC was originally designed for a single channel between the sender and the receiver, an interesting property of this algorithm is that it is inherently distributed. There is a notion of an "aggregate" channel between the client and all the servers, and in this case the MDFEC algorithm is applied to this aggregate channel. How this is achieved in a manner that the servers do not need to communicate with each other is explained in the next paragraph.

Networking: Streaming real-time content from multiple senders to the client in a scalable way without any communication between the various remote servers is accomplished through the client, which acts as the synchronization "master" coordinating all the servers. Thus the client is the controlling agent in this framework ensuring that all the servers operate in harmony.

Further, the same idea is used in ensuring the distributed MDFEC encoding at the servers. The client having access to the aggregate channel state conveys that to all the servers. The individual servers use this information along with the content rate-distortion characteristics to run the MDFEC algorithm independently and hence come up with identical encoding strategies. Then based on a pre-decided allocation, which is also conveyed by the client to all the servers, each of the servers send the pre-assigned portion of the MDFEC bit-stream. See Figure 38.5 for a sample allocation of packets. The dynamic, adaptive nature of MDFEC allows us to run the VISDOM algorithm over coarse time scales to adapt to the varying traffic conditions in the network. The system is fault-tolerant since if one of the servers goes down, the graceful quality degradation of MDFEC with the number of packets as well as the dynamic nature of the adaptation absorbs these variations leading to a nearly imperceptible end-user experience.

Simulation Results: The validity of the VISDOM framework was confirmed in an Internet setting where a content client was placed at University of California, Berkeley and three content servers at University of Illinois, Urbana-Champaign, Rice University, Houston, Texas and EPFL (Ecole Polytechnique Federale de Lausanne) Lausanne, Switzerland respectively. The Football video sequence was streamed to the client in three settings: single server, two servers, and three servers. Figure 38.6 (a),(c),(e) show the delivered picture quality measured in PSNR (dB) as a function of time for the single, two, and three server case respectively. Figure 38.6 (b), (d), (f) show the total transmission rate measured in Kbytes/sec as a function of time for the single, two and three server case. We notice from these plots that for nearly the same transmission rate, the delivered quality becomes more and more smooth as the number of servers is increased. This highlights the diversity effect of using multiple servers.

click to expand
Figure 38.6: Plots (a),(c),(e) show the delivered quality for the Football video sequence (measured in PSNR (dB)) as a function of time for one, two and three servers respectively. Plots (b),(d),(f) show the total transmission rate as a function of time for one, two and three servers respectively. Even though the transmission rate is nearly the same for the three cases, the delivered quality becomes more and more smooth as the number of content servers is increased.

3.3 Adaptive/Dynamic Storage and Caching Management

The storage or caching operations in current database systems—centralized or distributed—are performed on a binary level; either some data is stored/cached or it is deleted. There is no concept of trading in gradual degradation for storage efficiency. For multimedia data such as video "gradual" states of data storage are much more suitable. Media data is bulky but possesses the unique property that it is malleable.

The MAPS Enhanced Media Storage module offers a novel cache management strategy for media data by going beyond binary caching decisions. It exploits transcoding of media data to achieve a higher utilization of cache space. Less frequently accessed media data is cached at a lower quality level in order to consume less cache space. Consequently, more data will fit into the cache, which in turn will lead to a higher cache hit rate.

For every cache entry, a recall value is maintained indicating its popularity. An entry's recall value is calculated based on its access pattern, using heuristics with the following properties:

The more often a media cache entry is accessed, the larger its recall value.
The recall value of a cache entry that is never accessed approaches zero over time, and will thus be deleted if the cache runs out of space.
Often-recalled data will have a higher recall value than newly accessed data.

The cache management system uses entries' recall values to rank the media data in descending order (see Figure 38.7) and to assign class numbers to them according to a local mapping table (see Table 38.2). The class number of a data entry determines its maximum allowed bitrate. If the compression ratio of a cached entry is higher than its allowed bit rate, the entry is added to a background transcoding queue.

Table 38.2: Mapping data to data classes based on its position in the cache. The position is determined by the data's recall value.
Class	From /To
0	[0, N₀]
1	]N₀, N₁]
2	]N₁, N₂]
…	…
N	]N_N-1, N_N]
N+1	] N_N, N_N+1]

click to expand
Figure 38.7: Media data ordering in the adaptive/dynamic storage and caching management system based on recall values. The total cache size is N_N+1.

The amount of transcoding possible at each cache node in a distributed storage system depends on the computational capabilities of the node as well as its current load. Thus, different transcoding tables and cache-partitioning tables should be available at each node for different workloads as well as for nodes with different performance. For instance, a very slow node should not use any transcoding, but instead support only a binary caching decision. Consequently, there would be only one data class assigned to the whole cache. A powerful node, on the other hand, can use a very fine granularity scale transcoding scheme.

Table 38.3 gives an example of a possible mapping from class numbers to target compression qualities.

Table 38.3: Mapping data class numbers to best allowed compression quality.
Class	Video Target Quality	Audio Target Quality
0	Unchanged original quality	Unchanged original quality
1	MPEG4 320x240 at 1.5 Mbits/s	MP3 128Kbits/s
2	MPEG4 320x240 at 700 Kbits /s	MP3 96Kbits/s
…	…	…
N	MPEG4 160x120 at 100 Kbits/s	MP3 mono 11.2 KHz at 16 Kbits/s

^[10]FECs introduce redundancy at the encoder by converting k message symbols into n>k code symbols. This enables the decoder to recover the input k message symbols correctly even if there are some losses in transmission and only some m≥k code symbols are received.