2. System Architecture

There are two basic techniques for assigning the data blocks of a media object, in a load-balanced manner, to the magnetic disk drives that form the storage system: in a round-robin sequence [3], or in a random manner [31]. Traditionally, round-robin placement uses a cycle-based approach to resource scheduling to guarantee a continuous display, while random placement uses a deadline-driven approach. In general, the round-robin approach provides high throughput with little wasted bandwidth for video objects that are retrieved sequentially. It can employ optimized disk scheduling algorithms (such as elevator [27]) as well as object replication and request migration techniques [8] to reduce the inherently high startup latency. The random approach allows fewer optimizations, potentially resulting in lower throughput. However, several benefits outweigh this drawback, as described in [32]: 1) support for multiple delivery rates with a single server block size, 2) support for interactive applications, and 3) support for data reorganization during disk scaling.
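The difference between the two placement techniques can be illustrated with a minimal sketch. The function names, the `start_disk` parameter and the use of Python's `random` module are assumptions made for this illustration; they are not taken from [3] or [31].

```python
import random
from collections import Counter


def round_robin_placement(num_blocks, num_disks, start_disk=0):
    """Assign block i to disk (start_disk + i) mod num_disks."""
    return [(start_disk + i) % num_disks for i in range(num_blocks)]


def random_placement(num_blocks, num_disks, rng):
    """Assign every block to an independently, uniformly chosen disk."""
    return [rng.randrange(num_disks) for _ in range(num_blocks)]


def blocks_per_disk(placement):
    """Count how many blocks landed on each disk."""
    return Counter(placement)
```

Calling `blocks_per_disk(round_robin_placement(1000, 4))` yields exactly 250 blocks per disk, whereas `blocks_per_disk(random_placement(1000, 4, random.Random(1)))` is balanced only statistically, so the per-disk counts vary around 250.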

One potential disadvantage of random data placement is the need for a large amount of meta-data: the location of each block (e.g., a tuple of the form <node_x, disk_y>) must be stored and managed in a centralized repository. Yima avoids this overhead by using pseudo-random block placement. A pseudo-random number generator takes a seed value and produces a sequence of random numbers that can be reproduced by reusing the same seed. File objects are split into fixed-size blocks, and each block is assigned to the disk selected by the next number in the sequence. Block retrieval works the same way: the sequence is regenerated from the seed to find the disk that holds each block. Hence, Yima needs to store only the seed for each file object, rather than the location of every block.
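The following sketch shows how a stored seed can replace a per-block location table. It assumes Python's built-in generator (`random.Random`) and a hypothetical layout with a fixed number of disks per node; the generator and the mapping that Yima actually uses are not specified here.

```python
import random


def block_locations(seed, num_blocks, num_disks):
    """Recompute the disk index of every block of a file from its seed."""
    rng = random.Random(seed)  # same seed -> same sequence
    return [rng.randrange(num_disks) for _ in range(num_blocks)]


def locate_block(seed, block_index, num_disks):
    """Find the disk of one block by replaying the sequence up to it."""
    rng = random.Random(seed)
    for _ in range(block_index):
        rng.randrange(num_disks)  # discard the draws of earlier blocks
    return rng.randrange(num_disks)


def to_node_disk(disk_index, disks_per_node):
    """Map a global disk index to a (node, disk) pair (assumed layout)."""
    return divmod(disk_index, disks_per_node)
```

Placement and retrieval call the same function with the same seed, so they agree on every block location without any shared table. Replaying the sequence makes `locate_block` linear in the block index; a real server would presumably cache or advance the generator incrementally during sequential playback.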

The design of Yima is based on a bipartite model. In a centralized design, the scheduler, the RTSP and the RTP server modules all reside, from a client's viewpoint, on a single master node. Yima expands on decentralization by keeping only the RTSP module centralized (again from the client's viewpoint) and parallelizing the scheduling and RTP functions, as shown in Figure 32.1. Hence, every node retrieves and schedules the data blocks that are stored locally and sends them directly to the requesting client, thereby eliminating a potential bottleneck caused by routing all data through a single node. Eliminating this bottleneck and distributing the scheduler reduces the inter-node traffic to control-related messages only, which is orders of magnitude less than the streaming data traffic. The term "bipartite" refers to the two groups, a server group and a client group (in the general case of multiple clients), such that data flows only between the groups and not between members of a group. Although the advantages of the bipartite design are clear, its realization introduces several new challenges. First, since clients receive data from multiple servers, a global order of all packets per session must be imposed, and the communication between the client and the servers must be carefully designed. Second, an RTSP server node must be maintained for client requests, along with a distributed scheduler and an RTP server on each node. Lastly, a flow control mechanism is needed to prevent client buffer overflow or starvation.
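The first challenge, imposing a global packet order when several nodes send to one client, can be sketched as a client-side reorder buffer. The sketch assumes that every packet of a session carries a session-wide sequence number; this numbering and the `ReorderBuffer` class are hypothetical and do not describe how Yima itself orders packets.

```python
class ReorderBuffer:
    """Releases packets of one session in global sequence order."""

    def __init__(self, first_seq=0):
        self._pending = {}          # seq -> payload, held until deliverable
        self._next_seq = first_seq  # next sequence number to release

    def push(self, seq, payload):
        """Accept a packet from any server node.

        Returns the list of payloads that became deliverable, in order.
        """
        if seq < self._next_seq or seq in self._pending:
            return []  # duplicate or already delivered: ignore
        self._pending[seq] = payload
        ready = []
        while self._next_seq in self._pending:
            ready.append(self._pending.pop(self._next_seq))
            self._next_seq += 1
        return ready
```

A packet that arrives ahead of its turn is held until the gap before it is filled, so the decoder sees one ordered stream regardless of which node sent each packet.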

Figure 32.1: The Yima multi-node hardware architecture. Each node is based on a standard PC and connects to one or more disk drives and the network.

Each client maintains contact with one RTSP module for the duration of a session to relay control-related information (such as PAUSE and RESUME commands). A session is defined as a complete RTSP transaction for a continuous media stream, starting with the DESCRIBE and PLAY commands and ending with a TEARDOWN command. When a client requests a data stream using RTSP, it is directed to a server node running an RTSP module. For load-balancing purposes, each server node may run an RTSP module. The decision of which RTSP server a client contacts can be based on either round-robin DNS or a load-balancing switch. Moreover, if an RTSP server fails, sessions are not lost; instead, they are reassigned to another RTSP server and the delivery of data is not interrupted.

To avoid bursty traffic and to accommodate variable bitrate media, the client sends slowdown or speedup signals that adjust the data transmission rate of the Yima server. The client monitors the amount of data in its buffer and periodically sends these signals to the server so that it receives a smooth flow of data. If the amount of buffered data decreases (increases), the client issues speedup (slowdown) requests. Thus, the amount of buffered data can remain close to constant to support consumption of variable bitrate media. This mechanism complicates the server scheduler logic, but bursty traffic is greatly reduced, as shown in Sec. 3.
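A minimal sketch of this client-driven flow control is given below. The two watermark thresholds, the fixed rate step and the signal names are assumptions chosen for illustration; the actual thresholds and rate adjustments used by Yima are not stated in this section.

```python
def flow_control_signal(buffer_level, low_watermark, high_watermark):
    """Decide which signal, if any, the client sends to the server."""
    if buffer_level < low_watermark:
        return "SPEEDUP"   # buffer is draining: ask for a higher rate
    if buffer_level > high_watermark:
        return "SLOWDOWN"  # buffer is filling: ask for a lower rate
    return None            # buffer is in the target band: stay silent


def simulate(consumption, initial_rate, initial_level, low, high, step):
    """Replay per-interval consumption and record (level, rate, signal)."""
    rate, level = initial_rate, initial_level
    history = []
    for consumed in consumption:
        # The buffer gains what the server sent and loses what was played.
        level = max(0, level + rate - consumed)
        signal = flow_control_signal(level, low, high)
        if signal == "SPEEDUP":
            rate += step
        elif signal == "SLOWDOWN":
            rate = max(0, rate - step)
        history.append((level, rate, signal))
    return history
```

With a variable consumption trace, the server rate follows the consumption rate with a short delay, and the buffer level stays between the two watermarks instead of absorbing large bursts.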