8.4 Distributed File Systems and File Sharing

Although a peer-to-peer storage network enables multiple servers to share a storage resource, it does not automatically enable servers to share the same storage data. In fact, most SAN deployments are configured for a shared-nothing environment, in which individual servers are assigned separate LUNs on the storage target. Each server manages its own data. In a server clustering scheme, the LUNs previously assigned to a failed server can be mapped to active servers so that data access can continue. The SAN provides the network that facilitates this deliberate reassignment of resources, but at any point in time each server has access only to its authorized LUNs.
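As a rough illustration of this shared-nothing ownership model, the sketch below keeps a per-server table of LUN assignments and remaps a failed server's LUNs to a surviving node. The table and function names are hypothetical and stand in for whatever management interface a given SAN actually provides.

# Hypothetical LUN ownership table and failover remapping; names are illustrative.
lun_map = {
    "server_a": ["lun0", "lun1"],   # each server owns its LUNs exclusively
    "server_b": ["lun2"],
    "server_c": ["lun3", "lun4"],
}

def reassign_luns(failed, survivor, table):
    """Move a failed server's LUNs to a surviving cluster node."""
    table.setdefault(survivor, []).extend(table.pop(failed, []))

reassign_luns("server_b", "server_a", lun_map)
print(lun_map["server_a"])   # ['lun0', 'lun1', 'lun2']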

If, alternatively, any server could access any storage resource in the SAN, data corruption would be rampant. Servers might inadvertently overwrite blocks of data, with no server able to track the random changes to the volumes exposed to common ownership. Consequently, low-level mechanisms such as zoning and LUN masking are used to enforce resource allocation and avoid this potential free-for-all.
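LUN masking can be pictured as an access table held at the storage target: each initiator is shown only the LUNs assigned to it. The minimal sketch below illustrates the idea with invented WWPNs and LUN numbers; it is not the interface of any particular array.

# Illustrative LUN-masking table: initiator WWPN -> LUNs it is allowed to see.
# The WWPNs and LUN numbers are invented for the example.
mask_table = {
    "10:00:00:00:c9:aa:bb:01": {0, 1},
    "10:00:00:00:c9:aa:bb:02": {2},
}

def io_permitted(initiator_wwpn, lun):
    """Return True only if the target exposes this LUN to this initiator."""
    return lun in mask_table.get(initiator_wwpn, set())

print(io_permitted("10:00:00:00:c9:aa:bb:01", 1))   # True
print(io_permitted("10:00:00:00:c9:aa:bb:02", 1))   # False: LUN 1 is masked off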

To share the storage data itself, a layer of management is needed to prevent data corruption. You must monitor the status of a file or record that is accessible to multiple servers on the SAN so that you can track changes. In read-only applications, such as multiple servers accessing display-only Web content, a simple file-locking monitor may be sufficient. In active read/write applications, such as a load-sharing server cluster, more sophisticated management is required. Changes to shared data must be permitted, but in an orderly fashion that synchronizes all modifications to the original data and writes a coherent version back to disk.
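The sketch below models this read/write discipline on a single host using POSIX advisory locks: writers take an exclusive lock and readers a shared lock, so modifications are serialized and a coherent version is written back. A real SAN file-sharing environment needs a cluster-aware lock manager rather than host-local fcntl locks, so treat this only as a model of the locking behavior described above.

# Host-local stand-in (Unix only) for the lock coordination described in the text.
import fcntl

def update_shared_file(path, new_text):
    with open(path, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)      # block until exclusive access is granted
        try:
            f.write(new_text)              # modifications happen one at a time
            f.flush()                      # coherent version written back to disk
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)  # release so other servers may proceed

def read_shared_file(path):
    with open(path) as f:
        fcntl.flock(f, fcntl.LOCK_SH)      # shared lock: many readers allowed
        try:
            return f.read()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)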

Data sharing is further complicated by the fact that the data is typically dispersed over multiple storage arrays as a storage pool, as shown in Figure 8-5. The complex of physical storage devices on the SAN must be presented as a single logical resource on top of which sits a common view of a file system shared by all servers in a cluster. A distributed volume manager must thus present a coherent view of the physical storage resources; a distributed file system presents a uniform view of directories, subdirectories, and files.

Figure 8-5. A distributed file system presents a common view of resources to all servers

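Conceptually, the distributed volume manager's job in Figure 8-5 can be reduced to a mapping from one logical block address space onto the underlying arrays. The toy example below, with invented array names and sizes, concatenates three arrays into a single pool and translates a logical block number into an (array, physical block) pair.

# Toy volume-manager mapping: several physical arrays (invented names and sizes)
# concatenated into one logical block address space visible to every server.
arrays = [("array_1", 1000), ("array_2", 2000), ("array_3", 1500)]  # (name, blocks)

def locate(logical_block):
    """Translate a logical block number into (array name, physical block)."""
    offset = logical_block
    for name, size in arrays:
        if offset < size:
            return name, offset
        offset -= size
    raise ValueError("logical block beyond the end of the pool")

print(locate(500))     # ('array_1', 500)
print(locate(2500))    # ('array_2', 1500)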

A distributed file system must present a consistent image to the server cluster. If a new file is created by one server, other servers must be updated immediately. Similarly, a file opened by one server cannot be arbitrarily deleted by another, or else the file system will lose integrity. If multiple servers can have the same file open, and modifications to the file are permitted, the distributed file system must also be able to notify each server of pending changes.
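One way to picture the metadata such a file system maintains is a table of which servers currently hold each file open: a delete is refused while any server still has the file open, and a write can be announced to every other holder. The class below is a hypothetical illustration of that bookkeeping, not the design of any particular distributed file system.

# Hypothetical bookkeeping: track open files, refuse deletes while a file is
# open, and report which servers must be notified of a pending change.
class FileState:
    def __init__(self):
        self.open_by = {}                       # path -> set of server names

    def open(self, server, path):
        self.open_by.setdefault(path, set()).add(server)

    def close(self, server, path):
        self.open_by.get(path, set()).discard(server)

    def delete(self, server, path):
        if self.open_by.get(path):
            raise PermissionError(path + " is still open by " + str(self.open_by[path]))
        # ...remove the file and its metadata here

    def pending_change(self, writer, path):
        """Servers that must be told to refresh their view of the file."""
        return self.open_by.get(path, set()) - {writer}

fs = FileState()
fs.open("server_a", "/shared/report.dat")
fs.open("server_b", "/shared/report.dat")
print(fs.pending_change("server_a", "/shared/report.dat"))   # {'server_b'}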

This task is further complicated by the fact that a server may be working on file content based on a now-outdated version in its cache buffers. Cache coherency between multiple servers requires notification of any changes to the file and a reread of the updated file to refresh the cache. One solution to this problem is to force the server to rely on cache buffers on the storage array instead of the host system. Device Memory Export Protocol (DMEP) has been proposed as a means to maintain cache coherency on storage arrays and avoid the issue of conflicting cached versions of files on each clustered server.
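The invalidate-and-reread behavior can be modeled with a version number on the shared file: a server's cached copy is valid only while its version matches the file's current version, and any mismatch forces a reread from storage. The sketch below is purely illustrative and does not represent the DMEP protocol itself.

# Minimal invalidate-and-reread cache model: a stale cached version forces a
# reread before the next read is satisfied.
class SharedFile:
    def __init__(self, content=""):
        self.content = content
        self.version = 0

    def write(self, new_content):
        self.content = new_content
        self.version += 1          # any cached copy of an older version is now stale

class ServerCache:
    def __init__(self, shared):
        self.shared = shared
        self.cached = None
        self.cached_version = -1

    def read(self):
        if self.cached_version != self.shared.version:   # stale or empty cache
            self.cached = self.shared.content             # reread from storage
            self.cached_version = self.shared.version
        return self.cached

f = SharedFile("v1 data")
a = ServerCache(f)
print(a.read())          # 'v1 data' now cached on server A
f.write("v2 data")       # another server updates the file
print(a.read())          # version mismatch forces a reread -> 'v2 data'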

Applications that benefit from data sharing range from high-availability server clusters to processing-intensive application clusters that must digest massive amounts of data. The scientific research facility Fermilab, for example, uses Sistina's Global File System (GFS) for large server clusters that analyze data on the distribution of galaxies in the universe. With an initial server cluster of ten CPUs acting in concert, the Fermilab project must analyze more than 15 terabytes of data. The Sistina GFS presents a common view of data to the server cluster, enabling the processing to be shared across multiple platforms. In this example, the Sistina GFS also interfaces with DMEP in the target storage arrays to provide a streamlined mechanism for file locking and modification.

Although the management of distributed file systems is more complex than that of a shared-nothing implementation, a distributed file system facilitates high availability, simplifies system administration, and enables high-performance processing clusters for a wide variety of compute-intensive applications. Tighter integration into operating systems will make distributed file systems and file sharing more available and further enhance the value proposition of SANs.


