You've already seen the minimum components of a SAN: a storage device, a connection device, and physical wiring. You build a SAN because it offers you the flexibility to add many components to a network and turn them into a shared resource at a very high speed. Therefore, you can expect to see all manner of devices attached to a SAN, including the following:
Storage networking is one of the most dynamic areas of technology around, and many new devices and software products are introduced every year. All these devices can be related to the shared storage model we've been discussing. Let's use some of the vocabulary you've just seen to put some of the various storage networking devices into context.

Block-Oriented Servers

The block layer is where you manage data, create fault tolerance, and organize data for best performance. That means that the following devices are assigned as block-level devices:
The system of I/O that is used to aggregate blocks passes a vector of block addresses between the file/record layer and the block layer. Thus, a typical block-layer device can span three layers of the model, as shown by the prototypical block-layer device, a disk array (refer to Figure 12.3). In fact, block-oriented architectures don't even need to be fully contained within the block layer. As you can see in Figure 12.4, a host (or server) connected to a DAS device extends this type of SAN component to other layers, such as the three hosts shown spanning the file/record layer. Block-oriented devices require three components: disk storage, aggregation, and the logic necessary to supply data to applications.

Figure 12.4. Block-oriented devices require a three-layer approach to storage networking. This figure is used by permission of SNIA.

File-Oriented Servers

The file/record layer offers different opportunities for the creation of storage networked devices. What happens at the file/record layer is a mapping function of volumes to files and volumes to tables and tuples, which are essentially the functions of a file system. (This is a different kind of mapping than the block address mapping done in the block layer.) A file system can be at the operating system level in a host, in a database server or its equivalent, or in a distributed file system, some of which are built around HTTP protocol caches. SAN components that are file-oriented devices may be any of the following:
Figure 12.5 shows how file-oriented storage devices are illustrated within the SNIA networking model.

Figure 12.5. File-oriented storage devices. Figure courtesy of SNIA.

Thus, you can see that even file-oriented storage devices can incorporate various layers in the SNIA model. A NAS server is essentially a SAN in a box; it spans all the layers, from the very lowest block level through the file/record layer that connects directly into the LAN. From a host's point of view, a NAS server appears on the network as any other host would. Similarly, a NAS head also appears as a named server on a network, but unlike a fully contained NAS server, a NAS head requires a connection through a storage network to a disk array to be a fully functional storage device. That is, you can't just plug a NAS head into an Ethernet switch and have it work as soon as you assign a TCP/IP address to it.

Host-oriented, file-based storage devices require not only an HBA to be functional but also a logical volume manager (LVM) to supply the mapping function. Depending on the type of disk controller used, the HBA can be a simple network interface to the storage network (the storage network-attached host on the right in Figure 12.5), or, when the host has either software or hardware RAID, the block mapping function can be moved into the file layer of the host.

At nearly any point in a SAN, it makes sense to place a cache. Referring again to Figure 12.5, which shows the various file-oriented servers, you could put a cache in each of the storage servers, with the exception of the host with LVM and the NAS head. The reason you wouldn't cache those two servers is that you get faster performance from caching the disk array that they both connect to. Caching is a very important application in SANs, and caching appliances are in fact their own category of devices. Network Appliance, for example, has a significant business in cache appliances that are specially tuned filers.
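The benefit of caching the shared disk array, rather than each server in front of it, can be illustrated with a small sketch. The Python below is not taken from any product: `BlockStore` and `ReadCache` are hypothetical names, the "array" is just a dictionary, and the least-recently-used eviction policy is an assumption chosen for simplicity.

```python
from collections import OrderedDict


class BlockStore:
    """Hypothetical stand-in for a disk array: maps block numbers to bytes."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.reads = 0  # counts how often the "disk" is actually touched

    def read(self, block_no):
        self.reads += 1
        return self.blocks[block_no]


class ReadCache:
    """Read-through cache with LRU eviction, placed in front of a BlockStore."""

    def __init__(self, backing, capacity):
        self.backing = backing
        self.capacity = capacity
        self.entries = OrderedDict()

    def read(self, block_no):
        if block_no in self.entries:
            self.entries.move_to_end(block_no)  # mark as most recently used
            return self.entries[block_no]
        data = self.backing.read(block_no)      # miss: go to the array
        self.entries[block_no] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict least recently used
        return data


array = BlockStore({n: f"block-{n}".encode() for n in range(8)})
cache = ReadCache(array, capacity=2)

# Two clients of the same array (say, a NAS head and a host with an LVM)
# ask for the same block; only the first request reaches the disks.
first = cache.read(3)
second = cache.read(3)
print(first, second, array.reads)
```

Because both requests pass through the one cache in front of the array, the second is served from memory, which is the effect a centrally placed cache is meant to provide.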
From a topological viewpoint, you would place a cache appliance in the network block aggregation layer, directly into the storage network, where its central location benefits all devices connected to the SAN.

All the aforementioned theoretical discussion brings up the essential point that there are two basic types of storage networking devices: block-oriented and file-oriented servers. Intelligent disk arrays such as an EMC Symmetrix, an HDS Freedom, or a Hewlett-Packard StorageWorks server are block-oriented servers. The classic examples of NAS servers are Network Appliance's filers (FS series), EMC Celerra, and Dell PowerVaults, to name but a few. There are the big boys, very large storage deployment devices, and there are much smaller departmental and personal storage devices that are similarly architected. Each offers a distributed storage solution, but the two different approaches are better at doing different things. A block-oriented server is highly efficient at moving volumes of data, such as in backup or transactional systems. A file-oriented server is best at serving up files, and it is particularly good at serving up big files, as you might have for streaming applications. While this discussion is rather theoretical in nature, it is important: by simply keeping this one fundamental difference between storage server types in mind, you can guess the reasons that different storage network devices or architectures and different software applications hold sway where they do.

Software and Services

A SAN has the same requirements for managing components as any other network does. Therefore, the SNIA model defines a services component that spans all layers of the storage domain. SAN software runs the gamut in size and cost. There's free open source software such as the Samba file-sharing software. You often find a lot of software bundled with the hardware you buy; it isn't exactly free, but it is of unknown cost.
Software written for SANs, even common server software such as backup programs, tends to be expensive. The most capable packages (for example, SANPoint Control and Foundation) can cost hundreds of thousands of dollars to employ on a SAN. But here's the thing about SAN software: It is absolutely the most critical part of a SAN and holds the greatest opportunity for both short-term and long-term success, as measured by both performance gains and cost savings. When considering building a SAN, after you decide exactly what functions you need to implement, your next consideration should be the software. Refer to Figure 12.3, where the SNIA model shows software in the services layer. The following classifications of software would be included as service-layer applications:
Arrays

Arrays are the stars of the SAN world. That may be because it is in the storage container itself that the most money is spent. However, it is also because the arrays of today come with an unprecedented amount of intelligence, in the forms of built-in capabilities and special software for management and staging as well as many specialized tasks. It's easy to develop a storage-server-centric point of view.

When it comes to storage, people use the term arrays rather loosely. An array is a storage container that contains multiple storage devices that can be managed as a single entity from either a connected host or a remote point. Some arrays are small, a collection of disks you can count on your hand(s), and live inside your application server. At the high end are storage servers the size of a very large refrigerator, such as the Symmetrix DMX3000, which can contain up to 584 disks and more than a terabyte of disk space.

There's a clear distinction being made in the industry between a simple array and an "intelligent" array. An intelligent array implies a complete server solution. Most often arrays are based on hard drives, and nearly all arrays support RAID. But solid-state memory caching servers are also configured as disk arrays, and large managed tape libraries are also referred to as tape arrays; a few even support what is called tape RAID. Any RAID configuration is by definition an array. Depending on your point of view, an enclosure set up as a JBOD may or may not be an array.

To qualify as an array, a storage system should be able to be managed by some form of control software. Control software presents a unified method for defining data structures, executing commands that are carried out in a disk controller or intelligent HBA, or executing logic in the BIOS of the storage server as part of the firmware. Nearly any array you purchase comes with software to manage that array. You can also buy software from third parties to manage various storage arrays.
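One way to picture what "managed as a single entity" means at the block level is to look at how a logical block number might be turned into a location on one of the member disks. The sketch below assumes simple striping with no parity (RAID 0 style); the function names `locate` and `locate_many` and the `stripe_blocks` parameter are hypothetical, and real controllers use their own layouts.

```python
def locate(logical_block, num_disks, stripe_blocks=1):
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes plain striping without parity: consecutive stripe units of
    `stripe_blocks` blocks are dealt out to the disks in round-robin order.
    """
    stripe_no, within = divmod(logical_block, stripe_blocks)
    disk = stripe_no % num_disks    # which member disk holds this stripe unit
    row = stripe_no // num_disks    # how many full rows precede it
    return disk, row * stripe_blocks + within


def locate_many(logical_blocks, num_disks, stripe_blocks=1):
    """Translate a whole vector of logical block addresses in one call."""
    return [locate(b, num_disks, stripe_blocks) for b in logical_blocks]


# A host sees one device with blocks 0..N; the array spreads them over 4 disks.
print(locate_many([0, 1, 4, 5], num_disks=4))
print(locate(9, num_disks=4, stripe_blocks=2))
```

The host addresses a single run of block numbers, while the translation to individual disks stays inside the array, which is why the whole enclosure can be presented and managed as one device.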
Raidtec's line of products exemplifies the range of devices to which the term array is applied. Raidtec (www.raidtec.com) sells an external DAS array called the Raidtec CS3102, which connects to a server using a dual Ultra 320 LVD SCSI controller. Because it is an entry-level system, the CS3102 is populated with SATA disks. Further up the food chain in Raidtec's DAS offerings is its FlexArray Ultra, which is an Ultra 160 SCSI host-independent array. Raidtec is known primarily for its smaller systems and has been acquired by Plasmon (www.plasmon.com). However, it also has larger products. Its FibreArray Extreme array with Fibre Channel connectivity is the large server in Figure 12.6. In this figure, there are 1U and 2U devices. Raidtec also sells a NAS box, the SNAZ Pro, a managed NAS server with an integrated Fibre Channel SAN router; the SNAZ Elite; and the large array shown in the figure, which is the FibreArray AA solution. Each of these storage systems qualifies as an array, but each is meant to be used with different applications.

Figure 12.6. The Raidtec family of products spans the range from small DAS arrays up to large enterprise arrays meant for SAN applications.
Raidtec is only an example of what is a very large marketplace of system manufacturers where no vendor really dominates the industry. Other vendors that offer arrays for the server market are listed in Table 12.2. Chances are that when you purchased your server, your server vendor had an array that it wanted you to buy. All server hardware companies either own or partner with companies that make arrays. Dell, for example, has the PowerVault line of products. In some instances, Dell's products are rebranded from vendors such as IBM or EMC, and in others, they are assembled at Dell. Hewlett-Packard and Sun have storage divisions that build storage arrays, and IBM's storage division (which was sold to Hitachi Data Systems) was another OEM. All these vendors offer a very large range of arrays, from small to large.
When considering an array, you need to consider the following properties:
At the higher end of the food chain are a few vendors and some notable product lines. Most people would recognize the lines of the large block-oriented servers: EMC Symmetrix DMX, Hewlett-Packard StorageWorks XP, the Hitachi Data Systems Lightning 9900 or Thunder 9500, or IBM TotalStorage DS arrays. However, the term array applies equally well to NAS file servers, such as Network Appliance's FAS980 and EMC's Celerra line, which also contain arrays of disks. Figure 12.7 shows a picture of the FAS980. If you take the faceplate off the FAS980, you find that the system is a multidisk RAID array (RAID 4) that contains both an onboard processor and controllers and HBAs. Essentially, the FAS is a fully fledged computer, albeit one that is optimized for file services, and the NetApp boxes come with their own small, proprietary, high-performance operating system. All the products mentioned in this paragraph fall under the category of intelligent disk arrays, and it is only whether they are file- or block-oriented that separates one from another.

Figure 12.7. The Network Appliance FAS980 is a storage array that is optimized for file services and is a computer or host in its own right.
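RAID 4, mentioned above, stripes data across several disks and keeps parity on one dedicated disk, with the parity computed as the XOR of the data blocks in each stripe. The following is only an illustrative sketch of that arithmetic, not a description of how the FAS980 implements it; the helper name `xor_blocks` and the four-byte "blocks" are assumptions made for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data disks plus one dedicated parity disk.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Suppose the second data disk fails. XOR of the survivors and the parity
# block yields the missing block again.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

The same XOR that produces the parity block also recovers a lost block, which is why a single dedicated parity disk is enough to survive the failure of any one disk in the stripe.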
In purchasing a disk array for your server, you need to be concerned with only a few compatibility issues. A disk array either works with your server and its operating system or it doesn't. In most cases, this compatibility has more to do with the disk or RAID controller than it does with the array hardware itself. As you move up to the very large storage arrays, their use of onboard processors and proprietary operating systems means that you need to be very concerned with their ability to interoperate with other hosts and storage systems. Many companies advertise what they call "open systems storage solutions," which may mean that their servers work with multiple types of hosts, or simply that their servers can connect to various flavors of UNIX.