NAS is a product concept that packages file system hardware and software with a complete storage I/O subsystem as an integrated file server solution. All of the technologies that have already been discussed in this book, plus additional management features, can be rolled into a single NAS product offering. NAS systems are often referred to as storage appliances because, like kitchen appliances, they are designed for ease of use and convenience. They save significant administration effort compared to traditional file server systems. NAS bypasses the selection and installation of system, CPU, memory, network cards, storage HBAs, storage devices and subsystems, network file system software, and specialized device drivers. The typical installation of a NAS server appliance can be accomplished in a matter of minutes, as opposed to hours or days.

NOTE The concept of NAS was developed by Auspex, a company founded by Larry Boucher, one of the most influential and visionary engineers in the history of storage networking. Larry was at Shugart and Associates as one of the inventors of SCSI, and he left there to start Adaptec, a very successful developer of SCSI controller technology, before starting Auspex. His current company, Alacritech, is a developer of TCP offload technology for iSCSI. Few people have contributed more to the storage industry and, more admirably, to their community. All props to Larry.

NAS Designs

NAS appliances come in a wide variety of packages and prices. Enterprise-level NAS systems can cost more than $100,000, and entry-level NAS systems with a single disk drive may cost less than $300. Some of the design options that go into NAS products are discussed in the following sections.

Physical Dimensions

NAS systems come in a wide variety of shapes and sizes, from small desktop packages barely larger than a disk drive to modules that fill complete 19-inch vertical rack spaces.

Storage Capacity

NAS systems are sold with a few hundred gigabytes of usable storage capacity and extend up to tens of terabytes. Capacity is one of the primary design decisions of any storage product, NAS included. The number of disk drives, the selection of RAID technology used, and the number of I/O channels are some of the primary storage elements that make up a NAS design. Equally important are the power subsystem and the physical cabinet dimensions.

Interconnect and Disk Technology Selection

NAS products are typically designed with a particular kind of interconnect and disk technology, including ATA, SATA, parallel SCSI, and Fibre Channel. This selection has a significant impact on the capacity, scalability, and performance of a NAS system.

Operating System

NAS products have historically been implemented using a commercially available or specialized operating system. The primary commercial operating systems include Linux, FreeBSD, and Windows. Often some type of customization is done to the operating systems to minimize nonstorage functions and optimize storage functions.

Specialized operating systems have been used to excellent effect by some NAS vendors. Instead of trying to reengineer an existing commercial operating system, with all its underlying complexity, NAS developers can build their own special-purpose operating systems optimized for NAS functions. Sometimes these operating systems are referred to as micro-kernel operating systems because their kernels are much smaller than the kernels of commercial desktop operating systems.

File System

The selection of the file system to use in the NAS server obviously determines many of its characteristics and capabilities. The file system selection can be made independently of the operating system, but the two are often implemented as a pair. If Windows is chosen as the operating system, it is highly likely that a Windows file system will also be used.
However, if Linux or FreeBSD is used, the NAS vendor may be inclined to use an open-source file system such as SGI's XFS or Red Hat's GFS. There are more file system options for open-source operating systems than there are for Windows.

Just as special-purpose micro-kernel operating systems have been implemented in NAS systems, special-purpose file systems have also been used very successfully. An independent file system has the advantage of being neutral to Windows and UNIX and therefore can potentially accommodate both types of clients and protocols (NFS and CIFS) more easily. As with the micro-kernel operating system, it may be easier to build and maintain an independent file system than to continue to alter and maintain a "foreign" file system generated externally by another company or as the result of an open-source software initiative.

A file system may be chosen for its affinity for certain applications. For example, a NAS design could include a large block-file system to optimize performance for streaming I/O applications. Another NAS system could incorporate a file system with name space virtualization features that could be used in collaborative computing environments or for data management functions.

Journaling (or Not)

Most NAS systems have journaled file systems to aid in the recoverability of the file system following an unexpected shutdown. However, a file system that does not provide journaling, such as the Linux ext2 file system, could also be used. Journaling takes time, and there are applications, such as real-time data acquisition applications, where recovery is secondary to fast operations.

Data Management

Data management refers to the functionality of storing and encoding data files in a way that helps administrators locate and perform operations on files quickly as well as responsibly protect the business from a variety of errors, accidents, and disasters.
Data management is distinct from storage management techniques like RAID, mirroring, and remote copy, because data management works on files or other objects, as opposed to operating on blocks of data.

One form of data management that has been used successfully with NAS is a file-level snapshot. This allows administrators to keep several aging versions of files available for users if they need them. The topic of snapshots is discussed further in Chapter 17, "Data Management."

NOTE Network Appliance has been the industry leader in NAS for many years, based in large part on the integration of data management with its proprietary write anywhere file layout (WAFL) file system. Surprisingly, many would-be competitors have fallen on their swords, not understanding the importance of data management in a product whose reason for existence is storing files. What's with that?

Performance and Throughput

NAS systems can be designed for high performance. The speed of the processor used, the number of processors in a system, and the speed and number of network interfaces can all contribute to performance gains in a NAS system. The file system access can also be accelerated. While this is easier said than done, there are companies that have developed hardware and software technology designed to achieve better performance for certain applications, such as genetic sequencing, film/video rendering, and seismic analysis.

Availability and Reliability

NAS systems have a variety of options for data availability. Starting with disk mirroring and RAID, NAS systems can also implement things like hot sparing and remote replication of files to other NAS systems. Some NAS systems can be implemented as clusters so that if one NAS system fails the other can continue supplying applications with file data.

Dual-Function Storage: NAS and SAN

It is possible to integrate the functionality of both NAS and SAN storage within a single product.
After all, if disk drives will be spinning inside a cabinet, why not allow them to be used as either block or file storage? This type of dual-function subsystem depends on having an integrated network file system (file service and file system) running in the subsystem in addition to a storage controller that creates storage address spaces. The software for the network file system could run on processors and memory in the subsystem controller circuit board, but it could also run on a separate processor circuit board.

Figure 15-3 shows a dual-purpose storage subsystem. Half of it is used for storage-level block I/O, and the other half is used for NAS file serving.

Figure 15-3. A Dual-Purpose NAS + SAN Storage Subsystem

Network Domain Services

NAS systems can participate in Windows networks as domain controllers, much the same as file servers running Windows file serving software. This is usually accomplished in NAS systems through the use of licensed Windows software or SAMBA.

Backup and Recovery of NAS Systems with NDMP

NAS systems built with reduced-functionality or micro-kernel operating systems have scant, if any, support for tape backup hardware and also usually lack backup agent software that works with network backup software. These systems have basically two options for backup: back up over the LAN or use Network Data Management Protocol (NDMP).

To back up over the LAN, a backup system mounts the network file system and then copies files from it as if it were a local disk volume. This approach works well for small NAS systems but is not necessarily realistic for larger NAS systems due to the amount of data that regularly needs to be backed up. Instead, larger systems are usually backed up with NDMP.

NDMP is a protocol for remotely controlling SCSI tape equipment connected to a NAS system. It can be thought of as a TCP/IP tunneling protocol for SCSI data between a backup server and a NAS system.
The backup server issues SCSI commands that are sent via NDMP to the NAS system. The NAS system extracts the commands and sends them to tape equipment. Similarly, SCSI command responses from tape equipment are received and processed by the proxy driver in the NAS system before they are transmitted to the backup server over the NDMP connection.

Figure 15-4 shows the basic NDMP backup model.

Figure 15-4. The NDMP Backup Model

NAS Scalability

Beyond the techniques of using RAID or disk striping, NAS systems are often designed for expandable storage capacity. Most have a way to increase the number of disk drives or disk drive expansion cabinets that can be connected. Other capacity technologies relevant to NAS are discussed in the following sections.

Dynamic File System Expansion

Some NAS systems have dynamically expandable file systems that can take advantage of new capacity as it is added. While this might seem like an obvious feature to include, the storage devices of many NAS systems with ATA or SCSI disk drives do not allow new storage to be added without shutting down the system. Beyond that, there needs to be some way to initiate a process that recognizes changes to the storage address space.

NAS Head

Related to the discussion of dynamic file system expansion are NAS products designed to be attached to independent block I/O storage subsystems. NAS heads are basically NAS systems with the operating system and network file system running, but without the bulk storage that would be used to store files. Users or systems integrators are expected to connect the NAS head to existing storage and use the storage address space(s) available there.

While this might seem counterintuitive for a product class that is commonly thought of as being complete storage appliances, the advantage of the NAS head approach is extended scalability for the NAS file system.
If the storage address space of a NAS system is constrained by its physical configuration, there is a point where no more disk storage can be added, which creates problems for administrators (see the next section). If storage can be added to a NAS system by attaching to it through a SAN, it is possible to create NAS systems that are much more scalable. However, scalability is not limited only by physical constraints. Other I/O and logical limitations for file systems prevent endless scaling through a NAS head.

Adding More NAS Systems: Less Than Wonderful

The simplicity of NAS eventually becomes problematic when the maximum capacity of the system is reached. At that point the only option is to install another NAS system. Sometimes, this is done as a forklift upgrade, where a new, larger system replaces an older one that is out of capacity. Other times, a system is introduced and the workload is divided between the old and new systems. While these sound simple enough, neither process is as straightforward as it would first appear.

In the first case, the data is copied from the old system to the new one. If there is a lot of data to copy, the process can take many hours. When the data is completely copied, the old server needs to be removed from the network. If it is being used as a domain controller, that function also must be accounted for. Obviously it is important that the transfer of data be 100 percent complete. Administrators may run complete system backups for insurance in case something goes wrong with the copy process.

The scenario where an additional NAS system is added to the network is more difficult. Some users and applications will continue to use the old system, while others will switch to the new one. Still others may use both new and old NAS systems. Administrators have to determine which clients can access which directories and files on both NAS systems.
After the planning is done, the process of changing mount point definitions in client I/O redirectors starts. With potentially hundreds of clients to adjust, this process can take a great deal of administrative effort and can result in a few inevitable oversights and errors that reduce both IT and user productivity.
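
The planning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a procedure from any NAS vendor: the `Mount` record, the host names (`ws01`, `nas-old`, `nas-new`), the export paths, and both helper functions are hypothetical. Given a list of current client mounts and a table of exports that are moving to the new NAS system, the sketch produces the list of mount definitions that must change and flags moved exports that no client mounts, which may point to an oversight in the plan.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Mount:
    """One client mount definition (hypothetical record for illustration)."""
    client: str       # client host name
    server: str       # NAS system currently serving the export
    export: str       # exported directory on the NAS system
    mount_point: str  # where the client mounts the export


def plan_remounts(mounts, moved_exports):
    """Return (updated_mounts, changes) for exports that move to a new NAS.

    moved_exports maps (old_server, export) -> new_server.
    """
    updated, changes = [], []
    for m in mounts:
        new_server = moved_exports.get((m.server, m.export))
        if new_server is None:
            updated.append(m)  # export stays on its current NAS system
        else:
            new_m = replace(m, server=new_server)
            updated.append(new_m)
            changes.append((m, new_m))
    return updated, changes


def unreferenced_moves(mounts, moved_exports):
    """Moved exports that no listed client mounts (possible planning gaps)."""
    used = {(m.server, m.export) for m in mounts}
    return sorted(key for key in moved_exports if key not in used)


# Hypothetical inventory: two clients, one old NAS system, one new one.
mounts = [
    Mount("ws01", "nas-old", "/export/eng", "/mnt/eng"),
    Mount("ws01", "nas-old", "/export/home", "/mnt/home"),
    Mount("ws02", "nas-old", "/export/eng", "/mnt/eng"),
]
moved = {
    ("nas-old", "/export/eng"): "nas-new",
    ("nas-old", "/export/scratch"): "nas-new",
}

updated, changes = plan_remounts(mounts, moved)
for old, new in changes:
    print(f"{old.client}: {old.server}:{old.export} -> "
          f"{new.server}:{new.export} at {new.mount_point}")
print("moved but not mounted by any client:", unreferenced_moves(mounts, moved))
```

Keeping the plan as data, separate from the act of changing each client, lets an administrator review the full list of changes before touching any client. It does not remove the work of applying those changes on each client.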