NAS boxes remain thin servers with large storage capacities that function as dedicated I/O file servers. Requests from clients are routed to the NAS server through network file systems installed on subscribing application servers. As a result, NAS configurations become dedicated I/O extensions to multiple servers. Usage has also extended to popular relational databases (for example, Microsoft SQL Server and Oracle's relational database products), although these remain problematic given NAS's file orientation (see the section titled "NAS Caveats" later in this chapter).
NAS servers work today in a number of different environments and settings, the most diverse being storage networking. However, their value continues to center on data access within high-growth environments and on the particular size challenges posed by today's diversity of data types.
The inherent value of NAS continues to be its capability to provide storage quickly and cost-effectively by using the resources that already exist in the data center. Today's solutions offer compatibility in both UNIX and Windows environments and connect easily into users' TCP/IP networks.
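From the application server's point of view, a NAS share mounted through a network file system such as NFS or CIFS looks like any local directory. The following sketch illustrates that transparency; the mount point and file path shown in the comment are hypothetical, not from the text.

```python
import os

def read_shared_file(mount_point, relative_path):
    """Read a file from a NAS share that has been mounted into the local
    file system (for example, via NFS or CIFS). To the application, the
    share is indistinguishable from a local directory; the network file
    system client handles the TCP/IP transport underneath."""
    full_path = os.path.join(mount_point, relative_path)
    with open(full_path, "rb") as f:
        return f.read()

# Hypothetical usage on a server with an NFS mount:
# data = read_shared_file("/mnt/nas_share", "reports/q1.dat")
```

This is precisely why NAS connects "easily" into existing environments: no application changes are required, only a mount.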
A typical solution for handling a large number of users who need access to data is depicted in Figure 3-8. Here, an end-user constituency of 1,500 is represented. In the scenario, the users access, for read-only purposes, data files under 100KB in size that are stored across a number of servers. If the user base increases by another thousand over the next year, the workload could be handled by upgrading current servers and adding new servers with larger storage capacities to support the increase in data and traffic. This solution, however, would be costly and difficult to manage.
However, as Figure 3-8 shows, the NAS solution is used to combine the data spread across the current ten servers, placing it on the NAS storage device. Users' requests for data now come to one of the five application servers attached to the network and are then redirected to the NAS box, where the request for data from a particular file is executed. Upon completion of the requested I/O operation within the NAS box, the data is redirected to the requesting user. Because the NAS devices are optimized to handle I/O, their capacity for I/O operations exceeds that of typical general-purpose servers. The increased I/O capacity accounts for the collapsing of servers from ten to five. Moreover, they can scale higher in storage capacity. As the users and their respective data grow, adding NAS devices provides a more scalable solution for handling increased data access requirements within the scenario.
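The arithmetic behind the consolidation can be sketched as follows. The user count, file size, and server counts come from the Figure 3-8 scenario; the per-user request rate is an illustrative assumption, not a figure from the text.

```python
# Back-of-the-envelope sizing for the Figure 3-8 scenario.
USERS = 1500
FILE_SIZE_KB = 100
REQS_PER_USER_PER_SEC = 0.2   # illustrative assumption

total_reqs = USERS * REQS_PER_USER_PER_SEC       # 300 requests/sec
total_kb_per_sec = total_reqs * FILE_SIZE_KB     # 30,000 KB/sec aggregate

def per_server_load(total_requests, server_count):
    """Requests per second each front-end server must handle."""
    return total_requests / server_count

# Ten general-purpose servers, each serving data from its own disks:
load_ten = per_server_load(total_reqs, 10)       # 30 req/sec each
# Five servers fronting an I/O-optimized NAS box (redirect only):
load_five = per_server_load(total_reqs, 5)       # 60 req/sec each
```

Each of the five remaining servers sees double the request rate, but because the disk I/O itself is offloaded to the NAS device, the heavier front-end load is sustainable.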
This solution does two major things for the data center. First, it provides a much larger window in which the applications can operate before they reach the non-linear portion of their performance curves. Given that both users and storage resources can increase on a linear basis, this adds to the value and stability of the enterprise storage infrastructure, and these configurations can support multiple applications of this type.
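The "non-linear performance curve" mentioned above can be illustrated with a simple single-queue approximation. The M/M/1 formula used here (response time T = S / (1 - utilization)) is a standard queueing-theory sketch, not a model taken from the text.

```python
def response_time(service_time_ms, utilization):
    """Approximate response time of a single-queue server using the
    M/M/1 model: T = S / (1 - rho). Latency grows slowly at low
    utilization, then climbs non-linearly toward saturation."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1.0 - utilization)

# With a 10ms service time, raising utilization from 40% to 80%
# triples response time; at 95% it is an order of magnitude worse.
```

Keeping each server well below saturation, as the NAS consolidation does, is what preserves the larger, roughly linear operating window.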
Second, it provides a cost-effective solution by applying optimized resources to the problem. By utilizing resources that already existed within the data center (for example, existing server file systems and the TCP/IP network), it eliminated the need for additional general-purpose servers and associated network hardware and software. More importantly, it reduced the need for additional personnel to manage these systems.
The inability of a single server to handle large amounts of data is becoming a common problem. As Figure 3-5 illustrates, a configuration supporting storage of geophysical data can be tremendously large but have relatively few users and less-sensitive availability requirements. Storing gigabyte-sized files, however, is a problem for general-purpose servers. Not only is it difficult to access the data, but it's also very costly to copy the source data from external storage devices, such as tape, to the processing servers.
By adding a NAS solution to this scenario (see Figure 3-9), the data, as in our first example, is consolidated on the NAS devices and optimized for I/O access through the network. Although the files accessed are much larger, the number of users is low, and the resulting user requests for data can be handled by the existing network. Therefore, the current resources are effectively utilized within the required service levels of these sophisticated applications.
In this case, the user requests for data are similarly received by the server and redirected to the NAS devices on the existing network. The NAS completes the requested I/O, albeit one that is much larger in this case, reaching levels of 300 to 500MB per file, and the data is then redirected by the server back to the requesting user. An additional anomaly in this example, however, is the sourcing of the data through tapes that are sent to the data center. The tapes are read from the server and copied directly to the NAS devices. Conversely, data from the NAS is archived back to tape as new simulations are developed. Given that multiple servers can reach the NAS devices, multiple input streams can be processed simultaneously, performing I/O to and from the offline media, which in this case are the tape units.
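The tape-to-NAS staging described above amounts to a chunked streaming copy, so that a 300 to 500MB file never has to fit in server memory. The sketch below assumes generic binary file objects; in practice the source would be a tape device and the destination a path on the mounted NAS share.

```python
def stream_copy(src, dst, chunk_size=4 * 1024 * 1024):
    """Copy a large file between two storage devices in fixed-size
    chunks (for example, staging a multi-hundred-megabyte seismic
    file from tape onto a NAS share). Returns total bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

Because each server runs its own copy loop against its own tape unit, several such streams can proceed in parallel against the same NAS device, which is exactly the multi-stream behavior the scenario relies on.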
This solution, although different in perspective from our first example, provides the same benefits. The larger window of operation for the application demonstrates that appropriate storage resources, balanced against data size and the number of users, will stabilize the linear performance of the application. It also shows that NAS provides a more cost-effective solution by consolidating the data within an optimized set of devices (that is, the NAS devices) and utilizing the existing resources of the data center, which in this case were the server and its associated file systems and network resources.