Implementing the Plan | Storage Networks: The Complete Reference

Now comes the fun part: the implementation of the storage network configuration. It's time to develop or redevelop the SAN or NAS configurations to suit your particular service levels. The configuration options, driven by service levels and both internal and external requirements, demand a particular level of availability. These levels are generally categorized as on-demand availability, highly available systems, or fault-tolerant systems. In the end, sensitivity to the end user is an external call, and end users obviously have the final say in determining the value of these systems. These categories are discussed next.

On-Demand Availability These systems support 80 to 95 percent availability for defined processing periods. They generally support transactional systems that do not significantly impact the daily operations of the company if they are down for a reasonable time. Most of these systems are data warehouse systems that are prone to outages given the level of data and their reliance on background processing. Others are those that provide purely informational services to internal users of a company, such as intranet systems, inventory, and educational systems. Figure 22-1 shows a typical cascading configuration supporting a data warehouse.

Figure 22-1: A cascading SAN configuration supporting on-demand availability
Highly Available These systems require 99.999 percent uptime, and given the right resources they can achieve this. They are characterized by applications that have a significant impact on the company's business if they are down. This includes systems such as OLTP financial and banking applications (for obvious reasons), retail and point-of-sale transactional systems, and customer relationship systems that allow customers to place orders, as illustrated in Figure 22-2.

Figure 22-2: A core/edge configuration supporting a highly available CRM application
Fault Tolerant These systems cannot afford any downtime and must be available 24/7. This includes transportation systems such as those supporting air, emergency, mass transit, medical, and emergency 911 systems. For obvious reasons, these need to be supported by fully redundant systems that can provide full failover capability so operations remain uninterrupted. Figure 22-3 depicts a redundant mesh SAN configuration supporting an emergency medical response application.

Figure 22-3: A redundant mesh SAN configuration supporting a 24/7 application
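
To make the percentages above concrete, the following minimal sketch converts an availability target into a downtime budget. The function name and the one-year default period are assumptions made for illustration; the text measures on-demand availability against "defined processing periods," so pass the length of that period instead where it applies.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime budget, in hours, for an availability percentage."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    return period_hours * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    targets = [
        ("on-demand (low end)", 80.0),
        ("on-demand (high end)", 95.0),
        ("highly available", 99.999),
    ]
    for label, pct in targets:
        hours = allowed_downtime_hours(pct)
        print(f"{label:22s} {pct:7.3f}% -> {hours:9.2f} hours per year")
```

Measured over a full year, 80 percent availability tolerates roughly 1,752 hours of downtime, while 99.999 percent tolerates a little over five minutes.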

If we contrast these availability categories to SAN and NAS configurations, we find some simple guidelines can be applied to orient them toward appropriate configurations.

On-Demand Availability: NAS and SAN

NAS configurations for on-demand availability can be handled with entry- and workgroup-sized NAS devices. Availability levels in the 80 to 90 percent uptime range are generally acceptable in supporting shared files and network file services for PCs. Given the reliability of the NAS devices performing this type of work, the storage availability will likely be close to the five 9s; however, the variable in keeping these service levels is the external effects of application servers and the local network. If those external effects are managed within the internal service levels through agreements with your colleagues in the systems and network areas, and external agreements with end users (see Chapter 24) make these effects known, the NAS entry and workgroup devices will perform well.
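
The point about external effects can be illustrated with a short calculation. When every component in a chain must be up for the service to be up, and failures are assumed to be independent, the end-to-end availability is the product of the component availabilities. The sketch below uses hypothetical figures for the application server and local network; only the near-five-9s storage figure comes from the discussion above.

```python
from math import prod


def serial_availability(component_availabilities):
    """Availability of a chain in which every component must be up.

    Assumes component failures are independent of one another.
    """
    return prod(component_availabilities)


# Hypothetical figures, for illustration only.
nas_device = 0.99999    # storage close to the five 9s
app_server = 0.99       # assumed application server availability
local_network = 0.98    # assumed local network availability

end_to_end = serial_availability([nas_device, app_server, local_network])
print(f"End-to-end availability: {end_to_end:.2%}")
```

With these assumed inputs the end-to-end figure is about 97 percent, so the weakest external component, rather than the NAS device, sets the service level the user actually sees.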

SAN configurations supporting these areas can be simple or extended cascading configurations where the level of path redundancy is not required, and the performance factors are more oriented toward availability of the data. This will largely be a matter of the reliability of the FC storage devices rather than the switch configuration; however, with today's RAID array functions and their support of larger data segments, the reliability of these devices will be quite high. Because of this, data uptime can exceed 90 percent.

Highly Available Systems: SAN, Maybe NAS

For systems that require the five 9s, the more robust the underlying devices and the more flexible the configuration, the better the solution. As these systems will likely support the use of RDBMSs for data organizational models, the use of NAS becomes somewhat problematic. In terms of SANs, this is the entry point at which the robust nature of the storage devices and the flexibility of the switched fabric begin to show real value. However, the trade-off is increased complexity, as support for these availability models requires a core/edge configuration to ensure performance, including the duplicate interswitch paths that are necessary for redundancy. In some cases, the entire SAN configuration may need to be duplicated within a cluster operation, whereby a standby SAN configuration holds the failover system. In the end, this is very expensive and not recommended unless absolutely necessary.
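
The value of duplicate interswitch paths can be estimated with a simple model. If each path is available independently with the same probability, the connection is down only when every path is down at once. The sketch below rests on that independence assumption, which a real fabric sharing power, firmware, or cabling routes may not satisfy; the function name and the sample figure are illustrative.

```python
def redundant_path_availability(path_availability: float, paths: int) -> float:
    """Probability that at least one of `paths` independent paths is up."""
    if paths < 1:
        raise ValueError("paths must be at least 1")
    if not 0.0 <= path_availability <= 1.0:
        raise ValueError("path_availability must be between 0 and 1")
    return 1.0 - (1.0 - path_availability) ** paths


if __name__ == "__main__":
    single = 0.999  # hypothetical availability of one interswitch path
    for n in (1, 2, 3):
        print(n, "path(s):", f"{redundant_path_availability(single, n):.9f}")
```

Under these assumptions a second path takes a three-9s link to roughly six 9s, which is why path redundancy, rather than more robust individual links, is the usual route to the five-9s target.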

Fault-Tolerant Systems: SAN Only

Systems requiring full redundancy may either be highly proprietary systems that provide multiple levels of failover (such as 4- to 8-way clustering systems with OS extensions to synchronize the failover functions), or systems with a single failover used solely as a standby. Other systems may be fully fault tolerant in a tightly coupled system of MPP or high-end SMP architectures (see Chapters 7 and 8 for additional information on MPP and SMP configurations). Regardless of their configuration, these systems require storage. Many of these systems, being UNIX-based, can participate in SAN configurations leveraging the ability to access either shared storage at the hardware level, more sophisticated (and proprietary) shared data systems at the software level, or a combination of both. This brings the value of SAN configurations further into focus.

A word about NAS when used in fault-tolerant environments. It's not true that NAS solutions cannot be used in these environments; however, there are several caveats that must be considered. Among these is the use of an RDBMS within the application, and subsequently the housing of the database system and user data. If that's the case, NAS is more likely to be problematic given it has limited abilities with the relational data organizational model (see Chapters 9 and 11 on NAS architectures and NAS software, respectively). In addition, the interconnection must use a TCP/IP-based network fast enough to support failover operations that provide a mirrored data environment. If this requires the installation of a special short-hop 10GbE switch and cabling, the cost may or may not be worth the effort. Lastly, consider the flexibility afforded by the switch's network configurations. Each switch should have accommodations for redundant paths to handle additional as well as alternative paths for data communications.
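
The guidelines in the three preceding subsections can be summarized as a small lookup table. This is only a sketch of the rules of thumb stated in this section, not a sizing tool; the names `Guideline`, `GUIDELINES`, and `recommend` are hypothetical, and the fault-tolerant entry follows the "SAN only" heading even though NAS is not strictly ruled out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guideline:
    san_topology: str  # SAN configuration named in the text
    nas_suitable: str  # "yes", "maybe", or "no", following the headings


GUIDELINES = {
    "on-demand": Guideline("simple or extended cascading", "yes"),
    "highly-available": Guideline("core/edge with duplicate interswitch paths", "maybe"),
    "fault-tolerant": Guideline("redundant mesh with full failover", "no"),
}


def recommend(category: str, uses_rdbms: bool = False) -> str:
    """Summarize the section's guidance for one availability category."""
    g = GUIDELINES[category]
    parts = [f"SAN: {g.san_topology}"]
    if g.nas_suitable != "no":
        caveat = " (RDBMS data is likely to be problematic)" if uses_rdbms else ""
        parts.append(f"NAS: {g.nas_suitable}{caveat}")
    return "; ".join(parts)


if __name__ == "__main__":
    print(recommend("highly-available", uses_rdbms=True))
```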

Storage Availability: Considering RAID

RAID storage configurations are available for both SAN and NAS configurations. Certainly from an availability perspective, RAID has become a requirement for most data centers. Given its implementation, reliability, and cost, it has become a de facto standard for applications. Thus, the question becomes what level of RAID to use for particular applications or, in terms of this book, what level to use for the I/O workload.

Here, RAID is used in basically two configurations: level 5 and level 1. Other RAID levels are used for specific application support, such as RAID level 10, also referred to as 0+1, where files are duplicated with higher performance. However, this comes at the sacrifice of non-stop reliability given there is no parity information calculated or stored in this configuration. Other types, such as RAID level 4, are bundled into some NAS vendor solutions.

Considerations for RAID and its associated applications and availability include the following:

RAID Level 5 This level of RAID provides data striping on each device within the storage array. The parity information (the information the RAID software or firmware uses to reconstruct the data if a disk is lost) is also striped across the array. This configuration is generally used for high-transactional OLTP applications that require a high-availability service level. The caveat is the write-intensive nature of such applications, which increases I/O latency given the additional write operations, as well as the load on the storage array from the multiple writes necessary for the data and the parity information.
RAID Level 1 This level of RAID provides data mirroring so the application can continue to run if one disk of a mirrored pair is inoperative. This requires more storage capacity given it's a true duplication of the data, resulting in a 2-to-1 ratio of raw capacity to user data (in other words, every byte of user data requires a duplicate, or mirror). Consequently, a 100GB database table that uses RAID level 1 would require 200GB of storage on different disks. RAID level 1 stores no parity information, so no additional capacity needs to be planned for it. RAID level 1 is used for applications that require failover availability, or data-sensitive applications that require data to be available in case of corruption. This comes in handy with applications such as data warehousing, and some OLTP using failover configurations.
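
The capacity and write-cost trade-offs of the two levels can be sketched with a few lines of arithmetic. The function names below are illustrative. The write penalties are the commonly quoted rule of thumb for small random writes (two disk writes for a mirror; two reads plus two writes for a RAID 5 read-modify-write) and are an assumption here, since controllers with write caches or full-stripe writes can do better.

```python
def raid1_raw_capacity_gb(user_data_gb: float) -> float:
    """RAID 1 mirrors every byte, so raw capacity is twice the user data."""
    return 2.0 * user_data_gb


def raid5_raw_capacity_gb(user_data_gb: float, disks: int) -> float:
    """RAID 5 spends one disk's worth of space on parity: raw = user * n / (n - 1)."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return user_data_gb * disks / (disks - 1)


# Back-end disk I/Os per small random host write (rule-of-thumb assumption).
WRITE_PENALTY = {"raid1": 2, "raid5": 4}


def backend_write_ios(host_writes: int, level: str) -> int:
    """Estimate the disk I/Os generated by a number of small host writes."""
    return host_writes * WRITE_PENALTY[level]


if __name__ == "__main__":
    print("RAID 1, 100GB table:", raid1_raw_capacity_gb(100), "GB raw")
    print("RAID 5, 100GB table on 5 disks:", raid5_raw_capacity_gb(100, 5), "GB raw")
    print("Disk I/Os for 1,000 host writes on RAID 5:", backend_write_ios(1000, "raid5"))
```

For the 100GB table used as the example above, RAID 1 needs 200GB of raw capacity, while a five-disk RAID 5 set needs 125GB but pays for the savings with roughly twice the back-end write activity.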