Storage Pooling

There are significant advantages to centralizing the management of storage in an organization. Centralized storage management gives administrators the best control over storage and the most complete information about how it is being used. One of the most powerful concepts in storage networking is a storage management approach called storage pooling. Storage pooling is usually associated with storage subsystems and SAN virtualization systems, but it can also apply to volume management software.

The sections that follow discuss various benefits of storage pooling with SAN virtualization point products.

Storage Utilization with Pooled Storage

Storage utilization is expressed as the percentage of the total available storage capacity that is populated with data. The measurement covers storage assigned to file systems and databases as well as unassigned storage that has not yet been allocated to any system.
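The definition above can be sketched as a short calculation. This is only an illustration: the function name, the system names, and the capacity figures are all hypothetical and do not come from any particular product.

```python
def utilization_percent(used_gb, total_gb):
    """Return populated capacity as a percentage of total capacity."""
    if total_gb <= 0:
        raise ValueError("total capacity must be positive")
    return 100.0 * used_gb / total_gb

# Hypothetical figures in GB: (data stored, capacity assigned) per system.
systems = {"A": (300, 500), "B": (450, 500), "C": (50, 500)}
unassigned_gb = 500  # capacity not yet allocated to any system

used_gb = sum(used for used, _ in systems.values())
# Total capacity counts assigned storage plus unassigned storage.
total_gb = sum(assigned for _, assigned in systems.values()) + unassigned_gb

overall = utilization_percent(used_gb, total_gb)
per_system = {name: utilization_percent(u, a) for name, (u, a) in systems.items()}
```

Computing the figure per system as well as overall makes it easier to see when one system is nearly full while another is mostly empty.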

It is generally very difficult to accurately predict what the storage capacity requirements will be for a given system and what the growth rate will be. Typically, storage is purchased in bulk amounts, anticipating an initial need for capacity with a projected growth rate for several years. The problem is that some systems grow as projected, but often they grow faster or slower than expected. Data that grows faster than expected will require administrative action sooner than planned for. A system with lower-than-expected growth rates may wind up having lots of unused storage capacity. It's not unusual for a business with a lot of systems to have both higher-than-necessary storage administration costs and underutilized storage.

Obviously, it would be best if storage space could be flexibly assigned to different systems as it is needed; that way, companies would not have to over-purchase storage that may go unused. In other words, the storage capacity could be available on demand. Storage pooling is one of the key SAN applications because it provides a solution to these utilization problems.

Utilization problems are easiest to see with direct attached storage (DAS) technology like parallel SCSI. With DAS, the storage attached to a system can be accessed only through that system. Data availability is at significantly higher risk because storage can fill up unexpectedly and prematurely, and the system must be shut down to fix the problem. If one system's storage is full and another system's storage is mostly empty, it only makes sense to try to reassign that storage. However, if you want to assign disk storage from one system to another, you must first physically move it from the subsystem it is in and place it in another subsystem. This requires shutting down both systems and involves a fair amount of administrative work at odd hours of the day or on weekends.

Storage utilization can also impact I/O performance. Storage that is overutilized may cause applications to run much more slowly, causing a variety of problems for the IT organization. As it is with processors and systems, you want to operate with optimal price/performance ratios. Disk capacity is too expensive to be insufficiently populated, and applications should be able to run without I/O performance bottlenecks.

NOTE

There are no solid rules of thumb for optimal disk utilization levels, because it depends on the application. Some applications can perform well at high utilization levels (say, between 80 percent and 90 percent), while others need to be in the 60 percent to 70 percent range.

To better understand the problem, we'll examine an IT environment with three servers, each with its own DAS storage. When the servers were installed, the IT staff projected growth rates based on expected application usage. Referring to Figure 12-11, Server A has a data growth rate that matches the company's projections. Server B has been a surprising success, and its data growth has been much faster than expected. Soon it will need more storage. Server C has an application that has had problems, and its storage is mostly unused. It was expected to need a lot of storage, but instead it needs very little. The unused DAS storage attached to Server C will be wasted.

Figure 12-11. Storage Utilization Without Pooling

Using storage pooling in a SAN, it is possible to assign smaller amounts of storage initially and add more when it is needed. In other words, smaller virtual address spaces can be used as opposed to assigning the whole storage subsystem's capacity. Figure 12-12 shows the servers pictured in Figure 12-11, but with smaller initial storage configurations. Server A is growing as planned and will probably need to have more storage allocated from the pool at some future time. Server B has already outgrown its initial allocation of storage but had more storage added from the pool. Server C has not used as much storage as expected, but there is far less wasted storage than in Figure 12-11.

Figure 12-12. Storage Utilization with Pooling

While storage pooling is a powerful application of virtualization, it is important to understand that appending new storage address spaces does not finish the work completely. The filing system that uses that storage address space must be resized to use the space. This is not necessarily a trivial task and may involve some system downtime. Still, pooling significantly reduces both administrator effort and the risk of data availability problems.
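The allocate-on-demand behavior described for Figure 12-12 can be sketched as a small model. This is a minimal illustration, not how any volume manager or SAN virtualization product is implemented: the StoragePool class, its method names, and the capacities are hypothetical, and the file system resize step that must follow each expansion is not modeled.

```python
class StoragePool:
    """Hypothetical model of a shared pool that hands out capacity on demand."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.allocations = {}  # server name -> GB currently assigned

    @property
    def free_gb(self):
        """Capacity still unassigned in the pool."""
        return self.capacity_gb - sum(self.allocations.values())

    def allocate(self, server, size_gb):
        """Assign size_gb more capacity to server; raise if the pool is short."""
        if size_gb <= 0:
            raise ValueError("size must be positive")
        if size_gb > self.free_gb:
            raise RuntimeError("pool does not have enough free capacity")
        self.allocations[server] = self.allocations.get(server, 0) + size_gb
        return self.allocations[server]


# Start each server with a small allocation instead of a whole subsystem.
pool = StoragePool(capacity_gb=1500)
for server in ("A", "B", "C"):
    pool.allocate(server, 200)

# Server B outgrows its initial allocation and draws more from the pool.
pool.allocate("B", 300)
```

In this sketch, the capacity that Server C never needed stays in the pool's free space, where it remains available to Servers A and B.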

Scalability of Pooled Storage

Beyond providing the flexibility to easily expand storage capacity, storage pooling also supports the creation of extremely large storage address spaces. Unlike storage subsystems that are constrained physically by the cabinet size, the number of disk drive connectors, and power capabilities, storage pooling in a volume manager or SAN virtualization engine scales with the network. In most cases, the practical limit on the size of a virtual storage address space is set by how well backup and recovery scale to protect it.

NOTE

So what about subsystems, you ask? Can't they provide storage pooling too for this kind of seamless storage address space expansion? The answer is yes, they can, up to the capacity and internal connectivity limits of the subsystem. In fact, many companies have successfully used their subsystems this way.

A SAN virtualization system can be described as a subsystem controller pulled out of a subsystem and stuck in the SAN. The major difference is the total amount of storage that can be addressed. SAN virtualization products can connect to almost any available storage in the network. Most storage subsystems can't. That might change someday, but that's the way it is today.

Target Mix and Breadth in Pooled Storage

Storage pooling accommodates a wide range of dissimilar types of storage targets and allows them to be used together. For instance, data stored on a large enterprise storage subsystem with high-performance Fibre Channel or SCSI disk drives could be backed up to a much less expensive ATA or SATA storage subsystem.

Remote copy applications could also use different classes of storage for local online and remote offline data access. Figure 12-13 shows a SAN virtualization system with an integrated remote copy application that connects to a local enterprise disk subsystem and copies data to an inexpensive remote subsystem.

Figure 12-13. Remote Copy Through a SAN Virtualization System to an Inexpensive Remote Subsystem

The example shown in Figure 12-13 reflects the desire of many companies to use less expensive storage for remote data and business continuity purposes. They cannot do so because the remote copy application on their primary storage subsystem does not support lower-cost subsystems on the remote end.

Tiered Storage Pools

The concepts of primary and secondary storage were used in the preceding section to describe an application-driven distinction between two different classes of storage. One of the seminal concepts in storage pooling is the definition and creation of different storage tiers to respond to different application needs. In most cases this amounts to finding optimal cost levels for certain performance goals; in other words, getting the most bang for your buck.

Performance Classifications for Tiered Storage

The following sections describe some of the primary performance characteristics that could be associated with tiered storage.

Device Performance

Disk drive performance topics were discussed in Chapter 4, "Storage Devices." A storage tier definition should include metrics for disk rotation speed and seek time. Seek time can be further reduced by short-stroking the drive for the highest-performance applications.

Parallelism in Storage

A useful storage tier definition includes the number of member drives in RAID arrays. For instance, an array with ten drives is much more likely to provide better transaction processing performance than an array with five drives.

Parallelism should also be extended to include considerations for an application's read/write mix. Applications performing more than 20 percent of their I/O operations as writes may want to exclude RAID 5 from that storage tier.
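The read/write guideline above can be expressed as a simple check. This is a sketch under stated assumptions: the function name and the operation counts are hypothetical, and the 20 percent threshold is the rule of thumb from the text rather than a fixed limit.

```python
def raid5_suitable(read_ops, write_ops, max_write_fraction=0.20):
    """Return True if the write share of I/O does not exceed the threshold.

    The 0.20 default reflects the guideline in the text; a real tier
    definition would tune it per application.
    """
    total_ops = read_ops + write_ops
    if total_ops <= 0:
        raise ValueError("at least one I/O operation is required")
    write_fraction = write_ops / total_ops
    return write_fraction <= max_write_fraction


# Hypothetical operation counts sampled from two applications.
reporting_ok = raid5_suitable(read_ops=9000, write_ops=1000)  # 10% writes
oltp_ok = raid5_suitable(read_ops=6500, write_ops=3500)       # 35% writes
```

An application such as the second one, with writes well above the threshold, would be steered to a tier that excludes RAID 5.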

Capacity Utilization Levels

The capacity utilization of storage has an impact on performance. Fragmentation of disk drives and hot spots tend to be more problematic as capacity utilization increases. A policy for maximum utilization of a given storage tier could be enforced through the use of storage resource management (SRM) software, as discussed in Chapter 14, "File System Fundamentals."

Block Size

The size of the disk blocks defined by the filing system can have a significant performance impact on an application. For instance, large block definitions improve streaming applications by reducing the number of seeks disk drives need to make. Conversely, small block definitions may be useful for transaction processing applications by limiting the amount of data transferred in each I/O.

NOTE

Some of the performance parameters that could be used to classify storage, such as block size and capacity utilization, are not determined or managed by either volume management software or SAN virtualization systems. In order to add them to a performance classification, there needs to be some way to integrate this information from file systems or storage management products.

Application Types for Storage Pooling

As discussed in the preceding section, storage pooling can include different types of storage products and implementations to get the best mix of price, performance, and capacity for different types of applications. Some of the different types of applications that could be optimized with specialized storage pools are

Database (transaction processing)
Streaming: scientific and engineering
Graphics
Office automation

Some hypothetical storage pools are defined in Table 12-1.

Table 12-1. Hypothetical Storage Pool Definitions for Common Applications
Application Type	Device Performance	Parallelism	Cache	Utilization	Block Size
Database	High-speed	High	Yes	60-70%	Small
Streaming	High-speed	Medium	No	70-80%	Large
Graphics	Medium	Medium	Yes	70-80%	Large
Office	Low	Low	No	80-90%	Medium
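The definitions in Table 12-1 could be captured as data so that tooling can check a volume against its tier. The sketch below is hypothetical: the PoolDefinition type, the field names, and the within_policy helper are illustrative only, and treating the upper end of each utilization range as a ceiling is an assumption rather than something the table specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolDefinition:
    device_performance: str
    parallelism: str
    cache: bool
    utilization_range: tuple  # (low %, high %) taken from Table 12-1
    block_size: str


# Values transcribed from Table 12-1.
POOLS = {
    "Database": PoolDefinition("High-speed", "High", True, (60, 70), "Small"),
    "Streaming": PoolDefinition("High-speed", "Medium", False, (70, 80), "Large"),
    "Graphics": PoolDefinition("Medium", "Medium", True, (70, 80), "Large"),
    "Office": PoolDefinition("Low", "Low", False, (80, 90), "Medium"),
}


def within_policy(app_type, current_utilization):
    """Return True if utilization is at or below the tier's upper bound.

    Assumption: the top of the range acts as a maximum for the tier.
    """
    _, high = POOLS[app_type].utilization_range
    return current_utilization <= high


database_ok = within_policy("Database", 75)  # above the 70% ceiling
office_ok = within_policy("Office", 85)      # inside the 80-90% range
```

The same 75 percent reading that violates the database tier would be acceptable in the office tier, which is the point of defining utilization targets per application type.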

Tiered Storage Pools for Data and Storage Management

Data and storage management applications such as point-in-time snapshot, remote copy, backup, archiving, and hierarchical storage management (HSM) are excellent candidates for the application of lower-cost tiered storage. In general, the performance of these applications does not have to be exceptional, but they may need to scale to very large sizes.
