Capacity utilization efficiency refers to the efficient use of storage platforms based on a consideration of data access requirements and platform costs. At one level, it is an extraordinarily simple concept to grasp. Data, once written, has a discernible pattern of access. Once committed to disk, accesses made to data tend to drop by an average of 50 percent within three days. Within a month, the frequency of access attempts made to the same data may drop by 90 percent or more. Given this "rule of thumb" in data access trends, you need to ask yourself how much of your infrequently accessed data is being stored on your most expensive, high-end storage platforms. That question gets to the heart of capacity utilization efficiency. It goes almost without saying that most organizations today are allocating storage capacity inefficiently. Provisioning storage to applications is one of the two "pain points" most frequently cited by storage administrators in survey after survey (the other being backup). Allocating, then reallocating, storage capacity to applications is a burdensome, time-consuming task that might be made somewhat less onerous through the application of techniques such as LUN aggregation. Randy Chalfant, Chief Technologist for StorageTek, offers the following illustration of the situation. [1]
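To make the rule of thumb concrete, consider a minimal sketch of a tiering check based on a file's last-access time. The thresholds and tier names here are hypothetical illustrations of the decay pattern described above, not a real product's policy:

```python
import os
import time

DAY = 86400  # seconds per day

# Hypothetical tiers keyed to the rule-of-thumb decay in access frequency:
# data untouched for roughly a month is a candidate for cheaper storage.
TIER_THRESHOLDS = [
    (3 * DAY, "high-end array"),    # accessed within the last ~3 days
    (30 * DAY, "midrange array"),   # accessed within the last ~30 days
]

def suggest_tier(path, now=None):
    """Suggest a storage tier for a file based on its last-access age."""
    now = time.time() if now is None else now
    age = now - os.stat(path).st_atime
    for threshold, tier in TIER_THRESHOLDS:
        if age <= threshold:
            return tier
    return "archive / near-line"
```

A recently touched file lands on the expensive tier; a file idle for months falls through to the cheapest one, which is exactly the placement question posed above.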
Chalfant agrees with the observation that, with few exceptions, even with the best virtualization tools and the most efficient capacity allocation program, capacity utilization inefficiency remains. He submits, however, that his company's Shared Virtual Array provides some alleviation of allocation inefficiency. An implementation of a virtualization technique that StorageTek calls "Dynamic Mapping," the Shared Virtual Array (SVA) is described by Chalfant as a "lynchpin to virtual architecture" because pre-determined storage capacity, which he says "almost always far exceeds the actual amount of real data," can be presented to applications via the technology without actually being physically allocated. This "indirect mapping," he argues, is another (though less often discussed) benefit of virtualization. With such an approach, the "SVA is able to offer the allocated but unused space to other volumes, which in turn increases the overall storage efficiency." Like most virtualization solutions, however, StorageTek's SVA is proprietary and offers platform support that is only as comprehensive as third-party array vendors will allow. Moreover, from the standpoint of capacity utilization, inefficiencies persist. That is mainly because mechanisms do not yet exist to automate the migration of data between platforms based on usage characteristics. Hierarchical storage management (HSM), discussed in detail below, does not currently migrate data based on access frequency, but rather on the basis of last update ("write date"). Most HSM packages assume that if a file or dataset has not been updated (that is, overwritten by a new version) within a given timeframe, it should be moved from one part of the physical storage hierarchy to another. Just as current virtualization technology is inadequate to the task of achieving capacity allocation efficiency, current HSM technology is inadequate to the task of capacity utilization efficiency.
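The distinction between migrating on "write date" and migrating on access frequency can be sketched in a few lines. The following is an illustration only (function names and day limits are invented for the example): the first test is the classic HSM criterion based on modification time, the second the usage-based criterion the text argues is missing. Note that in practice the second test is fragile, since many filesystems are mounted with options that suppress or coarsen last-access-time updates:

```python
import os
import time

DAY = 86400  # seconds per day

def hsm_candidate_by_mtime(path, limit_days=90, now=None):
    """Classic HSM test: migrate if the file has not been *updated*
    (overwritten) within limit_days -- the "write date" criterion."""
    now = time.time() if now is None else now
    return (now - os.stat(path).st_mtime) > limit_days * DAY

def hsm_candidate_by_atime(path, limit_days=30, now=None):
    """Usage-based test: migrate if the file has not been *read*
    within limit_days -- the access-frequency criterion."""
    now = time.time() if now is None else now
    return (now - os.stat(path).st_atime) > limit_days * DAY
```

A reference file that is read daily but never rewritten illustrates the gap: the first test flags it for migration to slower storage even though it is in constant use, while the second correctly keeps it on the fast tier.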
Ultimately, some sort of capacity utilization efficiency technology is needed, or organizations will bankrupt themselves with mounting storage costs. Ideally, such a technology would provide automated intelligence for migrating data from more expensive to less expensive platforms "in the background," without operator intervention. For this to happen, significant changes are required in how data is named and characterized in the first place. Moreover, a stable, well-managed, and secure networked storage infrastructure is a prerequisite for automated data migration.
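The shape of such a background migration pass can be sketched as follows. This is a toy model under simplifying assumptions (tiers are plain directories, the idle threshold is arbitrary, and a real HSM would leave a stub or link behind so applications could still find migrated data), not a description of any actual product:

```python
import os
import shutil
import time

DAY = 86400  # seconds per day

def migrate_cold_files(hot_dir, cold_dir, idle_days=30, now=None):
    """One pass of a hypothetical background migrator: move files that
    have not been accessed within idle_days from an expensive tier
    (hot_dir) to a cheaper one (cold_dir). Returns the names moved."""
    now = time.time() if now is None else now
    moved = []
    for name in os.listdir(hot_dir):
        src = os.path.join(hot_dir, name)
        if not os.path.isfile(src):
            continue
        if (now - os.stat(src).st_atime) > idle_days * DAY:
            shutil.move(src, os.path.join(cold_dir, name))
            moved.append(name)
    return moved
```

Run periodically by a scheduler, a pass like this drains infrequently accessed data off the expensive platform with no operator intervention, which is the behavior the paragraph above calls for.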