Capacity utilization efficiency refers to the efficient use of storage platforms based on a consideration of data access requirements and platform costs. At one level, it is an extraordinarily simple concept to grasp. Data, once written, has a discernible pattern of access. Once committed to disk, accesses made to data tend to drop by an average of 50 percent within three days. Within a month, the frequency of access attempts made to the same data may drop by 90 percent or more. Given this "rule of thumb" in data access trends, you need to ask yourself how much of your infrequently accessed data is being stored on your most expensive, high-end storage platforms. That question gets to the heart of capacity utilization efficiency. It goes almost without saying that most organizations today are allocating storage capacity inefficiently. Provisioning storage to applications is one of the two "pain points" most frequently cited by storage administrators in survey after survey (the other being backup). Allocating, then reallocating, storage capacity to applications is a burdensome, time-consuming task that might be made somewhat less onerous through the application of techniques such as LUN aggregation. Randy Chalfant, Chief Technologist for StorageTek, offers the following illustration of the situation. [1]
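To make the rule of thumb concrete, consider a minimal sketch of a tiering check based on a file's last-access time. The thresholds and tier names here are hypothetical illustrations of the decay pattern described above, not a real product's policy:

```python
import os
import time

DAY = 86400  # seconds per day

# Hypothetical tiers keyed to the rule-of-thumb decay in access frequency:
# data untouched for roughly a month is a candidate for cheaper storage.
TIER_THRESHOLDS = [
    (3 * DAY, "high-end array"),    # accessed within the last ~3 days
    (30 * DAY, "midrange array"),   # accessed within the last ~30 days
]

def suggest_tier(path, now=None):
    """Suggest a storage tier for a file based on its last-access age."""
    now = time.time() if now is None else now
    age = now - os.stat(path).st_atime
    for threshold, tier in TIER_THRESHOLDS:
        if age <= threshold:
            return tier
    return "archive / near-line"
```

A recently touched file lands on the expensive tier; a file idle for months falls through to the cheapest one, which is exactly the placement question posed above.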
Chalfant agrees with the observation that, with few exceptions, even with the best virtualization tools and the most efficient capacity allocation program, capacity utilization inefficiency remains. He submits, however, that his company's Shared Virtual Array provides some alleviation of allocation inefficiency. An implementation of a virtualization technique that StorageTek calls "Dynamic Mapping," the Shared Virtual Array (SVA) is described by Chalfant as a "lynchpin to virtual architecture" because pre-determined storage capacity, which he says "almost always far exceeds the actual amount of real data," can be presented to applications via the technology without actually being physically allocated. This "indirect mapping," he argues, is another (though less often discussed) benefit of virtualization. With such an approach, the "SVA is able to offer the allocated but unused space to other volumes, which in turn increases the overall storage efficiency." Like most virtualization solutions, however, StorageTek's SVA is proprietary and offers platform support that is only as comprehensive as third-party array vendors will allow. Moreover, from the standpoint of capacity utilization, inefficiencies persist. That is mainly because mechanisms do not yet exist to automate the migration of data between platforms based on usage characteristics. Hierarchical storage management (HSM), discussed in detail below, does not currently migrate data based on access frequency, but rather on the basis of last update ("write date"). Most HSM packages assume that if a file or dataset has not been updated (that is, overwritten by a new version) within a given timeframe, it should be moved from one part of the physical storage hierarchy to another. Just as current virtualization technology is inadequate to the task of achieving capacity allocation efficiency, current HSM technology is inadequate to the task of capacity utilization efficiency.
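The distinction between migrating on "write date" and migrating on access frequency can be sketched in a few lines. The following is an illustration only (function names and day limits are invented for the example): the first test is the classic HSM criterion based on modification time, the second the usage-based criterion the text argues is missing. Note that in practice the second test is fragile, since many filesystems are mounted with options that suppress or coarsen last-access-time updates:

```python
import os
import time

DAY = 86400  # seconds per day

def hsm_candidate_by_mtime(path, limit_days=90, now=None):
    """Classic HSM test: migrate if the file has not been *updated*
    (overwritten) within limit_days -- the "write date" criterion."""
    now = time.time() if now is None else now
    return (now - os.stat(path).st_mtime) > limit_days * DAY

def hsm_candidate_by_atime(path, limit_days=30, now=None):
    """Usage-based test: migrate if the file has not been *read*
    within limit_days -- the access-frequency criterion."""
    now = time.time() if now is None else now
    return (now - os.stat(path).st_atime) > limit_days * DAY
```

A reference file that is read daily but never rewritten illustrates the gap: the first test flags it for migration to slower storage even though it is in constant use, while the second correctly keeps it on the fast tier.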
Ultimately, some sort of capacity utilization efficiency technology is needed, or organizations will bankrupt themselves with mounting storage costs. Ideally, such a technology would provide automated intelligence for migrating data from more expensive to less expensive platforms "in the background," without operator intervention. For this to happen, significant changes are required in how data is named and characterized in the first place. Moreover, a stable, well-managed, and secure networked storage infrastructure is a prerequisite for automated data migration.
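The shape of such a background migration pass can be sketched as follows. This is a toy model under simplifying assumptions (tiers are plain directories, the idle threshold is arbitrary, and a real HSM would leave a stub or link behind so applications could still find migrated data), not a description of any actual product:

```python
import os
import shutil
import time

DAY = 86400  # seconds per day

def migrate_cold_files(hot_dir, cold_dir, idle_days=30, now=None):
    """One pass of a hypothetical background migrator: move files that
    have not been accessed within idle_days from an expensive tier
    (hot_dir) to a cheaper one (cold_dir). Returns the names moved."""
    now = time.time() if now is None else now
    moved = []
    for name in os.listdir(hot_dir):
        src = os.path.join(hot_dir, name)
        if not os.path.isfile(src):
            continue
        if (now - os.stat(src).st_atime) > idle_days * DAY:
            shutil.move(src, os.path.join(cold_dir, name))
            moved.append(name)
    return moved
```

Run periodically by a scheduler, a pass like this drains infrequently accessed data off the expensive platform with no operator intervention, which is the behavior the paragraph above calls for.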