Deconstructing the Data Explosion Myth


However, the first premise of the data explosion myth, "the predictable and exponential rate of growth of data in all organizations," was, and is, a myth. The truth is that analysts have no way of knowing the average rate of data growth in organizations. They lack empirical data and instead use the aggregate capacity of storage products sold into the market as rough guidance. In other words, the analysts are extrapolating trends from highly suspect data sets (vendor sales projections).

At a recent industry conference, this author had the pleasure of chairing a panel discussion that included several leading storage industry analysts. The event afforded an opportunity to query one analyst about the basis for his 100-percent-per-annum data growth rate projection. His first response was to "negotiate": "Okay, perhaps it is closer to 70 percent per year." Pressed further, he conceded that there was no exact way of knowing average rates of data growth, and that storage manufacturers themselves had provided much of his data. The audience chuckled, and the analyst, taken aback, added, "Well, I did poll a number of end-users who told me that the numbers seemed to be in line with the experience in their shops."

Since that time, analysts have protested over and over again that their data growth projections are validated by interviews with clients who are both end-users of storage technology and also subscribers to the analyst's reporting and analytical services. The problems with this argument are many.

  • It is based on inductive reasoning: The practice of generalizing from a few specific examples is hardly a logical foundation for discerning trends. A deductive approach has long been preferred to an inductive approach for confirming the validity of any theory, especially in the absence of extremely large empirical data sets with a high degree of reliability. (Figure 2-2 shows the difference for anyone who missed logic class in school.)

    Figure 2-2. Inductive versus deductive reasoning.


  • It depends on the accuracy of input from surveyed customers: Most IT professionals (and their solution providers) will tell you that the majority of businesses are clueless about actual rates of data growth in their storage environments. There is so much stale data, unnecessarily replicated data, junk data, and non-business-related data stored on corporate disk drives that data growth rate analyses, if indeed any are performed, reveal grossly inflated growth rates. Moreover, end-user surveys are notoriously prone to distortion. Respondents will prevaricate for any number of reasons: to increase funding for preferred acquisitions, to appear "intelligent" to the interviewer, to justify or conceal bad decision making or poor acquisition choices, and so on. So, end-user reports are suspect all around.

  • The augmentation of consumer survey data with storage platform expenditure data proves nothing: Substituting storage platform expenditures/revenues for actual data growth assessments may seem like an acceptable way to understand the data explosion, but it really isn't. In many shops, new storage platform acquisitions reflect poor data management rather than data growth. For example, in the absence of effective storage capacity management and provisioning tools, applications may appear to need an ongoing infusion of additional disks mounted in the cabinets of standalone storage arrays. With effective provisioning tools, however, it may be possible for a high-demand application to obtain additional resources from storage platforms whose capacity is allocated to low-demand applications. This, in turn, would forestall the need for more storage arrays. In such a case, the rate of data growth will not have changed; only the effectiveness of the use of existing resources will have. Yet someone looking merely at expenditures for new storage arrays would not see this, and so would misjudge the actual rate of data growth.

The cynic might claim that the industry analysts worked in cahoots with storage vendors to create the specter of a data explosion. While such an allegation would be difficult to prove, it is interesting to note that the data explosion myth became the mantra of analysts at just about the same time that interest was waning in the analyst community's ongoing work in such "red herring" issue areas [3] as thin computing, first-generation application service provisioning, and "the dot.com revolution." It may have been a coincidence that the data explosion myth appeared at the very moment the analyst community needed a new "cash cow": an original hot topic to drive the sale of its information products and services. For many industry analysis firms, however, the appearance of the data explosion was certainly serendipitous.

In point of fact, the only way for analysts to obtain good information on rates of data growth is to consult end-users who have performed exhaustive analyses of their current storage capacity utilization trends. This kind of data is rare because of the effort and cost involved in collecting it. Most companies have little data to substantiate presumed rates of data growth, and many report, after deploying storage topology discovery tools, that they found storage platforms tucked away in closets and equipment rooms that they didn't even know existed!

The bottom line is that claims of a data explosion are largely unsubstantiated, an inference drawn from flimsy evidence. While data is doubtless growing rapidly in many organizations, just how rapidly is a matter of conjecture, especially in the absence of effective storage management.

At NASA's Goddard Space Flight Center (GSFC), a team of researchers, armed mainly with tenacity and a mandate, needed nearly two years of earnest effort to produce a meaningful estimate of data growth (over 1 TB of new data would be added daily, beginning in 2000). [4] Data on capacity utilization trends was difficult to come by in a complex environment like GSFC. Collecting it required the allocation of scarce resources by Dr. Milton Halem, then GSFC's gifted and dedicated chief information officer, who needed the data to develop a strategic plan for cost-efficient IT growth and expansion. Few people in the public or private sector have exercised as much due diligence in establishing the facts of storage growth in their own environments.

The explanations for this deficit are several. Many have told this author that the software tools for ferreting out data growth trends are inadequate to the task, and that without such software the job is simply too difficult to complete. Others have observed that it may not be data that is out of control, but end-users. Particularly since the advent of email, storage managers have lacked any effective way to police data growth, because end-users have ultimate control over what data gets saved and what gets discarded. Still others blame the lack of consistent or enforced storage administration policies for unmanaged data growth: in many firms, chief information officers (CIOs) change every 18 to 24 months, and each new executive brings in his or her own preferences as to vendors, technology, and policies. Moreover, in many organizations IT management is not centralized at all, and corporate IT professionals complain that it is difficult or impossible to gain the cooperation of individual department or business unit managers, or of individual system administrators, in gathering information on data growth or devising a set of policies for managing it.

In the final analysis, most organizations have no idea about the rate of data growth within their own shops. They only know that unmanaged growth is costing them money. Every time a server issues a "disk full" error message in response to an application write request, downtime accrues while technical staff add disk to their array or cart out another server with a new array attached.



The Holy Grail of Network Storage Management
ISBN: 0130284165
Year: 2003
Pages: 96
