However, the first premise of the data explosion myth, "the predictable and exponential rate of growth of data in all organizations," was, and is, a myth. The truth is that analysts have no way of knowing the average rate of data growth in organizations. They lack empirical data, and instead use data on the aggregated capacity of storage products sold into the market as rough guidance. In other words, the analysts are extrapolating trends from highly suspect data sets: vendor sales projections. At a recent industry conference, this author had the pleasure of chairing a panel discussion that included several leading storage industry analysts. The event afforded an opportunity to query one analyst about the basis for his 100-percent-per-annum data growth projection. His first response was to "negotiate": "Okay, perhaps it is closer to 70 percent per year." Pressed further, he conceded that there was no exact way of knowing average rates of data growth, and that storage manufacturers themselves had provided much of his data. The audience chuckled, and the analyst, taken aback, added, "Well, I did poll a number of end-users who told me that the numbers seemed to be in line with the experience in their shops." Since that time, analysts have protested over and over again that their data growth projections are validated by interviews with clients who are both end-users of storage technology and subscribers to the analysts' reporting and analytical services. The problems with this argument are many.
The cynic might claim that the industry analysts worked in cahoots with storage vendors to create the specter of a data explosion. While such an allegation would be difficult to prove, it is interesting to note that the data explosion myth became the mantra of analysts at just about the same time that interest was waning in the analyst community's ongoing work in such "red herring" issue areas [3] as thin computing, first-generation application service provisioning, and "the dot.com revolution." It may have been a coincidence that the data explosion myth appeared at the very moment that the analyst community needed a new "cash cow": an original hot topic to drive the sale of their information products and services. Certainly, for many industry analysis firms, the appearance of the data explosion was serendipitous. In point of fact, the only way for analysts to obtain good information on rates of data growth is to consult end users that have implemented exhaustive analyses of their current storage capacity utilization trends. This kind of data is rare because of the effort and cost involved in collecting it. Most companies have limited data to substantiate presumed rates of data growth, and many report, after deploying storage topology discovery tools, that they found storage platforms nested away in closets and equipment rooms that they didn't even know existed! The bottom line is that claims of a data explosion are largely unsubstantiated, an inference based on flimsy evidence. While it is doubtless that data is growing rapidly in many organizations, just how rapidly is a matter of conjecture, especially in the absence of effective storage management. At NASA's Goddard Space Flight Center, it took the earnest efforts of a team of researchers, armed mainly with tenacity and a mandate, nearly two years to produce a meaningful estimate of data growth (over 1 TB of new data would be added daily commencing in year 2000).
[4] Data on capacity utilization trends was certainly difficult to come by in a complex environment like GSFC, and it required the allocation of scarce resources by Dr. Milton Halem, then GSFC's gifted and dedicated chief information officer, who needed the data to develop a strategic plan for cost-efficient IT growth and expansion. Few people in the public or private sector have exercised as much due diligence in finding out the facts of storage growth in their own environments. The explanations for this deficit are several. Many have told this author that the software tools for ferreting out data growth trends are inadequate to the task, and that without the software, the job is simply too difficult to complete. Others have observed that it may not be data that is out of control, but end-users. Particularly with the advent of email, storage managers lack any effective way to police data growth rates because end-users have the ultimate control over what data gets saved and what data is discarded. Still others have decried the lack of consistent or enforced storage administration policies as the culprit behind unmanaged data growth: Chief Information Officers (CIOs) change every 18 to 24 months in many firms, and each new executive brings in his or her own preferences as to vendors, technology, and policies. Moreover, in many organizations, the management of IT is not centralized at all, and corporate IT professionals complain that it is difficult or impossible to gain the cooperation of individual departmental or business unit managers (or individual system administrators) to come up with information on data growth or a set of policies for managing it. In the final analysis, most organizations have no idea about the rate of data growth within their own shops. They only know that unmanaged growth is costing them money.
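The measurement task described above is, at bottom, simple arithmetic once an organization actually has capacity snapshots to work from; the hard part, as the GSFC experience shows, is collecting the snapshots. A minimal sketch of the calculation, using entirely hypothetical snapshot figures (the function name and sample numbers are the author's illustration, not data from any real shop):

```python
from datetime import date

def annualized_growth_rate(start_tb: float, end_tb: float,
                           start_date: date, end_date: date) -> float:
    """Compound annual growth rate implied by two capacity snapshots."""
    years = (end_date - start_date).days / 365.25
    return (end_tb / start_tb) ** (1 / years) - 1

# Hypothetical snapshots: 40 TB of stored data grows to 68 TB over 18 months.
rate = annualized_growth_rate(40.0, 68.0, date(2000, 1, 1), date(2001, 7, 1))
print(f"{rate:.0%}")  # roughly 43% per year for these made-up numbers
```

Two honest snapshots of actual utilization, taken a year or more apart, would give a shop a far better growth figure than any vendor-derived industry average.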
Every time a server issues a "disk full" error message in response to an application write request, downtime accrues while technical staff add disk to their array or cart out another server with a new array attached.
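The "disk full" scenario is exactly the kind of outage that even rudimentary utilization monitoring can preempt. A minimal sketch of such a check, using Python's standard `shutil.disk_usage` (the 90 percent threshold and the single-path check are illustrative assumptions; a real monitor would walk every mount point and feed an alerting system):

```python
import shutil

def capacity_alert(path: str, threshold: float = 0.90):
    """Return a warning string when a filesystem crosses a utilization threshold."""
    usage = shutil.disk_usage(path)
    utilization = usage.used / usage.total
    if utilization >= threshold:
        return f"{path}: {utilization:.0%} full -- provision capacity before writes fail"
    return None  # below threshold; nothing to report

# Check the root filesystem; threshold=0.0 forces a warning for demonstration.
print(capacity_alert("/", threshold=0.0) or "ok")
```

Catching the trend before the array fills converts an emergency hardware scramble into a planned procurement, which is the whole argument for measuring growth rather than guessing at it.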