The Data Explosion Myth


About the time The Holy Grail of Data Storage Management was published (late 1999 to early 2000), analysts had just begun making the rounds of the storage industry tradeshow circuit to articulate their now-familiar rant about the data explosion confronting contemporary business organizations. They contended that rates of data growth were predictable and exponential across all organizations. Depending on the analyst, the quantity of data generated by businesses was growing at an average rate of between 60 and 120 percent per year.

Some credence was given to these estimates by a selective reading of a study produced at the University of California at Berkeley, a study sponsored, coincidentally, by a leading storage industry vendor. The UC Berkeley study found that the total volume of digital information produced up to the year 2000 would double by 2002. [1] Researchers claimed that approximately two exabytes (10^18 bytes) of electronic data had been generated by the millennium's end (see Table 2-1), and that a rough doubling of this total would be seen by 2002.
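
As a quick arithmetic check on the exabyte figure, the totals from Table 2-1 can be converted from terabytes to exabytes. The short Python sketch below uses only the table's own upper and lower totals and decimal units (1 TB = 10^12 bytes, 1 EB = 10^18 bytes); nothing else is assumed.

```python
# Unit-conversion check of the Berkeley totals in Table 2-1.
# Decimal units assumed: 1 TB = 10**12 bytes, 1 EB = 10**18 bytes.

upper_estimate_tb = 2_120_539   # total terabytes, upper estimate (Table 2-1)
lower_estimate_tb = 635_480     # total terabytes, lower estimate (Table 2-1)

TB = 10**12
EB = 10**18

upper_eb = upper_estimate_tb * TB / EB
lower_eb = lower_estimate_tb * TB / EB

print(f"Upper estimate: {upper_eb:.2f} EB")  # ~2.12 EB -- the "two exabytes" figure
print(f"Lower estimate: {lower_eb:.2f} EB")  # ~0.64 EB -- the caveated lower bound
```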

Within the storage industry, the study was hailed as an academic validation of all the ballyhoo about the data explosion. Vendors seized on the "empirical evidence" in the Berkeley study to make their case for a data explosion, but ignored the fact that the researchers had carefully caveated their findings and had produced a lower estimate of data volume that was about a quarter of the upper estimate. They further ignored another finding of the Berkeley professors: that the preponderance of new data was neither being generated by organizations nor stored on enterprise storage subsystems. Instead, private individuals were creating most of the flood of digital data in what the researchers called "the democratization of data," storing their JPEG camera images, their AVI and MPEG digital videos, their e-books, and their personal email on their own PC hard disks.

Thus, it was actually a rather selective reading of the Berkeley study that was used by vendors to reinforce claims about the data explosion. Nevertheless, the myth of the data explosion was subsequently harnessed to another purpose. Specifically, it was used to explain and to justify why networked storage technology, SAN in particular, was a "must have" for businesses. Only through the consolidation and centralization of burgeoning digital information into a storage network, the vendors argued, could companies hope to meet the storage scalability requirements imposed by the data explosion in a cost-effective way.

Table 2-1. University of California at Berkeley's Estimates of Digital Data Volume by 2000

Storage Medium | Type of Content      | Terabytes/Year, Upper Estimate | Terabytes/Year, Lower Estimate | Growth Rate, %
---------------|----------------------|--------------------------------|--------------------------------|---------------
Paper          | Books                | 8                              | 1                              | 2
Paper          | Newspapers           | 25                             | 2                              | -2
Paper          | Periodicals          | 12                             | 1                              | 2
Paper          | Office documents     | 195                            | 19                             | 2
Paper          | Subtotal             | 240                            | 23                             | 2
Film           | Photographs          | 410,000                        | 41,000                         | 5
Film           | Cinema               | 16                             | 16                             | 3
Film           | X-Rays               | 17,200                         | 17,200                         | 2
Film           | Subtotal             | 427,216                        | 58,216                         | 4
Optical        | Music CDs            | 58                             | 6                              | 3
Optical        | Data CDs             | 3                              | 3                              | 2
Optical        | DVDs                 | 22                             | 22                             | 100
Optical        | Subtotal             | 83                             | 31                             | 70
Magnetic       | Camcorder Tape       | 300,000                        | 300,000                        | 5
Magnetic       | PC Disk Drives       | 766,000                        | 7,660                          | 100
Magnetic       | Departmental Servers | 460,000                        | 161,000                        | 100
Magnetic       | Enterprise Servers   | 167,000                        | 108,550                        | 100
Magnetic       | Subtotal             | 1,693,000                      | 577,210                        | 55
TOTAL          |                      | 2,120,539                      | 635,480                        | 50

Source: Lyman, Peter and Hal R. Varian, "How Much Information," 2000. Retrieved from http://www.sims.berkeley.edu/how-much-info on 8/12/2003.

A key argument of the vendors held that a SAN was required to overcome the limitations to scaling imposed by the dominant storage topology of today: server-attached storage (SAS). Figure 2-1 depicts the difference between SAS and SAN.

Figure 2-1. SAN versus SAS topologies.


With server-attached storage (sometimes called direct-attached storage, or DAS), increasing storage platform capacity required that more disk drives be added to the array connected to an application host. To accomplish this task, first, the applications hosted on the server needed to be quiesced and the server itself had to be powered down. Next, the attached storage array had to be upgraded with additional disks. Then, the host had to be powered up again, its operating system rebooted, and the volumes created on the newly expanded array properly registered with the server OS.
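
To make the cost of this procedure concrete, the sketch below tallies a hypothetical outage window for a single DAS capacity upgrade. The step names follow the procedure just described, but the durations are illustrative assumptions, not figures from the Berkeley study or from any vendor.

```python
# Hypothetical outage window for one DAS/SAS capacity upgrade.
# Step names follow the procedure described above; the durations
# are illustrative assumptions, not measured or published figures.

das_upgrade_steps = [
    ("Quiesce hosted applications",            15),  # minutes
    ("Power down the application server",       5),
    ("Add disk drives to the attached array",  30),
    ("Power up the server and reboot the OS",  10),
    ("Register the new volumes with the OS",   15),
]

total_outage = sum(minutes for _, minutes in das_upgrade_steps)

for step, minutes in das_upgrade_steps:
    print(f"{step:<42}{minutes:>4} min")
print(f"{'Application outage per upgrade:':<42}{total_outage:>4} min")
```

Multiply a window like this by the number of upgrade events a growing organization faces each year, and the vendors' downtime-cost argument follows.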

Scaling SAS in the manner described above, SAN vendors argued, created a great deal of costly downtime for organizations. Given the data explosion, such outages were bound to increase in frequency and duration. Clearly, said the vendors, an alternative, "non-disruptive" approach to scaling storage was needed.

Fibre Channel SANs provided just such an option, vendors claimed. In an FC SAN, storage volume size could be increased without rebooting a single server, because storage and servers were separate. Just add more disk drives to a SAN volume, even while it was servicing read and write requests from servers, and, magically, application server operating systems and their hosted application software would "see" the additional storage and begin using it.
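
The following sketch models the contrast the vendors drew. The LogicalVolume class and its methods are hypothetical abstractions of a SAN-attached volume, not any real array or fabric management API; the point is simply that capacity grows on the storage side while server I/O continues.

```python
# Toy model of the vendors' "non-disruptive" SAN scaling claim: capacity
# is added on the storage-network side while servers keep issuing I/O.
# LogicalVolume is a hypothetical abstraction, not a real management API.

class LogicalVolume:
    """Minimal stand-in for a SAN volume presented to application servers."""

    def __init__(self, capacity_gb: int):
        self.capacity_gb = capacity_gb
        self.used_gb = 0

    def expand(self, additional_gb: int) -> None:
        # In the vendors' pitch this happens on the array/fabric side,
        # with no server power-down or operating system reboot.
        self.capacity_gb += additional_gb

    def write(self, gb: int) -> bool:
        if self.used_gb + gb > self.capacity_gb:
            return False              # out of space; time to expand
        self.used_gb += gb
        return True


volume = LogicalVolume(capacity_gb=500)
volume.write(480)
if not volume.write(50):              # this write would exceed current capacity
    volume.expand(500)                # grow the volume; servers stay online
    volume.write(50)                  # the same write now succeeds
print(volume.capacity_gb, volume.used_gb)   # 1000 530
```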

There are many problems with the above description of volume scaling in an FC SAN, of course, and these will be fodder for discussion later in this book. For now, we will content ourselves with the argument of the vendors: that their dynamically scalable storage volumes made storage area networks THE mission-critical storage infrastructure technology for coping with the data explosion.

On its face, this argument was both airtight and tautological. If you accepted the initial premise, exploding data growth, you had to concede the need for a highly scalable storage topology, a theoretical capability of a true SAN. [2]


