Data Protection


There have been some subtle changes in the way we as users think of safeguarding our data. It was not that many years ago when those of us with home-based PCs likely backed up our PC disks periodically to a cartridge tape device or floppy disk. Some of us learned through painful experience that hard drive crashes could result in the loss of programs, documents, spreadsheets, photos, charts of family trees, closely guarded family recipes...you name it. They were gone forever if backup copies were not on hand when the replacement PC arrived. Even if we did not learn the value of this basic computing procedure the hard way, we almost certainly knew of someone who had.

A couple of interesting things transpired over the past few years. For one, the average user of a personal computer is no longer a techno geek; the majority of users are now "average citizens": homemakers, students, independent consultants, retirees, just about anyone and everyone. At the same time, the capacity of a single PC disk drive grew from about 100MB to 100GB (an increase of 1,000x). Yet we still do not properly back up the data we depend on daily. The reasons, in no specific order, are as follows:

  • We think we cannot. The disk drives have become too big. A quarter or half a terabyte cannot be backed up in any reasonable time. We could use CDs or DVDs, but we would be feeding the machine for days and wind up with a stack of DVDs several feet high. (However, online backup services are now available that will store as much data as we want to give them, for a price, of course.)

  • We do not know that we should. Many computer users still do not understand where and how PC data is stored. Information appears at a mouse click. Storage? What's storage?

  • Trust. Many of the newer breed of users just assume that either the machine will not break (or be stolen) or that someone else (the manufacturer, maybe?) is responsible for data protection. Perhaps they are conditioned by their other life experiences. When one goes to the bank and hands the teller $10 to deposit (okay, no one actually goes to the bank anymore, but hang in there with us anyway), one has the 100 percent expectation that any branch or ATM connected to that bank will give the $10 back on request. A bank that says, "We had a computer glitch and lost the record of your deposit" would arouse great ire. One simply does not expect to hear such a statement anymore.

  • We do not know what to back up or do not have the time to sort out the replaceable data from that which is irreplaceable. We think that much of the data could be redownloaded (if it originated during a Web surfing expedition), but we also know that some of the data is unique to us: the tax return from two years ago, school reports, and customizations of various software applications that we've grown to depend on, for example. It is just too difficult to find those unique items among the massive tangle of other data and back up just those items.

For the previous decade or two, the computer world has been divided into two major user camps: business users and home users. Business users truly owned the most important data and had the most to lose if it was irretrievably lost or corrupted. For business users, however, the job of data protection fell (and still falls) on the shoulders of IT administrators. Home users, on the other hand, did not until recently store much data electronically that was considered critical for one reason or another, so the exposure to loss for the home user has traditionally been limited.

This situation has now turned upside down for both camps. Today, most business professionals use laptops as their primary computer. Laptops are typically disconnected from corporate networks during the classic backup time (overnight). So, we now have business people carrying massive amounts of potentially critical corporate data on machines that are susceptible to abuse and damage, yet are seldom connected to any corporate backbone. Furthermore, business people are more likely to self-create unique data than in the past; the traveling business person is creating his or her own PowerPoint presentations, proposals, and Word documents, whereas a few years ago these were created by marketing departments or administrators and thus protected by the centralized IT department. Not so anymore. A number of industry analysts estimate that 60 percent of a corporation's digital assets now live on the "edge," in PCs and laptops, and are not protected at all.

Similarly, individuals now have critical data stored on their own home computers. There are more at-home independent consultants and more teachers creating material at home in the evenings for the next day. On home PCs, more mothers are organizing book groups and soccer schedules, and more kids are writing sixth-grade essays. The loss of data for the home PC user can now be just as devastating as it is for the business professional (resonating with the Inescapable Data theme that businesses and individuals increasingly share common issues).

The Inescapable Data philosophy brings some solutions. In the Inescapable Data world, data is stored multiple times and in multiple locations, transparently. Consider how most modern applications, such as Microsoft Word, handle mistakes. You can "undo" whatever it is you last did, and most programs allow nearly unlimited "undo" (unwinding very far back through a series of edits of a document). We now depend on this undo capability, and it allows us to be more freewheeling when making changes to documents. We need the same level of comfort for the actual disk files themselves. We should be able to "go back" through old versions of files (or any data for that matter) and do so while at 30,000 feet in an airplane.

Versioning: The "Undo" Key

Twenty years ago, Digital Equipment Corporation's file system featured built-in versioning. Every time a file was created, a copy of that file was also made in the background and stored. Each time it was updated, the updated file was added to all of the prior versions. Why? If a file was accidentally corrupted, an administrator could roll back to a previous uncorrupted version of the file and start over.
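The keep-every-version idea can be sketched in a few lines of Python. This is only an illustration of the concept, not a description of how Digital's file system was actually built; the class name VersionedFile and its methods are hypothetical names chosen for this sketch, and versions are held in memory rather than on disk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VersionedFile:
    """Keeps every saved state of one file, oldest first (hypothetical name)."""

    versions: List[bytes] = field(default_factory=list)

    def save(self, content: bytes) -> int:
        """Store a new version and return its 1-based version number."""
        self.versions.append(content)
        return len(self.versions)

    def current(self) -> bytes:
        """Return the most recently saved content."""
        return self.versions[-1]

    def roll_back(self, version: int) -> bytes:
        """Make an earlier version current by re-saving it as the newest one.

        Nothing is discarded, so the rollback itself can be undone.
        """
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"no such version: {version}")
        restored = self.versions[version - 1]
        self.versions.append(restored)
        return restored

    def bytes_used(self) -> int:
        """Total space consumed by all retained versions."""
        return sum(len(v) for v in self.versions)
```

Note that bytes_used grows with every save and is never reduced, because this sketch has no policy for discarding old versions.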

However, the problem in "undoing" one's mistakes in this way becomes apparent when one realizes that, at the time, the average mini-computer's disk capacity was 50MB. The disk could quickly run out of space. Although the file system was busily storing past versions of every file, even those silly transient "binary" ones created by the computer itself, the amount of "litter" that accumulated on the disk got in the way of processing the useful data. The file system was not smart enough to know what to retain and for how long.

Some new "undo" approaches are smarter and offer a less-cluttered approach. IBM has a software package, VitalFile, that runs transparently on your laptop (or UNIX server) and knows the difference between user-created important files and chaff. User files that get updated have the older version tucked away in a discrete repository and can be restored immediately when needed. Chaff is auto-managed; versions of files are stored only for a certain amount of time, and as they age, older ones are deleted automatically. Other companies offer products that will "journal" changes to files out to a disk in the SAN fabric and allow an administrator to roll backward through past transactions to re-create the contents of a file at a certain point in time (although this is more useful for what is called "structured data" [i.e., databases] rather than user files). Over time, individual users will come to expect an "undo" function to fix or restore files that have been lost, destroyed, or have become unreadable from their home-based systems because they are receiving continuous protection (i.e., "undo") from nearly every other computerized tool they use.
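A rough sketch of this kind of "smarter" retention follows. It is not based on how any particular product works. We assume, purely for illustration, that chaff can be recognized by file suffix, that chaff versions are not retained at all, that user-file versions expire after a fixed age, and that the newest version of each user file is always kept. The names StoredVersion, CHAFF_SUFFIXES, is_chaff, and prune are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption for this sketch: machine-generated "chaff" is identified by suffix.
CHAFF_SUFFIXES = (".tmp", ".o", ".obj", ".log")


@dataclass
class StoredVersion:
    path: str
    saved_at: float  # seconds since the epoch


def is_chaff(path: str) -> bool:
    """Return True for files we assume the computer created for itself."""
    return path.lower().endswith(CHAFF_SUFFIXES)


def prune(versions: List[StoredVersion], now: float,
          max_age: float) -> List[StoredVersion]:
    """Return the versions worth keeping, in their original order.

    Chaff is dropped outright. A user-file version is kept if it is no
    older than max_age seconds, or if it is the newest version of its file.
    """
    newest: Dict[str, StoredVersion] = {}
    for v in versions:
        if is_chaff(v.path):
            continue
        if v.path not in newest or v.saved_at > newest[v.path].saved_at:
            newest[v.path] = v

    kept = []
    for v in versions:
        if is_chaff(v.path):
            continue
        if newest[v.path] is v or now - v.saved_at <= max_age:
            kept.append(v)
    return kept
```

Always keeping the newest version of a user file, however old, is a design choice of this sketch: an untouched tax return from two years ago should not age out just because nobody has edited it.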


Sun's view, in the words of Mark Canepa, senior vice president of storage and networking, is this: "The fact that you have to perform backup is a bug." Sun suggests backup is less of a separate application and more of a feature of the components that actually store the data, such as the file system itself. "You simply cannot tolerate ever losing data anymore," continues Canepa, whose new file system, ZFS, improves on the old "backup" paradigm. IBM's new file system, SAN FS (and its natural physical components, such as fabric replication services), also takes the attitude that protection should be more transparent to both the IT storage administrator and the application user. Veritas, too, offers products that lean toward more continuous protection. For IBM, Sun, Veritas, and yes, even Microsoft, the direction is clear: Data protection will be assumed and transparent going forward because it will be provided as a standard storage infrastructure function.

Opportune Protection

Picture yourself (the typical laptop user) in this new world of transparent data protection. While flying over Chicago, you modify a PowerPoint pitch you gave yesterday to General Motors, turning it into the new pitch you intend to use at an appointment at Ford tomorrow. After making a few cuts and pastes of some spicier artwork, you click the Save button. Unfortunately, you "saved" it as the old General Motors presentation, essentially overwriting it with the Ford pitch. Oops.

In the Inescapable Data world, recovery is but a keystroke away. You are able to swiftly "recover" any instance of any file you have recently changed, even without being connected to the company network or calling for the assistance of a help desk or administrator. (This alone is a monumental step forward.) When you reach your WiFi-connected hotel room, you log on to the hotel's network. Via the Internet, your system automatically migrates some set of the changed files onto a remote system somewhere in the Internet cloud. Later, if your computer is misbehaving or misplaced, you have the comfort of knowing that your important materials are housed elsewhere in the grand network for protection against such mishaps.
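The "opportune" part of this scenario can be sketched as a small queue that remembers changed files while the laptop is offline and ships them out when a connection appears. This is a minimal illustration under our own assumptions: the class name OfflineBackupQueue, its methods, and the upload callback are hypothetical, and a real system would detect changes and connectivity on its own.

```python
from typing import Callable, Dict, List


class OfflineBackupQueue:
    """Remembers changed files until a network connection is available."""

    def __init__(self, upload: Callable[[str, bytes], None]) -> None:
        # upload is a caller-supplied function that sends one file to the
        # remote store; its signature is an assumption of this sketch.
        self._upload = upload
        self._pending: Dict[str, bytes] = {}

    def record_change(self, path: str, content: bytes) -> None:
        """Note that a file changed; only its latest content is queued."""
        self._pending[path] = content

    def pending(self) -> List[str]:
        """Paths still waiting to be sent, sorted by name."""
        return sorted(self._pending)

    def on_connected(self) -> List[str]:
        """Send every queued file and return the paths that were sent.

        A file leaves the queue only after its upload returns, so a failed
        upload leaves it (and everything after it) queued for next time.
        """
        sent = []
        for path in list(self._pending):
            self._upload(path, self._pending[path])
            del self._pending[path]
            sent.append(path)
        return sent
```

In the hotel-room scenario, record_change would be called on each save during the flight, and on_connected once the laptop joins the hotel's network.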


At the heart of every computerized thing we do is data that is likely stored away in files on a computer. Proper protection of these files is of paramount importance, and for data to remain pervasive it has to first be assured of persistence, more so now than ever before because we are more integrated with data in the Inescapable Data world. The entire paradigm of data protection in the world of computers is changing and moving toward real-time continuous duplication of all data, which is pushed somewhere into "the network." In the colorful future, when data is changed, it will be sent into the network. The network will manage its redundancy and retrievability. Today, "backup" is still a discrete operation, even using some of the software we have mentioned. The next chapter covers some newer technologies and philosophies for overall network data management that will have a welcome home in the Inescapable Data world.

Groupware as Backup

There is a hidden value contained within groupware. Using groupware to create, maintain, and exchange files essentially results in redundant copies of data being stored in multiple locations. The Groove software, mentioned earlier in this book, is a peer-to-peer product. There is no "server" to buy and configure, meaning that setup time is not an issue. Nor is there a central data repository that needs to be backed up. Instead, all the peer members within a groupware group end up having identical copies of files, created automatically behind the scenes and over a very secure encrypted connection. As groupware members come online and go offline, the Groove software automatically synchronizes shared files with new updates. Offline group members' files are resynchronized with the most recent versions when they come back online. For group members, a loss of a hard drive or a stolen laptop is less of a worry because all other members have redundant copies of shared files. It is likely that future Groove versions will support versioning, further protecting users by providing "time-based" copies of files, which is essentially what backup copies are.
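The way peer copies converge can be illustrated with a toy synchronization step. This is not Groove's actual protocol, which we do not describe here. We assume each peer tags every shared file with a revision number and that the higher revision simply wins; the names Replica and synchronize are hypothetical, and conflicting edits made to the same revision are ignored.

```python
from typing import Dict, Tuple

# Assumption for this sketch: path -> (revision number, content).
Replica = Dict[str, Tuple[int, bytes]]


def synchronize(a: Replica, b: Replica) -> None:
    """Bring two peers' shared files into agreement, in place.

    For each path known to either peer, both end up holding the copy
    with the higher revision number. A peer that lacks a file is treated
    as holding revision 0, so it always receives the other peer's copy.
    """
    for path in set(a) | set(b):
        copy_a = a.get(path, (0, b""))
        copy_b = b.get(path, (0, b""))
        newest = copy_a if copy_a[0] >= copy_b[0] else copy_b
        a[path] = newest
        b[path] = newest
```

After one member's laptop is lost, a replacement machine starting from an empty replica would receive every shared file in a single call, which is the "hidden" backup value described above.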




    Inescapable Data. Harnessing the Power of Convergence
    ISBN: 0137026730
    Year: 2005
    Pages: 159
