15.2. The Remembrance of Data Passed Study

In August 1998, I was chief technology officer of a computer security start-up. One of my jobs involved setting up a test bed of modem-equipped computers that would answer incoming phone calls and respond with a variety of different prompts. Instead of purchasing new computers for this somewhat mundane task, I bought 10 used machines at $20 each from a small-town computer store. Most of the computers had been sitting on a shelf for more than a year, and the store's owner didn't even know if they worked. My plan was to mix and match the components until I had five or six operational systems.

When I got the computers back to my house and started to inventory the parts, I discovered that the computer store had neglected to sanitize the hard drives prior to selling me the machines. Intrigued, I inventoried the drives and discovered the following:

  • One of the larger machines, a '486-class system with a 40-gigabyte hard drive, had been the Novell file server for a small law firm. The computer had considerable client material on it, including contracts, wills, and billing records.

  • A second computer had been used by an organization that delivered mental health services to community residents under contract to a state agency. The computer included a FileMaker Pro database that had the names, addresses, and diagnoses of several dozen individuals living in the community.

  • A third machine had belonged to a writer who worked for a national magazine and also wrote novels. This machine contained many unpublished works, works-in-progress, and correspondence.

  • A fourth machine had letters sent between a woman and her daughter in college. This computer also had a copy of Quicken, which the woman apparently used to manage her finances.

All of this information was visible once the computers were turned on; no special disk recovery software was needed at all. I called the store's owner. Once he got over his shock and embarrassment, he asked me to wipe the systems as a favor. Apparently, he had meant to sanitize the machines before he sold them, but he had forgotten to do so.

15.2.1. Other Anecdotal Information

My experience with data left on disks that were subsequently sold on the secondary market is hardly unique. In recent years, there have been numerous reports of such cases, including:

  • In April 1997, a woman in Pahrump, Nevada, purchased a used IBM PC and discovered records from 2,000 patients who had prescriptions filled at a Smitty's Supermarkets pharmacy in Tempe, Arizona.[1]

    [1] John Markoff, "Patient Files Turn Up in Used Computer," The New York Times (April 4, 1997).

  • In August 2001, more than 100 computers from the consulting firm Viant containing confidential client data were sold at auction by Dovebid following the closure of Viant's San Francisco office.[2]

    [2] Jay Lyman, "Troubled Dot-Coms May Expose Confidential Data," NewsFactor Network (Aug. 8, 2001); http://www.newsfactor.com/perl/story/12612.html.

  • In spring 2002, the Pennsylvania State Department of Labor and Industry sold computers containing "thousands of files of information about state employees."[3]

    [3] Matt Villano, "Hard-Drive Magic: Making Data Disappear Forever," The New York Times (May 2, 2002).

  • In August 2002, a Purdue student purchased a used Macintosh computer at the equipment exchange and discovered that the computer contained a FileMaker database with names and demographic information of 100 applicants to the Entomology Department.

  • Also in August 2002, the United States Veterans Administration Medical Center in Indianapolis retired 139 computers. Some of these systems were donated to schools, others were sold on the open market, and at least three ended up in a thrift shop where they were purchased by a journalist. Examination of the computer hard drives revealed sensitive medical information, including the names of veterans with AIDS and mental health problems. Also found were 44 credit card numbers used by the Indianapolis facility.[4]

    [4] Judi Hasson, "VA Toughens Security After PC Disposal Blunders," Federal Computer Week (Aug. 26, 2002).

  • In June 2004, the UK computer security firm Pointsec purchased 100 hard disks on eBay as part of a project on the "life cycle of a lost laptop." Although all of the hard drives had "supposedly" been "wiped clean" or "reformatted," the company was able to recover data from approximately 70 of the drives. The company also purchased laptops at auction that had been lost at airport terminals in England, Germany, Sweden, and the U.S. and verified that police did not sanitize the laptops prior to selling them. Reportedly, the laptop recovered from Sweden "contained sensitive information from a large food manufacturer. The data recovered included four Microsoft Access databases containing company and customer-related information and 15 Microsoft PowerPoint presentations containing highly sensitive company information."[5]

    [5] John Leyden, "Oops! Firm Accidentally eBays Customer Database," The Register (June 7, 2004); http://www.theregister.co.uk/2004/06/07/hdd_wipe_shortcomings/.

While these cases are certainly notable, they represent a tiny fraction of the number of hard disks that are being repurposed, recycled, or otherwise resold on the secondary market.

According to the market research firm Dataquest,[6] nearly 150 million disk drives will be retired in 2002, up from 130 million in 2001. Dataquest estimates that 7 disk drives will be retired for every 10 drives that ship in the year 2002; this is up from a 3-for-10 rate of retirement in 1997. Thus, more and more drives are being retired every year!

[6] John Monroe, "Personal Communication," Gartner Dataquest (Sept. 23, 2002).

But the term retired is something of a misnomer. As the experience at the VA Hospital demonstrates, many disk drives that are "retired" by one organization can appear elsewhere. Indeed, mainstream businesses are increasingly turning to used equipment in an effort to cut costs; the editors at CIO Magazine even ran a cover story giving their readers advice on finding the best deals.[7]

[7] Scott Berinato, "Good Stuff Cheap: A New Hardware Market Is Developing to Give CTOs What They Want Most: Good Stuff Cheap. This Is Its Story," CIO (Oct. 15, 2002).

These anecdotal reports are interesting both because of their similarity to each other and because of their relative scarcity. Clearly, confidential information has been disclosed through computers sold on the secondary market more than a few times. Why, then, have there been so few reports of unintended disclosure?

In the initial publication detailing this study,[8] Shelat and I proposed three possible hypotheses to answer this question:

[8] Simson Garfinkel and Abhi Shelat, "Remembrance of Data Passed," IEEE Security and Privacy (Jan. 2002).

  • Disclosure of so-called "data passed" information, while it occurs from time to time, is nevertheless exceedingly rare.

  • Confidential information might be disclosed so often on retired systems that such events are simply not newsworthy.

  • While used equipment is awash with confidential information, nobody is looking for it, or at least few people who are looking for this data are publicizing the fact.

This chapter argues that the third hypothesis is correct. Based on a combination of the information found on the drives and interviews conducted with some of the original data owners, it seems that most confidential information on these "retired" drives is erased but not overwritten. As a result, I believe that many repurposed drives contain significant amounts of personal or confidential information, but few of the drives' current users are aware of this fact.

15.2.2. Study Methodology

Between January 1999 and January 2003, I purchased 235 used hard drives on the secondary market in an effort to determine what information they contained and what, if any, means were taken to clean the drives before they were discarded. Initially the drives were purchased at used computer stores such as WeirdStuff in Sunnyvale, California, and PC Recycle in Bellevue, Washington. The majority of drives were purchased as the result of winning bids on the eBay online auction web site. Most purchases consisted of between 3 and 5 drives; in no case were more than 20 drives purchased at a time from the same vendor.

Modern hard disks store information in individually addressable blocks, with each block being 512 bytes in length. A 50-gigabyte disk thus has approximately 100 million blocks.
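The block arithmetic can be checked with a few lines of Python. This is only a worked example: the function name is illustrative, and "gigabyte" is assumed to mean 10**9 bytes, the convention drive vendors generally use.

```python
BLOCK_SIZE = 512  # bytes per addressable block, as stated in the text


def blocks_on_disk(capacity_bytes, block_size=BLOCK_SIZE):
    """Number of whole blocks that fit in the given capacity."""
    return capacity_bytes // block_size


# Assumption: one gigabyte is taken as 10**9 bytes.
print(blocks_on_disk(50 * 10**9))  # 97656250, roughly 100 million
```

Under a binary definition of the gigabyte (2**30 bytes) the count comes out slightly higher, but the order of magnitude is the same.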

On receipt, each drive was cataloged and entered into a database. Each drive was then attached to a computer running the FreeBSD operating system and the contents were copied off, block for block, using the command:

     dd if=/dev/ad2 of=NNN.img conv=noerror,sync

where /dev/ad2 is the raw device of the disk, noerror instructs that the dd command should continue copying data even if an error is encountered, and sync specifies that error-containing blocks should be written to the output stream as all zeros.
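The effect of the two conversion options can be illustrated with a small Python sketch. It is not how dd is implemented; the function image_blocks and the FlakyDisk class are hypothetical stand-ins that only mimic the behavior described above: keep going after a read error, and write a block of zeros in place of anything that could not be read, so that every block keeps its original offset in the image.

```python
import io

BLOCK_SIZE = 512  # bytes per block, matching the disks described here


def image_blocks(src, dst, num_blocks, block_size=BLOCK_SIZE):
    """Copy num_blocks blocks from src to dst, zero-filling unreadable ones.

    Returns the list of block numbers that raised a read error.
    """
    bad_blocks = []
    for n in range(num_blocks):
        try:
            src.seek(n * block_size)
            data = src.read(block_size)
        except OSError:
            # Like conv=noerror: note the failure and keep going.
            data = b""
            bad_blocks.append(n)
        # Like conv=sync: pad short or failed reads with NUL bytes so
        # every block keeps its original offset in the image.
        dst.write(data.ljust(block_size, b"\0"))
    return bad_blocks


class FlakyDisk(io.BytesIO):
    """Hypothetical stand-in for a failing drive: listed blocks raise OSError."""

    def __init__(self, data, bad):
        super().__init__(data)
        self.bad = set(bad)

    def read(self, size=-1):
        if self.tell() // BLOCK_SIZE in self.bad:
            raise OSError("simulated unreadable block")
        return super().read(size)


disk = FlakyDisk(b"A" * 512 + b"B" * 512 + b"C" * 512, bad=[1])
image = io.BytesIO()
print(image_blocks(disk, image, num_blocks=3))  # [1]
```

After the copy, the image holds the first and third blocks unchanged and 512 zero bytes where the unreadable second block was.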

A filesystem is the piece of a computer's operating system that controls the allocation of disk blocks to individual files. Popular filesystems are FAT (used in its FAT16 form by Windows 3.1 and in its FAT16 and FAT32 forms by Windows 95 and Windows 98), NTFS (used by Windows NT, 2000, and XP), FFS (used by BSD Unix), and ext2fs (used by Linux). The following discussion is for the FAT32 filesystem, but it applies to all modern filesystems with only minor changes.

Once the images were created, they were mounted with FreeBSD's "memory disk" driver. I then attempted to read the data in the image using FreeBSD's native filesystem implementations for the FAT, NTFS, Novell, and Unix filesystems.

Of the 235 disks, 59 were dead on arrival, and the remaining 176 had data that could be read, for a total of 125 gigabytes of image files. Of these drives, 11 disks contained no data at all; that is, every block on these disks had been overwritten with ASCII NUL bytes. Another 22 disks appeared to have been overwritten completely and then formatted using the Windows FORMAT command. On these 22 disks, more than 99% of the blocks were blank. For the majority of the remaining disks, it appeared that little if anything had been done to remove the data of their previous owners.
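One way to arrive at figures like these is to count the all-zero blocks in each image file. The following is a minimal sketch of that idea, not the tooling actually used in the study; the function names and the labels are illustrative, and the 99% threshold simply echoes the figure quoted above.

```python
import io

BLOCK_SIZE = 512


def blank_fraction(image, block_size=BLOCK_SIZE):
    """Return the fraction of blocks in a file-like image that are all NUL bytes."""
    total = blank = 0
    while True:
        block = image.read(block_size)
        if not block:
            break
        total += 1
        if block.count(0) == len(block):
            blank += 1
    return blank / total if total else 0.0


def describe(fraction):
    """Rough labels; the 99% threshold echoes the figure quoted in the text."""
    if fraction == 1.0:
        return "completely zeroed"
    if fraction > 0.99:
        return "probably overwritten, then formatted"
    return "contains residual data"


# Three blank blocks followed by one block that still holds old data.
sample = io.BytesIO(bytes(3 * BLOCK_SIZE) + b"old letter".ljust(BLOCK_SIZE, b"\0"))
print(blank_fraction(sample))  # 0.75
```

For a real image the same function could be called on a file opened in binary mode, for example open("NNN.img", "rb").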

Further examination appeared to contradict this conclusion. Although the remaining disks contained relatively large amounts of recoverable data, only a small percentage of this data actually seemed to reside in files. There were only 168,459 files on the 176 readable drives, accounting for just 38,296,903[9] of the 190,681,765 non-zero disk blocks. Examining the files by file type, I found just 783 Microsoft Word files, 184 Microsoft Excel files, 30 Microsoft PowerPoint files, and just 11 Outlook PST files, numbers that seemed suspiciously low given that these were used disk drives.

[9] This figure takes into account the fact that a 1-byte file takes an entire block, and a 1,025-byte file takes three blocks.
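The rounding described in this footnote is a ceiling division by the block size. A small sketch, with an illustrative function name and the 512-byte block size used throughout this section:

```python
def blocks_used(file_size, block_size=512):
    """Blocks occupied by a file: its size rounded up to a whole block."""
    return -(-file_size // block_size)  # ceiling division on integers


print(blocks_used(1))     # 1
print(blocks_used(1025))  # 3
```

Summing this value over every file on a drive gives the number of blocks that reside in files, which can then be compared with the number of non-zero blocks on the drive.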

Typical of the disks recovered was Disk #70, an IBM DALA 3540 that was purchased for $5 on eBay from a Massachusetts retail store. The disk contained 541 megabytes of data in 1,057,392 disk blocks (each disk block holds 512 bytes). Only 6% of the disk blocks were filled with ASCII NUL bytes; the rest contained data. Yet when the disk was mounted, just three files were observed, two of which were marked as "hidden" by the operating system:

     IO.SYS        (hidden)
     MSDOS.SYS     (hidden)
     COMMAND.COM

Where was the rest of the data?

15.2.3. FORMAT Doesn't Format

Broadly speaking, modern disk drives have the ability to store two kinds of information. The majority of information stored by the device is directly addressable user data: these are the actual blocks that are written by the computer's operating system onto the drive's media in response to WRITE commands, and read back into the computer in response to READ commands. The second kind of information stored on the disk drive is hidden data that is used for the proper operation of the disk drive itself. This information includes the disk's firmware and spare blocks that the drive will use when blocks containing directly addressable user data begin to fail.

When a manufacturer delivers a drive to the computer maker or end user, all blocks that will be used to hold directly addressable user data are filled with the ASCII NUL character; that is, the blocks are zeroed. (The hidden blocks generally are not zeroed, but they cannot be accessed by the computer's operating system; for most practical purposes, these blocks do not exist.)

When a disk is formatted with the FAT filesystem, the Windows FORMAT command scans the entire disk, reading every block to make sure that the block is functioning. The FORMAT command then writes down boot blocks, the disk's root directory, and finally a file allocation table that is used to distinguish blocks that are in use by the filesystem from those that are not. This process typically takes between 10 and 20 minutes, owing to the time required to read every block on the drive.

Once the root directory is written out, any information that was previously on the disk is rendered inaccessible. The data is still on the disk, but it cannot be retrieved using Windows because the files and directories of the disk cannot be reached by starting at the disk's now empty root directory. Thus, the Windows FORMAT command doesn't really erase the contents of the disk: it actually reads the entire disk and writes a new root. (Overwriting the FAT does make it more difficult to reassemble files that have been fragmented, that is, written partially in one location and partially in one or more others. This tends to make it harder, although not impossible, to recover large files.)
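The behavior described above can be mimicked with a toy model. The layout below (one boot block, one allocation-table block, one root-directory block, then data) is a deliberate simplification and does not reflect the real FAT on-disk format; all names are hypothetical. The point is only that the "format" step reads every block but writes just the metadata blocks.

```python
BLOCK = 512
BOOT, TABLE, ROOT, FIRST_DATA = 0, 1, 2, 3  # hypothetical layout, not real FAT


def new_disk(num_blocks):
    return [bytes(BLOCK) for _ in range(num_blocks)]


def add_file(disk, name, block_no, data):
    """Record name in the toy root directory and store data in one block."""
    disk[ROOT] = (disk[ROOT].rstrip(b"\0") + name.encode() + b"\n").ljust(BLOCK, b"\0")
    disk[block_no] = data.ljust(BLOCK, b"\0")


def list_files(disk):
    """What an operating system would show: names reachable from the root."""
    return [n.decode() for n in disk[ROOT].rstrip(b"\0").split(b"\n") if n]


def toy_format(disk):
    """Scan every block, then rewrite only the three metadata blocks."""
    for block in disk:
        assert len(block) == BLOCK      # the "media scan": read, never write
    disk[BOOT] = b"BOOT".ljust(BLOCK, b"\0")
    disk[TABLE] = bytes(BLOCK)          # fresh, empty allocation table
    disk[ROOT] = bytes(BLOCK)           # fresh, empty root directory


disk = new_disk(8)
add_file(disk, "WILL.DOC", FIRST_DATA, b"Last will and testament of ...")
toy_format(disk)
print(list_files(disk))                          # []
print(disk[FIRST_DATA].rstrip(b"\0").decode())   # Last will and testament of ...
```

After toy_format the directory listing is empty, yet the data block still holds the original text, which is the situation a block-level imaging tool exposes.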

The failure of the FORMAT command to zero or otherwise initialize a hard drive has an interesting history. The first version of DOS, MS-DOS 1.0, worked only with floppy disks. At the time, floppies were sold without any track or sector information on their magnetic surface and they needed to be "formatted" before they could be used. In the process of formatting the disk, any bad blocks were detected and noted in the disk's FAT so that they would not be used accidentally. If a floppy disk containing data was formatted, the information that it contained would necessarily be overwritten. This process took a few minutes. Thus, the initial meaning of "format" to PC users in 1981 was "a process that initializes a piece of magnetic media, making it usable, and destroying any data that the media might contain in the process."

With the introduction of DOS 2.0, the first version of DOS that directly supported hard-disk drives, FORMAT of hard disks was made nondestructive. Because hard drives were sold already initialized, it was only necessary for the FORMAT command to literally write a format of data structures into the disk's logical blocks so that the disk could be used with the operating system. But the FORMAT command continued to scan the entire disk for bad blocks, a process that might take between 10 and 30 minutes.

Thus, the FORMAT command gave the impression that it was overwriting the entire disk because it took a long time and because the resulting disk appeared to contain no data. But, in fact, no such overwriting took place. Not only does the FORMAT command turn visible data into invisible data, but it furthermore does so in a manner that is misleading. Equally misleading is the warning that the command displays:

     A:\>format c:
     WARNING, ALL DATA ON NON-REMOVABLE DISK
     DRIVE C: WILL BE LOST!
     proceed with Format (Y/N)?y
     Formatting 1,007.96M
     100 percent completed.
     Writing out file allocation table
     Complete.
     Calculating free space (this may take several minutes)...
     Complete.

     Volume label (11 characters, ENTER for none)?

     1,054,851,072 bytes total disk space
     1,054,851,072 bytes available on disk

             4,096 bytes in each allocation unit.
           257,531 allocation units available on disk.

     Volume Serial Number is 4026-1EFC

     A:\>

The DOS 2.0 FORMAT command could have overwritten the entire disk, but this would have doubled the amount of time that the command required to prepare a new hard drive because every block would have needed to be both written and then read. The program's creators appear to have made a tradeoff here between usability and security, increasing one while decreasing the other. Unfortunately, it was an invisible, undocumented tradeoff.

Microsoft could have done things differently. For example, the program's creators could have put in a command-line switch that would have forced the program to first overwrite each block with NULs before it was read back. Then, the program could have been modified so that it would display one of two different messages. The "ALL DATA ... WILL BE LOST" message could have been used when the disk was actually overwritten, and a different message could have been used for the less severe option.

One reason that Microsoft's engineers may not have gone in this direction is that the hard drives that were sold in the 1980s generally came with their own separately packaged "disk utilities." Invariably, one of the "utilities" was a program that performed a so-called "low-level format" on the physical disk. The details of what a "low-level format" actually did varied from manufacturer to manufacturer and from drive to drive, but it generally was viewed as destroying all of the user-addressable information that the disk might contain. Mueller's 1991 book, Que's Guide to Data Recovery, noted that the key difference between a low-level format and a high-level format was that "you can recover data (unformat) from a high-level format."[10] Nevertheless, such knowledge did not diffuse into the general computer-user population.

[10] Scott Mueller with Alan C. Elliott, Que's Guide to Data Recovery (Que Corporation, 1991), 99.

It is incredibly misleading for an operating system to give the impression that all of the information has been removed from a disk when, in fact, the information has merely been made inaccessible to users who have not obtained special data recovery tools. Such a situation is an invitation for mishap: given a freshly formatted hard disk, there is no way for a user to audit the disk and determine if it is, in fact, clean, or if it has a treasure-trove of hidden, confidential information.

Modern versions of the Windows FORMAT command also have the ability to "quick format" a disk, which omits the media scan step. In this case, the entire disk can be formatted in just a few seconds. When Microsoft created the "quick format" option, the company could have gone back and changed the behavior of FORMAT when the "quick" option wasn't selected. Ideally, a non-quick format would actually overwrite the data on the disk. This would have aligned once again the internal workings of the commands with the effects that are visible to the user. Unfortunately, Microsoft left the behavior of the command as it was.

15.2.4. DELETE Doesn't Delete

Just as today's FORMAT command doesn't actually format disks, it turns out that commands for erasing individual files do not actually perform that function, either. Instead of overwriting the actual data, commands like DELETE and ERASE simply remove the entry in the file's containing directory and return the file's blocks to the free list. What happens after the file is deleted depends upon many factors, including the amount of free space on the disk and the system's pattern of usage.
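A toy filesystem makes the distinction concrete. The ToyFS class below is a hypothetical illustration, not a model of any real filesystem: delete drops the directory entry and returns the blocks to the free list, while delete_and_overwrite shows what an erasing command would have to do in addition.

```python
class ToyFS:
    """Hypothetical filesystem: a directory, a free list, and raw blocks."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size)] * num_blocks
        self.free = list(range(num_blocks))
        self.directory = {}  # filename -> list of block numbers

    def write_file(self, name, data):
        used = []
        for i in range(0, len(data), self.block_size):
            n = self.free.pop(0)
            chunk = data[i:i + self.block_size]
            self.blocks[n] = chunk.ljust(self.block_size, b"\0")
            used.append(n)
        self.directory[name] = used

    def delete(self, name):
        # What DELETE-style commands do: forget the name, free the blocks.
        self.free.extend(self.directory.pop(name))

    def delete_and_overwrite(self, name):
        # What a sanitizing delete would add: zero the blocks first.
        for n in self.directory[name]:
            self.blocks[n] = bytes(self.block_size)
        self.delete(name)

    def raw_contains(self, needle):
        """Scan the raw blocks, as a forensic tool would."""
        return any(needle in block for block in self.blocks)


fs = ToyFS(num_blocks=8)
fs.write_file("LETTER.TXT", b"Dear daughter, ...")
fs.delete("LETTER.TXT")
print("LETTER.TXT" in fs.directory)      # False
print(fs.raw_contains(b"Dear daughter")) # True
```

Whether the freed blocks are later reused, and therefore overwritten, depends on how new files are allocated, which is why the fate of deleted data varies with free space and usage patterns.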

Once again, the usability problem is that the operating system gives the user the appearance that the data has been removed from the computer when, in fact, the data has merely been made inaccessible by ordinary means. The usability problem for end users is compounded by the fact that there is no mention of this behavior in the Microsoft documentation. For example, the Windows built-in help for the DELETE command simply states that DEL "deletes one or more files."

As before, this systematic deception on the part of DELETE and ERASE wasn't exactly secret: a 1987 advertisement for the Mace Utilities appearing in The New York Times noted that the $59.95 program could "Unformat, Undelete, Diagnose & Remedy" and much more.[11] But mention that files could be undeleted did not appear in a feature article until 1990, and then only in Peter Lewis's "Executive Computer" column on the 11th page of the Business section of The Times.[12]

[11] Display Ad 57, No Title, The New York Times (Feb. 8, 1987), 57. An April 26, 1983 advertisement for the Norton Utilities fails to mention whether Norton's programs can undelete or unformat.

[12] Peter H. Lewis, " 'Little Black Boxes' That Can Save a Hard Drive," The New York Times (April 29, 1990), F11.

15.2.5. A Taxonomy of Sanitized Recovered Data

Now we have an explanation for what happened to the data on Disk #70: the disk was formatted with the Windows FORMAT command before it was resold. Indeed, running the Unix strings(1) command over the disk's image file reveals many interesting things about the disk's previous owner, including the fact that the disk had a copy of IBM AntiVirus Trial Edition installed (Example 15-1) and that the disk was used in some kind of medical application (Example 15-2). Additional investigation revealed that this disk had been used in a computer that belonged to a mail-order pharmacy.

Example 15-1. The contents of block #854420 from Disk #70
 Displaying block 854420 Notes to Users of IBM AntiVirus version 3.0 build 307..=  =  =  =  =  =  = =  =  =  =  =  =  =  =  =  =  =  =  =  =  =  =  =  =  =  =   =  =  =  =  = =  =  =  =  =  =  =  =  =....This file conta ins important notes for all users of IBM AntiVirus,..including a  summary of highlights in this release and last-minute..changes to the printed documentation.  It is divided into these..section s:....   Introduction..   Highlights of release 300..   Highligh ts of release 301..   Highlights of release 302..   Highlights o f build 304..   Highlights of build 306..   Highlights of build 

Example 15-2. The contents of blocks 315782 and 315783 from Disk #70
 Displaying block 315782 *.......&@......u@.ALLERGY ALERT.......@.DPC5.......&@..... u@.D RUG TO DRUG INTERACTION.......@.DPC4.......&@.....0u@.THERAPEUTI C DUPLICATION.......@.DPC,.......&@.....@u@.HIGH DOSE ALERT..... ..@.DPC-.......&@.....Pu@.TOO EARLY REFILL.......@.DPC/........@ .....'u@.EXCESSIVE DURATION.......@.CMB=.......&@.....pu@ INFERR ED DRUG DISEASE PRECAUTION.......@.DPC(.......&@......u@.DRUG GE NDER.......@.DPC0.......&@......u@.DRUG AGE PRECAUTION.......@.D PC+.......&@......u@.LOW DOSE ALERT.......@.DPC........*@....... Displaying block 315783  09/30/1981  03:00   DUPLICAT.ION @.SUSPENDED LICENSE.......@.RTP0........@.......@.DIABETIC STRIP S - C.......@.CMB0........@.......@.DIABETIC STRIPS - B.......@. CMB7.......$@.......@.GENERIC PROD. SUBST-REFILL.......@.DAW>... ....$@.......@!GENERIC PROD. SUBST-NEW & REFILLS.......@.DAW;... ....&@......v@.BENEFICIARY NOT ELIGIBLE PRIME.......@.DPC2...... .0@.......@.MNFR. SPECIFIED ON RX.......@.NIS........&@..... v@. DUPLICATION CLAIM.......@.DPC-.......&@.....0v@.REQUIRES RECIEPT .......@.DPC/.......&@.....@v@.DRUG NOT AVAILABLE.......@.DPC-.. 
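Output like the two examples above comes from scanning raw bytes for runs of printable characters. The sketch below approximates what strings(1) does by default (runs of at least four printable ASCII characters); it is an illustration rather than a reimplementation, and the function name is an assumption.

```python
import re


def printable_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII characters in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# A made-up block in the style of Example 15-2: text fields separated by
# binary bytes that do not print.
block = b"\x00\x01ALLERGY ALERT\x00\x02ab\x00HIGH DOSE ALERT\xff"
print(printable_strings(block))  # ['ALLERGY ALERT', 'HIGH DOSE ALERT']
```

Run over a whole disk image, a scan like this surfaces text from every block, whether or not the block belongs to a file that the filesystem still knows about.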

In order to facilitate the discussion of sanitization tools and practices, Shelat and I created a sanitization taxonomy (see Table 15-1). Using this taxonomy to discuss Disk #70, we can say that the disk contained one Level 0 file (COMMAND.COM) and two Level 1 files (IO.SYS and MSDOS.SYS, both files that were "hidden") and approximately 508 MB of Level 3 data.

Table 15-1. A sanitization taxonomy

| Level | Type of data | Description |
|---|---|---|
| 0 | Regular files | Information contained within the filesystem. Includes filenames, file attributes, and file contents. By definition, there has been no attempt to sanitize the information that is contained within Level 0 files. Level 0 also includes information that is written to the disk as part of any sanitization attempt. For example, if a copy of Windows 95 is installed on a hard drive in an attempt to sanitize the drive, then the files contained within the C:\WINDOWS directory would be considered Level 0 files. No special tools are required to retrieve Level 0 data. |
| 1 | Temporary files | Temporary files, including print spooler files, browser cache files, files for "helper" applications, and files in "recycle bins." Most users either expect that these files will be deleted automatically in time or are not even aware that these files exist. Note that Level 1 files are a subset of Level 0 files. Experience has shown that it is useful to distinguish this subset, because many naive users will overlook Level 1 files when they are browsing a computer's hard drive to see if it contains sensitive information. No special tools are required to retrieve Level 1 data, although special training is required so that the operator knows where to look. |
| 2 | Deleted files | When a file is deleted from a filesystem, most operating systems do not overwrite the blocks on the hard disk on which the file is written. Instead, they simply remove the reference to the file from the containing directory. The file's blocks are then placed on the free list. These files can be recovered using traditional "undelete" tools such as Norton Utilities. |
| 3 | Retained data blocks | Data that can be recovered from a disk but that does not obviously belong to a named file. Level 3 data includes information in slack space, swap space for virtual memory, and Level 2 data that has been partially overwritten so that an entire file cannot be recovered. One common source of Level 3 data is disks that have been formatted with the Windows FORMAT command or the Unix newfs command. Even though these commands give the impression that they overwrite the entire hard drive, in fact they do not, and the vast majority of the information on a formatted disk can be recovered with Level 3 tools. Level 3 data can be recovered using advanced data recovery tools that can "unformat" a disk drive, and using special-purpose forensics tools. |
| 4 | Vendor-hidden data | This level consists of data blocks on the drive that can be accessed using only vendor-specific commands. This level includes the drive's controlling program, blocks used for bad-block management, and the Host Protected Area (HPA) of modern hard drives. |
| 5 | Overwritten data | Many individuals maintain that information can be recovered from a hard drive even after it is overwritten. Level 5 is reserved for such information. |


The combination of the taxonomy and the statistical analysis of the operational disks provides a simple answer to the questions posed earlier in this chapter. Although the disks that were purchased contained large amounts of personal information, most of this information consisted of Level 2 and Level 3 data: on casual examination, the disks appeared either to have been formatted or to have had their user files deleted, leaving only the program files. Most potential recipients of disks sold on the secondary market, lacking tools for accessing Level 2 and Level 3 information, probably never encounter the confidential information on disks that they purchase.

DELETE DOESN'T DELETE BACKUPS, EITHER

There is another kind of data that the DELETE command doesn't delete: information stored on backup tapes. This point was made in a very public manner during the 1987 Congressional Iran-Contra hearings. Prior to the hearings, investigators had been able to reconstruct the Reagan Administration's illegal "arms-for-hostages" scheme by accessing stored email messages from the National Security Council's PROFS (Professional Office System) backup tapes. (The original messages had long since been deleted by staff members trying to cover their tracks.) As Lt. Colonel Oliver North stated during his congressional testimony, "We all sincerely believed that when we sent a PROFS message to another party and punched the button 'Delete' that it was gone forever. Wow, were we wrong."[a]

One way to deal with the problem of deleting information on backup tapes is to encrypt every file on the backup with a different key: when it is necessary to sanitize a file, the key can also be deleted. Of course, this contradicts the express purpose of backups: to enable information that is deleted accidentally to be recovered. Certainly, the key can be deleted after a waiting period of minutes or hours, but in some cases, organizations may simply not want to give individuals the ability to delete files from backups; this is a way to protect the organization against possibly hostile employees.


[a] National Public Radio news broadcasts (1992).
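The per-file key idea from the sidebar can be sketched in a few lines. Everything here is a hypothetical illustration: ToyBackup is not a real backup system, and the SHA-256 counter keystream is a stand-in chosen only so that the example runs with the standard library. A real deployment would use a vetted cipher and careful key management.

```python
import hashlib
import secrets


def _keystream(key, length):
    """Toy keystream: SHA-256 over key plus a counter. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_crypt(key, data):
    """Encrypt or decrypt by XOR with the keystream (the operation is symmetric)."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


class ToyBackup:
    def __init__(self):
        self.tape = {}  # name -> ciphertext; written once, never erased
        self.keys = {}  # name -> key; stored apart from the tape

    def store(self, name, data):
        key = secrets.token_bytes(32)  # a fresh key for every file
        self.keys[name] = key
        self.tape[name] = xor_crypt(key, data)

    def restore(self, name):
        return xor_crypt(self.keys[name], self.tape[name])

    def sanitize(self, name):
        # The tape is untouched; only the key is destroyed.
        del self.keys[name]


backup = ToyBackup()
backup.store("memo.txt", b"message that should later be unrecoverable")
print(backup.restore("memo.txt"))
backup.sanitize("memo.txt")
print("memo.txt" in backup.tape, "memo.txt" in backup.keys)  # True False
```

Delaying the call to sanitize corresponds to the waiting period mentioned in the sidebar: until the key is destroyed, an accidental deletion can still be undone from the backup.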

This answer was confirmed, in part, by a series of interviews conducted between December 2003 and October 2004 with the previous owners of 16 of the drives. In some cases (Drives #7, #11, #73, #74, #75, #77, #94, and #134), the organization had a procedure in place for sanitizing the drives, but that procedure was not sufficient to do the job. In other cases (Drives #21 and #44), there was no formal procedure in place. Many owners who were not sophisticated had trusted their reseller to perform the sanitization process, a trust that was betrayed (Drives #54, #193, and #205). In the remaining cases of drives that were traced back to their owners, no determination could be made (Drives #6 and #128).



Security and Usability. Designing Secure Systems that People Can Use
ISBN: 0596008279
Year: 2004
Pages: 295
