Hack 94. Recover Data from Crashed Disks

You can recover most of the data from crashed hard drives with a few simple Linux tricks.

As the philosopher once said, "Into each life, a few disk crashes must fall." Or something like that. Today's relatively huge disks make it more tempting than ever to store large collections of data online, such as your entire music collection or all of the research associated with your thesis. Backups can be problematic, as today's disks are much larger than most backup media, and backups can't restore any data that was created or modified after the last backup was made. Luckily, the fact that any Linux/Unix device can be accessed as a stream of characters presents some interesting opportunities for restoring some or all of your data even after a hard drive failure. When disaster strikes, consult this hack for recovery tips.

This hack uses error messages and examples produced by the ext2fs filesystem consistency checking utility associated with the Linux ext2 and ext3 filesystems. You can use the cloning techniques in this hack to copy any Linux disk, but the filesystem repair utilities will differ for other types of Linux filesystems. For example, if you are using ReiserFS filesystems, see "Repair and Recover ReiserFS Filesystems" [Hack #95] for details on using the special commands provided by its filesystem consistency checking utility, reiserfsck.


10.7.1. Popular Disk Failure Modes

Disks generally go bad in one of three basic ways:

  • Hardware failure that prevents the disk heads from moving or seeking to various locations on the disk. This is generally accompanied by a ticking noise whenever you attempt to mount or otherwise access the filesystem, which is the sound of disk heads failing to launch or locate themselves correctly.

  • Bad blocks on the disk that prevent the disk's partition table from being read. The data is probably still there, but the operating system doesn't know how to find it.

  • Bad blocks on the disk that cause a filesystem on a partition of the disk to become unreadable, unmountable, and uncorrectable.

The first of these problems can generally be solved only by shipping your disk off to a firm that specializes in removing and replacing drive internals, using cool techniques for recovering data from scratched or flaked platters, if necessary. The second of these problems is discussed in "Recover Lost Partitions" [Hack #93]. This hack explains how to recover data that appears to be lost due to the third of these problems: bad blocks that corrupt filesystems to the point where standard filesystem repair utilities cannot correct them.

If your disk contains more than one partition and one of the partitions that it contains goes bad, chances are that the rest of the disk will soon develop problems. While you can use the techniques explained in this hack to clone and repair a single partition, this hack focuses on cloning and recovering an entire disk. If you clone and repair a disk containing multiple partitions, you will hopefully find that some of the copied partitions have no damage. That's great, but cloning and repairing the entire disk is still your safest option.


10.7.2. Attempt to Read Block from Filesystem Resulted in Short Read…

The title of this section is one of the more chilling messages you can see when attempting to mount a filesystem that contained data the last time you booted your system. This error always means that one or more blocks cannot be read from the disk that holds the filesystem you are attempting to access. You generally see this message when the fsck utility is attempting to examine the filesystem, or when the mount utility is attempting to mount it so that it is available to the system.

A short read error usually means that an inode in the filesystem points to a block on the filesystem that can no longer be read, or that some of the metadata about your filesystem is located on a block (or blocks) that cannot be read. On journaling filesystems, this error displays if any part of the filesystem's journal is stored on a bad block. When a Linux system attempts to mount a partition containing a journaling filesystem, its first step is to replay any pending transactions from the filesystem's journal. If these cannot be read, voilà: a short read.

10.7.3. Standard Filesystem Diagnostics and Repair

The first thing to try when you encounter any error accessing or mounting a filesystem is to check the consistency of the filesystem. All native Linux filesystems provide consistency-checking applications. Table 10-2 shows the filesystem consistency checking utilities for various popular Linux filesystems.

Table 10-2. Different Linux filesystems and their associated repair utilities

  Filesystem type    Diagnostic/repair utilities
  ext2, ext3         e2fsck, fsck.ext2, fsck.ext3, tune2fs, debugfs
  JFS                jfs_fsck, fsck.jfs
  reiserfs           reiserfsck, fsck.reiserfs, debugreiserfs
  XFS                fsck.xfs, xfs_check

The consistency-checking utilities associated with each type of Linux filesystem have their own ins and outs. In this section, I'll focus on trying to deal with short read errors from disks that contain partitions in the ext2 or ext3 formats, which are the most popular Linux partition formats. The ext3 filesystem is a journaling version of the ext2 filesystem, and the two types of filesystems therefore share most data structures and all repair/recovery utilities. If you are using another type of filesystem, the general information about cloning and repairing disks in later sections of this hack still applies.

If you're using an ext2 or ext3 filesystem, your first hint of trouble will come from a message like the following, generally encountered when restarting your system. This warning comes from the e2fsck application (or a symbolic link to it, such as fsck.ext2 or fsck.ext3):

 # e2fsck /dev/hda1
 e2fsck: Attempt to read block from filesystem resulted in short read

If you see this message, the first thing to try is to cross your fingers and hope that only the disk's primary superblock is bad. The superblock contains basic information about the filesystem, including primary pointers to the blocks that contain information about the filesystem (known as inodes). Luckily, when you create an ext2 or ext3 filesystem, the filesystem-creation utility (mke2fs or a symbolic link to it named mkfs.ext2 or mkfs.ext3) automatically creates backup copies of your disk's superblock, just in case. You can tell the e2fsck program to check the filesystem using one of these alternate superblocks by using its -b option, followed by the block number of one of these alternate superblocks within the filesystem with which you're having problems. The first of these alternate superblocks is usually created in block 8193, 16384, or 32768, depending on the size of your disk. Assuming that this is a large disk, we'll try the last as an alternative:

 # e2fsck -b 32768 /dev/hda1
 e2fsck: Attempt to read block from filesystem resulted in short read while checking ext3 journal for /dev/hda1

You can determine the locations of the alternate superblocks on an unmounted ext3 filesystem by running the mkfs.ext3 command with the -n option, which reports on what the mkfs utility would do but doesn't actually create a filesystem or make any modifications. This may not work if your disk is severely corrupted, but it's worth a shot. If it doesn't work, try 8193, 16384, and 32768, in that order.
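If you want to experiment with alternate superblocks without risking a real disk, you can practice on a small filesystem image. The following is a sketch, assuming the e2fsprogs tools (mke2fs, e2fsck) are installed; all paths are throwaway examples. Forcing a 1 KB block size on a 16 MB image puts the first backup superblock at block 8193, as discussed above:

```shell
# Practice superblock recovery on a throwaway image, not a real disk.
# Assumes e2fsprogs is installed; all paths here are examples.
PATH="$PATH:/sbin:/usr/sbin"

dd if=/dev/zero of=/tmp/sbdemo.img bs=1M count=16 2>/dev/null
mke2fs -F -q -b 1024 /tmp/sbdemo.img        # small ext2 filesystem, 1 KB blocks

# Ask mke2fs where it *would* put things: with -n it makes no changes,
# it just reports the layout, including the backup superblock locations.
mke2fs -n -F -b 1024 /tmp/sbdemo.img | grep -A 1 -i "superblock backups"

# Check the filesystem using the first backup superblock instead of the
# primary one, exactly as you would on a damaged disk.
e2fsck -f -y -b 8193 /tmp/sbdemo.img
```

On a healthy image this is a no-op rehearsal, but the same two commands (mke2fs -n to find the backups, e2fsck -b to use one) are what you would run against a real damaged partition.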


This gave us a bit more information. The problem doesn't appear to be with the filesystem's superblocks; instead, it's with the journal on this filesystem. Journaling filesystems minimize system restart time by using a journal to guarantee filesystem consistency [Hack #70]. All pending changes to the filesystem are first stored in the journal, and are then applied to the filesystem by a daemon or internal scheduling algorithm. These transactions are applied atomically, meaning that if they are not completely successful, no intermediate changes that are part of the unsuccessful transactions are made. Because the filesystem is therefore always consistent, checking the filesystem at boot time is much faster than it would be on a standard, non-journaling filesystem.

10.7.4. Removing an ext3 Filesystem's Journal

As mentioned previously, the ext3 and ext2 filesystems primarily differ only in whether the filesystem contains a journal. This makes repairing most journaling-related problems on an ext3 filesystem relatively easy, because the journal can simply be removed. Once the journal is removed, the consistency of the filesystem in question can be checked as if the filesystem was a standard ext2 filesystem. If you're very lucky, and the bad blocks on your system were limited to the ext3 journal, removing the journal (and subsequently fsck'ing the filesystem) may be all you need to do to be able to mount the filesystem and access the data it contains.

Removing the journal from an ext3 filesystem is done using the tune2fs application, which is designed to make a number of different types of changes to ext2 and ext3 filesystem data. The tune2fs application provides the -O option to enable you to set or clear various filesystem features. (See the manpage for tune2fs for complete information about available features.) To clear a filesystem feature, you precede the name of that feature with the caret (^) character, which has the classic Computer Science 101 meaning of "not." Therefore, to configure a specified existing filesystem so that it thinks that it does not have a journal, you would use a command line like the following:

 # tune2fs -f -O ^has_journal /dev/hda1
 tune2fs 1.35 (28-Feb-2004)
 tune2fs: Attempt to read block from filesystem resulted in short read while reading journal inode

Darn. In this case, the inode that points to the journal seems to be bad, which means that the journal can't be cleared. The next thing to try is the debugfs command, which is an ext2/ext3 filesystem debugger. This command provides an interactive interface that enables you to examine and modify many of the characteristics of an ext2/ext3 filesystem, as well as providing an internal features command that enables you to clear the journal. Let's try this command on our ailing filesystem:

 # debugfs /dev/hda1
 debugfs 1.35 (28-Feb-2004)
 /dev/hda1: Can't read an inode bitmap while reading inode bitmap
 debugfs: features
 features: Filesystem not open
 debugfs: open /dev/hda1
 /dev/hda1: Can't read an inode bitmap while reading inode bitmap
 debugfs: quit

Alas, the debugfs command couldn't access a bitmap in the filesystem that tells it where to find specific inodes (in this case, the journal's inode).

If you are able to clear the journal using the tune2fs or debugfs command, you should retry the e2fsck application, using its -c option to have e2fsck check for bad blocks in the filesystem and, if any are found, add them to the disk's bad block list.


Since we can't fsck or fix the filesystem on the ailing disk, it's time to bring out the big hammer.

10.7.5. Cloning a Bad Disk Using ddrescue

If bad blocks are preventing you from reading or repairing a disk that contains data you want to recover, the next thing to try is to create a copy of the disk using a raw disk copy utility. Unix/Linux systems have always provided a simple utility for this purpose, known as dd, which copies one file/partition/disk to another and provides options that enable you to proceed even in the face of various types of read errors. You must put another disk in your system that is at least as large as the disk or partition you are attempting to clone. If you copy a smaller disk to a larger one, you'll obviously be wasting the extra space on the larger disk, but you can always recycle the disk after you extract and save any data that you need from the clone of the bad disk.

To copy one disk to another using dd, telling it not to stop on errors, you would use a command like the following:

 # dd if=/dev/hda of=/dev/hdb conv=noerror,sync 

This command would copy the bad disk (here, /dev/hda) to a new disk (here, /dev/hdb), ignoring errors encountered when reading (noerror) and padding the output with an appropriate number of nulls when unreadable blocks are encountered (sync).
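You can see the effect of conv=noerror,sync safely on ordinary files; no failing disk is required. In this sketch (all paths are throwaway examples), a 5-byte input is padded out to one full 512-byte block:

```shell
# Demonstrate conv=sync padding on ordinary files (paths are examples).
printf 'hello' > /tmp/dd-demo-in            # 5 bytes: less than one block

# noerror: keep going past read errors; sync: pad each short or failed
# read out to a full bs-sized block with NUL bytes.
dd if=/tmp/dd-demo-in of=/tmp/dd-demo-out bs=512 conv=noerror,sync 2>/dev/null

wc -c < /tmp/dd-demo-out                    # the 5 bytes were padded to 512
```

This NUL padding is what keeps block offsets in the clone aligned with the original disk, at the cost of zero-filled holes wherever the unreadable blocks were; the filesystem checker then has to clean up whatever structures those holes land in.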

dd is a fine, classic Unix/Linux utility, but I find that it has a few shortcomings:

  • It is incredibly slow.

  • It does not display progress information, so it is silent until it is done.

  • It does not retry failed reads, which can reduce the amount of data that you can recover from a bad disk.

Therefore, I prefer to use a utility called ddrescue, which is available from http://www.gnu.org/software/ddrescue/ddrescue.html. This utility is not included in any Linux distribution that I'm aware of, so you'll have to download the archive, unpack it, and build it from source code. Version 0.9 was the latest version when this book was written.

The ddrescue command has a large number of options, as the following help message shows:

 # ./ddrescue -h
 GNU ddrescue - Data recovery tool.
 Copies data from one file or block device to another,
 trying hard to rescue data in case of read errors.
 Usage: ./ddrescue [options] infile outfile [logfile]
 Options:
   -h, --help                   display this help and exit
   -V, --version                output version information and exit
   -B, --binary-prefixes        show binary multipliers in numbers [default SI]
   -b, --block-size=<bytes>     hardware block size of input device [512]
   -c, --cluster-size=<blocks>  hardware blocks to copy at a time [128]
   -e, --max-errors=<n>         maximum number of error areas allowed
   -i, --input-position=<pos>   starting position in input file [0]
   -n, --no-split               do not try to split error areas
   -o, --output-position=<pos>  starting position in output file [ipos]
   -q, --quiet                  quiet operation
   -r, --max-retries=<n>        exit after given retries (-1=infinity) [0]
   -s, --max-size=<bytes>       maximum size of data to be copied
   -t, --truncate               truncate output file
   -v, --verbose                verbose operation
 Numbers may be followed by a multiplier: b = blocks, k = kB = 10^3 = 1000,
 Ki = KiB = 2^10 = 1024, M = 10^6, Mi = 2^20, G = 10^9, Gi = 2^30, etc...

 If logfile given and exists, try to resume the rescue described in it.
 If logfile given and rescue not finished, write to it the status on exit.

 Report bugs to bug-ddrescue@gnu.org
 #

As you can see, ddrescue provides many options for controlling where to start reading, where to start writing, the amount of data to be read at a time, and so on. I generally only use the --max-retries option, supplying -1 as an argument to tell ddrescue not to exit regardless of how many retries it needs to make in order to read a problematic disk. Continuing with the previous example of cloning the bad disk /dev/hda to a new disk, /dev/hdb, that is the same size or larger, I'd execute the following command:

 # ddrescue --max-retries=-1 /dev/hda /dev/hdb
 Press Ctrl-C to interrupt
 rescued:  3729 MB,  errsize:  278 kB,  current rate:  26083 kB/s
    ipos:  3730 MB,   errors:       6,  average rate:  18742 kB/s
    opos:  3730 MB
 Copying data...

The display is constantly updated with the amount of data read from the first disk and written to the second, including a count of the number of disk errors encountered when reading the disk specified as the first argument.

Once ddrescue completes the disk copy, you should run e2fsck on the copy of the disk to eliminate any filesystem errors introduced by the bad blocks on the original disk. Since there are guaranteed to be a substantial number of errors and you're working from a copy, you can try running e2fsck with the -y option, which tells e2fsck to answer yes to every question. However, depending on the types of messages displayed by e2fsck, this may not always work; some questions are of the form Abort? (y/n), to which you probably do not want to answer "yes."

Here's some sample e2fsck output from checking the consistency of a bad 250-GB disk containing a single partition that I cloned using ddrescue:

 # fsck -y /dev/hdb1
 fsck 1.35 (28-Feb-2004)
 e2fsck 1.35 (28-Feb-2004)
 /dev/hdb1 contains a file system with errors, check forced.
 Pass 1: Checking inodes, blocks, and sizes
 Root inode is not a directory.  Clear? yes
 Inode 12243597 is in use, but has dtime set.  Fix? yes
 Inode 12243364 has compression flag set on filesystem without compression support.  Clear? yes
 Inode 12243364 has illegal block(s).  Clear? yes
 Illegal block #0 (1263225675) in inode 12243364.  CLEARED.
 Illegal block #1 (1263225675) in inode 12243364.  CLEARED.
 Illegal block #2 (1263225675) in inode 12243364.  CLEARED.
 Illegal block #3 (1263225675) in inode 12243364.  CLEARED.
 Illegal block #4 (1263225675) in inode 12243364.  CLEARED.
 Illegal block #5 (1263225675) in inode 12243364.  CLEARED.
 Illegal block #6 (1263225675) in inode 12243364.  CLEARED.
 Illegal block #7 (1263225675) in inode 12243364.  CLEARED.
 Illegal block #8 (1263225675) in inode 12243364.  CLEARED.
 Illegal block #9 (1263225675) in inode 12243364.  CLEARED.
 Illegal block #10 (1263225675) in inode 12243364.  CLEARED.
 Too many illegal blocks in inode 12243364.  Clear inode? yes
 Free inodes count wrong for group #1824 (16872, counted=16384).  Fix? yes
 Free inodes count wrong for group #1846 (16748, counted=16384).  Fix? yes
 Free inodes count wrong (30657608, counted=30635973).  Fix? yes
 [much more output deleted]

Once e2fsck completes, you'll see the standard summary message:

 /dev/hdb1: ***** FILE SYSTEM WAS MODIFIED *****
 /dev/hdb1: 2107/30638080 files (16.9% non-contiguous), 12109308/61273910 blocks

10.7.6. Checking the Restored Disk

At this point, you can mount the filesystem using the standard mount command and see how much data was recovered. If you have any idea how full the original filesystem was, you will hopefully see disk usage similar to that in the recovered filesystem. The differences in disk usage between the clone of your old filesystem and the original filesystem will depend on how badly corrupted the original filesystem was and how many files and directories had to be deleted due to inconsistency during the filesystem consistency check.

Remember to check the lost+found directory at the root of the cloned drive (i.e., in the directory where you mounted it), which is where fsck and its friends place files and directories that could not be correctly linked into the recovered filesystem. For more detailed information about identifying and piecing things together from a lost+found directory, see "Piece Together Data from the lost+found" [Hack #96].
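As a quick first pass over lost+found, file(1) can tell you what the anonymous, inode-numbered entries probably were. The following is a minimal sketch; the helper function name and the mount point in the usage comment are examples, not anything fsck provides:

```shell
# Sketch: guess the types of files that fsck dropped into lost+found.
# fsck names recovered entries after their inode numbers (e.g. "#12345"),
# so file(1) is often the quickest way to see what each one was.
triage_lost_found() {
    for f in "$1"/*; do
        [ -e "$f" ] || continue     # the directory may be empty
        file "$f"
    done
}

# Usage on a recovered disk (mount point is an example):
#   triage_lost_found /mnt/recovered/lost+found
```

Entries that file reports as directories can be explored directly; for regular files, the reported type (text, JPEG, gzip, and so on) at least tells you where to start looking.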


You'll be pleasantly surprised at how much data you can successfully recover using this technique, as will your users, who will regard you as even more wizardly after a recovery effort such as this one. Between this hack and your backups (you do backups, right?), even a disk failure may not cause significant data loss.

10.7.7. See Also

  • "Recover Lost Partitions" [Hack #93]

  • "Repair and Recover ReiserFS Filesystems" [Hack #95]

  • "Piece Together Data from the lost+found" [Hack #96]

  • "Recover Deleted Files" [Hack #97]



Linux Server Hacks (Vol. 2)