8.1. I Can't Boot Because the Partition Is Corrupt
There are a number of reasons why partitions
become corrupt. You may have lost power. Minor electrical surges
can affect what is written to a drive. As hard drives
wear out, bad
blocks can corrupt your data.
Yes, hard drive specifications suggest that the
mean time between failures is several hundred thousand hours, which
corresponds to several decades. But that's just an average, under
ideal conditions. If all hard drives were that reliable, RAID would
not be quite so popular.
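As a quick check of that arithmetic, the following sketch converts a mean time between failures into years. The figure of 300,000 hours is an assumed example value, not one taken from any particular drive's specification.

```shell
#!/bin/sh
# Convert an MTBF rating in hours into whole years of continuous operation.
# mtbf_hours is an assumed example value.
mtbf_hours=300000
hours_per_year=$(( 24 * 365 ))
mtbf_years=$(( mtbf_hours / hours_per_year ))
echo "$mtbf_years"   # prints 34
```

Thirty-four years of round-the-clock operation is a statistical average across many drives, not a promise for any one of them.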
If your hard drive is failing, you may not be
able to fix the problem. The best that you can do is minimize the
corruption until you can create a backup. We'll show you how to
back up data from a failing hard drive in the next annoyance.
One reason for the popularity of the Reiser
filesystem is its sensitivity to hard drive corruption. If you find
corruption on your
-formatted filesystems, you'll
probably have a bit more time to save your data.
In this chapter, we'll describe two categories
of filesystem corruption. The first, whose symptoms are described
in the following annoyance, occurs when a hard drive wears out. The
second is the temporary
glitch that you can recover from while preserving
the data on your disk. The temporary glitch is most
commonly associated with a power failure. For example, once when I
tripped over a cord, I lost power on my desktop computer. The next
time I booted that computer, I saw the following message:
*** An error occurred during the filesystem check.
*** Dropping you to a shell; the system will reboot
*** when you leave the shell.
Give root password for maintenance
This problem is most commonly associated with
filesystems that do not include a journal, such as
ext2. Whenever there's corruption, there's a
risk that Linux won't be able to find some of your files.
Journaling filesystems keep a log of pending changes, which helps
restore a consistent state after a crash.
But journaling is not a guarantee. I've had this error even on a
journaled filesystem.
Checks with fsck
Whenever there is corruption, the first Linux
command you should use is fsck.
Ideally, you can apply this command alone to a specific, unmounted
partition. For example, running fsck cleaned one of my partitions,
with the following output:
fsck 1.35 (28-Feb-2004)
e2fsck 1.35 (28-Feb-2004)
/: recovering journal
Cleaning orphaned inode 16915 (uid=1000, gid=0, mode=0140600, size=0)
Cleaning orphaned inode 16914 (uid=1000, gid=0, mode=0140600, size=0)
Cleaning orphaned inode 16909 (uid=1000, gid=0, mode=0140600, size=0)
Cleaning orphaned inode 302828 (uid=0, gid=0, mode=020600, size=0)
/: clean, 165245/525888 files, 694569/1050241 blocks
Do not run fsck
on a mounted partition. If you can't
unmount the desired partition,
run fsck from a rescue CD.
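Before checking a partition, it is worth confirming that it really is unmounted. The following is a minimal sketch, assuming the kernel's mount table is readable at /proc/mounts and that the partition appears there under the same device name you pass in. The function name is_mounted is hypothetical.

```shell
#!/bin/sh
# Succeed (exit 0) if the device in $1 appears in the mount table.
# $2 is the mount table to read; it defaults to /proc/mounts.
# is_mounted is a hypothetical helper name.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Example use (as root):
#   if is_mounted /dev/hda7; then
#       echo "/dev/hda7 is mounted; unmount it first" >&2
#   else
#       fsck /dev/hda7
#   fi
```

A device mounted under another name, such as a symbolic link or a label, would not be caught by this simple comparison.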
On most Linux systems, fsck
works on a variety of filesystem formats.
You should find a variety of related commands, such as:
/sbin/fsck /sbin/fsck.ext3 /sbin/fsck.msdos /sbin/fsck.xfs
/sbin/fsck.cramfs /sbin/fsck.jfs /sbin/fsck.reiserfs
The fsck command is a
frontend for all the filesystem-specific commands on your system.
The proper utility is
selected based on the type of the filesystem you
run it on.
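The naming pattern in the listing above (fsck followed by a dot and the filesystem type) can be sketched as follows. This only illustrates the convention; it is not how fsck itself is implemented, and the function name fsck_helper_for is hypothetical.

```shell
#!/bin/sh
# Build the path of the filesystem-specific checker for a given type.
# $1 = filesystem type, $2 = directory holding the helpers (default /sbin).
# fsck_helper_for is a hypothetical helper name.
fsck_helper_for() {
    echo "${2:-/sbin}/fsck.$1"
}

# Report which of a few helpers are installed on this system.
for fstype in ext3 reiserfs xfs; do
    helper=$(fsck_helper_for "$fstype")
    if [ -x "$helper" ]; then
        echo "$fstype: $helper is installed"
    else
        echo "$fstype: no $helper on this system"
    fi
done
```

If a helper is missing, the tools for that filesystem are probably not installed.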
If your system still has bad blocks, it may be
the first sign of an
impending failure. Hard drives can include
hundreds of thousands of blocks. If one goes bad, that may not be
the end of the world. But it may be a symptom of other problems.
Many Linux gurus believe that this is the time to get a new hard
drive.
If you're still not sure, the badblocks
command can help you determine if
your hard drive is in trouble. For example, the following command
writes the ID number associated with each bad block to the
blockbad file:
badblocks -v /dev/hda7 -o blockbad
Checking for bad blocks (read-only test): 697008/ 1050241
Make sure the target partition is unmounted
before running the badblocks command. If it finds no bad blocks, the
fsck command probably fixed any errors on that
filesystem, and you can continue using Linux normally. The
following output is evidence that the repair was completely
successful:
0 bad blocks
When bad blocks
remain, you should rerun fsck
with more severe options,
described in the next section.
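The file written with the -o switch holds one block number per line, so counting its lines tells you how many bad blocks were found. The following is a minimal sketch; the function name count_bad_blocks is hypothetical, and blockbad is the output file from the example above.

```shell
#!/bin/sh
# Print the number of bad blocks recorded in a file written by "badblocks -o".
# count_bad_blocks is a hypothetical helper name.
count_bad_blocks() {
    if [ ! -f "$1" ]; then
        echo "no such file: $1" >&2
        return 1
    fi
    # Count non-empty lines; "n + 0" prints 0 for an empty file.
    awk 'NF { n++ } END { print n + 0 }' "$1"
}

# Report on the file from the earlier example, if it exists.
if [ -f blockbad ]; then
    echo "bad blocks recorded: $(count_bad_blocks blockbad)"
fi
```

A count of zero matches the "0 bad blocks" result; anything higher is a reason to rerun the check with the more severe options.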
If you need to keep the hard drive working until
a new one arrives, back it up as soon as possible. We show you how
to do this with a partially corrupt partition in the next
annoyance. But until that new hard drive arrives, there are things
you can do to keep your current hard drive going.
8.1.4. Fixing Bad Blocks
The fsck
command can help you check, mark, and fix bad blocks, and can help
preserve the health of your filesystems. For that reason, current
distributions force a periodic fsck
on each filesystem formatted in the
ext2 and ext3 formats. You can do your own
maintenance with the switches shown in
Table 8-1; some of these switches are not documented on the
fsck man page.
Table 8-1. fsck command switches
Specifies a different superblock, which you can
find on ext2/ext3 systems with the
command with the existing superblock
Salvages unused chains to files
Sets verbose mode
Specifies a default answer of "yes"; otherwise, fsck interactively asks if you want to mark bad blocks
SUSE formats its partitions by default as
ReiserFS filesystems, and it doesn't force a periodic fsck
on such partitions.
For example, the following command marks the bad
blocks on your system. If you're fortunate, each
"pass" of your partition proceeds without
incident. The following is sample output from a run on a good
partition:
fsck -cyfv /dev/hda5
fsck 1.35 (28-Feb-2004)
e2fsck 1.35 (28-Feb-2004)
Checking for bad blocks (read-only test): done
Pass 1: Checking inodes, blocks and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
. . .
However, I had problems with a different
partition. In the middle of this process, the test seemed to stop.
I was tempted to interrupt the command by pressing Ctrl-C, but
the test resumed after a few minutes. As you can see here, the
test turned up problems:
Duplicate blocks found.... invoking duplicate block passes
Pass 1B: Rescan for duplicate/bad blocks
Duplicate/bad block(s) in inode 1448: 13568
Pass 1C Scan directories for inodes with dup blocks.
Error reading block 697043 (Attempt to read block from filesystem resulted
in a short read). Ignore error?
Pass 1D: Reconciling duplicate blocks
(There are 4 inodes containing duplicate/bad blocks)
File <The journal inode> (inode #8, mod time Fri Nov 12 08:43:05 2005)
has 10 duplicate block(s), shared with 1 file(s):
<The bad blocks inode> (inode #1, mod time Fri Jan 7 12:11:24 2006)
Clone duplicate/bad blocks?
Error reading block 4049 (Attempt to read block from filesystem resulted in short read).
The check continued, turning up more
errors. But the most important error is near the beginning of the
output. As you can see, there is corruption even in the journal. Any
pointers from the journal to other files are thus suspect.
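When a run scrolls by this quickly, the exit status of fsck gives a compact summary of what happened. The fsck man page documents the status as a sum of bit flags; the sketch below decodes only the four lowest ones. The function name describe_fsck_status is hypothetical.

```shell
#!/bin/sh
# Decode the exit status of fsck, a sum of bit flags as listed in the
# fsck(8) man page. Only the four lowest flags are handled here.
# describe_fsck_status is a hypothetical helper name.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    if [ $(( status & 1 )) -ne 0 ]; then echo "errors corrected"; fi
    if [ $(( status & 2 )) -ne 0 ]; then echo "system should be rebooted"; fi
    if [ $(( status & 4 )) -ne 0 ]; then echo "errors left uncorrected"; fi
    if [ $(( status & 8 )) -ne 0 ]; then echo "operational error"; fi
    return 0
}

# Example use (as root, on an unmounted partition):
#   fsck -cyfv /dev/hda5; describe_fsck_status $?
```

A status that includes "errors left uncorrected" is a strong hint that the drive should be backed up and replaced.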
After your bad blocks are
marked, Linux will
avoid reading data from those locations. The time is right for a
backup. If standard techniques described in Chapter 2 don't work,
see the next annoyance.