Hack 95. Repair and Recover ReiserFS Filesystems

Different filesystems have different repair utilities and naming conventions for recovered files. Here's how to repair a severely damaged ReiserFS filesystem.

"Recover Data from Crashed Disks" [Hack #94] explained how to use the ddrescue utility to clone a disk or partition that you could not check the consistency of or read, and how to use the ext2/ext3 e2fsck utility to check and correct the consistency of the cloned disk or partition. This hack explains how to repair and recover severely damaged ReiserFS filesystems.

The ReiserFS filesystem was the first journaling filesystem that was widely used on Linux systems. Journaling filesystems such as ext3, JFS, ReiserFS, and XFS save pending disk updates as atomic transactions in a special on-disk log, and then asynchronously commit those updates to disk, guaranteeing filesystem consistency at any given point. Developed by a team led by Hans Reiser, ReiserFS incorporates many of the cutting-edge concepts of the time into a stable journaling filesystem that is the default filesystem type on Linux distributions such as SUSE. For more information about the ReiserFS filesystem, see its home page at http://www.namesys.com.

ReiserFS filesystems have their own utility, reiserfsck, which provides special options for repairing and recovering severely damaged ReiserFS filesystems. Like fsck, the reiserfsck utility uses a lost+found directory, located at the root of the filesystem, to store undamaged files or directories that could not be relinked into the filesystem correctly during the consistency check. However, unlike with ext2/ext3 filesystems, this directory is not created when a ReiserFS filesystem is created; it is created only when it is needed. If it has already been created by a previous reiserfsck consistency check, the existing lost+found directory is used.

10.8.1. Correcting a Damaged ReiserFS Filesystem

Though ReiserFS filesystems guarantee filesystem consistency through journaling, hardware problems can still prevent a ReiserFS filesystem from reading or correctly replaying its journal. Like inconsistencies in any Linux filesystem that is automatically mounted at boot time, this will cause your system's boot process to pause and drop you into a root shell (after you supply the root password). The following is a sample problem report from the reiserfsck application:

 reiserfs_open: the reiserfs superblock cannot be found on /dev/hda2. Failed to open the filesystem. If the partition table has not been changed, and the partition is valid and it really contains a reiserfs partition, then the superblock is corrupted and you need to run this utility with --rebuild-sb.

When you see a problem such as this, check /var/log/messages for any reports of problems on the specified partition or the disk that contains it. For example:

 Jun 17 06:48:20 64bit kernel: hdb: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
 Jun 17 06:48:20 64bit kernel: hdb: drive_cmd: error=0x04 { DriveStatusError }
 Jun 17 06:48:20 64bit kernel: ide: failed opcode was: 0xef
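
As a rough sketch of this log check, the following shell function filters a log file for error lines that mention one disk. The function name drive_errors, its two arguments, and the simple match on the word "error" are assumptions made for illustration, not part of any standard tool; adjust the pattern to whatever your kernel actually logs.

```shell
# Print kernel log lines for one disk that look like errors.
# Assumptions (not from the hack itself): the log path is passed in as $1,
# the disk name (without /dev/) as $2, and an "error" is any kernel line
# for that disk containing the word "error" in any letter case.
drive_errors() {
    log_file=$1    # for example /var/log/messages
    disk=$2        # for example hdb
    grep "kernel: ${disk}:" "$log_file" | grep -i 'error'
}
```

For example, drive_errors /var/log/messages hdb would print the two hdb entries from the sample above if they were present in that file.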

If you see drive errors such as these, clone the drive before it actually fails [Hack #94], and then attempt to correct filesystem problems on the cloned disk. If you see no disk errors, it's safe to try to resolve the problem on the original disk. Either way, you should then use the following steps to correct ReiserFS consistency problems (I'll use /dev/hda2 as an example, but you should replace this with the actual name of the partition with which you're having problems):

If the disk reported superblock problems, execute the reiserfsck --rebuild-sb partition command to rebuild the superblock. You'll be prompted for the ReiserFS version (3.6 if you are running a Linux kernel newer than 2.2.x), the block size (4096 by default, unless you specified a custom block size when you created the filesystem), the location of the journal (an internal default unless you changed it when you created the partition), and whether the problem occurred as a result of trying to resize the partition. After reiserfsck performs its internal calculations, you'll be asked whether to accept its suggestions. The answer to this should always be "yes," unless you want to try resolving the problem manually using the reiserfstune application, which would require substantial wizardry on your part. Here's an example:

 # reiserfsck --rebuild-sb /dev/hda2
 reiserfsck 3.6.18 (2003 www.namesys.com)
 [verbose messages deleted]
 Do you want to run this program?[N/Yes] (note need to type Yes if you do): Yes
 reiserfs_open: the reiserfs superblock cannot be found on /dev/hda2.
 what the version of ReiserFS do you use[1-4]
     (1) 3.6.x
     (2) >=3.5.9 (introduced in the middle of 1999) (if you use linux 2.2, choose this one)
     (3) < 3.5.9 converted to new format (don't choose if unsure)
     (4) < 3.5.9 (this is very old format, don't choose if unsure)
     (X) exit
 1
 Enter block size [4096]: 4096
 No journal device was specified. (If journal is not available, re-run with --no-journal-available option specified).
 Is journal default? (y/n)[y]: y
 Did you use resizer(y/n)[n]: n
 rebuild-sb: no uuid found, a new uuid was generated (9966c3a3-7962-4a9b-b027-7ea921e567ac)
 Reiserfs super block in block 16 on 0x302 of format 3.6 with standard journal
 Count of blocks on the device: 2048272
 Number of bitmaps: 63
 Blocksize: 4096
 Free blocks (count of blocks - used [journal, bitmaps, data, reserved] blocks): 0
 Root block: 0
 Filesystem is NOT clean
 Tree height: 0
 Hash function used to sort names: not set
 Objectid map size 0, max 972
 Journal parameters:
     Device [0x0]
     Magic [0x0]
     Size 8193 blocks (including 1 for journal header) (first block 18)
     Max transaction length 1024 blocks
     Max batch size 900 blocks
     Max commit age 30
 Blocks reserved by journal: 0
 Fs state field: 0x1: some corruptions exist.
 sb_version: 2
 inode generation number: 0
 UUID: 9966c3a3-7962-4a9b-b027-7ea921e567ac
 LABEL:
 Set flags in SB:
 Is this ok ? (y/n)[n]: y
 The fs may still be unconsistent. Run reiserfsck --check.

Try running the reiserfsck --check partition command, as suggested. If you're lucky, this will resolve the problem, in which case you can skip the rest of the steps in this list and go to the next section. However, if the partition contains additional errors, this command will fail with a message like the one shown here:

 # reiserfsck --check /dev/hda2
 reiserfsck 3.6.18 (2003 www.namesys.com)
 [verbose messages deleted]
 Do you want to run this program?[N/Yes] (note need to type Yes if you do): Yes
 ###########
 reiserfsck --check started at Sun Jun 26 21:54:58 2005
 ###########
 Replaying journal..
 Reiserfs journal '/dev/hda2' in blocks [18..8211]: 0 transactions replayed
 Checking internal tree..
 Bad root block 0. (--rebuild-tree did not complete)
 Aborted

If the reiserfsck --check partition command fails, you need to rebuild the data structures that organize the filesystem tree by using the reiserfsck --rebuild-tree partition command, as suggested. You will also want to specify the -S option, which tells reiserfsck to scan the entire disk. This forces reiserfsck to do a complete rebuild, as opposed to trying to minimize its data structure updates. The following shows an example of using this command:

 # reiserfsck --rebuild-tree -S /dev/hda2
 reiserfsck 3.6.18 (2003 www.namesys.com)
 [verbose messages deleted]
 Do you want to run this program?[N/Yes] (note need to type Yes if you do): Yes
 Replaying journal..
 Reiserfs journal '/dev/hda2' in blocks [18..8211]: 0 transactions replayed
 ###########
 reiserfsck --rebuild-tree started at Sun Jun 26 21:56:29 2005
 ###########
 Pass 0:
 ####### Pass 0 #######
 The whole partition (2048272 blocks) is to be scanned
 Skipping 8273 blocks (super block, journal, bitmaps) 2039999 blocks will be read
 100% left 0, 9230 /sec
 383 directory entries were hashed with "r5" hash.
 Selected hash ("r5") does not match to the hash set in the super block (not set).
     "r5" hash is selected
 Flushing..finished
     Read blocks (but not data blocks) 2039999
         Leaves among those 2032
         Objectids found 390
 Pass 1 (will try to insert 2032 leaves):
 ####### Pass 1 #######
 Looking for allocable blocks .. finished
 100% left 0, 225 /sec
 Flushing..finished
     2032 leaves read
         1975 inserted
         57 not inserted
     non-unique pointers in indirect items (zeroed) 444
 ####### Pass 2 #######
 Pass 2:
 100% left 0, 0 /sec
 Flushing..finished
     Leaves inserted item by item 57
 Pass 3 (semantic):
 ####### Pass 3 #########
 Flushing..finished
     Files found: 359
     Directories found: 25
     Broken (of files/symlinks/others): 2
 Pass 3a (looking for lost dir/files):
 ####### Pass 3a (lost+found pass) #########
 Looking for lost directories:
 done 1, 1 /sec
 Looking for lost files:
 Flushing..finished
     Objects without names 4
     Files linked to /lost+found 4
 Pass 4 - finished
     Deleted unreachable items 23
 Flushing..finished
 Syncing..finished
 ###########
 reiserfsck finished at Sun Jun 26 22:00:26 2005
 ###########

Pass 3a in this sample output shows that some files were linked into the filesystem's lost+found directory. See the next section of this hack for information about those files.

Once this command completes, try manually mounting the partition that you had problems with, as in the following example:
```
 # mount -t reiserfs /dev/hda2 /mnt/restore 
```

If the mount completes successfully, check the lost+found directory for recovered files (their naming conventions are explained in the next section):

 # ls -al /mnt/restore/lost+found
 total 179355
 drwx------  2 root root      144 2005-06-26 20:44 .
 drwxr-xr-x 27 root root     1176 2005-06-26 20:24 ..
 -rw-r--r--  1 root root 33745969 2005-06-26 20:24 350_355
 -rw-r--r--  1 root root 27046983 2005-06-26 20:24 350_356
 -rw-r--r--  1 root root 67049649 2005-06-26 20:24 350_357
 -rw-r--r--  1 root root 55630200 2005-06-26 20:24 350_358
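
The sequence of commands used so far, from rebuilding the superblock through inspecting lost+found, can be summarized as a dry run that only prints each command instead of executing it. This is a minimal sketch: repair_plan and its arguments are hypothetical names chosen for illustration, and the comments mark which steps this hack treats as conditional.

```shell
# Print (do not run) the repair commands from this hack, in order.
# repair_plan is a hypothetical helper; nothing here touches a disk.
repair_plan() {
    part=$1    # partition to repair, for example /dev/hda2
    mnt=$2     # existing mount point, for example /mnt/restore
    # Only needed if reiserfsck reported superblock problems.
    echo "reiserfsck --rebuild-sb $part"
    # Consistency check suggested after rebuilding the superblock.
    echo "reiserfsck --check $part"
    # Only needed if the check fails; -S scans the entire partition.
    echo "reiserfsck --rebuild-tree -S $part"
    # Mount the repaired filesystem and look for relinked files.
    echo "mount -t reiserfs $part $mnt"
    echo "ls -al $mnt/lost+found"
}
```

Running repair_plan /dev/hda2 /mnt/restore prints the five commands so that you can review them and then type, as root, only the ones that apply to your situation.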

If you experienced problems with one partition on a drive and saw disk errors in the system log (/var/log/messages), you should also check the consistency of all other data partitions on the disk using reiserfsck or the consistency checker that is appropriate for any other type of filesystem you are using. You can list the partitions on the disk and their types using the fdisk -l command, as in the following example:

 # fdisk -l /dev/hda
 Disk /dev/hda: 60.0 GB, 60022480896 bytes
 255 heads, 63 sectors/track, 7297 cylinders
 Units = cylinders of 16065 * 512 = 8225280 bytes

    Device Boot  Start    End     Blocks  Id  System
 /dev/hda1   *        1     13     104391  83  Linux
 /dev/hda2           14   1033    8193150  83  Linux
 /dev/hda3         1034   1098    522112+  82  Linux swap / Solaris
 /dev/hda4         1099   7297  49793467+   f  W95 Ext'd (LBA)
 /dev/hda5         1099   2118   8193118+  83  Linux
 /dev/hda6         2119   3138   8193118+  83  Linux
 /dev/hda7         3139   4158   8193118+  83  Linux
 /dev/hda8         4159   5178   8193118+  83  Linux
 /dev/hda9         5179   6198   8193118+  83  Linux
 /dev/hda10        6199   7218   8193118+  83  Linux
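
To pick out the data partitions worth checking from a listing like this one, a small filter can help. The sketch below assumes the older MBR-style fdisk -l layout shown above, in which the last column of a data partition is exactly "Linux"; newer fdisk versions and GPT disks print different columns, so treat linux_partitions as a hypothetical helper rather than a general parser. A type of "Linux" also says nothing about which filesystem the partition holds, so you still need to choose the right checker for each one.

```shell
# Read `fdisk -l` output on stdin and print the device names of
# partitions whose System column is exactly "Linux" (Id 83).
# Assumes the MBR-style table layout shown in this hack.
linux_partitions() {
    awk '$1 ~ /^\/dev\// && $NF == "Linux" { print $1 }'
}
```

For example, fdisk -l /dev/hda | linux_partitions would skip the swap and extended partitions in the listing above and print the remaining eight device names, one per line.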

10.8.2. Identifying Files and Directories in the ReiserFS lost+found

To explore a filesystem's lost+found directory, you must first mount the filesystem using the standard Linux mount command, which you must execute as the root user. When mounting a ReiserFS filesystem, use the mount command's -t reiserfs option to identify the filesystem type so that it is mounted appropriately. Once the filesystem is mounted, cd to the lost+found directory at the root of that filesystem, which will be located in the directory where you mounted the filesystem. If this directory contains any files or directories, you're in luck: there's more data in your filesystem than just the standard files and directories it contains!

As with the lost+found directories used by other types of Linux filesystems, the entries in a ReiserFS lost+found directory are files and directories whose parent inodes or directories were damaged and discarded during the consistency check. You will have to do a bit of detective work to find out what these are, but two factors work in your favor:

The names of the files and directories in the lost+found directory for ReiserFS filesystems are based on the ReiserFS nodes associated with the lost files or directories and their parents and are in the form NNN_NNN (parent_file/dir). Files and directories with the same numbers in the first portions of their names are usually associated with each other.
The reiserfsck program simply re-links unconnected files and directories into the lost+found directory, which preserves the creation, access, and modification timestamps associated with those files and directories.
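
Because entries that share the first number in their names are usually related, one way to start the detective work is to list them grouped by that number. The following is only a sketch: lost_found_groups is a hypothetical helper, and it assumes every recovered entry follows the NNN_NNN pattern described above, ignoring anything else in the directory.

```shell
# List entries named NNN_NNN in a lost+found directory, prefixed by the
# first number so that related entries sort next to each other.
# lost_found_groups is a hypothetical helper name.
lost_found_groups() {
    dir=$1    # for example /mnt/restore/lost+found
    for entry in "$dir"/*_*; do
        [ -e "$entry" ] || continue          # glob matched nothing
        name=${entry##*/}
        first=${name%%_*}                    # text before the first underscore
        second=${name#*_}                    # text after the first underscore
        # Keep only names where both parts are non-empty and all digits.
        case $first in ''|*[!0-9]*) continue ;; esac
        case $second in ''|*[!0-9]*) continue ;; esac
        printf '%s\t%s\n' "$first" "$name"
    done | sort -n
}
```

Running lost_found_groups /mnt/restore/lost+found prints one line per entry, with the shared first number in the first column; following up with ls -l on each group lets you compare the preserved timestamps.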

Aside from the different naming conventions used by the files in a ReiserFS lost+found directory, the process of identifying related files and directories is the same as that described in "Piece Together Data from the lost+found" [Hack #96]. See that hack for more information.

10.8.3. See Also

"Recover Lost Partitions" [Hack #93]
"Recover Data from Crashed Disks" [Hack #94]
"Recover Deleted Files" [Hack #97]