Chapter Review

Filesystems allow us to organize and manage our data in files and directories. This is not difficult to understand. What may be a little more taxing mentally are the various tricks that HFS and VxFS use to make managing our data as efficient as possible. There is little doubt that HFS was a major advance in filesystem technology when it was first published. Nowadays, some would say that HFS is "showing its age"; VxFS offers all the features of HFS and more, including tunable intent logging and mount options, as well as tunable I/O characteristics. In our discussions, we have looked at the internal structure of both HFS and VxFS. Some people comment that all this internal "stuff" is too much information and unnecessary. Personally, I think that if you understand how a filesystem works, you can make informed decisions when something goes wrong. At the beginning of the chapter, we mentioned that there are times when saying "no" to fsck is perfectly okay. Here is a simple example of how a basic understanding of structures such as inodes can help you make informed decisions about how to fix a filesystem:

The example uses an HFS filesystem for simplicity. The same situation could arise with VxFS, although it is less likely.
The corruption occurred due to an unexpected system crash.
We want to retain one of the affected files; the other file must be deleted. The only safe and supported way to delete the corrupted file is with fsck.
Figure 8-9 summarizes the problem:

Figure 8-9. Two files referencing the same data block.

Here is a listing of the actual files:

    root@hpeos003[] ll -i /data
    total 40
         5 -r--r--r--   1 root       sys            566 Feb 11 14:04 hosts
         3 drwxr-xr-x   2 root       root          8192 Feb 11 14:03 lost+found
         4 -r--r--r--   1 root       sys          10651 Feb 11 14:04 passwd
    root@hpeos003[]

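Both files reference block number 144. We can confirm this by unmounting the filesystem and using fsdb to list the root directory (inode 2) and then display inodes 4 and 5: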

    root@hpeos003[] umount /data
    root@hpeos003[] fsdb -F hfs /dev/vg00/lvol10
    file system size = 102400(frags)   isize/cyl group=48(Kbyte blocks)
    primary block size=8192(bytes)
    fragment size=1024
    no. of cyl groups = 42
    2i.fd
    d0: 2      .
    d1: 2      .  .
    d2: 3      l  o  s  t  +  f  o  u  n  d
    d3: 4      p  a  s  s  w  d
    d4: 5      h  o  s  t  s
    4i
    i#:4  md: f---r--r--r-- ln:    1 uid:    0 gid:    3 sz: 10651 ci:0
    a0 :   144   a1 :   107  a2 :     0  a3 :     0  a4 :     0  a5 :     0
    a6 :     0  a7 :     0  a8 :     0  a9 :     0  a10:     0  a11:     0
    a12:     0  a13:     0  a14:     0
    at: Wed Feb 11 14:04:16 2004
    mt: Wed Feb 11 14:04:16 2004
    ct: Wed Feb 11 14:04:16 2004
    5i
    i#:5  md: f---r--r--r-- ln:    1 uid:    0 gid:    3 sz: 566 ci:0
    a0 :   144   a1 :     0  a2 :     0  a3 :     0  a4 :     0  a5 :     0
    a6 :     0  a7 :     0  a8 :     0  a9 :     0  a10:     0  a11:     0
    a12:     0  a13:     0  a14:     0
    at: Wed Feb 11 14:04:16 2004
    mt: Wed Feb 11 14:04:16 2004
    ct: Wed Feb 11 14:04:16 2004
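To see exactly where the two inodes collide, we can work through the arithmetic on the fsdb values. The Python sketch below is a simplified model, not HFS code: it assumes that the direct addresses a0 to a11 are fragment numbers, that a full 8192-byte block occupies 8 consecutive 1024-byte fragments, and that a file's final partial block uses only as many fragments as it needs. Indirect blocks (a12 to a14) are ignored, and the helper name frags_used is hypothetical.

```python
# Simplified model of how an HFS inode's direct addresses map to fragments.
# Assumptions: 8192-byte blocks, 1024-byte fragments (as reported by fsdb),
# addresses are fragment numbers, and indirect blocks are ignored.
BLOCK_SIZE = 8192
FRAG_SIZE = 1024
FRAGS_PER_BLOCK = BLOCK_SIZE // FRAG_SIZE  # 8


def frags_used(size, addrs):
    """Return the set of fragment numbers an inode of `size` bytes occupies.

    `addrs` holds only the non-zero direct addresses (a0, a1, ...).
    """
    used = set()
    remaining = size
    for addr in addrs:
        if remaining <= 0:
            break
        if remaining >= BLOCK_SIZE:
            # A full block: FRAGS_PER_BLOCK consecutive fragments from addr.
            used.update(range(addr, addr + FRAGS_PER_BLOCK))
            remaining -= BLOCK_SIZE
        else:
            # The final partial block: only as many fragments as needed.
            nfrags = -(-remaining // FRAG_SIZE)  # ceiling division
            used.update(range(addr, addr + nfrags))
            remaining = 0
    return used


passwd = frags_used(10651, [144, 107])  # inode 4
hosts = frags_used(566, [144])          # inode 5
print(sorted(passwd))
print(sorted(hosts))
print("shared:", sorted(passwd & hosts))  # shared: [144]
```

Under these assumptions, passwd (10651 bytes) needs one full block at fragments 144 to 151 plus three fragments (107 to 109) for the remaining 2459 bytes, while hosts (566 bytes) needs a single fragment at 144. Fragment 144 is the one they share.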

Problems become evident only when we try to access the files. Because commands such as cat read only as many bytes as the file size recorded in the inode (the sz field), each file displays only that many bytes of its data:

    root@hpeos003[] cat /data/hosts
    root:*:0:3::/:/sbin/sh
    daemon:*:1:5::/:/sbin/sh
    bin:*:2:2::/usr/bin:/sbin/sh
    sys:*:3:3::/:
    adm:*:4:4::/var/adm:/sbin/sh
    uucp:*:5:3::/var/spool/uucppublic:/usr/lbin/uucp/uucico
    lp:*:9:7::/var/spool/lp:/sbin/sh
    nuucp:*:11:11::/var/spool/uucppublic:/usr/lbin/uucp/uucico
    hpdb:*:27:1:ALLBASE:/:/sbin/sh
    oracle:*:102:102:Oracle:/home/oracle:/usr/bin/sh
    nobody:*:-2:-2::/:
    www:*:30:1::/:
    smbnull:*:101:101:DO NOT USE OR DELETE - needed by Samba:/home/smbnull:/sbin/sh
    webadmin:*:40:1::/usr/obam/server/nologindir:/usr/bin/false
    oracle2:*:104:102:orroot@hpeos003[]

In this instance, we see only the first 566 bytes of the data block. This looks like data from a passwd file, which tells me that /data/hosts is the corrupt file. The file /data/passwd, on the other hand, appears to be intact:

    root@hpeos003[] cat /data/passwd
    root:*:0:3::/:/sbin/sh
    daemon:*:1:5::/:/sbin/sh
    bin:*:2:2::/usr/bin:/sbin/sh
    sys:*:3:3::/:
    adm:*:4:4::/var/adm:/sbin/sh
    uucp:*:5:3::/var/spool/uucppublic:/usr/lbin/uucp/uucico
    lp:*:9:7::/var/spool/lp:/sbin/sh
    nuucp:*:11:11::/var/spool/uucppublic:/usr/lbin/uucp/uucico
    hpdb:*:27:1:ALLBASE:/:/sbin/sh
    oracle:*:102:102:Oracle:/home/oracle:/usr/bin/sh
    nobody:*:-2:-2::/:
    www:*:30:1::/:
    smbnull:*:101:101:DO NOT USE OR DELETE - needed by Samba:/home/smbnull:/sbin/sh
    webadmin:*:40:1::/usr/obam/server/nologindir:/usr/bin/false
    oracle2:*:104:102:oracle2:/home/oracle2:/usr/sbin/sh
    ...
    fred:*:2000:30:Fred Flinstone:/home/fred:/usr/sbin/sh
    barney:*:2001:40:Barney Rubble:/home/barney:/usr/sbin/sh
    root@hpeos003[]
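Both listings begin with the same passwd entries because both inodes point at the same data; they differ only in how many bytes each inode says the file contains. A minimal sketch of that behavior follows. It assumes a single shared block whose contents are a made-up stand-in for the real passwd data, and read_file and the inode dictionaries are hypothetical illustrations, not a real filesystem API.

```python
# Hypothetical stand-in for the shared data: both inodes point at this block.
shared_block = b"root:*:0:3::/:/sbin/sh\ndaemon:*:1:5::/:/sbin/sh\n"

# Each toy inode records its own size (the sz field) but the same block.
inodes = {
    4: {"name": "passwd", "size": len(shared_block), "block": shared_block},
    5: {"name": "hosts", "size": 30, "block": shared_block},
}


def read_file(inode):
    """Like cat, return only the first `size` bytes of the file's data."""
    return inode["block"][: inode["size"]]


for ino, inode in sorted(inodes.items()):
    print(ino, inode["name"], read_file(inode))
```

Because the shorter file's size cuts off partway through a line, its data does not end in a newline. That is why the shell prompt appears immediately after "oracle2:*:104:102:or" in the cat /data/hosts output.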

If we were to manage these files with standard commands such as rm, we would eventually cause a system PANIC with the PANIC string "freeing free frag". For example, if we delete /data/passwd, the filesystem releases block 144 to its free list. If we then delete /data/hosts with rm, the filesystem tries to free block 144 a second time. Because block 144 is already on the free list, the filesystem is confused, and the result is a system PANIC.
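The sequence of events can be modeled with a toy free list. The sketch below is hypothetical: ToyFilesystem, its rm method, and the Panic exception are illustrative names, not kernel interfaces, and each block number is treated as a single unit. It shows only why freeing the same block twice is detected as an inconsistency.

```python
class Panic(Exception):
    """Stands in for a kernel PANIC in this toy model."""


class ToyFilesystem:
    def __init__(self, inode_blocks, free_blocks):
        # inode number -> set of block numbers that the inode claims
        self.inode_blocks = {ino: set(blks) for ino, blks in inode_blocks.items()}
        self.free = set(free_blocks)

    def rm(self, ino):
        """Remove a file: return each of its blocks to the free list."""
        for blk in sorted(self.inode_blocks.pop(ino)):
            if blk in self.free:
                # The block is already free, so the free list is inconsistent.
                raise Panic("freeing free frag")
            self.free.add(blk)


fs = ToyFilesystem({4: {144, 107}, 5: {144}}, free_blocks=set())
fs.rm(4)      # passwd: blocks 107 and 144 go onto the free list
try:
    fs.rm(5)  # hosts: block 144 is freed a second time
except Panic as err:
    print("PANIC:", err)  # PANIC: freeing free frag
```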

We will now run fsck. I will answer "no" to questions about deleting the file /data/passwd and "yes" to questions about deleting the file /data/hosts. Here goes:

    root@hpeos003[] umount /data
    root@hpeos003[] fsck -F hfs /dev/vg00/rlvol10
    ** /dev/vg00/rlvol10
    ** Last Mounted on /data
    ** Phase 1 - Check Blocks and Sizes
    144 DUP I=5
    ** Phase 1b - Rescan For More DUPS
    144 DUP I=4
    ** Phase 2 - Check Pathnames
    DUP/BAD  I=4   OWNER=root MODE=100444
    SIZE=10651 MTIME=Feb 11 14:04 2004
    FILE=/passwd
    REMOVE? n
    DUP/BAD  I=5   OWNER=root MODE=100444
    SIZE=566 MTIME=Feb 11 14:04 2004
    FILE=/hosts
    REMOVE? y
    ** Phase 3 - Check Connectivity
    ** Phase 4 - Check Reference Counts
    BAD/DUP FILE I=5   OWNER=root MODE=100444
    SIZE=566 MTIME=Feb 11 14:04 2004
    CLEAR? y
    FREE INODE COUNT WRONG IN SUPERBLK
    FIX? y
    ** Phase 5 - Check Cyl groups
    1 BLK(S) MISSING
    BAD CYLINDER GROUPS
    FIX? y
    ** Phase 6 - Salvage Cylinder Groups
    3 files, 0 icont, 20 used, 99649 free (9 frags, 12455 blocks)
    ***** FILE SYSTEM WAS MODIFIED *****
    root@hpeos003[] fsck -F hfs /dev/vg00/rlvol10
    ** /dev/vg00/rlvol10
    ** Last Mounted on /data
    ** Phase 1 - Check Blocks and Sizes
    ** Phase 2 - Check Pathnames
    ** Phase 3 - Check Connectivity
    ** Phase 4 - Check Reference Counts
    ** Phase 5 - Check Cyl groups
    3 files, 0 icont, 20 used, 99649 free (9 frags, 12455 blocks)
    root@hpeos003[]
    root@hpeos003[] mount /dev/vg00/lvol10 /data
    root@hpeos003[] ll -i /data
    total 38
         3 drwxr-xr-x   2 root       root          8192 Feb 11 14:03 lost+found
         4 -r--r--r--   1 root       sys          10651 Feb 11 14:04 passwd
    root@hpeos003[] ll -i /data/lost+found
    total 0
    root@hpeos003[]
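The Phase 1 and Phase 1b messages in the transcript reflect a simple idea: claim each block the first time an inode references it and flag any later reference as a duplicate, then rescan to find the earlier inode that also holds that duplicate. The following Python sketch models this under the assumption that inodes are scanned in numerical order, and that passwd occupies fragments 144 to 151 and 107 to 109 while hosts occupies fragment 144. It is not the HP-UX fsck implementation; phase1 and phase1b are hypothetical names.

```python
def phase1(inodes):
    """First pass: claim each block; flag any block that is already claimed."""
    owner = {}        # block -> first inode seen referencing it
    dups = set()      # blocks referenced more than once
    flagged = set()   # (block, inode) pairs already reported
    report = []
    for ino in sorted(inodes):
        for blk in inodes[ino]:
            if blk in owner:
                dups.add(blk)
                flagged.add((blk, ino))
                report.append(f"{blk} DUP I={ino}")
            else:
                owner[blk] = ino
    return dups, flagged, report


def phase1b(inodes, dups, flagged):
    """Rescan: report earlier references to duplicate blocks not yet flagged."""
    report = []
    for ino in sorted(inodes):
        for blk in inodes[ino]:
            if blk in dups and (blk, ino) not in flagged:
                report.append(f"{blk} DUP I={ino}")
    return report


# Assumed fragment lists for inode 4 (passwd) and inode 5 (hosts).
inodes = {4: list(range(144, 152)) + [107, 108, 109], 5: [144]}
dups, flagged, first = phase1(inodes)
print(first)                           # ['144 DUP I=5']
print(phase1b(inodes, dups, flagged))  # ['144 DUP I=4']
```

The sketch reports 144 DUP I=5 in the first pass and 144 DUP I=4 in the rescan, the same order as the transcript. As a side check, the summary line is self-consistent: with 8 fragments per block, 12455 × 8 + 9 = 99649 free fragments.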

We have now fixed the filesystem safely using fsck. The passwd file has been retained, and the corrupt hosts file has been safely deleted; it can be restored from a recent backup. The hosts file does not appear in the lost+found directory, because lost+found is used only for orphaned files, i.e., inodes that have no corresponding directory entry. Without an understanding of the basic operation of the filesystem, we would not be in a position to make informed decisions when answering the questions posed by fsck.
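The rule for lost+found can be expressed as a set difference: an inode is orphaned when it is allocated but no directory entry refers to it. A minimal sketch, assuming a hypothetical in-memory representation in which each directory inode maps names to inode numbers; find_orphans is an illustrative name, not an fsck routine.

```python
def find_orphans(allocated, directories):
    """Return allocated inodes that no directory entry refers to."""
    referenced = set()
    for entries in directories.values():
        referenced.update(entries.values())
    return sorted(allocated - referenced)


# State of /data after fsck, in a hypothetical in-memory form:
# directory inode -> {name: inode number}
directories = {
    2: {".": 2, "..": 2, "lost+found": 3, "passwd": 4},
    3: {".": 3, "..": 2},
}
print(find_orphans({2, 3, 4}, directories))     # [] -> nothing for lost+found
print(find_orphans({2, 3, 4, 6}, directories))  # [6] -> a lost+found candidate
```

Because fsck removed the directory entry for hosts and also cleared inode 5, inode 5 is neither allocated nor referenced, so there is nothing to place in lost+found. The inode 6 case is purely hypothetical.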

IMPORTANT

Diagnosing which file is corrupt and which should be retained is a complex task. In this trivial example, we can determine which file to retain because both files are text files and we know the expected format of each. In most cases, this may not be possible, especially if the files concerned are binary (non-text) files. In such a case, it is advisable to delete both files and restore them from a recent backup.
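For text files, "knowing the expected format" can be made concrete with a quick plausibility check. The sketch below is only a heuristic under simple assumptions: a passwd line has seven colon-separated fields, and a non-comment hosts line starts with an IP address. The function names are hypothetical, and the sample data is abbreviated from the listings above.

```python
import ipaddress


def looks_like_passwd(text):
    """Heuristic: every line has seven colon-separated fields (six colons)."""
    lines = text.splitlines()
    return bool(lines) and all(line.count(":") == 6 for line in lines)


def looks_like_hosts(text):
    """Heuristic: every non-blank, non-comment line starts with an IP address."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        try:
            ipaddress.ip_address(line.split()[0])
        except ValueError:
            return False
    return True


# Abbreviated sample of what cat /data/hosts returned.
hosts_data = "root:*:0:3::/:/sbin/sh\ndaemon:*:1:5::/:/sbin/sh\noracle2:*:104:102:or"
print(looks_like_hosts(hosts_data))   # False: these are passwd entries
print(looks_like_passwd(hosts_data))  # False: the last line is truncated
```

The data in /data/hosts fails both checks, which agrees with the conclusion we reached by eye. A binary file offers no such easy test, which is why restoring from backup is the safer course there.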

