10.2 Managing Filesystems

This section covers such topics as mounting and dismounting local and remote filesystems, the filesystem configuration file, and checking local filesystem integrity with the fsck utility: in other words, the nitty gritty details of managing filesystems.

10.2.1 Mounting and Dismounting Filesystems

Mounting is the process that makes a filesystem's contents available to the system, merging it into the system directory tree. A filesystem can be mounted or dismounted: that is, it can be connected to or disconnected from the overall Unix filesystem. The only exception is the root filesystem, which is always mounted on the root directory while the system is up and cannot be dismounted.

Thus, in contrast to some other operating systems, mounting a Unix filesystem does more than merely make its data available. Figure 10-1 illustrates the relationship between a system's disk partitions (and their corresponding special files) and its overall filesystem. On this system, the root filesystem (the filesystem stored on the first partition of the root disk, disk 0) contains the standard Unix subdirectories /bin, /etc, and so on. It also contains the empty directories /home, /var, and /chem, which serve as mount points for other filesystems. This filesystem is accessed via the special file /dev/dsk/c1d0s0.

Figure 10-1. Mounting disk partitions within the Unix filesystem

The figure also shows several other filesystems. One of them, accessed via the special file /dev/dsk/c1d0s8 (partition 8 of the root disk), contains the files and directories under /var. A third filesystem (partition 9 on disk 1) is accessed via the special file /dev/dsk/c1d1s9 and contains users' home directories, located under /home.

Another filesystem on this system is stored on partition 2 of disk 1 and is accessed via the special file /dev/dsk/c1d1s2. Its own root directory contains the subdirectories ./organic and ./inorganic and their contents. We'll call this the /chem filesystem, after its mount point within the system's directory tree. When /dev/dsk/c1d1s2 is mounted, these directories will become subdirectories of /chem.

One of the directories in the /chem filesystem, ./inorganic, is empty and is to be used as the mount point for yet another filesystem. The files in this fifth filesystem, on partition 2 on disk 2 and corresponding to the special file /dev/dsk/c1d2s2, become a subtree of the /chem filesystem when mounted.

The files in the root directory and its system subdirectories all come from disk 0, as do the empty directories /chem, /home, and /var before filesystems are mounted on them. Figure 10-1 illustrates the fact that the contents of the /chem directory tree come from two different physical disks.

In most cases, there is no necessary connection between a given filesystem and a particular disk partition (and its associated special file), for example, between the /chem filesystem and the special file /dev/dsk/c1d1s2. The collection of files on a disk partition can be mounted on any directory in the filesystem. After it is mounted, its top-level directory is accessed via the directory path where it is mounted, and it is often referred to by that directory's name.

At the same time, the root directory of the mounted filesystem replaces the directory where the filesystem is mounted. As a side effect, any files that were originally in the mount directory (in this example, any files that might have been in /chem prior to mounting the new filesystem) disappear when the new filesystem is mounted and thus cannot be accessed; they will reappear once the filesystem is dismounted.

To illustrate this phenomenon, let's watch a filesystem being mounted:

# ls -saC /chem                                /chem's contents before mount.
total 20
 4 .             4 ..            12 README
# mount /dev/dsk/c1d1s2 /chem                  Mount partition 2 on disk 1.
# ls -saC /chem                                /chem's contents after mount.
total 48
 4 .             4 ..             4 inorganic      32 lost+found    4 organic
# du -s /chem                                  /chem is much bigger.
587432 /chem

Before the filesystem is mounted, there is just one ordinary file in /chem: README. After /dev/dsk/c1d1s2 is mounted, README disappears. It's still on the root disk, but it can't be accessed while the /chem filesystem is mounted. However, it will reappear when the filesystem is dismounted. After the filesystem is mounted, the subdirectories organic and inorganic appear, along with their contents (reflected in the larger amount of data under /chem).
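One way to observe the boundary that mounting creates is to compare device numbers: a directory with a filesystem mounted on it reports a different device than its parent directory does. The following is a minimal Python sketch of that check, not something taken from this chapter; the function name is_mount_point is hypothetical, and the standard library's os.path.ismount performs a similar test. It cannot detect a mount whose source and target share one device.

```python
import os


def is_mount_point(path):
    """Return True if path looks like the root of a mounted filesystem."""
    path = os.path.realpath(path)      # resolve symbolic links first
    parent = os.path.dirname(path)     # the parent of "/" is "/" itself
    here = os.stat(path)
    above = os.stat(parent)
    # A different device number means a filesystem boundary was crossed.
    # On the same device, identical inode numbers mean the path is its own
    # parent, which happens only at the root directory.
    return here.st_dev != above.st_dev or here.st_ino == above.st_ino


if __name__ == "__main__":
    for directory in ("/", "/chem"):
        if os.path.isdir(directory):
            print(directory, is_mount_point(directory))
```

Run before and after the mount command shown above, such a check on /chem would be expected to change from False to True.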

On most Unix systems, a filesystem can only be mounted in one place at one time (Linux is an exception).

10.2.2 Disk Special File Naming Conventions

We looked at disk special filenames in detail in Section 2.3. The following list reviews the disk special file naming conventions for a SCSI disk under the various operating systems we are considering by listing the special file used for a partition on the third SCSI disk (SCSI ID 4) on the first SCSI controller (accessed in raw mode):^[10]

^[10] Under FreeBSD 4, the block and raw devices are equivalent. Character devices are vestigial in Version 4 and are slated to be removed in FreeBSD Version 5.

AIX	/dev/hdisk2 (refers to the entire disk)
FreeBSD	/dev/da0s1e (short form: /dev/da1c)
HP-UX	/dev/rdsk/c0t4d0
Linux	/dev/sdc1
Solaris	/dev/rdsk/c0t4d0s7
Tru64	/dev/rdisk/dsk2c

10.2.3 The mount and umount Commands

To mount a filesystem manually, use the mount command as follows:

# mount  [-o  options ] block-special-file    mount-point

This command mounts the filesystem located on the specified disk partition. The root directory on this filesystem will be attached at mount-point within the overall Unix filesystem. This directory must already exist before the mount command is executed.

For example, the commands:

# mkdir /users2
# mount /dev/dsk/c1t4d0s7 /users2

create the directory /users2 and mount the filesystem located on the disk partition /dev/dsk/c1t4d0s7 on it. On some systems, mount's -r option may be used to mount a filesystem read-only. For example:

# mount -r /dev/dsk/c1t4d0s7 /mnt

Use mount without options to display a list of currently mounted filesystems.

The mount command can also be used to mount remote filesystems via NFS. We'll consider this use later in this chapter.

The umount command may be used to dismount filesystems:

# umount  name

This command dismounts the filesystem specified by name, where name is either the name of the filesystem's block special file or the name of the mount point where this filesystem is mounted. The -f option may be used to force a dismount operation in some cases (e.g., when there are open files), but it should be used with caution.

This section has illustrated only the simplest uses of mount and umount. We'll look at many more examples in the course of this chapter.

10.2.4 Figuring Out Who's Using a File

Filesystems must be inactive before they can be dismounted. If any user has one of a filesystem's directories as her current directory or has any file within the filesystem open, you'll get an error message something like this one if you try to unmount that filesystem:

umount: /dev/hdb1: device is busy

The fuser command may be used to determine which files within a filesystem are currently in use and to identify the processes and users that are using them. If fuser is given a filename as its argument, it reports on that file alone. If it is given a disk special filename as its argument, it reports on all files within the corresponding filesystem. The -u option tells fuser to display user IDs as well as PIDs in its output.

For example, the following command displays all processes and their associated users that are using files on the specified disk on an HP-UX system:

$ fuser -u /dev/dsk/c1t1d0

Under Linux, including the -m option will allow you to specify the filesystem by name; the -c option performs the same function under Solaris.

Here is an example of fuser's output:

/chem: 3119c(chavez) 3229(chavez)  3532(harvey)  3233e(wang)

Four processes are using the /chem filesystem at this moment. Users chavez and harvey have open files, indicated by the second and third process IDs, which appear without a final code letter. User chavez also has her current working directory within this filesystem (indicated by the c code after the first PID), and user wang is running a program whose executable resides within the filesystem (indicated by the e code after the final PID).
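The reading of the sample output above can be expressed as a short parser. This is a minimal Python sketch, assuming the output has exactly the combined form shown (PID, optional code letters, user name in parentheses); the function name parse_fuser_line is hypothetical, and only the two code letters discussed here are decoded.

```python
import re

# One entry looks like "3119c(chavez)": PID, optional code letters, user name.
ENTRY = re.compile(r"(\d+)([a-z]*)\(([^)]+)\)")

# Only the two code letters discussed in the text are decoded here.
CODE_MEANINGS = {"c": "current directory", "e": "executable"}


def parse_fuser_line(line):
    """Split one line of fuser -u output into (target, list of usage records)."""
    target, _, rest = line.partition(":")
    records = []
    for pid, codes, user in ENTRY.findall(rest):
        if codes:
            usage = [CODE_MEANINGS.get(c, "code " + c) for c in codes]
        else:
            usage = ["open file"]   # no code letter: the process has a file open
        records.append({"pid": int(pid), "user": user, "usage": usage})
    return target.strip(), records


if __name__ == "__main__":
    sample = "/chem: 3119c(chavez) 3229(chavez)  3532(harvey)  3233e(wang)"
    name, users = parse_fuser_line(sample)
    for record in users:
        print(name, record["pid"], record["user"], ", ".join(record["usage"]))
```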

fuser's -k option may be used to kill all of the processes using the specified file or filesystem.

The lsof command performs a similar function on FreeBSD systems (and is available for the other operating systems as well). Its output is a great deal more detailed. Here is a small part of its output (shortened to fit):

COMMAND PID   USER     FD    TYPE     DEVICE  NAME
vi      74808 aefrisch cwd   VDIR 116,131072  /usr/home/aefrisch
vi      74808 aefrisch rtd   VDIR 116,131072  /
vi      74808 aefrisch txt   VREG 116,131072  /usr/bin/vi
vi      74808 aefrisch txt   VREG 116,131072  /usr/libexec/ld-elf.so.1
vi      74808 aefrisch txt   VREG 116,131072  /usr/lib/libncurses.so.5
vi      74808 aefrisch txt   VREG 116,131072  /usr/lib/libc.so.4
vi      74808 aefrisch   0   VCHR        0,0  /dev/ttyp0
vi      74808 aefrisch   1   VCHR        0,0  /dev/ttyp0
vi      74808 aefrisch   2   VCHR        0,0  /dev/ttyp0
vi      74808 aefrisch   3-W VREG 116,131072  /usr/home/aefrisch/.login
vi      74808 aefrisch   4   VREG 116,131072  /var/tmp/vi.recover/vi.CJ6cay
vi      74808 aefrisch   5   VREG 116,131072  / (/dev/ad0s1a)

These are the entries generated by a vi process editing this user's .login file. Note that this file is opened for writing, indicated by the W following the file descriptor number (column FD).

FreeBSD also provides the fstat command, which performs a similar function.

10.2.5 The Filesystem Configuration File

Mounting filesystems by hand every time they are needed would quickly become tedious, so the required mount commands are generally executed automatically at boot time. The filesystem configuration file typically contains information about all of the system's filesystems, for use by mount and other commands.^[11]

^[11] This section covers only local disks. We'll look at entries for remote disks later in this chapter.

/etc/fstab is the standard Unix filesystem configuration file. It generally has the following format:

special-file  mount-dir  fs-type  options  dump-freq  fsck-pass

The fields have the following meanings:

special-file

The name of the special file on which the filesystem resides. This must be a block device name.

mount-dir

The directory on which to mount the filesystem. If the partition will be used for swapping, / is sometimes used for this field.

fs-type

The filesystem type. The value for local filesystems is highly version-dependent. Common type values are nfs for volumes mounted remotely via NFS, swap or sw for swap partitions (although Tru64 uses UFS for these as well, and HP-UX also has the swapfs type for paging to a file within the filesystem), and ignore, which tells mount to ignore the line. Available filesystem types for the various Unix versions are listed later in this chapter.

options

This field consists of one or more options, separated by commas. The fs-type field, above, determines which options are allowed for any given kind of filesystem. For ignore type entries, this field is ignored.

Options are written without intervening spaces. On many systems, the keyword defaults may be placed into this field if no options are needed. Table 10-3 lists commonly used options for local filesystems and paging/swap spaces.

dump-freq

A decimal number indicating the frequency with which this filesystem should be backed up by the dump utility. A value of 1 means backup should occur every day, 2 means every other day, and so on. A value of 0 means that the device is not to be backed up (for example, swap devices). Not all systems actually use this field.

fsck-pass

A decimal number indicating the order in which fsck should check the filesystems. A value of 1 indicates that the filesystem should be checked first, 2 indicates that the filesystem should be checked second, and so on. The root and/or boot filesystems generally have the value 1. All other filesystems generally have higher pass numbers. For optimal performance, two filesystems that are on the same disk drive should have different pass numbers; however, filesystems on different drives may have the same pass number, letting fsck check the two filesystems in parallel. fsck will usually be fastest if all filesystems checked on the same pass are roughly the same size. This field should be 0 for swap devices (0 disables checking by fsck).

Table 10-3. Commonly used filesystem options
Option	Meaning
`rw`	Read-write filesystem (default for read-write devices).
`ro`	Read-only filesystem (default for read-only media such as CDs).
`nosuid`	The SetUID access mode is ignored within this filesystem; `suid` is the default.
`noauto`	Don't automatically mount this filesystem at boot time; `auto` is the default (Linux, FreeBSD).
`noexec`	Prevent binary programs from executing; `exec` is the default (Linux, FreeBSD, Tru64).
`nodev`	Prevent device access via special files (AIX, Linux, FreeBSD, Tru64).
`user`	Allow ordinary users to mount this filesystem (Linux).
`nogrpid`	Use System V-style group ownership inheritance for new files (i.e., the owner's primary group); BSD-style is the default (Linux, Tru64).
`resuid=n` `resgid=n`	Set the UID/GID that has access to the reserved blocks within the filesystem (Linux ext2/ext3).
`largefiles`	Support files larger than 2 GB (HP-UX VxFS, Solaris).
`logging`	Maintain a transaction log (Solaris). The default is `nologging`.
`delaylog`	Delay writing log entries slightly to improve performance, increasing risk of loss slightly (HP-UX VxFS).
`writeback`	Write out log metadata and filesystem blocks in either order, for a slight performance improvement and increased risk of loss in the event of a crash (Linux ext3).
`nolog`	Don't use a transaction log (HP-UX VxFS).
`nologging`	Don't use a transaction log (Solaris).
`forcedirectio`	Use direct I/O to this filesystem: i.e., no buffering (Solaris). Useful for certain applications such as databases.
`notail`	Disable default behavior of storing small files directly within the hash tree (Linux ReiserFS).
`resize=n`	Resize the filesystem to n blocks on mounting (Linux ReiserFS).
`rq`	Mount read-write and enable disk quotas (Tru64).
`quota`	Enable disk quotas (HP-UX, Solaris).
`userquota` `groupquota`	Enable user/group disk quotas (FreeBSD).
`usrquota` `grpquota`	Enable user/group disk quotas (Linux).
`pri=n`	Set swap space priority (0 to 32767). Under Linux, higher numbers indicate more favored areas, which are used first; HP-UX favors lower priority areas.
`xx`	Ignore this entry (FreeBSD).
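The comma-separated form of the options field lends itself to a simple check. The following is a minimal Python sketch, not a reimplementation of how mount itself reads the field; the function name parse_mount_options is hypothetical. Flags such as ro become True, and keyword settings such as pri=42 keep their value as a string.

```python
def parse_mount_options(field):
    """Turn an fstab options field such as "rw,pri=42" into a dictionary."""
    if any(ch.isspace() for ch in field):
        raise ValueError("options must be comma-separated without spaces")
    options = {}
    for item in field.split(","):
        if not item:
            raise ValueError("empty option in %r" % field)
        name, sep, value = item.partition("=")
        # Flags such as "ro" map to True; "pri=42" maps to the string "42".
        options[name] = value if sep else True
    return options


if __name__ == "__main__":
    print(parse_mount_options("ro,noauto,user"))
    print(parse_mount_options("pri=42"))
```

The sketch does not check option names against the filesystem type, since the set of valid options differs from system to system.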

Here are some typical /etc/fstab entries, defining one or more local filesystems, a CD-ROM drive, and a swap partition:

# FreeBSD
# device           mount         type      options         dump fsck
/dev/ad0s1a        /             ufs       rw                 1 1
/dev/cd0c          /cdrom        cd9660    ro,noauto          0 0
/dev/ad0s2b        none          swap      sw                 0 0

# Linux
# device           mount         type      options         dump fsck
/dev/sda2          /             reiserfs  defaults           1 1
/dev/sda1          /boot         ext2      defaults           1 2
/dev/cdrom         /cdrom        auto      ro,noauto,user     0 0
/dev/sda3          swap          swap      pri=42             0 0

# HP-UX
# device           mount         type      options         dump fsck
/dev/vg00/lvol3    /             vxfs      defaults           0 1
/dev/vg00/lvol1    /stand        hfs       defaults           0 1
/dev/dsk/c1t2d0    /cdrom        cdfs      defaults           0 0
/dev/vg01/swap     ...           swap      pri=0              0 0

# Tru64
# device           mount         type      options         dump fsck
root_domain#root   /             advfs     rw                 0 1
/dev/disk/cdrom0c  /cdrom        cdfs      ro                 0 2
# swap partition is defined in /etc/sysconfigtab
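The six-field layout described earlier can be read mechanically. The following is a minimal Python sketch that parses entries like the Linux ones above and groups mount points by fsck pass; the names FstabEntry, parse_fstab, and fsck_order are hypothetical. It assumes every entry has all six fields, and it treats only lines that begin with # as comments, because a device name such as root_domain#root contains that character.

```python
from collections import namedtuple

FstabEntry = namedtuple(
    "FstabEntry", "special_file mount_dir fs_type options dump_freq fsck_pass"
)


def parse_fstab(text):
    """Parse fstab-style text into a list of FstabEntry tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blank and comment lines
            continue
        fields = line.split()
        if len(fields) != 6:
            raise ValueError("expected 6 fields: %r" % line)
        special, mount_dir, fs_type, options, dump_freq, fsck_pass = fields
        entries.append(FstabEntry(special, mount_dir, fs_type,
                                  options.split(","),
                                  int(dump_freq), int(fsck_pass)))
    return entries


def fsck_order(entries):
    """Group mount points by fsck pass, skipping pass 0 (never checked)."""
    passes = {}
    for entry in entries:
        if entry.fsck_pass > 0:
            passes.setdefault(entry.fsck_pass, []).append(entry.mount_dir)
    return sorted(passes.items())


SAMPLE = """\
# device   mount   type      options         dump fsck
/dev/sda2  /       reiserfs  defaults        1 1
/dev/sda1  /boot   ext2      defaults        1 2
/dev/cdrom /cdrom  auto      ro,noauto,user  0 0
/dev/sda3  swap    swap      pri=42          0 0
"""

if __name__ == "__main__":
    for pass_number, mount_dirs in fsck_order(parse_fstab(SAMPLE)):
        print(pass_number, mount_dirs)
```

For the sample, the root filesystem lands in pass 1 and /boot in pass 2, while the CD-ROM and swap entries (pass 0) are left out.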

HP-UX and Tru64 use a logical volume manager by default for all local disks. Accordingly, the devices specified in /etc/fstab refer to logical volumes rather than actual disk partitions, which accounts for the rather strange device names in their examples. Logical volume managers are discussed later in this chapter.

Tru64 specifies swap partitions via the following stanza in the /etc/sysconfigtab file:

vm:
   swapdevice = /dev/disk/dsk0b

10.2.5.1 Solaris: /etc/vfstab

Solaris uses a different filesystem configuration file, /etc/vfstab, which has a somewhat different format:

block-special-file char-special-file  mount-dir fs-type fsck-pass auto-mount? options

The ordering of the normal fstab fields is changed somewhat, and there are two additional ones. The second field holds the character device corresponding to the block device in the first field (which is used by the fsck command). The sixth field specifies whether the filesystem should be mounted automatically at boot time (note that the root filesystem is set to no).

Here is an example file:

# Solaris
# mount            fsck
# device           device              mount  type  fsck  auto?  options
/dev/dsk/c0t3d0s2  /dev/rdsk/c0t3d0s0  /      ufs   1     no     rw
/dev/dsk/c0t3d0s0  /dev/rdsk/c0t3d0s0  /home  ufs   2     yes    rw,logging
/dev/dsk/c0t3d0s1  -                   -      swap  -     no     -

Note that hyphens are placed in unused fields.
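The seven-field order and the hyphen convention can be handled in a few lines. This is a minimal Python sketch under the assumption that every entry has exactly seven whitespace-separated fields; the field names in VFSTAB_FIELDS and the function parse_vfstab_line are hypothetical labels, not names used by Solaris.

```python
VFSTAB_FIELDS = ("block_device", "raw_device", "mount_dir", "fs_type",
                 "fsck_pass", "mount_at_boot", "options")


def parse_vfstab_line(line):
    """Parse one vfstab entry into a dict; hyphens (unused fields) become None."""
    fields = line.split()
    if len(fields) != len(VFSTAB_FIELDS):
        raise ValueError("expected 7 fields: %r" % line)
    entry = {name: (None if value == "-" else value)
             for name, value in zip(VFSTAB_FIELDS, fields)}
    # The sixth field is a yes/no flag; convert it to a boolean.
    entry["mount_at_boot"] = entry["mount_at_boot"] == "yes"
    return entry


if __name__ == "__main__":
    print(parse_vfstab_line(
        "/dev/dsk/c0t3d0s1 -                   -      swap   -    no     -"))
```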

10.2.5.2 AIX: /etc/filesystems and /etc/swapspaces

The filesystem configuration file under AIX is /etc/filesystems. This file is updated automatically by various AIX filesystem manipulation commands, including crfs, chfs, and rmfs. /etc/filesystems contains all the information in /etc/fstab and some additional data as well, arranged in a stanza-based format. Here are some example entries:

/:
        dev     = /dev/hd4                  Disk device.
        vol     = "root"                    Descriptive label.
        vfs     = jfs2                      Filesystem type.
        mount   = automatic                 Mount automatically with mount -a.
        check   = true                      Check with fsck if needed.
        log     = /dev/hd8                  Device to use for filesystem log.

/chem:
        dev     = /dev/us00                 Logical volume.
        vol     = "chem"                    Descriptive label.
        vfs     = jfs2                      Filesystem type.
        log     = /dev/loglv01              Device to use for filesystem log.
        mount   = true                      Mount automatically with mount -a.
        check   = 2                         Sets the fsck pass.
        options = rw,nosuid                 Mount options.
        quota   = userquota                 Enable user disk quotas.

Each mount point in the overall filesystem has its own stanza, specifying which logical volume (equivalent to a disk partition for this purpose) is to be mounted there. Like HP-UX and Tru64, AIX uses a logical volume manager by default (discussed later in this chapter).
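The stanza layout is easy to read into a nested dictionary. The following is a minimal Python sketch, assuming a stanza name on its own line ending in a colon followed by attribute = value lines; the function name parse_stanzas is hypothetical, and treating lines that start with * as comments is an assumption about the file. The explanatory notes printed to the right of the entries above are annotations for the reader and must not appear in the input.

```python
def parse_stanzas(text):
    """Parse stanza-format text into {stanza_name: {attribute: value}}."""
    stanzas = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):   # assumed comment marker
            continue
        if line.endswith(":") and "=" not in line:
            # A line such as "/chem:" starts a new stanza.
            current = stanzas.setdefault(line[:-1], {})
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip().strip('"')
    return stanzas


SAMPLE = """\
/chem:
        dev     = /dev/us00
        vol     = "chem"
        vfs     = jfs2
        log     = /dev/loglv01
        mount   = true
        check   = 2
        options = rw,nosuid
        quota   = userquota
"""

if __name__ == "__main__":
    for name, attributes in parse_stanzas(SAMPLE).items():
        print(name, attributes["dev"], attributes.get("options", ""))
```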

Under AIX, paging logical volumes are listed in /etc/swapspaces, rather than in the filesystem configuration file. That file is maintained by paging space administration commands such as mkps, chps, and rmps, and its format is very simple:

hd6:
      dev = /dev/hd6
paging00:
      dev = /dev/paging00

This sample file lists two paging areas.

10.2.6 Automatic Filesystem Mounting

Regardless of its form, once the filesystem configuration file is set up, mounting may take place automatically. On most systems, mount's -a option mounts all filesystems that the filesystem configuration file says should be mounted. In addition, if a filesystem is included in the filesystem configuration file, the mount and umount commands require only the mount point or the special file name as their argument. For example, the command:

# mount /chem

looks up /chem in the filesystem configuration file to determine what special file is used to access it and then constructs and performs the proper mount operation. Similarly, the following command dismounts the filesystem on special file /dev/disk1d:

# umount /dev/disk1d

umount also has a -a option to dismount all filesystems.

Both mount and umount have options to specify the type of filesystem being mounted or dismounted. Generally, this option is -t, but HP-UX and Solaris use -F, and AIX uses -v. This option may be combined with -a to operate on all filesystems of a given type. For example, the following command mounts all local filesystems under Tru64:

# mount -a -t advfs

FreeBSD, Tru64, and Linux also allow a type keyword to be preceded with no, causing the command to operate on all filesystem types except those listed. For example, this Linux command mounts all filesystems except DOS filesystems and remote (NFS) filesystems:

# mount -tnomsdos,nfs -a
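The differences in the type option can be collected in a small lookup table. The following is a minimal Python sketch that only builds the argument list and never runs mount; the names TYPE_FLAG, SUPPORTS_NO_PREFIX, and mount_all_command are hypothetical, and the table encodes nothing beyond the option letters given above. Whether a particular system accepts a comma-separated list of types is not checked.

```python
# Option letter that selects the filesystem type, as described in the text.
TYPE_FLAG = {
    "aix": "-v",
    "hp-ux": "-F",
    "solaris": "-F",
    "freebsd": "-t",
    "linux": "-t",
    "tru64": "-t",
}

# Systems on which a type list may be negated with a leading "no".
SUPPORTS_NO_PREFIX = {"freebsd", "linux", "tru64"}


def mount_all_command(system, fs_types, exclude=False):
    """Build (but do not run) a 'mount -a' command limited by filesystem type."""
    system = system.lower()
    if system not in TYPE_FLAG:
        raise ValueError("unknown system: %r" % system)
    type_list = ",".join(fs_types)
    if exclude:
        if system not in SUPPORTS_NO_PREFIX:
            raise ValueError("%s has no 'no' prefix for types" % system)
        type_list = "no" + type_list
    return ["mount", "-a", TYPE_FLAG[system], type_list]


if __name__ == "__main__":
    print(" ".join(mount_all_command("tru64", ["advfs"])))
    print(" ".join(mount_all_command("linux", ["msdos", "nfs"], exclude=True)))
```

Returning a list rather than a string keeps the result suitable for passing to a process-launching call without shell quoting concerns.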

Finally, under FreeBSD, Tru64, and Solaris, umount has a -h option that unmounts all remote filesystems from a specified host. For example, this command unmounts all filesystems from dalton:

# umount -h dalton

Under AIX, the -n option performs the same function.

10.2.7 Using fsck to Validate a Filesystem

A number of problems, ranging from operator errors to hardware failures, can corrupt a filesystem. The fsck utility ("filesystem check") checks the filesystem's consistency, reports any problems it finds, and optionally repairs them. Only under very rare circumstances will these repairs cause even minor data loss.

The equivalent utility for Tru64 AdvFS filesystems is verify (located in /sbin/advfs).

fsck can find the following filesystem problems:

One block belonging to several files (inodes).
Blocks marked as free but in use.
Blocks marked as used but free.
Incorrect link counts in inodes (indicating missing or excess directory entries).
Inconsistencies between inode size values and the number of data blocks referenced in address fields.
Illegal blocks (e.g., system tables) within files.
Inconsistent data in the filesystem's tables.
Lost files (nonempty inodes not listed in any directory). fsck places these files in the directory named lost+found in the filesystem's top-level directory.
Illegal or unallocated inode numbers in directories.

Basically, fsck performs a consistency check on the filesystem, comparing such items as the block free list against the disk addresses stored in the inodes (and indirect address blocks) and the inode free list against inodes in directory entries. It is important to understand that fsck's scope is limited to repairing the structure of the filesystem and its component data structures. The utility can do nothing about corrupted data within structurally intact files.

On older BSD-style systems, the fsck command is run automatically on boots and reboots. Under the System V scheme, fsck is run at boot time on filesystems only if they were not dismounted cleanly (e.g., if the system crashed). System administrators rarely need to run this utility manually; the main occasions are boots on which it finds serious problems (because fsck's automatic mode isn't authorized to repair all problems), just after creating a new filesystem, and a few other circumstances. Nevertheless, you need to understand how fsck works so that you'll be able to verify that the system boots correctly and to quickly recognize abnormal situations.

fsck has the following syntax:

# fsck  [options ] device

device is the special file for the filesystem. fsck runs faster on a character special file. If the device is omitted (as it is at boot time), all filesystems listed in the filesystem configuration file will be checked (under AIX, all filesystems whose check attribute is not false will be checked).

On all systems except FreeBSD and Linux, the block device must be specified for the root filesystem in order to check it with fsck.

If fsck finds any problems, it asks whether or not to fix them. The example below shows a fsck report giving details about several filesystem errors and prompting for input as to what action to take:

# fsck /dev/rdisk1e
/dev/rdisk1e
** Phase 1--Check Blocks and Sizes
POSSIBLE FILE SIZE ERROR I = 478
** Phase 2--Check Pathnames
** Phase 3--Check Connectivity
** Phase 4--Check Reference Counts
UNREF FILE I = 478  OWNER = 190  MODE = 140664
SIZE = 0  MTIME = Sept 18 14:27 1990
CLEAR? y
FREE INODE COUNT WRONG IN SUPERBLOCK
FIX? y
** Phase 5--Check Cylinder Groups
1243 files   28347 blocks  2430 free
*** FILE SYSTEM WAS MODIFIED ***

fsck found an unreferenced inode: an inode marked as in use but not listed in any directory. fsck's output indicates its inode number, owner UID, and mode. From this information, we can figure out that the file is owned by user chavez and is a socket. The mode is interpreted as illustrated in Figure 10-2.

Figure 10-2. Interpreting fsck output

The first one or two digits of the mode indicate the file type: in this case, a socket that can be safely removed.
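The same interpretation can be checked with Python's standard stat module, which knows the conventional Unix file type bits. This is a minimal sketch for Python 3; the function name describe_mode is hypothetical, and the input is the octal MODE value exactly as fsck prints it.

```python
import stat


def describe_mode(mode_text):
    """Decode an octal mode string such as the MODE value printed by fsck."""
    mode = int(mode_text, 8)
    file_types = [
        (stat.S_ISSOCK, "socket"),
        (stat.S_ISLNK, "symbolic link"),
        (stat.S_ISREG, "regular file"),
        (stat.S_ISBLK, "block special file"),
        (stat.S_ISDIR, "directory"),
        (stat.S_ISCHR, "character special file"),
        (stat.S_ISFIFO, "FIFO"),
    ]
    # The leading digits select the file type; the rest are permission bits.
    kind = next((name for test, name in file_types if test(mode)), "unknown")
    permissions = oct(stat.S_IMODE(mode))
    return kind, permissions, stat.filemode(mode)


if __name__ == "__main__":
    print(describe_mode("140664"))
```

For the mode in the fsck report, 140664, the leading 14 identifies a socket and the trailing 664 gives read and write access to the owner and group and read access to others.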

The available options for fsck allow automatic correction of the filesystem to take place (or be prevented):

-p: Preen the filesystem; automatically perform repairs that don't change any file's contents.
-n: Answer no to all prompts: list but don't repair any problems found.
-y: Answer yes to all prompts: repair all damage regardless of severity. Use this option with caution.^[12]
-P: Preen the filesystem only if it is dirty (Tru64).
-f: Force a check even if the filesystem is clean (Linux).
-b n: Use an alternate superblock located at block n (BSD-style syntax). 32 is always an alternate superblock.

^[12] At the same time, it's not clear what alternatives you have. You can't mount a damaged filesystem, and, unless you're a real wizard regarding filesystem internals, fsck is the only tool available for fixing the filesystem.

fsck is normally run with the -p option. In this mode, the following problems are silently fixed:

Lost files will be placed in the filesystem's lost+found directory, named for their inode number.
Link counts in inodes too large.
Missing blocks in the free list.
Blocks in the free list also in files.
Incorrect counts in the filesystem's tables.
Unreferenced zero-length files are deleted.

More serious errors will be handled with prompts as in the previous example.

For UFS filesystems under Solaris, the BSD-style options are specified as arguments to the -o option (the filesystem type-specific options flag). For example, the following command checks the UFS filesystem on /dev/dsk/c0t3d0s2 and makes necessary nondestructive corrections without prompting:

# fsck -F ufs -o p /dev/dsk/c0t3d0s2

10.2.7.1 After fsck

If fsck modifies any filesystem, it will print a message like:

*** FILE SYSTEM WAS MODIFIED ***

If the root filesystem was modified, an additional message will also appear, indicating additional action needed:

BSD-style if the automatic filesystem remount fails:
mount reload of /dev/device failed:
*** REBOOT NOW ***

System V-style:
***** REMOUNTING ROOT FILE SYSTEM *****

If this occurs as part of a normal boot process, the remount or reboot will be initiated automatically. If fsck has been run manually on the root filesystem on a BSD system, the rebooting command needs to be entered by hand. Use the reboot command with the -n option:

# reboot -n

The -n option is very important. It prevents the sync command from being run; syncing would flush the output buffers and might very well recorrupt the filesystem. This is the only time when rebooting should occur without syncing the disks.