27.2. Objective 2: Maintaining a Linux Filesystem

This Objective has major overlaps with Chapter 6, Objective 2. You may wish to review the section "Maintain the Integrity of Filesystems" there. Here we'll just touch on the fine points of some tune2fs settings, when and how to force fsck to save your system, and how to resize filesystems.

Fortunately, Linux filesystems require very little maintenance. For example, Linux's allocation strategies lead to very little disk fragmentation, so defragmentation tools, while they exist, are not used much. Fragmentation does increase, though, if the disk becomes excessively full, so monitoring free space is a very good idea. Maintenance tasks are mostly restricted to watching fsck do its work automatically after a crash.
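
A quick look with df is usually all the free-space monitoring you need; the mount point below is only an example:

 # df -h          # free space on all mounted filesystems, human-readable
 # df -i /home    # inode usage, which can also run out on a busy filesystem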

27.2.1. tune2fs

It's important, despite the availability nowadays of journaling (see the next Objective) and RAID, to run fsck on all filesystems from time to time. The two tune2fs parameters that control how often a check of a filesystem is forced are -i and -c. The default for a filesystem is shown when it is created:

 This filesystem will be automatically checked every 29 mounts or
 180 days, whichever comes first.  Use tune2fs -c or -i to override.

If you find that your disk systems are very reliable and experience very few corruption problems, you can increase the period between forced filesystem checks. To change the number of mounts between checks (the filesystem typically gets mounted once for each boot), use the -c option. It is relatively important that the number be odd, and even better if it's prime. The reason for this is that you do not want to wait while all your filesystems are being fsck'ed at the same time after, say, 24 reboots. If you use different odd numbers for -c on all your disks, they will tend to be checked at different times. The time between checks is already quite high, so you may not want to use the -i option to change it. But if you do, the value is taken to be days, which you can change to weeks or months by appending w or m to the value, respectively.
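
For example, to stretch the intervals on a disk you trust, something like the following would do; the device name and numbers are only illustrations:

 # tune2fs -c 53 /dev/hda6     # force a check every 53 mounts (53 is prime)
 # tune2fs -i 6m /dev/hda6     # and at least every 6 months regardless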

If you ever run into errors in an ext2 filesystem, the kernel can take one of three different actions, controlled by the value set with the -e option:


continue

Just ignore the error and carry on. A read or write error should still be reported to the application that triggered it.


remount-ro

Remount the filesystem read only. This prevents escalating or dominoing failures because the filesystem cannot be updated. If the error is due to filesystem inconsistency, more corruption may otherwise result. Applications will fail with write errors subsequent to this remounting, but the data already on disk is secured.


panic

Cause a kernel panic and halt the system. This is a very obvious failure mode that forces action by the user, whereas the two previous failure modes are more subtle and the problems might go unnoticed.

Choose the -e option best suited to your needs. If data consistency is of the utmost importance, one of the two last options should be chosen. In any case, the filesystem will be marked as unclean and will be checked on the next reboot.
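
For a server where data integrity matters more than uptime, you might set the behavior like this (the device name is only an example); the Errors behavior field of tune2fs -l confirms the setting:

 # tune2fs -e remount-ro /dev/hda6
 # tune2fs -l /dev/hda6 | grep -i 'errors behavior'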

27.2.2. dumpe2fs

We must admit to never having used dumpe2fs. Furthermore, upon reviewing the manpage, we haven't found any use for anyone without a university major in filesystem design. The one option that could be understood by a mere senior system administrator with a decade of experience but without a special interest in filesystems is:

 # /sbin/dumpe2fs -h /dev/Disk2/test
 dumpe2fs 1.34 (25-Jul-2003)
 Filesystem volume name:   <none>
 Last mounted on:          <not available>
 ...
 Filesystem features:      filetype sparse_super
 ...
 Filesystem state:         clean
 Errors behavior:          Continue
 ...
 Reserved block count:     10240
 Free blocks:              197429
 Free inodes:              51181
 First block:              1
 ...
 Filesystem created:       Sat Jan 24 22:14:19 2004
 Last mount time:          Sat Jan 24 22:14:37 2004
 Last write time:          Sat Jan 24 22:14:59 2004
 Mount count:              1
 Maximum mount count:      34
 Last checked:             Sat Jan 24 22:14:19 2004
 Check interval:           15552000 (6 months)
 Next check after:         Thu Jul 22 23:14:19 2004
 Reserved blocks uid:      0 (user root)
 Reserved blocks gid:      0 (group root)
 First inode:              11
 Inode size:               128
 Default directory hash:   tea
 Directory Hash Seed:      b5824b65-c864-4ee5-82f1-39aaae3fb572

But this is identical to the output of tune2fs -l. In any case, it supplies information about the filesystem features. The filetype feature is standard and tells you that file types are stored in directory entries. sparse_super indicates that backup superblocks are kept in only some block groups rather than in all of them, saving some space. If the filesystem contained files larger than 2 GB, the large_file feature would be set as well, showing that the filesystem supports such files. The free blocks and inodes information is the same as that displayed by df. The mount counts and next check information were discussed in the previous tune2fs section.
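
If you only want one or two of these fields, grep works fine on either command's output; the device below is the same example volume:

 # tune2fs -l /dev/Disk2/test | grep -i 'mount count'
 # dumpe2fs -h /dev/Disk2/test | grep -i 'free'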

27.2.3. debugfs

Sometimes called debuge2fs. Also a very intricate tool. But it has one very good feature, which helps you undelete files. Helps is a very important word here. First of all, it can help only on ext2 filesystems; on ext3 it does not work at all. Secondly, a real filesystem always has a lot of deleted files, so you have to work hard to find the right one. Finally, you can only hope that the file's blocks have not been reused since you deleted it. Following is an ideal situation with only one deleted file:

 # debugfs -w /dev/Disk2/test
 debugfs 1.34 (25-Jul-2003)
 debugfs:  lsdel
  Inode  Owner  Mode    Size    Blocks   Time deleted
     12      0 100644 261117  256/ 256 Sat Jan 24 22:14:55 2004
 1 deleted inodes found.
 debugfs:  undelete <12> undeleted-file
 debugfs:  ^D
 # mount -t ext2 /dev/Disk2/test /mnt
 # cd /mnt
 # file undeleted-file
 undeleted-file: RPM v4 bin i386 maildrop-1.3.4-1.7.0
 # mv undeleted-file maildrop-1.3.4-1.7.0.i386.rpm

The lsdel and undel commands in debugfs are undocumented in both Debian and Red Hat. The way this process works is that you find a file with a likely size and deletion date with lsdel. In the first column of that line is the inode number of that file. This inode number is then given to undel along with a filename. Note that the <> around the inode number are required. Prior experience has shown that this command sometimes fails and sometimes works, and that running an fsck after an undelete can be a good idea.
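
Since an undelete writes to filesystem metadata behind the kernel's back, it is prudent to check the filesystem before (re)mounting it; using the example device above:

 # umount /mnt                  # if the filesystem is already mounted
 # e2fsck -f /dev/Disk2/test    # -f forces a check even if the filesystem is marked clean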

27.2.4. badblocks and e2fsck

Luckily, the badblocks command is very seldom needed these days. All modern disks do bad block remapping, meaning that if they detect a bad block, a spare block is substituted. Very neat and nice for system administrators. Sometime in the childhood of Linux, IDE disks could not do such things. Because the disks were small and expensive, a bad block scan was an economical thing to do.

Today disks are huge, and they all do bad block remapping, so badblocks is by and large useless because it takes too long to scan the disk.

There is one situation in which badblocks can potentially be handy. When a disk is failing, it will usually get an exponential increase in bad blocks, and after a short while it will run out of spare blocks, whereupon you will get into trouble with your filesystems on that disk.

You may manage to save the filesystem by running badblocks on it and then passing the list of bad blocks to e2fsck to get the filesystem working enough for a backup. Bad block scanning will take a very, very, very long time on a big device. The process is as follows:

 # badblocks -c 4096 -n -v -v -b 1024 -o /tmp/hdc1-badblocks /dev/hdc1
 ...
 # e2fsck -l /tmp/hdc1-badblocks /dev/hdc1
 ...

The -n option does a nondestructive read/write test. The two -v options ensure that you get progress reports. The -b option is important, because if badblocks gets the wrong block size, the badblocks file will be useless. The filesystem block size is 1024 by default, but large filesystems use 4096. You can find the right value with tune2fs -l or dumpe2fs -h. The -c 4096 sets how many blocks are checked at a time. The manpage warns about raising this, but 4096 blocks of 1024 bytes at once uses only about 14 MB of memory. Most modern systems can handle that well enough.
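
To find the block size to feed to -b, query the superblock first; /dev/hdc1 matches the example above:

 # tune2fs -l /dev/hdc1 | grep -i 'block size'    # pass the reported value to badblocks -b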

All in all, and due to the number of hours it takes badblocks to scan a modern disk of any interesting size, it seems more likely that a disk with exponentially increasing bad blocks will fail totally before the scan is completed.

27.2.5. fsck

Sometimes you will see strange behavior from a filesystem. Files that just disappear. File sizes that are all wrong. File permissions that are odd or impossible. You should then know that it's time for a filesystem check, even if Linux thinks that the filesystem is clean and it's a long time until a forced check is due. How to do a filesystem check was discussed in Chapter 26, but briefly summarized, you boot the machine in single-user mode, unmount the filesystem in question, and run fsck -f on it. fsck will look up the partition in /etc/fstab and supply the filesystem type itself. If you know the filesystem to be an ext2 or ext3 filesystem, you can run e2fsck -f or fsck.ext2 -f directly.
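
A typical session, assuming the suspect filesystem is /home on /dev/hda6 (the names are only examples), might look like this:

 # telinit 1               # drop to single-user mode
 # umount /home
 # fsck -f /dev/hda6       # or e2fsck -f /dev/hda6 if you know it is ext2/ext3
 # mount /home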

27.2.6. mke2fs

This command does not handle filesystem maintenance as such. But there are some things you might want to consider to make maintenance easier or the filesystem more suitable or flexible for future needs. Mostly, it is the command's sizing options that are interesting (a combined example follows the list):


-O sparse_super, -O ^sparse_super

Enable (first form) or disable (second form) sparse superblocks. On very large disks (which includes most modern disks), it is pointless to have superblock backups as often as administrators used to have with small disks. mke2fs does a very good job of guessing what you need here.


-J size=100

Set the size of the ext3 journal. On some fast systems that do an unusual volume of I/O, you may find that the default journal size is too small. The journal must be between 1024 and 102,400 blocks. The size value is in megabytes.


-N number-of-inodes

In filesystems that are going to have a lot of small files, such as a news spool, or the index disk of some backup systems, you need more than the usual number of inodes. One inode is used per file. Round the number up generously. For all normal filesystem uses, the default is generous.
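A combined invocation for a filesystem expected to hold millions of small files might look like the following; the device name and numbers are purely illustrative. Here -j creates the ext3 journal that -J size= then sizes in megabytes:

 # mke2fs -j -J size=100 -N 4000000 -O sparse_super /dev/hdc1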

27.2.7. Filesystem Resizing

An ext2 filesystem can be resized. This can be very handy, but is of limited value if your disk does not have free space that you can add to the partition on which the filesystem resides. In Objective 3 of Chapter 28 we'll discuss LVM, which lets you easily resize data volumes. If you don't use LVM, you can use parted (Partition EDitor) to resize, move, and copy partitions and the filesystems they contain.

To enlarge a partition, you must first create free space elsewhere on the same disk. If you don't have any, find one or more underutilized partitions and make them smaller. Then move the partitions that are between the free space and the partition you want to enlarge so that the free space becomes adjacent to the partition you want to enlarge. You're now free to enlarge it.

Several cautions apply to the use of parted:

  • parted, unfortunately, has been known to corrupt partition tables and ruin filesystems. Before using it, as before any major operation on your system, back up your data and feel relieved if you don't have to restore it later.

  • The filesystems on the partitions you are working on must be unmounted while you work on them.

  • You should go down to single-user mode if you're doing large operations.

If you are just resizing an existing filesystem, you'll find it simpler to use resize2fs, described later.


Syntax

 parted [ device [ command [ options ] ] ] 


Description

parted can be used to edit partition tables. For human consumption, it is best to use it interactively. But scripts, or very knowledgeable humans, may give the options directly on the command line.


Some useful commands


move partition N start end

Move the partition numbered N to the given start and end points. This can resize the partition at the same time.


resize partition N start end

Resize the partition numbered N to fit within the new start and end. Making a filesystem smaller can be quite time consuming, because files that are stored past the new end of the filesystem must be moved.
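
An interactive session might look like the following; the device, partition numbers, and start/end points (megabytes in the parted releases current when this was written) are examples only:

 # parted /dev/hdb
 (parted) print                    # show the current layout
 (parted) move 3 2500 3000         # shift the partition that is in the way
 (parted) resize 2 500 2500        # then grow partition 2 into the freed space
 (parted) quit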


Syntax

 resize2fs [-f] device_name [ size ] 


Description

Resize the ext2 filesystem present on the partition named by device_name. If the size is omitted, the filesystem is resized to fill the whole partition. size should be given in units of filesystem blocks, or if it is postfixed by s, K, M, or G, it is in 512-byte sectors, kilobytes, megabytes, or gigabytes, respectively. The -f option forces resizing.


Example

 resize2fs /dev/hda7 5G 

27.2.8. fsck

fsck is normally run automatically as needed. Under certain circumstances, you may want to make sure it runs because, for example, you see things that make you believe the filesystem is corrupt. It could be files disappearing, files that have 0 or small sizes but appear huge if you read them with less or cat, or anything else that makes you feel that the filesystem is not acting sanely.

The commands and techniques discussed for system recovery will be useful here. If any filesystem except your root filesystem seems messed up, simply take the machine down to single-user mode, unmount the problematic partition, and run fsck -f on it. If it is the root filesystem that seems insane, you will need to reboot with the kernel option init=/bin/bash and, instead of mounting the root filesystem, run fsck -f on it.
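
At the boot prompt you would append init=/bin/bash to the kernel line; once at the shell, and assuming the root filesystem is on /dev/hda1 (a hypothetical device), the session is roughly:

 # e2fsck -f /dev/hda1    # the root filesystem is still mounted read-only, so this is safe
 # reboot -f              # force a reboot; there is no init running to shut down cleanly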

After fixing the filesystem, reboot. The system should come up sanely and without any strangeness in the filesystems.

27.2.9. Self-Monitoring, Analysis, and Reporting Technology System (SMART)

This system is built into most modern ATA and SCSI hard disks. SMART provides an efficient and cheap solution for monitoring potential failures in your hard disk devices. A hard disk is a very delicate device, so it is well worth having any advance warning that it is about to crash.


Warning: SMART is no substitute for the always-recommended practice of keeping all your critical data backed up, updated, and tested on your backup media.

The SMART implementation in Linux is very mature, and there are good tools to manage it. To start using this great benefit of modern disks, you must check these prerequisites:

  • The hard disk must be SMART compliant.

  • Your operating system must be able to issue SMART commands.

  • You must install software capable of managing and showing SMART alert messages.

Assuming you don't have problems with the first and second items, you can use smartmontools (http://smartmontools.sourceforge.net) to fulfill the last one. This suite contains two binaries, smartd and smartctl, along with corresponding initialization scripts.

smartd runs as a daemon working in the background and monitors the hard disk. smartctl is a utility that controls and monitors SMART. smartctl is designed to perform SMART tasks such as printing the SMART self-test and error logs, enabling and disabling SMART automatic testing, and initiating device self-tests. Assuming that /dev/hda is the first hard disk installed in your system, you can display the status as follows:

 # /etc/init.d/smartd start
  * Starting S.M.A.R.T. monitoring daemon...                        [ ok ]
 # smartctl -i /dev/hda
 smartctl version 5.30 Copyright (C) 2002-4 Bruce Allen
 Home page is http://smartmontools.sourceforge.net/

 === START OF INFORMATION SECTION ===
 Device Model:     WDC WD400BB-00FRA0
 Serial Number:    WD-WMAJF1340215
 Firmware Version: 77.07W77
 Device is:        In smartctl database [for details use: -P show]
 ATA Version is:   6
 ATA Standard is:  Exact ATA specification draft version not indicated
 Local Time is:    Tue Jul 26 20:23:21 2005 BRT
 SMART support is: Available - device has SMART capability.
 SMART support is: Enabled

If the command shows:

 SMART support is: Disabled 

just turn it on using:

 # smartctl -s on /dev/hda 

You can customize /etc/smartd.conf to monitor the temperature and error rates of disks, among other interesting things, but details are outside the scope of this objective.
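
A minimal /etc/smartd.conf entry, with a hypothetical mail address, could look like this:

 # Monitor all SMART attributes on /dev/hda, mail root on trouble, and warn
 # if the temperature rises by 4 degrees C or passes 45/55 degrees C.
 /dev/hda -a -m root@localhost -W 4,45,55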


