Before the year 2000, Ext2 was the de facto file system for most Linux machines. Ext2 is robust, reliable, and suitable for most deployments. However, as Linux displaces UNIX and other operating systems in more and larger server and computing environments, Ext2 is being pushed to its limits. In fact, many now-common requirements (large hard-disk partitions, quick recovery from crashes, high-performance I/O, and the need to store thousands and thousands of files representing terabytes of data) exceed the capabilities of Ext2.

Fortunately, a number of other Linux file systems take up where Ext2 leaves off. Indeed, Linux now offers four alternatives to Ext2: Ext3, ReiserFS, XFS, and JFS. In addition to meeting some or all of the requirements just listed, each of these alternative file systems supports journaling, a feature demanded by enterprises and beneficial to anyone running Linux.

A journaling file system can simplify restarts, reduce fragmentation, and accelerate I/O. Journaling file systems minimize the need to run file system checkers. System administrators who maintain complex systems or those who require high availability should consider deploying one or more journaling file systems.

When Good File Systems Go Bad

The following describes the methodology for increasing the size of a file from three blocks to five blocks when it is modified:
Although writing data to a file appears to be a single atomic operation, the actual process involves a number of steps (even more steps than shown here, considering all the accounting required to remove free blocks from a list of free space, among other possible metadata changes). If all the steps to write a file are completed perfectly (and this happens most of the time), the file is saved successfully. However, if the process is interrupted at any time (perhaps due to power failure or other systemic failure), a non-journaled file system can end up in an inconsistent state. Corruption occurs because the logical operation of writing (or updating) a file is actually a sequence of I/O operations, and the entire operation might not be fully reflected on the media at any given point in time. If the metadata or the file data is left in an inconsistent state, the file system no longer functions properly.

Non-journaled file systems rely on fsck to examine all the file system's metadata and detect and repair structural integrity problems before restarting. If Linux shuts down smoothly, fsck typically returns a clean bill of health. However, after a power failure or crash, fsck is likely to find some kind of error in metadata. Because file systems contain significant amounts of metadata, running fsck can be very time-consuming. fsck has to scan a file system's entire repository of metadata to ensure consistency and error-free operation; therefore, the time fsck takes on a disk partition is proportional to the size of the partition, the number of directories, and the number of files in each directory.

For large file systems, journaling becomes crucial. Journaled file systems provide improved structural consistency, better recovery, and faster restart times than non-journaled file systems. In most cases, journaled file systems can restart in less than a second.

Transactions Are the Solution

The magic of journaling file systems lies in transactions.
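As a rough illustration of what transactions buy, the following is a minimal in-memory sketch in Python. It is not the on-disk format or algorithm of any file system discussed here. It models a file growing from three blocks to five as a series of separate metadata steps, interrupts the sequence with a simulated crash, and then compares the result with a version that records the intended steps in a journal first and replays them during recovery. All names (`new_metadata`, `grow_steps`, `recover`, and so on), the seven-block toy disk, and the choice to make every step safe to repeat are assumptions made for illustration.

```python
TOTAL_BLOCKS = 7  # hypothetical toy disk: blocks 0..6


class CrashError(Exception):
    """Simulated power failure in the middle of an update."""


def new_metadata():
    # Toy metadata: a free-block list and one inode that owns three blocks.
    return {"free": [3, 4, 5, 6], "inode_blocks": [0, 1, 2], "inode_size": 3}


def grow_steps(meta, count):
    """List the individual metadata changes needed to add `count` blocks."""
    steps = []
    for block in meta["free"][:count]:
        steps.append(("take_free", block))  # remove block from the free list
        steps.append(("attach", block))     # add block to the inode's block map
    steps.append(("set_size", meta["inode_size"] + count))
    return steps


def apply_step(meta, step):
    """Apply one change; each change is harmless if applied twice."""
    op, arg = step
    if op == "take_free" and arg in meta["free"]:
        meta["free"].remove(arg)
    elif op == "attach" and arg not in meta["inode_blocks"]:
        meta["inode_blocks"].append(arg)
    elif op == "set_size":
        meta["inode_size"] = arg


def is_consistent(meta):
    """Every block is either free or owned (never both, never lost)."""
    owned, free = set(meta["inode_blocks"]), set(meta["free"])
    return (owned.isdisjoint(free)
            and len(owned | free) == TOTAL_BLOCKS
            and meta["inode_size"] == len(meta["inode_blocks"]))


def grow_unjournaled(meta, count, crash_after):
    for i, step in enumerate(grow_steps(meta, count)):
        if i == crash_after:
            raise CrashError
        apply_step(meta, step)


def grow_journaled(meta, journal, count, crash_after):
    steps = grow_steps(meta, count)
    journal.append(steps)          # record the whole transaction first
    for i, step in enumerate(steps):
        if i == crash_after:
            raise CrashError
        apply_step(meta, step)
    journal.clear()                # all steps applied; entry no longer needed


def recover(meta, journal):
    """Replay every recorded transaction, then empty the journal."""
    for steps in journal:
        for step in steps:
            apply_step(meta, step)
    journal.clear()


def demo():
    plain = new_metadata()
    try:
        grow_unjournaled(plain, 2, crash_after=1)
    except CrashError:
        pass  # block 3 has left the free list but belongs to no file

    journaled, journal = new_metadata(), []
    try:
        grow_journaled(journaled, journal, 2, crash_after=1)
    except CrashError:
        pass
    recover(journaled, journal)
    return plain, journaled


if __name__ == "__main__":
    plain, journaled = demo()
    print("plain consistent:", is_consistent(plain))
    print("journaled consistent:", is_consistent(journaled))
```

In this toy model the plain update loses a block (it is neither free nor attached to the file), while the journaled update ends with the file at five blocks after replay. Replay only needs to look at the journal, not at every piece of metadata, which is the property the rest of this section relies on.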
Like database transactions, journaling file system transactions treat a sequence of changes as a single, atomic operation. However, instead of tracking updates to tables, the journaling file system tracks changes to file system metadata or user data. The transaction guarantees that either all or none of the file system updates are done.

For example, the process of creating a new file modifies several metadata structures (inodes, free lists, and directory entries). Before the file system makes those changes, it creates a transaction that describes what it is about to do. After the transaction has been recorded (on disk), the file system goes ahead and modifies the metadata. The journal in a journaling file system is simply a list of transactions.

In the event of a system failure, the file system is restored to a consistent state by replaying the journal. Rather than examine all metadata (the fsck way), the file system inspects only those portions of the metadata that have recently changed. Recovery is much faster, usually only a matter of seconds. Better yet, recovery time is not dependent on the size of the partition.

In addition to faster restart times, most journaling file systems also address another significant problem: scalability. By combining even a few large-capacity disks, it is easy to assemble some massive (certainly by early-'90s standards) file systems. Features of modern file systems include the following:
More advanced file systems also manage sparse files, internal fragmentation, and the allocation of inodes better than Ext2.

A Wealth of Options

Although advanced file systems are tailored primarily for the high-throughput and high-uptime requirements of servers (from single-processor systems to clusters), these file systems can also benefit client machines where performance and reliability are wanted or needed.

Recent releases of Linux include not one, but four journaling file systems. JFS from IBM, XFS from SGI, and ReiserFS from Namesys have all been "open sourced" and subsequently included in the Linux kernel. In addition, Ext3 was developed as a journaling add-on to Ext2.

Figure 11-2 shows where file systems fit into Linux. Note that JFS, XFS, ReiserFS, and Ext3 are independent "peers." It is possible for a single Linux machine to use all these types of file systems at the same time. A system administrator can configure a system to use JFS on one partition and ReiserFS on another.

Figure 11-2. Where file systems fit in the operating system.

The following output from the mount command shows a system with all four of the journaling systems:

    # mount
    /dev/hdb6 on / type reiserfs (rw)
    proc on /proc type proc (rw)
    devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
    shmfs on /dev/shm type shm (rw)
    usbdevfs on /proc/bus/usb type usbdevfs (rw)
    /dev/hda1 on /xfs type xfs (rw)
    /dev/hdb1 on /jfs type jfs (rw)
    /dev/hda4 on /ext3 type ext3 (rw)
    /dev/hda2 on /ext2 type ext2 (rw)
    /dev/hda3 on /reiserfs type reiserfs (rw)

The df command shows all these file systems and their available space:

    # df -k
    Filesystem  1K-blocks     Used  Available  Use%  Mounted on
    /dev/hdb6     4441800  1770448    2671352   40%  /
    shmfs          192736        0     192736    0%  /dev/shm
    /dev/hda1      806448      144     806304    1%  /xfs
    /dev/hdb1     3999504   659320    3340184   17%  /jfs
    /dev/hda4     1739324    32828    1618140    2%  /ext3
    /dev/hda2      798508       20     757924    1%  /ext2
    /dev/hda3      811248    32840     778408    5%  /reiserfs
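On a machine with many partitions, it can be handy to pick the journaling file systems out of such a listing automatically. The sketch below is one possible way to do that in Python, assuming the `DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)` line format shown in the mount output above and assuming no mount point contains the strings " on " or " type ". The function names and the `JOURNALING_TYPES` set are hypothetical and cover only the four file systems named in this section.

```python
# File system types treated as journaling in this section (assumption:
# only the four named in the text are of interest).
JOURNALING_TYPES = {"ext3", "reiserfs", "xfs", "jfs"}

# Sample input copied from the mount listing in the text.
SAMPLE_MOUNT_OUTPUT = """\
/dev/hdb6 on / type reiserfs (rw)
proc on /proc type proc (rw)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
shmfs on /dev/shm type shm (rw)
usbdevfs on /proc/bus/usb type usbdevfs (rw)
/dev/hda1 on /xfs type xfs (rw)
/dev/hdb1 on /jfs type jfs (rw)
/dev/hda4 on /ext3 type ext3 (rw)
/dev/hda2 on /ext2 type ext2 (rw)
/dev/hda3 on /reiserfs type reiserfs (rw)
"""


def parse_mount_line(line):
    """Split one mount line into (device, mount_point, fs_type, options)."""
    device, rest = line.split(" on ", 1)
    mount_point, rest = rest.split(" type ", 1)
    fs_type, options = rest.split(" (", 1)
    return device, mount_point, fs_type, options.rstrip(")").split(",")


def journaling_mounts(mount_output):
    """Map each mount point to its type, keeping only journaling types."""
    result = {}
    for line in mount_output.strip().splitlines():
        _device, mount_point, fs_type, _options = parse_mount_line(line.strip())
        if fs_type in JOURNALING_TYPES:
            result[mount_point] = fs_type
    return result


if __name__ == "__main__":
    for mount_point, fs_type in journaling_mounts(SAMPLE_MOUNT_OUTPUT).items():
        print(f"{mount_point}: {fs_type}")
```

For the sample listing, the Ext2 partition and the pseudo file systems (proc, devpts, shm, usbdevfs) are filtered out, leaving the root ReiserFS partition plus /xfs, /jfs, /ext3, and /reiserfs.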