Section 2.8. Filesystems

   



A regular file is a linear array of bytes and can be read and written starting at any byte in the file. The kernel distinguishes no record boundaries in regular files, although many programs recognize line-feed characters as distinguishing the ends of lines, and other programs may impose other structure. No system-related information about a file is kept in the file itself, but the filesystem stores a small amount of ownership, protection, and usage information with each file.
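The byte-array view of a regular file can be illustrated with a minimal sketch. It assumes a UNIX-like system and uses Python's os.pread and os.pwrite wrappers; the helper names patch_byte, read_range, and demo are hypothetical and chosen only for this illustration. The kernel sees no lines here: the line feed at offset 5 is just another byte.

```python
import os
import tempfile


def patch_byte(path, offset, value):
    """Overwrite the single byte at `offset`, leaving all other bytes alone."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.pwrite(fd, bytes([value]), offset)
    finally:
        os.close(fd)


def read_range(path, offset, length):
    """Read `length` bytes starting at an arbitrary byte `offset`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data")
        with open(path, "wb") as f:
            f.write(b"hello\nworld\n")   # two "lines" only by convention
        patch_byte(path, 6, ord("W"))    # byte 6 is the 'w' of "world"
        return read_range(path, 6, 5)
```

Calling demo() returns the five bytes starting at offset 6 after the in-place patch.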

A filename component is a string of up to 255 characters. These filenames are stored in a type of file called a directory. The information in a directory about a file is called a directory entry and includes, in addition to the filename, a pointer to the file itself. Directory entries may refer to other directories, as well as to plain files. A hierarchy of directories and files is thus formed, called a filesystem; a small one is shown in Figure 2.2. Directories may contain subdirectories, and there is no inherent limit on the depth to which directories may be nested. To protect the consistency of the filesystem, the kernel does not permit processes to write directly into directories. A filesystem may include not only plain files and directories but also references to other objects, such as devices, sockets, and FIFOs.

Figure 2.2. A small filesystem tree.


The filesystem forms a tree, the beginning of which is the root directory, sometimes referred to by the name slash, spelled with a single solidus character (/). The root directory contains files; in our example in Figure 2.2, it contains kernel, a copy of the kernel-executable object file. It also contains directories; in this example, it contains the usr directory. Within the usr directory is the bin directory, which mostly contains executable object code of programs, such as the files ls and vi.

A process identifies a file by specifying that file's pathname, which is a string composed of zero or more filenames separated by slash (/) characters. The kernel associates two directories with each process for use in interpreting pathnames. A process's root directory is the topmost point in the filesystem that the process can access; it is ordinarily set to the root directory of the entire filesystem. A pathname beginning with a slash is called an absolute pathname and is interpreted by the kernel starting with the process's root directory.

A pathname that does not begin with a slash is called a relative pathname and is interpreted relative to the current working directory of the process. (This directory also is known by the shorter names current directory or working directory.) The current directory itself may be referred to directly by the name dot, spelled with a single period (.). The filename dot-dot (..) refers to a directory's parent directory. The root directory is its own parent.

A process may set its root directory with the chroot system call and its current directory with the chdir system call. Any process may do chdir at any time, but chroot is permitted only to a process with superuser privileges. Chroot is normally used to set up restricted access to the system.

Using the filesystem shown in Figure 2.2, if a process has the root of the filesystem as its root directory and has /usr as its current directory, it can refer to the file vi either from the root with the absolute pathname /usr/bin/vi or from its current directory with the relative pathname bin/vi.
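The two ways of naming vi can be tried out with a small sketch. Because chroot requires superuser privileges, the sketch assumes a temporary directory standing in for the root of Figure 2.2; the function name demo and the placeholder file contents are hypothetical. It uses os.path.samefile to check that two pathnames resolve to the same file.

```python
import os
import tempfile


def demo():
    with tempfile.TemporaryDirectory() as top:
        top = os.path.realpath(top)
        # Build a stand-in for the usr/bin/vi branch of Figure 2.2.
        os.makedirs(os.path.join(top, "usr", "bin"))
        absolute = os.path.join(top, "usr", "bin", "vi")
        with open(absolute, "w") as f:
            f.write("placeholder\n")

        saved = os.getcwd()
        os.chdir(os.path.join(top, "usr"))   # current directory is now .../usr
        try:
            # The relative name is interpreted from the current directory.
            same_file = os.path.samefile("bin/vi", absolute)
            # Dot names the current directory; dot-dot names its parent.
            dot_is_cwd = os.path.samefile(".", os.getcwd())
            dotdot_is_parent = os.path.samefile("..", top)
        finally:
            os.chdir(saved)                  # restore before the tree is removed

        # The real root directory is its own parent.
        root_is_own_parent = os.path.samefile("/", "/..")
        return same_file, dot_is_cwd, dotdot_is_parent, root_is_own_parent
```

All four checks should come back true on a UNIX-like system.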

System utilities and databases are kept in certain well-known directories. Part of the well-defined hierarchy includes a directory that contains the home directory for each user (for example, /usr/staff/mckusick and /usr/staff/gnn in Figure 2.2). When users log in, the current working directory of their shell is set to the home directory. Within their home directories, users can create directories as easily as they can regular files. Thus, a user can build arbitrarily complex subhierarchies.

The user usually knows of only one filesystem, but the system may know that this one virtual filesystem is really composed of several physical filesystems, each on a different device. A physical filesystem may not span multiple logical devices. Since most physical disk devices are divided into several logical devices, there may be more than one filesystem per physical device, but there will be no more than one per logical device. One filesystem, the filesystem that anchors all absolute pathnames, is called the root filesystem and is always available. Others may be mounted; that is, they may be integrated into the directory hierarchy of the root filesystem. References to a directory that has a filesystem mounted on it are converted transparently by the kernel into references to the root directory of the mounted filesystem.

The link system call takes the name of an existing file and another name to create for that file. After a successful link, the file can be accessed by either filename. A filename can be removed with the unlink system call. When the final name for a file is removed (and the final process that has the file open closes it), the file is deleted.
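The lifetime rules for names can be sketched as follows, assuming a UNIX-like system whose temporary directory sits on a filesystem that supports hard links. The filenames first and second and the function demo are hypothetical; os.link, os.unlink, and os.stat are Python's wrappers for the corresponding system calls.

```python
import os
import tempfile


def demo():
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first")
        second = os.path.join(d, "second")
        with open(first, "w") as f:
            f.write("shared contents\n")

        os.link(first, second)                 # add a second name for the file
        links_after_link = os.stat(first).st_nlink

        os.unlink(first)                       # remove one name; the file survives
        with open(second) as f:
            data = f.read()
        links_after_unlink = os.stat(second).st_nlink

        # Remove the final name while a descriptor is still open:
        # the data remain readable until that descriptor is closed.
        fd = os.open(second, os.O_RDONLY)
        os.unlink(second)
        try:
            still_readable = os.read(fd, 100)
        finally:
            os.close(fd)                       # now the file is really deleted

        return links_after_link, data, links_after_unlink, still_readable
```

The link count reported by stat goes from 2 to 1 as names are removed, and the contents stay the same whichever name is used.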

Files are organized hierarchically in directories. A directory is a type of file, but, in contrast to regular files, a directory has a structure imposed on it by the system. A process can read a directory as it would an ordinary file, but only the kernel is permitted to modify a directory. Directories are created by the mkdir system call and are removed by the rmdir system call. Before 4.2BSD, mkdir and rmdir were implemented by a series of link and unlink system calls. There were three reasons for adding system calls explicitly to create and delete directories.

  1. The operation could be made atomic. If the system crashed, the directory would not be left half-constructed, as could happen when a series of link operations were used.

  2. When a networked filesystem is being run, the creation and deletion of files and directories need to be specified atomically so that they can be serialized.

  3. When supporting non-UNIX filesystems, such as an NT filesystem, on another partition of the disk, the other filesystem may not support link operations. Although other filesystems might support the concept of directories, they probably would not create and delete the directories with links, as the UNIX filesystem does. Consequently, they could create and delete directories only if explicit directory create and delete requests were presented.
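From a process's point of view, each explicit directory operation is a single request. The following is a minimal sketch assuming a UNIX-like system; the names project, notes, and demo are hypothetical. It also shows that rmdir refuses to remove a directory that still contains entries.

```python
import errno
import os
import tempfile


def demo():
    with tempfile.TemporaryDirectory() as top:
        d = os.path.join(top, "project")
        os.mkdir(d)                        # one system call creates the directory
        created = os.path.isdir(d)

        note = os.path.join(d, "notes")
        with open(note, "w") as f:
            f.write("x\n")

        try:
            os.rmdir(d)                    # fails: the directory is not empty
            refused = False
        except OSError as e:
            # Most systems report ENOTEMPTY; some report EEXIST instead.
            refused = e.errno in (errno.ENOTEMPTY, errno.EEXIST)

        os.unlink(note)
        os.rmdir(d)                        # one system call removes it
        removed = not os.path.exists(d)
        return created, refused, removed
```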

The chown system call sets the owner and group of a file, and chmod changes protection attributes. Stat applied to a filename can be used to read back such properties of a file. The fchown, fchmod, and fstat system calls are applied to a descriptor instead of to a filename to do the same set of operations. The rename system call can be used to give a file a new name in the filesystem, replacing one of the file's old names. Like the directory-creation and directory-deletion operations, the rename system call was added to 4.2BSD to provide atomicity to name changes in the local filesystem. Later, it proved useful explicitly to export renaming operations to foreign filesystems and over the network.
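The name-based and descriptor-based variants can be compared in a short sketch, assuming a UNIX-like system. The filenames draft and final and the function demo are hypothetical. chown is left out because changing a file's owner generally requires privilege.

```python
import os
import stat
import tempfile


def demo():
    with tempfile.TemporaryDirectory() as d:
        old = os.path.join(d, "draft")
        new = os.path.join(d, "final")
        with open(old, "w") as f:
            f.write("new contents\n")
        with open(new, "w") as f:
            f.write("old contents\n")

        # Name-based calls: chmod sets the protection bits, stat reads them back.
        os.chmod(old, 0o640)
        mode_by_name = stat.S_IMODE(os.stat(old).st_mode)

        # Descriptor-based calls do the same work on an open file.
        fd = os.open(old, os.O_RDONLY)
        try:
            os.fchmod(fd, 0o600)
            mode_by_fd = stat.S_IMODE(os.fstat(fd).st_mode)
        finally:
            os.close(fd)

        # rename replaces the target name in one step; the old name disappears.
        os.rename(old, new)
        with open(new) as f:
            contents = f.read()
        return mode_by_name, mode_by_fd, os.path.exists(old), contents
```

After the rename, the name final refers to the file that used to be called draft, and no intermediate state with both or neither name is visible to other processes.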

The truncate system call was added to 4.2BSD to allow files to be set to an arbitrary size. Thus, truncate is poorly named because it may be used to both shorten and lengthen a file. Files may have holes in them. Holes are void areas in the linear extent of the file where data have never been written. A process can create these holes by positioning the pointer past the current end-of-file and writing. Alternatively, a hole may be added to the end of a file by using the truncate system call to increase its length. When read, holes are treated by the system as zero-valued bytes.
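Both ways of creating a hole, and the use of truncate in both directions, can be sketched as follows. This assumes a UNIX-like system; the file name sparse, the offsets, and the function demo are hypothetical choices for the illustration.

```python
import os
import tempfile


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sparse")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            os.write(fd, b"head")              # bytes 0..3
            os.lseek(fd, 4096, os.SEEK_SET)    # position past the end-of-file
            os.write(fd, b"tail")              # bytes 4..4095 are now a hole
            size_after_write = os.fstat(fd).st_size

            os.truncate(path, 8192)            # "truncate" to a larger size
            size_after_grow = os.fstat(fd).st_size

            hole = os.pread(fd, 16, 1000)      # inside the first hole
            grown = os.pread(fd, 16, 8000)     # inside the hole added at the end

            os.truncate(path, 2)               # shorten the file
            size_after_shrink = os.fstat(fd).st_size
        finally:
            os.close(fd)
        return size_after_write, size_after_grow, hole, grown, size_after_shrink
```

Reads from either hole return zero-valued bytes, even though no process ever wrote them.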

Once the filesystem had the ability to shorten files, the kernel took advantage of that ability to shorten large, empty directories. The advantage of shortening empty directories is that it reduces the time spent in the kernel searching them when names are being created or deleted.

Newly created files are assigned the user identifier of the process that created them and the group identifier of the directory in which they were created. A three-level access-control mechanism is provided for the protection of files. The following three levels specify the accessibility of a file.

  1. To the user who owns the file

  2. To the group that owns the file

  3. To everybody else

Each level of access has separate indicators for read permission, write permission, and execute permission. If finer-grained access control is needed, FreeBSD 5.2 also provides ACLs (access control lists) to allow specification of read, write, and execute permission on a per-user or per-group level.
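The three levels and their three indicators fit in nine mode bits. A minimal sketch follows, using the constants in Python's stat module; the function name describe and the dictionary keys are hypothetical. ACLs are not covered by this sketch.

```python
import stat


def describe(mode):
    """Decode the nine protection bits of `mode` into one rwx string per level."""
    levels = (
        ("owner", stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
        ("group", stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
        ("other", stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
    )
    result = {}
    for name, r, w, x in levels:
        result[name] = (
            ("r" if mode & r else "-")
            + ("w" if mode & w else "-")
            + ("x" if mode & x else "-")
        )
    return result
```

For example, describe(0o754) reports rwx for the owner, r-x for the group, and r-- for everybody else.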

Files are created with zero length and may grow when they are written. While a file is open, the system maintains a pointer into the file showing the current location in the file associated with the descriptor. This pointer can be moved about in the file in a random-access fashion. Processes sharing a file descriptor through a fork or dup system call share the current location pointer. Descriptors created by separate open system calls have separate current location pointers.
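The difference between a shared and a separate current-location pointer can be observed with a short sketch, assuming a UNIX-like system. It uses dup rather than fork to keep everything in one process; the file name log and the function demo are hypothetical.

```python
import os
import tempfile


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "log")
        with open(path, "wb") as f:
            f.write(b"0123456789")

        fd = os.open(path, os.O_RDONLY)
        shared = os.dup(fd)                      # shares fd's location pointer
        separate = os.open(path, os.O_RDONLY)    # has its own location pointer
        try:
            os.read(fd, 4)                       # advances the shared pointer to 4
            shared_offset = os.lseek(shared, 0, os.SEEK_CUR)
            separate_offset = os.lseek(separate, 0, os.SEEK_CUR)

            os.lseek(shared, 8, os.SEEK_SET)     # moving one moves the other
            next_bytes = os.read(fd, 2)
        finally:
            for descriptor in (fd, shared, separate):
                os.close(descriptor)
        return shared_offset, separate_offset, next_bytes
```

The duplicated descriptor follows every read and seek made through the original, while the descriptor from the second open stays at offset 0.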

Filestores

The user-visible part of the filesystem is its hierarchical naming, locking, quotas, attribute management, and protection. But the bulk of the filesystem implementation involves the organization and management of the data on the storage media. Laying out the contents of files on the storage media is the responsibility of the filestore. FreeBSD uses the traditional Berkeley fast filesystem format. The disk is organized into groups of contiguous blocks called cylinder groups. Files that are likely to be accessed together, based on their locations in the filesystem hierarchy, are stored in the same cylinder group. Files that are not expected to be accessed together are moved into different cylinder groups.

A key responsibility of the filestore is to ensure that the filesystem is always kept in a state in which it can be recovered after a hardware or software failure. While recoverability can be maintained by using synchronous writes to the disk, the performance of a filesystem using this technique would be unacceptably slow. FreeBSD uses a technique called soft updates (see Section 8.6) to ensure recoverability while still delivering good performance and fast restart after a system crash.

Another useful feature of the FreeBSD filestore is the ability to quickly take a filesystem snapshot. Snapshots can be taken every few hours and mounted in a well-known location so that users can recover inadvertently deleted files that they created or wrote earlier in the day. Snapshots can also be used to allow the creation of consistent archives of filesystems that are in continuous active use. Snapshots are described in Section 8.7.


   
 


The Design and Implementation of the FreeBSD Operating System
ISBN: 0201702452
Year: 2003
Pages: 183
