11.7. File System Types

Early Macintosh systems used the Macintosh File System (MFS), a flat file system in which all files were stored in a single directory. The software presented an illusory hierarchical view that showed nested folders. MFS was designed for floppy disks, not for high-capacity storage media such as hard disks and CD-ROMs. The Hierarchical File System (HFS) was introduced with the Macintosh Plus as a file system with "true" hierarchy, although it differed from a traditional Unix file system in that the hierarchical structure was entirely maintained in a central catalog. HFS was the primary file system format used until Mac OS 8.1, when HFS Plus replaced it.

Each MFS volume contained a folder called Empty Folder at its root level. Renaming this folder created a new folder, with a replacement Empty Folder appearing as a side effect.

It is common for modern operating systems to support several file systems; Linux supports dozens! Mac OS X also supports a number of file systems. Because of the number of sources Mac OS X draws from, it has multiple file system APIs: Carbon File Manager, NSFileManager and family (Cocoa), and BSD system calls. Figure 11-18 shows how these APIs are layered in the system.

Figure 11-18. Mac OS X file system APIs

The file systems available on Mac OS X can be categorized as follows.

Local file systems are those that use locally attached storage. Mac OS X supports HFS Plus, HFS, ISO 9660, MS-DOS, NTFS, UDF, and UFS.
Network file systems are those that allow files residing on one computer to appear locally on another computer, provided the two computers are connected over a network. Mac OS X supports the Apple Filing Protocol (AFP), FTP file system, NFS, SMB/CIFS, and the WebDAV file system.
Pseudo file systems are those typically used for providing file-like views of nonfile information. Some others are used as special file system layers. In general, pseudo file systems do not have persistent backing stores.^[7] Mac OS X supports cddafs, deadfs, devfs, fdesc, specfs, fifofs, synthfs, union, and volfs.
^[7] cddafs provides a file system view of nonfile information that is persistent.

Another Apple-provided file system for Mac OS X is the Apple Cluster File System (ACFS), a shared SAN file system that underlies the Xsan product (see Section 2.15.2).

Let us briefly look at each of these file systems.

11.7.1. HFS Plus and HFS

The Mac OS Extended file system (another name for HFS Plus, or simply HFS+) is the preferred, default, and most feature-rich file system on Mac OS X. Although it is architecturally similar to its predecessor, HFS, it has undergone numerous additions, improvements, and optimizations to be a respectable modern-day file system. We will discuss HFS+ in detail in the next chapter.

When HFS was introduced, it was quite innovative in how it lent support to the Macintosh graphical user interface. It provided the abstraction of two forks, the data fork and the resource fork, with the latter allowing structured storage of GUI-related (and other) resources alongside regular file data. Although the two forks were parts of the same file, they could be individually accessed and manipulated. The following are examples of resources:

'ICON': an icon
'CODE': executable code
'STR': program strings

Besides the data and resource forks, HFS provides for additional per-file information, such as a four-character file type, a four-character creator code, and attributes such as those specifying whether the file is locked, is invisible, or has a custom icon. This allowed the user interface to determine which application to launch to handle a file when the user double-clicked on its icon.

HFS also differs from traditional file systems in that it uses a B-Tree-based catalog file to store the file system's hierarchical structure, rather than explicitly storing directories on disk. In order to locate the contents of a fork, HFS records up to the first three extents (that is, { starting block, block count } pairs) in the corresponding file record in the catalog file. If a fork is fragmented enough to have more than three extents, the remaining extents overflow to another B-Tree-based file: the extents overflow file. As we will see in Chapter 12, HFS+ retains the basic design of HFS.
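The extent lookup described above can be sketched in a few lines. This is a minimal illustration, not the HFS on-disk format: the names Extent, split_extents, and locate_block are hypothetical, and a plain list stands in for the B-Tree-based extents overflow file.

```python
from typing import List, NamedTuple, Optional

INLINE_EXTENTS = 3  # HFS keeps up to three extents in the catalog file record


class Extent(NamedTuple):
    start_block: int   # first block of the extent on the volume
    block_count: int   # number of contiguous blocks in the extent


def split_extents(extents: List[Extent]):
    """Split a fork's extents into the inline part and the overflow part."""
    return extents[:INLINE_EXTENTS], extents[INLINE_EXTENTS:]


def locate_block(fork_block: int, inline: List[Extent],
                 overflow: List[Extent]) -> Optional[int]:
    """Map a block offset within a fork to a block number on the volume."""
    remaining = fork_block
    # Inline extents are consulted first; overflow extents only if needed.
    for extent in list(inline) + list(overflow):
        if remaining < extent.block_count:
            return extent.start_block + remaining
        remaining -= extent.block_count
    return None  # the offset lies beyond the end of the fork


if __name__ == "__main__":
    fork = [Extent(100, 4), Extent(300, 2), Extent(50, 1), Extent(900, 8)]
    inline, overflow = split_extents(fork)
    print(locate_block(5, inline, overflow))  # falls in the second extent
    print(locate_block(7, inline, overflow))  # falls in the overflow extent
```

A fork with three or fewer extents never touches the overflow list, which is why an unfragmented file can be located from its catalog record alone.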

Both HFS and HFS+ use the colon character (:) as a path separator; it is not a valid filename character. They also do not have the notion of a filename extension.

11.7.2. ISO 9660

ISO 9660 is a system-independent file system for read-only data CDs. Apple has its own set of ISO 9660 extensions. Moreover, Mac HFS+/ISO 9660 hybrid discs contain both a valid HFS+ and a valid ISO 9660 file system. Both file systems can be read on Mac OS X, whereas on non-Apple systems, you would typically be able to read only the ISO 9660 data. This does not mean there is redundant data on the disc; usually, the data that needs to be accessed from both Mac OS X and other operating systems is stored on the ISO 9660 volume and is aliased on the HFS+ volume. Consider the following example, where we create a hybrid ISO image containing two files, each visible from only a single file system.

$ hdiutil makehybrid -o /tmp/hybrid.iso . -hfs -iso -hfs-volume-name HFS \
    -iso-volume-name ISO -hide-hfs iso.txt -hide-iso hfs.txt
Creating hybrid image...
...
$ hdiutil attach -nomount /tmp/hybrid.iso
/dev/disk10              Apple_partition_scheme
/dev/disk10s1            Apple_partition_map
/dev/disk10s2            Apple_HFS
$ hdiutil pmap /dev/rdisk10
Partition List
## Dev_______ Type_______________ Name_____________ Start___ Size____ End_____
 0 disk10s1   Apple_partition_map Apple                    1       63       63
-1            Apple_ISO           ISO                     64       24       87
 1 disk10s2   Apple_HFS           DiscRecording 3.0       88       36      123
Legend
    - ... extended entry
    + ... converted entry
...

If we explicitly mount the hybrid volume's ISO 9660 file system, we will not see the HFS+-only file hfs.txt on it. If we mount it as an HFS+ volume, we will see only hfs.txt on it.

$ mkdir /tmp/iso
$ mount -t cd9660 /dev/disk10 /tmp/iso
$ ls /tmp/iso
ISO.TXT
$ umount /tmp/iso
$ hdiutil detach disk10
...
$ open /tmp/hybrid.iso
...
$ ls /Volumes/HFS
hfs.txt

Apple's ISO 9660 implementation stores a resource fork as an associated file, which it names by adding the ._ prefix to the containing file's name.

11.7.3. MS-DOS

Mac OS X includes support for the FAT12, FAT16, and FAT32 variants of the MS-DOS file system. The file system is not compiled into the kernel but is present as a loadable kernel extension (/System/Library/Extensions/msdosfs.kext). When an MS-DOS volume is being mounted, the mount_msdos command attempts to load the kernel extension if it is not already loaded.

hdiutil supports writing an MS-DOS file system to a disk image.

$ hdiutil create -size 32m -fs MS-DOS -volname MS-DOS /tmp/ms-dos.dmg
$ hdiutil attach /tmp/ms-dos.dmg
/dev/disk10                                             /Volumes/MS-DOS
$ hdiutil pmap /dev/rdisk10
## Dev_______ Type_______________ Name_____________ Start___ Size____ End_____
-1 disk10     MS-DOS              Single Volume            0    65536    65535
...

The Mac OS X MS-DOS file system implementation supports symbolic links by storing link-target information in a specially formatted text file that is exactly 1067 bytes in size. Figure 11-19 shows the contents of a symbolic link file symlink.txt whose link target is specified as a relative path target.txt.

Figure 11-19. Structure of a symbolic link on the MS-DOS file system on Mac OS X

You can synthesize your own symbolic link by simply writing the appropriate information to a file.^[8] The MD5 digest of the link-target path can be generated using a command such as the following.

^[8] You may need to unmount and remount the volume; the file system initially might not recognize the file as a symbolic link because of caching.

$ echo -n target.txt | md5
4d6f333d2bc24ffddcca34414a0cb12d
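The same digest can be computed programmatically. The following is a small sketch using Python's standard hashlib module; the function name link_target_digest is hypothetical, and encoding the path as UTF-8 is an assumption that makes no difference for an ASCII path such as target.txt.

```python
import hashlib


def link_target_digest(target: str) -> str:
    """Return the hexadecimal MD5 digest of a link-target path."""
    # Like `echo -n`, hash the path bytes with no trailing newline.
    return hashlib.md5(target.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(link_target_digest("target.txt"))
```

The printed value should agree with the output of the md5 command shown above, since both hash the same ten bytes.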

11.7.4. NTFS

Mac OS X includes read-only support for NTFS. The NTFS file system driver (/System/Library/Extensions/ntfs.kext) is based on the FreeBSD NTFS driver. As with the MS-DOS file system, the NTFS driver is not compiled into the kernel but is loaded by the mount_ntfs program when required.

11.7.5. UDF

Universal Disk Format (UDF) is the file system used by DVD-ROM discs (including DVD-video and DVD-audio discs) and many CD-R/RW packet-writing programs. It is implemented as a kernel extension (/System/Library/Extensions/udf.kext) that is loaded by the mount_udf program when required. Mac OS X 10.4 supports the "normal" flavor of the UDF 1.5 specification.

11.7.6. UFS

Darwin's implementation of UFS is similar to that on FreeBSD, but they are not entirely compatible because the Darwin implementation is always big-endian (as was NEXTSTEP's), even on little-endian hardware. Technically, UFS is BSD's Unix file system layer that is independent of the underlying file system implementation. The part that deals with on-disk structures is based on the Berkeley Fast File System (FFS). We will use the term UFS to represent the combination of UFS and FFS.
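To see why a fixed byte order causes incompatibility, consider how a single 32-bit on-disk field is read. The sketch below is only an illustration of byte order, not of the UFS on-disk layout; it uses the FFS superblock magic number 0x011954 as sample data, and the helper names are hypothetical.

```python
import struct

FS_MAGIC = 0x011954  # magic number of the Berkeley Fast File System


def encode_big_endian(value: int) -> bytes:
    """Store a 32-bit field the way a big-endian implementation writes it."""
    return struct.pack(">I", value)


def decode(raw: bytes, byte_order: str) -> int:
    """Read a 32-bit field using the given byte order ('big' or 'little')."""
    fmt = ">I" if byte_order == "big" else "<I"
    return struct.unpack(fmt, raw)[0]


if __name__ == "__main__":
    raw = encode_big_endian(FS_MAGIC)
    print(hex(decode(raw, "big")))     # the value as written
    print(hex(decode(raw, "little")))  # what a native little-endian reader sees
```

An implementation that reads fields in the host's native little-endian order would not recognize the magic number, let alone the rest of the superblock, unless it swapped bytes explicitly.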

UFS does not provide some features that HFS+ provides; for example, it does not support multiple forks, (native) extended attributes, or aliases. However, resource forks and extended attributes can be used on UFS through emulation. For example, when a file (named, say, file.txt) with a nonzero resource fork is copied to a UFS volume, it is split into two files: file.txt (containing the data fork) and ._file.txt (containing the resource fork). Copying such a file to an HFS+ volume will populate both forks in the destination file.

UFS2

Newer versions of FreeBSD include UFS2, a redesigned version of UFS (or UFS1, as it is now called). UFS2 provides numerous improvements over UFS1, such as the following: 64-bit block pointers, 64-bit time fields for access and modification times, support for a per-file "birth time" field, support for extended attributes, and dynamically allocated inodes. As in the case of HFS+, UFS2 extended attributes are used for implementing access control lists (ACLs). They are also used for data labeling in FreeBSD's mandatory access control (MAC) framework.

Unlike HFS+, UFS is always case-sensitive. It also supports sparse files (files with "holes"), which HFS+ does not. If a file contains a relatively large amount of zero data (as compared to its size), it is efficiently represented as a sparse file. On a file system with sparse file support, you can create a physically empty file with a nonzero size. In other words, the file would contain virtual disk blocks. When such a file is read, the kernel returns zero-filled memory in place of the virtual disk blocks. When a portion of a sparse file is written, the file system manages the sparse and nonsparse data. A good example of the utility of sparse files is in an emulator (such as Virtual PC) that uses large disk images as virtual disks belonging to guest operating systems. If the emulator is not to allocate physical storage until necessary, either it must use sparse files or it must simulate the sparseness itself, above the file system. The latter is likely to result in a fragmented disk image.
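The difference between a file's logical size and its allocated storage can be observed from user space. The following sketch assumes a POSIX system (st_blocks is not available everywhere); the function name make_sparse_file is hypothetical. Whether the file actually ends up sparse depends on the file system holding the temporary directory, so the sketch prints the allocation rather than assuming a value.

```python
import os
import tempfile


def make_sparse_file(path: str, size: int) -> os.stat_result:
    """Create a file of the given logical size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size)  # extend the file; no data blocks are written
    return os.stat(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        st = make_sparse_file(os.path.join(tmp, "bigfile"), 32 * 1024 * 1024)
        allocated = st.st_blocks * 512  # st_blocks counts 512-byte units
        print("logical size:   ", st.st_size)
        print("allocated bytes:", allocated)
```

On a file system with sparse file support, the allocated byte count stays far below the logical size; on one without such support, the two are comparable.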

Even though HFS+ does not support sparse files, it supports deferred zeroing of file blocks that have never been written. Meanwhile, the kernel will return zero-filled pages when such blocks are read.

Let us compare the behavior of HFS+ and UFS when we attempt to create an empty file with a nonzero size that exceeds the file system's capacity. We will create two disk images, each 16MB in size, but one containing an HFS+ file system and the other containing a UFS file system. Note that hdiutil supports writing a UFS file system to a disk image. Next, we will use the mkfile command to attempt to create 32MB sparse files on both volumes.

$ hdiutil create -size 16m -fs HFS -volname HFS hfs.dmg
...
$ hdiutil create -size 16m -fs UFS -volname UFS ufs.dmg
...
$ open hfs.dmg
$ open ufs.dmg
$ cd /Volumes/HFS
$ df -k .
Filesystem   1K-blocks Used Avail Capacity  Mounted on
/dev/disk10s2    16337  261 16076     2%    /Volumes/HFS
$ mkfile -nv 32m bigfile
mkfile: (bigfile removed) Write Error: No space left on device
$ cd /Volumes/UFS
$ df -k .
Filesystem   1K-blocks Used Avail Capacity  Mounted on
/dev/disk11s2    15783   15 14979     0%    /Volumes/UFS
$ mkfile -nv 32m bigfile
bigfile 33554432 bytes
$ ls -lh bigfile
-rw-------   1 amit  amit       32M Oct 22 11:40 bigfile
$ df -k .
Filesystem   1K-blocks Used Avail Capacity  Mounted on
/dev/disk11s2    15783   27 14967     0%    /Volumes/UFS

Although Mac OS X supports UFS as a root file system, the operating system's features are best integrated with HFS+. Therefore, HFS+ is recommended as the primary file system.

11.7.7. AFP

The Apple Filing Protocol (AFP) is a protocol for file sharing over the network. It was the primary file-sharing protocol in Mac OS 9 and was extensively used by AppleShare servers and clients. It still is the default protocol for sharing files between Mac OS X systems. In general, any file system that supports Unix semantics can be shared over AFP. In particular, besides HFS+ mounts, AFP can be used to export NFS and UFS mounts.

The Mac OS X implementation of AFP is contained in a loadable kernel extension (/System/Library/Filesystems/AppleShare/afpfs.kext).

The /System/Library/Filesystems/AppleShare/ directory also contains the asp_atp.kext and asp_tcp.kext kernel extensions. These implement the Apple Session Protocol (ASP) over AppleTalk and TCP, respectively. ASP is a session-layer protocol that allows a client to establish a session with a server and send commands to the latter.

When an application on an AFP client computer accesses a remote file residing on an AFP file server, the native file system layer sends the requests to the AFP translator, which translates and sends the requests to the server. Not all AFP-related communication goes through the translator, however. There are AFP commands with no native file system equivalents, for example, commands for user authentication. Such commands may be sent directly to the AFP server while bypassing the translator.

Unlike traditional NFS, which is stateless, AFP is session based. An AFP server shares one or more volumes, which AFP clients can access during sessions. An AFP session begins when an AFP client authenticates with an AFP server using a User Authentication Method (UAM). AFP supports multiple UAMs such as the following:

No User Authentication
Cleartext Password
Random Number Exchange
Two-Way Random Number Exchange
Diffie-Hellman Exchange
Diffie-Hellman Exchange 2
Kerberos
Reconnect

The Reconnect UAM is intended for use by a client for reconnecting a session that was disrupted, say, due to a network outage. Note also that AFP supports tunneling with SSH as an option.

AFP has another, less secure level of access control, wherein each volume made available through AFP may have a fixed-length eight-character password associated with it. It also provides Unix-style access privileges, with support for owner and group privileges for searching, reading, and writing.

AFP versions older than 3.0 do not support file permissions. If both the AFP client and server support AFP 3.x, BSD file permissions are sent as-is (untranslated) over the connection. If only the client or the server is using AFP 2.x, permissions have slightly different semantics. For example, when dealing with folders, the 3.x party (server or client) maps between BSD's read, write, and execute bits and AFP's See Files, See Folders, and Make Changes analogs. Note that a process with an effective user ID (UID) of 0 cannot use AFP to access data over the network.

Mac OS X uses a user-space AFP daemon (/usr/sbin/AppleFileServer), which is launched when you select the Personal File Sharing checkbox in the Sharing pane of System Preferences. The AFP server provides synchronization rules to facilitate sane simultaneous file access.

AFP commands can be grouped under the following functional categories:

Login commands
Volume commands
Directory commands
File commands
Combined directory and file commands
Fork commands
Desktop database commands

11.7.8. FTP

The mount_ftp command makes a directory residing on an FTP server locally visible, thus providing an FTP file system.

$ mount_ftp ftp://user:password@host/directory/path local-mount-point

Mac OS X includes a private framework called URLMount that allows AFP, FTP, HTTP (WebDAV), NFS, and SMB URLs to be mounted. The directory /System/Library/Filesystems/URLMount/ contains .URLMounter plug-in bundles for each of these URL types.

The FTP file system is implemented as a user process that is both an FTP client and a local NFS server: it uses NFS to export the FTP view, which is then mounted by the Mac OS X built-in NFS client. You can use the nfsstat program to monitor client-side NFS activity caused by accessing an instance of the FTP file system.

$ mkdir /tmp/ftp
$ mount -t ftp ftp://anonymous@sunsite.unc.edu/pub /tmp/ftp
$ ls /tmp/ftp
Linux                   electronic-publications micro
X11                     gnu                     mirrors
academic                historic-linux          multimedia
archives                languages               packages
docs                    linux                   solaris
$ nfsstat
Rpc Counts:
  Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
       12         0        19         1         0         0         0         0
   Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
        0         0         3         0         0         1         0         0
    Mknod    Fsstat    Fsinfo  PathConf    Commit
        0         8         0         0         0
Rpc Info:
 TimedOut   Invalid X Replies   Retries  Requests
        0         0         0         3        44
...

Even in the absence of any explicit NFS mounts, nfsstat will normally report NFS activity because the automount daemon, which is started by default on Mac OS X, also uses the local NFS server approach used by the FTP file system.

11.7.9. NFS

Mac OS X derives its NFS client and server support from FreeBSD. The implementation conforms to NFS version 3 and includes the NQNFS extensions. As we saw in Section 8.10.2, the Mac OS X kernel uses a separate buffer cache for NFS.

Mac OS X also includes the usual supporting daemons for NFS, namely, the following.

rpc.lockd implements the Network Lock Management protocol.
rpc.statd implements the Status Monitor protocol.
nfsiod is the local asynchronous NFS I/O server.

Not Quite NFS (NQNFS)

NQNFS adds procedures to NFS to make the protocol stateful, which is a departure from the original NFS design. The NFS server maintains states of open and cached files on clients. NQNFS uses leases to facilitate server recovery of client state in the case of a crash. The leases are short term but are extended on use.
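The idea of a short-term lease that is extended on use can be sketched as follows. This is a toy model of the concept only, not the NQNFS protocol: the Lease class, its method names, and the durations in the test values are hypothetical, and time is passed in explicitly to keep the behavior deterministic.

```python
class Lease:
    """A short-term lease that is extended each time it is used."""

    def __init__(self, duration: float, now: float):
        self.duration = duration
        self.expires_at = now + duration

    def is_valid(self, now: float) -> bool:
        """A lease is valid strictly before its expiry time."""
        return now < self.expires_at

    def use(self, now: float) -> bool:
        """Extend the lease if it is still valid; report whether it was."""
        if not self.is_valid(now):
            return False  # the holder must obtain a new lease
        self.expires_at = now + self.duration
        return True


if __name__ == "__main__":
    lease = Lease(duration=30.0, now=0.0)
    print(lease.use(now=10.0))       # still valid, so it is extended
    print(lease.is_valid(now=35.0))  # valid only because of the extension
    print(lease.use(now=45.0))       # expired in the meantime
```

Because every lease expires on its own unless renewed, the state a server holds about its clients is bounded in time, which is what makes recovery after a crash tractable.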

11.7.10. SMB/CIFS

Server Message Block (SMB) is a widely used protocol for sharing a variety of resources over a network. Examples of SMB-shareable resources include files, printers, serial ports, and named pipes. SMB has been around since the early 1980s. Microsoft and other vendors helped evolve an enhanced version of SMB: the Common Internet File System (CIFS). Mac OS X provides support for SMB/CIFS through Samba, a popular Open Source SMB server that is available for numerous platforms.

11.7.11. WebDAV

Web-based Distributed Authoring and Versioning (WebDAV) is an extension of the ubiquitous Hypertext Transfer Protocol (HTTP) that allows collaborative file management on the web. For example, using WebDAV, you can create and edit content remotely by connecting to a WebDAV-enabled web server. Given the URL to a WebDAV-enabled directory, the mount_webdav command can mount the remote directory as a locally visible file system. In particular, since a .Mac account's iDisk is available through WebDAV, it can also be mounted this way.

$ mkdir /tmp/idisk
$ mount_webdav http://idisk.mac.com/<member name>/ /tmp/idisk
... # a graphical authentication dialog should be displayed
$ ls /tmp/idisk
About your iDisk.rtf    Movies                  Sites
Backup                  Music                   Software
Documents               Pictures
Library                 Public

Mac OS X also supports secure WebDAV, in which Kerberos and the HTTPS protocol can be used while accessing WebDAV volumes.

The Mac OS X WebDAV file system implementation uses a hybrid approach that involves a user-space daemon (implemented within mount_webdav) and a loadable file system kernel extension (/System/Library/Extensions/webdav_fs.kext). Figure 11-20 shows an overview of this implementation.

Figure 11-20. Implementation of the WebDAV file system

Most of the actual work is performed by the user-space daemon. As Figure 11-20 shows, the file system kernel extension communicates with the daemon using an AF_LOCAL socket, which the daemon provides to the kernel as a mount() system call argument. The various vnode operations in the kernel-resident WebDAV code simply redirect I/O requests to the daemon, which services the requests by performing the appropriate network transfers. The daemon also uses a local temporary cache directory.^[9] Note that the daemon does not actually send file data to the kernel: once the data of interest has been downloaded to a cache file, the kernel reads it directly from the file. This setup may be thought of as a special form of file system stacking.

^[9] The files in this directory are unlinked by the daemon while they are open; therefore, they are "invisible."
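The division of labor described above can be imitated entirely in user space. The sketch below is a loose analogy under stated assumptions, not the WebDAV implementation: a socket pair in the local (AF_UNIX) domain stands in for the kernel-to-daemon channel, a dictionary stands in for the remote server, and the reply carries a cache file path purely for simplicity. All function names are hypothetical.

```python
import os
import socket
import tempfile


def daemon_handle(sock: socket.socket, remote: dict, cache_dir: str) -> None:
    """Service one request: fetch the data, cache it, reply with its location."""
    name = sock.recv(1024).decode()
    cache_path = os.path.join(cache_dir, name.replace("/", "_"))
    with open(cache_path, "wb") as f:
        f.write(remote[name])          # stands in for a network transfer
    sock.sendall(cache_path.encode())  # the reply carries no file data


def kernel_read(sock: socket.socket, name: str, serve) -> bytes:
    """Redirect a read to the daemon, then read the cache file directly."""
    sock.sendall(name.encode())
    serve()                            # let the daemon service the request
    cache_path = sock.recv(1024).decode()
    with open(cache_path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    kernel_end, daemon_end = socket.socketpair(socket.AF_UNIX,
                                               socket.SOCK_STREAM)
    remote = {"/docs/a.txt": b"hello from the server\n"}
    with tempfile.TemporaryDirectory() as cache_dir:
        data = kernel_read(
            kernel_end, "/docs/a.txt",
            lambda: daemon_handle(daemon_end, remote, cache_dir))
        print(data)
    kernel_end.close()
    daemon_end.close()
```

The point of the arrangement is that only small control messages cross the socket, while bulk data moves through the local cache file.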

11.7.12. cddafs

The cdda^[10] file system is used to make the tracks of an audio compact disc appear as AIFF files. The implementation consists of a mounting utility (mount_cddafs) and a loadable file system kernel extension (/System/Library/Extensions/cddafs.kext). The mount utility attempts to determine the album name and the names of the audio tracks on the disc. If it fails, "Audio CD" and "<track number> Audio Track" are used as the album and track names, respectively. The mount utility uses the mount() system call to pass these names to the cddafs kernel extension, which creates a file system view from the audio tracks on the disc. Each track's filename has the format <track number> <track name>.aiff, whereas the album name is used as the volume's name. The kernel extension also creates an in-memory file called .TOC.plist, which appears in the root directory along with the track files and contains XML-formatted table-of-contents data for the disc.

^[10] CD-DA stands for Compact Disc Digital Audio.

$ cat "/Volumes/Joshua Tree/.TOC.plist"
...
        <key>Sessions</key>
        <array>
                <dict>
                        <key>First Track</key>
                        <integer>1</integer>
                        <key>Last Track</key>
                        <integer>11</integer>
                        <key>Leadout Block</key>
                        <integer>226180</integer>
                        <key>Session Number</key>
                        <integer>1</integer>
                        <key>Session Type</key>
                        <integer>0</integer>
                        <key>Track Array</key>
                        <array>
...
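Since .TOC.plist is an XML property list, it can be read with any property list parser. The following sketch uses Python's standard plistlib module on an embedded sample. The sample reuses the keys and values from the excerpt above, but its enclosing top-level dictionary and the empty Track Array are assumptions made so that the sample is a complete document; the function name summarize_sessions is hypothetical.

```python
import plistlib

# A complete stand-in document built around the excerpt's keys (assumed shape).
SAMPLE_TOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>Sessions</key>
    <array>
        <dict>
            <key>First Track</key>
            <integer>1</integer>
            <key>Last Track</key>
            <integer>11</integer>
            <key>Leadout Block</key>
            <integer>226180</integer>
            <key>Session Number</key>
            <integer>1</integer>
            <key>Session Type</key>
            <integer>0</integer>
            <key>Track Array</key>
            <array/>
        </dict>
    </array>
</dict>
</plist>
"""


def summarize_sessions(toc_bytes: bytes):
    """Return (session number, first track, last track) for each session."""
    toc = plistlib.loads(toc_bytes)
    return [
        (s["Session Number"], s["First Track"], s["Last Track"])
        for s in toc["Sessions"]
    ]


if __name__ == "__main__":
    print(summarize_sessions(SAMPLE_TOC))
```

On a system with an audio CD mounted, the bytes read from the volume's .TOC.plist file could be passed to the same function, provided the real file has the shape assumed here.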

11.7.13. deadfs

deadfs essentially facilitates revocation of access, say, to the controlling terminal or to a forcibly unmounted file system. The revoke() system call, which revokes access to a given pathname by invalidating all open file descriptors that refer to the file, also causes the corresponding vnode to be dissociated from the underlying file system. Thereafter, the vnode is associated with deadfs. The launchd program uses revoke() to prepare a controlling terminal while starting a session.

The VFS layer (see Section 11.6) uses the vclean() function [bsd/vfs/vfs_subr.c] to dissociate the underlying file system from a vnode: it removes the vnode from any mount list it might be on, purges the name-cache entry associated with the vnode, cleans any associated buffers, and eventually reclaims the vnode for recycling. Additionally, the vnode is "moved" to the dead file system (deadfs). Its vnode operations vector is also set to that of the dead file system.

// bsd/vfs/vfs_subr.c

static void
vclean(vnode_t vp, int flags, proc_t p)
{
    ...
    if (VNOP_RECLAIM(vp, &context))
        panic("vclean: cannot reclaim");
    ...
    vp->v_mount = dead_mountp; // move to the dead file system
    vp->v_op = dead_vnodeop_p; // vnode operations vector of the dead file system
    vp->v_tag = VT_NON;
    vp->v_data = NULL;
    ...
}

Most operations in deadfs return an error, with a few exceptions, such as those listed here.

close() trivially succeeds.
fsync() trivially succeeds.
read() returns end-of-file for character devices but an EIO error for all others.
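The effect of swapping a vnode's operations vector can be modeled in a few lines. The following is a conceptual sketch only: the Vnode, LiveOps, and DeadOps classes are hypothetical stand-ins for kernel structures, and only the three operations listed above are modeled.

```python
import errno

VCHR, VREG = "VCHR", "VREG"  # vnode types used in this sketch


class LiveOps:
    """Stand-in for a real file system's vnode operations vector."""

    def read(self, vnode):
        return vnode.data

    def close(self, vnode):
        return 0

    def fsync(self, vnode):
        return 0


class DeadOps:
    """Operations of the dead file system, as described in the text."""

    def read(self, vnode):
        if vnode.vtype == VCHR:
            return b""  # end-of-file for character devices
        raise OSError(errno.EIO, "vnode belongs to the dead file system")

    def close(self, vnode):
        return 0  # trivially succeeds

    def fsync(self, vnode):
        return 0  # trivially succeeds


class Vnode:
    def __init__(self, vtype, data=b""):
        self.vtype = vtype
        self.data = data
        self.ops = LiveOps()

    def read(self):
        return self.ops.read(self)

    def close(self):
        return self.ops.close(self)

    def fsync(self):
        return self.ops.fsync(self)


def revoke(vnode):
    """Dissociate the vnode from its file system and move it to deadfs."""
    vnode.data = None      # analogous to v_data = NULL
    vnode.ops = DeadOps()  # analogous to v_op = dead_vnodeop_p
```

Callers that still hold the vnode keep calling the same methods; only the table behind those methods has changed, which is why existing descriptors fail cleanly instead of touching the old file system.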

11.7.14. devfs

The device file system (devfs) provides access to the kernel's device namespace in the global file system namespace. It allows device entries to be dynamically added and removed. In particular, the I/O Kit's IOStorageFamily uses devfs functions to add and remove block and character nodes corresponding to media devices as they are attached and detached, respectively.

devfs is allocated, initialized, and mounted from within the Mac OS X kernel during BSD initialization. The kernel mounts it on the /dev/ directory by default. Additional instances of it can be mounted later, from user space, using the mount_devfs program.

$ mkdir /tmp/dev
$ mount_devfs devfs /tmp/dev
$ ls /tmp/dev
bpf0                    ptyte                   ttyr4
bpf1                    ptytf                   ttyr5
...
$ umount /tmp/dev

During bootstrapping, VFS initialization iterates over each built-in file system, calling the file system's initialization function, which is devfs_init() [bsd/miscfs/devfs/devfs_vfsops.c] in the case of devfs. Shortly afterward, the kernel mounts devfs. devfs_init() creates device entries for the following devices: console, tty, mem, kmem, null, zero, and klog.

devfs redirects most of its vnode operations to specfs (see Section 11.7.16).

11.7.15. fdesc

The fdesc file system, which is conventionally mounted on /dev/fd/, provides a list of all active file descriptors in the calling process.^[11] For example, if a process has descriptor number n open, then the following two function calls will be equivalent:

^[11] A process can access only its own open file descriptors using the fdesc file system.

int fd;
...
fd = open("/dev/fd/n", ...); /* case 1 */
fd = dup(n);                 /* case 2 */
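The two cases can be tried from user space on a system that provides /dev/fd/. The sketch below uses a pipe so that the result is easy to observe; the function name is hypothetical. One caveat is an assumption worth stating: on systems where /dev/fd/ is backed by a different mechanism than fdesc, opening /dev/fd/n may not behave exactly like dup(n) for every kind of file, although for a pipe both descriptors lead to the same pipe.

```python
import os


def write_via_dev_fd_and_dup() -> bytes:
    """Write to one pipe through /dev/fd/n and through dup(n)."""
    read_end, write_end = os.pipe()
    via_path = os.open(f"/dev/fd/{write_end}", os.O_WRONLY)  # case 1
    via_dup = os.dup(write_end)                               # case 2
    try:
        os.write(via_path, b"path ")
        os.write(via_dup, b"dup")
        # Both writes went into the same pipe, in order.
        return os.read(read_end, 64)
    finally:
        for fd in (read_end, write_end, via_path, via_dup):
            os.close(fd)


if __name__ == "__main__":
    print(write_via_dev_fd_and_dup())
```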

In Mac OS X versions older than 10.4, the /etc/rc startup script mounts the fdesc file system as a union mount on /dev/. Beginning with Mac OS X 10.4, fdesc is mounted by launchd instead.

// launchd.c
...
if (mount("fdesc", "/dev", MNT_UNION, NULL) == -1)
    ...

Note that the mount point in launchd's invocation of the mount() system call is /dev/ (and not /dev/fd/). The fd/ directory is maintained by the fdesc file system as one of the entries in its root directory. Besides fd/, it also maintains three symbolic links: stdin, stdout, and stderr. The targets of these links are fd/0, fd/1, and fd/2, respectively. Like devfs, there can be multiple instances of fdesc.

$ mkdir /tmp/fdesc
$ mount_fdesc fdesc /tmp/fdesc
$ ls -l /tmp/fdesc
total 4
dr-xr-xr-x   2 root  wheel  512 Oct 23 18:33 fd
lr--r--r--   1 root  wheel    4 Oct 23 18:33 stderr -> fd/2
lr--r--r--   1 root  wheel    4 Oct 23 18:33 stdin -> fd/0
lr--r--r--   1 root  wheel    4 Oct 23 18:33 stdout -> fd/1

The functionality of fdesc is similar to Linux's /proc/self/fd/ directory, which allows a process to access its own open file descriptors. Linux systems also have /dev/fd/ symbolically linked to /proc/self/fd/.

11.7.16. specfs and fifofs

Devices (the so-called special files) and named pipes (fifos) can reside on any file system that can house such files. Although the host file system maintains the names and attributes of special files, it cannot easily handle the operations that are performed on such files. In fact, many operations that are relevant for regular files may not even make sense for special files. Moreover, multiple special files with the same major and minor numbers may exist with different pathnames on a file system, or even on different file systems. It must be ensured that each of these files (essentially a device alias) unambiguously refers to the same underlying device. A related issue is that of multiple buffering, where the buffer cache could hold more than one buffer for the same block on a device.

Ideally, accesses to device files should be directly mapped to their underlying devices, that is, to the respective device drivers. It would be unreasonable to require each file system type to include explicit support for special file operations. The specfs layer, which was introduced in SVR4, provides a solution to this problem: It implements special-file vnode operations that can be used by any file system. Consider the example of a block or character special file on an HFS+ volume. When HFS+ needs a new vnode, say, during a lookup operation, it calls hfs_getnewvnode() [bsd/hfs/hfs_cnode.c]. The latter checks whether it is a fifo or a special file. If so, it arranges for the vnode to be created with a vnode operations table other than the one for HFS+: hfs_fifoop_p and hfs_specop_p redirect appropriate operations to fifofs and specfs, respectively.

// bsd/hfs/hfs_cnode.c

int
hfs_getnewvnode(struct hfsmount *hfsmp, ...)
{
    ...
    if (vtype == VFIFO )
        vfsp.vnfs_vops = hfs_fifoop_p;    // a fifo
    else if (vtype == VBLK || vtype == VCHR)
        vfsp.vnfs_vops = hfs_specop_p;    // a special file
    else
        vfsp.vnfs_vops = hfs_vnodeop_p;   // use HFS+ vnode operations
    ...
    if ((retval = vnode_create(VNCREATE_FLAVOR, VCREATESIZE, &vfsp, ...))) {
    ...
}

Note that both fifofs and specfs are file system layers, not file systems. In particular, they cannot be mounted, unmounted, or seen by users.

11.7.17. synthfs

synthfs is an in-memory file system that provides a namespace for creation of arbitrary directory trees. Therefore, it can be used for synthesizing mount points, say, while booting from a read-only device that may not have a spare directory for use as a mount point. Besides directories, synthfs also allows creation of symbolic links (but not files).

Although synthfs source is part of the xnu source, the default Mac OS X kernel does not include synthfs as a compiled-in file system. In the case of such a kernel, you must first compile synthfs.

Let us look at an example of using synthfs. Suppose you have a read-only file system mounted on /Volumes/ReadOnly/, and you wish to synthesize a directory tree within /Volumes/ReadOnly/mnt/, where mnt/ is an existing subdirectory. You can do so by mounting an instance of synthfs on top of /Volumes/ReadOnly/mnt/. Thereafter, you can create directories and symbolic links within the mnt/ subdirectory.

$ lsvfs # ensure that synthfs is available
Filesystem                        Refs Flags
-------------------------------- ----- ---------------
ufs                                  0 local
...
synthfs                              0
$ ls -F /Volumes/ReadOnly # a read-only volume
mnt/ root/ boot/ ...
$ ls -F /Volumes/ReadOnly/mnt # subdirectory of interest
$ sudo mkdir /Volumes/ReadOnly/mnt/MyDir # cannot create a new directory
mkdir: /Volumes/ReadOnly/mnt: No such file or directory
$ mount_synthfs synthfs /Volumes/ReadOnly/mnt # mount synthfs
$ mount
...
<synthfs> on /Volumes/ReadOnly/mnt (nodev, suid, mounted by amit)
$ sudo mkdir /Volumes/ReadOnly/mnt/MyDir # try again
$ ls -F /Volumes/ReadOnly/mnt # now a directory can be created
MyDir/
$ umount /Volumes/ReadOnly/mnt # cannot unmount synthfs because of MyDir/
umount: unmount(/Volumes/ReadOnly/mnt): Resource busy
$ sudo rmdir /Volumes/ReadOnly/mnt/MyDir # remove MyDir/
$ umount /Volumes/ReadOnly/mnt # now synthfs can be unmounted
$

To keep a synthfs mount point's existing contents visible, you can mount synthfs with the union option (see Section 11.7.18).

11.7.18. union

The null mount file system (nullfs) is a stackable file system in 4.4BSD. It allows mounting of one part of the file system in a different location. This can be used to join multiple directories into a new directory tree. Thus, file system hierarchies on various disks can be presented as one directory tree. Moreover, subtrees of a writable file system can be made read-only.

Mac OS X does not use nullfs, but it does provide the union mount file system, which conceptually extends nullfs by not hiding the files in the "mounted on" directory; rather, it merges the two directories (and their trees) into a single view. In a union mount, duplicate names are suppressed. Given a name, a lookup locates the logically topmost entity with that name. Let us look at a sequence of commands that will illustrate the basic concepts behind union mounting.

First, we create two disk images with HFS+ file systems and attach them.

$ hdiutil create -size 16m -layout NONE -fs HFS+ \
    -volname Volume1 /tmp/Volume1.dmg
...
$ hdiutil create -size 16m -layout NONE -fs HFS+ \
    -volname Volume2 /tmp/Volume2.dmg
...
$ hdiutil attach -nomount /tmp/Volume1.dmg
/dev/disk10            Apple_HFS
$ hdiutil attach -nomount /tmp/Volume2.dmg
/dev/disk11            Apple_HFS

Next, we mount both images and create files on them: Volume1 will contain one file (a.txt), whereas Volume2 will contain two files (a.txt and b.txt).

$ mkdir /tmp/union
$ mount -t hfs /dev/disk10 /tmp/union
$ echo 1 > /tmp/union/a.txt
$ umount /dev/disk10
$ mount -t hfs /dev/disk11 /tmp/union
$ echo 2 > /tmp/union/a.txt
$ echo 2 > /tmp/union/b.txt
$ umount /dev/disk11

Let us now union-mount both file systems by specifying the union option to the mount command.

$ mount -t hfs -o union /dev/disk10 /tmp/union
$ mount -t hfs -o union /dev/disk11 /tmp/union

Because Volume2 was mounted on top of Volume1, a filename that exists in both (a.txt in our case) is suppressed in Volume1, the lower volume. In other words, we will access the file on the logically topmost volume.

$ ls /tmp/union         # contents will be union of Volume1 and Volume2
a.txt b.txt
$ cat /tmp/union/a.txt  # this should come from Volume2 (the top volume)
2
$ umount /dev/disk11    # let us unmount Volume2
$ ls /tmp/union         # we should only see the contents of Volume1
a.txt
$ cat /tmp/union/a.txt  # this should now come from Volume1
1
$ umount /dev/disk10

We can also union-mount the volumes in the opposite order and verify whether doing so causes a.txt to come from Volume1 instead.

$ mount -t hfs -o union /dev/disk11 /tmp/union
$ mount -t hfs -o union /dev/disk10 /tmp/union
$ ls /tmp/union
a.txt   b.txt
$ cat /tmp/union/a.txt
1

If we wrote to a.txt now, it would modify only the top volume (Volume1). The file b.txt appears in the union but is present only in the bottom volume. Let us see what happens if we write to b.txt.

$ cat /tmp/union/b.txt
2
$ echo 1 > /tmp/union/b.txt
$ cat /tmp/union/b.txt
1
$ umount /dev/disk10         # unmount top volume (Volume1)
$ cat /tmp/union/b.txt       # check contents of b.txt in Volume2
2

We see that the bottom volume's b.txt is unchanged. Because b.txt did not exist in the union layer we were writing to (the top volume), our write created it there instead of modifying the copy below. If we delete a file that exists in the top two layers, the file in the topmost layer is deleted, and the one from the layer below shows up.

$ mount -t hfs -o union /dev/disk10 /tmp/union
$ cat /tmp/union/b.txt
1
$ rm /tmp/union/b.txt
$ cat /tmp/union/b.txt
2
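The layering rules seen in these sessions can be summarized in a small in-memory model. The following is only a sketch and says nothing about how the kernel implements union mounts: the types and function names (struct layer, union_push(), union_read(), and so on) are hypothetical, files are reduced to name/content string pairs, and removal of a name that exists only in a lower layer is simply refused because the text above does not describe that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FILES  8   /* arbitrary limits for this sketch */
#define MAX_LAYERS 4

struct file  { const char *name; const char *data; };
struct layer { struct file files[MAX_FILES]; size_t nfiles; };

/* layers[nlayers - 1] is the logically topmost (most recently mounted) layer. */
struct union_view { struct layer *layers[MAX_LAYERS]; size_t nlayers; };

static struct file *layer_find(struct layer *l, const char *name)
{
    for (size_t i = 0; i < l->nfiles; i++)
        if (strcmp(l->files[i].name, name) == 0)
            return &l->files[i];
    return NULL;
}

/* Analogous to "mount -o union": the new layer goes on top. */
void union_push(struct union_view *u, struct layer *l)
{
    assert(u->nlayers < MAX_LAYERS);
    u->layers[u->nlayers++] = l;
}

/* Analogous to unmounting the topmost layer. */
void union_pop(struct union_view *u)
{
    assert(u->nlayers > 0);
    u->nlayers--;
}

/* Lookup: the topmost layer that has the name wins; NULL if no layer has it. */
const char *union_read(const struct union_view *u, const char *name)
{
    for (size_t i = u->nlayers; i-- > 0; ) {
        struct file *f = layer_find(u->layers[i], name);
        if (f != NULL)
            return f->data;
    }
    return NULL;
}

/* Write: only the top layer is modified; the name is created there if absent. */
int union_write(struct union_view *u, const char *name, const char *data)
{
    if (u->nlayers == 0)
        return -1;
    struct layer *top = u->layers[u->nlayers - 1];
    struct file *f = layer_find(top, name);
    if (f == NULL) {
        if (top->nfiles == MAX_FILES)
            return -1;
        f = &top->files[top->nfiles++];
        f->name = name;
    }
    f->data = data;
    return 0;
}

/* Remove: drops the name from the top layer only (a simplification). */
int union_remove(struct union_view *u, const char *name)
{
    if (u->nlayers == 0)
        return -1;
    struct layer *top = u->layers[u->nlayers - 1];
    struct file *f = layer_find(top, name);
    if (f == NULL)
        return -1;
    *f = top->files[--top->nfiles];  /* fill the hole with the last entry */
    return 0;
}
```

With Volume2 pushed first and Volume1 pushed on top, union_read() of a.txt yields Volume1's copy, union_write() to b.txt creates a copy in Volume1 while leaving Volume2's untouched, and union_remove() of b.txt makes Volume2's copy visible again, mirroring the shell sessions.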

The /etc/rc startup script on the Mac OS X installer disc uses union mounting to mount RAM disks on top of directories that the installation process is likely to write to, such as /Volumes, /var/tmp, and /var/run.

11.7.19. volfs

The volume ID file system (volfs) is a virtual file system that exists over the VFS of another file system. It serves the needs of two different Mac OS X APIs: the POSIX API and the Carbon File Manager API. Whereas the POSIX API uses Unix-style pathnames, the Carbon API specifies a file system object by a triplet consisting of a volume ID, a containing folder ID, and a node name. volfs makes it possible to use the Carbon API atop a Unix-style file system.

By default, volfs is mounted on the /.vol directory. Each mounted volume is represented by a subdirectory under /.vol, provided the volume's file system supports volfs. HFS+ and HFS support volfs, whereas UFS does not.

In Mac OS X versions prior to 10.4, volfs is mounted by /etc/rc during system startup. Beginning with Mac OS X 10.4, it is mounted by launchd.

$ mount
/dev/disk1s3 on / (local, journaled)
devfs on /dev (local)
fdesc on /dev (union)
<volfs> on /.vol
...
$ ls -li /.vol
total 0
234881029 dr-xr-xr-x   2 root  wheel  64 Oct 23 18:33 234881029

/.vol in this example contains only one entry, which corresponds to the root volume. In general, reading directory entries at the topmost level in a volfs instance will return a list of all mounted volumes that support volfs. Each directory's name is the decimal representation of the corresponding device number (dev_t). Given a device's major and minor numbers, the value of dev_t can be constructed using the makedev() macro.

// <sys/types.h>
#define makedev(x,y)    ((dev_t)(((x) << 24) | (y)))

Let us compute the device number of the disk in our current example and verify that its volfs entry indeed has that name.

$ ls -l /dev/disk1s3
brw-r-----   1 root  operator   14,   5 Oct 23 18:33 /dev/disk1s3
$ perl -e 'my $x = (14 << 24) | 5; print "$x\n"'
234881029
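The same arithmetic can be written in C, together with its inverse, which recovers the major and minor numbers from a volfs directory name. This is a minimal sketch that mirrors the makedev() definition quoted above; it assumes a 32-bit device number with the major number in the top 8 bits, and the volfs_* names are hypothetical, chosen so as not to clash with the system's own macros.

```c
#include <assert.h>
#include <stdint.h>

/* Layout taken from the makedev() definition quoted in the text:
 * major number shifted left by 24 bits, minor number in the low 24 bits.
 * The 32-bit width of the device number is an assumption of this sketch. */
#define VOLFS_MINOR_BITS 24
#define VOLFS_MINOR_MASK 0xffffffu

/* Combine major and minor numbers into a device number. */
uint32_t volfs_makedev(uint32_t major_no, uint32_t minor_no)
{
    assert(major_no <= 0xff && minor_no <= VOLFS_MINOR_MASK);
    return (major_no << VOLFS_MINOR_BITS) | minor_no;
}

/* Recover the major number from a device number. */
uint32_t volfs_major(uint32_t dev)
{
    return dev >> VOLFS_MINOR_BITS;
}

/* Recover the minor number from a device number. */
uint32_t volfs_minor(uint32_t dev)
{
    return dev & VOLFS_MINOR_MASK;
}
```

For the disk in the example, volfs_makedev(14, 5) gives 234881029, and applying volfs_major() and volfs_minor() to that value gives back 14 and 5.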

If we know a file's ID and the volume ID of its containing volume, we can access the file through volfs. As we will see in Chapter 12, a file's inode number (as reported by ls -i) is its HFS+ file ID in most cases. Consider a file, say, /mach_kernel:

$ ls -li /mach_kernel
2150438 -rw-r--r--   1 root  wheel  4308960 Jul  2 22:28 /mach_kernel
$ ls -li /.vol/234881029/2150438
2150438 -rw-r--r--   1 root  wheel  4308960 Jul  2 22:28 /.vol/234881029/2150438

Similarly, all files and directories within the root file system are accessible using their file IDs through volfs. However, note that volfs vnodes exist only for the root of each volume; that is, the volfs hierarchy has only two levels. Reading directory entries within a /.vol subdirectory will return only the . and .. entries. In other words, you cannot enumerate the contents of a file system through volfs: you must know the ID of the target file system object to access it through volfs.

$ ls -lid /usr
11061 drwxr-xr-x   11 root  wheel  374 May 11 19:18 /usr
$ ls -las /.vol/234881029/usr
ls: /.vol/234881029/usr: No such file or directory
$ ls -las /.vol/234881029/11061
total 0
0 drwxr-xr-x    11 root  wheel    374 May 11 19:18 .
0 drwxrwxr-t    39 root  admin   1428 Oct 23 18:33 ..
0 drwxr-xr-x     8 root  wheel    272 Mar 27  2005 X11R6
0 drwxr-xr-x   736 root  wheel  25024 Oct 24 15:00 bin
...
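The volfs pathnames used in these examples follow a fixed pattern: /.vol, then the volume ID in decimal, then the file ID in decimal. A program could construct such a pathname as sketched below. The function names volfs_format() and volfs_path_for() are hypothetical, and the second function assumes that the st_dev and st_ino values reported by stat(2) correspond to the volume ID and the HFS+ file ID, which, as noted earlier, holds only in most cases. The resulting pathname is usable only on a system where volfs is mounted and the volume supports it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Writes "/.vol/<volume ID>/<file ID>" into buf.
 * Returns the length of the pathname, or -1 if buf is too small. */
int volfs_format(char *buf, size_t len,
                 unsigned long vol_id, unsigned long long file_id)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "/.vol/%lu/%llu", vol_id, file_id);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Derives a volfs-style pathname for an existing file from stat(2),
 * assuming st_dev is the volume ID and st_ino is the file ID.
 * Returns the length of the pathname, or -1 on failure. */
int volfs_path_for(const char *path, char *buf, size_t len)
{
    struct stat sb;

    if (stat(path, &sb) != 0)
        return -1;
    return volfs_format(buf, len, (unsigned long)sb.st_dev,
                        (unsigned long long)sb.st_ino);
}
```

Calling volfs_format() with the values from the /mach_kernel example (234881029 and 2150438) produces /.vol/234881029/2150438, the pathname passed to ls above.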

The /proc File System

Mac OS X does not provide the /proc file system. It does provide alternative interfaces such as sysctl(3) and the obsolete kvm(3). The sysctl(3) interface provides read and write access to a management information base (MIB) whose contents are various categories of kernel information, such as information related to file systems, virtual memory, networking, and debugging. As we saw in Chapter 8, the kvm(3) interface provides access to raw kernel memory.