11.6. The VFS Layer

Mac OS X provides a virtual file system interface, the vnode/vfs layer, often referred to simply as the VFS layer. First implemented by Sun Microsystems, the vnode/vfs concept is widely used by modern operating systems to allow multiple file systems to coexist in a clean and maintainable manner. A vnode (virtual node) is an in-kernel representation of a file, whereas a vfs (virtual file system) represents a file system. The VFS layer sits between the file-system-independent and file-system-dependent code in the kernel, thereby abstracting file system differences from the rest of the kernel, which uses VFS-layer functions to perform I/O regardless of the underlying file systems. Beginning with Mac OS X 10.4, a VFS kernel programming interface (KPI) is implemented in bsd/vfs/kpi_vfs.c.
The Mac OS X VFS is derived from FreeBSD's VFS, although there are numerous differences, usually minor in concept. An area of major difference is the file system layer's integration with virtual memory. The unified buffer cache (UBC) on Mac OS X is integrated with Mach's virtual memory layer. As we saw in Chapter 8, the ubc_info structure associates Mac OS X vnodes with the corresponding virtual memory objects.

Figure 11-12 shows a simplistic visualization of the vnode/vfs layer. In object-oriented parlance, the vfs is akin to an abstract base class from which specific file system instances such as HFS Plus and UFS are derived. Continuing with the analogy, the vfs "class" contains several pure virtual functions that are defined by the derived classes. The vfsops structure [bsd/sys/mount.h] acts as a function-pointer table for these functions, which include the following (listed in the order they appear in the structure):
Figure 11-12. An overview of the vnode/vfs layer's role in the operating system

Similarly, a vnode is an abstract base class from which files residing on various file systems are conceptually derived. A vnode contains all the information that the file-system-independent layer of the kernel needs. Just as the vfs has a set of virtual functions, a vnode too has a (larger) set of functions representing vnode operations. Normally, all vnodes representing files on a given file system type share the same function-pointer table.

As Figure 11-12 shows, a mount structure represents an instance of a mounted file system. Besides a pointer to the vfs operations table, the mount structure also contains a pointer (mnt_data) to instance-specific private data, which is private in that it is opaque to the file-system-independent code. For example, in the case of HFS Plus, mnt_data points to an hfsmount structure, which we will discuss in Chapter 12. Similarly, a vnode contains a private data pointer (v_data) that points to a file-system-specific per-file structure: for example, the cnode and inode structures in the case of HFS Plus and UFS, respectively.

Because of the arrangement shown in Figure 11-12, the code outside of the VFS layer usually need not worry about file system differences. Incoming file and file system operations are routed through the vnode and mount structures, respectively, to the appropriate file systems.
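The routing just described can be illustrated with a small user-space sketch. It is not kernel code: every name in it (sk_mount, sk_vfsops, the fsa_* and fsb_* "file systems," and the two operations) is hypothetical and chosen only for illustration. It assumes nothing more than the arrangement described above, namely a per-instance structure that holds a pointer to a function-pointer table and an opaque private-data pointer.

```c
/* A user-space sketch of the "abstract base class" arrangement.
 * All names here are hypothetical; none are the kernel's definitions. */

struct sk_mount;

/* Function-pointer table: the "pure virtual functions" of the sketch. */
struct sk_vfsops {
    const char *(*op_name)(struct sk_mount *mp);
    unsigned    (*op_block_size)(struct sk_mount *mp);
};

/* One instance per mounted file system. */
struct sk_mount {
    const struct sk_vfsops *mnt_op;   /* operations table for this type */
    void                   *mnt_data; /* opaque to independent code     */
};

/* Private per-mount data of two made-up file systems. */
struct fsa_private { unsigned block_size; };
struct fsb_private { unsigned frag_size; unsigned frags_per_block; };

static const char *fsa_name(struct sk_mount *mp) { (void)mp; return "fsa"; }
static const char *fsb_name(struct sk_mount *mp) { (void)mp; return "fsb"; }

/* Only file-system-specific code knows what mnt_data really points to. */
static unsigned fsa_block_size(struct sk_mount *mp)
{
    const struct fsa_private *p = mp->mnt_data;
    return p->block_size;
}

static unsigned fsb_block_size(struct sk_mount *mp)
{
    const struct fsb_private *p = mp->mnt_data;
    return p->frag_size * p->frags_per_block;
}

static const struct sk_vfsops fsa_ops = { fsa_name, fsa_block_size };
static const struct sk_vfsops fsb_ops = { fsb_name, fsb_block_size };

/* File-system-independent entry points: they never inspect mnt_data
 * and simply route the request through the operations table. */
const char *sk_vfs_name(struct sk_mount *mp)
{
    return mp->mnt_op->op_name(mp);
}

unsigned sk_vfs_block_size(struct sk_mount *mp)
{
    return mp->mnt_op->op_block_size(mp);
}
```

A caller would build an sk_mount whose mnt_op points at fsa_ops or fsb_ops and whose mnt_data points at the matching private structure, and then call only the sk_vfs_* functions. The same pattern applies, with a larger table, to per-file structures and their private data pointers.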
Technically, code outside the VFS layer should see the vnode and mount structures as opaque handles. The kernel uses vnode_t and mount_t, respectively, as the corresponding opaque types.

Figure 11-13 shows a more detailed view of key vnode/vfs data structures. The mountlist global variable is the head of a list of mount structures, one per mounted file system. Each mount structure has a list of associated vnodes. There are actually multiple lists: the mnt_workerqueue and mnt_newvnodes lists are used when iterating over all vnodes in the file system. Note that the details shown correspond to a mounted HFS Plus file system.

Figure 11-13. A mounted file system and its vnodes

The kernel maintains an in-memory vfstable structure [bsd/sys/mount_internal.h] for each file system type supported. The global variable vfsconf points to a list of these structures. When there is a mount request, the kernel searches this list to identify the appropriate file system. Figure 11-14 shows an overview of the vfsconf list, which is declared in bsd/vfs/vfs_conf.c.

Figure 11-14. Configuration information for file system types supported by the kernel

There also exists a user-visible vfsconf structure (not a list), which contains a subset of the information contained in the corresponding vfstable structure. The CTL_VFS → VFS_CONF sysctl operation can be used to retrieve the vfsconf structure for a given file system type. The program in Figure 11-15 retrieves and displays information about all file system types supported by the running kernel.

Figure 11-15. Displaying information about all available file system types
Note that the program output in Figure 11-15 would contain additional file system types if new file systems (such as MS-DOS and NTFS) were dynamically loaded into the kernel.

The vnode structure is declared in bsd/vfs/vnode_internal.h; its internals are private to the VFS layer, although the VFS KPI provides several functions to access and manipulate vnode structures. vnode_internal.h also declares the vnodeop_desc structure, an instance of which describes a single vnode operation such as "lookup," "create," and "open." The file bsd/vfs/vnode_if.c contains the declaration of a vnodeop_desc structure for each vnode operation known to the VFS layer, as shown in this example.

    struct vnodeop_desc vnop_mknod_desc = {
        0,            // offset in the operations vector (initialized by vfs_op_init())
        "vnop_mknod", // a human-readable name -- for debugging
        0 | VDESC_VP0_WILLRELE | VDESC_VPP_WILLRELE, // flags

        // various offsets used by the nullfs bypass routine (unused in Mac OS X)
        ...
    };
The shell script bsd/vfs/vnode_if.sh parses an input file (bsd/vfs/vnode_if.src) to automatically generate bsd/vfs/vnode_if.c and bsd/sys/vnode_if.h. The input file contains a specification of each vnode operation descriptor.

A vnodeop_desc structure is referred to by a vnodeopv_entry_desc [bsd/sys/vnode.h] structure, which represents a single entry in a vector of vnode operations.

    // bsd/sys/vnode.h

    struct vnodeopv_entry_desc {
        struct vnodeop_desc *opve_op;      // which operation this is
        int (*opve_impl)(void *);          // code implementing this operation
    };

The vnodeopv_desc structure [bsd/sys/vnode.h] describes a vector of vnode operations; it contains a pointer to a null-terminated list of vnodeopv_entry_desc structures.

    // bsd/sys/vnode.h

    struct vnodeopv_desc {
        int (***opv_desc_vector_p)(void *);
        struct vnodeopv_entry_desc *opv_desc_ops;
    };

Figure 11-16 shows how vnode operation data structures are maintained in the VFS layer. There is a vnodeopv_desc for each supported file system. The file bsd/vfs/vfs_conf.c declares a list of vnodeopv_desc structures for built-in file systems.

Figure 11-16. Vnode operations vectors in the VFS layer
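The relationship among the three structures can be hard to read from the declarations alone, especially the triple pointer. The following user-space sketch mirrors their shape with hypothetical sk_* and myfs_* names (none of them exist in the kernel) and shows two things under that assumption: that the vector pointer in the descriptor refers to a file-system-owned variable, and that the entry list is walked until its null terminator.

```c
/* User-space sketch; every identifier is hypothetical. */
#include <stddef.h>

typedef int (*sk_opfunc_t)(void *);

struct sk_op_desc {            /* describes one operation */
    int         offset;        /* slot in an operations vector */
    const char *name;          /* human-readable name */
};

struct sk_entry_desc {         /* one (operation, implementation) pair */
    struct sk_op_desc *op;
    sk_opfunc_t        impl;
};

struct sk_opv_desc {           /* describes one operations vector */
    sk_opfunc_t          **vector_p; /* same type as int (***)(void *) */
    struct sk_entry_desc  *ops;      /* null-terminated entry list */
};

static struct sk_op_desc sk_lookup_desc = { 0, "sk_lookup" };
static struct sk_op_desc sk_create_desc = { 0, "sk_create" };

static int myfs_lookup(void *args) { (void)args; return 1; }
static int myfs_create(void *args) { (void)args; return 2; }

static struct sk_entry_desc myfs_entries[] = {
    { &sk_lookup_desc, myfs_lookup },
    { &sk_create_desc, myfs_create },
    { NULL, NULL }                   /* terminator */
};

/* The file system owns this variable; the descriptor only points at it,
 * so whoever allocates the vector can store it here later. */
static sk_opfunc_t *myfs_vector;
static struct sk_opv_desc myfs_opv_desc = { &myfs_vector, myfs_entries };

/* Count the entries preceding the terminator. */
size_t sk_count_entries(const struct sk_opv_desc *d)
{
    size_t n = 0;
    while (d->ops[n].op != NULL)
        n++;
    return n;
}

/* Return the implementation registered for op, or NULL if none is. */
sk_opfunc_t sk_find_impl(const struct sk_opv_desc *d,
                         const struct sk_op_desc *op)
{
    const struct sk_entry_desc *e;
    for (e = d->ops; e->op != NULL; e++)
        if (e->op == op)
            return e->impl;
    return NULL;
}
```

The linear search in sk_find_impl() is used here only to make the list structure visible; it says nothing about how operations are actually dispatched at run time.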
Typically, each vnodeopv_desc is declared in a file-system-specific file. For example, bsd/hfs/hfs_vnops.c declares hfs_vnodeop_opv_desc.

    // bsd/hfs/hfs_vnops.c

    struct vnodeopv_desc hfs_vnodeop_opv_desc = {
        &hfs_vnodeop_p, hfs_vnodeop_entries
    };

hfs_vnodeop_entries, a null-terminated list of vnodeopv_entry_desc structures, is declared in bsd/hfs/hfs_vnops.c as well.

    // bsd/hfs/hfs_vnops.c

    #define VOPFUNC int (*)(void *)

    struct vnodeopv_entry_desc hfs_vnodeop_entries[] = {
        { &vnop_default_desc, (VOPFUNC)vn_default_error }, // default
        { &vnop_lookup_desc,  (VOPFUNC)hfs_vnop_lookup },  // lookup
        { &vnop_create_desc,  (VOPFUNC)hfs_vnop_create },  // create
        { &vnop_mknod_desc,   (VOPFUNC)hfs_vnop_mknod },   // mknod
        ...
        { NULL, (VOPFUNC)NULL }
    };

During bootstrapping, bsd_init() [bsd/kern/bsd_init.c] calls vfsinit() [bsd/vfs/vfs_init.c] to initialize the VFS layer. Section 5.7.2 enumerates the important operations performed by vfsinit(). It calls vfs_op_init() [bsd/vfs/vfs_init.c] to set known vnode operation vectors to an initial state.

    // bsd/vfs/vfs_init.c

    void
    vfs_op_init()
    {
        int i;

        // Initialize each vnode operation vector to NULL
        // struct vnodeopv_desc *vfs_opv_descs[]
        for (i = 0; vfs_opv_descs[i]; i++)
            *(vfs_opv_descs[i]->opv_desc_vector_p) = NULL;

        // Initialize the offset value in each vnode operation descriptor
        // struct vnodeop_desc *vfs_op_descs[]
        for (vfs_opv_numops = 0, i = 0; vfs_op_descs[i]; i++) {
            vfs_op_descs[i]->vdesc_offset = vfs_opv_numops;
            vfs_opv_numops++;
        }
    }

Next, vfsinit() calls vfs_opv_init() [bsd/vfs/vfs_init.c] to populate the operations vectors. vfs_opv_init() iterates over each element of vfs_opv_descs, checking whether the opv_desc_vector_p field of each entry points to NULL; if so, it allocates the vector before populating it. Figure 11-17 shows the operation of vfs_opv_init().

Figure 11-17. Initialization of vnode operations vectors during bootstrap
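The two initialization passes can be tried out in user space. The sketch below is not the kernel's code: the sk_* and myfs_* names are hypothetical, memory comes from malloc(), and the final step that gives unimplemented operations the routine registered for the default descriptor is an assumption of this sketch about what "populating" involves, not something the text above spells out.

```c
/* User-space sketch of two-pass vector setup; all names are hypothetical. */
#include <stdlib.h>

typedef int (*sk_opfunc_t)(void *);

struct sk_op_desc    { int offset; const char *name; };
struct sk_entry_desc { struct sk_op_desc *op; sk_opfunc_t impl; };
struct sk_opv_desc   { sk_opfunc_t **vector_p; struct sk_entry_desc *ops; };

/* Every operation known to the sketch, null-terminated. */
static struct sk_op_desc sk_default_desc = { 0, "default" };
static struct sk_op_desc sk_lookup_desc  = { 0, "lookup" };
static struct sk_op_desc sk_create_desc  = { 0, "create" };
static struct sk_op_desc sk_mknod_desc   = { 0, "mknod" };
static struct sk_op_desc *sk_op_descs[] = {
    &sk_default_desc, &sk_lookup_desc, &sk_create_desc, &sk_mknod_desc, NULL
};

static int sk_default_error(void *a) { (void)a; return -1; }
static int myfs_lookup(void *a)      { (void)a; return 10; }
static int myfs_create(void *a)      { (void)a; return 20; }

/* A made-up file system that implements lookup and create, not mknod. */
static struct sk_entry_desc myfs_entries[] = {
    { &sk_default_desc, sk_default_error },
    { &sk_lookup_desc,  myfs_lookup },
    { &sk_create_desc,  myfs_create },
    { NULL, NULL }
};
static sk_opfunc_t *myfs_vector;
static struct sk_opv_desc myfs_opv_desc = { &myfs_vector, myfs_entries };
static struct sk_opv_desc *sk_opv_descs[] = { &myfs_opv_desc, NULL };

static int sk_numops;

/* Pass 1: clear every vector pointer and number the operations. */
void sk_op_init(void)
{
    int i;
    for (i = 0; sk_opv_descs[i]; i++)
        *(sk_opv_descs[i]->vector_p) = NULL;
    for (sk_numops = 0, i = 0; sk_op_descs[i]; i++)
        sk_op_descs[i]->offset = sk_numops++;
}

/* Pass 2: allocate each missing vector and fill it from its entry list. */
int sk_opv_init(void)
{
    int i, k;
    for (i = 0; sk_opv_descs[i]; i++) {
        sk_opfunc_t **vp = sk_opv_descs[i]->vector_p;
        struct sk_entry_desc *e;
        sk_opfunc_t def;

        if (*vp == NULL) {
            *vp = malloc((size_t)sk_numops * sizeof(sk_opfunc_t));
            if (*vp == NULL)
                return -1;
            for (k = 0; k < sk_numops; k++)
                (*vp)[k] = NULL;
        }
        for (e = sk_opv_descs[i]->ops; e->op != NULL; e++)
            (*vp)[e->op->offset] = e->impl;

        /* Assumption of this sketch: empty slots get the default routine. */
        def = (*vp)[sk_default_desc.offset];
        for (k = 0; k < sk_numops; k++)
            if ((*vp)[k] == NULL)
                (*vp)[k] = def;
    }
    return 0;
}

/* Dispatch: index the vector by the descriptor's assigned offset. */
int sk_call(sk_opfunc_t *vector, struct sk_op_desc *op, void *args)
{
    return vector[op->offset](args);
}
```

After sk_op_init() and sk_opv_init() have run, sk_call(myfs_vector, &sk_lookup_desc, NULL) reaches myfs_lookup(), while a call through sk_mknod_desc lands in sk_default_error(). Numbering the descriptors first is what lets every vector share one layout, so a single offset works for any file system.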
Figure 11-16 shows an interesting feature of the FreeBSD-derived VFS layer: There can be multiple vnode operations vectors for a given vnodeopv_desc.