14.5. The Virtual File System (vfs) Interface

The vfs layer provides an administrative interface into the file system to support commands like mount and umount in a file-system-independent manner. The interface achieves independence by means of a virtual file system (vfs) object. The vfs object represents an encapsulation of a file system's state and a set of methods for each of the file system administrative interfaces. Each file system type provides its own implementation of the object. Figure 14.4 illustrates the vfs object. A set of support functions provides access to the contents of the vfs structure; file systems should not directly modify the vfs object contents.

Figure 14.4. The vfs Object

14.5.1. vfs Methods

The methods within the file system implement operations on behalf of the common operating system code. For example, given a pointer to a tmpfs vfs object, the generic VFS_MOUNT() call invokes the appropriate function in the underlying file system by calling the tmp_mount() method defined within that instance of the object.

    #define VFS_MOUNT(vfsp, mvp, uap, cr) fsop_mount(vfsp, mvp, uap, cr)

    int
    fsop_mount(vfs_t *vfsp, vnode_t *mvp, struct mounta *uap, cred_t *cr)
    {
        return (*(vfsp)->vfs_op->vfs_mount)(vfsp, mvp, uap, cr);
    }
                                        See usr/src/uts/common/sys/vfs.h

A file system declares its vfs methods through a call to vfs_setfsops(). A template allows a selection of methods to be defined, according to Table 14.1.

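The indirection behind VFS_MOUNT() can be illustrated outside the kernel. The following is a minimal user-space sketch, not kernel code: every demo_ and DEMO_ name is hypothetical, the structures are reduced to a single method and a single state field, and the method signature is simplified to take only a mount point string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-ins; the kernel types are far richer. */
typedef struct demo_vfs demo_vfs_t;
typedef int (*demo_mount_fn)(demo_vfs_t *vfsp, const char *dir);

typedef struct demo_vfsops {
	demo_mount_fn vfs_mount;	/* one slot per administrative method */
} demo_vfsops_t;

struct demo_vfs {
	const demo_vfsops_t *vfs_op;	/* per-file-system method table */
	char vfs_mntpt[64];		/* state filled in by the method */
};

/* Generic entry point: the caller never names the file system. */
#define	DEMO_VFS_MOUNT(vfsp, dir)	demo_fsop_mount(vfsp, dir)

int
demo_fsop_mount(demo_vfs_t *vfsp, const char *dir)
{
	assert(vfsp != NULL && vfsp->vfs_op != NULL);
	/* Dispatch through the operations vector of this instance. */
	return ((*vfsp->vfs_op->vfs_mount)(vfsp, dir));
}

/* One file system's implementation of the mount method. */
int
demo_tmp_mount(demo_vfs_t *vfsp, const char *dir)
{
	if (strlen(dir) >= sizeof (vfsp->vfs_mntpt))
		return (-1);		/* mount point too long for the demo */
	strcpy(vfsp->vfs_mntpt, dir);
	return (0);
}

/* The method table that an instance of this file system points at. */
const demo_vfsops_t demo_tmp_vfsops = { demo_tmp_mount };
```

A caller would initialize a demo_vfs_t with &demo_tmp_vfsops in its vfs_op field and then call DEMO_VFS_MOUNT(&v, "/tmp"). The generic code only ever sees the macro, so a different file system can be substituted by pointing vfs_op at a different table.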
A regular file system will define mount, unmount, root, statvfs, and vget methods. The vfs methods are defined in an fs_operation_def_t template, terminated by a NULL entry. The template is constructed from an array of fs_operation_def_t structures. The following example from the tmpfs implementation shows how the template is initialized and then instantiated with vfs_setfsops(). The call to vfs_setfsops() is typically done once per module initialization, systemwide.

    static int
    tmpfsinit(int fstype, char *name)
    {
        static const fs_operation_def_t tmp_vfsops_template[] = {
            VFSNAME_MOUNT, tmp_mount,
            VFSNAME_UNMOUNT, tmp_unmount,
            VFSNAME_ROOT, tmp_root,
            VFSNAME_STATVFS, tmp_statvfs,
            VFSNAME_VGET, tmp_vget,
            NULL, NULL
        };
        int error;

        error = vfs_setfsops(fstype, tmp_vfsops_template, NULL);
        ...
    }
                                        See usr/src/uts/common/fs/tmpfs/tmp_vfsops.c

A corresponding free of the vfs methods is required at module unload time and is typically located in the _fini() function of the module.

    int
    _fini()
    {
        int error;

        error = mod_remove(&modlinkage);
        if (error)
            return (error);
        /*
         * Tear down the operations vectors
         */
        (void) vfs_freevfsops_by_type(tmpfsfstype);
        vn_freevnodeops(tmp_vnodeops);
        return (0);
    }
                                        See usr/src/uts/common/fs/tmpfs/tmp_vfsops.c

The following routines are available in the vfs layer to manipulate the vfs object. They provide support for creating and modifying the FS methods (fsops).

    /*
     * File systems use arrays of fs_operation_def structures to form
     * name/value pairs of operations. These arrays get passed to:
     *
     * - vn_make_ops() to create vnodeops
     * - vfs_makefsops()/vfs_setfsops() to create vfsops.
     */
    typedef struct fs_operation_def {
        char *name;                 /* name of operation (NULL at end) */
        fs_generic_func_p func;     /* function implementing operation */
    } fs_operation_def_t;

    int vfs_makefsops(const fs_operation_def_t *template, vfsops_t **actual);

Creates and builds (dummy) vfsops structures.

    void vfs_setops(vfs_t *vfsp, vfsops_t *vfsops);

Sets the operations vector for this vfs.

    vfsops_t *vfs_getops(vfs_t *vfsp);

Retrieves the operations vector for this vfs.

    void vfs_freevfsops(vfsops_t *vfsops);

Frees a vfsops structure created by vfs_makefsops().

    int vfs_freevfsops_by_type(int fstype);

Frees a vfsops structure created by vfs_setfsops(); use this routine rather than vfs_freevfsops() in that case.

    int vfs_matchops(vfs_t *vfsp, vfsops_t *vfsops);

Determines if the supplied operations vector matches the vfs's operations vector. Note that this is a "shallow" match. The pointer to the operations vector is compared, not each individual operation.

                                        See usr/src/uts/common/sys/vfs.h

14.5.2. vfs Support Functions

The following support functions are available for parsing option strings and filling in the necessary vfs structure fields. The file systems also need to parse the option strings to learn what options should be used in completing the mount request. The routines and data structures are all defined in the vfs.h header file.

It is expected that all the fields used by the file-system-specific mount code in the vfs structure are normally filled in and interrogated only during a mount system call. At mount time the vfs structure is private and not available to any other parts of the kernel, so during this time, locking of the fields used in mnttab/options is not necessary. If a file system wants to update or interrogate options at some later time, then the vfs structure should be locked with the vfs_lock_wait()/vfs_unlock() functions.

All memory allocated by the following routines is freed at umount time, so callers need not worry about memory leakage.
Any arguments whose values are preserved in a structure after a call have been copied, so callers need not worry about retained references to any function arguments.

    struct mntopts_t *vfs_opttblptr(struct vfs *vfsp);

Returns a pointer to the mount options table for the given vfs structure.

    void vfs_initopttbl(const mntopts_t *proto, mntopts_t *tbl);

Initializes a mount options table from the prototype mount options table pointed to by the first argument. A file system should always initialize the mount options table in the vfs structure for the current mount but may use this routine to initialize other tables if desired. See the documentation below on how to construct a prototype mount options table. Note that the vfs_opttblptr() function described above should be used to access the vfs structure's mount options table.

    void vfs_parsemntopts(mntopts_t *tbl, char *optionstr);

Parses the option string pointed to by the second argument, using the mount options table pointed to by the first argument. Any recognized options are marked by this function as set in the pointed-to options table, and any arguments found are recorded there as well. Normally file systems would call this with a pointer to the mount options table in the vfs structure for the mount currently being processed. The mount options table may be examined after the parse is completed, to see which options have been recognized, by using the vfs_optionisset() function documented below. Note that the parser will alter the option string during parsing, but will restore it before returning.

Any options in the option string being parsed that are not recognized are silently ignored. Also, if an option requires an argument but none is supplied, the argument pointer is silently set to NULL. Since options are parsed from left to right, the last specification for any particular option in the option string is the one used. Similarly, if options that toggle each other on or off (that is, options that are mutually exclusive) appear in the same option string, the last one seen in left-to-right parsing determines the state of the affected option(s).

    void vfs_clearmntopt(mntopts_t *tbl, const char *opt);

Clears the option whose name is passed in the second argument from the option table pointed to by the first argument; that is, marks the option as not set and frees any argument that may be associated with the option. Used by file systems to unset options if so desired in a mount options table. Note that the only way to return options to their default state is to reinitialize the options table with vfs_initopttbl().

    void vfs_setmntopt(mntopts_t *tbl, const char *opt, const char *arg, int flags);

Marks the option whose name is given by the second argument as set in the mount options table pointed to by the first argument. If the option takes an argument, the third parameter points to the string for the argument. The flags argument is provided to affect the behavior of the vfs_setmntopt() function. It can cause the function to override the MO_IGNORE flag if the particular option being set has this flag enabled. It can also be used to request toggling the MO_NODISPLAY bit for the option on or off (see the documentation for mount option tables). Used by file systems to manually mark options as set in a mount options table. Possible flags to vfs_setmntopt():

    VFS_DISPLAY     0x02    /* Turn off MO_NODISPLAY bit for option */
    VFS_NODISPLAY   0x04    /* Turn on MO_NODISPLAY bit for option */

    int vfs_optionisset(mntopts_t *tbl, const char *opt, char **argp);

Inquires if the option named by the second argument is marked as set in the mount options table pointed to by the first argument. Returns non-zero if the option was set. If the option has an argument string, the pointer that argp points to is filled in with a pointer to the argument string for the option. The pointer is to the saved argument string and not to a copy. Users should not directly alter the pointed-to string.
If any change is desired to the argument string, the caller should use the vfs_setmntopt()/vfs_clearmntopt() functions.

    int vfs_buildoptionstr(mntopts_t *tbl, char *buf, int len);

Builds a comma-separated, null-terminated string of the mount options that are set in the table passed in the first argument. The buffer passed in the second argument is filled in with the generated options string. If the length passed in the third argument would be exceeded, the function returns EOVERFLOW; otherwise, it returns zero on success. If an error is returned, the contents of the result buffer are undefined.

    int vfs_setoptprivate(mntopts_t *tbl, const char *opt, void *arg);

Sets the private data field of the given option in the specified option table to the provided value. Returns zero on success, non-zero if the named option does not exist in the table. Note that option private data is not managed for the user. If the private data field is a pointer to allocated memory, then it should be freed by the file system code prior to returning from a umount call.

    int vfs_getoptprivate(mntopts_t *tbl, const char *opt, void **argp);

Fills in the pointer pointed to by the argp pointer with the value of the private data field of the given option in the specified table. Returns zero on success, non-zero if the named option does not exist in the table.

    void vfs_setmntpoint(struct vfs *vfsp, char *mp);

Sets the vfs_mntpt field of the vfs structure to the given mount point. File systems call this if they want some value there other than what was passed by the mount system call.

    int vfs_can_sync(vfs_t *vfsp);

Determines if a vfs has an FS-supplied (non-default, non-error) sync routine.

    void vfs_setresource(struct vfs *vfsp, char *resource);

Sets the vfs_resource field of the vfs structure to the given resource. File systems call this if they want some value there other than what was passed by the mount system call.

                                        See usr/src/uts/common/sys/vfs.h

14.5.3. The mount Method

The mount method is responsible for initializing a per-mount instance of a file system. It is typically invoked as a result of a user-initiated mount command.

Figure 14.5. Mount Invocation

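A mount method usually consults the parsed mount options before it commits to a mount, so the parsing rules from Section 14.5.2 matter here: options are processed left to right, unrecognized options are ignored, the last setting of an option wins, and mutually exclusive options cancel each other. The following user-space sketch illustrates only those rules. It is not the kernel parser: the demo_ names, the fixed-size argument buffer, and the table layout are all assumptions made for illustration and do not reflect the real mntopts_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define	DEMO_MAXARG	32

/* Hypothetical option-table entry; not the kernel's mount option type. */
typedef struct demo_opt {
	const char *name;	/* option name, e.g. "size" */
	const char *cancels;	/* option switched off when this one is set */
	int is_set;
	int has_arg;		/* nonzero if arg[] holds a value */
	char arg[DEMO_MAXARG];
} demo_opt_t;

demo_opt_t *
demo_find(demo_opt_t *tbl, size_t n, const char *name, size_t len)
{
	for (size_t i = 0; i < n; i++) {
		if (strlen(tbl[i].name) == len &&
		    strncmp(tbl[i].name, name, len) == 0)
			return (&tbl[i]);
	}
	return (NULL);
}

/*
 * Parse "opt,opt=arg,..." left to right. Unknown options are skipped,
 * a later setting overrides an earlier one, and setting an option
 * clears the option it cancels.
 */
void
demo_parse(demo_opt_t *tbl, size_t n, const char *str)
{
	assert(tbl != NULL && str != NULL);
	while (*str != '\0') {
		size_t tok = strcspn(str, ",");	/* one "name[=arg]" token */
		const char *eq = memchr(str, '=', tok);
		size_t namelen = (eq != NULL) ? (size_t)(eq - str) : tok;
		demo_opt_t *o = demo_find(tbl, n, str, namelen);

		if (o != NULL) {
			o->is_set = 1;
			o->has_arg = 0;
			o->arg[0] = '\0';
			if (eq != NULL) {
				size_t alen = tok - namelen - 1;
				if (alen >= DEMO_MAXARG)
					alen = DEMO_MAXARG - 1;
				memcpy(o->arg, eq + 1, alen);
				o->arg[alen] = '\0';
				o->has_arg = 1;
			}
			if (o->cancels != NULL) {
				demo_opt_t *c = demo_find(tbl, n,
				    o->cancels, strlen(o->cancels));
				if (c != NULL)
					c->is_set = 0;
			}
		}
		str += tok;
		if (*str == ',')
			str++;
	}
}

/* Report whether an option is set; hand back its argument or NULL. */
int
demo_isset(demo_opt_t *tbl, size_t n, const char *name, const char **argp)
{
	demo_opt_t *o = demo_find(tbl, n, name, strlen(name));

	if (o == NULL || !o->is_set)
		return (0);
	if (argp != NULL)
		*argp = o->has_arg ? o->arg : NULL;
	return (1);
}
```

With a three-entry table for ro, rw, and size (where ro and rw cancel each other), parsing "ro,bogus,size=1m,rw,size=2m" leaves rw set, ro clear, and size set with the argument "2m". A mount method that cannot honor an option, as tmpfs cannot honor read-only mounts, would check the table at this point and fail the mount.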
The tasks completed in the mount method will often include
An excerpt from the tmpfs implementation shows an example of the main functions within a file system mount method.

    static int
    tmp_mount(
        struct vfs *vfsp,
        struct vnode *mvp,
        struct mounta *uap,
        struct cred *cr)
    {
        struct tmount *tm = NULL;
        ...
        /* security check: is the caller allowed to mount here? */
        if ((error = secpolicy_fs_mount(cr, mvp, vfsp)) != 0)
            return (error);

        /* the mount point must be a directory */
        if (mvp->v_type != VDIR)
            return (ENOTDIR);

        /* tmpfs doesn't support read-only mounts */
        if (vfs_optionisset(vfsp, MNTOPT_RO, NULL)) {
            error = EINVAL;
            goto out;
        }
        ...
        if (error = pn_get(uap->dir,
            (uap->flags & MS_SYSSPACE) ? UIO_SYSSPACE : UIO_USERSPACE, &dpn))
            goto out;

        /* allocate the per-mount state */
        if ((tm = tmp_memalloc(sizeof (struct tmount), 0)) == NULL) {
            pn_free(&dpn);
            error = ENOMEM;
            goto out;
        }
        ...
        /* assign the device number before it is copied into the vfs */
        tm->tm_dev = makedevice(tmpfs_major, tmpfs_minor);
        ...
        vfsp->vfs_data = (caddr_t)tm;
        vfsp->vfs_fstype = tmpfsfstype;
        vfsp->vfs_dev = tm->tm_dev;
        vfsp->vfs_bsize = PAGESIZE;
        vfsp->vfs_flag |= VFS_NOTRUNC;
        vfs_make_fsid(&vfsp->vfs_fsid, tm->tm_dev, tmpfsfstype);
        ...
                                        See usr/src/uts/common/fs/tmpfs/tmp_vfsops.c

14.5.4. The umount Method

The umount method is almost the reverse of mount. The tasks completed in the umount method will often include
14.5.5. Root vnode Identification

The root method of the file system is a simple function used by the file system lookup functions when traversing across a mount point into a new file system. It simply returns a pointer to the root vnode in the supplied vnode pointer argument.

    static int
    tmp_root(struct vfs *vfsp, struct vnode **vpp)
    {
        struct tmount *tm = (struct tmount *)VFSTOTM(vfsp);
        struct tmpnode *tp = tm->tm_rootnode;
        struct vnode *vp;

        ASSERT(tp);

        vp = TNTOV(tp);
        VN_HOLD(vp);
        *vpp = vp;
        return (0);
    }
                                        See usr/src/uts/common/fs/tmpfs/tmp_vfsops.c

14.5.6. vfs Information Available with MDB

The mounted list of vfs objects is linked as shown in Figure 14.6.

Figure 14.6. The Mounted vfs List

You can traverse the list with an mdb walker. Below is the output of such a traversal.

    sol10# mdb -k
    > ::walk vfs
    fffffffffbc7a7a0
    fffffffffbc7a860
    > ::walk vfs |::fsinfo -v
                VFSP FS              MOUNT
    fffffffffbc7a7a0 ufs             /
                R: /dev/dsk/c3d1s0
                O: remount,rw,intr,largefiles,logging,noquota,xattr,nodfratime
    fffffffffbc7a860 devfs           /devices
                R: /devices
    ffffffff80129300 ctfs            /system/contract
                R: ctfs
    ffffffff80129240 proc            /proc
                R: proc

You can also inspect a vfs object with mdb. An example is shown below.

    sol10# mdb -k
    > ::walk vfs
    fffffffffbc7a7a0
    fffffffffbc7a860
    > fffffffffbc7a7a0::print vfs_t
    {
        vfs_next = devices
        vfs_prev = 0xffffffffba3ef0c0
        vfs_op = vfssw+0x138
        vfs_vnodecovered = 0
        vfs_flag = 0x420
        vfs_bsize = 0x2000
        vfs_fstype = 0x2
        vfs_fsid = {
            val = [ 0x19800c0, 0x2 ]
        }
        vfs_data = 0xffffffff8010ae00
        vfs_dev = 0x66000000c0
        vfs_bcount = 0
        vfs_list = 0
        vfs_hash = 0xffffffff816a8b40
        vfs_reflock = {
            _opaque = [ 0, 0 ]
        }
        vfs_count = 0x2
        vfs_mntopts = {
            mo_count = 0x20
            mo_list = 0xffffffff8133d580
        }
        vfs_resource = 0xffffffff8176dbb8
        vfs_mntpt = 0xffffffff81708590
        vfs_mtime = 2005 May 17 23:47:13
        vfs_femhead = 0
        vfs_zone = zone0
        vfs_zone_next = devices
        vfs_zone_prev = 0xffffffffba3ef0c0
    }
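
The vfs_next and vfs_prev fields in the ::print output are the links that a list walker follows. The sketch below shows such a traversal in user space. It assumes the mounted list is circular and doubly linked, which the output above suggests (the root entry has a non-NULL vfs_prev) but which the text does not state; the demo_ types and functions are hypothetical and carry only the linkage and a mount point string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a vfs: only the list linkage and a tag. */
typedef struct demo_vfs {
	struct demo_vfs *vfs_next;
	struct demo_vfs *vfs_prev;
	const char *mntpt;
} demo_vfs_t;

/* Turn a single node into a one-element circular list. */
void
demo_list_init(demo_vfs_t *root, const char *mntpt)
{
	root->vfs_next = root->vfs_prev = root;
	root->mntpt = mntpt;
}

/* Link vfsp in just before root, that is, at the tail of the list. */
void
demo_list_add(demo_vfs_t *root, demo_vfs_t *vfsp, const char *mntpt)
{
	vfsp->mntpt = mntpt;
	vfsp->vfs_next = root;
	vfsp->vfs_prev = root->vfs_prev;
	root->vfs_prev->vfs_next = vfsp;
	root->vfs_prev = vfsp;
}

/*
 * Visit every node exactly once, starting at root and stopping when
 * the walk arrives back at root. Returns the number of nodes seen.
 */
size_t
demo_list_walk(demo_vfs_t *root, void (*visit)(const demo_vfs_t *))
{
	size_t n = 0;
	demo_vfs_t *vfsp = root;

	assert(root != NULL);
	do {
		if (visit != NULL)
			visit(vfsp);
		n++;
		vfsp = vfsp->vfs_next;
	} while (vfsp != root);
	return (n);
}
```

Building a list with a root entry for / and then adding /devices and /proc gives a walk of three nodes, with the root's vfs_prev pointing at the most recently added entry. Passing a visit callback that prints the mntpt field would produce a listing in mount order, loosely comparable to the ::walk vfs output.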