14.5. The Virtual File System (vfs) Interface

The vfs layer provides an administrative interface into the file system to support commands like mount and umount in a file-system-independent manner. The interface achieves independence by means of a virtual file system (vfs) object. The vfs object represents an encapsulation of a file system's state and a set of methods for each of the file system administrative interfaces. Each file system type provides its own implementation of the object. Figure 14.4 illustrates the vfs object. A set of support functions provides access to the contents of the vfs structure; file systems should not directly modify the vfs object contents.

Figure 14.4. The vfs Object


14.5.1. vfs Methods

The methods within the file system implement operations on behalf of the common operating system code. For example, given a pointer to a tmpfs vfs object, the generic VFS_MOUNT() call invokes the appropriate function in the underlying file system by calling the tmp_mount() method defined within that instance of the object.

#define VFS_MOUNT(vfsp, mvp, uap, cr) fsop_mount(vfsp, mvp, uap, cr)

int
fsop_mount(vfs_t *vfsp, vnode_t *mvp, struct mounta *uap, cred_t *cr)
{
        return (*(vfsp)->vfs_op->vfs_mount)(vfsp, mvp, uap, cr);
}
                                        See usr/src/uts/common/sys/vfs.h
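The indirection through the operations vector can be tried outside the kernel. The following is a minimal user-space sketch, not kernel code: every demo_ name is hypothetical, the argument list is reduced to a mount point string, and the return values are placeholders. It shows only the pattern of a generic macro calling through a per-object function pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
struct demo_vfs;

typedef struct demo_vfsops {
	int (*vfs_mount)(struct demo_vfs *vfsp, const char *mntpt);
} demo_vfsops_t;

typedef struct demo_vfs {
	demo_vfsops_t *vfs_op;	/* per-file-system operations vector */
	int vfs_mounted;	/* set by the file system's mount method */
} demo_vfs_t;

/* Generic entry point: indirect through the object's operations vector. */
static int
demo_fsop_mount(demo_vfs_t *vfsp, const char *mntpt)
{
	assert(vfsp != NULL && vfsp->vfs_op != NULL);
	return ((*vfsp->vfs_op->vfs_mount)(vfsp, mntpt));
}

#define	DEMO_VFS_MOUNT(vfsp, mntpt)	demo_fsop_mount(vfsp, mntpt)

/* A file-system-specific method, playing the role of tmp_mount(). */
static int
demo_tmp_mount(demo_vfs_t *vfsp, const char *mntpt)
{
	if (mntpt == NULL || mntpt[0] != '/')
		return (-1);	/* placeholder error value */
	vfsp->vfs_mounted = 1;
	return (0);
}

/* The operations vector that a tmpfs-like file system would supply. */
static demo_vfsops_t demo_tmp_vfsops = { demo_tmp_mount };
```

A caller that holds a demo_vfs_t whose vfs_op points at demo_tmp_vfsops invokes DEMO_VFS_MOUNT(&v, "/tmp") without knowing which file system implements the method.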


A file system declares its vfs methods through a call to vfs_setfsops(). A template allows a selection of methods to be defined, according to Table 14.1.

Table 14.1. Solaris 10 vfs Interface Methods from sys/vfs.h

Method

Description

VFS_MOUNT

Mounts a file system on the supplied vnode. The file-system-dependent part of mount includes these actions.

  • Determine if mount device is appropriate.

  • Prepare mount device (e.g., flush pages/blocks).

  • Read file-system-dependent data from mount device.

  • Sanity-check file-system-dependent data.

  • Create/initialize file-system-dependent kernel data structures.

  • Reconcile any transaction devices.

VFS_UNMOUNT

Unmounts the file system. The file-system-dependent part of unmount includes these actions.

  • Lock out new transactions and complete current transactions.

  • Flush data to mount device.

  • Close down any helper threads.

  • Tear down file-system-dependent kernel data structures.

VFS_ROOT

Finds the root vnode for a file system.

VFS_STATVFS

Queries statistics on a file system.

VFS_SYNC

Flushes the file system cache.

VFS_VGET

Finds a vnode that matches a unique file ID.

VFS_MOUNTROOT

Mounts the file system on the root directory.

VFS_FREEVFS

Calls back to free resources after last unmount. NFS appears to be the only one that needs this. All others default to fs_freevfs(), which is a no-op.

VFS_VNSTATE

Interface for vnode life cycle reporting.


A regular file system will define mount, unmount, root, statvfs, and vget methods. The vfs methods are defined in a template: an array of fs_operation_def_t structures, terminated by a NULL entry. The following example from the tmpfs implementation shows how the template is initialized and then instantiated with vfs_setfsops(). The call to vfs_setfsops() is typically made once, systemwide, at module initialization.

static int
tmpfsinit(int fstype, char *name)
{
        static const fs_operation_def_t tmp_vfsops_template[] = {
                VFSNAME_MOUNT, tmp_mount,
                VFSNAME_UNMOUNT, tmp_unmount,
                VFSNAME_ROOT, tmp_root,
                VFSNAME_STATVFS, tmp_statvfs,
                VFSNAME_VGET, tmp_vget,
                NULL, NULL
        };
        int error;

        error = vfs_setfsops(fstype, tmp_vfsops_template, NULL);
...
}
                                        See usr/src/uts/common/fs/tmpfs/tmp_vfsops.c
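To illustrate what a NULL-terminated name/function template buys, here is a minimal user-space sketch of turning such a template into an operations vector. It is not the kernel's vfs_makefsops() or vfs_setfsops(): the demo_ names, the three-slot vector, the default routine, and the handling of unknown names are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*demo_func_t)(void);

/* Name/function pair, modeled loosely on fs_operation_def_t. */
typedef struct demo_opdef {
	const char *name;	/* operation name, NULL at end of template */
	demo_func_t func;	/* function implementing the operation */
} demo_opdef_t;

/* Hypothetical operations vector with three slots. */
typedef struct demo_ops {
	demo_func_t mount;
	demo_func_t unmount;
	demo_func_t sync;
} demo_ops_t;

static int demo_default(void) { return (-1); }	/* fallback for unset slots */
static int demo_mount(void) { return (1); }
static int demo_unmount(void) { return (2); }

/*
 * Fill *ops from a NULL-terminated template.  Slots the template does not
 * name keep the default.  Returns 0, or -1 on an unknown operation name
 * (an assumed policy, chosen only for this sketch).
 */
static int
demo_makeops(const demo_opdef_t *tmpl, demo_ops_t *ops)
{
	assert(tmpl != NULL && ops != NULL);
	ops->mount = ops->unmount = ops->sync = demo_default;
	for (; tmpl->name != NULL; tmpl++) {
		if (strcmp(tmpl->name, "mount") == 0)
			ops->mount = tmpl->func;
		else if (strcmp(tmpl->name, "unmount") == 0)
			ops->unmount = tmpl->func;
		else if (strcmp(tmpl->name, "sync") == 0)
			ops->sync = tmpl->func;
		else
			return (-1);
	}
	return (0);
}

/* A template that defines only two of the three operations. */
static const demo_opdef_t demo_template[] = {
	{ "mount", demo_mount },
	{ "unmount", demo_unmount },
	{ NULL, NULL }
};
```

Because operations are matched by name, a file system lists only the methods it implements, and the layout of the vector can change without changing every template.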


A corresponding free of the vfs methods is required at module unload time and is typically located in the _fini() function of the module.

int
_fini()
{
        int error;

        error = mod_remove(&modlinkage);
        if (error)
                return (error);
        /*
         * Tear down the operations vectors
         */
        (void) vfs_freevfsops_by_type(tmpfsfstype);
        vn_freevnodeops(tmp_vnodeops);
        return (0);
}
                                        See usr/src/uts/common/fs/tmpfs/tmp_vfsops.c


The following routines are available in the vfs layer to manipulate the vfs object. They provide support for creating and modifying the file system methods (fsops).

/*
 * File systems use arrays of fs_operation_def structures to form
 * name/value pairs of operations.  These arrays get passed to:
 *
 *      - vn_make_ops() to create vnodeops
 *      - vfs_makefsops()/vfs_setfsops() to create vfsops.
 */
typedef struct fs_operation_def {
        char *name;                      /* name of operation (NULL at end) */
        fs_generic_func_p func;          /* function implementing operation */
} fs_operation_def_t;

int vfs_makefsops(const fs_operation_def_t *template, vfsops_t **actual);
    Creates and builds (dummy) vfsops structures

void vfs_setops(vfs_t *vfsp, vfsops_t *vfsops);
    Sets the operations vector for this vfs

vfsops_t * vfs_getops(vfs_t *vfsp);
    Retrieves the operations vector for this vfs

void vfs_freevfsops(vfsops_t *vfsops);
    Frees a vfsops structure created by vfs_makefsops()

int vfs_freevfsops_by_type(int fstype);
    For a vfsops structure created by vfs_setfsops(), use vfs_freevfsops_by_type()

int vfs_matchops(vfs_t *vfsp, vfsops_t *vfsops);
    Determines if the supplied operations vector matches the vfs's operations
    vector. Note that this is a "shallow" match. The pointer to the operations
    vector is compared, not each individual operation.

                                        See usr/src/uts/common/sys/vfs.h
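The "shallow" match described for vfs_matchops() is a comparison of pointers to vectors. A minimal user-space sketch follows; the demo_ types are hypothetical and contain a single method, and the code is an illustration of the idea, not the kernel routine.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_vfsops {
	int (*vfs_sync)(void);
} demo_vfsops_t;

typedef struct demo_vfs {
	demo_vfsops_t *vfs_op;
} demo_vfs_t;

static int demo_sync(void) { return (0); }

/* Shallow match: compare the vector pointers, not the function pointers. */
static int
demo_matchops(const demo_vfs_t *vfsp, const demo_vfsops_t *vfsops)
{
	assert(vfsp != NULL);
	return (vfsp->vfs_op == vfsops);
}
```

Two distinct vectors that hold identical function pointers therefore do not match, which is why the comparison is useful for asking "does this vfs belong to my file system type?" rather than "does it behave the same way?".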


14.5.2. vfs Support Functions

The following support functions are available for parsing option strings and filling in the necessary vfs structure fields. The file systems also need to parse the option strings to learn what options should be used in completing the mount request. The routines and data structures are all defined in the vfs.h header file.

The fields in the vfs structure that are used by the file-system-specific mount code are normally filled in and interrogated only during a mount system call. At mount time the vfs structure is private and not available to any other part of the kernel, so locking of the fields used for mnttab/options is not necessary during this time. If a file system wants to update or interrogate options at some later time, it should protect the access with the vfs_lock_wait()/vfs_unlock() functions. All memory allocated by the following routines is freed at umount time, so callers need not worry about memory leakage. Any arguments whose values are preserved in a structure after a call have been copied, so callers need not worry about retained references to any function arguments.

struct mntopts_t *vfs_opttblptr(struct vfs *vfsp);
    Returns a pointer to the mount options table for the given vfs structure.

void vfs_initopttbl(const mntopts_t *proto, mntopts_t *tbl);
    Initializes a mount options table from the prototype mount options table pointed to by the first argument. A file system should always initialize the mount options table in the vfs structure for the current mount but may use this routine to initialize other tables if desired. See the documentation below on how to construct a prototype mount options table. Note that the vfs_opttblptr() function described above should be used to access the vfs structure's mount options table.

void vfs_parsemntopts(mntopts_t *tbl, char *optionstr);
    Parses the option string pointed to by the second argument, using the mount options table pointed to by the first argument. Any recognized options will be marked by this function as set in the pointed-to options table, and any arguments found are recorded there as well. Normally file systems would call this with a pointer to the mount options table in the vfs structure for the mount currently being processed. The mount options table may be examined after the parse is completed, to see which options have been recognized, by using the vfs_optionisset() function documented below. Note that the parser will alter the option string during parsing, but will restore it before returning. Any options in the option string being parsed that are not recognized are silently ignored. Also, if an option requires an argument but it is not supplied, the argument pointer is silently set to NULL. Since options are parsed from left to right, the last specification for any particular option in the option string is the one used. Similarly, if options that toggle each other on or off (i.e., are mutually exclusive) are in the same options string, the last one seen in left-to-right parsing determines the state of the affected option(s).

void vfs_clearmntopt(mntopts_t *tbl, const char *opt);
    Clears the option whose name is passed in the second argument from the option table pointed to by the first argument, i.e., marks the option as not set and frees any argument that may be associated with the option. Used by file systems to unset options if so desired in a mount options table. Note that the only way to return options to their default state is to reinitialize the options table with vfs_initopttbl().

void vfs_setmntopt(mntopts_t *tbl, const char *opt, const char *arg, int flags);
    Marks the option whose name is given by the second argument as set in the mount options table pointed to by the first argument. If the option takes an argument, the third parameter points to the string for the argument. The flags argument affects the behavior of the vfs_setmntopt() function. It can cause it to override the MO_IGNORE flag if the particular option being set has this flag enabled. It can also be used to request toggling the MO_NODISPLAY bit for the option on or off (see the documentation for mount option tables). Used by file systems to manually mark options as set in a mount options table. Possible flags to vfs_setmntopt():

        VFS_DISPLAY    0x02    /* Turn off MO_NODISPLAY bit for option */
        VFS_NODISPLAY  0x04    /* Turn on MO_NODISPLAY bit for option */

int vfs_optionisset(mntopts_t *tbl, const char *opt, char **argp);
    Inquires if the option named by the second argument is marked as set in the mount options table pointed to by the first argument. Returns non-zero if the option was set. If the option has an argument string, the pointer that argp points to is filled in with a pointer to the argument string for the option. The pointer is to the saved argument string and not to a copy. Users should not directly alter the pointed-to string. If any change is desired to the argument string, the caller should use the vfs_setmntopt()/vfs_clearmntopt() functions.

int vfs_buildoptionstr(mntopts_t *tbl, char *buf, int len);
    Builds a comma-separated, null-terminated string of the mount options that are set in the table passed in the first argument. The buffer passed in the second argument is filled in with the generated options string. If the length passed in the third argument would be exceeded, the function returns EOVERFLOW; otherwise, it returns zero on success. If an error is returned, the contents of the result buffer are undefined.

int vfs_setoptprivate(mntopts_t *tbl, const char *opt, void *arg);
    Sets the private data field of the given option in the specified option table to the provided value. Returns zero on success, non-zero if the named option does not exist in the table. Note that option private data is not managed for the user. If the private data field is a pointer to allocated memory, then it should be freed by the file system code prior to returning from a umount call.

int vfs_getoptprivate(mntopts_t *tbl, const char *opt, void **argp);
    Fills in the pointer pointed to by the argp pointer with the value of the private data field of the given option in the specified table. Returns zero on success, non-zero if the named option does not exist in the table.

void vfs_setmntpoint(struct vfs *vfsp, char *mp);
    Sets the vfs_mntpt field of the vfs structure to the given mount point. File systems call this if they want some value there other than what was passed by the mount system call.

int vfs_can_sync(vfs_t *vfsp);
    Determines if a vfs has an FS-supplied (non-default, non-error) sync routine.

void vfs_setresource(struct vfs *vfsp, char *resource);
    Sets the vfs_resource field of the vfs structure to the given resource. File systems call this if they want some value there other than what was passed by the mount system call.

                                        See usr/src/uts/common/sys/vfs.h
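The parsing rules described for vfs_parsemntopts() (left-to-right processing, last specification wins, unrecognized options silently ignored, mutually exclusive options switching each other off) and the EOVERFLOW behavior described for vfs_buildoptionstr() can be sketched in user space. The code below is an illustration under stated assumptions, not the kernel implementation: the demo_opt_t table layout, the cancel field, and the fixed-size argument buffer are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define	DEMO_ARGLEN	32

/* Hypothetical, much-reduced stand-in for one mount options table entry. */
typedef struct demo_opt {
	const char *name;	/* option name */
	const char *cancel;	/* option this one switches off, or NULL */
	int set;		/* nonzero once the option has been seen */
	int has_arg;		/* nonzero if an argument was supplied */
	char arg[DEMO_ARGLEN];	/* copy of the argument */
} demo_opt_t;

static demo_opt_t *
demo_find(demo_opt_t *tbl, size_t n, const char *name, size_t len)
{
	for (size_t i = 0; i < n; i++)
		if (strlen(tbl[i].name) == len &&
		    strncmp(tbl[i].name, name, len) == 0)
			return (&tbl[i]);
	return (NULL);
}

/* Parse "a,b=x,c" left to right; unknown options are silently skipped. */
static void
demo_parse(demo_opt_t *tbl, size_t n, const char *str)
{
	while (*str != '\0') {
		size_t len = strcspn(str, ",");		/* one option */
		size_t nlen = strcspn(str, ",=");	/* its name part */
		demo_opt_t *o = demo_find(tbl, n, str, nlen);

		if (o != NULL) {
			demo_opt_t *c;

			o->set = 1;		/* later settings overwrite */
			o->has_arg = (nlen < len);
			o->arg[0] = '\0';
			if (o->has_arg)
				(void) snprintf(o->arg, sizeof (o->arg), "%.*s",
				    (int)(len - nlen - 1), str + nlen + 1);
			if (o->cancel != NULL && (c = demo_find(tbl, n,
			    o->cancel, strlen(o->cancel))) != NULL)
				c->set = 0;	/* last toggle seen wins */
		}
		str += len;
		if (*str == ',')
			str++;
	}
}

/* Build a comma-separated string of set options; EOVERFLOW if too long. */
static int
demo_build(const demo_opt_t *tbl, size_t n, char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return (EOVERFLOW);
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		int w;

		if (!tbl[i].set)
			continue;
		w = snprintf(buf + used, len - used, "%s%s%s%s",
		    used > 0 ? "," : "", tbl[i].name,
		    tbl[i].has_arg ? "=" : "", tbl[i].has_arg ? tbl[i].arg : "");
		if (w < 0 || (size_t)w >= len - used)
			return (EOVERFLOW);
		used += (size_t)w;
	}
	return (0);
}
```

With a table containing ro (cancels rw), rw (cancels ro), and size, parsing the string ro,size=1m,bogus,size=2m,rw leaves rw and size=2m set: the unknown option bogus is skipped, the second size replaces the first, and the trailing rw switches ro off.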


14.5.3. The mount Method

The mount method is responsible for initializing a per-mount instance of a file system. It is typically invoked as a result of a user-initiated mount command.

Figure 14.5. Mount Invocation


The tasks completed in the mount method will often include

  • A security check, to ensure that the user has sufficient privileges to perform the requested mount. This is best done with a call to secpolicy_fs_mount(), which is part of the Solaris Least Privilege framework.

  • A check to see if the specified mount point is a directory.

  • Initialization and allocation of per-file system mount structures and locks.

  • Parsing of the options supplied to the mount call, with the assistance of the vfs option support functions described in Section 14.5.2, such as vfs_optionisset().

  • Manufacture of a unique file system ID, with the help of vfs_make_fsid(). This is required to support NFS mount instances over the wire protocol using unique file system IDs.

  • Creation or reading of the root inode for the file system.

An excerpt from the tmpfs implementation shows an example of the main functions within a file system mount method.

static int
tmp_mount(
        struct vfs *vfsp,
        struct vnode *mvp,
        struct mounta *uap,
        struct cred *cr)
{
        struct tmount *tm = NULL;
...
        /* Privilege check, then verify that the mount point is a directory */
        if ((error = secpolicy_fs_mount(cr, mvp, vfsp)) != 0)
                return (error);

        if (mvp->v_type != VDIR)
                return (ENOTDIR);

        /* tmpfs doesn't support read-only mounts */
        if (vfs_optionisset(vfsp, MNTOPT_RO, NULL)) {
                error = EINVAL;
                goto out;
        }
...
        if (error = pn_get(uap->dir,
            (uap->flags & MS_SYSSPACE) ? UIO_SYSSPACE : UIO_USERSPACE, &dpn))
                goto out;

        if ((tm = tmp_memalloc(sizeof (struct tmount), 0)) == NULL) {
                pn_free(&dpn);
                error = ENOMEM;
                goto out;
        }
...
        /* The device number must be assigned before it is used below */
        tm->tm_dev = makedevice(tmpfs_major, tmpfs_minor);
...
        vfsp->vfs_data = (caddr_t)tm;
        vfsp->vfs_fstype = tmpfsfstype;
        vfsp->vfs_dev = tm->tm_dev;
        vfsp->vfs_bsize = PAGESIZE;
        vfsp->vfs_flag |= VFS_NOTRUNC;
        vfs_make_fsid(&vfsp->vfs_fsid, tm->tm_dev, tmpfsfstype);
...
                                        See usr/src/uts/common/fs/tmpfs/tmp_vfsops.c


14.5.4. The umount Method

The umount method is almost the reverse of mount. The tasks completed in the umount method will often include

  • A security check, to ensure that the user has sufficient privileges to perform the requested unmount. This is best done with a call to secpolicy_fs_unmount(), which is part of the Solaris Least Privilege framework.

  • A check to see if the request is a forced unmount (to take special action, or to reject the request if the file system doesn't support forcible unmounts and the reference count on the root node is >1).

  • Freeing of per-file system mount structures and locks.
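The order of these checks can be sketched as a small user-space function. This is an illustration of the decision logic in the list above, not an excerpt from any file system: the demo_mount_t structure, the DEMO_MS_FORCE flag value, and the choice of EPERM and EBUSY as error codes are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define	DEMO_MS_FORCE	0x1	/* hypothetical forced-unmount flag */

typedef struct demo_mount {
	int root_refcnt;	/* holds on the root vnode */
	int privileged;		/* result of the (mocked) security check */
	int supports_force;	/* file system implements forced unmount */
	int mounted;		/* nonzero while per-mount state exists */
} demo_mount_t;

static int
demo_unmount(demo_mount_t *m, int flag)
{
	assert(m != NULL);

	/* 1. Security check. */
	if (!m->privileged)
		return (EPERM);

	/* 2. Busy check: someone other than the mount holds the root. */
	if (m->root_refcnt > 1) {
		if (!(flag & DEMO_MS_FORCE) || !m->supports_force)
			return (EBUSY);
		/* Forced unmount: special action, here dropping extra holds. */
		m->root_refcnt = 1;
	}

	/* 3. Free per-mount structures and locks. */
	m->mounted = 0;
	return (0);
}
```

An idle mount (root reference count of 1) unmounts with or without the force flag; a busy mount is torn down only when the force flag is given and the file system supports it.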

14.5.5. Root vnode Identification

The root method of the file system is a simple function used by the file system lookup functions when traversing across a mount point into a new file system. It simply returns a pointer to the root vnode in the supplied vnode pointer argument.

static int
tmp_root(struct vfs *vfsp, struct vnode **vpp)
{
        struct tmount *tm = (struct tmount *)VFSTOTM(vfsp);
        struct tmpnode *tp = tm->tm_rootnode;
        struct vnode *vp;

        ASSERT(tp);

        vp = TNTOV(tp);
        VN_HOLD(vp);
        *vpp = vp;
        return (0);
}
                                        See usr/src/uts/common/fs/tmpfs/tmp_vfsops.c
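Note that tmp_root() takes a hold on the vnode with VN_HOLD() before returning it, so the caller receives its own reference. A minimal user-space sketch of that contract follows; the demo_ types and the demo_rele() helper are hypothetical, and the plain counter stands in for the kernel's locked reference count.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_vnode {
	unsigned v_count;	/* reference count */
} demo_vnode_t;

typedef struct demo_vfs {
	demo_vnode_t *root;	/* root vnode of this file system */
} demo_vfs_t;

static void
demo_hold(demo_vnode_t *vp)
{
	vp->v_count++;
}

static void
demo_rele(demo_vnode_t *vp)
{
	assert(vp->v_count > 0);
	vp->v_count--;
}

/* Return the root vnode with an extra hold that the caller must release. */
static int
demo_root(demo_vfs_t *vfsp, demo_vnode_t **vpp)
{
	assert(vfsp != NULL && vfsp->root != NULL && vpp != NULL);
	demo_hold(vfsp->root);
	*vpp = vfsp->root;
	return (0);
}
```

Each successful call to demo_root() must be paired with a demo_rele(); an unreleased hold keeps the count above 1, which is the condition an unmount method checks to decide whether the file system is busy.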


14.5.6. vfs Information Available with MDB

The mounted list of vfs objects is linked as shown in Figure 14.6.

Figure 14.6. The Mounted vfs List


You can traverse the list with an mdb walker. Below is the output of such a traversal.

sol10# mdb -k
> ::walk vfs
fffffffffbc7a7a0
fffffffffbc7a860
> ::walk vfs |::fsinfo -v
            VFSP FS              MOUNT
fffffffffbc7a7a0 ufs             /
              R: /dev/dsk/c3d1s0
              O: remount,rw,intr,largefiles,logging,noquota,xattr,nodfratime
fffffffffbc7a860 devfs           /devices
              R: /devices
ffffffff80129300 ctfs            /system/contract
              R: ctfs
ffffffff80129240 proc            /proc
              R: proc
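What the walker does can be sketched in user space. The code below assumes that the mounted list is a circular, doubly linked list threaded through the vfs_next and vfs_prev fields and anchored at the root file system; that layout is an assumption of this sketch, and the demo_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_vfs {
	struct demo_vfs *vfs_next;
	struct demo_vfs *vfs_prev;
	const char *mntpt;		/* mount point, for display only */
} demo_vfs_t;

/* A list holding only the root points at itself in both directions. */
static void
demo_list_init(demo_vfs_t *root)
{
	root->vfs_next = root->vfs_prev = root;
}

/* Append a newly mounted vfs at the tail, just before the root. */
static void
demo_list_add(demo_vfs_t *root, demo_vfs_t *vfsp)
{
	vfsp->vfs_next = root;
	vfsp->vfs_prev = root->vfs_prev;
	root->vfs_prev->vfs_next = vfsp;
	root->vfs_prev = vfsp;
}

/*
 * Visit every vfs once, starting at the root and stopping when the walk
 * wraps around.  Records up to max mount points; returns the total count.
 */
static size_t
demo_walk(const demo_vfs_t *root, const char **out, size_t max)
{
	const demo_vfs_t *vfsp = root;
	size_t n = 0;

	do {
		if (n < max)
			out[n] = vfsp->mntpt;
		n++;
		vfsp = vfsp->vfs_next;
	} while (vfsp != root);
	return (n);
}
```

Mounting /devices and then /proc after the root yields a walk order of /, /devices, /proc, with the root's vfs_prev pointing at the most recently added entry.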


You can also inspect a vfs object with mdb. An example is shown below.

sol10# mdb -k
> ::walk vfs
fffffffffbc7a7a0
fffffffffbc7a860
> fffffffffbc7a7a0::print vfs_t
{
    vfs_next = devices
    vfs_prev = 0xffffffffba3ef0c0
    vfs_op = vfssw+0x138
    vfs_vnodecovered = 0
    vfs_flag = 0x420
    vfs_bsize = 0x2000
    vfs_fstype = 0x2
    vfs_fsid = {
        val = [ 0x19800c0, 0x2 ]
    }
    vfs_data = 0xffffffff8010ae00
    vfs_dev = 0x66000000c0
    vfs_bcount = 0
    vfs_list = 0
    vfs_hash = 0xffffffff816a8b40
    vfs_reflock = {
        _opaque = [ 0, 0 ]
    }
    vfs_count = 0x2
    vfs_mntopts = {
        mo_count = 0x20
        mo_list = 0xffffffff8133d580
    }
    vfs_resource = 0xffffffff8176dbb8
    vfs_mntpt = 0xffffffff81708590
    vfs_mtime = 2005 May 17 23:47:13
    vfs_femhead = 0
    vfs_zone = zone0
    vfs_zone_next = devices
    vfs_zone_prev = 0xffffffffba3ef0c0
}
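The vfs_dev, vfs_fstype, and vfs_fsid values in this output appear to be related: the first fsid word looks like a 32-bit compressed form of the device number, and the second word equals the file system type. The sketch below is a user-space illustration, not the kernel's vfs_make_fsid(). It assumes a 64-bit device number with the major number in the upper 32 bits, and a compressed form with a 14-bit major and an 18-bit minor number; those widths are assumptions of this sketch. Under them, the values shown above are reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit layouts; the DEMO_ names are not taken from the text. */
#define	DEMO_NBITSMINOR64	32
#define	DEMO_NBITSMINOR32	18
#define	DEMO_MAXMAJ32		0x3fffu		/* 14-bit major */
#define	DEMO_MAXMIN32		0x3ffffu	/* 18-bit minor */

typedef struct demo_fsid {
	int32_t val[2];
} demo_fsid_t;

/* Compress a 64-bit device number to 32 bits; returns 0 if it won't fit. */
static int
demo_cmpldev(uint32_t *out, uint64_t dev)
{
	uint64_t major = dev >> DEMO_NBITSMINOR64;
	uint64_t minor = dev & 0xffffffffULL;

	assert(out != NULL);
	if (major > DEMO_MAXMAJ32 || minor > DEMO_MAXMIN32)
		return (0);
	*out = (uint32_t)((major << DEMO_NBITSMINOR32) | minor);
	return (1);
}

/* Build an fsid from the device number and the file system type index. */
static int
demo_make_fsid(demo_fsid_t *fsi, uint64_t dev, int fstype)
{
	uint32_t dev32;

	assert(fsi != NULL);
	if (!demo_cmpldev(&dev32, dev))
		return (-1);
	fsi->val[0] = (int32_t)dev32;	/* compressed device number */
	fsi->val[1] = fstype;		/* file system type index */
	return (0);
}
```

For vfs_dev = 0x66000000c0 the major number is 0x66 and the minor number is 0xc0; shifting the major left by 18 bits and adding the minor gives 0x19800c0, and with vfs_fstype = 0x2 the pair matches the val array printed for vfs_fsid.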





Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture (2nd Edition)
ISBN: 0131482092
Year: 2004
Pages: 244
