14.10. The Directory Name Lookup CacheThe directory name lookup cache (DNLC) is based on BSD 4.2 code. It was ported to Solaris 2.0 and threaded and has undergone some significant revisions. Most of the enhancements to the DNLC have been performance and threading, but a few visible changes are noteworthy. Table 14.6 summarizes the important changes to the DNLC.
14.10.1. DNLC OperationEach time we open a file, we call the open() system call with a path name. That path name must be translated to a vnode by the process of reading the directory and finding the corresponding name that matches the requested name. To prevent us from having to reread the directory every time we translate the path name, we cache the containing directory vnode/file-name name and the corresponding vnode mappings in the directory name lookup cache. The cache is managed as an LRU cache, so that most frequently used directory entries are kept in the cache. The Solaris DNLC replaces the original SVR4 DNLC algorithm. It yielded a significant improvement in scalability. The Solaris 2.4 DNLC algorithm removed LRU list lock contention by eliminating the LRU list completely. In addition, the list takes into account the number of references to a vnode and whether the vnode has any pages in the page cache. This design allows the DNLC to cache the most relevant vnodes, rather than just the most frequently looked-up vnodes. Figure 14.13 illustrates the Solaris DNLC. Figure 14.13. Solaris DNLCThe lookup algorithm uses a rotor pointing to a hash chain; the rotor switches chains for each invocation of dnlc_enter() that needs a new entry. The algorithm starts at the end of the chain and takes the first entry that has a vnode reference count of 1 or no pages in the page cache. In addition, during lookup, entries are moved to the front of the chain so that each chain is sorted in LRU order. The DNLC was enhanced to use the kernel memory allocator to allocate a variable length string for the name; this change removed the 31-character limit. In the Solaris 7 DNLC structure, shown in Figure 14.13, note that the name field has changed from a static structure to a pointer. The number of entries in the DNLC is controlled by the ncsize parameter, which is initialized to 4 * (max_nprocs + maxusers) + 320 at system boot. Most of the DNLC work is done with two functions: dnlc_enter() and dnlc_lookup(). When a file system wants to look up the name of a file, it first checks the DNLC with the dnlc_lookup() function, which queries the DNLC for an entry that matches the specified file name and directory vnode. If no entry is found, dnlc_lookup fails and the file system reads the directory from disk. When the file name is found, it is entered into the DNLC with the dnlc_enter() function. The DNLC stores entries on a hashed list (nc_hash[]) by file name and directory vnode pointer. Once the correct nc_hash chain is identified, the chain is searched linearly until the correct entry is found. The original BSD DNLC had 8 nc_hash entries, which was increased to 64 in SunOS 4.x. Solaris 2.0 sized the nc_hash list at boot, attempting to make the average length of each chain no more than 4 entries. It used the total DNLC size, ncsize, divided by the average length to establish the number of nc_hash entries. Solaris 2.3 had the average length of the chain dropped to 2 in an attempt to increase DNLC performance; however, other problems, related to the LRU list locking and described below, adversely affected performance. Each entry in the DNLC is also linked to an LRU list, in order of last use. When a new entry is added into the DNLC, the algorithm replaces the oldest entry from the LRU list with the new file name and directory vnode. Each time a lookup is done, the DNLC also takes the entry from the LRU and places it at the end of the list so that it won't be reused immediately. The DNLC uses the LRU list to attempt to keep most-used references in the cache. Although the DNLC list had been made short, the LRU list still caused contention because it required that a single lock be held around the entire chain. 14.10.2. Primary DNLC Support FunctionsThe primary DNLC support functions are summarized below. void dnlc_enter(vnode_t *dvp, char *name, vnode_t *vp, cred_t *cr); Enters a new ncache entry into the DNLC for the given name and directory vnode pointer. If an entry already exists for the name and directory pointer, the function returns with no action. void dnlc_update(vnode_t *dvp, char *name, vnode_t *vp, cred_t *cr); Enters a new ncache entry into the DNLC for the given name and directory vnode pointer. If an entry already exists for the name and directory pointer but the vnode is different, then the entry is overwritten. Otherwise, the function returns with no action. vnode_t *dnlc_lookup(vnode_t *dvp, char *name, cred_t *cr); Locates an ncache entry that matches the supplied name and directory vnode pointer. Returns a pointer to the vnode for that entry or returns NULL. void dnlc_purge(void); Called by the vfs framework when an umountall() is called. void dnlc_purge_vp(vnode_t *vp); Purges all entries matching the vnode supplied. int dnlc_purge_vfsp(vfs_t *vfs, int); Purges all entries matching the vfs supplied. void dnlc_remove(vnode_t *vp, char *name); Removes the entry matching the supplied name and directory vnode pointer. int dnlc_fs_purge1(struct vnodeops *vop); Purge 1 entry from the dnlc that is part of the file system(s) represented by 'vop'. The purpose of this routine is to allow users of the dnlc to free a vnode that is being held by the dnlc.If we find a vnode that we release which will result in freeing the underlying vnode (count was 1), return 1, 0 if no appropriate vnodes found. See usr/src/uts/common/sys/dnlc.h 14.10.3. DNLC Negative CacheThe DNLC has support for negative caching. Some applications repeatedly test for the existence or nonexistence of a file (for example, a lock file or a results file). In addition, many shell PATH variables list directories that don't exist. For these applications, caching the fact that the file doesn't exist (negative caching) is a performance boost. The DNLC negative cache follows the NFS negative-cache solution. It defines a negative cache vnode that is initialized with the reference count set to 1 so that VOP_INACTIVE() never gets called on it. vnode_t negative_cache_vnode; #define DNLC_NO_VNODE &negative_cache_vnode See usr/src/uts/common/sys/dnlc.h File systems were updated in Solaris 8 to use negative caching so that each dnlc_lookup() checks for a DNLC_NO_VNODE return. Negative cache entries will be added when directory lookups fail, and will be invalidated by dnlc_update() when a real file of that name is added. 14.10.4. DNLC Directory CacheThe directory cache adds a new set of interfaces to the DNLC to cache entire directories. The directory cache eliminates performance bottlenecks for directories with tens of thousands of files. This helps performance when the file name repeatedly changes and when new files are created. It removes the need to search the entire directory to find out if the file name already exists. It turns out that mail and news spool directories see this scenario all the time. The DNLC structure is shown below. /* * This structure describes the elements in the cache of recent * names looked up. * * Note namlen is a uchar_t to conserve space * and alignment padding. The max length of any * pathname component is defined as MAXNAMELEN * which is 256 (including the terminating null). * So provided this doesn't change, we don't include the null, * we always use bcmp to compare strings, and we don't start * storing full names, then we are ok. The space savings are worth it. */ typedef struct ncache { struct ncache *hash_next; /* hash chain, MUST BE FIRST */ struct ncache *hash_prev; struct vnode *vp; /* vnode the name refers to */ struct vnode *dp; /* vnode of parent of name */ int hash; /* hash signature */ uchar_t namlen; /* length of name */ char name[1]; /* segment name - null terminated */ } ncache_t; See usr/src/uts/common/sys/dnlc.h File systems must provide a structure for use only by the DNLC directory caching code for each directory. typedef struct dcanchor { void *dca_dircache; /* opaque directory cache handle */ kmutex_t dca_lock; /* protects the pointer and cache */ } dcanchor_t; All file systems have an in-memory xxnode (for example, inode in ufs) that could contain such a structure. Following is an example of how a file system would use the directory cache interfaces. fs_lookup(dir, name) { Return entry if in regular dnlc dcap = dir->dcap; switch dnlc_dir_lookup(dcap, name, &handle) case DFOUND: use handle to get and return vnode break case DNOENT: return ENOENT } caching = 0; if want to cache directory { switch dnlc_dir_start(dcap, num_dir_entries) case DNOMEM: case DTOOBIG: mark directory as non cache-able break; case caching = 1; } while not end of directory { if entry && caching handle = ino and offset; dnlc_dir_add_entry(dcap, entry_name, handle) if free space && caching handle = offset; dnlc_dir_add_space(dcap, length. handle) if entry matches get vnode if various errors if caching dnlc_dir_purge(dcap) return error } if caching dnlc_dir_complete(dcap) return vnode or ENOENT } The following set of new dnlc interfaces will be provided to cache complete directory contents (both entries and free space).
Additional notes on the directory cache interface are as follows:
14.10.5. DNLC Housekeeping ThreadThe DNLC maintains a task queue. The dnlc_reduce_cache() activates the task queue when there are ncsize name cache entries, and it reduces the size to dnlc_nentries_low_water, which is by default one hundredth less than (or 99% of) ncsize. If dnlc_nentries hits dnlc_max_nentries (twice ncsize), then this means that dnlc_reduce_cache() is failing to keep up. In this case, we refuse to add new entries to the dnlc until the task queue catches up. 14.10.6. DNLC StatisticsBelow is an example of DNLC statistics obtained with the kstat command. sol10$ kstat -n dnlcstats module: unix instance: 0 name: dnlcstats class: misc crtime 70.644144966 dir_add_abort 0 dir_add_max 0 dir_add_no_memory 0 dir_cached_current 0 dir_cached_total 269 dir_entries_cached_current 0 dir_fini_purge 0 dir_hits 131992 dir_misses 1312735 dir_reclaim_any 23 dir_reclaim_last 4 dir_remove_entry_fail 0 dir_remove_space_fail 0 dir_start_no_memory 0 dir_update_fail 0 double_enters 310146 enters 22732358 hits 384680010 misses 2390823 negative_cache_hits 6048394 pick_free 0 pick_heuristic 15613169 pick_last 632544 purge_all 0 purge_fs1 0 purge_total_entries 5369737 purge_vfs 27052 purge_vp 3009 snaptime 4408540.56846945 |