Section 14.10. The Directory Name Lookup Cache


14.10. The Directory Name Lookup Cache

The directory name lookup cache (DNLC) is based on BSD 4.2 code. It was ported to Solaris 2.0 and threaded and has undergone some significant revisions. Most of the enhancements to the DNLC have been performance and threading, but a few visible changes are noteworthy. Table 14.6 summarizes the important changes to the DNLC.

Table 14.6. Solaris DNLC Changes

Year

OS Rev

Comment

1984

BSD 4.2

14-character name maximum

1990

Solaris 2.0

31-character name maximum

1994

Solaris 2.4

Performance (new locking/search algorithm)

1998

Solaris 7

Variable name length

2001

Solaris 8

Directory caching and negative entry caching


14.10.1. DNLC Operation

Each time we open a file, we call the open() system call with a path name. That path name must be translated to a vnode by the process of reading the directory and finding the corresponding name that matches the requested name. To prevent us from having to reread the directory every time we translate the path name, we cache the containing directory vnode/file-name name and the corresponding vnode mappings in the directory name lookup cache. The cache is managed as an LRU cache, so that most frequently used directory entries are kept in the cache.

The Solaris DNLC replaces the original SVR4 DNLC algorithm. It yielded a significant improvement in scalability. The Solaris 2.4 DNLC algorithm removed LRU list lock contention by eliminating the LRU list completely. In addition, the list takes into account the number of references to a vnode and whether the vnode has any pages in the page cache. This design allows the DNLC to cache the most relevant vnodes, rather than just the most frequently looked-up vnodes.

Figure 14.13 illustrates the Solaris DNLC.

Figure 14.13. Solaris DNLC


The lookup algorithm uses a rotor pointing to a hash chain; the rotor switches chains for each invocation of dnlc_enter() that needs a new entry. The algorithm starts at the end of the chain and takes the first entry that has a vnode reference count of 1 or no pages in the page cache. In addition, during lookup, entries are moved to the front of the chain so that each chain is sorted in LRU order.

The DNLC was enhanced to use the kernel memory allocator to allocate a variable length string for the name; this change removed the 31-character limit. In the Solaris 7 DNLC structure, shown in Figure 14.13, note that the name field has changed from a static structure to a pointer.

The number of entries in the DNLC is controlled by the ncsize parameter, which is initialized to 4 * (max_nprocs + maxusers) + 320 at system boot.

Most of the DNLC work is done with two functions: dnlc_enter() and dnlc_lookup(). When a file system wants to look up the name of a file, it first checks the DNLC with the dnlc_lookup() function, which queries the DNLC for an entry that matches the specified file name and directory vnode. If no entry is found, dnlc_lookup fails and the file system reads the directory from disk. When the file name is found, it is entered into the DNLC with the dnlc_enter() function. The DNLC stores entries on a hashed list (nc_hash[]) by file name and directory vnode pointer. Once the correct nc_hash chain is identified, the chain is searched linearly until the correct entry is found.

The original BSD DNLC had 8 nc_hash entries, which was increased to 64 in SunOS 4.x. Solaris 2.0 sized the nc_hash list at boot, attempting to make the average length of each chain no more than 4 entries. It used the total DNLC size, ncsize, divided by the average length to establish the number of nc_hash entries. Solaris 2.3 had the average length of the chain dropped to 2 in an attempt to increase DNLC performance; however, other problems, related to the LRU list locking and described below, adversely affected performance.

Each entry in the DNLC is also linked to an LRU list, in order of last use. When a new entry is added into the DNLC, the algorithm replaces the oldest entry from the LRU list with the new file name and directory vnode. Each time a lookup is done, the DNLC also takes the entry from the LRU and places it at the end of the list so that it won't be reused immediately. The DNLC uses the LRU list to attempt to keep most-used references in the cache. Although the DNLC list had been made short, the LRU list still caused contention because it required that a single lock be held around the entire chain.

14.10.2. Primary DNLC Support Functions

The primary DNLC support functions are summarized below.

void    dnlc_enter(vnode_t *dvp, char *name, vnode_t *vp, cred_t *cr); Enters a new ncache entry into the DNLC for the given name and directory vnode pointer. If an entry already exists for the name and directory pointer, the function returns with no action. void    dnlc_update(vnode_t *dvp, char *name, vnode_t *vp, cred_t *cr); Enters a new ncache entry into the DNLC for the given name and directory vnode pointer. If an entry already exists for the name and directory pointer but the vnode is different, then the entry is overwritten. Otherwise, the function returns with no action. vnode_t *dnlc_lookup(vnode_t *dvp, char *name, cred_t *cr); Locates an ncache entry that matches the supplied name and directory vnode pointer. Returns a pointer to the vnode for that entry or returns NULL. void    dnlc_purge(void); Called by the vfs framework when an umountall() is called. void    dnlc_purge_vp(vnode_t *vp); Purges all entries matching the vnode supplied. int     dnlc_purge_vfsp(vfs_t *vfs, int); Purges all entries matching the vfs supplied. void    dnlc_remove(vnode_t *vp, char *name); Removes the entry matching the supplied name and directory vnode pointer. int     dnlc_fs_purge1(struct vnodeops *vop); Purge 1 entry from the dnlc that is part of the file system(s) represented by 'vop'. The purpose of this routine is to allow users of the dnlc to free a vnode that is being held by the dnlc.If we find a vnode that we release which will result in freeing the underlying vnode (count was 1), return 1, 0 if no appropriate vnodes found.                                                       See usr/src/uts/common/sys/dnlc.h 


14.10.3. DNLC Negative Cache

The DNLC has support for negative caching. Some applications repeatedly test for the existence or nonexistence of a file (for example, a lock file or a results file). In addition, many shell PATH variables list directories that don't exist. For these applications, caching the fact that the file doesn't exist (negative caching) is a performance boost.

The DNLC negative cache follows the NFS negative-cache solution. It defines a negative cache vnode that is initialized with the reference count set to 1 so that VOP_INACTIVE() never gets called on it.

vnode_t negative_cache_vnode; #define DNLC_NO_VNODE &negative_cache_vnode                                                       See usr/src/uts/common/sys/dnlc.h 


File systems were updated in Solaris 8 to use negative caching so that each dnlc_lookup() checks for a DNLC_NO_VNODE return. Negative cache entries will be added when directory lookups fail, and will be invalidated by dnlc_update() when a real file of that name is added.

14.10.4. DNLC Directory Cache

The directory cache adds a new set of interfaces to the DNLC to cache entire directories. The directory cache eliminates performance bottlenecks for directories with tens of thousands of files. This helps performance when the file name repeatedly changes and when new files are created. It removes the need to search the entire directory to find out if the file name already exists. It turns out that mail and news spool directories see this scenario all the time.

The DNLC structure is shown below.

 /*  * This structure describes the elements in the cache of recent  * names looked up.  *  * Note namlen is a uchar_t to conserve space  * and alignment padding. The max length of any  * pathname component is defined as MAXNAMELEN  * which is 256 (including the terminating null).  * So provided this doesn't change, we don't include the null,  * we always use bcmp to compare strings, and we don't start  * storing full names, then we are ok. The space savings are worth it.  */ typedef struct ncache {         struct ncache *hash_next;       /* hash chain, MUST BE FIRST */         struct ncache *hash_prev;         struct vnode *vp;               /* vnode the name refers to */         struct vnode *dp;               /* vnode of parent of name */         int hash;                       /* hash signature */         uchar_t namlen;                 /* length of name */         char name[1];                   /* segment name - null terminated */ } ncache_t;                                                       See usr/src/uts/common/sys/dnlc.h 


File systems must provide a structure for use only by the DNLC directory caching code for each directory.

typedef struct dcanchor {         void *dca_dircache;     /* opaque directory cache handle */         kmutex_t dca_lock;      /* protects the pointer and cache */ } dcanchor_t; 


All file systems have an in-memory xxnode (for example, inode in ufs) that could contain such a structure. Following is an example of how a file system would use the directory cache interfaces.

fs_lookup(dir, name) {         Return entry if in regular dnlc         dcap = dir->dcap;         switch dnlc_dir_lookup(dcap, name, &handle)         case DFOUND:                 use handle to get and return vnode                 break         case DNOENT:                  return ENOENT          }         caching = 0;         if want to cache directory {                 switch dnlc_dir_start(dcap, num_dir_entries)                 case DNOMEM:                 case DTOOBIG:                         mark directory as non cache-able                         break;                 case                         caching = 1;         }         while not end of directory {                 if entry && caching                         handle = ino and offset;                         dnlc_dir_add_entry(dcap, entry_name, handle)                 if free space && caching                         handle = offset;                         dnlc_dir_add_space(dcap, length. handle)                 if entry matches                         get vnode                 if various errors                         if caching                                 dnlc_dir_purge(dcap)                         return error         }         if caching                 dnlc_dir_complete(dcap)         return vnode or ENOENT } 


The following set of new dnlc interfaces will be provided to cache complete directory contents (both entries and free space).

[View full width]

Status returns from the directory cache interfaces #define DOK 0 /* operation successful */ #define DNOCACHE 1 /* there is no cache */ #define DFOUND 2 /* entry found */ #define DNOENT 3 /* no entry found */ #define DTOOBIG 4 /* exceeds tunable dnlc_dir_max_size */ #define DNOMEM 5 /* no memory */ Interfaces for building and adding to the directory cache int dnlc_dir_start(dcanchor_t *dcap, uint_t num_entries); Requests that a directory be cached. This must be called initially to enable caching on a directory. After a successful call, directory entries and free space can be added (see below) until the directory is marked complete. num_entries is an estimate of the current number of directory entries. The request is rejected with DNOCACHE if num_entries falls below the tunable dnlc_dir_min_size (see below), and rejected with DTOOBIG if it's above dnlc_dir_max_size. Returns DOK, DNOCACHE, DTOOBIG, DNOMEM (see below) int dnlc_dir_add_entry(dcanchor_t *dcap, char *name, uint64_t handle); Adds an entry (name and handle) into the partial or complete cache. Handle is a file-system-specific quantity that is returned on calls to dnlc_dir_lookup() - see below. Handle for ufs holds the inumber and a directory entry offset. Returns DOK, DNOCACHE, DTOOBIG int dnlc_dir_add_space(dcanchor_t *dcap, uint_t len, uint64_t handle); Add free space (length and file-system-specific handle) into the partial or complete cache. Handle for ufs holds the directory entry offset Returns DOK, DNOCACHE, DTOOBIG void dnlc_dir_complete(dcanchor_t *dcap); Indicates the previously partial cache is now complete void dnlc_dir_purge(dcanchor_t *dcap); Deletes the partial or complete cache Interface for reading the directory cache int dnlc_dir_lookup(dcanchor_t *dcap, char *name, uint64_t *handlep); Looks up a file in the cache. Handlep must be non-null, and will be set to point to the file-system-supplied handle Returns DFOUND, DNOENT, DNOCACHE Interfaces for amending the cache int dnlc_dir_update(dcanchor_t *dcap, char *name, uint64_t handle); Update the handle for the given entry Returns DFOUND, DNOENT, DNOCACHE int dnlc_dir_rem_entry(dcanchor_t *dcap, char *name, uint64_t *handlep); Remove an entry Returns the handle if handlep non-null and DFOUND, DNOENT, DNOCACHE int dnlc_dir_rem_space_by_len(dcanchor_t *dcap, uint_t len, uint64_t *handlep); Find and remove a space entry with at least the given length and Returns the handle, and DFOUND, DNOENT, DNOCACHE int dnlc_dir_rem_space_by_handle(dcanchor_t *dcap, uint64_t handle); Find and removes the free space with the given handle Returns DFOUND, DNOENT, DNOCACHE Interfaces for initializing and finishing with the directory cache anchor void dnlc_dir_init(dcanchor_t *dcap); Initializes the anchor. This macro clears the dca_dircache field and does a mutex_init on the lock void dnlc_dir_fini(dcanchor_t *dcap); Called to indicate the anchor is no longer used. This macro asserts there's no cache and mutex_destroys the lock.


Additional notes on the directory cache interface are as follows:

  • Because of memory shortages, directory caches can be purged at any time. If the last directory cache is purged because of a memory shortage, then the directory cache is marked internally as "no memory." Future returns will all be DNOCACHE until the next dnlc_start_dir(), which will return DNOMEM once. This memory shortage may only be transient. It's up to the file system to handle this condition, but an attempt to immediately rebuild the cache will very likely lead to the same shortage of memory and to thrashing.

  • It's file system policy as to when and what size directories to cache.

  • Directory caches are purged according to LRU basis when a plea to release memory comes from the kmem system. A kmem_cache is used for one data structure, and on the reclaim callback, the LRU directory cache is released. Directory caches are also purged on failure to get additional memory. Otherwise, directories are cached as much as memory allows.

14.10.5. DNLC Housekeeping Thread

The DNLC maintains a task queue. The dnlc_reduce_cache() activates the task queue when there are ncsize name cache entries, and it reduces the size to dnlc_nentries_low_water, which is by default one hundredth less than (or 99% of) ncsize. If dnlc_nentries hits dnlc_max_nentries (twice ncsize), then this means that dnlc_reduce_cache() is failing to keep up. In this case, we refuse to add new entries to the dnlc until the task queue catches up.

14.10.6. DNLC Statistics

Below is an example of DNLC statistics obtained with the kstat command.

sol10$ kstat -n dnlcstats module: unix                            instance: 0 name:   dnlcstats                       class:    misc         crtime                          70.644144966         dir_add_abort                   0         dir_add_max                     0         dir_add_no_memory               0         dir_cached_current              0         dir_cached_total                269         dir_entries_cached_current      0         dir_fini_purge                  0         dir_hits                        131992         dir_misses                      1312735         dir_reclaim_any                 23         dir_reclaim_last                4         dir_remove_entry_fail           0         dir_remove_space_fail           0         dir_start_no_memory             0         dir_update_fail                 0         double_enters                   310146         enters                          22732358         hits                            384680010         misses                          2390823         negative_cache_hits             6048394         pick_free                       0         pick_heuristic                  15613169         pick_last                       632544         purge_all                       0         purge_fs1                       0         purge_total_entries             5369737         purge_vfs                       27052         purge_vp                        3009         snaptime                        4408540.56846945 





SolarisT Internals. Solaris 10 and OpenSolaris Kernel Architecture
Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture (2nd Edition)
ISBN: 0131482092
EAN: 2147483647
Year: 2004
Pages: 244

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net