8.14. Task Working Set Detection and Maintenance

The kernel uses physical memory as a cache for virtual memory. When new pages must be brought in because of page faults, the kernel may need to decide which pages to reclaim from among those currently in physical memory. For an application, the kernel should ideally keep in memory those pages that will be needed very soon. In a utopian operating system, the kernel would know ahead of time which pages an application references as it runs. Several algorithms that approximate such optimal page replacement have been researched. Another approach uses the Principle of Locality, on which the Working Set Model is based. As described in the paper titled "Virtual Memory,"^[19] locality can be informally understood as a program's affinity for a subset of its pages, where this set of favored pages changes membership slowly. This gives rise to the working set, informally defined as the set of "most useful" pages for a program. The Working Set Principle establishes the rule that a program may run if and only if its working set is in memory, and a page may not be removed if it is a member of a running program's working set. Studies have shown that keeping a program's working set resident in physical memory typically allows it to run with acceptable performance, that is, without causing an unacceptable number of page faults.

^[19] "Virtual Memory," by Peter J. Denning (ACM Computing Surveys 2:3, September 1970, pp. 153-189).
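To make the notion of a working set concrete, here is a minimal sketch under one common formalization: the working set at reference t with window tau is the set of distinct pages touched by the last tau references. The function name, the window parameter, and the representation of the reference string as an array of page numbers are illustrative assumptions; none of this is kernel code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative only: returns the number of distinct pages among the last
 * `tau` references ending at index `t` (inclusive) of the reference
 * string `refs`, which holds `n` page numbers.
 */
size_t working_set_size(const unsigned *refs, size_t n, size_t t, size_t tau)
{
    assert(refs != NULL && t < n);

    /* First index inside the window; clamp at the start of the string. */
    size_t start = (t + 1 > tau) ? t + 1 - tau : 0;
    size_t distinct = 0;

    for (size_t i = start; i <= t; i++) {
        int seen = 0;
        /* Has this page already appeared earlier in the window? */
        for (size_t j = start; j < i; j++) {
            if (refs[j] == refs[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            distinct++;
    }
    return distinct;
}
```

A program that keeps touching the same few pages yields a small, slowly changing result as t advances, which is the locality behavior the Working Set Model relies on.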

8.14.1. The TWS Mechanism

The Mac OS X kernel includes an application-profiling mechanism that can construct per-user, per-application working set profiles, save the corresponding pages in a designated directory, and attempt to load them when the application is executed by that user. We will call this mechanism TWS, for task working set (the various functions and data structures in its implementation have the tws prefix in their names).

TWS is integrated with the kernel's page-fault-handling mechanism: it is called when there is a page fault. The first time an application is launched in a given user context, TWS captures the initial working set and stores it in a file in the /var/vm/app_profile/ directory. Several aspects of the TWS scheme contribute to performance:

The profile information is used during page-fault handling to determine whether any nearby pages should be brought in. Bringing in more pages than those corresponding to the immediate page fault leads to a single large request to the pager, avoiding multiple subsequent requests that would otherwise be needed to bring in pages that are expected to be used in the near future. This is relevant only for nonsequential pages, however, since sequential pages are brought in anyway because of cluster I/O.

TWS captures and stores on disk an application's initial working set the first time the application is started by a particular user. This information is used for seeding (or preheating) the application's working set when it is launched again in the same user context. This way, the application's profile is built over time.

The locality of memory references is often captured on disk, as on-disk files typically have good locality on HFS Plus volumes. Normally, the working sets can be read from disk with little seek overhead.

8.14.2. TWS Implementation

Given a user with user ID U, TWS stores application profiles for that user as two files in /var/vm/app_profile/: #U_names and #U_data, where #U is the hexadecimal representation of U. The names file is a simple database that contains a header followed by profile elements, whereas the data file contains the actual working sets. The profile elements in the names file point to the working sets in the data file.

    // bsd/vm/vm_unix.c

    // header for the "names" file
    struct profile_names_header {
            unsigned int    number_of_profiles;
            unsigned int    user_id;
            unsigned int    version;
            off_t           element_array;
            unsigned int    spare1;
            unsigned int    spare2;
            unsigned int    spare3;
    };

    // elements in the "names" file
    struct profile_element {
            off_t           addr;
            vm_size_t       size;
            unsigned int    mod_date;
            unsigned int    inode;
            char name[12];
    };
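The file-naming rule described before the structure listings can be sketched as a small user-space helper. The function name profile_path() is hypothetical, and the exact formatting (lowercase hexadecimal without a 0x prefix) is an assumption based on the description, not a transcription of the kernel code.

```c
#include <assert.h>
#include <stdio.h>

#define PROFILE_DIR "/var/vm/app_profile/"  /* directory named in the text */

/*
 * Hypothetical helper: formats the pathname of a user's "names" or "data"
 * profile file as <dir><hex uid>_<suffix>.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int profile_path(char *buf, size_t len, unsigned int uid, const char *suffix)
{
    assert(buf != NULL && suffix != NULL);

    /* Assumption: lowercase hex, no "0x" prefix. */
    int n = snprintf(buf, len, "%s%x_%s", PROFILE_DIR, uid, suffix);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Under these assumptions, user ID 501 maps to /var/vm/app_profile/1f5_names and /var/vm/app_profile/1f5_data.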

The kernel maintains a global profile cache data structure containing an array of global profiles, each of whose entries contains profile file information for one user.

    // bsd/vm/vm_unix.c

    // meta information for one user's profile
    struct global_profile {
        struct vnode *names_vp;
        struct vnode *data_vp;
        vm_offset_t   buf_ptr;
        unsigned int  user;
        unsigned int  age;
        unsigned int  busy;
    };

    struct global_profile_cache {
        int max_ele;
        unsigned int age;
        struct global_profile profiles[3]; // up to 3 concurrent users
    };
    ...
    struct global_profile_cache global_user_profile_cache = {
        3,
        0,
        { NULL, NULL, 0, 0, 0, 0 },
        { NULL, NULL, 0, 0, 0, 0 },
        { NULL, NULL, 0, 0, 0, 0 }
    };

Let us use the readksym.sh script to read the contents of global_user_profile_cache. We can see from the output shown in Figure 8-27 that the three global per-user slots are occupied by the user IDs 0x1f6 (502), 0, and 0x1f5 (501).

Figure 8-27. Reading the contents of the TWS subsystem's global user profile cache

    $ sudo ./readksym.sh _global_user_profile_cache 128
    0000000    0000    0003    0000    4815    053b    0c60    049a    dbdc
    0000010    5da2    a000    0000    01f6    0000    47f9    0000    0000
    0000020    040e    5ce4    0406    e4a4    5d5d    0000    0000    0000
    0000030    0000    4814    0000    0000    045c    3738    045c    3840
    0000040    5a74    d000    0000    01f5    0000    480c    0000    0000
    0000050    0000    0001    040f    b7bc    03fa    9a00    04a5    f420
    0000060    063b    3450    0472    4948    0442    96c0    0000    0000
    0000070    0000    0000    0000    0000    0000    0000    0000    0000
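The dump can be mapped back onto the structures with a small decoder. The following is a sketch that assumes a 32-bit big-endian kernel (4-byte pointers, no padding), so that the cache occupies the first 80 bytes: two 4-byte header fields followed by three 24-byte entries. The `32`-suffixed type names and decode_profile_cache() are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 32-bit layout of struct global_profile (pointers as 4-byte values). */
struct global_profile32 {
    uint32_t names_vp, data_vp, buf_ptr, user, age, busy;
};

/* Assumed 32-bit layout of struct global_profile_cache. */
struct global_profile_cache32 {
    uint32_t max_ele;
    uint32_t age;
    struct global_profile32 profiles[3];
};

#define CACHE32_SIZE 80u  /* 8-byte header + 3 entries of 24 bytes */

/* Reads one big-endian 32-bit value. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decodes the raw bytes of the cache; returns 0 on success, -1 if too short. */
int decode_profile_cache(const uint8_t *raw, size_t len,
                         struct global_profile_cache32 *out)
{
    assert(raw != NULL && out != NULL);
    if (len < CACHE32_SIZE)
        return -1;

    out->max_ele = be32(raw);
    out->age     = be32(raw + 4);
    for (size_t i = 0; i < 3; i++) {
        const uint8_t *p = raw + 8 + i * 24;
        out->profiles[i].names_vp = be32(p);
        out->profiles[i].data_vp  = be32(p + 4);
        out->profiles[i].buf_ptr  = be32(p + 8);
        out->profiles[i].user     = be32(p + 12);
        out->profiles[i].age      = be32(p + 16);
        out->profiles[i].busy     = be32(p + 20);
    }
    return 0;
}
```

Applied to the first 80 bytes of the dump under these layout assumptions, the decoder yields max_ele equal to 3 and user fields of 0x1f6, 0, and 0x1f5 for the three slots, which agrees with the reading given in the text.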

Most of the TWS functionality is implemented in osfmk/vm/task_working_set.c and bsd/vm/vm_unix.c. The former uses functions implemented by the latter for dealing with profile files.

prepare_profile_database() creates unique absolute pathnames to the names and data files for the given user ID. It is called by setuid() to prepare these files for the new user.

bsd_search_page_cache_data_base() searches for an application's profile in the given names file.

bsd_open_page_cache_files() attempts to either open or create the names and data files. If both files are present, they will be opened. If neither is present, both will be created. If only one is present, the attempt will fail.

bsd_close_page_cache_files() decrements references on the names and data files for the given user profile.

bsd_read_page_cache_file() first calls bsd_open_page_cache_files(), then looks for the given application's profile in the names file using bsd_search_page_cache_data_base(). If the profile is found, the function reads profile data from the data file into the given buffer.

bsd_write_page_cache_file() writes to the names and data files.
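Two of the behaviors described above lend themselves to a short illustration: the open-or-create rule of bsd_open_page_cache_files() and a lookup over profile elements in the spirit of bsd_search_page_cache_data_base(). The sketch below is a user-space model, not the kernel implementation. The structure is a simplified stand-in for struct profile_element with assumed fixed-width types, and matching on name, inode, and modification date together is an assumption suggested by the element's fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct profile_element (types are assumptions). */
struct profile_element_sketch {
    uint64_t addr;      /* offset of the working set in the data file */
    uint32_t size;      /* size of the working set */
    uint32_t mod_date;
    uint32_t inode;
    char     name[12];  /* not necessarily NUL-terminated */
};

enum open_action { OPEN_EXISTING, CREATE_BOTH, OPEN_FAIL };

/* Open/create rule as stated in the text. */
enum open_action open_policy(bool names_present, bool data_present)
{
    if (names_present && data_present)
        return OPEN_EXISTING;
    if (!names_present && !data_present)
        return CREATE_BOTH;
    return OPEN_FAIL;   /* exactly one of the two files exists */
}

/* Linear search; returns the matching element or NULL if none matches. */
const struct profile_element_sketch *
find_profile(const struct profile_element_sketch *elems, size_t count,
             const char *name, uint32_t inode, uint32_t mod_date)
{
    assert(name != NULL && (elems != NULL || count == 0));

    for (size_t i = 0; i < count; i++) {
        if (elems[i].inode == inode &&
            elems[i].mod_date == mod_date &&
            strncmp(elems[i].name, name, sizeof elems[i].name) == 0)
            return &elems[i];
    }
    return NULL;
}
```

In this model, a caller that gets a non-NULL element would use its addr and size fields to locate the working set in the data file, mirroring how the names file points into the data file.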

As shown in Figure 8-6, the task structure's dynamic_working_set field is a pointer to a tws_hash structure [osfmk/vm/task_working_set.h]. This pointer is initialized during task creation, specifically by task_create_internal(), which calls task_working_set_create() [osfmk/vm/task_working_set.c]. Conversely, when the task is terminated, the working set is flushed (by task_terminate_internal()) and the corresponding hash entry is destroyed (by task_deallocate()).

    // osfmk/kern/task.c

    kern_return_t
    task_create_internal(task_t     parent_task,
                         boolean_t  inherit_memory,
                         task_t    *child_task)
    {
        ...
        new_task->dynamic_working_set = 0;
        task_working_set_create(new_task, TWS_SMALL_HASH_LINE_COUNT,
                                0, TWS_HASH_STYLE_DEFAULT);
        ...
    }

task_working_set_create() calls tws_hash_create() [osfmk/vm/task_working_set.c] to allocate and initialize a tws_hash structure. As shown in Figure 8-28, execve() saves the executable's name for the TWS mechanism. Before a Mach-O executable is loaded, the Mach-O image activator calls tws_handle_startup_file() [osfmk/vm/task_working_set.c] to preheat the task if possible.

Figure 8-28. TWS-related processing during the `execve()` system call

    // bsd/kern/kern_exec.c

    int
    execve(struct proc *p, struct execve_args *uap, register_t *retval)
    {
        ...
        if (/* not chroot()'ed */ && /* application profiling enabled */) {
            // save the filename from the path passed to execve()
            // the TWS mechanism needs it to look up in the names file
            ...
        }
        ...
    }
    ...
    // image activator for Mach-O binaries
    static int
    exec_mach_imgact(struct image_params *imgp)
    {
        ...
        if (/* we have a saved filename */) {
            tws_handle_startup_file(...);
        }
        vm_get_shared_region(task, &initial_region);
        ...
        // actually load the Mach-O file now
        ...
    }

tws_handle_startup_file() first calls bsd_read_page_cache_file() [bsd/vm/vm_unix.c] to read the appropriate page cache file. If the read attempt succeeds, the existing profile is read by a call to tws_read_startup_file(). If the read attempt fails because no profile was found for the application, a new profile is created by calling tws_write_startup_file(), which in turn calls task_working_set_create(). The working set information is later written to disk by a call to tws_send_startup_info(), which calls bsd_write_page_cache_file().
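The decision made by tws_handle_startup_file() can be summarized as a small mapping from the outcome of the read attempt to the next step. This is only a model of the control flow described above: the enum names and startup_step_for() are hypothetical, and treating any other read error as "do nothing" is an assumption, since the text does not say what happens in that case.

```c
#include <assert.h>

/* Hypothetical outcomes of the page cache file read attempt. */
enum read_result { READ_OK, READ_NO_PROFILE, READ_ERROR };

/* Hypothetical labels for what happens next. */
enum startup_step {
    STEP_READ_EXISTING,      /* corresponds to tws_read_startup_file() */
    STEP_START_NEW_PROFILE,  /* corresponds to tws_write_startup_file() */
    STEP_NOTHING             /* assumption: other failures are ignored */
};

enum startup_step startup_step_for(enum read_result r)
{
    switch (r) {
    case READ_OK:
        return STEP_READ_EXISTING;
    case READ_NO_PROFILE:
        return STEP_START_NEW_PROFILE;
    case READ_ERROR:
        return STEP_NOTHING;
    }
    assert(!"unexpected read result");
    return STEP_NOTHING;
}
```

The point of the model is that a missing profile is not an error path: it is the trigger for recording a new profile, which is written to disk later.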

The rest (and most) of the TWS activity occurs during page-fault handling: the mechanism is specifically invoked on a page fault, which allows it to monitor the application's fault behavior. vm_fault() [osfmk/vm/vm_fault.c], the page-fault handler, calls vm_fault_tws_insert() [osfmk/vm/vm_fault.c] to add page-fault information to the current task's working set. vm_fault_tws_insert() is given a VM object and an offset within it, which it uses to perform a hash lookup in the tws_hash data structure pointed to by the task's dynamic_working_set field. This way, it determines whether the object/offset pair needs to be inserted in the hash and whether doing so requires the cached working set to be expanded. Moreover, vm_fault_tws_insert() returns a Boolean value to its caller indicating whether the page cache files need to be written. If so, vm_fault() calls tws_send_startup_info() to write the files through an eventual call to bsd_write_page_cache_file().

vm_fault() may also call vm_fault_page() [osfmk/vm/vm_fault.c], which finds the resident page for the virtual memory specified by the given VM object and offset. In turn, vm_fault_page() may need to call the appropriate pager to retrieve the data. Before it issues a request to the pager, it calls tws_build_cluster() to add up to 64 pages from the working set to the request. This allows a single large request to be made to the pager.
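The idea of folding working-set pages into a single pager request can be illustrated with a deliberately simplified model. The sketch below is not how tws_build_cluster() is implemented: the function name, the flat array of page offsets standing in for the working set, and the selection order are all assumptions. Only the cap of 64 additional pages comes from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_EXTRA_PAGES 64  /* cap stated in the text */

/*
 * Hypothetical model: copies up to 64 working-set page offsets, other than
 * the faulting page itself, into `extra` so they can accompany the faulting
 * page in one request. `extra` must have room for 64 entries.
 * Returns the number of offsets added.
 */
size_t build_cluster_sketch(const uint64_t *ws_offsets, size_t n,
                            uint64_t fault_offset, uint64_t *extra)
{
    assert(extra != NULL && (ws_offsets != NULL || n == 0));

    size_t count = 0;
    for (size_t i = 0; i < n && count < SKETCH_MAX_EXTRA_PAGES; i++) {
        if (ws_offsets[i] == fault_offset)
            continue;               /* already part of the request */
        extra[count++] = ws_offsets[i];
    }
    return count;
}
```

Even in this reduced form, the benefit is visible: one request carrying the faulting page plus its recorded neighbors replaces what would otherwise be a series of separate faults and pager round trips.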
