8.13. System Shared MemoryThe kernel provides a mechanism for system-wide memory sharingthe Shared Memory Server subsystem. Using this facility, both the kernel and user programs can share code and data among all tasks on the system. It is also possible to give one or more tasks private versions of the shared memory. 8.13.1. Applications of Shared MemoryIn Chapter 6, we looked at the commpage area, which is a set of pages that are mapped (shared and read-only) into every task's virtual address space. The pages contain both code and data. As we saw in earlier chapters, during bootstrapping, the startup thread (kernel_bootstrap_thread() [osfmk/kern/startup.c]) calls commpage_populate() [osfmk/ppc/commpage/commpage.c] to populate the commpage area. Figure 821 shows a summary of shared-memory-related initialization during bootstrapping. Figure 821. System-wide shared memory setup during bootstrapping
The system library, which contains code to use the contents of the commpage area, places the commpage symbols in a special section called __commpage in its __DATA segment. Recall also that the last eight pages of a 32-bit virtual address space are reserved for the commpage area, of which the very last page is unmapped. We can use the vmmap utility to verify that the last submap is indeed the commpage area. $ vmmap $$ ... ==== Non-writable regions for process 24664 ... system ffff8000-ffffa000 [ 8K] r--/r-- SM=SHM commpage [libSystem.B.dylib] ... As shown in Figure 821, besides the commpage area, the kernel creates two 256MB submaps that are shared by all tasks. Mac OS X uses these submaps for supporting shared libraries. A shared library on Mac OS X can be compiled such that its read-only (__TEXT and __LINKEDIT[17]) and read-write (__DATA) segments are split and relocated at offsets relative to specific addresses. This split-segment dynamic linking is in contrast to the traditional case where the read-only and read-write portions are not separated into predefined, nonoverlapping address ranges.
A split-segment dynamic shared library can be created by passing the -segs_read_only_addr and -segs_read_write_addr options to the static link editor (ld). Both options require a segment-aligned address as an argument, which becomes the starting address of the corresponding segments (read-only or read-write) in the library. Now, a split-segment library can be mapped so that its text segment is completely shared between taskswith a single physical map (i.e., the same page table entries)whereas the data segment is shared copy-on-write. The predefined address ranges for the text and data segments of Apple-provided libraries are 0x9000_0000-0x9FFF_FFFF and 0xA000_0000-0xAFFF_FFFF, respectively. This way, a single mapping of a shared library can be used by multiple tasks. Note that programs cannot get write access to the global shared text segment by changing its protection: Calls such as vm_protect() and mprotect() eventually call vm_map_protect() [osfmk/vm/vm_map.c], which will deny the protection change because the maximum protection value does not permit write access. If it is required to modify memory mapped in this range during runtime, an option is to use a debug version of the corresponding library, which will not be a split-segment library and as such will not map into the global shared regions.
Split-segment libraries are meant to be implemented only by Apple. Therefore, the global shared text and data regions are reserved for Apple and must not be directly used by third-party software. Let us consider the example of the system library. Its normal, nondebug version is a split-segment library with a preferred load address of 0x9000_0000 for its text segment.
A library may not load at its preferred base addressthe one it is prebound tobecause an existing mapping may collide with the library's desired address range. If instructed by the caller, the Shared Memory Server subsystem can attempt to still load the library but at an alternative locationthe library slides to a different base address. $ otool -hv /usr/lib/libSystem.dylib /usr/lib/libSystem.dylib: Mach header magic cputype cpusubtype filetype ncmds sizeofcmds flags MH_MAGIC PPC ALL DYLIB 10 2008 NOUNDEFS DYLDLINK PREBOUND SPLIT_SEGS TWOLEVEL $ otool -l /usr/lib/libSystem.dylib /usr/lib/libSystem.dylib: Load command 0 cmd LC_SEGMENT cmdsize 872 segname __TEXT vmaddr 0x90000000 ... Load command 1 cmd LC_SEGMENT cmdsize 804 segname __DATA vmaddr 0xa0000000 ... The debug version of the system library is not split-segment: It specifies 0x0000_0000 as the load address of its text segment. $ otool -hv /usr/lib/libSystem_debug.dylib /usr/lib/libSystem_debug.dylib: Mach header magic cputype cpusubtype filetype ncmds sizeofcmds flags MH_MAGIC PPC ALL DYLIB 9 2004 NOUNDEFS DYLDLINK TWOLEVEL $ otool -l /usr/lib/libSystem_debug.dylib /usr/lib/libSystem.B_debug.dylib: Load command 0 cmd LC_SEGMENT cmdsize 872 segname __TEXT vmaddr 0x00000000 ... We can instruct dyld to load the debug versions of libraries (provided a library has a debug version available) by setting the value of the DYLD_IMAGE_SUFFIX environment variable to _debug. Let us verify the difference between the mappings of the split-segment and non-split-segment versions of the system library. Note that the current and maximum permissions of the text segment in the case of the split-segment library are r-x and r-x, respectively. The corresponding permissions for the debug version are r-x and rwx, respectively. Therefore, in the case of the debug version, a debugger can request write access to that memorysay, for inserting breakpoints. $ vmmap $$ ... ==== Non-writable regions for process 25928 ... __TEXT 90000000-901a7000 [ 1692K] r-x/r-x SM=COW /usr/lib/libSystem.B.dylib __LINKEDIT 901a7000-901fe000 [ 348K] r--/r-- SM=COW /usr/lib/libSystem.B.dylib ... $ DYLD_IMAGE_SUFFIX=_debug /bin/zsh $ vmmap $$ ... ==== Non-writable regions for process 25934 ... __TEXT 01008000-0123b000 [2252K] r-x/rwx SM=COW /usr/lib/libSystem.B_debug.dylib __LINKEDIT 0124e000-017dc000 [5688K] r--/rwx SM=COW /usr/lib/libSystem.B_debug.dylib ... Using global shared regions for commonly used libraries (such as the system library, which is used by all normal programs) reduces the number of mappings maintained by the VM subsystem. In particular, shared regions facilitate prebinding since library contents are at known offsets. 8.13.2. Implementation of the Shared Memory Server SubsystemThe Shared Memory Server subsystem's implementation can be divided into a BSD front-end and a Mach back-end. The front-end provides a set of Apple-private system calls used by dyld. It is implemented in bsd/vm/vm_unix.c. The back-end, which is implemented in osfmk/vm/vm_shared_memory_server.c, hides Mach VM details and provides low-level shared memory functionality for use by the front-end. The following system calls are exported by this subsystem:
8.13.2.1. shared_region_make_private_np()shared_region_make_private_np() privatizes the current task's shared region, after which a file mapped into that region is seen only by threads in the current task. The call takes a set of address ranges as an argument. Except these explicitly specified ranges, all other mappings in the privatized "shared" region are deallocated, possibly creating holes in the region. dyld uses this call under circumstances in which a private mapping of a library is necessary or desiredsay, because the shared region is full, because the split-segment library to be loaded conflicts with one that is already loaded (and the latter is not needed by the task), or because the DYLD_NEW_LOCAL_SHARED_REGIONS environment variable was set. dyld specifies the set of ranges not to deallocate based on the split-segment libraries used by the process so far. DYLD_NEW_LOCAL_SHARED_REGIONS is useful when either additional or different libraries need to be loaded in a certain program and it is undesirable to pollute the globally shared submap. Let us consider an example. Suppose you want to experiment with an alternate version of a split-segment system library you have. Assuming that the library file is located in /tmp/lib/, you can arrange for it to be loadedsay, for the zsh programin a privatized "shared" region as follows: $ DYLD_PRINT_SEGMENTS=1 DYLD_LIBRARY_PATH=/tmp/lib \ DYLD_NEW_LOCAL_SHARED_REGIONS=1 /bin/zsh dyld: making shared regions private ... dyld: Mapping split-seg un-shared /usr/lib/libSystem.B.dylib __TEXT at 0x90000000->0x901A6FFF __DATA at 0xA0000000->0xA000AFFF ... $ echo $$ 26254 $ vmmap $$ ... __TEXT 90000000-901a7000 [ 1692K] r-x/r-x SM=COW /tmp/lib/libSystem.B.dylib __LINKEDIT 901a7000-901fe000 [ 348K] r--/r-- SM=COW /tmp/lib/libSystem.B.dylib ... Note that all processes created from this shell will inherit the privacy of the shared regionsthey will not share global shared submaps. We can modify our private copy of the system library to see this effect. $ echo $$ 26254 $ ls /var/vm/app_profile/ ls: app_profile: Permission denied $ perl -pi -e 's#Permission denied#ABCDEFGHIJKLMNOPQ#g' /tmp/lib/libSystem.B.dylib $ ls /var/vm/app_profile/ ls: app_profile: ABCDEFGHIJKLMNOPQ 8.13.2.2. shared_region_map_file_np()shared_region_map_file_np() is used by dyld to map parts of a split-segment library in the global shared read-only and read-write regions. dyld parses the load commands in the library file and prepares an array of shared region mapping structures, each of which specifies the address, size, and protection values of a single mapping. It passes this array along with an open file descriptor for the library to shared_region_map_file_np(), which attempts to establish each of the requested mappings. shared_region_map_file_np() also takes as an argument a pointer to an address variable: If the pointer is non-NULL and the requested mappings cannot fit in the target address space as desired, the kernel will attempt to slide (move around) the mappings to make them fit. The resultant slide value is returned in the address variable. If the pointer is NULL instead, the call returns an error without attempting to slide. struct shared_region_mapping_np { mach_vm_address_t address; mach_vm_size_t size; mach_vm_offset_t file_offset; vm_prot_t max_prot; vm_prot_t init_prot; }; typedef struct shared_region_mapping_np sr_mapping_t; int shared_region_map_file_np(int fd, int mapping_count, sr_mapping_t *mappings, uint64_t *slide); Note that a split-segment library file must reside on the root file system for it to be mapped into the system-wide global shared region (the default region). If the file resides on another file system, the kernel returns an EXDEV error ("Cross-device link") unless the calling task's shared region has been privatized by a prior call to shared_region_make_private_np(). shared_region_map_file_np() calls the back-end function map_shared_file() [osfmk/vm/vm_shared_memory_server.c] to perform the mappings. The back-end maintains a hash table of files loaded in the shared space. The hash function uses the relevant VM object's address. The actual mappings are handled by mach_vm_map() [osfmk/vm/vm_user.c].
The two _np (nonportable) calls do not have stubs in the system segment library, whereas the other calls do. Mac OS X 10.4 is the first system version in which these two calls are implemented. The KERN_SHREG_PRIVATIZABLE sysctl can be used to determine whether shared regions can be privatizedthat is, whether the shared_region_make_private_np() call is implemented. dyld uses this sysctl during its operation. shared_region_make_private_np() calls clone_system_shared_regions() [bsd/vm/vm_unix.c] internal function to get a private copy of the current shared regions. clone_system_shared_regions() can either completely detach the cloned region from the old region, or it can create a shadowed clone and retain all mappings of the old region. In the latter case, if the back-end fails to locate something (a VM object) in the new region, it will also look in the old region. shared_region_make_private_np() uses this call to create a detached clone. The chroot() system call also uses it, but to create a shadowed clone. 8.13.2.3. load_shared_file()load_shared_file() performs a similar role as shared_region_map_file_np() but has somewhat different semantics. Its arguments include an address in the caller's address space where the split-segment library is currently mmap()'d and an array of mapping structures (struct sf_mapping), each of which it attempts to load in the shared region. // osfmk/mach/shared_memory_server.h struct sf_mapping { vm_offset_t mapping_offset; vm_size_t size; vm_offset_t file_offset; vm_prot_t protection; vm_offset_t cksum; }; typedef struct sf_mapping sf_mapping_t; int load_shared_file(char *filename, caddr_t mmapped_file_address, u_long mmapped_file_size, caddr_t *base_address, int mapping_count, sf_mapping_t *mappings, int *flags); load_shared_file() can be passed the following flags to affect its behavior.
For each requested mapping, the sequence of actions performed by the back-end implementation of load_shared_file() includes the following:
8.13.2.4. reset_shared_file()Like load_shared_file(), reset_shared_file() takes a list of shared-file-mapping structures. For each mapping in the global shared data segment, it calls vm_deallocate() to deallocate that mapping and then calls vm_map() to create a fresh, copy-on-write mapping. In other words, this call discards any changes that the task may have made to its private copy of the library's data segment. Older versions of dyld used this call when they needed to remove a loaded split-segment librarysay, because a bundle that loaded that library failed to load. 8.13.2.5. new_system_shared_regions()new_system_shared_regions() calls remove_all_shared_regions() [osfmk/vm/vm_shared_memory_server.c] to disconnect all shared regions present in the default environment while marking the regions as stale. After this, new tasks will not have the old libraries mapped in their address spaces. load_shared_file() can be used to load new libraries into the new set of shared regions. // osfmk/kern/task.c kern_return_t task_create_internal(task_parent_task, boolean_t inherit_memory, task_t *child_task) { ... // increment the reference count of the parent's shared region shared_region_mapping_ref(parent_task->system_shared_region); new_task->system_shared_region = parent_task->system_shared_region; ... } 8.13.3. The Loading of Shared Object Files by the Dynamic LinkerWe have come across several aspects of Mach-O files in earlier chapters. We also noted that Apple does not support the creation of statically linked executables on Mac OS X. In fact, almost all executables that are part of Mac OS X are dynamically linked.
The otool and otool64 programs are examples of executables that are statically linked. As we saw in Chapter 7, the execve() system call eventually hands over control to dyld while preparing to execute a dynamically linked Mach-O file. dyld processes several load commands found in the Mach-O file. In particular, dyld loads the shared libraries the program depends on. If the libraries depend on other libraries, dyld loads them too, and so on. dyld was overhauled in Mac OS X 10.4. The following are important differences between the overhauled version and the older versions.
Figures 822 and 823 depict the operation of dyld while loading non-split-segment and split-segment Mach-O files, respectively. Figure 822. dyld's operation while loading a non-split-segment fileFigure 823. dyld's operation while loading a split-segment file8.13.4. The Use of shared_region_map_file_np() by a System ApplicationAlthough all typical user applications benefit from the services of the Shared Memory Server subsystem, the corresponding APIs are reserved exclusively for Apple-provided applications, with dyld being the only client. Using these APIs can affect all applications on a systempotentially adversely. Therefore, third-party programs must not use these APIs, at least in products. With this caveat, let us look at an example of programmatically mapping a split-segment library into the global shared region. This example will help illustrate the actual working of this mechanism. Figure 824 shows the program. Figure 824. Using shared_region_map_file_np()
We can test the program in Figure 824 by loading a trivial split-segment library in the global shared region. Figure 825 shows the test. Figure 825. Loading a split-segment library in the global shared region
8.13.5. A Note on PrebindingA prebound Mach-O executable contains an additional type of load command: LC_PREBOUND_DYLIB.[18] There is one such command for every shared library that the prebound executable links to. Figure 826 shows the structure of this load command. The command is described by a prebound_dylib_command structure. The structure's name field refers to the prebound shared library's name. The nmodules field specifies the number of modules in the librarya single object file (a ".o") amounts to one module, and so does the linkedit data. The linked_modules field refers to a bit vector that contains a bit for each module in the library. If a module in the library is linked to a module in the executable, the bit corresponding to that library module is set in the vector.
Figure 826. Structure of the LC_PREBOUND_DYLIB load command
Since prebinding executables is deprecated beginning with Mac OS X 10.4, the static linker does not create prebound executables unless the environment variable MACOSX_DEPLOYMENT_TARGET is set to an earlier version of Mac OS Xfor example, 10.3. Note that although the generation of prebound executables is deprecated on Mac OS X 10.4, an Apple-provided executable may be prebound. $ otool -hv /Applications/iTunes.app/Contents/MacOS/iTunes # PowerPC /Applications/iTunes.app/Contents/MacOS/iTunes: Mach header magic cputype cpusubtype filetype ncmds sizeofcmds flags MH_MAGIC PPC ALL EXECUTE 115 14000 NOUNDEFS DYLDLINK PREBOUND TWOLEVEL $ otool -l /Applications/iTunes.app/Contents/MacOS/iTunes | \ grep LC_PREBOUND_DYLIB | wc -l 90 |