Process Identity Crisis: The exec() System Call

When an established thread wants to replace its logical memory view with that of another program ("changing its stripes," so to speak), it makes a system call to exec(). The exec() call is passed the name of an executable file. The file may be either a compiled binary image or a text file to be processed as a script by an interpreter program such as the POSIX shell or Perl.

In the case of a compiled program, the file starts with a header written in a format recognizable to the operating system and containing a "magic number" indicating the hardware platform, O/S type, and version for which the code was compiled. If the magic number does not match those acceptable to the kernel, an error message stating "bad magic" is generated and the exec() call fails.

HP-UX 32-bit executables use the Spectrum Object Module (SOM) header format. The name echoes the history of the PA-RISC processor design, which was originally called the Spectrum computing family. When the 64-bit PA-RISC hardware was introduced (PA-RISC 2.0), HP-UX adopted the ELF-64 header format for wide executables (there is an ELF-32 definition, but HP-UX does not currently use it).

The Kernel's Draft Horse, getxfile()

The exec() call relies heavily on the kernel routine getxfile() to do the majority of its work. The sequence of actions taken depends on the type of fork used to create the calling thread.

The first challenge is to determine whether the file named is a suitable object for running. Three tests apply: the program file must not be open to any process for writing; the file must be executable on the system (the magic number check); and the requesting thread's UID or GID must have the proper credentials for execution. Assuming these tests are passed, the next step depends on how the caller was created: in the fork()/exec() case, the existing vas and pregion lists must be deconstructed, while in the vfork()/exec() case, a uarea, vas, and pregion/region list must be created from scratch.
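
The three suitability tests can be pictured with a small toy model. This is only an illustrative sketch in Python, not kernel code: the ProgramFile and Caller records, the magic labels "SOM32" and "ELF64", and the "text busy" and "permission denied" strings are all assumptions made for the example (only "bad magic" comes from the text), and the credential test is reduced to classic owner/group/other execute bits with no superuser or ACL handling.

```python
from dataclasses import dataclass

# Hypothetical labels standing in for the magic numbers the kernel accepts.
ACCEPTED_MAGICS = {"SOM32", "ELF64"}


@dataclass
class ProgramFile:
    magic: str            # label standing in for the header's magic number
    open_for_write: bool  # True if any process has the file open for writing
    owner_uid: int
    group_gid: int
    mode: int             # classic permission bits, e.g. 0o750


@dataclass
class Caller:
    uid: int
    gid: int


def may_execute(f: ProgramFile, who: Caller) -> bool:
    """Simplified credential test: pick the owner, group, or other execute bit."""
    if who.uid == f.owner_uid:
        return bool(f.mode & 0o100)
    if who.gid == f.group_gid:
        return bool(f.mode & 0o010)
    return bool(f.mode & 0o001)


def check_exec_target(f: ProgramFile, who: Caller):
    """Return None if the file may be run, otherwise a short reason string."""
    if f.open_for_write:
        return "text busy"          # illustrative label, not from the text
    if f.magic not in ACCEPTED_MAGICS:
        return "bad magic"
    if not may_execute(f, who):
        return "permission denied"  # illustrative label, not from the text
    return None
```

For example, check_exec_target(ProgramFile("SOM32", False, 100, 20, 0o750), Caller(100, 20)) returns None, while the same file presented with an unrecognized magic label yields "bad magic".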
In both the fork() and vfork() cases, getxfile() must then build the new logical memory view in the system's virtual space.

Calling exec() Immediately After vfork()

In the vfork()/exec() case (where vfork_state=VFORK_CHILDRUN), we first need to request the creation of a new uarea for the thread along with minimal vas, pregion, and region structures. Next, the content of the parent's uarea (the one we are currently using) is copied to the new one, and another routine is called to switch the thread's execution trace to the new uarea and kernel stack. The vfork_state is changed to VFORK_CHILDEXIT, and we initiate the cleanup of the parent's context. Once the parent uarea and stack have been restored, the parent wakes and releases the vforkinfo buffer as it resumes.

Calling exec() After a fork()

Following a call to fork(), the newly created child has a vas, pregions, private and shared regions, and an initial thread and uarea. In this case, getxfile() must dispose of most of the existing pregions and regions. This function is performed by the kernel procedure dispreg(). Figure 9-5 shows the state of a process's memory view after the disposition of its old image.

Figure 9-5. The exec() Call: Disposing of Old Regions

Removing pregions, regions, and b-trees

The first step in removing pregions, regions, and b-trees is to acquire a write lock for the process and terminate any existing sibling threads. Next, we walk the pregion list and dispose of every pregion except our own thread's uarea (p_type=PT_UAREA). If we are attempting to perform an exec self (defined as a process attempting to execute itself with another copy of the same program image), we also keep the p_type=PT_TEXT and p_type=PT_NULLDREF pregions and their regions to save time when we reconstruct the new logical view.

Disposing of a pregion involves several steps. First, it must be removed from the active list. If vhand's agehand or stealhand is currently pointing to it, that hand is moved to p_next so vhand won't waste time on a pregion that is going away.
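
The selection rule and the vhand adjustment described so far can be modeled with a short toy walk. This is a sketch under loud assumptions, not the dispreg() implementation: it treats the pregion list and vhand's active list as one simple Python list, uses the next list element as p_next, represents the two hands as a dictionary, and invents a thread field to identify whose uarea a PT_UAREA pregion is. The PT_DATA and PT_STACK labels in the usage note are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Pregion:
    p_type: str
    thread: int = 0  # hypothetical field: owning thread of a PT_UAREA pregion


def dispose_old_pregions(pregions, hands, my_thread, exec_self=False):
    """Walk the list and split it into (kept, disposed).

    hands is a dict with keys "agehand" and "stealhand"; a hand that points
    at a pregion being disposed is advanced to that pregion's successor.
    """
    # On an exec self, text and null-dereference pregions are retained.
    keep_types = {"PT_TEXT", "PT_NULLDREF"} if exec_self else set()
    kept, disposed = [], []
    for i, prp in enumerate(pregions):
        is_my_uarea = prp.p_type == "PT_UAREA" and prp.thread == my_thread
        if is_my_uarea or prp.p_type in keep_types:
            kept.append(prp)
            continue
        # Stand-in for p_next: the following list element, or None at the end.
        p_next = pregions[i + 1] if i + 1 < len(pregions) else None
        for name in ("agehand", "stealhand"):
            if hands[name] is prp:
                hands[name] = p_next
        disposed.append(prp)
    return kept, disposed
```

With a list holding PT_NULLDREF, PT_TEXT, PT_DATA, PT_STACK, and the caller's PT_UAREA, an exec self keeps the first two and the uarea; an ordinary exec keeps only the uarea. A hand that was parked on a disposed pregion ends up on the first surviving pregion after it.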
Once the pregion is off the active list, we follow its p_reg pointer to the associated region structure. If ours is the last pregion to reference the region (r_refcnt=1), we schedule the region's cleanup; if others are still referencing an RT_SHARED region, we simply decrement its r_refcnt. After dealing with the region, the pregion structure itself is freed and returned to the kernel memory arenas.

The actual cleanup of a region requires that we wait for any pending I/O operations to its pages to complete. We then walk the b-tree and delete all virtual address translations, purging them from the processor TLBs and freeing the associated pdir entries. If the virtual-to-physical pde was allocated from an alias or sparse pdir entry, it is returned to the appropriate free list. If it was located in the machine-visible hashtable, it is simply marked as invalid (pde_valid is cleared).

If this is the only virtual reference to a physical page (pf_use=1 in the page's pfdat), we must free the physical page and adjust the system's available page count, freemem (if any process threads are currently blocked, waiting on a memory page, we wake them). The page entry in pfdat_ptr and virt_to_virt_prt is updated, and appropriate adjustments to the swap reservation and allocation structures are also made.

Finally, we release any reserved or allocated swap and physical pages used to hold the region's page list (the pages needed to hold the b-tree nodes and chunks). The region is unlinked from the forward and backward pointers of the systemwide region list, and we return its structure to the kernel memory arena.

Building the New Logical View

At this point, regardless of which path we followed, getxfile() sets up the new logical view and maps it within the system's virtual address space (see Figure 9-6). To take stock, we have a new proc and kthread structure and a minimal vas with a uarea.

Figure 9-6. getxfile() Rebuilds the Memory View

The exec routine opens the program file and passes its vnode to getxfile() along with its attributes and load-specific information from the header. If the program file is a text file, the vnode passed points to the executable image of the interpreter program, which is to be run to process the script.

Now it's finally time to build the new image map! The compiler magic type tells us what type of mapping (DEMAND_MAGIC, EXEC_MAGIC, SHMEM_MAGIC, SHARE_MAGIC) to use as we locate the regions in quadrants. First, the null dereference and text regions are configured and mapped (if we are doing an exec self, these regions will have been retained). Private regions for data (initialized, BSS, and heap) and the user stack are added next. Additional regions are added as required for shared memory objects and memory-mapped files (both private and shared varieties).

The transformation is now complete, and we return from the exec() call wearing our new image and ready to run its code.

Before we examine the final call in a process's life cycle, exit(), let's take a closer look at the way shared memory objects are mapped into the system's virtual space and attached to a process's logical view.