6.4. Entering the Kernel

On a typical operating system, user processes are logically insulated from the kernel's memory by using different processor execution modes. The Mac OS X kernel executes in a higher-privileged mode (PowerPC OEA) than any user program (PowerPC UISA and VEA). Each user process (that is, each Mach task) has its own virtual address space. Similarly, the kernel has its own, distinct virtual address space that does not occupy a subrange of the maximum possible address space of a user process. Specifically, the Mac OS X kernel has a private 32-bit (4GB) virtual address space, and so does each 32-bit user process. Similarly, a 64-bit user process also gets a private virtual address space that is not subdivided into kernel and user parts.
Although the Mac OS X user and kernel virtual address spaces are not subdivisions of a single virtual address space, the amounts of virtual memory usable within both are restricted due to conventional mappings. For example, kernel addresses in the 32-bit kernel virtual address space lie between 0x1000 and 0xDFFFFFFF (3.5GB). Similarly, the amount of virtual memory a 32-bit user process can use is significantly less than 4GB, since various system libraries are mapped by default into each user address space. We will see specific examples of such mappings in Chapter 8. We will refer to the kernel virtual address space simply as the kernel space. Moreover, even though each user process has its own address space, we will often use the phrase the user space when the specific process is not relevant. In this sense, we can think of all user processes as residing in the user space. The following are some important characteristics of the kernel and user spaces.
Since the kernel mediates access to physical resources, a user program must exchange information with the kernel to avail itself of the kernel's services. Typical user-space execution requires exchange of both control information and data. In such an exchange between a Mach task and the kernel, a thread within the task transitions to kernel space from user space, transferring control to the kernel. After handling the user thread's request, the kernel returns control to the thread, allowing it to continue normal execution. At other times, the kernel can acquire control even though the current thread was not involved in the reason for the transfer; in fact, the transfer is often not explicitly requested by the programmer. We refer to execution within the kernel and user spaces as being in the kernel mode and the user mode, respectively.
6.4.1. Types of Control Transfer

Although such transfers of control are traditionally divided into categories based on the events that caused them, at the PowerPC processor level, all categories are handled by the same exception mechanism. Examples of events that can cause the processor to change execution mode include the following:
Nevertheless, it is still useful to categorize control transfers in Mac OS X based on the events causing them. Let us look at some broad categories.

6.4.1.1. External Hardware Interrupts

An external hardware interrupt is a transfer of control into the kernel that is typically initiated by a hardware device to indicate an event. Such interrupts are signaled to the processor by the assertion of the processor's external interrupt input signal, which causes an external interrupt exception in the processor. External interrupts are asynchronous, and their occurrence is typically unrelated to the currently executing thread. Note that external interrupts can be masked.

An example of an external interrupt is a storage device controller causing an interrupt to signal the completion of an I/O request. On certain processors, such as the 970FX, a thermal exception (used to notify the processor of an abnormal condition) is signaled by the assertion of the thermal interrupt input signal. In this case, even though the abnormal condition is internal to the processor, the source of the interrupt is external.

6.4.1.2. Processor Traps

A processor trap is a transfer of control into the kernel that is initiated by the processor itself because of some event that needs attention. Processor traps may be synchronous or asynchronous. Although the conditions that cause traps could all be termed abnormal in that they are all exceptional (hence the exception), it is helpful to subclassify them as expected (such as page faults) or unexpected (such as a hardware failure). Other examples of reasons for traps include divide-by-zero errors, completion of a traced instruction, illegal access to memory, and the execution of an illegal instruction.

6.4.1.3. Software Traps

The Mac OS X kernel implements a mechanism called asynchronous system traps (ASTs), wherein one or more reason bits can be set by software for a processor or a thread. Each bit represents a particular software trap.
When a processor is about to return from an interrupt context, including returns from system calls, it checks for these bits, and takes a trap if it finds one. The latter operation involves executing the corresponding interrupt-handling code. A thread checks for such traps in many cases when it is about to change its execution state, such as from being suspended to running. The kernel's clock interrupt handler also periodically checks for ASTs. We categorize ASTs on Mac OS X as software traps because they are both initiated and handled by software. Some AST implementations may use hardware support.

6.4.1.4. System Calls

The PowerPC system call instruction is used by programs to generate a system call exception, which causes the processor to prepare and execute the system call handler in the kernel. The system call exception is synchronous. Hundreds of system calls constitute a well-defined set of interfaces serving as entry points into the kernel for user programs.
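As a rough illustration of entering the kernel through a numbered system call, the following sketch asks for the caller's process ID with the generic syscall() wrapper found on many UNIX-like systems, rather than with the getpid() library function. The wrapper passes the call number to the kernel and executes the platform's system call instruction on the caller's behalf. The function name pid_via_raw_syscall is hypothetical, and syscall() itself is deprecated on recent macOS releases, so treat this as a portable sketch rather than the recommended way to reach the kernel.

```c
#define _GNU_SOURCE         /* exposes syscall() on glibc; harmless elsewhere */
#include <sys/syscall.h>    /* SYS_getpid */
#include <unistd.h>         /* syscall(), getpid() */

/*
 * Hypothetical helper: obtain the process ID by system call number,
 * bypassing the getpid() library wrapper. The transfer into the kernel
 * is synchronous: control returns here only after the kernel has
 * handled the request.
 */
long pid_via_raw_syscall(void)
{
    return (long)syscall(SYS_getpid);
}
```

Calling pid_via_raw_syscall() should yield the same value as getpid(), since both end up in the same kernel entry point.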
To sum up, a hardware interrupt from an external device generates an external interrupt exception, a system call generates a system call exception, and other situations result in a variety of exceptions.

6.4.2. Implementing System Entry Mechanisms

PowerPC exceptions are the fundamental vehicles for propagating any kind of interrupt (other than ASTs), whether hardware- or software-generated. Before we discuss how some of these exceptions are processed, let us look at the key components of the overall PowerPC exception-processing mechanism on Mac OS X. These include the following, some of which we have come across in earlier chapters:
A system linkage instruction connects user-mode and supervisor-mode software. For example, by using a system linkage instruction (such as sc), a program can call on the operating system to perform a service. Conversely, after performing the service, the operating system can return to user-mode software by using another system linkage instruction (such as rfid).

6.4.2.1. Exceptions and Exception Vectors

The __VECTORS segment of the kernel executable (Figure 6-5) contains the kernel's exception vectors. As we saw in Chapter 4, BootX copies these to their designated location (starting at 0x0) before transferring control to the kernel. These vectors are implemented in osfmk/ppc/lowmem_vectors.s.

Figure 6-5. The Mach-O segment containing the exception vectors in the kernel executable
Table 5-1 lists various PowerPC processor exceptions and some of their details. Recall that most exceptions are subject to one or more conditions; for example, most exceptions can occur only when no higher-priority exception exists. Similarly, exceptions caused by failed effective-to-virtual address translations can occur only if address translation is enabled. Moreover, depending on a system's specific hardware, or whether the kernel is being debugged, some exceptions listed in Table 5-1 may be inconsequential.

Figure 6-6 shows an excerpt from lowmem_vectors.s. For example, when there is a system call exception, the processor executes the code starting at the label .L_handlerC00 (vector offset 0xC00).

Figure 6-6. The kernel's exception vectors
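To make the idea of a vector offset concrete, here is a minimal sketch in C of a lookup from offset to exception name. Only the 0xC00 system call offset comes from the text above; the remaining entries are the conventional PowerPC vector offsets, and the type and function names (ppc_vector, ppc_vector_name) are hypothetical. It is not how the kernel dispatches exceptions: the processor simply begins fetching instructions at the vector address.

```c
#include <stddef.h>

/* Hypothetical table entry: a vector offset and a human-readable name. */
struct ppc_vector {
    unsigned int offset;
    const char  *name;
};

/* A subset of the conventional PowerPC exception vector offsets. */
static const struct ppc_vector ppc_vectors[] = {
    { 0x100, "system reset" },
    { 0x200, "machine check" },
    { 0x300, "data storage" },
    { 0x400, "instruction storage" },
    { 0x500, "external interrupt" },
    { 0x600, "alignment" },
    { 0x700, "program" },
    { 0x800, "floating-point unavailable" },
    { 0x900, "decrementer" },
    { 0xC00, "system call" },
    { 0xD00, "trace" },
};

/* Return the name for a vector offset, or NULL if it is not in the table. */
const char *ppc_vector_name(unsigned int offset)
{
    size_t i;

    for (i = 0; i < sizeof(ppc_vectors) / sizeof(ppc_vectors[0]); i++) {
        if (ppc_vectors[i].offset == offset)
            return ppc_vectors[i].name;
    }
    return NULL;
}
```

Because the vectors are copied to a region starting at 0x0, an offset such as 0xC00 is also the address at which the corresponding handler code begins in that region.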
The exception vectors for the x86 version of Darwin are implemented in osfmk/i386/locore.s.
Figure 6-7. Trap vectors in Third Edition UNIX
6.4.2.2. Exception-Handling Registers

The Machine Status Save/Restore Register 0 (SRR0) is a special branch-processing register in the PowerPC architecture. It is used to save machine status on interrupts and to restore machine status on return from interrupts. When an interrupt occurs, SRR0 is set to the current or next instruction address, depending on the nature of the interrupt. For example, if an interrupt is caused by an illegal instruction exception, then SRR0 will contain the address of the current instruction (the one that failed to execute).

SRR1 is used for a related purpose: It is loaded with interrupt-specific information when an interrupt occurs. It also mirrors certain bits of the Machine State Register (MSR) in the case of an interrupt.

The special-purpose registers SPRG0, SPRG1, SPRG2, and SPRG3 are used as support registers (in an implementation-dependent manner) in various stages of exception processing. For example, the Mac OS X kernel uses SPRG2 and SPRG3 to save interrupt-time general-purpose registers GPR13 and GPR11, respectively, in the implementation of the low-level exception vectors. Furthermore, it uses SPRG0 to hold a pointer to the per_proc structure.

6.4.2.3. System Linkage Instructions

System Call

When a system call is invoked from user space, GPR0 is loaded with the system call number, and the sc instruction is executed. The effective address of the instruction following the system call instruction is placed in SRR0, certain bit ranges of the MSR are placed into the corresponding bits of SRR1, certain bits of SRR1 are cleared, and a system call exception is generated. The processor fetches the next instruction from the well-defined effective address of the system call exception handler.

Return from Interrupt

rfid (return-from-interrupt-double-word) is a privileged, context-altering, and context-synchronizing instruction used to continue execution after an interrupt.
Upon its execution, among other things, the next instruction is fetched from the address specified by SRR0. rfid's 32-bit counterpart is the rfi instruction.
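The interplay between sc, SRR0, SRR1, and rfid described above can be sketched with a toy model in C. Everything here is a simplification under stated assumptions: the toy_cpu structure and the toy_sc and toy_rfid functions are hypothetical, the whole MSR is copied into SRR1 (real hardware copies only certain bit ranges and clears others), and only the MSR[PR] problem-state bit is modeled to distinguish user mode from supervisor mode.

```c
#include <stdint.h>

#define TOY_VEC_SYSCALL ((uint64_t)0xC00)    /* system call vector offset */
#define TOY_MSR_PR      ((uint64_t)1 << 14)  /* MSR[PR]: set in user mode */

/* Hypothetical, heavily reduced processor state. */
struct toy_cpu {
    uint64_t pc;    /* address of the instruction being executed */
    uint64_t msr;   /* machine state register */
    uint64_t srr0;  /* save/restore register 0 */
    uint64_t srr1;  /* save/restore register 1 */
    uint64_t gpr0;  /* holds the system call number on entry */
};

/* Model the effect of executing sc at the current pc. */
void toy_sc(struct toy_cpu *cpu)
{
    cpu->srr0 = cpu->pc + 4;      /* resume after the sc instruction */
    cpu->srr1 = cpu->msr;         /* simplified: copy the entire MSR */
    cpu->msr &= ~TOY_MSR_PR;      /* switch to supervisor mode */
    cpu->pc   = TOY_VEC_SYSCALL;  /* fetch from the system call vector */
}

/* Model the effect of rfid: restore state and resume at SRR0. */
void toy_rfid(struct toy_cpu *cpu)
{
    cpu->msr = cpu->srr1;         /* simplified: restore the entire MSR */
    cpu->pc  = cpu->srr0;         /* continue where the thread left off */
}
```

In this model, a user-mode thread that executes sc at address 0x2000 ends up at 0xC00 in supervisor mode with SRR0 holding 0x2004; a subsequent rfid returns it to 0x2004 in user mode.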
A context-altering instruction is one that alters the context in which instructions are executed, data is accessed, or data and instruction addresses are interpreted in general. A context-synchronizing instruction is one that ensures that any address translations associated with instructions following it will be discarded if the translations were performed using the old contents of the page table entry (PTE).

6.4.2.4. Machine-Dependent Thread State

We will examine the in-kernel thread data structure [osfmk/kern/thread.h] and related structures in Chapter 7. Each thread contains a machine-dependent state, represented by a machine_thread structure [osfmk/ppc/thread.h]. Figure 6-8 shows a portion of the machine_thread structure. Its fields include the following.
Figure 6-8. Structure for a thread's machine-dependent state
6.4.2.5. Exception Save Areas

Save areas are fundamental to xnu's exception processing. Important characteristics of the kernel's save area management include the following.
We can write a simple program as follows to display some save-area-related sizes used by the kernel.

    $ cat savearea_sizes.c
    // savearea_sizes.c

    #include <stdio.h>
    #include <stdlib.h>

    #define XNU_KERNEL_PRIVATE
    #define __APPLE_API_PRIVATE
    #define MACH_KERNEL_PRIVATE

    #include <osfmk/ppc/savearea.h>

    int
    main(void)
    {
        // cast to long so that the arguments match the %ld conversions
        printf("size of a save area structure in bytes = %ld\n",
               (long)sizeof(savearea));
        printf("# of save areas per page = %ld\n", (long)sac_cnt);
        printf("# of save areas to make at boot time = %ld\n",
               (long)InitialSaveAreas);
        printf("# of save areas for an initial target = %ld\n",
               (long)InitialSaveTarget);

        exit(0);
    }
    $ gcc -I /work/xnu -Wall -o savearea_sizes savearea_sizes.c
    $ ./savearea_sizes
    size of a save area structure in bytes = 640
    # of save areas per page = 6
    # of save areas to make at boot time = 48
    # of save areas for an initial target = 24

Structure declarations for the various save area types are also contained in osfmk/ppc/savearea.h.

    // osfmk/ppc/savearea.h

    #ifdef MACH_KERNEL_PRIVATE
    typedef struct savearea_comm {
        // ... fields common to all save areas
        // ... fields used to manage individual contexts
    } savearea_comm;
    #endif

    #ifdef BSD_KERNEL_PRIVATE
    typedef struct savearea_comm {
        unsigned int save_000[24];
    } savearea_comm;
    #endif

    typedef struct savearea {
        savearea_comm save_hdr;
        // general context: exception data, all GPRs, SRR0, SRR1, XER, LR, CTR,
        // DAR, CR, DSISR, VRSAVE, VSCR, FPSCR, Performance Monitoring Counters,
        // MMCR0, MMCR1, MMCR2, and so on
        ...
    } savearea;

    typedef struct savearea_fpu {
        savearea_comm save_hdr;
        ...
        // floating-point context (that is, all FPRs)
    } savearea_fpu;

    typedef struct savearea_vec {
        savearea_comm save_hdr;
        ...
        save_vrvalid; // valid VRs in saved context
        // vector context (that is, all VRs)
    } savearea_vec;
    ...

When a new thread is created, a save area is allocated for it by machine_thread_create() [osfmk/ppc/pcb.c]. The save area is populated with the thread's initial context.
Thereafter, a user thread begins life with a taken interrupt; that is, to an observer it looks as though the thread is in the kernel because of an interrupt. It returns to user space through thread_return() [osfmk/ppc/hw_exception.s], retrieving its context from the save area. In the case of kernel threads, machine_stack_attach() [osfmk/ppc/pcb.c] is called to attach a kernel stack to a thread and initialize its state, including the address where the thread will continue execution.

    // osfmk/ppc/pcb.c

    kern_return_t
    machine_thread_create(thread_t thread, task_t task)
    {
        savearea *sv; // pointer to newly allocated save area
        ...
        sv = save_alloc(); // allocate a save area

        bzero((char *)((unsigned int)sv // clear the save area
              + sizeof(savearea_comm)),
              (sizeof(savearea) - sizeof(savearea_comm)));

        sv->save_hdr.save_prev = 0; // clear the back pointer
        ...
        sv->save_hdr.save_act = thread; // set who owns it
        thread->machine.pcb = sv; // point to the save area

        // initialize facility context
        thread->machine.curctx = &thread->machine.facctx;

        // initialize facility context pointer to activation
        thread->machine.facctx.facAct = thread;
        ...
        thread->machine.upcb = sv; // set user pcb
        ...
        sv->save_fpscr = 0; // clear all floating-point exceptions
        sv->save_vrsave = 0; // set the vector save state
        ...
        return KERN_SUCCESS;
    }
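The pattern used by machine_thread_create() to clear a save area (zero everything that follows the common header, then fill in the header's back pointer and owner) can be sketched in isolation as follows. This is only an illustration under assumptions: toy_hdr, toy_savearea, and toy_savearea_init are hypothetical stand-ins, and their fields are far simpler than those of the real savearea_comm and savearea structures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the common save area header. */
struct toy_hdr {
    struct toy_savearea *prev;   /* back pointer to the previous save area */
    void                *owner;  /* thread that owns this save area */
    unsigned int         flags;  /* management state kept by the allocator */
};

/* Hypothetical stand-in for a general-context save area. */
struct toy_savearea {
    struct toy_hdr hdr;
    unsigned long  gpr[32];      /* general-purpose registers */
    unsigned long  srr0, srr1;   /* saved SRR0 and SRR1 */
    unsigned int   fpscr, vrsave;
};

/*
 * Clear the context portion of a save area while leaving the header's
 * management fields alone, then record the owner and reset the back
 * pointer.
 */
struct toy_savearea *toy_savearea_init(struct toy_savearea *sv, void *owner)
{
    memset((char *)sv + sizeof(struct toy_hdr), 0,
           sizeof(*sv) - sizeof(struct toy_hdr));
    sv->hdr.prev  = NULL;
    sv->hdr.owner = owner;
    return sv;
}
```

The kernel excerpt casts the save area pointer to unsigned int before adding the header size, which is workable in a 32-bit kernel; the sketch uses char * arithmetic instead so that it behaves the same regardless of pointer width.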