9.3. Mach IPC: The Mac OS X Implementation

The core of the IPC subsystem is implemented in files in the osfmk/ipc/ directory in the kernel source tree. Moreover, the osfmk/kern/ipc_* set of files implements IPC support functions and IPC-related functions for kernel objects such as tasks and threads. Figure 9-5 shows an overview of the Mach IPC implementation in Mac OS X. We will examine the pieces of this picture in the next few sections.

Figure 9-5. An overview of Mach IPC implementation in Mac OS X

9.3.1. IPC Spaces

Each task has a private IPC space (a namespace for ports) that is represented by the ipc_space structure in the kernel. A task's IPC space defines its IPC capabilities. Consequently, IPC operations such as send and receive consult this space. Similarly, IPC operations that manipulate a task's rights operate on the task's IPC space. Figure 9-6 shows the fields of the ipc_space structure.

Figure 9-6. The data structure for a task's IPC space
The IPC space encapsulates the knowledge necessary to translate between task-specific (local) port names and kernel-wide (global) port data structures. This translation is implemented using translation entries for port capabilities. Each capability is recorded in the kernel using an IPC entry data structure (struct ipc_entry). An IPC space always contains a table of IPC entries that is pointed to by the is_table field of the ipc_space structure. It can also contain a splay tree[6] of IPC entries, in which case the is_tree field will be non-NULL. Both of these are per-task data structures.
The table holds "small" port rights, with each table entry (struct ipc_entry) consuming 16 bytes. If a port right is contained in the table, the right's name is an index into the table. The splay tree holds "large" port rights, with each tree entry (struct ipc_tree_entry) consuming 32 bytes.
9.3.1.1. IPC Entry Table

In general, port right names, which are integers (see Section 9.3.2), do fit in a table because the number of ports a typical task uses is small enough. As we will see shortly, Mach allows a task to rename a port. Moreover, ports can also be allocated using caller-specified names. This means a port name could represent an index that is out of bounds for the task's table. Such rights can be accommodated by overflowing them to the task's splay tree. To minimize memory consumption, the kernel dynamically adjusts the threshold at which entries are held in the splay tree.

In fact, the table can also be grown in size. When the kernel does grow the table, it expands it to a new size that is specified (in units of number of table entries) by the is_table_next field of the ipc_space structure. As shown in Figure 9-5, the is_table_next field points to an ipc_table_size structure. The kernel maintains an array called ipc_table_entries of such structures. This array, which is populated during the IPC subsystem's initialization, is simply a predefined sequence of table sizes.
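The growth scheme just described can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the struct and function names (table_size, space, space_grow, and so on) are hypothetical stand-ins for the kernel's types, the size sequence is invented for illustration, and the model simply stops growing once the sequence is exhausted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int entry_num_t;

struct table_size {               /* models struct ipc_table_size */
    entry_num_t its_size;         /* number of entries at this step */
};

/* Illustrative sequence only; the real ipc_table_entries values differ. */
static const struct table_size table_sizes[] = {
    { 16 }, { 32 }, { 64 }, { 128 }
};
#define NUM_SIZES (sizeof(table_sizes) / sizeof(table_sizes[0]))

struct entry { void *object; };   /* models struct ipc_entry */

struct space {                    /* models struct ipc_space */
    struct entry *is_table;
    entry_num_t is_table_size;
    const struct table_size *is_table_next;  /* size for the next growth */
};

/* Start with the smallest table. Returns 0 on success, -1 on failure. */
static int space_init(struct space *s)
{
    assert(s != NULL);
    s->is_table = calloc(table_sizes[0].its_size, sizeof(struct entry));
    if (s->is_table == NULL)
        return -1;
    s->is_table_size = table_sizes[0].its_size;
    s->is_table_next = &table_sizes[1];
    return 0;
}

/* An index is usable in the table only if it is within bounds; otherwise
 * the right would have to be held in the overflow (splay) tree. */
static int index_fits_in_table(const struct space *s, entry_num_t index)
{
    return index < s->is_table_size;
}

/* Grow to the size named by is_table_next, then advance is_table_next.
 * Returns 0 on success, -1 if no larger size is left or allocation fails. */
static int space_grow(struct space *s)
{
    const struct table_size *last = &table_sizes[NUM_SIZES - 1];
    struct entry *bigger;
    entry_num_t new_size;

    if (s->is_table_next == NULL)
        return -1;
    new_size = s->is_table_next->its_size;
    bigger = calloc(new_size, sizeof(struct entry));
    if (bigger == NULL)
        return -1;
    memcpy(bigger, s->is_table, s->is_table_size * sizeof(struct entry));
    free(s->is_table);
    s->is_table = bigger;
    s->is_table_size = new_size;
    s->is_table_next = (s->is_table_next == last) ? NULL
                                                  : s->is_table_next + 1;
    return 0;
}
```

In this model, an index of 20 does not fit the initial 16-entry table but does fit after one call to space_grow().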
When a port right whose entry is in the table is deleted, the entry is placed on a free list of unused entries. The list is maintained within the table itself by chaining together unused entries through their ie_next fields. When the next port right is allocated, the last freed entry (if any) is used. The ie_index field implements an ordered hash table used for (reverse) translating an { IPC space, IPC object } pair to a name. This hash table uses open addressing with linear probing.

9.3.1.2. IPC Entry Splay Tree

As shown in Figure 9-5, an entry in the splay tree consists of an ipc_entry structure (the same as a table entry) along with the following additional fields: name, IPC space, and pointers to left and right children. The ite_next field implements a global open hash table used for (reverse) translating an { IPC space, IPC object } pair to a { name, IPC entry } pair.

9.3.2. The Anatomy of a Mach Port

A Mach port is represented in the kernel by a pointer to an ipc_port structure. The IPC entry structure's ipc_object field points to an ipc_object structure, which is logically superimposed on an ipc_port structure. Figure 9-7 shows an internal representation of the port data structure.

Figure 9-7. A view of the internal structure of a Mach port
From an object-oriented perspective, an ipc_port structure is a subclass of an ipc_object structure. Ports can be grouped into port sets in Mach, with the corresponding structure being an ipc_pset structure [osfmk/ipc/ipc_pset.h]. In such a case, a right will be represented in the kernel by passing a pointer to the ipc_pset structure in question (rather than an ipc_port structure). Another possibility is an rpc_port structure.

The fields of an ipc_port structure include a pointer to the IPC space of the task holding the receive right, a pointer to the kernel object that the port represents, and various reference counts such as the make-send count, the number of send rights, and the number of send-once rights.

9.3.2.1. What's in a Port's Name?

It is important to realize the difference between mach_port_t and mach_port_name_t: The two are treated the same in user space but not in the kernel. A port's name is relevant only in a particular namespace, corresponding to a task. A mach_port_name_t represents the local, namespace-specific identity of a port, without implying any associated rights. A mach_port_t represents a reference added to or deleted from a port right. Such a reference is represented in user space by returning the name of the right (or many rights) that was altered within the task's IPC space, which is why it is the same as a mach_port_name_t in user space. Within the kernel, however, port rights are represented by passing a pointer to the appropriate port data structure (ipc_port_t). If a user program receives a mach_port_name_t from the kernel, it means that the kernel has not mapped any associated port rights: the name is simply the port's integer representation. When the kernel returns a mach_port_t, it maps the associated port rights to the recipient of the message. In both cases, the user program sees the same integer, but with different underlying semantics.
The same port can exist with different names in multiple tasks. Conversely, the same port name can represent different ports in different tasks. It is important to note that knowing a port name in another task is not enough to use that port, since the kernel will evaluate the name in the caller's IPC space. For example, if you print a mach_port_name_t value in a program and then attempt to use the value in another task (one that does not have send rights to that port) to send a message, you will not succeed.

In a given port namespace, if there exist multiple rights for a given port, say, a send right and a receive right, the names for the various rights will coalesce into a single name. In other words, a single name can denote multiple rights. This is not so in the case of send-once rights, which are always named uniquely. The ie_bits field of the ipc_entry structure holds the types of rights a given name represents. This bitmap is what allows a single name in an IPC space to represent multiple rights. The IE_BITS_TYPE macro is used to test the bit values.
    // osfmk/mach/mach_port.h

    typedef natural_t mach_port_right_t;

    #define MACH_PORT_RIGHT_SEND       ((mach_port_right_t) 0)
    #define MACH_PORT_RIGHT_RECEIVE    ((mach_port_right_t) 1)
    #define MACH_PORT_RIGHT_SEND_ONCE  ((mach_port_right_t) 2)
    #define MACH_PORT_RIGHT_PORT_SET   ((mach_port_right_t) 3)
    #define MACH_PORT_RIGHT_DEAD_NAME  ((mach_port_right_t) 4)
    #define MACH_PORT_RIGHT_NUMBER     ((mach_port_right_t) 5)

    typedef natural_t mach_port_type_t;
    typedef mach_port_type_t *mach_port_type_array_t;

    #define MACH_PORT_TYPE(right) \
        ((mach_port_type_t)(((mach_port_type_t) 1) \
        << ((right) + ((mach_port_right_t) 16))))

    #define MACH_PORT_TYPE_NONE       ((mach_port_type_t) 0L)
    #define MACH_PORT_TYPE_SEND       MACH_PORT_TYPE(MACH_PORT_RIGHT_SEND)
    #define MACH_PORT_TYPE_RECEIVE    MACH_PORT_TYPE(MACH_PORT_RIGHT_RECEIVE)
    #define MACH_PORT_TYPE_SEND_ONCE  MACH_PORT_TYPE(MACH_PORT_RIGHT_SEND_ONCE)
    #define MACH_PORT_TYPE_PORT_SET   MACH_PORT_TYPE(MACH_PORT_RIGHT_PORT_SET)
    #define MACH_PORT_TYPE_DEAD_NAME  MACH_PORT_TYPE(MACH_PORT_RIGHT_DEAD_NAME)
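To see how a single type value can describe several rights at once, the following sketch reuses the macros above in a stand-alone form. It assumes natural_t is a 32-bit unsigned integer and redefines the types locally so that it builds without the Mach headers. The helpers type_has() and send_and_receive() are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins; natural_t is assumed to be 32 bits wide here. */
typedef uint32_t natural_t;
typedef natural_t mach_port_right_t;
typedef natural_t mach_port_type_t;

#define MACH_PORT_RIGHT_SEND       ((mach_port_right_t) 0)
#define MACH_PORT_RIGHT_RECEIVE    ((mach_port_right_t) 1)
#define MACH_PORT_RIGHT_SEND_ONCE  ((mach_port_right_t) 2)

#define MACH_PORT_TYPE(right) \
    ((mach_port_type_t)(((mach_port_type_t) 1) \
    << ((right) + ((mach_port_right_t) 16))))

#define MACH_PORT_TYPE_SEND       MACH_PORT_TYPE(MACH_PORT_RIGHT_SEND)
#define MACH_PORT_TYPE_RECEIVE    MACH_PORT_TYPE(MACH_PORT_RIGHT_RECEIVE)
#define MACH_PORT_TYPE_SEND_ONCE  MACH_PORT_TYPE(MACH_PORT_RIGHT_SEND_ONCE)

/* Hypothetical helper: true if `type` includes every bit in `wanted`. */
static int type_has(mach_port_type_t type, mach_port_type_t wanted)
{
    assert(wanted != 0);
    return (type & wanted) == wanted;
}

/* Hypothetical helper: the type mask of a name that denotes both a send
 * right and a receive right for the same port. */
static mach_port_type_t send_and_receive(void)
{
    return MACH_PORT_TYPE_SEND | MACH_PORT_TYPE_RECEIVE;
}
```

Because each right occupies its own bit (starting at bit 16), the coalesced name's type is simply the bitwise OR of the individual type values.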
Before Mach 3.0, names of routines and data types in the IPC interface were not prefixed with mach_ or MACH_. For example, instead of mach_port_t, there was port_t. The prefixes were added in Mach 3.0 to avoid any name conflicts between the old and the new Mach interfaces, even though the two are similar in many respects. This allows the same set of header files to export both interfaces and allows a program to mix interfaces, if necessary.

Although port names are commonly assigned by the kernel, a user program can create a port right with a specific name using the mach_port_allocate_name() routine. A kernel-assigned mach_port_name_t value has two components: an index and a generation number.

    // osfmk/mach/port.h

    #define MACH_PORT_INDEX(name)       ((name) >> 8)
    #define MACH_PORT_GEN(name)         (((name) & 0xff) << 24)
    #define MACH_PORT_MAKE(index, gen)  (((index) << 8) | (gen) >> 24)

If a user program needs to use port names for arbitrarily mapping them to user data, it must use only the index part of the port name, which is why the layout of a mach_port_name_t is exposed to user space.
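As a brief illustration of this layout, the sketch below splits a name into its two components and reassembles it. It assumes a 32-bit mach_port_name_t and copies the three macros so that it compiles without the Mach headers. The helper name_to_key() is a hypothetical name for the "index only" mapping a user program might apply.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t mach_port_name_t;   /* assumed to be 32 bits wide */

#define MACH_PORT_INDEX(name)       ((name) >> 8)
#define MACH_PORT_GEN(name)         (((name) & 0xff) << 24)
#define MACH_PORT_MAKE(index, gen)  (((index) << 8) | (gen) >> 24)

/* Hypothetical helper: derive a key for user data from a port name,
 * using only the index part and ignoring the generation number. */
static uint32_t name_to_key(mach_port_name_t name)
{
    uint32_t key = MACH_PORT_INDEX(name);
    /* The index occupies the upper 24 bits of the name. */
    assert(key <= 0x00ffffffu);
    return key;
}
```

For the name 0x1503, the index is 0x15 and the generation value is 0x03000000; two names that differ only in their low eight bits map to the same key.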
9.3.2.2. Validity of a Port Name

The kernel defines the value 0 to be the name of the null port (MACH_PORT_NULL). A null port is a legal port value that can be carried in messages to indicate the absence of any port or port rights. A dead port (MACH_PORT_DEAD) indicates that a port right was present but no longer is; that is, the right is dead. The numerical value of MACH_PORT_DEAD is a natural_t with all bits set. It is also a legal port value that can appear in a message. However, these two values do not represent valid ports. All remaining natural_t values are valid port values. The header file osfmk/mach/port.h contains several port-related definitions.

The code that manages IPC entries provides interfaces to look up an IPC object given its name in an IPC space and, conversely, to look up the name of an IPC object in a given IPC space. The former type of lookup, typically a <task, mach_port_name_t> → mach_port_t translation, is used while sending a message. The latter, typically a <task, mach_port_t> → mach_port_name_t translation, is used while receiving a message.

9.3.3. Tasks and IPC

Mach tasks and threads both begin life with certain sets of standard Mach ports (recall that we came across these ports in Chapter 7). Figure 9-8 shows the IPC-related data structures associated with a task. Besides the task's standard ports, the task structure also contains a pointer (itk_space) to the task's IPC space.

Figure 9-8. IPC-related data structures associated with a Mach task
The set of standard task ports includes the following:
When a task is created, a new port is allocated in the kernel's IPC space. The task structure's itk_self field is set to the name of this port, whereas the itk_sself member contains a send right to this port. A new IPC space is created for the task and assigned to the task structure's itk_space field. The new task inherits the parent's registered, exception, host, and bootstrap ports, as the kernel creates naked[7] send rights for the child for each of these ports from the existing naked rights of the parent. As noted in Chapter 7, other than these ports, Mach ports are not inherited across task creation, that is, across the fork() system call.
As we saw in Chapter 5, /sbin/launchd is the first user-level program executed by the kernel. launchd is the ultimate parent of all user processes, analogous to the traditional init program on Unix systems. Moreover, launchd also acts as the Bootstrap Server.
On Mac OS X versions prior to 10.4, the first user-level program executed by the kernel is /sbin/mach_init, which forks and runs /sbin/init. The launchd program subsumes the functionality of both mach_init and init in Mac OS X 10.4. During its initialization, launchd allocates several Mach ports, one of which it sets as its bootstrap port by calling task_set_bootstrap_port(). This port (technically a subset of this port, with limited scope) is inherited by new tasks as they are created, allowing all programs to communicate with the Bootstrap Server.
task_set_bootstrap_port() is a macro that resolves to a call to task_set_special_port() with TASK_BOOTSTRAP_PORT as an argument.

9.3.4. Threads and IPC

Figure 9-9 shows the IPC-related data structures associated with a thread. Like a task, a thread contains a self port and a set of exception ports used for error handling. Whereas a newly created task's exception ports are inherited from the parent, each of a thread's exception ports is initialized to the null port when the thread is created. Both task and thread exception ports can be programmatically changed later. If a thread exception port for an exception type is the null port, the kernel uses the next most specific port: the corresponding task-level exception port.

Figure 9-9. IPC-related data structures associated with a Mach thread
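The thread-to-task fallback for exception ports can be modeled in a few lines. This is a simplified user-space sketch, not kernel code: port_t, PORT_NULL, EXC_TYPES_COUNT, and the two structures are hypothetical stand-ins, and only the two levels discussed here (thread, then task) are modeled.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int port_t;      /* hypothetical port handle */
#define PORT_NULL ((port_t) 0)    /* stands in for the null port */
#define EXC_TYPES_COUNT 4         /* hypothetical number of exception types */

struct task_model {
    port_t exc_ports[EXC_TYPES_COUNT];    /* task-level exception ports */
};

struct thread_model {
    struct task_model *task;              /* owning task */
    port_t exc_ports[EXC_TYPES_COUNT];    /* thread-level exception ports */
};

/* A new thread starts with every exception port set to the null port. */
static void thread_model_init(struct thread_model *th, struct task_model *task)
{
    int i;

    th->task = task;
    for (i = 0; i < EXC_TYPES_COUNT; i++)
        th->exc_ports[i] = PORT_NULL;
}

/* Pick the port for an exception type: the thread-level port if it is
 * set, otherwise the corresponding task-level port. */
static port_t exception_port_for(const struct thread_model *th, int exc_type)
{
    assert(exc_type >= 0 && exc_type < EXC_TYPES_COUNT);
    if (th->exc_ports[exc_type] != PORT_NULL)
        return th->exc_ports[exc_type];
    return th->task->exc_ports[exc_type];
}
```

Setting a thread-level port for one exception type overrides the task-level port for that type only; all other types continue to fall back to the task.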
The thread structure's ith_rpc_reply field is used to hold the reply port for kernel RPCs. When the kernel needs to send a message to the thread and receive a reply (i.e., perform an RPC), it allocates a reply port if the current value of ith_rpc_reply is IP_NULL.

9.3.5. Port Allocation

Now that we are familiar with port-related data structures and the roles ports play, let us look at the important steps involved in the allocation of a port right. Figure 9-10 shows these steps.

Figure 9-10. The allocation of a port right

Although mach_port_allocate() is typically used to allocate a port right, there exist more flexible variants such as mach_port_allocate_name() and mach_port_allocate_qos() that allow additional properties of the new right to be specified. All these routines are special cases of mach_port_allocate_full(), which is also available to user space.

    typedef struct mach_port_qos {
        boolean_t name:1;      // caller-specified port name
        boolean_t prealloc:1;  // preallocate a message buffer
        boolean_t pad1:30;
        natural_t len;         // length of preallocated message buffer
    } mach_port_qos_t;

    kern_return_t
    mach_port_allocate_full(
        ipc_space_t        space,   // target IPC space
        mach_port_right_t  right,   // type of right to be created
        mach_port_t        proto,   // subsystem (unused)
        mach_port_qos_t   *qosp,    // quality of service
        mach_port_name_t  *namep);  // new port right's name in target IPC space

mach_port_allocate_full() creates one of three types of port rights based on the value passed as the right argument: a receive right (MACH_PORT_RIGHT_RECEIVE), a port set (MACH_PORT_RIGHT_PORT_SET), or a dead name (MACH_PORT_RIGHT_DEAD_NAME).
It is possible to create a port right with a caller-specified name, which must not already be in use for a port right in the target IPC space. Moreover, the target space must not be a fast IPC space. The caller can specify a name by passing a pointer to it in the namep argument and setting the name bit-field of the passed-in quality of service (QoS) structure. The latter is also used to designate the new port as a real-time port that requires QoS guarantees. The only manifestation of a QoS guarantee is that a message buffer is preallocated and associated with the port's internal data structure. The buffer's size is specified by the len field of the QoS structure. The kernel uses a port's preallocated buffer, if it has one, when sending messages from the kernel. This way, a sender of critical messages can avoid blocking on memory allocation.

As Figure 9-10 shows, mach_port_allocate_full() calls different internal "alloc" functions based on the type of right. In the case of a receive right, ipc_port_alloc_name() [osfmk/ipc/ipc_port.c] is called if the caller has mandated a specific name; otherwise, ipc_port_alloc() [osfmk/ipc/ipc_port.c] is called. ipc_port_alloc() calls ipc_object_alloc() [osfmk/ipc/ipc_object.c] to allocate an IPC object of type IOT_PORT. If successful, it calls ipc_port_init() [osfmk/ipc/ipc_port.c] to initialize the newly allocated port and then returns. Similarly, ipc_port_alloc_name() calls ipc_object_alloc_name() to allocate an IOT_PORT object with a specific name.

Allocation of an IPC object includes the following steps.
The mach_port_names() routine can be used to retrieve a list of ports, along with their types, in a given IPC space. Moreover, mach_port_get_attributes() returns various flavors of attribute information about a port. The program shown in Figure 9-11 lists details of port rights in a (BSD) task given its process ID. Note that the mach_port_status structure populated by mach_port_get_attributes() contains other fields besides those printed by our program.

Figure 9-11. Listing the Mach ports and their attributes in a given process
9.3.6. Messaging Implementation

Let us look at how the kernel handles sending and receiving messages. Given that IPC underlies much of the functionality in Mach, messaging is a frequent operation in a Mach-based system. It is therefore not surprising that a Mach implementation, especially one used in a commercial system like Mac OS X, would be heavily optimized. The core kernel function involved in messaging, both sending and receiving, is the one that we came across earlier: mach_msg_overwrite_trap() [osfmk/ipc/mach_msg.c]. This function contains numerous special cases that attempt to improve performance in different situations.

One of the optimizations used is handoff scheduling. As we saw in Chapter 7, handoff scheduling involves direct transfer of processor control from one thread to another. A handoff may be performed both by senders and by receivers participating in RPCs. For example, if a server thread is currently blocked in a receive call, a client thread can hand off to the server thread and block itself while it waits for the reply. Similarly, when the server is ready to send a reply to the client, it will hand off to the waiting client thread and block itself as it waits for the next request. This way, it is also possible to avoid having to enqueue and dequeue messages, since a message can be directly transferred to the receiver.

Figure 9-12 shows a simplified overview, without any special cases, of the kernel processing involved in sending a message.

Figure 9-12. An overview of the kernel processing for sending a Mach IPC message

Mach message passing is reliable and order-preserving. Therefore, messages are not lost and are always received in the order they were sent. However, the kernel delivers messages sent to send-once rights out of order and without taking into account the receiving port's queue length or how full it is. We noted earlier that the length of a port's message queue is finite.
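The combination of ordered delivery and a finite queue can be pictured with a small ring buffer. The sketch below is a user-space model only: msg_queue, QUEUE_LIMIT, and the function names are hypothetical, messages are reduced to integers, and a send to a full queue simply fails here, which is just one of the behaviors a real sender might observe.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_LIMIT 5             /* hypothetical queue limit */

struct msg_queue {
    int    msgs[QUEUE_LIMIT];     /* messages, modeled as integers */
    size_t head;                  /* index of the oldest message */
    size_t count;                 /* number of queued messages */
};

static void queue_init(struct msg_queue *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

/* Append a message at the tail. Returns 0 on success, -1 if the queue
 * has already reached its limit. */
static int queue_send(struct msg_queue *q, int msg)
{
    if (q->count == QUEUE_LIMIT)
        return -1;
    q->msgs[(q->head + q->count) % QUEUE_LIMIT] = msg;
    q->count++;
    return 0;
}

/* Remove the oldest message, so receive order matches send order.
 * Returns 0 on success, -1 if the queue is empty. */
static int queue_receive(struct msg_queue *q, int *msg)
{
    if (q->count == 0)
        return -1;
    *msg = q->msgs[q->head];
    q->head = (q->head + 1) % QUEUE_LIMIT;
    q->count--;
    return 0;
}
```

Once a receiver removes the oldest message, a slot becomes available and a subsequent send succeeds again.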
When a queue becomes full, several behaviors are possible, such as the following.
Various other error codes can be returned when sending a message fails. These fall in a few general categories, such as the following:
Figure 9-13 shows a simplified overview of the kernel processing involved in receiving a message.

Figure 9-13. An overview of the kernel processing for receiving a Mach IPC message

9.3.7. IPC Subsystem Initialization

Figure 9-14 shows how the IPC subsystem is initialized when the kernel boots. We have already come across some aspects of this initialization, for example, the setting up of the host special ports. We will discuss MIG initialization in Section 9.6.3.2.

Figure 9-14. Initialization of the IPC subsystem
host_notify_init() initializes a system-wide notification mechanism that allows user programs to request notifications on one of the host notification ports managed by Mach. Mac OS X 10.4 provides only one notification port as part of this mechanism: HOST_NOTIFY_CALENDAR_CHANGE. A program can use the host_request_notification() Mach routine to request the kernel to send it a message when the system's date or time changes. Mac OS X has numerous other notification mechanisms, most of which we will discuss in Section 9.16.