9.3. Mach IPC: The Mac OS X Implementation

The core of the IPC subsystem is implemented in files in the osfmk/ipc/ directory in the kernel source tree. Moreover, the osfmk/kern/ipc_* set of files implements IPC support functions and IPC-related functions for kernel objects such as tasks and threads. Figure 9-5 shows an overview of the Mach IPC implementation in Mac OS X. We will examine the pieces of this picture in the next few sections.

Figure 9-5. An overview of Mach IPC implementation in Mac OS X


9.3.1. IPC Spaces

Each task has a private IPC space (a namespace for ports) that is represented by the ipc_space structure in the kernel. A task's IPC space defines its IPC capabilities. Consequently, IPC operations such as send and receive consult this space. Similarly, IPC operations that manipulate a task's rights operate on the task's IPC space. Figure 9-6 shows the fields of the ipc_space structure.

Figure 9-6. The data structure for a task's IPC space

// osfmk/ipc/ipc_space.h

typedef natural_t ipc_space_refs_t;

struct ipc_space {
    decl_mutex_data(,is_ref_lock_data)
    ipc_space_refs_t is_references;
    decl_mutex_data(,is_lock_data)

    // is the space active?
    boolean_t is_active;

    // is the space growing?
    boolean_t is_growing;

    // table (array) of IPC entries
    ipc_entry_t is_table;

    // current table size
    ipc_entry_num_t is_table_size;

    // information for larger table
    struct ipc_table_size *is_table_next;

    // splay tree of IPC entries (can be NULL)
    struct ipc_splay_tree is_tree;

    // number of entries in the tree
    ipc_entry_num_t is_tree_total;

    // number of "small" entries in the tree
    ipc_entry_num_t is_tree_small;

    // number of hashed entries in the tree
    ipc_entry_num_t is_tree_hash;

    // for is_fast_space()
    boolean_t is_fast;
};

The IPC space encapsulates the knowledge necessary to translate between task-specific (local) port names and kernel-wide (global) port data structures. This translation is implemented using translation entries for port capabilities. Each capability is recorded in the kernel using an IPC entry data structure (struct ipc_entry). An IPC space always contains a table of IPC entries that is pointed to by the is_table field of the ipc_space structure. It can also contain a splay tree[6] of IPC entries, in which case the is_tree field will be non-NULL. Both of these are per-task data structures.

[6] A splay tree is a space-efficient, self-adjusting binary search tree with (amortized) logarithmic time.

The table holds "small" port rights, with each table entry (struct ipc_entry) consuming 16 bytes. If a port right is contained in the table, the right's name is an index into the table. The splay tree holds "large" port rights, with each tree entry (struct ipc_tree_entry) consuming 32 bytes.

Naturally Speaking

The integer type used to represent a port name is historically the native integer type for the machine. This type is called natural_t and is accessed by including <mach/machine/vm_types.h>, which in turn accesses it from <mach/ppc/vm_types.h> or <mach/i386/vm_types.h> on the PowerPC and x86 versions, respectively, of Mac OS X. With the introduction of the 64-bit Darwin ABI, several Mach data types (such as vm_offset_t and vm_size_t) have been scaled to be the same size as a pointer. However, natural_t is 32 bits in size regardless of the ABI.


9.3.1.1. IPC Entry Table

In general, port right names, which are integers (see Section 9.3.2), do fit in a table because the number of ports a typical task uses is small enough. As we will see shortly, Mach allows a task to rename a port. Moreover, ports can also be allocated using caller-specified names. This means a port name could represent an index that is out of bounds for the task's table. Such rights can be accommodated by overflowing them to the task's splay tree. To minimize memory consumption, the kernel dynamically adjusts the threshold at which entries are held in the splay tree. In fact, the table can also be grown in size. When the kernel does grow the table, it expands it to a new size that is specified (in units of number of table entries) by the is_table_next field of the ipc_space structure. As shown in Figure 9-5, the is_table_next field points to an ipc_table_size structure. The kernel maintains an array called ipc_table_entries of such structures. This array, which is populated during the IPC subsystem's initialization, is simply a predefined sequence of table sizes.
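The table-versus-tree decision and the predefined size sequence can be pictured with a small sketch. This is a toy model, not kernel code: the sizes in toy_table_sizes and the names toy_next_table_size() and toy_place_entry() are made up for illustration, and the real ipc_table_entries array holds different values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size sequence; the kernel's ipc_table_entries values differ. */
static const uint32_t toy_table_sizes[] = { 16, 32, 64, 128 };
#define TOY_NUM_SIZES (sizeof(toy_table_sizes) / sizeof(toy_table_sizes[0]))

/* Return the next larger size in the sequence, or 0 if already at the maximum. */
uint32_t toy_next_table_size(uint32_t current)
{
    for (size_t i = 0; i < TOY_NUM_SIZES; i++) {
        if (toy_table_sizes[i] > current)
            return toy_table_sizes[i];
    }
    return 0;
}

enum toy_placement { TOY_IN_TABLE, TOY_IN_TREE };

/* An index within table bounds lives in the table; anything else overflows
 * to the tree (a fast space would have to reject it instead). */
enum toy_placement toy_place_entry(uint32_t index, uint32_t table_size)
{
    return (index < table_size) ? TOY_IN_TABLE : TOY_IN_TREE;
}
```

With a 16-entry table, an index of 5 stays in the table, whereas an index of 4096 (for example, from a caller-specified name) would be placed in the tree.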

Fast IPC Space

A fast IPC space is a special-case space that does not use a splay tree. It can be used only if port names are guaranteed to be within table bounds.


When a port right whose entry is in the table is deleted, the entry is placed on a free list of unused entries. The list is maintained within the table itself by chaining together unused entries through their ie_next fields. When the next port right is allocated, the last freed entry (if any) is used. The ie_index field implements an ordered hash table used for (reverse) translating an { IPC space, IPC object } pair to a name. This hash table uses open addressing with linear probing.
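The in-table free list can be illustrated with a minimal sketch. The toy_entry and toy_table types below are hypothetical simplifications of struct ipc_entry and the entry table; in particular, using slot 0 as the list head is an assumption of this sketch (slot 0 is never handed out, which matches name 0 being reserved for the null port).

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_TABLE_SIZE 8u

/* Hypothetical, simplified entry; the real struct ipc_entry has more fields. */
struct toy_entry {
    void    *object;  /* non-NULL while the entry is in use */
    uint32_t next;    /* index of the next free entry (meaningful only when free) */
};

struct toy_table {
    /* entries[0].next is the head of the free list; 0 means "list empty" */
    struct toy_entry entries[TOY_TABLE_SIZE];
};

/* Chain entries 1..TOY_TABLE_SIZE-1 onto the free list in ascending order. */
void toy_table_init(struct toy_table *t)
{
    for (uint32_t i = 0; i < TOY_TABLE_SIZE; i++) {
        t->entries[i].object = NULL;
        t->entries[i].next = (i + 1 < TOY_TABLE_SIZE) ? i + 1 : 0;
    }
}

/* Pop the head of the free list; returns 0 if the table is full
 * (the kernel would grow the table at this point). */
uint32_t toy_entry_alloc(struct toy_table *t, void *object)
{
    uint32_t idx = t->entries[0].next;
    if (idx == 0)
        return 0;
    t->entries[0].next = t->entries[idx].next;
    t->entries[idx].object = object;
    t->entries[idx].next = 0;
    return idx;
}

/* Push a released entry onto the head of the free list. */
void toy_entry_free(struct toy_table *t, uint32_t idx)
{
    t->entries[idx].object = NULL;
    t->entries[idx].next = t->entries[0].next;
    t->entries[0].next = idx;
}
```

Because frees push onto the head and allocations pop from it, the most recently freed entry is the first to be reused, which is the behavior described above.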

9.3.1.2. IPC Entry Splay Tree

As shown in Figure 9-5, an entry in the splay tree consists of an ipc_entry structure (the same as a table entry) along with the following additional fields: name, IPC space, and pointers to left and right children. The ite_next field implements a global open hash table used for (reverse) translating an { IPC space, IPC object } pair to a { name, IPC entry } pair.

9.3.2. The Anatomy of a Mach Port

A Mach port is represented in the kernel by a pointer to an ipc_port structure. The IPC entry structure's ie_object field points to an ipc_object structure, which is logically superimposed on an ipc_port structure. Figure 9-7 shows an internal representation of the port data structure.

Figure 9-7. A view of the internal structure of a Mach port


From an object-oriented perspective, an ipc_port structure is a subclass of an ipc_object structure. Ports can be grouped into port sets in Mach, with the corresponding structure being an ipc_pset structure [osfmk/ipc/ipc_pset.h]. In such a case, a right will be represented in the kernel by passing a pointer to the ipc_pset structure in question (rather than an ipc_port structure). Another possibility is an rpc_port structure.


The fields of an ipc_port structure include a pointer to the IPC space of the task holding the receive right, a pointer to the kernel object that the port represents, and various reference counts such as the make-send count, the number of send rights, and the number of send-once rights.

9.3.2.1. What's in a Port's Name?

It is important to realize the difference between mach_port_t and mach_port_name_t: the two are treated the same in user space but not in the kernel. A port's name is relevant only in a particular namespace, corresponding to a task. A mach_port_name_t represents the local, namespace-specific identity of a port, without implying any associated rights. A mach_port_t represents a reference added or deleted to a port right. Such a reference is represented in user space by returning the name of the right (or many rights) that was altered within the task's IPC space, which is why it is the same as a mach_port_name_t in user space. Within the kernel, however, port rights are represented by passing a pointer to the appropriate port data structure (ipc_port_t). If a user program receives a mach_port_name_t from the kernel, it means that the kernel has not mapped any associated port rights; the name is simply the port's integer representation. When the kernel returns a mach_port_t, it maps the associated port rights to the recipient of the message. In both cases, the user program sees the same integer, but with different underlying semantics.

The same port can exist with different names in multiple tasks. Conversely, the same port name can represent different ports in different tasks. It is important to note that knowing a port name in another task is not enough to use that port, since the kernel will evaluate the name in the caller's IPC space. For example, if you print a mach_port_name_t value in a program and then attempt to use the value in another task (one that does not have send rights to that port) to send a message, you will not succeed.


In a given port namespace, if there exist multiple rights for a given port, say, a send right and a receive right, the names for the various rights will coalesce into a single name. In other words, a single name can denote multiple rights. This is not so in the case of send-once rights, which are always named uniquely.

The ie_bits field of the ipc_entry structure holds the types of rights a given name represents. This bitmap is what allows a single name in an IPC space to represent multiple rights. The IE_BITS_TYPE macro is used to test the bit values.

// osfmk/mach/mach_port.h

typedef natural_t mach_port_right_t;

#define MACH_PORT_RIGHT_SEND      ((mach_port_right_t) 0)
#define MACH_PORT_RIGHT_RECEIVE   ((mach_port_right_t) 1)
#define MACH_PORT_RIGHT_SEND_ONCE ((mach_port_right_t) 2)
#define MACH_PORT_RIGHT_PORT_SET  ((mach_port_right_t) 3)
#define MACH_PORT_RIGHT_DEAD_NAME ((mach_port_right_t) 4)
#define MACH_PORT_RIGHT_NUMBER    ((mach_port_right_t) 5)

typedef natural_t         mach_port_type_t;
typedef mach_port_type_t *mach_port_type_array_t;

#define MACH_PORT_TYPE(right)                                           \
                ((mach_port_type_t)(((mach_port_type_t) 1)              \
                << ((right) + ((mach_port_right_t) 16))))

#define MACH_PORT_TYPE_NONE      ((mach_port_type_t) 0L)
#define MACH_PORT_TYPE_SEND      MACH_PORT_TYPE(MACH_PORT_RIGHT_SEND)
#define MACH_PORT_TYPE_RECEIVE   MACH_PORT_TYPE(MACH_PORT_RIGHT_RECEIVE)
#define MACH_PORT_TYPE_SEND_ONCE MACH_PORT_TYPE(MACH_PORT_RIGHT_SEND_ONCE)
#define MACH_PORT_TYPE_PORT_SET  MACH_PORT_TYPE(MACH_PORT_RIGHT_PORT_SET)
#define MACH_PORT_TYPE_DEAD_NAME MACH_PORT_TYPE(MACH_PORT_RIGHT_DEAD_NAME)
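To see how one name can carry several rights, the following sketch copies a few of the definitions from the listing above (so that it compiles without the Mach headers, with natural_t taken to be a 32-bit unsigned integer) and combines type bits. The toy_type_has_right() and toy_type_add_right() helpers are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

typedef uint32_t mach_port_right_t;   /* natural_t is 32 bits wide */
typedef uint32_t mach_port_type_t;

#define MACH_PORT_RIGHT_SEND      ((mach_port_right_t) 0)
#define MACH_PORT_RIGHT_RECEIVE   ((mach_port_right_t) 1)
#define MACH_PORT_RIGHT_SEND_ONCE ((mach_port_right_t) 2)

#define MACH_PORT_TYPE(right)                                \
    ((mach_port_type_t)(((mach_port_type_t) 1)               \
    << ((right) + ((mach_port_right_t) 16))))

/* Hypothetical helper: does the type bitmap include the given right? */
int toy_type_has_right(mach_port_type_t type, mach_port_right_t right)
{
    return (type & MACH_PORT_TYPE(right)) != 0;
}

/* Hypothetical helper: return the bitmap with one more right added. */
mach_port_type_t toy_type_add_right(mach_port_type_t type, mach_port_right_t right)
{
    return type | MACH_PORT_TYPE(right);
}
```

Each right occupies its own bit starting at bit 16, so a name that denotes both a receive right and a send right has the type value 0x30000.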


Before Mach 3.0, names of routines and data types in the IPC interface were not prefixed with mach_ or MACH_. For example, instead of mach_port_t, there was port_t. The prefixes were added in Mach 3.0 to avoid any name conflicts between the old and the new Mach interfaces, even though the two are similar in many respects. This allows the same set of header files to export both interfaces and allows a program to mix interfaces, if necessary.


Although port names are commonly assigned by the kernel, a user program can create a port right with a specific name using the mach_port_allocate_name() routine. A kernel-assigned mach_port_name_t value has two components: an index and a generation number.

// osfmk/mach/port.h

#define MACH_PORT_INDEX(name)      ((name) >> 8)
#define MACH_PORT_GEN(name)        (((name) & 0xff) << 24)
#define MACH_PORT_MAKE(index, gen) (((index) << 8) | (gen) >> 24)


If a user program needs to use port names for arbitrarily mapping them to user data, it must use only the index part of the port name, which is why the layout of a mach_port_name_t is exposed to user space.
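As a worked example of the name layout, the sketch below reuses the three macros from osfmk/mach/port.h shown earlier and wraps them in small functions. The toy_* wrappers are hypothetical, and mach_port_name_t is declared locally as a 32-bit unsigned integer so that the code builds without the Mach headers.

```c
#include <stdint.h>

typedef uint32_t mach_port_name_t;

#define MACH_PORT_INDEX(name)      ((name) >> 8)
#define MACH_PORT_GEN(name)        (((name) & 0xff) << 24)
#define MACH_PORT_MAKE(index, gen) (((index) << 8) | (gen) >> 24)

/* Table index: the upper 24 bits of the name. */
uint32_t toy_name_index(mach_port_name_t name)
{
    return MACH_PORT_INDEX(name);
}

/* Generation number: the low 8 bits of the name, moved to the top byte. */
uint32_t toy_name_gen(mach_port_name_t name)
{
    return MACH_PORT_GEN(name);
}

/* Rebuild a name from an index and a generation value in top-byte form. */
mach_port_name_t toy_name_make(uint32_t index, uint32_t gen)
{
    return MACH_PORT_MAKE(index, gen);
}
```

For a name such as 0x0000010f, the index is 1 and the generation value is 0x0f000000; combining the two again yields 0x0000010f. A program that keys user data on port names would use only the index part.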

Renaming Ports

It is possible for a task to rename a port to a new name. Such renaming may be useful if a program wishes to overload port names with some program-specific meaning, say, the address of hash table entries, each of which has a one-to-one correspondence with a port name. A task still cannot have multiple names for the same port.


9.3.2.2. Validity of a Port Name

The kernel defines the value 0 to be the name of the null port (MACH_PORT_NULL). A null port is a legal port value that can be carried in messages to indicate the absence of any port or port rights. A dead port (MACH_PORT_DEAD) indicates that a port right was present but no longer is; that is, the right is dead. The numerical value of MACH_PORT_DEAD is a natural_t with all bits set. It is also a legal port value that can appear in a message. However, these two values do not represent valid ports. All remaining natural_t values are valid port values. The header file osfmk/mach/port.h contains several port-related definitions.
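The validity rule amounts to a two-value exclusion check, sketched below. The TOY_* constants stand in for MACH_PORT_NULL and MACH_PORT_DEAD with the values described above, and toy_port_name_valid() is a hypothetical helper; the real header offers a similar test (MACH_PORT_VALID, to the best of my knowledge), but treat that name as an assumption here.

```c
#include <stdint.h>

typedef uint32_t toy_port_name_t;          /* stands in for mach_port_name_t */

#define TOY_PORT_NULL ((toy_port_name_t) 0)   /* value of MACH_PORT_NULL */
#define TOY_PORT_DEAD ((toy_port_name_t) ~0u) /* all bits set, as MACH_PORT_DEAD */

/* A name denotes a valid port if it is neither the null nor the dead value. */
int toy_port_name_valid(toy_port_name_t name)
{
    return name != TOY_PORT_NULL && name != TOY_PORT_DEAD;
}
```

Both excluded values may still legitimately appear in a message; the check only says that they do not name a usable port.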

The code that manages IPC entries provides interfaces to look up an IPC object given its name in an IPC space and, conversely, to look up the name of an IPC object in a given IPC space. The former type of lookup, typically a <task, mach_port_name_t> → mach_port_t translation, is used while sending a message. The latter, typically a <task, mach_port_t> → mach_port_name_t translation, is used while receiving a message.
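The two directions of translation, and the fact that names are meaningful only within one space, can be modeled with a toy sketch. Everything here is hypothetical: toy_space is a bare array of pointers, names carry a zero generation number, and the reverse lookup is a linear scan where the kernel uses hashing.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_SPACE_SIZE 8u

struct toy_port  { int id; };                              /* stands in for ipc_port */
struct toy_space { struct toy_port *table[TOY_SPACE_SIZE]; }; /* stands in for ipc_space */

/* Forward translation: (space, name) -> port. The index is the name
 * shifted right by 8 bits; out-of-range or empty slots yield NULL. */
struct toy_port *toy_lookup(const struct toy_space *s, uint32_t name)
{
    uint32_t index = name >> 8;
    if (index == 0 || index >= TOY_SPACE_SIZE)
        return NULL;
    return s->table[index];
}

/* Reverse translation: (space, port) -> name, or 0 if the space holds
 * no entry for the port. */
uint32_t toy_reverse_lookup(const struct toy_space *s, const struct toy_port *p)
{
    for (uint32_t i = 1; i < TOY_SPACE_SIZE; i++) {
        if (s->table[i] == p)
            return i << 8;
    }
    return 0;
}
```

If two spaces each populate slot 1 with a different port, the name 0x100 resolves to a different port in each, and a name copied from one space resolves to nothing (or to an unrelated port) in the other.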

9.3.3. Tasks and IPC

Mach tasks and threads both begin life with certain sets of standard Mach ports (recall that we came across these ports in Chapter 7). Figure 9-8 shows the IPC-related data structures associated with a task. Besides the task's standard ports, the task structure also contains a pointer (itk_space) to the task's IPC space.

Figure 9-8. IPC-related data structures associated with a Mach task

// osfmk/mach/ppc/exception.h
#define EXC_TYPES_COUNT          10

// osfmk/mach/mach_param.h
#define TASK_PORT_REGISTER_MAX   3 // number of "registered" ports

// osfmk/kern/task.h
struct task {
    // task's lock
    decl_mutex_data(,lock)
    ...
    // IPC lock
    decl_mutex_data(,itk_lock_data)

    // not a right -- ip_receiver does not hold a reference for the space
    // used for representing a kernel object of type IKOT_TASK
    struct ipc_port *itk_self;

    // "self" port -- a "naked" send right made from itk_self
    // this is the task's kernel port (TASK_KERNEL_PORT)
    struct ipc_port *itk_sself;

    // "exception" ports -- a send right for each valid element
    struct exception_action exc_actions[EXC_TYPES_COUNT];

    // "host" port -- a send right
    struct ipc_port *itk_host;

    // "bootstrap" port -- a send right
    struct ipc_port *itk_bootstrap;

    // "registered" port -- a send right for each element
    struct ipc_port *itk_registered[TASK_PORT_REGISTER_MAX];

    // task's IPC space
    struct ipc_space *itk_space;
    ...
};

The set of standard task ports includes the following:

  • A self port, also known as the task's kernel port, represents the task itself. The kernel holds receive rights to this port. The self port is used by the task to invoke operations on itself. Other programs (such as debuggers) wishing to perform operations on a task also use this port.

  • A set of exception ports includes one port for each type of exception supported by the kernel. The kernel sends a message to the task's appropriate exception port when an exception occurs in one of the task's threads. Note that exception ports also exist at the thread level (more specific than a task-level exception port) and the host level (less specific). As we will see in Section 9.7.2.1, the kernel attempts to send exception messages to the most specific port first. Exception ports are used for implementing both error-handling and debugging mechanisms.

  • A host port represents the host on which the task is running.

  • A bootstrap port is used to send messages to the Bootstrap Server, which is essentially a local name server for services accessible through Mach ports. Programs can contact the Bootstrap Server requesting the return of other system service ports.

  • A set of well-known system ports is registered for a task; these are used by the runtime system to initialize the task. There can be at most TASK_PORT_REGISTER_MAX such ports. The mach_ports_register() routine can be used to register an array of send rights, with each right filling a slot in the itk_registered array in the task structure.

Host Special Ports

A host object is represented in the kernel by host_data_t, which is an alias for struct host [osfmk/kern/host.h]. This structure contains an array of host-level special ports and another array of host-level exception ports. The host special ports are host port, host privileged port, and host security port. These ports are used for exporting different interfaces to the host object.

The host port is used as an argument in "safe" Mach routines that retrieve unprivileged information about the host. Acquiring send rights to this port does not require the calling task to be privileged. The host privileged port, which can be acquired only by a privileged task, is used in privileged Mach routines, such as host_processors(), which retrieves a list of send rights representing all processors in the system. The host security port is used to change a given task's security token or to create a task with an explicit security token.

When the IPC subsystem is initialized, each host-level special port is set to represent a send right to the same port.


When a task is created, a new port is allocated in the kernel's IPC space. The task structure's itk_self field is set to the name of this port, whereas the itk_sself member contains a send right to this port. A new IPC space is created for the task and assigned to the task structure's itk_space field. The new task inherits the parent's registered, exception, host, and bootstrap ports, as the kernel creates naked[7] send rights for the child for each of these ports from the existing naked rights of the parent. As noted in Chapter 7, other than these ports, Mach ports are not inherited across task creation, that is, across the fork() system call.

[7] A naked right exists only in the context of the kernel task. It is so named because such a right is not inserted into the port namespace of the kernel task; it exists in limbo.

As we saw in Chapter 5, /sbin/launchd is the first user-level program executed by the kernel. launchd is the ultimate parent of all user processes, analogous to the traditional init program on Unix systems. Moreover, launchd also acts as the Bootstrap Server.

On Mac OS X versions prior to 10.4, the first user-level program executed by the kernel is /sbin/mach_init, which forks and runs /sbin/init. The launchd program subsumes the functionality of both mach_init and init in Mac OS X 10.4.


During its initialization, launchd allocates several Mach ports, one of which it sets as its bootstrap port by calling task_set_bootstrap_port(). This port (technically a subset of this port, with limited scope) is inherited by new tasks as they are created, allowing all programs to communicate with the Bootstrap Server.

task_set_bootstrap_port() is a macro that resolves to a call to task_set_special_port() with TASK_BOOTSTRAP_PORT as an argument.


9.3.4. Threads and IPC

Figure 9-9 shows the IPC-related data structures associated with a thread. Like a task, a thread contains a self port and a set of exception ports used for error handling. Whereas a newly created task's exception ports are inherited from the parent, each of a thread's exception ports is initialized to the null port when the thread is created. Both task and thread exception ports can be programmatically changed later. If a thread exception port for an exception type is the null port, the kernel uses the next most specific port: the corresponding task-level exception port.

Figure 9-9. IPC-related data structures associated with a Mach thread
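The most-specific-first fallback among exception ports can be sketched as a simple selection function. This is a toy model under stated assumptions: toy_port and toy_pick_exception_port() are hypothetical names, a NULL pointer plays the role of the null port, and the real kernel's delivery logic involves considerably more than choosing a port.

```c
#include <stddef.h>

struct toy_port { int id; };   /* stands in for struct ipc_port */

/* Choose the exception port for one exception type: the thread-level port
 * if set, otherwise the task-level port, otherwise the host-level port.
 * Returns NULL if none of the three levels has a port. */
const struct toy_port *toy_pick_exception_port(const struct toy_port *thread_port,
                                               const struct toy_port *task_port,
                                               const struct toy_port *host_port)
{
    if (thread_port != NULL)
        return thread_port;
    if (task_port != NULL)
        return task_port;
    return host_port;
}
```

Since a new thread starts with null exception ports, the task-level port is what this selection returns for a freshly created thread unless a thread-level port is set later.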

// osfmk/kern/thread.h

struct thread {
    ...
    struct ipc_kmsg_queue ith_messages;

    // reply port -- for kernel RPCs
    mach_port_t ith_rpc_reply;
    ...
    // not a right -- ip_receiver does not hold a reference for the space
    // used for representing a kernel object of type IKOT_THREAD
    struct ipc_port *ith_self;

    // "self" port -- a "naked" send right made from ith_self
    // this is the thread's kernel port (THREAD_KERNEL_PORT)
    struct ipc_port *ith_sself;

    // "exception" ports -- a send right for each valid element
    struct exception_action exc_actions[EXC_TYPES_COUNT];
    ...
};

The thread structure's ith_rpc_reply field is used to hold the reply port for kernel RPCs. When the kernel needs to send a message to the thread and receive a reply (i.e., perform an RPC), it allocates a reply port if the current value of ith_rpc_reply is IP_NULL.

9.3.5. Port Allocation

Now that we are familiar with port-related data structures and the roles ports play, let us look at the important steps involved in the allocation of a port right. Figure 9-10 shows these steps.

Figure 9-10. The allocation of a port right


Although mach_port_allocate() is typically used to allocate a port right, there exist more flexible variants such as mach_port_allocate_name() and mach_port_allocate_qos() that allow additional properties of the new right to be specified. All these routines are special cases of mach_port_allocate_full(), which is also available to user space.

typedef struct mach_port_qos {
    boolean_t name:1;     // caller-specified port name
    boolean_t prealloc:1; // preallocate a message buffer
    boolean_t pad1:30;
    natural_t len;        // length of preallocated message buffer
} mach_port_qos_t;

kern_return_t
mach_port_allocate_full(
    ipc_space_t        space,  // target IPC space
    mach_port_right_t  right,  // type of right to be created
    mach_port_t        proto,  // subsystem (unused)
    mach_port_qos_t   *qosp,   // quality of service
    mach_port_name_t  *namep); // new port right's name in target IPC space


mach_port_allocate_full() creates one of three types of port rights based on the value passed as the right argument:

  • A receive right (MACH_PORT_RIGHT_RECEIVE), which is the most common type of right created through this function

  • An empty port set (MACH_PORT_RIGHT_PORT_SET)

  • A dead name (MACH_PORT_RIGHT_DEAD_NAME) with one user reference

It is possible to create a port right with a caller-specified name, which must not already be in use for a port right in the target IPC space. Moreover, the target space must not be a fast IPC space. The caller can specify a name by passing a pointer to it in the namep argument and setting the name bit-field of the passed-in quality of service (QoS) structure. The latter is also used to designate the new port as a real-time port that requires QoS guarantees. The only manifestation of a QoS guarantee is that a message buffer is preallocated and associated with the port's internal data structure. The buffer's size is specified by the len field of the QoS structure. The kernel uses a port's preallocated buffer, if it has one, when sending messages from the kernel. This way, a sender of critical messages can avoid blocking on memory allocation.

As Figure 9-10 shows, mach_port_allocate_full() calls different internal "alloc" functions based on the type of right. In the case of a receive right, ipc_port_alloc_name() [osfmk/ipc/ipc_port.c] is called if the caller has mandated a specific name; otherwise, ipc_port_alloc() [osfmk/ipc/ipc_port.c] is called. ipc_port_alloc() calls ipc_object_alloc() [osfmk/ipc/ipc_object.c] to allocate an IPC object of type IOT_PORT. If successful, it calls ipc_port_init() [osfmk/ipc/ipc_port.c] to initialize the newly allocated port and then returns. Similarly, ipc_port_alloc_name() calls ipc_object_alloc_name() to allocate an IOT_PORT object with a specific name.

Allocation of an IPC object includes the following steps.

  • Allocate an IPC object structure (struct ipc_object [osfmk/ipc/ipc_object.h]) from the appropriate zone for the IPC object type. Note that a pointer to this structure is the in-kernel representation of the port (struct ipc_port [osfmk/ipc/ipc_port.h]).

  • Initialize the mutex within the IPC object structure.

  • Allocate an IPC object entry structure (struct ipc_entry [osfmk/ipc/ipc_entry.h]). This operation first attempts to find a free entry in the given IPC space's table using the "first free" hint. If there are no free entries in the table, the table is grown. If the table is already being grown because of some other thread, the caller blocks until the growing finishes.

The mach_port_names() routine can be used to retrieve a list of ports, along with their types, in a given IPC space. Moreover, mach_port_get_attributes() returns various flavors of attribute information about a port. The program shown in Figure 9-11 lists details of port rights in a (BSD) task given its process ID. Note that the mach_port_status structure populated by mach_port_get_attributes() contains other fields besides those printed by our program.

Figure 9-11. Listing the Mach ports and their attributes in a given process

// lsports.c

#include <stdio.h>
#include <stdlib.h>
#include <mach/mach.h>

#define PROGNAME "lsports"

#define EXIT_ON_MACH_ERROR(msg, retval) \
    if (kr != KERN_SUCCESS) { mach_error(msg ":" , kr); exit((retval)); }

void
print_mach_port_type(mach_port_type_t type)
{
    if (type & MACH_PORT_TYPE_SEND)      { printf("SEND ");      }
    if (type & MACH_PORT_TYPE_RECEIVE)   { printf("RECEIVE ");   }
    if (type & MACH_PORT_TYPE_SEND_ONCE) { printf("SEND_ONCE "); }
    if (type & MACH_PORT_TYPE_PORT_SET)  { printf("PORT_SET ");  }
    if (type & MACH_PORT_TYPE_DEAD_NAME) { printf("DEAD_NAME "); }
    if (type & MACH_PORT_TYPE_DNREQUEST) { printf("DNREQUEST "); }
    printf("\n");
}

int
main(int argc, char **argv)
{
    int                    i;
    pid_t                  pid;
    kern_return_t          kr;
    mach_port_name_array_t names;
    mach_port_type_array_t types;
    mach_msg_type_number_t ncount, tcount;
    mach_port_limits_t     port_limits;
    mach_port_status_t     port_status;
    mach_msg_type_number_t port_info_count;
    task_t                 task;
    task_t                 mytask = mach_task_self();

    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", PROGNAME);
        exit(1);
    }

    pid = atoi(argv[1]);
    kr = task_for_pid(mytask, (int)pid, &task);
    EXIT_ON_MACH_ERROR("task_for_pid", kr);

    // retrieve a list of the rights present in the given task's IPC space,
    // along with type information (no particular ordering)
    kr = mach_port_names(task, &names, &ncount, &types, &tcount);
    EXIT_ON_MACH_ERROR("mach_port_names", kr);

    printf("%8s %8s %8s %8s %8s task rights\n",
           "name", "q-limit", "seqno", "msgcount", "sorights");

    for (i = 0; i < ncount; i++) {

        printf("%08x ", names[i]);

        // get resource limits for the port
        port_info_count = MACH_PORT_LIMITS_INFO_COUNT;
        kr = mach_port_get_attributes(
                 task,                           // the IPC space in question
                 names[i],                       // task's name for the port
                 MACH_PORT_LIMITS_INFO,          // information flavor desired
                 (mach_port_info_t)&port_limits, // outcoming information
                 &port_info_count);              // size returned
        if (kr == KERN_SUCCESS)
            printf("%8d ", port_limits.mpl_qlimit);
        else
            printf("%8s ", "-");

        // get miscellaneous information about associated rights and messages
        port_info_count = MACH_PORT_RECEIVE_STATUS_COUNT;
        kr = mach_port_get_attributes(task, names[i], MACH_PORT_RECEIVE_STATUS,
                                      (mach_port_info_t)&port_status,
                                      &port_info_count);
        if (kr == KERN_SUCCESS) {
            printf("%8d %8d %8d ",
                   port_status.mps_seqno,     // current sequence # for the port
                   port_status.mps_msgcount,  // # of messages currently queued
                   port_status.mps_sorights); // # of send-once rights
        } else
            printf("%8s %8s %8s ", "-", "-", "-");

        print_mach_port_type(types[i]);
    }

    vm_deallocate(mytask, (vm_address_t)names, ncount*sizeof(mach_port_name_t));
    vm_deallocate(mytask, (vm_address_t)types, tcount*sizeof(mach_port_type_t));

    exit(0);
}

$ gcc -Wall -o lsports lsports.c
$ ./lsports $$ # superuser privileges required on newer versions of Mac OS X
    name  q-limit    seqno msgcount sorights task rights
0000010f        5        0        0        0 RECEIVE
00000207        -        -        -        - SEND
00000307        -        -        -        - SEND
0000040f        5        0        0        0 RECEIVE
00000507        5       19        0        0 RECEIVE
0000060b        5        0        0        0 RECEIVE
0000070b        -        -        -        - SEND
00000807        -        -        -        - SEND
00000903        5        0        0        0 RECEIVE
00000a03        5       11        0        0 RECEIVE
00000b03        -        -        -        - SEND
00000c07        -        -        -        - SEND
00000d03        -        -        -        - SEND
00000e03        5       48        0        0 RECEIVE
00000f03        -        -        -        - SEND

9.3.6. Messaging Implementation

Let us look at how the kernel handles sending and receiving messages. Given that IPC underlies much of the functionality in Mach, messaging is a frequent operation in a Mach-based system. It is therefore not surprising that a Mach implementation, especially one used in a commercial system like Mac OS X, would be heavily optimized. The core kernel function involved in messaging (both sending and receiving) is the one that we came across earlier: mach_msg_overwrite_trap() [osfmk/ipc/mach_msg.c]. This function contains numerous special cases that attempt to improve performance in different situations.

One of the optimizations used is handoff scheduling. As we saw in Chapter 7, handoff scheduling involves direct transfer of processor control from one thread to another. A handoff may be performed both by senders and by receivers participating in RPCs. For example, if a server thread is currently blocked in a receive call, a client thread can hand off to the server thread and block itself while it waits for the reply. Similarly, when the server is ready to send a reply to the client, it will hand off to the waiting client thread and block itself as it waits for the next request. This way, it is also possible to avoid having to enqueue and dequeue messages, since a message can be directly transferred to the receiver.

Figure 9-12 shows a simplified overview, without any special cases, of the kernel processing involved in sending a message.

Figure 9-12. An overview of the kernel processing for sending a Mach IPC message


Mach message passing is reliable and order-preserving. Therefore, messages may not be lost and are always received in the order they were sent. However, the kernel delivers messages sent to send-once rights out of order and without taking into account the receiving port's queue length or how full it is. We noted earlier that the length of a port's message queue is finite. When a queue becomes full, several behaviors are possible, such as the following.

  • The default behavior is to block new senders until there is room in the queue.

  • If a sender uses the MACH_SEND_TIMEOUT option in its invocation of mach_msg() or mach_msg_overwrite(), the sender will block for at most the specified time. If the message still cannot be delivered after that time has passed, a MACH_SEND_TIMED_OUT error will be returned.

  • If the message is being sent using a send-once right, the kernel will deliver the message despite the queue being full.
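The three full-queue behaviors listed above can be condensed into a toy decision function. This is only a sketch: toy_queue, toy_send(), and the result codes are hypothetical, nothing actually blocks, and a timeout is modeled as having already expired.

```c
/* Hypothetical result codes for the sketch (not Mach return values). */
enum toy_send_result {
    TOY_SEND_OK,           /* message was queued */
    TOY_SEND_WOULD_BLOCK,  /* default: the sender would block until there is room */
    TOY_SEND_TIMED_OUT     /* sender gave up, as with MACH_SEND_TIMED_OUT */
};

struct toy_queue {
    unsigned count;   /* messages currently queued */
    unsigned qlimit;  /* queue limit */
};

/* Decide the outcome of one send attempt.
 *   via_send_once: nonzero if the message is sent using a send-once right
 *   has_timeout:   nonzero if the sender asked for a (here: expired) timeout */
enum toy_send_result toy_send(struct toy_queue *q, int via_send_once, int has_timeout)
{
    /* Send-once messages are queued even when the queue is full. */
    if (q->count < q->qlimit || via_send_once) {
        q->count++;
        return TOY_SEND_OK;
    }
    return has_timeout ? TOY_SEND_TIMED_OUT : TOY_SEND_WOULD_BLOCK;
}
```

With a queue limit of 5, the sixth ordinary send is refused, whereas a send through a send-once right still succeeds and pushes the count past the limit.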

Various other error codes can be returned when sending a message fails. These fall in a few general categories, such as the following:

  • Those that indicate that the send call did not perform any operation from the caller's standpoint, usually because one or more of the arguments (or their properties) were invalid, say, an invalid message header or an invalid destination port

  • Those that indicate that the message was partly or wholly destroyed, for example, because the out-of-line memory being sent was invalid or a port right being sent in the message was bogus

  • Those that indicate that the message was returned to the sender because of a send timeout or a software interrupt

Figure 9-13 shows a simplified overview of the kernel processing involved in receiving a message.

Figure 9-13. An overview of the kernel processing for receiving a Mach IPC message


9.3.7. IPC Subsystem Initialization

Figure 9-14 shows how the IPC subsystem is initialized when the kernel boots. We have already come across some aspects of this initialization, for example, the setting up of the host special ports. We will discuss MIG initialization in Section 9.6.3.2.

Figure 9-14. Initialization of the IPC subsystem


host_notify_init() initializes a system-wide notification mechanism that allows user programs to request notifications on one of the host notification ports managed by Mach. Mac OS X 10.4 provides only one notification port as part of this mechanism: HOST_NOTIFY_CALENDAR_CHANGE. A program can use the host_request_notification() Mach routine to request the kernel to send it a message when the system's date or time changes. Mac OS X has numerous other notification mechanisms, most of which we will discuss in Section 9.16.





Mac OS X Internals: A Systems Approach
ISBN: 0321278542
EAN: 2147483647
Year: 2006
Pages: 161
Authors: Amit Singh