Section 8.15. Memory Allocation in User Space

8.15. Memory Allocation in User Space

There are several user-space and kernel-space memory allocation APIs in Mac OS X, although all such APIs are built atop a single low-level mechanism. In the kernel, memory is fundamentally allocated through a page-level allocator in the Mach VM subsystem. In user space, memory is fundamentally allocated through the Mach vm_allocate() API,^[20] although user programs typically use application-environment-specific APIs for memory allocation. The system library provides malloc(), which is the preferred user-space memory allocation function. malloc() is implemented using the Mach API. Memory allocation APIs in Carbon and Core Foundation are implemented on top of malloc(). Besides malloc-based memory allocation, user programs can also use the stack-based alloca(3) memory allocator, which allocates temporary space in the runtime stack. The space is automatically reclaimed when the function returns.^[21] The alloca() function is built into the C compiler on Mac OS X.

^[20] We will not distinguish between the 32-bit and 64-bit Mach VM APIs in this section.

^[21] The Mac OS X alloca() implementation frees the allocated memory not upon the function's return but during a subsequent invocation of the function.

Table 86 shows a sampling of ways in which memory can be allocated by user-space programs. Note that the list shown is not exhaustiveits purpose is to illustrate that a variety of APIs exist. Moreover, in many cases, you can use a function listed under a particular environment in other environments as well.

Table 86. Notable User-Level Memory Allocation Functions
Environment	Allocation	Deallocation
System library (Mach)	`vm_allocate`	`vm_deallocate`
System library	`malloc`	`free`
Carbon	`NewPtr`	`DisposePtr`
Carbon	`NewHandle`	`DisposeHandle`
Carbon Multiprocessing Services	`MPAllocateAligned`	`MPFree`
Cocoa	`NSAllocateObject`	`NSDeallocateObject`
Cocoa	`[NSObject alloc]`	`[NSObject dealloc]`
Core Foundation	`CFAllocatorAllocate`	`CFAllocatorDeallocate`
Open Transport	`OTAllocInContext`	`OTFreeMem`

8.15.1. A Historical Break

Historically, in UNIX the sbrk() and brk() system calls are used to dynamically change the amount of space allocated for the calling program's data segment. sbrk() is implemented but not supported by the Mac OS X kerneldirectly invoking the system call will result in an ENOTSUP error being returned. brk() is not even implemented by the kernel. However, the system library implements both sbrk() and brk(). Whereas brk() always returns a value of -1, sbrk() simulates a program break region by using a 4MB memory area that it allocates through vm_allocate() on the first invocation of sbrk(). Subsequent invocations of sbrk() adjust the current break value within the simulated region. If the adjusted size falls outside the region, a value of -1 is returned.

// libSystem: mach/sbrk.c static int          sbrk_needs_init = TRUE; static vm_size_t    sbrk_region_size = 4*1024*1024; static vm_address_t sbrk_curbrk; caddr_t sbrk(int size) {     kern_return_t ret;     if (sbrk_needs_init) {         sbrk_needs_init = FALSE;         ret = vm_allocate(mach_task_self(), &sbrk_curbrk, sbrk_region_size,                           VM_MAKE_TAG(VM_MEMORY_SBRK)|TRUE);         ...     }     if (size <= 0)         return((caddr_t)sbrk_curbrk);     else if (size > sbrk_region_size)         return((caddr_t)-1);     sbrk_curbrk += size;     sbrk_region_size -= size;     return((caddr_t)(sbrk_curbrk  size)); }

Note the use of the VM_MAKE_TAG macro by sbrk(). The macro can be used to tag any vm_allocate() allocation, thereby indicating the purpose of that allocation. The available tag values are defined in <mach/vm_statistics.h>. Besides system-reserved tags, there are tags available that user programs can use for tagging application-specific allocations. The vmmap tool displays memory regions in a program along with the region types. For example, running vmmap with the process ID of the WindowServer program produces an output like the following.

$ sudo vmmap 71 ... REGION TYPE             [ VIRTUAL] ===========             [ =======] Carbon                  [   4080K] CoreGraphics            [ 271960K] IOKit                   [ 139880K] MALLOC                  [  46776K] STACK GUARD             [      8K] Stack                   [   9216K] VM_ALLOCATE ?           [   7724K] ...

In a program that has called sbrk(), the 4MB region can be seen in the output of the vmmap command as a writable region labeled SBRK.

// sbrk.c #include <stdio.h> #include <stdlib.h> #include <unistd.h> int main(void) {     char cmdbuf[32];     sbrk(0);     snprintf(cmdbuf, 32, "vmmap %d", getpid());     return system(cmdbuf); } $ gcc -Wall -o sbrk sbrk.c $ ./sbrk ... SBRK                    [   4096K] ...

8.15.2. Memory Allocator Internals

The Mac OS X system library's malloc implementation uses an abstraction called a malloc zone (unrelated to the Mach zone allocator). Malloc zones are variable-size blocks of virtual memory from which malloc draws memory for its allocations. The system library creates a default malloc zone when the malloc package is initialized in a program, which occurs when the zone is accessed for the first time (e.g., during the program's first call to malloc() or calloc()).

A zone is analogous to a UNIX program's heap from the standpoint of memory allocation. The malloc implementation supports the creation of multiple zones. Although creating your own malloc zones is typically unnecessary, it may be useful in certain circumstances. Destroying a zone^[22] frees all objects allocated from that zone; therefore, using a custom zone may improve performance if a large number of temporary objects need to be deallocated. Figure 829 shows the API exported by the malloc zone layer.

^[22] The default zone should not be destroyed.

Figure 829. The malloc zones API

// Retrieve a pointer to the default zone malloc_zone_t *malloc_default_zone(void); // Create a new malloc zone malloc_zone_t *malloc_create_zone(vm_size_t start_size, unsigned flags); // Destroy an existing zone, freeing everything allocated from that zone void malloc_destroy_zone(malloc_zone_t *zone); // Allocate memory from the given zone void *malloc_zone_malloc(malloc_zone_t *zone, size_t size); // Allocate cleared (zero-filled) memory from the given zone for num_items // objects, each of which is size bytes large void *malloc_zone_calloc(malloc_zone_t *zone, size_t num_items, size_t size); // Allocate page-aligned, cleared (zero-filled) memory from the given zone void *malloc_zone_valloc(malloc_zone_t *zone, size_t size); // Free memory referred to by the given pointer in the given zone void malloc_zone_free(malloc_zone_t *zone, void *ptr); // Change the size of an existing allocation in the given zone // The "existing" pointer can be NULL void *malloc_zone_realloc(malloc_zone_t *zone, void *ptr, size_t size); // Retrieve the zone, if any, corresponding to the given pointer malloc_zone_t *malloc_zone_from_ptr(const void *ptr); // Retrieve the actual size of the allocation corresponding to the given pointer size_t malloc_size(const void *ptr); // Batch Allocation // Allocate num_requested blocks of memory, each size bytes, from the given zone // The return value is the number of blocks being returned (could be less than // num_requested, including zero if none could be allocated) unsigned malloc_zone_batch_malloc(malloc_zone_t *zone, size_t size,                                   void **results, unsigned num_requested); // Batch Deallocation // Free num allocations referred to in the to_be_freed array of pointers void malloc_zone_batch_free(malloc_zone_t *zone, void **to_be_freed,                             unsigned num);

Malloc Zones in Cocoa

The Cocoa API provides wrappers for several malloc_zone_* functions. For example, NSCreateZone() and NSRecycleZone() can be used to create and destroy, respectively, malloc zones. A malloc zone exists in Cocoa as an NSZone structure.

The NSObject class provides the allocateWithZone: method to create an instance of the receiving class with memory allocated from the specified zone.

The actual zone allocator is implemented in an independent moduleseparate from the malloc layer that consists of the programmer-visible malloc family of functions. The malloc layer uses well-defined functions exported by the zone layer, and in fact, the malloc front-end can support an alternate underlying allocator or even multiple allocators. Figure 830 shows an overview of the communication between the malloc layer and the zone allocator. The latter is called the scalable zone allocator because it uses allocation strategies that scale from very small to very large allocation sizes.

Figure 830. An overview of the Mac OS X malloc implementation

In the malloc layer, a malloc zone is represented by the malloc_zone_t structure, which is a substructure of the szone_t structure. The latter is visible only to the scalable zone layer. The malloc layer provides the malloc_create_zone function for creating zones. This function calls create_scalable_zone(), which creates a new scalable zone by allocating and initializing an szone_t structure. As part of this initialization, the scalable malloc layer populates the malloc_zone_t substructure, a pointer to which is returned to the malloc layer. Before returning, malloc_create_zone() calls malloc_zone_register(), which saves the malloc_zone_t pointer in a global array of malloc zones. Note that although the malloc layer directly calls create_scalable_zone(), it calls the other scalable zone functions only through the function pointers set up in the malloc_zone_t structure. As shown in Figure 830, the scalable zone layer provides standard malloc family functions, batch allocation and deallocation functions, and functions for introspection.

The scalable zone allocator categorizes memory allocation requests based on size into tiny, small, large, and huge requests. As shown in Figure 831, it maintains internal bookkeeping data structures for each of these categories. Tiny and small allocations are made from tiny and small regions, respectively. We earlier said that a zone is analogous to a program's heap. A specific tiny or small region can be regarded as a bounded subheap from which allocations falling within a certain size range are made.

Figure 831. Scalable zone data structures

#define NUM_TINY_SLOTS            32 #define INITIAL_NUM_TINY_REGIONS  24 #define NUM_SMALL_SLOTS           32 #define INITIAL_NUM_SMALL_REGIONS  6 ... typedef struct {     uintptr_t checksum;     void      *previous;     void      *next; } free_list_t; typedef struct {     // Data structure for "compact" (small) pointers     // Low bits represent number of pages, high bits represent address     uintptr_t address_and_num_pages; } compact_range_t; typedef struct {     vm_address_t address;     vm_size_t    size; } vm_range_t; typedef void            *tiny_region_t; typedef void            *small_region_t; typedef compact_range_t large_entry_t; typedef vm_range_t      huge_entry_t; typedef struct {     malloc_zone_t basic_zone; // This substructure is seen by the malloc layer     ...     // Regions for tiny objects     unsigned       num_tiny_regions;     tiny_region_t  *tiny_regions;     void           *last_tiny_free;     unsigned       tiny_bitmap; // Cache of the free lists     free_list_t    *tiny_free_list[NUM_TINY_SLOTS]; // Free lists     ...     // Regions for small objects     unsigned       num_small_regions;     small_region_t *small_regions;     ...     // Large objects     unsigned       num_large_entries;     large_entry_t  *large_entries; // Hashed by location     // Huge objects     unsigned       num_huge_entries;     huge_entry_t   *huge_entries;     ...     // Initial region list     tiny_region_t initial_tiny_regions[INITIAL_NUM_TINY_REGIONS];     small_region_t initial_small_regions[INITIAL_NUM_SMALL_REGIONS]; } szone_t;

Table 87 shows the size ranges and the corresponding allocation quanta for the categories in the case of the 32-bit system library. Table 88 shows the numbers for the 64-bit version.

Table 87. Scalable Malloc Allocation Categories (32-bit)
Allocation Region	Region Size	Allocation Sizes	Allocation Quantum
Tiny	1MB	1496 bytes	16 bytes
Small	8MB	49715,359 bytes	512 bytes
Large		15,36016,773,120 bytes	1 page (4096 bytes)
Huge		16,773,121 bytes and more	1 page (4096 bytes)

Table 88. Scalable Malloc Allocation Categories (64-bit)
Allocation Region	Region Size	Allocation Sizes	Allocation Quantum
Tiny	2MB	1992 bytes	32 bytes
Small	16MB	99315,359 bytes	1024 bytes
Large		15,36016,773,120 bytes	1 page (4096 bytes)
Huge		16,773,121 bytes and more	1 page (4096 bytes)

The following are some noteworthy points about scalable malloc's allocation strategies.

The maximum allocation size for a tiny request is 31 times the tiny allocation quantumthat is, 31 x 16 and 31 x 32 bytes for 32-bit and 64-bit, respectively.
The minimum allocation size for a large request is 15 x 1024 bytes.
The maximum allocation size for a large request is 4095 pages.
Tiny requests are satisfied from tiny regions. Each such region is akin to a heap whose size is 1MB (32-bit) or 2MB (64-bit). Similarly, memory in the small range is allocated from small regions, each of which is akin to an 8MB heap.

Let us now look at the details of the individual strategies.

8.15.2.1. Tiny Allocations

Tiny allocations are made from tiny regions. The allocator divides a tiny region into equal-size pieces called tiny quanta. Each tiny quantum is TINY_QUANTUM bytes in size.

TINY_QUANTUM is the minimum allocation size, regardless of the actual amount requested by the caller. For example, the amounts of memory actually allocated to satisfy allocation requests of one byte, TINY_QUANTUM bytes, and (TINY_QUANTUM+1) bytes are TINY_QUANTUM bytes, TINY_QUANTUM bytes, and 2xTINY_QUANTUM bytes, respectively.

The tiny region is laid out as a contiguous chunk of memory containing NUM_TINY_BLOCKS quanta. Immediately following the last quantum is a metadata area that has the following structure.

At the beginning of the metadata is a header bitmap that contains one bit for each quantum in the tiny region. Thus, it contains NUM_TINY_BLOCKS bits.
Following the header bitmap is a 32-bit pad word all of whose bits are setthat is, its value is 0xFFFF_FFFF.
Following the pad word is an in-use bitmap. Like the header bitmap, this bitmap contains NUM_TINY_BLOCKS bits, one for each tiny quantum.
Following the in-use bitmap is a 32-bit pad word that is not written to by the allocator.

Even though a tiny quantum's size is fixed, a tiny allocation can range from as small as 1 byte to as large as 31xTINY_QUANTUM. Allocations larger than a quantum will consist of multiple contiguous quanta. A set of such contiguous tiny quanta, whether allocated or free, is called a tiny block. The header and in-use bitmaps are used to maintain the notions of free and allocated blocks as follows.

If a given quantum is the first quantum of a block in use, the corresponding bits in both the header bitmap and the in-use bitmap are 1.
If a given quantum is part of a block in use (but is not the first quantum of that block), the corresponding bit in the header bitmap is 0, whereas the corresponding bit in the in-use bitmap is irrelevant.
If a given quantum is the first quantum of a free block, the corresponding bits in the header and in-use bitmaps are 1 and 0, respectively.
If a given quantum is part of a free block (but is not the first quantum of that block), the corresponding bits in both the header bitmap and the in-use bitmap are irrelevant.

When a pointer is freed, the corresponding block's header and in-use bits are appropriately modified. Moreover, information is written to the first few bytes of the pointed-to memory to convert that block into a free block. Free blocks are chained together in free lists. The allocator maintains 32 free lists for tiny region memory. Each list contains free blocks containing a particular number of quantathe first list contains free blocks that are all one quantum in size, the second list contains free blocks that are all two quanta in size, and so on. Although the maximum allocation size contains 31 quanta, it is possible to have free blocks that are larger, since adjacent blocks can be coalesced. The last list holds these extra-large free blocks.

The structure of a free block is as follows:

The first pointer-sized field contains a checksum computed by XOR'ing the free block's previous pointer, next pointer, and the constant CHECKSUM_MAGIC (defined to be 0x357B).
The next pointer-sized field contains a pointer to the previous free block, if any, in the chain. If there is no previous free block, this field contains a 0.
The next pointer-sized field contains a pointer to the next free block, if any, in the chain. Again, the field is 0 if there is no next free block.
The next field is an unsigned short value that specifies the free block's size in units of quanta.

Consider the free and allocated blocks shown in Figure 832. The free block starting at quantum q_i contains m quanta, whereas the allocated block starting at quantum q_k contains n quanta. Bits i and k are both set in the header bitmap. However, only bit k is set in the in-use block.

Figure 832. Internal bookkeeping of tiny allocations (32-bit)

Let us examine the working of the allocator by using the debugger on a simple program. Figure 833 shows a program that performs four tiny allocations. We will examine the allocator's state immediately after performing these allocations. Next, we will free the pointers and examine the state again. We will use a debugger watchpoint to stop execution of the program at the desired locations.

Figure 833. A program that performs tiny allocations

// test_malloc.c (32-bit) #include <stdlib.h> int watch = -1; int main(void) {     void *ptr1, *ptr2, *ptr3, *ptr4;     ptr1 = malloc(490); // 31 TINY_QUANTUMs     ptr2 = malloc(491); // 31 TINY_QUANTUMs     ptr3 = malloc(492); // 31 TINY_QUANTUMs     ptr4 = malloc(493); // 31 TINY_QUANTUMs     watch = 1;          // breakpoint here     free(ptr1);     free(ptr3);     watch = 2;          // breakpoint here     free(ptr2);     free(ptr4);     watch = 3;          // breakpoint here     return 0; }

To examine the tiny region's metadata area, we first need to determine the area's base address. Given a pointer known to be allocated from a tiny region, we can determine the base address of that tiny region since the region is always aligned on a boundary defined by the product of NUM_TINY_BLOCKS and TINY_QUANTUMthat is, the total allocatable size of the tiny region. The allocator ensures this alignment when it allocates memory for the tiny region itself. Therefore, given a tiny region pointer p, the following formula will give us the tiny region's base address:

TINY_REGION_FOR_PTR(p) = p & ~((NUM_TINY_BLOCKS * TINY_QUANTUM) - 1)

Since we are dealing with a 32-bit program, we must use the 32-bit-specific values of NUM_TINY_BLOCKS and TINY_QUANTUM. As shown in Figure 832, these values are 65,536 and 16, respectively. Using these values, the product is calculated to be 0x100000. Our formula then reduces to the following:

TINY_REGION_FOR_PTR(p) = p & 0xFFF00000

Given the tiny region's base address, we can readily compute the metadata area's base address, since the metadata area immediately follows the last tiny quantum. This means that the metadata area starts (NUM_TINY_BLOCKS * TINY_QUANTUM) bytes from the tiny region's base address, which gives us the following formula for the metadata area's base address:

TINY_REGION_END(p) = (p & 0xFFF00000) + 0x100000

The metadata area's base address is where the header bitmap is located. The location of the in-use bitmap can be calculated by adding NUM_TINY_BLOCKS/8 bytes (the size of the header bitmap) and a further 4 bytes (the size of the pad word). We finally have the following expressions:

HEADER_BITMAP_32(p) = (p & 0xFFF00000) + 0x100000 INUSE_BITMAP_32(p)  = HEADER_BITMAP_32(p) + 0x2004

Now that we know how to calculate the addresses we wish to look at, let us compile and run the program shown in Figure 833.

$ gcc -Wall -g -o test_malloc test_malloc.c $ gdb ./test_malloc ... (gdb) watch watch Hardware watchpoint 1: watch (gdb) run Starting program: /private/tmp/test_malloc Reading symbols for shared libraries . done Hardware watchpoint 1: watch Old value = -1 New value = 1 main () at test_malloc.c:18 18          free(ptr1); (gdb) where full #0  main () at test_malloc.c:18         ptr1 = (void *) 0x300120         ptr2 = (void *) 0x300310         ptr3 = (void *) 0x300500         ptr4 = (void *) 0x3006f0 (gdb)

When the first watchpoint is hit, our program has performed the four tiny allocations. Given the pointer values and the formulas devised earlier, we see that for all four pointers, the tiny region's base address is 0x300000, the header bitmap is at address 0x400000, and the in-use bitmap is at address 0x400000+0x2004. Thus, the pointers are 0x120, 0x310, 0x500, and 0x6f0 bytes away, respectively, from the base address. We can divide these distances by TINY_QUANTUMthat is, 16to get the starting quantum number for each pointer. The starting quanta are 18, 49, 80, and 111, with the first quantum being numbered 0. If we call the first bit in a bitmap as bit 0, we should examine bits 18, 49, 80, and 111 in both the header and in-use bitmaps.

(gdb) x/4x 0x400000 0x400000:       0x25920400      0x00000200      0x00000100      0x00800000 (gdb) x/4x 0x400000+0x2004 0x402004:       0x25920400      0x00000200      0x00000100      0x00800000

Indeed, each of the four bits is set in both the bitmaps. Note that while accessing a bit, the allocator first locates the byte that contains the bit. The leftmost byte is byte 0, the next byte to the right is byte 1, and so on. Within a byte, the lowest numbered bit is the rightmost one.

Let us continue the program, which will cause ptr1 and ptr3 to be freed before the next watchpoint is hit.

(gdb) cont Continuing. Hardware watchpoint 1: watch Old value = 1 New value = 2 main () at test_malloc.c:22 22          free(ptr2);

At this point, we expect the header bitmap to remain unchanged, but bits 18 and 80corresponding to ptr1 and ptr3should have been cleared in the in-use bitmap.

(gdb) x/4x 0x400000 0x400000:       0x25920400      0x00000200      0x00000100      0x00800000 (gdb) x/4x 0x400000+0x2004 0x402004:       0x25920000      0x00000200      0x00000000      0x00800000

Moreover, the memory contents of ptr1 and ptr3 should have been populated to convert the corresponding blocks to free blocks.

(gdb) x/4x ptr1 0x300120:       0x0030307b      0x00300500      0x00000000      0x001f0000 (gdb) x/4x ptr3 0x300500:       0x0030345b      0x00000000      0x00300120      0x001f0000

Recall that the free block begins with a checksum, which is followed by pointers to previous and next free blocks, with a trailing unsigned short value specifying the size of the block in units of quanta. We see that the freed blocks are chained on the same free listptr1's previous pointer points to ptr3, whereas ptr3's next pointer points to ptr1. This is expected, as both blocks have the same number of quanta (0x1f, or 31). Moreover, we can verify that the two checksums are correct by XOR'ing the relevant pointers with the magic number we saw earlier (0x357B).

Let us continue the program again, which will free ptr2 and ptr4 before the next watchpoint is hit.

(gdb) cont Continuing. Hardware watchpoint 1: watch Old value = 2 New value = 3 main () at test_malloc.c:26 26          return 0; (gdb) x/4x 0x400000 0x400000:       0x25920400      0x00000200      0x00000000      0x00800000 (gdb) x/4x 0x400000+0x2004 0x402004:       0x25920000      0x00000200      0x00000000      0x00800000 (gdb) x/4x ptr2 0x300310:       0x00000000      0x00000000      0x00000000      0x00000000 (gdb) x/4x ptr4 0x3006f0:       0x00000000      0x00000000      0x00000000      0x00000000 (gdb)

We see that although ptr2 and ptr4 have been freed, the corresponding bits in both the header and in-use bitmaps are unchanged. Moreover, there is no free block information in the contents of ptr2 and ptr4. This is so because the allocator has coalesced the four adjacent free blocks into a single big free block starting at ptr1. Examining ptr1's contents should confirm this.

(gdb) x/4x ptr1 0x300120:       0x0000357b      0x00000000      0x00000000      0x007c0000

The free block at ptr1 is now the only block on its free list, as both its previous and next pointers contain zeros. Moreover, the number of quanta in this free block is 0x7c (124), which is as expected.

8.15.2.2. Small Allocations

Small allocations are handled in a manner conceptually similar to tiny allocations. Figure 834 shows the key data structures involved in small allocations. Besides the larger quantum size of a small region, the metadata area following the region is structured differently from the tiny region metadata area: Instead of the header and in-use bitmaps, the small region metadata area is an array of 16-bit quantities. The i^th element of this array provides two pieces of information about the i^th quantum in the small region. If the most significant bit of the array element is set, the corresponding quantum is the first quantum of a free block. The remaining 15 bits, which are nonzero only in a first quantum (whether of a free block or an allocated block), represent the number of quanta contained in the block. Like the set of tiny free lists, there are 32 small free lists, each of which is used for referring to free objects of the corresponding quantum size.

Figure 834. Internal bookkeeping of small malloc allocations (32-bit)

8.15.2.3. Large Allocations

Large allocations are described by large_entry_t structures. As shown in Figure 831, a large_entry_t structure is an alias for a compact_range_t, which contains a single field called address_and_num_pages that represents both an address (high bits) and the number of pages allocated at that address (low bits). Since the maximum large allocation is 4095 pages, the low 12 bits are used to specify the number of pages. Large allocations are performed through vm_allocate(), which always returns a page-aligned addressthat is, the low 12 bits of the address are always zero. Therefore, it can be OR'ed with the number of pages to yield address_and_num_pages with no loss of information. The allocator tracks large entries by hashing their addresses in a table referenced by the large_entries field of the szone_t structure. An entry's hash index is its address modulo the total number of entries. If there is a hash collision, the index is incremented by one, and so onwhen the index becomes equal to the size of the table, it wraps around to zero. If the number of large allocations in use becomes higher than approximately 25% of the number of entries in the hash table, the table's density is deemed too high, and the allocator grows the hash table.

8.15.2.4. Huge Allocations

Huge allocations have a relatively simpler bookkeeping. For each huge allocation, the allocator maintains a huge_entry_t structure. These structures are kept in the array referenced by the huge_entries field of the szone_t structure. Upon a huge entry allocation, which is performed through vm_allocate(), the allocator grows this array by internally allocating memory, copying the old entries to the grown array, and freeing the old array. As shown in Figure 831, huge_entry_t is an alias for vm_range_t, which contains an address and a size.

8.15.3. The `malloc()` Routine

The various malloc API functions, such as malloc() and free(), are simple wrappers around internal functions that in turn call the scalable zone functions through the function pointers in the malloc_zone_t structure corresponding to the default zone. To use any other zone than the default, the malloc_zone_* functions from Figure 829 must be used directly.

Figure 835 shows the processing of the malloc() library function, which simply calls malloc_zone_malloc(), with the return value from inline_malloc_default_zone() being the zone to allocate from. The inline function checks the value of the global variable malloc_num_zones to determine whether it is being called for the first time. If so, it first initializes the malloc package by creating a scalable zone, which will be used as the default malloc zone. It calls malloc_create_zone(), which first checks for the presence of several environment variables that affect malloc's behavior. These variables are documented in the malloc(3) manual page, and a summary of the available variables can be printed by running any program that uses malloc() with the MallocHelp environment variable set.

$ MallocHelp=1 /usr/bin/true (10067) malloc: environment variables that can be set for debug: - MallocLogFile <f> to create/append messages to file <f> instead of stderr - MallocGuardEdges to add 2 guard pages for each large block - MallocDoNotProtectPrelude to disable protection (when previous flag set) - MallocDoNotProtectPostlude to disable protection (when previous flag set) ... - MallocBadFreeAbort <b> to abort on a bad free if <b> is non-zero - MallocHelp - this help!

Figure 835. Processing of the `malloc()` function in the system library

Once the default zone is available, malloc_zone_malloc() calls the malloc() function exported by the scalable malloc layer. The latter functionszone_malloc()calls the szone_malloc_should_clear() internal function with the cleared_request Boolean parameter set to false. In contrast, the szone_calloc() function calls szone_malloc_should_clear() with cleared_request set to true. As shown in Figure 835, szone_malloc_should_clear() classifies the allocation request based on size and dispatches it to the appropriate handler.

While changing the size of an allocation because of a realloc() call, the allocator first attempts to reallocate in place. If that fails, a new buffer is allocated, into which the allocator copies the contents of the old buffer. If the old buffer had at least VM_COPY_THRESHOLD bytes (defined to be 40KB) of memory, the allocator uses vm_copy() for copying. If the old buffer had less memory, or if vm_copy() fails, the allocator uses memcpy().

8.15.4. The Largest Single Allocation (32-bit)

Depending on a program's needs, it may be affected by the most amount of contiguous memory it can allocate. Given the size of a 64-bit virtual address space, a 64-bit program is highly unlikely to face this issue. However, a 32-bit address space is limited to 4GB of virtual memory, not all of which is available to the program. As we saw in Table 84, several virtual address ranges are unavailable for the program's use as the system uses these ranges for predefined mappings.

The program shown in Figure 836 can be used to determine the maximum size of a single allocation, which can be no larger than the largest free contiguous virtual address range. Note that the exact number is likely to differ across operating system versions and even across specific installations of the same version, although the difference may not be great.

Figure 836. Determining the size of the largest single `malloc()` allocation

// large_malloc.c #include <stdio.h> #include <stdlib.h> #include <limits.h> #include <mach/mach.h> #define PROGNAME "large_malloc" int main(int argc, char **argv) {     void *ptr;     unsigned long long npages;     if (argc != 2) {         fprintf(stderr, "usage: %s <allocation size in pages>\n", PROGNAME);         exit(1);     }     if ((npages = strtoull(argv[1], NULL, 10)) == ULLONG_MAX) {         perror("strtoull");         exit(1);     }     if ((ptr = malloc((size_t)(npages << vm_page_shift))) == NULL)         perror("malloc");     else         free(ptr);     exit(0); } $ gcc -Wall -o large_malloc large_malloc.c $ ./large_malloc 577016 $ ./large_malloc 577017 large_malloc(786) malloc: *** vm_allocate(size=2363461632) failed (error code=3) large_malloc(786) malloc: *** error: can't allocate region large_malloc(786) malloc: *** set a breakpoint in szone_error to debug malloc: Cannot allocate memory

As the output in Figure 836 shows, on this particular system, at the instant the program was run, a contiguous allocation is limited to 577,016 pages, or 2,363,457,536 bytes (approximately 2.2GB). Note that there still exist other, smaller free virtual address ranges, so the program could allocate a sum total that may be considerably larger than the single largest allocation. For example, on the system shown in Figure 836, allocating another 1GB (approximately) of memory through malloc() succeeds, after which several much smaller allocations can still be made before the process's virtual address space is exhausted.

8.15.5. The Largest Single Allocation (64-bit)

Let us see how much virtual memory we can allocate using mach_vm_allocate() in a 64-bit task. Figure 837 shows a program that attempts to allocate the amount of memory given as a command-line argument.

Figure 837. Allocating 2 petabytes of virtual memory

// max_vm_allocate.c #include <stdio.h> #include <stdlib.h> #include <limits.h> #include <mach/mach.h> #include <mach/mach_vm.h> #define PROGNAME "max_vm_allocate" int main(int argc, char **argv) {     kern_return_t      kr;     unsigned long long nbytes;     mach_vm_address_t  address;     if (argc != 2) {         fprintf(stderr, "usage: %s <number of bytes>\n", PROGNAME);         exit(1);     }     if ((nbytes = strtoull(argv[1], NULL, 10)) == ULLONG_MAX) {         fprintf(stderr, "invalid number of bytes specified\n");         exit(1);     }     kr = mach_vm_allocate(mach_task_self(), &address,                           (mach_vm_size_t)nbytes, TRUE);     if (kr == KERN_SUCCESS) {         printf("allocated %llu bytes at %p\n", nbytes, (void *)address);         mach_vm_deallocate(mach_task_self(), address, (mach_vm_size_t)nbytes);     } else         mach_error("mach_vm_allocate:", kr);     exit(0); } $ gcc -arch ppc64 -Wall -o max_vm_allocate max_vm_allocate.c $ ./max_vm_allocate 2251793095786496 allocated 2251793095831552 bytes at 0x8feb0000 $ ./max_vm_allocate 2251793095786497 mach_vm_allocate: (os/kern) no space available

8.15.6. Enumerating All Pointers

The scalable zone implementation provides several functions for debugging and analysis. The malloc layer exports some of this functionality to user programs through functions such as malloc_zone_from_ptr(), malloc_zone_get_all_zones(), malloc_zone_print(), malloc_zone_print_ptr_info(), malloc_zone_statistics(), and malloc_zone_log(). Moreover, those scalable zone functions that have function pointers in the malloc_zone_t structure or the szone_t structure can also be called by a user program, although doing so would be a layering violation since the szone_t structure is meant to be opaque in the malloc layer.

Note that we cannot use printf(3) in our program since printf()'s implementation itself uses malloc(). We will use a custom printf()-like functionlet us call it nomalloc_printf()that does not use printf() and in turn does not use malloc(). Figure 838 shows the implementation of nomalloc_printf(). We will also use this function in other examples that follow.

Figure 838. Implementing a version of the `printf()` function without using `malloc()`

// nomalloc_printf.h #ifndef _NOMALLOC_PRINTF_H #define _NOMALLOC_PRINTF_H #include <stdarg.h> extern void _simple_vdprintf(int, const char *, va_list); inline void nomalloc_printf(const char *format, ...) {     va_list ap;     va_start(ap, format);     _simple_vdprintf(STDOUT_FILENO, format, ap);     va_end(ap); } #endif

The program shown in Figure 839 calls a pointer enumeration function implemented by the scalable zone layer. The function is available to the malloc layer through the enumerator field of the malloc_introspection_t substructure within the malloc_zone_t structure. One of the arguments to the function is a pointer to a recorder functiona callback invoked for each allocated range encountered. The function also accepts a type mask that limits the enumeration to specific types of memory.

Figure 839. Enumerating all `malloc()`-allocated pointers in a program

// malloc_enumerate.c #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <limits.h> #include <malloc/malloc.h> #include <mach/mach.h> #include "nomalloc_printf.h" struct recorder_args {     const char *label;     unsigned type_mask; } recorder_args[] = {     { "Allocated pointers\n",            MALLOC_PTR_IN_USE_RANGE_TYPE   },     { "\nRegions containing pointers\n", MALLOC_PTR_REGION_RANGE_TYPE   },     { "\nInternal regions\n",            MALLOC_ADMIN_REGION_RANGE_TYPE }, }; void my_vm_range_recorder(task_t task, void *context, unsigned type_mask,                      vm_range_t *ranges, unsigned range_count) {     vm_range_t *r, *end;     for (r = ranges, end = ranges + range_count; r < end; r++)         nomalloc_printf("%16p   %u\n", r->address, r->size); } int main(int argc, char **argv) {     int                 i;     void               *ptr = NULL;     unsigned long long  size;     malloc_zone_t      *zone;     if (!(zone = malloc_default_zone()))         exit(1);     if (argc == 2) { // allocate the requested size         if ((size = strtoull(argv[1], NULL, 10)) == ULLONG_MAX) {             fprintf(stderr, "invalid allocation size (%s)\n", argv[1]);             exit(1);         }         if ((ptr = malloc((size_t)size)) == NULL) {             perror("malloc");             exit(1);         }     }     for (i = 0; i < sizeof(recorder_args)/sizeof(struct recorder_args); i++) {         nomalloc_printf("%s         address   bytes\n", recorder_args[i].label);         zone->introspect->enumerator(mach_task_self(),            // task                                      NULL,                        // context                                      recorder_args[i].type_mask,  // type                                      (vm_address_t)zone,                                      NULL,                        // reader                                      my_vm_range_recorder);       // recorder     }     exit(0); } $ gcc -Wall -o malloc_enumerate malloc_enumerate.c $ ./malloc_enumerate 8192 Allocated pointers          address   bytes         0x300000   32         0x300020   48         0x300050   64         0x300090   48         0x3000c0   48         0x3000f0   48        0x1800000   1024        0x1800400   8192 Regions containing pointers          address   bytes         0x300000   1048576        0x1800000   8388608 Internal regions          address   bytes         0x400000   20480         0x300000   1048576        0x2000000   32768        0x1800000   8388608

8.15.7. Displaying Scalable Zone Statistics

The program shown in Figure 840 retrieves and displays statistics on the various types of malloc regions. In particular, you can use this program to see how the allocator classifies an allocation request as tiny, small, large, or huge based on the request's size.

Figure 840. Displaying scalable-zone statistics

// scalable_zone_statistics.c #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <limits.h> #include <malloc/malloc.h> #include "nomalloc_printf.h" #define PROGNAME "scalable_zone_statistics" enum { TINY_REGION, SMALL_REGION, LARGE_REGION, HUGE_REGION }; extern boolean_t scalable_zone_statistics(malloc_zone_t *,                                           malloc_statistics_t *, unsigned); void print_statistics(const char *label, malloc_statistics_t *stats) {     nomalloc_printf("%8s%16u%16lu%16lu", label, stats->blocks_in_use,                      stats->size_in_use, stats->max_size_in_use);     if (stats->size_allocated != -1)         nomalloc_printf("%16lu\n", stats->size_allocated);     else         printf("%16s\n", "-"); } int main(int argc, char **argv) {     void                *ptr = NULL;     unsigned long long   size;     malloc_statistics_t  stats;     malloc_zone_t       *zone;     if (!(zone = malloc_default_zone()))         exit(1);     if (argc == 2) {         if ((size = strtoull(argv[1], NULL, 10)) == ULLONG_MAX) {             fprintf(stderr, "invalid allocation size (%s)\n", argv[1]);             exit(1);         }         if ((ptr = malloc((size_t)size)) == NULL) {             perror("malloc");             exit(1);         }     }     nomalloc_printf("%8s%16s%16s%16s%16s\n", "Region", "Blocks in use",                      "Size in use", "Max size in use", "Size allocated");     scalable_zone_statistics(zone, &stats, TINY_REGION);     print_statistics("tiny", &stats);     scalable_zone_statistics(zone, &stats, SMALL_REGION);     print_statistics("small", &stats);     scalable_zone_statistics(zone, &stats, LARGE_REGION);     stats.size_allocated = -1;     print_statistics("large", &stats);     scalable_zone_statistics(zone, &stats, HUGE_REGION);     stats.size_allocated = -1;     print_statistics("huge", &stats);     if (ptr)         free(ptr);     exit(0); } $ gcc -Wall -o scalable_zone_statistics scalable_zone_statistics.c $ ./scalable_zone_statistics 496   Region   Blocks in use     Size in use Max size in use  Size allocated     tiny               7             784           21264         1069056    small               1               0           33792         8421376    large               0               0               0               -     huge               0               0               0               - $ ./scalable_zone_statistics 497   Region   Blocks in use     Size in use Max size in use  Size allocated     tiny               6             288           20768         1069056    small               2             512           34304         8421376    large               0               0               0               -     huge               0               0               0               - $ ./scalable_zone_statistics 15360   Region   Blocks in use     Size in use Max size in use  Size allocated     tiny               6             288           20768         1069056    small               2             512           34304         8421376    large               1           16384           16384               -     huge               0               0               0               - $ ./scalable_zone_statistics 16777216   Region   Blocks in use     Size in use Max size in use  Size allocated     tiny               7             304           20784         1069056    small               1               0           33792         8421376    large               0               0               0               -     huge               1        16777216        16777216               -

8.15.8. Logging Malloc Operations

The malloc implementation supports logging of malloc operations to help in the analysis of memory-related bugs. The MallocStackLogging environment variable can be set to cause the allocator to remember the function call stack at the time of each allocation. A variantthe MallocStackLoggingNoCompact environment variablecauses all allocations to be logged, regardless of their sizes or lifetimes. Mac OS X provides several tools for memory-related debugging, for example, heap, leaks, malloc_history, and MallocDebug.app.^[23]

^[23] MallocDebug.app is part of the Apple Developer Tools package.

The malloc layer allows a custom malloc logger to be installed by setting the malloc_logger global function pointer. In fact, setting the aforementioned environment variables results in this pointer being set to an internal logger function. Figure 841 shows a program that implements its own malloc logging through this mechanism.

Figure 841. Logging malloc operations

// malloc_log.c #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <malloc/malloc.h> #include "nomalloc_printf.h" // defined in Libc/gen/malloc.c #define MALLOC_LOG_TYPE_ALLOCATE   2 #define MALLOC_LOG_TYPE_DEALLOCATE 4 #define MALLOC_LOG_TYPE_HAS_ZONE   8 #define MALLOC_LOG_TYPE_CLEARED    64 #define MALLOC_OP_MALLOC  (MALLOC_LOG_TYPE_ALLOCATE|MALLOC_LOG_TYPE_HAS_ZONE) #define MALLOC_OP_CALLOC  (MALLOC_OP_MALLOC|MALLOC_LOG_TYPE_CLEARED) #define MALLOC_OP_REALLOC (MALLOC_OP_MALLOC|MALLOC_LOG_TYPE_DEALLOCATE) #define MALLOC_OP_FREE    (MALLOC_LOG_TYPE_DEALLOCATE|MALLOC_LOG_TYPE_HAS_ZONE) typedef void (malloc_logger_t)(unsigned, unsigned, unsigned, unsigned, unsigned,                                unsigned); // declared in the Libc malloc implementation extern malloc_logger_t *malloc_logger; void print_malloc_record(unsigned type, unsigned arg1, unsigned arg2, unsigned arg3,                     unsigned result, unsigned num_hot_frames_to_skip) {     switch (type) {     case MALLOC_OP_MALLOC: // malloc() or valloc()     case MALLOC_OP_CALLOC:         nomalloc_printf("%s : zone=%p, size=%u, pointer=%p\n",                         (type == MALLOC_OP_MALLOC) ? "malloc" : "calloc",                         arg1, arg2, result);         break;     case MALLOC_OP_REALLOC:         nomalloc_printf("realloc: zone=%p, size=%u, old pointer=%p, "                         "new pointer=%p\n", arg1, arg3, arg2, result);         break;     case MALLOC_OP_FREE:         nomalloc_printf("free   : zone=%p, pointer=%p\n", arg1, arg2);         break;     } } void do_some_allocations(void) {     void *m, *m_new, *c, *v, *m_z;     malloc_zone_t *zone;     m = malloc(1024);     m_new = realloc(m, 8192);     v = valloc(1024);     c = calloc(4, 1024);     free(m_new);     free(c);     free(v);     zone = malloc_create_zone(16384, 0);     m_z = malloc_zone_malloc(zone, 4096);     malloc_zone_free(zone, m_z);     malloc_destroy_zone(zone); } int main(void) {     malloc_logger = print_malloc_record;     do_some_allocations();     return 0; } $ gcc -Wall -o malloc_log malloc_log.c $ ./malloc_log malloc : zone=0x1800000, size=1024, pointer=0x1800400 realloc: zone=0x1800000, size=8192, old pointer=0x1800400, new pointer=0x1800800 malloc : zone=0x1800000, size=1024, pointer=0x6000 calloc : zone=0x1800000, size=4096, pointer=0x1802a00 free   : zone=0x1800000, pointer=0x1800800 free   : zone=0x1800000, pointer=0x1802a00 free   : zone=0x1800000, pointer=0x6000 malloc : zone=0x2800000, size=4096, pointer=0x2800400 free   : zone=0x2800000, pointer=0x2800400

8.15.9. Intercepting the Malloc Layer

Since the malloc layer calls the allocator functions through function pointers, we can easily intercept invocations of these functionssay, for debugging or for experimenting with an alternate allocator. Specifically, we can retrieve a pointer to the default zone by calling malloc_default_zone(). We can then save the original function pointers from the malloc_zone_t structure and insert pointers to our own functions in the structure. Thereafter, we can call the original functions from our functions or provide an alternate allocator implementation altogether. Figure 842 shows an example of using such interception.

Figure 842. Intercepting the malloc layer

// malloc_intercept.c #include <stdlib.h> #include <unistd.h> #include <malloc/malloc.h> #include "nomalloc_printf.h" void *(*system_malloc)(malloc_zone_t *zone, size_t size); void (*system_free)(malloc_zone_t *zone, void *ptr); void * my_malloc(malloc_zone_t *zone, size_t size) {     void *ptr = system_malloc(zone, size);     nomalloc_printf("%p = malloc(zone=%p, size=%lu)\n", ptr, zone, size);     return ptr; } void my_free(malloc_zone_t *zone, void *ptr) {     nomalloc_printf("free(zone=%p, ptr=%p)\n", zone, ptr);     system_free(zone, ptr); } void setup_intercept(void) {     malloc_zone_t *zone = malloc_default_zone();     system_malloc = zone->malloc;     system_free = zone->free;     // ignoring atomicity/caching     zone->malloc = my_malloc;     zone->free = my_free; } int main(void) {     setup_intercept();     free(malloc(1234));     return 0; } $ gcc -Wall -o malloc_intercept malloc_intercept.c $ ./malloc_intercept 0x1800400 = malloc(zone=0x1800000, size=1234) free(zone=0x1800000, ptr=0x1800400)