8.18. 64-bit Computing

When introduced in 1991, the MIPS R4000 was the world's first 64-bit processor. The term 64-bit meant several things when applied to the R4000, for example:

A 64-bit virtual address space (although the maximum user process size was limited to 40 bits on the R4000)
A 64-bit system bus
The 64-bit general-purpose (integer) registers
A 64-bit ALU and a 64-bit on-chip FPU
A 64-bit natural mode of operation, with support for 32-bit operation with integer registers acting as 32-bit registers

A processor is informally considered a 64-bit processor if it has 64-bit general-purpose registers and can support 64-bit (or at least "much more" than 32-bit) virtual memory. Moreover, the operating system must explicitly make use of the processor's 64-bit capabilities for 64-bit computing to materialize.

The introduction and evolution of 64-bit computing with Mac OS X can be summarized as follows:

The G5 (PowerPC 970, specifically) was the first 64-bit processor to be used in a Macintosh computer.
Mac OS X 10.3 was the first Apple operating system to support more than 4GB of physical memory on 64-bit hardware. User virtual address spaces were still 32-bit-only.
Mac OS X 10.4 was the first Apple operating system to support 64-bit user virtual address spaces on 64-bit hardware.

How Many Bits?

The G4, which is a 32-bit processor, contains 64-bit and even 128-bit registers. We saw in Chapter 3 that the floating-point registers are 64 bits wide and the vector registers are 128 bits wide on both the G4 and the G5. What makes the G5 a 64-bit processor is that it has 64-bit general-purpose registers, and it can use 64-bit virtual addressing. A 64-bit-wide C data type such as a long long resides in a single register when the G5 is operating as a 64-bit processor; however, on the G4 (or the G5 operating as a 32-bit processor), a long long is split into two 32-bit quantities, occupying two registers. Consequently, integer math and logical operations require more instructions and more registers.
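To make the cost of splitting concrete, the following is a minimal C sketch of how a 64-bit addition can be synthesized from two 32-bit halves, roughly as a compiler must do when only 32-bit registers are usable. The names pair64_t, pair64_split, pair64_join, and pair64_add are hypothetical and exist only for this illustration; real compilers emit a carrying add sequence rather than calling helpers like these.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, the way a 32-bit
 * register file has to hold a long long. Hypothetical type. */
typedef struct {
    uint32_t hi; /* upper 32 bits */
    uint32_t lo; /* lower 32 bits */
} pair64_t;

/* Split a 64-bit value into its two halves. */
static pair64_t pair64_split(uint64_t v)
{
    pair64_t p = { (uint32_t)(v >> 32), (uint32_t)v };
    return p;
}

/* Rejoin the halves into a single 64-bit value. */
static uint64_t pair64_join(pair64_t p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* Add two split values: add the low words, detect the carry out of
 * the low word, then add the high words plus that carry. */
static pair64_t pair64_add(pair64_t a, pair64_t b)
{
    pair64_t r;
    r.lo = a.lo + b.lo;
    uint32_t carry = (r.lo < a.lo) ? 1u : 0u;
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Example: 0xFFFFFFFF + 1 carries into the high word. */
static void pair64_demo(void)
{
    pair64_t sum = pair64_add(pair64_split(0xFFFFFFFFu), pair64_split(1));
    assert(sum.hi == 1 && sum.lo == 0);
    assert(pair64_join(sum) == UINT64_C(0x100000000));
}
```

A single 64-bit add becomes two adds plus carry handling, and each operand occupies two registers instead of one.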

8.18.1. Reasons for 64-bit Computing

Often 64-bit computing is (incorrectly) understood to be invariably conducive to performance. Although in some cases this may be true, usually only programs with very specific needs benefit from 64-bit computing. Whether a program performs better just by virtue of being 64-bit depends on whether the processor performs better in its 64-bit mode, perhaps because its 64-bit instructions operate on more data at the same time. Another, more important reason for justifying 64-bit computing is the substantially larger address space it provides. Let us look at some of these reasons in the context of Mac OS X on the G5.

8.18.1.1. 32-bit Execution on 64-bit PowerPC

In general, 64-bit processors and operating systems allow simultaneous coexistence of 32-bit and 64-bit programs. However, architectures differ in how a 64-bit processor performs when in 32-bit mode. As we saw in Chapter 3, the PowerPC began life with a 64-bit architecture that had a 32-bit subset. When a 64-bit PowerPC implementation (such as the G5) operates in 32-bit computation mode, there is no great performance penalty, unlike with some other processor architectures. In particular, the following aspects are noteworthy about the 32-bit operation of a 64-bit PowerPC.

All 64-bit instructions are available.
All 64-bit registers are available.
The processor's use of busses, caches, data paths, execution units, and other internal resources is the same regardless of the operating mode.

The current computation mode is determined by bit 0, the SF (Sixty Four) bit, of the Machine State Register (MSR). The processor runs in 64-bit mode when this bit's value is 1.

However, there are important differences between the two computation modes.

An effective address is treated as a 32-bit address in 32-bit mode. 32-bit load/store instructions ignore the upper 32 bits of memory addresses. Note that the address computations actually produce 64-bit addresses in 32-bit mode; the upper 32 bits are ignored as a software convention.
Condition codes (such as carry, overflow, and zero bits) are set per 32-bit arithmetic in 32-bit mode.
When branch conditional instructions test the Count Register (CTR), they use 32-bit conventions in 32-bit mode.
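The mode-dependent behavior listed above can be modeled in a few lines of C. This is only a sketch of the semantics, not processor or kernel code: the names MSR_SF_MASK, msr_is_64bit, effective_address, and add_carry_out are hypothetical. The one architectural fact relied upon is that PowerPC numbers bits from the most significant end, so bit 0 of the 64-bit MSR is its top bit.

```c
#include <assert.h>
#include <stdint.h>

/* PowerPC numbers bits from the most significant end, so bit 0 of the
 * 64-bit MSR is the top bit of the register. */
#define MSR_SF_MASK (UINT64_C(1) << 63)

/* Return nonzero if the MSR value selects 64-bit computation mode. */
static int msr_is_64bit(uint64_t msr)
{
    return (msr & MSR_SF_MASK) != 0;
}

/* Model effective-address formation: the sum is always computed to 64
 * bits, but only the low 32 bits are used in 32-bit mode. */
static uint64_t effective_address(uint64_t msr, uint64_t base, int64_t disp)
{
    uint64_t ea = base + (uint64_t)disp;
    return msr_is_64bit(msr) ? ea : (ea & UINT64_C(0xFFFFFFFF));
}

/* Model the carry out of an addition as seen in each mode. */
static int add_carry_out(uint64_t msr, uint64_t a, uint64_t b)
{
    if (msr_is_64bit(msr))
        return (a + b) < a;               /* carry out of the full 64 bits */
    uint32_t lo = (uint32_t)a + (uint32_t)b;
    return lo < (uint32_t)a;              /* carry out of the low 32 bits */
}

/* Example: 0xFFFFFFFF + 1 wraps in 32-bit mode but not in 64-bit mode. */
static void mode_demo(void)
{
    uint64_t msr32 = 0, msr64 = MSR_SF_MASK;
    assert(effective_address(msr32, 0xFFFFFFFFu, 1) == 0);
    assert(effective_address(msr64, 0xFFFFFFFFu, 1) == UINT64_C(0x100000000));
    assert(add_carry_out(msr32, 0xFFFFFFFFu, 1) == 1);
    assert(add_carry_out(msr64, 0xFFFFFFFFu, 1) == 0);
}
```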

The available instructions, the number of available registers, and the width of these registers all remain the same in both 64-bit and 32-bit computation modes. In particular, you can perform hardware-optimized 64-bit integer arithmetic in 32-bit programs, albeit with some caveats. However, the 32-bit ABI will use the same conventions for passing parameters, saving nonvolatile registers, and returning values, regardless of which instructions are used. Consequently, using full 64-bit registers from a nonleaf function (one that calls at least one other function) in a 32-bit program is not safe.

Let us consider an example. The cntlzd instruction is a 64-bit-only instruction that counts the number of consecutive zero bits starting at bit 0 of its second operand, placing the count in the first operand. Consider the program shown in Figure 8-55. The main function causes this instruction to execute in two ways: first, by calling another function, and second, by using inline assembly.

Figure 8-55. Using a 64-bit-only instruction

; cntlzd.s
        .text
        .align 2
#ifndef __ppc64__
        .machine ppc970
#endif
        .globl _cntlzd
_cntlzd:
        cntlzd r3,r3
        blr

// cntlzd_main.c
#include <stdio.h>
#include <stdint.h>

extern uint64_t cntlzd(uint64_t in);

int
main(void)
{
    uint64_t out;
    uint64_t in = 0x4000000000000000LL;

    out = cntlzd(in);
    printf("%lld\n", out);

    __asm("cntlzd %0,%1\n"
          : "=r"(out)
          : "r"(in)
    );
    printf("%lld\n", out);

    return 0;
}

We can attempt to compile the source shown in Figure 8-55 in several ways, as shown in Table 8-9.

Table 8-9. Compiling for a 64-bit PowerPC Target
Compiler Options	Description	Result
No special options	Compile normally, as a 32-bit program.	Will not compile.
`-force_cpusubtype_ALL`	Compile as a 32-bit program, but force 64-bit instructions to be accepted by the compiler.	Will run only on 64-bit hardware, but both uses of `cntlzd` will produce undesirable results.
`-mpowerpc64 -mcpu=G5`	Compile as a 32-bit program, with explicit support for 64-bit instructions on 64-bit hardware.	Will run only on 64-bit hardware. The inline usage of `cntlzd` will produce the desired result, but the function call version will not, because `main()` will pass the 64-bit argument to `cntlzd()` as two 32-bit quantities in two GPRs.
`-arch ppc64`	Compile as a 64-bit program.	Will run only on 64-bit hardware and produce the desired result in both uses of `cntlzd`.

Let us look at some examples of using the information in Table 8-9.

$ gcc -Wall -o cntlzd_32_32 cntlzd_main.c cntlzd.s
/var/tmp//ccozyb9N.s:38:cntlzd instruction is only for 64-bit implementations (not allowed without -force_cpusubtype_ALL option)
cntlzd.s:6:cntlzd instruction is only for 64-bit implementations (not allowed without -force_cpusubtype_ALL option)
$ gcc -Wall -force_cpusubtype_ALL -o cntlzd cntlzd_main.c cntlzd.s
$ ./cntlzd
141733920768
141733920768
$ gcc -Wall -mpowerpc64 -mcpu=G5 -o cntlzd cntlzd_main.c cntlzd.s
$ ./cntlzd
141733920768
1
$ gcc -Wall -arch ppc64 -o cntlzd cntlzd_main.c cntlzd.s
$ ./cntlzd
1
1
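The value 141733920768 is not arbitrary: it equals 33 shifted left by 32 bits. The following portable C sketch models one plausible explanation for the function-call case in a 32-bit program. It assumes that the 64-bit argument arrives as a register pair (high word in the first GPR, low word in the second), that cntlzd operates only on the first register, and that the caller reads the result back as the same pair. The functions clz64 and cntlzd_via_32bit_abi are hypothetical helpers for this illustration and do not execute any PowerPC instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of cntlzd: count leading zero bits of a 64-bit value
 * (returns 64 for an input of zero). */
static unsigned clz64(uint64_t x)
{
    unsigned n = 0;
    uint64_t mask = UINT64_C(1) << 63;
    while (mask != 0 && (x & mask) == 0) {
        n++;
        mask >>= 1;
    }
    return n;
}

/* Model of calling cntlzd() under the 32-bit ABI. ASSUMPTION: the
 * argument is a register pair (high word first), the instruction
 * touches only the first register, and the result is read back as
 * the same pair. */
static uint64_t cntlzd_via_32bit_abi(uint64_t in)
{
    uint64_t first_gpr  = (uint32_t)(in >> 32); /* high word, zero-extended */
    uint64_t second_gpr = (uint32_t)in;         /* low word, left untouched */

    first_gpr = clz64(first_gpr);               /* cntlzd r3,r3 */
    return ((uint64_t)(uint32_t)first_gpr << 32) | (uint32_t)second_gpr;
}

/* Example using the input value from the program above. */
static void cntlzd_demo(void)
{
    uint64_t in = UINT64_C(0x4000000000000000);
    assert(clz64(in) == 1);                                     /* 64-bit result */
    assert(cntlzd_via_32bit_abi(in) == UINT64_C(141733920768)); /* 33 << 32 */
}
```

Under these assumptions the high word 0x40000000, viewed as a 64-bit quantity, has 33 leading zero bits, and that count lands in the upper half of the returned pair.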

Enabling 64-bit instructions in a 32-bit PowerPC program sets the CPU subtype in the Mach-O header to ppc970, which prevents execve() from running it on 32-bit hardware.

8.18.1.2. Need for Address Space

The need for more than 4GB of virtual address space is perhaps the most justifiable reason for 64-bit computing on the PowerPC. That said, even 32-bit Mac OS X programs can benefit from 64-bit hardware with more than 4GB of physical memory. Such systems are supported beginning with Mac OS X 10.3. A 32-bit program could use mmap() and munmap() to switch between multiple windows of disk-backed memory. The sum of all the window sizes could be larger than 4GB, even though the program would not be able to address more than 4GB of virtual memory at any given time. Since the Mac OS X buffer cache is greedy, it will consume all available physical memory, keeping as much data as possible resident, provided the file descriptors corresponding to the various mappings are kept open. This approach is tantamount to a program handling its own paging, whereas in the case of a 64-bit address space, the kernel would handle the paging.
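The windowing technique can be sketched with standard POSIX calls. The function below, sum_file_windowed (a hypothetical name), processes a file of any size while mapping at most one window at a time. For brevity it closes the descriptor when done and uses a single sliding window; a program following the approach described above would keep its descriptors open and might manage several windows.

```c
#include <assert.h>
#include <stdint.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Sum every byte of a file while mapping at most `window` bytes at a
 * time. `window` must be a nonzero multiple of the page size so that
 * each mapping offset stays page-aligned. Returns 0 on success, -1 on
 * error. */
static int sum_file_windowed(const char *path, size_t window, uint64_t *sum_out)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && window > 0 && window % (size_t)page == 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    uint64_t sum = 0;
    for (off_t off = 0; off < st.st_size; off += (off_t)window) {
        off_t remaining = st.st_size - off;
        size_t len = remaining < (off_t)window ? (size_t)remaining : window;

        /* Map one window of the file, use it, then unmap it so the
         * address range can be reused for the next window. */
        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        for (size_t i = 0; i < len; i++)
            sum += p[i];
        munmap(p, len);
    }

    close(fd);
    *sum_out = sum;
    return 0;
}
```

Because the file offset is an off_t, the file itself can be far larger than the address space; only the window size is bounded by what the process can map at once.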

This approach, although workable, is still a compromise. Depending on the specific needs of a memory-hungry program, the approach may be merely inconvenient, or it may be unacceptable.

8.18.1.3. Large-File Support

One aspect sometimes associated with 64-bit computing is large-file support, that is, the operating system's ability to use file offsets larger than 32 bits wide. A 32-bit signed offset can address only up to 2GB of data in a file. Besides support from the file system to house large files, you need larger offsets (say, 64 bits wide, for convenience) to use such files. However, large-file support does not require 64-bit hardware: Numbers larger than a hardware register can be synthesized using multiple registers on 32-bit hardware. Many operating systems, including Mac OS X, provide large-file support on 32-bit and 64-bit hardware alike.

The off_t data type, which is used by relevant system calls, is a 64-bit signed integer on Mac OS X, allowing file-system-related calls to handle 64-bit offsets in 32-bit programs. The size_t data type is defined to be an unsigned long integer, which is 32 or 64 bits wide, respectively, in the 32-bit and 64-bit environments.
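As a brief illustration, the sketch below positions a file descriptor at an offset of 3GB, which a signed 32-bit offset cannot express. The function name seek_past_2gb is hypothetical. The code assumes an environment in which off_t is 64 bits wide, as the text states for Mac OS X, and checks that assumption at run time. Nothing is written, so the file does not grow.

```c
#include <assert.h>
#include <stdint.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Open (or create) a file and seek to an absolute offset of 3GB.
 * Returns the resulting offset, or -1 on failure. Seeking past the
 * end of a file is permitted and does not allocate any storage. */
static int64_t seek_past_2gb(const char *path)
{
    /* ASSUMPTION: off_t is at least 64 bits wide in this environment. */
    assert(sizeof(off_t) >= 8);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    off_t target = (off_t)3 << 30;           /* 3GB, beyond INT32_MAX */
    off_t pos = lseek(fd, target, SEEK_SET);

    close(fd);
    return (int64_t)pos;
}
```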

8.18.2. Mac OS X 10.4: 64-bit User Address Spaces

The primary user-visible aspect of 64-bit computing in Mac OS X 10.4 is that you can have a user-space program with a 64-bit virtual address space, which allows the program to conveniently and concurrently use more than 4GB of virtual memory. The PowerPC version of Mac OS X explicitly supports binaries for two architectures: ppc and ppc64, with respective executable formats (Mach-O and Mach-O 64-bit). When a ppc64 binary runs, the corresponding process can concurrently address more than 4GB of virtual memory.

Both ppc64 and ppc versions of an executable can be contained in a single file and executed transparently by using fat files. On 64-bit hardware, the execve() system call selects the ppc64 executable from a fat file that contains both ppc64 and ppc executables.
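The selection can be pictured with a small parser for the header at the start of a fat file: a big-endian magic number (0xcafebabe), a count of architectures, and one five-field entry per architecture (CPU type, CPU subtype, offset, size, alignment). The sketch below is a simplified model and is not the kernel's code. The function pick_fat_slice and the macro names are hypothetical, and the policy shown (prefer ppc64 on 64-bit hardware, otherwise fall back to ppc) is an assumption that ignores CPU subtypes entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC_VALUE UINT32_C(0xcafebabe)
#define CPU_POWERPC     UINT32_C(18)
#define CPU_POWERPC64   (CPU_POWERPC | UINT32_C(0x01000000)) /* 64-bit ABI flag */

/* Read a big-endian 32-bit field; fat headers are stored big-endian. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Return the file offset of the slice to run, or 0 if none is usable.
 * Simplified policy (an assumption): on 64-bit hardware prefer ppc64
 * and fall back to ppc; on 32-bit hardware accept only ppc. */
static uint32_t pick_fat_slice(const unsigned char *buf, size_t len, int hw_is_64bit)
{
    assert(buf != NULL);
    if (len < 8 || be32(buf) != FAT_MAGIC_VALUE)
        return 0;

    uint32_t narch = be32(buf + 4);
    uint32_t ppc_off = 0, ppc64_off = 0;

    /* Each architecture entry is five 32-bit fields:
     * cputype, cpusubtype, offset, size, align. */
    for (uint32_t i = 0; i < narch; i++) {
        size_t entry = 8 + (size_t)i * 20;
        if (entry + 20 > len)
            break;
        uint32_t cputype = be32(buf + entry);
        uint32_t offset  = be32(buf + entry + 8);
        if (cputype == CPU_POWERPC)
            ppc_off = offset;
        else if (cputype == CPU_POWERPC64)
            ppc64_off = offset;
    }

    if (hw_is_64bit && ppc64_off != 0)
        return ppc64_off;
    return ppc_off;
}
```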

8.18.2.1. Data Model

The Mac OS X 64-bit environment uses the LP64 data model, as do most other 64-bit operating systems. The letters L and P in LP64 mean that the long and pointer data types are 64 bits wide. The integer data type remains 32 bits wide in this model. LP64 is also known as 4/8/8 for this reason. ILP64 (8/8/8) and LLP64 (4/4/8) are alternative models; the I in ILP64 represents the integer data type. Table 8-10 shows the models used by 64-bit versions of several operating systems. As seen in the table, the pointer data type is 64 bits wide in all models. In contrast, the 32-bit Mac OS X environment uses the ILP32 data model, in which the integer, long, and pointer data types all are 32 bits wide. In both the LP64 and ILP32 models, the following relationship holds:

sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)

Table 8-10. A Sampling of Abstract Data Models in 64-bit-Capable Operating Systems
Operating System/Platform	Data Model
Mac OS X 10.4	LP64
AIX	LP64
Cray (various operating systems)	ILP64
Digital UNIX	LP64
HP-UX	LP64
IRIX	LP64
Linux	LP64
NetBSD (alpha, amd64, sparc64)	LP64
Solaris	LP64
Tru64	LP64
Windows	LLP64 (also known as P64)
z/OS	LP64
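A program can determine which data model it was compiled under by examining the sizes of int, long, and pointer types. The sketch below does so and also checks the size ordering stated earlier at compile time. The function data_model_name is a hypothetical helper, and the code assumes 8-bit bytes and a C11 compiler (for static_assert).

```c
#include <assert.h>
#include <stddef.h>

/* Compile-time check of the size ordering that holds in both the
 * ILP32 and LP64 models. */
static_assert(sizeof(char) <= sizeof(short) &&
              sizeof(short) <= sizeof(int) &&
              sizeof(int) <= sizeof(long) &&
              sizeof(long) <= sizeof(long long),
              "unexpected integer size ordering");

/* Name the data model implied by this compiler's type sizes. */
static const char *data_model_name(void)
{
    size_t i = sizeof(int), l = sizeof(long), p = sizeof(void *);

    if (i == 4 && l == 4 && p == 4) return "ILP32";
    if (i == 4 && l == 8 && p == 8) return "LP64";
    if (i == 4 && l == 4 && p == 8) return "LLP64";
    if (i == 8 && l == 8 && p == 8) return "ILP64";
    return "other";
}
```

Compiled as a ppc64 program on Mac OS X 10.4, such a function would be expected to report LP64; compiled as a ppc program, ILP32.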

8.18.2.2. Implementation

Although Mac OS X 10.4 supports 64-bit user programs, the kernel is still 32-bit.^[26] Although the kernel manages as much physical memory as the system can support, it does not directly address more than 4GB of physical memory concurrently. To achieve this, the kernel uses appropriately sized data structures to keep track of all memory, while itself using a 32-bit virtual address space with 32-bit kernel pointers. Similarly, device drivers and other kernel extensions remain 32-bit. Figure 8-56 shows a conceptual view of 64-bit support in Mac OS X 10.4.

^[26] In fact, a given version of Mac OS X uses the same kernel executable for all supported Apple computer models.

Figure 8-56. An overview of 64-bit support in Mac OS X

The kernel uses addr64_t, defined to be a 64-bit unsigned integer, as the basic effective address type. An addr64_t is passed and returned as two adjacent 32-bit GPRs, regardless of the register width of the underlying processor. This data type is used in the kernel for common code that is used unchanged on 32-bit and 64-bit machines. For example, the pmap interface routines use addr64_t as the address data type. The kernel also uses the 64-bit long long data type (equivalent to addr64_t) for various VM subsystem entities. It internally converts between long long (or addr64_t) parameters and single 64-bit register values.

// osfmk/mach/memory_object_types.h
typedef unsigned long long memory_object_offset_t;
typedef unsigned long long memory_object_size_t;

// osfmk/mach/vm_types.h
typedef uint64_t vm_object_offset_t;
typedef uint64_t vm_object_size_t;

Although the kernel's own virtual address space is 32-bit, the VM subsystem does run the processor in 64-bit computation mode for mapping certain VM-related data structures.

The kernel defines ppnum_t, the data type for the physical page number, to be a 32-bit unsigned integer. Consequently, there can be at most UINT32_MAX physical pages. For a page size of 4KB, this limits the physical address space to 16TB.
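The 16TB figure follows from simple arithmetic: a 32-bit page number can take 2^32 distinct values (0 through UINT32_MAX), and 2^32 pages of 2^12 bytes each span 2^44 bytes. The sketch below works this out in C. The type page_number_t is a stand-in for the kernel's ppnum_t, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t page_number_t; /* stand-in for the kernel's ppnum_t */

/* Convert a physical page number to the physical address of the
 * page's first byte. page_shift is log2(page size): 12 for 4KB. */
static uint64_t page_to_phys(page_number_t pn, unsigned page_shift)
{
    assert(page_shift < 32);
    return (uint64_t)pn << page_shift;
}

/* Bytes addressable when page numbers 0..UINT32_MAX are all usable:
 * 2^32 pages times the page size. */
static uint64_t phys_limit_bytes(unsigned page_shift)
{
    assert(page_shift < 32);
    return (UINT64_C(1) << 32) << page_shift;
}
```

For 4KB pages, phys_limit_bytes(12) yields 2^44 bytes, that is, 16TB, and the highest page number maps to physical address 0xFFFFFFFF000. The widening cast in page_to_phys matters: shifting the 32-bit page number without it would discard the upper bits.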

8.18.2.3. Usage and Caveats

In Mac OS X 10.4, 64-bit support is limited to C and C++ programs that only link against the system library (i.e., libSystem.dylib or System.framework),^[27] which is available as a dual-architecture library. Additionally, the Accelerate framework (Accelerate.framework) is available in both 32-bit and 64-bit versions. GCC 4.0.0 or higher is required to compile 64-bit programs.

^[27] Certain operations in the 32-bit system library are optimized for the host processor; that is, they make use of 64-bit hardware if it is available.

$ lipo -info /usr/lib/libSystem.dylib
Architectures in the fat file: /usr/lib/libSystem.dylib are: ppc ppc64

Key Mac OS X frameworks such as Carbon, Cocoa, Core Foundation, and the I/O Kit framework are 32-bit-only. Both generic and Mac OS X-specific migration issues must be dealt with while creating 64-bit programs.

The 64-bit ABI has several differences from the 32-bit ABI, on which it is based. For example, 64-bit integer parameters are passed in a single GPR. The Pthreads library uses GPR13 for thread-specific data retrieved by pthread_self().
64-bit programs cannot use 32-bit libraries or plug-ins and vice versa. Specifically, 32-bit and 64-bit code cannot be mixed in a single program, since the kernel tags an entire task as 32-bit or 64-bit.
64-bit programs cannot have native Mac OS X graphical user interfaces since the relevant frameworks are not available in 64-bit versions.
Although 64-bit and 32-bit programs can share memory and can communicate with each other through IPC, they must use explicit data types while doing so.
Programs that serialize binary data may want to ensure that the size and alignment of the serialized data do not change between 32-bit and 64-bit programs, unless only one type of program will access that data.
An I/O Kit driver's user client (see Chapter 10) cannot be used from a 64-bit program unless the driver explicitly supports 64-bit user address spaces. A kernel extension can access physical addresses above 4GB by using the IOMemoryDescriptor I/O Kit class.
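The points above about shared memory, IPC, and serialized data come down to avoiding types whose width depends on the data model. The following sketch contrasts a structure built from long and pointer members with one built from explicit-width types. Both structures are hypothetical examples, and static_assert requires a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout that changes with the data model: long and pointers are 4
 * bytes wide under ILP32 and 8 bytes wide under LP64. */
struct record_native {
    long  length;
    void *cookie;
};

/* Layout with explicit widths: intended to be identical in 32-bit and
 * 64-bit programs. The two 32-bit members place the first 64-bit
 * member at byte offset 8, so no model-dependent padding is needed. */
struct record_fixed {
    uint32_t id;
    uint32_t flags;
    uint64_t length;
    uint64_t cookie; /* an address or handle carried as an integer */
};

/* Fail the build if the shared layout ever drifts. */
static_assert(sizeof(struct record_fixed) == 24,
              "record_fixed must be 24 bytes");
static_assert(offsetof(struct record_fixed, length) == 8,
              "length must start at byte 8");
```

A 32-bit writer and a 64-bit reader would disagree about the size of struct record_native (8 versus 16 bytes) but should agree on struct record_fixed.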

The x86 version of Mac OS X 10.4 does not support 64-bit computing. As Apple adopts 64-bit x86 processors,^[28] Mac OS X should regain 64-bit support. It is very likely that most, if not all, user libraries will have 64-bit equivalents in future versions of Mac OS X.

^[28] A likely first candidate is Intel's "Merom" 64-bit mobile processor.

8.18.3. Why Not to Use 64-bit Executables

Especially in Mac OS X, 64-bit programs are not necessarily "better" just by being 64-bit. In fact, typical programs are likely to have poorer performance if compiled for 64-bit computing. The following are some reasons against using 64-bit executables on Mac OS X.

The memory footprint of 64-bit programs is higher in general: They use larger pointers, stacks, and data sets. This potentially leads to more cache and TLB misses.
64-bit software support in Mac OS X 10.4 is nascent. The interfaces that have migrated to 64-bit are not mature, and most of the commonly used interfaces are still 32-bit.
As we discussed earlier, some of the usual reasons for moving to 64-bit computing are not very compelling on the PowerPC.
Certain PowerPC nuances can slow down 64-bit execution. For example, if a 32-bit signed integer is used as an array index, then, unless the integer is stored in a register, each access will require an extra extsw instruction to sign-extend the value.
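The last point can be illustrated by two versions of the same loop. In the first, the index is a 32-bit signed int, which a 64-bit target may have to widen before using it to form an address; in the second, the index already has pointer width. This is only a sketch with hypothetical function names: whether an extra sign-extension instruction is actually emitted depends on the compiler and optimization level, and in loops as simple as these it is often eliminated.

```c
#include <assert.h>
#include <stddef.h>

/* Index kept as a 32-bit signed int: on a 64-bit target, each use as
 * an array subscript may need widening to 64 bits first. */
static long sum_int_index(const int *a, int n)
{
    assert(a != NULL || n <= 0);
    long total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Index kept as a pointer-width signed type, so no widening is needed
 * when it is combined with the base address. */
static long sum_wide_index(const int *a, ptrdiff_t n)
{
    assert(a != NULL || n <= 0);
    long total = 0;
    for (ptrdiff_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```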

8.18.4. The 64-bit "Scene"

As Table 8-10 indicates, there exist several 64-bit operating systems. For example, 64-bit Solaris has a fully 64-bit kernel with 64-bit drivers. Barring some obsolete libraries, Solaris system libraries have both 32-bit and 64-bit versions. Both types of applications can run concurrently. Similarly, the AIX 5L operating system for 64-bit POWER hardware has a fully 64-bit kernel. Again, drivers and other kernel extensions are also 64-bit, and both 32-bit and 64-bit user environments are supported concurrently. There is also a 32-bit AIX 5L kernel that supports 64-bit applications on 64-bit hardware. However, the amount of physical memory it can support is limited (96GB) as compared to the 64-bit kernel.

Standards and 64-bit

The Single UNIX Specification, Version 2 (UNIX 98) included large-file support and removed architectural dependencies to allow 64-bit processing. APIs that were tied to 32-bit data types were cleaned up. For example, several functions were made large-file-aware, using off_t instead of size_t. Version 3 of the Single UNIX Specification (UNIX 03) revised, combined, and updated several standards, including the POSIX standard.