UML Execution Modes

Traditionally, UML has had two modes of operation, one for unmodified hosts and one for hosts that have been patched with what is known as the skas patch. The first mode is called tt mode, or "tracing thread" mode, after the single master thread that controls the operation of the rest of the UML instance. The second is called skas mode, or "separate kernel address space" mode. This requires a patch applied to the host kernel. UML running in this mode is more secure and performs better than in tt mode.

Recently, a third mode has been added that provides the same security as skas, plus some of the performance benefits, on unmodified hosts. The current skas host patch is the third version, so it's called skas3. This new mode is called skas0 since it requires no host changes. The intent is for this to completely replace tt mode since it is generally superior, but tt mode still has some advantages. Once this is no longer the case, the support for tt mode will be removed. Even so, I will describe tt mode here since it is not clear when support for it will be removed, and you may need an older release of UML that doesn't have skas0 support.

As the term skas suggests, the main difference between tt mode and the two skas modes is how UML lays out address spaces on the host. Figure 9.2 shows a process address space in each mode. In tt mode, the entire UML kernel resides within each of its processes' address spaces. In contrast, in skas3 mode, the UML kernel resides entirely in a different host address space. skas0 mode is in between, as it requires that a small amount of UML kernel code and data be in its process address spaces.

Figure 9.2. The three UML execution modes differ in how they lay out their process address spaces. `tt` mode maps the entire UML kernel into the upper .5GB of the process address space. `skas0` mode leaves the UML kernel outside the process address space, except for two pages of memory mapped at the very top of the process address space. These are used to receive `SIGSEGV` signals and pass the resulting page fault information back to the UML kernel, and to modify the process address space. These two pages are unnecessary in `skas3` mode, which allows its processes to use the entire address space.

The relationship between UML processes and the corresponding host processes for each mode follows from this. Figure 9.3 shows these relationships.

Figure 9.3. Comparison of the three UML execution modes. `tt` mode has a separate host thread (the tracing thread), which controls the execution of the other threads. Processes and threads within the UML instance have corresponding threads on the host. Each such host process has the UML kernel mapped into the top of its address space. In `skas3` mode, there is no separate tracing thread; this role is performed by the kernel thread. There is a single process on the host in which all UML processes run. `skas0` mode is a hybrid of `tt` mode and `skas3` mode. Like `skas3` mode, there is no tracing thread and there is a separate kernel thread in which the UML kernel lives. Like `tt` mode, each UML process has a corresponding host process.

tt mode really only exists on x86 hosts. The x86_64 and S/390 ports were made after skas0 mode was implemented, and they both use that rather than tt mode. Because of this, in the following discussion about tt mode, I will talk exclusively about x86. Also, the discussion about address space sizes and constraints on UML physical memory sizes is confined to x86, since this issue affects only 32-bit hosts.

`tt` Mode

In tt mode, a single tracing thread controls the rest of the threads in the UML instance by deciding when to intercept their system calls and have them executed within UML. When a UML process is running, the tracing thread intercepts that process's system calls, and when the UML kernel is running, it doesn't. This is the tracing that gives this thread its name.

The tracing thread has one host process per UML process under its control. This is necessary because UML needs a separate host address space for each UML process address space, and creating a host process is the only way to get a new host address space. This is wasteful since all of the other host kernel data associated with the process, such as the kernel stack and task structure, are unnecessary from the point of view of UML. On a uniprocessor UML instance, there can be only one of these host processes running at any given time, so all of the idle execution contexts represented by the other host processes are wasted. This problem is fixed in skas3 mode, as described in the next section.

The UML kernel is placed in the upper .5GB of each process address space. This is the source of the insecurity of tt mode: the UML kernel, including its data, is present and writable in the address spaces of its processes. Thus, a process that knew enough about the internals of UML could change the appropriate data inside UML and escape onto the host by tricking the tracing thread into not intercepting its system calls.

It is possible to protect the kernel's memory from its processes by write-protecting it when exiting the kernel and write-enabling it when entering the kernel. This has been implemented, but it imposes a huge performance cost. This protection has other problems as well, including complicating the code and making Symmetric Multi-Processing (SMP) impossible. So, it has probably never been used except in testing.

The fact that UML occupies a portion of the process address space is also a problem. The loss to UML processes of the upper .5GB of address space is inconvenient to some processes, and confining UML to that small address space limits the size of its physical memory. Since normal physical memory must be mapped into the kernel address space, the maximum physical memory size of a UML is less than .5GB. In practice, the limit is around 480MB.
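The arithmetic behind these limits can be sketched as follows. This is only an illustration: it assumes the default 32-bit x86 split in which the host kernel keeps the upper 1GB and user space gets the lower 3GB, and the names are made up for the example.

```python
GB = 1 << 30
MB = 1 << 20

# Assumption: default 32-bit x86 split, host kernel in the upper 1GB.
HOST_USER_SPACE = 3 * GB
# From the text: the tt mode UML kernel occupies the upper .5GB of that space.
UML_KERNEL_WINDOW = GB // 2


def tt_mode_layout():
    """Return (space left for a UML process, ceiling on UML physical memory)."""
    process_space = HOST_USER_SPACE - UML_KERNEL_WINDOW
    # Normal physical memory must be mapped inside the kernel window, so the
    # window size is a hard ceiling. The kernel's own text and data also live
    # there, which is why the practical limit is lower than the ceiling.
    phys_ceiling = UML_KERNEL_WINDOW
    return process_space, phys_ceiling


process_space, phys_ceiling = tt_mode_layout()
print(process_space // MB, "MB of address space left for each UML process")
print(phys_ceiling // MB, "MB ceiling on physical memory without Highmem")
```

The 512MB ceiling is consistent with the roughly 480MB seen in practice, since the kernel image and its other mappings share the same window.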

You can use Highmem support to get around this. Highmem support in Linux exists because of the need to support more than 4GB of physical memory in 32-bit x86 machines, which can access only 4GB of memory in a single 32-bit address space. In practice, since the x86 kernel has 1GB of address space (by default, it occupies the upper 1GB of its processes' address spaces), it needs Highmem support to access more than 1GB of physical memory.

The memory pages above the lower 1GB can be easily used for process memory, but if the kernel is to use a Highmem page for its own memory, it must temporarily map it into its address space, manipulate the data in it, and then unmap it. This imposes a noticeable performance cost.

UML has a similar problem with Highmem memory, and, in tt mode, it starts at around .5GB of physical memory, rather than 1GB. To access memory outside this region, it must also map it into its address space, but this mapping is more expensive for UML than it is for the host. So, UML suffers a greater performance penalty with a large physical memory than the host does.

`skas3` Mode

The problems with tt mode motivated the development of the skas3 host patch. These problems were driven by host limitations (or so we thought until someone figured out a way around them), so the skas3 patch added mechanisms to the host that allowed UML to avoid them.

skas3 gets its name from using the third version of the "separate kernel address space" host patch. As its rather unimaginative name suggests, the skas3 patch allows the UML kernel to be in a separate host address space from its processes. This protects it from nosy processes because those processes can't form a UML kernel address to write. The UML kernel is completely inaccessible to its processes.

skas3 also improved UML performance. Removing the UML kernel from its processes made new process creation faster, shrank some pieces of data in the host kernel, and may speed context switching. In combination, these effects produced a very noticeable performance improvement over tt mode.

To allow the UML kernel to exist in a separate address space from its processes, a small number of new facilities were needed in the host:

Creation, manipulation, and destruction of host address spaces that are not associated with a process
Extraction of page fault information, such as the faulting address, access type, and processor flags, after a process receives a SIGSEGV
Manipulation of the Local Descriptor Table (LDT) entries of another process

The address space manipulation is enabled through a new file in /proc called /proc/mm. Opening it creates a new, empty host address space and returns a file descriptor that refers to that address space. When the file descriptor is closed, and there are no users of the address space, the address space is freed.

A number of operations were formerly impossible to perform on an outside address space. Changing mappings is the most obvious. To handle a page fault in tt mode, it is sufficient to call mmap since the kernel is inside the process address space. When the kernel is outside it, we need something else. We can have the address space file descriptor support these operations through writing specially formatted structures to it. Mapping, unmapping, and changing permissions on pages are done this way, as is changing LDT entries associated with the address space.
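To make the idea of "specially formatted structures" concrete, here is a minimal sketch of serializing one address space operation into a fixed-size record. The opcodes and field layout are hypothetical, invented for this illustration; the skas3 patch defines its own structure, which is not reproduced here.

```python
import struct

# Hypothetical opcodes and layout, chosen for illustration only.
OP_MMAP, OP_MUNMAP, OP_MPROTECT = 1, 2, 3
REQUEST = struct.Struct("=IQQI")   # op, address, length, protection bits

PROT_READ, PROT_WRITE = 0x1, 0x2   # values from <sys/mman.h> on Linux


def pack_request(op, addr, length, prot=0):
    """Serialize one address space operation into a fixed-size record."""
    return REQUEST.pack(op, addr, length, prot)


def unpack_request(record):
    """Recover (op, addr, length, prot) from a packed record."""
    return REQUEST.unpack(record)


record = pack_request(OP_MPROTECT, 0x08048000, 4096, PROT_READ)
# With /proc/mm, a record like this would be handed to write() on the
# address space file descriptor. Here it is only packed and unpacked.
print(len(record), unpack_request(record))
```

The point of the design is that an ordinary write() on a file descriptor is enough to name both the target address space and the operation, so no new system calls are needed for mapping changes.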

Now that we can create host address spaces without creating new host processes, the resource consumption associated with tt mode goes away. Instead of one host process per UML process, there is now one host process per virtual processor. The UML kernel is in one host process that does system call interception on another, which, on a uniprocessor UML, runs all UML processes. It does so by switching between address spaces as required, under the control of the UML kernel invoking another ptrace extension, PTRACE_SWITCH_MM. This extension makes the ptraced process switch from one host address space to another.
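The following toy model illustrates the arrangement: one host process runs every UML process by being switched between address spaces. The class and method names are invented for the sketch, and nothing here calls ptrace; `switch_mm` only stands in for the effect of PTRACE_SWITCH_MM.

```python
class AddressSpace:
    """Stands in for a host address space created by opening /proc/mm."""

    def __init__(self, name):
        self.name = name
        self.pages = {}          # virtual page number -> contents


class UserspaceHost:
    """The single host process that runs every UML process on one virtual CPU."""

    def __init__(self):
        self.current = None

    def switch_mm(self, space):
        # Models the effect of PTRACE_SWITCH_MM: same host process, new mappings.
        self.current = space

    def read(self, vpn):
        return self.current.pages.get(vpn)


shell, editor = AddressSpace("shell"), AddressSpace("editor")
shell.pages[0x8048] = "shell text"
editor.pages[0x8048] = "editor text"

host = UserspaceHost()
seen = []
for space in (shell, editor, shell):     # the UML scheduler decides who runs
    host.switch_mm(space)
    seen.append(host.read(0x8048))
print(seen)
```

The same virtual address yields different contents after each switch, while the host sees only one process throughout.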

With the UML kernel in its own address space, it is no longer constrained to the .5GB of address space of tt mode. This enables it to have a much larger physical memory without needing to resort to Highmem. In principle, the entire 3GB address space on x86 is available for use as UML physical memory. In practice, the limit is somewhat lower, but, at around 2.5GB, still much greater than the 480MB limit imposed by tt mode.

In order to achieve this higher limit, the UML kernel must be configured with CONFIG_MODE_TT disabled. With both CONFIG_MODE_TT and CONFIG_MODE_SKAS enabled, the resulting UML kernel must be able to run in both modes, depending on its command line and the host capabilities it detects when it boots. A dual-mode UML instance will be compiled to load into the upper .5GB of its address space, as required for tt mode, and will be subject to the 480MB physical memory limit. Disabling CONFIG_MODE_TT causes the UML binary to be compiled so it loads lower in its address space, where more normal processes load. In this case, the physical memory limit increases to around 2.5GB.
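The relationship between the two configuration options and the resulting limit can be summarized in a small lookup. This is a sketch using the approximate 32-bit x86 figures quoted above; the function name and return format are made up for the example.

```python
def physical_memory_limit(mode_tt, mode_skas):
    """Return (load region, approximate limit in MB) for a UML build.

    mode_tt and mode_skas mirror CONFIG_MODE_TT and CONFIG_MODE_SKAS.
    """
    if not (mode_tt or mode_skas):
        raise ValueError("at least one execution mode must be enabled")
    if mode_tt:
        # tt-only and dual-mode binaries both load into the upper .5GB.
        return ("upper .5GB", 480)
    # skas-only binaries load lower, where normal processes load.
    return ("low, like a normal process", 2560)


for tt, skas in ((True, True), (True, False), (False, True)):
    print(tt, skas, physical_memory_limit(tt, skas))
```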

This is fortunate since Highmem is slower in skas3 mode than in tt mode, unlike almost all other operations. This is because a skas3 mode UML instance needs to map Highmem pages into its address space much more frequently than a tt mode UML instance does. When a UML process makes a system call, it is often the case that one of the arguments is a pointer, and the data referenced by that pointer must be copied into the UML kernel address space. In tt mode, that data is normally available to simply copy since the UML kernel is in the UML process address space. In skas3 mode, that isn't the case. Now, the UML kernel must work out from the process pointer it was given where in its own physical memory that data lies. In the case of Highmem memory, that data is not in its physical memory, and the appropriate page must be mapped into its address space before it can access the data.
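A toy model may help show why this costs more in skas3 mode. The kernel has to walk the process page table by hand for every user pointer, and each page that lies in Highmem needs a temporary mapping first. All names here are illustrative, and the dictionaries stand in for real page tables and physical memory.

```python
PAGE_SIZE = 4096


class UmlKernel:
    def __init__(self):
        self.phys = {}        # frame number -> bytearray, directly mapped
        self.highmem = {}     # frames that need a temporary mapping first
        self.temp_maps = 0    # how many temporary mappings were needed

    def frame(self, pfn):
        if pfn in self.phys:
            return self.phys[pfn]
        # Highmem: the page must be mapped temporarily before it is touched.
        self.temp_maps += 1
        return self.highmem[pfn]

    def copy_from_user(self, page_table, uaddr, length):
        """Translate a process pointer page by page and copy the data out."""
        out = bytearray()
        while length > 0:
            vpn, offset = divmod(uaddr, PAGE_SIZE)
            chunk = min(length, PAGE_SIZE - offset)
            data = self.frame(page_table[vpn])
            out += data[offset:offset + chunk]
            uaddr += chunk
            length -= chunk
        return bytes(out)


kernel = UmlKernel()
kernel.phys[7] = bytearray(b"A" * PAGE_SIZE)
kernel.highmem[900] = bytearray(b"B" * PAGE_SIZE)
page_table = {0x10: 7, 0x11: 900}

# A 4-byte argument that straddles a normal page and a Highmem page.
data = kernel.copy_from_user(page_table, 0x10 * PAGE_SIZE + PAGE_SIZE - 2, 4)
print(data, kernel.temp_maps)
```

In tt mode the equivalent copy is a plain memory copy, because the process pages are already mapped where the kernel is running.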

Finally, it is necessary to extract page fault information from another process. Page faults happen when a process tries to execute code or access data that either has not been read yet from disk or has been swapped out. Within UML, process page faults manifest themselves as SIGSEGV signals being delivered to the process. Again, in tt mode, this is easy because the UML kernel itself receives the SIGSEGV signal, and all the page fault information is on its stack when it enters the signal handler. In skas3 mode, this is not possible because the UML kernel never receives the SIGSEGV. Rather, the UML kernel receives a notification from the host that its process received a SIGSEGV, and it cancels the signal so that it is never actually delivered to the process. So, the skas3 patch adds a ptrace option, PTRACE_FAULTINFO, to read this information from another process.

Together, these host changes make up the skas3 patch. UML needed to be modified in order to use them, of course. Once this was done, and the security and performance benefits became apparent, skas3 became the standard for serious UML installations.

`skas0` Mode

More recently, an Italian college student, Paolo Giarrusso, who had been doing good work on UML, thought that it might be possible to implement something like skas3 on hosts without the skas3 patch.

His basic idea was to insert just enough code into the address space of each UML process to perform the address space updates and information retrieval for which skas3 requires a host patch. As I implemented it over the following weekend, this inserted code takes the form of two pages mapped by the UML kernel at the top of each process address space. One of these pages is for a SIGSEGV signal frame and is mapped with write permission, and the other contains UML code and is mapped read-only.

The code page contains a function that invokes mmap, munmap, and mprotect as requested by the UML kernel. The page also contains the SIGSEGV signal handler. The function is invoked whenever address space changes are needed in a UML process and is the equivalent of requesting an address space change through a /proc/mm file descriptor. The signal handler implements the equivalent of PTRACE_FAULTINFO by receiving the SIGSEGV signal, reading all of the fault information from its stack, and putting it in a convenient form where the UML kernel can read it.
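The division of labor between the two pages can be sketched with a toy model. It assumes nothing about the real stub beyond what is described above: the request format, class name, and field names are invented, and the dictionary stands in for the process's actual mappings.

```python
class StubPages:
    """Toy model of the two pages skas0 maps at the top of a process."""

    def __init__(self):
        self.mappings = {}       # page address -> protection string
        self.fault_info = None   # stands in for the writable data page

    def run(self, requests):
        """Code page, part 1: carry out address space changes on request."""
        for op, addr, prot in requests:
            if op == "mmap":
                self.mappings[addr] = prot
            elif op == "munmap":
                self.mappings.pop(addr, None)
            elif op == "mprotect":
                if addr not in self.mappings:
                    raise KeyError(hex(addr))
                self.mappings[addr] = prot
            else:
                raise ValueError(op)

    def segv_handler(self, addr, is_write):
        """Code page, part 2: save the fault details where the kernel can read them."""
        self.fault_info = {"addr": addr, "is_write": is_write}


stub = StubPages()
stub.segv_handler(0x08049000, True)           # the process faults on a write
fault = stub.fault_info                       # the UML kernel reads the data page
stub.run([("mmap", fault["addr"], "rw")])     # and asks the stub to fix the mapping
print(stub.mappings)
```

The first method plays the role that writing to a /proc/mm descriptor plays in skas3 mode, and the second plays the role of PTRACE_FAULTINFO.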

Without changes in the host kernel, we have no way to create new host address spaces without creating new host processes. So, skas0 mode resembles tt mode in having one host process for each UML process.

This is the only similarity between skas0 mode and tt mode. In skas0 mode, the UML kernel runs in a separate host process and has a separate host address space from its processes. All of the skas3 benefits to security and performance flow from this property. The fact that the UML kernel is controlling many more processes than in skas3 mode means that we have the same wasted kernel memory that tt mode has. This makes skas0 mode somewhat less efficient than skas3 mode but still a large improvement over tt mode.
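A rough count of host processes in each mode makes the comparison concrete. This is a simplification based only on the descriptions in this chapter; it ignores helper threads and anything else a real UML instance creates, and the function name is made up.

```python
def host_processes(mode, uml_processes, virtual_cpus=1):
    """Approximate number of host processes behind a UML instance."""
    if mode == "tt":
        # The tracing thread plus one host process per UML process.
        return 1 + uml_processes
    if mode == "skas3":
        # The UML kernel process plus one userspace process per virtual CPU.
        return 1 + virtual_cpus
    if mode == "skas0":
        # The UML kernel process plus one host process per UML process.
        return 1 + uml_processes
    raise ValueError(mode)


for mode in ("tt", "skas3", "skas0"):
    print(mode, host_processes(mode, uml_processes=100))
```

With 100 UML processes on a uniprocessor instance, the model gives 101 host processes for tt and skas0 modes and 2 for skas3 mode, which is where the extra host kernel memory goes.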

To Patch or Not to Patch?

With respect to how you want to run UML, at this writing, the basic choice is between skas0 mode and skas3 mode. The decision is controlled by whether you are willing to patch the host kernel in order to get better performance than is possible by using skas0 mode.

We have a number of performance-improving patches in the works, some or all of which may be merged into the mainline kernel by the time this book reaches your bookshelf. You will be able to tell what, if any, patches are missing from your host kernel by looking at the early boot messages. Here is an example:

    Checking that ptrace can change system call numbers...OK
    Checking syscall emulation patch for ptrace...missing
    Checking PROT_EXEC mmap in /tmp...OK
    Checking if syscall restart handling in host can be \
        skipped...OK
    Checking for the skas3 patch in the host:
      - /proc/mm...not found
      - PTRACE_FAULTINFO...not found
      - PTRACE_LDT...not found
    UML running in SKAS0 mode
    Adding 16801792 bytes to physical memory to account for \
        exec-shield gap

The message about the syscall emulation patch is talking about a ptrace extension that cuts in half the number of ptrace calls needed to intercept and nullify a host system call. This is separate from the skas3 patch and is used in all UML execution modes. At this writing, this patch is in the mainline kernel, so a UML instance running on a host with 2.6.14 or later will benefit from this.

A few lines later, you can see the instance checking for the individual pieces of the skas3 patch: /proc/mm, PTRACE_FAULTINFO, and PTRACE_LDT. Two of these, the two ptrace extensions, are likely to be merged into the mainline kernel separately, so there will likely be a set of host kernels for which UML finds some of these features but not all. In this case, it will use whatever host capabilities are present and use fallback code for those that are missing. /proc/mm will never be in the mainline kernel, so we are thinking about alternatives that will be acceptable to Linus.
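If you want to check these messages mechanically, a short script can pull out the features that were not found and the mode that was selected. The sketch below assumes the message format of the example boot output in this section; other UML versions may word these lines differently, and the function name is made up.

```python
import re

BOOT_LOG = """\
Checking that ptrace can change system call numbers...OK
Checking syscall emulation patch for ptrace...missing
Checking PROT_EXEC mmap in /tmp...OK
Checking for the skas3 patch in the host:
  - /proc/mm...not found
  - PTRACE_FAULTINFO...not found
  - PTRACE_LDT...not found
UML running in SKAS0 mode
"""

# A check line is "Checking <what>...<result>" or "  - <what>...<result>".
CHECK = re.compile(r"^(?:Checking |\s*- )(.+?)\.\.\.(.+)$")
MODE = re.compile(r"^UML running in (\w+) mode$")


def summarize(log):
    """Return (execution mode, list of checks whose result was not OK)."""
    missing, mode = [], None
    for line in log.splitlines():
        m = CHECK.match(line)
        if m and m.group(2).strip() != "OK":
            missing.append(m.group(1))
        m = MODE.match(line)
        if m:
            mode = m.group(1)
    return mode, missing


mode, missing = summarize(BOOT_LOG)
print(mode)
print(missing)
```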

For a smallish UML installation, a stock unmodified host kernel will likely provide good UML performance. So, in this case, it is probably not necessary to patch and rebuild a new kernel for the host.

Note that tt mode is not recommended in any situation. However, sometimes you may need to run an old version of UML in which skas0 is not available. In this case, it may be a good idea to patch the host with the skas3 patch. If UML running under tt mode is too slow or too resource intensive, or you need the security that comes with skas3 mode, then patching with the skas3 patch is the best course.

Vanderpool and Pacifica

Yet another option, which at this writing is not yet available but will be relatively soon, is to take advantage of the hardware virtualization support that Intel and AMD are incorporating into their upcoming processors. These extensions are called Vanderpool and Pacifica, respectively. UML is currently being modified in order to take advantage of this support.

Vanderpool and Pacifica are similar, and compatible, in roughly the same way that AMD's Opteron and Intel's EM64T architectures are similar. There are some differences in the instructions, but they are relatively minor, and software written for one will generally run unmodified on the other. UML is currently getting Vanderpool Technology support, with the work being done by a pair of Intel engineers in Russia, but the result will likely run, perhaps with some tweaks, on an AMD processor with Pacifica support.

This support will likely bring UML performance close to native performance. The hardware support is sufficient to eliminate some of the largest performance bottlenecks that UML faces on current hardware. The main bottleneck is the context switching that ptrace requires to intercept and nullify system calls on the host. The hardware virtualization support will enable this to be eliminated, allowing UML to receive process system calls directly, without having to go through the host kernel. A number of other things will be done more efficiently than is currently possible, such as modifying process address spaces and creating new tasks.

In order to use this hardware virtualization support, you will need a host new enough to have the support in its processor. You will also need a version of UML that has the required support. Provided these two requirements are met, UML will likely perform noticeably better than it does without that support.
