Controlling a UML Instance with Signals



So far, I've described the civilized ways to control UML instances from the host. However, sometimes an instance isn't healthy enough to cooperate with these mechanisms. For these cases, a limited amount of control is available by sending the instance a signal.

To send a UML instance a signal, you first need to know which process ID to send it to. A UML instance is made up of a number of threads, so the choice is not obvious. Also, when the host is running several instances, there is a real chance of misreading the output of ps and hitting the wrong UML instance.

To solve this problem, a UML instance writes the process ID of its main thread into the pid file in its umid directory. This thread is the one responsible for handling the signals that can be used for this last-ditch control. Given a umid, sending a signal to the corresponding instance is done like this:

kill -TERM `cat ~/.uml/debian/pid`


When this main thread receives SIGINT, SIGTERM, or SIGHUP, it will run the UML-specific parts of the shutdown process. This will have the same effect as the MConsole halt or sysrq b requests. No userspace or kernel cleanup will happen. Only the host resources that have been allocated by UML will be released. The UML instance's filesystems will be dirty and need either an fsck or a journal replay.
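The pid-file lookup above can be wrapped in a small helper that refuses to send anything other than the three shutdown signals, and refuses a pid file that doesn't hold a plain process ID. This is a sketch, not a UML tool: the function name uml_signal and the UML_DIR override are hypothetical, and ~/.uml is assumed to be the umid base directory, as in the example above.

```shell
#!/bin/sh
# uml_signal UMID [SIGNAL]
# Hypothetical helper: sends SIGNAL (default TERM) to the main thread of
# the UML instance whose umid is UMID, via the pid file in its umid dir.
uml_signal() {
    umid=$1
    sig=${2:-TERM}
    # UML_DIR is a hypothetical override; the default matches ~/.uml above.
    pidfile="${UML_DIR:-$HOME/.uml}/$umid/pid"

    # SIGINT, SIGTERM, and SIGHUP are the signals the main thread turns
    # into the UML-specific shutdown; refuse anything else.
    case $sig in
        INT|TERM|HUP) ;;
        *) echo "uml_signal: refusing SIG$sig" >&2; return 2 ;;
    esac

    if [ ! -r "$pidfile" ]; then
        echo "uml_signal: cannot read $pidfile" >&2
        return 1
    fi

    pid=$(cat "$pidfile")
    # Accept only a positive decimal number, so a damaged pid file can
    # never become "kill -s TERM 0" (the whole process group) or worse.
    case $pid in
        ''|*[!0-9]*) echo "uml_signal: bad pid '$pid' in $pidfile" >&2; return 1 ;;
    esac
    if [ "$pid" -eq 0 ]; then
        echo "uml_signal: bad pid 0 in $pidfile" >&2
        return 1
    fi

    kill -s "$sig" "$pid"
}
```

For example, uml_signal debian sends SIGTERM to the instance whose umid is debian. The checks can't detect a stale pid file left behind by an instance that has already died, whose process ID the host may since have reused, so a quick look at ps before signaling is still worthwhile.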



Chapter 9. Host Setup for a Small UML Server

After having talked about UML almost exclusively so far, we will now talk about the host. This chapter and the next will cover setting up and running a secure, well-performing UML server. First we will talk about running a small UML server, where the UML instances will be controlled by fairly trusted people, such as the host administrator or others with logins on the host. Thus, we won't need the same level of security as on a large UML server with unknown, untrusted people inside the UML instances. We will have a basic level of security, where nothing can break out of a UML instance onto the host. We won't be particularly paranoid about whether network traffic from the UMLs is originating from the expected IP addresses or whether there is too much of it. Similarly, we will talk about getting good performance from the UML instances, but we won't try to squeeze every bit of UML hosting capacity from the host.

All of these things, which a large UML hosting provider cares about more than a casual in-house UML user does, will be discussed in the next chapter. There, we will cover tougher security measures, such as how to protect the host even if a user does somehow manage to break out of a UML instance and how to ensure that UML instances are not spoofing IP addresses or sending out unreasonably large amounts of traffic. We will also discuss how to log resource usage, such as network traffic, in that chapter. But first, let's cover what more casual users want to know.



Host Kernel Version

Technically, UML will run on any x86 host kernel from a stable series (Linux kernel versions 2.2, 2.4, or 2.6) since 2.2.15. However, the 2.2 kernel is of historic interest only: if you have such a machine that you are going to run UML instances on, you should upgrade it. The 2.4 and 2.6 kernels make good hosts, but 2.6 is preferred. UML will run on any x86_64 (Opteron/AMD64 or Intel EM64T) host, which is a newer architecture that has had the necessary basic support since the beginning. However, x86_64 hosts are stable only when running 2.6.12 or later. On S/390, a fairly new 2.6 host kernel is required because of bugs that were found and fixed during the UML port to that architecture.
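The version rules above can be turned into a quick host check. This is a sketch under stated assumptions: the function name uml_host_check and its one-word verdicts are made up for illustration, x86 is assumed to be reported by uname -m as i386 through i686, and because the text gives no exact minimum for S/390 beyond "a fairly new 2.6", that case is only flagged for a manual check. Kernels newer than the 2.6 series are treated as at least as good as 2.6, which the text doesn't address.

```shell
#!/bin/sh
# uml_host_check [RELEASE [MACHINE]]
# Hypothetical helper: prints a one-word verdict on a host kernel for UML.
# RELEASE and MACHINE default to the output of uname -r and uname -m.
uml_host_check() {
    release=${1:-$(uname -r)}
    machine=${2:-$(uname -m)}

    # "2.6.12-1-amd64" -> "2.6.12": drop any distribution suffix.
    ver=${release%%[!0-9.]*}
    old_ifs=$IFS
    IFS=.
    set -- $ver
    IFS=$old_ifs
    # Encode major.minor.patch as one comparable number: 2.6.12 -> 20612.
    code=$(( ${1:-0} * 10000 + ${2:-0} * 100 + ${3:-0} ))

    case $machine in
        i[3-6]86)
            if   [ "$code" -lt 20215 ]; then echo unsupported  # before 2.2.15
            elif [ "$code" -lt 20400 ]; then echo upgrade      # 2.2: runs, but upgrade
            elif [ "$code" -lt 20600 ]; then echo ok           # 2.4: a good host
            else                             echo preferred    # 2.6: AIO and O_DIRECT
            fi ;;
        x86_64)
            # Stable only from 2.6.12 on.
            if [ "$code" -lt 20612 ]; then echo unstable; else echo ok; fi ;;
        s390*)
            # "Fairly new 2.6" has no exact version, so only flag for checking.
            if [ "$code" -lt 20600 ]; then echo unsupported; else echo check; fi ;;
        *)
            echo unknown ;;
    esac
}
```

Run with no arguments, it checks the running host; uml_host_check 2.6.11 x86_64, for instance, prints unstable.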

UML makes use of the AIO and O_DIRECT facilities in the 2.6 kernels for better performance and lower memory consumption. AIO is kernel-level asynchronous I/O: a process can issue a number of I/O requests at once and receive notifications asynchronously as they finish. The kernel issues each notification when the data is available, so requests may complete in a different order from the one in which they were issued.

On earlier kernels, the alternatives are either to make normal read and write system calls, which are synchronous and put the process to sleep until the operation finishes, or to dedicate one or more threads to I/O operations. Doing I/O synchronously allows only one operation to be pending at any given time. Doing I/O asynchronously by having a separate thread do synchronous I/O at least allows the process to do other work while the operation is pending. On the other hand, only one operation can be pending for each such I/O thread, and the process must context-switch to and from these threads and communicate with them as operations are issued and completed. Having one thread for each pending I/O operation is hugely wasteful.

glibc provides AIO on all kernels, even those without kernel AIO support, by implementing it with threads, potentially one thread per outstanding I/O request. UML emulates AIO in a similar way on such hosts, but it creates only a single thread, so only one I/O request can be pending at a time.

The AIO facility present in the 2.6 kernel series allows processes to do true AIO. UML uses this by having a separate thread handle all I/O requests, but now, this thread can have many operations pending at once. It issues operations to the host and waits for them to finish. As they finish, the thread interrupts the main UML kernel so that it can finish the operations and wake up anything that was waiting for them.

This improves I/O performance: because more I/O can proceed in parallel, data becomes available earlier than it would if only one I/O request could be pending.
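Because glibc can provide AIO on any kernel by using threads, a program that happens to use AIO says nothing about whether the host kernel has the native facility. On 2.6 kernels, the native interface comes with the aio-nr and aio-max-nr files in /proc/sys/fs, which report how many AIO events are currently allocated system-wide and the limit on that number. The sketch below assumes that their presence is a reasonable sign of kernel AIO; the function name uml_aio_status is hypothetical.

```shell
#!/bin/sh
# uml_aio_status [PROC_FS_DIR]
# Hypothetical helper: reports whether the host kernel exposes native AIO,
# judged (as an assumption) by the aio-nr and aio-max-nr sysctl files.
# PROC_FS_DIR defaults to /proc/sys/fs; it is a parameter only so the
# function can be pointed at a test directory.
uml_aio_status() {
    dir=${1:-/proc/sys/fs}
    if [ -r "$dir/aio-nr" ] && [ -r "$dir/aio-max-nr" ]; then
        nr=$(cat "$dir/aio-nr")        # events allocated by active io_setup contexts
        max=$(cat "$dir/aio-max-nr")   # system-wide ceiling on that total
        echo "native AIO: $nr of $max events allocated"
    else
        echo "no native AIO"
        return 1
    fi
}
```

If aio-nr reaches aio-max-nr, further io_setup calls fail, so on a host running many instances the fs.aio-max-nr sysctl is one of the limits worth keeping an eye on.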

O_DIRECT allows a process to ask that an I/O request be done directly to and from its own address space without being cached in the kernel, as shown in Figure 9.1. At first glance, the lack of caching would seem to hurt performance. If a page of data is read twice with O_DIRECT enabled, it will be read from disk twice, rather than the second request being satisfied from the kernel's page cache. Similarly, write requests will go straight to disk, and the request won't be considered finished until the data is on the disk.

Figure 9.1. O_DIRECT I/O compared to buffered I/O. When a process does a buffered read, the data is first read from disk and stored in the kernel's page cache. Then it is copied into the address space of the process that initiated the read. Buffering it in the page cache provides faster access to the data if it is needed again. However, the data is copied and stored twice. When a process performs an O_DIRECT read, the data is read directly from the disk into the process address space. This eliminates the extra copy operation and the extra memory consumption caused by a buffered read. However, if another process needs the data, it must be read from disk rather than simply copied from the kernel's page cache. The figure also shows a read done by the kernel for its own purposes, to compare it to the O_DIRECT read. In both cases, the data is read directly from disk and stored only once. When the process doing the O_DIRECT read is UML reading data into its own page cache, the two cases are identical.


However, O_DIRECT is intended for specialized applications that implement their own caching and use AIO. For an application like this, using O_DIRECT can improve performance and lower its total memory requirements, including memory allocated on its behalf inside the kernel. UML is such an application, and use of O_DIRECT actually makes it behave more like a native kernel.

A native kernel must wait for the disk when it writes data, and there is no caching level below it (except perhaps for the on-disk cache), so if it reads data, it must again wait for the disk. This is exactly the behavior imposed on a process when it uses O_DIRECT I/O.

The elimination of the caching of data at the host kernel level means that the only copy of the data is inside the UML instance that read it. So, this eliminates one copy of the data, reducing the memory consumption of the host. Eliminating this copy also improves I/O latency, making the data available earlier than if it was read into the host's page cache and then copied (or mapped) into the UML instance's address space.

For these reasons, for x86 hosts, a 2.6 host kernel is preferable to 2.4. As I pointed out earlier, running UML on x86_64 or S/390 hosts requires a 2.6 host because of host bugs that were fixed fairly recently.
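On the host side, it's worth confirming that the filesystem holding the UML disk images accepts O_DIRECT at all, since not every filesystem does. This is a minimal sketch, assuming GNU dd, whose iflag=direct option opens the input file with O_DIRECT; the function name uml_direct_read is hypothetical. O_DIRECT transfers have to be aligned to the underlying device's logical block size, so the block size should be a multiple of it; 4096 bytes is assumed to be safe here.

```shell
#!/bin/sh
# uml_direct_read FILE [BLOCKSIZE]
# Hypothetical helper: reads FILE once with O_DIRECT (GNU dd iflag=direct),
# bypassing the host page cache, and prints "direct" or "failed".
# BLOCKSIZE defaults to 4096 and should be a multiple of the device's
# logical block size, because O_DIRECT requires aligned transfers.
uml_direct_read() {
    file=$1
    bs=${2:-4096}
    if dd if="$file" of=/dev/null bs="$bs" iflag=direct 2>/dev/null; then
        echo direct
    else
        # Either the filesystem refused O_DIRECT or the read failed for
        # another reason (missing file, permissions); dd's error is discarded.
        echo failed
        return 1
    fi
}
```

Running it on a disk image before booting an instance from it shows whether the direct path works there; since a failed result can also mean the file is missing or unreadable, rerunning the dd command by hand shows the actual error.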