Looking at a UML from the Inside and Outside


Finally, we'll see a login prompt. Actually, I see three on my screen. One is in the xterm window in which I ran UML. The other two are in xterm windows run by UML in order to hold the second console and the first serial line, which are configured to have gettys running on them. We'll log in as root (using the highly secure default root password of root that most of my UML filesystems have) and get a shell:

usermode login: root
Password:
Last login: Thu Jan 27 18:51:35 2005 on tty0
Linux usermode 2.6.11-rc1-mm1 #83 Thu Jan 27 12:16:00 EST 2005 i686 unknown
usermode:~#


Again, this is identical to what you'd see if you logged in to a physical machine booted on this filesystem.

Now it's time to start poking around inside this UML and see what it looks like. First, we'll look at what processes are running, as shown in Figure 2.4.

Figure 2.4. Output from ps uax inside UML

usermode:~# ps uax
USER       PID %CPU %MEM   VSZ  RSS TTY   STAT  START  TIME COMMAND
root         1  0.0  0.3  1100  464 ?     S     19:17  0:00 init [2]
root         2  0.0  0.0     0    0 ?     RWN   19:17  0:00 [ksoftirqd/0]
root         3  0.0  0.0     0    0 ?     SW<   19:17  0:00 [events/0]
root         4  0.0  0.0     0    0 ?     SW<   19:17  0:00 [khelper]
root         5  0.0  0.0     0    0 ?     SW<   19:17  0:00 [kthread]
root         6  0.0  0.0     0    0 ?     SW<   19:17  0:00 [kblockd/0]
root         7  0.0  0.0     0    0 ?     SW    19:17  0:00 [pdflush]
root         8  0.0  0.0     0    0 ?     SW    19:17  0:00 [pdflush]
root        10  0.0  0.0     0    0 ?     SW<   19:17  0:00 [aio/0]
root         9  0.0  0.0     0    0 ?     SW    19:17  0:00 [kswapd0]
root        96  0.0  0.4  1420  624 ?     S     19:17  0:00 /sbin/syslogd
root        98  0.0  0.3  1084  408 ?     S     19:17  0:00 /sbin/klogd
daemon     102  0.0  0.3  1200  420 ?     S     19:17  0:00 /sbin/portmap
root       105  0.0  0.4  1128  548 ?     S     19:17  0:00 /sbin/rpc.statd
root       111  0.0  0.4  1376  540 ?     S     19:17  0:00 /usr/sbin/inetd
root       120  0.0  0.6  1820  828 ?     S     19:17  0:00 /bin/sh /usr/bin/
mysql      133  0.1  1.2 19244 1540 ?     S     19:17  0:00 /usr/sbin/mysqld
mysql      135  0.0  1.2 19244 1540 ?     S     19:17  0:00 /usr/sbin/mysqld
mysql      136  0.0  1.2 19244 1540 ?     S     19:17  0:00 /usr/sbin/mysqld
root       144  0.9  0.9  2616 1224 ?     S     19:17  0:00 /usr/sbin/sshd
root       149  0.0  1.0  2588 1288 ?     S     19:17  0:00 /usr/sbin/apache
root       152  0.0  0.9  2084 1220 tty0  S     19:17  0:00 -bash
root       153  0.0  0.3  1084  444 tty1  S     19:17  0:00 /sbin/getty 38400
root       154  0.0  0.3  1084  444 tty2  S     19:17  0:00 /sbin/getty 38400
root       155  0.0  0.3  1084  444 ttyS0 S     19:17  0:00 /sbin/getty 38400
www-data   156  0.0  1.0  2600 1284 ?     S     19:17  0:00 /usr/sbin/apache
www-data   157  0.0  1.0  2600 1284 ?     S     19:17  0:00 /usr/sbin/apache
www-data   158  0.0  1.0  2600 1284 ?     S     19:17  0:00 /usr/sbin/apache
www-data   159  0.0  1.0  2600 1284 ?     S     19:17  0:00 /usr/sbin/apache
www-data   160  0.0  1.0  2600 1284 ?     S     19:17  0:00 /usr/sbin/apache
root       162  2.0  0.5  2384  736 tty0  R     19:17  0:00 ps uax
usermode:~#

There's not much to comment on except the total normality of this output. What's interesting here is to look at the host. Figure 2.5 shows the corresponding processes on the host.

Figure 2.5. Partial output from ps uax on the host

USER     PID %CPU %MEM    VSZ   RSS TTY   STAT START TIME COMMAND
jdike   9938  0.1  3.1 131112 16264 pts/3 R    19:17 0:03 ./linux [ps]
jdike   9942  0.0  3.1 131112 16264 pts/3 S    19:17 0:00 ./linux [ps]
jdike   9943  0.0  3.1 131112 16264 pts/3 S    19:17 0:00 ./linux [ps]
jdike   9944  0.0  0.0    472   132 pts/3 T    19:17 0:00
jdike  10036  0.0  0.5   8640  2960 pts/3 S    19:17 0:00 xterm -T Virtual
jdike  10038  0.0  0.0   1368   232 ?     S    19:17 0:00 /usr/lib/uml/port
jdike  10039  0.0  1.5 131092  8076 pts/6 S    19:17 0:00 ./linux [hwclock]
jdike  10095  0.0  0.1    632   604 pts/3 T    19:17 0:00
jdike  10099  0.0  0.0    416   352 pts/3 T    19:17 0:00
jdike  10107  0.0  0.0    428   332 pts/3 T    19:17 0:00
jdike  10113  0.0  0.1    556   516 pts/3 T    19:17 0:00
jdike  10126  0.0  0.0    548   508 pts/3 T    19:17 0:00
jdike  10143  0.0  0.0    840   160 pts/3 T    19:17 0:00
jdike  10173  0.0  0.2   1548  1140 pts/3 T    19:17 0:00
jdike  10188  0.0  0.1   1232   780 pts/3 T    19:17 0:00
jdike  10197  0.0  0.1   1296   712 pts/3 T    19:17 0:00
jdike  10205  0.0  0.0    452   452 pts/3 T    19:17 0:00
jdike  10207  0.0  0.0    452   452 pts/3 T    19:17 0:00
jdike  10209  0.0  0.0    452   452 pts/3 T    19:17 0:00
jdike  10210  0.0  0.5   8640  2960 pts/3 S    19:17 0:00 xterm -T Virtual
jdike  10212  0.0  0.0   1368   232 ?     S    19:17 0:00 /usr/lib/uml/port
jdike  10213  0.0  2.9 131092 15092 pts/7 S    19:17 0:00 ./linux [/sbin/ge
jdike  10214  0.0  0.1   1292   688 pts/3 T    19:17 0:00
jdike  10215  0.0  0.1   1292   676 pts/3 T    19:17 0:00
jdike  10216  0.0  0.1   1292   676 pts/3 T    19:17 0:00
jdike  10217  0.0  0.1   1292   676 pts/3 T    19:17 0:00
jdike  10218  0.0  0.1   1292   676 pts/3 T    19:17 0:00
jdike  10220  0.0  0.1   1228   552 pts/3 T    19:17 0:00

Each of the nameless host processes corresponds to an address space inside this UML instance. Except for application and kernel threads, there's a one-to-one correspondence between UML processes and these host processes.

Notice that the properties of the UML processes and the corresponding host processes don't have much in common. All of the host processes are owned by me, whereas the UML processes have various owners, including root. The process IDs are totally different, as are the virtual and resident memory sizes.

This is because the host processes are simply containers for UML address spaces. All of the properties visible inside UML are maintained by UML totally separate from the host. For example, the owner of the host processes will be whoever ran UML. However, many UML processes will be owned by root. These processes have root privileges inside UML, but they have no special privileges on the host. This important fact means that root can do anything inside UML without being able to do anything on the host. A user logged in to a UML as root has no special abilities on the host and, in fact, may not have any abilities at all on the host.

Now, let's look at the memory usage information in /proc/meminfo, shown in Figure 2.6.

Figure 2.6. The UML /proc/meminfo

usermode:~# cat /proc/meminfo
MemTotal:       126796 kB
MemFree:        112952 kB
Buffers:           512 kB
Cached:           7388 kB
SwapCached:          0 kB
Active:           6596 kB
Inactive:         3844 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       126796 kB
LowFree:        112952 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:               0 kB
Writeback:           0 kB
Mapped:           5424 kB
Slab:             2660 kB
CommitLimit:     63396 kB
Committed_AS:    23100 kB
PageTables:        248 kB
VmallocTotal:   383984 kB
VmallocUsed:        24 kB
VmallocChunk:   383960 kB

The total amount of memory shown, 126796K, is close to the 128MB we specified on the command line. It's not exactly 128MB because some memory allocated during early boot isn't counted in the total. Going back to the host ps output in Figure 2.5, notice that the linux processes have a virtual size (the VSZ column) of almost exactly 128MB: 131112K versus the 131072K in 128MB. The difference of about 40K is due to a small amount of memory in the UML binary, which isn't counted as part of its physical memory.

Now, let's go back to the host ps output and pick one of the UML processes:

jdike    9938  0.1  3.1 131112 16264 pts/3  R    19:17   0:03 ./linux [ps]


We can look at its open files by looking at the /proc/9938/fd directory, which shows an entry like this:

lrwx------ 1 jdike jdike 64 Jan 28 12:48 3 -> /tmp/vm_file-AwBs1z (deleted)


This is the host file that holds the UML instance's "physical" memory, and it is the same size as that memory (128MB in our case). It is created in /tmp and then deleted. The deletion prevents anything else on the host from opening the file and corrupting it. However, it has the somewhat undesirable side effect that /tmp can become filled with invisible files, which can confuse people who don't know about this aspect of UML's behavior.

To make matters worse, using tmpfs on /tmp is recommended for performance reasons. UML performs noticeably better when its memory file is on tmpfs rather than on a disk-based filesystem such as ext3. However, a tmpfs mount is smaller than the disk-based filesystem /tmp would normally be on, and is thus more likely to run out of space when running multiple UML instances. This can be handled by making the tmpfs mount large enough to hold the maximum physical memories of all the UML instances on the host, or by creating a tmpfs mount for each UML instance that is large enough to hold its physical memory.
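As a sketch of the per-instance approach, an /etc/fstab entry along these lines would dedicate a tmpfs mount to one 128MB instance (the mount point and the size margin are illustrative assumptions, not from the text):

```
# one tmpfs per UML instance, sized somewhat above its physical memory
none  /uml/tmp1  tmpfs  size=192m  0  0
```

The instance's memory file can then be directed at that mount (UML honors the TMPDIR environment variable for the location of its memory file).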

Take a look at the root directory:

UML# ls /
bfs   boot   dev  floppy  initrd  lib         mnt   root  tmp  var
bin   cdrom  etc  home    kernel  lost+found  proc  sbin  usr


This looks strikingly similar to the listing of the loopback mount earlier and somewhat different from the host. Here UML has done the equivalent of a loopback mount of the ~/roots/debian_22 file on the host.

Note that making the loopback mount on the host required root privileges, while I ran UML as my normal, nonroot self and accomplished the same thing. You might think this demonstrates either that the requirement of root privileges on the host is unnecessary, or that UML is some sort of security hole for not requiring root privileges to do the same thing. Actually, neither is true, because the two operations, the loopback mount on the host and UML mounting its root filesystem, aren't quite the same thing. The loopback mount added a mount point to the host's filesystem, while the mount of / within UML doesn't. The UML mount is completely separate from the host's filesystem, so the ability to do this has no security implications.

However, from a different point of view, some security implications do arise. There is no access from the UML filesystem to the host filesystem. The root user inside the UML can do anything on the UML filesystem, and thus, to the host file that contains it, but can't do anything outside it. So, inside UML, even root is jailed and can't break out. [6]

[6] We will talk about this in greater detail in Chapter 10, but UML is secure against a breakout by the superuser only if it is configured properly. Most important, module support and the ability to write to physical memory must be disabled within the UML instance. The UML instance is owned by some user on the host, and the UML kernel has the same privileges as that user. So, the ability for root to modify kernel memory and inject code into it would allow doing anything on the host that the host user can do. Disallowing this ensures that even the superuser inside UML stays jailed.

This is a general property of UML: a UML is a full-blown Linux machine with its own resources. With respect to those resources, the root user within UML can do anything. But it can do nothing at all to anything on the host that's not explicitly provided to the UML. We've just seen this with disk space and files, and it's also true for networking, memory, and every other type of host resource that can be made accessible within UML.

Next, we can see some of UML's hardware support by looking at the mount table:

UML# mount
/dev/ubd0 on / type ext2 (rw)
proc on /proc type proc (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
none on /tmp type tmpfs (rw)


Here we see the ubd device we configured on the command line now mounted as the root filesystem. The other mounts are normal virtual filesystems, procfs and devpts, and a tmpfs mount on /tmp. df will show us how much space is available on the virtual disk:

UML# df
Filesystem         1k-blocks     Used Available Use% Mounted on
/dev/ubd0            1032056   242108    737468  25% /
none                   63396        0     63396   0% /tmp


Compare the total size of /dev/ubd0 (1032056K) to that of the host file:

-rw-rw-r--  1 jdike jdike 1074790400 Jan 27 18:31 /home/jdike/roots/debian_22


They are nearly the same,[7] with the difference probably being the ext2 filesystem overhead. The entire UML filesystem exists in, and is confined to, that host file. This is another way in which users inside the UML are confined or jailed. A UML user has no way to consume more disk space than is in that host file.

[7] The difference between the 1074790400 byte host file and 1032056K (1056825344 bytes) is 1.7%.

However, on the host, it is possible to extend the filesystem file, and the extra space becomes available to UML. In Chapter 6 we will see exactly how this is done, but for now, it's just important to note that this is a good example of how much more flexible virtual hardware is in comparison to physical hardware. Try adding extra space to a physical disk or a physical disk partition. You can repartition the disk in order to extend a partition, but that's a nontrivial, angst-ridden operation that potentially puts all of the data on the disk at risk if you make a mistake. You can also add a new volume to the volume group you wish to increase, but this requires that the volume group be set up beforehand and that you have a spare partition to add to it. In comparison, extending a file using dd is a trivial operation that can be done as a normal user, doesn't put any data at risk except that in the file, and doesn't require any prior setup.

We can poke around /proc some more to compare and contrast this virtual machine with the physical host it's running on. For some similarities, let's look at /proc/filesystems:

UML# more /proc/filesystems
nodev   sysfs
nodev   rootfs
nodev   bdev
nodev   proc
nodev   sockfs
nodev   pipefs
nodev   futexfs
nodev   tmpfs
nodev   eventpollfs
nodev   devpts
        reiserfs
        ext3
        ext2
nodev   ramfs
nodev   mqueue


There's no sign of any UML oddities here at all. The reason is that the filesystems are not hardware dependent. Anything that doesn't depend on hardware will be exactly the same in UML as on the host. This includes things such as virtual devices (e.g., pseudo-terminals, loop devices, and TUN/TAP[8] network interfaces) and network protocols, as well as the filesystems.

[8] The TUN/TAP driver is a virtual network interface that allows packets to be handled by a process, in order to create a tunnel (the origin of "TUN") or a virtual Ethernet device ("TAP").

So, in order to see something different from the host, we have to look at hardware-specific stuff. For example, /proc/interrupts contains information about all interrupt sources on the system. On the host, it contains information about devices such as the timer, keyboard, and disks. In UML, it looks like this:

UML# more /proc/interrupts
           CPU0
  0:   211586   SIGVTALRM  timer
  2:       87   SIGIO      console, console, console
  3:        0   SIGIO      console-write, console-write, console-write
  4:     2061   SIGIO      ubd
  6:        0   SIGIO      ssl
  7:        0   SIGIO      ssl-write
  9:        0   SIGIO      mconsole
 10:        0   SIGIO      winch, winch, winch
 11:       56   SIGIO      write sigio


The timer, keyboard, and disks are here (entries 0, 2 and 6, and 4, respectively), as are a bunch of mysterious-looking entries. The -write entries stem from a weakness in the host Linux SIGIO support. SIGIO is a signal generated when input is available, or output is possible, on a file descriptor. A process wishing to do interrupt-driven I/O would set up SIGIO support on the file descriptors it's using. An interrupt when input is available on a file descriptor is obviously useful. However, an interrupt when output is possible is also sometimes needed.

If a process is writing to a descriptor, such as one belonging to a pipe or a network socket, faster than the process on the other side is reading it, then the kernel will buffer the extra data. However, only a limited amount of buffering is available. When that limit is reached, further writes will fail, returning EAGAIN. It is necessary to know when some of the data has been read by the other side and writes may be attempted again. Here, a SIGIO signal would be very handy. The trouble is that support of SIGIO when output is possible is not universal. Some IPC mechanisms support SIGIO when input is available, but not when output is possible.

In these cases, UML emulates this support with a separate thread that calls poll to wait for output to become possible on these descriptors, interrupting the UML kernel when this happens. The interrupt this generates is represented by one of the -write interrupts.

The other mysterious entry is the winch interrupt. This appears because UML wants to detect when one of its consoles changes size, as when you resize the xterm in which you ran UML. Obviously this is not a concern for the host, but it is for a virtual machine. Because of the interface for registering for SIGWINCH on a host device, a separate thread is created to receive SIGWINCH, and it interrupts UML itself whenever one comes in. Thus, SIGWINCH looks like a separate device from the point of view of /proc/interrupts.

/proc/cpuinfo is interesting:

UML# more /proc/cpuinfo
processor       : 0
vendor_id       : User Mode Linux
model name      : UML
mode            : skas
host            : Linux tp.user-mode-linux.org 2.4.27 #6 Thu Jan 13 17:06:15 EST 2005 i686
bogomips        : 1592.52


Much of the information in the host's /proc/cpuinfo makes no sense in UML. It contains information about the physical CPU, which UML doesn't have. So, I just put in some information about the host, plus some about the UML itself.



User Mode Linux
ISBN: 0131865056
Author: Jeff Dike