COW Files | User Mode Linux

First, let's fire up our UML instances with basically the same command line as before, with a couple of changes:

linux mem=128M ubda=cow1,/home/jdike/roots/debian_22 \
    umid=debian1

and, in another window:

linux mem=128M ubda=cow2,/home/jdike/roots/debian_22 \
    umid=debian2

The main difference is that we included the ubda switch on both command lines to add what is called a COW file to the UML block device. COW stands for Copy-On-Write, a mechanism that allows multiple UML instances to share a host file as a filesystem, mounting it read-write without seeing each other's changes or otherwise interfering with each other.

This has a number of benefits, including saving disk space and memory and simplifying the management of multiple instances.

COW works by attaching a second file to the UML block device that captures all of the changes made to the filesystem. A good analogy for this is a sheet of clear plastic placed over a painting. You can "change" the artwork by painting on the plastic without changing the underlying painting. When you look at it, you see your changes in the places where you painted on the plastic sheet, and you see the underlying work of art in the places you haven't touched. This is shown in Figure 4.1, where we give Mona Lisa a mustache.^[1] We paint the mustache on a plastic sheet and place it over the Mona Lisa. We have committed artistic blasphemy without breaking any actual laws.

^[1] Which is one of my secret fantasies, and probably one of yours, too.

Figure 4.1. Using COW to give Mona Lisa a mustache without getting arrested

The COW file is the analog of the clear plastic sheet, and the original file that contains the UML filesystem is the analog of the painting.

The COW is placed "over" the filesystem in the same way that the clear sheet is placed over the painting. When you modify a file on a COWed block device, the changed blocks are written to the COW file, not the underlying, "backing" file. This is the equivalent of painting on the sheet rather than on the painting. When you read a modified file, this is like looking at a spot on the painting that you've painted over on the plastic, and the driver reads the data from the COW file rather than the backing file.

Figure 4.2 shows how this works. We start with a COW file with no valid blocks and a fully populated backing file. If a process reads a block from this device, it will get the data that's in the backing file. If it then writes that block back, the new data will be written to the corresponding block in the COW file. At this point, the original block in the backing file is covered and will never be read again. All subsequent reads of that block will get the data from the COW file.

Figure 4.2. COW and backing files

Thus, the backing file is never modified since all changes are stored in the COW file. The backing file can be treated as read-only, but the device as a whole is still read-write.
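The read and write paths just described can be mimicked with a toy model. This is only a sketch, not the ubd driver's code: each "block" is a small file in a scratch directory, and the helper names read_block and write_block are made up for the illustration.

```shell
#!/bin/sh
# Toy model of a COWed device: one small file per "block".
# read_block and write_block are hypothetical helpers, not part of UML.
tmp=$(mktemp -d)
mkdir "$tmp/backing" "$tmp/cow"

# Fully populated backing file, COW file with no valid blocks.
for n in 0 1 2; do echo "backing-$n" > "$tmp/backing/$n"; done

read_block() {
    # A block that exists in the COW file covers the backing block.
    if [ -e "$tmp/cow/$1" ]; then
        cat "$tmp/cow/$1"
    else
        cat "$tmp/backing/$1"
    fi
}

write_block() {
    # Writes always land in the COW file, never in the backing file.
    echo "$2" > "$tmp/cow/$1"
}

before=$(read_block 1)             # served from the backing file
write_block 1 "changed-1"
after=$(read_block 1)              # now served from the COW file
untouched=$(cat "$tmp/backing/1")  # the backing block itself is unchanged

echo "before=$before after=$after backing still holds: $untouched"
rm -rf "$tmp"
```

Once block 1 has been written, every later read of it is answered from the COW side, while the backing data stays exactly as it was.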

On a host with multiple UML instances, this has a number of advantages. First, all the instances can boot from the same backing file, as long as they have private COW files. This saves disk space. Since no instance is likely to change every file on its root filesystem, most of the data it uses will come from the shared backing file, and there will be only one copy of that on the host rather than one copy per instance. This may not seem like a big deal since disks are so big and so cheap these days, but system memory, as large as it is, is finite. Disk space savings will translate directly into host memory savings since, if there's only one block on disk that's shared by all the instances, it can be present in the host's page cache only once. Host memory is often the factor limiting the number of instances that a host can support, so this memory savings translates directly into greater hosting capacity.

Second, because the data that an instance has changed is in a separate file from the backing file, it is a lot easier to make backups. The only data that needs saving is in the COW file, which is generally much smaller than the backing file. In Chapter 6, we will see how to back up an instance's data in a few seconds for a reasonably-sized filesystem, without having to reboot it.

Third, using COW files for multiple instances on a host can improve the instances' performance. The reason is the elimination of data duplication described earlier. If an instance needs data that another instance has already used, such as the contents of bash or libc, it will likely already be in the host's memory, in its page cache. So, access to that data will be much faster than when it is still on disk. The first instance to access a certain block from the backing file will have to wait for it to be read from disk, but later instances won't since the host will likely still have it in memory.

Finally, there is a fairly compelling use for COW files even when you're just running a single UML instance. They make it possible to test changes to a filesystem and back them out if they don't work. For example, you can reconfigure a service, storing the changes in a COW file. If the changes were wrong, you can revert them simply by throwing out the COW file. If they are good, you can commit them by merging them into the backing file. We will look at how to do this later in the chapter.

Along with these advantages, there is one major disadvantage, which stems from the fact that the backing file is read-only. If the backing file is modified after it has COW files, those COW files will become invalid. The reason is that if one of the blocks on the backing file that changed was also changed in a COW file, reading that block would result in the COW data being read, rather than the new data in the backing file. This means that this ubd device would appear to be a combination of old data and new, resulting in data corruption for blocks that contain file data and filesystem corruption for blocks that contain filesystem metadata.
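To make the failure concrete, here is a small sketch using a hypothetical model in which each block is a file in a scratch directory. It is not how the ubd driver stores anything; it only shows how a device ends up as a mix of old and new data when the backing file changes underneath a COW file.

```shell
#!/bin/sh
# Hypothetical model: blocks are files; a block present in cow/ covers backing/.
tmp=$(mktemp -d)
mkdir "$tmp/backing" "$tmp/cow"
echo "old-0" > "$tmp/backing/0"
echo "old-1" > "$tmp/backing/1"

# The instance modified block 1, so the COW file now covers it.
echo "old-1-edited" > "$tmp/cow/1"

# Someone "upgrades" the backing file underneath the COW file.
echo "new-0" > "$tmp/backing/0"
echo "new-1" > "$tmp/backing/1"

# Read the whole device the way a COWed device would be read.
device=""
for n in 0 1; do
    if [ -e "$tmp/cow/$n" ]; then
        device="$device $(cat "$tmp/cow/$n")"
    else
        device="$device $(cat "$tmp/backing/$n")"
    fi
done
device=${device# }   # strip the leading space

# Block 0 comes from the upgraded backing file, block 1 from stale COW data.
echo "device contents: $device"
rm -rf "$tmp"
```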

The most common reason for wanting to modify the backing file is to upgrade the filesystem on it. This is understandable, but for backing files that have COW files based on them, this can't work. The right way to do upgrades in this case is to upgrade the COW files individually.

Going back to our two UML instances, which we booted from the same backing file, we see that they have almost exactly the same boot sequence. One exception is this from the first instance:

Creating "cow1" as COW file for "/home/jdike/roots/debian_22"

and this from the second:

Creating "cow2" as COW file for "/home/jdike/roots/debian_22"

You can specify, as we just did, a nonexistent file for the COW file, and the ubd driver will create the file when it starts up. Now that we have two UMLs booted, on the host, we can look at them:

host% ls -l cow*
-rw-r--r--  1 jdike jdike 1075064832 Apr 24 17:33 cow1
-rw-r--r--  1 jdike jdike 1075064832 Apr 24 17:34 cow2

Looking at those sizes, you may think I was fibbing when I went on about saving disk space. These files seem about the same size as the backing file. In fact, they look a bit larger than the backing file:

host% ls -l /home/jdike/roots/debian_22
-rw-rw-r--  1 jdike jdike 1074790400 Apr 23 21:40 /home/jdike/roots/debian_22

I was not, in fact, fibbing, and therein lies an important fact about UML COW files. They are sparse, which means that even though their size implies that they occupy a certain number of blocks on disk (a disk block is 512 bytes, so the number of blocks occupied by a file is generally its size divided by 512, plus possibly another for the fragment at the end), many of those blocks are not occupied or allocated on disk.

There are two definitions for a file size here, and they conflict when it comes to sparse files. The first is how much data can be read from the file. The second is how much disk space the file occupies. Usually, these sizes are close. They won't be exactly the same because the fragment of the file at the end may occupy a full block. However, for a sparse file, many data blocks will not be allocated on disk. When they are read, the read operation will produce zeros, but those zeros are not stored on disk. Only when a hitherto untouched block is written is it allocated on disk.

So, for our purposes, the "true" file size is its disk allocation, which you can see by adding the s switch to ls:

host% ls -ls cow*
540 -rw-r--r--  1 jdike jdike 1075064832 Apr 24 17:53 cow1
540 -rw-r--r--  1 jdike jdike 1075064832 Apr 24 17:54 cow2

The number in the first column is the number of disk blocks actually allocated to the file. This implies that the two COW files are actually using 270K of disk space, rather than the 1GB implied by the ls -l output. This space is occupied by data that the instances modified as they booted, generally log files and the like, which are touched by daemons and other system utilities as they start up.
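You can watch the two notions of file size diverge on any host by creating a sparse file yourself. The following is a minimal sketch that uses a scratch file rather than a real COW file. The allocation reported depends on the host filesystem, so I make no claim about the exact number you will see.

```shell
#!/bin/sh
# Create a 1 MiB file that contains no written data at all.
f=$(mktemp)
# Seeking past the end with nothing to copy leaves the file as one big hole.
dd if=/dev/null of="$f" bs=1 seek=1048576 2>/dev/null

apparent=$(wc -c < "$f")              # bytes that can be read from the file
allocated_kb=$(du -k "$f" | cut -f1)  # kilobytes actually allocated on disk

echo "apparent size: $apparent bytes, allocated: ${allocated_kb}K"
ls -ls "$f"
rm -f "$f"
```

The apparent size is the full megabyte, while the allocation should be far smaller, since no data blocks were ever written.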

We will talk more fully about COW file management later in this chapter, but here I will point out that the sparseness of COW files requires us to take some care when dealing with them. Primarily, this means being careful when copying them. The most common methods of copying a sparse file result in it becoming nonsparse: all the parts of the file that were previously unallocated on disk become allocated, and that disk space is filled with zeros. So, to avoid this, copying a COW file must be done in a sparseness-aware way. The main file copying utilities have switches for preserving sparseness when copying a file. For example, cp has --sparse=auto and --sparse=always, and tar has -S and --sparse.
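The difference between a naive copy and a sparseness-aware one can be seen with a scratch file. This sketch assumes GNU cp, since --sparse is a GNU switch, and the file names are invented for the example. How much space the dense copy occupies depends on the host filesystem.

```shell
#!/bin/sh
# Assumes GNU cp (for --sparse). All file names here are hypothetical.
tmp=$(mktemp -d)
src="$tmp/sparse-src"
dd if=/dev/null of="$src" bs=1 seek=1048576 2>/dev/null   # 1 MiB hole

cat "$src" > "$tmp/dense-copy"                 # naive copy: writes out every zero
cp --sparse=always "$src" "$tmp/sparse-copy"   # holes are recreated, not written

dense_kb=$(du -k "$tmp/dense-copy" | cut -f1)
sparse_kb=$(du -k "$tmp/sparse-copy" | cut -f1)
same=no
cmp -s "$src" "$tmp/sparse-copy" && same=yes   # contents are identical either way

echo "dense copy: ${dense_kb}K, sparse copy: ${sparse_kb}K, identical: $same"
rm -rf "$tmp"
```

Both copies read back as the same data. Only the disk allocation differs.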

Also, in order to detect that a backing file has been changed, thus invalidating any COW files based on it, the ubd driver compares the current modification time of the backing file to the modification time at the point that the COW file was created (which is stored in the COW file header). If they differ, the backing file has been modified, and a mount of the COW file may result in a corrupt filesystem.

Merely copying the backing file after restoring or moving it for some reason will change the modification time, even though the contents are unchanged. In this case, it is safe to mount a COW file that's based on it, but the ubd driver will refuse to do the mount. For this reason, it is important to also preserve the modification time of backing files, as well as sparseness, when copying them. However, everyone will forget once in a while, and later in this chapter, we will discuss some ways to recover from this.

Booting from COW Files

Now, we should look at what these COW files really mean from the perspective of the UML instances. First, we will make some changes in the two filesystems. In the first instance, let's copy /lib to /tmp :

UML1 # cp -r /lib /tmp

In the second, let's copy /usr/bin to /tmp:

UML2 # cp -r /usr/bin /tmp

In each, let's look at /tmp to see that the changes in one instance are not reflected in the other. First, the one where we copied /lib:

UML1 # ls -l /tmp
total 0
drwxr-xr-x    4 root     root      1680 Apr 25 13:02 lib

And next, the one with the copy of /usr/bin:

UML2 # ls -l /tmp
total 0
drwxr-xr-x    3 root     root      7200 Apr 25 13:07 bin

Here we can see that, even though they are booted off the same root filesystem, any changes they make are private. They can't be seen by other instances that have been booted from the same backing filesystem.

We can check this in another way by seeing how the sizes of the COW files on the host have changed:

host% ls -ls cow*
 936 -rw-r--r--   1 jdike jdike 1075064832 Apr 25 13:22 cow0
1060 -rw-r--r--   1 jdike jdike 1075064832 Apr 25 13:22 cow1

Recall that after they booted, they both had 540 blocks allocated on disk. Now, they both have more than that: 396 and 520 more, respectively. I chose to copy /lib and /usr/bin for this example because /usr/bin is noticeably larger than /lib, and making a copy of it should cause a significantly larger number of blocks to change in the COW file. This is exactly what happened.

So, at this point, we have two instances each booted on a 1GB filesystem, something that would normally take 2GB of disk space. With the use of COW files, this is taking 1GB plus 1MB, since together, the UMLs have made about 1MB worth of changes in this filesystem. There is a commensurate saving of memory on the host because the data that both instances read from the filesystem will be present only once in the host's page cache instead of twice, as would be the case if they were booted from separate filesystems. Each new UML instance booted from the same filesystem similarly requires only enough host disk space to store its modifications, so the more instances you have booted from the same COWed filesystem, the more host disk space and memory you save.
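The arithmetic behind these figures is easy to check. The sketch below takes the block counts and the backing file size from the ls output shown earlier and uses the 512-byte block size assumed in this chapter. The variable names are mine.

```shell
#!/bin/sh
# Figures taken from the ls output earlier in the chapter.
# 512-byte blocks, as assumed in the text.
backing=1074790400        # bytes in the backing file
boot_blocks=540           # blocks in each COW file right after boot
cow_a=936                 # blocks after copying /lib
cow_b=1060                # blocks after copying /usr/bin

grew_a=$((cow_a - boot_blocks))
grew_b=$((cow_b - boot_blocks))
cow_bytes=$(( (cow_a + cow_b) * 512 ))   # total COW data for both instances

without_cow=$((2 * backing))             # two private copies of the filesystem
with_cow=$((backing + cow_bytes))        # one shared copy plus the changes

echo "growth: $grew_a and $grew_b blocks"
echo "COW data: $cow_bytes bytes"
echo "disk used: $with_cow bytes with COW, $without_cow without"
```

The two COW files together hold 1,021,952 bytes, which is the "about 1MB" mentioned above.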

I have one final remark on the subject of sharing filesystem images. Doing it using COW files is the only safe mechanism for sharing. If you booted two instances on the same filesystem, you would end up with a hopelessly corrupted filesystem. This is basically the same thing as booting two physical machines from the same disk, when both have direct access to the disk, as when it is dual-ported to both machines. Each instance will flush out data from memory to the filesystem file in such a way as to keep its own data consistent, but without regard to anything else that might be doing the same thing.

The only way for two machines to access the same data directly is for them to coordinate with each other, as happens with a clustering filesystem. They have to cooperate to maintain the consistency of the data they are sharing. We will see an example of such a UML cluster in Chapter 12.

In fact, you can't boot two UML instances from the same filesystem because UML locks the files it uses according to the access it needs to those files. It gets exclusive locks on filesystems it is going to write and nonexclusive read-only locks on files it will access but not write. So, when using a COW file, the UML instance will get an exclusive, read-write lock on the COW file and a nonexclusive read-only lock on the backing file. If another instance tries to get any lock on that COW file or a read-write lock on the backing file, it will fail. If that's the UML's root filesystem, the result will be an error message followed by a panic:

F_SETLK failed, file already locked by pid 21238
Failed to lock '/home/jdike/roots/debian_22', err = 11
Failed to open '/home/jdike/roots/debian_22', errno = 11
VFS: Cannot open root device "98:0" or unknown-block(98,0)
Please append a correct "root=" boot option
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(98,0)

This prevents people from accidentally booting two instances from the same filesystem and protects them from the filesystem corruption that would certainly follow.

Moving a Backing File

In order to avoid some basic mistakes, the UML block driver performs some sanity checks on the COW file and its backing file before mounting them. The COW file stores some information about the backing file:

The filename
Its size
Its last modification time

Without these, the user would have to specify both the COW file and the backing file on the command line. If the backing file were wrong, without any checks, the result would be a hopelessly corrupted filesystem. The COW file is a block-level image of changes to the backing file. As such, it is tightly tied to a particular backing file and makes no sense with any other backing file.

If the backing file were modified, that would invalidate any already-existing COW files. This is the reason for the check of the modification time of the backing file.

However, this check gets in the way of moving the backing file since the file, in its new location, would normally have its modification time updated. So, it is important to preserve the timestamp on a backing file when moving it. A number of utilities have the ability to do this, including:

cp with the -a or -p switch
tar with the -p switch
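The effect of the -p switch on cp is easy to see with a scratch file. This is a minimal sketch assuming GNU coreutils (for touch --date=@N and stat -c). The file names are hypothetical stand-ins for a real backing file.

```shell
#!/bin/sh
# Assumes GNU coreutils (touch --date=@N, stat -c). File names are hypothetical.
tmp=$(mktemp -d)
echo "pretend filesystem image" > "$tmp/backing"
touch --date=@1130814229 "$tmp/backing"      # give it a known, old mtime

cp    "$tmp/backing" "$tmp/plain-copy"       # mtime becomes the time of the copy
cp -p "$tmp/backing" "$tmp/preserved-copy"   # mtime carried over from the source

orig=$(stat -c %Y "$tmp/backing")            # %Y prints mtime as seconds since 1970
plain=$(stat -c %Y "$tmp/plain-copy")
kept=$(stat -c %Y "$tmp/preserved-copy")

echo "original=$orig plain=$plain preserved=$kept"
rm -rf "$tmp"
```

Only the copy made with -p keeps the timestamp that a COW file header would have recorded.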

After you have carefully moved the backing file, you still need to get the COW file header to contain the new location. You do this by booting an instance on the COW file, specifying both filenames in the device description:

ubda=cow-file,new-backing-file

The UML block driver will notice the mismatch between the command line and the COW file header, make sure the size and timestamp of the new location are what it expects, and update the backing file location. When this happens, you will see a message such as this:

Backing file mismatch - "debian30" requested,
"/home/jdike/linux/debian30" specified in COW header of "cow2"
Switching backing file to 'debian30'
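The decision the driver makes here can be sketched as follows. This is my own simplified model of the checks described in this section, not the ubd driver's code. The check_backing helper and the paths are invented for the illustration, and GNU stat and touch are assumed.

```shell
#!/bin/sh
# Sketch of the checks described in the text. check_backing is a hypothetical
# helper, not the ubd driver's code. Assumes GNU stat and touch.
check_backing() {
    # $1 = name in COW header, $2 = size in header, $3 = mtime in header,
    # $4 = backing file requested on the command line
    size=$(stat -c %s "$4")
    mtime=$(stat -c %Y "$4")
    if [ "$size" -ne "$2" ] || [ "$mtime" -ne "$3" ]; then
        echo refuse            # size or timestamp is not what the header expects
    elif [ "$1" != "$4" ]; then
        echo switch            # same file at a new location: update the header
    else
        echo ok
    fi
}

tmp=$(mktemp -d)
printf 'image' > "$tmp/debian30"             # 5 bytes
touch --date=@1130814229 "$tmp/debian30"

moved=$(check_backing /old/place/debian30 5 1130814229 "$tmp/debian30")
touch "$tmp/debian30"                        # timestamp lost, as after a careless copy
stale=$(check_backing /old/place/debian30 5 1130814229 "$tmp/debian30")

echo "moved with timestamp: $moved, timestamp lost: $stale"
rm -rf "$tmp"
```

A file that was moved with its size and timestamp intact is accepted at its new location, while one whose timestamp changed is rejected.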

However, at some point, you will forget to preserve the timestamp, and the COW file will appear to be useless. If it's a UML root device, the boot will fail like this:

mtime mismatch (1130814229 vs 1130970724) of COW header vs backing file
Failed to open 'cow2', errno = 22
VFS: Cannot open root device "98:0" or unknown-block(98,0)
Please append a correct "root=" boot option

All is not lost. You need to restore the timestamp on the new backing file by hand, taking the proper timestamp from the error message above:

host% date --date="1970-01-01 UTC 1130814229 seconds"
Mon Oct 31 22:03:49 EST 2005
host% touch --date="Mon Oct 31 22:03:49 EST 2005" debian30

The date command converts the timestamp, which is the number of seconds since January 1, 1970, into a form that touch will accept. In turn, the touch command applies that timestamp as the modification time of the backing file.

To minimize the amount of typing, you can abbreviate this operation as follows:

touch --date="`date --date='1970-01-01 UTC 1130814229 seconds'`" \
    debian30
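GNU date and touch also accept the seconds-since-1970 value directly when it is prefixed with @, which avoids the intermediate conversion. The following is a minimal sketch, assuming GNU coreutils and using a scratch file in place of the real backing file.

```shell
#!/bin/sh
# Assumes GNU coreutils. The scratch file stands in for the backing file.
f=$(mktemp)
stamp=1130814229                       # COW header value from the mtime mismatch message

touch --date="@$stamp" "$f"            # set the modification time directly
now_on_file=$(stat -c %Y "$f")         # read it back as seconds since 1970
readable=$(date -u --date="@$stamp" +%Y-%m-%dT%H:%M:%SZ)

echo "mtime on file: $now_on_file ($readable)"
rm -f "$f"
```

Reading the timestamp back with stat is a cheap way to confirm that the file now carries the value the COW header expects before you try to boot on it.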

You may wonder why this isn't automated like the filename operation. When both the backing filename and timestamp don't match the information in the COW header, the only thing left is the file size. And there aren't enough common file sizes to have any sort of reasonable guarantee that you're associating the COW file with the correct backing file. I require that you update the timestamp by hand so you look at the file in question and can catch a mistake before it happens.

Merging a COW File with Its Backing File

Sometimes you want to merge the modified data in a COW file back into the backing file. For example, you may have created a COW file in order to test a modification of the filesystem, such as the installation or modification of a service. If the results are bad, you can back out to the original filesystem merely by throwing out the COW file. If the results are good, you want to keep them by merging them back into the backing file, in essence committing them.

The tool used to do this is called uml_moo.^[2] Using it is simple. You just need to decide whether you want to do an in-place merge or create a new file, leaving the original COW and backing files unchanged. The second option is recommended if you're feeling paranoid, although making a copy of the backing file before doing an in-place merge is just as safe. Most often, people choose based on the amount of disk space available on the host: if it's low, they do an in-place merge.

^[2] I can only offer my deep and humble apologies for the name. A bovine theme pervades the COW file support in UML.

Create a new file by doing this:

host% uml_moo COW-file new-backing-file

Do an in-place merge like this:

host% uml_moo -d COW-file

You can use the -b switch to specify the true location of the backing file in the event that the name stored in the COW file header is incorrect. This happens most often when the COW file was created inside a chroot jail. In this case, the backing file specified in the COW file will be relative to the jail and thus wrong outside the jail. For example, if you had a COW file created by a UML instance that was jailed to /jail and contains /rootfs as the backing file, you would do an in-place merge like this:

host% uml_moo -b /jail/rootfs -d /jail/cow-file
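Conceptually, a merge lays the valid COW blocks over a copy of the backing data. The toy model below illustrates the new-file variant under that assumption. It is not how uml_moo is implemented: blocks are represented as small files in a scratch directory, purely for illustration.

```shell
#!/bin/sh
# Toy model only: blocks are files, and this is not how uml_moo is implemented.
tmp=$(mktemp -d)
mkdir "$tmp/backing" "$tmp/cow" "$tmp/merged"
for n in 0 1 2; do echo "backing-$n" > "$tmp/backing/$n"; done
echo "changed-1" > "$tmp/cow/1"              # the only block the instance modified

# Merge into a new file: start from the backing data,
# then lay the COW blocks over it.
cp "$tmp/backing"/* "$tmp/merged/"
cp "$tmp/cow"/* "$tmp/merged/"

merged=$(cat "$tmp/merged/0" "$tmp/merged/1" "$tmp/merged/2" | paste -s -d ' ' -)
original=$(cat "$tmp/backing/1")             # the backing data is left as it was

echo "merged: $merged"
echo "backing block 1: $original"
rm -rf "$tmp"
```

In this model, an in-place merge would correspond to copying the COW blocks straight over the backing blocks, which saves space but leaves no untouched original to fall back on.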