5.2. Build States: Virgin, Up-to-date, Changed, Interrupted, Clean

The build state is the state of the source files used by the build tool when it starts a build. However, except for references to "clean builds," the names for the different states that a build can be in don't seem to be standardized, certainly not in any well-known way. So I have invented names for each state: virgin, up-to-date, changed, interrupted, and clean. Each state is explained in more detail in this section.

Build states refer to the state of the source code being used at the start of the build. This assumes that the build process has been correctly defined, with no syntax errors, circular dependencies, or similar mistakes in the build files. This means that whether a build succeeds or fails depends only on the state of the source code, not on the build process.

Understanding different build states is useful when talking about builds. Imagine the next time you want to describe to someone exactly how your product's build is broken. "Virgin builds are broken" is much more precise than just "the build's broken." The latter often receives a response of "well, it works for me," when what the person actually means is "changed builds work for me." Debugging a build problem often depends on the build state, and names for the specific kinds of build states help improve communication between developers.

Virgin: A completely fresh set of source code files, never before used in any build. Changes may have been made to source files, but no build has ever used them. Other names for this state could be sterile or unbuilt, but virgin reminds me of a virgin forest.
Up-to-date: No changes have been made in the source code files (or in any generated files) since the last build. If the build tool is performing correctly, this state is where builds end up, and every intended target is up-to-date.
Changed: Changes have been made in the set of source code files or generated files since the last time the build process was started. Typically, this happens by editing some of the source files, but sometimes there is information such as the time of the build or the build label that is different for every build. (Another name for this state could be dirty, in the sense of being the opposite of clean.)
Interrupted: The last time that the build tool was run it was interrupted, so some files may be incomplete or have unexpected contents. To use database terminology, running a build tool is not an atomic transaction, since no rollback capability is defined for the files that it changes.
Clean: All the files that were generated by a previous build have been deleted from the source tree. Ideally, this state is identical to that of a virgin build, including the state of any modified source files. In practice, some generated files may get missed, or some files may have had their timestamps updated by the build or changed in some other way.
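One way to keep the five definitions straight is to write them down as a small decision procedure. The following Python sketch is only an illustration: the four yes/no observations it takes as input are hypothetical (no build tool I know of records them in this form), and the order in which they are checked is an assumption rather than part of the definitions.

```python
from enum import Enum


class BuildState(Enum):
    VIRGIN = "virgin"
    UP_TO_DATE = "up-to-date"
    CHANGED = "changed"
    INTERRUPTED = "interrupted"
    CLEAN = "clean"


def classify(ever_built, generated_files_present,
             last_build_interrupted, changed_since_last_build):
    """Map four hypothetical observations about a source tree to a state.

    The precedence of the checks is an assumption made for this sketch.
    """
    if not ever_built:
        return BuildState.VIRGIN
    if not generated_files_present:
        # Everything a previous build produced has been removed.
        return BuildState.CLEAN
    if last_build_interrupted:
        return BuildState.INTERRUPTED
    if changed_since_last_build:
        return BuildState.CHANGED
    return BuildState.UP_TO_DATE
```

For example, a tree that has been built before, still has its generated files, was not interrupted, and has had no edits since is classified as up-to-date.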

Figure 5-1 shows each of the different build states and how a set of source files moves from one state to another.

Figure 5-1. Five different build states

If a build process is working properly, then there are some assumptions that you can make to reduce the number of build states from five to three, which should make describing a build a little easier. If your build process has bugs in it, then you should use all five build states to make debugging it easier.

First, let's assume that our build files are carefully written so that the clean build state is the same as a virgin build state, and we can eliminate the virgin state.

Second, we can assume that interrupting a build is similar to modifying the source code, albeit in an unknown manner. This sounds rather scary, but it's a valid assumption: when you stop a build, you don't actually know what state the build tool has left various files in. This is especially true if you have written custom shell scripts and run them as part of your build. Figure 5-2 shows the different build states after making these two simplifications. Figure 5-2 is like Figure 5-1, but with the virgin state merged into the clean state and the interrupted state merged into the changed state.

Figure 5-2. Three simplified build states
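The two simplifications amount to a lookup table from the five state names to three. The Python sketch below spells that table out; the names SIMPLIFIED and simplify are hypothetical and exist only for this illustration.

```python
# Each of the five state names mapped to its simplified equivalent.
SIMPLIFIED = {
    "virgin": "clean",         # first assumption: clean is as good as virgin
    "clean": "clean",
    "interrupted": "changed",  # second assumption: an interruption is an unknown change
    "changed": "changed",
    "up-to-date": "up-to-date",
}


def simplify(state):
    """Return the simplified name for one of the five build states."""
    return SIMPLIFIED[state]
```

Remember that the table is only safe to use when the build process is working properly; when debugging a broken build, the distinctions it throws away are often the ones that matter.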

5.2.1. Good Builds, Bad Builds

A successful or good build is one where the build tool was able to do what you asked it to do, such as building the product. A broken, failed, or bad build is one where the build tool was unable to do what you asked it to do. Builds can break for many reasons. Some of these are: incorrect source code, incorrect build dependencies (which are discussed in Section 5.3.1, later in this chapter), changes in the tools or their options used in a build (especially for cross-compiles), lack of disk space, or network interruptions.

Build states refer to the state of the source code being used at the start of the build, but they don't say anything about whether the code will compile or even if it makes any sense. So whether a build was successful or failed has nothing to do with the build state. That is, a virgin build is still a virgin build, whether its first build turned out to be a good or bad one.

5.2.2. Build States and Different Targets

State diagrams such as Figure 5-1 and Figure 5-2 apply for every target that a build tool knows about. If you ask the build tool to build the whole project, then the source files needed to build the whole project are referred to in each state. If you ask the build tool just to build one small executable target, then the source files for the state diagrams are only the ones that matter for that particular target.

The size of the set of files related to each state depends on what you asked the build tool to do, but the different build states apply to each different target's set of files.
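To make the idea of a per-target set of files concrete, here is a minimal Python sketch that walks a dependency table and collects the source files that matter for one target. The table DEPS and all of the file names in it are invented for this example, and treating "a file with no dependencies of its own" as a source file is a simplifying assumption.

```python
def sources_for_target(target, deps):
    """Collect the files with no dependencies of their own that the
    target depends on, directly or indirectly."""
    sources, seen, stack = set(), set(), [target]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        children = deps.get(node, [])
        if not children:
            sources.add(node)  # nothing is built from anything else here
        stack.extend(children)
    return sources


# Hypothetical project: one product made of two executables.
DEPS = {
    "product": ["app", "tool"],
    "app": ["main.o", "util.o"],
    "tool": ["tool.o", "util.o"],
    "main.o": ["main.c", "util.h"],
    "util.o": ["util.c", "util.h"],
    "tool.o": ["tool.c", "util.h"],
}
```

With this table, the set for "tool" is tool.c, util.c, and util.h, while the set for "product" also includes main.c. An edit to main.c therefore puts "product" in the changed state but leaves "tool" up-to-date.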

5.2.3. Build States in Practice

Let's assume that you have a local machine with a copy of the source code for your product. Perhaps you used an SCM tool to access an internal site in your company. Perhaps the source tree came as part of an open source distribution that you downloaded. Either way, now you have the source code, and you're ready to build it for the first time. This kind of build is a virgin build.

After you've tried to build the product, the set of files that the build tool knows about will have changed. You've probably generated some intermediate files. If you wanted to build the product and succeeded, then you have probably generated some file you could actually use. Some new source code files may have been generated as part of the build. The build state is now changed, because the set of files that is used by the build tool has changed. This is true whether the build succeeded or failed.

Since you have the source code, you decide to make some changes to it. Maybe your manager thinks that's what you're paid to do, or maybe you do it even though you don't get paid for it; it doesn't matter. After you've made your changes, you rerun the build tool to create a new version of the product. The changes you made may only have been small ones, perhaps a few lines in just a couple of files, but this kind of build is still a changed build, just as it was after you first ran the build tool.

Now that particular build has finished, and ideally it just rebuilt the parts of the program that depend on the files you changed. However, you can't remember if you saved one particular change before you reran the build. "Never mind," you think, "I'll just rebuild it again to be safe." If in fact you had already saved all your changes, then you'll get an up-to-date build. This kind of build is one where the generated files are already up-to-date from the previous build. In a perfect world, no one would ever perform an up-to-date build. Why would you rebuild something if there have been no changes and the generated files are already up-to-date? In practice, people perform up-to-date builds to convince themselves that they are not getting a changed build; that is, to make sure that nothing has changed unexpectedly. Since these builds do occur, they should be as short in duration as possible, because all they are doing is rechecking the build dependencies and they shouldn't actually have to execute any commands.
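The reason an up-to-date build can be fast is that the check itself is cheap. The Python sketch below shows one common form of the check, comparing file modification times in the style of make; some build tools compare content hashes instead, and the function name is_up_to_date is hypothetical.

```python
import os


def is_up_to_date(target, sources):
    """Return True if the target exists and no source is newer than it."""
    if not os.path.exists(target):
        return False
    target_mtime = os.path.getmtime(target)
    # No command needs to run unless some source is newer than the target.
    return all(os.path.getmtime(src) <= target_mtime for src in sources)
```

A build tool would skip the commands for any target where this returns True, so an up-to-date build costs little more than a series of timestamp lookups.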

From past experience, you know that before you commit your changes, you want to make sure that they will work in a clean build. A clean build attempts to restore the source code to what it looked like when you first obtained it, except for any changes that you've made to the source files. A clean build does this by removing all the generated files, both generated source code and generated executable files. The reason that there are both clean and virgin build states is that not all clean builds really do restore the build to a virgin state. One easy way this difference can appear is when dependencies change, and the last versions of generated files (which no longer need to be built) still continue to exist. Another way is that some build tools leave droppings, small files scattered all over the source tree to maintain information about the state of the build; these droppings don't exist in virgin builds.
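If you suspect that your clean builds are not really returning the tree to a virgin state, you can test for it by comparing file listings. The Python sketch below assumes that you recorded a listing of the tree while it was still virgin; the function names snapshot and leftovers are hypothetical.

```python
import os


def snapshot(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def leftovers(virgin_files, cleaned_files):
    """List files present after a clean that the virgin tree did not have."""
    return sorted(cleaned_files - virgin_files)
```

Take one snapshot right after obtaining the source code and another after running the clean target, then pass both to leftovers. Source files that you added yourself will also appear in the result, so read the list rather than deleting everything on it. This comparison finds extra files only; it does not notice files whose timestamps or contents were changed by the build.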

This time, you didn't interrupt your build tool in any way, so you never entered the other kind of build state, the interrupted build. This state, which is often quietly assumed to be the same as the changed build state, occurs when the build tool was in the middle of execution but was stopped somehow. Some tools can leave half-written or inconsistent sets of files in this case. As noted earlier, shell scripts that are executed as part of a build are particularly susceptible to creating this build state in their generated files. Programs that generate source code files are also vulnerable. Even compilers can leave incomplete object files if they are interrupted at just the wrong moment. Interrupted builds can produce very odd errors with build tools. Luckily, a virgin build will usually make things work again, but that does take time. The better approach is to make sure that generators and scripts clean up before and after themselves; also, such scripts should never assume that they ran to completion the last time they were run.
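One way for a generator script to avoid leaving a half-written file behind is to write to a temporary file and rename it into place only once the contents are complete. The Python sketch below shows the idea; the function name write_atomically is hypothetical, and the rename is only atomic on POSIX systems when both paths are on the same filesystem, which is why the temporary file is created next to the final one.

```python
import os
import tempfile


def write_atomically(path, text):
    """Write text to path so that path never holds partial contents."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the destination directory so that the
    # final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
        os.replace(tmp_path, path)  # the complete file appears in one step
    except BaseException:
        # Clean up after ourselves, even on Ctrl-C, then report the problem.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process is killed outright, a stray temporary file may remain, but the generated file itself is either the old complete version or the new complete version. A script using this approach could also delete any old temporary files when it starts, rather than assuming that its previous run finished.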