5.1. How Software Gets Built

This section is a brief overview of how source code is turned into an executable, a program that can actually run on a computer. This process is known as "building software." A summary of the different stages of a build is shown in the next section, Section 5.1.1. This summary is also used later on when considering how different build tools work in practice.

Source code (or "the program") is what developers (or "programmers") write. Source code can be written in high-level programming languages (such as C or Java), scripting or dynamic languages (such as Perl or Python), or low-level languages (such as assembly code). Source code can also be binary files such as images or precompiled libraries. Loosely speaking, anything needed by your product that cannot be generated from another place within your project is part of the source for your project. A build is the process in which a build tool uses other tools to convert the source code into a working product that can be used by other people.

How a product is used by other people varies for different customers and different machines. Some languages (such as Perl) are interpreted, which means that the source code is used directly by software that's already on the machine. This existing software is called an interpreter. Other languages (such as C and Java) are compiled, which means that another tool called a compiler converts the source code to the appropriate binary file format for the CPU to execute on each particular machine.

Writing source code is relatively straightforward until the amount of source code begins to grow. To help you keep track of what's going on in your program, you really want to divide up the source code. "Put the GUI code in these files, put the disk access code over here in this file, and then put all the database interface code into all these other files," and so on. Each of these parts depends on some of the other parts, but probably not on all of them, and so dividing up the code in this way makes the product easier to imagine. Products depend upon these other parts being present, either when the product is compiled (at compile time) or when the program is run (at runtime).

To reduce the number of dependencies between different parts of the program, all kinds of simple and complex mechanisms have been invented over the years. Some of the commonly used ones are header files, data encapsulation, and interfaces. Section 5.3, later in this chapter, goes into more detail. What all these approaches have in common is an attempt to make it very clear to different parts of the program how to use the other parts. These ideas do indeed help reduce the number of interfaces between different parts of a program, but at the cost of having to update them as the program grows.

The problem of building programs starts to get harder when different parts of the program have to be built in a certain order. For instance, with C programs you commonly build .o or .obj object files and then combine them into library files, before linking the generated files together to create an executable. With Java, you have to build .class files before you can create a .jar file. Before running your Java program, you'll also have to make sure that any other required .class or .jar files are present on your machine.
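The ordering requirement for the C case can be sketched as a small graph problem. The following is a minimal illustration, assuming Python 3.9 or later for the standard graphlib module; the file names (main.c, util.c, libutil.a, myprog) are hypothetical and are not taken from any real project.

```python
from graphlib import TopologicalSorter

# Hypothetical C project: each target maps to the files it is built from.
depends_on = {
    "main.o": {"main.c"},
    "util.o": {"util.c"},
    "libutil.a": {"util.o"},
    "myprog": {"main.o", "libutil.a"},
}


def build_order(depends_on):
    """Return every file in an order where prerequisites come first."""
    return list(TopologicalSorter(depends_on).static_order())


order = build_order(depends_on)
print(order)
```

Several orders are valid for this graph, so the exact list printed may vary; what matters is that each object file appears before the library or executable that is made from it.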

Whatever the particular requirements are for building the different parts of a program in a certain order, you might think that making sure that all those different steps are performed in the right order shouldn't be too hard. After all, it's just like a recipe for a meal with a large number of ingredients and lots of complicated steps to follow in order. Build tools are designed to perform the specified steps, using a defined build process (the recipe), which is usually described in build files. Once the recipe has been defined, then it just needs to be followed by the build tool.

If a program is not being changed, then even the simplest build tools should be able to follow a well-defined build process. However, change is inevitable in any program that is being developed or maintained. Changing just one source file means that the changed file has to be rebuilt. After that, other files that depend on the changed file have to be rebuilt. (The structure of which files depend on which is known as the dependency tree.) Shifting up to the next conceptual gear, the list of files that depend on the changed file also changes over time. That is, the parts of a program that need to be rebuilt for a particular change are not constant. Once the build tool has worked out which parts of the program need rebuilding, it has to execute the appropriate commands to build just those parts. These commands may also have their own required order (e.g., compile before linking). Table 5-1 describes some of the ways that source code can change and what the build tools have to do to deal with the changes.
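To make the idea of "other files that depend on the changed file" concrete, here is a minimal sketch that walks a dependency mapping to find everything affected by one change. The function name files_to_rebuild and the file names are hypothetical, and real build tools generally store this information in more elaborate ways.

```python
from collections import defaultdict, deque


def files_to_rebuild(depends_on, changed):
    """Return every file that directly or indirectly depends on `changed`."""
    # Invert the mapping: for each file, which files are built from it?
    dependents = defaultdict(set)
    for target, prerequisites in depends_on.items():
        for prerequisite in prerequisites:
            dependents[prerequisite].add(target)

    stale = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for target in dependents.get(current, ()):
            if target not in stale:
                stale.add(target)
                queue.append(target)
    return stale


# Hypothetical project: target -> the files it is built from.
depends_on = {
    "foo.o": {"foo.c", "foo.h"},
    "bar.o": {"bar.c"},
    "app": {"foo.o", "bar.o"},
}
before = files_to_rebuild(depends_on, "foo.h")

# Later, bar.c starts including foo.h, so the same edit now affects more files.
depends_on["bar.o"].add("foo.h")
after = files_to_rebuild(depends_on, "foo.h")
```

The same edit to foo.h produces a different rebuild set before and after the dependency is added, which is the sense in which the set of parts to rebuild is not constant.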

Table 5-1. When builds change
Type of change	What the build tool should do
New files were added.	[The developer needs to update the build files and check them for correctness.]
The contents of a file changed.	Rebuild the file; detect whether the compile failed or succeeded.
A file depends on some other file whose contents changed in some way.	Rebuild all the affected files.
A file now depends on a file that already exists.	Rebuild the dependency tree. Can the file be located properly?
A file now depends on a new file.	Rebuild the dependency tree. Does the new file already exist, or will it be created as part of the build?
A file no longer depends on another file.	Rebuild the dependency tree. Does the existence of the old file cause a problem? Should it be deleted?
A file now depends on a generated file.	Make sure that the dependency tree causes the file to be generated before it is needed.
A file now depends on a generated file whose own source has changed.	Regenerate all the necessary generated files.

The hardest things to get right with build processes have to do with what happens when dependencies change, not when source code changes; it's like having a recipe change every time you try to use it.

Some concrete examples of the different types of changes shown in Table 5-1 are as follows ("foo.c -> foo.o" means that foo.o depends on foo.c):

foo.c -> foo.o: The contents of the file foo.c were changed, so the file foo.o needs to be rebuilt.
foo.h -> foo.c -> foo.o: The contents of the file foo.h were changed, and foo.c depends on foo.h, so foo.o needs to be rebuilt.
{foo.h bar.h} -> foo.c -> foo.o: foo.c now depends on both foo.h and bar.h, so foo.o needs to be rebuilt.
bar.h -> foo.c -> foo.o: foo.c now depends only on bar.h, so foo.o needs to be rebuilt. All information linking foo.c and foo.h should be forgotten.
foo.y -> foo.c -> foo.o: foo.c is now a generated file, derived from foo.y, so foo.c (and consequently foo.o) needs to be rebuilt whenever foo.y is changed or if the tools that generate foo.c change.
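The examples above say when a file "needs to be rebuilt" but not how a build tool might decide that. One common approach, used by make among other tools, is to compare file modification times; this section does not prescribe that method, so treat the following as a sketch under that assumption. The function name is_out_of_date is hypothetical, and the timestamps are set artificially so that the example is repeatable.

```python
import os
import tempfile


def is_out_of_date(target, prerequisites):
    """True if target is missing or older than any of its prerequisites."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_time for p in prerequisites)


with tempfile.TemporaryDirectory() as tmp:
    grammar = os.path.join(tmp, "foo.y")
    generated = os.path.join(tmp, "foo.c")
    for path in (grammar, generated):
        with open(path, "w") as handle:
            handle.write("")

    os.utime(generated, (100, 100))  # foo.c was generated at time 100
    os.utime(grammar, (200, 200))    # foo.y was edited later, at time 200
    stale_after_edit = is_out_of_date(generated, [grammar])

    os.utime(generated, (300, 300))  # foo.c is regenerated at time 300
    stale_after_regen = is_out_of_date(generated, [grammar])

    # foo.o has never been built, so it always counts as out of date.
    missing_object = is_out_of_date(os.path.join(tmp, "foo.o"), [generated])
```

A timestamp comparison does not notice a change in the tools that generate foo.c, which is one reason why some build tools track more than file times.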

5.1.1. The Different Stages of a Build

Builds are made up of a number of different stages, just as compilers can have preprocessing, compiling, and linking stages. Each stage is usually performed by the same build tool, though not always. For example, configuration and the calculation of dependencies may use separate tools, sometimes even using the compiler itself. In practice, some of these build stages are small or nonexistent with some build tools, but the order of the stages is usually the same for all build tools. The sequence of the different stages for a typical build tool is:

Define the targets: What would you like to build? Everything? Just one file? A subset of files? The answer to this defines the targets of the build. By default, some build tools build as much as possible, while others just build the targets that are defined for the files in the current directory. The desired targets can usually be specified on the command line or as defaults in build files.
Read the build files: The names of the executables and the source files for each executable are defined explicitly somewhere, often in build files. The contents of a generic build file are shown in the next section, Section 5.1.2. The build tool reads the build files so that it knows what it's trying to do. It also reports syntax errors in the build files.
Configuration: The build tool discovers which platform and tools are to be used. The results of the build may be intended for a different platform than the one on which the build is executed (i.e., a cross-compilation), and this platform's details may be specified at the command line, along with the build targets. Some build tools assume that particular platforms have certain tools. Other build tools perform small experiments to discover precisely what works and how.
Calculate the dependencies: The build tool scans the build files and source files to work out which parts of the program depend on which other parts. Many dependencies are not specified in the build files, because there are too many of them and updating them by hand would be both error-prone and tedious. Instead, they are discovered by the build tool in this stage. This stage also reports any errors such as circular dependencies, where the chain of dependencies has a loop in it.
Determine what to build: Using the dependencies, the build tool works out which files need to be updated or generated. It reports errors such as nonexistent source files or files that couldn't be generated.
Construct the build commands: The build tool assembles the appropriate commands to update the out-of-date parts of the program. These commands are different for each platform and developer; even a tool such as gcc may be used with very different arguments on different platforms.
Execute the build commands: The build tool runs the commands to update the files that need updating and reports any errors returned by the commands. If there are errors, you can often choose whether to stop the whole build or to keep going.

Some of the stages may be repeated during a build. For instance, if source files are generated by executing some build commands, their dependencies will also need to be calculated. Likewise, the actual build commands that are executed may be constructed piece by piece.
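Two of these stages can be sketched briefly: reporting a circular dependency, and constructing commands whose arguments differ between developers. This is only an illustration, assuming Python 3.9 or later for graphlib; the names find_cycle, construct_commands, and PROFILE_FLAGS are hypothetical, although the gcc options shown (-g, -O0, -O2, -c, -o) are real.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical per-developer flag sets; the gcc options themselves are real.
PROFILE_FLAGS = {"debug": ["-g", "-O0"], "release": ["-O2"]}


def find_cycle(depends_on):
    """Return the files that form a dependency loop, or None if there is none."""
    sorter = TopologicalSorter(depends_on)
    try:
        sorter.prepare()
    except CycleError as error:
        # The second element of args is the list of nodes in the cycle.
        return error.args[1]
    return None


def construct_commands(depends_on, profile):
    """Assemble one gcc compile command per .o target, prerequisites first."""
    commands = []
    for target in TopologicalSorter(depends_on).static_order():
        if target.endswith(".o"):
            sources = sorted(
                p for p in depends_on.get(target, ()) if p.endswith(".c")
            )
            flags = PROFILE_FLAGS[profile]
            commands.append(["gcc", *flags, "-c", *sources, "-o", target])
    return commands


good = {"foo.o": {"foo.c", "foo.h"}, "app": {"foo.o"}}
bad = {"a.h": {"b.h"}, "b.h": {"a.h"}}

cycle = find_cycle(bad)
debug_commands = construct_commands(good, "debug")
release_commands = construct_commands(good, "release")
```

The commands are kept as argument lists rather than strings so that a later "execute" stage could pass them to a function such as subprocess.run without any shell quoting.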

5.1.2. A Typical Build File

Example 5-1 shows a generic build file in which some dependencies are specified explicitly; others will be discovered by the build tool. For example, the build file does not specify what other files are required by fileA.

Example 5-1. A build file

# The executable myproduct is made up of fileA and fileB and uses
# libraryX as well
executable("myproduct", "fileA, fileB", "libraryX")

# The library named libraryX is made up of file1 and file2
library("libraryX", "file1, file2")

# This list of tests comprises two files named test_alice and test_bob
# which are defined in some other build file
files("tests", "test_alice, test_bob")

# Installation targets which specify the files that are built for each
# different kind of user
install("testers", "myproduct, tests")
install("customer", "myproduct")
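The statements in Example 5-1 happen to look like function calls, so one way to picture the "read the build files" stage is to define those functions and let them record what they are told. The sketch below does this in Python. Every definition in it (targets, installs, explicit_dependencies, and the bodies of the four functions) is a hypothetical illustration and not the behavior of any particular build tool.

```python
targets = {}   # target name -> description of what it is made up of
installs = {}  # kind of user -> names of the targets installed for them


def _names(comma_separated):
    """Split a string such as "fileA, fileB" into a list of names."""
    return [name.strip() for name in comma_separated.split(",")]


def executable(name, sources, libraries=""):
    targets[name] = {
        "kind": "executable",
        "parts": _names(sources),
        "libraries": _names(libraries) if libraries else [],
    }


def library(name, sources):
    targets[name] = {"kind": "library", "parts": _names(sources), "libraries": []}


def files(name, members):
    targets[name] = {"kind": "files", "parts": _names(members), "libraries": []}


def install(user, names):
    installs[user] = _names(names)


# The statements from Example 5-1, unchanged.
executable("myproduct", "fileA, fileB", "libraryX")
library("libraryX", "file1, file2")
files("tests", "test_alice, test_bob")
install("testers", "myproduct, tests")
install("customer", "myproduct")


def explicit_dependencies(name):
    """Everything a target is explicitly stated to be made from."""
    entry = targets[name]
    return entry["parts"] + entry["libraries"]
```

Only the explicitly stated dependencies are captured here. Whatever fileA itself requires is absent from these tables and would have to be discovered in the "calculate the dependencies" stage.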
