Section 5.5. Build Tools


5.5. Build Tools

The six different build tools examined in the second half of this chapter are: shell scripts and batch files, make, GNU Autotools, Ant, Jam, and SCons. These are all no-cost, open source build tools. This is not to say that there are no useful build tools from commercial vendors; there are a dozen or so listed at http://www.softwareengineering.info/build/build_by_product.html. They're just not used as commonly as the build tools that are listed below.

Section 5.1.1, earlier in this chapter, described the different stages of a build. It is worth remembering each stage as you consider different build tools: configuration, target definition, reading build files, dependency calculation, deciding what to build, command construction, and execution of commands. Some of these stages in some build tools are very small or even nonexistent.

What should you look for in a build tool? In order of most important to least, my recommendations are:

  1. It should have accurate dependency checking.

  2. Startup and dependency checking should be fast. The build tool should use tools such as compilers intelligently.

  3. Builds should be independent from the local user environment in which the build tool was started. This makes it easier to reproduce builds on different machines.

  4. Variant builds for debugging or with extra optimization should be easy to specify, preferably with a single argument to the build tool on the command line.

  5. It should include support for many platforms and languages, particularly if the product is open source. The ability to build using the same source tree on multiple platforms at the same time is helpful.

  6. It should be easy to write and read the build files, and the build tool should already be understood by many of the project members.

  7. It should be scalable, with support for parallel builds.

  8. It should have support for debugging builds and build file problems. Graphical display of the dependencies and changes in dependencies is a bonus. Clear output from a build tool about what command-line arguments were used and which user started the build is helpful. Minimizing the number of complete commands that are displayed can also make build logs easier to read, though the complete commands and their output should also be logged somewhere for each build.

Good support from a tool vendor or the user community is also extremely helpful. And don't forget that almost all tools have won an award at some time or other!

5.5.1. Shell Scripts and Batch Files

The simplest possible build tool is a piece of paper with a list of the compilation commands to type in order. A shell script or Windows batch file is really just that same list of commands, with tests for success or failure added occasionally.

This approach has some advantages. It's quick to develop, since many of the commands can be cut and pasted straight from the command line after you work out what they should be. There is little or no confusion about why a certain command was executed. For simple projects, with a few dozen small files and with straightforward build dependencies, shell scripts or batch files can be an adequate build tool.

Unfortunately, there are many disadvantages to using scripts and batch files as build tools. Some of these are:


Rebuilding every file

The biggest issue is that simple build scripts rebuild every file, every time. You might not think that the difference between a 5-second build and a 25-second build is very much, but if you spend a day working on the project and recompile it 10 times per hour for 10 hours, then over half an hour (which is more than 5% of the working day) is wasted just waiting for the project to rebuild.


Failure detection

Another big disadvantage of scripts is having to make explicit tests to see whether a command failed before continuing with the next command. If these tests are not made, you may not notice that a command halfway through the build failed, making the whole build suspect.
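
For example, a small Unix build script might guard each step with an explicit test, or simply ask the shell to stop at the first failure. The following is only a sketch, reusing the file names from the make example later in this chapter:

    #!/bin/sh
    # Explicit test after each command...
    gcc -c file1.c || { echo "compiling file1.c failed" >&2; exit 1; }
    gcc -c file2.c || { echo "compiling file2.c failed" >&2; exit 1; }

    # ...or let the shell abort the script at the first failing command.
    set -e
    gcc -o myproduct file1.o file2.o
    echo "Finished building"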


Debugging

Debugging shell scripts is done mainly by printing out text messages and the values of interesting variables at strategic points in the script.[2] Some shells let you set a flag to display all the commands as they are executed in a script, which is really just a more verbose way of displaying interesting variables. It is also possible to run scripts in a "dry run" mode, where no commands are executed; instead, they're just displayed. This is helpful for scripts that would take a long time to execute, but if the behavior of the script depends on the contents of a file that is supposed to be generated by some other part of the script, then this approach doesn't work, unless you add echo statements to certain lines.

[2] There is an interesting project at http://bashdb.sourceforge.net to extend the bash shell to support debugging.


Portability

Making scripts and shell files portable, so that they can be executed on a variety of platforms, is tedious and prone to error. The names and arguments of even the familiar commands to copy a file or to find a named file vary from platform to platform. A good build tool can shield the developer from many of these details.

In conclusion, shell scripts and batch files are adequate build tools only for the smallest and simplest projects. If you ever expect your project to grow, then take the time to use a real build tool.

5.5.2. make

make is the original build tool and is probably still the most common one. make is generally most popular with developers of C and C++ applications, while those using other languages, particularly Java, more commonly use Ant (see Section 5.5.4, later in this chapter).

The History of make

make was created by Stuart Feldman in 1977. The original paper about make is "Make: A Program for Maintaining Computer Programs" (Bell Labs Technical Report 57, April 1977), which can be found at http://citeseer.ist.psu.edu/feldman79make.html. Though most tools win awards at some time or other, Stuart Feldman and make won the prestigious ACM Software System Award in 2003, in recognition of the historical significance of make.

While much has been written about the advantages and shortcomings of make, nothing has been complained about as much as the trivial, but indeed irritating, requirement that some makefile lines must start with a tab character. Here's what make's creator has to say on the matter:

Why the tab in column 1? Yacc was new, Lex was brand new. I hadn't tried either, so I figured this would be a good excuse to learn. After getting myself snarled up with my first stab at Lex, I just did something simple with the pattern newline-tab. It worked, it stayed. And then a few weeks later I had a user population of about a dozen, most of them friends, and I didn't want to screw up my embedded base. The rest, sadly, is history. (Stuart Feldman, quoted in Eric S. Raymond, The Art of Unix Programming, Addison-Wesley, 2003)


Build dependencies are specified explicitly in build files, which are conventionally named Makefile or makefile and are written in make's own programming language. The format of these makefiles is shown in Example 5-2. make maintains static definitions of dependencies in the makefiles and does not detect dependencies between files. Files are noted as changed only when their modification timestamps change or when they are absent.

Example 5-2. A simple makefile
myproduct : file1.c file2.c
	gcc -o myproduct file1.c file2.c
	echo "Finished building"

In the Unix project shown in Example 5-2, the target myproduct depends on file1.c and file2.c. To create this target, make will execute the lines after the dependency line with the : on it (that is, the lines that begin with gcc and echo).

Note that both of the lines below the target myproduct start with a tab character, not four spaces. Let's say that file1.c is a C source file and that it contains the line: #include "file1.h"; that is, file1.c depends on file1.h. However, modifying or even deleting file1.h after myproduct has been built will never cause make to rebuild myproduct, because the dependency on file1.h is unknown. You could manually fix this problem by adding file1.h to the dependency line, changing the first line of the makefile to:

myproduct : file1.c file1.h file2.c

This lack of automatic dependency checking explains why most large projects that use make alone as their build tool have trouble rebuilding only those files that are affected by a change.

There are numerous implementations of make, created for almost every platform over the last 25 years. The original make first became widely used with System V Release 2 Unix in 1984 but had been available since 1977. Compatible versions of make since then include a distributed make called dmake (1990), gmake (1991), BSD NET2 make (1991), and SunOS make (1989). nmake, the Microsoft version of make, is probably the most divergent make; it is one of the underlying build tools that are used when Visual Studio projects are built, whether from within the GUI or by using the msdev or devenv command-line tools.[3] gmake (http://www.gnu.org/software/make) is the GNU version of make; it was written by Richard Stallman and Roland McGrath. gmake is singled out from all the other versions of make since it has been ported to so many platforms and supports a wide set of the features found in the other versions of make. gmake has been maintained and updated by Paul Smith since Version 3.76 in 1997.

[3] Visual Studio 2005 has a tool named MSBuild that uses XML to represent projects and dependencies, in a similar way to that used by nant, which is Ant for .NET (see Section 5.5.4, later in this chapter).

Make-alike (or should it be makeoid?) programs are build tools that are based on the concepts introduced by make. Some of these are cake, cook (which is used with the CMS tool Aegis), Lucent's nmake (no relation to Microsoft's nmake), Plan 9's mk, and mms for VMS. makepp (http://makepp.sourceforge.net) is a more recent replacement for gmake, designed to address many of the problems with make that are listed later in this section, with improvements including automatic dependency checking and using file signatures instead of timestamps. bmake (http://www.crufty.net/help/sjg/bmake.html) is derived from NetBSD's make and has some useful extensions in how variables and conditional targets are treated. All these tools have been used in projects, but none have achieved wide usage.

Documentation varies widely among the different versions of make. gmake is reasonably well documented (http://www.gnu.org/software/make/manual), but, as with many build tools, developers tend to write makefiles for large projects by copying fragments from existing makefiles. This is especially true if the project has a complicated or hierarchical directory structure. Beyond the gmake manual, there are two books for more information about make. The first is Managing Projects with make, by Robert Mecklenburg. The third edition covers much more than the previous two editions. The second book is more gmake-specific: Programming with GNU Software, by Mike Loukides and Andy Oram. Both books are published by O'Reilly.

Of the two books mentioned, only Managing Projects with make discusses how to create makefiles that scale well as a project grows. The classic article about this issue is "Recursive Make Considered Harmful," by Peter Miller (http://www.canb.auug.org.au/~millerp/rmch/recu-make-cons-harm.html). Peter Miller is also the author of the CMS tool Aegis. His key observation is that if you have makefiles in many directories and subdirectories, then make can spend a lot of time and effort processing each makefile, reevaluating some of the same dependencies over and over, and still end up with a fragmented view of what the whole project needs. The better alternative is to use make's include directive to gather all the makefiles into a single, logical makefile and then process just that one makefile. This approach is called included or nonrecursive make.
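
A minimal sketch of the idea, with made-up file names and assuming gmake, is a single top-level makefile that includes a small fragment from each directory:

    # Top-level Makefile: one make process sees the whole dependency graph.
    SRCS :=
    include src/module.mk       # each fragment appends its files: SRCS += src/file1.c ...
    include test/module.mk

    myproduct: $(SRCS:.c=.o)
    	gcc -o $@ $^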

There is a good discussion of how the OpenRADIUS project implemented included makefiles at http://www.xs4all.nl/~evbergen/nonrecursive-make.html. If you are interested in practical comparisons of the two approaches, see Appendix A for some tests and their results, and some similar results by Boris Kolpackov at http://kolpackov.net/projects/build/benchmark.xhtml.

The greatest strength of make is that it's everywhere, and so it's already familiar to many developers. Unfortunately, the problems with make are numerous:


Incomplete dependency analysis

The common recursive use of make, with makefiles calling other makefiles, can lead to incomplete dependency graphs or even circular dependencies. The traditional workarounds for this are to change the order in which the subdirectories are visited; to repeat the execution of make multiple times; or to always run make clean first. This last choice is uncomfortably like just using a script for your build tool (as discussed earlier in this chapter, in Section 5.5.1).

Separate tools to help create and maintain dependencies for make do exist (mkdepend is one such tool, and the -M argument for the gcc preprocessor is another), but these are other tools that have to be configured in addition to make.
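
For example, a common gmake idiom (simplified here) has the compiler write a small .d makefile fragment of header dependencies for each source file and then includes those fragments:

    SRCS := file1.c file2.c

    # Ask gcc to emit "file1.o: file1.c file1.h" style dependency rules.
    %.d: %.c
    	gcc -MM $(CPPFLAGS) $< > $@

    # gmake rebuilds any out-of-date .d files before reading them.
    include $(SRCS:.c=.d)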


Portability

Not only do the versions of make commonly installed on different platforms differ significantly from each other, but the way that tools such as compilers are invoked varies widely from platform to platform. This means that either makefiles have to be written specifically for each platform, or a project-wide mechanism to set variables correctly on different platforms has to be created for each project.

As an aside, the largest project that I know of with custom makefiles for every platform is libpng (http://www.libpng.org), with 41 platform-specific makefiles for about 30,000 lines of C. That's a lot of makefiles to modify when you add a new source file.


Speed

Builds of large projects using make still take hours, if not days, despite the tremendous advances in CPU and disk speeds in recent years.


Debugging

Working out why make did or didn't choose to compile a file can be difficult. The -n argument to make can be used to perform a dry run, which doesn't actually execute any commands but instead just prints them out. Print statements such as echo can also help sometimes. gmake has better debugging output than most versions of make and supports a -d argument for choosing the desired type of debugging output, but the output is still overly verbose and hard to use.


Clock skew

The use of file-modification timestamps to determine when a file has been updated is imprecise and prone to error, especially across distributed filesystems such as NFS. Restoring a file from a backup copy, even on the same machine, can change the timestamp and cause make to rebuild source files unnecessarily or, even worse, not to rebuild source files when it should.


Makefile syntax

The make language does not have the conveniences of a carefully designed programming language. For example, make has the strange requirement of tabs rather than spaces at the beginning of certain makefile lines. A number of versions of make also behave strangely with lines longer than 80 characters in makefiles.

Paul's Rules of Makefiles

These rules are taken verbatim from http://make.paulandlesley.org/rules.html, the personal web site of Paul Smith, the maintainer of GNU gmake; square brackets indicate my comments. The rules deserve wider exposure, even though some of the things that they refer to are not explained in this book. The same web site also has an informative article titled "How Not to Use VPATH" (http://make.paulandlesley.org/vpath.html).


1. Use GNU make

Don't hassle with writing portable makefiles, use a portable make instead! [There is some bias here perhaps, but gmake is an extensive, full-featured, and widely ported version of make.]


2. Every non-.PHONY rule must update a file with the exact name of its target.

Make sure every command script touches the file "$@" -- not "../$@", or "$(notdir $@)", but exactly $@. That way you and GNU make always agree.


3. Life is simplest if the targets are built in the current working directory.

Use VPATH to locate the sources from the objects directory, not to locate the objects from the sources directory.


4. Follow the Principle of Least Repetition.

Try to never write a filename more than once. Do this through a combination of make variables, pattern rules, automatic variables, and gmake functions.


5. Every non-continued line that starts with a TAB is part of a command script, and vice versa.

If a non-continued line does not begin with a TAB character, it is never part of a command script: it is always interpreted as makefile syntax. If a non-continued line does begin with a TAB character, it is always part of a command script: it is never interpreted as makefile syntax.

Continued lines are always of the same type as their predecessor, regardless of what characters they start with.

Copyright © 1997,2002 Paul D. Smith. Verbatim copying and distribution is permitted in any medium, provided this notice is preserved.


Once the problems with maintaining a large number of makefiles for multiple platforms became apparent, Prof. David Wheeler's observation that "all problems in computer science can be solved by another level of indirection" came into play, and numerous makefile generators were created.[4] The idea is to take a template file that lists both the files that you want from your build tool and also your source files, and then run the makefile generator with some platform-specific arguments to generate the appropriate makefiles. Some of the better-known ones are imake and Automake. Automake is by far the most common one nowadays in open source projects and is covered further in Section 5.5.3.2, later in this chapter. imake is primarily used for building the X11 Window System and is described further in the book Software Portability with imake, by Paul DuBois (O'Reilly).

[4] Wheeler was chief programmer for the EDSAC project in the early 1950s and is one of the inventors of the subroutine. He is also coinventor of the compression algorithm used by the popular Unix bzip2 tool.

Some products that attempt to solve the problems with using make for parallel or distributed builds include Electric Make (http://www.electric-cloud.com) and distcc (see Section 5.4.1, earlier in this chapter). Electric Make is a commercial replacement for make (original make, GNU gmake, and Microsoft's nmake are all supported). For a starting price of $50,000, it monitors all the compilations and other build-related commands on multiple machines, and when it detects an incorrect dependency it reschedules the erroneous compilation for later on. Even the logfiles are rewritten to show the correct build ordering.

To summarize, if your project is unlikely to grow too large, is intended for only a small number of platforms, and is written in C or C++, then make is still an appropriate build tool. If not, there are better alternatives; these are discussed in the rest of this chapter.

5.5.3. GNU Autotools

The GNU Autotools suite, also referred to simply as the Autotools suite, or sometimes as the GNU Build System, is probably the build suite most commonly used by open source C and C++ projects. The familiar incantation of ./configure; make; make install seems to build and install about 90% of the tarballs that you can download.

The History of GNU Autotools

GNU Autotools were developed in the early 1990s for the GNU Project. Among the early contributors were David MacKenzie, Gary Vaughan, Ben Elliston, Tom Tromey, and Ian Lance Taylor. Judging by the rate of recent releases, it would seem safe to say that the tools are mature, but in fact new functionality is still being added. The licensing scheme for GNU Autotools is, not surprisingly, the GPL.


The GNU Autotools suite actually consists of three separate tools (Autoconf, Automake, and Libtool), all of which use some common configuration files. Autoconf (http://www.gnu.org/software/autoconf) creates shell scripts named configure. These scripts can be executed to find out which features required by an application are present on a particular platform. Automake (http://www.gnu.org/software/automake) uses Makefile.am files to create Makefile.in template files, which Autoconf can then use to create GNU gmake makefiles. Libtool (http://www.gnu.org/software/libtool) helps create static and shared libraries in a portable and versioned manner for programs that are written in C. The most commonly used of these three tools seems to be Autoconf, judging by the number of configure files out there. Even if a project doesn't use Autoconf, the configure file is often the place where installation begins.

Any filename that ends in .ac is probably an Autoconf-related file. Any filename that ends in .am is probably an Automake-related file. Any filename that ends in .in is probably an input file for one of the three GNU Autotools.


5.5.3.1. Autoconf

The concept of Autoconf is that all you should have to do to install a package is run the generated configure script to discover the precise details of how various features, such as compilation and linking, are actually working on your platform. This information is then passed to the build tool and also into the program by header files and preprocessor #define definitions. This concept alone may explain the popularity of the GNU Autotools suite; other build tools tend to make assumptions about what is provided on each version of each platform, but the GNU Autotools suite confirms this information when the application is actually installed. Other factors that explain the suite's popularity are that only the basic Bourne shell, the necessary C or C++ compilers, and make are usually needed for the configure script to work. Also, the default make dist command creates a convenient tarball of all the files that you need to package a release.

The configure script has a number of different arguments that you can use to enable and disable various parts of the program that you are building. The default of using no arguments Should Just Work, and if configure fails to run due to a missing dependency for the project, then the error message is usually clear enough. However, if running configure fails in any other way, then debugging the problem can be hard. Section 5.5.3.5, later in this chapter, describes some helpful approaches for debugging installations from the perspective of a user.

From the perspective of a developer working with Autoconf, you create a file configure.ac to be the input for Autoconf. configure.ac (which used to be named configure.in) is written in a mixture of GNU m4 (a macro processing language) and Unix shell commands. However, you need to ship only the generated configure file, not its precursors.

5.5.3.2. Automake

Automake produces makefiles that will work with GNU make, and they should also work with many of the other variants of make. Automake uses the file Makefile.am to describe the desired executables, libraries, and other files for the project. The language used for Makefile.am is specific to Automake, but Automake also reads the same configure.ac configuration file that is used by Autoconf. Using the Makefile.am file, Automake produces Makefile.in files that Autoconf can use to produce makefiles.

The make targets in the makefiles that are created by Automake and then Autoconf include well-known targets such as all, install, and clean. Other useful targets are uninstall, which simplistically undoes whatever install did, and check, which executes any tests defined by the package developer. The target dist creates a distribution archive file, and distcheck creates a distribution, then untars the archive into a new directory, builds the package there, installs it, and finally runs make check.
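
As a rough illustration of these targets in use, a developer preparing a release might run something like the following (the --prefix value is only an example):

    ./configure --prefix=$HOME/local   # configure for a private installation directory
    make                               # build the package
    make check                         # run the package's own tests
    make distcheck                     # build a tarball, then unpack, build, test, and install it in a scratch area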

Since Autoconf and Automake are separate tools, one can imagine creating build files other than makefiles. For instance, the AutoJam prototype (http://developer.berlios.de/projects/autojam) creates the equivalent of Jamfile.in files for Autoconf to use to create the build files named Jamfile used by the build tool Jam (see Section 5.5.5, later in this chapter). SCons (see Section 5.5.6, later in this chapter) already has its own Autoconf-like functionality that does at least some of what Autoconf does, so a separate AutoSCons tool is less likely to be developed.


5.5.3.3. Libtool

Libtool is designed to hide the implementation details of creating libraries, especially shared libraries, on different platforms. It's fully integrated with Autoconf and Automake but can be used even without them. The contents of the desired libraries are defined in Makefile.am, just as executables were for Automake. Libtool also includes support for versioning of libraries and for tracking which versions are expected for a particular package.

When it comes to finding out which libraries have already been installed and using them in other packages, pkg-config (http://pkgconfig.sourceforge.net) is a separate tool that integrates well with the GNU Autotools suite and is useful for discovering the particular arguments that are necessary to use an installed library on your platform.
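
For example, assuming a library such as libpng that installs a .pc file for pkg-config (and a made-up source file pngdump.c), building against the library reduces to something like:

    # pkg-config supplies the include and linker flags appropriate for this platform.
    gcc -o pngdump pngdump.c $(pkg-config --cflags --libs libpng)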

5.5.3.4. An Autotools "Hello World" program

None of the half dozen or so tutorials that I found while researching this section worked exactly as written. So I've provided the precise steps that I used to create my own "Hello World" program with GNU Autotools. These instructions are for Autoconf 2.53 and Automake v1.6.3. Libtool was not used here.

  1. Create a new directory with two simple C source files named hello.c and hello.h. hello.c should contain at least the standard main function and can also contain lines such as:

    #ifdef HAVE_STDLIB_H
      .
      .
      (any code that depends on stdlib.h goes here)
      .
      .
    #endif

    At the top of hello.c, add the line:

    #include "hello.h"

    The other C file, hello.h, should contain:

    #ifdef HAVE_CONFIG_H
    #  include <config.h>
    #endif
    #ifdef HAVE_STDLIB_H
    #  include <stdlib.h>
    #endif

    Eventually, HAVE_STDLIB_H will be defined or not defined in config.h, depending on whether configure finds the header file stdlib.h on your platform.

  2. In the same directory as hello.c, create a file named Makefile.am containing the two lines:

    bin_PROGRAMS = hello
    hello_SOURCES = hello.c

  3. In the same directory, run the autoscan command to create the file configure.scan. This command emitted a warning about an uninitialized value for me, but succeeded. Since autoscan does not overwrite configure.ac, you can run it periodically to detect potential portability problems in your project.

  4. Rename the file configure.scan to configure.ac and edit that file as follows (a sketch of the complete edited file appears after this list):

    1. Change AC_INIT(FULL-PACKAGE-NAME, VERSION, BUG-REPORT-ADDRESS) to AC_INIT(hello, 0.1, yourname@example.org).

    2. Add a line containing only AM_INIT_AUTOMAKE below the top AC_INIT line.

    3. Change AC_CONFIG_HEADER([config.h]) to AM_CONFIG_HEADER([config.h]).

    4. Change AC_CONFIG_FILES([ ]) to AC_CONFIG_FILES([Makefile]).

  5. Now create four empty placeholder files with the command touch NEWS README AUTHORS ChangeLog. This step can be avoided if the --foreign argument is used when automake is run later on.

  6. Run aclocal to produce the file aclocal.m4, which contains m4 macro definitions such as AM_INIT_AUTOMAKE.

  7. Run autoconf to produce the configure file and a cache directory.

  8. Run autoheader to produce config.h.in, which contains #define directives that will be set up when configure is run.

  9. Run automake --add-missing to produce Makefile.in and some other necessary scripts. Note that the last four commands (starting at aclocal, in step 6) should be run as a group if any subsequent changes are made.

  10. This is the step that a customer downloading the source for this project starts at. Run configure to produce the file Makefile. The line in the generated file Makefile that starts with DEFS shows what was found by running configure, as does the generated file config.h.

  11. Run make to produce the "Hello World" executable named hello, the result of compiling hello.c in a portable way.
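
Pulling step 4 together, the edited configure.ac for this example ends up looking roughly like this sketch (the exact macros that autoscan generates vary between Autoconf versions):

    AC_PREREQ(2.53)
    AC_INIT(hello, 0.1, yourname@example.org)
    AM_INIT_AUTOMAKE
    AM_CONFIG_HEADER([config.h])

    # Checks for programs and headers; autoscan proposes most of these.
    AC_PROG_CC
    AC_CHECK_HEADERS([stdlib.h])

    AC_CONFIG_FILES([Makefile])
    AC_OUTPUT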

5.5.3.5. Debugging GNU Autotools installs

The advice in this section refers to what you can do when ./configure; make; make install doesn't do what it should, but you have faith that compiling the package on your machine should be possible. Problems of what to do when Autoconf, Automake, and Libtool don't work when you are developing the configure file itself are a different issue. For those kinds of problems, the manuals and the mailing lists at http://lists.gnu.org/archive/html/autoconf, http://lists.gnu.org/archive/html/automake, and http://lists.gnu.org/archive/html/libtool are good places to start. Also, since the actual Autotools programs are themselves written in shell script and Perl, familiarity with using a Perl debugger may help.

Some problems that you may see during installation and build include:


Could not find...

If configure fails with a message saying that it could not find some dependency, your first action should be to check whether the required files are installed and, for dependencies involving libraries, that the installed versions are adequate. There may also be arguments to configure to let you tell it where to find particular packages. configure --help will show the available options.


Wrong Autotools version

If configure fails with an error about needing a different version of one of the Autotools, you should be able to download and install it safely, since the Autotools are written with the intent that multiple versions will coexist on one machine. This usually happens only if you are trying to modify a package, in which case it's not unreasonable to require that you use the same versions of the Autotools as the package's developers.


configure succeeded, make failed

Let's assume that you can see the text of the command that failed. First cut and paste the offending command to a shell script file and execute this file to confirm that the command fails using your own environment, as well as the environment being used by the Autotools.

If the command succeeds, then change the environment, preferably in configure.ac. If the command fails, do what is necessary by hand to the command line or source code to make it succeed. If success can be defined as "ignore this failure," then many versions of make support a -i argument to simply ignore errors and keep on going.

If you had to change the source to get the command to run successfully, then it is possible that the source wasn't written portably. Email the maintainers with the diffs and platform details. If changes in -D defines or other command-line arguments were necessary, apply the minimal set of changes to the files config.h or Makefile, understanding that these are generated files and thus may be overwritten.

Finally, make the effort to send the package maintainers the details of the problem: your package and Autotools versions, your platform, the broken command line and the working command line, and, ideally, the changes that fixed the problem.


I just want to add one file!

The proper way to do this is to modify Makefile.am and then to rerun the Autotools. A temporary hack is to modify the generated Makefile, adding the new source file to the _SOURCES variables. Of course, you run the risk of the modified Makefile being overwritten if the Autotools are rerun at any time.
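
For example, to add a made-up file named extra.c to the hello program from the earlier walkthrough, the proper change is one line in Makefile.am, followed by rerunning the Autotools as in steps 6 through 10:

    hello_SOURCES = hello.c extra.c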


make install failed

If the prefix directory is owned by root, you may need to run make install with root privileges, or rerun configure and specify a different prefix directory for which you do have write permission.

5.5.3.6. More Autotools

Documentation for the individual GNU Autotools is good enough, but the current sources of information about how to use them all together are barely adequate. There are the standard GNU manuals for each tool at http://www.gnu.org/software/autoconf/manual, http://www.gnu.org/software/automake/manual, and http://www.gnu.org/software/libtool/manual.html. There is also one book that aims to bind all three tools together, GNU Autoconf, Automake, and Libtool, by Gary Vaughan, Ben Elliston, Tom Tromey, and Ian Lance Taylor (New Riders). It's also known as the "Goat book" because of its cover picture. Even though it's rather dated now, it is still strongly recommended for anyone trying to do more than cut and paste from other projects. The contents of the Goat book are also available online at http://sources.redhat.com/autobook. Another useful (though dated) article, which includes a well-written example, is "The GNU Configure and Build System," by Ian Lance Taylor (http://www.airs.com/ian/configure).

The GNU Autotools were originally developed for Unix systems and only later modified to work for Windows machines. The recommended way to use them in Windows is to install the Cygwin environment and the necessary Perl and Microsoft tools. The project at http://gnuwin32.sourceforge.net has more information on how to do this. Mac OS X is supported by GNU Autotools. Languages that are supported by the default installations of the GNU Autotools include C, C++, Fortran, Perl, Lex, Yacc, TeX, Emacs Lisp, and, to a lesser degree, Java and Python.

The Autotools are a well-established way of making sure that software is portable to a very large number of platforms. Since developers can write their own m4 macros for configure.ac files, the Autotools are extensible. Indeed, there is a public archive of useful m4 macros at http://autoconf-archive.cryp.to, and the GNULIB project (http://savannah.gnu.org/cgi-bin/viewcvs/gnulib/gnulib/m4) has more examples. The Autotools have good support for localization of text strings in products, using the GNU gettext utilities (http://www.gnu.org/software/gettext).

Still, opinion is divided about GNU Autotools. On one hand, thousands of people use the results of Autoconf and run configure quite happily every day. On the other hand, when an install does fail, they have little hope of understanding or fixing the problem without substantial effort. Most of the problems with GNU Autotools seem to be voiced by developers who are trying to write the makefiles. These issues generally fall into the following categories:


Layers upon layers

It can be confusing when different files are used as input to tools that generate files, which are then used as input to yet more tools. Debugging the results of such a complicated scheme can be hard and time-consuming. If you want to add a source file to a downloaded package, you (understandably) have to have the GNU Autotools installed in order to regenerate the makefiles; otherwise, you have to temporarily modify complex makefiles.


Large and complex generated files

The generated files can become quite large, starting at around 3,000 lines and often growing to more than 30,000 lines of shell script. configure scripts are written using only the simplest shell constructs in order to ensure portability, but this ends up creating convoluted scripts not really intended for reading by humans. Also, even using cached results, running configure can take a fair amount of time. Supporting all the standard targets and options for make also adds to the size of the hierarchical makefiles.


Mixed and arcane languages

The macro processing language m4 is not a particularly common language (the only other applications that are using it seem to be Sendmail and fvwm), so it's a barrier for developers wanting to use GNU Autotools. The total number of languages used by all the Autotools is quite large: GNU m4, Perl, shell languages, make, Autoconf macros, and the Automake language.

Names of build files

The first build tool was make. Make uses files named makefile, in various combinations of upper- and lowercase. These files have come to be known as "makefiles." By generalization, build tools use "build files" for their configuration, but of course the actual names of the build files are different for each tool. Many build tools have one special top-level build file and then many other build files. The filenames that are used by convention for the tools discussed in the following sections are:

make: makefile or Makefile

GNU Autotools: configure.ac and Makefile.am, from which configure, Makefile.in, and finally the makefiles themselves are generated

Ant: build.xml

Jam: Jamfile in each directory, with a top-level Jamrules

SCons: SConscript in each directory, with a top-level SConstruct


5.5.4. Ant

Ant (http://ant.apache.org) is an open source build tool, part of the Apache community of open source software projects. Ant is licensed under the Apache License.

Originally designed as an extensible replacement for make but "without make's wrinkles," Ant quickly found favor as the build tool for projects written in Java. Ant comes with a large number of ready-to-use tasks. Each task allows users to define a small part of a build process in their build files. The ease with which Ant can have other tasks added to it has resulted in a build tool with a diverse set of abilities. You could go so far as to say that if you want your Java tool to be used nowadays, it has to have an Ant task to run it. All IDEs, whether open or closed, that are intended for developing Java projects now have built-in support for using Ant.

The History of Ant

Ant (possibly an acronym of "another neat tool") was developed by James Duncan Davidson (quite possibly during a long airplane flight) as part of the Apache Tomcat project. Tomcat 3.1, released in April 2000, contained an early version of Ant. Version 1.1 of Ant was released in July 2000, and roughly one major release per year has occurred since then. Ant was promoted to its own Apache project in November 2002 and has a core team of around 20 people.


The build files for Ant are written in XML. The core of Ant itself is written in Java, as are the Ant tasks that are used in the build files. The build files have <project> XML elements, which contain <target> elements, which are the names of targets that can be passed to Ant on the command line. target elements can specify that they depend on other target elements. Each target contains one or more <task> elements, which are the elements that control what Ant actually executes during the build.

Ant build files are conventionally named build.xml. For the Java project shown in Figure 5-5, a build.xml that uses the jar, javac, and the delete Ant tasks would look like that shown in Example 5-3.

Figure 5-5. Directory tree of an example Java project


The default target is dist, and it depends on the run_tests target, which in turn depends on the compile and compile_tests targets to compile the product and the tests, using the javac task. Where there is just an echo task in this example build file, you would add tasks to actually run the tests. Finally, the first target, dist, uses the jar task to create a distributable JAR archive of the product.

Example 5-3. An Ant build.xml file
<?xml version="1.0" encoding="ISO-8859-1"?>
<project name="My Project" basedir="." default="dist">

  <target name="dist"
          description="Jar up the project ready for distribution"
          depends="run_tests">
    <jar destfile="dist/project.jar" basedir="build/classes"/>
  </target>

  <target name="run_tests" depends="compile, compile_tests"
          description="Test the project class files">
    <echo>Call an Ant task to test the project</echo>
  </target>

  <target name="compile"
          description="Compile the source files">
    <javac srcdir="src" destdir="build/classes"/>
  </target>

  <target name="compile_tests"
          description="Compile the test source files">
    <javac srcdir="test" destdir="build/test_classes"/>
  </target>

  <target name="clean"
          description="Remove all generated files">
    <delete dir="build/classes/org"/>
    <delete dir="build/test_classes/org"/>
    <delete dir="dist/*"/>
  </target>

</project>

The output from executing ant with no arguments (so that the default dist target is used) shows the target names and the tasks that they call. Note that while this minimal build took 15 seconds from start to finish on a rather underpowered laptop, it only took 3 seconds on a more modern desktop machine:

[theo@theo-laptop example]$ ant
Buildfile: build.xml

compile:
    [javac] Compiling 2 source files to /home/theo/example/build/classes

compile_tests:
    [javac] Compiling 1 source file to /home/theo/example/build/test_classes

run_tests:
     [echo] Call an Ant task to test the project here.

dist:
      [jar] Building jar: /home/theo/example/dist/project.jar

BUILD SUCCESSFUL
Total time: 15 seconds

Since Ant is written in Java, it runs unchanged on all platforms that support Java. The common ones are Solaris, GNU/Linux, and Windows, using the downloads from Sun (http://java.sun.com). Dozens of other platforms have had other Java Virtual Machines (JVMs) written for them; there is an extensive list available at http://www.geocities.com/marcoschmidt.geo/jvm.html. All official Ant tasks are written to operate correctly regardless of the underlying platform; that this mostly works as designed is due to plenty of testing. Troublesome areas include the Cygwin environment and older versions of Windows. If you write a task yourself, be careful about portability when using native commands, especially ones like ps, the Unix command for listing processes, which seems to have different arguments wherever it is found.

The documentation for Ant is generally good and certainly extensive, with standardized descriptions of what each Ant task does.[5] The Ant documentation includes FAQs, reference manuals, API manuals, Wikis, presentations, and half a dozen books at last count. The books include O'Reilly's Ant: The Definitive Guide, now in a second edition, by Steve Holzner, and Java Development with Ant (Manning Publications), by Erik Hatcher and Steve Loughran, who are two members of the Ant project. There is also a different O'Reilly Ant book available only in German: Ant: kurz und gut, by Stefan Edlich. Be warned, though: the first editions of the O'Reilly books cover only up to Ant 1.4.

[5] This may be partly due to Javadoc (see Section 8.8), the standard tool that makes it easy to create API reference documents for Java programs; it is installed with Java by default. Perhaps Java programmers have come to expect more from their documentation?

Ant has many strengths. It runs on all the platforms for which you can find a JVM, and that's every major platform I've used in the past 10 years. Ant uses XML to describe the build dependencies for a project, so you don't have to be able to write Java to use Ant. Using XML also means that the information can be easily transformed; for instance, into a graphical representation of the dependencies, as is done by Vizant (http://vizant.sourceforge.net) and AntGraph (at the time of this writing, the download site is no longer valid, but AntGraph's author can be contacted via http://www.ericburke.com).

Another major strength of Ant is the large number of Ant tasks that already exist. Over 80 core tasks are shipped with Ant, and they provide support for Java compilation, filesystem commands, archiving, mail, and CVS commands. The 60 optional tasks that are also part of the Ant project provide access to other SCM tools, unit testing, and much more. There are hundreds of other public Ant tasks freely available to do everything a project could require, including source code and document generation, web page maintenance, and even automated blogging.

Creating your own Ant task is not hard if you can write basic Java, since the documentation is clear and there are a plethora of available examples. You can also use the exec task to specify the precise commands to be executed as part of the build, though this tends to create platform-specific build files and makes it harder to determine whether a task succeeded or failed.
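
As a rough sketch of what is involved (the GreetTask class and its name attribute are invented for illustration, not part of Ant itself), a custom task is a Java class with a setter method for each attribute and an execute method:

    import org.apache.tools.ant.BuildException;
    import org.apache.tools.ant.Task;

    public class GreetTask extends Task {
        private String name;

        // Ant calls this setter for the name="..." attribute in the build file.
        public void setName(String name) {
            this.name = name;
        }

        // Ant calls execute() when the task's target is run.
        public void execute() throws BuildException {
            if (name == null) {
                throw new BuildException("The name attribute is required");
            }
            log("Hello, " + name);
        }
    }

After compiling the class and declaring it in a build file with a taskdef element that points at the compiled class, the new element can be used inside any target, for example as <greet name="world"/>.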

Writing an Ant Task for JDiff

Everything I had read said that writing your own Ant task was easy, but I wanted to confirm it for myself. There had been a few requests for an Ant task for JDiff (http://www.jdiff.org), an open source tool I wrote that compares two different versions of a set of Java files. JDiff produces a Javadoc-like report of all the parameters whose types have been changed, the methods and classes that have been added and removed, and so on. JDiff is implemented as a Javadoc doclet (an application that uses Javadoc to parse Java source files) and had already graduated from build scripts to Ant a few years before. I had been testing JDiff using Ant's ready-made Javadoc task, but there were a lot of knobs to twiddle to use JDiff with that task. What was wanted was a simple Ant task to make it easier to generate a no-frills comparison of two Java projects.

I started with Ant's tutorial "Writing Your Own Task" (http://ant.apache.org/manual/develop.html), and after less than 30 minutes of carefully following along, I had my first working custom Ant task. Adding support for attributes and nested elements in the XML for the task took another hour or so to get working, with the help of the Java Development with Ant book. Then I decided to create my own JDiff task by extending the Javadoc task class, rather than the ordinary Task class. Documentation for this was mainly the Javadoc task API pages, which told me enough to generate my first call to Javadoc. The trouble was that my task wanted to make three calls to Javadoc, not just one, and I couldn't see any easy way to reset the parent Javadoc task object to a clean state so that I could execute it again and again.

It was time to get away from that Javadoc task class. Instead of the JDiff task class inheriting from the Javadoc task class, I changed the JDiff task class to use three different instances of the Javadoc class. That worked nicely, after I eventually discovered that you have to call the method setProject yourself on such separate Task objects. The documentation is skimpy for this particular approach to custom Ant tasks, so if you are considering writing your own Ant task, extending an existing class for a task still seems to be the easiest way to proceed. The quality of the Javadoc documentation for your chosen parent task will make a big difference in how easy the process is. Of course, if you really want to be sure of how that parent task works, the source code is where the answers are. Ant 1.7 should have some improvements in how tasks can be reused.

Overall, writing my own Ant task for JDiff was not too hard. It took about a day to complete, mainly because of trying the two different approaches. The errors and warnings generated by Ant when there were errors in the build file were clear enough. The verbose and debug arguments for Ant helped me to see what my task was really executing. If I had needed more details, I could have run Ant inside a Java debugger or added the time-honored println or log statements, which both appear nicely interleaved with the output from Ant.


Some of Ant's weaknesses are:


XML limitations

One of Ant's weaknesses is that large projects have large build files. XML was not designed to be particularly concise; thus a good XML editor such as XMLSpy or even Emacs becomes vital for larger projects. Keeping track of all the different parts of large build files can be complex, though each target XML element can have a textual description conveniently associated with it.

XML is fine as a format for build files until you try to use Ant as a scripting language or want to add conditional (if-then-else) constructs to your build files (though there now is a condition task). One specific example of the issues of using XML is that if you want to execute a shell command that uses a < input redirection operator, you have to write &lt; in the XML. Quoting of arguments is another messy area. The use of XML can be frustrating when a particular task doesn't do quite what you want it to do. You can either tweak the Java source for the task and then recompile it or try writing XSLT scripts to transform your XML files into the kind of XML that Ant build files need.


Complex dependency chains

Another nonobvious feature of Ant is that if target 3 depends on targets 1 and 2, and target 2 also depends on target 1 (i.e., a triangle), and if you invoke Ant as ant target2 target3, then target 1 will be executed twice: once for target 2 and once for target 3. The way to avoid this is to always invoke target 3 from the command line, which will do the right thing.
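
A sketch of the triangle just described, using generic target names:

    <target name="target1">
      <echo>expensive common setup</echo>
    </target>

    <target name="target2" depends="target1"/>

    <target name="target3" depends="target1, target2"/>

Running ant target3 executes target1 only once, but ant target2 target3 processes each command-line target separately, so target1's work is done twice.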

This can sometimes create challenges when defining the target dependencies as projects grow. One way to avoid this is to eschew the depends attribute altogether and define the required dependencies in another target, using the antcall task to invoke each target in turn. The ant task can be useful when creating a hierarchy of recursive build files, though this approach can result in slower builds, just as with recursive make. The import task, new in Ant 1.6, promises to make large, modular builds somewhat easier.


Limited properties

Ant build files use properties to store values, but these are not as powerful as variables in a regular programming language. For instance, once a property has been set, it cannot be changed later on in the build file. Also, you cannot use one property to hold the name of another property; one level of redirection is all you get. Many XML editors don't know how to expand Ant properties.
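
For instance, in this sketch the second property task has no effect, because a property keeps its first value:

    <property name="build.dir" value="build"/>
    <!-- This second assignment is silently ignored. -->
    <property name="build.dir" value="other"/>
    <!-- Prints "build.dir is build" -->
    <echo message="build.dir is ${build.dir}"/>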


Parallel builds and dry runs

Parallel builds are not as straightforward with Ant as with other build tools (and they're not particularly easy in some of those tools either). Dry runs (seeing what Ant would do if it were run but not actually executing any commands) are not supported by Ant, though individual tasks may support them.


Slow startup

Since most JVMs seem to take some time to start up, subjectively Ant is not a particularly fast build tool. However, most Java developers find it to be much faster than make. This is primarily because Ant uses a single JVM for multiple tasks, but it may also be because Ant can reduce the number of times that a Java compiler has to be invoked; for example, by compiling more files at a time with each compilation command. The dependency checking of the default Java compilation task is not particularly robust, so other build dependency tools have emerged. There is Ant's own optional depend task. I've had good results using Misha Dmitriev's Javamake (http://www.experimentalstuff.com/Technologies/JavaMake/index.html). You can also use faster Java compilers, such as the open source Jikes compiler, originally from IBM (http://jikes.sourceforge.net). For small to medium-sized Java projects, many experts take the approach that a clean build using a fast compiler beats complex, and sometimes error-prone, dependency analysis.


Platform-dependent issues

Platform-dependent issues such as the use of forward or backward slashes in filenames can be minimized through careful use of Ant tasks such as PathConvert to generate the platform-specific version of a filename. For more information on this and other issues about using Ant in real development, see Steve Loughran's article "Ant in Anger" at http://ant.apache.org/ant_in_anger.html.

Ant does not support internationalization directly, nor is any of the Ant logging output localized, but given the good support in Java for both, this should be easier to do than with most build tools.

Numerous Ant-related projects have been developed; some add to the list of tasks for Ant, some provide alternate ways (such as GUIs) to create build files for Ant, and some reimplement Ant for other environments. Examples of each of these include:


More tasks

AntContrib (http://ant-contrib.sourceforge.net) provides support for compiling C and C++ source files with Ant on a variety of platforms.


Build file generators

Antelope (http://antelope.tigris.org) is a UI for creating Ant build files that can also help with profiling and debugging Ant build files.


Other variants on Ant

nant (http://nant.sourceforge.net) is a build tool for the Microsoft .NET Framework that is written in C#. nant uses XML build files very much like Ant's and includes a convenient slingshot task to create an Ant build file from an existing Visual Studio .NET project. Visual Studio 2005 has a tool named MSBuild that uses XML similarly to the way nant does.

As many other IDEs already do, Microsoft's Visual Studio and Borland's JBuilder both save their lists of the source files that are associated with a project in XML files, and these files are used by the IDE's internal build tool to build the project. XSLT scripts can transform these tool-specific XML project files into Ant or nant build files. This often makes it easier or cheaper to build projects entirely without the IDE, since you may be able to avoid installing the full and sometimes costly IDE on multiple build machines.

Another popular use of Ant is as the basis for an automation environment. Automation environments are discussed in detail in Section 3.4; basically, they are applications to help you automate the checkout-build-test-deploy parts of your build process (this is known as "continuous integration" in the XP literature). Typical tasks for automated environments are downloading the source for a project using an SCM tool and then running a build and unit tests, and doing all this continually, or every hour or night. Reports of the current state of the project are created and made available, often through a web site or by email.

Some automation environments that use Ant are Anthill (http://www.urbancode.com/projects/anthill), which is a commercial tool that also has a no-cost version, and CruiseControl (http://cruisecontrol.sourceforge.net), an open source automation environment. Both these and other automation environments are discussed further in Section 3.4.

Another outgrowth of Ant is Maven (http://maven.apache.org), a project management tool from the Apache Project. While you can use Ant to manage other parts of your project such as releases and tracking multiple .jar file dependencies, large build.xml files can become hard to understand. Maven lets you describe your project's structure using a version of XML with programming constructs named Jelly. You can still call Ant tasks from Maven, but Maven can also download the required files for a project as necessary; Ant 1.7 will also be able to do this. Many of the Apache projects use Maven to describe their structure. The idea of having an overall structured description of a project is a good one and seems likely to become common in all automated environments.

In summary, Ant has become the default build tool for Java projects: if a project is written in Java, it's most likely being built with Ant. The exceptions are those projects tied to an IDE with its own build tool, and even these IDEs now support building products by using Ant. In addition, if you want a definition of a build that can be transformed into formats suitable for use by other tools, then using XML for your build file language is very convenient.

5.5.5. Jam

Jam (http://www.perforce.com/jam/jam.html) is an open source build tool from Perforce Software, written by Christopher Seiwald and aimed squarely at projects written in C and C++. Jam costs nothing and is licensed in a very free manner:

License is hereby granted to use this software and distribute it freely, as long as this copyright notice is retained and modifications are clearly marked. ALL WARRANTIES ARE HEREBY DISCLAIMED.

That's the entirety of the license for Jam. As you might guess from it, Jam is not supported by Perforce; Perforce's business is more focused on its own SCM tool, which is discussed in Section 4.6.4.

The History of Jam

Jam (which may have once stood for "just another make") was originally created in the early 1990s by Christopher Seiwald for internal use at Sybase. Release 1.0 was in November 1993. Seiwald went on to found the SCM tool company Perforce Software in 1995 and is still the CEO there.

Jam is still used to build the various SCM-related products that are sold by Perforce, but new releases of Jam have appeared on average only once every two years. The current version of Jam is 2.5rc3. Since Jam is free, work has continued on it outside Perforce. The most active fork of Jam is BoostJam (http://www.boost.org) by Dave Abrahams, Rene Rivera, Vladimir Prus, and others. It was originally developed to build the Boost libraries, which are a suite of free, peer-reviewed, portable C++ source libraries. BoostJam was actually derived from FT Jam (http://www.freetype.org/jam), an extension of Jam by David Turner for the FreeType software font engine project. Jam has also been used internally by Cisco and other related networking companies to build large pieces of networking software.


In Jam, build dependencies are specified in build files called Jamfiles, and these Jamfiles can include each other in a hierarchical fashion (the top-level build file is named Jamrules). Example 5-4 shows a typical Jamfile. First, note the quirky but required spaces before the semicolons at the end of the lines. The top line of the Jamfile specifies the location of the Jamfile in the directory hierarchy; this one is in a directory named src. The next two lines include two other Jamfiles from subdirectories. The line beginning with Main shows an executable named hello being defined as containing the four .c source files and needing to be linked with the libraries listed after LinkLibraries. This Jamfile will create an executable hello on any one of the many platforms supported by Jam. The appropriate file suffixes and prefix are supplied by Jam for each platform.

Example 5-4. A Jamfile
SubDir TOP src ;

SubInclude TOP src subdir1 ;
SubInclude TOP src subdir2 ;

Main hello : hello.c foo.c bar.c baz.c ;
LinkLibraries hello : libother libcommon ;

There are three internal stages of execution when Jam is run. In the first stage, all the Jamfiles are opened and (re)read, and for all the Jam targets that could be named on the command line, such as hello, a dependency tree is created. Source files are automatically scanned for other dependencies using extensible rules. Just as with make, file modification timestamps are used to decide whether files have changed, as opposed to the digest scheme used by SCons.

In the second stage, Jam calculates which files need to be updated as part of building (or rebuilding) the specified targets.

In the third stage, the specific Jam rules such as Main that were used in the Jamfiles are used to create the Jam variables that are later used to create the commands that are actually executed. Jam rules are written in the Jam language, and each Jam rule has an action associated with it, which is written as a snippet of shell script.

Unix platforms that are supported by Jam include GNU/Linux, AIX, the BSD Unixes, HP-UX, IRIX, QNX, Solaris, and a number of other less well-known Unixes. Other platforms include Windows and Mac OS X, and also VMS, OS/2, and BeOS. Compilers that are supported by default include gcc, Microsoft Visual C++, MinGW, the Borland compiler bcc, HP-UX aCC, AIX Visual Age, Solaris CC, and IRIX MIPSpro. Other recognized tools and file formats include Lex and Yacc.

Documentation for Jam is accurate, but woefully thin. What is most obviously missing is a cookbook of recipes for various kinds of projects. The next place to search for help is the Jamming mailing list (http://maillist.perforce.com/pipermail/jamming). Another surprising lack is that of a central list of bugs related to Jam (BoostJam does have such a service).

Jam's strengths are many compared with make. First, the idea of a global understanding of dependencies works well; just what is needed for the target that you specify is what gets built. Jam is also fast. One commercial project is able to build all the targets that are found in a million lines of C source code in over 4,000 files in under 10 minutes on an unremarkable desktop machine. Jam is also relatively small; its source code is under 15,000 lines of C source, which makes porting it to different platforms somewhat easier than with larger build tools. Indeed, Jam has already been ported to many different platforms, probably second in number only to make. Extending Jam is relatively easy, since new or overriding rules and actions can be defined in the Jamfiles that are parsed at startup, as shown in Example 5-5. This example also shows that Jam rules and actions are easy to customize, but are often hard to understand.

Example 5-5. Extending Jam
#
# Generate C source using a CORBA interface defined in an
# IDL file.
#
# Argument $(1) is the name of the output file
# Argument $(2) is the name of the IDL source file
# Argument $(3) is an optional string argument passed to
# the IDL compiler.
#
rule Idl
{
   # A Jam idiom to extract the directory from the first
   # argument. outdir is a local variable also available to the
   # action defined below.
   local outdir = $(<[1]:D) ;

   # Set a global Jam variable named IDLFLAGS
   IDLFLAGS on $(<) = $(3) ;

   # Specify that the C source depends upon the IDL file and
   # also on the output directory
   Depends $(<) : $(outdir) $(>) ;

   # Make sure that the output directory exists
   MkDir $(outdir) ;
}

#
# Actually generate the source files using the IDL compiler
# and the IDL source file.
#
# An action is made up of shell script commands, so doesn't
# have to have the space-semicolon line endings, but semicolons
# are still used to separate the shell commands.
#
actions Idl
{
   cd $(outdir);
   idlcompiler $(IDLFLAGS) $(>);
}

#
# An example of using the new rule in a Jamfile
# to generate the file def.c from def.idl with some verbose
# output
#
Idl build/project1/idl/def.c : def.idl : -Dverbose=true ;

Jam's weaknesses are mainly in debuggability and documentation. Jam's debugging output is extremely verbose and hard to follow or restrict in any useful way. This leads to one of the common complaints about Jam, along the lines of "Jam didn't build my new library." This complaint is often due to the differences between make and Jam: if nothing depended on your new library, then Jam knew it didn't have to build it, so it didn't. In make, you probably defined every target explicitly, so some target or other probably ended up building it for you.

Other weaknesses of Jam include:


The Jam language

The Jam language itself can sometimes prove irritating. Rather oddly, many lines must end with a space and a semicolon. Omitting the space will lead to all kinds of cryptic messages and hard-to-debug build problems, especially if the line in question is the last one in a Jamfile. Using an editor such as Emacs with a mode to color Jamfiles syntactically helps a little with this. The language is also entirely string based, so it has no increment operator, which is fine until you really want one. Jam currently has no support for internationalization.


Grist

Another awkward part of using Jam is the concept of grist. Grist is a string associated with every file used in a Jam build; it tells Jam where to find or write that file. Some of the built-in rules for Jam expect the grist to be implicit, while others expect it to be explicitly provided. When a file's grist is wrong, it can be hard to work out why Jam can't locate the file. Naming a very specific target on the command line, such as a single .o object file, is harder than necessary because of the format of grist, which contains characters such as < and ! that have to be escaped to avoid confusing most shells.


Separate phases

Jam operates strictly in phases: it first evaluates dependencies and determines all the files that need rebuilding, and then it executes the commands to rebuild them. This means that the dependency calculations and any included Jam rules in the first stage cannot use files that are generated as part of the second stage. One way around this problem is to call Jam from a shell script or batch file that first creates any generated Jamrules files; a minimal sketch of such a wrapper appears after this list.


Local header files

There is a current bug that proves surprising to newcomers to Jam. Header files included without a directory name (e.g., include "myproject.h") will be found just fine by the compiler (since the current directory is usually part of the default include path), but Jam may not recognize the dependencies on the included files. This means that changes to header files such as myproject.h will not result in the proper rebuilding of the files in which they were included. Including the file with a directory name (e.g., include "subdir/myproject.h") is a rather long-winded workaround for this bug.
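
For example, here is a minimal sketch of a wrapper for the "separate phases" problem described above. It is written in Python purely for illustration; a shell script or batch file works just as well. The generate_rules() helper, the Generated.jam filename, and the VERSION variable are all hypothetical.

#!/usr/bin/env python
# Sketch of a wrapper that creates generated build files before Jam
# runs, so that Jam's first (dependency) phase can read them.
# generate_rules() and the Generated.jam filename are hypothetical.
import subprocess
import sys

def generate_rules(filename):
    # Write out Jam variables or rules that are only known at build
    # time, such as a version number.
    with open(filename, "w") as rules:
        rules.write("VERSION = 1.2.3 ;\n")

generate_rules("Generated.jam")

# Now run Jam itself, passing through any command-line arguments.
sys.exit(subprocess.call(["jam"] + sys.argv[1:]))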

BoostJam is a frontend to Jam and includes explicit support for building object files somewhere other than in the source directory. It also supports building variants (debug, optimize) and multiple targets with multiple compilers better than Jam does. BoostJam also has a better way of quoting arguments passed on the command line, much improved debugging output, and a number of other helpful additions. As of 2005, the BoostJam team is actively developing Version 2 of BoostJam.

In summary, Jam is an accurate, free, and fast build tool, probably the fastest of all the build tools examined in this chapter. A wide range of different platforms and C and C++ compilers are supported by default, and adding more is possible. Weighing against these benefits is the difficulty of debugging Jamfiles, minimal documentation, and lack of any impetus from Perforce to develop Jam any further. The current developments in BoostJam, particularly Version 2, are the brightest hope for improvements to Jam.

5.5.6. SCons

SCons (http://www.scons.org) is an open source build tool written by Steven Knight and others. Among its early contributors were Chad Austin, Charles Crain, Steve Leblanc, and Anthony Roach. The current licensing scheme is the MIT license.

The History of SCons

A build tool named Cons, for "construction," was created by Bob Sidebotham in 1996. It was eventually released under the GPL and is an FSF-connected project. Cons was originally inspired by Jam but used Perl as the language for its build files instead of the Jam language. A key idea of Cons was that you should be able to use the full power of an ordinary programming language within the build files. However, Cons was hard to extend for different file types and didn't handle parallel builds very well; also, some people felt that Perl was not the easiest language in which to write build files. Cons is currently in maintenance mode at http://www.dsmit.com/cons, but the last release was back in 2002.

SCons was conceived as a rewrite of Cons in Python for the Software Carpentry competition, which was sponsored in 2000 by Lawrence Livermore National Laboratory and O'Reilly, among others. More details of the competition can be found at http://www.onlamp.com/pub/a/python/2001/09/13/pythonnews.html. ScCons (for "Software Carpentry Cons") was renamed SCons (for "software construction") after it won the build tool part of the competition in August 2000. Opinions on pronouncing SCons differ, but "ess-cons" seems to be the most common way. The first release of SCons, modestly versioned at 0.01 alpha, was released in December 2001. The first beta release was 0.90 in June 2003, and the first nonbeta release is expected sometime soon.


SCons is implemented in Python (http://www.python.org), a modern, interpreted, object-oriented language freely available under the Python license on many platforms. Python is often compared to Tcl, Perl, Scheme, and Java, and is reportedly one of the easiest languages to learn.[6] For its build files, SCons uses script files that are written in Python and named SConscript.

[6] The argument that Python is the easiest language to learn is in a persuasively presented paper available at http://www.python10.org/p10-papers/14/index.htm. It's a good paper, but it was presented at a conference about Python.

Unix platforms that are supported by SCons include GNU/Linux, AIX, the BSD Unixes, HP-UX, IRIX, and Solaris. Other platforms include Windows and Mac OS X, and also OS/2. Compilers that are supported include gcc, Java, Microsoft Visual C++ (including the use of precompiled headers), MinGW, the Intel compiler tools, .NET tools, the Borland compiler bcc, HP-UX aCC, AIX Visual Age, Solaris CC, and IRIX MIPSpro, among others. Other recognized tools and file formats include Lex, Yacc, tar, LATEX, m4, Qt, SWIG, PostScript, PDF, and zip.

Some distinguishing aspects of SCons are:


Portable build files

The way that programs are specified in the SConscript build files is independent of the platform; this permits most platform-dependent decisions to be made in a single configuration file, rather than in each build file. Judicious use of the os.sep string rather than the /, \\, or : characters can help maintain portability. Actually, Python and SCons understand that foo/bar refers to the file bar in the directory foo on both Unix and Windows, so even this may be unnecessary.


Automatic dependency scanning

The dependency checking for SCons is reliable, and a number of languages are supported by default (C, C++, D, Java, and Fortran). You can also extend SCons to support dependencies on objects that are not files (for example, entries in a database).


Signature files

An MD5 signature of each preprocessed file and the arguments to the compiler is created for every generated file. This makes it easy to detect when a file really does or does not need to be recompiled, which saves a lot of build time. However, just "touching" a key file no longer causes the expected rebuild, which at first can confuse people familiar with make. It also makes it possible to share generated files between different people with more confidence in the correctness of such files.


Parallel build support

The SCons model for performing parallel builds is a carefully considered one. There are multiple threads, with a single job executing per thread. Each thread synchronizes with a task master, which gives it more jobs to work on. In comparison, older versions of make can end up spawning a process for every recursive makefile, making the number of jobs executing at any moment dependent on the structure of the source directories.


Programming language for build files

The use of a full-fledged programming language for the build files means that all the usual features such as conditional branching, loops, text formatting for output, and a sane syntax are present in SCons. You also gain the advantages of having a debugger, profiler, and other such tools that are already part of most programming languages but have historically been missing from most build tools.


SCM tool integration

SCons contains native support for checking out files from various SCM tools, including SCCS, RCS, CVS, BitKeeper, and Perforce. Subversion is not yet supported.


Extensibility and modularity

Adding support for building from new file types is relatively easy. Using small parts of SCons in other applications such as installers is also possible, since SCons was designed to be modular. The part of SCons that does the dependency checking and the execution of commands (the "build engine") is separate from the part of SCons that specifies which files to compile. In theory, you could even write the build files in a language other than Python (this idea is explored further in Section 5.5.7, later in this chapter). A minimal sketch of adding support for a new file type appears after this list.
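
To make the extensibility point concrete, here is a minimal sketch of adding a Builder for a new file type in an SConstruct file. The idlc command, its arguments, and the file suffixes are assumptions for illustration only; Builder and Environment are standard SCons calls.

# A hypothetical Builder that turns .idl interface files into .c
# source files by running an IDL compiler named 'idlc'.
idl_builder = Builder(action = 'idlc -o $TARGET $SOURCE',
                      suffix = '.c',
                      src_suffix = '.idl')

# Make the new Builder available to the environment as env.Idl(...)
env = Environment(BUILDERS = {'Idl' : idl_builder})

# Generate def.c from def.idl; SCons keeps it up to date like any
# other target.
env.Idl('def.c', 'def.idl')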

Example 5-6 shows an SConscript file. It is written in Python, which has no semicolons at the end of lines and uses the indentation of each line to identify a block of code, instead of the curly braces seen in C and Java. Comments are preceded by the # character. A single top-level file named SConstruct can be used to tell SCons about the SConscript files, potentially one in each subdirectory.

Example 5-6. An SCons SConscript file
# Explicitly allow this file to use a common environment defined elsewhere.
Import('env')

# Define an executable named 'hello'
env.Program(target = 'hello',
            source = ['hello.c', 'foo.c', 'bar.c', 'baz.c'],
            LIBS = ['other', 'common'])

The line with the Program method call shows an executable named hello being defined as containing the four .c source files and needing to be linked with the libraries listed in the LIBS argument. This SConscript file will create an executable hello on all of the platforms supported by SCons. The appropriate file suffixes and prefix are supplied by SCons for each platform.
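
For completeness, here is a minimal sketch of a matching top-level SConstruct file that creates the environment imported by Example 5-6 and reads the SConscript files; the subdirectory names are assumptions for illustration. Note how an ordinary Python loop can be used, since the build files are just Python.

# Top-level SConstruct: create one environment and share it with
# the SConscript files in each subdirectory.
env = Environment()
Export('env')

# Read the build file in each subdirectory; each SConscript starts
# with Import('env'), as in Example 5-6.
for subdir in ['src', 'lib']:
    SConscript(subdir + '/SConscript')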

The output shown in Example 5-7 is a small section of the results of using the --debug=tree command-line argument with SCons. It shows how libA.a depends on the three object files, which in turn depend on the three .c source files. Note that if this were a build on a Windows platform, the file extensions and directory separators would be changed appropriately. Another very useful debug option is --debug=explain, which tells you why a file was recompiled.

Example 5-7. SCons debug output
+-dir1/libA.a
  +-dir1/a1.o
  | +-dir1/a1.c
  +-dir1/a2.o
  | +-dir1/a2.c
  +-dir1/a3.o
    +-dir1/a3.c

Documentation of SCons is of good quality. There is a 5,000-line manpage installed with the package, which is also available from the SCons home page. The Wiki on the home page is well laid-out and has a number of cookbook examples about how to use SCons. Basic examples for using SCons can also be found at the end of the manpage. The mailing lists for SCons are friendly and a good source of help. No book has yet been published about SCons, but it seems to be only a matter of time before one is.

Some other well-considered aspects of SCons include the fact that the environment in which the build is performed is defined independently of the user's own environment. This helps to avoid the awkward situation of leaving departed developers' machines untouched just to perform builds. Variables in the Python build files are properly scoped, which is not true of many other build tools, though of course this does mean that variables have to be explicitly imported between SConscript files. Scalability of SCons for long command lines (lengths of over 10,000 characters) has also been demonstrated, at least for Windows.

Other things that make SCons easier to debug than most build tools are options to display the time spent in different major parts of a build, access to full-scale profiling data for the Python profiler pstats, and an option to start the build from within pdb, the Python debugger.

SCons even has enough confidence in the correctness of its dependency checking to provide a command-line argument to explicitly build the product in a random order! This unique idea helps performance when building different versions of a product simultaneously from the same source code, since it avoids performance problems that occur when multiple processes try to access the same file at the same time. The SCons core developers have an extensive set of unit tests and system tests, and changes to SCons are controlled using the change management tool Aegis.

Weaknesses and irritations of SCons seem to be relatively few. One complaint is about how long an already up-to-date build can take (up to 10 seconds when nothing actually needs to be built). The time spent checking dependencies can be reduced by using the SCons arguments --max-drift=1 and --implicit-deps-unchanged. Still, the startup time for SCons can feel slow. If you make an error in an SConscript file, the Python backtrace at the command line can be quite long, and you have to get used to the most recent part of the trace being at the bottom of the screen, not the top. The .sconsign MD5 signature files exist for each generated file, but these droppings can be made to occur in the directory where the files are built, or even in a single file, as opposed to the source directory. Internationalization is not currently supported.

SCons does require Python, which may not be installed on your machine by default. However, it deliberately does not require cutting-edge versions of Python (though it is tested with them), and Python isn't hard to learn, with many good tutorials available online. Python in a Nutshell, by Alex Martelli (O'Reilly), is one good introduction to Python, as is Dive into Python, by Mark Pilgrim (Apress), which is also available at http://diveintopython.org. The Python Cookbook, by Alex Martelli and David Ascher (O'Reilly), is also handy when you want just an example of how to do something with Python.

In summary, SCons is a well-designed build tool for a range of languages and platforms, and one that has been developed carefully by a knowledgeable group of developers. The choice of Python as the build file language does introduce yet another language dependency to any project, but this overhead seems to be small in practice. If you are starting a project from scratch, or your current build tool is just too much to bear any longer, SCons is the build tool to use.

A Vision of the Future?

A few years ago, I had a vision of a language-independent build tool. With this tool, you would write the build files in whatever language was most appropriate and familiar for the project: Java, C, Python, or even Visual Basic. The build files would be small programs and would make use of a well-defined API to a build engine. Each section of the build file would make calls such as CreateExecutable("hello", SomeFiles), where SomeFiles is the collection of files that are used to create the executable file hello. Dependency checking would be completely automatic, so you would need to recompile or reinterpret the build files only when a file was added to or removed from a target. Detecting this could be done using the digest or signature of the build file itself.
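
As a purely hypothetical illustration (the buildengine module and its functions do not exist in any real tool), such a build file might look like this if it were written in Python:

# Hypothetical build file: the buildengine module and its API are
# imaginary, sketched only to illustrate the idea described above.
import buildengine

engine = buildengine.connect()

# The files that make up the executable 'hello'.
some_files = ["hello.c", "foo.c", "bar.c", "baz.c"]

# Dependency checking happens entirely inside the build engine; this
# file only needs to change when files are added to or removed from
# a target.
engine.CreateExecutable("hello", some_files)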

Sound a bit far-fetched? Interprocess communication (IPC) mechanisms such as CORBA have been enabling communication between applications written in different languages for years. Distributed build tools already use IPC mechanisms to communicate between their different processes. Build engines such as the one developed for SCons are already constructed to exist as separate pieces of functionality. All the pieces are coming together for such a build tool. Watch the skies!


5.5.7. Custom Build Tools

As projects grow, there is often a need to generate some part of the source code automatically using generic code structures, in the same sense as C++ templates. Common areas for generated code are logging functions, class skeletons, and header files. The input specification can be a simple text file, an XML file, or a whole hierarchy of definitions in different files. A generator tool that uses these input files is invoked as part of the build process to generate the required files. Whatever mechanism is chosen, the generator is essentially a custom build tool, complete with all the potential advantages and frustrations that all the other build tools in this chapter possess. Before writing or using such a generator tool, consider all the things that you want in a regular build tool, such as accurate dependency calculation, easy debugging, and fast up-to-date builds.
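
As a small illustration, here is a minimal sketch of such a generator: it reads a plain-text list of message names and writes a C header file of logging identifiers. The file names and the output format are assumptions for illustration only.

#!/usr/bin/env python
# Minimal sketch of a code generator: reads a spec file containing
# one message name per line and writes a header with one #define
# per message. File names and output format are purely illustrative.
import sys

def generate(spec_file, header_file):
    with open(spec_file) as spec, open(header_file, "w") as header:
        header.write("/* Generated file - do not edit. */\n")
        number = 0
        for line in spec:
            name = line.strip()
            if name:
                header.write("#define MSG_%s %d\n" % (name.upper(), number))
                number += 1

if __name__ == "__main__":
    generate(sys.argv[1], sys.argv[2])

The build tool that invokes this generator should be taught that the header depends both on the spec file and on the generator itself, so that a change to either one causes the header to be regenerated.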


