Section 6.5. Build tools



6.5 Build tools

The larger your project is and the more frequently it is updated, the sooner you will discover that creating a working web site is only the beginning. Maintaining it is just as important and often more time-consuming. If nothing else, running a transformation again and again for each updated page is tedious and error-prone. In Chapter 5, we addressed this by programming a batch processing mode in the stylesheet (5.6).

Housekeeping chores. However, transforming pages is only a part of the web site maintenance routine. Other tasks you will likely want to automate are:

Deleting stale versions of transformed pages. For pages being updated, this is not necessary because your stylesheet will overwrite the old HTML file. If, however, you have removed a source page document, you'll want its corresponding HTML page to be cleaned up automatically.
Deleting all sorts of temporary files, such as SVG and intermediate PNG files left over from graphic generation (5.5.2).
Uploading the finished HTML files and graphics to the staging or production server.
Taking a certain action (e.g., emailing the site admin) in case validation or transformation of a document fails. This may be necessary if the site building process runs unattended.

Process what was changed. For large sites it would also be nice to transform only those documents that changed since the last transformation, instead of all the documents registered in the master. This can be done by comparing the modification times of the source and output files for each page and running the transformation only if the source is newer than the output. If, however, the master document or the stylesheet was modified, we will need to transform all the pages, regardless of their timestamps.
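The timestamp comparison described above can be sketched in a few lines of shell. This is only an illustration: needs_transform is a made-up function name, and the shared files (such as the master document and the stylesheet) are assumed to be passed as extra arguments.

```shell
#!/bin/sh
# Succeeds (exit status 0) when a page must be transformed again:
# the output is missing, or the source or any shared file is newer.
# Usage: needs_transform SOURCE OUTPUT [SHARED_FILE...]
needs_transform() {
  src=$1
  out=$2
  shift 2
  # No output yet: the page has never been transformed.
  [ -f "$out" ] || return 0
  for dep in "$src" "$@"
  do
    # find prints the file name only if $dep was modified after $out.
    if [ -n "$(find "$dep" -newer "$out" 2>/dev/null)" ]
    then
      return 0
    fi
  done
  # Everything is older than the output: nothing to do.
  return 1
}
```

A call such as needs_transform en/index.xml out/en/index.html _master.xml style.xsl would then decide whether that one page needs another run. A build tool makes this kind of hand-written check unnecessary.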

A hierarchy of tasks. All these tasks form a natural hierarchy wherein some tasks depend on others. For instance, uploading files only makes sense after some of the pages have been transformed. So, if the user gives the upload command, the system must start by checking whether the transformed files are up to date. If some are stale, validation and transformation must be run for them first; if there is any error in validation or transformation, the entire upload operation must be canceled.
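To make the dependency chain concrete, here is a minimal shell sketch of "upload only if everything before it succeeded." The three step functions are hypothetical placeholders that merely print a message; in a real setup they would run the validation, transformation, and upload commands.

```shell
#!/bin/sh
# Hypothetical step functions; replace the bodies with real commands.
validate_pages()  { echo "validating"; }
transform_pages() { echo "transforming"; }
upload_pages()    { echo "uploading"; }

# Each step runs only if the previous one returned exit status 0,
# so a failed validation or transformation cancels the upload.
upload_site() {
  validate_pages && transform_pages && upload_pages
}
```

Such a chain always reruns every step. It knows nothing about which files are stale, and that is exactly the bookkeeping a build tool adds.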

Tools for building projects. All this is possible with one of the build tools that we'll explore in this section. A build tool is basically an interpreter that executes a build file, a formalized description of your project that lists all of its tasks and components and specifies how they depend on each other. When a build file is used, the entire project, including all of its primary and auxiliary tasks, can be built by one simple command.

The build file does not need to change during the routine maintenance of a project, but only when you add or remove components (such as web pages) or change the dependency rules. It therefore makes sense to generate the build file automatically from the master document whenever the latter is updated.

6.5.1 make

No doubt the most widely used build tool today is the classic make utility. It is the tool of choice for countless programming projects and therefore the first thing to try.

The build file of make, called a makefile, uses a simple plain text format. It contains definitions of targets (tasks that you want to perform), their prerequisites (other targets or files that each target depends on), and the corresponding operating system commands that will be run to fulfill each target. A combination of a target, its prerequisites, and its commands is called a rule.

The reliance on OS commands is the largest drawback of make. This utility is native to the Unix world (it is available by default on almost any Unix system) and therefore takes many conventions of a Unix environment for granted. Nevertheless, you can successfully use make on Windows as part of the Cygwin environment. ^[70]

^[70] www.cygwin.com

Let's see how make can be used to automate web site building with a Java-based XSLT processor. The examples in this section cover a few basic tasks, but applying these principles to more complex scenarios is straightforward. If you want to dig deeper, a complete manual for make is available online. ^[71]

^[71] www.gnu.org/manual/make

6.5.1.1 Validation and transformation script

When processing a makefile, make stops and signals an error whenever any of the commands it runs returns a nonzero exit status (a standard Unix convention for indicating a program's failure). We can use this facility to cancel transformation of a page that fails to validate, but we need to make some preparations first.

The problem is that in our setup, the Schematron schema used for validating a page document is nothing but a special-purpose XSLT stylesheet (5.1.2). From the processor's viewpoint, the stylesheet is OK and works as programmed even when it displays diagnostics. This means that a zero (no error) exit status is returned by the XSLT processor when validation finishes, regardless of its outcome. (And you cannot set the exit status of the processor from within a stylesheet.)

One more layer of logic. To work around this problem, we'll write a shell script ^[72] that tries to validate a page document, analyzes the validation output, and runs the transformation only if no errors were detected. Such a script for an sh-compatible shell ^[73] is shown in Example 6.3.

^[72] I.e., a small program to be run by the OS command interpreter, called shell on Unix systems.

^[73] Like make itself, this script runs on any Unix system or on Windows with Cygwin.

A language for scripting what? As was the case with the Java examples in Chapter 5, you don't need to be familiar with the shell scripting language. You will most likely be able to use this example as is or with trivial changes in XSLT processor invocation.

Variables and parameters. The script takes three command-line parameters: $1 is the pathname of the input XML file, $2 is the pathname of the transformed HTML page, and $3 is the value of the $env parameter of the stylesheet (5.1.1.1; for example, staging). The saxon variable stores the command to run the XSLT processor; you can change it to whatever works on your system.

Example 6.3. `process` : A shell script that validates and transforms a page document.

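  #!/bin/sh

  # $1 = source XML file, $2 = output HTML file, $3 = environment
  saxon="java net.sf.saxon.Transform"

  # Run both compiled schemas and collect everything they print
  ERR=`$saxon -l "$1" schema-compiled.xsl env="$3" 2>&1`
  ERR=$ERR`$saxon -l "$1" schema2-compiled.xsl env="$3" 2>&1`

  if [ -z "$ERR" ]
  then
    echo "Validation successful, transforming..."
    $saxon -o "$2" "$1" style.xsl env="$3"
  else
    # Report the diagnostics and signal failure to make
    echo "$ERR"
    exit 1
  fi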

Two-stage validation. Suppose you have two Schematron schemas (such as those in Examples 3.3 and 5.20) and want to run both schemas against each source document. To make this possible, the $ERR variable first receives the output of one validation command ('back quotes' run the quoted command and return its output) and then appends the output of the second one.

No matter whose problem it is. The 2>&1 construction redirects standard error to standard output so that both output streams are caught by the back quotes. In this case, standard output will contain any Schematron diagnostics, while standard error will be used by the processor itself in case the compiled schema is broken or some other runtime error happens.
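The effect of 2>&1 is easy to see in isolation. In the sketch below, fake_validation is a made-up stand-in for a validation run, and both of its messages are invented for the illustration.

```shell
#!/bin/sh
# Stand-in for a validation run: one diagnostic goes to standard
# output (as Schematron output would) and one to standard error
# (as a processor failure would). Both messages are made up.
fake_validation() {
  echo "diagnostic: heading is empty"
  echo "processor error: cannot read schema" >&2
}

# Standard error is discarded here, so only standard output is caught.
STDOUT_ONLY=`fake_validation 2>/dev/null`

# With 2>&1, both streams are caught by the back quotes.
BOTH=`fake_validation 2>&1`
```

After this runs, STDOUT_ONLY holds only the first message, while BOTH holds both lines.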

If and only if. Then, the value of $ERR is checked. If it is an empty string, no errors of any kind were encountered and we can safely run our transformation. The exit status of the script in this case will be the same as the exit status of the transformation process (which may also fail for a variety of reasons, such as a broken stylesheet or a Java problem).

Note that for this to work, your schema must be absolutely silent unless it finds a serious error. Unsolicited advice such as that in Example 2.2 (page 65) needs to be shut off.

Otherwise, we print $ERR and finish with a nonzero exit status. In this branch of the if statement, you can add any other emergency measures, such as emailing $ERR to the site admin, excessive loud beeping, or going belly up. You can also specify a nonzero exit code other than 1; it will be reported by make so that you can use it to tell a validation error from other kinds of errors.
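As a hedged illustration of the last point, the failure branch could use a dedicated status value. The value 2 and the names check_validation and describe_status are arbitrary choices for this sketch and are not part of Example 6.3.

```shell
#!/bin/sh
# Status reserved for validation failures (an arbitrary choice).
VALIDATION_FAILED=2

# Returns 0 if the collected validation output is empty; otherwise
# prints it to standard error and returns the reserved status.
check_validation() {
  ERR=$1
  if [ -z "$ERR" ]
  then
    return 0
  fi
  echo "$ERR" >&2
  # Other emergency measures (such as mailing $ERR) would go here.
  return $VALIDATION_FAILED
}

# A caller can then tell the kinds of failure apart.
describe_status() {
  case $1 in
    0) echo "ok" ;;
    $VALIDATION_FAILED) echo "validation error" ;;
    *) echo "other error" ;;
  esac
}
```

In the script itself, the equivalent change would be to replace exit 1 with exit 2 in the else branch.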

6.5.1.2 Makefile

Armed with this script, we can now write a makefile for our web site project. For Example 6.4 to be immediately understandable, some of the hairier stuff, such as image generation, is ignored. This example does, however, demonstrate all the main building blocks of a makefile: targets, prerequisites, commands, and variables. A real project's makefile will therefore be very much like this one, only larger.

Variables. The first part of the makefile contains variable definitions. As in any other programming language, make can use variables to store values used more than once, thereby making the code less verbose and easier to understand. Here,

saxon is the command that runs the XSLT processor. (We could also store this command in an environment variable and use it both in the shell script and in the makefile.)
env is the environment identifier passed to the stylesheet (5.1.1.1). It is assumed that each environment has its own makefile, so we won't try to do any environment switching here.
out-path is the directory path for output files (same as $out-path in the stylesheet). We don't need to specify the source path because the makefile will always be run from the root directory of the site's source tree (where it resides along with the stylesheet, schemas, and the master document).
upload-path is the URI or pathname of the directory where the final HTML pages will be uploaded after transformation. This is not the same as $target-path in the stylesheet; for example, if you are installing the site in Apache's HTML directory on your system, upload-path may be /var/www/html/ while $target-path may simply be /. If your makefile is generated automatically from the master document, you can store the upload-path value for each environment, along with all the other paths, in the master.

Example 6.4. `Makefile` : A makefile primer.

saxon = java net.sf.saxon.Transform
env = staging
out-path = /home/d/web/out/
upload-path = remote:/tmp/upload/
globals = \
  _master.xml \
  style.xsl \
  schema-compiled.xsl \
  schema2-compiled.xsl
files = \
  $(out-path)en/index.html \
  $(out-path)en/team/index.html \
  $(out-path)en/team/hire.html \
  $(out-path)en/team/history.html
# ... list all of your output files here

upload: build
	rsync -v -t -u -r $(out-path) $(upload-path)

build: $(files)

$(out-path)%.html: %.xml $(globals)
	./process $*.xml $(out-path)$*.html $(env)

%-compiled.xsl: %.sch
	$(saxon) -o $*-compiled.xsl $*.sch schematron-saxon.xsl

clean:
	rm -f $(files)

globals is a list of files that all page documents will specify as their prerequisites. Include here all the files that are used during validation or transformation of a page (except the source of that page): master document, transformation stylesheet, and compiled schema(s). You can add Makefile itself to this list to make sure the project is rebuilt when the makefile changes. (The \ at the end of a line means the list is continued on the next line.)

You may also need to include here all static images that are accessed during transformation (e.g., using the graph:geth() and graph:getw() extension functions, 5.5.1). Otherwise, your HTML may stray out of sync with the images should they be changed for some reason.
files is the list of all HTML files generated by the project. Each file is given as a complete pathname starting with the out-path (to reference a makefile variable, you enclose it in () and prepend $). Only a few files are shown here for illustration, but in a real makefile, you'll list all of your site's pages in this variable.

Upload rule. Following the variable definitions, the makefile lists its rules. Each rule consists of a target (before the colon), a list of prerequisites (after the colon), and optional commands (each on its own line starting with a tab character, visible as an eight-space indent).

A target may be a filename or an arbitrary identifier. The target of the first rule in our makefile is called upload. This rule depends on another target, build (prerequisites of a rule may be either filenames or targets). To fulfill the upload target, make will run rsync ^[74] to copy the output directory to the upload location (remote or local).

^[74] rsync (rsync.samba.org) is a file transfer utility that uses a protocol similar to ftp, only better.

Smart uploading. Note that upload specifies another target, and not some file(s), as its prerequisite. This means make cannot compare timestamps to determine if this target needs to be rerun or not. As a result, the rsync command will always run when the upload target is activated. Since make may be unable to find out the status of uploaded files on a remote server, this "just in case" uploading makes sense: With the options shown in Example 6.4, rsync will itself compare source and destination files and only transfer those that are newer on the source side.

Build rule. The second rule's target is build, and this rule runs no commands. It simply refers to the files variable to declare that all of the project output files must be in place and up-to-date for this target to be considered fulfilled. Translated into plain English, this rule says, "Consider build done when all of $(out-path)en/index.html, $(out-path)en/team/index.html, ... are done."

Page processing rule. Next comes a rule for processing specific files. It is unusual in that its target contains a % character, which works as a wildcard. Rules whose targets contain % are called pattern rules; they are supposed to run multiple times on different files.

Thus, when make attempts to fulfill the first prerequisite of build, $(out-path)en/index.html, it first tries to find a rule with exactly this target. If none is found, it checks the available pattern rules to see if one of them matches. Indeed, $(out-path)%.html will match if we replace % by en/index.

So, make fires this pattern rule on this specific file. It first checks the existence of en/index.xml and all the globals; if no HTML file for this page exists yet, or if some of the prerequisites are newer than the HTML file, it runs the process script (6.5.1.1) to validate and transform the page document. Within a command in a pattern rule, the $* construct means "whatever corresponds to % in the current invocation of the rule." This pattern rule will thus be run again and again for all the HTML files that build lists as its prerequisites.
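The matching that make performs here can be imitated in plain shell, which may help to see what $* stands for. This is a sketch for illustration only; the function names stem_for and command_for are made up, and make's real matching logic is more general.

```shell
#!/bin/sh
# Prints the stem (the text matched by %) for a target of the
# pattern $(out-path)%.html; fails if the target does not match.
# Usage: stem_for TARGET OUT_PATH
stem_for() {
  case $1 in
    "$2"?*.html) ;;        # prefix, non-empty stem, and suffix present
    *) return 1 ;;
  esac
  stem=${1#"$2"}           # strip the leading output path
  echo "${stem%.html}"     # strip the trailing .html
}

# Prints the command that the pattern rule would run for a target.
# Usage: command_for TARGET OUT_PATH ENV
command_for() {
  stem=`stem_for "$1" "$2"` || return 1
  echo "./process $stem.xml $2$stem.html $3"
}
```

For the target /home/d/web/out/en/index.html with the output path /home/d/web/out/, the stem is en/index, so the command becomes ./process en/index.xml /home/d/web/out/en/index.html staging.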

Schema compilation rule. The compiled schemas also depend on other files, namely, on their source Schematron (.sch) documents. Another pattern rule sees to it that any schema whose filename matches %-compiled.xsl is recompiled into XSLT whenever its Schematron source is modified.

Cleanup rule. The last rule, clean, does not depend on anything, nor is it a prerequisite for any other rule. It simply removes all the HTML files listed in the files variable so that the next run of make will recreate all the HTML from scratch. You can include other tidying-up operations in this rule, such as removing temporary files, generated images, and compiled schemas.

6.5.1.3 Running make

Save Example 6.4 as a file called Makefile in the site source's root directory. Put the process script (Example 6.3) there, too. ^[75] Now, building and uploading the entire web site is as simple as typing

^[75] On Unix, you need to make it executable first, e.g., by saying chmod +x process.

 make

on the command line. When called without parameters, make tries to fulfill the first target in the makefile, upload (called the default target). This, in turn, triggers build, and build pushes all the files through the corresponding pattern rule. Thus, in the first run of make, all project files will be validated, transformed, and uploaded.

On a subsequent run, make will validate and transform only those pages whose XML sources were updated. If, however, a change was made to one of the files in globals (e.g., the stylesheet or one of the Schematron schemas), all pages will be redone because they all depend on the changed file.

Explicit targets. When running make, you can give it the name of a specific target that you want to invoke. For example,

 make build

will perform validation and transformation only, but not upload. Or, you can say

 make clean

to remove all generated files. Since the clean target is outside the dependency tree rooted in the default upload target, the only way to activate clean is by calling it explicitly.

6.5.1.4 Makefile generation

When the project is stable and updates are routine, make is a convenient way to keep the web-ready transformed site in sync with any source modifications. It only becomes less convenient when you frequently add or delete pages (which requires adding or deleting corresponding pathnames in the makefile).

In this case, it is preferable to have the makefile generated automatically from the information in the master document. For this, you may need to expand your master document's environment element type to store the additional information to be put into the makefiles, such as the commands used to run the processor in each environment and the names of the upload directories.

Once this information is available, however, translating it from the master document's XML into the makefile format is very simple. It may be convenient to include this functionality within the main transformation stylesheet, replacing our batch template (5.6), so that makefile generation is triggered when you run the stylesheet on the master document.
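The translation itself is mostly a matter of printing one pathname per page. The sketch below shows the idea in shell rather than XSLT, under the assumption that the source page paths have already been extracted from the master document into a plain list, one per line; print_files_variable is a made-up name.

```shell
#!/bin/sh
# Reads source page paths (one per line, relative to the site root)
# on standard input and prints a definition of the files variable
# in the format used by Example 6.4.
print_files_variable() {
  printf 'files ='
  while read -r page
  do
    # Skip empty lines.
    [ -n "$page" ] || continue
    # Continue the previous line with a backslash, then add one
    # output pathname: the source path with .xml replaced by .html.
    printf ' \\\n  $(out-path)%s.html' "${page%.xml}"
  done
  printf '\n'
}
```

Feeding it the two lines en/index.xml and en/team/index.xml prints a files definition with the corresponding two $(out-path)...html entries. A stylesheet-based generator would produce the same text directly from the master document.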

6.5.2 Apache Ant

Apache Ant ^[76] is a Java-based build tool that in many respects is similar to make. Ant's main points of difference are these:

^[76] ant.apache.org

Ant's build file is XML, not plain text. The two immediate consequences of this are that the build file (called build.xml) is more verbose and easier to read than a makefile.
To fulfill its targets, Ant runs Java classes, not OS commands. This gives Ant its biggest advantage: portability and independence of the underlying OS. You can use any of Ant's collection of predefined classes for performing common tasks, or you can write your own class for the functionality you need. One of the predefined Ant tasks, exec, even lets you run an arbitrary program outside the Java virtual machine (this is not recommended, though).
Unlike make, in Ant targets only depend on other targets, not on files. This means you can build a hierarchy of tasks, but you cannot expect a task to be run only when its output file is older than the input file. Ant will run a prerequisite task in any case; it is the class called by that task that may decide whether to perform an action based on file dates (or any other information).

Thus, the built-in Ant task javac that compiles Java source files does check for file dates and processes only those files whose source (.java) is newer than the binary (.class). However, an XSLT transformation target that you may define will not have this functionality (unless you write a custom Java class for this), and your build process will therefore always transform all page documents.

Despite this latter limitation, Ant may still be a good choice if you know Java and use other Java-based tools for working with XML.

