So far, XML has mainly been used as an input to our build process, specifying the configuration for both Ant and CruiseControl. However, XML can also be used as an output. The main advantage of XML as an output format is that you can define a vocabulary for the domain you are working in, such as the contents of a music CD or an address book. Using this vocabulary, you can then validate any XML document instance. For example, does an instance of the CD have a title, an artist, tracks, track lengths, and so on? You can also use a number of XML-related technologies and tools to manipulate a document instance. For the address book example, you could use XSLT stylesheets to transform an address book in XML format into HTML for better presentation.
Chapter 7 showed one application of XML as an output format for reporting: the CruiseControl Build Results web. When CruiseControl generates a build log, it is output in XML. You can click the XML Log File in the Build Results web if you want to see the raw output. This information is subsequently transformed into HTML using XSLT stylesheets.
This section discusses how XML can be used as an output format to describe and standardize label or baseline reports. Figure 9.1 shows what you can achieve: a well-presented, formatted report that describes the status and content of a UCM baseline, including activities, contributor activities, and element versions.
Figure 9.1. HTML baseline report
Baseline Reports in XML
If you think about the contents of a baseline report, what aspects come to mind? What information do you want the report to convey? For me, any baseline report should capture the following:
At a high level, XML documents consist of elements, attributes, and their relationships. An element is basically an object with a defined start and end tag. An attribute is a name-value pair that modifies certain features of the element. If you deconstructed a baseline, you might say that the baseline is a top-level element, and an attribute of it is the date it was created. Similarly, you could also say a version of a file is an element but that it is one of many versions related to the baseline element. With this in mind, the XML document instance for a specific baseline might look something like the following:
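As a minimal sketch of such an instance, assuming the element and attribute names used by the baseline DTD defined later in this section, a UCM baseline might be written as shown here. The baseline name, contributor name, headlines, dates, promotion level, and version paths are all hypothetical placeholders; only the deliver activity name follows the naming pattern quoted later in the text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- All names, dates, and version paths below are hypothetical placeholders. -->
<baseline name="RATLBANKMODEL_01_INT" date="09/01/2006" author="alex" level="BUILT">
  <!-- An integration activity described through its contributor activities -->
  <activity name="deliver alex_RatlBankModel_Dev on 09/01/2006 18:46:15"
            date="09/01/2006" author="alex">
    <contributor name="RATLC00000012" date="08/01/2006" author="alex"
                 headline="Add overdraft check to Account">
      <version>/vobs/RatlBankSources/model/Account.java@@/main/RatlBankModel_Int/3</version>
      <version>/vobs/RatlBankSources/model/Bank.java@@/main/RatlBankModel_Int/2</version>
    </contributor>
  </activity>
  <!-- An activity described directly in terms of its versions -->
  <activity name="update_build_scripts" date="09/01/2006" author="alex"
            headline="Update build scripts">
    <version>/vobs/RatlBankSources/model/build.xml@@/main/RatlBankModel_Int/5</version>
  </activity>
</baseline>
```

The baseline and activity details are carried as attributes, while each version is a child element whose text content is the version's pathname.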
Note that in this example the baseline is described in terms of both activities and versions; however, there is no reason why both would always be required. If you think of the vocabulary analogy again, just because you know all the words in the dictionary, you don't necessarily have to use them all when writing a document. This is particularly relevant when you are defining a baseline schema that would work for both Base ClearCase and UCM. With UCM you can extract information on the activities that have contributed to a UCM baseline. However, with Base ClearCase, since there is no concept of activities, the baseline would have to be defined solely in terms of versions. In XML, a vocabulary and the relationships between the elements in it can be defined in a Document Type Definition (DTD).
An XML DTD for Baselines
An XML DTD is a schema that defines the set of allowable elements, attributes, and relationships for an XML document. If you create an XML document and say that it is based on a particular DTD, you have defined a contract, one that can be checked by any validating XML parser. For example, the DTD for a simple book might look like this:
<!ELEMENT book (preface, chapter+, index)>
<!ELEMENT chapter (#PCDATA)>
...
<!ATTLIST chapter
          title CDATA #REQUIRED
          pages CDATA #IMPLIED>
...
As you can see, this DTD essentially has two types of entries: <!ELEMENT ...>, which defines XML element names and their relationships, and <!ATTLIST ...>, which defines the attributes of a particular element. The first line states that a book element should consist of three children: preface, chapter, and index. It further states that they are all required and that they must appear in this specific order. There is no use having the preface at the back of the book! The + after chapter indicates that there may be one or more chapter elements in a book (a ? would instead mean zero or one, and a * zero or more). The second line indicates that a chapter consists of parsed character data via the #PCDATA keyword. This basically means that the chapter has some textual content. Moving on to attributes, the <!ATTLIST ...> entry states what attributes a chapter can or should have: in this case, a title and the number of pages. The type of both attributes is basic text, as specified by the CDATA keyword. This means that they are not marked out for any additional validation. Alternative attribute types include unique IDs, references, and tokens, all of which an XML parser could validate. Finally, the attributes' "requiredness" is also specified. The title attribute is specified as being always required for each chapter via the #REQUIRED keyword. The pages attribute, however, is optional, as specified by the #IMPLIED keyword. Here's an example of an XML document written using this DTD:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book SYSTEM "book.dtd">
<book>
  ...
  <chapter title="Chapter One" pages="10">
    This is chapter one ...
  </chapter>
  <chapter title="Chapter Two">
    This is chapter two ...
  </chapter>
  ...
</book>
The first line in this example states that this file is an XML file and specifies what XML version it conforms to for compatibility purposes. The second line specifies which file the DTD is recorded in; in this case, it is the local file book.dtd.
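For experimenting, the DTD can also be embedded in the document itself as an internal subset of the DOCTYPE declaration, which gives a single file that a validating parser can check. The following is a minimal sketch rather than the book's own listing. It assumes that preface and index simply hold text (the listing above elides their declarations) and it uses +, the DTD occurrence indicator for one or more, on chapter. The text content of each element is a placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book [
  <!-- Assumption: preface and index hold plain text; the original DTD elides them -->
  <!ELEMENT book (preface, chapter+, index)>
  <!ELEMENT preface (#PCDATA)>
  <!ELEMENT chapter (#PCDATA)>
  <!ELEMENT index (#PCDATA)>
  <!ATTLIST chapter
            title CDATA #REQUIRED
            pages CDATA #IMPLIED>
]>
<book>
  <preface>About this book</preface>
  <!-- title is required; pages is optional and omitted from the second chapter -->
  <chapter title="Chapter One" pages="10">This is chapter one</chapter>
  <chapter title="Chapter Two">This is chapter two</chapter>
  <index>Index entries</index>
</book>
```

If the xmllint tool from libxml2 is available, saving this as book.xml and running xmllint --valid --noout book.xml is one way to check it. Removing a title attribute, or moving the preface after the chapters, should then produce a validation error.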
If you think back to a baseline report, and the requirements that were outlined at the beginning of this section, the DTD for a baseline report could be defined as follows:
<!ELEMENT baseline (activity* | version*)>
<!ELEMENT activity (contributor* | version*)>
<!ELEMENT contributor (version*)>
<!ELEMENT version (#PCDATA)>
<!ATTLIST baseline
          name CDATA #REQUIRED
          date CDATA #REQUIRED
          author CDATA #REQUIRED
          status CDATA #IMPLIED
          level CDATA #IMPLIED>
<!ATTLIST activity
          name CDATA #REQUIRED
          date CDATA #REQUIRED
          author CDATA #REQUIRED
          headline CDATA #IMPLIED>
<!ATTLIST contributor
          name CDATA #REQUIRED
          date CDATA #REQUIRED
          author CDATA #REQUIRED
          headline CDATA #IMPLIED>
In this example, you can see that the baseline consists of a collection of either activity or version elements; the "or" relationship is specified via the | symbol. The * next to the elements specifies that zero or more instances may occur. In turn, each activity consists of either contributor or version elements. The contributor element is of direct use in UCM. In UCM, whenever a developer delivers his or her changes to the project's integration stream, new integration activities are automatically created. If you look at these activities, they have a name similar to deliver alex_RatlBankModel_Dev on 09/01/2006 18:46:15, which is interesting but does not really give you an idea of what functionality is contained in the activity. However, it is possible to query an integration activity to see what its contributors are; this is what the contributor element refers to. The remainder of the DTD defines the attributes for each element and whether each one is required or optional.
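Because the baseline content model is a choice, the same DTD also admits a report expressed purely in terms of versions, which is the form a Base ClearCase label would take. The following is a minimal sketch of such an instance. It assumes that the DTD above has been saved as baseline.dtd (a hypothetical file name), and the label name, date, author, and version paths are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumption: the baseline DTD is stored alongside this file as baseline.dtd -->
<!DOCTYPE baseline SYSTEM "baseline.dtd">
<!-- Hypothetical Base ClearCase label: no activities, only versions -->
<baseline name="RATLBANK_REL_1.0" date="09/01/2006" author="alex">
  <version>/vobs/RatlBankSources/model/Account.java@@/main/3</version>
  <version>/vobs/RatlBankSources/model/Bank.java@@/main/2</version>
  <version>/vobs/RatlBankSources/model/build.xml@@/main/5</version>
</baseline>
```

The optional status and level attributes are omitted here, which #IMPLIED permits. Note that mixing activity and version elements directly under the same baseline would not validate, because the content model allows one kind or the other.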
Although you can use many more capabilities when specifying an XML DTD, this example should at least give you an appreciation for what is possible. For more information on XML DTD and XML in general, refer to Hunter [Hunter04]. Now that I have defined the contract (via the DTD) for baseline reports, let's look at how best to automate their creation.