To execute the example stylesheets in this chapter, you'll need an XSLT processor. The Office 2003 Professional and standalone editions of Word 2003 come with a built-in XSLT processor (used for onload and onsave stylesheets, as introduced in Chapter 4), but the examples in this chapter assume you will invoke the stylesheets outside of Word, for example with a command-line processor. You can read about and download one such utility, msxsl.exe, at this URL: http://msdn.microsoft.com/library/en-us/dnxml/html/msxsl.asp.
The libxml project (hosted at http://www.xmlsoft.org) provides several useful command-line utilities for XML processing. I personally use Cygwin (a Linux-like environment for Windows; see http://www.cygwin.com) and the Cygwin distribution of the libxml tools. There are also native Windows binaries for each of the libxml tools, available at http://www.zlatkovic.com/libxml.en.html. One particularly convenient tool in the libxml suite is the xmllint command. Its --format option, which reads an XML document and outputs a pretty-printed version of it (adding line breaks and indentation), is an excellent aid for learning WordprocessingML and for authoring stylesheets that create Word documents. It was also instrumental in preparing many of the code examples in this book.
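As a rough illustration of the xmllint --format usage described above, the following Python sketch builds the command line and runs it if the tool is installed. The function names and the file names in the usage note are hypothetical, chosen only for this example; the sketch assumes xmllint is on your PATH and relies on its --output option to write the result to a file.

```python
import shutil
import subprocess


def xmllint_format_command(input_path, output_path):
    """Build the argument list for pretty-printing input_path with xmllint.

    --format re-indents the document; --output names the file to write.
    """
    return ["xmllint", "--format", "--output", output_path, input_path]


def pretty_print(input_path, output_path):
    """Run xmllint --format, assuming the tool is installed and on PATH."""
    if shutil.which("xmllint") is None:
        raise FileNotFoundError("xmllint was not found on PATH")
    # check=True raises CalledProcessError if xmllint reports a failure,
    # for example when the input is not well-formed XML.
    subprocess.run(xmllint_format_command(input_path, output_path), check=True)
```

For example, pretty_print("letter.xml", "letter-formatted.xml") would write an indented copy of a (hypothetical) letter.xml, leaving the original untouched so that the two can be compared side by side.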
The related libxslt library, from the same project, provides an XSLT processor with a command-line tool called xsltproc. Other freely available XSLT processors you may want to try include Saxon (http://saxon.sourceforge.net) and Xalan (http://xml.apache.org/xalan-j/), both of which are Java-based.
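To make the idea of invoking a stylesheet outside of Word concrete, here is a minimal Python sketch that runs xsltproc on a source document. It assumes xsltproc is installed and on your PATH; the function names and any file names you pass in are hypothetical and not taken from this chapter's examples.

```python
import shutil
import subprocess


def xsltproc_command(stylesheet_path, source_path, output_path):
    """Build the argument list for applying a stylesheet with xsltproc.

    xsltproc takes the stylesheet first and the source document second;
    -o names the file that receives the transformation result.
    """
    return ["xsltproc", "-o", output_path, stylesheet_path, source_path]


def transform(stylesheet_path, source_path, output_path):
    """Apply the stylesheet, assuming xsltproc is installed and on PATH."""
    if shutil.which("xsltproc") is None:
        raise FileNotFoundError("xsltproc was not found on PATH")
    # check=True raises CalledProcessError if the transformation fails.
    subprocess.run(
        xsltproc_command(stylesheet_path, source_path, output_path),
        check=True,
    )
```

Other processors, such as msxsl.exe, Saxon, and Xalan, take their arguments in a different order and with different option names, so consult each tool's documentation before adapting this sketch.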
WARNING: If you process or create WordprocessingML documents using XML tools that output line endings using a linefeed character (LF) rather than a carriage return and linefeed pair (CRLF), and if your documents contain Base64-encoded data such as VBA macros or embedded images, then you will need to convert the line endings to CRLF before opening the document in Word. Otherwise, Word will not be able to open the document correctly, even though it is well-formed XML. This is arguably a bug in Word's XML processing behavior, but it can be explained by the fact that the Base64 specification requires that individual lines end with a CRLF sequence in the canonical Base64 format. Fortunately, there are easy workarounds. For example, in a Unix or Cygwin environment, you can run the unix2dos command on your file, converting each instance of the LF character to a CRLF sequence.
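If unix2dos is not available, the same conversion can be approximated in a few lines of Python. This is only a sketch with hypothetical function names, not a reimplementation of unix2dos: it reads the whole file into memory as bytes and replaces each LF that is not already preceded by a CR, so running it twice does not produce doubled carriage returns.

```python
import re


def lf_to_crlf(data: bytes) -> bytes:
    """Convert bare LF line endings to CRLF, leaving existing CRLF pairs alone."""
    # The negative lookbehind skips any LF that already follows a CR.
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


def convert_file(path):
    """Rewrite the file at path in place with CRLF line endings."""
    # Binary mode prevents Python from translating line endings itself.
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(lf_to_crlf(data))
```

Working on bytes rather than decoded text means the document's encoding is never touched; only the line-ending bytes change.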