Hack 32 Transform an XML Document with a Command-Line Processor

   


A good number of free XSLT processors are available for transforming XML on the command line. I'll introduce three of them here: Michael Kay's Saxon, written in Java (http://saxon.sourceforge.net); Apache's Xalan, written in C++ (http://xml.apache.org/xalan-c/); and Microsoft's MSXSL, also written in C++ (search for MSXSL on http://msdn.microsoft.com). Xalan comes in both Java (http://xml.apache.org/xalan-j/) and C++ versions, but I'll cover only the C++ version.

3.3.1 Saxon

You can use Saxon in the regular Java version (saxon8.jar) or as Instant Saxon, which is a Windows executable (saxon.exe). Both are available at http://saxon.sourceforge.net. The latest (and probably last) release of Instant Saxon is Version 6.5.3, which came out in August 2003. The latest release of the regular Saxon is Version 8.0. A more recent version of Saxon will likely be available after this book goes to print, as Saxon's development is keeping pace with the drafts of XSLT 2.0 and XPath 2.0. Both Instant Saxon and Saxon are free, and Saxon is open source, although you can now also purchase a commercial version from Saxonica.

Saxon was the first spec-compliant XSLT 1.0 processor and was released 17 days after the XSLT 1.0 and XPath 1.0 recommendations were published in 1999. Saxon's creator, Michael Kay, is the editor of the XSLT 2.0 specification and one of the editors for XPath 2.0.

3.3.1.1 Instant Saxon

Download Instant Saxon from http://saxon.sourceforge.net (instant-saxon6_5_3.zip), unzip and install it, and place saxon.exe in your path. You can then display usage information for Instant Saxon by typing the following at a Windows command prompt:

saxon

You should see the usage information as shown in Example 3-1.

Example 3-1. Saxon 6.5.3 usage information
No source file name
SAXON 6.5.3 from Michael Kay
Usage: saxon [options] source-doc style-doc {param=value}...
Options:
  -a              Use xml-stylesheet PI, not style-doc argument
  -ds             Use standard tree data structure
  -dt             Use tinytree data structure (default)
  -o filename     Send output to named file or directory
  -m classname    Use specified Emitter class for xsl:message output
  -r classname    Use specified URIResolver class
  -t              Display version and timing information
  -T              Set standard TraceListener
  -TL classname   Set a specific TraceListener
  -u              Names are URLs not filenames
  -w0             Recover silently from recoverable errors
  -w1             Report recoverable errors and continue (default)
  -w2             Treat recoverable errors as fatal
  -x classname    Use specified SAX parser for source file
  -y classname    Use specified SAX parser for stylesheet
  -?              Display this message

Instant Saxon expects the name of the source document followed by the name of the stylesheet as parameters, as shown:

saxon time.xml clock.xsl

You will get this output:

11:59:59 p.m.
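Under the hood, a command like the one above loads the stylesheet, parses the source document, and serializes the result tree. Here is a minimal sketch of the same source-then-stylesheet transformation using the JAXP API (javax.xml.transform) that ships with the JDK, not Saxon's own classes. The contents of time.xml and clock.xsl are not reproduced in this hack, so the strings below are hypothetical stand-ins written to produce the same one-line result.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ClockTransform {
    // Hypothetical stand-in for time.xml; the book's actual file may differ.
    static final String TIME_XML =
        "<time><hour>11</hour><minute>59</minute><second>59</second>"
        + "<meridiem>p.m.</meridiem></time>";

    // Hypothetical stand-in for clock.xsl: writes "hour:minute:second meridiem" as text.
    static final String CLOCK_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text' encoding='UTF-8'/>"
        + "<xsl:template match='/time'>"
        + "<xsl:value-of select=\"concat(hour, ':', minute, ':', second, ' ', meridiem)\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Mirrors "saxon time.xml clock.xsl": source document first, stylesheet second.
    static String transform(String sourceXml, String stylesheetXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheetXml)));
            StringWriter result = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(sourceXml)),
                                  new StreamResult(result));
            return result.toString();
        } catch (TransformerException e) {
            // Wrap the checked exception so callers can treat failures as unchecked.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(TIME_XML, CLOCK_XSL)); // 11:59:59 p.m.
    }
}
```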

To direct Instant Saxon's output to a file, use the -o switch:

saxon -o time.out time.xml clock.xsl

You can transform a document that has an XML stylesheet PI [Hack #3] by using the -a option:

saxon -a clock.xml
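JAXP exposes the same idea as -a through TransformerFactory.getAssociatedStylesheet, which reads the xml-stylesheet PI and returns the stylesheet it points to. The sketch below is a hedged illustration with the JDK's built-in processor, not Saxon; clock.xml and clock.xsl are hypothetical stand-ins written to a temporary directory so that the PI's relative href can be resolved against the document's location.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PiTransform {
    // Hypothetical clock.xml whose xml-stylesheet PI names the stylesheet.
    static final String CLOCK_XML =
        "<?xml version=\"1.0\"?>\n"
        + "<?xml-stylesheet type=\"text/xsl\" href=\"clock.xsl\"?>\n"
        + "<time><hour>11</hour><minute>59</minute><second>59</second>"
        + "<meridiem>p.m.</meridiem></time>";

    // Hypothetical stand-in for clock.xsl.
    static final String CLOCK_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/time'>"
        + "<xsl:value-of select=\"concat(hour, ':', minute, ':', second, ' ', meridiem)\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Like "saxon -a clock.xml": the stylesheet comes from the PI, not from an argument.
    static String transformWithPi(File xmlFile) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // null media, title, and charset select the preferred (non-alternate) stylesheet.
        Source stylesheet = factory.getAssociatedStylesheet(
            new StreamSource(xmlFile), null, null, null);
        if (stylesheet == null) {
            throw new IllegalArgumentException("No xml-stylesheet PI in " + xmlFile);
        }
        StringWriter out = new StringWriter();
        factory.newTransformer(stylesheet)
            .transform(new StreamSource(xmlFile), new StreamResult(out));
        return out.toString();
    }

    // Writes both files into a temporary directory so the relative href resolves.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("pi-demo");
            dir.toFile().deleteOnExit();
            Path xsl = dir.resolve("clock.xsl");
            Path xml = dir.resolve("clock.xml");
            Files.write(xsl, CLOCK_XSL.getBytes(StandardCharsets.UTF_8));
            Files.write(xml, CLOCK_XML.getBytes(StandardCharsets.UTF_8));
            // Files registered later are deleted first, so the directory is empty by then.
            xsl.toFile().deleteOnExit();
            xml.toFile().deleteOnExit();
            return transformWithPi(xml.toFile());
        } catch (IOException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```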

To display version and timing information, use the -t option:

saxon -t time.xml clock.xsl

3.3.1.2 Full Java version of Saxon

The stable version of Saxon for XSLT 1.0 is 6.5.3. The latest version at the time of writing is 8.0, which progressively supports the working drafts of XSLT 2.0 and XPath 2.0. Saxon will most probably have gone beyond Version 8.0 by the time you read this.

You can use Saxon on any platform that supports Java. This requires a Java Runtime Environment (JRE) or Java VM, Version 1.4 or later. You can download the latest SDK or JRE from http://java.sun.com.

Download Saxon 8.0 (or later) from http://saxon.sourceforge.net and install it. You can place saxon8.jar in your classpath [Hack #10]. The following examples assume that the JAR is in your current directory.

For usage information, enter this line at a command or shell prompt:

java -jar saxon8.jar

The output shown in Example 3-2 will appear.

Example 3-2. Saxon 8.0 usage information
No source file name
SAXON 8.0 from Saxonica
Usage:  java net.sf.saxon.Transform [options] source-doc
          style-doc {param=value}...
Options:
  -a              Use xml-stylesheet PI, not style-doc argument
  -c              Indicates that style-doc is a compiled stylesheet
  -ds             Use standard tree data structure
  -dt             Use tinytree data structure (default)
  -im modename    Start transformation in specified mode
  -m template     Start transformation by calling named template
  -l              Retain line numbers in source doucment tree
  -o filename     Send output to named file or directory
  -m classname    Use specified Emitter class for xsl:message output
  -r classname    Use specified URIResolver class
  -t              Display version and timing information
  -T              Set standard TraceListener
  -TJ             Trace calls to external Java functions
  -TL classname   Set a specific TraceListener
  -u              Names are URLs not filenames
  -v              Validate source documents using DTD
  -w0             Recover silently from recoverable errors
  -w1             Report recoverable errors and continue (default)
  -w2             Treat recoverable errors as fatal
  -x classname    Use specified SAX parser for source file
  -y classname    Use specified SAX parser for stylesheet
  -?              Display this message
  param=value     Set stylesheet string parameter
  +param=file     Set stylesheet document parameter
  !option=value   Set serialization option

Now transform time.xml with clock.xsl by typing the following:

java -jar saxon8.jar time.xml clock.xsl

Send output to a file with the -o command-line option:

java -jar saxon8.jar -o time.out time.xml clock.xsl

If an XML document contains an XML stylesheet PI [Hack #3], you can transform it with the -a switch like this:

java -jar saxon8.jar -a clock.xml

Use the -t option to display version and timing information:

java -jar saxon8.jar -t time.xml clock.xsl

If the XML source document declares a DTD in a document type declaration [Hack #68], you can validate it with the -v switch:

java -jar saxon8.jar -v valid.xml clock.xsl

Instant Saxon does not have the -v option.
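The -v switch asks the processor to parse the source with a validating parser. In JAXP terms, that is a SAX parser created with setValidating(true) plus an error handler that treats validity errors as failures, since DefaultHandler ignores them unless error() is overridden. This is a minimal sketch, not Saxon's implementation; the DTD and both documents are hypothetical, and the DTD sits in an internal subset so no external file is needed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdCheck {
    // Hypothetical DTD in an internal subset.
    static final String DOCTYPE =
        "<!DOCTYPE time [\n"
        + "<!ELEMENT time (hour, minute, second, meridiem)>\n"
        + "<!ELEMENT hour (#PCDATA)>\n"
        + "<!ELEMENT minute (#PCDATA)>\n"
        + "<!ELEMENT second (#PCDATA)>\n"
        + "<!ELEMENT meridiem (#PCDATA)>\n"
        + "]>\n";

    static final String VALID = DOCTYPE
        + "<time><hour>11</hour><minute>59</minute><second>59</second>"
        + "<meridiem>p.m.</meridiem></time>";

    // Invalid: the required <meridiem> element is missing.
    static final String INVALID = DOCTYPE
        + "<time><hour>11</hour><minute>59</minute><second>59</second></time>";

    // Returns true if the document is well-formed and valid against its DTD.
    static boolean isValid(String xml) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // DTD validation, the job -v requests
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e; // make validity errors fatal instead of silently ignored
            }
        };
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("valid document:   " + isValid(VALID));   // true
        System.out.println("invalid document: " + isValid(INVALID)); // false
    }
}
```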

3.3.2 Xalan

Xalan is an open source XSLT processor developed by Apache. To use Xalan C++, you must install the C++ version of Apache's XML parser, Xerces. Both Xalan C++ and Xerces C++ are available at Apache's XML site (http://xml.apache.org). At the time of writing, Xalan C++ is at Version 1.8.0 and Xerces C++ is at Version 2.5.0.

Download and install both Xerces and Xalan, and then add the executables to your path. Both run on Windows and various flavors of Unix. After you have installed Xalan and Xerces, you can enter the following at a command prompt or in a shell:

xalan

You will then get the output shown in Example 3-3.

Example 3-3. Xalan 1.8.0 usage information
Xalan version 1.8.0
Xerces version 2.5.0
Usage: Xalan [options] source stylesheet
Options:
  -a                   Use xml-stylesheet PI, not the
                         'stylesheet' argument
  -e encoding          Force the specified encoding for the
                         output.
  -i integer           Indent the specified amount.
  -m                   Omit the META tag in HTML output.
  -o filename          Write output to the specified file.
  -p name expression   Sets a stylesheet parameter.
  -u                   Disable escaping of URLs in HTML output.
  -?                   Display this message.
  -v                   Validates source documents.
  -                    A dash as the 'source' argument reads
                         from stdin. ('-' cannot be used for both arguments.)
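Xalan's -p option in this listing, like Saxon's param=value, sets a top-level xsl:param in the stylesheet. Because Xalan's listing calls the value an expression, it is presumably evaluated as XPath, so a literal string would need its own inner quotes; check the Xalan documentation before relying on that. The JAXP analogue is Transformer.setParameter, which passes a string value as-is. In this sketch the stylesheet, its zone parameter, and the document are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ParamTransform {
    // Hypothetical stylesheet with a top-level parameter "zone" that defaults to 'UTC'.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:param name='zone' select=\"'UTC'\"/>"
        + "<xsl:template match='/time'>"
        + "<xsl:value-of select=\"concat(hour, ':', minute, ' ', $zone)\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static final String XML = "<time><hour>11</hour><minute>59</minute></time>";

    // Passing null leaves the stylesheet's default value in place.
    static String run(String zone) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            if (zone != null) {
                t.setParameter("zone", zone); // a Java String arrives as an XPath string
            }
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(null));  // 11:59 UTC
        System.out.println(run("PST")); // 11:59 PST
    }
}
```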

Transform a document with Xalan using this command:

xalan time.xml clock.xsl

To direct the result tree from the processor to a file, use the -o option:

xalan -o time.out time.xml clock.xsl

The result of the transformation is redirected to the file named time.out.

Process a document with an XML stylesheet PI [Hack #3] using Xalan's -a option like this:

xalan -a clock.xml

If an XML source document uses a DTD [Hack #68], Xalan can validate it using the -v switch:

xalan -v valid.xml clock.xsl

3.3.3 MSXSL

MSXSL is a free Win32 command-line XSLT processor from Microsoft. To get it, go to http://msdn.microsoft.com/downloads/ and search for "MSXSL." The latest version of MSXSL also requires MSXML 4.0, which you can download from the same location (search for "MSXML 4.0 Service Pack 2"). MSXSL is small (about 25 KB) and fast. You can download the source code, too.

MSXSL writes UTF-16 output by default, which does not display well in a console window. To override this, set the encoding attribute of the xsl:output element [Hack #43] in your stylesheet, as clock.xsl does.
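To see why the encoding matters, the sketch below serializes the same result twice with the JDK's JAXP processor, once with encoding='UTF-8' and once with encoding='UTF-16' on xsl:output, and compares the raw bytes. The JDK processor does not share MSXSL's UTF-16 default, so each encoding is requested explicitly; the stylesheet and document are hypothetical stand-ins for clock.xsl and time.xml.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OutputEncoding {
    // Hypothetical stand-in for time.xml.
    static final String XML =
        "<time><hour>11</hour><minute>59</minute><second>59</second>"
        + "<meridiem>p.m.</meridiem></time>";

    // Builds a hypothetical clock-style stylesheet whose xsl:output declares the encoding.
    static String stylesheet(String encoding) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text' encoding='" + encoding + "'/>"
            + "<xsl:template match='/time'>"
            + "<xsl:value-of select=\"concat(hour, ':', minute, ':', second, ' ', meridiem)\"/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
    }

    // Returns the raw bytes the serializer writes, so the effect of the encoding is visible.
    static byte[] transformToBytes(String encoding) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet(encoding))))
                .transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = transformToBytes("UTF-8");
        byte[] utf16 = transformToBytes("UTF-16");
        System.out.println(new String(utf8, StandardCharsets.UTF_8)); // 11:59:59 p.m.
        // UTF-16 spends two bytes per ASCII character (plus any byte-order mark),
        // which a console expecting one byte per character shows as garbage.
        System.out.println(utf8.length + " bytes vs " + utf16.length + " bytes");
    }
}
```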

After downloading MSXSL, place the executable msxsl.exe in your path. To display usage information for MSXSL, type the following line at a Windows command prompt:

msxsl -?

You will see this usage information (Example 3-4).

Example 3-4. MSXSL 4.0 usage information
Microsoft (R) XSLT Processor Version 4.0

Usage: MSXSL source stylesheet [options] [param=value...] [xmlns:prefix=uri...]

Options:
  -?           Show this message
  -o filename  Write output to named file
  -m startMode Start the transform in this mode
  -xw          Strip non-significant whitespace from source
               and stylesheet
  -xe          Do not resolve external definitions during
               parse phase
  -v           Validate documents during parse phase
  -t           Show load and transformation timings
  -pi          Get stylesheet URL from xml-stylesheet PI in
               source document
  -u version   Use a specific version of MSXML: '2.6', '3.0',
               '4.0'
  -            Dash used as source argument loads XML from
               stdin
  -            Dash used as stylesheet argument loads XSL
               from stdin

To transform a document using MSXSL, type:

msxsl time.xml clock.xsl

To direct output from MSXSL to a file, use the -o switch:

msxsl -o time.out time.xml clock.xsl

To transform an XML document that contains an XML stylesheet PI [Hack #3], use the -pi option:

msxsl -pi clock.xml

For timing information, use the -t switch:

msxsl -t time.xml clock.xsl

If the source document uses a DTD [Hack #68], you can validate it by using the -v option:

msxsl -v valid.xml clock.xsl



XML Hacks
XML Hacks: 100 Industrial-Strength Tips and Tools
ISBN: 0596007116
EAN: 2147483647
Year: 2006
Pages: 156
