Chapter 12. XSLT Processors, Extensions, and Java

CONTENTS

12.1 XSLT Processors
12.2 Extension Elements and Functions
12.3 Namespaces
12.4 Java
12.5 Commercial XSLT Processors

Extensions
Conformance
Namespaces
Java
Sun Microsystems XSLTC
Oracle XML suite
Microsoft MSXML

As a form of XML, XSLT is by nature extensible. Extensions are additions to the standard library of elements and functions described in the previous chapters, created and supported by various XSLT processor implementations. With extensions, the potential ways XSLT can be used are limitless.

The W3C specification for XSLT, however, does not provide a standard way to define or implement extensions, and processors are not required to implement any.^[1] This means that an XSLT stylesheet that works with one processor might well not work with others. For this reason, the XSLT element-available() and function-available() functions, and the <xsl:fallback> instruction element, can be used to test for support of extension elements and functions and to provide for contingencies.
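The test-and-fallback pattern can be tried with any current JDK, because JAXP (javax.xml.transform) ships with a default XSLT processor. The sketch below is our own, not taken from any processor's documentation: the ext prefix, the namespace http://example.com/hypothetical-ext, and the function shout() are invented, so no real processor implements them. The stylesheet asks function-available() before calling the extension and otherwise upper-cases the text with the standard translate() function.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FallbackDemo {
    // The ext namespace and its shout() function are hypothetical placeholders.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:ext='http://example.com/hypothetical-ext'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:choose>"
        // Call the extension only if this processor reports that it exists...
        + "<xsl:when test=\"function-available('ext:shout')\">"
        + "<xsl:value-of select='ext:shout(string(msg))'/>"
        + "</xsl:when>"
        // ...otherwise fall back to a standard XPath equivalent.
        + "<xsl:otherwise>"
        + "<xsl:value-of select=\"translate(msg, 'abcdefghijklmnopqrstuvwxyz',"
        + " 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')\"/>"
        + "</xsl:otherwise>"
        + "</xsl:choose>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Transforms the given XML text with XSL using the default JAXP processor. */
    static String run(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<msg>hello</msg>"));
    }
}
```

Run under a processor that really implemented ext:shout(), the same stylesheet would take the xsl:when branch instead; the stylesheet itself does not change.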

You may find that one processor or another suits your needs because of the platform on which it runs, its speed, the extensions it offers, or a combination of these. In this chapter, we will discuss three commercial products (Sun Microsystems XSLTC, the Oracle XML suite, and Microsoft MSXML), as well as Java and its installation procedures. The freeware XSLT processors Xalan, Saxon, and XT are discussed in Chapter 13. Information on these and other available tools can be obtained through the following URIs, among others:

http://xml.coverpages.org
http://oasis-open.org
http://www.xml.org

12.1 XSLT Processors

That the processors we've chosen to discuss are largely Java-based reflects only the ubiquity of the Java platform and how widely familiar its interface is. There are certainly implementations in Perl, Python, C++, and so on, accessible through the links above.

If there is any describable "methodology" behind the processors we've chosen for detailed discussion here, it is nothing more than their general popularity, the amount of discussion traffic about them on the Internet, and the likely duration of their continued development. Accordingly, we will discuss Xalan, Saxon, and XT in some detail in the following chapter. We have emphasized these freeware processors largely because of our own views on the open source movement.

The range of commercial processors is increasing almost monthly. For this reason, we have chosen to focus on freeware programs. However, we will discuss Sun Microsystems' XSLTC, Oracle's XML suite, and Microsoft's MSXML briefly in this chapter.

The remaining processors emphasized in this book are discussed in Chapter 13. We offer some detail on how to install these processors, as we have learned that many aspects of Java CLASSPATHs and other details necessary to set up an XSLT processor, which may be obvious to programmers, are not always as clear to a markup expert. The majority of each processor's discussion, however, is devoted to explaining the extensions each has.

We make no effort to evaluate the processors, though we do report any features of the W3C specification that are not implemented, along with observable aspects noted by developers, such as memory footprint and speed. This is presented only to save you the legwork of gathering and summarizing this information, not to formally endorse or advocate the respective assessments. Before we introduce the processors and their extensions, however, we'll first give a general overview of the concept of extensions and their use according to the W3C specification for XSLT, as well as the mechanics of using them in an XSLT stylesheet, regardless of which software supports them.

12.2 Extension Elements and Functions

In its simplest sense, an extension element is any element not prescribed in the W3C specification for XSLT. Of course, LREs are also not prescribed; however, LREs are not software-dependent and do not have to be namespaced. Extension elements are first visually distinguished by their namespace prefix. The same is true of extension functions, which are always namespaced. In fact, XSLT elements are namespaced too: they belong to the XSLT namespace, which is declared on the document element and is therefore in scope throughout the stylesheet. The built-in XSLT and XPath functions, by contrast, are called without a prefix. As we will see, it is possible to define extension attributes and functions, as well as extension top-level elements.

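The W3C specification only describes the use of extension instruction elements and extension functions, expressly contrasting extension elements with LREs. Specifically, an element that occurs within a template is treated as an extension instruction rather than an LRE if its namespace has been designated as an extension namespace (typically by listing its prefix in the extension-element-prefixes attribute of the document element) and that designation is in scope for the element. Notably, the W3C specification for XSLT does not define any behavior for extension top-level elements: it permits top-level elements from other namespaces, but only requires that they not change the behavior of XSLT elements and functions, so a processor is free to ignore them.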
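Declaring a namespace is not, by itself, enough to make its elements instructions; the namespace must also be designated as an extension namespace. The following Java sketch (default JAXP processor; the ext namespace is hypothetical) shows the other side of the rule: ext is declared but not listed in extension-element-prefixes, so the processor copies <ext:note> to the result tree as an ordinary LRE.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class LiteralVsExtension {
    // ext is declared but NOT listed in extension-element-prefixes,
    // so <ext:note> is a literal result element, not an instruction.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:ext='http://example.com/hypothetical-ext'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<ext:note>hi</ext:note>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs XSL over the given XML with the default JAXP processor; returns the serialized result. */
    static String run(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The element, with its namespace declaration, is copied to the output.
        System.out.println(run("<doc/>"));
    }
}
```

Adding extension-element-prefixes="ext" to the xsl:stylesheet element would make <ext:note> an extension instruction instead. A processor with no implementation of it must then instantiate its xsl:fallback children when the element is reached, or signal an error if there are none.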

Nonetheless, processors such as Saxon have added extension top-level elements, so we discuss them in this chapter as well. Remember, the uses and behavior of these extensions are processor-specific and not, as of XSLT 1.0, prescribed by the W3C specification.

12.3 Namespaces

Namespaces identify the particular markup vocabulary from which element-type names and attribute names are derived. They are important because, for example, when using links, the idea of embedding linked content from other documents raises the problem of possible duplicate element-type names and attribute names with different meanings. For instance, if the element-type name "body" was used in two different ways, such as by an auto parts manual and by a physician's desktop reference, how should this be handled?

It is essential that element-type names and attribute names be distinguishable in the way that, for example, the particular meaning of element in chemical element is distinguishable from its meaning in markup element. Another way to think of the issue is in terms of surnames. Lots of folks are named John, and the somewhat unsavory uses of john can be an inconvenience (especially for those named John); but a surname (like Gardner or Jacobjingleheimerschmidt) disambiguates "a john" from "John Smith," "John Doe," or "the john." Namespaces play the role of the surname: they make otherwise identical names uniquely identifiable.

12.3.1 Theory of a Namespace

A namespace is, well, a space for a name. Element-type names and attribute names are qualified with a prefix, a sort of short name or alias for the namespace. Each namespace must be declared either on the element that uses its prefix or on an ancestor of that element. The declaration contains a URI that identifies the namespace; the URI may, but need not, point to something that describes or regulates that namespace.

Note

The prefix is necessary because characters that URIs routinely contain, such as / and #, are not allowed in XML names; a URI used directly as part of an element or attribute name would make the document not well-formed.

Any XSLT element-type name is qualified with a namespace, and as such, it is called a QName (qualified name). A QName is a name optionally preceded by a prefix and a separating colon, as in xsl:template. If an element-type name has a colon (:), you can be sure it is a prefixed QName and, accordingly, indicates an element with a namespace. A name that contains no colon is called a no-colon name, or NCName; an unprefixed element-type name is therefore both an NCName and a QName without a prefix (see Section 12.3.4).

When declaring a namespace, you must specify what the prefix of the namespace will be and the namespace name, which is a URI (Uniform Resource Identifier) reference that, at least for human readers, identifies the namespace by its lineage or pedigree. The URI could also point to a DTD or Schema for the declared namespace.

Note

We are deliberately avoiding what is and has been a huge topic of debate about the declaration of a namespace. There are some who feel the URI must always resolve to a particular DTD or schema, others feel it never should, and a range of passionate opinions fall between. The Namespace Specification allows for a particular namespace URI to resolve to a "place" such as a Web resource as well as for it not to resolve to a place. Quoting from the Namespace Specification: "The namespace name, to serve its intended purpose, should have the characteristics of uniqueness and persistence. It is not a goal that it be directly usable for retrieval of a schema (if any exists)."

A namespace declaration plays a role loosely similar to that of a DOCTYPE declaration, but it appears in a different part of the XML document instance: a DOCTYPE declaration sits in the prolog, before the document element, whereas a namespace declaration is an attribute on an element. The examples in this chapter concern the declarations of XSLT namespaces, but those found in XML documents are similar. It is not the purpose of this chapter to provide a comprehensive guide to writing namespaces, but rather, to enable you to effectively recognize and use them with XSLT. Namespaces become very important for distinguishing the extensions to XSLT that different XSLT processors implement, as the rest of this chapter shows.

12.3.2 Anatomy of a Namespace

A namespace must be declared before it can be used. Declaring a namespace such as xmlns:xsl="http://www.w3.org/1999/XSL/Transform" defines a prefix and the value of the namespace name, which is usually a URI identifying the namespace owner. The declaration begins with the reserved attribute name xmlns, followed by the : separator. Immediately following the separator is the prefix being defined (xsl, in this example), followed by an equal sign, which separates it from the quoted address or URI of the namespace.

The address or URI of the namespace is only used to identify the namespace; whether it resolves to a location or a schema is irrelevant. It is only intended to be a unique value that can be used to differentiate elements from two separate sources that may happen to have the same name.
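A namespace-aware parser takes a declaration apart in exactly these terms. This small Java sketch, which uses the JDK's DOM parser (the class name DeclarationAnatomy is our own), parses an empty xsl:stylesheet element and prints the prefix, the local part, the namespace URI the prefix was bound to, and the declaration itself, which the DOM exposes as an attribute named xmlns:xsl.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DeclarationAnatomy {
    static final String SAMPLE =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'/>";

    /** Parses xml with namespace processing on and returns its document element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // off by default in JAXP
            return f.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot(SAMPLE);
        // The element-type name, split by the parser into prefix and local part:
        System.out.println("prefix     = " + root.getPrefix());
        System.out.println("local name = " + root.getLocalName());
        // The URI the prefix was bound to by the declaration:
        System.out.println("namespace  = " + root.getNamespaceURI());
        // The declaration itself, visible as an attribute named xmlns:xsl:
        System.out.println("xmlns:xsl  = " + root.getAttribute("xmlns:xsl"));
    }
}
```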

Figure 12-1 illustrates an example of a namespace declaration. Notice that what appears to be a Web address (or URI) serves as the name of the namespace. In fact, this particular namespace name is a real Web address; if you type it into a browser, you'll get a Web page on the W3C Web site. That page does not control the content of the document or how it is processed, and an XSLT processor never needs to retrieve it. The Namespaces Recommendation simply does not prevent a namespace URI from also being a resolvable address.

Figure 12-1. Anatomy of a namespace declaration.


A namespace prefix, other than the predeclared xml prefix and the reserved xmlns, must be declared in the element it is used in or in an ancestor of that element. The xmlns namespace declaration attribute is therefore allowed on any XML element. To actually use a namespace in an element tag, the prefix (xsl, in this example) appears before the element-type name, with a colon (:) between them:

<xsl:template match="body">
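Scope can be checked the same way. In this Java sketch (JDK DOM parser; the helper names are our own), xt is declared only on the document element, yet DOM Level 3's lookupNamespaceURI() resolves it from inside the nested xsl:template, because the declaration is on an ancestor. A fragment that uses the xsl prefix with no declaration in scope is rejected by the namespace-aware parser as not well-formed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceScope {
    static final String XSLT_NS = "http://www.w3.org/1999/XSL/Transform";

    // xt is declared on the document element only.
    static final String SHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='" + XSLT_NS + "'"
        + " xmlns:xt='http://www.jclark.com/xt'>"
        + "<xsl:template match='body'><p/></xsl:template>"
        + "</xsl:stylesheet>";

    /** Namespace-aware parse; a fatal error such as an unbound prefix throws SAXException. */
    static Document parse(String xml) throws SAXException {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            DocumentBuilder b = f.newDocumentBuilder();
            b.setErrorHandler(new DefaultHandler()); // rethrows fatal errors, prints nothing
            return b.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Resolves a prefix from inside the xsl:template element of SHEET. */
    static String uriSeenByTemplate(String prefix) {
        try {
            Element template = (Element) parse(SHEET)
                .getElementsByTagNameNS(XSLT_NS, "template").item(0);
            return template.lookupNamespaceURI(prefix); // checks the element, then its ancestors
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if xml parses, i.e. every prefix it uses is declared in scope. */
    static boolean prefixesDeclared(String xml) {
        try {
            parse(xml);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(uriSeenByTemplate("xt"));
        System.out.println(prefixesDeclared("<xsl:template match='body'/>"));
    }
}
```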

12.3.3 Default Namespace

One namespace in any XSLT stylesheet never needs to be declared: the XML namespace is implicitly bound to the prefix xml (see Section 12.3.7) and can be used anywhere in the stylesheet (as seen in the xml:space attribute). The default namespace is a different matter. It is the namespace that applies to element-type names written without a prefix, and until one is declared, unprefixed elements are in no namespace at all. You set the default namespace by writing a namespace declaration without a prefix, that is, as a plain xmlns attribute.

In Example 12-1, the XHTML^[2] namespace is declared and made the default namespace of the document by declaring it with a plain xmlns attribute, without a prefix.

Example 12-1 An LRE stylesheet declaring the XHTML namespace as the default namespace.

<html xsl:version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>My document title.</title>
  </head>
  <body>
    <p>My document content.</p>
  </body>
</html>
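To see the default namespace at work, the Java sketch below (JDK DOM parser; class and method names are our own) parses a copy of Example 12-1 written with the XHTML 1.0 namespace name, http://www.w3.org/1999/xhtml. The unprefixed <p> element ends up in whatever namespace the plain xmlns attribute on <html> declares, while the prefixed xsl:version attribute belongs to the XSLT namespace.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DefaultNamespaceDemo {
    // Example 12-1, written with the XHTML 1.0 namespace name.
    static final String EXAMPLE =
        "<html xsl:version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns='http://www.w3.org/1999/xhtml'>"
        + "<head><title>My document title.</title></head>"
        + "<body><p>My document content.</p></body>"
        + "</html>";

    /** Parses EXAMPLE with namespace processing on and returns <html>. */
    static Element root() {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                .parse(new InputSource(new StringReader(EXAMPLE)))
                .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** The default namespace in scope on <html> (a null prefix asks for the default). */
    static String declaredDefault() {
        return root().lookupNamespaceURI(null);
    }

    /** The namespace the unprefixed <p> element actually ended up in. */
    static String namespaceOfP() {
        Element p = (Element) root().getElementsByTagNameNS("*", "p").item(0);
        return p.getNamespaceURI();
    }

    /** The namespace of the prefixed xsl:version attribute. */
    static String namespaceOfVersion() {
        return root().getAttributeNode("xsl:version").getNamespaceURI();
    }

    public static void main(String[] args) {
        System.out.println("default namespace on <html>: " + declaredDefault());
        System.out.println("namespace of <p>:            " + namespaceOfP());
        System.out.println("namespace of xsl:version:    " + namespaceOfVersion());
    }
}
```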

12.3.4 Qualified Names (QNames) and No-Colon Names (NCNames)

The Namespaces specification defines two rather esoteric kinds of names: QNames and NCNames. An NCName is a no-colon name, or a name without a colon, while a QName may contain a single colon separating a prefix from a local part. Strangely enough, each part of a qualified name (the parts around the colon) is an NCName. So if iowa:season is a QName, iowa and season individually are both NCNames. However, each plays its own role: iowa is the prefix and season is the local part.

A QName has two parts: an optional prefix and a local part. The prefix is expanded into a namespace URI using the namespace declaration in scope for that prefix. The local part is the name of the object in the document instance. The prefix and the local part are separated by a colon. With or without a prefix, both element-type names and attribute names are QNames.
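The expansion from prefix to URI is easy to model with the standard javax.xml.namespace.QName class. In this sketch, the bindings map stands in for the namespace declarations in scope, and the namespace URI http://example.com/iowa is made up. Because QName equality compares only the namespace URI and the local part, iowa:season and ia:season count as the same name when both prefixes are bound to the same URI. Unprefixed names are given no namespace, which mirrors XSLT's rule for the names of its internal objects.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.QName;

class QNameExpansion {
    /** In-scope bindings for the example; the Iowa namespace URI is hypothetical. */
    static Map<String, String> sampleBindings() {
        Map<String, String> m = new HashMap<>();
        m.put("iowa", "http://example.com/iowa");
        m.put("ia", "http://example.com/iowa"); // a second prefix for the same URI
        return m;
    }

    /** Splits a lexical QName at its colon and expands the prefix into a namespace URI. */
    static QName expand(String lexical, Map<String, String> bindings) {
        int colon = lexical.indexOf(':');
        if (colon < 0) {
            // No prefix: no namespace (a default namespace is deliberately not applied).
            return new QName(XMLConstants.NULL_NS_URI, lexical);
        }
        String prefix = lexical.substring(0, colon);
        String uri = bindings.get(prefix);
        if (uri == null) {
            throw new IllegalArgumentException("Undeclared prefix: " + prefix);
        }
        return new QName(uri, lexical.substring(colon + 1), prefix);
    }

    public static void main(String[] args) {
        Map<String, String> b = sampleBindings();
        QName a = expand("iowa:season", b);
        QName c = expand("ia:season", b);
        System.out.println(a);           // prints {http://example.com/iowa}season
        System.out.println(a.equals(c)); // prints true: equality ignores the prefix
    }
}
```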

The name of an XSLT object is the combination of the namespace URI that its prefix expands to (if a prefix is used) and the local part of the QName; this combination is referred to as the expanded name. If a QName does not have a prefix, the local part alone is considered the name of the XSLT object: the default namespace declared in the stylesheet is not applied to QNames that do not have prefixes. This discussion of names and prefixes may be quite confusing, but suffice it to say that the names of most objects in XSLT are governed by the XML naming rules for names:

A name must start with either a letter or an underscore character (_).^[3]
The remaining characters in a name must be one of the following:
- Letter
- Digit
- Period (.)
- Dash (-)
- Underscore (_)
- Combining characters
- Extenders

Combining characters and extenders are special characters derived from the Unicode character database (listed in Appendix B of the XML specification).
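The naming rules translate into a short check. The Java sketch below is an approximation under a stated assumption: it uses Java's Unicode categories (Character.isLetter and the combining-mark categories) in place of the exact character classes in Appendix B of the XML specification, and it names only U+00B7 among the extenders explicitly, so it can disagree with a real parser on rare characters. It also applies the Namespaces rule that a QName contains at most one colon, with an NCName on each side.

```java
class NameRules {
    /** Approximates XML's name-start rule: a letter or an underscore. */
    private static boolean isNameStart(char c) {
        return Character.isLetter(c) || c == '_';
    }

    /** Approximates XML's name characters (letter, digit, . - _, combining, extender). */
    private static boolean isNameChar(char c) {
        int type = Character.getType(c);
        return Character.isLetterOrDigit(c)
            || c == '.' || c == '-' || c == '_'
            || type == Character.NON_SPACING_MARK        // combining characters
            || type == Character.COMBINING_SPACING_MARK
            || c == '\u00B7';                            // middle dot, an XML extender
    }

    /** A no-colon name: a name-start character followed by name characters. */
    static boolean isNCName(String s) {
        if (s.isEmpty() || !isNameStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!isNameChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** A QName: an NCName, or two NCNames joined by exactly one colon. */
    static boolean isQName(String s) {
        int colon = s.indexOf(':');
        if (colon < 0) {
            return isNCName(s);
        }
        return colon == s.lastIndexOf(':')
            && isNCName(s.substring(0, colon))
            && isNCName(s.substring(colon + 1));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"season", "iowa:season", "2nd", "_x.y-z", "a:b:c"}) {
            System.out.println(s + "  NCName=" + isNCName(s) + "  QName=" + isQName(s));
        }
    }
}
```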

In addition to element-type names and attribute names, internal XSLT objects must be named using QNames. Internal XSLT objects are defined by the XSLT specification as one of six types: specifically, a named template, a mode, an attribute set, a key, a decimal-format, and a variable or parameter.

12.3.5 The XSL Namespace

The document element (<xsl:stylesheet> or <xsl:transform>) of an XSLT stylesheet contains a namespace declaration that identifies the document as an XSLT stylesheet. The namespace name is always the same: http://www.w3.org/1999/XSL/Transform. Once the namespace has been declared, the processor recognizes XSLT elements by that namespace URI; the prefix bound to it, by convention "xsl", is what you see in the element-type name. Notice from the examples used previously that each instruction and top-level XSLT element contains the "xsl" prefix followed by a colon, as in <xsl:template>. If these elements did not have the prefix, the processor would treat them as LREs, and this would either invalidate the stylesheet or send the LREs to the output.

The declaration of the XSL namespace for any XSLT stylesheet occurs as an attribute in the document element. The prefix conventionally used for the XSLT namespace is xsl, reflecting the relationship of XSLT to XSL (XSLT is part of XSL). The declaration is made as follows:

<xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="1.0">

Notice in this case that the namespace is actually being declared in an attribute on an element that is already using the prefix. Although it may appear that the prefix is being used before it is declared, this is a perfectly valid way to declare and use a namespace.
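Because a processor matches XSLT elements by namespace URI, the prefix itself is only a convention. This Java sketch (default JAXP processor; the example is our own) binds the XSLT namespace to the prefix t instead of xsl, and the stylesheet runs just the same.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PrefixIndependence {
    // The XSLT namespace URI bound to the unconventional prefix "t".
    static final String XSL =
        "<t:stylesheet version='1.0' xmlns:t='http://www.w3.org/1999/XSL/Transform'>"
        + "<t:output method='text'/>"
        + "<t:template match='/'>"
        + "<t:value-of select=\"concat(name(*), ': ', *)\"/>"
        + "</t:template>"
        + "</t:stylesheet>";

    /** Transforms the given XML text with XSL using the default JAXP processor. */
    static String run(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the document element's name and its string value.
        System.out.println(run("<season>spring</season>"));
    }
}
```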

The namespace declaration begins with the reserved namespace declaration attribute name xmlns. Namespace-aware XML parsers, and therefore XSLT processors, recognize an xmlns attribute as a namespace declaration rather than an ordinary attribute. The following attribute model definition shows the appropriate structure of the xmlns attribute.

ATTRIBUTE: xmlns:xsl CDATA #FIXED VALUE = "http://www.w3.org/1999/XSL/Transform"

The prefix xmlns is used only for namespace bindings and is not itself bound to any namespace name. (W3C REC-xml-names-19990114, Namespaces in XML, Section 4)

After xmlns and the colon comes the prefix that will be used for the given namespace, followed by an equals sign and the value of the attribute, which is a URI.

Note

The prefix is conventionally xsl, not xslt. XSLT is one part of the larger XSL family; the rest of XSL defines a vocabulary of formatting objects. Those formatting objects are not in the XSLT namespace: they use their own namespace, http://www.w3.org/1999/XSL/Format, conventionally with the prefix fo.

12.3.6 Using Other Namespaces

XSLT, as an "extensible" language, can be extended to include elements other than those in the base set specified by the W3C. For instance, James Clark's XT has several functions that were added beyond the basic set specified in version 1.0 of the XSLT specification. Because these additional functions are commonly used, we might want to declare a namespace for XT extensions in the document element as follows:

<xsl:stylesheet
         xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
         version="1.0"
         xmlns:xt="http://www.jclark.com/xt">

Thereafter, any extension elements used in the stylesheet would have the xt namespace prefix, while the normal XSLT elements would still have the xsl namespace prefix. Because the XT namespace is declared on the document element, all elements in the document can use the declared namespace if required.

12.3.7 The Default XML Namespace

XSLT stylesheets, being XML documents, have the XML namespace built in: the prefix xml is bound to it automatically and never needs to be declared.

There are two XML-specific attributes that use the XML namespace and can be used in XSLT: xml:lang and xml:space. The xml:space attribute is discussed in detail in Section 2.1.3.1 in connection with the document element. The xml:lang attribute is not treated specially by XSLT, since there is an XSLT-specific attribute (lang) that covers the same functionality. The XSLT lang attribute is discussed in Chapter 9.

12.3.8 Declaring the Extension Namespace and Its Applicability

As elsewhere in this book, the easiest way to explain how an extension element namespace applies is to look at the logical structure of the XSLT stylesheet as an XML document instance. When an extension element namespace is declared on the document element, that namespace declaration applies to the entire stylesheet. It does not, however, apply to imported or included stylesheets, which must declare the namespace themselves. Thus, if you wanted to use James Clark's XT extensions throughout an XSLT stylesheet, you could declare the namespace for XT on the document element, as shown below.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0"
                xmlns:xt="http://www.jclark.com/xt">

If you were using only one particular extension element in the context of only one template rule, you might declare the namespace only on that element, as shown below using the Saxon namespace.

<xsl:template match="something" saxon:trace="yes"
      xmlns:saxon="http://icl.com/saxon">

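If your XSLT stylesheet uses <xsl:fallback> with element-available() and function-available() to provide contingencies for unsupported elements or functions, you will most likely also use the extension-element-prefixes attribute on the document element. Its value is a whitespace-separated list of the prefixes of every extension namespace the stylesheet uses, and it designates those namespaces as extension namespaces, so that their elements are treated as instructions rather than LREs and all the processors you plan for are covered. Combined with fallbacks, this makes an XSLT stylesheet far more portable.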

It is important to remember that a conforming XSLT processor must not signal an error merely because a stylesheet contains an extension element or function that it does not support; an error arises only if the unsupported extension is actually used without a fallback. Therefore, as you are fine-tuning or debugging a stylesheet, be sure to check that any extensions you are using are supported before assuming it's a deficiency in your stylesheet composition skills! At the same time, because some XSLT processors do not support the entire W3C specification, but may also have their own extension functions and/or elements, it can be confusing to trace the source of incorrect output. Comparing your chosen components to those supported by your particular processor is the first step in resolving an issue.

12.3.9 Processor Extensions, Java Additions, and Future W3C XSLT Specifications

In spite of the title of this section, we are not forecasting the future of XSLT. However, there is considerable overlap between the wish list for future XSLT specifications, both as publicly discussed and as formally listed in the XSLT specification, and the actual extensions provided by most mainstream XSLT processors.

The specification lists some 22 targeted goals as possible future additions to the current XSLT W3C specification. Among them are items such as IDREF functionality, which could serve, for instance, as a complement to the id() function. Other items are DTD and/or schema support, entity reference support, conditional expressions, case-insensitive comparisons, increased access to RTFs, and more.

However, the extensibility of XSLT means that "the future is now" for some wish-list items, depending on what processor you choose. One of the most frequently requested and widely implemented extensions to XSLT is the ability for a single XSLT stylesheet to create several different output files. For instance, Xalan implements this as <xalan:write>, Saxon as <saxon:output>, and XT as <xt:document> extension instruction elements. This raises the issue of portability, because the same functionality is available from each XSLT processor, but in different syntax. Thus, a portable XSLT stylesheet would need to use <xsl:fallback> to support the various processors on which it might be run and to ensure comparable results. These types of shared extensions, as well as the unique offerings of specific XSLT implementations, are discussed individually with the respective processors in which they are implemented.

Another item on the wish list is a common Java binding mechanism for external functions written in Java. Currently, four of the processors we discuss have very similar approaches to this: Xalan (with a few variations), Saxon, XT, and Oracle. The details of each are best explained in the product documentation that accompanies them. Additional information can be obtained through the respective Web sites for each XSLT processor.

12.3.10 Conforming XSLT Processors and the OASIS XSLT Conformance Committee

Throughout this book, we frequently use the phrase "conforming XSLT processor," or "processor that conforms to the W3C specification for XSLT." Just what does conforming mean, and who decides whether the processor conforms?

Until very recently, there has been no industry-accepted way to label an XSLT processor "conforming." However, one of the ongoing efforts of the Organization for the Advancement of Structured Information Standards (OASIS) has been the chartering of conformance committees and the establishment of conformance testing parameters.

These conformance committees are not evaluative bodies in the sense that the committee members pass judgment on one software implementation or another. Quite the contrary: the committees are charged with detailed research into the full implications of what a W3C specification means, in both the behavior it prescribes and the behavior it proscribes for any software implementation. From this research is derived a set of conformance tests, along with prescriptions for how they are to be run and how their results are to be evaluated. Implementers and users may then apply these tests to their own software, or to the software they are evaluating, to determine levels of conformance and desirability. It is not a speed test or a memory-footprint test so much as a finely tuned filter that reveals, in subtle detail, how each implication of the specification is or is not met by a given software implementation.
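The core idea of a conformance test can be sketched in a few lines: run a stylesheet on an input document and compare the output with a reference result. The Java harness below is purely illustrative and is not the OASIS suite, whose catalog format and test cases are not shown here. Its three cases are our own, rely only on XPath 1.0 number, string, and boolean conversion rules, and run on whatever processor JAXP supplies by default.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class MiniConformanceRunner {
    /** One case: a stylesheet, an input document, and the expected text output. */
    static final class TestCase {
        final String name;
        final String xsl;
        final String xml;
        final String expected;

        TestCase(String name, String xsl, String xml, String expected) {
            this.name = name;
            this.xsl = xsl;
            this.xml = xml;
            this.expected = expected;
        }
    }

    /** Wraps a template body in a minimal text-output stylesheet. */
    static String textSheet(String body) {
        return "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'>" + body + "</xsl:template>"
            + "</xsl:stylesheet>";
    }

    static String transform(String xsl, String xml) throws TransformerException {
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)))
            .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    /** Names of the cases whose actual output differs from the expected output. */
    static List<String> failures(List<TestCase> cases) {
        List<String> failed = new ArrayList<>();
        for (TestCase c : cases) {
            String actual;
            try {
                actual = transform(c.xsl, c.xml);
            } catch (TransformerException e) {
                actual = null; // a processor error counts as a failure
            }
            if (!c.expected.equals(actual)) {
                failed.add(c.name);
            }
        }
        return failed;
    }

    static List<TestCase> sampleCases() {
        return Arrays.asList(
            new TestCase("string-length",
                textSheet("<xsl:value-of select=\"string-length('abc')\"/>"), "<a/>", "3"),
            new TestCase("number-to-string",
                textSheet("<xsl:value-of select='1 div 4'/>"), "<a/>", "0.25"),
            new TestCase("boolean-to-string",
                textSheet("<xsl:value-of select='1 = 1'/>"), "<a/>", "true"));
    }

    public static void main(String[] args) {
        System.out.println("Failures: " + failures(sampleCases()));
    }
}
```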

Currently, G. Ken Holman of Crane Softwrights Ltd., a leading XSLT developer and the author of a comprehensive, regularly updated resource on XSLT, is the chair of the XSLT/XPath Conformance Technical Committee. The seats on the committee are filled by representatives from the National Institute of Standards and Technology (NIST), an agency of the U.S. government; by established experts in markup technology such as Mulberry Technologies, which, among other things, is home to the XSL-list email list (http://www.mulberrytech.com/); and by representatives from industry, such as IBM and Sun Microsystems. These are collaborative efforts of voluntary service that serve the wider needs of the XML community. For instance, a rich set of thousands of detailed XSLT tests was contributed through David Marston, the IBM/Lotus representative, while many tests are being received from other sources, such as Sun and Microsoft.

The test suite, which will be available at http://www.oasis-open.org/, allows users and corporations to access an independent, nonprofit-generated set of tests for evaluating XSLT software under consideration for use in whatever XML deployment they are designing. Similarly, programmers of XSLT processors have a ready benchmark by which to progressively evaluate the robustness of their implementations. OASIS will ultimately address the gamut of XML-related technologies and provide a valuable resource to the information industry. You are encouraged to utilize the committee's work in evaluating software. It is a useful learning exercise as well to run these tests, as they cover the specification in greater detail than does any single publication on XSLT or XPath.

12.4 Java

Each implementation of XSLT that we've chosen to emphasize is described according to three basic characteristics. First, the installation is explained in sufficient detail that users unfamiliar with, for example, Java CLASSPATHs, will still be able to take advantage of the software that requires them. Second, the various special extensions added by each package are presented. Finally, any caveats or unimplemented features are noted, not as an evaluation, but as a summary of frequently discussed aspects of the software that are useful in planning how to work with XSLT stylesheets and how to utilize XSLT processing software.

In Section 12.4.1, we offer an opportunity to learn to install the JDK (Java Development Kit) that is required for most XML processing software, especially for XSLT. It is too often the case that packages come with the somewhat unhelpful line "Install JDK 1.1.6 or later" and little else. This means chasing around the Internet to find the JDK, figuring out which version to use, making sense of PATHs vs. CLASSPATHs, and often never even getting Java, let alone an XSLT processor or XSLT stylesheet, running. This book does not take for granted that everyone already knows everything about Java. If you already know the JDK, skip Section 12.4.1; if you don't, we hope it helps, as we wrote it out of our own frustration with too much Java-related technology being taken for granted and too little help available.

12.4.1 Getting Java Going on Your Solaris/UNIX, Macintosh, or Windows Machine

The processors emphasized here all work predominantly with the JDK. Most assume or prefer JDK 1.1.6 or later, and some work best with JDK 1.2. In general, you are almost always safest with JDK 1.1.8 or later, but many XSLT processors are more convenient or simpler to set up if you have JDK 1.2 (also called, somewhat confusingly, Java 2). For example, Java 2 includes Swing, so you do not need to install it separately to use the GUI for Xalan designed by Eric Lawson.

In essence, you are going to learn how to install a Java Virtual Machine, sometimes called a Java VM, or JVM. The JVM and JDK are not necessarily the same thing, and the nuances are important if you are programming in Java. They are not pivotal for XSLT. All you really need is a general idea what the JDK does on your system so that you can get the most out of your XSLT processor. In short, think of the JDK as being like your operating system.

If you work on Solaris, the JDK is an environment, so to speak, in which Java applications can run. The same thing applies to a Macintosh or Windows machine; the JDK is a "layer" on top of the basic operating system. The JDK runs like an application on top of the Mac OS (operating system) or the Windows 95, 98, or NT operating system. Then a Java application (in this case, the XSLT processor) runs on top of that. If it sounds complex, don't worry; the beauty of Java is that you are largely insulated from this, and a misbehaving Java application is much less likely to crash your machine.

It may prove worthwhile to add a little memory to your machine, because these layers do take up space, especially if the XML document instances you are transforming with your XSLT stylesheets are large. At a minimum, 64MB of RAM is needed for Windows, and realistically, not less than 128MB is required. The same minimum applies to the Macintosh, but realistically, unless you have a lot of time on your hands to wait while it runs, you need 128MB or more. Mac OS 7.6 or later is a bare minimum, but you should really have 8.1 or later. You can always use virtual memory, but it is only as fast as your hard drive, because that is what virtual memory uses.

For Solaris and UNIX, you are usually on a shared network, so this is less of a problem, but you are going to want to carve out a little extra memory (we'll show you how) when you run the XSLT processor. If you have a Sun Ultra, for instance, don't even try to work with less than 64MB of RAM, and again, realistically, you will need 128MB (just using swap doesn't help because, again, virtual memory is as slow as your hard drive).
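To confirm what a memory setting actually did, a tiny Java program can report the heap ceiling of the JVM it runs in. This sketch (HeapCheck is our own name) uses Runtime.maxMemory(), which appeared in Java 1.4, so it will not run on the older JDKs discussed here. On Java 2 and later JVMs, the ceiling is raised with the -Xmx option, as in java -Xmx128m HeapCheck; the JDK 1.1 java command used -mx instead.

```java
class HeapCheck {
    /** The maximum heap this JVM will try to use, in megabytes. */
    static long maxHeapMegabytes() {
        // maxMemory() returns Long.MAX_VALUE if the JVM reports no limit.
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Maximum heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it once with and once without -Xmx is a quick way to see whether the option reached the JVM you think you are configuring.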

Another advantage is that one Java application runs on Java for Mac, Java for Windows, or Java for UNIX, largely transparently. That's why you will read about Java parsers or XSLT processors more often than about Windows or Macintosh processors; a Java XSLT processor can, for the most part, run on any of them.

So, to get the benefits of Java and to start using XSLT processors, you will need to get the JDK that is appropriate for your machine. Once you have that "Java operating system" ready, you can run all the Java XML processing software you want. That's a great advantage of Java too; once the JVM is installed, you've instantly made a whole new world of software available to yourself much or most of which is also free!

12.4.1.1 Accessing a Java Virtual Machine

If you are on a Solaris or UNIX system, chances are there is already a JVM installed (if not, there are instructions in Section 12.4.1.2). Contact your system administrator to find out where it is on the system so you know in what directory to look for the settings described below for each XSLT processor. Usually, it will be the location of the Java classes, so you can set a Java CLASSPATH to point to them.

You can also find the JVM yourself with a little persistence. Using a command-line window (terminal, in Solaris), type java. If a set of instructions for running Java comes up, you are all set. If not, you will need to do a couple more things.

Try to find the JVM by changing into your root directory. Type cd / at the command line.
Check which version of Java, if any, is installed by typing java -version.
Type find . -name 'java' -print.
Set your environment to point to the path by typing:
```
setenv CLASSPATH /java/classes/classes.zip
```
Note that this assumes you are using csh, not ksh or Bourne shell. The path shown here depends on where Java is installed on your system.
If you know where your XSLT processor files are, you can do all this in one step by separating each class path entry with a colon (:), as below for an installation of XT where the XT files are all in your /usr/bin/xt directory.
1. For pre-JDK 1.2:
```
setenv CLASSPATH /usr/bin/xt/sax.jar:/usr/bin/xt/xt:/java/classes/classes.zip
```
2. For JDK1.2+:
```
setenv CLASSPATH /usr/bin/xt/sax.jar:/usr/bin/xt/xt:.
```
You can also make this permanent if you're comfortable editing your .cshrc file; just add the same setenv line to it. (Use setenv rather than set: in csh, set creates a shell variable that is not passed on to the java command. Remember, too, that /java/classes/classes.zip is needed only before JDK 1.2; with JDK 1.2 and later, a period (.) for the current directory takes its place.)
1. For pre-JDK 1.2:
```
setenv CLASSPATH /usr/bin/xt/sax.jar:/usr/bin/xt/xt:/java/classes/classes.zip
```
2. For JDK 1.2+:
```
setenv CLASSPATH /usr/bin/xt/sax.jar:/usr/bin/xt/xt:.
```
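If a processor still reports missing classes after you set CLASSPATH, it can help to see exactly what the JVM received. The following is a minimal sketch, not part of any XSLT distribution; the class name ClasspathCheck is hypothetical, and it assumes a Java 7 or later compiler because it uses generics with the diamond operator. It splits the standard java.class.path system property on File.pathSeparator (the colon on Solaris/UNIX, the semicolon on Windows) and flags entries that do not exist on disk.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists each CLASSPATH entry the JVM was started with
// and flags entries that do not exist on disk, a common cause of
// "class not found" errors when running an XSLT processor.
final class ClasspathCheck {

    // Splits a classpath string on the platform separator
    // (':' on Solaris/UNIX, ';' on Windows) and drops empty entries.
    static List<String> entries(String classpath) {
        List<String> result = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String classpath = System.getProperty("java.class.path", "");
        for (String entry : entries(classpath)) {
            boolean exists = new File(entry).exists();
            System.out.println((exists ? "ok       " : "MISSING  ") + entry);
        }
    }
}
```

With the JDK 1.2+ settings above, the trailing period puts the current directory on the classpath, so compiling this with javac ClasspathCheck.java and running java ClasspathCheck in the same shell lists each entry along with its status.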

If you work with Internet Explorer 5.0 or later (actually 4.0 or later, but we recommend 5.0), there is a JVM built in, which can be used with either Instant Saxon or the Windows version of XT. In these two cases, the Saxon or XT processor comes preconfigured to access the JVM that is part of IE. You can download IE 5.0 at http://www.microsoft.com. The instructions for using Instant Saxon or XT Windows version are quite simple once you have IE 5.0, which is free and automatically installs all you need if you follow the default settings in the installation wizard.

12.4.1.2 Setting Up a JVM on Solaris/UNIX (Linux, too)

Most of these installations are very similar, so we will generalize here. When you choose the specific package you need, detailed instructions will lead you through the particulars.

The most important thing here is to be sure you get the right version of the JVM when you download. It takes a few additional steps to work in UNIX or Solaris, but it's still quite easy and should not take more than an hour with a reliable, fast Ethernet Internet connection. If you skipped Section 12.4.1.1, reading it could save you a lot of time if, as is highly likely, the JVM is already on your system.

If not, go to the Sun site at http://www.java.sun.com and download the JDK file (1.1.8 or later). Get the SPARC or Intel Solaris version as needed. For instance, with Java 2, you might download the Solaris SPARC file 1.2.2_05_jdk_sparc.tar.Z. Root privileges are not necessary to install or run Java.

Choose a directory in which to install, such as the common /usr/local/bin or /usr/bin or /local/bin (sometimes Solaris installs work most transparently when you use /opt), and copy the files into it if they are not already there.
Use uncompress 1.2.2_05_jdk_sparc.tar.Z to decompress the file, then use tar xvf 1.2.2_05_jdk_sparc.tar to unarchive, or untar, it.
Untarring the file creates all necessary directories and subdirectories.
Set your environment permanently by changing your .cshrc file; for instance, if you are installing into /usr/bin, add the following line to your .cshrc.
1. For pre-JDK 1.2:
```
setenv CLASSPATH /usr/bin/java/classes/classes.zip
```
2. For JDK 1.2+:
```
setenv CLASSPATH .
```
On Windows, remember to use the semicolon (;) as the separator and the backslash (\) in paths.
If you do not want to make this permanent, you can instead type the same setenv command at the prompt each time you start a new shell.

You are now ready to run the XSLT processor on your Solaris, UNIX, or Linux system by adding the location of the XSLT processor files you have chosen to this CLASSPATH. Test the installation by typing java at the prompt, or java -version to see whether it's the version you thought you installed.
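The java -version check can also be made from inside Java, which is handy when several JVMs are installed. The sketch below is hypothetical (the class and method names are illustrative only). It prints the running JVM's version and home directory and applies this section's rule of thumb that only JVMs older than 1.2 need classes.zip on the CLASSPATH. It reads the standard java.specification.version property, whose values run "1.2", "1.3", and so on through "1.8", then "9", "10", and up; a missing property is treated as a pre-1.2 JVM, which is an assumption worth checking on your own system.

```java
// Hypothetical helper: prints what "java -version" would tell you and applies
// the rule of thumb from this section: only pre-1.2 JDKs need classes.zip
// listed on the CLASSPATH; JDK 1.2 and later locate their core classes.
final class JavaVersionCheck {

    // specVersion is the value of java.specification.version, e.g. "1.2",
    // "1.8", "11". A null value is treated as a pre-1.2 JVM (an assumption).
    static boolean needsClassesZip(String specVersion) {
        if (specVersion == null) {
            return true;
        }
        if (specVersion.startsWith("1.")) {
            String minor = specVersion.substring(2);
            int dot = minor.indexOf('.');
            if (dot >= 0) {
                minor = minor.substring(0, dot);
            }
            return Integer.parseInt(minor) < 2;
        }
        return false; // "9", "10", "11", ... are all later than 1.2
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("java.version      = " + System.getProperty("java.version"));
        System.out.println("java.home         = " + System.getProperty("java.home"));
        System.out.println("needs classes.zip = " + needsClassesZip(spec));
    }
}
```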

12.4.1.3 Installing the JDK on Windows

When you choose to run Java on Windows, you will download either one large .zip file or several small files from http://java.sun.com. Whichever you choose, follow the instructions for handling the single file or the multiple files as appropriate (to avoid errors, we recommend waiting the additional time to download the single file). Put the downloaded file into a folder on your PC, such as c:\utilities or c:\java, and unzip it with WinZip (which is likely on your machine; otherwise, a simple search on the Web will turn it up for download). Double-click the .zip file and WinZip will open it. Then double-click the resulting executable file and follow the prompts of the install wizard through the installation process.

Once you've gone through the install, you need to add a few lines to your autoexec.bat file, which you can open with Notepad from the root directory, c:\ (remember to choose "All" in the Files of Type pulldown in the Open File window). The change to PATH identifies the actual program location; CLASSPATH identifies the class file locations. If you installed the JDK to c:\jdk1.2.2, programs like javac are in c:\jdk1.2.2\bin. As on UNIX, only pre-1.2 JDKs need their classes.zip file listed on the CLASSPATH; with JDK 1.2 and later, the current directory (.) is enough. Add the following lines to your autoexec.bat file, assuming the wizard installed the JDK into c:\jdk1.2.2 (adjust the paths if you chose another directory, such as c:\java):

```
PATH ......;C:\jdk1.2.2\bin;.......
SET CLASSPATH=.
```

The "......" denote whatever directories are already in your PATH; be sure to separate each one with a semicolon, do not add a semicolon at the end of the line, and do not use spaces or hard returns. If you are using Windows NT, you can specify the path instead by going to Start, Control Panel, System, Environment. Test to see if you have been successful by typing java at a c:\ prompt in a DOS window (go to Start, Run, and type cmd). If another, possibly conflicting, version of Java was installed previously, type java -version to check that the right version is running.

12.4.1.4 Installing Java on a Macintosh

Setting up Java on a Macintosh is extremely simple. You need only to run the install and all the paths for running Java will be correct. The only paths you will need to add are those for whatever XSLT software you choose.

First, get the Mac MRJ 2.2.2. Next, you need the Java for Mac that includes JBindery, which comes with the MRJ SDK. You can find the MRJ 2.2 SDK for free at http://developer.apple.com/java/text/download.html#sdk (the download takes a few minutes from home, less on Ethernet). The Macintosh Runtime for Java Software Development Kit 2.1 is fine if you already have it. Install it by double-clicking the icon and accepting all the automatic presets. When the installation finishes, you will see the MRJ SDK 2.1 directory on your hard drive; inside it are folders called Tools and Application Builders. The JBindery folder is inside Application Builders (if it is not, remove MRJ, reinstall with the Custom option, and select all components).

12.5 Commercial XSLT Processors

As noted, we describe only three commercial XSLT processors here, owing to limits of space and the simple pragmatics of how widespread and commonly used they are. These three processors are Sun Microsystems' XSLTC, Oracle's XML suite, and Microsoft's MSXML.

12.5.1 Sun Microsystems' XSLTC

Sun Microsystems' XSLTC (XSLT Compiler) is a unique addition to the XSLT processor realm: it compiles each stylesheet into a translet, a compiled Java class that in most cases greatly reduces the amount of memory consumed by the DOM and often substantially increases speed. Translets also let you protect your stylesheet composition, because they are compiled bytecode rather than human-readable XML document instances. This approach brings the promise of sophisticated transformations with a memory footprint small enough to work with wireless devices. This very new technology is significantly different from a conventional XSLT processor.
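The compile-once idea behind translets is reflected in the standard JAXP API (javax.xml.transform): a stylesheet is compiled once into a Templates object, and each transformation gets an inexpensive Transformer from it. The sketch below assumes a current JDK, whose built-in JAXP processor is derived from XSLTC; it does not use Sun's original translet command-line tools, and the class name and inline stylesheet are illustrative only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example: compile a stylesheet once, then reuse the compiled
// form for many input documents, which is the idea behind XSLTC translets.
final class CompileOnce {

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>Hello, <xsl:value-of select='/name'/>!</xsl:template>"
        + "</xsl:stylesheet>";

    // Compiled once when the class loads; a Templates object is reusable.
    private static final Templates COMPILED = compile(XSL);

    static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerException e) {
            throw new IllegalStateException("stylesheet did not compile", e);
        }
    }

    // Each call creates a lightweight Transformer from the compiled stylesheet.
    static String run(String xml) {
        try {
            StringWriter out = new StringWriter();
            COMPILED.newTransformer().transform(
                    new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"XSLTC", "Saxon", "Xalan"}) {
            System.out.println(run("<name>" + name + "</name>"));
        }
    }
}
```

Because JAXP requires Templates objects to be thread-safe, one compiled stylesheet can serve many documents and threads, whereas each Transformer should be used by only one thread at a time.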

The latest tests at Sun show that XSLTC is typically at least 30% faster than XT. The following table, provided by Sun, shows processing times for a sample of contributed XSL stylesheets (lower figures are faster).

Stylesheet	XSLTC	XT	Saxon	Xalan
test1	2.3	2.76	3.96	6.16
test2	2.1	2.98	4.4	5.98
test3	2.98	4	4.26	14.1
test4	4.18	4.72	6.3	12.48
test5	3.74	5.78	5.82	10.1
test6	4.7	7.5	6.6	11.84
test7	2.08	4.5	3.98	7.22
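As a quick check of the 30% figure against the table, the sketch below computes, for each test, how many times longer XT took than XSLTC. It reads "30% faster" as XT taking at least 1.3 times as long as XSLTC, which is an assumption about how the figure was meant; the class name is illustrative, and the numbers are copied from the table.

```java
// Computes, for each test in the table above, the ratio of XT's time to
// XSLTC's time (a ratio above 1 means XSLTC was faster).
final class SpeedupFromTable {

    // Figures copied from the table, test1 through test7.
    static final double[] XSLTC = {2.3, 2.1, 2.98, 4.18, 3.74, 4.7, 2.08};
    static final double[] XT    = {2.76, 2.98, 4.0, 4.72, 5.78, 7.5, 4.5};

    // test is a zero-based index: 0 is test1, 6 is test7.
    static double ratio(int test) {
        return XT[test] / XSLTC[test];
    }

    // Number of tests in which XT took at least `factor` times as long as XSLTC.
    static int countAtLeast(double factor) {
        int count = 0;
        for (int i = 0; i < XSLTC.length; i++) {
            if (ratio(i) >= factor) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int i = 0; i < XSLTC.length; i++) {
            System.out.printf("test%d: XT/XSLTC = %.2f%n", i + 1, ratio(i));
        }
        System.out.println("tests with a 30% or greater advantage: " + countAtLeast(1.3));
    }
}
```

By that measure, five of the seven tests (all but test1 and test4) show an advantage of 30% or more.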

12.5.2 Oracle XML Developer's Kit (XDK)

Oracle provides a suite of XML tools in its XML Developer's Kit (XDK), which is available for Java, C, C++, and PL/SQL. This suite includes an XML parser and an XSLT processor, as well as other useful XML tools. XSLT support is provided with version 2 of the XML parser, which can be downloaded from the Oracle Technology Network Web site at http://technet.oracle.com/tech/xml. The Oracle XDK is free, and it is fully supported for Oracle customers who have an existing support plan. Anyone using the Oracle XDK can also take advantage of free technical assistance provided through the XML Discussion Forum on Oracle Technet. The XDK can also be redistributed under the terms of the Oracle license.

The Oracle XML Parser is available as a standalone command-line version or as libraries for Java, C, C++, and PL/SQL. It supports DOM2, SAX2, XML Namespaces 1.0, and XML Schema, and it also runs on Oracle 8i and Oracle Application Server.

The XSLT processor supports the W3C XSLT Recommendation 1.0 and the W3C XPath Recommendation 1.0.

12.5.3 Installing the Oracle XSL Processor

The XSLT processor is included with the XML parser download. The Oracle XML parsers can be downloaded from the Oracle Technology Network Web site at http://technet.oracle.com/tech/xml. Select the appropriate XDK language version, and then click on the "Software" icon.

Note

You must be a registered Oracle Technology Network member to have access to download software. Membership is free, but you may have to fill out a registration form prior to downloading software.

Software version choices are listed in Table 12-1.

Table 12-1. Software for XSLT and Oracle
Software	Platform
Java v2	UNIX, Windows NT
C	Linux, Sun Solaris, Windows NT
C++ v2	Linux v2.0.2, Sun Solaris v2.0.4, HP Unix, Windows NT v2.0.3
PL/SQL	UNIX, Windows NT

The appropriate programming environment for each version should be installed and configured prior to installing the parser. For example, if you download the Java version, Java JDK-1.1.x or higher should be installed on your system. Also, since the files are zipped, either GNU gzip on UNIX or the WinZip executable on Windows should be available to unzip the files.

Once the appropriate software is installed and the zipped parser file has been downloaded, unzip the file into a directory of your choice and configure your system to add the directory and parser to your PATH and CLASSPATH environment variables.

The Oracle XSL processor has a command line utility as well as libraries. The download includes sample files. The command to run the XSL processor on the sample files is:

```
java oracle.xml.parser.v2.oraxsl <sample xsl file> <sample xml file>
```

12.5.4 Microsoft MSXML

The October 2000 Microsoft XML Parser (MSXML) 3.0 final release provides a complete implementation of XSLT/XPath and complete conformance to the specifications from the W3C and the OASIS Test Suite.^[4]

Complete implementation of XSLT/XPath is of course the key. Previous versions of the Microsoft Parser did not support certain elements and functions of the XSLT and XPath specifications. The specific changes in this release include support for the <xsl:decimal-format> element, the unparsed-entity-uri() and format-number() functions, and the namespace axis.
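To illustrate what two of these additions do, the following sketch runs a stylesheet that declares a named <xsl:decimal-format> and passes it to format-number(). It uses the JAXP processor built into a current JDK rather than MSXML, on the assumption that any conformant XSLT 1.0 processor produces the same result; the format name euro and the class name are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo: a named decimal format swaps the roles of '.' and ','
// so that format-number() produces continental-style grouping.
final class DecimalFormatDemo {

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:decimal-format name='euro' decimal-separator=',' grouping-separator='.'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select=\"format-number(1234567.891, '#.##0,00', 'euro')\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to a trivial input document and returns the text output.
    static String run() {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)))
                    .transform(new StreamSource(new StringReader("<doc/>")),
                               new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because the named format makes the period the grouping separator and the comma the decimal separator, the pattern '#.##0,00' is read accordingly, and run() returns 1.234.567,89.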

Microsoft's original implementation of MSXML (for Internet Explorer 5) contained functionality that was not implemented in the final XSLT or XPath specifications. However, those ideas contributed to the development of a new W3C Proposal, XML Query Language (XQL),^[5] and also led to the creation of a new W3C working group, the XML Query group.

MSXML 3.0 continues to support the previous XSL language WD-XSL,^[6] which integrated the XQL extensions. Since this language is obsolete, Microsoft encourages people to move to compliant XSLT as soon as practical.

The Microsoft web site provides complete documentation for XSLT, XML, and SAX2, including MSXML extensions. The documentation is included in the installation for the 3.0 Version of the XML SDK and can be downloaded from:

http://msdn.microsoft.com/xml/default.asp (then choose XML Downloads > XML > MSXML SDK 3.0 Release)

The files are installed in the \Program Files\Microsoft XML Parser SDK folder on your system. The documentation is in the Docs directory, and can be accessed by clicking on the xmlsdk30.chm file.

12.5.4.1 MSXML Extension Elements and Functions

The Microsoft XSLT Processor extends the XSLT and XPath specifications with one extension element, <msxsl:script>, and one extension function, node-set().

The <msxsl:script> Extension Element

The <msxsl:script> extension element is a top-level element (a child of <xsl:stylesheet> or <xsl:transform>) and contains script blocks that define global variables and functions for script extensions. It has two attributes: language, which is used to specify the scripting language of the script block it contains; and implements-prefix, a required attribute used to associate a namespace prefix with the script block, as shown in the following element model definition. The <msxsl:script> element does not allow child elements.

```
<!-- Category: extension top-level element -->
<msxsl:script
  language = "language-name"
  implements-prefix = "prefix of user's namespace">
</msxsl:script>
```

Calling the script function using the <xsl:value-of> instruction element executes the script block and converts the result into a text string. The select attribute of the <xsl:value-of> element passes in the parameters required by the function. The script blocks defined by <msxsl:script> are globally available to all XSLT elements.

The optional language attribute specifies the active scripting language for the function defined in the script block. It accepts the same values as the language attribute of the HTML <script> element. If the attribute is omitted, the processor uses Microsoft JScript as the default language.

The required implements-prefix attribute associates a namespace prefix with the script block. The namespace must be declared before the <msxsl:script> element uses it, and the value of the attribute is the prefix bound to that namespace.

Example 12-2 comes directly from the Microsoft SDK^[7] documentation, and shows an example of the declaration of the user namespace, as well as the declaration and calling of a script block.

Example 12-2 Using the `<msxsl:script>` function.

```
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:msxsl="urn:schemas-microsoft-com:xslt"
                xmlns:user="http://mycompany.com/mynamespace"
                version="1.0">
  <msxsl:script language="JScript" implements-prefix="user">
    function xml(nodelist) {
      return nodelist.nextNode().xml;
    }
  </msxsl:script>
  <xsl:template match="/">
    <xsl:value-of select="user:xml(.)"/>
  </xsl:template>
</xsl:stylesheet>
```

12.5.4.2 The `node-set()` Extension Function

The MSXML node-set() function converts a tree into a node-set. The result is a node-set containing a single node, the root node of the tree. Its return type is node-set, and its one required argument is a string, as shown in the function prototype below. The string is processed in a manner defined by the msxsl namespace to convert it into a node-set. The function must be called using the msxsl namespace prefix, unless the msxsl prefix is declared to be the default namespace of the stylesheet.

Function: msxsl:node-set(string)
Function Name	Function Group	Function Return Type	Arguments	Argument Type
node-set()	Node-set	Node-set	String	Required

One of the features that was removed from the original MSXML implementation, in order to support conformance, was the ability to use variable references in pattern expressions. Under XSLT 1.0, a variable whose value is built from template content (for example, elements written inside <xsl:variable>) holds a result tree fragment rather than a node-set, and a location path cannot be applied to it. For example, if $var holds such a fragment, <xsl:for-each select="$var/el"> is not allowed, but the node-set() function gets around this limitation, as shown below:

```
<xsl:for-each select="msxsl:node-set($var)/el">
```

The select expression evaluates to a node-set consisting of the <el> elements that are children of the root of the tree held in the var variable. The content of the <xsl:for-each> element is then applied to each node in that node-set.

12.5.5 Installing the Latest Microsoft XML Parser

Download the current version of MSXML from the Microsoft Web site at:

http://msdn.microsoft.com/xml/default.asp (then choose XML Downloads > XML > MSXML Parser 3.0 Release)

The Download screen will appear in the window on the right-hand side. Click on the "Download" icon, accept the end-user license agreement, and select "Save to disk." This saves a file called "msxml3.exe" on your system; double-click this file to install the software. Note: You must have Microsoft Windows Installer version 1.1 on your machine before running this installation. It can be downloaded from the same window that was used to download the MSXML parser, but it may already be on your system if you have the latest version of Microsoft Office or Windows NT. The installer will run and install the program, but it will not tell you where it was installed or how to use it.

Installing the 3.0 version will not overwrite any previous versions of MSXML on your system unless it is installed specifically using the "replace" mode. This means that your programs will continue to use the older version of the parser until you replace it with the new version using the xmlinst.exe installer tool, which can be downloaded from:

http://msdn.microsoft.com/xml/default.asp (then choose XML Downloads > XML > Xmlinst.exe Installer Tool)

The Microsoft web site recommends installing Internet Explorer 4.01 Service Pack 1 or later for this release to function properly. The latest version of Internet Explorer can be downloaded from:

http://www.microsoft.com/windows/ie/default.htm

^[1] The W3C XSL committee is currently in the process of defining specifications for how extensions will be implemented. XSLT v1.1 will include extension functions and XSLT v2.0 will include extension elements.

^[2] For more on XHTML, see http://www.w3.org/TR/xhtml1/. Be aware, however, that there is now only one XHTML namespace; the example in the W3C specification for XSLT reflects an earlier, since-rescinded phase in which XHTML had three possible namespaces.

^[3] XML names also allow a colon (:), but this is reserved for use with namespaces.

^[4] There are a few minor bugs in this release, which Microsoft is addressing with Web releases.

^[5] See http://www.w3.org/TandS/QL/QL98/pp/xql.html.

^[6] See http://www.w3.org/TR/WD-xsl.

^[7] Much of the information for this section comes directly from the Microsoft SDK 3.0 documentation. All copyrights for Microsoft and the MSXML, as stated in the copyright page of the SDK, apply.
