1.1 How XSLT Works

About the quickest way to get you acquainted with how XSLT works is through simple, progressive examples that you can do yourself. The first example walks you through the process of transforming a very brief XML document using a minimal XSLT stylesheet. You transform documents using a processor that complies with the XSLT 1.0 specification.

All the documents and stylesheets discussed in this book can be found in the example archive available for download at http://www.oreilly.com/catalog/learnxslt/learningxslt.zip. All example files mentioned in a particular chapter are in the examples directory of the archive, under the subdirectory for that chapter (such as examples/ch01, examples/ch02, and so forth). Throughout the book, I assume that these examples are installed at C:\LearningXSLT\examples on Windows or in something like /usr/mike/learningxslt/examples on a Unix machine.

1.1.1 A Ridiculous XML Document

Now consider the ridiculously brief XML document contained in the file msg.xml:

<msg/>

There isn't much to this document, but it's perfectly legal, well-formed XML. It's just a single, empty element with no content. Technically, it's an empty element tag.

Because it is the only element in the document, msg is the document element. The document element is sometimes called the root element, but this is not to be confused with the root node, which will be explained later in this chapter. The first element in any well-formed XML document is always considered the document element, as long as it also contains all other elements in the document (if it has any other elements in it). In order for XML to be well-formed, it must follow the syntax rules laid out in the XML specification. I'll highlight well-formedness rules throughout this book, when appropriate.

A document element is the minimum structure needed to have a well-formed XML document, assuming that the characters used for the element name are legal XML name characters, as they are in the case of msg, and that angle brackets (< and >) surround the tag, and the slash (/) shows up in the right place. In an empty element tag, the slash appears after the element name, as in <msg/>. Tags are part of what's called markup in XML.
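
As a small illustration of these rules, the following sketch shows a slightly larger well-formed document. The child elements greeting and sender are hypothetical names chosen only for this illustration; they are not taken from msg.xml. Here msg is still the document element because it contains all the other elements.

```xml
<!-- msg is the document element: it contains every other element -->
<msg>
  <!-- greeting and sender are hypothetical child elements, for illustration only -->
  <greeting>Hello</greeting>
  <!-- an empty element tag, with the slash after the element name -->
  <sender/>
</msg>
```

Note also that writing <msg></msg>, a start tag followed immediately by an end tag, is equivalent to the empty element tag <msg/>.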

1.1.2 A First XSLT Stylesheet

You can use the XSLT stylesheet msg.xsl to transform msg.xml:

<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
  <output method="text"/>
  <template match="msg">Found it!</template>
</stylesheet>

Before transforming msg.xml with msg.xsl, I'll discuss what's in this stylesheet. You'll notice that XSLT is written in XML. This allows you to use some of the same tools to process XSLT stylesheets that you would use to process other XML documents.

1.1.2.1 The stylesheet element

The first element in msg.xsl is stylesheet:

<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">

This is the document element for a stylesheet, one of two possible document elements in XSLT. The other possible document element is transform, which is actually just a synonym for stylesheet. You can use one or the other, but, for some reason, I see stylesheet used more often than transform, so I'll knuckle under and use it also. Whenever I refer to stylesheet in this book, the same information applies to the transform element as well. You are free to choose either for the stylesheets you write. The stylesheet and transform elements are documented in Section 2.2 of the XSLT specification (this W3C recommendation is available at http://www.w3.org/TR/xslt).
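
To make the synonym concrete, here is a sketch of msg.xsl rewritten with transform as the document element. This variant is an illustration only and is not necessarily a file in the example archive; apart from the element name, nothing changes.

```xml
<!-- Same stylesheet as msg.xsl, but with transform as the document element -->
<transform version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
  <output method="text"/>
  <template match="msg">Found it!</template>
</transform>
```

An XSLT 1.0 processor should treat this version exactly as it treats the stylesheet version.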

The version attribute in stylesheet is required, along with its value of 1.0. (Attributes are explained in Section 1.2.1.1, later in this chapter.) An XSLT processor may support Versions 1.1 and 2.0 as the value of version, but this support is only experimental at this point (see Chapter 16). The stylesheet element has other possible attributes besides version, but don't worry about those yet.

1.1.2.2 The XSLT namespace

The xmlns attribute is a special attribute for declaring a namespace. This attribute, together with a Uniform Resource Identifier (URI) value, is called a namespace declaration:

xmlns="http://www.w3.org/1999/XSL/Transform"

Such a declaration is not peculiar to stylesheet elements, but is more or less universal in XML, meaning that you can use it on any XML element. Nevertheless, an XSLT stylesheet must always declare a namespace for itself in order for it to work properly with an XSLT processor. The official namespace name, or URI, for XSLT is http://www.w3.org/1999/XSL/Transform. A namespace name is always a URI.

The special xmlns attribute is described in the XML namespaces specification, officially, "Namespaces in XML" (http://www.w3.org/TR/REC-xml-names). A namespace declaration associates a namespace name with elements and attributes in an attempt to make their names unambiguous.

The Namespace Prefix

You can also associate a namespace name with a prefix, and then use the prefix with elements and attributes. More often than not, the XSLT elements are prefixed with xsl, such as in xsl:stylesheet. While the xsl prefix is commonly used in XSLT, these three letters are only a convention, and you are not required to use them. You can use any prefix you want, as long as the characters are legal for XML names. (See Sections 2.2 and 2.3 of the XML specification at http://www.w3.org/TR/REC-xml.html for details on what characters are legal for XML names.) For simplicity, I avoid using a prefix in the first few XSLT examples in the book, but I will start using xsl when the stylesheets get a little more complicated because a prefix will help sort out namespaces more readily. You'll learn more about namespaces, including how to use prefixes, in Chapter 2.
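
For comparison, this sketch shows what msg.xsl might look like with the conventional xsl prefix. The namespace declaration becomes xmlns:xsl, and every XSLT element name takes the prefix, while attributes such as version, method, and match stay unprefixed. It is an illustration only, not necessarily a file from the example archive.

```xml
<!-- msg.xsl rewritten with the conventional xsl prefix -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="msg">Found it!</xsl:template>
</xsl:stylesheet>
```

Any other legal prefix would work the same way, provided the declaration and the element names agree.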


1.1.2.3 The output element

The stylesheet element is followed by an optional output element. This element has 10 possible attributes, but I'll only cover method right now:

<output method="text"/>

The value text in the method attribute signals that you want the output to be plain text. The default output method for XSLT is xml, and another possible value is html. XSLT 2.0 also offers xhtml (see Chapter 16). There's more to tell about the output element, but I'll leave it at that until Chapter 3. In the XSLT specification, the output element is discussed in Section 16.
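
To see what the method attribute changes, consider this hypothetical variation on msg.xsl that asks for the xml output method instead of text. It is a sketch for comparison, not a file from the example archive.

```xml
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
  <!-- xml is the default method for this stylesheet, so this element could also be left out -->
  <output method="xml"/>
  <template match="msg">Found it!</template>
</stylesheet>
```

With the xml method, a processor typically writes an XML declaration before the text Found it!, whereas the text method writes the text alone.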

1.1.2.4 The template element

Next up in msg.xsl is the template element. This element is really at the heart of what XSLT is and does. A template rule consists of two parts: a pattern to match, and a sequence constructor (so named in XSLT 2.0). The match attribute of template contains a pattern, and the pattern in this instance is merely the name of the element msg:

<template match="msg">Found it!</template>

A pattern attempts to identify nodes in a source document, but has some limitations, which will come more fully to light in Chapter 4. A sequence constructor is a list of things telling the processor what to do when a pattern is matched. This very simple sequence constructor just tells the processor to write the text Found it! when the pattern is matched. (I won't use the phrase sequence constructor much in this book but will usually just use the term template instead.) Put another way, when an XSLT processor finds the msg element in the source document msg.xml, it writes the text Found it! from the template to output. When a template writes text from its content to the result tree, or triggers some other sort of output, the template is said to be instantiated.

The source document becomes a source tree when it is processed by an XSLT processor. Such source documents are usually files containing XML documents, such as msg.xml. The result of a transformation becomes a result tree within the processor. The result tree is then serialized to standard output (most often the computer's display screen) or to an output file. The source or result of a transformation, however, doesn't have to be a file. A source tree could be built just as easily from an input stream as from a file, and a result tree could be serialized as an output stream.

The output and template elements are called top-level elements. They are two of a dozen possible top-level elements that are defined in XSLT 1.0. They are called top-level elements because they are children of the stylesheet element.


1.1.2.4.1 The root node

Another way you could write a location path is with a slash (/). In XPath, a slash by itself indicates the root node or starting point of the document, which comes before the first element in the document or document element. A node in XPath represents a distinct part of an XML document. A few examples of nodes are the root node, element nodes, and attribute nodes. (You'll get a more complete explanation of nodes in Chapter 4.)

In root.xsl, the match attribute in template matches the root node of any source document:

<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
  <output method="text"/>
  <template match="/">Found it!</template>
</stylesheet>

The msg element is the document element of msg.xml, and it is really the only element in msg.xml. The template in root.xsl matches only the root node (/), which marks the point at which processing begins, before the document element. Because this template contains only text and no instruction to process the children of the root node, the processor writes Found it! as soon as it matches the root node and never needs to visit msg.

This stylesheet produces the same results as msg.xsl, though by a different route. In msg.xsl, no template matches the root node, so a feature called built-in templates carries processing from the root node down to msg, where the explicit template takes over. Just trust me on this for now: it would be overwhelming at this point to go into all the ramifications of the built-in templates. I will say this, though: built-in templates automatically process nodes that are not specifically matched by a template. This can rattle nerves at first, but you'll get more comfortable with built-in templates soon enough.
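
As a brief preview only, the following sketch adds two of the built-in template rules to msg.xsl as explicit templates, following their description in Section 5.8 of the XSLT 1.0 specification. A processor applies these rules implicitly, so the stylesheet should behave the same with or without them; writing them out is purely illustrative.

```xml
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
  <output method="text"/>

  <!-- Explicit copy of the built-in rule for the root node and elements:
       keep processing the children of the matched node -->
  <template match="*|/">
    <apply-templates/>
  </template>

  <!-- Explicit copy of the built-in rule for text and attribute nodes:
       copy the text through to the result -->
  <template match="text()|@*">
    <value-of select="."/>
  </template>

  <!-- The rule from msg.xsl: more specific than *, so it wins for msg -->
  <template match="msg">Found it!</template>
</stylesheet>
```

Applied to msg.xml, processing starts at the root node, the first rule passes control to the children of the root, and the msg rule then writes Found it!.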



Learning XSLT
ISBN: 0596003277
Year: 2003
Pages: 164
