Trees and Nodes

When you're working with XSLT, you no longer think in terms of documents, but rather in terms of trees. A tree represents the data in a document as a set of nodes in a hierarchy (elements, attributes, comments, and so on are all treated as nodes), and in XSLT, the tree structure follows the W3C XPath recommendation (www.w3.org/TR/xpath). In this chapter, I'll go through what's happening conceptually with trees and nodes, and in Chapters 3 and 4, I'll give a formal introduction to XPath and how it relates to XSLT. You use XPath expressions to locate data in XML documents, and those expressions are written in terms of trees and nodes.

In fact, the XSLT recommendation does not require conforming XSLT processors to have anything to do with documents; formally, XSLT transformations accept a source tree as input and produce a result tree as output. Most XSLT processors do, however, add support so that you can work with documents.

From the XSLT point of view, then, documents are trees built of nodes; XSLT recognizes seven types of nodes:

  • The root node. This is the very start of the document. This node represents the entire document to the XSLT processor. Important: Don't get the root node mixed up with the root element, which is also called the document element (more on this later in this chapter).

  • Attribute node. Holds the value of an attribute after entity references have been expanded and whitespace has been normalized as the XML recommendation specifies.

  • Comment node. Holds the text of a comment, not including <!-- and -->.

  • Element node. Consists of the part of the document bounded by a start tag and its matching end tag, or a single empty-element tag, such as <br/>.

  • Namespace node. Represents a namespace declaration. Note that a separate namespace node is attached to each element to which the declaration applies.

  • Processing instruction node. Holds the text of the processing instruction, which does not include <? and ?>. The XML declaration, <?xml version="1.0"?>, by the way, is not a processing instruction, even though it looks like one. XSLT processors strip it out automatically.

  • Text node. Text nodes hold sequences of characters, that is, PCDATA text. Text nodes are normalized by default in XSLT, which means that adjacent text nodes are merged.
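To make the list above a little more concrete, here is a small sketch that maps the node types reported by Python's standard xml.dom.minidom parser onto the XPath node kinds. The DOM is a different tree model, so this mapping is only an approximation and is an assumption of the sketch: the DOM has no namespace node type at all, and the helper name xpath_kind is hypothetical. The last check also shows that the parser does not turn the XML declaration into a processing-instruction node.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Approximate mapping from DOM node types to XPath node kinds.
# The DOM has no namespace node type, so "namespace" never appears here.
DOM_TO_XPATH = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.ATTRIBUTE_NODE: "attribute",
    Node.TEXT_NODE: "text",
    Node.CDATA_SECTION_NODE: "text",  # XPath treats CDATA content as ordinary text
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
}


def xpath_kind(node):
    """Return the XPath node kind for a minidom node (hypothetical helper)."""
    return DOM_TO_XPATH.get(node.nodeType, "unknown")


doc = parseString(
    '<?xml version="1.0"?><note kind="demo"><!-- a comment --><?app run?>Some text</note>'
)
note = doc.documentElement

print(xpath_kind(doc))                           # root
print(xpath_kind(note))                          # element
print(xpath_kind(note.getAttributeNode("kind")))  # attribute
print([xpath_kind(child) for child in note.childNodes])
# ['comment', 'processing-instruction', 'text']

# The XML declaration does not appear as a child node of the document.
print(len(doc.childNodes))                       # 1
```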

As you'll see in Chapter 7, you use XPath expressions to work with trees and nodes. An XPath expression that selects nodes returns a node set, which may contain a single matching node, several nodes, or none at all; other XPath expressions return strings, numbers, or Boolean values. XPath was designed to enable you to navigate through trees, and understanding XPath is a large part of understanding XSLT.
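Python's standard xml.etree.ElementTree module supports only a limited subset of XPath and returns plain lists rather than true node sets, but it is enough to illustrate that a path matching one node and a path matching several come back in the same form. This sketch uses a shortened copy of the chapter's example document with each title on one line.

```python
import xml.etree.ElementTree as ET

SOURCE = """<library>
    <book>
        <title>Earthquakes for Lunch</title>
        <title>Volcanoes for Dinner</title>
    </book>
</library>"""

library = ET.fromstring(SOURCE)

# Each path returns a list of matches, whether it selects one node, two, or none.
books = library.findall("./book")
titles = library.findall("./book/title")
missing = library.findall("./magazine")

print(len(books), len(titles), len(missing))  # 1 2 0
print([title.text for title in titles])
# ['Earthquakes for Lunch', 'Volcanoes for Dinner']
```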

Here's an important point to keep in mind: The root node of an XSLT tree represents the entire document. It is not the same as the root element. For example, take a look at the following document; in XSLT terms, the root node represents the whole document, and the root element is <library>:

    <?xml version="1.0"?>
    <library>
        <book>
            <title>
                Earthquakes for Lunch
            </title>
            <title>
                Volcanoes for Dinner
            </title>
        </book>
    </library>

The term root element comes from the XML recommendation, and because it's easy to confuse with the XSLT root node, which comes from the XPath recommendation, some XSLT authors call the root element the document element. This overlap in nomenclature is definitely unfortunate.
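The DOM draws the same distinction, which makes it a convenient place to see it in running code. In this sketch (Python's standard xml.dom.minidom, used here as an analogy rather than as an XSLT processor), the Document node plays the role of the root node, and its documentElement property is the root element.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

SOURCE = """<?xml version="1.0"?>
<library>
    <book>
        <title>Earthquakes for Lunch</title>
    </book>
</library>"""

doc = parseString(SOURCE)

# The Document node stands for the whole document, like the XPath root node.
print(doc.nodeType == Node.DOCUMENT_NODE)  # True

# The root (document) element is a child of that node, not the node itself.
root_element = doc.documentElement
print(root_element.tagName)                # library
print(root_element.parentNode is doc)      # True
```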

In addition, you should know that XSLT processors normalize text nodes. That is, they merge any two adjacent text nodes into one large text node to make it easier to work with the tree structure of a document. This means, for example, that there will never be more than one text node between two adjacent element nodes, as long as there was only text between the element nodes to start with.
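Merging adjacent text nodes is easy to demonstrate with the DOM, where two adjacent text nodes can exist until you ask for them to be merged. This minimal sketch builds such a pair by hand with xml.dom.minidom and then calls normalize(), which merges them into one node, mirroring what the text says XSLT processors do automatically.

```python
from xml.dom.minidom import Document

doc = Document()
title = doc.createElement("title")
doc.appendChild(title)

# Build two adjacent text nodes by hand; the XPath tree model never contains this.
title.appendChild(doc.createTextNode("Volcanoes "))
title.appendChild(doc.createTextNode("for Dinner"))
print(len(title.childNodes))   # 2

# normalize() merges adjacent text nodes into one.
title.normalize()
print(len(title.childNodes))   # 1
print(title.firstChild.data)   # Volcanoes for Dinner
```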

In XSLT, nodes can have names, as well as child nodes and parent nodes. In other words, element nodes, attribute nodes, namespace nodes, and processing instruction nodes can have names; every element node and the root node can have children; and all nodes except the root node have parents.

For example, here's how the XML document we just saw looks to an XSLT processor as a tree of nodes:

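[Figure: the example document as a tree of nodes, with the root node at the top, then the <library> element node, the <book> element node, and the two <title> element nodes.]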

As you can see, the root node is at the very top of the tree, followed by the root element's node, corresponding to the <library> element. This is followed by the <book> node, which has two <title> node children. These two <title> nodes are grandchildren of the <library> element. The parents, grandparents, and great-grandparents of a node, all the way back to and including the root node, are that node's ancestors. The nodes that are descended from a node (its children, grandchildren, great-grandchildren, and so on) are called its descendants. Nodes that share the same parent are called siblings.
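The family relationships above can be computed by walking parent and child links. This sketch uses Python's xml.dom.minidom as a stand-in for an XSLT processor's tree; the helper names ancestors, descendants, siblings, and label are hypothetical. The document is written on one line with no indentation so that whitespace-only text nodes do not clutter the results.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

SOURCE = ("<library><book><title>Earthquakes for Lunch</title>"
          "<title>Volcanoes for Dinner</title></book></library>")


def ancestors(node):
    """Parent, grandparent, ... up to and including the Document (root) node."""
    result = []
    current = node.parentNode
    while current is not None:
        result.append(current)
        current = current.parentNode
    return result


def descendants(node):
    """Children, grandchildren, and so on, in document order."""
    result = []
    for child in node.childNodes:
        result.append(child)
        result.extend(descendants(child))
    return result


def siblings(node):
    """Other nodes that share the same parent."""
    parent = node.parentNode
    if parent is None:
        return []
    return [n for n in parent.childNodes if n is not node]


def label(node):
    """'/' for the root node, the tag name for elements, the data for text."""
    if node.nodeType == Node.DOCUMENT_NODE:
        return "/"
    if node.nodeType == Node.ELEMENT_NODE:
        return node.tagName
    return node.data


doc = parseString(SOURCE)
first_title = doc.getElementsByTagName("title")[0]

print([label(n) for n in ancestors(first_title)])   # ['book', 'library', '/']
print([label(n) for n in siblings(first_title)])    # ['title']
print([label(n) for n in descendants(doc.documentElement)])
# ['book', 'title', 'Earthquakes for Lunch', 'title', 'Volcanoes for Dinner']
```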

This kind of tree model can represent every well-formed XML document, but XSLT is not limited to working with well-formed documents. In a well-formed document, there must be one element that contains all the others, and the XSLT recommendation does not require this. In XSLT, the root node can have any children that an element can have, such as multiple elements or text nodes. In this way, XSLT can work with document fragments, not simply well-formed documents.
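The DOM offers a loose analogy for this relaxed rule. In the following sketch (Python's xml.dom.minidom, not an XSLT processor), a Document refuses a second top-level element, while a DocumentFragment happily holds several elements and a text node, much as the text says an XSLT root node may.

```python
import xml.dom
from xml.dom.minidom import Document

doc = Document()
doc.appendChild(doc.createElement("library"))

# A Document allows only one document element, as well-formedness requires.
refused = False
try:
    doc.appendChild(doc.createElement("magazine"))
except xml.dom.HierarchyRequestErr:
    refused = True
print(refused)  # True

# A DocumentFragment may hold several elements and text at its top level.
fragment = doc.createDocumentFragment()
fragment.appendChild(doc.createElement("title"))
fragment.appendChild(doc.createTextNode("some text"))
fragment.appendChild(doc.createElement("title"))
print([n.nodeName for n in fragment.childNodes])  # ['title', '#text', 'title']
```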

Result Tree Fragments

Besides working with input tree fragments, XSLT 1.0 defines a special data type called a result tree fragment, which processors can include in their output. The result tree fragment data type has been eliminated in the XSLT 1.1 working draft, however (see Chapter 7), which means it will probably not be part of XSLT 2.0.

Actually, the tree diagram shown earlier does not represent the whole picture from an XSLT processor's point of view. I've left out one type of node that causes a great deal of confusion: text nodes that contain only whitespace. Because these nodes cause so much confusion in XSLT, it's worth taking a look at them now.

Whitespace

The example XML document we've been working on so far is nicely indented to show the hierarchical structure of its elements, like this:

    <?xml version="1.0"?>
    <library>
        <book>
            <title>
                Earthquakes for Lunch
            </title>
            <title>
                Volcanoes for Dinner
            </title>
        </book>
    </library>

However, from an XSLT point of view, the whitespace I've used to indent elements in this example actually represents text nodes. This means that by default, those spaces will be copied to the output document. This behavior is a major source of confusion in XSLT, so I'll take a quick look at it here and cover how to handle whitespace in detail in the next chapter.

In XSLT, there are four whitespace characters: spaces, carriage returns, line feeds, and tabs. That means that from an XSLT processor's point of view, the input document looks like this:

    <?xml version="1.0"?>
    <library>↵
    ....<book>↵
    ........<title>↵
    ............Earthquakes for Lunch↵
    ........</title>↵
    ........<title>↵
    ............Volcanoes for Dinner↵
    ........</title>↵
    ....</book>↵
    </library>

(Here each dot stands for a space, and each ↵ marks a line break.)

All the whitespace between the elements is treated as whitespace text nodes in XSLT. That means that there are five whitespace text nodes we have to add to our diagram: one before the <book> element, one after the <book> element, as well as one before, after, and in between the <title> elements:

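[Figure: the same tree with five whitespace-only text nodes added: one before and one after the <book> element node, and one before, between, and after the two <title> element nodes.]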
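You can check the count of five by parsing the indented document and testing every text node against the four XML whitespace characters. This sketch uses Python's xml.dom.minidom parser, which keeps whitespace-only text nodes just as the text describes; the helper names is_whitespace_node and text_nodes are hypothetical.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

SOURCE = """<?xml version="1.0"?>
<library>
    <book>
        <title>
            Earthquakes for Lunch
        </title>
        <title>
            Volcanoes for Dinner
        </title>
    </book>
</library>"""

# The four XML whitespace characters: space, tab, carriage return, line feed.
XML_WHITESPACE = " \t\r\n"


def is_whitespace_node(node):
    """True for a text node made up only of XML whitespace characters."""
    return (node.nodeType == Node.TEXT_NODE
            and all(ch in XML_WHITESPACE for ch in node.data))


def text_nodes(node):
    """All text nodes below node, in document order."""
    found = []
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            found.append(child)
        else:
            found.extend(text_nodes(child))
    return found


doc = parseString(SOURCE)
texts = text_nodes(doc.documentElement)
whitespace_only = [t for t in texts if is_whitespace_node(t)]

# 7 text nodes in all: 5 whitespace-only ones plus the 2 title texts.
print(len(texts), len(whitespace_only))  # 7 5
```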

Whitespace nodes such as these are text nodes that contain nothing but whitespace. Because XSLT processors preserve this whitespace by default, you should not be surprised when it shows up in result documents. This extra whitespace is usually not a problem in HTML, XML, and XHTML documents, and I'll eliminate it in the result documents here in the text to make sure the indenting indicates the correct document structure. We'll see how XSLT processors can strip whitespace nodes from documents, as well as how XSLT processors can indent result documents. Note that text nodes that contain characters other than whitespace are not considered whitespace nodes, and so will never be stripped from a document.
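In XSLT itself, stripping is requested declaratively with the top-level xsl:strip-space element. The Python sketch below only imitates the effect of stripping every whitespace-only text node, roughly what <xsl:strip-space elements="*"/> asks for; the function name strip_whitespace_nodes is hypothetical. Note that the title text nodes survive with their surrounding whitespace intact, because they contain other characters.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

SOURCE = """<?xml version="1.0"?>
<library>
    <book>
        <title>
            Earthquakes for Lunch
        </title>
        <title>
            Volcanoes for Dinner
        </title>
    </book>
</library>"""

XML_WHITESPACE = " \t\r\n"


def strip_whitespace_nodes(node):
    """Remove every whitespace-only text node below node."""
    for child in list(node.childNodes):  # copy: the list changes as we remove
        if child.nodeType == Node.TEXT_NODE:
            if all(ch in XML_WHITESPACE for ch in child.data):
                node.removeChild(child)
        else:
            strip_whitespace_nodes(child)


doc = parseString(SOURCE)
strip_whitespace_nodes(doc.documentElement)

book = doc.getElementsByTagName("book")[0]
print([child.nodeName for child in book.childNodes])  # ['title', 'title']

# Non-whitespace text nodes are kept, including their own leading newline.
print(book.firstChild.firstChild.data.startswith("\n"))  # True
```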

Another thing to note is that attributes are themselves treated as nodes. Although attribute nodes are not considered child nodes of the elements in which they appear, the element is considered their parent node. (This is different from the XML DOM model, in which attributes are not children and do not have parents either.) If I add an attribute to an element like this:

    <?xml version="1.0"?>
    <library>
        <book>
            <title>
                Earthquakes for Lunch
            </title>
            <title pub_date="2001">
                Volcanoes for Dinner
            </title>
        </book>
    </library>

then here's how this attribute appears in the document tree:

[Figure: the same tree with an attribute node for pub_date attached to the second <title> element node.]
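The difference between the two models shows up directly in Python's xml.dom.minidom: the attribute is not among the element's children, and its DOM parentNode is None, but the DOM does record the owning element in ownerElement. The hypothetical helper xpath_parent below uses that link to give the XPath-style answer, in which the element is the attribute's parent.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

SOURCE = """<library>
    <book>
        <title>Earthquakes for Lunch</title>
        <title pub_date="2001">Volcanoes for Dinner</title>
    </book>
</library>"""


def xpath_parent(node):
    """Parent in the XPath sense: for an attribute, the element that carries it."""
    if node.nodeType == Node.ATTRIBUTE_NODE:
        return node.ownerElement
    return node.parentNode


doc = parseString(SOURCE)
title = doc.getElementsByTagName("title")[1]
attr = title.getAttributeNode("pub_date")

print(attr in title.childNodes)     # False: not a child in either model
print(attr.parentNode)              # None: the DOM gives attributes no parent
print(xpath_parent(attr) is title)  # True: XPath treats the element as the parent
print(attr.value)                   # 2001
```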

Each node has a set of properties associated with it in XSLT, and the following list includes the kinds of properties that the writers of XSLT processors keep track of for each node:

  • name. The name of the node.

  • string-value. The text of the node.

  • base-URI. The node's base URI (a URI is a generalization of a URL).

  • child. A list of child nodes; null if there are no children.

  • parent. The node's parent node.

  • has-attribute. Specifies an element node's attributes if it has any.

  • has-namespace. Specifies an element node's namespace nodes.
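A processor might store these properties in a record along the lines of the following Python sketch. The class XNode and all of its field names are hypothetical, chosen only to mirror the list above; this sketch uses an empty list rather than null for "no children", and it computes the string-value of root and element nodes as the concatenation of their descendant text, which is how XPath defines it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison; avoids recursing through parent links
class XNode:
    """Hypothetical per-node record mirroring the property list above."""
    kind: str                          # "root", "element", "text", "attribute", ...
    name: Optional[str] = None         # None for root, text, and comment nodes
    text: str = ""                     # own data for text, attribute, comment, PI nodes
    base_uri: Optional[str] = None
    children: List["XNode"] = field(default_factory=list)
    attributes: List["XNode"] = field(default_factory=list)
    namespaces: List["XNode"] = field(default_factory=list)
    parent: Optional["XNode"] = field(default=None, repr=False)

    def add_child(self, child: "XNode") -> "XNode":
        child.parent = self
        self.children.append(child)
        return child

    def add_attribute(self, attr: "XNode") -> "XNode":
        attr.parent = self             # the element is the attribute's parent...
        self.attributes.append(attr)   # ...but the attribute is not a child
        return attr

    def string_value(self) -> str:
        """Root/element: concatenated descendant text; other kinds: own text."""
        if self.kind in ("root", "element"):
            return "".join(c.string_value() for c in self.children
                           if c.kind in ("element", "text"))
        return self.text


root = XNode("root")
library = root.add_child(XNode("element", "library"))
title = library.add_child(XNode("element", "title"))
title.add_child(XNode("text", text="Volcanoes for Dinner"))
pub = title.add_attribute(XNode("attribute", "pub_date", text="2001"))

print(root.string_value())    # Volcanoes for Dinner
print(pub.parent is title)    # True
print(pub in title.children)  # False
```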

There's another consideration to take into account when working with trees: XSLT processors are built on top of XML parsers, and the rules for XML parsers and XSLT processors are slightly different, which can lead to problems. This issue can become important in some cases, so the following section discusses it briefly.



Inside XSLT
ISBN: B0031W8M4K
EAN: N/A
Year: 2005
Pages: 196
