Moving On with XSLT | The Official XMLSPY Handbook

As you found out by debugging, XSLT is a specially formatted XML document that operates on another document. The XSLT specification is defined by using http://www.w3.org/1999/XSL/Transform as a namespace identifier. Traditionally, an XSLT document uses the prefix xsl to define the XSLT namespace. However, XMLSPY and other XSLT processors can digest any other prefix, as long as it is bound to that same namespace identifier. In your code examples, it is a good idea to use xsl as a prefix, especially when you are learning the XSLT techniques. Consistently using the same namespace prefix makes it easier to understand the code and to cut and paste already-created examples.
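To illustrate the point about prefixes, here is a minimal sketch of a stylesheet that uses a different prefix. The prefix t is an arbitrary choice made up for this illustration; what matters is that it is bound to the XSLT namespace identifier.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The prefix "t" is a hypothetical choice; only the namespace URI it is bound to matters. -->
<t:stylesheet version="1.0" xmlns:t="http://www.w3.org/1999/XSL/Transform">
    <t:template match="*">
        <!-- Same instruction as xsl:value-of, written with the alternative prefix -->
        <t:value-of select="."/>
    </t:template>
</t:stylesheet>
```

A processor should treat this the same way as the equivalent stylesheet written with the xsl prefix, although code written this way is harder to compare with published examples.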

Within an XSLT document, various XML nodes represent XSLT instructions. It is as if an XML parser executed each XML node, based on the identifier of the XML node. The XSLT processor can evaluate conditions, run loops, call functions and named templates, and provide other general programming constructs. Keep in mind, however, that XSLT is not a programming language in the classical sense, like Java, C++, and C#. XSLT is a programming language specifically geared toward the transformation of XML content into content that can be used for another purpose, such as presenting the text on a Web page.
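The following sketch shows what a loop, a condition, and a function-like call can look like in XSLT 1.0. It assumes an input document whose root element is data and that contains child elements named elements; the template name wrap and the parameter name text are invented for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:template match="/data">
        <!-- Loop: visit each child element named "elements" -->
        <xsl:for-each select="elements">
            <!-- Condition: act only when the element has some text -->
            <xsl:if test="string-length(.) &gt; 0">
                <!-- Function-like call: invoke a named template with a parameter -->
                <xsl:call-template name="wrap">
                    <xsl:with-param name="text" select="."/>
                </xsl:call-template>
            </xsl:if>
        </xsl:for-each>
    </xsl:template>
    <!-- "wrap" is a hypothetical helper that brackets whatever text it receives -->
    <xsl:template name="wrap">
        <xsl:param name="text"/>
        <xsl:text>[</xsl:text>
        <xsl:value-of select="$text"/>
        <xsl:text>]</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

For an input with two elements children containing Hello and World, this sketch should produce [Hello][World].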

Doing a simple node selection

In the debugging example shown in the preceding section, the debugging process requires the developer to constantly specify which XSLT document to debug or which XML document to associate with the XSLT document. To simplify debugging and XSLT execution, you can associate the XML file with the XSLT sheet by following these steps:

Within the XMLSPY IDE, select the XSLT document so that it is the window with the focus.
Choose XSL → Assign Sample XML. A message box appears asking if XMLSPY can reformat your XML code.
Click OK and the Find dialog box shown in the debugging process is displayed.
Select the XML file that you want to associate and click OK. At the top of the XSLT document, an additional XML instruction is added that looks similar to the following XML fragment:
```
<?xml version="1.0" encoding="UTF-8"?>
<?xmlspysamplexml D:\Instructor\BookXMLSpy\xslt_project\introduction.xml?>
<xsl:stylesheet version="1.0"
```

The xmlspysamplexml processing instruction is specific to the XMLSPY IDE and allows the debugger and XSLT processor to know which XML document should be processed by the XSLT document.

In the earlier debugging example, the XSLT processor output the XML content as default content. This form of content is not compatible with HTML because it does not contain any specific HTML tags. Most HTML browsers would read this content as simple text. To create content consumable by an HTML browser, consider the following XSLT document:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xmlspysamplexml D:\Instructor\BookXMLSpy\xslt_project\introduction.xml?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="*">
            <html>
                <body>
                    <table>
                        <tr>
                            <td>
                                <xsl:value-of select="elements"/>
                            </td>
                        </tr>
                    </table>
                </body>
            </html>
        </xsl:template>
    </xsl:stylesheet>

The xsl:template XML node in the XSLT document just shown declares, through its match attribute, which nodes it handles. In this XSLT example, the match expression is *, which says “match any element node.” The template definition includes a series of HTML tags and also the xsl:value-of XML node. When the XSLT processor finds a match, it executes the content of the matching template. By default, tags that are not XSL tags are copied to the output stream as they are. In this example, that means the generation of HTML tags. When the xsl:value-of XML node is hit, the XSLT processor executes the instruction. In the case of xsl:value-of, the processor evaluates the expression in the select attribute and outputs the text value of the result. Here the select attribute asks for the child elements of the current node whose XML node tag identifier is elements. If you refer back to Listing 6-1, you can see that it has two elements XML nodes. Executing the XSLT as is results in the following HTML output:

    <html>
        <body>
            <table>
                <tr><td>Hello</td></tr>
            </table>
        </body>
    </html>

It would appear from the HTML output that the XSLT processor made an error because only the Hello content is output and the World content is missing. It is not an XSLT processor error, but an XSLT document-instruction error.

Tracing the XSLT Processing Steps

As it goes through the XSLT processing steps, the XSLT processor looks for an xsl:template that matches the node it is currently visiting. In this XSLT document the match occurs only once. The data XML node, which is the root element, matches the wildcard, and the contents of the template are executed. Because the template does not ask the processor to continue with the child nodes, the subnodes of data are never visited on their own. The result is that the XSLT processor does not have any other nodes to process, and it exits.

While the contents of the xsl:template are being executed, the xsl:value-of instruction in XSLT 1.0 outputs the text of only the first node that its select expression finds, which in this case is the first elements XML node. The later elements XML node is skipped. Because of this, and because no template is ever applied to the elements XML nodes themselves, only one elements XML node is displayed.
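One way to see that xsl:value-of picks only the first node is to ask for each node by position. The sketch below assumes the input described in the text, a data root element with two elements children; the positional predicates [1] and [2] are standard XPath, but this is only an illustration and not a replacement for the iteration techniques covered later.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="data">
        <table>
            <!-- [1] and [2] are positional predicates: the first and second child named "elements" -->
            <tr><td><xsl:value-of select="elements[1]"/></td></tr>
            <tr><td><xsl:value-of select="elements[2]"/></td></tr>
        </table>
    </xsl:template>
</xsl:stylesheet>
```

With the assumed input, the first row should contain Hello and the second World. Hard-coding positions does not scale, which is why iteration is usually the better choice.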

The contents of an xsl:template must themselves be well-formed XML, because the XSLT document is parsed as XML before anything is executed. This is an important consideration because HTML is not XML-compliant. Specifically, the HTML tag <br> does not require a closing </br> tag. When a sole <br> HTML tag is added to the XSLT document, the XSLT processor reports an error because the XSLT document is no longer well-formed. Therefore, when you generate HTML content from an XSLT document, the HTML tags in the XSLT document must be well-formed XML; for example, write <br/> instead of <br>.
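As a minimal sketch of this rule, the stylesheet below writes the line break as an empty element so that the stylesheet stays well-formed. The xsl:output instruction with method="html" is part of XSLT 1.0 and asks the processor to serialize the result using HTML conventions; the surrounding content is made up for the illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Serialize the result tree using HTML rules rather than XML rules -->
    <xsl:output method="html"/>
    <xsl:template match="/">
        <html>
            <body>
                <!-- Well-formed in the stylesheet: the empty element is closed -->
                Hello<br/>World
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>
```

With the HTML output method, a conforming processor should write the empty br element as a plain <br> tag, so the browser receives ordinary HTML even though the stylesheet itself is strict XML.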

Iterating within an XSLT document

You can iterate through the XML content in two ways, as defined by the original XML document. The first way is to specifically select the element. The other is to specifically iterate a node set. These two methods are described in the following sections.

Specifically Selecting an Element

Thus far, the template matches have been wildcard matches, which means that the root XML node is selected. To select a specific node, you can change the match attribute to a specific XML node identifier, as shown in the following XSLT document. Notice that the HTML content generation has been removed for simplicity.

    <?xml version="1.0" encoding="UTF-8"?>
    <?xmlspysamplexml D:\Instructor\BookXMLSpy\xslt_project\introduction.xml?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="*">
            (<xsl:value-of select="."/>)
        </xsl:template>
        <xsl:template match="elements">
            (<xsl:value-of select="."/>)
        </xsl:template>
    </xsl:stylesheet>

The second xsl:template has a match value of elements. Executing the XSLT document results in the following output:

    <?xml version="1.0" encoding="UTF-8"?>
    (HelloWorld)

This result is not what you want. The wildcard template matched the data root XML node first, and selecting . on that node returns the text of all its child nodes combined; the elements template was never reached. To prove this, the xsl:value-of within the XSLT document is changed to the following:

    (<xsl:value-of select="name()"/>)

The value of the select attribute, as was shown previously, is an XPath expression. XPath is a specification that an XSLT processor uses to select individual XML nodes. With XPath it is also possible to call specific functions that retrieve information about an XML node. In the modified xsl:value-of node, the name() function outputs the textual identifier of the currently selected XML node. Executing the modified XSLT document results in the following output:

    <?xml version="1.0" encoding="UTF-8"?>
    (data)

This output shows that, indeed, the root XML element has been selected and the XSLT processor exits processing. There is a way out of this problem: reselecting all the child nodes. You do this by using the xsl:apply-templates XSLT instruction, which allows iteration of the child nodes. Here is the modified XSLT document:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xmlspysamplexml D:\Instructor\BookXMLSpy\xslt_project\introduction.xml?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="*">
            <xsl:apply-templates />
            (<xsl:value-of select="name()"/>)
        </xsl:template>
        <xsl:template match="elements">
            (<xsl:value-of select="name()"/>)
        </xsl:template>
    </xsl:stylesheet>

To make the XML document more interesting, the first elements XML node is embedded within a child XML node. The modified XML document is as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl"
        href="D:\Instructor\BookXMLSpy\xslt_project\HelloWorld.xsl"?>
    <data>
        <child>
            <elements>Hello</elements>
        </child>
        <elements>World</elements>
    </data>

If the modified XSLT document is executed, the following results are generated:

    <?xml version="1.0" encoding="UTF-8"?>
    (elements)
    (child)
    (elements)
    (data)

Looking at this output, you can see that the elements nodes have indeed been selected along with the individual child and data nodes. Simply put, instead of a single node selection, all nodes have been selected.

A compromise is to select only the elements XML nodes by commenting out the xsl:template with a wildcard match attribute, as shown by the following modified XSLT document fragment:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <!--<xsl:template match="*">
            <xsl:apply-templates />
            (<xsl:value-of select="name()"/>)
        </xsl:template>-->
        <xsl:template match="elements">

Now instead of selecting every XML node in the document, only the elements XML nodes are selected. Executing the modified XSLT document results in the following output:

    <?xml version="1.0" encoding="UTF-8"?>
    (elements)
    (elements)

The generated output gives you what you originally wanted.
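Because both lines of output read (elements), it can be hard to tell which node produced which line. The following sketch, which is an illustration rather than part of the original example, prints the text content next to the name. It assumes the modified XML document shown earlier, with one elements node inside child and one directly below data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <!-- Only elements nodes are matched explicitly; data and child
         fall through to the processor's default handling. -->
    <xsl:template match="elements">
        <xsl:text>(</xsl:text>
        <!-- name() returns the name of the current node -->
        <xsl:value-of select="name()"/>
        <xsl:text>: </xsl:text>
        <!-- "." returns the text content of the current node -->
        <xsl:value-of select="."/>
        <xsl:text>)</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

Run against the assumed input, this should print (elements: Hello) and (elements: World), separated by whatever whitespace the processor carries over from the input document.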

After you see the various XSLT modifications, it should become apparent that selection is not such a simple thing. Another complication that can occur is when found elements are embedded within each other.

Consider the following modified XML document:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl"
        href="D:\Instructor\BookXMLSpy\xslt_project\HelloWorld.xsl"?>
    <data>
        <child>
            <elements>Hello</elements>
        </child>
        <elements>World<sub>
            <elements>Embedded</elements></sub></elements>
    </data>

In this modified XML document, the elements XML node has an embedded elements XML node. Running the modified XSLT document on the modified XML document results in the following output:

    <?xml version="1.0" encoding="UTF-8"?>
    (elements)
    (elements)

The modified XML document contains three elements XML nodes, but only two are output. This result is a problem because only the outer elements XML node has been selected; the inner elements XML node has been ignored. This means that information has been lost. To get around this, you have to apply the xsl:apply-templates rule within the found template, as shown in the following XSLT document fragment:

    <xsl:template match="elements">
        <xsl:apply-templates />
        (<xsl:value-of select="name()"/>)
    </xsl:template>

Running the modified XSLT document on the modified XML document results in output similar to the following. The text content also appears because xsl:apply-templates visits the text nodes as well, and the processor copies them to the output by default:

    <?xml version="1.0" encoding="UTF-8"?>
    Hello
    (elements)
    WorldEmbedded
    (elements)
    (elements)

Now the correct output has been generated because all elements XML nodes have been selected. The flipside of this may be that the developer intended for the embedded elements XML node to be ignored. But that is a decision that must be made when designing the XML data and XSLT document.
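If the design decision goes the other way and the embedded content should be ignored, one possible approach is an empty template for the wrapper element. The sketch below assumes the modified XML document shown above, in which the embedded elements node sits inside a sub node; it is one option among several, not the only way to express the decision.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="elements">
        <xsl:apply-templates />
        (<xsl:value-of select="name()"/>)
    </xsl:template>
    <!-- Empty template: a sub node and everything below it, including
         any nested elements node, produces no output. -->
    <xsl:template match="sub"/>
</xsl:stylesheet>
```

With this sketch, the two outer elements nodes should still be reported, while the Embedded text and its elements node are dropped.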

Iterating a Found Element

The other way of iterating various XML nodes is to use the XSLT instruction xsl:for-each. This XSLT instruction builds a node set based on the matching criteria identical to the xsl:value-of select attribute. Consider the following XML document:

    <data>
        <child>
            <elements>Hello</elements>
        </child>
        <elements>World<sub>
            <elements>Embedded</elements></sub>
        </elements>
    </data>

The simplest iteration strategy is to select the data root XML node and then iterate through the various elements XML nodes. You can apply the following XSLT document:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="*">
            <xsl:for-each select="elements">
                (<xsl:value-of select="name()"/>)
            </xsl:for-each>
        </xsl:template>
    </xsl:stylesheet>

The xsl:for-each instruction must be within the found xsl:template and is applied to the currently found node. In this case, the found node would be the data node because of the wildcard match. The xsl:for-each select attribute specifies the elements XML node XPath expression. Running the XSLT document on the XML document results in the following output:

    <?xml version="1.0" encoding="UTF-8"?>
    (elements)

This result is only partially correct. Only one of the three possible elements has been found because the XPath expression was written incorrectly. In the XSLT document, the XPath only selects the elements XML nodes that are directly below the data XML node. This means that only the first-level child nodes are selected. In this XML document, there is only one first-level elements child node. To select all the elements XML nodes, you need to modify the XPath to the following:

    <xsl:for-each select="//elements">

The leading double slash is an XPath abbreviation that says, “Select all the XML nodes with the identifier elements, regardless of where they appear in the document.” Running the modified XSLT document on the XML document results in the following output:

    <?xml version="1.0" encoding="UTF-8"?>
    (elements)
    (elements)
    (elements)

This is the result that you want considering that there are three elements XML nodes in the document. Now change your query to match all elements XML nodes within a specific child XML node. The XSLT query to execute against the XML document at the beginning of this section is as follows:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="child">
            <xsl:for-each select="//elements">
                (<xsl:value-of select="name()"/>)
            </xsl:for-each>
        </xsl:template>
    </xsl:stylesheet>

The expected result of this document is for the XSLT processor to iterate all the XML nodes. When it finds a child XML node, all the descendant child elements XML nodes are iterated and displayed. Running the modified XSLT document on the XML document results in the following output:

    <?xml version="1.0" encoding="UTF-8"?>
    (elements)
    (elements)
    (elements)
    WorldEmbedded

This result is not the desired output because the XML document at the beginning of this section has one child XML node and only one contained elements XML node. All elements XML nodes in the document have been selected because the XPath query did not reference the current XML node location. Here is the correct XPath:

    <xsl:for-each select=".//elements">

The difference in this selection is the period before the double slash. The period is an XPath abbreviation for the current context node, so the expression says, “Start the search in the current context and not at the root of the document.” Executing the new XSLT query results in the following output:

    <?xml version="1.0" encoding="UTF-8"?>
    (elements)
    WorldEmbedded

This output is correct because only one elements XML node has been selected.
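The difference between the three XPath forms used in this section can be checked by counting what each one selects. The sketch below assumes the XML document from the beginning of this section and evaluates all three expressions with the child XML node as the current context; count() is a standard XPath function, and the labels in the output are made up for the illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:template match="/">
        <!-- Make the child node the current context for the three counts -->
        <xsl:for-each select="data/child">
            <!-- Direct children of the context node only -->
            <xsl:text>elements: </xsl:text>
            <xsl:value-of select="count(elements)"/>
            <!-- Anywhere in the document, regardless of the context node -->
            <xsl:text>, //elements: </xsl:text>
            <xsl:value-of select="count(//elements)"/>
            <!-- Anywhere below the context node -->
            <xsl:text>, .//elements: </xsl:text>
            <xsl:value-of select="count(.//elements)"/>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

For the assumed input, this should print elements: 1, //elements: 3, .//elements: 1, which matches the behavior described above: only the form with the leading period stays inside the child XML node while still searching at any depth.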

What is a bit bothersome is that the content WorldEmbedded is generated even though it has nothing to do with the XSLT document. The answer to this problem requires an additional inspection of the original XML document and the location of the generated content. Consider the original XML document again and look at the second elements XML node, the one that follows the child XML node:

    <data>
        <child>
            <elements>Hello</elements>
        </child>
        <elements>World<sub>
            <elements>Embedded</elements></sub>
        </elements>
    </data>

What should become apparent is that this elements XML node and its contents are not selected by the XSLT document specified by the user. Instead, those XML nodes are processed by the default templates, as shown by the following XSLT default template fragment, in which mode="?" stands for any mode:

    <xsl:template match="*|/" mode="?">
        <xsl:apply-templates mode="?"/>
    </xsl:template>
    <xsl:template match="text()|@*" mode="?">
        <xsl:value-of select="."/>
    </xsl:template>

Notice the second xsl:template, which has a match of text(). This XSLT instruction is responsible for generating the extra output. This output informs the developer that some XML content has not been accounted for. To get rid of this unwanted side effect, you need to modify the XSLT document to the following:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="child">
            <xsl:for-each select=".//elements">
                (<xsl:value-of select="name()"/>)
            </xsl:for-each>
        </xsl:template>
        <xsl:template match="text()">
        </xsl:template>
    </xsl:stylesheet>

The additional xsl:template with a match of text() overrides the default template handler without affecting the XSLT generation of the document. text() is an XPath node test rather than a function: used in the match attribute, it matches every text node. Because the body of the new template is empty, the matched text nodes produce no output.
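Another way to avoid the stray text, sketched here as an alternative and not as the approach taken in the example above, is to take over processing at the document root so that the default templates never get to walk the rest of the tree. The sketch assumes the same XML document as before.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <!-- Matching "/" replaces the default root handling, so only the
         nodes selected here are processed. -->
    <xsl:template match="/">
        <xsl:apply-templates select="//child"/>
    </xsl:template>
    <xsl:template match="child">
        <xsl:for-each select=".//elements">
            <xsl:text>(</xsl:text>
            <xsl:value-of select="name()"/>
            <xsl:text>)</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

For the assumed input this should print (elements) and nothing else. The trade-off is that the root template must now name every part of the document you care about, whereas the empty text() template keeps the default traversal and only silences the text.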

Tip

Because I am writing a book, when I am trying to diagnose an XSLT problem like the one just outlined, I have time to experiment and figure out why certain problems arise. But in a production-coding scenario, you do not have that luxury. The easiest way to diagnose any XSLT problem is to debug it using the XMLSPY XSLT debugger. Following any other course of action is a lesson in futility.