XSLT Introduction


XSLT is a very powerful tool in the XML toolkit. It is an XML-based standard that facilitates the transformation of an XML document into another format. Depending on your requirements, the output document can be one of (but not limited to) the following formats:

  • XML

  • XHTML

  • HTML

  • WML

  • RTF

  • CSV

  • Others

As you can see, the term XSLT processing describes any one of a number of tasks. Given an input XML document, a transformation could change an element name; the character data inside an element can be modified; an attribute can be changed; or elements and attributes can be filtered, so that they don't appear in the output XML document. XSLT is a powerful tool that enables us to easily change the contents of an XML document based on our requirements.

Figure 8.1 shows the components of an XSLT process. As you can see in the figure, there are three required components to any XSLT-based transformation:

  • Input XML document

  • XSLT processor

  • XSLT stylesheet

Figure 8.1. XSLT transformation process.

An important concept to take away from Figure 8.1 is that the data only flows in one direction. The input to an XSLT processor is always XML; a processor doesn't turn other formats into XML, it only transforms XML into other formats (which may themselves be XML).

The input XML document is the source of the data that will be transformed. Note that this is the only data source available for this transformation (that is, all information in the transformed files is the entire contents or a subset of the input XML document). The fundamental components of the XSLT transformation process are XSLT stylesheets and the XSLT processor.

Why would we want to convert an XML document into another format? Several situations exist in which this kind of conversion is useful.

Probably the two most widely used transformations are converting XML documents to other XML documents and converting XML documents to HTML documents. Let's take a look at a few situations in which these two transformations would be used.

Converting XML Documents to Other XML Documents

The ability to convert from one XML document to other XML document formats can be used in a number of situations and provides a great deal of flexibility. Let's take a look at a few everyday scenarios when this conversion could be used. This conversion can be used if you need to filter an XML document or convert it to support another DTD or XML schema format. We'll take a closer look at each of these cases in the following sections.

XML Document Filtering

The subject of XML document filtering has a number of applications. For example, let's say that your company exchanges XML formatted documents with other companies. Assume that internal to your company, you use and distribute an XML document that is based on a corporate DTD or schema containing information about products and customers. However, this DTD or schema contains several elements that contain company-sensitive information (for example, client names and phone numbers). This is clearly information that you will need to remove from the XML document before you send it to another company.

One way to filter the sensitive elements from the XML document is to manually edit it. This isn't very practical; it is error prone and may be a nearly impossible task if the XML document is large. The better approach is to use an XML document filter to generate a new XML document. The first step is to generate a new external DTD or XML schema that doesn't contain the sensitive elements. Then, use XSLT to transform the original internal document into a version that conforms with the new DTD or XML schema that does not contain sensitive information. This approach also provides a better long-term solution because it can easily be reused.
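To make the filtering idea concrete, here is a minimal sketch. Note that it is not XSLT: it uses Python's standard xml.etree.ElementTree module to perform the same kind of filtering, only to illustrate what the transformation has to accomplish. The document and the element names (client_name, phone, and so forth) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical internal document; the element names are illustrative only.
INTERNAL_XML = """<customers>
   <customer>
      <product>Widget</product>
      <client_name>Acme Corp</client_name>
      <phone>555-0100</phone>
   </customer>
</customers>"""

# Assumed list of elements that must not leave the company.
SENSITIVE = {"client_name", "phone"}


def filter_sensitive(xml_text, sensitive=SENSITIVE):
    """Parse the document and remove every element named in `sensitive`."""
    root = ET.fromstring(xml_text)
    # Take a snapshot of all elements first, then remove sensitive children.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in sensitive:
                parent.remove(child)
    return root


external = filter_sensitive(INTERNAL_XML)
remaining_tags = [element.tag for element in external.iter()]
print(remaining_tags)
```

An XSLT stylesheet expresses the same rule declaratively, which is what makes it easy to reuse across documents.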

Transforming XML to Support a Different DTD or XML Schema

When companies exchange data using XML, the ideal situation is for all the companies involved to standardize on a common DTD or XML schema. However, that may not always happen for a number of reasons. For example, one of the companies may not be willing (or able) to change their internal applications to support a common DTD or XML schema.

One approach is to transform your XML document to another company's preferred format. This process isn't as smooth as it could be if both companies had agreed on a common DTD or XML schema. However, there are workarounds you can use to facilitate the transfer of data. In this case, you could follow an approach similar to the one discussed in the last section. Develop a DTD or XML schema that matches the preferred format of the other company. Then, develop an XSLT stylesheet to transform your XML document so that it complies with that DTD or XML schema.
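As a rough illustration of what such a conversion involves, the following sketch renames elements according to a mapping table. Again, this is Python's standard library rather than XSLT, and the partner's element names (staff_member, full_name) are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from our internal element names to a partner's names.
TAG_MAP = {"employee": "staff_member", "name": "full_name"}

INTERNAL_XML = "<timesheet><employee><name>Kayla</name></employee></timesheet>"


def rename_elements(xml_text, tag_map=TAG_MAP):
    """Return the document as a string with element names translated."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Elements without an entry in the mapping keep their name.
        element.tag = tag_map.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


converted = rename_elements(INTERNAL_XML)
print(converted)
```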

Another example of transforming to use a different format would be different companies using the Electronic Business using XML (ebXML) format to exchange data. ebXML is a group of related specifications designed to facilitate global exchange of business-related information. ebXML is a powerful suite of specifications; however, for internal corporate information interchange, it is probably more than you need. In cases such as this, a company can translate internal XML documents to ebXML documents to send data to a global partner. The same approach can be followed for incoming ebXML documents you may want to convert into a local, internal corporate format.

Converting XML Documents to HTML Documents

Another widely used transformation is from XML to HTML/XHTML. As you can imagine, this is an important application of XSLT, especially on the web. For example, users can query a database using the Perl DBI, convert the results of the query to XML by using any number of modules (for example, XML::Writer) and dynamically generate HTML reports. One of the benefits of this conversion is that the format and content of the reports can be predefined by using an XSLT stylesheet. After the stylesheet has been defined, reports can be generated over and over again using the same stylesheet.

For example, let's say that you're responsible for generating a dynamic accounting report that tracks charges based on project number. The accounting for each project can change on a daily basis depending on travel, purchases, and so forth. By using an XSLT stylesheet to convert from XML to HTML/XHTML, you can easily provide reports that will always be based on the latest available data. The next few sections go into more detail on how to perform these types of transformations.

If you remember, earlier in the book I mentioned that one of the major benefits of XML is that it separates the data from the presentation of the data. XSLT can be used to generate different versions (or views) of the same data depending on the client that requests the data. This can easily be accomplished by developing multiple XSLT stylesheets for each report. Each of the XSLT stylesheets can be used to generate the output for a particular device. For example, let's assume that you have a report that contains graphics (for example, charts or photographs), and some of the clients requesting the report aren't capable of displaying all the graphics (for example, a mobile phone). A WAP-specific XSLT stylesheet can be used to filter out the graphics and just send the text of the report.

XPath Introduction

XPath is a non-XML language that was developed to search and access portions of an XML document. In addition to the searching capability, XPath also provides a few basic functions for manipulating data. The name XPath comes from the fact that the standard uses path notation (similar to a URL) to work with the hierarchical structure of a document. We can compare the relationship between XPath and XML to the relationship between SQL and a database. Granted, they are not the same, but the comparison is a useful one.

XPath provides the capability to select the first occurrence of an element, the third occurrence of another element, or to retrieve the social security attribute of an employee element that has the firstname "Mark." It is a very powerful capability, and it is an important component of a number of XML-based standards, such as XPointer, XML Schemas, and XSLT. So, before discussing XSLT stylesheets, I thought it would be advantageous to provide an introductory discussion of the XPath standard.

Note

One advantage of using XPath in the other standards is commonality among the standards. After you're familiar with XPath, you'll be able to apply the knowledge in a number of areas. This is an example of when reuse of a technology is beneficial to you as the user.


Because we're concerned with XPath and XSLT in this chapter, what is their relationship? When using XSLT to transform an input XML document, we'll generally have an input XML document, an input XSLT stylesheet, and an output document (which could be in any number of formats). XSLT searches the input XML document using XPath by comparing the rules defined in the XSLT stylesheet (also called templates) to the input XML document. After a match has been found in the input XML document, the matching construct can be copied by the XSLT processor and then processed based on the rule in the XSLT stylesheet. This rule might call for the conversion to HTML, filtering of the XML document, or any of a number of possibilities.

We have an idea of what XPath does, but how does it work, and what is the format of the syntax? One important point to keep in mind is that XPath works on an XML document that has already been parsed and stored in a tree. So, similar to the DOM parsers (which were discussed in Chapter 4, "Tree-Based Parser Modules"), XPath views an XML document as a tree of nodes. The notation used to describe the tree structure is basically the same, so if you're familiar with the DOM trees, you shouldn't have a problem understanding the XPath notation.

It is easier to discuss XPath in the context of an XML document. Let's assume that your company employs a web-based timekeeping system, and that an application generates an XML document containing all the submitted timecard information. A sample of an XML document with timecard information is shown in Listing 8.1.

Listing 8.1 Corporate timesheet information in XML. (Filename: ch8_xpath_timesheet.xml)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE timesheet SYSTEM "timesheet.dtd">
<timesheet week="5/31">
   <employee type="salary">
      <name>Joseph</name>
      <project>Book development</project>
      <hours>40</hours>
   </employee>
   <employee type="hourly">
      <name>Kayla</name>
      <project>Editing</project>
      <hours>45</hours>
   </employee>
   <employee type="hourly">
      <name>Marijo</name>
      <project>Artwork</project>
      <hours>45</hours>
   </employee>
   <employee type="salary">
      <!-- Dan had a busy week, as usual -->
      <name>Dan</name>
      <project>Indexing</project>
      <hours>70</hours>
   </employee>
</timesheet>

As far as XPath is concerned, there are seven types of possible nodes that might appear in the node tree. All the different types of XML objects (for example, elements, attributes, and so forth) are mapped to node types in the tree. The seven types are

  • Root node. Appears only once in a tree. It represents the document itself and is the parent of the document's root element (<timesheet> in Listing 8.1). Note that the root node does not have a parent.

  • Element node. Every element in the input XML document will have a corresponding element node in the node tree (for example, <name> in Listing 8.1). Note that an element has a parent and may have child nodes.

  • Attribute node. Every attribute in the document appears as an attribute node in the tree (for example, the type attribute in Listing 8.1).

  • Text node. Contains the text from an element (for example, "Joseph" from the first <name> element in Listing 8.1). Note that a text node has a parent but cannot have any children.

  • Comment node. Contains the text of a comment, minus the opening <!-- and the closing --> (for example, " Dan had a busy week, as usual "). Note that comment nodes have a parent but don't have any children.

  • Processing Instruction (PI) node. Provided so that XML can pass additional information to an application that may read the XML document. Remember, a PI starts with "<?" and ends with "?>". The contents of the PI in the node tree do not include the "<?" and "?>" that surround the instruction. Note that the XML declaration at the top of Listing 8.1 (<?xml version="1.0" encoding="UTF-8"?>) looks like a PI but is not one, so it has no node in the tree; Listing 8.1 contains no PI nodes.

  • Namespace node. Represents the namespace on an element or node. Each namespace node has a parent element but isn't considered to be a child of that parent (that is, the parent doesn't consider the namespace node to be a child).
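To see several of these node types in a real tree, the following sketch walks a shortened version of Listing 8.1 and counts the nodes by type. It uses Python's xml.dom.minidom, which builds a DOM tree rather than an XPath tree, so treat it as an approximation: the DOM has no namespace nodes, it reaches attributes through each element's attributes map, and whitespace between elements shows up as extra text nodes. The helper name count_nodes is my own.

```python
from xml.dom import minidom, Node

# Shortened version of Listing 8.1 (no DOCTYPE, so no DTD file is needed).
TIMESHEET = """<?xml version="1.0" encoding="UTF-8"?>
<timesheet week="5/31">
   <employee type="salary">
      <!-- Dan had a busy week, as usual -->
      <name>Dan</name>
      <hours>70</hours>
   </employee>
</timesheet>"""

# Map DOM node type constants to the names used in the list above.
LABELS = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}


def count_nodes(node, counts):
    """Recursively count the nodes below (and including) `node` by type."""
    label = LABELS.get(node.nodeType, "other")
    counts[label] = counts.get(label, 0) + 1
    if node.nodeType == Node.ELEMENT_NODE:
        # Attributes are not children in the DOM; count them separately.
        counts["attribute"] = counts.get("attribute", 0) + node.attributes.length
    for child in node.childNodes:
        count_nodes(child, counts)
    return counts


counts = count_nodes(minidom.parseString(TIMESHEET), {})
print(counts)
```

Notice that no processing instruction is counted, even though the document starts with an XML declaration.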

An XPath expression contains the information required to match an XML object, such as an element or an attribute. The format of the expressions is similar to the paths used in a Unix filesystem, where "/" represents the root directory, and other directories can be specified by using their path. For example, on a Unix machine, the path /home/mark/xml would point to the xml directory in the mark directory, which is part of the home directory.

Let's try and look at a few simple XPath expressions that are based on the XML document shown in Listing 8.1. A majority of your XPath expressions refer to the root node, element nodes, or attribute nodes. Let's take a look at the XPath expressions required to access each of these node types.

Retrieving the Root Node

The simplest XPath expression is "/", which matches the root node. Because the root node sits at the top of the tree, the result returned by this XPath expression would be the entire XML document except for the XML declaration and the DOCTYPE declaration at the top of the XML document, which are not part of the node tree.

Retrieving Element Nodes

XPath expressions can be used to match groups of element nodes, or an individual element node. For example, the XPath expression /timesheet/employee would match all the employee elements that are children of the timesheet element. The path is evaluated one step at a time: the first step selects timesheet, which then becomes the context node for the employee step, so we'll retrieve all the employee elements and their children. In our case, that would be the following:

<employee type="salary">
   <name>Joseph</name>
   <project>Book development</project>
   <hours>40</hours>
</employee>
<employee type="hourly">
   <name>Kayla</name>
   <project>Editing</project>
   <hours>45</hours>
</employee>
<employee type="hourly">
   <name>Marijo</name>
   <project>Artwork</project>
   <hours>45</hours>
</employee>
<employee type="salary">
   <!-- Dan had a busy week, as usual -->
   <name>Dan</name>
   <project>Indexing</project>
   <hours>70</hours>
</employee>

If we want to go one layer deeper into the XML document and retrieve all the name nodes in the tree, we could use nearly the same XPath expression as the previous example; however, we'd add one additional branch onto the tree. For example, the XPath expression /timesheet/employee/name would return all the name nodes from the node tree. In our case, the following elements would be returned:

<name>Joseph</name>
<name>Kayla</name>
<name>Marijo</name>
<name>Dan</name>

We can retrieve a particular element node based on the node location in the tree. If we wanted to find the name of the second employee in the node tree, we would use the following XPath expression: /timesheet/employee[2]/name. As you can see, we have an index on the employee node that tells XPath which employee node we're interested in. Note that the index of the first occurrence starts at 1 rather than 0 (for those of you who are used to arrays starting at 0). Using this XPath expression, the result would be

 <name>Kayla</name> 
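
If you'd like to try these expressions yourself, here is a minimal sketch using Python's standard xml.etree.ElementTree module. Keep in mind that ElementTree implements only a subset of XPath and evaluates paths relative to an element, so the leading /timesheet step is dropped here because we start from the <timesheet> element itself. The data is Listing 8.1 without the DOCTYPE line.

```python
import xml.etree.ElementTree as ET

# Listing 8.1, without the DOCTYPE line so no DTD file is needed.
TIMESHEET = """<timesheet week="5/31">
   <employee type="salary">
      <name>Joseph</name>
      <project>Book development</project>
      <hours>40</hours>
   </employee>
   <employee type="hourly">
      <name>Kayla</name>
      <project>Editing</project>
      <hours>45</hours>
   </employee>
   <employee type="hourly">
      <name>Marijo</name>
      <project>Artwork</project>
      <hours>45</hours>
   </employee>
   <employee type="salary">
      <!-- Dan had a busy week, as usual -->
      <name>Dan</name>
      <project>Indexing</project>
      <hours>70</hours>
   </employee>
</timesheet>"""

root = ET.fromstring(TIMESHEET)  # root is the <timesheet> element

# /timesheet/employee: all employee children of timesheet
employees = root.findall("employee")

# /timesheet/employee/name: one level deeper
names = [name.text for name in root.findall("employee/name")]

# /timesheet/employee[2]/name: positions start at 1, not 0
second_name = root.find("employee[2]/name").text

print(len(employees), names, second_name)
```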
Retrieving Attribute Nodes

Attributes can also be retrieved using XPath by specifying the attribute name using the at symbol "@". For example, let's say that we want to retrieve the week attribute of the root timesheet element. We can use the XPath expression /timesheet/@week, and the result would be

 week="5/31" 

Attributes can also be retrieved from a particular element node. For example, let's say that we want to retrieve the type attribute (salary or hourly) of the second employee in the node tree. We can use the XPath expression /timesheet/employee[2]/@type, and the result will be

 type="hourly" 
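
The same lookups can be sketched in Python's standard xml.etree.ElementTree module, with one caveat: its XPath subset cannot return attribute nodes, so instead of @week and @type we select the element and then call get() on it. The last line goes slightly beyond the examples above and uses an attribute predicate, which ElementTree does support. The data is a trimmed copy of Listing 8.1.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of Listing 8.1 (names and type attributes only).
TIMESHEET = """<timesheet week="5/31">
   <employee type="salary"><name>Joseph</name></employee>
   <employee type="hourly"><name>Kayla</name></employee>
   <employee type="hourly"><name>Marijo</name></employee>
   <employee type="salary"><name>Dan</name></employee>
</timesheet>"""

root = ET.fromstring(TIMESHEET)  # root is the <timesheet> element

# /timesheet/@week: read the attribute from the root element
week = root.get("week")

# /timesheet/employee[2]/@type: select the element, then read its attribute
second_type = root.find("employee[2]").get("type")

# /timesheet/employee[@type='hourly']/name: filter elements by attribute value
hourly_names = [n.text for n in root.findall("employee[@type='hourly']/name")]

print(week, second_type, hourly_names)
```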

Unfortunately, we can't cover all the possible examples of XPath expressions. However, we've given you a few examples of the most widely used types of XPath expressions. For additional XPath information, take a look at the XPath standard available online at http://www.w3.org/TR/xpath.

XSLT Stylesheets

XSLT stylesheets are XML files that define the rules for the transformation between XML and other text formats. The XSLT stylesheets contain all the information regarding output content (that is, which elements and attributes you want to have appear in the output document) and format (for example, colors, font, and so forth).

Note

This is a very involved topic that can easily fill a book all by itself (and there are already several available). Unfortunately, I don't have the luxury of time (and paper) to go into great detail about XSLT stylesheets. However, I will present enough material to get you started and introduce the topic of XSLT stylesheets. This, combined with the XSLT stylesheets used in the examples throughout this chapter, should provide you with a good understanding of the topic. A great book that you should add to your library is Inside XSLT, by Steven Holzner from New Riders.


Let's take a look at a sample XSLT stylesheet that converts an XML document to an HTML document containing a bulleted list. However, I'll start at the beginning of the process, so that you can see all the steps involved in developing a stylesheet.

First, Listing 8.2 shows a DTD for a very simple XML document that will contain a listing of XML-related book titles. Because this DTD defines an XML document that is about as simple as you can build, this is probably the equivalent of our "Hello World" example. As you can see, this XML document has a root element named <xml_library> and has one or more child elements named <book>. Listing 8.3 shows the XML schema that describes the same document.

Listing 8.2 DTD for the XML library document. (Filename: ch8_xml_library.dtd)
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT xml_library (book+)>
<!ELEMENT book (#PCDATA)>

Listing 8.3 XML schema for the XML library document. (Filename: ch8_xml_library.xsd)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
   <xs:element name="book" type="xs:string"/>
   <xs:element name="xml_library">
      <xs:complexType>
         <xs:sequence>
            <xs:element ref="book" maxOccurs="unbounded"/>
         </xs:sequence>
      </xs:complexType>
   </xs:element>
</xs:schema>

Listing 8.4 shows the XML document that was built using the DTD shown in Listing 8.2. As you can see, it contains a short list of some of the XML-related books currently available from New Riders Publishing. This XML document is the input XML file for the transformation to HTML.

Listing 8.4 Simple XML document for XML to HTML conversion. (Filename: ch8_xml_library.xml)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml_library SYSTEM "ch8_xml_library.dtd">
<xml_library>
   <book>
      XML and PHP
   </book>
   <book>
      XML and ASP.NET
   </book>
   <book>
      XML, HTML, XHTML Magic
   </book>
   <book>
      Inside XML
   </book>
   <book>
      Designing SVG Web Graphics
   </book>
   <book>
      XML and Perl
   </book>
</xml_library>

Remember, our goal here is to convert this XML file to an HTML file that will display all the books in a bulleted list. Listing 8.5 shows the XSLT file that contains the rules used by the XSLT processor to transform the input file to HTML. Let's take a closer look at the XSLT file.

Listing 8.5 XSLT stylesheet to convert from XML to HTML. (Filename: ch8_xml_library.xslt)
1.   <?xml version="1.0"?>
2.   <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
3.
4.   <xsl:template match="/">
5.   <html>
6.      <head>
7.         <title>My XML Library</title>
8.      </head>
9.      <body>
10.     <h2>Library Report</h2>
11.        <ul>
12.           <xsl:for-each select="xml_library/book">
13.              <li><xsl:value-of select="."/></li>
14.           </xsl:for-each>
15.        </ul>
16.     </body>
17.  </html>
18.  </xsl:template>
19.
20.  </xsl:stylesheet>

Lines 1-2. The first important property of XSLT stylesheets that you should be aware of is on line 1. An XSLT stylesheet is an XML document. This is important for a few reasons. First, it is important because an XML parser can read the XSLT stylesheet; that is, it is in an already understood and well-defined format. Second, because the XSLT stylesheet is an XML document, it must follow all the rules related to format and content as any other XML document does. Because you're now familiar with the format of an XML document (that is, elements, attributes, and so forth), you'll understand the format of an XSLT stylesheet and what the stylesheet is trying to do.

1.   <?xml version="1.0"?>
2.   <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

All XSLT elements are in the http://www.w3.org/1999/XSL/Transform namespace. Typically, most users represent this namespace by using an xsl prefix. As you can see, the xsl prefix is used throughout our stylesheet.

Note

The root element of this XSLT stylesheet is named stylesheet; however, transform can also be used. These two terms have the same meaning to an XSLT processor.


Lines 4-10. All the elements that we want to appear in the output document must be generated inside an xsl:template element. This element has an attribute named match that contains an XPath pattern identifying the nodes we want to process in the source XML document. In our case, we're matching "/", which is the root node of our document. So, starting at the root of the document (that is, the entire XML document), this stylesheet wraps the output in the HTML <html> and <body> tags. The <html> tag is the outermost tag and identifies the document as an HTML document, and the <head> tag contains the <title> tag, which contains the string that will show up in the title bar of the browser window.

4.   <xsl:template match="/">
5.   <html>
6.      <head>
7.         <title>My XML Library</title>
8.      </head>
9.      <body>
10.     <h2>Library Report</h2>

Lines 11-20. This section contains the definition for the unordered list. The <ul> tags display an unordered bulleted list, and each <li> tag marks a list item preceded by a bullet. As you can see, we're using the <xsl:for-each> element to iterate through all the elements that match the XPath expression xml_library/book. In our case, we'll match on all the <book> elements. After we find a matching element, we retrieve the contents of the current (or context) element by using <xsl:value-of> with the select="." attribute.

11.        <ul>
12.           <xsl:for-each select="xml_library/book">
13.              <li><xsl:value-of select="."/></li>
14.           </xsl:for-each>
15.        </ul>
16.     </body>
17.  </html>
18.  </xsl:template>
19.
20.  </xsl:stylesheet>
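
If it helps to think of the stylesheet procedurally, the sketch below mirrors the logic of Listing 8.5 in plain Python. It is not an XSLT processor and does not read the stylesheet at all; it simply hard-codes the same steps (emit the fixed HTML, loop over the <book> elements, write each one's text into an <li>). The function name library_to_html and the shortened input are my own.

```python
import xml.etree.ElementTree as ET
from html import escape

# Shortened version of Listing 8.4.
LIBRARY = """<xml_library>
   <book>XML and PHP</book>
   <book>XML and Perl</book>
</xml_library>"""


def library_to_html(xml_text):
    """Build the same kind of HTML report that Listing 8.5 describes."""
    root = ET.fromstring(xml_text)  # the <xml_library> element
    items = []
    # Counterpart of <xsl:for-each select="xml_library/book">
    for book in root.findall("book"):
        # Counterpart of <xsl:value-of select=".">: the element's text content
        text = "".join(book.itertext())
        items.append("<li>" + escape(text) + "</li>")
    return (
        "<html><head><title>My XML Library</title></head>"
        "<body><h2>Library Report</h2><ul>"
        + "".join(items)
        + "</ul></body></html>"
    )


html_report = library_to_html(LIBRARY)
print(html_report)
```

The advantage of the stylesheet over code like this is that the report layout lives in a separate file that can be changed without touching the program.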

At this point, I have defined all the inputs required for an XSLT processor (that is, a well-formed XML document and an XSLT stylesheet). I've purposely skipped showing the code for the XSLT processor; that is covered in the next section. The purpose here was to focus on the input files that are required. At this point, think of an XSLT processor as the cloud or black box that takes an XML document and an XSLT stylesheet as input and performs some type of translation. In this example, the XSLT processor would now take these input files and generate an HTML document. The generated HTML document is shown in Listing 8.6. As you can see, it is a simple HTML document that has the contents of our title element and the contents of the input XML document formatted in an unordered bulleted list.

Listing 8.6 HTML generated from the XSLT transformation. (Filename: ch8_xml_library.html)
<html>
   <head>
      <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
      <title>My XML Library</title>
   </head>
   <body>
      <h2>Library Report</h2>
      <ul>
         <li>
            XML and PHP
         </li>
         <li>
            XML and ASP.NET
         </li>
         <li>
            XML, HTML, XHTML Magic
         </li>
         <li>
            Inside XML
         </li>
         <li>
            Designing SVG Web Graphics
         </li>
         <li>
            XML and Perl
         </li>
      </ul>
   </body>
</html>

Viewed from within a browser, the HTML file shown in Listing 8.6 would produce the output shown in Figure 8.2.

Figure 8.2. Browser view of the XML library HTML file.

Granted, this was a very simple example, but I just wanted to show how a stylesheet works and the process involved in using a stylesheet. A lot of options exist that can be used with stylesheets. Unfortunately, I can't cover them all; however, I'll present several examples that use different stylesheet options.

Now that we've taken a look at XSLT stylesheets, let's talk about the XSLT processor.

What Is an XSLT Processor?

Because XSLT is an XML-based standard, XSLT processors are available for all the major programming languages (for example, C/C++, Java, and of course, Perl). Several Perl modules exist (for example, XML::LibXSLT and XML::Sablotron) that perform XSLT processing, and I'll present examples of the most popular modules a little bit later in this chapter.

Looking back at Figure 8.1, recall that the XSLT processor accepts an XML document or a DOM-like tree and an XSLT stylesheet as input and then generates output based on the rules in the XSLT stylesheet. All XSLT processors follow several basic steps.

  1. First, the XSLT processor parses the stylesheet and stores the contents in a tree structure. For our discussion, let's call this the XSLT tree. If you remember, tree-based parsing was discussed back in Chapter 4.

  2. Second, the input XML document is parsed and stored in a separate DOM-like tree structure. Some XSLT processors perform this parsing themselves, while others expect the tree structure as input. For this discussion, let's call this the XML input document tree.

  3. Finally, the XSLT processor uses the template specified by the <xsl:template> element(s) in the XSLT tree and finds the corresponding elements in the XML input document tree. In our simple example, we only had one <xsl:template> element and it was <xsl:template match="/">, so we basically matched the entire XML input document tree. In a more advanced stylesheet, the XSLT processor would walk through the XSLT tree and, for each new <xsl:template> it encountered, it would recursively search the XML input document tree for matching elements.
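
The three steps above can be sketched in a few lines of Python. This is only a toy illustration of the idea, not how any real XSLT processor is implemented: it parses a cut-down copy of Listing 8.5 into a tree, parses a short input document into a second tree, and then uses the select expression found in the stylesheet to locate the matching input elements. The wrapper element named root-node is my own stand-in for the XPath root node, because ElementTree has no such node.

```python
import xml.etree.ElementTree as ET

XSL_URI = "http://www.w3.org/1999/XSL/Transform"
XSL_NS = {"xsl": XSL_URI}

# Cut-down copy of Listing 8.5.
STYLESHEET = """<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
   <body>
      <ul>
         <xsl:for-each select="xml_library/book">
            <li><xsl:value-of select="."/></li>
         </xsl:for-each>
      </ul>
   </body>
</html>
</xsl:template>
</xsl:stylesheet>"""

# Shortened version of Listing 8.4.
LIBRARY = """<xml_library>
   <book>XML and PHP</book>
   <book>XML and Perl</book>
</xml_library>"""

# Step 1: parse the stylesheet into the "XSLT tree".
xslt_tree = ET.fromstring(STYLESHEET)
template_matches = [t.get("match") for t in xslt_tree.findall("xsl:template", XSL_NS)]
loop_selects = [f.get("select") for f in xslt_tree.iter("{%s}for-each" % XSL_URI)]

# Step 2: parse the input document into the "XML input document tree".
input_root = ET.fromstring(LIBRARY)
root_node = ET.Element("root-node")  # hypothetical stand-in for the root node "/"
root_node.append(input_root)

# Step 3: the template matched "/", so evaluate its select from the root node.
matched_books = [book.text for book in root_node.findall(loop_selects[0])]

print(template_matches, loop_selects, matched_books)
```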

We've had a chance to walk through the mechanics of a generic XSLT processor, but haven't mentioned anything about Perl. So, let's take a look at a few examples using the different Perl XSLT modules that are currently available.



XML and Perl
ISBN: 0735712891
Year: 2002
Pages: 145
