6.1 XPathScript Basics | XML Publishing with Axkit

An XPathScript document's basic syntax is similar to that found in Active Server Pages and similar technologies, where literal output is separated from functional code blocks using the familiar <% . . . %> pseudotags as delimiters. As with Apache::ASP, the embedded language is Perl, and the mixture of code and markup is processed from the top down: literal content is returned as it is encountered, and each code block is replaced by the markup it produces, yielding the complete result:

 <html>
   <body>
     <p>Greetings. The local time for this server is
       <% print scalar localtime; %>
     </p>
   </body>
 </html>

Appending an equals sign (=) to the opening delimiter provides a shortcut that sends the value of the enclosed expression directly to the output. The following sends the same result to the browser as the previous example, but without the print statement:

 <html>
   <body>
     <p>Greetings. The local time for this server is
       <%= scalar localtime %>
     </p>
   </body>
 </html>
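To make the delimiter rules concrete, here is a toy processor written in plain Perl that handles both <% ... %> and <%= ... %>. It is a sketch under simplifying assumptions (no XML parsing, no support for a literal %> inside code, and run_template is a hypothetical name), not the way AxKit's XPathScript processor is actually implemented. Literal text is printed as-is, <% %> blocks run as statements, and <%= %> blocks have their value printed, all in top-down order.

```perl
use strict;
use warnings;

# Toy processor for <% code %> and <%= expression %> blocks.
# Hypothetical helper for illustration only; not AxKit's implementation.
sub run_template {
    my ($template) = @_;
    my @literals;    # literal text chunks, printed by index
    my $code = '';   # Perl source generated from the template

    # Split on the delimiters, keeping the code blocks as list items.
    for my $part ( split /(<%=?.*?%>)/s, $template ) {
        next if $part eq '';
        if ( $part =~ /\A<%=(.*)%>\z/s ) {
            # <%= expr %> : print the value of the expression
            $code .= "print do { $1 };\n";
        }
        elsif ( $part =~ /\A<%(.*)%>\z/s ) {
            # <% code %> : run the code; it prints whatever it wants
            $code .= "$1\n;\n";
        }
        else {
            # literal text : print it unchanged
            push @literals, $part;
            my $i = $#literals;
            $code .= "print(\$literals[$i]);\n";
        }
    }

    # Run the generated code with print redirected into a string.
    my $output = '';
    open my $fh, '>', \$output or die "cannot open in-memory handle: $!";
    my $previous = select $fh;
    my $ok    = eval "$code; 1";
    my $error = $@;
    select $previous;
    close $fh;
    die "template error: $error" unless $ok;
    return $output;
}

print run_template('<p>The local time is <%= scalar localtime %></p>' . "\n");
print run_template(
    q{<select name="day"><% print "<option>$_</option>" for qw(Sun Mon); %></select>} . "\n"
);
```

Redirecting print into an in-memory filehandle is what lets a <% %> block "return" markup simply by printing it, which mirrors how the examples in this section behave.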

XML elements and attributes can be added to the result from within code blocks simply by printing their textual representation:

 <select name="day">
   <%
      foreach my $day ( qw(Sun Mon Tue Wed Thu Fri Sat) ) {
         print "<option value='$day'>$day</option>";
      }
   %>
 </select>

In terms of lexical scoping, an XPathScript stylesheet behaves like any Perl script: variables and subroutines declared outside of any block, and not explicitly placed in another package, are available throughout the entire stylesheet. Among other things, this allows you to break your stylesheets into logical sections, giving a somewhat cleaner division between functional code and markup:

 <% my @fish_names = qw( tuna halibut salmon scrod ); %>
 <html>
   <body>
     <form>
       <p>Choose Your Favorite Fish</p>
       <select name="fave_fish">
       <%
          # @fish_names is still in scope here
          foreach my $fish ( sort(@fish_names) ) {
             print "<option value='$fish'>$fish</option>";
          }
       %>
       </select>
     </form>
   </body>
 </html>

Any similarity between XPathScript and the various Server Pages technologies ends at this superficial syntactic level. Remember, XPathScript was designed as an alternative to XSLT. As such, it provides both a means of selecting and extracting information from an XML source and a means of creating and applying declarative rules for transforming that content. That is, unlike ASP pages, XPathScript documents are stylesheets that are applied to a source XML document to create a transformed result, not simply sources of dynamically generated content. The typical XPathScript stylesheet takes the form shown in Example 6-1.

Example 6-1. Structure of a typical XPathScript stylesheet

 <%
     # Perl code block that imports any required modules
     # and performs any required initialization
     use Some::Package;
     # Declarative template rules defined via the
     # special "template hash" $t
     $t->{'some:element'}->{pre}  = '<new>';
     $t->{'some:element'}->{post} = '</new>';
     # and so on...
 %>
 <root>
     <!-- begin literal output -->
     <child>
         <%
           # A mixture of literal output and code blocks that
           # call XPathScript functions to select data from
           # or apply templates to the source XML
         %>
     </child>
 </root>

From this, you can see that a typical XPathScript stylesheet consists of two parts: a Perl code block that contains any initialization required for the current transformation, along with the template rule configuration set up through XPathScript's special template hash, $t; and a block of literal output that provides the overall structure of the result and contains embedded code blocks through which the contents of the source XML document are accessed.

6.1.1 Accessing Document Content

Like XSLT, XPathScript uses the XPath language as the mechanism for selecting and evaluating the nodes in an XML document. (We touched on the basics of XPath in Section 5.1 in Chapter 5, and I will not duplicate that introduction here.) Recall that XPath provides a compact syntax for accessing and evaluating the contents of an XML document using a combination of location paths and function evaluation. In XPathScript, these XPath expressions are passed as arguments to one of a handful of Perl subroutines, implemented by the XPathScript processor, that perform the desired actions. The following sections provide an introduction to each of these subroutines.

6.1.1.1 findvalue( )

As the name suggests, the findvalue( ) function offers a way to select the value of a given node in the source XML document. Its required first argument is an XPath expression, typically a location path, that is evaluated to select the node whose value will be returned.

 <html>
     <head>
         <title><%= findvalue('/article/artheader/title') %></title>
     </head>
 </html>

An optional second argument may be passed to the findvalue( ) function. This argument is expected to be a node from the current document and will be used to provide the context in which the expression contained in the first argument will be evaluated.

 <%
 foreach my $chapter ( findnodes('//chapter') ) {
     # select the title of each chapter
     my $title = findvalue('title', $chapter);
 }
 %>

This can be useful, but it is far more common to use the object-oriented interface, calling findvalue as a method on the node itself rather than passing that node as the second argument. The method call evaluates the expression with that node as the context, so it gives the same result as calling findvalue as a function with the node as the second argument:

 <%
 foreach my $chapter ( findnodes('//chapter') ) {
     # Same as above
     my $title = $chapter->findvalue('title');
 }
 %>

As with XSLT's xsl:value-of element, the value of an element is the concatenation of all of its text descendants, in document order. In addition, if the expression selects more than one node, findvalue( ) joins the values of all of the selected nodes into a single string. For example, applying findvalue('para') to the following XML snippet:

 <para>
     Well, <emphasis>I</emphasis> wouldn't say that.
 </para>

produces this plain string, with the emphasis markup stripped away:

 Well, I wouldn't say that.
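The concatenation rule can be modeled in a few lines of plain Perl. This sketch assumes a hypothetical in-memory representation (elements as hashrefs with name and children, text nodes as plain strings) rather than XML::XPath's real node classes; string_value and nodeset_value are illustrative names, not XPathScript functions.

```perl
use strict;
use warnings;

# Hypothetical toy tree: an element is { name => ..., children => [...] },
# a text node is a plain string. (Not XML::XPath's node classes.)
sub string_value {
    my ($node) = @_;
    return $node unless ref $node;    # a text node is its own value
    # An element's value: its descendants' text, joined in document order.
    return join '', map { string_value($_) } @{ $node->{children} };
}

# Value of a set of selected nodes: all of their values, joined together.
sub nodeset_value {
    return join '', map { string_value($_) } @_;
}

my $para = {
    name     => 'para',
    children => [
        'Well, ',
        { name => 'emphasis', children => ['I'] },
        " wouldn't say that.",
    ],
};

print string_value($para), "\n";    # prints: Well, I wouldn't say that.
```

The recursion makes it clear why the emphasis markup disappears: only text nodes contribute characters, and element boundaries leave no trace in the result.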

Unlike the XPathScript subroutines that select nodes from the source XML document, findvalue( ) is not limited to location path expressions; any XPath expression, such as a call to the count( ) function, can be evaluated:

 <p>
     This document contains <%= findvalue('count(//section)') %> sections.
 </p>

6.1.1.2 findnodes( )

While findvalue( ) is used for retrieving the values of the selected nodes, the findnodes( ) function is used to select the nodes themselves. In a list context, findnodes( ) returns a list of all nodes that match the XPath expression passed as the first argument; in a scalar context, it returns an XML::XPath::NodeSet object. As with findvalue( ) , a node may be passed as the optional second argument to constrain the context in which the XPath expression will be evaluated.

 # print the 'id' attributes from all 'product' elements,
 # calling findnodes( ) in a list context
 foreach my $product ( findnodes('/productlist/product') ) {
     print $product->findvalue('@id') . "\n";
 }
 # the same, but calling findnodes( ) in a scalar context
 # to return an XML::XPath::NodeSet object
 my $products = findnodes('/productlist/product');
 foreach my $product ( $products->get_nodelist ) {
     print $product->findvalue('@id') . "\n";
 }
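The list-versus-scalar behavior relies on Perl's context rules. As an illustration only, the following sketch shows how a function can inspect wantarray to return either a list or an object. My::NodeSet and find_by_name are hypothetical stand-ins, not XML::XPath classes, and the "document" is just a list of hashrefs.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a nodeset object (not XML::XPath::NodeSet).
package My::NodeSet;
sub new          { my ( $class, @nodes ) = @_; return bless [@nodes], $class }
sub get_nodelist { my ($self) = @_; return @$self }
sub size         { my ($self) = @_; return scalar @$self }

package main;

# A made-up "document": a flat list of element records.
my @doc = (
    { name => 'product', id => 'p1' },
    { name => 'order',   id => 'o1' },
    { name => 'product', id => 'p2' },
);

# Returns a plain list in list context and an object in scalar context.
sub find_by_name {
    my ($name) = @_;
    my @found = grep { $_->{name} eq $name } @doc;
    return wantarray ? @found : My::NodeSet->new(@found);
}

# List context: iterate over the matches directly.
print "$_->{id}\n" for find_by_name('product');

# Scalar context: get an object, then ask it for its node list.
my $products = find_by_name('product');
print "$_->{id}\n" for $products->get_nodelist;
```

The practical consequence is the same as with findnodes( ): assigning the result to a scalar gives you an object with methods such as size, while using it in a foreach gives you the nodes themselves.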

6.1.1.3 findnodes_as_string( )

The findnodes_as_string( ) function works just like findnodes( ), but it returns the textual representation of the selected nodes rather than a nodeset object. It is used primarily to copy larger document fragments, as is, into the result of the transformation. For example, the following copies the entire contents of the file /include/common_header.xhtml into the result of the current transformation:

 <body>
     <!-- insert the common header -->
     <%= findnodes_as_string('document("/include/common_header.xhtml")') %>
     . . .
 </body>

Taken together, these subroutines provide direct access to the contents of the source XML document from within a stylesheet's code blocks. They are most useful for cases in which the types of transformations required would benefit from the ability to use Perl's control structures, operators, and functions to generate the result based on nodes from the source document. However, this is only half of the story; XPathScript also offers the ability to set up and apply template rules that will generate new content as each element node in a given set is visited by the XPathScript processor.

6.1.2 Declarative Templates

Declarative templates provide an easy way to set up transformations for all instances of certain elements in the source document. Template rules are defined by adding keys to a global Perl hash reference, $t. During processing, the name of each selected node is compared against the keys of the hash that $t refers to. When a match is found, the node is transformed according to the contents of that template.

The result of applying a template to a matching element is governed by the values assigned to one or more of a specific set of subkeys for each key in the template hash. For example, the value assigned to the pre subkey is added to the output just before the matching element is processed, while the value assigned to the post subkey is appended to the output just after the element is processed. Adding the following to your XPathScript stylesheet replaces every section element with a div element whose class attribute has the value section:

 <%
 $t->{'section'}{pre}  = '<div class="section">';
 $t->{'section'}{post} = '</div>';
 %>

If you are familiar with XSLT, the above corresponds to the following template rule:

 <xsl:template match="section">
     <div class="section">
         <xsl:apply-templates/>
     </div>
 </xsl:template>

XPathScript template matches are based solely on the element's name rather than the evaluation of more complex XPath expressions used in XSLT. For example, the first template below matches all instances of the product element, while the second would never match, even though a node may exist in the source document that matches the given expression:

 <%
 # convert <product> elements to list items
 $t->{'product'}{pre}  = "<li>";
 $t->{'product'}{post} = "</li>";
 # will never match since only element names are tested
 $t->{'/products/product'}{pre}  = "<li>";
 $t->{'/products/product'}{post} = "</li>";
 %>
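The following toy model, in plain Perl, shows why a path-like key such as '/products/product' never fires: the processor looks up each element by its name alone. It is a simplification built on assumptions (elements as hashrefs, only pre and post supported, unmatched elements copied through with their tags, and toy_apply_templates as a hypothetical name), not AxKit's actual code.

```perl
use strict;
use warnings;

# Template hash, filled in the same way as in a stylesheet.
my $t = {};
$t->{'product'}{pre}  = '<li>';
$t->{'product'}{post} = '</li>';
$t->{'/products/product'}{pre} = '<li class="never">';   # never looked up

# Toy processor: elements are { name => ..., children => [...] },
# text nodes are plain strings.
sub toy_apply_templates {
    my ($node) = @_;
    return $node unless ref $node;    # text is copied through

    # Build the output for the children, in document order.
    my $content = join '', map { toy_apply_templates($_) } @{ $node->{children} };

    # The lookup key is the element name only, never a path.
    my $rule = $t->{ $node->{name} };
    return "<$node->{name}>$content</$node->{name}>" unless $rule;
    return ( $rule->{pre} // '' ) . $content . ( $rule->{post} // '' );
}

my $doc = {
    name     => 'products',
    children => [
        { name => 'product', children => ['Widget'] },
        { name => 'product', children => ['Gadget'] },
    ],
};

print toy_apply_templates($doc), "\n";
# prints: <products><li>Widget</li><li>Gadget</li></products>
```

Because the key is the literal element name, a prefixed name such as dc:title would likewise only match a key spelled exactly 'dc:title'.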

The fact that matches are based on the element name only is the major difference between declarative templates in XPathScript and those in XSLT. While this may appear to limit XPathScript's power, you will see that XPathScript provides other facilities for more fine-tuned control over the templates' matching behavior. (See Section 6.2, later in this chapter.)

XPathScript stylesheets are not themselves XML documents, so there is no way to bind an XML namespace URI to a prefix according to the rules of XML. This means that if you create template rules or location paths that match element and attribute names using XML namespace prefixes, you must use the same literal prefix that the source document uses. Compare this to XSLT, in which the prefix in the stylesheet and the one in the source document can differ as long as both are bound to the same URI.

Now you have a general idea of how to create template rules. Let's examine how to apply those templates to the source content.

6.1.2.1 Applying templates

Templates are applied to the source content using the apply_templates( ) function. This function accepts a single optional argument that can either be an XPath expression that selects the appropriate nodes to be transformed, or an XML::XPath::NodeSet object containing those nodes. If no argument is passed, the templates are tested for matches against all nodes in the current document.

 # Apply template rules to all <product> elements and their children
 print apply_templates('/productlist/product');
 # The same, but pass a nodeset instead
 my $product_nodes = findnodes('/productlist/product');
 print apply_templates( $product_nodes );

The XPath expression passed to the apply_templates( ) function can be as specific as the case requires, and Perl scalars are interpolated if the expression is passed as a double-quoted string. In that case, the @ must be escaped, as in the following example, so that Perl does not mistake the XPath attribute selection syntax for an array and attempt to interpolate it (no escape is needed if the expression is passed as a single-quoted string):

 <%
 my $id = $some_object->get_id( );
 my $product = findnodes("/productlist/product[\@id=$id]");
 if ( $product->size( ) > 0 ) {
     print apply_templates( $product );
 }
 else {
     print "<p>Sorry, no product ID match for $id in the current document.</p>";
 }
 %>
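The quoting rules are ordinary Perl string rules, so they can be checked outside a stylesheet. This plain-Perl sketch builds the same location path in different ways; the values 42 and 'A-17' are made-up examples. The last case reflects standard XPath behavior: a non-numeric value needs its own quotes inside the predicate, or XPath reads it as an element name rather than a string.

```perl
use strict;
use warnings;

my $id = 42;    # example value

# Double-quoted: $id is interpolated; \@ stops Perl from reading "@id"
# as an array (an unescaped @id is a compile-time error under strict).
my $xpath_dq = "/productlist/product[\@id=$id]";

# Single-quoted: nothing is interpolated, so @ needs no backslash,
# but the value has to be concatenated in explicitly.
my $xpath_sq = '/productlist/product[@id=' . $id . ']';

# Non-numeric values need XPath quotes of their own; without them
# XPath would read A-17 as an element name, not a string.
my $sku       = 'A-17';    # example value
my $xpath_str = "/productlist/product[\@id='$sku']";

print "$xpath_dq\n$xpath_sq\n$xpath_str\n";
```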

When applying templates, you must use Perl's built-in print function or the <%= opening delimiter shortcut for the result of the template processing to appear in the final output:

 <table>
   <%
     # part of a much longer Perl block
     if ( $some_condition ) {
         print apply_templates('/productlist/product');
     }
   %>
 </table>
 # The same, but using the <%= delimiter shortcut
 <table>
     <%= apply_templates('/productlist/product') %>
 </table>

In both cases, the result of any template processing appears in the final result at the same location that the print statement or special opening delimiter is found.

6.1.2.2 Importing templates

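The import_template( ) function provides a way to add template rules to the current stylesheet by importing an external stylesheet. Its single required argument is the DocumentRoot-relative path to the stylesheet that you want to import. As the trailing ->( ) in the following example shows, the value returned by import_template( ) is a code reference, which the example calls immediately: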

 <% import_template('/path/to/included.xps')->( ); %>

Since an XPathScript stylesheet is really a sort of fancy preprocessed Perl script, and the template hash is a global variable within that script, imported templates can add to or override the template rules defined in the current stylesheet. The rules for each element are merged, but each subkey can hold only one value. So if your stylesheet has a rule such as $t->{'para'}{pre} = '<p>'; and the imported stylesheet also defines a pre subkey for para elements, the definition that is evaluated last (i.e., the one that assigns a value to that subkey last) takes precedence. If the two stylesheets define different subkeys for the same element, both subkeys are applied during processing.
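Because the merge is just a series of assignments to one Perl hash, the precedence rule can be seen directly in plain Perl. This sketch simulates the import by making a second round of assignments after the first; in a real stylesheet, which assignment comes last depends on where import_template( ) is called relative to your own rules. The element names and markup are examples only.

```perl
use strict;
use warnings;

my $t = {};

# Rules from the current stylesheet.
$t->{'para'}{pre}  = '<p>';
$t->{'para'}{post} = '</p>';

# Rules from the imported stylesheet, assigned afterwards.
$t->{'para'}{pre}  = '<p class="imported">';   # same subkey: last assignment wins
$t->{'note'}{pre}  = '<aside>';                # new element: simply added
$t->{'note'}{post} = '</aside>';

# Show the merged result; para keeps its original post subkey.
for my $element ( sort keys %$t ) {
    for my $subkey ( sort keys %{ $t->{$element} } ) {
        print "$element/$subkey: $t->{$element}{$subkey}\n";
    }
}
```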

6.1.2.3 Expression interpolation

Template rules may also include information selected from the document tree by embedding XPath expressions, delimited by curly braces ({ }), in the rule's value. The contents of the braces are passed to the findvalue( ) function, which evaluates the expression in the context of the current matching node, and the result is added to the output. For example, the following copies the value of the url attribute from the current <ulink> element into the href attribute of the newly created HTML hyperlink:

 $t->{ulink}{pre}  = '<a href="{@url}">';
 $t->{ulink}{post} = '</a>';

To save resources, this sort of interpolation is not turned on by default. It must be explicitly switched on by adding the AxXPSInterpolate configuration directive to your httpd.conf or other server configuration file:

 AxXPSInterpolate 1
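To see what the interpolation does, here is a toy version in plain Perl. It rests on stated assumptions: the real processor passes each braced expression to findvalue( ) with the matching node as context, whereas this sketch substitutes a hypothetical lookup that understands only @attribute expressions against a hashref node. The names interpolate and lookup are illustrative, and the URL is an example.

```perl
use strict;
use warnings;

# Replace each {expression} in a template value with its result,
# evaluated against the current node.
sub interpolate {
    my ( $value, $node ) = @_;
    $value =~ s/\{([^}]*)\}/lookup( $1, $node )/ge;
    return $value;
}

# Hypothetical stand-in for findvalue( ): handles only "@name".
sub lookup {
    my ( $expr, $node ) = @_;
    if ( $expr =~ /\A\@([\w:.-]+)\z/ ) {
        return $node->{attrs}{$1} // '';
    }
    die "toy lookup only understands \@attribute, got '$expr'\n";
}

my $ulink = { name => 'ulink', attrs => { url => 'http://example.com/' } };
print interpolate( '<a href="{@url}">', $ulink ), "\n";
# prints: <a href="http://example.com/">
```

Running a substitution like this over every template value for every matching element is extra work, which is consistent with interpolation being off unless AxXPSInterpolate is set.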