6.3 XPathScript Cookbook | XML Publishing with AxKit

Just as we concluded our look at XSLT with a few recipes that offer solutions to common tasks, we will do the same here with XPathScript. Since most XSLT tips are easy to re-create using XPathScript, I will not repeat the same recipes but will focus instead on things unique to XPathScript (or simply not possible with vanilla XSLT 1.0).

6.3.1 Accessing Client Request and Server Data

6.3.1.1 Problem

You need to access information about the current request (POSTed form data, cookies, etc.) or other parts of the Apache HTTP server API from within your stylesheet.

6.3.1.2 Solution

Use the Apache object that is passed as the first argument to the top level of every XPathScript stylesheet transformation.

    <%
    # at the top lexical level of your XPathScript stylesheet:
    my $r = shift;
    # $r now contains the same Apache object passed to mod_perl handler scripts.
    %>

6.3.1.3 Discussion

One benefit of running XPathScript inside of AxKit is that each stylesheet can directly access the Apache server environment via the same Apache object that gives mod_perl its power and flexibility. This object can be used to examine (and in many cases, control) virtually every aspect of the Apache HTTP Server, from user-submitted form and query data to outgoing headers and host configurations.

6.3.1.4 Accessing form and query data

    <%
    use Apache::Request;

    my $r = shift;
    my $cgi = Apache::Request->instance($r);
    %>
    <html>
      <body>
        <p>
          You said that your favorite fish is <%= $cgi->param('fave_fish') %>
        </p>
      </body>
    </html>

6.3.1.5 Setting cookies

    <%
    use Apache::Cookie;

    my $r = shift;
    my $out_cookie = Apache::Cookie->new( $r,
        -name  => 'mycookie',
        -value => 'somevalue' );
    $out_cookie->bake;
    %>
    <html>
        <!-- Stylesheet contents -->
    </html>

6.3.1.6 Redirecting the client

    <%
    use Apache::Constants qw(REDIRECT OK);

    my $r = shift;
    # $some_condition stands in for whatever test your application needs
    if ( $some_condition == 1 ) {
        $r->headers_out->set(Location => '/some/other.xml');
        $r->status(REDIRECT);
        $r->send_http_header;
    }
    %>
    <html>
        <!-- Default stylesheet contents -->
    </html>

6.3.2 Generating Fresh Dynamic Content

6.3.2.1 Problem

Your XPathScript stylesheet generates or transforms content based on runtime logic, but AxKit is serving the cached result of a previous transformation.

6.3.2.2 Solution

Pass a nonzero value to the no_cache( ) method on the Apache object to turn off caching for the current resource:

    <%
    my $r = shift;
    $r->no_cache( 1 );
    %>
    <html>
        <!-- Default stylesheet contents -->
    </html>

6.3.2.3 Discussion

AxKit's aggressive default caching behavior helps keep it speedy and easy to set up for many common publishing cases, but sometimes, your stylesheets will need to generate or transform content based on data unavailable until runtime. In these cases, the stylesheet behaves as expected for the first request, but once the cache is created, the result of all transformations in the current processing chain is sent directly to the client, and the stylesheet containing the dynamic logic is not applied again until the source XML file (or one of the stylesheet documents) is changed.

Setting the AxNoCache configuration directive to On can do the trick, but setting it up on a resource-by-resource basis can be cumbersome. It gets even trickier if several possible transformation chains can be applied to a given resource and only some of them could benefit from turning caching off altogether.

The most direct way to ensure that your dynamic stylesheet is always applied is to simply pass a true value to $r->no_cache( ) from within your XPathScript stylesheet itself. Be aware that any resources transformed using that stylesheet will never be cached (and the entire transformation chain will be run for each request), but if you're generating or transforming content dynamically, that is probably what you want to happen in any case.

6.3.3 Importing Templates Dynamically

6.3.3.1 Problem

You want to apply different sets of template rules based on runtime data or aspects of document content beyond the root element name or DOCTYPE declaration.

6.3.3.2 Solution

Use a simple wrapper stylesheet containing a call to the import_template( ) function, and generate the path to the imported stylesheet dynamically:

    <%
    my $r = shift;
    $r->no_cache( 1 );

    my $import_path = '/styles/myapp/';

    # Select the import based on the presence of an <error>
    # element anywhere in the source document.
    if ( findvalue('//error') ) {
        $import_path .= 'error.xps';
    }
    else {
        $import_path .= 'default.xps';
    }

    import_template($import_path)->( );
    %>
    <html>
        <%= apply_templates( ) %>
    </html>

6.3.3.3 Discussion

The combination of AxKit's styling directives and StyleChooser plug-ins offers the ability to create chains of transformations, then select the appropriate chain to apply at request time based on a wide variety of environmental factors (the type of client making the connection, the root element name of the source XML document, etc.). However, explicitly defining (then selecting) a processing chain for every possible set of conditions can get tedious.

The AxAddDynamicProcessor directive, which lets you write a Perl class that generates the steps in the processing chain dynamically at request time, offers one solution. However, it can seem like overkill in many cases, and even then, you cannot readily access the content of the source XML document. Letting logic in the XPathScript stylesheet itself decide which set of imported templates to apply allows you to set up a single processing chain while still applying the appropriate styles in response to the environment.
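When more than two conditions are involved, it may help to keep the selection logic in one small function and hand its result to import_template( ). The following is only a sketch that runs outside AxKit: the helper name choose_import_path, the has_error and client keys, and the mobile.xps file are all assumptions made for illustration and do not come from AxKit itself.

```perl
use strict;
use warnings;

# Hypothetical helper: pick a stylesheet to import from a set of runtime facts.
# The keys (has_error, client) and the mobile.xps file are illustrative only.
sub choose_import_path {
    my (%facts) = @_;
    my $base = '/styles/myapp/';

    # The first matching rule wins; anything else gets the default templates.
    return $base . 'error.xps'  if $facts{has_error};
    return $base . 'mobile.xps' if ( $facts{client} // '' ) eq 'mobile';
    return $base . 'default.xps';
}

print choose_import_path( has_error => 1 ), "\n";
print choose_import_path( client => 'mobile' ), "\n";
print choose_import_path( ), "\n";
```

Inside a stylesheet, the facts would presumably come from calls such as findvalue('//error') or from the Apache object, and the returned path would be passed to import_template( ) as in the Solution above.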

6.3.4 Tokenizing Text into Elements

6.3.4.1 Problem

You have elements in your source XML document with text content that you want to tokenize into child elements or mixed content (containing both text data and child elements).

6.3.4.2 Solution

Use a template with a testcode subroutine that operates on the element's text children directly, and use simple Perl text-handling logic to create the new elements:

    $t->{'my:element'}{testcode} = sub {
        my ($node, $t) = @_;

        foreach my $child ( $node->getChildNodes ) {
            if ( $child->getNodeType == TEXT_NODE ) {
                my $text = $child->getValue( );
                # Process $text as a string, adding pointy brackets as needed
                # to create tokenized mixed content
                print $text;
            }
            else {
                # otherwise, apply templates manually to the child node
                print apply_templates( $child );
            }
        }
        # You have already processed this node and its children, so tell
        # the XPathScript processor not to.
        return 0;
    };

6.3.4.3 Discussion

Carving up text nodes into tokenized mixed content, sometimes called up-translation, can be a tricky proposition (especially using XSLT), but it is not as crazy as it may sound. For example, you may be given documents that contain dates marked up as a date element containing a single text child, but it would simplify processing if the year, month, and day components were defined individually:

    <!-- What you get -->
    <date>2003-06-07</date>

    <!-- What you wish you were getting -->
    <date>
        <year>2003</year>
        <month>06</month>
        <day>07</day>
    </date>

Here, the solution is to select the value of the date element, split that value on the hyphen (-) character using Perl's built-in split function, then print the appropriate textual representation of the new elements:

    $t->{date}{testcode} = sub {
        my ($node, $t) = @_;
        my $date_string = $node->getValue( );
        # break "YYYY-MM-DD" into its three components
        my ($year, $month, $day) = split(/-/, $date_string);
        print "<date><year>$year</year><month>$month</month><day>$day</day></date>";
        return 0;
    };

You must return 0 from the testcode subroutine, since you have already printed all the output you need for the current element, and you do not want the XPathScript processor to descend into the old text node and try to add it to the output. Be aware, too, that by taking control of the entire output and returning 0 from the testcode subroutine, you render the other template options (pre, post, prechild, showtag, etc.) useless for the current element. That is, by printing the output and returning 0, you are essentially saying to the XPathScript processor, "Do not process this node. I want to do it myself," so the other rules are never applied.
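The string handling in the date recipe can be tried on its own, away from AxKit. The sketch below is one possible variation, not the recipe itself: date_to_elements is a hypothetical helper name, and the fallback to the original markup for anything that is not three numeric fields is an added assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XPathScript): turn a YYYY-MM-DD string
# into the markup for a structured <date> element.
sub date_to_elements {
    my ($date_string) = @_;

    # Limit the split to three fields, then make sure each one is numeric.
    my ($year, $month, $day) = split /-/, $date_string, 3;
    for my $part ($year, $month, $day) {
        # Fall back to the original markup if the input is not a full date.
        return "<date>$date_string</date>"
            unless defined $part && $part =~ /\A\d+\z/;
    }

    return "<date><year>$year</year><month>$month</month><day>$day</day></date>";
}

print date_to_elements('2003-06-07'), "\n";
```

In a testcode subroutine, the returned string would be printed in place of the literal print statement, followed by the same return 0.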

Things get more interesting when the element in question may already have mixed content, and you need to make sure that the child nodes are processed in the typical way. In this case, you need to manually iterate over all the children of the current node and examine the node type of each child. You can then operate on the text nodes, as needed, while explicitly calling apply_templates for all other types of nodes.

Example 6-2 illustrates how to cope with the task of tokenizing character data when the elements being transformed already have a mixed content model. This stylesheet examines the text contents of all p elements and their children for the presence of a keyword passed in via a posted form or query string parameter named matchword. A simple Perl regular expression is used to find the matches. When one is found, it is replaced in the result by the same word wrapped in a span element whose style class renders that word in the browser in bold text with a yellow background.

Example 6-2. XPathScript stylesheet for highlighting keywords in XHTML

    <%
    use Apache::Request;

    my $r = shift;
    $r->no_cache( 1 );
    my $cgi = Apache::Request->instance( $r );

    sub tokenizer {
        my ($element, $t) = @_;

        if ( my $match_word = $cgi->param('matchword') ) {
            my $name = $element->getName( );
            print "<$name>";
            foreach my $node ( $element->getChildNodes ) {
                if ( $node->getNodeType( ) == TEXT_NODE ) {
                    my $text = $node->getValue( );
                    # wrap each whole-word match in a highlighting span
                    $text =~ s|\b($match_word)\b|<span class="matched">$1</span>|g;
                    print $text;
                }
                else {
                    print apply_templates( $node );
                }
            }
            print "</$name>";
            return 0;
        }
        else {
            $t->{showtag} = 1;
            return 1;
        }
    }

    # apply the 'tokenizer' sub to all 'p' elements
    # and their children
    $t->{'*'}{testcode} = sub {
        my ( $node, $t ) = @_;
        if ( $node->findnodes('ancestor-or-self::p') ) {
            return tokenizer( $node, $t );
        }
        else {
            return 1;
        }
    };
    %>
    <html>
      <head>
        <style>
        span.matched { background: yellow; font-weight: bold; }
        </style>
      </head>
      <body>
        <form action="highlight.xml" method="GET">
          <input name="matchword" type="text">
          <input type="submit">
        </form>
        <%= apply_templates( ) %>
      </body>
    </html>
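The substitution at the heart of Example 6-2 can be tried outside AxKit. The following is a minimal sketch, assuming a hypothetical highlight helper. It differs from the example in one respect: it wraps the keyword in \Q...\E so that regex metacharacters in a user-supplied matchword are treated literally, which is a defensive choice on my part rather than something the example does.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap whole-word matches of $match_word in a span.
sub highlight {
    my ($text, $match_word) = @_;

    # Nothing to highlight if no keyword was supplied.
    return $text unless defined $match_word && length $match_word;

    # \Q...\E quotes regex metacharacters in the user-supplied word;
    # $1 re-inserts the matched text inside the span.
    $text =~ s|\b(\Q$match_word\E)\b|<span class="matched">$1</span>|g;
    return $text;
}

print highlight('One fish, two fish, selfish.', 'fish'), "\n";
```

Because of the \b word-boundary anchors, only the two standalone occurrences of "fish" are wrapped; the "fish" inside "selfish" is left alone.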

Certainly, XPathScript is not to everyone's taste. For some, its visible reliance on Perl and the fact that it does not use an XML syntax are enough to send them scrambling for the hills. For others, having an XML syntax for their processing tools (in addition to the content source) is going too far, and those very same properties are what make XPathScript desirable. The reality is that one size never fits all. The fact that AxKit supports a variety of transformation languages is part of what makes it a viable, professional development environment.

Similarly, do not let the fact that XSLT and XPathScript were featured in the last two chapters dissuade you from investigating the other options that AxKit makes available. The goal here has been simply to introduce two of the more popular choices in an effort to give you the means to be productive with AxKit as quickly as possible, not to recommend one technology over another. For example, you may find that one of the other generic transformation tools, such as SAX filter chains (Apache::AxKit::Language::SAXMachines) or Petal (Apache::AxKit::Language::Petal), is better suited to your needs. In any case, remember that picking one transformative Language module does not mean abandoning all others. AxKit allows you to pick the tool that best suits the task, even mixing different languages within a single processing chain.