5.6 Batch processing

Just one more coding session, and we'll make our stylesheet at least an order of magnitude more convenient to use.

Statement of the problem. No matter what XSLT processor you use, to run a stylesheet on a page document you normally have to specify the pathnames of the input and output files. These pathnames may be relative, for example:


 saxon -o out/de/index.html de/index.xml style.xsl env=staging 


but only if you are running this command from the correct base directory.

And therein lies a problem. Our $src-path and $out-path are stored in and retrieved from the master document, and now we have to spell them out once again on the command line! All sorts of unpleasant surprises are bound to happen if our command line does not correspond to the environment parameters stored in the master document.

This is indeed a problem, and we are going to deal with it immediately. Instead of applying the stylesheet manually to each page document, we can relatively easily implement batch processing, that is, transforming many pages at once. (And while we're at it, why not validate all those pages before transformation?) The idea is to run the stylesheet on one source file (the master document) and let it figure out automatically the input/output paths of all page documents registered in the master.

The importance of being functional. In principle, we could use XSLT's ability to handle multiple input and multiple output documents to open, transform, and write out all site pages registered in the master during one stylesheet run. For example, a template matching a page element in the master and transforming the corresponding page document might be as simple as this:[31]

[31] The page-out() function in this example is similar to page-link() (5.1.1.4) but uses $out-path instead of $target-path. It is only necessary for meta-functionality such as batch processing, where the stylesheet needs to access the files it creates.




<xsl:template match="site//page">
  <xsl:result-document href="{eg:page-out(@id, $lang)}" format="html">
    <xsl:apply-templates
        select="document(eg:page-src(@id, $lang))"/>
  </xsl:result-document>
</xsl:template>


There's a serious problem with this approach, however: It only works if we have no global variables, or if our global variable values are equally applicable to all page documents. This is not the case with our stylesheet. Recall (5.1.1) that many of the global variables defined in our stylesheet store the path, language, and other parameters directly related to the currently processed page document. And XSLT won't allow us to change the values of these variables when we're finished with one page and move on to the next one.

This is an interesting situation illustrating how the principles of functional programming (4.3) affect the design of an XSLT stylesheet. In this case, the entire stylesheet is a function whose input is one XML page document and whose output is one HTML web page. If we want to process several documents, we are supposed to simply call this function several times, not try to cram all processing into one call.

5.6.1 Launcher templates

But can we really call our stylesheet as a function from within itself? Remember that we have the files:run() extension function (5.5.2.7), and we're not afraid to use it to run as many copies of the XSLT processor as necessary to validate and transform all pages of the site.
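The files:run() function itself is described in 5.5.2.7. As a reminder of what such an extension has to do, here is a minimal Java sketch, not the implementation from that section: it assumes the function is bound to a static method (the class name RunExtension is hypothetical) that splits the command line on whitespace, runs it, waits for it to finish, and returns its argument unchanged. Waiting for each subprocess keeps the pages from being processed all at once.

```java
import java.io.IOException;

// Hypothetical extension class: a sketch of a files:run()-style function,
// not the implementation described in 5.5.2.7.
class RunExtension {

    // Runs a command line and returns the command line itself, so that the
    // calling stylesheet can log it with xsl:value-of.
    static String run(String commandLine) {
        // Naive whitespace splitting: pathnames containing spaces would break.
        String[] argv = commandLine.trim().split("\\s+");
        ProcessBuilder pb = new ProcessBuilder(argv);
        // Let the child's console output (e.g., xsl:message text) reach our console.
        pb.inheritIO();
        try {
            Process p = pb.start();
            // Block until the subprocess finishes: one page at a time.
            int status = p.waitFor();
            if (status != 0) {
                System.err.println("Exit status " + status + ": " + commandLine);
            }
        } catch (IOException e) {
            // The program could not be started; report it and keep going.
            System.err.println("Cannot run: " + commandLine + " (" + e.getMessage() + ")");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return commandLine;
    }
}
```

How such a class gets bound to the files: namespace prefix depends on the processor and its edition, so check your processor's documentation for its extension-function mechanism.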

Launching transformation. Example 5.17 is a template that runs the transformation on all language versions of one page. Note the mode value of transform; we want this template to activate only when templates are specifically applied in this mode. Within the xsl:for-each loop that iterates over all defined languages, an xsl:message first reports the pathname of the file being transformed. Then the complete command line is constructed and fed to the files:run() function.

Example 5.17. The transformation template uses the files:run() function to launch one instance of the XSLT processor per page document.
<xsl:template match="page" mode="transform">
  <xsl:variable name="id" select="@id"/>
  <xsl:for-each select="$master//languages/lang">
    <xsl:message>Transforming <xsl:value-of
        select="eg:page-src($id, .)"/></xsl:message>
    <xsl:value-of select="
      files:run(concat(
        'java net.sf.saxon.Transform ',
        '-o ', eg:page-out($id, .),
        ' ', eg:page-src($id, .),
        ' style.xsl',
        ' env=', $env,
        ' images=', if ($id='home') then $images else 'no'
      ))"/>
    <!-- newline so that each logged command is on its own line -->
    <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
</xsl:template>

The command line consists of a call to Java with Saxon's class name, the output (after -o) and input documents' pathnames, the stylesheet pathname (style.xsl), and the stylesheet parameters. Note that the images parameter of the current stylesheet process is passed to the subprocess only for the front page, because we don't want to re-create all the images once for every page of the site.
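To see exactly what string the concat() call in Example 5.17 produces, here is the same construction transcribed into Java. This is a hypothetical helper for checking the launcher's output, not part of the stylesheet; the pathnames would come from eg:page-out() and eg:page-src(), and the sample value yes for images is an assumption for illustration.

```java
// Hypothetical Java mirror of the concat() expression in Example 5.17.
class TransformCommand {

    static String build(String outPath, String srcPath, String id,
                        String env, String images) {
        // Only the front page passes the caller's images setting through;
        // every other page gets images=no so images are not regenerated.
        String imagesParam = id.equals("home") ? images : "no";
        return "java net.sf.saxon.Transform "
                + "-o " + outPath
                + " " + srcPath
                + " style.xsl"
                + " env=" + env
                + " images=" + imagesParam;
    }
}
```

For example, build("out/de/index.html", "de/index.xml", "home", "staging", "yes") yields java net.sf.saxon.Transform -o out/de/index.html de/index.xml style.xsl env=staging images=yes, while the same call with any other page identifier ends in images=no.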

Other commands, other environments. To make this command line truly portable, you may want to replace all the constant strings with variables and store these variables' values in the corresponding environment in the master document. Another launcher template for Schematron validation (with mode="validate") might be similar to Example 5.17, except that there's no -o parameter in the command line and the compiled schema (5.1.2) is used instead of the main transformation stylesheet. See Example 5.21 for a complete stylesheet listing including both transformation and validation launcher templates.
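The following is a minimal Java sketch of that portability idea. The constants move into a per-environment settings object, and the validation command differs from the transformation command only in dropping -o and using the compiled schema. The class, its fields, and the schema file name used with it are all hypothetical; in the real setup these values would be read from the master document's environment.

```java
// Hypothetical per-environment launch settings; in the real setup these
// values would be stored in the master document's environment element.
final class LaunchSettings {
    private final String processor;      // e.g., "java net.sf.saxon.Transform"
    private final String stylesheet;     // main transformation stylesheet
    private final String compiledSchema; // Schematron schema compiled to XSLT

    LaunchSettings(String processor, String stylesheet, String compiledSchema) {
        this.processor = processor;
        this.stylesheet = stylesheet;
        this.compiledSchema = compiledSchema;
    }

    // Transformation: output after -o, then input, stylesheet, parameters.
    String transform(String outPath, String srcPath, String params) {
        String cmd = processor + " -o " + outPath + " " + srcPath + " " + stylesheet;
        return params.isEmpty() ? cmd : cmd + " " + params;
    }

    // Validation: no -o (the report goes to standard output), and the
    // compiled schema takes the place of the main stylesheet.
    String validate(String srcPath) {
        return processor + " " + srcPath + " " + compiledSchema;
    }
}
```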

5.6.2 The batch template

Now, all we need to do is apply the launcher templates to the master's page elements from a template matching the root element of a master document, site (Example 5.18).

Example 5.18. The main batch processing template calls the validation and transformation launcher templates and logs all the commands they run.
<xsl:template match="/site">
  <xsl:result-document
      href="file:///{$src-path}validate-commands.log" format="txt">
    <xsl:apply-templates
        select="menu//page" mode="validate"/>
  </xsl:result-document>
  <xsl:result-document
      href="file:///{$src-path}transform-commands.log" format="txt">
    <xsl:apply-templates
        select="menu//page" mode="transform"/>
  </xsl:result-document>
</xsl:template>

Note that each of the xsl:apply-templates calls is wrapped in an xsl:result-document that redirects the output of the applied templates into a log file. Since the files:run() function returns its command-line argument and the launcher template outputs it using xsl:value-of, after each batch run you will have two log files listing all the commands that were executed for validation and transformation, for your debugging pleasure.

Other site-wide jobs. Besides batch processing, the template for the master's root element might perform other tasks that apply to the site in general but to no specific page in particular. One such task is creating the robots.txt file controlling site access by search engine spiders; another is creating a site map page graphically depicting the hierarchy of sections and pages.

Separation vs. convenience. It might be cleaner to place both the launcher templates and the batch template into a separate stylesheet, as they have little to do with the main transformation stylesheet. On the other hand, because the batch template matches the master document's root element (which nothing else matches) and the launcher templates use modes of their own, they are already sufficiently separated, so keeping them in the main stylesheet is convenient and causes no problems.

5.6.3 Problems and solutions

Our batch processing implementation is very simple, yet workable. What are its advantages, aside from the obvious time-saving convenience?

  • These templates are absolutely orthogonal to everything else in the stylesheet. We didn't have to change anything in our normal processing of a page. You can still use the same stylesheet for transforming single pages as before.

  • Compared to other batch processing approaches, such as using shell scripts, batch files, or build tools (6.5), the stylesheet-only solution is the most portable and independent of the underlying platform. If you can run your stylesheet at all, you can run it in batch mode. Using the files:run() extension function spoils the picture a bit; still, copying an extension Java class from one system to another is usually much easier than porting a shell script or a native binary utility.

There are certain problems associated with this approach as well, some of them relatively easy to resolve and some more serious.

  • As implemented, the stylesheet processes all pages of the site, which may take quite some time. You are thus limited to a choice between transforming one page only and producing the entire site. It would be more convenient to specify the pages to transform (by their identifiers) in a stylesheet parameter; for example,


     saxon _master.xml style.xsl process="fb+ home" 


    to process the fb+ and home pages only. Implementing this process stylesheet parameter is not difficult and could be a good exercise for a practical XSLT programmer. (Hint: Use tokenize() or xsl:analyze-string.)

    Even more useful would be the ability to automatically run the stylesheet only on those pages that have changed since the last transformation. This is trickier: while doable with XSLT, it may require an uncommon amount of extension programming. This is where an external build tool might be a better solution (6.5).

  • Currently, validation errors do not stop batch processing, which always continues until the last page document (or until the user interrupts it manually). If you want processing to stop upon encountering a validation error, you must do some nontrivial programming in a special Java method: catching the validation output, parsing it, and reporting the validation outcome as a single boolean value.

    Is this worth the trouble? Probably, if you are planning to run batch processing in an automatic, unattended fashion: you don't want it to blindly transform (and possibly even upload to the server!) broken pages. Still, an XSLT stylesheet may not be the best place for implementing validation-dependent transformation.
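The process parameter proposed in the first item of the list above boils down to a small selection test. Here it is sketched in Java; the class and method names are hypothetical, and in the stylesheet the core of the same test could be written with tokenize(), for instance tokenize($process, '\s+') = @id. The sketch also assumes that an empty parameter means "process every page," so the current behavior remains the default.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper mirroring a tokenize()-based filter on the
// "process" stylesheet parameter.
class PageFilter {

    // Splits a parameter such as "fb+ home" into page identifiers.
    static Set<String> parse(String process) {
        return Arrays.stream(process.trim().split("\\s+"))
                .filter(s -> !s.isEmpty()) // "".split(...) yields one empty token
                .collect(Collectors.toSet());
    }

    // Assumption: an empty parameter selects every page.
    static boolean selected(String process, String pageId) {
        Set<String> ids = parse(process);
        return ids.isEmpty() || ids.contains(pageId);
    }
}
```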
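The change detection mentioned in the first item of the list above amounts, at its simplest, to a make-style timestamp comparison. Here is a minimal Java sketch (the class name is hypothetical), assuming a page needs rebuilding when its output file is missing or older than its source. It deliberately ignores other dependencies, such as the master document, the stylesheet, and included files, and tracking those is what makes the real problem tricky.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical make-style staleness check for one page document.
class StaleCheck {

    // True if the output does not exist yet or is older than the source.
    static boolean needsRebuild(Path src, Path out) throws IOException {
        if (!Files.exists(out)) {
            return true;
        }
        return Files.getLastModifiedTime(src)
                .compareTo(Files.getLastModifiedTime(out)) > 0;
    }
}
```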
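The special Java method described in the last item of the list above might look roughly like the following minimal sketch. Everything here is an assumption: the class and method names are hypothetical, and so is the convention that the compiled Schematron stylesheet writes one non-blank line per failed check to standard output (no -o on its command line) and nothing at all for a valid document. Adapt isValid() to whatever your compiled schema actually emits.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical validation wrapper returning a single boolean outcome.
class ValidationCheck {

    // Runs a validation command and reports whether the page is valid.
    static boolean validate(String commandLine) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(commandLine.trim().split("\\s+"));
        // Processor warnings on stderr go to the console and are not parsed.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();
        // Read the whole report before waiting, so a full pipe cannot block the child.
        String report = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int status = p.waitFor();
        return status == 0 && isValid(report);
    }

    // Assumed convention: any non-blank line of the report is an error.
    static boolean isValid(String report) {
        return report.lines().allMatch(String::isBlank);
    }
}
```

A batch driver built on this could stop as soon as validate() returns false instead of going on to transform a broken page.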

 
  
XSLT 2.0 Web Development