xsl:result-document

The <xsl:result-document> instruction is used to create a new result tree, and optionally to specify how the result tree should be serialized. The facility allows a transformation to produce multiple result documents, so you can write a stylesheet that splits a large XML file into smaller XML files, or into multiple HTML files, perhaps connected to each other by hyperlinks.

Changes in 2.0

This instruction is new in XSLT 2.0. Many XSLT 1.0 processors provided a similar capability as a proprietary extension, but there are likely to be differences in the detail.

Format

  <xsl:result-document
    format? = qname
    href? = { uri-reference }
    validation? = "strict" | "lax" | "preserve" | "strip"
    type? = qname>
    <!-- Content: sequence-constructor -->
  </xsl:result-document>

Position

<xsl:result-document> is an instruction, which means it may occur anywhere in a sequence constructor.

Attributes

Name	Value	Meaning
href optional	Attribute value template returning a relative or absolute URI	Defines the location where the output document will be written after serialization
format optional	lexical QName	Defines the required output format
validation optional	«strict», «lax», «preserve», or «strip»	Defines the validation to be applied to the result tree
type optional	lexical QName	Defines the schema type against which the document element should be validated

Note

The Working Group has agreed a late addition to this instruction, which allows it to take any of the serialization attributes defined on the <xsl:output> element, such as indent and encoding. These attributes can be specified as attribute value templates; they supplement and override those on the output definition named in the format attribute.
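As a minimal sketch of what this note describes, assuming a processor that implements the serialization attributes on the instruction itself, the following stylesheet overrides two properties of a named output definition. The parameter name enc, the output definition name web, and the file name report.xml are all invented for illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <!-- Hypothetical parameter: lets the caller choose the encoding at run time -->
  <xsl:param name="enc" select="'iso-8859-1'"/>

  <!-- Named output definition (hypothetical name) that supplies the defaults -->
  <xsl:output name="web" method="xml" indent="no" encoding="utf-8"/>

  <xsl:template match="/">
    <!-- indent and encoding given here supplement and override those on "web";
         encoding is written as an attribute value template -->
    <xsl:result-document href="report.xml" format="web"
                         indent="yes" encoding="{$enc}">
      <xsl:copy-of select="."/>
    </xsl:result-document>
  </xsl:template>

</xsl:stylesheet>
```

Because the attributes are attribute value templates, the serialization details can be decided while the transformation runs, which is not possible with <xsl:output> alone.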

Content

The content of the <xsl:result-document> element is a sequence constructor.

The <xsl:result-document> element may contain an <xsl:fallback> element. If it does, the <xsl:fallback> element defines the action that an XSLT 1.0 processor will take when it encounters the <xsl:result-document> instruction. Note that the fallback processing only applies if the stylesheet is executing in forwards-compatible mode, which will be the case if you set «version="2.0"» on the <xsl:stylesheet> element, or «xsl:version="2.0"» on any enclosing literal result element. (XSLT 1.0 processors do not recognize a version attribute on any other element.)
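The fallback mechanism might be used along the following lines. This is only a sketch: the <book> and <chapter> element names and the output file names are assumptions made for the example, not taken from any real vocabulary.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <!-- Hypothetical source structure: a <book> containing <chapter> children -->
  <xsl:template match="/book">
    <book>
      <xsl:apply-templates select="chapter"/>
    </book>
  </xsl:template>

  <xsl:template match="chapter">
    <xsl:result-document href="chapter{position()}.xml">
      <!-- A 2.0 processor ignores xsl:fallback and writes a separate document -->
      <xsl:copy-of select="."/>
      <xsl:fallback>
        <!-- A 1.0 processor in forwards-compatible mode evaluates only this,
             so the chapter is copied into the principal output instead -->
        <xsl:copy-of select="."/>
      </xsl:fallback>
    </xsl:result-document>
  </xsl:template>

</xsl:stylesheet>
```

With this arrangement a 1.0 processor produces a single document containing every chapter, which is a reasonable degradation when multiple outputs are unavailable.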

Effect

When the <xsl:result-document> instruction is evaluated, a new document node is created, in the same way as for the <xsl:document> instruction. The sequence constructor contained in the <xsl:result-document> is evaluated to produce a sequence of nodes and items, and this sequence is used to form the content of the document node as described under <xsl:document> in the section The Content of the Document on page 258. The tree rooted at this document node is referred to as a result tree.

Validation of the result tree also follows the same rules as <xsl:document>: See Validating and Annotating the Document on page 259. Note that although the validation process (if requested) conceptually creates a result tree in which the elements and attributes are annotated with types, these type annotations will never be seen if the result tree is immediately serialized. But a very useful processing model is to run a series of transformations in a pipeline, where the output of one stylesheet provides the input to the next. The pipeline might also include non-XSLT applications. For example, an XML database product might allow you to run a transformation as part of the process of loading new documents into the database, in which case the result of the transformation might well be captured directly in the database, complete with type information.

The difference between <xsl:document> and <xsl:result-document> is that <xsl:document> adds the new document node to the result sequence, making it available for further processing by the stylesheet, while <xsl:result-document> outputs the new document as a final result of the transformation.
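The contrast can be sketched in a single template rule. The element names <summary> and <total> and the file name summary.xml are invented for this illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:template match="/">
    <!-- xsl:document returns the new document node to the stylesheet,
         here by binding it to a variable -->
    <xsl:variable name="temp" as="document-node()">
      <xsl:document>
        <summary count="{count(//*)}"/>
      </xsl:document>
    </xsl:variable>

    <!-- The temporary document can be processed further -->
    <total><xsl:value-of select="$temp/summary/@count"/></total>

    <!-- xsl:result-document delivers its tree as a final result;
         the stylesheet cannot navigate into it afterwards -->
    <xsl:result-document href="summary.xml">
      <xsl:copy-of select="$temp"/>
    </xsl:result-document>
  </xsl:template>

</xsl:stylesheet>
```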

What actually happens to the result tree is to some degree system-dependent, and it is likely that vendors will provide a degree of control over this through the processor API.

Often, the result tree will be serialized (perhaps as XML or HTML) and written to a file on disk. The format in which it is serialized is then controlled using the format attribute, and the location of the file on disk will typically be controlled using the href attribute.

If the format attribute is present then its value must be a lexical QName, which must match the name of an <xsl:output> declaration in the stylesheet. The result tree will then be serialized as specified by that <xsl:output> declaration. If there is no format attribute then the result tree (assuming it is serialized at all) will be serialized as specified by the unnamed <xsl:output> declaration if there is one, or the default serialization rules if not.
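A short sketch of how the two cases differ, assuming the result trees are in fact serialized. The output definition name page and the two file names are hypothetical.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <!-- Unnamed output definition: applies to the principal result and to any
       xsl:result-document that has no format attribute -->
  <xsl:output method="xml" indent="yes"/>

  <!-- Named output definition (hypothetical name), selected by format="page" -->
  <xsl:output name="page" method="html" indent="no"/>

  <xsl:template match="/">
    <!-- No format attribute: serialized using the unnamed definition -->
    <xsl:result-document href="data.xml">
      <xsl:copy-of select="."/>
    </xsl:result-document>

    <!-- format="page": serialized as HTML using the named definition -->
    <xsl:result-document href="summary.html" format="page">
      <html>
        <body>
          <p>Elements in source: <xsl:value-of select="count(//*)"/></p>
        </body>
      </html>
    </xsl:result-document>
  </xsl:template>

</xsl:stylesheet>
```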

The way in which the href attribute is used is deliberately left a little vague in the specification, because the details are likely to be implementation-dependent. Its value is a relative or absolute URI. Details of the URI schemes that may be specified are left entirely to the implementation. The specification is also written in such a way that the URI can be interpreted as referring either to the result tree itself, or to its serialized representation. The important thing that the specification says is that it is safe to use relative URIs as links from one output document to another: If you create one result document with «href="chap1.html"» and another with «href="chap2.html"», then the content of the first document can include an element such as <a href="chap2.html">next chapter</a> and expect the link to work, whether the result trees are actually serialized or not. The specification achieves this by saying that any relative URI used in the href attribute of an <xsl:result-document> element is interpreted relative to a Base Output URI, which in effect is supplied in the API that invokes the transformation.
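The chapter-linking case just described might be written as follows. This is a sketch only: it assumes a hypothetical source document with a <book> root containing <chapter> elements, each with a <title>, and the output definition name page is also invented.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output name="page" method="html"/>

  <!-- Hypothetical source structure: book/chapter/title -->
  <xsl:template match="/book">
    <xsl:for-each select="chapter">
      <!-- Relative URI, resolved against the base output URI -->
      <xsl:result-document href="chap{position()}.html" format="page">
        <html>
          <body>
            <h1><xsl:value-of select="title"/></h1>
            <xsl:if test="position() != last()">
              <!-- Built the same way as the href above, so the link resolves
                   to the sibling result document -->
              <a href="chap{position() + 1}.html">next chapter</a>
            </xsl:if>
          </body>
        </html>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because both URIs are relative to the same base output URI, the links keep working wherever the set of files is eventually placed.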

We are used to thinking of URIs rather like filenames: as addresses of documents found somewhere on the disk. In fact, URIs are intended to be used as unique names for any kind of resource, hence their use for identifying namespaces and collations. Using URIs to identify result trees (which might exist only as a data structure in memory) is no different. Implementations are expected to provide some kind of mechanism in their API to allow the application to process the result tree, given its URI.

One way this might be done is to allow the application, when the transformation is complete, to use a method call such as getResultDocument(URI) to get a reference to the result tree with a given URI, perhaps returned in the form of a DOM document.

Another possible mechanism would be for the application to supply a resolver or listener object, which is notified whenever a result tree is created.

The specification of <xsl:result-document> is complicated by the fact that it is an instruction that has side effects. The instruction does not return a result (technically, it returns an empty sequence, which means it doesn't affect the result of evaluating the sequence constructor that it is part of). In a pure functional language, side effects are always problematic, though of course the only purpose of running any program is to change something in the environment in which it is run. Normally, if a compiler knows that a particular construct will return a particular result, then it can generate code that short-cuts the evaluation of this construct. But it would destroy the purpose of <xsl:result-document> if it did nothing, just because the compiler already knows what result it will return.

To take an example of how this is a problem in practice, suppose that the stylesheet defined a variable as follows:

  <xsl:variable name="dummy">
    <xsl:result-document href="hello.xml">
      <hello to="world"/>
    </xsl:result-document>
    <xsl:sequence select="3"/>
  </xsl:variable>

Now suppose that this variable is never referenced. Is the result document produced, or not? Normally, an XSLT optimizer will avoid evaluating variables that aren't referenced, but this strategy becomes problematic if the evaluation of a variable has a side effect.

The way that the XSLT specification has dealt with this problem is essentially to say that you can only use <xsl:result-document> when the sequence constructor you are evaluating is destined to form the content of a result tree. When the stylesheet starts executing, this condition is true for the sequence constructor contained in the first template to be evaluated, and it remains true except when you evaluate <xsl:variable> or similar elements such as <xsl:param>, <xsl:with-param>, and <xsl:message>. You also can't use <xsl:result-document> while evaluating the result of a stylesheet function defined using <xsl:function>, or while computing the content of <xsl:attribute>, <xsl:comment>, <xsl:value-of>, <xsl:namespace>, <xsl:processing-instruction>, <xsl:key>, or <xsl:sort>. This restriction is a runtime rule rather than a compile-time rule: for example, you can use <xsl:result-document> within <xsl:template> if the template is called from within <xsl:element>, but not if it's called from within <xsl:variable>.
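The runtime nature of the rule can be seen in a sketch like the one below, where the same named template is legal or illegal depending on its caller. The template name write-log, its file parameter, and the element and file names are all invented for the example.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <!-- Hypothetical named template that writes a secondary result document -->
  <xsl:template name="write-log">
    <xsl:param name="file"/>
    <xsl:result-document href="{$file}">
      <log/>
    </xsl:result-document>
  </xsl:template>

  <xsl:template match="/">
    <report>
      <!-- Allowed: the content of xsl:element is destined for a result tree -->
      <xsl:element name="section">
        <xsl:call-template name="write-log">
          <xsl:with-param name="file" select="'log1.xml'"/>
        </xsl:call-template>
      </xsl:element>

      <!-- Not allowed: the variable below builds a temporary tree, so the
           same call is an error there. It is left commented out so that
           the stylesheet as shown remains legal. -->
      <!--
      <xsl:variable name="temp">
        <xsl:call-template name="write-log">
          <xsl:with-param name="file" select="'log2.xml'"/>
        </xsl:call-template>
      </xsl:variable>
      -->
    </report>
  </xsl:template>

</xsl:stylesheet>
```

A compiler looking only at the write-log template cannot tell which case applies, which is why the check has to be made when the instruction is evaluated.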

XSLT processors are allowed to evaluate instructions in any order. This means that you can't reliably predict the order in which result trees get written. There is a rule preventing a stylesheet from writing two different result trees with the same URI, because if overwriting was allowed, the results would be nondeterministic. There is also a rule saying that it's an error to attempt to write a result tree and then read it back again using the document() function: This would be a sneaky way of exploiting side effects and making your stylesheet dependent on the order of execution. In practice, processors may have difficulty detecting this error and you might get away with it.

The fact that order of execution is unpredictable has another consequence: If a transformation doesn't run to completion, because a runtime error occurred (or perhaps because <xsl:message> was used with «terminate="yes"») then it's unpredictable whether a particular result tree was output before the termination. Most processors only exploit the freedom to change the order of execution when evaluating variables or functions, so you are unlikely to run into this problem in practice.

Usage

Generating multiple output files is something I have often found useful when doing transformations. A typical scenario is that a weighty publication, such as a dictionary, is managed as a single XML file, which would be far too large to download for a user who only wants to see a few entries. So the first stage in preparing it for human consumption is to split it up into bite-sized chunks, perhaps one document per letter of the alphabet or even one per dictionary headword. You can make these chunks individual HTML pages, but I usually find it's better to do the transformation in two stages: First split the large XML document into several small XML documents, and then convert each of these into HTML independently.

The usual model is to generate one principal output file and a whole family of secondary output files. The principal output file can then serve as an index. Often you'll need to keep links between the files so that you can easily assemble them again (using the document() function described on page 532 in Chapter 7), or so that you can generate hyperlinks for the user to follow.
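For the dictionary scenario, the first (splitting) stage might look something like this sketch. It assumes a hypothetical source vocabulary in which a <dictionary> element contains <entry> elements, each with exactly one <headword> whose first character is safe to use in a file name; the <index> and <letter> elements and the file naming scheme are likewise invented.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <!-- Hypothetical source structure: dictionary/entry/headword -->
  <xsl:template match="/dictionary">
    <!-- The principal output acts as an index to the secondary files -->
    <index>
      <xsl:for-each-group select="entry"
                          group-by="upper-case(substring(headword, 1, 1))">
        <xsl:sort select="current-grouping-key()"/>
        <xsl:variable name="file"
                      select="concat('letter-', current-grouping-key(), '.xml')"/>
        <!-- Link from the index to the secondary file -->
        <letter value="{current-grouping-key()}" href="{$file}"/>
        <!-- One secondary document per initial letter -->
        <xsl:result-document href="{$file}">
          <dictionary>
            <xsl:copy-of select="current-group()"/>
          </dictionary>
        </xsl:result-document>
      </xsl:for-each-group>
    </index>
  </xsl:template>

</xsl:stylesheet>
```

A second stylesheet can then read the index, follow each href with the document() function, and convert the small files to HTML one at a time.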

Examples

This feature is often used to break up large documents into manageable chunks. In the section for <xsl:param> on page 392 there is an example of a stylesheet that breaks up a Shakespeare play to produce a cover page together with one page per scene. But here we'll illustrate the principle with a much smaller document.

Creating Multiple Output Files

This example takes a poem as input, and outputs each stanza to a separate file. A more realistic example would be to split a book into its chapters, but I wanted to keep the files small.

Source

The source file is poem.xml. It starts:

  <poem>
    <author>Rupert Brooke</author>
    <date>1912</date>
    <title>Song</title>
    <stanza>
      <line>And suddenly the wind comes soft,</line>
      <line>And Spring is here again;</line>
      <line>And the hawthorn quickens with buds of green</line>
      <line>And my heart with buds of pain.</line>
    </stanza>
    <stanza>
      <line>My heart all Winter lay so numb,</line>
      <line>The earth so dead and frore,</line>
      ...

Stylesheet

The stylesheet is split.xsl.

We want to start a new output document for each stanza, so we use the <xsl:result-document> instruction in the template rule for the <stanza> element. Its effect is to switch all output produced by its sequence constructor to a different output file. In fact, it's very similar to the effect of an <xsl:variable> element that creates a tree, except that the tree, instead of being a temporary tree, is serialized directly to an output file of its own:

  <?xml version="1.0"?>
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                  version="2.0">

    <xsl:template match="poem">
      <poem>
        <xsl:copy-of select="title, author, date"/>
        <xsl:apply-templates select="stanza"/>
      </poem>
    </xsl:template>

    <xsl:template match="stanza">
      <xsl:variable name="file"
                    select="concat('verse', string(position()), '.xml')"/>
      <verse number="{position()}" href="{$file}"/>
      <xsl:result-document href="{$file}">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:template>

  </xsl:stylesheet>

To run this example under Saxon, you need to make sure that an output file is supplied for the principal output document. This determines the base output URI, and the other output documents will be written to locations that are relative to this base URI. For example:

  java -jar c:\saxon\saxon7.jar -o c:\temp\index.xml poem.xml split.xsl

This will write the index document to c:\temp\index.xml, and the verses to files such as c:\temp\verse2.xml.

Output

The principal output file contains the skeletal poem below (new lines added for legibility):

  <?xml version="1.0" encoding="utf-8" ?>
  <poem>
    <title>Song</title>
    <author>Rupert Brooke</author>
    <date>1912</date>
    <verse number="1" href="verse1.xml"/>
    <verse number="2" href="verse2.xml"/>
    <verse number="3" href="verse3.xml"/>
  </poem>

Three further output files, verse1.xml, verse2.xml, and verse3.xml, are created in the same directory as the principal output file. Here is verse1.xml:

  <?xml version="1.0" encoding="utf-8" ?>
  <stanza>
    <line>And suddenly the wind comes soft,</line>
    <line>And Spring is here again;</line>
    <line>And the hawthorn quickens with buds of green</line>
    <line>And my heart with buds of pain.</line>
  </stanza>

For another version of this example, which uses the element-available() function to test whether the <xsl:result-document> instruction is implemented and takes fallback action if not, see the entry for element-available() on page 542 in Chapter 7.