Chapter 10. Controlling Output Options

CONTENTS
  •  10.1 The <xsl:output> Top-Level Element
  •  10.2 The <xsl:strip-space> and <xsl:preserve-space> Top-Level Elements
  •  10.3 Generating Error Messages and Logs
  • <xsl:output>

  • <xsl:strip-space>

  • <xsl:preserve-space>

  • <xsl:message>

In this chapter, we will work with elements that are, for the most part, concerned with how the output from the stylesheet is handled or processed, and what kind of output it will be. The first element, <xsl:output>, is not used to create output (contrary to its name), but controls the logical structure of the output, and what kinds of processing-specific components are to be included or excluded. The other two top-level elements discussed in this chapter are the complementary pair <xsl:strip-space> and <xsl:preserve-space>. Notably, <xsl:strip-space> and <xsl:preserve-space> do not affect output directly; they govern the handling of whitespace in the input tree, which of course can affect the whitespace in the output result tree.

As another output option, messages such as logs and errors can be generated from the actual processing of the stylesheet using the <xsl:message> element, which is discussed in the last section of this chapter.

10.1 The <xsl:output> Top-Level Element

The <xsl:output> top-level element establishes the type of structure the output will contain (XML, HTML, or text) and provides attributes to select several different output options. It must be a child of the <xsl:stylesheet> or <xsl:transform> document element, and can occur anywhere within the document element, unless the <xsl:import> element is present, in which case it must come after that element. Its ten attributes, shown in the following element model definition, are optional and will be discussed individually in the following sections. This top-level element is always an empty element.

<!-- Category: top-level-element -->
<xsl:output
  method = "xml" | "html" | "text" | qname-but-not-ncname
  version = nmtoken
  encoding = string
  omit-xml-declaration = "yes" | "no"
  standalone = "yes" | "no"
  doctype-public = string
  doctype-system = string
  cdata-section-elements = qnames
  indent = "yes" | "no"
  media-type = string />

Multiple <xsl:output> elements are allowed within one stylesheet and can also be imported from other stylesheets. However, the processor merges the information from each one into a single effective <xsl:output> element. If conflicts arise, the same conflict resolution used for templates (discussed in Chapter 7) applies, except for the values of cdata-section-elements attributes, which are concatenated into one list.
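The merging behavior can be sketched with a hypothetical stylesheet that declares two <xsl:output> elements. The element-type names sample and code are assumptions made for illustration only; the comment shows the single declaration the processor is expected to act on.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- First declaration: sets the method and one CDATA element -->
  <xsl:output method="xml" cdata-section-elements="sample" />

  <!-- Second declaration: adds indentation and another CDATA element -->
  <xsl:output indent="yes" cdata-section-elements="code" />

  <!-- The two declarations are expected to behave as if written:
       <xsl:output method="xml" indent="yes"
                   cdata-section-elements="sample code" /> -->

</xsl:stylesheet>
```

Because no attribute other than cdata-section-elements is declared twice, no conflict resolution is needed in this sketch.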

In its primary capacity, <xsl:output> affects the form of the logical structure of the XSLT stylesheet's output. We use the phrase logical structure to underscore that formatting of output ("pretty-printing") is not specifically the purview of XSLT, apart from any HTML LREs. The function of formatting is under the scope of the XSL formatting objects specification.[1] The formatting affected by <xsl:output> is at a more general level. The <xsl:output> element affects the type of output (HTML, text, or XML), not the human-readable or printable results. Whitespace and indenting of elements are affected, but things like running headers and footers are not. Character encoding, inclusion of DOCTYPE declarations, and similar kinds of output structural elements are specified using <xsl:output> and its attributes, as discussed in the following sections.

10.1.1 Attributes for <xsl:output>

The best way to enumerate the functions performed by <xsl:output> is through its attributes, which are the mechanisms by which it affects its results. We will introduce each of them in turn, according to the order in which they are found in the element model definition (shown above) from the W3C specification for XSLT, Section 16. These attributes are also discussed individually according to how they operate with each of the values of the <xsl:output> element's method attribute.

10.1.1.1 The method Attribute

The most frequently used attribute for <xsl:output> is the method attribute, which describes the overall file type, or logical structure, of the output document. If you are familiar with the Save As function in a word processor, which allows the choice of different file types, such as text-only or rich text format, you will see a similar functionality in the role played by the method attribute.

This attribute has three predefined values of xml, html, and text, and one user-defined value. The output from using a method equal to xml will always be a well-formed document. If the output method is html, the logical structure of the output document will be HTML, valid according to the version of HTML selected (and, correspondingly, not equivalent to XHTML). Finally, in the case of text, the output is similar to a file whose content is only a text string or plain text file. We will attend to each output type in turn after generally introducing the balance of the <xsl:output> attributes.

There is also a fourth kind of output, shown in the element model definition above as qname-but-not-ncname, which can be defined by extensions supported by specific XSLT implementations of various processing software. Extensions are discussed in Chapters 12 and 13, which address processor-specific implementations. However, the following brief summary will provide an overview.

A QName is accepted as a fourth possible value for method, if there is a supported extension output type for that method. The QName must be prefixed by the namespace prefix for the extension output type. A namespace declaration must also be in effect for the prefix at the point in the XSLT stylesheet at which the output method is used.
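As a minimal sketch of these requirements, the fragment below declares a prefix and then uses it in the method value. Both the namespace URI and the method name ext:custom are hypothetical; a real processor defines its own extension namespace and method names.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ext="http://www.example.com/hypothetical-output">

  <!-- "ext:custom" is a made-up method name used only to show the
       prefixed form; the ext prefix is declared on the document element
       so that it is in effect here -->
  <xsl:output method="ext:custom" />

</xsl:stylesheet>
```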

10.1.1.2 The version Attribute

The version attribute for <xsl:output> is used to stipulate the version of the respective output type applicable to xml, html, and conceivably, extension output types as declared with the method attribute. For XML output, the current value would be 1.0 to match the current version of the XML specification. For HTML output, the default is assumed to be 4.0, unless it is necessary to choose HTML 2.0 for some specific purpose, such as displaying the HTML LREs that are output in an older version of Netscape.

10.1.1.3 The encoding Attribute

The encoding attribute is used in combination with the xml, html, and text values for the method attribute, and specifies which encoding the output result tree will have for a particular character set. The result will be different for the different kinds of output specified under method. With an xml output, for instance, the stipulation of UTF-16 will create an encoding attribute with a value of UTF-16 in the XML declaration of the resulting output XML file. With either text or html output, the processor will generate UTF-16 valid text. XSLT processors are not required to support all values of encoding, but they are required to respect both UTF-8 and UTF-16.

10.1.1.4 The omit-xml-declaration Attribute

The omit-xml-declaration attribute is only used when the method output is set to xml. It governs whether or not there must be an XML declaration at the beginning of the output XML document instance. The default, if left undeclared, is no, meaning that there will be an XML declaration. The only other possible value is yes, which omits the declaration from the output result tree.

10.1.1.5 The standalone Attribute

The standalone attribute establishes the value of the standalone attribute generated in the XML declaration of the output document. Its possible values are yes and no. It is only used when the method attribute has a value of xml, and is ignored otherwise since it is not applicable to the other output types. If the value of the standalone attribute is set to yes, the resulting XML document is understood to be free from external markup declarations (in an external DTD subset or external parameter entities) that affect the information passed from the XML processor to the application. According to the XML specification, the external declarations that matter here are declarations of:

  1. Attributes with default values, if elements to which these attributes apply appear in the document without specifications of values for these attributes

  2. Entities (other than amp, lt, gt, apos, quot), if references to those entities appear in the document

  3. Attributes with values subject to normalization, where the attribute appears in the document with a value which will change as a result of normalization

  4. Element types with element content, if whitespace occurs directly within any instance of those types

References to external entities in a standalone document are valid as long as the entities are declared within the document. The following use of <xsl:output> shows the declaration of the standalone attribute:

<xsl:output method="xml" standalone="yes" />

This use of the <xsl:output> element results in the following XML declaration in the output document:

<?xml version="1.0" standalone="yes"?>

10.1.1.6 The doctype-public Attribute

The doctype-public attribute is used to add a public identifier in the document type declaration of the output document. The value of the attribute is a string, which becomes the value of the public identifier in the output document. Note that the XSLT processor does not validate the string to ensure a proper identifier. The string is sent directly to the output document as it appears in the stylesheet. The doctype-public attribute is used with either the html or xml output method.

10.1.1.7 The doctype-system Attribute

The doctype-system attribute enables the declaration of a system identifier for the document type declaration in the output document. The value of the attribute is a string, in the form of a URL or relative path, which becomes the value of the system identifier in the output document. Note that the XSLT processor does not validate the string to ensure a proper identifier. The string is sent directly to the output document as it appears in the stylesheet. The doctype-system attribute is used with either the html or xml output method.
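A minimal sketch of the two doctype attributes used together with the html output method follows. It assumes the stylesheet produces HTML 4.0 Transitional markup; the identifiers shown are the ones published by the W3C for that DTD, and the processor passes them through without checking them.

```xml
<!-- Both strings are copied to the DOCTYPE declaration as written -->
<xsl:output method="html"
    doctype-public="-//W3C//DTD HTML 4.0 Transitional//EN"
    doctype-system="http://www.w3.org/TR/REC-html40/loose.dtd" />
```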

10.1.1.8 The cdata-section-elements Attribute

The cdata-section-elements attribute is used to list the element-type names in the output XML document whose text node children will be output as CDATA[2] sections. CDATA sections are used to escape portions of the output result tree that will possibly contain special characters that should not be escaped with character entity references.

As an example, some XML documents may contain a <sample> element that will be used to contain programming-specific code. Therefore, a cdata-section-elements attribute value of "sample" would stipulate that all text children of the <sample> element are to be output as CDATA sections. In other words, the text inside any <sample> element will appear in the output with the CDATA section delimiters in the form "<![CDATA[original text]]>", where "original text" is the original content of the text node. The CDATA section delimiters are <![CDATA[ for the opening delimiter and ]]> for the closing delimiter. These delimiters are special characters and are considered markup.

The sequence of two closing square brackets followed by a greater-than symbol (]]>) cannot appear inside a CDATA section in the output, because it would be read as the closing delimiter and the result would not be well-formed. It is also not valid for a CDATA section to contain another CDATA section. Note that the > symbol can appear in the source as the character entity reference &gt; and still be considered part of this sequence. If this sequence of characters is found in an LRE in the stylesheet or in an input element that will be treated as a CDATA section in the output, the processor will generate two individual CDATA sections, one containing the two ]] and another containing the >.

If any special characters are found in the <sample> element's text node, the special characters are output as the correct character instead of the entity reference. So, if the text contains characters like & or <, written in the source as entity references like &amp; or &lt;, the real character will appear in the output instead of the entity reference.

The cdata-section-elements attribute value can contain more than one element-type name if the names are separated by whitespace. It is used when the method attribute value is xml, and it is not used with any other type of output. Note that XT does not currently support this functionality.
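A short sketch of a whitespace-separated list follows. The names sample and code are illustrative element-type names, not drawn from any particular document type.

```xml
<!-- Text children of both sample and code elements are written
     as CDATA sections; the names are separated by a single space -->
<xsl:output method="xml" cdata-section-elements="sample code" />
```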

10.1.1.9 The indent Attribute

The indent attribute can be used when either xml or html output is specified with the method attribute. When set to the value of yes, the hierarchical structure of the output result tree is indented, and in most cases, each element is on a separate line. This functionality provides a very user-friendly formatting of the output XML, since most processors by default ignore whitespace in the input and deliver "bunched up" XML. The XSLT processor has the discretionary option to implement this feature, and how it does so is also discretionary. The default value of the indent attribute, if left undeclared, is no.

Using the indent attribute with mixed content models will deliver unpredictable indentation. In other words, if the hierarchy of input elements contains text strings as siblings to other elements, the output from using indent set to yes can occasionally be unpredictable.
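To make the mixed content case concrete, the following sketch assumes a hypothetical input element, para, in which text and key elements are siblings. The stylesheet copies the input unchanged and leaves indentation off, since any whitespace a processor added between the two key elements would alter the text.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input with mixed content (text and elements
       are siblings inside para):
       <para>Press <key>Ctrl</key><key>C</key> to copy.</para> -->

  <!-- Indentation is left off for this kind of document type -->
  <xsl:output method="xml" indent="no" />

  <!-- Copy every attribute and node from the input to the output -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```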

10.1.1.10 The media-type Attribute

The media-type attribute can be used with any value set with the method attribute, though uses with extension methods are not governed by XSLT. Values for this attribute are based on an external standard called MIME.[3] For example, in an XML output document, the MIME type would be text/xml, and for HTML, text/html. If the XML being output is to be used as an application or data transport mechanism, the MIME type will be application/xml. If the output file type is text, the MIME type is text/plain. Because xml is the default for the method attribute, when there is no method specified, the default value of the media-type attribute is text/xml. The value of the media-type attribute can be used for clarifying how a given environment should handle the output file according to how or if it reads these kinds of MIME type declarations.

The two MIME media types applicable to XSLT and XML documents are text/xml and application/xml. These two media types are outlined in the IETF specification RFC 2376. Additional media types may be registered in the future, and if one for XSLT specifically is registered, the new type should be supported by XSLT processors. An example of using the media-type attribute is as follows:

<xsl:output method="text" encoding="UTF-8" media-type="text/plain"/> 

10.1.2 Working with the xml Output File Method

All of the <xsl:output> attributes are applicable to the xml output method, but may not be for other output types. For example, the standalone and omit-xml-declaration attributes are purely for XML output, while the others may apply to HTML or text, and possibly to extension output methods selectively, based on their function. The xml output method is the default output type unless, as described in Section 10.1.3 below, several output conditions are met that will cause the processor to generate HTML instead.

10.1.2.1 Using the omit-xml-declaration Attribute with XML Output

The omit-xml-declaration attribute controls the output of the XML declaration. If the omit-xml-declaration attribute is not specified, the XML declaration, <?xml version="1.0"?>, will be included, as long as the output method is xml. The behavior of many of the other <xsl:output> attributes, such as version, encoding, and standalone, depends on whether there is an XML declaration. Using <xsl:output method="xml" omit-xml-declaration="yes" /> will suppress the XML declaration in the output result tree.

10.1.2.2 The version Attribute with XML Output

Working with XML means this attribute has certain values already prescribed. Because the current version of the XML specification is 1.0, the only currently valid version number for <xsl:output> when the method attribute value is xml is 1.0. When the version of the W3C XML specification changes, the version for the XML specification in the output document can be changed using the version attribute of <xsl:output>.

Note that the current version of XT does not implement this function. The version in the XML declaration generated using XT is 1.0, regardless of the <xsl:output> version attribute.

10.1.2.3 The encoding Attribute with XML Output

If an encoding other than the default UTF-8 is desired in the XML declaration of the output document, the encoding attribute on <xsl:output> can be used to generate it. XSLT processors check the value of the encoding attribute and should signal an error if the value is not valid or is not a supported encoding type.

Encoding types are registered with the Internet Assigned Numbers Authority (IANA),[4] as defined in the IETF specification (RFC 2278).[5] XML processors are not required to support anything other than UTF-8 and UTF-16, but according to the XML specification, the following character sets may be used:

In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4" should be used for the various encodings and transformations of Unicode / ISO/IEC 10646; the values "ISO-8859-1", "ISO-8859-2", "ISO-8859-9" should be used for parts of ISO 8859; and the values "ISO-2022-JP", "Shift_JIS", and "EUC-JP" should be used for the various encoded forms of JIS X-0208-1997. (XML specification, Section 4.3.3)

The following <xsl:output> element shows the use of the encoding attribute:

<xsl:output method="xml" encoding="EUC-JP" /> 

This will output the following XML declaration in the output file:

<?xml version="1.0" encoding="EUC-JP"?> 

The value of the encoding attribute is case-insensitive and must contain only printable ASCII characters. Additional user-defined character encoding sets are allowed, but if the specified encoding is not a charset registered with the IANA, it must start with X-.

10.1.2.4 The doctype-public and doctype-system Attributes with XML Output

The doctype-public and doctype-system attributes cause a DOCTYPE declaration to be output immediately following the XML declaration in the output result tree. The doctype-public attribute, which passes as its value a string that is a public identifier, will create a public document type declaration with the key word PUBLIC; with the xml output method, it is ignored unless doctype-system is also specified. The doctype-system attribute passes its value as a string in the form of a URI, which is a system identifier (either a URL or a system path) that may or may not resolve to an actual "place" on the file system or Internet. For example, using <xsl:output method="xml" doctype-system="http://www.my_store.com/my.dtd" /> generates a DOCTYPE declaration in the output result tree as follows:

<!DOCTYPE topelement SYSTEM "http://www.my_store.com/my.dtd"> 

When used together, the two attributes are combined into one DOCTYPE declaration with both PUBLIC and SYSTEM identifiers as follows:

<xsl:output method="xml"
      doctype-public="-//MY STORE//DTD MyDTD//EN"
      doctype-system="http://www.my_store.com/my.dtd" />

This use of <xsl:output> will produce the following DOCTYPE declaration; the omit-xml-declaration attribute has no bearing on it, since that attribute affects only the XML declaration:

<!DOCTYPE topelement PUBLIC "-//MY STORE//DTD MyDTD//EN" "http://www.my_store.com/my.dtd">

The key word DOCTYPE is followed by the name of the top-level element of the XML document. The name for the document element, given in these examples as "topelement," is determined by the XSLT processor according to the element-type name of the first element node in the output document. The name is followed by the key word PUBLIC and the public declaration, if a public declaration exists. If the public declaration does not exist, the name is followed by the key word SYSTEM and the system declaration. If both declarations exist, the SYSTEM key word is dropped.

It is possible to use two different <xsl:output> elements to declare the system and public identifiers separately, but they are still combined into one DOCTYPE declaration in the output. Note that the XSLT processor does not validate the string content of the doctype-system or doctype-public attributes to ensure a proper identifier. The string is sent directly to the output document as it appears in the stylesheet.
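As a sketch of the separate-declaration case, the two <xsl:output> elements below each carry one identifier, reusing the example identifiers from this section. A conforming processor is expected to merge them and emit one DOCTYPE declaration containing both.

```xml
<!-- Public identifier declared on its own -->
<xsl:output method="xml"
    doctype-public="-//MY STORE//DTD MyDTD//EN" />

<!-- System identifier declared separately; the processor combines
     the two declarations into one effective xsl:output -->
<xsl:output doctype-system="http://www.my_store.com/my.dtd" />
```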

10.1.2.5 The standalone Attribute with XML Output

Declaring the standalone attribute on <xsl:output> determines whether a standalone declaration appears in the XML declaration of the resulting XML document.

The standalone attribute has no default value on <xsl:output>; should it not be declared, the XML declaration will still be created in the result (assuming the omit-xml-declaration attribute has not been declared in <xsl:output> with a value of yes), but it will contain no standalone declaration. When the standalone attribute is declared with a value of yes, the resulting XML document instance will have standalone="yes" added to the XML declaration. See Section 10.1.1.5 for more information on the standalone attribute in the XML declaration. The following use of <xsl:output> shows the declaration of the standalone attribute:

<xsl:output method="xml" standalone="yes" />

This use of the <xsl:output> element results in the following XML declaration:

<?xml version="1.0" standalone="yes"?>

10.1.2.6 The indent Attribute with XML Output

When the indent attribute value is yes, the processor has the option of indenting the XML output tree to reflect the hierarchy of the elements. The implementation of the indent attribute leaves some discretion to the XSLT processor, so there is no guarantee either of indentation or of consistency of results from one processor to another. In most cases, if the indent attribute is set to yes, the output structure will have line breaks and/or indents for each element.

The default value of the indent attribute is no. In other words, when this attribute is not declared, the output result tree reflects no hierarchically generated whitespace.

The XSLT specification indicates, "It is usually not safe to use indent="yes" with document types that include element types with mixed content." In other words, if the hierarchy of input elements contains text strings as siblings to other elements, the output from using indent set to yes can occasionally be unpredictable. Otherwise, the following declaration will make a conventional indented hierarchical representation:

<xsl:output method="xml" indent="yes" /> 

This would result in an output tree that, ideally, would look similar to the following:

<element>
      <sub-element>
            <sub-sub-element>Some text</sub-sub-element>
      </sub-element>
</element>

Note that XT does not provide any indentation, but does add line breaks for each element.

10.1.2.7 The cdata-section-elements Attribute with XML Output

CDATA sections in XML are used to escape character strings. Elements that are to be formatted as CDATA in the output result tree are identified by their element-type name, given as the value of the cdata-section-elements attribute. If there is more than one element to be output as CDATA, they should be itemized in the value of the attribute as a whitespace-separated list; in other words, listed with a space between each element-type name, with no commas or other separating tokens.

The creation of a CDATA section for an LRE called "sample" would involve the following use of <xsl:output>:

<xsl:output method="xml" cdata-section-elements="sample" /> 

This would apply to the following <sample> LRE from the stylesheet:

<sample>Some stuff goes in here</sample> 

The LRE would be sent to the output result tree written as follows:

<sample><![CDATA[Some stuff goes in here]]></sample> 

The cdata-section-elements attribute causes the basic CDATA section delimiter syntax of <![CDATA[, followed by ]]>, to be wrapped around the text of any element in the output result tree specified in cdata-section-elements.

Special character entities escaped with their defined character entity references in the input are resolved to their respective characters; for example, &lt; is resolved upon output in a CDATA section as <.

For example, assume the stylesheet has the following content for the <sample> LRE:

<sample>&lt;example></sample> 

The content of the LRE would be sent to the output, surrounded by the CDATA marked section delimiters, with the character entity reference &lt; resolved to the appropriate character, <, as follows:

<sample><![CDATA[<example>]]></sample> 

Further, if the LRE in the stylesheet has the CDATA section already marked, the output would be exactly the same. For example, the result of the following LRE is equivalent to the previous example:

<sample><![CDATA[<example>]]></sample>

There is a special character sequence that is specific to CDATA sections and is considered markup by the processor. The sequence of two closing square brackets followed by a greater-than symbol (]]>), when found inside an element specified as CDATA, will cause the processor to generate individual CDATA sections, one containing the two ]] and another containing the >. It is not valid for a CDATA section to contain another CDATA section. Note that the > can appear in the LRE as the appropriate character entity reference of &gt; and still be considered part of the disallowed markup. For example, suppose our <sample> LRE contained these specific characters as follows:

<sample>]]&gt;</sample> 

The processor would generate the following element in the output result tree:

<sample><![CDATA[]]]]><![CDATA[>]]></sample>

While this may look like a plethora of square brackets, note that the first CDATA section contains the two ]] characters from the original LRE, followed by the closing of that CDATA section. The second CDATA section contains the string (in this case, a resolved, predefined entity for the greater-than symbol, >) that followed the ]] in the original LRE.
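The pieces above can be put together in a small stylesheet. This is only a sketch: the doc and note elements are hypothetical LREs added to contrast an element listed in cdata-section-elements with one that is not.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Only sample is listed, so only its text becomes CDATA -->
  <xsl:output method="xml" cdata-section-elements="sample" />

  <xsl:template match="/">
    <doc>
      <!-- Expected in the output as a CDATA section containing
           the resolved character: if (a < b) process() -->
      <sample>if (a &lt; b) process()</sample>
      <!-- note is not listed, so its text is escaped as usual -->
      <note>a &lt; b</note>
    </doc>
  </xsl:template>

</xsl:stylesheet>
```

The exact whitespace and XML declaration in the result depend on the processor, so only the treatment of the two text nodes should be compared.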

10.1.2.8 The media-type Attribute with XML Output

When using the xml output method, the default for this attribute is the text/xml value, but other values could be substituted, as warranted by the circumstances. For example, if the XML being output is to be used as an application or a data transport mechanism, the media type should be application/xml. Because xml is the default for the method attribute, when there is no method specified, the default value of the media-type attribute is text/xml.

The two MIME media types applicable to XSLT and XML documents are text/xml and application/xml. These two media types are outlined in the IETF specification RFC 2376.

Note that if the media-type attribute is set to text/xml, the encoding attribute should not be specified because the charset parameter on the MIME type determines the character encoding method. If the encoding attribute is specified, it will be ignored, as stated in Appendix F of the XML recommendation. The relevant section is shown here:

[The] XML entity [may be] accompanied by encoding information, as in some file systems and some network protocols. When multiple sources of information are available, their relative priority and the preferred method of handling conflict should be specified as part of the higher level protocol used to deliver XML. Rules for the relative priority of the internal label and the MIME-type label in an external header, for example, should be part of the RFC document defining the text/xml and application/xml MIME types. In the interests of interoperability, however, the following rules are recommended:

  • If an XML entity is delivered with a MIME type of text/xml, then the charset parameter on the MIME type determines the character encoding method; all other heuristics and sources of information are solely for error recovery.

  • If an XML entity is delivered with a MIME type of application/xml, then the byte-order mark and encoding-declaration PI are used (if present) to determine the character encoding. All other heuristics and sources of information are solely for error recovery.

These rules apply only in the absence of protocol-level documentation; in particular, when the MIME types text/xml and application/xml are defined, the recommendations of the relevant RFC will supersede these rules.
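Following the recommended rules quoted above, a declaration intended for delivery as application/xml can reasonably carry an explicit encoding, since the encoding declaration is then used to determine the character encoding. The sketch below assumes UTF-8 output; how the media type actually reaches the delivery environment is outside the stylesheet and depends on the processor and server.

```xml
<!-- With application/xml, the encoding declaration written to the
     output is used to determine the character encoding -->
<xsl:output method="xml" encoding="UTF-8" media-type="application/xml" />
```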

10.1.3 Working with the html Output File Method

Most of the attributes for the <xsl:output> element using the xml output method also apply to the html output method. However, there are three attributes that are not valid or do not apply to HTML because they are XML-specific: omit-xml-declaration, standalone, and cdata-section-elements. Apart from method itself, the remaining six attributes are discussed in the following sections. First, however, there are specific implementations of the html output method that should be discussed.

Note

There is no specific output type for XHTML,* though it is possible that extension elements for it will be added in the near future. XHTML output uses the same <xsl:output> attributes as the xml output method.

*See the W3C specification for XHTML for more details: http://www.w3.org/MarkUp/.

10.1.3.1 Defaulting to the html Output File Method

There are circumstances under which the XSLT processor will default to the html output method when no method is explicitly declared. An explicit declaration of xml is never overridden, even when the requirements below are met. The following list is quoted from the W3C specification for XSLT, Section 16. The output method will default to html if all three of the following requirements are met:

  1. The root node of the result tree has an element child,

  2. The expanded-name of the first element child of the root node (i.e., the document element) of the result tree has local part html (in any combination of upper- and lowercase) and a null namespace URI, and

  3. Any text nodes preceding the first element child of the root node of the result tree contain only whitespace characters.

These three requirements, when taken together as a whole, basically state that if the first element in the output document is the <html> element, the output method will default to html and the XML declaration will be omitted.

The first requirement states that the root node of the output document must have an element child. The second requirement states that the first such element must have the element-type name "html", in any combination of case. A "null namespace URI" means that this element is not in any namespace; an unprefixed <html> LRE in a stylesheet with no default namespace declaration meets this condition.

Note

As discussed in Chapter 12, a qualified name contains two parts: a prefix and a local part. If the name is expanded and there is no prefix, and no default namespace is in effect, then the namespace URI is also understood to be null, and the element is considered an LRE by the processor. The exceptions to this rule are unprefixed elements in a portion of the stylesheet where the default namespace is the XSL namespace or an extension namespace.

The third requirement states that if the first node in the document is text, it must only contain whitespace. In other words, the first object found in the output document cannot be text unless it is whitespace.
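The three requirements can be illustrated with a stylesheet that declares no <xsl:output> element at all. This is a minimal sketch; the page content is invented for the example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- No xsl:output element is declared in this stylesheet -->

  <xsl:template match="/">
    <!-- The first element child of the result root is an unprefixed
         html element in no namespace, and no text precedes it, so
         the processor is expected to use the html output method -->
    <html>
      <head><title>Default output method</title></head>
      <body>
        <p>Generated without an explicit method.</p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Adding <xsl:output method="xml" /> to the same stylesheet would keep the output as XML, since an explicit declaration takes priority over the default.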

10.1.3.2 Using the html Output File Method

When html is the output method, whether explicitly declared with the method attribute in the <xsl:output> element or implicitly defaulted to based on the specific rules outlined in Section 10.1.3.1, the XSLT processor will follow certain guidelines for the output as specified by the XSLT specification:

  1. Non-HTML element tags will be generated in the output just as they are with XML; a starting tag and an ending tag are required.

  2. Any LRE elements that are not recognized as HTML elements will be treated as if they are XML tags.

  3. HTML elements are recognized as HTML elements and sent to the output as required by the version of HTML specified with the version attribute of <xsl:output>.

  4. Empty HTML tags will not have an end tag, and will not have a closing / inside the empty element tag as is done with XML. The empty elements recognized for HTML version 4.0 are: area, base, basefont, br, col, frame, hr, img, input, isindex, link, meta, and param. These elements will appear in the output in their HTML form as <img>, <br>, etc.

  5. HTML element names are case-insensitive, so the XSLT processor will recognize them in any combination of upper- and lowercase.

  6. Like the xml output method, the default handling of special characters for the html method is to escape them, as long as the character has a defined entity reference for HTML. Characters like < and & will normally be sent to the output in their entity reference form of &lt; and &amp;. However, the escaping of special characters in the script and style elements will be turned off.

  7. The script and style elements are analogous to the CDATA sections in XML, and special characters found in these elements should appear as their resolved characters in the output. If CDATA sections are found in these elements, they will also be resolved. For example, the script element may contain a programming-specific process sequence as follows:

    <script>if (a &lt; b) process()</script>

    or

    <script><![CDATA[if (a < b) process()]]></script>

    Both of these inputs would resolve to the following output in html output mode:

    <script>if (a < b) process()</script> 
  8. The special character < will not be escaped if it is found in an attribute because it is a valid character in HTML attributes.

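  9. The special character & in an attribute will not be escaped if it is immediately followed by a { character. For example, an element start-tag may be written in the stylesheet as follows (the braces are doubled because single braces in the attribute of an LRE would be read as an attribute value template):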

    <BODY bgcolor='&amp;{{randomrbg}};'> 

    This will be sent to the HTML output as:

    <BODY bgcolor='&{randomrbg};'> 
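  10. Non-ASCII characters in URI attribute values should be escaped using the method recommended in Section B.2.1 of the HTML 4.0 Recommendation: each such character is converted to UTF-8, and each resulting byte is written as %HH, where HH is the hexadecimal value of the byte.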

  11. PIs, which in XML are terminated with ?>, are terminated without the closing ? when in html output mode. PIs will be in the form <?PI> in HTML.

  12. A holdover from SGML, Boolean value attributes in HTML can be minimized. This means that the attribute can appear in the element tag without a value, because the only possible value of the attribute is the same as the name of the attribute. Simply specifying the attribute sets its value. The html output method handles Boolean attributes in this way, sending them to the output in minimized form. For example, an element start-tag with a Boolean attribute may be written in the stylesheet as follows:

    <OPTION selected="selected"> 

    The HTML output of this element will be as follows:

    <OPTION selected> 
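
Several of the guidelines above can be exercised in one small stylesheet. This is only a sketch: the <streetmap> element is a made-up non-HTML element, and the comments describe the serialization a processor following the guidelines would be expected to produce, which may differ in whitespace and letter case from one processor to another.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <html>
      <body>
        <!-- Empty HTML element: expected as <br>, with no end tag (guideline 4). -->
        <br/>
        <!-- Boolean attribute: expected in minimized form, <option selected> (guideline 12). -->
        <select>
          <option selected="selected">1st Street</option>
        </select>
        <!-- Escaping is turned off inside script, so a literal < is expected (guidelines 6 and 7). -->
        <script>if (a &lt; b) process()</script>
        <!-- Hypothetical non-HTML element: expected with start and end tags, as in XML (guidelines 1 and 2). -->
        <streetmap></streetmap>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```
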
10.1.3.3 The version Attribute with HTML Output

Unlike the xml output method, the version attribute for the html output method does not generate a version number in the output document. It is used to control which version of HTML the processor will conform to when generating the output.

The default value of the version attribute for the html output method is 4.0. If the version attribute is absent from the <xsl:output> declaration, the processor should generate output that conforms to HTML version 4.0.

In some cases, perhaps for the backward-compatibility of some systems, it may be necessary to use an older version of HTML for the output. In these cases, the version attribute will be used to signal the correct version to the processor, and the processor should generate the correct output.

10.1.3.4 The indent Attribute with HTML Output

When used with the html output method, the indent attribute works much as it does with the xml output method, except that its default value is yes. The exact whitespace used to show the nesting of elements is left to the discretion of the XSLT processor and can vary from processor to processor. However, the processor must never add or remove whitespace in a way that changes how a browser would render the HTML.

Note that XT does not support indentation with the html output method.

10.1.3.5 The encoding Attribute with HTML Output

When the html output method is used and the output document contains a <head> element, XSLT processors should generate a <meta> element immediately after the <head> start-tag to declare the media type and character set. The encoding attribute supplies the value of the charset parameter in the content attribute of the <meta> element. For example, <xsl:output method="html" encoding="UTF-16" /> would produce the following <meta> element in the HTML output:

<META http-equiv="Content-Type" content="text/html; charset=UTF-16"> 

This output assumes that the media-type attribute has not set a MIME output type other than the default of text/html.

Characters that are not supported by the selected encoding type should be escaped using the correct character entity reference, if defined for HTML, or a decimal numeric character reference. It is an error if the unresolvable character appears in a script or style element, or in a comment.

Note that Saxon generates the <meta> element for default as well as explicitly specified values of the encoding attribute, while XT only generates it if the encoding attribute is specified.
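
The following sketch combines the two behaviors described in this section. The encoding value and the sample text are assumptions chosen for illustration: ISO-8859-1 cannot represent the euro sign, so a processor following the rules above would be expected to write it as a character entity reference or as a numeric character reference such as &#8364;, and to add a <META> element declaring the ISO-8859-1 character set inside the <head>.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="html" encoding="ISO-8859-1"/>
  <xsl:template match="/">
    <html>
      <!-- A head element gives the processor a place to add its META element. -->
      <head><title>Markup City</title></head>
      <body>
        <!-- The character reference below is the euro sign, which ISO-8859-1 cannot represent. -->
        <p>Toll: &#8364;2</p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```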

10.1.3.6 The media-type Attribute with HTML Output

As discussed in the previous section, when using the html output method, the XSLT processor should generate a <meta> element in the output document if the <head> element exists. The media-type attribute, when declared using <xsl:output>, is used to reset the default media type value of text/html to some other value, if need be, in the content attribute of the <meta> element.

Note that XT does not support the media-type attribute for the html output method.

10.1.3.7 The doctype-system and doctype-public Attributes with HTML Output

These attributes operate in the same way for the html output method as they do for the xml output method (see Section 10.1.2.4). The one difference is the document element name used in the document type declaration: for XML output the processor takes it from the first element in the output document, whereas for HTML output it is always either HTML or html. The XSLT processor does not validate the string values of the doctype-system and doctype-public attributes, so it is up to the stylesheet author to supply proper identifiers. For the strict HTML 4.01 DTD, the system identifier is http://www.w3.org/TR/html4/strict.dtd and the public identifier is -//W3C//DTD HTML 4.01//EN (the identifiers will be different if the chosen version is something other than 4.01). As with the xml output method, if both attributes are declared, the public identifier will come first.

These declarations are less commonly used in HTML than in XML, so this pair of attributes is not likely to be seen as frequently, but their use is recommended to support valid HTML documents. Nonetheless, apart from the automatic identification of the html document element, these attributes work almost identically with either html or xml output method.
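
As a brief sketch, the stylesheet below declares both identifiers for the strict HTML 4.01 DTD. The page content is an illustrative assumption. A processor would be expected to begin the output with a document type declaration that names html (or HTML) as the document element and gives the public identifier followed by the system identifier.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- Both identifiers are passed through as given; the processor does not validate them. -->
  <xsl:output method="html"
              doctype-public="-//W3C//DTD HTML 4.01//EN"
              doctype-system="http://www.w3.org/TR/html4/strict.dtd"/>
  <xsl:template match="/">
    <html>
      <head><title>Markup City</title></head>
      <body><p>Whitesburg Drive</p></body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```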

10.1.4 Working with the text Output File Method

The text output method, as succinctly described by the W3C specification for XSLT, "outputs the result tree by outputting the string-value of every text node in the result tree in document order without any escaping" (Section 16.3).

This means that only the text nodes of the result tree are written to the output; any element tags, attributes, comments, or PIs in the result tree are discarded. Entities have already been resolved by this point, and special characters such as < and & are passed on to the output as the characters themselves rather than as &lt; and &amp;.

There are only two relevant attributes for the <xsl:output> element using the text output method: encoding and media-type. The remaining attributes, version, omit-xml-declaration, standalone, doctype-public, doctype-system, cdata-section-elements, and indent, are not valid and are ignored by the XSLT processor when the method attribute is set to text.

10.1.4.1 The media-type Attribute with Text Output

The media-type attribute can be used with the text output method to define a media type other than the default for text, which is text/plain. If the output method is explicitly selected as text, a media-type attribute is not required, unless a value other than text/plain is required. An example of using the media-type attribute with a value other than text/plain is as follows:

<xsl:output method="text" media-type="model/vrml"/> 
10.1.4.2 The encoding Attribute with Text Output

The encoding attribute is used to specify which character set the output result tree will have. When used with the text output method, the processor should generate valid encoded text.

The encoding of text is the process of converting the sequence of characters from the input tree, according to the value given for encoding, to generate the resulting sequence of bytes in the output. The default for this attribute is system-dependent.

XSLT processors are not required to support all values of encoding, but they are required to respect both UTF-8 and UTF-16. It is an error if a character that cannot be represented in the encoding value selected is used.
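
A minimal sketch of the text output method follows, assuming the Markup City input of a <thoroughfare> element containing <sidestreet> and <block> children. It writes each street name on its own line; the choice of UTF-8 and of normalize-space() to tidy the names is an assumption made for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- Only encoding and media-type are relevant to the text output method. -->
  <xsl:output method="text" encoding="UTF-8" media-type="text/plain"/>
  <xsl:template match="thoroughfare">
    <xsl:for-each select="sidestreet | block">
      <!-- Trim and collapse whitespace inside each street name. -->
      <xsl:value-of select="normalize-space(.)"/>
      <!-- Character reference 10 is a line feed; no markup is written, only text. -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```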

10.2 The <xsl:strip-space> and <xsl:preserve-space> Top-Level Elements

These elements form a complementary pair of XSLT top-level elements whose purpose is to govern the handling of whitespace in the input tree and, by implication, in the output result tree. Both elements have one required attribute, the elements attribute, whose value (tokens) is a whitespace-separated list of element-type names, as shown in the element model definitions for each element below:

<!-- Category: top-level-element -->
<xsl:strip-space
  elements = tokens />

<!-- Category: top-level-element -->
<xsl:preserve-space
  elements = tokens />

The <xsl:preserve-space> top-level element retains whitespace text nodes from the input elements listed in its elements attribute. The <xsl:strip-space> element has the opposite effect of <xsl:preserve-space>. Whitespace text nodes are stripped from the input prior to processing.

The value of the elements attribute contains a list of the elements that will be acted on to preserve the whitespace text nodes they contain, if used with <xsl:preserve-space>, or to strip them, if used with <xsl:strip-space>. All elements in the input document are defaulted to preserve space unless they are explicitly added to the <xsl:strip-space> elements attribute.

If more than one <xsl:strip-space> or <xsl:preserve-space> element in a given XSLT stylesheet matches the same element-type name, the conflict is resolved in the same way as conflicts between template rules, using the standard rules of precedence discussed in Chapter 7 for <xsl:import> and <xsl:include>. First, any <xsl:preserve-space> or <xsl:strip-space> match with lower import precedence than another match is ignored. Then, among the remaining matches, the default priority of the name decides: a specific element-type name receives a higher priority than a wildcard such as *.

It should be noted that the <xsl:strip-space> and <xsl:preserve-space> elements do not operate on the text nodes of the specified elements unless those text nodes contain only whitespace. A text node with text other than whitespace will be sent to the output intact. Any excessive whitespace between words in a text node will appear as it did in the input.

For example, if a <block> in our Markup City contained extra whitespace, such as:

<block>1st Street              </block>
<block>2nd       Street</block>
<block>          3rd Street</block>

Specifying "block" as the value for the elements attribute of <xsl:strip-space> would not remove the extra spaces. If, however, there were extra line breaks between <block> elements, specifying "thoroughfare", the parent of the <block> elements, would remove the extra line breaks between the <block>s, as shown in Example 10-1.

Notice that the whitespace between text in a node is not stripped, but the extra line breaks, which are considered to be whitespace nodes, are stripped.

Example 10-1 Removing line breaks from Markup City.
INPUT:
<?xml version="1.0"?>
<thoroughfare>
     <sidestreet>Bob Wallace Avenue</sidestreet>
     <block>1st Street            </block>
     <block>2nd       Street</block>
     <block>          3rd Street</block>
     <sidestreet>Woodridge Street</sidestreet>
</thoroughfare>

STYLESHEET:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
<xsl:strip-space elements="thoroughfare"/>
<xsl:template match="*">
     <xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>

OUTPUT:
<?xml version="1.0" encoding="utf-8"?>
<thoroughfare><sidestreet>Bob Wallace Avenue</sidestreet><block>1st Street            </block><block>2nd       Street</block><block>          3rd Street</block><sidestreet>Woodridge Street</sidestreet></thoroughfare>
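
The conflict resolution rule described earlier in this section can be sketched as follows. This is an illustrative stylesheet, not one of the Markup City examples: it strips whitespace-only text nodes from every element with a wildcard, then exempts two named elements. Because a specific element-type name has a higher default priority than *, the <xsl:preserve-space> declaration would be expected to win for <block> and <sidestreet>.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- Wildcard: strip whitespace-only text nodes from every element. -->
  <xsl:strip-space elements="*"/>
  <!-- Specific names outrank the wildcard, so these two elements keep theirs. -->
  <xsl:preserve-space elements="block sidestreet"/>
  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

With this stylesheet, the line breaks between the children of <thoroughfare> are removed, while a <block> that contained nothing but spaces would keep them.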

10.3 Generating Error Messages and Logs

Error messages and log files are particularly important to programmers who need to debug or track changes in the processes of their code. XSLT provides a method, using <xsl:message>, to generate logs or messages, whether they are stored as files or sent to the screen as message windows, based on processor-specific implementations.

10.3.1 The <xsl:message> Instruction Element

The <xsl:message> instruction element is a way for the XSLT processor to communicate "outside" of itself and outside of the XSLT stylesheet being processed. It has one optional attribute, terminate, with a possible value of yes or no, as shown in the following element model definition. The content of the <xsl:message> element is a template, which means that other instruction elements can be used to build the structure of the final message.

<!-- Category: instruction -->
<xsl:message
  terminate = "yes" | "no">
  <!-- Content: template -->
</xsl:message>

If the terminate attribute is specified with a value of yes, the processor will send the message and terminate the processing of the stylesheet at that point. The default is no, meaning that the processor will continue to operate after the message has been output.

There is some discretion on the part of the XSLT processor as to the implementation of the output context and format for whatever message is to be given when <xsl:message> is triggered. It might appear in a message box on-screen, as a line at the command prompt, or it can be sent to a log file. The <xsl:message> element is triggered when the template rule containing it is instantiated; then the template contained within the <xsl:message> element determines the message displayed.

Using a modified fragment of our Markup City for Example 10-2, we can determine whether <block>s have names in text nodes or attribute nodes. The terminate attribute is set to yes to force the processor to terminate when the <xsl:message> element is activated by finding a <block> without a name attribute.

Example 10-2 Using <xsl:message> to test for content.
INPUT:
<?xml version="1.0"?>
<thoroughfare name="Whitesburg Drive">
      <sidestreet>Bob Wallace Avenue</sidestreet>
      <block name="1st Street"></block>
      <block name="2nd Street"></block>
      <block>3rd Street</block>
      <sidestreet> Woodridge Street</sidestreet>
</thoroughfare>

STYLESHEET:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
<xsl:template match="//block">
      <xsl:choose>
            <xsl:when test="@name">
                  <xsl:value-of select="@name" />
            </xsl:when>
            <xsl:otherwise>
                  <xsl:message terminate="yes">
                        <xsl:text>Unfortunately, </xsl:text>
                        <xsl:value-of select="." />
                        <xsl:text> is a street name that is a text node, not an attribute, so processing will terminate now.</xsl:text>
                  </xsl:message>
            </xsl:otherwise>
      </xsl:choose>
</xsl:template>
</xsl:stylesheet>

MESSAGE GENERATED:
Unfortunately, 3rd Street is a street name that is a text node, not an attribute, so processing will terminate now.

Within the <xsl:choose> structure, the test attribute of <xsl:when> tests for the presence of a name attribute in each <block>. If there is a name attribute, its value is selected with <xsl:value-of> and sent to the output result tree. The <xsl:otherwise> element handles any <block> found without a name attribute (3rd Street in this example), and the processor activates the <xsl:message>. The value of the text node is inserted into the text message using <xsl:value-of>. Because the terminate attribute on <xsl:message> is set to yes, the process terminates. Depending on the implementation of the processor, the valid content in the output result tree before the <xsl:message> element is activated may or may not be sent to the output.
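
For logging rather than halting, the same test can be written with terminate set to no. The following is a sketch under the same Markup City input as Example 10-2; the wording of the warning and the decision to fall back to the text content of the <block> are assumptions made for illustration. Where the message appears (console, log file, or elsewhere) remains up to the processor.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="thoroughfare">
    <xsl:apply-templates select="block"/>
  </xsl:template>
  <xsl:template match="block">
    <xsl:choose>
      <xsl:when test="@name">
        <xsl:value-of select="@name"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- terminate="no" is the default; it is stated here for clarity. -->
        <xsl:message terminate="no">
          <xsl:text>Warning: block "</xsl:text>
          <xsl:value-of select="normalize-space(.)"/>
          <xsl:text>" has no name attribute; using its text instead.</xsl:text>
        </xsl:message>
        <!-- Processing continues, so the text content is written as a fallback. -->
        <xsl:value-of select="normalize-space(.)"/>
      </xsl:otherwise>
    </xsl:choose>
    <!-- One street name per line in the text output. -->
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```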

[1] See http://www.w3.org/TR/xsl/ for more information on XSL Formatting Objects.

[2] A CDATA section is a special kind of marked section that can include markup characters like & and <, without having to literally escape them using &amp; and &lt;.

[3] MIME (Multipurpose Internet Mail Extensions) media types, such as text/html, are labels that provide information to the processor about the kind of data being processed; they are carried by transport protocols such as HTTP.

[4] See http://www.w3.org/TR/xslt#IANA.

[5] See http://www.w3.org/TR/xslt#RFC2278.

XSLT and XPATH: A Guide to XML Transformations
ISBN: 0130404462