xsl:output


The <xsl:output> element is a top-level declaration used to control the format of the serialized result document. An XSLT stylesheet is processed conceptually in two stages: The first stage is to build a result tree, and the second is to write out the result tree to a serial output file. The <xsl:output> element controls this second stage, which is often referred to as serialization.

This second stage of processing, to serialize the tree as an output document, is not a mandatory requirement for an XSLT processor; the standard allows the processor to make the tree available in some other way, for example via the DOM API. A processor that does not write the tree to an output file is allowed to ignore this element. Processors that do provide serialization may also allow the definitions in this element to be overridden by parameters set in the API when the processor is invoked. See Appendix D for details of the JAXP API.

Changes in 2.0

An <xsl:output> declaration may be given a name, allowing a named output format to be defined that can be referenced in an <xsl:result-document> instruction.

An output method has been defined for XHTML.

Several new output options have been added: escape-uri-attributes, include-content-type, normalization-form, undeclare-namespaces, and use-character-maps.

Format

 <xsl:output
   name? = qname
   method? = "xml" | "html" | "xhtml" | "text" | qname-but-not-ncname
   cdata-section-elements? = qnames
   doctype-public? = string
   doctype-system? = string
   encoding? = string
   escape-uri-attributes? = "yes" | "no"
   include-content-type? = "yes" | "no"
   indent? = "yes" | "no"
   media-type? = string
   normalization-form? = "NFC" | "NFD" | "none" | nmtoken
   omit-xml-declaration? = "yes" | "no"
   standalone? = "yes" | "no"
   undeclare-namespaces? = "yes" | "no"
   use-character-maps? = qnames
   version? = nmtoken />

Position

<xsl:output> is a declaration, which means it must be a child of the <xsl:stylesheet> element. It may appear any number of times in a stylesheet.
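As a minimal sketch of this placement, the following stylesheet shows <xsl:output> as a direct child of <xsl:stylesheet>. The template and the attribute values chosen are illustrative assumptions, not taken from the text.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Top-level declaration: a direct child of xsl:stylesheet -->
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Illustrative template: copies the source document unchanged -->
  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```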

Attributes

Name | Value | Meaning
--- | --- | ---
name (optional) | Lexical QName | Defines a name for this output format, so that it can be referenced in an <xsl:result-document> instruction
method (optional) | «xml», «html», «xhtml», «text», or a QName | Defines the required output format
cdata-section-elements (optional) | Whitespace-separated list of lexical QNames | Names those elements whose text content is to be output in the form of CDATA sections
doctype-public (optional) | string | Indicates the public identifier to be used in the DOCTYPE declaration in the output file
doctype-system (optional) | string | Indicates the system identifier to be used in the DOCTYPE declaration in the output file
encoding (optional) | string | Defines the character encoding
escape-uri-attributes (optional) | «yes» or «no» | Indicates whether URI-valued attributes in HTML and XHTML should be %HH escaped
include-content-type (optional) | «yes» or «no» | Indicates whether a <meta> element should be added to the output to indicate the content type and encoding
indent (optional) | «yes» or «no» | Indicates whether the output should be indented to indicate its hierarchic structure
media-type (optional) | string | Indicates the media-type (often called MIME type) to be associated with the output file
normalization-form (optional) | see below | Indicates whether and how the Unicode characters in the serialized document should be normalized
omit-xml-declaration (optional) | «yes» or «no» | Indicates whether an XML declaration is to be included in the output
standalone (optional) | «yes» or «no» | Indicates that a standalone declaration is to be included in the output, and gives its value
undeclare-namespaces (optional) | «yes» or «no» | Indicates whether (with XML 1.1 output) namespaces should be undeclared using «xmlns:p=""» when they go out of scope
use-character-maps (optional) | Whitespace-separated list of lexical QNames | A list of the names of <xsl:character-map> elements that are to be used for character mapping
version (optional) | NMtoken | Defines the version of the output format

Content

None, the element is always empty.

Effect

A stylesheet can contain several output format definitions. This is useful if the stylesheet produces multiple result documents, or if it produces different kinds of output on different occasions. One of the output format definitions can be unnamed, and the others are named using a QName in the same way as other stylesheet objects.
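The following sketch shows one unnamed and one named output definition in the same stylesheet. The name «report-html», the file name «report.html», and the content generated are hypothetical; the named definition is selected through the format attribute of <xsl:result-document>.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Unnamed definition: applies to the principal result document -->
  <xsl:output method="xml" indent="yes"/>

  <!-- Named definition: the name "report-html" is a hypothetical choice -->
  <xsl:output name="report-html" method="html" encoding="iso-8859-1"/>

  <xsl:template match="/">
    <!-- Principal result, serialized using the unnamed definition -->
    <summary>
      <xsl:value-of select="count(//*)"/>
    </summary>

    <!-- Secondary result, serialized using the named definition -->
    <xsl:result-document href="report.html" format="report-html">
      <html>
        <head><title>Report</title></head>
        <body><p>Generated report</p></body>
      </html>
    </xsl:result-document>
  </xsl:template>

</xsl:stylesheet>
```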

An output definition can be split over several <xsl:output> elements; all the <xsl:output> elements with the same name (specified in the name attribute) constitute one output definition. In this case the attributes defined in these multiple elements are in effect combined into a single conceptual <xsl:output> element as follows:

  • For the cdata-section-elements attribute, the lists of QNames supplied on the separate <xsl:output> elements are merged: if an element name is present in any of the lists, it will be treated as a CDATA section element.

  • For the use-character-maps attribute, the lists of QNames on the separate <xsl:output> elements are concatenated. They are taken in order of the import precedence of the <xsl:output> declarations on which they appear, or where two declarations have the same import precedence, in declaration order.

  • For all other attributes, an <xsl:output> element that specifies a value for the attribute takes precedence over one that leaves it defaulted. If several <xsl:output> elements specify a value for the attribute, the one with highest import precedence is used. If this leaves more than one value (and even if they are identical), the XSLT processor may either report an error, or use the one that occurs last in the stylesheet.

The concepts of import precedence and declaration order are explained under <xsl:import> on page 314.
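The combination rules above can be illustrated with a small sketch in which the unnamed output definition is split across two declarations. The element names «script» and «style» in the CDATA lists are arbitrary examples.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Both declarations are unnamed, so they form one output definition -->
  <xsl:output method="xml" cdata-section-elements="script"/>
  <xsl:output indent="yes" cdata-section-elements="style"/>

  <!-- Combined effect under the rules described in the text:
         method                 = xml           (specified once)
         indent                 = yes           (specified once)
         cdata-section-elements = script style  (lists merged)
       If both declarations had specified, say, an encoding, and had the
       same import precedence, the processor could report an error or use
       the value from the declaration that occurs last. -->

  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```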

The method attribute controls the format of the output, and this in turn affects the detailed meaning and the default values of the other attributes.

Four output methods are defined in the specification: «xml», «html», «xhtml», and «text». Alternatively, the output method may be given as a QName, which must include a non-null prefix that identifies a namespace that is currently in scope. This option is provided for vendor or user extensions, and the meaning is not defined in the standard. A vendor-defined output method can attach its own interpretations to the meanings of the other attributes on the <xsl:output> element, and it can also define additional attributes on the <xsl:output> element, provided they are not in the default namespace.

If the method attribute is omitted, the output will be in XML format, unless the result tree is recognizably HTML or XHTML. The result tree is recognized as HTML if:

  • The root node has at least one element child,

  • The first element child of the root node is named <html>, in any combination of upper and lower case, and has a null namespace URI, and

  • There are no text nodes before the <html> element, other than, optionally, a text node containing whitespace only.

The result tree is recognized as XHTML if:

  • The root node has at least one element child,

  • The first element child of the root node is named <html>, in lower case, and has the namespace URI http://www.w3.org/1999/xhtml, and

  • There are no text nodes before the <html> element, other than, optionally, a text node containing whitespace only.
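As an illustration of these defaulting rules, the sketch below contains no <xsl:output> declaration at all. Because the first element child of the result tree's root is <html> in no namespace, a processor following the rules above would be expected to choose the «html» method. If the literal <html> element instead carried «xmlns="http://www.w3.org/1999/xhtml"», the tree would match the XHTML conditions. The page content is invented for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- No xsl:output declaration: the method is inferred from the result tree -->

  <xsl:template match="/">
    <!-- html element in no namespace, so the tree is recognized as HTML -->
    <html>
      <head><title>Defaulted method</title></head>
      <body><p>Hello</p></body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```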

Rules for XML Output

When the output method is «xml», the output file will usually be a well-formed XML document, but the actual requirement is that it should be either a well-formed XML external general parsed entity, or a well-formed XML document entity, or both.

An external general parsed entity is something that could be incorporated into an XML document by using an entity reference such as «&doc;». The following example shows a well-formed external general parsed entity that is not a well-formed document:

  A <b>bold</b> and <emph>emphatic</emph> statement

An example of a well-formed document that is not a well-formed external general parsed entity (because it contains a standalone attribute) is:

  <?xml version="1.0" encoding="utf-8" standalone="yes"?>
  <p>A <b>bold</b> and <emph>emphatic</emph> statement</p>

The rules for document entities and external general parsed entities overlap, as shown in Figure 5-9.

Figure 5-9

Essentially, an XSLT stylesheet can output anything that fits in the darker shaded area, which means anything that is a well-formed XML document entity, a well-formed external general parsed entity, or both.

Well, almost anything:

  • It must also conform to the XML Namespaces Recommendation.

  • There is no explicit provision for generating an internal DTD subset, although it can be achieved, with difficulty, by generating text and disabling output escaping, or by using character maps.

  • Similarly, there is no explicit provision for generating entity references, though this can also be achieved by means of character maps.

In the XML standard, the rules for an external general parsed entity are given as:

  extParsedEnt ::= TextDecl? content

while the rule for the document entity is effectively:

  document ::= XMLDecl? Misc* doctypedecl? Misc* element Misc*

where Misc permits whitespace, comments, and processing instructions.

So the principal differences between the two cases are:

  • A TextDecl (text declaration) is not quite the same thing as an XMLDecl (XML declaration), as discussed below.

  • A document may contain a doctypedecl (document type declaration), but an external general parsed entity must not. A document type declaration is the <!DOCTYPE ... > header identifying the DTD and possibly including an internal DTD subset.

  • The body of a document is an element, while the body of an external parsed entity is content. Here content is effectively the contents of an element but without the start and end tags.

The TextDecl (text declaration) looks at first sight very much like an XML declaration; for example <?xml version="1.0" encoding="utf-8"?> could be used either as an XML declaration or as a text declaration. There are differences, however:

  • In an XML declaration, the version attribute is mandatory, but in a text declaration it is optional.

  • In an XML declaration, the encoding attribute is optional, but in a text declaration it is mandatory.

  • An XML declaration may include a standalone attribute, but a text declaration may not.

The content part is a sequence of components including child elements, character data, entity references, CDATA sections, processing instructions, and comments, each of which may appear any number of times and in any order.

So the following are all examples of well-formed external general parsed entities:

  <quote>Hello!</quote>
  <quote>Hello!</quote><quote>Goodbye!</quote>
  Hello!
  <?xml version="1.0" encoding="utf-8"?>Hello!

The following is a well-formed XML document, but it is not a well-formed external general parsed entity, because of both the standalone attribute and the document type declaration. This is also legitimate output:

  <?xml version="1.0" encoding="utf-8" standalone="no"?>
  <!DOCTYPE quote SYSTEM "hello.dtd">
  <quote>Hello!</quote>

The following is neither a well-formed XML document nor a well-formed external general parsed entity.

  <?xml version="1.0" encoding="utf-8" standalone="no"?>
  <!DOCTYPE quote SYSTEM "hello.dtd">
  <quote>Hello!</quote>
  <quote>Goodbye!</quote>

It cannot be an XML document because it has more than one top-level element, and it cannot be an external general parsed entity because it has a <!DOCTYPE> declaration. A stylesheet attempting to produce this output is in error.
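For contrast, here is a minimal sketch of a stylesheet whose output is a legitimate external general parsed entity. It writes two top-level elements but requests neither a document type declaration nor a standalone declaration. The content mirrors the <quote> examples above; adding «doctype-system="hello.dtd"» to this declaration would produce the erroneous combination just described.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- No doctype-system and no standalone attribute, so the result can be
       a well-formed external general parsed entity -->
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <!-- Two top-level elements: acceptable in an entity, not in a document -->
    <quote>Hello!</quote>
    <quote>Goodbye!</quote>
  </xsl:template>

</xsl:stylesheet>
```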

The XSLT specification also places two other constraints on the form of the output, although these are rules for the implementor to follow rather than rules that directly affect the stylesheet author. These rules are:

  • The output must conform to the rules of the XML Namespaces Recommendation. If the output is XML 1.0, then it must conform to XML Namespaces 1.0, and if it is XML 1.1, then it must conform to XML Namespaces 1.1.

    If the output is an XML document, the meaning of this is clear enough, but if it is merely an external entity, some further explanation is needed. The standard provides this by saying that when the entity is pulled into a document by adding an element tag around its content, the resulting document must conform with the XML Namespaces rules.

  • The output file must faithfully reflect the result tree. This requirement is easy to state informally, but the specification includes a more formal statement of the requirement, which is surprisingly complex.

    The rule is expressed by describing what may change when the data model is serialized to XML, and then parsed again to create a new data model. Things that may change include the order of attributes and namespace nodes and the base URI. If the parsing stage does DTD or schema validation, then this may cause new attribute or element values to appear, as specified by the DTD or schema. Perhaps the most significant change is that type annotations on element and attribute nodes will not be preserved. Any type annotations in the data model after such a round trip will be based on revalidation of the textual XML document; the type annotations in the original result tree are lost during serialization.

    Serialization never adds attributes such as xml:base or xsi:type. If you want these present in the output, you must put them in the result tree, just like any other attribute.

Between XSLT 1.0 and XSLT 2.0, there is a change in the way the rules concerning namespace declarations are described. In XSLT 1.0, it was the job of the serializer to generate namespace declarations, not only for namespace nodes explicitly present on the result tree, but also for any namespaces used in the result document, but not represented by namespace nodes. This situation can happen because there is nothing in the rules for <xsl:element> and <xsl:attribute>, for example, that requires namespace nodes to be created on the tree for the namespace URI used in the names of these nodes. At XSLT 2.0, however, the specification describes a process called namespace fixup, which ensures that an element in a result tree always has a namespace node for every namespace that is used either in the element name or in the name of any of its attributes. This means that it is no longer the responsibility of the serializer to create these namespace declarations. The reason for this change is that with XSLT 2.0, the content of temporary trees (including their namespace nodes) becomes visible to the stylesheet, and the namespace nodes had to be present in the tree to make it usable for further processing. Namespace fixup is described under <xsl:element> on page 260.
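The situation described here can be sketched as follows. The element and attribute are created with namespace URIs but without any explicit namespace nodes; under XSLT 2.0, namespace fixup is expected to add the namespace nodes to the result tree, so the serializer only has to write out what is already there. The prefixes and URIs are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml"/>

  <xsl:template match="/">
    <!-- No namespace nodes are created explicitly by these instructions -->
    <xsl:element name="inv:invoice"
                 namespace="http://example.com/ns/invoice">
      <xsl:attribute name="cur:code"
                     namespace="http://example.com/ns/currency">EUR</xsl:attribute>
    </xsl:element>
    <!-- After namespace fixup the element carries namespace nodes for both
         URIs, so the serialized start tag includes the corresponding
         namespace declarations. The exact prefixes used in the output may
         be chosen by the processor. -->
  </xsl:template>

</xsl:stylesheet>
```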

Although the output is required to be well-formed XML, it is not the job of the serializer to ensure that the XML is valid against either a DTD or a schema. Just because you generate a document type declaration that refers to a specific DTD, or a reference to a schema, don't expect the XSLT processor to check that the output document actually conforms to that DTD. Instead, XSLT 2.0 provides facilities to validate the result tree against a schema before it is serialized, by using the validation or type attribute on <xsl:result-document>.
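A sketch of validating the result tree before serialization is shown below. It assumes a schema-aware processor, and the namespace URI, schema location, output name, and file name are all hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:q="http://example.com/ns/quote">

  <!-- Hypothetical schema describing the q:quote vocabulary -->
  <xsl:import-schema namespace="http://example.com/ns/quote"
                     schema-location="quote.xsd"/>

  <xsl:output name="validated" method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- The result tree is validated against the imported schema before
         it is handed to the serializer -->
    <xsl:result-document href="quotes.xml" format="validated"
                         validation="strict">
      <q:quote>Hello!</q:quote>
    </xsl:result-document>
  </xsl:template>

</xsl:stylesheet>
```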

With the «xml» output method, the other attributes of <xsl:output> are interpreted as follows. Attributes that are not applicable to this output method are not included in the table, and they are ignored if you specify them.

Attribute

Interpretation

cdata-section-elements

This is a list of element names, each expressed as a lexical QName, separated by whitespace. Any prefix in a QName is treated as a reference to the corresponding namespace URI in the normal way, using the namespace declarations in effect on the actual <xsl:output> element where the cdata-section-elements attribute appears; because these are element names, the default namespace is assumed where the name has no prefix

 

When a text node is output, if the parent element of the text node is identified by a name in this list, then the text node is output as a CDATA section. For example, the text value «James» is output as «<![CDATA[James]]>», and the text value «AT&T» is output as «<![CDATA[AT&T]]>». Otherwise, this value would probably be output as «AT&amp;T». The XSLT processor is free to choose other equivalent representations if it wishes, for example a character reference, but the standard says that it should not use CDATA unless it is explicitly requested.

The CDATA section may be split into parts if necessary, perhaps because the terminator sequence «]]> » appears in the data, or because there is a character that can only be output using a character reference because it is not supported directly in the chosen encoding

doctype-system

If this attribute is specified, the output file should include a document type declaration (that is, <!DOCTYPE> ) after the XML declaration and before the first element start tag. The name of the document type will be the same as the name of the first element. The value of this attribute will be used as the system identifier in the document type declaration

This attribute should not be used unless the output is a well-formed XML document

doctype-public

This attribute is ignored unless the doctype-system attribute is also specified. It defines the value of the public identifier to go in the document type declaration. If no public identifier is specified, none is included in the document type declaration.

encoding

This specifies the preferred character encoding for the output document. All XSLT processors are required to support the values «UTF-8» and «UTF-16» (which are also the only values that XML parsers are required to support). This encoding name will be used in the encoding attribute of the XML or Text declaration at the start of the output file, and all characters in the file will be encoded using these conventions. The standard encoding names are not case-sensitive.

If the encoding is one that does not allow all XML characters to be represented directly, for example «iso-8859-1», then characters outside this subset will be represented where possible using XML character references (such as «&#x20A4;»). It is an error if such characters appear in contexts where character references are not recognized (for example within a processing instruction or comment, or in an element or attribute name).

If the result tree is serialized to a destination that expects a stream of Unicode characters rather than a stream of bytes, then the encoding attribute is ignored. This happens, for example, if you send the output to a Java Writer. It often causes confusion when you use the transformNode() method in Microsoft's MSXML API, which returns the serialized result of the transformation as a string: this is a value of type BSTR, which is encoded in UTF-16 regardless of the encoding you requested in the stylesheet.

If you ask for an encoding other than UTF-8 or UTF-16, then the XML rules require that an XML declaration (or text declaration) is written. In this case, requesting «omit-xml-declaration="yes"» has no effect.

indent

If this attribute has the value «yes», the idea is that the XML output should be indented to show its hierarchic structure. The XSLT processor is not obliged to respect this request, and if it does so, the precise form of the output is not defined.

There are some constraints on how indentation should be achieved. In effect, it can only be done by adding whitespace-only text nodes to the tree, and these cannot be added adjacent to an existing non-whitespace text node. XSLT 2.0 has introduced a rule that the added text node must be adjacent to an element node, that is, immediately before a start tag or after an end tag (in 1.0, it could be added as a child of an empty element).

Note that even with these restrictions, adding whitespace nodes to the output may affect the way the recipient interprets it. This is particularly true with mixed content models where an element can have both elements and text nodes as its children.

The processor is not allowed to add whitespace text nodes to the content of an element that has the attribute «xml:space="preserve"».

media-type

This attribute defines the media type of the output file (often referred to as its MIME type). The default value is «text/xml». The specification doesn't say what use is made of this information: it doesn't affect the contents of the output file, but it may affect the way it is named, stored, or transmitted, depending on the environment. For example, the information might find its way into an HTTP protocol header.

normalization-form

One of the controversial features of Unicode has always been that it allows the same character to be represented in more than one way. For example, the letter «ç» (lower case «c» with cedilla) can be represented either as a single character (with codepoint x00E7), or as the two codepoints «c» (x0063) and the combining cedilla (x0327). This fact causes considerable problems for software that performs comparison, search, and indexing operations on Unicode text. There have been long debates about whether XML should require such characters to be normalized (that is, to require one of these representations and disallow the other). The result of the debate is a compromise: XML 1.1 strongly encourages the use of normalized encodings. It encourages applications to output normalized text, and encourages parsers to provide an option that checks for normalized text, but it does not go so far as to say that non-normalized documents are not well-formed. XSLT 2.0 responds to this by providing an option to serialize the document in Unicode-normalized form. Specify "NFC" for composed normal form, "NFD" for decomposed normal form, any other supported normalization form, or "none" (the default) for no normalization.

An alternative approach is to make sure that individual text nodes, attribute nodes, and so on are already normalized in the result tree. You can do this by calling the normalize-unicode() function, described in XPath 2.0 Programmer's Reference, whenever you construct a character string that might not already be normalized.

omit-xml-declaration

If this attribute has the value «yes», the XSLT processor should not output an XML declaration (or, by implication, a text declaration; recall that XML declarations are used at the start of the document entity, and text declarations are used at the start of an external general parsed entity).

If the attribute is omitted, or has the value «no», then a declaration should be output. The declaration should include both the version and encoding attributes (to ensure that it is valid both as an XML declaration and as a text declaration). It should include a standalone attribute only if a standalone attribute is specified in the <xsl:output> element.

If you select an encoding such as «iso-8859-1», the output may not be intelligible to an XML parser if the XML declaration is omitted. Nevertheless, the specification allows you to omit it, because there are sometimes alternative ways for an XML parser to determine the encoding.

standalone

If this attribute is set to «yes», then the XML declaration will specify «standalone="yes"».

If this attribute is set to «no», then the XML declaration will specify «standalone="no"».

If the attribute is omitted, then the XML declaration will not include a standalone attribute. This will make it a valid text declaration, enabling its use in an external general parsed entity.

This attribute should not be used unless the output is a well-formed XML document

undeclare-namespaces

This attribute only comes into effect when you specify «version="1.1"». XML Namespaces 1.1 introduces the ability to "undeclare" a namespace. It's always been possible to undeclare the default namespace, using the syntax «xmlns=""», but now you can also undeclare a namespace with a specific prefix, using the syntax «xmlns:pfx=""».

The result tree created by an XSLT 2.0 processor may have a namespace that is in scope for a particular element, but is not in scope for its children. In fact, this is likely to be a common scenario. The correct way to serialize such a tree is to generate namespace undeclarations on the child elements. However, XSLT 2.0 does not do this by default, because these undeclarations may cause a lot of unwanted clutter in the output document. Instead, you have to request them explicitly by setting this attribute to «yes».

use-character-maps

This attribute is used to specify that the serializer should use the named character maps to translate specific characters into the strings given in the character map. For further details, see <xsl:character-map> on page 229

version

The version of XML used in the output document. This can be «1.0» or «1.1». The XSLT specification doesn't require either of these versions to be supported: the thinking is that early in the life of XSLT 2.0, many implementations will only support XML 1.0, but in five years' time, there may be vendors who would only want to support XML 1.1. As already mentioned, support for a particular version of XML also implies support for the corresponding version of XML Namespaces.

The XPath data model, and therefore the result tree, is not tied to a particular version of XML. It supports the union of what can be represented in XML 1.0 and XML 1.1. This creates the possibility that the result tree uses features that cannot be represented faithfully (or at all) in XML 1.0. If such features are used, the serializer may need to fall back to a 1.0 representation, or if all else fails, report an error

Do remember that the specifications in the <xsl:output> element will only be effective if you actually use the XSLT processor to serialize the XML. If you write the output of the transformation to a DOM, and then use a serializer that comes with your DOM implementation (for example by using the save method or the xml property in the case of the Microsoft DOM implementation), then the <xsl:output> specifications will have no effect.
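Several of the attributes described for the «xml» method can be seen working together in the following sketch. The element names, the DTD file name «company.dtd», and the sample text are invented for illustration, and the exact form of the serialized output (for example, whether a character reference is written in decimal or hexadecimal) is left to the processor.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml"
              encoding="iso-8859-1"
              doctype-system="company.dtd"
              cdata-section-elements="name"
              indent="no"/>

  <xsl:template match="/">
    <company>
      <!-- name is listed in cdata-section-elements, so its text should be
           written as a CDATA section rather than with an escaped ampersand -->
      <name>AT&amp;T</name>
      <!-- U+00E9 exists in iso-8859-1 and can be written directly;
           U+20A4 does not, so it has to appear as a character reference -->
      <motto>Caf&#xE9; &#x20A4;5</motto>
    </company>
  </xsl:template>

</xsl:stylesheet>
```

Because doctype-system is specified, the output should carry a document type declaration naming «company», the first element, with «company.dtd» as the system identifier.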

Rules for HTML Output

When the method attribute is set to «html», or when it is defaulted and the result tree is recognized as representing HTML, the output will be an HTML file. By default, it will follow the rules of HTML 4.0.

Requesting HTML serialization gives no guarantee that the result will be valid HTML. You can use any elements and attributes you like in the result tree, and the serializer will output them, following the HTML conventions where appropriate, but without enforcing any rules as to which elements can be used where.

HTML is output in the same way as XML, except where specific differences are noted. These differences are:

  • Certain elements are recognized as empty elements. They are recognized in any combination of upper and lower case. These elements are output with a start tag and no end tag. For HTML 4.0 these elements are:

     <area> <base> <basefont> <br> <col> <frame> <hr> <img> <input> <isindex> <link> <meta> <param>

  • The <script> and <style> elements (again in any combination of upper and lower case) do not require escaping of special characters. In the text content of these elements, a «<» character will be output as «<», not as «&lt;».

  • HTML attributes whose value is a URI (for example, the href attribute of the <a> element, or the src attribute of the <img> element) are recognized, and special characters within the URI are escaped as defined in the HTML specification. Specifically, non-ASCII characters in the URI should be represented by converting each byte of the UTF-8 representation of the character to «%HH» where HH represents the byte value in hexadecimal. This feature may be suppressed by setting «escape-uri-attributes="no"».

  • Special characters may be output using entity references such as «&eacute;» where these are defined in the relevant version of HTML. This is at the discretion of the XSLT processor; it doesn't have to use these entity names.

  • Processing instructions are terminated with «>» rather than «?>». Processing instructions are not often used in HTML, but the HTML 4.0 standard recommends that any vendor extensions should be implemented this way, rather than by adding element tags to the language. So it is possible they will be seen more frequently in the future.

  • Attributes that are conventionally written with a keyword only, and no value, will be recognized and output in this form. Common examples are <TEXTAREA READONLY> and <OPTION SELECTED>. This is shorthand, permitted in SGML but not in XML, for an attribute that has only one permitted value, which is the same as the attribute name. In XML, these tags must be written as <TEXTAREA READONLY="READONLY"> and <OPTION SELECTED="SELECTED">. The HTML output method will normally use the abbreviated form, as this is the only form that older HTML browsers will recognize.

  • The special use of the ampersand character in dynamic HTML attributes is recognized. For example, the tag <TD WIDTH="&{width};"> is correct HTML, though it would not be correct in XML, because of the ampersand character. To produce this output from a literal result element, the tag in the stylesheet would need to be written as <TD WIDTH="&amp;{{width}};">: note the double curly braces, to prevent them being interpreted with their special meaning in attribute value templates.
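Several of these differences can be observed with a small sketch such as the following. The page content is invented, and the comments describe what the HTML rules above lead one to expect rather than the guaranteed output of any particular processor.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <head>
        <title>HTML serialization</title>
        <!-- In the stylesheet the less-than sign must be escaped; in the
             HTML output it is expected to appear unescaped inside script -->
        <script type="text/javascript">if (a &lt; b) { swap(a, b); }</script>
      </head>
      <body>
        <!-- br and hr are empty elements: start tag only, no end tag -->
        <p>Line one<br/>Line two</p>
        <hr/>
        <!-- selected="selected" will normally be abbreviated to the
             keyword-only form -->
        <select>
          <option selected="selected">Spain</option>
        </select>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```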

A common source of anxiety with HTML output is the use of ampersands in URLs. For example, suppose you want to generate the output:

  <a href="http://www.acme.com/search.asp?product=widgets&country=spain">
  Spanish Widgets
  </a>

It's not actually possible to produce this using standard XSLT; the ampersand will always come out as «&amp;». The reason for this is simple: «&», although commonly used and widely accepted, is not actually correct HTML, and according to the standard it must be escaped as «&amp;». All respectable browsers accept the correct escaped form, so the answer is: don't worry about it.
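In the stylesheet itself the ampersand has to be written as «&amp;» in any case, because the stylesheet is an XML document. A minimal sketch, reusing the URL from the example above:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <!-- The ampersand is written as an entity reference here, and it is
         serialized in the same escaped form in the href attribute -->
    <a href="http://www.acme.com/search.asp?product=widgets&amp;country=spain">
      Spanish Widgets
    </a>
  </xsl:template>

</xsl:stylesheet>
```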

The attributes on the <xsl:output> element are interpreted as follows when HTML output is selected. Attributes not listed are not applicable to HTML, and are ignored.

Attribute

Interpretation

doctype-system

If this attribute is specified, the output file will include a document type declaration immediately before the first element start tag. The name of the document type will be «HTML» or «html». The value of the attribute will be used as the system identifier in the document type declaration.

doctype-public

If this attribute is specified, the output file will include a document type declaration immediately before the first element start tag. The name of the document type will be «HTML» or «html». The value of the attribute will be used as the public identifier in the document type declaration.

encoding

This specifies the preferred character encoding for the output document

If the encoding is one that does not allow all XML characters to be represented directly, for example «iso-8859-1», then characters outside this subset will be represented where possible using either entity references or numeric character references. The processor is encouraged not to use such references for characters that are within the encoding, except in special cases such as the nonbreaking space character, which may be output either as itself (it looks just like an ordinary space) or as «&#160;», «&#xa0;», or «&nbsp;». It is an error if characters that can't be represented directly appear in contexts where character references are not recognized (for example within a script element, within a comment, or in an element or attribute name).

escape-uri-attributes

This attribute determines whether non-ASCII characters appearing in URI-valued attributes should be escaped using the %HH convention. The default is «yes». Although HTML requires URIs to be escaped in this way, there are several reasons why you might choose to suppress this. Firstly, the URIs might already be in escaped form: you can do the escaping from within the stylesheet, with much greater control, using the escape-uri() function described in XPath 2.0 Programmer's Reference. Secondly, browsers do not always handle escaped URIs correctly. This is especially true when the URI is handled on the client side, for example when it invokes JavaScript functions, or when it contains a fragment identifier.

include-content-type

If this attribute is set to «yes» (or if it is omitted), the serializer will add a <meta> element as a child of the HTML <head> element, provided that the result tree contains a <head> element. This <meta> element contains details of the media type and encoding of the document. You may want to suppress this by specifying the value «no», for example if the stylesheet is copying a document that already includes such an element.

indent

If this attribute has the value «yes», the idea is that the HTML output should be indented to show its hierarchic structure. The XSLT processor is not obliged to respect this request, and if it does so, the precise form of the output is not defined.

When producing indented output, the processor has much more freedom to add or remove whitespace than in the XML case, because of the way whitespace is handled in HTML. The processor can add or remove whitespace anywhere it likes so long as it doesn't change the way a browser would display the HTML.

media-type

This attribute defines the media type of the output file (often referred to as its MIME type). The default value is «text/html». The specification doesn't say what use is made of this information; it doesn't affect the contents of the output file, but it may affect the way it is named, stored, or transmitted, depending on the environment. For example, the information might find its way into an HTTP protocol header.

normalization-form

This attribute is used in the same way as for the XML output method, described on page 378

use-character-maps

This attribute is used in the same way as for the XML output method, described on page 378

version

The version of HTML used in the output document. It is up to the implementation to decide which versions of HTML should be supported, though all implementations can be expected to support the default version, namely version 4.0
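
As a minimal sketch of how the attributes in this table combine, the following declaration requests HTML output with a document type declaration. The public and system identifiers are those of the HTML 4.01 Strict DTD, chosen purely for illustration; the other values are assumptions about what a stylesheet author might want, not recommendations.

```xml
<!-- HTML serialization with an explicit DOCTYPE.
     escape-uri-attributes="no" assumes the stylesheet already escapes its URIs;
     indent="yes" is only a request, which the processor may ignore. -->
<xsl:output
    method="html"
    doctype-public="-//W3C//DTD HTML 4.01//EN"
    doctype-system="http://www.w3.org/TR/html4/strict.dtd"
    encoding="iso-8859-1"
    escape-uri-attributes="no"
    include-content-type="yes"
    indent="yes"
    media-type="text/html" />
```

With include-content-type set to «yes », a <meta> element describing the media type and encoding is added only if the result tree actually contains a <head> element.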

Rules for XHTML Output

An XHTML document is an XML document, so when you specify «method="xhtml" », most of the rules for the XML output method are inherited without change. However, there are special guidelines for serializing XHTML so that it is rendered correctly in browsers that were designed originally to handle HTML; and in addition some of the features of HTML serialization, such as URI escaping and addition of <meta> elements, are also applicable to XHTML. So the XHTML output method is essentially a blend of features from the XML and HTML methods.

In fact, the XHTML output method works in the same way as the XML output method (and uses all the <xsl:output> attributes that control the XML method) with specific exceptions. These exceptions are:

  • The way that empty elements are output depends on the way the element is declared in the XHTML DTD. For an element whose content model is empty, such as <hr> or <br> or <img>, the serializer should use an XML empty-element tag, taking care to include a space before the final «/> », so that the tag looks like <hr /> or <img src="a.jpeg" /> . For an element that is empty but allowed to have content, such as a <p> element, the serializer should use a start tag followed by an end tag, thus: <p></p>

  • The entity reference «&apos; » is not recognized by all browsers, so it should be avoided. It is always possible to use «&#39; » instead.

  • The serializer needs to take care with whitespace (for example newlines) appearing in attribute values. The specification doesn't say exactly how this should be handled, but it's probably safest, if there is any whitespace other than a single space character, to represent it using numeric character references.

The specification points out that the XHTML DTD requires not only that the <html> element is in the namespace « http://www.w3.org/1999/xhtml », but also that this must be the default namespace. This is actually a bit of a fudge. Namespace prefixes are allocated by the namespace fixup process, not by the serializer. If the namespace fixup process has allocated a nondefault prefix to the <html> element, there is very little that the serializer can do about it; while the namespace fixup process, of course, isn't supposed to know anything about how the tree will be serialized. In practice, though, this is unlikely to cause problems; XSLT processors don't go out of their way to invent prefixes for namespaces when they don't need to. This is all a messy consequence of the fact that DTDs don't really deal properly with namespaces.

The XHTML output method also inherits two specific features of the HTML output method:

  • Non-ASCII characters in URI-valued attributes are escaped using the %HH convention, unless you suppress this by specifying «escape-uri-attributes="no" » .

  • A <meta> element is added as the child of the <head> element, unless you suppress this using «include-content-type="no" » .

You can control whether an XML declaration is output using the «omit-xml-declaration » attribute. The XHTML 1.0 specification advises against using an XML declaration, but points out that under the XML rules, it may be omitted only if the encoding is UTF-8 or UTF-16.
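
The rules above can be sketched in a small stylesheet. This is an illustrative example only: the document content is invented, and the DOCTYPE identifiers are those of XHTML 1.0 Strict. Declaring the XHTML namespace as the default namespace on the stylesheet means the literal result elements carry no prefix, which avoids the prefix problem described earlier.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <!-- UTF-8 encoding, so the XML declaration can safely be omitted -->
  <xsl:output
      method="xhtml"
      encoding="UTF-8"
      omit-xml-declaration="yes"
      doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" />

  <xsl:template match="/">
    <!-- Literal result elements pick up the default namespace declared above -->
    <html>
      <head><title>Example</title></head>
      <body>
        <p/>   <!-- may contain content: expected to be written as <p></p> -->
        <hr/>  <!-- declared empty in the DTD: expected to be written as <hr /> -->
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```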

Rules for Text Output

When «method="text" », the result tree is output as a plain text file. The values of the text nodes of the tree are copied to the output, and all other nodes are ignored. Within text nodes, all character values are output using the relevant encoding as determined by the encoding attribute; there are no special characters such as «& » to be escaped.

The way in which line endings are output (for example LF or CRLF) is not defined; the implementation might choose to use the default line-ending conventions of the platform on which it is running.

The attributes that are relevant to text output are listed below. All other attributes are ignored.

Attribute

Interpretation

encoding

This specifies the preferred character encoding for the output document. The default value is implementation-defined, and may depend on the platform on which it is running

If the encoding is one that does not allow all XML characters to be represented directly, for example «iso-8859-1 », then any character outside this subset will be reported as an error

media-type

This attribute defines the media type of the output file (often referred to as its MIME type). The default value is «text/plain ». The specification doesn't say what use is made of this information: It doesn't affect the contents of the output file, but it may affect the way it is named, stored, or transmitted, depending on the environment. For example, the information might find its way into an HTTP protocol header

Usage

The defaulting mechanisms ensure that it is usually not necessary to include an <xsl:output> element in the stylesheet. By default, the XML output method is used unless the first thing output is an <HTML> element, in which case the HTML output method is assumed.
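
When the default is not what you want, a single declaration overrides it. As a small sketch, a stylesheet whose result tree starts with an <HTML> element, but which should nevertheless be serialized using the XML rules, might say so explicitly:

```xml
<!-- Override the default choice of output method:
     serialize as XML even if the first element output is <HTML> -->
<xsl:output method="xml" />
```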

The <xsl:output> element is concerned with how your result tree is turned into an output file. If the XSLT processor allows you to do something else with the result tree, for example passing it to the application as a DOM Document or as a stream of SAX events, then the <xsl:output> element is irrelevant.

The encoding attribute can be very useful to ensure that the output file can be easily viewed and edited. Unfortunately, though, the set of possible values varies from one XSLT implementation to another, and may also depend on the environment. For example, many XSLT processors are written in Java and use the Java facilities for encoding the output stream, but the set of encodings supported by each Java VM is different. However, support for iso-8859-1 encoding is fairly universal, so if you have trouble viewing the output file because it contains UTF-8 Unicode characters, setting the encoding to iso-8859-1 is often a good remedy.

Important  

If your stylesheet generates accented letters or other special characters, and it looks as if they have come out incorrectly in the output, chances are they are correctly represented in UTF-8, but you are looking at them with a text editor that doesn't understand UTF-8. Either select a different output encoding (such as iso-8859-1), or get a text editor such as jEdit (www.jedit.org) that can work with UTF-8.
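
The effect of choosing a narrower encoding can be sketched as follows. The element name and text are invented for illustration, and the exact form of character reference written is up to the serializer.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="iso-8859-1" />

  <xsl:template match="/">
    <!-- U+00E9 (e with acute accent) exists in iso-8859-1, so it can be
         written directly as a single byte. U+20AC (euro sign) does not,
         so with the XML output method it has to be written as a numeric
         character reference instead. -->
    <item>caf&#xE9; 5 &#x20AC;</item>
  </xsl:template>

</xsl:stylesheet>
```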

The encoding attribute determines how the XSLT processor serializes the output as a stream of bytes, but it says nothing about what happens to the bytes later. If the processor writes to a file, the file will probably be written in the chosen encoding. But if the output is accessed as a character string through an API, or is written to a character field in a database, the encoding of the characters may be changed before you get to see them. A classic example of this effect is the Microsoft transformNode() interface (see Appendix C), which returns the result of the transformation as a BSTR string. Because this is a BSTR, it will always be encoded in UTF-16, regardless of the encoding you request. The same thing will happen with the JAXP interface (see Appendix D): if you supply a StreamResult based on a Writer, the encoding then depends on how the particular Writer encodes Unicode characters, and the XSLT processor has no control over the matter.

Examples

The following example requests XML output using iso-8859-1 encoding. The output will be indented for readability, and the contents of the <script> element, because it is expected to contain many special characters, will be output as a CDATA section. The output file will reference the DTD booklist.dtd: Note that it is entirely the user's responsibility to ensure that the output of the stylesheet actually conforms to this DTD, and, indeed, that it is a well-formed XML document.

  <xsl:output
      method="xml"
      indent="yes"
      encoding="iso-8859-1"
      cdata-section-elements="script"
      doctype-system="booklist.dtd" />

The following example might be used if the output of the stylesheet is a comma-separated-values file using US ASCII characters only:

  <xsl:output method="text" encoding="us-ascii" />
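
A slightly fuller sketch of the comma-separated-values case is shown below. The source vocabulary (a <books> element containing <book> elements, each with a <title> and a <price>) is hypothetical, and the sketch makes no attempt to quote values that themselves contain commas.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="us-ascii" media-type="text/plain" />

  <!-- One output line per <book>: title, then price, separated by a comma -->
  <xsl:template match="/books">
    <xsl:for-each select="book">
      <xsl:value-of select="title, price" separator="," />
      <!-- Newline written as a character reference; the line ending
           that actually appears in the file is not defined -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because the text output method has no way to escape characters, any title containing a character outside US ASCII would be reported as an error with this declaration.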

See Also

  • <xsl:character-map>

  • <xsl:result-document>




XSLT 2.0 Programmer's Reference
ISBN: 764569090
Year: 2003
Pages: 324
