Another important aspect of writing stylesheets is picking the output method: XML, HTML, text (that is, any kind of text-based document that is not XML or HTML), and so on. In other words, the output method determines the type of document you're creating. By default, the output method is XML, although most processors create HTML documents if they see an <HTML> element. (Some processors also do this if the extension of the document file you're creating is .html.)
Chapter 6 discusses how this works in depth, but I'll look at this topic in overview now. Unless you are sure that the default output type rules of your XSLT processor are doing what they should, it's often advisable to set the output type explicitly to match the kind of output document you want, using the <xsl:output> element. The output type can determine, for example, whether the XSLT processor writes the XML declaration, <?xml version="1.0"?>, at the beginning of the document, and it can determine the MIME type (such as text/xml or text/html) of documents sent back from an XSLT processor on a Web server to a browser. In addition, if you set the output type to HTML, most XSLT processors recognize that not all elements in HTML need closing as well as opening tags, and so on.
Chapter 6 is about converting from XML to other document types, but I'll take a look at <xsl:output> in overview here because it's important to understand when working with stylesheets in general. The following list includes the attributes of <xsl:output>:
cdata-section-elements (optional). Sets the names of those elements whose content you want output as CDATA sections. Set to a whitespace-separated list of QNames.
doctype-public (optional). Specifies the public identifier to be used in the <!DOCTYPE> declaration in the output. Set to a string value.
doctype-system (optional). Specifies the system identifier to be used in the <!DOCTYPE> declaration in the output. Set to a string value.
encoding (optional). Sets the character encoding. Set to a string value.
indent (optional). Specifies whether the output should be indented to show its nesting structure. Set to yes or no.
media-type (optional). Sets the MIME type of the output. Set to a string value.
method (optional). Sets the output format. Set to xml, html, text, or a valid QName.
omit-xml-declaration (optional). Specifies whether the XML declaration should be included in the output. Set to yes or no.
standalone (optional). Specifies whether a standalone declaration should be included in the output and sets its value if so. Set to yes or no.
version (optional). Sets the version of the output. Set to a valid NMToken.
The most-used attribute of this element is method, because that's what you use to set the output tree type you want. The three most common settings are html, xml, and text.
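To make the attribute list concrete, here is a minimal sketch of an <xsl:output> element that sets several attributes at once. The PLANETLIST and DESCRIPTION elements and the planetlist.dtd system identifier are hypothetical names chosen for illustration; they are not part of planets.xsl.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Hypothetical settings: planetlist.dtd and DESCRIPTION are
         illustrative names, not taken from planets.xsl. -->
    <xsl:output method="xml"
        version="1.0"
        encoding="UTF-8"
        doctype-system="planetlist.dtd"
        cdata-section-elements="DESCRIPTION"/>

    <!-- Wrap the results in a single (hypothetical) root element -->
    <xsl:template match="/PLANETS">
        <PLANETLIST>
            <xsl:apply-templates select="PLANET"/>
        </PLANETLIST>
    </xsl:template>

    <!-- Write each planet's name inside a DESCRIPTION element -->
    <xsl:template match="PLANET">
        <DESCRIPTION><xsl:value-of select="NAME"/></DESCRIPTION>
    </xsl:template>

</xsl:stylesheet>
```

With settings like these, a processor that honors <xsl:output> would be expected to write an XML declaration, a <!DOCTYPE> declaration that names planetlist.dtd, and the text of each DESCRIPTION element as a CDATA section.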
The planets.xsl stylesheet we've been working with doesn't use the <xsl:output> element; I've been relying on the default output rules with that stylesheet. By default, the output type is XML, unless the XSLT processor sees an <HTML> or <html> tag. (The XSLT 1.0 recommendation describes this default, but it does not require processors to serialize their output, so you can't expect all XSLT processors to honor it.) I've used the <HTML> tag in planets.xsl like this:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="/PLANETS">
        <HTML>
            <HEAD>
                <TITLE>
                    The Planets Table
                </TITLE>
            </HEAD>
            .
            .
            .
However, if you remove this tag:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="/PLANETS">
        <HEAD>
            <TITLE>
                The Planets Table
            </TITLE>
        </HEAD>
        .
        .
        .
then this is the kind of output you'll get from James Clark's XT. Note the XML declaration at the beginning:
<?xml version="1.0" encoding="utf-8"?>
<HEAD>
    <TITLE>
        The Planets Table
    </TITLE>
</HEAD>
.
.
.
On the other hand, you can explicitly specify the output type as HTML with the <xsl:output> element, even without using the <HTML> element:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="html"/>

    <xsl:template match="/PLANETS">
        <HEAD>
            <TITLE>
                The Planets Table
            </TITLE>
        </HEAD>
        .
        .
        .
Here's the output from XT in this case, which is just an HTML fragment with no XML declaration:
<HEAD>
    <TITLE>
        The Planets Table
    </TITLE>
</HEAD>
.
.
.
Automatic <meta> Elements Added to HTML
If you do use the <xsl:output method="html"/> element explicitly, some XSLT processors, such as Saxon, add a <meta> element to the output document's <head> element, something like this: <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.
In general, XSLT processors are supposed to realize that certain elements, such as <br>, <img>, <frame>, and so on, are empty in HTML. Also, spaces and other characters in URI attribute values are converted as specified in the HTML specification (a space becomes %20 and so on), processing instructions are terminated with > rather than ?>, and attributes that HTML writes without a value are output that way.
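The following sketch shows how these HTML rules play out. Because a stylesheet is itself an XML document, empty elements and standalone attributes have to be written in XML form in the stylesheet; the <BR>, <HR>, and <INPUT> elements here are illustrative additions and are not part of planets.xsl.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="html"/>

    <xsl:template match="/PLANETS">
        <HTML>
            <BODY>
                <!-- Written as XML empty elements in the stylesheet -->
                The Planets Table<BR/>
                <HR/>
                <!-- A standalone HTML attribute needs a value here,
                     because the stylesheet must be well-formed XML -->
                <INPUT TYPE="checkbox" CHECKED="CHECKED"/>
            </BODY>
        </HTML>
    </xsl:template>

</xsl:stylesheet>
```

A processor that follows the HTML output method would be expected to write <BR> and <HR> without closing tags, and may write the CHECKED attribute without a value. The exact serialization can differ from processor to processor.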
In this section I'm going to use an example that you'll see more about in Chapter 6. I'm going to look ahead and use the <xsl:copy> element, which you'll see in Chapter 3, to create a stylesheet that just makes a copy of any XML document.
I use the match pattern *, which, as mentioned earlier, matches any element, and use the <xsl:copy> element to copy the current element to the output document. This is what the new stylesheet, which just copies the source document to the result document, looks like:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="*">
        <xsl:copy>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
Because this stylesheet is for copying any XML document to a new XML document (even XHTML documents, which are XML documents that use the <html> tag), I explicitly indicate that the output method is XML here. If I didn't do this, copied XHTML documents would not start with the XML declaration:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml"/>

    <xsl:template match="*">
        <xsl:copy>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
This example copies elements to the result document, and the built-in rule for text nodes passes their text through, but it does not copy attributes, comments, or processing instructions. You'll see a more complete version of this same stylesheet in Chapter 4.
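For orientation, one widely used way to copy every kind of node is to match attributes and all child nodes in a single template. This is a sketch of that common idiom and may differ from the version presented later in the book.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml"/>

    <!-- Match attributes (@*) and every child node type: elements,
         text, comments, and processing instructions (node()) -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <!-- Recurse into this node's own attributes and children -->
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

The select attribute matters here: a bare <xsl:apply-templates/> processes child nodes only, so attributes would be skipped without the explicit @*.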
Remember that XML is the default output method, unless the document you're creating starts with an <HTML> or <html> element. However, even if you are transforming from one XML document to another, it's often useful to use the <xsl:output> element to specify, for example, the character encoding (the default is usually UTF-8, an eight-bit encoding of Unicode), or whether the output document should be indented (which is covered in Chapter 3).
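As a minimal sketch of an XML-to-XML transformation that sets these options, the following stylesheet reuses the element-copying template from earlier in this section and asks for UTF-16 output with indentation. The choice of UTF-16 is only an example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Request a specific encoding and indented output -->
    <xsl:output method="xml" encoding="UTF-16" indent="yes"/>

    <!-- Copy each element and process its children in turn -->
    <xsl:template match="*">
        <xsl:copy>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

XSLT 1.0 requires processors that serialize their output to support UTF-8 and UTF-16; support for other encodings depends on the processor. How the indentation looks is also up to the processor.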
Working with XML Fragments
You can even work on XML fragments, not just entire XML documents. In that case, you can set the omit-xml-declaration attribute to yes to omit the XML declaration at the beginning of the output tree, as discussed in Chapter 6.
When you use the XML output method, the output is well-formed XML (but there is no requirement that it be valid). It does not have to be a complete well-formed XML document, however; it can also be a well-formed XML external general parsed entity. The content of the output can contain character data, CDATA sections, entity references, processing instructions, comments, and elements. The output must also conform to the Namespaces in XML recommendation.
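Here is a small sketch of a stylesheet that produces a fragment rather than a complete document. It assumes the PLANETS, PLANET, and NAME elements used elsewhere in this chapter; writing the planets without a wrapper element is an illustrative choice.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Leave out the XML declaration, since the result is a fragment -->
    <xsl:output method="xml" omit-xml-declaration="yes"/>

    <!-- No wrapper element is written, so the output has no single root -->
    <xsl:template match="/PLANETS">
        <xsl:apply-templates select="PLANET"/>
    </xsl:template>

    <!-- Each planet becomes its own top-level element in the output -->
    <xsl:template match="PLANET">
        <PLANET><xsl:value-of select="NAME"/></PLANET>
    </xsl:template>

</xsl:stylesheet>
```

If the source holds more than one PLANET element, the result is a run of sibling <PLANET> elements. That is not a well-formed XML document on its own, but it can serve as an external general parsed entity that another document includes.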
The text output method is not just for creating plain text; it's used for any non-XML, non-HTML text-based format. For example, you can use it to create Rich Text Format (RTF) documents. Rich Text Format uses embedded text-based codes to specify the format of documents, and you can place those text-based codes in documents yourself if you use the text output method.
Here's an example stylesheet (that you'll see in Chapter 6) that converts planets.xml into planets.rtf:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text"/>

<xsl:strip-space elements="*"/>

<xsl:template match="/PLANETS">{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Courier New;}}
\viewkind4\uc1\pard\lang1033\b\f0\fs36 The Planets Table\par
\b0\fs20
Name\tab Mass\tab Rad.\tab Day\par
<xsl:apply-templates/>
\par
}</xsl:template>

<xsl:template match="PLANET">
<xsl:value-of select="NAME"/>
\tab
<xsl:value-of select="MASS"/>
\tab
<xsl:value-of select="RADIUS"/>
\tab
<xsl:value-of select="DAY"/>
\tab
\par
</xsl:template>

</xsl:stylesheet>
You can see the resulting RTF document, planets.rtf, in Figure 2.3 in Microsoft Word 2000.
Note that I've set the output method to text with <xsl:output method="text"/>:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text"/>

<xsl:template match="/PLANETS">{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Courier New;}}
\viewkind4\uc1\pard\lang1033\b\f0\fs36 The Planets Table\par
.
.
.
You might also note that I've started the RTF codes immediately after the <xsl:template> element. I've done that because RTF documents must start with RTF codes from the very beginning; if I had begun inserting RTF codes on the next line, like this:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text"/>

<xsl:template match="/PLANETS">
{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Courier New;}}
\viewkind4\uc1\pard\lang1033\b\f0\fs36 The Planets Table\par
.
.
.
then the RTF output file would have started with a newline character, which would throw off an RTF application like Microsoft Word. You'll learn more on RTF and other formats in Chapter 6.
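Another way to keep stray whitespace out of text output is to put literal text inside <xsl:text> elements. This is an alternative technique, not the approach the RTF stylesheet above uses. The sketch below writes a simple comma-separated listing and assumes the PLANET, NAME, and MASS elements from planets.xml; the column headings are illustrative.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="text"/>

    <!-- Drop whitespace-only text nodes from the source document -->
    <xsl:strip-space elements="*"/>

    <xsl:template match="/PLANETS">
        <!-- Whitespace-only text in the stylesheet is stripped, so this
             indentation does not reach the output; only the contents
             of xsl:text do. &#10; is a newline character. -->
        <xsl:text>Name,Mass&#10;</xsl:text>
        <xsl:apply-templates select="PLANET"/>
    </xsl:template>

    <xsl:template match="PLANET">
        <xsl:value-of select="NAME"/>
        <xsl:text>,</xsl:text>
        <xsl:value-of select="MASS"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>

</xsl:stylesheet>
```

Because every literal character is wrapped in <xsl:text>, the stylesheet can be indented freely without the output starting with a newline or picking up extra spaces.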