2.1 Outputting Text

You can put plain, literal text into an XSLT template, and it will be written to a result tree when the template containing the text is processed. You saw this work in the very first example in the book (msg.xsl in Chapter 1). I'll go into more detail about adding literal text in this section.

Look at the single-element document text.xml in examples/ch02 (this directory is where all example files mentioned in this chapter can be found):

<?xml version="1.0"?>

<message>You can easily add text to your output.</message>

With text.xml in mind, consider the stylesheet txt.xsl:

<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>

<template match="/">Message: <apply-templates/></template>

</stylesheet>

When applied to text.xml, here is what generally happens, although the actual order of events may vary internally in a processor:

  1. The template rule in txt.xsl matches the root node (/), the beginning point of the source document.

  2. The text "Message: " (including one space) is written to the result tree.

  3. apply-templates processes the children of the root node, and the implicit, built-in template for elements matches message.

  4. The built-in template for elements in turn applies templates to the text child node of message, which is handled by the built-in template for text.

  5. The built-in template for text picks up the text node "You can easily add text to your output." and copies it to the result tree.

  6. The output is serialized.
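To make the built-in templates in these steps more concrete, here is a sketch of what txt.xsl might look like if the built-in rules were written out by hand. The filename txt-explicit.xsl is hypothetical (it is not one of the files in examples/ch02), and the two extra template rules are an illustrative restatement of the built-in rules for elements and text in XSLT 1.0, not something you are required to write:

```xml
<!-- txt-explicit.xsl (hypothetical): txt.xsl with the built-in rules spelled out -->
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>

<!-- Matches the root node, writes the literal text, then processes the children -->
<template match="/">Message: <apply-templates/></template>

<!-- Explicit stand-in for the built-in rule for elements: keep walking down -->
<template match="*"><apply-templates/></template>

<!-- Explicit stand-in for the built-in rule for text (and attributes): copy the value -->
<template match="text()|@*"><value-of select="."/></template>

</stylesheet>
```

Applied to text.xml, this should produce the same result as txt.xsl, because the explicit rules only restate what the processor already does implicitly.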

Apply txt.xsl to text.xml using Xalan:

xalan text.xml txt.xsl

This gives you the following output:

Message: You can easily add text to your output.

The txt.xsl stylesheet writes the little tidbit of literal text, "Message: ", from its template onto the output, and also grabs some text out of text.xml, and then ultimately puts them together in the result tree. You can do the same thing with the XSLT instruction element text.

2.1.1 Using the text Element

Instead of literal text, you can use XSLT's text instruction element to write text to a result tree. Instruction elements, you'll remember, are elements that are legal only inside templates. Using the text element gives you more control over result text than literal text can.

The template rule in lf.xsl contains some literal text, including whitespace:

<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>

<template match="/">Message:
  <apply-templates/>
</template>

</stylesheet>

When you apply lf.xsl to text.xml with Xalan like this:

xalan text.xml lf.xsl

the whitespace (a linefeed and some spaces) is preserved in the result:

Message:
  You can easily add text to your output.

The XSLT processor sees the whitespace in the stylesheet as literal text and outputs it as such. The XSLT instruction element text allows you to take control over the whitespace that appears in your template.

In contrast, the stylesheet text.xsl uses the text instruction element:

<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>

<template match="/">
 <text>Message: </text>
  <apply-templates/>
</template>

</stylesheet>

When you insert text like this, the only whitespace that is preserved is what is contained in the text element: a single space. Try it to see what happens:

xalan text.xml text.xsl

This gives you the same output you got with txt.xsl, with no hidden whitespace:

Message: You can easily add text to your output.

Back in the stylesheet txt.xsl, recall how things are laid out in the template element:

<template match="/">Message: <apply-templates/></template>

The literal text "Message: " comes immediately after the template start tag. The reason is that if you use any literal text that is not whitespace in a template, an XSLT processor interprets adjacent whitespace in the template element as significant. Any whitespace that is considered significant is preserved and sent along to output.

To see more of how whitespace affects literal text in a result, look at the stylesheet whitespace.xsl:

<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>

<template match="/">
Message:
    <apply-templates/>
                                      ...including whitespace!
</template>

</stylesheet>

Now, process it against text.xml to see what happens:

xalan text.xml whitespace.xsl

Observe how the whitespace is preserved, both from above and below the apply-templates element:

Message:
    You can easily add text to your output.
                                      ...including whitespace!

If no nonwhitespace literal text follows apply-templates (that is, if you removed "...including whitespace!" from within template in whitespace.xsl), the latter whitespace would not be preserved.
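A minimal sketch of that situation follows. The filename nofollow.xsl is hypothetical (it is not in examples/ch02). The blank lines after apply-templates form a whitespace-only text node, so a conforming XSLT 1.0 processor should strip them from the stylesheet before the template is ever instantiated:

```xml
<!-- nofollow.xsl (hypothetical): only whitespace follows apply-templates -->
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>

<!-- "Message: " contains nonwhitespace text, so it is kept, trailing space included.
     The blank lines before the template end tag are whitespace only, so they are dropped. -->
<template match="/">Message: <apply-templates/>


</template>

</stylesheet>
```

Processed against text.xml, the result should end right after the message text, with no trailing blank lines coming from the template.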

Whitespace is obviously hard to see. I recommend that you make a copy of whitespace.xsl and experiment with whitespace to see what happens when you process it.

Netscape and Mozilla, by the way, preserve the whitespace-only text nodes in output from whitespace.xsl, but IE does not. Use whitespace-pi.xml to test this in a browser if you like, but keep in mind that such output can vary as browser versions increment upward.


If you use text elements, the other whitespace within template elements becomes insignificant and is discarded when processed. You'll find that whitespace is easier to control if you use text elements. The control.xsl stylesheet uses text elements to handle the whitespace in its template:

<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>

<template match="/">
 <text>Message: </text>
 <text>

 </text>
 <text>  </text>
  <apply-templates/>
 <text>                      ...and whitespace, too!</text>
</template>

</stylesheet>

The control.xsl stylesheet has four text elements, two of which contain only whitespace, including one that inserts a pair of line breaks. Because you can see the start and end tags of text elements, it becomes easier to judge where the whitespace is, making it easier to control. To see the result, process it with text.xml:

xalan text.xml control.xsl

As an alternative, you could also insert line breaks by using character references, like this:

<text>&#10;&#10;</text>

This instance of the text element contains character references to two line breaks in succession. A character reference begins with an ampersand (&) and ends with a semicolon (;). In XML, you can use decimal or hexadecimal character references. The decimal character reference &#10; represents the linefeed character using the decimal number 10, preceded by a pound sign (#). A hexadecimal character reference uses a hexadecimal number preceded by a pound sign and the letter x (#x). You can also use &#x000A; or &#xA;, which are equivalent hexadecimal character references to the decimal reference &#10;.
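Here is a short sketch that puts the decimal and hexadecimal forms side by side. The filename charref.xsl is hypothetical, as is the trailing "Done." text; both are there only for illustration:

```xml
<!-- charref.xsl (hypothetical): linefeeds written with character references -->
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>

<template match="/">
 <text>Message:</text>
 <!-- decimal reference to a linefeed -->
 <text>&#10;</text>
 <apply-templates/>
 <!-- hexadecimal reference to the same character -->
 <text>&#xA;</text>
 <!-- hexadecimal again, with leading zeros -->
 <text>Done.&#x000A;</text>
</template>

</stylesheet>
```

Applied to text.xml, this should put "Message:", the message text, and "Done." on three separate lines, since all three references resolve to the same linefeed character.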

Why Linefeeds?

You might be wondering why I use a linefeed line-end character (&#10;) instead of a carriage return (&#13;) or carriage return/linefeed combination. The reason is that when a document is processed with a compliant XML processor, the line ends are all changed to linefeeds anyway. In other words, if an XML processor encounters a carriage return or a carriage return/linefeed combination, these characters are converted into linefeeds during processing. You can read about this in Section 2.11 of the XML specification.


2.1.1.1 The disable-output-escaping attribute

The text element has one optional attribute: disable-output-escaping. XSLT does not require processors to support this attribute (see Section 16.4 of the XSLT specification), but most do. This attribute can have one of two values, either yes or no. The default is no, so leaving the attribute out has the same effect as setting it to no. What does this attribute do? Hang on, this is going to take a bit of explaining.

In XML, some characters are forbidden in certain contexts. Two notable characters that fit into this category are the left angle bracket or less-than sign (<) and the ampersand (&). It's fine to use these characters in markup, such as when beginning a tag with <. You can't, however, use a < in character data (the strings that appear between tags) or in an attribute value. The reason is that the < is a road sign to an XML processor. When an XML processor munches on an XML document, if it sees a <, it says in effect, "Oh. We're starting a new tag here. Branch to the code that handles that." Therefore, you can see why we aren't allowed to use < directly in XML, except in markup.

There is a way out, though. XML provides several ways to represent these characters by escaping them with an entity or character reference whenever you want to use them where they are normally not allowed. Escaping a character essentially hides it from the processor. The most common way to escape characters like < and & is by referencing predefined entities. You'll find XML's built-in, predefined entity references listed in Table 2-1.

Table 2-1. Predefined entities in XML 1.0

Character          Entity reference   Numeric character reference
< (less-than)      &lt;               &#60;
& (ampersand)      &amp;              &#38;
> (greater-than)   &gt;               &#62;
" (quotation)      &quot;             &#34;
' (apostrophe)     &apos;             &#39;

The greater-than entity is provided so that XML can be compatible with Standard Generalized Markup Language (SGML). The > character alone is permissible in character data and in attribute values, escaped or not. (For SGML compatibility, you always need to escape the > character if it appears as part of the sequence ]]> , which is used to end CDATA sections. CDATA sections are described in more detail in Chapter 3.)
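As a quick illustration of these rules in character data, consider the following small document. The filename chars.xml and its element names are made up for this sketch and do not appear in examples/ch02:

```xml
<?xml version="1.0"?>
<!-- chars.xml (hypothetical): escaping in character data -->
<compare>
 <!-- the less-than sign and ampersands must be escaped; a lone greater-than sign need not be -->
 <expr>1 &lt; 2 &amp;&amp; 3 > 2</expr>
 <!-- the same content written with numeric character references -->
 <expr>1 &#60; 2 &#38;&#38; 3 &#62; 2</expr>
 <!-- the greater-than sign must be escaped when it would complete the sequence that ends a CDATA section -->
 <expr>a]]&gt;b</expr>
</compare>
```

The first two expr elements should have identical content once parsed, because entity references and numeric character references are just two spellings of the same characters.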

XML, by the way, is a legal subset of SGML, an international standard. SGML is a product of the International Organization for Standardization (ISO), and you can find the SGML specifications on the ISO web site, http://www.iso.ch. But have your credit card ready: you have to pay for most ISO specifications (sometimes dearly), unlike W3C specifications, which are free to download.


The &quot; and &apos; entities allow you to include double and single quotes in attribute values. A second matching quote marks the end of an attribute value, so an unescaped matching quote inside the value closes it early and causes a fatal error unless well-formed markup happens to follow. (See Section 1.2 of the XML specification.) I say matching because if an attribute value is surrounded by double quotes, it can contain single quotes in its value (as in "'value'"). The reverse is also true: single quotes can enclose double quotes ('"value"').
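The following sketch shows the quoting options in attribute values. The filename quotes.xml, the q element, and the text attribute are all hypothetical names chosen for this example:

```xml
<?xml version="1.0"?>
<!-- quotes.xml (hypothetical): quotes inside attribute values -->
<quotes>
 <!-- double quotes inside a double-quoted value must be escaped -->
 <q text="She said &quot;hello&quot;"/>
 <!-- the same value in single quotes needs no escaping -->
 <q text='She said "hello"'/>
 <!-- an apostrophe is fine inside a double-quoted value -->
 <q text="It's fine"/>
 <!-- inside a single-quoted value it has to be escaped -->
 <q text='It&apos;s fine'/>
</quotes>
```

After parsing, the first two attributes should carry the same value, and so should the last two; only the way they are written in the document differs.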

You have to escape an ampersand in character content because the ampersand itself is used to escape characters in entity and character references! If that's confusing, a few examples should clear things up. I'll now show you how the disable-output-escaping attribute works.

The little document escape.xml contains the name of a famous publisher:

<title>O'Reilly</title>

The stylesheet noescape.xsl adds some new text to this title using the default, which is to not disable output escaping:

<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="xml" omit-xml-declaration="yes"/>

<template match="/">
 <publisher xmlns="">
  <value-of select="title" xmlns="http://www.w3.org/1999/XSL/Transform"/>
  <text disable-output-escaping="no" xmlns="http://www.w3.org/1999/XSL/Transform"> &amp; Associates</text>
 </publisher>
</template>

</stylesheet>

noescape.xsl uses the xml output method. You can't see the effect of output escaping when the output method is text, so you have to use either the xml or html methods. You'll learn more about output methods later in this chapter and in Chapter 3.
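To see why the text method hides the effect, here is a sketch of the same stylesheet with only the output method changed. The filename noescape-text.xsl is hypothetical and is not among the files in examples/ch02:

```xml
<!-- noescape-text.xsl (hypothetical): noescape.xsl with the text output method -->
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<!-- the text method writes only the text of the result tree, without escaping -->
<output method="text"/>

<template match="/">
 <publisher xmlns="">
  <value-of select="title" xmlns="http://www.w3.org/1999/XSL/Transform"/>
  <text disable-output-escaping="no" xmlns="http://www.w3.org/1999/XSL/Transform"> &amp; Associates</text>
 </publisher>
</template>

</stylesheet>
```

Applied to escape.xml, this should produce O'Reilly & Associates with a bare ampersand and no publisher tags, whether disable-output-escaping is yes or no, because the text method never escapes anything in the first place.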

This stylesheet also redeclares the XSLT namespace several times (on the value-of and text elements). You'll see how to circumvent this cumbersome practice with a namespace prefix in "Adding a Namespace Prefix," later in this chapter.

To see output escaping in action, process escape.xml with this command:

xalan escape.xml noescape.xsl

Here is the result:

<publisher>O'Reilly &amp; Associates</publisher>

disable-output-escaping with a value of no has the same effect as having no attribute at all, that is, the output is escaped and &amp; is preserved in the result.

The following stylesheet, escape.xsl, disables output escaping:

<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="xml" omit-xml-declaration="yes"/>

<template match="/">
 <publisher xmlns="">
  <value-of select="title" xmlns="http://www.w3.org/1999/XSL/Transform"/>
  <text disable-output-escaping="yes" xmlns="http://www.w3.org/1999/XSL/Transform"> &amp; Associates</text>
 </publisher>
</template>

</stylesheet>

Process this:

xalan escape.xml escape.xsl

and you get:

<publisher>O'Reilly & Associates</publisher>

In escape.xsl, escaping is turned off so that &amp; is not preserved. You get only the ampersand in the result. The publisher element, which appears in both escape.xsl and noescape.xsl, is a literal result element. Let me explain what that is.



Learning XSLT
ISBN: 0596003277
Year: 2003
Pages: 164
