Gimme Some Space

 
 
Chapter 13 - Gimme Some Space and Other Output Issues
XSLT For Dummies
by Richard Wagner
Hungry Minds 2002
  

Whitespace is a term used to describe those invisible characters inside a document. You know, all those characters that you never see, but you know they're there, such as spaces, tabs, carriage returns, and line feeds. They're kind of like those creepy microscopic creatures you see on PBS specials that supposedly crawl all over you and me. Yuck! I'm itching all over just thinking about it, so let me change the subject and talk about something much more pleasant: whitespace.

Whitespace is one of those tricky issues in XSLT, because so many variables determine what whitespace appears in the result tree. Whitespace has origins in the XML source, the template rules of the XSLT stylesheet, and specific space-related XSLT elements, such as xsl:strip-space.

And if that isn't complicated enough, in what must be nothing more than a sick joke, some XSLT processors handle whitespace quite differently from others, which can result in varying outputs.

 Warning   msxsl3, the 3.0 version of Microsoft's msxsl processor, is particularly problematic in how it deals with whitespace. Unlike other processors, such as Saxon, msxsl3 automatically strips whitespace by default, both from the original XML source document and from the XSLT stylesheet. In fact, just figuring out how to add whitespace back in becomes a frustrating exercise. Happily, version 4.0 of the msxsl processor changes this default behavior to reflect what you'd expect from a standard XSLT processor.

Fortunately, in many cases, whitespace in the result document is not that significant an issue. Nonetheless, when you are trying either to preserve a specific format or to lay out the result document in a specific manner, knowing how to work with whitespace becomes important.

Whitespace in XSLT stylesheets

The general rule of thumb is that, inside the XSLT stylesheet, text nodes that contain nothing but whitespace are stripped out of the template before any transformation occurs. However, you can make sure the processor preserves whitespace through the way you work with text nodes and xsl:text instructions.

Whitespace in text nodes

A text node that contains only whitespace is normally ignored, but when a text node contains any nonwhitespace characters, all of its whitespace characters are automatically preserved. To demonstrate how this works, take a look at my sample XML source in Listing 13-1.

Listing 13-1: afifilms.xml
    <!-- American Film Institute Top 10 Films -->
    <!-- afifilms.xml -->
    <topfilms createdby="AFI">
    <film place="1" date="1941">Citizen Kane</film>
    <film place="2" date="1942">Casablanca</film>
    <film place="3" date="1972">The Godfather</film>
    <film place="4" date="1939">Gone With The Wind</film>
    <film place="5" date="1962">Lawrence Of Arabia</film>
    <film place="6" date="1939">The Wizard Of Oz</film>
    <film place="7" date="1967">The Graduate</film>
    <film place="8" date="1954">On The Waterfront</film>
    <film place="9" date="1993">Schindler's List</film>
    <film place="10" date="1952">Singin' In The Rain</film>
    </topfilms>
 

Suppose I want to use this list of the American Film Institute's top ten films to generate a list of each film and the date it was made. I can create such a result with this code:

    <xsl:template match="film">
      <xsl:apply-templates/>
      <xsl:value-of select="@date"/>
    </xsl:template>

The template rule then generates the following result:

    Citizen Kane1941
    Casablanca1942
    The Godfather1972
    Gone With The Wind1939
    Lawrence Of Arabia1962
    The Wizard Of Oz1939
    The Graduate1967
    On The Waterfront1954
    Schindler's List1993
    Singin' In The Rain1952

Although you can easily forget about it, a text node actually sits between the xsl:apply-templates and xsl:value-of instructions. However, because all the characters that the text node contains are whitespace (carriage return, line feed), the text node is ignored in the output. Therefore, the following two ways of expressing the code produce the same output:

    <xsl:apply-templates/>
    <xsl:value-of select="@date"/>

and

    <xsl:apply-templates/><xsl:value-of select="@date"/>

To make the output more readable, I can add literal text between these two instructions to turn each list item into a sentence. The new template rule looks like this:

    <xsl:template match="film">
      <xsl:apply-templates/> was made in <xsl:value-of select="@date"/>
    </xsl:template>

This revised template produces the following result:

    Citizen Kane was made in 1941
    Casablanca was made in 1942
    The Godfather was made in 1972
    Gone With The Wind was made in 1939
    Lawrence Of Arabia was made in 1962
    The Wizard Of Oz was made in 1939
    The Graduate was made in 1967
    On The Waterfront was made in 1954
    Schindler's List was made in 1993
    Singin' In The Rain was made in 1952

However, imagine that I alter the text between the xsl:apply-templates and xsl:value-of instructions in the template rule by adding a line break inside the text node:

    <xsl:template match="film">
      <xsl:apply-templates/> was made in
      <xsl:value-of select="@date"/>
    </xsl:template>

The results in this case show the line break:

    Citizen Kane was made in
      1941
    Casablanca was made in
      1942
    The Godfather was made in
      1972
    Gone With The Wind was made in
      1939
    Lawrence Of Arabia was made in
      1962
    The Wizard Of Oz was made in
      1939
    The Graduate was made in
      1967
    On The Waterfront was made in
      1954
    Schindler's List was made in
      1993
    Singin' In The Rain was made in
      1952

The XSLT processor can't ignore the line break in this template rule because nonwhitespace characters appear in the same text node. The whitespace characters are all preserved along with the adjoining nonwhitespace characters.
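If you want to break the lines in your stylesheet without that break leaking into the result, one option is to wrap the literal text in xsl:text. The following is only a sketch of that idea, not the only way to do it. The xsl:output setting is my own assumption, added so that the result is plain text with no XML declaration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: plain-text output, so no XML declaration is written -->
  <xsl:output method="text"/>

  <xsl:template match="film">
    <!-- The line breaks and indentation between these instructions are
         whitespace-only text nodes, so they are stripped -->
    <xsl:apply-templates/>
    <!-- Everything inside xsl:text, including both spaces, is preserved -->
    <xsl:text> was made in </xsl:text>
    <xsl:value-of select="@date"/>
  </xsl:template>

</xsl:stylesheet>
```

Because the only text that reaches the result is what sits inside xsl:text, each entry should come out on a single line no matter how the instructions are laid out in the stylesheet.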

Whitespace inside xsl:text

Any whitespace appearing inside the xsl:text element is automatically preserved, making it a good tool to control exactly what whitespace you want to appear in the result document. For example, if I want to add an XML comment to precede each item in the list, I add an xsl:comment instruction to the template. (I discuss the use of xsl:comment later in the chapter.) But if I use the following snippet, the comment appears on the same line as the list entry:

    <xsl:template match="film">
      <xsl:comment>List entry</xsl:comment>
      <xsl:apply-templates/> was made in <xsl:value-of select="@date"/>
    </xsl:template>

The result is:

    <!--List entry-->Citizen Kane was made in 1941
    <!--List entry-->Casablanca was made in 1942
    <!--List entry-->The Godfather was made in 1972
    <!--List entry-->Gone With The Wind was made in 1939
    <!--List entry-->Lawrence Of Arabia was made in 1962
    <!--List entry-->The Wizard Of Oz was made in 1939
    <!--List entry-->The Graduate was made in 1967
    <!--List entry-->On The Waterfront was made in 1954
    <!--List entry-->Schindler's List was made in 1993
    <!--List entry-->Singin' In The Rain was made in 1952

Just as you found out in the preceding section, whitespace is ignored between the </xsl:comment> and <xsl:apply-templates/> tags. Therefore, to add a line break between the comment and the line text, and another after each item, you need to use xsl:text:

    <xsl:template match="film">
      <xsl:comment>List entry</xsl:comment><xsl:text>
    </xsl:text>
      <xsl:apply-templates/> was made in <xsl:value-of select="@date"/><xsl:text>
    </xsl:text>
    </xsl:template>

So, even though the xsl:text instruction contains nothing but a carriage return, the XSLT processor preserves it because whitespace that falls between the start and end tags of xsl:text is considered significant. The text generated is as follows:

    <!--List entry-->
    Citizen Kane was made in 1941

    <!--List entry-->
    Casablanca was made in 1942

    <!--List entry-->
    The Godfather was made in 1972

    <!--List entry-->
    Gone With The Wind was made in 1939

    <!--List entry-->
    Lawrence Of Arabia was made in 1962

    <!--List entry-->
    The Wizard Of Oz was made in 1939

    <!--List entry-->
    The Graduate was made in 1967

    <!--List entry-->
    On The Waterfront was made in 1954

    <!--List entry-->
    Schindler's List was made in 1993

    <!--List entry-->
    Singin' In The Rain was made in 1952
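A literal line break inside xsl:text works, but it is easy to damage when someone reindents the stylesheet. A variation you may find tidier is to write the line feed as the character reference &#10;, which the XML parser turns into a line feed before the XSLT processor ever sees it. Treat the following as a sketch. The xsl:output line is my own assumption, added to leave out the XML declaration, and I would expect the result to match the version above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: XML output (needed for the comments), minus the declaration -->
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template match="film">
    <xsl:comment>List entry</xsl:comment>
    <!-- &#10; is a line feed; inside xsl:text it is always preserved -->
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates/> was made in <xsl:value-of select="@date"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The indentation around each xsl:text instruction is made up of whitespace-only text nodes, so it is stripped and doesn't affect the result.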

Whitespace in source XML documents

When creating XML documents, I often want to visually show the hierarchy of the document structure by indenting each level, but I don't want this whitespace to actually show up in my result tree. Although this seems logical, it causes a problem for the XML processor, because it doesn't know whether those whitespace characters are significant. Because preserving information that could be significant is better than deleting it, the XML processor preserves all whitespace that falls outside the start and end tags of the XML elements. For example, I can change the spacing of the source document I've been using so that it looks like this:

    <!-- American Film Institute Top 10 Films -->
    <topfilms createdby="AFI">
    <film place="1" date="1941">Citizen Kane</film>
    <film place="2" date="1942">Casablanca</film>
    <film place="3" date="1972">The Godfather</film>
    <film place="4" date="1939">Gone With The Wind</film><film place="5" date="1962">Lawrence Of Arabia</film>
    <film place="6" date="1939">The Wizard Of Oz</film>
    <film place="7" date="1967">The Graduate</film><film place="8" date="1954">On The Waterfront</film>
    <film place="9" date="1993">Schindler's List</film>
    <film place="10" date="1952">Singin' In The Rain</film>
    </topfilms>

To show how this whitespace is carried over to the result document, I create a basic template rule:

    <xsl:template match="film">
      <xsl:apply-templates/>
    </xsl:template>

After transformation, the output is as follows:

    Citizen Kane
    Casablanca
    The Godfather
    Gone With The WindLawrence Of Arabia
    The Wizard Of Oz
    The GraduateOn The Waterfront
    Schindler's List
    Singin' In The Rain

Using xsl:strip-space and xsl:preserve-space

You can use the xsl:strip-space element to get rid of all this extra whitespace in the source document. This element has a single required attribute named elements. You use the elements attribute to list the names of elements containing whitespace that you want to strip. If you want to add more than one element name, separate the names with (ironically enough) whitespace. You can also use * to specify all elements.

For my example, I want to specify the topfilms element, because all the extra whitespace is part of its content:

    <xsl:strip-space elements="topfilms"/>

After I add this as a top-level element to my stylesheet, the result of the transformation looks quite different:

    Citizen KaneCasablancaThe GodfatherGone With The WindLawrence Of ArabiaThe Wizard Of OzThe GraduateOn The WaterfrontSchindler's ListSingin' In The Rain
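Running every title together is rarely what you want, of course. Once the source whitespace is stripped, you can put back exactly the separators you choose. Here is a minimal sketch of a complete stylesheet that does so. The xsl:output setting and the choice of a line feed (written as the character reference &#10;) as the separator are my own assumptions for the illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: plain-text output, so no XML declaration is written -->
  <xsl:output method="text"/>

  <!-- Drop the whitespace-only text nodes that sit directly inside topfilms -->
  <xsl:strip-space elements="topfilms"/>

  <xsl:template match="film">
    <xsl:apply-templates/>
    <!-- Add one line feed after each title, whatever the source spacing was -->
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

With this approach, the layout of the result depends only on the stylesheet, so films that share a line in the source should still land on separate lines in the output.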

 Tip   The xsl:strip-space element removes whitespace only from the elements specified by the elements attribute and doesn't strip whitespace from the descendants of those elements. In this example, if I want to remove any extra whitespace appearing in film elements, I need to explicitly add film to the elements attribute value: elements="topfilms film".

The xsl:preserve-space element preserves whitespace in the source document. By default, XSLT conserves space already, so this element is needed only to offset the use of xsl:strip-space. Developers commonly use this element when they want to strip the whitespace from all elements except one or two. So if I want to remove all the whitespace in the source document, except for the whitespace inside the film elements, I use the following:

    <xsl:strip-space elements="*"/>
    <xsl:preserve-space elements="film"/>

 Remember   The xsl:strip-space and xsl:preserve-space elements are top-level elements for a stylesheet. If you put them inside a template rule, you get an error.
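To make the placement concrete, here is a sketch of a small but complete stylesheet in which both elements appear as direct children of xsl:stylesheet, outside any template rule. The single film template is just a stand-in for whatever rules your own stylesheet contains.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Top-level elements: children of xsl:stylesheet, not of a template -->
  <xsl:strip-space elements="*"/>
  <xsl:preserve-space elements="film"/>

  <!-- Stand-in template rule -->
  <xsl:template match="film">
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```

When an element matches both lists, as film does here, the more specific name takes priority over the * wildcard, so whitespace inside film is kept.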

Preserving with xml:space

A second way of preserving whitespace in the source document is to add a special XML attribute named xml:space to one or more of the document elements. The xml:space attribute has two possible values:

  • xml:space="preserve" tells the processor to keep the whitespace for this element intact.

  • xml:space="default" tells the processor to return to its default setting.

The xml:space attribute applies to the element that defines it as well as to any of its descendants.

When the XSLT processor encounters an xml:space attribute, it remembers the value as it processes text nodes. Each text node takes on the xml:space value of its closest ancestor that declares one.
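As a sketch of how the two values interact, consider a cut-down version of the film list. The notes and note elements are hypothetical; I added them only to have a descendant that switches back to the default setting.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the notes and note elements are hypothetical -->
<topfilms createdby="AFI" xml:space="preserve">
<film place="1" date="1941">Citizen Kane</film>
<film place="2" date="1942">Casablanca</film>
<notes xml:space="default">
  <note>A hypothetical remark</note>
</notes>
</topfilms>
```

If the stylesheet contains <xsl:strip-space elements="*"/>, a conforming processor should still keep the line breaks that sit directly inside topfilms, because preserve is the closest xml:space value for those text nodes. The indentation inside notes, on the other hand, falls back under the stylesheet's stripping rules, because notes resets the value to default.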

  
 
 
ISBN: 0764536516