xsl:preserve-space

The <xsl:preserve-space> element, along with <xsl:strip-space>, is used to control the way in which whitespace nodes in the source document are handled.

Changes in 2.0

The syntax of a NameTest has been extended to allow the format «*:NCName», which matches all elements with a given local name, in any namespace.

Format

  <xsl:preserve-space elements = tokens />

Position

<xsl:preserve-space> is a top-level declaration, which means that it must be a child of the <xsl:stylesheet> element. There are no constraints on its ordering relative to other declarations.
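
As a minimal sketch of this placement rule, the stylesheet below puts the declaration directly under <xsl:stylesheet>, after a template rule, to show that its position relative to other declarations does not matter. The element name «poem» is invented for the illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- A template rule may appear before the declaration -->
  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Top-level declaration: a direct child of xsl:stylesheet.
       "poem" is a hypothetical element name. -->
  <xsl:preserve-space elements="poem"/>

</xsl:stylesheet>
```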

Attributes

Name	Value	Meaning
elements mandatory	Whitespace-separated list of NameTests	Defines the elements in the source document whose whitespace-only text nodes are to be preserved

The NameTest construct is defined in XPath. NameTests are also used in XSLT patterns, so they are described in Chapter 6 of this book on page 509. A NameTest may be an actual element name, or the symbol «*» meaning all elements, or the construct «prefix:*» meaning all elements in a particular namespace, or the construct «*:local-name» meaning all elements with a given local name, regardless of their namespace.
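
The four forms can be sketched side by side as follows. The element names and the choice of the SVG namespace are assumptions made for the illustration, and a real stylesheet would rarely need all four declarations at once.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:svg="http://www.w3.org/2000/svg">

  <!-- An actual element name (here, an element in no namespace) -->
  <xsl:preserve-space elements="title"/>

  <!-- All elements in the namespace bound to the prefix "svg" -->
  <xsl:preserve-space elements="svg:*"/>

  <!-- All elements with local name "address", in any namespace (new in 2.0) -->
  <xsl:preserve-space elements="*:address"/>

  <!-- All elements -->
  <xsl:preserve-space elements="*"/>

</xsl:stylesheet>
```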

Content

None; the element is always empty.

Effect

This declaration, together with <xsl:strip-space>, defines the way that whitespace-only text nodes in the source document are handled. Unless contradicted by an <xsl:strip-space> element, <xsl:preserve-space> indicates that whitespace-only text nodes occurring as children of a specified element are to be retained in the source tree.

Preserving whitespace-only text nodes is the default action, so this element only needs to be used where it is necessary to contradict an <xsl:strip-space> element. The interaction of the two is explained below.

The concept of whitespace-only text nodes is explained at some length, starting on page 136, in Chapter 3.

This declaration also affects the handling of whitespace-only text nodes in any document loaded using the document() function. It does not affect the handling of whitespace-only text nodes in the stylesheet when used in its role as a stylesheet, but it does affect the stylesheet in the same way as any other document if a copy of the stylesheet is loaded using the document() function.
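
The following sketch illustrates the effect on a document loaded with document(). The file name «lookup.xml» and the element name «entry» are hypothetical. The expression counts the whitespace-only text nodes that survive in the secondary document; under the declarations shown, these should be only the ones that are children of an <entry> element (or that are protected by xml:space, as described later in this section).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:strip-space elements="*"/>
  <xsl:preserve-space elements="entry"/>

  <xsl:template match="/">
    <!-- lookup.xml is a hypothetical secondary document; the two
         declarations above apply to it as well as to the main source -->
    <surviving-whitespace-nodes>
      <xsl:value-of
          select="count(document('lookup.xml')//text()[not(normalize-space())])"/>
    </surviving-whitespace-nodes>
  </xsl:template>

</xsl:stylesheet>
```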

The element does not affect whitespace nodes in documents returned as the result of extension functions or passed to the stylesheet as the value of a stylesheet parameter. Also, the element does not affect anything that happens to the source document before the XSLT processor gets to see it, so if you create the source tree using an XML parser that strips whitespace nodes (as Microsoft's MSXML3 does, by default), then specifying <xsl:preserve-space> in the stylesheet will not get these nodes back; they are already gone.

A whitespace-only text node is a text node whose text consists entirely of a sequence of whitespace characters, these being space, tab, carriage return, and linefeed (#x20, #x9, #xD, and #xA). The <xsl:preserve-space> element has no effect on whitespace contained in text nodes that also contain non-whitespace characters; such whitespace is always preserved and is part of the value of the text node.
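
To make the distinction concrete, consider this small source fragment, whose element names are invented for the illustration:

```xml
<para>
  <b>bold</b> <i>italic</i>
  some text
</para>
```

The <para> element here has three text node children. The first (the newline and indentation before <b>) and the second (the single space between </b> and <i>) are whitespace-only, so they are candidates for stripping. The third contains the words «some text» together with the surrounding newlines and indentation; because it contains non-whitespace characters, all of its whitespace is retained whatever the stylesheet declares.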

An XML 1.1 parser will recognize the additional characters «x85» and «x2028» as representing line endings. This means it will convert such characters into «x0A» characters in the data model that XSLT sees. If one of these two characters finds its way into a text node without being converted, which can happen either if they are written as the character references «&#x85;» or «&#x2028;» in the source XML or if the tree is built by an XML 1.0 parser, then the XSLT processor will not treat them as whitespace.

Before a node is classified as a whitespace-only text node, the tree is normalized by concatenating all adjacent text nodes. This includes the merging of text that originated in different XML entities.

A whitespace-only text node may either be stripped or preserved. If it is stripped, it is removed from the tree. This means it will never be matched, it will never be copied to the output, and it will never be counted when nodes are numbered. If it is preserved, it is retained on the tree in its original form, subject only to the end-of-line normalization performed by the XML parser.

If a whitespace-only text node has an ancestor with an xml:space attribute and the nearest ancestor with such an attribute has the value «xml:space="preserve" », then the text node is preserved regardless of the <xsl:preserve-space> and <xsl:strip-space> elements in the stylesheet.
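
The source fragment below is a sketch of how the nearest xml:space attribute decides the outcome; the element names are hypothetical, and the stylesheet is assumed to contain <xsl:strip-space elements="*"/>.

```xml
<doc>
  <listing xml:space="preserve">
    <line>  </line>
    <note xml:space="default">
      <p> </p>
    </note>
  </listing>
</doc>
```

Under that assumption, the whitespace-only text nodes that are children of <listing> and <line> are preserved, because their nearest ancestor carrying xml:space is <listing>, with the value «preserve». For the whitespace-only text nodes inside <note> and <p>, the nearest such ancestor is <note>, with the value «default», so the stylesheet declarations apply and these nodes are stripped, as are the whitespace-only children of <doc>.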

The elements attribute of <xsl:preserve-space> must contain a whitespace-separated list of NameTests. The form of a NameTest is defined in the XPath expression language; see Chapter 6, page 509. Each form of NameTest has an associated priority. The different forms of NameTest and their meanings are:

Syntax	Examples	Meaning	Priority
QName	title svg:width	Matches the full element name, including its namespace URI	0
NCName«:*»	svg:*	Matches all elements in the namespace whose URI corresponds to the given prefix	-0.25
«*:»NCName	*:address	Matches all elements with a given local-name, regardless of their namespace	-0.25
«*»	*	Matches all elements	-0.5

The priority is used when conflicts arise. For example, if the stylesheet specifies:

  <xsl:strip-space elements="*"/>
  <xsl:preserve-space elements="para clause"/>

then whitespace-only text nodes appearing within a <para> or <clause> will be preserved. Even though these elements match both the <xsl:strip-space> and the <xsl:preserve-space>, the NameTest in the latter has higher priority (0 as compared to -0.5).

An <xsl:strip-space> or <xsl:preserve-space> element containing several NameTests is equivalent to writing a separate <xsl:strip-space> or <xsl:preserve-space> element for each NameTest individually.

A whitespace-only text node is preserved if there is no <xsl:strip-space> element in the stylesheet that matches its parent element.

A whitespace-only text node is removed from the tree if there is an <xsl:strip-space> element that matches the parent element, and no <xsl:preserve-space> element that also matches.

If there is an <xsl:strip-space> element that matches the parent element, and also an <xsl:preserve-space> element that matches, then the decision depends on the import precedence and priority of the respective rules. Taking into consideration all the <xsl:strip-space> and <xsl:preserve-space> elements that match the parent element of the whitespace-only text node, the XSLT processor takes the one with highest import precedence (as defined in the rules for <xsl:import> on page 312). If there is more than one element with this import precedence, it takes the one with highest priority, as defined in the table above. If there is still more than one, it may either report an error, or choose the one that comes last in declaration order. If the chosen element is <xsl:preserve-space>, the whitespace-only text node is preserved on the tree; if it is <xsl:strip-space>, it is removed from the tree.
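
The fact that import precedence is considered before priority can be sketched as follows. The module «base.xsl» and its contents are assumptions made for the illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- base.xsl is a hypothetical module, assumed to contain:
       <xsl:preserve-space elements="para"/> -->
  <xsl:import href="base.xsl"/>

  <!-- This declaration has higher import precedence than anything in
       base.xsl, so it wins for <para> even though "*" has a lower
       priority (-0.5) than "para" (0) -->
  <xsl:strip-space elements="*"/>

</xsl:stylesheet>
```

Under these assumptions, whitespace-only text nodes in a <para> element are stripped. Had both declarations appeared in the same module, the priority comparison would have decided in favor of <xsl:preserve-space>.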

In deciding whether to strip or preserve a whitespace-only text node, only its immediate parent element is considered in the above rules. The rules for its other ancestors make no difference. The element itself, of course, is never removed from the tree: the stripping process will only remove text nodes.
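
As a sketch of the parent-only rule, suppose the source contains <section> elements with <para> children (both names are hypothetical) and the stylesheet declares the following:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:strip-space elements="*"/>

  <!-- Applies only to text nodes whose parent is <section>;
       it is not inherited by <para> or any other descendant -->
  <xsl:preserve-space elements="section"/>

</xsl:stylesheet>
```

With these declarations, whitespace-only text nodes that are direct children of a <section> are kept, while those that are children of a <para> inside the <section> are stripped, because <para> itself is matched only by the <xsl:strip-space> declaration.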

If an individual element has the XML-defined attribute «xml:space="preserve"» or «xml:space="default"», this overrides anything defined in the stylesheet. These values, unlike <xsl:preserve-space> and <xsl:strip-space>, do affect descendant elements as well as the element on which the attribute appears. If an <xsl:strip-space> doesn't seem to be having any effect, one possible reason is that the element type in question is declared in the DTD to have an xml:space attribute with a default value of «preserve». There is no way of overriding this in the stylesheet.
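
A DTD declaration of the kind described might look like the sketch below; the element names are invented for the illustration.

```xml
<!DOCTYPE listing [
  <!ELEMENT listing (line*)>
  <!ELEMENT line (#PCDATA)>
  <!-- Every <listing> gets xml:space="preserve" unless it says otherwise -->
  <!ATTLIST listing xml:space (default|preserve) "preserve">
]>
<listing>
  <line>  </line>
</listing>
```

A parser that reads the DTD supplies the defaulted attribute, so the XSLT processor sees <listing xml:space="preserve"> even though the attribute is not written in the instance. The whitespace-only text nodes within it are then preserved regardless of any <xsl:strip-space> declaration.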

Usage

For many categories of source document, especially those used to represent data structures, whitespace-only text nodes are never significant, so it is useful to specify:

  <xsl:strip-space elements="*"/>

which will remove them all from the tree. There are two main advantages in stripping these unwanted nodes:

When <xsl:apply-templates> is used with a default select attribute, all child nodes will be processed. If whitespace-only text nodes are not stripped, they too will be processed, probably leading to the whitespace being copied to the output destination.
When the position() function is used to determine the position of an element relative to its siblings, the whitespace-only text nodes are included in the count. This often leads to the significant nodes being numbered 2, 4, 6, 8, ....
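
Both effects can be seen in a small sketch such as the following, which assumes a source document consisting of a <list> element whose <item> children are separated by line breaks and indentation. The element names are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Remove this declaration to see the whitespace nodes being
       processed and counted -->
  <xsl:strip-space elements="list"/>

  <xsl:template match="list">
    <!-- The default select is child::node(), so any surviving
         whitespace-only text nodes are selected too -->
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="item">
    <xsl:value-of select="position()"/>
    <xsl:text>: </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

With the declaration in place, two items are numbered 1 and 2. Without it, the whitespace-only text nodes before, between, and after the items are included in the sequence being processed, so the same items are numbered 2 and 4, and the built-in template rule for text nodes copies the whitespace itself to the output.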

Generally speaking, it is a good idea to strip whitespace-only text nodes belonging to elements that have element content, that is, elements declared in the DTD as containing child elements but no #PCDATA, or declared in a schema to have a complex type with «mixed="false"».

It also usually does no harm to strip whitespace-only text nodes from elements declared as having simple content; that is, elements whose only children are text nodes. In most cases, an element containing whitespace text is equivalent to an empty element, so stylesheet logic can be simplified if elements containing whitespace only are normalized to be empty by removing the text node.
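
The simplification might look like the sketch below, in which the element name «price» and the output text are assumptions made for the illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Normalize a whitespace-only <price> element to an empty one -->
  <xsl:strip-space elements="price"/>

  <!-- After stripping, a single test covers both an empty <price/>
       and a <price> that contained only whitespace -->
  <xsl:template match="price[not(node())]">
    <xsl:text>no price given</xsl:text>
  </xsl:template>

  <xsl:template match="price">
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Without the <xsl:strip-space> declaration, the first rule would not match a <price> element containing only spaces, and the stylesheet would need an extra test, for example one based on normalize-space().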

By contrast, stripping whitespace-only text nodes from elements with mixed content (that is, elements declared in the DTD or schema to contain both child elements and #PCDATA) is often a bad idea. For example, consider the element below:

  <quote>He went to <edu>Balliol College</edu> <city>Oxford</city> to read
  <subject>Greats</subject></quote>

The space between the <edu> element and the <city> element is a whitespace-only text node, and it should be preserved, because otherwise when the tags are removed by an application that's only interested in the text, the words «College » and «Oxford » will run together.
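
One way to protect this space while still stripping whitespace elsewhere is sketched below. It assumes that <quote> is the only mixed-content element in the source; a real document would list every such element.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:strip-space elements="*"/>

  <!-- Mixed content: keep whitespace-only text nodes that are
       direct children of <quote> -->
  <xsl:preserve-space elements="quote"/>

  <xsl:template match="quote">
    <!-- The string value concatenates all descendant text nodes,
         including the preserved space between <edu> and <city> -->
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

If the <xsl:preserve-space> declaration is omitted, the string value of the <quote> element runs «College» and «Oxford» together.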

It's worth noting that many XSLT processors do not physically remove whitespace text nodes from the tree; they only behave as if they did. Whether the nodes are physically removed or whether the processor creates a view of the tree in which these nodes are invisible, whitespace stripping can incur a significant cost. However, if whitespace is stripped while the tree is being built from serial XML input, the performance arguments are reversed: it then becomes cheaper to remove the whitespace nodes than to preserve them. Generally, if whitespace is insignificant then it's best to get rid of it as early as possible.

Examples

To strip whitespace nodes from all elements of the source tree:

  <xsl:strip-space elements="*"/>

To strip whitespace nodes from selected elements:

  <xsl:strip-space elements="book author title price"/>

To strip whitespace nodes from all elements except the <description> element:

  <xsl:strip-space elements="*"/>
  <xsl:preserve-space elements="description"/>

To strip whitespace nodes from all elements except those in the namespace with URI http://mednet.org/text :

  <xsl:strip-space elements="*"/>
  <xsl:preserve-space elements="mednet:*"
      xmlns:mednet="http://mednet.org/text"/>