xsl:sort

The <xsl:sort> element is used to define a component of a sort key. It is used within an <xsl:apply-templates>, <xsl:for-each>, <xsl:for-each-group>, or <xsl:perform-sort> instruction to define the order in which this instruction processes its data.

Changes in 2.0

The <xsl:sort> element may now be used within <xsl:for-each-group> and <xsl:perform-sort> as well as within <xsl:apply-templates> and <xsl:for-each>.

The value of the sort key may be calculated using an enclosed sequence constructor, as an alternative to using the select attribute.

In XSLT 1.0, <xsl:sort> was always used to sort a set of nodes. In 2.0 it is generalized so that it can sort any sequence.

A collation attribute has been added to allow the collating sequence for strings to be specified by means of a URI.

Sorting is now sensitive to the data types of the items being sorted. For example, if the sort key values are numeric, they will be compared as numbers rather than as strings.

Format

  <xsl:sort
    select? = expression
    lang? = { nmtoken }
    order? = { "ascending" | "descending" }
    collation? = { uri }
    stable? = { "yes" | "no" }
    case-order? = { "upper-first" | "lower-first" }
    data-type? = { "text" | "number" | qname-but-not-ncname }>
    <!-- Content: sequence-constructor -->
  </xsl:sort>

Position

<xsl:sort> is always a child of <xsl:apply-templates>, <xsl:for-each>, <xsl:for-each-group>, or <xsl:perform-sort>. Any number of sort keys may be specified, in major-to-minor order.

When used in <xsl:for-each>, <xsl:for-each-group>, or <xsl:perform-sort>, any <xsl:sort> elements must appear before the sequence constructor of the containing element.

When used in <xsl:apply-templates>, the <xsl:sort> elements can come before or after any <xsl:with-param> elements.

Attributes

Name	Value	Meaning
select optional	Expression	Defines the sort key.
order optional	Attribute value template returning «ascending» | «descending»	Defines whether the nodes are processed in ascending or descending order of this key. The default is «ascending».
case-order optional	Attribute value template returning «upper-first» | «lower-first»	Defines whether upper-case letters are to be collated before or after lower-case letters. The default is language-dependent.
lang optional	Attribute value template returning a language code	Defines the language whose collating conventions are to be used. The default depends on the processing environment.
data-type optional	Attribute value template returning «text» | «number» | QName	Defines whether the values are to be collated alphabetically or numerically, or using a user-defined data type. The default is «text».
collation optional	Attribute value template returning collation URI	The collation URI identifies how strings are to be compared with each other.
stable optional	Attribute value template returning «yes» | «no»	This attribute is allowed only on the first <xsl:sort> element; if set to «no», it indicates that there is no requirement to retain the original order of items that have equal values for all the sort keys.

A number of these attributes can be written as attribute value templates. The context item, context position, and context size for evaluating these attribute value templates are the same as the context for evaluating the select attribute of the containing instruction (that is, <xsl:for-each>, <xsl:apply-templates>, <xsl:for-each-group>, or <xsl:perform-sort>).

Content

The element may contain a sequence constructor. This is used to compute the sort key value, as an alternative to using the select attribute. The two are mutually exclusive: If the select attribute is present, the element must be empty, and if it is not empty, the select attribute must be omitted.
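As a minimal sketch of a sort key computed by a sequence constructor, the stylesheet below sorts books by a sort-as attribute where one is present, and by the title otherwise. The <books>, <book>, and <title> elements and the sort-as attribute are hypothetical names chosen for illustration; they do not come from the specification.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed input: <books><book sort-as="..."><title>...</title></book>...</books> -->
  <xsl:template match="books">
    <titles>
      <xsl:for-each select="book">
        <!-- No select attribute: the content of xsl:sort computes the key -->
        <xsl:sort>
          <xsl:choose>
            <xsl:when test="@sort-as">
              <xsl:sequence select="string(@sort-as)"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:sequence select="string(title)"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:sort>
        <title><xsl:value-of select="title"/></title>
      </xsl:for-each>
    </titles>
  </xsl:template>

</xsl:stylesheet>
```

The sketch uses <xsl:sequence> rather than <xsl:value-of> inside the sort key so that the key keeps the type of the selected value; <xsl:value-of> would produce a text node, whose value is compared as a string.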

Effect

The list of <xsl:sort> elements appearing for example within an <xsl:apply-templates> or <xsl:for-each> element determines the order in which the selected items are processed. The items are sorted first by the first sort key; any group of items that have duplicate values for the first sort key are then sorted by the second sort key, and so on.

It's useful to start by establishing some clear terminology. There is a tendency to use the phrase sort key with several different meanings, often in the same sentence, so to avoid confusion I'll try to stick to the more precise terms that are used in the XSLT specification itself.

A collection of <xsl:sort> elements, which together define all the criteria for performing a sort, is called a sort key specification.
A single <xsl:sort> element within the sort key specification is called a sort key component. Often, of course, there will only be one component in a sort key specification.
The result of evaluating a sort key component for one of the items to be sorted is called a sort key value.

So if you are sorting by last name and then first name, the sort key specification is "last name, then first name;" the sort key components are "last name" and "first name," and the sort key values are strings such as "Kay" and "Michael".

It's also useful to be clear about how we describe the sorting process:

The sequence that provides the input to the sort operation is called the initial sequence. In XSLT 1.0 the initial sequence was always a set of nodes in document order, but in 2.0 it can be any sequence of items (nodes or atomic values) in any order.
The sequence that is produced as the output of the sort operation is called (naturally enough) the sorted sequence.

The overall rules for the sorting operation are fairly intuitive, but it's worth stating them for completeness:

Given two items A and B in the initial sequence, their relative positions in the sorted sequence are determined by evaluating all their sort key values, one for each component in the sort key specification.
The relative positions of A and B depend on the first pair of sort key values that is different for A and B; the second pair of sort key values needs to be considered only if the first sort key values for the two items are equal, and so on. For example, if you are sorting by last name and then first name, the system will only need to consider the first name for two individuals who have the same last name. I will explain later exactly what it means for one sort key value to be considered equal to another.
Considering only this pair of sort key values, A comes before B in the sorted sequence if A's value for this sort key component is less than B's value, unless «order="descending"» is specified for this sort key component, in which case it comes after B. I will explain later what it means for one sort key value to be "less than" another.
If all the sort key values for A and B are the same, then A and B appear in the sorted sequence in the same relative positions that they had in the initial sequence (the technical term for this is that the sort is stable). However, if «stable="no"» is specified on the first <xsl:sort> element, this requirement is waived, and the system can return duplicates in any order.
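The rules above can be sketched with a two-component sort key specification. The <staff>, <person>, <last-name>, and <first-name> element names are assumptions made for this illustration only.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed input:
       <staff><person><last-name>...</last-name><first-name>...</first-name></person>...</staff> -->
  <xsl:template match="staff">
    <staff>
      <xsl:for-each select="person">
        <!-- Major key: last name, ascending by default -->
        <xsl:sort select="last-name"/>
        <!-- Minor key: consulted only when the last names compare equal;
             here the later first name comes first -->
        <xsl:sort select="first-name" order="descending"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </staff>
  </xsl:template>

</xsl:stylesheet>
```

Two <person> elements with identical last and first names keep their original relative order, because the sort is stable. Adding «stable="no"» to the first <xsl:sort> would release the processor from that obligation.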

The sort key value for each item in the initial sequence is established by evaluating the expression given in the select attribute, or by evaluating the contained sequence constructor. These are mutually exclusive:

If there is a select attribute, then the <xsl:sort> element must be empty. If neither a select attribute nor a sequence constructor is present, the default is equivalent to specifying «select="."».

The select expression (or sequence constructor) is evaluated for each item, with this item as the context item, its position in the initial sequence as the context position, and with the number of items in the initial sequence as the context size.

This means that if you want to process a sequence in reverse order, you can specify a sort key as:

  <xsl:sort select="position()" order="descending" />

You can also achieve other crafty effects: Try, for example, sorting by «position() mod 3». This can be useful if you need to arrange data vertically in a table.
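To see what a position-based key does, here is a small sketch that sorts the integers 1 to 9. It uses the variant «(position() - 1) mod 3», which is my own adjustment so that the first item lands in the first group; the stylesheet ignores the content of whatever source document it is given.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <order>
      <xsl:for-each select="1 to 9">
        <!-- In the sort key, position() is the position in the initial sequence -->
        <xsl:sort select="(position() - 1) mod 3"/>
        <xsl:value-of select="."/>
        <!-- Here position() is the position in the sorted sequence -->
        <xsl:if test="position() != last()">
          <xsl:text> </xsl:text>
        </xsl:if>
      </xsl:for-each>
    </order>
  </xsl:template>

</xsl:stylesheet>
```

Because the sort is stable, the result is <order>1 4 7 2 5 8 3 6 9</order>. Laid out three to a row, these items read 1, 2, 3 down the first column, 4, 5, 6 down the second, and 7, 8, 9 down the third.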

This just leaves the question of how the system decides whether one sort key value is equal to or less than another. The basic rule is that the result is decided using the XPath «eq» and «lt» operators. These are essentially the same as the «=» and «<» operators, except that they only compare a single atomic value to another single atomic value, and perform no type conversions other than numeric promotion (which means, for example, that if one operand is an integer and the other is a double, the integer will be converted to a double in order to perform the comparison).

There are several caveats to this general rule:

The «lt» operator may raise an error when comparing values of different types (such as a number and a date) or when comparing two values of the same type for which no ordering relation is defined (for example, instances of the type xs:QName). If this happens then the XSLT processor has a choice: It can either treat this as a fatal error, or it can continue by assigning an implementation-defined ordering to these items. (This might mean, for example, that if you sort a mixture of strings and numbers, the output will contain all the strings followed by all the numbers, or all the numbers followed by all the strings.)
The <xsl:sort> element has an attribute data-type. This is only there for backwards compatibility with XSLT 1.0, but you can still use it. If the value of the attribute is «text», then the sort key values are converted to strings (using the XPath casting rules) before being compared. If the value is «number», then they are cast to the type xs:double. The attribute also allows the value to be a prefixed QName, but the meaning of this entirely depends on the implementation. The feature was probably added to XSLT 1.0 to anticipate the use of schema-defined type names such as xs:date, and the implementation may allow this usage, but it's not defined in the standard. Instead, if you want to convert the sort key values to a particular type to do the comparison, you can achieve this using a cast or constructor function within the select attribute: for example, «select="xs:date(@date-of-birth)"».
Another option that has been retained for backwards compatibility with XSLT 1.0 is the ability to supply a sequence of values as the result of the select attribute, rather than a single value. XSLT 1.0 allowed the value of the expression to be a node set (a sequence of nodes in document order), and took the string value of the first node in the set as the effective value of the sort key, ignoring any other nodes. This behavior (generalized to any sequence of items) is still available in XSLT 2.0, but only if running in backwards compatibility mode. The <xsl:sort> element is in backwards compatibility mode if this element, or an ancestor of this element in the stylesheet module, has a version attribute (or xsl:version in the case of a literal result element) whose value is «1.0». If this is not the case, then supplying a sequence of more than one item as the sort key value will cause an error.
It is possible that evaluating a sort key value will return the empty sequence. XSLT specifies that for sorting purposes, the empty sequence is considered to be equal to itself and less than any other value.
Another possibility is that when evaluating a numeric sort key value, the value will be the special value NaN (not a number). This would happen, for example, if you specify «select="number(@price)"» and the element has no price attribute, or a price attribute whose value is «$10.00» (the «$» sign will cause the conversion to a number to fail). XSLT specifies that for sorting purposes, NaN is considered equal to itself, and less than any other numeric value (but greater than an empty sequence). This is different from the results of using the XPath comparison operators, where «eq» returns false if both operands are NaN.
Last and not least, if the two values are strings, then they are compared using a collation. Collations are a broad subject and so we will devote a separate section to them.
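As an illustration of the typed comparison described in these caveats, the following sketch sorts by date using a constructor function in the select attribute. The <people> and <person> elements are assumed names; the date-of-birth attribute is taken from the example in the text and is assumed to hold ISO 8601 dates such as «1951-10-11».

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Assumed input: <people><person date-of-birth="1951-10-11"/>...</people> -->
  <xsl:template match="people">
    <people>
      <xsl:for-each select="person">
        <!-- xs:date() yields date values, which are compared chronologically.
             A person with no date-of-birth attribute gives an empty sort key,
             which sorts before every other value. -->
        <xsl:sort select="xs:date(@date-of-birth)"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </people>
  </xsl:template>

</xsl:stylesheet>
```

An attribute value that is not a valid date makes the constructor function fail, so in practice you may want to validate or filter the input first.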

The order attribute specifies whether the order is ascending or descending. Descending order will produce the opposite result of ascending order. This means, for example, that NaN values will appear last rather than first, and also that the effect of the case-order attribute is reversed: If you specify «case-order="upper-first"» with «order="descending"», then drall will come before Drall.

The final sorted order of the items determines the sequence in which they are processed by the containing <xsl:apply-templates>, <xsl:for-each>, or <xsl:for-each-group> instruction, or the order in which they are returned by the containing <xsl:perform-sort> instruction. While the sorted sequence is being processed, the value of position() will reflect the position of the current item in the sorted sequence.
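The two ways of consuming the sorted sequence can be sketched together. The stylesheet below sorts a literal sequence of integers, so it ignores the content of the source document; the <result>, <sorted>, and <item> element names are invented for the illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:template match="/">
    <!-- xsl:perform-sort returns the sorted sequence as a value -->
    <xsl:variable name="sorted" as="xs:integer*">
      <xsl:perform-sort select="(30, 10, 20)">
        <xsl:sort select="." order="descending"/>
      </xsl:perform-sort>
    </xsl:variable>
    <result>
      <sorted><xsl:value-of select="$sorted" separator=" "/></sorted>
      <!-- xsl:for-each processes the items in sorted order;
           position() counts within the sorted sequence -->
      <xsl:for-each select="(30, 10, 20)">
        <xsl:sort select="." order="descending"/>
        <item rank="{position()}"><xsl:value-of select="."/></item>
      </xsl:for-each>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

The <sorted> element contains «30 20 10», and the three <item> elements carry rank 1 for 30, rank 2 for 20, and rank 3 for 10.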

Collations

When the sort key values to be compared are strings, they are compared using a collation. A collation is essentially a mechanism for deciding whether two strings such as «polish» and «Polish» are considered equal, and if not, which of them should come first.

Choosing the right collation depends on the data that you are sorting, and on the expectations of the users. These expectations vary by country (in Germany, in most modern publications, «ä» is sorted along with «a», but in Sweden, it is sorted after «z») and also vary according to the application. Telephone directories, dictionaries, gazetteers, and back-of-book indexes each have their own rules.

XSLT 2.0 and XPath 2.0 share similar mechanisms for dealing with collations, because they are needed not only in sorting, but also in defining what operators such as «eq» mean, and in functions such as distinct-values(). The assumption behind the design is that many computing environments (for example the Windows operating system, the Java virtual machine, or the Oracle database platform) already include extensive mechanisms for defining and customizing collations, and that XSLT processors will be written to take advantage of these. As a result, sorting order will not be identical between different implementations.

The basic model is that a collation (a set of rules for determining string ordering) is identified by a URI. Like a namespace URI, this is an abstract identifier, not necessarily the location of a document somewhere on the Web. The form of the URI, and its meaning, is entirely up to the implementation. At the time of writing there is talk of IANA (the Internet Assigned Numbers Authority) setting up a register of collation names, but even if this comes to fruition, it will still be up to the implementation to decide whether to support these registered collations or not. Until such time, the best you can do to achieve interoperability is pass the collation URI to the stylesheet as a parameter; the API can then sort out the logic for choosing different collations according to which processor you are using.

The Unicode consortium has published an algorithm for collating strings called the Unicode Collation Algorithm (see http://www.unicode.org/unicode/reports/tr10/index.html). Although the XSLT specification refers to this document, it doesn't say that implementations have to support it. In practice, many of the facilities available in platforms such as Windows and Java are closely based on this algorithm. The Unicode Collation Algorithm is not itself a collation, because it can be parameterized. Rather, it is a framework for defining a collation with the particular properties that you are looking for.

You can specify the URI of the collation to be used in the collation attribute of the <xsl:sort> element. This is an attribute value template, so you can write <xsl:sort collation="{$collation-uri}"/> to use a collation that has been passed to the stylesheet as a parameter.
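A minimal sketch of this technique follows. The parameter is declared as required so that the calling application must supply a URI that its processor recognizes; the <cities> and <city> element names are assumptions made for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Supplied by the caller; which URIs are recognized depends on the processor -->
  <xsl:param name="collation-uri" required="yes"/>

  <!-- Assumed input: <cities><city>...</city>...</cities> -->
  <xsl:template match="cities">
    <cities>
      <xsl:for-each select="city">
        <!-- The curly braces make this an attribute value template,
             evaluated when the stylesheet runs -->
        <xsl:sort select="." collation="{$collation-uri}"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </cities>
  </xsl:template>

</xsl:stylesheet>
```

Keeping the URI out of the stylesheet means the same stylesheet can be run on different processors, with the choice of collation made in the code that invokes the transformation.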

There is one collation URI that every implementation is required to support, called the Unicode codepoint collation (not to be confused with the Unicode Collation Algorithm mentioned earlier). In the current drafts of the specification this is selected using the URI

http://www.w3.org/2003/11/xpath-functions/collation/codepoint

but the final URI is likely to change as the specification reaches Recommendation status.

Under the codepoint collation, strings are simply compared using the numeric code values of the characters in the string: If two characters have the same Unicode codepoint they are equal, and if one has a numerically lower Unicode codepoint, then it comes first. This isn't a very sophisticated or user-friendly algorithm, but it has the advantage of being cheap and cheerful. If you are sorting strings that use a limited alphabet, for example part numbers, then it is probably perfectly adequate.

If you specify a collation that the implementation doesn't recognize, then it raises an error. However, the word "recognize" is deliberately vague. An implementation could choose to recognize every possible collation URI that you might throw at it, and never raise this error at all. More probably, an implementation might decide to use parameterized URIs (for example, allowing a component such as «language=fr» to select the target language), and it's then an implementation decision whether to "recognize" a URI that contains invalid or missing parameters.

If you don't specify the collation attribute on <xsl:sort>, you can provide a hint as to what kind of collation you want by specifying the lang and/or case-order attributes. These are retained from XSLT 1.0, which didn't support explicit collation URIs, but they are still available for use in 2.0.

The lang attribute specifies the language whose collation rules are to be used (this might be the language of the data, or the language of the target user). Its value is specified in the same way as the standard xml:lang attribute defined in the XML specification, for example «lang="en-US"» refers to U.S. English and «lang="fr-CA"» refers to Canadian French.
Knowing the language doesn't help you decide whether upper-case or lower-case letters should come first (every dictionary in the world has its own rules on this), so XSLT makes this a separate attribute, case-order. Generally case order will be used only to decide the ordering of two words that compare equal if case is ignored. For example, in German, where an initial upper-case letter can change the meaning of a word, some dictionaries list the adjective drall (meaning plump or buxom) before the unrelated noun Drall (a swerve, twist, or bias), while others reverse the order. Specifying «case-order="lower-first"» would place drall immediately before Drall, while «case-order="upper-first"» would have Drall immediately followed by drall.
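The two hints can be combined on a single <xsl:sort> element, as in the sketch below. The <words> and <word> element names are hypothetical, and because lang and case-order are only hints, the exact ordering you get will depend on the collations your processor has available.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed input: <words><word>Drall</word><word>drall</word>...</words> -->
  <xsl:template match="words">
    <words>
      <xsl:for-each select="word">
        <!-- Ask for German collating conventions, with lower case
             before upper case when words differ only in case -->
        <xsl:sort select="." lang="de" case-order="lower-first"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </words>
  </xsl:template>

</xsl:stylesheet>
```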

Usage

In this section we'll consider two specific aspects of sorting that tend to be troublesome. The first is the choice of a collation, and the second is how to achieve dynamic sorting, that is, sorting on a key chosen by the user at runtime, perhaps by clicking on a column heading.

Using Collations

XSLT is designed to be capable of handling serious professional publishing applications, and clearly this requires some fairly powerful sorting capabilities. In practice, however, the most demanding applications almost invariably have domain-specific collating rules; for instance, the rules for sorting personal names in a telephone directory are unlikely to work well for geographical names in a gazetteer. This is why the working groups decided to make the specification so open-ended in its support for collations.

Collations based on the Unicode collation algorithm generally assign each character in the sort key value a set of weights. The primary weight distinguishes characters that are fundamentally different: «A» is different from «B». The secondary weight distinguishes secondary differences, for example the distinction between «A» and «Ä». The tertiary weight is used to represent the difference between upper and lower case, for example «A» and «a». The way that weights are used varies a little in non-Latin scripts, but the principles are similar.

Rather than looking at each character separately, the Unicode collation algorithm compares two strings as a whole. First it looks to see if there are two characters whose primary weights differ; if so, the first such character determines the ordering. If all the primary weights are the same, it looks at the secondary weights, and it only considers the tertiary weights if all the secondary weights are the same. This means for example that in French, «attache» sorts before «attaché», which in turn sorts before «attachement». The acute accent is taken into account when comparing «attache» with «attaché», because there is no primary difference between the strings, but it is ignored when comparing «attaché» with «attachement», because in this case there is a primary difference.

The strength of a collation determines what kind of differences it takes into account. For comparing equality between strings, it is often appropriate to use a collation with weak strength: For example a collation with primary strength will treat «attache» as equal to «attaché». (In French this is not necessarily the right thing to do, as these two words have completely different meanings; but it would be appropriate if there is a high possibility that accents have been omitted from one of the strings.) When sorting, however, it is almost always best to use a collation with high strength, which will take secondary and if necessary tertiary differences into account when there are no primary differences.

Sometimes it is better, rather than defining two separate sort key components, to concatenate the sort key values into a single sort key component. For example, if you define a single sort key component as:

  <xsl:sort select="concat(last-name, ' ', first-name)"/>

this might give better results than

  <xsl:sort select="last-name"/>
  <xsl:sort select="first-name"/>

This is because in the second case above, a tertiary difference in the last name is considered more significant than a primary difference in the first name: So «MacMillan Tricia» will sort before «Macmillan Harold». When the sort key values are concatenated, the difference between «MacMillan» and «Macmillan» is only taken into account when the first names are the same.

Dynamic Sort Keys

The select attribute contains an expression which is evaluated for each node, with the node as the context node, to give the value that determines the node's position in the sequence. There's no direct way of specifying that you want to use different sort keys on different occasions. I've seen people try to write things like:

  <xsl:param name="sort-key"/>
  . . .
  <xsl:for-each select="BOOK">
    <xsl:sort select="$sort-key"/>

hoping that if $sort-key is set to «TITLE», the elements will be sorted by title, and that if $sort-key is set to «AUTHOR», they will be sorted by author. This doesn't work: The variable $sort-key will have the same value for every <BOOK> element, so all the sort key values are equal and the books will always be output in their original, unsorted order. In this case, where the sort key is always a child element of the elements being sorted, you can achieve the required effect by writing:

  <xsl:sort select="*[local-name()=$sort-key]"/>
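Put into a complete stylesheet, the technique might look like the sketch below. The <BOOKS> wrapper element and the default value of the parameter are assumptions; the «[1]» predicate is my own addition, which guards against the error that XSLT 2.0 raises when a sort key selects more than one item (for example, a book with two authors).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Name of the child element to sort on, for example TITLE or AUTHOR -->
  <xsl:param name="sort-key" select="'TITLE'"/>

  <!-- Assumed input: <BOOKS><BOOK><TITLE/><AUTHOR/>...</BOOK>...</BOOKS> -->
  <xsl:template match="BOOKS">
    <BOOKS>
      <xsl:for-each select="BOOK">
        <!-- Select the first child whose local name matches the parameter -->
        <xsl:sort select="*[local-name() = $sort-key][1]"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </BOOKS>
  </xsl:template>

</xsl:stylesheet>
```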

Another way of making the sort conditional is to use a conditional expression as the sort key. This is much easier in XSLT 2.0 with the introduction of conditional expressions in XPath:

  <xsl:sort select="if ($sort-key = 'title') then title
                    else if ($sort-key = 'author') then author
                    else if ($sort-key = 'isbn') then isbn
                    else publisher"/>

If the computation of the sort key is really complicated, you can do it in a sequence constructor rather than in the select attribute. This can even invoke templates or build temporary trees; there are no limits.

There are two other solutions to this problem that are worth mentioning, although both have their disadvantages:

One is to generate or modify the stylesheet before compiling it, so that it includes the actual sort key required. This technique is popular when transformations are executed in the browser, typically under the control of JavaScript code in the HTML page. The stylesheet is then typically loaded from the server and parsed into a DOM document, which can be modified in situ before the transformation starts. The disadvantage is that this means recompiling the stylesheet each time it is run: This would probably be an unacceptable overhead if the transformation is running server-side within a Web server.
Another is to use an extension function that permits the evaluation of XPath expressions that have been constructed dynamically, as strings. Such a function, dyn:evaluate(), is defined in the third-party function library at http://www.exslt.org/, and is available in this or a similar form with a number of XSLT processors including Saxon and Xalan.

Examples

I'll start with a couple of simple examples, and then show a full working example that you can download and try yourself.

Example 1: to process all the <book> children of the current node, sorting them by the value of the isbn attribute:
```
  <xsl:apply-templates select="book">
    <xsl:sort select="@isbn"/>
  </xsl:apply-templates>
```
Example 2: to output the contents of all the <city> elements in the document, in alphabetical order, including each distinct city once only:
```
  <ul>
    <xsl:for-each select="distinct-values(//city)">
      <xsl:sort select="."/>
      <li><xsl:value-of select="."/></li>
    </xsl:for-each>
  </ul>
```
If «select="."» were omitted from the <xsl:sort> element, the effect would be the same, because this is the default; however, I prefer to include it for clarity.

Sorting on the Result of a Calculation

This example outputs a list of products, sorted by the total sales of each product, in descending order.

Source

This is the file products.xml:

  <products>
    <product name="strawberry jam">
      <region name="south" sales="20.00"/>
      <region name="north" sales="50.00"/>
    </product>
    <product name="raspberry jam">
      <region name="south" sales="205.16"/>
      <region name="north" sales="10.50"/>
    </product>
    <product name="plum jam">
      <region name="east" sales="320.20"/>
      <region name="north" sales="39.50"/>
    </product>
  </products>

Stylesheet

products.xsl is a complete stylesheet written using the simplified stylesheet syntax, in which the entire stylesheet module is written as a single literal result element. Simplified stylesheets are described in Chapter 3, on page 119.

The <xsl:sort> element sorts the selected nodes (all the <product> elements) in descending order of the numerical total of the sales attribute over all their <region> child elements. The total is calculated using the sum() function (discussed in XPath 2.0 Programmer's Reference, Chapter 10), and displayed using the format-number() function (see Chapter 7, page 558).

  <products xsl:version="2.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:for-each select="products/product">
      <xsl:sort select="sum(region/@sales)" order="descending"/>
      <product name="{@name}"
               sales="{format-number(sum(region/@sales), '$####0.00')}"/>
    </xsl:for-each>
  </products>

For this to work correctly under XSLT 1.0, you need to add «data-type="number"» to the <xsl:sort> element. This is not needed with XSLT 2.0 because the data type is recognized automatically.

Output

I have added line breaks for readability:

  <products>
    <product name="plum jam" sales="$359.70"/>
    <product name="raspberry jam" sales="$215.66"/>
    <product name="strawberry jam" sales="$70.00"/>
  </products>