xsl:key

The <xsl:key> element is a top-level declaration used to declare a named key, for use with the key() function in expressions and patterns.

Changes in 2.0

The restrictions on using global variables in the match and use attributes have been eased.

It is now possible to define keys of any data type (for example numbers or dates), and to specify a collation to be used when deciding whether two string-valued keys match.

The key values for a node can now be evaluated using a sequence constructor contained in the <xsl:key> element, as an alternative to using the use attribute. This allows the evaluation to invoke templates or other XSLT constructs such as <xsl:number>.

Format

  <xsl:key
    name = qname
    match = pattern
    use? = expression
    collation? = uri>
    <!-- Content: sequence-constructor -->
  </xsl:key>

Position

<xsl:key> is a top-level declaration, which means that it must be a child of the <xsl:stylesheet> (or <xsl:transform>) element. It may appear any number of times in a stylesheet.

Attributes

Name	Value	Meaning
name (mandatory)	Lexical QName	The name of the key
match (mandatory)	Pattern	Defines the nodes to which this key is applicable
use (optional)	Expression	The expression used to determine the value of the key for each of these nodes
collation (optional)	URI	The name of a collation used to compare string-valued keys

The syntax for a Pattern is defined in Chapter 6.

The use attribute and the sequence constructor are mutually exclusive: If the use attribute is present, the element must be empty, and if it is absent, then there must be a nonempty sequence constructor.

Content

A sequence constructor. This is used as an alternative to the use attribute to determine the value of the key.
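As a sketch of the sequence-constructor form, the following hypothetical key indexes <section> elements by a computed section number such as «2.3». The element names are assumptions made for illustration; the point is that the key value comes from <xsl:number> rather than from a use attribute, which must therefore be omitted.

```xml
<!-- Hypothetical key: nested <section> elements indexed by their number -->
<xsl:key name="section-number" match="section">
  <!-- Evaluated with each matching <section> as the context node.
       It produces a text node such as "2.3"; atomizing that text node
       gives the key value. -->
  <xsl:number level="multiple" count="section" format="1.1"/>
</xsl:key>

<!-- Elsewhere in the stylesheet: process the section numbered 2.3 -->
<xsl:apply-templates select="key('section-number', '2.3')"/>
```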

Effect

The name attribute specifies the name of the key. It must be a valid lexical QName; if it contains a namespace prefix, the prefix must identify a namespace declaration that is in scope on the <xsl:key> element. The effective name of the key is the expanded name, consisting of the namespace URI and the local part of the name.
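Because the effective name is the expanded name, the prefix itself does not matter. In this sketch (the namespace URI and the <part> elements are hypothetical), the key is declared using the prefix «acme» but called using the prefix «a»; both prefixes are bound to the same URI, so they identify the same key.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:acme="http://example.com/ns/keys">
  <!-- Expanded name: {http://example.com/ns/keys}part -->
  <xsl:key name="acme:part" match="part" use="@id"/>
  <!-- A different prefix bound to the same URI names the same key -->
  <xsl:template match="/" xmlns:a="http://example.com/ns/keys">
    <xsl:copy-of select="key('a:part', 'P1')"/>
  </xsl:template>
</xsl:stylesheet>
```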

The match attribute specifies the nodes to which the key applies. The value is a Pattern, as described in Chapter 6. If a node doesn't match the pattern, then it will have no values for the named key. If a node does match the pattern, then the node will have zero or more values for the named key, as determined by the use attribute or the sequence constructor.

The simplest case is where the key values are unique. For example, consider the following source document:

  <vehicles>
    <vehicle reg="P427AGH" owner="Joe Karloff"/>
    <vehicle reg="T788PHT" owner="Prunella Higgs"/>
    <vehicle reg="V932TXQ" owner="William D. Abikombo"/>
  </vehicles>

In the stylesheet you can define a key for the registration number of these vehicles, as follows:

  <xsl:key name="vehicle-registration" match="vehicle" use="@reg"/>
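A minimal sketch of using this key: the template rule below looks up a vehicle by its registration number and outputs the owner. Applied to the <vehicles> document above, it should produce «<p>Owner: Prunella Higgs</p>».

```xml
<xsl:template match="/">
  <!-- Find the <vehicle> whose reg attribute is "T788PHT" -->
  <p>Owner: <xsl:value-of select="key('vehicle-registration', 'T788PHT')/@owner"/></p>
</xsl:template>
```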

The use attribute specifies an expression used to determine the value or values of the key. This expression doesn't have to select an attribute, like «@reg» in the example above. For example, it could select a child element. If this is a repeating child element, each instance contributes a separate index entry. If the use attribute isn't supplied, then the sequence constructor is evaluated instead.

The collation attribute identifies a collation that will be used when the key value is a string. If no collation is specified, an implementation-defined default collation is used. Collations are discussed under <xsl:sort> on page 427. You need to decide whether you want to use a weak collation, in which strings such as «ASCII» and «ascii» are considered equivalent, or a strong collation, in which they are considered to be different. You might also need to consider what language should be used to define the matching rules. One of the options, which might be the best choice when you are comparing strings such as part numbers, is to use the Unicode Codepoint Collation, which considers two strings to be equal only if they use the same Unicode characters, as identified by their codepoint values. You can request this using the collation URI:

http://www.w3.org/2003/11/xpath-functions/collation/codepoint

The final URI will be determined only when the specification reaches Candidate Recommendation status.
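As a sketch, a key over hypothetical <part> elements that requests the codepoint collation might look like this. It uses the draft URI quoted above, which may need to change to whatever URI the final specification adopts.

```xml
<!-- Hypothetical: part numbers must match character for character -->
<xsl:key name="part-number" match="part" use="string(@number)"
         collation="http://www.w3.org/2003/11/xpath-functions/collation/codepoint"/>

<!-- With the codepoint collation, this does not find a part
     whose number attribute is "AB-100" -->
<xsl:copy-of select="key('part-number', 'ab-100')"/>
```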

XSLT 2.0 allows the pattern in the match attribute, and the expression in the use attribute, to reference global variables and to use the key() function to access other keys. But the definitions must not be circular: For example, if a key K makes use of the global variable V, then the value of V must not depend in any way on the key K.
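The following sketch shows both relaxations. The parameter, element, and attribute names (min-price, product, category, and so on) are hypothetical. The first key uses a global parameter in its match pattern; the second uses another key in its use expression. Neither definition is circular.

```xml
<!-- Hypothetical global parameter referenced from a key pattern -->
<xsl:param name="min-price" select="100"/>

<!-- Index only products priced at or above $min-price, by category.
     It would be a circularity error if $min-price were itself
     computed using key('premium-by-category', ...). -->
<xsl:key name="premium-by-category"
         match="product[@price >= $min-price]" use="@category"/>

<!-- One key defined in terms of another -->
<xsl:key name="category-by-code" match="category" use="@code"/>
<xsl:key name="product-by-category-name" match="product"
         use="key('category-by-code', @category)/@name"/>
```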

The formal rules are as follows. For each node that matches the pattern, the use expression or the sequence constructor is evaluated with that node as the context node, and with the context position and context size both set to 1 (one). Then:

The result of evaluating the expression is atomized. Atomization is described in Chapter 2 (page 73); its effect is to extract the typed value of any nodes in the result sequence. Note that the typed value may be a sequence.
Each atomic value in the atomized result contributes one value for the key.
The resulting atomic values can be used to locate the node when supplied in the second argument of the key() function. The node is selected by the key() function if any of the key values (as established by the procedure just described) is equal to any of the values in the sequence supplied to the key() function. The comparison is done using the XPath «eq» operator and is sensitive to the type of the values; if the values are strings, then the selected collation is used.
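Because each atomic value in the result contributes one key value, an expression that returns a sequence of strings gives the node several key values. As a sketch, assuming hypothetical <article> elements with a space-separated keywords attribute (and no schema, so the attribute's typed value is a single untyped string), tokenizing the attribute indexes each article under each keyword separately:

```xml
<!-- Hypothetical source: <article keywords="xslt keys  indexing">...</article> -->
<!-- normalize-space() removes leading, trailing, and repeated spaces, so
     tokenize() yields "xslt", "keys", "indexing" and no empty strings;
     a missing attribute yields no key values at all -->
<xsl:key name="keyword" match="article"
         use="tokenize(normalize-space(@keywords), ' ')"/>

<!-- Finds every article that lists "keys" among its keywords -->
<xsl:apply-templates select="key('keyword', 'keys')"/>
```

Without the call on tokenize(), the article would have a single key value, the whole attribute string.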

There is no rule that stops two nodes having the same key value: declaring a key for vehicle registration numbers in the example above does not mean that each registration number must be different. Conversely, a node can have more than one key value. The relationship between nodes and key values is therefore many-to-many.

To describe the rules more formally, each named key can be considered as a set of node-value pairs. A node can be associated with multiple values and a value can be associated with multiple nodes. The value can be of any atomic type. A node-value pair (N, V) is present in this set if node N matches the pattern specified in the match attribute, and if the value of the expression in the use attribute (or of the sequence constructor), when applied to node N, produces a sequence that after atomization contains the value V.

To complicate things a bit further, there can be more than one <xsl:key> declaration in the stylesheet with the same name. The set of node-value pairs for the key is then the union of the sets produced by each <xsl:key> declaration independently. The import precedence of the key declarations makes no difference. All the <xsl:key> declarations with a given name must have the same values for the collation attribute, if present.

A key can be used to select nodes in any document, not just the principal source document. This includes a document constructed as a temporary tree. The key() function always returns nodes that are in the same document as the context node at the time it is called, or the node supplied in the third argument to the function if present. It is therefore best to think of there being one set of node-value pairs for each named key for each document.
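As a sketch of searching a temporary tree, the stylesheet fragment below holds some hypothetical lookup data in a global variable and passes that variable as the third argument of key(), so the search is made in the temporary tree rather than in the source document containing the <order> element.

```xml
<!-- Hypothetical lookup data held in a temporary tree -->
<xsl:variable name="countries">
  <country code="GB" name="United Kingdom"/>
  <country code="FR" name="France"/>
</xsl:variable>

<xsl:key name="country-code" match="country" use="@code"/>

<xsl:template match="order">
  <!-- The third argument directs the search to the temporary tree -->
  <xsl:value-of select="key('country-code', @country, $countries)/@name"/>
</xsl:template>
```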

The effect of calling «key(K, V, D)», where K is a key name, V is a string value, and D is a node, is to locate the set of node-value pairs for the key named K in the document containing D, and to return a sequence containing the nodes from each pair where the value is V. If there are several nodes with this value, they are always returned in document order.

If you like to think in SQL terms, imagine a table KEY-VALUES with four columns: KEY-NAME, DOCUMENT, NODE, and VALUE. Then calling «key('K', 'V', 'D')» is equivalent to the SQL statement:

  SELECT DISTINCT NODE FROM KEY-VALUES
  WHERE KEY-NAME='K'
    AND VALUE='V'
    AND DOCUMENT='D';

Usage and Examples

Declaring a key has two effects: It simplifies the code you need to write to find the nodes with given values, and it is likely to make access faster.

The performance effect, of course, depends entirely on the implementation. It would be quite legitimate for an implementation to conduct a full search of the document each time the key() function was called. In practice, however, most implementations are likely to build an index or hash table, so there will be a one-time cost in building the index (for each document), but after this, access to nodes whose key value is known should be very fast.

The <xsl:key> element is usually used to index elements, but in principle it can be used to index any kind of node except namespace nodes.

Keys versus IDs

An alternative to using keys is to use XML-defined IDs. If you have attributes defined in a DTD or schema as being of type ID, you can find an element with a particular ID value using the id() function described in XPath 2.0 Programmer's Reference .

Why would you prefer to use keys, rather than relying on ID values? Keys have many advantages:

ID values must be simple attributes of the elements they identify; they cannot be anything more complex.
ID values must be unique.
You cannot have two different sets of ID values in the same document, for example ISBNs and acquisition numbers; if you do, you have to be sure they will not clash with each other.
ID values must take the form of XML names; for example, they cannot contain characters such as «/» and «+».
ID values are not recognized in a source document unless it is parsed with a validating XML parser or schema processor that reports attribute types to the XSLT processor. XML parsers are not required to behave in this way, so it is easy to end up with configuration problems that result in IDs not being recognized.
Recognizing ID attributes in temporary trees is particularly troublesome, as it requires the temporary tree to be validated.
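To illustrate several of these points at once, here is a sketch assuming hypothetical <book> elements that carry both an ISBN (as a child element) and an acquisition number (as an attribute). The two keys keep the two sets of values separate, the values need not be XML names or unique, and no DTD or schema is required.

```xml
<!-- Hypothetical source:
     <book acq-no="2003/117+A"><isbn>0-13-082676-6</isbn>...</book> -->
<xsl:key name="isbn" match="book" use="isbn"/>
<xsl:key name="acquisition" match="book" use="@acq-no"/>

<!-- "/" and "+" are allowed in key values, unlike ID values -->
<xsl:copy-of select="key('acquisition', '2003/117+A')"/>
<xsl:copy-of select="key('isbn', '0-13-082676-6')"/>
```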

Using a Simple Key

The detailed rules for keys seem complicated, but most practical applications of keys are very simple. Consider the following key definition:

  <xsl:key name="product-code" match="product" use="string(@code)"/>

This defines a key whose name is «product-code», and which can be used to find <product> elements given the value of their code attribute. If a product has no code attribute, it won't be possible to find it using this key.

In this example I have deliberately forced the @code attribute to be converted to a string. In the absence of a schema, atomizing the value would normally give a value of type xdt:untypedAtomic. Keys use the rules for the XPath 2.0 eq operator, which treat untyped atomic values as strings. So «key("product-code", "abc")» would perform a string comparison, while «key("product-code", 29)» would be a type error. If the key values are all strings, then it's known at the time the key is defined that all comparisons will be string comparisons, which gives the system a chance to do the indexing more efficiently.
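If you want numeric lookups instead, the conversion can be done the other way. This sketch assumes a hypothetical number attribute on each <product> holding an integer; converting it to xs:integer makes the key compare numerically.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Key values are xs:integer, so lookups are numeric comparisons.
       A product with no number attribute gets no key value. -->
  <xsl:key name="product-number" match="product" use="xs:integer(@number)"/>
  <xsl:template match="/">
    <!-- Finds products with number="29" and also number="029" -->
    <xsl:copy-of select="key('product-number', 29)"/>
  </xsl:template>
</xsl:stylesheet>
```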

To find the product with code value «ABC-456», you can write, for example:

  <xsl:apply-templates select="key('product-code', 'ABC-456')"/>

Note that you could just as well choose to index the attribute nodes:

  <xsl:key name="product-code" match="product/@code" use="."/>

To find the relevant product you would then write:

  <xsl:apply-templates select="key('product-code', 'ABC-456')/.."/>

I've used <xsl:apply-templates> here as an example. This will select all the <product> elements in the current document that have code «ABC-456» (I never said it had to be a unique identifier) and apply the matching template to each one in turn, processing them in document order, as usual. I could equally have used any other instruction that uses an XPath expression; for example, I could have assigned the node set to a variable, or used it in an <xsl:value-of> element.
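As a sketch of those alternatives, the fragment below binds the result of the lookup to a variable and then uses it twice in <xsl:value-of>. The <name> child of <product> is a hypothetical element assumed for illustration.

```xml
<!-- Bind the key lookup to a variable so it can be reused -->
<xsl:variable name="abc-products" select="key('product-code', 'ABC-456')"/>
<p>Products with code ABC-456: <xsl:value-of select="count($abc-products)"/></p>
<!-- name is a hypothetical child element of product -->
<p><xsl:value-of select="$abc-products/name" separator=", "/></p>
```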

The second argument to the key function is normally a value of the same type as the key values computed by the use attribute (or sequence constructor) of the <xsl:key> declaration. It won't usually be a literal, as in my example, but is more likely to be a string obtained from somewhere else in the source document, or perhaps supplied as a parameter to the stylesheet. It may well have been passed as one of the parameters in the URL used to select this stylesheet in the first place; for example, a Web page might display a list of available products such as the one in Figure 5-7.

Figure 5-7

Behind each of these buttons shown to the user there might be a URL such as:

  http://www.cheap-food.com/servlet/product?code=ABC-456

You then write a servlet (or an ASP page if you prefer) on your Web server that extracts the query parameter code, and fires off your favorite XSLT processor specifying products.xml as the source document, show-product.xsl as the stylesheet, and «ABC-456» as the value to be supplied for the global stylesheet parameter called prod-code. Your stylesheet would then look like this:

  <xsl:param name="prod-code"/>
  <xsl:key name="product-code" match="product" use="@code"/>
  <xsl:template match="/">
    <html>
      <body>
        <xsl:variable name="product" select="key('product-code', $prod-code)"/>
        <xsl:if test="not($product)">
          <p>There is no product with this code</p>
        </xsl:if>
        <xsl:apply-templates select="$product"/>
      </body>
    </html>
  </xsl:template>

Multivalued Keys

A key can be multivalued, in that a single node can have several values each of which can be used to find the node independently. For example, a book may have several authors, and each author's name can be used as a key value. This could be defined as follows:

  <xsl:key name="book-author" match="book" use="author/name"/>

The use expression, «author/name», selects more than one node, so the typed value of each of its nodes (that is, the name of each author of the book) is used as one of the values in the set of node-value pairs that makes up the key.

In this particular example, as well as one book having several authors, each author may have written several books, so when you use an XPath expression such as:

  <xsl:for-each select="key('book-author', 'Agatha Christie')">

you will be selecting all the books in which Agatha Christie was one of the authors. What if you want to find all the books in which Alex Homer and David Sussman are joint authors? You can do this using the intersect operator provided in XPath 2.0:

  <xsl:variable name="set1"
      select="key('book-author', 'Alex Homer')"/>
  <xsl:variable name="set2"
      select="key('book-author', 'David Sussman')"/>
  <xsl:variable name="result"
      select="$set1 intersect $set2"/>
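The same idea extends to any number of co-authors. As a sketch, assuming a hypothetical parameter $names holding a sequence of author names, this finds the books by the first author and keeps only those that also list every other name:

```xml
<!-- Hypothetical parameter: a sequence of author names -->
<xsl:param name="names" select="('Alex Homer', 'David Sussman')"/>
<!-- Books by the first author that also have every other listed author -->
<xsl:variable name="joint"
    select="key('book-author', $names[1])
              [every $n in $names satisfies author/name = $n]"/>
```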

You can also supply a sequence of several values as the second argument to the key() function. For example, you might write:

  <xsl:variable name="ac" select="key('book-author', 'Agatha Christie')"/>
  <xsl:for-each select="key('book-author', $ac/author/name)">

The result of the select expression in the <xsl:for-each> instruction is the set of all books in which one of the authors is either Agatha Christie or a co-author of Agatha Christie. This is because $ac is the set of all books in which Agatha Christie is an author, so «$ac/author/name » is the set of all authors of these books, and using this set of named authors as the value of the key produces the set of books in which any of them is an author.
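Supplying a sequence of values gives the union of the individual lookups, in contrast to the intersect example earlier. In this sketch (the <title> child is assumed, as in the book list used elsewhere in this section), a book written jointly by both authors is selected only once, because key() returns nodes in document order with no duplicates.

```xml
<!-- Books in which either Alex Homer or David Sussman is an author -->
<xsl:for-each select="key('book-author', ('Alex Homer', 'David Sussman'))">
  <p><xsl:value-of select="title"/></p>
</xsl:for-each>
```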

Multivalued Nonunique Keys

This example shows how a node can have several values for one key, and a given key value can identify more than one node. It uses author name as a key to locate <book> elements.

Source

The source file is booklist.xml:

  <booklist>
    <book>
      <title>Design Patterns</title>
      <author>Erich Gamma</author>
      <author>Richard Helm</author>
      <author>Ralph Johnson</author>
      <author>John Vlissides</author>
    </book>
    <book>
      <title>Pattern Hatching</title>
      <author>John Vlissides</author>
    </book>
    <book>
      <title>Building Applications Frameworks</title>
      <author>Mohamed Fayad</author>
      <author>Douglas C. Schmidt</author>
      <author>Ralph Johnson</author>
    </book>
    <book>
      <title>Implementing Applications Frameworks</title>
      <author>Mohamed Fayad</author>
      <author>Douglas C. Schmidt</author>
      <author>Ralph Johnson</author>
    </book>
  </booklist>

Stylesheet

The stylesheet is author-key.xsl.

It declares the key and then simply copies the <book> elements that match the author name supplied as a parameter. So you can call this stylesheet with a call such as (all on one line):

  java -jar c:\saxon\saxon7.jar booklist.xml author-key.xsl author="Ralph Johnson"

Note that parameters containing spaces have to be written in quotes on the command line. The detailed command line syntax for each XSLT processor is different.

  <xsl:transform
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="2.0">
    <xsl:key name="author-name" match="book" use="author"/>
    <xsl:param name="author" required="yes"/>
    <xsl:template match="/">
      <xsl:copy-of select="key('author-name', $author)"/>
    </xsl:template>
  </xsl:transform>

Output

With the parameter set to the value «John Vlissides», the output is as follows:

  <?xml version="1.0" encoding="utf-8" ?>
  <book>
    <title>Design Patterns</title>
    <author>Erich Gamma</author>
    <author>Richard Helm</author>
    <author>Ralph Johnson</author>
    <author>John Vlissides</author>
  </book>
  <book>
    <title>Pattern Hatching</title>
    <author>John Vlissides</author>
  </book>
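This output has two top-level elements, so it is not a well-formed XML document. If you need one, a variant of the template rule in author-key.xsl could wrap the results in a single element. In this sketch the <books> wrapper and its attributes are arbitrary choices; with the parameter set to «John Vlissides» it should produce count="2".

```xml
<xsl:template match="/">
  <!-- Evaluate the key once and reuse the result -->
  <xsl:variable name="books" select="key('author-name', $author)"/>
  <!-- A single root element makes the output well-formed -->
  <books author="{$author}" count="{count($books)}">
    <xsl:copy-of select="$books"/>
  </books>
</xsl:template>
```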

Multiple Named Keys

There is nothing to stop you from defining several keys for the same nodes. For example:

  <xsl:key name="book-isbn" match="book" use="isbn"/>
  <xsl:key name="book-author" match="book" use="author/surname"/>

This allows you to find a book if either the author or the ISBN is known.

However, it's worth thinking twice before doing this. Assuming the XSLT processor implements the key by building an index or hash table, rather than by searching the whole document each time, you have to weigh the cost of building the index against the cost of finding the information by a search. If your transformation only needs to find a single book using its ISBN number, it might be simpler and faster to write:

  <xsl:for-each select="//book[isbn='0-13-082676-6']"/>

and not use a key at all.

Multiple Definitions for the Same Key

It's also possible to have several <xsl:key> declarations with the same name. For example:

  <xsl:key name="artist-key" match="book" use="author/name"/>
  <xsl:key name="artist-key" match="CD" use="composer"/>
  <xsl:key name="artist-key" match="CD" use="performer"/>

Now you can use the key() function in an expression such as:

  <xsl:apply-templates select="key('artist-key', 'Ringo Starr')"/>

The set of nodes this returns will be either <book> elements or <CD> elements or a mixture of the two; the only thing you know for certain is that each one will be either a book with Ringo Starr as one of the authors, or a CD with Ringo Starr listed either as the composer or as a performer.
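Because the result can mix element types, a natural way to process it is to give each type its own template rule, as in this sketch (the <title> children are hypothetical). A CD on which Ringo Starr is both composer and performer is still selected only once, since key() never returns duplicate nodes.

```xml
<!-- One template rule per kind of node the key can return -->
<xsl:template match="book">
  <p>Book: <xsl:value-of select="title"/></p>
</xsl:template>

<xsl:template match="CD">
  <p>CD: <xsl:value-of select="title"/></p>
</xsl:template>
```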

If the use expression were the same in each case, you could simplify this. For example, to find books and CDs with a particular publisher, you could write:

  <xsl:key name="publisher-key" match="book | CD" use="publisher"/>

This example uses the union pattern «book | CD», which matches all <book> elements and all <CD> elements. Union patterns are described on page 502 in Chapter 6.

The different definitions do not all need to be in the same stylesheet module; all the key definitions in included and imported stylesheets are merged together regardless of their import precedence.

Using Keys for Grouping

In XSLT 2.0, the new <xsl:for-each-group> instruction (described on page 281) provides facilities for grouping nodes with common values for a grouping key, or for eliminating nodes with duplicate values for a grouping key. In XSLT 1.0, this was a much more difficult problem to solve, and you may well encounter XSLT 1.0 stylesheets that use the workaround for this problem known as Muenchian grouping, named after its inventor, Steve Muench of Oracle.

It shouldn't be necessary to use Muenchian grouping any more in XSLT 2.0. However, since it is widely used and you may have the job of converting stylesheets that use it, or of writing stylesheets that work under both XSLT 2.0 and XSLT 1.0, it's worth understanding how it works.

Say you want to group a list of employees according to their location. Typically, you want two nested loops; in pseudocode:

  <xsl:for-each (distinct location)>
    <location name="...">
      <xsl:for-each (employee in that location)>
        <employee>
          (details)
        </employee>
      </xsl:for-each>
    </location>
  </xsl:for-each>

Muenchian grouping uses a key to identify the distinct locations, and then to identify all the employees at a given location. The key is defined like this:

  <xsl:key name="k" match="employee" use="location"/>

To find the distinct locations, scan all the employee elements, and select only those that are the first one in their location. In XSLT 2.0 you would write:

  <xsl:for-each select="//employee[. is key('k', location)[1]]">

The «is» operator wasn't available in XPath 1.0, so this had to be written as:

  <xsl:for-each select=
      "//employee[generate-id(.) = generate-id(key('k', location)[1])]">

The inner loop, which selects all the employees at the same location, is achieved by writing:

  <xsl:for-each select="key('k', location)">

so the final code becomes:

  <xsl:for-each select=
      "//employee[generate-id(.) = generate-id(key('k', location)[1])]">
    <location name="{location}">
      <xsl:for-each select="key('k', location)">
        <employee>
          <xsl:apply-templates/>
        </employee>
      </xsl:for-each>
    </location>
  </xsl:for-each>

In XSLT 2.0 this can be rewritten much more readably as:

  <xsl:for-each-group select="//employee" group-by="location">
    <location name="{current-grouping-key()}">
      <xsl:for-each select="current-group()">
        <employee>
          <xsl:apply-templates/>
        </employee>
      </xsl:for-each>
    </location>
  </xsl:for-each-group>
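Another XSLT 2.0 possibility, shown here only as a sketch, keeps the key «k» but finds the distinct locations with distinct-values(). Inside that loop the context item is an atomic value rather than a node, so the two-argument form of key() has no document to search; the third argument supplies it. Note also that distinct-values() returns its results in an implementation-dependent order, whereas <xsl:for-each-group> processes groups in order of first appearance.

```xml
<xsl:template match="/">
  <!-- Remember the document: inside the loop there is no context node -->
  <xsl:variable name="doc" select="/"/>
  <xsl:for-each select="distinct-values(//employee/location)">
    <location name="{.}">
      <!-- Third argument tells key() which document to search -->
      <xsl:for-each select="key('k', ., $doc)">
        <employee>
          <xsl:apply-templates/>
        </employee>
      </xsl:for-each>
    </location>
  </xsl:for-each>
</xsl:template>
```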