xsl:for-each-group



The <xsl:for-each-group> instruction selects a set of items, arranges the items into groups based on common values or other criteria, and then processes each group in turn.

Changes in 2.0

This instruction is new in XSLT 2.0.

Format

  <xsl:for-each-group
    select = expression
    group-by? = expression
    group-adjacent? = expression
    group-starting-with? = pattern
    group-ending-with? = pattern
    collation? = { uri }>
    <!-- Content: (xsl:sort*, sequence-constructor) -->
  </xsl:for-each-group>

Position

<xsl:for-each-group> is an instruction, which is always used within a sequence constructor.

Attributes

| Name | Value | Meaning |
|---|---|---|
| select (mandatory) | Expression | The sequence of items to be grouped, known as the population |
| group-by (optional) | Expression | Grouping key. Items with common values for the grouping key are to be allocated to the same group |
| group-adjacent (optional) | Expression | Grouping key. Items with common values for the grouping key are to be allocated to the same group if they are adjacent in the population |
| group-starting-with (optional) | Pattern | A new group will be started for each item in the population that matches this pattern |
| group-ending-with (optional) | Pattern | A new group will be started following an item that matches this pattern |
| collation (optional) | Collation URI | Identifies a collation used to compare strings for equality when comparing group key values |

The attributes group-by, group-adjacent, group-starting-with, and group-ending-with are mutually exclusive. Exactly one of these four attributes must be present.

Content

Zero or more <xsl:sort> elements, followed by a sequence constructor.

Effect

Grouping takes as input a collection of items (usually nodes) and allocates each of these items to one of a number of subcollections called groups. It then processes each of the groups in turn.

The effect of the <xsl:for-each-group> instruction is summarized as follows:

  • The expression in the select attribute is evaluated. This can return any sequence (of nodes or atomic values). This sequence is known as the population, and the order of the items in the sequence is called population order.

  • Each item in the population is allocated to zero or more groups. The way this is done depends on which of the four attributes group-by, group-adjacent, group-starting-with, and group-ending-with is specified, and is described in detail below. When group-by is used, an item may have more than one grouping key and may therefore be allocated to any number of groups (zero or more). In all other cases, each item in the population is allocated to exactly one group. (For the benefit of mathematicians, the groups are then said to partition the population.)

  • The initial item of each group is identified. This is the item in the group that is first in population order, as defined above. If one or more sort keys have been defined using <xsl:sort> elements within the <xsl:for-each-group> element, these sort keys are used to determine the processing order of the groups. Otherwise, the groups are processed in order of first appearance, that is, based on the position of their initial items in population order.

  • There is a special rule covering what happens if an item is allocated to two groups and is the initial item in both of them. This can only happen if the item had several values for its grouping key, and the order of first appearance then relates to the order of these grouping keys in the result of the group-by expression. If the expression was «group-by="author"», and the value of this expression for the node in question was the sequence «("Gilbert", "Sullivan")», then the group for author Gilbert will be processed before the group for author Sullivan.

  • The sequence constructor contained in the <xsl:for-each-group> element is evaluated once for each group. Within the sequence constructor, the function current-group() may be called to obtain the items that are members of this group (in population order); and if the groups were defined using group-by or group-adjacent, then the function current-grouping-key() may be called to obtain the value of the grouping key that characterizes this group of items.

  • The sequences that result from evaluating the sequence constructor once for each group are concatenated (in processing order) to form the final result of the <xsl:for-each-group> instruction.

If the population is empty, then the number of groups will be zero. No group is ever empty. Whether the population contains nodes or atomic values, no attempt is made to remove duplicates. This means that if the same node appears twice in the population, it will generally appear twice in each group that it is allocated to.
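As a small sketch of these rules (the values and element names here are illustrative assumptions, not taken from the text above), the following stylesheet groups a population of atomic values. Because the population is written as a literal sequence, it can be run against any source document.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <groups>
      <!-- Population: six atomic values; the value 4 occurs twice -->
      <xsl:for-each-group select="(4, 7, 9, 4, 12, 5)" group-by=". mod 3">
        <!-- One <group> element is written per distinct grouping key -->
        <group key="{current-grouping-key()}" size="{count(current-group())}">
          <xsl:value-of select="current-group()" separator=" "/>
        </group>
      </xsl:for-each-group>
    </groups>
  </xsl:template>

</xsl:stylesheet>
```

The groups appear in order of first appearance (keys 1, 0, and 2). The value 4 is not deduplicated, so the three <group> elements contain «4 7 4», «9 12», and «5» respectively.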

The following sections describe the effect of each of the four attributes group-by, group-adjacent, group-starting-with, and group-ending-with in turn.

group-by

The most common way of using <xsl:for-each-group> is to group items based on common values for a grouping key, which is achieved using the group-by attribute.

The group-by attribute is an XPath expression, which is evaluated once for each item in the population. It is evaluated with this item as the context item, with the position of this item in the population as the context position, and with the size of the population as the context size.

The value of the group-by expression is in general a sequence. This sequence is first atomized (as described in Chapter 2, page 73), and duplicate values are then removed from the atomic sequence that results. For each distinct value that remains in the sequence, the item is allocated to a group identified by this value. The total number of groups is equal to the number of distinct values present in the grouping keys for all items in the population.

Duplicate nodes are not removed from the population, but duplicate grouping keys calculated for a single node are removed. So if the authors of a book are J. Smith and P. Smith, and your grouping key is «author/surname», then when you process the group for Smith, this book will be processed once, not twice.

Grouping keys are compared based on their data type: for example, numbers are compared as numbers, and dates are compared as dates. Two grouping keys are considered equal based on the rules of the XPath «eq» operator. Strings are compared using the collation specified in the collation attribute if present (for more details on collations, see under <xsl:sort> on page 427). Two NaN (not-a-number) values are considered equal to each other even though they are not equal when compared using «eq». If two values cannot be compared (because they are of noncomparable types, for example xs:date and xs:integer, or because they belong to a type such as xs:duration where no equality operator is defined) then they are considered not equal, which means the items end up in different groups.

If the group-by expression for any item in the population evaluates to an empty sequence, then the item will not be allocated to any groups, which means it will not be processed at all.
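The following sketch illustrates multiple and missing grouping keys together. The input document, its element names, and the output vocabulary are assumptions made for the illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed input (hypothetical):
       <books>
         <book title="The Mikado"><author>Gilbert</author><author>Sullivan</author></book>
         <book title="Bab Ballads"><author>Gilbert</author></book>
         <book title="Anonymous Verses"/>
       </books>
  -->
  <xsl:template match="books">
    <authors>
      <!-- The grouping key is the sequence of author values, one key per author -->
      <xsl:for-each-group select="book" group-by="author">
        <author name="{current-grouping-key()}">
          <xsl:for-each select="current-group()">
            <title><xsl:value-of select="@title"/></title>
          </xsl:for-each>
        </author>
      </xsl:for-each-group>
    </authors>
  </xsl:template>

</xsl:stylesheet>
```

With this input, "The Mikado" is listed under both Gilbert and Sullivan, and the Gilbert group is processed first. "Anonymous Verses" has an empty grouping key, so it belongs to no group and does not appear in the output at all.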

group-adjacent

When the group-adjacent attribute is used to define the grouping criteria, items are assigned to groups on the following basis:

  • The first item in the population starts a new group.

  • Subsequent items in the population are allocated to the same group as the previous item in the population if and only if they share the same value for the grouping key defined by the group-adjacent attribute; otherwise they are allocated to a new group.

The group-adjacent expression is evaluated once for each item in the population. During this evaluation, the context item is this item, the context position is the position of this item in population order, and the context size is the size of the population. The value that results from evaluating the group-adjacent expression is atomized (as described in Chapter 2). Unlike the group-by attribute, the result of evaluating the group-adjacent expression, after atomization, must be a single value. A type error is reported if the value is an empty sequence, or if it is a sequence containing more than one atomic value.

Values of the grouping key are compared in the same way as for the group-by attribute. This means, for example, that strings are compared using the collation defined in the collation attribute if specified, and that NaN values compare equal.

There are two main reasons for using group-adjacent in preference to group-by:

  • Firstly, when there is a genuine requirement not to group items with the same grouping key unless they are adjacent. For example, a sequence of temperature readings might be presented so that the only readings actually shown are those that differ from the previous reading. This can be achieved by grouping adjacent readings, and only displaying the first reading in each group.

  • Secondly, when it is known that the items with common grouping keys will always be adjacent in the population. In this case using group-adjacent might give the same result as group-by, but might be more efficient because the XSLT processor can perform the grouping in a single pass through the data.
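The temperature-readings case can be sketched as follows. The input format (a <readings> element containing <reading> elements with time and temp attributes) is an assumption made for the illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed input (hypothetical):
       <readings>
         <reading time="09:00" temp="18"/>
         <reading time="10:00" temp="18"/>
         <reading time="11:00" temp="20"/>
         <reading time="12:00" temp="18"/>
       </readings>
  -->
  <xsl:template match="readings">
    <changes>
      <!-- Each run of adjacent readings with the same temperature forms one group -->
      <xsl:for-each-group select="reading" group-adjacent="@temp">
        <!-- Show only the first reading of each run -->
        <xsl:copy-of select="current-group()[1]"/>
      </xsl:for-each-group>
    </changes>
  </xsl:template>

</xsl:stylesheet>
```

This copies the readings taken at 09:00, 11:00, and 12:00. With group-by instead, the 12:00 reading would join the first group and the return to 18 degrees would be lost. Note that every <reading> must have a temp attribute, because an empty grouping key is a type error with group-adjacent.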

group-starting-with

The group-starting-with attribute is a pattern (not an expression). Patterns are described in Chapter 6. Patterns apply only to nodes, so this attribute must be used only when the population consists entirely of nodes. The nodes in the population are assigned to groups on the following basis:

  • The first node in the population starts a new group.

  • Subsequent nodes in the population start a new group if they match the pattern, and are assigned to the same group as the previous node otherwise.

The result is that the initial node in each group (the one that comes first in population order) will always match the pattern, with the possible exception of the first group, which may contain no node that matches the pattern.

The group-starting-with attribute is useful where the population consists of a repeating group of nodes whose first member can be readily identified: for example, a <header> element followed by a sequence of <para> elements, then another <header>, and so on. In this case the grouping can easily be defined using «group-starting-with="header"».
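A minimal sketch of the header and paragraph case follows. The <doc> wrapper and the <section> output element are assumptions made for the illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed input (hypothetical):
       <doc>
         <para>Preamble</para>
         <header>One</header><para>a</para><para>b</para>
         <header>Two</header><para>c</para>
       </doc>
  -->
  <xsl:template match="doc">
    <doc>
      <!-- A new group starts at every <header>, and at the first node in the population -->
      <xsl:for-each-group select="*" group-starting-with="header">
        <section>
          <xsl:copy-of select="current-group()"/>
        </section>
      </xsl:for-each-group>
    </doc>
  </xsl:template>

</xsl:stylesheet>
```

This input yields three <section> elements. The first contains only the preamble paragraph, which shows the exception noted above: the first group need not contain a node that matches the pattern.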

group-ending-with

This attribute behaves in a very similar way to group-starting-with, except that the pattern identifies the last item in a group instead of the first. The nodes in the population are assigned to groups on the following basis:

  • The first node in the population starts a new group.

  • Subsequent nodes in the population start a new group if the previous node in the population matches the pattern, and are assigned to the same group as the previous node otherwise.

The most common use case for this option is where input records contain a continuation marker of some kind. For example, the population might consist of elements that have the attribute «continued="yes"» or «continued="no"». A group can then be defined using the criterion «group-ending-with="*[@continued='no']"».
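The continuation-marker case can be sketched like this. The <records> and <record> element names and the <logical-record> output element are assumptions made for the illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed input (hypothetical):
       <records>
         <record continued="yes">A1</record>
         <record continued="no">A2</record>
         <record continued="no">B1</record>
         <record continued="yes">C1</record>
       </records>
  -->
  <xsl:template match="records">
    <records>
      <!-- A group is closed by each record that is not continued -->
      <xsl:for-each-group select="record"
          group-ending-with="*[@continued='no']">
        <logical-record>
          <xsl:value-of select="current-group()" separator=" "/>
        </logical-record>
      </xsl:for-each-group>
    </records>
  </xsl:template>

</xsl:stylesheet>
```

The result contains three <logical-record> elements holding «A1 A2», «B1», and «C1». The final group is formed even though its last record is marked as continued, because the population simply ends there.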

Sorting the Groups

If there are any <xsl:sort> elements as children of the <xsl:for-each-group> , these affect the order in which the groups are processed. They do not affect the order of the items within each group, nor which item in a group is considered to be the initial item.

The select expression in an <xsl:sort> element calculates a sort key that affects the group as a whole, but it is always evaluated with respect to the initial item in the group. The initial item in a group is the item within the group that was first in population order. The select expression is evaluated with this item as the context item, with the position of this item relative to the initial items of other groups as the context position, and with the number of groups as the context size.

If any of the attributes of the <xsl:sort> elements are attribute value templates, then the XPath expressions in these attribute value templates are evaluated with the same context item, position, and size as the select expression of the containing <xsl:for-each-group> element.

If there are no <xsl:sort> elements, or in cases where the initial items in two groups have the same values for their sort keys, the groups are processed in order of first appearance; that is, if the initial item of group G appeared in the population before the initial item of group H, then group G is processed before group H.

"Processed before" does not refer to the actual order of execution: the system can process the groups in any order, or in parallel. What it means is that the results of processing group G appear in the final result sequence ahead of the results of processing group H.

In practice, if the groups are sorted at all then they are nearly always sorted by the value of the grouping key: group the addresses in each city, sorting the groups by city. This can be conveniently coded as:

  <xsl:for-each-group select="//address" group-by="city">
    <xsl:sort select="current-grouping-key()"/>
    <city-group>
      <xsl:copy-of select="current-group()"/>
    </city-group>
  </xsl:for-each-group>

The functions current-group() and current-grouping-key() are described on page 529 in Chapter 7.

An <xsl:sort> element within <xsl:for-each-group> is used to sort the groups. To sort the items within each group, use an <xsl:sort> element within the inner <xsl:for-each>, or write:

  <xsl:perform-sort select="current-group()">
    <xsl:sort select="..."/>
  </xsl:perform-sort>

The <xsl:perform-sort> instruction is described on page 405.

You can also use the <xsl:perform-sort> instruction to sort the population before grouping starts.
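Groups can also be sorted on something other than the grouping key. The sketch below, which assumes a hypothetical <addresses> input with <name> and <city> children, sorts the groups by size and the items within each group by name.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed input (hypothetical):
       <addresses>
         <address><name>Zoe</name><city>Leeds</city></address>
         <address><name>Adam</name><city>York</city></address>
         <address><name>Mia</name><city>York</city></address>
         <address><name>Ben</name><city>Bath</city></address>
       </addresses>
  -->
  <xsl:template match="addresses">
    <cities>
      <xsl:for-each-group select="address" group-by="city">
        <!-- Sorts the groups: largest group first -->
        <xsl:sort select="count(current-group())" order="descending"/>
        <city name="{current-grouping-key()}">
          <xsl:for-each select="current-group()">
            <!-- Sorts the items within the current group -->
            <xsl:sort select="name"/>
            <xsl:copy-of select="name"/>
          </xsl:for-each>
        </city>
      </xsl:for-each-group>
    </cities>
  </xsl:template>

</xsl:stylesheet>
```

York, with two addresses, is output first. Leeds and Bath have equal sort keys, so they follow in order of first appearance: Leeds, then Bath.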

Usage and Examples

The following sections give a number of examples of how <xsl:for-each-group> can be used to solve grouping problems. They are organized according to the four ways of defining the grouping criteria: group-by, group-adjacent, group-starting-with, and group-ending-with.

Using group-by

This is by far the most common kind of grouping. We'll start with a simple case.

Example: Single-Level Grouping by Value
start example

This example groups a set of employees according to the department in which they work.

Source

We'll start with the following simple data file (staff.xml):

  <staff>
    <employee name="John Jones" department="sales"/>
    <employee name="Barbara Jenkins" department="personnel"/>
    <employee name="Cormac O'Donovan" department="transport"/>
    <employee name="Wesley Thomas" department="personnel"/>
    <employee name="Maria Gomez" department="sales"/>
  </staff>

Output

The requirement is to output an HTML document in which the staff are listed by department:

  <h2>sales department</h2>
  <p>John Jones</p>
  <p>Maria Gomez</p>
  <h2>personnel department</h2>
  <p>Barbara Jenkins</p>
  <p>Wesley Thomas</p>
  <h2>transport department</h2>
  <p>Cormac O'Donovan</p>

Stylesheet

This output is simple to achieve. The full stylesheet is in group-by-dept.xsl:

  <xsl:template match="staff">
    <xsl:for-each-group select="employee" group-by="@department">
      <h2><xsl:value-of select="current-grouping-key()"/>
        <xsl:text> department</xsl:text></h2>
      <xsl:for-each select="current-group()">
        <p><xsl:value-of select="@name"/></p>
      </xsl:for-each>
    </xsl:for-each-group>
  </xsl:template>

A number of variations are possible on this theme. To sort the departments, use an <xsl:sort> element within the <xsl:for-each-group>. To sort the employees within each department, use an <xsl:sort> element within the <xsl:for-each>. The solution then becomes (sorted-depts.xsl):

  <xsl:template match="staff">
    <xsl:for-each-group select="employee" group-by="@department">
      <xsl:sort select="current-grouping-key()"/>
      <h2><xsl:value-of select="current-grouping-key()"/>
        <xsl:text> department</xsl:text></h2>
      <xsl:for-each select="current-group()">
        <xsl:sort select="@name"/>
        <p><xsl:value-of select="@name"/></p>
      </xsl:for-each>
    </xsl:for-each-group>
  </xsl:template>
end example
 

This general design pattern, where <xsl:for-each-group> is used at the outer level to iterate over the groups, and an inner <xsl:for-each> is used to iterate over the items within a group, is typical. But there are a number of useful variations:

  • The inner loop is sometimes better done using <xsl:apply-templates select="current-group()"/>, especially if the group includes elements of different types.

  • Sometimes the entire inner loop can be written as <xsl:copy-of select="current-group()"/>, especially when generating XML output.

  • Sometimes the requirement is not to display the items in each group, but to calculate some aggregate function for these items: for example, to list for each department the name of the department, the number of employees, and the maximum salary. In this case, the inner loop might not be explicit. The number of employees in the group is easily computed as «count(current-group())». Our sample data doesn't show salary, but if this were available as an extra attribute on the <employee> element then you could easily calculate the maximum salary as «max(current-group()/@salary)».

  • If the requirement is to eliminate duplicates rather than to group all the items (in our example, to output a list of departments) then the inner loop can be omitted entirely. But in this case it may be simpler to use the distinct-values() function described in XPath 2.0 Programmer's Reference.

  • If you need to number the groups, or test whether you are processing the last group, then within the <xsl:for-each-group> element you can use position() and last() in the usual way. At this level, the context item is the initial item of the group being processed, and position() and last() refer to the position of this item in a list that contains the initial item of each group, in processing order.
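The aggregate and numbering variations can be combined in one sketch. It assumes that staff.xml has been extended with a hypothetical salary attribute; the output vocabulary is also invented for the illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed input (salary is a hypothetical extra attribute):
       <staff>
         <employee name="John Jones" department="sales" salary="42000"/>
         <employee name="Barbara Jenkins" department="personnel" salary="38000"/>
         <employee name="Maria Gomez" department="sales" salary="51000"/>
       </staff>
  -->
  <xsl:template match="staff">
    <departments>
      <xsl:for-each-group select="employee" group-by="@department">
        <!-- No inner loop: each group is reduced to summary values.
             position() and last() count groups, not employees. -->
        <department number="{position()}" of="{last()}"
                    name="{current-grouping-key()}"
                    headcount="{count(current-group())}"
                    max-salary="{max(current-group()/@salary)}"/>
      </xsl:for-each-group>
    </departments>
  </xsl:template>

</xsl:stylesheet>
```

For the three employees shown, this produces two <department> elements, numbered 1 and 2 of 2, with a headcount of 2 for sales and 1 for personnel. Without a schema the salary attributes are untyped, so max() compares them as xs:double values.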

The example above was expressed as a grouping problem ("list the employees grouped by department"), so it is easy to see that <xsl:for-each-group> can be applied to the problem. Sometimes grouping problems are not so easy to recognize. This might be the case if the example above were expressed as "for each department, list the name of the department and the number of employees."

Example: Multilevel Grouping by Value
start example

Sometimes there is a need to do multilevel grouping. For example, you might want to group the employees by department, and the departments by location. Assume that the <employee> element now has a location attribute as well as a department attribute. It doesn't really matter whether departments can span locations; the code will work either way.

Source

The data is now like this (staff-locations.xml):

  <staff>
    <employee name="John Jones"
              department="sales"
              location="New York"/>
    <employee name="Barbara Jenkins"
              department="personnel"
              location="Los Angeles"/>
    <employee name="Cormac O'Donovan"
              department="transport"
              location="New York"/>
    <employee name="Wesley Thomas"
              department="personnel"
              location="Los Angeles"/>
    <employee name="Maria Gomez"
              department="sales"
              location="Seattle"/>
  </staff>

Output

You might want the output presented like this:

  Location: Los Angeles
    Department: personnel
      Barbara Jenkins
      Wesley Thomas
  Location: New York
    Department: sales
      John Jones
    Department: transport
      Cormac O'Donovan
  Location: Seattle
    Department: sales
      Maria Gomez

Stylesheet

Assume that the indentation is achieved using CSS styles, so you can concentrate on getting the structure of the information right. To do this multilevel grouping, just use two levels of <xsl:for-each-group> elements (multi-level.xsl):

  <xsl:template match="staff">
    <xsl:for-each-group select="employee" group-by="@location">
      <xsl:sort select="current-grouping-key()"/>
      <p class="indent0">
        <xsl:text>Location: </xsl:text>
        <xsl:value-of select="current-grouping-key()"/>
      </p>
      <xsl:for-each-group select="current-group()" group-by="@department">
        <xsl:sort select="current-grouping-key()"/>
        <p class="indent1">
          <xsl:text>Department: </xsl:text>
          <xsl:value-of select="current-grouping-key()"/>
        </p>
        <xsl:for-each select="current-group()">
          <xsl:sort select="@name"/>
          <p class="indent2">
            <xsl:value-of select="@name"/>
          </p>
        </xsl:for-each>
      </xsl:for-each-group>
    </xsl:for-each-group>
  </xsl:template>

A similar requirement is where there is a composite grouping key ("group employees that have the same department and the same location"). There are two ways of handling this. You can either treat it as a single level of grouping, using the concatenation of the two values as the grouping key, or you can treat it as two nested groupings in which the outer level does nothing (composite.xsl):

  <xsl:for-each-group select="employee" group-by="@location">
    <xsl:for-each-group select="current-group()" group-by="@department">
      ...
    </xsl:for-each-group>
  </xsl:for-each-group>

The two techniques are not completely identical. For example, with a single-level grouping using a concatenated key, the value of position() while processing a department will run continuously from 1 up to the total number of groups, but with a two-level grouping, position() will start again at 1 for each location.
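For comparison, the single-level alternative with a concatenated key might look like the following sketch. The «|» separator is an assumption: it must be a character that cannot occur in either value, so that two different pairs never produce the same key. The <groups> and <group> output elements are invented for the illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Intended to be run against staff-locations.xml -->
  <xsl:template match="staff">
    <groups>
      <xsl:for-each-group select="employee"
          group-by="concat(@location, '|', @department)">
        <!-- The context item is the initial item of the group, so its
             attributes give the location and department shared by the group -->
        <group number="{position()}"
               location="{@location}"
               department="{@department}">
          <xsl:copy-of select="current-group()"/>
        </group>
      </xsl:for-each-group>
    </groups>
  </xsl:template>

</xsl:stylesheet>
```

Run against staff-locations.xml, this yields four groups numbered 1 to 4 in order of first appearance: New York sales, Los Angeles personnel, New York transport, and Seattle sales.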

end example
 

The group-by option also allows an item to belong to more than one group. Suppose that an employee can work for several departments, and that the department attribute is extended to be a whitespace-separated list of department names. If you are using a schema-aware processor that annotates this attribute as belonging to a list-valued type, then all the examples we have written above will handle this situation without change. When you write «group-by="@department"», the value of the expression is atomized, and if the type is a list-valued type, this will return the sequence of atomic values contained in the attribute. The item with this grouping key is then allocated to one group for each department. I was careful to output the department name by referring to <xsl:value-of select="current-grouping-key()"/>; if I had written <xsl:value-of select="@department"/> the output would have been rather confusing, because instead of listing the name of the single department to which all the employees in this group belong, the code would list all the departments to which the first employee in the group belongs.
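Without a schema-aware processor, the attribute is not split automatically. One possible alternative, sketched here as an assumption rather than as part of the examples above, is to split the value explicitly with tokenize() so that each department name becomes a separate grouping key.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed input (hypothetical):
       <staff>
         <employee name="John Jones" department="sales transport"/>
         <employee name="Maria Gomez" department="sales"/>
       </staff>
  -->
  <xsl:template match="staff">
    <div>
      <!-- normalize-space() trims the value and collapses runs of whitespace,
           so splitting on a single space yields one token per department -->
      <xsl:for-each-group select="employee"
          group-by="tokenize(normalize-space(@department), ' ')">
        <h2>
          <xsl:value-of select="current-grouping-key()"/>
          <xsl:text> department</xsl:text>
        </h2>
        <xsl:for-each select="current-group()">
          <p><xsl:value-of select="@name"/></p>
        </xsl:for-each>
      </xsl:for-each-group>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

Here John Jones is listed under both sales and transport, and Maria Gomez under sales only. An employee whose department attribute is empty or absent produces no tokens and therefore appears in no group.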

Using group-adjacent

The group-adjacent option attaches significance not only to the value of the grouping key, but to the order of items in the population. So it shouldn't be a surprise to find that most of its applications come with document-oriented XML, where order is typically much more significant than with data-oriented XML.

Here's a simple example: Given a sequence consisting of <para> elements and <bullet> elements, you want to convert the <para> elements into <p> elements, and the <bullet> elements into <li> elements, with a <ul> element wrapped around a sequence of consecutive bullets. You can do this as follows:

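  <!-- Rule for the parent of the <para> and <bullet> elements; this match pattern is illustrative -->
  <xsl:template match="*[para or bullet]">
    <xsl:for-each-group select="*"
        group-adjacent="if (self::bullet) then 0 else position()">
      <xsl:apply-templates select="current-group()[1]"/>
    </xsl:for-each-group>
  </xsl:template>

  <xsl:template match="para">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>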

The grouping condition ensures that adjacent <bullet> elements go in a group together, while each <para> element goes in a group by itself (calling position() ensures each <para> element gets a unique grouping key; you could also have used generate-id()). We are only interested in the rule that processes the first bullet:

  <xsl:template match="bullet">
    <ul>
      <xsl:apply-templates select="current-group()" mode="each-bullet"/>
    </ul>
  </xsl:template>

This template rule processes the group of adjacent bullets by outputting the necessary <ul> element to the result tree, and inside this it creates the elements that represent each individual bullet, by calling another template rule (in a different mode) to process each one.

Now look at a more complex example, involving the formatting of a Shakespeare play. You can download the text of all Shakespeare's plays, marked up in XML by Jon Bosak, at http://metalab.unc.edu/bosak/xml/eg/shaks200.zip.

Example: Grouping Consecutive Elements by Name
start example

This example shows how to tackle a problem in which the content of an element (in this case, a <SPEECH> ) consists of a number of <SPEAKER> elements followed by a number of <LINE> elements.

Source

You can run this example on any of the Shakespeare plays. For convenience, the download directory contains the file ado-scene1.xml , containing the first scene from Much Ado About Nothing.

In this markup, a <SCENE> element consists of a sequence of <SPEECH> elements interleaved with stage directions. A <SPEECH> contains one or more <SPEAKER> elements indicating who is speaking, and one or more <LINE> elements indicating what they are saying. So in Hamlet you have speeches like this:

  <SPEECH>
    <SPEAKER>HAMLET</SPEAKER>
    <LINE>My fate cries out,</LINE>
    <LINE>And makes each petty artery in this body</LINE>
    <LINE>As hardy as the Nemean lion's nerve.</LINE>
    <LINE>Still am I call'd. Unhand me, gentlemen.</LINE>
    <LINE>By heaven, I'll make a ghost of him that lets me!</LINE>
    <LINE>I say, away! Go on; I'll follow thee.</LINE>
  </SPEECH>

and also speeches with multiple speakers:

  <SPEECH>
    <SPEAKER>ROSENCRANTZ</SPEAKER>
    <SPEAKER>GUILDENSTERN</SPEAKER>
    <LINE>We'll wait upon you.</LINE>
  </SPEECH>

There are very few occasions where Shakespeare allows two characters to speak together for more than a single line (the witches' speech in Macbeth is tagged as <SPEAKER>ALL</SPEAKER> ), but here is an example from Timon of Athens:

  <SPEECH>
    <SPEAKER>PHRYNIA</SPEAKER>
    <SPEAKER>TIMANDRA</SPEAKER>
    <LINE>Well, more gold: what then?</LINE>
    <LINE>Believe't, that we'll do any thing for gold.</LINE>
  </SPEECH>

Output

Suppose that you want to output each speech as a row in a table, with the speakers listed in one column, and the text in the other. It should look like this (you wouldn't normally make the table cells visible in this way, but it helps to be able to see the structure):

  PHRYNIA      More counsel with more money, bounteous Timon.
  TIMANDRA

  TIMON        More whore, more mischief first; I have given you earnest.

  ALCIBIADES   Strike up the drum towards Athens! Farewell, Timon:
               If I thrive well, I'll visit thee again.

  TIMON        If I hope well, I'll never see thee more.

Stylesheet

So what does the stylesheet look like?

The content of a <SPEECH> element consists of two groups: a group containing consecutive <SPEAKER> elements and a group containing consecutive <LINE> elements. So you can write it like this:

  <xsl:template match="SPEECH">
    <tr>
      <xsl:for-each-group select="*" group-adjacent="name()">
        <td valign="top">
          <xsl:for-each select="current-group()">
            <xsl:apply-templates select="."/>
            <xsl:if test="position() != last()"><br/></xsl:if>
          </xsl:for-each>
        </td>
      </xsl:for-each-group>
    </tr>
  </xsl:template>
end example
 

Here we are using the name of an element as its grouping key.

Actually, I omitted one complication. Within the sequence of <LINE> elements there can also be a <STAGEDIR> representing a stage direction, thus:

  <SPEECH>
    <SPEAKER>TIMON</SPEAKER>
    <LINE>Long live so, and so die.</LINE>
    <STAGEDIR>Exit APEMANTUS</STAGEDIR>
    <LINE>I am quit.</LINE>
    <LINE>Moe things like men! Eat, Timon, and abhor them.</LINE>
  </SPEECH>

When this happens you would want to output it, in its proper place, in italics:

  TIMON        Long live so, and so die.
               Exit APEMANTUS
               I am quit.
               Moe things like men! Eat, Timon, and abhor them.

What does this do to the stylesheet?

The second group, the one that comprises the right-hand column of the table, no longer shares a common element name. What you can do, however, is allocate <SPEAKER> elements to one group, and anything else to a different group.

  <xsl:template match="SPEECH">
    <tr>
      <xsl:for-each-group select="*"
          group-adjacent="if (self::SPEAKER) then 0 else 1">
        <td valign="top">
          <xsl:for-each select="current-group()">
            <xsl:apply-templates select="."/>
            <xsl:if test="position() != last()"><br/></xsl:if>
          </xsl:for-each>
        </td>
      </xsl:for-each-group>
    </tr>
  </xsl:template>

The fact that you output the content of the <SPEAKER> and <LINE> elements using <xsl:apply-templates> means that you don't have to change the body of this rule to handle <STAGEDIR> elements as well; all you need to do is add a template rule with «match="SPEECH/STAGEDIR"» to handle them.

To complete the stylesheet, you need to add template rules for the individual elements such as <STAGEDIR>. These are straightforward, so I will not list them here. You can find the complete stylesheet in speech.xsl.

You could actually have used a boolean grouping key, «group-adjacent="boolean(self::SPEAKER)"», but that would be a little obscure for my taste.

All these examples so far would work equally well using group-by rather than group-adjacent, because there are no nonadjacent items that would have been put in the same group if you had used group-by. But it's still worth using group-adjacent, if only because it's likely to be more efficient: the system knows that it doesn't need to do any sorting or hashing, it just has to compare adjacent items.

Example: Handling Repeating Groups of Adjacent Elements
start example

This example is a slightly more difficult variant of the previous example, in which the <SPEECH> elements have been omitted from the input markup.

Source

If the Shakespeare markup had been done by someone less capable than Jon Bosak, the <SPEECH> elements might have been left out. You would then see a structure like this:

  <SPEAKER>PHRYNIA</SPEAKER>
  <SPEAKER>TIMANDRA</SPEAKER>
  <LINE>More counsel with more money, bounteous Timon.</LINE>
  <SPEAKER>TIMON</SPEAKER>
  <LINE>More whore, more mischief first; I have given you earnest.</LINE>
  <SPEAKER>ALCIBIADES</SPEAKER>
  <LINE>Strike up the drum towards Athens! Farewell, Timon:</LINE>
  <LINE>If I thrive well, I'll visit thee again.</LINE>

I have modified the markup of this (very long) scene from Timon of Athens and included it as timon-scene.xml.

Output

The required output is the same as in the previous example: that is, a table, in which each row represents one speech, with the names of the speakers in one column and the lines spoken in the other.

Stylesheet

There are various ways of handling such a structure, none of them particularly easy. One approach is to do the grouping bottom-up: first you put a group of consecutive speakers in a <SPEAKERS> element and a group of consecutive lines and stage directions in a <LINES> element; then you process the sequence of alternating <SPEAKERS> and <LINES> elements. Here's the logic, which is expanded into a full stylesheet in the download file alternate-groups.xsl:

  <xsl:template match="SCENE">
    <table>
      <xsl:variable name="sequence" as="element()*">
        <xsl:for-each-group select="*"
            group-adjacent="if (self::SPEAKER)
                            then 'SPEAKERS' else 'LINES'">
          <xsl:element name="{current-grouping-key()}">
            <xsl:copy-of select="current-group()"/>
          </xsl:element>
        </xsl:for-each-group>
      </xsl:variable>
      <xsl:for-each-group select="$sequence"
          group-starting-with="SPEAKERS">
        <tr>
          <xsl:for-each select="current-group()">
            <td valign="top">
              <xsl:for-each select="*">
                <xsl:apply-templates/>
                <xsl:if test="position() != last()"><br/></xsl:if>
              </xsl:for-each>
            </td>
          </xsl:for-each>
        </tr>
      </xsl:for-each-group>
    </table>
  </xsl:template>

This does the grouping in two phases. The first phase creates a sequence of alternating elements named <SPEAKERS> and <LINES>, which you constructed by choosing these as your grouping keys. This sequence is held in a variable. The second phase uses group-starting-with to recognize a group consisting of a <SPEAKERS> element followed by a <LINES> element. All that remains is to process each group, which of course consists of a <SPEAKERS> element holding one or more <SPEAKER> elements, followed by a <LINES> element holding one or more <LINE> and <STAGEDIR> elements.

end example
 

If I had presented an example query "find all the speeches in Shakespeare involving two or more speakers and containing two or more lines," and had presented the solution as «collection('shakes.xml')//SPEECH[SPEAKER[2] and LINE[2]]», you would probably have found the example rather implausible. But if you want to know how I found the Timon of Athens quote, you have your answer.

Using group-starting-with

Like group-adjacent, the group-starting-with option selects groups of items that are adjacent in the population, and it therefore tends to be used with document-oriented XML. The difference is that with this option, there doesn't have to be any value that the adjacent nodes have in common: All that you need is a pattern that matches the first node in each group.

I used this technique for the Shakespeare example. In fact, given a scene consisting of alternating sequences of <SPEAKER> elements and <LINE> elements, with no <SPEECH> elements to mark the boundaries, I could have reconstructed the <SPEECH> elements by writing:

  <xsl:template match="SCENE">
    <xsl:copy>
      <xsl:for-each-group select="*" group-starting-with=
          "SPEAKER[not(preceding-sibling::*[1][self::SPEAKER])]">
        <SPEECH>
          <xsl:copy-of select="current-group()"/>
        </SPEECH>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

Here the pattern that marks out the first element in a new group is that it is a <SPEAKER> element whose immediately preceding sibling element (if it has one) is not another <SPEAKER> element.
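A small before-and-after illustration may help. The input below is hypothetical (placeholder text, not a real scene), and the result is shown with indentation added for readability.

```xml
<!-- Hypothetical flat input -->
<SCENE>
  <SPEAKER>First speaker</SPEAKER>
  <SPEAKER>Second speaker</SPEAKER>
  <LINE>A line spoken by both</LINE>
  <STAGEDIR>A stage direction</STAGEDIR>
  <SPEAKER>Third speaker</SPEAKER>
  <LINE>A reply</LINE>
</SCENE>

<!-- Result: a new group starts at the first and third SPEAKER only,
     because the second SPEAKER is immediately preceded by another SPEAKER -->
<SCENE>
  <SPEECH>
    <SPEAKER>First speaker</SPEAKER>
    <SPEAKER>Second speaker</SPEAKER>
    <LINE>A line spoken by both</LINE>
    <STAGEDIR>A stage direction</STAGEDIR>
  </SPEECH>
  <SPEECH>
    <SPEAKER>Third speaker</SPEAKER>
    <LINE>A reply</LINE>
  </SPEECH>
</SCENE>
```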

A common use for group-starting-with is the implicit hierarchies one sees in XHTML. We will explore this in the next example.

Handling Flat XHTML Documents
start example

This example shows how to create a hierarchy to represent the underlying structure of an XHTML document in which headings and paragraphs are all represented as sibling elements.

Source

A typical XHTML document looks like this (flat.xml):

  <html>
    <body>
      <h1>Title</h1>
      <p>We need to understand how hierarchies can be flat.</p>
      <h2>Subtitle</h2>
      <p>Let's get to the point.</p>
      <p>The second paragraph in a section often says very little.</p>
      <p>But the third gets to the heart of the matter.</p>
      <h2>Subtitle</h2>
      <p>To conclude, we are dealing with a flat hierarchy.</p>
    </body>
  </html>

This fragment consists of a <body> element with eight child elements, all at the same level of the tree. Very often, if you want to process this text, you will need to understand the hierarchic structure even though it is not explicit in the markup. For example, you may want to number the last paragraph as «1.2.1».

Output

To manipulate this data, you need to transform it into a structure like the one below, which reflects the true hierarchy:

  <body>
    <div><head>Title</head>
      <p>We need to understand how hierarchies can be flat.</p>
      <div><head>Subtitle</head>
        <p>Let's get to the point.</p>
        <p>The second paragraph in a section often says very little.</p>
        <p>But the third gets to the heart of the matter.</p>
      </div>
      <div><head>Subtitle</head>
        <p>To conclude, we are dealing with a flat hierarchy.</p>
      </div>
    </div>
  </body>

Stylesheet

The group-starting-with option is ideal for this purpose, because the <h1> and <h2> elements are easy to match. Here is the code (unflatten.xsl):

  <xsl:template match="body">
    <xsl:copy>
      <xsl:for-each-group select="*" group-starting-with="h1">
        <xsl:apply-templates select="." mode="group"/>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="h1" mode="group">
    <div><head><xsl:value-of select="."/></head>
      <xsl:for-each-group select="current-group() except ."
                          group-starting-with="h2">
        <xsl:apply-templates select="." mode="group"/>
      </xsl:for-each-group>
    </div>
  </xsl:template>

  <xsl:template match="h2" mode="group">
    <div><head><xsl:value-of select="."/></head>
      <xsl:for-each-group select="current-group() except ."
                          group-starting-with="h3">
        <xsl:apply-templates select="." mode="group"/>
      </xsl:for-each-group>
    </div>
  </xsl:template>

  <xsl:template match="h3" mode="group">
    <div><head><xsl:value-of select="."/></head>
      <xsl:copy-of select="current-group() except ."/>
    </div>
  </xsl:template>

  <xsl:template match="p" mode="group">
    <xsl:copy-of select="current-group()"/>
  </xsl:template>

I've shown this down to three levels; it should be obvious how it can be extended.

When an <h1> element is matched, it is processed as part of a group that starts with an <h1> element and then contains a number of <p> and <h2> elements interleaved. The template rule first outputs the contents of the <h1> element as a heading, and then splits the contents of this group (excluding the first <h1> element, which is of no further interest) into subgroups. The first subgroup will typically start with an ordinary <p> element, and all subsequent subgroups will start with an <h2> element. The rule then calls <xsl:apply-templates> to process the first element in each subgroup, and this fires off either the «match="p"» template (for the first subgroup) or the «match="h2"» template (for the others). The «match="p"» template simply copies the group of <p> elements to the result tree, while the «match="h2"» template starts yet another level of grouping based on the <h3> elements, and so on.

If you wanted to be clever you could handle the <h1>, <h2>, <h3>, ..., <h8> elements with a single generic rule. This could be done by writing the pattern as:

  group-starting-with=   "*[name()=translate(name(current()), '12345678', '23456789')]"  
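One way to flesh out the idea of a single generic rule is to compute the name of the next heading level in a variable and refer to that variable in the grouping pattern. The stylesheet below is a sketch along those lines, not the contents of unflatten.xsl. It assumes an XSLT 2.0 processor, elements in no namespace, and heading levels <h1> to <h6>; the variable name next is my own choice. Like unflatten.xsl, it relies on current-group() still being available inside the template rule invoked by <xsl:apply-templates>.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <!-- Top level: split the children of <body> at each <h1> -->
  <xsl:template match="body">
    <xsl:copy>
      <xsl:for-each-group select="*" group-starting-with="h1">
        <xsl:apply-templates select="." mode="group"/>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- One rule for every heading level (h1 to h6 assumed in this sketch) -->
  <xsl:template match="h1 | h2 | h3 | h4 | h5 | h6" mode="group">
    <!-- Name of the next level down: h1 gives h2, h2 gives h3, and so on -->
    <xsl:variable name="next"
        select="translate(name(), '123456', '234567')"/>
    <div>
      <head><xsl:value-of select="."/></head>
      <!-- Regroup the rest of the current group at the next heading level -->
      <xsl:for-each-group select="current-group() except ."
          group-starting-with="*[name() = $next]">
        <xsl:apply-templates select="." mode="group"/>
      </xsl:for-each-group>
    </div>
  </xsl:template>

  <!-- A subgroup that starts with a paragraph is copied unchanged -->
  <xsl:template match="p" mode="group">
    <xsl:copy-of select="current-group()"/>
  </xsl:template>

</xsl:stylesheet>
```

Holding the next level's name in a variable keeps the pattern itself simple, because the pattern only has to compare each candidate element's name with a value computed once per heading.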
end example
 

Using group-ending-with

The group-ending-with option complements group-starting-with by matching the last item in a group instead of the first. This requirement is far less common, but it does arise. The classical example for it is where a large document has been broken up, for transmission reasons, into small arbitrary chunks, and the last chunk carries some distinguishing characteristic such as the absence of an attribute saying «continued="yes"». To reconstitute the documents from the sequence of chunks, group-ending-with is the answer:

  <xsl:template match="sequence-of-chunks">
    <xsl:for-each-group select="*"
                        group-ending-with="*[not(@continued='yes')]">
      <doc>
        <xsl:copy-of select="current-group()/*"/>
      </doc>
    </xsl:for-each-group>
  </xsl:template>
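As an illustration, suppose the chunks arrive as shown below. The element name chunk and the text content are hypothetical, and the sketch assumes that the population being grouped is the child elements of <sequence-of-chunks> (that is, «select="*"»).

```xml
<!-- Hypothetical input: two documents sent as five chunks -->
<sequence-of-chunks>
  <chunk continued="yes"><p>Document one, part one</p></chunk>
  <chunk><p>Document one, part two</p></chunk>
  <chunk continued="yes"><p>Document two, part one</p></chunk>
  <chunk continued="yes"><p>Document two, part two</p></chunk>
  <chunk><p>Document two, part three</p></chunk>
</sequence-of-chunks>

<!-- Result (indentation added): each group ends at a chunk
     that has no continued="yes" attribute -->
<doc>
  <p>Document one, part one</p>
  <p>Document one, part two</p>
</doc>
<doc>
  <p>Document two, part one</p>
  <p>Document two, part two</p>
  <p>Document two, part three</p>
</doc>
```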

Arranging Data in Tables

Arranging data in tables is a common requirement when generating HTML pages, and the <xsl:for-each-group> instruction can help with this in a number of ways. I will not present any detailed worked examples here, just a checklist of techniques. However, the examples are expanded in the download files: see towns.xml, towns-by-rows.xsl, and towns-by-columns.xsl.

If you need to arrange data in rows, like this:

  Andover      Basingstoke  Crawley      Dorking
  Egham        Farnham      Guildford    Horsham
  Ironbridge   Jarrow       Kingston     Leatherhead

the simplest approach is this, where «$cols» is the number of columns required:

  <xsl:for-each-group select="town"
                      group-adjacent="(position()-1) idiv $cols">
    <tr>
      <xsl:for-each select="current-group()">
        <td>
          <xsl:value-of select="."/>
        </td>
      </xsl:for-each>
    </tr>
  </xsl:for-each-group>

If the data needs to be sorted first, use the <xsl:perform-sort> instruction. For example:

  <xsl:variable name="sorted-towns" as="element()*">
    <xsl:perform-sort select="town">
      <xsl:sort/>
    </xsl:perform-sort>
  </xsl:variable>
  <xsl:for-each-group select="$sorted-towns"
                      group-adjacent="(position()-1) idiv $cols">
    ...
  </xsl:for-each-group>

The <xsl:perform-sort> instruction is described on page 405.

If you need to generate empty table cells to fill up the last row, one convenient way is to add them to the sequence before you start:

  <xsl:variable name="gaps"
      select="(count(town) idiv $cols)*$cols + $cols - count(town)"/>
  <xsl:variable name="padding" select="
      if ($gaps = $cols) then ()
      else for $i in 1 to $gaps return '&#xa0;'"/>
  <xsl:variable name="cells" select="town, $padding"/>
  <xsl:for-each-group select="$cells"
                      group-adjacent="(position()-1) idiv $cols">
    ...
  </xsl:for-each-group>
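The padding technique can be put together into a complete stylesheet along the following lines. This is a sketch rather than one of the download files: it assumes that the source document has a <towns> element with <town> children (the structure of towns.xml is not shown here), and the default of four columns is arbitrary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Number of columns; 4 is an arbitrary default for this sketch -->
  <xsl:param name="cols" as="xs:integer" select="4"/>

  <!-- ASSUMPTION: the source is a <towns> element with <town> children -->
  <xsl:template match="towns">
    <xsl:variable name="n" select="count(town)"/>
    <!-- For n = 10 and cols = 4: (10 idiv 4)*4 + 4 - 10 = 8 + 4 - 10 = 2 -->
    <xsl:variable name="gaps" select="($n idiv $cols) * $cols + $cols - $n"/>
    <!-- When n is an exact multiple of cols, gaps equals cols: no padding -->
    <xsl:variable name="padding" as="xs:string*"
        select="if ($gaps = $cols) then ()
                else for $i in 1 to $gaps return '&#xa0;'"/>
    <!-- The town elements followed by the non-breaking-space fillers -->
    <xsl:variable name="cells" as="item()*" select="town, $padding"/>
    <table>
      <xsl:for-each-group select="$cells"
          group-adjacent="(position() - 1) idiv $cols">
        <tr>
          <xsl:for-each select="current-group()">
            <td><xsl:value-of select="."/></td>
          </xsl:for-each>
        </tr>
      </xsl:for-each-group>
    </table>
  </xsl:template>

</xsl:stylesheet>
```

Because the padding strings are appended to the population before grouping, the last group is filled up to the same length as the others, and the row-building code needs no special case for it.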

If you want to arrange the data in columns, like this:

  Andover      Dorking      Guildford    Jarrow
  Basingstoke  Egham        Horsham      Kingston
  Crawley      Farnham      Ironbridge   Leatherhead

then it is probably simplest to use group-by. The grouping key (the thing that the towns in a particular row have in common) is the value of «position() mod 3», where 3 is the number of rows, which you can calculate as «count($cells) idiv $cols»:

  <xsl:for-each-group select="$cells"
                      group-by="position() mod (last() idiv $cols)">
    <tr>
      <xsl:for-each select="current-group()">
        <td>
          <xsl:value-of select="."/>
        </td>
      </xsl:for-each>
    </tr>
  </xsl:for-each-group>

See Also

<xsl:perform-sort> on page 405

<xsl:sort> on page 423

Collations on page 427

current-group() function on page 529

current-grouping-key() function on page 530

distinct-values() function in XPath 2.0 Programmer's Reference




XSLT 2.0 Programmer's Reference
ISBN: 764569090