Recipe 5.1. Ignoring Duplicate Elements

Problem

You want to select one representative from each set of duplicate nodes in a given context, where duplication is determined by uniqueness criteria that you define.

Solution

XSLT 1.0

Selecting unique nodes is a common application of the preceding and preceding-sibling axes. If the elements you select are not all siblings, then use preceding. The following code produces a unique list of products from SalesBySalesperson.xml:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <products>
      <xsl:for-each select="//product[not(@sku=preceding::product/@sku)]">
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </products>
  </xsl:template>

</xsl:stylesheet>

If the elements are all siblings, then use preceding-sibling:

<products>
  <product sku="10000" totalSales="10000.00"/>
  <product sku="10000" totalSales="990000.00"/>
  <product sku="10000" totalSales="1110000.00"/>
  <product sku="20000" totalSales="50000.00"/>
  <product sku="20000" totalSales="150000.00"/>
  <product sku="20000" totalSales="150000.00"/>
  <product sku="25000" totalSales="920000.00"/>
  <product sku="25000" totalSales="2920000.00"/>
  <product sku="30000" totalSales="5500.00"/>
  <product sku="30000" totalSales="115500.00"/>
  <product sku="70000" totalSales="10000.00"/>
</products>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/products">
    <products>
      <xsl:for-each select="product[not(@sku=preceding-sibling::product/@sku)]">
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </products>
  </xsl:template>

</xsl:stylesheet>

To avoid preceding, which can be inefficient, travel up to the ancestors that are siblings, and then use preceding-sibling and travel down to the nodes you want to test:

<xsl:for-each select="//product[not(@sku=../preceding-sibling::*/product/@sku)]">
  <xsl:copy-of select="."/>
</xsl:for-each>
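To make the up-across-down navigation concrete, here is a small input sketched for illustration. The element and attribute names (salesBySalesperson, salesperson, name) and all values are assumptions, not the contents of the actual SalesBySalesperson.xml. The comments mark which product elements the expression above would keep.

```xml
<!-- Hypothetical input: products grouped under sibling salesperson elements -->
<salesBySalesperson>
  <salesperson name="A">
    <product sku="10000" totalSales="10000.00"/>  <!-- kept: no earlier salesperson has it -->
    <product sku="20000" totalSales="50000.00"/>  <!-- kept -->
  </salesperson>
  <salesperson name="B">
    <product sku="10000" totalSales="990000.00"/> <!-- dropped: sku 10000 appears under A -->
    <product sku="25000" totalSales="920000.00"/> <!-- kept -->
  </salesperson>
</salesBySalesperson>
```

Note that this expression compares a product only with products under earlier sibling parents. If the same sku could repeat within a single salesperson, the predicate would also need a preceding-sibling::product/@sku test on the product itself.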

If you are certain that the elements are sorted so that duplicate nodes are adjacent (as in the products document shown earlier), then you only have to consider the immediately preceding sibling:

<xsl:for-each
    select="/salesperson/product[not(@name=preceding-sibling::product[1]/@name)]">
  <!-- do something with each uniquely named product -->
</xsl:for-each>

XSLT 2.0

You can solve this problem in XSLT 2.0 as a grouping problem. Simply use for-each-group with the uniqueness criteria as the group-by value. Use the first node in the current group as the unique one:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <products>
      <xsl:for-each-group select="//product" group-by="@sku">
        <xsl:copy-of select="current-group( )[1]"/>
      </xsl:for-each-group>
    </products>
  </xsl:template>

</xsl:stylesheet>
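The group-by value does not have to be a single attribute. As a minimal sketch, assume an input made of person elements that carry lastname and firstname attributes, and that two people count as duplicates when both values match. The people wrapper element and the '|' separator are choices made for this illustration; the separator is there so that pairs such as "ab" + "c" and "a" + "bc" do not produce the same key.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <people>
      <!-- Assumed input: person elements with lastname and firstname attributes -->
      <xsl:for-each-group select="//person"
                          group-by="concat(@lastname, '|', @firstname)">
        <!-- Keep only the first member of each group -->
        <xsl:copy-of select="current-group( )[1]"/>
      </xsl:for-each-group>
    </people>
  </xsl:template>

</xsl:stylesheet>
```

The groups are processed in order of first appearance, so the surviving elements should come out in their original document order.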

Discussion

XSLT 1.0

Using the node-set( ) extension function, you can also sort a copy of the nodes into a variable and then compare each node with its predecessor in the sorted copy:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exslt="http://exslt.org/common">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">

    <!-- Sorted copy of all products (a result tree fragment) -->
    <xsl:variable name="products">
      <xsl:for-each select="//product">
        <xsl:sort select="@sku"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </xsl:variable>

    <products>
      <xsl:for-each select="exslt:node-set($products)/product">
        <xsl:variable name="pos" select="position( )"/>
        <!-- In the sorted copy, duplicates are adjacent siblings -->
        <xsl:if test="$pos = 1 or
                      not(@sku = preceding-sibling::product[1]/@sku)">
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:for-each>
    </products>

  </xsl:template>

</xsl:stylesheet>

However, I have never found this technique to be faster than using the preceding axis. This technique does have an advantage in situations where the duplicate testing is not trivial. For example, consider a case where duplicates are determined by the concatenation of two attributes:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exslt="http://exslt.org/common">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">

    <!-- Sorted copy of all people (a result tree fragment) -->
    <xsl:variable name="people">
      <xsl:for-each select="//person">
        <xsl:sort select="concat(@lastname,@firstname)"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </xsl:variable>

    <people>
      <xsl:for-each select="exslt:node-set($people)/person">
        <xsl:variable name="pos" select="position( )"/>
        <!-- Compare with the previous person in the sorted copy -->
        <xsl:if test="$pos = 1 or
                      concat(@lastname,@firstname) !=
                      concat(preceding-sibling::person[1]/@lastname,
                             preceding-sibling::person[1]/@firstname)">
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:for-each>
    </people>

  </xsl:template>

</xsl:stylesheet>

When you attempt to remove duplicates, the following examples do not work:

<xsl:template match="/">
  <products>
    <xsl:for-each select="//product[not(@sku=preceding::product[1]/@sku)]">
      <xsl:sort select="@sku"/>
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </products>
</xsl:template>

Sorting inside xsl:for-each does not let you get away with testing only the immediately preceding element. The preceding axis is evaluated relative to the node's original position in the document, not its position in the sorted sequence. The same applies to preceding-sibling. The following code is also sure to fail:

<xsl:template match="/">

  <xsl:variable name="products">
    <xsl:for-each select="//product">
      <!-- sort removed from here -->
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </xsl:variable>

  <products>
    <xsl:for-each select="exsl:node-set($products)/product">
      <xsl:sort select="@sku"/>
      <xsl:variable name="pos" select="position( )"/>
      <xsl:if test="$pos = 1 or
                    @sku != exsl:node-set($products)/product[$pos - 1]/@sku">
        <xsl:copy-of select="."/>
      </xsl:if>
    </xsl:for-each>
  </products>

</xsl:template>

This code fails because position( ) returns the position after sorting, but the contents of $products have not been sorted. The xsl:sort instruction only orders the nodes as xsl:for-each visits them, and that sorted sequence cannot be reached through $products.

XSLT 2.0

Sometimes you only want to remove duplicate elements when they are adjacent. Consider, for example, a data set derived from a series of measurements taken at set time intervals. If the system being measured is fairly stable, there may be many adjacent measurements that are equal. One may want to remove these adjacent duplicates without removing other equal measurements that appear later in the sequence.

For this problem, you would still use xsl:for-each-group but with group-adjacent rather than group-by:

<measurements>
  <data time="12:00:00" value="1.0"/>
  <data time="12:00:01" value="1.0"/>
  <data time="12:00:02" value="1.1"/>
  <data time="12:00:03" value="1.1"/>
  <data time="12:00:04" value="1.0"/>
  <data time="12:00:05" value="1.1"/>
  <data time="12:00:06" value="1.2"/>
  <data time="12:00:07" value="1.3"/>
  <data time="12:00:08" value="1.4"/>
  <data time="12:00:09" value="1.6"/>
  <data time="12:00:10" value="1.9"/>
  <data time="12:00:11" value="2.1"/>
  <data time="12:00:12" value="1.7"/>
  <data time="12:00:13" value="1.5"/>
</measurements>

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/measurements">
    <xsl:copy>
      <xsl:for-each-group select="data" group-adjacent="@value">
        <xsl:copy-of select="current-group( )[1]"/>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

<!-- Output -->
<measurements>
  <data time="12:00:00" value="1.0"/>
  <data time="12:00:02" value="1.1"/>
  <data time="12:00:04" value="1.0"/>
  <data time="12:00:05" value="1.1"/>
  <data time="12:00:06" value="1.2"/>
  <data time="12:00:07" value="1.3"/>
  <data time="12:00:08" value="1.4"/>
  <data time="12:00:09" value="1.6"/>
  <data time="12:00:10" value="1.9"/>
  <data time="12:00:11" value="2.1"/>
  <data time="12:00:12" value="1.7"/>
  <data time="12:00:13" value="1.5"/>
</measurements>

See Also

The XSLT FAQ (http://www.dpawson.co.uk/xsl/sect2/N2696.html) describes a solution that uses keys and describes solutions to related problems.
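
A key-based approach to this problem in XSLT 1.0 is commonly known as Muenchian grouping. What follows is a minimal sketch of that general technique applied to the product example from this recipe; it is not taken from the FAQ, and the key name product-by-sku is an arbitrary choice. A product is kept only when it is the first node that the key returns for its sku.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <!-- Index every product by its sku (the key name is an assumption) -->
  <xsl:key name="product-by-sku" match="product" use="@sku"/>

  <xsl:template match="/">
    <products>
      <!-- Keep a product only if it is the first one indexed under its sku -->
      <xsl:for-each
          select="//product[generate-id( ) =
                            generate-id(key('product-by-sku', @sku)[1])]">
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </products>
  </xsl:template>

</xsl:stylesheet>
```

Because the processor can build the key index once, this form may avoid the repeated scans that the preceding axis implies on large documents, although actual performance depends on the processor.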




XSLT Cookbook: Solutions and Examples for XML and XSLT Developers, 2nd Edition
ISBN: 0596009747
Year: 2003
Pages: 208
Authors: Sal Mangano
