Recipe3.7.Finding Minimums and Maximums


Recipe 3.7. Finding Minimums and Maximums

Problem

You need to find the minimum (or maximum) numerical node (or nodes) in a node set.

Solution

XSLT 1.0

The EXSLT functions that perform these operations are exsl:min, exsl:max, exsl:lowest, and exsl:highest. The functions min and max find the value of the node with minimum and maximum numerical value, respectively. EXSLT defines exsl:min as listed here:

  • The minimum value is defined as follows: the node set passed as an argument is sorted in ascending order as it would be by xsl:sort with a data type of number. The minimum is the result of converting the string value of the first node in this sorted list to a number using the number function.

  • If the node set is empty, or if the result of converting the string values of any of the nodes to a number is NaN, then NaN is returned.

exsl:max is defined similarly. EXSLT provides pure XSLT implementations that are literal implementations of this definition, as shown in Example 3-8.

Example 3-8. EXSLT min and max implement directly from the definition
<xsl:template name="ckbk:min">    <xsl:param name="nodes" select="/.." />    <xsl:choose>       <xsl:when test="not($nodes)">NaN</xsl:when>       <xsl:otherwise>          <xsl:for-each select="$nodes">             <xsl:sort data-type="number" />             <xsl:if test="position( ) = 1">                <xsl:value-of select="number(.)" />             </xsl:if>          </xsl:for-each>       </xsl:otherwise>    </xsl:choose> </xsl:template>     <xsl:template name="ckbk:max">    <xsl:param name="nodes" select="/.." />    <xsl:choose>       <xsl:when test="not($nodes)">NaN</xsl:when>       <xsl:otherwise>          <xsl:for-each select="$nodes">             <xsl:sort data-type="number" order="descending" />             <xsl:if test="position( ) = 1">                <xsl:value-of select="number(.)" />             </xsl:if>          </xsl:for-each>       </xsl:otherwise>    </xsl:choose> </xsl:template>

You may be scratching your head over the default value for the nodes parameter (select="/.."). This is simply an idiomatic way to initialize nodes to an empty node set (i.e., the parent of the root is empty by definition).

Although the definitions of ckbk:min and ckbk:max use xsl:sort, the implementations need not, so results that are more efficient are possible provided your XSLT processor supports tail recursion (see Example 3-9).

Example 3-9. min and max implemented with divide and conquer
<xsl:template name="ckbk:max">      <xsl:param name="nodes" select="/.."/>      <xsl:param name="max"/>   <xsl:variable name="count" select="count($nodes)"/>   <xsl:variable name="aNode" select="$nodes[ceiling($count div 2)]"/>   <xsl:choose>     <xsl:when test="not($count)">       <xsl:value-of select="number($max)"/>     </xsl:when>     <xsl:when test="number($aNode) != number($aNode)">       <xsl:value-of select="number($aNode)"/>     </xsl:when>     <xsl:otherwise>       <xsl:call-template name="ckbk:max">         <xsl:with-param name="nodes" select="$nodes[not(. &lt;= number($aNode))]"/>         <xsl:with-param name="max">           <xsl:choose>             <xsl:when test="not($max) or $aNode > $max">               <xsl:value-of select="$aNode"/>             </xsl:when>             <xsl:otherwise>               <xsl:value-of select="$max"/>             </xsl:otherwise>           </xsl:choose>         </xsl:with-param>       </xsl:call-template>     </xsl:otherwise>   </xsl:choose> </xsl:template>     <xsl:template name="ckbk:min">   <xsl:param name="nodes" select="/.."/>   <xsl:param name="min"/>   <xsl:variable name="count" select="count($nodes)"/>   <xsl:variable name="aNode" select="$nodes[ceiling($count div 2)]"/>   <xsl:choose>     <xsl:when test="not($count)">       <xsl:value-of select="number($min)"/>     </xsl:when>     <xsl:when test="number($aNode) != number($aNode)">       <xsl:value-of select="number($aNode)"/>     </xsl:when>     <xsl:otherwise>       <xsl:call-template name="ckbk:min">         <xsl:with-param name="nodes" select="$nodes[not(. >= number($aNode))]"/>         <xsl:with-param name="min">           <xsl:choose>             <xsl:when test="not($min) or $aNode &lt; $min">               <xsl:value-of select="$aNode"/>             </xsl:when>             <xsl:otherwise>               <xsl:value-of select="$min"/>             </xsl:otherwise>           </xsl:choose>         </xsl:with-param>       </xsl:call-template>     </xsl:otherwise>   </xsl:choose> </xsl:template>

Typically, the preceding implementations are faster than the version using xsl:sort. In some degenerate cases, they are likely to be slower. The reason is that efficiency hinges on removing half the nodes from consideration (on average) at each recursive step. One can imagine a scenario in which at each pass, aNode is the minimum node (in the case of ckbk:max) or the maximum remaining node (in the case of ckbk:min). If this were to occur, each pass would remove only one node, resulting in poor performance. Luckily, data tends to come in two configurations: presorted and random. In both cases, these implementations should hold their own.

While you had to look out for non-numeric data explicitly, EXSLT's implementations let xsl:sort take care of this. The XSLT standard mandates that non-numeric data be placed up front by sort when data-type='number'.

Don't be tempted to write not(number($var)) to test for NaN! I often catch myself doing this because it sounds correct. The number function does not test for a number; instead, it attempts to convert its argument to a number. This is not what you wantthis test will conclude 0 is not a number due to the conversion of 0 to false. The correct test is number($var) != number($var). This test works because NaN is never equal to NaN, but any number is always equal to itself. Do not be tempted to shorten this idiom to number($var) != $var. Doing so works most of the time, but if $var is an empty node set, it will fail. If you prefer, a more direct approach string(number($var)) = 'NaN' also works.


EXSLT defines ckbk:lowest as follows.

  • The ckbk:lowest function returns the nodes in the node set whose value is the minimum value for the node set. The minimum value for the node set is the same as the value as calculated by ckbk:min. A node has this minimum value if the result of converting its string value to a number as if by the number function is equal to the minimum value, where the equality comparison is defined as a numerical comparison using the = operator.

  • If any of the nodes in the node set has a non-numeric value, the ckbk:min function will return NaN. The definition numeric comparisons entails that NaN != NaN. Therefore, if any of the nodes in the node set has a non-numeric value, ckbk:lowest will return an empty node set.

The EXSLT implementation is literally based on this definition and might not be very efficient:

<xsl:template name="ckbk:lowest">    <xsl:param name="nodes" select="/.." />    <xsl:if test="$nodes and not($nodes[number(.) != number(.)])">       <xsl:variable name="min">          <xsl:for-each select="$nodes">             <xsl:sort data-type="number" />             <xsl:if test="position( ) = 1">                <xsl:value-of select="number(.)" />             </xsl:if>          </xsl:for-each>       </xsl:variable>       <xsl:copy-of select="$nodes[. = $min]" />    </xsl:if> </xsl:template>

The xsl:if test scans all nodes to handle cases when a non-numeric is present. It then sorts to find the min and finally collects all nodes with that min. Example 3-10 reuses ckbk:min to do the same without needing to sort.

Example 3-10. Lowest implemented by reusing ckbk:min
<xsl:template name="ckbk:lowest">      <xsl:param name="nodes" select="/.."/>          <xsl:variable name="min">        <xsl:call-template name="ckbk:min">          <xsl:with-param name="nodes" select="$nodes"/>        </xsl:call-template>      </xsl:variable>       <xsl:choose>        <xsl:when test="number($min) = number($min)">          <xsl:copy-of select="$nodes[. = $min]"/>        </xsl:when>      </xsl:choose> </xsl:template>

Finally, you can implement ckbk:lowest with only one pass over the nodes if you are willing to forgo reuse of ckbk:min (see Example 3-11).

Example 3-11. Lowest implemented without reuse of ckbk:min
<xsl:template name="ckbk:lowest">   <xsl:param name="nodes" select="/.."/>   <xsl:param name="lowest" select="/.."/>   <xsl:variable name="index" select="ceiling(count($nodes) div 2)"/>   <xsl:variable name="aNode" select="$nodes[$index]"/>   <xsl:choose>     <xsl:when test="not($index)">       <xsl:copy-of select="$lowest"/>     </xsl:when>     <xsl:when test="number($aNode) != number($aNode)"/>     <xsl:otherwise>       <xsl:choose>         <xsl:when test="not($lowest) or $aNode &lt; $lowest">            <xsl:call-template name="ckbk:lowest">             <xsl:with-param name="nodes" select="$nodes[not(. >= $aNode)]"/>             <xsl:with-param name="lowest" select="$nodes[. = $aNode]"/>           </xsl:call-template>         </xsl:when>         <xsl:when test="$aNode = $lowest">            <xsl:call-template name="ckbk:lowest">             <xsl:with-param name="nodes" select="$nodes[not(. >= $aNode)]"/>             <xsl:with-param name="lowest" select="$lowest|$nodes[$index]"/>           </xsl:call-template>         </xsl:when>         <xsl:otherwise>            <xsl:call-template name="ckbk:lowest">             <xsl:with-param name="nodes" select="$nodes[not(. >= $aNode)]"/>             <xsl:with-param name="lowest" select="$lowest"/>           </xsl:call-template>         </xsl:otherwise>       </xsl:choose>     </xsl:otherwise>   </xsl:choose> </xsl:template>

Interestingly, this implementation does worse, probably because of the additional copying that occurs. In performance tests on 10,000 data points using various distributions of data (sorted, reverse sorted, semirandom, and random), the ckbk:min-based implementation beat the xsl:sort-based implementation by about 40% on average (often better). The recursive implementation that did not use ckbk:min was 24% slower than the one that did.

The ckbk:highest definition and implementations follow directly from inverting the relational logic of ckbk:lowest, so I will not discuss them here.

XSLT 2.0

XPath 2.0 has native min and max functions.

Discussion

The minimum and maximum values of a node set can be determined by the simple XPath expressions <xsl:value-of select="$nodes[not($nodes &lt; .)]"/> and <xsl:value-of select="$nodes[not($nodes > .)]"/>. In English, the first says, "Select all nodes for which there is no node less than its value," and the second says, "Select all nodes for which there is no node greater than its value."

Although very simple, these expressions have O(N2) performance, where N is the number of nodes. Therefore, unless you are certain that the number of nodes is small, avoid this shortcut if you can. Occasionally, you are forced to use it because, for example, you have to find the min/max within the select attribute of xsl:sort or the use attribute of xsl:key (for which you cannot call a template).

In another XSLT publication, the following recursive implementation for finding minimums is described as more efficient than one using xsl:sort:

<xsl:template name="ckbk:min">   <xsl:param name="nodes" select="/.."/>   <xsl:param name="min" select="number('NaN')"/>   <xsl:choose>     <xsl:when test="not($nodes)">       <xsl:value-of select="number($min)"/>     </xsl:when>     <xsl:otherwise>       <xsl:variable name="aNode" select="$nodes[1]"/>       <xsl:call-template name="ckbk:min">         <xsl:with-param name="nodes" select="$nodes[position( ) > 1]"/>         <xsl:with-param name="min">           <xsl:choose>             <xsl:when test="$aNode &lt; $min or string($min) = 'NaN'">               <xsl:value-of select="$aNode"/>             </xsl:when>             <xsl:otherwise>               <xsl:value-of select="$min"/>             </xsl:otherwise>           </xsl:choose>         </xsl:with-param>       </xsl:call-template>     </xsl:otherwise>   </xsl:choose> </xsl:template>

This is simply not the case on any XSLT implementation I tested, and I can't imagine that it ever could be. The reason this is much more likely to be slow is because only one node is removed from consideration at each step. Even with tail recursion, there will be a lot of copying of nodes. It is easy to be fooled into thinking that this recursive solution is as efficient as the iterative-indexing solution you might come up with in C or Java. However, indexing is not the same as creating a brand-new node set with the first item removed, as is the case with $nodes[position( ) > 1].

In many cases when you need to find the minimum of a data set, you end up also needing the maximum. It would be nice to have a function handy that gives two for the price of one. The following will do so for a slight increase in complexity:

<xsl:template name="ckbk:min-max">   <xsl:param name="nodes" select="/.."/>   <xsl:param name="nodes-for-max" select="$nodes"/>   <xsl:param name="min"/>   <xsl:param name="max"/>   <xsl:variable name="count1" select="count($nodes)"/>   <xsl:variable name="aNode1" select="$nodes[ceiling($count1 div 2)]"/>   <xsl:variable name="count2" select="count($nodes-for-max)"/>   <xsl:variable name="aNode2" select="$nodes-for-max[ceiling($count2 div 2)]"/>   <xsl:choose>     <xsl:when test="not($count1) and not($count2)">       <xsl:value-of select="concat(number($min),',',number($max))"/>     </xsl:when>     <xsl:when test="number($aNode1) != number($aNode1) and                 number($aNode2) != number($aNode2)">       <xsl:value-of select="concat(number($aNode1),',',number($aNode2))"/>     </xsl:when>     <xsl:otherwise>       <xsl:call-template name="ckbk:min-max">         <xsl:with-param name="nodes" select="$nodes[not(. >= number($aNode1))]"/>         <xsl:with-param name="nodes-for-max"            select="$nodes-for-max[not(. &lt;= number($aNode2))]"/>         <xsl:with-param name="min">           <xsl:choose>             <xsl:when test="not($min) or $aNode1 &lt; $min">               <xsl:value-of select="$aNode1"/>             </xsl:when>             <xsl:otherwise>               <xsl:value-of select="$min"/>             </xsl:otherwise>           </xsl:choose>         </xsl:with-param>         <xsl:with-param name="max">           <xsl:choose>             <xsl:when test="not($max) or $aNode2 > $max">               <xsl:value-of select="$aNode2"/>             </xsl:when>             <xsl:otherwise>               <xsl:value-of select="$max"/>             </xsl:otherwise>           </xsl:choose>         </xsl:with-param>       </xsl:call-template>     </xsl:otherwise>     </xsl:choose> </xsl:template>

Testing shows that this approach continues to outperform a sort-based solution.

When considering minima and maxima, only one special case was addressed: when the nodes are literally the numbers you must process. A more general problem involves finding the minimum or maximum of a function of the nodes in a node set. For example, consider the case in which you have a set of positive and negative numbers and you need the minimum of the square of their value. Although hacking the previous implementations to do the squaring is simple, a single reusable implementation is more desirable. Chapter 14 revisits this problem and describes several alternatives for creating generic solutions.




XSLT Cookbook
XSLT Cookbook: Solutions and Examples for XML and XSLT Developers, 2nd Edition
ISBN: 0596009747
EAN: 2147483647
Year: 2003
Pages: 208
Authors: Sal Mangano

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net