Recipe 2.4. Finding Substrings from the End of a StringProblemXSLT does not have any functions for searching strings in reverse. SolutionXSLT 1.0Using recursion, you can emulate a reverse search with a search for the last occurrence of substr. Using this technique, you can create a substring-before-last and a substring-after-last: <xsl:template name="substring-before-last"> <xsl:param name="input" /> <xsl:param name="substr" /> <xsl:if test="$substr and contains($input, $substr)"> <xsl:variable name="temp" select="substring-after($input, $substr)" /> <xsl:value-of select="substring-before($input, $substr)" /> <xsl:if test="contains($temp, $substr)"> <xsl:value-of select="$substr" /> <xsl:call-template name="substring-before-last"> <xsl:with-param name="input" select="$temp" /> <xsl:with-param name="substr" select="$substr" /> </xsl:call-template> </xsl:if> </xsl:if> </xsl:template> <xsl:template name="substring-after-last"> <xsl:param name="input"/> <xsl:param name="substr"/> <!-- Extract the string which comes after the first occurrence --> <xsl:variable name="temp" select="substring-after($input,$substr)"/> <xsl:choose> <!-- If it still contains the search string the recursively process --> <xsl:when test="$substr and contains($temp,$substr)"> <xsl:call-template name="substring-after-last"> <xsl:with-param name="input" select="$temp"/> <xsl:with-param name="substr" select="$substr"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:value-of select="$temp"/> </xsl:otherwise> </xsl:choose> </xsl:template> XSLT 2.0XSLT 2.0 does not add reverse versions of substring-before/after, but one can get the desired effect using the versatile tokenize( ) function that uses regular expressions: <xsl:function name="ckbk:substring-before-last"> <xsl:param name="input" as="xs:string"/> <xsl:param name="substr" as="xs:string"/> <xsl:sequence select="if ($substr) then if (contains($input, $substr)) then string-join(tokenize($input, $substr) [position( ) ne last( )],$substr) else '' else $input"/> </xsl:function> <xsl:function name="ckbk:substring-after-last"> <xsl:param name="input" as="xs:string"/> <xsl:param name="substr" as="xs:string"/> <xsl:sequence select="if ($substr) then if (contains($input, $substr)) then tokenize($input, $substr)[last( )] else '' else $input"/> </xsl:function> In both functions, we have to test if substring is empty because tokenize does not allow an empty search pattern. Unfortunately, these implementations won't work exactly like their native counterparts. This is because tokenize treats its second argument as a regular, not a literal, string. This could lead to some surprises. You can fix this by having the function escape the special characters used in regular expression. You can switch this behavior on and off via a third Boolean argument. The original two-argument version and this new three-argument version can coexist because XSLT allows functions to be overloaded (a function is defined by its name and its arity or number of arguments): <xsl:function name="ckbk:substring-before-last"> <xsl:param name="input" as="xs:string"/> <xsl:param name="substr" as="xs:string"/> <xsl:param name="mask-regex" as="xs:boolean"/> <xsl:variable name="matchstr" select="if ($mask-regex) then replace($substr,'([.+?*^$])','\$1') else $substr"/> <xsl:sequence select="ckbk:substring-before-last($input,$matchstr)"/> </xsl:function> <xsl:function name="ckbk:substring-after-last"> <xsl:param name="input" as="xs:string"/> <xsl:param name="substr" as="xs:string"/> <xsl:param name="mask-regex" as="xs:boolean"/> <xsl:variable name="matchstr" select="if ($mask-regex) then replace($substr,'([.+?*^$])','\$1') else $substr"/> <xsl:sequence select="ckbk:substring-after-last($input,$matchstr)"/> </xsl:function> DiscussionBoth XSLT string-searching functions (substring-before and substring-after) begin searching at the start of the string. Sometimes you need to search a string from the end. The simplest way to do this in XSLT is to apply the built-in search functions recursively until the last instance of the substring is found.
Another algorithm is divide and conquer. The basic idea is to split the string in half. If the search string is in the second half, then you can discard the first half, thus turning the problem into a problem half as large. This process repeats recursively. The tricky part is when the search string is not in the second half because you may have split the search string between the two halves. Here is a solution for substring-before-last: <xsl:template name="str:substring-before-last"> <xsl:param name="input"/> <xsl:param name="substr"/> <xsl:variable name="mid" select="ceiling(string-length($input) div 2)"/> <xsl:variable name="temp1" select="substring($input,1, $mid)"/> <xsl:variable name="temp2" select="substring($input,$mid +1)"/> <xsl:choose> <xsl:when test="$temp2 and contains($temp2,$substr)"> <!-- search string is in second half so just append first half --> <!-- and recurse on second --> <xsl:value-of select="$temp1"/> <xsl:call-template name="str:substring-before-last"> <xsl:with-param name="input" select="$temp2"/> <xsl:with-param name="substr" select="$substr"/> </xsl:call-template> </xsl:when> <!--search string is in boundary so a simple substring-before --> <!-- will do the trick--> <xsl:when test="contains(substring($input, $mid - string-length($substr) +1), $substr)"> <xsl:value-of select="substring-before($input,$substr)"/> </xsl:when> <!--search string is in first half so throw away second half--> <xsl:when test="contains($temp1,$substr)"> <xsl:call-template name="str:substring-before-last"> <xsl:with-param name="input" select="$temp1"/> <xsl:with-param name="substr" select="$substr"/> </xsl:call-template> </xsl:when> <!-- No occurrences of search string so we are done --> <xsl:otherwise/> </xsl:choose> </xsl:template> As it turns out, divide and conquer is of little or no advantage unless you search large texts (roughly 4,000 characters or more). You might have a wrapper template that chooses the appropriate algorithm based on the length or switches from divide and conquer to the simpler algorithm when the subpart becomes small enough. |