Unicode Code Point Functions
There are two functions in XPath 2.0 designed to work with Unicode code points, the integer values that Unicode assigns to individual characters.
The fn:codepoints-to-string Function
You pass this function a sequence of Unicode code points, and it converts them to a string. Here's how you use this function:
fn:codepoints-to-string($srcval as xs:integer*) as xs:string
For example, to convert the Unicode code point sequence (65, 66, 67) into the corresponding string, "ABC", you can use this function as shown in ch10_13.xsl (Listing 10.13).
Listing 10.13 An XSLT Example Using the XPath Function fn:codepoints-to-string (ch10_13.xsl)
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:template match="/">
<xsl:value-of select="codepoints-to-string((65, 66, 67))"/>
</xsl:template>
</xsl:stylesheet>
And here is the result:
<?xml version="1.0" encoding="UTF-8"?>
ABC
The fn:string-to-codepoints Function
This function lets you create a sequence of Unicode code points from a string. Here's how you use this function:
fn:string-to-codepoints($srcval as xs:string) as xs:integer*
You can see an example in ch10_14.xsl (Listing 10.14), which converts the string "ABC" to a sequence of code points and displays that sequence by using the separator attribute of the <xsl:value-of> element.
Listing 10.14 An XSLT Example Using the XPath Function fn:string-to-codepoints (ch10_14.xsl)
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:template match="/">
<xsl:value-of select="string-to-codepoints('ABC')" separator=", "/>
</xsl:template>
</xsl:stylesheet>
And here's the result. As you can see, we've been able to convert "ABC" into a sequence of code points:
<?xml version="1.0" encoding="UTF-8"?>
65, 66, 67
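Because the two functions are inverses of each other, they can be combined to transform a string one code point at a time. The following is a minimal sketch, not one of the book's listings; the sample strings and the use of fn:reverse and a for expression are illustrative choices.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- 'abc' is (97, 98, 99); subtracting 32 from each gives (65, 66, 67), so this outputs ABC -->
    <xsl:value-of select="codepoints-to-string(
        for $c in string-to-codepoints('abc') return $c - 32)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Reverse the code point sequence before converting back, so this outputs CBA -->
    <xsl:value-of select="codepoints-to-string(
        reverse(string-to-codepoints('ABC')))"/>
  </xsl:template>
</xsl:stylesheet>
```

The subtraction trick only works for the unaccented letters "a" through "z"; for general case conversion, fn:upper-case is the better tool.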
Using Pattern Matching
A big addition to XPath in version 2.0 is the set of pattern-matching functions: fn:matches, fn:replace, and fn:tokenize.
All of these functions use regular expressions. The regular expressions used in XPath 2.0 are the same as those used in XML Schema, with some additions.
Understanding Regular Expressions
Now that XML Schema supports regular expressions, many books on XML discuss how to create them. Nonetheless, we'll give a brief introduction to the topic here for those not familiar with the subject.
MORE ON REGULAR EXPRESSIONS
You can find the XML Schema support for regular expressions discussed at http://www.w3.org/TR/xmlschema-2/. This support is a subset of the regular expressions used in the Perl programming language, and you can find the complete documentation for Perl regular expressions at the Comprehensive Perl Archive Network (CPAN) Web site: www.cpan.org/doc/manual/html/pod/perlre.html.
Regular expressions are made up of patterns, and these patterns can be used to match text in your data. Each character matches itself by default, so if you have the pattern
Hello
you can match the text "Hello". You can also use regular expression special characters in a pattern, like this:
\bHello\b
Here are the special characters, called metacharacters, that you can use in regular expressions:
And here are the available assertions, which assert that a particular condition is true:
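For example, if you wanted to match a string of digits in the form 123-45-6789, you could use this pattern, in which each \d matches a single digit:
\d\d\d-\d\d-\d\d\d\d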
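To see this pattern at work, here is a small sketch that tests it with the fn:matches function, which is covered later in this section. The sample strings are made up for illustration, and the ^ and $ anchors are added so that the whole string, not just part of it, has to fit the pattern.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Three digits, two digits, four digits: outputs true -->
    <xsl:value-of select="matches('123-45-6789', '^\d\d\d-\d\d-\d\d\d\d$')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- The middle group has three digits, so the pattern fails: outputs false -->
    <xsl:value-of select="matches('123-456-789', '^\d\d\d-\d\d-\d\d\d\d$')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Without anchors the pattern may match anywhere inside the string: outputs true -->
    <xsl:value-of select="matches('ID 123-45-6789 on file', '\d\d\d-\d\d-\d\d\d\d')"/>
  </xsl:template>
</xsl:stylesheet>
```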
Here's another example, this time using character classes. For example, the character class
[abc]
matches only the characters "a", "b", or "c". You can use a dash in a character class as a shortcut to specify a range of characters, so [A-Za-z] matches any uppercase or lowercase letter from "A" to "Z". This pattern matches a word made up of such letters:
\b([A-Za-z]+)\b
The + sign is a regular expression quantifier. You can use these quantifiers in regular expressions:
This regular expression matches any word (even if it includes digits):
\b\w+\b
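The difference between a character class such as [A-Za-z] and the \w escape can be checked directly. The following sketch uses fn:matches (covered later in this section) with made-up sample strings; the patterns are anchored with ^ and $ so that each test applies to the whole string.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- 'b' is one of the listed characters: outputs true -->
    <xsl:value-of select="matches('b', '^[abc]$')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- 'd' is not in the class: outputs false -->
    <xsl:value-of select="matches('d', '^[abc]$')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- One or more letters only: outputs true -->
    <xsl:value-of select="matches('XPath', '^[A-Za-z]+$')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- The digit 2 is outside [A-Za-z]: outputs false -->
    <xsl:value-of select="matches('XPath2', '^[A-Za-z]+$')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- \w also accepts digits: outputs true -->
    <xsl:value-of select="matches('XPath2', '^\w+$')"/>
  </xsl:template>
</xsl:stylesheet>
```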
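You can also combine these elements into longer patterns, such as this one, which matches text in the form of an email address:
\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*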
The matches to the
XPath 2.0 Versus XML Schema Difference
The regular expressions in XPath 2.0 are actually more powerful than those in XML Schema. XML Schema uses regular expressions only for validity checking, which means it doesn't support some powerful text-handling techniques.
In particular, two modes are defined in XPath 2.0 regular expressions: string mode and multiline mode (just as in Perl regular expressions). You specify which mode you want with flags, coming up in a page or two.
In addition, two special characters, ^ and $, are also supported in XPath 2.0 regular expressions. As in standard regular expressions, in string mode, the character ^ matches the start of the string, and $ matches the end of the string. In multiline mode, ^ matches the start of any line (lines are separated by newline characters, \n, which is #x0A in Unicode), and $ matches the end of any line.
As in standard regular expressions, when you're in string mode, the character . matches any character. In multiline mode, the metacharacter . matches any character except a newline character. For example, the regular expression
^J.*d$
will match this text:
James Bond
Minimal matching is also supported in XPath 2.0 regular expressions. For example, suppose you have the text "That is some book, isn't it?" and you want to match the regular expression .*is to this string. In the default case, this expression will match as much as it can, so instead of matching "That is", this regular expression will match "That is some book, is". To indicate that you want to match as little as possible, you can use an additional question mark, so the regular expression .*?is would match "That is".
Here is how minimal matching works in XPath 2.0 regular expressions:
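The difference between default and minimal matching can be sketched with the fn:replace function, which is covered later in this section. This is an illustrative example rather than one of the book's listings: the variable name text and the replacement marker [] are arbitrary, and the ^ anchor is added so that only the match at the start of the string is replaced.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- The apostrophe in isn't is doubled to escape it inside the XPath string literal -->
    <xsl:variable name="text" select="'That is some book, isn''t it?'"/>
    <!-- Default matching: .* takes as much as it can, so "That is some book, is"
         is replaced and the output is []n't it? -->
    <xsl:value-of select="replace($text, '^.*is', '[]')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Minimal matching: .*? takes as little as it can, so only "That is"
         is replaced and the output is [] some book, isn't it? -->
    <xsl:value-of select="replace($text, '^.*?is', '[]')"/>
  </xsl:template>
</xsl:stylesheet>
```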
The three regular expression functions support an optional parameter, $flags, that you use to set options. This parameter is a string, and individual letters are used to set the corresponding options. The presence of a letter in the string indicates that the option is on; if it's not present, the option is off.
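As a rough illustration of how the $flags string is used, the sketch below passes two flags defined in the XPath 2.0 functions specification to fn:matches (covered next): i for case-insensitive matching and m for multiline mode. The sample strings and the variable name lines are assumptions made for this example; the newline is built with codepoints-to-string(10).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Two lines of text, "James" and "Bond", separated by a newline (#x0A) -->
    <xsl:variable name="lines"
        select="concat('James', codepoints-to-string(10), 'Bond')"/>
    <!-- No flags: matching is case sensitive, so this outputs false -->
    <xsl:value-of select="matches('BANANAS', 'bananas')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- The i flag turns on case-insensitive matching: outputs true -->
    <xsl:value-of select="matches('BANANAS', 'bananas', 'i')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- String mode: ^ and $ refer to the whole string, so this outputs false -->
    <xsl:value-of select="matches($lines, '^Bond$')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- The m flag selects multiline mode, so ^ and $ match at line breaks: outputs true -->
    <xsl:value-of select="matches($lines, '^Bond$', 'm')"/>
  </xsl:template>
</xsl:stylesheet>
```

Flags can be combined in one string, so 'im' would turn on both options at once.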
The fn:matches Function
The fn:matches function returns true if a regular expression matches the text in a string, and false otherwise. Here are the two ways to use this function:
fn:matches($srcval as xs:string?, $pattern as xs:string) as xs:boolean?
fn:matches($srcval as xs:string?, $pattern as xs:string, $flags as xs:string) as xs:boolean?
Here's an example; we'll check whether we can find the word "bananas" in a string like this: fn:matches('Want some bananas today?', '\bbananas\b'). You can see how this works in ch10_15.xsl (Listing 10.15).
Listing 10.15 An XSLT Example Using the XPath Function fn:matches (ch10_15.xsl)
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:template match="/">
<xsl:value-of select="if(matches('Want some bananas today?',
'\bbananas\b'))
then 'Yes, we have some bananas.'
else 'No, we have no bananas.'"/>
</xsl:template>
</xsl:stylesheet>
And here's the result, where we matched the word "bananas":
<?xml version="1.0" encoding="UTF-8"?>
Yes, we have some bananas.
Using this function, then, you can perform regular expression matching.
The fn:replace Function
This function replaces matched text with other text. Here are the two ways to use it:
fn:replace($srcval as xs:string?, $pattern as xs:string, $replacement as xs:string) as xs:string?
fn:replace($srcval as xs:string?, $pattern as xs:string, $replacement as xs:string, $flags as xs:string) as xs:string?
In this case, the function replaces matches to $pattern in $srcval with $replacement. For example, say that you wanted to replace "bananas" in our text "Want some bananas today?" with "oranges". You can do that with the fn:replace function, as you see in ch10_16.xsl (Listing 10.16).
Listing 10.16 An XSLT Example Using the XPath Function fn:replace (ch10_16.xsl)
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:template match="/">
<xsl:value-of select="replace('Want some bananas today?',
'\bbananas\b', 'oranges')"/>
</xsl:template>
</xsl:stylesheet>
And here is the result:
<?xml version="1.0" encoding="UTF-8"?>
Want some oranges today?
If you enclose subexpressions in parentheses, you can refer to matches to those subexpressions as $1, $2, and so on up to $9 in the $replacement string. For example, say that you want to extract the two words from the string "Bananas, Apples". To do that, you can use the regular expression (\w+), (\w+), and refer to the text that matched the first \w+ subexpression as $1 in the replacement text, and the text that matched the second \w+ subexpression as $2 in the replacement text. You can see how this works in ch10_17.xsl (Listing 10.17).
Listing 10.17 Matching Subexpressions with fn:replace (ch10_17.xsl)
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:template match="/">
<xsl:value-of select="replace('Bananas, Apples',
'(\w+), (\w+)', 'Item 1:$1 Item 2:$2')"/>
</xsl:template>
</xsl:stylesheet>
And here are the results you get from Saxon:
<?xml version="1.0" encoding="UTF-8"?>
Item 1:Bananas Item 2:Apples
As you can see, using the fn:replace function, you can perform replacements that reuse parts of the matched text.
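Subexpression references can also appear in a different order from the original text, which makes fn:replace handy for rearranging fields. Here is a brief sketch under that assumption; the date string is a made-up sample value.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Swap the two captured words: outputs Apples, Bananas -->
    <xsl:value-of select="replace('Bananas, Apples',
        '(\w+), (\w+)', '$2, $1')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Reorder year-month-day into month/day/year: outputs 03/15/2004 -->
    <xsl:value-of select="replace('2004-03-15',
        '(\d+)-(\d+)-(\d+)', '$2/$3/$1')"/>
  </xsl:template>
</xsl:stylesheet>
```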
The fn:tokenize Function
This function is designed to break up text into smaller pieces. Here are the two ways to use it:
fn:tokenize($srcval as xs:string?, $pattern as xs:string) as xs:string*
fn:tokenize($srcval as xs:string?, $pattern as xs:string, $flags as xs:string) as xs:string*
You use this function to split up the text in $srcval into pieces separated by text matching the pattern in $pattern. For example, say that you want to break up the text "Now is the time" into the words "Now", "is", "the", and "time". You can do that if you instruct the fn:tokenize function to break on runs of one or more whitespace characters, \s+. You can see how this works in ch10_18.xsl (Listing 10.18).
Listing 10.18 An XSLT Example Using the XPath Function fn:tokenize (ch10_18.xsl)
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:template match="/">
<xsl:value-of select="tokenize('Now is the time', '\s+')"
separator=", "/>
</xsl:template>
</xsl:stylesheet>
And here are the results you get from Saxon:
<?xml version="1.0" encoding="UTF-8"?>
Now, is, the, time
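The separator pattern does not have to be whitespace. As a further sketch, the stylesheet below splits a comma-separated list whose spacing is inconsistent, and counts the tokens in the earlier example; the sample list of colors is invented for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Split on a comma followed by optional whitespace: outputs red|green|blue -->
    <xsl:value-of select="tokenize('red, green,blue', ',\s*')"
        separator="|"/>
    <xsl:text>&#10;</xsl:text>
    <!-- tokenize returns a sequence of strings, so count() gives the number of words: outputs 4 -->
    <xsl:value-of select="count(tokenize('Now is the time', '\s+'))"/>
  </xsl:template>
</xsl:stylesheet>
```

Because the result is an ordinary sequence of strings, it can also be passed to <xsl:for-each> or indexed with a predicate such as [1].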