16.4 Using Regular Expressions

Regular expressions allow you to define specific patterns for searching strings of text. XML Schema supports regular expressions, and XSLT 2.0 relies on XML Schema-style regular expressions. Table 16-1 shows a sampling of symbols used in regular expressions that XSLT 2.0 supports. The table represents only a few of the possibilities.

Table 16-1. Sample of regular expression symbols

Regular Expression              Description
.                               Matches any character except a newline or carriage return.
*                               Matches zero or more occurrences of the preceding item.
?                               Matches zero or one occurrence of the preceding item.
\s                              Matches any whitespace character: a space, tab, newline, or carriage return.
\S or [^\s] or [^ \t\n\r]       Matches any character except a whitespace character.
\d or [0-9]                     Matches any digit.
\d{3}                           Matches any three digits.
\D or [^\d] or [^0-9]           Matches any character except a digit.
^                               Matches the beginning of the string, or of a line in multiline mode.
$                               Matches the end of the string, or of a line in multiline mode.
\p{Ll}{5}                       Matches any five lowercase letters.
\p{Lu}{6}                       Matches any six uppercase letters.
\p{P}                           Matches any single punctuation character.

In regular expressions, you can mix these symbols with actual characters to form a search string. For example, using these symbols, you could match:

  • A U.S.-style 9-digit ZIP code, such as 10048-1000, with \d{5}-\d{4}

  • A U.S.-style 10-digit phone number, such as (800)555-1234, with \(\d{3}\)\d{3}-\d{4} (the literal parentheses must be escaped with a backslash, because unescaped parentheses group a subexpression)

  • The word The at the beginning of a line, followed by a whitespace character, followed by any character, with the expression ^The\s.
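As a quick way to try patterns like these, the following minimal sketch calls matches( ) on literal strings. The sample strings and the $ anchors are illustrative choices rather than part of the book's examples, and the literal parentheses in the phone pattern are escaped as \( and \).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- The template ignores the source document, so any well-formed
       XML input will do. -->
  <xsl:template match="/">
    <!-- ZIP code: five digits, a hyphen, four digits -->
    <xsl:value-of select="matches('10048-1000', '^\d{5}-\d{4}$')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Phone number: \( and \) match literal parentheses -->
    <xsl:value-of select="matches('(800)555-1234', '^\(\d{3}\)\d{3}-\d{4}$')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- The, one whitespace character, then any one character -->
    <xsl:value-of select="matches('The end', '^The\s.')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Run against any source document with an XSLT 2.0 processor, each of the three calls should return true.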

XPath 2.0 adds three new functions for use with regular expressions: matches( ), replace( ), and tokenize( ). For more information on these new functions, see Section 7.5 of the functions and operators specification for XPath 2.0 and XQuery 1.0 at http://www.w3.org/TR/xpath-functions/. XSLT 2.0 offers the new analyze-string element. See Section 15 of the XSLT 2.0 spec at http://www.w3.org/TR/xslt20/ for more information on that. I'll show you examples of the matches( ) and replace( ) functions, and the analyze-string element.

The tokenize( ) function is not demonstrated in this chapter. It breaks a string into a sequence of tokens, treating each substring that matches a regular expression, such as one or more whitespace characters (\s+), as a separator.
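Although the chapter's own stylesheets do not use tokenize( ), a minimal sketch may help show what it does. The input string and the tokens and token element names below are made up for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- The template ignores the source document, so any well-formed
       XML input will do. -->
  <xsl:template match="/">
    <tokens>
      <!-- \s+ treats each run of whitespace as a single separator -->
      <xsl:for-each select="tokenize('matches   replace tokenize', '\s+')">
        <token>
          <xsl:value-of select="."/>
        </token>
      </xsl:for-each>
    </tokens>
  </xsl:template>

</xsl:stylesheet>
```

The result should contain three token elements holding matches, replace, and tokenize, because the run of three spaces counts as one separator.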


16.4.1 The matches( ) Function

The function matches( ) is new in XPath 2.0. This function returns an xs:boolean value that indicates whether the value in the first argument matches the regular expression in the value of the second argument. The stylesheet match.xsl, in Example 16-3, uses the matches( ) function to test whether a string matches a regular expression.

Example 16-3. A stylesheet matching on a regular expression
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>

<xsl:template match="functions">
 <xsl:element name="list">
  <xsl:element name="description">XPath 2.0 Context Functions</xsl:element>
  <xsl:element name="date">
   <xsl:value-of select="current-date()"/>
  </xsl:element>
  <xsl:apply-templates select="function"/>
 </xsl:element>
</xsl:template>

<xsl:template match="function">
 <xsl:copy>
  <xsl:if test="matches(name,'^fn:')">
   <xsl:value-of select="substring(name, 4)"/>
  </xsl:if>
 </xsl:copy>
</xsl:template>

</xsl:stylesheet>

The first template rule uses a new XPath 2.0 function, current-date( ), to insert the current date into a date element in the result tree, and then applies templates to the function elements. In the second template rule, the first argument of matches( ) is name, a child element of function; the content of name is the string that the function tests. The second argument is a regular expression: ^fn: looks for the characters fn: at the beginning of the string (^). If matches( ) finds the pattern and returns true, the value-of element inside the if element writes the substring of name that starts at the fourth character, thus eliminating fn:.

Transform functions.xml with match.xsl with:

java -jar saxon7.jar functions.xml match.xsl

and you will see this result:

<?xml version="1.0" encoding="UTF-8"?>
<list>
   <description>XPath 2.0 Context Functions</description>
   <date>2003-10-03</date>
   <function>context-item(  )</function>
   <function>position(  )</function>
   <function>last(  )</function>
   <function>current-dateTime(  )</function>
   <function>current-date(  )</function>
   <function>current-time(  )</function>
   <function>default-collation(  )</function>
   <function>implicit-timezone(  )</function>
</list>

16.4.2 The replace( ) Function

The new replace( ) function in XPath 2.0 returns the value of the first argument with every substring that matches the regular expression in the second argument replaced by the string in the third argument. Example 16-4, the stylesheet replace.xsl, shows how it works.

Example 16-4. A stylesheet replacing regular expressions
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>

<xsl:template match="functions">
 <xsl:element name="list">
  <xsl:element name="description">XPath 2.0 Context Functions</xsl:element>
  <xsl:element name="date">
   <xsl:value-of select="current-date()"/>
  </xsl:element>
  <xsl:apply-templates select="function"/>
 </xsl:element>
</xsl:template>

<xsl:template match="function">
 <xsl:copy>
  <xsl:value-of select="replace(name, '^fn:', '')"/>
 </xsl:copy>
</xsl:template>

</xsl:stylesheet>

The first argument of replace( ) is the name element, meaning the content of the name element. The second argument is the regular expression you are looking for, and the third argument is the string that replaces each match (here, an empty string). If you process functions.xml with:

java -jar saxon7.jar functions.xml replace.xsl

it will produce the same output as match.xsl.
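To see the arguments of replace( ) in isolation, here is a minimal sketch that applies the function to literal strings. The results and r element names and the sample strings are made up for illustration. The third call relies on the XPath 2.0 rule that $1 in the replacement string stands for the text captured by the first parenthesized group.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- The template ignores the source document, so any well-formed
       XML input will do. -->
  <xsl:template match="/">
    <results>
      <!-- The prefix is present, so it is removed -->
      <r><xsl:value-of select="replace('fn:position', '^fn:', '')"/></r>
      <!-- Nothing matches, so the string comes back unchanged -->
      <r><xsl:value-of select="replace('position', '^fn:', '')"/></r>
      <!-- $1 refers to whatever (.+) captured -->
      <r><xsl:value-of select="replace('fn:last', '^fn:(.+)$', '[$1]')"/></r>
    </results>
  </xsl:template>

</xsl:stylesheet>
```

The three r elements should contain position, position, and [last]. The second case shows one difference from match.xsl: when a name does not begin with fn:, replace( ) passes it through unchanged, whereas the if test in match.xsl would write nothing for it.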

16.4.3 The analyze-string Element

Finally, the instruction element analyze-string is also new in XSLT 2.0. This element allows you to select a string using the select attribute, and then search that string with a regular expression defined in a regex attribute. Two children can then follow analyze-string: matching-substring, to define what happens when analyze-string finds a matching substring, and non-matching-substring, to define what happens when analyze-string finds a non-matching substring. You can use either matching-substring or non-matching-substring or both. (Also, analyze-string accepts fallback as a child.)

The regex.xsl stylesheet, Example 16-5, uses analyze-string to handle some text in a node.

Example 16-5. A stylesheet performing more complex regular expression processing
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>

<xsl:template match="functions">
 <xsl:element name="list">
  <xsl:element name="description">XPath 2.0 Context Functions</xsl:element>
  <xsl:element name="date">
   <xsl:value-of select="current-date()"/>
  </xsl:element>
  <xsl:apply-templates select="function"/>
 </xsl:element>
</xsl:template>

<xsl:template match="function">
 <xsl:copy>
  <xsl:analyze-string select="name" regex="^fn:">
   <xsl:matching-substring></xsl:matching-substring>
   <xsl:non-matching-substring>
    <xsl:value-of select="."/>
   </xsl:non-matching-substring>
  </xsl:analyze-string>
 </xsl:copy>
</xsl:template>

</xsl:stylesheet>

The second template rule searches the content of the name child of each function element in the source tree. When analyze-string finds the string fn: at the beginning of the string, the empty matching-substring element writes nothing to the result tree for the matching substring, while non-matching-substring outputs the non-matching remainder as is using value-of.

Execute the transformation with this command:

java -jar saxon7.jar functions.xml regex.xsl

and you will get the following result:

<?xml version="1.0" encoding="UTF-8"?>
<list>
   <description>XPath 2.0 Context Functions</description>
   <date>2003-08-26</date>
   <function>context-item(  )</function>
   <function>position(  )</function>
   <function>last(  )</function>
   <function>current-dateTime(  )</function>
   <function>current-date(  )</function>
   <function>current-time(  )</function>
   <function>default-collation(  )</function>
   <function>implicit-timezone(  )</function>
</list>

This same effect can be achieved by using replace( ) or even matches( ), as you saw earlier. The main reason for using analyze-string is that the replacement can contain elements, which a replacement string cannot. For example, you could use analyze-string to replace each line break with a br element.
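The line-break case can be sketched as follows. This is a minimal illustration rather than one of the book's examples: the text variable, its sample content, and the p wrapper element are assumptions.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>

  <!-- Hypothetical sample text; &#10; puts a real newline in the string -->
  <xsl:variable name="text" select="'line one&#10;line two'"/>

  <!-- The template ignores the source document, so any well-formed
       XML input will do. -->
  <xsl:template match="/">
    <p>
      <!-- regex is an attribute value template, so any literal curly
           braces in a pattern would have to be doubled; \n has none -->
      <xsl:analyze-string select="$text" regex="\n">
        <!-- Each newline becomes an empty br element -->
        <xsl:matching-substring>
          <br/>
        </xsl:matching-substring>
        <!-- Everything between newlines is copied through as text -->
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

The result should be a p element containing line one, an empty br element, and line two. A call to replace( ) could not produce this, because its third argument is a string and cannot create the br element node.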


These examples give you a taste of what is possible using regular expressions. For more information on the regular expressions used by XML Schema, and XSLT 2.0 by association, see http://www.w3.org/TR/xmlschema-0.html#regexAppendix and http://www.w3.org/TR/xmlschema-2.html#regexs.



Learning XSLT
ISBN: 0596003277
Year: 2003
Pages: 164
