regex-group

The regex-group() function returns a captured substring resulting from matching a regular expression using the <xsl:analyze-string> instruction.

Changes in 2.0

This function is new in XSLT 2.0.

Signature

Argument	Data Type	Meaning
group	xs:integer	Identifies the captured subgroup that is required. The nth captured subgroup provides the string that was matched by the part of the regular expression that begins at the nth left parenthesis.
Result	xs:string	The string that was matched by the nth subexpression of the regular expression.

Effect

When the <xsl:analyze-string> instruction is used to match a string against a regular expression (regex), its <xsl:matching-substring> child element is invoked once for each substring of the input string that matches the regular expression. Within this element, the substring that matched the regex can be referred to as «.», because it becomes the context item; for consistency with other regex languages it is also available as the value of «regex-group(0)». Sometimes, however, you need to know not only which substring matched the regex, but also which parts of that substring matched particular parts of the regex. The group of characters that matches a particular parenthesized subexpression within the regex is referred to as a captured group, and the captured group that matches the nth parenthesized subexpression is accessible as the value of «regex-group(n)».
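As a minimal sketch of the difference between the whole match and its captured groups, the following stylesheet splits a date into its components. The parameter $date, the template name main, and the output element names are hypothetical, chosen only for illustration. The regex avoids curly braces because the regex attribute is an attribute value template.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input: a date string supplied as a parameter -->
  <xsl:param name="date" select="'1996-03-12'"/>

  <!-- Hypothetical entry point, used as the initial template -->
  <xsl:template name="main">
    <xsl:analyze-string select="$date" regex="([0-9]+)-([0-9]+)-([0-9]+)">
      <xsl:matching-substring>
        <date>
          <!-- regex-group(0) is the whole matched substring, the same string as «.» -->
          <whole><xsl:value-of select="regex-group(0)"/></whole>
          <!-- Groups are numbered by the position of their left parenthesis -->
          <year><xsl:value-of select="regex-group(1)"/></year>
          <month><xsl:value-of select="regex-group(2)"/></month>
          <day><xsl:value-of select="regex-group(3)"/></day>
        </date>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

With the default parameter value, the whole element should contain 1996-03-12, and the year, month, and day elements should contain 1996, 03, and 12 respectively.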

The substrings matched by <xsl:analyze-string> are available during the execution of the sequence constructor within the <xsl:matching-substring> element, including any templates called from instructions within this sequence constructor: that is, the scope is dynamic rather than static.
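The dynamic scope described above can be sketched as follows. The named template show-pair, the parameter $in, and the output element are hypothetical. The point of the sketch is that the called template reads the captured groups directly, without having them passed as parameters.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input of the form name=value -->
  <xsl:param name="in" select="'colour=red'"/>

  <!-- Hypothetical entry point, used as the initial template -->
  <xsl:template name="main">
    <xsl:analyze-string select="$in" regex="([a-z]+)=([a-z]+)">
      <xsl:matching-substring>
        <!-- No parameters are passed: the groups remain available in the called template -->
        <xsl:call-template name="show-pair"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Hypothetical named template, intended to be called while a match is being processed -->
  <xsl:template name="show-pair">
    <pair name="{regex-group(1)}" value="{regex-group(2)}"/>
  </xsl:template>

</xsl:stylesheet>
```

If show-pair were called from somewhere outside an <xsl:matching-substring> element, both attributes would be expected to be empty, because no captured groups would be in scope.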

Note that it is only the <xsl:analyze-string> instruction that makes captured groups available. They are not made available by the regex functions described in XPath 2.0 Programmer's Reference, namely matches(), replace(), and tokenize().
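The following sketch illustrates this point under stated assumptions: the parameter $date, the template name main, and the output element names are hypothetical. Within replace(), captured groups can be referenced only from the replacement string, using «$1», «$2», and so on; they do not become visible to regex-group().

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input string -->
  <xsl:param name="date" select="'1996-03-12'"/>

  <!-- Hypothetical entry point, used as the initial template -->
  <xsl:template name="main">
    <!-- Inside replace(), groups are referenced as $1, $2, ... in the replacement string -->
    <reordered>
      <xsl:value-of select="replace($date, '([0-9]+)-([0-9]+)-([0-9]+)', '$3/$2/$1')"/>
    </reordered>
    <!-- No xsl:matching-substring is active here, so regex-group(1) has nothing to return -->
    <group-1-length>
      <xsl:value-of select="string-length(regex-group(1))"/>
    </group-1-length>
  </xsl:template>

</xsl:stylesheet>
```

The reordered element should contain 12/03/1996, while the length reported for regex-group(1) is expected to be zero, since the call to replace() leaves no captured groups behind.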

Usage and Examples

The regex-group() function is always used together with the <xsl:analyze-string> instruction, described in Chapter 5, page 176.

For example, suppose you are analyzing a comma-separated-values file containing lines like this:

  423,"Barbara Smith","General Motors",1996-03-12

Given this line as the content of variable $in , you might analyze it using the code:

  <xsl:analyze-string select="$in" regex='("([^"]*?)")|([^,]+?),'>
    <xsl:matching-substring>
      <cell>
        <xsl:value-of select="regex-group(2)"/>
        <xsl:value-of select="regex-group(3)"/>
      </cell>
    </xsl:matching-substring>
  </xsl:analyze-string>

The regex here has two alternatives. The first alternative, «("([^"]*?)")», matches a string enclosed in quotes. The second alternative, «([^,]+?),», matches a sequence of non-comma characters followed by a comma. If a string within quotes is matched, then the characters between the quotes are matched by the «[^"]*?» part of the regex. This appears after the second «(» in the regex, so the string that it matches is available as «regex-group(2)». If a nonquoted string is matched, it is matched by the «[^,]+?» part, which appears after the third «(» in the regex, and is therefore available as «regex-group(3)». Rather than work out whether group 2 or group 3 was matched, the XSLT code simply outputs both: the one that was not matched will be a zero-length string, so it is simpler to copy it to the output than to write the conditional code to find out whether it was actually matched.
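For comparison, the conditional code that the example avoids might look something like the sketch below. This is an illustration, not the stylesheet referred to elsewhere in this entry: the template name main and the default value of $in are assumptions. It tests group 1, which includes the surrounding quotes and is therefore non-empty whenever the quoted alternative matched, even for an empty quoted string.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical default value: the sample line, with its quotes escaped as &quot; -->
  <xsl:param name="in"
      select="'423,&quot;Barbara Smith&quot;,&quot;General Motors&quot;,1996-03-12'"/>

  <!-- Hypothetical entry point, used as the initial template -->
  <xsl:template name="main">
    <xsl:analyze-string select="$in" regex='("([^"]*?)")|([^,]+?),'>
      <xsl:matching-substring>
        <cell>
          <xsl:choose>
            <!-- Group 1 is non-empty only when the quoted alternative matched -->
            <xsl:when test="regex-group(1) != ''">
              <xsl:value-of select="regex-group(2)"/>
            </xsl:when>
            <!-- Otherwise the unquoted alternative matched -->
            <xsl:otherwise>
              <xsl:value-of select="regex-group(3)"/>
            </xsl:otherwise>
          </xsl:choose>
        </cell>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

This should produce the same cell contents as outputting both groups unconditionally, which is why the shorter form is preferred in the example.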

A full stylesheet containing this example is shown under the unparsed-text() function (page 587).
