10.5. Recursive ExpressionsMost aspects of the preg engine's flavor are covered as general topics in Chapter 3, but the flavor does offer something new in its interesting way of matching nested constructs: recursive expressions. The sequence The next few sections show some common uses for recursion. Recursion also plays a central role in the extended "tagged data" example, which starts on page 481. 10.5.1. Matching Text with Nested Parentheses The quintessential recursive example is to match text containing nested sets of parentheses. Here's one way: This expression matches any number of two alternatives. The first alternative, The other alternative, This expression works fine on its own, but be very careful adding anything to it, because anything added is also applied recursively during calls to For example, consider using this expression to validate that an entire string has no unbalanced parentheses. You might be tempted to wrap it in 10.5.1.1. Recursive reference to a set of capturing parentheses The
We can use a limited reference like this to solve the problem posed in the previous section: before adding The underlined part of the regex is within the first set of capturing parentheses, so that's what is reapplied each time Here's our regex in a sample PHP snippet that reports whether the text within the $text variable is balanced or unbalanced: if (preg_match('/^ ( (?: [^()]++ \( (?1) \) )*) $/x', $text)) echo "text is balanced\n"; else echo "text is unbalanced\n"; 10.5.1.2. Recursive reference via named captureIf the subexpression to be called recursively has been wrapped with named parentheses (˜ 138), you have the option to use the (?P> name ) notation for the recursive reference, rather than the (? num ) notation we've seen so far. With this notation, our example might become:
That expression may look complicated, but we can easily make it more readable with the x pattern modifier (˜ 446): $pattern = '{ # The regular expression begins here ... ^ (?P<stuff> # Everything within this set of parentheses is named "stuff." (?: [^()]++ # anything not parentheses \( (?P>stuff) \) # an open paren, more "stuff," and finally a close paren . )* ) $ # This is the end of the regular expression . }x'; # The 'x' here is a preg pattern modifier . if (preg_match($pattern, $text)) echo "text is balanced\n"; else echo "text is unbalanced\n"; 10.5.1.3. More on possessive quantifiers I'll make one final comment on the use of possessive quantifiers in the original expression. If the outer This would be less efficient, but at least it wouldn't be a neverending match. To regain efficiency, you could apply the "unrolling the loop" technique covered in Chapter 6 (˜ 261), which results in 10.5.2. No Backtracking Into RecursionAn important aspect of the preg flavor's recursion semantics is that it treats everything matched by recursion as if it were matched within atomic parentheses (˜ 259). That means that if recursion matches something that must ultimately be partially "unmatched" to achieve overall success, it won't happen (resulting in an overall failure). The "partially" in the middle of that last sentence is important, because the entire text matched via a recursive call can, as a whole unit, be unmatched via backtracking. What recursion disallows is backtracking to a point back within the recursive call. 10.5.3. Matching a Set of Nested Parentheses We've seen how to match a line that has no unbalanced parentheses, so, for completeness, I'd like to show how to explicitly match a balanced set of parentheses (possibly containing additional nested sets within): This example uses the same ingredients as the previous one, but it is arranged a bit differently. As before, if you wish to use this as part of a larger expression, you need to wrap it in capturing parentheses and change |