7.4 Built-in Rules

A number of named rules are provided by default, including a complete set of POSIX-style classes and Unicode property classes. The list isn't fully defined yet, but Table 7-7 shows a few you're likely to see.

Table 7-7. Built-in rules

Rule              Meaning
<alpha>           Match a Unicode alphabetic character.
<digit>           Match a Unicode digit.
<sp>              Match a single space character (the same as \s).
<ws>              Match any whitespace (the same as \s+).
<null>            Match the null string.
<prior>           Match whatever the most recent successful match matched.
<before ...>      Zero-width lookahead. Assert that you're before a pattern.
<after ...>       Zero-width lookbehind. Assert that you're after a pattern.
<prop ...>        Match any character with the named property.
<replace(...)>    Replace everything matched so far in the rule or subrule with the given string. (This rule is still under consideration.)
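For comparison, here is a minimal sketch in current Raku (the language Perl 6 was renamed to in 2019). Its regex syntax differs in places from the draft described in this chapter. The example shows `<alpha>`, `<digit>`, and zero-width lookahead and lookbehind, which Raku writes as `<?before ...>` and `<?after ...>`. The sample string is made up.

```raku
# Sketch in current Raku syntax, not the draft syntax used in this chapter.
my $text = 'room 42b';    # made-up sample input

# <alpha> matches one alphabetic character (Raku also counts '_');
# <digit> matches one Unicode digit.
my $word = ~($text ~~ / <alpha>+ /);    # 'room'
my $num  = ~($text ~~ / <digit>+ /);    # '42'

# <?before ...> is a zero-width lookahead: the digits must be followed by a letter.
my $tagged = ~($text ~~ / <digit>+ <?before <alpha> > /);    # '42'

# <?after ...> is a zero-width lookbehind: the letters must come right after a digit.
my $suffix = ~($text ~~ / <?after \d> <alpha>+ /);    # 'b'

say "$word $num $tagged $suffix";    # room 42 42 b
```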


The <null> rule matches a zero-width string (so it always succeeds), and <prior> matches whatever the most recent successful rule matched. Together these replace the two behaviors of the Perl 5 null pattern //, which is no longer valid syntax for rules.
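Current Raku writes an always-true, zero-width assertion as `<?>` and an always-false one as `<!>`. The sketch below uses Raku syntax rather than this chapter's draft syntax. It shows `<?>` succeeding at position 0 without consuming anything, which is the behavior described for `<null>`.

```raku
# Sketch in current Raku syntax: <?> succeeds without consuming input,
# similar to the draft <null> rule.
my $m = 'abc' ~~ / <?> /;
say $m.defined, ' ', $m.from, ' ', $m.to;    # True 0 0

# <!> never succeeds, so the whole match fails.
my $none = 'abc' ~~ / <!> /;
say $none.defined;                           # False
```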

7.5 Backtracking Control

Backtracking is triggered whenever part of the pattern fails to match. You can also explicitly trigger backtracking by calling the fail function within a closure. Table 7-8 shows some metacharacters and built-in rules relevant to backtracking.
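Current Raku expresses the same idea with a code assertion, `<?{ ... }>`. When the block returns a false value, the match fails at that point and the engine backtracks. Here is a minimal sketch in Raku syntax with a made-up input. It keeps shrinking a run of digits until the captured number is even.

```raku
# Sketch in current Raku syntax. The code assertion fails for 123 (odd),
# so \d+ backtracks to 12, which passes.
my $even = ~('123' ~~ / (\d+) <?{ $0 %% 2 }> /);
say $even;    # 12
```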

Table 7-8. Backtracking controls

Operator      Meaning
:             Don't retry the previous atom; fail to the next earlier atom.
::            Don't backtrack over this point; fail out of the closest enclosing group ((...), [...], or the rule delimiters).
:::           Don't backtrack over this point; fail out of the current rule or subrule.
<commit>      Don't backtrack over this point; fail out of the entire match (even from within a subrule).
<cut>         Like <commit>, but also cuts the string matched. The current matching position at this point becomes the new beginning of the string.
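Of the controls in Table 7-8, the single `:` has the most direct counterpart in current Raku. A `:` written right after a quantifier forbids backtracking into that quantifier, and the `:ratchet` adverb does the same for a whole regex. The sketch below covers only that case. It uses Raku syntax rather than the draft syntax, and the input is made up.

```raku
# Sketch in current Raku syntax.
my $greedy     = so 'aaa' ~~ / a+  a /;           # True: a+ gives back one 'a'
my $possessive = so 'aaa' ~~ / a+: a /;           # False: a+: never gives anything back
my $ratcheted  = so 'aaa' ~~ / :ratchet a+ a /;   # False: no backtracking anywhere

say "$greedy $possessive $ratcheted";             # True False False
```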


7.6 Hypothetical Variables

Hypothetical variables are a powerful way of building up data structures from within a match. Ordinary captures with () store their results in $1, $2, etc. The values stored in these variables are kept if the match succeeds but thrown away if it fails (hence the term "hypothetical"). The numbered capture variables are accessible outside the match, but only within the immediately surrounding lexical scope:

"Zaphod Beeblebrox" ~~ m:w/ (\w+) (\w+) /;

print $1; # prints Zaphod
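For comparison, here is a minimal sketch of the same match in current Raku. Raku numbers captures from `$0` rather than `$1`, and its `:s` (sigspace) adverb stands in for the draft `:w`. The sketch also shows the "hypothetical" part: after a failed match, `$/` is reset, so nothing from that attempt survives.

```raku
# Sketch in current Raku syntax: captures start at $0, and :s replaces :w.
'Zaphod Beeblebrox' ~~ m:s/ (\w+) (\w+) /;
my $first  = ~$0;    # 'Zaphod'
my $second = ~$1;    # 'Beeblebrox'
say $first;          # Zaphod

# A failed match resets $/, so no capture from it is kept.
'Zaphod' ~~ / (\w+) \d /;
say $/.defined;      # False
```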

You can also capture into any user-defined variable with the binding operator :=. These variables must already be declared in the lexical scope surrounding the rule:

my $person;

"Zaphod's just this guy." ~~ / ^ $person := (\w+) /;

print $person; # prints Zaphod
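Here is one way to get the same effect in current Raku. It is a sketch rather than a translation of the draft `:=` syntax. The capture is named with `$<who>=` (the name `who` is arbitrary), and an embedded code block copies it into the outer variable while the match runs.

```raku
# Sketch in current Raku syntax; the capture name 'who' is hypothetical.
my $person;
"Zaphod's just this guy." ~~ / ^ $<who>=(\w+) { $person = ~$<who> } /;
say $person;    # Zaphod
```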

Repeated matches can be captured into an array:

my @words;

"feefifofum" ~~ / @words := (f<-[f]>+)* /;

# @words contains ("fee", "fi", "fo", "fum")
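In current Raku, a quantified capture group already collects every repetition. That means `$0` holds a list of `Match` objects, which can be copied into an ordinary array. A minimal sketch, assuming Raku syntax:

```raku
# Sketch in current Raku syntax: (...)* leaves one Match per repetition in $0.
'feefifofum' ~~ / (f <-[f]>+)* /;
my @words = $0.map(*.Str);
say @words;    # [fee fi fo fum]
```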

Pairs of repeated matches can be captured into a hash:

my %customers;

$records ~~ m:w/ %customers := [ <id> = <name> \n]* /;
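The draft example assumes named rules `<id>` and `<name>` defined elsewhere and does not show the record format. The sketch below is in current Raku syntax and rests on three assumptions that do not come from the chapter:

- It invents a format of one `ID = NAME` pair per line.
- It uses plain `\d+` and `\w+` captures in place of those rules.
- It builds the hash from a global (`:g`) match.

```raku
# Sketch in current Raku syntax. The "ID = NAME" record format and the data
# are made up; \d+ and \w+ stand in for the chapter's <id> and <name> rules.
my $records = "17 = Zaphod\n42 = Trillian\n";
my %customers;
for $records.match(/ (\d+) \h* '=' \h* (\w+) \n /, :g) -> $m {
    %customers{~$m[0]} = ~$m[1];    # key: id, value: name
}
say %customers{'42'};    # Trillian
```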

If you don't need the captured value outside the rule, use a $? variable instead. These are only directly accessible within the rule:

"Zaphod saw Zaphod" ~~ m:w/ $?name := (\w+) \w+ $?name/;

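In current Raku, a later part of the same regex can refer back to a capture by interpolating it, here as `$0`. Placing an empty block `{}` before the reference is a common Raku idiom for making sure the capture is visible to the rest of the regex. A minimal sketch in Raku syntax:

```raku
# Sketch in current Raku syntax: $0 is reused inside the same regex.
my $same  = so 'Zaphod saw Zaphod' ~~ / (\w+) \s+ \w+ \s+ {} $0 /;
my $other = so 'Zaphod saw Arthur' ~~ / (\w+) \s+ \w+ \s+ {} $0 /;
say "$same $other";    # True False
```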
A match of a named rule stores the result in a $? variable with the same name as the rule. These variables are also accessible only within the rule:

"Zaphod saw Zaphod" ~~ m:w/ <name> \w+ $?name /;
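Current Raku behaves similarly: calling a named rule such as `<name>` stores its result under `$<name>`. The sketch below assumes Raku syntax. The grammar `Sighting` and its rules are hypothetical, and `token` is used so that the rules do not backtrack.

```raku
# Sketch in current Raku syntax; the Sighting grammar is hypothetical.
grammar Sighting {
    token TOP  { <name> \s+ \w+ \s+ {} $<name> }    # reuse what <name> matched
    token name { \w+ }
}

my $m = Sighting.parse('Zaphod saw Zaphod');
say ~$m<name>;                                   # Zaphod
say Sighting.parse('Zaphod saw Arthur').defined; # False
```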