8.6. The Match Variables

So far, when we've put parentheses into patterns, they've been used only for their ability to group parts of a pattern together. But parentheses also trigger the regular expression engine's memory. The memory holds the part of the string matched by the part of the pattern inside parentheses. If there is more than one pair of parentheses, there will be more than one memory. Each regular expression memory holds part of the original string, not part of the pattern.

Since these variables hold strings, they are scalar variables; in Perl, they have names like $1 and $2. There are as many of these variables as there are pairs of memory parentheses in the pattern. As you'd expect, $4 means the string matched by the fourth set of parentheses.
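As a minimal sketch of that numbering, the following pattern has four pairs of parentheses, so a successful match fills $1 through $4. The helper name capture_four and the sample log line are made up for illustration; they are not from the text above.

```perl
use strict;
use warnings;

# Hypothetical helper: pull four pieces out of text such as
# "2024-03-15 at 09:30", one piece per pair of parentheses.
sub capture_four {
    my ($text) = @_;
    if ($text =~ /(\d+)-(\d+)-(\d+) at (\S+)/) {
        return ($1, $2, $3, $4);   # memories are numbered left to right
    }
    return;                        # no match, so $1..$4 can't be trusted
}

my @parts = capture_four("Logged 2024-03-15 at 09:30 by fred");
print "the fourth memory was $parts[3]\n" if @parts;
```

The fourth pair of parentheses wraps \S+, so the fourth memory holds the time, 09:30.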
These match variables are a big part of the power of regular expressions because they let us pull out the parts of a string:

    $_ = "Hello there, neighbor";
    if (/\s(\w+),/) {               # memorize the word between space and comma
      print "the word was $1\n";    # the word was there
    }

Or you could use more than one memory at once:

    $_ = "Hello there, neighbor";
    if (/(\S+) (\S+), (\S+)/) {
      print "words were $1 $2 $3\n";
    }

That tells us that the words were Hello there neighbor. Notice that there's no comma in the output. Because the comma is outside of the memory parentheses in the pattern, there is no comma in memory two. Using this technique, we can choose what we want in the memories, as well as what we want to leave out.

You could have an empty match variable if that part of the pattern might be empty. That is, a match variable may contain the empty string:
    my $dino = "I fear that I'll be extinct after 1000 years.";
    if ($dino =~ /(\d*) years/) {
      print "That said '$1' years.\n";  # 1000
    }

    $dino = "I fear that I'll be extinct after a few million years.";
    if ($dino =~ /(\d*) years/) {
      print "That said '$1' years.\n";  # empty string
    }

8.6.1. The Persistence of Memory

These match variables generally stay around until the next successful pattern match. That is, an unsuccessful match leaves the previous memories intact, but a successful one resets them all. This correctly implies that you shouldn't use these match variables unless the match succeeded; otherwise, you could be seeing a memory from some previous pattern. The following (bad) example is supposed to print a word matched from $wilma. But if the match fails, it's using whatever leftover string happens to be found in $1:
    $wilma =~ /(\w+)/;  # BAD! Untested match result
    print "Wilma's word was $1... or was it?\n";

This is another reason a pattern match is almost always found in the conditional expression of an if or while:

    if ($wilma =~ /(\w+)/) {
      print "Wilma's word was $1.\n";
    } else {
      print "Wilma doesn't have a word.\n";
    }

Since these memories don't stay around forever, you shouldn't use a match variable like $1 more than a few lines after its pattern match. If your maintenance programmer adds a new regular expression between your regular expression and your use of $1, you'll be getting the value of $1 for the second match, rather than the first. For this reason, if you need a memory for more than a few lines, copy it into an ordinary variable. Doing this helps make the code more readable at the same time:

    if ($wilma =~ /(\w+)/) {
      my $wilma_word = $1;
      ...
    }

Later, in Chapter 9, you'll see how to get the memory value directly into the variable at the same time as the pattern match happens, without having to use $1 explicitly.

8.6.2. The Automatic Match Variables

There are three more match variables that you get free, whether the pattern has memory parentheses or not. That's the good news; the bad news is that these variables have weird names.
Larry probably would have been happy enough to call these by slightly less weird names, like perhaps $gazoo or $ozmodiar. But those are names you might want to use in your own code. To keep ordinary Perl programmers from having to memorize the names of all of Perl's special variables before choosing their first variable names in their first programs, Larry has given strange names to many of Perl's built-in variables, names that break the rules. In this case, the names are punctuation marks: $&, $`, and $'. They're strange, ugly, and weird, but those are their names.

The part of the string that matched the pattern is automatically stored in $&:
    if ("Hello there, neighbor" =~ /\s(\w+),/) {
      print "That actually matched '$&'.\n";
    }

The part that matched was " there," (with a space, a word, and a comma). Memory one, in $1, has the five-letter word there, but $& has the entire matched section.

Whatever came before the matched section is in $`, and whatever was after it is in $'. Another way to say that is that $` holds whatever the regular expression engine had to skip over before it found the match, and $' has the remainder of the string that the pattern never got to. If you glue these three strings together in order, you'll always get back the original string:

    if ("Hello there, neighbor" =~ /\s(\w+),/) {
      print "That was ($`)($&)($').\n";
    }

The message shows the string as (Hello)( there,)( neighbor), showing the three automatic match variables in action.

Any or all of these three automatic match variables may be empty, just like the numbered match variables. And they have the same scope as the numbered match variables. Generally, that means they'll stay around until the next successful pattern match.

Now, we said earlier that these three are "free." Well, freedom has its price. In this case, the price is that once you use any one of these automatic match variables anywhere in your entire program, other regular expressions will run a little more slowly. Now, this isn't a giant slowdown, but it's enough of a worry that many Perl programmers will never use these automatic match variables. Instead, they'll use a workaround. For example, if the only one you need is $&, put parentheses around the whole pattern and use $1 instead. (You may need to renumber the pattern's memories.)
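The workaround just described might look like the following sketch. It wraps the earlier pattern in one extra pair of parentheses, so $1 stands in for $& and the word that used to be in $1 moves to $2. The helper name whole_and_word is hypothetical and used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the whole matched section and the word
# inside it without ever mentioning $&.
sub whole_and_word {
    my ($text) = @_;
    if ($text =~ /(\s(\w+),)/) {   # outer parentheses wrap the whole pattern
        return ($1, $2);           # $1 replaces $&; the old $1 is now $2
    }
    return;                        # match failed: return an empty list
}

my ($whole, $word) = whole_and_word("Hello there, neighbor");
if (defined $whole) {
    print "That actually matched '$whole' around the word '$word'.\n";
}
```

Memories are numbered by the position of their opening parenthesis, which is why adding the outer pair shifts every existing memory up by one.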
Match variables (the automatic ones and the numbered ones) are most often used in substitutions, which you'll see in the next chapter.