You will frequently use the set of whitespace characters, \s, the set of word characters, \w, and the word boundary anchor, \b, in your Perl regular expressions. Yet you should be careful when you use them together. Consider the following pattern match:
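The original listing is not reproduced here. This minimal sketch assumes the pattern captures a user name and a terminal name with \w+ and is anchored at the start of the line with ^. The helper name parse_w is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical reconstruction: capture "user tty" using \w+ on both sides
# of \s+, anchored at the start of the line.
sub parse_w {
    my ($line) = @_;
    my ($user, $tty) = $line =~ /^(\w+)\s+(\w+)/;
    return ($user, $tty);    # both undef if the match fails
}

for my $line ('joebloe ttyp0', 'webmaster-1 ttyp1', 'joebloe pts/10') {
    my ($user, $tty) = parse_w($line);
    if (defined $user) {
        print "$line: user=$user tty=$tty\n";
    } else {
        print "$line: no match\n";
    }
}
```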
This works fine on input like "joebloe ttyp0". However, it will not match at all on strings like "webmaster-1 ttyp1", and it will return a strange result on "joebloe pts/10" (the terminal name comes back as pts rather than pts/10). This match probably should have been written:
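Here is a minimal sketch of the \S-based rewrite. It assumes the same start-of-line anchoring as before, and the helper name parse_s is hypothetical. Because \S+ accepts any run of non-whitespace, the hyphen and the slash are kept.

```perl
use strict;
use warnings;

# \S+ matches any run of non-whitespace characters, so "webmaster-1"
# and "pts/10" are captured whole.
sub parse_s {
    my ($line) = @_;
    my ($user, $tty) = $line =~ /^(\S+)\s+(\S+)/;
    return ($user, $tty);
}

for my $line ('joebloe ttyp0', 'webmaster-1 ttyp1', 'joebloe pts/10') {
    my ($user, $tty) = parse_s($line);
    print "$line: user=$user tty=$tty\n";
}
```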
There is probably something wrong with your regular expression if you have \w adjacent to \s, or \W adjacent to \S. At the very least, you should examine such regular expressions carefully. Another thing to watch out for is a "word" that contains punctuation characters. Suppose you want to search for a whole word in a text string:
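A minimal sketch of the \b-delimited search follows. It assumes the word is interpolated with \Q...\E so that any metacharacters in it are matched literally. The helper has_word_b and the sample sentence are hypothetical.

```perl
use strict;
use warnings;

# Look for $word as a whole word, delimited by \b word boundaries.
# \Q...\E quotes regex metacharacters (such as - or ') in $word.
sub has_word_b {
    my ($text, $word) = @_;
    return $text =~ /\b\Q$word\E\b/ ? 1 : 0;
}

my $text = "Ask a hacker who isn't goin' to the Perl5-Porter meeting";
for my $word ('hacker', 'Perl5-Porter', "goin'", 'isn') {
    printf "%-14s %s\n", $word, has_word_b($text, $word) ? 'found' : 'not found';
}
```

With this sentence, hacker and Perl5-Porter are found, goin' is not (there is no \b between the apostrophe and the following space), and isn is wrongly found inside isn't.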
This works fine for input like hacker and even Perl5-Porter, but it fails for words like goin', or for any word that does not begin and end with a \w character. It will also consider isn a matchable word if $text contains isn't. The reason is that \b matches transitions between \w and \W characters (and the start or end of the string next to a \w character), not transitions between \s and \S characters. If you want to support searching for words delimited by whitespace, you will have to write something like this instead:
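Here is one way to write the whitespace-delimited search, sketched under the assumption that a "word" is bounded by whitespace or by the ends of the string. The helper has_word_ws and the sample sentence are hypothetical.

```perl
use strict;
use warnings;

# A "word" here is delimited by whitespace or by the start/end of $text,
# so punctuation inside or at the edges of $word is allowed.
sub has_word_ws {
    my ($text, $word) = @_;
    return $text =~ /(?:^|\s)\Q$word\E(?:\s|$)/ ? 1 : 0;
}

my $text = "Ask a hacker who isn't goin' to the Perl5-Porter meeting";
for my $word ('hacker', 'Perl5-Porter', "goin'", 'isn') {
    printf "%-14s %s\n", $word, has_word_ws($text, $word) ? 'found' : 'not found';
}
```

Now goin' is found and isn is no longer matched inside isn't.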
The word boundary anchor, \b, and its inverse, \B, are zero-width patterns. They are not the only zero-width patterns (^, \A, and others are too), but they are the hardest to understand. If you are not sure what \b and \B will match in your string, try substituting a visible character for them:
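A small sketch of that technique: copy a sample string, then replace every \b (or \B) position with a colon. The sample sentence is hypothetical.

```perl
use strict;
use warnings;

my $text = q{"Hello there," she said.};

# Mark every position where \b matches, then every position where \B matches.
(my $b_marked  = $text) =~ s/\b/:/g;
(my $nb_marked = $text) =~ s/\B/:/g;

print "$b_marked\n";     # ":Hello: :there:," :she: :said:.
print "$nb_marked\n";    # :"H:e:l:l:o t:h:e:r:e,:": s:h:e s:a:i:d.:
```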
The results at the ends of the string should be especially interesting to you. Note that if the last (or first) character in a string is not a \w character, there is no word boundary at the end (or beginning) of the string. Note also that there are non-word boundaries (positions where \B matches) between consecutive \W characters (like space and double quote) as well as between consecutive \w characters.

Matching at the end of line: $, \Z, /s, /m

Of course, $ matches at the end of a line. Or does it? Officially, it matches at the end of the string being matched, or just before a final newline occurring at the end of the string. This feature makes it easy to match newline-terminated data:
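A minimal sketch, assuming a hypothetical record that still carries its trailing newline (as a line read from a file would before chomp):

```perl
use strict;
use warnings;

# A newline-terminated record, not yet chomped.
my $data = "joebloe ttyp0\n";

# $ matches just before the final newline, so the pattern need not mention
# the newline, and the newline does not end up in the captured field.
my ($user, $tty) = $data =~ /^(\S+)\s+(\S+)$/;
print "user=$user tty=$tty\n";
```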
The /s (single-line, sort of) option changes the meaning of . (period) so that it matches any character instead of any character but newline. This is useful if you want to capture newlines inside a string:
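A short sketch using hypothetical BEGIN and END marker lines, capturing everything between them with and without /s:

```perl
use strict;
use warnings;

my $text = "BEGIN\nline 1\nline 2\nEND\n";

# Without /s, . cannot cross a newline, so the body cannot span two lines.
my ($body_plain) = $text =~ /BEGIN\n(.*)\nEND/;
# With /s, . also matches newlines, so the whole multi-line body is captured.
my ($body_s) = $text =~ /BEGIN\n(.*)\nEND/s;

print defined $body_plain ? "plain: [$body_plain]\n" : "plain: no match\n";
print "with /s: [$body_s]\n";
```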
However, /s does not change the meaning of $:
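A quick check of that claim, using a hypothetical string that ends in a newline:

```perl
use strict;
use warnings;

my $text = "abc\n";

# $ matches just before the final newline whether or not /s is given.
my $plain  = ($text =~ /c$/)  ? 1 : 0;    # 1
my $with_s = ($text =~ /c$/s) ? 1 : 0;    # 1: /s affects only .

print "without /s: $plain, with /s: $with_s\n";
```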
To force $ to really match the end of the string, you need to be more insistent. One way to do this is to use the (?!...) negative lookahead operator:
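A minimal sketch comparing a bare $ with $(?!\n) on hypothetical strings with and without a trailing newline. The last line uses \z, which newer Perls provide and which matches only at the very end of the string.

```perl
use strict;
use warnings;

my $text = "abc\n";

my $loose     = ($text =~ /c$/)       ? 1 : 0;    # 1: $ matches before "\n"
my $strict    = ($text =~ /c$(?!\n)/) ? 1 : 0;    # 0: a newline follows
my $strict_ok = ("abc" =~ /c$(?!\n)/) ? 1 : 0;    # 1: true end of string

# \z matches only at the very end of the string.
my $with_z = ($text =~ /c\z/) ? 1 : 0;            # 0

print "$loose $strict $strict_ok $with_z\n";
```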
Here, (?!\n) ensures that no newline follows the position where $ matched.[6]
Ordinarily, $ matches only at the end of the string or just before a trailing newline. However, the /m (multi-line) option modifies $ so that it can also match before intermediate newlines. The /m option also modifies ^ so that it will match at a position immediately following a newline in the middle of the string:
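A short sketch on a hypothetical three-line string, collecting whole lines with and without /m:

```perl
use strict;
use warnings;

my $text = "one\ntwo\nsix\n";

# With /m, ^ matches after each interior newline and $ before each newline.
my @with_m = $text =~ /^(\w+)$/mg;       # ('one', 'two', 'six')

# Without /m, ^ matches only at the start and $ only at the end (or before
# the final newline), so a single \w+ cannot span from one to the other.
my @without_m = $text =~ /^(\w+)$/g;     # ()

print "with /m: @with_m\n";
print "without /m: ", scalar(@without_m), " matches\n";
```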
The \A and \Z anchors retain the original meanings of ^ and $, respectively, whether or not the /m option is used:
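A sketch on the same kind of hypothetical multi-line string. Even with /m in effect, \A matches only at the start of the string, and \Z matches only at the end or just before a final newline.

```perl
use strict;
use warnings;

my $text = "one\ntwo\nsix\n";

# \A ignores /m: only the first line can match.
my @first = $text =~ /\A(\w+)$/mg;    # ('one')

# \Z ignores /m: only the last line (before the final newline) can match.
my @last = $text =~ /^(\w+)\Z/mg;     # ('six')

print "first: @first, last: @last\n";
```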