Advanced Regular Expressions


There are a few other regular expression tools that are worth spending a little more time on in case you need to perform more advanced string matching.

Multiline Matching

The multiline flag (m) causes ^ and $ to match the beginning and end of each line, in addition to the beginning and end of the string. You could use this flag to parse text like the following,

var text = "This text has multiple lines.\nThis is the second line.\nThe third.";
var pattern = /^.*$/gm; // match an entire line
var lines = text.match(pattern);
document.writeln("Length of lines = " + lines.length);
document.writeln("<br />");
document.writeln("lines[0] = " + lines[0]);
document.writeln("<br />");
document.writeln("lines[1] = " + lines[1]);
document.writeln("<br />");
document.writeln("lines[2] = " + lines[2]);
document.writeln("<br />");

which uses the String method match() to break the text up into individual lines and places them in the array lines. (The global flag is set so that, as previously discussed, match() will find all occurrences of the pattern, not just the first.) The output shows that lines has a length of 3 and that each element holds one line of the text, without its newline character.

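As a minimal sketch of what the m flag changes, the following runs the same pattern with and without it. The helper name splitLines and the sample text are illustrative assumptions, not part of the original example, and console.log stands in for document.writeln so the code can run outside a browser.

```javascript
// Hypothetical helper: returns every line of `text` using the g and m flags.
function splitLines(text) {
  // With m set, ^ and $ also match at each line boundary.
  return text.match(/^.*$/gm) || [];
}

var text = "first line\nsecond line\nthird";

// Without m, ^ and $ match only at the start and end of the whole string,
// and . does not match a newline, so the pattern finds nothing here.
var withoutM = text.match(/^.*$/g);

var withM = splitLines(text);

console.log(withoutM);      // null
console.log(withM.length);  // 3
console.log(withM[1]);      // second line
```

The comparison suggests why the flag matters for line-oriented input: without it, the pattern can only succeed on a string that has no line breaks at all.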

Non-capturing Parentheses

JavaScript also provides more flexible syntax for parenthesized expressions. Using the syntax (?: ) specifies that the parenthesized expression should not be made available for backreferencing. These are referred to as non-capturing parentheses. For example,

var pattern = /(?:a+)(bcd)/; // ignores first subexpression
if (pattern.test("aaaaaabcd")) {
  alert(RegExp.$1);
}

shows bcd in the alert dialog.

You can see that the first subexpression (one or more a's) was not captured (made available) by the RegExp object.
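The effect on group numbering can also be seen in the array returned by exec(), which avoids relying on the legacy RegExp.$1 property. This is a small sketch that compares a capturing and a non-capturing version of the same pattern. The variable names are illustrative.

```javascript
var input = "aaaaaabcd";

// Both groups capture: index 1 holds the a's, index 2 holds "bcd".
var capturing = /(a+)(bcd)/.exec(input);

// The first group is non-capturing, so "bcd" moves up to index 1.
var nonCapturing = /(?:a+)(bcd)/.exec(input);

console.log(capturing.length);     // 3: full match plus two groups
console.log(capturing[1]);         // aaaaaa
console.log(nonCapturing.length);  // 2: full match plus one group
console.log(nonCapturing[1]);      // bcd
console.log(nonCapturing[0]);      // aaaaaabcd (the a's are still matched)
```

Note that the non-capturing group still takes part in the match. It only stops the matched text from being stored and numbered.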

Lookahead

JavaScript allows you to specify that a portion of a regular expression matches only if it is or is not followed by a particular subexpression. The (?= ) syntax specifies a positive lookahead; it only matches the previous item if the item is followed immediately by the expression contained in (?= ) . The lookahead expression is not included in the match. For example, in the following,

var pattern = /\d(?=\.\d+)/;

pattern matches only a digit that is followed by a period and one or more digits. It matches the 3 in 3.1 and in 3.14159, but finds no match in 3. or .3.
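The following sketch checks those cases and shows that the lookahead text is left out of the match. It uses exec(), test(), and replace() on the pattern from the text. The sample strings are the ones mentioned above.

```javascript
var pattern = /\d(?=\.\d+)/;

// exec() puts the matched text at index 0; the lookahead text is absent.
var m = pattern.exec("3.14159");
console.log(m[0]);     // 3
console.log(m.index);  // 0

console.log(pattern.test("3."));  // false: no digit after the period
console.log(pattern.test(".3"));  // false: the digit is not followed by a period

// Because the lookahead consumes nothing, replace() touches only the digit.
var replaced = "3.14".replace(pattern, "X");
console.log(replaced);  // X.14
```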

Negative lookahead is achieved with the (?! ) syntax, which behaves like (?= ) with the test inverted. It matches the previous item only if the expression contained in (?! ) does not immediately follow. For example, in

var pattern = /\d(?!\.\d+)/;

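pattern matches a digit that is not followed by a period and another digit. It will match the 3 in 3 or in 3. but not the 3 in 3.1 or 3.14. Because the pattern is not anchored, it can still succeed on those strings by matching a later digit, such as the 1 in 3.1. The negative lookahead expression is also not returned on a match.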
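A short sketch makes the behavior of the unanchored pattern concrete. The anchored variant at the end is a hypothetical addition, not part of the original example, and shows one way to restrict the test to the leading digit.

```javascript
var pattern = /\d(?!\.\d+)/;

var plain = pattern.exec("3");
var trailingDot = pattern.exec("3.");
console.log(plain[0]);        // 3
console.log(trailingDot[0]);  // 3: a period alone does not block the match

// The pattern is unanchored, so in "3.1" the engine rejects the 3
// and goes on to match the trailing 1 instead.
var m = pattern.exec("3.1");
console.log(m[0]);     // 1
console.log(m.index);  // 2

// Hypothetical anchored variant: only a digit at the very start of the
// string is considered, so a leading digit with a fraction is rejected.
var anchored = /^\d(?!\.\d+)/;
console.log(anchored.test("3.1"));  // false
console.log(anchored.test("3"));    // true
```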

Greedy Matching

One particularly challenging aspect facing those new to regular expressions is greedy matching. Often termed aggressive or maximal matching, this term refers to the fact that the interpreter will always try to match as many characters as possible for a particular item. A simple way to think about this is that JavaScript will continue matching characters if at all possible. For example:

var pattern = /(ma.*ing)/;
var sentence = "Regexp matching can be daunting.";
pattern.test(sentence);
alert(RegExp.$1);

You might think that the pattern would match only the word matching. But the actual output is "matching can be daunting".

The interpreter matches the longest substring it can, in this case from the initial ma in matching all the way to the final ing in daunting.
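To see where the greedy match begins and ends, the sketch below uses exec() on the same sentence and compares the end of the match with the position of the last ing. The variable names matchEnd and lastIng are illustrative.

```javascript
var sentence = "Regexp matching can be daunting.";
var m = /ma.*ing/.exec(sentence);

console.log(m[0]);     // matching can be daunting
console.log(m.index);  // 7

// The greedy .* first runs to the end of the string, then gives characters
// back until "ing" can match, so the match stops at the LAST "ing".
var matchEnd = m.index + m[0].length;
var lastIng = sentence.lastIndexOf("ing") + "ing".length;
console.log(matchEnd === lastIng);  // true
```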

Disabling Greedy Matching

You can force a quantifier (*, +, ?, {m}, {m,}, or {m,n}) to be non-greedy by following it with a question mark. Doing so forces the expression to match the minimum number of characters rather than the maximum. To repeat our previous example, but this time with minimal matching, we'd use

var pattern = /(ma.*?ing)/; // NON-greedy * because of the ?
var sentence = "Regexp matching can be daunting.";
pattern.test(sentence);
alert(RegExp.$1);

The output, "matching", shows that the interpreter found the first, shortest match in the string.
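Minimal matching is often easiest to appreciate when a pattern should match several delimited pieces of one string. The sketch below, using an illustrative sentence that is not from the original text, compares a greedy and a non-greedy pattern for quoted phrases with the global flag set.

```javascript
var text = 'She said "yes" and then "no".';

// Greedy: .* runs from the first quote to the last one.
var greedy = text.match(/".*"/g);

// Non-greedy: .*? stops at the nearest closing quote each time.
var lazy = text.match(/".*?"/g);

console.log(greedy.length);  // 1
console.log(greedy[0]);      // "yes" and then "no"
console.log(lazy.length);    // 2
console.log(lazy[0]);        // "yes"
console.log(lazy[1]);        // "no"
```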

As we have seen throughout this chapter, there is certainly a lot of power as well as complexity with regular expressions. All JavaScript programmers really should master regexps, as they can aid in common tasks such as form validation. However, before rushing out and adding regular expressions to every script, programmers should consider some of their usage challenges.




JavaScript 2.0
JavaScript: The Complete Reference, Second Edition
ISBN: 0072253576
EAN: 2147483647
Year: 2004
Pages: 209
