11.2. String Methods for Pattern Matching

So far, this chapter has covered the grammar used to create regular expressions, but not how those regular expressions are actually used in JavaScript code. This section describes the methods of the String object that use regular expressions to perform pattern matching and search-and-replace operations. The sections after this one continue the topic by covering the RegExp object and its methods and properties. Note that what follows is only an overview of the methods and properties related to regular expressions. As usual, complete details can be found in Part III.

Strings support four methods that use regular expressions. The simplest is search( ). This method takes a regular-expression argument and returns either the character position of the start of the first matching substring or -1 if there is no match. For example, the following call returns 4:

 "JavaScript".search(/script/i); 

If the argument to search( ) is not a regular expression, it is first converted to one by passing it to the RegExp constructor. search( ) does not support global searches; it ignores the g flag of its regular expression argument.
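The short sketch below shows the three behaviors just described: the -1 result when nothing matches, the conversion of a string argument with RegExp( ), and the g flag being ignored. The sample strings and variable names are made up for illustration.

```javascript
var s = "JavaScript";

// search() returns the position of the first match, or -1 if there is none.
var found = s.search(/script/i);   // 4
var missing = s.search(/python/);  // -1

// A string argument is passed to the RegExp() constructor first, so the
// "." in "S.r" acts as a wildcard and matches the "Scr" at position 4.
var fromString = s.search("S.r");  // 4

// The g flag is ignored: repeated calls keep reporting the first match.
var digits = /\d/g;
var first = "a1b2c3".search(digits);  // 1
var again = "a1b2c3".search(digits);  // 1
```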

The replace( ) method performs a search-and-replace operation. It takes a regular expression as its first argument and a replacement string as its second argument. It searches the string on which it is called for matches with the specified pattern. If the regular expression has the g flag set, the replace( ) method replaces all matches in the string with the replacement string; otherwise, it replaces only the first match it finds. If the first argument to replace( ) is a string rather than a regular expression, the method searches for that string literally rather than converting it to a regular expression with the RegExp( ) constructor, as search( ) does. As an example, you can use replace( ) as follows to provide uniform capitalization of the word "JavaScript" throughout a string of text:

 // No matter how it is capitalized, replace it with the correct capitalization.
 // replace( ) returns a new string, so assign the result back to text.
 text = text.replace(/javascript/gi, "JavaScript");
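Here is a minimal sketch contrasting the global and nonglobal forms, and a literal string pattern with the equivalent regular expression. The sample text is invented; note that replace( ) returns a new string and leaves the original untouched.

```javascript
var text = "javascript, JAVASCRIPT, and Javascript";

// g + i: every capitalization variant is replaced.
var all = text.replace(/javascript/gi, "JavaScript");
// "JavaScript, JavaScript, and JavaScript"

// Without g, only the first match is replaced.
var firstOnly = text.replace(/javascript/i, "JavaScript");
// "JavaScript, JAVASCRIPT, and Javascript"

// A string pattern is matched literally, so "." means a period here,
// and only its first occurrence is replaced.
var literal = "1.2.3".replace(".", "-");   // "1-2.3"

// As a regular expression, "." matches any character.
var pattern = "1.2.3".replace(/./, "-");   // "-.2.3"

// text itself is unchanged by any of the calls above.
```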

replace( ) is more powerful than this, however. Recall that parenthesized subexpressions of a regular expression are numbered from left to right and that the regular expression remembers the text that each subexpression matches. If a $ followed by a digit appears in the replacement string, replace( ) replaces those two characters with the text that matches the specified subexpression. This is a very useful feature. You can use it, for example, to replace straight quotes in a string with curly quotes, simulated with ASCII characters:

 // A quote is a quotation mark, followed by any number of
 // nonquotation-mark characters (which we remember), followed
 // by another quotation mark.
 var quote = /"([^"]*)"/g;
 // Replace the straight quotation marks with "curly quotes,"
 // and leave the contents of the quote (stored in $1) unchanged.
 text = text.replace(quote, "``$1''");
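A quick sketch of the same technique on an invented sample sentence, using ``...'' as the ASCII stand-in for curly quotes, followed by a second pattern whose two subexpressions are reordered with $2 and $1:

```javascript
// Each straight-quoted passage becomes ``...'', and the text captured by
// the parenthesized subexpression is copied back in through $1.
var quote = /"([^"]*)"/g;
var said = 'He said "hi" and then "bye".';
var curly = said.replace(quote, "``$1''");
// "He said ``hi'' and then ``bye''."

// Several subexpressions can be referenced, in any order.
var swapped = "Doe, John".replace(/(\w+),\s*(\w+)/, "$2 $1");  // "John Doe"
```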

The replace( ) method has other important features as well, which are described in the String.replace( ) reference page in Part III. Most notably, the second argument to replace( ) can be a function that dynamically computes the replacement string.
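As an illustration only (the price-doubling task and the names prices and doubled are hypothetical), a replacement function might look like this:

```javascript
// The function is called once per match. It receives the matched text,
// then one argument per parenthesized subexpression (followed by the
// match position and the whole string, which this sketch ignores).
var prices = "apples $3, pears $12";
var doubled = prices.replace(/\$(\d+)/g, function (match, amount) {
    // The return value is inserted as-is; "$" is not treated specially here.
    return "$" + Number(amount) * 2;
});
// "apples $6, pears $24"
```

A function is the natural choice when the replacement depends on a computation over the matched text rather than on a fixed template.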

The match( ) method is the most general of the String regular-expression methods. It takes a regular expression as its only argument (or converts its argument to a regular expression by passing it to the RegExp( ) constructor) and returns an array that contains the results of the match, or null if no match is found. If the regular expression has the g flag set, the method returns an array of all matches that appear in the string. For example:

 "1 plus 2 equals 3".match(/\d+/g)  // returns ["1", "2", "3"] 

If the regular expression does not have the g flag set, match( ) does not do a global search; it simply searches for the first match. However, match( ) still returns an array in this case. The first element of the array is the matching string, and any remaining elements are the substrings that matched the parenthesized subexpressions of the regular expression. Thus, if match( ) returns an array a, a[0] contains the complete match, a[1] contains the substring that matched the first parenthesized subexpression, and so on. To draw a parallel with the replace( ) method, a[n] holds the same text as $n.
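The following sketch, which parses a hypothetical date string, shows the element layout of a nonglobal match, how it lines up with $n in replace( ), and the null result when nothing matches:

```javascript
// Nonglobal match: element 0 is the whole match, then one element per
// parenthesized subexpression.
var date = /(\d{4})-(\d{2})-(\d{2})/;
var a = "Due 1999-12-31".match(date);
// a[0] is "1999-12-31"; a[1] is "1999", a[2] is "12", a[3] is "31"

// a[n] is the same text that $n refers to in a replacement string.
var us = "Due 1999-12-31".replace(date, "$2/$3/$1");  // "Due 12/31/1999"

// With or without the g flag, match() returns null (not an empty array)
// when nothing matches, so test the result before indexing into it.
var none = "no date here".match(date);        // null
var digits = "no digits here".match(/\d+/g);  // null
var year = none ? none[1] : "unknown";        // "unknown"
```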

For example, consider parsing a URL with the following code:

 var url = /(\w+):\/\/([\w.]+)\/(\S*)/;
 var text = "Visit my blog at http://www.example.com/~david";
 var result = text.match(url);
 if (result != null) {
     var fullurl = result[0];   // Contains "http://www.example.com/~david"
     var protocol = result[1];  // Contains "http"
     var host = result[2];      // Contains "www.example.com"
     var path = result[3];      // Contains "~david"
 }

Finally, you should know about one more feature of the match( ) method. The array it returns has a length property, as all arrays do. When match( ) is invoked on a nonglobal regular expression, however, the returned array also has two other properties: index, which contains the character position within the string at which the match begins, and input, which is a copy of the target string. So in the previous code, result.index would be 17 because the matched URL begins at character position 17 in the text, and result.input holds the same string as the text variable. For a regular expression r that does not have the g flag set and a string s, calling s.match(r) returns the same value as r.exec(s). The RegExp.exec( ) method is discussed a little later in this chapter.
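To check these properties yourself, a sketch along these lines reuses the URL pattern and compares match( ) with exec( ) and with a global match( ). The variable names viaExec and words are illustrative only.

```javascript
var url = /(\w+):\/\/([\w.]+)\/(\S*)/;  // no g flag
var text = "Visit my blog at http://www.example.com/~david";

var result = text.match(url);
// result.index is 17, and result.input is the same string as text

// For a nonglobal pattern, exec() returns an equivalent array.
var viaExec = url.exec(text);
// viaExec.index is 17, and viaExec[1] is "http", just like result[1]

// With the g flag, match() returns a plain array of matched substrings
// that has no index or input property.
var words = text.match(/\w+/g);
// words.index is undefined
```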

The last of the regular-expression methods of the String object is split( ). This method breaks the string on which it is called into an array of substrings, using the argument as a separator. For example:

 "123,456,789".split(",");  // Returns ["123","456","789"] 

The split( ) method can also take a regular expression as its argument, which makes it considerably more powerful. For example, you can specify a separator that allows an arbitrary amount of whitespace on either side:

 "1, 2, 3, 4, 5".split(/\s*,\s*/); // Returns ["1","2","3","4","5"] 

The split( ) method has other features as well. See the String.split( ) entry in Part III for complete details.
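Two of those features, parenthesized subexpressions in a regular-expression separator and the optional second (limit) argument, appear in the sketch below, alongside a side-by-side comparison of string and regular-expression separators. The input strings are made up for illustration.

```javascript
var loose = "1, 2 ,3";

// A string separator is matched exactly, leaving surrounding whitespace.
var byString = loose.split(",");             // ["1", " 2 ", "3"]

// The regular expression absorbs whitespace on either side of the comma.
var byRegExp = loose.split(/\s*,\s*/);       // ["1", "2", "3"]

// Text captured by parentheses in the separator is included in the result.
var withSeps = loose.split(/\s*(,)\s*/);     // ["1", ",", "2", ",", "3"]

// An optional second argument limits the number of elements returned.
var firstTwo = "123,456,789".split(",", 2);  // ["123", "456"]
```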




JavaScript: The Definitive Guide
ISBN: 0596101996
Year: 2004
Pages: 767
