13.3 Getting Control: The Metacharacters

Regular expression metacharacters are characters that do not represent themselves. They have special powers that let you control the search pattern in some way (e.g., find the pattern only at the beginning of a line, only at the end of a line, or only if it starts with an uppercase or lowercase letter). A metacharacter loses its special meaning if it is preceded by a backslash. For example, the dot metacharacter represents any single character, but when preceded by a backslash it is just a literal dot or period.

If you see a backslash preceding a metacharacter, the backslash turns off the special meaning of the metacharacter. If you see a backslash preceding an alphanumeric character in a regular expression, the backslash is being used to create a metasymbol. A metasymbol provides a simpler way to represent some of the regular expression metacharacters. For example, [0-9] represents one digit in the range between 0 and 9, and the metasymbol \d represents the same thing. [0-9] uses the bracketed character class, whereas \d is a metasymbol (see Table 13.6).
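The two roles of the backslash can be checked directly. The following is a minimal sketch; the variable names are illustrative and do not come from the book's examples.

```javascript
// A backslash before a metacharacter removes its special meaning.
const anyChar = /a.c/;       // the dot matches any single character except newline
const literalDot = /a\.c/;   // the escaped dot matches only a period

// A backslash before certain letters creates a metasymbol.
const digitClass = /^[0-9]$/;   // bracketed character class
const digitSymbol = /^\d$/;     // equivalent metasymbol

console.log(anyChar.test("abc"));      // true
console.log(literalDot.test("abc"));   // false
console.log(literalDot.test("a.c"));   // true
console.log(digitClass.test("7") === digitSymbol.test("7"));  // true
```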

Example 13.10
 /^a...c/

EXPLANATION

This regular expression contains metacharacters (see Table 13.6). The first one is a caret (^). The caret metacharacter matches a pattern only if it is at the beginning of the line. The period (.) matches any single character, including a space or tab, but not a newline. This expression contains three periods, representing any three characters. To find a literal period, or any other character that does not represent itself, precede the character with a backslash to prevent interpretation.

The expression reads: Search at the beginning of the line for an a, followed by any three single characters, followed by a c. It will match, for example, abbbc, a123c, a   c (an a, three spaces, and a c), aAx3c, and so on, but only if those patterns are found at the beginning of the line.
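As a quick check of Example 13.10, the sketch below runs the pattern against the strings listed above plus two that should fail. The sample array and variable names are assumptions made for illustration.

```javascript
const pattern = /^a...c/;

// The last two samples should fail: one does not start with a,
// the other has only one character between the a and the c.
const samples = ["abbbc", "a123c", "a   c", "aAx3c", "xabbbc", "abc"];

const matched = samples.filter(function (s) {
  return pattern.test(s);
});

console.log(matched);  // [ 'abbbc', 'a123c', 'a   c', 'aAx3c' ]
```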

Table 13.6. Metacharacters and metasymbols.

Metacharacter/Metasymbol  What It Matches

Character Class: Single Characters and Digits
.                         Matches any character except newline
[a-z0-9]                  Matches any single character in set
[^a-z0-9]                 Matches any single character not in set
\d                        Matches one digit
\D                        Matches a non-digit, same as [^0-9]
\w                        Matches an alphanumeric (word) character
\W                        Matches a non-alphanumeric (non-word) character

Character Class: Whitespace Characters
\0                        Matches a null character
\b                        Matches a backspace (only inside [ ], as in [\b])
\f                        Matches a formfeed
\n                        Matches a newline
\r                        Matches a return
\s                        Matches a whitespace character: space, tab, or newline
\S                        Matches a non-whitespace character
\t                        Matches a tab

Character Class: Anchored Characters
^                         Matches to beginning of line
$                         Matches to end of line
\A                        Matches the beginning of the string only (Perl anchor; not supported by JavaScript)
\b                        Matches a word boundary (when not inside [ ])
\B                        Matches a non-word boundary
\G                        Matches where previous m//g left off (Perl anchor; not supported by JavaScript)
\Z                        Matches the end of the string or line (Perl anchor; not supported by JavaScript)
\z                        Matches the end of string only (Perl anchor; not supported by JavaScript)

Character Class: Repeated Characters
x?                        Matches 0 or 1 of x
x*                        Matches 0 or more of x
x+                        Matches 1 or more of x
(xyz)+                    Matches one or more patterns of xyz
x{m,n}                    Matches at least m of x and no more than n of x

Character Class: Alternative Characters
was|were|will             Matches one of was, were, or will

Character Class: Remembered Characters
(string)                  Used for backreferencing (see "Remembering or Capturing")
\1 or $1                  Matches first set of parentheses
\2 or $2                  Matches second set of parentheses
\3 or $3                  Matches third set of parentheses

New with JavaScript 1.5
(?:x)                     Matches x but does not remember the match. These are called
                          non-capturing parentheses. The matched substring cannot be
                          recalled from the resulting array's elements [1], ..., [n] or
                          from the predefined RegExp object's properties $1, ..., $9.
x(?=y)                    Matches x only if x is followed by y. For example,
                          /Jack(?=Sprat)/ matches Jack only if it is followed by Sprat.
                          /Jack(?=Sprat|Frost)/ matches Jack only if it is followed by
                          Sprat or Frost. However, neither Sprat nor Frost is part of
                          the match results.
x(?!y)                    Matches x only if x is not followed by y. For example,
                          /\d+(?!\.)/ matches a number only if it is not followed by a
                          decimal point. /\d+(?!\.)/.exec("3.141") matches 141 but
                          not 3.141.

If you are searching for a particular character within a regular expression, you can use the dot metacharacter to represent a single character, or a character class that matches one character from a set of characters. In addition to the dot and the character class, JavaScript provides backslashed symbols (called metasymbols) to represent single characters. See Table 13.7 for the single-character metacharacters and Table 13.8 for a list of metasymbols.

Table 13.7. Single-character and single-digit metacharacters.

Metacharacter   What It Matches

.               Matches any character except newline
[a-z0-9_]       Matches any single character in set
[^a-z0-9_]      Matches any single character not in set

13.3.1 The Dot Metacharacter

The dot metacharacter matches any single character except the newline character. For example, the regular expression /a.b/ matches if the string contains an a, followed by any single character (except \n), followed by a b, whereas the expression /.../ matches any string containing at least three consecutive characters other than newlines.
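The newline exception is easy to overlook, so here is a small sketch of it. The test strings and names are made up for illustration.

```javascript
const aDotB = /a.b/;
const threeChars = /.../;

console.log(aDotB.test("a-b"));      // true:  the dot matches the hyphen
console.log(aDotB.test("a b"));      // true:  the dot matches a space
console.log(aDotB.test("a\nb"));     // false: the dot does not match a newline
console.log(aDotB.test("ab"));       // false: the dot must match exactly one character

console.log(threeChars.test("xyz")); // true
console.log(threeChars.test("xy"));  // false: fewer than three characters
```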

Example 13.11
 <html><head><title>The dot Metacharacter</title>
 </head>
 <body>
 <script language="JavaScript">
     var textString="Norma Jean";                   // (1)
     var reg_expression = /N..ma/;                  // (2)
     var result=reg_expression.test(textString);    // (3) Returns true or false
     document.write("<font size='+1'><b>"+result+"<br>");
     if (reg_expression.test(textString)){          // (4) Same as: if (result)
         document.write("<b>The reg_ex /N..ma/ matched the string \"" +
             textString + "\".<br>");
     }
     else{
         document.write("No Match!");               // (5)
     }
 </script>
 </body>
 </html>

EXPLANATION

  1. The variable textString is assigned the string "Norma Jean".

  2. The regular expression /N..ma/ is assigned to the variable reg_expression. A match is found if the string being tested contains an uppercase N, followed by any two single characters (each dot represents one character), followed by an m and an a. It would find Norma, No man, Normandy, etc.

  3. The test() method returns true if the string textString matches the regular expression and false if it doesn't. The variable result contains either true or false.

  4. If the string "Norma Jean" contains the regular expression pattern /N..ma/, the return from the test() method is true, and the output is sent to the screen as shown in Figure 13.9.

    Figure 13.9. The string Norma Jean contains an N followed by any two characters and ma.

    graphics/13fig09.jpg

  5. If the pattern is not found, No Match! is displayed on the page.

13.3.2 The Character Class

A character class represents one character from a set of characters. For example, [abc] matches either an a, b, or c; [a-z] matches one character in the range from a to z; and [0-9] matches one digit in the range from 0 to 9. If the character class contains a leading caret, ^, then the class represents any one character not in the set; thus, [^a-zA-Z] matches a single character that is not in the range from a to z or A to Z, and [^0-9] matches a single character that is not a digit between 0 and 9.

JavaScript provides additional symbols, called metasymbols, to represent a character class. The symbols \d and \D represent a single digit and a single non-digit, respectively, the same as [0-9] and [^0-9]; \w and \W represent a single word character and a single non-word character, respectively, the same as [A-Za-z_0-9] and [^A-Za-z_0-9].
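One way to convince yourself that each metasymbol and its bracketed class agree is to compare them on every ASCII character. This is only a sketch: the helper sameMatches is hypothetical, and the comparison is limited to character codes 0 through 127.

```javascript
// Returns true if the two patterns accept exactly the same ASCII characters.
function sameMatches(classPattern, symbolPattern) {
  for (let code = 0; code < 128; code++) {
    const ch = String.fromCharCode(code);
    if (classPattern.test(ch) !== symbolPattern.test(ch)) {
      return false;
    }
  }
  return true;
}

console.log(sameMatches(/[0-9]/, /\d/));            // true
console.log(sameMatches(/[^0-9]/, /\D/));           // true
console.log(sameMatches(/[A-Za-z_0-9]/, /\w/));     // true
console.log(sameMatches(/[^A-Za-z_0-9]/, /\W/));    // true
```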

Example 13.12
 <html><head><title>The Character Class</title>
 </head>
 <body>
 <script language="JavaScript">
     var reg_expression = /[A-Z][a-z]eve/;               // (1)
     var textString=prompt("Type a string of text","");  // (2)
     var result=reg_expression.test(textString);         // (3) Returns true or false
     document.write("<font size='+1'><b>"+result+"<br>");
     if (result){
         document.write("<b>The reg_ex /[A-Z][a-z]eve/ matched the string \"" +
             textString + "\".<br>");
     }
     else{
         alert("No Match!");
     }
 </script>
 </body>
 </html>

EXPLANATION

  1. The variable is assigned a regular expression containing two bracketed character classes. This regular expression matches a string that contains one uppercase letter in the range A to Z, followed by one lowercase letter in the range a to z, followed by eve.

  2. The variable textString is assigned user input; in this example, Steven lives in Cleveland was entered.

  3. The regular expression test() method will return true since Steven contains an uppercase letter, followed by a lowercase letter, and eve. Cleveland also matches the pattern. The variable result contains either true or false. See the output in Figures 13.10 and 13.11.

    Figure 13.10. The user entered Steven lives in Cleveland: one uppercase letter [A-Z], followed by one lowercase letter [a-z], followed by eve. This matches both Steven and Cleveland.

    graphics/13fig10.jpg

    Figure 13.11. When the user entered Believe! (top), it didn't match (bottom). It would have matched if the user had entered BeLieve. Why?

    graphics/13fig11.jpg

Example 13.13
 <html><head><title>The Character Class</title>
 </head>
 <body>
 <script language="JavaScript">
     // Character class
     var reg_expression = /[A-Za-z0-9_]/;                // (1) A single alphanumeric
                                                         //     word character
     var textString=prompt("Type a string of text","");  // (2)
     var result=reg_expression.test(textString);         // (3) Returns true or false
     document.write("<font size='+1'><b>"+result+"<br>");
     if (result){
         document.write("<b>The reg_ex /[A-Za-z0-9_]/ matched the string \"" +
             textString + "\".<br>");
     }
     else{
         alert("No Match!");
     }
 </script>
 </body>
 </html>

EXPLANATION

  1. A regular expression consisting of the bracketed character class [A-Za-z0-9_] is assigned to the variable called reg_expression. This regular expression matches a string that contains at least one character in the class: a letter from A to Z or a to z, a digit from 0 to 9, or the underscore character, _.

  2. User input is entered in the prompt dialog box and assigned to the variable textString. In this example the user entered Take 5.

  3. The regular expression test() method will return true since the string Take 5 contains at least one alphanumeric character. See Figure 13.12.

    Figure 13.12. The user entered Take 5 (top). The string contained at least one alphanumeric character (bottom).

    graphics/13fig12.jpg

Example 13.14
 <html><head><title>The Character Class and Negation</title>
 </head>
 <body>
 <script language="JavaScript">
     // Negation within a Character Class
     var reg_expression = /[^0-9]/;                      // (1)
     var textString=prompt("Type a string of text","");  // (2)
     var result=reg_expression.test(textString);         // (3) Returns true or false
     document.write("<font size='+1'><b>"+result+"<br>");
     if (result){
         document.write("<b>The reg_ex /[^0-9]/ matched the string \"" +
             textString + "\".<br>");
     }
     else{
         alert("No Match!");
     }
 </script>
 </body>
 </html>

EXPLANATION

  1. The caret inside a character class, when it is the first character after the opening bracket, creates a negation, meaning any character not in this range. This regular expression matches a string that contains at least one character that is not a digit between 0 and 9.

  2. User input is assigned to the variable textString. In this example, abc was entered.

  3. The regular expression test() method will return true since the string abc contains at least one character that is not in the range from 0 to 9.

Figure 13.13. The user entered abc. It contains a character that is not in the range between 0 and 9.

graphics/13fig13.gif
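A negated class is satisfied by a single non-digit anywhere in the string, which is not the same as saying the string contains no digits. The sketch below contrasts the two; the second pattern is an assumption added for comparison and relies on the * quantifier and the ^ and $ anchors described later in this section.

```javascript
const hasNonDigit = /[^0-9]/;      // at least one character is not a digit
const noDigitsAtAll = /^[^0-9]*$/; // every character is a non-digit

console.log(hasNonDigit.test("abc"));    // true
console.log(hasNonDigit.test("a1"));     // true:  the a is a non-digit
console.log(hasNonDigit.test("123"));    // false: every character is a digit

console.log(noDigitsAtAll.test("abc"));  // true
console.log(noDigitsAtAll.test("a1"));   // false: the string contains a digit
```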

13.3.3 Metasymbols

Metasymbols offer an alternative way to represent a character class. For example, instead of representing a digit as [0-9], it can be represented as \d, and the alternative for representing a non-digit, [^0-9], is \D. Metasymbols are easier to use and to type than bracketed character classes.

Table 13.8. Metasymbols.

Symbol  What It Matches                                     Character Class

\d      One digit                                           [0-9]
\D      One non-digit                                       [^0-9]
\s      One whitespace character (tab, space, newline,
        carriage return, formfeed, vertical tab)
\S      One non-space character
\w      One word character                                  [A-Za-z0-9_]
\W      One non-word character                              [^A-Za-z0-9_]

Example 13.15
 <html><head><title>The Digit Meta Symbol</title>
 </head>
 <body>
 <script language="JavaScript">
     var reg_expression = /6\d\d/;                       // (1)
     var textString=prompt("Type a string of text","");  // (2)
     var result=reg_expression.test(textString);         // (3) Returns true or false
     document.write("<font size='+1'><b>"+result+"<br>");
     if (result){
         // Backslashes are doubled inside the string so they are displayed
         document.write("<b>The regular expression /6\\d\\d/ matched the string \"" +
             textString + "\".<br>");
     }
     else{
         alert("No Match!");
     }
 </script>
 </body>
 </html>

EXPLANATION

  1. The variable is assigned a regular expression containing the number 6, followed by two single digits. The metasymbol \d represents the character class [0-9].

  2. The variable textString is assigned user input; in this example, 126553 was entered.

  3. The regular expression test() method will return true since the string 126553 contains a 6 followed by two digits (655). See Figure 13.14.

    Figure 13.14. The user entered 126553. It contains a 6 followed by any two digits.

    graphics/13fig14.jpg

Example 13.16
 <html><head><title>The Digit Meta Symbol Negated</title>
 </head>
 <body>
 <script language="JavaScript">
     var reg_expression = /[a-z]\D\D/;                   // (1)
     var textString=prompt("Type a string of text","");  // (2)
     var result=reg_expression.test(textString);         // (3) Returns true or false
     document.write("<font size='+1'><b>"+result+"<br>");
     if (result){
         // Backslashes are doubled inside the string so they are displayed
         document.write("<b>The regular expression /[a-z]\\D\\D/ matched the string \"" +
             textString + "\".<br>");
     }
     else{
         alert("No Match!");
     }
 </script>
 </body>
 </html>

EXPLANATION

  1. The variable is assigned a regular expression containing a letter, followed by two single non-digits. The metasymbol \D represents the character class [^0-9].

  2. The variable textString is assigned user input; in this example, Hello! was entered.

  3. The regular expression test() method will return true since the string Hello! contains a lowercase letter followed by two non-digit characters (for example, ell). See Figure 13.15.

    Figure 13.15. The user entered a lowercase letter followed by two non-digits.

    graphics/13fig15.jpg

Example 13.17
 <html><head><title>Word and Space Metasymbols</title>
 </head>
 <body>
 <script language="JavaScript">
     var reg_expression = /\w\s\w\W/;                    // (1)
     var textString=prompt("Type a string of text","");  // (2)
     var result=reg_expression.test(textString);         // (3) Returns true or false
     document.write("<font size='+1'><b>"+result+"<br>");
     if (result){
         // Backslashes are doubled inside the string so they are displayed
         document.write("<b>The regular expression /\\w\\s\\w\\W/ matched the string \"" +
             textString + "\".<br>");
     }
     else{
         alert("No Match!");
     }
 </script>
 </body>
 </html>

EXPLANATION

  1. The variable is assigned a regular expression containing an alphanumeric word character, \w, followed by a whitespace character, \s, followed by another alphanumeric word character, followed by a non-alphanumeric (non-word) character, \W. The metasymbol \w represents the character class [A-Za-z0-9_]. The metasymbol \W represents the character class [^A-Za-z0-9_], and the metasymbol \s represents a whitespace character (tab, space, newline, carriage return, formfeed).

  2. The variable textString is assigned user input; in this example, ABC D% was entered first.

  3. The regular expression test() method will return true since the string ABC D% matches an alphanumeric character (C), followed by a space, another alphanumeric character (D), and a non-alphanumeric character (%) (see Figure 13.16). An example of output where the pattern failed is shown in Figure 13.17.

    Figure 13.16. The user entered ABC D%. It contained a word character, followed by a whitespace character, another word character, and a non-word character.

    graphics/13fig16.jpg

    Figure 13.17. The user entered ABCD#. To match, the string needs a space between the C and D.

    graphics/13fig17.jpg
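The test() method reports only whether a match exists. To see which part of the input satisfied /\w\s\w\W/, one option is the string match() method, as in this sketch. The helper name matchedText is hypothetical.

```javascript
const pattern = /\w\s\w\W/;

// Returns the matched substring, or null when there is no match.
function matchedText(input) {
  const found = input.match(pattern);
  return found === null ? null : found[0];
}

console.log(matchedText("ABC D%"));  // "C D%"
console.log(matchedText("ABCD#"));   // null: no whitespace between C and D
```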

13.3.4 Metacharacters to Repeat Pattern Matches

In the previous examples, each metacharacter matched a single character. What if you want to match more than one character? For example, suppose you are looking for all lines containing names, where the first letter must be uppercase, which can be represented as [A-Z], but the following letters are lowercase and the number of letters varies from name to name. [a-z] matches a single lowercase letter. How can you match one or more lowercase letters, or zero or more lowercase letters? To do this you use what are called quantifiers. To match one or more lowercase letters, the regular expression can be written /[a-z]+/, where the + sign means "one or more of the preceding character"; in this case, one or more lowercase letters. JavaScript provides a number of quantifiers, as shown in Table 13.9.

Table 13.9. Quantifiers: The greedy metacharacters.

Metacharacter   What It Matches

x?              Matches 0 or 1 of x
(xyz)?          Matches zero or one pattern of xyz
x*              Matches 0 or more of x
(xyz)*          Matches zero or more patterns of xyz
x+              Matches 1 or more of x
(xyz)+          Matches one or more patterns of xyz
x{m,n}          Matches at least m of x and no more than n of x
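The sketch below exercises each quantifier from Table 13.9. The patterns are anchored with ^ and $ (covered later in this section) so that the whole string must fit the quantifier; the sample words are assumptions chosen for illustration.

```javascript
const zeroOrOne = /^colou?r$/;   // u is optional
const zeroOrMore = /^ab*$/;      // a followed by any number of b's
const oneOrMore = /^ab+$/;       // a followed by at least one b
const groupRepeat = /^(xyz)+$/;  // the whole group xyz, one or more times
const range = /^\d{2,4}$/;       // between two and four digits

console.log(zeroOrOne.test("color"), zeroOrOne.test("colour"));     // true true
console.log(zeroOrMore.test("a"), zeroOrMore.test("abbb"));         // true true
console.log(oneOrMore.test("a"), oneOrMore.test("ab"));             // false true
console.log(groupRepeat.test("xyzxyz"), groupRepeat.test("xyzx"));  // true false
console.log(range.test("1"), range.test("123"), range.test("12345")); // false true false
```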

The Greed Factor

Normally quantifiers are "greedy"; that is, they match the largest possible set of characters, starting at the left-hand side of the string and searching to the right for the last possible character that would satisfy the condition. For example, given the string:

  var string="ab123456783445554437AB";

and the regular expression:

  /ab[0-9]*/

If the replace() method were to substitute what is matched with an "X":

  string=string.replace(/ab[0-9]*/, "X");

the resulting string would be:

  "XAB"

The asterisk is a greedy metacharacter. It matches zero or more of the preceding character. In other words, it attaches itself to the character preceding it; in the above example, the asterisk attaches itself to the character class [0-9]. The matching starts on the left, searching for ab followed by zero or more digits in the range between 0 and 9. It is called greedy because the matching continues until the last digit is found; in this example, the digit 7. The pattern ab and all of the digits that follow it are replaced with a single X.
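The replacement described above can be reproduced in a few lines. This sketch also shows what happens if the asterisk is left off, in which case only ab and the first digit are replaced. The variable names are illustrative.

```javascript
const original = "ab123456783445554437AB";

// Greedy: ab plus every digit that follows is replaced.
const withStar = original.replace(/ab[0-9]*/, "X");
console.log(withStar);     // "XAB"

// Without the asterisk, [0-9] matches exactly one digit.
const withoutStar = original.replace(/ab[0-9]/, "X");
console.log(withoutStar === "X" + original.slice(3));  // true
```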

Greediness can be turned off so that instead of matching the maximum number of characters, the match is made on the minimum number of characters. This is done by appending a question mark after the greedy metacharacter. See Example 13.25.

Example 13.18
 <html><head><title></title>
 </head>
 <body>
 <script language="JavaScript">
     var reg_expression = /\d\.?\d/;                     // (1)
     var textString=prompt("Type a string of text","");  // (2)
     var result=reg_expression.test(textString);         // (3) Returns true or false
     document.write("<font size='+1'><b>"+result+"<br>");
     if (result){
         // Backslashes are doubled inside the string so they are displayed
         document.write("<b>The regular expression /\\d\\.?\\d/ matched the string \"" +
             textString + "\".<br>");
     }
     else{
         alert("No Match!");
     }
 </script>
 </body>
 </html>

EXPLANATION

  1. The variable is assigned a regular expression containing a digit, \d, followed by either zero or one literal period, \.?, followed by another digit, \d. The question mark (zero or one) controls the character preceding it, in this case a period. There is either one period or no period at all between the two digits in the string being matched.

  2. The variable textString is assigned user input; in this example, 3.7 was entered.

  3. The regular expression test() method will return true since the string 3.7 matches a digit, 3, followed by an optional period, followed by another digit, 7. See the examples in Figure 13.18.

    Figure 13.18. The user entered 3.7, or digit, period, digit (top); the user entered 456, or digit, no period, digit (middle); the user entered 5A6, but the two digits may be separated only by a period, so there is no match (bottom).

    graphics/13fig18.jpg

Example 13.19
 <html><head><title></title>
 </head>
 <body>
 <script language="JavaScript">
     // Greediness
     var reg_expression = /[A-Z][a-z]*\s/;               // (1)
     var textString=prompt("Type a string of text","");  // (2)
     var result=reg_expression.test(textString);         // (3) Returns true or false
     document.write("<font size='+1'><b>"+result+"<br>");
     if (result){
         // Backslashes are doubled inside the string so they are displayed
         document.write("<b>The regular expression /[A-Z][a-z]*\\s/ matched the string \"" +
             textString + "\".<br>");
     }
     else{
         alert("No Match!");
     }
 </script>
 </body>
 </html>

EXPLANATION

  1. The variable is assigned a regular expression containing an uppercase letter, [A-Z], followed by zero or more lowercase letters, [a-z]*, and a whitespace character, \s.

  2. The variable textString is assigned user input; in this example, Danny boy was entered.

  3. The regular expression test() method will return true since the string Danny boy matches an uppercase letter, D, followed by zero or more lowercase letters, anny, and a space. See Figure 13.19.

    Figure 13.19. The user entered Danny boy, or one uppercase letter, zero or more lowercase letters, and a space (top); the user entered DANNY BOY, or one uppercase letter, zero lowercase letters, and a space (bottom).

    graphics/13fig19.jpg

Example 13.20
 <html><head><title></title>
 </head>
 <body>
 <script language="JavaScript">
     var reg_expression = /[A-Z][a-z]+\s/;               // (1)
     var textString=prompt("Type a string of text","");  // (2)
     var result=reg_expression.test(textString);         // (3) Returns true or false
     document.write("<font size='+1'><b>"+result+"<br>");
     if (result){
         // Backslashes are doubled inside the string so they are displayed
         document.write("<b>The regular expression /[A-Z][a-z]+\\s/ matched the string \"" +
             textString + "\".<br>");
     }
     else{
         alert("No Match!");
     }
 </script>
 </body>
 </html>

EXPLANATION

  1. The regular expression reads: Search for an uppercase letter, followed by one or more lowercase letters, followed by a space.

  2. The user is prompted for input.

  3. The regular expression test() method checks whether the string textString, entered by the user, matches the regular expression and returns true or false. See Figure 13.20.

    Figure 13.20. The user entered Danny Boy, or one uppercase letter, one or more lowercase letters, and a space (top); the user entered DannyBoy and gets no match, since there was no space (bottom).

    graphics/13fig20.jpg

Example 13.21
 <html><head><title></title>
 </head>
 <body>
 <script language="JavaScript">
     var reg_expression = /abc\d{1,3}\.\d/;              // (1)
     var textString=prompt("Type a string of text","");  // (2)
     var result=reg_expression.test(textString);         // (3) Returns true or false
     document.write("<font size='+1'><b>"+result+"<br>");
     if (result){
         // Backslashes are doubled inside the string so they are displayed
         document.write("<b>The regular expression /abc\\d{1,3}\\.\\d/ matched the string \"" +
             textString + "\".<br>");
     }
     else{
         alert("No Match!");
     }
 </script>
 </body>
 </html>

EXPLANATION

  1. The variable is assigned a regular expression containing the pattern abc\d{1,3}\.\d, where abc is followed by at least one and at most three digits, followed by a literal period, and another digit, \d.

  2. The variable textString is assigned user input; here, abc456.5xyz was entered.

  3. The regular expression contains the curly brace {} metacharacters, representing the number of times the preceding expression will be repeated. The expression reads: Find at least one occurrence of the pattern \d and as many as three in a row. See Figure 13.21.

    Figure 13.21. The user entered abc456.5xyz: abc followed by between one and three digits, a literal period, another digit, and xyz (top); the entered string tested true (bottom).

    graphics/13fig21.jpg

Example 13.22
 <html><head><title></title>
 </head>
 <body>
 <script language="JavaScript">
     // Repeating patterns
     var reg_expression = /#\d{5}\.\d/;                  // (1)
     var textString=prompt("Type a string of text","");  // (2)
     var result=reg_expression.test(textString);         // (3) Returns true or false
     document.write("<font size='+1'><b>"+result+"<br>");
     if (result){
         // Backslashes are doubled inside the string so they are displayed
         document.write("<b>The regular expression /#\\d{5}\\.\\d/ matched the string \"" +
             textString + "\".<br>");
     }
     else{
         alert("No Match!");
     }
 </script>
 </body>
 </html>

EXPLANATION

  1. The variable is assigned a regular expression that reads: Find a # sign, followed by exactly five digits, \d{5}, a period, and another digit, \d.

  2. The user is prompted for input.

  3. The test() method returns true if the regular expression pattern was found in the input string. See Figure 13.22.

    Figure 13.22. The user entered #34234.6, or a # sign, followed by five digits, a period, and a digit (top). This returns true. The user entered abac#12345.56789 (middle). This also returns true. When the user entered #234.555 (there are not five digits after the # sign), no match was made (bottom).

    graphics/13fig22.jpg

Example 13.23
 <html><head><title></title>
 </head>
 <body>
 <script language="JavaScript">
     // Repeating patterns
     var reg_expression = /5{1,}\.\d/;
     var textString=prompt("Type a string of text","");
     var result=reg_expression.test(textString);   // Returns true or false
     document.write("<font size='+1'><b>"+result+"<br>");
     if (result){
         // Backslashes are doubled inside the string so they are displayed
         document.write("<b>The regular expression /5{1,}\\.\\d/ matched the string \"" +
             textString + "\".<br>");
     }
     else{
         alert("No Match!");
     }
 </script>
 </body>
 </html>

Figure 13.23. The user entered abc5555555.2, or the number 5 at least one time, followed by a literal period, and any digit, \d (top). This returns true; the user entered 5.6 (bottom). This also returns true.

graphics/13fig23.gif
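The inputs shown in Figures 13.21 through 13.23 can be checked without a browser. The following sketch reuses the three patterns from Examples 13.21 to 13.23; the variable names and the extra failing inputs are assumptions added for illustration.

```javascript
const oneToThree = /abc\d{1,3}\.\d/;  // Example 13.21: one to three digits
const exactlyFive = /#\d{5}\.\d/;     // Example 13.22: exactly five digits
const atLeastOne = /5{1,}\.\d/;       // Example 13.23: one or more 5s

console.log(oneToThree.test("abc456.5xyz"));        // true
console.log(oneToThree.test("abc.5"));              // false: no digit after abc

console.log(exactlyFive.test("#34234.6"));          // true
console.log(exactlyFive.test("abac#12345.56789"));  // true
console.log(exactlyFive.test("#234.555"));          // false: only three digits after #

console.log(atLeastOne.test("abc5555555.2"));       // true
console.log(atLeastOne.test("5.6"));                // true
console.log(atLeastOne.test("4.6"));                // false: no 5 before the period
```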

Metacharacters That Turn off Greediness

By placing a question mark after a greedy quantifier, the greed is turned off: the quantifier matches as few characters as possible, so the match ends at the first point where the pattern is satisfied rather than the last.

Example 13.24
 <html><head><title>Greed</title>
 </head>
 <body bgcolor=lightblue>
 <script language="JavaScript">
     var myString="abcdefghijklmnopqrstuvwxyz";          // (1)
     document.write("<font size='+1'>Old string:<b> " + myString + "<br>");
     myString=myString.replace(/[a-z]+/, "XXX");         // (2)
     document.write("</b>New string:<b> " + myString + "<br>");  // (3)
 </script>
 </body>
 </html>

EXPLANATION

  1. The variable called myString is assigned a string of lowercase letters.

  2. The regular expression reads: Search for one or more lowercase letters, and replace them with XXX. The + metacharacter is greedy. It takes as many characters as match the expression; i.e., it starts on the left-hand side of the string, grabbing as many lowercase letters as it can find until the end of the string.

  3. The value of myString is printed after the substitution, as shown in Figure 13.24.

    Figure 13.24. The + sign is greedy. One or more lowercase letters are replaced with XXX; i.e., the whole string.

    graphics/13fig24.jpg

Example 13.25
 <html><head><title></title>
 </head>
 <body>
 <script language="JavaScript">
     var myString="abcdefghijklmnopqrstuvwxyz";          // (1)
     document.write("<font size='+1'>Old string: <b>" + myString + "<br>");
     myString=myString.replace(/[a-z]+?/, "XXX");        // (2)
     document.write("</b>New string: <b>" + myString + "<br>");
 </script>
 </body>
 </html>

EXPLANATION

  1. The variable called myString is assigned a string of lowercase letters, exactly as in the last example.

  2. The regular expression reads: Search for one or more lowercase letters, but after the + sign, there is a question mark. The question mark turns off the greed factor. Now instead of taking as many lowercase letters as it can, this regular expression search stops after it finds the first lowercase character, and then replaces that character with XXX.

Figure 13.25. This is not greedy. Output from Example 13.25.

graphics/13fig25.jpg
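Examples 13.24 and 13.25 can be compared side by side, as in the sketch below. The second half uses a made-up HTML fragment to show a case where the difference between greedy and non-greedy matching matters in practice; that fragment and the variable names are assumptions, not part of the book's examples.

```javascript
const letters = "abcdefghijklmnopqrstuvwxyz";

const greedy = letters.replace(/[a-z]+/, "XXX");   // takes every letter
const lazy = letters.replace(/[a-z]+?/, "XXX");    // takes only the first letter

console.log(greedy);  // "XXX"
console.log(lazy);    // "XXXbcdefghijklmnopqrstuvwxyz"

// A made-up fragment with two bold elements.
const fragment = "<b>one</b><b>two</b>";
const greedyTag = fragment.match(/<b>.*<\/b>/)[0];   // runs to the last </b>
const lazyTag = fragment.match(/<b>.*?<\/b>/)[0];    // stops at the first </b>

console.log(greedyTag);  // "<b>one</b><b>two</b>"
console.log(lazyTag);    // "<b>one</b>"
```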

13.3.5 Anchoring Metacharacters

Often it is necessary to anchor a pattern so that it matches only if it is found at the beginning or end of a line, word, or string. Anchoring metacharacters are based on a position just to the left or to the right of the character being matched. Anchors are technically called zero-width assertions because they correspond to positions, not actual characters, in a string; for example, /^abc/ will search for abc at the beginning of the line, where the ^ represents a position, not an actual character. See Table 13.10 for a list of anchoring metacharacters.

Table 13.10. Anchors (assertions).

Metacharacter   What It Matches

^               Matches to beginning of line or beginning of string
$               Matches to end of line or end of a string
\b              Matches a word boundary (when not inside [ ])
\B              Matches a non-word boundary
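The examples that follow cover ^ and $. For the two word-boundary anchors in Table 13.10, here is a brief sketch; the sample words and variable names are assumptions chosen for illustration.

```javascript
const wholeWord = /\bcat\b/;   // cat with a word boundary on both sides
const insideWord = /\Bcat\B/;  // cat with word characters on both sides

console.log(wholeWord.test("the cat sat"));   // true
console.log(wholeWord.test("concatenate"));   // false: cat is inside a longer word

console.log(insideWord.test("concatenate"));  // true
console.log(insideWord.test("cat"));          // false: boundaries on both sides
```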

Example 13.26
 <html><head><title></title>
 </head>
 <body>
 <script language="JavaScript">
     var reg_expression = /^Will/;                       // (1) Beginning of line anchor
     var textString=prompt("Type a string of text","");  // (2)
     var result=reg_expression.test(textString);         // (3) Returns true or false
     document.write("<font size='+1'><b>"+result+"<br>");
     if (result){
         document.write("<b>The regular expression /^Will/ matched the string \"" +
             textString + "\".<br>");
     }
     else{
         alert("No Match!");
     }
 </script>
 </body>
 </html>

EXPLANATION

  1. The variable is assigned a regular expression containing the beginning of line anchor metacharacter, the caret, followed by Will.

  2. The variable textString is assigned user input; in this example, Willie Wonker was entered.

  3. The regular expression test() method will return true since the string Willie Wonker begins with Will. See Figure 13.26.

    Figure 13.26. The user entered Willie Wonker. Will is at the beginning of the line, so this tests true (top); if the user enters I know Willie, Will is not at the beginning of the line, so the input tests false (bottom).

    graphics/13fig26.jpg

Example 13.27
 <html><head><title>Beginning of Line Anchor</title>
 </head>
 <body>
 <script language="JavaScript">
     var reg_expression = /^[JK]/;                       // (1)
     var textString=prompt("Type a string of text","");  // (2)
     var result=reg_expression.test(textString);         // (3) Returns true or false
     document.write("<font size='+1'><b>"+result+"<br>");
     if (result){
         document.write("<b>The regular expression /^[JK]/ matched the string \"" +
             textString + "\".<br>");
     }
     else{
         alert("No Match!");
     }
 </script>
 </body>
 </html>

EXPLANATION

  1. The regular expression contains a beginning of line anchor, the caret. The regular expression reads: Find either an uppercase J or an uppercase K at the beginning of the line or string.

  2. The variable textString is assigned user input; in this example, Jack and Jill .

  3. The regular expression test() method will return true because the string Jack and Jill begins with an uppercase J. See Figure 13.27.

    Figure 13.27. The string must begin with either a J or K . The user entered Jack and Jill (top) and this returns true ; the user entered Karen Evich (bottom) and this also returns true .

    graphics/13fig27.gif

Example 13.28
    <html><head><title>End of Line Anchor</title>
    </head>
    <body>
    <script language="JavaScript">
1       var reg_expression = /50$/;
2       var textString=prompt("Type a string of text","");
3       var result=reg_expression.test(textString);  // Returns true or false
        document.write("<font size='+1'><b>"+result+"<br>");
        if (result){
            document.write("<b>The regular expression /50$/ matched " +
                "the string \""+ textString +"\".<br>");
        }
        else{
            alert("No Match!");
        }
    </script>
    </body>
    </html>

EXPLANATION

  1. The regular expression /50$/ is assigned to the variable. The pattern contains the dollar sign ( $ ) metacharacter, which acts as the end of line anchor only when it is the last character in the pattern. The expression reads: Find a 5 followed by a 0 at the end of the line or string.
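The behavior of the end of line anchor can be checked with a few fixed strings. This is a minimal sketch; the sample strings and variable names are made up for illustration. It also shows that, in JavaScript, $ without the m (multiline) flag matches only at the very end of the string, while with the m flag it also matches just before a newline:

```javascript
var endsWith50 = /50$/;       // no flags: $ matches only at the end of the string
var endsLineWith50 = /50$/m;  // m flag: $ also matches just before a newline

var r1 = endsWith50.test("The price is 50");          // true
var r2 = endsWith50.test("50 is the price");          // false
var r3 = endsWith50.test("Total: 50\nThank you");     // false
var r4 = endsLineWith50.test("Total: 50\nThank you"); // true

console.log(r1, r2, r3, r4);
```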

Example 13.29
    <html><head><title>Anchors</title>
    </head>
    <body>
    <script language="JavaScript">
1       var reg_expression = /^[A-Z][a-z]+\s\d$/;
        // At the beginning of the string, find one uppercase
        // letter, followed by one or more lowercase letters,
        // a space, and one digit.
2       var string=prompt("Enter a name and a number","");
3       if (reg_expression.test(string)){
            alert("It Matched!!");
        }
        else{
            alert("No Match!");
        }
    </script>
    </body>
    </html>

EXPLANATION

  1. The regular expression reads: Look at the beginning of the line, ^ , find an uppercase letter, [A-Z] , followed by one or more lowercase letters, [a-z]+ , a single whitespace character, \s , and a digit at the end of the line, \d$ .

  2. The user is prompted for input.

  3. The regular expression test() method tests to see if there was a match and returns true if so, and false if not. See Figures 13.28 and 13.29.

    Figure 13.28. The string begins with a capital letter, followed by one or more lowercase letters, a space, and ends with one digit (left); the input sequence matched, so this message is displayed (right).

    graphics/13fig28.jpg

    Figure 13.29. The regular expression does not match because the string ends in more than one digit (left); the input sequence did not match, so this message is displayed (right).

    graphics/13fig29.jpg
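Because the pattern is anchored at both ends, the entire string has to fit it. The sketch below runs the expression from Example 13.29 against a few sample inputs; the inputs are invented for illustration and are not taken from the figures:

```javascript
var reg_expression = /^[A-Z][a-z]+\s\d$/;

// Hypothetical sample inputs
var samples = ["Tom 5", "Tom 55", "tom 5", "Tom  5"];

var results = samples.map(function (s) {
    return reg_expression.test(s);
});

console.log(results);  // [ true, false, false, false ]
```

Only the first string matches: the second ends in two digits, the third starts with a lowercase letter, and the fourth has two spaces where the pattern allows one whitespace character.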

Example 13.30
    <html><head><title>The Word Boundary</title>
    </head>
    <body>
    <script language="JavaScript">
        // Anchoring a word with \b
1       var reg_expression = /\blove\b/;
        var textString=prompt("Type a string of text","");
2       var result=reg_expression.test(textString);  // Returns true or false
        document.write("<font size='+1'><b>"+result+"<br>");
        if (result){
            // Inside a string literal the backslash must be doubled;
            // "\b" by itself would be a backspace character.
            document.write("<b>The regular expression /\\blove\\b/ " +
                "matched the string \""+ textString +"\".<br>");
        }
        else{
            alert("No Match!");
        }
    </script>
    </body>
    </html>

EXPLANATION

  1. The regular expression contains the \b metacharacter, representing a word boundary, not a specific character. The expression reads: Find a word beginning and ending with love . This means that gloves, lover, clover, and so on, will not be found.

  2. The regular expression test() method will return true since the string love is within word boundary anchors \b . See Figure 13.30.

Figure 13.30. The user entered I love you! . The word love is between word boundaries (\b). The match was successful.

graphics/13fig30.jpg
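The effect of \b, and of its opposite \B, can be seen by testing the words mentioned in the explanation. This is a minimal sketch; the variable names are chosen for illustration only:

```javascript
var wholeWord = /\blove\b/;   // love with a word boundary on each side
var insideWord = /\Blove\B/;  // love with NO word boundary on either side

var w1 = wholeWord.test("I love you!");   // true
var w2 = wholeWord.test("gloves");        // false
var w3 = wholeWord.test("lover");         // false
var w4 = wholeWord.test("clover");        // false

var b1 = insideWord.test("gloves");       // true
var b2 = insideWord.test("I love you!");  // false

console.log(w1, w2, w3, w4, b1, b2);
```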

13.3.6 Alternation

Alternation allows the regular expression to contain alternative patterns to be matched; for example, the regular expression /John|Karen|Steve/ will match a line containing John or Karen or Steve. If Karen, John, and Steve are all on different lines, all of those lines are matched. Each of the alternative expressions is separated by a vertical bar (the pipe symbol, | ), and the expressions can consist of any number of characters, unlike the character class, which matches only one character; thus, /a|b|c/ is the same as [abc] , whereas /ab|de/ cannot be represented as [abde] . The pattern /ab|de/ matches either ab or de , whereas the class [abde] represents only one character in the set a, b, d, or e.
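The difference between alternation and a character class can be sketched with the String object's match() method. The test strings below are invented for illustration:

```javascript
var alternation = /ab|de/;  // the two-character sequence ab OR the sequence de
var charClass = /[abde]/;   // any ONE of the characters a, b, d, e

var m1 = "xabx".match(alternation);  // matches "ab"
var m2 = "xdex".match(alternation);  // matches "de"
var m3 = "xadx".match(alternation);  // null: neither ab nor de is present
var m4 = "xadx".match(charClass);    // matches the single character "a"

console.log(m1[0], m2[0], m3, m4[0]);
```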

Example 13.31
    <html><head><title>Alternation</title>
    </head>
    <body>
    <script language="JavaScript">
        // Alternation: this or that or whatever...
1       var reg_expression = /Steve|Dan|Tom/;
        var textString=prompt("Type a string of text","");
2       var result=reg_expression.test(textString);  // Returns true or false
        document.write("<font size='+1'><b>"+result+"<br>");
        if (result){
            document.write("<b>The regular expression /Steve|Dan|Tom/ " +
                "matched the string \""+ textString +"\".<br>");
        }
        else{
            alert("No Match!");
        }
    </script></body></html>

EXPLANATION

  1. The pipe symbol, | , is used in the regular expression to match on a set of alternative patterns. If any of the patterns, Steve, Dan, or Tom, is found, the match is successful.

  2. The test() method will return true if the user enters a string containing Steve, Dan, or Tom. See Figure 13.31.

    Figure 13.31. The user entered Do you know Tommy? . Pattern Tom was matched in the string.

    graphics/13fig31.jpg
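As Figure 13.31 shows, the alternatives match anywhere in the string, so Tom is found inside Tommy. If whole names are wanted, one possible approach (an assumption, not part of the original example) is to combine alternation with the \b word boundary and the parentheses described in the next section:

```javascript
var anyName = /Steve|Dan|Tom/;            // matches the names even inside longer words
var wholeName = /\b(Steve|Dan|Tom)\b/;    // matches the names only as whole words

var a1 = anyName.test("Do you know Tommy?");    // true
var a2 = wholeName.test("Do you know Tommy?");  // false
var a3 = wholeName.test("Do you know Tom?");    // true

console.log(a1, a2, a3);
```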

Grouping or Clustering

If the regular expression pattern is enclosed in parentheses, a subpattern is created. Then, for example, instead of the greedy metacharacters matching on zero, one, or more of the previous single characters, they can match on the previous subpattern. Alternation can also be controlled if the patterns are enclosed in parentheses. This process of grouping characters together is also called clustering .

Example 13.32
    <html><head><title>Grouping or Clustering</title>
    </head>
    <body>
    <script language="JavaScript">
        // Grouping with parentheses
1       var reg_expression = /^(Sam|Dan|Tom) Robbins/;
2       var textString=prompt("Type a string of text","");
3       var result=reg_expression.test(textString);  // Returns true or false
        document.write("<font size='+1'><b>"+result+"<br>");
        if (result){
            document.write("<b>The regular expression /^(Sam|Dan|Tom) " +
                "Robbins/ matched the string \""+ textString +"\".<br>");
        }
        else{
            alert("No Match!");
        }
    </script>
    </body>
    </html>

EXPLANATION

  1. By enclosing Sam, Dan, and Tom in parentheses, the alternatives become Sam Robbins, Dan Robbins, or Tom Robbins. Without the parentheses, the regular expression would match Sam at the beginning of the line, or Dan anywhere, or Tom Robbins anywhere. With the parentheses, the caret metacharacter ^ anchors all of the alternatives to the beginning of the line.

  2. The user input is assigned to the variable called textString .

  3. The test() method checks to see if the string begins with one of the alternatives: Sam Robbins, Dan Robbins, or Tom Robbins. If it does, true is returned; otherwise, false is returned. See Figure 13.32.

    Figure 13.32. The user entered Dan Robbins as one of the alternatives. Sam Robbins or Tom Robbins would also be okay.

    graphics/13fig32.jpg
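The effect of the parentheses on both alternation and the greedy metacharacters can be sketched as follows. The ungrouped pattern and the (ab)+ patterns are added here for comparison only, and the test strings are invented:

```javascript
var grouped = /^(Sam|Dan|Tom) Robbins/;  // the caret and Robbins apply to every alternative
var ungrouped = /^Sam|Dan|Tom Robbins/;  // three separate alternatives

var g1 = grouped.test("Dan Robbins");           // true
var g2 = grouped.test("Dan Savage");            // false
var u1 = ungrouped.test("Dan Savage");          // true: the alternative Dan matches by itself
var g3 = grouped.test("I know Tom Robbins");    // false: not at the beginning
var u2 = ungrouped.test("I know Tom Robbins");  // true: Tom Robbins is not anchored

// A quantifier after parentheses applies to the whole subpattern.
var repeatedGroup = /^(ab)+$/;  // one or more repetitions of ab
var repeatedChar = /^ab+$/;     // an a followed by one or more b characters

var q1 = repeatedGroup.test("ababab");  // true
var q2 = repeatedChar.test("ababab");   // false
var q3 = repeatedChar.test("abbb");     // true

console.log(g1, g2, u1, g3, u2, q1, q2, q3);
```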

Remembering or Capturing

If the regular expression pattern is enclosed in parentheses, a subpattern is created. The subpattern is saved in special numbered class properties, starting with $1, then $2, and so on. These properties belong to the RegExp object itself, not to an instance of the object. They can be used later in the program and will persist until another successful pattern match occurs, at which time they are replaced with the new values. Even if the intention was to control the greedy metacharacter or the behavior of alternation as shown in the previous example, the subpatterns are saved as a side effect. [2] For more information on this go to http://developer.netscape.com/docs/manuals/communicator/jsguide/reobjud.hmt#1007373.

[2] It is possible to prevent a subpattern from being saved.
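One way to group without saving the subpattern is the non-capturing form (?: ), which JavaScript regular expressions support. The sketch below reuses the names from Example 13.32; the second set of parentheses around Robbins is added only to show which subpatterns end up in the result:

```javascript
// Capturing parentheses: both subpatterns are saved.
var capturing = "Dan Robbins".match(/^(Sam|Dan|Tom) (Robbins)/);

// (?: ) groups the alternatives but does not save them.
var nonCapturing = "Dan Robbins".match(/^(?:Sam|Dan|Tom) (Robbins)/);

console.log(capturing.length, capturing[1], capturing[2]);  // 3 "Dan" "Robbins"
console.log(nonCapturing.length, nonCapturing[1]);          // 2 "Robbins"
```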

Example 13.33
    <html><head><title>Capturing</title>
    </head>
    <body>
    <script language="JavaScript">
1       var textString = "Everyone likes William Rogers and his friends.";
2       var reg_expression = /(William)\s(Rogers)/;
3       var myArray=textString.match(reg_expression);
4       document.write(myArray);  // Three element array
5       document.write(RegExp.$1 + " "+RegExp.$2);
        // alert(myArray[1] + " "+ myArray[2]);
        // match and exec create an array consisting of the matched string and
        // the captured patterns. myArray[0] is "William Rogers",
        // myArray[1] is "William", and myArray[2] is "Rogers".
    </script>
    </body>
    </html>

EXPLANATION

  1. The string called textString is created.

  2. The regular expression contains two subpatterns, William and Rogers , both enclosed in parentheses.

  3. When either the String object's match() method or the RegExp object's exec() method is applied with a regular expression containing subpatterns, an array is returned, where the first element of the array is the text matched by the whole regular expression, and the next elements are the values of the subpatterns.

  4. The array elements are displayed, separated by commas.

  5. The subpatterns are class properties of the RegExp object. $1 represents the first captured subpattern, William , and $2 represents the second captured subpattern, Rogers . See Figure 13.33.

    Figure 13.33. Output from Example 13.33.

    graphics/13fig33.jpg
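The captured subpatterns can also be read directly from the array returned by match() or exec(), as the commented-out alert() in Example 13.33 suggests. This is a minimal sketch of that approach; current JavaScript references treat the static RegExp.$1 properties as a legacy feature, so the returned array is generally the safer place to look:

```javascript
var textString = "Everyone likes William Rogers and his friends.";
var reg_expression = /(William)\s(Rogers)/;

var fromMatch = textString.match(reg_expression);  // String method
var fromExec = reg_expression.exec(textString);    // RegExp method

// Element 0 is the whole match; elements 1 and 2 are the subpatterns.
console.log(fromMatch[0]);                       // William Rogers
console.log(fromMatch[1] + " " + fromMatch[2]);  // William Rogers
console.log(fromExec[2] + ", " + fromExec[1]);   // Rogers, William
```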

Example 13.34
    <html>
    <head><title>Capture and Replace</title>
    <font size="+1"><font face="helvetica">
    <script language = "JavaScript">
1       var string="Tommy Savage:203-123-4444:12 Main St.";
2       var newString=string.replace(/(Tommy) (Savage)/, "$2, $1");
3       document.write(newString +"<br>");
    </script>
    </head><body></body>
    </html>

EXPLANATION

  1. A string is assigned to the variable, called string .

  2. The replace() method will search for the pattern Tommy Savage. Since the search side of the replace() method contains the pattern Tommy enclosed in parentheses and the pattern Savage enclosed in parentheses, these subpatterns will be stored in $1 and $2, respectively. A third subpattern would be stored in $3, a fourth in $4, and so on. On the replacement side of the replace() method, $2 and $1 are substituted into the string, so that Savage is first, then a comma, and then Tommy. The first and last names have been reversed.

  3. The new string is displayed.

Figure 13.34. Output from Example 13.34.

graphics/13fig34.jpg

Example 13.35
    <html>
    <head>
    <title>Capture and Replace</title></head>
    <body>
    <font size="+1">
    <font face="helvetica">
    <script language = "JavaScript">
1       var string="Tommy Savage:203-123-4444:12 Main St.";
2       var newString=string.replace(/(\w+)\s(\w+)/, "$2, $1");
3       document.write(newString +"<br>");
    </script>
    </body>
    </html>

EXPLANATION

  1. A string is created to be used by the replace() method in step 2.

  2. The replace() method searches for one or more alphanumeric word characters, followed by a single space, and another set of alphanumeric word characters. The word characters are enclosed in parentheses, and thus captured. $1 will contain Tommy , and $2 will contain Savage . On the replacement side, $1 and $2 are reversed. After the replacement is made, a new string is created.

  3. The value of newString shows that the capturing and the substitution occurred successfully, leaving the remainder of the string as it was. See Figure 13.35.

    Figure 13.35. Subpatterns are used in string replacement.

    graphics/13fig35.jpg
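The replacement side of replace() can also be a function, which receives the whole match followed by each captured subpattern. The sketch below shows both forms on the string from Example 13.35; converting the last name to uppercase is an invented variation to show what the function form makes possible:

```javascript
var string = "Tommy Savage:203-123-4444:12 Main St.";

// Replacement string: $1 and $2 refer to the captured subpatterns.
var swapped = string.replace(/(\w+)\s(\w+)/, "$2, $1");

// Replacement function: the captures arrive as arguments.
var viaFunction = string.replace(/(\w+)\s(\w+)/, function (match, first, last) {
    return last.toUpperCase() + ", " + first;
});

console.log(swapped);      // Savage, Tommy:203-123-4444:12 Main St.
console.log(viaFunction);  // SAVAGE, Tommy:203-123-4444:12 Main St.
```

In both cases only the first match is replaced because the pattern has no g flag, so the rest of the record is left as it was.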



JavaScript by Example
JavaScript by Example (2nd Edition)
ISBN: 0137054890
EAN: 2147483647
Year: 2003
Pages: 150
Authors: Ellie Quigley
