13.3 Getting Control: The Metacharacters
Regular expression metacharacters are characters that do not represent themselves. They are endowed with special powers that let you control the search pattern in some way (e.g., find the pattern only at the beginning of the line, or at the end of the line, or only if it starts with an upper- or lowercase letter). A metacharacter loses its special meaning if it is preceded by a backslash. For example, the dot metacharacter represents any single character, but when preceded by a backslash it is just a dot or period.
A backslash preceding a metacharacter turns off the special meaning of the metacharacter, whereas a backslash preceding an alphanumeric character in a regular expression creates a metasymbol. A metasymbol provides a simpler way to represent some of the regular expression metacharacters. For example, [0-9] represents a digit in the range between 0 and 9, and the metasymbol \d represents the same thing. [0-9] uses the bracketed character class, whereas \d is a metasymbol (see Table 13.6).
Example 13.10
/^a...c/
EXPLANATION
This regular expression contains metacharacters (see Table 13.6). The first one is a caret (^). The caret metacharacter matches a pattern only if it is at the beginning of the line. The period (.) matches any single character, including whitespace (but not a newline). This expression contains three periods, representing any three characters. To find a literal period, or any other character that does not represent itself, the character must be preceded by a backslash to prevent interpretation. The expression reads: Search at the beginning of the line for an a, followed by any three single characters, followed by a c. It will match, for example, abbbc, a123c, aAx3c, or an a and a c separated by three spaces, but only if those patterns are found at the beginning of the line.
Table 13.6. Metacharacters and metasymbols.
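The effect of the backslash can be checked directly with the test() method. The following is a minimal sketch, not one of the book's listings; the sample strings and variable names are hypothetical and chosen only for illustration.

```javascript
// Hypothetical sample strings, chosen only for illustration.
const startsWithAThenC = /^a...c/; // a, any three characters, c, at the start
const literalDot = /3\.14/;        // the backslash makes the dot a plain period
const anyCharDot = /3.14/;         // an unescaped dot matches any single character

const metacharResults = [
  startsWithAThenC.test("a123c"),  // true: a, three characters, c at the start
  startsWithAThenC.test("xa123c"), // false: the pattern is not at the beginning
  literalDot.test("3.14"),         // true: a literal period is present
  literalDot.test("3x14"),         // false: x is not a literal period
  anyCharDot.test("3x14"),         // true: the dot stands for the x
];
console.log(metacharResults);
```

The only difference between the last two patterns is the backslash, which is what switches the dot between "literal period" and "any character".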
If you are searching for a particular character within a regular expression, you can use the dot metacharacter to represent a single character, or a character class that matches on one character from a set of characters. In addition to the dot and character class, JavaScript has added some backslashed symbols (called metasymbols) to represent single characters. See Table 13.7 for the single-character metacharacters, and Table 13.8 for a list of metasymbols.
Table 13.7. Single-character and single-digit metacharacters.
13.3.1 The Dot Metacharacter
The dot metacharacter matches any single character with the exception of the newline character. For example, the regular expression /a.b/ is matched if the string contains an a, followed by any one single character (except the \n), followed by b, whereas the expression /.../ matches any string containing at least three characters.
Example 13.11
<html>
<head><title>The dot Metacharacter</title></head>
<body>
<script language="JavaScript">
  var textString="Norma Jean";
  var reg_expression = /N..ma/;
  var result=reg_expression.test(textString); // Returns true or false
  document.write("<font size='+1'><b>"+result+"<br>");
  if ( reg_expression.test(textString)) { // Same as: if ( result)
    document.write("<b>The reg_ex /N..ma/ matched the string\""+ textString +"\".<br>");
  }
  else{
    document.write("No Match!");
  }
</script>
</body>
</html>
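The newline exception is easy to overlook, so here is a small sketch that probes it. The test strings and the dotResults name are hypothetical; only the patterns /a.b/ and /.../ come from the text above.

```javascript
// /a.b/ needs exactly one character (other than a newline) between a and b.
const dotPattern = /a.b/;

const dotResults = {
  dash: dotPattern.test("a-b"),        // true: "-" is the single character
  space: dotPattern.test("a b"),       // true: a space counts as a character
  newline: dotPattern.test("a\nb"),    // false: the dot does not match \n
  adjacent: dotPattern.test("ab"),     // false: nothing between a and b
  twoChars: /.../.test("xy"),          // false: fewer than three characters
  threeChars: /.../.test("xyz"),       // true: at least three characters
};
console.log(dotResults);
```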
13.3.2 The Character Class
A character class represents one character from a set of characters. For example, [abc] matches either an a, b, or c; [a-z] matches one character in the range from a to z; and [0-9] matches one digit in the range from 0 to 9. If the character class contains a leading caret, ^, then the class represents any one character not in the set; thus, [^a-zA-Z] matches a single character not in the range from a to z or A to Z, and [^0-9] matches a single character that is not a digit between 0 and 9. JavaScript provides additional symbols, called metasymbols, to represent a character class. The symbols \d and \D represent a single digit and a single non-digit, respectively, the same as [0-9] and [^0-9]; whereas \w and \W represent a single word character and a single non-word character, respectively, the same as [A-Za-z_0-9] and [^A-Za-z_0-9].
Example 13.12
<html>
<head><title>The Character Class</title></head>
<body>
<script language="JavaScript">
  var reg_expression = /[A-Z][a-z]eve/;
  var textString=prompt("Type a string of text","");
  var result=reg_expression.test(textString); // Returns true or false
  document.write("<font size='+1'><b>"+result+"<br>");
  if (result){
    document.write("<b>The reg_ex /[A-Z][a-z]eve/ matched the string\""+ textString +"\".<br>");
  }
  else{
    alert("No Match!");
  }
</script>
</body>
</html>
Example 13.13
<html>
<head><title>The Character Class</title></head>
<body>
<script language="JavaScript">
  // Character class
  var reg_expression = /[A-Za-z0-9_]/; // A single alphanumeric word character
  var textString=prompt("Type a string of text","");
  var result=reg_expression.test(textString); // Returns true or false
  document.write("<font size='+1'><b>"+result+"<br>");
  if (result){
    document.write("<b>The reg_ex /[A-Za-z0-9_]/ matched the string\""+ textString +"\".<br>");
  }
  else{
    alert("No Match!");
  }
</script>
</body>
</html>
Example 13.14
<html>
<head><title>The Character Class and Negation</title></head>
<body>
<script language="JavaScript">
  // Negation within a Character Class
  var reg_expression = /[^0-9]/;
  var textString=prompt("Type a string of text","");
  var result=reg_expression.test(textString); // Returns true or false
  document.write("<font size='+1'><b>"+result+"<br>");
  if (result){
    document.write("<b>The reg_ex /[^0-9]/ matched the string\""+ textString +"\".<br>");
  }
  else{
    alert("No Match!");
  }
</script>
</body>
</html>
Figure 13.13. The user entered abc. It contains a character that is not in the range between 0 and 9.
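The three character-class patterns from Examples 13.12 through 13.14 can be exercised without a prompt box. This is a sketch only; the input strings and the classResults name are hypothetical stand-ins for what a user might type.

```javascript
// Patterns taken from Examples 13.12 to 13.14; inputs are hypothetical.
const upperLowerEve = /[A-Z][a-z]eve/; // one uppercase, one lowercase, then "eve"
const wordCharacter = /[A-Za-z0-9_]/;  // any single word character
const nonDigit = /[^0-9]/;             // any single character that is not a digit

const classResults = {
  steve: upperLowerEve.test("Steve"),       // true: S, t, then eve
  lowerSteve: upperLowerEve.test("steve"),  // false: no uppercase letter
  punctuation: wordCharacter.test("!!!"),   // false: no word character at all
  underscore: wordCharacter.test("_"),      // true: underscore is in the class
  letters: nonDigit.test("abc"),            // true: a is not a digit
  allDigits: nonDigit.test("2024"),         // false: every character is a digit
  mixed: nonDigit.test("20x4"),             // true: the x is not a digit
};
console.log(classResults);
```

Note that a negated class still has to match one character: /[^0-9]/ succeeds as soon as any non-digit appears anywhere in the string.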
13.3.3 Metasymbols
Metasymbols offer an alternative way to represent a character class. For example, instead of representing a number as [0-9], it can be represented as \d, and the alternative for representing a non-number [^0-9] is \D. Metasymbols are easier to use and to type than metacharacters.
Table 13.8. Metasymbols.
Example 13.15
<html>
<head><title>The Digit Meta Symbol</title></head>
<body>
<script language="JavaScript">
  var reg_expression = /6\d\d/;
  var textString=prompt("Type a string of text","");
  var result=reg_expression.test(textString); // Returns true or false
  document.write("<font size='+1'><b>"+result+"<br>");
  if (result){
    // Backslashes are doubled inside the string so that \d is displayed literally
    document.write("<b>The regular expression /6\\d\\d/ matched the string\""+ textString +"\".<br>");
  }
  else{
    alert("No Match!");
  }
</script>
</body>
</html>
Example 13.16
<html>
<head><title>The Digit Meta Symbol Negated</title></head>
<body>
<script language="JavaScript">
  var reg_expression = /[a-z]\D\D/;
  var textString=prompt("Type a string of text","");
  var result=reg_expression.test(textString); // Returns true or false
  document.write("<font size='+1'><b>"+result+"<br>");
  if (result){
    // Backslashes are doubled inside the string so that \D is displayed literally
    document.write("<b>The regular expression /[a-z]\\D\\D/ matched the string\"" + textString +"\".<br>");
  }
  else{
    alert("No Match!");
  }
</script>
</body>
</html>
Example 13.17
<html>
<head><title>Word and Space Metasymbols</title></head>
<body>
<script language="JavaScript">
  var reg_expression = /\w\s\w\W/;
  var textString=prompt("Type a string of text","");
  var result=reg_expression.test(textString); // Returns true or false
  document.write("<font size='+1'><b>"+result+"<br>");
  if (result){
    // Backslashes are doubled inside the string so that the metasymbols are displayed literally
    document.write("<b>The regular expression /\\w\\s\\w\\W/ matched the string\""+ textString +"\".<br>");
  }
  else{
    alert("No Match!");
  }
</script>
</body>
</html>
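One way to convince yourself that a metasymbol and its bracketed class are interchangeable is to run both against the same inputs and compare the answers. The sketch below does that for a handful of hypothetical sample strings; it illustrates the equivalence described above and is not an exhaustive proof.

```javascript
// Hypothetical sample inputs; any strings could be used here.
const metasymbolSamples = ["642", "abc", "a b!", "_", " ", "x9"];

// Each metasymbol should agree with its bracketed equivalent on every sample.
const digitAgrees = metasymbolSamples.every(
  (s) => /\d/.test(s) === /[0-9]/.test(s)
);
const nonDigitAgrees = metasymbolSamples.every(
  (s) => /\D/.test(s) === /[^0-9]/.test(s)
);
const wordAgrees = metasymbolSamples.every(
  (s) => /\w/.test(s) === /[A-Za-z0-9_]/.test(s)
);
console.log(digitAgrees, nonDigitAgrees, wordAgrees);

// The pattern from Example 13.17: word char, whitespace, word char, non-word char.
const wordSpaceWordNonWord = /\w\s\w\W/;
const matchesBang = wordSpaceWordNonWord.test("a b!"); // true: a, space, b, !
const matchesWord = wordSpaceWordNonWord.test("a bc"); // false: c is a word character
console.log(matchesBang, matchesWord);
```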
13.3.4 Metacharacters to Repeat Pattern Matches
In the previous examples, the metacharacter matched on a single character. What if you want to match on more than one character? For example, let's say you are looking for all lines containing names, where the first letter must be in uppercase, which can be represented as [A-Z], but the following letters are lowercase and the number of letters varies in each name. [a-z] matches on a single lowercase letter. How can you match on one or more lowercase letters, or zero or more lowercase letters? To do this you can use what are called quantifiers. To match on one or more lowercase letters, the regular expression can be written /[a-z]+/ where the + sign means "one or more of the previous character"; in this case, one or more lowercase letters. JavaScript provides a number of quantifiers, as shown in Table 13.9.
Table 13.9. Quantifiers: The greedy metacharacters.
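The name-matching problem described above can be sketched as follows. The ^ and $ anchors (covered in Section 13.3.5) are added here only so that the whole string must fit the pattern; the sample names and variable names are hypothetical.

```javascript
// + means one or more, * means zero or more, ? means zero or one.
const oneOrMoreLower = /^[A-Z][a-z]+$/;  // uppercase, then at least one lowercase
const zeroOrMoreLower = /^[A-Z][a-z]*$/; // uppercase, then any number of lowercase
const optionalPeriod = /^\d\.?\d$/;      // digit, optional period, digit

const quantifierResults = {
  norma: oneOrMoreLower.test("Norma"),     // true
  singleN: oneOrMoreLower.test("N"),       // false: + needs at least one lowercase
  singleNStar: zeroOrMoreLower.test("N"),  // true: * accepts zero lowercase letters
  allCaps: zeroOrMoreLower.test("NORMA"),  // false: ORMA are not lowercase
  noPeriod: optionalPeriod.test("35"),     // true: the period is optional
  onePeriod: optionalPeriod.test("3.5"),   // true
  twoPeriods: optionalPeriod.test("3..5"), // false: ? allows at most one period
};
console.log(quantifierResults);
```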
The Greed Factor
Normally quantifiers are "greedy"; that is, they match on the largest possible set of characters starting at the left-hand side of the string and searching to the right, looking for the last possible character that would satisfy the condition. For example, given the string:
var string="ab123456783445554437AB";
and the regular expression:
/ab[0-9]*/
if the replace() method were to substitute what is matched with an "X":
string=string.replace(/ab[0-9]*/, "X");
the resulting string would be:
"XAB"
The asterisk is a greedy metacharacter. It matches zero or more of the preceding character. In other words, it attaches itself to the character preceding it; in the above example, the asterisk attaches itself to the character class [0-9]. The matching starts on the left, searching for ab followed by zero or more digits in the range between 0 and 9. It is called greedy because the matching continues until the last digit is found; in this example, the number 7. The pattern ab and all of the digits that follow it are replaced with a single X. Greediness can be turned off so that instead of matching on the maximum number of characters, the match is made on the minimal number of characters found. This is done by appending a question mark after the greedy metacharacter. See Example 13.25.
Example 13.18
<html>
<head><title></title></head>
<body>
<script language="JavaScript">
  var reg_expression = /\d\.?\d/;
  var textString=prompt("Type a string of text","");
  var result=reg_expression.test(textString); // Returns true or false
  document.write("<font size='+1'><b>"+result+"<br>");
  if (result){
    // Backslashes are doubled inside the string so that the pattern is displayed literally
    document.write("<b>The regular expression /\\d\\.?\\d/ matched the string\""+textString +"\".<br>");
  }
  else{
    alert("No Match!");
  }
</script>
</body>
</html>
Example 13.19
<html>
<head><title></title></head>
<body>
<script language="JavaScript">
  // Greediness
  var reg_expression = /[A-Z][a-z]*\s/;
  var textString=prompt("Type a string of text","");
  var result=reg_expression.test(textString); // Returns true or false
  document.write("<font size='+1'><b>"+result+"<br>");
  if (result){
    // Backslashes are doubled inside the string so that \s is displayed literally
    document.write("<b>The regular expression /[A-Z][a-z]*\\s/ matched the string\""+ textString +"\".<br>");
  }
  else{
    alert("No Match!");
  }
</script>
</body>
</html>
Example 13.20
<html>
<head><title></title></head>
<body>
<script language="JavaScript">
  var reg_expression = /[A-Z][a-z]+\s/;
  var textString=prompt("Type a string of text","");
  var result=reg_expression.test(textString); // Returns true or false
  document.write("<font size='+1'><b>"+result+"<br>");
  if (result){
    // Backslashes are doubled inside the string so that \s is displayed literally
    document.write("<b>The regular expression /[A-Z][a-z]+\\s/ matched the string\""+ textString +"\".<br>");
  }
  else{
    alert("No Match!");
  }
</script>
</body>
</html>
Example 13.21
<html>
<head><title></title></head>
<body>
<script language="JavaScript">
  var reg_expression = /abc\d{1,3}\.\d/;
  var textString=prompt("Type a string of text","");
  var result=reg_expression.test(textString); // Returns true or false
  document.write("<font size='+1'><b>"+result+"<br>");
  if (result){
    // Backslashes are doubled inside the string so that the pattern is displayed literally
    document.write("<b>The regular expression /abc\\d{1,3}\\.\\d/ matched the string\""+ textString +"\".<br>");
  }
  else{
    alert("No Match!");
  }
</script>
</body>
</html>
Example 13.22
<html>
<head><title></title></head>
<body>
<script language="JavaScript">
  // Repeating patterns
  var reg_expression = /#\d{5}\.\d/;
  var textString=prompt("Type a string of text","");
  var result=reg_expression.test(textString); // Returns true or false
  document.write("<font size='+1'><b>"+result+"<br>");
  if (result){
    // Backslashes are doubled inside the string so that the pattern is displayed literally
    document.write("<b>The regular expression /#\\d{5}\\.\\d/ matched the string \""+ textString +"\".<br>");
  }
  else{
    alert("No Match!");
  }
</script>
</body>
</html>
Example 13.23
<html>
<head><title></title></head>
<body>
<script language="JavaScript">
  // Repeating patterns
  var reg_expression = /5{1,}\.\d/;
  var textString=prompt("Type a string of text","");
  var result=reg_expression.test(textString); // Returns true or false
  document.write("<font size='+1'><b>"+result+"<br>");
  if (result){
    // Backslashes are doubled inside the string so that the pattern is displayed literally
    document.write("<b>The regular expression /5{1,}\\.\\d/ matched the string\" "+ textString +"\".<br>");
  }
  else{
    alert("No Match!");
  }
</script>
</body>
</html>
Figure 13.23. The user entered abc5555555.2, which contains the number 5 at least one time, followed by a literal period and any digit, \d (top). This returns true; the user entered 5.6 (bottom). This also returns true.
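The curly-brace quantifiers used in Examples 13.21 through 13.23 can be compared side by side. The patterns below are the ones from those examples; the input strings are hypothetical and were picked to show one success and one failure for each form.

```javascript
// {n} means exactly n, {n,m} means from n to m, {n,} means n or more.
const oneToThreeDigits = /abc\d{1,3}\.\d/; // from Example 13.21
const exactlyFiveDigits = /#\d{5}\.\d/;    // from Example 13.22
const atLeastOneFive = /5{1,}\.\d/;        // from Example 13.23

const braceResults = {
  threeDigits: oneToThreeDigits.test("abc123.4"),  // true: three digits fit {1,3}
  noDigits: oneToThreeDigits.test("abc.4"),        // false: {1,3} needs at least one
  fourDigits: oneToThreeDigits.test("abc1234.5"),  // false: the period must follow at most 3 digits
  fiveDigits: exactlyFiveDigits.test("#12345.6"),  // true
  fourOnly: exactlyFiveDigits.test("#1234.6"),     // false: only four digits
  manyFives: atLeastOneFive.test("abc5555555.2"),  // true
  oneFive: atLeastOneFive.test("5.6"),             // true
  noFive: atLeastOneFive.test("4.6"),              // false: no 5 before the period
};
console.log(braceResults);
```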
Metacharacters That Turn off Greediness
By placing a question mark after a greedy quantifier, the greed is turned off: the quantifier matches as few characters as possible, so the match ends at the first point that satisfies the pattern rather than the last one.
Example 13.24
<html>
<head><title>Greed</title></head>
<body bgcolor=lightblue>
<script language="JavaScript">
  var myString="abcdefghijklmnopqrstuvwxyz";
  document.write("<font size='+1'>Old string:<b> "+myString+"<br>");
  myString=myString.replace(/[a-z]+/, "XXX");
  document.write("</b>New string:<b> "+ myString+"<br>");
</script>
</body>
</html>
Example 13.25
<html>
<head><title></title></head>
<body>
<script language="JavaScript">
  var myString="abcdefghijklmnopqrstuvwxyz";
  document.write("<font size='+1'>Old string: <b>" +myString+"<br>");
  myString= myString.replace(/[a-z]+?/, "XXX" );
  document.write("</b>New string: <b>"+myString+"<br>");
</script>
</body>
</html>
Figure 13.25. This is not greedy. Output from Example 13.25.
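The greedy and non-greedy substitutions can be placed next to each other to make the difference visible. This sketch reuses the strings from the text and from Examples 13.24 and 13.25; the variable names are hypothetical.

```javascript
const alphabet = "abcdefghijklmnopqrstuvwxyz";
const digitsString = "ab123456783445554437AB";

// Greedy: + and * take as many characters as they can.
const greedyAlphabet = alphabet.replace(/[a-z]+/, "XXX");
const greedyDigits = digitsString.replace(/ab[0-9]*/, "X");

// Non-greedy: +? and *? take as few characters as they can.
const lazyAlphabet = alphabet.replace(/[a-z]+?/, "XXX");
const lazyDigits = digitsString.replace(/ab[0-9]*?/, "X");

console.log(greedyAlphabet); // XXX
console.log(lazyAlphabet);   // XXXbcdefghijklmnopqrstuvwxyz
console.log(greedyDigits);   // XAB
console.log(lazyDigits);     // X123456783445554437AB
```

In the non-greedy case, +? settles for the single letter a, and *? settles for zero digits, so only ab is replaced.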
13.3.5 Anchoring Metacharacters
Often it is necessary to anchor a metacharacter down, so that it matches only if the pattern is found at the beginning or end of a line, word, or string. These metacharacters are based on a position just to the left or to the right of the character that is being matched. Anchors are technically called zero-width assertions because they correspond to positions, not actual characters in a string; for example, /^abc/ will search for abc at the beginning of the line, where the ^ represents a position, not an actual character. See Table 13.10 for a list of anchoring metacharacters.
Table 13.10. Anchors (assertions).
Example 13.26
<html>
<head><title></title></head>
<body>
<script language="JavaScript">
  var reg_expression = /^Will/; // Beginning of line anchor
  var textString=prompt("Type a string of text","");
  var result=reg_expression.test(textString); // Returns true or false
  document.write("<font size='+1'><b>"+result+"<br>");
  if (result){
    document.write("<b>The regular expression /^Will/ matched the string\""+ textString +"\".<br>");
  }
  else{
    alert("No Match!");
  }
</script>
</body>
</html>
Example 13.27
<html>
<head><title>Beginning of Line Anchor</title></head>
<body>
<script language="JavaScript">
  var reg_expression = /^[JK]/;
  var textString=prompt("Type a string of text","");
  var result=reg_expression.test(textString); // Returns true or false
  document.write("<font size='+1'><b>"+result+"<br>");
  if (result){
    document.write("<b>The regular expression /^[JK]/ matched the string\""+ textString +"\".<br>");
  }
  else{
    alert("No Match!");
  }
</script>
</body>
</html>
Example 13.28
<html>
<head><title>End of Line Anchor</title></head>
<body>
<script language="JavaScript">
  var reg_expression = /50$/;
  var textString=prompt("Type a string of text","");
  var result=reg_expression.test(textString); // Returns true or false
  document.write("<font size='+1'><b>"+result+"<br>");
  if (result){
    document.write("<b>The regular expression /50$/ matched the string\""+ textString +"\".<br>");
  }
  else{
    alert("No Match!");
  }
</script>
</body>
</html>
Example 13.29
<html>
<head><title>Anchors</title></head>
<body>
<script language="JavaScript">
  var reg_expression = /^[A-Z][a-z]+\s\d$/;
  // At the beginning of the string, find one uppercase
  // letter, followed by one or more lowercase letters,
  // a space, and one digit.
  var string=prompt("Enter a name and a number","");
  if (reg_expression.test(string)){
    alert("It Matched!!");
  }
  else{
    alert("No Match!");
  }
</script>
</body>
</html>
Example 13.30
<html>
<head><title>The Word Boundary</title></head>
<body>
<script language="JavaScript">
  // Anchoring a word with \b
  var reg_expression = /\blove\b/;
  var textString=prompt("Type a string of text","");
  var result= reg_expression.test( textString); // Returns true or false
  document.write("<font size='+1'><b>"+result+"<br>");
  if (result){
    // Backslashes are doubled inside the string; a single \b would be a backspace character
    document.write("<b>The regular expression /\\blove\\b/ matched the string \""+ textString +"\".<br>");
  }
  else{
    alert("No Match!");
  }
</script>
</body>
</html>
Figure 13.30. The user entered I love you!. The word love is between word boundaries (\b). The match was successful.
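The anchors from Examples 13.26 through 13.30 can be checked against a small table of inputs. This is a sketch under the assumption that the inputs are single-line strings; the sample strings and the anchorChecks name are hypothetical.

```javascript
// Each row: [pattern, hypothetical input, whether the pattern should match].
const anchorChecks = [
  [/^Will/, "William Rogers", true],        // Will is at the beginning
  [/^Will/, "Good Will", false],            // Will is not at the beginning
  [/50$/, "The total is 50", true],         // 50 is at the end
  [/50$/, "50 cents", false],               // 50 is not at the end
  [/^[A-Z][a-z]+\s\d$/, "Tom 5", true],     // name, space, exactly one digit
  [/^[A-Z][a-z]+\s\d$/, "Tom 55", false],   // $ must follow the first digit
  [/\blove\b/, "I love you!", true],        // love stands alone as a word
  [/\blove\b/, "gloves", false],            // love is inside a longer word
];

for (const [pattern, text, expected] of anchorChecks) {
  console.log(pattern.source, JSON.stringify(text), pattern.test(text) === expected);
}
```

Because anchors match positions rather than characters, /^[A-Z][a-z]+\s\d$/ rejects "Tom 55" even though every character in it is allowed by the pattern.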
13.3.6 Alternation
Alternation allows the regular expression to contain alternative patterns to be matched; for example, the regular expression /John|Karen|Steve/ will match a line containing John or Karen or Steve. If Karen, John, or Steve are all on different lines, all lines are matched. Each of the alternative expressions is separated by a vertical bar (the pipe symbol, |) and the expressions can consist of any number of characters, unlike the character class that only matches one character; thus, /a|b|c/ is the same as [abc], whereas /ab|de/ cannot be represented as [abde]. The pattern /ab|de/ is either ab or de, whereas the class [abde] represents only one character in the set a, b, d, or e.
Example 13.31
<html>
<head><title>Alternation</title></head>
<body>
<script language="JavaScript">
  // Alternation: this or that or whatever...
  var reg_expression = /Steve|Dan|Tom/;
  var textString=prompt("Type a string of text","");
  var result=reg_expression.test(textString); // Returns true or false
  document.write("<font size='+1'><b>"+result+"<br>");
  if (result){
    document.write("<b>The regular expression /Steve|Dan|Tom/ matched the string\""+ textString +"\".<br>");
  }
  else{
    alert("No Match!");
  }
</script>
</body>
</html>
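The difference between alternation with the vertical bar and a bracketed character class can be sketched as follows. The input strings and variable names are hypothetical.

```javascript
const anyOfThreeNames = /Steve|Dan|Tom/; // Steve or Dan or Tom
const abOrDe = /ab|de/;                  // the pair ab or the pair de
const oneOfAbde = /[abde]/;              // a single a, b, d, or e

const alternationResults = {
  dan: anyOfThreeNames.test("Dan went home"), // true
  sam: anyOfThreeNames.test("Sam went home"), // false: none of the three names
  pairDe: abOrDe.test("xdey"),                // true: contains de
  loneA: abOrDe.test("a"),                    // false: a alone is neither ab nor de
  classA: oneOfAbde.test("a"),                // true: the class needs only one character
};
console.log(alternationResults);

// Single-character alternatives behave like a character class.
const sameAsClass = ["a", "b", "c", "x", ""].every(
  (s) => /a|b|c/.test(s) === /[abc]/.test(s)
);
console.log(sameAsClass);
```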
Grouping or Clustering
If the regular expression pattern is enclosed in parentheses, a subpattern is created. Then, for example, instead of the greedy metacharacters matching on zero, one, or more of the previous single characters, they can match on the previous subpattern. Alternation can also be controlled if the patterns are enclosed in parentheses. This process of grouping characters together is also called clustering.
Example 13.32
<html>
<head><title>Grouping or Clustering</title></head>
<body>
<script language="JavaScript">
  // Grouping with parentheses
  var reg_expression = /^(Sam|Dan|Tom) Robbins/;
  var textString=prompt("Type a string of text","");
  var result=reg_expression.test(textString); // Returns true or false
  document.write("<font size='+1'><b>"+result+"<br>");
  if (result){
    document.write("<b>The regular expression /^(Sam|Dan|Tom) Robbins/ matched the string\""+ textString +"\".<br>");
  }
  else{
    alert("No Match!");
  }
</script>
</body>
</html>
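To see what the parentheses buy you, compare a grouped pattern with its ungrouped counterpart. This is an illustrative sketch; the names Jones and the ha strings are hypothetical inputs not taken from the text.

```javascript
// With parentheses, the alternatives are limited to the first name.
const groupedNames = /^(Sam|Dan|Tom) Robbins/;
// Without them, the alternatives are "^Sam", "Dan", and "Tom Robbins".
const ungroupedNames = /^Sam|Dan|Tom Robbins/;

const groupingResults = {
  groupedRobbins: groupedNames.test("Dan Robbins"),  // true
  groupedJones: groupedNames.test("Dan Jones"),      // false: Robbins is required
  ungroupedJones: ungroupedNames.test("Dan Jones"),  // true: the alternative Dan matches alone
  // A quantifier after a group repeats the whole subpattern.
  repeatedGroup: /^(ha)+$/.test("hahaha"),           // true: ha repeated three times
  repeatedChar: /^ha+$/.test("hahaha"),              // false: + applies only to the a
  repeatedCharOk: /^ha+$/.test("haaa"),              // true: h followed by several a's
};
console.log(groupingResults);
```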
Remembering or Capturing
If the regular expression pattern is enclosed in parentheses, a subpattern is created. The subpattern is saved in special numbered class properties, starting with $1, then $2, and so on, which belong to the RegExp object itself, not to an instance of the object. These properties can be used later in the program and will persist until another successful pattern match occurs, at which time they will be cleared. Even if the intention was to control the greedy metacharacter or the behavior of alternation, as shown in the previous example, the subpatterns are saved as a side effect.
[2] For more information on this, go to http://developer.netscape.com/docs/manuals/communicator/jsguide/reobjud.hmt#1007373.
Example 13.33
<html>
<head><title>Capturing</title></head>
<body>
<script language="JavaScript">
  var textString = "Everyone likes William Rogers and his friends.";
  var reg_expression = /(William)\s(Rogers)/;
  var myArray=textString.match(reg_expression);
  document.write(myArray); // Three element array
  document.write(RegExp.$1 + " "+RegExp.$2);
  // alert(myArray[1] + " "+ myArray[2]);
  // match and exec create an array consisting of the string, and
  // the captured patterns. myArray[0] is "William Rogers"
  // myArray[1] is "William" myArray[2] is "Rogers".
</script>
</body>
</html>
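The array returned by match() can be inspected directly, which avoids relying on the RegExp.$1 and RegExp.$2 class properties (these are treated as a legacy feature in current JavaScript engines). The sketch below reuses the sentence and pattern from Example 13.33; the variable names are hypothetical.

```javascript
const sentence = "Everyone likes William Rogers and his friends.";
const captured = sentence.match(/(William)\s(Rogers)/);

console.log(captured[0]);     // William Rogers (the whole match)
console.log(captured[1]);     // William (first parenthesized subpattern)
console.log(captured[2]);     // Rogers (second parenthesized subpattern)
console.log(captured.length); // 3
console.log(captured.index);  // 15 (position where the match starts)

// When nothing matches, match() returns null instead of an array.
const noCapture = "No names here".match(/(William)\s(Rogers)/);
console.log(noCapture);       // null
```

Checking for null before indexing into the result is a sensible habit, since a failed match has no elements to read.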
Example 13.34
<html>
<head><title>Capture and Replace</title>
<font size="+1"><font face="helvetica">
<script language = "JavaScript">
  var string="Tommy Savage:203-123-4444:12 Main St.";
  // $1 and $2 refer to the first and second captured subpatterns
  var newString=string.replace(/(Tommy) (Savage)/, "$2, $1");
  document.write( newString +"<br>");
</script>
</head><body></body>
</html>
Figure 13.34. Output from Example 13.34.
Example 13.35
<html>
<head><title>Capture and Replace</title></head>
<body>
<font size="+1">
<font face="helvetica">
<script language = "JavaScript">
  var string="Tommy Savage:203-123-4444:12 Main St.";
  // $1 and $2 refer to the first and second captured subpatterns
  var newString=string.replace(/(\w+)\s(\w+)/, "$2, $1");
  document.write(newString +"<br>");
</script>
</body>
</html>
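Captured subpatterns can be referenced in the replacement string of replace() as $1, $2, and so on. The sketch below assumes the intent of the two examples above is to swap the first and last name; the record string comes from the examples, while the variable names and the uppercase variation are hypothetical.

```javascript
const record = "Tommy Savage:203-123-4444:12 Main St.";

// $1 is the first captured word, $2 is the second; swap them around a comma.
const swapped = record.replace(/(\w+)\s(\w+)/, "$2, $1");
console.log(swapped); // Savage, Tommy:203-123-4444:12 Main St.

// A function can be used instead of a string; it receives the whole match
// followed by each captured subpattern.
const shouted = record.replace(
  /(\w+)\s(\w+)/,
  (match, first, last) => last.toUpperCase() + ", " + first
);
console.log(shouted); // SAVAGE, Tommy:203-123-4444:12 Main St.
```

The function form is worth reaching for when the replacement needs more than reordering, such as changing case or looking a value up.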