13.3 Getting Control: The Metacharacters
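Regular expression metacharacters are characters that do not represent themselves; instead, they control how the pattern is matched.
If you see a backslash preceding a metacharacter, the backslash turns off the meaning of the metacharacter, but if you see a backslash preceding an alphanumeric character, as in \d or \w, the pair forms a metasymbol with a special meaning of its own.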
Example 13.10
/^a...c/
EXPLANATION
This regular expression contains metacharacters (see Table 13.6). The first one is a caret (^). The caret metacharacter matches a string only if it is at the beginning of the line. The period (.) matches any single character, including a space (but not a newline). This expression contains three periods, representing any three characters. To find a literal period, or any other character that does not represent itself, precede the character with a backslash to prevent interpretation. The expression reads: Search at the beginning of the line for an a, followed by any three single characters, followed by a c. It will match, for example, abbbc, a123c, a   c (an a, three spaces, and a c), aAx3c, and so on, but only if those patterns are found at the beginning of the line.
Table 13.6. Metacharacters and metasymbols.
If you are searching for a particular character within a regular expression, you can use the dot metacharacter to represent a single character, or a character class that matches on one character from a set of characters. In addition to the dot and character class, JavaScript has added some backslashed symbols (called metasymbols) to represent single characters. See Table 13.7.
Table 13.7. Single-character and single-digit metacharacters.
13.3.1 The Dot Metacharacter
The dot metacharacter matches any single character with the exception of the newline character. For example, the regular expression /a.b/ is matched if the string contains an a, followed by any one single character (except \n), followed by b, whereas the expression /.../ matches any string containing at least three consecutive characters other than newlines.
Example 13.11
<html><head><title>The dot Metacharacter</title>
</head>
<body>
<script language="JavaScript">
var textString="Norma Jean";
var reg_expression = /N..ma/;
var result=reg_expression.test(textString); // Returns true or false
document.write("<font size='+1'><b>"+result+"<br>");
if (reg_expression.test(textString)){ // Same as: if (result)
document.write("<b>The reg_ex /N..ma/ matched the string \""+ textString +"\".<br>");
}
else{
document.write("No Match!");
}
</script>
</body>
</html>
EXPLANATION
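The behavior of the dot can be checked against a few more strings. The following is a minimal sketch rather than part of the original example; it assumes an environment that provides console.log (a browser console or Node.js), and the names dotPattern, samples, dotResults, and anchoredPattern are illustrative only.

```javascript
// Sketch: the dot matches any single character except a newline.
var dotPattern = /a.b/;
var samples = ["a-b", "a b", "ab", "a\nb"];
var dotResults = samples.map(function (s) {
    return dotPattern.test(s);
});
console.log(dotResults); // [ true, true, false, false ]

// Anchored variant from Example 13.10: a, any three characters, then c.
var anchoredPattern = /^a...c/;
console.log(anchoredPattern.test("a123c"));  // true
console.log(anchoredPattern.test("xa123c")); // false: the a is not at the start
```

The string "ab" fails because the dot must consume exactly one character between the a and the b.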
13.3.2 The Character Class
A character class represents one character from a set of characters. For example, [abc] matches either an a, b, or c; [a-z] matches one character in the range from a to z; and [0-9] matches one character in the range of digits between 0 and 9. If the character class contains a leading caret, ^, then the class represents any one character not in the set; thus, [^a-zA-Z] matches a single character not in the range from a to z or A to Z, and [^0-9] matches a single character that is not a digit between 0 and 9. JavaScript provides additional symbols, called metasymbols, to represent a character class. The symbols \d and \D represent a single digit and a single non-digit, respectively, the same as [0-9] and [^0-9]; whereas \w and \W represent a single word character and a single non-word character, respectively, the same as [A-Za-z_0-9] and [^A-Za-z_0-9].
Example 13.12
<html><head><title>The Character Class</title>
</head>
<body>
<script language="JavaScript">
var reg_expression = /[A-Z][a-z]eve/;
var textString=prompt("Type a string of text","");
var result=reg_expression.test(textString); // Returns true or false
document.write("<font size='+1'><b>"+result+"<br>");
if (result){
document.write("<b>The reg_ex /[A-Z][a-z]eve/ matched the string \""+ textString +"\".<br>");
}
else{
alert("No Match!");
}
</script>
</body>
</html>
EXPLANATION
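To see what the pattern in Example 13.12 accepts without typing into a prompt, the test can be run against fixed strings. This is a small sketch under the assumption that console.log is available; the variable name classPattern and the sample strings are made up for illustration.

```javascript
// Sketch: one uppercase letter, one lowercase letter, then the literal "eve".
var classPattern = /[A-Z][a-z]eve/;

console.log(classPattern.test("Steve"));    // true: S, t, then eve
console.log(classPattern.test("steve"));    // false: no uppercase letter
console.log(classPattern.test("Eve"));      // false: too short for the pattern
console.log(classPattern.test("McSteven")); // true: "Steve" occurs inside
```

Because the pattern is not anchored, it succeeds when the five matching characters appear anywhere in the string.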
Example 13.13
<html><head><title>The Character Class</title>
</head>
<body>
<script language="JavaScript">
// Character class
var reg_expression = /[A-Za-z0-9_]/; // A single alphanumeric word character
var textString=prompt("Type a string of text","");
var result=reg_expression.test(textString); // Returns true or false
document.write("<font size='+1'><b>"+result+"<br>");
if (result){
document.write("<b>The reg_ex /[A-Za-z0-9_]/ matched the string \""+ textString +"\".<br>");
}
else{
alert("No Match!");
}
</script>
</body>
</html>
EXPLANATION
Example 13.14
<html><head><title>The Character Class and Negation</title>
</head>
<body>
<script language="JavaScript">
// Negation within a Character Class
var reg_expression = /[^0-9]/;
var textString=prompt("Type a string of text","");
var result=reg_expression.test(textString); // Returns true or false
document.write("<font size='+1'><b>"+result+"<br>");
if (result){
document.write("<b>The reg_ex /[^0-9]/ matched the string \""+ textString +"\".<br>");
}
else{
alert("No Match!");
}
</script>
</body>
</html>
EXPLANATION
Figure 13.13. The user entered abc . It contains a character that is not in the range between 0 and 9.
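A negated class only requires that one character somewhere in the string fall outside the set. The sketch below contrasts that with an anchored, digits-only test; the anchors ^ and $ are borrowed from later in this section, and the names nonDigit and digitsOnly are hypothetical. It assumes console.log is available.

```javascript
// Sketch: "contains a non-digit" versus "consists only of digits".
var nonDigit = /[^0-9]/;     // at least one character that is not a digit
var digitsOnly = /^[0-9]+$/; // every character must be a digit

console.log(nonDigit.test("abc"));    // true
console.log(nonDigit.test("2024"));   // false: every character is a digit
console.log(nonDigit.test("20x4"));   // true: the x is not a digit
console.log(digitsOnly.test("2024")); // true
console.log(digitsOnly.test("20x4")); // false
```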
13.3.3 Metasymbols
Metasymbols offer an alternative way to represent a character class. For example, instead of representing a number as [0-9], it can be represented as \d, and the alternative for representing a non-number, [^0-9], is \D. Metasymbols are easier to use and to type than the character classes they replace.
Table 13.8. Metasymbols.
Example 13.15
<html><head><title>The Digit Meta Symbol</title>
</head>
<body>
<script language="JavaScript">
var reg_expression = /6\d\d/;
var textString=prompt("Type a string of text","");
var result=reg_expression.test(textString); // Returns true or false
document.write("<font size='+1'><b>"+result+"<br>");
if (result){
document.write("<b>The regular expression /6\\d\\d/ matched the string \""+ textString +"\".<br>");
}
else{
alert("No Match!");
}
</script>
</body>
</html>
EXPLANATION
Example 13.16
<html><head><title>The Digit Meta Symbol Negated</title>
</head>
<body>
<script language="JavaScript">
var reg_expression = /[a-z]\D\D/;
var textString=prompt("Type a string of text","");
var result=reg_expression.test(textString); // Returns true or false
document.write("<font size='+1'><b>"+result+"<br>");
if (result){
document.write("<b>The regular expression /[a-z]\\D\\D/ matched the string \"" + textString +"\".<br>");
}
else{
alert("No Match!");
}
</script>
</body>
</html>
EXPLANATION
Example 13.17
<html><head><title>Word and Space Metasymbols</title>
</head>
<body>
<script language="JavaScript">
var reg_expression = /\w\s\w\W/;
var textString=prompt("Type a string of text","");
var result=reg_expression.test(textString); // Returns true or false
document.write("<font size='+1'><b>"+result+"<br>");
if (result){
document.write("<b>The regular expression /\\w\\s\\w\\W/ matched the string \""+ textString +"\".<br>");
}
else{
alert("No Match!");
}
</script>
</body>
</html>
EXPLANATION
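The equivalence between the metasymbols and their character classes can be spot-checked in code. The following sketch compares each metasymbol with the class given for it earlier in this section, using a handful of ASCII characters; the names chars, pairs, and allAgree are illustrative, and console.log is assumed to be available.

```javascript
// Sketch: each metasymbol should agree with its equivalent character class.
var chars = ["7", "x", "_", " ", "-"];
var pairs = [
    [/\d/, /[0-9]/],
    [/\D/, /[^0-9]/],
    [/\w/, /[A-Za-z0-9_]/],
    [/\W/, /[^A-Za-z0-9_]/]
];
var allAgree = pairs.every(function (pair) {
    return chars.every(function (ch) {
        return pair[0].test(ch) === pair[1].test(ch);
    });
});
console.log(allAgree); // true

// The pattern from Example 13.17: word character, whitespace,
// word character, non-word character.
console.log(/\w\s\w\W/.test("a b!")); // true
console.log(/\w\s\w\W/.test("a b"));  // false: nothing left for \W to match
```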
13.3.4 Metacharacters to Repeat Pattern Matches
In the previous examples, the metacharacter matched on a single character. What if you want to match on more than one character? For example, let's say you are looking for all lines containing
Table 13.9. Quantifiers: The greedy metacharacters.
The Greed Factor
Normally quantifiers are "greedy"; that is, they match on the largest possible set of characters starting at the left-hand side of the string and searching to the right, looking for the last possible character that would satisfy the pattern. For example, given the string:
var string="ab123456783445554437AB"
and the regular expression:
/ab[0-9]*/
If the replace() method were to substitute what is matched with an "X":
string=string.replace(/ab[0-9]*/, "X");
the resulting string would be:
"XAB"
The asterisk is a greedy metacharacter. It matches zero or more of the preceding character. In other words, it attaches itself to the character preceding it; in the above example, the asterisk attaches itself to the character class [0-9]. The matching starts on the left, searching for ab followed by zero or more numbers in the range between 0 and 9. It is called greedy because the matching continues until the last number is found; in this example, the number 7. The pattern ab and all of the numbers in the range between 0 and 9 are replaced with a single X.
Greediness can be turned off so that instead of matching on the maximum number of characters, the match is made on the minimal number of characters found. This is done by appending a question mark after the greedy metacharacter. See Example 13.18.
Example 13.18
<html><head><title></title>
</head>
<body>
<script language="JavaScript">
var reg_expression = /\d\.?\d/;
var textString=prompt("Type a string of text","");
var result=reg_expression.test(textString); // Returns true or false
document.write("<font size='+1'><b>"+result+"<br>");
if (result){
document.write("<b>The regular expression /\\d\\.?\\d/ matched the string \""+textString +"\".<br>");
}
else{
alert("No Match!");
}
</script>
</body>
</html>
EXPLANATION
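The greedy and non-greedy behavior described under The Greed Factor can be compared side by side. This is a sketch that reuses the string from that discussion; the variable names digits, greedy, lazy, and lazyPlus are illustrative, and console.log is assumed to be available.

```javascript
// Sketch: the same replacement with a greedy and a non-greedy quantifier.
var digits = "ab123456783445554437AB";

// Greedy: * takes as many digits as it can.
var greedy = digits.replace(/ab[0-9]*/, "X");
console.log(greedy); // "XAB"

// Non-greedy: *? takes as few digits as it can, which here is zero.
var lazy = digits.replace(/ab[0-9]*?/, "X");
console.log(lazy); // "X123456783445554437AB"

// +? must take at least one digit, so exactly one is consumed.
var lazyPlus = digits.replace(/ab[0-9]+?/, "X");
console.log(lazyPlus); // "X23456783445554437AB"
```

Note that in /\d\.?\d/ the question mark follows a literal period rather than a quantifier, so there it means "zero or one period" and does not affect greediness.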
Example 13.19
<html><head><title></title>
</head>
<body>
<script language="JavaScript">
// Greediness
var reg_expression = /[A-Z][a-z]*\s/;
var textString=prompt("Type a string of text","");
var result=reg_expression.test(textString); // Returns true or false
document.write("<font size='+1'><b>"+result+"<br>");
if (result){
document.write("<b>The regular expression /[A-Z][a-z]*\\s/ matched the string \""+ textString +"\".<br>");
}
else{
alert("No Match!");
}
</script>
</body>
</html>
EXPLANATION
Example 13.20
<html><head><title></title>
</head>
<body>
<script language="JavaScript">
var reg_expression = /[A-Z][a-z]+\s/;
var textString=prompt("Type a string of text","");
var result=reg_expression.test(textString); // Returns true or false
document.write("<font size='+1'><b>"+result+"<br>");
if (result){
document.write("<b>The regular expression /[A-Z][a-z]+\\s/ matched the string \""+ textString +"\".<br>");
}
else{
alert("No Match!");
}
</script>
</body>
</html>
EXPLANATION
Example 13.21
<html><head><title></title>
</head>
<body>
<script language="JavaScript">
var reg_expression = /abc\d{1,3}\.\d/;
var textString=prompt("Type a string of text","");
var result=reg_expression.test(textString); // Returns true or false
document.write("<font size='+1'><b>"+result+"<br>");
if (result){
document.write("<b>The regular expression /abc\\d{1,3}\\.\\d/ matched the string \""+ textString +"\".<br>");
}
else{
alert("No Match!");
}
</script>
</body>
</html>
EXPLANATION
Example 13.22
<html><head><title></title>
</head>
<body>
<script language="JavaScript">
//Repeating patterns
var reg_expression = /#\d{5}\.\d/;
var textString=prompt("Type a string of text","");
var result=reg_expression.test(textString); // Returns true or false
document.write("<font size='+1'><b>"+result+"<br>");
if (result){
document.write("<b>The regular expression /#\\d{5}\\.\\d/ matched the string \""+ textString +"\".<br>");
}
else{
alert("No Match!");
}
</script>
</body>
</html>
EXPLANATION
Example 13.23
<html><head><title></title>
</head>
<body>
<script language="JavaScript">
//Repeating patterns
var reg_expression = /5{1,}\.\d/;
var textString=prompt("Type a string of text","");
var result=reg_expression.test(textString); // Returns true or false
document.write("<font size='+1'><b>"+result+"<br>");
if (result){
document.write("<b>The regular expression /5{1,}\\.\\d/ matched the string \""+ textString +"\".<br>");
}
else{
alert("No Match!");
}
</script>
</body>
</html>
Figure 13.23. The user entered abc5555555.2, which contains the number 5 at least one time, followed by a literal period and any digit, \d (top). This returns true; the user entered 5.6 (bottom). This also returns true.
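The curly-brace quantifiers from Examples 13.21 through 13.23 can be exercised with fixed strings instead of a prompt. The following is a sketch only; the variable names oneToThree, exactlyFive, and atLeastOne and the sample inputs (other than abc5555555.2 and 5.6) are assumptions made for illustration, and console.log is assumed to be available.

```javascript
// Sketch: {m,n}, {n}, and {n,} applied to the preceding character.
var oneToThree = /abc\d{1,3}\.\d/;  // abc, one to three digits, a period, a digit
var exactlyFive = /#\d{5}\.\d/;     // #, exactly five digits, a period, a digit
var atLeastOne = /5{1,}\.\d/;       // one or more 5s, a period, a digit

console.log(oneToThree.test("abc12.5"));       // true
console.log(oneToThree.test("abc.5"));         // false: at least one digit is required
console.log(oneToThree.test("abc1234.5"));     // false: four digits precede the period
console.log(exactlyFive.test("#12345.6"));     // true
console.log(exactlyFive.test("#1234.6"));      // false: only four digits
console.log(atLeastOne.test("abc5555555.2"));  // true
console.log(atLeastOne.test("5.6"));           // true
console.log(atLeastOne.test("4.6"));           // false: no 5 before the period
```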
Metacharacters That
| Metacharacter | What It Matches |
|---|---|
| ^ | Matches to beginning of line or beginning of string |
| $ | Matches to end of line or end of a string |
| \b | Matches a word boundary (when not inside [ ]) |
| \B | Matches a non-word boundary |
<html><head><title></title>
</head>
<body>
<script language="JavaScript">
var reg_expression = /^Will/; // Beginning of line anchor
var textString=prompt("Type a string of text","");
var result=reg_expression.test(textString); // Returns true or false
document.write("<font size='+1'><b>"+result+"<br>");
if (result){
document.write("<b>The regular expression /^Will/ matched the string \""+ textString +"\".<br>");
}
else{
alert("No Match!");
}
</script>
</body>
</html>
EXPLANATION
The variable is assigned a regular expression containing the beginning of line anchor metacharacter, the caret, followed by Will .
The variable textString is assigned user input; in this example, Willie Wonker was entered.
The regular expression test() method will return true since this string Willie Wonker begins with Will . See Figure 13.26.
<html><head><title>Beginning of Line Anchor</title>
</head>
<body>
<script language="JavaScript">
var reg_expression = /^[JK]/;
var textString=prompt("Type a string of text","");
var result=reg_expression.test(textString); // Returns true or false
document.write("<font size='+1'><b>"+result+"<br>");
if (result){
document.write("<b>The regular expression /^[JK]/ matched the string \""+ textString +"\".<br>");
}
else{
alert("No Match!");
}
</script>
</body>
</html>
EXPLANATION
A regular expression contains a beginning of line anchor, the caret. The regular expression reads: Find either an uppercase J or uppercase K at the beginning of the line or string.
The variable textString is assigned user input; in this example, Jack and Jill .
The regular expression test() method will return true since the string Jack matches an uppercase letter J and is found at the beginning of the string. See Figure 13.27.
<html><head><title>End of Line Anchor</title>
</head>
<body>
<script language="JavaScript">
var reg_expression = /50$/;
var textString=prompt("Type a string of text","");
var result=reg_expression.test(textString); // Returns true or false
document.write("<font size='+1'><b>"+result+"<br>");
if (result){
document.write("<b>The regular expression /50$/ matched the string \""+ textString +"\".<br>");
}
else{
alert("No Match!");
}
</script>
</body>
</html>
EXPLANATION
The regular expression /50$/ is assigned to the variable. The pattern contains the dollar sign ($) metacharacter, representing the end of line anchor only when the $ is the last character in the pattern. The expression reads: Find a 5 and a 0 at the end of the line or string.
<html><head><title>Anchors</title>
</head>
<body>
<script language="JavaScript">
var reg_expression = /^[A-Z][a-z]+\s\d$/;
// At the beginning of the string, find one uppercase
// letter, followed by one or more lowercase letters,
// a space, and one digit.
var string=prompt("Enter a name and a number","");
if (reg_expression.test(string)){
alert("It Matched!!");
}
else{
alert("No Match!");
}
</script>
</body>
</html>
EXPLANATION
The regular expression reads: Look at the beginning of the line, ^, find an uppercase letter, [A-Z] , followed by one or more lowercase letters, [a-z]+ , a single whitespace, \s , and a digit at the end of the line, \d$ .
The user is prompted for input.
The regular expression test() method tests to see if there was a match and returns true if so, and false if not. See Figures 13.28 and 13.29.
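The effect of the ^ and $ anchors can be seen by running the patterns from the preceding examples against fixed strings. This is a sketch under the assumption that console.log is available; the variable name nameAndDigit and the sample strings other than Willie Wonker are made up for illustration. The last two lines use the m (multiline) flag, which is standard JavaScript but is not covered in the examples above.

```javascript
// Sketch: anchoring a pattern to the start and end of the string.
var nameAndDigit = /^[A-Z][a-z]+\s\d$/;

console.log(nameAndDigit.test("Tom 5"));   // true
console.log(nameAndDigit.test("Tom 55"));  // false: two digits before the end
console.log(nameAndDigit.test("tom 5"));   // false: no leading uppercase letter

console.log(/^Will/.test("Willie Wonker")); // true
console.log(/^Will/.test("Mr. Will"));      // false: Will is not at the start
console.log(/50$/.test("Total: 150"));      // true
console.log(/50$/.test("150 items"));       // false: 50 is not at the end

// With the m flag, ^ and $ also match at the start and end of each line.
console.log(/^b/.test("a\nb"));  // false
console.log(/^b/m.test("a\nb")); // true
```

Anchoring both ends, as nameAndDigit does, turns a "contains" test into a test of the whole string.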
<html><head><title>The Word Boundary</title>
</head>
<body>
<script language="JavaScript">
// Anchoring a word with \b
var reg_expression = /\blove\b/;
var textString=prompt("Type a string of text","");
var result=reg_expression.test(textString); // Returns true or false
document.write("<font size='+1'><b>"+result+"<br>");
if (result){
document.write("<b>The regular expression /\\blove\\b/ matched the string \""+ textString +"\".<br>");
}
else{
alert("No Match!");
}
</script>
</body>
</html>
EXPLANATION
The regular expression contains the \b metacharacter, representing a word boundary, not a specific character. The expression reads: Find a word beginning and ending with love . This means that gloves, lover, clover, and so on, will not be found.
The regular expression test() method will return true since the string love is within word boundary anchors \b . See Figure 13.30.
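The word boundary anchors can be checked against the words mentioned in the explanation. The following is a minimal sketch; the variable names wholeWord and insideWord are illustrative, and console.log is assumed to be available.

```javascript
// Sketch: \b requires a word boundary, \B requires the absence of one.
var wholeWord = /\blove\b/;
var insideWord = /\Blove\B/;

console.log(wholeWord.test("I love you")); // true
console.log(wholeWord.test("love!"));      // true: punctuation ends the word
console.log(wholeWord.test("gloves"));     // false: letters on both sides
console.log(wholeWord.test("lover"));      // false: a letter follows love

console.log(insideWord.test("gloves"));    // true: love is inside a longer word
console.log(insideWord.test("love"));      // false: love stands alone
```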
Alternation allows the regular expression to contain alternative patterns to be matched; for example, the regular expression /John|Karen|Steve/ will match a line containing John or Karen or Steve. If Karen, John, or Steve are all on different lines, all lines are matched. Each of the alternative expressions is separated by a vertical bar (the pipe symbol, |) and the expressions can consist of any number of characters, unlike the character class that only matches for one character; thus, /a|b|c/ is the same as [abc], whereas /ab|de/ cannot be represented as [abde]. The pattern /ab|de/ is either ab or de, whereas the class [abde] represents only one character in the set a, b, d, or e.
<html><head><title>Alternation</title>
</head>
<body>
<script language="JavaScript">
// Alternation: this or that or whatever...
var reg_expression = /Steve|Dan|Tom/;
var textString=prompt("Type a string of text","");
var result=reg_expression.test(textString); // Returns true or false
document.write("<font size='+1'><b>"+result+"<br>");
if (result){
document.write("<b>The regular expression /Steve|Dan|Tom/ matched the string \""+ textString +"\".<br>");
}
else{
alert("No Match!");
}
</script></body></html>
EXPLANATION
The pipe symbol, |, is used in the regular expression to match on a set of alternative patterns. If any of the patterns, Steve, Dan, or Tom, are found, the match is successful.
The test() method will return true if the user enters either Steve, Dan , or Tom . See Figure 13.31.
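The difference between alternation and a character class can be shown with a few fixed strings. This sketch assumes console.log is available; the variable names names, abOrDe, and oneOfAbde and the sample sentences are hypothetical.

```javascript
// Sketch: alternation chooses between whole patterns,
// a character class chooses between single characters.
var names = /Steve|Dan|Tom/;
console.log(names.test("Call Dan today"));   // true
console.log(names.test("Call Karen today")); // false

var abOrDe = /ab|de/;       // either "ab" or "de"
var oneOfAbde = /[abde]/;   // any one of a, b, d, e
console.log(abOrDe.test("ad"));    // false: neither ab nor de occurs
console.log(oneOfAbde.test("ad")); // true: a is in the set
console.log(abOrDe.test("made"));  // true: contains de
```

Because the alternatives are not anchored, names also matches strings in which Dan, Steve, or Tom is part of a longer word.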
If the regular expression pattern is enclosed in parentheses, a subpattern is created. Then, for example, instead of the greedy metacharacters matching on zero, one, or more of the previous single characters, they can match on the previous subpattern. Alternation can also be controlled if the patterns are enclosed in parentheses.
<html><head><title>Grouping or Clustering</title>
</head>
<body>
<script language="JavaScript">
// Grouping with parentheses
var reg_expression = /^(Sam|Dan|Tom) Robbins/;
var textString=prompt("Type a string of text","");
var result=reg_expression.test(textString); // Returns true or false
document.write("<font size='+1'><b>"+result+"<br>");
if (result){
document.write("<b>The regular expression /^(Sam|Dan|Tom) Robbins/ matched the string \""+ textString +"\".<br>");
}
else{
alert("No Match!");
}
</script>
</body>
</html>
EXPLANATION
By enclosing Sam, Dan, and Tom in parentheses, the alternative now becomes either Sam Robbins, Dan Robbins, or Tom Robbins . Without the parentheses, the regular expression matches Sam , or Dan , or Tom Robbins . The caret metacharacter ^ anchors all of the patterns to the beginning of the line.
The user input is assigned to the variable called textString .
The test() method checks to see if the string contains one of the alternatives: Sam Robbins or Dan Robbins or Tom Robbins. If it does, true is returned; otherwise, false is returned.
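The effect of the parentheses on alternation, and on a quantifier, can be compared directly. The following is a sketch, assuming console.log is available; the variable names grouped and ungrouped and the inputs Dan Savage and ababab are made up for illustration.

```javascript
// Sketch: parentheses limit the scope of | and extend the scope of a quantifier.
var grouped = /^(Sam|Dan|Tom) Robbins/;
var ungrouped = /^Sam|Dan|Tom Robbins/; // means: ^Sam, or Dan, or Tom Robbins

console.log(grouped.test("Dan Robbins"));  // true
console.log(grouped.test("Dan Savage"));   // false: Robbins is required
console.log(ungrouped.test("Dan Savage")); // true: Dan alone is an alternative
console.log(ungrouped.test("Sam"));        // true: Sam at the start is an alternative

// A quantifier after a group repeats the whole subpattern.
console.log(/^(ab)+$/.test("ababab")); // true: ab repeated three times
console.log(/^ab+$/.test("ababab"));   // false: only the b is repeated
```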
If the regular expression pattern is enclosed in parentheses, a subpattern is created. The subpattern is saved in special numbered class properties, starting with $1, then $2, and so on, which will be applied to the RegExp object, not an instance of the object. These properties can be used later in the program and will persist until another successful pattern match occurs, at which time they will be cleared. Even if the
[2] It is possible to prevent a subpattern from being saved.
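One way to prevent a subpattern from being saved is the non-capturing group, written (?:...), which is standard JavaScript syntax although it is not shown in this section. The sketch below compares the arrays returned by match() with and without it; the variable names capturing and nonCapturing are illustrative, and console.log is assumed to be available.

```javascript
// Sketch: (?:...) groups without saving the subpattern.
var capturing = "Dan Robbins".match(/(Sam|Dan|Tom) (Robbins)/).slice();
var nonCapturing = "Dan Robbins".match(/(?:Sam|Dan|Tom) (Robbins)/).slice();

console.log(capturing);    // [ "Dan Robbins", "Dan", "Robbins" ]
console.log(nonCapturing); // [ "Dan Robbins", "Robbins" ]
```

The call to slice() copies the result into a plain array so that only the matched string and the captured subpatterns are displayed.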
<html><head><title>Capturing</title>
</head>
<body>
<script language="JavaScript">
var textString = "Everyone likes William Rogers and his friends.";
var reg_expression = /(William)\s(Rogers)/;
var myArray=textString.match(reg_expression);
document.write(myArray); // Three element array
document.write(RegExp.$1 + " "+RegExp.$2);
// alert(myArray[1] + " "+ myArray[2]);
// match and exec create an array consisting of the matched string, and
// the captured patterns. myArray[0] is "William Rogers"
// myArray[1] is "William" myArray[2] is "Rogers".
</script>
</body>
</html>
EXPLANATION
The string called textString is created.
The regular expression contains two subpatterns, William and Rogers , both enclosed in parentheses.
When either the String object's match() method or the RegExp object's exec() method is applied to the regular expression containing subpatterns, an array is returned, where the first element of the array is the matched string, and the remaining elements are the captured subpatterns.
The array elements are displayed, separated by commas.
The subpatterns are class properties of the RegExp object. $1 represents the first captured subpattern, William , and $2 represents the second captured subpattern, Rogers . See Figure 13.33.
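The captured subpatterns can also be read from the array returned by exec(), without going through the RegExp object's $1 and $2 properties, which current JavaScript references treat as legacy features. This is a sketch that reuses the string from the example; the variable names sentence and found are illustrative, and console.log is assumed to be available.

```javascript
// Sketch: reading captures from the array returned by exec().
var sentence = "Everyone likes William Rogers and his friends.";
var found = /(William)\s(Rogers)/.exec(sentence);

if (found !== null) {            // exec() returns null when nothing matches
    console.log(found[0]);       // "William Rogers" (the whole match)
    console.log(found[1]);       // "William" (first subpattern)
    console.log(found[2]);       // "Rogers" (second subpattern)
    console.log(found.index);    // 15 (position where the match starts)
}
```

Checking for null first avoids an error when the pattern is not found in the string.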
<html>
<head><title>Capture and Replace</title>
<font size="+1"><font face="helvetica">
<script language = "JavaScript">
var string="Tommy Savage:203-123-4444:12 Main St.";
var newString=string.replace(/(Tommy) (Savage)/, "$2, $1");
document.write(newString +"<br>");
</script>
</head><body></body>
</html>
EXPLANATION
A string is assigned to the variable, called string .
The replace() method will search for the pattern Tommy Savage. Since the search side of the replace() method contains the pattern Tommy enclosed in parentheses and the pattern Savage enclosed in parentheses, each of these subpatterns will be stored in $1 and $2, respectively. A third pattern would be stored in $3 and a fourth pattern in $4, etc. On the replacement side of the replace() method, $2 and $1 are replaced in the string, so that Savage is first, then a comma, and then Tommy. The first and last names have been reversed.
The new string is displayed.
<html>
<head>
<title>Capture and Replace</title></head>
<body>
<font size="+1">
<font face="helvetica">
<script language = "JavaScript">
var string="Tommy Savage:203-123-4444:12 Main St.";
var newString=string.replace(/(\w+)\s(\w+)/, "$2, $1");
document.write(newString +"<br>");
</script>
</body>
</html>
EXPLANATION
A string is created to be used by the replace() method in step 2.
The replace() method searches for one or more alphanumeric word characters, followed by a single space, and another set of alphanumeric word characters. The word characters are enclosed in parentheses, and thus captured. $1 will contain Tommy , and $2 will contain Savage . On the replacement side, $1 and $2 are reversed. After the replacement is made, a new string is created.
The value of newString shows that the capturing and the substitution occurred successfully, leaving the remainder of the string as it was. See Figure 13.35.
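The capture-and-replace step can be written in two equivalent ways: with $1 and $2 in the replacement string, or with a replacement function that receives the captures as arguments. The following is a sketch using the string from the example; the variable names record, swapped, and viaFunction and the parameter names are illustrative, and console.log is assumed to be available.

```javascript
// Sketch: swapping two captured words with replace().
var record = "Tommy Savage:203-123-4444:12 Main St.";

// Replacement string: $2 and $1 refer to the second and first subpatterns.
var swapped = record.replace(/(\w+)\s(\w+)/, "$2, $1");
console.log(swapped); // "Savage, Tommy:203-123-4444:12 Main St."

// Replacement function: receives the whole match, then each capture.
var viaFunction = record.replace(/(\w+)\s(\w+)/, function (match, first, last) {
    return last + ", " + first;
});
console.log(viaFunction === swapped); // true
```

Only the first match is replaced because the pattern has no g flag, so the rest of the record is left as it was.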