Regular Expressions: Characters



In a regular expression, any single character matches itself unless it is a special character such as $ or ^, which have special meanings that we'll see later. For example, I can check whether the user has typed the word exit by matching the text against the regular expression /exit/ and, if it matches, display a message:

 function checker()
 {
     var regexp = /exit/
     var matches = document.form1.text1.value.match(regexp)
     // match() returns null when nothing matches, so test the result first
     if (matches) {
         document.form1.text2.value = "You typed exit."
     }
 }

In fact, it may be a good idea to make sure that the user typed only exit, and not, for example, Don't exit!!!. We can do that by checking that the text contains exit and nothing else, using the regular expression /^exit$/i. The special character ^ matches the beginning of a line of text, the character $ matches the end of a line of text, and the i modifier makes the match case-insensitive (so the user could type exit, EXIT, ExIt, and so on). This shows how to build a regular expression piece by piece, left to right. Here's how we match the beginning of the line, followed by exit, followed by the end of the line:

 function checker()
 {
     var regexp = /^exit$/i
     var matches = document.form1.text1.value.match(regexp)
     // match() returns null when nothing matches, so test the result first
     if (matches) {
         document.form1.text2.value = "You typed exit."
     }
 }
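
To try the same check outside a web page, here is a minimal sketch that wraps the anchored pattern in a plain function. The name isExitCommand is my own assumption for illustration and does not appear in the listings; the sketch uses the standard test() method instead of match() because we only need a yes-or-no answer.

```javascript
// Hypothetical helper (the name is an assumption, not from the listings):
// returns true only when the whole input is "exit", ignoring case.
function isExitCommand(text) {
    // ^ anchors the start, $ anchors the end, i ignores case.
    return /^exit$/i.test(text);
}

console.log(isExitCommand("exit"));            // true
console.log(isExitCommand("ExIt"));            // true
console.log(isExitCommand("Don't exit!!!"));   // false

// Without the anchors, the pattern matches anywhere in the text.
console.log(/exit/.test("Don't exit!!!"));     // true
```

Comparing the last two results shows why the anchors matter: the unanchored pattern accepts any text that merely contains exit.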

Besides normal characters, JavaScript defines the following special character sequences for use in regular expressions. Each one is an escape sequence that starts with a backslash:

  • \077 Octal character code

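  • \a Alarm (bell) in Perl-style regular expressions; JavaScript does not define this escape and treats \a as a literal a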

  • \cX Control character (\cX is ^X, \cC is ^C, and so on)

  • \d Match a digit character

  • \D Match a non-digit character

  • \f Form feed

  • \n Newline

  • \r Return

  • \S Match a non-whitespace character

  • \s Match a whitespace character

  • \t Tab

  • \w Match a word character (alphanumeric characters and "_")

  • \W Match a non-word character

  • \x1A Hex character code 1A
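
To see a few of these escapes at work, here is a small sketch. The sample text and the function names are my own assumptions, chosen only for illustration.

```javascript
// Hypothetical sample text, chosen only for this illustration.
var sample = "Room 42\tis open";

// \d matches a single digit; with the g modifier, match() returns all of them.
function findDigits(text) {
    return text.match(/\d/g) || [];
}

// \s matches a space, tab, or other whitespace character.
function underscoreWhitespace(text) {
    return text.replace(/\s/g, "_");
}

// \W matches anything that is not a letter, digit, or underscore.
function keepWordCharacters(text) {
    return text.replace(/\W/g, "");
}

console.log(findDigits(sample));            // the digits 4 and 2
console.log(underscoreWhitespace(sample));  // Room_42_is_open
console.log(keepWordCharacters(sample));    // Room42isopen
console.log(/\x41/.test("A"));              // true (hex code 41 is "A")
```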

In general, if you preface a special character with a backslash, JavaScript treats the character as itself and does not try to interpret it as a special code of any kind. If you really want to match a dollar sign and not the end of a line, for example, use \$, not $.
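
As a quick illustration of the difference between \$ and $, here is a sketch that looks for a dollar sign followed by a digit. The function name hasDollarAmount and the sample strings are assumptions made for this example.

```javascript
// Hypothetical helper: true when the text contains a literal dollar sign
// immediately followed by a digit.
function hasDollarAmount(text) {
    // \$ matches the dollar sign itself; \d matches one digit after it.
    return /\$\d/.test(text);
}

console.log(hasDollarAmount("Total: $5"));   // true
console.log(hasDollarAmount("Total: 5"));    // false

// Unescaped, $ means "end of the text", so /$5/ asks for a 5 that comes
// after the end, which can never match.
console.log(/$5/.test("Total: $5"));         // false

// Used as an anchor, $ checks that the text ends with a 5.
console.log(/5$/.test("Total: $5"));         // true
```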

Note some of the useful characters here, such as \w, which matches a word character. A \w matches only one word character, not a whole word. To match a word, you need to match one or more word characters with the expression \w+; here, the plus sign (+) means "one or more matches." (See the section "Regular Expressions: Quantifiers" for more on how to use the plus sign.) For example, here is how we can match the first word in the text "Here is the text." and replace that word (that is, Here) with There:

(Listing 20-04.html on the web site)
 <HTML>
     <HEAD>
         <TITLE>Replacing Words</TITLE>
         <SCRIPT LANGUAGE="JavaScript">
             <!--
             function displayer()
             {
                 var regexp = /\w+/
                 document.form1.text2.value = document.form1.text1.value.replace(regexp, "There")
             }
             //-->
         </SCRIPT>
     </HEAD>
     <BODY>
         <H1>Replacing Words</H1>
         <FORM NAME="form1">
             <INPUT TYPE="TEXT" NAME="text1" VALUE="Here is the text.">
             <BR>
             <INPUT TYPE="BUTTON" VALUE="Replace First Word" ONCLICK="displayer()">
             <BR>
             <INPUT TYPE="TEXT" NAME="text2">
         </FORM>
     </BODY>
 </HTML>

You can see the results of this code in Figure 20.4, where we're replacing the first word in the typed text (from Here to There).

Figure 20.4. Replacing a word.

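
To make the difference between \w and \w+ concrete, here is a sketch that runs both patterns against the same text outside a web page. The function names are assumptions for illustration only.

```javascript
// Hypothetical helpers: each replaces the first match with "There".
function replaceFirstWordCharacter(text) {
    // \w alone matches only the first word character ("H").
    return text.replace(/\w/, "There");
}

function replaceFirstWord(text) {
    // \w+ matches the whole run of word characters ("Here").
    return text.replace(/\w+/, "There");
}

var text = "Here is the text.";
console.log(replaceFirstWordCharacter(text));  // Thereere is the text.
console.log(replaceFirstWord(text));           // There is the text.
```

Without the g modifier, replace() changes only the first match, which is why the rest of the sentence is left alone.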

Matching Any Character

A very powerful character that you can use in regular expressions is the dot (.). This character matches any single character except a newline (\n) or other line terminator. For example, I can substitute an asterisk (*) for all the characters in the string "JavaScript is the subject." like this, where I'm making the substitution operation global with the g modifier:

(Listing 20-05.html on the web site)
 <HTML>
     <HEAD>
         <TITLE>Replacing Characters</TITLE>
         <SCRIPT LANGUAGE="JavaScript">
             <!--
             function replacer()
             {
                 var regexp = /./g
                 document.form1.text2.value = document.form1.text1.value.replace(regexp, "*")
             }
             //-->
         </SCRIPT>
     </HEAD>
     <BODY>
         <H1>Replacing Characters</H1>
         <FORM NAME="form1">
             <INPUT TYPE="TEXT" NAME="text1" VALUE="JavaScript is the subject." SIZE="30">
             <BR>
             <INPUT TYPE="BUTTON" VALUE="Replace All Text" ONCLICK="replacer()">
             <BR>
             <INPUT TYPE="TEXT" NAME="text2" SIZE="30">
         </FORM>
     </BODY>
 </HTML>

You can see the results of this code in Figure 20.5, where we're replacing all the text "JavaScript is the subject." with asterisks.

Figure 20.5. Replacing text.

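
The newline exception is easy to check for yourself. Here is a sketch, with a function name of my own choosing, that masks text the same way and shows that a newline survives the replacement.

```javascript
// Hypothetical helper: replaces every character the dot can match with "*".
function maskText(text) {
    return text.replace(/./g, "*");
}

var masked = maskText("JavaScript is the subject.");
console.log(masked.length);                       // 26, one asterisk per character

// The dot does not match \n, so the line break is left in place.
console.log(JSON.stringify(maskText("ab\ncd")));  // "**\n**"
```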

What if you really want to match a dot (that is, a period)? Characters that have special meaning, such as the dot, are called metacharacters (the metacharacters are: \ | ( ) [ { ^ $ * + ? .), and you can preface any of them with a backslash so that it is interpreted literally and not as a metacharacter. Here's an example where I remove a stray period at the beginning of a sentence by matching the literal period at the beginning of the text ".Now is the time.":

(Listing 20-06.html on the web site)
 <HTML>
     <HEAD>
         <TITLE>Replacing Characters</TITLE>
         <SCRIPT LANGUAGE="JavaScript">
             <!--
             function displayer()
             {
                 var regexp = /^\./
                 document.form1.text2.value = document.form1.text1.value.replace(regexp, "")
             }
             //-->
         </SCRIPT>
     </HEAD>
     <BODY>
         <H1>Replacing Characters</H1>
         <FORM NAME="form1">
             <INPUT TYPE="TEXT" NAME="text1" VALUE=".Now is the time." SIZE="30">
             <BR>
             <INPUT TYPE="BUTTON" VALUE="Replace Leading Period" ONCLICK="displayer()">
             <BR>
             <INPUT TYPE="TEXT" NAME="text2">
         </FORM>
     </BODY>
 </HTML>
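
Here is a sketch of the same idea outside a web page, along with one way to escape every metacharacter in a piece of text before building a pattern from it. Both function names are assumptions for illustration, and escapeMetacharacters is a common idiom rather than anything built into JavaScript.

```javascript
// Hypothetical helper: removes a period only when it is the first character.
function stripLeadingPeriod(text) {
    // \. matches a literal period; ^ ties it to the start of the text.
    return text.replace(/^\./, "");
}

// Hypothetical helper: puts a backslash in front of every metacharacter
// so the text can be matched literally. $& stands for the matched character.
function escapeMetacharacters(text) {
    return text.replace(/[\\()\[\]{}^$*+?.|]/g, "\\$&");
}

console.log(stripLeadingPeriod(".Now is the time."));  // Now is the time.
console.log(stripLeadingPeriod("Now is the time."));   // Now is the time.

// Unescaped, the dot would remove any first character, not just a period.
console.log("Now is the time.".replace(/^./, ""));     // ow is the time.

// Build a pattern that matches "3.50" literally.
var literal = new RegExp(escapeMetacharacters("3.50"));
console.log(literal.test("3.50"));                     // true
console.log(literal.test("3x50"));                     // false
```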


Inside JavaScript
ISBN: 0735712859
Year: 2005
Pages: 492
Authors: Steve Holzner
