Matching Words


It's worth taking a look at a common task, matching whole words, because there are several ways to do it. For example, you can match a word using \S, which matches non-white-space characters. Here's an example that does that, matching and displaying the first word in some text:

 <HTML>
     <HEAD>
         <TITLE>Finding Words</TITLE>
         <SCRIPT LANGUAGE="JavaScript">
             <!--
             function displayer()
             {
                 var regexp = /\S+/
                 var matches = document.form1.text1.value.match(regexp)
                 // match() returns null when nothing matches, so guard before indexing
                 document.form1.text2.value = matches ? matches[0] : ""
             }
             //-->
         </SCRIPT>
     </HEAD>
     <BODY>
         <H1>Finding Words</H1>
         <FORM NAME="form1">
             <INPUT TYPE="TEXT" NAME="text1" VALUE="JavaScript is the subject." SIZE="30">
             <BR>
             <INPUT TYPE="BUTTON" VALUE="Find First Word" ONCLICK="displayer()">
             <BR>
             <INPUT TYPE="TEXT" NAME="text2">
         </FORM>
     </BODY>
 </HTML>

On the other hand, \S can match many non-alphanumeric characters, such as punctuation, which you might not be interested in. You can avoid them if you match using \w, which matches alphanumeric characters and underscores (_):

 <HTML>
     <HEAD>
         <TITLE>Finding Words</TITLE>
         <SCRIPT LANGUAGE="JavaScript">
             <!--
             function displayer()
             {
                 var regexp = /\w+/
                 var matches = document.form1.text1.value.match(regexp)
                 // match() returns null when nothing matches, so guard before indexing
                 document.form1.text2.value = matches ? matches[0] : ""
             }
             //-->
         </SCRIPT>
     </HEAD>
     <BODY>
         <H1>Finding Words</H1>
         <FORM NAME="form1">
             <INPUT TYPE="TEXT" NAME="text1" VALUE="JavaScript is the subject." SIZE="30">
             <BR>
             <INPUT TYPE="BUTTON" VALUE="Find First Word" ONCLICK="displayer()">
             <BR>
             <INPUT TYPE="TEXT" NAME="text2">
         </FORM>
     </BODY>
 </HTML>
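The difference between \S and \w is easiest to see on text that contains punctuation. The following is a minimal sketch that runs outside the browser form; the helper name firstMatch and the sample text are assumptions made for illustration and do not appear in the listings.

```javascript
// Hypothetical helper: returns the first match of a pattern, or "" when
// nothing matches (String.prototype.match returns null in that case).
function firstMatch(text, regexp) {
    var matches = text.match(regexp);
    return matches ? matches[0] : "";
}

var sample = "Hello, world!";

// \S+ stops only at white space, so the comma stays attached to the word.
console.log(firstMatch(sample, /\S+/));   // Hello,

// \w+ stops at the first character that is not a letter, digit, or underscore.
console.log(firstMatch(sample, /\w+/));   // Hello
```

Which pattern is preferable depends on whether trailing punctuation should count as part of the word.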

There are other options as well. For example, you can use a character class to match only words made up of uppercase or lowercase letters:

 <HTML>
     <HEAD>
         <TITLE>Finding Words</TITLE>
         <SCRIPT LANGUAGE="JavaScript">
             <!--
             function displayer()
             {
                 var regexp = /([A-Za-z]+)/
                 var matches = document.form1.text1.value.match(regexp)
                 // match() returns null when nothing matches, so guard before indexing
                 document.form1.text2.value = matches ? matches[0] : ""
             }
             //-->
         </SCRIPT>
     </HEAD>
     <BODY>
         <H1>Finding Words</H1>
         <FORM NAME="form1">
             <INPUT TYPE="TEXT" NAME="text1" VALUE="JavaScript is the subject." SIZE="30">
             <BR>
             <INPUT TYPE="BUTTON" VALUE="Find First Word" ONCLICK="displayer()">
             <BR>
             <INPUT TYPE="TEXT" NAME="text2">
         </FORM>
     </BODY>
 </HTML>
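As a rough illustration of how the letters-only character class differs from \w, the sketch below runs both patterns against the same text. The sample string and variable names are assumptions chosen for the example. It also shows that, because the pattern is wrapped in parentheses, the matched letters are available both as the whole match (index 0) and as the first captured group (index 1).

```javascript
var text = "route66_ok is a name";

// \w+ accepts digits and underscores as part of the word.
var wordChars = text.match(/\w+/);
console.log(wordChars[0]);       // route66_ok

// [A-Za-z]+ stops at the first character that is not a letter.
var lettersOnly = text.match(/([A-Za-z]+)/);
console.log(lettersOnly[0]);     // route  (the whole match)
console.log(lettersOnly[1]);     // route  (the first parenthesized group)
```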

To be even safer, you should also surround your match with word boundary assertions, \b:

(Listing 20-11.html on the web site)
 <HTML>
     <HEAD>
         <TITLE>Finding Words</TITLE>
         <SCRIPT LANGUAGE="JavaScript">
             <!--
             function displayer()
             {
                 var regexp = /\b([A-Za-z]+)\b/
                 var matches = document.form1.text1.value.match(regexp)
                 // match() returns null when nothing matches, so guard before indexing
                 document.form1.text2.value = matches ? matches[0] : ""
             }
             //-->
         </SCRIPT>
     </HEAD>
     <BODY>
         <H1>Finding Words</H1>
         <FORM NAME="form1">
             <INPUT TYPE="TEXT" NAME="text1" VALUE="JavaScript is the subject." SIZE="30">
             <BR>
             <INPUT TYPE="BUTTON" VALUE="Find First Word" ONCLICK="displayer()">
             <BR>
             <INPUT TYPE="TEXT" NAME="text2">
         </FORM>
     </BODY>
 </HTML>

The \b assertion matches the zero-width transition (not an actual character) between a word character (that is, \w, which matches all alphanumeric and "_" characters) and a non-word character (\W, which matches every character except word characters). It also matches at the very start or end of the text when the adjacent character is a word character.
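To make the effect of \b concrete, here is a small sketch comparing the letters-only pattern with and without the boundary assertions. The sample text is an assumption picked so that the two patterns give different results.

```javascript
var text = "x86 code";

// Without \b, the letters-only class happily matches the "x" inside "x86".
var loose = text.match(/[A-Za-z]+/);
console.log(loose[0]);       // x

// With \b on both sides, "x" is rejected: "x" and "8" are both word
// characters, so there is no boundary between them. The first run of
// letters with a boundary on each side is "code".
var strict = text.match(/\b[A-Za-z]+\b/);
console.log(strict[0]);      // code
console.log(strict.index);   // 4
```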

Tip

A non-word boundary assertion, \B, also exists. Like \b, it is zero-width, and it matches at any position that is not a word boundary.
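A brief sketch of the contrast between \b and \B, using a sample string chosen for illustration: the same three letters are found at different positions depending on which assertion precedes them.

```javascript
var text = "cat concat";

// \bcat requires a word boundary before "cat": the start of the text qualifies.
var atBoundary = /\bcat/.exec(text);
console.log(atBoundary.index);   // 0

// \Bcat requires that there is no boundary before "cat": only the "cat"
// inside "concat" qualifies, because "n" and "c" are both word characters.
var insideWord = /\Bcat/.exec(text);
console.log(insideWord.index);   // 7
```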


As you can see, there are several ways to match words when using regular expressions.



Inside JavaScript
ISBN: 0735712859
Year: 2005
Pages: 492
Authors: Steve Holzner
