Regular Expressions Primer

   

Regular expressions are among the most powerful tools available for manipulating text and data. A regular expression is a pattern that gives a developer precise control over searching through, replacing, and extracting data. To manipulate large amounts of text productively, you need to be proficient with regular expressions.

To understand regular expressions, you must be familiar with the characters that make them up. Learning regular expressions is almost like learning another language. Tables 12.1 through 12.6 list some of the characters most frequently used in regular expressions.

Table 12.1. Using Characters with Regular Expressions

Character       Action of Character
()\x            Matches again the text captured by clause number x; clauses are numbered by the position of their left parenthesis
(\w+)\s+\1      Matches any word that occurs twice in a row, such as "is is" in "How is is your day?"
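The backreference in Table 12.1 can be tried out with a short sketch. It uses Python's standard re module as a stand-in, not the VBScript RegExp object this chapter describes, and the helper name find_doubled_words and the stricter \b variant are assumptions added for illustration.

```python
import re

# Pattern exactly as in the table: group 1 captures a word, \1 repeats it.
doubled = re.compile(r"(\w+)\s+\1")

# Assumed refinement (not from the table): \b anchors keep the pattern from
# matching inside longer words, e.g. the "is is" hidden in "This is".
doubled_strict = re.compile(r"\b(\w+)\s+\1\b")


def find_doubled_words(text, pattern=doubled):
    """Return the captured word for every doubled word found in text."""
    return [m.group(1) for m in pattern.finditer(text)]


print(find_doubled_words("How is is your day?"))
```

The table's pattern also fires on "This is", because the "is" at the end of "This" is followed by another "is". The stricter variant is one possible way to avoid that.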

Table 12.2. Multiple Searches of the Clause in the Regular Expression Using Repetition

Character    Action of Character
*            Matches zero or more occurrences; same as {0,}
{x,y}        Matches x to y occurrences of a regular expression
+            Matches one or more occurrences; same as {1,}
{x,}         Matches x or more occurrences of a regular expression
\s{2,}       Matches at least two space characters
?            Matches zero or one occurrence; same as {0,1}
{x}          Matches exactly x occurrences of a regular expression
\d{7}        Matches exactly seven digits
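The repetition characters in Table 12.2 can be checked against small inputs. The following is a minimal sketch in Python's standard re module, used here as a stand-in for the VBScript engine; the helper full and the sample strings are hypothetical.

```python
import re


def full(pattern, text):
    """True when pattern matches the whole of text (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


# Shorthand quantifiers and their brace equivalents behave the same way.
star_pairs = [full(r"ab*", "a"), full(r"ab{0,}", "a")]        # zero b's allowed
plus_pairs = [full(r"ab+", "a"), full(r"ab{1,}", "a")]        # one b required
opt_pairs = [full(r"ab?", "abb"), full(r"ab{0,1}", "abb")]    # two b's is too many

# Counted repetition from the table.
seven_digits = re.compile(r"\d{7}")      # exactly seven digits
space_run = re.compile(r"\s{2,}")        # two or more whitespace characters
a_run = re.compile(r"a{2,4}")            # between two and four a's

print(space_run.sub(" ", "too   many    spaces"))
```

Quantifiers are greedy by default, so a{2,4} takes four a's from a run of five and leaves the last one unmatched.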

Table 12.3. Using Characters to Search Through Literals

Character       Action of Character
\{              Matches {
\}              Matches }
\xxx            Matches the ASCII character expressed by the octal number xxx
\50             Matches "(" or chr(40)
\+              Matches +
\[              Matches [
\]              Matches ]
\f              Matches a form feed
\v              Matches a vertical tab
\.              Matches .
\\              Matches \
\|              Matches |
\*              Matches *
\r              Matches a carriage return
\?              Matches ?
\xdd            Matches the ASCII character expressed by the hex number dd
\x28            Matches "(" or chr(40)
\t              Matches a horizontal tab
\(              Matches (
\)              Matches )
Alphanumeric    Matches alphabetical and numerical characters literally
\n              Matches a new line
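A few of the literal escapes in Table 12.3 are sketched below with Python's standard re module, which stands in for the VBScript engine here. One dialect difference is worth flagging: Python reads a bare \50 as a reference to group 50, so the octal form needs a leading zero. The variable names and sample strings are hypothetical.

```python
import re

# \( \+ \) match the literal characters instead of acting as operators.
sum_in_parens = re.compile(r"\(\d+\+\d+\)")

# \x28 (hex) and \050 (octal) both denote "(", which is chr(40).
hex_paren = re.compile(r"\x28")
octal_paren = re.compile(r"\050")

# \t matches a horizontal tab, so it can split tab-separated fields.
fields = re.split(r"\t", "name\tcity\tzip")

# re.escape builds the escaped form of an arbitrary literal string.
literal = "1+1=2?"
escaped_matches = re.fullmatch(re.escape(literal), literal) is not None

print(sum_in_parens.search("total (2+3) items").group())
```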

Table 12.4. Using Characters to Search Strings

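Character    Action of Character
\b           Matches a word boundary, the position between a word character and a non-word character
es\b         Matches the es at the end of "movies" in "lots of movies"
^            Matches the beginning of a string
^C           Matches the first C in "Current Characters"
\B           Matches a non-word boundary, any position that is not a word boundary
$            Matches only the end of a string
s$           Matches the last s in "She sells sea shells"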
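The position-matching characters in Table 12.4 consume no text of their own; they only constrain where a match may occur. The sketch below illustrates this with Python's standard re module as a stand-in for the VBScript engine. The "estimates" example and all variable names are assumptions added for illustration.

```python
import re

# es\b: "es" only where a word boundary follows, as at the end of "movies".
ends_word = [m.start() for m in re.finditer(r"es\b", "lots of movies")]

# \Bes: "es" only where no word boundary precedes it, so the leading "es"
# of "estimates" is skipped and the trailing one is found.
inside_word = [m.start() for m in re.finditer(r"\Bes", "estimates")]

# ^C: a C at the very start of the string, not the one in "Characters".
starts = [m.start() for m in re.finditer(r"^C", "Current Characters")]

# s$: an s at the very end of the string; a trailing space defeats it.
phrase = "She sells sea shells"
last_s = re.search(r"s$", phrase)
no_match = re.search(r"s$", phrase + " ")

print(ends_word, inside_word, starts, last_s.start(), no_match)
```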

Table 12.5. Developing Complex Regular Expressions with Alternation and Grouping

Character         Action of Character
|                 Alternation joins clauses into one regular expression, and then matches any of the individual clauses
(uv)|(wx)|(yz)    Matches "uv" or "wx" or "yz"
()                Groups several elements into a single clause; groups may be nested
(xy)?(z)          Matches "xyz" or "z"
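Alternation (written with | in most regular expression dialects) and grouping can be combined as in Table 12.5. This is a minimal sketch using Python's standard re module in place of the VBScript engine; the nested date-range style pattern at the end is a hypothetical example, not one from the table.

```python
import re

# Alternation: any one of the three clauses may match.
either = re.compile(r"(uv)|(wx)|(yz)")
found = [m.group() for m in either.finditer("uv-wx-yz-ab")]

# Grouping with ?: the whole clause (xy) is optional, (z) is required.
optional_prefix = re.compile(r"(xy)?(z)")
with_prefix = optional_prefix.fullmatch("xyz").groups()
without_prefix = optional_prefix.fullmatch("z").groups()

# Nested groups are numbered by the position of their left parenthesis.
nested = re.compile(r"((\d+)-(\d+))")
nested_groups = nested.fullmatch("10-20").groups()

print(found, with_prefix, without_prefix, nested_groups)
```

An optional group that takes no part in a match is reported as None in Python; other engines may report an empty string instead.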

Table 12.6. Customized Grouping by Putting Expressions Within Braces

Character    Action of Character
\s           Matches any space character; same as [ \t\r\n\v\f]
\S           Matches any non-space character; same as [^ \t\r\n\v\f]
.            Matches any character except \n
\d           Matches any digit; same as [0-9]
\D           Matches any non-digit; same as [^0-9]
[xyz]        Matches any one character enclosed in the character set
\w           Matches any word character; same as [a-zA-Z_0-9]
\W           Matches any non-word character; same as [^a-zA-Z_0-9]
[^xyz]       Matches any one character not enclosed in the character set
[^a-e]       Matches "h", "o", "r", and "s" in "horse", but not "e"
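The character classes in Table 12.6 each match a single character at a time. The sketch below tries them with Python's standard re module, a stand-in for the VBScript engine. The re.ASCII flag is an assumption made so that \d, \w, and \s cover only the ASCII sets listed in the table; the sample strings are hypothetical.

```python
import re

# re.ASCII restricts \d, \w and \s to the ASCII sets listed in the table;
# without it, Python also accepts other Unicode digits, letters and spaces.
digits = re.findall(r"\d", "a1b22", re.ASCII)
non_digits = re.findall(r"\D", "a1b22", re.ASCII)
words = re.findall(r"\w+", "a_b-c d", re.ASCII)
non_word = re.findall(r"\W", "a_b-c d", re.ASCII)
pieces = re.split(r"\s+", "one\ttwo \n three", flags=re.ASCII)

# Bracketed sets: one character from the set, or one character outside it.
in_set = re.findall(r"[xyz]", "xylophone zone")
outside_range = re.findall(r"[^a-e]", "horse")

# The dot matches every character except the newline.
dot = re.findall(r".", "a\nb")

print(outside_range)
```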

There are some basic concepts you need to understand when using regular expressions. The string that defines a regular expression is called a pattern. The pattern must be set, through the Pattern property of the RegExp object, before the regular expression can be used.

Two properties control how the pattern is applied. The Global property decides whether the regular expression is compared to all possible matches in a string or only the first one. The IgnoreCase property decides whether the comparison ignores case. Both default to False. The Test method searches a string and reports whether the pattern can be matched: if Test finds a match, it returns True; if it cannot, it returns False.
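The behavior of Test together with a case-sensitivity switch can be approximated as follows. This is only a sketch in Python's standard re module, not the VBScript RegExp object itself, and the function name is_match and its ignore_case parameter are hypothetical.

```python
import re


def is_match(pattern, text, ignore_case=False):
    """Hypothetical analog of Test: True if pattern matches anywhere in text."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, text, flags) is not None


# Matching is case-sensitive unless asked otherwise, like IgnoreCase = False.
title = "Special Edition Using ASP.NET"
print(is_match(r"asp", title))
print(is_match(r"asp", title, ignore_case=True))
```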

Another helpful method of regular expressions is Replace. Replace takes two strings as its arguments, the search string and the replace string, and tries to match the regular expression in the search string. If it is able to match the expression, it returns the search string with the match replaced by the replace string. If it is not able to match the expression, the original search string is returned. The Execute method performs a similar search. Rather than returning a string, however, Execute returns a Matches collection object that contains a Match object for every match it finds.
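The two methods described above can be approximated in a few lines. The sketch uses Python's standard re module rather than the VBScript RegExp object, and the names replace_matches, replace_all, and execute are hypothetical. The replace_all switch is an assumption that mimics choosing between the first match only and every match.

```python
import re


def replace_matches(pattern, text, replacement, replace_all=False):
    """Hypothetical analog of Replace.

    count=1 stops after the first match; count=0 replaces every match.
    When nothing matches, the original text comes back unchanged.
    """
    return re.sub(pattern, replacement, text, count=0 if replace_all else 1)


def execute(pattern, text):
    """Hypothetical analog of Execute: one match object per match found."""
    return list(re.finditer(pattern, text))


print(replace_matches(r"\s{2,}", "a  b  c", " ", replace_all=True))
print([m.group() for m in execute(r"\d+", "10 apples, 20 pears")])
```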

Two properties of the Matches collection object are worth mentioning: Item and Count. The Item property gives access to the Match objects in the collection by index, either at random or in sequence. The Count property holds the number of Match objects in the collection.

Each Match object has a read-only property called FirstIndex, which gives the position in the original string where the match begins. FirstIndex uses a zero-based offset, so the first character of the string is at position 0. Two other properties of the Match object are helpful: Value and Length. Value contains the matched text and is the default property of the Match object. Length is the total length of the matched text.
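The properties discussed above have rough counterparts in other regular expression libraries. The sketch below maps them onto Python's standard re module purely for illustration; the mapping is an assumption, and the sample string is hypothetical.

```python
import re

text = "Room 12, floor 3"
matches = list(re.finditer(r"\d+", text))

count = len(matches)              # plays the role of Count
first = matches[0]                # plays the role of Item(0)
second = matches[1]               # plays the role of Item(1)

first_index = first.start()       # zero-based offset, like FirstIndex
value = first.group()             # the matched text, like Value
length = len(first.group())       # length of the matched text, like Length

print(count, first_index, value, length)
```

Because the offset is zero-based, slicing the original string from first_index for length characters gives back the matched text.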

To use regular expressions as objects, you need VBScript Version 5.0. The VBScript RegExp object emulates JScript's RegExp and String objects. In syntax, however, VBScript is most similar to Visual Basic.

Regular expressions are also powerful because they can be called from sources outside VBScript. This is possible because the VBScript regular expression engine is implemented as a COM object. Visual Basic, for example, can use it to manipulate regular expressions.

Keep in mind that regular expressions might not be completely consistent from program to program. When you use different programs, check that each pattern produces the desired effect. If you find any discrepancies, consult the online manual pages.

   


Special Edition Using ASP.NET
ISBN: 0789725606
Year: 2002
Pages: 233
