The Elements of a Regular Expression


In this section we'll go through the various metacharacters you can use in building regular expressions and look at numerous examples. This is a more formal treatment of regular expressions and, unlike the introduction to the chapter, it organizes the metacharacters according to their function.

Characters and Metacharacters

Regular expressions are made up of regular characters (which match the same characters in the text), metacharacters, and special symbols. Many metacharacters are regular characters prefixed by the backslash character. The character "w" in a regular expression will match the same character in the text. If you prefix it with a backslash, you turn it into a metacharacter: the \w metacharacter will match any word character. The "d" character will match the same character in the text, but the \d metacharacter will match a digit. The \W and \D metacharacters will match any non-word character and any non-digit character, respectively. Some symbols also have special meaning in a regular expression. The period matches a single character (any character except the newline, including the space) in the text, and the square brackets are used to declare a set or range of characters. To match any of these symbols in the text, you must prefix them with the backslash.
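The difference between a regular character and its metacharacter counterpart can be sketched with a few match counts. This is a minimal illustration, assuming the Regex class of the System.Text.RegularExpressions namespace; the sample string and the module name are made up for the example.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MetacharacterDemo
    Sub Main()
        ' Hypothetical sample text, chosen only for illustration (16 characters).
        Dim sample As String = "Room 42, wing d."

        ' The plain character "d" matches only the letter d.
        Console.WriteLine(Regex.Matches(sample, "d").Count)    ' 1

        ' \d matches any digit; \D matches any character that is not a digit.
        Console.WriteLine(Regex.Matches(sample, "\d").Count)   ' 2
        Console.WriteLine(Regex.Matches(sample, "\D").Count)   ' 14

        ' \w matches word characters; \W matches spaces and punctuation here.
        Console.WriteLine(Regex.Matches(sample, "\w").Count)   ' 11
        Console.WriteLine(Regex.Matches(sample, "\W").Count)   ' 5

        ' The unescaped period matches every character in this one-line string;
        ' the escaped period matches only the literal period at the end.
        Console.WriteLine(Regex.Matches(sample, ".").Count)    ' 16
        Console.WriteLine(Regex.Matches(sample, "\.").Count)   ' 1
    End Sub
End Module
```

Because Visual Basic string literals do not treat the backslash as an escape character, the patterns can be typed exactly as they appear in the text.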

The simplest regular expression you can build is a regular string. The Matches method will locate all the instances of the regular expression in the text (the Match method returns only the first one), much as if you were calling the InStr() function, or the IndexOf method of the String class, repeatedly. If you use the string "Basic" as a regular expression, you will locate all instances of the word "Basic" in the text. By default, the search is case-sensitive. If you turn on the IgnoreCase option, you will also locate all instances of the words "BASIC," "basic," and so on.
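A literal search of this kind might look like the following sketch. It assumes the static Regex.Matches and Regex.Match methods and the RegexOptions.IgnoreCase option; the sample sentence and the module name are invented for the example.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LiteralSearchDemo
    Sub Main()
        ' Hypothetical sample text containing three spellings of the word.
        Dim sample As String = "Visual Basic, BASIC, and basic are three spellings."

        ' Case-sensitive by default: only "Basic" is found.
        Dim exact As MatchCollection = Regex.Matches(sample, "Basic")
        Console.WriteLine(exact.Count)      ' 1

        ' With IgnoreCase, "Basic", "BASIC", and "basic" are all found.
        Dim anyCase As MatchCollection =
            Regex.Matches(sample, "Basic", RegexOptions.IgnoreCase)
        Console.WriteLine(anyCase.Count)    ' 3

        ' Regex.Match returns the first occurrence only; for a literal
        ' pattern its position agrees with String.IndexOf.
        Dim first As Match = Regex.Match(sample, "Basic")
        Console.WriteLine(first.Index)              ' 7
        Console.WriteLine(sample.IndexOf("Basic"))  ' 7
    End Sub
End Module
```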

To specify a more general pattern, you must include one or more metacharacters in the regular expression. One of the most common metacharacters of regular expressions is the period, which matches any character except the newline. The asterisk is another metacharacter; it matches the preceding pattern zero or more times. The expression .* will match each line of the text in its entirety, because the period doesn't match the newline character.
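The behavior of .* can be checked with a short sketch. It assumes default regular expression options and a two-line sample string built with the vbLf constant; the string and the module name are hypothetical.

```vbnet
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module DotStarDemo
    Sub Main()
        ' Hypothetical two-line sample; vbLf is the newline character.
        Dim sample As String =
            "First sentence. Second sentence." & vbLf & "Third sentence."

        ' The match runs past the first period and stops only at the newline.
        Dim first As Match = Regex.Match(sample, ".*")
        Console.WriteLine(first.Value)   ' First sentence. Second sentence.

        ' Because the asterisk also allows zero repetitions, .* can produce
        ' empty matches; skip those and print one match per line of text.
        For Each m As Match In Regex.Matches(sample, ".*")
            If m.Length > 0 Then Console.WriteLine("[" & m.Value & "]")
        Next
    End Sub
End Module
```

The loop prints two bracketed matches, one for each line of the sample, which shows that the newline, not the period that ends a sentence, is what stops the match.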

If you want to treat any of the metacharacters in the regular expression as regular characters, you must "escape" them with a backslash. To locate a period followed by an asterisk in the text, use the following regular expression:

\.\*
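As a rough check, the escaped pattern can be run against a string that contains the two literal characters. The sketch below also uses Regex.Escape, a method of the Regex class that is not discussed in this section, to produce the escaped form automatically; the sample text and the module name are invented for the example.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EscapeDemo
    Sub Main()
        ' Hypothetical sample text containing the literal characters ".*" twice.
        Dim sample As String = "Use .* with care. The pattern .* is greedy."

        ' Escaped, the period and the asterisk match only themselves.
        Console.WriteLine(Regex.Matches(sample, "\.\*").Count)   ' 2

        ' Regex.Escape builds the escaped pattern from a literal string.
        Dim escaped As String = Regex.Escape(".*")
        Console.WriteLine(escaped)                                ' \.\*
        Console.WriteLine(Regex.Matches(sample, escaped).Count)   ' 2
    End Sub
End Module
```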

In the following sections you will find descriptions of all metacharacters used in building regular expressions, and examples to demonstrate their usage.

Single Character Metacharacters

A very common metacharacter in building regular expressions is \w, which stands for a "word character": a letter, a digit, or the underscore. Use this metacharacter to specify a character in a word and exclude spaces and punctuation. The period metacharacter matches any character, including spaces and punctuation symbols, whereas the \w metacharacter matches word characters only. The pattern

...ce

Visual Basic .NET Power Tools
ISBN: 0782142427
EAN: 2147483647
Year: 2003
Pages: 178
