Understanding Metacharacters and Metasequences


Regular expressions are composed of standard characters such as letters and numbers as well as special characters and sequences called metacharacters and metasequences. These metacharacters and metasequences are what enable regular expressions to match abstract patterns. For example, using the metasequence \d, you can match any digit, which is more abstract than matching a specific digit.
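As a minimal sketch of that idea, the following compares a literal digit with the \d metasequence. It is written as bare statements in the style of the other examples in this section; the variable names and the sample string are made up for illustration.

```actionscript
// Hypothetical sample text, for illustration only.
var sample:String = "Room 42, floor 7";

// A literal character matches only itself.
trace(sample.match(/7/g));   // 7

// \d matches any single digit; the g flag returns every match.
trace(sample.match(/\d/g));  // 4,2,7
```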

The metacharacters used by regular expressions enable you to match specific parts of a string, group characters, and even perform logical operations. The list is relatively short, and it is summarized in Table 16.2.

Table 16.2. Metacharacters Used in Regular Expressions

Metacharacter  Description
^              The start of the string, or the start of a line when the m flag is set
$              The end of the string, or the end of a line when the m flag is set
\              Escapes a metacharacter or metasequence so that it is interpreted literally
.              Any character; includes the newline character only when the s flag is set
*              Zero or more occurrences of the preceding item
+              One or more occurrences of the preceding item
?              Zero or one occurrence of the preceding item
()             A group
[]             A character class
|              Either the item on the left or the item on the right
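To make a few of these metacharacters concrete, here is a short sketch that combines ^, the m flag, a group, and the | metacharacter. The sample string and variable names are hypothetical and chosen only for this illustration.

```actionscript
// Hypothetical three-line sample string.
var lines:String = "cat nap\ndog run\ncat walk";

// With the m flag, ^ matches at the start of every line.
// Inside the group, | chooses between the two alternatives.
var eachLine:RegExp = /^(cat|dog)/gm;
trace(lines.match(eachLine));   // cat,dog,cat

// Without the m flag, ^ matches only at the start of the whole string.
var firstOnly:RegExp = /^(cat|dog)/g;
trace(lines.match(firstOnly));  // cat
```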


The metasequences are sequences of characters that are interpreted in a specific manner by regular expressions. Table 16.3 summarizes the regular expression metasequences.

Table 16.3. Metasequences Used in Regular Expressions

Metasequence  Description
{n}           n occurrences of the preceding item
{n,}          n or more occurrences of the preceding item
{n,m}         Between n and m occurrences of the preceding item
\A            The start of the string
\b            The border between a word character (a-z, A-Z, 0-9, or _) and a non-word character, including the start and end of a string
\B            The border between two word characters or two non-word characters
\d            Any digit
\D            Any non-digit
\n            Newline
\r            Carriage return
\s            Any whitespace character
\S            Any non-whitespace character
\t            Tab
\unnnn        The Unicode character represented by the hexadecimal character code nnnn
\w            Any word character (a-z, A-Z, 0-9, or _)
\W            Any non-word character
\xnn          The character represented by the hexadecimal ASCII value nn
\z            The very end of the string, after any final newline character
\Z            The end of the string, or the position just before a final newline character
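The following sketch shows how a few of these metasequences can be combined. The sample string and variable name are assumptions made for the example, not part of any API.

```actionscript
// Hypothetical sample text.
var entry:String = "Order 1234 shipped to zone 56 on day 7";

// \d{2,4} is a run of two to four digits; the \b on each side
// requires the run to stand alone as a whole "word".
trace(entry.match(/\b\d{2,4}\b/g));  // 1234,56

// \w+ matches each run of word characters, so the number of
// matches is the number of words in the string.
trace(entry.match(/\w+/g).length);   // 9
```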


Using Character Classes

Character classes are denoted by square brackets ([]), and they enable you to specify a set of characters for one position within a regular expression. For example, the following regular expression uses a character class to match any substring that starts with b, continues with any vowel, and ends with t.

var pattern:RegExp = /b[aeiou]t/g;
var string:String = "The bat lost the bet, but he didn't mind a bit.";
trace(string.match(pattern));  // bat,bet,but,bit


Most metacharacters and metasequences aren't interpreted as such within a character class. For example, when placed within a character class, {5} is interpreted literally as the left curly brace, the digit 5, and the right curly brace. The exceptions are the metasequences \n, \r, \t, \unnnn, and \xnn. In addition, the -, ], and \ characters have special meaning within character classes.
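As a small illustration, the sketch below places several metacharacters inside a character class, where they match themselves without any escaping. The sample string and variable name are hypothetical.

```actionscript
// Hypothetical sample text.
var expression:String = "3.5 + 2 * (4 - 1)";

// Inside the class, +, *, (, and ) are ordinary characters.
trace(expression.match(/[+*()]/g));  // +,*,(,)

// Outside a class, + is a quantifier and must be escaped
// with a backslash to match a literal plus sign.
trace(expression.match(/\+/g));      // +
```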

The - (hyphen) character within a character class can indicate a range of characters. For example, the following code defines a regular expression that matches any lowercase alphabetical character:

var pattern:RegExp = /[a-z]/;


You can define valid ranges of uppercase and lowercase alphabetical characters, digits, and ASCII character codes. If you use a - character in a position where it does not define a valid range, it is interpreted literally. For example, the following defines a character class that matches any lowercase letter, any digit, or the - character:

var pattern:RegExp = /[a-z-0-9]/;
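To see that class in action, the sketch below applies it with a + quantifier to a made-up file name. The sample string and variable name are assumptions for illustration.

```actionscript
// Hypothetical sample text.
var fileName:String = "my-file_01.TXT";

// Lowercase letters, digits, and the hyphen match; the underscore,
// the dot, and the uppercase letters do not.
trace(fileName.match(/[a-z-0-9]+/g));  // my-file,01
```

Many authors place a literal hyphen at the very start or end of the class, as in /[a-z0-9-]/, which should match the same set of characters and is easier to read at a glance.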


The ] character closes a character class. If you want to match the literal ] character within a character class, you have to escape it. The backslash character (\) is the escape character. The following example matches all lowercase characters or the right square bracket character:

var pattern:RegExp = /[a-z\]]/;


If you want to match the literal backslash character, you can escape it with a preceding backslash character. The following matches any lowercase character or the backslash character:

var pattern:RegExp = /[a-z\\]/;
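The two escapes can be combined in a single class. The following sketch looks for right square brackets and backslashes in a made-up path string; note that the backslashes must also be escaped inside the string literal itself.

```actionscript
// Hypothetical sample text. Each \\ in the string literal is one
// backslash, so the string holds: C:\temp\[draft]
var path:String = "C:\\temp\\[draft]";

// \] matches a literal ] and \\ matches a literal backslash.
trace(path.match(/[\]\\]/g));  // \,\,]
```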


Working with Quantifiers

The metacharacters and metasequences *, +, ?, {n}, {n,}, and {n,m} are quantifiers. They allow you to specify repetitions within patterns. Quantifiers are applied to the item preceding them. An item can be a character, metasequence, character class, or group.
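The sketch below shows how the result changes depending on the item a quantifier follows. The sample strings are invented for the example.

```actionscript
// + applies only to the preceding character, the letter a.
trace("haaa ha".match(/ha+/g));              // haaa,ha

// When the preceding item is a group, the whole group repeats.
// The lone "ha" is skipped because {2,} needs at least two repetitions.
trace("ha haha hahaha".match(/(ha){2,}/g));  // haha,hahaha

// ? makes the preceding item optional.
trace("color colour".match(/colou?r/g));     // color,colour
```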

The following example uses the + quantifier to find all the substrings that consist of one or more alphabetical characters. The i flag makes the match case-insensitive, so uppercase letters are included:

var pattern:RegExp = /[a-z]+/ig;
var string:String = "There is no path to peace. Peace is the path.";
trace(string.match(pattern));  // There,is,no,path,to,peace,Peace,is,the,path


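The following code matches runs of 4 or 5 alphabetical characters. Because no word in the sample string is longer than 5 characters, the result is the list of words that are 4 or 5 characters long:

var pattern:RegExp = /[a-z]{4,5}/ig;
var string:String = "There is no path to peace.\nPeace is the path.";
trace(string.match(pattern));  // There,path,peace,Peace,path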
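With longer words, the same pattern also matches pieces of those words. One way to restrict the match to whole words, sketched here with a made-up sample string, is to add the \b metasequence on both sides of the pattern.

```actionscript
// Hypothetical sample text containing words longer than 5 characters.
var phrase:String = "Regular expressions match patterns";

// Without word borders, longer words are matched in pieces.
trace(phrase.match(/[a-z]{4,5}/ig));      // Regul,expre,ssion,match,patte

// With \b on both sides, only whole words of 4 or 5 letters match.
trace(phrase.match(/\b[a-z]{4,5}\b/ig));  // match
```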





Advanced ActionScript 3 with Design Patterns
ISBN: 0321426568
Year: 2004
Pages: 132