Problem

You want to understand the basic building blocks of regular expressions.

Solution

Regular expressions are built by combining literal characters with characters that have special meaning. Start by learning the basic patterns, and then use this knowledge to put together more complex patterns.

Discussion

A regular expression is a pattern constructed using the regular expression syntax and is typically used during text processing and pattern matching. The syntax consists of characters, metacharacters, and metasequences. Characters are interpreted literally, whereas metacharacters and metasequences have special meaning in the regular expression context. For example, the regular expression built from the characters hello matches the string "hello", whereas the regular expression consisting only of the . metacharacter means "any character" and matches "a", "b", "1", etc. Additionally, the regular expression built from the \d metasequence matches any digit, such as "1" or "9".

Before getting too in-depth with the regular expression syntax, let's start by discussing how regular expressions are created in ActionScript 3.0. Regular expressions are built with the RegExp class and can be constructed from either a string describing the pattern or from a regular expression literal. A regular expression literal is a forward slash, followed by the regular expression pattern, followed by another forward slash, such as /pattern/. The following code demonstrates how to create a regular expression for the pattern hello by using both a string and the RegExp constructor, as well as a regular expression literal:

    // Create a pattern for hello using the RegExp class constructor,
    // passing in a string describing the pattern
    var example1:RegExp = new RegExp( "hello" );

    // Create the same hello pattern using a regular expression literal
    var example2:RegExp = /hello/;

Both the example1 and example2 regular expressions match the same pattern, namely the string "hello".
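As a quick check of the claims above, the following minimal sketch creates the hello pattern both ways and tries the . metacharacter and the \d metasequence against a few strings. It relies on the test() method and the source property of the RegExp class, which this recipe has not introduced yet, and the class name RegExpCreationExample is hypothetical, chosen only for this illustration.

```actionscript
package {
    import flash.display.Sprite;

    // Hypothetical document class, used only for this illustration
    public class RegExpCreationExample extends Sprite {
        public function RegExpCreationExample() {
            var fromString:RegExp = new RegExp( "hello" );
            var fromLiteral:RegExp = /hello/;

            // Both forms hold the same pattern text
            trace( fromString.source == fromLiteral.source ); // true

            // test() returns true if the pattern is found in the string
            trace( fromString.test( "hello world" ) ); // true
            trace( fromLiteral.test( "goodbye" ) );    // false

            // The . metacharacter matches any single character
            var anyCharacter:RegExp = /./;
            trace( anyCharacter.test( "a" ) ); // true

            // The \d metasequence matches any digit
            var digit:RegExp = /\d/;
            trace( digit.test( "9" ) ); // true
            trace( digit.test( "x" ) ); // false
        }
    }
}
```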
In general, the pattern is the same regardless of which method you use to create the regular expression. However, when a backslash (\) is part of the regular expression pattern, using a string and the RegExp constructor gets tricky.
Backslashes mark the beginning of an escape sequence inside a string (see Recipe 12.3) and lose their meaning in the regular expression context. That is, the backslash is interpreted as a special string character before being interpreted in the regex. Therefore, if you want a backslash to reach the pattern, you have to use a double backslash in the string approach. The regular expression literal does not have the same problem:

    // Create a regular expression to match a digit (note the double
    // backslash)
    var example1:RegExp = new RegExp( "\\d" );

    // Create a regular expression to match a digit
    var example2:RegExp = /\d/;

    // Create a regular expression that matches a backslash (note the
    // four backslashes in the string)
    var example3:RegExp = new RegExp( "\\\\" );

    // Create a regular expression that matches a backslash
    var example4:RegExp = /\\/;
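One way to see the two levels of escaping is to inspect what the RegExp object actually receives. The sketch below is only an illustration: it uses the source property and the test() method of RegExp, which are not covered in this recipe, and the class name BackslashExample is hypothetical.

```actionscript
package {
    import flash.display.Sprite;

    // Hypothetical document class, used only for this illustration
    public class BackslashExample extends Sprite {
        public function BackslashExample() {
            // The string "\\d" holds two characters: a backslash and a d
            trace( "\\d".length ); // 2

            var fromString:RegExp = new RegExp( "\\d" );
            var fromLiteral:RegExp = /\d/;

            // Both regular expressions end up with the same pattern
            trace( fromString.source );  // \d
            trace( fromLiteral.source ); // \d
            trace( fromString.test( "7" ) ); // true

            // Four backslashes in the string become two in the pattern,
            // which the regex reads as one literal backslash
            var backslash:RegExp = new RegExp( "\\\\" );
            trace( backslash.source ); // \\
            trace( backslash.test( "C:\\temp" ) );    // true
            trace( backslash.test( "no slashes" ) );  // false
        }
    }
}
```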
By now you know that characters in a regular expression pattern are interpreted literally. By combining metacharacters and metasequences with regular characters, you can create powerful combinations useful for matching many pattern types. Let's take a look at the metacharacters, what they mean, and how they might be used. Table 13-1 summarizes the regular expression metacharacters. Any time you want to use one of these metacharacters literally, it must be preceded by a backslash. For example, to match an open curly brace, use the regular expression \{.
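The difference between a metacharacter and its escaped, literal form can be sketched as follows. This is an illustration only: the test strings and the class name EscapeExample are hypothetical, and test() is a RegExp method not introduced in this recipe.

```actionscript
package {
    import flash.display.Sprite;

    // Hypothetical document class, used only for this illustration
    public class EscapeExample extends Sprite {
        public function EscapeExample() {
            // An escaped curly brace matches a literal {
            var openBrace:RegExp = /\{/;
            trace( openBrace.test( "function() {" ) ); // true
            trace( openBrace.test( "no braces" ) );    // false

            // Unescaped, . is a metacharacter that matches any character
            var anyMiddle:RegExp = /a.c/;
            trace( anyMiddle.test( "abc" ) ); // true

            // Escaped, it matches only a literal period
            var literalDot:RegExp = /a\.c/;
            trace( literalDot.test( "abc" ) ); // false
            trace( literalDot.test( "a.c" ) ); // true

            // With the RegExp constructor the backslash must be doubled
            var openBraceFromString:RegExp = new RegExp( "\\{" );
            trace( openBraceFromString.test( "{" ) ); // true
        }
    }
}
```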
Table 13-2 describes the metasequences in the same way, listing what each one matches along with an example.
Table 13-1 and Table 13-2 describe the basic syntax rules that make up regular expressions. By combining characters, metacharacters, and metasequences, you can match a wide variety of patterns. There is more to the story, however. Regular expressions can also include certain flags that indicate if any special processing should be done with the pattern. There are five flags that can be accessed as properties of a RegExp object: global, ignoreCase, multiline, dotall, and extended. The flags must be set when the expression is created; trying to modify a flag on a RegExp instance results in a compile-time error:

    // Generates a compile-time error in strict mode:
    // Property is read-only
    example.global = true;

There are two ways to set flags, depending on which method is used to create the regex. When using the RegExp constructor, you can pass a second string parameter that lists the flags for the regex. When using a regular expression literal, the flags should follow the trailing forward slash that ends the expression:

    // Create a regular expression with the global and ignoreCase flags
    var example1:RegExp = new RegExp( "hello", "gi" );

    // Create a regular expression with the global and ignoreCase flags
    var example2:RegExp = /hello/gi;

By default, all the flags are set to false unless they are explicitly declared when the regex is created. Table 13-3 lists the various flags and their meaning.
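The flag defaults and the effect of ignoreCase can be checked by reading the flag properties back, as in this minimal sketch. The class name FlagExample and the test string are hypothetical, and test() is a RegExp method not introduced in this recipe.

```actionscript
package {
    import flash.display.Sprite;

    // Hypothetical document class, used only for this illustration
    public class FlagExample extends Sprite {
        public function FlagExample() {
            var plain:RegExp = /hello/;
            var flagged:RegExp = new RegExp( "hello", "gi" );

            // Without flags, every flag property is false
            trace( plain.global, plain.ignoreCase ); // false false

            // Only the flags listed at creation time are true
            trace( flagged.global, flagged.ignoreCase ); // true true
            trace( flagged.multiline, flagged.dotall, flagged.extended ); // false false false

            // ignoreCase lets the pattern match regardless of letter case
            trace( plain.test( "HELLO" ) );   // false
            trace( flagged.test( "HELLO" ) ); // true
        }
    }
}
```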
The most commonly used flags are ignoreCase and global, but specifying the extended flag can make regexes easier to read. With the extended flag set, you can insert extra whitespace to highlight the different parts that make up the expression; for example:

    var example1:RegExp = /(a(b)*){2,}/;

    // Use the extended flag for slightly more readability
    var example2:RegExp = /(a (b)* ){2,}/x;

The preceding code creates a regular expression for "a, followed by b any number of times, with the whole expression repeated at least 2 times" and matches "abba" and "abbbabbbbbbbb", but not "abbb".

A key point to remember is that every regex can be reduced to these fundamental building blocks. Understanding this and learning how to break down complex regex patterns will help avoid some of the frustration associated with learning regular expressions. It's worth your time to learn regular expressions, and once you've got them down, they'll prove to be a valuable tool to have on your belt.

See Also

Recipes 9.5 and 12.3. A good reference for regular expressions can be found at http://www.regular-expressions.info. See Mastering Regular Expressions, by Jeffrey Friedl (O'Reilly) for extensive practice with regular expressions and Regular Expressions Pocket Reference, by Tony Stubblebine (O'Reilly) for a quick lookup guide.