Defining a Simple Pattern

Before you can use one of PHP's built-in regular expression functions, you have to be able to define a pattern that the function will use for matching purposes. PHP has a number of rules for creating a pattern, many of which are similar to those you'll use with Perl, C, and Java. You can use these rules separately or in combination, making your pattern either quite simple or very complex.

In order to explain how patterns are created, I'll start by introducing the symbols used, then discuss how to group characters together, and finish with classes. The combination of symbols, groupings, and classes defines your pattern. As a formatting rule, I'll define my patterns within quotes ("") and will indicate what the pattern matches in italics.

Literals

The first type of character you will use for defining patterns is a literal. A literal is a value that is written exactly as it is interpreted. For example, the pattern "a" will only match the letter a, "ab" will only match ab, and so forth.
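As a minimal sketch of literal matching: this section does not name a matching function, so the example assumes preg_match() with / delimiters around the pattern, which may differ from the functions used later in the chapter. The variable names are my own.

```php
<?php
// Assumption: preg_match() stands in for whichever matching function
// the chapter uses. It returns 1 when the pattern is found, 0 otherwise.
$matchesA  = preg_match('/a/', 'cat');   // 'cat' contains the literal a
$matchesAb = preg_match('/ab/', 'cab');  // 'cab' contains a followed by b
$noMatchAb = preg_match('/ab/', 'bat');  // 'bat' has b then a, never ab

var_dump($matchesA, $matchesAb, $noMatchAb);
```

Note that a literal pattern is found anywhere inside the string being searched, not only when the whole string equals the pattern.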

Literals allow us to match exact combinations, but regular expressions would be fairly useless if you could only match literals (in which case some of the string functions would suffice; see Chapter 5, Using Strings). You can also use specific symbols, which have their own meaning, to create less rigid patterns. I'll discuss these next.

Metacharacters

Just one step beyond literals in terms of complexity are metacharacters. These are special symbols that have a meaning beyond their literal value. While "a" simply means a, the first metacharacter, the period (.), will match any single character ("." matches a, b, c, etc.). This is pretty straightforward, although I should note that if you want to refer to a metacharacter literally, you will need to escape it, much like you escape a quotation mark to print it. Hence "\." will match the period.
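The difference between the period as a metacharacter and an escaped period can be sketched as follows, again assuming preg_match() with / delimiters (an assumption, not something this section specifies). Single-quoted PHP strings are used so the backslash reaches the pattern unchanged.

```php
<?php
// Assumption: preg_match() with "/" delimiters, returning 1 or 0.
$dotMatchesLetter  = preg_match('/a.c/', 'abc');   // . stands in for b
$dotMatchesHyphen  = preg_match('/a.c/', 'a-c');   // . stands in for -
$escapedVsLetter   = preg_match('/a\.c/', 'abc');  // \. requires a real period
$escapedVsPeriod   = preg_match('/a\.c/', 'a.c');  // the period is present

var_dump($dotMatchesLetter, $dotMatchesHyphen, $escapedVsLetter, $escapedVsPeriod);
```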

Next, there are three metacharacters that allow for multiple occurrences: "a*" will match zero or more a's (no a's, a, aa, aaa, etc.); "a+" matches one or more a's (a, aa, aaa, etc., but there must be at least one); and "a?" will match up to one a (a or no a's match).
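To see the three repetition metacharacters side by side, here is a small sketch. It assumes preg_match() with / delimiters and uses its optional third argument to capture the matched text; the literal b in front of each pattern is only there to give the match a fixed starting point.

```php
<?php
// Assumption: preg_match() fills its third argument with the matched text
// at index 0 and returns 1 on a match, 0 otherwise.
$starFound = preg_match('/ba*/', 'b', $star);        // zero a's is enough
$plusOnB   = preg_match('/ba+/', 'b');               // needs at least one a
$plusFound = preg_match('/ba+/', 'baaa', $plus);     // takes every a it can
$optFound  = preg_match('/ba?/', 'baaa', $optional); // takes at most one a

var_dump($star[0], $plusOnB, $plus[0], $optional[0]);
```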

To match a certain quantity of a letter, put the quantity between curly braces ({}), stating either a specific number, a minimum, or a minimum and a maximum. Thus, "a{3}" will only match aaa; "a{3,}" will match aaa, aaaa, etc. (three or more a's); and "a{3,5}" matches just aaa, aaaa, and aaaaa (between three and five).
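The curly-brace quantities can be tried out the same way. This sketch assumes preg_match() with / delimiters and builds the test strings with str_repeat() so the number of a's is explicit; the variable names are hypothetical.

```php
<?php
// Assumption: preg_match() returns 1 or 0 and stores the matched text in
// index 0 of its third argument.
$twoAs   = 'b' . str_repeat('a', 2);
$threeAs = 'b' . str_repeat('a', 3);
$sevenAs = 'b' . str_repeat('a', 7);

$exactOnTwo   = preg_match('/ba{3}/', $twoAs);    // only two a's available
$exactOnThree = preg_match('/ba{3}/', $threeAs);  // exactly three a's
preg_match('/ba{3,}/', $sevenAs, $atLeast);       // three or more: takes all
preg_match('/ba{3,5}/', $sevenAs, $range);        // stops after five a's

var_dump($exactOnTwo, $exactOnThree, $atLeast[0], $range[0]);
```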

Lastly there is the caret (^), pronounced like the vegetable and sometimes referred to as the hat, which will match a string that begins with the letter following the caret. There is also the dollar sign ($), for anything that ends with the preceding letter. Accordingly, "^a." will match any two-character string beginning with an a, followed by whatever, while ".a$" will correspond to any two-character string ending with an a. Therefore, "^a$" will only match a string consisting of a single a, whereas "a" on its own matches an a anywhere in a string.
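The caret and dollar sign can be sketched like this, assuming preg_match() with / delimiters (my assumption, as before). Single quotes keep PHP from treating the dollar sign as the start of a variable.

```php
<?php
// Assumption: preg_match() returns 1 on a match and 0 otherwise.
$startsWithA   = preg_match('/^a./', 'ab');  // begins with a, then any character
$startsWithB   = preg_match('/^a./', 'ba');  // begins with b, so ^a fails
$endsWithA     = preg_match('/.a$/', 'ba');  // any character, then a final a
$endsWithB     = preg_match('/.a$/', 'ab');  // ends with b, so a$ fails
$singleA       = preg_match('/^a$/', 'a');   // the whole string is one a
$doubleA       = preg_match('/^a$/', 'aa');  // two a's cannot fit between ^ and $

var_dump($startsWithA, $startsWithB, $endsWithA, $endsWithB, $singleA, $doubleA);
```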

Table 8.1. This is a fairly complete list of special characters used to define your regular expression patterns (including metacharacters but not literals such as a, b, c, etc.).
Special Characters for Regular Expressions
Character   Matches
.           any character
^a          begins with a
a$          ends with a
a+          at least one a
a?          zero or one a
\n          new line
\t          tab
\           escape
(ab)        ab grouped
a|b         a or b
a{2}        aa
a{1,}       a, aa, aaa, etc.
a{1,3}      a, aa, aaa
[a-z]       any lowercase letter
[A-Z]       any uppercase letter
[0-9]       any digit

Regular expressions also make use of the pipe (|) as the equivalent of or. Therefore, "a|b" will match the strings a or b, and "gr(e|a)y" matches both potential spellings of the color; the parentheses keep the choice limited to the e or the a. (Using the pipe within patterns is called alternation.)
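Alternation can be sketched as follows, assuming preg_match() with / delimiters. The parentheses in the second pattern confine the either-or choice to a single letter; the word groy is a made-up non-match for contrast.

```php
<?php
// Assumption: preg_match() returns 1 on a match and 0 otherwise.
$pipeOnB = preg_match('/a|b/', 'b');        // b satisfies the second option
$pipeOnC = preg_match('/a|b/', 'c');        // neither a nor b is present
$grey    = preg_match('/gr(e|a)y/', 'grey');
$gray    = preg_match('/gr(e|a)y/', 'gray');
$groy    = preg_match('/gr(e|a)y/', 'groy'); // o is not one of the options

var_dump($pipeOnB, $pipeOnC, $grey, $gray, $groy);
```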

Practically, of course, there's little use to matching repetitions of a letter in a string, but these examples are good ways to demonstrate how a symbol works. You should begin by focusing on understanding what the various symbols mean and how they are used. I'll build on this knowledge over the course of this chapter.

Tip

When using curly braces to specify a number of characters, you must always include the minimum number while the maximum is optional: "a{3}" and "a{3,}" are acceptable but "a{,3}" is not.


Tip

To match special characters (^.[]$()|*+?{}\) literally in a pattern, they need to be escaped (a backslash put before them). This is true for the metacharacters and the grouping symbols (parentheses and brackets). You can also use the backslash to match new lines ("\n") and tabs ("\t"), essentially creating a metacharacter out of a literal.
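Here is a short sketch of escaping in practice, assuming preg_match() with / delimiters. It also uses PHP's preg_quote() function, which this section does not mention, as one way to add the backslashes automatically; treat that as an optional convenience.

```php
<?php
// Assumption: preg_match() returns 1 on a match and 0 otherwise.
$unescaped = preg_match('/1+1/', '1+1');   // + means "one or more 1s" here
$escaped   = preg_match('/1\+1/', '1+1');  // \+ matches a literal plus sign

// preg_quote() escapes special characters; the second argument also
// escapes the delimiter if it appears in the text.
$safe   = preg_quote('1+1', '/');
$quoted = preg_match('/' . $safe . '/', '1+1');

// \t inside the pattern matches a real tab character in the subject.
$tab = preg_match('/a\tb/', "a\tb");

var_dump($unescaped, $escaped, $quoted, $tab);
```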


Tip

Table 8.1 lists all these symbols and combinations for your reference (along with those being described hereafter).


PHP for the World Wide Web (Visual QuickStart Guide)
ISBN: 0201727870
Year: 2001
Pages: 116
Authors: Larry Ullman
