10.23.1 Problem
You need to compare a value to a set of values that is difficult to specify literally without writing a really ugly expression.
10.23.2 Solution
Use pattern matching.
10.23.3 Discussion
Pattern matching is a powerful tool for validation because it allows you to test entire classes of values with a single expression. You can also use pattern tests to break up matched values into subparts for further individual testing, or in substitution operations to rewrite matched values. For example, you might break up a matched date into pieces so that you can verify that the month is in the range from 1 to 12 and the day is within the number of days in the month. Or you might use a substitution to reorder MM-DD-YY or DD-MM-YY values into YY-MM-DD format.
The next few sections describe how to use patterns to test for several types of values, but first let's take a quick tour of some general pattern-matching principles. The following discussion focuses on Perl's regular expression capabilities. Pattern matching in PHP and Python is similar, though you should consult the relevant documentation for any differences. For Java, the ORO pattern matching class library offers Perl-style pattern matching; Appendix A indicates where you can get it.
In Perl, the pattern constructor is /pat/:
$it_matched = ($val =~ /pat/); # pattern match
Put an i after the /pat/ constructor to make the pattern match case insensitive:
$it_matched = ($val =~ /pat/i); # case-insensitive match
To use a character other than slash, begin the constructor with m. This can be useful if the pattern itself contains slashes:
$it_matched = ($val =~ m|pat|); # alternate constructor character
To look for a non-match, replace the =~ operator with the !~ operator:
$no_match = ($val !~ /pat/); # negated pattern match
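As a minimal sketch of the four forms just shown, the following script applies each of them to sample strings. The variable names and test values are made up for illustration; they are not taken from the recipes distribution.

```perl
use strict;
use warnings;

my $val = "Hello, World";

my $matched   = ($val =~ /world/)  ? 1 : 0;   # 0: the case differs
my $matched_i = ($val =~ /world/i) ? 1 : 0;   # 1: case is ignored
my $no_match  = ($val !~ /xyz/)    ? 1 : 0;   # 1: xyz does not occur in $val

# With m|...| as the constructor, slashes in the pattern need no escaping.
my $path      = "/usr/local/bin";
my $has_local = ($path =~ m|/local/|) ? 1 : 0;   # 1

print "matched=$matched matched_i=$matched_i ",
      "no_match=$no_match has_local=$has_local\n";
```

The ?: operator is used here only to turn each test result into 1 or 0 so that it prints cleanly.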
To perform a substitution in $val based on a pattern match, use s/pat/replacement/. If pat occurs within $val, it's replaced by replacement. To perform a case-insensitive match, put an i after the last slash. To perform a global substitution that replaces all instances of pat rather than just the first one, add a g after the last slash:
$val =~ s/pat/replacement/; # substitution
$val =~ s/pat/replacement/i; # case-insensitive substitution
$val =~ s/pat/replacement/g; # global substitution
$val =~ s/pat/replacement/ig; # case-insensitive and global
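To see how the i and g modifiers change the result, it may help to run all four substitutions against the same input. This is only an illustrative sketch; the sample string and variable names are invented for the purpose.

```perl
use strict;
use warnings;

my $orig = "Cat cat cat";

# Copy $orig into a new variable, then substitute in the copy.
(my $first   = $orig) =~ s/cat/dog/;    # first case-sensitive match only
(my $first_i = $orig) =~ s/cat/dog/i;   # first match, ignoring case
(my $all     = $orig) =~ s/cat/dog/g;   # every case-sensitive match
(my $all_i   = $orig) =~ s/cat/dog/ig;  # every match, ignoring case

print "$first\n";    # Cat dog cat
print "$first_i\n";  # dog cat cat
print "$all\n";      # Cat dog dog
print "$all_i\n";    # dog dog dog
```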
Here's a list of some of the special pattern elements available in Perl regular expressions:
Pattern | What the pattern matches |
---|---|
^ | Beginning of string |
$ | End of string |
. | Any character |
\s, \S | Whitespace or non-whitespace character |
\d, \D | Digit or non-digit character |
\w, \W | Word (alphanumeric or underscore) or non-word character |
[...] | Any character listed between the square brackets |
[^...] | Any character not listed between the square brackets |
p1\|p2\|p3 | Alternation; matches any of the patterns p1, p2, or p3 |
* | Zero or more instances of preceding element |
+ | One or more instances of preceding element |
{n} | n instances of preceding element |
{m,n} | m through n instances of preceding element |
Many of these pattern elements are the same as those available for MySQL's REGEXP regular expression operator. (See Recipe 4.8.)
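The elements in the table can be tried out with a short script such as the one below. It is a sketch only: the test values and the @checks structure are hypothetical, chosen so that each pattern exercises one or two of the listed elements.

```perl
use strict;
use warnings;

# Each entry: a test value and a pattern built from elements in the table.
my @checks = (
  [ "abc_123", qr/^\w+$/     ],  # \w and +: one or more word characters
  [ "abc 123", qr/^\w+$/     ],  # no match: a space is not a word character
  [ "a b",     qr/^a\sb$/    ],  # \s: a single whitespace character
  [ "2006",    qr/^\d{4}$/   ],  # {n}: exactly four digits
  [ "12345",   qr/^\d{2,4}$/ ],  # no match: {m,n} allows two through four
  [ "grey",    qr/^gr[ae]y$/ ],  # [...]: either a or e
  [ "x",       qr/^[^0-9]$/  ],  # [^...]: one character that is not a digit
  [ "hotdog",  qr/cat|dog/   ],  # alternation, not anchored
);

my @results;
foreach my $check (@checks)
{
  my ($val, $pat) = @{$check};
  my $matched = ($val =~ $pat) ? 1 : 0;
  push (@results, $matched);
  printf ("%-10s %s\n", $val, $matched ? "matches" : "does not match");
}
```

Precompiling each pattern with qr// simply makes it possible to store the patterns in a list next to the values they are tested against.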
To match a literal instance of a character that is special within patterns, such as *, ^, or $, precede it with a backslash. Similarly, to include a character within a character class construction that is special in character classes ([, ], or -), precede it with a backslash. To include a literal ^ in a character class, list it somewhere other than as the first character between the brackets.
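The escaping rules can be illustrated with a few tests. The sample values below are invented for this sketch; the patterns use only the escapes described in the preceding paragraph.

```perl
use strict;
use warnings;

# Outside a character class, backslashes make $, ., and * literal.
my $price_ok = ('$5.00*' =~ /^\$\d+\.\d{2}\*$/) ? 1 : 0;   # 1

# An unescaped . matches any character; an escaped one matches only a period.
my $loose  = ("1x5" =~ /^1.5$/)  ? 1 : 0;   # 1
my $strict = ("1x5" =~ /^1\.5$/) ? 1 : 0;   # 0

# Inside a character class, escape [, ], and -; list ^ anywhere but first.
my $class    = qr/^[\[\]\-^]+$/;
my $class_ok = ("[-^]" =~ $class) ? 1 : 0;   # 1: every character is in the class
my $class_no = ("[a]"  =~ $class) ? 1 : 0;   # 0: a is not in the class

print "price_ok=$price_ok loose=$loose strict=$strict ",
      "class_ok=$class_ok class_no=$class_no\n";
```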
Many of the validation patterns shown in the following sections are of the /^pat$/ form. Beginning and ending a pattern with ^ and $ has the effect of requiring pat to match the entire string that you're testing. This is common in data validation contexts, because it's generally desirable to know that a pattern matches an entire input value, not just part of it. (If you want to be sure that a value represents an integer, for example, it doesn't do you any good to know only that it contains an integer somewhere.) This is not a hard-and-fast rule, however, and sometimes it's useful to perform a more relaxed test by omitting the ^ and $ characters as appropriate. For example, if you want to strip leading and trailing whitespace from a value, use one pattern anchored only to the beginning of the string, and another anchored only to the end:
$val =~ s/^\s+//; # trim leading whitespace
$val =~ s/\s+$//; # trim trailing whitespace
That's such a common operation, in fact, that it's a good candidate for being placed into a utility function. The Cookbook_Utils.pm file contains a function trim_whitespace() that performs both substitutions and returns the result:
$val = trim_whitespace ($val);
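The body of trim_whitespace() is not shown here, so the following is only a guess at what such a function might look like, built from the two substitutions above. Treat it as a hypothetical stand-in; the version in Cookbook_Utils.pm may differ, for example in how it handles an undefined argument.

```perl
use strict;
use warnings;

# Hypothetical stand-in for trim_whitespace() in Cookbook_Utils.pm;
# the actual implementation may differ.
sub trim_whitespace
{
  my $val = shift;

  return $val unless defined $val;   # assumption: pass undef through as is
  $val =~ s/^\s+//;                  # trim leading whitespace
  $val =~ s/\s+$//;                  # trim trailing whitespace
  return $val;
}

my $trimmed = trim_whitespace ("  \t some value \n");
print "[$trimmed]\n";   # [some value]
```

Because the function works on a copy of its argument, the caller's variable changes only when the return value is assigned back to it.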
To remember subsections of a string that is matched by a pattern, use parentheses around the relevant parts of the pattern. After a successful match, you can refer to the matched substrings using the variables $1, $2, and so forth:
if ("abcdef" =~ /^(ab)(.*)$/)
{
  $first_part = $1; # this will be ab
  $the_rest = $2; # this will be cdef
}
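The same technique applies to the date examples mentioned at the start of this discussion. The sketch below assumes a value in MM-DD-YY format, captures its parts for range checking, and then uses the captured parts in a substitution to produce YY-MM-DD. The sample date and variable names are illustrative, and the day check is deliberately simplified.

```perl
use strict;
use warnings;

my $date = "12-31-99";   # assumed to be in MM-DD-YY format

my $valid = 0;
if ($date =~ /^(\d{1,2})-(\d{1,2})-(\d{2})$/)
{
  my ($month, $day, $year) = ($1, $2, $3);
  # Simplified range checks; a fuller test would compare $day
  # against the number of days in $month.
  $valid = 1 if $month >= 1 && $month <= 12 && $day >= 1 && $day <= 31;
}

# Reorder MM-DD-YY into YY-MM-DD by referring to the captured parts
# in the replacement string.
(my $reordered = $date) =~ s/^(\d{1,2})-(\d{1,2})-(\d{2})$/$3-$1-$2/;

print "valid=$valid reordered=$reordered\n";   # valid=1 reordered=99-12-31
```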
To indicate that an element within a pattern is optional, follow it by a ? character. To match values consisting of a sequence of digits, optionally beginning with a minus sign, and optionally ending with a period, use this pattern:
/^-?\d+\.?$/
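Here is a short sketch that runs the pattern /^-?\d+\.?$/ against several made-up values, and then shows what happens when the ^ and $ anchors are omitted. The test values are assumptions chosen to cover both optional elements.

```perl
use strict;
use warnings;

my $pat = qr/^-?\d+\.?$/;

my %result;
foreach my $val ("123", "-123", "123.", "12.5", "abc123", ".")
{
  $result{$val} = ($val =~ $pat) ? 1 : 0;
  printf ("%-8s %s\n", $val, $result{$val} ? "matches" : "does not match");
}

# Without the anchors, it is enough for the value to contain
# a match somewhere, so abc123 passes the test.
my $unanchored = ("abc123" =~ /-?\d+\.?/) ? 1 : 0;
print "unanchored abc123: $unanchored\n";
```

The first three values match; 12.5 fails because digits follow the period, abc123 fails because of the leading letters, and a lone period fails because at least one digit is required.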
You can also use parentheses to group alternations within a pattern. The following pattern matches time values in hh:mm format, optionally followed by AM or PM:
/^\d{1,2}:\d{2}\s*(AM|PM)?$/i
The use of parentheses in that pattern also has the side effect of remembering the optional part in $1. To suppress that side effect, use (?:pat) instead:
/^\d{1,2}:\d{2}\s*(?:AM|PM)?$/i
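The difference between the capturing and non-capturing forms shows up when you look at $1 and $2 after a match. In this sketch the time values are invented, and the third pattern adds capturing parentheses around the hour and minute parts purely for illustration.

```perl
use strict;
use warnings;

# With (AM|PM), the suffix is remembered in $1 when it is present.
my $suffix;
if ("10:30 pm" =~ /^\d{1,2}:\d{2}\s*(AM|PM)?$/i)
{
  $suffix = $1;   # pm
}

# When the optional group takes no part in the match, $1 is undefined.
my $no_suffix = "unset";
if ("9:05" =~ /^\d{1,2}:\d{2}\s*(AM|PM)?$/i)
{
  $no_suffix = $1;   # undef
}

# With (?:AM|PM), only the hour and minute groups are numbered.
my ($hour, $minute);
if ("10:30 pm" =~ /^(\d{1,2}):(\d{2})\s*(?:AM|PM)?$/i)
{
  ($hour, $minute) = ($1, $2);   # 10 and 30
}

print "suffix=$suffix hour=$hour minute=$minute\n";
```

Using (?:pat) for groups that exist only for alternation keeps the numbering of $1, $2, and so forth tied to the parts you actually want to examine.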
That's sufficient background in Perl pattern matching to allow construction of useful validation tests for several types of data values. The following sections provide patterns that can be used to test for broad content types, numbers, temporal values, and email addresses or URLs.
The transfer directory of the recipes distribution contains a test_pat.pl script that reads input values, matches them against several patterns, and reports which patterns each value matches. The script is easily extensible, so you can use it as a test harness to try out your own patterns.