Validation by Pattern Matching

10.23.1 Problem

You need to compare a value to a set of values that is difficult to specify literally without writing a really ugly expression.

10.23.2 Solution

Use pattern matching.

10.23.3 Discussion

Pattern matching is a powerful tool for validation because it allows you to test entire classes of values with a single expression. You can also use pattern tests to break up matched values into subparts for further individual testing, or in substitution operations to rewrite matched values. For example, you might break up a matched date into pieces so that you can verify that the month is in the range from 1 to 12 and the day is within the number of days in the month. Or you might use a substitution to reorder MM-DD-YY or DD-MM-YY values into YY-MM-DD format.
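The two ideas in that paragraph can be sketched in a few lines of Perl. The function names is_valid_mmddyy() and mmddyy_to_yymmdd() are hypothetical helpers invented for this illustration, not part of the recipes distribution, and the sketch assumes hyphen-separated values with a two-digit year:

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if $val looks like MM-DD-YY with a
# month in the range 1..12 and a day that is legal for that month.
sub is_valid_mmddyy
{
  my $val = shift;
  # capture the month and day as $1 and $2
  return 0 unless $val =~ /^(\d{1,2})-(\d{1,2})-\d{2}$/;
  my ($month, $day) = ($1, $2);
  return 0 if $month < 1 || $month > 12;
  # February is allowed 29 days here, because this sketch does not
  # attempt to work out leap years from a two-digit year
  my @days_in_month = (31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
  return ($day >= 1 && $day <= $days_in_month[$month - 1]) ? 1 : 0;
}

# Hypothetical helper: rewrite MM-DD-YY as YY-MM-DD with a substitution.
# Values that do not match the pattern are returned unchanged.
sub mmddyy_to_yymmdd
{
  my $val = shift;
  $val =~ s/^(\d{2})-(\d{2})-(\d{2})$/$3-$1-$2/;
  return $val;
}

my $status = is_valid_mmddyy ("02-30-04") ? "valid" : "invalid";
print "$status\n";                      # prints invalid
my $reordered = mmddyy_to_yymmdd ("12-31-04");
print "$reordered\n";                   # prints 04-12-31
```

The parentheses in each pattern capture the pieces of the date, which is what makes both the range checks and the reordering possible.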

The next few sections describe how to use patterns to test for several types of values, but first let's take a quick tour of some general pattern-matching principles. The following discussion focuses on Perl's regular expression capabilities. Pattern matching in PHP and Python is similar, though you should consult the relevant documentation for any differences. For Java, the ORO pattern matching class library offers Perl-style pattern matching; Appendix A indicates where you can get it.

In Perl, the pattern constructor is /pat/:

$it_matched = ($val =~ /pat/); # pattern match

Put an i after the /pat/ constructor to make the pattern match case insensitive:

$it_matched = ($val =~ /pat/i); # case-insensitive match

To use a character other than slash, begin the constructor with m. This can be useful if the pattern itself contains slashes:

$it_matched = ($val =~ m|pat|); # alternate constructor character

To look for a non-match, replace the =~ operator with the !~ operator:

$no_match = ($val !~ /pat/); # negated pattern match

To perform a substitution in $val based on a pattern match, use s/pat/replacement/. If pat occurs within $val, it's replaced by replacement. To perform a case-insensitive match, put an i after the last slash. To perform a global substitution that replaces all instances of pat rather than just the first one, add a g after the last slash:

$val =~ s/pat/replacement/; # substitution
$val =~ s/pat/replacement/i; # case-insensitive substitution
$val =~ s/pat/replacement/g; # global substitution
$val =~ s/pat/replacement/ig; # case-insensitive and global
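
To see how the modifiers differ in practice, it may help to apply all four forms to the same string. The following is only an illustrative sketch; substitution_variants() is a hypothetical name and the sample string is made up:

```perl
use strict;
use warnings;

# Hypothetical helper: apply each substitution variant to a copy of
# $orig and return the four results
sub substitution_variants
{
  my $orig = shift;
  (my $first  = $orig) =~ s/cat/dog/;    # first case-sensitive match only
  (my $nocase = $orig) =~ s/cat/dog/i;   # first match, ignoring case
  (my $global = $orig) =~ s/cat/dog/g;   # every case-sensitive match
  (my $both   = $orig) =~ s/cat/dog/ig;  # every match, ignoring case
  return ($first, $nocase, $global, $both);
}

print join ("\n", substitution_variants ("Cat, cat, CAT")), "\n";
# prints:
#   Cat, dog, CAT
#   dog, cat, CAT
#   Cat, dog, CAT
#   dog, dog, dog
```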

Here's a list of some of the special pattern elements available in Perl regular expressions:

Pattern     What the pattern matches
^           Beginning of string
$           End of string
.           Any character
\s, \S      Whitespace or non-whitespace character
\d, \D      Digit or non-digit character
\w, \W      Word (alphanumeric or underscore) or non-word character
[...]       Any character listed between the square brackets
[^...]      Any character not listed between the square brackets
p1|p2|p3    Alternation; matches any of the patterns p1, p2, or p3
*           Zero or more instances of preceding element
+           One or more instances of preceding element
{n}         n instances of preceding element
{m,n}       m through n instances of preceding element

Many of these pattern elements are the same as those available for MySQL's REGEXP regular expression operator. (See Recipe 4.8.)

To match a literal instance of a character that is special within patterns, such as *, ^, or $, precede it with a backslash. Similarly, to include a character within a character class construction that is special in character classes ([, ], or -), precede it with a backslash. To include a literal ^ in a character class, list it somewhere other than as the first character between the brackets.
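
As a rough illustration of these escaping rules, the sketch below uses two hypothetical helper functions (the names and the price format are assumptions made for this example). The first escapes $ and . so that they match literally. The second builds a character class containing [, ], -, and ^:

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if $val is a price such as $19.99.
# \$ and \. match a literal dollar sign and a literal period.
sub looks_like_price
{
  my $val = shift;
  return ($val =~ /^\$\d+\.\d{2}$/) ? 1 : 0;
}

# Hypothetical helper: return 1 if $val consists only of digits and
# the characters [ ] - ^
# \[ \] and \- are escaped inside the class; ^ is literal because it
# is not the first character after the opening bracket.
sub only_digits_and_brackets
{
  my $val = shift;
  return ($val =~ /^[\d\[\]\-^]+$/) ? 1 : 0;
}

foreach my $val ('$19.99', '19.99', '$19x99')
{
  my $result = looks_like_price ($val) ? "price" : "not a price";
  print "$val: $result\n";
}
```

Without the backslash before the period, the third value would be accepted, because an unescaped . matches any character.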

Many of the validation patterns shown in the following sections are of the /^pat$/ form. Beginning and ending a pattern with ^ and $ has the effect of requiring pat to match the entire string that you're testing. This is common in data validation contexts, because it's generally desirable to know that a pattern matches an entire input value, not just part of it. (If you want to be sure that a value represents an integer, for example, it doesn't do you any good to know only that it contains an integer somewhere.) This is not a hard-and-fast rule, however, and sometimes it's useful to perform a more relaxed test by omitting the ^ and $ characters as appropriate. For example, if you want to strip leading and trailing whitespace from a value, use one pattern anchored only to the beginning of the string, and another anchored only to the end:

$val =~ s/^\s+//; # trim leading whitespace
$val =~ s/\s+$//; # trim trailing whitespace

That's such a common operation, in fact, that it's a good candidate for being placed into a utility function. The Cookbook_Utils.pm file contains a function trim_whitespace() that performs both substitutions and returns the result:

$val = trim_whitespace ($val);
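
The version in Cookbook_Utils.pm is not reproduced here. A minimal sketch of such a function, assuming it simply applies the two substitutions to a copy of its argument, might look like this:

```perl
use strict;
use warnings;

# A possible implementation, written for illustration; the actual
# function in Cookbook_Utils.pm may differ in its details.
sub trim_whitespace
{
  my $val = shift;
  return $val unless defined ($val);    # pass undef through untouched
  $val =~ s/^\s+//;                     # trim leading whitespace
  $val =~ s/\s+$//;                     # trim trailing whitespace
  return $val;
}

my $trimmed = trim_whitespace ("  some value \t\n");
print "[$trimmed]\n";                   # prints [some value]
```

Because the function works on a copy, the caller's variable changes only if the result is assigned back to it.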

To remember subsections of a string that is matched by a pattern, use parentheses around the relevant parts of the pattern. After a successful match, you can refer to the matched substrings using the variables $1, $2, and so forth:

if ("abcdef" =~ /^(ab)(.*)$/)
{
 $first_part = $1; # this will be ab
 $the_rest = $2; # this will be cdef
}

To indicate that an element within a pattern is optional, follow it by a ? character. To match values consisting of a sequence of digits, optionally beginning with a minus sign, and optionally ending with a period, use this pattern:

/^-?\d+\.?$/
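
A quick way to check a pattern like this is to run it against a few values. In the sketch below, the pattern is written with \d for a digit and \. for a literal period; is_signed_digits() is a hypothetical name chosen for the example:

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if $val is a run of digits with an
# optional leading minus sign and an optional trailing period
sub is_signed_digits
{
  my $val = shift;
  return ($val =~ /^-?\d+\.?$/) ? 1 : 0;
}

foreach my $val ("123", "-123", "123.", "12.5", "--1", "-")
{
  my $result = is_signed_digits ($val) ? "matches" : "does not match";
  print "$val: $result\n";
}
# The first three values match; the last three do not.
```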

You can also use parentheses to group alternations within a pattern. The following pattern matches time values in hh:mm format, optionally followed by AM or PM:

/^\d{1,2}:\d{2}\s*(AM|PM)?$/i

The use of parentheses in that pattern also has the side-effect of remembering the optional part in $1. To suppress that side-effect, use (?:pat) instead:

/^\d{1,2}:\d{2}\s*(?:AM|PM)?$/i
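
The difference between the two forms shows up when you look at $1 after a match. The following sketch uses two hypothetical helpers, time_suffix() and is_time(), written only to illustrate the point:

```perl
use strict;
use warnings;

# Hypothetical helper using the capturing form. Returns the AM/PM
# suffix in uppercase, an empty string if the time has no suffix,
# or undef if $val is not in hh:mm format at all.
sub time_suffix
{
  my $val = shift;
  return undef unless $val =~ /^\d{1,2}:\d{2}\s*(AM|PM)?$/i;
  # $1 is undefined when the optional group did not take part
  return defined ($1) ? uc ($1) : "";
}

# Hypothetical helper using the non-capturing form, for cases where
# only a yes/no answer is needed and $1 should not be set
sub is_time
{
  my $val = shift;
  return ($val =~ /^\d{1,2}:\d{2}\s*(?:AM|PM)?$/i) ? 1 : 0;
}

my $suffix = time_suffix ("12:30 pm");
print "$suffix\n";                      # prints PM
```

Both patterns accept the same set of values. The non-capturing form is the better fit when the suffix itself is not needed.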

That's sufficient background in Perl pattern matching to allow construction of useful validation tests for several types of data values. The following sections provide patterns that can be used to test for broad content types, numbers, temporal values, and email addresses or URLs.

The transfer directory of the recipes distribution contains a test_pat.pl script that reads input values, matches them against several patterns, and reports which patterns each value matches. The script is easily extensible, so you can use it as a test harness to try out your own patterns.
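
The test_pat.pl script itself is not shown here. As a rough idea of how a harness of this kind could be organized, the sketch below keeps the patterns in a table and reports the name of each pattern that a value matches. The pattern names, the patterns themselves, and the matching_patterns() function are all assumptions made for this illustration, and the sample values are hard-coded rather than read from input:

```perl
use strict;
use warnings;

# Hypothetical pattern table; these entries are illustrative and are
# not taken from test_pat.pl. Add a row to test a new pattern.
my @patterns =
(
  [ "integer",     qr/^-?\d+$/ ],
  [ "all letters", qr/^[a-z]+$/i ],
  [ "hh:mm time",  qr/^\d{1,2}:\d{2}$/ ],
);

# Return the names of all patterns in the table that $val matches
sub matching_patterns
{
  my $val = shift;
  return map { $_->[0] } grep { $val =~ $_->[1] } @patterns;
}

# Report the matches for a few sample values
foreach my $val ("42", "abc", "12:30", "?")
{
  my @names = matching_patterns ($val);
  print "$val: ", (@names ? join (", ", @names) : "no match"), "\n";
}
```

Keeping the patterns in a data structure rather than in a chain of if statements is one way to make such a script easy to extend.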

MySQL Cookbook
ISBN: 059652708X
Year: 2005
Pages: 412
Authors: Paul DuBois