Recipe 10.26. Using Patterns to Match Email Addresses or URLs


Problem

You want to determine whether a value looks like an email address or a URL.

Solution

Use a pattern, tuned to the level of strictness you want to enforce.

Discussion

The immediately preceding sections use patterns to identify classes of values such as numbers and dates, which are fairly typical applications for regular expressions. But pattern matching has such widespread applicability that it's impossible to list all the ways you can use it for data validation. To give some idea of a few other types of values that pattern matching can be used for, this section shows a few tests for email addresses and URLs.

To check values that are expected to be email addresses, the pattern should require at least an @ character with nonempty strings on either side:

/.@./ 

That's a pretty minimal test. It's difficult to come up with a fully general pattern that covers all the legal values and rejects all the illegal ones,[*] but it's easy to write a pattern that's at least a little more restrictive. For example, in addition to being nonempty, the username and the domain name should consist entirely of characters other than @ characters or spaces:

[*] To see how hard it can be to perform pattern matching for email addresses, check Jeffrey E. F. Friedl's Mastering Regular Expressions (O'Reilly).

/^[^@ ]+@[^@ ]+$/ 
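To see how the two email patterns differ in practice, here is a minimal Perl sketch that runs both over a few sample values. The helper names looks_like_email_minimal and looks_like_email_strict are hypothetical and chosen for illustration. Only the two patterns come from the discussion above.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two patterns from the text.
# Each returns 1 for a match and 0 otherwise.
sub looks_like_email_minimal {
  my ($v) = @_;
  return $v =~ /.@./ ? 1 : 0;              # one char, @, one char
}
sub looks_like_email_strict {
  my ($v) = @_;
  return $v =~ /^[^@ ]+@[^@ ]+$/ ? 1 : 0;  # no @ or space on either side
}

for my $v ('paul@example.com', 'a b@c', 'x@@y', '@example.com') {
  printf "%-18s minimal=%s strict=%s\n", $v,
    (looks_like_email_minimal($v) ? 'yes' : 'no'),
    (looks_like_email_strict($v)  ? 'yes' : 'no');
}
```

Both patterns accept paul@example.com and reject @example.com. Only the stricter one rejects a b@c, which has a space in the username, and x@@y, which has a second @.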

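You may also want to require that the domain name part contain at least two parts separated by a dot. The repeated group in the following pattern also allows names with more than two parts, such as mail.example.com, and the trailing $ ensures that nothing follows the final part:

/^[^@ ]+@[^@ .]+(\.[^@ .]+)+$/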
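A dotted-domain check can be wrapped the same way. This sketch uses a hypothetical helper named looks_like_email_dotted. The pattern has a repeated (\.[^@ .]+)+ group and a trailing $ anchor, so domains with more than two parts are accepted and text following the domain is rejected.

```perl
use strict;
use warnings;

# Hypothetical helper: a username with no @ or space, then a domain
# made of two or more dot-separated labels. The (...)+ group allows
# extra labels; the $ anchor rejects anything after the last label.
sub looks_like_email_dotted {
  my ($v) = @_;
  return $v =~ /^[^@ ]+@[^@ .]+(\.[^@ .]+)+$/ ? 1 : 0;
}

for my $v ('paul@example.com', 'paul@mail.example.com',
           'paul@localhost', 'paul@example.', 'paul@example.com x') {
  printf "%s: %s\n", $v,
    (looks_like_email_dotted($v) ? 'ok' : 'rejected');
}
```

The first two values pass. The pattern rejects paul@localhost because it has no dot, paul@example. because the last label is empty, and paul@example.com x because of the trailing text.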

To look for URL values that begin with a protocol specifier of http://, ftp://, or mailto:, use an alternation that matches any of them at the beginning of the string. These values contain slashes, so it's easier to delimit the pattern with a character other than / so that you don't have to escape the slashes with backslashes:

m#^(http://|ftp://|mailto:)#i 

The alternatives in the pattern are grouped within parentheses because otherwise the ^ will anchor only the first of them to the beginning of the string. The i modifier follows the pattern because protocol specifiers in URLs are not case-sensitive. The pattern is otherwise fairly unrestrictive because it allows anything to follow the protocol specifier. I leave it to you to add further restrictions as necessary.
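The effect of the parentheses is easy to check directly. The following Perl sketch compiles the grouped pattern and an ungrouped variant with qr, using the same # delimiter, and applies both to a string that contains ftp:// somewhere other than at the start. The helper has_url_prefix and the sample values are hypothetical.

```perl
use strict;
use warnings;

# Grouped: ^ applies to every alternative.
my $grouped   = qr#^(http://|ftp://|mailto:)#i;
# Ungrouped: ^ binds only to http://, so ftp:// and mailto:
# can match anywhere in the string.
my $ungrouped = qr#^http://|ftp://|mailto:#i;

# Hypothetical helper using the grouped pattern.
sub has_url_prefix {
  my ($v) = @_;
  return $v =~ $grouped ? 1 : 0;
}

my $v = 'see ftp://ftp.example.com';
printf "grouped: %s, ungrouped: %s\n",
  ($v =~ $grouped   ? 'match' : 'no match'),
  ($v =~ $ungrouped ? 'match' : 'no match');
```

This prints "grouped: no match, ungrouped: match". Because of the i modifier, has_url_prefix also accepts HTTP://www.example.com/.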




MySQL Cookbook
ISBN: 059652708X
Year: 2004
Pages: 375
Authors: Paul DuBois
