Using Patterns to Match Email Addresses and URLs

10.27.1 Problem

You want to determine whether or not a value looks like an email address or a URL.

10.27.2 Solution

Use a pattern, tuned to the level of strictness you want to enforce.

10.27.3 Discussion

The immediately preceding sections use patterns to identify classes of values such as numbers and dates, which are fairly typical applications for regular expressions. But pattern matching is so widely applicable that it's impossible to list every way you can use it for data validation. To suggest a few other kinds of values that patterns can check, this section shows some tests for email addresses and URLs.

To check values that are expected to be email addresses, the pattern should require at least an @ character with nonempty strings on either side:

/.@./

That's a pretty minimal test. It's difficult to come up with a fully general pattern that covers all the legal values and rejects all the illegal ones, but it's easy to write a pattern that's at least a little more restrictive.[3] For example, in addition to being nonempty, the username and the domain name should consist entirely of characters other than @ characters or spaces:

/^[^@ ]+@[^@ ]+$/

[3] To see how hard it can be to perform pattern matching for email addresses, check Appendix E in Jeffrey Friedl's Mastering Regular Expressions (O'Reilly).

You may also wish to require that the domain name part contain at least two parts separated by a dot:

/^[^@ ]+@[^@ .]+\.[^@ .]+/
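
To compare the three levels of strictness side by side, here is a minimal sketch that translates the patterns into Python's re module. The patterns above are written in Perl syntax; the choice of Python, the function name email_checks, and the sample values are assumptions made for illustration. The dot in the third pattern is escaped so that it matches a literal period rather than any character.

```python
import re

# Python translations of the three Perl patterns, loosest to strictest.
MINIMAL = re.compile(r".@.")
NO_AT_OR_SPACE = re.compile(r"^[^@ ]+@[^@ ]+$")
DOTTED_DOMAIN = re.compile(r"^[^@ ]+@[^@ .]+\.[^@ .]+")

def email_checks(value):
    """Report which of the three patterns the value satisfies."""
    return {
        "minimal": MINIMAL.search(value) is not None,
        "no_at_or_space": NO_AT_OR_SPACE.search(value) is not None,
        "dotted_domain": DOTTED_DOMAIN.search(value) is not None,
    }

# Hypothetical sample values, chosen to show where the patterns disagree.
for candidate in ["fred@example.com", "fred@localhost", "a b@c@d", "fred"]:
    print(candidate, email_checks(candidate))
```

Note that the third pattern has no $ anchor. That lets it accept domains with more than two parts (such as mail.example.com), but it also means that anything at all may follow the part that matched.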

To look for URL values that begin with a protocol specifier of http://, ftp://, or mailto:, use an alternation that matches any of them at the beginning of the string. These values contain slashes, so it's easier to use a delimiter other than / around the pattern, which avoids having to escape each slash with a backslash:

m#^(http://|ftp://|mailto:)#i

The alternatives in the pattern are grouped within parentheses because otherwise the ^ will anchor only the first of them to the beginning of the string. The i modifier follows the pattern because protocol specifiers in URLs are not case sensitive. The pattern is otherwise fairly unrestrictive, because it allows anything to follow the protocol specifier. I leave it to you to add further restrictions as necessary.
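
The effect of the grouping and of the case-insensitive match can be checked with a short sketch. As an assumption for illustration, it translates the Perl pattern into Python's re module; the names URL_PREFIX, UNGROUPED, and looks_like_url are hypothetical, and UNGROUPED exists only to show what happens when the parentheses are left out.

```python
import re

# Python translation of m#^(http://|ftp://|mailto:)#i
URL_PREFIX = re.compile(r"^(http://|ftp://|mailto:)", re.IGNORECASE)

# The same alternatives without the grouping parentheses, for comparison:
# here ^ anchors only the http:// alternative.
UNGROUPED = re.compile(r"^http://|ftp://|mailto:", re.IGNORECASE)

def looks_like_url(value):
    """True if value begins with one of the three protocol specifiers."""
    return URL_PREFIX.search(value) is not None

print(looks_like_url("HTTP://www.example.com/"))              # True
print(looks_like_url("see ftp://example.com"))                # False
print(UNGROUPED.search("see ftp://example.com") is not None)  # True
```

The ungrouped version accepts a value that merely contains ftp:// somewhere in the middle, which is the problem the parentheses prevent. The pattern lists exactly three specifiers, so a value beginning with https:// does not match unless you add that alternative.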

MySQL Cookbook
ISBN: 059652708X
Year: 2005
Pages: 412
Authors: Paul DuBois
