Using Patterns to Match Dates or Times

10.26.1 Problem

You need to make sure a string looks like a date or time.

10.26.2 Solution

Use a pattern that matches the type of temporal value you expect. Be sure to consider issues such as how strict to be about delimiters between subparts and the lengths of the subparts.

10.26.3 Discussion

Dates are a validation headache because they come in so many formats. Pattern tests are extremely useful for weeding out illegal values, but often insufficient for full verification: a date might have a number where you expect a month, but if the number is 13, the date isn't valid. This section introduces some patterns that match a few common date formats. Recipe 10.31 revisits this topic in more detail and discusses how to combine pattern tests with content verification.

To require values to be dates in ISO (CCYY-MM-DD) format, use this pattern:

/^d{4}-d{2}-d{2}$/

The pattern requires - as the delimiter between date parts. To allow either - or / as the delimiter, use a character class between the numeric parts (the slashes are escaped with a backslash to prevent them from being interpreted as the end of the pattern constructor):

/^d{4}[-/]d{2}[-/]d{2}$/

Or you can use a different delimiter around the pattern and avoid the backslashes:

m|^d{4}[-/]d{2}[-/]d{2}$|

To allow any non-digit delimiter (which corresponds to how MySQL operates when it interprets strings as dates), use this pattern:

/^d{4}Dd{2}Dd{2}$/

If you don't require the full number of digits in each part (to allow leading zeros in values like 03 to be missing, for example), just look for three nonempty digit sequences:

/^d+Dd+Dd+$/

Of course, that pattern is so general that it will also match other values such as U.S. Social Security numbers (which have the format 012-34-5678). To constrain the subpart lengths by requiring two to four digits in the year part and one or two digits in the month and day parts, use this pattern:

/^d{2,4}?Dd{1,2}Dd{1,2}$/

For dates in other formats such as MM-DD-YY or DD-MM-YY, similar patterns apply, but the subparts are arranged in a different order. This pattern matches both of those formats:

/^d{2}-d{2}-d{2}$/

If you need to check the values of individual date parts, use parentheses in the pattern and extract the substrings after a successful match. If you're expecting dates to be in ISO format, for example, do something like this:

if ($val =~ /^(d{2,4})D(d{1,2})D(d{1,2})$/)
{
 ($year, $month, $day) = ($1, $2, $3);
}

The library file lib/Cookbook_Utils.pm in the recipes distribution contains several of these pattern tests, packaged as function calls. If the date doesn't match the pattern, they return undef. Otherwise, they return a reference to an array containing the broken-out values for the year, month, and day. This can be useful for performing further checking on the components of the date. For example, is_iso_date( ) looks for dates that match ISO format. It's defined as follows:

sub is_iso_date
{
my $s = shift;

 return undef unless $s =~ /^(d{2,4})D(d{1,2})D(d{1,2})$/;
 return [ $1, $2, $3 ]; # return year, month, day
}

To use the function, do something like this:

my $ref = is_iso_date ($val);
if (defined ($ref))
{
 # $val matched ISO format pattern;
 # check its subparts using $ref->[0] through $ref->[2]
}
else
{
 # $val didn't match ISO format pattern
}

You'll often find additional processing necessary with dates, because although date-matching patterns help to weed out values that are syntactically malformed, they don't assess whether the individual components contain legal values. To do that, some range checking is necessary. That topic is covered later in Recipe 10.31.

If you're willing to skip subpart testing and just want to rewrite the pieces, you can use a substitution. For example, to rewrite values assumed to be in MM-DD-YY format into YY-MM-DD format, do this:

$val =~ s/^(d+)D(d+)D(d+)$/$3-$1-$2/;

Time values are somewhat more orderly than dates, usually being written with hours first and seconds last, with two digits per part:

/^d{2}:d{2}:d{2}$/

To be more lenient, you can allow the hours part to have a single digit, or the seconds part to be missing:

/^d{1,2}:d{2}(:d{2})?$/

You can mark parts of the time with parentheses if you want to range-check the individual parts, or perhaps to reformat the value to include a seconds part of 00 if it happens to be missing. However, this requires some care with the parentheses and the ? characters in the pattern if the seconds part is optional. You want to allow the entire :d{2} at the end of the pattern to be optional, but not to save the : character in $3 if the third time section is present. To accomplish that, use (?:pat), an alternative grouping notation that doesn't save the matched substring. Within that notation, use parentheses around the digits to save them. Then $3 will be undef if the seconds part is not present, but will contain the seconds digits otherwise:

if ($val =~ /^(d{1,2}):(d{2})(?::(d{2}))?$/)
{
 my ($hour, $min, $sec) = ($1, $2, $3);
 $sec = "00" if !defined ($sec); # seconds missing; use 00
 $val = "$hour:$min:$sec";
}

To rewrite times in 12-hour format with AM and PM suffixes into 24-hour format, you can do something like this:

if ($val =~ /^(d{1,2}):(d{2})(?::(d{2}))?s*(AM|PM)?$/i)
{
 my ($hour, $min, $sec) = ($1, $2, $3);
 # supply missing seconds
 $sec = "00" unless defined ($sec);
 # convert 0 .. 11 -> 12 .. 23 for PM times
 $hour += 12 if defined ($4) && uc ($4) eq "PM";
 $val = "$hour:$min:$sec";
}

The time parts are placed into $1, $2, and $3, with $3 set to undef if the seconds part is missing. The suffix goes into $4 if it's present. If the suffix is AM or missing (undef), the value is interpreted as an AM time. If the suffix is PM, the value is interpreted as a PM time.

10.26.4 See Also

This section is just the beginning of what you can do when processing dates for data transfer purposes. Date and time testing and conversion can be highly idiosyncratic, and the sheer number of issues to consider is mind-boggling:

  • What is the basic date format? Dates come in several common styles, such as ISO (CCYY-MM-DD), U.S. (MM-DD-YY), and British (DD-MM-YY) formats. And these are just some of the more standard formats. Many more are possible. For example, a datafile may contain dates written as June 17, 1959 or as 17 Jun '59.
  • Are trailing times allowed on dates, or perhaps required? When times are expected, is the full time required, or just the hour and minute?
  • Do you allow values like now or today?
  • Are date parts required to be delimited by a certain character, such as - or /, or are other delimiters allowed?
  • Are date parts required to have a specific number of digits? Or are leading zeros on month and year values allowed to be missing?
  • Are months written numerically, or are they represented as month names like January or Jan?
  • Are two-digit year values allowed? Should they be converted to have four digits? If so, what is the conversion rule? (What is the transition point within the range 00 to 99 at which values change from one century to another?)
  • Should date parts be checked to ensure their validity? Patterns can recognize strings that look like dates or times, but while they're extremely useful for detecting malformed values, they may not be sufficient. A value like 1947-15-99 may match a pattern but isn't a legal date. Pattern testing is thus most useful in conjunction with range checks on the individual parts of the date.

The prevalence of these issues in data transfer problems means that you'll probably end up writing some of your own validators on occasion to handle very specific date formats. Later sections of this chapter can provide additional assistance. For example, Recipe 10.30 covers conversion of two-digit year values to four-digit form, and Recipe 10.31 discusses how to perform validity checking on components of date or time values.

Using the mysql Client Program

Writing MySQL-Based Programs

Record Selection Techniques

Working with Strings

Working with Dates and Times

Sorting Query Results

Generating Summaries

Modifying Tables with ALTER TABLE

Obtaining and Using Metadata

Importing and Exporting Data

Generating and Using Sequences

Using Multiple Tables

Statistical Techniques

Handling Duplicates

Performing Transactions

Introduction to MySQL on the Web

Incorporating Query Resultsinto Web Pages

Processing Web Input with MySQL

Using MySQL-Based Web Session Management

Appendix A. Obtaining MySQL Software

Appendix B. JSP and Tomcat Primer

Appendix C. References



MySQL Cookbook
MySQL Cookbook
ISBN: 059652708X
EAN: 2147483647
Year: 2005
Pages: 412
Authors: Paul DuBois

Flylib.com © 2008-2020.
If you may any questions please contact us: flylib@qtcs.net