10.32.1 Problem
There's a given date-processing operation that you need to perform frequently, so you want to write a utility that does it for you.
10.32.2 Solution
The utilities in this section provide some examples showing how to do that.
10.32.3 Discussion
Due to the idiosyncratic nature of dates, you'll probably find it necessary to write date converters from time to time. This section shows some sample converters that serve various purposes:
All three scripts are located in the transfer directory of the recipes distribution. They assume datafiles are in tab-delimited, linefeed-terminated format. (Use cvt_file.pl to work with files in a different format.)
Our first date-processing utility, isoize_date.pl, looks for dates in U.S. format and rewrites them into ISO format. You'll recognize that it's modeled after the general input-processing loop shown in Recipe 10.21, with some extra stuff thrown in to perform a specific type of conversion:
#! /usr/bin/perl -w # isoize_date.pl - Read input data, look for values that match # a date pattern, convert them to ISO format. Also converts # 2-digit years to 4-digit years, using a transition point of 70. # By default, this looks for dates in MM-DD-[CC]YY format. # Assumes tab-delimited, linefeed-terminated input lines. # Does not check whether dates actually are valid (for example, # won't complain about 13-49-1928). use strict; # transition point at which 2-digit years are assumed to be 19XX # (below they are treated as 20XX) my $transition = 70; while (<>) { chomp; my @val = split (/ /, $_, 10000); # split, preserving all fields for my $i (0 .. @val - 1) { my $val = $val[$i]; # look for strings in MM-DD-[CC]YY format next unless $val =~ /^(d{1,2})D(d{1,2})D(d{2,4})$/; my ($month, $day, $year) = ($1, $2, $3); # to interpret dates as DD-MM-[CC]YY instead, replace preceding # line with the following one: #my ($day, $month, $year) = ($1, $2, $3); # convert 2-digit years to 4 digits, then update value in array $year += ($year >= $transition ? 1900 : 2000) if $year < 100; $val[$i] = sprintf ("%04d-%02d-%02d", $year, $month, $day); } print join (" ", @val) . " "; } exit (0);
If you feed isoize_date.pl an input file that looks like this:
Fred 04-13-70 Mort 09-30-69 Brit 12-01-57 Carl 11-02-73 Sean 07-04-63 Alan 02-14-65 Mara 09-17-68 Shepard 09-02-75 Dick 08-20-52 Tony 05-01-60
It produces the following output:
Fred 1970-04-13 Mort 2069-09-30 Brit 2057-12-01 Carl 1973-11-02 Sean 2063-07-04 Alan 2065-02-14 Mara 2068-09-17 Shepard 1975-09-02 Dick 2052-08-20 Tony 2060-05-01
isoize_date.pl serves a specific purpose: It converts only from U.S. to ISO format. It does not perform validity checking on date subparts or allow the transition point for adding the century to be specified. A more general tool would be more useful. The next script, cvt_date.pl, extends the capabilities of isoize_date.pl; it recognizes input dates in ISO, US, or British formats and converts any of them to any other. It also can convert two-digit years to four digits, allows you to specify the conversion transition point, and can warn about bad dates. As such, it can be used to preprocess input for loading into MySQL, or for postprocessing data exported from MySQL for use by other programs.
cvt_date.pl understands the following options:
--iformat= format, --oformat= format, --format= format,
Set the date format for input, output, or both. The default format value is iso; cvt_date.pl also recognizes any string beginning with us or br as indicating U.S. or British date format.
--add-century
Convert two-digit years to four digits.
--columns= column_list
Convert dates only in the named columns. By default, cvt_date.pl looks for dates in all columns. If this option is given, column_list should be a list of one or more column positions separated by commas. Positions begin at 1.
--transition= n
Specify the transition point for two-digit to four-digit year conversions. The default transition point is 70. This option turns on --add-century.
--warn
Warn about bad dates. (Note that this option can produce spurious warnings if the dates have two-digit years and you don't specify --add-century, because leap year testing won't always be accurate in that case.)
I won't show the code for cvt_date.pl here (most of it is taken up with processing command-line options), but you can examine the source for yourself if you like. As an example of how cvt_date.pl works, suppose you have a file newdata.txt with the following contents:
name1 01/01/99 38 name2 12/31/00 40 name3 02/28/01 42 name4 01/02/03 44
Running the file through cvt_date.pl with options indicating that the dates are in U.S. format and that the century should be added produces this result:
% cvt_date.pl --iformat=us --add-century newdata.txt name1 1999-01-01 38 name2 2000-12-31 40 name3 2001-02-28 42 name4 2003-01-02 44
To produce dates in British format instead with no year conversion, do this:
% cvt_date.pl --iformat=us --oformat=br newdata.txt name1 01-01-99 38 name2 31-12-00 40 name3 28-02-01 42 name4 02-01-03 44
cvt_date.pl has no knowledge of the meaning of each data column, of course. If you have a non-date column with values that match the pattern, it will rewrite that column, too. To deal with that, specify a --columns option to limit the columns that cvt_date.pl attempts to convert.
isoize_date.pl and cvt_date.pl both operate on dates written in all-numeric formats. But dates in datafiles often are written differently, in which case it may be necessary to write a special purpose script to process them. Suppose an input file contains dates in the following format (these represent the dates on which U.S. states were admitted to the Union):
Delaware Dec. 7, 1787 Pennsylvania Dec 12, 1787 New Jersey Dec. 18, 1787 Georgia Jan. 2, 1788 Connecticut Jan. 9, 1788 Massachusetts Feb. 6, 1788 Maryland Apr. 28, 1788 South Carolina May 23, 1788 New Hampshire Jun. 21, 1788 Virginia Jun 25, 1788 ...
The dates consist of a three-character month abbreviation (possibly followed by a period), the numeric day of the month, a comma, and the numeric year. To import this file into MySQL, you'd need to convert the dates to ISO format, resulting in a file that looks like this:
Delaware 1787-12-07 Pennsylvania 1787-12-12 New Jersey 1787-12-18 Georgia 1788-01-02 Connecticut 1788-01-09 Massachusetts 1788-02-06 Maryland 1788-04-28 South Carolina 1788-05-23 New Hampshire 1788-06-21 Virginia 1788-06-25 ...
That's a somewhat specialized kind of transformation, though this general type of problem (converting a specific date format) is hardly uncommon. To perform the conversion, identify the dates as those values matching an appropriate pattern, map month names to the corresponding numeric values, and reformat the result. The following script, monddccyy_to_iso.pl, illustrates how to do this:
#! /usr/bin/perl -w # monddccyy_to_iso.pl - convert dates from mon[.] dd, ccyy to ISO format # Assumes tab-delimited, linefeed-terminated input use strict; my %map = # map 3-char month abbreviations to numeric month ( "jan" => 1, "feb" => 2, "mar" => 3, "apr" => 4, "may" => 5, "jun" => 6, "jul" => 7, "aug" => 8, "sep" => 9, "oct" => 10, "nov" => 11, "dec" => 12 ); while (<>) { chomp; my @val = split (/ /, $_, 10000); # split, preserving all fields for my $i (0 .. @val - 1) { # reformat the value if it matches the pattern, otherwise assume # it's not a date in the required format and leave it alone if ($val[$i] =~ /^([^.]+).? (d+), (d+)$/) { # use lowercase month name my ($month, $day, $year) = (lc ($1), $2, $3); if (exists ($map{$month})) { $val[$i] = sprintf ("%04d-%02d-%02d", $year, $map{$month}, $day); } else { # warn, but don't reformat warn "$val[$i]: bad date? "; } } } print join (" ", @val) . " "; } exit (0);
The script only does reformatting, it doesn't validate the dates. To do that, modify the script to use the Cookbook_Utils.pm module by adding this statement after the use strict line:
use Cookbook_Utils;
That gives the script access to the module's is_valid_date( ) routine. To use it, change the reformatting section of the script to look like this:
if (exists ($map{$month}) && is_valid_date ($year, $map{$month}, $day)) { $val[$i] = sprintf ("%04d-%02d-%02d", $year, $map{$month}, $day); } else { # warn, but don't reformat warn "$val[$i]: bad date? "; }
Using the mysql Client Program
Writing MySQL-Based Programs
Record Selection Techniques
Working with Strings
Working with Dates and Times
Sorting Query Results
Generating Summaries
Modifying Tables with ALTER TABLE
Obtaining and Using Metadata
Importing and Exporting Data
Generating and Using Sequences
Using Multiple Tables
Statistical Techniques
Handling Duplicates
Performing Transactions
Introduction to MySQL on the Web
Incorporating Query Resultsinto Web Pages
Processing Web Input with MySQL
Using MySQL-Based Web Session Management
Appendix A. Obtaining MySQL Software
Appendix B. JSP and Tomcat Primer
Appendix C. References