Section 7.21. Formatting and Printing DateTime Values | The Ruby Way, Second Edition: Solutions and Techniques in Ruby Programming (2nd Edition)

7.20. Retrieving a Date/Time Value from a String

A date and time can be written as a string in many different ways, with abbreviations, varying punctuation, different field orderings, and so on. Because of this variety, writing code to decipher such a string can be daunting. Consider these examples:

```ruby
s1 = "9/13/98 2:15am"
s2 = "1961-05-31"
s3 = "11 July 1924"
s4 = "April 17, 1929"
s5 = "20 July 1969 16:17 EDT"  # That's one small step...
s6 = "Mon Nov 13 2000"
s7 = "August 24, 79"           # Destruction of Pompeii
s8 = "8/24/79"
```

Fortunately, much of the work has already been done for us. The ParseDate module provides a single method, parsedate, which returns an array of elements in this order: year, month, day, hour, minute, second, time zone, and day of week. Any fields that cannot be determined are returned as nil.

```ruby
require "parsedate"
include ParseDate

p parsedate(s1)       # [98, 9, 13, 2, 15, nil, nil, nil]
p parsedate(s2)       # [1961, 5, 31, nil, nil, nil, nil, nil]
p parsedate(s3)       # [1924, 7, 11, nil, nil, nil, nil, nil]
p parsedate(s4)       # [1929, 4, 17, nil, nil, nil, nil, nil]
p parsedate(s5)       # [1969, 7, 20, 16, 17, nil, "EDT", nil]
p parsedate(s6)       # [2000, 11, 13, nil, nil, nil, nil, 1]
p parsedate(s7)       # [79, 8, 24, nil, nil, nil, nil, nil]
p parsedate(s8, true) # [1979, 8, 24, nil, nil, nil, nil, nil]
```
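ParseDate was removed from the standard library in Ruby 1.9, so the code above runs only on Ruby 1.8 and earlier. On later versions, a rough substitute is the date library's Date._parse, which returns a hash of whatever fields it recognized rather than an array. The wrapper below, parsedate_like, is a hypothetical helper (not part of any library) that rearranges that hash into ParseDate's field order. Treat it as a sketch: Date._parse may not behave identically to parsedate, for example with slash-separated dates, so check its output against your own data.

```ruby
require "date"

# Keys of Date._parse's result hash, listed in the order ParseDate used:
# year, month, day, hour, minute, second, time zone, day of week.
PARSEDATE_FIELDS = [:year, :mon, :mday, :hour, :min, :sec, :zone, :wday]

# Hypothetical helper: returns an eight-element array like parsedate's.
# Fields that Date._parse could not determine come back as nil.
# guess_year is passed as Date._parse's second argument (comp), which
# expands two-digit years when true.
def parsedate_like(str, guess_year = false)
  fields = Date._parse(str, guess_year)
  PARSEDATE_FIELDS.map { |key| fields[key] }
end

p parsedate_like("1961-05-31")       # [1961, 5, 31, nil, nil, nil, nil, nil]
p parsedate_like("Mon Nov 13 2000")  # [2000, 11, 13, nil, nil, nil, nil, 1]
```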

The last two strings illustrate the purpose of parsedate's second parameter, guess_year. Because of our cultural habit of writing a year as two digits, ambiguity can result. The last two strings are interpreted differently because we parse s8 with guess_year set to true, which converts its year to four digits. On the other hand, s7 refers to the eruption of Vesuvius in A.D. 79, so there we definitely want the year left as 79.

The rule for guess_year is this: if the year is less than 100 and guess_year is true, it is converted to a four-digit year. If the year is 70 or greater, 1900 is added to it; otherwise, 2000 is added. Thus 75 translates to 1975, but 65 translates to 2065. This kind of windowing rule is common in the computing world.
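The guess_year rule is simple enough to write out directly. expand_year is a hypothetical helper that implements the rule exactly as this section states it, with 70 as the cutoff; it is an illustration, not parsedate's actual source. A nil year (a field that could not be determined) is passed through unchanged.

```ruby
# Hypothetical helper implementing the guess_year rule as described:
# a year below 100 becomes 19xx if it is 70 or greater, else 20xx.
def expand_year(year, guess_year)
  return year if year.nil? || !guess_year || year >= 100
  year >= 70 ? year + 1900 : year + 2000
end

p expand_year(75, true)   # 1975
p expand_year(65, true)   # 2065
p expand_year(79, false)  # 79, left alone as with s7
p expand_year(98, true)   # 1998, the year we probably meant in s1
```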

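What about s1, where we probably intended 1998 as the year? All is not lost, as long as we pass this number to some other piece of code that interprets it as 1998. (Alternatively, we could parse s1 with guess_year set to true; since 98 is greater than 70, it would become 1998.)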

Note that parsedate does virtually no error checking. For example, if you feed it a string whose weekday does not match its date, it will not detect the discrepancy. It is only a parser; it does that job pretty well, but nothing more.
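Because parsedate leaves validation to the caller, a separate check can be layered on top using the date library's Date.valid_date? and Date#wday. In this sketch, consistent_date? is a hypothetical helper name, and it assumes its input is an eight-element array in parsedate's order with a four-digit year.

```ruby
require "date"

# Hypothetical helper: true only if year/month/day (elements 0..2) form a
# real calendar date and, when a weekday was parsed (element 7, with
# 0 = Sunday), that weekday matches the date.
def consistent_date?(fields)
  year, mon, mday = fields.first(3)
  wday = fields[7]
  return false unless year && mon && mday
  return false unless Date.valid_date?(year, mon, mday)
  wday.nil? || Date.new(year, mon, mday).wday == wday
end

p consistent_date?([2000, 11, 13, nil, nil, nil, nil, 1])   # true: 13 Nov 2000 was a Monday
p consistent_date?([2000, 11, 13, nil, nil, nil, nil, 2])   # false: not a Tuesday
p consistent_date?([2000, 15, 3, nil, nil, nil, nil, nil])  # false: there is no 15th month
```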

Also note an American bias in this code. An American writing 3/4/2001 usually means March 4, 2001; in Europe and most other places, it would mean April 3 instead. If all the data are consistent, this is not a huge problem: because the return value is simply an array, you can mentally switch the meaning of elements 1 and 2. Be aware also that this bias applies even to a date such as 15/3/2000, where it is clear (to us) that 15 is the day; parsedate will happily return 15 as the month value.
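When every input is known to be in day/month/year order, the switch of elements 1 and 2 can be done in code rather than mentally. swap_day_month is a hypothetical helper that returns a swapped copy of a parsedate-style array; it assumes the whole data set uses the same ordering, since it cannot tell which order a single ambiguous string like 3/4/2001 was meant to use.

```ruby
# Hypothetical helper: returns a copy of a parsedate-style array with the
# month (index 1) and day (index 2) exchanged; the original is untouched.
def swap_day_month(fields)
  swapped = fields.dup
  swapped[1], swapped[2] = fields[2], fields[1]
  swapped
end

# "15/3/2000" comes back with 15 as the month; swapping restores the intent.
p swap_day_month([2000, 15, 3, nil, nil, nil, nil, nil])
# [2000, 3, 15, nil, nil, nil, nil, nil]
```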