Recipe 3.7. Fuzzy Parsing of Dates

Credit: Andrea Cavalcanti

Problem

Your program needs to read and accept dates that don't conform to the datetime standard format of "yyyy, mm, dd".

Solution

The third-party dateutil.parser module provides a simple answer:

import datetime
import dateutil.parser

def tryparse(date):
    # dateutil.parser needs a string argument: let's make one from our
    # `date' argument, according to a few reasonable conventions...:
    kwargs = {}                                    # assume no named-args
    if isinstance(date, (tuple, list)):
        date = ' '.join([str(x) for x in date])    # join up sequences
    elif isinstance(date, int):
        date = str(date)                           # stringify integers
    elif isinstance(date, dict):
        kwargs = dict(date)                        # accept named-args dicts (copied)
        date = kwargs.pop('date')                  # with a 'date' str
    try:
        try:
            parsedate = dateutil.parser.parse(date, **kwargs)
            print('Sharp %r -> %s' % (date, parsedate))
        except ValueError:
            parsedate = dateutil.parser.parse(date, fuzzy=True, **kwargs)
            print('Fuzzy %r -> %s' % (date, parsedate))
    except Exception as err:
        print('Try as I may, I cannot parse %r (%s)' % (date, err))

if __name__ == "__main__":
    tests = (
            "January 3, 2003",                     # a string
            (5, "Oct", 55),                        # a tuple
            "Thursday, November 18",               # longer string without year
            "7/24/04",                             # a string with slashes
            "24-7-2004",                           # European-format string
            {'date':"5-10-1955", "dayfirst":True}, # a dict including the kwarg
            "5-10-1955",                           # dayfirst, no kwarg
            19950317,                              # not a string
            "11AM on the 11th day of 11th month, in the year of our Lord 1945",
            )
    for test in tests:                             # testing date formats
        tryparse(test)                             # try to parse

Discussion

dateutil.parser's parse function works on a variety of date formats, and this recipe demonstrates a few of them. The parser can handle English-language month names and two- or four-digit years (with some constraints). When you call parse without named arguments, it first tries to read an ambiguous numeric date in the order mm-dd-yy. If that does not make logical sense, as it doesn't for the '24-7-2004' string in the recipe (there is no month 24), parse then tries dd-mm-yy. Lastly, it tries yy-mm-dd. If a keyword argument such as dayfirst or yearfirst is passed (as we do in one test), parse prefers the ordering that the keyword names.
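The fallback order described above can be illustrated with a small standard-library sketch. This is not dateutil's implementation; the function name resolve_numeric_date is hypothetical, it only handles three numbers separated by '-', '/' or '.', and it does not expand two-digit years the way dateutil does.

```python
import datetime
import re

def resolve_numeric_date(text):
    """Try mm-dd-yy, then dd-mm-yy, then yy-mm-dd; return the first valid date.

    Hypothetical helper for illustration only; years are taken literally,
    so '04' means the year 4, not 2004.
    """
    a, b, c = (int(part) for part in re.split(r"[-/.]", text.strip()))
    candidates = (
        (c, a, b),  # mm-dd-yy: month=a, day=b, year=c
        (c, b, a),  # dd-mm-yy: day=a, month=b, year=c
        (a, b, c),  # yy-mm-dd: year=a, month=b, day=c
    )
    for year, month, day in candidates:
        try:
            return datetime.date(year, month, day)
        except ValueError:
            continue  # this ordering is not a real calendar date; try the next
    raise ValueError("no valid ordering for %r" % (text,))

if __name__ == "__main__":
    for sample in ("24-7-2004", "5-10-1955", "2004-7-24"):
        print("%s -> %s" % (sample, resolve_numeric_date(sample)))
```

Note that '5-10-1955' is valid as mm-dd-yy, so the sketch never reaches the day-first reading. That is the same ambiguity that the dayfirst keyword exists to resolve in the recipe's tests.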

The recipe's tests cover a few edge cases that a date parser might encounter, such as a date passed as a tuple, as an integer (ISO-formatted without separators), and even as a phrase. To allow testing of the keyword arguments, the tryparse function in the recipe also accepts a dictionary argument. In that case, it expects to find the string to be parsed under the key 'date', and it passes the remaining items on to dateutil's parser as keyword arguments.
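One way to make these input conventions easy to check on their own is to separate the normalization step from the parsing step. The following is a minimal sketch along those lines; the function name normalize_date_arg is hypothetical and not part of the recipe or of dateutil, and it allows the 'date' value inside a dictionary to be a sequence or an integer as well, which is a small extension of the recipe's behavior.

```python
def normalize_date_arg(date):
    """Return (text, kwargs) following the recipe's input conventions.

    Hypothetical helper: it only prepares arguments and does no parsing.
    """
    kwargs = {}
    if isinstance(date, dict):
        kwargs = dict(date)            # copy, so the caller's dict is untouched
        date = kwargs.pop('date')      # the value to parse lives under 'date'
    if isinstance(date, (tuple, list)):
        date = ' '.join(str(x) for x in date)   # join up sequences
    elif isinstance(date, int):
        date = str(date)                        # stringify integers
    return date, kwargs

if __name__ == "__main__":
    for sample in ((5, "Oct", 55), 19950317,
                   {'date': "5-10-1955", 'dayfirst': True}):
        print(normalize_date_arg(sample))
```

The returned pair could then be handed to the parser as parse(text, **kwargs), exactly as tryparse does internally.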

dateutil's parser can provide a pretty good level of "fuzzy" parsing, given some hints to let it know which piece is, for example, the hour (such as the AM in the test phrase in this recipe). For production code, you should avoid relying on fuzzy parsing, and either do some kind of preprocessing, or at least provide some kind of mechanism for checking the accuracy of the parsed date.
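As one possible shape for such an accuracy check, the sketch below flags two common symptoms of a doubtful parse: a result outside the range the application expects, and a year that never appeared in the input (dateutil fills in missing fields from a default date, which is today's date unless you pass one). The function name check_parsed and the substring test for the year are assumptions made for illustration, not a dateutil feature, and the substring test is deliberately crude.

```python
import datetime

def check_parsed(text, parsed, earliest, latest):
    """Return a list of concerns about a parsed datetime; empty means none.

    Hypothetical helper: `text` is the original input, `parsed` the result
    from the parser, and `earliest`/`latest` bound the acceptable range.
    """
    problems = []
    if not (earliest <= parsed <= latest):
        problems.append("outside expected range")
    four_digit = "%04d" % parsed.year
    two_digit = "%02d" % (parsed.year % 100)
    # If neither form of the year occurs in the input, it was likely defaulted.
    if four_digit not in text and two_digit not in text:
        problems.append("year not present in input")
    return problems

if __name__ == "__main__":
    lo = datetime.datetime(1900, 1, 1)
    hi = datetime.datetime(2100, 1, 1)
    print(check_parsed("Thursday, November 18",
                       datetime.datetime(2004, 11, 18), lo, hi))
    print(check_parsed("7/24/04", datetime.datetime(2004, 7, 24), lo, hi))
```

A caller might log or reject any input for which the returned list is non-empty, rather than silently accepting the parser's best guess.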

See Also

For more on date-parsing algorithms, see dateutil documentation at https://moin.conectiva.com.br/DateUtil?action=highlight&value=DateUtil; for date handling, see the datetime documentation in the Library Reference.



Python Cookbook
ISBN: 0596007973
Year: 2004
Pages: 420