Recipe 3.6. Looking up Holidays Automatically

Credit: Anna Martelli Ravenscroft, Alex Martelli

Problem

Holidays vary by country, by region, even by union within the same company. You want an automatic way to determine the number of holidays that fall between two given dates.

Solution

Between two dates, there may be movable holidays, such as Easter and Labor Day (U.S.); holidays that are based on Easter, such as Easter Monday (the day after Easter, which the code below computes in the function all_boxing); holidays with a fixed date, such as Christmas; and holidays that your company has designated (such as the CEO's birthday). You can deal with all of them using datetime and the third-party module dateutil.

A very flexible architecture is to factor out the various possibilities into separate functions to be called as appropriate:

import datetime
from dateutil import rrule, easter

def all_easter(start, end):
    # return the list of Easter dates within start..end
    easters = [easter.easter(y)
               for y in range(start.year, end.year+1)]
    return [d for d in easters if start<=d<=end]

def all_boxing(start, end):
    # return the list of dates of the day after Easter (Easter Monday)
    # within start..end
    one_day = datetime.timedelta(days=1)
    boxings = [easter.easter(y)+one_day
               for y in range(start.year, end.year+1)]
    return [d for d in boxings if start<=d<=end]

def all_christmas(start, end):
    # return the list of Christmas Day dates within start..end
    christmases = [datetime.date(y, 12, 25)
                   for y in range(start.year, end.year+1)]
    return [d for d in christmases if start<=d<=end]

def all_labor(start, end):
    # return the list of Labor Day dates within start..end
    labors = rrule.rrule(rrule.YEARLY, bymonth=9, byweekday=rrule.MO(1),
                         dtstart=start, until=end)
    return [d.date() for d in labors]   # no need to test for in-between here

def read_holidays(start, end, holidays_file='holidays.txt'):
    # return the list of dates from holidays_file within start..end
    try:
        infile = open(holidays_file)
    except IOError as err:
        print('cannot read holidays (%r):' % (holidays_file,), err)
        return []
    holidays = []
    for line in infile:
        # skip blank lines and comments
        if line.isspace() or line.startswith('#'):
            continue
        # try to parse the format: YYYY, M, D
        try:
            y, m, d = [int(x.strip()) for x in line.split(',')]
            date = datetime.date(y, m, d)
        except ValueError:
            # diagnose invalid line and just go on
            print("Invalid line %r in holidays file %r" % (
                line, holidays_file))
            continue
        if start<=date<=end:
            holidays.append(date)
    infile.close()
    return holidays

holidays_by_country = {
    # map each country code to a sequence of functions
    'US': (all_easter, all_christmas, all_labor),
    'IT': (all_easter, all_boxing, all_christmas),
}

def holidays(cc, start, end, holidays_file='holidays.txt'):
    # read applicable holidays from the file
    all_holidays = read_holidays(start, end, holidays_file)
    # add all holidays computed by applicable functions
    functions = holidays_by_country.get(cc, ())
    for function in functions:
        all_holidays += function(start, end)
    # eliminate duplicates
    all_holidays = list(set(all_holidays))
    # uncomment the following 2 lines to return a sorted list:
    # all_holidays.sort()
    # return all_holidays
    return len(all_holidays)    # comment this out if returning list

if __name__ == '__main__':
    test_file = open('test_holidays.txt', 'w')
    test_file.write('2004, 9, 6\n')
    test_file.close()
    testdates = [ (datetime.date(2004, 8,  1), datetime.date(2004, 11, 14)),
                  (datetime.date(2003, 2, 28), datetime.date(2003,  5, 30)),
                  (datetime.date(2004, 2, 28), datetime.date(2004,  5, 30)),
                ]
    def test(cc, testdates, expected):
        for (s, e), expect in zip(testdates, expected):
            print('total holidays in %s from %s to %s is %d (exp %d)' % (
                    cc, s, e, holidays(cc, s, e, test_file.name), expect))
            print()
    test('US', testdates, (1,1,1) )
    test('IT', testdates, (1,2,2) )
    import os
    os.remove(test_file.name)

Discussion

In one company I worked for, there were three different unions, and holidays varied among the unions by contract. In addition, we had to track any snow days or other release days in the same way as "official" holidays. To deal with all the potential variations in holidays, it's easiest to factor out the calculation of standard holidays into their own functions, as we did in the preceding example for all_easter, all_labor, and so on. Examples of different types of calculations are provided so it's easy to roll your own as needed.
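As an illustration of rolling your own, here is a minimal sketch of one more holiday function in the same style. It is not part of the recipe: the names nth_weekday and all_thanksgiving are hypothetical, and the sketch uses only the standard datetime module rather than dateutil's rrule, simply to show that the "n-th weekday of a month" calculation is easy to do by hand.

```python
import datetime

def nth_weekday(year, month, weekday, n):
    """Return the date of the n-th given weekday (Monday is 0) in a month."""
    first = datetime.date(year, month, 1)
    # days from the 1st of the month to the first requested weekday
    offset = (weekday - first.weekday()) % 7
    return first + datetime.timedelta(days=offset + 7 * (n - 1))

def all_thanksgiving(start, end):
    # return the list of U.S. Thanksgiving dates (4th Thursday of
    # November) within start..end, both bounds included
    THURSDAY = 3
    days = [nth_weekday(y, 11, THURSDAY, 4)
            for y in range(start.year, end.year + 1)]
    return [d for d in days if start <= d <= end]
```

A function like this has the same signature as the others, so it could be added to the 'US' entry of holidays_by_country without touching the rest of the code.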

Although half-open intervals (with the lower bound included but the upper one excluded) are the norm in Python (and for good reasons, since they're arithmetically more malleable and tend to induce fewer bugs in your computations!), this recipe deals with closed intervals instead (both lower and upper bounds included). Unfortunately, that's how specifications in terms of date intervals tend to be given, and dateutil also works that way, so the choice was essentially obvious.
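The difference between the two conventions shows up exactly when a holiday falls on the upper bound. The following small sketch, with dates chosen only for illustration, compares the two membership tests and the number of days each kind of interval covers:

```python
import datetime

start = datetime.date(2004, 12, 20)
end = datetime.date(2004, 12, 25)
christmas = datetime.date(2004, 12, 25)

# closed interval, as used in this recipe: the upper bound is included
in_closed = start <= christmas <= end
# half-open interval, the usual Python convention: the upper bound is excluded
in_half_open = start <= christmas < end

# number of days covered by each kind of interval
days_closed = (end - start).days + 1
days_half_open = (end - start).days
```

With a closed interval, Christmas counts as a holiday in the period; with a half-open one, it would silently drop out.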

Each function is responsible for ensuring that it only returns results that meet our criteria: lists of datetime.date instances that lie between the dates (inclusive) passed to the function. For example, in all_labor, we coerce the datetime.datetime results returned by dateutil's rrule into datetime.date instances with the date method.
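One way to enforce this contract in a function of your own is sketched below. The helpers as_date and clip are hypothetical names, not part of the recipe. The coercion matters because datetime.datetime and datetime.date instances cannot be ordered against each other, so mixing the two in the start<=d<=end test fails.

```python
import datetime

def as_date(value):
    """Return value as a plain datetime.date, dropping any time of day."""
    # datetime.datetime is a subclass of datetime.date, so test it first
    if isinstance(value, datetime.datetime):
        return value.date()
    return value

def clip(dates, start, end):
    # keep only the dates within start..end, both bounds included
    return [d for d in map(as_date, dates) if start <= d <= end]
```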

A company may choose to set a specific date as a holiday (such as a snow day) "just this once," and a text file may be used to hold such unique instances. In our example, the read_holidays function handles the task of reading and processing a text file, with one date per line, each in the format year, month, day. You could also choose to refactor this function to use a "fuzzy" date parser, as shown in Recipe 3.7.
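Short of a truly fuzzy parser, a modest step toward more tolerant input is to try a few fixed layouts in turn. The sketch below uses only the standard library; the function name parse_holiday_line and the list of accepted layouts are assumptions for illustration, not part of the recipe.

```python
import datetime

# hypothetical list of accepted layouts; extend it to suit your own files
FORMATS = ('%Y %m %d', '%Y-%m-%d', '%Y/%m/%d')

def parse_holiday_line(line):
    """Return a datetime.date for line, or None if no layout matches."""
    # treat commas as separators and collapse runs of whitespace,
    # so that '2004, 9, 6' and '2004,9,6' both become '2004 9 6'
    text = ' '.join(line.replace(',', ' ').split())
    for fmt in FORMATS:
        try:
            return datetime.datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None
```

A caller can treat a None result the way read_holidays treats a ValueError: diagnose the invalid line and go on to the next one.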

If you need to look up holidays many times within a single run of your program, you may apply the optimization of reading and parsing the text file just once, then using the list of dates parsed from its contents each time that data is needed. However, "premature optimization is the root of all evil in programming," as Knuth said, quoting Hoare: by avoiding even this "obvious" optimization, we gain clarity and flexibility. Imagine these functions being used in an interactive environment, where the text file containing holidays may be edited between one computation and the next: by rereading the file each time, there is no need for any special check about whether the file was changed since you last read it!
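To make the trade-off concrete, here is a sketch of what the cached variant might look like, including the "special check" that rereading the file avoids. The name cached_holidays and the module-level _cache dictionary are hypothetical, and the sketch assumes the file exists and is well formed (it omits the error handling of read_holidays).

```python
import datetime
import os

_cache = {}   # maps file name -> (modification time, list of parsed dates)

def cached_holidays(holidays_file='holidays.txt'):
    """Return all dates in holidays_file, reparsing only if it changed."""
    # raises OSError if the file does not exist
    mtime = os.path.getmtime(holidays_file)
    cached = _cache.get(holidays_file)
    if cached is not None and cached[0] == mtime:
        return cached[1]
    dates = []
    with open(holidays_file) as infile:
        for line in infile:
            # skip blank lines and comments
            if line.isspace() or line.startswith('#'):
                continue
            y, m, d = [int(x) for x in line.split(',')]
            dates.append(datetime.date(y, m, d))
    _cache[holidays_file] = (mtime, dates)
    return dates
```

Note what the optimization costs: extra state, an extra system call per lookup, and a subtle failure mode if the file is edited twice within the resolution of the file system's timestamps. Callers would also still need to filter the returned list by start and end.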

Since countries often celebrate different holidays, the recipe provides a rudimentary holidays_by_country dictionary. You can consult plenty of web sites that list holidays by country to flesh out the dictionary for your needs. The important part is that this dictionary allows a different group of holiday-generating functions to be called, depending on which country code is passed to the holidays function. If your company has multiple unions, you could easily create a union-based dictionary, passing the union code instead of (or, for multinationals, in addition to) a country code to holidays. The holidays function calls the appropriate functions (including, unconditionally, read_holidays), concatenates the results, eliminates duplicates, and returns the length of the list. If you prefer, of course, you can return the list instead, by simply uncommenting two lines as indicated in the code.
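A union-based dictionary could follow the same pattern as holidays_by_country. In the sketch below, everything specific is made up for illustration: the union codes 'LOCAL-1' and 'LOCAL-2', the all_founders_day function and its March 15 date, and the union_holidays function itself. A self-contained copy of all_christmas is included so that the example runs on its own.

```python
import datetime

def all_christmas(start, end):
    # return the list of Christmas Day dates within start..end
    days = [datetime.date(y, 12, 25)
            for y in range(start.year, end.year + 1)]
    return [d for d in days if start <= d <= end]

def all_founders_day(start, end):
    # hypothetical contract holiday, assumed to fall on March 15 each year
    days = [datetime.date(y, 3, 15)
            for y in range(start.year, end.year + 1)]
    return [d for d in days if start <= d <= end]

holidays_by_union = {
    # map each (hypothetical) union code to a sequence of functions
    'LOCAL-1': (all_christmas,),
    'LOCAL-2': (all_christmas, all_founders_day),
}

def union_holidays(union, start, end):
    # collect the dates from all applicable functions, without duplicates
    dates = set()
    for function in holidays_by_union.get(union, ()):
        dates.update(function(start, end))
    return sorted(dates)
```

This variant returns the sorted list of dates rather than its length; take len of the result if you only need the count.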
