17.20 Module: Parsing a String into a Date/Time Object Portably

Credit: Brett Cannon

Python's time module supplies the parsing function strptime only on some platforms, and not on Windows. Example 17-2 shows a pure Python implementation of strptime that follows, as closely as possible, the behavior documented for time.strptime in the standard Python documentation. It also accepts two additional optional arguments, as shown in the following signature:

strptime(string, format="%a %b %d %H:%M:%S %Y", option=AS_IS, locale_setting=ENGLISH)

option's default value of AS_IS gets time information from the string, without any checking or filling-in. You can pass option as CHECK, so that the function makes sure that whatever information it gets is within reasonable ranges (raising an exception otherwise), or FILL_IN (like CHECK, but also tries to fill in any missing information that can be computed). locale_setting accepts a locale tuple (as created by LocaleAssembly) to specify names of days, months, and so on. Currently, ENGLISH and SWEDISH locale tuples are built into this recipe's strptime module.
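The three option levels can be illustrated with a minimal, self-contained sketch. This is not the recipe's code: the helper apply_option, the RANGES table, and the field names are hypothetical and chosen only for illustration, and the fill-in step here leans on the standard datetime module rather than on the recipe's own arithmetic.

```python
import datetime

AS_IS, CHECK, FILL_IN = "AS_IS", "CHECK", "FILL_IN"

# Inclusive ranges for a few fields (a hypothetical subset, for illustration).
RANGES = {"month": (1, 12), "day": (1, 31), "hour": (0, 23), "minute": (0, 59)}

def apply_option(fields, option=AS_IS):
    """Return a copy of `fields` processed according to `option`."""
    result = dict(fields)
    # CHECK and FILL_IN both validate whatever information is present.
    if option in (CHECK, FILL_IN):
        for name, (low, high) in RANGES.items():
            value = result.get(name)
            if value is not None and not low <= value <= high:
                raise ValueError("%s out of range: %r" % (name, value))
    # FILL_IN also derives missing fields when year, month, and day are known.
    if option == FILL_IN and all(result.get(k) for k in ("year", "month", "day")):
        date = datetime.date(result["year"], result["month"], result["day"])
        if result.get("day_week") is None:
            result["day_week"] = date.weekday()  # Monday is 0
        if result.get("julian_date") is None:
            result["julian_date"] = date.timetuple().tm_yday
    return result

parsed = {"year": 2002, "month": 3, "day": 14}
print(apply_option(parsed))           # AS_IS: returned unchanged
print(apply_option(parsed, FILL_IN))  # adds day_week=3 and julian_date=73
```

With AS_IS, an out-of-range value such as month 13 passes through untouched; with CHECK or FILL_IN, the same input raises an exception.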

Although this recipe's strptime cannot be as fast as the version in the standard Python library, speed is rarely a major consideration for typical strptime use. This recipe offers two substantial advantages. First, it runs on any platform supporting Python and gives identical results on every platform, while time.strptime exists only on some platforms and tends to have different quirks on each platform that supplies it. Second, the optional checking and filling-in of information that this recipe provides is quite handy.

The locale-setting support of this version of strptime was inspired by that in Andrew Markebo's own strptime, which you can find at http://www.fukt.hk-r.se/~flognat/hacks/strptime.py. However, this recipe has a more complete implementation of strptime's specification that is based on regular expressions, rather than relying on whitespace and miscellaneous characters to split strings. For example, this recipe can correctly parse strings based on a format such as "%Y%m%d".
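The regular-expression approach can be sketched in a few lines. The directive table and the function name format_to_regex below are hypothetical and cover only three directives; the idea is simply that each directive becomes a named group, so adjacent fields need no separators between them.

```python
import re

# Hypothetical, minimal directive table: each directive maps to a named group.
DIRECTIVES = {
    "%Y": r"(?P<Y>\d\d\d\d)",  # year with century
    "%m": r"(?P<m>[01]\d)",    # month number
    "%d": r"(?P<d>[0-3]\d)",   # day of the month
}

def format_to_regex(fmt):
    """Translate a strptime-style format string into a compiled regex."""
    parts = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            # A directive: look up its regex (KeyError if unsupported).
            parts.append(DIRECTIVES[fmt[i:i + 2]])
            i += 2
        else:
            # A literal character: escape it so it matches only itself.
            parts.append(re.escape(fmt[i]))
            i += 1
    return re.compile("".join(parts))

match = format_to_regex("%Y%m%d").match("20020314")
print(match.groupdict())  # {'Y': '2002', 'm': '03', 'd': '14'}
```

Because each group has a fixed width, the string splits unambiguously even though nothing separates the year, month, and day.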

Example 17-2. Parsing a string into a date/time object portably
""" A pure-Python version of strptime.

As close as possible to time.strptime's specs in the official Python docs.
Locales supported via LocaleAssembly -- examples supplied for English and
Swedish, follow the examples to add your own locales.

Thanks to Andrew Markebo for his pure Python version of strptime, which
convinced me to improve locale support -- and, of course, to Guido van Rossum
and all other contributors to Python, the best language I've ever used!
"""
import re

__all__ = ['strptime', 'AS_IS', 'CHECK', 'FILL_IN',
           'LocaleAssembly', 'ENGLISH', 'SWEDISH']

# metadata module
__author__ = 'Brett Cannon'
__email__ = 'drifty@bigfoot.com'
__version__ = '1.5cb'
__url__ = 'http://www.drifty.org/'

# global settings and parameter constants
CENTURY = 2000
AS_IS = 'AS_IS'
CHECK = 'CHECK'
FILL_IN = 'FILL_IN'

def LocaleAssembly(DirectiveDict, MonthDict, DayDict, am_pmTuple):
    """ Creates locale tuple for use by strptime.

    Accepts arguments dictionaries DirectiveDict (locale-specific regexes for
    extracting info from time strings), MonthDict (locale-specific full and
    abbreviated month names), DayDict (locale-specific full and abbreviated
    weekday names), and the am_pmTuple tuple (locale-specific valid
    representations of AM and PM, as a two-item tuple). Look at how the
    ENGLISH dictionary is created for an example; make sure your dictionary
    has values corresponding to each entry in the ENGLISH dictionary. You can
    override any value in the BasicDict with an entry in DirectiveDict.
    """
    BasicDict = {
        '%d': r'(?P<d>[0-3]\d)',   # Day of the month [01,31]
        '%H': r'(?P<H>[0-2]\d)',   # Hour (24-h) [00,23]
        '%I': r'(?P<I>[01]\d)',    # Hour (12-h) [01,12]
        '%j': r'(?P<j>[0-3]\d\d)', # Day of the year [001,366]
        '%m': r'(?P<m>[01]\d)',    # Month [01,12]
        '%M': r'(?P<M>[0-5]\d)',   # Minute [00,59]
        '%S': r'(?P<S>[0-6]\d)',   # Second [00,61]
        '%U': r'(?P<U>[0-5]\d)',   # Week in the year, Sunday first [00,53]
        '%w': r'(?P<w>[0-6])',     # Weekday [0(Sunday),6]
        '%W': r'(?P<W>[0-5]\d)',   # Week in the year, Monday first [00,53]
        '%y': r'(?P<y>\d\d)',      # Year without century [00,99]
        '%Y': r'(?P<Y>\d\d\d\d)',  # Year with century
        '%Z': r'(?P<Z>(\D+ Time)|([\S\D]{3,3}))', # Timezone name or empty
        '%%': r'(?P<percent>%)',   # Literal "%" (ignored, in the end)
        }
    BasicDict.update(DirectiveDict)
    return BasicDict, MonthDict, DayDict, am_pmTuple

# helper function to build locales' month and day dictionaries
def _enum_with_abvs(start, *names):
    result = {}
    for i in range(len(names)):
        # map both the full name and its 3-letter abbreviation to the number
        result[names[i]] = result[names[i][:3]] = i+start
    return result

""" Built-in locales """
ENGLISH_Lang = (
    {'%a': r'(?P<a>[^\s\d]{3,3})', # Abbreviated weekday name
     '%A': r'(?P<A>[^\s\d]{6,9})', # Full weekday name
     '%b': r'(?P<b>[^\s\d]{3,3})', # Abbreviated month name
     '%B': r'(?P<B>[^\s\d]{3,9})', # Full month name
     # Appropriate date and time representation.
     '%c': r'(?P<m>\d\d)/(?P<d>\d\d)/(?P<y>\d\d) '
           r'(?P<H>\d\d):(?P<M>\d\d):(?P<S>\d\d)',
     '%p': r'(?P<p>(a|A|p|P)(m|M))', # Equivalent of either AM or PM
     # Appropriate date representation
     '%x': r'(?P<m>\d\d)/(?P<d>\d\d)/(?P<y>\d\d)',
     # Appropriate time representation
     '%X': r'(?P<H>\d\d):(?P<M>\d\d):(?P<S>\d\d)'},
    _enum_with_abvs(1, 'January', 'February', 'March', 'April', 'May', 'June',
        'July', 'August', 'September', 'October', 'November', 'December'),
    _enum_with_abvs(0, 'Monday', 'Tuesday', 'Wednesday', 'Thursday',
        'Friday', 'Saturday', 'Sunday'),
    (('am','AM'),('pm','PM'))
    )
ENGLISH = LocaleAssembly(*ENGLISH_Lang)

SWEDISH_Lang = (
    {'%a': r'(?P<a>[^\s\d]{3,3})',
     '%A': r'(?P<A>[^\s\d]{6,7})',
     '%b': r'(?P<b>[^\s\d]{3,3})',
     '%B': r'(?P<B>[^\s\d]{3,8})',
     '%c': r'(?P<a>[^\s\d]{3,3}) (?P<d>[0-3]\d) '
           r'(?P<b>[^\s\d]{3,3}) (?P<Y>\d\d\d\d) '
           r'(?P<H>[0-2]\d):(?P<M>[0-5]\d):(?P<S>[0-6]\d)',
     '%p': r'(?P<p>(a|A|p|P)(m|M))',
     '%x': r'(?P<m>\d\d)/(?P<d>\d\d)/(?P<y>\d\d)',
     '%X': r'(?P<H>\d\d):(?P<M>\d\d):(?P<S>\d\d)'},
    _enum_with_abvs(1, 'Januari', 'Februari', 'Mars', 'April', 'Maj', 'Juni',
        'Juli', 'Augusti', 'September', 'Oktober', 'November', 'December'),
    _enum_with_abvs(0, 'Måndag', 'Tisdag', 'Onsdag', 'Torsdag',
        'Fredag', 'Lördag', 'Söndag'),
    (('am','AM'),('pm','PM'))
    )
SWEDISH = LocaleAssembly(*SWEDISH_Lang)

class StrptimeError(Exception):
    """ Exception class for the module """

def _g2j(y, m, d):
    """ Gregorian-to-Julian utility function, used by _StrpObj """
    # floor division (//) keeps the arithmetic in integers
    a = (14-m)//12
    y = y+4800-a
    m = m+12*a-3
    return d+((153*m+2)//5)+365*y+y//4-y//100+y//400-32045

class _StrpObj:
    """ An object with basic time-manipulation methods """
    def __init__(self, year=None, month=None, day=None, hour=None,
                 minute=None, second=None, day_week=None, julian_date=None,
                 daylight=None):
        """ Sets up instance variables. All values can be set at
        initialization. Any info left out is automatically set to None. """
        self.year = year
        self.month = month
        self.day = day
        self.hour = hour
        self.minute = minute
        self.second = second
        self.day_week = day_week
        self.julian_date = julian_date
        self.daylight = daylight
    def julianFirst(self):
        """ Calculates the Julian date for the first day of year self.year """
        return _g2j(self.year, 1, 1)
    def gregToJulian(self):
        """ Converts the Gregorian date to day within year (Jan 1 == 1) """
        julian_day = _g2j(self.year, self.month, self.day)
        return julian_day-self.julianFirst()+1
    def julianToGreg(self):
        """ Converts the Julian date to the Gregorian date """
        julian_day = self.julian_date+self.julianFirst()-1
        a = julian_day+32044
        b = (4*a+3)//146097
        c = a-((146097*b)//4)
        d = (4*c+3)//1461
        e = c-((1461*d)//4)
        m = (5*e+2)//153
        day = e-((153*m+2)//5)+1
        month = m+3-12*(m//10)
        year = 100*b+d-4800+(m//10)
        return year, month, day
    def dayWeek(self):
        """ Figures out the day of the week using self.year, self.month, and
        self.day. Monday is 0. """
        a = (14-self.month)//12
        y = self.year-a
        m = self.month+12*a-2
        # the formula yields 0 for Sunday; shift so that Monday is 0
        day_week = (self.day+y+(y//4)-(y//100)+(y//400)+((31*m)//12))%7
        if day_week==0: day_week = 6
        else: day_week = day_week-1
        return day_week
    def FillInInfo(self):
        """ Based on the current time information, it figures out what other
        info can be filled in. """
        if self.julian_date is None and self.year and self.month and self.day:
            self.julian_date = self.gregToJulian()
        if (self.month is None or self.day is None
                ) and self.year and self.julian_date:
            gregorian = self.julianToGreg()
            self.month = gregorian[1] # year ignored, must already be okay
            self.day = gregorian[2]
        if self.day_week is None and self.year and self.month and self.day:
            self.day_week = self.dayWeek()
    def CheckIntegrity(self):
        """ Checks info integrity based on the range that a number can be.
        Any invalid info raises StrptimeError. """
        def _check(value, low, high, name):
            # both bounds are inclusive
            if value is not None and not low<=value<=high:
                raise StrptimeError("%s incorrect"%name)
        _check(self.month, 1, 12, 'Month')
        _check(self.day, 1, 31, 'Day')
        _check(self.hour, 0, 23, 'Hour')
        _check(self.minute, 0, 59, 'Minute')
        _check(self.second, 0, 61, 'Second')  # 61 covers leap seconds
        _check(self.day_week, 0, 6, 'Day of the Week')
        _check(self.julian_date, 1, 366, 'Julian Date')
        _check(self.daylight, -1, 1, 'Daylight Savings')
    def return_time(self):
        """ Returns a tuple of numbers in the format used by time.gmtime().
        All instances of None in the information are replaced with 0. """
        temp_time = (self.year, self.month, self.day, self.hour, self.minute,
             self.second, self.day_week, self.julian_date, self.daylight)
        return tuple([t or 0 for t in temp_time])
    def RECreation(self, format, DIRECTIVEDict):
        """ Creates re based on format string and DIRECTIVEDict """
        Directive = 0
        REString = []
        for char in format:
            if char=='%' and not Directive:
                Directive = 1
            elif Directive:
                try: REString.append(DIRECTIVEDict['%'+char])
                except KeyError:
                    raise StrptimeError("Invalid format %s"%char)
                Directive = 0
            else:
                # escape literals so that, e.g., '.' matches only a period
                REString.append(re.escape(char))
        return re.compile(''.join(REString), re.IGNORECASE)
    def convert(self, string, format, locale_setting):
        """ Gets time info from string based on format string and a locale
        created by LocaleAssembly() """
        DIRECTIVEDict, MONTHDict, DAYDict, AM_PM = locale_setting
        REComp = self.RECreation(format, DIRECTIVEDict)
        reobj = REComp.match(string)
        if reobj is None:
            raise StrptimeError("Invalid string (%s)"%string)
        for found in reobj.groupdict().keys():
            if found in ('y','Y'): # year
                if found=='y': # without century
                    self.year = CENTURY+int(reobj.group('y'))
                else: # with century
                    self.year = int(reobj.group('Y'))
            elif found in ('b','B','m'): # month
                if found=='m': # month number
                    self.month = int(reobj.group(found))
                else: # month name
                    try:
                        self.month = MONTHDict[reobj.group(found)]
                    except KeyError:
                        raise StrptimeError('Unrecognized month')
            elif found=='d': # day of the month
                self.day = int(reobj.group(found))
            elif found in ('H','I'): # hour
                hour = int(reobj.group(found))
                if found=='H': # hour number
                    self.hour = hour
                else: # AM/PM format
                    try:
                        if reobj.group('p') in AM_PM[0]: AP = 0
                        else: AP = 1
                    except IndexError: # the format had no %p directive
                        raise StrptimeError(
                            'Lacking needed AM/PM information')
                    if AP:
                        if hour==12: self.hour = 12
                        else: self.hour = 12+hour
                    else:
                        if hour==12: self.hour = 0
                        else: self.hour = hour
            elif found=='M': # minute
                self.minute = int(reobj.group(found))
            elif found=='S': # second
                self.second = int(reobj.group(found))
            elif found in ('a','A','w'): # Day of the week
                if found=='w': # DOW number (0 is Sunday)
                    day_value = int(reobj.group(found))
                    if day_value==0: self.day_week = 6
                    else: self.day_week = day_value-1
                else: # DOW name
                    try:
                        self.day_week = DAYDict[reobj.group(found)]
                    except KeyError:
                        raise StrptimeError('Unrecognized day')
            elif found=='j': # Julian date
                self.julian_date = int(reobj.group(found))
            elif found=='Z': # daylight savings
                TZ = reobj.group(found)
                if len(TZ)==3:
                    if TZ[1] in ('D','d'): self.daylight = 1
                    else: self.daylight = 0
                elif TZ.find('Daylight')!=-1: self.daylight = 1
                else: self.daylight = 0

def strptime(string, format='%a %b %d %H:%M:%S %Y',
        option=AS_IS, locale_setting=ENGLISH):
    """ Returns a tuple representing the time represented in 'string'.
    Valid values for 'option' are AS_IS, CHECK, and FILL_IN. 'locale_setting'
    accepts locale tuples created by LocaleAssembly(). """
    Obj = _StrpObj()
    Obj.convert(string, format, locale_setting)
    if option in (FILL_IN, CHECK):
        Obj.CheckIntegrity()
    if option == FILL_IN:
        Obj.FillInInfo()
    return Obj.return_time()
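The date arithmetic used by the fill-in step can be cross-checked against the standard datetime module. The sketch below restates the day-number and weekday formulas as standalone functions; the names g2j, day_of_year, and day_of_week are chosen here for illustration, and the comparison assumes a Python version that provides datetime.

```python
import datetime

def g2j(y, m, d):
    """Gregorian date to Julian day number, using integer arithmetic."""
    a = (14 - m) // 12
    y = y + 4800 - a
    m = m + 12 * a - 3
    return d + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045

def day_of_year(y, m, d):
    """Day within the year, with January 1 counted as 1."""
    return g2j(y, m, d) - g2j(y, 1, 1) + 1

def day_of_week(y, m, d):
    """Day of the week, with Monday counted as 0."""
    a = (14 - m) // 12
    yy = y - a
    mm = m + 12 * a - 2
    sunday_based = (d + yy + yy // 4 - yy // 100 + yy // 400 + (31 * mm) // 12) % 7
    return (sunday_based - 1) % 7  # shift so that Sunday (0) becomes 6

# Compare with datetime for every day from 2000 through 2004.
day = datetime.date(2000, 1, 1)
while day.year < 2005:
    assert day_of_year(day.year, day.month, day.day) == day.timetuple().tm_yday
    assert day_of_week(day.year, day.month, day.day) == day.weekday()
    day += datetime.timedelta(days=1)

print(day_of_year(2002, 3, 14), day_of_week(2002, 3, 14))  # 73 3
```

The range checked includes two leap years (2000 and 2004), which exercises the century and four-year corrections in the formulas.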

17.20.1 See Also

The most up-to-date version of strptime is always available at http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime, where you will also find a test suite using PyUnit; Andrew Markebo's version of strptime is at http://www.fukt.hk-r.se/~flognat/hacks/strptime.py.



Python Cookbook
ISBN: 0596007973
Year: 2005
Pages: 346