Recipe 1.24. Making Some Strings Case-Insensitive

Credit: Dale Strickland-Clark, Peter Cogolo, Mark McMahon

Problem

You want to treat some strings so that all comparisons and lookups are case-insensitive, while all other uses of the strings preserve the original case.

Solution

The best solution is to wrap the specific strings in question into a suitable subclass of str:

class iStr(str):
    """
    Case insensitive string class.
    Behaves just like str, except that all comparisons and lookups
    are case insensitive.
    """
    def __init__(self, *args):
        self._lowered = str.lower(self)
    def __repr__(self):
        return '%s(%s)' % (type(self).__name__, str.__repr__(self))
    def __hash__(self):
        return hash(self._lowered)
    def lower(self):
        return self._lowered

def _make_case_insensitive(name):
    ''' wrap one method of str into an iStr one, case-insensitive '''
    str_meth = getattr(str, name)
    def x(self, other, *args):
        ''' try lowercasing 'other', which is typically a string, but
            be prepared to use it as-is if lowering gives problems,
            since strings CAN be correctly compared with non-strings.
        '''
        try: other = other.lower()
        except (TypeError, AttributeError, ValueError): pass
        return str_meth(self._lowered, other, *args)
    # in Python 2.4, only, add the statement: x.func_name = name
    setattr(iStr, name, x)

# apply the _make_case_insensitive function to specified methods
for name in 'eq lt le ge gt ne cmp contains'.split():
    _make_case_insensitive('__%s__' % name)
for name in 'count endswith find index rfind rindex startswith'.split():
    _make_case_insensitive(name)
# note that we don't modify methods 'replace', 'split', 'strip', ...
# of course, you can add modifications to them, too, if you prefer that.
del _make_case_insensitive    # remove helper function, not needed any more
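To see the behavior this class is meant to provide, here is a minimal usage sketch. It is an adaptation for Python 3, not the recipe's original Python 2.3/2.4 code: it assumes a Python 3 interpreter, leaves `cmp` out of the list of special methods (Python 3's str has no `__cmp__` to wrap), and sets `__name__` where the recipe's comment mentions `func_name`. The variable name `greeting` is purely illustrative.

```python
class iStr(str):
    """Case-insensitive, case-preserving string (Python 3 adaptation)."""
    def __init__(self, *args):
        self._lowered = str.lower(self)
    def __repr__(self):
        return '%s(%s)' % (type(self).__name__, str.__repr__(self))
    def __hash__(self):
        return hash(self._lowered)
    def lower(self):
        return self._lowered

def _make_case_insensitive(name):
    """Wrap one method of str into a case-insensitive iStr method."""
    str_meth = getattr(str, name)
    def x(self, other, *args):
        # Lowercase 'other' when possible; otherwise pass it through as-is.
        try:
            other = other.lower()
        except (TypeError, AttributeError, ValueError):
            pass
        return str_meth(self._lowered, other, *args)
    x.__name__ = name   # Python 3 spelling of func_name
    setattr(iStr, name, x)

# Assumption: 'cmp' is dropped because Python 3's str has no __cmp__.
for name in 'eq lt le ge gt ne contains'.split():
    _make_case_insensitive('__%s__' % name)
for name in 'count endswith find index rfind rindex startswith'.split():
    _make_case_insensitive(name)
del _make_case_insensitive

greeting = iStr('Hello World')
print(greeting == 'HELLO WORLD')      # True
print('WORLD' in greeting)            # True
print(greeting.startswith('hello'))   # True
print(greeting.find('WORLD'))         # 6
print(repr(greeting))                 # iStr('Hello World')
print(greeting)                       # Hello World (original case preserved)
```

Comparisons and lookups go through the lowercased copy, while printing and every unwrapped method still see the string exactly as it was written.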

Discussion

Some implementation choices in class iStr are worthy of notice. First, we choose to generate the lowercase version once and for all, in method __init__, since we envision that in typical uses of iStr instances, this version will be required repeatedly. We hold that version in an attribute that is private, but not overly so (i.e., has a name that begins with one underscore, not two), because if iStr gets subclassed (e.g., to make a more extensive version that also offers case-insensitive splitting, replacing, etc., as the comment in the "Solution" suggests), iStr's subclasses are quite likely to want to access this crucial "implementation detail" of superclass iStr!

We do not offer "case-insensitive" versions of such methods as replace, because it's anything but clear what kind of input-output relation we might want to establish in the general case. Application-specific subclasses may therefore be the way to provide this functionality in ways appropriate to a given application. For example, since the replace method is not wrapped, calling replace on an instance of iStr returns an instance of str, not of iStr. If that is a problem in your application, you may want to wrap all iStr methods that return strings, simply to ensure that the results are made into instances of iStr. For that purpose, you need another, separate helper function, similar but not identical to the _make_case_insensitive one shown in the "Solution":

def _make_return_iStr(name):
    str_meth = getattr(str, name)
    def x(*args):
        return iStr(str_meth(*args))
    setattr(iStr, name, x)

and you need to call this helper function _make_return_iStr on all the names of relevant string methods that return strings, such as:

for name in 'center ljust rjust strip lstrip rstrip'.split():
    _make_return_iStr(name)
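The difference between wrapped and unwrapped methods is easy to check. The sketch below assumes Python 3 and uses a pared-down stand-in for iStr (only `__init__`, with the comparison wrappers omitted) so that it stays self-contained; the variable names `padded`, `stripped`, and `replaced` are illustrative.

```python
class iStr(str):
    """Pared-down stand-in for the recipe's iStr (comparison wrappers omitted)."""
    def __init__(self, *args):
        self._lowered = str.lower(self)

def _make_return_iStr(name):
    """Wrap str.<name> so that its string result is re-wrapped as iStr."""
    str_meth = getattr(str, name)
    def x(*args):
        return iStr(str_meth(*args))
    setattr(iStr, name, x)

for name in 'center ljust rjust strip lstrip rstrip'.split():
    _make_return_iStr(name)

padded = iStr('  Hello  ')
stripped = padded.strip()             # wrapped: result is an iStr
replaced = padded.replace('H', 'J')   # not wrapped: result is a plain str

print(type(stripped).__name__)   # iStr
print(type(replaced).__name__)   # str
```

Because each wrapper builds a fresh iStr, the result also gets its own `_lowered` attribute, so it can take part in case-insensitive comparisons straight away once those wrappers are installed.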

Strings have about 20 methods (including special methods such as __add__ and __mul__) that you should consider wrapping in this way. You can also wrap in this way some additional methods, such as split and join, which may require special handling, and others, such as encode and decode, that you cannot deal with unless you also define a case-insensitive unicode subtype. In practice, one can hope that not every single one of these methods will prove problematic in a typical application. However, as you can see, the very functional richness of Python strings makes it a bit of work to customize string subtypes fully, in a general way without depending on the needs of a specific application.

The implementation of iStr is careful to avoid the boilerplate code (meaning repetitious and therefore bug-prone code) that we'd need if we just overrode each needed method of str in the normal way, with def statements in the class body. A custom metaclass or other such advanced technique would offer no special advantage in this case, so the boilerplate avoidance is simply obtained with one helper function that generates and installs wrapper closures, and two loops using that function, one for normal methods and one for special ones. The loops need to be placed after the class statement, as we do in this recipe's Solution, because they need to modify the class object iStr, and the class object doesn't exist yet (and thus cannot be modified) until the class statement has completed.

In Python 2.4, you can reassign the func_name attribute of a function object, and in this case, you should do so to get clearer and more readable results when introspection (e.g., the help function in an interactive interpreter session) is applied to an iStr instance. However, Python 2.3 considers attribute func_name of function objects to be read-only; therefore, in this recipe's Solution, we have indicated this possibility only in a comment, to avoid losing Python 2.3 compatibility over such a minor issue.
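For readers on Python 3, where func_name no longer exists and the same attribute is spelled `__name__`, the effect of the renaming can be sketched as follows. The helper `make_wrapper` and the names `anonymous` and `named` are hypothetical, introduced only for this illustration; they are not part of the recipe.

```python
def make_wrapper(name):
    """Build a case-insensitive wrapper around str.<name> (illustrative helper)."""
    str_meth = getattr(str, name)
    def x(self, other, *args):
        return str_meth(self.lower(), other.lower(), *args)
    return x

anonymous = make_wrapper('startswith')
named = make_wrapper('startswith')
named.__name__ = 'startswith'   # the step the recipe leaves as a comment

print(anonymous.__name__)   # x
print(named.__name__)       # startswith
```

Without the reassignment, every generated wrapper reports itself as x, which makes introspection output hard to tell apart. Note that in Python 3 the repr of a function is built from `__qualname__`, so it may still show the nested name unless that attribute is set as well.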

Case-insensitive (but case-preserving) strings have many uses, from more tolerant parsing of user input, to filename matching on filesystems that share this characteristic, such as all Windows filesystems and the Macintosh default filesystem. You might easily find yourself creating a variety of "case-insensitive" container types, such as dictionaries, lists, sets, and so on, meaning containers that go out of their way to treat string-valued keys or items as if they were case-insensitive. Clearly a better architecture is to factor out the functionality of "case-insensitive" comparisons and lookups once and for all; with this recipe in your toolbox, you can just add the required wrapping of strings into iStr instances wherever you may need it, including those times when you're making case-insensitive container types.
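As a rough illustration of that factoring, ordinary built-in containers already behave case-insensitively once their string keys or items are wrapped. The sketch assumes Python 3 and a condensed iStr that wraps only the comparison special methods; the names `headers`, `unique`, and `ordered` are illustrative.

```python
class iStr(str):
    """Condensed Python 3 version of iStr: hashing and comparisons only."""
    def __init__(self, *args):
        self._lowered = str.lower(self)
    def __hash__(self):
        return hash(self._lowered)
    def lower(self):
        return self._lowered

def _make_case_insensitive(name):
    str_meth = getattr(str, name)
    def x(self, other, *args):
        try:
            other = other.lower()
        except (TypeError, AttributeError, ValueError):
            pass
        return str_meth(self._lowered, other, *args)
    x.__name__ = name
    setattr(iStr, name, x)

for name in 'eq lt le ge gt ne'.split():
    _make_case_insensitive('__%s__' % name)

# Dictionary: lookups ignore case, stored keys keep their original case.
headers = {iStr('Content-Type'): 'text/html'}
print(headers[iStr('CONTENT-TYPE')])     # text/html
print([str(k) for k in headers])         # ['Content-Type']

# Set: items differing only in case collapse into one.
unique = {iStr(w) for w in ['Spam', 'SPAM', 'eggs']}
print(len(unique))                       # 2

# Sorting: order ignores case.
ordered = sorted(iStr(w) for w in ['banana', 'apple', 'Cherry'])
print([str(s) for s in ordered])         # ['apple', 'banana', 'Cherry']
```

The lookup key is wrapped in iStr as well. A plain str key hashes on its exact spelling, so an unwrapped mixed-case key would generally miss the stored entry.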

For example, a list whose items are basically strings, but are to be treated case-insensitively (for sorting purposes and in such methods as count and index), is reasonably easy to build on top of iStr:

class iList(list):
    def __init__(self, *args):
        list.__init__(self, *args)
        # rely on __setitem__ to wrap each item into iStr...
        self[:] = self
    wrap_each_item = iStr
    def __setitem__(self, i, v):
        if isinstance(i, slice): v = map(self.wrap_each_item, v)
        else: v = self.wrap_each_item(v)
        list.__setitem__(self, i, v)
    def append(self, item):
        list.append(self, self.wrap_each_item(item))
    def extend(self, seq):
        list.extend(self, map(self.wrap_each_item, seq))

Essentially, all we're doing is ensuring that every item that gets into an instance of iList gets wrapped by a call to iStr, and everything else takes care of itself.

Incidentally, this example class iList is deliberately coded so that you can easily make customized subclasses of iList to accommodate application-specific subclasses of iStr: all such a customized subclass of iList needs to do is override the single class-level member named wrap_each_item.
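A short sketch of iList in use, including the wrap_each_item override just described. It assumes Python 3, so the map calls are replaced by list comprehensions, and it uses a condensed iStr that wraps only the comparison special methods. The classes `Tag` and `TagList` are hypothetical names invented for this illustration.

```python
class iStr(str):
    """Condensed Python 3 version of iStr: hashing and comparisons only."""
    def __init__(self, *args):
        self._lowered = str.lower(self)
    def __hash__(self):
        return hash(self._lowered)
    def lower(self):
        return self._lowered

def _make_case_insensitive(name):
    str_meth = getattr(str, name)
    def x(self, other, *args):
        try:
            other = other.lower()
        except (TypeError, AttributeError, ValueError):
            pass
        return str_meth(self._lowered, other, *args)
    x.__name__ = name
    setattr(iStr, name, x)

for name in 'eq lt le ge gt ne'.split():
    _make_case_insensitive('__%s__' % name)

class iList(list):
    def __init__(self, *args):
        list.__init__(self, *args)
        self[:] = self          # re-assign through __setitem__ to wrap items
    wrap_each_item = iStr
    def __setitem__(self, i, v):
        if isinstance(i, slice):
            v = [self.wrap_each_item(x) for x in v]
        else:
            v = self.wrap_each_item(v)
        list.__setitem__(self, i, v)
    def append(self, item):
        list.append(self, self.wrap_each_item(item))
    def extend(self, seq):
        list.extend(self, [self.wrap_each_item(x) for x in seq])

names = iList(['Bob', 'alice'])
names.append('CAROL')
names.extend(['bob', 'Dave'])
print(names.count('BOB'))        # 2
print(names.index('carol'))      # 2
names.sort()
print([str(n) for n in names])   # ['alice', 'Bob', 'bob', 'CAROL', 'Dave']

class Tag(iStr):                 # hypothetical application-specific subclass
    pass

class TagList(iList):
    wrap_each_item = Tag         # the only override needed

print(type(TagList(['x'])[0]).__name__)   # Tag
```

Only the three entry points shown wrap their arguments; other list methods such as insert are not overridden here, just as in the recipe's iList.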

See Also

Library Reference and Python in a Nutshell sections on str, string methods, and special methods used in comparisons and hashing.



Python Cookbook
ISBN: 0596007973
Year: 2004
Pages: 420
