Recipe5.2.Sorting a List of Strings Case-Insensitively


Recipe 5.2. Sorting a List of Strings Case-Insensitively

Credit: Kevin Altis, Robin Thomas, Guido van Rossum, Martin V. Lewis, Dave Cross

Problem

You want to sort a list of strings, ignoring case differences. For example, you want a, although it's lowercase, to sort before B, although the latter is uppercase. By default, however, string comparison is case-sensitive (e.g., all uppercase letters sort before all lowercase ones).

Solution

The decorate-sort-undecorate (DSU) idiom is simple and fast:

def case_insensitive_sort(string_list):     auxiliary_list = [(x.lower( ), x) for x in string_list]    # decorate     auxiliary_list.sort( )                                     # sort     return [x[1] for x in auxiliary_list]                     # undecorate

In Python 2.4, DSU is natively supported, so (assuming the items of string_list are indeed strings, and not, e.g., Unicode objects), you can use the following even shorter and faster approach:

def case_insensitive_sort(string_list):     return sorted(string_list, key=str.lower)

Discussion

An obvious alternative to this recipe's Solution is to code a comparison function and pass it to the sort method:

def case_insensitive_sort_1(string_list):     def compare(a, b): return cmp(a.lower( ), b.lower( ))     string_list.sort(compare)

However, in this way the lower method gets called twice for every comparison, and the number of comparisons needed to sort a list of n items is typically proportional to n log(n).

The DSU idiom builds an auxiliary list, whose items are tuples where each item of the original list is preceded by a "key". The sort then takes place on the key, because Python compares tuples lexicographically (i.e., it compares the tuples' first items first). With DSU, the lower method gets called only n times to sort a list of n strings, which saves enough time to cover the small costs of the first, decorate step and the final, undecorate step, with a big net increase in speed.

DSU is also sometimes known, not quite correctly, as the Schwartzian Transform, by somewhat imprecise analogy with a well-known idiom of the Perl language. (If anything, DSU is closer to the Guttman-Rosler Transform, see http://www.sysarch.com/perl/sort_paper.html.)

DSU is so important that Python 2.4 supports it directly: you can optionally pass to the sort method of a list an argument named key, which is the callable to use on each item of the list to obtain the key for the sort. If you pass such an argument, the sorting internally uses DSU. So, in Python 2.4, string_list.sort(key=str.lower is essentially equivalent to function case_insensitive_sort, except the sort method sorts the list in-place (and returns None) instead of returning a sorted copy and leaving the original list alone. If you want function case_insensitive_sort to sort in-place, by the way, just change its return statement into an assignment to the list's body:

string_list[:] = [x[1] for x in auxiliary_list]

Vice versa, if, in Python 2.4, you want to get a sorted copy and leave the original list alone, you can use the new built-in function sorted. For example, in Python 2.4:

for s in sorted(string_list, key=str.lower): print s

prints each string in the list, sorted case-insensitively, without affecting string_list itself.

The use of str.lower as the key argument in the Python 2.4 Solution restricts you to specifically sorting strings (not, e.g., Unicode objects). If you know you're sorting a list of Unicode objects, use key=unicode.lower instead. If you need a function that applies just as well to strings and Unicode objects, you can import string and then use key=string.lower; alternatively, you could use key=lambda s: s.lower( ).

If you need case-insensitive sorting of lists of strings, you might also need dictionaries and sets using case-insensitive strings as keys, lists behaving case-insensitively regarding such methods as index and count, case-insensitive results from needle in haystack, and so on. If that is the case, then your real underlying need is a subtype of str that behaves case-insensitively in comparison and hashinga clearly better factoring of the issue, compared to implementing many container types and functions to get all of this functionality. To see how to implement such a type, see Recipe 1.24.

See Also

The Python Frequently Asked Questions http://www.python.org/cgi-bin/faqw.py?req=show&file=faq04.051.htp; Recipe 5.3; Python 2.4 Library Reference about the sorted built-in function and the key argument to sort and sorted; Recipe 1.24.



Python Cookbook
Python Cookbook
ISBN: 0596007973
EAN: 2147483647
Year: 2004
Pages: 420

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net