Recipe5.5.Sorting Strings with Embedded Numbers


Recipe 5.5. Sorting Strings with Embedded Numbers

Credit: Sébastien Keim, Chui Tey, Alex Martelli

Problem

You need to sort a list of strings that contain substrings of digits (e.g., a list of postal addresses) in an order that looks good. For example, 'foo2.txt' should come before 'foo10.txt'. However, Python's default string comparison is alphabetical, so, by default, 'foo10.txt' instead comes before 'foo2.txt'.

Solution

You need to split each string into sequences of digits and nondigits, and transform each sequence of digits into a number. This gives you a list that is just the right comparison key for the sort you want, and you can then use DSU for the sort itselfthat is, code two functions, shorter than this description:

import re re_digits = re.compile(r'(\d+)') def embedded_numbers(s):     pieces = re_digits.split(s)             # split into digits/nondigits     pieces[1::2] = map(int, pieces[1::2])   # turn digits into numbers     return pieces def sort_strings_with_embedded_numbers(alist):     aux = [ (embedded_numbers(s), s) for s in alist ]     aux.sort( )     return [ s for _ _, s in aux ]           # convention: _ _ means "ignore"

In Python 2.4, use the native support for DSU, with the same function embedded_numbers to get the sort key:

def sort_strings_with_embedded_numbers(alist):     return sorted(alist, key=embedded_numbers)

Discussion

Say you have an unsorted list of filenames, such as:

files = 'file3.txt file11.txt file7.txt file4.txt file15.txt'.split( )

If you just sort and print this list, for example in Python 2.4 with print ' '.join(sorted(files)), your output looks like file11.txt file15.txt file3.txt file4.txt file7.txt, since, by default, strings are sorted alphabetically (to use a fancier word, the sort order is described as lexicographical). Python cannot just guess that you mean to treat in a different way those substrings that happen to be made of digits; you have to tell Python precisely what you want, and this recipe shows how.

Using this recipe, you can get a nicer-looking result:

print ' '.join(sort_strings_with_embedded_numbers(files))

The output is now file3.txt file4.txt file7.txt file11.txt file15.txt, which is probably just what you want in this case.

The implementation relies on the DSU idiom. We need to code DSU explicitly if we want to support Python 2.3, while if our code is Python 2.4-only, we just rely on the native implementation of DSU. We do so by passing an argument named key (a function to be called on each item to get the right comparison key for the sort) to the new built-in function sorted.

Function embedded_numbers in the recipe is how we get the right comparison key for each item: a list alternating substrings of nondigits, and the int obtained from each substring of digits. re_digits.split(s) gives us a list of alternating substrings of nondigits and digits (with the substrings of digits at odd-numbered indices); then, we use built-in functions map and int (and extended-form slices that get and set all items at odd-numbered indices) to turn sequences of digits into integers. Lexicographical comparison on this list of mixed types now produces just the right result.

See Also

Library Reference and Python in a Nutshell docs about extended slicing and about module re; Python 2.4 Library Reference about the sorted built-in function and the key argument to sort and sorted; Recipe 5.3; Recipe 5.2.



Python Cookbook
Python Cookbook
ISBN: 0596007973
EAN: 2147483647
Year: 2004
Pages: 420

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net