2.8 Sorting by Item or by Attribute


Credit: Matthew Wood

2.8.1 Problem

You need to sort a list of (x, y) coordinates by item y, or, more generally, sort a list of objects by any attribute or item.

2.8.2 Solution

You might first think of something like the following class, based on the simple but slow approach of passing a comparison function to the sort method:

class Sorter:     # Notice how _ _compare is dependent on self._ _whichindex     def _ _compare(self, x, y):         return cmp(x[self._ _whichindex], y[self._ _whichindex])     # Pass the sort function the _ _compare function defined above     def _ _call_ _(self, data, whichindex = None):         if whichindex is None :             data.sort(  )         else :             self._ _whichindex = whichindex             data.sort(self._ _compare)         return data               # handy for inlining, and low-cost

The trick is to use a bound method that accesses instance state as the comparison function to determine which item (or attribute, but this version deals only with items) to fetch. Unfortunately, this makes the approach nonreentrant and not thread-safe.

Thanks to the faster, more robust, and more flexible DSU idiom, it's not hard to make a more general version that allows attributes as well as items, guarantees a stable sort, is both reentrant and thread-safe, and is speedier to boot:

class Sorter:     def _helper(self, data, aux, inplace):         aux.sort(  )         result = [data[i] for junk, i in aux]         if inplace: data[:] = result         return result     def byItem(self, data, itemindex=None, inplace=1):         if itemindex is None:             if inplace:                 data.sort(  )                 result = data             else:                 result = data[:]                 result.sort(  )             return result         else:             aux = [(d[i][itemindex], i) for i in range(len(data))]             return self._helper(data, aux, inplace)     # a couple of handy synonyms     sort = byItem     _ _call_ _ = byItem     def byAttribute(self, data, attributename, inplace=1):         aux = [(getattr(d[i],attributename),i) for i in range(len(data))]         return self._helper(data, aux, inplace)

Of course, since the second version doesn't use its "classhood" in any way, making it a class is somewhat peculiar. It would be far more Pythonic, and clearer, to use a module with free-standing functions, rather than an artificial class with instances that have no state.

2.8.3 Discussion

How do you efficiently sort a list of (x, y) coordinates by y? More generally, how do you sort a list of dictionaries, lists, or class instances by a particular item or attribute? I hate not being able to sort by any item or attribute, so I wrote an auxiliary class to do it for me.

The DSU idiom is much faster than passing sort a comparison function, as discussed in other recipes. The second version of Sorter in this recipe uses DSU because of this, as well as auxiliary flexibility advantages. This second version gets no benefit from being a class rather than just a couple of functions, but casting it as a class makes it drop-in compatible with the first, inferior version, which did use some state as a trick (losing reentrancy and thread-safety in the process).

Here is some example code (note that it instantiates the Sorter class only once, another hint that it is not at all an optimal architecture to wrap this functionality as a class):

sort = Sorter(  ) if _ _name_ _ == '_ _main_ _' :       list = [(1, 2), (4, 8), (0, 3)]       dict = [{'a': 3, 'b': 4}, {'a': 5, 'b': 2}, {'a': 0, 'b': 0},           {'a': 9, 'b': 9}]       dumb = [1, 4, 6, 7, 2, 5, 9, 2, 4, 6]       print 'list normal:', list       sort(list, 0)       print 'sort by [0]:', list       sort(list, 1)       print 'sort by [1]:', list       print       print "dict normal:", dict       sort(dict, 'a')       print "sort by 'a':",  dict       sort(dict, 'b')       print "sort by 'b':",  dict       print       print 'dumb normal:', dumb       sort(dumb)       print 'normal sort:', dumb

Returning the sorted list is cheap (it's just an extra reference, since Python, fortunately, never does any copying of data unless you specifically request it) and offers uniform behavior between in-place and non-in-place cases. Often, we only want to do:

for x in sort(something, inplace=0):

Returning a reference to the sorted data gives us just the right amount of flexibility for this.

2.8.4 See Also

Recipe 2.4 and Recipe 4.10.



Python Cookbook
Python Cookbook
ISBN: 0596007973
EAN: 2147483647
Year: 2005
Pages: 346

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net