Recipe 5.3. Sorting a List of Objects by an Attribute of the ObjectsCredit: Yakov Markovitch, Nick Perkins ProblemYou need to sort a list of objects according to one attribute of each object. SolutionThe DSU idiom shines, as usual: def sort_by_attr(seq, attr): intermed = [ (getattr(x, attr), i, x) for i, x in enumerate(seq) ] intermed.sort( ) return [ x[-1] for x in intermed ] def sort_by_attr_inplace(lst, attr): lst[:] = sort_by_attr(lst, attr) In Python 2.4, DSU is natively supported, so your code can be even shorter and faster: import operator def sort_by_attr(seq, attr): return sorted(seq, key=operator.attrgetter(attr)) def sort_by_attr_inplace(lst, attr): lst.sort(key=operator.attrgetter(attr)) DiscussionSorting a list of objects by an attribute of each object is best done using the DSU idiom introduced previously in Recipe 5.2. In Python 2.3 and 2.4, DSU is no longer needed, as it used to be, to ensure that a sort is stable (sorting is always stable in Python 2.3 and later), but DSU's speed advantages still shine. Sorting, in the general case and with the best algorithms, is O(n log n) (as is often the case in mathematical formulas, the juxtaposition of terms, here n and log n, indicates that the terms are multiplied). DSU's speed comes from maximally accelerating the O(n log n) part, which dominates sorting time for sequences of substantial length n, by using only Python's native (and maximally fast) comparison. The preliminary decoration step, which prepares an intermediate auxiliary list of tuples, and the successive undecoration step, which extracts the important item from each tuple after the intermediate list is sorted, are only O(n). Therefore any minor inefficiencies in these steps contribute negligible overhead if n is large enough, and reasonably little even for many practical values of n.
This recipe puts index i, in each tuple that is an item of list intermed, ahead of the corresponding x (where x is the i-th item in seq). This placement ensures that two items of seq will never be compared directly, even if they have the same value for the attribute named attr. Even in that case, their indices will still differ, and thus Python's lexicographic comparison of the tuples will never get all the way to comparing the tuples' last items (the original items from seq). Avoiding object comparisons may save us from performing extremely slow operations, or even from attempting forbidden ones. For example, we could sort a list of complex numbers by their real attribute: we would get an exception if we ever tried to compare two complex numbers directly, because no ordering is defined on complex numbers. But thanks to the precaution described in this paragraph, such an event can never occur, and the sorting will therefore proceed correctly. As mentioned earlier in Recipe 5.2, Python 2.4 supports DSU directly. You can pass an optional keyword-argument key, to sort, which is the callable to use on each item to get the sort key. Standard library module operator has two new functions, attrgetter and itemgetter, that exist specifically to return callables suitable for this purpose. In Python 2.4, the ideal solution to this problem therefore becomes: import operator seq.sort(key=operator.attrgetter(attr)) This snippet performs the sort in-place, which helps make it blazingly faston my machine, three times faster than the Python 2.3 function shown first in this recipe. If you need a sorted copy, without disturbing seq, you can get it using Python 2.4's new built-in function sorted: sorted_copy = sorted(seq, key=operator.attrgetter(attr)) While not quite as fast as an in-place sort, this latest snippet is still over 2.5 times faster than the function shown first in this recipe. Python 2.4 also guarantees that, when you pass the optional key named argument, list items will never be accidentally compared directly, so you need not take any special safeguards. Moreover, stability is also guaranteed. See AlsoRecipe 5.2; Python 2.4's Library Reference docs about the sorted built-in function, operator module's attrgetter and itemgetter functions, and the key argument to .sort and sorted. |