Recipe5.7.Keeping a Sequence Ordered as Items Are Added


Recipe 5.7. Keeping a Sequence Ordered as Items Are Added

Credit: John Nielsen

Problem

You want to maintain a sequence, to which items are added, in a sorted state, so that at any time, you can easily examine or remove the smallest item currently present in the sequence.

Solution

Say you start with an unordered list, such as:

the_list = [903, 10, 35, 69, 933, 485, 519, 379, 102, 402, 883, 1]

You could call the_list.sort( ) to make the list sorted and then result=the_list.pop(0) to get and remove the smallest item. But then, every time you add an item (say with the_list.append(0)), you need to call the_list.sort( ) again to keep the list sorted.

Alternatively, you can use the heapq module of the Python Standard Library:

import heapq heapq.heapify(the_list)

Now the list is not necessarily fully sorted, but it does satisfy the heap property (meaning if all indices involved are valid, the_list[i]<=the_list[2*i+1] and the_list[i]<=the_list[2*i+2])so, in particular, the_list[0] is the smallest item. To keep the heap property valid, use result=heapq.heappop(the_list) to get and remove the smallest item and heapq.heappush(the_list, newitem) to add a new item. When you need to do bothadd a new item while getting and removing the previously smallest itemyou can use result=heapq.heapreplace(the_list, newitem).

Discussion

When you need to retrieve data in an ordered way (at each retrieval getting the smallest item among those you currently have at hand), you can pay the runtime cost for the sorting when you retrieve the data, or you can pay for it when you add the data. One approach is to collect your data into a list and sort the list. Now it's easy to get your data in order, smallest to largest. However, you have to keep calling sort each time you add new data during the retrieval, to make sure you can later keep retrieving from the smallest current item after each addition. The method sort of Python lists is implemented with a little-known algorithm called Natural Mergesort, which minimizes the runtime cost of this approach. Yet the approach can still be burdensome: each addition (and sorting) and each retrieval (and removal, via pop) takes time proportional to the number of current items in the list (O(N), in common parlance).

An alternative approach is to use a data organization known as a heap, a type of binary tree implemented compactly, yet ensuring that each "parent" is always less than its "children". The best way to maintain a heap in Python is to use a list and have it managed by the heapq library module, as shown in this recipe's Solution. The list does not get fully sorted, yet you can be sure that, whenever you heappop an item from the list, you always get the lowest item currently present, and all others will be adjusted to ensure the heap property is still valid. Each addition with heappush, and each removal with heappop, takes a short time proportional to the logarithm of the current length of the list (O(log N), in common parlance). You pay as you go, a little at a time (and not too much in total, either.)

A good occasion to use this heap approach, for example, is when you have a long-running queue with new data periodically arriving, and you always want to be able to get the most important item off the queue without having to constantly re-sort your data or perform full searches. This concept is known as a priority queue, and a heap is an excellent way to implement it. Note that, intrinsically, the heapq module supplies you with the smallest item at each heappop, so make sure to arrange the way you encode your items' priority values to reflect this. For example, say that you receive incoming items each accompanied by a cost, and the most important item at any time is the one with the highest cost that is currently on the queue; moreover, among items of equal cost, the most important one is the one that arrived earliest. Here's a way to build a "priority queue" class respecting these specs and based on functions of module heapq:

class prioq(object):     def _ _init_ _(self):         self.q = [  ]         self.i = 0     def push(self, item, cost):         heapq.heappush(self.q, (-cost, self.i, item))         self.i += 1     def pop(self):         return heapq.heappop(self.q)[-1]

The main idea in this snippet is to push on the heap tuples whose first item is the cost with changed sign, so that higher costs result in smaller tuples (by Python's natural comparison); right after the cost, we put a progressive index, so that, among items with equal cost, the one arriving earliest will be in a smaller tuple.

In Python 2.4, module heapq has been reimplemented and optimized; see Recipe 5.8 for more information about heapq.

See Also

Docs for module heapq in the Library Reference and Python in a Nutshell; heapq.py in the Python sources contains a very interesting discussion of heaps; Recipe 5.8 for more information about heapq; Recipe 19.14 for merging sorted sequences using heapq.



Python Cookbook
Python Cookbook
ISBN: 0596007973
EAN: 2147483647
Year: 2004
Pages: 420

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net