Getting the N Smallest Items of an Array

Problem

You want to find the smallest few items in an array, or the largest, or the most extreme according to some other measure.

Solution

If you only need to find the single smallest item according to some measure, use Enumerable#min. By default, it uses the <=> method to see whether one item is "smaller" than another, but you can override this by passing in a code block.

	[3, 5, 11, 16].min 
	# => 3
	["three", "five", "eleven", "sixteen"].min
	# => "eleven"
	["three", "five", "eleven", "sixteen"].min { |x,y| x.size <=> y.size }
	# => "five" 

Similarly, if you need to find the single largest item, use Enumerable#max.

	[3, 5, 11, 16].max 
	# => 16
	["three", "five", "eleven", "sixteen"].max
	# => "three"
	["three", "five", "eleven", "sixteen"].max { |x,y| x.size <=> y.size }
	# => "sixteen"

By default, arrays are sorted by their natural order: numbers are sorted by value, strings by their position in the ASCII collating sequence (basically alphabetical order, but all lowercase characters precede all uppercase characters). Hence, in the previous examples, "three" is the largest string, and "eleven" the smallest.

It gets more complicated when you need to get a number of the smallest or largest elements according to some measurement: say, the top 5 or the bottom 10. The simplest solution is to sort the list and skim the items you want off of the top or bottom.

	l = [1, 60, 21, 100, -5, 20, 60, 22, 85, 91, 4, 66]
	sorted = l.sort

	#The top 5
	sorted[-5…sorted.size] 
	# => [60, 66, 85, 91, 100] 

	#The bottom 5 
	sorted[0…5] 
	# => [-5, 1, 4, 20, 21]

Despite the simplicity of this technique, it's inefficient to sort the entire list unless the number of items you want to extract approaches the size of the list.

Discussion

The min and max methods work by picking the first element of the array as a "champion," then iterating over the rest of the list trying to find an element that can beat the current champion on the appropriate metric. When it finds one, that element becomes the new champion. An element that can beat the old champion can also beat any of the other contenders seen up to that point, so one run through the list suffices to find the maximum or minimum.

The naive solution to finding more than one smallest item is to repeat this process multiple times. Iterate over the Array once to find the smallest item, then iterate over it again to find the next-smallest item, and so on. This is naive for the same reason a bubble sort is naive: you're repeating many of your comparisons more times than necessary. Indeed, if you run this algorithm once for every item in the array (trying to find the n smallest items in an array of n items), you get a bubble sort.

Sorting the list beforehand is better when you need to find more than a small fraction of the items in the list, but it's possible to do better. After all, you don't really want to sort the whole list: you just want to sort the bottom of the list to find the smallest items. You don't care if the other elements are unsorted because you're not interested in those elements anyway.

To sort only the smallest elements, you can keep a sorted "stable" of champions, and kick the largest champion out of the stable whenever you find an element that's smaller. If you encounter a number that's too large to enter the stable, you can ignore it from that point on. This process rapidly cuts down on the number of elements you must consider, making this approach faster than doing a sort.

The SortedList class from Recipe 4.7 is useful for this task. The min_n method below creates a SortedList "stable" that keeps its elements sorted based on the same block being used to find the minimum. It keeps the stable at a certain size by kicking out the largest item in the stable whenever a smaller item is found. The max_n method works similarly, but the comparisons are reversed, and the smallest element in the stable is kicked out when a larger element is found.

	module Enumerable
	 def min_n(n, &block)
	 block = Proc.new { |x,y| x <=> y } if block == nil
	 stable = SortedArray.new(&block) 
	 each do |x|
	 stable << x if stable.size < n or block.call(x, stable[-1]) == -1
	 stable.pop until stable.size <= n
	 end
	 return stable 
	 end

	 def max_n(n, &block)
	 block = Proc.new { |x,y| x <=> y } if block == nil
	 stable = SortedArray.new(&block) 
	 each do |x|
	 stable << x if stable.size < n or block.call(x, stable[0]) == 1
	 stable.shift until stable.size <= n
	 end
	 return stable 
	 end
	end
	
	l = [1, 60, 21, 100, -5, 20, 60, 22, 85, 91, 4, 66]
	l.max_n(5)
	# => [60, 66, 85, 91, 100] 
	l.min_n(5)
	# => [-5, 1, 4, 20, 21]
	
	l.min_n(5) { |x,y| x.abs <=> y.abs }
	# => [1, 4, -5, 20, 21]

 

See Also

  • Recipe 4.7, "Making Sure a Sorted Array Stays Sorted"


Strings

Numbers

Date and Time

Arrays

Hashes

Files and Directories

Code Blocks and Iteration

Objects and Classes8

Modules and Namespaces

Reflection and Metaprogramming

XML and HTML

Graphics and Other File Formats

Databases and Persistence

Internet Services

Web Development Ruby on Rails

Web Services and Distributed Programming

Testing, Debugging, Optimizing, and Documenting

Packaging and Distributing Software

Automating Tasks with Rake

Multitasking and Multithreading

User Interface

Extending Ruby with Other Languages

System Administration



Ruby Cookbook
Ruby Cookbook (Cookbooks (OReilly))
ISBN: 0596523696
EAN: 2147483647
Year: N/A
Pages: 399

Flylib.com © 2008-2020.
If you may any questions please contact us: flylib@qtcs.net