Making Sure a Sorted Array Stays Sorted

Problem

You want to make sure an array stays sorted, even as you replace its elements or add new elements to it.

Solution

Subclass Array and override the methods that add items to the array. The new implementations add every new item to a position that maintains the sortedness of the array.

As you can see below, there are a lot of these methods. If you can guarantee that a particular method will never be called, you can get away with not overriding it.

	class 
SortedArray < Array

	 def initialize(*args, &sort_by)
	 @sort_by = sort_by || Proc.new { |x,y| x <=> y }
	 super(*args)
	 sort! &sort_by
	 end

	 def insert(i, v)
	 # The next line could be further optimized to perform a
	 # binary search.
	 insert_before = index(find { |x| @sort_by.call(x, v) == 1 })
	 super(insert_before ? insert_before : -1, v)
	 end

	 def <<(v)
	 insert(0, v)
	 end

	 alias push <<
	 alias unshift <<

Some methods, like collect!, can modify the items in an array, taking them out of sort order. Some methods, like flatten!, can add new elements to strange places in an array. Rather than figuring out a way to implement these methods in a way that preserves the sortedness of the array, we'll just let them run and then re-sort the array.[1]

[1] We can't use define_method to define these methods because in Ruby 1.8 you can't use define_method to create a method that takes a block argument. See Chapter 10 for more on this.

	 ["collect!", "flatten!", "[]="].each do |method_name|
	 module_eval %{
	 def #{method_name}(*args)
	 super
	 sort! &@sort_by
	 end
	 }
	 end

	 def reverse!
	 #Do nothing; reversing the array would disorder it.
	 end
	end

A SortedArray created from an unsorted array will end up sorted:

	a = SortedArray.new([3,2,1]) # => [1, 2, 3]

 

Discussion

Many methods of Array are much faster on sorted arrays, so it's often useful to expend some overhead on keeping an array sorted over time. Removing items from a sorted array won't unsort it, but adding or modifying items can. Keeping a sorted array sorted means intercepting and reimplementing every sneaky way of putting objects into the array.

The SortedArray constructor accepts any code block you can pass into Array#sort, and keeps the array sorted according to that code block. The default code block uses the comparison operator (<=>) used by sort.

	unsorted= ["b", "aa", "a", "cccc", "1", "zzzzz", "k", "z"]
	strings_by_alpha = SortedArray.new(unsorted)
	# => ["1", "a", "aa", "b", "cccc", "k", "z", "zzzzz"]
	strings_by_length = SortedArray.new(unsorted) do |x,y|
	 x.length <=> y.length
	end
	# => ["b", "z", "a", "k", "1", "aa", "cccc", "zzzzz"]

The methods that add elements to an array specify where in the array they operate: push operates on the end of the array, and insert operates on a specified spot. SortedArray responds to these methods but it ignores the caller's request to put elements in a certain place. Every new element is inserted into a position that keeps the array sorted.

	a << -1 # => [-1, 1, 2, 3]
	a << 1.5 # => [-1, 1, 1.5, 2, 3]
	a.push(2.5) # => [-1, 1, 1.5, 2, 2.5, 3]
	a.unshift(1.6) # => [-1, 1, 1.5, 1.6, 2, 2.5, 3]

For methods like collect! and array assignment ([]=)that allow complex changes to an array, the simplest solution is to allow the changes to go through and then re-sort:

	a = SortedArray.new([10, 6, 4, -4, 200, 100]) 
	# => [-4, 4, 6, 10, 100, 200]
	a.collect! { |x| x * -1 } # => [-200, -100, -10, -6, -4, 4]

	a[3] = 25
	a # => [-200, -100, -10, -4, 4, 25]
	# That is, -6 has been replaced by 25 and the array has been re-sorted.

	a[1..2] = [6000, 10, 600, 6]
	a # => [-200, -4, 4, 6, 10, 25, 600, 6000]
	# That is, -100 and -10 have been replaced by 6000, 10, 600, and 6,
	# and the array has been re-sorted.

But with a little more work, we can write a more efficient implementation of array assignment that gives the same behavior. What happens when you run a command like a[0]= 10 on a SortedArray? The first element in the SortedArray is replaced by 10, and the SortedArray is re-sorted. This is equivalent to removing the first element in the array, then adding the value 10 to a place in the array that keeps it sorted.

Array#[]= implements three different types of array assignment, but all three can be modeled as a series of removals followed by a series of insertions. We can use this fact to implement a more efficient version of SortedArray#[]=:.

	class 
SortedArray
	 def []=(*args)
	 if args.size == 3
	 #e.g. "a[6,3] = [1,2,3]"
	 start, length, value = args
	 slice! Range.new(start, start+length, true)
	 (value.respond_to? :each) ? value.each { |x| self << x } : self << value
	 elsif args.size == 2
	 index, value = args
	 if index.is_a? Numeric
	 #e.g. "a[0] = 10" (the most common form of array assignment)
	 delete_at(index)
	 self << value
	 elsif index.is_a? Range
	 #e.g. "a[0..3] = [1,2,3]"
	 slice! index
	 (value.respond_to? :each) ? value.each { |x| self << x } : self << value
	 else
	 #Not supported. Delegate to superclass; will probably give an error.
	 super
	 sort!(&sort_by)
	 end
	 else
	 #Not supported. Delegate to superclass; will probably give an error.
	 super
	 sort!(&sort_by)
	 end
	 end
	end

Just as before, the sort will be maintained even when you use array assignment to replace some of a SortedArray's elements with other objects. But this implementation doesn't have to re-sort the array every time.

	a = SortedArray.new([1,2,3,4,5,6])
	a[0] = 10
	a # => [2, 3, 4, 5, 6, 10]

	a[0, 2] = [100, 200]
	a # => [4, 5, 6, 10, 100, 200]

	a[1..2] = [-4, 6]
	a # => [-4, 4, 6, 10, 100, 200]

It's possible to subvert the sortedness of a SortedArray by modifying an object in place in a way that changes its sort order. Since the SortedArray never hears about the change to this object, it has no way of updating itself to move that object to its new sort position:[2]

[2] One alternative is to modify SortedArray[] so that when you look up an element of the array, you actually get a delegate object that intercepts all of the element's method calls, and re-sorts the array whenever the user calls a method that modifies the element in place. This is probably overkill.

	stripes = 
SortedArray.new(["aardwolf", "zebrafish"])
	stripes[1].upcase! 
	stripes # => ["aardwolf", "ZEBRAFISH"]
	stripes.sort! # => ["ZEBRAFISH", "aardwolf"]

If this bothers you, you can make a SortedArray keep frozen copies of objects instead of the objects themselves. This solution hurts performance and uses more memory, but it will also prevent objects from being modified after being put into the SortedArray. This code adds a convenience method to Object that makes a frozen copy of the object:

	class Object
	 def to_frozen
	 f = self
	 unless frozen?
	 begin 
	 f = dup.freeze
	 rescue TypeError 
	 #This object can't be duped (e.g. Fixnum); fortunately,
	 #it usually can't be modified either 
	 end
	 end
	 return f
	 end
	end

The FrozenCopySortedArray stores frozen copies of objects instead of the objects themselves:

	class FrozenCopySortedArray < SortedArray
	 def insert(i, v) 
	 insert_before = index(find { |x| x > v })
	 super(insert_before ? insert_before : -1, v.to_frozen)
	 end

	 ["initialize", "collect!", "flatten!"].each do |method_name|
	 define_method(method_name) do 
	 super
	 each_with_index { |x, i| self[i] = x.to_frozen }
	 # No need to sort; by doing an assignment to every element
	 # in the array, we've made #insert keep the array sorted.
	 end
	 end
	end

	stripes = SortedArray.new(["aardwolf", "zebrafish"])
	stripes[1].upcase! 
	# TypeError: can't modify frozen string

Unlike a regular array, which can have elements of arbitrarily different data classes, all the elements of a SortedArray must be mutually comparable. For instance, you can mix integers and floating-point numbers within a SortedArray, but you can't mix integers and strings. Any data set that would cause Array# sort to fail makes an invalid SortedArray:

	[1, "string"].sort
	# ArgumentError: comparison of Fixnum with String failed

	a = SortedArray.new([1]) 
	a << "string" 
	# ArgumentError: comparison of Fixnum with String failed

One other pitfall: operations that create a new object, such as |=, +=, and to_a will turn an SortedArray into a (possibly unsorted) array.

	a = SortedArray.new([3, 2, 1]) # => [1, 2, 3]
	a += [1, -10] # => [1, 2, 3, 1, -10]
	a.class # => Array

The simplest way to avoid this is to override these methods to transform the resulting array back into a SortedArray:

	class SortedArray 
	 def + (other_array) 
	 SortedArray.new(super)
	 end
	end

 

See Also

  • Recipe 4.11, "Getting the N Smallest Items of an Array," uses a SortedArray
  • If you're going to do a lot of insertions and removals, a red-black tree may be faster than a SortedArray; you can choose from a pure Ruby implementation (http://www.germane-software.com/software/Utilities/RBTree/) and one that uses a C extension for speed (http://www.geocities.co.jp/SiliconValley-PaloAlto/3388/rbtree/README.html)


Strings

Numbers

Date and Time

Arrays

Hashes

Files and Directories

Code Blocks and Iteration

Objects and Classes8

Modules and Namespaces

Reflection and Metaprogramming

XML and HTML

Graphics and Other File Formats

Databases and Persistence

Internet Services

Web Development Ruby on Rails

Web Services and Distributed Programming

Testing, Debugging, Optimizing, and Documenting

Packaging and Distributing Software

Automating Tasks with Rake

Multitasking and Multithreading

User Interface

Extending Ruby with Other Languages

System Administration



Ruby Cookbook
Ruby Cookbook (Cookbooks (OReilly))
ISBN: 0596523696
EAN: 2147483647
Year: N/A
Pages: 399

Flylib.com © 2008-2020.
If you may any questions please contact us: flylib@qtcs.net