Problem
You want to partition a Set or array based on some attribute of its elements. All elements that go "together" in some code-specific sense should be grouped together in distinct data structures. For instance, if you're partitioning by color, all the green objects in a Set should be grouped together, separate from the group of all the red objects in the Set.
Solution
Use Set#divide, passing in a code block that returns the partition of the object it's passed. The result will be a new Set containing a number of partitioned subsets of your original Set.
The code block can accept either a single argument or two arguments.[3] The single-argument version examines each object to see which subset it should go into.
[3] This is analogous to the one-argument code block passed into Enumerable#sort_by and the two-argument code block passed into Array#sort.
require 'set' s = Set.new((1..10).collect) # => # # Divide the set into the "true" subset and the "false" subset: that # is, the "less than 5" subset and the "not less than 5" subset. s.divide { |x| x < 5 } # => #, #}> # Divide the set into the "0" subset and the "1" subset: that is, the # "even" subset and the "odd" subset. s.divide { |x| x % 2 } # => #, #}> s = Set.new([1, 2, 3, 'a', 'b', 'c', -1.0, -2.0, -3.0]) # Divide the set into the "String subset, the "Fixnum" subset, and the # "Float" subset. s.divide { |x| x.class } # => #, # => #, # => #}>
For the two-argument code block version of Set#divide, the code block should return true if both the arguments it has been passed should be put into the same subset.
s = [1, 2, 3, -1, -2, -4].to_set # Divide the set into sets of numbers with the same absolute value. s.divide { |x,y| x.abs == y.abs } # => #, # => #, # => #, # => #}> # Divide the set into sets of adjacent numbers s.divide { |x,y| (x-y).abs == 1 } # => #, # => #, # => #}>
If you want to classify the subsets by the values they have in common, use Set#classify instead of Set#divide. It works like Set#divide, but it returns a hash that maps the names of the subsets to the subsets themselves.
s.classify { |x| x.class } # => {String=>#, # => Fixnum=>#, # => Float=>#}
Discussion
The version of Set#divide that takes a two-argument code block uses the tsort library to turn the Set into a directed graph. The nodes in the graph are the items in the Set. Two nodes x and y in the graph are connected with a vertex (one-way arrow) if the code block returns true when passed |x,y|. For the Set and the two-argument code block given in the example above, the graph looks like Figure 4-1.
Figure 4-1. The set {1, 2, 3, -1, -2, -4} graphed according to the code block that checks adjacency
The Set partitions returned by Set#divide are the strongly connected components of this graph, obtained by iterating over TSort#each_strongly_connected_component. A strongly connected component is a set of nodes such that, starting from any node in the component, you can follow the one-way arrows and get to any other node in the component.
Visually speaking, the strongly connected components are the "clumps" in the graph. 1 and 3 are in the same strongly connected component as 2, because starting from 3 you can follow one-way arrows through 2 and get to 1. Starting from 1, you can follow one-way arrows through 2 and get to 3. This makes 1, 2, and 3 part of the same Set partition, even though there are no direct connections between 1 and 3.
In most real-world scenarios (including all the examples above), the one-way arrows will be symmetrical: if the code returns true for |x,y|, it will also return true for |y,x|. Set#divide will work even if this isn't true. Consider a Set and a divide code block like the following:
connections = { 1 => 2, 2 => 3, 3 => 1, 4 => 1 } [1,2,3,4].to_set.divide { |x,y| connections[x] == y } # => #, #}>
The corresponding graph looks like Figure 4-2.
Figure 4-2. The set {1,2,3,4} graphed according to the connection hash
You can get to any other node from 4 by following one-way arrows, but you can't get to 4 from any of the other nodes. This puts 4 is in a strongly connected componentand a Set partitionall by itself. 1, 2, and 3 form a second strongly connected componentand a second Set partitionbecause you can get from any of them to any of them by following one-way arrows.
Implementation for arrays
If you're starting with an array instead of a Set, it's easy to simulate Set#classify (and the single-argument block form of Set#divide)with a hash. In fact, the code below is almost identical to the current Ruby implementation of Set#classify.
class Array def classify require 'set' h = {} each do |i| x = yield(i) (h[x] ||= self.class.new) << i end h end def divide(&block) Set.new(classify(&block).values) end end [1,1,2,6,6,7,101].divide { |x| x % 2 } # => #
There's no simple way to implement a version of Array#divide that takes a two-argument block. The TSort class is Set-like, in that it won't create two different nodes for the same object. The simplest solution is to convert the array into a Set to remove any duplicate values, divide the Set normally, then convert the partitioned subsets into arrays, adding back the duplicate values as you go:
class Array def divide(&block) if block.arity == 2 counts = inject({}) { |h, x| h[x] ||= 0; h[x] += 1; h} to_set.divide(&block).inject([]) do |divided, set| divided << set.inject([]) do |partition, e| counts[e].times { partition << e } partition end end else Set.new(classify(&block).values) end end end [1,1,2,6,6,7,101].divide { |x,y| (x-y).abs == 1 } # => [[101], [1, 1, 2], [6, 6, 7]]
Is it worth it? You decide.
Strings
Numbers
Date and Time
Arrays
Hashes
Files and Directories
Code Blocks and Iteration
Objects and Classes8
Modules and Namespaces
Reflection and Metaprogramming
XML and HTML
Graphics and Other File Formats
Databases and Persistence
Internet Services
Web Development Ruby on Rails
Web Services and Distributed Programming
Testing, Debugging, Optimizing, and Documenting
Packaging and Distributing Software
Automating Tasks with Rake
Multitasking and Multithreading
User Interface
Extending Ruby with Other Languages
System Administration