Section 20.4. Service Discovery with Distributed Ruby | The Ruby Way, Second Edition: Solutions and Techniques in Ruby Programming (2nd Edition)

20.3. Rinda: A Ruby Tuplespace

The term tuplespace dates back as far as 1985, and the concept itself is even older than that. A tuple, of course, is simply an array or vector of data items (much like a database row); tuplespace is a large object space full of tuples, like a kind of "data soup."

So far, a tuplespace implementation sounds boring. It becomes more interesting when you realize that it is accessible in a synchronized way by multiple clients. In short, it is inherently a distributed entity; any client can read or write the tuplespace, so they can all use it as a large shared storage or even as a way to communicate.

The original tuplespace implementation was the Linda project, an experiment in parallel programming at Yale University in the 1980s. The Ruby implementation (based on drb, of course) is naturally called Rinda.

A Rinda tuple can actually be an array or a hash. If it is a hash, it has the additional restriction that all its keys must be strings. Here are some simple tuples:

t1 = [:add, 5, 9]
t2 = [:name, :add_service, Adder.new, nil]
t3 = { 'type' => 'add', 'value_1' => 5, 'value_2' => 9 }
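The string-key restriction on hash tuples is easy to check for yourself. The following is a minimal sketch using a local tuplespace (no drb server is started); the key names are just the ones from the example tuple above.

```ruby
require 'rinda/tuplespace'

ts = Rinda::TupleSpace.new

# A hash tuple with string keys is accepted and can be matched by a hash template.
ts.write({ 'type' => 'add', 'value_1' => 5, 'value_2' => 9 })
found = ts.read({ 'type' => 'add', 'value_1' => nil, 'value_2' => nil }, 0)
puts found['value_1'] + found['value_2']   # 5 + 9

# Symbol keys are rejected at the moment the tuple is written.
begin
  ts.write({ type: 'add', value_1: 5 })
rescue Rinda::InvalidHashTupleKey
  puts "hash tuple keys must be strings"
end
```

Note that a hash template must name the same keys as the tuple it is meant to match; nil is used for the values we do not care about.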

Each item in a tuple can be an arbitrary object; this works because drb can marshal and unmarshal Ruby objects. (Of course, you may need to use DRbUndumped or make the class definitions available on the server side.)

We create a tuplespace with a simple new call:

require 'rinda/tuplespace'
ts = Rinda::TupleSpace.new
# ...

So a server would simply look like this:

require 'rinda/tuplespace'
ts = Rinda::TupleSpace.new
DRb.start_service("druby://somehost:9000", ts)
gets   # CR to kill server

And a client would look like this:

require 'rinda/tuplespace'
DRb.start_service
ts = DRbObject.new(nil, "druby://somehost:9000")
# ...

We can perform five basic operations on a Rinda tuplespace: read, read_all, write, take, and notify.

A read operation is exactly what it sounds like: You are retrieving a tuple from tuplespace. However, identifying the tuple to read may be a little unintuitive; we do it by specifying a tuple that will match the one we want to read. A nil value is in effect a wildcard that will match any value.

t1 = ts.read [:Sum,nil]        # will retrieve [:Sum, 14] for example

Normally a read operation will block (as a way of providing synchronization). If you want to quickly test the existence of a tuple, you can use a nonblocking read by specifying a timeout value of zero:

t2 = ts.read [:Result,nil],0   # raises an exception if nonexistent
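In practice, a nonblocking read is usually wrapped so that the exception becomes an ordinary return value. Here is a small sketch; the helper name peek is our own invention, not part of Rinda, and the exception rescued is Rinda::RequestExpiredError.

```ruby
require 'rinda/tuplespace'

# Hypothetical helper: return a matching tuple, or nil if none exists right now.
def peek(space, template)
  space.read(template, 0)            # a timeout of 0 means "do not wait"
rescue Rinda::RequestExpiredError
  nil
end

ts = Rinda::TupleSpace.new
ts.write [:Result, 42]

p peek(ts, [:Result, nil])    # a matching tuple exists, so it is returned
p peek(ts, [:Missing, nil])   # nothing matches, so we get nil
```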

If we know or expect that more than one tuple will match the pattern, we can use read_all and get an array back:

tuples = ts.read_all [:Foo, nil, nil]
tuples.each do |t|
  # ...
end

The read_all method doesn't take a second parameter. It never blocks; if no matching tuple is found, it returns an empty array immediately.
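A short sketch of read_all against a local tuplespace shows both cases: a template with several matches and one with none. In the Rinda that ships with current Ruby, the second call returns an empty array right away rather than waiting. The tuple contents here are arbitrary.

```ruby
require 'rinda/tuplespace'

ts = Rinda::TupleSpace.new
ts.write [:Foo, 1, "a"]
ts.write [:Foo, 2, "b"]
ts.write [:Bar, 3, "c"]

matches = ts.read_all [:Foo, nil, nil]
puts matches.size                  # two tuples match the template

none = ts.read_all [:Baz, nil, nil]
p none                             # no match: an empty array, returned at once
```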

A take operation is basically a read followed by an implicit delete. The take actually removes a tuple from the tuplespace and returns it:

t = ts.take [:Sum, nil]    # tuple is now removed from tuplespace

You might ask why there isn't an explicit method to do a delete. Obviously the take method will serve that purpose.

The write method, of course, stores a tuple in tuplespace. Its second parameter tells how long in seconds the tuple should be kept before it expires. (The default expiration value is nil or never expiring.)

ts.write [:Add, 5, 9]         # Keep this "forever"
ts.write [:Foo, "Bar"], 10    # Keep this ten seconds
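To see expiration at work, we can write a short-lived tuple and look for it before and after its lifetime runs out. This is only a sketch; the one-second lifetime and the :Session tuple are arbitrary choices for illustration.

```ruby
require 'rinda/tuplespace'

ts = Rinda::TupleSpace.new
ts.write [:Session, "abc"], 1        # keep this tuple for one second

p ts.read([:Session, nil], 0)        # still present, so the read succeeds

sleep 1.5                            # wait until the lifetime has passed

begin
  ts.read([:Session, nil], 0)
rescue Rinda::RequestExpiredError
  puts "tuple has expired"
end
```

An expired tuple is simply invisible to read, read_all, and take, as though it had never been written.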

A few words on synchronization are appropriate here. Suppose two clients attempt to take the same tuple at (approximately) the same time. One will succeed, and the other will block. If the first (successful) client then modifies the tuple and writes it back into tuplespace, the second (blocked) client will then retrieve the new modified version of the tuple. So you can think of an "update" operation as being a take followed by a write, and there will be no data loss. Of course, as with all thread programming, you have to watch for deadlocks.
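The take-then-write idiom can be sketched with a shared counter. The helper increment and the [:counter, n] tuple layout are assumptions made for this example; the point is only that a thread holding the tuple has exclusive use of it until it writes the tuple back.

```ruby
require 'rinda/tuplespace'

# Hypothetical helper: an "update" expressed as a take followed by a write.
def increment(space)
  _, n = space.take([:counter, nil])   # blocks while another thread holds the tuple
  space.write([:counter, n + 1])       # put the modified tuple back
end

ts = Rinda::TupleSpace.new
ts.write [:counter, 0]

workers = 4.times.map do
  Thread.new { 25.times { increment(ts) } }
end
workers.each(&:join)

p ts.read([:counter, nil])   # → [:counter, 100]
```

No increment is lost, because between the take and the write there is no counter tuple in the tuplespace for any other thread to find.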

A notify method, not surprisingly, enables you to "watch" the tuplespace and be informed when a matching tuple has some operation performed on it. This method (which returns a NotifyTemplateEntry object) watches for four kinds of operations:

write operations
take operations
delete operations (when a tuple has expired)
close operations (when the NotifyTemplateEntry object has expired)

Because read operations are nondestructive, the system does not support notification of reads. Listing 20.4 shows an example of using notify.

Listing 20.4. Rinda's Notification Feature

require 'rinda/tuplespace'

ts = Rinda::TupleSpace.new

alberts = ts.notify "write", ["Albert", nil]
martins = ts.notify "take",  ["Martin", nil]

thr1 = Thread.new do
  alberts.each {|op,t| puts "#{op}: #{t.join(' ')}" }
end

thr2 = Thread.new do
  martins.each {|op,t| puts "#{op}: #{t.join(' ')}" }
end

sleep 1

ts.write ["Martin", "Luther"]
ts.write ["Albert", "Einstein"]
ts.write ["Martin", "Fowler"]
ts.write ["Albert", "Schweitzer"]
ts.write ["Martin", "Scorsese"]
ts.take  ["Martin", "Luther"]

sleep 1   # give the watcher threads time to print before the script exits

# Output:
# write: Albert Einstein
# write: Albert Schweitzer
# take: Martin Luther

We've seen how read and other operations use templates that match tuples (conceptually much as a regular expression works). A nil value can be a wildcard as we've seen, but a class can also be specified to match any instance of that class.

tem1 = ["X", Integer]     # matches ["X",5] but not ["X","Files"]
tem2 = ["X", NilClass]    # matches a literal nil in the tuple

In addition, you can define your own case equality (===) operator if you want a class to match a value in some special way. Otherwise, of course, the class will match based on the default === operator.
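Because matching falls back on ===, anything with a useful case equality can serve as a template element: a class, a Range, a Regexp, or an object of our own. The sketch below uses a made-up module named Even whose === accepts only even integers; it is purely illustrative and not part of Rinda.

```ruby
require 'rinda/tuplespace'

# Hypothetical matcher: its === accepts only even integers.
module Even
  def self.===(value)
    value.is_a?(Integer) && value.even?
  end
end

ts = Rinda::TupleSpace.new
ts.write ["X", 5]
ts.write ["X", 8]
ts.write ["X", "Files"]

p ts.read_all(["X", Integer])   # the two integer tuples
p ts.read_all(["X", Even])      # only the tuple holding 8
p ts.read_all(["X", /^F/])      # Regexp#=== picks out "Files"
p ts.read_all(["X", 1..6])      # Range#=== picks out 5
```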

Bear in mind that the lifetime of a tuple can be specified upon writing. This ties in with the timeout values on the various tuple operations, ensuring that it's possible to restrict any simple or complex operation to a finite length of time.

The fact that tuples can expire also means that they can be renewed when they expire, often with a custom renewer object. The library comes with a SimpleRenewer that simply contacts the tuple's originating drb server every 180 seconds; if the server is down, the tuple is allowed to expire. But don't bother with renewer objects until you are comfortable with the tuplespace paradigm.

If you want another tuplespace code fragment, Listing 20.5 shows a simple one. This is based on the producer/consumer example in Chapter 13.

Listing 20.5. The Producer-Consumer Problem Revisited

require 'rinda/tuplespace'

ts = Rinda::TupleSpace.new

producer = Thread.new do
  item = 0
  loop do
    sleep rand(0)
    puts "Producer makes item ##{item}"
    ts.write ["Item",item]
    item += 1
  end
end

consumer = Thread.new do
  loop do
    sleep rand(0)
    tuple = ts.take ["Item", nil]
    word, item = tuple
    puts "Consumer retrieves item ##{item}"
  end
end

sleep 60   # Run a minute, then die and kill threads