
10.2. Performing Higher-Level Data Access

Frequently we want to store and retrieve data in a more transparent manner. The Marshal module offers simple object persistence, and the PStore library builds on that functionality. Finally, the dbm library is used like a hash stored permanently on disk. It does not truly belong in this section, but it is too simple to put in the database section.

10.2.1. Simple Marshaling

In many cases we want to create an object and simply save it for use later. Ruby provides rudimentary support for such object persistence or marshaling. The Marshal module enables programs to serialize and unserialize Ruby objects in this way.

# array of elements [composer, work, minutes]
works = [["Leonard Bernstein","Overture to Candide",11],
         ["Aaron Copland","Symphony No. 3",45],
         ["Jean Sibelius","Finlandia",20]]

# We want to keep this for later...
File.open("store","w") do |file|
  Marshal.dump(works,file)
end

# Much later...
File.open("store") do |file|
  works = Marshal.load(file)
end

This technique does have the shortcoming that not all objects can be dumped. If an object includes an object of a fairly low-level class, it cannot be marshaled; these classes include IO, Proc, and Binding. Objects with singleton methods, as well as anonymous classes and modules, also cannot be serialized.
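As a quick check of these restrictions, the following sketch tries to dump several kinds of objects and reports which ones Ruby rejects. The helper name dumpable? is hypothetical (it is not part of Marshal); it simply relies on Marshal.dump raising a TypeError for objects it cannot serialize.

```ruby
# Hypothetical helper: true if Marshal can serialize obj, false otherwise.
def dumpable?(obj)
  Marshal.dump(obj)
  true
rescue TypeError
  false
end

plain = [1, "two", { "three" => 3.0 }]

# An object that carries a singleton method
with_singleton = Object.new
def with_singleton.greet
  "hi"
end

puts dumpable?(plain)            # ordinary data structures
puts dumpable?(proc { 1 })       # a Proc
puts dumpable?($stdout)          # an IO object
puts dumpable?(binding)          # a Binding
puts dumpable?(with_singleton)   # object with a singleton method
puts dumpable?(Class.new)        # an anonymous class
```

Only the first call should print true; the rest fall under the restrictions just described.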

Marshal.dump takes two other forms of parameter passing. When called with just one parameter, it returns the data as a string along with a major and minor version number in the first two bytes of the marshaled string.

s = Marshal.dump(works)
p s.getbyte(0)  #  4
p s.getbyte(1)  #  8

Normally, if you try to load such data, it will load only if the major version number is the same and the minor version number is less than or equal to the interpreter's. However, if the verbose flag for the Ruby interpreter is set (using --verbose or -v), then the versions must match exactly. These version numbers are independent of Ruby's own version numbers.
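To see which format version a given interpreter writes, we can compare the first two bytes of a dumped string with the constants Marshal::MAJOR_VERSION and Marshal::MINOR_VERSION. This is only a small sketch; the helper name marshal_format_version is an assumption made for illustration.

```ruby
# Hypothetical helper: extract [major, minor] from marshaled data.
def marshal_format_version(data)
  data.bytes.first(2)
end

data = Marshal.dump("anything")
major, minor = marshal_format_version(data)

puts "format found in data:  #{major}.#{minor}"
puts "format of interpreter: #{Marshal::MAJOR_VERSION}.#{Marshal::MINOR_VERSION}"
```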

The third parameter, a depth limit, makes sense only if the object being marshaled contains nested objects. If it is specified (as an integer) to Marshal.dump, then it is used as the maximum depth to traverse in the object being marshaled. If the nesting is less than the specified limit, then the object is marshaled without an error; otherwise an ArgumentError is raised. An example will make it clearer:

# The commented-out calls would each raise the error shown.
File.open("store","w") do |file|
  arr = [ ]
  # Marshal.dump(arr,file,0)    # in `dump': exceed depth limit
  #                             # (ArgumentError)
  Marshal.dump(arr,file,1)

  arr = [1, 2, 3]
  # Marshal.dump(arr,file,1)    # in `dump': exceed depth limit
  #                             # (ArgumentError)
  Marshal.dump(arr,file,2)

  arr = [1, [2], 3]
  # Marshal.dump(arr,file,2)    # in `dump': exceed depth limit
  #                             # (ArgumentError)
  Marshal.dump(arr,file,3)
end

File.open("store") do |file|
  p Marshal.load(file)          #  [ ]
  p Marshal.load(file)          #  [1, 2, 3]
  p Marshal.load(file)          #  [1, [2], 3]
end

The default value of the third parameter is -1. A negative depth implies no depth checking.
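The limit can also be given when dumping to a string, in which case it is passed as the second argument. The sketch below wraps the call in a hypothetical helper, dumps_within?, and reuses the nested array from the previous example.

```ruby
# Hypothetical helper: true if obj can be dumped under the given depth limit.
def dumps_within?(obj, limit)
  Marshal.dump(obj, limit)
  true
rescue ArgumentError
  false
end

arr = [1, [2], 3]

puts dumpables = [2, 3, -1].map { |limit| dumps_within?(arr, limit) }.inspect

# With no limit given, arbitrarily deep nesting is accepted.
deep = [[[[[[["bottom"]]]]]]]
puts Marshal.load(Marshal.dump(deep)) == deep
```

A limit of 2 is too shallow for this array, whereas 3 or any negative value is sufficient.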

10.2.2. More Complex Marshaling

Sometimes we want to customize our marshaling to some extent. Defining marshal_dump and marshal_load methods enables you to do this. These hooks are called when marshaling is done so that you handle your own conversion to and from a simpler, dumpable object. (The older _dump and _load hooks work similarly but convert to and from a string.)

In the following example, a person has been earning 5% interest on his beginning balance since he was born. We don't store the age and the current balance because they are a function of time.

class Person
  attr_reader :name
  attr_reader :age
  attr_reader :balance

  def initialize(name,birthdate,beginning)
    @name = name
    @birthdate = birthdate
    @beginning = beginning
    @age = (Time.now - @birthdate)/(365*86400)
    @balance = @beginning*(1.05**@age)
  end

  def marshal_dump
    Struct.new("Human",:name,:birthdate,:beginning)
    str = Struct::Human.new(@name,@birthdate,@beginning)
    str
  end

  def marshal_load(str)
    self.instance_eval do
      initialize(str.name, str.birthdate, str.beginning)
    end
  end

  # Other methods...
end

p1 = Person.new("Rudy",Time.now - (14 * 365 * 86400), 100)
p [p1.name, p1.age, p1.balance]  # ["Rudy", 14.0, 197.99315994394]

str = Marshal.dump(p1)
p2  = Marshal.load(str)
p [p2.name, p2.age, p2.balance]  # ["Rudy", 14.0, 197.99315994394]

When an object of this type is saved, the age and current balance will not be stored; when the object is "reconstituted," they will be computed. Notice how the marshal_load method assumes an existing object; this is one of the few times you might want to call initialize explicitly (just as new calls it).
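Ruby also supports an older pair of hooks, _dump and _load, which convert an object to and from a string. The following is a minimal sketch of that style; the Point class and its "x:y" string format are invented purely for illustration.

```ruby
# Hypothetical class illustrating the string-based _dump/_load hooks.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Called by Marshal.dump; must return a String.
  def _dump(_level)
    "#{@x}:#{@y}"
  end

  # Called by Marshal.load with the String that _dump produced.
  def self._load(str)
    x, y = str.split(":").map { |s| Integer(s) }
    new(x, y)
  end
end

copy = Marshal.load(Marshal.dump(Point.new(3, 4)))
p [copy.x, copy.y]   # [3, 4]
```

Note that _load is a class method that builds a new object, whereas marshal_load is an instance method called on an already allocated object.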

10.2.3. Performing Limited "Deep Copying" Using Marshal

Ruby has no "deep copy" operation. The methods dup and clone may not always work as you would initially expect. An object may contain nested object references that turn a copy operation into a game of Pick-Up-Sticks.

We offer here a way to handle a restricted deep copy. It is restricted because it is still based on Marshal and has the same inherent limitations:

def deep_copy(obj)
  Marshal.load(Marshal.dump(obj))
end

a = deep_copy(b)
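The difference from dup is easiest to see with a nested structure. In this small sketch (the sample data is invented), the shallow copy shares its inner array with the original, while the deep copy does not.

```ruby
def deep_copy(obj)
  Marshal.load(Marshal.dump(obj))
end

original = { "langs" => ["Ruby", "Perl"] }
shallow  = original.dup            # copies the hash, shares the array
deep     = deep_copy(original)     # copies the hash and the array

original["langs"] << "Python"

p shallow["langs"]   # ["Ruby", "Perl", "Python"]
p deep["langs"]      # ["Ruby", "Perl"]
```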

10.2.4. Better Object Persistence with PStore

The PStore library provides file-based persistent storage of Ruby objects. A PStore object can hold a number of Ruby object hierarchies. Each hierarchy has a root identified by a key. Hierarchies are read from a disk file at the start of a transaction and written back at the end.

require "pstore"

# save
db = PStore.new("employee.dat")
db.transaction do
  db["params"] = {"name" => "Fred", "age" => 32,
                  "salary" => 48000 }
end

# retrieve
require "pstore"
db = PStore.new("employee.dat")
emp = nil
db.transaction { emp = db["params"] }

Typically, within a transaction block we use the PStore object passed in. We can also use the receiver directly, however, as shown in the previous code.

This technique is transaction oriented; at the start of the block, data are retrieved from the disk file to be manipulated. Afterward, they are transparently written back out to disk.
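When we only want to read, transaction accepts a true argument that requests a read-only transaction; attempting to write inside one raises PStore::Error. The sketch below assumes a throwaway file in a temporary directory, and the helper name read_only_demo is hypothetical.

```ruby
require "pstore"
require "tmpdir"

# Hypothetical helper: stores one record, then reads it back in a
# read-only transaction and checks that a write there is refused.
def read_only_demo(path)
  db = PStore.new(path)

  db.transaction do |s|          # s is the PStore object itself
    s["params"] = { "name" => "Fred", "age" => 32 }
  end

  db.transaction(true) do |s|    # true requests a read-only transaction
    refused = begin
      s["params"] = {}
      false
    rescue PStore::Error
      true
    end
    [s["params"]["name"], refused]
  end
end

Dir.mktmpdir do |dir|
  p read_only_demo(File.join(dir, "employee.dat"))   # ["Fred", true]
end
```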

In the middle of a transaction, we can interrupt with either commit or abort; the former keeps the changes we have made, while the latter throws them away. Refer to this longer example:

require "pstore"

# Assume existing file with two objects stored
store = PStore.new("objects")
store.transaction do |s|
  a = s["my_array"]
  h = s["my_hash"]

  # Imaginary code omitted, manipulating
  # a, h, etc.

  # Assume a variable named "condition" having
  # the value 1, 2, or 3...

  case condition
    when 1
      puts "Oops... aborting."
      s.abort   # Changes will be lost.
    when 2
      puts "Committing and jumping out."
      s.commit  # Changes will be saved.
    when 3
      # Do nothing...
  end

  puts "We finished the transaction to the end."
  # Changes will be saved.
end
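A smaller, self-contained variation makes the effect of commit and abort visible. This is a sketch under stated assumptions: the key "count", the temporary file, and the helper name counter_after are all invented for the example.

```ruby
require "pstore"
require "tmpdir"

# Hypothetical helper: runs three transactions and returns the stored value.
def counter_after(path)
  store = PStore.new(path)
  store.transaction { |s| s["count"] = 1 }

  store.transaction do |s|
    s["count"] = 2
    s.commit          # saves the change and leaves the block at once
    s["count"] = 99   # never reached
  end

  store.transaction do |s|
    s["count"] = 3
    s.abort           # discards the change and leaves the block
  end

  store.transaction(true) { |s| s["count"] }
end

Dir.mktmpdir do |dir|
  p counter_after(File.join(dir, "objects"))   # 2
end
```

Both methods end the transaction immediately, so any code after them in the block is skipped.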

Within a transaction, you can also use the method roots to return an array of roots (or root? to test membership). There is also a delete method to remove a root.

store.transaction do |s|
  list = s.roots          # ["my_array","my_hash"]
  if s.root?("my_tree")
    puts "Found my_tree."
  else
    puts "Didn't find my_tree."
  end
  s.delete("my_hash")
  list2 = s.roots         # ["my_array"]
end

10.2.5. Working with CSV Data

CSV (comma-separated values) format is something you may have had to deal with if you have ever worked with spreadsheets or databases. Fortunately, Hiroshi Nakamura has created a module for Ruby and made it available in the Ruby Application Archive.

There is also a FasterCSV library created by James Edward Gray II. As the name implies, it runs faster, but it also has some changes and enhancements in the interface (though with a "compatibility mode" for users of the other library). At the time of this writing, there is some discussion that FasterCSV may become standard and replace the older library (likely taking over its name as well).

This is obviously not a true database system. However, a discussion of it fits better in this chapter than anywhere else.

The CSV module (csv.rb) will parse or generate data in CSV format. There is no universal agreement on the exact format of CSV data; the library author defines this format as follows:

Record separator: CR + LF
Field separator: comma (,)
Quote data with double quotes if it contains CR, LF, or comma
Quote double quote by prefixing it with another double quote ("-> "")
Empty field with quotes means null string (data,"",data)
Empty field without quotes means NULL (data,,data)

This section covers only a portion of the functionality of this library. It will be enough to get you started, but as always, the newest docs are to be found online (starting with ruby-doc.org).

Let's start by creating a file. To write out comma-separated data, we can simply open a file for writing; the open method will pass a writer object into the attached block. We then use the append operator to append arrays of data (which are converted to comma-separated format upon writing). The first line will be a header.

require 'csv'

CSV.open("data.csv","w") do |wr|
  wr << ["name", "age", "salary"]
  wr << ["mark", "29", "34500"]
  wr << ["joe", "42", "32000"]
  wr << ["fred", "22", "22000"]
  wr << ["jake", "25", "24000"]
  wr << ["don", "32", "52000"]
end

The preceding code gives us a data file data.csv. Because none of the fields contains a comma, quote, or line break, nothing needs to be quoted:

name,age,salary
mark,29,34500
joe,42,32000
fred,22,22000
jake,25,24000
don,32,52000

Another program can read this file as follows:

require 'csv'

CSV.open('data.csv', 'r') do |row|
  p row
end

# Output:
# ["name", "age", "salary"]
# ["mark", "29", "34500"]
# ["joe", "42", "32000"]
# ["fred", "22", "22000"]
# ["jake", "25", "24000"]
# ["don", "32", "52000"]

The preceding code could also be written without a block; then the open call would return a reader object. We could then invoke shift on the reader (as though it were an array) to retrieve the next row. But the block-oriented way seems more straightforward.
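In later Ruby versions, where the FasterCSV code became the standard csv library, the block passed to CSV.open receives the CSV object rather than each row, so row-by-row reading is usually written with CSV.foreach. The following sketch assumes such a version; the helper names csv_rows and first_two_rows are hypothetical.

```ruby
require "csv"
require "tmpdir"

# Block-oriented reading: collect every row as an array of strings.
def csv_rows(path)
  rows = []
  CSV.foreach(path) { |row| rows << row }
  rows
end

# Reader-oriented reading: pull rows one at a time with shift.
def first_two_rows(path)
  reader = CSV.open(path, "r")
  header = reader.shift
  first  = reader.shift
  reader.close
  [header, first]
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "data.csv")
  CSV.open(path, "w") do |wr|
    wr << ["name", "age", "salary"]
    wr << ["mark", "29", "34500"]
    wr << ["joe", "42", "32000"]
  end

  csv_rows(path).each { |row| p row }
  p first_two_rows(path)
end
```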

There are a few more advanced features and convenience methods in this library. For more information, see ruby-doc.org or the Ruby Application Archive.

10.2.6. Marshaling with YAML

YAML reportedly stands for YAML Ain't Markup Language. It is nothing but a flexible, human-readable data storage format. As such, it is similar to XML but arguably "prettier."

When we require the yaml library, we add a to_yaml method to every object. It is instructive to dump a few simple objects and a few more complex ones to see how YAML deals with them.

require 'yaml'

str = "Hello, world"
num = 237
arr = %w[ Jan Feb Mar Apr ]
hsh = {"This" => "is", "just a"=>"hash."}

puts str.to_yaml
puts num.to_yaml
puts arr.to_yaml
puts hsh.to_yaml

# Output:
# --- "Hello, world"
# --- 237
# ---
# - Jan
# - Feb
# - Mar
# - Apr
# ---
# just a: hash.
# This: is

The inverse of the to_yaml method is the YAML.load method, which can take a string or a stream as a parameter.

Assume that we had a file such as data.yaml here:

---
- "Hello, world"
- 237
-
  - Jan
  - Feb
  - Mar
  - Apr
-
  just a: hash.
  This: is

This is the same as the four data items we just looked at, except they are collected into a single array. If we now load this stream, we get this array back:

require 'yaml'

file = File.new("data.yaml")
array = YAML.load(file)
file.close
p array

# Output:
# ["Hello, world", 237, ["Jan", "Feb", "Mar", "Apr"],
#  {"just a"=>"hash.", "This"=>"is"}]

In general, YAML is just a way to marshal objects. At a higher level, it can be used for many purposes. For example, the fact that it is human-readable also makes it human-editable, and it becomes a natural format for configuration files and such things.
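As an illustration of the configuration-file use, here is a minimal sketch that loads a small settings document, changes one value, and writes it back out as YAML. The keys (host, port, plugins) are invented for the example, and the text is held in a string rather than a real file.

```ruby
require "yaml"

# A hypothetical configuration document
config_text = <<~CONFIG
  ---
  host: localhost
  port: 8080
  plugins:
    - logger
    - cache
CONFIG

config = YAML.load(config_text)
p config["port"]      # 8080
p config["plugins"]   # ["logger", "cache"]

# Edit a setting and serialize the whole structure again.
config["port"] = 9090
new_text = config.to_yaml
puts new_text

# Loading the new text gives back an equal structure.
p YAML.load(new_text) == config   # true
```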

There is more to YAML than we have shown here. For further information, consult ruby-doc.org or any written reference.

10.2.7. Object Prevalence with Madeleine

In some circles, object prevalence is popular. The idea is that memory is cheap and getting cheaper, and most databases are fairly small, so we'll forget the database and keep all our objects in memory.

The classic implementation was Prevayler, implemented in Java. The Ruby version is called Madeleine.

Madeleine isn't for everyone or every application. Object prevalence comes with its own set of rules and constraints. First, all objects must fit in memory, all at once. Second, all objects must be marshalable.

All objects must be deterministic; they must behave in exactly the same way based on their inputs. (This means that using the system clock or using random numbers is problematic.)

The objects should be isolated from all I/O (file and network) as much as possible. The general technique is to make calls outside the prevalence system to do such I/O.

Finally, every command that alters the state of the prevalence system must be issued in the form of a command object (so that these objects themselves can be marshaled and stored).

Madeleine provides two basic methods for accessing the object system. The execute_query method provides query capability or read-only access. The execute_command method encapsulates any operation that changes the state of any object in the object system.

Both these methods take a Command object as a parameter. A Command object by definition has an execute method.

The system works by taking snapshots of the object system at periodic points in the application's execution. The commands are serialized along with the other objects. Currently there is no way to "roll back" a set of transactions.
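The general idea of command objects, a command log, and snapshots can be sketched in plain Ruby. The code below is a toy illustration only: it does not use Madeleine and does not reflect its actual API or implementation. The class names ToyPrevalence and Deposit, and everything except the use of Marshal, are assumptions made for the example.

```ruby
# A command object: it knows how to change the system when executed.
class Deposit
  def initialize(account, amount)
    @account = account
    @amount  = amount
  end

  def execute(system)
    system[@account] = system.fetch(@account, 0) + @amount
  end
end

# Toy prevalence layer (hypothetical; not Madeleine).
class ToyPrevalence
  attr_reader :system

  def initialize(system = {})
    @system = system
    @log = []                        # marshaled commands since the snapshot
  end

  # Record the command first, then apply it to the in-memory objects.
  def execute_command(command)
    @log << Marshal.dump(command)
    command.execute(@system)
  end

  def snapshot
    Marshal.dump(@system)
  end

  def log
    @log.dup
  end

  # Rebuild the state from a snapshot by replaying the logged commands.
  def self.recover(snapshot, log)
    system = Marshal.load(snapshot)
    log.each { |entry| Marshal.load(entry).execute(system) }
    new(system)
  end
end

bank  = ToyPrevalence.new
start = bank.snapshot
bank.execute_command(Deposit.new("fred", 100))
bank.execute_command(Deposit.new("fred", 50))

restored = ToyPrevalence.recover(start, bank.log)
p restored.system   # {"fred"=>150}
```

Replaying the log reproduces the same state only because Deposit is deterministic, which is why the constraints on clocks and random numbers matter.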

It's difficult to create a good meaningful example of the usage of this library. If you are familiar with the Java version, I suggest you study the Ruby API and learn it that way. In the absence of any really good tutorials, perhaps you can write one.

10.2.8. Using the DBM Library

DBM is a simple platform-independent string-based hash file-storage mechanism. It stores a key and some associated data, both of which must be strings. Ruby's dbm interface is built into the standard installation.

To use this class, create a DBM object associated with a filename and work with the string-based hash however you want. When you have finished, you should close the file.

require 'dbm'

d = DBM.new("data")
d["123"] = "toodle-oo!"
puts d["123"]        # "toodle-oo!"
d.close

puts d["123"]        # RuntimeError: closed DBM file

e = DBM.open("data")
e["123"]             # "toodle-oo!"
w=e.to_hash          # {"123"=>"toodle-oo!"}
e.close

e["123"]             # RuntimeError: closed DBM file
w["123"]             # "toodle-oo!"

DBM is implemented as a single class that mixes in Enumerable. The two (aliased) class methods new and open are singletons, which means you may only have one DBM object per data file open at any given time.

q=DBM.new("data.dbm")   #
f=DBM.open("data.dbm")  # Errno::EWOULDBLOCK:
                        #   Try again - "data.dbm"

There are 34 instance methods, many of them aliases or similar to the hash methods. Basically, if you are used to manipulating a real hash in a certain way, there is a good chance you can apply the same operation to a dbm object.

The method to_hash makes a copy of the hash file object in memory, and close permanently closes the link to the hash file. Most of the rest of the methods are analogous to hash methods, but there are no rehash, sort, default, or default= methods. The to_s method just returns a string representation of the object id.