Databases and Persistence

Table of contents:

Table 13-1.

We all want to leave behind something that will outlast us, and Ruby processes are no exception. Every program you write leaves some record of its activity, even if its just data written to standard output. Most larger programs take this one step further: they store data from one run in a structured file, so that on another run they can pick up where they left off. There are a number of ways to persist data, from simple to insanely complex.

Simple persistence mechanisms like YAML let you write Ruby data structures to disk and load them back later. This is great for simple programs that don handle much data. Your program can store its entire state in a disk file, and load the file on its next invocation to pick up where it left off. If you never keep more data than can fit into memory, the simplest way to make it permanent is to store it with YAML, Marshal, or Madeleine, and reload it later (see Recipes 13.1, 13.2, and 13.3). Madeleine also lets you revisit the prior states of your data.

If your dataset won fit in memory, you need a database: a way of storing data on disk (usually in an indexed binary format) and retrieving parts of it quickly. The Berkeley database is the simplest database we cover: it operates like a hash, albeit a hash potentially much bigger than any you could keep in memory (Recipe 13.6).

But when most people think of a "database" they think of a relational database: MySQL, Postgres, Oracle, SQLite, or the like. A persistence mechanism stores data as Ruby data structures, and a Berkeley DB stores data as a hash of strings. But relational databases store data in the form of structured records with typed fields.

Because the tables of a relational database can have a complex structure and contain gigabytes of data, their contents are not accessed like normal Ruby data structures. Instead they e queried with SQL, a special programming language based on relational algebra. Most of the development time that goes into Ruby database libraries is spent trying to hide this fact. Several libraries hide the details of communication between a Ruby program and a SQL database; the balance of this chapter is devoted to showing how to use them.

Every relational database exposes a C API, and Ruby bindings to each API are available. We show you how to use the two most popular open source databases: MySQL (Recipe 13.9) and Postgres (Recipe 13.10).^[1] But every database has different bindings, and speaks a slightly different variant of SQL. Fortunately, there are other libraries that hide these differences behind a layer of abstraction. Once you install the bindings, you can install abstraction layers atop them and rely on the abstraction layer to keep track of the differences between databases.

^[1] SQLite deserves an honorable mention because, unlike other relational databases, it doesn require a server to run. The client code can directly query the database file. This makes things a lot easier to set up. Note that SQLite has two incompatible file formats (version 2 and version 3), and a gem exists for each version. You probably want the sqlite3-ruby gem.

Rubys simplest database abstraction library is DBI (its modeled after Perls DBI module). It does nothing more than provide a uniform interface to the different database bindings. You still have to write all the SQL yourself (and if you e serious about database neutrality, you must use the lowest common denominator of SQL), but you only have to learn a single binding API.

The more popular database abstraction libraries are ActiveRecord (the library of choice for Rails applications) and Og. Not only do these libraries hide the differences between databases, they hide most of the actual SQL. The database tables are represented as Ruby classes, the rows in the database tables as instances of those classes. You can find, create, and modify database rows by manipulating normal-looking Ruby objects. Neither Og nor ActiveRecord can do everything that raw SQL can, so you may also need to use DBI or one of the database-specific bindings.

One standard argument for database abstraction layers is that they make it easy to switch an applications underlying database without having to rewrite all the code. They certainly do make this easier, but it almost never happens.^[2] The real advantage is that with abstraction layers, you don have to learn all the different database bindings. Even if you never change databases for any given project, throughout your career youll find yourself using different databases on different projects. Learning how to use a database abstraction layer can save you from having to learn multiple database-specific bindings.

^[2] What does happen is that you may write a product designed to work with whatever database the user has installed. You can always require that your users run a specific database.

Whether you use ActiveRecord, Og, DBI, or database-specific bindings, youll need an actual database for your code to connect to. The recipes in this chapter assume youve got a database called cookbook and that you connect to it with the username "cookbook_user" and the password "password".

Heres how to set up cookbook as a MySQL database:

	$ mysql -u root
	Welcome to the MySQL monitor. Commands end with ; or g.

	Your MySQL connection id is 6 to server version: 4.0.24_Debian-10-log

	Type help; or \h for help. Type \c to clear the buffer.

	mysql> create database cookbook;
	Query OK, 1 row affected (0.00 sec)

	mysql> grant all privileges on cookbook.* to cookbook_user@localhost identified
	by password;
	Query OK, 0 rows affected (0.00 sec)

Heres how to set cookbook up as a Postgres database (youll probably need to run these commands as the postgres user):

	$ createuser
	Enter name of user to add: cookbook_user
	Enter password for new user: password
	Enter it again: password
	Shall the new user be allowed to create databases? (y/n) y
	Shall the new user be allowed to create more new users? (y/n) n
	CREATE USER

	$ createdb cookbook
	CREATE DATABASE

To avoid showing you the database connection code in every single recipe, weve factored it out into a library. If you want to run the code in this chapters recipes, you should put the following code in a file called cookbook_dbconnect.rb. Keep it in the directory where you keep the recipe code, or somewhere in your library include path, so that require cookbook_dbconnect will work.

This file defines database connection functions for DBI, ActiveRecord, and Og:

	# cookbook_dbconnect.rb
	require 
ubygems
	require dbi
	require active_record
	require og

The with_db method gets a database connection through DBI and runs a code block in the context of that connection:

	def with_db
	 DBI.connect("dbi:Mysql:cookbook:localhost",
	 "cookbook_user", "password") do |c|
	 yield c
	 end
	end

The activerecord_connect method only needs to be called once at the beginning of a program: after that, ActiveRecord will acquire database connections as needed.

	def activerecord_connect
	 ActiveRecord::Base.establish_connection(:adapter => "mysql",
	 :host => "localhost",
	 :username => "cookbook_user",
	 :password => "password",
	 :database => "cookbook")
	end

For your reference, this table presents the ActiveRecord adapter names for various kinds of databases.

Table 13-1.
Database	Adapter name
MySQL	mysql
PostgreSQL	postgresql
Oracle	oci
Microsoft SQL Server	sqlserver
SQLite 2	sqlite
SQLite 3	sqlite3
DB2	db2

The og_connect also needs to be called only once. One caveat: you must call it after youve defined the classes for your Og data model.

	def og_connect
	 Og.setup( { :destroy => false,
	 :store => :mysql,
	 :user => "cookbook_user",
	 :password => "password",
	 :name => "cookbook" } )
	end

This version of cookbook_dbconnect assumes you e running against a MySQL database. For a different database, you just need to change the database name so that DBI, ActiveRecord, and Og know which adapter they should use.

Here are some resources for more information about databases in Ruby:

http://ruby-dbi.rubyforge.org/
http://www.rubyonrails.org/show/ActiveRecord
http://www.rubygarden.com/index.cgi/Libraries/og_tutorial.rdoc

Strings

Numbers

Date and Time

Arrays

Hashes

Files and Directories

Code Blocks and Iteration

Objects and Classes8

Modules and Namespaces

Reflection and Metaprogramming

XML and HTML

Graphics and Other File Formats

Databases and Persistence

Internet Services

Web Development Ruby on Rails

Web Services and Distributed Programming

Testing, Debugging, Optimizing, and Documenting

Packaging and Distributing Software

Automating Tasks with Rake

Multitasking and Multithreading

User Interface

Extending Ruby with Other Languages

System Administration

Databases and Persistence

Table 13-1.