Connecting to External Databases

 
   

 


Ruby can interface to various databases, thanks to the development work of many different people. These range from monolithic systems such as Oracle down to the more petite MySQL. We have included the CSV (comma-separated values) format here for some measure of completeness.

The level of functionality provided by these packages is continually changing. Be sure to refer to an online reference for the latest information. The Ruby Application Archive (RAA) is always a good starting point.

Interfacing to MySQL

Ruby's MySQL interface is the most stable and fully functional of its database interfaces. It is an extension and must be installed after both Ruby and MySQL are installed and running. If you upgrade Ruby, you will need to reinstall it. Installation itself is simple, using Ruby's make process.

There are three steps to using this module once you have it installed. First, load the module in your script; then connect to the database. Finally, work with your tables. Connecting requires the usual parameters for host, username, password, database, and so on, as shown here:

 

 require 'mysql'

 m = Mysql.new("localhost", "ruby", "secret", "maillist")
 r = m.query("SELECT * FROM people ORDER BY name")
 r.each_hash do |f|
   puts "#{f['name']} - #{f['email']}"
 end

Partial output is shown here:

 

 John Doe - jdoe@rubynewbie.com
 Fred Smith - smithf@rubyexpert.com
 Don Jackson - don@perl2.com
 Jenny Williams - jwill27@miss-code.com

The class method Mysql.new is especially useful, along with the Mysql instance method query and the MysqlRes instance method each_hash.

The module is composed of four classes: Mysql, MysqlRes, MysqlField, and MysqlError, as described in the README file. We summarize some useful methods here, but you can always find more information in the actual documentation.

The class method Mysql.new takes several string parameters, all defaulting to nil, and it returns a connection object. The parameters are host, user, passwd, db, port, sock, and flag. Aliases for new are real_connect and connect.
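
For example, a connection to a server on a nonstandard port might look like the following sketch (the hostname, credentials, and port here are purely placeholders):

 require 'mysql'

 # Parameter order is host, user, passwd, db, port.
 m = Mysql.connect("db.example.com", "ruby", "secret", "maillist", 3307)
 # ... work with the connection ...
 m.close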

The methods create_db, select_db, and drop_db all take a database name as a parameter; they are used as shown here (note that the method close will close the connection to the server):

 

 m = Mysql.new("localhost", "ruby", "secret")
 m.create_db("rtest")    # Create a new database
 m.select_db("rtest2")   # Select a different database
 m.drop_db("rtest")      # Delete a database
 m.close                 # Close the connection

The method list_dbs returns the names of the available databases in an array:

 

 dbs = m2.list_dbs       # ["people","places","things"] 

The method query takes a string parameter and by default returns a MysqlRes object. If the attribute query_with_result has been set to false, it returns a Mysql object instead.
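
The difference is sketched here (we assume that your version of the binding provides the query_with_result accessor and the store_result method, which mirrors the C API's mysql_store_result):

 r = m.query("SELECT * FROM people")    # MysqlRes by default

 m.query_with_result = false
 m.query("SELECT * FROM people")        # Now returns the Mysql object itself
 r = m.store_result                     # Fetch the result set explicitly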

In the event of an error, the error number can be retrieved by errno; error, on the other hand, will return the actual error message. Here's an example:

 

 begin
   r = m.query("create table rtable
     (
       id int not null auto_increment,
       name varchar(35) not null,
       desc varchar(128) not null,
       unique id(id)
     )")
   # exception happens...
 rescue
   puts m.error
   # Prints: You have an error in your SQL syntax
   #   near 'desc varchar(128) not null,
   #   unique id(id)
   #   )' at line 5
   puts m.errno
   # Prints 1064
   # ('desc' is reserved for descending order)
 end

A few useful instance methods of MysqlRes are summarized in the following list (a short usage sketch follows the list):

  • fetch_fields Returns an array of MysqlField objects describing the fields of the result set.

  • fetch_row Returns an array of field values from the next row.

  • fetch_hash(with_table=false) Returns a hash containing the next row's field names and values.

  • num_rows Returns the number of rows in the result set.

  • each An iterator that sequentially returns an array of field values.

  • each_hash(with_table=false) An iterator that sequentially returns a hash of {fieldname => fieldvalue}. (Use x['field name'] to get the field value.)
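
Here is a brief sketch of how some of these methods might be used together, reusing the connection m from earlier:

 r = m.query("SELECT * FROM people")
 puts "#{r.num_rows} rows returned"
 r.each do |row|                # Each row as an array of field values
   puts row.join(", ")
 end

 r = m.query("SELECT * FROM people")
 row = r.fetch_hash             # First row as {fieldname => fieldvalue}
 puts row['email'] if row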

Here are some instance methods of MysqlField (a small example follows the list):

  • name Returns the name of the designated field

  • table Returns the name of the table to which the designated field belongs

  • length Returns the defined length of the field

  • max_length Returns the length of the longest field from the result set

  • hash Returns a hash whose keys are name, table, def, type, length, max_length, flags, and decimals, with the corresponding values
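
For instance, the field metadata of a result set might be inspected like this (a sketch built on the methods just listed):

 r = m.query("SELECT * FROM people")
 r.fetch_fields.each do |field|
   puts "#{field.name} (table #{field.table}), length #{field.length}"
 end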

The material here is always superseded by online documentation. For more information, see the MySQL Web site (www.mysql.com) and the Ruby Application Archive.

Interfacing to PostgreSQL

An extension is available from the RAA that provides access to PostgreSQL (it works with PostgreSQL 6.5/7.0).

Assuming you already have PostgreSQL installed and set up (and you have a database named testdb), you merely need to follow essentially the same steps as used with other database interfaces in Ruby: Load the module, connect to the database, and then work with your tables. You'll probably want a way of executing queries, getting the results of a "select" operation back, and working with transactions. Here's an example:

 

 require 'postgres'

 conn = PGconn.connect("", 5432, "", "", "testdb")
 conn.exec("create table rtest ( number integer default 0 );")
 conn.exec("insert into rtest values ( 99 )")
 res = conn.query("select * from rtest")
 # res is [["99"]]

The PGconn class contains the connect method, which takes the typical database connection parameters, such as host, port, database, username, and password, but it also takes options and tty parameters in positions three and four. In our example we have connected to the Unix socket as a privileged user, so we don't need a username and password; the host, options, and tty parameters are left empty. The port must be an integer, whereas the others are strings. The method new is an alias for connect.
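
A TCP connection with explicit credentials might look like the following sketch (the host and credentials are placeholders; the full parameter order is host, port, options, tty, dbname, login, password):

 require 'postgres'

 conn = PGconn.connect("dbhost.example.com", 5432, "", "", "testdb",
                       "ruby", "secret")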

The next thing of interest is working with our tables; this requires some means to perform queries. The instance methods PGconn#exec and PGconn#query are just what we need.

The exec method sends its string parameter as a SQL query request to PostgreSQL, and it returns a PGresult instance on success. On failure, it raises a PGError exception.
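
Because exec raises PGError on failure, a cautious caller might wrap it as in this sketch:

 begin
   conn.exec("insert into rtest values ( 'oops' )")  # Not an integer
 rescue PGError => e
   puts "Query failed: #{e.message}"
 end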

The query method also sends its string parameter as a SQL query request to PostgreSQL. However, it returns an array on success. The returned array is actually an array of tuples. On failure, it returns nil, and error details can be obtained by the error method call.

A special method, called insert_table, is available for inserting values into a specific table. Despite the name, insert_table actually means "insert into table." This method returns a PGconn object. Here's an example:

 

 conn.insert_table("rtest", [[34]])
 res = conn.query("select * from rtest")
 # res is [["99"], ["34"]]

This inserts one row of values into the table rtest. In this simple example, the table has only one column. Notice that res now reflects the updated contents, with two tuples. We will discuss PGresult methods shortly.

Other potentially useful methods from the PGconn class include the following:

  • db Returns the connected database name.

  • host Returns the connected server name.

  • user Returns the authenticated username.

  • error Returns the error message about the connection.

  • finish Closes the backend connection.

  • loimport(file) Imports a file to a large object; returns the PGlarge instance on success. On failure, this method raises the PGError exception.

  • loexport(oid, file) Saves a large object of oid to a file.

  • locreate([mode]) Creates a large object; returns the PGlarge instance on success. On failure, it raises the PGError exception.

  • loopen(oid, [mode]) Opens a large object of oid; returns the PGlarge instance on success. The mode argument specifies the mode for the opened large object, which is either INV_READ or INV_WRITE (if the mode is omitted, the default is INV_READ).

  • lounlink(oid) Unlinks (deletes) the Postgres large object of oid.

Notice that the last five methods of PGconn involve objects of the PGlarge class. The PGlarge class has specific methods for accessing and changing its own objects. (The objects are created as a result of the PGconn instance methods loimport, locreate, and loopen from the preceding list.)

Here is a list of PGlarge methods (a short sketch follows the list):

  • open([mode]) Opens a large object. The mode argument specifies its mode (see PGconn#loopen).

  • close Closes a large object (also closed when it is garbage collected).

  • read([length]) Attempts to read "length" bytes from a large object. If no length is given, all data is read.

  • write(str) Writes the string to the large object and returns the number of bytes written.

  • tell Returns the current position of the pointer.

  • seek(offset, whence) Moves the pointer to offset. The possible values for whence are SEEK_SET, SEEK_CUR, and SEEK_END (or 0, 1, and 2).

  • unlink Deletes a large object.

  • oid Returns the large object oid.

  • size Returns the size of a large object.

  • export(file) Saves the large object to the specified file.
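
As a brief sketch of how some of these pieces fit together (using only methods described above; the filenames are placeholders):

 lobj = conn.loimport("photo.jpg")     # Import a file as a large object
 puts "Imported as large object #{lobj.oid}"
 lobj.export("photo_copy.jpg")         # Write its contents back to a file
 conn.lounlink(lobj.oid)               # Delete the large object again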

Of more interest to us are the instance methods of the PGresult class; PGresult objects are created as the result of queries. Use PGresult#clear when you are finished with these objects to improve memory performance. Here's a list of these methods (a short usage sketch follows the list):

  • result Returns the query result as an array of tuples.

  • each An iterator.

  • [] An accessor.

  • fields Returns the array of the fields of the query result.

  • num_tuples Returns the number of tuples of the query result.

  • fieldnum(name) Returns the index of the named field.

  • type(index) Returns an integer corresponding to the type of the field.

  • size(index) Returns the size of the field in bytes. A value of -1 indicates the field is variable length.

  • getvalue(tup_num, field_num) Returns the field value for the given parameters. tup_num is the same as row number.

  • getlength(tup_num, field_num) Returns the length of the field in bytes.

  • cmdstatus Returns the status string of the last query command.

  • clear Clears the PGresult object.
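
Here is a sketch using the rtest table from earlier (recall that exec returns a PGresult object):

 res = conn.exec("select * from rtest")
 p res.fields                     # Field names, e.g. ["number"]
 puts res.num_tuples              # Number of tuples returned
 res.num_tuples.times do |i|
   puts res.getvalue(i, 0)        # Value of the first field in row i
 end
 res.clear                        # Free the result when finished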

Working with CSV Data

The CSV format is something you may have had to deal with if you have ever worked with spreadsheets or databases. Fortunately, Hiroshi Nakamura has created a module for Ruby and has made it available in the Ruby Application Archive.

This is not a true database system. However, we felt that a discussion of it fits here better than anywhere else.

The CSV module (csv.rb) will parse or generate data in CSV format. The module author defines this format as follows:

 

 Record separator: CR + LF
 Field separator: comma (,)
 Quote data with double quotes if it contains CR, LF, or comma
 Quote double quote by prefixing it with another double quote (" -> "")
 Empty field with quotes means null string (data,"",data)
 Empty field without quotes means NULL (data,,data)

There are two ways to use this module: parsing or creating single lines, and reading or writing through a file, parsing or creating records sequentially. In the latter case, you will be dealing with arrays of column data objects instead of arrays of strings; this approach requires the use of the record separator and the isNull flag of the column data object.

First, let's look at the handling of single, nonterminated lines. The method CSV::parse will parse the specified CSV-formatted string as a single line and return an array of strings; the method CSV::create will take the specified array of strings and create a single CSV-formatted line.

Suppose we have a data file data.csv, as shown here:

 

 "name","age","salary" "mark",29,34500 "joe",42,32000 "fred",22,22000 "jake",25,24000 "don",32,52000 

We can process this file as follows:

 

 require 'csv'

 IO.foreach("data.csv") { |f| p CSV::parse(f.chomp) }

 # Output:
 # ["name", "age", "salary"]
 # ["mark", "29", "34500"]
 # ["joe", "42", "32000"]
 # ["fred", "22", "22000"]
 # ["jake", "25", "24000"]
 # ["don", "32", "52000"]

We could also process each array and write it back to a file:

 

 File.open("newdata.csv","w") do |file|   IO.foreach("data.csv") do |line|      a=CSV::parse line.chomp      if a[1].to_i > 20        s=a[2].to_i        a[2]=s*1.1  # 10% raise      end     file.puts CSV::create(a)   end end 

Now let's take a look at the multiline parsing methods. These methods handle data differently and expect to see the record terminator CR-LF. The parseLine method populates an array with objects that contain the parsed field data and a flag for null data. Likewise, the createLine method uses such objects to generate CSV-formatted output.

To demonstrate, we can use the data.csv file again:

 

 require 'csv'

 file_str = File.open("data.csv").read
 rec = []; pos = nil; i = -1
 c, pos = CSV::parseLine(file_str,
                         pos.to_i,
                         rec[(i += 1)] = []) until pos == 0

Note that if your data records are not separated by CR-LF, they will not be parsed correctly. To ensure that your data records are separated by CR-LF, you might perform #gsub!(/\n/,"\r\n") on the file string beforehand.

You will still have to go through another step before you can use the information encapsulated inside the column data objects. However, this would be the only means available if you wish to detect null values. Otherwise, it would be better to use the simpler CSV::parse method because large files would have to be parsed line by line anyway.
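
Once the rec array from the previous example has been populated, the column data objects can be unwrapped while checking for nulls, as in this sketch (we assume the column data objects expose data and isNull readers corresponding to the writers shown below):

 rec.each do |row|
   values = row.map { |col| col.isNull ? nil : col.data }
   p values                     # One array of field values per record
 end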

Here is an example going the other way:

 

 cd = []

 cd[0] = CSV::ColData.new
 cd[0].data = "joe"
 cd[0].isNull = false

 cd[1] = CSV::ColData.new
 cd[1].data = 27
 cd[1].isNull = false

 cd[2] = CSV::ColData.new
 cd[2].data = 32000
 cd[2].isNull = false

 csv_str = ""
 c = CSV::createLine(cd, 3, csv_str)

This will produce a CSV-formatted line including the CR-LF terminator. It seems like extra work, if you ask us.

Any material here is always superseded by online documentation. For more information, see the Ruby Application Archive.

Interfacing to Other Databases

Space does not permit us to delve into all the database interfaces available. Furthermore, at the time of this writing, these were of varying levels of maturity.

We'll just mention that there are several other libraries and tools in various stages of development. These allow interfacing to Oracle, InterBase, mSQL, LDAP, and others. There is also a usable ODBC driver. As always, refer to the Ruby Application Archive for the latest software and documentation.


   

 

 


