Chapter 4. External Data Manipulation
IN THIS CHAPTER
Working with Files and Directories
Performing Higher-Level Data Access
Connecting to External Databases
On a clean disk you can seek forever.
Thomas B. Steel, Jr.
Computers are good at computing. This tautology is more profound than it appears. If we only had to sit and chew up the CPU cycles and reference RAM as needed, life would be easy.
A computer that only sits and thinks to itself is of little use to us, however. Sooner or later we have to get information into it and out of it, and that is where life gets harder.
Several things make I/O complicated. First of all, input and output are rather different things, but we naturally lump them together. Second, the varieties of I/O operations (and their usages) are as diverse as species of insects.
History has seen such devices as drums, paper tapes, magnetic tapes, punched cards, and teletypes. Some operated with a mechanical component; others were purely electromagnetic. Some were read-only; others were write-only or read-write. Some writable media were erasable, and others were not. Some devices were inherently sequential; others were random access. Some media were permanent; others were transient or volatile. Some devices depended on human intervention; others did not. Some were character oriented; others were block oriented. Some block devices were fixed length; others were variable length. Some devices were polled; others were interrupt driven. Interrupts could be implemented in hardware or software, or both. We have seen both buffered and nonbuffered I/O. We have seen memory-mapped I/O, channel-oriented I/O, and with the advent of operating systems such as Unix, we have seen I/O devices mapped to files in a file system. We have done I/O in machine language, in assembly language, and in high-level languages. Some languages have the I/O capabilities firmly hardwired in place; others leave it out of the language specification completely. We have done I/O with and without suitable device drivers or layers of abstraction.
If this seems like a confusing mess, that's because it is. Part of the complexity is inherent in the concept of input/output, part of it is the result of design tradeoffs, and part of it is the result of legacies or traditions in computer science and the quirks of various languages and operating systems.
Ruby's I/O is complex because I/O in general is complex. However, we have tried to make it understandable and present a good overview of how and when to use various techniques.
The core of all Ruby I/O is the IO class, which defines behavior for every kind of input/output operation. Closely allied with IO (and inheriting from it) is the File class. There is a nested class within File called Stat, which is an object that encapsulates various details about a file that we might want to examine (such as its permissions and timestamps). The methods stat and lstat return objects of type File::Stat.
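As a minimal sketch of these relationships, the following snippet checks that File descends from IO and inspects a File::Stat object. The scratch file and the variable names are illustrative assumptions, not something the text prescribes.

```ruby
require "tempfile"

# File inherits from IO, so every File object is also an IO object.
file_is_io = File.ancestors.include?(IO)
puts "File inherits from IO: #{file_is_io}"

# Create a scratch file so the example does not depend on any existing path.
scratch = Tempfile.new("stat_demo")
scratch.write("hello")
scratch.close

# stat follows symbolic links; lstat describes the link itself.
info      = File.stat(scratch.path)
link_info = File.lstat(scratch.path)

puts info.class                     # the nested class File::Stat
puts info.size                      # size in bytes
puts info.mtime                     # time of last modification
printf("%o\n", info.mode & 0777)    # permission bits, printed in octal

scratch.unlink                      # remove the scratch file
```

For an ordinary file, stat and lstat report the same details; they differ only when the path names a symbolic link.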
The module FileTest also has methods that allow us to test much the same set of properties. The same tests are available as class methods of File, and FileTest can also be used on its own.
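A brief sketch of how such tests might be called, both through FileTest and through File. The scratch file and variable names are assumptions made for the sake of the example.

```ruby
require "tempfile"

# Create a small scratch file to test against.
scratch = Tempfile.new("filetest_demo")
scratch.write("abc")
scratch.close
path = scratch.path

# The same question can be asked of FileTest or of File.
exists_via_filetest = FileTest.exist?(path)
exists_via_file     = File.exist?(path)

# A few of the other properties that can be tested.
is_directory = FileTest.directory?(path)
is_plain     = FileTest.file?(path)
is_empty     = FileTest.zero?(path)
byte_count   = FileTest.size(path)

puts "exists: #{exists_via_filetest} / #{exists_via_file}"
puts "directory: #{is_directory}, plain file: #{is_plain}"
puts "empty: #{is_empty}, bytes: #{byte_count}"

scratch.unlink
```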
Finally, there are I/O methods in the Kernel module, which is mixed into Object (the ancestor of all objects). These are the simple I/O routines we have used all along without worrying about what their receiver was. These naturally default to standard input and standard output.
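One way to see this default in action is to point $stdout at a different object for a moment and observe where the output of puts and print ends up. This is only a sketch; the use of StringIO as the stand-in stream is an assumption chosen for convenience.

```ruby
require "stringio"

# Kernel is mixed into Object, so its methods can be called without a receiver.
kernel_mixed_in = Object.include?(Kernel)

# Temporarily replace standard output with an in-memory stream.
original_stdout = $stdout
captured = StringIO.new
$stdout = captured

begin
  puts "hello"       # appends a newline
  print "a", "b"     # no newline added
ensure
  $stdout = original_stdout   # always restore the real standard output
end

puts "Kernel mixed into Object: #{kernel_mixed_in}"
puts "Captured: #{captured.string.inspect}"
```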
The beginner may find these classes to be a confused jumble of overlapping functionality. The good news is that you need only use small pieces of this framework at any given time.
On a higher level, Ruby offers features to make object persistence possible. The Marshal module enables simple serialization of objects, and the more sophisticated PStore library is based on Marshal. We include the DBM library in this section, although it is only string based.
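The following sketch illustrates the two approaches side by side, assuming the pstore library is available in your Ruby installation. The Point struct, the file name demo.pstore, and the key :point are hypothetical names invented for this example.

```ruby
require "pstore"
require "tmpdir"

# A hypothetical value type used only for this illustration.
Point = Struct.new(:x, :y)

# Marshal: turn an object into a byte string and back again.
original = { "name" => "origin", "point" => Point.new(0, 0) }
bytes    = Marshal.dump(original)
copy     = Marshal.load(bytes)
puts "round trip preserved the data: #{copy == original}"

# PStore: a file-backed store that keeps objects under keys,
# with all reads and writes performed inside transactions.
restored = Dir.mktmpdir do |dir|
  store = PStore.new(File.join(dir, "demo.pstore"))

  store.transaction do
    store[:point] = Point.new(3, 4)
  end

  # Passing true requests a read-only transaction.
  store.transaction(true) { store[:point] }
end

puts "restored point: #{restored.x}, #{restored.y}"
```

Marshal alone leaves it to you to decide where the bytes are kept; PStore adds the file handling and transactional updates on top.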
On the highest level of all, data access can be performed by interfacing to a separate database management system such as MySQL or Oracle. This issue is complex enough that one or more books could be devoted to it. We will provide only a brief overview to get you started. In some cases, we provide only a pointer to an online archive.