Text Files as Databases | Sams Teach Yourself Perl in 24 Hours (3rd Edition)


Often, databases are small, simple arrangements: a list of users on a small system, local hosts on a small network, a list of favorite Web sites, or a personal address file. These are all simple forms of databases, and for simple databases, normal text files will often do. But before using text files as databases, you need to consider some pros and cons.

The good news: Using a text file as a database has a few distinct advantages over using more complicated alternatives such as DBM files or large databases such as Oracle or Sybase. Some of these advantages are:

Text file databases are portable. They can be moved between vastly different kinds of systems without too much trouble.
Text file databases can be edited with a text editor and printed to paper without any special tools.
Text file databases are simple to construct initially.
Text file databases can be imported into other programs (spreadsheets, word processors, or other databases) without hassles. Almost any program that allows you to import data allows you to import text.

Now, as you would expect, there is some bad news. To understand the bad news fully, consider how text files are usually constructed. Text file databases are traditionally arranged so that each line in the text file is a record and columns within each line are fields. To your system, however, a text file is simply a stream of characters. So a text file database that looks like

Bob 555-1212

Maury 555-0912

Paul 555-0012

Ann-Marie 555-1190

is actually stored as a continuous stream of characters like

Bob[space]555-1212[newline]Maury[space]555-0912[newline]Paul[space]

where [space] represents a space character and [newline] represents a record separator (newline character, "\n") for your operating system, as discussed in Hour 5. The characters for each record and each field are all packed together in one long stream of characters; the nice column-row display is simply the human-readable way that editors, printers, and Perl represent the data.
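To see this stream for yourself, a small sketch along the following lines makes the separators visible. The show_stream() name is made up for this illustration, and the sample string simply stands in for the first two records of the file.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: make the separators in a text database visible.
sub show_stream {
    my ($text) = @_;
    $text =~ s/ /[space]/g;      # mark the field separators
    $text =~ s/\n/[newline]/g;   # mark the record separators
    return $text;
}

# Two records, exactly as they would sit in the file.
print show_stream("Bob 555-1212\nMaury 555-0912\n"), "\n";
```

The output is one unbroken line, which is how the file looks to the operating system.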

Keeping that structure in mind, consider the disadvantages of a text file database:

Text files cannot be inserted into; they can only be overwritten, partially or completely. Inserting new data anywhere except at the end of the file involves copying all the data following the newly inserted data further down in the file.

New data: Susan 555-6613, to be inserted after "Bob"

Before: Bob[space]555-1212[newline]Maury[space]555-0912[newline]Paul[space]

All the data from Maury onward must be copied further down:

After: Bob[space]555-1212[newline]Susan[space]555-6613[newline]Maury[space]

Copying data within a file is error-prone and slow.
The reverse is also true: Removing data from the middle of a text file is difficult. All the data following the removed portion must be copied into the gap. For example, you would remove Maury from the original text database like this:

Removing Maury's record:

Before: Bob[space]555-1212[newline]Maury[space]555-0912[newline]Paul[space]

All the data after the removed record must be copied up into the gap:

After: Bob[space]555-1212[newline]Paul[space]555-0012[newline]Ann-Marie[space]
To find a particular record in a text file database, you must search the file sequentially, normally from the top down. Unlike a DBM file, in which finding a record is as easy as looking for it in a hash, each line of a text file must be examined to see whether it's the correct record. This process is slow, and it gets progressively slower the larger the database gets.
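The sequential search described in the last point might look something like the following sketch. The find_number() function is a hypothetical helper, not part of Perl, and the records are held in an in-memory string (opened as a filehandle) only so that the example runs without a phone.txt on disk.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: scan records one at a time until the name matches.
# $fh is any open filehandle positioned at the start of the database.
sub find_number {
    my ($fh, $wanted) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        my ($name, $number) = split(' ', $line, 2);
        next unless defined $name;            # skip blank lines
        return $number if $name eq $wanted;   # stop at the first match
    }
    return;    # not found: every line was examined
}

# Demo data held in memory (an assumption made for this sketch).
my $db = "Bob 555-1212\nMaury 555-0912\nPaul 555-0012\nAnn-Marie 555-1190\n";
open(my $fh, '<', \$db) || die "Cannot open in-memory data: $!";
my $number = find_number($fh, "Paul");
close($fh);
print defined $number ? "Paul: $number\n" : "Paul not found\n";
```

Finding Paul means reading Bob's and Maury's records first; a name that is not in the file costs a pass over every line.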

Inserting into or Removing from a Text File

Text file databases aren't completely hopeless. With a small text file database, you can easily insert or delete from the database if you treat the text file like an array.

For example, if the database

Bob 555-1212

Maury 555-0912

Paul 555-0012

Ann-Marie 555-1190

were saved into a file called phone.txt, a short Perl program could read the database into an array like this:

    #!/usr/bin/perl -w
    use strict;

    sub readdata {
        open(PH, "phone.txt") || die "Cannot open phone.txt: $!";
        my(@DATA)=<PH>;   # Read every line of the file
        chomp @DATA;      # Strip the trailing newlines
        close(PH);
        return(@DATA);
    }

Here, the readdata() function reads phone.txt and puts the data into @DATA, without the newline characters, and returns the array. If you add another function, writedata(), as follows, the database can be both read and written:

    sub writedata {
        my(@DATA)=@_;   # Accept new contents
        open(PH, ">phone.txt") || die "Cannot open phone.txt: $!";
        foreach(@DATA) {
            print PH "$_\n";   # Put back the newline removed by chomp
        }
        close(PH);
    }

Now, to insert records into the database, simply read the data with readdata() into an array; use push, unshift, or splice to insert a record into the array; and then write the array out again with writedata() like this:

    my @PHONELIST=readdata();   # Put all of the records in @PHONELIST
    push(@PHONELIST, "April 555-1314");
    writedata(@PHONELIST);  # Write them out again.
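Inserting in the middle works the same way, with splice in place of push. The following is only a sketch: insert_after() is a hypothetical helper, and the records are written inline rather than read from phone.txt so that the example runs on its own.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: return a copy of @records with $new placed
# right after the first record whose name field equals $after.
sub insert_after {
    my ($after, $new, @records) = @_;
    for my $i (0 .. $#records) {
        my ($name) = split(' ', $records[$i]);
        if (defined $name && $name eq $after) {
            # A length of 0 removes nothing and inserts $new at $i + 1.
            splice(@records, $i + 1, 0, $new);
            return @records;
        }
    }
    push(@records, $new);   # No such name: fall back to appending
    return @records;
}

my @PHONELIST = ("Bob 555-1212", "Maury 555-0912", "Paul 555-0012");
@PHONELIST = insert_after("Bob", "Susan 555-6613", @PHONELIST);
print "$_\n" foreach @PHONELIST;
```

Perl shifts the later array elements for you; the copying that made in-file insertion awkward now happens in memory.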

To remove records from the text file database, use splice, pop, or shift on the array @PHONELIST before writing it back out. You can also edit the array with a loop, or with a function such as grep that loops over it for you:

    my @PHONELIST=readdata();    # Read all records into @PHONELIST
    # Remove everyone named "Ann" (or Annie, Annette, etc.)
    @PHONELIST=grep(! /Ann/, @PHONELIST);
    writedata(@PHONELIST);

In the preceding snippet, the records are copied into @PHONELIST from readdata(). The grep iterates over @PHONELIST, testing each element to see whether it does not match Ann; those that do not match are assigned to @PHONELIST again. The @PHONELIST array is then given back to writedata() for writing.
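Because /Ann/ matches anywhere in a record, it also removes Ann-Marie. If only an exact name should go, one possible approach is to compare the name field itself. In this sketch, remove_name() is a hypothetical helper and the records are written inline so that the example runs without phone.txt.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: drop only records whose name field is exactly $name.
sub remove_name {
    my ($name, @records) = @_;
    return grep {
        my ($field) = split(' ', $_);            # First column is the name
        !(defined $field && $field eq $name);    # Keep everything else
    } @records;
}

my @PHONELIST = ("Ann 555-1000", "Ann-Marie 555-1190", "Paul 555-0012");
@PHONELIST = remove_name("Ann", @PHONELIST);
print "$_\n" foreach @PHONELIST;   # Ann-Marie and Paul remain
```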
