Chapter 6: Reading and Writing Simple Records | Programming from the Ground Up

Overview

As mentioned in Chapter 5, many applications deal with data that is persistent - meaning that the data lives longer than the program by being stored on disk in files. You can shut down the program and open it back up, and you are back where you started. Now, there are two basic kinds of persistent data - structured and unstructured. Unstructured data is like what we dealt with in the toupper program. It just dealt with text files that were entered by a person. The contents of the files weren't usable by a program because a program can't interpret what the user is trying to say in random text.

Structured data, on the other hand, is what computers excel at handling. Structured data is data that is divided up into fields and records. For the most part, the fields and records are fixed-length. Because the data is divided into fixed-length records and fixed-format fields, the computer can interpret the data. Structured data can contain variable-length fields, but at that point you are usually better off with a database.^[1]

This chapter deals with reading and writing simple fixed-length records. Let's say we wanted to store some basic information about people we know. We could imagine the following example fixed-length record about people:

Firstname - 40 bytes
Lastname - 40 bytes
Address - 240 bytes
Age - 4 bytes

In this record, everything is character data except for the age, which is a numeric field stored in a standard 4-byte word (a single byte would be enough, but keeping it at a word makes it easier to process).

In programming, you often have certain definitions that you will use over and over again within the program, or perhaps within several programs. It is good to separate these out into files that are simply included into the assembly language files as needed. For example, in our next programs we will need to access the different parts of the record above. This means we need to know the offsets of each field from the beginning of the record in order to access them using base pointer addressing. The following constants describe the offsets to the above structure. Put them in a file named record-def.s:

  .equ RECORD_FIRSTNAME, 0
  .equ RECORD_LASTNAME, 40
  .equ RECORD_ADDRESS, 80
  .equ RECORD_AGE, 320
  .equ RECORD_SIZE, 324
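Each offset is the sum of the sizes of the fields that come before it in the record layout. As a sketch only (this is not the content of record-def.s), the same values could be derived with assembler expressions instead of being typed in by hand. The `*_LEN` names are hypothetical and are used here purely for illustration:

```asm
# Hypothetical field sizes, taken from the record layout listed earlier
.equ FIRSTNAME_LEN, 40
.equ LASTNAME_LEN, 40
.equ ADDRESS_LEN, 240
.equ AGE_LEN, 4

# Each field starts where the previous one ends
.equ RECORD_FIRSTNAME, 0
.equ RECORD_LASTNAME, RECORD_FIRSTNAME + FIRSTNAME_LEN   # 0 + 40 = 40
.equ RECORD_ADDRESS, RECORD_LASTNAME + LASTNAME_LEN      # 40 + 40 = 80
.equ RECORD_AGE, RECORD_ADDRESS + ADDRESS_LEN            # 80 + 240 = 320
.equ RECORD_SIZE, RECORD_AGE + AGE_LEN                   # 320 + 4 = 324
```

Writing the offsets this way would make it harder for them to drift out of step if a field size ever changed, though the literal values used in record-def.s are easier to read at a glance.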

In addition, there are several constants that we have been defining over and over in our programs, and it is useful to put them in a file, so that we don't have to keep entering them. Put the following constants in a file called linux.s:

  #Common Linux Definitions

  #System Call Numbers
  .equ SYS_EXIT, 1
  .equ SYS_READ, 3
  .equ SYS_WRITE, 4
  .equ SYS_OPEN, 5
  .equ SYS_CLOSE, 6
  .equ SYS_BRK, 45

  #System Call Interrupt Number
  .equ LINUX_SYSCALL, 0x80

  #Standard File Descriptors
  .equ STDIN, 0
  .equ STDOUT, 1
  .equ STDERR, 2

  #Common Status Codes
  .equ END_OF_FILE, 0
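To see how the two include files are meant to be used together, here is a minimal sketch, assuming both files sit in the same directory as the program. It stores an age in a record-sized buffer using the RECORD_AGE offset, loads it back, and exits with that value as the status code. The label `record_buffer` and the age 45 are illustrative choices, not something defined elsewhere in this chapter:

```asm
.include "linux.s"
.include "record-def.s"

.section .bss
 # Room for exactly one record (hypothetical buffer name)
 .lcomm record_buffer, RECORD_SIZE

.section .text
.globl _start
_start:
 movl  $record_buffer, %eax       # %eax = address of the start of the record

 movl  $45, RECORD_AGE(%eax)      # store an example age in the age field
 movl  RECORD_AGE(%eax), %ebx     # load it back to use as the exit status

 movl  $SYS_EXIT, %eax
 int   $LINUX_SYSCALL
```

After assembling and linking this the same way as the earlier programs, running it and then typing `echo $?` should print 45. On a 64-bit system you will likely need to pass `--32` to `as` and `-m elf_i386` to `ld`, since the code uses 32-bit registers and the `int $0x80` interface.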

We will write three programs in this chapter using the structure defined in record-def.s. The first program will build a file containing several records as defined above. The second program will display the records in the file. The third program will add 1 year to the age of every record.

In addition to the standard constants we will be using throughout the programs, there are also two functions that we will be using in several of the programs - one which reads a record and one which writes a record.

What parameters do these functions need in order to operate? We basically need:

The location of a buffer that we can read a record into
The file descriptor that we want to read from or write to

Let's look at our reading function first:

  .include "record-def.s"
  .include "linux.s"

  #PURPOSE:   This function reads a record from the file
  #           descriptor
  #
  #INPUT:     The file descriptor and a buffer
  #
  #OUTPUT:    This function writes the data to the buffer
  #           and returns a status code.
  #
  #STACK PARAMETERS (offsets from %ebp)
  .equ ST_READ_BUFFER, 8
  .equ ST_FILEDES, 12

  .section .text
  .globl read_record
  .type read_record, @function
  read_record:
    pushl %ebp
    movl  %esp, %ebp

    pushl %ebx                          #save %ebx, restored before returning
    movl  ST_FILEDES(%ebp), %ebx        #file descriptor to read from
    movl  ST_READ_BUFFER(%ebp), %ecx    #address of the buffer
    movl  $RECORD_SIZE, %edx            #number of bytes to read
    movl  $SYS_READ, %eax
    int   $LINUX_SYSCALL

    #NOTE - %eax has the return value, which we will
    #       give back to our calling program
    popl  %ebx

    movl  %ebp, %esp
    popl  %ebp
    ret

It's a pretty simple function. It just reads data the size of our structure into an appropriately sized buffer from the given file descriptor. The writing one is similar:

  .include "linux.s"
  .include "record-def.s"

  #PURPOSE:   This function writes a record to
  #           the given file descriptor
  #
  #INPUT:     The file descriptor and a buffer
  #
  #OUTPUT:    This function produces a status code
  #
  #STACK PARAMETERS (offsets from %ebp)
  .equ ST_WRITE_BUFFER, 8
  .equ ST_FILEDES, 12

  .section .text
  .globl write_record
  .type write_record, @function
  write_record:
    pushl %ebp
    movl  %esp, %ebp

    pushl %ebx                          #save %ebx, restored before returning
    movl  $SYS_WRITE, %eax
    movl  ST_FILEDES(%ebp), %ebx        #file descriptor to write to
    movl  ST_WRITE_BUFFER(%ebp), %ecx   #address of the buffer
    movl  $RECORD_SIZE, %edx            #number of bytes to write
    int   $LINUX_SYSCALL

    #NOTE - %eax has the return value, which we will
    #       give back to our calling program
    popl  %ebx

    movl  %ebp, %esp
    popl  %ebp
    ret
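Both functions take their parameters on the stack, with the buffer address at 8(%ebp) and the file descriptor at 12(%ebp). That means a caller pushes the file descriptor first and the buffer address second. The following is a minimal sketch of such a caller, not one of the programs from this chapter: it reads a single record from standard input and, if a full record arrived, writes it to standard output. The label `record_buffer`, the label `finished`, and the idea of linking against object files built from the two functions above are assumptions made for the example:

```asm
.include "linux.s"
.include "record-def.s"

.section .bss
 # Room for exactly one record (hypothetical buffer name)
 .lcomm record_buffer, RECORD_SIZE

.section .text
.globl _start
_start:
 # Parameters are pushed in reverse order: file descriptor, then buffer
 pushl $STDIN
 pushl $record_buffer
 call  read_record
 addl  $8, %esp                   # remove the two parameters from the stack

 # %eax holds the number of bytes read; anything other than a
 # full record (end of file, short read, or error) ends the program
 cmpl  $RECORD_SIZE, %eax
 jne   finished

 pushl $STDOUT
 pushl $record_buffer
 call  write_record
 addl  $8, %esp

finished:
 movl  $SYS_EXIT, %eax
 movl  $0, %ebx
 int   $LINUX_SYSCALL
```

To try it, assemble this file and the two function files separately and pass all three object files to `ld`, then run the result with a file of records redirected to standard input. Checking the return value against RECORD_SIZE is a design choice for this sketch; a program that processes a whole file would typically repeat the read in a loop until the count no longer matches.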

Now that we have our basic definitions down, we are ready to write our programs.

^[1]A database is a program which handles persistent structured data for you. You don't have to write the programs to read and write the data to disk, to do lookups, or even to do basic processing. It is a very high-level interface to structured data which, although it adds some overhead and additional complexity, is very useful for complex data processing tasks. References for learning how databases work are listed in Chapter 13.