Redundant Data: The Purpose Behind Relational Database Theory

Before relational database theory, the inefficiency of flat-file storage was a critical issue. Memory was expensive and took up a lot of physical space. In addition, maintaining flat-file data was labor-intensive, which meant paying for salaries and office space.

The flat-file format stores every piece of data for every record, so any value shared by several records is repeated in each of them. That repetition multiplies your storage needs and can quickly place crushing demands on your resources. The main problem the relational model solves is redundant data. By using multiple tables that store each data item only once, the relational model significantly reduces storage needs.

Relational Database History

Most of you have probably heard of Dr. E. F. Codd, the IBM researcher who reset the database time clock. His 1970 paper ("A Relational Model of Data for Large Shared Data Banks," which appeared in the June 1970 issue of Communications of the ACM) was the technological nudge the industry needed to grow. At the time, databases stored data in flat-file format (one table stores each field of data for every record) and were expensive and cumbersome to maintain. In those days, a 64KB system was the size of a piano crate.

Dr. Codd theorized a method that stored information in related tables, called relations, thus creating a storage medium that was efficient and easy to implement and use. Dr. Codd's paper became the basis for the relational database model on which all relational database systems (such as Microsoft Access, Microsoft SQL Server, Oracle, and so on) are based.

Eventually, this theory was realized in a marketable database known as DB2, an IBM relational database that still owns a large percentage of the database market. Another windfall was Structured English Query Language, or SEQUEL, the support language for multitable and multiuser data access. Today, we know this language as Structured Query Language (SQL), and it has become the industry standard for relational databases.

To understand the difference between a relational and a flat-file database, let's look at a simple example that we will work with throughout this chapter. Let's suppose we're creating a database that stores information on books. For each book, we want to include title, ISBN, category, and so on.

Suppose, in addition to information about each book, you also store address information for each book's author and publisher. The flat-file format forces you to enter each corresponding author and publisher address for each book. In other words, every time you enter a new book for an existing author or publisher, you must also enter the author and publisher address information.
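To make the repetition concrete, here is a minimal Python sketch of flat-file book records. The titles, ISBNs, names, and addresses are invented placeholders, and the helper function count_address_copies is a hypothetical name used only for this illustration.

```python
# Hypothetical flat-file records: every row carries the full author and
# publisher address, even though all three books share the same ones.
flat_books = [
    {"title": "Example Book One", "isbn": "000-0-00-000001-0",
     "author": "A. Writer", "author_address": "1 Sample Street, Springfield",
     "publisher": "Example Press",
     "publisher_address": "99 Placeholder Avenue, Metropolis"},
    {"title": "Example Book Two", "isbn": "000-0-00-000002-0",
     "author": "A. Writer", "author_address": "1 Sample Street, Springfield",
     "publisher": "Example Press",
     "publisher_address": "99 Placeholder Avenue, Metropolis"},
    {"title": "Example Book Three", "isbn": "000-0-00-000003-0",
     "author": "A. Writer", "author_address": "1 Sample Street, Springfield",
     "publisher": "Example Press",
     "publisher_address": "99 Placeholder Avenue, Metropolis"},
]


def count_address_copies(rows, name_field, address_field):
    """Count how many times each (name, address) pair is stored."""
    counts = {}
    for row in rows:
        key = (row[name_field], row[address_field])
        counts[key] = counts.get(key, 0) + 1
    return counts


author_copies = count_address_copies(flat_books, "author", "author_address")
publisher_copies = count_address_copies(
    flat_books, "publisher", "publisher_address"
)
```

With three books by one author from one publisher, each address is stored three times. A change of address would also have to be applied to every one of those rows.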

In the relational model, you enter the author and publisher information just once, in related tables that store only author data and only publisher data, respectively. When reviewing a book's author or publisher data, you rely on the relationships between the book table and the author and publisher tables to retrieve the corresponding information for any book.
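The relational layout can be sketched with Python's built-in sqlite3 module and an in-memory database. The table names, column names, and sample rows are assumptions made for illustration, and the sketch simplifies by giving each book exactly one author and one publisher.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces keys only on request

# Hypothetical schema: author and publisher details live in their own tables.
conn.executescript("""
    CREATE TABLE authors (
        author_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        address   TEXT NOT NULL
    );
    CREATE TABLE publishers (
        publisher_id INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        address      TEXT NOT NULL
    );
    CREATE TABLE books (
        isbn         TEXT PRIMARY KEY,
        title        TEXT NOT NULL,
        category     TEXT,
        author_id    INTEGER NOT NULL REFERENCES authors(author_id),
        publisher_id INTEGER NOT NULL REFERENCES publishers(publisher_id)
    );
""")

# Each address is entered exactly once (placeholder data).
conn.execute(
    "INSERT INTO authors VALUES (1, 'A. Writer', '1 Sample Street, Springfield')"
)
conn.execute(
    "INSERT INTO publishers VALUES "
    "(1, 'Example Press', '99 Placeholder Avenue, Metropolis')"
)

# Books refer to the author and publisher by key instead of repeating them.
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?, ?)",
    [
        ("000-0-00-000001-0", "Example Book One", "Databases", 1, 1),
        ("000-0-00-000002-0", "Example Book Two", "Databases", 1, 1),
    ],
)

# The relationships retrieve the matching author and publisher for any book.
rows = conn.execute("""
    SELECT b.title, a.name, a.address, p.name, p.address
    FROM books AS b
    JOIN authors AS a ON a.author_id = b.author_id
    JOIN publishers AS p ON p.publisher_id = b.publisher_id
    ORDER BY b.isbn
""").fetchall()

author_rows = conn.execute("SELECT COUNT(*) FROM authors").fetchone()[0]
```

Both books come back with the full author and publisher details, yet the authors table holds a single row. Updating an address means changing one row rather than every book record.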