Relational Databases

Although there are a number of database models, one of the most common and the one we will discuss is the relational database. The relational database model was introduced in 1970 by an IBM employee named E.F. Codd. Codd defined a relational database as a collection of data items organized as a set of defined tables from which data can be accessed and presented in different ways without having to reorganize the database tables. Although this initially sounds a bit confusing, the relational database model is actually fairly simple when viewed as a set of basic guidelines:

Relationships are represented by tables.
Each row represents a unique instance of an entity.
Columns are attributes of the entity.
Every table has an attribute or set of attributes that when combined form a "key" that uniquely identifies each entity.
Data is presented as a collection of relations; that is, users of the database could obtain a view of the database that fitted their needs.

Before we continue with more detailed discuss of relational databases, we want to take a step back and review a few of the general concepts and terms related to databases.

Understanding Databases

For the purpose of our discussion, we define data as the characteristics of people, places, things, and events. The color of your eyes, the title of this book, and the town where you live are all bits of unrelated data. A database is a structured collection of related data that is organized so that its contents (data) can be easily accessed, updated, and managed.

The database can contain one or more tables, which in turn are comprised of one or more columns. These columns contain predefined types of data. Let's look at Table 23.1. The table contains three columns: StateID, StateAbbreviation, and StateName. Each row represents a unique instance of the data contained in the columns.

Table 23.1. The State Table
StateID	StateAbbreviation	StateName
1	AL	Alabama
2	AK	Alaska
3	AZ	Arizona
4	AR	Arkansas
5	CA	California
6	CO	Colorado
7	CT	Connecticut
8	DE	Delaware
9	DC	District of Columbia
10	FL	Florida

Note that no two columns in a table can have the same name, and as a general rule, the column name should be meaningfully related to the data it contains. You often hear the rows commonly referred to as records, entities, or tuples. The terms "fields" and "attributes" are interchangeably used with columns.

One of our favorite visual analogies of a database is a locked file cabinet. Picture the file cabinet with a label that reads "DATABASE." The file cabinet can have one or many drawers, which are representative of the database tables. Each drawer contains files. These files are the equivalent of rows we find in the database table. Remember that we mentioned this was a locked file cabinet. To be able to access the data contained in the file cabinet, you need a key. A database management system (DBMS) is the key to a database. It is the DBMS that provides access to the data and provides the means to convert the data into meaningful information. If we send you a copy of the MDF file from a Microsoft SQL Server, it is still a database; but without the DBMS, you cannot access the data.

The concept of a database being organized and structured is a very important one. Imagine sitting in meetings all day, five days a week, taking notes on 3x5 cards. At the end of each meeting, you take all the 3x5 cards and toss them into one of five drawers in your file cabinet. After a short period, your drawers would contain a lot of 3x5 cards. Now imagine trying to retrieve a phone number you recorded in a meeting that occurred five weeks ago. Although this example might seem a bit absurd, it would be no different from creating a database table that had column names such as FirstName, PartNumber, Flavor, Publisher, and Breed. Do you find the column names a bit confusing? That is exactly the point. We want our data to be organized and structured so that the data can be useful information.

Keys

In a relational database, tables normally contain a single field or combination of fields that uniquely identifies each record or row. This is known as the primary key. Any column or combination of columns that contain a unique value can be a primary key. Many texts that deal with the topic of database development refer to these as candidate keys. You also see candidate keys that are not used as primary keys referred to as alternate keys. Primary keys that are formed by the combination of columns are known as composite primary keys.

Primary keys are important because they are how users of the database distinguish between records in a table. In some cases, you have to create an artificial primary key. This happens when the columns in a table do not lend themselves to natural uniqueness.

If we return to our previous discussion of the online bookstore, we can find a number of examples where tables might require an artificial primary key. Our database contains a table appropriately named Author. This table contains columns such as FirstName, LastName, Address1, Address2, City, StateID, and Email. Although we might initially be inclined to select the combination of columns FirstName and LastName as a primary key, what happens if we have two authors named Bob Smith? To resolve this problem, we would create an artificial key called Author_ID. In many cases, you use a mechanism such as AutoNumber (Access) or Sequencing (Oracle) to automatically create the value. This eliminates the need for the user to determine a unique value.

It is possible to create tables that do not have a primary key. The lack of a primary key creates a number of problems, including difficulties with manipulating data and preventing record duplication.

We have mentioned how tables in a relational database represent relationships. It is the primary key that helps to establish the link or relationship between tables. When the column in a table matches the primary key of another table, that column is known as a foreign key. As with primary keys, a foreign key can be a single column or multiple columns. A table can also contain foreign keys from more than one table.

Returning once more to our online bookstore, recall in our Author table that the primary key was the column Author_ID. Our database also contains the table State. The primary key in the State table is the column StateID. The StateID is also stored in the Author table. StateID is a primary key in the State table and a foreign key in the Author table. It is the foreign key that creates the relationship between the tables Author and State. This relationship is often referred to as a parent/child relationship, and the foreign key's table is called a lookup table. In our example, the State table would be the parent and lookup table, although the Author table is the child. Keep in mind that this is different from the parent/child relationship that you might have encountered in object-oriented programming (OOP).

It is a common technique to populate the options of a select list with the values from a lookup table. This gives the developer a way to ensure that the values a user attempts to enter for the column in a child table exist in the parent table. It is important to understand that there is more to this technique than simply making a UI that is friendly. If you were to create an input form to add new records to the Author table, your relational database management system (RDBMS) would check whether the value submitted for the foreign key field exists in the parent table. If a user enters a value of 99 in the field for StateID and 99 is not a value that exists in the StateID column of the State table, an error is returned.

This process of ensuring that the data in our relationship stays related is known as maintaining referential integrity. There are two rules of referential integrity. The rules apply to all databases. The first rule governs primary keys and states that primary keys cannot contain null values. If the primary key is a composite, then none of the columns that make up the composite primary key can contain null values.

The second rule of referential integrity says that the database cannot contain any unmatched foreign key values. It is this rule that prevents a user from entering the value 99 for StateID in the Author table when 99 does not exist as a value in the State table. Conversely, this means that if we delete a record or row from a parent table, the deletion cannot result in any unmatched foreign key values in the child table.

We have already discussed a common technique to help prevent violation of this rule. However, what are our options when it comes to preventing orphan values? The first solution is to not enable the deletion of records in a parent/child relationship. The second option is to reassign the child records to a new value and then enable the delete. A third option is to cascade the delete, which means that any record in the child table containing the deleted foreign key would also be deleted. There are other options available that are not supported by all RDBMSs. The documentation for your RDBMS is a good source of information on how to utilize keys for maintaining referential integrity.

Relationships

In relational databases, a relationship is established when two tables share common data. This sharing of data is what links or connects the tables. There are three types of relationships:

One-to-one
One-to-many
Many-to-many

During database design, one of the first steps is to determine what is our data and how it fits together. We find it useful to think of tables as an actual person, place, thing, or event. In the case of online bookstore, we might start with a table called Book. We would then determine the information that we need about a book.

A book has an International Standard Book Number (ISBN). It also has a title, a publisher, and an author. We next would look at the data to determine what data each instance of a book has for a single value. We know that a book has one ISBN, one title, and one publisher. This is the data that we would place in the table Book.

Now, what about the author? Some books have one only author, whereas others (like this one) have more then one author. This tells us that we need a table that handles the relationship between authors and books. Although a book has only one publisher, there is additional data that we need about the publisher. The Book table is not the appropriate place for this data because it is related to the Publisher, not the Book; so we would create a Publisher table. By this point, you hopefully have seen that we are trying to identify the data that is unique to each instance of an entity presented by our table; we are trying to avoid data redundancy.

This is the reason we said that additional information about the publisher would not be appropriate for the Book table. This is a point that is worthy of a little bit of discussion. If we decided to enter the publisher's address as part of the Book table and inserted 1000 books published by New Riders in the table, what do you think is the chance that it will be entered all 1000 times without a mistake or difference? If New Riders changes their address, we have to locate and update 1000 records. If we instead create a Publisher table, we only have to enter the address for New Riders one time. If they move to a new location, we only have to update one record. When we query the database, we would still be able to use the relationship between the Publisher and Book table to obtain information about the book and publisher. This demonstrates several of the reasons why relational databases have become so popular. The relational database reduces data redundancy and improves maintainability, all without sacrificing data accessibility.

We have alluded to the relationship between tables on a number of occasions and it is now time to examine the three types of relationships more closely. A publisher can publish many books. A state might have many authors. These are examples of a one-to-many relationship. A one-to-many relationship is defined that each record in Table A can have many related records in Table B, but Table B can have only one related record in Table A. A book can have only one publisher, and an author can live in only one state. This is the most common type of relationship.

The next type of relationship that we can look at is the many-to-many relationship. By definition, a many-to-many relationship exists when a record in Table A can have many related records in Table B, although a record in Table B can have many related records in Table A. The relationship between the Author and Book tables is an excellent example of this type of relationship. A book can have many authors and an author can write many books. This type of relationship is actually two one-to-many relationships and is possible only in a relational database with a third table. This third table is known as a junction or cross-reference table. There are two methods for creating the cross-reference table. The first is to create a table with a primary key that is a composite primary key comprised of the foreign keys from Table A and Table B. The second method consists of creating a table with three columns, the foreign keys from Table A and Table B and a unique primary key, normally a simple ID field.

The third type of relationship is the one-to-one relationship. In a one-to-one relationship, each record in Table A can have only one related record in Table B. Conversely, each record in Table B can have only one related record in Table A. Because of this one-to-one relationship, the data would normally be stored in one table.

There are two primary reasons for creating tables with a one-to-one relationship. The first is for performance reasons and the second is for security. A common use of one-to-one relationship is to protect customer credit card information. The second reason for creating a one-to-one relationship is when only a small portion of the records have the related columns.

Indexes

There are two ways that most databases access data, either by performing what is called a full table scan or by utilizing indexes. A full table scan means that the DBMS starts with the first record in a table and scans through all the rows in a table, extracting the rows that meet the query conditions. An index in a database is very much like the index of a book. It provides the database with a method of knowing where to look for specific information. Indexes can be applied to single fields or multiple fields. A multiple field index might be used where the first field contains information that might not be unique. The fields FirstName and LastName would be a prime example of a situation where the use of multiple field indexing would be appropriate. EmployeeID or any other unique ID would be appropriate for single field indexing. A field such as Gender, which has only two possible values (M and F), would not be appropriate for indexing.

As a general rule, if the following five rules or characteristics apply, then an index should be created for the respective field:

The field has primary keys.
Foreign keys or columns are joined to other columns in other tables.
The field contains many different values.
You anticipate frequent searching of the column.
You anticipate frequent sorts of the column.

Here are some situations in which you do not want to index columns:

The column is seldom referenced in a query as part of the condition statement.
The column contains few unique values.

Datasources

There are two ways that the term datasource is used. The first manner refers to the source to which ColdFusion is connecting to obtain data. ColdFusion can interact with a number of external datasources. In addition to databases, ColdFusion can interact with datasources such as spreadsheets, Lightweight Directory Access Protocol (LDAP) directories, Hypertext Transfer Protocol (HTTP) servers, mail servers, and File Transfer Protocol (FTP) servers. Because this chapter is concerned with databases, the rest of our discussion focuses on ColdFusion using a database as the datasource.

The second definition associated with datasource is the connection ColdFusion uses to communicate with an external source. In previous releases of ColdFusion, there were three types of datasources supported to communicate with databases: open database connectivity (ODBC), OLE-DB, and native drivers. This changed with the release of ColdFusion MX.

ColdFusion MX uses Java Database Connectivity (JDBC) drivers to interact with databases. You can still connect to ODBC datasources using the bundled Merant Type IV database drivers or the SQL Link Type III database drivers.

ODBC

ODBC is a widely accepted application program interface (API) for database access. When applications such as ColdFusion use ODBC, they are not communicating with the database directly. Instead they are communicating with the ODBC datasource. The ODBC API uses drivers to give you an abstract layer with which to communicate. This enables you to create an application without concern about the protocols used by the datasource (such as a database or spreadsheet).

In versions of ColdFusion prior to the release of ColdFusion 5, you could not use a datasource without it registering in the ColdFusion Administrator. This does not mean you have to actually create the datasources in the ColdFusion Administrator. You could create the datasource as a system Data Source Name (DSN) in whatever ODBC datasource manager your operating system (OS) provided; ColdFusion would find it in the Registry. The release of ColdFusion 5 introduced the capability to make a dynamic connection to an unregistered datasource. This is accomplished by setting the dbtype attribute of the CFQUERY tag to "DYNAMIC". This is not supported in MX, however.

JDBC Drivers

JDBC is a Java API that enables applications to interact with a variety of relational databases. The advantage that JDBC enjoys is that it is neither database- nor platform-specific and unlike ODBC, JDBC drivers do not require that a driver manager and drivers be installed on every client machine. There are four types of JDBC drivers, but only Type 3 and Type 4 drivers are supported by MX.

Type 3 drivers are pure Java code and translate JDBC API calls into a DBMS-independent net protocol that is then translated to a DBMS protocol by a server module (middle tier). Because the Type 3 drivers use a DBMS-neutral protocol to send SQL requests, they offer the advantage of an extremely flexible JDBC solution. The requirement to have the server module translate the SQL call to a DBMS-specific code is a disadvantage though.

Type 4 drivers are also pure Java, but convert the SQL call into DBMS-specific code. This enables a direct call to the DBMS directly from the client. Type 4 drivers often use proprietary protocols and are often provided by the database vendors themselves. Although they lack the flexibility of a Type 3 driver, they are platform-independent and do not rely on a server module. Table 23.2 provides a list of the drivers shipped with ColdFusion MX.

Table 23.2. ColdFusion MX JDBC Type 4 Drivers
Driver	Type
Microsoft Access	3
ODBC Socket	3
DB2 UDB for OS/390	4
DB2 Universal Database 6.2 and 7.2	4
Informix 9.x	4
Microsoft SQL Server 7.x and 2000	4
MySQL	4
Oracle R3 (8.1.7) and Oracle 9i	4
Oracle Thin Client	4
PostgreSQL	4
Sybase 11, 11.9.2, and 12	4
Sybase jConnector 5.0	4

The ColdFusion documentation, specifically "Administering ColdFusion MX," provides detailed instruction on the configuration of datasources. We highly recommend you review the configuration instructions even if you are an experienced ColdFusion developer there are changes from previous releases.

Summary

Relational databases provide a way of storing data in a collection of related and defined tables. Each table contains records or rows that represent a unique instance of an entity. A primary key is used to establish this uniqueness. Relationships between the tables are established when two tables share common data. When the column data in one table matches the primary key of another table, it is know as a foreign key. Table indexes provide the DBMS a way of knowing where to look for specific data, much like the index of a book. One of the strengths of a relational database is the ease with which new table (relationships) can be added. A second strength is the capability to access and present data without having to reorganize the database tables. Let's take a look at how we access data within a relational database.