Relational Databases

Perhaps that definition is too broad. These days, when we think of "database," it's generally a relational database system. Such databases are built on the "relational model" designed by Edgar Codd of IBM. In 1970, he issued "A Relational Model of Data for Large Shared Data Banks," the seminal paper on relational modeling, and later expanded on the basic concepts with C. J. Date, another "real programmer." Upon reading that 1970 paperand if you have a free afternoon, you would really benefit from spending time with your family or friends rather than reading that paperyou will enter a world of n-tuples, domains, and expressible sets. Fortunately, you don't need to know anything about these terms to use relational database systems.

The relational databases that most programmers use collect data in tables, each of which stores of specific set of unordered records. For convenience, tables are presented as a grid of data values, with each row representing a single record, and each column representing a consistent field that appears in each record. Table 4-1 presents a table of orders, with a separate record for each line item of the order.

Table 4-1. Boy, a Lot of People Drink Coffee and Tea
Record ID	Order ID	Customer ID	Customer Name	Product ID	Product	Price	Quantity
92231	10001	AA1	Al Albertson	BEV01COF	Coffee	3.99	3
92232	10001	AA1	Al Albertson	BRD05RYE	Rye Bread	2.68	1
92233	10002	BW3	Bill Williams	BEV01COF	Coffee	3.99	1
92234	10003	BW3	Will Williams	BEV01COF	Tea	3.99	2
92235	10004	CC1	Chuck Charles	CHP34PTO	Potato Chips	0.99	7

Putting all of your information in a table is really convenient. The important data appears at a glance in a nice and orderly arrangement, and it's easy to sort the results based on a particular column. Unfortunately, this table of orders has a lot of repetition. Customer names and product names both repeat multiple times. Also, although the product ID "BEV01COF" indicates coffee, one of the lines lists it as "Tea." There are a few other problems inherent in data that's placed in a single flat file database table.

Mr. Codd, the brilliant computer scientist that he was, saw these problems, too. But instead of just sitting around and complaining about them like I do, he came up with a solution: normalization. By breaking the data into separate tables with data subsets, assigning a unique identifier to each record/row in every table (a primary key), and making a few other adjustments, the data could be "normalized" for both processing efficiency and data integrity. For the sample orders in Table 4-1, the data could be normalized into three separate tables: one for order line items, one for customers, and one for products (see Table 4-2, Table 4-3, and Table 4-4, respectively). In each table, I've put an asterisk next to the column title that acts as the primary key column.

Table 4-2. The Table of Customers
Customer ID *	Customer Name
AA1	Al Albertson
BW3	Bill Williams
CC1	Chuck Charles

Table 4-3. The Table of Products
Product ID *	Product Name	Unit Price
BEV01COF	Coffee	3.99
BRD05RYE	Rye Bread	2.68
BEV01COF	Coffee	3.99
CHP34PTO	Potato Chips	0.99

Table 4-4. The Table of Order Line Items
Record ID *	Order ID	Customer ID	Product ID	Quantity
92231	10001	AA1	BEV01COF	3
92232	10001	AA1	BRD05RYE	1
92233	10002	BW3	BEV01COF	1
92234	10003	BW3	BEV01COF	2
92235	10004	CC1	CHP34PTO	7

To get combined results from multiple tables at once, join (or link) their matching fields. For instance, you can link the Customer ID field in the table of line items with the matching Customer ID primary key field in the table of customers. Once joined, the details for a single combined line item record can be presented with the matching full customer name. It's the same for joins with any two tables that have linkable fields. Figure 4-1 shows the relationships between the customer, product, and order line tables.

Figure 4-1. Three tables, and yet they work as one

To join tables together, relational databases implement query languages that allow you to manipulate the data using relational algebra (from which the term "relational database" derives). The most popular of these languages, SQL, uses simple English-like sentences to join, order, summarize, and retrieve just the data values you need. The primary statement, SELECT, provides basic data selection and retrieval features. Three other common statements, INSERT, UPDATE, and DELETE, let you manipulate the records stored in each table. Together, these four statements make up the primary data manipulation language (DML) commands of SQL. SQL also includes data definition language (DDL) statements that let you design the tables used to hold the data, as well as other database features. I'll show examples of various SQL statements later in this chapter.

Vendor-specific systems, such as Microsoft's SQL Server, Oracle's "Oracle," Microsoft's Access, and IBM's DB2, extend these core DDL and DML features through additional data analysis and management tools. They also battle each other over important features such as data replication, crash-proof data integrity, the speed at which complex queries return the requested results, and who has the biggest private jet.

Table 4-1. Boy, a Lot of People Drink Coffee and Tea

Table 4-2. The Table of Customers

Table 4-3. The Table of Products

Table 4-4. The Table of Order Line Items

Figure 4-1. Three tables, and yet they work as one