Perhaps that definition is too broad. These days, when we think of "database," it's generally a relational database system. Such databases are built on the "relational model" designed by Edgar Codd of IBM. In 1970, he issued "A Relational Model of Data for Large Shared Data Banks," the seminal paper on relational modeling, and later expanded on the basic concepts with C. J. Date, another "real programmer." Upon reading that 1970 paperand if you have a free afternoon, you would really benefit from spending time with your family or friends rather than reading that paperyou will enter a world of n-tuples, domains, and expressible sets. Fortunately, you don't need to know anything about these terms to use relational database systems.
The relational databases that most programmers use collect data in tables, each of which stores of specific set of unordered records. For convenience, tables are presented as a grid of data values, with each row representing a single record, and each column representing a consistent field that appears in each record. Table 4-1 presents a table of orders, with a separate record for each line item of the order.
Putting all of your information in a table is really convenient. The important data appears at a glance in a nice and orderly arrangement, and it's easy to sort the results based on a particular column. Unfortunately, this table of orders has a lot of repetition. Customer names and product names both repeat multiple times. Also, although the product ID "BEV01COF" indicates coffee, one of the lines lists it as "Tea." There are a few other problems inherent in data that's placed in a single flat file database table.
Mr. Codd, the brilliant computer scientist that he was, saw these problems, too. But instead of just sitting around and complaining about them like I do, he came up with a solution: normalization. By breaking the data into separate tables with data subsets, assigning a unique identifier to each record/row in every table (a primary key), and making a few other adjustments, the data could be "normalized" for both processing efficiency and data integrity. For the sample orders in Table 4-1, the data could be normalized into three separate tables: one for order line items, one for customers, and one for products (see Table 4-2, Table 4-3, and Table 4-4, respectively). In each table, I've put an asterisk next to the column title that acts as the primary key column.
To get combined results from multiple tables at once, join (or link) their matching fields. For instance, you can link the Customer ID field in the table of line items with the matching Customer ID primary key field in the table of customers. Once joined, the details for a single combined line item record can be presented with the matching full customer name. It's the same for joins with any two tables that have linkable fields. Figure 4-1 shows the relationships between the customer, product, and order line tables.
Figure 4-1. Three tables, and yet they work as one
To join tables together, relational databases implement query languages that allow you to manipulate the data using relational algebra (from which the term "relational database" derives). The most popular of these languages, SQL, uses simple English-like sentences to join, order, summarize, and retrieve just the data values you need. The primary statement, SELECT, provides basic data selection and retrieval features. Three other common statements, INSERT, UPDATE, and DELETE, let you manipulate the records stored in each table. Together, these four statements make up the primary data manipulation language (DML) commands of SQL. SQL also includes data definition language (DDL) statements that let you design the tables used to hold the data, as well as other database features. I'll show examples of various SQL statements later in this chapter.
Vendor-specific systems, such as Microsoft's SQL Server, Oracle's "Oracle," Microsoft's Access, and IBM's DB2, extend these core DDL and DML features through additional data analysis and management tools. They also battle each other over important features such as data replication, crash-proof data integrity, the speed at which complex queries return the requested results, and who has the biggest private jet.