Chapter 1
Basic Concepts
So, what is this mythical creature called a relational database? Briefly, it's a tool for storing and manipulating information efficiently and effectively—"
efficiently
and effectively" in the sense that data is protected from accidental loss or corruption, that it doesn't use more resources (human or computer) than necessary, and that it can be retrieved in
sensible
ways within acceptable performance constraints. The database itself is the physical implementation of a relational model, which is a way of describing some aspect of the real world according to a set of rules first proposed by Dr. E. F. Codd in the late 1960s.
In theory, a relational database could be coded from scratch, but in reality you'll normally use the services of a database management system (DBMS). A DBMS is sometimes called a relational database management system (RDBMS), but technically a DBMS must meet some 300 rules to qualify as relational, and to the best of my knowledge no commercially available system fully qualifies. The two database management systems we'll be examining in this book are Microsoft Access and Microsoft SQL Server.
I've said that a relational database is the physical implementation of a relational model (the data model), and it's important to keep these two concepts distinct. While it's almost
impossible
to completely ignore the constraints of the implementation environment during the design phase, best practice dictates that the original model be as "pure" as possible. Although you might already know that for performance reasons you're going to have to make certain trade-offs during implementation, you can, and should, ignore these decisions during data modeling. An example of this is storing calculated fields (such as OrderTotal) in a base table, which is a
major
no-no in relational design but a common technique in practice. Whatever you choose to do with your implementation, your model should not include the calculated field.
What Is a Database?
Database terminology is almost as
slippery
as the
term
"object-oriented programming." The word "database" can be used to describe everything from a single set of data, such as a telephone list, to a complex set of tools, such as SQL Server, and a whole lot in between. This lack of precision isn't a bad thing,
necessarily
—it's just the nature of language—but it's not particularly useful for our purposes, so I'll try to be a bit more precise here. Figure 1-1 shows the relationships between the terms discussed below.
Although relational databases don't have real-world
analogies
, most are intended to model some aspect of the real world. I'll call that bit of the real world the
problem space
. The problem space, by its nature, is messy and complex—if it weren't, we wouldn't need to build a model of it. But it is critical to the success of your project to limit the database system you're designing to a specific,
well-defined
set of objects and interactions; only by doing so can you make
sensible
decisions about the scope of your system.
I'll use the term
data model
to mean the conceptual description of the problem space. This includes the definition of entities, their attributes (a Customer, for example, is an entity, and it might have the attributes
Name
and Address), and the entity constraints (such as, the CustomerName cannot be empty). The data model also includes a description of the relationships between entities and any constraints on those relationships—for example, managers are not allowed to have more than five individuals reporting to them. It does not include any reference to the physical layout of the system.
The definition of the physical layout—the tables and views that will be implemented—is the
database schema
or just
schema
. It's the translation of the conceptual model into a physical representation that can be implemented using a database management system. Note that the schema is still conceptual, not physical. The schema is nothing more than the data model
expressed
in the terms that you will use to describe it to the database engine—tables and triggers and such
creatures
. One of the benefits of using a database engine is that you don't ever have to deal with the physical implementation; you can largely ignore B-trees and leaf nodes.
Once you've explained to the database engine what you want the data to look like, using either code or an interactive environment such as Microsoft Access, the engine will create some physical objects (usually, but not always, on a hard disk someplace) and you'll store data in them. The combination of structure and data is what I'll refer to as a
database
. This database includes the physical tables; the defined views, queries, and stored procedures; and the rules the engine will enforce to protect the data.
{% if main.adsdop %}{% include 'adsenceinline.tpl' %}{% endif %}
Figure 1-1.
Relational database terminology.
The term "database" does
not
include the
application,
which consists of the forms and reports with which your users will interact, nor does it include any of the bits and pieces—things such as middleware or Microsoft Transaction Server—used to stick the front and back ends together. The term "database" also excludes the database engine. Thus, an Access .mdb file is a database, while Microsoft Jet is a database engine. Actually, an .mdb file can contain other things besides the database—forms and
reports
, for example—but that's a topic we'll discuss later.
To describe all these
components
—the application, the database, the database engine, and the middleware—I'll use the term
database system
. All of the software and data that goes into making a production system is part of the database system.