Database Design and Implementation | DB2 for z/OS Version 8 DBA Certification Guide

Figure 4-7 shows the many facets to work on when implementing a DB2 relational database. A couple of general design steps need to be performed: development of a logical model and development/implementation of a physical model.

Figure 4-7. Designing relational database applications

Logical Design

Logical design is the process of determining entities, entity attributes, relationships among entities, degrees of the relationships, and representing them in a fully keyed, normalized data model or entity relationship diagram. The normalized data model can then be used to define the tables needed for the relational database. Performing the logical design also involves identifying primary keys and foreign keys.

A data model is a graphic, conceptual model that identifies the entity types of the business and the business interactions among the entities. The data model provides a static rather than dynamic view of data, which appears as if frozen in time. Two basic approaches to designing a data model follow.

A bottom-up approach produces a composite, or global, view of the data in the organization, based on the combination of many users' views of the problem requirements, not the inherent structure of the data. This approach does not reflect all the business activities most often used in data analysis.
A top-down approach produces an organizational view of the data before the application views are identified. This approach takes into account the business activities of the entire organization, independent of any particular application.

A logical data model has three basic components: entities, attributes, and relationships.

Entities

An entity is an object about which we store data for the purpose of answering a query or making a decision. An entity may be a person, place, or thingeventof interest to the enterprise or application. Occurrences of an entity are uniquely identified by one or more attributes that make up the key to the data entity. An entity is often named with a singular noun, such as Customer. In addition, an association (relationship) between two or more data entities represents an action and may be defined in an English sentence: subject, verb. This interaction is directional:

One to one
One to many
Many to many
Zero to one
Zero to many

Attributes

Attributes are the values, or characteristics, associated with an entity. Every occurrence of an entity has associated attributes. Attributes can be represented as a field. Domains are also used to give specific property of an object. An attribute that has attributes should be considered as a possible entity. Attributes are the data items required to make process work. When identifying attributes, consider such characteristics as data type, size, how used, and entities to which related.

Normalization

Normalization is the process of nonloss decomposition of data relations and is performed during logical database design. Normalization promotes the formalization of simple ideas, such as "Domains must contain atomic values (single value)." The concept of functional dependence plays a key role. It is a helpful tool for physical design but is not an absolute and does not result in a physical design (final table design). Many process considerations must be accounted for before a physical design is done.

Some of the advantages of a normalized design may include

By forcing the storage of nonkey information in a single place, minimizing the amount of space required to store data
By storing data items only once, where possible, minimizing the risk of data inconsistencies within the database
By storing data in only one place, where possible, minimizing several types of possible update and delete errors
By focusing on the properties of the data rather than on how the applications will use the data, maximizing the stability of the data structures

Some of the objectives of data normalization are

Eliminating all data "anomalies," such as UPDATE, INSERT, and DELETE
Avoiding redundancy of data
Avoiding potential inconsistency among data
Preserving all relevant information
Maintaining maximum flexibility in the database design
Accommodating changes easily
Providing nonloss decomposition of data elements

Depending on the interpretation of the relational model, the normal forms may vary but in general number five. We look only at three, as most normalized designs are in the third normal form.

Figure 4-8 shows the first normal form (1NF), which has the following characteristics.

For any specific value of the unique indentifier (unique key), each attribute of the data entity has only one value.
No repeating groups or arrays are present.
Every attribute has a unique meaning and name.
The data entity contains atomic values.
The data entity must not group several data fields as a single value for a column.
The data entity's true relational table satisfies 1NF.
The design will still be a high degree of redundancy.

Figure 4-8. First normal form

Figure 4-9 shows the second normal form (2NF), which has the following characteristics.

The data relation must first be in 1NF.
Every nonkey attribute must be fully dependent on the primary key.
No data elements will be dependent on a component of the primary key.
The design does not allow nonkey attributes to be assigned to entities in which a part of the primary key can determine the nonkey attributes.

Figure 4-9. Second normal form

Figure 4-10 shows the third normal form (3NF), which has the following characteristics.

The data relation must first be in 2NF.
No data elements are dependent on other nonkey data elements.
Every nonkey attribute is nontransitively dependent on the primary key.
This form is considered to be optimal for most tables.

Figure 4-10. Third normal form

Anomolies

The introduction of data anomalies into a database design can lower productivity in application design and create useless data. Anomolies need to be eliminated in order to

Achieve semantic clarity
Increase productivity and data consistency
Create a design that requires the least amount of knowledge to achieve results

Anomalies can cause

Integrity problems, such as duplication and inconsistencies
A loss of conceptual clarity
Unnecessary programming complexity
Questionable growth and stability

Figure 4-11a shows what an update anomaly is and a problem that could result. Figure 4-11b shows how a normalized design would fix the issue.

Figure 4-11. (a) Update anomaly; (b) Normalized design to solve update anomaly

Physical Design

The physical-data model is similar to the logical-data model except that the objects are identified as physical objects ready to be implemented in a DB2 environment. All elements on the logical model should transform into an object on the physical model. At this point, the model should be normalized.

Following are some general guidelines for transforming a logical model to a physical one:

Creating a physical table for each entity
Defining a unique index for each primary key
Defining a nonunique index for each foreign key
Defining other nonunique and unique indexes
Documenting processing needs to be considered before implementation

Performance and DBMS specifics must still be considered afterward.

Example Implementation

The best way to understand data type selection is to design a database and implement the design by using DB2. This example implementation creates a database that can be used to schedule and track the results of a certification program. This database will be used to illustrate many aspects of SQL and DB2 features. Many examples from this database will be used throughout this book.

NOTE

This database was developed in the DB2 Universal Database V8 for Linux, UNIX, and Windows Database Administration Certification Guide, Fifth Edition, published by Prentice Hall, and can be referenced in that publication to observe implementation differences between DB2 z/OS and the DB2 UNIX, Windows, and Linux platforms.

This database will be used to schedule test candidates' exams; following candidates' completion of the test, the database will contain the candidates' test scores. The database and its application will need to perform eight tasks:

Insert/update/delete testing center information
Insert/update/delete test information
Insert/update/delete test candidate information
Guarantee a uniquely identified test name, regardless of the test number
Schedule a candidate to take a test
Update candidate test scores once exams have been completed
Determine which candidates qualify for certification
Generate various reports on the candidates and tests

The database will be named DB2CERT. The data to be stored in the DB2CERT database can easily be grouped into three reference tables; a fourth table is used to relate the other tables. The primary relationship can be defined as a test candidate takes a specific test at a test center.

Figure 4-12 shows the relationships within the problem domain. The rectangles represent the base tables: CANDIDATE, TEST_CENTER, and TEST. The fourth table is a relationship table called TEST_TAKEN.

DB2CERT Database Table Descriptions

The following tables will be used in the DB2CERT database.

The CANDIDATE table stores information about each test candidate for the DB2 Certification Program. The candidate's name, address, ID, and phone number will be stored in this table. A data record represents a single test candidate, a person.
The TEST_CENTER table stores information about the test centers where a candidate can take a DB2 Certification exam. The test center name, address, number of seats at the test center, test center ID, and its phone number will be stored in this table. A data record represents a single test center location.
The TEST table stores information about each of the DB2 Certification exams. The test name, type, test ID, cut scorepassing percentageand the length of each test will be stored in this table. A data record represents a single test. Our example has three tests; therefore, this table has only three data records. A test name must be uniquely identified.
The TEST_TAKEN table associates the records from the other three tables. It serves the dual purposes of scheduling tests and tracking each test result. The candidates' test scores, date taken, start time, and seat number will be stored in this table. This will be the most active of the four tables, as each candidate must take multiple exams to become certified; each test taken will have a corresponding data record in this table.

Once the tables and their relationships have been defined, the following should be defined:

User-defined data types
Columns or attributes for the tables
(Optional) primary keys (PK) for the tables
(Optional) unique keys for the tables
(Optional) foreign keysreferential constraintsfor the tables
(Optional) table-check constraints for the tables
(Optional) triggers for the database

Figure 4-13 shows the database design. The rectangles represent the entities, or tables. The columns, or attributes, are listed by the entities. Note that some of the columns are derived columns; that is, they represent a concept, not a physical attribute of an object. The derived columns (such as TOTALTAKEN on TEST) are included in the model because their values will be populated by the database, using a constraint mechanism.

Figure 4-13. Entities and attributes

We must map the attributes shown in Figure 4-13 as DB2-supported data types or distinct types. To demonstrate some of the powerful features of DB2, we have decided to create distinct types for many of the attributes.

It is beneficial to have a primary key defined for each table to ensure uniqueness of the data records. The attributes that are bolded will be used as primary keys. We will also create unique keys to illustrate their use.

Earlier, we mentioned that the DB2CERT database will have four tables. However, the design shown in Figure 4-14 has only three tables defined, shown as rectangles.

Figure 4-14. DB2 Certification database entity basic relationship

An implied table is defined in the relationship candidate takes a test. A table is required to store each occurrence of a candidate taking a certification test.

A restriction will be imposed on the candidates: They can take the test only once on any given day. A candidate can take different tests on the same day but not the same test. With this restriction in place, we will define a primary key for the TEST_TAKEN table as a composite key including NUMBER (test ID), CID (candidate ID), and DATE_TAKEN (the date of the test). By defining the primary key as a combination of these three values, this constraint can be enforced.

Figure 4-14 shows the entity relationship diagrams for the Certification database.

Figure 4-15 shows the RI relationships between the tables in the Certification database.

Figure 4-15. DB2 Certification database referential integrity

The diamond shapes are used to describe the relationships between the tables, including the parent/child relationship. For example, the CANDIDATE and the TEST_TAKEN table have a one-to-many relationship because a single candidate can take many tests. This relationship is shown by denoting the values of 1 and M (many) on the appropriate side of the diamond.

The database design shown in Figure 4-14 is just one type of diagramming technique. A logical database design can be represented in a number of ways, but it should not absolutely dictate the physical implementation of the database. It has been included it here because it will be used in many of the SQL statements throughout the rest of the book.

Defining Distinct Types

Distinct types must exist in the database before they can be referenced in a CREATE TABLE statement. We created a numeric distinct type of MINUTES; a character distinct type of PHONE; and a binary large object distinct type of BITMAP.

 CREATE DISTINCT TYPE phone AS CHAR(10) WITH COMPARISONS; CREATE DISTINCT TYPE bitmap AS BLOB(INT) WITH COMPARISONS; CREATE DISTINCT TYPE minutes AS SMALLINT WITH COMPARISONS;

Defining Tables and Columns

Designing a database involves many considerations, only a few of which are discussed in this book. Using the database design in Figure 4-13, we can start creating database objects.

The first step in creating a database is to issue the run a CREATE DATABASE statement, such as the following:

 CREATE DATABASE CERTTS STOGROUP CERTSTG BUFFERPOOL BP7 INDEXBP BP8;

Once the database has been created, dependent physical objects can be created. Distinct (user-defined) data types can also be created. Tables can then be created. (If explicit table spaces are to be used, they will need to be created first, but for this example, we will assume they already exist. Refer to Appendix A for more table space CREATE methods.)

The database design has a number of attributes shown. Each of these attributes will be a column in the table definitions. Every DB2 table contains one or more columns. The tables and their corresponding columns are given names.

Data is placed in a DB2 table by using the SQL statement INSERT or UPDATE. (The LOAD utility is also an option.) Usually, it is desirable for each column, or data value, to have a value. Sometimes, no value is provided for a column during the INSERT statement. If the column is defined as NOT NULL, the INSERT statement will fail. If a default value is defined, it will be stored.

The table being created in the following DDL is called DB2USER1.CANDIDATE and contains 15 columns. Each column is given a name and a data type. Two distinct data types are being used in the DB2USER1.CANDIDATE table. These data types are PHONE and BITMAP. In addition, constraints are defined for the valid values for some of the columns. For example, the null constraint is specified for all the columns except HPHONE, WPHONE, INITIAL, and PHOTO.

 CREATE TABLE DB2USER1.CANDIDATE (    Cid CHAR(9) NOT NULL,    LName VARCHAR(30) NOT NULL,    FName VARCHAR(30) NOT NULL,    Initial CHAR(1),    HPhone phone,    WPhone phone,    StreetNo VARCHAR(8) NOT NULL,    StreetName VARCHAR(20) NOT NULL,    City VARCHAR(30) NOT NULL,    Prov_State VARCHAR(30) NOT NULL,    Code CHAR(6) NOT NULL,    Country VARCHAR(20)) NOT NULL,    Cert_DBA CHAR(1) NOT NULL WITH DEFAULT 'N',    Cert_APP CHAR(1) NOT NULL WITH DEFAULT 'N',    PHOTO BITMAP, PRIMARY KEY (Cid)) IN DB2CERT.CERTTS;

Keys

Keys are a special set of columns defined on a table. They can be used to uniquely identify a row or to reference a uniquely identified row from another table. Keys can be classified either by the columns they comprise or by the database constraint they support. See the discussion earlier in this chapter on types of keys.

Defining Primary Keys

It is sometimes beneficial to define a primary key for each of your DB2 tables to guarantee the uniqueness of a column value or group of column values (composite key). In the previous CREATE statement, the primary key for the table candidate is defined as the column CID (candidate ID). By specifying this column as a primary key, DB2 will create a system-unique index if one does not already exist.

Let us look at the other tables representing the tests and the test centers. In the following CREATE statements, the TEST and TEST_CENTER tables are shown. These tables each have a primary key defined. In our example, the primary-key constraint was given a name (UNIQUE_TEST and UNIQUE_CENTER) for referencing purposes. If a constraint name is not provided, DB2 will assign a system-generated name to the constraint.

 CREATE TABLE DB2USER1.TEST ( NUMBER CHAR(6) NOT NULL, NAME VARCHAR(50) NOT NULL, TYPE CHAR(1) NOT NULL, Cut_Score DECIMAL(6,2) NOT NULL, Length minutes NOT NULL, TotalTaken SMALLINT NOT NULL, TotalPassed SMALLINT NOT NULL, CONSTRAINT Unique_Test PRIMARY KEY(Number), CONSTRAINT Unique_Test_Name UNIQUE (Name) CONSTRAINT Test_Type CHECK (Type IN ('P','B'))) IN DB2CERT.CERTTS CREATE TABLE DB2USER1.TEST.CENTER (    Tcid CHAR(6) NOT NULL,    Name VARCHAR(40) NOT NULL,    StreetNo VARCHAR(8) NOT NULL,    StreetName VARCHAR(20) NOT NULL,    City VARCHAR(30) NOT NULL,    Prov_State VARCHAR(30) NOT NULL,    Country VARCHAR(20) NOT NULL,    Code CHAR(6) NOT NULL,    Type CHAR(1) NOT NULL,    Phone phone NOT NULL,    No_Seats SMALLINT NOT NULL,    CONSTRAINT Unique_Center PRIMARY KEY (Tcid)) IN DB2CERT.CERTTS    PARTITION BY (TCID)    (PARTITION 1 ENDING AT ('300'),     PARTITION 2 ENDING AT ('500'));

Note that the TEST_CENTER table is a partitioned table that is partitioned by TCID.

Defining Unique Keys

Unique keys can be used to enforce uniqueness on a set of columns. A table can have more than one unique key (index) defined. The TEST table definition uses a unique constraint (UNIQUE_TEST_NAME) on column NAME to ensure that a test name is not used twice. The column NUMBER too has a primary-key constraint to avoid duplicate test numbers.

Having unique constraints on more than one set of columns of a table is different from defining a composite unique key that includes the whole set of columns. For example, even if we define a composite primary key on the columns NUMBER and NAME, a test name may still be duplicated using a different test number.

NOTE

A unique index needs to always be created for primaryif one does not already existor unique-key constraints, unless using the schema processor. If you define a constraint name, it will be used to name the index; otherwise, a system-generated name will be used for the index.

Defining Foreign Keys

A foreign key is a reference to the data values in another table. Foreign keys have various types of constraints. Let us look at the remaining table in the DB2 Certification database and, in particular, its foreign-key constraints. We have one composite primary key defined and three foreign-key constraints.

The primary key is defined as the columns CID, TCID, and NUMBER on the TEST_TAKEN table. The foreign-key constraints will perform the following.

If a record in the CANDIDATE table is deleted, all matching records in the TEST_TAKEN table will be deleted (DELETE CASCADE).
If a test center in the TEST_CENTER table is deleted, all the matching records in the TEST_TAKEN table will be deleted (DELETE CASCADE).
If a test in the TEST table is deleted and matching records are in the TEST_TAKEN table, the DELETE statement will result in an error (DELETE RESTRICT).

NOTE

A foreign-key constraint always relates to the primary or unique constraint of the table in the references clause.

 CREATE TABLE DB2USER1.TEST_TAKEN (    Cid candidate_id NOT NULL,    Tcid center_id NOT NULL,    Number test_id NOT NULL,    Date_Taken date NOT NULL WITH DEFAULT,    Start_Time TIME NOT NULL,    Finish_Time TIME NOT NULL,    Score score,    Seat_No CHAR(2) NOT NULL, CONSTRAINT number_const PRIMARY KEY (Cid,Tcid,Number), FOREIGN KEY (Cid) REFERENCES db2cert.candidate ON DELETE CASCADE, FOREIGN KEY (Tcid) REFERENCES db2cert.test_center ON DELETE CASCADE, FOREIGN KEY (Number) REFERENCES db2cert.test ON DELETE RESTRICT) IN DB2CERT.CERTTS

Defining parent/child relationships between tables is known as declarative referential integrity because the child table refers to the parent table. These constraints are defined during table creation or by using the ALTER TABLE SQL statement. DB2 will enforce referential constraints for all INSERT, UPDATE, and DELETE activity.

NOTE

This database implementation created only the tables and distinct types. The complete DDL for the storage group, databases, table spaces, tables, and indexes can be found in Appendix A.