Database Design


Whenever you are working with a relational database management system such as MySQL, the first step in creating and using a database is to establish the database's structure (also called the database schema). Database design, aka data modeling, is crucial for successful long-term management of your information. Using a process called normalization, you carefully eliminate redundancies and other problems that will undermine the integrity of your database.

The techniques you will learn in this chapter will help to ensure the viability, performance, and reliability of your databases. The specific example I will start witha content management system for storing Web linkswill be more explicitly used in Chapter 12, "ExampleContent Management," but the principles of normalization apply to any database application you might create.

Normalization

Normalization was developed by an IBM researcher named E.F. Codd in the early 1970s (he also invented the relational database). A relational database is merely a collection of data, organized in a particular manner, and Dr. Codd created a series of rules called normal forms that help define that organization. In this chapter I will discuss the first three of the normal forms, which are sufficient for most database designs.

Before you begin normalizing your database, you must define the role of the application being developed. Whether it means that you thoroughly discuss the subject with a client or figure it out for yourself, understanding how the information will be accessed dictates the modeling. Thus, this process will require paper and pen rather than the MySQL software itself (although database design is applicable to any relational database, not just MySQL).

In my example I want to store Web links (URLs). I have listed a sample row of data in Table 5.1.

Table 5.1. A record for the kind of information I want to store in my database.

Sample Content Data

ITEM

EXAMPLE

URL

www.php.net

Title

PHP: Hypertext Preprocessor

Description

The home page of …

Categories

General PHP, Programming, Web Development

Date Submitted

5/15/2005

Approved

Yes


Tips

  • One of the best ways to determine what information should be stored in a database is to think about what questions will be asked of the database and what data would be included in the answers.

  • The example in Chapter 4, "Introduction to SQL and MySQL," used only a single table and did not require more advanced normalization.


Keys

As briefly mentioned in the previous chapter, keys are pieces of data that help to identify a row of information in a table. There are two types of keys you will deal with: primary and foreign. A primary key is a unique identifier that has to abide by certain rules. They must

  • Always have a value (they cannot be NULL)

  • Have a value that remains the same (never changes)

  • Have a unique value for each record in the table

The best real-world example of a primary key is the U.S. Social Security number: each individual has a unique Social Security number, and that number never changes. Just as the Social Security number is an artificial construct used to identify people, you'll frequently find creating an arbitrary primary key for each table to be the best design practice.

The second type of key is a foreign key. Foreign keys are the representation of the primary key from Table A in Table B. If you have a cinema database with a movies table and a directors table, the primary key from directors would be linked as a foreign key in movies. You'll see better how this works as the normalization process continues.

The content database is just a simple table as it stands (Table 5.1), but before I begin the normalization process, I'll want to ensure at least one primary key (the foreign keys will come in later steps).

To assign a primary key

1.

Look for any fields that meet the three tests for a primary key.

In this example, the only column that may fit all of the criteria is the URL itself, but strings make for lousy keys, and there is always the possibility that the same URL is submitted multiple times. Furthermore, a URL could change, violating one of the rules of primary keys.

2.

If no logical primary key exists, invent one (Table 5.2).

Table 5.2. I've added a primary key to the table as an easy way to reference the records.

Sample Content Data, Extended

ITEM

EXAMPLE

URL ID

1

URL

www.php.net

Title

PHP: Hypertext Preprocessor

Description

The home page of …

Categories

General PHP, Programming, Web Development

Date Submitted

5/15/2005

Approved

Yes


Frequently, you will need to create a primary key because no good solution presents itself. In this example, I'll manufacture a URL ID.

Relationships

When I speak of database relationships, I specifically mean how the data in one table relates to the data in another. A relationship between two tables can be one-to-one, one-to-many, or many-to-many.

Tips

  • As a rule of thumb, I name my primary keys using at least part of the table's name (e.g., URL) and the word id. Some database developers like to add the abbreviation pk to the name as well.

  • MySQL allows for only one primary key per table, although you can base a primary key on multiple columns (this means the combination of those columns must be unique and never change).

  • Ideally, your primary key should always be an integer, which results in better MySQL performance.

  • In many database applications, the integrity of the primary keyforeign key relationship is monitored and ensured. Currently, MySQL formally implements foreign keys only when using the InnoDB table type but generally ignores their existence otherwise. Hence, foreign keys in MySQL are more of a theoretical presence than a binding one, although this should change in later versions of the software.


The relationship is one-to-one if one and only one item in Table A applies to one and only one item in Table B (e.g., each U.S. citizen has only one Social Security number, and each Social Security number applies to only one U.S. citizen; no citizen can have two Social Security numbers, and no Social Security number can refer to two citizens).

A relationship is one-to-many if one item in Table A can apply to multiple items in Table B. The terms female and male will apply to many people, but each person can be only one or the other (in theory). A one-to-many relationship is the most common one between tables in normalized databases.

Finally, a relationship is many-to-many if multiple items in Table A can apply to multiple items in Table B. For example, a record album can contain songs by multiple artists, and artists can make multiple albums. You should try to avoid many-to-many relationships in your design because they lead to data redundancy and integrity problems. Instead you'll end up creating an intermediary table so that a many-to-many relationship can be broken down into two one-to-many relationships (you'll see this soon enough).

Relationships and keys work together in that a key in one table will normally relate to a field in another, as I mentioned earlier.

Tips

  • Database modeling uses certain conventions to represent the structure of the database, which I'll follow through a series of images in this chapter. The symbols for the three types of relationships are shown in Figure 5.1.

    Figure 5.1. These symbols are commonly used to represent relationships in database modeling schemes.


  • The process of database design results in an ERD (entity-relationship diagram). This graphical representation of a database uses boxes for tables, ovals for columns, and the symbols from Figure 5.1 to represent the relationships.

  • The term "relational" in RDBMS actually stems from the tables, which are technically called relations.


First Normal Form

As I said before, normalizing a database is the process of adjusting the database's structure according to several rules, called forms. Your database should adhere to each rule exactly, and the forms must be followed in order.

Every table in a database must have the following two qualities in order to be in First Normal Form (1NF):

  • Each column must contain only one value (this is sometimes described as being atomic or indivisible).

  • No table can have repeating columns for related data.

A table containing one field for a person's entire address (street, city, state, ZIP code, country) would not be 1NF compliant, because it has multiple values in one column, violating the first property listed. As for the second, a movies table that had columns such as actor1, actor2, actor3, and so on would fail to be 1NF compliant because of the repeating columns all listing the exact same kind of information.

I'll begin the normalization process by checking the existing structure for 1NF compliance. Any columns that are not atomic will be broken into multiple columns. If a table has repeating similar columns, then those will be turned into their own, separate table.

To make a database 1NF compliant

1.

Identify any field that contains multiple pieces of information.

Looking back at Table 5.2, one field is not 1NF compliant: Categories. The example record had three different applicable categories, although other records could have more or less.

The Date Submitted field contains a day, a month, and a year, but subdividing past that level of specificity is really not warranted.

If your table used just one column for a person's name (instead of separate first and last) or stored multiple phone numbers (mobile, home, work) in a single column, you would also want to subdivide those, too.

2.

Break up any fields found in Step 1 into distinct fields (Table 5.3).

Table 5.3. The table with atomic columns.

Content Database, Atomic

ITEM

EXAMPLE

URL ID

1

URL

www.php.net

Title

PHP: Hypertext Preprocessor

Description

The home page of …

Category1

General PHP

Category2

Programming

Category3

Web Development

Date Submitted

5/15/2005

Approved

Yes


To fix this problem, I'll create separate Category1, Category2… fields, each of which contains only one value.

3.

Turn any repeating column groups into their own table (Tables 5.4 and 5.5).

Table 5.4. The 1NF-compliant URLs table stores all of the pertinent URL information, aside from what categories they are associated with.

Content Database: URLs

ITEM

EXAMPLE

URL ID

1

URL

www.php.net

Title

PHP: Hypertext Preprocessor

Description

The home page of …

Date Submitted

5/15/2005

Approved

Yes


Table 5.5. This table is required in order to allow each URL to be associated with multiple categories.

Content Database: Associated URLs

ID

URL

CATEGORY

1

www.php.net

General PHP

2

www.php.net

Programming

3

www.php.net

Web Development

4

www.mysql.com

General MySQL


There are two problems with the table structure as it stands (having multiple category fields). First of all, there's no getting around the fact that each record will be limited to a certain number of categories. Even if you add columns Category1 through Category100, there will still be that limit (of a hundred). Second, any record that doesn't have the maximum number of categories will have NULL values in those extra columns. You should generally avoid columns with NULL values in your database schema. To fix this, a new table has been created.

Notice that I've also added a primary key column to the newly created table. The idea that each table has a primary key is implicit in the First Normal Form.

4.

Double-check that all new fields created in Steps 2 and 3 pass the 1NF test.

Tips

  • The simplest way to think about 1NF is that this rule analyzes a table horizontally. You inspect all of the columns within a single row to guarantee specificity and avoid repetition of similar data.

  • Various resources will describe the normal forms in somewhat different ways, often with much more technical jargon. What is most important is the spiritand end resultof the normalization process, not the technical wording of the rules.

  • If you limited each URL so that it could be associated with only a single category, then this example would be much easier. For starters, the separate Associated URLs table wouldn't be necessary.


Second Normal Form

For a database to be in Second Normal Form (2NF), the database must first already be in 1NF (you must normalize in order). Then you must identify any columns whose values are the same in multiple rows. Such columns must be turned into their own table and related back to the original table.

As an example, the fictional movies table would have the director Martin Scorsese listed twenty times. This violates the 2NF rule, so a separate directors table would be created and the two tables would be linked through a primary keyforeign key one-to-many relationship.

Looking at the content database (see Tables 5.4 and 5.5), there are two obvious problems: the URL value will be the same for many rows in the Associated URLs table (Table 5.5), and similarly, the Category will most likely have repetitions for different URLs. To put this database into 2NF, I'll need to separate out these columns into their own tables, where each value will be represented only once. In fact, normalization could be summarized as the process of creating more and more tables until potential redundancies have been eliminated.

To make a database 2NF compliant

1.

Identify any fields that could have repeating values.

As I just stated, the URL and Category fields will likely have repeating values over multiple rows. The URLs table, as it stands, has no significant issues (although the Approved value will repeat as either Yes or No).

2.

Create new tables accordingly (Figure 5.2).

Figure 5.2. To make the database 2NF compliant, I create a third table (Categories).


The most logical modification for the existing structure is to make a separate Categories table. The URLs table is fine as is. I do need to yank the URL column out of the Associated URLs table, as that value repeats and is already represented within URLs (the tables will be linked shortly).

In my visual representation of the database, I create a box for each table, with the table name as a header and all of its columns (also called its attributes) underneath.

3.

Assign or create new primary keys (Figure 5.3).

Figure 5.3. Each table should have its own primary key.


Using the techniques described earlier in the chapter, ensure that each new table has a primary key. Here I've added a Category ID field to the Categories table to act as its primary key.

4.

Create the requisite foreign keys and indicate the relationships (Figure 5.4).

Figure 5.4. To relate the three tables, I add two foreign keys to the Associated URLs table, relating it to the other two tables.


The final step in achieving 2NF compliance is to incorporate foreign keys to identify how all of the data and tables are associated. Remember that a primary key in one table will most likely be a foreign key in another.

With this example, the URL ID from the URLs table links to the URL ID column in the Associated URLs table. Therefore, URLs has a one-to-many relationship with Associated URLs (because each URL can be associated with many different categories).

Also, the two Category ID columns are linked, creating a one-to-many relationship between Categories and Associated URLs (because each category can be associated with many different URLs).

Tips

  • Another way to test for 2NF is to look at the relationships between tables. The ideal is to create one-to-many situations. Tables that have a many-to-many relationship may need to be restructured.

  • A properly normalized database should never have duplicate rows (two or more rows in which the values in every column match) in the same table.

  • As you may have surmised, the Associated URLs table is actually just an intermediary. It solves the problem arising from the fact that there's a many-to-many relationship between the URLs and the categories.

  • To simplify how you conceive of the normalization process, remember that 1NF is a matter of inspecting a table horizontally, and 2NF is a vertical analysis (hunting for repeating values over multiple rows).

  • A primary key can actually be derived from multiple columns. Regardless, the rules for primary keys still count, so the combination of the column values must be unique, must not be null, and must never change. I repeat this fact because the Associated URLs table doesn't need its own ID column; URL ID and Category ID together could form the primary key. I'm leaving ID in as the primary key to minimize confusion.


Third Normal Form

A database is in Third Normal Form (3NF) if it is in 2NF and every nonkey column is dependent upon the primary key. If you followed the normalization process properly to this point, you may not have 3NF issues. In my example (see Figure 5.4), there aren't any 3NF problems, but I'll explain a hypothetical situation where this rule would come into play.

Take, as a common example, a single table that stores the information for registered clients: first name, last name, email address, phone number, mailing address, and so on. Such a table would not be 3NF compliant because many of the columns would not be dependent upon the primary key: street would actually be dependent upon the city; city would be dependent upon the state; and the ZIP code would be an issue, too. These values are subservient to each other, not to the person whose record it is. To normalize this database, you would have to create one table for the states, another for the cities (with a foreign key linking to the states table), and another for the ZIP codes. All of these would then be linked back to the clients table.

If you feel that all that may be overkill, you are correct. To be frank, this higher level of normalization is often unnecessary. The point is that you should strive to normalize your databases but that sometimes you'll make concessions to keep things simple (see the sidebar "Overruling Normalization"). The needs of your application and the particulars of your database will help dictate just how far into the normalization process you should go.

As I said, the content example is fine as is, so I'll outline the 3NF steps just the same using an abbreviated version of the just-mentioned Clients example.

To make a database 3NF compliant

1.

Identify any fields in any tables that do not relate directly to the primary key.

As I just stated, what you look for are columns that depend more upon each other (like city and state) than they do on the record as a whole.

In the content database, this isn't an issue. Just looking at the URL table, each URL will be specific to a URL ID, each title will be specific to that URL ID (to the URL actually, but the URL and URL IDs are essentially the same thing, as the one is a numeric representation of the other), and so forth.

2.

Create new tables accordingly (Figure 5.5).

Figure 5.5. Going with an minimal version of the hypothetical clients database, two new tables are created for storing the city and state values.


If you found any problematic columns in Step 1, like city and state, you would create a separate Cities and States tables.

3.

Assign or create new primary keys (Figure 5.6).

Figure 5.6. Primary keys are added to the two new tables.


Every table must have a primary key, so I add City ID and State ID to the new tables.

4.

Create the requisite foreign keys that link any of the relationships (Figure 5.7).

Figure 5.7. Finally, the foreign keys are added and the relationships are formally defined.


Finally, I've added a State ID to the Cities table and a City ID to the Clients table. This effectively links each client's record to the city and state in which they live. I also indicate that there are one-to-many relationships between the three tables.

Tips

  • As a general rule, I would probably not normalize this particular exampleClientsto this extent. If I left the city and state fields in the Clients table, the worst thing that would happen is that a city would change its name and this fact would need to be updated for all of the users living in that city. But thiscities changing their namesis not a common occurrence.

    Overruling Normalization

    As much as ensuring that a database is in 3NF will help guarantee stability and endurance, you won't fully normalize every database with which you work. Before undermining the proper methods though, understand that doing so may have devastating long-term consequences.

    The two primary reasons to overrule normalization are convenience and performance. Fewer tables are easier to manipulate and comprehend than more. Further, because of their more intricate nature, normalized databases will most likely be slower for updating, retrieving data from, and modifying. Normalization, in short, is a trade-off between data integrity/scalability and simplicity/speed. On the other hand, there are ways to improve your database's performance but few to remedy corrupted data that can result from poor design.

    Practice and experience will teach you how best to model your database, but do try to err on the side of normalization, particularly as you are still mastering the concept.


  • If you wanted to expand the Clients example, there are two ways of handling the street (e.g., 1234 Main Street). You could store the entire value in the Clients table while still linking to the Cities table. Alternatively, you could store just the street number (1234) in the Clients table but create a separate Streets table, storing the street names (Main Street), linked to the Cities table. This method would be much more tedious but truer to the normalization rules.


Creating the database

The final step in designing your database is to identify the column types and set the column, table, and database names. While MySQL is very flexible on how you name your databases, tables, and columns, here are some good rules to go by (some of which are required):

  • Use alphanumeric characters and the underscore to separate words (MySQL cannot use spaces or punctuation in names).

  • Do not use an existing keyword (you shouldn't name a table or column table or database or varchar, and so on).

  • Limit yourself to fewer than 64 characters.

  • Use entirely lowercase words (this is definitely a personal preference rather than a rule).

  • Use plural table names (to indicate multiple values stored) and singular column names.

  • End primary and foreign key columns with id (or ID), or pk and fk.

  • List the primary key first in a table, followed by foreign keys.

  • Field names should be descriptive.

  • Field names should be unique across every table, except for the keys.

These are largely recommendations and are therefore not absolute, except for limiting yourself to alphanumeric names without spaces that are fewer than 64 characters long. Some developers prefer to use capital letters to break up words (instead of underscores). Others like to indicate the column type or its table in its name. The most important consideration is that you remain consistent.

Table 5.6 shows the final database design. One alteration from the previous scheme is that I have decided to move the Approved and Date Submitted fields into the Associated URLs table (Figure 5.8). This is just a matter of personal preference, though; the database design would work either way.

Table 5.6. The final plan for the content database.

The content Database with Types

COLUMN NAME

TABLE

COLUMN TYPE

url_id

urls

SMALLINT(4) UNSIGNED NOT NULL AUTO_INCREMENT

url

urls

VARCHAR(60) NOT NULL

title

urls

VARCHAR(60) NOT NULL

description

urls

TINYTEXT NOT NULL

ua_id

url_ associations

SMALLINT(4) UNSIGNED NOT NULL AUTO_INCREMENT

url_id

url_ associations

SMALLINT (4) UNSIGNED NOT NULL

url_category_id

url_ associations

TINYINT(3) UNSIGNED NOT NULL

date_submitted

url_associations

TIMESTAMP

approved

url_ associations

CHAR(1) NOT NULL

url_category_id

url_categories

TINYINT(3) UNSIGNED NOT NULL AUTO_INCREMENT

category

url_categories

VARCHAR(20) NOT NULL


Figure 5.8. The last change in my database design is to have the Date Submitted and Approved columns be attributes of the Associated URLs table.


As for the table names, each will begin with url (urls, url_associations, url_categories). This convention makes it clear that the tables are related, and it also means they will be listed together when viewing all of a database's tables (because they are alphabetically close to one another).

To create the database

1.

Log in to the mysql client.

As with the previous chapter, this one will also use the mysql client (or monitor or command line client) for all of its examples. You are welcome to use phpMyAdmin or other tools as the interface to MySQL.

2.

Create the content database (Figure 5.9).

 CREATE DATABASE content; USE content; 

Figure 5.9. The first steps are to create and select the database.


Depending upon your setup, you may not be allowed to create your own databases. If not, just use your given database and add the following tables to it.

3.

Create the urls table (Figure 5.10).

 CREATE TABLE urls ( url_id SMALLINT(4) UNSIGNED NOT NULL  AUTO_INCREMENT, url VARCHAR(60) NOT NULL, title VARCHAR(60) NOT NULL, description TINYTEXT NOT NULL, PRIMARY KEY (url_id) ); 

Figure 5.10. Creating the first table.


It does not matter in what order you create your tables, but I'll make the urls table first. Remember that you can enter your SQL queries over multiple lines for convenience.

4.

Create the url_categories table (Figure 5.11).

 CREATE TABLE url_categories ( url_category_id TINYINT(3) UNSIGNED  NOT NULL AUTO_INCREMENT, category VARCHAR(20) NOT NULL, PRIMARY KEY (url_category_id) ); 

Figure 5.11. Creating the second table.


The primary key for this table is a bit smallera TINYINT(3) instead of a SMALLINT(4)as I expect it to contain fewer records.

5.

Create the url_associations table (Figure 5.12).

 CREATE TABLE url_associations ( ua_id SMALLINT(4) UNSIGNED NOT NULL  AUTO_INCREMENT, url_id SMALLINT(4) UNSIGNED  NOT NULL, url_category_id TINYINT(3) UNSIGNED  NOT NULL, date_submitted TIMESTAMP, approved CHAR(1) DEFAULT 'N'  NOT NULL, PRIMARY KEY (ua_id) ); 

Figure 5.12. The database's third and final table.


For brevity sake, I'm calling the primary key just ua_id instead of the full url_association_id. Also, I'll set a default value for the approved column. Since it is defined as NOT NULL, if no approved value is inserted when adding a record, it will be given a value of N.

The date_submitted column will automatically be NOT NULL because of the unique nature of TIMESTAMP columns. Such columns are automatically updated to the current date and time whenever a record is inserted or updated.

6.

If desired, confirm the database's structure (Figure 5.13).

 SHOW TABLES; SHOW COLUMNS FROM urls; SHOW COLUMNS FROM url_categories; SHOW COLUMNS FROM url_associations; 

Figure 5.13. You can check the structure of any database or table using SHOW.


This step is optional because MySQL reports on the success of each query as it is entered. But it's always nice to remind yourself of a database's structure.

Tips

  • In my tables, I set the approved column as a CHAR(1) so that it can be either Y or N (you could also make it a TINYINT(1), storing 1 or 0 instead). I decided to use a TIMESTAMP for the date, which can be used like DATETIME but takes up half the space.

  • When you have a primary keyforeign key link (like urls_id in urls to urls_id in url_associations), both columns should be of the same type (in this case, SMALLINT(4) UNSIGNED NOT NULL).

  • Once you've sketched out a database on paper, you could create a series of spreadsheets that reflect the design (or use an application specifically tailored to this end). This file can act both as a good reference to the Web developers working on the database as well as a nice thing to give over to the client when the project is completed.

  • Database and table names are case-sensitive on Unix systems but insensitive under Windows. Column names are always case-insensitive.

  • By strictly adhering to any set of database design principles, you minimize errors that could occur when programming a database interface, as you will in Chapter 7, "Using PHP with MySQL."


To populate the database:

1.

Add some new records to the url_categories table (Figure 5.14).

 INSERT INTO url_categories  (category) VALUES ('General  PHP), ('Web Development'), ('Code  Libraries), ('Programming'),  ('General MySQL), ('General  Database); 

Figure 5.14. Adding records to the url_categories table.


Since the url_associations table relies on values retrieved from both the categories and urls tables, I'll need to populate these first.

2.

Add new records to the urls table (Figure 5.15).

 INSERT INTO urls (url, title,  description) VALUES ('www.php.net', 'PHP: Hypertext  Preprocessor, 'The home page of  PHP...'), ('www.mysql.com', 'MySQL: The  World\'s Most Popular Open Source  Database, 'The home page of  MySQL...'), ('www.w3.org', 'World Wide Web  Consortium, 'The home page of the  W3C...'), ('www.Zend.com', 'Zend', 'The home  page of Zend...'); 

Figure 5.15. Adding records to the urls table.


You can either use the examples I'm entering here or come up with your own (the SQL statements are also available for download from the book's supporting Web site). Be careful with problematic characters (like the apostrophe in the MySQL record), escaping them as needed.

3.

Add new records to the url_associations table (Figure 5.16).

 SELECT * FROM url_categories; SELECT url_id, title FROM urls; INSERT INTO url_associations (url_id, url_category_id, approved) VALUES (1,1,'Y'), (1,4,'Y'), (1,2,'Y'), (2,5,'Y'), (2,2,'Y'), (3,2,'Y'), (4,1,'Y'), (4,3,'Y'); 

Figure 5.16. Normalized databases will often require you to know values from one table in order to enter records into another.


Because two of the fields in the url_associations table (title_id and type_id) relate to values in other tables, I'll select those values before inserting my records. For example, to create a new URL record for www.PHP.net (url_id of 1) under the General PHP category (url_category_id of 1), I would use

 INSERT INTO url_associations  (url_id, url_category_id,  approved) VALUES (1,1,'Y'); 

With your PHP scriptsonce you've created an interface for this database, this process will be much easier, but it's important to comprehend the theory in SQL terms first.

You should also notice here that I'm not entering a value for the date_submitted field. MySQL will automatically insert the current date and time for this TIMESTAMP column.

4.

Repeat Steps 1 through 3 to populate the database.

The rest of the examples in this chapter will use the populated database. You can use the example INSERTs I've defined here, create your own, or use the SQL for this chapter from the book's Web site.



    PHP and MySQL for Dynamic Web Sites. Visual QuickPro Guide
    PHP and MySQL for Dynamic Web Sites: Visual QuickPro Guide (2nd Edition)
    ISBN: 0321336577
    EAN: 2147483647
    Year: 2005
    Pages: 166
    Authors: Larry Ullman

    flylib.com © 2008-2017.
    If you may any questions please contact us: flylib@qtcs.net