Lesson 1:Introduction to Database Design | MCSE Training Kit (Exam 70-229): Microsoft SQL Server(TM) 2000 Database Design and Implementation (Pro Certification)

3 4

Before you can develop a logical data model—and subsequently create a database and the objects it contains—you should understand the fundamental concepts of database design. In addition, you should be familiar with the basic components of a database and how those components work together to provide efficient data storage and to provide access to those who require specific types of data, in specific formats, from the database. This lesson introduces you to the basic components of a database and the terminology that describes those components. The lesson then discusses normalization and entity relationships—two concepts that are integral to understanding relational database design.

After this lesson, you will be able to:

Describe the main components of a relational database.
Describe the process of normalization and normalize tables in a database design.
Identify the relationships that exist between entities.

Estimated lesson time: 30 minutes

Components of a SQL Server Database

A SQL Server database consists of a collection of tables that store specific sets of structured data. A table (entity) contains a collection of rows (tuples) and columns (attributes). Each column in the table is designed to store a certain type of information (for example, dates, names, dollar amounts, or numbers). Tables have several types of controls (constraints, rules, triggers, defaults, and customized user data types) that ensure the validity of the data. Tables can have indexes (similar to those in books) that enable rows to be found quickly. You can add declarative referential integrity constraints to the tables to ensure consistency between interrelated data in different tables. A database can also store procedures that use Transact-SQL programming code to perform operations with the data in the database, such as storing views that provide customized access to table data.

For example, suppose that you create a database named MyCoDB to manage the data in your company. In the MyCoDB database, you create a table named Employees to store information about each employee, and the table contains columns named EmpID, LastName, FirstName, Dept, and Title. To ensure that no two employees share the same EmpID and that the Dept column contains only valid numbers for the departments in your company, you must add constraints to the table. Because you want to quickly find the data for an employee based on the employee's ID or last name, you define indexes. For each employee, you will have to add a row of data to the Employees table, so you create a stored procedure named AddEmployee that is customized to accept the data values for a new employee and that performs the operation of adding the row to the Employees table. You might need a departmental summary of employees, in which case you define a view named DeptEmps that combines data from the Departments and Employees tables and produces the output. Figure 3.1 shows the parts of the MyCoDB database.

figure 3.1-the mycodb database, the employees table, and the deptemps view.

Figure 3.1 The MyCoDB database, the Employees table, and the DeptEmps view.

Normalizing a Database Design

Optimizing a database design includes the process of normalization. Normalizing a logical database design involves using formal methods to separate the data into multiple, related tables. Having a greater number of narrow tables (with fewer columns) is characteristic of a normalized database; having a few wide tables (with more columns) is characteristic of a denormalized database.

Reasonable normalization often improves performance. When useful indexes are available, the SQL Server 2000 query optimizer is efficient at selecting rapid, efficient joins between tables.

As normalization increases, so do the number and complexity of joins required to retrieve data. Too many complex relational joins between too many tables can hinder performance. Reasonable normalization should include few regularly executed queries that use joins involving more than four tables.

A database that is used primarily for decision support (as opposed to update-intensive transaction processing) might not have redundant updates and might be more understandable and efficient for queries if the design is not fully normalized. Nevertheless, data that is not normalized is a more common design problem in database applications than over-normalized data. Starting with a normalized design and then selectively denormalizing tables for specific reasons is a good strategy.

Sometimes the logical database design is already fixed, however, and total redesign is not feasible. But even then, it might be possible to normalize a large table selectively into several smaller tables. If the database is accessed through stored procedures, this schema change could take place without affecting applications. If not, it might be possible to create a view that hides the schema change from the applications.

Achieving a Well-Designed Database

In relational database design theory, normalization rules identify certain attributes that must be present or absent in a well-designed database. These rules can become quite complicated and go well beyond the scope of this book. There are a few rules that can help you achieve a sound database design, however. A table should have an identifier, it should store data for only a single type of entity, it should avoid nullable columns, and it should not have repeating values or columns.

A Table Should Have an Identifier

The fundamental rule of database design theory is that each table should have a unique row identifier, which is a column or a set of columns used to distinguish any single record from every other record in the table. Each table should have an ID column, and no two records can share the same ID value. The column (or columns) that serves as the unique row identifier for a table is the primary key of the table.

In Figure 3.2, the Employees table does not include a column that uniquely identifies each row within the table. Notice that the name David Mendlen appears twice. Because there is no unique identifier in this table, there is no way to easily distinguish one row from another. This situation could be worse if both employees worked in the same department and had the same job title.

figure 3.2-a table that has no unique identifier.

Figure 3.2 A table that has no unique identifier.

You can normalize the table by adding a column that uniquely identifies each row, as shown in Figure 3.3. Notice that each instance of David Mendlen has a unique EmpID value.

figure 3.3-a normalized table that has a unique identifier.

Figure 3.3 A normalized table that has a unique identifier.

A Table Should Store Data for Only a Single Type of Entity

Attempting to store too much information in a table can prevent the efficient and reliable management of the data in the table. For example, in Figure 3.4, the Books table includes information about each book's publisher.

figure 3.4-a table that includes title and publisher information.

Figure 3.4 A table that includes title and publisher information.

Although it is possible to have columns that contain information for both the book and the publisher in the same table, this design leads to several problems. The publisher information must be added and stored redundantly for each book published by a given publisher. This information uses extra storage space in the database. If the address for the publisher changes, the change must be made for each book. Furthermore, if the last book for a publisher is removed from the Books table, the information for that publisher is lost.

In a normalized database, the information about books and publishers would be stored in at least two tables: one for book titles and one for publishers (as shown in Figure 3.5).

figure 3.5-a normalized database design that includes a table for book titles and a table for publisher information.

Figure 3.5 A normalized database design that includes a table for book titles and a table for publisher information.

The information about the publisher now has to be entered only once and then linked to each book. If the publisher information changes, it must be changed in only one place, and the publisher information will be there even if the publisher has no books in the database.

A Table Should Avoid Nullable Columns

Tables can have columns defined to allow null values. A null value indicates that the record has no value for that attribute. Although it can be useful to allow null values in isolated cases, it is best to use them sparingly because they require special handling that increases the complexity of data operations. If you have a table that has several nullable columns and several of the rows have null values in the columns, you should consider placing these columns in another table linked to the primary table. Storing the data in two separate tables enables the primary table to be simple in design but capable of accommodating the occasional need for storing this information.

A Table Should Not Have Repeating Values or Columns

A table should not contain a list of values for a specific piece of information. For example, suppose that you want to track book titles and their authors. Although most books might have only one author, many of them might have two or more. If there is only one column in the Books table for the name of the author, this situation presents a problem. One solution is to store the name of both authors in the column, but showing a list of individual authors would then be difficult. Another solution is to change the structure of the table to add another column for the name of the second author, but this solution accommodates only two authors. Yet another column must be added if a book has three authors.

Figure 3.6 shows two methods of handling multiple authors per title.

figure 3.6-two methods for structuring the books table.

Figure 3.6 Two methods for structuring the Books table.

If you find that you need to store a list of values in a single column or if you have multiple columns for a single piece of data (Author1, Author2, and so on), you should consider placing the duplicated data in another table with a link to the primary table. In the case of the Books table, you could create an additional primary table for authors and then create a third table that matches books to authors and accommodates repeating values, as shown in Figure 3.7. This design enables any number of authors for a book without modifying the definition of the table and allocates no unused storage space for books that have a single author.

figure 3.7-three tables that store information about books and their authors.

Figure 3.7 Three tables that store information about books and their authors.

Entity Relationships

In a relational database, relationships help to prevent redundant data. A relationship works by matching data in key columns—usually columns that have the same name in both tables. In most cases, the relationship matches the primary key from one table, which provides a unique identifier for each row with an entry in the foreign key in the other table. Primary keys and foreign keys are discussed in more detail in Chapter 5, "Implementing Data Integrity."

There are three types of relationships between tables: one-to-one, one-to-many, and many-to-many. The type of relationship depends on how the related columns are defined.

One-to-One Relationships

In a one-to-one relationship, a row in table A can have no more than one matching row in table B (and vice versa). A one-to-one relationship is created if both of the related columns are primary keys or have unique constraints. This type of relationship is not common, however, because information related in this way would usually be in one table.

One-to-Many Relationships

A one-to-many relationship is the most common type of relationship. In this type of relationship, a row in table A can have many matching rows in table B, but a row in table B can have only one matching row in table A. For example, the Publishers and Titles tables mentioned previously have a one-to-many relationship. Each publisher produces many titles, but each title comes from only one publisher. A one-to-many relationship is created if only one of the related columns is a primary key or has a unique constraint.

Many-to-Many Relationships

In a many-to-many relationship, a row in table A can have many matching rows in table B (and vice versa). You create such a relationship by defining a third table, called a junction table, whose primary key consists of the foreign keys from both table A and table B. In Figures 3-6 and 3-7, you saw how the author information could be separated into another table. The Books table and the Authors table have a many-to-many relationship. Each of these tables has a one-to-many relationship with the BookAuthor table, which serves as the junction table between the two primary tables.

Exercise 1: Exploring the Basic Concepts of Database Design

In this exercise, you will view the primary objects that are contained in a SQL Server database. You will then apply the principles of normalization to a database design and identify the relationships that exist between entities within a database. To perform this exercise, you should be logged into your Windows 2000 Server computer as Administrator. You will use SQL Query Analyzer and SQL Server Enterprise Manager for part of the exercise, and you will need paper and a pencil to complete the rest of the exercise.

To identify the main components of a SQL Server database

Open SQL Query Analyzer and use Windows authentication to log on to SQL Server.
Open the Object Browser window if it is not already displayed.

The Object Browser window displays a hierarchical tree of database objects contained in the instance of SQL Server that you are logged on to.

Review the list of database objects that appear in the tree. Notice that the Northwind and Pubs databases appear as objects in the tree. The tree also includes a list of common objects, such as aggregate functions and system data types.
Expand the Northwind node.

A list of object types appears. The list includes users tables, system tables, stored procedures, functions, and user-defined data types. Each category contains objects specific to the Northwind database.

Expand the Users Tables node.

A list of users tables in the Northwind database appears. Notice that each table object is preceded by the object owner (which, in this case, is dbo).

To view the contents of a table

Right-click dbo.Categories, then click Open.

The Open Table window appears and displays the contents of the Categories table.

Review the columns and rows within the table.

What are the column names (attributes) in the Categories table, and how many rows of data are displayed?

Close the Open Table window.
Review each users table object to determine the columns in each one. Be sure to close the Open Table window after you review that table. You can also review the system tables to view their attributes.

To use the sp_help system stored procedure to view table information

In the Query window, execute the following Transact-SQL statement:

 Use Northwind GO sp_help

A list of all objects in the Northwind database appears on the Grids tab of the Results pane.

NOTE

After the result appears on the Grids tab, click within the Results pane. A second scroll bar will appear, and you can scroll through all of the objects in the database.

Close SQL Query Analyzer.

To normalize a database design

Review the following table:


FirstName	LastName	City
Elizabeth	Boyle	Cleveland
Rob	Caron	Chicago
Neil	Smith	Denver
Denise	Smith	Boston

Keeping in mind the table's design, apply the four basic rules that you should follow when designing a database. The rules are listed here for your convenience:
- A table should have an identifier.
- A table should store data for only a single type of entity.
- A table should avoid nullable columns.
- A table should not have repeating values or columns.

Which rule is being violated in the Customers table?

Modify the table's design so that it adheres to the basic rules of normalization. Use your paper and pencil to draw the table and its data.

How should you modify the data?

Review the following table:


CustID	FirstName	LastName	City	PurchaseType
101	Elizabeth	Boyle	Cleveland	Books, CDs
102	Rob	Caron	Chicago	Books, videos
103	Neil	Smith	Denver	CDs, videos, DVDs
104	Denise	Smith	Boston	Books

The PurchaseType column contains a list of the types of products that the customer has purchased.

Determine which rule(s) of normalization is being violated in the Customers table.
Modify the database design so that it adheres to the rules of normalization.

How should you modify the current design?

Review the following table:


CustID	FirstName	LastName	City	Purchase	Manufacturer	ManContact
101	Elizabeth	Boyle	Cleveland	Spring candles	Pavlova, Ltd.	Ian Devling
102	Rob	Caron	Chicago	Sandalwood incense	Mayumi's	Mayumi Ohno
103	Neil	Smith	Denver	Sage	Pavlova, Ltd	Ian Devling
104	Denise	Smith	Boston	Hanging crystal	Leka Trading	Chandra Leka

Determine which rule(s) of normalization is being violated.
Modify the database design so that it adheres to the rules of normalization.

How should you modify the current design?

To generate a database diagram in SQL Server

On the Start menu, point to Programs and then point to Microsoft SQL Server. Then, click Enterprise Manager.

SQL Server Enterprise Manager appears.

In the Tree tab, expand the Microsoft SQL Servers node, expand SQL Server Group, expand the node for your local computer, expand Databases, and then expand the Pubs database.

A list of object types in the Pubs database should now be displayed.

Right-click Diagrams, then click New Database Diagram.

The Create Database Diagram wizard appears.

Click Next.

The Select Tables To Be Added screen appears. The list of tables in the Pubs database appears in the left window.

Select the Add Selected Tables Automatically checkbox.

The How Many Levels Of Related Tables option becomes active.

Change the level in the How Many Levels Of Related Tables list box to 2.
Select the Authors table from the Available Tables list, then click Add.

The Authors table, TitleAuthor table, and Titles table are added to the Tables To Add To Diagram list.

Click Next.

The Completing The Create Database Diagram Wizard screen appears.

Click Finish.

The diagram is generated and appears in a new window. The diagram includes the Authors table, the TitleAuthor table, and the Titles table.

To view entity relationships in a database diagram

If the tables overlap, click the Arrange Tables button on the toolbar.

The tables should now be arranged so that you can see the relationships between them.

If necessary, click the Zoom button on the toolbar and select a size that would make the relationships easier to identify. Maximize the window if necessary.
View the connector that links the Authors table to the TitleAuthor table and the connector that links that Titles table to the TitleAuthor table.

The connector indicates that a relationship exists between the tables. Notice that there is no connector between the Author table and the Titles table because no direct relationship exists between the tables.

At one end of the connecter is a key, which indicates one. The other side of the connector is an infinity sign, which indicates many. These symbols tell you that a one-to-many relationship exists between the Authors table and the TitleAuthor table and that a one-to-many relationship exists between the Titles table and the TitleAuthor table.

What is the implied relationship between the Titles table and the Authors table?

What type of table is the TitleAuthor table (in terms of the implied relationship between Titles and Authors)?

Close the diagram window without saving the changes that you made, then close SQL Server Enterprise Manager.

NOTE

As an extra exercise, try creating diagrams for other users tables within the Pubs database and for tables in the Northwind database. You can also try selecting levels greater than two in the How Many Levels Of Related Tables list box. Experiment with the different levels and tables.

Lesson Summary

A SQL Server database consists of a collection of tables that store a specific set of structured data. A table contains a collection of rows and columns. Each column in the table is designed to store a certain type of information (for example, dates, names, dollar amounts, or numbers). The logical design of the database, including the tables and the relationships between them, is the core of an optimized, relational database. Optimizing a database design includes the process of normalization. Normalizing a logical database design involves using formal methods to separate the data into multiple, related tables. As normalization increases, so do the number and complexity of joins that are required to retrieve data. Normalization rules identify certain attributes that must be present or absent in a well-designed database. Tables in a normalized database should have an identifier, should only store data for a single type of entity, should avoid nullable columns, and should not have repeating values or columns. You can create relationships between your tables in a database diagram to show how the columns in one table are linked to columns in another table. In a relational database, relationships help to prevent redundant data. A relationship works by matching data in key columns—usually columns that have the same name in both tables. There are three types of relationships between tables: one-to-one, one-to-many, and many-to-many. The type of relationship depends on how you define the related columns.