A database is simply a collection of related data. For example, your workplace probably has an accounting database that tracks all the income and expenses of the company. If your company is big enough, there's likely another database used in the marketing department that maintains a sales database. Although the marketing department is ultimately interested in the revenues the sales force generates, it also tracks information that wouldn't be directly of interest to the accounting department. Other databases might exist in the maintenance department, another in human resources, and perhaps others are scattered throughout the company.
Regardless of where a database is located, its contents are geared to meet some specific need. Although it would be possible to have one huge monolithic database, it isn't uncommon to find many different databases used by different departments within the company.
What's in a Database?
Obviously, data is what's in the database, but that's not the real issue. What's more important to us is how we interact with a database as programmers. We're less interested in the specific data than in how we can access and present the data. We want to be able to read and write the data and, as such, need to understand how data is organized within the database.
It's the responsibility of the database management system (DBMS) to worry about the details of reading and writing the data to the database. As you probably know, Visual Basic .NET provides the means to use the Access DBMS to read and write database information. However, that doesn't mean that you're limited to using the Access database. Visual Basic .NET can be used to manipulate data of most popular databases, including Microsoft's SQL Server, Oracle, MySQL, and a host of others.
How is it possible for Visual Basic .NET to manipulate the data in such a variety of databases? As you might guess, each company that markets a DBMS is free to design the low-level details of the database in just about any manner it chooses. For example, one manufacturer might elect to represent the Integer data type as a 16-bit quantity, whereas another prefers to use 32 bits. Some might decide that floating-point numbers will use 8 bytes, whereas another decides to use 10 bytes. How does Visual Basic .NET get all of these different-shaped pegs to fit into one round hole?
Fortunately, at an early point in database development history, a group of programmers recognized a need to provide a standardized means by which to access database information. The result was the development of a language written specifically for databases. That languages is called Structured Query Language or SQL. The purpose of SQL (usually pronounced by just saying the letters S-Q-L) is not actually to program the database, but rather to extract information from it. In other words, the initial goal of SQL was to query the database for information; hence the word query in the language's name .
Most modern databases vendors implement SQL for use with their database, and Visual Basic .NET is no exception. Visual Basic .NET provides for a robust subset of SQL that may be used directly with an Access database. You can also connect to other databases with Visual Basic .NET by using DLL (dynamic link libraries) and Open Database Connectivity (ODBC) links. Visual Basic .NET provides a means by which to write a user interface (or front end ) for almost any major DBMS. The actual operations on the database are performed by the ODBC link.
The DLLs take care of the details about how to move the native data found in a database into the Visual Basic .NET environment. Fortunately, we don't need to concern ourselves with how this is done. The details are encapsulated within the libraries, and SQL provides the common denominator through which we can access the data.
I'll have a lot more to say about SQL later in this chapter. However, before we can appreciate what SQL can do for us, we need to understand the basic construction of a modern database system.
The Organizational Structure of a Modern Database
Think of a database as a huge room. Inside the room on the floor are one or more books. One book is titled Customer, another is titled Orders, another is titled Employee, and so on. You pick up the book titled Employee and notice that the first line looks a lot like the guest book at a wedding , with some additional details. The line has an employee's first name, last name, address, Social Security number, the name of the employee's supervisor, plus a bunch of other information about the employee. The next line contains the same information, but for a different employee.
What you discover is that each "book" in the room is actually the equivalent of a database table. A database table is a specific subpart of a database and provides detailed information about one subject or topic of interest in the database. The book you examined in the previous paragraph is simply an abstract view of a database table that contains employee information. If you look around at the titles of the other books in the room, you might find names such as Customers, Orders, Products, Payroll, and so on. Each of these books represents a database table.
Figure 24.1 shows you the tables that are found in the Xtreme.mdb database that's provided with Visual Studio .NET. This is an Access database (that is, mdb = M icrosoft d ata b ase) that we'll use for much of our database discussion. (The actual path to the database depends on where you installed .NET. However, you should be able to find it off the main Visual Studio .NET directory in the Crystal Reports\Samples\Database directory.)
Figure 24.1. The database tables in the Xtreme.mdb database.
As you can see in Figure 24.1, a number of boxes are displayed. Each of these boxes has a title bar that is the table name in the Xtreme.mdb database. Therefore, Customer, Orders, Orders Detail, Employee, Employee Address, and so forth, are all table names that collectively form the Xtreme.mdb database.
In Chapter 23, you learned that each random access record has certain information associated with it. Each piece of information is called a field. In Figure 24.1, each line under the table name is a database field. Just as in random access files, each field represents a piece of information that's collected for a table entry. In the Orders table, for example, there is a field for the Order ID, the Order Amount, the Customer ID, and so on for each piece of information in the table.
In the book analogy used earlier, I said that the book had columns labeled in it. These columns correspond to the fields in the table.
If you fill in the fields for a table, that completed row becomes one record in the table. Therefore, where columns represent the fields in a database table, rows represent the records in the table. Figure 24.2 shows an example using the Supplier table from the Xtreme.mdb database.
Figure 24.2. Sample records from the Supplier table in the Xtreme.mdb database.
If you look at the field names in Figure 24.1, you'll see those repeated as column headers in Figure 24.2. These columns form the fields for the Supplier table. You can also see that there are seven records in the table. In other words, the information for each of the fields has been filled in to form a total of seven records in the Supplier database table.
If you look closely at Figure 24.1, you'll notice that at least one of the field names in each table is displayed using a boldface font. These fields are the primary key fields for each table. A primary key is a field that has an unduplicated value for each record in the table. For example, in the Supplier table, the Supplier ID field is the primary key for the table. This means that no other record in the Supplier table can have the same value for the Supplier ID field. If you look in Figure 24.2, you'll see that no records have the same value for the Supplier ID field.
In some cases, the primary key consists of the combined value of two key fields. If you look at the Orders Detail table in Figure 24.1, you'll notice that the primary key for that table is the combination of the Order ID and Product ID fields. These two table fields, taken together, form a unique combination that isn't duplicated in any other record in the Orders Detail table.
Why do tables even have keys? Because keys represent unique values for records within the table, they can be used to rapidly and efficiently search the records in a table. During the process of creating tables and then adding records to them, the DBMS quietly (and without us knowing about it) constructs indexes to these key fields. When you search a table for a certain record, the DBMS uses these indexes to find the record rapidly.
Another reason for having keys in a database table is because they can be used to form relationships with other tables. If you look at Figure 24.1, you'll see a line drawn from the Customer ID field of the Customer table to the Customer ID field of the Orders table. That is, the primary key in the Customer table is used as a field in the Orders table. This means that a relationship exists between the Customer and Orders tables. Because the Customer ID is a primary key, it has a unique value. On the other hand, any one customer can have more than one order. (Indeed, the company hopes that is the case.) The relationship between the Customer and Orders table, therefore, is often described as One-to-Many. That is, one (unique) customer can have many orders.
We can also see that the Employee ID primary key field from the Employee table is also present in the Orders table. Perhaps the employees earn a commission on each sale, so they must include the Employee ID number in each record to ensure that the employee is paid. Because the Employee ID is the primary key of the Employee table, it is unique. This also means that a One-to-Many relationship exists between the Employee and Orders tables. That is, one (unique) employee is associated with many orders.
We can also view the relationships from the Orders table's point of view, too. From this point of view, there's a Many-to-One relationship between both the Customer and Employee tables. Think about it.
Why have relationships exist between various tables? The reason is because it improves the way we can search, or query, the database. For example, it's relatively simple to use SQL to search the Orders table for all orders placed by a certain customer. Likewise, we can use SQL to search the Orders table and find all sales that were closed by a specific employee. It's the relationships between the tables that make these queries faster and more efficient.
Databases that are built with these relationships in mind are called relational databases. Virtually all modern databases are relational databases, including the Access database engine included in Visual Basic .NET.
Why All Those Tables?
After seeing Figure 24.1, you might be asking yourself, "Why do we need all those tables?" Actually, we could organize the database in many different ways, and using fewer tables is a possibility. You might, however, want to follow that question with several additional questions: "What's the purpose of the database? What do I want to do with it? What information do I need to maintain to fulfill its purpose?" Given what we see in Figure 24.1, it seems that a primary goal of the database is to track company sales.
If tracking company sales is the goal, we might get by with just one table. For example, we could have an Orders table that stores the customer's name and address, what the customer bought and how many, the name of the employee who made the sale, the name and address of the supplier of the product (so that we can reorder more product when our inventory starts to run low), plus a few other supporting pieces of information. If the Orders table has all this information for each sale, we could do away with almost all the other tables.
This database design would work, but it would be similar to me handing you a shovel and saying, "I want you to build a four-lane highway between New York and Philadelphia by yourself." You might get the job done, but it's not the most efficient plan for building a highway . Likewise, our single-table database design plan could use some work.
First of all, if our business has repeat customers, our design duplicates their names and addresses in the table for every purchase they make. The same is true for the employee and suppliers names and addresses. It's much more efficient to store a customer's ID number (usually a 4-byte Long ) in the Orders table than to use the 50 to 100 bytes it could take to store the customer's name and address. The same idea applies to the employee and supplier IDs, too. An added benefit is that Visual Basic .NET can search a list of numbers faster than it can search a list of strings. This means that if your customer number is 524, we can search through the Orders table and extract all sales with customer ID number 524 much faster than we could by searching the table using your name.
In addition to improving our searching abilities by using ID numbers, we also save a tremendous amount of storage space by avoiding duplicate information. Even though disk storage space is pretty cheap these days, the time it takes to plow through a poorly designed database is not.
What we want to do is create a database design that avoids duplicate data whenever possible, but still have the necessary information to accomplish the task at hand. The use of table indexes and keys in Figure 24.1 shows that whoever designed this database has done a pretty good job.
Finding numerous tables in a database is both common and necessary. By creating relationships between the tables, we can provide more efficient ways to search the information in the database and also lessen its storage requirements.