Chapter 12: Access to External Data | Linux Application Development For The Enterprise (Charles River Media Programming)

In this chapter we discuss how to access data stored in an external database system from within Excel. There are essentially two ways to accomplish this: One is to use the wizard provided for this purpose, which in turn calls the program MS Query. (This form of data importation can be controlled via the QueryTable object.) The alternative is to use the new ADO library, which provides a host of objects for reading and editing data.

12.1 Fundamental Concepts About Relational Databases

We have already mentioned that Excel does not itself recognize relations. Nevertheless, it is important for you as an Excel programmer to understand the concept of a relational database: Both the supplementary program MS Query and the ADO library enable access to external databases, which almost always are built on the relational database model.

Relational databases are employed when the data are to be used in several tables and when the tables refer to one another. The essential motivation for using relational databases is to avoid redundancy.

Tip

As a starting point for the explanations that follow we shall use the example database Northwind (file Nwind.mdb), which Microsoft includes with several of its products, for example with the Office suite, with Visual Basic, and with SQL Server. There are quite a few versions of this database. The file is included with the book's sample files.

Note

This section provides a brief introduction to the relational database model, without going into too much detail. The information provided here should be sufficient for you to understand how to extract data from a relational database. If you wish to construct your own databases, you will need to consult further literature on the subject.

Sharing Data Among Several Databases

Let us suppose that you have a small business in which orders are entered manually into an order form and kept in this format. The form includes the following fields:

order data
name and address of the customer
name of the seller
list of ordered articles, consisting of article name, number of items, unit price, total price
total price of the order
additional information

Though this way of proceeding is easy to understand, it has several disadvantages:

If a customer places several orders, his name and address must be written anew each time. If his address changes, all current order forms for this customer must be retrieved and updated. Customer-specific data (such as special arrangements for regular customers) must be stored separately.
If an article occurs in several orders, its name and price must be written each time, although this is information that in any case is stored centrally (in a price list, say). There is a great danger of typographical errors.
If a large number of different articles are ordered, then there will be insufficient space on the order form, and several forms will have to be stapled together.

In changing over to a digital system the order form could, of course, be used with little alteration. However, this is not such a good idea, due to the drawbacks mentioned above. Much better would be to partition the data among several tables:

Table Customers :	customer number, name, address
Table Employees :	employee number, name, etc.
Table Products :	article number, name, unit price, possible discounts
Table Orders :	order number, date, customer number, salesperson number
Table Order Details :	order number, article number, number of units

The definition of individual tables for articles, customers, employees (seller), and orders is probably immediately clear. It allows us to avoid the redundancy problem described above.

Conceptually perhaps most confusing is the table Order Details: Here are stored all the individual orders. An immediate integration of these individual orders (each order consists of several items: 3 pieces of X, 2 pieces of Y, 10 pieces of Z, say) in the orders table is not possible, since the number of items varies: If there were ten places in the orders table, then for most orders seven or eight items would be left empty (a waste of storage space). With other orders ten lines would be too few, and the order will have to be split (redundancy).

The chosen solution of individual tables therefore seems strange , since it is completely unsuitable for doing things "by hand." It is unthinkable to select from the endless list of items those items that correspond to order number 1234 placed on 5 June 1997. A solution optimized for human capacities would be to store within the order itself at least one reference to the order entries to minimize the task of searching. In a database program this is unnecessary, since the data in Order Details can be found rapidly . (Naturally, it is assumed that all access to linked tables is by way of indices. For Order Details the combination of order number and article number serves as the primary index.)

Remarks

Quite often, when a database is being created, one attempts to give the same name to fields of different tables that later will be linked by a relation. This contributes to clarity, but is not a requirement.

There are different types of relations, and these differ in their effect and use fundamentally from one another (inner join, outer join, with and without referential integrity). It is beyond the scope of this book to describe these types in any detail.

Relations Among Several Tables

There exist three basic relations between two tables:

1:1	One-to-one relation between two tables: Each data item in one table corresponds to exactly one data item in the other table. Such relations are rare, because the information of the two tables could as easily be stored in a single table.
1:n	A data item in the first table can occur in several data items in the second table (for example, one seller appears in several orders). There cannot exist multiplicity in the other direction, since an order cannot be executed by more than one salesperson (at least not in this example). Occasionally, one speaks of an n:1 relation that is actually the same as a 1:n relation (the point of view has merely been shifted).
n:m	A data item in one table can appear in several data items in the other table, and conversely (for example, several different articles can occur in one order, while one article can occur in several different orders; another example is that of books and their authors).

In a database the 1:n relations between tables are created with identification numbers . Each salesperson possesses a unique employee ID number in the employees table. (This number is usually called a primary key. In an order the salesperson is referred to by this number. The field in the orders table is called a foreign key, because it refers to an ID in a different table.

For n:m relations a separate, additional, table is necessary, with which the n:m relation is reduced to two 1:n relations. In the following example there exists a single n:m relation between orders and products. The order details table serves as the additional table. The primary key of this table is composed of the order and article number (this combination is unique; in a given order a product cannot occur twice). Figure 12-1 clarifies the relations among the tables.

Figure 12-1: Relations for managing the order data

Relating Data from Different Tables

Figure 12-2 shows, by means of an example, how the data of an order are related : On 7 August 1996, order 10251 was executed. The customer name is stored in the Orders table with the ID VICTE . The Customer table reveals that the customer is, in fact, Victuailles en stock.

Figure 12-2: Data for an order are divided among four tables

What products (and how many of each) has this firm ordered? For this we must search in the table OrderDetails for the ordered items with order number 10251. There we find three items: six of product 22, fifteen of product 57, and twenty of product 65. And what might these products be? This information is contained in the table Products : Product 22 is Gustaf's Kn ckebr d .

It may be that this division of data among several tables appears overly elaborate. But in fact, it produces an enormous advantage:

The most obvious advantage is the result of saving of space: In a real-life application the table Order Details would be by far the largest, which for a medium- size business would have about one hundred thousand entries. But for each line only four numerical quantities need to be stored: order ID, product number, number of items, and unit price. Without the relational model you would have to store for each order the product name, name of the salesperson, customer name, and so on. The storage requirement would multiply, without yielding any advantage. A large portion of the data would be merely redundant.
The relational model helps to avoid errors: If the product name has to be written out for each item each time it is ordered, then it is only a matter of time before typos begin to infiltrate the database.
The relational model makes possible central editing of data: When the address of a customer changes, only the corresponding entry in the Customers table needs to be updated. Without this relational linkage of data you would have to do a global search, which, experience tells us, is fraught with error. (You have certainly experienced this problem yourself: You inform a firm of your new address, and nonetheless many shipments are wrongly addressed. The reason? Your address is stored by the firm in several places. One department has received your notification of change of address, but two other departments continue to use the old address.)

Querying Data

You need to be concerned with the organization of information in the various tables as described above only when you create queries with SQL commands. (SQL stands for Standard Query Language and is a type of programming language for manipulation of databases.) Often, instead of having to formulate queries in SQL code you can use one of a number of convenient tools, such as MS Query, described in the next section. The database program Access possesses a so-called query generator, with which you can easily define queries.

The Northwind Database

The imaginary firm Northwind provides gastronomical specialties to customers all over the world. Figure 12-1 shows only a portion of the tables in the Northwind database. The complete database schema is somewhat more complex and can be seen in Figure 12-3. First we give some information about the construction of the database:

Figure 12-3: Tables and relations of the Northwind example database

In Products is stored information about the origin of each product. Category and supplier data are stored in two additional tables, in order to avoid redundancy. The table Orders contains data on each order. In three 1:n relations reference is made to the Customers table, the Shippers table, and the Employees table. So that arbitrarily many articles can be included in an order, an n:m relation between Orders and Products is established via the intermediate table Order Details .

The database contains about eighty products in eight categories from thirty suppliers. There are eight hundred orders from ninety customers stored. There are three shipping firms, and the employee count is nine.

The table order details contains, among other things, the data field unitprice . This field appears to contradict the rule for the construction of a relational database in that it is redundant (the unit price can be obtained from the product ID in the associated table products ). A possible reason for the unit being stored a second time is to make it easier to deal with price changes: When the price of a product is changed, this change does not affect the record of orders already placed in order details .