11.1 Fundamentals


11.1 Fundamentals

Experience tells us that the majority of all Excel programs are in reality database programs. Microsoft has taken note of this and has enlarged the repertoire of database functions from version to version. In version 5 this meant revised functions for setting up pivot tables and the supplementary program MS Query. In version 7 we had the template wizard (see Chapter 9) and the DAO library, which for the first time offered programmers full access to the database functions of Access. With Excel 97 DAO was improved (ODBCDirect). Excel 2000 has finally replaced DAO with the totally new ADO library. At the same time, we have the completely revised functions for data import (MS Query and the QueryTable object) as well as for data analysis (pivot tables). (From the point of view of databases, Excel 2002 offers no fundamentally new functions.)

This section attempts to collect in as compact a form as possible basic information on the subject of databases. If you wish to use Excel for nothing more than to sort a table of 100 rows quickly (that, too, is a database), you can confidently proceed to the next sections. But if you have more ambitious database applications in mind, then the time that you spend now reading this section is guaranteed to be a good investment.

This section also wishes to warn you against thoughtlessly using Excel (to call a spade a spade, misusing Excel) for applications for which it was not designed. The following sections explain why Excel is poorly suited as a database program for advanced applications. Even if you wish to continue to use Excel only to manage smaller data sets, you should be aware of its limitations.

Introduction: What Is a Database?

The concept database has unfortunately acquired a number of different meanings depending on the context in which it is used. The lowest common denominator is perhaps that every database is concerned with a collection of associated, ordered data.

  • In Excel every table or list can be considered a database. Excel makes available commands for working with such databases in the DATA menu (Sort, Group, etc.).

  • In database systems such as Access or Oracle a database is stored in its own database file. There, in addition to the data, arranged in several tables, are query functions and perhaps program code. In such database systems large data sets can be processed more securely and efficiently than in Excel.

  • Occasionally, the database system will itself be called a database (and not the data stored within it). This shorthand can result in considerable confusion.

Of course, there are great differences among different database systems, relating to the scope of the data to be managed, data security, efficiency over networks, and programming. In essence there are two categories:

  • Desktop systems and file server systems. In such systems the data are stored in a network in a directly accessible file. For small database applications this is very convenient and familiar. However, data access in large networks can be very slow. Typical representatives are Access, FoxPro, and Paradox.

  • Client/Server systems. In this case there is a separation between the program that manages the data (the server) and the programs that access the data (clients). This separation makes possible significantly greater efficiency and security. However, the programming of clients (that is, of database applications) is more difficult. Popular database servers are Oracle, Microsoft SQL server, IBM DB/2, Informix, and Sybase. For programming of clients Visual Basic or Delphi can be used. As you will see in the next chapter, your Excel application can also be a client for a database server.

Pointer  

Please note that the information collected here is greatly abbreviated due to space limitations. If you wish to plunge more deeply into the subject of databases, you will need to consult the literature on the subject. There are good books on Access, but also books that deal with ADO programming under VB and VBA.

A Brief Database Glossary

The following paragraphs describe some important concepts in the world of databases to facilitate unambiguous communication. These definitions are of significance to you above all if you wish to use the supplementary program MS Query for reading external data or the ADO library for database programming.

Databases are usually organized into several tables . A typical table might contain, for example, data on all customers of a business: customer number, name , address, telephone number, etc. Each row of the table is called a data record . Individual elements of a row are called data fields .

Practically all currently available databases are based on the relational model . This model makes it possible to divide the entire range of data into several tables that are linked by relations. This results in a number of advantages in relation to efficiency, data security, and the avoidance of redundancy. The fundamentals of relational databases will be discussed at the beginning of the next chapter, which deals with external data.

Tables usually contain data in unordered form (that is, in the order in which the data were first stored). To make possible the speediest search of the data, there are stored in addition to the actual data certain so-called indexes. An index , or key , contains essentially the information as to where data are to be found within a table.

To create a sorted list from one or more tables that contain only data that satisfy certain criteria, one employs queries . The result of a query is thus a set of data collected according to certain criteria. Queries are processed more quickly than otherwise if the underlying data have been ordered by indexes.

The formulation of queries often proceeds interactively with the database program or in MS Query. Internally or in program code, queries are formulated in SQL , which stands for "standard query language" and represents a standard for formulating queries. SQL is thus a programming language.

For programming database applications VB and VBA can be used (among other possibilities). For this reason Excel is also suited for accessing external databases, regardless of whether this is managed with Access, Oracle, or the SQL server. The database-specific functions are made available in the ADO library via objects, methods , and properties (ADO stands for "ActiveX data objects").

Excel Versus Database Systems

Differences Between Worksheet Calculational Programs and Genuine Database Systems

The basic idea of a worksheet calculational program like Excel is that all data (thus the entire Excel file) should be loaded into RAM and reside there for instant access. This has the advantage that data processing goes very quickly, but it has the disadvantage that the size of the data set is limited, as much as by the amount of memory available as by the limits on Excel worksheets, which is currently at 65,000 lines.

A completely different concept is followed by genuine database systems. The data generally reside in a database file. In RAM only the smallest possible portion of the data is loaded. Every change in the data is immediately saved (without an explicit command such as File Save being required). The advantage is that in this way you can process data sets larger than 100 megabytes. The drawback is that most operations are carried out more slowly than in Excel.

This rather schematic comparison establishes two significant differences between worksheet calculational programs and database programs: First, worksheet programs require a great deal of RAM even to process small amounts of data, while in the case of genuine database systems the data sets can be much larger than available RAM. Second, a worksheet calculational program can save only an entire file, which in the case of large data sets takes so long that it is seldom carried out. In contrast, in a database system each small change is immediately saved. This happens so quickly that you usually don't even notice it. For this reason (and many others as well) data in real database systems are better protected than in Excel.

Features of Genuine Database Systems That Are Lacking in Excel

  • Excel has no indices: Indices make possible an efficient search in large data sets without having first to sort the data.

  • Excel has no relations: The relational model offers various advantages with respect to efficiency, data security, and avoiding redundancy.

  • Excel lacks security mechanisms against deletion or alteration of data: Data security is the most important criterion for a database after speed of access. Every database program is equipped with a set of security mechanisms that prevent the accidental alteration or deletion of data. In Excel such security mechanisms are almost entirely lacking. It is particularly critical in Excel that it is relatively easy to destroy the unity of data series. By this it is meant that after the careless insertion or deletion of some cells , for example, the entries no longer match: Next to the name of X is now the address of Y .

  • Excel has no report generator: In Excel you can print out large tables, but there is not a sufficiently flexible command to format the printout for a particular purpose. Just try to print labels in Excel sometime!

  • Excel is not designed to manage networked data: An elementary sign of a database program is its ability to work over a network. The data can be centrally managed on one computer, and several users can access it simultaneously . The database program makes sure that no conflicts arise in this quasi-simultaneous data access. Excel can be run over a network, but the management of central, common data that can be coordinated to be processed on several computers is possible only under severe restrictions.

Why Is Excel Nonetheless Used as a Database System?

  • Excel is simple to use. The learning curve with a new database program is steep.

  • Excel is available. Another program would first have to be created.

  • Excel has elementary database functions available and is therefore quite suitable for simple database applications. (The problem is just that over the course of time most small applications grow larger!)

  • In its use as a database system Excel has all of its worksheet functions available. Genuine database systems cannot compete with Excel in this regard.

  • Excel is the standard spreadsheet calculational program in the office sector. There is no comparable standard in the database sector. For this reason Excel files are excellently suited for data exchange among colleagues within a business or between firms.

For all of these reasons there is hardly an Excel user who would shrink from creating small database applications with Excel. Actually, there is almost no other program that so simply and intuitively can be used to create a small address file.

The Consequence: External Data Storage, Data Analysis with Excel

In spite of the limitations of Excel cited above, it is not necessary that every application involving large amounts of data be realized with a database program. Excel offers a number of advantages that database systems and programming languages lack for the development of database clients.

The optimal solution is often to take the best of both worlds : If you have large data sets, then use a proper database system to store the data. But this does not keep you from carrying out various applications such as data entry, data analysis, and printing of charts in Excel or from creating the needed Excel programs for these tasks . Excel offers optimal conditions for such mixed solutions.

  • The auxiliary program MS-Query enables interactive access to almost every database system.

  • The object library ADO enables database access via program code as well. With the possibilities offered by MS-Query, the ADO library can also be used to edit data or to store new data series.

When Should Databases Be Used Directly in Excel?

After so many critical observations you are probably at the point of uncertainty as to whether Excel ever should be used for data management. But that would be taking things a bit too far. Excel is completely suited for managing data, under the following circumstances:

  • The data sets are small (tables up to 1000 rows are more or less unproblematic).

  • The data are simply structured (no necessity for the relational model, minimal redundancy).

  • There is no need for simultaneous processing of data over a network.

There is thus (almost) nothing to be said against setting up a small address database, a literature database, or a simple cash account in Excel. However, it is not a good idea to attempt a full-fledged bookkeeping program or an extensive stock management system. And it would be a matter of gross negligence to manage vital information, such as in the field of medicine, in Excel.




Definitive Guide to Excel VBA
Linux Application Development For The Enterprise (Charles River Media Programming)
ISBN: 1584502533
EAN: 2147483647
Year: 2002
Pages: 134

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net