Reliving Database History

Databases form the foundation of world commerce and knowledge distribution. Without databases, there would be no World Wide Web, automatic teller machines, credit/debit cards, or online airline reservation systems. Newsgathering organizations, research institutions, universities, and libraries would be unable to categorize and selectively disseminate their vast store of current and historical information. It's difficult to imagine today a world without a network of enormous databases, many of which probably contain a substantial amount of your personal data that you don't want to be easily available to others.

The Early History of Databases

The forerunner of today's databases consisted of stacks of machine-readable punched cards, which Herman Hollerith used to record the 1890 U.S. census. Hollerith formed the Computing-Tabulating-Recording Company, which later became International Business Machines. From 1900 to the mid-1950s, punched cards were the primary form of business data storage and retrieval, and IBM was the primary supplier of equipment to combine and sort (collate) punched cards, and print reports based on punched-card data.

Note

graphics/globe.gif

Jim Gray's article, "Data Management: Past, Present, and Future," which is available as a Microsoft Word document at http://research.microsoft.com/~gray/DB_History.doc, offers a more detailed history of data processing systems. Dr. Gray is a senior researcher and the manager of Microsoft's Bay Area Research Center (BARC).


The development of large computer-maintained databases originally called databanks is a post-World War II phenomenon. Mainframes replaced punched cards with high-capacity magnetic tape drives to store large amounts of data. The first databases were built on the hierarchical and network models, which were well suited to the mainframe computers of the 1950s. Hierarchical databases use parent-child relationships to define data structures, whose diagrams resemble business organization charts or an inverted tree with it's root at the top of the hierarchy. Network databases allow relaxation of the rules of hierarchical data structures by defining additional relationships between data items. Hierarchical and network databases ordinarily are self-contained and aren't easy to link with other external databases over a wide-area network (WAN).

Note

Hierarchical databases remain alive and well in the 21st century. For example, data storage for Windows 2000's Active Directory and Microsoft Exchange Server is derived from the hierarchical version of Access's relational Jet databases. The name Jet comes from the original Access database engine called Joint Engine Technology.

The Internet's Domain Name System (DNS) is a collection of hierarchical databases for translating character-based Internet domain names into numerical Internet Protocol (IP) addresses. The DNS database is called a distributed database, because its data is held by a global network of thousands of computers.


Early databases used batch processing for data entry and retrieval. Keypunch operators typed data from documents, such as incoming orders. At night, other operators collated the day's batch of punched cards, updated the information stored on magnetic tape, and produced reports. Many smaller merchants continue to use batch processing of customer's credit-card purchases, despite the availability of terminals that permit almost instantaneous processing of credit- and debit-card transactions.

The Relational Database Model

graphics/globe.gif

Dr. E. F. Codd, an employee of IBM Corporation, published "A Relational Model of Data for Large Shared Databanks" in a journal of the Association for Computing Machinery (ACM) in June 1970. A partial copy of the paper is available at http://www.acm.org/classics/nov95/. Dr. Codd's specialty was a branch of mathematics called set theory, which includes the concept of relations. He defined a relation as a named set of tuples (records or rows) that have attributes (fields or columns). One of the attributes must contain a unique value to identify each tuple. The common term for relation is a table whose presentation to the user is similar to that of a spreadsheet.

Note

This book uses the terms field and record when referring to tables, and columns and rows when discussing data derived from tables, such as the views and query result sets described later in this chapter.


Relational databases solve a serious problem associated with earlier database types. Hierarchical and network databases define sets of data and explicit links between each data set as parent-child or owner-member, respectively. To extract information from these databases, programmers had to know the structure of the entire database. Complex programs in COBOL or other mainframe computer languages are needed to navigate through the hierarchy or network and extract information into a format understandable by users.

Dr. Codd's objective was to simplify the process of extracting formatted information and make adding or altering data easier by eliminating complex navigational programming. During the 1970s, Dr. Codd and others developed a comparatively simple language, Structured Query Language (SQL), for creating, manipulating, and retrieving relational data. With a few hours of training, ordinary database users could write SQL statements to define simple information needs and bypass the delays inherent in the database programming process. SQL, which was first standardized in 1985, now is the lingua franca of database programming, and all commercial database products support SQL.

Note

The most recent SQL standard, SQL-92, was published by the American National Standards Institute (ANSI) in 1992. Few, if any, commercial relational database management systems (RDBMSs) today fully support the entire SQL-92 standard.

RDBMS competitors have erected an SQL Tower of Babel by adding non-standard extensions to the language. For example, Microsoft's Transact-SQL (T-SQL) for SQL Server, which is the subject of Chapter 21, "Moving from Jet Queries to Transact-SQL," has many proprietary keywords and features. Oracle Corporation's Oracle: SQL and PL/SQL dialects also have proprietary SQL extensions.


Client/Server and Desktop RDBMSs

In the early database era, the most common presentation of data took the form of lengthy reports processed by centralized, high-speed impact printers on fan-folded paper. The next step was to present data to the user on green-screen video terminals, often having small printers attached, which were connected to mainframe databases. As use of personal computers gained momentum, terminal emulator cards enabled PCs to substitute for mainframe terminals. Mainframe-scale relational databases, such as IBM's DB2, began to supplement and later replace hierarchical and network databases, but terminals continued to be the primary means of data entry and retrieval.

Oracle, Ingres, Informix, Sybase, and other software firms developed relational databases for lower-cost minicomputers, most of which ran various flavors of the Unix operating system. Terminals continued to be the primary data entry and display systems for multiuser Unix databases.

The next step was the advent of early PC-based flat-file managers and relational database management systems (RDBMSs). Early flat-file database managers, typified by Jim Button's PCFile for DOS (1981) and Claris FileMaker for Macintosh (1988) and Windows (1992), used a single table to store data and offered few advantages over storing data in a spreadsheet. The first desktop RDBMSs such as dBASE, Clipper, FoxBase, and Paradox ran under DOS and didn't support SQL. These products later became available in multiuser versions, adopted SQL features, and eventually migrated to Windows. Access 1.0, which Microsoft introduced in November 1992, rapidly eclipsed its DOS and Windows competitors by virtue of Access's combination of native SQL support, versatility, and ease of use.

Note

Microsoft celebrated Access's 10th birthday with a "Happy Anniversary Microsoft Access" subsite at http://www.microsoft.com/office/access/10years/default.asp. You can open a Macromedia Flash animated timeline that describes the new features in each of Access's seven prior versions. There's also a link to "war stories" told by long-time Access developers, who Microsoft calls "Access Heroes."


PC-based desktop RDBMSs are classified as shared-file systems, because they store their data in conventional files that multiple users can share on a network. One of Access's initial attractions for users and developers was its capability to store all application objects forms, reports, and programming code and tables for a database application in a single .mdb file. FoxPro, dBASE, Clipper, and Paradox require a multitude of individual files to store application and data objects. Today, almost all multiuser Access applications are divided into a front-end .mdb file, which contains the application objects and links to a back-end database that holds the data. Each user has a copy of the front-end .mdb and shares connections to a single back-end .mdb on a peer Windows XP/2000 workstation, or Windows 2000+ server.

Client/server RDBMSs have an architecture similar to Access's front-end/back-end shared-file multiuser configuration. What differentiates client/server from shared-file architecture is that the RDBMS on the server handles most of the data processing activity. The client front end provides a graphical user interface (GUI) for data entry, display, and reporting. Only SQL statements and the specific data requested by the user pass over the network. Client/server databases traditionally run on network operating systems, such as Windows 2000+ or Unix, and are much more robust than shared-file databases, especially for applications in which many users make simultaneous additions, changes, and deletions to the database. All commercial data-driven Web applications use client/server databases.

Since version 1.0, Access has had the capability to connect to client/server databases by linking their tables to a Jet database. Linking lets you treat client/server tables almost as if they were native Jet tables. Linking uses Microsoft's widely accepted Open Database Connectivity (ODBC) standard, and Access 2003 includes an ODBC driver for SQL Server and Oracle databases. You can purchase licenses for ODBC drivers that support other Unix RDBMSs, such as Sybase or Informix, from the database supplier or third parties. Chapter 19, "Linking Access Front-Ends to Jet and Client/Server Tables," describes the process of linking Jet and SQL Server databases. Although Chapter 19 uses SQL Server 2000 for its examples, the linking procedure is the same or at least similar for other client/server RDBMSs.

Note

Prior to Access 2000, Jet was Access's standard database engine, so the terms Access database and Jet database were interchangeable. Microsoft now considers SQL Server to be its strategic RDBMS, which means that SQL Server gets continuing development funds and Jet doesn't. Jet 4.0, which is included with Access 2003, is the final version and is headed toward retirement. To reflect this change, this edition uses the terms Jet database and SQL Server database. Unless otherwise noted, SQL Server refers to all versions except the Windows CE edition.


Access data projects (ADP) and the Microsoft SQL Server 2000 Desktop Engine (MSDE) combine to make Access 2003 a versatile tool for designing and testing client/server databases, and creating advanced data entry and reporting applications. You can start with a conventional Jet database and later use Access's Upsizing Wizard to convert the .mdb file(s) to an .adp file to hold application objects and an SQL Server 2000 back-end database. Access 2003's Upsizing Wizard incorporated many improvements to the Access 2000 and earlier versions, but Access 2003's Wizard is the same as 2002's. Despite the upgraded wizardry, you're likely to need to make changes to queries to accommodate differences between Jet 4.0 and SQL Server 2000.

For an example of differences between Jet and SQL Server SQL syntax that affects the upsizing process, see "Displaying Data with Queries and Views," p. 142.


The ability to save ADP forms and, especially, reports to data access pages (DAP) greatly simplifies delivering up-to-date information over corporate intranets. ADP rely on SQL Server and don't directly support Oracle or other client/server databases. Fortunately, you can link SQL Server to other client/server databases. Instead of ODBC drivers, SQL Server requires OLE DB drivers to link to other databases. Section VI, "Publishing Data to Intranets and the Internet," shows you how to take advantage of client/server ADP and DAP.



Special Edition Using Microsoft Office Access 2003
Special Edition Using Microsoft Office Access 2003
ISBN: 0789729520
EAN: 2147483647
Year: 2005
Pages: 417

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net