Lesson 1: Data Access with Visual C | Desktop Applications with Microsoft Visual C++ 6.0 MCSD Training Kit

This lesson will explain the different data access interfaces available to the Visual C++ application developer, and also explain circumstances in which each interface should be used. Because these technologies can connect to relational databases, you should have a basic understanding of relational database theory and Structured Query Language (SQL), the host language for controlling and interacting with an RDBMS. This lesson provides only a cursory introduction to relational database theory and SQL.

After this lesson, you will be able to:

Describe the development of Microsoft's strategy for data access.

Describe the data access interfaces available and determine the appropriate technology to use for a given situation.

Describe the basic architecture of a relational database and how to use simple SQL statements to retrieve data from tables within an RDBMS.

Estimated lesson time: 40 minutes

Data Access Interfaces

The number of data access interfaces available for Microsoft Windows applications can seem overwhelming. Which of the cryptically named technologies—DAO, ODBC, RDO, UDA, OLE DB or ADO—should you use to meet your specific application requirements?

Your decision may be made easier by an explanation of the historical context of these technologies. Microsoft's data access strategy was previously based on Data Access Objects (DAO), for access to desktop databases; and Remote Data Objects (RDO), which use the Open Database Connectivity (ODBC) architecture, for access to client/server databases. This strategy is now being replaced by a single model, Universal Data Access (UDA), which provides access to all types of data.

The goal of Microsoft's UDA strategy is to provide high-performance access to non-relational data sources as well as relational databases through an easy-to-use programming interface that is tool- and language-independent. UDA is implemented through ADO, which provides a high-level interface to OLE DB, Microsoft's newest Component Object Model (COM)-based data access technology.

While you can still use any of the earlier C++ data access technologies, you should always use UDA technology when developing a new application. Generally, you should use ADO for all of your data access because ADO is easy to use, has many powerful features, and performs well. Experienced COM devel-opers can use the OLE DB interfaces directly to gain optimum performance and efficiency benefits. Visual C++ 6.0 provides the OLE DB Templates, a set of templated classes that implement many of the commonly used OLE DB interfaces, to make the OLE DB technology easier to use.

If you are considering migrating an existing DAO/ODBC application to ADO, you will have to consider whether the benefits of ADO justify the cost of the conversion. Code written in DAO or RDO will not directly map to ADO code. However, a solution implemented using other data access technologies will most likely be achievable using ADO. Eventually, you should look towards converting all your code to ADO because it implements a simpler, more flexible object model.

Data Access Objects

DAO, the native programming interface to the Microsoft Jet database engine, was originally designed for use with Microsoft Visual Basic and Microsoft Visual Basic for Applications. DAO uses the Microsoft Jet engine to provide a set of data access objects that encapsulate common database objects such as tables, queries, and recordsets (objects that contain a set of rows returned as a result of a query against a database).

DAO is generally used to access local desktop data sources such as Microsoft Access, Microsoft FoxPro, and Paradox, but it can also be used to access remote data sources.

At the lowest level, DAO exposes its objects through COM interfaces. However, C++ programmers are more likely to access them through the MFC DAO database classes (described in Lesson 2 of this chapter), or through the dbDAO classes. The dbDAO classes provide C++ with the same DAO functionality available through Visual Basic and also use a similar syntax.

Open Database Connectivity

ODBC provides a common application programming interface (API) to access client/server data sources, usually an RDBMS, such as SQL Server or Oracle. ODBC's consistent interface provides maximum interoperability; a single application can access different RDBMS platforms through a common set of code.

This ability enables developers to build and distribute a client/server application without targeting a specific RDBMS or having to know specific details of the different database server platforms that the application may access. All that is needed to connect to a RDBMS is an ODBC driver. These drivers are supplied by RDBMS vendors or third-party developers, and developed according to the ODBC open standard.

Because the capabilities of different RDBMS platforms vary, and because ODBC driver developers might choose to limit the number of features they implement, ODBC defines three levels of driver conformance that provide the application with information about what features are available to the application from the driver:

Core conformance, which all ODBC drivers must meet.

Level 1 conformance, which includes the Core interface conformance level functionality plus additional features, like transactions, that are usually available in a RDBMS.

Level 2 conformance, which includes the Level 1 interface conformance level functionality plus advanced features like the asynchronous execution of ODBC.

For more information about ODBC conformance levels, search for "Interface conformance levels" in the Visual C++ Help file.

You can install and configure ODBC drivers on your computer using the ODBC Data Sources applet in Control Panel. The ODBC Data Sources applet is also used to register a Data Source Name (DSN). A DSN is a uniquely named collection of information used by the ODBC Driver Manager to connect your application to a particular ODBC database. A DSN must be registered on the specific system that will use it. DSNs can be stored in a file (a file DSN) or in the registry (a machine DSN). Machine DSNs are either installed for a particular user (a user DSN), or are accessible to all users of a computer (a system DSN). Figure 7.1 shows the ODBC Data Source Administrator being used to inspect system DSNs.

click to view at full size.

Figure 7.1 The ODBC Data Source Administrator

ODBC is based on SQL as a standard for accessing data. When an application needs to get data from a data source, the application sends an SQL statement to the ODBC Driver Manager, which then loads the ODBC driver required to interface with the data. The driver then translates the SQL sent by the application into the SQL used by the DBMS, and finally sends it to the database server. The DBMS retrieves the data and passes it back to the application via the driver and the Driver Manager.

ODBC provides a cursor library, which provides scrollable cursors for any driver that meets the minimum level of ODBC conformance. You can use cursors to iterate across a set of rows selected from a database.

C++ developers can use the ODBC API to connect to a database, send SQL statements, retrieve results, get errors, disconnect, and so on. The ODBC API is a well-documented way of writing client/server applications but is fairly difficult and involves a lot of code. As a result, object models such as ADO or RDO, or the MFC ODBC database support classes, are more commonly used.

Remote Data Objects

RDO is a thin objectified layer over the ODBC API. RDO depends on the ODBC driver and the database engine for much of its functionality. Data access using RDO is intended to provide access to an ODBC data source through an object model similar to that used by DAO, without additional memory requirements necessitated by supporting a local database. The RDO object model provides additional features such as server-side cursors, disconnected recordsets, and asynchronous processing.

Like DAO, RDO exposes its objects through COM interfaces. RDO provides the Data Source Control, an ActiveX control that encapsulates a database query and the retrieved recordset. The Data Source Control provides controls that allow you to browse through the recordset, displaying the data that it contains in one of Microsoft's data-bound ActiveX controls, such as the DBGrid or DBList controls.

OLE DB

OLE DB is a set of COM interfaces that provides applications with uniform access to data stored in diverse information sources, regardless of location or type. OLE DB is an open specification designed to build on the success of ODBC by providing an open standard for accessing all types of data. Whereas ODBC was created to access relational databases, OLE DB is designed for both relational and non-relational information sources, including but not limited to mainframe, server, and desktop databases; e-mail and file system stores; spreadsheets and project management tools; and custom business objects.

Conceptually, OLE DB has three types of components: data consumers, service components, and data providers, as illustrated in Figure 7.2.

click to view at full size.

Figure 7.2 Components of OLE DB

Data consumers are applications or components that use the data exposed by data providers. Any application that uses ADO is an OLE DB data consumer.

Service components are elements that process or transport data to extend the functionality of data providers. Examples of service components include query processors that generate or optimize queries; or cursor engines that consume data from a sequential, forward-only data source to produce scrollable data.

Data providers are applications such as SQL Server or Microsoft Exchange, or system components such as file systems or document stores, which expose their data to other applications. Data providers expose OLE DB interfaces that data consumers or service components can access directly. An OLE DB provider is available for ODBC; this provider makes the many existing ODBC data sources available to OLE DB data consumers.

ActiveX Data Objects

ADO is designed as a high-performance, easy-to-use application-level interface to OLE DB. ADO is implemented with a small footprint, minimal network traffic, and a minimal number of layers between the application and the data source, all contributing to a lightweight, high-performance interface. ADO exposes a COM Automation interface, which makes it accessible by all leading Rapid Application Development tools, database tools, application development environments, and scripting languages.

Since ADO was designed to combine the best features of, and eventually replace, RDO and DAO, it uses similar conventions with simplified semantics to make it easy to learn. ADO provides the ADO Data Control, an improved version of the RDO Data Source Control.

A feature of ADO, Remote Data Service (RDS) manages the transport of disconnected recordsets to clients over HTTP as well as over Distributed COM (DCOM), making it possible to develop full-featured, data-centric Web applications. ADO will be covered in greater detail in Lesson 3 of this chapter.

Using the ODBC Data Source Administrator

In this practice exercise, you will use the ODBC Data Source Administrator to register a system DSN for the pubs sample database that is included as part of a standard SQL Server installation. You will use this DSN in subsequent exercises to access ODBC data.

To add a data source using the ODBC Administrator

On the Start menu, click Control Panel. Open the ODBC Data Sources applet.

On the System DSN tab, click Add. Select the SQL Server driver from the list of drivers that are displayed, and then click Finish.

NOTE
If the SQL Server driver is not displayed in the list, you will need to install the SQL Server ODBC Driver from your Visual C++ CD-ROM. Select Add/Remove from the installation screen, and install the driver from the Data Access option group.

Type MyDSN as the name of the data source. You can leave the Description box empty.

Select (local) as the SQL Server to connect to, and click Next.

If you have installed the Desktop version of SQL Server, select SQL Server authentication using a login ID and password entered by the user. Enter the login ID and password on this screen, and click Next.

NOTE
Use the default login ID, sa, and leave the password box blank unless you have assigned a password for this account. (On a default installation, the sa account has no password.)

If you have installed the SQL Server Standard version, select Windows NT authentication using the network login ID, and click Next.

Select the check box to make the Change the default database to: option available. Select pubs from the drop-down list, and click Next.

NOTE
If the pubs database does not appear, you will need to install it on your SQL Server. To do this, use the SQL Server Query Analyzer to load and execute the MSSQL7\Install\InstPubs.sql script.

On the following screen, maintain the default options, and click Finish.

Test the data source by clicking Test Data Source. The last line returned should read TESTS COMPLETED SUCCESSFULLY! Click OK to dismiss the SQL Server ODBC Data Source Test dialog box, then click OK again to complete the registration procedure.

You will see that the DSN has been added to the list of system DSNs. Close the ODBC Data Source Administrator, and then close Control Panel.

Relational Database Concepts

While there are database system types other than relational databases, and while ADO is intended to provide access to non-relational data as well as relational data, the relational model is currently dominant. For that reason, before working with a data source, you should have a basic understanding of relational database theory.

A relational database stores and presents data as a collection of tables. (Tables are covered in greater detail later in this section.) A logical structure is defined for this type of database by establishing relationships between tables. The relational database model offers the following benefits in that it:

Organizes data into a collection of tables to make the design easy to understand.

Provides a relationally complete language for data definition, retrieval, and update.

Provides data integrity rules that define consistent database states to improve data reliability.

Elements of a Relational Database

A relational database presents data as a collection of tables. For example, the SQL Server sample database pubs contains business information of the type that might be held by a publishing house. The pubs database contains one table that lists all authors, and another that lists all book titles. Figure 7.3 shows a portion of the authors table.

Figure 7.3 The authors table from the pubs database

Tables are logical groupings of related information made up of rows and columns. A row (sometimes called a record) contains information about a single entry in a table. For example, a row in the authors table would contain information about a single author.

A row is composed of multiple columns (sometimes called fields). Each column contains a single piece of information about the row. For example, a row in the authors table has columns for the author ID, the author's first name, and the author's last name. If you look at the portion of the authors table shown in Figure 7.3, you will see that these columns are named au_id, au_lname and au_fname.

Unlike columns, table rows are not named. To uniquely identify rows within a table, you must define a primary key for the table. A primary key is a column, or a combination of columns, which has a unique value for each row in the table. The au_id column shown in Figure 7.3 is an example of a primary key. Once a column or columns have been defined as the primary key for a table, the data integrity rules built into the RDBMS ensure that rows with duplicate keys cannot be added to the table.

A table can also define foreign keys to specify relationships between tables. A foreign key points to a primary key field in a related table. For example, the pubs database contains the titles table, which lists published titles. The titles table defines one of its columns as a foreign key, containing a primary key value from the publishers table to designate the publisher of a particular title. Figure 7.4 shows this relationship as portrayed in the Visual Studio database diagram tool.

click to view at full size.

Figure 7.4 The relationship between the titles table and the publishers table

The symbols used to qualify the relationship depicted by this diagram indicate that a one-to-many relationship exists between the publishers table and the titles table. In other words, the titles table can contain any number of rows with the same value in the pub_id column; or to put it another way, the titles table can contain many titles from the same publisher. Because the pub_id column in the publishers table is defined as the table's primary key, each row in the publishers table will have a unique value in the pub_id column.

Structured Query Language

Structured Query Language (SQL) is a well-defined, standard language used for querying, updating, and managing relational databases. Approved as a standard by the International Standards Organization (ISO), SQL can retrieve, sort, and filter specific data from a database. In addition, you can change and delete data in a database using SQL statements.

It is important to have a fundamental understanding of SQL so that your applications will communicate effectively with your database. By using SQL, an application can ask the database to perform tasks rather than requiring application code and processing cycles to perform them. More importantly, effective use of SQL can minimize the amount of data that must be read from and written to a remote database server. Effective use of SQL can also minimize the amount of data sent across the network. Minimizing disk and network input/output (I/O) are important factors in improving application performance.

You must be aware that, in different environments, databases have different implementations of the same SQL functionality, both syntactically and semantically. Each implementation of SQL has its own support for different data types, integrity rules, and query optimization.

SQL SELECT Statement

The SQL SELECT statement returns information from the database as a set of selected rows. The SELECT statement is divided into three major sections:

SELECT The SELECT list allows you to specify which columns will be returned from the query.

FROM The FROM clause allows you to specify which tables will be used to get the columns specified in the SELECT list.

WHERE The optional WHERE clause allows you to specify filter criteria to limit the selection of rows. You can filter queries based on multiple columns.

The minimum syntax for a SELECT statement is:

 SELECT columns FROM tables

To perform this operation, the database engine searches the specified table or tables and extracts the chosen columns. You can select all columns in a table by using an asterisk (*). For example, the following SQL statement will return all columns and rows from the authors table:

 SELECT * FROM authors

It might not be efficient to return all data from a table. By adding a WHERE clause to the end of the statement, you can specify that only rows meeting a certain condition are to be returned. The following example will return all columns from all rows in the authors table having a last name equal to Ringer:

 SELECT * FROM authors WHERE au_lname = 'Ringer'

Note the use of apostrophes (') surrounding the name Ringer in this example. Apostrophes are used when the value in the WHERE clause is a string. In this case, au_lname is defined by the database as a string value. When a numeric value is specified in the WHERE clause, apostrophes are not used, as shown in the following example:

 SELECT * FROM titles WHERE royalty = 10

IN Operator

By using the IN operator in a WHERE clause, you can return only those rows in which a WHERE clause parameter is contained in a specified list. For example, you can use the IN operator to return last names and state codes of all authors living in Utah or Tennessee, as illustrated in the following example and in Figure 7.5.

 SELECT au_lname, state FROM authors WHERE state IN ('UT', 'TN')

Figure 7.5 Using the IN operator with a WHERE clause to filter rows

BETWEEN Operator

This operator returns a selection of rows in which the WHERE parameter is between two given criteria. Note that dates are specified in the 'yyyymmdd' string format.

 SELECT title_id, title, pubdate FROM titles WHERE pubdate BETWEEN '19910601' AND '19910630'

LIKE Operator

You can use the LIKE operator to find values in a column that match a pattern you specify. You can specify the complete value, as in LIKE 'Smith', or you can use wildcard characters to find a range of values (for example, LIKE 'Sm%'). In the following example, also illustrated in Figure 7.6, all rows where the author's last name starts with the letter S are returned.

 SELECT au_lname FROM authors WHERE au_lname LIKE 'S%'

click to view at full size.

Figure 7.6 Syntax and results of an SQL statement using the LIKE operator

ORDER BY Clause

By default, rows are returned in the order they were entered in the database. The optional ORDER BY clause will sort a query's resulting rows on a specified column or columns, in ascending or descending order. The ASC option indicates ascending order; the DESC option indicates descending order. The default sort order is ascending (A to Z, 0 to 9). The following example selects all columns from the authors table and sorts them in descending order, by last name:

 SELECT * FROM authors ORDER BY au_lname DESC

Lesson Summary

A number of different interfaces are available to allow your Visual C++ application to access persistent data from an externally managed source. Over the last few years, Microsoft's data access strategy has moved from a model based on the DAO and the ODBC-based RDO, to a newer, single model known as UDA. UDA is based on OLE DB, which is a set of COM interfaces that provide high-performance access to all types of data—both relational databases and nonrelational sources such as e-mail and file-system stores. Although you can use the OLE DB interfaces directly, Microsoft recommends that you use ADO, which is a powerful, easy-to-use, high-level interface to OLE DB data sources.

Visual C++ still supports DAO and RDO, so you can support and maintain earlier applications that use these technologies. However, you should always use UDA technology when creating a new application.

DAO uses the Microsoft Jet database engine to provide a set of data access objects to access local desktop data sources. The DAO object model encapsulates common database objects such as tables, queries, and recordsets. DAO exposes its objects through COM interfaces, but Visual C++ programmers generally use the DAO database classes provided by MFC.

RDO provides access to ODBC data sources through an object model similar to that used by DAO. ODBC has been widely used as a low-level API to client/server data sources. RDO provides many powerful features such as server-side cursors, disconnected recordsets, and asynchronous processing.

Based on SQL as a standard for accessing data, ODBC is an important technology—a large number of ODBC-based applications have been written. ODBC-based applications interface with a database through ODBC drivers, which are supplied by RDBMS vendors or third-party developers, and developed according to the ODBC open standard. You can use the ODBC Data Source Administrator to install and configure ODBC drivers on your computer and register a DSN with the operating system.

Visual C++ developers can use the ODBC API to connect to a database, send SQL statements, retrieve results, get errors, disconnect, and so on. However, it is easier to use one of the object models such as ADO or RDO, or to use the MFC ODBC database support classes.

OLE DB is an open specification, designed to build on the success of ODBC by providing an open standard for accessing all kinds of data. Conceptually, OLE DB has three types of components: data consumers, service components, and data providers.

Data providers are applications that expose their data to other applications. They also expose OLE DB interfaces that service components or data consumers can access directly. The OLE DB provider for ODBC makes the many existing ODBC data sources available to OLE DB data consumers.

Data consumers are applications or components that use the data exposed by data providers. Any application that uses ADO is an OLE DB data consumer.

Service components are elements that process or transport data to extend the functionality of data providers.

ADO is designed as a high-performance, easy-to-use application-level interface to OLE DB. ADO exposes a COM Automation interface, which makes it accessible by a wide variety of development tools and scripting languages.

So that you can use ODBC and ADO efficiently, you should have a basic understanding of relational database theory and SQL, the language that controls and interacts with an RDBMS.

A relational database presents data as a collection of tables made up of rows and columns. A row contains information about a single record in a table. Each column contains a single piece of information about the record.

To uniquely identify rows within a table, you should define a primary key for the table. A primary key is a column or a combination of columns whose value is guaranteed to be unique for each row in the table. You can also define foreign keys to specify relationships between tables. A foreign key "points to" a primary key field in a related table.

Remember that SQL is a standard language used for querying, updating, and managing relational databases. SQL can both retrieve and update data in a database. You should have a fundamental understanding of SQL so your applications will communicate effectively with your database. The proper use of SQL can ensure that the database server, on behalf of your application, executes processing; and also that your data is retrieved efficiently, without generating unnecessary network traffic and disk I/O.

You should be aware that, in different environments, databases have different implementations of the same SQL functionality.

The SQL SELECT statement returns information from the database as a set of records. The SELECT statement is divided into three major sections: SELECT, FROM, and WHERE.

SELECT allows you to specify which columns will be returned from the query.

FROM allows you to specify which tables will be used to get the columns specified in the SELECT section of the SQL statement.

WHERE allows you to specify filter criteria to limit the selection of rows. You can filter queries based on multiple columns. The WHERE clause can be further qualified with IN, BETWEEN, or LIKE operators.

You can also specify an ORDER BY clause to sort a query's resulting rows on a specified column or columns, in ascending or descending order.