After completing application design, developers should have access to conceptual design documents that contain a sufficient model of needed data objects. Additionally, the project team will presumably have defined several specific data and data connectivity requirements. With these requirements, the team should be able to approximate how these objects will be grouped into components. As developers attempt to implement these components, they should keep the following key design points regarding data objects in mind:
Every data store provides a native access method. Each database vendor provides a vendor-specific API to ease database access. As mentioned, non-DBMS data can be accessed through data-specific APIs, such as the Windows NT Directory Service API or MAPI for accessing mail data, or via file system APIs. By using the native access method for each data store, developers can use each store's full power. However, this technique requires developers to learn how to use each access method. Furthermore, developers must understand the API functions and their most efficient uses, and also learn how to use the diagnostic and configuration tools associated with each data store. The project team should be aware that the cost of training developers on all the data access APIs used within an organization can be high.
Distributed application developers face two fundamental technical challenges related to data access. First, developers rarely have the opportunity to start from scratch with regard to data access. Most applications need to access existing data, which can be stored in a variety of formats. For instance, some of this information could be stored in DBMSs; other data could be stored in less structured forms. Many DBMSs are on the market, including mainframe databases such as Information Management System (IMS) and DB2, server databases such as Oracle and SQL Server, and desktop databases such as Microsoft Access and Paradox. Multiple DBMSs may be in use with a single application. Non-DBMS data can be stored in text files, indexed sequential access method (ISAM) files, spreadsheets, e-mail messages, or any application-specific file. Somehow, distributed application developers must integrate such disparate data sources into a unified view for application users.
A second challenge related to data access is that distributed applications naturally involve accessing remote data sources. In most cases, users running applications that examine data work on computers other than those on which the data is stored. Therefore, efficient mechanisms for accessing remote data must be in place to minimize the network traffic generated by applications. This point becomes increasingly critical as applications scale to thousands or millions of users, many connected through relatively expensive network bandwidth over wide area network (WAN) connections or throughput-constrained modem connections.
The Microsoft Data Access Components (MDAC) that address these challenges are ODBC, OLE DB, ADO, and RDS, each discussed in the sections that follow.
Instead of using native data access methods, developers can choose to use a generic, vendor-neutral API such as the Microsoft ODBC interface. ODBC is a C programming language interface for accessing data in a DBMS using SQL. An ODBC driver manager provides the programming interface and run-time components developers need to locate DBMS-specific drivers. The DBMS vendor typically supplies ODBC drivers. These drivers translate generic calls from the ODBC driver manager into calls to the native data access method.
The primary advantage of using ODBC is that developers need to learn only one API to access a wide range of DBMSs. Applications can access data from multiple DBMSs simultaneously. In fact, application developers do not need to target a specific DBMS—the exact DBMS to be used can be decided upon application deployment.
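The same vendor-neutral idea survives in modern database interfaces such as Python's DB-API, which makes for a compact illustration of the principle. In this sketch, application code depends only on the generic interface shape, while the driver (here, the standard-library sqlite3 module standing in for any conforming driver) supplies the engine-specific behavior; the table name and data are invented for illustration:

```python
import sqlite3

def fetch_names(connect):
    """Vendor-neutral data access: this function depends only on the
    generic connect -> cursor -> execute -> fetch shape, not on any
    particular database engine."""
    conn = connect()
    try:
        cur = conn.cursor()
        cur.execute("SELECT name FROM products ORDER BY name")
        return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()

def make_demo_db():
    # Stand-in for any conforming driver's connect function.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT)")
    conn.executemany("INSERT INTO products VALUES (?)",
                     [("widget",), ("gadget",)])
    return conn

print(fetch_names(make_demo_db))  # ['gadget', 'widget']
```

As with ODBC, the exact engine behind `connect` can be decided at deployment time without touching `fetch_names`.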
Unfortunately, there are several drawbacks to the ODBC approach. First, an ODBC driver must be present for every data store developers want to access. These drivers must support SQL queries, even if the database does not use SQL for its native query language. Second, the ODBC API treats all data in the form of relational tables. Both constraints can cause problems for unstructured and non-relational data stores. Finally, the ODBC API is a standard, controlled by a committee. In other words, regardless of the underlying DBMS's capabilities, the ODBC driver can expose only functionality that is part of the standard. Modifying the API is a complex process. The committee must agree to the proposed change, specify how ODBC drivers should implement the proposed function(s), and specify how applications or the driver manager can detect whether a given driver supports the new specification. Drivers must be updated, and applications must ensure that new drivers are installed or that applications are written carefully against older drivers.
In practice, ODBC is a widely used mechanism for database access and an appropriate solution for applications that work only with traditional relational databases. Most major DBMS vendors support ODBC. However, as applications move beyond relational DBMSs (RDBMS), a more comprehensive solution is needed.
OLE DB specifies a set of COM interfaces for data management. These interfaces are defined so that data providers can implement different support levels, based on underlying data store capabilities. Because OLE DB is COM-based, it can easily be extended; resulting extensions are implemented as new interfaces. Clients can use the standard COM QueryInterface method to determine whether specific features are supported either on a particular computer, or by a particular data store. This capability improves upon function-based APIs defined by ODBC.
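The capability-probing idea behind QueryInterface can be sketched in any object-oriented language: a consumer asks an object whether it supports an optional interface and degrades gracefully if it doesn't. The class and interface names below are invented for illustration, not actual OLE DB interfaces:

```python
# A sketch of interface-style capability discovery, loosely modeling
# COM QueryInterface: a consumer probes a provider for optional
# capabilities instead of assuming a fixed function set.

class IRowsetScroll:
    """Optional capability: direct positioning within a rowset."""
    def seek(self, index): ...

class SimpleProvider:
    """Supports only sequential access."""
    def __init__(self, rows):
        self.rows = rows

class ScrollableProvider(SimpleProvider, IRowsetScroll):
    """Also supports direct positioning."""
    def seek(self, index):
        return self.rows[index]

def query_interface(obj, interface):
    """Return the object if it implements the interface, else None,
    mirroring QueryInterface's succeed-or-fail contract."""
    return obj if isinstance(obj, interface) else None

provider = ScrollableProvider(["a", "b", "c"])
scroll = query_interface(provider, IRowsetScroll)
print(scroll.seek(2) if scroll else "sequential access only")  # c
```

Because new capabilities arrive as new interfaces, old consumers keep working against new providers, and new consumers can detect what each provider offers at run time.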
Figure 9.3 shows the high-level OLE DB architecture, which consists of three major elements: data consumers, service components, and data providers.
Figure 9.3 OLE DB architecture
Data consumers are COM components that access data using OLE DB providers. OLE DB service providers are COM components that encapsulate a specialized data management function, such as query processing, cursor management, or transaction management. OLE DB is designed so that these service components can be implemented independently from data providers, delivered as stand-alone products, and plugged in as needed. For example, simple data providers may provide only a way to retrieve (but not query, sort, or filter) information from their data sources.
A service component may implement SQL query processing for any data provider. If a consumer wanted to perform a SQL query on data from a simple data provider, the service component could then execute the query.
OLE DB data providers are COM components responsible for providing information from data stores to the outside world. Through these providers, all data is exposed in the form of virtual tables, or rowsets. Internally, providers call an underlying data store using its native data access method or a generic API, such as ODBC.
The OLE DB object model consists of seven core components, shown in Figure 9.4. These objects are implemented by data providers or service components and used by data consumers.
Figure 9.4 OLE DB object model
In the OLE DB object model, an Enumerator object locates a data source. Data consumers that aren't tied to a specific data source use an Enumerator object to retrieve a list of names of available data sources and subordinate Enumerator objects. For example, in a file system, each file could correspond to a data source and each subdirectory could correspond to a subordinate Enumerator object. The data consumer searches the list of names for a data source to use, moving through the subordinate Enumerator objects as necessary. Once a data source is selected by name, a Data Source object can be created.
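The file-system analogy above can be sketched as a recursive walk: directories act as subordinate enumerators and files as data sources. The nested-dict structure and names here are invented for illustration:

```python
# A sketch of the Enumerator pattern: a consumer walks a hierarchy of
# enumerators (directories) to discover data sources (files) by name.

def enumerate_sources(tree, prefix=""):
    """Yield fully qualified names of all data sources, descending
    into subordinate enumerators (subdirectories) as needed."""
    for name, node in sorted(tree.items()):
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(node, dict):      # subordinate enumerator
            yield from enumerate_sources(node, path)
        else:                           # leaf data source
            yield path

fs = {"sales": {"q1.csv": "data", "q2.csv": "data"}, "readme.txt": "data"}
print(list(enumerate_sources(fs)))
# ['readme.txt', 'sales/q1.csv', 'sales/q2.csv']
```

Once the consumer has picked a name from this list, the next step in OLE DB terms is creating the corresponding Data Source object.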
A Data Source object knows how to connect to a data store type, such as a file or a DBMS. Each OLE DB provider implements a Data Source component class with a unique CLSID. A data consumer can either create a specific Data Source directly, by calling CoCreateInstance, using the Data Source's CLSID, or it can use an Enumerator object to search for a data source to use. Although each Data Source class has a unique CLSID, all classes are required to expose a certain set of OLE DB interfaces. This requirement is important, as a consumer can standardize the use of any available data source. Data consumers specify the name of the data source to which they want to connect, as well as provide any authentication information, through the Data Source object. Once a Data Source object is created, it can reveal the underlying data provider's capabilities.
Developers create Session objects using a Data Source object. A Session represents a particular connection to a data source. A Session object's primary function is to define transaction boundaries. Session objects are also responsible for creating Command and Rowset objects, which are the primary objects used to access data through OLE DB. A Data Source object can be associated with multiple Session objects.
If an OLE DB provider supports queries, it must implement Command objects. Command objects are generated by Session objects, and are responsible for preparing and executing text commands. Multiple Command objects can be associated with a single Session object. Developers should keep in mind that OLE DB doesn't distinguish among command languages used; Command objects must simply understand commands and translate them into calls to the underlying data provider when the commands are executed.
Commands that return data create Rowset objects. Rowset objects can also be created directly by Session objects. A Rowset simply represents tabular data. Rowsets are used extensively by OLE DB. All Rowsets are required to implement a core set of OLE DB interfaces. These interfaces allow consumers to sequentially traverse the rows in the Rowset, get information about Rowset columns, bind Rowset columns to data variables, and get information about the Rowset as a whole. Implementing additional OLE DB interfaces supports additional features, such as updating the Rowset object or accessing specific rows directly.
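The core Rowset contract described above (sequential traversal plus column metadata) can be sketched in a few lines. This is an illustrative simulation, not the OLE DB interface signatures:

```python
# A minimal rowset sketch reflecting the core contract: get column
# information and traverse rows sequentially; richer behavior such as
# updating or direct positioning would be layered on as extra methods.

class Rowset:
    def __init__(self, columns, rows):
        self._columns = list(columns)
        self._rows = list(rows)
        self._pos = 0

    def get_column_info(self):
        return list(self._columns)

    def get_next_row(self):
        """Sequential traversal; returns None past the last row."""
        if self._pos >= len(self._rows):
            return None
        row = dict(zip(self._columns, self._rows[self._pos]))
        self._pos += 1
        return row

rs = Rowset(["id", "name"], [(1, "ledger"), (2, "journal")])
while (row := rs.get_next_row()) is not None:
    print(row["name"])  # prints "ledger" then "journal"
```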
The OLE DB object model also includes an Error object. Any other OLE DB object can create Error objects, which contain rich error information that cannot be conveyed through the simple HRESULT returned by COM methods. OLE DB Error objects build on IErrorInfo, the standard error-handling mechanism defined by Automation. OLE DB extends this mechanism to permit a single call to return multiple error records, and to permit providers to return provider-specific error messages.
The OLE DB object model provides a powerful, flexible mechanism for consumers to access any data type uniformly. OLE DB defines a rich, component-based model that lets data providers implement as much functionality as the providers are able to support, including sequential access, simple rowsets, and full DBMS functionality. This model gives developers the option of writing generic data access components that use only the most basic functionality. Using this model, developers may also write components (optimized for a specific DBMS) that use a single programming model.
The OLE DB object model exposes functionality through COM interfaces that are not Automation compatible, which means that OLE DB cannot be used directly from many programming languages and tools, such as Microsoft Active Server Pages. Thus, UDA also defines the application-level programming interface ADO. All ADO interfaces are dual interfaces, so any COM-aware programming language or tool can use them. ADO is the recommended way to access data stores in multi-service layer applications, also commonly referred to as Microsoft Windows DNA applications.
The ADO object model is shown in Figure 9.5. ADO is built on top of OLE DB, so developers will see many similarities to the OLE DB object model. ADO was also designed to be familiar to developers who have used earlier Microsoft data access object models such as DAO and RDO. Unlike DAO and RDO, the ADO object model is not hierarchical. Except for Error and Field objects, all objects can be created independently, making it easy to reuse objects in different contexts. Creating objects independently also allows for several ways to accomplish a particular programming task.
Figure 9.5 ADO object model
Microsoft has introduced several object models for data access over the years. DAO consists of Automation objects for accessing the Microsoft Jet database engine used by Access as well as ISAM and ODBC databases. RDO consists of Automation objects for accessing relational ODBC data sources. These technologies are supported "in-the-box" by Visual Basic 5.0. However, while these technologies will continue to be supported, future development efforts should focus on ADO.
The Connection object represents a unique session to a data store. The Connection object is essentially a combination of the Data Source and Session objects in OLE DB. Connection objects expose an Execute method that lets developers perform simple tasks with a minimal amount of effort. Alternatively, Connection objects can be attached to Command and Recordset objects, which also offer methods to access data from the data store.
Command objects in ADO are equivalent to OLE DB Command objects—both provide a means to prepare and execute parameterized commands against the data source. Preparing a command lets developers save a processed form of the command that can be executed quickly. A Command object has a Parameters collection, which contains one or more Parameter objects, each representing a command-specific parameter. ADO Command objects are available only when the underlying OLE DB provider implements OLE DB Command objects.
Recordset objects are the basis of ADO. Like OLE DB Rowset objects, they represent tabular data from a data source. Connection and Command methods that return data from the data store return read-only Recordset objects that can be accessed sequentially. More flexible Recordset objects can be created directly by programmers, connected to Connection and (optionally) Command objects, and populated by calling various Recordset methods. Recordset objects support a variety of options for controlling the amount of data retrieved from a data source at a given time. Other available options control the type and duration of locks placed on the underlying data source, and also specify when updates are applied to the data store.
Although a Recordset object refers to the overall construct of a set of rows and columns, at any given time a Recordset refers to the set of columns associated with a specific row, known as the current row. Individual columns of the Recordset are accessed through its Fields collection, with one Field object for every column. Associated with every Recordset object is a cursor. In database terms, a cursor is the software that returns rows of data to an application. The cursor in a Recordset object indicates the current position in the Recordset and determines what row will be returned next. ADO supports several cursor types, ranging from simple forward-only cursors, to cursors that let developers move to any row, to cursors that let developers view changes made by other users as they move through the Recordset object.
The ADO object model contains the seven objects listed in Table 9.6.
Table 9.6 Objects contained in the ADO object model
| Object | Description |
|---|---|
| Connection | Manages a connection to a data source. |
| Command | Defines a specific command to execute against a data source. |
| Recordset | Represents a set of records from a data source or the results of an executed command. |
| Field | Represents a column of data with a common data type. A Recordset object has a Fields collection, with one Field object per column in the Recordset. |
| Parameter | Represents a parameter associated with a Command object based on a parameterized query or stored procedure. A Command object has a Parameters collection, with one Parameter object per command parameter. |
| Property | Represents a dynamic characteristic of an ADO object defined by the OLE DB provider. Connection, Command, Recordset, and Field objects have Properties collections, with one Property object per dynamically defined characteristic. |
| Error | Contains details about data access errors for a single operation. A Connection object has an Errors collection, with one Error object per OLE DB provider error. |
Developers use a Connection object to set up a connection with a data source. When the object is used with an ODBC data source, developers establish a connection by passing either a data source name (DSN), user ID, and password or a DSN filename to the Connection object's Open method.
Data objects should generally access data sources using a fixed identity, rather than using the client's identity. This technique greatly simplifies administration and makes it possible to efficiently pool database connections across multiple client requests. If developers need to restrict access to a database, they can restrict access to business objects with which clients interact, or to the data objects themselves.
The most straightforward and flexible way to specify parameters for the Connection object's Open method is to include a DSN filename in development source code and then specify the data source, user ID, and password in the DSN file. This technique lets a system administrator modify the data source or account access information without requiring source code changes to components.
The Connection object specifies the type of database access the developer wants. Developers use the Mode property to indicate a read-only, write-only, or read/write connection, as well as the type of sharing to permit, protecting the database from conflicting access. Developers must set the Mode property before opening a connection.
Typically, developers open a connection immediately before accessing the database, and then close that connection as soon as possible, rather than retaining a connection for an object's lifetime. This approach is acceptable despite the expense of creating database connections, because the ODBC 3.0 driver manager provides connection pooling services. For each connection request, the driver manager first examines the pool for an acceptable unused connection. If it finds one, the driver manager returns it; otherwise, the manager creates a new connection. If a pooled connection remains idle for a specified period (by default, 60 seconds), the driver manager disconnects it from the database and removes it from the pool. Currently, only free memory and the number of database connections available limit the ODBC connection pool size. The only way to control the pool size is to set the ODBC pooling time-out value based on an estimated connection rate.
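The pooling algorithm just described (reuse an idle connection if one exists, create one otherwise, and evict connections idle longer than the timeout) can be sketched as follows. This is a simplified simulation; a real driver manager also matches connections on attributes such as user identity, which is omitted here:

```python
import time

class ConnectionPool:
    """Sketch of an idle-timeout connection pool. The `now` parameter
    is injected so the timeout behavior can be tested with a fake clock."""

    def __init__(self, idle_timeout=60.0, now=time.monotonic):
        self._idle = []                 # list of (connection, time released)
        self._timeout = idle_timeout
        self._now = now
        self.created = 0                # how many real connections were opened

    def acquire(self):
        self._evict_stale()
        if self._idle:
            return self._idle.pop()[0]  # reuse a pooled connection
        self.created += 1
        return object()                 # stand-in for opening a real connection

    def release(self, conn):
        self._idle.append((conn, self._now()))

    def _evict_stale(self):
        cutoff = self._now() - self._timeout
        self._idle = [(c, t) for c, t in self._idle if t >= cutoff]

pool = ConnectionPool()
c1 = pool.acquire()
pool.release(c1)
c2 = pool.acquire()
print(c2 is c1, pool.created)  # True 1  (the connection was reused)
```

This is why the open-late, close-early pattern scales: the expensive connect happens once, and many short-lived acquire/release cycles share it.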
The driver manager does not reuse a connection that was established using a different user identity. Developers thus should connect to databases using a fixed identity within their data objects. If developers use the client's identity, every unique client will require a unique database connection, eliminating a key scalability benefit of the three-service layer architecture and Microsoft Transaction Server (MTS). Developers also cannot reuse connections across process boundaries. As we discussed in Chapter 8, components that access the same data sources should run within the same process so that connections can be reused.
If developers examine the Connection object methods in the ADO documentation, they will find the BeginTrans, CommitTrans, and RollbackTrans methods, which relate to transaction processing. Components running in the MTS environment should never use these three methods. Instead, developers should let MTS manage transactions through the ObjectContext and use the ObjectContext SetComplete and SetAbort methods to indicate the transaction outcome.
Using ADO, data can be accessed using three elements: the Connection Execute method as well as the Command and Recordset objects. Developers can use the Connection Execute method to execute a specified command against a data source. When used with an ODBC data source, commands can be SQL statements or nonparameterized stored procedures. Any results are returned as a Recordset object with a read-only, forward-only cursor. We'll discuss cursors in more detail in the "Recordset Objects" section below.
Stored procedures can provide a great performance boost, especially for complex data access operations. However, developers should use stored procedures only for data access. Business logic should be implemented in business objects.
Developers can use Command objects to execute parameterized stored procedures and commands or to save a compiled version of a command that will be executed multiple times. Developers also establish a connection to a data source by setting the Command ActiveConnection property. Developers should specify such a command using the CommandText property, and execute this command using the Execute method. Any results are returned as a Recordset with a read-only, forward-only cursor. If developers have a parameterized command, they should specify such parameters in the Command object's Parameters collection. To compile a command for speedy reuse, developers set the Prepared property.
Finally, developers can manipulate data directly using Recordset objects. Creating a Recordset object and using its methods directly is the most flexible way to manipulate data.
When developers use ADO, they will almost always manipulate data exclusively through Recordset objects. Developers either receive a Recordset as the return value of a Connection or Command Execute call, or they create their own Recordset objects.
A Recordset object comprises a set of rows and columns. As mentioned, at any given time, a Recordset refers to the set of columns associated with the current row. The Recordset's individual columns can be accessed through the Fields collection. Developers move through the rows of a Recordset by using the object's associated cursor.
Recordset objects are tremendously useful for three-service layer applications. Because state on a server should not be shared across method calls, middle-layer objects should connect to a database, retrieve data, disconnect, and return all data to the caller. Disconnected Recordset objects make this possible through ADO.
Recordset objects also support a variety of lock types. Whenever records in a Recordset are being updated, a lock must be put on those records. The following LockType property values specify what types of locks are placed on records during editing:
- adLockReadOnly The default; records are read-only and cannot be changed.
- adLockPessimistic The provider locks records at the data source as soon as editing begins.
- adLockOptimistic The provider locks records only when the Update method is called.
- adLockBatchOptimistic Optimistic locking for batch updates; required for disconnected Recordset objects.
The Recordset object provides a set of methods for moving through its rows. The MoveNext and MovePrevious methods move forward and backward through the Recordset, one record at a time. Developers can use the BOF and EOF properties to detect when they have reached the beginning or end of a Recordset object. For a Recordset object that supports dynamic positioning, MoveFirst and MoveLast are available. For a Recordset object that supports bookmarks, developers can use the Bookmark property to return a unique identifier for a current record in the Recordset object. At a later time, developers can set the Bookmark property to return to that particular record. Recordset objects also provide methods to move to specific records by ordinal number.
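The navigation model above can be sketched as a small simulation. The member names mirror the ADO properties and methods for readability, but this is an illustrative stand-in, not the COM object:

```python
# A sketch of Recordset navigation: MoveNext/MovePrevious, BOF/EOF
# flags, and a Bookmark that returns to a previously saved row.

class RecordsetNav:
    def __init__(self, rows):
        self._rows = rows
        self._pos = 0 if rows else -1

    @property
    def BOF(self):                      # before the first record
        return self._pos < 0

    @property
    def EOF(self):                      # past the last record
        return self._pos >= len(self._rows)

    def MoveNext(self): self._pos += 1
    def MovePrevious(self): self._pos -= 1
    def MoveFirst(self): self._pos = 0
    def MoveLast(self): self._pos = len(self._rows) - 1

    @property
    def Bookmark(self):                 # opaque identifier for the current row
        return self._pos

    @Bookmark.setter
    def Bookmark(self, mark):           # return to a previously saved row
        self._pos = mark

    @property
    def current(self):
        return self._rows[self._pos]

rs = RecordsetNav(["alpha", "beta", "gamma"])
rs.MoveNext()
mark = rs.Bookmark      # remember "beta"
rs.MoveLast()
rs.Bookmark = mark      # jump back
print(rs.current)       # beta
```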
The most common way to populate a Recordset object that developers create is to attach it to a Connection object using the ActiveConnection property, then by calling the Recordset Open method. Developers can also populate a Recordset object programmatically, if the data does not derive from an OLE DB data source.
ADO supports the following four cursor types:
- Forward-only (adOpenForwardOnly) The default; supports only forward movement through the records.
- Static (adOpenStatic) A static copy of the records; additions, changes, and deletions by other users are not visible.
- Keyset (adOpenKeyset) Changes by other users are visible, but records that other users add are not, and records they delete are inaccessible.
- Dynamic (adOpenDynamic) Additions, changes, and deletions by other users are all visible.
The Recordset object features available to developers depend on the cursor type specified when the Recordset object is opened.
Not every OLE DB provider supports every cursor type. When developers use the OLE DB provider for ODBC, the cursor types available to developers depend on types supported by the underlying ODBC driver for a database. The SQL Server ODBC driver supports all four cursor types.
Disconnected Recordset objects use optimistic locking; developers can manipulate these objects on the client using a client-side cursor library. (In this context, "client" refers to either the presentation or business layer.) Disconnected batch updates are supported through the UpdateBatch method. Using the UpdateBatch method requires extreme care, because another client may update records while they are being modified on a different client. When the batch update is applied, developers will receive errors for the conflicting updates. In this case, developers need to define, as part of their component interfaces, how partial updates will be handled: will they generate transaction failures, or will the client need to handle the error?
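The conflict check behind a batch update can be sketched as follows: each cached row carries the original value read from the store, and a change is applied only if the store still holds that original value. Keys and values here are invented for illustration:

```python
# A sketch of optimistic batch updating: apply each change only if the
# stored row still matches the value originally cached; otherwise
# report the row as a conflict for the caller to handle.

def update_batch(store, changes):
    """store: {key: value}.  changes: {key: (original_value, new_value)}.
    Returns the list of keys whose rows conflicted."""
    conflicts = []
    for key, (original, new) in changes.items():
        if store.get(key) != original:  # row changed since it was cached
            conflicts.append(key)
        else:
            store[key] = new
    return conflicts

store = {"row1": "a", "row2": "b"}
changes = {"row1": ("a", "A"),        # still matches: applied
           "row2": ("stale", "B")}    # changed underneath: conflict
print(update_batch(store, changes))   # ['row2']
```

A component interface must then decide whether such a partial result aborts the whole transaction or is surfaced to the client for resolution.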
To create a disconnected Recordset object, before opening the connection, developers should set the CursorLocation property on either the Connection or the Recordset object to adUseClient. The developers should then retrieve the data and release the ActiveConnection. If developers want to allow the client to modify the information, they should create the Recordset object using batch optimistic locking (adLockBatchOptimistic) and either a static (adOpenStatic) or a keyset (adOpenKeyset) cursor.
As mentioned, columns of the current row in a Recordset object are accessed using the Fields collection. Developers can access a field by its name or by a numeric index. When developers use an ODBC data source, the field name corresponds to its name in the SQL SELECT statement, and the numeric index is determined by the field's position in that statement. Once developers have a Field object, they can get or set information about it through its properties. The most commonly used property is Value, which retrieves or sets the field's data value.
Two special methods on the Field object—GetChunk and AppendChunk—are available to handle long binary or character data. Developers can use GetChunk to retrieve a data portion and AppendChunk to write a data portion. Developers can determine whether they need to use these methods by examining the Field object's Attributes property.
Any ADO operation can generate errors, so it's important to handle errors within method calls. ADO supports two error-reporting mechanisms: each method call returns an error code, and ADO supports the standard COM error-reporting mechanism, IErrorInfo. Specific OLE DB provider errors, such as native database error codes or ODBC error codes, are stored in the Errors collection associated with a Connection object. One ADO call can generate multiple errors in the Errors collection. Developers should walk the Errors collection to retrieve detailed error information about database failures.
ADO clears the ErrorInfo object before it makes a call that could potentially generate errors. However, the Errors collection is cleared and repopulated only when the OLE DB provider generates a new error, or when the Clear method is called. Some methods and properties can generate warning messages in the Errors collection without halting program execution. Before calling these methods or properties, developers should clear the Errors collection so that they can read the Count property to determine whether any warnings were generated. Methods that can generate warnings include Recordset Resync, UpdateBatch, and CancelBatch. The Recordset Filter property can also generate warnings.
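The clear-then-inspect pattern described above can be sketched with a simulated Errors collection. The member names mirror ADO for readability, and the warning text and the `resync` stand-in are invented for illustration:

```python
# A sketch of walking an Errors collection: clear it before a call
# that may append warnings without raising, then inspect Count and
# iterate the records afterward.

class ErrorsCollection:
    def __init__(self):
        self._items = []

    def Clear(self):
        self._items.clear()

    def append(self, number, description):
        self._items.append((number, description))

    @property
    def Count(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

def resync(errors):
    """Stand-in for a method like Recordset Resync that can leave
    warnings in the collection without halting execution."""
    errors.append(0, "Warning: row no longer exists in the data store")

errors = ErrorsCollection()
errors.Clear()            # clear before the call, as recommended above
resync(errors)
if errors.Count:          # warnings were generated
    for number, description in errors:
        print(number, description)
```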
Although ADO recordsets offer access to only one row of data at a time, recordsets do not necessarily access the underlying data store every time the cursor moves. Internally, a Recordset can cache multiple rows of data. This caching capability is an important part of building scalable distributed applications.
We'll consider a scenario in which thousands of users access an online store over the Internet. The catalog of items for sale is most likely maintained in a database. If each user maintains a unique connection to the database for the entire time the user browses the store, the number of database connections available on the database server strictly limits the number of simultaneous shoppers. On the other hand, if database connections are used only while blocks of data are read from the database, a single connection can support many users. In addition, if a large block of data can be sent back to each user's computer, the user can browse a catalog with fewer database server accesses. The latter scenario not only can reduce network traffic; it could also make the online store more responsive for the user.
The MDAC technologies that help make this scenario a reality are ADO disconnected Recordset objects and RDS. The RDS architecture is shown in Figure 9.6.
Figure 9.6 RDS architecture
A disconnected Recordset object has simply been dissociated from its Connection object. Disconnected Recordset objects do not retain locks on the underlying data store. Instead, all data for all rows is cached within the Recordset object. If a Recordset is modified and the changes are saved back to the data store, OLE DB checks each modified row for conflicting updates in the data store. In other words, OLE DB checks whether the row has been modified since the Recordset object was cached. If so, a conflict is reported to the application so that the application can handle the conflict properly.
RDS provides a client-side cursor engine for disconnected recordsets. It also provides a very efficient service for marshaling recordsets between computers, over either the Internet or an intranet. This means that a server application can generate a set of data and copy it to a client application, which can browse the data as if it were connected to the actual data store. RDS also provides a way to bind Recordset objects to data-bound controls, which can greatly simplify writing this type of client application.
RDS provides three components to help developers write applications: RDS.DataControl, RDSServer.DataFactory, and RDS.DataSpace. An RDS.DataControl object is used to bind data-bound ActiveX controls to Recordset objects. Client applications use RDS.DataControl to browse or modify a Recordset. RDSServer.DataFactory is essentially a generic business object for communicating with data sources. It contains no business rules or other application-specific logic.
The Recordset object itself is obtained in one of two ways. First, the Recordset object can be created implicitly by RDS.DataControl. Properties of RDS.DataControl are set that identify the data server and query to be used. When the Refresh method is called, an RDSServer.DataFactory object is used behind the scenes to create the Recordset object. Second, custom business objects that return disconnected recordsets can be defined. RDS.DataSpace objects are used to create client-side proxies to these business objects. The client-side proxy does whatever is necessary to communicate with the business object. Once the proxy has been created, the application can call whatever method returns a Recordset object; it then sets the Recordset property of RDS.DataControl.
This book exclusively endorses the second method. Middle-service business objects will use data objects to create ADO disconnected recordsets. These Recordset objects will be passed back to the presentation layer using RDS. Presentation-layer client applications will use RDS to bind the Recordset objects to data-bound controls.
In intranet scenarios, DCOM can be used to transfer Recordsets between client and server computers. In Internet scenarios, however, HTTP will normally be used. In this case, RDS provides services to manage the HTTP communication. An Internet Server API (ISAPI) extension, Advanced Data ISAPI (ADISAPI), is installed on the Web server. ADISAPI does the work required to handle requests for Recordsets from client-side business object proxies. It creates server-side objects, calls the methods required to generate the Recordset objects, and converts the data to an efficient form for transmission back to the client. ADISAPI and the business object proxies handle all details of actually transmitting the data via HTTP.