ADO.NET | The .NET and COM Interoperability Handbook (Integrated .Net)


Team-Fly

	.NET and COM Interoperability Handbook, The By Alan Gordon
	Table of Contents

	Chapter One. What's in a Name ?

ADO.NET

The .NET Framework class library contains a set of classes for data access. You can find these classes in the System.Data namespace. These classes are used primarily for accessing relational databases, but the classes are generic enough to allow you to use the technology to access other kinds of data stores (email, delimited text documents, Excel spreadsheets, and so forth). These classes are conceptually similar to ADO, which is Microsoft's popular data-access technology, so much so that Microsoft has called the classes in this namespace ADO.NET.

ADO.NET improves on ADO in three major ways: (1) It has vastly improved support for the disconnected programming model. (2) XML support is now a fundamental part of ADO.NET rather than a supported feature. (3) It has a vastly simpler driver model (the mechanism used to enable a particular data source to work with ADO.NET).

The Disconnected Programming Model

There are two types of client-server programming models: connected and disconnected. In the connected model, the client opens a connection to the server and maintains that open connection throughout its interaction with the server in a given session. In the disconnected programming model, the client opens a connection to the server, makes a request of the server, perhaps fetching a Web page or data from a table, and then immediately closes the connection. With each request on the server, the client creates and closes a new connection (although the client or server may pool and reuse connections). The connected model typically has higher performancein terms of the latency of each requestbecause you don't have the overhead associated with creating a connection each time. The problem with the connected model is that it does not scale very well. If 10,000 clients are currently using a server, there will be 10,000 connections open on the server even though only a small percentage of those clients are actually making a request on the server at any instant in time. Each connection will consume some resources on the server. With the disconnected model, you may have 10,000 clients using a server, but you will only need as many connections as there are simultaneous requests . You could probably service these 10,000 clients with a few hundred connections. The Web uses a disconnected model The network protocol on which the Web is built, HTTP, uses a disconnected model. When you make a request on a Web server, your Web browser makes a connection to the server and then sends a request to the server. The server responds with the desired information (or possibly an error message). A new connection is used for each request/response pair.

ADO has had support for the disconnected programming model for some time now. With ADO, you can create a disconnected Recordset. The essential process is that you connect to some data source and perform some database operation (a query or a call to a stored procedure) that returns data. You then set the ActiveConnection property on the Recordset to null, and the Recordset, which functions as a database cursor in the connected usage scenario, now becomes a glorified collection object. You can sort , search, filter, and update the data in the Recordset even though you are not connected to its data source. When you are ready to update the database, you simply reconnect to the database by setting the ActiveConnection property to a valid connection and call the UpdateBatch method on the Recordset. The Recordset will then attempt to synchronize its updates with the database, that is, it will add any Recordsets to the database that have been added to the Recordset, it will attempt to modify any records that were updated in the Recordset, and it will delete any records that have been deleted in the Recordset. If there were conflicts, for example, a record was modified by some other user after you fetched the record, the Recordset will return a list of the conflicted records. You can then attempt either to resolve the conflict or return an error message to the user.

Unfortunately, there are several problems with ADO's support for the disconnected client-server model. One is that the disconnected programming model was kluged on to the ADO Recordset object, which was originally designed to model a database cursor (a mechanism for iterating through the results of a database query). In many cases, you will want to have a different schema in the disconnected representation of data from that which exists in the underlying data source. For example, you might have a table in a database called tblCustomer that has a primary key field called CstID. In your disconnected representation of the data, you might want to call the table Customer and the primary key field CustomerID. This was difficult to do with ADO Recordsets. Moreover, ADO Recordsets only contain data from a single table or query. This makes it difficult to use them with complex or hierarchical data.

Note

With ADO, you could use hierarchical Recordsets to return data from multiple tables or queries, and you could even represent relationships. However, hierarchical Recordsets required nonstandard SQL, and they are complex to use.

ADO Recordsets also have limited support for XML. Starting with version 2.5, you could save a Recordset to a stream or file as XML and reconstitute the Recordset from the XML later. However, you had no control over the XML schema that the Recordset used when it wrote the XML document, and you could not create an ADO Recordset from an arbitrary XML document. With ADO.NET, all of these problems are solved thanks to a class called DataSet .

A DataSet in ADO.NET is somewhat similar to a Recordset in ADO.NET. One key similarity is that a DataSet will become a common mechanism for developers to pass data between the various tiers in a multitier .NET Framework application, just as the ADO Recordset became a common way to pass data amongst the multiple tiers in a Windows DNA application. One key difference is that a DataSet can contain multiple tables and can also easily model the relationships between those tables. The table names and field names in the DataSet do not have to be the same as the table and field names in the database. The DataSet class has a mechanism that you can use to map the schema in the underlying data source to a different schema supported by the DataSet . A DataSet also contains methods that you can use to synchronize the contents of the DataSet with the data in the underlying data store. ADO.NET also gives you more control over how this synchronization is done than ADO did.

Improved XML Support

ADO.NET also features vastly improved XML support than what was offered under ADO. With ADO.NET, you can turn the contents of a dataset into an XML document. You have the option of including just the XML data or the XML schema and the data. However, the biggest improvement in the XML support that ADO.NET offers over ADO is that ADO Recordsets have an XML schema that you can not alter. With ADO.NET, you have complete control of the XML schema that the DataSet uses to represent data, in other words, the layout and types of tags used in the document. You may define the XML schema for a DataSet programmatically by calling methods in the DataSet or by calling the ReadXmlSchema method on the DataSet and passing in an XSD schema file. You can also define the mapping between the elements and attributes in an XML document and the tables and fields in a database. After you have done this, you can move data seamlessly between a data source like a database and an XML document and vice-versa. This enables a number of usage scenarios that were not possible with ADO. For instance, you can set the XML schema for the DataSet to a Biztalk XML schema and then map the DataSet to a database schema, and you can move data seamlessly between a BizTalk document and a database. DataSets are also the preferred mechanism for passing data between processes in a distributed, .NET system, replacing disconnected Recordsets. Therefore, when your XML Web services need to return tabular data, you typically return that data as DataSets instead of disconnected Recordsets. The Web services infrastructure in the .NET Framework will automatically convert the DataSet to its XML representation and send it across the wire. You have complete control over the XML schema for this XML representation, so it's easy for this data to be interpreted by a client regardless of the platform that it runs on.

The ADO.NET Driver Model

Although ADO and ADO.NET have the same underlying purpose at their essence, that is, to make a data source available through some uniform interface, they are implemented in a very different way. In order to make a particular data source work with OLEDB, the vendor of the data source, a third-party, or in some cases Microsoft, had to create an OLEDB provider. An OLEDB provider is just a set of COM classes that implement the OLEDB interfaces. ADO is a set of Automation classes that hide the complexity of the OLEDB interfaces and provide a simpler way of using OLEDB. One way to think of it is that OLEDB is the low-level interface and that ADO is the high-level interface.

ADO.NET does not have separate high and low-level interfaces. In order to make a particular data source work with ADO.NET, the vendor of the data source, a third-party, or Microsoft has to create a managed provider. A managed provider consists of just the four classes shown in Figure 1-5.

Figure 1-5. The classes in an ADO.NET managed provider.

graphics/01fig05.gif

The Connection and Command classes map to their counterparts in ADO. A Connection represents an open connection to a data store. It contains properties that allow you to specify the type of data store (SQL Server, Microsoft Access, Oracle, and so forth), as well as the location of the instance of the data store that you are using (the server name in the case of Oracle or SQL Server or a file name in the case of Access). It also allows you to open and close the connection, test if the connection is open, and set properties of the connection like the default timeout. The Command object is used to execute an operation against a data source, like executing a query or a stored procedure or performing an insert, update, or delete operation. A DataReader loosely corresponds to a forward-only, read-only Recordset, or what many database programmers refer to as a firehose cursor. The DataAdapter is used to build a DataSet from a Command; it is the link between the managed provider and the disconnected functionality provided by the DataSet class.

What About the Other Cursor Types?

If you are a seasoned ADO programmer, you are probably wondering what happened to the other cursor types, such as scrollable and updateable cursors. ADO has always allowed you to create Recordsets that were scrollable, that is, you could move both forward and backward at will through the data. It also allowed you to create updateable cursors, which allowed you to alter data in the database. Supporting scrollable and updateable cursors is one of the main reasons that building an OLEDB provider is so complicated. In order to simplify the creation of a managed provider, Microsoft decided not to require this functionality in a .NET managed provider. If you are crying a river of tears because of the loss of this functionality, keep in mind that you can still use ADO in from .NET. through COM Interop.

The four classes implemented by a managed provider are exposed directly to the user, and they inherit from classes in the System.Data namespace in the .NET Framework class library. Each managed provider has unique names for these classes. For instance, Microsoft ships two managed providers with the .NET Framework SDK. One is a provider for Microsoft's client-server database, SQL Server, and the other is a managed provider that works with any OLEDB data source.

Note

Version 1.0 of the .NET Framework SDK did ship with only the SQL Client and OLEDB managed providers. After the release of version 1.0 Microsoft released a managed provider for ODBC and one for Oracle. Both of these Managed providers will be included with the version 1.1 release of the .NET Framework, which should be released shortly after this book is released. In the beta, 1.1 version of the .NET Framework, the Oracle managed provider, which uses the Oracle Client Interface (OCI) can be found in the System.Data.OracleClient namespace. The ODBC managed provider can be found in the System.Data.Odbc namespace.

The SQL Server managed provider resides in the System.Data.SqlClient namespace, and the four managed provider classes have the following names: SQLCommand, SQLConnection, SQLDataReader , and SQLDataAdapter . The OLEDB provider resides in the System.Data.OleDb namespace, and the four managed provider classes have the following names: OleDbCommand, OleDbConnection, OleDbDataReader , and OleDbDataAdapter . Therefore, your client code may change slightly, depending on which data source and hence which managed provider, you decide to use. Obviously, this is a disadvantage relative to ADO, but keep in mind that the data source independence of applications written with ADO/OLEDB was always overstated. Unless you were writing a trivial data application, you had to architect your application specifically for data source independence if you wanted your application to work with multiple databases. For instance, you couldn't use stored procedures or nonstandard SQL features, and, of course, you have to test your application with each of your supported data sources and OLEDB providers. The reality is that every data source and every OLEDB provider have bugs and, unless your application is trivial, you will have a separate bug list, and potentially a different set of coding workarounds for each data source/OLEDB provider combination. In fact, at my previous company, we abandoned the use of a promising object-to-relational database-mapping tool because, although its vendor advertised that it worked with any data source, we ran into a number of very obvious problems when we tried to use it with Sybase Adaptive Server with a third-party OLEDB provider. Upon questioning the vendor, we found out that the product had only been tested with Microsoft SQL Server and Access using Microsoft's OLEDB providers.

Note

At the risk of sounding like a Java basher and a Microsoft " shill ," this illustrates the inherent problem with any technology that advertises itself as being platform independent. Unless every platform is bug free, you still have to testand usually debugeach platform that your users may run on. This is why developers who wrote code with the Java often deride Sun Microsystem's "write once, run everywhere" mantra with one of their own: "write once, debug everywhere."

The main advantage of the managed provider approach is that it is much simpler to create a managed provider than it ever was to create an OLEDB provider, so you should see a proliferation of managed providers for a variety of data sources.


Team-Fly

Top