ADO.NET Over View | Pro Visual C++ 2005 for C# Developers

ADO.NET is more than just a thin veneer over some existing API. The similarity to ADO is fairly minimal — the classes and methods of accessing data are completely different.

ADO (ActiveX Data Objects) is a library of COM components that has had many incarnations over the last few years. Currently at version 2.7, ADO consists primarily of the Connection, Command, Recordset, and Field objects. Using ADO, a connection is opened to the database, some data is selected into a record set consisting of fields, that data is then manipulated and updated on the server, and the connection is closed. ADO also introduced a so-called disconnected record set, which is used when keeping the connection open for long periods of time is not desirable.

There were several problems that ADO did not address satisfactorily, most notably the unwieldiness (in physical size) of a disconnected record set. This support was more necessary than ever with the evolution of Web-centric computing, so a fresh approach was required. A number of similarities exist between ADO.NET programming and ADO (not only the name), so upgrading from ADO shouldn't be too difficult. What's more, if you're using SQL Server, there's a fantastic new set of managed classes that are tuned to squeeze maximum performance out of the database. This alone should be reason enough to migrate to ADO.NET.

ADO.NET ships with four database client namespaces: one for SQL Server, another for Oracle, the third for ODBC datasources, and the fourth for any database exposed through OLEDB. If your database of choice is not SQL Server or Oracle, the OLEDB route should be taken unless you have no other choice than to use ODBC.

Namespaces

All of the examples in this chapter access data in one way or another. The following namespaces expose the classes and interfaces used in .NET data access:

System.Data — All generic data access classes
System.Data.Common — Classes shared (or overridden) by individual data providers
System.Data.Odbc — ODBC provider classes
System.Data.OleDb — OLE DB provider classes
System.Data.ProviderBase — New base classes and connection factory classes
System.Data.Oracle — Oracle provider classes
System.Data.Sql — New generic interfaces and classes for SQL Server data access
System.Data.SqlClient — SQL Server provider classes
System.Data.SqlTypes — SQL Server data types

The main classes in ADO.NET are listed in the following subsections.

Shared Classes

ADO.NET contains a number of classes that are used regardless of whether you are using the SQL Server classes or the OLE DB classes.

The following classes are contained in the System.Data namespace:

DataSet — This object is designed for disconnected use and can contain a set of DataTables and include relationships between these tables.
DataTable — A container of data that consists of one or more DataColumns and, when populated, will have one or more DataRows containing data.
DataRow — A number of values, akin to a row from a database table, or a row from a spreadsheet.
DataColumn — This object contains the definition of a column, such as the name and data type.
DataRelation — A link between two DataTable classes within a DataSet class. Used for foreign key and master/detail relationships.
Constraint — This class defines a rule for a DataColumn class (or set of data columns), such as unique values.

The following classes can be found in the System.Data.Common namespace:

DataColumnMapping — Maps the name of a column from the database with the name of a column within a DataTable.
DataTableMapping — Maps a table name from the database to a DataTable within a DataSet.

Database-Specific Classes

In addition to the shared classes introduced in the previous section, ADO.NET contains a number of database-specific classes. These classes implement a set of standard interfaces defined within the System.Data namespace, allowing the classes to be used in a generic manner if necessary. For example, both the SqlConnection and OleDbConnection classes implement the IDbConnection interface.

SqlCommand, OleDbCommand, OracleCommand, and ODBCCommand — Used as wrappers for SQL statements or stored procedure calls.
SqlCommandBuilder, OleDbCommandBuilder, OracleCommandBuilder, and ODBCCommandBuilder — Used to generate SQL commands (such as INSERT, UPDATE, and DELETE statements) from a SELECT statement.
SqlConnection, OleDbConnection, OracleConnection, ODBCConnection — Used to connect to the database. Similar to an ADO Connection.
SqlDataAdapter, OleDbDataAdapter, OracleDataAdapter, ODBCDataAdapter — Used to hold select, insert, update, and delete commands, which are then used to populate a DataSet and update the Database.
SqlDataReader, OleDbDataReader, OracleDataReader, ODBCDataReader — Used as a forward only, connected data reader.
SqlParameter, OleDbParameter, OracleParameter, ODBCParameter — Used to define a parameter to a stored procedure.
SqlTransaction, OleDbTransaction, OracleTransaction, ODBCTransaction — Used for a database transaction, wrapped in an object.

As you can see from the previous list, there are four classes for each type of object — one for each of the providers that are part of .NET version 1.1. In the rest of this chapter, unless otherwise stated, the prefix <provider> is used to indicate that the particular class used is dependent on the database provider in use. With version 2.0 of .NET, the designers have updated the class hierarchy for these classes significantly. In 1.1, all that was common between the various connection classes was the implementation of the IConnection interface. This has changed in .NET 2.0 because now both share a common base class. Similarly the other classes such as Commands, DataAdapters, DataReaders, and so on also share common base classes.

The most important feature of the ADO.NET classes is that they are designed to work in a disconnected manner, which is important in today's highly Web-centric world. It is now common practice to architect a service (such as an online bookshop) to connect to a server, retrieve some data, and then work on that data on the client before reconnecting and passing the data back for processing. The disconnected nature of ADO.NET enables this type of behavior.

ADO 2.1 introduced the disconnected record set, which would permit data to be retrieved from a database, passed to the client for processing, and then reattached to the server. This used to be cumbersome to use, because disconnected behavior was not part of the original design. The ADO.NET classes are different — in all but one case (the <provider>DataReader) they are designed for use offline from the database.

Note

The classes and interfaces used for data access in the.NET Framework are introduced in the course of this chapter. The focus is mainly on the SQL classes when connecting to the database, because the Framework SDK samples install an MSDE database (SQL Server). In most cases, the OleDb, Oracle and ODBC classes mimic exactly the SQL code.

Using Database Connections

To access the database, you need to provide connection parameters, such as the machine that the database is running on, and possibly your login credentials. Anyone who has worked with ADO will be familiar with the .NET connection classes, OleDbConnection and SqlConnection. Figure 19-1 shows two of the connection classes and includes the class hierarchy.

image from book
Figure 19-1

This is a significant change from that in .NET versions 1.0 and 1.1; however, in practice, using the connection class (and other classes in ADO.NET) is backwardly compatible.

The examples in this chapter use the Northwind database, which is installed with the .NET Framework SDK samples. The following code snippet illustrates how to create, open, and close a connection to the Northwind database:

 using System.Data.SqlClient; string source = "server=(local);" + "integrated security=SSPI;" + "database=Northwind"; SqlConnection conn = new SqlConnection(source); conn.Open(); // Do something useful conn.Close();

The connection string should be very familiar to you if you've used ADO or OLE DB before — indeed, you should be able to cut and paste from your old code if you use the OleDb provider. In the example connection string, the parameters used are as follows (the parameters are delimited by a semicolon in the connection string):

server=(local) — This denotes the database server to connect to. SQL Server permits a number of separate database server instances to be running on the same machine, and here you're connecting to the default SQL Server instance.
integrated security=SSPI — This uses Windows Authentication to connect to the database, which is highly recommended over using a username and password within the source code.
database=Northwind — This describes the database instance to connect to; each SQL Server process can expose several database instances.

The example opens a database connection using the defined connection string and then closes that connection. Once the connection has been opened, you can issue commands against the data source, and when you're finished, the connection can be closed.

SQL Server has another mode of authentication — it can use Windows-integrated security, so that the credentials supplied at logon are passed to SQL Server. This is accomplished by removing the uid and pwd portions of the connection string, and adding in Integrated Security=SSPI.

In the download code available for this chapter, you will find the file Login.cs that simplifies the examples in this chapter. It is linked to all the example code, and includes database connection information used for the examples; you can alter this to supply your own server name, user, and password as appropriate. This by default uses Windows-integrated security; however, you can change the username and password as appropriate.

Managing Connection Strings

In the previous .NET releases it was up to the developer to manage the database connection strings, often done by storing a connection string in the application configuration file, or more commonly hard- coded somewhere within the application itself.

With .NET 2.0, you now have a predefined way to store connection strings, and even use database connections in a type agnostic manner — for example, it would now be possible to write an application and then plug in various database providers, all without altering the main application.

To define a database connection string you should use the new <connectionStrings> section of the configuration file. Here you can specify a name for the connection, the actual database connection string parameters, and in addition you need to specify the provider for this connection type. Here is an example:

 <configuration> ... <connectionStrings> <add name="Northwind" providerName="System.Data.SqlClient" connectionString="server=(local);integrated security=SSPI; database=Northwind" /> </connectionStrings> </configuration>

You use this same connection string in the other examples in this chapter.

Once the database connection information has been defined within the configuration file, you then need to utilize this within the application. You'll most likely want to create a method such as the following to retrieve a database connection based on the name of the connection:

 private DbConnection GetDatabaseConnection ( string name ) { ConnectionStringSettings settings =  ConfigurationSettings.ConnectionStrings[name]; DbProviderFactory factory = DbProviderFactories.GetFactory  ( settings.ProviderName ) ; DbConnection conn = factory.CreateConnection ( ) ; conn.ConnectionString = settings.ConnectionString ; return conn ; }

This code reads the named connection string section (using the new ConfigurationStringSettings class), and then requests a provider factory from the generic DbProviderFactories class. This uses the ProviderName property, which was set to "System.Data.SqlClient" in the application configuration file. You might be wondering how this maps to the actual factory class used to generate a database connection for SQL Server — in this case it should utilize the SqlClientFactory class fromSystem.Data.SqlClient.

If you look into the machine.config file for .NET 2.0, you may notice the DbProviderFactories section — this maps the alias names (such as 'System.Data.SqlClient') to the factory object for that type of database. The following shows an abridged copy of the information I have on my system:

 <system.data> <DbProviderFactories> ... <add name="SqlClient Data Provider" invariant="System.Data.SqlClient" support="FF" description=".Net Framework Data Provider for SqlServer" type="System.Data.SqlClient.SqlClientFactory, System.Data, Version=2.0.3600.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" /> ... </DbProviderFactories> </system.data>

This just shows the entry for the SqlClient provider — there are other entries for Odbc, OleDb, Oracle, and also the SqlCE.

So, in the example, the DbProviderFactory class just looks up the factory class from the machine configuration settings, and uses that concrete factory class to instantiate the connection object. In the case of the SqlClientFactory class, all this does is construct an instance of SqlConnection and return this to the caller.

This may seem like a lot of unnecessary work to obtain a database connection, and indeed it is if your application is never going to run on any other database than the one it was designed for. If, however, you use the preceding factory method and also use the generic Db* classes (such as DbConnection, DbCommand, and DbDataReader), you'll future-proof the application and any move in the future to another database system will be fairly simple.

Using Connections Efficiently

In general, when using scarce resources in .NET, such as database connections, windows, or graphics objects, it is good practice to ensure that each resource is closed after use. Although the designers of.NET have implemented automatic garbage collection, which will tidy up eventually, it is necessary to release resources as early as possible to avoid starvation of resources.

This is all too apparent when writing code that accesses a database, because keeping a connection open for slightly longer than necessary can affect other sessions. In extreme circumstances, not closing a connection can lock other users out of an entire set of tables, considerably hurting application performance. Closing database connections should be considered mandatory, so this section shows how to structure your code so as to minimize the risk of leaving a resource open.

You have two main ways to ensure that database connections and the like are released after use.

Option One: try...catch...finally

The first option to ensure that resources are cleaned up is to use try...catch...finally blocks, and ensure that you close any open connections within the finally block. Here's a short example:

 try { // Open the connection conn.Open(); // Do something useful } catch ( SqlException ex ) { // Do something about the exception } finally { // Ensure that the connection is freed conn.Close ( ) ; }

Within the finally block, you can release any resources you have used. The only trouble with this method is that you have to ensure that you close the connection — it is all too easy to forget to add in the finally clause, so something less prone to vagaries in coding style might be worthwhile.

Also, you might find that you open a number of resources (say two database connections and a file) within a given method, so the cascading of try...catch...finally blocks can sometimes become less easy to read. There is, however, another way to guarantee resource cleanup — the using statement.

Option Two: The using block statement

During development of C#, the debate on how .NET uses nondeterministic destruction became very heated.

In C++, as soon as an object went out of scope, its destructor would be automatically called. This was great news for designers of resource-based classes, because the destructor was the ideal place to close the resource if the user had forgotten to do so. A C++ destructor is called whenever an object goes out of scope — so, for instance, if an exception was raised and not caught, all destructors would be called.

With C# and the other managed languages, there is no concept of automatic, deterministic destruction. Instead there is the garbage collector, which disposes of resources at some point in the future. What makes this nondeterministic is that you have little say over when this process actually happens. Forgetting to close a database connection could cause all sorts of problems for a .NET executable.

Luckily, help is at hand. The following code demonstrates how to use the using clause to ensure that objects that implement the IDisposable interface (see Chapter 7, "Memory Management and Pointers") are cleared up immediately after the block exits:

string source = "server=(local);" +                  "integrated security=SSPI;" +                  "database=Northwind"; using ( SqlConnection conn = new SqlConnection ( source ) ) { // Open the connection conn.Open ( ) ; // Do something useful }

In this instance, the using clause ensures that the database connection is closed, regardless of how the block is exited.

Looking at the IL code for the Dispose() method of the connection classes, all of them check the current state of the connection object, and if open will call the Close() method. A great tool for browsing .NET assemblies is Reflector (available at http://www.aisto.com/roeder/dotnet/). This tool permits you to view the IL code for any .NET method, and will also reverse-engineer the IL into C# source code so you can easily see what a given method is doing.

When programming, you should use at least one of these methods, and probably both. Wherever you acquire resources it is good practice to use the using statement; even though we all mean to write the Close() statement, sometimes we forget, and in the face of exceptions the using clause does the right thing. There is no substitute for good exception handling either, so in most instances I would suggest you use both methods together as in the following example:

 try { using (SqlConnection conn = new SqlConnection ( source )) { // Open the connection conn.Open ( ) ; // Do something useful // Close it myself conn.Close ( ) ; } } catch (SqlException e) { // Do something with the exception here... }

Note that this example called Close(), which isn't strictly necessary, because the using clause will ensure that this is done anyway. However, you should ensure that any resources such as this are released as soon as possible — you might have more code in the rest of the block and there's no point locking a resource unnecessarily.

In addition, if an exception is raised within the using block, the IDisposable.Dispose method will be called on the resource guarded by the using clause, which in this example ensures that the database connection is always closed. This produces easier-to-read code than having to ensure you close a connection within an exception clause. You might also note that the exception is defined as a SqlException rather than the catch-all Exception type — always try to catch as specific an exception as possible and let all others that are not explicitly handled rise up the execution stack.

In conclusion, if you are writing a class that wraps a resource, whatever that resource may be, always implement the IDisposable interface to close the resource. That way anyone coding with your class can use the using() statement and guarantee that the resource will be cleared up.

Transactions

Often when there is more than one update to be made to the database, these updates must be performed within the scope of a transaction. A transaction in ADO.NET is initiated by calling one of the BeginTransaction() methods on the database connection object. These methods return an object that implements the IDbTransaction interface, defined within System.Data.

The following sequence of code initiates a transaction on a SQL Server connection:

string source = "server=(local);" +                  "integrated security=SSPI;" +                  "database=Northwind"; SqlConnection conn = new SqlConnection(source); conn.Open(); SqlTransaction tx = conn.BeginTransaction(); // Execute some commands, then commit the transaction tx.Commit(); conn.Close();

When you begin a transaction, you can choose the isolation level for commands executed within that transaction. The level determines how changes made in one database session are viewed by another. Not all database engines support all of the four levels presented in the following table.

Isolation Level	Description
ReadCommitted	The default for SQL Server. This level ensures that data written by one transaction will only be accessible in a second transaction after the first transaction commits.
ReadUncommitted	This permits your transaction to read data within the database, even data that has not yet been committed by another transaction. For example, if two users were accessing the same database, and the first inserted some data without concluding their transaction (by means of a Commit or Rollback), then the second user with their isolation level set to ReadUncommitted could read the data.
RepeatableRead	This level, which extends the ReadCommitted level, ensures that if the same statement is issued within the transaction, regardless of other poten- tial updates made to the database, the same data will always be returned. This level does require extra locks to be held on the data, which could adversely affect performance. This level guarantees that, for each row in the initial query, no changes can be made to that data. It does, however, permit "phantom" rows to show up — these are completely new rows that another transaction might have inserted while your transaction was running.
Serializable	This is the most "exclusive" transaction level, which in effect serializes access to data within the database. With this isolation level, phantom rows can never show up, so a SQL statement issued within a serializable transac- tion will always retrieve the same data. The negative performance impact of a Serializable transaction should not be underestimated — if you don't absolutely need to use this level of isolation, stay away from it.

The SQL Server default isolation level, ReadCommitted, is a good compromise between data coherence and data availability, because fewer locks are required on data than in RepeatableRead or Serializable modes. However, situations exist where the isolation level should be increased, and so within .NET you can simply begin a transaction with a different level from the default. There are no hard-and-fast rules as to which levels to pick — that comes with experience.

Note

If you are currently using a database that does not support transactions, it is well worth changing to a database that does. Once I was working as a trusted employee and had been given complete access to the bug database. I typed what I thought was delete from bug where id=99999, but in fact had typed a < rather than an =. I deleted the entire database of bugs (except the one I wanted to!). Luckily for me ourI.S team backed up the database on a nightly basis and we could restore this, but a rollback command would have been much easier.