Working with ADO.NET

 
Chapter 9 - Data Access with .NET
by Simon Robinson et al.
Wrox Press, 2002
  

This last section will attempt to address some common scenarios when developing data access applications with ADO.NET.

Tiered Development

Producing an application that interacts with data is often done by splitting the application up into tiers. A common model is to have an application tier (the front end), a data services tier, and the database itself.

One of the difficulties with this model is deciding what data to transport between tiers, and the format that it should be transported in. With ADO.NET you'll be pleased to hear that these wrinkles have been ironed out, and support for this style of architecture has been designed in from the start.

Copying and Merging Data

Ever tried copying an entire OLE DB recordset? In .NET it's easy to copy a DataSet:

   DataSet source = {some dataset};
   DataSet dest = source.Copy();

This will create an exact copy of the source DataSet: each DataTable, DataColumn, DataRow, and Relation will be copied across verbatim, and all data will be in exactly the same state as it was in the source. If all you want to copy is the schema of the DataSet, you can try the following:

   DataSet source = {some dataset};
   DataSet dest = source.Clone();

This will again copy all tables, relations, and so on. However, each copied DataTable will be empty. It really couldn't be more straightforward.

A common requirement when writing a tiered system, whether based on Win32 or the web, is to be able to ship as little data as possible between tiers. This reduces the amount of resources consumed.

To cope with this requirement, the DataSet has the GetChanges() method. This simple method performs a huge amount of work, and returns a DataSet with only the changed rows from the source dataset. This is ideal for passing between tiers, as only a minimal set of data has to be passed across the wire.

The following example shows how to generate a "changes" DataSet :

   DataSet source = {some dataset};
   DataSet dest = source.GetChanges();

Again, this is trivial. Under the covers things are a little more interesting. There are two overloads of the GetChanges() method: one takes a value from the DataRowState enumeration and returns only rows in that state (or combination of states), so, for example, source.GetChanges(DataRowState.Deleted) returns only the deleted rows. The parameterless GetChanges() simply calls GetChanges(DataRowState.Deleted | DataRowState.Modified | DataRowState.Added), which first checks that there are any changes at all by calling HasChanges(); if no changes have been made, null is returned to the caller immediately.

The next operation is to clone the current DataSet. Once done, the new DataSet is set up to ignore constraint violations (EnforceConstraints = false), and then each changed row for every table is copied into the new DataSet.
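In outline, the behavior is something like the following sketch (a rough approximation for illustration only, not the framework source; the real implementation also pulls in any parent rows needed to satisfy relations):

   // Clone the schema only, then relax constraint checking.
   DataSet dest = source.Clone();
   dest.EnforceConstraints = false;

   // Copy across every row whose RowState is not Unchanged.
   foreach (DataTable table in source.Tables)
   {
      foreach (DataRow row in table.Rows)
      {
         if (row.RowState != DataRowState.Unchanged)
            dest.Tables[table.TableName].ImportRow(row);
      }
   }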

Once you have a DataSet that contains just the changes, you can move these off to the data services tier for processing. After the data has been updated in the database, the "changes" DataSet can be returned to the caller (there may, for example, be output parameters from the stored procedures that have updated values in some of the columns). These changes can then be merged into the original DataSet using the Merge() method. This sequence of operations is depicted below:

[Figure: the changes DataSet is passed to the data services tier, used to update the database, and then merged back into the original DataSet]
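In code, the round trip looks something like the following sketch; UpdateOnServer() is a hypothetical stand-in for whatever call ships the changes to the data services tier and returns the processed rows:

   // Extract only the changed rows from the working DataSet.
   DataSet changes = source.GetChanges();

   if (changes != null)
   {
      // Hypothetical data services call: updates the database and returns
      // the same rows, now containing any server-generated values.
      DataSet processed = UpdateOnServer(changes);

      // Fold the processed rows back into the original DataSet.
      source.Merge(processed);
      source.AcceptChanges();
   }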

Key Generation with SQL Server

The RegionInsert stored procedure presented earlier in the chapter was one example of generating a primary key value on insertion into the database. The method for generating the key was fairly crude and wouldn't scale well, so for a real application you should look at utilizing some other strategy for generating keys.

Your first instinct might be simply to define an identity column, and return the @@IDENTITY value from the stored procedure. The following stored procedure shows how this might be defined for the Categories table in the Northwind example database. Type this stored procedure into SQL Query Analyzer, or run the StoredProcs.sql file in the 13_SQLServerKeys directory:

   CREATE PROCEDURE CategoryInsert(@CategoryName NVARCHAR(15),
      @Description NTEXT,
      @CategoryID INTEGER OUTPUT) AS
   SET NOCOUNT OFF
   INSERT INTO Categories (CategoryName, Description)
      VALUES(@CategoryName, @Description)
   SELECT @CategoryID = @@IDENTITY
   GO

This inserts a new row into the Categories table, and returns the generated primary key to the caller. You can test the procedure by typing the following SQL into Query Analyzer:

   DECLARE @CatID int;
   EXECUTE CategoryInsert 'Pasties', 'Heaven Sent Food', @CatID OUTPUT;
   PRINT @CatID;

When executed as a batch of commands, this will insert a new row into the Categories table, and return the identity of the new record, which is then displayed to the user.
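Calling the same stored procedure from C# uses an output parameter on the command. The following is a minimal sketch rather than code from the chapter's download; the connection string is assumed, and the System.Data and System.Data.SqlClient namespaces are required:

   SqlConnection conn = new SqlConnection(connectionString);
   SqlCommand cmd = new SqlCommand("CategoryInsert", conn);
   cmd.CommandType = CommandType.StoredProcedure;

   cmd.Parameters.Add("@CategoryName", SqlDbType.NVarChar, 15).Value = "Pasties";
   cmd.Parameters.Add("@Description", SqlDbType.NText).Value = "Heaven Sent Food";

   SqlParameter idParam = cmd.Parameters.Add("@CategoryID", SqlDbType.Int);
   idParam.Direction = ParameterDirection.Output;

   conn.Open();
   cmd.ExecuteNonQuery();
   conn.Close();

   // The generated key is available once the command has executed.
   int newCategoryID = (int)idParam.Value;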

Let's say that some months down the line, someone decides to add in a simple audit trail, which will record all insertions and modifications made to the category name. You define a table such as that shown below, which will record the old and new value of the category:

[Figure: the CategoryAudit table definition, with AuditID, CategoryID, OldName, and NewName columns]

The creation script for this table is included in the StoredProcs.sql file. The AuditID column is defined as an IDENTITY column. You then construct a couple of database triggers that will record changes to the CategoryName field:

   CREATE TRIGGER CategoryInsertTrigger
   ON Categories
   AFTER UPDATE
   AS
      INSERT INTO CategoryAudit(CategoryID, OldName, NewName)
         SELECT old.CategoryID, old.CategoryName, new.CategoryName
         FROM Deleted AS old, Categories AS new
         WHERE old.CategoryID = new.CategoryID;
   GO

For those of you used to Oracle stored procedures, SQL Server doesn't exactly have the concept of OLD and NEW rows. Instead, for an insert trigger there is an in-memory table called Inserted, and for deletes and updates the old rows are available within the Deleted table.

This trigger retrieves the CategoryID of the record(s) affected, and stores this together with the old and new value of the CategoryName column. A companion trigger that fires after inserts writes a row into the same CategoryAudit table for each newly added category, and it is that extra insert that causes the problem described next.

Now, when you call your original stored procedure to insert a new category, you receive an identity value; however, this is no longer the identity value from the row inserted into the Categories table; it is now the value generated for the new row in the CategoryAudit table. Ouch!

To view the problem first hand, open up a copy of SQL Server Enterprise Manager, and view the contents of the Categories table.

[Figure: the current contents of the Categories table]

This lists all the categories I have in my instance of the database.

The next identity value for the Categories table should be 21, so we'll insert a new row by executing the following code and see what ID is returned:

   DECLARE @CatID int;
   EXECUTE CategoryInsert 'Pasties', 'Heaven Sent Food', @CatID OUTPUT;
   PRINT @CatID;

The output value of this on my PC was 17. If I look into the CategoryAudit table, I find that this is the identity of the newly inserted audit record, not that of the category record created.

[Figure: the CategoryAudit table, containing the newly generated audit record]

The problem lies in the way that @@IDENTITY actually works. It returns the last identity value created in your session, regardless of which table or scope generated it, so as shown above it isn't completely reliable.

There are two other identity functions that you can utilize instead of @@IDENTITY, but neither is free from possible problems. The first, SCOPE_IDENTITY(), returns the last identity value created within the current "scope"; SQL Server defines a scope as a stored procedure, trigger, or function. This works most of the time, but if for some reason someone adds another INSERT statement to the stored procedure, then you will receive the identity from that statement rather than the one you expected.

The other, IDENT_CURRENT(), returns the last identity value generated for a given table in any scope and any session; so, for instance, if two users were accessing SQL Server at exactly the same time, you might receive the other user's generated identity value.

As you might imagine, tracking down a problem of this nature isn't easy. The moral of the story is to beware when utilizing IDENTITY columns in SQL Server.
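One way to sidestep the trigger problem is to issue the INSERT and a SELECT SCOPE_IDENTITY() as a single command and read the result with ExecuteScalar(). This is a sketch rather than code from the chapter's download; the connection string is assumed:

   SqlConnection conn = new SqlConnection(connectionString);
   SqlCommand cmd = new SqlCommand(
      "INSERT INTO Categories (CategoryName, Description) " +
      "VALUES (@CategoryName, @Description); " +
      "SELECT SCOPE_IDENTITY();", conn);

   cmd.Parameters.Add("@CategoryName", SqlDbType.NVarChar, 15).Value = "Pasties";
   cmd.Parameters.Add("@Description", SqlDbType.NText).Value = "Heaven Sent Food";

   conn.Open();
   // SCOPE_IDENTITY() comes back as a decimal, so convert explicitly.
   // The audit trigger runs in its own scope, so its identity value
   // does not affect the result.
   int newCategoryID = Convert.ToInt32(cmd.ExecuteScalar());
   conn.Close();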

Naming Conventions

Having worked with database applications all my working life, I've picked up a few recommendations for naming entities, which are worth sharing. I know this isn't really .NET related, but the conventions are useful, especially when naming constraints as discussed above. Feel free to skip this section if you already have your own views on the subject.

Database Tables

  • Always use singular names: Product rather than Products. This one is largely due to having to explain a database schema to customers; it's much better grammatically to say "The Product table contains products" than "The Products table contains products". Have a look at the Northwind database as an example of how not to do this.

  • Adopt some form of naming convention for the fields that go into a table. Ours is <Table>_ID for the primary key of a table (assuming that the primary key is a single column), Name for the field considered to be the user-friendly name of the record, and Description for any textual information about the record itself. Having a good table convention means you can look at virtually any table in the database and instinctively know what the fields are used for.

Database Columns

  • Use singular rather than plural names again.

  • Any columns that link to another table should be named the same as the primary key of that table. So, a link to the Product table would be Product_ID, and to the Sample table Sample_ID. This isn't always possible, especially if one table has multiple references to another. In that case use your own judgment.

  • Date fields should have a suffix of _On, as in Modified_On and Created_On. Then it's easy to read some SQL output and infer what a column means just by its name.

  • Fields that record the user should be suffixed with _By, as in Modified_By and Created_By. Again, this aids legibility.

Constraints

  • If possible, include in the name of the constraint the table and column name, as in CK_<Table>_<Field>. An example would be CK_PERSON_SEX for a check constraint on the SEX column of the PERSON table. A foreign key example would be FK_Product_Supplier_ID, for the foreign key relationship between product and supplier.

  • Show the type of constraint with a prefix, such as CK for a check constraint and FK for a foreign key constraint. Feel free to be more specific, as in CK_PERSON_AGE_GT0 for a constraint on the age column indicating that the age should be greater than zero.

  • If you have to trim the length of a constraint name, do it on the table name part rather than the column name. When you get a constraint violation, it's usually easy to infer which table was in error, but sometimes not so easy to work out which column caused the problem. Oracle has a 30-character limit on names, which you can easily hit.

Stored Procedures

Just like the obsession many have fallen into over the past few years of putting a 'C' in front of each and every class they have declared (you know you have!), many SQL Server developers feel compelled to prefix every stored procedure with 'sp_' or something similar. It's not a good idea.

SQL Server uses the 'sp_' prefix for all (well, most) system stored procedures. So, on the one hand, you risk confusing your users into thinking that 'sp_widget' is something that comes as standard with SQL Server. In addition, when looking for a stored procedure, SQL Server will treat procedures with the 'sp_' prefix differently from those without.

If you use this prefix and do not qualify the database/owner of the stored procedure, then SQL Server will look in the current scope and then jump into the master database and look up the stored procedure there. Without the 'sp_' prefix your users would get an error a little earlier. What's worse, and also possible to do, is to create a local stored procedure (one within your database) that has the same name and parameters as a system stored procedure. I'd avoid this at all costs; if in doubt, don't prefix.

Above all, when naming entities, whether within the database or within code, be consistent.

Performance

The current set of managed providers available for .NET is somewhat limited: you can choose OleDb or SqlClient. The OleDb provider permits connection to any data source exposed with an OLE DB driver (such as Oracle), and the SqlClient provider is tailored for SQL Server.

The SqlClient provider has been written completely in managed code, and uses as few layers as possible to connect to the database. This provider writes TDS (Tabular Data Stream) packets directly to SQL Server, which should be substantially faster than the OleDb provider, which naturally has to go through a number of layers before actually hitting the database.

To test the theory, the following code was run against the same database on the same machine, the only difference between runs being the use of the SqlClient managed provider rather than the OleDb provider:

   SqlConnection conn = new SqlConnection(Login.Connection);
   conn.Open();

   SqlCommand cmd = new SqlCommand("update tempdata set AValue=1 Where ID=1",
                                   conn);

   DateTime initial, elapsed;
   initial = DateTime.Now;

   for (int i = 0; i < iterations; i++)
      cmd.ExecuteNonQuery();

   elapsed = DateTime.Now;
   conn.Close();

Naturally, the OLE DB version utilizes OleDbConnection and OleDbCommand rather than SqlConnection and SqlCommand. I created a simple database table, TempData, with two columns, ID and AValue, and manually added a single row.

The SQL clause used was a simple UPDATE command:

   UPDATE TempData SET AValue = 1 WHERE ID = 1

The SQL was kept deliberately simple to attempt to highlight the differences between the providers. The results (in seconds) achieved for various numbers of iterations were as follows:

   Provider    100      1000     10000    50000
   OleDb       0.109    0.798    7.95     39.11
   Sql         0.078    0.626    6.23     29.27

If you are only targeting SQL Server then the obvious choice is the Sql provider. Back in the real world, if you target anything other than SQL Server you naturally have to use the OleDb provider. Or do you?

As Microsoft has done an excellent job of making database access generic with the System.Data.Common classes, it would be better to write code against those classes and use the appropriate managed provider at run time. It's fairly simple to swap between OleDb and Sql now, and if other database vendors write managed providers for their products, you will be able to swap out the OLE DB layer for a native provider with little (or no) change to your code. For an example of the versatility of .NET data access, the "Scientific Data Center" case study in Data-Centric .NET Programming with C# (Wrox Press, ISBN 1-861005-92-X) details using C# to query a MySQL database.
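One way of keeping the provider choice out of most of your code is to program against the IDbConnection and IDbCommand interfaces in System.Data, which both the SqlClient and OleDb classes implement. The factory below is a minimal sketch; the useSqlServer flag and the connection strings are assumptions for illustration:

   using System.Data;
   using System.Data.OleDb;
   using System.Data.SqlClient;

   public class ConnectionFactory
   {
      // Decide once, at run time, which managed provider to use.
      public static IDbConnection Create(bool useSqlServer, string connectionString)
      {
         if (useSqlServer)
            return new SqlConnection(connectionString);

         return new OleDbConnection(connectionString);
      }
   }

The rest of the application then only ever sees the interfaces:

   IDbConnection conn = ConnectionFactory.Create(useSqlServer, connectionString);
   IDbCommand cmd = conn.CreateCommand();
   cmd.CommandText = "UPDATE TempData SET AValue = 1 WHERE ID = 1";

   conn.Open();
   cmd.ExecuteNonQuery();
   conn.Close();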

  

