Working With ADO.NET


This section addresses some common scenarios when developing data access applications with ADO.NET.

Tiered Development

Producing an application that interacts with data is often done by splitting up the application into tiers. A common model is to have an application tier (the front end), a data services tier, and the database itself.

One of the difficulties with this model is deciding what data to transport between tiers, and the format that it should be transported in. With ADO.NET you'll be pleased to learn that these wrinkles have been ironed out, and support for this style of architecture is part of the design.

One of the things that's much better in ADO.NET than in the classic ADO/OLE DB model is the support for copying an entire set of records. In .NET it's easy to copy a DataSet:

 DataSet source = {some dataset};
 DataSet dest = source.Copy();

This creates an exact copy of the source DataSet: each DataTable, DataColumn, DataRow, and DataRelation will be copied, and all data will be in exactly the same state as it was in the source. If all you want to copy is the schema of the DataSet, you can use the following code:

 DataSet source = {some dataset};
 DataSet dest = source.Clone();

This again copies all tables, relations, and so on. However, each copied DataTable will be empty. This process really couldn't be more straightforward.

A common requirement when writing a tiered system, whether based on Win32 or the Web, is to be able to ship as little data as possible between tiers. This reduces the amount of resources consumed.

To cope with this requirement, the DataSet class has the GetChanges() method. This simple method performs a huge amount of work, and returns a DataSet with only the changed rows from the source data set. This is ideal for passing data between tiers, because only a minimal set of data has to be passed along.

The following example shows how to generate a "changes" DataSet:

 DataSet source = {some dataset};
 DataSet dest = source.GetChanges();

Again, this is trivial. Under the hood, things are a little more interesting. There are two overloads of the GetChanges() method. One overload takes a value of the DataRowState enumeration, and returns only rows that correspond to that state (or combination of states). Internally, the parameterless GetChanges() first checks whether there are any changes by calling HasChanges(); if no changes have been made, null is returned to the caller immediately. Otherwise it calls GetChanges(DataRowState.Deleted | DataRowState.Modified | DataRowState.Added).
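For example, assuming source is a populated DataSet, you could pick up just the modified and added rows, ignoring deletions:

```csharp
// Only rows modified or added since the last AcceptChanges() call;
// deleted rows are filtered out by the state mask.
DataSet dest = source.GetChanges(DataRowState.Modified | DataRowState.Added);
```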

The next operation is to clone the current DataSet. Once that is done, the new DataSet is set up to ignore constraint violations (EnforceConstraints = false), and then each changed row from every table is copied into the new DataSet.

When you have a DataSet that just contains changes, you can then move these off to the data services tier for processing. After the data has been updated in the database, the "changes" DataSet can be returned to the caller (for example, there might be some output parameters from the stored procedures that have updated values in the columns). These changes can then be merged into the original DataSet using the Merge() method. Figure 19-9 depicts this sequence of operations.
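The round trip described above can be sketched as follows. Here dataServices.UpdateData() is a placeholder for whatever your data services tier exposes to push changes into the database; it is not part of ADO.NET:

```csharp
// Application tier: extract just the changed rows.
DataSet changes = source.GetChanges();
if (changes != null)
{
    // Data services tier: apply the changes to the database and
    // return the rows, which may now carry updated column values
    // (for example, from stored procedure output parameters).
    DataSet processed = dataServices.UpdateData(changes);

    // Application tier: fold the processed rows back into the
    // original DataSet, then mark everything as up to date.
    source.Merge(processed);
    source.AcceptChanges();
}
```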

Figure 19-9

Key Generation with SQL Server

The RegionInsert stored procedure presented earlier in this chapter is one example of generating a primary key value on insertion into the database. The method for generating the key in this particular example is fairly crude and wouldn't scale well, so for a real application you should use some other strategy for generating keys.

Your first instinct might be to define an identity column, and return the @@IDENTITY value from the stored procedure. The following stored procedure shows how this might be defined for the Categories table in the Northwind example database. Type this stored procedure into SQL Query Analyzer, or run the StoredProcs.sql file that is part of the code download:

 CREATE PROCEDURE CategoryInsert(@CategoryName NVARCHAR(15),
    @Description NTEXT,
    @CategoryID INTEGER OUTPUT) AS
    SET NOCOUNT OFF
    INSERT INTO Categories (CategoryName, Description)
       VALUES(@CategoryName, @Description)
    SELECT @CategoryID = @@IDENTITY
 GO

This inserts a new row into the Categories table, and returns the generated primary key (the value of the CategoryID column) to the caller. You can test the procedure by typing the following SQL into Query Analyzer:

 DECLARE @CatID int;
 EXECUTE CategoryInsert 'Pasties', 'Heaven Sent Food', @CatID OUTPUT;
 PRINT @CatID;

When executed as a batch of commands, this inserts a new row into the Categories table, and returns the identity of the new record, which is then displayed to the user.

Suppose that some months down the line, someone decides to add a simple audit trail, which will record all insertions and modifications made to the category name. In that case, you define a table similar to the one shown in Figure 19-10, which will record the old and new value of the category.

Figure 19-10

The script for this table is included in the StoredProcs.sql file. The AuditID column is defined as an IDENTITY column. You then construct a couple of database triggers that will record changes to the CategoryName field:

 CREATE TRIGGER CategoryInsertTrigger
    ON Categories
    AFTER UPDATE
 AS
    INSERT INTO CategoryAudit(CategoryID, OldName, NewName)
       SELECT old.CategoryID, old.CategoryName, new.CategoryName
       FROM Deleted AS old, Categories AS new
       WHERE old.CategoryID = new.CategoryID;
 GO

If you are used to Oracle triggers, note that SQL Server doesn't have the concept of OLD and NEW rows. Instead, for an insert trigger the new rows are available in an in-memory table called Inserted, and for deletes and updates the old rows are available in the Deleted table.

This trigger retrieves the CategoryID of the record(s) affected, and stores this together with the old and new value of the CategoryName column.
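The second of the "couple of triggers" mentioned above is the insert-side audit trigger, which is what actually fires when CategoryInsert runs. A sketch of what it might look like follows; the trigger name and the decision to record NULL as the old name for brand-new rows are assumptions for illustration:

```sql
CREATE TRIGGER CategoryAuditInsertTrigger
   ON Categories
   AFTER INSERT
AS
   -- A brand-new category has no previous name, so record NULL
   -- for OldName. The new rows are read from the Inserted table.
   INSERT INTO CategoryAudit(CategoryID, OldName, NewName)
      SELECT CategoryID, NULL, CategoryName
      FROM Inserted;
GO
```

Note that the INSERT into CategoryAudit generates a new identity value of its own, which is the root of the problem described next.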

Now, when you call your original stored procedure to insert a new category, you receive an identity value; however, this is no longer the identity value from the row inserted into the Categories table. It is now the value generated for the row the audit trigger inserted into the CategoryAudit table. Ouch!

To view the problem first hand, open a copy of SQL Server Enterprise Manager, and view the contents of the Categories table (see Figure 19-11).

Figure 19-11

This lists all the categories in the Northwind database.

The next identity value for the Categories table should be 9, so a new row can be inserted by executing the following code, to see what ID is returned:

 DECLARE @CatID int;
 EXECUTE CategoryInsert 'Pasties', 'Heaven Sent Food', @CatID OUTPUT;
 PRINT @CatID;

The output value of this on a test PC was 1. If you look at the CategoryAudit table shown in Figure 19-12, you will find that this is the identity of the newly inserted audit record, not the identity of the category record created.

Figure 19-12

The problem lies in the way that @@IDENTITY actually works. It returns the last identity value created on your connection, regardless of the scope in which that value was generated, so the value produced by the trigger's insert into CategoryAudit replaces the one generated for the Categories row. As Figure 19-12 shows, this makes @@IDENTITY unreliable in the presence of triggers.

Two other identity functions can be used instead of @@IDENTITY, but neither is free from possible problems. The first, SCOPE_IDENTITY(), returns the last identity value created within the current scope; SQL Server defines a scope as a stored procedure, trigger, or function. This works most of the time, but if someone later adds another INSERT statement to the stored procedure, you will receive the identity value from that statement rather than the one you expected.

The other identity function, IDENT_CURRENT(), returns the last identity value generated for a given table in any scope. For example, if two users were accessing SQL Server at exactly the same time, it might be possible to receive the other user's generated identity value.

As you might imagine, tracking down a problem of this nature isn't easy. The moral of the story is to beware when utilizing IDENTITY columns in SQL Server.
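If you do stay with IDENTITY columns, SCOPE_IDENTITY() is usually the safest of the three options for this scenario, because identity values generated inside triggers belong to a different scope. A reworked version of the earlier CategoryInsert procedure might look like this (a sketch, not the version from the code download):

```sql
CREATE PROCEDURE CategoryInsert(@CategoryName NVARCHAR(15),
   @Description NTEXT,
   @CategoryID INTEGER OUTPUT) AS
   SET NOCOUNT OFF
   INSERT INTO Categories (CategoryName, Description)
      VALUES(@CategoryName, @Description)
   -- SCOPE_IDENTITY() ignores identity values generated by triggers,
   -- which run in their own scope, so the audit table no longer
   -- interferes with the returned key.
   SELECT @CategoryID = SCOPE_IDENTITY()
GO
```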

Naming Conventions

The following tips and conventions are not directly .NET related. However, they are worth sharing and following, especially when naming constraints. Feel free to skip this section if you already have your own views on this subject.

Conventions for database tables

  • Always use singular names — Product rather than Products. This one largely comes from having to explain a database schema to customers; it's much better grammatically to say "The Product table contains products" than "The Products table contains products." Check out the Northwind database to see an example of how not to do this.

  • Adopt some form of naming convention for the fields that go into a table — Ours is <Table>_Id for the primary key of a table (assuming that the primary key is a single column), Name for the field considered to be the user-friendly name of the record, and Description for any textual information about the record itself. Having a good table convention means you can look at virtually any table in the database and instinctively know what the fields are used for.

Conventions for database columns

  • Use singular rather than plural names.

  • Any columns that link to another table should be named the same as the primary key of that table. For example, a link to the Product table would be Product_Id, and to the Sample table Sample_Id. This isn't always possible, especially if one table has multiple references to another. In that case use your own judgment.

  • Date fields should have a suffix of _On, as in Modified_On and Created_On. Then it's easy to read some SQL output and infer what a column means just by its name.

  • Fields that record the user should be suffixed with _By, as in Modified_By and Created_By. Again, this aids legibility.

Conventions for constraints

  • If possible, include in the name of the constraint the table and column name, as in CK_<Table>_<Field>. For example, CK_Person_Sex for a check constraint on the Sex column of the Person table. A foreign key example would be FK_Product_Supplier_Id, for the foreign key relationship between product and supplier.

  • Show the type of constraint with a prefix, such as CK for a check constraint and FK for a foreign key constraint. Feel free to be more specific, as in CK_Person_Age_GT0 for a constraint on the age column indicating that the age should be greater than zero.

  • If you have to trim the length of the constraint, do it on the table name part rather than the column name. When you get a constraint violation, it's usually easy to infer which table was in error, but sometimes not so easy to check which column caused the problem. Oracle has a 30-character limit on names, which is easy to surpass.
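As an illustration of the column and constraint conventions above, a Person table might be declared as follows (the table and its columns are invented for the example):

```sql
CREATE TABLE Person
(
   Person_Id   INT           NOT NULL,   -- <Table>_Id primary key
   Name        NVARCHAR(50)  NOT NULL,   -- user-friendly name of the record
   Description NVARCHAR(200),            -- free-text information
   Sex         CHAR(1),
   Age         INT,
   Created_On  DATETIME,                 -- date fields suffixed _On
   Created_By  NVARCHAR(50),             -- user fields suffixed _By
   CONSTRAINT PK_Person PRIMARY KEY (Person_Id),
   -- CK_<Table>_<Field> for check constraints; a more specific
   -- suffix (GT0) documents the rule being enforced.
   CONSTRAINT CK_Person_Sex CHECK (Sex IN ('M', 'F')),
   CONSTRAINT CK_Person_Age_GT0 CHECK (Age > 0)
)
```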

Stored procedures

Just like the obsession many have fallen into over the past few years of putting a C in front of each and every class they have declared (you know you have!), many SQL Server developers feel compelled to prefix every stored procedure with sp_ or something similar. This is not a good idea.

SQL Server uses the sp_ prefix for all (well, most) system stored procedures. So you risk confusing your users into thinking that sp_widget is something that comes as standard with SQL Server. In addition, when resolving a stored procedure name, SQL Server treats procedures with the sp_ prefix differently from those without.

If you use this prefix and do not qualify the database/owner of the stored procedure, SQL Server will look in the current scope and then jump to the master database and look up the stored procedure there. Without the sp_ prefix, your users would get an error a little earlier. Worse still, it is possible to create a local stored procedure (one within your database) that has the same name and parameters as a system stored procedure. Avoid this at all costs; if in doubt, don't prefix.

When calling stored procedures, always prefix with the owner of the procedure, as in dbo.selectWidgets. This is slightly faster than not using the prefix as SQL Server has less work to do to find the stored proc. Something like this is not likely to have a huge impact on the execution speed of your application, but it is a tuning trick that is essentially available for free.
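From C#, an owner-qualified call might look like the following sketch, reusing the CategoryInsert procedure from earlier in the section (the connection string is an assumption):

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

// Call the owner-qualified stored procedure and read back the
// generated primary key from the output parameter.
using (SqlConnection conn = new SqlConnection(connectionString))
{
    SqlCommand cmd = new SqlCommand("dbo.CategoryInsert", conn);
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("@CategoryName", "Pasties");
    cmd.Parameters.AddWithValue("@Description", "Heaven Sent Food");
    SqlParameter id = cmd.Parameters.Add("@CategoryID", SqlDbType.Int);
    id.Direction = ParameterDirection.Output;

    conn.Open();
    cmd.ExecuteNonQuery();
    Console.WriteLine("New CategoryID: {0}", id.Value);
}
```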

Above all, when naming entities, whether within the database or within code, be consistent.




Professional C# 2005