Managing Data and Relationships: The DataSet

 
Chapter 9 - Data Access with .NET
bySimon Robinsonet al.
Wrox Press 2002
  

The DataSet class has been designed as an offline container of data. It has no notion of database connections. In fact, the data held within a DataSet doesn't necessarily need to have come from a database it could just as easily be records from a CSV file, or points read from a measuring device.

A DataSet consists of a set of data tables, each of which will have a set of data columns and data rows. In addition to defining the data, you can also define links between tables within the DataSet . One common scenario would be when defining a parent-child relationship (commonly known as master/detail). One record in a table (say Order ) links to many records in another table (say Order_Details ). This relationship can be defined and navigated within the DataSet .

click to expand

The following sections describe the classes that are used with a DataSet .

Data Tables

A data table is very similar to a physical database table it consists of a set of columns with particular properties, and may contain zero or more rows of data. A data table may also define a primary key, which can be one or more columns, and may also contain constraints on columns. The generic term for this information used throughout the rest of the chapter is schema .

There are several ways to define the schema for a particular data table (and indeed the DataSet as a whole). These are discussed after we introduce data columns and data rows.

The following diagram shows some of the objects that are accessible through the data table:

click to expand

A DataTable object (and also a DataColumn ) can have an arbitrary number of extended properties associated with it. This collection can be populated with any user -defined information pertaining to the object. For example, a given column might have an input mask used to validate the contents of that column the typical example would be the US social security number. Extended properties are especially useful when the data is constructed within a middle tier and returned to the client for some processing. You could, for example, store validation criteria (such as min and max ) for numeric columns.

When a data table has been populated, either by selecting data from a database, reading data from a file, or manually populating within code, the Rows collection will contain this retrieved data.

The Columns collection contains DataColumn instances that have been added to this table. These define the schema of the data, such as the data type, nullability, default values, and so on. The Constraints collection can be populated with either unique or primary key constraints.

One example of where the schema information for a data table is used is when displaying that data in a DataGrid (which we'll discuss at length in the next chapter). The DataGrid control uses properties such as the data type of the column to decide what control to use for that column. A bit field within the database will be displayed as a checkbox within the DataGrid . If a column is defined within the database schema as NOT NULL , then this fact will be stored within the DataColumn so that it can be tested when the user attempts to move off a row.

Data Columns

A DataColumn object defines properties of a column within the DataTable , such as the data type of that column, whether the column is read only, and various other facts. A column can be created in code, or can be automatically generated by the runtime.

When creating a column, it is also useful to give it a name ; otherwise the runtime will generate a name for you in the form Columnn where n is an incrementing number.

The data type of the column can be set either by supplying it in the constructor, or by setting the DataType property. Once you have loaded data into a data table you cannot alter the type of a column you'll just receive an ArgumentException .

Data columns can be created to hold the following .NET Framework data types:

Boolean

Decimal

Int64

TimeSpan

Byte

Double

Sbyte

UInt16

Char

Int16

Single

UInt32

DateTime

Int32

String

UInt64

Once created, the next thing to do with a DataColumn object is to set up other properties, such as the nullability of the column or the default value. The following code fragment shows a few of the more common options to set on a DataColumn :

   DataColumn customerID = new DataColumn("CustomerID" , typeof(int));     customerID.AllowDBNull = false;     customerID.ReadOnly = false;     customerID.AutoIncrement = true;     customerID.AutoIncrementSeed = 1000;     DataColumn name = new DataColumn("Name" , typeof(string));     name.AllowDBNull = false;     name.Unique = true;   

The following properties can be set on a DataColumn :

Property

Description

AllowDBNull

If true , permits the column to be set to DBNull .

AutoIncrement

Defines that this column value is automatically generated as an incrementing number.

AutoIncrementSeed

The initial seed value for an AutoIncrement column.

AutoIncrementStep

Defines the step between automatically generated column values, with a default of one.

Caption

Can be used for displaying the name of the column on screen.

ColumnMapping

Defines how a column is mapped into XML when a DataSet is saved by calling DataSet.WriteXml .

ColumnName

The name of the column. This is auto-generated by the runtime if not set in the constructor.

DataType

The System.Type value of the column.

DefaultValue

Can define a default value for a column.

Expression

This property defines the expression to be used in a computed column.

Data Rows

This class makes up the other part of the DataTable class. The columns within a data table are defined in terms of the DataColumn class. The actual data within the table is accessed using the DataRow object. The following example shows how to access rows within a data table. The code for this example is available in the 07_SimpleDatasetSql directory. First, the connection details:

 string source = "server=(local)\NetSDK;" +                 "uid=QSUser;pwd=QSPassword;" +                  "database=northwind"; string select = "SELECT ContactName,CompanyName FROM Customers"; SqlConnection  conn = new SqlConnection(source); 

The following code introduces the SqlDataAdapter class, which is used to place data into a DataSet . The SqlDataAdapter will issue the SQL clause, and fill a table in the DataSet called Customers with the output of this following query. We'll be discussing the data adapter class further in the Populating a DataSet section.

   SqlDataAdapter da = new SqlDataAdapter(select, conn);     DataSet ds = new DataSet();     da.Fill(ds , "Customers");   

In the code below, you may notice the use of the DataRow indexer to access values from within that row. The value for a given column can be retrieved using one of the several overloaded indexers. These permit you to retrieve a value knowing the column number, name, or DataColumn :

   foreach(DataRow row in ds.Tables["Customers"].Rows)     Console.WriteLine("'{0}' from {1}" , row[0] ,row[1]);   

One of the most appealing aspects of a DataRow is that it is versioned. This permits you to receive various values for a given column in a particular row. The versions are described in the following table:

DataRowVersion Value

Description

Current

The value existing at present within the column. If no edit has occurred, this will be the same as the original value. If an edit (or edits) have occurred, the value will be the last valid value entered.

Default

The default value (in other words, any default set up for the column).

Original

The value of the column when originally selected from the database. If the DataRow 's AcceptChanges method is called, then this value will update to be the Current value.

Proposed

When changes are in progress for a row, it is possible to retrieve this modified value. If you call BeginEdit() on the row and make changes, each column will have a proposed value until either EndEdit() or CancelEdit() is called.

The version of a given column could be used in many ways. One example is when updating rows within the database, in which instance it is common to issue an SQL statement such as the following:

   UPDATE Products     SET    Name = Column.Current     WHERE  ProductID = xxx     AND    Name = Column.Original;   

Obviously this code would never compile, but it shows one use for original and current values of a column within a row.

To retrieve a versioned value from the DataRow , use one of the indexer methods that accept a DataRowVersion value as a parameter. The following code snippet shows how to obtain all values of each column in a DataTable :

   foreach (DataRow row in ds.Tables["Customers"].Rows)     {     foreach (DataColumn dc in ds.Tables["Customers"].Columns)     {     Console.WriteLine ("{0} Current  = {1}" , dc.ColumnName ,     row[dc,DataRowVersion.Current]);     Console.WriteLine ("    Default  = {0}" , row[dc,DataRowVersion.Default]);     Console.WriteLine ("    Original = {0}" , row[dc,DataRowVersion.Original]);     }     }   

The whole row has a state flag called RowState , which can be used to determine what operation is needed on the row when it is persisted back to the database. The RowState property is set to keep track of all the changes made to the DataTable , such as adding new rows, deleting existing rows, and changing columns within the table. When the data is reconciled with the database, the row state flag is used to determine what SQL operations should occur. These flags are defined by the DataRowState enumeration:

DataRowState Value

Description

Added

The row has been newly added to a DataTable 's Rows collection. All rows created on the client are set to this value, and will ultimately issue SQL INSERT statements when reconciled with the database.

Deleted

This indicates that the row has been marked as deleted from the DataTable by means of the DataRow.Delete() method. The row still exists within the DataTable , but will not normally be viewable on screen (unless a DataView has been explicitly set up). DataView s will be discussed in the next chapter. Rows marked as deleted in the DataTable will be deleted from the database when reconciled.

Detached

A row is in this state immediately after it is created, and can also be returned to this state by calling DataRow.Remove() . A detached row is not considered to be part of any data table, and as such no SQL for rows in this state will be issued.

Modified

A row will be Modified if the value in any column has been changed.

Unchanged

The row has not been changed since the last call to AcceptChanges() .

The state of the row depends also on what methods have been called on the row. The AcceptChanges() method is generally called after successfully updating the data source (that is, after persisting changes to the database).

The most common way to alter data in a DataRow is to use the indexer; however, if you have a number of changes to make you also need to consider the BeginEdit() and EndEdit() methods.

When an alteration is made to a column within a DataRow , the ColumnChanging event is raised on the row's DataTable . This permits you to override the ProposedValue property of the DataColumnChangeEventArgs class classes, and change it as required. This is one way of performing some data validation on column values. If you call BeginEdit() before making changes, the ColumnChanging event will not be raised. This permits you to make multiple changes and then call EndEdit() to persist these changes. If you wish to revert to the original values, call CancelEdit() .

A DataRow can be linked in some way to other rows of data. This permits the creation of navigable links between rows, which is common in master/detail scenarios. The DataRow contains a GetChildRows() method that will return an array of associated rows from another table in the same DataSet as the current row. These are discussed in the Data Relationships section later in this chapter.

Schema Generation

There are three ways to create the schema for a DataTable . These are:

  • Let the runtime do it for you

  • Write code to create the table(s)

  • Use the XML schema generator

Runtime Schema Generation

The DataRow example shown earlier presented the following code for selecting data from a database and populating a DataSet :

   SqlDataAdapter da = new SqlDataAdapter(select , conn);     DataSet ds = new DataSet();     da.Fill(ds , "Customers");   

This is obviously easy to use, but it has a few drawbacks too. One example is that you have to make do with the column names selected from the database, which may be fine , but in certain instances you might want to rename a physical database column (say PKID ) to something more user-friendly.

You could naturally rename columns within your SQL clause, as in SELECT PID AS PersonID FROM PersonTable; I would always recommend not renaming columns within SQL, as the only place a column really needs to have a "pretty" name is on screen.

Another potential problem with automated DataTable / DataColumn generation is that you have no control over the column types that the runtime chooses for your data. It does a fairly good job of deciding the correct data type for you, but as usual there are instances where you need more control. You might for example have defined an enumerated type for a given column, so as to simplify user code written against your class. If you accept the default column types that the runtime generates, the column will likely be an integer with a 32-bit range, as opposed to an enum with five options.

Lastly, and probably most problematic , is that when using automated table generation, you have no type-safe access to the data within the DataTable you are at the mercy of indexers, which return instances of object rather than derived data types. If you like sprinkling your code with typecast expressions then skip the following sections.

Hand-Coded Schema

Generating the code to create a DataTable , replete with associated DataColumns is fairly easy. The examples within this section will access the Products table from the Northwind database shown below. The code for this section is available in the 08_ManufacturedDataSet example.

click to expand

The following code manufactures a DataTable , which corresponds to the above schema.

   public static void ManufactureProductDataTable(DataSet ds)     {     DataTable   products = new DataTable("Products");     products.Columns.Add(new DataColumn("ProductID", typeof(int)));     products.Columns.Add(new DataColumn("ProductName", typeof(string)));     products.Columns.Add(new DataColumn("SupplierID", typeof(int)));     products.Columns.Add(new DataColumn("CategoryID", typeof(int)));     products.Columns.Add(new DataColumn("QuantityPerUnit", typeof(string)));     products.Columns.Add(new DataColumn("UnitPrice", typeof(decimal)));     products.Columns.Add(new DataColumn("UnitsInStock", typeof(short)));     products.Columns.Add(new DataColumn("UnitsOnOrder", typeof(short)));     products.Columns.Add(new DataColumn("ReorderLevel", typeof(short)));     products.Columns.Add(new DataColumn("Discontinued", typeof(bool)));     ds.Tables.Add(products);     }   

You can alter the code in the DataRow example to utilize this newly generated table definition as follows :

 string source = "server=localhost;" +                 "integrated security=sspi;" +                  "database=Northwind";   string select = "SELECT * FROM Products";   SqlConnection conn = new SqlConnection(source); SqlDataAdapter cmd = new SqlDataAdapter(select, conn); DataSet ds = new DataSet();   ManufactureProductDataTable(ds);     cmd.Fill(ds, "Products");     foreach(DataRow row in ds.Tables["Products"].Rows)   Console.WriteLine("'{0}' from {1}", row[0], row[1]); 

The ManufactureProductDataTable() method creates a new DataTable , adds each column in turn , and finally appends this to the list of tables within the DataSet . The DataSet has an indexer that takes the name of the table and returns that DataTable to the caller.

The above example is still not really type-safe, as I'm using indexers on columns to retrieve the data. What would be better is a class (or set of classes) derived from DataSet , DataTable, and DataRow , that define type-safe accessors for tables, rows, and columns. You can generate this code yourself it's not particularly tedious and you end up with truly type-safe data access classes.

If you don't like the sound of generating these type-safe classes yourself then help is at hand. The .NET Framework includes support for using XML schemas to define a DataSet , DataTable, and the other classes that we have touched on in this section. The XML Schemas section later in the chapter details this method; but first, we will look at relationships and constraints within a DataSet .

Data Relationships

When writing an application, it is often necessary to obtain and cache various tables of information. The DataSet class is the container for this information. With regular OLE DB it was necessary to provide a strange SQL dialect to enforce hierarchical data relationships, and the provider itself was not without its own subtle quirks .

The DataSet class on the other hand has been designed from the start to establish relationships between data tables with ease. For the code in this section I decided to hand-generate and populate two tables with data. So, if you haven't got SQL Server or the NorthWind database to hand, you can run this example anyway. The code is available in the 09_DataRelationships directory:

   DataSet ds = new DataSet("Relationships");     ds.Tables.Add(CreateBuildingTable());     ds.Tables.Add(CreateRoomTable());     ds.Relations.Add("Rooms",     ds.Tables["Building"].Columns["BuildingID"],     ds.Tables["Room"].Columns["BuildingID"]);   

The tables simply contain a primary key and name field, with the Room table having BuildingID as a foreign key.

click to expand

These tables were kept deliberately simple, as my fingers were wearing out at this point so I didn't want to add too many columns to either one.

I then added some default data to each table. Once that was done, I could then iterate through the buildings and rooms using the code below.

   foreach(DataRow theBuilding in ds.Tables["Building"].Rows)     {     DataRow[] children = theBuilding.GetChildRows("Rooms");     int roomCount = children.Length;     Console.WriteLine("Building {0} contains {1} room{2}",     theBuilding["Name"],     roomCount,     roomCount > 1 ? "s" : "");     // Loop through the rooms     foreach(DataRow theRoom in children)     Console.WriteLine("Room: {0}", theRoom["Name"]);     }   

The big difference between the DataSet and the old-style hierarchical Recordset object is in the way the relationship is presented. In a hierarchical Recordset , the relationship was presented as a pseudo-column within the row. This column itself was a Recordset that could be iterated through. Under ADO.NET, however, a relationship is traversed simply by calling the GetChildRows() method:

 DataRow[] children = theBuilding.GetChildRows("Rooms"); 

This method has a number of forms, but the simple example shown above just uses the name of the relationship to traverse between parent and child rows. It returns an array of rows that can be updated as appropriate by using the indexers as shown in earlier examples.

What's more interesting with data relationships is that they can be traversed both ways. Not only can you go from a parent to the child rows, but you can also find a parent row (or rows) from a child record simply by using the ParentRelations property on the DataTable class. This property returns a DataRelationCollection , which can be indexed using the [] array syntax (for example, ParentRelations["Rooms"] ), or as an alternative the GetParentRows() method can be called as shown below:

   foreach(DataRow theRoom in ds.Tables["Room"].Rows)     {     DataRow[] parents = theRoom.GetParentRows("Rooms");     foreach(DataRow theBuilding in parents)     Console.WriteLine("Room {0} is contained in building {1}",     theRoom["Name"],     theBuilding["Name"]);     }   

There are two methods with various overrides available for retrieving the parent row(s) GetParentRows() (which returns an array of zero or more rows), or GetParentRow() (which retrieves a single parent row given a relationship).

Data Constraints

Changing the data type of columns created on the client is not the only thing a DataTable is good for. ADO.NET permits you to create a set of constraints on a column (or columns), which are then used to enforce rules within the data.

The runtime currently supports the following constraint types, embodied as classes in the System.Data namespace.

Constraint

Description

ForeignKeyConstraint

Enforce a link between two DataTable s within a DataSet

UniqueConstraint

Ensure that entries in a given column are unique

Setting a Primary Key

As is common for a table in a relational database, you can supply a primary key, which can be based on one or more columns from the DataTable .

The code below creates a primary key for the Products table, whose schema we constructed by hand earlier, and can be found in the 08_ManufactureDataSet folder.

Note that a primary key on a table is just one form of constraint. When a primary key is added to a DataTable , the runtime also generates a unique constraint over the key column(s). This is because there isn't actually a constraint type of PrimaryKey a primary key is simply a unique constraint over one or more columns.

   public static void ManufacturePrimaryKey(DataTable dt)     {     DataColumn[] pk = new DataColumn[1];     pk[0] = dt.Columns["ProductID"];     dt.PrimaryKey = pk;     }   

As a primary key may contain several columns, it is typed as an array of DataColumn s. A table's primary key can be set to those columns simply by assigning an array of columns to the property.

To check the constraints for a table, you can iterate through the ConstraintCollection . For the auto-generated constraint produced by the above code, the name of the constraint is Constraint1 . That's not a very useful name, so to avoid this problem it is always best to create the constraint in code first, then define which column(s) make up the primary key, as we shall do now.

As a long time database programmer, I find named constraints much simpler to understand, as most databases produce cryptic names for constraints, rather than something simple and legible. The code below names the constraint before creating the primary key:

 DataColumn[] pk = new DataColumn[1]; pk[0] = dt.Columns["ProductID"];   dt.Constraints.Add(new UniqueConstraint("PK_Products", pk[0]));   dt.PrimaryKey = pk; 

Unique constraints can be applied to as many columns as you wish.

Setting a Foreign Key

In addition to unique constraints, a DataTable may also contain foreign key constraints. These are primarily used to enforce master/detail relationships, but can also be used to replicate columns between tables if you set the constraint up correctly. A master/detail relationship is one where there is commonly one parent record (say an order) and many child records (order lines), linked by the primary key of the parent record.

A foreign key constraint can only operate over tables within the same DataSet , so the following example utilizes the Categories table from the Northwind database, and assigns a constraint between it and the Products table.

click to expand

The first step is to generate a new data table for the Categories table. The 08_ManufactureDataSet example includes this code:

   DataTable categories = new DataTable("Categories");     categories.Columns.Add(new DataColumn("CategoryID", typeof(int)));     categories.Columns.Add(new DataColumn("CategoryName", typeof(string)));     categories.Columns.Add(new DataColumn("Description", typeof(string)));     categories.Constraints.Add(new UniqueConstraint("PK_Categories",     categories.Columns["CategoryID"]));     categories.PrimaryKey = new DataColumn[1]     {categories.Columns["CategoryID"]};   

The last line of the above code creates the primary key for the Categories table. The primary key in this instance is a single column; however, it is possible to generate a key over multiple columns using the array syntax shown.

Then I need to create the constraint between the two tables:

   DataColumn parent = ds.Tables["Categories"].Columns["CategoryID"];     DataColumn child = ds.Tables["Products"].Columns["CategoryID"];     ForeignKeyConstraint fk =     new ForeignKeyConstraint("FK_Product_CategoryID", parent, child);     fk.UpdateRule = Rule.Cascade;     fk.DeleteRule = Rule.SetNull;     ds.Tables["Products"].Constraints.Add(fk);   

This constraint applies to the link between Categories.CategoryID and Products.CategoryID . There are four different constructors for ForeignKeyConstraint , but again I would suggest using those that permit you to name the constraint.

Setting Update and Delete Constaints

In addition to defining the fact that there is some type of constraint between parent and child tables, you can define what should happen when a column in the constraint is updated.

The above example sets the update rule and the delete rule. These rules are used when an action occurs to a column (or row) within the parent table, and the rule is used to decide what should happen to row(s) within the child table that could be affected. There are four different rules that can be applied through the Rule enumeration:

  • Cascade If the parent key was updated then copy the new key value to all child records. If the parent record was deleted, delete the child records also. This is the default option.

  • None No action whatsoever. This option will leave orphaned rows within the child data table.

  • SetDefault Each child record affected has the foreign key column(s) set to their default value, if one has been defined.

  • SetNull All child rows have the key column(s) set to DBNull . (Following on from the naming convention that Microsoft uses, this should really be SetDBNull ).

    Important 

    Constraints are only enforced within a DataSet if the EnforceConstraints property of the DataSet is true .

I have covered the main classes that make up the constituent parts of the DataSet , and shown how to manually generate each of these classes in code. There is another way to define a DataTable , DataRow , DataColumn , DataRelation, and Constraint by using the XML schema file(s) and the XSD tool that ships with .NET. The following section describes how to set up a simple schema and generate type-safe classes to access your data.

  


Professional C#. 2nd Edition
Performance Consulting: A Practical Guide for HR and Learning Professionals
ISBN: 1576754359
EAN: 2147483647
Year: 2002
Pages: 244

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net