Data Sets | Windows Forms Programming in C#

The main unit of any data-centric application is the data set , a data-provider-neutral collection of tables with optional relationship and constraint information. Each data set contains data tables, each of which consists of zero or more data rows, which hold the actual data. In addition, each data table contains individual data columns , which have metadata that describes the type of data each data row can contain.

A data set can be populated by hand but is most commonly populated via a data adapter , which knows how to speak a data-provider-specific protocol to get and set the data. The data adapter uses a data connection , which is a communications pipe to the data itself, whether it lives on a file in the file system or in a database on another machine. A data command is used with the connection to retrieve the rows in question or to otherwise act on the data provider.

Data sets, tables, rows, and columns are data-source-neutral, but data adapters and connections are specific to a data source. The specifics serve as a bridge between the data provider and the .NET data-provider-neutral services, such as data binding (covered in Chapter 13).

The basic pieces of the .NET data architecture, known as ADO.NET, are shown in Figure 12.1.

Figure 12.1. .NET Data Architecture

This chapter and Chapter 13 include a lot of code samples that depend on an instance of SQL Server running with the Northwind database installed. If you don't have SQL Server running, you can install the Microsoft SQL Server Developer Edition (MSDE) that comes with the .NET Framework SDK. Follow the instructions in Start Programs Microsoft .NET Framework SDK Samples and QuickStartTutorials Install the .NET Framework Samples Database. ^[2]

^[2] If you're using SQL Server, whether it's MSDE or not, you'll want to watch out for the SQL Slammer, which is covered at http://www.microsoft.com/security/slammer.asp.

Retrieving Data

Given this basic architecture, the following shows an example of filling a DataSet object using the classes from the System.Data namespace and the SQL Server data provider classes from the System.Data.SqlClient namespace:

 using System.Data; using System.Data.SqlClient; // Access to SQL Server  ...   // A data set for use by the form   DataSet dataset = new DataSet();   void Form1_Load(object sender, EventArgs e) {   // Configure the connection   SqlConnection conn = new SqlConnection(@"Server=localhost;...");   // Create the adapter from the connection   SqlDataAdapter adapter = new SqlDataAdapter(conn.CreateCommand());   adapter.SelectCommand.CommandText = "select * from customers";   // Fill the data set with the Customers table   adapter.Fill(dataset);  // Populate list box   PopulateListBox(); } void PopulateListBox() {   // Clear the list box   listBox1.Items.Clear();  // Enumerate cached data   foreach( DataRow row in dataset.Tables[0].Rows ) {   string item = row["ContactTitle"] + ", " + row["ContactName"];  listBox1.Items.Add(item);  }  }

This code creates a connection using a data-provider-specific connection string , which tells the connection where to go to get the data. It then creates an adapter with the appropriate command text to retrieve data over the connection. The adapter is used to fill the data set, which produces one table. The code then enumerates the table's rows, picking out columns by name that we happen to know that the table will contain. Then it uses the data to populate items in a list box, as shown in Figure 12.2.

Figure 12.2. Showing Retrieved Data

Notice that although the sample code creates a connection, it never opens or closes it. Instead, the data adapter opens the connection for an operation ”in this case, retrieving the data and filling the data set ”and closes it when an operation is complete. The data set itself never uses the connection, nor does it know about where the data comes from. It's the data adapter's job to translate data in provider-specific format into the provider-neutral data set.

Because the data set has no concept of a connection to the provider, it is a cache of both data and operations on data. Data can be updated and even removed from the data set, but those operations aren't reflected to the actual provider until you tell the data adapter to do so. Before we discuss that, however, let's take a look at the rest of the common data operations: creating, updating, and deleting data.

Creating Data

Creating a new row in a table is a matter of asking the table for an empty DataRow object and filling it with column data:

 void addRowMenuItem_Click(object sender, EventArgs e) {  // Ask table for an empty DataRow   DataRow row = dataset.Tables[0].NewRow();   // Fill DataRow with column data   row["CustomerID"] = "SELLSB";   ...   // Add DataRow to the table   dataset.Tables[0].Rows.Add(row);  // Update list box   PopulateListBox(); }

Updating Data

You can update existing data by reaching into the data set, pulling out the row of interest, and updating the column data as appropriate:

 void updateSelectedRowMenuItem_Click(object sender, EventArgs e) {   // Get selection index from list box   int index = listBox1.SelectedIndex;   if( index == -1 ) return;  // Get row from data set   DataRow row = dataset.Tables[0].Rows[index];   // Update the row as appropriate   row["ContactTitle"] = "CEO";  // Update list box   PopulateListBox(); }

Deleting Data

Deleting a row from a table requires first deciding just how "deleted" you'd like it to be. If you'd like the row to be gone from the table completely, leaving no trace behind, that involves the Remove method on the DataRowCollection exposed from the DataTable:

 void deleteSelectedRowMenuItem_Click(object sender, EventArgs e) {   // Get selection index from list box   int index = listBox1.SelectedIndex;   if( index == -1 ) return;   // Get row from data set   DataRow row = dataset.Tables[0].Rows[index];  // Remove the row from the data set   dataset.Tables[0].Rows.Remove(row);  // Update list box   PopulateListBox(); }

However, this is probably more "deleted" than you'd like, especially if you plan to replicate changes made to the data set back to the originating data provider. In that case, you'll want to mark a row as deleted but not remove all traces of it from the data set. You do that using the Delete method on the DataRow itself:

 void deleteSelectedRowMenuItem_Click(object sender, EventArgs e) {   // Get selection index from list box   int index = listBox1.SelectedIndex;   if( index == -1 ) return;   // Get row from data set   DataRow row = dataset.Tables[0].Rows[index];  // Mark the row as deleted   row.Delete();  // Update list box   PopulateListBox(); }

When a DataTable contains deleted rows, you'll need to change how you access the data, because the DataTable class doesn't allow direct access to deleted rows. This is to prevent you from accidentally treating deleted rows like normal rows. Checking for a deleted row is a matter of checking a row's RowState, which is a combination of values from the DataRowState enumeration:

 enum DataRowState {   Added,    Deleted,   Detached,   Modified,   Unchanged, }

Taking deleted rows into account looks like this:

 void PopulateListBox() {   // Clear the list box   listBox1.Items.Clear();   // Enumerate cached data   foreach( DataRow row in dataset.Tables[0].Rows ) {  if( (row.RowState & DataRowState.Deleted) !=   DataRowState.Deleted ) continue;  string item = row["ContactTitle"] + ", " + row["ContactName"];     listBox1.Items.Add(item);   } }

By default, when you access column data, you're getting the "current" data, which, for deleted columns, is missing (and attempted access to it will cause a run-time exception). All data is marked with a value from the DataRowVersion enumeration:

 enum DataRowVersion {   Current,   Default,   Original,   Proposed, }

To retrieve old or deleted column data, you can pass a value from DataRowVersion as the second argument to the row's indexer:

 void PopulateListBox() {   // Clear the list box   listBox1.Items.Clear();   // Enumerate cached data   foreach( DataRow row in dataset.Tables[0].Rows ) {     if( (row.RowState & DataRowState.Deleted) !=          DataRowState.Deleted ) {       string id =  row["CustomerID", DataRowVersion.Original]  .ToString();       listBox1.Items.Add("***deleted***: " + id);       continue;     }     ...   } }

Tracking Changes

When a DataRow makes its way into a DataSet as a result of the DataAdapter's Fill method, the RowState is set to Unchanged and, as I mentioned, using the DataRow Delete method sets the RowState to Deleted. Similarly, adding new rows and updating existing ones sets the RowState to Added and Modified, respectively. This turns the data set into not only a repository for the current state of the cached data but also a record of the changes that have been made to the data since it was initially retrieved. You can get these changes on a per-table basis by using the GetChanges method of the DataTable:

 DataTable tableChanges =   dataset.Tables[0].GetChanges(DataRowState.Modified); if( tableChanges != null ) {   foreach( DataRow changedRow in tableChanges.Rows ) {     MessageBox.Show(changedRow["CustomerID"] + " modified");   } }

The GetChanges method takes a combination of DataRowState values and returns a table that has a copy of only those rows. The rows are copied so that there's no need to worry about accessing deleted data, an attempt that would normally throw an exception. You can use the GetChanges method to find all the modified, added, and deleted rows, together or selectively. This is a handy way to access the data that you need to replicate changes back to the data provider.

Committing Changes

The combination of the GetChanges method and the DataRowVersion enumeration allows you to build commands for replicating changes made to the data set back to the data provider. In the case of database-centric data adapters, you retrieve data using an instance of a command , which is responsible for selecting the data set via the SelectCommand property. In fact, recall the earlier code that set up the data adapter:

 // Configure the connection SqlConnection conn = new SqlConnection(@"..."); // Create the adapter from the connection string select = "select * from customers"; SqlDataAdapter adapter = new  SqlDataAdapter(select, conn);

This code is really just a shortcut for the following code, which creates a command to perform the select directly:

 // Configure the connection SqlConnection conn = new SqlConnection(@"..."); // Create the adapter from the connection string select = "select * from customers"; SqlDataAdapter adapter = new  SqlDataAdapter()  ;  adapter.SelectCommand = new SqlCommand(select, conn);

It's the Command object that's responsible for using the connection to retrieve the data, and it's the data adapter's job to keep track of the command it needs to retrieve the data. Similarly, the data adapter uses other commands for replicating changes back, where "changes" includes added rows, updated rows, and deleted rows. It does this using commands that are set via the InsertCommand, UpdateCommand, and DeleteCommand properties, respectively.

You can populate these commands yourself, but it's generally easier to let a command builder do that work for you. A command builder is an object that uses the information it gets from the select command and populates the other three commands appropriately:

 // Create the adapter from the connection with a select command SqlDataAdapter   adapter = new SqlDataAdapter("select * from customers", conn);  // Let command builder build commands for insert, update, and delete   // using the information from the existing select command   new SqlCommandBuilder(adapter);

The command builder is so self-sufficient that you don't even need to keep it around. The mere act of creating it, passing the adapter that needs commands built, is enough. After the command builder has set up the adapter's commands appropriately, you replicate changes back to the data provider by calling the adapter's Update method:

 void commitChangesMenuItem_Click(object sender, EventArgs e) {   // Configure the connection   SqlConnection conn = new SqlConnection(@"...");   // Create the adapter from the connection with a select command   SqlDataAdapter     adapter = new SqlDataAdapter("select * from customers", conn);  // Let command builder build commands for insert, update, and delete   new SqlCommandBuilder(adapter);   // Commit changes back to the data provider   try {   adapter.Update(dataset);   }   catch( SqlException ex ) {   MessageBox.Show(ex.Message, "Error(s) Committing Changes");   }  // Update list box   PopulateListBox(); }

This code uses a command builder to build the other three commands needed to update the data provider and then lets the data adapter compose the command text as necessary. If any of the updates causes an error, a run-time exception will be thrown, and that's why the code shows the call to Update wrapped in a try-catch block. Error information is kept for each row so that you can show it to the user :

 void PopulateListBox() {   // Clear the list box   listBox1.Items.Clear();   // Enumerate cached data   foreach( DataRow row in dataset.Tables[0].Rows ) {     if( (row.RowState & DataRowState.Deleted) !=          DataRowState.Deleted ) continue;     string item = row["ContactTitle"] + ", " + row["ContactName"];     if(  row.HasErrors  ) item += "(***" +  row.RowError  + "***)";     listBox1.Items.Add(item);   } }

The HasErrors Boolean property of each row reports whether there was an error during the last update, and the RowError string reports what that error is. If there are errors during an update, the RowState of the row will not be changed. For every row that doesn't have errors, it will be reset to DataRowState.Unchanged in preparation for the next update.

Multitable Data Sets

Data sets can hold more than one table at a time. When creating data sets that contain multiple tables, you will want to use one data adapter for each table loaded. In addition, you must be careful when filling a data set using more than one adapter. If you call the data adapter's Fill method on a data set multiple times, you'll end up appending data into a single table, so you need to be specific about what table you're trying to fill:

 // Configure the connection SqlConnection conn = new SqlConnection(@"..."); // Create the adapters  SqlDataAdapter customersAdapter = new SqlDataAdapter();   SqlDataAdapter ordersAdapter = new SqlDataAdapter();  // Create a data set DataSet dataset = new DataSet(); void MultiTableForm_Load(object sender, EventArgs e) {   // Create the Customer adapter from the connection   customersAdapter.SelectCommand = conn.CreateCommand();   customersAdapter.SelectCommand.CommandText =     "select * from customers";  // Fill the data set with the Customers table   customersAdapter.Fill(dataset, "Customers");  // Create the Orders adapter from the connection   ordersAdapter.SelectCommand = conn.CreateCommand();   ordersAdapter.SelectCommand.CommandText =     "select * from orders";  // Fill the data set with the Orders table   ordersAdapter.Fill(dataset, "Orders");   // Need one command builder for each adapter   // in anticipation of eventually committing changes   new SqlCommandBuilder(customersAdapter);   new SqlCommandBuilder(ordersAdapter);  // Populate list boxes   PopulateListBoxes(); }

This code fills a data set with data from two different data adapters, one for each table. When you call the Fill method of the data adapter, you must specify which table to fill with the data from the adapter. If you fail to do this, you will get one data table (called "Table") with data from both Fill methods .

You could have used a single data adapter to fill both tables, but because the command builders use the SelectCommand to determine how to update the data provider, it is good form to have a one-to-one relationship between tables in the data set and data adapters. Notice that when the data adapters are created, one command builder is created for each of them in anticipation of committing changes for each table. With more than one table, the code to commit changes needs to be updated:

 void commitChangesMenuItem_Click(object sender, EventArgs e) {   // Commit customer changes back to the data provider   try {  customersAdapter.Update(dataset, "Customers");  }   catch( SqlException ex ) {     MessageBox.Show(ex.Message,       "Error(s) Committing Customer Changes");   }   // Commit order changes back to the data provider   try {  ordersAdapter.Update(dataset, "Orders");  }   catch( SqlException ex ) {     MessageBox.Show(ex.Message,       "Error(s) Committing Order Changes");   }   // Update list boxes   PopulateListBoxes(); }

This code commits changes on each table by calling the Update method of the particular data adapter while specifying the table to update. Make sure not to get the adapter mixed up with the name of the table, or things won't go so well.

Constraints

If you'd like to catch problems with the data as it's added by the user instead of waiting until the data is sent back to the data provider, you can establish constraints. A constraint limits the kind of data that can be added to each column. The System.Data namespace comes with two constraints: the foreign key constraint and the unique value constraint, which are represented by the ForeignKeyConstraint and the UniqueConstraint classes, respectively.

For example, to make sure that no two rows have the same value in a column, you can add a unique constraint to the table's list of constraints:

 // Add a constraint DataTable customers = dataset.Tables["Customers"]; UniqueConstraint constraint =   new UniqueConstraint(customers.Columns["CustomerID"]); customers.Constraints.Add(constraint);

With the constraint in place, if a row is added to the table that violates the constraint, a run-time exception will be thrown immediately, without a round-trip to the data provider. Unique constraints set up a constraint between one or more columns in a single table, whereas foreign key constraints set up an existence requirement between columns in multiple tables. Foreign key constraints are set up automatically whenever a relation is established.

Relations

Data sets are not simply containers for multiple tables of data but instead are containers that have support for relational data. As with data in a database, the real power of a data set can be harnessed by relating multiple tables:

 // Get reference to the tables DataTable customers = dataset.Tables["Customers"]; DataTable orders = dataset.Tables["Orders"]; // Create the relation DataRelation relation =   new DataRelation(     "CustomersOrders",     customers.Columns["CustomerID"],     orders.Columns["CustomerID"]); // Add the relation dataset.Relations.Add(relation);

This code creates a relation between the customer and order tables on each table's CustomerID column. A relation is a named association of columns between multiple tables. To relate columns between tables, you use an instance of a DataRelation class, passing a name and the columns from each of the two tables. Figure 12.3 shows the sample relation between the Customers and the Orders tables.

Figure 12.3. A Sample Relation between the Customers Table and the Orders Table

After the relation is created, it's added to the set of relations maintained on the data set, an action that also sets up the foreign key constraint. In addition, relations are used for navigation and in expressions.

Navigation

When a relation is added, the second argument to the DataRelation constructor becomes the parent column , and the third argument becomes the child column . You can navigate between the two using the DataRow methods GetParentRows and GetChildRows, respectively. This allows you to show, for example, related child rows when a parent row is selected, as shown in Figure 12.4.

Figure 12.4. Showing the Results of GetChildRows Using a Relation

In Figure 12.4, the top list box shows the customers, which form the parent in the CustomersOrders relation. When a customer is selected, the bottom list box is populated with the related rows:

 void PopulateChildListBox() {   // Clear the list box   ordersListBox.Items.Clear();  // Get the currently selected parent customer row   int index = customersListBox.SelectedIndex;  if( index == -1 ) return;  // Get row from data set   DataRow parent = dataset.Tables["Customers"].Rows[index];  // Enumerate child rows   foreach( DataRow row in  parent.GetChildRows("CustomersOrders")  ) {     ...   } }

Similarly, from any child row, a relation can be navigated back to the parent using GetParentRows.

Expressions

Relations can also be used in expressions. An expression is a column of values that are calculated on-the-fly . The Expression property on the DataColumn class provides this functionality. For example, here's an expression to combine the ContactTitle and ContactName fields:

 // Create the expression column DataColumn exp = new DataColumn(); exp.ColumnName = "ContactTitleName"; exp.DataType = typeof(string); exp.Expression = "ContactTitle + ', ' + ContactName";   // Add it to the customer table dataset.Tables["Customers"].Columns.Add(exp);

This code creates a new DataColumn and specifies the name, data type, and expression. The expression syntax is fairly straightforward and is syntactically similar to SQL. You can find the complete reference to the expression syntax in the DataColumn's Expression property documentation. ^[3]

^[3] http://msdn.microsoft.com/library/en-us/cpref/html/frlrfSystemDataDataColumnClassExpressionTopic.asp

After you've created an expression column, you can use it like any other column:

 void PopulateListBoxes() {   // Clear the list box   customersListBox.Items.Clear();   // Enumerate cached data   foreach( DataRow row in dataset.Tables["customers"].Rows ) {  // Use the expression instead of composing the string   //string item = row["ContactTitle"] + ", " + row["ContactName"];   string item = row["ContactTitleName"].ToString();  customersListBox.Items.Add(item);   }   PopulateChildListBox(); }

An expression can navigate a relation from child to parent or vice versa. For example, the orders table doesn't have a contact name for the parent customer, but you can use an expression to grab the data from the parent:

 // Create the expression column DataColumn exp2 = new DataColumn(); exp2.ColumnName = "CustomerContactName"; exp2.DataType = typeof(string);  exp2.Expression = "parent(CustomersOrders).ContactName";  // Add it to the customer table dataset.Tables["Orders"].Columns.Add(exp2);

This code uses the parent(relationName).columnName syntax to navigate from the child orders table to the parent customers table. If there's only one relationship, then you can just use parent.columnName. Similarly, when going from a parent to a child, you use the child(relationName).columnName syntax.