Working with DataRelation Objects in Code | Microsoft ADO.NET (Core Reference) (PRO-Developer)

Questions That Should Be Asked More Frequently

Q. When should I create my DataRelation objects without constraints?
A. Let's say your application queries your database for customer and order data and displays that information on a simple Windows form. The application allows the user to view the data but not modify it.

In such an application, a DataRelation is useful because it lets you easily display the orders for a particular customer. However, you probably don't want to use constraints within your DataSet. Why? Constraints are used to validate your data, and validating data takes time. If the data in the application is read-only and the database has already validated the data through its own set of constraints, there's no need for ADO.NET to revalidate the data.
Q. I want to work with data from multiple tables, but I'm not going to retrieve all rows from the parent table. The queries you showed that retrieve only the related child rows look complex and inefficient. Wouldn't a single join-based query that returned data from all tables at once be faster?
A. During the Microsoft .NET Framework beta, a thread in one of the newsgroups focused on this issue. A couple of developers felt that fetching the data from separate queries was inefficient. Their premise made sense and piqued my curiosity, so I set out to see how working with separate queries would affect performance. I built applications in Visual Basic .NET using ADO.NET and Visual Basic 6 using ADO 2.6 to query the SQL Server Northwind database for customer, order, and order detail information for all customers in the United States. I wrote various routines for each application that used a specific type of query to retrieve this information from the database.

The routine that tested the join query simply retrieved the data into a single DataTable that had no validation. The queries I used were as follows:

Join-Based Query
```
SELECT C.CustomerID, C.CompanyName, C.ContactName, C.Phone,        O.OrderID, O.EmployeeID, O.OrderDate,         D.ProductID, D.Quantity, D.UnitPrice         FROM [Order Details] D, Orders O, Customers C         WHERE D.OrderID = O.OrderID AND O.CustomerID = C.CustomerID              AND C.Country = N'USA'
```
Separate Queries
```
SELECT CustomerID, CompanyName, ContactName, Phone         FROM Customers WHERE C.Country = N'USA' SELECT O.OrderID, O.CustomerID, O.EmployeeID, O.OrderDate        FROM Orders O, Customers C         WHERE O.CustomerID = C.CustomerID AND C.Country = N'USA' SELECT D.OrderID, D.ProductID, D.Quantity, D.UnitPrice        FROM [Order Details] D, Orders O, Customers C         WHERE D.OrderID = O.OrderID AND O.CustomerID = C.CustomerID              AND C.Country = N'USA'
```
Parameterized Queries
```
SELECT CustomerID, CompanyName, ContactName, Phone         FROM Customers WHERE C.Country = N'USA' SELECT OrderID, CustomerID, EmployeeID, OrderDate        FROM Orders WHERE CustomerID = ? SELECT OrderID, ProductID, Quantity, UnitPrice        FROM [Order Details] WHERE OrderID = ?
```
Like the others in the newsgroup, I had expected the join-based query and the parameterized queries to be faster than the separate queries. Thankfully, I ran the tests before sharing my opinion on what method of retrieving the data would be fastest. It turned out that retrieving the data using separate queries was fastest. The separate queries were on average 20 percent faster than the join-based query and between six and eight times faster than the parameterized queries.

Why? Well, optimizing queries is not my strong suit, but I believe I can explain the behavior. Retrieving data using separate queries was faster than using the single join query because the single join query returned more data. A join query returns redundant data. There might be 100 entries in the Order Details table that correspond to a single row in the Customers table, but that same customer information appears in each of those 100 rows.

I used the DataSet object's WriteXml method, which we'll discuss in Chapter 12, to store the contents of the DataSet to an XML file. The XML file that corresponded to the DataSet I filled with the results of the join-based query was nearly three times as large as the XML file that corresponded to the DataSet I filled using the separate queries. The join-based query might look and feel more efficient because it's simpler, but it's actually less efficient because it returns more data.

To test the parameterized queries, I executed the simple query against the Customers table and retrieved the results into a DataTable. I then looped through the results of that query and executed the parameterized query against the Orders table once for each customer. Finally, I looped through each of the combined results of the queries against the Orders table and executed the parameterized query against the Order Details table once for each order. This process required executing a separate query for each customer and for each order. The structure of the parameterized queries is simple, but executing these queries over and over for each customer and order proved inefficient and the performance was lousy.

That's not a knock on parameterized queries in general. They're simply not the best solution for retrieving data from large filtered hierarchies such as the one I tested in this scenario.

Even if you're going to work with read-only data from multiple tables, you should consider breaking up your join queries into separate queries that return data from distinct tables.


	I also ran two additional routines, one that used IN subqueries and one that bundled the separate queries into a batch. The performance of the IN subqueries was comparable to the routine that used separate queries. Bundling the separate queries into a batch improved performance by about eight percent.

Q. I'm fetching data from stored procedures, but unfortunately I can't change the join queries inside the stored procedures. Is there a simple way to "split" the results into separate DataTable objects?

A. There is no such method in the ADO.NET object model—at least not yet. However, you can construct code to split the results using a DataReader and the Find method on the DataTable object's Rows collection (a feature we'll discuss in the next chapter).

The following code snippet executes a join query against the Customers and Orders tables. It then loops through the data returned in the DataReader and appends new rows to separate Customers and Orders DataTable objects.

Visual Basic .NET

ds.Tables("Customers").BeginLoadData() ds.Tables("Orders").BeginLoadData() Dim strConn, strSQL As String strConn = "Provider=SQLOLEDB;Data Source=(local)\NetSDK;" & _            "Initial Catalog=Northwind;Trusted_Connection=Yes;" Dim cn As New OleDbConnection(strConn) cn.Open() strSQL = "SELECT C.CustomerID, C.CompanyName, C.ContactName, " & _          "C.Phone, O.OrderID, O.EmployeeID, O.OrderDate " & _          "FROM Customers C, Orders O " & _          "WHERE C.CustomerID = O.CustomerID AND C.Country = 'Canada'" Dim cmd As New OleDbCommand(strSQL, cn) Dim rdr As OleDbDataReader = cmd.ExecuteReader Dim objNewCustomer, objNewOrder As Object() Do While rdr.Read     If ds.Tables("Customers").Rows.Find(rdr.GetString(0)) Is Nothing Then         objNewCustomer = New Object() {rdr.GetString(0), _                                        rdr.GetString(1), _                                        rdr.GetString(2), _                                        rdr.GetString(3)}         ds.Tables("Customers").LoadDataRow(objNewCustomer, True)     End If     objNewOrder = New Object() {rdr.GetInt32(4), _                                 rdr.GetString(0), _                                 rdr.GetInt32(5), _                                 rdr.GetDateTime(6)}     ds.Tables("Orders").LoadDataRow(objNewOrder, True) Loop rdr.Close() cn.Close() ds.Tables("Customers").EndLoadData() ds.Tables("Orders").EndLoadData()

Visual C# .NET

ds.Tables["Customers"].BeginLoadData(); ds.Tables["Orders"].BeginLoadData(); string strConn, strSQL; strConn = "Provider=SQLOLEDB;Data Source=(local)\\NetSDK;" +            "Initial Catalog=Northwind;Trusted_Connection=Yes;"; OleDbConnection cn = new OleDbConnection(strConn); cn.Open(); strSQL = "SELECT C.CustomerID, C.CompanyName, C.ContactName, " +          "C.Phone, O.OrderID, O.EmployeeID, O.OrderDate " +          "FROM Customers C, Orders O " +          "WHERE C.CustomerID = O.CustomerID AND C.Country = 'Canada'"; OleDbCommand cmd = new OleDbCommand(strSQL, cn); OleDbDataReader rdr = cmd.ExecuteReader(); object[] objNewCustomer, objNewOrder; while (rdr.Read()) {     if (ds.Tables["Customers"].Rows.Find(rdr.GetString(0)) == null)     {         objNewCustomer = new object[] {rdr.GetString(0),                                         rdr.GetString(1),                                         rdr.GetString(2),                                         rdr.GetString(3)};         ds.Tables["Customers"].LoadDataRow(objNewCustomer, true);     }     objNewOrder = new object[] {rdr.GetInt32(4),                                  rdr.GetString(0),                                  rdr.GetInt32(5),                                  rdr.GetDateTime(6)};     ds.Tables["Orders"].LoadDataRow(objNewOrder, true); } rdr.Close(); cn.Close(); ds.Tables["Customers"].EndLoadData(); ds.Tables["Orders"].EndLoadData();