Using a DataSet Object to Work with Lots of Data

 

Using a DataSet Object to Work with Lots of Data

The DataSet is a memory-based relational representation of data and is the primary disconnected data object. Conceptually, you can think of the DataSet as an in-memory relational database, but it is simply cached data and doesn't provide any of the transactional properties (atomicity, consistency, isolation, durability) that are essential to today's relational databases. The DataSet contains a collection of DataTable and DataRelation objects, as shown in Figure 1-4. The DataTable objects can contain unique and foreign key constraints to enforce data integrity. The DataSet also provides methods for cloning the DataSet schema, copying the DataSet, merging with other DataSet objects, and retrieving changes from the DataSet.

image from book
Figure 1-4: The DataSet object contains a collection of DataTable and DataRelation objects.

You can create the DataSet schema programmatically or by providing an XML schema definition. The following code demonstrates the creation of a simple DataSet containing a DataTable for vendors and a DataTable for parts. The two DataTable objects are joined using a DataRelation named vendor_part. (The DataRelation is discussed in more detail in the next section.)

image from book

Visual Basic

Dim vendorData as new DataSet("VendorData")

Dim vendor as DataTable = vendorData.Tables.Add("Vendor")
vendor.Columns.Add("Id", GetType(Guid))
vendor.Columns.Add("Name", GetType(string))
vendor.Columns.Add("Address1", GetType(string))
vendor.Columns.Add("Address2", GetType(string))
vendor.Columns.Add("City", GetType(string))
vendor.Columns.Add("State", GetType(string))
vendor.Columns.Add("ZipCode", GetType(string))
vendor.Columns.Add("Country", GetType(string))
vendor.PrimaryKey = new DataColumn() { vendor.Columns("Id") }

Dim part as DataTable = vendorData.Tables.Add("Part")
part.Columns.Add("Id", GetType(Guid))
part.Columns.Add("VendorId", GetType(Guid))
part.Columns.Add("PartCode", GetType(string))
part.Columns.Add("PartDescription", GetType(string))
part.Columns.Add("Cost", GetType(decimal))

part.Columns.Add("RetailPrice", GetType(decimal))
part.PrimaryKey = new DataColumn() { part.Columns("Id") }

vendorData.Relations.Add( _
   "vendor_part", _
   vendor.Columns("Id"), _
   part.Columns("VendorId"))
 

image from book

image from book

C#

DataSet vendorData = new DataSet("VendorData");

DataTable vendor = vendorData.Tables.Add("Vendor");
vendor.Columns.Add("Id", typeof(Guid));
vendor.Columns.Add("Name", typeof(string));
vendor.Columns.Add("Address1", typeof(string));
vendor.Columns.Add("Address2", typeof(string));
vendor.Columns.Add("City", typeof(string));
vendor.Columns.Add("State", typeof(string));
vendor.Columns.Add("ZipCode", typeof(string));
vendor.Columns.Add("Country", typeof(string));
vendor.PrimaryKey = new DataColumn[] { vendor.Columns["Id"] };

DataTable part = vendorData.Tables.Add("Part");
part.Columns.Add("Id", typeof(Guid));
part.Columns.Add("VendorId", typeof(Guid));
part.Columns.Add("PartCode", typeof(string));
part.Columns.Add("PartDescription", typeof(string));
part.Columns.Add("Cost", typeof(decimal));
part.Columns.Add("RetailPrice", typeof(decimal));
part.PrimaryKey = new DataColumn[] { part.Columns["Id"] };

vendorData.Relations.Add(
   "vendor_part",
   vendor.Columns["Id"],
   part.Columns["VendorId"]);
 

image from book

Being More Specific with Typed DataSet Objects

The previous code created a schema for a DataSet. Accessing the DataTable named vendor would require code like this:

image from book

Visual Basic

Dim vendorTable as DataTable = vendorData.Tables("Vendor")
 

image from book

image from book

C#

DataTable vendorTable = vendorData.Tables["Vendor"];
 

image from book

What happens if the table name is spelled incorrectly? An exception is thrown, but not until runtime. A better approach is to create a new, specialized DataSet class that inherits from DataSet, adding a property for each of the tables. For example, a specialized DataSet class might contain a property called Vendor that can be accessed as follows:

image from book

Visual Basic

Dim vendorTable as DataTable = vendorData.Vendor
 

image from book

image from book

C#

DataTable vendorTable = vendorData.Vendor;
 

image from book

Using the preceding syntax, a compile error is generated if Vendor is not spelled correctly. Also, the chances of incorrect spelling are significantly reduced because Visual Studio's IntelliSense displays the Vendor property for quick selection when the line of code is being typed. The standard DataSet class is an untyped DataSet, whereas the specialized DataSet is a typed DataSet.

You can create a typed DataSet class manually, but it's usually better to provide an XML schema definition (XSD) file that can be used to generate the typed DataSet class. Visual Studio contains a tool called the DataSet Editor that you can use to graphically create and modify an XSD file which in turn can be used to generate the typed DataSet class. You can invoke the DataSet Editor by adding a DataSet file to a Visual Studio project: right-click the project, choose Add, choose New Item, and select the DataSet template. After you add the DataSet template to the project, the template will be open for you to edit using the DataSet Editor. Figure 1-5 shows the files that are created when the DataSet template is added to a project. Notice that you must select the Show All Files button to see these files. One of the files has a .cs extension, which is the extension for a C# source code file. A Visual Basic application would have a file with a .vb extension. The source code file contains the specialized typed DataSet code, which is generated automatically by the DataSet Editor.


Figure 1-5: The DataSet template contains an XML schema definition and generates source code to create a typed DataSet.

Navigating the Family Tree with DataRelation Objects

The DataRelation objects are used to join DataTable objects that are in the same DataSet. Joining DataTable objects creates a path from one DataTable object to another. This DataRelation can be traversed programmatically from parent DataTable to child DataTable or from child DataTable to parent DataTable, which enables navigation between the DataTable objects. The following code example populates the vendor and part DataTable objects and then demonstrates DataRelation object navigation, first from parent to child and then from child to parent, using the previously declared vendor_part DataRelation.

image from book

Visual Basic

Dim vendorRow as DataRow = nothing
vendorRow = vendor.NewRow()
Dim vendorId as Guid = Guid.NewGuid()
vendorRow("Id") = vendorId
vendorRow("Name") = "Tailspin Toys"
vendor.Rows.Add(vendorRow)

Dim partRow as DataRow = nothing
partRow = part.NewRow()
partRow("Id") = Guid.NewGuid()
partRow("VendorId") = vendorId
partRow("PartCode") = "WGT1"
partRow("PartDescription") = "Widget 1 Description"
partRow("Cost") = 10.00
partRow("RetailPrice") = 12.32
part.Rows.Add(partRow)

partRow = part.NewRow()
partRow("Id") = Guid.NewGuid()
partRow("VendorId") = vendorId
partRow("PartCode") = "WGT2"
partRow("PartDescription") = "Widget 2 Description"
partRow("Cost") = 9.00
partRow("RetailPrice") = 11.32
part.Rows.Add(partRow)

'Navigate parent to children
Dim parts as DataRow() = vendorRow.GetChildRows("vendor_part")
for each dr as DataRow in parts
   textBox1.AppendText("Part: " + dr("PartCode") +vbcrlf)
Next

'Navigate child to parent
Dim parentRow as DataRow = part.Rows(1).GetParentRow("vendor_part")
textBox1.AppendText("Vendor: " + parentRow("Name") + vbcrlf)
 

image from book

image from book

C#

DataRow vendorRow = null;
vendorRow = vendor.NewRow();
Guid vendorId = Guid.NewGuid();
vendorRow["Id"] = vendorId;
vendorRow["Name"] = "Tailspin Toys";
vendor.Rows.Add(vendorRow);

DataRow partRow = null;
partRow = part.NewRow();
partRow["Id"] = Guid.NewGuid();
partRow["VendorId"] = vendorId;

partRow["PartCode"] = "WGT1";
partRow["PartDescription"] = "Widget 1 Description";
partRow["Cost"] = 10.00;
partRow["RetailPrice"] = 12.32;
part.Rows.Add(partRow);

partRow = part.NewRow();
partRow["Id"] = Guid.NewGuid();
partRow["VendorId"] = vendorId;
partRow["PartCode"] = "WGT2";
partRow["PartDescription"] = "Widget 2 Description";
partRow["Cost"] = 9.00;
partRow["RetailPrice"] = 11.32;
part.Rows.Add(partRow);

//Navigate parent to children
DataRow[] parts = vendorRow.GetChildRows("vendor_part");
foreach (DataRow dr in parts)
{
   textBox1.AppendText("Part: " + dr["PartCode"] +"\r\n");
}

//Navigate child to parent
DataRow parentRow = part.Rows[1].GetParentRow("vendor_part");
textBox1.AppendText("Vendor: " + parentRow["Name"] + "\r\n");
 

image from book

Creating Primary and Foreign Key Constraints

You can create a DataRelation object with or without unique and foreign key constraints for the sole purpose of navigating between parent and child DataTable objects. The DataRelation object also contains a constructor that allows for the creation of a unique constraint on the parent DataTable object and a foreign key constraint on the child DataTable object. These constraints are used to enforce data integrity by ensuring that a parent DataRow object exists for any child DataRow object that is to be created, and that a parent DataRow object can't be deleted if it has child DataRow objects. The following code demonstrates the creation of the DataRelation named vendor_part, passing true to create constraints if they don't already exist.

Note 

If a foreign key constraint is set to a DataColumn object that allows nulls, this child DataRow can exist without having a parent DataRow object. In some situations, this might be desired or required, but in other situations it might not. Be sure to verify the AllowDbNull property of the DataColumn objects that are being used as foreign keys.

image from book

Visual Basic

vendorData.Relations.Add( _
   "vendor_part", _
   vendor.Columns("Id"), _
   part.Columns("VendorId"), True)
 

image from book

image from book

C#

vendorData.Relations.Add(
   "vendor_part",
   vendor.Columns["Id"],
   part.Columns["VendorId"], true);
 

image from book

Cascading Deletes and Cascading Updates

A foreign key constraint ensures that a child DataRow object cannot be added unless a valid parent DataRow object exists and that a parent DataRow object cannot be deleted if the parent still has child DataRow objects. In some situations, it is desirable to force the deletion of the child DataRow objects when the parent DataRow object is deleted. You can do this by setting the DeleteRule on the ForeignKeyConstraint to Cascade. As it turns out, Cascade is the default setting. Table 1-5 describes the other members of the Rule enumeration.

Table 1-5: Rule Enumeration Members

Rule Value

Description

Cascade

Default. Deletes or updates the child DataRow objects when the DataRow object is deleted or its unique key is changed.

None

Throws an InvalidConstraintException if the parent DataRow object is deleted or its unique key is changed.

SetDefault

Sets the foreign key column(s) value to the default value of the DataColumn object(s) if the parent DataRow object is deleted or its unique key is changed.

SetNull

Sets the foreign key column(s) value to DbNull if the parent DataRow ob ject is deleted or its unique key is changed.

As with deleting, on some occasions you'll want to cascade changes to a unique key in the parent DataRow object to the child DataRow object's foreign key. You can set the ChangeRule to a member of the Rule enumeration to get the appropriate behavior.

The default setting for the ForeignKeyConstraint object's DeleteRule is Rule.Cascade. The following code snippet shows how to force an InvalidConstraintException exception to be thrown when a parent that has children is deleted or when a child is being added that has no parent.

image from book

Visual Basic

Dim fk as ForeignKeyConstraint = part.Constraints("vendor_part")
fk.DeleteRule = Rule.None
 

image from book

image from book

C#

ForeignKeyConstraint fk =
   (ForeignKeyConstraint)part.Constraints("vendor_part");
fk.DeleteRule = Rule.None;
 

image from book

Serializing and Deserializing DataSet Objects

A populated DataSet can be saved, or serialized, as XML or as binary data to a stream or file. The DataSet can also be loaded, or deserialized, with XML or binary data from a stream or file. The data stream can be transferred across a network over many protocols, including HTTP. This section looks at the various methods of transferring data.

Serializing the DataSet Object as XML

You can easily serialize a populated DataSet to an XML file by executing the DataSet object's WriteXml method. The following code snippet uses the populated vendorData DataSet that was created earlier in this chapter and writes the contents to an XML file. The resulting XML file contents are also shown.

image from book

Visual Basic

vendorData.WriteXml("c:\vendors.xml", XmlWriteMode.IgnoreSchema)
 

image from book

image from book

C#

vendorData.WriteXml(@"c:\vendors.xml", XmlWriteMode.IgnoreSchema);
 

image from book

image from book

XML

<?xml version="1.0" standalone="yes"?>
 <VendorData>
   <Vendor>
     <Id>d9625cfa-f176-4521-98f5-f577a8bc2c00</Id>
     <Name>Tailspin Toys</Name>
   </Vendor>
   <Part>
     <Id>df84fa52-5aa3-4c08-b5ba-54163eb1ea3a</Id>
     <VendorId>d9625cfa-f176-4521-98f5-f577a8bc2c00</VendorId>
     <PartCode>WGT1</PartCode>
     <PartDescription>Widget 1 Description</PartDescription>
     <Cost>10</Cost>
     <RetailPrice>12.32</RetailPrice>
   </Part>
   <Part>
     <Id>c411676a-ec53-496c-bdbd-04b4d58124d0</Id>
     <VendorId>d9625cfa-f176-4521-98f5-f577a8bc2c00</VendorId>
     <PartCode>WGT2</PartCode>
     <PartDescription>Widget 2 Description</PartDescription>
     <Cost>9</Cost>
     <RetailPrice>11.32</RetailPrice>
   </Part>
 </VendorData>
 

image from book

Notice that the XML document is well formed and that its root node is called VendorData. You can set the name of the root node by changing the DataSet object's DataSetName property. This property can be changed at any time, but notice that the code on page 19 that created the DataSet passed a DataSetName to the constructor of the DataSet object.

The DataRow objects have been represented as repeating elements in the XML. For example, the single vendor DataRow object is represented in the XML file by the single Vendor element, while the two part DataRow objects are represented in the XML file by the two Part elements. Also notice that the column data is represented as elements within the element for the DataRow. You can change the format of the column data by assigning a new value to the ColumnMapping property of the DataColumn objects. Table 1-6 shows the available settings.

Table 1-6: MappingType Enumeration Members

MappingType Value

Description

Attribute

The column data is placed into an XML attribute.

Element

The default. The column data is placed into an XML element.

Hidden

The column data is not sent to the XML file.

SimpleContent

The column data is stored as text within the row element tags. In other words, the data is stored as text like the Element setting, but with out the Element tags.

Another formatting option is to nest the Part elements inside the Vendor that owns the parts. You can do this by setting the Nested property of the DataRelation object to true. The following code snippet shows how the XML format can be changed substantially—first by nesting the data and then setting all of the DataColumn objects except those with a data type of Guid (globally unique identifier) to Attribute. The resulting XML file is also shown.

image from book

Visual Basic

vendorData.Relations("vendor_part").Nested = True
For Each dt As DataTable In vendorData.Tables
    For Each dc As DataColumn In dt.Columns
        If Not (dc.DataType.Equals(GetType(Guid))) Then
            dc.ColumnMapping = MappingType.Attribute
        End If
    Next
Next
 

image from book

image from book

C#

vendorData.Relations["vendor_part"].Nested = true;
foreach (DataTable dt in VendorData)
{
   foreach (DataColumn dc in dt.Columns)
   {
      if(dc.DataType != typeof(Guid))
      {
         dc.ColumnMapping = MappingType.Attribute;
      }
   }
}
 

image from book

image from book

XML

<?xml version="1.0" standalone="yes"?>
 <VendorData>
   <Vendor Name="Tailspin Toys">
     <Id>33323c89-213e-4168-924d-72262a7a3f5a</Id>
     <Part PartCode="WGT1" PartDescription="Widget 1 Description"
       Cost="10" RetailPrice="12.32">
       <Id>e28d9624-9e97-4106-a175-309b95952a8f</Id>
       <VendorId>33323c89-213e-4168-924d-72262a7a3f5a</VendorId>
     </Part>
     <Part PartCode="WGT2" PartDescription="Widget 2 Description"
       Cost="9" RetailPrice="11.32">
       <Id>07be7d2c-acba-40da-a1b7-78db3b774566</Id>
       <VendorId>33323c89-213e-4168-924d-72262a7a3f5a</VendorId>
     </Part>
   </Vendor>
 </VendorData>
 

image from book

In the example, the XML file is being written, but the XML file contains no information that describes the data types of the data. When not specified, the default data type for all data is string. If the XML file is read into a new DataSet, all data, including DateTime data and numeric data, is loaded as string data. One fix is to store the data type information with the XML file. You do this by storing the schema definition with the XML file. The following code shows how to specify the XmlWriteMode to include the schema definition within the XML file.

image from book

Visual Basic

vendorData.WriteXml("c:\vendors.xml", XmlWriteMode.WriteSchema)
 

image from book

image from book

C#

vendorData.WriteXml(@"c:\vendors.xml", XmlWriteMode.WriteSchema);
 

image from book

image from book

XML

<?xml version="1.0" standalone="yes"?>
 <VendorData>
   <xs:schema  xmlns=""
     xmlns:xs="http://www.w3.org/2001/XMLSchema"
     xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
     <xs:element name="VendorData" msdata:IsDataSet="true">
       <xs:complexType>
         <xs:choice minOccurs="0" maxOccurs="unbounded">
           <xs:element name="Vendor">
             <xs:complexType>
               <xs:sequence>
                 <xs:element name="Id" msdata:DataType="System.Guid,
                   mscorlib, Version=2.0.3600.0, Culture=neutral,
                   PublicKeyToken=b77a5c561934e089" type="xs:anyType"
                   msdata:Ordinal="0" />
               </xs:sequence>
               <xs:attribute name="Name" type="xs:string" />
               <xs:attribute name="Address1" type="xs:string" />
               <xs:attribute name="Address2" type="xs:string" />
               <xs:attribute name="City" type="xs:string" />

               <xs:attribute name="State" type="xs:string" />
               <xs:attribute name="ZipCode" type="xs:string" />
               <xs:attribute name="Country" type="xs:string" />
             </xs:complexType>
           </xs:element>
           <xs:element name="Part">
             <xs:complexType>
               <xs:sequence>
                 <xs:element name="Id" msdata:DataType="System.Guid,
                   mscorlib, Version=2.0.3600.0, Culture=neutral,
                   PublicKeyToken=b77a5c561934e089" type="xs:anyType"
                   msdata:Ordinal="0" />
                 <xs:element name="VendorId" msdata:DataType="System.Guid,
                   mscorlib, Version=2.0.3600.0, Culture=neutral,
                   PublicKeyToken=b77a5c561934e089" type="xs:anyType"
                   minOccurs="0" msdata:Ordinal="1" />
               </xs:sequence>
               <xs:attribute name="PartCode" type="xs:string" />
               <xs:attribute name="PartDescription" type="xs:string" />
               <xs:attribute name="Cost" type="xs:decimal" />
               <xs:attribute name="RetailPrice" type="xs:decimal" />
             </xs:complexType>
           </xs:element>
         </xs:choice>
       </xs:complexType>
       <xs:unique name="Constraint1" msdata:PrimaryKey="true">
         <xs:selector xpath=".//Vendor" />
         <xs:field xpath="Id" />
       </xs:unique>
       <xs:unique name="Part_Constraint1"
         msdata:ConstraintName="Constraint1"
         msdata:PrimaryKey="true">
         <xs:selector xpath=".//Part" />
         <xs:field xpath="Id" />
       </xs:unique>
       <xs:keyref name="vendor_part" refer="Constraint1">
         <xs:selector xpath=".//Part" />
         <xs:field xpath="VendorId" />
       </xs:keyref>
     </xs:element>
   </xs:schema>
   <Vendor Name="Tailspin Toys">
     <Id>f6cb3b63-6d1a-4941-abe8-b81b5b2357c4</Id>
   </Vendor>
   <Part PartCode="WGT1" PartDescription="Widget 1 Description"
     Cost="10" RetailPrice="12.32">
     <Id>d7263a61-f9de-4d27-9d29-5167e95317d0</Id>
     <VendorId>f6cb3b63-6d1a-4941-abe8-b81b5b2357c4</VendorId>
   </Part>
   <Part PartCode="WGT2" PartDescription="Widget 2 Description"
     Cost="9" RetailPrice="11.32">
     <Id>310a89d1-60dd-4b06-b368-453ed210f670</Id>
     <VendorId>f6cb3b63-6d1a-4941-abe8-b81b5b2357c4</VendorId>
   </Part>
 </VendorData>
 

image from book

When the XmlWriteMode is set to WriteSchema, the resulting XML file is substantially larger. When few files are being generated for this data, this approach is acceptable. But if many files are being created, it might be better to create a separate XSD file that can be loaded before the data. You can use the DataSet object's WriteXmlSchema method to extract the XML schema definition to a separate file, as shown here.

image from book

Visual Basic

vendorData.WriteXmlSchema("c:\vendorSchema.xsd")
 

image from book

image from book

C#

vendorData.WriteXmlSchema(@"c:\vendorSchema.xsd");
 

image from book

Serializing a Changed DataSet Object as a DiffGram

A DiffGram is an XML document that contains all of the data from your DataSet object, including the original DataRow object information. A DataSet object can be serialized as a DiffGram by simply setting the XmlWriteMode to DiffGram as shown in the following snippet.

image from book

Visual Basic

vendorData.WriteXml("c:\vendors.xml", XmlWriteMode.DiffGram)
 

image from book

image from book

C#

vendorData.WriteXml(@"c:\vendors.xml", XmlWriteMode.DiffGram);
 

image from book

Where is the DiffGram useful? Picture this: you are writing an application that occasionally connects to a database to synchronize your disconnected DataSet object with the current information that is contained in the database. When you are not connected to the database, you want your DataSet object to be stored locally. All of the previous examples of serializing a DataSet object stored the current data, but not the original data. This means that when you deserialize the data, you will have lost the information needed to find the changed DataRow objects that should be sent back to the database.

The DiffGram contains all of the DataRowVersion information as shown in the following XML document. Notice that Part1 has been modified, and its status is indicated as such. Also notice that the bottom of the XML document contains the "before" information for DataRow objects that have been modified or deleted. This XML document also shows Part2 as being deleted because Part2 has "before" information but not current information. Part3 is an inserted DataRow object as indicated, so this DataRow object has no "before" information.

image from book

XML

<?xml version="1.0" standalone="yes"?>
 <diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"
     xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
   <VendorData>
     <Vendor diffgr: msdata:rowOrder="0" Name="Tailspin Toys">
       <Id>0ad3358e-38a6-4648-ba68-209bb212e7a3</Id>
       <Part diffgr: msdata:rowOrder="0" diffgr:hasChanges="modified"
           PartCode="WGT1" PartDescription="Widget 1 Description"

            Cost="12" RetailPrice="12.32">
          <Id>ec52f9d2-392f-4870-8003-497d658076ec</Id>
          <VendorId>0ad3358e-38a6-4648-ba68-209bb212e7a3</VendorId>
       </Part>
       <Part diffgr: msdata:rowOrder="2" diffgr:hasChanges="inserted"
           PartCode="WGT3" PartDescription="Widget 3 Description"
           Cost="8" RetailPrice="10.02">
         <Id>3c34b25f-336a-4e42-a9c8-bfba177056a3</Id>
         <VendorId>0ad3358e-38a6-4648-ba68-209bb212e7a3</VendorId>
       </Part>
     </Vendor>
   </VendorData>
   <diffgr:before>
     <Part diffgr: msdata:rowOrder="0"
           PartCode="WGT1" PartDescription="Widget 1 Description"
           Cost="10" RetailPrice="12.32">
       <Id>ec52f9d2-392f-4870-8003-497d658076ec</Id>
       <VendorId>0ad3358e-38a6-4648-ba68-209bb212e7a3</VendorId>
     </Part>
     <Part diffgr: diffgr:parent msdata:rowOrder="1"
           PartCode="WGT2" PartDescription="Widget 2 Description"
           Cost="9" RetailPrice="11.32">
       <Id>e4bf3db0-7a8a-404e-84a2-4b7e4c588ffa</Id>
       <VendorId>0ad3358e-38a6-4648-ba68-209bb212e7a3</VendorId>
     </Part>
   </diffgr:before>
 </diffgr:diffgram>
 

image from book

Deserializing a DataSet from XML

You can easily deserialize XML into a DataSet from a file or stream. Remember that when a schema is not provided, all XML data is treated as string data, so you should first load the schema if it is in a separate file. The following code can be used to read the schema file and load the XML file.

image from book

Visual Basic

Dim vendorData as new DataSet()
vendorData.ReadXmlSchema("c:\vendorSchema.xsd")
vendorData.ReadXml("c:\vendors.xml", XmlReadMode.IgnoreSchema)
 

image from book

image from book

C#

DataSet vendorData = new DataSet();
vendorData.ReadXmlSchema(@"c:\vendorSchema.xsd");
vendorData.ReadXml(@"c:\vendors.xml", XmlReadMode.IgnoreSchema);
 

image from book

In the preceding example, the XmlReadMode is set to IgnoreSchema. This means that if the XML data file contains an XML schema definition, it is ignored. Table 1-7 lists the other options of the XmlReadMode enumeration.

Table 1-7: XmlReadMode Enumeration Members

XmlReadMode Value

Description

Auto

The XML source is examined by the ReadXml method and the appropriate mode is selected.

DiffGram

If the XmlFile contains a DiffGram, the changes are applied to the DataSet using the same semantics that the Merge method uses. Merge is covered in more detail in the next section.

Fragment

Reads the XML as a fragment. Fragments can contain multiple root elements. FOR XML in SQL Server is an example of something that produces fragments.

IgnoreSchema

Ignores any schema that is defined within the XML data file.

InferSchema

The XML file is read, and the schema (DataTable objects and DataColumn objects) is created based on the data. If the DataSet currently has a schema, the existing schema is used and extended to accommodate tables and columns that existing in XML document but don't exist in the DataSet object. All data types of all DataColumn objects are a string.

InferTypedSchema

The XML file is read, and the schema is created based on the data. An attempt is made to identify the data type of each column, but if the data type cannot be identified, it will be a string.

ReadSchema

Reads the XML file and looks for an embedded schema. If the DataSet already has DataTable objects with the same name, an exception is thrown. All other existing tables will remain.

Inferring a schema simply means that the DataSet attempts to create a schema for the data by looking for patterns of XML elements and attributes.

Serializing the DataSet Object as Binary Data

Although the DataSet can be serialized as XML, in many situations the size of the XML file causes problems with resources such as memory and drive space or bandwidth when you move this data across the network. If XML is not required, the DataSet can be serialized as a binary file. The code snippet that appears at the top of the next page writes to a binary file the contents of the vendorData DataSet that we previously defined and populated.

In this example code, a BinaryFormatter object is created and is used to create the binary file. Opening the binary file using the Visual Studio hex editor would reveal that the binary file contains embedded XML, as shown in Figure 1-6. We'll refer to this file as the BinaryXml file. The size of this small sample is 4186 bytes. Adding the line of code that follows Figure 1-6 on the next page to the beginning of the code sample causes the file to be saved with true binary data.

image from book
Figure 1-6: The DataSet is serialized to a binary file, but the binary file contains embedded XML.

image from book

Visual Basic

'Added the following Imports statements to the top of the file
Imports System.Runtime.Serialization.Formatters.Binary
Imports System.IO
…
Dim fs as new FileStream( _
    "c:\vendorData.bin",FileMode.Create)
Dim fmt as new BinaryFormatter()
fmt.Serialize(fs, vendorData)
fs.Close()
 

image from book

image from book

C#

//Added the following using statements to the top of the file
using System.Runtime.Serialization.Formatters.Binary;
using System.IO;
…
FileStream fs = new FileStream(
   @"c:\vendorData.bin",FileMode.Create);
BinaryFormatter fmt = new BinaryFormatter();
fmt.Serialize(fs, vendorData);
fs.Close();
 

image from book

image from book

Visual Basic

vendorData.RemotingFormat = SerializationFormat.Binary
 

image from book

image from book

C#

vendorData.RemotingFormat = SerializationFormat.Binary;
 

image from book

We can then run the code, with the results shown in Figure 1-7. We'll refer to this file as the TrueBinary file. The size of this small DataSet object is more than 20,000 bytes. This file was supposed to get smaller, but the initial overhead to create the TrueBinary file was more than 20,000 bytes, compared with about 3400 bytes of initial overhead for the BinaryXml file. With 10,000 vendors and 20,000 parts, the BinaryXml file size grows to 7,938,982 bytes and the TrueBinary file grows to only 1,973,401 bytes. This means that small objects might not benefit from changing the RemotingFormat property to Binary, while large objects will be about one-fourth the size, or four times faster.

image from book
Figure 1-7: After we set RemotingFormat to binary data, the binary file no longer contains embedded XML.

Note 

The DataTable also contains the RemotingFormat property, which can be used when only a single DataTable is to be saved as binary data.

Deserializing a DataSet from Binary Data

The binary data file we saved in the previous example can easily be deserialized into a DataSet from a file or stream. The BinaryFormatter stores the schema automatically, so there is no need to load a schema first. The BinaryFormatter automatically identifies the file as having been saved as BinaryXml or TrueBinary. The following code can be used to load the binary file.

image from book

Visual Basic

Dim vendorData as DataSet
Dim fs as new FileStream( _
   "c:\vendorData.bin", FileMode.Open)
Dim fmt as new BinaryFormatter()
vendorData = CType(fmt.Deserialize(fs),DataSet)
fs.Close()
 

image from book

image from book

C#

DataSet vendorData;
FileStream fs = new FileStream(
   @"c:\vendorData.bin", FileMode.Open);
BinaryFormatter fmt = new BinaryFormatter();
vendorData = (DataSet)fmt.Deserialize(fs);
fs.Close();
 

image from book

Using Merge to Combine DataSet Data

On many occasions, data available in one DataSet must be combined with another DataSet. For example, a sales application might need to combine serialized DataSet objects received by e-mail from a number of salespeople. Even internally within an application, you might want to create a copy of DataTable objects that the user can edit, and based on the user clicking Update, the modified data can be merged back to the original DataSet.

The DataSet contains a method called Merge that can be used to combine data from multiple DataSet objects. The Merge method has several overloads to allow data to be merged from DataSet, DataTable, or DataRow objects. The following code example demonstrates using the Merge method to combine changes from one DataSet into another DataSet.

image from book

Visual Basic

'Create an initial DataSet
Dim masterData As New DataSet("Sales")
Dim person As DataTable = masterData.Tables.Add("Person")
person.Columns.Add("Id", GetType(Guid))
person.Columns.Add("Name", GetType(String))
person.PrimaryKey = New DataColumn() {person.Columns("Id")}
person.Rows.Add(Guid.NewGuid(), "Joe")
'Create a temp DataSet and make changes
Dim tempData As DataSet = masterData.Copy()
'get Joe's info
Dim tempPerson As DataTable = tempData.Tables("Person")
Dim joe As DataRow = tempPerson.Select("Name='Joe'")(0)
Dim joeId As Guid = CType(joe("Id"), Guid)
'Modify joe's name
joe("Name") = "Joe in Sales"
'Create an Order table and add orders for Joe
Dim order As DataTable = tempData.Tables.Add("Order")
order.Columns.Add("Id", GetType(Guid))
order.Columns.Add("PersonId", GetType(Guid))
order.Columns.Add("Amount", GetType(Decimal))
order.PrimaryKey = New DataColumn() {order.Columns("Id")}
order.Rows.Add(Guid.NewGuid(), joeId, 100)
'Now merge back to master
masterData.Merge(tempData, False, MissingSchemaAction.AddWithKey)
 

image from book

image from book

C#

//Create an initial DataSet
DataSet masterData = new DataSet("Sales");
DataTable person = masterData.Tables.Add("Person");
person.Columns.Add("Id", typeof(Guid));
person.Columns.Add("Name", typeof(string));
person.PrimaryKey = new DataColumn[] { person.Columns["Id"] };
person.Rows.Add(Guid.NewGuid(), "Joe");
//Create a temp DataSet and make changes
DataSet tempData = masterData.Copy();
//get Joe's info
DataTable tempPerson = tempData.Tables["Person"];
DataRow joe = tempPerson.Select("Name='Joe'")[0];

Guid joeId = (Guid)joe["Id"];
//Modify joe's name
joe["Name"] = "Joe in Sales";
//Create an Order table and add orders for Joe
DataTable order = tempData.Tables.Add("Order");
order.Columns.Add("Id", typeof(Guid));
order.Columns.Add("PersonId", typeof(Guid));
order.Columns.Add("Amount", typeof(decimal));
order.PrimaryKey = new DataColumn[] { order.Columns["Id"] };
order.Rows.Add(Guid.NewGuid(), joeId, 100);
//Now merge back to master
masterData.Merge(tempData, false, MissingSchemaAction.AddWithKey);
 

image from book

This code creates a DataSet that contains a single DataTable object, called Person. A person named Joe was added to the Person DataTable object. The DataRowState for Joe's DataRow is Added. Next, the code copies the masterData DataSet object to a DataSet object called tempData. The code modifies the tempData DataSet object by changing Joe's name to Joe in Sales, and then it creates a new DataTable object called Order and adds an order.

The Merge method on masterData, which takes three parameters, is then called. The first parameter is the tempData object. The second parameter is a Boolean called preserveChanges, which specifies whether updates from the tempData DataSet should overwrite changes made in the masterData object. For example, Joe's DataRowState in the masterData DataSet is not Unchanged, so if the preserveChanges setting is true, Joe's name change (to Joe in Sales) will not be merged into masterData. The last parameter is a MissingSchemaAction enumeration member. The AddSchemaWithKey is selected, which means the Sales DataTable and its data are added to masterData. Table 1-8 describes the enumeration members.

Table 1-8: MissingSchemaAction Enumeration Members

MissingSchemaAction Value

Description

Add

Adds the necessary DataTable and DataColumn objects to complete the schema.

AddWithPrimaryKey

Adds the necessary DataTable, DataColumn, and PrimaryKey objects to complete the schema.

Error

An exception is thrown if a DataColumn does not exist in the DataSet that is being updated.

Ignore

Ignores data that resides in DataColumns that are not in the DataSet being updated.

When you use the Merge method, make sure the DataTable objects have a primary key. Failure to set the PrimaryKey property of the DataTable object results in DataRow objects being appended rather than existing DataRow objects being modified.