Technology Backgrounder: Schemas and Datasets | .NET Patterns: Architecture, Design, and Process

XML Schema

An XML schema is simply a data-friendly or, more specifically, a database-friendly meta-language built around XML. The XML Schema definition language (XSD), usually stored in an .XSD file, enables you to define the structure and data types of XML documents in much the same way you would when designing a database. Until now, there have been a few formats floating around, including XML-Data Reduced (XDR), a more proprietary format from Microsoft supported by ADO 2.6 and earlier and by BizTalk 2000. Fortunately, we've arrived at a World Wide Web Consortium (W3C) standard that is completely supported by Microsoft .NET and ADO.NET. The XSD supported by ADO.NET adds custom elements (in the true Microsoft fashion), but it does so in a compliant way, and they should not affect parsers not using .NET. For those building interoperable applications, this is indeed good news.

Creating an XML schema is just like creating any other XML document, but with a few standards and rules. An XML schema document (XSD) is based on the W3C 2001 recommendations for data structures and data types. You create an XML schema by defining the elements and attributes that make up types, such as complex types, conforming to the W3C XML Schema Part 1: Structures recommendation. You also define and reference data types using the W3C XML Schema Part 2: Datatypes recommendation. There are built-in types, such as string and integer, but you may also create your own custom data types, such as those you would find in the Microsoft implementations.

To create an XML schema, you must first begin with a schema element, such as the one in Listing 5.1.

Listing 5.1 XML schema snippet random example for the Poly Model pattern.

 <xs:schema id="PolyModel"
     targetNamespace="http://www.etier.com/PolyModel.xsd"
     . . .
     xmlns:xs="http://www.w3.org/2001/XMLSchema"
     xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">

You should notice that the XML schema is first composed of a top-level schema element. The schema element must declare the W3C XML Schema namespace, http://www.w3.org/2001/XMLSchema (namespace names are case-sensitive). After this element, the document may contain any number of complex and simple type definitions, using the simpleType and complexType element tags:

simpleType: Represents the type definition for values that are used as the text of an element or attribute. This type cannot contain nested components, such as elements, and cannot have any attributes.
complexType: Use this type when defining types such as those that describe a database table. It is for elements that contain attributes and/or other elements, such as the columns that make up a database table.
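
To make the distinction concrete, here is a minimal sketch of one custom simple type and one complex type. The names PhoneNumber, ContactInfo, and Preferred are hypothetical and do not come from the Customers schema; the schema has no target namespace so that the type reference stays short.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical simple type: plain text limited to 24 characters.
       It holds a value only; no child elements, no attributes. -->
  <xs:simpleType name="PhoneNumber">
    <xs:restriction base="xs:string">
      <xs:maxLength value="24" />
    </xs:restriction>
  </xs:simpleType>

  <!-- Hypothetical complex type: it may carry child elements and
       attributes, much like the columns of a table row. -->
  <xs:complexType name="ContactInfo">
    <xs:sequence>
      <xs:element name="ContactName" type="xs:string" />
      <xs:element name="Phone" type="PhoneNumber" minOccurs="0" />
    </xs:sequence>
    <xs:attribute name="Preferred" type="xs:boolean" />
  </xs:complexType>

</xs:schema>
```

The Phone element reuses the custom simple type, while ContactInfo itself could serve as the type of a table-like element.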

These tags are used to create your custom types. These types include those that represent the tables, fields, and any keys your schema may define, such as the following schema:

Listing 5.2 The Customer schema to be used throughout this chapter.

 <?xml version="1.0" encoding="utf-8" ?>
 <xs:schema id="Customers"
     targetNamespace="http://www.etier.com/Customers.xsd"
     . . .
     xmlns:xs="http://www.w3.org/2001/XMLSchema"
     xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
   <xs:element name="Customers" msdata:IsDataSet="true">
     <xs:complexType>
       <xs:choice maxOccurs="unbounded">
         <xs:element name="Customer">
           <xs:complexType>
             <xs:sequence>
               <xs:element name="CustomerID" type="xs:string" />
               <xs:element name="CompanyName" type="xs:string" />
               <xs:element name="ContactName" type="xs:string" minOccurs="0" />
               <xs:element name="ContactTitle" type="xs:string" minOccurs="0" />
               . . .
             </xs:sequence>
           </xs:complexType>
         </xs:element>
         . . .
       </xs:choice>
     </xs:complexType>
     <xs:unique name="CustomersKey1" msdata:PrimaryKey="true">
       <xs:selector xpath=".//Customer" />
       <xs:field xpath="mstns:CustomerID" />
     </xs:unique>
   </xs:element>
 </xs:schema>

One of the nice aspects of XML schemas is that you need to know only a few schema semantics to use them effectively in your application. For this reason, I will cover only what you minimally need to understand in order to implement the patterns in this chapter.

Schemas and DataSets

What is an XML schema in the .NET world really?

It describes any data model, whether that model stays in memory or is persisted.
When using constraints, it can be used to validate any data being inserted into a DataSet at runtime.
It establishes the relational structure of the DataSet's tables and columns. It can specify the key columns, constraints, and relationships between tables.
It specifies whether its "shape" will be hierarchical or key-based.
It serves as an accepted standard for describing any complex data using XML.

The patterns throughout this chapter continually refer to a DataSet as being of two types: typed and untyped. You will see the terms typed and strongly typed used interchangeably, even though they are subtly different in their context. So what are they? An untyped DataSet is just any instantiated DataSet with no direct relationship to any given XML schema. A typed DataSet is a class that has been generated from a schema, something you will see in the Schema Field pattern below. There really isn't any difference between a typed DataSet and the actual schema itself because the DataSet, in essence, becomes the schema in "class form." This means that you can work with a schema "through" the DataSet class, using its typed members to access elements of the schema and the typed instance data it may contain. The schema represents a possible persistence target that would represent a DataSet if it were saved as XML. A schema is the skeleton of a DataSet object and for most purposes, they can be seen as one and the same in .NET.
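
To illustrate the schema as a persistence target, here is a sketch of the XML a Customers DataSet might produce when saved. It assumes that the elided parts of Listing 5.2 declare elementFormDefault="qualified" and that the elided columns are optional. The CustomerID values and the second customer are hypothetical sample data.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<!-- The root element corresponds to the DataSet; each Customer element is one row. -->
<Customers xmlns="http://www.etier.com/Customers.xsd">
  <Customer>
    <CustomerID>ETIER</CustomerID>
    <CompanyName>The eTier Group</CompanyName>
    <ContactName>Christian Thilmany</ContactName>
  </Customer>
  <Customer>
    <!-- ContactName and ContactTitle are omitted here: minOccurs="0" makes them optional. -->
    <CustomerID>SAMPL</CustomerID>
    <CompanyName>Sample Company</CompanyName>
  </Customer>
</Customers>
```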

When working with the Visual Studio XML Designer, you have the option of creating a schema from scratch or from an existing data model through Server Explorer, and you can then bind that schema to a DataSet in your code. Schemas and DataSets can also be created programmatically from any data source that you have access to at runtime. Taken a step further, Visual Studio will even automatically generate a working DataSet class from the created schema. The DataSet that Visual Studio creates is the strongly typed DataSet I referred to above. It is strongly typed in that the generated class inherits from the DataSet class and provides strongly typed methods and data members so that you can access members of the schema by name. This means you can access tables and columns by name, instead of iterating through collections or using indexers.

For example, to access a column in code from a strongly typed DataSet, it would look something like the following:

Listing 5.3 Code snippet showing the manipulation of the strongly typed DataSet.

 Customers dsCustomers = new Customers(); // Customers is a DataSet
 Customers.CustomerRow oRow = dsCustomers.Customer.NewCustomerRow();
 oRow.CompanyName = "The eTier Group";
 oRow.ContactName = "Christian Thilmany";
 oRow.Phone = "713-555-2343";
 . . .

The Customers class is simply a DataSet-derived class that is autogenerated by Visual Studio. Instead of instantiating a DataSet, you instantiate the strongly typed DataSet and manipulate the data in a schema-specific manner by referencing its members by name. This is much cleaner and easier to read than the default means of using indexers in ADO.NET. No more keeping track of index values for columns or tables or having to assign numbers to constants that represent those index numbers. However, there are a few things to watch out for that we'll cover shortly in this chapter.

When changes are made to a schema file, the definition of the DataSet class actually changes at design time. They are both essentially schema files in the XML Designer, with the difference being that typed DataSets have an associated class file as a DataSet child. At design time, untyped DataSets do not contain any structure or instance data; therefore, XML schemas cannot be generated from untyped DataSets. This doesn't mean you cannot serialize this object to XML, but that is another topic altogether. However, during runtime, once the DataSet has been populated with instance data through its normal untyped DataSet routines, a schema based on that instance data can be generated in code. If you've ever worked with SQLXML (version 3.0 at the time of writing this book) for SQL Server and/or the ADO XML support, you have already had a taste of this sort of behavior (albeit not as streamlined). This is one of the most powerful features of .NET and ADO.NET, specifically. With typed DataSets, you now have an in-memory data model that, at one point, COM+ promised to support with the "In Memory Database" (IMDB), which was unfortunately never delivered.

An ancillary benefit of using strongly typed DataSets from the Studio Designer is that the DataSet also incorporates table and column names into the statement completion feature (IntelliSense). The code is not only easier to read, but with IntelliSense it is much easier to write, with type mismatch errors being caught at compile time or even composition time, rather than at runtime. In Studio, any preexisting data table can be dragged into the designer window or even loaded from a file. The designer also displays both the XML version and a more readable Access-style layout of the data structure. This provides you with the best of both design worlds. Once created, an .XSD file will appear in your current project, where you will see it in the tree. The generated DataSet class I just spoke of is hidden from the default view unless you have chosen to show all files in the Solution Explorer. The generated file should sit directly beneath the .XSD file just created. As with the Web service proxy code that is generated for you, Visual Studio writes this file behind the scenes when you save in Studio. To see the generated file, simply go to File Explorer and look for a source file with the name you gave the DataSet schema upon creation. You can then see exactly what is generated to get a better feel for the strongly typed code if you don't want to use IntelliSense for browsing.

Creating Typed DataSets

As mentioned, the XML Designer in Visual Studio can automatically generate a typed DataSet for you from your schema (.XSD). Follow these steps:

Highlight the project you want the DataSet added to; then, from the Project menu, select Add New Item.
In the Add New Item dialog box, select DataSet from the Data folder and click OK.
Double-click the newly created schema (.XSD) in the Solution Explorer if the XML View is not currently open.
From here you can drag an existing table from the Server Explorer or create a schema from scratch.
From the context menu in the XML Designer, select Generate DataSet if it isn't already selected.
Each time you save the schema in the project, the typed DataSet will be regenerated to match the new structure. The class should also be available in IntelliSense and ready to use in code.

Note

All XML schemas created in Visual Studio conform to the W3C specification for schemas. There are some features of ADO.NET that could not be described with the default schema syntax. In those cases, custom attributes were used to define ADO.NET-specific items. You can find these custom features when viewing the schema in the Studio XML View mode. They can be identified with the "msdata:" qualifier. This does not make these schemas noncompliant because the specification allows for custom attributes (they will be ignored by parsers that do not support that attribute).

So what does the generated typed DataSet code look like? Here is a brief sample of the Customers typed DataSet generated from the XML Designer. This is generated by Visual Studio, so you should rarely, if ever, have to manipulate this code. In fact, I discourage you from doing so unless the schema will not be changing (highly unlikely, because schemas change constantly as data requirements change, which is the reason behind this chapter).

Much of the autogenerated code has been omitted from the following because it is quite long. This is shown only to give you a feel for some of the strongly typed code automatically built for you by Visual Studio. Who needs a "magical black box" for data handling like you get with Container-Managed Persistence (J2EE/EJB) when you can have complete control with generated data object code such as the following?

Listing 5.4 The Customers "typed" DataSet shown as code, as generated by Visual Studio .NET.

 //---------------------------------------------------------------
 // <autogenerated>
 //     This code was generated by a tool.
 //     Runtime Version: 1.0.3705.209
 //
 //     Changes to this file may cause incorrect behavior and will
 //     be lost if
 //     the code is regenerated.
 // </autogenerated>
 //---------------------------------------------------------------
 namespace DataTierPatterns.Data
 {
     using System;
     using System.Data;
     using System.Xml;
     using System.Runtime.Serialization;

     [Serializable()]
     . . .
     public class Customers : DataSet
     {
         private CustomerDataTable tableCustomer;
         private JOBDataTable tableJOB;
         private OrderDataTable tableOrder;

         public Customers()
         {
             this.InitClass();
             . . .
         }
         . . .
         public CustomerDataTable Customer
         {
             get
             {
                 return this.tableCustomer;
             }
         }
         . . .
         [System.Diagnostics.DebuggerStepThrough()]
         public class CustomerDataTable : DataTable, System.Collections.IEnumerable
         {
             private DataColumn columnCustomerID;
             private DataColumn columnCompanyName;
             private DataColumn columnContactName;
             private DataColumn columnContactTitle;
             private DataColumn columnAddress;
             private DataColumn columnCity;
             private DataColumn columnRegion;
             . . .
             internal CustomerDataTable() : base("Customer")
             {
                 this.InitClass();
             }
             . . .
             internal DataColumn CustomerIDColumn
             {
                 get
                 {
                     return this.columnCustomerID;
                 }
             }

             internal DataColumn CompanyNameColumn
             {
                 get
                 {
                     return this.columnCompanyName;
                 }
             }

             public void AddCustomerRow(CustomerRow row)
             {
                 this.Rows.Add(row);
             }

             public CustomerRow AddCustomerRow(string CustomerID,
                 string CompanyName, string ContactName, string ContactTitle,
                 string Address, string City, string Region, string PostalCode,
                 string Country, string Phone, string Fax)
             {
                 CustomerRow rowCustomerRow = ((CustomerRow)(this.NewRow()));
                 rowCustomerRow.ItemArray = new object[] {
                         CustomerID,
                         CompanyName,
                         ContactName,
                         ContactTitle,
                         Address,
                         City,
                         Region,
                         PostalCode,
                         Country,
                         Phone,
                         Fax};
                 this.Rows.Add(rowCustomerRow);
                 return rowCustomerRow;
             }

             public CustomerRow FindByCustomerID(string CustomerID) . . .

Schema Types

When you create your XML schema from scratch, in any tool, you have to choose between two primary approaches for representing relational data, and the structure of the schema will reflect that choice.

Nested Relationships

This uses a hierarchical relationship between tables that reside within elements and the children of those elements. Before databases were modeled using XML, this was my preferred way to represent data. The problem with this representation is that it isn't as efficient in layout as the relationship type below unless only a few tables are represented.

Nested relationships involve creating tables with columns just as in the one-to-many scenario (described next), except that a single column definition may also contain the complete definition of a related table. Nested elements may be easier to read and understand immediately, but structures may have to be repeated throughout the XML file. This repetition can be avoided by using the following type.
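
As a sketch of the nested form just described, the schema below defines a hypothetical Order table inside the Customer element. The Order, OrderID, and OrderDate names are assumptions for illustration and are not taken from the Customers schema in Listing 5.2; the target namespace is left out to keep the example short.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema id="Customers"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="Customers" msdata:IsDataSet="true">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element name="Customer">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="CustomerID" type="xs:string" />
              <xs:element name="CompanyName" type="xs:string" />
              <!-- The related (child) table is defined in place,
                   inside the parent table's own definition. -->
              <xs:element name="Order" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="OrderID" type="xs:int" />
                    <xs:element name="OrderDate" type="xs:dateTime" minOccurs="0" />
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

In the instance data, each Customer element would then physically contain its own Order elements, which is what makes this layout easy to read but repetitive when the same structure is needed under more than one parent.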

One-to-Many Relationships

This is the preferred option when directly modeling database entities. The relationships are represented as separate tables of rows that are using common columns to relate to one another. The common columns are defined as primary keys and as references for those tables, using a referenced key as a foreign key. The Edit Key and Edit Relation dialog boxes can be used for creating keys and creating relationships as needed. This takes a bit more thought to create but is much more efficient and "ERD-like" than using hierarchical links.

One-to-many relationships involve creating individual tables with columns just like the hierarchical option. However, creating your XML schema also involves designating the common columns as keys (e.g., primary keys) and/or as keyrefs (e.g., foreign keys). Once defined, you then create the relationships between them and finally apply the constraints. There is more thought that must go into this but the design is much cleaner and more efficient.
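
The same two hypothetical tables can be sketched in the one-to-many form. Here Customer and Order are siblings, and the relationship is carried by a key and a keyref on the shared CustomerID column. The names CustomerKey, CustomerOrder, and Order are assumptions for illustration, and the target namespace is again omitted so that the XPath expressions need no prefix.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema id="Customers"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="Customers" msdata:IsDataSet="true">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <!-- Parent table -->
        <xs:element name="Customer">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="CustomerID" type="xs:string" />
              <xs:element name="CompanyName" type="xs:string" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <!-- Child table, defined as a sibling rather than nested -->
        <xs:element name="Order">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="OrderID" type="xs:int" />
              <xs:element name="CustomerID" type="xs:string" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
    <!-- The common column on the "one" side is designated as the key... -->
    <xs:key name="CustomerKey" msdata:PrimaryKey="true">
      <xs:selector xpath=".//Customer" />
      <xs:field xpath="CustomerID" />
    </xs:key>
    <!-- ...and the matching column on the "many" side refers back to it. -->
    <xs:keyref name="CustomerOrder" refer="CustomerKey">
      <xs:selector xpath=".//Order" />
      <xs:field xpath="CustomerID" />
    </xs:keyref>
  </xs:element>
</xs:schema>
```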

Tables, Columns, and Keys

Next I will briefly go over the main elements of an XML schema. I will explain how database entities such as tables, columns, and keys are represented in an XML schema.

Tables

Using XML, tables are represented as "complex" elements. The columns are then represented as subelements and attributes of that complex element. The name of the table is equal to the name of the element. The names of the columns in the schema are equal to the names of the subelements and attributes that make up the complex element. The following XML schema represents a table named Customer, inside the Customers DataSet element, with several columns. Each column is an element whose attributes provide the name and type of the column. Notice that the columns make up the sequence of the complex type, with the minOccurs attribute used to denote which fields can or cannot be null.

Note

Although I explain the process here, you do not have to enforce any relationships between your tables. In fact, depending on your design, you may not even need to designate any columns as primary keys. Relating your tables in the schema only grants you the option of enforcing referential integrity in your code at runtime, mimicking what happens during a typical database operation. However, in this book and in the commercial application featured, I do not create relationships in many cases. I avoid as many relationship definitions as I can for flexibility. Personally, I'd rather use code to police my relationships because most rules are too complicated to represent with the rather simple relationship constraints a schema provides. Most of the data in both the application and the patterns featured is custom validated before it is inserted into the typed DataSet. This gives me the flexibility of using the tables as temporary storage without having to worry about integrity violations (see the Abstract Packet pattern in Chapter 4). The choice is yours, but keep in mind that defining hard relationships may also involve a loss of some flexibility when using typed DataSets. I prefer to have this flexibility and prefer making my schema easier to read in the process. This is not to say that defining relationships is a bad practice; it is only my preference for schema definition. Your use of these schema constraint constructs should be a design decision and will vary with your implementation.

Listing 5.5 Complex Customers element from the Customers schema.

 <xs:element name="Customers" msdata:IsDataSet="true">
   <xs:complexType>
     <xs:choice maxOccurs="unbounded">
       <xs:element name="Customer">
         <xs:complexType>
           <xs:sequence>
             <xs:element name="CustomerID" type="xs:string" />
             <xs:element name="CompanyName" type="xs:string" />
             <xs:element name="ContactName" type="xs:string" minOccurs="0" />
             <xs:element name="ContactTitle" type="xs:string" minOccurs="0" />
             . . .

Columns

As mentioned, columns are elements within a complex table element. Any element without a minOccurs attribute is considered a required column, whereas an element with the minOccurs="0" attribute is considered an optional column. The type attribute gives each column its data type; the types in this example are built-in simple types (types that are available in all XML schemas). Listing 5.5 shows how to define a table and its columns.
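
Because columns can be attributes as well as subelements, a short sketch may help show the two forms side by side. The Region and Status attributes below are hypothetical additions made only for illustration; they are not part of the Customers schema.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <!-- Required column: no minOccurs attribute, so it must appear once. -->
        <xs:element name="CustomerID" type="xs:string" />
        <!-- Optional column: minOccurs="0" allows it to be absent. -->
        <xs:element name="ContactName" type="xs:string" minOccurs="0" />
      </xs:sequence>
      <!-- Columns expressed as attributes (hypothetical names).
           Attributes are optional unless use="required" is given. -->
      <xs:attribute name="Region" type="xs:string" />
      <xs:attribute name="Status" type="xs:string" use="required" />
    </xs:complexType>
  </xs:element>
</xs:schema>
```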

Primary Keys

To constrain columns to contain unique values, you can create a key. In the XML Designer, adding a key that does not have the DataSet Primary Key option selected will create a unique key (xs:unique) instead of an xs:key like the one below. Unique constraints simply ensure that columns do not contain duplicate values, and the difference (as you'll see) between the two is subtle. The following example shows how to define the CustomerID column from the Customer table as a DataSet primary key in the schema:

Listing 5.6 Sample primary key shown in the Customers schema.

 <xs:key name="CustomersKey1" msdata:PrimaryKey="true">
   <xs:selector xpath=".//mstns:Customer" />
   <xs:field xpath="mstns:CustomerID" />
 </xs:key>

You can use the toolbox in the XML Designer to drag a key over to the table (element) to which you want to assign a key. You may also select the Add menu by right-clicking on the table. This will create a key and can be viewed in the XML view, such as the example shown above (Listing 5.6).

Unique Keys

Unique constraints can allow null values, whereas primary key constraints do not allow null values: This is the primary difference I alluded to above. There is another consideration to make when deciding whether to use a primary or unique key. Tables can have multiple unique constraints but only one primary key. My schema requires that the CustomerID column must be unique, so a primary key or a simple unique key could have been used. Listing 5.7 shows how to define a unique key and reference an element in the schema:

Listing 5.7 Sample generic unique key shown in the Customers schema.

 <xs:unique name="CustomersKey1" msdata:PrimaryKey="true">
   <xs:selector xpath=".//Customer" />
   <xs:field xpath="mstns:CustomerID" />
 </xs:unique>

Unique keys are created with the XML Designer the same way primary keys are. The only difference is one option on the dialog.
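
For comparison, here is a sketch of a table that carries both kinds of key: one primary key and one additional unique constraint with no msdata:PrimaryKey annotation. The constraint names and the choice of CompanyName as a unique column are assumptions made for illustration only.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema id="Customers"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="Customers" msdata:IsDataSet="true">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element name="Customer">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="CustomerID" type="xs:string" />
              <xs:element name="CompanyName" type="xs:string" minOccurs="0" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
    <!-- The single primary key for the table: values must be present and unique. -->
    <xs:key name="CustomerKey" msdata:PrimaryKey="true">
      <xs:selector xpath=".//Customer" />
      <xs:field xpath="CustomerID" />
    </xs:key>
    <!-- A plain unique constraint: duplicates are rejected, but the
         column may still be absent (null) in a given row. -->
    <xs:unique name="CompanyNameUnique">
      <xs:selector xpath=".//Customer" />
      <xs:field xpath="CompanyName" />
    </xs:unique>
  </xs:element>
</xs:schema>
```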

Keyrefs

To enforce referential integrity within your typed DataSet, you can optionally set up keyrefs. A keyref creates the "many-side" definition of a one-to-many relationship. This is just like setting up a foreign key during database design. A keyref must reference either a primary key or a unique key, as is the case with Listing 5.8.

Listing 5.8 Sample keyref shown in the Customers schema.

 <xs:keyref name="CustomersOrders" refer="mstns:CustomersKey1">
   <xs:selector xpath=".//Orders" />
   <xs:field xpath="CustomerID" />
 </xs:keyref>

This can also be created using the toolbox in the XML Designer. Using the Edit Relation dialog box, you can select the rules that affect what happens to related data. These rules enforce the referential integrity of the schema. When updating or deleting a record in a primary key column, there can be many records in another table that reference the primary key. Table 5.1 briefly describes each constraint managed from this dialog box.

Table 5.1. Constraints Available in the Edit Relation Dialog Box

Rule	Description
Cascade	Update or delete all related rows.
SetNull	Set related rows to null.
SetDefault	Set related rows to the specified DefaultValue.
None	No action taken.
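
In the schema itself, these rules can be recorded as msdata annotations on the keyref. The sketch below assumes the msdata:UpdateRule and msdata:DeleteRule attributes as described in the ADO.NET documentation for mapping keyref constraints; the Orders table layout and the rule choices are hypothetical, and the target namespace is omitted for brevity.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema id="Customers"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="Customers" msdata:IsDataSet="true">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element name="Customer">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="CustomerID" type="xs:string" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="Orders">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="OrderID" type="xs:int" />
              <!-- Optional, so that SetNull has a legal result. -->
              <xs:element name="CustomerID" type="xs:string" minOccurs="0" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
    <xs:key name="CustomersKey1" msdata:PrimaryKey="true">
      <xs:selector xpath=".//Customer" />
      <xs:field xpath="CustomerID" />
    </xs:key>
    <!-- Assumed rule choices: key updates cascade to related rows,
         and deleting a customer sets CustomerID to null in its orders. -->
    <xs:keyref name="CustomersOrders" refer="CustomersKey1"
        msdata:UpdateRule="Cascade" msdata:DeleteRule="SetNull">
      <xs:selector xpath=".//Orders" />
      <xs:field xpath="CustomerID" />
    </xs:keyref>
  </xs:element>
</xs:schema>
```

Parsers that do not understand the msdata attributes should simply ignore them, so the keyref still works as a standard XSD constraint.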

I think it's time to begin talking about Poly Models. You should have been provided enough XML schema background to get your foot in the design door at this point. As mentioned, Poly Models are a little different than what you're probably accustomed to. These next patterns really need to be read in their entirety to get a true perspective of their application scenarios but I hope you will find them a refreshing option to data access that provides you with another tool to use in your application.