
XML and the vocabularies defined with XML have established themselves as the prevalent and most promising lingua franca of business-to-business, business-to-consumer, or, more generally, any-to-any data interchange and integration. One of the major reasons for XML's success in this space is that it is a simple, Unicode-based, platform-independent syntax for which simple and efficient parsers are widely available. Another important factor in XML's favor is that it can not only represent structured data but also provide a uniform syntax for structured data, semi-structured data, and markup data. Structured data is data that easily fits into a predefined, homogeneous type structure such as relations or nested objects. Semi-structured data is data whose structure may change from instance to instance. For example, it may have components that are one-off annotations appearing on only a single instance, or components with a heterogeneous type structure (e.g., in one instance the address is a string, in another it is a complex object). Finally, markup data is mainly free-flowing text with interspersed markup that calls out document structure and highlights important aspects of the text. Listing 7.1 gives a short example of each type of data.

Listing 7.1 Examples of Structured, Semi-Structured and Markup Data

  Structured data:

  <Customer>
    <ID>C1</ID>
    <FirstName>Janine</FirstName>
    <LastName>Smith</LastName>
    <Address>
      <Street>1 Broadway Way</Street>
      <City>Seattle</City>
      <Zip>WA 98000</Zip>
    </Address>
    <Order>
      <ID>O1</ID>
      <OrderDate>2003-01-21</OrderDate>
      <Amount>7</Amount>
      <ProductID>P1</ProductID>
    </Order>
    <Order>
      <ID>O2</ID>
      <OrderDate>2003-06-24</OrderDate>
      <Amount>3</Amount>
      <ProductID>P3</ProductID>
    </Order>
  </Customer>

  Semi-structured data:

  <PatientRecord pid="P1">
    <FirstName>Janine</FirstName>
    <LastName>Smith</LastName>
    <Address>
      <Street>1 Broadway Way</Street>
      <City>Seattle</City>
      <Zip>WA 98000</Zip>
    </Address>
    <Visit date="2002-01-05">Janine came in with a <symptom>rash</symptom>. We identified an <diagnosis>antibiotics allergy</diagnosis> and <remedy>changed her cold prescription</remedy>.</Visit>
  </PatientRecord>
  <PatientRecord>
    <pid>P2</pid>
    <Name>Nils Soerensen</Name>
    <Address>23 NE 40th Street, New York</Address>
  </PatientRecord>

  Markup data:

  <Visit date="2002-01-05">Janine came in with a <symptom>rash</symptom>. We identified an <diagnosis>antibiotics allergy</diagnosis> and <remedy>changed her cold prescription</remedy>.</Visit>
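To make the point about a uniform syntax concrete, the following Python sketch (an illustration, not part of the chapter's own material) reads a structured fragment and a marked-up fragment with the same standard-library parser, `xml.etree.ElementTree`. The fragments are trimmed copies of Listing 7.1 with well-formed closing tags; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of two fragments from Listing 7.1 (closing tags well-formed).
STRUCTURED = """<Customer>
  <ID>C1</ID>
  <Order><ID>O1</ID><OrderDate>2003-01-21</OrderDate><Amount>7</Amount><ProductID>P1</ProductID></Order>
  <Order><ID>O2</ID><OrderDate>2003-06-24</OrderDate><Amount>3</Amount><ProductID>P3</ProductID></Order>
</Customer>"""

MARKUP = ('<Visit date="2002-01-05">Janine came in with a <symptom>rash</symptom>. '
          'We identified an <diagnosis>antibiotics allergy</diagnosis> and '
          '<remedy>changed her cold prescription</remedy>.</Visit>')


def order_amounts(xml_text):
    """Structured reading: map each Order ID to its integer Amount."""
    root = ET.fromstring(xml_text)
    return {o.findtext("ID"): int(o.findtext("Amount")) for o in root.iter("Order")}


def tagged_phrases(xml_text):
    """Markup reading: the (tag, phrase) pairs that the markup calls out."""
    return [(child.tag, child.text) for child in ET.fromstring(xml_text)]


def plain_text(xml_text):
    """Markup reading: the free-flowing text with all markup removed."""
    return "".join(ET.fromstring(xml_text).itertext())


if __name__ == "__main__":
    print(order_amounts(STRUCTURED))   # {'O1': 7, 'O2': 3}
    print(tagged_phrases(MARKUP))
    print(plain_text(MARKUP))
```

The structured reading treats elements as typed fields, while the markup reading keeps the text flow and uses the tags only to highlight phrases; the parser is the same in both cases.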

Much of the data interchanged today originates from relational databases and is ultimately stored in relational databases again. Since relational systems today are predominantly used to manage structured data (an educated guess would be 80 percent or more), most of the XML currently generated and consumed for data interchange also fits the relational model of structured data. Therefore, most relational database systems first focused on providing XML capabilities for this most common usage scenario and have only recently begun to provide XML capabilities for data that doesn't fit the structured mold.
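One rough way to see whether XML "fits the relational model" is to test whether a set of sibling records could become the rows of a single table. The heuristic below is an assumption made for illustration, not a standard definition: attributes are treated as extra columns, every child must be a leaf, no text may sit between children, and all records must share the same columns. Value types are not compared. The `<Patients>` wrapper element and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

ORDERS = """<Customer>
  <Order><ID>O1</ID><OrderDate>2003-01-21</OrderDate><Amount>7</Amount><ProductID>P1</ProductID></Order>
  <Order><ID>O2</ID><OrderDate>2003-06-24</OrderDate><Amount>3</Amount><ProductID>P3</ProductID></Order>
</Customer>"""

# Hypothetical <Patients> wrapper so the two patient records parse as one document.
PATIENTS = """<Patients>
  <PatientRecord pid="P1">
    <FirstName>Janine</FirstName><LastName>Smith</LastName>
    <Address><Street>1 Broadway Way</Street><City>Seattle</City><Zip>WA 98000</Zip></Address>
  </PatientRecord>
  <PatientRecord>
    <pid>P2</pid><Name>Nils Soerensen</Name>
    <Address>23 NE 40th Street, New York</Address>
  </PatientRecord>
</Patients>"""


def has_text(s):
    """True if s contains anything other than whitespace."""
    return bool(s and s.strip())


def row_shape(record):
    """Column names if the record could be one flat row, else None (heuristic)."""
    if has_text(record.text):
        return None
    columns = ["@" + name for name in sorted(record.attrib)]  # attributes as columns
    for child in record:
        # A nested child, an attributed child, or mixed text breaks the flat-row shape.
        if len(child) or child.attrib or has_text(child.tail):
            return None
        columns.append(child.tag)
    return tuple(columns)


def fits_one_table(records):
    """True if every record is flat and all records share exactly the same columns."""
    shapes = {row_shape(r) for r in records}
    return len(shapes) == 1 and None not in shapes


if __name__ == "__main__":
    print(fits_one_table(ET.fromstring(ORDERS).findall("Order")))            # True
    print(fits_one_table(ET.fromstring(PATIENTS).findall("PatientRecord")))  # False
```

The orders are homogeneous and map to one table, while the two patient records differ both in their columns and in the shape of `Address`, which is exactly the semi-structured case.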

By adding more general capabilities to deal with any form of XML data, relational database systems will be able to extend their efficient and effective processing of structured data to new application areas such as XML-based document management, the management of XML messages such as SOAP messages, and the processing of audit logs in XML format. By handling XML natively instead of requiring that it be shredded into relational structures, relational database systems will be able to provide seamless support for these applications.

Relational database systems not only store XML data, but also provide query and update support on the relational and XML data they store. While SQL has proven itself as the query language for relational data, it is not suitable in its current form as the direct query language over XML data. Since a query language should have as little impedance mismatch as possible to the data model it queries, the W3C, in cooperation with its member companies, decided to design a new XML query language named XQuery. To provide the necessary support for native XML data storage and processing, XQuery must become an integral part of XML-enabled relational database systems. In order to store and query XML data natively in a relational database, the XML data must be exposed in the relational system as a manageable entity. Relational systems store and expose XML in one of the following forms:

Virtual XML view: a virtual XML document over predominantly relational data
LOB (large object) storage: a LOB-based XML document, stored as a character LOB (CLOB), a binary LOB (BLOB), or a native XML datatype (XML)
Relational table mappings: an XML document decomposed into relational data structures, possibly with subtrees mapped to LOBs
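The LOB and table-mapping forms can be contrasted with a small sketch using Python's built-in `sqlite3` module. SQLite has no XML datatype, so a `TEXT` column stands in for a CLOB; the table and column names are hypothetical, and the decomposition shown is only one of many possible mappings.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = """<Customer><ID>C1</ID><FirstName>Janine</FirstName><LastName>Smith</LastName>
<Order><ID>O1</ID><OrderDate>2003-01-21</OrderDate><Amount>7</Amount><ProductID>P1</ProductID></Order>
<Order><ID>O2</ID><OrderDate>2003-06-24</OrderDate><Amount>3</Amount><ProductID>P3</ProductID></Order>
</Customer>"""


def store_as_lob(conn, doc_id, xml_text):
    """LOB-style storage: the whole document kept as one character value."""
    conn.execute("CREATE TABLE IF NOT EXISTS xml_docs (id TEXT PRIMARY KEY, doc TEXT)")
    conn.execute("INSERT INTO xml_docs VALUES (?, ?)", (doc_id, xml_text))


def shred(conn, xml_text):
    """Relational table mapping: decompose the document into customer and order rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS customer "
                 "(id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, "
                 "customer_id TEXT, order_date TEXT, amount INTEGER, product_id TEXT)")
    root = ET.fromstring(xml_text)
    cid = root.findtext("ID")  # direct child only, so not an Order's ID
    conn.execute("INSERT INTO customer VALUES (?, ?, ?)",
                 (cid, root.findtext("FirstName"), root.findtext("LastName")))
    for order in root.findall("Order"):
        conn.execute("INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
                     (order.findtext("ID"), cid, order.findtext("OrderDate"),
                      int(order.findtext("Amount")), order.findtext("ProductID")))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_as_lob(conn, "C1", DOC)
    shred(conn, DOC)
    # The LOB returns the document unchanged; the shredded tables answer SQL directly.
    print(conn.execute("SELECT doc FROM xml_docs WHERE id = 'C1'").fetchone()[0] == DOC)  # True
    print(conn.execute("SELECT SUM(amount) FROM orders WHERE customer_id = 'C1'").fetchone()[0])  # 10
```

The trade-off is visible even at this size: the LOB preserves the document exactly but leaves every query over its content to an XML processor, while the shredded form answers aggregates directly in SQL but must be reassembled to reproduce the original document.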

Chapter 6, "Mapping between XML and Relational Data," investigates in depth the many options for querying virtual XML views over relational data and for storing and querying XML documents that have been decomposed into relational data. Each of these options operates as a conceptual mid-tier or client-side component that uses the existing relational interfaces. They all expose an XML view over one or more relational databases. XML queries posed against the exposed view are translated into relational queries that are shipped to the database server. After these queries have been executed, the result is returned either as XML or as relational data for potential further post-processing.
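A virtual XML view of this kind can be sketched as a translator plus a tagger: an XML path expression is rewritten into SQL, the SQL runs against the relational tables, and the result rows are tagged as XML. The sketch below supports only one hypothetical path form, `/Customer[ID='...']/Order`, over a hypothetical `orders` table; real mid-tier view layers handle far richer query languages.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# The only path form this hypothetical view understands: /Customer[ID='...']/Order
PATH = re.compile(r"/Customer\[ID='([^']*)'\]/Order")


def translate(path):
    """Rewrite the supported XML path into a parameterized SQL query."""
    match = PATH.fullmatch(path)
    if match is None:
        raise ValueError("unsupported path: " + path)
    sql = ("SELECT id, order_date, amount, product_id FROM orders "
           "WHERE customer_id = ? ORDER BY id")
    return sql, (match.group(1),)


def query_view(conn, path):
    """Run the translated SQL on the server and tag each result row as an <Order>."""
    sql, params = translate(path)
    results = []
    for row in conn.execute(sql, params):
        order = ET.Element("Order")
        for tag, value in zip(("ID", "OrderDate", "Amount", "ProductID"), row):
            ET.SubElement(order, tag).text = str(value)
        results.append(ET.tostring(order, encoding="unicode"))
    return results


def demo_db():
    """In-memory stand-in for the relational database behind the view."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT, customer_id TEXT, "
                 "order_date TEXT, amount INTEGER, product_id TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", [
        ("O1", "C1", "2003-01-21", 7, "P1"),
        ("O2", "C1", "2003-06-24", 3, "P3"),
    ])
    return conn


if __name__ == "__main__":
    for xml in query_view(demo_db(), "/Customer[ID='C1']/Order"):
        print(xml)
```

No XML is stored anywhere: the view exists only as the translation rule and the tagging step, which is what makes it "virtual".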

Unlike Chapter 6, this chapter concentrates on the storage and processing of XML data that integrates with the relational system's capabilities, such as the relational query processor and execution environment. We begin with an overview of the XML datatype, the primary mechanism provided by relational systems for native XML data storage, before explaining how XQuery access is provided on the XML data and how XQuery is embedded in the relational query system. Then we outline how the concept of tables as collections of tuples can be generalized and applied to collections of XML documents, and how XQuery can be integrated with that approach.

The section on the XML datatype discusses the appearance of the type to the user (its logical model), the physical implementation options, and the relationship to XML Schema metadata. The section on integrating XQuery and SQL describes how to use XQuery to query an XML datatype instance, how to combine XQuery with SQL, and how to correlate relational and XML data. We conclude with an overview of how to physically map XQuery in the context of a relational query engine.
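As a rough preview of combining SQL with queries over an XML-typed column, the sketch below registers two hypothetical scalar functions, `xml_value` and `xml_exists`, with SQLite through `sqlite3.Connection.create_function`, so that path expressions can appear inside ordinary SQL. ElementTree's limited XPath subset stands in for XQuery and a `TEXT` column stands in for the XML datatype; both are assumptions for illustration, not how any particular product implements the feature.

```python
import sqlite3
import xml.etree.ElementTree as ET


def xml_value(doc, path):
    """Hypothetical SQL function: text of the first node the path selects, else NULL."""
    if doc is None:
        return None
    return ET.fromstring(doc).findtext(path)


def xml_exists(doc, path):
    """Hypothetical SQL function: 1 if the path selects a node, else 0."""
    if doc is None:
        return 0
    return int(ET.fromstring(doc).find(path) is not None)


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.create_function("xml_value", 2, xml_value)
    conn.create_function("xml_exists", 2, xml_exists)
    # A relational key column next to an XML-valued column (plain TEXT here).
    conn.execute("CREATE TABLE patients (pid TEXT PRIMARY KEY, record TEXT)")
    conn.executemany("INSERT INTO patients VALUES (?, ?)", [
        ("P1", '<PatientRecord pid="P1"><FirstName>Janine</FirstName>'
               '<LastName>Smith</LastName><Visit date="2002-01-05">Janine came in '
               'with a <symptom>rash</symptom>. We identified an '
               '<diagnosis>antibiotics allergy</diagnosis>.</Visit></PatientRecord>'),
        ("P2", '<PatientRecord><pid>P2</pid><Name>Nils Soerensen</Name>'
               '<Address>23 NE 40th Street, New York</Address></PatientRecord>'),
    ])
    return conn


# SQL drives the rows; the embedded path expressions reach into the XML, and
# COALESCE covers the two records' different name structures.
QUERY = """
SELECT pid,
       COALESCE(xml_value(record, 'Name'),
                xml_value(record, 'FirstName') || ' ' || xml_value(record, 'LastName')) AS name,
       xml_exists(record, 'Visit/symptom') AS has_symptom
FROM patients
ORDER BY pid
"""

if __name__ == "__main__":
    for row in make_db().execute(QUERY):
        print(row)
```

SQL supplies row selection and projection while the path expressions cope with P1 and P2 storing names differently. Note that this sketch reparses the document on every call; a real XML datatype implementation would be expected to avoid that.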

At the time of writing, many of the features described below are available in one form or another in current or upcoming versions of the major relational database systems, such as (in alphabetical order) IBM DB2, Microsoft SQL Server, and Oracle. Since the feature sets of individual database systems change from release to release, we generally do not attribute the features described here to any particular database system. Instead, we refer to the SQL/XML part of the SQL-2003 standard [SQL2003] where appropriate and use a pseudo-notation for anything that is not (yet) described in SQL-2003. For example, we use an abstract functional notation of the form isvalid(XML, SchemaComponents, 'lax'|'strict') → boolean for functions on the XML datatype. Different implementations may provide different syntax, ranging from method calls on the XML type instance to SQL keywords.
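To illustrate what a function written in this pseudo-notation expresses, here is a greatly simplified Python sketch of isvalid with a lax and a strict mode. The "schema components" are modeled as a hypothetical dictionary from element name to the set of permitted child element names, and text content, order, and cardinality are not checked. Following the spirit of XML Schema's lax and strict processing, an element without a declaration fails in strict mode, while lax mode skips it and its subtree.

```python
import xml.etree.ElementTree as ET


def isvalid(doc, schema, mode):
    """Toy version of isvalid(XML, SchemaComponents, 'lax'|'strict') -> boolean.

    `schema` is a hypothetical stand-in for schema components: it maps an
    element name to the set of child element names its content model allows.
    Text content, order, and cardinality are deliberately ignored.
    """
    if mode not in ("lax", "strict"):
        raise ValueError("mode must be 'lax' or 'strict'")

    def check(elem):
        allowed = schema.get(elem.tag)
        if allowed is None:
            # No declaration: strict fails; lax skips the element and its subtree.
            return mode == "lax"
        return all(child.tag in allowed and check(child) for child in elem)

    return check(ET.fromstring(doc))


SCHEMA = {
    "Customer": {"ID", "FirstName", "LastName", "Order"},
    "ID": set(), "FirstName": set(), "LastName": set(),
    # "Order" is allowed inside Customer but has no declaration of its own.
}

DOC = "<Customer><ID>C1</ID><Order><ID>O1</ID></Order></Customer>"

if __name__ == "__main__":
    print(isvalid(DOC, SCHEMA, "lax"))     # True: undeclared Order is skipped
    print(isvalid(DOC, SCHEMA, "strict"))  # False: Order has no declaration
    print(isvalid("<Customer><Fax>555</Fax></Customer>", SCHEMA, "lax"))  # False
```

The last call fails in either mode because `Fax` violates the declared content model of `Customer`; lax mode only forgives missing declarations, not violations of declarations that exist.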