Extensible Markup Language (XML) | MCAD/MCSE/MCDBA 70-229 Exam Cram 2: Designing & Implementing Databases w/SQL Server 2000 Enterprise Edition

XML (Extensible Markup Language) is really quite like a database itself, in that it stores the data along with the basic data definition and with an XML schema defines the complete data structure and business rules. XML is much more portable than a database and is quickly becoming the standard for data exchange among websites, applications, and other implementations that require exchange of data. The language itself is derived from HTML, which in turn was designed to provide an application-independent, character-set-independent method of transferring data, especially transaction-oriented data, across systems.

XML has gained acceptance as a means to communicate data across platforms, especially over the Internet. This exam objective requires you to know how to retrieve data from SQL Server to exchange with other systems. In particular, you will be required to know how to import and manipulate data using OPENXML.

In general, XML is a document format that can be used to transfer data between systems. Within the SQL Server environment, XML documents can be created with the FOR XML clause of the SELECT statement. This clause provides several options for formatting XML output.

Now here is a new feature that is drilled to death on the exam. Expect several questions because it is a new feature and a strong technology. The implementation of XML is a little rough in this release of SQL Server and there is hope for improvement; but just the fact that the capability is present is worth noting. With an industry push on this standard interprogram communication technique, it is worth spending some time to master SQL Server's implementation of XML.

Okay, this is a technical book so here it is in the proverbial technical nutshell. XML is a subset of SGML, which is a huge standard that is so complex nobody has ever completely implemented it. HTML is also a subset of SGML but is quite limited; XML is less limited (and more complex) than HTML but is still far less complex than SGML. Got it? Okay, then let's move on and take a little more of a basic approach.

An XML document consists of one or more elements, which are bound between angle brackets (< >). The first word that appears inside the angle brackets is the name of the element. The rest of the element consists of element attributes. For example, here's an element:

 <row xmlns="x-schema:#Schema1"  First="Danny" Last="Thomas"/>

The name of the element, or the element type, is row. The xmlns property specifies the name of the schema, which is the format that the element will take. The element then has other attributes, such as ID, First, and Last, which all have values. The element ends with a forward slash and an angle bracket, indicating the end of the element.

An element can also contain other elements as shown here:

 <SalesPerson  First="Danny" Last="Thomas"> <Sales Qty="4"/> <Sales Qty="3"/> </SalesPerson>

In this case, the SalesPerson element contains two Sales elements. Notice that on the first line there isn't a slash before the ending bracket; the matching slash for this element is on the last line. This is how objects are nested in XML.

Simplifying XML Use

To output data in XML format, the SELECT statement includes an operator called FOR XML. This specifies to SQL Server that instead of returning a rowset, it should return an XML document. There are three different options for generating the XML: RAW, AUTO, and EXPLICIT.

Where is the schema? To produce XML output that also contains the schema information for the XML, you must tack the XMLDATA qualifier to the end of the FOR XML clause.

In AUTO mode, SQL Server returns the rowset in an automatically generated nested XML format. If the query has no joins, it doesn't have a nesting at all. If the query has joins, it returns the first row from the first table, then all the correlated rows from each joined table as a nested level. For example, the following query shows order details nested inside of orders:

 SELECT O.OrderID, O.CustomerID, OD.ProductID, OD.UnitPrice, OD.Quantity     From Orders AS O  JOIN [Order Details] AS OD ON O.OrderID = OD.OrderID  WHERE O.OrderID < 10251  FOR XML AUTO <O Order Customer> <OD Product UnitPrice="14.0000" Quantity="12"/> <OD Product UnitPrice="9.8000" Quantity="10"/> <OD Product UnitPrice="34.8000" Quantity="5"/></O> <O Order Customer> <OD Product UnitPrice="18.6000" Quantity="9"/> <OD Product UnitPrice="42.4000" Quantity="40"/></O> <O Order Customer> <OD Product UnitPrice="7.7000" Quantity="10"/> <OD Product UnitPrice="42.4000" Quantity="35"/> <OD Product UnitPrice="15.8000" Quantity="15"/></O>

Note that the aliases for each table become the row identifiers within the XML output. Unfortunately, Query Analyzer by default shows only the first 256 characters of a string that's returned. For most XML queries that will not be adequate, and a setting of the maximum 8192 is recommended. To set the option within the query analyzer, select Tools, Options, Results from the menu. It is for this reason that we have tried to limit the data output size in most of the examples throughout this section by including a WHERE clause to filter the number of rows.

To produce a listing that also supplies the schema elements of the XML (now that is really hard to read), you tack the XMLDATA onto the end of the FOR XML clause:

 SELECT * FROM Products WHERE ProductID = 1 FOR XML AUTO, XMLDATA <Schema name="Schema3" xmlns="urn:schemas-microsoft-com:xml-data" xmlns: dt="urn:schemas-microsoft-com:datatypes"><ElementType name="Products" content="empty" model="closed"><AttributeType name="ProductID" dt:type= "i4"/><AttributeType name="ProductName" dt:type="string"/><AttributeType name="SupplierID" dt:type="i4"/><AttributeType name="CategoryID" dt:type ="i4"/><AttributeType name="QuantityPerUnit" dt:type="string"/> <AttributeType name="UnitPrice" dt:type="fixed.14.4"/><AttributeType name="UnitsInStock" dt:type="i2"/><AttributeType name="UnitsOnOrder" dt: type="i2"/><AttributeType name="ReorderLevel" dt:type="i2"/><Attribute Type name="Discontinued" dt:type="boolean"/><attribute type="ProductID"/ ><attribute type="ProductName"/><attribute type="SupplierID"/><attribute  type="CategoryID"/><attribute type="QuantityPerUnit"/><attribute type= "UnitPrice"/><attribute type="UnitsInStock"/><attribute type="UnitsOn Order"/><attribute type="ReorderLevel"/><attribute type="Discontinued"/> </ElementType></Schema><Products xmlns="x-schema:#Schema3" Product  ProductName="Chai" Supplier Category QuantityPerUnit="10  boxes x 20 bags" UnitPrice="18.0000" UnitsInStock="39" UnitsOnOrder="0"  ReorderLevel="10" Discontinued="0"/>

When you run the query, the actual XML comes out all on one line as a stream of data. XML output does not use linefeeds or make things readable in any fashion. The easiest way to write queries for XML, then, is to write them with the FOR XML clause left off, make sure that they are returning the data you want, and then add the FOR XML back onto the end of the query. That eliminates the need for a lot of extra formatting.

The use of the RAW mode of XML output is best suited in situations in which minimum formatting is desired. In RAW mode each row is returned as an element with the identifier row. Here's an example of the same query as you just saw, returned in RAW mode:

 <row Order Customer Product UnitPrice="14.0000" Quantity="12"/> <row Order Customer Product UnitPrice="9.8000" Quantity="10"/> <row Order Customer Product UnitPrice="34.8000" Quantity="5"/> <row Order Customer Product UnitPrice="18.6000" Quantity="9"/> <row Order Customer Product UnitPrice="42.4000" Quantity="40"/> <row Order Customer Product UnitPrice="7.7000" Quantity="10"/> <row Order Customer Product UnitPrice="42.4000" Quantity="35"/> <row Order Customer Product UnitPrice="15.8000" Quantity="15"/>

If you have binary data stored in text or image data types, SQL Server ignores the data when generating XML. If you want the data to be included, you use the BINARY BASE64 option:

 SELECT * FROM Employees  FOR XML RAW, BINARY BASE64

Another option available for extracting data in an XML format is EXPLICIT. The EXPLICIT option enables you to specify the format of the XML that will be created. To define the format, you have to alias the first column of output to the name Tag, name the second column Parent, and then alias each consecutive column to relate it to a specific Tag. The column names after Parent are named this way:

 [ElementName!TagNumber!AttributeName!Directive]

So the example

 SELECT 1 AS TAG, NULL AS Parent,        [Order Details].Quantity AS [Order!1!QtyPurchased],        Products.ProductID AS [Order!1!ProdID],        Products.ProductName AS  [Order!1!Product],        Products.UnitsInStock AS [Order!1!OnHand] FROM Products INNER JOIN [Order Details]    ON Products.ProductID = [Order Details].ProductID WHERE OrderID = 10248 FOR XML EXPLICIT

returns this data:

 <Order QtyPurchased="12" Person Product="Queso Cabrales" OnHand="22"/> <Order QtyPurchased="10" Person Product="Singaporean Hokkien Fried Mee" OnHand="26"/> <Order QtyPurchased="5" Person Product="Mozzarella di Giovanni" OnHand="14"/>

These types of queries are quite arduous and "clunky," to say the least, but they do produce useful results.

You are not going to become an XML expert overnight, and the material presented in this book is light, to say the least. Essentially, all we have done is shown how to draw data out of SQL Server in an XML format. We have not tried to explain XML or produce an XML reference text, but knowing the material presented in this section and the one to follow should get you through the XML portion of the exam.

You use the FOR XML clause to retrieve XML data, and it is really pretty easy to use after you get the hang of it. Getting XML data into a database is a little more tricky and not too user friendly.

Moving Data Using XML

Use the Transact-SQL OPENXML function to insert data represented as an XML document. OPENXML is used to allow access to XML data as if it were like any other table or rowset. You must first have an XML document in memory to work with. OPENXML can be used in SELECT, as well as in SELECT INTO statements where a source table or view can be specified. Here is a breakdown of the process:

Store into a variable or otherwise attain XML data that is loaded into memory.
Use the sp_xml_preparedocument system stored procedure to read and parse the XML data and create a document handle.
Access the XML data using the OPENXML statement to define the format of the XML document and return the rowset.
Use the sp_xml_removedocument system stored procedure to destroy the document handle created for XML access.

One of the trickiest parts of the whole procedure is simply having the data available in an XML format so that the rest of the process can continue. The XML data can simply be created through the use of another query if you are moving data from one database to another. You may also want to access the data through a stream or another source. To keep the material presented in this unit focused on what you need to accomplish to prepare for the exam, we will not burden you with the steps needed to access stream or file data. Instead, we will look at data loaded through other queries and directly loaded into variables for use.

To prepare an XML document, you somehow need to get the document into a T-SQL variable. After you have determined your XML source, you need to use the stored procedure for handling XML documents. There are two related procedures that will open a handle to the document and subsequently close the handle when it is no longer needed:

 sp_xml_preparedocument sp_xml_removedocument

The prepare document function will read XML text that can be provided as input directly or via a character variable. The procedure then parses the text using the XML parser. This is a dynamic link library (Msxml2.dll) used to understand the XML information being presented by the internal document. Next the data is provided as a parsed document in a state ready for consumption. This parsed document is a tree representation of the XML data. The following simple example demonstrates the use of the procedure with a direct XML text feed:

 DECLARE @hdoc INT EXEC sp_xml_preparedocument @hdoc OUTPUT,    '<SalesPerson  First="Danny" Last="Thomas">    <Sales Qty="5"/><Sales Qty="10"/></SalesPerson>' EXEC sp_xml_removedocument @hdoc

The first thing you'll notice is the variable declaration. The @hdoc is a variable that holds a temporary value for use later, in the sp_xml_removedocument as well as in the OPENXML. The next thing to note is that an sp_xml_preparedocument and sp_xml_removedocument are paired up. This is critical to ensure appropriate use of resources. Without the removal of the document handle, the only way to recover the memory is to restart SQL Server.

The sp_xml_preparedocument stored procedure has one optional argument: the xpath_namespace argument, which is used to specify an alternative namespace for the document. By default, the system uses the default namespace <root xmlns:mp="urn:schemas-microsoft-com:xml-metaprop">. If you specify an alternative namespace string, it replaces the "urn:schemas-microsoft-com: xml-metaprop" with whatever you specify.

The OPENXML function has three parameters and an optional WITH clause. It takes the document handle, which is returned by the sp_xml_preparedocument procedure, the rowpattern, which specifies which rows to return, and a single-byte value that can specify flags. In most cases you will want to use a rowset filter to achieve more useful output:

 SELECT * FROM openxml(@hdoc, '/SalesPerson', 1)

Using this filter will return only the attributes that pertain to the SalesPerson. Now, if you could just put the rows where they need to be, you'd have it made:

 SELECT * FROM openxml(@hdoc, '/SalesPerson', 1)               WITH (FirstName varchar(30), LastName varchar(30))

That WITH clause does the trick, and you finally get decent output:

 FirstName     LastName -------------   ------------- Danny           Thomas

So now you have the data you want, extracted from an XML rowset. Using the WITH clause is basically the same syntax as laying out the columns in a table: the column name, some space, the data type, a comma, and then the next column name.

OPENXML is extremely cumbersome. You can't read XML from a file easily; you have to spend time fighting with arcane bit-field flags, and you get to completely reformat your dataand you do all that just to get a few rows. You can't declare a variable of type TEXT, so you can hold only about 8KB of XML in SQL Server at a time.