Managing XML Data

In general, XML is a data storage format that can be used to define and store data. An XML document is a data storage medium that lays out the data into elements and attributes in much the same way that a database has rows and columns. An XML schema defines the complete data structure and business rules similarly to how a table definition defines the columns, their types, and the constraints to which data must conform.

XML is much more portable than a database and is quickly becoming the standard for data exchange among websites, applications, and other implementations that require exchange of data. XML is derived from HTML, which was designed to provide an application-independent, character-setindependent method of transferring data, especially transaction-oriented data, across systems.

Note

XML can easily be viewed in Internet Explorer. You simply save well-formed XML into a text file with an .xml extension, and you can then view the XML within Internet Explorer.

One of the primary uses for XML is the transfer of data between disparate systems. Through XML's definition techniques, when the data is sent, the complete definition accompanies the data. This means that the data is always within context and can more easily be mapped to other definitions of the same data.

Note

SQL Server had support for XML already, but prior to SQL Server 2005, it was somewhat difficult to work with. This kept many systems away from its use because it was too cumbersome and complex to deal with. One of the biggest issues was the minimal support for XML documents, which is the heart of what XML is all about.

Newly Supported XML Features

In SQL Server 2005, XML features have grown by leaps and bounds. The new functionality, including an XML data type, XQuery standards-based query capability, XQuery extensions to allow for modifications, enhanced functionality of the FOR XML clause, indexing capabilities over XML data, and a SOAP endpoint for better XML transfers over the Internet.

Using the `xml` Data Type

The new xml data type provides the previously lacking support for the storage of XML documents and fragments without dealing with text conversion issues. You can now store an XML document in its entirety in one field. You can store the accompanying schema as well, to provide for data definition. XML instances stored using the xml data type can be associated with an XML schema definition (XSD) to provide the definition and validation. The xml data type can be used in columns, in variables, or as parameters for stored procedures and functions.

Using XQuery with XML

XQuery is a language for querying XML data. When data is stored using the xml data type, you can use XQuery to dissect the data and pull the required information out for use in procedures. The SQL Server 2005 implementation of XQuery is based on working drafts of the World Wide Web Consortium (W3C).

Using Larger XML with `xarchar(max)`

Another new feature is related to storing XML documents. The new varchar(max) and varbinary(max) declarations of database fields and procedure variables allow for increased XML storage. Previously, developers were limited to 8000 bytes, and because many XML documents are larger than that, text fields were used. Now, with varchar(max), developers can store XML of virtually any size in a field. This is more useful if there is not a need to run XQuery against it.

Using XML DML

The new XML Data Manipulation Language (XML DML) extends the current definition supplied by the W3C. The current working draft of XQuery does not include the ability to modify XML documents. In SQL Server 2005, XQuery is extended to include the capability of inserting, updating, and deleting directly in XML documents or document fragments.

Using `FOR XML` with `PATH`

In SQL Server 2005, you can nest FOR XML statements to create a hierarchy of documents. The addition of the PATH parameter provides an alternative to the cumbersome EXPLICIT clause. The results of a FOR XML statement can be stored directly in the xml data type, which leads to easier transfers of data.

The XML data stored in a system is much more efficiently queried if it has indexing capabilities. Documents and fragments stored in the xml data type can have indexes defined for more effective processing and quicker response times.

Using XML Document Returns

You can configure HTTP endpoints or addresses to which requests based on the SOAP standard can be sent. SQL Server can now receive the packets directly, with no need for middle-tier processing and redirection. The results of queries sent to an HTTP endpoint are returned as an XML document. HTTP endpoint configuration and use are discussed fully in Chapter 6, "Database Maintenance."

The 70-431 exam objectives require you to know how to retrieve and send data from SQL Server for exchange with other systems. You also need to know how to index XML data and how to utilize and manipulate the new xml data type and its associated methods, and you need to know how to configure HTTP endpoints.

Exam Alert

Expect several questions about the SQL Server implementation of XML. Expect questions related to new features, including the ability to index XML, xml data type methods (nodes, query, value, exist, modify), and XML document handling and HTTP endpoints.

There is a considerable amount of new XML functionality in SQL Server. To understand its use, you also have to understand the layout and architecture behind XML documents.

XML: The Basics

An XML document consists of one or more elements, which are bound between angle brackets (< >). The first word that appears inside the angle brackets is the name of the element. The rest of the element consists of element attributes. For example, here's an element:

<Customer  First="Danny" Last="Thomas"/>

The name of this element, or the element type, is Customer. The element has attributes such as ID, First, and Last, which all have values. The element ends with a forward slash and an angle bracket, indicating the end of the element.

An element can also contain other elements, as shown here:

<Customer  First="Danny" Last="Thomas"> <Sales Qty="4"/> <Sales Qty="3"/> </Customer>

In this case, the Customer element contains two Sales elements. Notice that on the first line, there isn't a slash before the ending bracket; the matching slash for this element is on the last line. This is how objects are nested in XML.

Outputting Data in XML Format

To output data in XML format, you use the SELECT statement with the FOR XML operator. This tells SQL Server that instead of returning a rowset, it should return an XML document. There are four different options for generating the XML: RAW, AUTO, EXPLICIT, and PATH.

Exam Alert

Where is the XML schema? To produce XML output that also contains the schema information for the XML, you must tack the XMLDATA qualifier to the end of the FOR XML clause.

In AUTO mode, SQL Server returns the rowset in an automatically generated, nested XML format. If the query has no joins, it doesn't have a nesting at all. If the query has joins, it returns the first row from the first table and then all the correlated rows from each joined table as a nested level. For example, the following query shows order details nested inside orders:

 SELECT O.OrderID, O.CustomerID, OD.ProductID, OD.UnitPrice, OD.Quantity  FROM Orders AS O JOIN [Order Details] AS OD ON O.OrderID = OD.OrderID  WHERE O.OrderID < 10251  FOR XML AUTO <O Order Customer> <OD Product UnitPrice="14.0000" Quantity="12"/> <OD Product UnitPrice="9.8000" Quantity="10"/> <OD Product UnitPrice="34.8000" Quantity="5"/></O> <O Order Customer> <OD Product UnitPrice="18.6000" Quantity="9"/> <OD Product UnitPrice="42.4000" Quantity="40"/></O> <O Order Customer> <OD Product UnitPrice="7.7000" Quantity="10"/> <OD Product UnitPrice="42.4000" Quantity="35"/> <OD Product UnitPrice="15.8000" Quantity="15"/></O>

Note that the alias for each table becomes a row identifier within the XML output.

Note

XML results as shown in the previous query need a unique opening and closing tag, known as the root tag (that is, <root> and </root>), to be well formed. Well-formed XML can be displayed within Internet Explorer for ease of viewing.

When you run the query, the actual XML comes out all on one line, as a stream of data. XML output does not use linefeeds or make things readable in any fashion. The easiest way to write queries for XML is to write them with the FOR XML clause left out, make sure that they are returning the data you want, and then add the FOR XML back onto the end of the query. This eliminates the need for a lot of extra formatting.

The use of the RAW mode of XML output is best suited for situations in which minimal formatting is desired. In RAW mode, each row is returned as an element with the identifier row. Here's an example of the same query you just saw, this time returned in RAW mode:

SELECT O.OrderID, O.CustomerID, OD.ProductID, OD.UnitPrice, OD.Quantity  FROM Orders AS O JOIN [Order Details] AS OD ON O.OrderID = OD.OrderID  WHERE O.OrderID < 10251  FOR XML RAW <row Order Customer Product UnitPrice="14.0000" Quantity="12"/> <row Order Customer Product UnitPrice="9.8000" Quantity="10"/> <row Order Customer Product UnitPrice="34.8000" Quantity="5"/> <row Order Customer Product UnitPrice="18.6000" Quantity="9"/> <row Order Customer Product UnitPrice="42.4000" Quantity="40"/> <row Order Customer Product UnitPrice="7.7000" Quantity="10"/> <row Order Customer Product UnitPrice="42.4000" Quantity="35"/> <row Order Customer Product UnitPrice="15.8000" Quantity="15"/>

Notice that the XML output is in an element/attribute association in that each row of the table is returned as an element, with each column being an attribute. If you prefer, you can return everything as elements, with no attributes, using FOR XML RAW, ELEMENTS. New in SQL Server 2005, this provides the following results:

<row><OrderID>10248</OrderID><CustomerID>VINET</CustomerID> <ProductID>11</ProductID><UnitPrice>14.0000</UnitPrice> <Quantity>12</Quantity></row> <row><OrderID>10248</OrderID><CustomerID>VINET</CustomerID> <ProductID>42</ProductID><UnitPrice>9.8000</UnitPrice> <Quantity>10</Quantity></row> <row><OrderID>10248</OrderID><CustomerID>VINET</CustomerID> <ProductID>72</ProductID><UnitPrice>34.8000</UnitPrice> <Quantity>5</Quantity></row> <row><OrderID>10249</OrderID><CustomerID>TOMSP</CustomerID> <ProductID>14</ProductID><UnitPrice>18.6000</UnitPrice> <Quantity>9</Quantity></row> <row><OrderID>10249</OrderID><CustomerID>TOMSP</CustomerID> <ProductID>51</ProductID><UnitPrice>42.4000</UnitPrice> <Quantity>40</Quantity></row> <row><OrderID>10250</OrderID><CustomerID>HANAR</CustomerID> <ProductID>41</ProductID><UnitPrice>7.7000</UnitPrice> <Quantity>10</Quantity></row> <row><OrderID>10250</OrderID><CustomerID>HANAR</CustomerID> <ProductID>51</ProductID><UnitPrice>42.4000</UnitPrice> <Quantity>35</Quantity></row> <row><OrderID>10250</OrderID><CustomerID>HANAR</CustomerID> <ProductID>65</ProductID><UnitPrice>16.8000</UnitPrice> <Quantity>15</Quantity></row>

Adding XSINIL to the end of the command tacks the namespace argument to the beginning of the XML.

The EXPLICIT and PATH options enable you to specify the format of the XML that will be created. Using these options makes the query more complicated to formulate, but it gives you a little more control over the output.

To answer the questions on the 70-431 exam, you really only need to know the definitions. You will want to dig in much deeper if you continue with the remaining SQL Server certification exams.

Note

You are not going to become an XML expert overnight, and the material presented in this book is not intended to instruct you on XML. Essentially, all this chapter has done thus far is show how to draw data out of SQL Server in XML format. It has not tried to explain XML or produce an XML reference text, but knowing the material presented in this section and the two that follow should get you through the XML portion of the 70-431 exam.

Using the FOR XML clause to view data in XML format is really easy when you get the hang of it. Getting XML data into a database is a little more tricky and not as user friendly.

The `xml` Data Type and Methods

The xml data type lets you store XML documents in a field within an SQL Server database or lets you store a variable within a procedure. By using the xml data type, you can store a complete document in a singular column. XML fragments are also supported. An XML fragment is similar to an XML document, but it is missing a single top-level element. There is a limit to the size of a fragment; its stored XML data cannot exceed 2GB.

XML data stored by itself is referred to as untyped XML. If you optionally associate an XML schema collection with the data, the XML becomes typed, meaning that the definition for the stored XML is also stored. The saved schemas in the collection are used to validate and define the XML.

The xml data type is treated in the same manner as any of the other data types. When used in a table, an XML column can have defaults assigned, can have constraints in place, and can be used as a source or result for computed columns. An XML column cannot be a primary or foreign key, and if the XML data is typed (that is, has an associated schema), the schema must be provided with the definition.

Let's look at a practical example of using an xml data type. In the following example, resumes submitted for online job applications store educational background information into a table called Education. General name and address information is stored in a table called EmployeeProspect. EmployeeProspect has an XML column named Education:

 UPDATE ONE.dbo.EmployeeProspect  SET Education =   (SELECT A.*     FROM ONE.dbo.Education A, ONE.dbo.EmployeeProspect B     WHERE A.ProspectID = B.ProspectID     FOR XML AUTO)

The easiest way to populate an XML column or variable is simply to send the results of a query by using the FOR XML clause. In this example, the results were sent to the Education table before the data was placed in an XML column, which made it easier to illustrate the use of FOR XML. However, other techniques can be used to eliminate the need for the Education table altogether, as you will see a little later in the chapter. Normally, you would receive XML from a web service or other source outside SQL Server, which would then be deposited into a field of the xml data type.

Note

You can download from the website for this book a file called PROSPECTS.XLS. You can use this file to set up the EmployeeProspect table used in the examples in this chapter. To import the file into a table and database, follow the instructions in the ProspectsReadMe.txt file.

XML AUTO by itself provides untyped XML. You can produce typed data by using XML AUTO, TYPE. The enhanced OPENROWSET functionality allows you to bulk-load XML documents into the xml data type columns in the database. Chapter 5, "Data Consumption and Throughput," discusses this functionality further.

XML Method Interactions

The xml data type has several directly associated methods. The following xml data type methods allow for querying, modifying, and reporting on the xml data type:

query() Executes a query over XML and returns the untyped XML result of the query.
value() Extracts a value of SQL type from an XML instance.
exist() Determines whether a query returns a result.
modify() Is used in the SET statement or SET clause of an UPDATE command to alter the content of an XML document.
nodes() Helps shred XML into relational data.

Exam Alert

xml data type methods are queried heavily on all SQL Server certification exams. Expect to see two to three questions in this area.

xml data type methods are considered subqueries and are therefore not permitted in areas where subqueries are not allowed. Because of this, the xml data type methods cannot be used in the PRINT statement or within the GROUP BY clause.

The `query()` Method

By using the query() method, you can return portions of XML. In its simplest form, you supply to the query() method the path through the XML that is desired. The information returned to you is the elements from the path and any subordinate elements and attributes. The information returned is XML data. To return the entire XML set, you use a query() method and supply only the appropriate root identifier, as shown in the following:

SELECT Education.query('/A') FROM EmployeeProspect

To return a subset of the information, you continue through the XML path, providing the necessary level to produce the desired results. The following example pulls from the XML the extra curricular elements only, identified by <EXTRA>, and any subordinates:

SELECT Education.query('/A/EXTRA') FROM EmployeeProspect <EXTRA>Chess Club President</EXTRA><EXTRA>Valedictorian</EXTRA> <EXTRA>Student Body President</EXTRA> <EXTRA>Varsity Basketball</EXTRA> <EXTRA /> <EXTRA /> NULL <EXTRA>Volleyball</EXTRA><EXTRA>Basketball</EXTRA> <EXTRA /> <EXTRA>Gymnastics</EXTRA><EXTRA>Cheerleaders</EXTRA> <EXTRA /> <EXTRA /> <EXTRA /> <EXTRA>Debating Team</EXTRA><EXTRA>Ping Pong</EXTRA> <EXTRA /> <EXTRA /> <EXTRA /> <EXTRA /> <EXTRA /> <EXTRA>Bowling</EXTRA><EXTRA>Golf</EXTRA> <EXTRA>Basketball</EXTRA><EXTRA>Baseball</EXTRA> NULL <EXTRA />

You can drill through more complex nested XML in the same manner. You can also get more involved and begin to implement namespace arguments, but that is beyond the scope of the 70-431 exam and therefore this book.

The `value()` Method

The value() method is similar to the query() method except that it returns a SQL data type instead of an xml data type. You use this method to extract values from XML data. You can specify SELECT queries that combine or compare XML data with data in non-XML columns. You can also produce standard SQL-style reports based on the XML data.

You can retrieve individual attributes by using the @ sign prior to the attribute name within the supplied path. The following query illustrates how you can display a standard SQL column report that combines information from the SQL data type columns and from within the xml data type column:

SELECT ProspectID, FirstName + ' ' + LastName AS Name,  Education.value('(/A/@SCHOOL)[1]', 'varchar(40)') AS School  FROM EmployeeProspects ProspectID Name                School ---------- ------------------- ---------------------------------------- ACKE0001   Pilar Ackerman      Maple Grove High BARB0001   Angela Barbariol    Illinois Tech School BARR0001   Adam Barr           Wabash University BONI0001   Luis Bonifaz        Northern CC BUCH0001   Nancy Buchanan      Univ of IL CHEN0001   John Chen           NULL CLAY0001   Jane Clayton        Elgin CC DELA0001   Aidan Delaney       DuPage DIAZ0001   Brenda Diaz         Roosevelt High School DOYL0001   Jenny Doyle         Heartland CC

The ordinal [1] indicates that the first subordinate is to be used. If multiple schools were present in the data, only the first in each record would be returned.

The `exist()` Method

You can perform an existence test against data by using the exist() method. For example, the following query resolves who in the data set has a degree:

SELECT ProspectID, Education.exist('/A/@DEGREE')  FROM EmployeeProspects EmployeeID ---------- ----- ACKE0001   0 BARB0001   1 BARR0001   1 BONI0001   1 BUCH0001   1 CHEN0001   NULL CLAY0001   1 DELA0001   1 DIAZ0001   0 DOYL0001   1

This method returns a 0, 1, or NULL, depending on whether data in the specified query is found, not found, or null. Our data sample would return 0 for anyone with XML present but no degree, 1 for anyone with a degree, and NULL for anyone with no associated data in the XML column.

The `modify()` Method

You use the modify() method to update an XML document. You can change the content of an XML type variable or column similarly to how you would change the content of any other SQL type column or variable. The modify() method takes an XML DML statement. XML DML provides statements to insert, update, or delete nodes from data. You use the modify() method in the SET clause of an UPDATE statement, as in the following example:

 UPDATE Education  SET Education.modify('replace value of(/A/@DEGREE)[1] with "MA"')  WHERE ProspectID = 'DOYL0001'

Three different XML DML commands can be executed through the modify() method: insert, replace value of, and delete. The replace value of keywords are similar to the traditional SQL UPDATE statement. The insert keyword allows for the addition of one or more nodes or siblings, as in the following:

 UPDATE Education  SET Education.modify('insert  <A SCHOOL="Univ of IL" DEGREE="MA"   MAJOR="Speech and Communica" GPA="3.3"   GRADYEAR="1991"><EXTRA /></A>    after (/A)[1]')  WHERE ProspectID = 'BUCH0001'

When after is used, the new node(s) is placed after the position of the defined query as a sibling. When before is used, the new node(s) is placed in front of the position as a sibling. When into is used, you specify as last or as first, and the new nodes are placed accordingly as descendants of the node identified by the query.

You can also specify the delete keyword with the modify() method to remove nodes from the XML:

 UPDATE Education  SET Education.modify('delete /A/EXTRA')  WHERE ProspectID = 'BUCH0001'

The `nodes()` Method

You use the nodes() method to shred XML into relational data. It allows you to identify nodes that will be mapped into a new row of a recordset. The recordset contains copies of the original XML. You can retrieve multiple values from the recordset to provide a standard SQL report.

Unlike the other methods, nodes() is used as the source of a query in the FROM clause of the SELECT statement. You can either query directly from an XML variable or use CROSS APPLY on a table to an XML column contained within the table, as illustrated in the following example:

SELECT ProspectID, RTRIM(FirstName) + ' ' + Lastname AS Name,  T.Loc.value('@SCHOOL', 'varchar(40)') AS School,  T.Loc.value('@DEGREE', 'varchar(5)') AS Degree  FROM Education CROSS APPLY Education.nodes('/A') AS T(Loc) ProspectID Name              School                     Degree ---------- ----------------- -------------------------- ------ ACKE0001   Millar Ackerman   Maple Grove High BARB0001   Angela Barbariol  Illinois Tech School    AS BARR0001   Adam Barr         Wabash University       BS BONI0001   Luis Bonifaz      Northern CC             AA BUCH0001   Nancy Buchanan    Univ of IL              BA BUCH0001   Nancy Buchanan    Univ of IL              MA CLAY0001   Jane Clayton      Elgin CC                AA DELA0001   Aidan Delaney     DuPage                  BA DIAZ0001   Brenda Diaz       Roosevelt High School DOYL0001   Jenny Doyle       Heartland CC            AA ERIC0001   Gregory Erickson  Harvard University      BS

Newly Supported XML Features

Using the xml Data Type

Using XQuery with XML

Using Larger XML with xarchar(max)

Using XML DML

Using FOR XML with PATH

Using XML Document Returns

XML: The Basics

Outputting Data in XML Format

The xml Data Type and Methods

XML Method Interactions

The query() Method

The value() Method

The exist() Method

The modify() Method

The nodes() Method

Using the `xml` Data Type

Using Larger XML with `xarchar(max)`

Using `FOR XML` with `PATH`

The `xml` Data Type and Methods

The `query()` Method

The `value()` Method

The `exist()` Method

The `modify()` Method

The `nodes()` Method