Getting Started with the xml Data Type

The following examples show how to access the data in an xml data type. This is not intended to be an all-inclusive tutorial on XQuery and XPath; however, the examples should give you an understanding of how to write basic queries for data.

The simplest example is a variable that is an xml data type and has an XML instance value assigned to it. The following SQL snippet creates a variable and populates it with XML data. Note that this populated variable is used for the samples that follow. You can open a new query window in SQL Server Management Studio and put this SQL script before any of the examples that require the @myXPath variable.

SQL: Populating the @myXPath Variable

DECLARE @myXPath xml SET @myXPath = '<vehicles>    <car vin="123" make="Ford" model="Mustang">       <repair>         <id>3</id>         <description>Replace Supercharger.</description>         <cost>4000.34</cost>       </repair>       <repair>         <id>45</id>         <description>Front end alignment.</description>         <cost>45.67</cost>       </repair>    </car>    <car vin="234" make="Mazda" model="Miata">       <repair>         <id>7</id>         <description>Repair top.</description>         <cost>123.45</cost>       </repair>       <repair>         <id>22</id>         <description>Oil Change.</description>         <cost>29.99</cost>       </repair>    </car>    <truck vin="567" make="Nissan" model="Pathfinder">       <repair>         <id>2</id>         <description>Replace air filter.</description>         <cost>34.95</cost>       </repair>       <repair>         <id>6</id>         <description>Oil Change.</description>         <cost>39.99</cost>       </repair>    </truck> </vehicles>'

Using the Query Method with XPath

XQuery includes XPath 2.0 as a navigation language. XPath gets its name from its path expression, which provides a means of efficiently searching and hierarchically addressing the nodes in an XML tree.

For example, you can search for all occurrences of the car element in the XML tree by using //car in your query. The // means descendent or self, and when the query expression starts with the double slashes, it means the document's descendent or self. So //car means find any descendent or self in the XML document that is a car. XQuery also supports abbreviated syntax for specifying the axis. Table 15-1 lists the axes with their corresponding abbreviated syntax.

Table 15-1: Axes Supported by XQuery
Axis	Abbreviated Form
Attribute	@
Child	NA
descendant-or-self::node()	//
parent::node()	?
self::node()	.

You can also include filter expressions by using the [expression] syntax, as shown in the next example. Note that any node referenced in the filter expression is relative to the current node being evaluated. When you need to access an XML attribute, the attribute name must have the @ prefix.

Now that the variable is populated, you can query it as shown in the following SQL snippet.

SQL

SELECT @myXPath.query('//car[@vin="234"]')

When you execute this script, you get the following results.

<car vin="234" make="Mazda" model="Miata">   <repair>     <id>7</id>     <description>Repair top.</description>     <cost>123.45</cost>   </repair>   <repair>     <id>22</id>     <description>Oil Change.</description>     <cost>29.99</cost>   </repair> </car>

The SELECT statement returns the XML node using an XPath query expression that locates car elements at any level in the document with a vin attribute of 234.

One problem with this query is that the data includes cars and trucks, and you might want to locate any vehicle that has a specific vin attribute. Instead of using the word car, you can specify self, which is represented as a period. Here is the query for a vin of 567 that will return a truck.

SQL

SELECT @myXPath.query('//.[@vin="567"]')

This query now looks for any descendent of the XML document that has a vin of 567. The result is as follows.

<truck vin="567" make="Nissan" model="Pathfinder">   <repair>     <id>2</id>     <description>Replace air filter.</description>     <cost>34.95</cost>   </repair>   <repair>     <id>6</id>     <description>Oil Change.</description>     <cost>39.99</cost>   </repair> </truck>

You can change the filter expression to search for the vin of a car, and the query will then return a car as well.

Using a SQL Variable in the Filter Expression

You might want to use a SQL variable to retrieve a car or truck instead of using a constant. This approach is helpful if you are working with parameters in a stored procedure. It is shown in the following script.

SQL

DECLARE @findvar nvarchar(20) SET @findvar = '567' SELECT @myXPath.query('//.[@vin=sql:variable("@findvar")]')

When you execute the script, the same results are returned as in the previous example, but here you can use a SQL variable to filter the output of the query.

The SELECT statement returns an xml data type. If you want to retrieve the make of the car as a string, you can use the value method, as shown in the following SQL snippet.

SQL

DECLARE @findvar nvarchar(20) SET @findvar = '567' SELECT @myXPath.query(    '//.[@vin=sql:variable("@findvar")]')    .value('(//@make)[1]','nvarchar(max)')

When the SQL script is run, it displays Nissan as an nvarchar(max). Notice that the value method requires your query expression to return only a single item, so we added parentheses around the query and a [1] to specify that only the first result be used, even if you know that only one item will be returned from the //@make query expression.

You can perform a search based on the text in a node by using the text method. This method returns the text in the node. If the node has child nodes, their text is included, but the child element tags are not. The following code returns the car or truck whose repair id is equal to 7.

SQL

SELECT @myXPath.query('//.[repair/id/text()="7"]')

The following is returned from the query. Notice that the query returns the complete car with all of its repairs.

<car vin="234" make="Mazda" model="Miata">   <repair>     <id>7</id>     <description>Repair top.</description>     <cost>123.45</cost>   </repair>   <repair>     <id>22</id>     <description>Oil Change.</description>     <cost>29.99</cost>   </repair> </car>

Notice that one of the repairs has an id of 7. If you want to retrieve only the repair whose id is 7, you can tweak the query as follows.

SQL

SELECT @myXPath.query('//repair[id/text()="7"]')

This query locates all repairs in the document with a matching id. The following is the result.

<repair>   <id>7</id>   <description>Repair top.</description>   <cost>123.45</cost> </repair>

You can use information from the parent node in your filter expression by using .. to specify parent. For example, if you want to retrieve the description of the repair whose id is 7, you can use the following script.

SELECT @myXPath.query('//repair/description[../id/text()="7"]')

This script locates all descriptions, where the parent element has an id element and the id is 7. The result is as follows.

<description>Repair top.</description>

Although we haven't covered every variation, these examples should give you enough information to get going with XPath queries.

Performing General Comparisons

XPath provides general comparison operators to compare atomic values, sequences, or a combination of the two. The general comparison operators are =, !=, <, >, <=, and >=. These comparisons return true on any match.

The following example returns all the repair elements whose id node is 40 or greater.

SQL

SELECT @myXPath.query('//repair[id>=40]')

The results are as follows.

<repair>   <id>45</id>   <description>Front end alignment.</description>   <cost>45.67</cost> </repair>

Be careful not to use quotation marks when you work with numeric types; the following script yields an entirely different result.

SQL

SELECT @myXPath.query('//repair[id>="40"]')

Notice the results.

<repair>   <id>45</id>   <description>Front end alignment.</description>   <cost>45.67</cost> </repair> <repair>   <id>7</id>   <description>Repair top.</description>   <cost>123.45</cost> </repair> <repair>   <id>6</id>   <description>Oil Change.</description>   <cost>39.99</cost> </repair>

What happened here? This query performed a string comparison instead of a numeric comparison, which resulted in additional matches because 7 and 6 are greater than 4, which is the first character in 40.

Using Value Comparison Operators

You can also use value comparison operators to compare atomic values. The value comparison operators supported by SQL Server 2005 are eq, ne, lt, gt, le, and ge. You use the value comparison operators when you want to strictly type an atomic value for the comparison operation. This ensures that the correct type is used for the comparison.

The following snippet uses a value comparison operator to return repair elements whose cost is greater than or equal to 45.

SQL

SELECT @myXPath.query('//repair[ xs:decimal(cost[1]) ge 45]')

Notice that the explicit cast to xs:decimal is required when you use the value comparison operator, and the cost must be a single atomic value, so [1] is used to get the first cost result.

The output looks like the following.

<repair>   <id>3</id>   <description>Replace Supercharger.</description>   <cost>4000.34</cost> </repair> <repair>   <id>45</id>   <description>Front end alignment.</description>   <cost>45.67</cost> </repair> <repair>   <id>7</id>   <description>Repair top.</description>   <cost>123.45</cost> </repair>

Using Node Comparison Operators

You use the node comparison operator to determine whether two nodes are actually the same node. The following example performs a node test on two cars to see if they are the same node.

SQL

SELECT @myXPath.query(' if((/vehicles/*[@vin="123"])[1] is (/vehicles/*[@make="Ford"])[1]) then <result>These nodes are actually the same node.</result> else <result>These nodes are different nodes.</result> ')

In this example, two XPath queries first get all child nodes under the vehicle node, using the asterisk. The first expression filters the children by finding a matching vin attribute, while the second expression filters the child nodes by finding a matching make attribute. You want to test to see if the node returned from each query is actually the same node. The result is as follows:

<result>These nodes are actually the same node.</result>

Using Node Order Comparisons

You can use the node comparison operators to compare two nodes in the XML data to find out which node is first. The >> operator means "is after," and the << operator means "is before." Each operator takes two operands. The following example shows how to use the >> operator to find out if the repair whose id is 7 is after the repair whose id is 2.

SQL

SELECT @myXPath.query(' if((/vehicles/*/repair[./])[1] >> (/vehicles/*/repair[./])[1]) then    <result>repair 7 is after repair 2</result> else    <result>repair 2 is after repair 7</result> ')

Notice that both parameters execute an XPath query to get a node set and that the first result is selected. The result of the comparison is as follows:

<result>repair 2 is after repair 7</result>

Using Logical Operators

You can use the and and or logical operators in your queries to combine logical expressions. The following example returns any car or truck whose vin is 567 or 234.

SQL

SELECT @myXPath.query(' /vehicles/*[@vin="567" or @vin="234"] ')

The output is as follows.

<car vin="234" make="Mazda" model="Miata">   <repair>     <id>7</id>     <description>Repair top.</description>     <cost>123.45</cost>   </repair>   <repair>     <id>22</id>     <description>Oil Change.</description>     <cost>29.99</cost>   </repair> </car> <truck vin="567" make="Nissan" model="Pathfinder">   <repair>     <id>2</id>     <description>Replace air filter.</description>     <cost>34.95</cost>   </repair>   <repair>     <id>6</id>     <description>Oil Change.</description>     <cost>39.99</cost>   </repair> </truck>

Notice that both vehicles were found and a car and a truck are returned because of the use of the asterisk (*) to select any child of vehicles.

Using the Query Method with XQuery

XQuery is a superset of XPath 2.0 and extends the navigational and filtering aspects of XPath 2.0 by providing the FLWOR expression syntax. FLWOR stands for for-let-where-order-return and is pronounced flower. These clauses are defined as follows.

for Generates an ordered list of bindings that is assigned to a variable.
let Associates a single variable with the results of an expression. Note that this clause is not supported in SQL Server 2005.
where Filters the list to retain only the results.
order Sorts the query results on given criteria.
return For each query result, construct and return a resulting value.

Learning the XQuery FLWOR syntax is relatively easy if you already know T-SQL and XPath. All of the XPath information covered in this chapter also applies when you use XQuery.

When you work with typed xml values, XQuery works as a strongly typed language. Strong typing improves query performance by providing type assurances that you can use when you perform query optimization. When you work with untyped xml values, XQuery works as a weak typed language for untyped data.

You use the return clause to construct the resulting value, which means you can use XQuery to provide a transformation of the XML data. Queries written in XQuery usually require less code compared with queries written in XSLT (Extensible Stylesheet Language Transformations), which is another XML technology for transforming data. Transforming data is also known as repurposing data.

The following SQL snippet populates a variable called @myXQuery with data that will be used in the upcoming XQuery examples. You can open a new query window in SQL Server Management Studio and put this SQL script before any of the examples that require the @myXQuery variable.

SQL: Populating the @myXQuery Variable

DECLARE @myXQuery xml SET @myXQuery = ' <owners>    <owner name="Joe">       <vehicle vin="567"/>       <vehicle vin="234"/>    </owner>    <owner name="Mary">       <vehicle vin="123"/>    </owner> </owners> <vehicles>    <car vin="123" make="Ford" model="Mustang">       <repair>         <id>3</id>         <description>Replace Supercharger.</description>         <cost>4000.34</cost>       </repair>       <repair>         <id>45</id>         <description>Front end alignment.</description>         <cost>45.67</cost>       </repair>    </car>    <car vin="234" make="Mazda" model="Miata">       <repair>         <id>7</id>         <description>Repair top.</description>         <cost>123.45</cost>       </repair>       <repair>         <id>22</id>         <description>Oil Change.</description>         <cost>29.99</cost>       </repair>    </car>    <truck vin="567" make="Nissan" model="Pathfinder">       <repair>         <id>2</id>         <description>Replace air filter.</description>         <cost>34.95</cost>       </repair>       <repair>         <id>6</id>         <description>Oil Change.</description>         <cost>39.99</cost>       </repair>    </truck> </vehicles>'

The following SQL snippet provides an XQuery sample that uses all of the FLWOR clauses available in SQL Server 2005.

SQL

SELECT @myXQuery.query(' for $r in /vehicles/*/repair where $r/cost >= 100 order by $r/cost[1] ascending return $r ')

The for clause provides a loop that iterates over each repair element belonging to each vehicle. The for clause uses an XPath query to locate the repairs. Each time through the loop, the variable, $r, is assigned an element from the XPath query result. In this case, $r iterates through the repair elements. The where clause is used to provide a filter for the repair elements. This filter is set to ignore all repair elements with a cost less than 100, by using $r/cost to get the repair cost. The order clause changes the order of the result set instead of using the default document order. In this example, the order is by repair cost. If you are ordering on a child node such as cost, you must specify using the first cost node that is found. The return clause lets you decide what to return; it is currently set to return the repair element. The returned results are as follows.

<repair>   <id>7</id>   <description>Repair top.</description>   <cost>123.45</cost> </repair> <repair>   <id>3</id>   <description>Replace Supercharger.</description>   <cost>4000.34</cost> </repair>

Notice that the repairs are in order of cost a different order than in the XML data, and the repair elements are returned with their child elements.

You can use the return clause to reshape the result data. For example, if you want to provide a list of high-cost repairs that also include the vin attribute of the vehicle, you can shape the output as follows.

SQL

SELECT @myXQuery.query(' for $r in /vehicles/*/repair where $r/cost >= 100 return <high-cost-repair repair->    <vehicle vin="{$r/../@vin}"/>    {$r/description}    {$r/cost} </high-cost-repair> ')

The return clause defines a new element called high-cost-repair that is returned for each repair in the result set. This element has an attribute called repair-id that contains the id of the repair. Notice the use of the braces to get the repair's id. Next, we added a new element called vehicle, which has a vin attribute set to the value of the vin attribute of the parent of the repair. Notice that the parent is accessible by using the ? syntax. Finally, the complete description and cost nodes are returned, and the result is as follows.

<high-cost-repair repair->   <vehicle vin="123" />   <description>Replace Supercharger.</description>   <cost>4000.34</cost> </high-cost-repair> <high-cost-repair repair->   <vehicle vin="234" />   <description>Repair top.</description>   <cost>123.45</cost> </high-cost-repair>

You might also want the result to include the owner information for each repair. You can do this in a couple of ways. One way is to create a nested loop that iterates through the owner's vehicles to find a vehicle that has a matching vin, as shown in the following SQL snippet.

SQL

SELECT @myXQuery.query(' for $r in /vehicles/*/repair for $o in /owners/*/vehicle[@vin=$r/../@vin]/.. where $r/cost >= 100 return <high-cost-repair repair->    <owner name="{$o/@name}" />    <vehicle vin="{$r/../@vin}" />    {$r/description}    {$r/cost} </high-cost-repair> ')

When the vehicle is found, the .. syntax is used to get the parent element, which is the owner of the vehicle. Although the second loop actually iterates through the owners/owner/vehicle elements, it returns the owners/owner of the vehicle that has the matching vin.

The return clause defines a new element called owner and contains a name attribute that is populated with the current owner name. (Note that if you simply returned $o, you would see the owner element and its vehicle child elements in the result.) The following is returned.

<high-cost-repair repair->   <owner name="Mary" />   <vehicle vin="123" />   <description>Replace Supercharger.</description>   <cost>4000.34</cost> </high-cost-repair> <high-cost-repair repair->   <owner name="Joe" />   <vehicle vin="234" />   <description>Repair top.</description>   <cost>123.45</cost> </high-cost-repair>

Another way to retrieve the same information, with better performance, is to use an XPath query to populate the owner information without adding a second for clause, as shown in the following SQL script.

SQL

SELECT @myXQuery.query(' for $r in /vehicles/*/repair where $r/cost >= 100 return <high-cost-repair repair->    <owner name="{/owners/*/vehicle[@vin=$r/../@vin]/../@name}" />    <vehicle vin="{$r/../@vin}"/>    {$r/description}    {$r/cost} </high-cost-repair> ')

Notice the XPath query for the owner's name attribute. This approach performs much better than the previous example because nested for loops create joins to the iterator variable.

Using Data Accessors

XQuery supports two data accessor functions that you can use to extract values of nodes as strings or typed values. To extract a node as a string, use the string function; to extract a node as a typed value, use the data function. When you use the data function, the node must be a text node, attribute node, or element node. If the node is a document node of an untyped XML instance, the data function returns a string value of the document. The next section includes an example of data accessor usage.

Using Computed Element Constructors

Instead of scripting the XML tag information where the angle brackets are in your code as in the previous example, you can instruct XQuery to create element and attribute objects and then you can call their constructors using the element name { } and attribute name { } syntax. This technique is significantly faster than the previous example.

The following example demonstrates the implementation of data accessors and computed element constructors to create the same output as in the previous example.

SQL

SELECT @myXQuery.query(' for $r in /vehicles/*/repair where $r/cost >= 100 return element high-cost-repair {    attribute repair-id {$r/id},    element owner    {       attribute name {/owners/*/vehicle[@vin=$r/../@vin]/../@name}    },    element vehicle    {       attribute vin {$r/../@vin}    },    element description {data($r/description)},    element cost { data($r/cost)} } ')

You should favor this approach over the previous examples because it is significantly faster.

Using String Functions

SQL Server 2005 does not support all of the XQuery 1.0 string functions. Here are the string functions you can use in your XQuery statements, along with brief examples.

string-length Used to calculate the length of a string.

SELECT @myXQuery.query(' <result> {(//owners/owner/@name)[1]} {string-length((//owners/owner/@name)[1])} </result> ')

Returns:

<result name="Joe">3</result>

substring Extracts part of a string from the source string. Note that the length can be omitted to return the rest of the source string.
```
SELECT @myXQuery.query(' <result>{substring((//vehicles/truck/@model)[1], 4, 5)}</result> ') 
```
Returns the following based on Pathfinder:
```
<result>hfind</result> 
```

concat Concatenates two or more strings.

SELECT @myXQuery.query(' <result> {    concat    (       (//owners/owner/@name)[1],       " has a ",       (//vehicles/truck[@vin=(//owners/owner)[1]/vehicle/@vin]/@make)[1],       " ",       (//vehicles/truck[@vin=(//owners/owner)[1]/vehicle/@vin]/@model)[1]    ) } </result> ')

Returns:

<result>Joe has a Nissan Pathfinder</result>

contains Determines whether one string is contained within another string. Note that this function is case sensitive.

SELECT @myXQuery.query(' for $r in /vehicles/*/repair where contains(($r/description)[1],"Super") return $r ')

Returns:

<repair>   <id>3</id>   <description>Replace Supercharger.</description>   <cost>4000.34</cost> </repair>

Note

Here is a list of XQuery 1.0 string functions that are not supported in SQL Server 2005 but might be added in a future release: codepoints-to-string, string-to-codepoints, compare, string-join, normalize-space, normalize-unicode, upper-case, lower-case, translate, escape-uri, starts-with, ends-with, substring-before, substring-after, matches, replace, and tokenize.

Using Aggregate Functions

An aggregate function operates on a sequence of items and returns the aggregated value of the sequence. The following is a list of aggregate functions, with brief examples, that you can use in your XQuery statements.

count Returns the number of items in a sequence.

SELECT @myXQuery.query(' <result>{ count(/vehicles/*/repair) }</result> ')

Returns the following:

<result>6</result>

min Returns the minimum value in a range. Works with base types that support the gt operator. Mixed types are not supported.
```
SELECT @myXQuery.query(' <result>{ min(/vehicles/*/repair/cost) }</result> ') 
```
Returns the following:
```
<result>29.99</result> 
```
max Returns the maximum value in a range. Works with base types that support the gt operator. Mixed types are not supported.
```
SELECT @myXQuery.query(' <result>{ max(/vehicles/*/repair/cost) }</result> ') 
```
Returns the following:
```
<result>4000.34</result> 
```
avg Returns the average value of the sequence. Works with the any of the numeric base types or untypedAtomic, which is the text node of an untyped element, but cannot be mixed.
```
SELECT @myXQuery.query(' <result>{ avg(/vehicles/*/repair/cost) }</result> ') 
```
Returns the following:
```
<result>712.398333333333</result> 
```
sum Returns the sum value of the sequence. Works with any of the numeric base types or untypedAtomic, but cannot be mixed.
```
SELECT @myXQuery.query(' <result>{ sum(/vehicles/*/repair/cost) }</result> ') 
```
Returns the following:
```
<result>4274.39</result> 
```

Using Context Functions

Context functions are based on your location in an XML tree. SQL Server 2005 implements the following context functions, which are shown here with small example scripts.

last Used to determine the number of items in a sequence.

SELECT @myXQuery.query(' <result>{ (/vehicles/*/repair/cost)[last()] }</result> ')

Returns the last repair cost:

<result>   <cost>39.99</cost> </result>

To get the last repair of each vehicle and then get its cost, use the following:

SELECT @myXQuery.query(' <result>{ /vehicles/*/repair[last()]/cost }</result> ')

The results:

<result>   <cost>45.67</cost>   <cost>29.99</cost>   <cost>39.99</cost> </result>

position Obtains your current position in a sequence.

SELECT @myXQuery.query(' <result>{ (/vehicles/*/repair)[last() -position() < 3] }</result> ')

The parentheses around the /vehicles/*/repair expression indicate that this expression should be fully executed, and its result is fed into the last function to get the last three repair elements in the document. If the parentheses were used, the last three repairs of each vehicle would be returned instead. The expression returns the last three repairs in the document as follows:

<result>   <repair>     <id>22</id>     <description>Oil Change.</description>     <cost>29.99</cost>   </repair>   <repair>     <id>2</id>     <description>Replace air filter.</description>     <cost>34.95</cost>   </repair>   <repair>     <id>6</id>     <description>Oil Change.</description>     <cost>39.99</cost>   </repair> </result>

Using XQuery FLWOR vs. XPath

You should always try to use an XPath expression over a FLWOR expression because the for statement in the FLWOR expression creates a query plan with a join operation between the for variable and the body of the for clause, and this join results in poorer performance compared to the XPath expression.

You should use the FLWOR expression when you need to do any of the following.

Filter the result sequence of the for clause based on something that cannot be defined using simple XPath expressions.
Sort the result set based on a sorting expression. Sorting is defined on the result set using the order by clause.
Construct the returned result using the results obtained from the for clause.

Using Namespaces in Your Queries

In SQL Server 2005, XQuery expressions consist of a prolog and a body. You can use the prolog to declare namespaces. A namespace declaration is used to map a prefix to a namespace Uniform Resource Identifier (URI), which is a compact string of characters that identify a resource. You use the prefix when a namespace is required, instead of using the complete URI.

Remember that in XML, the default namespace for attributes is always the null namespace and the only way to get an attribute into a different namespace is to explicitly assign the attribute to the namespace. XQuery is consistent with this behavior, and you should be aware that XQuery uses the same namespaces for searching and constructing XML.

The examples in this section use a variable called @myXQueryNs, which is populated with the following XML data. Note that the namespace assignments are intentionally inconsistent.

SQL: Populating the @myXQueryNs Variable

DECLARE @myXQueryNs xml SET @myXQueryNs = ' <work xmlns="default.gjtt.com"       xmlns:OWN="owner.gjtt.com"       xmlns:VEH="vehicle.gjtt.com" >   <OWN:owners>      <OWN:owner name="Joe">         <OWN:vehicle OWN:vin="567"/>         <OWN:vehicle vin="234"/>      </OWN:owner>      <OWN:owner name="Mary">         <OWN:vehicle vin="123"/>      </OWN:owner>   </OWN:owners>   <VEH:vehicles>      <car vin="123" make="Ford" model="Mustang">         <VEH:repair>           <id>3</id>           <description>Replace Supercharger.</description>           <cost>4000.34</cost>         </VEH:repair>         <repair>           <id>45</id>           <description>Front end alignment.</description>           <cost>45.67</cost>         </repair>      </car>      <car vin="234" make="Mazda" model="Miata">         <repair>           <id>7</id>           <description>Repair top.</description>           <cost>123.45</cost>         </repair>         <repair>           <id>22</id>           <description>Oil Change.</description>           <cost>29.99</cost>         </repair>      </car>      <truck vin="567" make="Nissan" model="Pathfinder">         <repair>           <id>2</id>           <description>Replace air filter.</description>           <cost>34.95</cost>         </repair>         <repair>           <id>6</id>           <description>Oil Change.</description>           <cost>39.99</cost>         </repair>      </truck>   </VEH:vehicles> </work>'

What is the result if you perform a simple query for the last repair using the following SQL snippet?

SELECT @myXQueryNs.query(' <result>{ (//repair)[last()] }</result> ')

This snippet is returned:

<result />

No repairs were found because the repairs defined in @myXQueryNs are actually in the default.gjtt.com namespace, which is the default namespace. The search for //repair looks for a repair in the empty namespace.

One way to correct the problem is to define a namespace in the prolog of the XQuery statement and use it for the search, as shown in the following SQL script.

SELECT @myXQueryNs.query(' declare namespace ns1 = "default.gjtt.com"; <result>{ (//ns1:repair)[last()] }</result> ')

In this code, a namespace is declared in the prolog section of the XQuery expression and is referenced in the search for the repair. This returns the following repair.

<result>   <ns1:repair xmlns:ns1="default.gjtt.com">     <ns1:id>6</ns1:id>     <ns1:description>Oil Change.</ns1:description>     <ns1:cost>39.99</ns1:cost>   </ns1:repair> </result>

The first thing to verify is whether you got the repair you expected. You should have, because this is the last repair element in the XML document.

Notice that the ns1 prefix is on the repair and its elements and that the namespace declaration in the result maps ns1 to the correct namespace. This means the namespace mapping you defined for the search is the same namespace mapping used in the result.

Although the result is technically accurate, if you want these elements to be in a default namespace, you can use the following SQL snippet.

SELECT @myXQueryNs.query(' declare default element namespace "default.gjtt.com"; <result>{ (//repair)[last()] }</result> ')

This code declares a default element namespace in the prolog of the XQuery expression, which means the search for the repair will use the default namespace without requiring a prefix. The result is as follows:

<result xmlns="default.gjtt.com">   <repair>     <id>6</id>     <description>Oil Change.</description>     <cost>39.99</cost>   </repair> </result>

The result and its elements are all in the default namespace. Once again, notice that the same namespace definitions used for the query were also used in the output.

Do You Really Want to Set the Default Namespace? Wait. This output isn't the same as the previous output. The previous output placed the result element in the null namespace, and this output places the result in the default.gjtt.com namespace. Try to change the namespace of the result element, as shown in the following snippet.

SELECT @myXQueryNs.query(' declare default element namespace "default.gjtt.com"; <result xmlns="">{ (//repair)[last()] }</result> ')

This code looks like it should work, but here's the output:

<p1:result xmlns="" />

What happened? The namespace declaration added to the result element changed the namespace used for the search, so no repairs were found. You can make a quick fix to the search to retrieve repairs that are in any namespace, as follows.

SELECT @myXQueryNs.query(' declare default element namespace "default.gjtt.com"; <result xmlns="">{ (//*:repair)[last()] }</result> ')

The asterisk is used as a wildcard to search all namespaces for repair elements, but here's the output:

<result xmlns=""   <p1:repair xmlns:p1="default.gjtt.com">>     <p1:id>6</p1:id>     <p1:description>Oil Change.</p1:description>     <p1:cost>39.99</p1:cost>   </p1:repair> </result>

This output looks the same as the original, where the result element is in the default namespace. This exercise demonstrates three key points.

If you change a namespace in your element construction, that new namespace will be used for searching.
When you change the default element namespace from the default null namespace, you have a way to set namespace back to the null namespace.
Consider never using a null namespace; this will also eliminate the problem.

Using the SQL WITH XMLNAMESPACES Clause You can also use the SQL WITH XMLNAMESPACES clause to set the namespaces for your XQuery expression. This clause is an extension to the SQL WITH clause that you can use to define locking hints. (Locking hints are not related to this subject, however.) To eliminate any ambiguities, you must end the previous T-SQL statement with a semicolon (;) before using this clause.

The following SQL snippet retrieves the last repair using the WITH XMLNAMESPACES clause.

; --semicolon required on previous statement WITH XMLNAMESPACES (    'default.gjtt.com' AS "ns1" ) SELECT @myXQueryNs.query(' <result>{ (//ns1:repair)[last()] }</result> ')

Here is the result:

<result>   <ns1:repair xmlns:ns1="default.gjtt.com">     <ns1:id>6</ns1:id>     <ns1:description>Oil Change.</ns1:description>     <ns1:cost>39.99</ns1:cost>   </ns1:repair> </result>

You can also set the default namespace. Note that setting the default namespace using the WITH XMLNAMESPACES clause works the same way as the declare default element namespace clause that we looked at previously. The following SQL snippet sets the default namespace.

; --semicolon required on previous statement WITH XMLNAMESPACES (    DEFAULT 'default.gjtt.com' ) SELECT @myXQueryNs.query(' <result>{ (//repair)[last()] }</result> ')

This code returns the following:

<p1:result xmlns:p1="default.gjtt.com">   <p1:repair>     <p1:id>6</p1:id>     <p1:description>Oil Change.</p1:description>     <p1:cost>39.99</p1:cost>   </p1:repair> </p1:result>

The output is a bit different than with the declare default element namespace clause. Although all of the constructed elements have the proper namespace, the output still has a null default namespace, which might work well if you are combining this output with other output.

Another benefit to using the WITH XMLNAMESPACES clause over the declare default element namespace clause is that you can define the namespaces once for a single SELECT statement. Consider the following example that retrieves a column for the last repair and a column for the first repair.

SELECT @myXQueryNs.query(' declare namespace ns1 = "default.gjtt.com"; <result>{ (//ns1:repair)[last()] }</result> ') as LastRepair, @myXQueryNs.query(' declare namespace ns1 = "default.gjtt.com"; <result>{ (//ns1:repair)[1] }</result> ') as FirstRepair

Notice that the namespace is defined for each XQuery expression. Instead, you can use the WITH XMLNAMESPACES clause as follows and define the namespace only once.

; --semicolon required on previous statement WITH XMLNAMESPACES (    DEFAULT 'default.gjtt.com' ) SELECT @myXQueryNs.query(' <result>{ (//repair)[last()] }</result> ') as LastRepair, @myXQueryNs.query(' <result>{ (//repair)[1] }</result> ') as FirstRepair

Using Namespaces with Attributes When you work with namespaces and attributes, remember that the attribute is in the null namespace unless you explicitly specify a namespace. The following snippet is one example of how to query using an attribute.

SELECT @myXQueryNs.query(' declare namespace ns1 = "owner.gjtt.com"; <result>{ //ns1:vehicle[@vin="234"] }</result> ')

This code returns the following:

<result>   <ns1:vehicle xmlns:ns1="owner.gjtt.com" vin="234" /> </result>

Notice that no namespace is needed for the vin because the attribute is in the null namespace even though the XML document sets the default namespace. If you try to search for the vin with a value of 567, you need the namespace identifier, as shown in the following snippet.

SELECT @myXQueryNs.query(' declare namespace ns1 = "owner.gjtt.com"; <result>{ //ns1:vehicle[@ns1:vin="567"] }</result> ')

This returns:

<result>   <ns1:vehicle xmlns:ns1="owner.gjtt.com" ns1:vin="567" /> </result>

Using Schemas to Work with Typed XML Data

The previous examples use a variable that is an "untyped" xml data type. Untyped xml data types are stored as raw text data with their elements and attributes, and they are checked to verify that they are well formed. Untyped xml data types are ideal when the data in each xml instance is in a different, or varied, format when you want to ensure that the data is well formed but the field can contain any kind of XML data.

You can also create a "typed" xml data type, which requires the use of an XML schema that has been registered with SQL Server. To register schemas with SQL Server, you add the schemas to SQL Server Schema Collections. The use of schemas offers the following benefits.

Provides early error detection based on type checking.
Provides increased performance in for queries by eliminating much of the type inspection and providing a better query plan.
Validates XML instances during inserts.
Validates XML instances during updates.
Optimizes SQL Server storage.

Why Add Schemas to a Schema Collection? You add schemas to a Schema Collection, and you assign the Schema Collection to an xml type to allow schema versioning. Let's say you create a schema for sales orders that has a target namespace of Orders-V1. You add the schema to a new Schema Collection and assign this schema to the applicable xml types. Later, you realize that you need to add an OrderDate element. You can create the new schema with the changes and set the target namespace to Orders-V2. You can add this new schema to the Schema Collection, and the xml types will work with both Orders-V1 and Orders-V2.

The following SQL script adds an abbreviated schema of the Northwind database's Orders table to a new Schema Collection called OrdersSchemas.

SQL: Creating the Schema Collection

CREATE XML SCHEMA COLLECTION OrdersSchemas AS '<?xml version="1.0" standalone="yes"?> <xs:schema        targetNamespace="orders-v1"       xmlns:mstns="orders-v1"       xmlns="orders-v1"       xmlns:xs="http://www.w3.org/2001/XMLSchema"       xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"       attributeFormDefault="qualified"       elementFormDefault="qualified">   <xs:element name="northwindDataSet" msdata:IsDataSet="true"       msdata:UseCurrentLocale="true">     <xs:complexType>       <xs:choice minOccurs="0" maxOccurs="unbounded">         <xs:element name="Orders">           <xs:complexType>             <xs:sequence>               <xs:element name="OrderID" msdata:ReadOnly="true"                  msdata:AutoIncrement="true" type="xs:int" />               <xs:element name="CustomerID" minOccurs="0">                 <xs:simpleType>                   <xs:restriction base="xs:string">                     <xs:maxLength value="5" />                   </xs:restriction>                 </xs:simpleType>               </xs:element>               <xs:element name="Freight" type="xs:decimal" minOccurs="0" />               <xs:element name="ShipName" minOccurs="0">                 <xs:simpleType>                   <xs:restriction base="xs:string">                     <xs:maxLength value="40" />                   </xs:restriction>                 </xs:simpleType>               </xs:element>             </xs:sequence>           </xs:complexType>         </xs:element>       </xs:choice>     </xs:complexType>   </xs:element> </xs:schema>'

You can view your registered schemas by using the xml_schema_namespace function and passing the name of the Schema Collection, as shown in the following SQL snippet.

SELECT xml_schema_namespace('dbo', 'OrdersSchemas')

This code lists all of the schemas in the Schema Collection, but you can use the query method to filter the list of schemas that are returned, based on the targetNamespace attribute in the schema element:

SELECT xml_schema_namespace('dbo', 'ordersSchemas').query('    /xs:schema[@targetNamespace="orders-v1"] ')

After this schema has been added, you can create xml data types that are typed based on this Schema Collection. The following SQL snippet declares and initializes a typed xml variable called @myTypedXml.

SQL: Populating the @myTypedXml Variable

DECLARE @myTypedXml xml(OrdersSchemas) SET @myTypedXml = '<northwindDataSet xmlns="orders-v1">   <Orders>     <OrderID>10248</OrderID>     <CustomerID>VINET</CustomerID>     <Freight>32.3800</Freight>     <ShipName>Vins et alcools Chevalier</ShipName>   </Orders> </northwindDataSet>'

This XML data conforms to the schema, so it will be successfully validated and you can use this variable.

If you realize that you want an ordepdate and you simply try to add the order date to the data, you will receive an error stating that the OrderDate element was not expected. You must create a new version of the schema that identifies the OrderDate as shown in the following SQL snippet.

SQL: Adding Version 2 of the Schema

ALTER XML SCHEMA COLLECTION OrdersSchemas ADD '<?xml version="1.0" standalone="yes"?> <xs:schema        targetNamespace="orders-v2"       xmlns:mstns="orders-v2"       xmlns="orders-v2"       xmlns:xs="http://www.w3.org/2001/XMLSchema"       xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"       attributeFormDefault="qualified"       elementFormDefault="qualified">   <xs:element name="northwindDataSet" msdata:IsDataSet="true"       msdata:UseCurrentLocale="true">     <xs:complexType>       <xs:choice minOccurs="0" maxOccurs="unbounded">         <xs:element name="Orders">           <xs:complexType>             <xs:sequence>               <xs:element name="OrderID" msdata:ReadOnly="true"                  msdata:AutoIncrement="true" type="xs:int" />               <xs:element name="CustomerID" minOccurs="0">                 <xs:simpleType>                   <xs:restriction base="xs:string">                     <xs:maxLength value="5" />                   </xs:restriction>                 </xs:simpleType>               </xs:element>               <xs:element name="OrderDate" type="xs:dateTime" minOccurs="0" />               <xs:element name="Freight" type="xs:decimal" minOccurs="0" />               <xs:element name="ShipName" minOccurs="0">                 <xs:simpleType>                   <xs:restriction base="xs:string">                     <xs:maxLength value="40" />                   </xs:restriction>                 </xs:simpleType>               </xs:element>             </xs:sequence>           </xs:complexType>         </xs:element>       </xs:choice>     </xs:complexType>   </xs:element> </xs:schema>'

Notice that targetNamespace and the default namespace have been changed to orders-v2. Also, the order data has been added between the CustomerID and Freight, and its type is xs:dateTime. Try to use the following code to initialize the @myTypedXml variable.

DECLARE @myTypedXml xml(OrdersSchemas) SET @myTypedXml = '<northwindDataSet xmlns="orders-v2">   <Orders>     <OrderID>10248</OrderID>     <CustomerID>VINET</CustomerID>     <OrderDate>1996-07-04T00:00:00.0000000-04:00</OrderDate>     <Freight>32.3800</Freight>     <ShipName>Vins et alcools Chevalier</ShipName>   </Orders> </northwindDataSet>'

If you want to query for the OrderID of an Order with a specific OrderDate, you can use the following SQL script.

SELECT @myTypedXml.query(' declare default element namespace "orders-v2"; <result>   {//Orders[OrderDate=xs:dateTime("1996-07-04T04:00:00Z")]/OrderID} </result>')

Notice that the filter value must be cast to xs:dateTime. Without the explicit cast, an error will occur stating that the = operator cannot be applied to operators of type dateTime and string. (Note that you don't need to include the milliseconds in your query.) Here is the result:

<result xmlns="orders-v2">    <OrderID>10248</OrderID> </result>

You now have two schema versions in your Schema Collection, and you can use either version, or, to state it more accurately, both versions of the schema and data can co-exist. This is a nice feature that you don't have with standard relational data, which has a much more rigid schema and requires all data to be upgraded to the latest schema when the schema is changed.

Using the xml Data Type in SQL Tables

Up to this point, all of our sample code has been based on assigning XML data to a variable that is an xml data type, but the XML data can also be used as a column type in a database table. This means that you can use relational data with XML data, and everything covered in this chapter is applicable to xml columns.

You can create a table with typed and untyped xml columns. You can have many xml columns in a table, and each xml value (XML instance) that is inserted has a maximum size of 2 GB. To insert a row into the table, use the T-SQL INSERT command, which is used in the following SQL script that creates a table called MyXmlTable in the Northwind database and populates it with data.

SQL: Creating and Populating MyXmlTable

USE Northwind CREATE TABLE MyXmlTable(id int NOT NULL, ownername varchar(50), vehicles xml) GO ALTER TABLE dbo.MyXmlTable ADD CONSTRAINT    PK_MyXmlTable PRIMARY KEY CLUSTERED    (    id    ) WITH(STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,    ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO INSERT MyXmlTable VALUES(1, 'Joe', '<vehicles>    <car vin="234" make="Mazda" model="Miata">       <repair>         <id>7</id>         <description>Repair top.</description>         <cost>123.45</cost>       </repair>       <repair>         <id>22</id>         <description>Oil Change.</description>         <cost>29.99</cost>       </repair>    </car>    <truck vin="567" make="Nissan" model="Pathfinder">       <repair>         <id>2</id>         <description>Replace air filter.</description>         <cost>34.95</cost>       </repair>       <repair>         <id>6</id>         <description>Oil Change.</description>         <cost>39.99</cost>       </repair>    </truck> </vehicles>') INSERT MyXmlTable VALUES(2, 'Mary', '<vehicles>    <car vin="123" make="Ford" model="Mustang">       <repair>         <id>3</id>         <description>Replace Supercharger.</description>         <cost>4000.34</cost>       </repair>       <repair>         <id>45</id>         <description>Front end alignment.</description>         <cost>45.67</cost>       </repair>    </car> </vehicles>') INSERT MyXmlTable VALUES(3, 'Bob', null)

This script inserts a row for Joe and his vehicles, another row for Mary and her vehicles, and another row for Bob, who has no vehicles. In the past, you would have converted (or "shredded") the xml data to relational data, which meant that you would have created a vehicles table and a repairs table to hold the shredded data. But you know that the structure of this data might be very different depending on the vehicle and/or the repair. In addition, the XML data might be coming from an external XML source, so storing this data as an xml column makes sense.

Now that you have some table data, you can run all of the same SELECT statements that we ran earlier in the chapter. For example, the following SQL script returns the ID, the owner's name, and the last repair for each owner.

SQL

SELECT id, ownername,     vehicles.query('(/vehicles/*/repair)[last()]') as lastrepair FROM MyXmlTable

The result is shown in Figure 15-3.

image from book
Figure 15-3: The result of querying for each owner's last repair

Using Column Data in the XQuery Expression

Sometimes you will want to access column data from within your XQuery expression. For example, when you run your query, you might want to return a single xml column that contains XML data that has been constructed and includes the owner's name and ID. For this, you can use the sql:column method, as shown in the following SQL snippet.

SQL

SELECT vehicles.query(' for $e in /* return element owner {     attribute ownerid {sql:column("id")},     attribute name {sql:column("ownername")},     $e } ') FROM MyXmlTable

This code creates an outer element for the owner, creates attributes for the ownerid and name columns, and embeds the existing XML data into this element. Three rows are returned, but Bob has no cars, so his XML data is null. The following shows the XML data for Joe.

<owner owner name="Joe">   <vehicles>     <car vin="234" make="Mazda" model="Miata">       <repair>         <id>7</id>         <description>Repair top.</description>         <cost>123.45</cost>       </repair>       <repair>         <id>22</id>         <description>Oil Change.</description>         <cost>29.99</cost>       </repair>     </car>     <truck vin="567" make="Nissan" model="Pathfinder">       <repair>         <id>2</id>         <description>Replace air filter.</description>         <cost>34.95</cost>       </repair>       <repair>         <id>6</id>         <description>Oil Change.</description>         <cost>39.99</cost>       </repair>     </truck>   </vehicles> </owner>

Using the exist Method with XQuery

In the previous example, MyXmlTable was queried for each owner's last repair, but you might also want to use the xml column in the T-SQL WHERE clause to filter the result. Consider the following SQL script that retrieves all owners and their trucks.

SQL

SELECT id, ownername,     vehicles.query('//truck') as trucks FROM MyXmlTable

This query returns all owners, even if the owner has no trucks. If you want to see only the owners who have at least one truck, you can use the exist method to search the xml column value in each row. If the row has a match, the exist function returns true as a value of 1. If no match is found, the function returns false as a value of 0. If the xml column is null, the exist function returns null. The following SQL script retrieves owners and trucks only if the owner has at least one truck.

SQL

SELECT id, ownername,     vehicles.query('//truck') as trucks FROM MyXmlTable WHERE vehicles.exist('//truck')= 1

The results are shown in Figure 15-4. Notice that Joe is the only truck owner, so he is the only owner that is displayed.

image from book
Figure 15-4: Using the exist method to filter the query output

Using the Modify Method to Change Data

You can use the modify method with XML instances to replace the value of a single node, insert one or more nodes, or delete nodes. This means you don't need to retrieve the complete XML instance, modify it, and replace it when you need to make a small change.

Note

In these examples, literal strings are used to perform modifications. You get an error if you try to modify by using a SQL variable. The use of SQL variables in the modify statement is not yet implemented but might be implemented in a future release. For a workaround, see the upcoming section titled "Moving Part of XML Data from One Row to Another."

Replacing the Value of a Node

If you want to modify the value that is in a single node, you can use the replace value of clause of the modify method. This clause requires a search expression that returns a single node, plus a replacement expression. If the node is not found, no replacement is made. The replacement uses the following format in the modify method.

replace value of    SearchExpression with    NewExpression

Any attempt to execute the modify method on a null XML instance generates an error, so you must add a WHERE clause to your SQL statement to filter out null rows. The following SQL script searches for the repair whose id is 6 and replaces the repair description with an updated description.

SQL

update MyXmlTable SET vehicles.modify(' replace value of (//repair[id=6]/description/text())[1] with "Oil change. Took 5 Quarts of oil."') WHERE vehicles is not null

When this update is executed, the description's text node is modified.

If you want to simply change the value of a node in an xml variable, you can use the following syntax to execute the modify method.

SQL

DECLARE @myXmlVar xml SET @myXmlVar = ' <meats>    <meat>beef</meat>    <meat>pork</meat>    <meat>chicken</meat> </meats>' --initialize SET @myXmlVar.modify(' replace value of (//meat/text()[.="pork"])[1] with "fish" ') --replace node SELECT @myXmlVar --show change

This script declares a variable, initializes it, modifies it, and displays it.

Inserting a New Node

You can insert nodes into the XML instance by using the insert clause of the modify method. This clause requires a search expression that returns a single node, a location indicator, and the insert expression. If the node is not found, no insert is made. You can't use the insert clause to modify a null XML instance. Doing so results in the following exception.

Msg 5302, Level 16, State 1, Line 1 Mutator 'modify()' on 'vehicles' cannot be called on a null value.

The insert clause has the following format.

insert    NewExpression       as first into | as last into | after | before    SearchExpression

An example of the insert method is when additional repairs are made to a vehicle. In this scenario, Mary has had her wiper blades replaced and the repair must be inserted after the last repair. The following script modifies the data by adding an additional repair to Mary's car and then displays the resulting XML instance.

SQL

UPDATE MyXmlTable SET vehicles.modify(' insert <repair><id>102</id> <description>Replace wiper blades.</description> <cost>24.95</cost> </repair> as last into (//car[@vin="123"])[1] ') WHERE id=2 --Mary's id SELECT vehicles FROM MyXmlTable WHERE id=2

The first item of interest is the WHERE clause that returns Mary's row. This means that the only XML instance that will be included in the query is Mary's XML instance. The search expression locates the node of the car that was repaired. The new repair is inserted into the car node, as the last node within the car. The result of this query is as follows.

<vehicles>   <car vin="123" make="Ford" model="Mustang">     <repair>       <id>3</id>       <description>Replace Supercharger.</description>       <cost>4000.34</cost>     </repair>     <repair>       <id>45</id>       <description>Front end alignment.</description>       <cost>45.67</cost>     </repair>     <repair>       <id>102</id>       <description>Replace wiper blades.</description>       <cost>24.95</cost>     </repair>   </car> </vehicles>

Deleting a Node

You can delete nodes in the XML instance by using the delete clause of the modify method. This clause requires a search expression that can return multiple nodes, and those nodes are deleted. If no nodes are found, no nodes are deleted. The format of the delete clause is as follows.

delete    SearchExpression

If Joe sells his car, you must delete it from his XML instance. You can do this by using the following SQL script.

SQL

UPDATE MyXmlTable SET vehicles.modify(' delete //car[@vin="234"] ') WHERE id=1 --Joe's id

Moving Part of XML Data from One Row to Another

Now we will put everything together and also show the workaround to deal with not being able to use SQL variables with the modify method.

Here is the scenario: if Joe sells his truck to Bob, you must retrieve the truck from Joe's XML instance and insert it into Bob's XML instance. Remember that you must check to see if Bob's XML instance is null before attempting the insert, and you must delete the truck from Joe's XML instance as well.

The following SQL script is an example of how you might move a vehicle based on its vin (567) from a source owner (Joe) to a destination owner (Bob).

SQL

DECLARE @SourceOwner int DECLARE @DestinationOwner int DECLARE @vin varchar(50) SET @SourceOwner = 1 SET @DestinationOwner = 3 SET @vin = '567' BEGIN TRY   BEGIN TRAN   DECLARE @sourceVehicle xml   DECLARE @destinationVehicle xml   --get the source vehicle   SELECT @sourceVehicle = vehicles.query('       //vehicles/*[@vin=sql:variable("@vin")]       ')   FROM MyXmlTable   WHERE id = @SourceOwner   AND vehicles is not null   --was source vehicle retrieved?   IF @sourceVehicle.exist('       /*[@vin=sql:variable("@vin")]       ') <> 1   BEGIN     RAISERROR ('Vehicle with VIN: %s not owned by owner: %d',       16, 1, @vin, @SourceOwner)   END   -get destination xml   SELECT @destinationVehicle = vehicles   FROM MyXmlTable   WHERE id = @destinationOwner   --null? add root   IF @destinationVehicle is null   BEGIN     UPDATE MyXmlTable     SET vehicles = '<vehicles/>'     WHERE id = @destinationOwner   END   ELSE   BEGIN     --not null, but no vehicles element; add one     IF @destinationVehicle.exist('/vehicles') <> 1     BEGIN       UPDATE MyXmlTable       SET vehicles.modify('          insert <vehicles/> as last into /       ')       WHERE id = @destinationOwner     END   END   --add to destination owner   --Yuk, can't use xml type variable   --Work around by building and executing string   DECLARE @sql nvarchar(MAX)   SET @sql = 'UPDATE MyXmlTable     SET vehicles.modify(''insert '     + CAST(@sourceVehicle as nvarchar(MAX))     + ' as last into (/vehicles)[1]'')     WHERE id = ' + CONVERT(varchar(MAX),@destinationOwner)   EXEC (@sql)   --delete from source owner   UPDATE MyXmlTable   SET vehicles.modify('       delete //vehicles/*[@vin=sql:variable("@vin")]       ')   WHERE id=@SourceOwner   --done   COMMIT TRAN END TRY BEGIN CATCH   ROLLBACK TRAN   PRINT ERROR_MESSAGE() END CATCH --show results SELECT * FROM MyXmlTable

This script starts by declaring SQL variables to hold the source owner, destination owner, and vin. Next it enters a try/catch block and starts a transaction. It creates temporary variables for the source vehicle and the destination vehicle. It then loads the source vehicle variable from the source owner's table row and checks to see if the vehicle was indeed loaded. It loads the destination vehicle's xml column from the destination owner's table row and checks to make sure it isn't null and that it has a vehicles element to hold the newly purchased vehicle.

After performing all of the validations, the script creates a variable that contains a SQL UPDATE statement that executes the modify method, and then the script executes the contents of the variable. It's a bit of a kludge, but it works.

Using the nodes Method to Change Data

The nodes method accepts an XPath search expression and returns a collection of instances of xml data types, each with its context set to the node that its XPath expression evaluates to. This collection supports the query, value, nodes, and exist methods, and it can be used in count(*) aggregations and null checks. The collection can also be used to build a database table with a single column that contains the XML data. All other uses result in an error.

To understand how this method works, it's best to start by examining how it works with a variable, and then look at its operation with a table.

Earlier in this chapter, we populated a variable called @myXPath with a list of three vehicles, each with two repairs (for a total of six repairs). How can you view these repairs in a tabular format that is similar to a database table format? You can try running an XPath query for the repair elements like the following.

SELECT @myXPath.query('//repair')

This query yields a single row with a single column, but you want to see one row for each repair (six total rows) with a column for the id, description, and cost (three columns). This is where the nodes method can help. By itself, the nodes method breaks up the results into six rows of one column, each containing the XML data for a repair. The nodes method is a table-valued function, so it is used after the FROM clause in a SELECT statement, as follows.

SELECT T.myRepair.query('.') FROM @myXPath.nodes('//repair') AS T(myRepair)

Notice that the FROM clause contains a call to the nodes method on the @myXPath variable, which uses the same XPath search expression that we previously used. This method returns a table with a single column, both unnamed, so you must create an alias for the table and for the column. In this case, we created a table alias called T and a column alias called myRepair by using the AS clause. The myRepair column cannot be directly viewed, but a simple query for self will display its contents, as shown in Figure 15-5.

image from book
Figure 15-5: Using the nodes method to produce a table with a single column containing the XML data for each item matching the search

Now that we have a table with six rows, we can use the value method to extract single values from our column of XML data, which we will use to produce the three columns that are required. Also, it's not necessary to display the column of XML data, so we will eliminate this from the column list. The following SQL snippet shows how this is done.

SQL

SELECT   T.myRepair.value('(./id)[1]','int') as id,   T.myRepair.value('(./description)[1]','nvarchar(max)') as description,   T.myRepair.value('(./cost)[1]','money') as cost FROM @myXPath.nodes('//repair') AS T(myRepair)

For each of the columns, we use the value method to extract the single value and cast it to the desired SQL data type. Each column is also given an alias, and the result of this query produces six repair rows, each containing an id, description, and cost, as shown in Figure 15-6.

image from book
Figure 15-6: Converting XML data to tabular data by using the nodes and value methods

Now that you understand how to easily convert a single variable to a tabular form, it's time to see how the nodes method works when you use table data.

Let's say we have a database table called MyXmlTable that contains three rows that identify owners of vehicles: Joe, Mary, and Bob. Each row has an xml column containing vehicles, and each vehicle has repairs. We want the list of all repairs in a tabular format, similar to the previous example when we worked with a single variable, but we also want a column for the owner's name and the vin of the vehicle, and we want to sort the list on the repair cost.

Locate the SQL script, shown on pages 444 and 445, titled "SQL: Creating and Populating MyXmlTable," and run it. If you already have the table but have made changes to it, drop the table and run the script.

SQL Server provides a relational operator called APPLY that allows you to execute a table-valued function once per each row of an outer table expression. The nodes method is a table-valued function. The APPLY operator is essentially a join to the table-valued function. You specify APPLY after the FROM clause of a query, much as you do the JOIN operator. With the APPLY operator, you can refer to a table-valued function in a correlated subquery. APPLY has two flavors: CROSS APPLY and OUTER APPLY.

CROSS APPLY executes the table-valued function for each row in an outer table expression. It returns a single result set that represents the union of all of the results returned by the individual executions of the table-valued function. For a given outer row, if the table-valued function returns an empty set, the outer row is not returned in the result. This is much like an inner join.

OUTER APPLY is like an outer join in that all outer rows are included even if the table-valued function returns an empty set.

For this example, we will use the CROSS APPLY relational operator to retrieve the results because we don't want to display owners who have no repairs. The following is a SQL script that retrieves the tabular data.

SQL

SELECT   OwnerName,   T.myRepair.value('(../@vin)[1]','nvarchar(max)') as VIN,   T.myRepair.value('(./id)[1]','int') as RepairId,   T.myRepair.value('(./description)[1]','nvarchar(max)') as Description,   T.myRepair.value('(./cost)[1]','money') as Cost FROM MyXmlTable CROSS APPLY vehicles.nodes('//repair') as T(myRepair) ORDER BY Cost

The SELECT statement retrieves the OwnerName as a column. Next, the script displays the vin, which is interesting because you might have thought that the resultant XML data for the nodes would be rooted to the repair, but you can still retrieve the parent and get its vin attribute. The same three columns are then retrieved, as in the previous example. Notice the use of the CROSS APPLY relational operator, which joins the source table to the nodes method's output. Just as in the previous example, the table and its column must be aliased. The output of this script is shown in Figure 15-7.

image from book
Figure 15-7: Using the nodes method with CROSS APPLY creates a new row for each element.

Using the FOR XML Clause

The FOR XML clause was introduced in SQL Server 2000, which provides various options for retrieving data as XML. The FOR XML clause is still available in SQL Server 2005, and it has been extended with the FOR XML PATH option. The following options are available.

FOR XML AUTO Returns nested XML elements, based on the tables listed in the ROM clause of the query, and the fields listed in the SELECT clause.
FOR XML RAW Returns repeating XML elements from a table with a tag name of row", and each table column value is represented as an attribute in the row element. Null column values aren't included.
FOR XML EXPLICIT Returns a user-defined XML shape. This mode provides complete control of the shape of the returned XML by allowing you to mix attributes and elements as needed and even create space-separated value lists and mixed content.
FOR XML PATH Returns a user-defined XML shape. This mode allows you to provide Path syntax as a column name that can be mapped into an attribute, element, subelement structure, element content, text, or data value.

You can easily combine non-xml columns with xml columns by using one of the FOR XML clauses. For example, you can execute the following SQL snippet to retrieve an XML document that represents MyXmlTable, which contains the owners and their vehicles.

SELECT * FROM MyXmlTable FOR XML PATH

Here is the result:

<row>   <id>1</id>   <ownername>Joe</ownername>   <vehicles>     <vehicles>       <car vin="234" make="Mazda" model="Miata">         <repair>           <id>7</id>           <description>Repair top.</description>           <cost>123.45</cost>         </repair>         <repair>           <id>22</id>           <description>Oil Change.</description>           <cost>29.99</cost>         </repair>       </car>       <truck vin="567" make="Nissan" model="Pathfinder">         <repair>           <id>2</id>           <description>Replace air filter.</description>           <cost>34.95</cost>         </repair>         <repair>           <id>6</id>           <description>Oil Change.</description>           <cost>39.99</cost>         </repair>       </truck>     </vehicles>   </vehicles> </row> <row>   <id>2</id>   <ownername>Mary</ownername>   <vehicles>     <vehicles>       <car vin="123" make="Ford" model="Mustang">         <repair>           <id>3</id>           <description>Replace Supercharger.</description>           <cost>4000.34</cost>         </repair>         <repair>           <id>45</id>           <description>Front end alignment.</description>           <cost>45.67</cost>         </repair>       </car>     </vehicles>   </vehicles> </row> <row>   <id>3</id>   <ownername>Bob</ownername> </row>

This default output looks good, but the benefit of the FOR XML PATH clause is that you can control the look of the generated XML by assigning XPath expressions to the column names. For example, if you want the XML output to be an XML fragment having a root node for each owner with the vehicles nested in each owner element and attributes of the owner element giving the owner information, you can use the following SQL snippet.

SELECT id 'owner/@id',   ownername 'owner/@name',   vehicles 'owner'  FROM MyXmlTable FOR XML PATH('')

Notice that an XPath-like expression is assigned to the column alias for ID and owner name, indicating that attributes will be created on an owner element for this data. The vehicle's xml column has owner as its location, which means the vehicles will be nested inside each owner. The empty string after PATH specifies the name of the root element, which will be omitted because we've specified an empty string. The following XML fragment is the result:

<owner  name="Joe">   <vehicles>     <car vin="234" make="Mazda" model="Miata">       <repair>         <id>7</id>         <description>Repair top.</description>         <cost>123.45</cost>       </repair>       <repair>         <id>22</id>         <description>Oil Change.</description>         <cost>29.99</cost>       </repair>     </car>     <truck vin="567" make="Nissan" model="Pathfinder">       <repair>         <id>2</id>         <description>Replace air filter.</description>         <cost>34.95</cost>       </repair>       <repair>         <id>6</id>         <description>Oil Change.</description>         <cost>39.99</cost>       </repair>     </truck>   </vehicles> </owner> <owner  name="Mary">   <vehicles>     <car vin="123" make="Ford" model="Mustang">       <repair>         <id>3</id>         <description>Replace Supercharger.</description>         <cost>4000.34</cost>       </repair>       <repair>         <id>45</id>         <description>Front end alignment.</description>         <cost>45.67</cost>       </repair>     </car>   </vehicles> </owner> <owner  name="Bob" />

You can see that the FOR XML PATH clause makes it easy to repurpose the relational and XML data.

Indexing the xml Column

Every time you execute the query method, you parse the xml data type one or more times for each row in the table. With a large row count and/or large XML data BLOBs, this can hurt performance. To speed up query method performance, you can index the xml columns but be sure to take into account the cost of maintaining the index if the data changes frequently.

You can create indexes on typed and untyped xml columns; this creates a B+ tree, which is a tree structure that is used to perform quick lookups, for each instance in the column. The first index you create on an xml column is called the primary XML index. You can create three types of secondary XML indexes to speed up common types of queries.

To create the primary XML index, you must make the primary key of the table a clustered index. The following SQL script creates the clustered primary key on MyXmlTable and then creates an XML index called idx_vehicles on the xml column vehicles of table MyXmlTable.

CREATE PRIMARY XML INDEX idx_vehicles on MyXmlTable (vehicles)

After you create the primary XML index, you can create secondary XML indexes for different classes of queries. The three classes of secondary XML indexes are PATH, PROPERTY, and VALUE.

The PATH index is used to speed up path-based queries by building a B+ tree on the path and value columns of the primary XML index. Probably the most noticeable performance increase will be when you use the exist method of the xml column in the WHERE clause of a SELECT statement. To create this index, use the following SQL snippet.

SQL: Creating the PATH Index

CREATE XML INDEX idx_vehicles_Path on MyXmlTable(vehicles)    USING XML INDEX idx_vehicles FOR PATH

The PROPERTY index creates a B+ tree on the primary key, path, and value columns of the primary XML index. This index speeds up performance on value lookups within an XML instance.

SQL: Creating the PROPERTY Index

CREATE XML INDEX idx_vehicles_Property on MyXmlTable(vehicles)    USING XML INDEX idx_vehicles FOR PROPERTY

The VALUE index creates a B+ tree on the value and path columns of the primary XML index. This index increases performance on queries that search for a node based on the descendent axes (that use //) and wildcard queries that use the asterisk.

SQL: Creating the VALUE Index

CREATE XML INDEX idx_vehicles_Value on MyXmlTable(vehicles)    USING XML INDEX idx_vehicles FOR VALUE

The xml column cannot be part of a primary or foreign key, but you can create an index on the XML data in a table. Here are the limitations.

An index can be created only on a table; indexing on a view is not supported.
Only a single column can be specified in an xml index. Compound indexes are not supported.
The primary key of the table must be a clustered index. You cannot modify the primary clustered index without first dropping the xml index.

Table of content

Programming Microsoft ADO. NET 2.0 Applications. Advanced Topics

Linux Application Development (2nd Edition)

ISBN: 735621411
EAN: 2147483647

Year: 2004
Pages: 85

Authors: Michael K. Johnson, Erik W. Troan

BUY ON AMAZON

Inside Network Security Assessment: Guarding Your IT Infrastructure

Image Processing with LabVIEW and IMAQ Vision

101 Microsoft Visual Basic .NET Applications

The Java Tutorial: A Short Course on the Basics, 4th Edition

Web Systems Design and Online Consumer Behavior

Java Concurrency in Practice