The xml Data Type
SQL Server 2005 introduces a new data type, xml, to store XML data inside a relational database. The xml data type can be used for a column in a table, as a variable, as a parameter to a stored procedure or function, and as a function return value. The SQL Server 2005 xml data type implements the ISO SQL-2003 standard xml data type. You can store an entire XML document or XML fragments (XML data without single top-level element) in xml data type columns and variables. By default, SQL Server does not enforce the storing of well-formed or valid XML documents, and it allows the storing of XML fragments in xml data type columns and variables.
As with other data types, you can assign defaults, define column-level and table-level constraints, create XML indexes, and define full-text indexes on xml data type columns. However, there are few restrictions:
The xml data type is not supported on SQL Server 2005 Mobile Edition. If xml data type columns are synced to the SQL Server Mobile database, they will be converted to the ntext type.
Here is an example of using the xml data type in a table:
USE [AdventureWorks]; GO --User-defined functions used later while creating a table IF OBJECT_ID('dbo.IsXMLFragment') IS NOT NULL DROP FUNCTION dbo.IsXMLFragment; GO CREATE FUNCTION dbo.IsXMLFragment(@xmlData xml) returns bit BEGIN RETURN CASE @xmlData.value('count(/*)', 'bigint') WHEN 1 THEN (@xmlData.exist('/text()')) ELSE 1 END END; GO IF OBJECT_ID('dbo.EmpIDPresent') IS NOT NULL DROP FUNCTION dbo.EmpIDPresent; GO CREATE FUNCTION dbo.EmpIDPresent(@xmlData xml) returns bit BEGIN RETURN @xmlData.exist('/Employee/@ID') END; GO IF OBJECT_ID('dbo.tblXMLTest') IS NOT NULL DROP TABLE dbo.tblXMLTest; GO CREATE TABLE dbo.tblXMLTest (id INT IDENTITY(1,1) PRIMARY KEY, col1 XML, col2 XML NOT NULL, col3 XML DEFAULT N'<Person />', col4 XML NOT NULL DEFAULT N'<Address />', col5 XML CHECK(dbo.IsXMLFragment(col5) = 0), col6 XML NULL CHECK(dbo.EmpIDPresent(col6) = 1) ); GO
This script is available in the file XMLDataTypeDemo.sql in the code download for this book. This script creates a table with six columns of xml data type. The second column, col2, specifies the NOT NULL constraint; the third column, col3, specifies the column default; the fifth column, col5, specifies a check constraint and uses a user-defined function to ensure that only well-formed XML documents are inserted into the column; and the sixth column, col6, also specifies a check condition and uses a user-defined function to ensure that the XML document or fragment has a top-level element called Employee and an attribute called ID. The exist() and value() methods used inside the user-defined functions, as well as other methods available with the xml data type, are discussed later in this chapter.
In the following script, the first three INSERT statements succeed, and the other four INSERTs fail:
INSERT INTO dbo.tblXMLTest(col2, col5) VALUES (N'<col2Data />', N'<col5Data />'); INSERT INTO dbo.tblXMLTest(col2, col5, col6) VALUES (N'<col2Data />', N'<col5Data />', N'<Employee />'); INSERT INTO dbo.tblXMLTest(col2, col5, col6) VALUES ('<col2Data />', '<col5Data />', '<Employee /><Employee />'); GO SELECT * FROM dbo.tblXMLTest; GO INSERT INTO dbo.tblXMLTest(col5, col6) VALUES (N'<col5Data />', N'<Employee />'); INSERT INTO dbo.tblXMLTest(col2, col5, col6) VALUES (N'<col2Data />', N'<col5Data /><col5Data />', N'<Employee />'); INSERT INTO dbo.tblXMLTest(col2, col5, col6) VALUES (N'<col2Data />', N'<col5Data />', N'<Employee/>'); INSERT INTO dbo.tblXMLTest(col2, col4, col5) VALUES (N'<col2Data />', NULL, N'<col5Data />'); GO
Of the four failed INSERTs toward the end, the first one fails because col2 does not allow NULLs, the second INSERT fails because col5 does not allow XML fragments, the third one fails because col6 does not have XML with the Employee top-level element having an ID attribute, and the final one fails because col4 is non-NULLable.
The following script shows how to declare a variable of the xml data type:
DECLARE @var1 XML SET @var1 = '<TestVariables />' INSERT INTO dbo.tblXMLTest(col2, col5) VALUES (@var1, @var1); SELECT * FROM dbo.tblXMLTest; GO
The following script block illustrates how to pass an xml data type as a function parameter to implement a technique called "property promotion," where some value from XML is copied to a separate table column, so that you don't have to necessarily access and parse XML to access that value:
CREATE FUNCTION dbo.fn_GetID (@empData XML) RETURNS INT AS BEGIN DECLARE @retVal INT -- do some more things here on the input XML SELECT @retVal = @empData.value('(/Employee/@ID)', 'INT') RETURN @retVal END; GO SELECT dbo.fn_GetID(col6) FROM dbo.tblXMLTest; ALTER TABLE dbo.tblXMLTest ADD empNo AS dbo.fn_GetID(col6); INSERT INTO dbo.tblXMLTest(col2, col5, col6) VALUES ('<col2Data />', '<col5Data />', '<Employee />'); GO SELECT * FROM dbo.tblXMLTest; GO IF OBJECT_ID('dbo.tblXMLTest') IS NOT NULL DROP TABLE dbo.tblXMLTest; GO IF OBJECT_ID('dbo.fn_GetID') IS NOT NULL DROP FUNCTION dbo.fn_GetID; GO IF OBJECT_ID('dbo.IsXMLFragment') IS NOT NULL DROP FUNCTION dbo.IsXMLFragment; GO IF OBJECT_ID('dbo.EmpIDPresent') IS NOT NULL DROP FUNCTION dbo.EmpIDPresent; GO
This script begins with a user-defined function that accepts an xml data type parameter, uses the value function on the input XML parameter, and returns an integer value. Next, the ALTER TABLE statement adds a new computed column whose value is the integer returned by the fn_GetID function, passing the col6 column to it.
Validating XML Data by Using Typed XML
So far you have seen examples of storing untyped XML documents and XML fragments into xml data type columns, variables, and parameters. SQL Server 2005 supports associating XSD schemas with the xml data type so that the XML value for the column, variable, or parameter adheres to the structure and data types defined by the associated XSD schema. Having SQL Server validate the xml data type column, variable, or parameter by associating it with an XML schema collection is known as typed XML.
With typed XML, you can associate an XSD schema collection with the xml data type column, and the engine validates against the associated schema and generates an error if the validation fails. The typed XML has two main benefits. First, it ensures that the XML data in the column, variable, or parameter is according to the schema you desire. Second, it helps the engine to optimize storage and query processing. Also, when declaring the typed XML, you can specify the DOCUMENT clause, which ensures that the XML has only one top-level element and hence XML fragments are disallowed. The default is CONTENT, which allows XML fragments.
For performance reasons, it is recommended that you use typed XML. With untyped XML, the node values are stored as strings, and hence the engine has to do the data conversion when you extract the XML values or use node values in the predicate (for example, /person/age < 50); on the other hand, with typed XML, no data conversion takes place because the XML data values are internally stored based on types declared in the XSD schema. Typed XML makes parsing more efficient and avoids any runtime conversions.
Here's an example of creating typed XML:
Bulk Loading XML Data
If you have XML data in a disk file and would like to load it into an xml data type column, you can use the BULK rowset provider with the OPENROWSET function and specify the SINGLE_CLOB option to read the entire file as a single-row, single-varchar(max) column value.
Let's assume that you have the following XML text saved into a disk file called "c:\PO.xml" on the SQL Server machine:
<purchaseOrder orderDate="1999-10-20" xmlns="http://schemas.sams.com/PO"> <shipTo country="US"> <name>Alice Smith</name> <street>123 Maple Street</street> <city>Mill Valley</city> <state>CA</state> <zip>90952</zip> </shipTo> <billTo country="US"> <name>Robert Smith</name> <street>8 Oak Avenue</street> <city>Old Town</city> <state>PA</state> <zip>95819</zip> </billTo> <items> <item> <productName>Lawnmower</productName> <quantity>1</quantity> <USPrice>148.95</USPrice> </item> </items> </purchaseOrder>
The following T-SQL statements illustrate creating a table with an untyped xml data type column and bulk loading the preceding XML file into this column:
USE AdventureWorks; GO IF OBJECT_ID('dbo.tblBulkLoadXML') IS NOT NULL DROP TABLE dbo.tblBulkLoadXML; GO CREATE TABLE dbo.tblBulkLoadXML (id INT IDENTITY(1,1) NOT NULL PRIMARY KEY, POData XML NOT NULL); GO INSERT INTO dbo.tblBulkLoadXML (POData) SELECT * FROM OPENROWSET(BULK 'c:\PO.xml', SINGLE_CLOB) AS POData; GO SELECT * FROM dbo.tblBulkLoadXML;
You can use the technique discussed here for using the OPENROWSET function to read XML from a file. However, SQL Server 2005 does not provide any way to write the XML data from a XML column to a file. You can write a SQLCLR managed procedure to write XML into the file. Chapter 11, "SQL Server 2005 and .NET Integration," contains the C# code for this.
Before we look at other functions to query and modify the xml data type columns and variables, let's first take a look at a quick overview of the XQuery specification.
Introduction to XQuery
If SQL is used to query relational data, XQuery is a language that is used to query data that is either physically stored as XML or virtualized as XML. In addition, XQuery also allows general processing of XML (such as creating nodes). XQuery borrows a lot of features from XPath. XQuery 1.0 is said to be an extension to XPath 2.0, adding support for better iteration, sorting of results, and construction (that is, the ability to construct the shape of the desired XML). In summary, XQuery is an expression-based declarative query language that is used to efficiently extract data from XML documents. In SQL Server 2005, the data stored in xml data type columns and variables can be queried using XQuery. SQL Server 2005 partially implements the W3C XQuery specification (see www.w3.org/TR/xquery) and is aligned with the July 2004 working draft (see www.w3.org/TR/2004/WD-xquery-20040723/).
An XQuery query consists of two parts: the prolog and the query body. The optional prolog section is used to declare the namespaces used in the query, and the required query body consists of an expression that is evaluated by the SQL Server 2005 engine to produce the desired output. Each XQuery prolog entry must end with a semicolon (;).
These are the three most common expression types used in XQuery queries:
SQL Server 2005 primarily provides three xml data type methods that can be used to run XQuery queries. These methods are query(), value(), and exist(). The other two xml type methods are nodes(), which is used to shred xml type instance into relational data, much like OPENXML, and the modify() method, which is used to modify the content of an xml type column or a variable.
XML Type Methods
The xml data type supports five methods that you can use to manipulate XML instances; you can call these methods by using xmltype.method() syntax:
Let's begin with an example of the query() method:
USE AdventureWorks; GO SELECT Name, Demographics.query(' declare namespace d= "http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey"; d:StoreSurvey/d:AnnualSales') AS AnnualSales FROM Sales.Store; GO
This T-SQL statement illustrates the query() method. The parameter to the query() method is an XQuery query that begins with a namespace declaration in the prolog section and a path expression as the query body. This SELECT statement queries the Demographics xml data type column to find out each store's annual sales. Note that the query() method returns the untyped XML as the result.
You can also declare namespaces by using the WITH XMLNAMESPACES clause, as shown here:
WITH XMLNAMESPACES ( 'http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey' AS "d") SELECT Name, Demographics.query('d:StoreSurvey/d:AnnualSales') AS AnnualSales FROM Sales.Store; GO
You can use the value() method to get the node data as a scalar value:
SELECT Name, Demographics.value(' declare namespace d= (d:StoreSurvey/d:AnnualSales)', 'decimal(18,2)') AS AnnualSales FROM Sales.Store; GO
The value() method takes two parameters: an XQuery expression string and a string that indicates the type of returned scalar value (decimal(18,2), in this example). The second parameter cannot be specified as an xml data type, a CLR user-defined type, or an image, text, ntext, timestamp, or sql_variant data type. The XQuery expression must return a singleton or an empty sequence.
The exist() method takes just one parameterthe XQuery expressionand returns either 1 (indicating trUE), 0 (indicating FALSE), or NULL (indicating that the xml data type instance was NULL). This method is generally used in the WHERE clause. Let's say you want to get a list of stores that have T3 internet lines. One of the ways in which you can do this is by running the following query:
SELECT * FROM Sales.Store WHERE Demographics.value(' declare namespace d= "http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey"; "" (/d:StoreSurvey/d:Internet)', 'varchar(20)') = 'T3'; GO
However, this is not an efficient approach because SQL Server has to retrieve the node value, do the conversion (Unicode to ANSI, in this case), and then do the comparison. An alternate and better approach is to use the exist() method as shown here:
SELECT * FROM Sales.Store WHERE Demographics.exist(' declare namespace d= "http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey"; "" /d:StoreSurvey[d:Internet = "T3"]') = 1; GO
The exist() method exploits the PATH and other XML indexes (discussed later in this chapter) more effectively and can yield better performance than the value() method.
Before concluding this section, here are some more examples that show the capabilities of XQuery. The first two queries illustrate the FLWOR expressions, and the next two queries illustrate constructing XML using XQuery:
The first query in this batch illustrates the FLWOR expression. The $emp looping variable is bound to the Employment elements under the Resume top-level elements, and for each such element, the XQuery expression sorts and returns the organization names, separated by tilde characters (~). The second FLWOR example, which illustrates the where clause, and returns the steps that have both material and tool nodes. The final two examples show constructing XML directly and by using computed approach.
XQuery in SQL Server 2005 supports various types of functions and operators, including arithmetic, comparison, and logical operators and the string manipulation, data accessors, and aggregate functions. XQuery expressions also support the if-then-else construct that you can use to perform operations based on the value of a conditional expression. The sql:column() and sql:variable() functions can be used inside XQuery expressions to access a non-XML relational column and an external variable.
SQL Profiler includes a trace event called XQuery Static Type under the TSQL event category that you can use to trace XQuery activity. The event provides a method name, including the column on which the method was executed and the static type of the XQuery expression. To use it, you run Profiler, select the XQuery Static Type event, and run the preceding queries. If you do not see the events in Profiler, you can run DBCC FREEPROCCACHE and then run the XQuery queries again.
In summary, XQuery can be used for querying and reshaping the data stored in xml data type columns and variables. Processing XML at the server by using XQuery can result in reduced network traffic, better maintainability, reduced risk, and increased performance.
Indexes on XML Type Columns
Properly designed indexes are the key to improving query performance. At the same time, you need to consider the cost of maintaining indexes in measuring the overall benefits of the indexes created. As mentioned earlier, SQL Server 2005 stores XML data as a BLOB, and every time an XML column is queried, SQL Server parses and shreds the XML BLOB at runtime to evaluate the query. This can be quite an expensive operation, especially if you have large XML data stored in the column or if you have a large number of rows. To improve the performance of queries on xml data type columns, SQL Server 2005 provides two new types of indexes: primary XML indexes and secondary XML indexes. You can create XML indexes on typed or untyped XML columns.
You can create primary XML indexes by using the CREATE PRIMARY XML INDEX DDL statement. This type of index is essentially a shredded and persisted representation of XML BLOB data. For each XML BLOB in the column, the primary index creates several rows of data in an internal table. This results in improved performance during query execution time because there is no shredding and parsing involved at runtime. The primary XML index requires a clustered index on the primary key of the base table in which the XML column is defined. If the base table is partitioned, the primary XML index is also partitioned the same way, using the same partitioning function and partitioning scheme. After a primary XML index is created on a table, you cannot change the primary key for that table unless you drop all the XML indexes on that table.
All the primary XML index does is avoid the runtime shredding. After you analyze your workload, you can create secondary XML indexes to further improve the performance of an XML query.
A secondary XML index cannot be created unless you have a primary XML index. There are three types of secondary XML indexesPATH, VALUE, and PROPERTYeach designed for improving the response time for the respective type of query. You can create secondary XML indexes by using the CREATE XML INDEX DDL statement.
If your XML queries make use of path expressions, such as /Production/Product/Material, the PATH secondary XML index can improve the performance of such queries. In most cases, the exist() method on XML columns in a WHERE clause benefits the most from the PATH indexes. The PATH index builds a B+ tree on the path/value pair of each XML node in the document order across all XML instances in the column.
If your XML queries are based on node values and do not necessarily specify the full path or use wildcards in the path (for example, //Sales[@amount > 10000] or //Catalog[@* = "No"]), the VALUE secondary XML index can improve the performance of such queries. The VALUE index creates a B+ tree on the value/path pair of each node in the document order across all XML instances in the XML column.
If your XML queries retrieve multiple values from individual XML instances, the PROPERTY XML index might benefit from using such queries because it groups the properties for each XML together.
You can include the word Primary, Path, Property, or Value in the name of a primary or secondary XML index that you create to quickly identify the type of XML index.
The following T-SQL batch illustrates creating primary and secondary XML indexes and then using the catalog views and functions to view the XML index details:
USE AdventureWorks; GO IF OBJECT_ID('dbo.tblIndexDemo') IS NOT NULL DROP TABLE dbo.tblIndexDemo; GO CREATE TABLE dbo.tblIndexDemo (ID INT IDENTITY(1,1) PRIMARY KEY NOT NULL, col1 XML NOT NULL); GO CREATE PRIMARY XML INDEX PrimaryXMLIdx_col1 ON dbo.tblIndexDemo(col1); GO CREATE XML INDEX PathXMLIdx_col1 ON dbo.tblIndexDemo(col1) USING XML INDEX PrimaryXMLIdx_col1 FOR PATH; GO CREATE XML INDEX PropertyXMLIdx_col1 ON dbo.tblIndexDemo(col1) USING XML INDEX PrimaryXMLIdx_col1 FOR PROPERTY; GO CREATE XML INDEX ValueXMLIdx_col1 ON dbo.tblIndexDemo(col1) USING XML INDEX PrimaryXMLIdx_col1 FOR VALUE; GO --Get Index Information SELECT * FROM sys.xml_indexes WHERE [object_id] = OBJECT_ID('dbo.tblIndexDemo'); --Cleanup IF OBJECT_ID('dbo.tblIndexDemo') IS NOT NULL DROP TABLE dbo.tblIndexDemo; GO
This script creates a table that has a column of xml data type. It then creates a primary XML index on this column, followed by all three types of secondary indexes on the same column. You can use the sys.xml_indexes catalog view to view the details on XML indexes. This catalog view indicates whether an XML index is a primary or secondary index; if it is secondary, the view indicates what primary index it is based on and what type (PATH, PROPERTY, or VALUE) of secondary index it is.
SQL Server 2005 allows you to create full-text indexes on xml data type columns. You can combine a full-text search with XML index usage in some scenarios to first use fulltext indexes to filter the rows and then use XML indexes on those filtered rows, in order to improve the query response time. However, note that attribute values are not full-text indexed as they are considered part of the markup, which is not full-text indexed.
Modifying XML Data
SQL Server 2005 provides the modify() method, which you can use to change parts of XML content stored in an xml data type column or variable. When you change the xml type table column, the modify() method can only be called within a SET clause in an UPDATE statement; when you change the xml type variable, the modify() method can only be called by using the SET T-SQL statement. The modify() function can be used to insert one or more nodes, to delete nodes, or to update the value of a node. This function takes an XML Data Modification Language (XML DML) expression, which is an extension to the XQuery specification. XQuery 1.0 does not support the update functionality. Hence, SQL Server 2005 introduces an extension to XQuery 1.0 by including three new casesensitive keywords"insert", "delete", and "replace value of"that you can use inside an XQuery query to change parts of the XML data.
The following T-SQL script shows an example of the modify() method and XML DML:
DECLARE @xmlVar xml SET @xmlVar = N' <Person> <Phone type="h">111-111-1111</Phone> <Phone type="c">222-222-2222</Phone> </Person> '; SELECT @xmlVar; --insert SET @xmlVar.modify( 'insert <Phone type="w">333-333-3333</Phone> into (/Person)'); --delete SET @xmlVar.modify( 'delete /Person/Phone[@type="c"]'); --change node value SET @xmlVar.modify( 'replace value of (/Person/Phone[@type="h"]/text()) with "444-444-4444"'); SELECT @xmlVar; GO
At the end, the @xmlVar XML variable has the following value:
<Person> <Phone type="h">444-444-4444</Phone> <Phone type="w">333-333-3333</Phone> </Person>