Creating Indexes on XML Data Types | Beginning XML Databases (Wrox Beginning Guides)

XML data types can contain very large XML documents and store them inside an SQL Server database. The result is a large chunk of data stored into a binary object data type. Any searches through that XML document involve scanning the entire XML document data space. In many cases, queries can perform better by reading smaller portions of a data space in order to facilitate more efficient scans through large chunks of data. One method of improving performance of data reads is by using indexes.

SQL Server 2005 allows creation of indexes on XML data types. This book really needs to discuss only these indexes, what they are, and how they are created. Demonstrating their use is more of a performance tuning topic and is beyond the scope of this book. SQL Server also uses XML indexes in such a way that queries executed against XML data types use the same SQL query engine process that SQL queries do, including all query planning, query optimization, and query execution processing. So there is additionally no overhead to using a query processing engine specific to XML, and thus not as far advanced as the SQL Server processing query engine.

A query against an XML data type, regardless of the existence of XML indexes or not, uses something called XQuery. XQuery comprises special methods executable directly against XML data types as self-contained object methods. These methods are discussed specifically for SQL Server in the first section of this chapter. XQuery is a standard established as a basis for queries against XML documents. XQuery is discussed later on in this book from a more generic perspective.

The most basic reasons for the existence of an index against a set of data are as follows :

Index size : Generally an index will be much smaller than a relational table because an index is created against a small physical portion of a table. If a few rows are selected from the table then reading the index and table in tandem can be faster than reading the entire table.
Index algorithms: Indexes use specialized algorithms that allow for rapid selective searching for specific values. These specialized algorithms are not available when reading table data correctly.
Index order: Indexes are sorted in the order by which records are selected. For example, using an index on the phone number would help in reading a table of customers in phone number order, when the table was created in customer name order.

SQL Server allows four different types of indexes, which can be created against XML data types:

Primary XML index: Contains a single row (like a table) for each node, built into a clustered BTree index structure. Essentially its a flat relational structure of an XML hierarchy.
Secondary XML indexes: These indexes are actually created on the node table produced by the primary index:
- PATH index: An index on the node paths plus values in an XML document. Helps with searching through the paths (hierarchical node structure) of an XML document when using value comparisons.
- PROPERTY index: An index on the primary key of the XML table, plus the path and the values (as for the path index above). A property index is better for searching directly for values in an XML document, as opposed to a path searching through a hierarchical node structure.
- VALUE index: Similar to a path index but in the opposite order where values are indexed before path values. Useful for searching for values across an entire XML document structure, regardless of the node path hierarchy.

The following commands create a primary XML index on the XML field, in the XML table, for the demographics database used for this book. First create a table containing two fields where one is an integer primary key and the other an XML data type field:

   CREATE TABLE XML(    ID INT PRIMARY KEY,    XML XML) GO

Next create the primary key on the XML data type in the XML table:

   CREATE PRIMARY XML INDEX XMLIDX ON XML(XML) GO

Now lets add some data to the table, copying a single region from the DEMXML table into the new XML table created previously:

   DECLARE @xmldoc xml SET @xmldoc = (SELECT xml.query('/demographics/region[name="North America"]')    FROM demXML) INSERT INTO XML(ID,XML) VALUES(10,@xmldoc) SELECT ID, XML FROM XML GO

The disadvantage of creating a primary index on an XML data type is that the primary key index flattens out (or decomposes) the XML hierarchy. This tends to introduce duplication such that the primary index can actually become larger in physical size than the XML document (or fragment in this case). However, the index is constructed using a special algorithm, resulting in a BTree index. Reading a BTree index allows for scans through a tree structure, which is nearly always much faster than reading the entire physical space of an XML document. Exceptions to this rule are when deliberately reading the entire XML document, or reading a large percentage of the XML structure, or when the dataset is very small.

Secondary indexes can assist in XQuery processing by effectively creating a BTree index on all the path routes through an XML document:

   CREATE XML INDEX XMLIDXPATH ON XML(XML) USING XML INDEX XMLIDX FOR PATH GO

Or all the attribute values through an XML document:

   CREATE XML INDEX XMLIDXPATH ON XML(XML) USING XML INDEX XMLIDX FOR PROPERTY GO

Or all textual values for nodes through an XML document:

   CREATE XML INDEX XMLIDXPATH ON XML(XML) USING XML INDEX XMLIDX FOR VALUE GO

This chapter has attempted to introduce the use of XML directly from within SQL Server database. As in the previous chapter, which covered Oracle and XML, XML is vastly more complex and detailed than presented in this chapter.