Section 7.5. Indexing XML Data


7.5. Indexing XML Data

xml data type instances are stored as binary large objects (BLOB ) in xml data type columns. If these columns are not indexed, they must be shredded at runtime for each row in the table to evaluate a query. This can be costly, especially with large xml data type instances or a large number of rows in the table. Building primary and secondary XML indexes on xml data type columns can significantly improve query performance.

Shredding is the process of mapping and converting an XML document into tables in a relational database.


An xml data type column can have one primary XML index and multiple secondary XML indexes, where:


Primary XML index

A relational index on the shredded and persisted representation of all tags, values, and paths of XML instances in the xml data type column. The index creates several rows of data for each instance in the column.

A primary XML index requires a clustered index on the primary key of the table containing the xml data type being indexed.


Secondary XML index

Further improves performance for specific types of queries. A primary XML index must exist on the xml data type column before a secondary XML index can be created.

There are three types of secondary XML indexes:


PATH index

Optimizes queries based on path expressions


VALUE index

Optimizes value-based queries for paths that include wildcards or are not fully specified


PROPERTY index

Optimizes queries based on properties in a specific XML instance stored in a column

An XML index can be created only on a single xml data type column. XML indexes cannot be created on the following:

  • A non-xml data type column

  • An xml data type column in a view

  • A table-valued variable with xml data type columns

  • An xml data type variable

  • A computed xml data type column

A relational index cannot be created on an xml data type column.

You must set the SQL Server 2005 options listed in Table 7-4 when creating or rebuilding an XML index on an xml data type column. If these options are not set, you will not be able to create or rebuild the XML index, and you will not be able to insert values into or modify values in indexed xml data type columns.

Table 7-4. SQL Server 2005 option settings for creating and rebuilding an index on an XML column

SET option

Required value

ANSI_NULLS

ON

ANSI_PADDING

ON

ANSI_WARNINGS

ON

ARITHABORT

ON

CONCAT_NULL_YIELDS_NULL

ON

NUMERIC_ROUNDABORT

OFF

QUOTED_IDENTIFIER

ON


Primary and secondary indexes on xml data type columns are created, changed, and dropped similarly to indexes on non-xml data type columns. The following subsections describe managing indexes on xml data type columns.

7.5.1. Creating an XML Index

The CREATE INDEX statement is used to create a new primary or secondary XML index on an xml data type column. The syntax is:

    CREATE [ PRIMARY ] XML INDEX index_name
        ON <object> ( xml_column_name )
        [ USING XML INDEX xml_index_name
            [ FOR { VALUE | PATH | PROPERTY } ]
        [ WITH ( <xml_index_option> [ ,...n ] ) ]
    [ ; ]

    <object> ::=
    { [ database_name . [ schema_name ] . | schema_name . ] table_name }

    <xml_index_option> ::=
    {
        PAD_INDEX = { ON | OFF }
      | FILLFACTOR = fillfactor
      | SORT_IN_TEMPDB = { ON | OFF }
      | STATISTICS_NORECOMPUTE = { ON | OFF }
      | DROP_EXISTING = { ON | OFF }
      | ALLOW_ROW_LOCKS = { ON | OFF }
      | ALLOW_PAGE_LOCKS = { ON | OFF }
      | MAXDOP = max_degree_of_parallelism
    }
 

where:


[PRIMARY] XML INDEX

The xml data type column on which to create the index. If PRIMARY is specified, a clustered index is created on the column. Each xml data type column can have one primary and multiple secondary indexes. A primary index must exist on an xml data type column before a secondary index can be created on the column.


index_name

The name of the index, which must be unique within the table. A primary XML index name cannot begin with the following characters: #, ##, @, or @@.


<object> :: = { [ database_name. [ schema_name ] . | schema_name . ] table_name }

The object to be indexed, fully qualified or not, where:


database_name

The name of the database.


schema_name

The name of the schema to which the table belongs.


table_name

The name of the table to be indexed.


xml_column_name

The xml data type column on which to create the index.


USING XML INDEX xml_index_name

The primary XML index used to create the secondary XML index.


FOR { VALUE | PATH | PROPERTY }

The type of secondary XML index to create, where:


VALUE

Creates a VALUE secondary XML index on the VALUE, HID, PK, and XID columns of the primary XML index.


PATH

Creates a PATH secondary XML index on the HID, VALUE, PK, and XID columns of the primary XML index.


PROPERTY

Creates a PROPERTY secondary XML index on the PK, HID, VALUE, and XID columns of the primary XML index and includes the LVALUE and LVALUEBIN columns.


<xml_index_option> ::=

Options used to create the XML index, where:


PAD_INDEX = { ON | OFF }

Specifies index padding. If PAD_INDEX is ON, the FILLFACTOR is used to compute the padding (free space) applied to intermediate-level pages of the index. The default is OFF.


FILLFACTOR

Specifies the percentage of free space, as an integer value from 1 to 100, that should be left in the leaf level of each index page during index creation or change.


SORT_IN_TEMPDB = { ON | OFF }

Specifies whether to store sort results in tempdb. The default is OFF, meaning that intermediate sort results are stored in the same database as the index.


STATISTICS_NORECOMPUTE = { ON | OFF }

Specifies whether out-of-date distribution statistics are automatically recomputed. The default is OFF, enabling automatic statistics updating.


DROP_EXISTING = { ON | OFF }

Specifies whether the existing XML index is automatically dropped and rebuilt. The default is OFF, meaning that an error is returned if the specified index name already exists.


ALLOW_ROW_LOCKS = { ON | OFF }

Specifies whether row locks are allowed when accessing the index. The default is ON, meaning that row locks are allowed.


ALLOW_PAGE_LOCKS = { ON | OFF }

Specifies whether page locks are allowed when accessing the index. The default is ON, meaning that page locks are allowed.


MAX_DOP

Overrides the maximum degree of parallelism for the duration of the index operation by limiting the number of processors used in a parallel plan execution. Parallel indexing operations and parallel query processing are available only in SQL Server 2005 Enterprise Edition.

A table must have a clustered primary key with less than 16 columns in it before a primary XML index can be created.

The following example creates a primary XML index on the xmlCol xml data type column in the xmlTable table created in the "Creating xml Data Type Columns and Variables" section earlier in this chapter:

    CREATE PRIMARY XML INDEX xmlColIndex
    ON xmlTable(xmlCol)
 

The following example creates a secondary VALUE index on the xmlCol column:

    CREATE XML INDEX xmlColValueIndex
    ON xmlTable(xmlCol)
    USING XML INDEX xmlColIndex
    FOR VALUE
 

7.5.2. Altering an XML Index

The ALTER INDEX statement is used to modify an existing XML index created using the CREATE INDEX statement. The syntax is:

    ALTER INDEX { index_name | ALL }
        ON <object>
        { REBUILD
            [ WITH ( <rebuild_index_option> [ ,...n ] ) ]
        | DISABLE
        | SET ( <set_index_option> [ ,...n ] )
        }
    [ ; ]

    <object> ::=
    {
        [ database_name. [ schema_name ] . | schema_name. ]
            table_or_view_name
    }

    <rebuild_index_option > ::=
    {
        PAD_INDEX = { ON | OFF }
      | FILLFACTOR = fillfactor
      | SORT_IN_TEMPDB = { ON | OFF }
      | STATISTICS_NORECOMPUTE = { ON | OFF }
      | ALLOW_ROW_LOCKS = { ON | OFF }
      | ALLOW_PAGE_LOCKS = { ON | OFF }
      | MAXDOP = max_degree_of_parallelism
    }

    <set_index_option>::=
    {
        ALLOW_ROW_LOCKS= { ON | OFF }
      | ALLOW_PAGE_LOCKS = { ON | OFF }
      | STATISTICS_NORECOMPUTE = { ON | OFF }
    }
 

The arguments are described in the "Creating an XML Index" subsection earlier in this section. You need supply arguments only for index characteristics that you are changing.

7.5.3. Dropping an XML Index

The DROP INDEX statement is used to remove one or more XML indexes from the database. The syntax is:

    DROP INDEX
    { index_name ON <object> [ ,...n ] }

    <object> ::=
    {
        [ database_name. [ schema_name ] . | schema_name. ]
            table_or_view_name
    }
 

The arguments are described in the "Creating an XML Index" subsection earlier in this section.

The following example drops the secondary value index created on the xmlTable table in the "Creating an XML Index" subsection earlier in this section:

    DROP INDEX xmlColValueIndex ON xmlTable
 

7.5.4. Viewing XML Indexes

The xml_indexes catalog view returns information about primary and secondary XML indexes in a database. The following query returns the XML indexes on the Individual table in AdventureWorks:

    USE AdventureWorks

    SELECT o.name TableName, xi.*
    FROM sys.xml_indexes xi
         JOIN sys.objects o ON xi.object_id = o.object_id
    WHERE o.name = 'Individual'
 

Partial results are shown in Figure 7-11.

Figure 7-11. Results for viewing XML indexes example


In the example, the sys.xml_indexes catalog view is joined to the sys.objects catalog view to return as the first column in the result set the table that the index belongs to.