SQL Server 2005 defines a new data type called xml. This data type can be the data type for a column when you create a table. It can also be the data type for a parameter that is passed to a stored procedure or a function. It works in much the same way as the other built-in data types in SQL Server. An XML instance is the value assigned to a variable, parameter, or column value that has an xml data type.
When XML data comes into SQL Server, SQL Server parses it to ensure that it is XML data, based on the SQL ANSI 2003 standard of an XML data type. This means the XML can be an XML document or a fragment containing more than one top-level element, and it can even have top-level text.
After parsing, the data is placed into the xml data type, which provides a logical view of the XML data that has been parsed into binary XML representation of the XQuery data model. The binary XML is used by SQL Server to provide efficient processing of queries.
You can also place a set of XML schemas into the SQL Server Schema Collection. The Schema Collection is a metadata object that is used to manage schema validation information, which can be associated with an xml data type to provide automatic validation of the type. An xml column that has no schema assigned is called an "untyped" xml column; an xml column that has a schema assigned is called a "typed" xml column, as shown in Figure 15-1.
Figure 15-1: Untyped vs. typed XML data storage to the xml data type
The Schema Collection contains XML schemas that might be related using <xs:import> or might be unrelated. Each typed XML instance specifies the target namespace from the Schema Collection it conforms to. The database engine validates the instance according to its schema during data assignment and modification. To provide efficient processing of typed XML data, the schema information is also used in storage and query optimizations.
You can easily query for the complete XML data instance in the xml column, but the xml data type has five methods that are used to retrieve and modify XML data:
query Extracts parts of an XML instance. An XQuery expression is provided that evaluates to a list of XML nodes. The complete subtree root of these nodes is returned in document order. The result type is untyped xml.
value Extracts a scalar value from an XML instance. An XQuery expression is provided that evaluates to a node value. This value is cast to the T-SQL type specified as the second argument of the value method.
exist Provides an existential check on an XML instance. An XQuery expression is provided that returns 1 if the expression evaluates to a non-null node list; otherwise, it returns 0.
nodes Yields instances of a special xml data type, each with its context set to a different node than the XQuery expression evaluates to. The special xml data type supports query, value, nodes, and exist methods and can be used in count(*) aggregations and NULL checks. All other uses result in an error.
modify Permits modifying parts of an XML instance, such as adding or deleting subtrees or replacing scalar values (such as replacing the inventory count of a widget from 5 to 4).
Executing a query processes each XML instance at runtime which can be expensive if the XML instance values are large or the query needs to be evaluated on many rows in a table. In these situations, you can increase performance by implementing an index.
You can create a primary XML index on an xml column, which creates an index on all tags, values, and paths of the XML instances in the xml column. The primary XML index provides efficient evaluation of queries on XML data, and it can efficiently reassemble an XML result from the index while preserving document order and document structure.
The following secondary XML indexes, shown also in Figure 15-2, can also be created on an xml column to improve performance on different types of queries.
PATH Used for path-based queries.
PROPERTY Used for property bag scenarios. A property bag is a container that can store any value and that is usually implemented as a dictionary, where a name, or key, is used to retrieve or set a value. This is how you locate XML parts when you work with untyped XML documents.
VALUE Used for value-based queries.
Figure 15-2: The primary index must be created before the secondary indexes are created, and these indexes can increase performance when you work with large XML instances.