Using OPENXML


OPENXML can be utilized whenever a rowset provider is necessary, such as a table or view.This includes SELECT, SELECT INTO, and OPENROWSET.

A preliminary operation must be accomplished, however, before OPENXML can be used to write queries. sp_xml_preparedocument is a SQL Server internal stored procedure that takes an XML document and parses it into a memory representation of the XML document tree. I say that this is an internal stored procedure because you can't view it in the normal manner of viewing a stored procedure. If you try to view the procedure using SP_HELPTEXT, you'll see what is displayed in Figure 8.1.

Figure 8.1. Attempting to view sp_xml_preparedocument .

graphics/08fig01.gif

sp_xml_preparedocument returns a file identifier ( handle ) after parsing it in memory. OPENXML takes this handle and, through a process known as shredding , provides a rowset representation of the data based on what is passed to it.

After processing is complete, another internal stored procedure must be called to flush the parsed document from memory. This procedure is sp_xml_removedocument . The entire process is diagrammed in Figure 8.2.

Figure 8.2. The OPENXML process.

graphics/08fig02.gif

A parsed document is stored in internal cache. The MSXML parser uses one- eighth the total memory available for SQL Server. This means that if you have a SQL Server with 320MB of memory available, the MSXML parser will use 40MB of this memory. To avoid overutilization of RAM, run sp_xml_removedocument to free up the memory as soon as possible.

OPENXML Syntax

A sample OPENXML statement looks similar to this:

 FROM OPENXML (@idoc, '/ROOT/Employee', 1)        WITH (EmployeeID varchar(4), LastName varchar(20), FirstName varchar(10)) 

We'll be examining this new keyword with detailed examples shortly, but for now, here is the syntax specification for OPENXML:

 OPENXML ( handle, 'rowpattern', [flags] ) [WITH SchemaDeclaration] 

The various arguments of this function are described in the following sections.

handle

This is the handle returned by sp_xml_preparedocument that points to the parsed XML document in SQL Server memory. It serves the identical function as a file handle in other programming languages.

This handle is valid for the duration of the connection to SQL Server, until the connection is reset, or until sp_xml_removedocument is executed, thereby dropping the handle and deleting the document from memory.

rowpattern

This is the familiar XPath expression that we've been using for the majority of this book. For every node identified by the XPath expression there is a row generated by OPENXML in the rowset. Because XPath can identify any type of node (attribute, text, or otherwise ), if rowpattern ends in a text node, a row is generated for each text node identified by rowpattern .

SchemaDeclaration

OPENXML requires a custom schema created in the OPENXML statement to specify rowset structure. The WITH clause can be used to accomplish this, but there are a couple of ways of doing it. Here are our options:

  • Use the WITH clause and specify the entire schema.

    If you specify the schema, you specify column names , their data types, and how they map to the XML document. Here is how ColPattern influences the schema declaration:

    With ColPattern in the schema declaration, a rowset column is mapped to the node identified by the XPath expression ( rowpattern ). If you specify ColPattern on a column, it overrides the flags parameter (explained in the next section), meaning the schema dictates how the nodes correspond to columns regardless of whether flags states element-centric or attribute-centric mode.

    Here's an example:

     SELECT *  FROM OPENXML (@idoc, '/ROOT/Employee/Order/OrderDetail', 2)        WITH (OrderID int             '../@OrderID',              CustomerID nvarchar(10) '../@CustomerID',              Lastname nvarchar(10)   './../@LastName',  ... 

    Without ColPattern in the schema declaration, mappings between rowset columns and XML nodes follow name correspondence based on the flags parameter setting.

    Here's an example:

     ...  SELECT *  FROM OPENXML (@idoc, '/ROOT/Employee', 1)        WITH (EmployeeID varchar(4),              LastName varchar(20),              FirstName varchar(10))  ... 

    If you're still a little confused about this, examples will help clear it up. There are several complete examples in the section "OPENXML Examples" later in this chapter. There is one for each of the situations enumerated in the preceding .

    Remember, a schema is required to lay out the rowset structure when using the WITH clause.

  • Use an existing table identified in the WITH clause.

    Simply specify an existing table with schema that OPENXML will utilize to generate the rowset.

    Here's an example of this:

     ...  SELECT *  FROM OPENXML (@idoc, '/ROOT/Employee/Order',1)        WITH Table1 
  • Leave the WITH clause out.

    Without the WITH clause, the rowset is returned in the edge table format, so-called because every edge of the XML tree maps to a rowset row. The edge table format can offer several advantages that you would not normally have. It is possible to obtain the data type of an element/attribute, node types, namespaces, and other values, as listed in Table 8.1. In essence, wherever you would use an XML parser to obtain document information, you can use an edge table.

    Here is an example:

     ...  SELECT *  FROM OPENXML (@idoc, '/ROOT/Employee')  ... 
Table 8.1. The Structure of the Edge Table

Column Name

Data Type

Description

id

Bigint

The unique ID of the document node.

The ROOT element has an ID value of 0, and negative ID values are reserved.

parentid

Bigint

Identifies the node's parent. The parent identified by this ID is not necessarily an element because it depends on the nodetype of the node whose parent is identified by this ID. For example, if the node is a text node, its parent could be an attribute node.

The top-level node has a parentid of NULL. (Sound familiar?)

nodetype

Int

An integer that corresponds to the XML DOM nodetype numbering system. (See DOM documentation for node information at http://www.w3.org/TR/2000/WD-DOM-Level-1-20000929/level-one- core .html).

The DOM nodetypes are:

ELEMENT_NODE = 1 ATTRIBUTE_NODE = 2

TEXT_NODE = 3 CDATA_SECTION_NODE = 4

ENTITY_REFERENCE_NODE = 5 ENTITY_NODE = 6

PROCESSING_INSTRUCTION_NODE = 7 COMMENT_NODE = 8

DOCUMENT_NODE = 9 DOCUMENT_TYPE_NODE = 10

DOCUMENT_FRAGMENT_NODE = 11 NOTATION_NODE = 12

localname

nvarchar

The local name of the element or attribute that is NULL if the DOM object does not have a name.

prefix

nvarchar

The namespace prefix of the node name.

namespaceuri

nvarchar

The namespace URI of the node. A NULL value indicates that no namespace is present.

datatype

nvarchar

The actual data type of the element or attribute row. It is NULL if otherwise specified. The data type is inferred from the inline DTD or from the inline schema.

prev

Bigint

The XML ID of the previous sibling element. It is NULL if there is no direct previous sibling.

text

Ntext

The attribute value or the element content in text form (or NULL if the edge table entry does not need a value).

flags

The flags argument specifies whether the mapping between the rowset columns and the XML document nodes will be element-centric (elements nested within other elements), attribute-centric (elements with data expressed as attribute values), or a mixture of both. Flags can have the following three values:

  • 1, which specifies attribute-centric mapping

  • 2, which specifies element-centric mapping

  • 3, which specifies a combination of both attribute- and element-centric mapping

In addition to the flags argument, there is another way to define mappings. The ColPattern parameter, which is actually specified in the SchemaDeclaration through use of the WITH clause, also can be used. If it is used, it overrides the setting of the flags argument.

ColPattern is used under the following two circumstances:

  • If the element/attribute name in the rowset has a different name than the column to which it is mapped, then ColPattern identified the correct column.

  • When mapping metaproperty attributes to columns, ColPattern identifies the mapping between the metaproperty and the proper column. ( Metaproperties are discussed in more detail later in this chapter.)

Both flags and ColPattern are optional parameters and, if missing, cause the default value of attribute-centric mapping to be used.

OPENXML Examples

This section contains a lot of examples demonstrating the uses of OPENXML. We'll be utilizing stored procedures here almost exclusively.

Let's start with a simple SELECT statement and OPENXML. Here we'll use the

<Employee> , <Order> , and <OrderDetail> elements and return a rowset consisting of EmployeeID , LastName , and FirstName . We'll perform the following steps:

  1. Declare our XML document.

  2. Call the sp_xml_preparedocument stored procedure to parse the document into memory.

  3. Specify the flags parameter, which in this case is 1 (attribute-centric), so that attributes map to rowset columns.

  4. Specify the rowpattern (XPath expression) to identify the employee nodes to process.

Listing 8.1 contains the stored procedure we'll use in this example.

In this example, ColPattern need not be specified because the rowset column names correspond to the XML attribute names.

Listing 8.1 Simple Select Stored Procedure
 DECLARE @idoc int  DECLARE @xmldoc varchar(1000)  SET @xmldoc ='  <ROOT>  <Employee  EmployeeID="5" LastName="Buchanan" FirstName="Steven"  >     <Order OrderID="10248" CustomerID="VINET" EmployeeID="5"            OrderDate="1996-07-04T00:00:00">         <OrderDetail ProductID="42" Quantity="10"/>         <OrderDetail ProductID="72" Quantity="5"/>     </Order>  </Employee>  <Employee  EmployeeID="7" LastName="King" FirstName="Robert"  >     <Order OrderID="10303" CustomerID="GODOS" EmployeeID="7"            OrderDate="1996-09-11T00:00:00">         <OrderDetail ProductID="40" Quantity="40" />     </Order>  </Employee>  </ROOT>'  -- Create an in memory representation of the document.  EXEC sp_xml_preparedocument @idoc OUTPUT, @xmldoc  SELECT *  FROM OPENXML (@idoc, '/ROOT/Employee', 1)        WITH (EmployeeID varchar(4),              LastName varchar(20),              FirstName varchar(10))  EXEC sp_xml_removedocument @idoc 

The following example shows the results of the simple select stored procedure in Listing 8.1.

 EmployeeID LastName             FirstName  ---------- -------------------- ---------- 5          Buchanan             Steven  7          King                 Robert 

That was an easy one, but it represents the essence of this entire chapter. We've taken an XML document and turned it into a rowset data provider, which lets us use it in any situation that requires data (table or view, for example, remember?).

Let's modify the procedure slightly by changing the flags parameter to 2 (element-centric) and seeing what happens.The line and value to change is as follows :

 FROM OPENXML (@idoc, '/ROOT/Employee', 2) 

Refer to the section "OPENXML Syntax" at the beginning of the chapter, if necessary, to understand the layout of the parameters.

The following example shows the results of changing flags to element-centric.

 EmployeeID LastName             FirstName  ---------- -------------------- ---------- NULL       NULL                 NULL  NULL       NULL                 NULL 

Not quite what you expected? EmployeeID , LastName , and FirstName are attributes, not child elements, so when you specified element-centric mapping, you were referring to nonexistent elements. That's why the NULLs came back.

To get the element-centric mapping to work ( flags = 2 ), you'll need to define your elements similar to the XML document defined in the stored procedure in Listing 8.2. The results obtained from this procedure are identical to the results of the simple select stored procedure in Listing 8.1.

Listing 8.2 Simple Select with Element Orientation
 DECLARE @idoc int  DECLARE @xmldoc varchar(1000)  SET @xmldoc ='  <ROOT>  <Employee>  <EmployeeID>5</EmployeeID>   <LastName>Buchanan</LastName>   <FirstName>Steven</FirstName>  <Order OrderID="10248" CustomerID="VINET" EmployeeID="5"            OrderDate="1996-07-04T00:00:00">        <OrderDetail ProductID="42" Quantity="10"/>        <OrderDetail ProductID="72" Quantity="5"/>     </Order>  </Employee>  <Employee>  <EmployeeID>7</EmployeeID>   <LastName>King</LastName>   <FirstName>Robert</FirstName>  <Order OrderID="10303" CustomerID="GODOS" EmployeeID="7"            OrderDate="1996-09-11T00:00:00">        <OrderDetail ProductID="40" Quantity="40"/>     </Order>  </Employee>  </ROOT>'  -- Create an in memory representation of the document.  EXEC sp_xml_preparedocument @idoc OUTPUT, @xmldoc  SELECT *  FROM OPENXML (@idoc, '/ROOT/Employee', 2)        WITH (EmployeeID varchar(4),              LastName varchar(20),              FirstName varchar(10))  EXEC sp_xml_removedocument @idoc 

Now let's see what the ColPattern specification in the WITH clause will do for us. Take a look at Listing 8.3. The results of this stored procedure are given in Listing 8.4.

Listing 8.3 Stored Procedure Utilizing the ColPattern Parameter
 DECLARE @idoc int  DECLARE @xmldoc varchar(1000)  SET @xmldoc ='  <ROOT>  <Employee EmployeeID="5" LastName="Buchanan" FirstName="Steven">     <Order OrderID="10248" CustomerID="VINET" EmployeeID="5"            OrderDate="1996-07-04T00:00:00">        <OrderDetail ProductID="42" Quantity="10"/>        <OrderDetail ProductID="72" Quantity="5"/>     </Order>  </Employee>  <Employee EmployeeID="7" LastName="King" FirstName="Robert">     <Order OrderID="10303" CustomerID="GODOS" EmployeeID="7"            OrderDate="1996-09-11T00:00:00">        <OrderDetail ProductID="40" Quantity="40" />     </Order>  </Employee>  </ROOT>'  -- Create an in memory representation of the document.  EXEC sp_xml_preparedocument @idoc OUTPUT, @xmldoc  SELECT *  FROM OPENXML (@idoc, '/ROOT/Employee/Order/OrderDetail', 2)   WITH (OrderID    int            '../@OrderID',   CustomerID nvarchar(10)   '../@CustomerID',   Lastname   nvarchar(10)   '../../@LastName',   FirstName  nvarchar(10)   '/Employee/@FirstName',   ProductID  int            '@ProductID',   Quant      int            '@Quantity')  EXEC sp_xml_removedocument @idoc 
Listing 8.4 Results of the ColPattern Parameter in a Stored Procedure
 OrderID    CustomerID Lastname   FirstName   ProductID   Quant  ----------- ---------- ---------- ---------- ----------- ----------- 10248       VINET      Buchanan   NULL       42          10  10248       VINET      Buchanan   NULL       72          5  10303       GODOS      King       NULL       40          40 

Let's look at some of the important points of this procedure.

  • First, the flags parameter is set to 2 (element-centric), but the elements in our XML document are not properly nested to accommodate this setting.This shouldn't have worked but it did. That's because, as stated in the " SchemaDeclaration" section, if you use ColPattern in the schema declaration, it will map a rowset column to the node identified by the XPath expression ( rowPattern ). In this case the ColPattern maps to attributes and overrides the flags setting.

  • Second, rowPattern maps the query to the OrderDetail elements, so ColPattern maps the attributes to different elements in the XML document relative to this. OrderID and CustomerID are mapped to the Order element. LastName and FirstName are mapped to the Employee element in two different XPath expressions. Finally, ProductID and Quantity map to the OrderDetail element.

  • In the next example, we'll combine both attribute-centric and element-centric mappings.

  • Listing 8.5 is a stored procedure that has the flags parameter set to 3 , indicating both attribute-centric and element-centric mappings. In this case, attribute-centric mapping is applied first, followed by element-centric mapping for columns not yet handled.

Listing 8.5 A Stored Procedure with Both Types of Mappings
 DECLARE @idoc int  DECLARE @xmldoc varchar(1000)  SET @xmldoc ='  <ROOT>  <Employee EmployeeID="5">     <LastName>Buchanan</LastName>     <FirstName>Steven</FirstName>     <Order OrderID="10248" CustomerID="VINET" EmployeeID="5"            OrderDate="1996-07-04T00:00:00">        <OrderDetail ProductID="42" Quantity="10"/>        <OrderDetail ProductID="72" Quantity="5"/>     </Order>  </Employee>  <Employee EmployeeID="7">     <LastName>King</LastName>     <FirstName>Robert</FirstName>     <Order OrderID="10303" CustomerID="GODOS" EmployeeID="7"            OrderDate="1996-09-11T00:00:00">       <OrderDetail ProductID="40" Quantity="40"/>     </Order>  </Employee>  </ROOT>'  -- Create an in memory representation of the document.  EXEC sp_xml_preparedocument @idoc OUTPUT, @xmldoc  SELECT *  FROM OPENXML (@idoc, '/ROOT/Employee', 3)        WITH (EmployeeID int ,              LastName   nvarchar(10),              FirstName  nvarchar(10))  EXEC sp_xml_removedocument @idoc 

The results of the stored procedure in Listing 8.5 are as follows:

 EmployeeID  LastName   FirstName  ----------- ---------- ---------- 5           Buchanan   Steven  7           King       Robert 

Here, attribute-centric mapping is applied first for EmployeeID , and then element-centric mapping is applied because LastName and FirstName are left over.

Retrieving a text node is not much different from the types of data retrievals we've done so far.We'll utilize the XPath function text( ) in the ColPattern specification. Look at the stored procedure in Listing 8.6.We specify a column named Remark that extracts the text node associated with the Order element from the XML document.

Listing 8.6 Stored Procedure Containing Text Nodes
 DECLARE @idoc int  DECLARE @xmldoc varchar(1000)  SET @xmldoc ='  <ROOT>  <Employee EmployeeID="5">     <LastName>Buchanan</LastName>     <FirstName>Steven</FirstName>     <Order OrderID="10248" CustomerID="VINET" EmployeeID="5"            OrderDate="1996-07-04T00:00:00">        <OrderDetail ProductID="42" Quantity="10"/>        <OrderDetail ProductID="72" Quantity="5"/>Customer returned merchandise     </Order>  </Employee>  <Employee EmployeeID="7">     <LastName>King</LastName>     <FirstName>Robert</FirstName>     <Order OrderID="10303" CustomerID="GODOS" EmployeeID="7"            OrderDate="1996-09-11T00:00:00" note="Almost out of inventory">        <Priority>First Class</Priority>        <OrderDetail ProductID="40" Quantity="40"/>Customer needs this ASAP     </Order>  </Employee>  </ROOT>'  EXEC sp_xml_preparedocument @idoc OUTPUT, @xmldoc  SELECT *  FROM OPENXML (@idoc, '/ROOT/Employee/Order', 1)        WITH (OrderID    int ,              EmployeeID nvarchar(10),              Remark     ntext 'text( )')  EXEC sp_xml_removedocument @idoc 

The following example shows the results of ColPattern and the text( ) function in Listing 8.6.

 OrderID     EmployeeID Remark  ----------- ---------- ------ 10248       5          Customer returned merchandise  10303       7          Customer needs this ASAP 

Notice that only the text nodes of the Order element are retrieved and placed in the results. All other text comments are ignored.

In the section, "SchemaDeclaration" I said that an existing table could be specified in the OPENXML clause that would dictate the schema of the result. Let's see how that works. Listing 8.7 illustrates a stored procedure that does exactly what we're talking about. Because we have to have a preexisting table, the first line in the procedure will create one for us to utilize in the OPENXML clause.

Listing 8.7 Specifying a Table Name in the OPENXML Clause
 --create a test table  create table Table1(OrderID int, OrderDate datetime, EmployeeID int)  DECLARE @idoc int  DECLARE @xmldoc varchar(1000)  SET @xmldoc ='  <ROOT>  <Employee EmployeeID="5">     <Order OrderID="10248" CustomerID="VINET" EmployeeID="5"            OrderDate="1996-07-04T00:00:00">       <OrderDetail ProductID="42" Quantity="10"/>       <OrderDetail ProductID="72" Quantity="5"/>Customer returned merchandise     </Order>  </Employee>  <Employee EmployeeID="7">     <Order OrderID="10303" CustomerID="GODOS" EmployeeID="7"            OrderDate="1996-09-11T00:00:00" note="Almost out of inventory">        <Priority>First Class</Priority>        <OrderDetail ProductID="40" Quantity="40" />Customer needs this ASAP     </Order>  </Employee>  </ROOT>'  -- Create an in memory representation of the document.  EXEC sp_xml_preparedocument @idoc OUTPUT, @xmldoc  SELECT *  FROM OPENXML (@idoc, '/ROOT/Employee/Order', 1)        WITH Table1  EXEC sp_xml_removedocument @idoc 

The following example shows the results of using a table instead of a SchemaDescription in Listing 8.7.

 OrderID     OrderDate                                          EmployeeID  ----------- ------------------------------------------------- ----------- 10248       1996-07-04 00:00:00.000                                     5  10303       1996-09-11 00:00:00.000                                     7 

One thing to remember here is that you can't utilize the ColPattern parameter in this situation.You are limited to the structure of the table called by the OPENXML clause.

Let's take a look at the edge table format we talked about earlier. Remember that an edge table format is produced when the WITH clause is left out of the OPENXML clause. This example will be a little different than the examples to this point. Listing 8.8 shows the procedure that illustrates the generation of an edge table. This procedure includes two edge table queries toward the bottom of the procedure. The results of the two queries are given in Listing 8.9.

Listing 8.8 Generating an Edge Table
 DECLARE @idoc int  DECLARE @xmldoc varchar(1000)  SET @xmldoc ='  <ROOT>  <Employee EmployeeID="5">     <Order OrderID="10248" CustomerID="VINET" EmployeeID="5"            OrderDate="1996-07-04T00:00:00">        <OrderDetail ProductID="42" Quantity="10"/>        <OrderDetail ProductID="72" Quantity="5"/>     </Order>  </Employee>  <Employee EmployeeID="7">     <Order OrderID="10303" CustomerID="GODOS" EmployeeID="7"            OrderDate="1996-09-11T00:00:00">        <OrderDetail ProductID="40" Quantity="40" />     </Order>  </Employee>  </ROOT>'  -- Create an in memory representation of the document.  EXEC sp_xml_preparedocument @idoc OUTPUT, @xmldoc  SELECT *  FROM OPENXML (@idoc, '/ROOT/Employee')  ---------- query the edge table ------------------------------------------------- SELECT count(*)           FROM OPENXML(@idoc, '/') where localname = 'EmployeeID'  SELECT DISTINCT localname FROM OPENXML(@idoc, '/') where nodetype = 2  -------------------------------------------------------------------------------- EXEC sp_xml_removedocument @idoc 
Listing 8.9 Results of Querying an Edge Table
 4  (1 row(s) affected)  localname  ---------------------------- CustomerID  EmployeeID  OrderDate  OrderID  ProductID  Quantity  (6 row(s) affected). 

Notice the different way of querying an edge table. The OPENXML expression is specified directly in the WHERE clause.

Next up is a discussion of metaproperties, which enable us to obtain more information about a document than is normally obtainable from just the textual representation of the XML document.



XML and SQL Server 2000
XML and SQL Server 2000
ISBN: 0735711127
EAN: 2147483647
Year: 2005
Pages: 104
Authors: John Griffin

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net