Using OPENXML to Read XML Data


  • Import and manipulate data using OPENXML .

The OPENXML function works similarly to the way OPENROWSET and OPENQUERY work, in that it returns a rowset that can be used in a SELECT statement. However, there is some additional up-front work you need to do to set it up. Here's an overview of how the process works:

  1. Use the sp_xml_preparedocument system stored procedure to create a document handle.

  2. Use the OPENXML statement to define the format of the XML document and return the rowset.

  3. Use the sp_xml_removedocument system stored procedure to destroy the document handle.

The first thing to do is look at what sp_xml_preparedocument and sp_xml_removedocument do. A discussion on the OPENXML statement then follows .

Preparing and Removing an XML Document

To prepare an XML document, you need to read the document into a variable and call the sp_xml_preparedocument stored procedure. First, you somehow need to get the document into a T-SQL variable. Then you call the stored procedure, which returns a handle to the parsed document, which can be used by the OPENXML function. Here's an example.

[View full width]
 
[View full width]
DECLARE @hdoc INT EXEC sp_xml_preparedocument @hdoc OUTPUT, '<Person PersonID="9" FirstName="Shelly" graphics/ccc.gif LastName="Alexander"> <Sales QtyPurchased="5"/> <Sales QtyPurchased="10"/> </Person>' EXEC sp_xml_removedocument @hdoc

The first thing you'll notice is a DECLARE statement. This is a variable declaration, and it's covered in more depth in Chapter 6. The @hdoc is a variable that holds a temporary value so you can use it later, in the sp_xml_removedocument as well as in the OPENXML that's coming up.

The next thing to note is that an sp_xml_preparedocument and sp_xml_removedocument are paired up. This is critical. Whenever you prepare a document, it takes up memory resources in SQL Server's cache memory, up to one eighth of the total memory available (12.5 percent, exactly). So, if you prepare documents and don't remove them, you'll end up with a memory leakmemory that you can't access but that is in use. This is known as a "bad thing" in computer science circles. The only way to recover the memory without the original handle is to restart SQL Server.

The sp_xml_preparedocument stored procedure has one optional argument: the xpath_namespace argument, which is used to specify an alternate namespace for the document. By default, the system uses the default namespace <root xmlns:mp="urn:schemas-microsoft-com:xml-metaprop"> . If you specify an alternate namespace string, it replaces the "run:schemas-microsoft-com:xml-metaprop" with whatever you specify.

Using OPENXML

The OPENXML function has three parameters and an optional WITH clause. It takes the document handle, which is returned by the sp_xml_preparedocument procedure, the rowpattern, which specifies which rows to return, and a single byte value that can specify flags.

Here's an expansion of the preceding example:

 DECLARE @hdoc INT EXEC sp_xml_preparedocument @hdoc OUTPUT, '<Person PersonID="9" FirstName="Shelly" LastName="Alexander">        <Sales QtyPurchased="5"/>        <Sales QtyPurchased="10"/> </Person>' select * from openxml(@hdoc, '*') EXEC sp_xml_removedocument @hdoc 

This returns a nearly useless recordset, because it isn't formatted. Here's a little bit of the recordset:

 id   parentid nodetype    localname prefix namespaceuri datatype prev text 0    NULL     1           Person    NULL  NULL         NULL     NULL NULL 2    0        2           PersonID  NULL  NULL         NULL     NULL NULL 9    2        3           #text     NULL  NULL         NULL     NULL 9 

Not very easy to work with. That's why there's a rowset filter, like this:

 SELECT * FROM openxml(@hdoc, '/Person', 1) 

This rowset filter returns another nearly useless rowset, but at least it's a shorter, nearly useless rowset; it contains only the attributes about the person. Now, if you could just put the rows where they need to be, you'd have it made:

 SELECT * FROM openxml(@hdoc, '/Person', 1)               WITH (FirstName varchar(30),                      LastName varchar(30)) 

That WITH clause does the trick, and you finally get decent output.

 FirstName                      LastName ------------------------------ --------------------------- Shelly                         Alexander (1 row(s) affected) 

So, now you have the data you want, extracted from an XML rowset. Using the WITH clause is basically the same syntax as laying out the columns in a table: the column name, some space, the data type, a comma, and then the next column name .

IN THE FIELD: XML PARSERS

If the OPENXML stuff looks like it's extremely cumbersome to deal with, there's a good reason for it. It is extremely cumbersome. You can't read the XML in from a file very easily; you have to spend a huge amount of time fighting with arcane bit-field flags, and you get to completely reformat your data using a WITH option. And you do all that just to get a few rows out, because you can't declare a variable of type TEXT , so you can hold only about 8KB of XML in SQL Server at a time.

Learn this stuff for the test. If you are ever involved in a project that requires you to import XML, use any of about five readily available scripting languages (Perl, VBScript, Java, Python, and C# come to mind, and there are probably dozens more), parse the XML using the already written, elegant, and useful tools in those languagestools that are specifically designed to parse XMLand have the scripts write out nice, comma- (or something) delimited text. You'll learn how to import that in the next section. That way, you don't have to worry about getting memory leaks in SQL Server, and you don't need to be concerned about running this a thousand times to get all your data in 8KB at a time.

Well, the syntax stays the same until you decide you don't want a column called FirstName ; you'd rather have columns called FName and LName . So, now what?

 SELECT * FROM openxml(@hdoc, '/Person', 1)               WITH (FName varchar(30) '@FirstName',                      LName varchar(30) '@LastName'               ) 

Now you end up with a nice, clean rowset. You need to look at one more thing for the exam. Imagine that you want to output something farther up the tree from where you specify the rowpattern.

You can enter the following if you want to use the rowpattern '/Person/Sales' to return sales information, and show the first name and last name for each sale:

 SELECT * FROM openxml(@hdoc, '/Person/Sales', 1)               WITH (FName varchar(30) '../@FirstName',                     LName varchar(30) '../@LastName',                     QtyPurchased int '@QtyPurchased'               ) 

Notice the use of the " .. " syntax in specifying the output of the XML file. This SELECT statement uses the /Person/Sales as the default level, so any value that exists at that level can be specified by just the name of the value. Here's the output:

 FName                          LName                          QtyPurchased ------------------------------ ------------------------------ -------------- Shelly                         Alexander                      5 Shelly                         Alexander                      10 

Anything above or below the default level, /Person/Sales , has to be qualified with a path , which is what the " .. " syntax represents: a path representing the level above the default, in this case /Person . Here's another example:

 select * from openxml(@hdoc, '/Person', 1)               WITH (FName varchar(30) '@FirstName',                      LName varchar(30) '@LastName',                      QtyPurchased int 'Sales/@QtyPurchased'               ) 

EXAM TIP

Pathing XML Pathing XML is very likely to be on your exam because it's important to understand how it works if you're going to use OPENXML() , and because Microsoft is very proud of the new XML features in SQL Server 2000.


This shows a different syntax, and provides a different result. The preceding example returned every person and sale. This example returns only the first sale:

 FName                          LName                          QtyPurchased ------------------------------ ------------------------------ ------------ Shelly                         Alexander                      5 

It returns only the first sale because it's returning one row for each default level, which is the /Person level in this case.

So, now you know how to export data to XML format, which is a pretty useful thing. You also should have a good handle on how to translate data from XML into a rowset, which is marginally useful. So now it's time to move real data in and out of SQL Server.

REVIEW BREAK: XML in Review

XML is a feature that is very well covered by the exam, so be sure to use the examples here and get a good understanding of OPENXML and how the path syntax works.

  • XML is a document format that can be used to transfer hierarchical data between systems.

  • XML documents can be created with the FOR XML clause of the SELECT statement. This clause provides several options for formatting XML output.

  • You can create rowsets from XML documents using the OPENXML statement.

Spend some time typing in the examples from this chapter, or use the slightly more complex examples found in Books Online to get a complete idea of how the OPENXML and FOR XML ideas really work.



MCSE Training Guide (70-229). Designing and Implementing Databases with MicrosoftR SQL ServerT 2000 Enterprise Edition
MCSE Training Guide (70-229). Designing and Implementing Databases with MicrosoftR SQL ServerT 2000 Enterprise Edition
ISBN: N/A
EAN: N/A
Year: 2003
Pages: 228

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net