Extracting Data into XML


  • Extract data in XML format. Considerations include output format and XML schema structure.

XML, the eXtensible Markup Language, is an offshoot of HTML that is designed to provide an application-independent, character-set -independent method of transferring data, especially transaction-oriented data, across systems. If you're familiar with EDI (Electronic Document Interchange), then the concept behind XML should be pretty familiar to you, but the implementation is completely different. In this section, you'll see what XML is, and how to format and extract query results into XML.

If you feel the need to ask what XML is, the best answer is "Go get an XML book." (Look at the "Suggested Reading" section at the end of the chapter.) Basically, though, XML is a subset of SGML, which is a huge standard that is so complex nobody has ever completely implemented it. HTML is also a subset of SGML, but is quite limited; XML is less limited (and more complex) than HTML, but is still far less complex than SGML.

An XML document consists of one or more elements, which are bound between angle brackets <>. The first word that appears inside the angle brackets is the name of the element. The rest of the element consists of element attributes. For example, here's an element:

 <row xmlns="x-schema:#Schema1" QtyPurchased="5" PersonID="9" FirstName="Shelly"             LastName="Alexander"/> 

The name of the element, or the element type, is row . The xmlns property specifies the name of the schema, which is the format that the element will take. The element then has other attributes, such as QtyPurchased , PersonID , FirstName , and LastName , which all have values. The element ends with a forward slash and an angle bracket , indicating the end of the element.

An element can also contain other elements, like this:

 <Person PersonID="1" FirstName="Danny">        <Sales QtyPurchased="4"/>   <Sales QtyPurchased="3"/> </Person> 

In this case, the Person element contains two Sales elements. Notice that on the first line, there isn't a slash before the ending bracket; the slash is actually down on the last line. This is how objects are nested in XML.

To put data into XML format, the SELECT statement includes an operator called FOR XML . This specifies to SQL Server that instead of returning a rowset, it should return an XML document. There are three different options for generating the XML: RAW , AUTO , and EXPLICIT .

Automatic XML Formatting

In AUTO mode, SQL Server returns the rowset in an automatically generated nested XML format. If the query has no joins, it doesn't have a nesting at all. If the query has joins, then it returns the first row from the first table, then all the correlated rows from each joined table as a nested level. For example, this query:

 SELECT Person.PersonID, Person.FirstName, Person.LastName, Sales.QtyPurchased FROM Person        INNER JOIN Sales               ON Sales.PersonID = Person.PersonID FOR XML AUTO 

returns this XML:

 <Person PersonID="1" FirstName="Danny" LastName="Jones">         <Sales QtyPurchased="4"/>         <Sales QtyPurchased="3"/> </Person> <Person PersonID="3" FirstName="Scott" LastName="Smith">         <Sales QtyPurchased="1"/> </Person> <Person PersonID="4" FirstName="Alex" LastName="Riley">         <Sales QtyPurchased="1"/>         <Sales QtyPurchased="1"/> </Person> <Person PersonID="9" FirstName="Shelly" LastName="Alexander">         <Sales QtyPurchased="10"/>         <Sales QtyPurchased="5"/> </Person> 

The actual XML, if you run the query, comes out all on one line as a stream of data. This is what it looks like if you format it to actually be readable. XML, however, doesn't know about linefeeds or making things readable; remember, it's designed for system-to-system traffic without people intervening .

NOTE

Where's the XML ? Unfortunately, Query Analyzer by default shows only the first 256 characters of a string that's returned. That's not going to work for most of the examples in this chapter, because they're larger than 256 characters. To fix Query Analyzer, go to the Tools menu, choose Options, click on the Results tab, and in the Maximum Characters per Column box enter some larger number, like 4096.


As you add other joined tables, the nesting level increases . To figure out how to nest the different fields, SQL Server uses the order in which the columns appear in the SELECT statement. So if you rewrite the preceding SELECT statement as follows :

 SELECT Sales.QtyPurchased, Person.PersonID, Person.FirstName, Person.LastName FROM Person        INNER JOIN Sales               ON Sales.PersonID = Person.PersonID FOR XML AUTO 

You end up with XML that looks like this:

 <Sales QtyPurchased="4">         <Person PersonID="1" FirstName="Danny" LastName="Jones"/> </Sales> <Sales QtyPurchased="3">         <Person PersonID="1" FirstName="Danny" LastName="Jones"/> </Sales> <Sales QtyPurchased="1">         <Person PersonID="3" FirstName="Scott" LastName="Smith"/>         <Person PersonID="4" FirstName="Alex" LastName="Riley"/>         <Person PersonID="4" FirstName="Alex" LastName="Riley"/> </Sales> <Sales QtyPurchased="10">         <Person PersonID="9" FirstName="Shelly" LastName="Alexander"/> </Sales> <Sales QtyPurchased="5">         <Person PersonID="9" FirstName="Shelly" LastName="Alexander"/> </Sales> 

Notice that the nesting here is totally reversed . The easiest way to write queries for XML is to write them with the FOR XML clause left off, make sure that they are returning the data you want, and then add the FOR XML back onto the end of the query. That eliminates the need for a lot of extra formatting.

Minimal Formatting with the RAW Option

What if you don't want XML that is elaborately formatted? Then you can use the RAW mode. RAW mode returns each row as an element with the identifier row . Here's an example of the same query as you just saw, returned in RAW mode:

 <row QtyPurchased="4" PersonID="1" FirstName="Danny" LastName="Jones"/> <row QtyPurchased="3" PersonID="1" FirstName="Danny" LastName="Jones"/> <row QtyPurchased="1" PersonID="3" FirstName="Scott" LastName="Smith"/> <row QtyPurchased="1" PersonID="4" FirstName="Alex" LastName="Riley"/> <row QtyPurchased="1" PersonID="4" FirstName="Alex" LastName="Riley"/> <row QtyPurchased="10" PersonID="9" FirstName="Shelly" LastName="Alexander"/> <row QtyPurchased="5" PersonID="9" FirstName="Shelly" LastName="Alexander"/> 

The query is written like the one discussed in the section "Automatic XML formatting", but it uses the RAW keyword instead of the AUTO keyword:

 SELECT Sales.QtyPurchased, Person.PersonID, Person.FirstName, Person.LastName FROM Person        INNER JOIN Sales               ON Sales.PersonID = Person.PersonID FOR XML RAW 

If you have data that's stored in text or image format, SQL Server assumes that you don't want it in the XML file. If you do want to put the data into the XML file, you can use the BINARY BASE64 option, like this:

 SELECT Sales.QtyPurchased, Person.PersonID, Person.FirstName, Person.LastName FROM Person        INNER JOIN Sales               ON Sales.PersonID = Person.PersonID FOR XML RAW, BINARY BASE64 

Explicitly Creating an XML Format

Another way to extract data with the FOR XML clause is to use the option EXPLICIT . The EXPLICIT option enables you to specify the format of the XML that will be created. To define the format, you have to alias the first column of output to the name Tag , name the second column Parent , and then each consecutive column has to be aliased to relate it to a specific Tag . The column names after Parent are named like this: [ ElementName!TagNumber!AttributeName! Directive ] , complete with the square brackets around the alias. So, as an example:

 SELECT 1 as TAG,        NULL as Parent,        Sales.QtyPurchased as [Sales!1!QtyPurchased],        Person.PersonID as [Sales!1!PersonID],        Person.FirstName as [Sales!1!FirstName],        Person.LastName as [Sales!1!LastName] FROM Person        INNER JOIN Sales               ON Sales.PersonID = Person.PersonID FOR XML EXPLICIT 

This returns this XML:

 <Sales QtyPurchased="4" PersonID="1" FirstName="Danny" LastName="Jones"/> <Sales QtyPurchased="3" PersonID="1" FirstName="Danny" LastName="Jones"/> <Sales QtyPurchased="1" PersonID="3" FirstName="Scott" LastName="Smith"/> <Sales QtyPurchased="1" PersonID="4" FirstName="Alex" LastName="Riley"/> <Sales QtyPurchased="1" PersonID="4" FirstName="Alex" LastName="Riley"/> <Sales QtyPurchased="10" PersonID="9" FirstName="Shelly" LastName="Alexander"/> <Sales QtyPurchased="5" PersonID="9" FirstName="Shelly" LastName="Alexander"/> 

To do a nested element, you need to write two queries joined together with a UNION . The first query has a Tag of 1 and a Parent of NULL ; the second has a Tag of 2 and a Parent of 1 . The second query's Parent value relates to the first query's Tag value. So, to nest the preceding query, you do this:

 SELECT 1 as Tag,        NULL as Parent,        Person.PersonID        as [Person!1!PersonID],        Person.FirstName        as [Person!1!FirstName],        Person.LastName        as [Person!1!LastName],        NULL                      AS [Sales!2!QtyPurchased] FROM Person UNION SELECT 2 as Tag,        1 as Parent,        Person.PersonID        AS [Person!1!PersonID],        Person.FirstName        as [Person!1!FirstName],        Person.LastName        as [Person!1!LastName],        Sales.QtyPurchased        as [Sales!1!QtyPurchased] FROM Person        INNER JOIN Sales               ON Sales.PersonID = Person.PersonID ORDER BY [Person!1!PersonID], [Person!1!FirstName], [Person!1!LastName] FOR XML EXPLICIT 

There are a lot of necessary details in this query. First of all, notice that because this is a UNION query, each query has to have the same number of columns. Second, notice that there is an ORDER BY . This is required, or the objects do not nest properly. Here's the result XML:

 <Person PersonID="1" FirstName="Danny" LastName="Jones">        <Sales QtyPurchased="3"/>        <Sales QtyPurchased="4"/> </Person> <Person PersonID="2" FirstName="Melissa" LastName="Jones"/> <Person PersonID="3" FirstName="Scott" LastName="Smith">        <Sales QtyPurchased="1"/> </Person> <Person PersonID="4" FirstName="Alex" LastName="Riley">        <Sales QtyPurchased="1"/> </Person> <Person PersonID="5" FirstName="Chris" LastName="Avery"/> <Person PersonID="6" FirstName="Jennifer" LastName="Avery"/> <Person PersonID="7" FirstName="Bryan" LastName="Decker"/> <Person PersonID="8" FirstName="Robin" LastName="Decker"/> <Person PersonID="9" FirstName="Shelly" LastName="Alexander">        <Sales QtyPurchased="5"/>        <Sales QtyPurchased="10"/> </Person> 

Notice that the result gives all the people, not just the ones with Sales, because there isn't a join in the first SELECT in the union. If you want to have only people who had purchased things in the output, you can restrict the first query to dump those folks out.

Table 5.2 shows the syntax of the directives you can use as the fourth part of a column alias. The directives help you handle data in different ways, and they are optional.

Table 5.2. Directives for Use in XML Column Aliases

Directive

Description

  ID  

This is used to define an element as an anchor for a referral; other objects can then refer back to this one with IDREF and IDREFS . If you're not using the XMLDATA option, this option doesn't do anything.

IDREF

This links the element with another element that is specified with the ID directive. Once again, if you're not using the XMLDATA option, this doesn't do anything.

IDREFS

Same thing as IDREF .

hide

This specifies that the attribute should not be displayed. This is handy for creating an element that you want to sort on, but you don't necessarily want to have in the resultset.

element

This forces the creation of a new element with the name specified in the alias and the data coming from the column data.

xml

This directive specifies that the data in the column is already XML, and shouldn't be parsed. It can only be used with the hide directive.

xmltext

The column is wrapped up into a tag that is stuck into the document. It also can only be used with the hide directive.

cdata

This wraps the column data in a CDATA section with no encoding at all. Can be used with only string and text types, and it can be used with only the hide directive.

The EXPLICIT option, as you can see, has a lot of rules and formatting constraints. It's going to be interesting to see how data is read in from the detailed and possibly convoluted structures.



MCSE Training Guide (70-229). Designing and Implementing Databases with MicrosoftR SQL ServerT 2000 Enterprise Edition
MCSE Training Guide (70-229). Designing and Implementing Databases with MicrosoftR SQL ServerT 2000 Enterprise Edition
ISBN: N/A
EAN: N/A
Year: 2003
Pages: 228

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net