Extracting Data into XML
XML, the eXtensible Markup Language, is an offshoot of HTML that is designed to provide an application-independent,
If you feel the need to ask what XML is, the best answer is "Go get an XML book." (Look at the "Suggested Reading" section at the end of the chapter.) Basically, though, XML is a subset of SGML, which is a huge standard that is so complex nobody has ever completely implemented it. HTML is also a subset of SGML, but is quite limited; XML is less limited (and more complex) than HTML, but is still far less complex than SGML.
An XML document consists of one or more elements, which are bound between angle brackets <>. The first word that appears inside the angle brackets is the
<row xmlns="x-schema:#Schema1" QtyPurchased="5" PersonID="9" FirstName="Shelly"
LastName="Alexander"/>
The name of the element, or the element type, is
row
. The
xmlns
property specifies the name of the schema, which is the format that the element will take. The element then has other attributes, such as
QtyPurchased
,
PersonID
,
FirstName
, and
LastName
, which all have values. The element ends with a forward slash and an angle
An element can also contain other elements, like this:
<Person PersonID="1" FirstName="Danny">
<Sales QtyPurchased="4"/>
<Sales QtyPurchased="3"/>
</Person>
In this case, the Person element contains two Sales elements. Notice that on the first line, there isn't a slash before the ending bracket; the slash is actually down on the last line. This is how objects are nested in XML. To put data into XML format, the SELECT statement includes an operator called FOR XML . This specifies to SQL Server that instead of returning a rowset, it should return an XML document. There are three different options for generating the XML: RAW , AUTO , and EXPLICIT . Automatic XML FormattingIn AUTO mode, SQL Server returns the rowset in an automatically generated nested XML format. If the query has no joins, it doesn't have a nesting at all. If the query has joins, then it returns the first row from the first table, then all the correlated rows from each joined table as a nested level. For example, this query:
SELECT Person.PersonID, Person.FirstName, Person.LastName, Sales.QtyPurchased
FROM Person
INNER JOIN Sales
ON Sales.PersonID = Person.PersonID
FOR XML AUTO
returns this XML:
<Person PersonID="1" FirstName="Danny" LastName="Jones">
<Sales QtyPurchased="4"/>
<Sales QtyPurchased="3"/>
</Person>
<Person PersonID="3" FirstName="Scott" LastName="Smith">
<Sales QtyPurchased="1"/>
</Person>
<Person PersonID="4" FirstName="Alex" LastName="Riley">
<Sales QtyPurchased="1"/>
<Sales QtyPurchased="1"/>
</Person>
<Person PersonID="9" FirstName="Shelly" LastName="Alexander">
<Sales QtyPurchased="10"/>
<Sales QtyPurchased="5"/>
</Person>
The actual XML, if you run the query, comes out all on one line as a stream of data. This is what it looks like if you format it to actually be readable. XML, however, doesn't know about linefeeds or making things readable; remember, it's designed for system-to-system traffic without people
NOTE
Where's the
XML
?
Unfortunately, Query Analyzer by default shows only the first 256 characters of a string that's returned. That's not going to work for most of the examples in this chapter, because they're larger than 256 characters. To fix Query Analyzer, go to the Tools menu, choose Options, click on the Results tab, and in the Maximum
As you add other joined tables, the nesting level
SELECT Sales.QtyPurchased, Person.PersonID, Person.FirstName, Person.LastName
FROM Person
INNER JOIN Sales
ON Sales.PersonID = Person.PersonID
FOR XML AUTO
You end up with XML that looks like this:
<Sales QtyPurchased="4">
<Person PersonID="1" FirstName="Danny" LastName="Jones"/>
</Sales>
<Sales QtyPurchased="3">
<Person PersonID="1" FirstName="Danny" LastName="Jones"/>
</Sales>
<Sales QtyPurchased="1">
<Person PersonID="3" FirstName="Scott" LastName="Smith"/>
<Person PersonID="4" FirstName="Alex" LastName="Riley"/>
<Person PersonID="4" FirstName="Alex" LastName="Riley"/>
</Sales>
<Sales QtyPurchased="10">
<Person PersonID="9" FirstName="Shelly" LastName="Alexander"/>
</Sales>
<Sales QtyPurchased="5">
<Person PersonID="9" FirstName="Shelly" LastName="Alexander"/>
</Sales>
Notice that the nesting here is totally
Minimal Formatting with the RAW OptionWhat if you don't want XML that is elaborately formatted? Then you can use the RAW mode. RAW mode returns each row as an element with the identifier row . Here's an example of the same query as you just saw, returned in RAW mode: <row QtyPurchased="4" PersonID="1" FirstName="Danny" LastName="Jones"/> <row QtyPurchased="3" PersonID="1" FirstName="Danny" LastName="Jones"/> <row QtyPurchased="1" PersonID="3" FirstName="Scott" LastName="Smith"/> <row QtyPurchased="1" PersonID="4" FirstName="Alex" LastName="Riley"/> <row QtyPurchased="1" PersonID="4" FirstName="Alex" LastName="Riley"/> <row QtyPurchased="10" PersonID="9" FirstName="Shelly" LastName="Alexander"/> <row QtyPurchased="5" PersonID="9" FirstName="Shelly" LastName="Alexander"/> The query is written like the one discussed in the section "Automatic XML formatting", but it uses the RAW keyword instead of the AUTO keyword:
SELECT Sales.QtyPurchased, Person.PersonID, Person.FirstName, Person.LastName
FROM Person
INNER JOIN Sales
ON Sales.PersonID = Person.PersonID
FOR XML RAW
If you have data that's stored in text or image format, SQL Server assumes that you don't want it in the XML file. If you do want to put the data into the XML file, you can use the BINARY BASE64 option, like this:
SELECT Sales.QtyPurchased, Person.PersonID, Person.FirstName, Person.LastName
FROM Person
INNER JOIN Sales
ON Sales.PersonID = Person.PersonID
FOR XML RAW, BINARY BASE64
Explicitly Creating an XML Format
Another way to extract data with the
FOR XML
clause is to use the option
EXPLICIT
. The
EXPLICIT
option enables you to specify the format of the XML that will be created. To define the format, you have to alias the first column of output to the name
Tag
, name the second column
Parent
, and then each consecutive column has to be aliased to relate it to a specific
Tag
. The column
SELECT 1 as TAG,
NULL as Parent,
Sales.QtyPurchased as [Sales!1!QtyPurchased],
Person.PersonID as [Sales!1!PersonID],
Person.FirstName as [Sales!1!FirstName],
Person.LastName as [Sales!1!LastName]
FROM Person
INNER JOIN Sales
ON Sales.PersonID = Person.PersonID
FOR XML EXPLICIT
This returns this XML: <Sales QtyPurchased="4" PersonID="1" FirstName="Danny" LastName="Jones"/> <Sales QtyPurchased="3" PersonID="1" FirstName="Danny" LastName="Jones"/> <Sales QtyPurchased="1" PersonID="3" FirstName="Scott" LastName="Smith"/> <Sales QtyPurchased="1" PersonID="4" FirstName="Alex" LastName="Riley"/> <Sales QtyPurchased="1" PersonID="4" FirstName="Alex" LastName="Riley"/> <Sales QtyPurchased="10" PersonID="9" FirstName="Shelly" LastName="Alexander"/> <Sales QtyPurchased="5" PersonID="9" FirstName="Shelly" LastName="Alexander"/>
To do a nested element, you need to write two queries joined together with a
UNION
. The first query has a
Tag
of
1
and a
Parent
of
NULL
; the second has a
Tag
of
2
and a
Parent
of
1
. The second query's
Parent
value
SELECT 1 as Tag,
NULL as Parent,
Person.PersonID as [Person!1!PersonID],
Person.FirstName as [Person!1!FirstName],
Person.LastName as [Person!1!LastName],
NULL AS [Sales!2!QtyPurchased]
FROM Person
UNION
SELECT 2 as Tag,
1 as Parent,
Person.PersonID AS [Person!1!PersonID],
Person.FirstName as [Person!1!FirstName],
Person.LastName as [Person!1!LastName],
Sales.QtyPurchased as [Sales!1!QtyPurchased]
FROM Person
INNER JOIN Sales
ON Sales.PersonID = Person.PersonID
ORDER BY [Person!1!PersonID], [Person!1!FirstName], [Person!1!LastName]
FOR XML EXPLICIT
There are a lot of necessary details in this query. First of all, notice that because this is a UNION query, each query has to have the same number of columns. Second, notice that there is an ORDER BY . This is required, or the objects do not nest properly. Here's the result XML:
<Person PersonID="1" FirstName="Danny" LastName="Jones">
<Sales QtyPurchased="3"/>
<Sales QtyPurchased="4"/>
</Person>
<Person PersonID="2" FirstName="Melissa" LastName="Jones"/>
<Person PersonID="3" FirstName="Scott" LastName="Smith">
<Sales QtyPurchased="1"/>
</Person>
<Person PersonID="4" FirstName="Alex" LastName="Riley">
<Sales QtyPurchased="1"/>
</Person>
<Person PersonID="5" FirstName="Chris" LastName="Avery"/>
<Person PersonID="6" FirstName="Jennifer" LastName="Avery"/>
<Person PersonID="7" FirstName="Bryan" LastName="Decker"/>
<Person PersonID="8" FirstName="Robin" LastName="Decker"/>
<Person PersonID="9" FirstName="Shelly" LastName="Alexander">
<Sales QtyPurchased="5"/>
<Sales QtyPurchased="10"/>
</Person>
Notice that the result gives all the people, not just the ones with Sales, because there isn't a join in the first SELECT in the union. If you want to have only people who had purchased things in the output, you can restrict the first query to dump those folks out. Table 5.2 shows the syntax of the directives you can use as the fourth part of a column alias. The directives help you handle data in different ways, and they are optional. Table 5.2. Directives for Use in XML Column Aliases
The EXPLICIT option, as you can see, has a lot of rules and formatting constraints. It's going to be interesting to see how data is read in from the detailed and possibly convoluted structures. |