XML, the eXtensible Markup Language, is an offshoot of HTML that is designed to provide an application-independent, character-set -independent method of transferring data, especially transaction-oriented data, across systems. If you're familiar with EDI (Electronic Document Interchange), then the concept behind XML should be pretty familiar to you, but the implementation is completely different. In this section, you'll see what XML is, and how to format and extract query results into XML. If you feel the need to ask what XML is, the best answer is "Go get an XML book." (Look at the "Suggested Reading" section at the end of the chapter.) Basically, though, XML is a subset of SGML, which is a huge standard that is so complex nobody has ever completely implemented it. HTML is also a subset of SGML, but is quite limited; XML is less limited (and more complex) than HTML, but is still far less complex than SGML. An XML document consists of one or more elements, which are bound between angle brackets <>. The first word that appears inside the angle brackets is the name of the element. The rest of the element consists of element attributes. For example, here's an element: <row xmlns="x-schema:#Schema1" QtyPurchased="5" PersonID="9" FirstName="Shelly" LastName="Alexander"/> The name of the element, or the element type, is row . The xmlns property specifies the name of the schema, which is the format that the element will take. The element then has other attributes, such as QtyPurchased , PersonID , FirstName , and LastName , which all have values. The element ends with a forward slash and an angle bracket , indicating the end of the element. An element can also contain other elements, like this: <Person PersonID="1" FirstName="Danny"> <Sales QtyPurchased="4"/> <Sales QtyPurchased="3"/> </Person> In this case, the Person element contains two Sales elements. Notice that on the first line, there isn't a slash before the ending bracket; the slash is actually down on the last line. This is how objects are nested in XML. To put data into XML format, the SELECT statement includes an operator called FOR XML . This specifies to SQL Server that instead of returning a rowset, it should return an XML document. There are three different options for generating the XML: RAW , AUTO , and EXPLICIT . Automatic XML FormattingIn AUTO mode, SQL Server returns the rowset in an automatically generated nested XML format. If the query has no joins, it doesn't have a nesting at all. If the query has joins, then it returns the first row from the first table, then all the correlated rows from each joined table as a nested level. For example, this query: SELECT Person.PersonID, Person.FirstName, Person.LastName, Sales.QtyPurchased FROM Person INNER JOIN Sales ON Sales.PersonID = Person.PersonID FOR XML AUTO returns this XML: <Person PersonID="1" FirstName="Danny" LastName="Jones"> <Sales QtyPurchased="4"/> <Sales QtyPurchased="3"/> </Person> <Person PersonID="3" FirstName="Scott" LastName="Smith"> <Sales QtyPurchased="1"/> </Person> <Person PersonID="4" FirstName="Alex" LastName="Riley"> <Sales QtyPurchased="1"/> <Sales QtyPurchased="1"/> </Person> <Person PersonID="9" FirstName="Shelly" LastName="Alexander"> <Sales QtyPurchased="10"/> <Sales QtyPurchased="5"/> </Person> The actual XML, if you run the query, comes out all on one line as a stream of data. This is what it looks like if you format it to actually be readable. XML, however, doesn't know about linefeeds or making things readable; remember, it's designed for system-to-system traffic without people intervening . NOTE Where's the XML ? Unfortunately, Query Analyzer by default shows only the first 256 characters of a string that's returned. That's not going to work for most of the examples in this chapter, because they're larger than 256 characters. To fix Query Analyzer, go to the Tools menu, choose Options, click on the Results tab, and in the Maximum Characters per Column box enter some larger number, like 4096. As you add other joined tables, the nesting level increases . To figure out how to nest the different fields, SQL Server uses the order in which the columns appear in the SELECT statement. So if you rewrite the preceding SELECT statement as follows : SELECT Sales.QtyPurchased, Person.PersonID, Person.FirstName, Person.LastName FROM Person INNER JOIN Sales ON Sales.PersonID = Person.PersonID FOR XML AUTO You end up with XML that looks like this: <Sales QtyPurchased="4"> <Person PersonID="1" FirstName="Danny" LastName="Jones"/> </Sales> <Sales QtyPurchased="3"> <Person PersonID="1" FirstName="Danny" LastName="Jones"/> </Sales> <Sales QtyPurchased="1"> <Person PersonID="3" FirstName="Scott" LastName="Smith"/> <Person PersonID="4" FirstName="Alex" LastName="Riley"/> <Person PersonID="4" FirstName="Alex" LastName="Riley"/> </Sales> <Sales QtyPurchased="10"> <Person PersonID="9" FirstName="Shelly" LastName="Alexander"/> </Sales> <Sales QtyPurchased="5"> <Person PersonID="9" FirstName="Shelly" LastName="Alexander"/> </Sales> Notice that the nesting here is totally reversed . The easiest way to write queries for XML is to write them with the FOR XML clause left off, make sure that they are returning the data you want, and then add the FOR XML back onto the end of the query. That eliminates the need for a lot of extra formatting. Minimal Formatting with the RAW OptionWhat if you don't want XML that is elaborately formatted? Then you can use the RAW mode. RAW mode returns each row as an element with the identifier row . Here's an example of the same query as you just saw, returned in RAW mode: <row QtyPurchased="4" PersonID="1" FirstName="Danny" LastName="Jones"/> <row QtyPurchased="3" PersonID="1" FirstName="Danny" LastName="Jones"/> <row QtyPurchased="1" PersonID="3" FirstName="Scott" LastName="Smith"/> <row QtyPurchased="1" PersonID="4" FirstName="Alex" LastName="Riley"/> <row QtyPurchased="1" PersonID="4" FirstName="Alex" LastName="Riley"/> <row QtyPurchased="10" PersonID="9" FirstName="Shelly" LastName="Alexander"/> <row QtyPurchased="5" PersonID="9" FirstName="Shelly" LastName="Alexander"/> The query is written like the one discussed in the section "Automatic XML formatting", but it uses the RAW keyword instead of the AUTO keyword: SELECT Sales.QtyPurchased, Person.PersonID, Person.FirstName, Person.LastName FROM Person INNER JOIN Sales ON Sales.PersonID = Person.PersonID FOR XML RAW If you have data that's stored in text or image format, SQL Server assumes that you don't want it in the XML file. If you do want to put the data into the XML file, you can use the BINARY BASE64 option, like this: SELECT Sales.QtyPurchased, Person.PersonID, Person.FirstName, Person.LastName FROM Person INNER JOIN Sales ON Sales.PersonID = Person.PersonID FOR XML RAW, BINARY BASE64 Explicitly Creating an XML FormatAnother way to extract data with the FOR XML clause is to use the option EXPLICIT . The EXPLICIT option enables you to specify the format of the XML that will be created. To define the format, you have to alias the first column of output to the name Tag , name the second column Parent , and then each consecutive column has to be aliased to relate it to a specific Tag . The column names after Parent are named like this: [ ElementName!TagNumber!AttributeName! Directive ] , complete with the square brackets around the alias. So, as an example: SELECT 1 as TAG, NULL as Parent, Sales.QtyPurchased as [Sales!1!QtyPurchased], Person.PersonID as [Sales!1!PersonID], Person.FirstName as [Sales!1!FirstName], Person.LastName as [Sales!1!LastName] FROM Person INNER JOIN Sales ON Sales.PersonID = Person.PersonID FOR XML EXPLICIT This returns this XML: <Sales QtyPurchased="4" PersonID="1" FirstName="Danny" LastName="Jones"/> <Sales QtyPurchased="3" PersonID="1" FirstName="Danny" LastName="Jones"/> <Sales QtyPurchased="1" PersonID="3" FirstName="Scott" LastName="Smith"/> <Sales QtyPurchased="1" PersonID="4" FirstName="Alex" LastName="Riley"/> <Sales QtyPurchased="1" PersonID="4" FirstName="Alex" LastName="Riley"/> <Sales QtyPurchased="10" PersonID="9" FirstName="Shelly" LastName="Alexander"/> <Sales QtyPurchased="5" PersonID="9" FirstName="Shelly" LastName="Alexander"/> To do a nested element, you need to write two queries joined together with a UNION . The first query has a Tag of 1 and a Parent of NULL ; the second has a Tag of 2 and a Parent of 1 . The second query's Parent value relates to the first query's Tag value. So, to nest the preceding query, you do this: SELECT 1 as Tag, NULL as Parent, Person.PersonID as [Person!1!PersonID], Person.FirstName as [Person!1!FirstName], Person.LastName as [Person!1!LastName], NULL AS [Sales!2!QtyPurchased] FROM Person UNION SELECT 2 as Tag, 1 as Parent, Person.PersonID AS [Person!1!PersonID], Person.FirstName as [Person!1!FirstName], Person.LastName as [Person!1!LastName], Sales.QtyPurchased as [Sales!1!QtyPurchased] FROM Person INNER JOIN Sales ON Sales.PersonID = Person.PersonID ORDER BY [Person!1!PersonID], [Person!1!FirstName], [Person!1!LastName] FOR XML EXPLICIT There are a lot of necessary details in this query. First of all, notice that because this is a UNION query, each query has to have the same number of columns. Second, notice that there is an ORDER BY . This is required, or the objects do not nest properly. Here's the result XML: <Person PersonID="1" FirstName="Danny" LastName="Jones"> <Sales QtyPurchased="3"/> <Sales QtyPurchased="4"/> </Person> <Person PersonID="2" FirstName="Melissa" LastName="Jones"/> <Person PersonID="3" FirstName="Scott" LastName="Smith"> <Sales QtyPurchased="1"/> </Person> <Person PersonID="4" FirstName="Alex" LastName="Riley"> <Sales QtyPurchased="1"/> </Person> <Person PersonID="5" FirstName="Chris" LastName="Avery"/> <Person PersonID="6" FirstName="Jennifer" LastName="Avery"/> <Person PersonID="7" FirstName="Bryan" LastName="Decker"/> <Person PersonID="8" FirstName="Robin" LastName="Decker"/> <Person PersonID="9" FirstName="Shelly" LastName="Alexander"> <Sales QtyPurchased="5"/> <Sales QtyPurchased="10"/> </Person> Notice that the result gives all the people, not just the ones with Sales, because there isn't a join in the first SELECT in the union. If you want to have only people who had purchased things in the output, you can restrict the first query to dump those folks out. Table 5.2 shows the syntax of the directives you can use as the fourth part of a column alias. The directives help you handle data in different ways, and they are optional. Table 5.2. Directives for Use in XML Column Aliases
The EXPLICIT option, as you can see, has a lot of rules and formatting constraints. It's going to be interesting to see how data is read in from the detailed and possibly convoluted structures. |