Lesson 4:Retrieving XML Data

3 4

Extensible Markup Language (XML) is a hypertext programming language used to describe the contents of a set of data and how the data should be outputted to a device or displayed on a Web page. You can execute Transact-SQL queries to return a result as XML rather than as a standard rowset. These queries can be executed directly or executed from within stored procedures. In addition, you can use the Transact-SQL to access data represented as an XML document. This lesson will introduce you to XML, and you will learn how use the FOR XML clause in a SELECT statement to retrieve data and the OPENXML function to access XML data.


After this lesson, you will be able to:

  • Use Transact-SQL to return result sets as XML.
  • Use Transact-SQL to access XML data.

Estimated lesson time: 30 minutes


Introduction to XML

Markup languages originated as ways for publishers to indicate to printers how the content of a newspaper, magazine, or book should be organized. Markup languages for electronic data perform the same function for electronic documents that can be displayed on different types of electronic gear.

Both XML and Hypertext Markup Language (HTML) are derived from Standard Generalized Markup Language (SGML). SGML is a very large, complex language that is difficult to fully use for publishing data on the Web. HTML is a more simple, specialized markup language than SGML but has a number of limitations when working with data on the Web. XML is smaller than SGML and more robust than HTML, so it is becoming an increasingly important language in the exchange of electronic data through the Web or through intra-company networks.

In a relational database such as SQL Server, all operations on the tables in the database produce a result in the form of a table. The result set of a SELECT statement is in the form of a table. Traditional client/server applications that execute a SELECT statement process the result by fetching one row or a block of rows at a time from the tabular result set and mapping the column values into program variables. Web application programmers, on the other hand, are more familiar with hierarchical representations of data in XML or HTML format.

SQL Server2000 includes many features that support XML's functionality. The combination of these features makes SQL Server 2000 an XML-enabled database server. The following features support XML functionality:

  • The ability to access SQL Server through the use of HTTP
  • Support for XML-Data Reduced (XDR) schemas and the ability to specify XPath queries against these schemas
  • The ability to retrieve and write XML data
    • Retrieving XML data by using the SELECT statement and the FOR XML clause
    • Writing XML data by using the OPENXML rowset provider
    • Retrieving XML data by using the XPath query language
  • Enhancements to the SQL Server OLE DB (SQLOLEDB) provider that enable XML documents to be set as command text and to return result sets as a stream

NOTE


XML functionality is a sophisticated and extensive component of SQL Server. This lesson can cover only a small portion of that functionality. For more information about XML, refer to SQL Server Books Online. In addition, you can refer to the XML Developer Center on MSDN (http://msdn.microsoft.com/xml/default.asp) for the latest updates relating to SQL Server support for XML.

Using the FOR XML Clause to Retrieve Data

You can execute SQL queries against existing relational databases in order to return a result as an XML document rather than as a standard rowset. To retrieve a result directly, use the FOR XML clause of the SELECT statement.

For example, the following SELECT statement uses a FOR XML clause that specifies the AUTO mode and the ELEMENTS keyword:

 USE Northwind SELECT Customers.CustomerID, ContactName, CompanyName,     Orders.CustomerID, OrderDate FROM Customers, Orders WHERE Customers.CustomerID = Orders.CustomerID      AND (Customers.CustomerID = N'ALFKI'      OR Customers.CustomerID = N'XYZAA') ORDER BY Customers.CustomerID FOR XML AUTO, ELEMENTS 

The FOR XML clause uses the following syntax:

FOR XML {RAW | AUTO | EXPLICIT} [, XMLDATA] [, ELEMENTS][, BINARY BASE64]

RAW, AUTO, EXPLICIT

The FOR XML clause must specify one of the following XML modes: RAW, AUTO, or EXPLICIT. The XML mode determines the shape of the XML result set. The XML mode is in effect only for the execution of the query for which they are set. The mode does not affect the result of any subsequent queries.

RAW Mode

RAW mode transforms each row in the query result set into an XML element with the generic identifier row. Each column value that is not NULL is mapped to an attribute of the XML element in which the attribute name is the same as the column name.

The BINARY BASE64 option must be specified in the query in order to return the binary data in base64-encoded format. In RAW mode, retrieving binary data without specifying the BINARY BASE64 option results in an error.

When an XML-Data schema is requested, the schema, declared as a namespace, appears at the beginning of the data. In the result, the schema namespace reference is repeated for every top-level element.

AUTO Mode

AUTO mode returns query results as nested XML elements. Each table in the FROM clause, from which at least one column is listed in the SELECT clause, is represented as an XML element. The columns listed in the SELECT clause are mapped to the appropriate attribute of the element. When the ELEMENTS option is specified, the table columns are mapped to subelements instead of attributes. By default, AUTO mode maps the table columns to XML attributes.

A table name (or the alias, if provided) maps to the XML element name. A column name (or the alias, if provided) maps to an attribute name or to a non-complex subelement name when the ELEMENTS option is specified in the query.

The hierarchy (nesting of the elements) in the result set is based on the order of tables identified by the columns that are specified in the SELECT clause; therefore, the order in which column names are specified in the SELECT clause is significant.

The tables are identified and nested in the order in which the column names are listed in the SELECT clause. The first, leftmost table identified forms the top element in the resulting XML document. The second leftmost table (identified by columns in the SELECT statement) forms a subelement within the top element (and so on).

If a column name listed in the SELECT clause is from a table that is already identified by a previously specified column in the SELECT clause, the column is added as an attribute (or as a subelement if the ELEMENTS option is specified) of the element already created, instead of opening a new hierarchy (adding a new subelement for that table).

EXPLICIT Mode

In EXPLICIT mode, the query writer controls the shape of the XML document returned by the execution of the query. The query must be written in a specific way so that the additional information about the expected nesting is explicitly specified as part of the query. You can also specify additional configurations at the column level by using the directives. When you specify EXPLICIT mode, you must assume the responsibility for ensuring that the generated XML is well-formed and valid (in the case of an XML-DATA schema).

XMLDATA

The XMLDATA keyword specifies that an XML-Data schema should be returned. The schema is added to the document as an inline schema. The primary purpose for specifying XMLDATA in a query is to receive XML data type information that can be used where data types are necessary (for example, when handling numeric expressions). Otherwise, everything in an XML document is a textual string. Generating an XML-Data schema is an overhead on the server, is likely to affect performance, and should be used only when data types are needed.

If the database column from which values are retrieved is of the type sql_variant, there is no data type information in the XML-Data schema. If a given query designates different XML elements with the same name, XMLDATA might produce an invalid XML-Data schema. This will happen if element-name collisions and data type names are not resolved (you might have two elements with the same name but different data types).

ELEMENTS

If the ELEMENTS option is specified, the columns are returned as subelements. Otherwise, they are mapped to XML attributes. This option is supported in AUTO mode only.

BINARY BASE64

If the BINARY Base64 option is specified, any binary data returned by the query is represented in base64-encoded format. To retrieve binary data using RAW and EXPLICIT mode, you must specify this option. In AUTO mode, binary data is returned as a reference by default.

Using the OPENXML Function to Access XML Data

The OPENXML function is a Transact-SQL keyword that provides a rowset over in-memory XML documents. OPENXML is a rowset provider similar to a table or a view. OPENXML enables access to XML data as if it were a relational rowset by providing a rowset view of the internal representation of an XML document. The records in the rowset can be stored in database tables (similar to the rowsets provided by tables and views).

The OPENXML function can be used in SELECT and SELECT INTO statements wherever a rowset provider, such as a table, a view, or OPENROWSET, can appear as the source.

To write queries against an XML document by using OPENXML, you must first call the sp_xml_preparedocument system stored procedure, which parses the XML document and returns a handle to the parsed document that is ready for consumption. The parsed document is a tree representation of various nodes (elements, attributes, text, comment, and so on) in the XML document. The document handle is passed to OPENXML, which then provides a rowset view of the document based on the parameters passed to it.

The internal representation of an XML document must be removed from memory by calling the sp_xml_removedocument system stored procedure to free the memory.

In the following example, you are declaring a variable, parsing the XML data, retrieving the data in a SELECT statement, and removing the representation of the XML document:

 -- Declaring a variable. DECLARE @hDoc INT -- Parsing the XML data. EXEC sp_xml_preparedocument @hDoc OUTPUT,       N'<ROOT>      <Customers Customer ContactName="Joe"       CompanyName="Company1">     <Orders Customer            OrderDate="2000-08-25T00:00:00"/>           <Orders Customer            OrderDate="2000-10-03T00:00:00"/>      </Customers>      <Customers Customer ContactName="Steve"      CompanyName="Company2">No Orders yet!           <Orders Customer             OrderDate="2000-06-21T00:00:00"/>           <Orders Customer             OrderDate="2000-10-10T00:00:00"/>      </Customers>      </ROOT>' -- Using OPENXML in a SELECT statement. SELECT *  FROM OPENXML(@hDoc, N'/ROOT/Customers/Orders')       WITH (CustomerID NCHAR(5) '../@CustomerID',            ContactName NVARCHAR(50) '../@ContactName', OrderDate DATETIME) -- Removing the internal representation of the XML document. EXEC sp_xml_removedocument @hDoc 

The OPENXML function uses the following syntax:

 OPENXML(idoc int [in],rowpattern nvarchar[in],[flags byte[in]])  [WITH (SchemaDeclaration | TableName)] 

XML Document Handle (idoc)

The XML document handle is the document handle of the internal representation of an XML document. The internal representation of an XML document is created by calling sp_xml_preparedocument.

Xpath Expression (rowpattern)

The XPath expression specified as rowpattern identifies a set of nodes in the XML document. Each node identified by rowpattern corresponds to a single row in the rowset generated by OPENXML.

The nodes identified by the XPath expression can be any XML nodes (elements, attributes, processing instructions, and so on) in the XML document. If rowpattern identifies a set of elements in the XML document, there is one row in the rowset for each element node identified. For example, if rowpattern ends in an attribute, a row is created for each attribute node selected by rowpattern.

Mapping (flags)

In the OPENXML statement, you can optionally specify the type of mapping (attribute-centric or element-centric) between the rowset columns and the XML nodes identified by the rowpattern. This information is used in transformation between the XML nodes and the rowset columns.

There are two ways to specify the mapping (you can specify both):

  • Using the flags parameter.  The mapping specified by the flags parameter assumes name correspondence where the XML nodes map to corresponding rowset columns with the same name.
  • Using the ColPattern parameter.  ColPattern, an XPath expression, is specified as part of SchemaDeclaration in the WITH clause. The mapping specified in ColPattern overwrites the mapping specified by the flags parameter. ColPattern can be used to specify the special nature of the mapping (in case of attribute-centric and element-centric mapping) that overwrites or enhances the default mapping indicated by the flags. ColPattern is specified if the following conditions are met:
    • The column name in the rowset is different from the element/attribute name to which it is mapped. In this case, ColPattern is used to identify the XML element/attribute name to which the rowset column maps.
    • You want to map a metaproperty attribute to the column. In this case, ColPattern is used to identify the metaproperty to which the rowset column maps.

Both the flags and ColPattern parameters are optional. If no mapping is specified, attribute-centric mapping (the default value of the flags parameter) is assumed by default.

SchemaDeclaration

A rowset schema must be provided to OPENXML to generate the rowset. You can specify the rowset schema by using the optional WITH clause. The following options are available for specifying the rowset schema:

  • Specifying the complete schema in the WITH clause.  When you specify the rowset schema, you specify the column names and their data types and their mapping to the XML document. You can specify the column pattern (by using the ColPattern parameter in SchemaDeclaration). The column pattern specified is used to map a rowset column to the XML node identified by rowpattern and also to determine the type of mapping. If ColPattern is not specified for a column, the rowset column maps to the XML node with the same name based on the mapping specified by the flags parameter. If ColPattern is specified as part of the schema specification in the WITH clause, however, it overwrites the mapping specified in the flags parameter.
  • Specifying the name of an existing table in the WITH clause.  You can simply specify an existing table name whose schema can be used by OPENXML to generate the rowset.
  • Not specifying the WITH clause.  In this case, OPENXML returns a rowset in the edge table format. This display is called an edge table because, in this table format, every edge in the parsed XML document tree maps to a row in the rowset.

TableName

A table name can be provided instead of the schema declaration if a table with the desired schema already exists and no column patterns are required.

Exercise 4:  Retrieving XML Data

In this exercise, you will use the FOR XML clause to retrieve data in an XML format and use the OPENXML function to return XML data. To complete this exercise, you should be logged into your Windows 2000 Server computer as Administrator.

To use the FOR XML clause to retrieve data

  1. Open Query Analyzer and connect to your local server.
  2. On the toolbar, click the Execute Mode button and then click Results In Text.

By selecting this option, you can better view the result set.

  1. In the Editor pane of the Query window, enter the following Transact-SQL code:
 USE Northwind SELECT CustomerID, ContactName, CompanyName FROM Customers WHERE (CustomerID = N'ALFKI'   OR CustomerID = N'XYZAA') ORDER BY CustomerID FOR XML RAW 

In this statement, you are using the FOR XML clause to return data in an XML format. The clause specifies RAW mode.

  1. Execute the Transact-SQL statement.

The result set appears in the Results tab of the Results pane. Notice how the result set differs from the typical result set that you would see if you executed the SELECT statement without the FOR XML clause.

To use the OPENXML function to retrieve XML data

  1. In the Editor pane of the Query window, enter the following Transact-SQL code:
 DECLARE @TestDoc INT EXEC sp_xml_preparedocument @TestDoc OUTPUT,       N'<ROOT>      <Employees Emp FirstName="Ethan" 
Dept="Marketing"> </Employees> <Employees Emp FirstName="Linda"
Dept="Research"> </Employees> </ROOT>'

This statement declares a variable named @TestDoc and then uses the sp_xml_preparedocument system stored procedure to parse the XML data.

  1. Add the following Transact-SQL code to the end of the previous statement:
 SELECT *  FROM OPENXML(@TestDoc, N'/ROOT/Employees')  WITH (EmpID NCHAR(5) './@EmpID',        FirstName VARCHAR(50) './@FirstName',        Dept VARCHAR(10) './@Dept') 

This SELECT statement defines the result that should be returned. The OPENXML function is used to extract XML data. Notice that the @TestDoc variable is being used.

  1. Add the following Transact-SQL code to the end of the previous statement:
 EXEC sp_xml_removedocument @TestDoc 

This statement removes the internal representation of the XML document.

  1. Execute all four statements (DECLARE, EXEC sp_xml_preparedocument, SELECT, and EXEC sp_xml_removedocument).

The result set appears in the Results tab of the Results pane.

  1. Close Query Analyzer.

Lesson Summary

XML is a hypertext programming language used to describe the contents of a set of data and how the data should be output to a device or displayed on a Web page. SQL Server 2000 includes many features that support XML functionality. You can execute SQL queries against existing relational databases to return a result as an XML document rather than as a standard rowset. To retrieve a result directly, use the FOR XML clause of the SELECT statement. The FOR XML clause must specify one of the following XML modes: RAW, AUTO, or EXPLICIT. The OPENXML function is a Transact-SQL keyword that provides a rowset over in-memory XML documents. OPENXML enables access to XML data as if it were a relational rowset by providing a rowset view of the internal representation of an XML document. OPENXML can be used in SELECT and SELECT INTO statements wherever a rowset provider, such as a table, a view, or OPENROWSET, can appear as the source.



Microsoft Press Staff - MCSE. Microsoft SQL Server 2000 Database Design and Implementation Training Kit
MCSE Training Kit (Exam 70-229): Microsoft SQL Server(TM) 2000 Database Design and Implementation (Pro Certification)
ISBN: 073561248X
EAN: 2147483647
Year: 2001
Pages: 97

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net