3 4
Extensible Markup Language (XML) is a hypertext programming language used to describe the contents of a set of data and how the data should be outputted to a device or displayed on a Web page. You can execute Transact-SQL queries to return a result as XML rather than as a standard rowset. These queries can be executed directly or executed from within stored procedures. In addition, you can use the Transact-SQL to access data represented as an XML document. This lesson will introduce you to XML, and you will learn how use the FOR XML clause in a SELECT statement to retrieve data and the OPENXML function to access XML data.
Markup languages originated as ways for publishers to indicate to printers how the content of a newspaper, magazine, or book should be organized. Markup languages for electronic data perform the same function for electronic documents that can be displayed on different types of electronic gear.
Both XML and Hypertext Markup Language (HTML) are derived from Standard Generalized Markup Language (SGML). SGML is a very large, complex language that is difficult to fully use for publishing data on the Web. HTML is a more simple, specialized markup language than SGML but has a number of limitations when working with data on the Web. XML is smaller than SGML and more robust than HTML, so it is becoming an increasingly important language in the exchange of electronic data through the Web or through intra-company networks.
In a relational database such as SQL Server, all operations on the tables in the database produce a result in the form of a table. The result set of a SELECT statement is in the form of a table. Traditional client/server applications that execute a SELECT statement process the result by fetching one row or a block of rows at a time from the tabular result set and mapping the column values into program variables. Web application programmers, on the other hand, are more familiar with hierarchical representations of data in XML or HTML format.
SQL Server2000 includes many features that support XML's functionality. The combination of these features makes SQL Server 2000 an XML-enabled database server. The following features support XML functionality:
NOTE
You can execute SQL queries against existing relational databases in order to return a result as an XML document rather than as a standard rowset. To retrieve a result directly, use the FOR XML clause of the SELECT statement.
For example, the following SELECT statement uses a FOR XML clause that specifies the AUTO mode and the ELEMENTS keyword:
USE Northwind SELECT Customers.CustomerID, ContactName, CompanyName, Orders.CustomerID, OrderDate FROM Customers, Orders WHERE Customers.CustomerID = Orders.CustomerID AND (Customers.CustomerID = N'ALFKI' OR Customers.CustomerID = N'XYZAA') ORDER BY Customers.CustomerID FOR XML AUTO, ELEMENTS
The FOR XML clause uses the following syntax:
FOR XML {RAW | AUTO | EXPLICIT} [, XMLDATA] [, ELEMENTS][, BINARY BASE64]
The FOR XML clause must specify one of the following XML modes: RAW, AUTO, or EXPLICIT. The XML mode determines the shape of the XML result set. The XML mode is in effect only for the execution of the query for which they are set. The mode does not affect the result of any subsequent queries.
RAW mode transforms each row in the query result set into an XML element with the generic identifier row. Each column value that is not NULL is mapped to an attribute of the XML element in which the attribute name is the same as the column name.
The BINARY BASE64 option must be specified in the query in order to return the binary data in base64-encoded format. In RAW mode, retrieving binary data without specifying the BINARY BASE64 option results in an error.
When an XML-Data schema is requested, the schema, declared as a namespace, appears at the beginning of the data. In the result, the schema namespace reference is repeated for every top-level element.
AUTO mode returns query results as nested XML elements. Each table in the FROM clause, from which at least one column is listed in the SELECT clause, is represented as an XML element. The columns listed in the SELECT clause are mapped to the appropriate attribute of the element. When the ELEMENTS option is specified, the table columns are mapped to subelements instead of attributes. By default, AUTO mode maps the table columns to XML attributes.
A table name (or the alias, if provided) maps to the XML element name. A column name (or the alias, if provided) maps to an attribute name or to a non-complex subelement name when the ELEMENTS option is specified in the query.
The hierarchy (nesting of the elements) in the result set is based on the order of tables identified by the columns that are specified in the SELECT clause; therefore, the order in which column names are specified in the SELECT clause is significant.
The tables are identified and nested in the order in which the column names are listed in the SELECT clause. The first, leftmost table identified forms the top element in the resulting XML document. The second leftmost table (identified by columns in the SELECT statement) forms a subelement within the top element (and so on).
If a column name listed in the SELECT clause is from a table that is already identified by a previously specified column in the SELECT clause, the column is added as an attribute (or as a subelement if the ELEMENTS option is specified) of the element already created, instead of opening a new hierarchy (adding a new subelement for that table).
In EXPLICIT mode, the query writer controls the shape of the XML document returned by the execution of the query. The query must be written in a specific way so that the additional information about the expected nesting is explicitly specified as part of the query. You can also specify additional configurations at the column level by using the directives. When you specify EXPLICIT mode, you must assume the responsibility for ensuring that the generated XML is well-formed and valid (in the case of an XML-DATA schema).
The XMLDATA keyword specifies that an XML-Data schema should be returned. The schema is added to the document as an inline schema. The primary purpose for specifying XMLDATA in a query is to receive XML data type information that can be used where data types are necessary (for example, when handling numeric expressions). Otherwise, everything in an XML document is a textual string. Generating an XML-Data schema is an overhead on the server, is likely to affect performance, and should be used only when data types are needed.
If the database column from which values are retrieved is of the type sql_variant, there is no data type information in the XML-Data schema. If a given query designates different XML elements with the same name, XMLDATA might produce an invalid XML-Data schema. This will happen if element-name collisions and data type names are not resolved (you might have two elements with the same name but different data types).
If the ELEMENTS option is specified, the columns are returned as subelements. Otherwise, they are mapped to XML attributes. This option is supported in AUTO mode only.
If the BINARY Base64 option is specified, any binary data returned by the query is represented in base64-encoded format. To retrieve binary data using RAW and EXPLICIT mode, you must specify this option. In AUTO mode, binary data is returned as a reference by default.
The OPENXML function is a Transact-SQL keyword that provides a rowset over in-memory XML documents. OPENXML is a rowset provider similar to a table or a view. OPENXML enables access to XML data as if it were a relational rowset by providing a rowset view of the internal representation of an XML document. The records in the rowset can be stored in database tables (similar to the rowsets provided by tables and views).
The OPENXML function can be used in SELECT and SELECT INTO statements wherever a rowset provider, such as a table, a view, or OPENROWSET, can appear as the source.
To write queries against an XML document by using OPENXML, you must first call the sp_xml_preparedocument system stored procedure, which parses the XML document and returns a handle to the parsed document that is ready for consumption. The parsed document is a tree representation of various nodes (elements, attributes, text, comment, and so on) in the XML document. The document handle is passed to OPENXML, which then provides a rowset view of the document based on the parameters passed to it.
The internal representation of an XML document must be removed from memory by calling the sp_xml_removedocument system stored procedure to free the memory.
In the following example, you are declaring a variable, parsing the XML data, retrieving the data in a SELECT statement, and removing the representation of the XML document:
-- Declaring a variable. DECLARE @hDoc INT -- Parsing the XML data. EXEC sp_xml_preparedocument @hDoc OUTPUT, N'<ROOT> <Customers Customer ContactName="Joe" CompanyName="Company1"> <Orders Customer OrderDate="2000-08-25T00:00:00"/> <Orders Customer OrderDate="2000-10-03T00:00:00"/> </Customers> <Customers Customer ContactName="Steve" CompanyName="Company2">No Orders yet! <Orders Customer OrderDate="2000-06-21T00:00:00"/> <Orders Customer OrderDate="2000-10-10T00:00:00"/> </Customers> </ROOT>' -- Using OPENXML in a SELECT statement. SELECT * FROM OPENXML(@hDoc, N'/ROOT/Customers/Orders') WITH (CustomerID NCHAR(5) '../@CustomerID', ContactName NVARCHAR(50) '../@ContactName', OrderDate DATETIME) -- Removing the internal representation of the XML document. EXEC sp_xml_removedocument @hDoc
The OPENXML function uses the following syntax:
OPENXML(idoc int [in],rowpattern nvarchar[in],[flags byte[in]]) [WITH (SchemaDeclaration | TableName)]
The XML document handle is the document handle of the internal representation of an XML document. The internal representation of an XML document is created by calling sp_xml_preparedocument.
The XPath expression specified as rowpattern identifies a set of nodes in the XML document. Each node identified by rowpattern corresponds to a single row in the rowset generated by OPENXML.
The nodes identified by the XPath expression can be any XML nodes (elements, attributes, processing instructions, and so on) in the XML document. If rowpattern identifies a set of elements in the XML document, there is one row in the rowset for each element node identified. For example, if rowpattern ends in an attribute, a row is created for each attribute node selected by rowpattern.
In the OPENXML statement, you can optionally specify the type of mapping (attribute-centric or element-centric) between the rowset columns and the XML nodes identified by the rowpattern. This information is used in transformation between the XML nodes and the rowset columns.
There are two ways to specify the mapping (you can specify both):
Both the flags and ColPattern parameters are optional. If no mapping is specified, attribute-centric mapping (the default value of the flags parameter) is assumed by default.
A rowset schema must be provided to OPENXML to generate the rowset. You can specify the rowset schema by using the optional WITH clause. The following options are available for specifying the rowset schema:
A table name can be provided instead of the schema declaration if a table with the desired schema already exists and no column patterns are required.
In this exercise, you will use the FOR XML clause to retrieve data in an XML format and use the OPENXML function to return XML data. To complete this exercise, you should be logged into your Windows 2000 Server computer as Administrator.
By selecting this option, you can better view the result set.
USE Northwind SELECT CustomerID, ContactName, CompanyName FROM Customers WHERE (CustomerID = N'ALFKI' OR CustomerID = N'XYZAA') ORDER BY CustomerID FOR XML RAW
In this statement, you are using the FOR XML clause to return data in an XML format. The clause specifies RAW mode.
The result set appears in the Results tab of the Results pane. Notice how the result set differs from the typical result set that you would see if you executed the SELECT statement without the FOR XML clause.
DECLARE @TestDoc INT EXEC sp_xml_preparedocument @TestDoc OUTPUT, N'<ROOT> <Employees Emp FirstName="Ethan"
Dept="Marketing"> </Employees> <Employees Emp FirstName="Linda"
Dept="Research"> </Employees> </ROOT>'
This statement declares a variable named @TestDoc and then uses the sp_xml_preparedocument system stored procedure to parse the XML data.
SELECT * FROM OPENXML(@TestDoc, N'/ROOT/Employees') WITH (EmpID NCHAR(5) './@EmpID', FirstName VARCHAR(50) './@FirstName', Dept VARCHAR(10) './@Dept')
This SELECT statement defines the result that should be returned. The OPENXML function is used to extract XML data. Notice that the @TestDoc variable is being used.
EXEC sp_xml_removedocument @TestDoc
This statement removes the internal representation of the XML document.
The result set appears in the Results tab of the Results pane.
XML is a hypertext programming language used to describe the contents of a set of data and how the data should be output to a device or displayed on a Web page. SQL Server 2000 includes many features that support XML functionality. You can execute SQL queries against existing relational databases to return a result as an XML document rather than as a standard rowset. To retrieve a result directly, use the FOR XML clause of the SELECT statement. The FOR XML clause must specify one of the following XML modes: RAW, AUTO, or EXPLICIT. The OPENXML function is a Transact-SQL keyword that provides a rowset over in-memory XML documents. OPENXML enables access to XML data as if it were a relational rowset by providing a rowset view of the internal representation of an XML document. OPENXML can be used in SELECT and SELECT INTO statements wherever a rowset provider, such as a table, a view, or OPENROWSET, can appear as the source.