Integrating XQuery and SQL: Querying XML Datatypes

Integrating XQuery and SQL: Querying XML Datatypes

Once the XML data is safely stored in a relational table, the easiest way to get at the data is to use SQL to query the table and retrieve the XML -typed column, as in the following example:

 SELECT CustomerName, OrderForm FROM Customers WHERE CustomerID = 1 

However, SQL per se cannot query information inside an XML datatype instance. That is where XQuery comes in. XQuery allows us to query and transform the XML data. What we need to understand now is how to integrate the two, so that we can invoke XQuery functionality from SQL, and also provide information from the relational environment to the XQuery context.

This section introduces three ways to query XML datatype instances in the context of the SQL query language. It explains how the XQuery expressions access both static and dynamic information provided by the SQL context, including data stored in other SQL columns .

To understand how we combine relational queries and XQuery, we must first have a conceptual understanding of how relational queries operate . First, all tables in the FROM clause are combined into a larger relational table using either a Cartesian product (if they are combined using a comma) or the given join operation (using the SQL-92 join syntax [SQL92]). For example, the query shown below produces a table consisting of the Cartesian product of Table A and the left outer join between B and C:

 SELECT A.*, function(B.b) as b FROM A, B LEFT OUTER JOIN C ON B.id = C.b WHERE C.id = 1 AND A.id = B.a 

Second, the predicate expression given in the WHERE clause is applied to each tuple in the table. Finally, the SELECT clause indicates the columns of the tuples satisfying the predicate filter that will be returned in the final resulting table ( potentially after applying a final transformation).

The relational processing model is very close to the conceptual processing model of XQuery. However, there are some fundamental differences such as the typing rules based on a different type system and the handling of order. Unless an explicit order is given in the SQL statement, the SQL query processor is free to neglect the order of the tuples, which enables the query-execution engine to apply algorithms that may reorder the tuples in order to achieve a more efficient execution.

SQL is also a strongly and statically typed language on the relational type system, which means that the inferred relational type of each expression must be known at query compilation (or static analysis) time.

XQuery Functionality in SQL

The SQL query-processing model together with static typing indicate that the following query functionality should be provided in order to support XQuery in the context of SQL:

Transforming XML Datatype Instances

We must be able to transform an XML datatype instance to another XML datatype. This functionality is usually used in the SQL SELECT clause; it makes use of the XQuery construction functionality.

Extracting a Value into the SQL Value Space

We must have a way to extract information from an XML instance that fits into the SQL type system:

  • Values that fit into the scalar SQL types, such as character types and numeric types (e.g., integer, decimal, float, etc.)

  • XML nodes and subtrees that will be represented with the XML datatype

Testing the XML Structure for Existence

We need a way to test for the presence or absence of nodes and values based on a query expression. While this functionality could be provided using the XQuery function empty() and the value-extraction function, having such a function available at the SQL level seems more user -friendly. The following functions provide the functionality specified in the above sections:

  • query(XML, XQueryString) XML

    This function applies the XQuery expression instance and returns an XML type instance. While the input instance can be statically constrained, the result is always untyped (an additional parameter could provide the expected schema type). Since the result type has to be a valid instance of the XML datatype, XQuery expressions that return attribute nodes at the top level must raise an error, while queries that return values must convert them into top-level text nodes. This function provides the mechanism to return subtrees and to transform the XML instance into a different shape (see Listing 7.5 for an example).

    Listing 7.5 Transforming the XML Column
     SELECT CustomerName, query(PurchaseOrders,        'declare namespace po = "http://www.example.com/po-schema"         for $p in /po:purchase-order         where $p/@orderdate < xs:date("2002-10-31")         return           <purchaseorder date="{$p/@orderdate}">{            $p/*           }</purchaseorder>') FROM Customers WHERE CustomerID = 1 
  • value(XML, XQueryString, SQLType) SQLType

    This function applies the XQuery expression to the XML type instance and returns a relational value typed according to the SQL type indicated with the third argument SQLType. Since the value's SQL type must be statically known to the SQL compiler, the type must be provided as either an SQL type name or a constant string.

    The XML data model value calculated by the XQuery expression is converted to the SQL value by converting the lexical representation of the XML value to the required SQL type. If the XQuery expression results in a node, the data() function is explicitly applied. If the XQuery result is an empty sequence, the result is the relational NULL value. If the result is a sequence of values and the SQL type is a single-column table type, the result is a table.

    This function provides the mechanism to return values that can then be used in SQL expressions or select clauses. The following example shows how two XQuery expressions are used to extract a calculated integer value and extract the value of an attribute as an SQL date value, which are then used in an SQL query.

     SELECT CustomerName, value(PurchaseOrders,        'declare namespace po = "http://www.example.com/po-schema"         count(/po:purchase-order)', integer) as ordercount FROM Customers WHERE value(PurchaseOrders,       'declare namespace po = "http://www.example.com/po-schema"       /po:purchase-order[1]/@orderdate', date) < date(2002-10-31) 
  • exists(XML, XQueryString) boolean

    This function applies the XQuery expression to the XML type instance and returns true if the result of the expression is not the empty sequence and false otherwise , as shown in the following example:

     SELECT CustomerName,        CAST(OrderForm AS             XML TYPED AS 'http://www.example.com/orderform-schema1')        as OrderForm1 FROM Customers WHERE exists(OrderForm,       'declare namespace o ="http://www.example.com/orderform-schema1"       /o:order') 
Setting the Dynamic Context Item in the XQuery Functions

Since the XML datatype always provides the extended document information item as the XQuery's context item, there is no need to use the doc() or collection() functions to refer to data. On the contrary, besides being problematic in the context of the SQL processing model, these two functions would need a URI-based naming scheme for documents and collections that again would need to be integrated with the relational naming framework. Thus, all XPath expressions in these embedded XQuery expressions can start directly with the starting slash.

Compiling the XQuery Expressions

The XQuery expressions could be given dynamically or as constant strings. If the XQuery expression is given as a constant string, then relational systems can compile the XQuery at the same time as the SQL statement.

Augmenting the XQuery Static Context

XQuery has a static context that provides several predefined bindings for such things as mapping namespace prefixes to namespace URIs, built-in functions and variables (such as the XQuery functions), built-in schema types (such as the XML Schema and XQuery built-in types), and the default collation. When integrating XQuery into the relational framework of an XML datatype and SQL, the relational system can add some additional static bindings to simplify the query writer's task.

First, the system can predefine some namespace bindings for SQL- related functions and types. For the purpose of this chapter, we assume that the following prefixes are added to the predefined static namespace context:

  • The prefix sql is bound to the namespace URI representing the relational database system's built-in functions.

  • The prefix sqltypes is bound to the namespace URI representing the mapping of the relational types into the XQuery's type model.

In addition, the query writer can provide predefined namespace bindings using the SQL/XML extension to the SQL WITH clause as described in the SQL-2003 standard.

Next, the XQuery's default collation is implicitly set to the relational collation in effect for the XML datatype (either from the database system's default or the collation associated with the XML datatype). The functions belonging to the namespace referenced with the prefix sql are added to the static function context, as are constructor functions for the SQL built-in types. The SQL built-in types are added to the static type context, as are the types of the schema components in case of a statically constrained XML datatype.

Note that for the schema types that are used to constrain the XML datatype, the namespace prefix also needs to be provided. If only one schema is being associated with an XML datatype instance, then the system could associate a predefined prefix such as data or the default namespace with the schema's target namespace URI. If multiple schemata can be used to constrain an XML datatype, the query writers will still need to provide the explicit namespace declarations in the XQuery prolog or use the SQL/XML extension to the SQL WITH clause.

Having the XQuery static context extended implicitly in the way described above removes a large amount of syntactic repetition from the embedded XQuery prologs.

Providing Access to SQL Data inside XQuery

The functions above are capable of extracting XML values into the relational processing context. Sometimes, however, an XQuery expression needs to access data from the relational realm. If the relational database system provides variables, then the XQuery system probably wants to provide access to the variables. In any case, we also need access to the columns of the SELECT statement tuple set. This section presents the mechanisms to access relational data in the context of XQuery and uses the same mechanisms to provide access to other XML datatype instances to allow joining two XML datatype instances.

Access to variables can be provided in XQuery in two ways: either by mapping the relational variables directly into the static variable context (and mapping the relational types into their equivalent XML Schema/XQuery types) or by using a functional approach. In the first case, the variables should belong to the sql namespace. However, since SQL names allow more of the Unicode code points as name characters than XML names , such characters need to be encoded. For example, an XML name cannot contain a space (code point 0x20) but an SQL name can. The SQL/XML part of the SQL-2003 standard provides such a mapping by encoding invalid XML name characters into _xHHHH_ , where HHHH denotes the Unicode code point of the invalid XML character (e.g., _x0020_ for a space). Since a user would have to remember to encode such names, a more convenient alternative would be to provide a built-in pseudo-function sql:variable() that takes the name of the SQL variable as a string constant argument and returns the variable's value with its type mapped to the XML Schema/XQuery type. This approach requires more typing by the user, but it avoids the name-encoding issues.

Access to a column of an SQL statement during the query execution is very useful for integrating relational values into an XQuery. The access can be provided by a built-in pseudo-function called sql:column() that takes an SQL column identifier as a constant string argument and refers to the corresponding cell of the tuple that the SQL statement's iterator is operating on. Listing 7.6 gives an example of a simple XQuery expression and how the column-reference function accesses the relational values. Both sql:column() and sql:variable() can refer to an instance of the XML datatype. This allows joins across different XML datatype instances in any of the query functions above. The two functions operate equivalent to the built-in XQuery function fn:doc() in that they return the document node of the other XML datatype instance.

Listing 7.6 Example of sql:column() Usage
 Table A:              Table B: id:int     v:int      id:int     v:int _____________________________________________________________________      1        42           1         1      2        44           2         3      3        46           4         7 Query: SELECT A.id, query(CAST('' AS XML),              '<A><v>{sql:column("A.v")}</v></A>,               if (not(empty(sql:column("B.v"))               then                 <B><v>{sql:column("B.v")}</v></B>               else ()')        as values FROM A LEFT OUTER JOIN B ON A.id=B.id Result: id:int     values:XML _____________________________________________________________________      1     <A><v>42</v></A><B><v>1</v></B>      2     <A><v>44</v></A><B><v>3</v></B>      3     <A><v>46</v></A> 
Mapping SQL Types to XML Schema Types

In order to be able to import the relational values into the XQuery context, the SQL types must be mapped to XML Schema types that describe the SQL types for the XQuery type system. Since XML Schema does not allow the addition of new primitive simple types as direct derivations from xs:anySimpleType , these types must be added as derivations from the built-in XML Schema simple types. The SQL/XML part of the ISO SQL-2003 standard [SQL2003] provides such a mapping that maps to the most appropriate derivation of the XML Schema simple types by using the type facets to model the range and conditions of the SQL types. Additionally, it allows relational systems to provide additional information about the original SQL type using the XML Schema annotation mechanism. An example of such a mapping follows .

 Relational Type: INTEGER                  (implementation range is [-2147483648,2147483647]) maps to: <xs:simpletype name="INTEGER">   <xs:restriction base="xsd:int"/> </xs:simpletype> 

However, given that many relational systems provide additional built-in SQL types, the user must expect relational systems to provide mappings that describe the implementation types; thus, the user should probably provide the types in implementation-specific namespaces. Also, the current ISO SQL/XML mapping has a severe shortcoming in that it maps the character string types into individual XML Schema types, one for each length that is directly derived from xs:string . This not only means that there are way too many individual string types, but also that they are not type-compatible with each other since they each belong to a separate subtype tree of xs:string . Thus, we should expect built-in functions that return mapped SQL string types to return the main relational character types as simple derivations, such as sqltypes:nvarchar being derived from xs:string , disregarding the length component.

Once the SQL types are mapped into XML Schema simple types, mapping values in general becomes simple. The SQL/XML part of the ISO SQL-2003 standard provides detailed rules for the value mapping. However, bear in mind that there are still certain SQL character type values that are allowed in SQL but are considered invalid XML characters. For example, most of the low-range ASCII control characters such as ETX (code point 0x3) or BELL (code point 0x7) are not allowed in XML documents. Thus the integration must take them into account and provide correct error handling.

Adding XQuery Function Libraries

XQuery provides a way of importing externally defined XQuery function libraries. The import mechanism follows a model based on namespace URIs and location hints similar to the XML Schema import model. Given this analogy, a relational system can therefore use a similar mechanism to provide storage of XQuery function libraries by extending the metadata to allow them to be stored according to their namespace URI or an SQL identifier. Such libraries would then be loaded and stored in a precompiled format and imported into the XQuery static context when referred to by the user. Listing 7.7 gives an example in which the function library is defined using a target namespace and then used inside a query.

Listing 7.7 Example of an XQuery Function Library
 CREATE XML FUNCTION NAMESPACE    N'module "http://www.example.com/myfns"      declare namespace myf = "http://www.example.com/myfns"      define function myf:in-King-County-WA             ($zip as xs:integer)             as xs:boolean      {$zip < 98100 and $zip>=98000 }      define function myf:King-County-WA-salestax($x as xs:decimal)             as xs:decimal      {$x * 0.088}     ' SELECT CustomerName, query(PurchaseOrder,        'import module namespace myf="http://www.example.com/myfns"         declare namespace po = "http://www.example.com/po-schema"         for $p in /po:purchase-order, $d in $p/po:details         let $net as xs:decimal := $d/@qty * $d/@price         where myf:in-King-County-WA($p/po:shipTo/po:zip)         return           <po-detail id="{$p/@id}"                      total="{$net +                              myf:King-County-WA-salestax($net)}"/>')        as po-detail-price FROM Customers 

A Note on the XQuery Data Modification Language

XQuery has not yet defined an update language, but it will undoubtedly do so. When that occurs, we will require a way to integrate that update capability with the SQL data modification language. Simply populating a column cell with an XML datatype instance can be performed using the relational insertion (see Listing 7.8, which assumes an implicit cast from string to XML that parses the data). However, updating XML also requires the ability to make finer-grained, node-level changes to an XML column. We assume that this is done via a relational update to the XML datatype instance combined with a new mutator : modify(XML, XQueryDMLString) , where XQueryDMLString is expressed in the as-yet-undefined update language for XQuery. In the following examples, we use an update language extension to XQuery. Other update languages, such as so-called updategrams [RYS2001], will probably be provided as well in similar fashion.

Listing 7.8 Inserting XML Data
 INSERT INTO Customers VALUES (42, N'Jane Doe',      N'<po:purchase-order           xmlns:po="http://www.example.com/po-schema" id="2000">          <po:shipTo>            <po:address>42 Main Street</po:address>            <po:city>Redmond</po:city>            <po:state>WA</po:state>            <po:zip>98052</po:zip>          </po:shipTo>          <po:details qty="1" price="23.44"/>        </po:purchase-order>', NULL, NULL) 

Since an XML datatype is a relational column type, even a node-level change will have to lock the whole XML datatype either directly or indirectly via the row locks at the isolation level of the relational system's context. Making the concurrency control aware of the tree structure of the XML datatype instances will be certainly an important research area to improve the performance. The following is an example of how po:detail elements may be deleted from the PurchaseOrders XML instances. The concurrency control will not only lock the parent of the deleted subtrees but the whole XML instance of all the rows that satisfy the update statements where clause.

 UPDATE Customers SET modify(PurchaseOrders,     'declare namespace po = "http://www.example.com/po-schema"      delete /po:purchase-order/po:details' WHERE exists(PurchaseOrders,     'declare namespace po = "http://www.example.com/po-schema"      /po:purchase-order[po:details and po:shipTo/po:zip=98052]' 

If the XML datatype is constrained by an XML Schema, then updates may fail if they would result in an instance that violates the XML Schema constraints.



XQuery from the Experts(c) A Guide to the W3C XML Query Language
Beginning ASP.NET Databases Using VB.NET
ISBN: N/A
EAN: 2147483647
Year: 2005
Pages: 102

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net