Every composed representation requires some ability to express how an XML document is composed from the rows and columns of an SQL database. Over the last several years , many different approaches have been tried, some as research projects, some commercially. These techniques may look very different, but they have a number of conceptual similarities, which we will examine. In the next section we look at some specific techniques that use these concepts. As usual, we do not attempt to list the different composition techniques currently in use by vendors , concentrating instead on the underlying principles and variations. Certain concepts are common to most composition techniques. First of all, a composition technique is either implicit or explicit. With an implicit technique, the user or application has no control over how the relational data is translated into XML. An explicit technique allows control (either on the part of the application accessing the XML or, in the case of shredding , during the design of the database representation). If explicit control is allowed, it may include the following capabilities:
Here's a scorecard of how these different capabilities match up with different application types: As a general rule, all of them support the application types of selecting from, querying, and transforming XML documents. Update applications are compatible with items 15, so long as the values for the XML come from a single SQL table; as soon as joins or many other SQL capabilities are involved, full update capability becomes impossible (though limited update capability may still exist). Items 6 and 7 are also not compatible with full update capability. These are specific examples of a general problem known in the relational database world as "updating through views." The problem stems from the fact that the operations of a powerful query language are not, in general, uniquely reversible. Therefore, if one attempts to update the result of a (nonreversible) query, there is usually no unique way to determine how to update the original data in a corresponding way. A simple example is attempting to update a generated field that has been created by concatenating two source fields: If you change the value of the concatenation, how should you "break" that value across the sources? To preserve update capability, therefore, we must restrict the complexity of the query or transformation operations to those that are reversible. Generation of XML Structure through Hierarchical JoinsTo illustrate generation of XML with a join schematically, suppose that we have two tables, Table A and Table B. If we generate XML structure through a join from A to B, the XML is going to look something like that shown in Listing 6.4. The general idea is that within an element corresponding to a row of Table A, there are subelements that correspond to "matching" rows in Table Bthat is, the rows of Table B that join with a particular row of Table A. The relationship is actually most similar to a relational left outer join, because the mapping usually contains all rows of Table A, even those which have no corresponding rows in Table B. Listing 6.4 Generation of XML Structure from a Join between Two Tables<ElementforRowofA> <SubElementFromColumnofA> ... <SubElementforMatchingRowofB> <SubElementFromColumnofB> ... </SubElementforMatchingRowofB> <SubElementforMatchingRowofB> <SubElementFromColumnofB> ... </SubElementforMatchingRowofB> ... </ElementforRowofA> The structure shown in Listing 6.4 may vary, of course. In general, the elements corresponding to data from Table A may be intermixed with those corresponding to data from Table B. Some element may contain all the data from Table B collectively, or the elements shown that contain all the data from a given row of B may be missing. Most important, there is no limitation on the number of tables that may be joined in this fashion to produce a single XML document. Listing 6.5 shows an example that uses four tables. Note in particular that Table A joins to B, which in turn joins to C, while separately Table A joins to D. We call this arrangement a nonlinear hierarchy , because A has more than a single child that has been added through different hierarchical joins. Note that this relationship could not be modeled with standard relational joins (though it could be modeled in a nested-relational representation). Creating XML structure through hierarchical joins is the most widely supported and most commonly used idiom in composition. It is particularly useful with normalized relational data or star schemas. Listing 6.5 Complex Hierarchy Generated through Multiple Joins<ElementforA> <AContents>... <ElementsforB> <BContents>... <ElementsforC> <CContents>... </ElementsforC> </ElementsforB> <ElementsforD> <DContents>... </ElementsforD> </ElementforA> Generation of XML Structure through Hierarchical GroupingTo illustrate generation of XML with hierarchical grouping, suppose that we generate XML structure through grouping on a single Table A. The resulting XML will look similar to that in Listing 6.6. In this scenario, we create an XML element for every unique value of a set of grouping columns of A, and then within that element occur subelements corresponding to data for nongrouped columns of the individual rows within the group. Generating structure through hierarchical grouping is particularly useful if we want to recreate a "normalized" XML structure from denormalized relational data. Listing 6.6 Generation of XML Structure from Grouping on a Single Table<ElementforGroupKeyofA> <SubElementFromGroupingColumnofA> ... <SubElementforSingleRowofA> <SubElementFromNonGroupedColumnofA> ... </SubElementforSingleRowofA> <SubElementforSingleRowofA> <SubElementFromNonGroupedColumnofA> ... </SubElementforSingleRowofA> ... </ElementforGroupKeyofA> Most composition techniques do not provide direct support for hierarchical grouping. When such support is lacking, the same effect can be generated by joining Table A to itself, where the "outer table" is a query that generates the unique group keys from A, and the inner table retrieves the matching rows. But the self-join technique can be much more expensive than a direct implementation of grouping. |