Recipe 8.5. Merging Documents with Unlike Schema

Problem

You have two or more dissimilar documents, and you would like to merge them into a single document.

Solution

The process of merging dissimilar data varies from application to application, so this recipe cannot present a single generic solution. Instead, it anticipates some common ways two dissimilar documents can be brought together and provides a solution for each case.

Incorporate one document as a subpart of a parent document

Incorporating a document as a subpart is the simplest interpretation of this type of merge. The basic idea is to use xsl:copy-of to copy one document, or part of one, into the appropriate place in a second document. The following example merges two documents into a container document. The names of the container's empty elements determine which files are merged: the friends element pulls in friends.xml, and the coworkers element pulls in coworkers.xml.

The container document:

```xml
<MyNoteBook>
  <friends>
  </friends>
  <coworkers>
  </coworkers>
  <projects>
    <project>Replace mapML with XSLT engine using Xalan C++</project>
    <project>Figure out the meaning of life.</project>
    <project>Figure out where the dryer is hiding all those missing socks</project>
  </projects>
</MyNoteBook>
```

The stylesheet, which imports copy.xslt so that everything other than friends and coworkers is copied unchanged:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:import href="copy.xslt"/>

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="friends | coworkers">
    <xsl:copy>
      <xsl:variable name="file" select="concat(local-name( ),'.xml')"/>
      <xsl:copy-of select="document($file)/*/*"/>
    </xsl:copy>
  </xsl:template>
...
</xsl:stylesheet>
```

The result:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<MyNoteBook>
   <friends>
      <person firstname="Sal" lastname="Mangano" age="38" height="5.75"/>
      <person firstname="Mike" lastname="Palmieri" age="28" height="5.10"/>
      <person firstname="Vito" lastname="Palmieri" age="38" height="6.0"/>
      <person firstname="Vinny" lastname="Mari" age="37" height="5.8"/>
   </friends>
   <coworkers>
      <person firstname="Sal" lastname="Mangano" age="38" height="5.75"/>
      <person firstname="Al" lastname="Zehtooney" age="33" height="5.3"/>
      <person firstname="Brad" lastname="York" age="38" height="6.0"/>
      <person firstname="Charles" lastname="Xavier" age="32" height="5.8"/>
   </coworkers>
   <projects>
      <project>Replace mapML with XSLT engine using Xalan C++</project>
      <project>Figure out the meaning of life.</project>
      <project>Figure out where the dryer is hiding all those missing socks</project>
   </projects>
</MyNoteBook>
```
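Several stylesheets in this recipe import copy.xslt, which is not reproduced here. The results show that it copies every node the importing stylesheet does not handle itself. The following sketch assumes it has the conventional identity-template form. The actual file may differ in detail.

```xml
<!-- copy.xslt: a minimal identity transform (assumed form; the original file is not shown) -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Copy each element, text node, comment, PI, and attribute, then recurse
       into its attributes and children so the whole tree is reproduced. -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Templates in an importing stylesheet have higher import precedence than imported ones. That is why the friends | coworkers template wins for those two elements even though this identity template also matches them.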

An interesting variation of this case is a document that signals, inline, that another document should be included at that point. The W3C defines a standard way of doing this, called XInclude (http://www.w3.org/TR/xinclude/). You can implement a basic XInclude processor in XSLT by extending copy.xslt. This version honors only the href attribute of each xi:include element:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:import href="copy.xslt"/>

  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="xi:include" xmlns:xi="http://www.w3.org/2001/XInclude">
    <xsl:for-each select="document(@href)">
      <xsl:apply-templates/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

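The xsl:for-each serves only to change the context node to the root of the included document. The xsl:apply-templates inside it then continues the copy over the included document's content, so any xi:include elements nested in that content are expanded as well.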
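The stylesheet above assumes every href resolves. XInclude also lets an xi:include element carry an xi:fallback child to be used when the resource cannot be loaded. The following sketch honors that child, with one caveat. For a resource it cannot retrieve, XSLT 1.0 lets a processor either signal an error from document() or recover by returning an empty node-set, so the fallback branch is reached only on processors that recover. The parse and xpointer attributes are still ignored. The identity template is inlined here so the stylesheet runs without copy.xslt.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xi="http://www.w3.org/2001/XInclude">

  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity copy for everything not handled below (stands in for copy.xslt) -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="xi:include">
    <!-- Root node of the included document, or an empty node-set if the
         processor recovers from a failed load instead of raising an error -->
    <xsl:variable name="included" select="document(@href)"/>
    <xsl:choose>
      <xsl:when test="$included">
        <!-- Switch context to the included document and keep copying -->
        <xsl:for-each select="$included">
          <xsl:apply-templates/>
        </xsl:for-each>
      </xsl:when>
      <xsl:otherwise>
        <!-- Expand the fallback content in place of the missing resource -->
        <xsl:apply-templates select="xi:fallback/node()"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

The xi:include pattern has a default priority of 0, higher than the -0.5 of node() | @*, so include elements never fall through to the identity template.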

Weave two documents together

A variation of simple inclusion combines the children of elements that have the same name in both documents. Consider two biologists who have each collected information about animals independently. As a first step toward building a unified animal database, they may decide to weave their data together at a point of structural commonality.

Biologist1 has this file:

```xml
<animals>
  <mammals>
    <animal common="chimpanzee" species="Pan troglodytes" order="Primates"/>
    <animal common="human" species="Homo Sapien" order="Primates"/>
  </mammals>
  <reptiles>
    <animal common="boa constrictor" species="Boa constrictor" order="Squamata"/>
    <animal common="gecko" species="Gekko gecko" order="Squamata"/>
  </reptiles>
  <birds>
    <animal common="sea gull" species="Larus occidentalis" order="Charadriiformes"/>
    <animal common="Black-Backed Woodpecker" species="Picoides arcticus"
    order="Piciformes"/>
  </birds>
</animals>
```

Biologist2 has this file:

```xml
<animals>
  <mammals>
    <animal common="hippo" species="Hippopotamus amphibius"
     family=" Hippopotamidae"/>
    <animal common="arabian camel" species="Camelus dromedarius" family="Camelidae"/>
  </mammals>
  <insects>
    <animal common="Lady Bug" species="Adalia bipunctata" family="Coccinellidae"/>
    <animal common="Dung Bettle" species=" Onthophagus australis"
    family="Scarabaeidae"/>
  </insects>
  <amphibians>
    <animal common="Green Sea Turtle" species="Chelonia mydas" family="Cheloniidae"/>
    <animal common="Green Tree Frog" species=" Hyla cinerea" family="Hylidae "/>
  </amphibians>
</animals>
```

The files have similar but not identical schemas. Both contain a mammals element (the class Mammalia), but each also records classes the other lacks. At the animal level, one biologist recorded each animal's order, while the other recorded its family. The following stylesheet weaves the documents together at the class level, which is the second level of the document structure. The location of the second file is passed in the doc2file parameter:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="doc2file"/>

  <xsl:variable name="doc2" select="document($doc2file)"/>
  <xsl:variable name="thisDocsClasses" select="/*/*"/>

  <xsl:template match="/*">
    <xsl:copy>
      <!-- Merge common sections between source doc and doc2. Also includes
           sections unique to source doc. -->
      <xsl:for-each select="*">
        <xsl:copy>
          <xsl:copy-of select="*"/>
          <xsl:copy-of select="$doc2/*/*[name( ) = name(current( ))]/*"/>
        </xsl:copy>
      </xsl:for-each>

      <!-- Merge sections unique to doc2 -->
      <xsl:for-each select="$doc2/*/*">
        <xsl:if test="not($thisDocsClasses[name( ) = name(current( ))])">
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```
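As written, the stylesheet copies an animal twice if both biologists recorded the same species in the same class. The following variant skips such duplicates. Matching on the species attribute is an assumption: any attribute that identifies an animal would work the same way. The two sample files share no species, so for them this variant produces exactly the same result as the original.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="doc2file"/>

  <xsl:variable name="doc2" select="document($doc2file)"/>
  <xsl:variable name="thisDocsClasses" select="/*/*"/>

  <xsl:template match="/*">
    <xsl:copy>
      <xsl:for-each select="*">
        <xsl:copy>
          <xsl:copy-of select="*"/>
          <!-- Copy doc2's animals for this class, skipping any whose species
               already appears here. Inside the predicate, current() is still
               the source document's class element, not the doc2 animal. -->
          <xsl:copy-of select="$doc2/*/*[name() = name(current())]/*[not(@species = current()/*/@species)]"/>
        </xsl:copy>
      </xsl:for-each>

      <!-- Classes that exist only in doc2 are copied whole, as before -->
      <xsl:for-each select="$doc2/*/*">
        <xsl:if test="not($thisDocsClasses[name() = name(current())])">
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The comparison is an exact string match, so a value with a stray leading space such as " Hyla cinerea" would not match "Hyla cinerea". This variant and the original share one pitfall. If doc2file is not supplied, it defaults to the empty string, and XSLT 1.0 defines document("") as the stylesheet itself. The stylesheet's own top-level elements would then be copied into the output.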

Applying the stylesheet to Biologist1's file, with doc2file pointing to Biologist2's file, produces a document that can be further normalized by hand or through another automated step:

```xml
<animals>
   <mammals>
      <animal common="chimpanzee" species="Pan troglodytes" order="Primates"/>
      <animal common="human" species="Homo Sapien" order="Primates"/>
      <animal common="hippo" species="Hippopotamus amphibius"
       family=" Hippopotamidae"/>
      <animal common="arabian camel" species="Camelus dromedarius"
       family="Camelidae"/>
   </mammals>
   <reptiles>
      <animal common="boa constrictor" species="Boa constrictor" order="Squamata"/>
      <animal common="gecko" species="Gekko gecko" order="Squamata"/>
   </reptiles>
   <birds>
      <animal common="sea gull" species="Larus occidentalis"
       order="Charadriiformes"/>
      <animal common="Black-Backed Woodpecker" species="Picoides arcticus"
       order="Piciformes"/>
   </birds>
   <insects>
      <animal common="Lady Bug" species="Adalia bipunctata" family="Coccinellidae"/>
      <animal common="Dung Bettle" species=" Onthophagus australis"
       family="Scarabaeidae"/>
   </insects>
   <amphibians>
      <animal common="Green Sea Turtle" species="Chelonia mydas"
      family="Cheloniidae"/>
      <animal common="Green Tree Frog" species=" Hyla cinerea" family="Hylidae "/>
   </amphibians>
</animals>
```
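One example of such an automated follow-up step is a second pass that trims the stray spaces visible in some attribute values above, such as family=" Hippopotamidae" and family="Hylidae ". The following sketch does only that. Reconciling the order versus family difference needs domain knowledge and is left alone.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity copy of everything by default -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rebuild each attribute of an animal element with its whitespace normalized.
       This pattern's default priority (0.5) beats the identity template's (-0.5). -->
  <xsl:template match="animal/@*">
    <xsl:attribute name="{name()}">
      <xsl:value-of select="normalize-space(.)"/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
```

normalize-space also collapses internal runs of whitespace to a single space. For these values that is harmless, but it is worth knowing before reusing the pass on free-text attributes.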

Join elements from two documents to make new elements

A less trivial merge combines elements from two documents into new elements, pairing them up by some matching characteristic such as a shared name. For example, the following stylesheet merges two documents that contain different information about the same people:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:import href="copy.xslt"/>

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:param name="doc2file"/>

  <xsl:variable name="doc2" select="document($doc2file)"/>

  <xsl:template match="person">
    <xsl:copy>
      <xsl:for-each select="@*">
        <xsl:element name="{local-name( )}">
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:for-each>
      <xsl:variable name="matching-person"
           select="$doc2/*/person[@name=concat(current( )/@firstname,' ',
                                              current( )/@lastname)]"/>
      <xsl:element name="smoker">
        <xsl:value-of select="$matching-person/@smoker"/>
      </xsl:element>
      <xsl:element name="sex">
        <xsl:value-of select="$matching-person/@sex"/>
      </xsl:element>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

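This stylesheet performs two tasks. It converts the attributes of each person element in the source document into child elements. It also merges in the smoker and sex attributes of the matching person in $doc2. A person in $doc2 matches when its name attribute equals the source person's firstname and lastname joined by a single space.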
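The stylesheet above evaluates a path expression over $doc2 for every person in the source. For large inputs, an xsl:key index on the name attribute may be faster. The following is a sketch of that alternative. The key name person-by-name is illustrative. In XSLT 1.0, key() searches only the document that contains the context node, so the lookup is wrapped in an xsl:for-each over $doc2 to switch documents. The identity template is inlined in place of copy.xslt.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:param name="doc2file"/>

  <xsl:variable name="doc2" select="document($doc2file)"/>

  <!-- Index person elements by their name attribute (only $doc2's people have one) -->
  <xsl:key name="person-by-name" match="person" use="@name"/>

  <!-- Identity copy for everything else (stands in for copy.xslt) -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="person">
    <!-- Build the lookup value while the source person is still the context node -->
    <xsl:variable name="fullname" select="concat(@firstname, ' ', @lastname)"/>
    <xsl:copy>
      <!-- Turn each attribute into a child element -->
      <xsl:for-each select="@*">
        <xsl:element name="{local-name()}">
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:for-each>
      <!-- Make $doc2's root the context so key() searches that document -->
      <xsl:for-each select="$doc2">
        <xsl:variable name="matching-person" select="key('person-by-name', $fullname)"/>
        <smoker><xsl:value-of select="$matching-person/@smoker"/></smoker>
        <sex><xsl:value-of select="$matching-person/@sex"/></sex>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

There is one behavioral difference from the original. If $doc2 is empty, the inner xsl:for-each never runs, so the smoker and sex elements are omitted rather than emitted empty.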

Discussion

Merging XML documents with disparate schemas is less well defined than merging documents with identical schemas. This recipe discusses three interpretations of merging, but other, more complicated types could exist. For example, a single merge might combine inclusion, weaving, and joining in its final result. It would therefore be difficult to create a single, generic, XSLT-based merge utility that solves everyone's particular merge problems. However, the examples in this recipe provide a useful head start in crafting more ambitious merges.

See Also

The examples in this recipe focus on merging elements that stand in a one-to-one relationship. Recipe 9.5 shows how to join information from disparate XML documents from the perspective of database queries. Those techniques also apply to merging in a one-to-many relationship.




XSLT Cookbook: Solutions and Examples for XML and XSLT Developers, 2nd Edition
ISBN: 0596009747
Year: 2003
Pages: 208
Authors: Sal Mangano
