Recipe13.2.Working with Excel XML Spreadsheets


Recipe 13.2. Working with Excel XML Spreadsheets

Problem

You want to export data from Excel to XML, but not in the native format supported by Microsoft.

Solution

XSLT 1.0

If you have an Excel spreadsheet that looks like this:

Date

Price

Volume

20010817

61.88

260163

20010820

62.7

241859

20010821

60.78

233989

20010822

60.66

387444


Then the Excel (XP or 2003) XML format looks like this:

<?xml version="1.0"?> <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"  xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas- microsoft-com:office:excel"  xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html= "http://www.w3.org/TR/REC-html40">   <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">     <Author>Salvatore R. Mangano</Author>     <LastAuthor>Salvatore R. Mangano</LastAuthor>     <Created>2002-08-18T00:43:49Z</Created>     <LastSaved>2002-08-18T02:19:21Z</LastSaved>     <Company>Descriptix</Company>     <Version>10.3501</Version>   </DocumentProperties>   <OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">     <DownloadComponents/>     <LocationOfComponents HRef="/"/>   </OfficeDocumentSettings>   <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">     <WindowHeight>9915</WindowHeight>     <WindowWidth>10140</WindowWidth>     <WindowTopX>240</WindowTopX>     <WindowTopY>255</WindowTopY>     <ProtectStructure>False</ProtectStructure>     <ProtectWindows>False</ProtectWindows>   </ExcelWorkbook>   <Styles>     <Style ss: ss:Name="Normal">       <Alignment ss:Vertical="Bottom"/>       <Borders/>       <Font/>       <Interior/>       <NumberFormat/>       <Protection/>     </Style>   </Styles>   <Worksheet ss:Name="msft">     <Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="5" x:FullColumns="1"      x:FullRows="1">       <Row>         <Cell>           <Data ss:Type="String">Date</Data>         </Cell>         <Cell>           <Data ss:Type="String">Price</Data>         </Cell>         <Cell>           <Data ss:Type="String">Volume</Data>         </Cell>       </Row>       <Row>         <Cell>           <Data ss:Type="Number">20010817</Data>         </Cell>         <Cell>           <Data ss:Type="Number">61.88</Data>         </Cell>         <Cell>           <Data ss:Type="Number">260163</Data>         </Cell>       </Row>       <Row>         <Cell>           <Data ss:Type="Number">20010820</Data>         </Cell>         <Cell>           <Data ss:Type="Number">62.7</Data>         </Cell>         <Cell>           <Data ss:Type="Number">241859</Data>         </Cell>       </Row>       <Row>         <Cell>           <Data ss:Type="Number">20010821</Data>         </Cell>         <Cell>           <Data ss:Type="Number">60.78</Data>         </Cell>         <Cell>           <Data ss:Type="Number">233989</Data>         </Cell>       </Row>       <Row>         <Cell>           <Data ss:Type="Number">20010822</Data>         </Cell>         <Cell>           <Data ss:Type="Number">60.66</Data>         </Cell>         <Cell>           <Data ss:Type="Number">387444</Data>         </Cell>       </Row>     </Table>     <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">       <Selected/>       <Panes>         <Pane>           <Number>3</Number>           <ActiveRow>11</ActiveRow>           <ActiveCol>5</ActiveCol>         </Pane>       </Panes>       <ProtectObjects>False</ProtectObjects>       <ProtectScenarios>False</ProtectScenarios>     </WorksheetOptions>   </Worksheet> </Workbook>

which is probably not what you had in mind!

This example conveniently maps an Excel XML file to a simpler XML file. Many spreadsheets created in Excel have a structure in which the first row contains column names and subsequent rows contain data for those columns.

One obvious mapping would convert the column names into element names and the remaining cells into element content. The only missing pieces of information are the names of the top-level element and the element containing each row. This stylesheet takes these names as parameters with some obvious defaults. It converts some of the useful metadata into comments and throws away the Excel-specific stuff. This section provides several other parameters that increase the generality of the conversion, such as which row contains the column names, where the data starts, and what to do about empty cells:

<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet version="1.0"                  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"                  xmlns:o="urn:schemas-microsoft-com:office:office"                  xmlns:x="urn:schemas-microsoft-com:office:excel"                  xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">         <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>      <!-- The name of the top-level element -->   <xsl:param name="topLevelName" select=" 'Table' "/>   <!-- The name of each row -->   <xsl:param name="rowName" select=" 'Row' "/>   <!-- The namespace to use -->   <xsl:param name="namespace"/>   <!-- The namespace prefix to use -->   <xsl:param name="namespacePrefix"/>   <!-- The character to use if column names contain whitespace -->   <xsl:param name="wsSub" select="'_'"/>   <!--Determines which row contains the col names-->   <xsl:param name="colNamesRow" select="1"/>   <!--Determines which row the data begins -->   <xsl:param name="dataRowStart" select="2"/>   <!-- If false then cells with null or whitespace-only content -->   <!-- will be skipped -->   <xsl:param name="includeEmpty" select="true( )"/>   <!-- If false then author and creation metadata will not be put -->   <!-- into a comment-->   <xsl:param name="includeComment" select="true( )"/>      <!--Normalize the namespacePrefix -->   <xsl:variable name="nsp">     <xsl:if test="$namespace">       <!-- Only use prefix if namespace is specified -->       <xsl:choose>         <xsl:when test="contains($namespacePrefix,':')">           <xsl:value-of                 select="concat(translate(substring-before(                                             $namespacePrefix,                                             ':'),' ',''),':')"/>         </xsl:when>         <xsl:when test="translate($namespacePrefix,' ','')">           <xsl:value-of                 select="concat(translate($namespacePrefix,' ',''),':')"/>         </xsl:when>         <xsl:otherwise/>       </xsl:choose>     </xsl:if>   </xsl:variable>      <!--Get the names of all the columns with whitespace replaced by  -->   <xsl:variable name="COLS" select="/*/*/*/ss:Row[$colNamesRow]/ss:Cell"/>      <xsl:template match="o:DocumentProperties">     <xsl:if test="$includeComment">       <xsl:text>&#xa;</xsl:text>       <xsl:comment>        <xsl:text>&#xa;</xsl:text>         <xsl:if test="normalize-space(o:Company)">           <xsl:text>Company: </xsl:text>           <xsl:value-of select="o:Company"/>           <xsl:text>&#xa;</xsl:text>         </xsl:if>         <xsl:text>Author: </xsl:text>         <xsl:value-of select="o:Author"/>         <xsl:text>&#xa;</xsl:text>         <xsl:text>Created on: </xsl:text>         <xsl:value-of select="translate(o:Created,'TZ',' ')"/>         <xsl:text>&#xa;</xsl:text>         <xsl:text>Last Author: </xsl:text>         <xsl:value-of select="o:LastAuthor"/>         <xsl:text>&#xa;</xsl:text>         <xsl:text>Saved on:</xsl:text>         <xsl:value-of select="translate(o:LastSaved,'TZ',' ')"/>         <xsl:text>&#xa;</xsl:text>       </xsl:comment>     </xsl:if>   </xsl:template>      <xsl:template match="ss:Table">     <xsl:element           name="{concat($nsp,translate($topLevelName,                        '&#x20;&#x9;&#xA;',$wsSub))}"           namespace="{$namespace}">       <xsl:apply-templates select="ss:Row[position( ) >= $dataRowStart]"/>     </xsl:element>   </xsl:template>      <xsl:template match="ss:Row">     <xsl:element         name="{concat($nsp,translate($rowName,                       '&#x20;&#x9;&#xA;',$wsSub))}"          namespace="{$namespace}">       <xsl:for-each select="ss:Cell">         <xsl:variable name="pos" select="position( )"/>            <!-- Get the correct column name even if there were empty -->        <!-- cols in original spreadsheet -->              <xsl:variable name="colName">           <xsl:choose>             <xsl:when test="@ss:Index and                              $COLS[@ss:Index = current( )/@ss:Index]">               <xsl:value-of                    select="$COLS[@ss:Index = current( )/@ss:Index]/ss:Data"/>             </xsl:when>             <xsl:when test="@ss:Index">               <xsl:value-of                      select="$COLS[number(current( )/@ss:Index)]/ss:Data"/>             </xsl:when>             <xsl:otherwise>               <xsl:value-of select="$COLS[$pos]/ss:Data"/>             </xsl:otherwise>           </xsl:choose>         </xsl:variable>                  <xsl:if test="$includeEmpty or                        translate(ss:Data,'&#x20;&#x9;&#xA;','')">           <xsl:element                name="{concat($nsp,translate($colName,                                       '&#x20;&#x9;&#xA;',$wsSub))}"                 namespace="{$namespace}">             <xsl:value-of select="ss:Data"/>           </xsl:element>         </xsl:if>                </xsl:for-each>     </xsl:element>   </xsl:template>      <xsl:template match="text( )"/>    </xsl:stylesheet>

The result of the transformation, with default parameter values, is the much more direct XML representation that follows:

<Table>   <Row>     <Date>20010817</Date>     <Price>61.88</Price>     <Volume>260163</Volume>   </Row>   <Row>     <Date>20010820</Date>     <Price>62.7</Price>     <Volume>241859</Volume>   </Row>   <Row>     <Date>20010821</Date>     <Price>60.78</Price>     <Volume>233989</Volume>   </Row>   <Row>     <Date>20010822</Date>     <Price>60.66</Price>     <Volume>387444</Volume>   </Row> </Table>

XSLT 2.0

The main improvements of using XSLT 2.0 is the ability to introduce some helper functions to remove redundant code and the use of more succinct XPath 2.0 syntax.

<xsl:stylesheet version="2.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"   xmlns:xs="http://www.w3.org/2001/XMLSchema"   xmlns:fn="http://www.w3.org/2005/02/xpath-functions"   xmlns:o="urn:schemas-microsoft-com:office:office"   xmlns:x="urn:schemas-microsoft-com:office:excel"   xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"  xmlns:ckbk="http://www.oreilly.com/xsltckbk">     <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>      <!-- The name of the top-level element -->   <xsl:param name="topLevelName" select=" 'Table' " as="xs:string"/>   <!-- The name of each row -->   <xsl:param name="rowName" select=" 'Row' " as="xs:string"/>   <!-- The namespace to use -->   <xsl:param name="namespace" select=" '' " as="xs:string"/>   <!-- The namespace prefix to use -->   <xsl:param name="namespacePrefix" select=" '' " as="xs:string" />   <!-- The character to use if column names contain whitespace -->   <xsl:param name="wsSub" select="'_'" as="xs:string"/>   <!--Determines which row contains the col names-->   <xsl:param name="colNamesRow" select="1" as="xs:integer"/>   <!--Determines which row the data begins -->   <xsl:param name="dataRowStart" select="2" as="xs:integer"/>   <!-- If false then cells with null or whitespace-only content -->   <!-- will be skipped -->   <xsl:param name="includeEmpty" select="true( )" as="xs:boolean"/>   <!-- If false then author and creation metadata will not be put -->   <!-- into a comment-->   <xsl:param name="includeComment" select="true( )" as="xs:boolean"/>      <!--Normalize the namespacePrefix -->   <xsl:variable name="nsp" as="xs:string"         select="if (contains($namespacePrefix,':'))           then concat(translate(substring-before($namespacePrefix,':'),' ',''),':')          else          if (matches($namespacePrefix,'\W'))          then concat(translate($namespacePrefix,' ',''),':')           else '' "/>       <!--Get the names of all the columns-->   <xsl:variable name="COLS" select="/*/*/*/ss:Row[$colNamesRow]/ss:Cell"/>      <xsl:template match="o:DocumentProperties">     <xsl:if test="$includeComment">       <xsl:text>&#xa;</xsl:text>       <xsl:comment select="concat('&#xa;',                                   ckbk:comment(o:Company),                                    ckbk:comment(o:Author),                                   ckbk:comment(o:Created,'Created on'),                                   ckbk:comment(o:LastAuthor,'Last Author'),                                   ckbk:comment(o:LastSaved,'Saved on'))"/>     </xsl:if>     <xsl:text>&#xa;</xsl:text>   </xsl:template>      <xsl:template match="ss:Table">     <xsl:element         name="{ckbk:makeName($nsp,$topLevelName,$wsSub)}"           namespace="{$namespace}">       <xsl:apply-templates select="ss:Row[position( ) ge $dataRowStart]"/>     </xsl:element>   </xsl:template>      <xsl:template match="ss:Row">     <xsl:element         name="{ckbk:makeName($nsp,$rowName,$wsSub)}"          namespace="{$namespace}">       <xsl:for-each select="ss:Cell">         <xsl:variable name="pos" select="position( )"/>            <!-- Get the correct column name even if there were empty -->        <!-- cols in original spreadsheet -->              <xsl:variable name="colName" as="xs:string"                 select="if (@ss:Index and $COLS[@ss:Index = current( )/@ss:Index])                         then $COLS[@ss:Index = current( )/@ss:Index]/ss:Datae                        else                        if (@ss:Index)                        then $COLS[number(current( )/@ss:Index)]/ss:Data                        else $COLS[$pos]/ss:Data"/>                  <xsl:if test="$includeEmpty or                        translate(ss:Data,'&#x20;&#x9;&#xA;','')">           <xsl:element                name="{ckbk:makeName($nsp,$colName,$wsSub)}"                 namespace="{$namespace}">             <xsl:value-of select="ss:Data"/>           </xsl:element>         </xsl:if>       </xsl:for-each>     </xsl:element>   </xsl:template>      <xsl:template match="text( )"/>  <xsl:function name="ckbk:makeName" as="xs:string">   <xsl:param name="nsp" as="xs:string"/>   <xsl:param name="name" as="xs:string"/>   <xsl:param name="wsSub" as="xs:string"/>   <xsl:sequence select="concat($nsp,translate($name,                                       '&#x20;&#x9;&#xA;',$wsSub))"/>  </xsl:function>    <xsl:function name="ckbk:comment" as="xs:string">   <xsl:param name="elem"/>   <xsl:sequence select="ckbk:comment($elem, local-name($elem))"/>  </xsl:function>  <xsl:function name="ckbk:comment" as="xs:string">   <xsl:param name="elem"/>   <xsl:param name="label" as="xs:string"/>   <xsl:sequence select="if (normalize-space($elem))                                         then concat($label,': ',$elem,'&#xa;')                                        else '' "/>  </xsl:function>    </xsl:stylesheet>

Discussion

I almost did not include this recipe in the book because it initially seemed trivial. However, I realized that a robust solution needs to handle many special cases, and many implementations (including my first) would miss them. For example, spreadsheets often contain empty columns used as spacers. You need to know how to handle them by looking for the @ss:Index attribute. This book's initial version also hardcoded many of the choices this version exposes as parameters.

At least one obvious additional extension could be made to this stylesheet: the handling of multiple ss:Worksheet elements. This handling could be done by specifying the worksheet number as a parameter:

  <xsl:param name="WSNum" select="1"/>       <xsl:variable name="COLS"           select="/*/ss:Worksheet[$WSNum]/*/ss:Row[$colNamesRow]/ss:Cell"/>       <xsl:template match="ss:Workbook">     <xsl:element name="{concat($nsp,translate($topLevelName,                       '&#x20;&#x9;&#xA;',$wsSub))}"                   namespace="{$namespace}">       <xsl:apply-templates select="ss:Worksheet[number($WSNum)]/ss:Table"/>     </xsl:element>   </xsl:template>

A more ambitious solution handles each Worksheet in a multiple Worksheet document as a separate element in the resulting document. This setup means that the column names can no longer be handled as a global variable:

  <xsl:template match="ss:Workbook">     <xsl:element name="{concat($nsp,translate($topLevelName,                       '&#x20;&#x9;&#xA;',$wsSub))}"                   namespace="{$namespace}">       <xsl:choose>         <xsl:when test="number($WSNum) > 0">           <xsl:apply-templates                 select="ss:Worksheet[number($WSNum)]/ss:Table">             <xsl:with-param name="COLS"                   select="ss:Worksheet[number($WSNum)]                                          /*/ss:Row[$colNamesRow]/ss:Cell"/>           </xsl:apply-templates>         </xsl:when>         <xsl:otherwise>           <xsl:for-each select="ss:Worksheet">             <xsl:element                   name="{concat($nsp,translate(@ss:Name,                         '&#x20;&#x9;&#xA;',$wsSub))}"                  namespace="{$namespace}">               <xsl:apply-templates select="ss:Table">                 <xsl:with-param name="COLS"                                  select="*/ss:Row[$colNamesRow]/ss:Cell"/>               </xsl:apply-templates>             </xsl:element>           </xsl:for-each>         </xsl:otherwise>       </xsl:choose>     </xsl:element>   </xsl:template>      <xsl:template match="ss:Table">     <xsl:param name="COLS"/>       <xsl:apply-templates select="ss:Row[position( ) >= $dataRowStart]">         <xsl:with-param name="COLS" select="$COLS"/>       </xsl:apply-templates>   </xsl:template>      <xsl:template match="ss:Row">     <xsl:param name="COLS"/>          <!-- The rest is the same as original ... -->         </xsl:template>

The only trouble with this solution is that it assumes that the column names have to be in the same row in each worksheet.




XSLT Cookbook
XSLT Cookbook: Solutions and Examples for XML and XSLT Developers, 2nd Edition
ISBN: 0596009747
EAN: 2147483647
Year: 2003
Pages: 208
Authors: Sal Mangano

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net