Recipe7.2.Exporting XML to Delimited Data


Recipe 7.2. Exporting XML to Delimited Data

Problem

You need to convert some XML into data suitable for importing into another application such as a spreadsheet.

Solution

Many applications import delimited data. The most common format is called Comma Separated Values (CSV). Many spreadsheets and databases can handle CSV and other forms of delimited data. Mapping XML to delimited data can be simple or complex, depending on the difficulty of the mapping. This section starts with simple cases and progresses toward more complicated scenarios.

Create a CSV file from flat attribute-encoded elements

In this scenario, you have a flat XML file with elements mapping to rows and attributes mapping to columns.

This problem is trivial for any given XML file of the appropriate format. For example, the following stylesheet shown in Example 7-6 through Example 7-8 outputs a CSV based on the input people.xml.

Example 7-6. people.xml
<?xml version="1.0" encoding="UTF-8"?>     <people>   <person name="Al Zehtooney" age="33" sex="m" smoker="no"/>   <person name="Brad York" age="38" sex="m" smoker="yes"/>   <person name="Charles Xavier" age="32" sex="m" smoker="no"/>   <person name="David Williams" age="33" sex="m" smoker="no"/>   <person name="Edward Ulster" age="33" sex="m" smoker="yes"/>   <person name="Frank Townsend" age="35" sex="m" smoker="no"/>   <person name="Greg Sutter" age="40" sex="m" smoker="no"/>   <person name="Harry Rogers" age="37" sex="m" smoker="no"/>   <person name="John Quincy" age="43" sex="m" smoker="yes"/>   <person name="Kent Peterson" age="31" sex="m" smoker="no"/>   <person name="Larry Newell" age="23" sex="m" smoker="no"/>   <person name="Max Milton" age="22" sex="m" smoker="no"/>   <person name="Norman Lamagna" age="30" sex="m" smoker="no"/>   <person name="Ollie Kensington" age="44" sex="m" smoker="no"/>   <person name="John Frank" age="24" sex="m" smoker="no"/>   <person name="Mary Williams" age="33" sex="f" smoker="no"/>   <person name="Jane Frank" age="38" sex="f" smoker="yes"/>   <person name="Jo Peterson" age="32" sex="f" smoker="no"/>   <person name="Angie Frost" age="33" sex="f" smoker="no"/>   <person name="Betty Bates" age="33" sex="f" smoker="no"/>   <person name="Connie Date" age="35" sex="f" smoker="no"/>   <person name="Donna Finster" age="20" sex="f" smoker="no"/>   <person name="Esther Gates" age="37" sex="f" smoker="no"/>   <person name="Fanny Hill" age="33" sex="f" smoker="yes"/>   <person name="Geta Iota" age="27" sex="f" smoker="no"/>   <person name="Hillary Johnson" age="22" sex="f" smoker="no"/>   <person name="Ingrid Kent" age="21" sex="f" smoker="no"/>   <person name="Jill Larson" age="20" sex="f" smoker="no"/>   <person name="Kim Mulrooney" age="41" sex="f" smoker="no"/>   <person name="Lisa Nevins" age="21" sex="f" smoker="no"/> </people>

Example 7-7. A simple but input-specific CSV transform
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">      <xsl:output method="text"/>      <xsl:strip-space elements="*"/>            <xsl:template match="person">        <xsl:value-of select="@name"/>,<xsl:text/>        <xsl:value-of select="@age"/>,<xsl:text/>        <xsl:value-of select="@sex"/>,<xsl:text/>        <xsl:value-of select="@smoker"/>        <xsl:text>&#xa;</xsl:text>      </xsl:template>       </xsl:stylesheet>

Example 7-8. Output
Al Zehtooney,33,m,no Brad York,38,m,yes Charles Xavier,32,m,no David Williams,33,m,no Edward Ulster,33,m,yes Frank Townsend,35,m,no Greg Sutter,40,m,no ...

Although the solution is simple, it would be nice to create a generic stylesheet that can be customized easily for this class of conversion. Example 7-9 and Example 7-10 show a generic solution and how it might be used in the case of people.xml.

Example 7-9. generic-attr-to-csv.xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"  xmlns:csv="http://www.ora.com/XSLTCookbook/namespaces/csv">     <xsl:param name="delimiter" select=" ',' "/>     <xsl:output method="text" />     <xsl:strip-space elements="*"/>       <xsl:template match="/">   <xsl:for-each select="$columns">     <xsl:value-of select="@name"/>     <xsl:if test="position( ) != last( )">       <xsl:value-of select="$delimiter/>     </xsl:if>   </xsl:for-each>   <xsl:text>&#xa;</xsl:text>   <xsl:apply-templates/> </xsl:template>     <xsl:template match="/*/*">   <xsl:variable name="row" select="."/>      <xsl:for-each select="$columns">     <xsl:apply-templates select="$row/@*[local-name(.)=current( )/@attr]"      mode="csv:map-value"/>     <xsl:if test="position( ) != last( )">       <xsl:value-of select="$delimiter"/>     </xsl:if>   </xsl:for-each>       <xsl:text>&#xa;</xsl:text>   </xsl:template>     <xsl:template match="@*" mode="map-value">   <xsl:value-of select="."/> </xsl:template>     </xsl:stylesheet>

Example 7-10. Using the generic solution to process people.xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"  xmlns:csv="http://www.ora.com/XSLTCookbook/namespaces/csv">     <xsl:import href="generic-attr-to-csv.xslt"/>     <!--Defines the mapping from attributes to columns --> <xsl:variable name="columns" select="document('')/*/csv:column"/>     <csv:column name="Name" attr="name"/> <csv:column name="Age" attr="age"/> <csv:column name="Gender" attr="sex"/> <csv:column name="Smoker" attr="smoker"/>     <!-- Handle custom attribute mappings -->     <xsl:template match="@sex" mode="csv:map-value">   <xsl:choose>     <xsl:when test=".='m'">male</xsl:when>     <xsl:when test=".='f'">female</xsl:when>     <xsl:otherwise>error</xsl:otherwise>   </xsl:choose> </xsl:template>     </xsl:stylesheet>

This solution is table-driven. The generic-attr-to-csv.xslt stylesheet uses a variable containing csv:column elements that are defined in the importing spreadsheet. The importing spreadsheet needs only to arrange the csv:column elements in the order in which the resulting columns should appear in the output. The csv:column elements define the mapping between a named column and an attribute name in the input XML. Optionally, the importing stylesheet can translate the values of certain attributes by providing a template that matches the specified attribute using the mode csv:map-value. Here you use such a template to translate the abbreviated @sex values in people.xml. Any common sets of mapping in use can be placed in a third stylesheet and imported as well. The nice thing about this solution is that it is easy for someone with only very limited XSLT knowledge to define a new CSV mapping. As an added benefit, the generic stylesheet defines a top-level parameter that can change the default delimiting character from a comma to something else.

Create a CSV file from flat element-encoded data

In this scenario, you have a flat XML file with elements mapping to rows and children mapping to columns.

This problem is similar to the previous one, except you have XML that uses elements rather than attributes to encode the columns. You can also provide a generic solution here, as shown in Example 7-11 to Example 7-14.

Example 7-11. People using elements
<people>   <person>     <name>Al Zehtooney</name>     <age>33</age>     <sex>m</sex>     <smoker>no</smoker>   </person>   <person>     <name>Brad York</name>     <age>38</age>     <sex>m</sex>     <smoker>yes</smoker>   </person>   <person>     <name>Charles Xavier</name>     <age>32</age>     <sex>m</sex>     <smoker>no</smoker>   </person>   <person>     <name>David Williams</name>     <age>33</age>     <sex>m</sex>     <smoker>no</smoker>   </person> ... </people>

Example 7-12. generic-elem-to-csv.xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"     xmlns:csv="http://www.ora.com/XSLTCookbook/namespaces/csv">     <xsl:param name="delimiter" select=" ',' "/>     <xsl:output method="text" />     <xsl:strip-space elements="*"/>     <xsl:template match="/">   <xsl:for-each select="$columns">     <xsl:value-of select="@name"/>    <xsl:if test="position( ) != last( )">       <xsl:value-of select="$delimiter"/>     </xsl:if>   </xsl:for-each>   <xsl:text>&#xa;</xsl:text>   <xsl:apply-templates/> </xsl:template>     <xsl:template match="/*/*">   <xsl:variable name="row" select="."/>      <xsl:for-each select="$columns">     <xsl:apply-templates         select="$row/*[local-name(.)=current( )/@elem]" mode="csv:map-value"/>     <xsl:if test="position( ) != last( )">     <xsl:value-of select="$delimiter"/>     </xsl:if>   </xsl:for-each>       <xsl:text>&#xa;</xsl:text>   </xsl:template>     <xsl:template match="node( )" mode="map-value">   <xsl:value-of select="."/> </xsl:template>

Example 7-13. people-elem-to-csv.xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"     xmlns:csv="http://www.ora.com/XSLTCookbook/namespaces/csv">     <xsl:import href="generic-elem-to-csv.xslt"/>     <!--Defines the mapping from attributes to columns --> <xsl:variable name="columns" select="document('')/*/csv:column"/>     <csv:column name="Name" elem="name"/> <csv:column name="Age" elem="age"/> <csv:column name="Gender" elem="sex"/> <csv:column name="Smoker" elem="smoker"/>     </xsl:stylesheet>

Example 7-14. Example 7-14. Output
Name,Age,Gender,Smoker Al Zehtooney,33,m,no Brad York,38,m,yes Charles Xavier,32,m,no David Williams,33,m,no ...

Handle more complex mappings

In this scenario, you must deal with an arbitrary mapping of both attributes and elements to rows and columns. Here the document order does not map as nicely onto row or column order. In addition, the mapping may be sparse, in the sense that many empty values must be generated in the CSV data.

Consider, for example, the following XML representing an expense report of a soon-to-be-fired employee:

<ExpenseReport statementNum="123">   <Employee>     <Name>Salvatore Mangano</Name>     <SSN>999-99-9999</SSN>     <Dept>XSLT Hacking</Dept>     <EmpNo>1</EmpNo>     <Position>Cook</Position>     <Manager>Big Boss O'Reilly</Manager>   </Employee>   <PayPeriod>     <From>1/1/02</From>     <To>1/31/02</To>   </PayPeriod>   <Expenses>     <Expense>       <Date>12/20/01</Date>       <Account>12345</Account>       <Desc>Goofing off instead of going to conference.</Desc>       <Lodging>500.00</Lodging>       <Transport>50.00</Transport>       <Fuel>0</Fuel>       <Meals>300.00</Meals>       <Phone>100</Phone>       <Entertainment>1000.00</Entertainment>       <Other>300.00</Other>     </Expense>     <Expense>       <Date>12/20/01</Date>       <Account>12345</Account>       <Desc>On the beach</Desc>       <Lodging>500.00</Lodging>       <Transport>50.00</Transport>       <Fuel>0</Fuel>       <Meals>200.00</Meals>       <Phone>20</Phone>       <Entertainment>300.00</Entertainment>       <Other>100.00</Other>     </Expense>   </Expenses> </ExpenseReport>

Now imagine that you need to import this XML into a spreadsheet so that when appropriate spreadsheet styles are applied, the result looks like Figure 7-1.

Figure 7-1. Expense report spreadsheet


To place the data correctly in all cells so that styling is the only further processing necessary, the following comma-delimited file must be produced:

,,,,,,,,,,,,Statement No.,123,     ,,,,,,,,,,,Expense Statement,         ,,,Employee,,,,,,,,,Pay Period,     ,,,Name,Salvatore Mangano,,Emp #,1,,,,,From,1/1/02, ,,,SSN,999-99-9999,,Position,Cook, ,,,Department,XSLT Hacking,,,,,,,,To,1/31/02,     ,,,Date,Account,Description,Lodging,Transport,Fuel,Meals,Phone,Entertainment,Other, Total,     ,,,12/20/01,12345,Goofing off instead of going to conference.,500.00,50.00,0,300. 00,100,1000.00,300.00,     ,,,12/20/01,12345,On the beach,500.00,50.00,0,200.00,20,300.00,100.00,Sub Total, ,,,Approved,,Notes,,,,,,,Advances, ,,,,,,,,,,,,Total,

As you can see, mapping from XML to delimited data lacks the uniformity that made the previous examples simple to implement. This is not to say that a stylesheet cannot be created to do the required mapping. However, if you attack the problem directly, we will probably end up with an ad-hoc and complex stylesheet.

When confronted with complex transformations, see if the problem could be simplified by first transforming the source document to an intermediate form, and then transform the intermediate form to the desired result. In other words, try to break complex transformation problems into two or more less-complicated problems.

Thinking along these lines, you'll see that the problem of mapping the XML to the spreadsheet is really a problem of assigning XML content to cells in the spreadsheet. You can therefore invent an intermediate form consisting of cell elements . For example, a cell element that places the value "foo" in cell A1 would be <cell col="A" row="1" value="foo"/>. Your goal is to create a stylesheet that maps each significant element in the source onto a cell element. Because you no longer have to worry about ordering, mapping is simple:

<xsl:template match="ExpenseReport">     <c:cell col="M" row="3" value="Statement No."/>     <c:cell col="N" row="3" value="{@statementNum}"/>     <c:cell col="L" row="6" value="Expense Statement"/>     <xsl:apply-templates/>     <xsl:variable name="offset" select="count(Expenses/Expense)+18"/>     <c:cell col="M" row="{$offset}" value="Sub Total"/>     <c:cell col="D" row="{$offset + 1}" value="Approved"/>     <c:cell col="F" row="{$offset + 1}" value="Notes"/>     <c:cell col="M" row="{$offset + 1}" value="Advances"/>     <c:cell col="M" row="{$offset + 2}" value="Total"/>   </xsl:template>       <xsl:template match="Employee">     <c:cell col="D" row="10" value="Employee"/>     <xsl:apply-templates/>   </xsl:template>       <xsl:template match="Employee/Name">     <c:cell col="D" row="12" value="Name"/>     <c:cell col="E" row="12" value="{.}"/>   </xsl:template>       <xsl:template match="Employee/SSN">     <c:cell col="D" row="13" value="SSN"/>     <c:cell col="E" row="13" value="{.}"/>   </xsl:template>       <xsl:template match="Employee/Dept">     <c:cell col="D" row="14" value="Department"/>     <c:cell col="E" row="14" value="{.}"/>   </xsl:template>       <xsl:template match="Employee/EmpNo">     <c:cell col="G" row="12" value="Emp #"/>     <c:cell col="H" row="12" value="{.}"/>   </xsl:template>       <xsl:template match="Employee/Position">     <c:cell col="G" row="13" value="Position"/>     <c:cell col="H" row="13" value="{.}"/>   </xsl:template>       <xsl:template match="Employee/Manager">     <c:cell col="G" row="14" value="Manager"/>     <c:cell col="H" row="14" value="{.}"/>   </xsl:template>       <xsl:template match="PayPeriod">     <c:cell col="M" row="10" value="Pay Period"/>     <xsl:apply-templates/>   </xsl:template>       <xsl:template match="PayPeriod/From">     <c:cell col="M" row="12" value="From"/>     <c:cell col="N" row="12" value="{.}"/>   </xsl:template>       <xsl:template match="PayPeriod/To">     <c:cell col="M" row="14" value="To"/>     <c:cell col="N" row="14" value="{.}"/>   </xsl:template>       <xsl:template match="Expenses">     <c:cell col="D" row="16" value="Date"/>     <c:cell col="E" row="16" value="Account"/>     <c:cell col="F" row="16" value="Description"/>     <c:cell col="G" row="16" value="Lodging"/>     <c:cell col="H" row="16" value="Transport"/>     <c:cell col="I" row="16" value="Fuel"/>     <c:cell col="J" row="16" value="Meals"/>     <c:cell col="K" row="16" value="Phone"/>     <c:cell col="L" row="16" value="Entertainment"/>     <c:cell col="M" row="16" value="Other"/>     <c:cell col="N" row="16" value="Total"/>     <xsl:apply-templates/>   </xsl:template>       <xsl:template match="Expenses/Expense">     <xsl:apply-templates>       <xsl:with-param name="row" select="position( )+16"/>     </xsl:apply-templates>   </xsl:template>       <xsl:template match="Expense/Date">     <xsl:param name="row"/>     <c:cell col="D" row="{$row}" value="{.}"/>   </xsl:template>       <xsl:template match="Expense/Account">     <xsl:param name="row"/>     <c:cell col="E" row="{$row}" value="{.}"/>   </xsl:template>       <xsl:template match="Expense/Desc">     <xsl:param name="row"/>     <c:cell col="F" row="{$row}" value="{.}"/>   </xsl:template>       <xsl:template match="Expense/Lodging">     <xsl:param name="row"/>     <c:cell col="G" row="{$row}" value="{.}"/>   </xsl:template>       <xsl:template match="Expense/Transport">     <xsl:param name="row"/>     <c:cell col="H" row="{$row}" value="{.}"/>   </xsl:template>       <xsl:template match="Expense/Fuel">     <xsl:param name="row"/>     <c:cell col="I" row="{$row}" value="{.}"/>   </xsl:template>       <xsl:template match="Expense/Meals">     <xsl:param name="row"/>     <c:cell col="J" row="{$row}" value="{.}"/>   </xsl:template>       <xsl:template match="Expense/Phone">     <xsl:param name="row"/>     <c:cell col="K" row="{$row}" value="{.}"/>   </xsl:template>       <xsl:template match="Expense/Entertainment">     <xsl:param name="row"/>     <c:cell col="L" row="{$row}" value="{.}"/>   </xsl:template>       <xsl:template match="Expense/Other">     <xsl:param name="row"/>     <c:cell col="M" row="{$row}" value="{.}"/>   </xsl:template>

One major advantage of using an attribute to encode a cell's value is that it lets you use attribute-value templates, thus creating a very concise translation scheme. Two types of mappings occur in this stylesheet. The first type is absolute. For example, you want the employee name to map to cell E12. The second type is relative; you want each expense item to map relative to row 16, based on its position in the source document.

When you apply this stylesheet to the source document, you get the following output:

<c:cells xmlns:c="http://www.ora.com/XSLTCookbook/namespaces/cells" >   <c:cell col="M" row="3" value="Statement No."/>   <c:cell col="N" row="3" value="123"/>   <c:cell col="L" row="6" value="Expense Statement"/>   <c:cell col="D" row="10" value="Employee"/>   <c:cell col="D" row="12" value="Name"/>   <c:cell col="E" row="12" value="Salvatore Mangano"/>   <c:cell col="D" row="13" value="SSN"/>   <c:cell col="E" row="13" value="999-99-9999"/>   <c:cell col="D" row="14" value="Department"/>   <c:cell col="E" row="14" value="XSLT Hacking"/>   <c:cell col="G" row="12" value="Emp #"/>   <c:cell col="H" row="12" value="1"/>   <c:cell col="G" row="13" value="Position"/>   <c:cell col="H" row="13" value="Cook"/>   <c:cell col="G" row="14" value="Manager"/>   <c:cell col="H" row="14" value="Big Boss O'Reilly"/>   <c:cell col="M" row="10" value="Pay Period"/>   <c:cell col="M" row="12" value="From"/>   <c:cell col="N" row="12" value="1/1/02"/>   <c:cell col="M" row="14" value="To"/>   <c:cell col="N" row="14" value="1/31/02"/>   <c:cell col="D" row="16" value="Date"/>   <c:cell col="E" row="16" value="Account"/>   <c:cell col="F" row="16" value="Description"/>   <c:cell col="G" row="16" value="Lodging"/>   <c:cell col="H" row="16" value="Transport"/>   <c:cell col="I" row="16" value="Fuel"/>   <c:cell col="J" row="16" value="Meals"/>   <c:cell col="K" row="16" value="Phone"/>   <c:cell col="L" row="16" value="Entertainment"/>   <c:cell col="M" row="16" value="Other"/>   <c:cell col="N" row="16" value="Total"/>   <c:cell col="D" row="18" value="12/20/01"/>   <c:cell col="E" row="18" value="12345"/>   <c:cell col="F" row="18" value="Goofing off instead of going to conference."/>   <c:cell col="G" row="18" value="500.00"/>   <c:cell col="H" row="18" value="50.00"/>   <c:cell col="I" row="18" value="0"/>   <c:cell col="J" row="18" value="300.00"/>   <c:cell col="K" row="18" value="100"/>   <c:cell col="L" row="18" value="1000.00"/>   <c:cell col="M" row="18" value="300.00"/>   <c:cell col="D" row="20" value="12/20/01"/>   <c:cell col="E" row="20" value="12345"/>   <c:cell col="F" row="20" value="On the beach"/>   <c:cell col="G" row="20" value="500.00"/>   <c:cell col="H" row="20" value="50.00"/>   <c:cell col="I" row="20" value="0"/>   <c:cell col="J" row="20" value="200.00"/>   <c:cell col="K" row="20" value="20"/>   <c:cell col="L" row="20" value="300.00"/>   <c:cell col="M" row="20" value="100.00"/>   <c:cell col="M" row="20" value="Sub Total"/>   <c:cell col="D" row="21" value="Approved"/>   <c:cell col="F" row="21" value="Notes"/>   <c:cell col="M" row="21" value="Advances"/>   <c:cell col="M" row="22" value="Total"/> </c:cells>

Of course, this is not the final result you are after. However, it is not too difficult to see that by sorting these cells first by @row and then by @col makes mapping the cells into a comma-delimited form simple. In fact, if you are willing to use the EXSLT node-set extension, you can obtain your result with a single pass. Also notice that the cell-to-comma delimited mapping is completely generic, so you can reuse it in the future for other complex XML-to-comma-delimited mappings. See Example 7-15 and Example 7-16.

Example 7-15. Generic cells-to-comma-delimited.xslt
<xsl:stylesheet version="1.0"       xmlns:xsl="http://www.w3.org/1999/XSL/Transform"      xmlns:c="http://www.ora.com/XSLTCookbook/namespaces/cells"       xmlns:exsl="http://exslt.org/common" extension-element-prefixes="exsl">       <xsl:output method="text"/>       <!-- Used to map column letters to numbers -->   <xsl:variable name="columns" select=" '_ABCDEFGHIJKLMNOPQRSTUVWXYZ' "/>      <xsl:template match="/">          <!-- Capture cells in a variable -->     <xsl:variable name="cells">       <xsl:apply-templates/>     </xsl:variable>           <!-- Sort into row-column order -->     <xsl:variable name="cells-sorted">       <xsl:for-each select="exsl:node-set($cells)/c:cell">         <xsl:sort select="@row" data-type="number"/>         <xsl:sort select="@col" data-type="text"/>         <xsl:copy-of select="."/>       </xsl:for-each>     </xsl:variable>         <xsl:apply-templates select="exsl:node-set($cells-sorted)/c:cell"/>       </xsl:template>       <xsl:template match="c:cell">     <xsl:choose>         <!-- Detect a row change -->       <xsl:when test="preceding-sibling::c:cell[1]/@row != @row">          <!-- Compute how many rows to skip, if any -->         <xsl:variable name="skip-rows">           <xsl:choose>             <xsl:when test="preceding-sibling::c:cell[1]/@row">               <xsl:value-of                 select="@row - preceding-sibling::c:cell[1]/@row"/>             </xsl:when>             <xsl:otherwise>               <xsl:value-of select="@row - 1"/>             </xsl:otherwise>           </xsl:choose>         </xsl:variable>         <xsl:call-template name="skip-rows">           <xsl:with-param name="skip" select="$skip-rows"/>         </xsl:call-template>             <xsl:variable name="current-col"                 select="string-length(substring-before($columns,@col))"/>         <xsl:call-template name="skip-cols">           <xsl:with-param name="skip" select="$current-col - 1"/>         </xsl:call-template>         <xsl:value-of select="@value"/>,<xsl:text/>       </xsl:when>              <xsl:otherwise>          <!-- Compute how many cols to skip, if any -->         <xsl:variable name="skip-cols">           <xsl:variable name="current-col"                 select="string-length(substring-before($columns,@col))"/>                      <xsl:choose>             <xsl:when test="preceding-sibling::c:cell[1]/@col">               <xsl:variable name="prev-col"                 select="string-length(substring-before($columns,                          preceding-sibling::c:cell[1]/@col))"/>               <xsl:value-of select="$current-col - $prev-col - 1"/>             </xsl:when>             <xsl:otherwise>               <xsl:value-of select="$current-col - 1"/>             </xsl:otherwise>           </xsl:choose>         </xsl:variable>                  <xsl:call-template name="skip-cols">           <xsl:with-param name="skip" select="$skip-cols"/>         </xsl:call-template>         <!--Output the value of the cell and a comma -->         <xsl:value-of select="@value"/>,<xsl:text/>       </xsl:otherwise>     </xsl:choose>   </xsl:template>     <!-- Used to insert empty lines for non contiguous rows --> <xsl:template name="skip-rows">   <xsl:param name="skip"/>   <xsl:choose>     <xsl:when test="$skip > 0">       <xsl:text>&#xa;</xsl:text>       <xsl:call-template name="skip-rows">         <xsl:with-param name="skip" select="$skip - 1"/>       </xsl:call-template>     </xsl:when>     <xsl:otherwise/>   </xsl:choose> </xsl:template>     <!-- Used to insert extra commas for non contiguous cols --> <xsl:template name="skip-cols">   <xsl:param name="skip"/>   <xsl:choose>     <xsl:when test="$skip > 0">       <xsl:text>,</xsl:text>       <xsl:call-template name="skip-cols">         <xsl:with-param name="skip" select="$skip - 1"/>       </xsl:call-template>     </xsl:when>     <xsl:otherwise/>   </xsl:choose> </xsl:template>     </xsl:stylesheet>

Example 7-16. Applications-specific expense-to-delimited.xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"   xmlns:c="http://www.ora.com/XSLTCookbook/namespaces/cells"    xmlns:exsl="http://exslt.org/common" extension-element-prefixes="exsl">       <xsl:include href="cells-to-comma-delimited.xslt"/>      <xsl:template match="ExpenseReport">     <c:cell col="M" row="3" value="Statement No."/>     <c:cell col="N" row="3" value="{@statementNum}"/>     <c:cell col="L" row="6" value="Expense Statement"/>     <xsl:apply-templates/>     <xsl:variable name="offset" select="count(Expenses/Expense)+18"/>     <c:cell col="M" row="{$offset}" value="Sub Total"/>     <c:cell col="D" row="{$offset + 1}" value="Approved"/>     <c:cell col="F" row="{$offset + 1}" value="Notes"/>     <c:cell col="M" row="{$offset + 1}" value="Advances"/>     <c:cell col="M" row="{$offset + 2}" value="Total"/>   </xsl:template>       <xsl:template match="Employee">     <c:cell col="D" row="10" value="Employee"/>     <xsl:apply-templates/>   </xsl:template>       <xsl:template match="Employee/Name">     <c:cell col="D" row="12" value="Name"/>     <c:cell col="E" row="12" value="{.}"/>   </xsl:template>     <!-- ... --> <!-- Remainder elided, same as original stylesheet above --> <!-- ... -->     </xsl:stylesheet>

The reusable cells-to-comma-delimited.xslt captures the cells produced by the application-specific stylesheet into a variable and sorts. It then transforms those cells into comma-delimited output. This is done by considering each cell relative to its predecessor in sorted order. If the predecessor is on a different row, then one or more newlines must be output. On the other hand, if the predecessor is on a nonadjacent column, then one or more extra commas must be output. You must also handle the case when the first row or column within a row is not the first row or column in the spreadsheet. Once these details are handled, you only need to output the value of the cell followed by a comma.

XSLT 2.0

A nice enhancement to xsl:value-of that makes delimited text easier to produce is the separator attribute. When xsl:value-of is given a sequence, it will serialize it and if the separator attribute is provided, it will insert the separator after each item but the last. The seperator can be a literal or an attribute value template. In the next example, I take advantage of this feature to simplify the code that outputs column names. I also take advantage of XPath 2.0 to generalize the functionality so the same base stylesheet can be used with xml that uses elements or attributes. Further, I use literal sequences to encode the CSV mappings rather than embedded stylesheet xml. This is to illustrate the added flexibility XSLT 2.0 gives you rather than to suggest one technique is superior to the other (see Example 7-17 and Example 7-18).

Example 7-17. Generic cells-to-comma-delimited.xslt
<xsl:stylesheet version="2.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"  xmlns:xs="http://www.w3.org/2001/XMLSchema"  xmlns:fn="http://www.w3.org/2004/10/xpath-functions"  xmlns:csv="http://www.ora.com/XSLTCookbook/namespaces/csv">     <xsl:param name="delimiter" select=" ',' "/> <!--These should be overridden in importing stylesheet --> <xsl:variable name="columns" select="( )" as="xs:string*"/> <xsl:variable name="nodeNames" select="$columns" as="xs:string*"/>     <xsl:output method="text" />     <xsl:strip-space elements="*"/>       <xsl:template match="/">   <!--Here we use the new ability of value-of-->   <xsl:value-of select="$columns" separator="{$delimiter}"/>   <xsl:text>&#xa;</xsl:text>   <xsl:apply-templates mode="csv:map-row"/> </xsl:template>     <xsl:template match="/*/*" mode="csv:map-row" name="csv:map-row">   <xsl:param name="elemOrAttr" select=" 'elem' " as="xs:string"/>      <xsl:variable name="row" select="." as="node( )"/>      <xsl:for-each select="$nodeNames">     <xsl:apply-templates select="if ($elemOrAttr eq 'elem')                                   then $row/*[local-name(.) eq current( )]                                   else $row/@*[local-name(.) eq current( )]"                           mode="csv:map-value"/>     <xsl:value-of select="if (position( ) ne last( )) then $delimiter else ( )"/>   </xsl:for-each>       <xsl:text>&#xa;</xsl:text>   </xsl:template>     <xsl:template match="node( )" mode="csv:map-value">   <xsl:value-of select="."/> </xsl:template>     </xsl:stylesheet>

Example 7-18. Applications-specific expense-to-delimited.xslt
<?xml version="1.0" encoding="ISO-8859-1"?> <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"  xmlns:xs="http://www.w3.org/2001/XMLSchema"  xmlns:csv="http://www.ora.com/XSLTCookbook/namespaces/csv">     <xsl:import href="toCSV.xslt"/>     <!--Defines the mapping from nodes to columns --> <xsl:variable name="columns" select="'Name', 'Age', 'Gender', 'Smoker'"  as="xs:string*"/> <xsl:variable name="nodeNames" select="'name', 'age', 'sex', 'smoker'" as="xs:string*"/> <!-- Switch default processing from elements to attributes --> <xsl:template match="/*/*" mode="csv:map-row">   <xsl:call-template name="csv:map-row">     <xsl:with-param name="elemOrAttr" select=" 'attr' "/>   </xsl:call-template> </xsl:template> <!-- Handle custom attribute mappings -->     <xsl:template match="@sex" mode="csv:map-value">   <xsl:choose>     <xsl:when test=".='m'">male</xsl:when>     <xsl:when test=".='f'">female</xsl:when>     <xsl:otherwise>error</xsl:otherwise>   </xsl:choose> </xsl:template>     </xsl:stylesheet>

Discussion

Most XML-to-delimited transformations you are likely to encounter are fairly simple for someone well-versed in XSLT. The value of the previous examples is that they demonstrate that problems can be separated into two parts: a reusable part that requires XSLT expertise and an application-specific part that does not require much XSLT knowledge once its conventions are understood.

The true value of this technique is that it allows individuals who are less skilled in XSLT to do useful work. For example, suppose you had to convert a large base of XML to comma-delimited data and it needed to be done yesterday. Showing someone how to reuse these generic solutions would be much easier than teaching them enough XSLT to come up with custom scripts.




XSLT Cookbook
XSLT Cookbook: Solutions and Examples for XML and XSLT Developers, 2nd Edition
ISBN: 0596009747
EAN: 2147483647
Year: 2003
Pages: 208
Authors: Sal Mangano

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net