Chapter 25: XML in Office Development


Many applications that process XML do it as an intermediate step in creating documents. Many of those documents are created with Microsoft Office by users who are more familiar with the Office tools than with angle brackets. Because Microsoft Office products are some of the most commonly used applications for working with documents, it makes sense that Microsoft added XML processing to these applications. Although it attempts to hide the angle brackets more often than an XML purist may like, Office makes creating and processing XML documents easy for average users. This chapter looks at creating and editing XML with the most commonly used Office tools. In addition, the newer alternatives of Office 2007 and Open Office are shown.

Note 

While any editions of Office 2003 (or Office 2007) can be used to save XML, only the Professional and Enterprise editions provide the advanced XML features.

Using XML with Microsoft Excel

Microsoft Excel is one of the most commonly used applications in business. Although it is a spreadsheet application, designed for calculations, Excel files may store more data for business than any other data format.

Saving Excel workbooks as XML

Excel 2003 added the capability to save workbooks as XML, using a format Microsoft calls SpreadsheetML. Spreadsheet can now be transmitted as text, rather than in the default binary format. You can now use tools such as XSLT or XQuery on the resulting XML files. The SpreadsheetML format ensures a high degree of compatibility with the original document. If you open this XML file on a computer that has Excel 2003 (or later) installed, you see the original document. Only a little functionality is lost: Mostly the lost functions are those that are not logically able to move across platforms like embedded controls and subdocuments.

Figure 25-1 shows a simple annuity calculator spreadsheet, and Listing 25-1 gives you a portion of this file saved as SpreadsheetML. Much of the file has been excluded for brevity. The full file is approximately 250KB, compared with 80KB in the binary form.

image from book
Figure 25-1

Listing 25-1: Annuity spreadsheet as XML

image from book
      <?xml version="1.0"?>      <?mso-application prog?>      <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"       xmlns:o="urn:schemas-microsoft-com:office:office"       xmlns:x="urn:schemas-microsoft-com:office:excel"       xmlns:dt="uuid:"       xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"       xmlns:html="http://www.w3.org/TR/REC-html40">       <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">        <LastAuthor>Guy Some</LastAuthor>        <LastPrinted>2001-04-09T17:58:15Z</LastPrinted>        <Created>2000-10-19T23:21:30Z</Created>        <LastSaved>2001-04-20T18:48:39Z</LastSaved>        <Company>Stuff Is Us</Company>        <Version>11.6568</Version>       </DocumentProperties>       <CustomDocumentProperties xmlns="urn:schemas-microsoft-com:office:office">        <_TemplateID dt:dt="string">TC010175321033</_TemplateID>       </CustomDocumentProperties>       <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">        <WindowHeight>8985</WindowHeight>        <WindowWidth>11175</WindowWidth>        <WindowTopX>1365</WindowTopX>        <WindowTopY>60</WindowTopY>        <ProtectStructure>False</ProtectStructure>        <ProtectWindows>False</ProtectWindows>       </ExcelWorkbook>       <Styles>        <Style ss: ss:Name="Normal">          <Alignment ss:Vertical="Bottom"/>          <Borders/>          <Font/>          <Interior/>          <NumberFormat/>          <Protection/>        </Style>        <Style ss: ss:Name="Comma">         <NumberFormat ss:Format="_(* #,##0.00_);_(* \(#,##0.00\);_(* &quot;-      &quot;??_);_(@_)"/>        </Style>        <Style ss: ss:Name="Currency">         <NumberFormat          ss:Format="_(&quot;$&quot;* #,##0.00_);_(&quot;$&quot;*      \(#,##0.00\);_(&quot;$&quot;* &quot;-&quot;??_);_(@_)"/>        </Style>      ...      <Worksheet ss:Name="Annuity" ss:Protected="1">        <Table ss:ExpandedColumnCount="9" ss:ExpandedRowCount="665" x:FullColumns="1"         x:FullRows="1" ss:Style ss:DefaultRowHeight="10.5">         <Column ss:Style ss:AutoFitWidth="0" ss:Width="117"/>         <Column ss:Style ss:AutoFitWidth="0" ss:Width="66.75"/>         <Column ss:Style ss:Width="53.25"/>         <Column ss:Style ss:AutoFitWidth="0" ss:Width="98.25"/>         <Column ss:Style ss:AutoFitWidth="0" ss:Width="91.5"/>         <Column ss:Style ss:Width="90.75"/>         <Column ss:Index="8" ss:Style ss:Width="72.75"/>         <Column ss:Style ss:Width="46.5"/>         <Row ss:AutoFitHeight="0" ss:Height="24">          <Cell ss:Style><Data ss:Type="String">Annuity investment</Data></Cell>          <Cell ss:Index="4" ss:Style/>         </Row>         <Row ss:AutoFitHeight="0" ss:Height="24">          <Cell ss:Style><Data ss:Type="String">Present value</Data></Cell>          <Cell ss:Style><Data ss:Type="Number">10000</Data></Cell>          <Cell ss:Index="4" ss:Style><Data ss:Type="String">Value after 7      years</Data></Cell>          <Cell ss:Style ss:Formula="=R[89]C[-1]"><Data      ss:Type="Number">14429.627004010901</Data></Cell>          <Cell ss:Style/>         </Row>         <Row ss:AutoFitHeight="0" ss:Height="24">          <Cell ss:Style><Data ss:Type="String">Interest rate</Data></Cell>          <Cell ss:Style><Data ss:Type="Number">5.2499999999999998E-      2</Data></Cell>          <Cell ss:Index="4" ss:Style><Data ss:Type="String">Monthly payment      after 7 years</Data></Cell>          <Cell ss:Style ss:Formula="=R[88]C[-2]"><Data      ss:Type="Number">62.854629140059927</Data></Cell>          <Cell ss:Style/>        </Row>      ... 
image from book

Because SpreadsheetML must maintain this compatibility, it includes sections for the styles, borders, and other formatting of the document, as well as the actual data. The actual data is included in a <Worksheet> element that maps to each of the pages in an Excel workbook. Each <Worksheet> contains a <Table> element, made up of a series of <Row > elements that contain multiple <Cell> elements. These <Cell> elements have attributes that point to the styles stored elsewhere in the document, the formula (if any) in the cell, and the value. Listing 25-2 shows a simple XSLT file that extracts the data and produces a simpler view of it (see Listing 25-3).

Listing 25-2: An XSLT transform of Excel

image from book
      <?xml version="1.0" encoding="UTF-8"?>      <xsl:stylesheet version="2.0"        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"        xmlns:xs="http://www.w3.org/2001/XMLSchema"        xmlns:fn="http://www.w3.org/2005/xpath-functions"        xmlns:xdt="http://www.w3.org/2005/xpath-datatypes"        xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">        <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>        <xsl:template match="/">          <data>            <xsl:attribute name="type">              <xsl:value-of select="/ss:Workbook/ss:Worksheet/@ss:Name"/>            </xsl:attribute>            <xsl:apply-templates select="/ss:Workbook/ss:Worksheet/ss:Table/ss:Row"/>          </data>        </xsl:template>        <xsl:template match="ss:Row">          <row>            <xsl:apply-templates select="ss:Cell"/>          </row>        </xsl:template>        <xsl:template match="ss:Cell">          <col>            <xsl:value-of select="ss:Data"/>            </col>        </xsl:template>      </xsl:stylesheet> 
image from book

The transform is slightly complicated by the need to include the namespaces used by the document. In particular, the urn:schemas-microsoft-com:office:spreadsheet URI is used to map the elements in the spreadsheet file.

Listing 25-3: After transform

image from book
      <?xml version="1.0" encoding="UTF-8"?>      <data xmlns:fn="http://www.w3.org/2005/xpath-functions"        xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"        xmlns:xdt="http://www.w3.org/2005/xpath-datatypes"        xmlns:xs="http://www.w3.org/2001/XMLSchema"        type="Annuity">      <row>               <col>Annuity investment</col>               <col></col>      </row>      <row>               <col>Present value</col>               <col>10000</col>               <col>Value after 7 years</col>               <col>14429.627004010901</col>               <col></col>      </row>      <row>               <col>Interest rate</col>               <col>5.2499999999999998E-2</col>               <col>Monthly payment after 7 years</col>               <col>62.854629140059927</col>               <col></col>      </row>               <row>               <col>Term (in years)</col>               <col>20</col>               <col>Value after 20 years</col>               <col>28511.140205640702</col>               <col></col>      </row>      <row>               <col>Contribution each month (reinvested interest)</col>               <col>1</col>               <col>Monthly payment after 20 years</col>               <col>124.1928944863005</col>               <col></col>               <col></col>      </row>      ...      </data> 
image from book

For more information on the Excel XML format, see the reference schemas page noted in the Resources section that follows.

Editing XML documents

You can also use Excel to edit XML documents, starting with the 2003 edition. This is especially useful when the XML documents require some calculations or intermediate functions. In addition, many office workers are much more comfortable working in Excel than with XML. Providing a familiar interface increases the likelihood of their actually using the application.

To use Excel to edit XML, you must first set up a mapping. This mapping identifies the cells that map to the elements in an XML file or XSD schema. Figure 25-2 shows an Excel file for a purchase order before any mapping has been done.

image from book
Figure 25-2

Before using this file to edit or create XML files, you must add a mapping. The mapping can be done to a sample XML file or to an XML schema. In this case, it is mapped to a simple purchase order XSD file (see Listing 25-4).

Listing 25-4: A Purchase order schema

image from book
      <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">        <xsd:annotation>          <xsd:documentation xml:lang="en">           Purchase order schema for Example.com.           Copyright 2000 Example.com. All rights reserved.        </xsd:documentation>      </xsd:annotation>      <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>      <xsd:element name="comment" type="xsd:string"/>      <xsd:complexType name="PurchaseOrderType">        <xsd:sequence>          <xsd:element name="shipTo" type="USAddress"/>          <xsd:element name="billTo" type="USAddress"/>          <xsd:element ref="comment" minOccurs="0"/>          <xsd:element name="items"  type="Items"/>        </xsd:sequence>        <xsd:attribute name="orderDate" type="xsd:date"/>      </xsd:complexType>      <xsd:complexType name="USAddress">        <xsd:sequence>          <xsd:element name="name"   type="xsd:string"/>          <xsd:element name="street" type="xsd:string"/>          <xsd:element name="city"   type="xsd:string"/>          <xsd:element name="state"  type="xsd:string"/>          <xsd:element name="zip"    type="xsd:decimal"/>        </xsd:sequence>        <xsd:attribute name="country" type="xsd:NMTOKEN"                       fixed="US"/>      </xsd:complexType>      <xsd:complexType name="Items">        <xsd:sequence>          <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">            <xsd:complexType>              <xsd:sequence>                <xsd:element name="productName" type="xsd:string"/>                <xsd:element name="quantity">                  <xsd:simpleType>                   <xsd:restriction base="xsd:positiveInteger">                     <xsd:maxExclusive value="100"/>                   </xsd:restriction>                  </xsd:simpleType>                </xsd:element>                <xsd:element name="USPrice"  type="xsd:decimal"/>                <xsd:element ref="comment"   minOccurs="0"/>                <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>              </xsd:sequence>              <xsd:attribute name="partNum" type="SKU" use="required"/>            </xsd:complexType>          </xsd:element>        </xsd:sequence>      </xsd:complexType>      <!-- Stock Keeping Unit, a code for identifying products -->        <xsd:simpleType name="SKU">          <xsd:restriction base="xsd:string">            <xsd:pattern value="\d{3}-[A-Z]{2}"/>          </xsd:restriction>        </xsd:simpleType>      </xsd:schema> 
image from book

You create the mapping by first associating the XSD file with the document. Select Data image from book XML image from book XML Source to open the XML Source side bar. Because no existing maps are associated with this document, the sidebar is currently empty. Click the XML Maps button, and add the po.xsd file to the mapping. Because two top-level elements are in the schema, you must select purchaseOrder as the root node (see Figure 25-3).

image from book
Figure 25-3

After the XML Maps dialog has been closed, the new XML map appears in the sidebar, and you're ready to begin to map the XML data to the cells in the spreadsheet. Drag the fields over to the spreadsheet, dropping them into the appropriate cells. For example, drop the orderDate field onto cell E6, the quantity element onto the first cell in the Qty column, and so on. The cell is highlighted, and selecting highlighted cells also selects the matching element in the XML Source side panel (see Figure 25-4).

image from book
Figure 25-4

With the mapping in place, you can now use the spreadsheet as you normally would, work with formulas, add data, and so on. Table areas, such as the line item area in the purchase order, are treated as lists by Excel. This enables adding multiple rows.

Saving the XML can be done either by selecting Data, XML, Export, or by selecting the XML Data option when using the Save As dialog. This saves the XML (see Listing 25-5), and leaves no association to Excel.

Listing 25-5: An XML file created with Excel

image from book
      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>      <purchaseOrder orderDate="2006-04-01">        <shipTo country="USA">          <name>Foo deBar</name>          <street>123 Any Drive</street>          <city>New York</city>          <state>NY</state>          <zip>10012</zip>        </shipTo>        <billTo country="USA">          <name>Guy Some</name>          <street>985 Street Avenue</street>          <city>Redmond</city>          <state>WA</state>          <zip>98052</zip>        </billTo>        <items>          <item partNum="1234">            <productName>Widgets</productName>            <quantity>3</quantity>            <USPrice>50</USPrice>            <comment>The blue ones, if possible</comment>            <shipDate>2006-04-02</shipDate>          </item>          <item partNum="1121">            <productName>Gizmos</productName>            <quantity>23</quantity>            <USPrice>12.41</USPrice>            <comment>No rush on those</comment>            <shipDate>2006-04-06</shipDate>          </item>        </items>      </purchaseOrder> 
image from book

You won't frequently use Excel to create or edit XML. However, when the document requires a number of calculations as an intermediate step, Excel can provide a highly capable tool.




Professional XML
Professional XML (Programmer to Programmer)
ISBN: 0471777773
EAN: 2147483647
Year: 2004
Pages: 215

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net