6.3 Editing XML Documents in Excel


While Excel's powerful analysis tools make it an ideal application for processing the data found in XML documents, Excel's expectation that data must appear in a grid limits its capabilities as a general XML editor. If you need to create XML files that do fit Excel's interface, however, Excel may prove an excellent way to have users create XML documents without ever realizing that they're doing so. The first few steps are much like those used to load XML data into Excel spreadsheets, but the user is encouraged to add data and save the results. In this case, Excel serves as an editor for a relatively simple class of XML documents.

As an example, we'll use a document format that is designed to represent a portion of a forest, and used to generate a stand map. Stand maps are circular maps that represent one-fifth of an acre of land, as shown in Figure 6-34.

Figure 6-34. A stand map generated from an XML document
figs/oxml_0634.gif


Though you can't see the color in this book, you can get the general idea. Trees are measured from a center point in a forest, using their distance and their compass degree. The species and diameter at breast height (dbh) are also recorded, and there may be additional notes. The data behind the map is generally recorded as a table, often on paper. (The first stand map I made was on a four-foot circle of paper, recorded using markers, templates, a compass, and a ruler.) While stand maps only represent a small section of a forest, they can provide baseline information for comparing the different contents of different forests or sections of forests. For example, the forest shown in Figure 6-34 is largely maple, with some hemlocks and some large tulip poplars just outside the ring. The forest shown in Figure 6-35 is largely black locust, with other species mixed in.

Figure 6-35. A stand map of a different forest, also generated from an XML document
figs/oxml_0635.gif


Creating these maps is beyond the capabilities of Excel's charting functions (that's done using XSLT with some trigonometry extensions to generate Scalable Vector Graphics, or SVG), but Excel is very useful in this instance as a tool for collecting data. Laptops have become more and more common in the forest, as they're far more convenient than four-foot circular tables for collecting data.

The XML data format behind these maps is pretty simple. A sample is shown in Example 6-6.

Example 6-6. A description of a forest in XML
<forest xmlns="http://simonstl.com/ns/forest/">     <tree>   <species>STM</species>   <dbh>6</dbh>   <height>13</height>   <angle>6.5</angle>   <radius>39</radius> </tree>     <tree>   <species>SM</species>   <dbh>37.5</dbh>   <height>67</height>   <angle>12</angle>   <radius>38.5</radius> </tree>     <tree>   <species>H</species>   <dbh>31</dbh>   <height>63</height>   <angle>16</angle>   <radius>29</radius>   <note>snag</note> </tree>     <tree>   <species>SM</species>   <dbh>6</dbh>   <height>30</height>   <angle>42</angle>   <radius>52</radius> </tree> ...</forest>

The schema for this data is similarly simple, as shown in Example 6-7.

Example 6-7. The schema for forest map information
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"  targetNamespace="http://simonstl.com/ns/forest/" xmlns:forest="http://simonstl.com/ns/ forest/">   <xs:element name="forest">     <xs:complexType>       <xs:sequence>         <xs:element maxOccurs="unbounded" ref="forest:tree"/>       </xs:sequence>     </xs:complexType>   </xs:element>   <xs:element name="tree">     <xs:complexType>       <xs:sequence>         <xs:element ref="forest:species"/>         <xs:element ref="forest:dbh"/>         <xs:element ref="forest:height"/>         <xs:element ref="forest:angle"/>         <xs:element ref="forest:radius"/>         <xs:element minOccurs="0" ref="forest:note"/>       </xs:sequence>     </xs:complexType>   </xs:element>   <xs:element name="species" type="xs:NCName"/>   <xs:element name="dbh" type="xs:decimal"/>   <xs:element name="height" type="xs:decimal"/>   <xs:element name="angle" type="xs:decimal"/>   <xs:element name="radius" type="xs:decimal"/>   <xs:element name="note" type="xs:string"/> </xs:schema>

Most of the declarations that directly affect users' work are those at the bottom of the schema. The abbreviations for species are non-colonized names (NCNames), while the measurements are decimals and the notes are strings. Using this schema, we'll create a map and put a list into a spreadsheet that users can treat as a recording device for their measurements in the field.

Using the XML Source task pane, add a map to the spreadsheet, using the schema as a base. You'll need to select a root (forest), and then the task pane will be populated with choices for inclusion in the spreadsheet, as shown in Figure 6-36.

Figure 6-36. The XML Map for the forest XML document
figs/oxml_0636.gif


One mildly irritating feature of this map is the ns1 prefix Excel has applied to the element names. Fortunately, this is only an issue when you work with the map directly, as it can be edited out of the list headers with no harm to the data structure. Figure 6-37 shows what our new spreadsheet with edited headers looks like.

Figure 6-37. A spreadsheet for creating forest map XML
figs/oxml_0637.gif


Using this interface is pretty easy. Researchers just enter one row per tree, filling out the required species, dbh, height, angle, and radius, and adding a note if there's a reason. Figure 6-38 shows what this data entry process looks like.

Figure 6-38. Entering new forest information
figs/oxml_0638.gif


One especially nice feature of this spreadsheet is that the XML Source task pane isn't visible. There's no need for the people working with this interface to understand that they're doing anything at all unusual. The sorting and filtering features of the list are conveniences, but they don't interfere with the data entry. Tabbing from field to field works beautifully.

Also, there's an extra sheet here, the key sheet, which is itself an imported XML document. Because this mapping format is designed to be used around the world, in places that have very different species of trees, the species codes are stored in a separate document that is reference by the XSLT that generates the map. The developers of this spreadsheet have included that information as well. Mostly this is a convenience, to help humans remember codes, but it also opens the possibility that those same humans might use the spreadsheet to modify the codes and their resulting maps. (If you don't want to permit them to save the codes back out as XML, just cut the information and paste it back in outside of a list context.) Figure 6-39 shows the key tab.

Figure 6-39. Additional key information, also stored as XML
figs/oxml_0639.gif


Researchers working in the field can save their spreadsheets as Excel files, and it'll be simple enough to extract the XML information when they return to a place where they're analyzing them. If they want to extract the information in the field, say to generate a map, they can right-click on their data and choose Export . . . from the XML menu. The results of doing that with the data shown in Figure 6-38 are shown in Example 6-8.

Example 6-8. XML created through Excel's XML interfaces
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <ns1:forest xmlns:ns1="http://simonstl.com/ns/forest/">     <ns1:tree>         <ns1:species>P</ns1:species>         <ns1:dbh>14</ns1:dbh>         <ns1:height>40</ns1:height>         <ns1:angle>10</ns1:angle>         <ns1:radius>35</ns1:radius>     </ns1:tree>     <ns1:tree>         <ns1:species>SM</ns1:species>         <ns1:dbh>4</ns1:dbh>         <ns1:height>15</ns1:height>         <ns1:angle>12</ns1:angle>         <ns1:radius>40</ns1:radius>     </ns1:tree>     <ns1:tree>         <ns1:species>SM</ns1:species>         <ns1:dbh>20</ns1:dbh>         <ns1:height>50</ns1:height>         <ns1:angle>15</ns1:angle>         <ns1:radius>15</ns1:radius>     </ns1:tree>     <ns1:tree>         <ns1:species>BN</ns1:species>         <ns1:dbh>17</ns1:dbh>         <ns1:height>40</ns1:height>         <ns1:angle>22</ns1:angle>         <ns1:radius>27</ns1:radius>     </ns1:tree>     <ns1:tree>         <ns1:species>WO</ns1:species>         <ns1:dbh>19</ns1:dbh>         <ns1:height>40</ns1:height>         <ns1:angle>32</ns1:angle>         <ns1:radius>42</ns1:radius>     </ns1:tree>     <ns1:tree>         <ns1:species>SM</ns1:species>         <ns1:dbh>3</ns1:dbh>         <ns1:height>10</ns1:height>         <ns1:angle>37</ns1:angle>         <ns1:radius>12</ns1:radius>         <ns1:note>oddly angled</ns1:note>     </ns1:tree> </ns1:forest>

Excel has, unfortunately, applied the ns1 prefix to everything, but the information comes through clearly and can be processed by all the tools built around the format shown originally in Example 6-6.

Your data doesn't have to be this flat for Excel to be capable of editing it. It could, for instance, look like the data in Example 6-9.

Example 6-9. A description of a forest in XML with some gratuitous structure
<forest xmlns="http://simonstl.com/ns/forest/">     <tree>   <details>     <species>STM</species>     <dbh>6</dbh>     <height>13</height>   </details>   <location>     <angle>6.5</angle>     <radius>39</radius>   </location> </tree>     <tree>   <details>     <species>SM</species>     <dbh>37.5</dbh>     <height>67</height>   </details>   <location>     <angle>12</angle>     <radius>38.5</radius>   </location> </tree>     <tree>   <details>     <species>H</species>     <dbh>31</dbh>     <height>63</height>     <note>snag</note>   </details>   <location>     <angle>16</angle>     <radius>29</radius>   </location> </tree>     <tree>   <details>     <species>SM</species>     <dbh>6</dbh>     <height>30</height>   </details>   <location>     <angle>42</angle>     <radius>52</radius>   </location> </tree> ...</forest>

You could edit it in a spreadsheet that looked just like Figure 6-38. Excel doesn't mind the extra container elements at all, so long as they don't interfere with its expectations for repeating list content.

If a map only represents part of an XML document, and you export it back to XML, only the parts of the XML document that were shown by the map will be exported. Don't try to use Excel to edit tables in larger documents, for instance!




Office 2003 XML
Office 2003 XML
ISBN: 0596005385
EAN: 2147483647
Year: 2003
Pages: 135

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net