6.2 Loading XML into an Excel Spreadsheet


There are several different ways to load XML data into Excel. Some are useful mostly for quick exploration and maybe some editing, while others are more appropriate for creating spreadsheets that use XML as a data source that can be easily replaced with new data whenever appropriate. All of these mechanisms share a common approach for showing XML data in the spreadsheet, so it's worth taking a moment to examine how Excel handles XML structures before moving into the mechanics of importing data.

When Excel opens an XML file, it imports data from it. If you make changes to the XML file while Excel is working with the data it has imported from that file, changes to the XML will not be reflected in the Excel spreadsheet.

6.2.1 Tables and Trees

Excel, like all spreadsheets, is built on a grid. Information is organized into rows and columns, and this worksheet grid (as well as relationships among multiple worksheet grids in a workbook) is used to create cross-references between different sections of information. Within the grid, Excel is enormously flexible. Information doesn't have to follow neat table structures pricing data could, if desired, run diagonally down a spreadsheet. It's easier to work with ranges of information if it stays in a single row or column, though, so most spreadsheets combine table areas that contain raw data and then either tables of results or cells along the fringes of the tables.

XML has no built-in notion of a grid. While it's certainly possible to represent a spreadsheet's rows and columns of cells within a worksheet as XML (and Chapter 7 will explore how Microsoft's chosen to do this), there's no guarantee that any given XML document will neatly fit into the native structures of Excel. There are a few simple but critical conditions that must apply to XML documents for them to be used easily as source data for Excel:


Tree structures that produce rows

Excel works best on XML documents when they conform to its structural expectations. The root element of the XML document should act as the primary container for a table of information. Each of the child elements of the root element should represent a row. Each of the child elements (or attributes) of the row elements should represent a cell in the grid. Roughly, this looks like:

<table>   <row>     <cell-name1>...value...</cell-name1>     <cell-name2>...value...</cell-name2>     <cell-name3>...value...</cell-name3>     <cell-name4>...value...</cell-name4>   </row>   <row>     <cell-name1>...value...</cell-name1>     <cell-name2>...value...</cell-name2>     <cell-name3>...value...</cell-name3>     <cell-name4>...value...</cell-name4>   </row>   .... </table>

Excel also works well with cells expressed as attributes:

<table>   <row cell-name1="value" cell-name2="value" cell-name3="value"       cell-name4="value" />   <row cell-name1="value" cell-name2="value" cell-name3="value"       cell-name4="value" />   .... </table>

Attributes and elements can also be mixed:

<table>   <row cell-name3="value" cell-name4="value">     <cell-name1>...value...</cell-name1>     <cell-name2>...value...</cell-name2>   </row>   <row cell-name3="value" cell-name4="value">     <cell-name1>...value...</cell-name1>     <cell-name2>...value...</cell-name2>   </row>   .... </table>

Excel is pretty relaxed about the order in which these appear as well, as it uses the names of elements and attributes rather than their order when creating a map.

It is possible to extract portions of XML documents that look like these structures, even if the rest of the document looks different, but it does take a few extra steps.



Regular structure

When Excel works with an XML document, it represents the data as rows and columns. It's very difficult for Excel to determine which rows and columns to create if the data of the document isn't consistent. It does make a best effort, but there are limits. The occasional missing piece of information shouldn't cause drastic difficulties, but extra information may not be imported, and consistency makes results much more predictable.


No mixed content

One of XML's best features for working with documents is the ability to mix elements and text together freely. A classic simple use of mixed content is highlighting information in bold or italic:

<sentence><b>This is in bold</b> and <i>this is in italic</i>.</sentence>

Unfortunately, these structures fit very badly with Excel's view of XML data as cells in a grid. If you need to process XML data that includes mixed content, you should either use Word (which is designed to support it) or pre-process your XML to strip out the extra markup.


Schema for type information (optional)

While Excel doesn't require XML Schema files that describe the XML documents you use, schemas can be a very convenient tool both for describing the information that you'll be including in a spreadsheet to Excel and for sanity-checking the documents users work with in the Excel environment. If there isn't a schema, Excel makes a pretty good best effort to analyze data and guess what schema would be appropriate.


Limited depth

Excel does well with lists of information, but can really only present two levels of lists, representing rows and cells. If a document has many layers of lists, or uses elements containing elements with the same name (recursive markup, commonly used in lists), Excel will not be able to import all of the data.

Effectively, Excel only works well with a small subset of the many possible XML document structures. The Excel subset, however, is an extremely common subset in practice. Enormous amounts of data are available in XML formats that work well with Excel.

6.2.2 Opening XML Documents Directly

The standard Excel dialog box for opening files shows XML files (or files ending in the extension .xml) right along with Excel spreadsheets, as shown in Figure 6-1.

Figure 6-1. XML files appearing in the Excel Open dialog box
figs/oxml_0601.gif


Only one of the choices presented here is a traditional Excel spreadsheet, twoPlusTwo.xls. The other files are XML files. XML files that Excel knows belong to Microsoft Word (thanks to the mso-application processing instruction), the ch02-x series, are marked with the Word icon, while ch0601.xml, an Excel SpreadsheetML file, has the Excel icon. XML files using other vocabularies get a different icon. On my system, they get a Mozilla logo, but they may have a different logo on your system, depending on what XML-processing software you have installed.

Whatever logo appears, however, you can attempt to open any XML file. If the XML contains anything other than Excel's own SpreadsheetML, covered in Chapter 7, you'll see the dialog box shown in Figure 6-2.

Figure 6-2. Dialog box for choosing how to handle XML document importation
figs/oxml_0602.gif


If the XML document you open contains any elements named html, you won't see the dialog box shown in Figure 6-2. Instead, Excel will attempt to open it as an HTML document. It even seems to do this if the elements that look like HTML are in another namespace.


6.2.2.1 Opening documents as a list

We'll start with a simple XML document recording (imaginary) sales of books to explore how these different options work, shown in Example 6-1.

Example 6-1. A simple XML document for analysis in Excel
<?xml version="1.0" encoding="UTF-8"?> <sales>     <sale> <date>10/5/2003</date> <ISBN>0596005385</ISBN> <Title>Office 2003 XML Essentials</Title> <PriceUS>34.95</PriceUS> <quantity>200</quantity> <customer >Zork's Books</customer> </sale>     <sale> <date>10/5/2003</date> <ISBN>0596002920</ISBN> <Title>XML in a Nutshell, 2nd Edition</Title> <PriceUS>39.95</PriceUS> <quantity>90</quantity> <customer >Zork's Books</customer> </sale>     <sale> <date>10/5/2003</date> <ISBN>0596002378</ISBN> <Title>SAX2</Title> <PriceUS>29.95</PriceUS> <quantity>300</quantity> <customer >Zork's Books</customer> </sale>     <sale> <date>10/7/2003</date> <ISBN>0596005385</ISBN> <Title>Office 2003 XML Essentials</Title> <PriceUS>34.95</PriceUS> <quantity>10</quantity> <customer >Books of Glory</customer> </sale>     <sale> <date>10/7/2003</date> <ISBN>0596002920</ISBN> <Title>XML in a Nutshell, 2nd Edition</Title> <PriceUS>39.95</PriceUS> <quantity>25</quantity> <customer >Books of Glory</customer> </sale>     <sale> <date>10/7/2003</date> <ISBN>0596002378</ISBN> <Title>SAX2</Title> <PriceUS>29.95</PriceUS> <quantity>5</quantity> <customer >Books of Glory</customer> </sale>     <sale> <date>10/18/2003</date> <ISBN>0596002378</ISBN> <Title>SAX2</Title> <PriceUS>29.95</PriceUS> <quantity>15</quantity> <customer >Title Wave</customer> </sale>     <sale> <date>10/21/2003</date> <ISBN>0596002920</ISBN> <Title>XML in a Nutshell, 2nd Edition</Title> <PriceUS>39.95</PriceUS> <quantity>15</quantity> <customer >Books for You</customer> </sale>     </sales>

If you open this document from Excel and choose "Open as an XML List," you'll see the dialog box shown in Figure 6-3.

Figure 6-3. Excel's warning that no schema is in use
figs/oxml_0603.gif


If you just go ahead and click OK, Excel will look at the document, infer a schema for it, build a list based on that schema, and import the contents of the XML document into that list. You'll be rewarded with the spreadsheet result shown in Figure 6-4.

Figure 6-4. The XML document shown in Example 6-1 presented as an XML list in Excel
figs/oxml_0604.gif


Excel not only imports the data from the XML document, it uses the element and attribute names as list headers. The drop down tabs to the right of the list headers let you organize the information as you'd like, as shown in Figure 6-5.

Figure 6-5. Choosing a sort or filter from a drop-down
figs/oxml_0605.gif


If you choose "Sort Ascending," for instance, you'd see the list sorted by ISBN, as shown in Figure 6-6.

Figure 6-6. Sorting the data from the XML document by ISBN
figs/oxml_0606.gif


Excel also offers some basic functionality for totaling and averaging the contents of these lists. Right-clicking on the list anywhere inside the blue box brings up a menu. If you choose List Total Row, you'll see an extra row appear at the bottom of the list, as shown in Figure 6-7.

Figure 6-7. A total row added to the spreadsheet
figs/oxml_0607.gif


By default, Excel just does a sum of the right-most column. That's common practice for spreadsheets, though in this case it works badly, since the IDs aren't exactly addable. Clicking on the cells in the total row brings up a drop-down tab. Figure 6-8 shows the choices it offers.

Figure 6-8. Total row options
figs/oxml_0608.gif


For quantity, it might be nice to know the total number of units ordered. We'll select Count for ISBN so we know how many orders we have. Figure 6-9 shows the results.

Figure 6-9. Total row results
figs/oxml_0609.gif


This is somewhat useful, but odds are good that we want to be able to perform more sophisticated calculations on the information. Fortunately, we can access the information in the list from the rest of Excel. For starters, we might well want a column that provides the total cost of an order the quantity times the price. Because this is just Excel data, that's easily done. We'll add a "Total" header in cell H1, and then a formula, =D2*E2, in cell H2. If we copy that formula from H2 to cells H3-9, we get the results shown in Figure 6-10.

Figure 6-10. Total column results
figs/oxml_0610.gif


Because we put this column right next to the XML data, Excel added this column to the list, and gave it the same sort and total capabilities of the rest of the list. Formulas can reference this data from other workbooks or from non-adjacent cells, though they won't be built into the list the same way.

While Excel provides no means of referring to data in this list by list name and column, you can safely reference the range and have Excel automatically adjust if a user reloads the XML document or modifies the information.

This only works on a list that already has data in it. If you base your formulas on an empty list, the ranges won't expand properly. The last row of the list is an entry area, which Excel doesn't count when it adjusts ranges.


To show how to reference data, we'll create some formulas on Sheet2 that reference the range containing the XML data in Sheet1, as shown in Figure 6-11.

Figure 6-11. Calculations on the XML data
figs/oxml_0611.gif


If we go back to Sheet1, and right-click on the XML list area, the XML sub-menu lets you Import... new data. When we import the data in ch0602.xml, a slightly extended version of the same information, Excel presents the data as shown in Figure 6-12.

Figure 6-12. Adding more XML data
figs/oxml_0612.gif


It's the same data, with a few extra sales. If we now return to Sheet2, as shown in Figure 6-13, we can see that the sales figures and the formulas have updated smoothly.

Figure 6-13. Automatically updated calculations on the XML data
figs/oxml_0613.gif


Excel only does this updating when new data is imported, not when changes are made to the original file, so you should set user expectations appropriately. This is a very simple example, admittedly, but you can build much more sophisticated spreadsheet applications on these same principles.

Using sample documents to create a list this way is very convenient, but you should be aware that if you re-use the list on another XML document that contains more structures than appeared in the original document, those extra structures won't get imported.


6.2.2.2 Opening documents as a read-only workbook

If you're extracting data from XML documents, you may find it useful to open them as read-only workbooks. The presentation of the information is very different, and there's no option for exporting the XML back out of the spreadsheet, but more explicit information about where the information came from is provided in the header rows. If we open ch0601.xml and select "Open as a read-only workbook," we see the result shown in Figure 6-14.

Figure 6-14. Sales data loaded as a read-only workbook
figs/oxml_0614.gif


Information about where the data came from is provided in the XPath-like headers. The #agg information seems to be aggregated information, though in this case there's only one item per column.

The name "read-only workbook" is slightly misleading. You can make changes to the data, and you can save this file elsewhere. The "read-only" just means that you can't make changes to the original XML document using this approach; if you save the file, it's saved as an Excel workbook. It isn't nearly as flexible as the list approach, but it also lets you extract information from a wider variety of documents. Given its lack of flexibility and Microsoft's lack of documentation for the resulting format, this feature is probably best used only when you want to dump content from a document into Excel and don't mind doing a lot of organization yourself.

6.2.2.3 Using the XML source task pane

Opening an XML document using the XML Source task pane produces results that are much like the list created by opening the document as an XML list, but it allows you to have more control over what appears in the list and what doesn't. Many XML documents, for example, have header information followed by repeating sections. If opened directly as a list, Excel will produce a lot of columns that repeat the header information, when it really only appeared once. Using the XML Source task pane lets you choose what elements or attributes you want to appear in the Excel grid, and is especially useful when you only want to see or work with a subset of the information used in a document.

To show off the source task pane, we'll open ch0601.xml and select "Use the XML Source task pane." If, like ch0601.xml, the XML document doesn't contain a reference to an XML Schema, Excel displays the same warning that was shown in Figure 6-3, and then generates a schema based on what it finds in the document, producing the XML task pane contents shown in Figure 6-15.

Figure 6-15. Using the XML Source task pane to select XML document parts for display in Excel
figs/oxml_0615.gif


To put information on to the spreadsheet, click on items in the task pane and drag them over to the grid. If you drag the date over to cell A3, you'll see the result shown in Figure 6-16.

Figure 6-16. Adding a component from the XML Source pane adds a column, but not the data
figs/oxml_0616.gif


While we originally started out as if we were loading a document into Excel, Excel instead loaded the structure of the document rather than its contents. Using the XML Source task pane means building the structure you want in the spreadsheet from the parts in the XML document and then importing the XML document's content. If you drag more of them over and align them side by side, Excel will create a single large list, as shown in Figure 6-17 (If some parts of a document don't repeat, you can place them in cells that are not adjacent to the main body of the list.).

Figure 6-17. A list, created from the task pane
figs/oxml_0617.gif


Populating that list takes an extra step. If the List toolbar is visible (and you can find it at View Toolbars if it isn't already visible), you can click on the Import XML Data button, as the task pane advises, find your XML document, and import it. If the toolbar isn't visible, right-clicking on the list will bring up a menu with an XML entry. Select Import . . . from that menu, choose your XML document, and Excel will import the data. Figure 6-18 shows what Excel produces if you import ch0601.xml into this list.

Figure 6-18. A filled-in list, with the List toolbar turned on
figs/oxml_0618.gif


At this point, you can work with the list the same way you could when the list was loaded directly. One important feature of building the list this way that you don't get when documents are loaded directly is that you can also place non-repeating elements on the spreadsheet. Let's suppose the sales element also contained an element named store, identifying which store had these sales. Figure 6-19 shows the store element placed above the rest of the list, displaying the value of store once and only once.

Figure 6-19. A filled-in list, with a single element above the repeating portion of the list
figs/oxml_0619.gif


6.2.3 Working with XML Maps

Opening XML documents directly is a great way to get started or to quickly analyze information, but in the long run you'll probably want to build spreadsheets that take a more structured approach. The XML Source pane lets you define XML Maps, describing the relationships between XML document structures and the lists that actually appear in your spreadsheet. These maps are built on XML Schemas, though they may either be schemas you specify or, as the previous examples showed, schemas that Excel derived by example from documents.

Most interactions with XML Maps take place through the task pane's XML Source view, or through schemas or documents which you use as a foundation for the map. Once you've created a map, there isn't much you can do through the Excel interface to change its basic structures, so getting your schema right in the first place is a critical step in creating spreadsheets that work with XML.

6.2.3.1 Excel and XML Schema

The XML Schema Recommendation provides a much more comprehensive set of tools for describing information than Excel actually needs. As noted earlier, Excel is good primarily for interacting with certain kinds of document structures, so some document-oriented features of XML Schema (like types that use mixed content) don't work with Excel. Similarly, Excel has had its own set of types for internal consumption for over a decade, and retrofitting Excel with the complete XML Schema datatype system probably would not be wise. Microsoft uses a combination of existing types to support the larger XML Schema system, as shown in Table 6-1.

More information on creating schemas with a variety of tools is covered in Appendix C and Appendix D.


Table 6-1. Mappings between XSD datatypes and Excel datatypes

XSD Datatype

Excel Format

Limitations

time

h:mm:ss

If time zones are used, stored as text.

dateTime

m/d/yyyy h:mm (may vary with local versions of Excel)

No time zones. Excel doesn't understand years below 1900 or above 9999. If either of those violated, stored as text.

date

m/d/yyyy (may vary with local versions of Excel)

No time zones. Excel doesn't understand years below 1900 or above 9999. If either of those violated, stored as text.

gYear

Number, integers only

No time zones. Excel doesn't understand years below 1900 or above 9999. If either of those violated, stored as text.

gDay, gMonth

Number, integers only

If time zones are used, stored as text.

gYearMonth

mmm-yy

No time zones. Excel doesn't understand years below 1900 or above 9999. If either of those violated, stored as text.

gMonthDay

d-mmm

 

anyType, anyURI, base64Binary, duration, ENTITIES, ENTITY, hexBinary, ID, IDREF, IDREFS, language, Name, NCName, NMTOKEN, NMTOKENS, normalizedString, NOTATION, QName, string, token

Text

 

boolean

Boolean

 

decimal, float, double

General

Insignificant zeros will be dropped, and only negative signs will be displayed. All of these forms, despite their XSD differences, are used in calculations using 15 digits of precision.

byte, int, integer, long, negativeInteger, nonNegativeInteger, nonPositiveInteger, positiveInteger, short, unsignedByte, unsignedInt, unsignedLong, unsignedShort.

General

 


These differences mean that you should not expect Excel to keep close track of the validation specified by the schema. Excel will behave as it has always behaved, with a set of rules for mapping between Excel and XSD. Formats that are represented as text will be imported or exported as they appear, while formats that have a more complex type may be formatted by Excel in the spreadsheet and according to XSD rules in the XML.

Excel also performs similar simplifications on content models. Excel is not designed as an über-XML-document processor, and it doesn't need the structural type information provided by XML Schema. From Excel's perspective, it needs to know what data goes together as a row and in what columns. Simpler structures are more manageable, and far less likely to break. While there may be times you need to work with XML that arrived with a complex schema, it may be easier in such cases to break the documents into smaller pieces and use simpler schemas if possible.

6.2.3.2 Creating an XML Map

Although some of the techniques described earlier in Section 6.2.2Section 6.2.2 create XML Maps, there are times when you'll want to incorporate data from XML documents in an existing spreadsheet, and those techniques don't work as well for that.

To create an XML Map in an existing spreadsheet, you need to bring up the task pane (View Task Pane, if it's not already there) and select XML Source from the drop-down menu at the top of the task pane. Unless you've done XML work with this spreadsheet before, you should have an empty task pane, like the one shown in Figure 6-20.

Figure 6-20. The XML Source task pane, before any sources are listed
figs/oxml_0620.gif


To create an XML Map, click on the XML Maps... button. The XML Maps dialog box, shown in Figure 6-21, will appear, empty.

Figure 6-21. A fresh XML Maps dialog box
figs/oxml_0621.gif


To add a map, click the Add . . . button. For an example, we'll use a document structure that both fits the Excel grid approach and tests out how it handles a variety of the XML Schema datatypes listed Table 6-1, using a contract management system as a guide.

For more information about creating the XML Schemas that are the foundations for these maps, see Appendix C. You may want to explore the tools mentioned at the end of that Appendix for inferring schemas from documents in particular.


These are some pretty simple contracts, which are just about payments on birthdays, enough information to get a sense of how Excel treats different datatypes. The schema for the contract description is listed in Example 6-2.

Example 6-2. A simple schema for contracts
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"  targetNamespace="http://simonstl.com/ns/example/contract" xmlns:contract="http://simonstl. com/ns/example/contract">   <xs:element name="contracts">     <xs:complexType>       <xs:sequence>         <xs:element maxOccurs="unbounded" ref="contract:contract"/>       </xs:sequence>     </xs:complexType>   </xs:element>   <xs:element name="contract">     <xs:complexType>       <xs:sequence>         <xs:element ref="contract:recipient"/>         <xs:element ref="contract:signing_date"/>         <xs:element ref="contract:signing_time"/>         <xs:element ref="contract:birthyear"/>         <xs:element ref="contract:birthday"/>         <xs:element ref="contract:male"/>         <xs:element ref="contract:payment_amount"/>         <xs:element ref="contract:years_to_pay"/>       </xs:sequence>     </xs:complexType>   </xs:element>   <xs:element name="recipient" type="xs:string"/>   <xs:element name="signing_date" type="xs:date"/>   <xs:element name="signing_time" type="xs:time"/>   <xs:element name="birthyear" type="xs:gYear"/>   <xs:element name="birthday" type="xs:gMonthDay"/>   <xs:element name="male" type="xs:boolean"/>   <xs:element name="payment_amount" type="xs:decimal"/>   <xs:element name="years_to_pay" type="xs:integer"/> </xs:schema>

We'll use this schema to create an XML map by clicking the Add... button and selecting this schema from the browse dialog that appears. When the Multiple Roots dialog box shown in Figure 6-22 appears, select "contracts" from the list and click OK.

Figure 6-22. Choosing the root element for the map
figs/oxml_0622.gif


You'll be rewarded with the result shown in Figure 6-23, a new XML Map that is named contract_map, after the root element, which describes the namespace http://simonstl.com/ns/example/contract.

Figure 6-23. The XML Map, ready to go
figs/oxml_0623.gif


You may notice that your choices for manipulating this map are very limited. You can rename it or delete it, but you can make no changes. Chapter 7 will explore how you can, if necessary, make changes to XML Maps through SpreadsheetML's XML representation of them.


If you click OK, you'll see XML components ready to be used in the XML Source task pane, as shown in Figure 6-24.

Figure 6-24. XML components, ready for use
figs/oxml_0624.gif


The XML Source pane uses a number of icons to describe the structure of the XML document, much like those used to represent files and folders in the Windows Explorer. These icons are shown in Table 6-2.

Table 6-2. XML Source pane icons

Icon

Description

figs/icon0601.gif

Used to represent a container element that may appear one or many times, most typically the root element of the document. (Root elements aren't actually optional, but perhaps Excel does this to support the possibility of an empty map.)

figs/icon0602.gif

Used to represent a container element that may only appear once, often an element that contains attributes.

figs/icon0603.gif

Used to represent a container element that may appear repeatedly, most typically an element that represents rows.

figs/icon0604.gif

Used to represent a data element that must appear once and that contains data rather than other elements.

figs/icon0605.gif

Used to represent a data element that may appear once (or not all), which contains data rather than other elements. (The same icon, very slightly darker, is used for attributes.)

figs/icon0606.gif

Used to represent a data element that may appear multiple times. These often give Excel trouble as they often break out of the simple grid structure.

figs/icon0607.gif

Used to represent the value of an element, typically when the element also has an attribute or attributes. This allows you to put an element's content into the grid separately from the values of any attributes it may have.


If you drag the ns1:contract icon to cell A1, you'll get a list based on this map set up and ready for use, as shown in Figure 6-25.

Figure 6-25. A list based on the XML Map, ready for use
figs/oxml_0625.gif


I tend to clean these up and remove the "ns1" prefixes, as you'll see in later examples. You can also dismiss the XML Source pane, and bring it back up only if you need its "Verify Map for Export . . . " option.

If you select a cell in Row 2, and select Format Cells, you can see how Excel has formatted the data automatically. For example, if you do this to the signing date, you'll see the result in Figure 6-26.

Figure 6-26. Cell formatting applied by Excel to dates
figs/oxml_0626.gif


While Excel has used the schema to determine cell formatting, it is not currently using the datatypes in the schema for any kind of data validation. If you import an XML document (or type in data) that doesn't correspond to Excel's expectations, it will format it as text. To make Excel use the schema for validation which only happens on import and export in any event you need to right-click on the list, select the XML sub-menu, and then select XML Map Properties. The dialog box shown in Figure 6-27 will appear.

Figure 6-27. The XML Map Properties dialog box
figs/oxml_0627.gif


The "Validate data against schema for import and export" is always turned off by default. While that may seem strange in contexts where you want validation to check user data, it also avoids some odd problems. It's possible that users will want to import documents that have problems so that they can repair them. It's also possible that users will be so frustrated by a document that they want to send it to someone else to sort out, without being told they can't save the file.

We'll use two test documents to explore how this works. The first one, shown in Example 6-3, is a deliberately invalid XML document, with all kinds of data that doesn't match the datatypes used by the schema. The second, shown in Example 6-4, is a document that is valid against the schema we've used.

Example 6-3. An invalid document for use in the map
<contracts xmlns="http://simonstl.com/ns/example/contract">     <!--This document is NOT VALID.-->     <contract> <recipient>Jedidiah Smith</recipient> <signing_date>June 27, 1992</signing_date> <signing_time>4 PM</signing_time> <birthyear>62</birthyear> <birthday>23 November</birthday> <male>yes</male> <payment_amount>$27</payment_amount> <years_to_pay>two</years_to_pay> </contract>         <contract> <recipient>Jane Zinger</recipient> <signing_date>April 22, 2001</signing_date> <signing_time>6:30 PM</signing_time> <birthyear>75</birthyear> <birthday>19 July</birthday> <male>no</male> <payment_amount>$42</payment_amount> <years_to_pay>four</years_to_pay> </contract>     </contracts>

Example 6-4. A valid document for use in the map
<contracts xmlns="http://simonstl.com/ns/example/contract">     <!--This document is VALID.-->     <contract> <recipient>Josiah Smith</recipient> <signing_date>1999-06-03</signing_date> <signing_time>09:03:22</signing_time> <birthyear>1962</birthyear> <birthday>--06-21</birthday> <male>true</male> <payment_amount>0004002.00200</payment_amount> <years_to_pay>26</years_to_pay> </contract>     <contract> <recipient>Jane Zang</recipient> <signing_date>1999-04-03</signing_date> <signing_time>11:04:28</signing_time> <birthyear>1968</birthyear> <birthday>--04-23</birthday> <male>false</male> <payment_amount>000401.0200</payment_amount> <years_to_pay>2</years_to_pay> </contract>     </contracts>

If a user attempts to import Example 6-3 into this map, they'll get the list of warnings shown in Figure 6-28.

Figure 6-28. "Some data was imported as text" errors on import
figs/oxml_0628.gif


While this may dissuade some users, it doesn't sound like a big deal, and all those "Complete"s are pretty reassuring. The map also looks all right in Excel, if you aren't cued in to the formatting. Figure 6-29 shows the import results.

Figure 6-29. Bad results that look like they might be okay in Excel
figs/oxml_0629.gif


If you export this map, as shown in the next section, Excel goes right ahead with it. If you use the "Verify Map for Export" link on the XML Source task pane, Excel notifies you that "contract_map is exportable." The results of the export, shown in Example 6-5, make it clear that Excel has imported and exported the document, as it's added the ns1 prefix everywhere.

Example 6-5. The exported version of the broken document
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <ns1:contracts xmlns:ns1="http://simonstl.com/ns/example/contract">     <ns1:contract>         <ns1:recipient>Jedidiah Smith</ns1:recipient>         <ns1:signing_date>June 27, 1992</ns1:signing_date>         <ns1:signing_time>4 PM</ns1:signing_time>         <ns1:birthyear>62</ns1:birthyear>         <ns1:birthday>23 November</ns1:birthday>         <ns1:male>yes</ns1:male>         <ns1:payment_amount>$27</ns1:payment_amount>         <ns1:years_to_pay>two</ns1:years_to_pay>     </ns1:contract>     <ns1:contract>         <ns1:recipient>Jane Zinger</ns1:recipient>         <ns1:signing_date>April 22, 2001</ns1:signing_date>         <ns1:signing_time>6:30 PM</ns1:signing_time>         <ns1:birthyear>75</ns1:birthyear>         <ns1:birthday>19 July</ns1:birthday>         <ns1:male>no</ns1:male>         <ns1:payment_amount>$42</ns1:payment_amount>         <ns1:years_to_pay>four</ns1:years_to_pay>     </ns1:contract> </ns1:contracts>

If, however, we turn on "Validate data against schema for import and export," we'll get an extra error message, shown in Figure 6-30.

Figure 6-30. An error message produced by failed validation
figs/oxml_0630.gif


Close that window, however, and you have the same imported result shown in Figure 6-29. Excel is not very interested in blocking bad data. If you select the schema validation error and click Details . . . you'll see the information presented in Figure 6-31.

Figure 6-31. A report on schema validation failure
figs/oxml_0631.gif


This is somewhat more meaningful, but:

  • It only reports on the first of many errors it encountered.

  • It presents the element name in a form that's not familiar to many users.

  • The line, column, and offset information is inaccurate and useless.

Hopefully, future versions of Excel will provide better support for validation on import that is more helpful to users and more useful for developers. Similarly, exporting this document produces the same result already shown in Example 6-5, but produces a warning message, shown in Figure 6-32.

Figure 6-32. The error message from exporting an invalid XML document
figs/oxml_0632.gif


Hopefully users will see this and at least know there's a problem, and perhaps being able to export XML will make it easier for them to pass it to someone else who can clean it up before schema-dependent processing takes place.

If, on the other, we import Example 6-4, the valid document, we get no error messages and a spreadsheet whose formatting conforms to the expectations described in Table 6-1. Figure 6-33 shows this valid document after import into our list area.

Figure 6-33. A valid XML document, imported
figs/oxml_0633.gif


This document also exports perfectly well. Given good examples, users should be able to produce good results. If users get used to seeing error messages with no obvious ill effects, though, it may cause trouble down the road.



Office 2003 XML
Office 2003 XML
ISBN: 0596005385
EAN: 2147483647
Year: 2003
Pages: 135

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net