Hack 95 Load an XML Document into Excel

   

figs/moderate.gif figs/hack95.gif

If someone sends you an XML file containing data that fits into tables, you don't need to read the text and all its angle brackets. You can load the document into Excel directly, tell Excel how you want to present it, and work with the data through maps .

Extensible Markup Language (XML) has become a common interchange format over the past few years , and it's not unusual for people and organizations to send each other XML files. XML's simple core structures make it easy to exchange information with much less concern that all parties are using the same software. Until recently, however, although generic XML tools were widely available, bridging the gap between XML documents and the user interface was difficult. Excel 2003 makes it much easier, at least for data that fits on a grid.

This hack uses Excel features that are available only in Excel 2003 on Windows. Earlier versions of Excel do not support this, and neither do current or announced Macintosh versions of Excel.


We'll start with a sample XML document, shown in Example 8-1.

Example 8-1. A simple XML document for analysis in Excel
 <?xml version="1.0" encoding="UTF-8"?> <sales> <sale> <date>2003-10-05</date> <ISBN>0596005385</ISBN> <Title>Office 2003 XML Essentials</Title> <PriceUS>34.95</PriceUS> <quantity>200</quantity> <customer ID="1025">Zork's Books</customer> </sale> <sale> <date>2003-10-05</date> <ISBN>0596002920</ISBN> <Title>XML in a Nutshell, 2nd Edition</Title> <PriceUS>39.95</PriceUS> <quantity>90</quantity> <customer ID="1025">Zork's Books</customer> </sale> <sale> <date>2003-10-05</date> <ISBN>0596002378</ISBN> <Title>SAX2</Title> <PriceUS>29.95</PriceUS> <quantity>300</quantity> <customer ID="1025">Zork's Books</customer> </sale> <sale> <date>2003-10-05</date> <ISBN>0596005385</ISBN> <Title>Office 2003 XML Essentials</Title> <PriceUS>34.95</PriceUS> <quantity>10</quantity> <customer ID="1029">Books of Glory</customer> </sale> <sale> <date>2003-10-05</date> <ISBN>0596002920</ISBN> <Title>XML in a Nutshell, 2nd Edition</Title> <PriceUS>39.95</PriceUS> <quantity>25</quantity> <customer ID="1029">Books of Glory</customer> </sale> <sale> <date>2003-10-07</date> <ISBN>0596002378</ISBN> <Title>SAX2</Title> <PriceUS>29.95</PriceUS> <quantity>5</quantity> <customer ID="1029">Books of Glory</customer> </sale> <sale> <date>2003-10-18</date> <ISBN>0596002378</ISBN> <Title>SAX2</Title> <PriceUS>29.95</PriceUS> <quantity>15</quantity> <customer ID="2561">Title Wave</customer> </sale> <sale> <date>2003-10-21</date> <ISBN>0596002920</ISBN> <Title>XML in a Nutshell, 2nd Edition</Title> <PriceUS>39.95</PriceUS> <quantity>15</quantity> <customer ID="9021">Books for You</customer> </sale> </sales> 

You can open this directly from Excel 2003 by selecting File Open.... The dialog box shown in Figure 8-1 will appear.

Figure 8-1. Opening an XML file in Excel 2003
figs/exhk_0801.gif

If you select "As an XML list," you'll first be warned that Excel will be creating its own schema for this schema-free document, as shown in Figure 8-2. After clicking OK, you'll be rewarded with Excel's best guess as to how to present the information in the document as a spreadsheet, as shown in Figure 8-3.

Figure 8-2. Excel 2003 warning about lack of schema references
figs/exhk_0802.gif
Figure 8-3. XML data presented as an XML list in Excel 2003
figs/exhk_0803.gif

Note that Excel expected the date format used by the date element, so it is now displaying dates imported as 2003-10-05 as 10/5/2003.

Once you have the document loaded into Excel, you can treat the data much like you would any other data in Excelincorporating it into formulas, creating named ranges for it, making charts based on its contents, etc. To help you, Excel provides some built-in functionality for analyzing the data. The drop-down boxes in the column headers enable you to choose how to sort the data (the default is the order the document had originally). You also can turn on a total row, either from the List toolbar or by right-clicking anywhere on the list and selecting List Total Row from the pop-up menu. Once the total row appears, you can choose what kind of total you prefer from the drop-down menu displayed in Figure 8-4.

Figure 8-4. Choosing totals for an XML list in Excel 2003
figs/exhk_0804.gif

You also can refresh the data, updating that area with information from an XML document with the same structure. If you had another document with the same structure, you could right-click the list, select XML Import... from the pop-up menu, and choose a different document. With more data, it might look like Figure 8-5.

Figure 8-5. The same XML list with updated data
figs/exhk_0805.gif

In addition, you can export the data back to an XML file after editing it by right-clicking the list and selecting XML Export... from the pop-up menu. This makes Excel a very convenient editing tool for simple XML documents with tabular structures.

For simple data, you can usually trust Excel to guess what a file's contents are and use the default view it provides. As data gets more complicated, however, especially if it contains dates or text that looks like numbers (note the missing initial zeros on the ISBNs in the preceding figures!), you might want to use XML schemas to constrain how Excel reads your data and what kinds of data will fit in a given map. For this document, an XML schema might look like Example 8-2.

Example 8-2. A schema for the book sales data
 <?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">   <xs:element name="sales">     <xs:complexType>       <xs:sequence>         <xs:element maxOccurs="unbounded" ref="sale"/>       </xs:sequence>     </xs:complexType>   </xs:element>   <xs:element name="sale">     <xs:complexType>       <xs:sequence>         <xs:element ref="date"/>         <xs:element ref="ISBN"/>         <xs:element ref="Title"/>         <xs:element ref="PriceUS"/>         <xs:element ref="quantity"/>         <xs:element ref="customer"/>       </xs:sequence>     </xs:complexType>   </xs:element>   <xs:element name="date" type="xs:date"/>   <xs:element name="ISBN" type="xs:string"/>   <xs:element name="Title" type="xs:string"/>   <xs:element name="PriceUS" type="xs:decimal"/>   <xs:element name="quantity" type="xs:integer"/>   <xs:element name="customer">     <xs:complexType mixed="true">       <xs:attribute name="ID" use="required" type="xs:integer"/>     </xs:complexType>   </xs:element> </xs:schema> 

Note that the date element is defined as a date, and the ISBN element is defined as a string here, not an integer. If you start by opening this schema rather than the document, you can have Excel load the document while preserving the initial zero on the ISBN s. This time, you'll create a list before loading the XML document, starting from a blank worksheet.

You'll need to open the XML Source task pane. If the task pane isn't open already, open it by selecting View Task Pane or by pressing Control-F1. Then select XML Source from the drop-down box at the top of the task pane, and you'll see something such as that shown in Figure 8-6.

Figure 8-6. The XML Source task pane
figs/exhk_0806.gif

To load the schema, click the XML Maps... button. This will bring up the XML Maps dialog box, shown in Figure 8-7.

Figure 8-7. The XML Maps dialog box
figs/exhk_0807.gif

Click the Add... button to open the schema, and select the schema as shown in Figure 8-8.

Figure 8-8. Selecting an XML schema
figs/exhk_0808.gif

If your schema doesn't restrict documents to having only one possible starting element, Excel will ask you which element to use as the root element, as shown in Figure 8-9. Because the documents in this example start with the sales element, pick "sales."

Figure 8-9. Selecting a root element for the schema
figs/exhk_0809.gif

After you click OK, Excel warns about possible difficulties it might have in interpreting schemas in the dialog box shown in Figure 8-10. XML Schema is an enormous specification that supports a wide variety of structures that don't fit well with Excel's perspective on information, so Excel has some limitations.

Figure 8-10. Warning label for schema processing
figs/exhk_0810.gif

Excel will show that your schema has been added to the spreadsheet in the XML Maps dialog, which should look like that shown in Figure 8-11.

Figure 8-11. A schema loaded as an XML map
figs/exhk_0811.gif

If you click OK, you'll be returned to the main Excel interface, and the XML Source task pane will be populated with a diagram of the structure the schema described, such as that shown at the right of Figure 8-12. Now that you have the structure, you can lay out the list. The easiest way to do thisespecially with a small document such as this oneis to drag the sales icon to cell A1, producing the result in Figure 8-12.

Figure 8-12. An XML list created from the schema information in the XML Source task pane
figs/exhk_0812.gif

You also can drag items over individually, if you want to change the order or want to put different pieces in different places on the spreadsheet.


Now that you have a home for the data, it's time to populate it. You can either click the Import XML Data button on the list toolbar, or right-click the list and select XML Import. If you choose the file you used earlier (Example 8-1), you'll see a result such as that shown in Figure 8-13. Note the addition of the leading zeros to the ISBNs, which are now text, as they should be.

Figure 8-13. The XML list, populated with data
figs/exhk_0813.gif

Excel's support for XML maps and lists means you can create spreadsheets that work on data arriving in separate files with more flexibility than prior formats such as CSV or tab-delimited formats provided. Instead of having to be connected to a database to edit data interactively, a user can edit XML files while on an airplane and feed that XML to an appropriate consumer when he lands. Perhaps the best aspect of Excel's new XML features is their flexibility; so long as the data is in a structure that fits on a grid, Excel has very few rules about what kinds of XML it will accept. With a few mouse clicks and no programming, you can integrate XML data with your spreadsheets.

Simon St.Laurent



Excel Hacks
Excel Hacks: Tips & Tools for Streamlining Your Spreadsheets
ISBN: 0596528345
EAN: 2147483647
Year: 2003
Pages: 136

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net