Section 7.6.  Data analysis

Prev don't be afraid of buying books Next

7.6. Data analysis

Dawn, the Worldwide Widget accountant, finds that XML has added benefits when she wants to analyze all the expense reports that were submitted to her.

In the past, when all the employees submitted their Excel expense reports to her, Dawn had to do a lot of cutting and pasting to get any kind of summary reports. Now that they are submitting XML documents, she can create her own workbook to utilize a whole set of analysis tools.

Employees submit their expense reports by e-mail. They attach XML documents named EXP_nnn_YYYYMMDD.xml, where nnn is their employee number, and YYYYMMDD is the end date of the expense report.

Dawn creates an Excel workbook that consists of several worksheets. The first worksheet, entitled Raw Data, contains the map to import all the data from the expense reports.

7.6.1 Flattening XML data

Dawn wants to import expense reports from several different employees into the same worksheet. The worksheet we created earlier in this chapter would not work, because it would overwrite all the non-repeating elements every time a new expense report was imported. Instead, Dawn needs a worksheet that will repeat all the normally non-repeating "header" information in each row, alongside the repeating expense item elements.

You can think of this as flattening the hierarchy. We are extracting the elements with data content, regardless of the level they are at, and presenting them in adjacent cells in the same rows of the spreadsheet.[4]

[4] In the punched card data processing days, this was called a unit record because it was a self-contained complete record of a transaction and therefore could be sorted and tabulated.

To do this:

1. Open a new, blank worksheet.

2. Create a new map based on the expenses.xsd schema, as described in 7.3.1, "Creating a map", on page 138.

3. This time, instead of dragging each element type onto the worksheet individually, select the expenseReport element type and drag it to the worksheet.

You now have a large list range with a column for each element type that can contain only data (i.e. that cannot contain elements). The column headings (element-type names) are shown in Table 7-1.

Table 7-1. Column headings for flattened structure

empName

empNum

purpose

deptCode

billToCode

periodFrom

periodTo

date

explanation

mileage

airFare

perDiem

lodging

auto

taxi

phone

businessMeals

code

amount




During import, the data in the non-repeating element types is repeated in each expense item row.

To import several expense reports at once:

1. Click Import on the XML submenu.

2. Select all the files in the raw data directory at once, using Shift+click or Ctrl+click.

3. Click Open.

This will import all the data from all of the expense reports, as shown in Figure 7-8.

Figure 7-8. Flattened data from multiple expense reports (expense data analysis.xls)




Caution

The flattened data cannot be exported using the map with which you imported it. Excel will warn that the map is not exportable because it contains denormalized data. However, you can export the data – or calculations based on the data – by creating a separate range of cells that reference the data using cell equations. This second range of cells can be mapped to a separate schema and exported.




7.6.2 Reports and charts

Now that the data is combined, Dawn can use summary reports and charts to analyze it, as shown in Figure 7-9 and Figure 7-10. When she wants to update the analysis, she can simply import new XML data (or refresh the current data) and the summary report and charts will automatically be updated. A working version of this worksheet is in expense data analysis.xls.

Figure 7-9. Summary report




Figure 7-10. Data analysis using charts




Amazon


XML in Office 2003. Information Sharing with Desktop XML
XML in Office 2003: Information Sharing with Desktop XML
ISBN: 013142193X
EAN: 2147483647
Year: 2003
Pages: 176

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net