7.6. Data analysis Dawn, the Worldwide Widget accountant, finds that XML has added benefits when she wants to analyze all the expense reports that were submitted to her. In the past, when all the employees submitted their Excel expense reports to her, Dawn had to do a lot of cutting and pasting to get any kind of summary reports. Now that they are submitting XML documents, she can create her own workbook to utilize a whole set of analysis tools. Employees submit their expense reports by e-mail. They attach XML documents named EXP_nnn_YYYYMMDD.xml, where nnn is their employee number, and YYYYMMDD is the end date of the expense report. Dawn creates an Excel workbook that consists of several worksheets. The first worksheet, entitled Raw Data, contains the map to import all the data from the expense reports. 7.6.1 Flattening XML data Dawn wants to import expense reports from several different employees into the same worksheet. The worksheet we created earlier in this chapter would not work, because it would overwrite all the non-repeating elements every time a new expense report was imported. Instead, Dawn needs a worksheet that will repeat all the normally non-repeating "header" information in each row, alongside the repeating expense item elements. You can think of this as flattening the hierarchy. We are extracting the elements with data content, regardless of the level they are at, and presenting them in adjacent cells in the same rows of the spreadsheet.[4] [4] In the punched card data processing days, this was called a unit record because it was a self-contained complete record of a transaction and therefore could be sorted and tabulated. To do this: - 1. Open a new, blank worksheet.
- 2. Create a new map based on the expenses.xsd schema, as described in 7.3.1, "Creating a map", on page 138.
- 3. This time, instead of dragging each element type onto the worksheet individually, select the expenseReport element type and drag it to the worksheet.
You now have a large list range with a column for each element type that can contain only data (i.e. that cannot contain elements). The column headings (element-type names) are shown in Table 7-1. Table 7-1. Column headings for flattened structure empName | empNum | purpose | deptCode | billToCode | periodFrom | periodTo | date | explanation | mileage | airFare | perDiem | lodging | auto | taxi | phone | businessMeals | code | amount | During import, the data in the non-repeating element types is repeated in each expense item row. To import several expense reports at once: - 1. Click Import on the XML submenu.
- 2. Select all the files in the raw data directory at once, using Shift+click or Ctrl+click.
- 3. Click Open.
This will import all the data from all of the expense reports, as shown in Figure 7-8. Figure 7-8. Flattened data from multiple expense reports (expense data analysis.xls) Caution | The flattened data cannot be exported using the map with which you imported it. Excel will warn that the map is not exportable because it contains denormalized data. However, you can export the data or calculations based on the data by creating a separate range of cells that reference the data using cell equations. This second range of cells can be mapped to a separate schema and exported. | 7.6.2 Reports and charts Now that the data is combined, Dawn can use summary reports and charts to analyze it, as shown in Figure 7-9 and Figure 7-10. When she wants to update the analysis, she can simply import new XML data (or refresh the current data) and the summary report and charts will automatically be updated. A working version of this worksheet is in expense data analysis.xls. Figure 7-9. Summary report Figure 7-10. Data analysis using charts |