XML::SAXDriver::Excel Perl Module


I've demonstrated how to generate XML data from a CSV file. Now, we're going to look at another SAX-based Perl module, XML::SAXDriver::Excel. As you may have already guessed from the name , this Perl module facilitates the conversion between standard Microsoft Excel spreadsheet files and XML. Why not just save the Microsoft Excel spreadsheet file as CSV and use the Perl module that we discussed in the previous section? Well, you can do that. However, one situation when this module comes in handy is when you have a large number of Microsoft Excel spreadsheet files to convert to XML; this can save you the trouble of converting a large number of files to CSV format. Let's take a look at an example of how to use this module.

Converting Microsoft Excel Spreadsheet Files to XML

Let's assume that your company has a financial application that tracks company expenses. This information is maintained for use by other applications within the company and is accessed by a reporting application that provides customized reports to users. As with any integration project, this data is best kept in XML format, so that any application (internal or even external to your company) that needs access to the data can easily retrieve it. The problem is that most accountants don't know XML, nor do they need to. One of their most widely used tools of choice is Microsoft Excel, which is used to store, manipulate, and save their data. However, they occasionally need to upload their data to a database server to share the data with those within the company, however, it may be physically located at different geographical sites. One of the tools that you've developed provides an upload capability; however, your application requires XML as input. So, you will need to convert the Microsoft Excel spreadsheet files to XML format.

What's the best way to do this? As I mentioned earlier, you can save each Microsoft Excel spreadsheet file as a CSV file and use the XML::SAXDriver:: CSV module. However, for large-scale conversions (that is, a large number of Microsoft Excel spreadsheet files), this would be too time consuming. Imagine how long it would take to open 100 Microsoft Excel spreadsheet files and save them as CSV; this is not an option. A better choice in this situation is to use the XML::SAXDriver::Excel Perl module to facilitate the conversion directly from a Microsoft Excel spreadsheet file to XML. Let's take a look at how to do this conversion with an example.

XML::SAXDriver::Excel Perl Module Example

The solution is actually pretty simple, especially after I just demonstrated the XML::SAXDriver::CSV Perl module. The XML::SAXDriver::Excel module has the same interface as the XML::SAXDriver::CSV module, thanks in part to SAX standard. The complexities of parsing the underlying Microsoft Excel spreadsheet file are hidden from the user .

The Microsoft Excel document that needs to be converted to XML is shown in Table 7.1. As you can see, it is a typical spreadsheet that contains expense or budgeting information. Note that the spreadsheet has column headings in the first row and several records of data.

Table 7.1. Microsoft Excel file containing business expenses. (Filename: Ch7_excel_input.xls)

Employee ID

Expense

Amount

Description

Date

Dept

Manager

32884

Hotel

$120

One night stay during training travel

1/5/2002

38

John Smith

29833

Restaurant

$50

Client entertainment

2/20/2002

11

Jane Wilson

10997

Gas

$25

Two day training travel expense

2/17/2002

38

John Smith

27342

Office Supplies

$45

Misc. Office Supplies

1/10/2002

94

Jonathan Dawson

To convert this Microsoft Excel file to XML, we exercise the XML:: SAXDriver::Excel Perl module in the same way we used the XML:: SAXDriver::CSV Perl module in the last example. The new program based on the XML::SAXDriver::Excel Perl module is shown in Listing 7.5. If you compare this listing to the XML::SAXDriver::CSV-based program shown back in Listing 7.3, you'll see that the programs are basically the same except for two changes. First, we need the following statement at the top of the file to load the XML::SAXDriver::Excel module instead of the XML::SAXDriver::CSV module:

 2.   use XML::SAXDriver::Excel; 

The second change between the programs occurs on line 7. In the previous example, we created a new XML::SAXDriver::CSV object; however, in this example, we need to create a new XML::SAXDriver::Excel object. So, we use the following line to accomplish this:

 7.   my $csv = XML::SAXDriver::Excel->new(); 

Note that the two changed lines are both shown in bold text in Listing 7.5. Other than that, the programs are identical. This is very powerful if you think about it. We added support to our application for a different input data type with literally a two-line change.

Listing 7.5 Program to convert a Microsoft Excel input file to XML. (Filename: Ch7_excel_xml_app.pl)
 1.    use strict;  2.    use XML::SAXDriver::Excel;  3.    use XML::Handler::YAWriter;  4.    use IO::File;  5.  6.    my $input_file = shift;  7.    my $csv = XML::SAXDriver::Excel->new();  8.    my $writer = XML::Handler::YAWriter->new(9.                 Output => IO::File->new(">ch7_excel_to_xml.xml"),  10.                Pretty => { PrettyWhiteIndent => 1,  11.                            PrettyWhiteNewline => 1  12.                }  13.); 14.  15.   $obj->parse(Source => {SystemId => $input_file},  16.          Handler => $writer,  17.          Declaration => {Version => '1.0'},  18.          Dynamic_Col_Headings => 1  19.); 

We could easily merge the last two examples together to support both input files within the same program. One approach would be to create two subroutines ”one to convert input CSV files and the other to convert input Microsoft Excel files. We could filter the input files based on the file name extension (that is, .csv or .xls) and then easily call the appropriate subroutine based on the input data type.

Microsoft Excel File to XML Conversion Program Output

When we run our Microsoft Excel spreadsheet file to XML conversion program, the file out.xml is generated. Listing 7.6 shows the contents of out.xml. As you can see, this XML file contains the information that was originally stored in the Microsoft Excel spreadsheet. Note that the application takes a single command-line argument ”the name of the input Microsoft Excel file.

Listing 7.6 Output XML generated by the conversion from Microsoft Excel to XML. (Filename: ch7_excel_to_xml.xml)
 <?xml version="1.0" encoding="UTF-8"?>  <records>    <record>      <Employee_ID>32884</Employee_ID>      <Expense>Hotel</Expense>      <Amount>120</Amount>      <Description>One night stay during training travel</Description>      <Date>1-5-02</Date>      <Department>38</Department>      <Manager>John Smith</Manager>    </record>    <record>     <Employee_ID>29833</Employee_ID>      <Expense>Restaurant</Expense>      <Amount>50</Amount>      <Description>Client entertainment</Description>      <Date>2-20-02</Date>      <Department>11</Department>      <Manager>Jane Wilson</Manager>    </record>    <record>      <Employee_ID>10997</Employee_ID>      <Expense>Gas</Expense>      <Amount>25</Amount>      <Description>Two day training travel expense</Description>      <Date>2-17-02</Date>      <Department>38</Department>      <Manager>John Smith</Manager>    </record>    <record>      <Employee_ID>27342</Employee_ID>      <Expense>Office Supplies</Expense>      <Amount>45</Amount>      <Description>Misc. Office Supplies</Description>      <Date>1-10-02</Date>      <Department>94</Department>      <Manager>Johnathan Dawson</Manager>    </record>  </records> 

Note that the root XML file element and its child element are named <records> and <record> , respectively. This is the default behavior of both the XML::SAXDriver::CSV and XML::SAXDriver::Excel modules.You can easily redefine these tags by setting the File_Tag and Parent_Tag properties as shown in the following, or by providing them as an argument to either the new() or the parse() functions.

 $obj->{File_Tag} = "Expense_Report";  $obj->{Parent_Tag} = "Expense"; 

Doing so, we'll appropriately use these properties for the output XML file elements.

 <Expense_Report>    <Expense>                </Expense>  </Expense_Report> 

Note that the rest of the <record> element's children directly correspond to the first row headings in the Microsoft Excel spreadsheet file.

We've now seen how easily you can convert both CSV and Microsoft Excel files to XML. Both the XML::SAXDriver::CSV and XML::SAXDriver:: Excel modules provide a simple, flexible interface.

As flexible as these modules are, there are times where we may need to extend the functionality of one of these modules to accomplish a more complex task. If so, then you'll need to develop your own custom event handler. I'll demonstrate how to write your own custom event handler in the next section.



XML and Perl
XML and Perl
ISBN: 0735712891
EAN: 2147483647
Year: 2002
Pages: 145

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net