SAX2 | XML and Perl

SAX2 is the successor to the SAX1 standard. Yes, it is a standard even though it is not published by W3C. It's an XML community standard and, therefore, is well-defined and easily adapted to your needs. SAX2 has become the preferred way to stream process XML.

The SAX1 standard was usable by many applications, but many programs required more information from the parser than SAX1 provided. SAX2 was created to close this gap by exposing more parsing information to the application.

Most SAX development today revolves around the SAX2 specification. Probably the only development you will see in the SAX1 arena is the upgrading and porting of existing code to the SAX2 interface. Perl's SAX2 XML development has accelerated thanks to its relatively small but very dedicated community of module developers. As you will see through the rest of this chapter, SAX2 modules are very easy to use considering the power and flexibility they offer.

XML::SAX Perl Module

As the SAX standard evolved, several Perl modules were developed that provided a SAX interface to the XML parser. If you were to look on the Comprehensive Perl Archive Network (CPAN) now, you would find a variety of XML modules as well as a few SAX parsers. Even though SAX is a well-defined standard, it is difficult to make sure that every Perl module complies with the latest specifications and provides a common API. To help solve this dilemma, three very generous individuals (Matt Sergeant, Robin Berjon, and Kip Hampton) donated their valuable time to develop the next-generation SAX facility, the XML::SAX package.

Note

If you ever happen to run into one of the module developers, just scream "DAAHUUT!" and you'll feel right at home. It's the official saying of Perl XML module developers, which evolved from an IRC channel.

XML::SAX itself acts as a high-level interface to the SAX2-compliant parsers. The XML::SAX module comes prepackaged with a few modules that provide different functionality. They are the platform for SAX module and application development.

XML::SAX::Base

If you are writing a SAX parser, driver, handler, or filter, this is the class you need to become familiar with to make your software compliant with Perl SAX2. XML::SAX::Base is a base class that can be inherited in your program. It eliminates a lot of redundancy and hassle in writing some of the common SAX handlers and routines. To inherit from it, you would simply insert the following line in your code.

 use base qw(XML::SAX::Base);

This gives your program default SAX2 functions and properties. Now, when you develop your own handlers or any other functions that need customization, you can use the following template technique:

 package MyHandler;
 use base qw(XML::SAX::Base);

 sub start_element {
     my $self = shift;
     # Do your processing here
     $self->SUPER::start_element(@_);
 }

For more information on XML::SAX::Base, please see the perldoc XML::SAX::Base documentation. If you are interested in developing SAX2 modules, you can find developer's documentation at http://www.perlxml.net/sax.
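To make the template a little more concrete, here is a minimal sketch of a handler that inherits from XML::SAX::Base and counts how often each element occurs. The package name ElementCounter, the count field, and the inline sample document are assumptions made for illustration; the sketch also assumes that the XML::SAX distribution (which bundles XML::SAX::PurePerl) is installed.

```perl
use strict;
use warnings;

package ElementCounter;               # hypothetical handler name
use base qw(XML::SAX::Base);

sub start_document {
    my ($self, $doc) = @_;
    $self->{count} = {};              # reset the counters for each document
    $self->SUPER::start_document($doc);
}

sub start_element {
    my ($self, $element) = @_;
    $self->{count}{ $element->{Name} }++;
    # Pass the event on so that any downstream handler still sees it.
    $self->SUPER::start_element($element);
}

package main;
use XML::SAX::PurePerl;

my $handler = ElementCounter->new();
my $parser  = XML::SAX::PurePerl->new(Handler => $handler);
$parser->parse_string('<a><b/><b/></a>');

my %count = %{ $handler->{count} };
print "$_: $count{$_}\n" for sort keys %count;
```

Because every overridden method ends by calling its SUPER:: counterpart, the same class can sit in the middle of a chain as a filter without swallowing events.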

XML::SAX::ParserFactory

If you are familiar with database processing in Perl and have used the DBI module, you know that the same problem existed in that world because of the diversity of databases and the drivers needed to access them. The DBI module solved that problem by providing a common interface to all the database-specific drivers. As a result, the majority of databases in use today can be accessed and queried through the same API, which is why DBI is known as the "Database Independent Interface." The XML::SAX::ParserFactory module was born out of the same methodology. It is used to return a SAX2 parser. Every time you install a SAX2-compliant parser, it registers itself in a configuration file named ParserDetails.ini (which is installed with XML::SAX). XML::SAX reads this file from the most recently installed parser to the earliest installed parser.

XML::SAX::ParserFactory lets you choose the parser it returns in three different ways.

Setting the $XML::SAX::ParserPackage variable. It should be assigned the package name and can also contain the minimum version number.
```
 $XML::SAX::ParserPackage = "XML::LibXML::SAX::Parser (1.0)";
```

Using the require_feature function to set the features that the parser must support. This method queries all the registered parsers and returns the most recently installed parser that supports the requested feature(s), as shown in the following:

 use XML::SAX::ParserFactory;

 my $factory = XML::SAX::ParserFactory->new();
 $factory->require_feature('http://xml.org/sax/features/validation');
 my $parser = $factory->parser(...);

 # Or you can specify the required features within the new function:
 $factory = XML::SAX::ParserFactory->new(
     RequiredFeatures => {
         'http://xml.org/sax/features/validation' => 1,
     },
 );

Creating a SAX.ini file and listing the requirement information in it. Here is a sample line in SAX.ini:
```
 ParserPackage = XML::SAX::Expat (0.30) 
```
The information is written in foo = bar format. If you want a parser with certain features, you can use a line such as this:
```
 http://xml.org/sax/features/validation = 1 
```
The XML::SAX::ParserFactory object searches for the SAX.ini file in @INC and uses that if found.
If none of the above is specified, ParserFactory returns the most recently installed SAX parser on the system. XML::SAX comes bundled with XML::SAX::PurePerl, so you will always have this parser to fall back on in case no other parser is installed.

If ParserFactory cannot find a parser that meets the criteria you have set, it throws an exception.
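Because the factory throws an exception when it cannot satisfy a request, it can be worth trapping that error where the parser is created. The following is a minimal sketch, assuming the XML::SAX distribution is installed; it selects the bundled pure-Perl parser through $XML::SAX::ParserPackage, and the error message wording is only illustrative.

```perl
use strict;
use warnings;
use XML::SAX::ParserFactory;

# Request the bundled pure-Perl parser by package name.
$XML::SAX::ParserPackage = "XML::SAX::PurePerl";

# parser() dies if the requested parser cannot be loaded, so trap the error.
my $parser = eval { XML::SAX::ParserFactory->parser() };
die "Could not obtain a SAX2 parser: $@" unless $parser;

print "Using parser class: ", ref($parser), "\n";
```

The rest of the application only ever sees $parser, so switching to a different SAX2 parser later means changing the package name and nothing else.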

You now see the significance of the XML::SAX module and how it can benefit SAX application development. Let's take a look at how we can use this facility in a real application.

XML::SAX::PurePerl Perl Module

XML::SAX::PurePerl is a SAX2 parser that is written entirely in Perl. Most of the other modules have C library dependencies that are usually hidden from the user. Why would it matter that the module is written entirely in Perl? Well, you could encounter a few situations where this would be very important.

What would happen if some crisis popped up and you needed to perform some XML parsing on a mission-critical web server that was currently live and servicing customers? Assume that the machine has Perl installed (of course) but doesn't have a C compiler; it has been removed to discourage users from developing on the machine and to free up disk space. Also, assume that you don't have administrative privileges on the machine and that the system administrator is on vacation. If you think about it, this isn't an unrealistic situation, and something similar has probably happened to all of us. A server such as this could also reside in the company DMZ (physically in your building, but outside the corporate firewall). This would be the ideal situation in which to use the XML::SAX::PurePerl module. Because XML::SAX::PurePerl doesn't have the benefit of a C library, it is extremely slow and poorly suited to multi-user production environments. However, in a situation such as the one just described, performance probably isn't the most important measure of success. Sometimes, it is more important to get the job done than it is to get the job done quickly.

XML::SAX::PurePerl Application

Let's take a look at an example using XML::SAX::PurePerl. Assume that your company uses a time reporting system and that every Friday it generates an XML file that contains the timecard information for all employees. In this example, we'll use the XML::SAX::PurePerl Perl module to generate the required summary report.

First, let's take a look at the DTD and XML schema for the XML document we'll need to process. The DTD is shown in Listing 3.11. As you can see, it basically contains all the information that might appear on a typical timecard. The root element is <timecard_report>, and it is made up of multiple <employee> elements. Each <employee> element has two attributes (name and employee_num) and a child element (<project>). The <project> element contains the <project_number> element and the <hours_charged> element.

Listing 3.11 DTD for the timecard report XML document. (Filename: ch3_pureperl_timecard_report.dtd)

 <?xml version="1.0" encoding="UTF-8"?>
 <!ELEMENT timecard_report (employee)*>
 <!ELEMENT employee (project)*>
 <!ATTLIST employee
    name CDATA #REQUIRED
    employee_num CDATA #REQUIRED>
 <!ELEMENT project (project_number, hours_charged)>
 <!ELEMENT project_number (#PCDATA)>
 <!ELEMENT hours_charged (#PCDATA)>

The XML schema for the timecard report XML document is shown in Listing 3.12.

Listing 3.12 XML schema for the timecard report XML document. (Filename: ch3_pureperl_timecard_report.xsd)

 <?xml version="1.0" encoding="UTF-8"?>
 <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:element name="employee">
       <xs:complexType>
          <xs:sequence minOccurs="0" maxOccurs="unbounded">
             <xs:element ref="project"/>
          </xs:sequence>
          <xs:attribute name="name" type="xs:string" use="required"/>
          <xs:attribute name="employee_num" type="xs:string" use="required"/>
       </xs:complexType>
    </xs:element>
    <xs:element name="hours_charged" type="xs:float"/>
    <xs:element name="project">
       <xs:complexType>
          <xs:sequence>
             <xs:element ref="project_number"/>
             <xs:element ref="hours_charged"/>
          </xs:sequence>
       </xs:complexType>
    </xs:element>
    <xs:element name="project_number" type="xs:string"/>
    <xs:element name="timecard_report">
       <xs:complexType>
          <xs:sequence minOccurs="0" maxOccurs="unbounded">
             <xs:element ref="employee"/>
          </xs:sequence>
       </xs:complexType>
    </xs:element>
 </xs:schema>

As with the other examples, the XML document shown in Listing 3.13 isn't very complex, but it does have some of our data stored as both element attributes and as character data.

Listing 3.13 Company timecard information in XML. (Filename: ch3_pureperl_timecard.xml)

 <?xml version="1.0" encoding="UTF-8"?>
 <timecard_report>
    <employee name="Mark" employee_num="123">
       <project>
          <project_number>100-A</project_number>
          <hours_charged>19</hours_charged>
       </project>
       <project>
          <project_number>100-B</project_number>
          <hours_charged>21</hours_charged>
       </project>
    </employee>
    <employee name="Ilya" employee_num="129">
       <project>
          <project_number>101-A</project_number>
          <hours_charged>45</hours_charged>
       </project>
    </employee>
    <employee name="Alyse" employee_num="626">
       <project>
          <project_number>105-B</project_number>
          <hours_charged>43</hours_charged>
       </project>
    </employee>
    <employee name="Ed" employee_num="120">
       <project>
          <project_number>100-A</project_number>
          <hours_charged>10</hours_charged>
       </project>
       <project>
          <project_number>100-C</project_number>
          <hours_charged>12</hours_charged>
       </project>
    </employee>
 </timecard_report>

Our task is to generate the report shown in Listing 3.14. Note that the report contains a repeating section for each employee, and that each employee can charge to multiple projects. Also, at the end of the report, we'll need to provide a project summary showing the total number of hours charged to each project.

Listing 3.14 Contents of the output timecard report. (Filename: ch3_pureperl_report.txt)

 Timecard Report
 ---------------

 Name: Mark
 Employee Number: 123
 Project Number: 100-A Hours: 19
 Project Number: 100-B Hours: 21
 Total Hours: 40

 Name: Ilya
 Employee Number: 129
 Project Number: 101-A Hours: 45
 Total Hours: 45

 Name: Alyse
 Employee Number: 626
 Project Number: 105-B Hours: 43
 Total Hours: 43

 Name: Ed
 Employee Number: 120
 Project Number: 100-A Hours: 10
 Project Number: 100-C Hours: 12
 Total Hours: 22

 Project Summary
 ---------------
 Project 100-A charged a total of 29 hours
 Project 100-B charged a total of 21 hours
 Project 100-C charged a total of 12 hours
 Project 101-A charged a total of 45 hours
 Project 105-B charged a total of 43 hours

How do we do this? Please take a look at the program shown in Listing 3.15. It is similar to the XML::Parser::PerlSAX example; however, there are some important (and sometimes subtle) differences between the two approaches that you should know about.

Listing 3.15 Program built using the XML::SAX::PurePerl module that will parse the input XML document containing weekly timecard information. (Filename: ch3_pureperl_app.pl)

 1.   use strict;
 2.   use XML::SAX::ParserFactory;
 3.
 4.   $XML::SAX::ParserPackage = "XML::SAX::PurePerl";
 5.
 6.   my $handler = MyHandler->new();
 7.   my $parser = XML::SAX::ParserFactory->parser(Handler => $handler);
 8.   my $inputXmlFile = shift || "ch3_pureperl.xml";
 9.   my %parser_args = (Source => {SystemId => $inputXmlFile});
 10.   $parser->parse(%parser_args);
 11.
 12.   package MyHandler;
 13.   my ($current_element, $total, %projectHours, $currentProj);
 14.
 15.   sub new {
 16.     my $type = shift;
 17.     return bless {}, $type;
 18.   }
 19.
 20.   # start_document event handler
 21.   sub start_document {
 22.     my $key;
 23.
 24.     print "\nTimecard Report\n";
 25.     print "---------------\n";
 26.   }
 27.
 28.   sub start_element {
 29.     my ($self, $element) = @_;
 30.
 31.     # Set $current_element to the current element.
 32.     $current_element = $element->{Name};
 33.
 34.     my %atts = %{$element->{Attributes}};
 35.     my $numAtts = keys(%atts);
 36.
 37.     # Check to see if this element has attributes.
 38.     if ($numAtts > 0) {
 39.       print "\n";
 40.
 41.       my ($thisAtt, $val);
 42.       for my $key (keys %atts) {
 43.
 44.         # Use this to look for a particular attribute.
 45.         if ($atts{$key}->{Name} eq 'name') {
 46.           print "Name: $atts{$key}->{Value}\n";
 47.         }
 48.         elsif ($atts{$key}->{Name} eq 'employee_num') {
 49.           print "Employee Number: $atts{$key}->{Value}\n";
 50.         }
 51.       }
 52.     }
 53.   }
 54.
 55.   # characters event handler
 56.   sub characters {
 57.     my ($self, $characters) = @_;
 58.
 59.     my $char_data = $characters->{Data};
 60.
 61.     # Remove leading and trailing whitespace.
 62.     $char_data =~ s/^\s*//;
 63.     $char_data =~ s/\s*$//;
 64.
 65.     if (length($char_data)) {
 66.
 67.       # Look for a particular element.
 68.       if (($current_element eq 'project_number')) {
 69.         print "Project Number: $char_data\t";
 70.         $currentProj = $char_data;
 71.       }
 72.       elsif ($current_element eq 'hours_charged') {
 73.         if (exists ($projectHours{$currentProj})) {
 74.           $projectHours{$currentProj} += $char_data;
 75.         }
 76.         else {
 77.           $projectHours{$currentProj} = $char_data;
 78.         }
 79.
 80.         print "Hours: $char_data\n";
 81.
 82.         # Increment the total hours here.
 83.         $total += $char_data;
 84.       }
 85.     }
 86.   }
 87.
 88.   # end_element event handler
 89.   sub end_element {
 90.     my ($self, $element) = @_;
 91.
 92.     if ($element) {
 93.
 94.       # We're at the end of an employee element, so
 95.       # print out the total hours for this employee
 96.       # and reset the scalar $total to 0.
 97.       if (($element->{Name}) eq 'employee') {
 98.         print "Total Hours: $total\n";
 99.         $total = 0;
 100.       }
 101.     }
 102.   }
 103.
 104.   # end_document event handler
 105.   sub end_document {
 106.     my $key;
 107.
 108.     # Print out any summary information here.
 109.     print "\nProject Summary\n";
 110.     print "---------------\n";
 111.     foreach $key (sort keys %projectHours) {
 112.       print "Project $key charged a total of $projectHours{$key} hours\n"
 113.     }
 114.   }

Now that I have presented the Perl program built around the XML::SAX::PurePerl module, let's take a walk through the event handlers so that you understand what is happening in this example.

XML::SAX::PurePerl Application Discussion

You will see a lot of similarity between the use of XML::Parser::PerlSAX and XML::SAX::PurePerl. First, as always, we start off with the use strict pragma; you should use it (and ideally use diagnostics as well) in all your Perl programs. For this example, we're going to do something a little different and use the XML::SAX::ParserFactory module. The XML::SAX::ParserFactory module acts as a front end for other modules and provides your application with a Perl SAX2 XML parser module. Whenever you install a SAX2 parser on your machine, it registers with XML::SAX and is then available to all applications through XML::SAX::ParserFactory. The advantage of using the XML::SAX::ParserFactory module is that it provides a single interface to all your SAX2 parser modules.

Note

Please see perldoc XML::SAX::ParserFactory for additional information.

Initialization

1-18 In the opening section of our program, we tell XML::SAX::ParserFactory that we want to use the XML::SAX::PurePerl module. By doing this, the call to XML::SAX::ParserFactory->parser() returns an instance of the desired parser module (in our case, the PurePerl module). Now that we have an instance of the PurePerl module available to us in the $parser object, we can pass in any additional information that is required (for example, the name of the file containing the XML data) in a hash of key-value pairs.

The next section is the beginning of the inline package definition. As with the XML::Parser::PerlSAX module example, the purpose of this package is to assist us in parsing the XML document and generating the required output file. Note that we declare a hash named %projectHours. We'll use this to store the project numbers and the number of hours charged to each project as key-value pairs.

 1.   use strict;
 2.   use XML::SAX::ParserFactory;
 3.
 4.   $XML::SAX::ParserPackage = "XML::SAX::PurePerl";
 5.
 6.   my $handler = MyHandler->new();
 7.   my $parser = XML::SAX::ParserFactory->parser(Handler => $handler);
 8.   my $inputXmlFile = shift || "ch3_pureperl.xml";
 9.   my %parser_args = (Source => {SystemId => $inputXmlFile});
 10.   $parser->parse(%parser_args);
 11.
 12.   package MyHandler;
 13.   my ($current_element, $total, %projectHours, $currentProj);
 14.
 15.   sub new {
 16.     my $type = shift;
 17.     return bless {}, $type;
 18.   }

start_document Event Handler

20-26 The start_document event handler is called once, so we use that to print out the heading for our report. The start_element event handler behaves exactly the same as the start_element handlers in the other parsers (that is, it is called when the parser encounters the opening [or start] tag for an element). One area of the start element that requires some explanation is attribute handling.

 20.   # start_document event handler
 21.   sub start_document {
 22.     my $key;
 23.
 24.     print "\nTimecard Report\n";
 25.     print "---------------\n";
 26.   }

start_element Event Handler

28-53 In the start_element handler, you can see that there are two arguments. The first argument is a reference to the current object (similar to the this pointer in C++). The second argument is a reference to a hash that describes the element, including its name and the attributes attached to it. Note that start_element is the only event handler that has access to the attributes. As you can see in the example, we can access the hash of attributes and treat it like any other hash containing key-value pairs. For our application, we need to find the values of the name and employee_num attributes. We do this by looping through the hash and checking the Name field of each attribute. After the required attribute is found, we print its value.

 28.   sub start_element {
 29.     my ($self, $element) = @_;
 30.
 31.     # Set $current_element to the current element.
 32.     $current_element = $element->{Name};
 33.
 34.     my %atts = %{$element->{Attributes}};
 35.     my $numAtts = keys(%atts);
 36.
 37.     # Check to see if this element has attributes.
 38.     if ($numAtts > 0) {
 39.       print "\n";
 40.
 41.       my ($thisAtt, $val);
 42.       for my $key (keys %atts) {
 43.
 44.         # Use this to look for a particular attribute.
 45.         if ($atts{$key}->{Name} eq 'name') {
 46.           print "Name: $atts{$key}->{Value}\n";
 47.         }
 48.         elsif ($atts{$key}->{Name} eq 'employee_num') {
 49.           print "Employee Number: $atts{$key}->{Value}\n";
 50.         }
 51.       }
 52.     }
 53.   }
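Perl SAX2 parsers key the attribute hash by the namespace-qualified name ({NamespaceURI}LocalName, which is {}name for an attribute that is not in a namespace), so an attribute can also be fetched directly rather than by looping. The sketch below is only an illustration: the element structure is built by hand to stand in for what a parser would pass to start_element, and attribute_value is a hypothetical helper, not part of XML::SAX.

```perl
use strict;
use warnings;

# Hand-built stand-in for the hash a SAX2 parser passes to start_element.
my $element = {
    Name       => 'employee',
    Attributes => {
        '{}name' => {
            Name => 'name', LocalName => 'name', Value => 'Mark',
        },
        '{}employee_num' => {
            Name => 'employee_num', LocalName => 'employee_num', Value => '123',
        },
    },
};

# Hypothetical helper: look up an attribute that is not in any namespace.
sub attribute_value {
    my ($element, $local_name) = @_;
    my $att = $element->{Attributes}{"{}$local_name"};
    return defined $att ? $att->{Value} : undef;
}

my $name = attribute_value($element, 'name');
my $num  = attribute_value($element, 'employee_num');

print "Name: $name\n";
print "Employee Number: $num\n";
```

A direct lookup also fixes the order in which the two lines are printed, whereas a loop over keys %atts visits the attributes in no particular order.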

characters Event Handler

55-86 Similar to the other characters event handlers that we have seen so far, this event handler is called whenever the parser encounters character data for a particular element. Remember to remove both the leading and trailing whitespace; this is performed by the substitutions s/^\s*// and s/\s*$//. In this handler, we're looking for the project_number and hours_charged elements, and provided that they contain valid data, we print the contents. In this example, we explicitly match each element that we want to report. If there were additional elements in our input document (for example, an element containing the number of vacation hours used by an employee this calendar year), they would be skipped in the output report.

In this event handler, we set a global scalar named $currentProj to the current project number. By doing this, we can store the current project number and the cumulative number of hours charged to the project in our hash named %projectHours.

 55.   # characters event handler
 56.   sub characters {
 57.     my ($self, $characters) = @_;
 58.
 59.     my $char_data = $characters->{Data};
 60.
 61.     # Remove leading and trailing whitespace.
 62.     $char_data =~ s/^\s*//;
 63.     $char_data =~ s/\s*$//;
 64.
 65.     if (length($char_data)) {
 66.
 67.       # Look for a particular element.
 68.       if (($current_element eq 'project_number')) {
 69.         print "Project Number: $char_data\t";
 70.         $currentProj = $char_data;
 71.       }
 72.       elsif ($current_element eq 'hours_charged') {
 73.         if (exists ($projectHours{$currentProj})) {
 74.           $projectHours{$currentProj} += $char_data;
 75.         }
 76.         else {
 77.           $projectHours{$currentProj} = $char_data;
 78.         }
 79.
 80.         print "Hours: $char_data\n";
 81.
 82.         # Increment the total hours here.
 83.         $total += $char_data;
 84.       }
 85.     }
 86.   }
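One caveat worth keeping in mind: a SAX parser is allowed to deliver the text of a single element in more than one characters call. A more defensive variation, sketched below, appends each chunk to a buffer and only interprets the text in end_element. The BufferingHandler package and its field names are assumptions made for illustration, and the events are fired by hand instead of by a parser so that the split delivery is visible.

```perl
use strict;
use warnings;

package BufferingHandler;             # hypothetical handler name

sub new {
    my $class = shift;
    my $self  = { buffer => '', project => undef, hours => {} };
    return bless $self, $class;
}

sub start_element {
    my ($self, $element) = @_;
    $self->{buffer} = '';             # start collecting text for this element
}

sub characters {
    my ($self, $chars) = @_;
    $self->{buffer} .= $chars->{Data};   # append; never assume a single call
}

sub end_element {
    my ($self, $element) = @_;

    # Trim the complete text once, after all chunks have arrived.
    (my $text = $self->{buffer}) =~ s/^\s+|\s+$//g;

    if ($element->{Name} eq 'project_number') {
        $self->{project} = $text;
    }
    elsif ($element->{Name} eq 'hours_charged') {
        $self->{hours}{ $self->{project} } += $text;
    }
    $self->{buffer} = '';
}

package main;

my $h = BufferingHandler->new();

# Simulated events: the project number arrives in two separate chunks.
$h->start_element({ Name => 'project_number' });
$h->characters({ Data => ' 100' });
$h->characters({ Data => "-A\n" });
$h->end_element({ Name => 'project_number' });
$h->start_element({ Name => 'hours_charged' });
$h->characters({ Data => '19' });
$h->end_element({ Name => 'hours_charged' });

print "$_: $h->{hours}{$_}\n" for sort keys %{ $h->{hours} };
```

Keeping the state in the handler object rather than in file-level variables also allows more than one handler instance to be used in the same program.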

end_element Event Handler

88-102 As with the other end_element event handlers that we've discussed so far, this end_element event handler is called when the parser encounters the closing tag for an element. We use it to detect the end of an employee element. When we encounter the end of an employee element, it indicates that we've finished processing the current employee, so we need to print out the total number of hours charged by the employee.

 88.   # end_element event handler
 89.   sub end_element {
 90.     my ($self, $element) = @_;
 91.
 92.     if ($element) {
 93.
 94.       # We're at the end of an employee element, so
 95.       # print out the total hours for this employee
 96.       # and reset the scalar $total to 0.
 97.       if (($element->{Name}) eq 'employee') {
 98.         print "Total Hours: $total\n";
 99.         $total = 0;
 100.       }
 101.     }
 102.   }

end_document Event Handler

104-114 The end_document event handler is called once when the parser reaches the end of the XML data (whether it is stored in a file or just as a block of XML data). This is the perfect location for printing out summary information or performing any other steps that need to be completed at the end of processing. In our case, we'll use the end_document handler to print out the contents of our %projectHours hash, which contains the project number and total hours stored as key-value pairs.

 104.   # end_document event handler
 105.   sub end_document {
 106.     my $key;
 107.
 108.     # Print out any summary information here.
 109.     print "\nProject Summary\n";
 110.     print "---------------\n";
 111.     foreach $key (sort keys %projectHours) {
 112.       print "Project $key charged a total of $projectHours{$key} hours\n"
 113.     }
 114.   }

The output report generated by our application is shown in Listing 3.16. As you can see, the report is in the desired output format and contains all the required information.

Listing 3.16 Output report in the desired format. (Filename: ch3_pureperl_report.txt)

 Timecard Report
 ---------------

 Name: Mark
 Employee Number: 123
 Project Number: 100-A Hours: 19
 Project Number: 100-B Hours: 21
 Total Hours: 40

 Name: Ilya
 Employee Number: 129
 Project Number: 101-A Hours: 45
 Total Hours: 45

 Name: Alyse
 Employee Number: 626
 Project Number: 105-B Hours: 43
 Total Hours: 43

 Name: Ed
 Employee Number: 120
 Project Number: 100-A Hours: 10
 Project Number: 100-C Hours: 12
 Total Hours: 22

 Project Summary
 ---------------
 Project 100-A charged a total of 29 hours
 Project 100-B charged a total of 21 hours
 Project 100-C charged a total of 12 hours
 Project 101-A charged a total of 45 hours
 Project 105-B charged a total of 43 hours

We just explored the use of XML::SAX with a SAX2 parser, XML::SAX::PurePerl. As I discussed, XML::SAX::PurePerl is a solid, reliable tool, but it has one weakness. XML::SAX::PurePerl can be slow because it is written purely in Perl and doesn't have the advantage of linked C libraries. A solution to this problem is discussed in the following section.

XML::SAX::Expat Perl Module

When you are setting up a production environment and must depend on the stability and the speed of the parser, XML::SAX::Expat is the choice of professionals. It is built on top of XML::Parser, which has proven stability and speed. Basically, XML::SAX::Expat converts XML::Parser's streaming facility into a SAX2-compliant parser. Although the interface of any SAX2 parser is almost identical, let's take a look at another problem/solution scenario using XML::SAX::Expat.

XML::SAX::Expat-Based Application

In a production environment, computer resources are critical. Ideally, we want applications to run as efficiently (in terms of both memory usage and speed) as possible. Although SAX parsers do not require a lot of memory, we always want to minimize the number of helper or miscellaneous processes running on a server so that the available resources are freed for mission-critical applications.

Let's say that we're working for a busy online retailer and it is the holiday season. All the incoming orders are stored in an XML file. Our task is to parse the XML file containing the orders and generate shipping labels for the warehouse workers.

XML::SAX::Expat Application Discussion

The incoming orders are stored in a simple XML document based on the format of the DTD shown in Listing 3.17. As you can see, the DTD has a root element named <shipping_orders> and a child element named <online_order>. Each <online_order> element has the following child elements: <name>, <address>, <phone>, <item>, <credit_card>, and <order_number>. Each <item> element in turn contains the <item_number>, <quantity>, and <item_description> elements. Note that the <item> element can be repeated multiple times for each order (in case the customer orders more than one item).

Listing 3.17 DTD for the shipping order's XML document. (Filename: ch3_expat_shipping_order.dtd)

 <?xml version="1.0" encoding="UTF-8"?>
 <!ELEMENT shipping_orders (online_order*)>
 <!ELEMENT online_order (name, address, phone, item*, credit_card, order_number)>
 <!ELEMENT name (#PCDATA)>
 <!ELEMENT address (#PCDATA)>
 <!ELEMENT phone (#PCDATA)>
 <!ELEMENT item (item_number, quantity, item_description)>
 <!ELEMENT item_number (#PCDATA)>
 <!ELEMENT quantity (#PCDATA)>
 <!ELEMENT item_description (#PCDATA)>
 <!ELEMENT credit_card (#PCDATA)>
 <!ELEMENT order_number (#PCDATA)>

The XML schema for the same XML document is shown in Listing 3.18.

Listing 3.18 XML schema for the XML shipping order's XML document. (Filename: ch3_expat_shipping_order.xsd)

 <?xml version="1.0" encoding="UTF-8"?>
 <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:element name="address" type="xs:string"/>
    <xs:element name="credit_card" type="xs:string"/>
    <xs:element name="item">
       <xs:complexType>
          <xs:sequence>
             <xs:element ref="item_number"/>
             <xs:element ref="quantity"/>
             <xs:element ref="item_description"/>
          </xs:sequence>
       </xs:complexType>
    </xs:element>
    <xs:element name="item_description" type="xs:string"/>
    <xs:element name="item_number" type="xs:string"/>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="online_order">
       <xs:complexType>
          <xs:sequence>
             <xs:element ref="name"/>
             <xs:element ref="address"/>
             <xs:element ref="phone"/>
             <xs:element ref="item" minOccurs="0" maxOccurs="unbounded"/>
             <xs:element ref="credit_card"/>
             <xs:element ref="order_number"/>
          </xs:sequence>
       </xs:complexType>
    </xs:element>
    <xs:element name="order_number" type="xs:string"/>
    <xs:element name="phone" type="xs:string"/>
    <xs:element name="quantity" type="xs:integer"/>
    <xs:element name="shipping_orders">
       <xs:complexType>
          <xs:sequence>
             <xs:element ref="online_order" minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
       </xs:complexType>
    </xs:element>
 </xs:schema>

So, what we need to do is write a program using XML::SAX::Expat that can parse the XML document shown in Listing 3.19 and generate the required shipping labels.

Listing 3.19 XML document that contains online order transactions. (Filename: ch3_expat_shipping_order.xml)

 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE shipping_orders SYSTEM "ch3_online_order.dtd">
 <shipping_orders>
    <online_order>
       <name>Joe DiMaggio</name>
       <address>161st St and River Ave, Bronx, NY 10452 </address>
       <phone>111-222-3333</phone>
       <item>
          <item_number>112</item_number>
          <quantity>6</quantity>
          <item_description>Baseball bat</item_description>
       </item>
       <credit_card>2214</credit_card>
       <order_number>1</order_number>
    </online_order>
    <online_order>
       <name>Mickey Mantle</name>
       <address>111 Hastings Street, Spavinaw, OK 74366</address>
       <phone>222-333-4444</phone>
       <item>
          <item_number>209</item_number>
          <quantity>6</quantity>
          <item_description>Baseballs</item_description>
       </item>
       <item>
          <item_number>220</item_number>
          <quantity>1</quantity>
          <item_description>Baseball glove</item_description>
       </item>
       <credit_card>2415</credit_card>
       <order_number>2</order_number>
    </online_order>
    <online_order>
       <name>Yogi Berra</name>
       <address>100 Arch Way, St. Louis MO 63101</address>
       <phone>333-444-5555</phone>
       <item>
          <item_number>122</item_number>
          <quantity>1</quantity>
          <item_description>NY Yankee Hat</item_description>
       </item>
       <credit_card>2150</credit_card>
       <order_number>3</order_number>
    </online_order>
    <online_order>
       <name>Phil Rizzuto</name>
       <address>1112 18th St, Brooklyn NY, 10452</address>
       <phone>555-666-7777</phone>
       <item>
          <item_number>994</item_number>
          <quantity>1</quantity>
          <item_description>Pair of shoes</item_description>
       </item>
       <item>
          <item_number>332</item_number>
          <quantity>1</quantity>
          <item_description>Sunglasses</item_description>
       </item>
       <credit_card>1588</credit_card>
       <order_number>4</order_number>
    </online_order>
 </shipping_orders>

Our shipping label should be in the following format:

    Order Number:
    Cut mailing label along perforated marks
    -----------------------------------------
    Mail To:
    [NAME GOES HERE]
    [ADDRESS GOES HERE]
    -----------------------------------------
    Phone:

    Please ship the below items:

    Item Number:
    Number of items ordered:
    Description:
    Credit card (last 4 digits):

As you can see, the fields appear in the same order as they do in our XML file. This is where SAX parsing shines: the document is traversed in the order in which the elements appear, so the parser only needs to hold the current construct in memory and uses minimal resources. If you want to present this data in a different order, you have two options: either reorder your XML document, or maintain your own cache and state and release it when it is no longer needed. For applications relying heavily on these features, there are other solutions, which I will discuss in Chapter 4.
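The second option, maintaining your own cache and state, can be sketched as follows. This is only a minimal illustration under stated assumptions: the ReorderHandler class name and its internal order, text, and out fields are hypothetical, and the events are simulated by calling the handler methods directly with hashes shaped like the ones an XML::SAX parser passes (Name and Data keys), so no parser module is needed to run it.

```perl
use strict;
use warnings;

# Hypothetical handler: caches the fields of one order, then emits them
# in a different order than they appear in the XML document.
package ReorderHandler;

sub new {
    my ($class) = @_;
    my $self = { order => {}, text => '', out => [] };
    return bless $self, $class;
}

sub start_element {
    my ($self, $element) = @_;
    $self->{text}  = '';                                  # collect fresh text
    $self->{order} = {} if $element->{Name} eq 'online_order';
}

sub characters {
    my ($self, $data) = @_;
    $self->{text} .= $data->{Data};
}

sub end_element {
    my ($self, $element) = @_;
    my $name = $element->{Name};

    if ($name eq 'online_order') {
        # The whole order is cached now, so any output order is possible.
        my $order = $self->{order};
        push @{ $self->{out} }, "Phone: $order->{phone}", "Mail To: $order->{name}";
        $self->{order} = {};                              # release the cache
    }
    elsif ($name eq 'name' or $name eq 'phone') {
        $self->{order}{$name} = $self->{text};
    }
}

package main;

# Simulate the events a SAX2 parser would send for one small order.
my $h = ReorderHandler->new;
$h->start_element({ Name => 'online_order' });
for my $field ([ name => 'Joe DiMaggio' ], [ phone => '111-222-3333' ]) {
    $h->start_element({ Name => $field->[0] });
    $h->characters({ Data => $field->[1] });
    $h->end_element({ Name => $field->[0] });
}
$h->end_element({ Name => 'online_order' });

print "$_\n" for @{ $h->{out} };
```

Although the name arrives before the phone number, the sketch prints the phone line first. The cost is that one complete order is held in memory until its end tag is seen.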

Let's now take a look at the Perl application shown in Listing 3.20. This Perl application utilizes the SAX2 parser XML::SAX::Expat Perl module and generates the formatted output that is required.

Listing 3.20 XML::SAX::Expat-based program to generate shipping labels. (Filename: ch3_expat_online_order_app.pl)

    use strict;
    use XML::SAX::ParserFactory;

    # Set the current package.
    $XML::SAX::ParserPackage = "XML::SAX::Expat";

    # Instantiate new handler and parser objects.
    my $handler = MyHandler->new();
    my $parser = XML::SAX::ParserFactory->parser(Handler => $handler);

    # Parse the document name that was passed in off the command line.
    $parser->parse_uri($ARGV[0]);

    package MyHandler;

    my $current_element;
    my $text;

    sub new {
        my $self = shift;
        return bless {}, $self;
    }

    # start_element handler
    sub start_element {
        my ($self, $element) = @_;

        my %atts = %{$element->{Attributes}};
        my $current_element = $element->{Name};

        if ($current_element eq "online_order") {
            print "Order Number: $atts{'{}number'}->{Value}\n";
            print "Cut mailing label along perforated marks\n";
            print "-----------------------------------------\n";
            print "Mail To:\n";
        }
    }

    # characters handler
    sub characters {
        my ($self, $character_data) = @_;

        $text = $character_data->{Data};
    }

    # end_element handler
    sub end_element {
        my ($self, $element) = @_;

        my $current_element = $element->{Name};
        if ($current_element eq "name") {
            print "$text\n";
        }
        elsif ($current_element eq "address") {
            print "$text\n";
            print "-----------------------------------------\n";
        }
        elsif ($current_element eq "phone") {
            print "Phone: $text\n\n";
            print "Please ship the below items:\n";
        }
        elsif ($current_element eq "item_number") {
            print "\nItem Number: $text\n";
        }
        elsif ($current_element eq "quantity") {
            print "Number of items ordered: $text\n";
        }
        elsif ($current_element eq "item_description") {
            print "Description: $text\n";
        }
        elsif ($current_element eq "credit_card") {
            print "Credit card (last 4 digits): $text\n";
        }
        elsif ($current_element eq "online_order") {
            print "\n";
        }
    }

Again, we accomplished our task using simple event handlers that are mostly made up of conditional statements. Isn't this stuff easy? Listing 3.21 shows the output from running this program at the command prompt with the filename of the XML document shown in Listing 3.19 as the command-line argument.

Listing 3.21 Output report that contains order and shipping information. (Filename: ch3_expat_online_order_report.txt)

    Order Number: 1234
    Cut mailing label along perforated marks
    -----------------------------------------
    Mail To:
    Joe DiMaggio
    161st St and River Ave, Bronx, NY 10452
    -----------------------------------------
    Phone: 111-222-3333

    Please ship the below items:

    Item Number: 112
    Number of items ordered: 6
    Description: Baseball bat
    Credit card (last 4 digits): 2214

    Order Number: 1235
    Cut mailing label along perforated marks
    -----------------------------------------
    Mail To:
    Mickey Mantle
    111 Hastings Street, Spavinaw, OK 74366
    -----------------------------------------
    Phone: 222-333-4444

    Please ship the below items:

    Item Number: 209
    Number of items ordered: 6
    Description: Baseballs

    Item Number: 220
    Number of items ordered: 1
    Description: Baseball glove
    Credit card (last 4 digits): 2415

    Order Number: 1236
    Cut mailing label along perforated marks
    -----------------------------------------
    Mail To:
    Yogi Berra
    100 Arch Way, St. Louis MO 63101
    -----------------------------------------
    Phone: 333-444-5555

    Please ship the below items:

    Item Number: 122
    Number of items ordered: 1
    Description: NY Yankee Hat
    Credit card (last 4 digits): 2150

    Order Number: 1237
    Cut mailing label along perforated marks
    -----------------------------------------
    Mail To:
    Phil Rizzuto
    1112 18th St, Brooklyn NY, 10452
    -----------------------------------------
    Phone: 555-666-7777

    Please ship the below items:

    Item Number: 994
    Number of items ordered: 1
    Description: Pair of shoes, size 9.5

    Item Number: 332
    Number of items ordered: 1
    Description: Sunglasses
    Credit card (last 4 digits): 1588

This time we use the parse_uri method, which lets us simply pass the name of the file we want to open and parse. This is an alternative to providing a hash, as we did in the XML::Parser::PerlSAX and XML::SAX::PurePerl examples. The parse_uri method also accepts optional attributes as a second argument. See the SAX2 reference at the end of this chapter for more information.

The MyHandler class defines three handlers: start_element, characters, and end_element. In our experience, these are the most widely used handlers, and the majority of XML processing applications can be written using just these three. The SAX2 API defines a variety of other handlers that give you finer control over your application, if such precision is needed.
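One detail worth keeping in mind when combining these three handlers: SAX allows a parser to deliver the text of a single element in more than one characters event, so a handler that simply overwrites its text variable can lose part of the data. The sketch below shows one common way to guard against this, by clearing a buffer in start_element, appending in characters, and reading the buffer in end_element. The BufferingHandler class and its text and seen fields are hypothetical names, and the two chunks are simulated by hand rather than produced by a real parser.

```perl
use strict;
use warnings;

# Hypothetical handler that accumulates character data per element.
package BufferingHandler;

sub new {
    my ($class) = @_;
    return bless { text => '', seen => [] }, $class;
}

sub start_element {
    my ($self, $element) = @_;
    $self->{text} = '';                    # new element: empty the buffer
}

sub characters {
    my ($self, $data) = @_;
    $self->{text} .= $data->{Data};        # append, never overwrite
}

sub end_element {
    my ($self, $element) = @_;
    push @{ $self->{seen} }, "$element->{Name}=$self->{text}";
}

package main;

# Simulate a parser that splits one address into two chunks.
my $h = BufferingHandler->new;
$h->start_element({ Name => 'address' });
$h->characters({ Data => '100 Arch Way, ' });
$h->characters({ Data => 'St. Louis MO 63101' });
$h->end_element({ Name => 'address' });

print "$_\n" for @{ $h->{seen} };
```

With the buffer in place, the complete address is available in end_element no matter how many chunks the parser used.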

XML::Xerces Perl Module

The XML::Xerces Perl module provides a Perl API to the Xerces XML parser developed by the Apache Software Foundation. This is the same organization that develops and distributes the Apache web server, one of its most popular products and the most widely used web server on the Internet. If you're familiar with their products, then you know that the Apache Foundation has a well-deserved reputation for developing commercial-quality open source software that strictly follows the required standards.

The Apache Xerces XML parser is another well-designed product. Xerces is very popular for several reasons. First, it follows the XML standards very strictly. Xerces follows the XML 1.0 specification and the related standards (for example, DOM 1.0, DOM 2.0, SAX 1.0, SAX 2.0, namespaces, and schemas). Second, it is available for a number of languages. Xerces parsers are available in Java, C++, and of course Perl. Third, Xerces is available on a number of platforms. Chances are, it is available for your platform. If for some reason you can't find a Xerces binary for your platform, the source code is available.

The XML::Xerces Perl module was developed by Jason Stewart and provides a Perl wrapper around the Xerces C++ library. One of the benefits of the XML::Xerces parser is that it is a validating parser. A validating parser ensures that the input XML document contains all the required elements and attributes that are defined in either the DTD or XML schema. If the XML document doesn't comply with the accompanying DTD or schema, the XML::Xerces parser will issue a warning. In this section, we'll take a look at an XML::Xerces example that uses the SAX2 API.

XML::Xerces Perl Module Application

This example demonstrates how to use the XML::Xerces Perl module to parse an XML document and generate a report. For this example, we'll design an XML document that contains information collected during Automated Teller Machine (ATM) transactions. The XML document should contain basic information, such as customer name, account number, transaction type, and money amount.

The output report for this application presents all the transactions for a day in an easy-to-read format. Also, it counts the total number of transactions for the day and the sum amount of money deposited and withdrawn from the ATM.

Developing the Application

The first step in the application development is to define the input data format. For this example, we have developed the DTD shown in Listing 3.22.

Listing 3.22 DTD for the XML ATM transaction log. (Filename: ch3_xerces_atm_log.dtd)

    <?xml version="1.0" encoding="UTF-8"?>
    <!ELEMENT atm_log (transaction)+>
    <!ATTLIST atm_log date CDATA #REQUIRED>
    <!ELEMENT transaction (name,account,amount)>
    <!ATTLIST transaction type (withdrawal | deposit) #REQUIRED>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT account (#PCDATA)>
    <!ELEMENT amount (#PCDATA)>

As you can see, this is a straightforward DTD. The <atm_log> element is the root element, and it has one attribute named date, which contains the date that the ATM log was collected. The <atm_log> element contains one or more <transaction> elements. Each <transaction> element contains a <name>, <account>, and <amount> element and a type attribute. Note that the type attribute has only two valid values, withdrawal or deposit. The corresponding XML schema is shown in Listing 3.23.

Listing 3.23 XML schema for the XML ATM transaction log. (Filename: ch3_xerces_atm_log.xsd)

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               elementFormDefault="qualified">
       <xs:element name="account" type="xs:string"/>
       <xs:element name="amount" type="xs:float"/>
       <xs:element name="atm_log">
          <xs:complexType>
             <xs:sequence maxOccurs="unbounded">
                <xs:element ref="transaction"/>
             </xs:sequence>
             <xs:attribute name="date" type="xs:string" use="required"/>
          </xs:complexType>
       </xs:element>
       <xs:element name="name" type="xs:string"/>
       <xs:element name="transaction">
          <xs:complexType>
             <xs:sequence>
                <xs:element ref="name"/>
                <xs:element ref="account"/>
                <xs:element ref="amount"/>
             </xs:sequence>
             <xs:attribute name="type" use="required">
                <xs:simpleType>
                   <xs:restriction base="xs:NMTOKEN">
                      <xs:enumeration value="withdrawal"/>
                      <xs:enumeration value="deposit"/>
                   </xs:restriction>
                </xs:simpleType>
             </xs:attribute>
          </xs:complexType>
       </xs:element>
    </xs:schema>

The input XML document based on the DTD shown in Listing 3.22 is presented in Listing 3.24. As you can see, we have four transaction elements in the XML document.

Listing 3.24 Input XML file containing the ATM transaction log. (Filename: ch3_atm_log.xml)

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE atm_log SYSTEM "ch3_atm_log.dtd">
    <atm_log date="6/14">
       <transaction type="withdrawal">
          <name>Mark Rogers</name>
          <account>11-22-33</account>
          <amount>100.00</amount>
       </transaction>
       <transaction type="deposit">
          <name>Joseph Burns</name>
          <account>11-23-22</account>
          <amount>500.00</amount>
       </transaction>
       <transaction type="withdrawal">
          <name>Kayla Burns</name>
          <account>22-34-55</account>
          <amount>250.00</amount>
       </transaction>
       <transaction type="deposit">
          <name>Joe Reilly</name>
          <account>11-33-44</account>
          <amount>1000.00</amount>
       </transaction>
    </atm_log>

XML::Xerces Application Discussion

The XML::Xerces Perl program is shown in Listing 3.25. In the next few sections, I'll discuss the important points in the application.

Listing 3.25 XML::Xerces Perl application. (Filename: ch3_xerces_sax2_app.pl)

    1.   use strict;
    2.   use XML::Xerces;
    3.
    4.   my $file = "ch3_atm_log.xml";
    5.   my ($deposit_count, $withdrawal_count);
    6.
    7.   package EventHandler;
    8.   use strict;
    9.   use vars qw(@ISA);
    10.  @ISA = qw(XML::Xerces::PerlContentHandler);
    11.
    12.  my ($trans_type, $total_deps, $total_withdrawals, $current_element);
    13.
    14.  sub start_document {
    15.      print "ATM Summary Report\n";
    16.  }
    17.
    18.  sub start_element {
    19.      my ($self,$uri,$localname,$qname,$attrs) = @_;
    20.      my $attVal;
    21.
    22.      $current_element = $localname;
    23.
    24.      if ($attrs->getLength > 0) {
    25.
    26.          if ($localname eq "atm_log") {
    27.              $attVal = $attrs->getValue("date");
    28.              print "Date: $attVal\n\n";
    29.              print "--------------------\n";
    30.          }
    31.
    32.          if ($localname eq "transaction") {
    33.              $trans_type = $attrs->getValue("type");
    34.          }
    35.      }
    36.  }
    37.
    38.  sub characters {
    39.      my ($self,$str,$len) = @_;
    40.      $self->{chars} += $len;
    41.
    42.      if ($current_element eq "name") {
    43.          print "Name: $str\n";
    44.      }
    45.
    46.      if ($current_element eq "account") {
    47.          print "Account Number: $str\n";
    48.      }
    49.
    50.      if ($current_element eq "amount") {
    51.
    52.          if ($trans_type eq "deposit") {
    53.              $total_deps += $str;
    54.              ++$deposit_count;
    55.          }
    56.          elsif ($trans_type eq "withdrawal") {
    57.              $total_withdrawals += $str;
    58.              ++$withdrawal_count;
    59.          }
    60.          print "Amount: \$$str\n";
    61.      }
    62.  }
    63.
    64.  sub end_element {
    65.      my ($self,$uri,$localname,$qname) = @_;
    66.
    67.      if ($localname eq "transaction") {
    68.          print "--------------------\n";
    69.      }
    70.  }
    71.
    72.  sub end_document {
    73.      my $total_transactions = $deposit_count + $withdrawal_count;
    74.      my $output_dep = sprintf("%.2f", $total_deps);
    75.      my $output_wd = sprintf("%.2f", $total_withdrawals);
    76.
    77.      print "Total of $total_transactions processed today\n\n";
    78.
    79.      print "Transaction Summary\n";
    80.      print "Received: $deposit_count deposits\n";
    81.      print "Total Value of Deposits: \$$output_dep\n\n";
    82.
    83.      print "Received: $withdrawal_count withdrawals\n";
    84.      print "Total Value of Withdrawals: \$$output_wd\n";
    85.  }
    86.
    87.  package main;
    88.  my $parser = XML::Xerces::XMLReaderFactory::createXMLReader();
    89.
    90.  eval {
    91.      $parser->setFeature("http://xml.org/sax/features/validation", 1);
    92.  };
    93.
    94.  if ($@) {
    95.      if (ref $@) {
    96.          die $@->getMessage();
    97.      } else {
    98.          die $@;
    99.      }
    100. }
    101.
    102. my $error_handler = XML::Xerces::PerlErrorHandler->new();
    103. $parser->setErrorHandler($error_handler);
    104.
    105. my $event_handler = EventHandler->new();
    106. $parser->setContentHandler($event_handler);
    107.
    108. eval {
    109.     $parser->parse (XML::Xerces::LocalFileInputSource->new($file));
    110. };
    111.
    112. if ($@) {
    113.     if (ref $@) {
    114.         die $@->getMessage();
    115.     } else {
    116.         die $@;
    117.     }
    118. }
    119.
    120. exit;

Initialization

1-5 The initialization section of the program contains the standard use strict pragma. Because we are using the XML::Xerces module, we need to load it with the use XML::Xerces statement. The input XML document filename is hard-coded in the $file scalar; however, the program can easily be modified to accept input by other means (for example, a command-line argument, configuration file, and so forth).

    1.   use strict;
    2.   use XML::Xerces;
    3.
    4.   my $file = "ch3_atm_log.xml";
    5.   my ($deposit_count, $withdrawal_count);

start_document and start_element Event Handlers

14-36 The start_document event handler is used in this case to print the report heading. Remember, the start_document handler is called only once per XML document (when the parser first starts parsing the XML document), so it is the ideal event handler to use when you need something to happen only once per XML document. In our case, the one-time event is printing the report heading. Note that the start_document event handler doesn't receive any arguments when called by the parser.

    14.  sub start_document {
    15.      print "ATM Summary Report\n";
    16.  }
    17.
    18.  sub start_element {
    19.      my ($self,$uri,$localname,$qname,$attrs) = @_;
    20.      my $attVal;
    21.
    22.      $current_element = $localname;
    23.
    24.      if ($attrs->getLength > 0) {
    25.
    26.          if ($localname eq "atm_log") {
    27.              $attVal = $attrs->getValue("date");
    28.              print "Date: $attVal\n\n";
    29.              print "--------------------\n";
    30.          }
    31.
    32.          if ($localname eq "transaction") {
    33.              $trans_type = $attrs->getValue("type");
    34.          }
    35.      }
    36.  }

The start_element handler is called whenever the parser encounters the start of an element. Several arguments are passed into the subroutine. The standard arguments are

$self: Contains the handler object.
$uri: Contains the URI of the namespace for this element.
$localname: Contains the element name.
$qname: Contains the XML qualified name for this element. The qualified name has a namespace_prefix:element_name format.
$attrs: Contains any attributes associated with this element.

One of the first actions inside the start_element handler is to set the global scalar $current_element to the local scalar $localname. We'll use this to track the current element inside other event handlers.

Next, we process the attributes. First, we use the getLength method to determine the number of attributes associated with the current element. If the element has attributes, then we know we are currently processing either the <atm_log> root element or one of the <transaction> elements (because they are the only elements in our XML document that have attributes).

If the element is the root element <atm_log>, we extract the date attribute and use the value to finish printing the report header. On the other hand, if we're processing a transaction element, we extract and store the transaction type. Remember, the only valid transaction types are withdrawal or deposit. So, we set a global scalar named $trans_type equal to the current transaction type. This is similar to the global scalar that we set to the current element and will be used to accumulate results.

characters Event Handler

38-62 The characters event handler receives three arguments when it is called. The input arguments are

$self: Contains the handler object.
$str: Contains the character data for the current element.
$len: Contains the length of the character data.

In the characters event handler, we check the value of the $current_element scalar that was set in the start_element handler. When it matches one of the elements we're interested in, we print a label for the element followed by the element contents.

When we encounter an <amount> element, we look at the value of the $trans_type scalar (which was set in the start_element handler). Because we are responsible for adding up the number of withdrawals and deposits, we need to know the type of the current transaction. If the current transaction type is a deposit, we increment the number of deposits processed and add the amount to the total amount deposited. If the current transaction type is a withdrawal, then we increment the number of withdrawals and add the amount to the total amount withdrawn.

    38.  sub characters {
    39.      my ($self,$str,$len) = @_;
    40.      $self->{chars} += $len;
    41.
    42.      if ($current_element eq "name") {
    43.          print "Name: $str\n";
    44.      }
    45.
    46.      if ($current_element eq "account") {
    47.          print "Account Number: $str\n";
    48.      }
    49.
    50.      if ($current_element eq "amount") {
    51.
    52.          if ($trans_type eq "deposit") {
    53.              $total_deps += $str;
    54.              ++$deposit_count;
    55.          }
    56.          elsif ($trans_type eq "withdrawal") {
    57.              $total_withdrawals += $str;
    58.              ++$withdrawal_count;
    59.          }
    60.          print "Amount: \$$str\n";
    61.      }
    62.  }
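The accumulation logic in this handler can be tried on its own, without a parser. The sketch below is not part of the original application: it replaces the separate deposit and withdrawal scalars with two hashes keyed by transaction type (the %count and %total names are hypothetical) and feeds in the four transactions from Listing 3.24 as a plain list.

```perl
use strict;
use warnings;

# Hypothetical alternative to separate counters: key everything by type.
my (%count, %total);

# The four transactions from Listing 3.24 as [type, amount] pairs.
my @transactions = (
    [ withdrawal => '100.00'  ],
    [ deposit    => '500.00'  ],
    [ withdrawal => '250.00'  ],
    [ deposit    => '1000.00' ],
);

for my $transaction (@transactions) {
    my ($type, $amount) = @$transaction;
    $count{$type}++;               # number of transactions of this type
    $total{$type} += $amount;      # running sum for this type
}

# Print one summary line per transaction type, with two decimal places.
for my $type (sort keys %count) {
    printf "%s: %d transactions, \$%.2f\n", $type, $count{$type}, $total{$type};
}
```

Keying the totals by type means a new transaction type would need no extra variables, at the price of being slightly less explicit than the named scalars used in the listing.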

end_element and end_document Event Handlers

64-85 The end_element event handler receives the following arguments when it is called:

$self: Contains the handler object.
$uri: Contains the URI of the namespace for this element.
$localname: Contains the element name.
$qname: Contains the XML qualified name for this element. The qualified name has a namespace_prefix:element_name format.

Note that these arguments are similar to those passed to the start_element event handler. The only difference is that the end_element event handler doesn't receive attributes.

    64.  sub end_element {
    65.      my ($self,$uri,$localname,$qname) = @_;
    66.
    67.      if ($localname eq "transaction") {
    68.          print "--------------------\n";
    69.      }
    70.  }
    71.
    72.  sub end_document {
    73.      my $total_transactions = $deposit_count + $withdrawal_count;
    74.      my $output_dep = sprintf("%.2f", $total_deps);
    75.      my $output_wd = sprintf("%.2f", $total_withdrawals);
    76.
    77.      print "Total of $total_transactions processed today\n\n";
    78.
    79.      print "Transaction Summary\n";
    80.      print "Received: $deposit_count deposits\n";
    81.      print "Total Value of Deposits: \$$output_dep\n\n";
    82.
    83.      print "Received: $withdrawal_count withdrawals\n";
    84.      print "Total Value of Withdrawals: \$$output_wd\n";
    85.  }

The only thing that we're doing inside the end_element event handler is checking the name of the element being processed. If we are processing the end tag of a <transaction> element, then we're printing out a separator line that appears in the output report.

The end_document event handler (as the name implies) is called once at the end of the document. That makes it an ideal location to print any summary information. We're using it to print the results that we've accumulated while processing the XML document.

Parser Initialization

87-120 This block contains the main package that instantiates a parser object and sets the appropriate options.

    87.  package main;
    88.  my $parser = XML::Xerces::XMLReaderFactory::createXMLReader();
    89.
    90.  eval {
    91.      $parser->setFeature("http://xml.org/sax/features/validation", 1);
    92.  };
    93.
    94.  if ($@) {
    95.      if (ref $@) {
    96.          die $@->getMessage();
    97.      } else {
    98.          die $@;
    99.      }
    100. }
    101.
    102. my $error_handler = XML::Xerces::PerlErrorHandler->new();
    103. $parser->setErrorHandler($error_handler);
    104.
    105. my $event_handler = EventHandler->new();
    106. $parser->setContentHandler($event_handler);
    107.
    108. eval {
    109.     $parser->parse (XML::Xerces::LocalFileInputSource->new($file));
    110. };
    111.
    112. if ($@) {
    113.     if (ref $@) {
    114.         die $@->getMessage();
    115.     } else {
    116.         die $@;
    117.     }
    118. }
    119.
    120. exit;

First, the new parser object is created on line 88:

 my $parser = XML::Xerces::XMLReaderFactory::createXMLReader();

After the object is created, we set the appropriate options. The first option we set tells the parser to report all validation errors. With this option on, the parser notifies us if the input XML document doesn't follow the DTD or XML schema. For example, suppose that our XML document contains one or more transaction elements followed by an <atm_location> element (which isn't defined in our DTD):

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE atm_log SYSTEM "ch3_atm_log.dtd">
    <atm_log date="6/14">
       <transaction type="withdrawal">
          <name>Mark Rogers</name>
          <account>11-22-33</account>
          <amount>100.00</amount>
       </transaction>
       <atm_location>Grocery store</atm_location>
    </atm_log>

By turning on the validation option, we'll get the following error from the XML::Xerces Perl module:

    ERROR:
    FILE: /home/mriehl/book/ch3_atm_log.xml
    LINE:   24
    COLUMN: 7
    MESSAGE: Unknown element 'atm_location' at ch3_xerces_sax2.pl

A number of other features can be used, depending on your situation. For example, you can enable schema constraint checking; the downside is that it may be time consuming or memory intensive. Please see the Apache XML::Xerces documentation for additional features.

As you can see, we're using Perl eval blocks to catch any run-time exceptions that may be returned by the parser. If you're not familiar with the Perl eval statement, it is similar to a "try block" in either C++ or Java. For additional information on the Perl eval function, see perldoc -f eval.
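For readers new to this idiom, here is a small stand-alone sketch of eval used as a try block. It does not use XML::Xerces at all: the risky_parse subroutine is a hypothetical stand-in for a parser call, and it simply dies when handed a file name that does not end in .xml. The check on ref mirrors the one in the listing, because the value left in $@ may be either a plain string or an exception object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parser call: dies on anything but an .xml file.
sub risky_parse {
    my ($file) = @_;
    die "cannot parse '$file'\n" unless $file =~ /\.xml\z/;
    return "parsed $file";
}

my @log;
for my $file ('ch3_atm_log.xml', 'notes.txt') {
    # eval traps the die; on failure it returns undef and sets $@.
    my $result = eval { risky_parse($file) };

    if ($@) {
        # $@ is a string here, but it could also be an exception object.
        my $message = ref($@) ? 'exception object of class ' . ref($@) : $@;
        chomp $message;
        push @log, "Error: $message";
    }
    else {
        push @log, $result;
    }
}

print "$_\n" for @log;
```

The first file is reported as parsed, and the second produces an error line instead of terminating the program, which is the behavior the eval blocks in the listing rely on before deciding to call die themselves.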

Next we register an error handler. If we don't register an error handler in our application, all error events will be silently ignored except for fatal errors. Because we would like to be notified of any possible errors, we are setting an error handler.

After setting an error handler, we instantiate an event handler (defined in our EventHandler package) and parse the XML document. The XML::Xerces parser calls the event handlers whenever the corresponding events are encountered (for example, start_element, end_element, and so forth).

The output of the XML::Xerces Perl application is the report shown in Listing 3.26. As you can see, the report summarizes the contents of the input XML document, counts the number of deposits and withdrawals, and sums the totals of the deposits and withdrawals.

Listing 3.26 Output report containing a summary of ATM transactions. (Filename: ch3_xerces_report.txt)

    ATM Summary Report
    Date: 6/14

    --------------------
    Name: Mark Rogers
    Account Number: 11-22-33
    Amount: $100.00
    --------------------
    Name: Joseph Burns
    Account Number: 11-23-22
    Amount: $500.00
    --------------------
    Name: Kayla Burns
    Account Number: 22-34-55
    Amount: $250.00
    --------------------
    Name: Joe Reilly
    Account Number: 11-33-44
    Amount: $1000.00
    --------------------
    Total of 4 processed today

    Transaction Summary
    Received: 2 deposits
    Total Value of Deposits: $1500.00

    Received: 2 withdrawals
    Total Value of Withdrawals: $350.00