XML and Perl

The Practical Extraction and Reporting Language (Perl) has long been a mainstay of server-side programming and a foundation of Common Gateway Interface (CGI) programming. Perl has been getting into XML in a big way, and one could easily write a book on the subject.

Perl modules are distributed at the Comprehensive Perl Archive Network (CPAN) site, www.cpan.org, and plenty of them deal with XML (I counted 156). You can find a selection of Perl XML modules, along with their descriptions as given on the CPAN site, in Table 20-2.

Table 20-2. XML Modules in Perl with CPAN descriptions
Module Description
Apache::AxKit::XMLFinder Detects XML files
Apache::MimeXML mod_perl mime encoding sniffer for XML files
Boulder::XML XML format input/output for Boulder streams
Bundle::XML A bundle to install all XML-related modules
CGI::XMLForm Extension of CGI.pm, which reads/generates formatted XML
Data::DumpXML Dump arbitrary data structures as XML
DBIx::XML_RDB Perl extension for creating XML from existing DBI datasources
GoXML::XQI Perl extension for the XML Query Interface at xqi.goxml.com
Mail::XML Adds a toXML() method to Mail::Internet
MARC::XML A subclass of MARC.pm to provide XML support
PApp::XML pxml sections and more
XML::Catalog Resolves public identifiers and remaps system identifiers
XML::CGI Perl extension for converting CGI.pm variables to and from XML
XML::Checker A Perl module for validating XML
XML::Checker::Parser An XML::Parser that validates at parse time
XML::DOM A Perl module for building DOM Level 1-compliant document structures
XML::DOM::NamedNodeMap A hashtable interface for XML::DOM
XML::DOM::NodeList A node list as used by XML::DOM
XML::DOM::PerlSAX Old name of XML::Handler::BuildDOM
XML::DOM::ValParser An XML::DOM::Parser that validates at parse time
XML::Driver::HTML SAX driver for HTML that is not well formed
XML::DT A package for down translation of XML to strings
XML::Edifact Perl module to handle XML::Edifact messages
XML::Encoding A Perl module for parsing XML encoding maps
XML::ESISParser Perl SAX parser using nsgmls
XML::Filter::DetectWS A PerlSAX filter that detects ignorable whitespace
XML::Filter::Hekeln A SAX stream editor
XML::Filter::Reindent Reformats whitespace for attractively printing XML
XML::Filter::SAXT Replicates SAX events to several SAX event handlers
XML::Generator Perl extension for generating XML
XML::Grove Perl-style XML objects
XML::Grove::AsCanonXML Outputs XML objects in canonical XML
XML::Grove::AsString Outputs content of XML objects as a string
XML::Grove::Builder PerlSAX handler for building an XML::Grove
XML::Grove::Factory Simplifies creation of XML::Grove objects
XML::Grove::Path Returns the object at a path
XML::Grove::PerlSAX A PerlSAX event interface for XML objects
XML::Grove::Sub Runs a filter sub over a grove
XML::Grove::Subst Substitutes values into a template
XML::Handler::BuildDOM PerlSAX handler that creates XML::DOM document structures
XML::Handler::CanonXMLWriter Output XML in canonical XML format
XML::Handler::Composer Another XML printer/writer/generator
XML::Handler::PrintEvents Prints PerlSAX events (for debugging)
XML::Handler::PyxWriter Converts Perl SAX events to ESIS of nsgmls
XML::Handler::Sample A trivial Perl SAX handler
XML::Handler::Subs A PerlSAX handler base class for calling user-defined subs
XML::Handler::XMLWriter A PerlSAX handler for writing readable XML
XML::Handler::YAWriter Another Perl SAX XML writer
XML::Node Node-based XML parsing: a simplified interface to XML
XML::Parser A Perl module for parsing XML documents
XML::Parser::Expat Low-level access to James Clark's expat XML parser
XML::Parser::PerlSAX Perl SAX parser using XML::Parser
XML::Parser::PyxParser Converts ESIS of nsgmls or Pyxie to Perl SAX
XML::PatAct::Amsterdam An action module for simplistic stylesheets
XML::PatAct::MatchName A pattern module for matching element names
XML::PatAct::ToObjects An action module for creating Perl objects
XML::PYX XML-to-PYX generator
XML::QL An XML query language
XML::RegExp Regular expressions for XML tokens
XML::Registry Perl module for loading and saving an XML registry
XML::RSS Creates and updates RSS files
XML::SAX2Perl Translates Perl SAX methods to Java/CORBA-style methods
XML::Simple Trivial API for reading and writing XML (especially config files)
XML::Stream Creates an XML Stream connection and parses return data
XML::Stream::Namespace Object to make defining namespaces easier
XML::Template Perl XML template instantiation
XML::Twig A Perl module for processing huge XML documents in tree mode
XML::UM Converts UTF-8 strings to any encoding supported by XML::Encoding
XML::Writer Perl extension for writing XML documents
XML::XPath A set of modules for parsing and evaluating XPath
XML::XPath::Boolean Boolean true / false values
XML::XPath::Builder SAX handler for building an XPath tree
XML::XPath::Literal Simple string values
XML::XPath::Node Internal representation of a node
XML::XPath::NodeSet A list of XML document nodes
XML::XPath::Number Simple numeric values
XML::XPath::PerlSAX A PerlSAX event generator
XML::XPath::XMLParser The default XML parsing class that produces a node tree
XML::XQL A Perl module for querying XML tree structures with XQL
XML::XQL::Date Adds an XQL::Node type for representing and comparing dates and times
XML::XQL::DOM Adds XQL support to XML::DOM nodes
XML::XSLT A Perl module for processing XSLT
XMLNews::HTMLTemplate A module for converting NITF to HTML
XMLNews::Meta A module for reading and writing XMLNews metadata files

Most of the Perl XML modules that appear in Table 20-2 must be downloaded and installed before you can use them. (The process is a little lengthy, if straightforward; download manager tools exist for Windows and UNIX that will manage the download and installation process for you and make things easier.) Some Perl distributions, such as ActiveState's ActivePerl, do come with some XML support built in, such as the XML::Parser module.

Here's an example that puts XML::Parser to work. In this case, I'll parse an XML document and print it using Perl. The XML::Parser module works with callbacks: It calls subroutines you supply when it encounters the beginning of an element, the text content in an element, and the end of an element. Here's how I set up such calls to the handler subroutines start_handler, char_handler, and end_handler, respectively, creating a new parser object named $parser in Perl:

    use XML::Parser;

    $parser = new XML::Parser(Handlers => {Start => \&start_handler,
                                           End   => \&end_handler,
                                           Char  => \&char_handler});
        .
        .
        .

Now I need an XML document to parse. I'll use a document we've seen before, ch07_01.xml:

    <?xml version="1.0"?>
    <MEETINGS>
       <MEETING TYPE="informal">
           <MEETING_TITLE>XML In The Real World</MEETING_TITLE>
           <MEETING_NUMBER>2079</MEETING_NUMBER>
           <SUBJECT>XML</SUBJECT>
           <DATE>6/1/2003</DATE>
           <PEOPLE>
               <PERSON ATTENDANCE="present">
                   <FIRST_NAME>Edward</FIRST_NAME>
                   <LAST_NAME>Samson</LAST_NAME>
               </PERSON>
               <PERSON ATTENDANCE="absent">
                   <FIRST_NAME>Ernestine</FIRST_NAME>
                   <LAST_NAME>Johnson</LAST_NAME>
               </PERSON>
               <PERSON ATTENDANCE="present">
                   <FIRST_NAME>Betty</FIRST_NAME>
                   <LAST_NAME>Richardson</LAST_NAME>
               </PERSON>
           </PEOPLE>
       </MEETING>
    </MEETINGS>

I can parse that document using the $parser object's parsefile method:

    use XML::Parser;

    $parser = new XML::Parser(Handlers => {Start => \&start_handler,
                                           End   => \&end_handler,
                                           Char  => \&char_handler});

    $parser->parsefile('ch07_01.xml');
        .
        .
        .

All that remains is to create the subroutines start_handler, char_handler, and end_handler. I'll begin with start_handler, which is called when the start of an XML element is encountered. The name of the element is stored in item 1 of the standard Perl array @_, which holds the arguments passed to subroutines (item 0 holds the parser object itself). I can display that element's opening tag like this:

    use XML::Parser;

    $parser = new XML::Parser(Handlers => {Start => \&start_handler,
                                           End   => \&end_handler,
                                           Char  => \&char_handler});

    $parser->parsefile('ch07_01.xml');

    sub start_handler
    {
        print "<$_[1]>\n";
    }
        .
        .
        .

I'll also print the closing tag in the end_handler subroutine:

    use XML::Parser;

    $parser = new XML::Parser(Handlers => {Start => \&start_handler,
                                           End   => \&end_handler,
                                           Char  => \&char_handler});

    $parser->parsefile('ch07_01.xml');

    sub start_handler
    {
        print "<$_[1]>\n";
    }

    sub end_handler
    {
        print "</$_[1]>\n";
    }
        .
        .
        .

And I can print the text content of the element in the char_handler subroutine after removing discardable whitespace:

Listing ch20_05.pl
    use XML::Parser;

    $parser = new XML::Parser(Handlers => {Start => \&start_handler,
                                           End   => \&end_handler,
                                           Char  => \&char_handler});

    $parser->parsefile('ch07_01.xml');

    sub start_handler
    {
        print "<$_[1]>\n";
    }

    sub end_handler
    {
        print "</$_[1]>\n";
    }

    sub char_handler
    {
        # Skip text that consists only of whitespace, such as the
        # indentation between elements
        if($_[1] =~ /\S/){
            print "$_[1]\n";
        }
    }

That completes the code. Running this Perl script gives you this result, where you can see that ch07_01.xml was indeed parsed successfully:

    <MEETINGS>
    <MEETING>
    <MEETING_TITLE>
    XML In The Real World
    </MEETING_TITLE>
    <MEETING_NUMBER>
    2079
    </MEETING_NUMBER>
    <SUBJECT>
    XML
    </SUBJECT>
    <DATE>
    6/1/2003
    </DATE>
    <PEOPLE>
    <PERSON>
    <FIRST_NAME>
    Edward
    </FIRST_NAME>
    <LAST_NAME>
    Samson
    </LAST_NAME>
    </PERSON>
    <PERSON>
    <FIRST_NAME>
    Ernestine
    </FIRST_NAME>
    <LAST_NAME>
    Johnson
    </LAST_NAME>
    </PERSON>
    <PERSON>
    <FIRST_NAME>
    Betty
    </FIRST_NAME>
    <LAST_NAME>
    Richardson
    </LAST_NAME>
    </PERSON>
    </PEOPLE>
    </MEETING>
    </MEETINGS>
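The decision about which character data to print can be tried on its own. The following is a minimal sketch rather than part of the listing, and the helper names has_no_space_or_newline and has_visible_text are made up for illustration. It contrasts a test built on index, which rejects any text containing a space or a newline, with a regular-expression test that rejects only text made up entirely of whitespace:

```perl
use strict;
use warnings;

# Hypothetical helper: true only if the text has no space and no newline.
sub has_no_space_or_newline {
    my ($text) = @_;
    return index($text, " ") < 0 && index($text, "\n") < 0;
}

# Hypothetical helper: true if the text holds at least one
# non-whitespace character.
sub has_visible_text {
    my ($text) = @_;
    return $text =~ /\S/ ? 1 : 0;
}

for my $text ("\n    ", "2079", "XML In The Real World") {
    (my $shown = $text) =~ s/\n/\\n/g;    # make newlines visible
    printf "%-25s index test: %-3s regex test: %s\n",
        "'$shown'",
        has_no_space_or_newline($text) ? "yes" : "no",
        has_visible_text($text)        ? "yes" : "no";
}
```

The indentation between elements fails both tests and a single word passes both, but a title that contains spaces passes only the regular-expression test.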

Writing this script, parsing the document, and implementing callbacks like this in Perl may remind you quite closely of the Java SAX work we did in Chapter 12.
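The handlers above print only element names, so attributes such as TYPE and ATTENDANCE do not appear in the output. XML::Parser passes a Start handler the attribute names and values as additional arguments after the element name. Here is a minimal sketch of using them, assuming XML::Parser is installed; the helper name start_tag is made up for illustration, and attribute values are not escaped:

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: rebuild an opening tag from the arguments
# XML::Parser passes to a Start handler (parser object, element name,
# then attribute name/value pairs).
sub start_tag {
    my ($expat, $element, @attr_pairs) = @_;
    my $tag = "<$element";
    while (my ($name, $value) = splice(@attr_pairs, 0, 2)) {
        $tag .= qq{ $name="$value"};
    }
    return "$tag>";
}

my @tags;    # opening tags in document order
my $parser = XML::Parser->new(
    Handlers => { Start => sub { push @tags, start_tag(@_) } },
);

# parse() takes the document as a string instead of a file name.
$parser->parse('<MEETING TYPE="informal"><SUBJECT>XML</SUBJECT></MEETING>');
print "$_\n" for @tags;
```

Collecting the tags in an array rather than printing inside the handler keeps the formatting logic easy to exercise without a parser.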

I'll take a look at serving XML documents from Perl scripts next. Unfortunately, Perl does not come with a built-in database protocol as powerful as JDBC and its ODBC handler, or ASP and its ADO support. The database support that comes built into Perl is based on DBM files, which are hash-based databases (although now, of course, you can install many Perl modules to interface to other database protocols, from ODBC to Oracle).
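To make the idea of a hash-based database concrete, here is a minimal sketch of tying a Perl hash to a DBM file. It uses SDBM_File, which is bundled with Perl, in place of NDBM_File, and it keeps its files in a temporary directory; both choices are assumptions made so that the sketch runs anywhere:

```perl
use strict;
use warnings;
use Fcntl;                        # supplies O_RDWR and O_CREAT
use SDBM_File;                    # DBM implementation bundled with Perl
use File::Temp qw(tempdir);
use File::Spec;

# Keep the sketch's database files in a throwaway directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'dbdata');

# Tie a hash to the file, store one pair, and release the file.
my %dbhash;
tie(%dbhash, 'SDBM_File', $file, O_RDWR | O_CREAT, 0644)
    or die "Cannot open $file: $!";
$dbhash{vegetable} = 'broccoli';
untie %dbhash;

# Tie again, as a later request would, and read the value back.
tie(%dbhash, 'SDBM_File', $file, O_RDWR, 0644)
    or die "Cannot reopen $file: $!";
my $stored = $dbhash{vegetable};
untie %dbhash;

print "vegetable => $stored\n";
```

Once the hash is tied, ordinary hash reads and writes go to the file on disk, which is why the value survives the untie in the middle.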

In this case, I'll write a Perl script that will let you enter a key (such as vegetable) and a value (such as broccoli) to store in a database built in the NDBM database format, which is a default format that Perl does support. This database will be stored on the server. When you enter a key into the page created by this script, the code checks the database for a match to that key and, if found, returns the key and its value. For example, when I enter the key vegetable and the value broccoli, that key/value pair is stored in the database. When you subsequently search for a match to the key vegetable, the script returns both that key and the matching value, broccoli, in an XML document using the tags <key> and <value>:

    <?xml version="1.0" ?>
    <document>
        <key>vegetable</key>
        <value>broccoli</value>
    </document>
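A document like this stays well formed only if the key and value contain no characters that are special in XML. Here is a minimal sketch of building the response with those characters escaped; the helper names xml_escape and pair_document are made up for illustration and are not part of the script developed in this section:

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in XML text.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so later entities survive
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: build the response document for one pair.
sub pair_document {
    my ($key, $value) = @_;
    return join "\n",
        '<?xml version="1.0" ?>',
        '<document>',
        '    <key>'   . xml_escape($key)   . '</key>',
        '    <value>' . xml_escape($value) . '</value>',
        '</document>',
        '';
}

print pair_document('vegetable', 'broccoli');
print pair_document('snack', 'fish & chips');
```

The second call shows why the escaping matters: without it, the bare ampersand in the value would make the document unparseable.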

You can see the results of the CGI script we'll create in Figure 20-4. To add an entry to the database, you enter a key into the text field marked Key to Add to the Database, and you enter a corresponding value in the text field marked Value to Add to the Database. Then you click the Add to Database button. In Figure 20-4, I'm storing the value broccoli under the key vegetable.

Figure 20-4. A Perl CGI script database manager.


To retrieve a value from the database, you enter the value's key in the box marked Key to Search For, which you see in Figure 20-4. Then you click the Look Up Value button. When you do, the database is searched and an XML document with the results is sent to the client, as you see in Figure 20-5, where I've searched for the key vegetable and received the expected result. Although this XML document is displayed in a browser, it's relatively easy to use Internet sockets in Perl code to let you read and handle such XML without a browser.

Figure 20-5. An XML document generated by a Perl script.
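As a rough illustration of reading such a document without a browser, here is a minimal sketch of a client. It is built on assumptions: It uses the core HTTP::Tiny module in place of raw sockets, it needs XML::Parser to be installed, the URL of the script must be supplied on the command line, and the helper name extract_pair is made up. The form fields key and type match the lookup form described in this section:

```perl
use strict;
use warnings;
use HTTP::Tiny;     # core HTTP client, used here in place of raw sockets
use XML::Parser;

# Hypothetical helper: pull the <key> and <value> text out of a response.
sub extract_pair {
    my ($xml) = @_;
    my (%text, @open);
    my $parser = XML::Parser->new(Handlers => {
        Start => sub { push @open, $_[1] },
        End   => sub { pop @open },
        # Character data may arrive in pieces, so append each chunk
        # to the text collected for the innermost open element.
        Char  => sub { $text{ $open[-1] } .= $_[1] if @open },
    });
    $parser->parse($xml);
    return @text{qw(key value)};
}

# Hypothetical usage: pass the script's URL and a key on the command line.
if (@ARGV == 2) {
    my ($url, $key) = @ARGV;
    my $response = HTTP::Tiny->new->post_form($url,
        { key => $key, type => 'read' });
    die "Request failed: $response->{status} $response->{reason}\n"
        unless $response->{success};
    my ($found_key, $value) = extract_pair($response->{content});
    $value = '' unless defined $value;
    print "$found_key => $value\n";
}
```

Keeping the parsing in its own subroutine means it can be checked against a literal document before any server is involved.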


In this Perl script, I'll use CGI.pm, the long-standing Perl CGI module. (CGI.pm shipped with the standard Perl distribution until it was removed from the core in Perl 5.22; on newer installations, you install it from CPAN.) I begin by creating the Web page you see in Figure 20-4, including all the HTML controls we'll need:

    #!/usr/local/bin/perl

    use Fcntl;
    use NDBM_File;
    use CGI;

    $co = new CGI;

    if(!$co->param()) {
        print $co->header,
        $co->start_html('CGI Functions Example'),
        $co->center($co->h1('CGI Database Example')),
        $co->hr,
        $co->b("Add a key/value pair to the database..."),
        $co->start_form,
        "Key to add to the database: ",
        $co->textfield(-name=>'key',-default=>'', -override=>1),
        $co->br,
        "Value to add to the database: ",
        $co->textfield(-name=>'value',-default=>'', -override=>1),
        $co->br,
        $co->hidden(-name=>'type',-value=>'write', -override=>1),
        $co->br,
        $co->center(
            $co->submit('Add to database'),
            $co->reset
        ),
        $co->end_form,
        $co->hr,
        $co->b("Look up a value in the database..."),
        $co->start_form,
        "Key to search for: ",
        $co->textfield(-name=>'key',-default=>'', -override=>1),
        $co->br,
        $co->hidden(-name=>'type',-value=>'read', -override=>1),
        $co->br,
        $co->center(
            $co->submit('Look up value'),
            $co->reset
        ),
        $co->end_form,
        $co->hr;
        print $co->end_html;
    }
        .
        .
        .

This CGI script creates two HTML forms, one for storing key/value pairs and one for entering a key to search for. I didn't specify a target for these two forms to send their data to, so the data will simply be sent back to the same script. I can check whether the script has been called with data to be processed by checking the return value of the CGI.pm param method; if it's true, there is data waiting for us to work on.

The document that this script returns is an XML document, not the default HTML. So how do you set the content type in the HTTP header to indicate that? You do so with the header method, setting the type named parameter to "application/xml". This code follows the previous code in the script:

    if($co->param()) {
        print $co->header(-type=>"application/xml");
        print "<?xml version = \"1.0\"?>";
        print "<document>";
        .
        .
        .
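Both the param check and the header call can be tried from the command line, without a Web server. The following is a minimal sketch that assumes CGI.pm is installed; it builds query objects from fixed data rather than from a live request, and the variable names are made up for illustration:

```perl
use strict;
use warnings;
use CGI;

# Build query objects from fixed data instead of a live HTTP request.
my $first_visit = CGI->new({});    # no form data at all
my $lookup      = CGI->new({ type => 'read', key => 'vegetable' });

# param() with no arguments lists the parameter names, so an empty
# list means the forms have not been submitted yet.
for my $query ($first_visit, $lookup) {
    my @names = $query->param();
    print @names ? "process form data\n" : "show the forms\n";
}

# The header for the XML response; CGI.pm appends a default charset.
my $header = $lookup->header(-type => 'application/xml');
print $header;
```

Printing the header this way makes it easy to confirm the Content-Type line before the script is deployed.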

I keep the two HTML forms separate with a hidden data variable named type. If that variable is set to write, I enter the data the user supplied into the database:

    if($co->param()) {
        print $co->header(-type=>"application/xml");
        print "<?xml version = \"1.0\"?>";
        print "<document>";
        if($co->param('type') eq 'write') {
            # Open the NDBM database, creating it if it doesn't exist
            tie %dbhash, "NDBM_File", "dbdata", O_RDWR|O_CREAT, 0644;
            $key = $co->param('key');
            $value = $co->param('value');
            $dbhash{$key} = $value;
            untie %dbhash;
            if ($!) {
                print "There was an error: $!";
            } else {
                print "$key=>$value stored in the database";
            }
        }
        .
        .
        .
    }

Otherwise, I search the database for the key the user has specified and return both the key and the corresponding value in an XML document:

    if($co->param()) {
        print $co->header(-type=>"application/xml");
        print "<?xml version = \"1.0\"?>";
        print "<document>";
        if($co->param('type') eq 'write') {
            # Open the NDBM database, creating it if it doesn't exist
            tie %dbhash, "NDBM_File", "dbdata", O_RDWR|O_CREAT, 0644;
            $key = $co->param('key');
            $value = $co->param('value');
            $dbhash{$key} = $value;
            untie %dbhash;
            if ($!) {
                print "There was an error: $!";
            } else {
                print "$key=>$value stored in the database";
            }
        } else {
            # Look up the key and return the pair as XML
            tie %dbhash, "NDBM_File", "dbdata", O_RDWR|O_CREAT, 0644;
            $key = $co->param('key');
            $value = $dbhash{$key};
            print "<key>";
            print $key;
            print "</key>";
            print "<value>";
            print $value;
            print "</value>";
            if ($value) {
                if ($!) {
                    print "There was an error: $!";
                }
            } else {
                print "No match found for that key";
            }
            untie %dbhash;
        }
        print "</document>";
    }
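Two details of this code are worth a closer look. The $! variable is meaningful only immediately after a call has failed, and a test such as if ($value) treats a stored value of 0 as missing. The following minimal sketch shows one way to handle both, checking the return value of tie and using exists for the lookup. It uses SDBM_File, which is bundled with Perl, in place of NDBM_File, and the helper names store_pair and lookup_pair are made up for illustration:

```perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;      # bundled with Perl; stands in for NDBM_File here
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: store one pair, reporting failure from tie itself.
sub store_pair {
    my ($file, $key, $value) = @_;
    tie(my %db, 'SDBM_File', $file, O_RDWR | O_CREAT, 0644)
        or return "There was an error: $!";
    $db{$key} = $value;
    untie %db;
    return "$key=>$value stored in the database";
}

# Hypothetical helper: return (found, value) so that a stored "0"
# is not mistaken for a missing key.
sub lookup_pair {
    my ($file, $key) = @_;
    tie(my %db, 'SDBM_File', $file, O_RDWR | O_CREAT, 0644)
        or die "There was an error: $!";
    my @result = exists $db{$key} ? (1, $db{$key}) : (0, undef);
    untie %db;
    return @result;
}

my $file = File::Spec->catfile(tempdir(CLEANUP => 1), 'dbdata');
print store_pair($file, 'count', 0), "\n";
my ($found, $value) = lookup_pair($file, 'count');
print $found ? "count has value $value\n" : "No match found for that key\n";
```

Returning a separate found flag keeps the question of whether the key exists apart from the question of what its value is.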

In this way, we've been able to store data in a database using Perl, and retrieve that data, formatted as XML. Here's the complete listing:

Listing ch20_06.cgi
    #!/usr/local/bin/perl

    use Fcntl;
    use NDBM_File;
    use CGI;

    $co = new CGI;

    if(!$co->param()) {
        print $co->header,
        $co->start_html('CGI Functions Example'),
        $co->center($co->h1('CGI Database Example')),
        $co->hr,
        $co->b("Add a key/value pair to the database..."),
        $co->start_form,
        "Key to add to the database: ",
        $co->textfield(-name=>'key',-default=>'', -override=>1),
        $co->br,
        "Value to add to the database: ",
        $co->textfield(-name=>'value',-default=>'', -override=>1),
        $co->br,
        $co->hidden(-name=>'type',-value=>'write', -override=>1),
        $co->br,
        $co->center(
            $co->submit('Add to database'),
            $co->reset
        ),
        $co->end_form,
        $co->hr,
        $co->b("Look up a value in the database..."),
        $co->start_form,
        "Key to search for: ",
        $co->textfield(-name=>'key',-default=>'', -override=>1),
        $co->br,
        $co->hidden(-name=>'type',-value=>'read', -override=>1),
        $co->br,
        $co->center(
            $co->submit('Look up value'),
            $co->reset
        ),
        $co->end_form,
        $co->hr;
        print $co->end_html;
    }

    if($co->param()) {
        print $co->header(-type=>"application/xml");
        print "<?xml version = \"1.0\"?>";
        print "<document>";
        if($co->param('type') eq 'write') {
            # Open the NDBM database, creating it if it doesn't exist
            tie %dbhash, "NDBM_File", "dbdata", O_RDWR|O_CREAT, 0644;
            $key = $co->param('key');
            $value = $co->param('value');
            $dbhash{$key} = $value;
            untie %dbhash;
            if ($!) {
                print "There was an error: $!";
            } else {
                print "$key=>$value stored in the database";
            }
        } else {
            # Look up the key and return the pair as XML
            tie %dbhash, "NDBM_File", "dbdata", O_RDWR|O_CREAT, 0644;
            $key = $co->param('key');
            $value = $dbhash{$key};
            print "<key>";
            print $key;
            print "</key>";
            print "<value>";
            print $value;
            print "</value>";
            if ($value) {
                if ($!) {
                    print "There was an error: $!";
                }
            } else {
                print "No match found for that key";
            }
            untie %dbhash;
        }
        print "</document>";
    }


Real World XML
Real World XML (2nd Edition)
ISBN: 0735712867
Year: 2005
Pages: 440
Authors: Steve Holzner
