Importing XML Data into a Database

I l @ ve RuBoard

Now that you know how to convert database records into well- formed XML documents, let's turn things around to look at how the process works in reverse. This section examines how data marked up with XML can be converted into SQL statements and inserted into a database.

Dynamically Constructing SQL Queries from an XML Document with SAX

Again, we'll begin with a simple exampleconsider the XML document in Listing 7.10.

Listing 7.10 An XML Book List ( books.xml )
 <?xml version="1.0"?>  <list>         <item>                <title>Waking Up With The Red-Eyed Bed Monster</title>                <author>J. I. M. Somniac</author>                <blurb>The blood-chillingly true story of one man's fight against the graphics/ccc.gif monsters under his bed.</blurb>         </item>         <item>                <title>The Case Of The Hungry Hippopotamus</title>                <author>P. I. Hardhat</author>                <blurb>A tough private eye is hired to solve the most challenging case of graphics/ccc.gif his career.</blurb>         </item>         <item>                <title>Making Money, Losing Friends</title>                <author>T. Backstabber</author>                <blurb>The bestselling self-help book for corporate executives on the fast graphics/ccc.gif track.</blurb>         </item>         <item>                <title>The Idiot's Guide to Sandwiches</title>                <author>B. L. Tuhmatto</author>                <blurb>Making tasty sandwiches has never been so easy!</blurb>         </item>  </list> 

The process of putting this XML-encoded data into a database involves the following four steps:

  1. Parse the XML data.

  2. Build a list of field-value pairs.

  3. Dynamically generate a query to insert these field-value pairs into the database.

  4. Execute the query.

It's possible to accomplish these tasks using SAX, XSLT, or the DOM. Most of the time, the process is fairly simple; the level of complexity varies depending on how generic you need your script to be. Listing 7.11 demonstrates the process, using SAX in a script designed specifically to handle the XML document described in Listing 7.10.

Listing 7.11 Generating and Executing SQL Statements with SAX
 <?php  // initialize some variables  $currentTag = "";  // this array will hold the values for the SQL statement  $values = array();  // this array will hold allowed fields/elements  $allowedFields = array("title", "author", "blurb");  // XML file to parse  $xml_file="books.xml";  // database parameters  $host = "localhost";  $user = "john";  $pass = "doe";  $db = "library";  $table = "books";  // called when parser finds start tag  function startElementHandler($parser, $name, $attributes)  {       global $currentTag;          $currentTag = $name;  }  // called when parser finds end tag  function endElementHandler($parser, $name)  {       global $values, $currentTag;        // import database link and table name        global $connection, $table;        // if ending <item> tag        // implies end of record        if (strtolower($name) == "item")        {             // generate the query string              $query = "INSERT INTO books";              $query .= "(title, author, blurb) ";              $query .= "VALUES(\"" . join("\", \"", $values) . "\");";              // uncomment for debug              // print $query;              // execute query              $result = mysql_query($query) or die ("Error in query: $query. " . graphics/ccc.gif mysql_error());              // reset all internal counters and arrays              $values = array();              $currentTag = "";        }  }  // called when parser finds cdata  function characterDataHandler($parser, $data)  {       global $currentTag, $values, $allowedFields;        // lowercase tag name        $currentTag = strtolower($currentTag);        // look for tag in $allowedFields[] array        // to see if it is to be included in query        if (in_array($currentTag, $allowedFields) && trim($data) != "")        {             // add field=>value pairs to $values array              $values[$currentTag] = mysql_escape_string($data);        }  }  // initialize parser  $xml_parser = xml_parser_create();  // turn off whitespace processing  xml_parser_set_option($xml_parser,XML_OPTION_SKIP_WHITE, TRUE);  // turn on case folding  xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, FALSE);  // set callback functions  xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler");  xml_set_character_data_handler($xml_parser, "characterDataHandler");  // open connection to database  $connection = mysql_connect($host, $user, $pass) or die ("Unable to connect!");  mysql_select_db($db) or die ("Unable to select database!");  // read XML file  if (!($fp = fopen($xml_file, "r")))  {       die("File I/O error: $xml_file");  }  // parse XML  while ($data = fread($fp, 4096))  {       // error handler        if (!xml_parse($xml_parser, $data, feof($fp)))        {             $error_code = xml_get_error_code($xml_parser);              die("XML parser error (error code " . $error_code . "): " . graphics/ccc.gif xml_error_string($error_code) . "<br>Error occurred at line " . graphics/ccc.gif xml_get_current_line_number($xml_parser));        }  }  // all done, clean up!  xml_parser_free($xml_parser);  mysql_close($connection);  ?> 

Again, most of this should already be familiar to you from Chapter 2, but I'll run through the highlights quickly anyway.

  1. An instance of the SAX parser is initialized and configured to invoke the callback functions startElementHandler() , endElementHandler() , and cdataHandler() when it encounters starting elements, ending elements, and character data respectively:

     $xml_parser = xml_parser_create();  xml_set_element_handler($xml_parser, "startElementHandler",  "endElementHandler");  xml_set_character_data_handler($xml_parser, "characterDataHandler"); 
  2. Because we already decided that this script is to be customized to the XML document presented in Listing 7.10, it's possible to clearly identify which XML elements contain the data we're interested in at the outset, and write code to handle these elements appropriately. A study of the XML data in Listing 7.10 reveals that the elements <title> , <author> , and <book> hold the information that is to be inserted into the database; consequently, every time the parser finds character data, it must first check the corresponding element name. If it matches any of the three names listed previously, it adds the element-value pair to the associative array $values .

     if (in_array($currentTag, $allowedFields) && trim($data) != "")  {       // add field=>value pairs to $values array        $values[$currentTag] = mysql_escape_string($data);  } 
  3. When the parser encounters a closing item elementindicating the end of a particular recordthe elements of the $values array are manipulated to create a single query string, which is then applied to the database via PHP's mysql_query() function:

     if (strtolower($name) == "item")  {       $query = "INSERT INTO books";        $query .= "(title, author, blurb) ";        $query .= "VALUES(\"" . join("\", \"", $values) . "\");";        $result = mysql_query($query) or die ("Error in query: $query. " . mysql_error());  } 
  4. After all the elements within the document have been processed , the parser is terminated , and the database connection is closed.

     xml_parser_free($xml_parser);  mysql_close($connection); 

Listing 7.12 demonstrates the output of this script.

Listing 7.12 A Series of SQL Statements Dynamically Generated from an XML File
 INSERT INTO books(title, author, blurb) VALUES("Waking Up With The Red-Eyed Bed Monster", graphics/ccc.gif "J. I. M. Somniac", "The blood-chillingly true story of one man\'s fight against the graphics/ccc.gif monsters under his bed.");  INSERT INTO books(title, author, blurb) VALUES("The Case Of The Hungry Hippopotamus", "P. graphics/ccc.gif I. Hardhat", "A tough private eye is hired to solve the most challenging case of his graphics/ccc.gif career.");  INSERT INTO books(title, author, blurb) VALUES("Making Money, Losing Friends", "T. graphics/ccc.gif Backstabber", "The bestselling self-help book for corporate executives on the fast graphics/ccc.gif track.");  INSERT INTO books(title, author, blurb) VALUES("The Idiot\'s Guide to Sandwiches", "B. L. graphics/ccc.gif Tuhmatto", "Making tasty sandwiches has never been so easy!"); 

Now, Listing 7.11 makes certain assumptions about the field names and the number of values in each SQL statement. These assumptions may not always be true, so let's see if we can develop something a little more generic by using the XML document in Listing 7.13 as the base.

Listing 7.13 A More Generic XML Table Representation ( data.xml )
 <?xml version="1.0"?>  <table name="readings">        <record>              <a>56</a>              <b>1222</b>              <c>78.5</c>        </record>        <record>              <x>45</x>              <y>-10</y>        </record>        <record>              <x>12</x>              <b>10459</b>              <a>756</a>              <y>9</y>        </record>  </table> 

This XML document is a little more generic than the one demonstrated in Listing 7.10. The table name is specified as an attribute of the document element <table> , whereas individual table records are enclosed within <record> elements. Every record contains a series of field names (elements) and field values (character data). The field names within each <record> may be variable, as may be the total number of fields per record.

Listing 7.14 builds on the basic concepts demonstrated in Listing 7.11 and the generic XML document described in Listing 7.13 to create a more generic parserone that makes no assumptions about the number of values in each record, and retrieves the field names from the XML data itself (rather than having them hard-coded into the script).

Listing 7.14 A More Generic XML-to-SQL Converter
 <?php  // initialize some variables  $currentTag = "";  // these two arrays will hold the fields and values for the SQL statement  $fields = array();  $values = array();  // XML file to parse  $xml_file="data.xml";  // database parameters  // get these via user input  $host = "localhost";  $user = "john";  $pass = "doe";  $db = "db423";  // called when parser finds start tag  function startElementHandler($parser, $name, $attributes)  {       global $currentTag, $table;        $currentTag = $name;        // get table name        if (strtolower($currentTag) == "table")        {             $table = $attributes["name"];        }  }  // called when parser finds end tag  function endElementHandler($parser, $name)  {       global $fields, $values, $count, $currentTag;        // import database link and table name        global $connection, $table;        // if </record> tag        // implies end of record        if (strtolower($name) == "record")        {             // generate the query string              $query = "INSERT INTO $table";              $query .= "(" . join(", ", $fields) . ")";              $query .= " VALUES(\"" . join("\", \"", $values) . "\");";              // uncomment for debug              // print $query;              // execute query              mysql_query($query) or die ("Error in query: $query. " .mysql_error());              // reset all internal counters and arrays              $fields = array();              $values = array();              $count = 0;              $currentTag = "";        }  }  // called when parser finds cdata  function characterDataHandler($parser, $data)  {       global $fields, $values, $currentTag, $count;        if (trim($data) != "")        {             // add field-value pairs to $fields and $values array              // the index of each array is used to correlate the field-value pairs              $fields[$count] = $currentTag;              // escape quotes with slashes              $values[$count] = mysql_escape_string($data);              $count++;        }  }  // initialize parser  $xml_parser = xml_parser_create();  // turn off whitespace processing  xml_parser_set_option($xml_parser,XML_OPTION_SKIP_WHITE, TRUE);  // turn on case folding  xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, FALSE);  // set callback functions  xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler");  xml_set_character_data_handler($xml_parser, "characterDataHandler");  // open connection to database  $connection = mysql_connect($host, $user, $pass) or die ("Unable to connect!");  mysql_select_db($db) or die ("Unable to select database!");    // read XML file  if (!($fp = fopen($xml_file, "r")))  {       die("File I/O error: $xml_file");  }  // parse XML  while ($data = fread($fp, 4096))  {       // error handler        if (!xml_parse($xml_parser, $data, feof($fp)))        {             $ec = xml_get_error_code($xml_parser);              die("XML parser error (error code " . $ec . "): " . xml_error_string($ec) . graphics/ccc.gif "<br>Error occurred at line " . xml_get_current_line_number($xml_parser));        }  }  // all done, clean up!  xml_parser_free($xml_parser);  mysql_close($connection);  ?> 

This is fairly involved, so an explanation is in order:

  1. The first step is to set up a bunch of required variables: the name of the XML file to be parsed, database access parameters, and so on. Note that, unlike Listing 7.11, there's no need to specify a table name herethat information will be sourced directly from the XML document.

     $currentTag = "";  $fields = array();  $values = array();  $host = "localhost";  $user = "john";  $pass = "doe";  $db = "db423"; 
  2. Next , the SAX parser is initialized, parsing options are set, and callback functions are defined:

     $xml_parser = xml_parser_create();  xml_parser_set_option($xml_parser,XML_OPTION_SKIP_WHITE, TRUE);  xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, FALSE);  xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler");  xml_set_character_data_handler($xml_parser, "characterDataHandler"); 

    A connection is also opened to the database in preparation for the queries that will follow:

     $connection = mysql_connect($host, $user, $pass) or die ("Unable to connect!");  mysql_select_db($db) or die ("Unable to select database!"); 
  3. With everything in place, we're ready to roll. The xml_parse() function is used to parse the XML data, invoking the specified callback functions as it finds elements and character data in the document.

     xml_parse($xml_parser, $data, feof($fp)); 
  4. Every time the parser encounters a start tag, it invokes the startTagHandler() function, which assigns the element name to the $currentTag variable and makes it globally available to other functions in the script (specifically, to the characterDataHandler() function, which is the only one that actually uses it). It also checks the element name to see whether it is the document element <table> ; if so, it retrieves the table name.

     $currentTag = $name;  // get table name  if (strtolower($currentTag) == "table")  {       $table = $attributes["name"];  } 
  5. Whenever the parser encounters character data, it creates a field-value pair, plac-ing the element (field) nameavailable from the $currentTag variablein the $fields array, and the character data (value) in the corresponding slot in the $values array.

     $fields[$count] = $currentTag;  $values[$count] = mysql_escape_string($data); 
  6. After the parser hits the end of an individual recorda closing <record> elementthe end tag handler uses the collected field-value pairs to build an INSERT statement, and executes this statement via a call to the mysql_query() function. It then resets all the variables used $currentTag and the field and value arraysin preparation for parsing the next record.

     if (strtolower($name) == "record")  {       $query = "INSERT INTO $table";        $query .= "(" . join(", ", $fields) . ")";        $query .= "VALUES(\"" . join("\", \"", $values) . "\");";        mysql_query($query) or die ("Error in query: $query. " . mysql_error());  } 
  7. After parsing has concluded, both the parser and the database connection are gracefully terminated.

     xml_parser_free($xml_parser);  mysql_close($connection); 

Looking Ahead

If you found Listing 7.14 interesting, you'll probably also enjoy the section on Metabase in Chapter 9, "Case Studies."

Listing 7.15 demonstrates the output.

Listing 7.15 A Series of SQL Statements Generated from an XML File
 INSERT INTO readings(a, b, c) VALUES("56", "1222", "78.5");  INSERT INTO readings(x, y) VALUES("45", "-10");  INSERT INTO readings(x, b, a, y) VALUES("12", "10459", "756", "9"); 

Now, although this is fairly complicated, it works quite well. But don't be fooled into thinking it's the only wayif there's anything this chapter demonstrates, it's that the XML/PHP combination offers more than one way to skin a cat. In this case, cat-skinning technique number two involves using an XSLT stylesheet to generate the query string, with PHP standing by to actually execute the query.

Dynamically Constructing SQL Queries from an XML Document with XSLT

As you may already know, XSLT provides an efficient, powerful mechanism to transform an XML document into a new result documenthere, a series of SQL statements. A number of basic XSLT constructsincluding loops , conditional statements, and XPath node testsare available to accomplish this. They are used with great success in Listing 7.16, which lists the complete stylesheet code to perform this transformation.

Listing 7.16 An XSLT Stylesheet to Construct SQL INSERT Statements ( sql.xsl )
 <?xml version="1.0"?>  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">  <!-- output as text, strip whitespace -->  <xsl:output method="text" indent="no" />  <xsl:strip-space elements="*"/>  <!-- look for the table node  -->  <xsl:template match="/table" >        <!-- iterate through each record -->        <xsl:for-each select="record">              <!-- get the table name -->              <xsl:text>INSERT INTO </xsl:text>              <xsl:value-of select="/table/@name" />              <xsl:text>(</xsl:text>              <!-- iterate through child elements and get field names -->              <xsl:for-each select="child::*">                    <xsl:value-of select="name()" />                    <xsl:if test="position() != last()">, </xsl:if>              </xsl:for-each>              <xsl:text>) VALUES(</xsl:text>              <!-- iterate through child elements and get values -->              <xsl:for-each select="child::*">                    <xsl:text>"</xsl:text>                    <xsl:value-of select="." />                    <xsl:text>"</xsl:text>                    <xsl:if test="position() != last()">                          <xsl:text>,</xsl:text>                    </xsl:if>              </xsl:for-each>            <xsl:text>);</xsl:text>        </xsl:for-each>  </xsl:template>  </xsl:stylesheet> 

Listing 7.17 processes this stylesheet using PHP's XSLT API.

Listing 7.17 An XSLT-Based XML-to-SQL Converter
 <?php  // XML file  $xml_file = "data.xml";  // XSLT stylesheet  $xslt_file = "sql.xsl";  // database parameters  $host = "localhost";  $user = "john";  $pass = "doe";  $db = "db423";    // create the XSLT processor  $xp = xslt_create() or die("Could not create XSLT processor");  // process the two files to get the desired output  if($result = xslt_process($xp, $xml_file, $xslt_file))  {       // uncomment for debug        // echo $result;        // open connection to database        $connection = mysql_connect($host, $user, $pass) or die ("Unable to connect!");        mysql_select_db($db) or die ("Unable to select database!");        // split XSLT output into individual queries        $queries = explode(";", $result);        // execute each query        foreach($queries as $query)        {             if(!empty($query))              {                   mysql_query($query) or die ("Error in query: $query. " . graphics/ccc.gif mysql_error());              }        }        // close database connection        mysql_close($connection);  }  else  {       // else display error        echo "An error occurred: " . xslt_error($xp) . "(error code " . xslt_errno($xp) . graphics/ccc.gif ")";  }  // clean up  xslt_free($xp);  ?> 

In this case, the output from the XSL Transformation is equivalent to that in Listing 7.15 a series of SQL statements, returned as a single string. These SQL statements are separated from each other and individually executed via PHP's mysql_query() function.

Things to Do on a Rainy Day

Obviously, INSERT statements aren't the only types of SQL statements you can buildit's just as easy to dynamically construct UPDATE and DELETE statements from XML data. And you don't have to do it using just SAX or XSLT eitherthis type of conversion lends itself very well to DOM-type tree traversal and node manipulation. PHP's DOM extension represents every node on the DOM tree as an object, exposing properties such as the element name and value; consequently, it's not too hard to manipulate these properties to build an INSERT or UPDATE statement.

In order to gain a better understanding of the XML-to-SQL conversion process, you should attempt to duplicate the preceding examples using the DOM; for variety, try constructing a series of UPDATE instead of INSERT statements.

I l @ ve RuBoard


XML and PHP
XML and PHP
ISBN: 0735712271
EAN: 2147483647
Year: 2002
Pages: 84

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net