Recipe 10.40. Importing XML into MySQLProblemYou want to import an XML document into a MySQL table. SolutionSet up an XML parser to read the document. Then use the records in the document to construct and execute INSERT statements. Discussion
Importing an XML document depends on being able to parse the document and extract record contents from it. The way that you do this depends on how the document is written. For example, one format might represent column
<?xml version="1.0" encoding="UTF-8"?> <rowset> <row> <column name="subject" value="Jane" /> <column name="test" value="A" /> <column name="score" value="47" /> </row> <row> <column name="subject" value="Jane" /> <column name="test" value="B /> <column name="score" value="50" /> </row> ... </rowset> Another format uses column names as element names and column values as the contents of those elements: <?xml version="1.0" encoding="UTF-8"?> <rowset> <row> <subject>Jane</subject> <test>A</test> <score>47</score> </row> <row> <subject>Jane</subject> <test>B</test> <score>50</score> </row> ... </rowset>
Due to the various structuring possibilities, it's necessary to make some assumptions about the format you expect the XML document to have. For the example here, I'll assume the second format just shown. One way to process this kind of document is to use the XML::XPath module, which enables you to refer to elements within the document using path expressions. For example, the
The following script, xml_to_mysql.pl , takes three arguments: % xml_to_mysql.pl db_name tbl_name xml_file
The filename argument indicates which document to import, and the database and table name arguments
#!/usr/bin/perl
# xml_to_mysql.pl - read XML file into MySQL
use strict;
use warnings;
use DBI;
use XML::XPath;
#
... process command-line options (not shown) ...
#
... connect to database (not shown) ...
# Open file for reading
my $xp = XML::XPath->new (filename => $file_name);
my $row_list = $xp->find ("//row"); # find set of <row> elements
print "Number of records: " . $row_list->size () . "\n";
foreach my $row ($row_list->get_nodelist ()) # loop through rows
{
my @name; # array for column names
my @val; # array for column values
my $col_list = $row->find ("*"); # child columns of row
foreach my $col ($col_list->get_nodelist ()) # loop through columns
{
# save column name and value
push (@name, $col->getName ());
push (@val, $col->string_value ());
}
# construct INSERT statement, and then execute it
my $stmt = "INSERT INTO $tbl_name ("
. join (",", @name)
. ") VALUES ("
. join (",", ("?") x scalar (@val))
. ")";
$dbh->do ($stmt, undef, @val);
}
$dbh->disconnect ();
The script creates an XML::XPath object, which opens and parses the document. Then the object is queried for the set of <row> elements, using the path //row . The size of this set indicates how many records the document contains.
To process each row, the script uses the path
*
to ask for all the child elements of the row object. Each child corresponds to a column within the row; using
*
as the path for
get_nodelist( )
this way is
In the previous section, we used
mysql_to_xml.pl
to export the contents of the
expt
table as an XML document.
xml_to_mysql.pl
can perform the
% xml_to_mysql.pl cookbook expt expt.xml As it processes the document, the script generates and executes the following set of statements:
INSERT INTO expt (subject,test,score) VALUES ('Jane','A','47')
INSERT INTO expt (subject,test,score) VALUES ('Jane','B','50')
INSERT INTO expt (subject,test) VALUES ('Jane','C')
INSERT INTO expt (subject,test) VALUES ('Jane','D')
INSERT INTO expt (subject,test,score) VALUES ('Marvin','A','52')
INSERT INTO expt (subject,test,score) VALUES ('Marvin','B','45')
INSERT INTO expt (subject,test,score) VALUES ('Marvin','C','53')
INSERT INTO expt (subject,test) VALUES ('Marvin','D')
Note that these statements do not all insert the same number of columns. MySQL will set the missing columns to their default values. |