19.3 XML and PostgreSQL


PostgreSQLis an object relational database system. Unlike databases such as Tamino, PostgreSQL is not an XML-based database. This leads to higher efficiency, but when working with XML it also has some minor disadvantages.

In this section you will learn to use PostgreSQL in combination with XML. You will see that it's easy, and the fact that PostgreSQL is an object relational database offers some significant advantages.

19.3.1 Installation

In PostgreSQL's contributed directory, you can find a directory called xml. This directory contains a package for working with XML inside a PostgreSQL database. To install the package, just run make:

 [root@duron xml]# make gcc -O3 -march=athlon  -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -g -I../../src/include   -c -o pgxml_dom.o pgxml_dom.c gcc -shared -lxml2 -o pgxml_dom.so pgxml_dom.o if [ -z "$USER" ]; then USER=$LOGNAME; fi; \ if [ -z "$USER" ]; then USER=`whoami`; fi; \ if [ -z "$USER" ]; then echo 'Cannot deduce $USER.'; exit 1; fi; \ rm -f pgxml_dom.sql; \ C=`pwd`; \ sed -e "s:_CWD_:$C:g" \     -e "s:_OBJWD_:$C:g" \     -e "s:_DLSUFFIX_:.so:g" \     -e "s/_USER_/$USER/g" < pgxml_dom.source > pgxml_dom.sql rm pgxml_dom.o 

After compiling the source code, you can insert the functions in the package to PostgreSQL. This can easily be done with the help of psql:

 [hs@duron xml]$ psql phpbook < pgxml_dom.sql CREATE CREATE 

Two functions were added to the database. You can use these two functions to access and to validate an XML document.

19.3.2 Using the Module

In this section you will start to use PostgreSQL's XML functions. First, you can create a table for storing XML documents:

 phpbook=# CREATE TABLE xmldocument (id serial, tstamp timestamp DEFAULT now(), document text); NOTICE:  CREATE TABLE will create implicit sequence 'xmldocument_id_seq' for SERIAL column 'xmldocument.id' NOTICE:  CREATE TABLE / UNIQUE will create implicit index 'xmldocument_id_key' for table 'xmldocument' CREATE 

The first column contains an id. In the second column you can see when the document has been inserted into the database. The third column is an ordinary text column. It will contain the XML documents we are going to insert into the database.

For adding an XML document to the database, you can write a simple PHP script as shown in the next listing:

 <?php         $xmlfile = 'file.xml';          # file containing XML code         # connecting to the database         $dbh = pg_connect("dbname=phpbook user=postgres host=localhost");         if      (!$dbh)                 die ("cannot open connection to the database<br>\n");         # reading the file         $xml = '';                      # this var. will contain the XML file         $data = file($xmlfile);         foreach ($data as $tmp)         {                 $xml .= $tmp;         }         # escaping characters and importing file         $xml = preg_replace("/'/", "''", $xml);         $sql = "INSERT INTO xmldocument (document) VALUES ('$xml')";         $status = pg_exec($dbh, $sql);         if      (!$status)                 die ("cannot execute query<br>\n");         # disconnecting         pg_close($dbh); ?> 

The first thing to do is to connect to the database. Now the XML file is read and a variable containing the entire document is compiled. The file function returns the content of the file in an array. To compile a variable containing everything, a loop has to be processed. After that, single quotes are escaped. This is necessary because otherwise inserting the data into the database might fail because of a syntax error. Escaping is done by using a simple regular expression that does nothing except make two single quotes out of one. Now that the SQL statement has been prepared, you can insert it into the database.

After running the script, you can add the file to the table:

 phpbook=# SELECT id, tstamp, length(document) FROM xmldocument;  id |            tstamp            | length ----+------------------------------+--------   1 | 2002-02-05 00:56:23.66103+01 |    434 (1 row) 

We decided to display the length of the XML document instead of the content of the column because that would lead to an extremely long output.

19.3.2.1 Validating a Document

In the next step, it is time to see if the XML document is still correct and well formed. You can use the pgxml_parse function. It parses the input and returns a Boolean value that tells whether the document has a syntax error. Here's an example:

 phpbook=# SELECT pgxml_parse(document) FROM xmldocument;  pgxml_parse -------------  t (1 row) 

In this case the XML document is well formed because true is returned. The result is correct because we have already validated the XML document in the section called "XML Basics" earlier in this chapter.

After a file has been imported into the database, its validity can be checked. However, it is better to check if an XML document is valid before adding it to the table. This will help you to prevent errors. Before creating the modified version of the table, it is necessary to delete the old table and the sequence related to it:

 phpbook=# DROP TABLE xmldocument; DROP phpbook=# DROP SEQUENCE xmldocument_id_seq; DROP 

Then you can create a new table using a simple CREATE TABLE statement:

 CREATE TABLE xmldocument (         id serial,         tstamp timestamp DEFAULT now(),         document text CHECK (pgxml_parse(document) = 't') ); 

A CHECK constraint has been added to the CREATE TABLE statement. This CHECK constraint makes PostgreSQL call the pgxml_parse function before adding the record to the table. This way, documents that are not well formed cannot be added to the table.

Let's run the PHP script again and see what happens inside the database:

 phpbook=# SELECT id, tstamp, length(document) FROM xmldocument;  id |            tstamp             | length ----+-------------------------------+--------   1 | 2002-02-05 13:51:40.619786+01 |    434 (1 row) 

The record has been successfully added to the database. Now try to insert a random string:

 phpbook=# INSERT INTO xmldocument (document) VALUES ('not valid'); ERROR:  ExecAppend: rejected due to CHECK constraint xmldocument_document 

"not valid" is definitely not a well-formed XML document, so the INSERT operation fails.

19.3.2.2 Access to the Document

After documents have been added to the table, you might want to retrieve data from these XML documents. PostgreSQL offers a function called pgxml_xpath. As you can see from the name of the function, it is based on Xpath. Xpath is a language for referring to a particular part or component of an XML document. Here's an example:

 phpbook=# SELECT pgxml_xpath(document,'//person/birthday/text()','','') AS birthday FROM xmldocument;   birthday ------------  1978/08/09 (1 row) 

Inside person an element called birthday can be found. The content of this element is displayed. As you can see, the syntax is similar to the one used for URLs. The variable is returned as text. To convert the result to the required format, a cast can be performed:

 phpbook=# SELECT pgxml_xpath(document,'//person/birthday/text()','','')::date AS birthday FROM xmldocument;   birthday ------------  1978-08-09 (1 row) 

Now the result is returned as date.

Of course, it is also possible to access components of the documents that are deeper inside the tree structure:

 phpbook=# SELECT pgxml_xpath(document,'//person/name/firstname/text()','','') AS birthday FROM xmldocument;  birthday ----------  John (1 row) 

This time John has been returned. One thing that is important to know is what is returned if you refer to a nonexistent element:

 phpbook=# SELECT pgxml_xpath(document,'//person/notthere/text()','','') AS notthere FROM xmldocument;  notthere ---------- (1 row) 

An empty string is returned. Keep in mind that it is an empty string and not a NULL value. This is an important thing when checking for empty fields. The next example proves that the field is not a NULL value:

 phpbook=# SELECT pgxml_xpath(document,'//person/notthere/text()','','') IS NULL AS notthere FROM xmldocument;  notthere ----------  f (1 row) 

As you can see, the result is false and not true.

Sometimes it can be useful to retrieve data including some tags. PostgreSQL provides a way to do this as well. Let's take a look at the next example:

 SELECT pgxml_xpath(document,                 '//person/name/firstname/text()',                 'name',                 'firstname')         FROM xmldocument; 

Before taking a closer look at the SQL code in the listing, it is worth looking at the result:

                pgxml_xpath ------------------------------------------  <name><firstname>John</firstname></name> (1 row) 

As you can see, tags can be found around the actual data. These tags are defined by the third and fourth parameter in the function call. Take a look at the next example:

 phpbook=# SELECT pgxml_xpath(document, '//person/name/firstname/text()', 'name', '') FROM xmldocument;     pgxml_xpath -------------------  <name>John</name> (1 row) 

This time there is just one tag in the result because the fourth parameter is left blank.

Up to now PostgreSQL's interface covers just a small part of XML. The module is still being developed, and we will see many additional functions in future versions of PostgreSQL's XML package. XML is an important subject, and John Gray (the father of the XML package) will certainly continue his work.

19.3.3 Problems When Working with XML

Working with XML can lead to some problems that do not occur when sticking to the relational concept.

XML files are ASCII files and they are highly redundant because of the number of tags in the code. When you're working with huge amounts of data, this can be a problem and will lead to poorly performing applications. In addition, it is not that easy to index an XML file. Usually workarounds have to be built to speed up an XML-based application. Although XML is a standard interface for exchanging data, it is just a language and not a database system. Keep this in mind when working with XML. For really fast applications, you won't find significantly faster software than some sort of (relational) database (I am not talking about proprietary software built on bitfields).



PHP and PostgreSQL. Advanced Web Programming2002
PHP and PostgreSQL. Advanced Web Programming2002
ISBN: N/A
EAN: N/A
Year: 2004
Pages: 201

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net