Appendix E: XML Relational Database Technology | Beginning XML Databases (Wrox Beginning Guides)

Figure E-1 shows the various stages of development for XML in various different vendor database engines.

Figure E-1: The various stages of development of XML capabilities, in various databases. All timing aspects are directly related to when this book was written. This appendix was added in mid-May, 2006.

This book initially contained a chapter that covered native XML capabilities in IBM DB2 Database. The XML native aspect of DB2 is covered by a DB2 option called DB2 XML Extender. DB2 XML Extender software was too complex, and the explanation too drawn out for what is essentially a beginner-level book that covers XML in databases, not DB2 Database specifically . This is not meant to imply that DB2 Database is inadequate. It is merely too complex for beginner-level explanation in this particular book. The basics of XML DB2 have been included in this appendix rather than as an individual chapter because details on the DB2 XML Extender option are not included.

The Basics of XML in IBM DB2 Database

Once again, it was decided that DB2 XML Extender was too complex to install and manage for a beginner-level book so I moved the details into this appendix. This section covers the basics of XML in DB2 Database, without digging experimentally in the DB2 XML Extender software.

DB2 XML Datatypes

There are three available XML data types for use in DB2 database:

XMLVARCHAR: XML documents less than 3KB.
XMLCLOB: XML documents up to 2GB.
XMLFILE: XML documents stored externally to DB2 on disk.

To enable a DB2 database for XML use, you need to enable the XML Extender by executing a script called getstart_prep.cmd .

An XML datatype in DB2 database either involves the DB2 Extender, or the use of a basic CLOB object to store an XML document as a string. A CLOB object is a data type used to store very large string values in binary format. Use the XML2CLOB function (superceded by the XMLSERIALIZE function) to convert an XML document into a string for direct database storage into a CLOB data type field.

See the section on SQL/XML later in this appendix. Coverage of the DB2 XML Extender is deliberately omitted from this book because this book is a beginner-level book. The manual covering DB2 XML Extender software is almost 350 pages long, just by itself. DB2 XML Extender requires far too much manual definition, including specialized tables and specialized structural enforcement metadata documents (DAD). Oracle Database and SQL Server database provides much of this functionality automatically, in the way of their particular implementations of an XML data type.

Document Access Definition (DAD) documents specify direct mappings between relational tables and XML documents. In other words, XML data is not stored in a relational database, but stored in rela tional tables where the DAD documents provide a real-time, overhead-consuming, direct mapping between XML document structure and relational table structures. Its a waste of resources because both the external XML and the tables contain the same dataeven though that data is structurally different. This approach is also inefficient because a change to one requires a change to both. Native XMLType data types are much more effective. An XMLType data type effectively creates a form of a Native XML database (or a collection or multiple collections thereof) within a relational database.

Creating XML Documents from a DB2 Database

This section is divided into three separate sections:

The REC2XML Function: Returns very basic record-by-record XML tagged record output from simple queries.
SQL/XML: A large number of SQL/XML functions are in a DB2 database.
DB2 XML Extender: This piece of software extends DB2 database, allowing generation of XML data in CLOB tables, utilizing Document Access Definition (DAD) files, and Websphere. Websphere is a DB2 front-end Graphical User Interface (GUI) package. Additionally, Websphere is a GUI front-end application. This book deals with XML capabilities at the database level (within the database), not front-end applications. The problem with using the DB2 XML Extender software is that it is tremendously complex. It even needs to be specifically installed. It is too complicated for a beginner-level book, which this book is, and DB2 XML Extender is thus deliberately omitted from this publication. The intention of this book is to present the reader with easy methods of resolving XML issues with various relational database engines. Much as I would like to describe, explain, and demonstrate DB2 XML Extender in this book, it is simply too complex for a beginners book.

Using The REC2XML Function

The REC2XML function is a very simple method of returning the result of a query as an XML formatted string. The result is a simple field names and field data format. The syntax for the REC2XML function is a little over complicated for the purposes of this book. Essentially, the REC2XML function can be used to retrieve records from tables, producing a formatted, record-for-record XML tag structure. The following example displays regions :

   SELECT REC2XML(1.0, 'COLATTVAL', '', REGION_ID, REGION, POPULATION, AREA) FROM DEMOGRAPHICS.REGION;

This is a partial result where each row is output on a single line. I have reformatted the output with new lines to make it a little more readable:

   <row>    <column name="REGION_ID">1</column>    <column name="REGION">Africa</column>    <column name="POPULATION">789548670</column>    <column name="AREA">26780325</column> </row> <row>    <column name="REGION_ID">10</column>    <column name="REGION">North America</column>    <column name="POPULATION">331599508</column>    <column name="AREA">18729272</column> </row> ... <row>    <column name="REGION_ID">9</column>    <column name="REGION">Near East</column>    <column name="POPULATION">1499157105</column>    <column name="AREA">4721322</column> </row>

The REC2XML function is very basic.

The DB2 Implementation of SQL/XML

The DB2 SQL/XML functions get a lot better than using only the REC2XML function. Available functions are as follows :

XMLSERIALIZE (CONTENT XML-function AS datatype) : Applies an XML function (as detailed in the list below), producing XML content.
XML2CLOB(XML-function) : Converts an XML document to a string format for storage into a CLOB object field. The XML2CLOB function is superseded by the XMLSERIALIZE function.

The XML functions are as follows:

XMLELEMENT (NAME element [, namespace ] [, XMLATTRIBUTES (... ) ] value ) : Builds an XML element:
```
    <element>value</element>    
```
XMLATTRIBUTES (value AS name [, ... ] ) : Creates an attribute name-value pair within an element:
```
   <element  attribute="value"  >value</element>   
```
XMLFOREST ([namespace] value AS element [, ... ]) : Can be used to build a sequence of elements on a single level:
```
   <element>  <element1>value1</element1>   <element2>value2</element2>  </element>   
```
XMLCONCAT (XML-function [, XML-function ... ] ) : Concatenates a variable number of elements, such as a number of elements in a subtree , passed back up to a parent element.
XMLAGG (XMLELEMENT [ ORDER BY column [, column ... ] ]) : Concatenates and optionally orders a set of XML values into a parent element.
XMLNAMESPACE (namespace AS prefix , ... ] ) : Builds a namespace declaration.

Now let me explain each method by example, as I did with SQL/XML in Chapter 5, which covers Oracle Databases implementation of SQL/XML.

The first example uses the XMLELEMENT function to create XML elements (tags) directly from a query:

   SELECT XMLSERIALIZE(CONTENT XMLELEMENT(NAME "region", REGION) AS CLOB) FROM DEMOGRAPHICS.REGION;

This is the result:

   <region>Africa</region> <region>Asia</region> <region>Australasia</region> <region>Caribbean</region> <region>Central America</region> <region>Europe</region> <region>Far East</region> <region>Middle East</region> <region>Near East</region> <region>North America</region> <region>Oceania</region> <region>Russian Federation</region> <region>South America</region>

You cant do this; DB2 will return an error:

   SELECT XMLELEMENT(NAME "region", REGION) FROM DEMOGRAPHICS.REGION;

In DB2, you have to cast the result into a data type, which is exactly what the XMLSERIALIZE function does. The XMLSERIALIZE function in the previous example typecasts the result of the query into a CLOB object.

Multiple XMLELEMENT functions can even be embedded within one another to produce a hierarchy of multiple layers , as an XML document should be constructed :

   SELECT XMLSERIALIZE(CONTENT XMLELEMENT(    NAME "region",       XMLELEMENT(NAME "name", REGION),       XMLELEMENT(NAME "population", POPULATION)    ) AS CLOB) FROM DEMOGRAPHICS.REGION;

This is a partial result, reformatted on separate lines:

   <region>    <name>Africa</name>    <population>789548670</population> </region> <region>    <name>Asia</name>    <population>47382633</population> </region> ... <region>    <name>South America</name>    <population>375489788</population> </region>

This next example mixes elements and attributes together by adding in the XMLATTRIBUTES function:

   SELECT XMLSERIALIZE(CONTENT XMLELEMENT(    NAME "region", XMLATTRIBUTES(REGION AS "name"),       XMLELEMENT(NAME "population", POPULATION)    ) AS CLOB) FROM DEMOGRAPHICS.REGION;

This is the result with the name attribute highlighted for the first region:

   <region  name="Africa"  ><population>789548670</population></region> <region name="Asia"><population>47382633</population></region> <region name="Australasia"><population>24340222</population></region> <region name="Caribbean"><population>40417697</population></region> <region name="Central America"><population>142653392</population></region> <region name="Europe"><population>488674441</population></region> <region name="Far East"><population>2100636517</population></region> <region name="Middle East"><population>294625718</population></region> <region name="Near East"><population>1499157105</population></region> <region name="North America"><population>331599508</population></region> <region name="Oceania"><population>9133256</population></region> <region name="Russian Federation"><population>258037209</population></region> <region name="South America"><population>375489788</population></region>

And this query places all the fields applicable to each region into the element for each region:

   SELECT XMLSERIALIZE(CONTENT XMLELEMENT(    NAME "region",      XMLATTRIBUTES(REGION AS "name", POPULATION AS "population") ) AS CLOB) FROM DEMOGRAPHICS.REGION;

This is the result with the attributes highlighted for the first region again:

   <region  name="Africa" population="789548670"  ></region> <region name="Asia" population="47382633"></region> <region name="Australasia" population="24340222"></region> <region name="Caribbean" population="40417697"></region> <region name="Central America" population="142653392"></region> <region name="Europe" population="488674441"></region> <region name="Far East" population="2100636517"></region> <region name="Middle East" population="294625718"></region> <region name="Near East" population="1499157105"></region> <region name="North America" population="331599508"></region> <region name="Oceania" population="9133256"></region> <region name="Russian Federation" population="258037209"></region>

The XML_CONCAT function concatenates multiple XML fragments into a single XML pattern:

   SELECT XMLSERIALIZE(CONTENT XMLELEMENT(NAME "region", XMLATTRIBUTES(REGION AS "name"),    XMLCONCAT(       XMLELEMENT(NAME "population", POPULATION),       XMLELEMENT(NAME "area", AREA)    ) ) AS CLOB) FROM DEMOGRAPHICS.REGION;

This is a partial result, formatted for readability:

   <region name="Africa">    <population>789548670</population>    <area>26780325</area> </region> <region name="Asia">    <population>47382633</population>    <area>657741</area></region> ... <region name="South America">    <population>375489788</population>    <area>17545171</area> </region>

The XMLAGG method concatenates with the added option of re-sorting child elements of the XMLAGG function:

   SELECT XMLSERIALIZE(CONTENT XMLAGG(    XMLELEMENT(NAME "region", XMLATTRIBUTES(REGION_ID AS "id"),       XMLELEMENT(NAME "name", REGION),       XMLELEMENT(NAME "population", POPULATION),       XMLELEMENT(NAME "area", AREA)    ) ORDER BY POPULATION DESC ) AS CLOB) FROM DEMOGRAPHICS.REGION;

This is a partial result showing regions sorted in order of descending population size for each region:

   <region id="7">    <name>Far East</name>    <population>  2100636517  </population>    <area>15357441</area> </region> <region id="9">    <name>Near East</name>    <population>1499157105</population>    <area>4721322</area> </region> ...

The XMLFOREST method allows you to create multiple XML elements in a single method. The following query creates multiple elements by selecting fields. It is not creating each field as an individual element:

   SELECT XMLSERIALIZE(CONTENT    XMLELEMENT(NAME "region", XMLATTRIBUTES(REGION_ID AS "id"),       XMLFOREST(REGION AS "name", POPULATION AS "population", AREA AS "area") ) AS CLOB) FROM DEMOGRAPHICS.REGION;

This is a partial result, formatted here for readability:

   <region id="1">    <name>Africa</name>    <population>789548670</population>    <area>26780325</area> </region> <region id="2">    <name>Asia</name>    <population>47382633</population>    <area>657741</area> </region> ... <region id="13">    <name>South America</name>    <population>375489788</population>    <area>17545171</area> </region>

Working with XML in DB2 CLOB Objects

XML documents can be stored into a DB2 database using CLOB objects. As you already know, a CLOB object is simply a text object stored in binary form. Thus, a CLOB object allows storage of very large text strings into a DB2 database, in a field, in a table. Lets begin by creating a table containing a CLOB object field:

   CONNECT TO NEWDEM; CREATE TABLE DEMOGRAPHICS.XML(XML CLOB(5M)); COMMIT;

A simple method of loading an XML document into a DB2 database table is to use the LOAD command. Using the LOAD command might look something like this:

   LOAD FROM '<path>\demographicsCLOB.xml' OF ASC INSERT INTO DEMOGRAPHICS.XML(XML);

Unfortunately, the LOAD command appears to be somewhat dated and functions only when attempting to upload a delimited file, or a file allowing locational specifications ( positions of columns in a line). The XML document I am attempting to load is all a single line and the LOAD command is inappropriate.

Loading small strings is not an issue, as in the following command:

   INSERT INTO DEMOGRAPHICS.XML(XML) VALUES('<root><region id="1"><name>Africa</name>< population>789548670</population><area>26780325</area></region><region id="2"><name >Asia</name><population>47382633</population><area>657741</area></region></root>');

Loading an XML document of a few gigabytes as a string might be a slight problem using a command like that shown in the preceding code.

Also, using the LOAD or IMPORT options within the DB2 Control Center tool is limited by a number of bytes for each line. It appears that the only way to load a large XML document into a table, containing a CLOB field, is by inserting a string. The DB2 XML Extender software is far too complex for this book, and thus importing a very large XML document into a DB2 database is a little problematic , as shown in Figure E-2.

Figure E-2: Loading large XML documents into DB2 is a little problematic.

A shell SQL tool could probably be used to resolve this issue, but it might be best to comply with the 2GB size limitation for DB2 CLOB objects. What I will do is to recreate the XML table with a primary identifier as shown in the next section of code. Yes, a command tool could be used, but there is another way that will comply with the size limit on CLOB objects of 2GB. The first thing to do is to recreate the XML table with a primary key field:

   CONNECT TO NEWDEM; DROP TABLE DEMOGRAPHICS.XML; CREATE TABLE DEMOGRAPHICS.XML(    ID INTEGER PRIMARY KEY NOT NULL,    XML CLOB(2M)); COMMIT;

Second, the XML data can be split into regions and each region can be added to a separate record. The result will be all regions stored in separate records as regional fragments of the entire demographics XML document:

   INSERT INTO DEMOGRAPHICS.XML(ID, XML) VALUES(2, '<region><name>Asia</name> ... <region>');

Yet another issue with DB2 and using the Command Editor tool is that it allows only a string of 64KB maximum. Only two of the regions in the demographics XML documents are less than 64KB. Only the region of Asia can be added to the database using the DB2 database CLOB field (in the XML table created previously) for the purposes of later testing, as shown here:

   INSERT INTO DEMOGRAPHICS.XML(ID, XML) VALUES(2, '<region><name>Asia</name> ... </regi on>');

All the regional XML fragments are available on the website specifically for the DB2 database; the URL is http://www.oracledbaexpert.com/oracle/beginningxmldatabases/index.html .

Retrieving XML data is a simple matter of reading a CLOB object from a table. Additionally, the only method of changing XML document content is to read, change, and then update the entire CLOB object. Further demonstration is not required.