Using XML and DB2

 <  Day Day Up  >  

Another newer technology vexing DBAs these days is XML. XML is getting a lot of publicity and, if you believe everything you read, then XML is going to solve all of our interoperability problems, completely replace SQL, and possibly even deliver world peace . Realistically, XML won't do any of those things. But XML does allow designers to create their own customized tags, thereby enabling the definition, transmission, validation, and interpretation of data between applications and organizations.

XML stands for Extensible Markup Language. Like HTML, XML is based upon SGML (Standard Generalized Markup Language). HTML uses tags to describe how data appears on a Web page. But XML uses tags to describe the data itself. XML retains the key SGML advantage of self-description , while avoiding the complexity of full-blown SGML. XML allows tags to be defined by users that describe the data in the document. This capability gives users a means for describing the structure and nature of the data in the document. In essence, the document becomes self-describing .

The simple syntax of XML makes it easy to process by machine while remaining understandable to people. Once again, let's use HTML as a metaphor to help us understand XML. HTML uses tags to describe the appearance of data on a page. For example the tag, "text", would specify that the "text" data should appear in bold face. XML uses tags to describe the data itself, instead of its appearance. For example, consider the following XML describing a customer address:

 

 <CUSTOMER>    <first_name>Craig</first_name>    <middle_initial>S.</middle_initial>    <last_name>Mullins</last_name>    <company_name>BMC Software, Inc.</company_name>    <street_address>2101 CityWest Blvd.</street_address>    <city>Houston</city>    <state>TX</state>    <zip_code>77042</zip_code>    <country>USA</country> </CUSTOMER> 

XML is actually a meta-language ”that is, a language for defining other markup languages. These languages are collected in dictionaries called Document Type Definitions (DTDs). The DTD stores definitions of tags for specific industries or fields of knowledge. So, the meaning of a tag must be defined in a "document type declaration" (DTD), such as:

 

 <!DOCTYPE CUSTOMER [ <!ELEMENT CUSTOMER (first_name, middle_initial, last_name,            company_name, street_address, city, state,            zip_code, country)> <!ELEMENT first_name (#PCDATA)> <!ELEMENT middle_initial (#PCDATA)> <!ELEMENT last_name (#PCDATA)> <!ELEMENT company_name (#PCDATA)> <!ELEMENT street_address (#PCDATA)> <!ELEMENT city (#PCDATA)> <!ELEMENT state (#PCDATA)> <!ELEMENT zip_code (#PCDATA)> <!ELEMENT country (#PCDATA)> ] 

The DTD for an XML document can either be part of the document or stored in an external file. The XML code samples shown are meant to be examples only. By examining them, you can quickly see how the document itself describes its contents.

For data management professionals, this is a plus because it eliminates the trouble of tracking down the meaning of data elements. One of the biggest problems associated with database management and processing is finding and maintaining the meaning of stored data. If the data can be stored in documents using XML, the documents themselves will describe their data content. Of course, the DTD is a rudimentary vehicle for defining data semantics. Standards committees are working on the definition of the XML Schema to replace the DTD for defining XML tags. The XML Schema will allow for more precise definition of data, such as data types, lengths, and scale.

The important thing to remember about XML is that it solves a different problem than HTML. HTML is a markup language, but XML is a meta-language. In other words, XML is a language that generates other kinds of languages. The idea is to use XML to generate a language specifically tailored to each requirement you encounter. It is essential to understand this paradigm shift in order to understand the power of XML.

NOTE

XSL, or Extensible Stylesheet Language, can be used with XML to format XML data for display.


In short, XML allows designers to create their own customized tags, thereby enabling the definition, transmission, validation, and interpretation of data between applications and between organizations. So, the most important reason to learn XML is that it is quickly becoming the de facto standard for application interfaces.

There are, however, some problems with XML. The biggest problem with XML lies largely in market hype. Throughout the industry, there is plenty of confusion surrounding XML. Some believe that XML will provide metadata where none currently exists, or that XML will replace SQL as a data access method for relational data. Neither of these assertions is true.

There is no way that any technology, XML included, can conjure up information that does not exist. People must create the metadata tags in XML for the data to be described. XML enables self-describing documents; it doesn't describe your data for you.

Moreover, XML doesn't perform the same functions as SQL. As a result, XML can't replace it. As the standard access method for relational data, SQL is used to "tell" a relational DBMS what data is to be retrieved. XML, on the other hand, is a document description language that describes the contents of data. XML might be useful for defining databases, but not for accessing them.

DB2, as well as most of the other popular DBMS products, now provides built-in support for XML. By integrating XML into DB2 databases, you can more directly and quickly access the XML documents, as well as search and store entire XML documents using SQL. Integration can involve simply storing XML in a large VARCHAR or CLOB column or breaking down the XML into multiple columns in one or more DB2 tables.

When you store or compose a document, you can use the capabilities of the DB2 XML extender, which provides functions and tools to help integrate XML with DB2. The user -defined distinct types XMLVARCHAR , XMLCLOB , and XMLFILE are delivered to help store XML data in DB2 tables.

graphics/v8_icon.gif

DB2 also provides several functions to facilitate XML storage, retrieval, extraction, and updating. These functions are documented in Chapter 3, "Using DB2 Functions."


XML Access Methods

When storing XML documents in DB2, you will have to decide on either the column or collection access method. Once you choose a method you will need to tell DB2 how you plan on storing and retrieving the documents. This is done through a Data Access Definition (DAD) file. The DAD is a file used by DB2 to define the access method needed to process an XML document as either a column or a collection.

XML Column Access Method

The column access method stores and retrieves XML documents as DB2 column data. This means the entire XML document is stored in a single DB2 column.

Consider choosing this method in the following situations:

  • The XML document already exists and you just want to store it in DB2 for integrity, archival, or auditing purposes.

  • The XML document is usually read, but not modified.

  • The document has large text paragraphs and you want to use DB2 Text Extender capabilities to search the document.

  • You want to store the XML document in the native file system and use DB2 to manage and search the documents.

XML Collection Access Method

The column access method composes XML documents from a collection of DB2 tables and columns, or decomposes XML documents into a collection of DB2 tables and columns. This means that the XML document is broken apart and stored in various tables and columns.

Consider choosing this method in the following situations:

  • You have an existing database that you would like to compose XML documents from based on a given document type definition (DTD).

  • The data will need to be accessed using SQL and treated like relational data more than it will need to be treated like XML data.

  • The XML document needs to be updated often and performance is critical.

The Data Access Definition File

The DAD file is an XML document that is used to map the structure of an XML document to how it is stored in DB2. The DAD file is used when storing XML documents in a column and when composing or decomposing XML data.

This is where DB2 stores the information pertaining to the method of XML storage (column versus collection).

XML-DB2 Administration

DB2 also provides some administration tools. But remember, the XML Extender requires IBM USS (Unix System Services). But, you can use either an administration wizard from a Windows or Unix client, or from z/OS using the USS command line and HFS, or TSO. You can avoid using a non-mainframe platform for XML administration tasks by using either the USS command line and odb2 , or TSO and batch to perform administration tasks.

The XML Extender installation creates sample files and executable files in partitioned data sets. After these partitioned data sets are installed, it is recommended that you create HFS files in your USS environment by running the DXXGPREP batch job. The DXXGPREP JCL executes the needed BIND s, creates sample DB2 tables, and copies sample files to HFS.

Keep in mind that all the XML Extender facilities supplied for application programs run in the z/OS MVS environment as stored procedures or user-defined functions (UDFs). Some of the UDFs that refer to the XMLFILE data type require access to an HFS system.

When you use the odb2 command line to enter DB2 commands from USS, DB2 uses the Call Attach Facility to execute dynamic SQL and commands from the z/OS Unix shell.

To start the odb2 command line, simply type the following from the USS command shell:

 

 odb2 

A command prompt will be displayed and you will be able to enter DB2 commands.

XML Administration Support Tables

The XML Extender creates two administration support tables to manage your XML-DB2 environment.

The first table is the DTD repository table, or DB2XML.DTD_REF . This table is used to validate XML data in an XML column or XML collection.

Each DTD in the DTD_REF table has a unique ID. The XML Extender creates the DTD_REF table when you enable a database for XML. The columns of the DTD repository table are as follows :

DTDID ” Used to identify the DTD. This primary key is defined as VARCHAR(128) , must be unique, and is not nullable. It must be the same as the SYSTEM ID on the DOCTYPE line in each XML document, when validation is used.

CONTENT ” Contains the content of the DTD in an XMLCLOB column.

USAGE_COUNT ” Contains the number ( INTEGER ) of XML columns and XML collections in the database that are using this DTD to define a DAD .

AUTHOR ” Specifies the author of the DTD. The column is defined as VARCHAR(128) and is optional.

CREATOR ” Specifies the user ID that does the first insertion ( VARCHAR(128) ).

UPDATOR ” Specifies the user ID that does the last update ( VARCHAR(128) ).

The second XML administration support table is the XML usage table, or DB2XML.XML_USAGE . This table stores common information for each XML-enabled column.

The XML Extender maintains the XML_USAGE table as XML is stored in your DB2 tables. The columns of the XML-USAGE table are as follows:

TABLE_SCHEMA ” Contains the schema name of the user table that contains an XML column (for XML column) or a value of DXX_COLL as the default schema name (for XML collection).

TABLE_NAME ” Contains the name of the user table that contains an XML column (for XML column) or a value DXX_COLLECTION , which identifies the entity as a collection (for XML collection).

COL_NAME ” Contains the name of the XML column or XML collection. It is part of the composite key along with the TABLE_NAME .

DTDID ” Contains a string associating a DTD inserted into DTD_REF with a DTD specified in a DAD file. The value of this column must match the value of the DTDID element in the DAD .

DAD ” The content of the DAD file that is associated with the column or collection.

ROW_ID ” An identifier of the row.

ACCESS_MODE ” Specifies which access mode is used: 1 indicates XML collection; indicates XML column.

DEFAULT_VIEW ” Stores the default view name if there is one.

TRIGGER_SUFFIX ” Used for unique trigger names .

VALIDATION ” Will be 1 for yes, or for no.

XML-DB2 Guidelines

Consider the following guidelines as you embark on using XML with your DB2 databases.

Use the Sample Scripts to Get Started

IBM supplies a set of sample scripts you can use as a guideline for setting up your DB2-XML environment. You can find these files in the following directory:

 

  dxx_install  /samples/cmd 

For dxx_install , substitute the USS directory name where the sample DTD, DAD, and XML files were copied by your DXXGPREP job.

Validate Your XML

You can use a DTD to validate XML data in an XML column. The XML Extender creates a table in the XML-enabled database, called DTD_REF . The table is known as the DTD reference and is available for you to store DTDs. When you validate XML documents, you must store the DTD in this repository.

Backup the XML Administration Support Tables

The XML administration support tables, XML_USAGE and DTD_REF , are required for many XML Extender activities. Be sure that you consistently backup these tables to. ensure recoverability of your XML-DB2 environment.

Learn All You Can About XML

Before you begin to mix XML and DB2, be sure that you have a solid grasp of XML. The short introduction in this section is merely the tip of the iceberg. You will need to understand that XML is truly hierarchical and, as such, will not match up well with your relational, DB2 way of thinking and processing data.

Additional information on XML and the DB2 XML Extender can be found at the following Web sites:

http://www.ibm.com/software/data/db2/extenders/xmlext/index.html

http://www.ibm.com/developer/

http://www.sswug.org/centerxml/

http://www.oasis- open .org

http://www.xml.org

 <  Day Day Up  >  


DB2 Developers Guide
DB2 Developers Guide (5th Edition)
ISBN: 0672326132
EAN: 2147483647
Year: 2004
Pages: 388

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net