XML and the Database | Beginning XML Databases (Wrox Beginning Guides)

This section examines XML and Oracle Database in a number of ways. First, you see how to create new XML documents in the database. Second, you see how to retrieve XML documents stored in the database, both in whole and in part. Third and finally, this section explores how to change XML documents stored in the database.

New XML Documents

The following command creates a table to store XML documents, within an Oracle database:

   CREATE TABLE XML(    ID NUMBER PRIMARY KEY NOT NULL,    XML XMLType);

There are various methods for adding XML data to a database:

An XML document string can be added as a CLOB object.
It can be type cast as an XMLType data type from a string.
It can be added using SQL/XML functions.
As you saw in the previous section, an XML document was added into a CLOB text binary object in a table called TMP_CLOB, using a procedure containing the DBMS_XMLGEN package.

This command reads the row already stored in the TMP_CLOB table (refer to the last example in the previous section PL/SQL and XML) into the new table:

   INSERT INTO XML(ID, XML) SELECT 1, XMLType(OBJ) FROM TMP_CLOB;

The preceding INSERT statement has added the CLOB object in the TMP_CLOB table into the XMLType data type field (XML) in the table called XML.

Now lets add the XML document form of the demographics database into the Oracle database (see Appendix B for the document called demographics .xml). Its done as before, by adding the text string of the entire database into a CLOB variable, inserting into the TMP_CLOB table, and finally inserting into the XML table by reading from the TMP_CLOB table.

Appendix B contains a website reference to scripts for generating XML data from the demographics relational tables in an Oracle Database. It also contains a script for generating the demographics.xml XML document, into an Oracle Database XML data type, using a CLOB variable. The script is stored in a zip file on the website. The website belongs to me!

You can expedite examples for this chapter in an Oracle database. Please download and execute Scripts for Oracle Database as indicated in Appendix B . The XML version of the demographics database (demographics.xml) does not include languages and occupations because the XML document would be too large.

   INSERT INTO XML(ID, XML) SELECT 2, XMLType(OBJ) FROM TMP_CLOB; COMMIT;

Now you have an XML document version of the entire demographics database stored in an XMLType in a table in that database.

Retrieving from XML Documents

Once XML documents are stored inside XMLType fields in tables in the database, they are fairly simple to retrieve using simple SQL with some added functions. A simple query to retrieve the demographics XML document from the database like this next one will dump the data to the display:

   SELECT XML FROM XML WHERE ID=2;

This is a partial result:

   <?xml version="1.0"?><demographics><region id="1"><name>Africa</name><population >789548670</population><area>26780325</area><country id="1" code="AG"><name>Alge ria</name><population><year year="1950" population_id="12009" population="889271 8"><births_per_1000>0</births_per_1000><deaths_per_1000>0</deaths_per_1000><migr ants_per_1000>0</migrants_per_1000><natural_increase_percent>0</natural_increase _percent><growth_rate>0</growth_rate></year><year year="1951" population_id="120 10" population="9073304"><births_per_1000>0</births_per_1000><deaths_per_1000>0< /deaths_per_1000><migrants_per_1000>0</migrants_per_1000><natural_increase_perce nt>0</natural_increase_percent><growth_rate>0</growth_rate></year><year year="19 52" population_id="12011" population="9279525"><births_per_1000>0</births_per_10 00><deaths_per_1000>0</deaths_per_1000><migrants_per_1000>0</migrants_per_1000><

Again the EXTRACT function can be used to beautify the output a little with a simple query like this:

   SELECT X.XML.EXTRACT('/*') AS test FROM XML X WHERE X.ID = 2;

Queries using XMLType data type methods do not appear to function properly in Oracle Database unless they are qualified using aliases.

This is a partial result:

   <demographics>   <region id="1">     <name>Africa</name>     <population>789548670</population>     <area>26780325</area>     <country id="1" code="AG">       <name>Algeria</name>       <population>         <year year="1950" population_id="12009" population="8892718">           <births_per_1000>0</births_per_1000>           <deaths_per_1000>0</deaths_per_1000>

What is really interesting is how to extract parts of an XML document to the display. This can be done by pattern matching and parsing through the XML document, searching for what is required.

Pattern matching is a term that refers to searching through text files or objects, trying to find pat terns of characters .

The first thing to do is to briefly summarize the syntax of pattern matching in XML documents stored in an Oracle database:

/ : Search from a node, either the root node or a subtree . If the forward slash is omitted at the start of the search path , then the search begins at the root node. For example, /demographics/region/country/population/year will find all countries that have at least one year entry throughout the entire document, regardless of region or country, depending on how the XML document is searched.
text() : Finds the text value of a node.
//<node>[n] : Finds the n ^th child element in a list of elements.
[ ... ] : Square brackets can be used to qualify expression predicates. What does that mean? For example, /demographics/region/country[name="Australia" or name="Burma"]/population/year will find all year tag subtrees within the countries of Australia or Burma (not both). The fact that these two countries are in two separate regions (Australasia and Asia) is completely irrelevant.
@ : The @ character accesses attribute values as opposed to text values. Australia in the element <name>Australia</name> is a text value. In the tag <country id="46" code="AS">Australia</country> , id and code are both attributes of the country tag with values of 46 and AS , respectively.

Using XMLType Methods to Read XML Documents

The beginning of this chapter described various methods available to the XMLType data type. Now I shall demonstrate using a few of those methods for retrieving data from XML documents.

The EXISTSNODE method returns 1 if a node exists and 0 if a node does not exist. Its syntax is as follows :

   EXISTSNODE (<search-path>)

The following example returns 1 because Australia exists as a country within a region:

   SELECT X.XML.EXISTSNODE('/demographics/region/country[name="Australia"]') FROM XML X WHERE X.ID=2;

This query also returns a value of 1 because both countries exist as part of regions, even though they are part of different regions (Australasia and Asia):

   SELECT X.XML.EXISTSNODE(    '/demographics/region/country[name="Australia" or name="Burma"]') FROM XML X WHERE X.ID=2;

In contrast, the following query returns a negative result (0) because the and operator implies that both Australia and Burma must exist in the same region:

   SELECT X.XML.EXISTSNODE(    '/demographics/region/country[name="Australia" and name="Burma"]') FROM XML X WHERE X.ID=2;

The EXTRACT method returns a string from within an XML document. Where the EXISTSNODE verifies a strings existence, the EXTRACT method returns that string. The string is an XML document tag. That tag can be a single element, the entire document, or a subtree, depending on what is searched for. The syntax is as follows:

   EXTRACT(<search-path>)

This query returns the entire country of Australia, and nothing outside of Australia:

   SELECT X.XML.EXTRACT('/demographics/region/country[name="Australia"]') FROM XML X WHERE X.ID=2;

This is a partial result:

   <country id="46" code="AS">   <name>Australia</name>   <population>     <year year="1950" population_id="748" population="8267337">       <births_per_1000>23.05</births_per_1000>       <deaths_per_1000>9.46</deaths_per_1000>       <migrants_per_1000>18.45</migrants_per_1000>       <natural_increase_percent>1.359</natural_increase_percent>       <growth_rate>3.204</growth_rate>     </year>     <year year="1951" population_id="749" population="8510600"> ... </country>

This query finds all states ( provinces ) in the country of Canada (those that are in this database):

   SELECT X.XML.EXTRACT('/demographics/region/country[name="Canada"]/state') FROM XML X WHERE X.ID=2;

This is the result:

   <state id="1">   <name code="BC">British Columbia</name>   <population>0</population> </state> <state id="2">   <name code="NS">Nova Scotia</name>   <population>0</population> </state> <state id="3">   <name code="ON">Ontario</name>   <population>0</population> </state> <state id="4">   <name code="QB">Quebec</name>   <population>0</population> </state> <state id="53">   <name code="AB">Alberta</name>   <population>0</population> </state>

This query, on the other hand, finds only the <name> tags for each province in Canada:

   SELECT X.XML.EXTRACT('/demographics/region/country[name="Canada"]/state/name') FROM XML X WHERE X.ID=2;

This is the result:

   <name code="BC">British Columbia</name> <name code="NS">Nova Scotia</name> <name code="ON">Ontario</name> <name code="QB">Quebec</name> <name code="AB">Alberta</name>

This example uses the text() method to return only the text values within the tags, and excludes the tags:

   SELECT X.XML.EXTRACT(    '/demographics/region/country[name="Canada"]/state/name/text()') FROM XML X WHERE X.ID=2;

This is the result showing the names of Canadian provinces, excluding both the <name> tags and the code attributes:

   British ColumbiaNova ScotiaOntarioQuebecAlberta

This query finds the code attributes for each Canadian province:

   SELECT X.XML.EXTRACT(    '/demographics/region/country[name="Canada"]/state/name/@code') FROM XML X WHERE X.ID=2;

This is the result showing all Canadian provincial codes, as a single string:

   BCNSONQBAB

And this query finds the code for a single Canadian province (British Columbia):

   SELECT X.XML.EXTRACT( '/demographics/region/country[name="Canada"]/ state[name="British Columbia"]/name/@code') FROM XML X WHERE X.ID=2;

The result will be BC for British Columbia.

Also, where more than one attribute exists all can be retrieved using the * (asterisk wildcard) character:

   SELECT X.XML.EXTRACT('/demographics/region/country[name="Canada"]/@*') FROM XML X WHERE X.ID=2;

The result of the query is 125CA. 125 is the IDD value for Canada, and CA is the country code for Canada, as demonstrated by the following query:

   SELECT X.XML.EXTRACT('/demographics/region/country[name="Canada"]') FROM XML X WHERE X.ID=2;

This is a partial result, highlighting the ID and CODE attributes:

   <country id="125" code="CA">   <name>Canada</name>   <population>     <year year="1950" population_id="953" population="14011422">       <births_per_1000>27.08</births_per_1000>       <deaths_per_1000>9.04</deaths_per_1000>       <migrants_per_1000>0</migrants_per_1000>       <natural_increase_percent>1.804</natural_increase_percent>       <growth_rate>0</growth_rate>     </year>     <year year="1951" population_id="954" population="14330675"> ... </country>

You can also find child elements as if they were elements of a collection. The following finds the third province in the list of provinces in Canada:

   SELECT X.XML.EXTRACT('/demographics/region/country[name="Canada"]/state[3]') FROM XML X WHERE X.ID=2;

This is the result:

   <state id="3">   <name code="ON">Ontario</name>   <population>0</population> </state>

Similarly, collections can be accessed on multiple levels:

   SELECT X.XML.EXTRACT('/demographics/region[10]/country[3]/state[50]') FROM XML X WHERE X.ID=2;

In the demographics database for this book, the fiftieth state is New Jersey and not Hawaii as shown in this result for the preceding query:

   <state id="55">   <name code="NJ" nickname="Garden State">New Jersey</name>   <population>8115011</population> </state>

There are many other built-in XML-oriented functions and methods (attached to classes) available for use in Oracle Database. These are the most significant ones.

The next thing to discover is how to change XML document content when stored in an Oracle database.

Changing and Removing XML Document Content

There are various methods of changing XML document content in Oracle Database, including some complex built-in functions. Without going into too much detail for Oracle Database (there are other databases to deal with in this book), the UPDATEXML function is the easiest method to use. The UPDATEXML function is not an XMLType method, and thus not presented in the XMLType data type section at the beginning of this chapter.

The UPDATEXML function changes and replaces the entire document. For a large XML document this is inefficient but it is very simple to demonstrate. The syntax is as follows:

   UPDATEXML(<XMLType-object>, <search-path>, 'replacement-string').

Lets go and change the code for a province in Canada:

   UPDATE XML SET XML = UPDATEXML(XML, '/demographics/region/country[name="Canada"] /state[name="British Columbia"]/name/@code', 'BU') WHERE ID = 2;

If you run this query, you will see the code changed to BU:

   SELECT X.XML.EXTRACT(    '/demographics/region/country[name="Canada"]/state[name="British Columbia"]/name /@code') FROM XML X WHERE X.ID=2;

It follows that there is no INSERTXML and DELETEXML functionality, apart from within complex built-in functions, which are much too advanced for this book. To delete an element from an XML document, simply find it and set it to NULL . Lets go ahead and delete British Columbia from the XML document. This can be done by setting the entire subtree for the province of British Columbia to NULL , as in the following UPDATE statement:

   UPDATE XML SET XML = UPDATEXML(XML, '/demographics/region/country[name="Canada"]/state[name="British Columbia"]',  NULL  ) WHERE ID = 2;

The following query finds all the provinces still remaining for Canada:

   SELECT X.XML.EXTRACT('/demographics/region/country[name="Canada"]/state') FROM XML X WHERE X.ID=2;

This is the result, indicating a now empty province element ( <state/> ) for British Columbia:

    <state/>  <state id="2">   <name code="NS">Nova Scotia</name>   <population>0</population> </state> <state id="3">   <name code="ON">Ontario</name>   <population>0</population> </state> <state id="4">   <name code="QB">Quebec</name>   <population>0</population> </state> <state id="53">   <name code="AB">Alberta</name>   <population>0</population> </state>

To add the missing node back into the XML document again, I simply update the XML document by adding the entire subtree for British Columbia back into the XML tree again:

   UPDATE XML SET XML = UPDATEXML(XML, '/demographics/region/country[name="Canada"]/state[1]', '<state id="1"><name code=" BC">British Columbia</name><population>0</population></state>') WHERE ID = 2;

In the preceding statement, I can no longer find the state by name so I have to use the collection of states. Reading the data again I get the result but now with British Columbia returned as the first province:

    <state id="1">   <name code=" BC">British Columbia</name>   <population>0</population>  </state> <state id="2">   <name code="NS">Nova Scotia</name>   <population>0</population> </state> <state id="3">   <name code="ON">Ontario</name>   <population>0</population> </state> <state id="4">   <name code="QB">Quebec</name>   <population>0</population> </state> <state id="53">   <name code="AB">Alberta</name>   <population>0</population> </state>

Oracle Database also has comprehensive configuration and management facilities in the form of the Oracle Enterprise Manager Console GUI (Graphical User Interface), and the browser-based Database Control tools. Both of these topics are too advanced for this book.

This chapter has introduced the use of XML from directly within Oracle SQL. XML is vastly more complex and detailed than presented in this chapter, both with respect to XML itself and to Oracle software. The intention of this chapter was to introduce the basic scope of using XML documents both with, and within, Oracle Database. The next chapter examines SQL Server database in the same way.