XML Data as an Object Structure

A discussion of the relational and object data models, and specifically the object data model, should tell you the following about XML:

  • An XML document can be flat-structured as a single-layer hierarchy report from a relational database.

  • Allowing an XML document to become a multiple-layered, hierarchical structure, which is somewhat akin to the object data model, allows much more flexibility.

Lets say for example that you want to find a single country from the XML document shown in Figure 8-4.

image from book
Figure 8-4: A properly structured multiple-layered hierarchical XML document

So what if you want to find New Zealand? Your algorithm can search through the structure of the XML document tree, stopping the search when it has found the country you are looking for. In Figure 8-4, a tree scan or depth-first search would parse the XML document something like this:

  1. Start at the root node, which is <planet name ="earth"> .

  2. Dismiss the first region at the region level because you know that New Zealand is not in Africa and so the element <region name="Africa"> is examined but all its subtree nodes ( countries ) will be ignored.

  3. The second region is <region="Australasia"> , which is the region you are looking for.

  4. Then you ignore the first country because it is not New Zealand, but it is <country="Australia"> .

  5. The second country in Australasia is the country you are looking for and thus New Zealand will be retrieved.

Important 

The most important point is that everything else, all countries, are ignored! Its just like searching an index.

The way you can most efficiently perform the preceding steps is to search, or parse, the XML document using an expression that directs the search for the correct country, within the correct region. The @ sign is used to denote use of an attribute value:

   /planet/region/region[@name=Australasia]/country[@name=New Zealand]   

The preceding expression is pseudocode. This means it wont work as it is in any particular environment.

So now lets use similar information, as shown in the XML document of Figure 8-4, but this time using a single hierarchical layer, flattened relational structure, as shown in Figure 8-5. The following query reduces the number of fields returned for the XML document shown in Figure 8-4, because there are just as many fields in Figure 8-4:

   SELECT '   <record region="'R.REGION    '" r_population="'R.POPULATION'" r_area="'R.AREA    '" country="'C.COUNTRY    '" c_population="'C.POPULATION'" c_area="'C.AREA'"></record>' FROM REGION R JOIN COUNTRY C ON(C.REGION_ID = R.REGION_ID) WHERE C.CURRENCY IS NOT NULL ORDER BY C.POPULATION;   
image from book
Figure 8-5: A poorly structured, single-layered XML document, sharing starting and editing records only

I have also re-sorted the results in the preceding query, essentially mixing up countries and regions into a random order, and Figure 8-4 has the <xml> tag processing instruction included.

In actuality, there are two layers in the XML document shown in Figure 8-5. The first layer is the root node. Everything else is in the second layer.

Now imagine a program searching through the XML document shown in Figure 8-5. Lets say you wanted to find information on the two countries of China and Gambia. The XML document is not sorted according to the names of countries. In the extreme cases of China and Gambia, the document must be parsed from start to finish looking for both countries. Not using a tree-like hierarchical structure is extremely inefficient. There is no high-performance tree-searching algorithm for this example because there is no tree. A full scan is required for two reasons:

  • Records are not necessarily sorted in the order required.

  • Countries are not necessarily contained within regions. If you want to find a single region you would have to read the entire document, unless of course you knew beforehand exactly how many countries were in a particular region.

The benefit of using a hierarchical structure within XML documents should now be clear. And because an XML document hierarchical structure is somewhat equivalent to an object data model, it is an advantage to have even just a brief understanding of the object data model. Of course, XML does not require a specific structure because it is built to be as flexible as possible. However, because of the efficient nature of building tree structures into XML documents, the benefits should be plain to see.

So how can an XML document be compared with that of an object structure? Consider the following:

  • Every node in an XML document can contain a collection of other nodes.

  • Every node can have attributes further describing itself.

  • There are no methods attached to XML elements although something like eXtensible Style Sheet (XSL) templating could possibly be loosely considered as such.

  • There is no capacity for inheritance in basic XML documents because XML documents are data not metadata. Classes are very definitely a basic metadata structure, and object modeling inheritance occurs at the class level.

Try It OutUsing Flat and Hierarchical Data
image from book

The following query executes against a relational database, producing a flattened structure of records from five tables. Alter the XML data to a hierarchical, properly object-structured XML document, and execute the result as an XML document in a browser:

   SELECT r.REGION, c.COUNTRY, l.LANGUAGE, p.YEAR,    SUM(pl.FEMALE) AS Females, SUM(pl.MALE) AS Males FROM REGION r JOIN COUNTRY c ON (c.REGION_ID = r.REGION_ID)    JOIN POPULATION p ON (p.COUNTRY_ID = c.COUNTRY_ID)       JOIN POPULATIONBYLANGUAGE pl ON (pl.POPULATION_ID = p.POPULATION_ID)          JOIN LANGUAGE l ON (l.LANGUAGE_ID = pl.LANGUAGE_ID) WHERE pl.FEMALE > pl.MALE AND p.YEAR BETWEEN 1981 AND 1985 GROUP BY r.REGION, c.COUNTRY, p.YEAR, l.LANGUAGE ORDER BY r.REGION, c.COUNTRY, l.LANGUAGE, p.YEAR;   

This is the result, which has flattened out regions, countries, population, and languages from the demographics relational database model:

   REGION           COUNTRY          LANGUAGE               YEAR    FEMALES      MALES ---------------- ---------------- ---------------- ---------- ---------- ---------- Africa           Mauritius        Bhojpuri               1983      99467      97609 Africa           Mauritius        French                 1983      19330      16888 Europe           Finland          Czech                  1985         42         36 Europe           Finland          Estonian               1985        330        102 Europe           Finland          Finnish                1981    2323676    2177310 Europe           Finland          Finnish                1985    2371522    2225386 Europe           Finland          Icelandic              1985         21         16 Europe           Finland          Korean                 1985         17         13 Europe           Finland          Norwegian              1985        123        115 Europe           Finland          Polish                 1985        274        236 Europe           Finland          Russian                1985       1463        800 Europe           Finland          Siamese                1985         48          7 Europe           Finland          Swedish                1981     156237     143913 Europe           Finland          Swedish                1985     154975     144123 Far East         Nepal            Limbu                  1981      65318      63916 Far East         Nepal            Magar                  1981     107247     105434 Far East         Nepal            Sunwar                 1981       5509       5141 Middle East      Israel           Arabic                 1983     334885     332925 Middle East      Israel           Bucharian              1983       2740       2675 Middle East      Israel           Bulgarian              1983       9775       8215 Middle East      Israel           Czech                  1983       1475       1175 Middle East      Israel           Dutch                  1983       2080       1480 Middle East      Israel           French                 1983      60835      46500 Middle East      Israel           Georgian               1983      10545      10065 Middle East      Israel           German                 1983      37215      24195 Middle East      Israel           Greek                  1983       1550       1425 Middle East      Israel           Hungarian              1983      23710      18055 Middle East      Israel           Indian                 1983       7485       6825 Middle East      Israel           Italian                1983       4225       3650 Middle East      Israel           Kurdish                1983       3695       3470 Middle East      Israel           Persian                1983      22420      20575 Middle East      Israel           Polish                 1983      23705      15370 Middle East      Israel           Portugese              1983       2340       1930 Middle East      Israel           Rumanian               1983      63730      52960 Middle East      Israel           Russian                1983      56565      44500 Middle East      Israel           Serbian                1983       1640       1335 Middle East      Israel           Spanish                1983      49805      42560 Middle East      Israel           Turkish                1983       5175       5050 Middle East      Israel           Yiddish                1983     101445      87775 North America    Canada           English                1981    7521960    7396495 North America    Canada           French                 1981    3178190    3070905 North America    Canada           German                 1981     266770     256085 North America    Canada           Ukrainian              1981     148570     143695 South America    Paraguay         Castellano             1982      91431      75010   

Use a simple SELECT statement to produce a flat-structured XML document with two levels, including the root node. After that, edit the document manually such that all appropriate layers are included, making the XML document a hierarchical structure of more than two layers. This is how:

  1. Edit the preceding SELECT statement and put all fields into a single record, for each row:

       SELECT '   <record region="'r.REGION'" country="'c.COUNTRY    '" language="'l.LANGUAGE'" year="'p.YEAR    '" females="'SUM(pl.FEMALE)'" males="'SUM(pl.MALE)'"></record>' FROM REGION r JOIN COUNTRY c ON (c.REGION_ID = r.REGION_ID)    JOIN POPULATION p ON (p.COUNTRY_ID = c.COUNTRY_ID)       JOIN POPULATIONBYLANGUAGE pl ON (pl.POPULATION_ID = p.POPULATION_ID)          JOIN LANGUAGE l ON (l.LANGUAGE_ID = pl.LANGUAGE_ID) WHERE pl.FEMALE > pl.MALE AND p.YEAR BETWEEN 1981 AND 1985 GROUP BY r.REGION, c.COUNTRY, p.YEAR, l.LANGUAGE ORDER BY r.REGION, c.COUNTRY, l.LANGUAGE, p.YEAR;   
  2. Store the output in a file and call it whatever you like. I will call it fig0806.xml.

  3. Edit the XML file you just created. Add these two lines to the beginning of the file:

       <?xml version="1.0"?> <planet name="earth">   
  4. Add this line to the end of the XML file:

       </planet>   
  5. In a browser, execute the XML file you just created and edited. The result should look something like that shown in Figure 8-6.

    image from book
    Figure 8-6: Another poorly structured, single-layered XML document

  6. The next stage is to restructure the XML file shown in Figure 8-6 into a multiple level hierarchy XML document. The result is shown in Figure 8-7.

    image from book
    Figure 8-7: Another properly structured, multiple-layered XML document

How It Works

You created a complex SELECT statement retrieving a number of fields from five different related tables. You created a flat-structured XML document, and from that manually altered that XML document to a hierarchical structure, similar to that shown in Figure 8-7.

image from book
 

This chapter has attempted to briefly explain the object data model, and then apply some facets of that model to the hierarchical structure of XML documents. I hope this approach has helped to explain both topics and, in particular, the benefits of utilizing a hierarchical structure in XML documents rather than a flat structure.



Beginning XML Databases
Beginning XML Databases (Wrox Beginning Guides)
ISBN: 0471791202
EAN: 2147483647
Year: 2006
Pages: 183
Authors: Gavin Powell

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net