Chapter 8. A Native XML DBMS

Jim Tivy

The term "native XML database" is not a precise technical term, and different people define it in different ways. Essentially, it refers to a database designed specifically for the storage and retrieval of XML documents, and the term is used to contrast such a system with a database that merely provides an XML interface to data whose intrinsic data model is something different. In the early days of relational databases, every database vendor tried to jump on the relational bandwagon by selling their old products with a new relational veneer. The relational purists responded with a set of criteria that distinguished a true relational database from one that was merely dressed up to look like one.

The thinking behind the term "native XML database" is similar, but there is still no definitive set of criteria to distinguish one. In the absence of such a definition, I intend to outline some of the features that users can expect to find in a native XML database, in particular those that distinguish it from a relational or object database that has been given an XML veneer. XML interfaces to relational databases are useful, but the characteristics of a native XML database differ and frequently provide more functionality, and we need to understand the differences.

Database management systems (DBMSs) have long been a central part of computer applications. A database can be defined as being a managed collection of data on which users may perform update and query operations. An XML database is thus a managed collection of XML data. A native XML database, if one interprets the term literally, is a database that was born to handle XML data, as distinct from a database that was originally designed for a different purpose, and was only later adapted to handle XML data.

So how does one recognize a native XML database? One test is to look at the programmer interfaces: a DBMS becomes a native XML database when the most natural way for programmers to manipulate data is in XML form. A native XML database is thus defined in terms of external features apparent to the programmer. An XML database implementation will use special internal indexing and storage strategies to deliver external features in the most effective way, but the internal algorithms do not define whether the database is a native XML database; they are just the special sauce that vendors provide to better serve up the external features. Vendors who decide to store the XML data in a relational database are therefore not excluded from the native XML DBMS club; rather, the "nativeness" of the XML DBMS is judged by the external programmer's experience of working with XML at the database interface.

In this view, the main importance of the word native comes down to how the data in the DBMS is modeled to the programmer. The model must respect the structure of XML as defined by the XML data model and, more precisely, the XQuery data model. In addition, the DBMS must work easily with other XML technologies. Alongside this deep respect for the XML data model, an XML database must provide all the features expected of any DBMS, such as transactions, recovery, and multithreading, as well as others described below. A native XML DBMS is thus a blending of features that expose and capitalize on the XML data model, XML technologies, and features generally expected from a DBMS.

One of the best ways to distinguish a native XML database from a database that merely offers an XML veneer is to examine the fidelity with which it can store an XML document: specifically, the extent to which documents can be retrieved in exactly the form in which they were stored. Different systems are surprisingly variable in this respect. Some systems are capable of retaining the exact physical representation of an XML document, down to its use of internal and external entities, character encoding, and details such as the choice of quotation marks or apostrophes around attribute values. A more typical level of fidelity, exhibited by products such as Software AG's Tamino, is at the level of the logical XQuery data model. Such systems do not retain all the lexical details of the original document (they may, for example, reorder attributes and expand XML entities), but they do retain the full textual content of each document. By contrast, systems that provide XML interfaces only as a wrapper to some other data model may well store data internally using normalized representations of values such as dates and numbers. When the document is retrieved, it may then be missing details such as leading zeroes on a number, or comments in the middle of a list of dates. These details may not matter to some applications, but for others they can be crucial. For example, comments in the document might have been used to indicate where the document author was uncertain about the facts.
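The difference between these levels of fidelity can be sketched in a few lines of Python. This is only an illustration using the standard library's `xml.etree.ElementTree`, not the behavior of any particular DBMS: the sample document, the element names, and the two helper functions are all assumptions made up for the example. The first function round-trips the document at roughly the logical level (content and comments survive, the choice of quotation marks does not); the second imitates a wrapper that stores each value as a typed row and regenerates the XML on retrieval.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: apostrophes around one attribute value,
# leading zeroes on the numbers, and a comment between two readings.
SOURCE = """<readings>
  <reading date='2003-01-05'>007</reading>
  <!-- author unsure of the next date -->
  <reading date="2003-01-09">012</reading>
</readings>"""


def logical_round_trip(text: str) -> str:
    """Parse and reserialize, keeping comments and text content.

    Requires Python 3.8+ for TreeBuilder(insert_comments=True).
    """
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(text, parser=parser)
    return ET.tostring(root, encoding="unicode")


def typed_wrapper_round_trip(text: str) -> str:
    """Imitate a wrapper that stores (date, integer) rows, then rebuilds XML."""
    root = ET.fromstring(text)  # the default parser discards comments
    rows = [(r.get("date"), int(r.text)) for r in root.findall("reading")]
    rebuilt = ET.Element("readings")
    for date, value in rows:
        child = ET.SubElement(rebuilt, "reading", {"date": date})
        child.text = str(value)  # the normalized value has no leading zeroes
    return ET.tostring(rebuilt, encoding="unicode")


if __name__ == "__main__":
    print(logical_round_trip(SOURCE))
    print(typed_wrapper_round_trip(SOURCE))
```

Comparing the two outputs with the source shows the trade-off described above: the logical round trip normalizes the attribute quoting but keeps the comment and the text `007`, whereas the typed round trip drops the comment and the leading zeroes.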

As well as the interfaces for storage and retrieval of data, a native XML database also uses XML-based interfaces for data description: typically, the structure of the database is defined in terms of a set of XML schemas to which the documents in the database must conform.

This chapter discusses the features of native XML databases in general, illustrating them with examples taken from actual XML DBMS implementations. Many examples come from the XStreamDB database because I am familiar with it inside and out. My aim is to define what the primary features of a native XML DBMS are and then to discuss these features with some reference to internal implementation details. In particular, I suggest that a native XML database is characterized by the pervasive use of the XQuery language and the XQuery data model at all levels of the system.



XQuery from the Experts: A Guide to the W3C XML Query Language