Interfaces to a Native XML Database

As a query language, XQuery is obviously a key interface offered by a native XML database. But it is not the only interface that is required. XQuery does not say how schemas are registered in a database, how data is loaded and updated, or how physical storage features are organized.

Nor does it define how queries are issued by an application program, or how the results are presented. In this section I review some of the other interfaces that a native XML database system needs to offer.

An eventual objective for many of these interfaces is interoperability, so we first examine the extent to which this can currently be achieved. Then we look at some of the key interfaces offered by a DBMS: the data definition (schema) interfaces, the data update interfaces, and the database configuration interfaces.

Interoperability

In a modern DBMS, programmers expect DBMS features to be delivered in a standard way across vendors, an idea often called interoperability. The aim of interoperability is to allow the writing of programs that run on multiple database management systems. This is of great interest to vendors building tools such as report writers and viewers that must run against multiple DBMSs, and it also means that programmers need learn only one command language and one API to use a variety of products.

Interoperability has been addressed for relational DBMSs by standards bodies such as the International Organization for Standardization (ISO), with the SQL specification, and by industry consortia such as the SQL Access Group, whose specifications eventually fed into the creation of Microsoft's Open Database Connectivity API (ODBC) and the similar Java JDBC interface. Interoperability and standards were the challenge in the past for relational and object-oriented database management systems; likewise, interoperability is now the challenge for XML database systems. The World Wide Web Consortium (W3C) has spearheaded the standardization of XML and more recently XQuery, and these two specifications form the cornerstones of an interoperable XML DBMS.

At the time of writing, although XQuery 1.0 is quite stable, its scope is narrow, and several important XML DBMS features are still unspecified. As a result, these features vary across vendors. Although this is clearly not a good end result, it is a reality of this early stage in the XML DBMS standardization process. In the future, standards will emerge for key features such as an update language, full-text search language, API, and others, and vendors will probably follow with implementations of these standards.

Even in the end game of standards compliance, as in the relational DBMS and SQL standardization process, perfect interoperability has never been fully achieved; it is more of a guiding principle.

In the XML world, expectations for interoperability are high. These expectations arise because of the pervasiveness of the World Wide Web. Database systems in the past, however, have achieved only modest levels of interoperability. It remains to be seen to what extent XML databases will change this.

Data-Definition Interfaces

In the history of database technology, there has always been a separation between data definition and data manipulation. The essence of a database as a shared resource is that the data definition is centrally managed, so that different applications can manipulate the data in different ways while still sharing information.

In a native XML database, data manipulation (at least the retrieval side) is handled using XQuery, while the data-description requirement is handled using a separate schema language. This is typically an XML Schema as defined in the W3C XML Schema specification [SCHEMA], though other kinds of schemas such as DTDs [XML] or RELAX NG schemas [RELAXNG] might also be accommodated. This section presents an overview, with several examples, of how schemas can be used in a native XML database.

The first role of a schema is to define constraints on what constitutes a valid document: The schema is used for validity checking. An XML document is said to be valid with respect to a schema if that XML document follows the content model, data typing, and other constraints defined by the schema. Some flavors of schema, such as DTD, express fewer constraints on the data than other flavors, such as W3C XML Schema. For example, a DTD cannot specify that a particular attribute must contain a valid date. There are other kinds of integrity constraints, familiar to relational database users, that cannot be expressed even in XML Schema. The best-known example is co-occurrence constraints (e.g., if one attribute has a particular value, another must be absent). Other important integrity constraints that cannot be expressed in a schema are cross-document constraints, such as a constraint that each document in a collection must have a unique reference number, or that all the hyperlinks in a document must point to other documents that exist within the collection. So XML Schema doesn't satisfy all the needs for validity checking, but it goes a long way toward this goal.
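The datatype check that a DTD cannot make can be seen with the JDK's standard validation API (`javax.xml.validation`). The sketch below validates documents against a small, made-up schema in which the `placed` attribute of an `Order` element is declared as `xs:date`; the element, attribute, and class names are illustrative assumptions. A DTD could only declare that attribute as `CDATA` and would accept any string. The check is per document, so it does nothing for the cross-document constraints mentioned above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical illustration: the schema and names are invented for this example.
class SchemaValidityCheck {
    // <Order placed="..."/> where "placed" is required and must be a valid xs:date.
    static final String ORDER_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='Order'><xs:complexType>"
        + "<xs:attribute name='placed' type='xs:date' use='required'/>"
        + "</xs:complexType></xs:element></xs:schema>";

    private static final Schema ORDER_SCHEMA = compile(ORDER_XSD);

    /** Compiles a W3C XML Schema held in a string. */
    static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
    }

    /** True if the document satisfies the content model and datatypes of the Order schema. */
    static boolean isValid(String xml) {
        Validator validator = ORDER_SCHEMA.newValidator(); // default handler throws on errors
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // structural or datatype violation (or not well-formed)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<Order placed='2003-02-14'/>"));   // true
        System.out.println(isValid("<Order placed='next Tuesday'/>")); // false: not an xs:date
        System.out.println(isValid("<Order/>"));                       // false: attribute required
    }
}
```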

But data description in a database system is not just about validating the input data: The data description is used to define how the data is organized on disk, for checking the correctness of queries, for optimizing queries by choosing the most efficient access paths, and for constructing the data structures used to contain query results. The datatypes defined in the schema also define the semantics of operations such as comparisons and sorting. So the role of schemas in a native XML database is central.

In a relational DBMS the description of a table, its columns, and their types forms the schema for that table, and all rows in that table must comply with it. In a native XML DBMS there is a similar need to constrain collections of XML documents with respect to schemas. The exact relationship between databases, documents, schemas, and queries is not precisely laid down in the current XQuery specifications: The standards have deliberately tried to steer clear of this. For example, some systems might require that all documents in a collection conform to the same schema, others might permit a range of possible schemas, while some systems may be able to search schema-less documents. The XQuery specification allows an XQuery expression to import a schema, and it must import a schema if it makes explicit use of datatypes defined in that schema, but it isn't required to import every schema that might have been used to validate a document in the collection being searched. The flexibility of the possible relationships between queries, documents, and schemas has caused considerable difficulty in defining the W3C specifications, but in a practical implementation, the relationships are likely to be much simpler. The automatic introduction of a schema could be as simple as binding a schema to a collection, so that whenever that collection is used, the schema is also introduced. Currently, XML DBMS vendors must decide for themselves how to manage schemas within the DBMS and how to introduce schema information into the XQuery subsystem.

The XQuery specification describes query processing in terms of two phases: the query-analysis phase and query-evaluation phase. These terms are defined in the XQuery 1.0 Language document [XQ-LANG], and I use these terms here. But it is also useful to consider two earlier phases that I will call the database-design phase and the query-authoring phase. These phases occur in the following order:

  1. Database-design phase: This is the time at which design decisions are made about the logical and physical organization of the database.

  2. Query-authoring phase: This is the time a query is written by a user or programmer.

  3. Query-analysis phase: This is the time at which the query processor analyzes the query and prepares the query-execution plan. One of the activities carried out at this time is static type-checking.

  4. Query-evaluation phase: This is the time at which the query-execution plan is executed: The query is thus evaluated to produce a result, which may be any instance of the data model. This time is often called query run-time, and one of the activities carried out at this time is dynamic type-checking.
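The split between the analysis and evaluation phases can be illustrated, in a much reduced form, with the JDK's XPath 1.0 API (`javax.xml.xpath`), which compiles an expression once and evaluates it against many documents. This is only an analogy: XPath 1.0 has no static typing, so the only static error this sketch can show is a syntax error, which `XPath.compile` reports before any document is read. The class and method names are invented for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration of compile-once / evaluate-many; not an XQuery engine.
class QueryPhases {
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    /** Analysis phase (reduced): parse and compile once; syntax errors surface here. */
    static XPathExpression analyze(String query) {
        try {
            return XPATH.compile(query);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("static error in: " + query, e);
        }
    }

    /** Evaluation phase: run the compiled expression against one document. */
    static double evaluate(XPathExpression compiled, String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Double) compiled.evaluate(doc, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException("dynamic error during evaluation", e);
        }
    }

    public static void main(String[] args) {
        XPathExpression compiled = analyze("count(/Person/City)");
        // The same compiled expression is reused for every document.
        System.out.println(evaluate(compiled, "<Person><City>Paris</City></Person>")); // 1.0
        System.out.println(evaluate(compiled, "<Person/>"));                           // 0.0
        try {
            analyze("count(/Person/City"); // unbalanced parenthesis
        } catch (IllegalArgumentException e) {
            System.out.println("rejected before evaluation: " + e.getMessage());
        }
    }
}
```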

Database-Design Phase

Part of the database design task for a native XML database is designing the schemas for the documents that the database will contain. This is an art in itself, and a detailed discussion is beyond the scope of this chapter. There are probably two main scenarios: (1) a schema for the documents already exists, and documents are to be stored in the form in which they arrive; and (2) the schema is custom-designed for the purpose of long-term document storage. The design approach is rather different for each scenario.

Database design, however, is not finished when the schemas are written. Two of the important tasks that remain are the following:

  • Database configuration: This is the task of defining collections or other database partitions and deciding how different types of documents (represented by different schemas) will be allocated to different collections. For more detailed discussion, see "Collections and Storage" below.

  • Index definition: This is the task of deciding which data should be indexed. As with any other database, this task is essential to achieve satisfactory performance. Systems often provide tools to help. For example, when a value index is defined in the XStreamDB Explorer application, the schema is used to display a tree of all the possible index XPaths, so that the user can make a selection. At that point the index is constructed.

Query-Authoring Phase

Schema information can be very helpful when presented through a graphical user interface in a tool used for building queries or configuring the DBMS. In this case the schema can be used for the following:

  • Query building: A user constructing a query through a graphical user interface often finds it convenient to be able to look up long paths to elements or attributes. If schema information is available, such lookup capability is easily provided, and the corresponding XPath can be inserted into the query under construction. Schema-aided query building is very useful in an XML database explorer or XML report design tool.

  • Query explanation: Systems may provide tools to enable query authors to understand how a query will be executed (e.g., which indexes it will use). Often a query will not be viable unless it can make use of indexes, so this information is vital. In Tamino this tool is called explain. In XStreamDB, the query-execution plan is determined at query-execution time and is output to a QueryPlan.html file.
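The path lookup described under query building needs a catalog of candidate paths. A tool with access to the schema would enumerate every path the schema permits; the simpler sketch below, an assumption-laden stand-in rather than how XStreamDB Explorer or any other product works, lists the distinct element and attribute paths that actually occur in a sample document, using only the JDK's DOM parser. The class name `PathCatalog` is hypothetical.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: derives paths from an instance document, not from a schema.
class PathCatalog {
    /** Collects every distinct element and attribute path occurring in the document. */
    static Set<String> paths(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            Set<String> out = new TreeSet<>(); // sorted, ready to show in a picker
            walk(root, "", out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    private static void walk(Element e, String parentPath, Set<String> out) {
        String path = parentPath + "/" + e.getTagName();
        out.add(path);
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            out.add(path + "/@" + attrs.item(i).getNodeName());
        }
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) c, path, out);
            }
        }
    }

    public static void main(String[] args) {
        String doc = "<Person id='7'><Name><First>Ann</First></Name><City>Oslo</City></Person>";
        paths(doc).forEach(System.out::println);
    }
}
```

A query-building GUI could present this list and paste the chosen path into the query under construction.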

Query-Analysis Time

The task of query analysis is explained in much more detail in other chapters of this book. The main operations carried out at this stage are syntax checking, type checking, and query optimization. Of these, type checking and query optimization make heavy use of schema information where it is available; syntax checking does not. The big question here is how flexible the system is in terms of allowing queries against schema-less documents, queries that search documents of more than one type, queries that span multiple versions of the same schema, and so on. Products are likely to vary in such respects.

Query-Evaluation Time

Again, this subject is covered in much more detail elsewhere in this book. Schema information is needed at evaluation time to determine the dynamic types of values (which may be more specific than their statically known types). It may also be needed to validate the result document produced by a query, and perhaps even intermediate working documents. This becomes especially important when update is supported, because the results of a query may produce a new document that is to be stored persistently in the database.

Update Interfaces

Before anyone can run queries against a database, the data must be loaded. A native XML database therefore needs to provide some interface for loading data. This is currently outside the scope of the XQuery specification.

Update commands include the expressions necessary to insert, delete, and update documents in a collection. Commands to insert new documents and delete old documents are fairly easy to implement. The XML document provides a natural unit of granularity for such operations, which may be provided as primitive operations through a client API, quite independently of any query language.

However, document-level update is not sufficient for all applications. It is inefficient to retrieve, delete, and replace a large document just to make one small change, especially if the database is heavily indexed. This becomes even more true when making a small change to every one of a large number of documents (for example, marking a thousand documents with a new attribute, such as a security classification). Performance and storage model considerations make updating of existing documents especially complicated.

In a relational DBMS, updating a row in a table is done as follows:

 UPDATE table SET column=expression [, column=expression]* WHERE predicates 

In this SQL update, you specify the table, the columns to update, the values to use, and a WHERE clause to select the correct rows to update.

The individual SET clauses allow you to say exactly what needs to change in the underlying table row, which means the row is only partially updated. Partial update works well, especially with long binary columns, because you avoid disturbing values in columns that are not changing.

XML update similarly requires a way to introduce the document to be updated and a way to update all or parts of a document. To update parts of a document, we must be able to insert, delete, or replace any subcomponent of a document, including element nodes, text nodes, and attribute values. For example, given a collection of Person documents, similar in structure to the example of Person document text at the beginning of this chapter, I might want to change all the $doc/Person/City values to London. This change operation requires a replacement of the current City element values. If, on the other hand, I wanted to remove the City element from all Person documents, then deleting $doc/Person/City would be necessary. Or suppose I have the text of a play, and I want to insert stage directions into it. In this case I have to find the insertion point and then insert a new StageDirection element as needed.
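The three kinds of partial update in these examples (replace, delete, insert) can be sketched on an in-memory DOM tree with the JDK's `org.w3c.dom` and `javax.xml.xpath` APIs. This is not an XML update language and not how any particular database stores documents; it only shows that each operation touches the selected nodes and leaves the rest of the document alone, which is what a DBMS would exploit to limit re-indexing and disk writes. The helper names and sample documents are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical in-memory illustration of partial updates on element nodes.
class PartialUpdate {
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** Copies the selected nodes into a list before the tree is modified. */
    static List<Node> select(Node context, String path) {
        try {
            NodeList found = (NodeList) XPATH.evaluate(path, context, XPathConstants.NODESET);
            List<Node> nodes = new ArrayList<>();
            for (int i = 0; i < found.getLength(); i++) nodes.add(found.item(i));
            return nodes;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad path " + path, e);
        }
    }

    /** Replace: give each selected element a new text value; siblings are untouched. */
    static void replaceValue(Document doc, String path, String value) {
        for (Node n : select(doc, path)) n.setTextContent(value);
    }

    /** Delete: detach each selected element from its parent. */
    static void delete(Document doc, String path) {
        for (Node n : select(doc, path)) n.getParentNode().removeChild(n);
    }

    /** Insert: add a new element immediately after each selected node. */
    static void insertAfter(Document doc, String path, String name, String text) {
        for (Node n : select(doc, path)) {
            Element added = doc.createElement(name);
            added.setTextContent(text);
            n.getParentNode().insertBefore(added, n.getNextSibling()); // null sibling = append
        }
    }

    /** String result of an XPath expression, used here to inspect the updated tree. */
    static String text(Document doc, String expr) {
        try {
            return XPATH.evaluate(expr, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad expression " + expr, e);
        }
    }

    public static void main(String[] args) {
        Document person = parse("<Person><Name>Ann</Name><City>Oslo</City></Person>");
        replaceValue(person, "/Person/City", "London");
        System.out.println(text(person, "/Person/City"));        // London
        delete(person, "/Person/City");
        System.out.println(text(person, "count(/Person/City)")); // 0

        Document scene = parse("<Scene><Speech>Who's there?</Speech><Speech>Nay, answer me.</Speech></Scene>");
        insertAfter(scene, "/Scene/Speech[1]", "StageDirection", "Enter Horatio");
        System.out.println(text(scene, "name(/Scene/*[2])"));    // StageDirection
    }
}
```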

Any update command syntax should make it easy to optimize a partial update of an underlying document in order to reduce re-indexing and disk writes, especially if documents are large. Putting this functionality into XQuery is a challenge. The working group is currently investigating an update language that will address this challenge. At present, various vendor mechanisms exist for update, some of them based on these proposals. XStreamDB uses XQuery-like syntax; other vendors use a variety of approaches, including an XML-based method called XUpdate [XUPDATE]. For more information, see the upcoming XQuery update language or investigate vendor solutions.

Database Configuration Interfaces

I use the term "database configuration" to describe configuration of a database through the creation of configuration objects such as databases, collections, indexes, stored procedures, triggers, schemas, and users. Although the details of these configuration objects vary from one database system to another, most systems have objects analogous to the ones defined below.

Database

A database is a container for one or more collections. Often a database has a name. In some systems, a database can simply be a collection of collections.

Collection

A collection is a set of XML documents held in persistent storage. In some systems a collection might contain other child collections. Some systems might allow collections to intersect, so that the same document is in several collections. Collections are often associated in some way with schemas. This might be a very simple correspondence (all documents in a collection have the same schema), or something a bit more flexible. See "Collections and Storage" below for further discussion.

Index

An index is concerned with speeding up a certain access path through the data and is used for satisfying conditions in a query. Such conditions typically occur in the where clause of a FLWOR expression or in a predicate of an XPath expression. The forms of indexes vary across vendors because, for many vendors, this is the "special sauce" that makes their implementation faster or better than that of a competitor. Some vendors automatically create and maintain indexes either because certain kinds of queries are being repeatedly done or because indexes are an integral part of structuring the storage in the database. More commonly, however, indexes are created on explicit request.

There are different kinds of indexes used in a native XML database; the details vary from one product to another. A value index is an index of the typed values that exist for attribute and element nodes. For example, if your XML document has element FirstName with value John, as in

 <FirstName>John</FirstName> 

then you may have a value index that contains the entry ("John", 1000), where 1000 is a physical or logical address of the corresponding document in the DBMS.

Value indexes are similar to column indexes in SQL databases. They can be selectively created for particular element or attribute values that are often involved in predicates in queries. For example, suppose you have the following query:

 for $d in collection("foo") where $d/Book/Author/LastName = "Date" return $d 

If a value index is defined on data(/Book/Author/LastName), then it could be used when optimizing the query by replacing the FLWOR expression and its where clause with an index access using the key "Date".
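The index-based rewrite can be sketched in Java under two simplifying assumptions: integer document ids stand in for the physical or logical addresses, and keys are compared as strings rather than as schema-typed values, as a real value index would. `buildIndex` records (key, document id) entries for one path; `scan` is the unoptimized plan that tests the where-clause condition against every document. Both should return the same documents, but the index answers with a single map lookup. All names here are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch of a document-level value index; keys are untyped strings.
class ValueIndexDemo {
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    /** String values of all nodes selected by path in one document. */
    static List<String> values(String xml, String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPATH.evaluate(path, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) out.add(nodes.item(i).getTextContent());
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot evaluate " + path, e);
        }
    }

    /** Value index over one path: key -> ids of the documents containing that key. */
    static Map<String, SortedSet<Integer>> buildIndex(Map<Integer, String> collection, String path) {
        Map<String, SortedSet<Integer>> index = new HashMap<>();
        collection.forEach((id, xml) -> {
            for (String key : values(xml, path)) {
                index.computeIfAbsent(key, k -> new TreeSet<>()).add(id);
            }
        });
        return index;
    }

    /** Unoptimized plan: test the where-clause condition on every document in turn. */
    static SortedSet<Integer> scan(Map<Integer, String> collection, String path, String key) {
        SortedSet<Integer> hits = new TreeSet<>();
        collection.forEach((id, xml) -> {
            if (values(xml, path).contains(key)) hits.add(id);
        });
        return hits;
    }

    public static void main(String[] args) {
        Map<Integer, String> books = Map.of(
            1000, "<Book><Author><LastName>Date</LastName></Author></Book>",
            1001, "<Book><Author><LastName>Codd</LastName></Author></Book>",
            1002, "<Book><Author><LastName>Gray</LastName></Author>"
                + "<Author><LastName>Date</LastName></Author></Book>");
        String path = "/Book/Author/LastName";
        Map<String, SortedSet<Integer>> index = buildIndex(books, path);
        System.out.println(index.get("Date"));         // [1000, 1002]
        System.out.println(scan(books, path, "Date")); // [1000, 1002]
    }
}
```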

One variation among systems is likely to be the granularity of indexing. Does the index entry identify only the document that contains the relevant value, or does it identify the specific node? There is a trade-off here between the cost of indexing, the space occupied by the index, and the value of the index for resolving more complex queries. In the simple case, a value index will index a document with simple (key, document id) pairs. In a more complex case, value indexes may index extents of common fragments. For example, if a bibliography system were to use an aggregated storage strategy, then all the authors for all the books might be stored together and might be indexable. In this case the index entries may be (key, author-id) pairs or even (key, author-id, document-id) triplets. Using such triplets allows either the document or the author node to be returned depending on the query return clause. This simple triplet is suggestive of the more general path index used in object-oriented databases.
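The difference in granularity can be sketched with (key, author-id, document-id) triplets. The sketch assumes that author nodes already carry identifiers such as `a1`; that labeling scheme, like the class name, is hypothetical, and how a real system assigns node ids is product-specific. One set of entries serves both a query that returns whole documents and one that returns the author nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical node-level value index built from (key, author-id, document-id) triplets.
class TripletIndex {
    /** The (author-id, document-id) part of a triplet; the key is the map key below. */
    private static final class Entry {
        final String nodeId;
        final int docId;
        Entry(String nodeId, int docId) { this.nodeId = nodeId; this.docId = docId; }
    }

    private final Map<String, List<Entry>> byKey = new HashMap<>();

    void add(String key, String nodeId, int docId) {
        byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(new Entry(nodeId, docId));
    }

    /** For a query that returns whole documents: each document reported once. */
    SortedSet<Integer> documents(String key) {
        SortedSet<Integer> out = new TreeSet<>();
        for (Entry e : byKey.getOrDefault(key, List.of())) out.add(e.docId);
        return out;
    }

    /** For a query that returns the matching author nodes themselves. */
    SortedSet<String> nodes(String key) {
        SortedSet<String> out = new TreeSet<>();
        for (Entry e : byKey.getOrDefault(key, List.of())) out.add(e.nodeId);
        return out;
    }

    public static void main(String[] args) {
        TripletIndex index = new TripletIndex();
        index.add("Date", "a1", 1);    // book 1: author a1 has LastName "Date"
        index.add("Date", "a7", 2);    // book 2 has two authors, a7 and a8
        index.add("Darwen", "a8", 2);
        System.out.println(index.documents("Date")); // [1, 2]
        System.out.println(index.nodes("Date"));     // [a1, a7]
    }
}
```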

The structure index of Tamino [TAMINO], a leading XML DBMS, is an example of a second kind of index that indexes the existence of named elements and attributes within the document hierarchy. We can use such an index to locate documents that contain a particular named element type or an element in a particular hierarchic context (e.g., a footnote element within a table element). This kind of index is especially useful where documents are semi-structured, that is, where the schema is very flexible regarding the structures that it allows to appear in instance documents.
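A structure index of this kind can be sketched as a map from root-to-element paths to the documents in which they occur. This is a simplified illustration, not Tamino's implementation: it records element names only, assumes documents without namespaces, and answers the footnote-within-table question by checking whether `table` appears above `footnote` on some indexed path. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical structure index: element path -> documents containing that path.
class StructureIndex {
    // e.g. "/doc/table/row/footnote" -> {1}
    private final Map<String, SortedSet<Integer>> paths = new HashMap<>();

    void addDocument(int docId, String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        NodeList all = doc.getElementsByTagName("*"); // every element, in document order
        for (int i = 0; i < all.getLength(); i++) {
            StringBuilder path = new StringBuilder();
            for (Node n = all.item(i); n instanceof Element; n = n.getParentNode()) {
                path.insert(0, "/" + n.getNodeName());
            }
            paths.computeIfAbsent(path.toString(), k -> new TreeSet<>()).add(docId);
        }
    }

    /** Documents containing a `child` element somewhere beneath an `ancestor` element. */
    SortedSet<Integer> containing(String ancestor, String child) {
        SortedSet<Integer> out = new TreeSet<>();
        paths.forEach((path, ids) -> {
            String[] steps = path.substring(1).split("/");
            boolean endsWithChild = steps[steps.length - 1].equals(child);
            boolean hasAncestor = Arrays.asList(steps).subList(0, steps.length - 1).contains(ancestor);
            if (endsWithChild && hasAncestor) out.addAll(ids);
        });
        return out;
    }

    public static void main(String[] args) {
        StructureIndex index = new StructureIndex();
        index.addDocument(1, "<doc><table><row><footnote/></row></table></doc>");
        index.addDocument(2, "<doc><para><footnote/></para><table/></doc>");
        System.out.println(index.containing("table", "footnote")); // [1]
    }
}
```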

The third type of index that is often encountered in a native XML database is a full-text index, which is used to index words appearing within the content of elements. Such an index is used to optimize queries that search on words using a full-text expression of some sort. Unlike a value index, a full-text index is often applied to all the text in a document, regardless of the element in which it appears. This may be an implementation constraint or it may be done because the user of the full-text index needs to be able to search either the entire document or a particular path in an ad hoc fashion. Even when a full-text index is applied to the complete document, the search scope for that document may be restricted to just one path. For example, I may only want to search the $doc/book/title for the word "Linux," not the entire book. This path-specific search requirement means full-text indexes will often contain path information along with the words so that path-specific queries can be optimized. Additional discussion on full-text search appears later in the chapter.
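A full-text index that keeps path information alongside each word can be sketched as an inverted list of (document, path) postings. The tokenizer is deliberately crude (lower-casing and splitting on non-word characters), and the class and method names are illustrative assumptions; real full-text engines add stemming, stop words, word positions, and much more. The optional scope argument restricts a search to one path and its descendants, as in the `$doc/book/title` example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical whole-document full-text index that records the element path of each word.
class FullTextIndex {
    /** A posting: the word occurs in this document, in text directly inside this element path. */
    private static final class Posting {
        final int docId;
        final String path;
        Posting(int docId, String path) { this.docId = docId; this.path = path; }
    }

    private final Map<String, List<Posting>> postings = new HashMap<>();

    void addDocument(int docId, String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        NodeList elements = doc.getElementsByTagName("*");
        for (int i = 0; i < elements.getLength(); i++) {
            Element element = (Element) elements.item(i);
            String path = pathOf(element);
            for (Node c = element.getFirstChild(); c != null; c = c.getNextSibling()) {
                short type = c.getNodeType();
                if (type != Node.TEXT_NODE && type != Node.CDATA_SECTION_NODE) continue;
                for (String word : c.getNodeValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                    if (!word.isEmpty()) {
                        postings.computeIfAbsent(word, k -> new ArrayList<>()).add(new Posting(docId, path));
                    }
                }
            }
        }
    }

    /** Documents containing the word; scopePath == null searches the whole document. */
    SortedSet<Integer> search(String word, String scopePath) {
        SortedSet<Integer> hits = new TreeSet<>();
        for (Posting p : postings.getOrDefault(word.toLowerCase(Locale.ROOT), List.of())) {
            if (scopePath == null || p.path.equals(scopePath) || p.path.startsWith(scopePath + "/")) {
                hits.add(p.docId);
            }
        }
        return hits;
    }

    private static String pathOf(Element element) {
        StringBuilder path = new StringBuilder();
        for (Node n = element; n instanceof Element; n = n.getParentNode()) {
            path.insert(0, "/" + n.getNodeName());
        }
        return path.toString();
    }
}
```

For example, after indexing a book titled "Running Linux" and another that mentions Linux only in a chapter, `search("Linux", null)` finds both, while `search("Linux", "/book/title")` finds only the first.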

Stored Functions

In relational databases, stored procedures are often used in large, complex applications as a way of extending the database command language, so that application logic can be executed within the database system instead of at the application level using another programming language like Java or C++. The idea was first introduced to reduce network traffic in client-server systems, but it also gives more scope for a query optimizer to improve access paths. Historically, stored procedures were often written in proprietary languages invented by each vendor (e.g., PL/SQL in the case of Oracle). Since XQuery is a powerful language with its own function-definition syntax, stored XQuery functions fulfill the same role in the case of a native XML database.

Functions in XQuery are probably more important even than stored procedures in a relational database, because they also perform the role of views in relational systems. Just as relational views allow query authors to interrogate the data without knowing all the intricate details of the schema structures, so functions in XQuery can provide a built-in capability to access derived data. This is especially important for navigating relationships, since relationships can be represented in many different ways in a native XML database, and we can mask these differences by using functions.

The XQuery specifications define a syntax for defining and invoking functions, but they do not say how libraries of functions are managed. A native XML database often has interfaces allowing functions to be compiled and stored within the database.

Triggers

Triggers are a special type of stored procedure. They fire when data in the database is changed or configuration events occur, such as dropping a collection.

Schemas

We have already seen that XML schemas are used to support a broad range of features in a native XML database. Schemas are also configuration objects that need to be managed. The way in which schemas are loaded into the DBMS varies from vendor to vendor.

Access Control and Locking

As with any DBMS, an XML DBMS needs concepts such as users, groups, and permissions on resources in order to manage authentication and access authorization. In the absence of any standard, the ways in which users and permissions are defined differ from one XML database to another. A native XML database may offer access control at the level of individual documents, or it may provide control of parts of a document as a resource. This subcomponent resource may in turn be assigned user permissions. In addition, an XML DBMS may define a long-transaction mechanism that allows users to apply long-duration locks to documents or document subcomponents. Although there is no direct connection between the data model and the type of locking that a database needs to support, the applications that use a native XML database often involve the kind of collaborative workflow that requires long-duration locking, in addition to the more conventional transaction-processing models supported by relational databases.
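Long-duration locks on document subcomponents could be sketched as a lock table keyed by document and subtree, where two locks conflict if one subtree is the same as, or contains, the other. Identifying subcomponents by positional path strings such as `/book/chapter[2]` is an assumption made for brevity (a real system would more likely use stable node identifiers), the class name is hypothetical, and the sketch has no persistence, timeouts, or deadlock handling.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lock table for long-duration locks on document subtrees.
class SubtreeLockTable {
    // docId -> (locked subtree path -> owning user)
    private final Map<Integer, Map<String, String>> locks = new HashMap<>();

    /** Two subtrees overlap if they are the same node or one contains the other. */
    private static boolean overlaps(String a, String b) {
        return a.equals(b) || a.startsWith(b + "/") || b.startsWith(a + "/");
    }

    /** Takes a long-duration lock unless another user holds an overlapping one. */
    synchronized boolean lock(int docId, String path, String user) {
        Map<String, String> held = locks.computeIfAbsent(docId, k -> new HashMap<>());
        for (Map.Entry<String, String> e : held.entrySet()) {
            if (overlaps(e.getKey(), path) && !e.getValue().equals(user)) {
                return false;
            }
        }
        held.put(path, user);
        return true;
    }

    /** Releases the lock only if it is held by this user. */
    synchronized void unlock(int docId, String path, String user) {
        Map<String, String> held = locks.get(docId);
        if (held != null) {
            held.remove(path, user);
        }
    }

    public static void main(String[] args) {
        SubtreeLockTable table = new SubtreeLockTable();
        System.out.println(table.lock(1, "/book/chapter[2]", "alice"));          // true
        System.out.println(table.lock(1, "/book/chapter[2]/section[1]", "bob")); // false: inside alice's lock
        System.out.println(table.lock(1, "/book/chapter[3]", "bob"));            // true: disjoint subtree
    }
}
```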

A Database Command Language

The SQL language first introduced the idea of combining data-definition interfaces, database configuration interfaces, and query/update interfaces into a single language, a concept I will call a database command language. Many modern database management systems have followed this tradition. A native XML database necessarily separates data definition (provided through XML schemas) from data manipulation (XQuery), but there is still scope for unifying multiple functions into a single command language that is processed by the DBMS to query, update, and perform other database configuration tasks. Due to the time it takes to hammer out a standard, the XQuery specification currently defines a command language that is suitable for query only. Other important features of a command language are the following:

  • Database configuration

  • Update capability

  • Full-text search capability

Update and full-text search will probably be addressed within XQuery eventually, though definition of configuration objects may be handled outside the XQuery language. In fact, the XQuery group has already released working drafts on full-text requirements and full-text use cases.

In XStreamDB we chose to add XQuery command-language extensions for all of the features above. The command syntax and structure in XStreamDB in no way reflect the XQuery working group's direction, but these added commands do serve as an example of such interfaces.

Other native XML databases have implemented this functionality in different ways. Tamino, for example, provides database configuration interfaces through a system management infrastructure designed to provide a single management interface not only for database resources but for other resources such as the networking infrastructure. Tamino implements interfaces for physical database design (e.g., definition of indexes) by means of annotations to XML Schema, using a custom namespace for the annotations to ensure that the schemas remain interoperable.

Database Configuration

The following is a brief sampler of XStreamDB's extensions to XQuery to create, modify, and destroy configuration objects.

Database

A database has 0..n roots. A database can be created and dropped as follows:

  • Create: CREATE DATABASE DatabaseName

  • Drop: DROP DATABASE DatabaseName

Root

A root in XStreamDB is a collection of documents. Roots can be bound to a schema, in which case all documents within that root must comply with the schema. As well, this schema and its type information become automatically available for that root.

  • Create: CREATE ROOT RootName [WITH SCHEMA SchemaName]

  • Alter: ALTER ROOT RootName [WITH SCHEMA (SchemaName | NONE)]

  • Drop: DROP ROOT RootName
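The effect of binding a root to a schema can be sketched with the JDK's `javax.xml.validation` API: every insert is validated against the bound schema, and rebinding to `NONE` (modeled here as `null`) lifts the check. This is a hypothetical in-memory model, not XStreamDB code. In particular, whether `ALTER ROOT` re-validates documents already stored is not stated above, so re-validating them is an assumption of this sketch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical in-memory "root": a collection of documents optionally bound to one schema.
class BoundRoot {
    static final String PERSON_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='Person'><xs:complexType><xs:sequence>"
        + "<xs:element name='City' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    private final String name;
    private Schema schema; // null plays the role of WITH SCHEMA NONE
    private final List<String> documents = new ArrayList<>();

    BoundRoot(String name, Schema schema) { this.name = name; this.schema = schema; }

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
    }

    private static void requireValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException("document rejected: " + e.getMessage(), e);
        }
    }

    /** Insert a document, validating it first if a schema is bound. */
    void insert(String xml) {
        if (schema != null) requireValid(schema, xml);
        documents.add(xml);
    }

    /** Roughly ALTER ROOT ... WITH SCHEMA (SchemaName | NONE); assumed to re-check stored documents. */
    void alterSchema(Schema newSchema) {
        if (newSchema != null) {
            for (String doc : documents) requireValid(newSchema, doc);
        }
        schema = newSchema;
    }

    String name() { return name; }

    int size() { return documents.size(); }
}
```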

Index

XStreamDB supports value indexes definable on selected paths and full-text indexes defined across an entire root.

  • Create value: CREATE [UNIQUE] INDEX IndexName ON RootName PATH PathExpr

  • Create full text: CREATE FULLTEXT INDEX ON RootName

  • Drop value: DROP INDEX IndexName ON RootName

  • Drop full text: DROP FULLTEXT INDEX ON RootName
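A unique value index is one way a database can enforce a cross-document constraint, such as a unique reference number per document, that a schema cannot express. The sketch below assumes keys are compared as strings and that the same key may recur within a single document (whether XStreamDB permits that is not stated here); the class name and the `/Invoice/@ref` path are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical unique value index over one path, enforced across documents.
class UniqueIndex {
    private final String path;                                   // e.g. "/Invoice/@ref"
    private final Map<String, Integer> owners = new HashMap<>(); // key -> owning document id

    UniqueIndex(String path) { this.path = path; }

    /** Indexes a document, or rejects it (index unchanged) if a key belongs to another document. */
    void insert(int docId, String xml) {
        List<String> keys = keysOf(xml);
        for (String key : keys) {
            Integer owner = owners.get(key);
            if (owner != null && owner != docId) {
                throw new IllegalStateException("duplicate key '" + key + "' (document " + owner + ")");
            }
        }
        for (String key : keys) owners.put(key, docId);
    }

    Integer lookup(String key) { return owners.get(key); }

    private List<String> keysOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(path, doc, XPathConstants.NODESET);
            List<String> keys = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) keys.add(nodes.item(i).getTextContent());
            return keys;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot extract keys at " + path, e);
        }
    }
}
```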

Trigger

XStreamDB supports triggers that can be set up to fire during operations on documents within a root. These trigger definitions include much of what you would expect.

 CREATE TRIGGER trigger-name ON rootname [PRIORITY integer] [(BEFORE | AFTER)] (INSERT | UPDATE | DELETE) DO TriggerExpr

TriggerExpr is any valid XQuery expression, including update extensions. In the update trigger case, the special variables $existingDocument and $newDocument are available for use within the expression TriggerExpr.
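The firing rules in this syntax can be sketched as an in-memory model: triggers are matched on timing and event, fired in priority order, and handed the existing and new documents, which play the roles of `$existingDocument` and `$newDocument`. Trigger bodies are Java lambdas rather than XQuery expressions, the names are hypothetical, and two behaviors are assumptions not stated in the text: lower `PRIORITY` values fire first, and an exception thrown by a `BEFORE` trigger vetoes the operation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory root with triggers; not XStreamDB's implementation.
class TriggeredRoot {
    enum Timing { BEFORE, AFTER }
    enum Event { INSERT, UPDATE, DELETE }

    /** Trigger body; existingDocument or newDocument is null where it does not apply. */
    interface Body { void fire(String existingDocument, String newDocument); }

    private static final class Trigger {
        final int priority; final Timing timing; final Event event; final Body body;
        Trigger(int priority, Timing timing, Event event, Body body) {
            this.priority = priority; this.timing = timing; this.event = event; this.body = body;
        }
    }

    private final Map<String, String> documents = new HashMap<>();
    private final List<Trigger> triggers = new ArrayList<>();

    /** Assumption: lower PRIORITY values fire first; ties keep creation order (stable sort). */
    void createTrigger(int priority, Timing timing, Event event, Body body) {
        triggers.add(new Trigger(priority, timing, event, body));
        triggers.sort(Comparator.comparingInt(t -> t.priority));
    }

    private void fire(Timing timing, Event event, String existing, String updated) {
        for (Trigger t : triggers) {
            if (t.timing == timing && t.event == event) t.body.fire(existing, updated);
        }
    }

    /** Assumption: an exception from a BEFORE trigger vetoes the operation. */
    void insert(String id, String doc) {
        fire(Timing.BEFORE, Event.INSERT, null, doc);
        documents.put(id, doc);
        fire(Timing.AFTER, Event.INSERT, null, doc);
    }

    void update(String id, String doc) {
        String existing = documents.get(id);
        if (existing == null) throw new IllegalArgumentException("no document " + id);
        fire(Timing.BEFORE, Event.UPDATE, existing, doc);
        documents.put(id, doc);
        fire(Timing.AFTER, Event.UPDATE, existing, doc);
    }

    void delete(String id) {
        String existing = documents.get(id);
        if (existing == null) return;
        fire(Timing.BEFORE, Event.DELETE, existing, null);
        documents.remove(id);
        fire(Timing.AFTER, Event.DELETE, existing, null);
    }

    String get(String id) { return documents.get(id); }
}
```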

Collections and Storage

The XML document is a central unit of data in the XML world. For this reason, the XML document is often the unit that is sent, received, edited, and stored by an XML application. It follows that a collection within a native XML database is, most naturally, a collection of XML documents. This also means that a collection is a valid instance of the data model: it is a sequence of document nodes or, at any rate, a set of document nodes that can be modeled as a sequence whenever it is accessed.

Given the above definition of a collection, the way collections of XML documents themselves are physically stored may still vary. The size and intended future use of a document may affect the underlying storage model of that XML document in a DBMS. Some applications involve very large documents that may represent, for example, an entire telecommunications network or a procedure manual composed of thousands of pages. Such documents can range from 500KB to many megabytes. In general, as the document becomes larger, queries will tend to drill down into the document to select parts of it, rather than retrieving the whole document.

On the other hand, many applications involve relatively small business documents. The size of an invoice or purchase order like the ones laid out by industry groups may be around 10KB, while other applications may store documents as small as 100 bytes. When relatively small business documents are in use, many queries will probably retrieve the whole document. Indeed, some queries may combine information from many documents into one, or compute aggregate information such as totals over many source documents.

To optimize queries and updates on small portions of large XML documents, a database administrator may be allowed to split a large XML document into fragments, each often rooted on a particular element type. For example, if I have a user guide composed of multiple level-one sections, enclosed by Sect1 tags, I may choose to store each of the Sect1 sections in separate, contiguous parts of the database, even though the database will logically model these sections as one big document. Storing Sect1 sections separately may speed up queries on Sect1/Title when indexes are present, because not all Sect1 data would have to be read from disk, and not all of the document would have to be filtered in order to find only the Sect1 section that met the query conditions.
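The Sect1 fragmentation strategy can be sketched under simplifying assumptions: each top-level `Sect1` is serialized and stored as its own fragment, a small title index maps each `Sect1/Title` to its fragment, and a counter stands in for disk reads. The root is assumed to contain only `Sect1` children, and the names `FragmentStore` and `sectionByTitle` are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical fragment store: one logical document, one stored fragment per Sect1.
class FragmentStore {
    private final List<String> fragments = new ArrayList<>();        // serialized Sect1 per slot
    private final Map<String, Integer> titleIndex = new HashMap<>(); // Sect1/Title -> slot
    private String rootName;
    int fragmentsRead = 0;                                           // stands in for disk reads

    /** Splits the document at each top-level Sect1 and stores every fragment separately. */
    void store(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            rootName = root.getTagName();
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            for (Node c = root.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c instanceof Element && c.getNodeName().equals("Sect1")) {
                    titleIndex.put(childText((Element) c, "Title"), fragments.size());
                    StringWriter out = new StringWriter();
                    serializer.transform(new DOMSource(c), new StreamResult(out));
                    fragments.add(out.toString());
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot fragment document", e);
        }
    }

    /** Reads a single fragment instead of the whole document. */
    String sectionByTitle(String title) {
        Integer slot = titleIndex.get(title);
        if (slot == null) return null;
        fragmentsRead++;
        return fragments.get(slot);
    }

    /** Logical view: the fragments reassembled into one document. */
    String wholeDocument() {
        fragmentsRead += fragments.size();
        return "<" + rootName + ">" + String.join("", fragments) + "</" + rootName + ">";
    }

    private static String childText(Element parent, String name) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element && c.getNodeName().equals(name)) return c.getTextContent();
        }
        return "";
    }
}
```

Fetching one section by title touches one fragment, while the logical whole-document view still reads them all.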

Such a fragmentation strategy may also interact with locking: A database system might apply locks at the level of a document fragment, in particular the long-duration locks used to support collaborative authoring features in content management systems. This is related to the techniques used for mapping subobjects in object-oriented database systems [OODB], and similar considerations also apply when mapping XML data to relational storage, as discussed in other chapters of this book.

The Tamino database system uses a similar scheme to map parts of an XML document to external databases or web services. This allows the maintenance of "live" documents; for example, the front page of an employee resume (CV) may be generated on-the-fly from an operational personnel database, while the textual content is maintained in native XML storage.

In summary, although at a logical level a collection is a collection of documents, for a large document, a storage strategy that allows multiple subfragments to be mapped to contiguous sections of disk can give advantages in performance and manageability.

XQuery Client APIs

The XQuery specification defines the form that a query takes, but it doesn't define how an application program issues a query, or how it processes the results. A native XML database isn't useful unless client applications have a way of accessing the XML contents. A programmer must be able to write a program that manipulates the XML managed by the DBMS. Programmers demand access that is both easy and powerful.

The basic features of DBMS access follow:

  • Connections provide the ability to establish a connection from a client program to the database server.

  • Transactions provide transaction control: begin, commit, rollback, and optionally prepare for two-phase commit.

  • Command execution allows the execution of queries, updates, and other commands in the DBMS command language, if there is such a language. Many systems also provide the ability to compile a query for repeated execution, though some may rely on stored functions for this capability.

  • Result traversal enables traversal of collections or result sets from queries. This may include a cursoring mechanism to allow users to scroll backwards and forwards through a large result set.

  • Data extraction provides a way to move values from DBMS managed data into programming language variables. This is often done through functions that access the query result set, but it can also be done in other ways.

  • Data insertion provides a way to move values from the programming language variables into the DBMS managed data. This is needed partly for updates, to perform add, delete, or modify operations as driven by a programmer. It is also needed to parameterize queries and other commands, allowing values to be passed from the programming language variables into the command-execution environment.

  • Configuration object access provides a way to define database, collection, and index objects in the database as well as a way to query this information.
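The "data insertion" feature, used to parameterize a query, can be illustrated with the JDK's own XPath engine, since XQuery itself is not part of the JDK. The sketch below is not the API of any XML DBMS discussed here; the class `ParameterizedQuery` is hypothetical, while `XPathVariableResolver` is the standard `javax.xml.xpath` hook that supplies values for `$variables` at evaluation time.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluate an expression containing $variables, binding
// each variable from a Java Map instead of splicing values into the query text.
class ParameterizedQuery {

    static NodeList run(String xml, String expression, Map<String, Object> params) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Called by the XPath engine whenever it meets $name in the expression.
            xpath.setXPathVariableResolver(
                    (QName name) -> params.get(name.getLocalPart()));
            return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("query failed: " + expression, e);
        }
    }
}
```

The design mirrors JDBC's `PreparedStatement`: the value travels separately from the query text, so a value containing quote characters cannot change the structure of the expression.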

These features are generally made available through an application programming interface (API), though some operations might be done within the command language rather than through specific API functions. For example, in XStreamDB, a collection is created through the CREATE ROOT statement, not through a method such as collectionManager.createCollection("mycol"). In fact, if the database command language is sufficiently powerful, some of the application logic can be located in the command language, especially if triggers and stored functions are supported. XQuery is a very powerful language for manipulating XML data, and with the probable addition of stored function libraries, update, and full-text search to XQuery, large applications are likely to rely on XQuery, and on XQuery function libraries, to perform some of their application logic.

Over the past ten years the most common and most successful relational database APIs have been Open Database Connectivity (ODBC) and Java JDBC. Both ODBC and JDBC have facilities corresponding to the above list of access features, and they work in tandem with the SQL command language. Since there is no standard XQuery API, and since the central part of the JDBC API is concerned with model-independent concepts such as Connections, Statements, PreparedStatements and ResultSets, it makes sense for a Java-based XQuery API to be designed on similar lines. So, for example, XStreamDB borrows JDBC concepts to provide corresponding XConnection, XStatement, XPreparedStatement and XResultSet interfaces. The method calls in XConnection, XStatement and XPreparedStatement are very similar to methods in JDBC Connection, Statement and PreparedStatement, but the XResultSet interface is rather different, reflecting the differences in the data model. Listing 8.1 is a simple program that performs a query using this XStreamDB API.

Listing 8.1 Query Using an XStream DB API: Example Query for Persons in Paris
import com.bluestream.xdb.*;
import com.bluestream.sys.util.FlexStringBuffer;
public class Example {
   public static void main(String[] args)
      throws Exception
   {
      SystemManager.init();
      Server server = SystemManager.getServer("MyServer");
      try
      {
         AuthenticationInfo authInfo = new SimpleAuthInfo();
         XConnection xcon = server.getConnection(authInfo);
         xcon.beginTransaction();
         // also valid: Root('PersonDB:Person')[Person/City='Paris']
         String stmtStr =
            "for $doc in Root('PersonDB:Person') " +
            "where $doc/Person/City = 'Paris' " +
            "return $doc";
         // note: the Root function is a vendor addition
         // to the XQuery function library
         XStatement stmt = xcon.createStatement();
         XResultSet rs = stmt.execute(stmtStr, XStatement.SF_DEFAULT);
         FlexStringBuffer fsb = new FlexStringBuffer();
         // traverse result set
         rs.beforeFirst();
         while (rs.nextValue())
         {
            rs.fsbGet(fsb, false);
            System.out.println(fsb.toString());
         }
         xcon.commitTransaction();
      }
      finally
      {
         SystemManager.shutdown();
      }
   }
}

The XStreamDB XResultSet object allows you to iterate over the items in the sequence returned by an XQuery expression. This iteration is done using the nextValue() method. So far this is similar to JDBC, where next() advances a ResultSet from one row of column values to the next. Since a value returned from XResultSet can be a document, a fragment, or some other data model value, a different approach is required to extract parts of that XML value into programming language variables. In place of the get{Type} calls in JDBC, such as getBigDecimal(String columnName), XResultSet has getBigDecimal(String path), which extracts one or several subvalues by using the path parameter to drill into the current result set value. This is useful if you want to return a tree of values as one logical result set value. For example, getString("/Person/City/text()") would select the City value out of a current result set value of <Person><City>Paris</City></Person>.
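Path-based extraction of this kind is easy to imitate with standard Java. The sketch below is not the XStreamDB implementation: the class `ResultValue` is hypothetical, it wraps a single result value given as XML text, and it uses the JDK's XPath 1.0 engine to drill into that value, in the style described above for getString(path) and getBigDecimal(path).

```java
import java.io.StringReader;
import java.math.BigDecimal;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical wrapper around one item of a query result, offering
// path-based getters. It uses the JDK's XPath engine, not a vendor API.
class ResultValue {
    private final Document value;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    ResultValue(String xml) {
        try {
            value = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Drill into the current value with a path and return its string value.
    String getString(String path) {
        try {
            return xpath.evaluate(path, value);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad path: " + path, e);
        }
    }

    // Same, but convert the selected text to a BigDecimal.
    BigDecimal getBigDecimal(String path) {
        return new BigDecimal(getString(path).trim());
    }
}
```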

Other ways of getting values from a result set are discussed below under the heading "Getting the Data." Before discussing this further, let's put the XStreamDB API into perspective by introducing another API. The XStreamDB API described above is essentially a conventional API: it follows in the tradition of JDBC and ODBC, and it offloads some of its operations onto the XQuery command language.

Another popular API for an XML database is the one defined by the xmldb.org group (http://www.xmldb.org/xapi/xapi-draft.html). It has many similarities to the XStreamDB API, though it provides a larger number of method calls and relies less on having a rich command language (in fact, at the time of this writing, it supports only XPath and not XQuery). This API is supported both by the Tamino DBMS (though not as its primary API) and by the open-source product Xindice. In the xmldb.org API, data is organized into collections composed of resources. Resources may be either XML or binary resources. Listing 8.2 is an example xmldb program, modified from one on the xmldb.org website.

Listing 8.2 Sample xmldb Program
package examples;
import org.xmldb.api.base.*;
import org.xmldb.api.modules.*;
import org.xmldb.api.*;
/**
 * Simple XML:DB API example to query the database.
 */
public class Example1 {
   public static void main(String[] args) throws Exception
   {
      Collection col = null;
      try
      {
         String driver = "org.vendorx.xmldb.DatabaseImpl";
         Class<?> c = Class.forName(driver);
         Database database = (Database) c.newInstance();
         DatabaseManager.registerDatabase(database);
         col = DatabaseManager.getCollection(
                  "xmldb:vendorx://sample.com:2030/Person");
         TransactionService transaction = (TransactionService)
                  col.getService("TransactionService", "1.0");
         transaction.begin();
         String xpath = "//Person[City='Paris']";
         XPathQueryService service = (XPathQueryService)
              col.getService("XPathQueryService", "1.0");
         ResourceSet resultSet = service.query(xpath);
         ResourceIterator results = resultSet.getIterator();
         while (results.hasMoreResources())
         {
            Resource res = results.nextResource();
            System.out.println((String) res.getContent());
         }
         transaction.commit();
      }
      catch (XMLDBException e)
      {
         System.err.println("XML:DB Exception occurred "
            + e.errorCode);
      }
      finally
      {
         if (col != null)
         {
            col.close();
         }
      }
   }
}

Here is a comparison of the XStreamDB and xmldb.org APIs:

  • Both of the examples above contain roughly thirty lines of executable code.

  • Although the xmldb API uses an XPathQueryService, an XQueryService could easily be added.

  • Both the XStreamDB API and xmldb.org API expose a set of objects concerned not with the application domain but with the DBMS domain. For example, XStreamDB uses XConnection, XStatement and XResultSet, while xmldb.org uses DatabaseManager, Collection, XPathQueryService, ResourceSet, and Resource. In contrast, application domain objects might be Invoice or Customer.

  • The xmldb.org API typically uses more objects to perform an operation and is more "object-oriented" in that way. For example, the xmldb API has an object for collection, whereas the XStreamDB API uses collection names right in the XQuery command language. In addition, XStreamDB uses XQuery command-language extensions to INSERT, REPLACE, and DELETE documents, whereas the xmldb.org API performs these operations without using a command language.

  • The xmldb.org API is in some ways more conducive to exposing objects to the Internet through a standard mechanism such as WebDAV (Web-based Distributed Authoring and Versioning, http://www.webdav.org), since WebDAV is concerned with collections, possibly nested collections, and resources within those collections, which are constructs similar to those in the xmldb.org API. Making database resources accessible on the Web through HTTP and URI identity is an important option. In fact, XStreamDB had to add a separate resource API for this purpose.

  • Because the xmldb.org API exposes XML resources as nodes, this resource type does not cover the whole XQuery data model, in which a query result may also contain atomic values such as strings and numbers. Even if the xmldb API did expose further resource types to represent the other XQuery data model values, it would lead to a proliferation of resource types, and creating a resource object for each of a hundred thousand string values could present object-creation performance problems.
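The gap between a node-only result type and the full data model already shows up with XPath 1.0 in the JDK: count(//Person) evaluates to a number, which has no node identity at all. The sketch below is a hypothetical item model, not part of either API discussed here: `NodeItem` wraps a DOM node, while `AtomicItem` is just a small record around the value, keeping per-value overhead low.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ResultItems {

    // Hypothetical item model covering both kinds of result value:
    // nodes, and atomic values that are not nodes.
    sealed interface Item permits NodeItem, AtomicItem {}
    record NodeItem(Node node) implements Item {}
    record AtomicItem(Object value) implements Item {}

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // XPath 1.0 makes the caller name the result type; only NODE results
    // are nodes, while NUMBER, STRING and BOOLEAN results are atomic values.
    static Item evaluate(Document doc, String expr, QName type) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Object result = xpath.evaluate(expr, doc, type);
            return type.equals(XPathConstants.NODE)
                    ? new NodeItem((Node) result)
                    : new AtomicItem(result);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad expression: " + expr, e);
        }
    }
}
```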

  • In general, middleware objects such as Connection and Statement are not that interesting to an application. Both the XStreamDB and xmldb.org APIs fail on this level, as did relational APIs. In an object-oriented programming environment, it would be better if the database could return Invoices or Purchase Orders, not resources or string buffers (the difference between the application view of the world and the middleware view is often referred to as the impedance mismatch between database systems and object-oriented programming languages). In a native XML database, as defined at the beginning of this chapter, the defining characteristic of the API should be the pervasive use of the XML data model. This is discussed further below.

Getting the Data

Several mechanisms exist for bringing an XML value into object-oriented programming language variables. Among them are the following:

  • Path extraction: Use a path to extract atomic data values from the XML into programming language variables.

  • Emit as events: Emit an XML value as an event stream to an event handler. The event protocol can be the Simple API for XML (SAX) event stream or some other similar event stream.

  • Build a DOM: Put the returned XML into a Document Object Model (DOM) tree where it can be further manipulated by the programming language.

  • Data binding: Load the XML into an application domain object such as an Invoice object.
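Two of these mechanisms, emitting events and building a DOM, are available in the JDK itself. The sketch below assumes the result value arrives as an XML string; the class name `DeliveryStyles` and the textual event log are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DeliveryStyles {

    // "Emit as events": push the value through a SAX handler, which sees
    // start/end element events and character data without building a tree.
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                log.add("start " + qName);
            }
            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may split text across several calls; each
                // non-blank chunk is logged separately.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) log.add("text " + text);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return log;
    }

    // "Build a DOM": materialize the whole value as an in-memory tree that
    // the program can navigate and modify.
    static Document tree(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }
}
```

The event style suits large values that are processed once in document order; the DOM style costs memory proportional to the value but allows random access and in-place modification.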

Another approach is to avoid handling the result data in the application at all. Sometimes, especially when retrieving whole documents, the only thing the application needs to do with the data is to present it to the user, often via a browser. Such applications are often invoked as servlets running within a web server. A convenient approach in such cases is for the application to pass the query results straight to an XSLT stylesheet (which might be processed either on the server or in the browser) that transforms the results to HTML for display. In this scenario, there is no need to map the query results to the data structures available in the application programming language. The fact that XSLT and XQuery share the same data model makes this a particularly attractive option.
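Passing results straight to XSLT needs only the JDK's `javax.xml.transform` package. In this sketch the `<results>` wrapper element, the Person markup, and the stylesheet are illustrative assumptions; a servlet would write the returned string to its response instead of returning it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch: render query results as HTML with a stylesheet, without mapping
// them to application objects first.
class ResultsToHtml {

    static final String STYLESHEET = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="html" indent="no"/>
          <xsl:template match="/results">
            <ul>
              <xsl:for-each select="Person">
                <li><xsl:value-of select="Name"/> (<xsl:value-of select="City"/>)</li>
              </xsl:for-each>
            </ul>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String render(String resultsXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter html = new StringWriter();
            t.transform(new StreamSource(new StringReader(resultsXml)),
                        new StreamResult(html));
            return html.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```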

The problem of moving data from a DBMS to a program has been termed an impedance mismatch problem. This impedance mismatch occurs between application domain objects, such as an Invoice or Purchase Order, and the XML data values stored in the DBMS. In object-oriented DBMSs this problem was solved by having the database API return application domain objects such as Invoices and by having programming language objects map directly into DBMS managed data using DBMS data identifiers. This close relationship between the programming language and the database was usually called the binding of the language to the database, and object databases excelled at this close binding [ODMG].

Binding XML data into object-oriented programming languages can be accomplished, however, by generating objects that can load themselves from XML data and save themselves back as XML data. For instance, Java has a framework called Java Architecture for XML Binding (JAXB) that defines how application domain objects can be generated automatically from XML schemas. These generated objects have no application-oriented methods, but they serve well as pure data objects with load and save capability. Other behavior-rich objects that wrap these generated objects can be used to add the required methods.
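The JAXB framework mentioned above is no longer bundled with the JDK (it was removed in Java 11), so this sketch writes by hand, with the JDK's DOM API, the kind of pure data class that a binding generator would produce, and wraps it in a behavior-rich class as the text suggests. The names `PersonData`, `Person`, `load`, `save`, `livesIn`, and `relocate` are hypothetical, and the XML shape is assumed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class Binding {

    // Plays the role of a schema-generated class: data plus load/save, no behavior.
    record PersonData(String name, String city) {

        static PersonData load(String xml) {
            try {
                Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .parse(new InputSource(new StringReader(xml))).getDocumentElement();
                return new PersonData(text(root, "Name"), text(root, "City"));
            } catch (Exception e) {
                throw new IllegalArgumentException("cannot load Person", e);
            }
        }

        private static String text(Element parent, String tag) {
            return parent.getElementsByTagName(tag).item(0).getTextContent();
        }

        String save() {
            try {
                Document doc = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder().newDocument();
                Element person = doc.createElement("Person");
                doc.appendChild(person);
                for (String[] field : new String[][] {{"Name", name}, {"City", city}}) {
                    Element child = doc.createElement(field[0]);
                    child.setTextContent(field[1]);
                    person.appendChild(child);
                }
                var t = TransformerFactory.newInstance().newTransformer();
                t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                StringWriter out = new StringWriter();
                t.transform(new DOMSource(doc), new StreamResult(out));
                return out.toString();
            } catch (Exception e) {
                throw new IllegalStateException("cannot save Person", e);
            }
        }
    }

    // Behavior-rich application object wrapping the generated data object.
    static final class Person {
        private PersonData data;
        Person(PersonData data) { this.data = data; }
        boolean livesIn(String city) { return data.city().equalsIgnoreCase(city); }
        void relocate(String newCity) { data = new PersonData(data.name(), newCity); }
        PersonData data() { return data; }
    }
}
```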

Although solving the impedance mismatch problem may seem important to some developers working with data-centric XML, the problem may be entirely unimportant to other developers working with document-centric information. For example, it may not be particularly useful to model text content, such as the content of this book, as an object at all. Indeed, some applications that deal with data-centric content, such as invoices, may only be acting as a repository for those documents and may not need to represent these documents as objects. (While native XML databases can handle both data-centric and document-centric information, as well as the important middle ground called semi-structured information, they face more competition in the area of data-centric applications. This means that in practice, the purely data-centric scenario is less common.) Many applications simply need to extract XML from the database and transform it into other XML grammars for display, such as DocBook to XHTML or GML to SVG. Or perhaps you want to go from DocBook to XSL formatting objects to PDF. None of these scenarios requires the XML content to be represented as an object in an object-oriented language; rather, they suggest a need for close integration between a native XML database and an XSLT processor.



XQuery from the Experts: A Guide to the W3C XML Query Language