Item 41. Hang On to Your Relational Database | Effective XML: 50 Specific Ways to Improve Your XML

Over the years I've been teaching XML, I've noticed that one perennial question is whether XML somehow replaces relational databases. Typically, this question is asked by IT managers who've had one too many Oracle salespersons visit the company and hoist them up by the toenails to see how much money they can shake out of their pockets. Often, the question is posed with an almost pleading tone, "Please tell me I can finally throw away my relational database. Please."

Sadly, for these managers the answer is no, they cannot throw away their relational database. (They may well be able to replace the ridiculously expensive Oracle with the ridiculously cheap MySQL or PostgreSQL, but that's a subject for another book.) XML is not a database. It was never meant to be a database. It is never going to be a database. Relational databases are proven technology with more than 20 years of implementation experience. They are solid, stable, useful products. They are not going away. XML is a very useful technology for moving data between different databases or between databases and other programs. However, it is not itself a database. Don't use it like one.

First of all, relational databases are based on very clear, well- understood mathematical principles. Storing all data in normalized tables makes for very robust (if sometimes slow) systems. Of course, not all the world's data fits neatly into rows and columns . A list of transactions made against a bank account with dates and amounts fits very easily into a table. The bank's privacy policy for that account does not. Narrative documents such as privacy policies, mortgage contracts, annual reports , deposit instructions, and more really don't look anything like tables. This is where XML enters the picture and where a native XML database that does understand the structure of XML documents might come in handy.

Secondly, while the interface to a relational database is based on a mathematical abstraction of fields and tables, the underlying physical storage almost certainly is not. Relational databases keep indexes on the data, store the same data in multiple places, cache particular results, and perform a host of other tricks to attempt to optimize performance. Big iron databases like Oracle even go so far as to manage their own disk partitions to try to eke out every last nanosecond of performance. XML documents do none of this. The physical storage model of an XML document is very close to the mental model most developers have of it, whether event-based like SAX or tree-based like DOM. No effort is spent on optimization. Thus storing all your data inside XML documents instead of in a relational database is almost certainly going to kill performance on an application of any reasonable size.

All of that having been said, however, there are still many uses for which XML makes a lot of sense in the database world. XML is not a database, but it is a wonderful format for exchanging data between different databases, or between databases and nondatabase systems. By 2003 most major database software ranging from MySQL to FileMaker Pro to Microsoft SQL Server to Oracle have functionality for importing and exporting XML documents. There are no standards in this arena, so how you do it depends heavily on which database you've chosen . For example, FileMaker Pro developers write XSLT stylesheets that transform to and from a native FileMaker syntax to whatever format the developer is receiving (on input) or wants to produce (on output). Other systems rely on schemas or custom mapping files to match XML data to relational fields and records.

This all works very well as long as the XML documents being stored contain the same sort of data that has traditionally gone into relational data bases: accounts receivable, passenger manifests , invoices, student records, sales statistics, and so on. It begins to break down as soon as you start trying to store ad copy, privacy policies, legal briefs, street maps, assembly diagrams, document schemas, sixteenth-century Italian love poetry, and other kinds of things that are written in XML but that have not traditionally been stored in relational databases.

If you have large collections of narrative or other semistructured data in XML (or in a format that can easily be moved to XML), it may be time to consider putting that data in a native XML database. This is a database management system in which the fundamental structure is an XML document rather than a table. The mathematics of XML database systems aren't as formal or as well understood as the mathematics of relational databases, and the standards are just beginning to be developed. (The query language is a working draft and does not include any update, insert, or delete functionality, all of which vendors have to invent their own extensions for.) Still, native XML databases do fit some problems that relational databases don't.

Since native XML databases are very bleeding edge technology, there's a lot of research, experimentation, and hype in the field. No two products in this space work as similarly as do any two SQL databases. Nonetheless, some order is beginning to congeal from the primordial chaos, and I can thus make a few assertions about the characteristics of most native XML databases, which I will do by analogy with the more familiar relational systems.

A relational database contains tables of records. An XML database contains collections of documents.
All the records in a table have the same schema. All the XML documents in a collection have the same DTD (maybe). This is perhaps the area in which different systems differ the most. Some databases use DTDs. Some use the W3C XML Schema Language. Some allow merely well- formed documents with no schema of any type. Others do not. Some require that all documents in a collection have the same structure, valid or not. Others do not.
A relational record is an unordered list of named values. An XML document is an ordered tree of named elements.
A SQL query returns an unordered set of records. An XQuery returns an ordered sequence of nodes.

If you dig deeper into how these databases implement storage and optimize queries, you will find even more differences both with each other and with traditional relational databases. However, one very good characteristic that the native XML databases have adopted wholeheartedly from the relational world is the separation of the data model from the physical model. Although the data model and query language is not standardized across XML databases as it is with SQL databases, every XML database does separate its data model and query language from its physical model. You do not need to directly concern yourself with the physical model the XML database uses, only indirectly as it impacts performance.

I do not suggest using a native XML database for data that does fit well into tabular structures. In practice, relational databases like Oracle, FileMaker Pro, and MySQL are far more reliable, much better supported, and easier to use and administer; and they perform far better than a typical native XML database. More than anything else this reflects the relative maturity of relational databases (more than 20 years) and native XML databases (less than 5 years). One day native XML databases may be as robust, reliable, and fast as relational databases, but that day is not today. In the meantime, however, there are some problems that don't fit the relational model very well, and for these applications a native XML database may be appropriate. These situations include the following:

Content management systems for newspapers and magazines
Page-based, written web sites with lots of static, narrative content
A contracts and briefs database for a law firm
Patient records in a doctor's office
The dissertations submitted by Ph.D. candidates at a university
Software documentation and specifications, including UML diagrams

Anywhere you see people storing data as folders full of Word documents, static HTML pages, or some equivalent, you're looking at an opportunity for a native XML database.

I know there are a few relational database aficionados among my readers who are right now jumping up and shouting, "But an RDBMS can do all those things!"and they're right. An RDBMS can do these things too. But a hammer can pound a screw into wood. That doesn't make it a good idea, nor does it work nearly as effectively as a screwdriver. Not all of the world's data is a table (or an object, or a tree). Spending too much time with one very powerful tool like an RDBMS tends to cause a kind of developer myopia that makes the relational database seem like the right tool for all tasks . But there is no universal tool. Use relational databases for what they're good for, and use XML for what it's good for. It's easy to connect them to each other as necessary. Don't try to make either one do everything.