1.2 Data and documents
There are two kinds of data that occur in XML documents: the transactional data that usually resides in databases ("database data") and the human-created kind that is commonly associated with documents ("publishing data"). The distinction can affect any aspect of an XML system, but especially the processing. Let's look at the characteristics of the two.
Database data is regular and has fixed depth. Databases allow rapid access and processing, which is accomplished by restricting the structure of the data. Typically, a database consists of relational tables, each table storing a sequence of records (composite values), each record containing a fixed set of fields (atomic values).
Fields can link to records in other tables to represent deeper levels of nesting. However, the full set of tables is fixed in the database schema and cannot be modified dynamically. Therefore a given database has a fixed depth.
Database-like XML documents are strongly predictable. In order for an XML document (or any single element of a document) to represent database data, its structure must map easily into the relational table model; we say that it must be strongly predictable. That is, each element must be constrained to contain either a fixed sequence of element types (such as quantity, item-number, description, and price), data characters only, or nothing at all.
Strongly predictable elements can easily be visualized as forms. A business transaction document, such as a purchase order, is more likely to be strongly predictable than a memo.
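As a minimal sketch of what "maps easily into the relational table model" can mean in practice, the following Python fragment turns each strongly predictable element into a flat record. The order and line element names, the sample values, and the helper functions are assumptions made for illustration; only the four field names come from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase-order fragment; element names and values are illustrative.
ORDER_XML = """
<order>
  <line><quantity>2</quantity><item-number>A-17</item-number><description>Stapler</description><price>9.50</price></line>
  <line><quantity>10</quantity><item-number>B-03</item-number><description>Paper ream</description><price>4.25</price></line>
</order>
"""

# The fixed sequence of element types that every line must contain.
FIELDS = ("quantity", "item-number", "description", "price")

def element_to_record(line):
    """Map one strongly predictable element to a flat record (a tuple of strings)."""
    children = list(line)
    # The child sequence must match the fixed field list exactly.
    if tuple(child.tag for child in children) != FIELDS:
        raise ValueError("element is not strongly predictable")
    # Each field must hold data characters only, with no nested elements.
    if any(len(child) for child in children):
        raise ValueError("field contains subelements")
    return tuple((child.text or "").strip() for child in children)

def order_to_rows(xml_text):
    """Return one record per line element, ready to be stored as table rows."""
    root = ET.fromstring(xml_text)
    return [element_to_record(line) for line in root.findall("line")]

rows = order_to_rows(ORDER_XML)
```

Because every line has the same fields in the same order, the resulting rows could be inserted into a single table without further analysis. An element that deviates from the fixed sequence is rejected rather than guessed at.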
Publishing data is freeform. XML data in general may be far less predictable than database data. Even when constrained by a DTD or schema, sufficient variation may be allowed that the data can be considered freeform.
For example, an element type's content model may allow several different subelement types to occur multiple times in any order. Because the subelements can be of different types, the depth of the structure is unpredictable. The content model might even allow character data to be intermixed with the subelements (which the XML specification calls mixed content).
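To illustrate the unpredictable depth, here is a small sketch that assumes a hypothetical content model in which a section may contain any number of para, list, and nested section elements in any order. Both sample documents would conform to such a model, yet their depths differ. The element names and the depth helper are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def depth(element):
    """Return the number of element levels at and below the given element."""
    return 1 + max((depth(child) for child in element), default=0)

# Two documents that both fit the assumed model (para | list | section)*.
shallow = ET.fromstring("<section><para/><list/><para/></section>")
deep = ET.fromstring(
    "<section><para/><section><list/><section><para/></section></section></section>"
)

shallow_depth = depth(shallow)  # 2: section, then its direct children
deep_depth = depth(deep)        # 4: three nested sections, then a para
```

A processor for this kind of data cannot assume a fixed number of levels; it has to walk the tree recursively, as the depth function does.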
An example of mixed content is a paragraph where some of the words are marked up for emphasis, or as references, or as other types of inline elements. From the validation viewpoint, mixed content is special because a DTD lets you control only which types of elements are used within another element, not how many or in what order. From the XSLT standpoint, mixed content is somewhat inconvenient to handle because an element with mixed content has several child text nodes, each fragment of textual data between inline elements being a separate node.
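The way mixed content splits into several text nodes can be seen with any DOM-style parser. The sketch below uses Python's xml.dom.minidom rather than XSLT, and the para, em, and ref element names are assumptions chosen for illustration.

```python
from xml.dom import minidom

# Hypothetical paragraph with mixed content: text interleaved with inline elements.
PARA_XML = "<para>Use <em>mixed</em> content with <ref>care</ref>, please.</para>"

para = minidom.parseString(PARA_XML).documentElement

# Each run of text between inline elements is a separate child text node.
text_children = [n.data for n in para.childNodes if n.nodeType == n.TEXT_NODE]
element_children = [n.tagName for n in para.childNodes if n.nodeType == n.ELEMENT_NODE]

def string_value(node):
    """Concatenate all descendant text, ignoring the inline markup."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(string_value(child) for child in node.childNodes)

full_text = string_value(para)
```

Here text_children holds three fragments ("Use ", " content with ", and ", please."), while the words inside em and ref belong to those elements, not to the paragraph directly. Recovering the complete sentence requires descending into the inline elements, which is what string_value does.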
Looking ahead. One of the strengths of XML is that a document can contain both database data and publishing data, and that both document processing and data processing can be performed on it. Although a web site is primarily concerned with publishing data, there is also a need to deal with database data, particularly in dynamic sites. As we move through the book, we will see that some of the XML techniques work better for one kind of data than for the other.