XML Conceptual Model


Our business card example is extremely simple. XML's designers intended it to support much more sophisticated documents. When they defined the standard and mapped out the broader paradigm, they used an implicit conceptual model based on their experiences with HTML and SGML, as well as their beliefs about how the use of documents on the Internet would evolve. This model has subtly influenced the development of the XML paradigm, and reconstructing it yields insight into both XML's power and limitations. It has five parts:

  1. Human and machine readability

  2. Defining content

  3. Defining structure

  4. Separation of content from relationships

  5. Separation of structure from presentation

This model is an ideal. The realities of system design and the practical usage of XML have forced compromises and changes. Subsequent chapters address these issues, but understanding the vision behind XML sheds light on why the technology evolved the way it did.

Human and Machine Readability

The XML design strives to achieve both easy human reading and easy machine processing. This goal results in two different views of any particular document: the human view and the machine view. These views are subtly different, and accommodating both views accounts for much of the XML syntax.

Like HTML, XML relies on tag-based markup. From the human perspective, tag-based markup is a readable way to integrate metadata with content. From the machine perspective, tag-based markup is simply one of many possible parsing tactics. Tag-based markup is a compromise that makes human reading easier without imposing too much of a burden on machine processing. Conversely, XML requires a strict hierarchy of tagged data. A strict data hierarchy is a compromise that makes machine processing easier without imposing too much of a burden on human reading. From the machine perspective, the use of a strict hierarchy makes it easy to create programming data structures from document content. From the human perspective, hierarchy is simply one of many possible organizational strategies. These two trade-offs have their roots in the human-machine duality. The human view is of a marked-up document; the machine view is of a tree of data.
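The two views can be illustrated with a minimal Python sketch. The business card markup and its element names (card, name, title, phone) are assumptions made up for this illustration, and the standard library parser stands in for any XML processor. The string is roughly what a human reads; the rows returned by outline() are the tree a machine works with.

```python
import xml.etree.ElementTree as ET

# Human view: a marked-up document (hypothetical element names).
CARD_XML = """
<card>
  <name>Jane Doe</name>
  <title>Product Manager</title>
  <phone>555-0100</phone>
</card>
""".strip()


def outline(element, depth=0):
    """Return (depth, tag, text) for every node, in document order."""
    text = (element.text or "").strip()
    rows = [(depth, element.tag, text)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


# Machine view: the same content as a tree of data.
card_root = ET.fromstring(CARD_XML)
card_rows = outline(card_root)
for depth, tag, text in card_rows:
    print("  " * depth + tag + (": " + text if text else ""))
```

Because the hierarchy is strict, a short recursive function is enough to walk the whole document.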

In addition to being convenient from the machine perspective, hierarchy is a powerful organizational strategy. It can represent both business concepts and programming data structures. Business process models, decision trees, and geographical models are examples of business concepts that fit the hierarchical model. Trees, linked lists, and tables are examples of programming data structures that can fit into a hierarchy.

Defining Content

To see how machine processing constrains the XML syntax, consider what a machine needs to know to understand a simple order for produce. A human reader may be able to decipher the implicit relationships among the words of an order for "100 10-pound bags carrots at $1 per pound." However, a machine would have great difficulty with the same task. The human and machine readability goal necessitates a standard means of explicitly declaring the relationships among the words. Metadata enables a document author to specify these relationships in the document. Without metadata, all the content of a document looks the same; it's just a jumble of individual words. Because the content is homogeneous, it is meaningless to a machine.

In XML, authors define content as a set of elements. An element is a self-contained unit of content with a description of what the content means. For example, you could separate the preceding document into chunks such as "Quantity = 100," "Size = 10," "Size Unit = pound," "Product = carrots," "Price Currency = $," "Price = 1," and "Pricing Unit = pound." Breaking the document into these elements would allow a human reader to interpret the order unambiguously and a machine to calculate the total price easily.
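As a rough sketch of why this helps a machine, the chunks listed above can be written as XML elements and the total computed directly. The tag names (quantity, size, size-unit, and so on) and the total_price helper are assumptions chosen for this illustration, not names defined by XML or by the text.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# The order "100 10-pound bags carrots at $1 per pound" as labeled elements.
# Tag names are hypothetical.
SIMPLE_ORDER_XML = """
<order>
  <quantity>100</quantity>
  <size>10</size>
  <size-unit>pound</size-unit>
  <product>carrots</product>
  <price-currency>$</price-currency>
  <price>1</price>
  <pricing-unit>pound</pricing-unit>
</order>
""".strip()


def total_price(order):
    """Multiply quantity by size by unit price, checking that units agree."""
    if order.findtext("size-unit") != order.findtext("pricing-unit"):
        raise ValueError("size unit and pricing unit differ")
    quantity = Decimal(order.findtext("quantity"))
    size = Decimal(order.findtext("size"))
    price = Decimal(order.findtext("price"))
    return quantity * size * price


simple_order = ET.fromstring(SIMPLE_ORDER_XML)
simple_total = total_price(simple_order)
print(simple_order.findtext("price-currency") + str(simple_total))
```

The calculation works only because each number carries a label; the same digits in an unlabeled sentence would give the program nothing to go on.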

Defining Structure

With very simple documents, breaking content into elements may be sufficient by itself to make the document easy to interpret. However, even a modest document can be confusing without some additional structure. Consider an expanded produce order such as, "100 10-pound bags carrots $1 per pound and 25 limes 50¢ per lime." With just a list of elements, how would you know which quantity and price applied to which product? How would you account for prices in different units? The solution is to structure the content so the relationships among elements are clear. As mentioned previously, XML uses a hierarchical structuring model. By organizing the expanded document into a hierarchy, you can clearly apply the correct quantity and price to each product, as shown in Example 2-2a.

Example 2-2a
 Order
  Line Item
   Quantity = 100
   Product = carrots
    Size = 10
    Size Unit = pound
   Price = 1
    Pricing Currency = dollars
    Pricing Unit = pound
  Line Item
   Quantity = 25
   Product = limes
    Size = 1
    Size Unit = each
   Price = 50
    Pricing Currency = cents
    Pricing Unit = each
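One possible XML encoding of this hierarchy, with a per-product total, is sketched below. The tag names, the use of name and amount attributes for the values of Product and Price, and the currency conversion table are all assumptions made for this illustration; the example in the text does not prescribe them.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical encoding of the two line items from Example 2-2a.
EXPANDED_ORDER_XML = """
<order>
  <line-item>
    <quantity>100</quantity>
    <product name="carrots">
      <size>10</size>
      <size-unit>pound</size-unit>
    </product>
    <price amount="1">
      <pricing-currency>dollars</pricing-currency>
      <pricing-unit>pound</pricing-unit>
    </price>
  </line-item>
  <line-item>
    <quantity>25</quantity>
    <product name="limes">
      <size>1</size>
      <size-unit>each</size-unit>
    </product>
    <price amount="50">
      <pricing-currency>cents</pricing-currency>
      <pricing-unit>each</pricing-unit>
    </price>
  </line-item>
</order>
""".strip()

# Assumed conversion so both prices can be reported in dollars.
DOLLARS_PER_UNIT = {"dollars": Decimal("1"), "cents": Decimal("0.01")}


def line_totals(order):
    """Return {product name: total in dollars} for each line item."""
    totals = {}
    for item in order.findall("line-item"):
        product = item.find("product")
        price = item.find("price")
        quantity = Decimal(item.findtext("quantity"))
        size = Decimal(product.findtext("size"))
        amount = Decimal(price.get("amount"))
        rate = DOLLARS_PER_UNIT[price.findtext("pricing-currency")]
        totals[product.get("name")] = quantity * size * amount * rate
    return totals


order_totals = line_totals(ET.fromstring(EXPANDED_ORDER_XML))
for name, total in order_totals.items():
    print(name, total)
```

Each quantity and price is found by navigating from its own line item, so the values for carrots and limes cannot be confused.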

Unfortunately, a generic hierarchy is not specific enough to provide the shared context described in Chapter 1. Clearly, a produce order and a patient record follow different rules. XML DTDs provide a means for specifying the rules that the hierarchy in a particular document must follow. These rules include the allowable element types at each level of the hierarchy. An abstract description of the rules governing the hierarchy for our example produce order document might look like Example 2-2b.

Example 2-2b
 Top-level elements = exactly 1 "Order"
  Sub-element of "Order" is 1 or more "Line Item"
   Sub-element of "Line Item" is exactly 1 "Product"
    Sub-element of "Product" is exactly 1 "Size"
    Sub-element of "Product" is exactly 1 "Size Unit"
   Sub-element of "Line Item" is exactly 1 "Price"
    Sub-element of "Price" is exactly 1 "Price Currency"
    Sub-element of "Price" is exactly 1 "Price Unit"
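To make the idea of hierarchy rules concrete, the following Python sketch transcribes the rules of Example 2-2b into a table and checks a document against them. This is not DTD syntax and not how a validating parser is implemented; the RULES table, the tag names, and the violations function are assumptions for illustration, and the check covers only the occurrence counts listed in the example.

```python
import xml.etree.ElementTree as ET

# parent tag -> {child tag: (minimum, maximum)}; None means no upper bound.
# Transcribed from Example 2-2b with hypothetical tag names.
RULES = {
    "order": {"line-item": (1, None)},
    "line-item": {"product": (1, 1), "price": (1, 1)},
    "product": {"size": (1, 1), "size-unit": (1, 1)},
    "price": {"price-currency": (1, 1), "price-unit": (1, 1)},
}


def violations(root):
    """Return a list of messages, one per broken occurrence rule."""
    problems = []
    if root.tag != "order":
        problems.append(f"top-level element must be order, found {root.tag}")
    stack = [root]
    while stack:
        element = stack.pop()
        for child_tag, (low, high) in RULES.get(element.tag, {}).items():
            count = len(element.findall(child_tag))
            if count < low or (high is not None and count > high):
                problems.append(f"{element.tag}: {child_tag} occurs {count} time(s)")
        stack.extend(element)  # visit the children next
    return problems


VALID_ORDER = (
    "<order><line-item>"
    "<product><size>10</size><size-unit>pound</size-unit></product>"
    "<price><price-currency>dollars</price-currency>"
    "<price-unit>pound</price-unit></price>"
    "</line-item></order>"
)
# Missing the price element and the size-unit element.
INVALID_ORDER = "<order><line-item><product><size>10</size></product></line-item></order>"

valid_problems = violations(ET.fromstring(VALID_ORDER))
invalid_problems = violations(ET.fromstring(INVALID_ORDER))
print(valid_problems)
print(invalid_problems)
```

A patient record would use the same checking code with a different rule table, which is the sense in which the rules, not the generic hierarchy, supply the shared context.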

Separation of Content from Relationships

Although hierarchical relationships among elements within a document are important for many applications, associative relationships among elements in different documents are also important. HTML links are an example of these associative relationships. As the success of the Web illustrates, there is often informational value in these associative relationships as well as document content. For example, a set of five Product, five Order, and five Customer documents is much more useful if you know which Orders relate to which Products and which Customers. Unfortunately, HTML embeds these important relationships within the document structure, making them difficult to maintain. The results are broken links and frustrating "HTTP 404 - Object Not Found" errors.

In the XML paradigm, document content and document relationships can be separated. In fact, the XML specification ignores relationships. A different standard specifies the appropriate syntax. This syntax enables relationships to exist outside the documents they connect, making it possible to maintain the integrity of relationships and even to introduce new relationships without affecting the referenced documents. This independence theoretically allows authors to create valuable information by relating documents in sophisticated ways without ever changing the content of the related documents.
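The idea of relationships that live outside the documents can be sketched in a few lines of Python. This is not the syntax of the linking standard the text alludes to; the Link class, the document identifiers, and the role names are all hypothetical and serve only to show the separation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A relationship stored separately from the documents it connects."""
    source: str
    target: str
    role: str


# Document content; none of these strings mentions any other document.
DOCUMENTS = {
    "order-1": "<order><quantity>100</quantity></order>",
    "product-7": "<product><name>carrots</name></product>",
    "customer-3": "<customer><name>Acme Grocers</name></customer>",
}

# Relationships, kept in their own collection.
LINKS = [
    Link("order-1", "product-7", "orders-product"),
    Link("order-1", "customer-3", "placed-by"),
]


def related(doc_id, links, role=None):
    """Return the targets linked from doc_id, optionally filtered by role."""
    return [l.target for l in links
            if l.source == doc_id and (role is None or l.role == role)]


def dangling(links, documents):
    """Return links whose source or target document no longer exists."""
    return [l for l in links
            if l.source not in documents or l.target not in documents]


print(related("order-1", LINKS))
print(dangling(LINKS, DOCUMENTS))
```

Adding a new relationship means appending to LINKS, and checking integrity means scanning LINKS; neither touches the stored documents.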

Separation of Structure from Presentation

Another drawback of HTML is the primacy of presentation over structure. For example, to distinguish sections of a document, many authors have forsaken heading specifiers such as <H1> for direct control over font attributes such as <FONT SIZE=16>. This approach results in two problems. First, all content looks exactly the same. Second, readers are limited to the presentation imposed by the author.

In XML, document structure and document presentation can be separated. As with relationships, the XML specification ignores presentation, relying on another standard. For a given XML document, authors may suggest a presentation using this other standard, but readers are free to use whatever presentations they choose. Different readers may use different presentations, and the same reader may use different presentations under different circumstances. Another advantage of this approach is that theoretically it enables a clear separation of responsibility between the author of structured information and the designer of the presentation layout.
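A minimal sketch of this separation: one structured document and two independent presentations of it. The report markup and both rendering functions are assumptions for illustration, and they do not use the presentation standard the text refers to.

```python
import xml.etree.ElementTree as ET
from html import escape

# Structure only: the document says what each part is, not how it looks.
REPORT_XML = (
    "<report>"
    "<heading>Produce Orders</heading>"
    "<para>Carrots ship Monday.</para>"
    "</report>"
)


def render_plain(root):
    """Presentation 1: plain text, with headings in capitals."""
    lines = []
    for node in root:
        lines.append(node.text.upper() if node.tag == "heading" else node.text)
    return "\n".join(lines)


def render_html(root):
    """Presentation 2: HTML, mapping each structural tag to an HTML tag."""
    html_tags = {"heading": "h1", "para": "p"}
    parts = []
    for node in root:
        tag = html_tags[node.tag]
        parts.append(f"<{tag}>{escape(node.text)}</{tag}>")
    return "".join(parts)


report_root = ET.fromstring(REPORT_XML)
plain_view = render_plain(report_root)
html_view = render_html(report_root)
print(plain_view)
print(html_view)
```

The author maintains only REPORT_XML; each reader, or each layout designer, can supply a different rendering function without changing it.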



XML: A Manager's Guide (2nd Edition) (Addison-Wesley Information Technology Series)
ISBN: 0201770067
Year: 2002
Pages: 75
Authors: Kevin Dick