XML's Popularity

XML has gone from relative obscurity to virtual ubiquity in just a few years. By 2002 it was rare to hear of a software product, an industry consortium, or a large-scale integration effort that wasn't based on XML. Let's take a look at some of the features that have led to such rapid adoption, and set the stage for semantic markup.

Rendition Preserves Composition

The success of the relational model popularized the idea that data, in their native state, are tables. In all but the simplest cases, however, the data are likely to be complex graphs. A graph in this context is not line art or a chart; it is a structure in which the data items refer to one another through a network of relationships. The graph in Figure 11.3 is meant to represent part of the composition of the metal sculpture from the Swetsville Zoo shown in Figure 10.1. The sculpture has a location, an offer price, a body, and a head. The head in turn has several parts, including two eyes, each of which is a "gear."

Figure 11.3: Sculpture as a graph.

In the relational model, the graph is still there, but it is implemented as tables and foreign keys (Figure 11.4). To build the sculpture out of the components, as they reside in the tables, you have to "join" the appropriate rows in the tables, until you have a structure that resembles the graph or tree shown in Figure 11.3. Reducing a complex structure to its primary tables before storing it in the database has been likened to dismantling your car into its elemental pieces every time you drive it into your garage. This mismatch between how data is typically stored in a relational database and how it is typically used in a program is the primary source of contention between object-oriented programmers and database administrators.

Figure 11.4: Sculpture as tables.

The data in the tables is shown in Figure 11.5, and a query to put the tree back together again is shown in Figure 11.6.

Figure 11.5: Data in the sculpture tables.

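         -- Oracle hierarchical query; assumes each table has
         -- ItemID, ParentID, and Description columns.
         SELECT LPAD(' ', 2*LEVEL) || U.ItemID, U.Description
         FROM (SELECT ItemID, ParentID, Description FROM Sculpture
               UNION
               SELECT ItemID, ParentID, Description FROM Component) U
         START WITH U.ItemID = '27'
         CONNECT BY PRIOR U.ItemID = U.ParentID
         ORDER SIBLINGS BY U.ItemID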

Figure 11.6: A query to get the "tree" from the tables.
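
The reassembly step can also be sketched outside the database. The following is a minimal Python illustration, assuming the rows have already been fetched as (item_id, parent_id, description) tuples; the row values and the `build_outline` helper are hypothetical and are not the data shown in Figure 11.5.

```python
from collections import defaultdict

# Hypothetical rows (item_id, parent_id, description); illustrative only.
ROWS = [
    (27, None, "Sculpture"),
    (31, 27, "Head"),
    (32, 27, "Body"),
    (41, 31, "Left eye"),
    (42, 31, "Right eye"),
    (51, 32, "Leg"),
]


def build_outline(rows, root_id):
    """Return indented lines for the subtree rooted at root_id."""
    children = defaultdict(list)   # parent_id -> list of child item_ids
    descriptions = {}              # item_id -> description
    for item_id, parent_id, description in rows:
        descriptions[item_id] = description
        children[parent_id].append(item_id)

    lines = []

    def visit(item_id, level):
        # Indent two spaces per level, as LPAD does in the query.
        lines.append("  " * level + descriptions[item_id])
        for child_id in sorted(children[item_id]):
            visit(child_id, level + 1)

    visit(root_id, 0)
    return lines


outline = build_outline(ROWS, 27)
print("\n".join(outline))
```

Each row names its parent, which plays the role of the foreign key; following those references from the starting item is what the "join" amounts to.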

When a user wants to access a subset of data from the graph, he or she typically describes a tree of data that is arrived at from some starting point. On the left side of Figure 11.7 we have the graph of the sculpture as it normally exists. This would be the object-oriented instance representation. However, XML is primarily hierarchic (i.e., each node can have only a single parent), so it can't represent the graph easily or directly. Instead, in most usages the graph is converted to a tree, as shown in the right side of Figure 11.7. The only difference is that the "gear" is repeated in the two places it is referred to. (Note: You do not have to repeat the data every time it is needed; you can use XML's ID/IDREF features to refer to a subtree stored elsewhere.)

Figure 11.7: A tree from a graph.
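
As a rough sketch of that conversion, the Python below expands a small graph into a tree by copying any node that is reached along more than one path. The adjacency list and the `expand` helper are hypothetical, loosely modeled on the sculpture example, and the sketch assumes the graph has no cycles.

```python
import xml.etree.ElementTree as ET

# Hypothetical adjacency list: "gear" is shared by both eyes.
GRAPH = {
    "sculpture": ["head", "body"],
    "head": ["lefteye", "righteye"],
    "lefteye": ["gear"],
    "righteye": ["gear"],
    "body": [],
    "gear": [],
}


def expand(graph, node):
    """Build an element tree from node, duplicating shared nodes.

    Assumes the graph is acyclic; a cycle would recurse without end.
    """
    element = ET.Element(node)
    for child in graph[node]:
        element.append(expand(graph, child))
    return element


tree = expand(GRAPH, "sculpture")
gear_copies = len(tree.findall(".//gear"))  # one copy per referring parent
print(ET.tostring(tree, encoding="unicode"))
print(gear_copies)
```

The graph holds one "gear" node with two parents; the tree holds two "gear" elements, each with a single parent, which is the form XML can represent directly.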

XML preserves the tree structure. The structure is captured in an XML document as shown in Figure 11.8.

         <sculpture>
           <location> </location>
           <head>
             <lefteye>
               <MadeItem> Gear </MadeItem>
             </lefteye>
             <righteye>
               <MadeItem> Gear </MadeItem>
             </righteye>
           </head>
           <body>
             <leg> L1 </leg>
           </body>
           <offerPrice> 200 </offerPrice>
         </sculpture>

Figure 11.8: XML version of sculpture.
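
To illustrate that the tree survives the trip through text, here is a minimal Python sketch that parses the sculpture document with the standard library and walks the result. It assumes the markup of Figure 11.8 is written with ordinary angle brackets; the `outline` helper is a hypothetical name introduced for the example.

```python
import xml.etree.ElementTree as ET

# The sculpture document, written with ordinary angle brackets.
SCULPTURE_XML = """
<sculpture>
  <location> </location>
  <head>
    <lefteye><MadeItem> Gear </MadeItem></lefteye>
    <righteye><MadeItem> Gear </MadeItem></righteye>
  </head>
  <body>
    <leg> L1 </leg>
  </body>
  <offerPrice> 200 </offerPrice>
</sculpture>
"""

root = ET.fromstring(SCULPTURE_XML.strip())


def outline(element, level=0):
    """Return one indented line per element, with its text if any."""
    text = (element.text or "").strip()
    lines = ["  " * level + element.tag + (": " + text if text else "")]
    for child in element:
        lines.extend(outline(child, level + 1))
    return lines


lines = outline(root)
price = int(root.findtext("offerPrice"))           # surrounding spaces are ignored by int()
eyes = [child.tag for child in root.find("head")]  # children keep document order
print("\n".join(lines))
```

No join is needed: the nesting of the elements is the composition of the sculpture.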

Readable by Humans and Systems

One of the advantages of the XML syntax is that it is readable by humans, as well as systems, although it's not optimal for either. Further, what the human and the machine understand is not the same. As we discussed in Chapter 5, a completely unstructured document (e.g., a memo) is understandable only by humans, and a compiled program is understandable only by systems. Having one grammar that goes both ways is more powerful than it first sounds. It means you can build an application programming interface (API) that is meant for use by another program, and with very little adjustment allow a human to use it. This greatly rationalizes a system and ensures that the inputs from people and other systems are being subjected to the same edits (validation checks).

Document/Transaction Duality

Light exhibits wave/particle duality, which means that sometimes it behaves like a wave and sometimes it behaves like a particle. XML has what I call "document/message duality," meaning that sometimes it behaves like a document and sometimes it behaves like a message or transaction.

Strictly speaking, XML doesn't have any behavior at all; rather, we sometimes treat it one way and sometimes the other. What is interesting about this is that SGML and many other document markup languages were treated exclusively as documents, and the world of messages was pretty much in the domain of Electronic Data Interchange (EDI) and other non–self-describing approaches.

What is refreshing about XML is that it has this dual nature, which causes us to think a bit longer about the question, "What is the difference between a document and a message?" Indeed, we find that there really isn't any difference. Documents can be messages. Messages can be documents. Having one representation allows us to reuse interfaces, treat messages as documents, and treat documents as messages, as it is convenient.

Data Describes Itself

There was a pattern in the object-oriented patterns community called "data describes itself," which advised that a system was likely to be easier to maintain if the data and the definition of the data (at least some part of the schema) traveled together.

This is an interesting insight. The state of the art at the time for interapplication data communication was EDI, which relied on the sender and receiver "knowing" the precise layouts and sequences of all the record types in the communication. (Chapter 12 goes into this in more detail, as well as strategies for using XML and semantics in situations that were previously in the domain of EDI.)

With XML the structure and the labels (tags) for the values are transmitted with the data, in every invocation. This is inefficient, but the incremental cost of doubling or even tripling the size of each message packet is becoming a smaller and smaller penalty to bear.
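
The trade-off can be sketched in a few lines of Python. The fixed-width layout, the field names, and both reader functions below are hypothetical, chosen only to contrast a record whose layout must be agreed in advance with a message that carries its own labels; the size ratio it prints depends entirely on the tag names and payload chosen.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed-width record: both parties must already know that
# columns 0-9 hold the item name and columns 10-15 hold the price.
fixed_record = "Sculpture 000200"

# The same values with their labels transmitted alongside them.
xml_message = (
    "<sculpture><item>Sculpture</item>"
    "<offerPrice>200</offerPrice></sculpture>"
)


def read_fixed(record):
    """Decode using layout knowledge that is not in the record itself."""
    return {"item": record[0:10].strip(), "offerPrice": int(record[10:16])}


def read_xml(message):
    """Decode using only the tags that travel with the data."""
    root = ET.fromstring(message)
    return {child.tag: child.text for child in root}


print(read_fixed(fixed_record))
print(read_xml(xml_message))
print(len(xml_message) / len(fixed_record))  # overhead of the tags
```

If the sender adds or reorders a field, `read_fixed` silently misreads the record, whereas `read_xml` still finds each value by its tag. That robustness is what the extra bytes buy.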

"Standard Gauge"

Perhaps the most important thing about XML is that, like another great standard, it has become almost universally adopted (Figure 11.9). The standard gauge for railroads in the United States is 4 feet 8.5 inches (the Stephenson gauge). It is not superior in any significant way to any of the more than 120 gauges that have been developed and used commercially, or the 60 or so that are still in use in some applications or in some areas.

Figure 11.9: Standard gauge.

However, virtually all passenger and freight traffic in North America is carried on rails that are 4 feet 8.5 inches apart, because once a majority of the railroads conformed there was a great benefit to all of them conforming. Before the Civil War about half of all railroads used the Stephenson gauge. After the war a consortium (the Master Car Builders Association) committed to this standardization, and it became nearly universal.

The point is that even if XML were not vastly technically superior, it has been adopted enthusiastically by virtually every software manufacturer, and its use as a lingua franca is pretty well guaranteed.




Semantics in Business Systems: The Savvy Manager's Guide
ISBN: 1558609172
Year: 2005
Pages: 184
Authors: Dave McComb
