Extensible Markup Language (XML)

Requirements Analysis: From Business Views to Architecture
By David C. Hay
Appendix B.  A Comparison of Data Modeling Techniques
(Syntactic Conventions)

The last technique presented here isn't really a data-modeling language at all. Rather, it is a way of representing data structure in text, using specially defined "tags" or labels to describe the structure of that text. The data being described could be either from an entity-type/relationship model or from a database design.

The Extensible Markup Language (XML) is similar to the Hypertext Markup Language (HTML) that is used to describe pages for the World Wide Web. Both derive from something called "Standard Generalized Markup Language", or SGML: XML is a simplified subset of SGML, and HTML is an application of it. SGML is a sophisticated tag language, which, "due to [its] complexity, and the complexity of the tools required," as the Object Management Group has so delicately put it, "has not achieved widespread uptake" [OMG, 1997].

In each case, a set of "tags" is inserted into a body of text. In the case of HTML, the tags are predefined to be interpreted by a standard piece of software called a browser. The browser uses the tags to determine how various parts of the document should be displayed.

XML, on the other hand, allows tags to be defined by users and is not concerned with display at all. Rather, the tags can be defined to describe a data structure, and data can be transmitted over the Internet in that structure.

Because tags are defined by users, no existing software will automatically understand them. Software can read the definitions of the tags and ensure that data transmitted using them conform to those definitions, but it cannot interpret the structure any further unless it is specifically written to do so.
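
As a minimal sketch of what "specifically written" software looks like, the Python fragment below parses a small purchase order with the standard library module xml.etree.ElementTree and computes an order total. The parser delivers only nested elements and text. The decision that <quantity> and <price> are numbers to be multiplied is an assumption built into the code, and the function name order_total is hypothetical.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# A cut-down purchase order; the tag names are user-defined, not known to the parser.
DOCUMENT = """
<PURCHASE_ORDER>
  <LINE_ITEM><quantity> 12 </quantity><price> 64.75 </price></LINE_ITEM>
  <LINE_ITEM><quantity> 2 </quantity><price> 10.00 </price></LINE_ITEM>
</PURCHASE_ORDER>
"""

def order_total(xml_text):
    """Sum quantity * price over every LINE_ITEM (our interpretation, not XML's)."""
    root = ET.fromstring(xml_text)
    total = Decimal("0")
    for item in root.findall("LINE_ITEM"):
        # Element text arrives as a plain string, padding included.
        quantity = Decimal(item.findtext("quantity").strip())
        price = Decimal(item.findtext("price").strip())
        total += quantity * price
    return total

print(order_total(DOCUMENT))
```

Nothing in the document itself says that the two values should be multiplied; a different community could attach a different meaning to the same tags.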

This means that XML is most useful within a community that agrees on the semantics of a common set of tags for its purposes. For example, the chemical industry has set up an XML-based Chemical Markup Language, and astronomers, mathematicians and the like have similarly defined sets of tags for describing things in their respective fields.

What Is It?

Figure B.9 shows an example of XML used to describe a data record that might be presented in a document.

Figure B.9 An XML Document.
    <?xml version="1.0"?>
    <!-- **** Purchasing **** -->
    <PURCHASE_ORDER>
       <ISSUED_TO_PARTY>
          <party_id>  234553  </party_id>
          <name>  Acme Sporting Goods  </name>
          <party_type>  Organization  </party_type>
          <surname></surname>
          <corporate_mission>  Get America moving  </corporate_mission>
       </ISSUED_TO_PARTY>
       <po_number>  743453  </po_number>
       <order_date>  12 November, 1999  </order_date>
       <LINE_ITEM>
          <line_number>  1  </line_number>
          <quantity>  12  </quantity>
          <price>  64.75  </price>
          <product_service_indicator>  product  </product_service_indicator>
          <PRODUCT>
             <product_code>  X-23  </product_code>
             <description>  Nike sneakers  </description>
             <unit_price>  75.00  </unit_price>
          </PRODUCT>
       </LINE_ITEM>
       <LINE_ITEM>
          <line_number>  2  </line_number>
          <quantity>  12  </quantity>
          <price>  64.75  </price>
          <product_service_indicator>  service  </product_service_indicator>
          <SERVICE>
             <service_id>  x-87  </service_id>
             <description>  Walking the dog  </description>
             <rate_per_hour>  12.00  </rate_per_hour>
          </SERVICE>
       </LINE_ITEM>
       <LINE_ITEM/>
    </PURCHASE_ORDER>

Note a few interesting things about this example.

First, as with HTML, each tag is enclosed in angle brackets (< and >) and is usually followed by text. The text is in turn followed by an end tag, in the form </...>. An element may have no content, in which case either the end tag immediately follows the start tag (as in <surname></surname>), or the tag itself ends with a forward slash (as in <LINE_ITEM/>). Unlike in HTML, however, the end tag is always required in one of those two forms.
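
The two ways of writing an empty element, and the rule that an end tag is always required, can be checked with a short Python sketch using the standard library's xml.etree.ElementTree. The helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

# Both spellings of an empty element produce an element with no text and no children.
long_form = ET.fromstring("<surname></surname>")
short_form = ET.fromstring("<surname/>")
print(long_form.text, len(long_form))
print(short_form.text, len(short_form))

def is_well_formed(text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# An element with its end tag parses; the same element without one does not.
print(is_well_formed("<name>Acme Sporting Goods</name>"))
print(is_well_formed("<name>Acme Sporting Goods"))
```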

A second thing to note is that, in this case, the tag <PURCHASE_ORDER> is followed by a set of related tags describing characteristics of <PURCHASE_ORDER> (columns and relationships from data models, in this case). Here the tag <PURCHASE_ORDER> has been defined such that it must be followed by exactly one tag for <ISSUED_TO_PARTY>, one for <po_number>, and so forth. You can't see this from the example, but the tag <corporate_mission> is optional. The tag <LINE_ITEM> is also optional, and it may occur any number of times.

Although it is optional, all XML documents should begin with <?xml version="1.0"?> (or whatever version number is appropriate). Note that the name xml in this declaration is written in lower case.

Note that the structure is hierarchical, so that an element can be under only one other element, and there can be only one hierarchy in a document. In the example, therefore, party was defined only as <ISSUED_TO_PARTY> under <PURCHASE_ORDER>. If it were related to something else in the model, the description would have to be repeated.

Comments are in the form <!-- . . . -->. Note that the double hyphens are part of the comment delimiters, and that a double hyphen may not appear within the comment text itself. Note also that, unlike HTML, XML lets you use a comment to surround lines of code that you want to disable.
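
As a small illustration of disabling a line by commenting it out, the following Python sketch parses a fragment in which <order_date> is wrapped in a comment. It assumes the default behavior of the standard library's xml.etree.ElementTree parser, which discards comments when building the tree.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<PURCHASE_ORDER>
  <po_number>743453</po_number>
  <!-- <order_date>12 November, 1999</order_date> -->
</PURCHASE_ORDER>"""

root = ET.fromstring(DOCUMENT)

# Only the live element is visible; the commented-out one is not part of the tree.
child_tags = [child.tag for child in root]
print(child_tags)
```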

The meaning of a tag is defined in a document type definition (DTD). This is a body of code that defines tags through a set of elements. It is the DTD that allows you to specify a data structure. While an XML document contains data, the DTD contains the model of those data.

It is the DTD that is analogous to the modeling techniques we have seen in this appendix.

Entity Types and Attributes

The DTD for the above example is shown in Figure B.10.

Figure B.10 An XML Document Type Definition.
    <!DOCTYPE PURCHASE_ORDER [
       <!ELEMENT PURCHASE_ORDER (ISSUED_TO_PARTY, po_number, order_date, LINE_ITEM*)>
          <!ELEMENT ISSUED_TO_PARTY (party_id, name, party_type, surname?, corporate_mission?)>
             <!ELEMENT party_id (#PCDATA)>
             <!ELEMENT name (#PCDATA)>
             <!ELEMENT party_type (#PCDATA)>
             <!ELEMENT surname (#PCDATA)>
             <!ELEMENT corporate_mission (#PCDATA)>
          <!ELEMENT po_number (#PCDATA)>
          <!ELEMENT order_date (#PCDATA)>
          <!ELEMENT LINE_ITEM (line_number, quantity, price, product_service_indicator, PRODUCT?, SERVICE?)>
             <!ELEMENT line_number (#PCDATA)>
             <!ELEMENT quantity (#PCDATA)>
             <!ELEMENT price (#PCDATA)>
             <!ELEMENT product_service_indicator (#PCDATA)>
             <!ELEMENT PRODUCT (product_code, description, unit_price)>
                <!ELEMENT product_code (#PCDATA)>
                <!ELEMENT description (#PCDATA)>
                <!ELEMENT unit_price (#PCDATA)>
             <!ELEMENT SERVICE (service_id, description, rate_per_hour)>
                <!ELEMENT service_id (#PCDATA)>
                <!-- description is declared once, under PRODUCT; an element type may be declared only once -->
                <!ELEMENT rate_per_hour (#PCDATA)>
    ]>

The DTD for an XML document can be either part of the document or in an external file. If it is external, the DOCTYPE statement still occurs in the document, with the argument "SYSTEM -filename-", where "-filename-" is the name of the file containing the DTD. For example, if the above DTD were in an external file called "xxx.dtd", the DOCTYPE statement would read:

    <!DOCTYPE PURCHASE_ORDER SYSTEM "xxx.dtd">

The file xxx.dtd itself would then contain only the ELEMENT declarations, without the enclosing DOCTYPE statement.

Note that the name specified in the DOCTYPE statement must be the same as the name of the highest-level ELEMENT.

Each element in the specification refers to a piece of information. An XML element is defined in terms of one or more predicates, where a predicate is simply a piece of information about an element. This may be either an attribute or an entity type in your data model. In the example above, <PURCHASE_ORDER> has as predicates <ISSUED_TO_PARTY>, <po_number>, <order_date>, and <LINE_ITEM>. <ISSUED_TO_PARTY> and <LINE_ITEM> are relationships to the parent entity type in the data model that this was based on. <po_number> and <order_date> are attributes from that model.
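
The distinction between the two kinds of predicate can be approximated in code. The Python sketch below is only a heuristic resting on an assumption: a child that contains nested elements is treated as relationship-like, and a child that holds only text as attribute-like. XML itself makes no such distinction, and an empty element would be counted as attribute-like. The function name classify_predicates is hypothetical.

```python
import xml.etree.ElementTree as ET

# A cut-down purchase order using the element names from the example.
DOCUMENT = """<PURCHASE_ORDER>
  <ISSUED_TO_PARTY><party_id>234553</party_id></ISSUED_TO_PARTY>
  <po_number>743453</po_number>
  <order_date>12 November, 1999</order_date>
  <LINE_ITEM><line_number>1</line_number></LINE_ITEM>
</PURCHASE_ORDER>"""

def classify_predicates(element):
    """Split child tags into those with nested elements and those holding only text."""
    nested, leaves = [], []
    for child in element:
        # len(child) is the number of direct child elements.
        (nested if len(child) > 0 else leaves).append(child.tag)
    return nested, leaves

nested, leaves = classify_predicates(ET.fromstring(DOCUMENT))
print("relationship-like:", nested)
print("attribute-like:", leaves)
```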

Cardinality/Optionality

Relationships are represented by the attachment of predicates to elements. In the absence of any special characters, this means that there must be exactly one occurrence of each predicate for each occurrence of the parent element. If the predicate is followed by a "?", then the predicate is not required, but it may occur at most once. If it is followed by a "*", it is not required, but if it occurs, it may have more than one occurrence. If it is followed by a "+", at least one occurrence is required, and it may have more than one.

In the example in Figure B.10, each <PURCHASE_ORDER> must have an <ISSUED_TO_PARTY>, a <po_number> and an <order_date>. In addition, a <PURCHASE_ORDER> may or may not have any <LINE_ITEM>s, but it could have more than one.
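
The occurrence symbols behave much like the quantifiers of a regular expression, which suggests one way to sketch a cardinality check in Python. This is not how a validating XML parser is implemented. It is a simplified illustration that handles only a flat, ordered list of predicates, and the names PURCHASE_ORDER_MODEL, model_to_regex and children_match are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Content model of PURCHASE_ORDER from Figure B.10 as (name, occurrence) pairs.
# "" = exactly one, "?" = zero or one, "*" = zero or more, "+" = one or more.
PURCHASE_ORDER_MODEL = [
    ("ISSUED_TO_PARTY", ""),
    ("po_number", ""),
    ("order_date", ""),
    ("LINE_ITEM", "*"),
]

def model_to_regex(model):
    """Build a regex over a sequence of child tag names, each followed by a space."""
    parts = ["(?:%s )%s" % (re.escape(name), occurrence) for name, occurrence in model]
    return re.compile("".join(parts))

def children_match(element, model):
    """Check that the element's children occur in the order and number the model allows."""
    sequence = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(sequence) is not None

valid = ET.fromstring(
    "<PURCHASE_ORDER><ISSUED_TO_PARTY/><po_number/><order_date/>"
    "<LINE_ITEM/><LINE_ITEM/></PURCHASE_ORDER>"
)
missing_number = ET.fromstring(
    "<PURCHASE_ORDER><ISSUED_TO_PARTY/><order_date/></PURCHASE_ORDER>"
)
print(children_match(valid, PURCHASE_ORDER_MODEL))
print(children_match(missing_number, PURCHASE_ORDER_MODEL))
```

Only the children of one element are checked here; a fuller check would apply the corresponding model to each child in turn.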

Each predicate is then itself an element, defined in turn by the predicates that follow it. At the bottom of the tree in each case, "#PCDATA" (parsed character data) means that the element will contain text, which can be parsed by the software reading the document.

Names

Names in XML may not have spaces. XML is case sensitive. The keywords used in DTD declarations (such as DOCTYPE and ELEMENT) are in all upper case. The case of a tag name in an element definition must be the same as was used where the element appeared as a predicate, and the case of an element used in an XML document must be the same as in its DTD definition.
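
A brief Python sketch, using the standard library's xml.etree.ElementTree, shows what case sensitivity means in practice: <name> and <Name> are different elements, and an end tag whose case differs from its start tag is rejected. The <PARTY> wrapper is a made-up element used only for this illustration.

```python
import xml.etree.ElementTree as ET

# <name> and <Name> are two distinct element names.
root = ET.fromstring("<PARTY><name>Acme</name><Name>Other</Name></PARTY>")
tags = [child.tag for child in root]
print(tags)
print(root.findtext("name"))   # matches only the lower-case element
print(root.find("NAME"))       # no element is spelled this way

# An end tag must repeat the start tag exactly, including case.
try:
    ET.fromstring("<name>Acme</Name>")
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True
print(mismatch_rejected)
```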

Note that there is nothing in XML to prevent you from specifying multivalued attributes, but in the interest of coherence for the data structure, following the rules of normalization is strongly recommended. By convention in the above example, elements that would be entity types in an entity/relationship model appear in upper case. Elements that would appear in that model as attributes are in lower case. Your naming conventions may be different.

Unique Identifiers

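With element declarations of the kind used here, XML has no way to recognize unique identifiers. (A DTD can declare an XML attribute of type ID, whose values must be unique within a document, but that mechanism applies to attributes written inside tags, which this example does not use.)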
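
In practice, uniqueness has to be checked by the software that receives the document. The Python sketch below assumes, purely for illustration, that <line_number> is meant to identify a <LINE_ITEM> within its purchase order; the source model does not say so, and the function name duplicate_values is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

DOCUMENT = """<PURCHASE_ORDER>
  <LINE_ITEM><line_number> 1 </line_number></LINE_ITEM>
  <LINE_ITEM><line_number> 2 </line_number></LINE_ITEM>
  <LINE_ITEM><line_number> 2 </line_number></LINE_ITEM>
</PURCHASE_ORDER>"""

def duplicate_values(parent, child_tag, key_tag):
    """Return the key values that occur more than once among the parent's children."""
    values = [
        (item.findtext(key_tag) or "").strip()
        for item in parent.findall(child_tag)
    ]
    return sorted(value for value, count in Counter(values).items() if count > 1)

duplicates = duplicate_values(ET.fromstring(DOCUMENT), "LINE_ITEM", "line_number")
print(duplicates)
```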

Sub-types

XML has no way to recognize sub-types and super-types. Note, in the example above, that the attributes of <ISSUED_TO_PARTY> had to include both attributes of person and attributes of organization from our other models. Similarly, <LINE_ITEM> had to allow for both <PRODUCT> and <SERVICE>, and the attribute <product_service_indicator> was included in <LINE_ITEM> to determine which case was involved. In the same way, <party_type> determined which kind of <ISSUED_TO_PARTY> a record referred to. Software would be required to enforce this.
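
A sketch of the kind of software check this implies is given below in Python. It assumes that the indicator values are exactly "product" and "service", as in Figure B.9, and that a line item should contain the matching element and not the other one. The function name line_item_problems and the wording of the messages are hypothetical.

```python
import xml.etree.ElementTree as ET

def line_item_problems(line_item):
    """List the ways a LINE_ITEM disagrees with its product_service_indicator."""
    kind = (line_item.findtext("product_service_indicator") or "").strip()
    has_product = line_item.find("PRODUCT") is not None
    has_service = line_item.find("SERVICE") is not None
    problems = []
    if kind == "product":
        if not has_product:
            problems.append("product line item lacks PRODUCT")
        if has_service:
            problems.append("product line item contains SERVICE")
    elif kind == "service":
        if not has_service:
            problems.append("service line item lacks SERVICE")
        if has_product:
            problems.append("service line item contains PRODUCT")
    else:
        problems.append("unknown product_service_indicator: %r" % kind)
    return problems

consistent = ET.fromstring(
    "<LINE_ITEM><product_service_indicator> product </product_service_indicator>"
    "<PRODUCT/></LINE_ITEM>"
)
inconsistent = ET.fromstring(
    "<LINE_ITEM><product_service_indicator> service </product_service_indicator>"
    "<PRODUCT/></LINE_ITEM>"
)
print(line_item_problems(consistent))
print(line_item_problems(inconsistent))
```

A similar function could compare <party_type> with the presence of <surname> or <corporate_mission> in <ISSUED_TO_PARTY>.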

Constraints between Relationships

XML has no way to describe constraints between relationships.

Comments

As noted above, XML isn't really a data-modeling language. It is not very sophisticated in its ability to represent the finer points of data structure. It shares the limitations of a relational database, for example, with no ability to recognize sub-types or constraints. It is being recognized, however, as a very powerful way to describe the essence of data structures for use as a template for transmitting data from one place to another.

While the tag structure does seem to be a good vehicle for describing and communicating database structure, the requirement for discipline in the way we organize data is more present than ever. XML doesn't care if we have repeating groups, monstrous data structures, or whatever. If we are to use XML to express a data structure, it is incumbent upon us to do as good a job with the tool as we can. (This is, of course, true of any modeling technique.)

In recognizing that XML is a good vehicle for describing database structure, the most obvious issue is that this will put greater responsibility on data administrators to define data correctly. XML will not do that. XML will only record whatever data design (good or bad) human beings come up with.

As Clive Finkelstein has said, the advent of XML is going to make data modelers and designers even more important than they are now. "After fifteen years of obscurity, data modelers can finally become overnight successes" [Finkelstein, 1999].


Requirements Analysis: From Business Views to Architecture
ISBN: 0132762005
EAN: 2147483647
Year: 2001
Pages: 129
Authors: David C. Hay
