Schemas and Namespaces | Semantics in Business Systems: The Savvy Managers Guide (The Savvy Managers Guides)

XML has two concepts that help bridge the gap between syntax and semantics: schemas and namespaces. A schema is a document that describes the tags used in the XML document. Schemas have two main functions: descriptive and prescriptive. If we have an XML document and would like to get some more clues as to what the tags mean, we use a schema for its descriptive value. If we are building a new XML document and need to know which tags can validly follow which other tags at any point in the document, we use a schema for its prescriptive value.

Many Standards for Schemas

The first standard for XML schema was document-type definition (DTD), which was borrowed pretty much intact from SGML. Subsequent to that, dozens of schema standards were proposed. Some of the more notable include the following:

XDR—XML data and XML data reduced were originated by Microsoft.
SOX—Schema for object oriented XML was promoted by CommerceOne, primarily to aid with its business-to-business (B2B) initiatives.
DCD—Document content description was an XDR/resource description framework (RDF) hybrid.
DDML—Document definition markup language focused on the logical structure as distinct from the physical structure.
RELAX and RELAX NG—Regular language description for XML was promoted by Murata Makoto and James Clark through OASIS. RELAX uses patterns to define things such as cardinality constraints in the schema.

Although these and many other efforts essentially produced prototypes, the two enduring schema standards are DTD and XSD:

DTD—This is still widespread, but it is essentially a legacy standard. There are still many tools that support only DTD, and there are many DTD-defined documents out there, but most new work has moved on to XSD.
XSD—XML schema definition language was heavily promoted by a consortium of vendors (including Microsoft, IBM, Sun, and Oracle) and is the current standard for XML schemas.

For the remainder of this section, we'll discuss the differences between DTD and XSD in the context of description and prescription.

DTD

XML stores its structure and validation rules in its schema. Originally the schema was stored in a document-type definition (DTD) file, in the same manner as SGML. DTD uses a different syntax from XML itself (its declarations are not XML elements), so a DTD schema might look like Figure 11.11.

    <!DOCTYPE sculpture [
    <!ELEMENT sculpture (location, head, body, offerprice)>
    <!-- the location is the relative location on path -->
    <!ELEMENT location (#PCDATA)>
    <!-- all sculptures must have right and left eyes and mouth -->
    <!ELEMENT head (lefteye, righteye, mouth)>
    <!-- sculptures can have any number of legs including zero (like the snake) -->
    <!ELEMENT body (leg*)>
    <!ELEMENT offerprice (#PCDATA)>
    ]>

Figure 11.11: DTD schema. PCDATA is not "politically correct data"; it is "parsed character data." In other words, data in this position is expected to be ASCII or Unicode text that the parser will interpret, including any markup or entity references it contains. CDATA is character data that the parser is not meant to interpret as markup (e.g., a code fragment containing < or & characters).
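To make the prescriptive role of Figure 11.11 concrete, the following is a minimal sketch in Python. It does not parse the DTD; it assumes the content models have been hand-translated into regular expressions over child tag names, and the names CONTENT_MODELS and check are hypothetical helpers introduced only for illustration. A real project would use a validating parser instead.

```python
import re
import xml.etree.ElementTree as ET

# Content models from Figure 11.11, hand-translated into regular expressions
# over a space-terminated list of child tag names. This table is an
# illustrative assumption, not output from a DTD parser.
CONTENT_MODELS = {
    "sculpture": r"location head body offerprice ",
    "head": r"lefteye righteye mouth ",
    "body": r"(leg )*",
}

def check(element):
    """Return the tags of elements whose children break their content model."""
    problems = []
    model = CONTENT_MODELS.get(element.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in element)
        if re.fullmatch(model, children) is None:
            problems.append(element.tag)
    for child in element:
        problems.extend(check(child))
    return problems

snake = ET.fromstring(
    "<sculpture><location>3</location>"
    "<head><lefteye/><righteye/><mouth/></head>"
    "<body/><offerprice>100</offerprice></sculpture>"
)
cyclops = ET.fromstring(
    "<sculpture><location>4</location>"
    "<head><lefteye/><mouth/></head>"
    "<body><leg/><leg/></body><offerprice>250</offerprice></sculpture>"
)

print(check(snake))    # the snake has no legs, which (leg*) allows
print(check(cyclops))  # the head is missing its righteye
```

The snake document passes because leg* permits zero legs, while the second document is rejected at the head element, which is the kind of rule a schema enforces when a new document is being built.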

XSD

XSD has made three major breaks from DTD:

The schema is now expressed in XML. As the portion of an XSD schema in Figure 11.12 shows, the schema is written in XML itself rather than in a special language. The first benefit is ease of working with the schema, because you use the same tools that you use to work with any XML document. More interestingly, schemas can now refer to other schemas.

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
        elementFormDefault="qualified" attributeFormDefault="unqualified">
      <xs:element name="sculpture">
        <xs:annotation>
          <xs:documentation> </xs:documentation>
        </xs:annotation>
      </xs:element>
      <xs:element name="Location"/>
      <xs:group name="Head">
        <xs:sequence>
          <xs:element name="LeftEye"/>
          <xs:element name="RightEye"/>
          <xs:element name="Mouth"/>
        </xs:sequence>
      </xs:group>
      <xs:group name="Body">
        <xs:sequence>
          <xs:element name="Leg"/>
        </xs:sequence>
      </xs:group>
      <xs:element name="OfferPrice" type="xs:decimal"/>
    </xs:schema>

Figure 11.12: XSD schema.

XSD has much stronger data typing. Many more data types are available and can be used in the XML document creation process, such as the decimal type shown in the next-to-last line of Figure 11.12.
XSD supports namespaces. As we'll discuss in more detail in the next section, the use of namespaces in XSD allows us to overcome a number of potential ambiguities.
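Because an XSD schema is itself XML, an ordinary XML parser can read it. The sketch below assumes a trimmed copy of the schema in Figure 11.12 (the annotation and form-default attributes are omitted) and uses Python's standard xml.etree.ElementTree module to list each declared element with its type. It only reads the schema; it does not validate documents against it.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# A trimmed copy of the schema in Figure 11.12 (an assumption for brevity).
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="sculpture"/>
  <xs:element name="Location"/>
  <xs:group name="Head">
    <xs:sequence>
      <xs:element name="LeftEye"/>
      <xs:element name="RightEye"/>
      <xs:element name="Mouth"/>
    </xs:sequence>
  </xs:group>
  <xs:group name="Body">
    <xs:sequence>
      <xs:element name="Leg"/>
    </xs:sequence>
  </xs:group>
  <xs:element name="OfferPrice" type="xs:decimal"/>
</xs:schema>"""

root = ET.fromstring(SCHEMA)

# Walk every xs:element declaration, including those nested inside groups,
# and record its name together with its declared type (None if untyped).
declared = {
    el.get("name"): el.get("type")
    for el in root.iter("{%s}element" % XS)
}

for name, type_name in declared.items():
    print(name, type_name)
```

Only OfferPrice carries a type in this fragment, so it is the only element for which the loop prints something other than None in the second column.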

Semantic Scope and Namespace

Chapter 2 examined how words become overloaded when they are used in many different contexts. Imagine the problem we'd have with XML schemas if we had to invent a new word every time we had a "name clash" (i.e., we used a word for a different meaning than someone else did). This approach would not get us far.

Namespaces are a way to scope the tags. For example, the tag "sculpture" would be in the "Swetsville" namespace, and we wouldn't have to worry that our trading partners have used the tag "sculpture" in some more traditional sense. This prevents the problem of having to achieve universal agreement on all the terms we use. We only need to agree on the terms we share.
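As a rough illustration of this scoping, the sketch below puts two "sculpture" tags in one document under different namespaces and reads them with Python's standard xml.etree.ElementTree module. Both namespace URIs, the sw and art prefixes, and the catalog wrapper element are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs are hypothetical identifiers chosen for this sketch.
DOC = """
<catalog xmlns:sw="http://example.com/swetsville"
         xmlns:art="http://example.com/fine-art">
  <sw:sculpture>Snake</sw:sculpture>
  <art:sculpture>Bronze bust</art:sculpture>
</catalog>
"""

root = ET.fromstring(DOC)

# The parser expands each prefix, so the two tags have different full names
# even though both are written as "sculpture" in the document.
tags = [child.tag for child in root]
print(tags)

# Select only the sculptures that belong to the Swetsville namespace.
NS = {"sw": "http://example.com/swetsville"}
ours = [el.text for el in root.findall("sw:sculpture", NS)]
print(ours)
```

The two tags never collide because each is identified by its namespace plus its local name, so trading partners need to agree only on the namespaces they actually share.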