Namespaces and Schemas | Developing XML Solutions (DV-MPS General)


In Chapter 6, we looked at using namespaces for DTDs. Namespaces can be read and interpreted in well-formed XML documents. Unfortunately, DTDs are not written as well-formed XML, so a namespace used in a DTD cannot be resolved. Let's look at the following DTD as an example:

 <!DOCTYPE doc [
 <!ELEMENT doc (body)>
 <!ELEMENT body EMPTY>
 <!ATTLIST body bodyText CDATA #REQUIRED>
 <!ELEMENT HTML:body EMPTY>
 <!ATTLIST HTML:body HTML:bodyText CDATA #REQUIRED>
 ]>

A valid usage of this DTD is shown here:

 <doc><body bodyText="Hello, world"/></doc>

The following usage would be invalid, however, because the HTML:body element is not defined as a child element of the doc element:

 <doc><HTML:body bodyText="Hello, world"/></doc>

As far as the DTD is concerned, the HTML:body element and the body element are two completely different elements. A DTD cannot resolve a namespace and break a name into its prefix (HTML) and its local name (body), so the prefix and the name are simply treated as one word. We want to use namespaces and still be able to separate the prefix from the name. Schemas enable us to do this.
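To make the contrast concrete, here is a minimal sketch of what a namespace-aware parser does with a prefixed name. It uses Python's standard xml.etree.ElementTree module, which is not part of the original text, and the URI bound to the HTML prefix is a hypothetical placeholder chosen for illustration. The helper split_qualified_tag is likewise an illustrative name, not a library function.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the chapter never binds the HTML prefix to one.
HTML_NS = "http://www.example.com/HTML"

document = (
    f'<doc xmlns:HTML="{HTML_NS}">'
    '<HTML:body bodyText="Hello, world"/>'
    '</doc>'
)


def split_qualified_tag(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    # No namespace applies to this element.
    return None, tag


root = ET.fromstring(document)
child = root[0]

# The parser has already replaced the prefix with the namespace URI,
# so the local name "body" can be recovered on its own.
namespace_uri, local_name = split_qualified_tag(child.tag)
print(namespace_uri, local_name)
```

Unlike the DTD, the parser treats HTML:body as the name body in a particular namespace rather than as one opaque word.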

Including Schemas in the Same targetNamespace

We could write a similar schema with a namespace to identify schema elements. For example, let's create a schema named NorthwindMessage.xsd, as shown here:

 <schema targetNamespace="http://www.northwindtraders.com/Message"
     xmlns:northwindMessage="http://www.northwindtraders.com/Message"
     xmlns="http://www.w3.org/1999/XMLSchema">
     <include schemaLocation=
         "http://www.northwindtraders.com/HTMLMessage.xsd"/>
     <element name="doc">
         <group>
             <option>
                 <element ref="northwindMessage:body"/>
                 <element ref="northwindMessage:HTMLbody"/>
             </option>
         </group>
     </element>
     <element name="body">
         <attribute name="bodyText" type="northwindMessage:TextBody"/>
     </element>
 </schema>

NOTE
The schema namespace is not assigned a prefix, so it is the default namespace. All elements without a prefix belong to the schema namespace. When elements that are defined in this schema are referenced, they must be qualified with the prefix bound to the schema's target namespace, as was done with the body and HTMLbody elements (northwindMessage:body and northwindMessage:HTMLbody). In schemas, the body and HTMLbody names can be separated from their namespace prefixes and properly identified.

The included file, HTMLMessage.xsd, would look like this:

 <xsd:schema targetNamespace="http://www.northwindtraders.com/Message"
     xmlns:northwindMessage="http://www.northwindtraders.com/Message"
     xmlns:xsd="http://www.w3.org/1999/XMLSchema">
     <xsd:simpleType name="TextBody" base="xsd:string"
         minLength="0" maxLength="20"/>
     <xsd:element name="HTMLbody">
         <xsd:attribute name="bodyText" type="xsd:string"/>
     </xsd:element>
 </xsd:schema>

NOTE
In this case, we did assign a prefix to the schema namespace and used this prefix throughout the document. You can use either method, but keep in mind that defaults can sometimes be harder for people to interpret.

As you can see, namespaces play a major role in schemas. Let's look at the different elements included in this example. Both documents include a targetNamespace. A targetNamespace defines the namespace that this schema belongs to (http://www.northwindtraders.com/Message). Remember, a namespace uses a URI as a unique identifier, not for document retrieval (although an application could use the namespace to identify the associated schema). It will be up to the application to determine how the namespace is used. The include element has a schemaLocation attribute that can be used by an application or a person to identify where the schema is located.
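The point that a namespace URI is an identifier rather than an address can be illustrated with a small sketch. It uses Python's standard xml.etree.ElementTree module (an assumption of this example, not something the chapter uses), and the short prefix m is made up for illustration. Nothing is fetched from the URI; the parser only compares it as a string.

```python
import xml.etree.ElementTree as ET

MESSAGE_NS = "http://www.northwindtraders.com/Message"

# Three spellings of the same element: a long prefix, a short prefix
# (hypothetical), and the default namespace.
candidates = [
    f'<northwindMessage:doc xmlns:northwindMessage="{MESSAGE_NS}"/>',
    f'<m:doc xmlns:m="{MESSAGE_NS}"/>',
    f'<doc xmlns="{MESSAGE_NS}"/>',
]

# ElementTree stores each tag as '{namespace-uri}local-name', so the prefix
# that happened to be used in the source text no longer matters.
tags = [ET.fromstring(text).tag for text in candidates]
all_same = len(set(tags)) == 1
print(tags[0], all_same)
```

Because only the URI string is compared, the three documents name the same element even though no schema or other resource was retrieved.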

The HTMLMessage.xsd file is included in the NorthwindMessage.xsd by using the include element. For one schema to be included in another, both must belong to the same namespace. Using the include element will result in the included document being inserted into the schema in place of the include element. Once the insertion is complete, the schema should still be a well-formed XML document. A top-level schema can be built that includes many other schemas.

Notice that the simpleType TextBody is declared in the included document but used in the top-level document. This separation makes no difference, as both documents will be combined into one document by the processing application.
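The combining step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not how any particular schema processor works: the schema documents are held in an in-memory dictionary instead of being fetched from their schemaLocation, the schemas are trimmed to a few declarations, and resolve_includes is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/1999/XMLSchema"
MESSAGE_NS = "http://www.northwindtraders.com/Message"

# Assumption: included documents are looked up by name in this dictionary
# rather than retrieved over the network.
SCHEMA_DOCUMENTS = {
    "HTMLMessage.xsd": f'''<xsd:schema targetNamespace="{MESSAGE_NS}"
    xmlns:xsd="{XSD_NS}">
  <xsd:simpleType name="TextBody" base="xsd:string"/>
  <xsd:element name="HTMLbody"/>
</xsd:schema>''',
}

TOP_LEVEL = f'''<schema targetNamespace="{MESSAGE_NS}" xmlns="{XSD_NS}">
  <include schemaLocation="HTMLMessage.xsd"/>
  <element name="body"/>
</schema>'''


def resolve_includes(schema_root, documents):
    """Replace each include element with the included schema's declarations."""
    include_tag = f"{{{XSD_NS}}}include"
    combined = []
    for child in list(schema_root):
        if child.tag != include_tag:
            combined.append(child)
            continue
        included = ET.fromstring(documents[child.get("schemaLocation")])
        # include is only allowed between schemas in the same namespace.
        if included.get("targetNamespace") != schema_root.get("targetNamespace"):
            raise ValueError("include requires matching targetNamespace values")
        resolve_includes(included, documents)  # handle nested includes
        combined.extend(list(included))
    schema_root[:] = combined
    return schema_root


combined_schema = resolve_includes(ET.fromstring(TOP_LEVEL), SCHEMA_DOCUMENTS)
declared_names = [child.get("name") for child in combined_schema]
print(declared_names)
```

After the merge, TextBody sits alongside body in a single schema tree, which is why it makes no difference which document declared it.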

The XML document that is based on the schema is referred to as an instance document. This instance document will have only a reference to the top-level schema. The instance document will need to use only the namespace of the top-level schema. Thus, the instance document for our example schema would look like this:

 <?xml version="1.0"?>
 <northwindMessage:doc xmlns:northwindMessage=
     "http://www.northwindtraders.com/Message">
     <body bodyText="Hello, world"/>
 </northwindMessage:doc>

You could also have the following instance document:

 <?xml version="1.0"?>
 <northwindMessage:doc xmlns:northwindMessage=
     "http://www.northwindtraders.com/Message">
     <HTMLbody bodyText="&lt;h1&gt;Hello, world&lt;/h1&gt;"/>
 </northwindMessage:doc>

As far as the instance document is concerned, all the elements come from the top-level schema. The instance document is unaware of the fact that the HTMLbody element actually comes from a different schema document, because the schema processor resolves the namespaces and combines the documents.

Including Schemas from a Different targetNamespace

When you use the include element, you insert the entire referenced schema into the top-level schema, and both documents must have the same targetNamespace attribute. You might also want to create schema documents that contain simpleType and complexType declarations that you can use in multiple schemas. If those schemas have different targetNamespace values, you cannot use the include element for a document shared between them. Instead, you can use the import element. With the import element, you can reference any data type created in the imported document and use, extend, or restrict the data type, as shown here:

 <schema targetNamespace="http://www.northwindtraders.com/Message"
     xmlns:northwindMessage="http://www.northwindtraders.com/Message"
     xmlns:northwindType="http://www.northwindtraders.com/Types"
     xmlns="http://www.w3.org/1999/XMLSchema">
     <import namespace="http://www.northwindtraders.com/Types"
         schemaLocation="http://www.northwindtraders.com/HTMLTypes.xsd"/>
     <element name="doc">
         <element ref="northwindMessage:body"/>
     </element>
     <element name="body">
         <attribute name="bodyText" type="northwindType:TextBody"/>
     </element>
 </schema>

The HTMLTypes.xsd file might look like this:

 <xsd:schema targetNamespace="http://www.northwindtraders.com/Types"
     xmlns:northwindType="http://www.northwindtraders.com/Types"
     xmlns:xsd="http://www.w3.org/1999/XMLSchema">
     <xsd:simpleType name="TextBody" base="xsd:string"
         minLength="0" maxLength="20"/>
 </xsd:schema>

In the top-level schema, we associated a namespace called http://www.northwindtraders.com/Types with the prefix northwindType. Using the import element, we can associate that namespace with a schema location. The application will determine how to use the schemaLocation attribute. Once you have done this, you can use the data type. An instance document for this schema is shown here:

 <?xml version="1.0"?>
 <northwindMessage:doc xmlns:northwindMessage=
     "http://www.northwindtraders.com/Message">
     <body bodyText="Hello, world"/>
 </northwindMessage:doc>

Once again, as far as the instance document is concerned, it does not matter where the data types are defined; everything comes from the top-level document.
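To show how a prefixed type reference such as northwindType:TextBody ties back to the imported namespace, here is a minimal sketch that resolves the prefix in a type attribute to its namespace URI. It uses iterparse from Python's standard xml.etree.ElementTree module, which is an assumption of this example. The helper resolve_type_references is a hypothetical name, the schema is trimmed to the relevant declarations, and the prefix table is kept flat, which ignores prefix scoping and is only adequate when all declarations sit on the root element.

```python
import io
import xml.etree.ElementTree as ET

TYPES_NS = "http://www.northwindtraders.com/Types"

schema_text = f'''<schema targetNamespace="http://www.northwindtraders.com/Message"
    xmlns:northwindMessage="http://www.northwindtraders.com/Message"
    xmlns:northwindType="{TYPES_NS}"
    xmlns="http://www.w3.org/1999/XMLSchema">
  <element name="body">
    <attribute name="bodyText" type="northwindType:TextBody"/>
  </element>
</schema>'''


def resolve_type_references(text):
    """Map each declaration's name to the (namespace URI, local name) of its type."""
    prefixes = {}
    resolved = {}
    source = io.BytesIO(text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            # payload is a (prefix, uri) pair; the default namespace has prefix ''.
            prefix, uri = payload
            prefixes[prefix] = uri
        else:
            type_name = payload.get("type")
            if type_name is not None:
                prefix, _, local = type_name.rpartition(":")
                resolved[payload.get("name")] = (prefixes[prefix], local)
    return resolved


type_references = resolve_type_references(schema_text)
print(type_references)
```

The resolved namespace URI is the one a processor would match against the imported schema's targetNamespace to find the TextBody definition.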

Overriding Data Types

Using namespaces, we have managed to build a schema from other schemas and include data types from other schemas. In both of these cases, the instance document uses only the top-level schema. You can also declare an element as being one particular data type and then override that data type in the instance document. Consider the following top-level schema:

 <schema targetNamespace="http://www.northwindtraders.com/Message"
     xmlns:northwindMessage="http://www.northwindtraders.com/Message"
     xmlns="http://www.w3.org/1999/XMLSchema">
     <include schemaLocation=
         "http://www.northwindtraders.com/HTMLMessage.xsd"/>
     <element name="doc" type="Body"/>
     <complexType name="Body">
         <element name="body">
             <attribute name="bodyText" type="string"/>
         </element>
     </complexType>
     <complexType name="HTMLBodyCT">
         <element name="HTMLBody">
             <complexType>
                 <element name="h1" type="string" content="text"/>
             </complexType>
         </element>
     </complexType>
 </schema>

This schema has defined the doc element as being a Body data type, and the doc element will contain a body child element that has a bodyText attribute. Now suppose you also want to be able to create messages from other body data types, such as the HTMLBodyCT data type defined in the schema. You could do this by creating a group element with choices.

Another option is to declare the schema as above and then substitute the HTMLBodyCT data type for the Body data type in the instance document. To do this, you will need to reference the schema instance namespace in the instance document. To use the HTMLBodyCT data type, you would need to create an instance document such as this:

 <?xml version="1.0"?>
 <northwindMessage:doc xmlns:northwindMessage=
     "http://www.northwindtraders.com/Message"
     xmlns:xsi="http://www.w3.org/1999/XMLSchema/instance"
     xsi:type="HTMLBodyCT">
     <HTMLBody>
         <h1>"Hello, world"</h1>
     </HTMLBody>
 </northwindMessage:doc>

In this example, you have used the xsi:type attribute to reference a type defined in the schema (HTMLBodyCT). The xsi:type attribute is part of the schema instance namespace and is used to override an element's declared type with another type that is defined in the schema. In this example, you have redefined the doc element as being of the HTMLBodyCT data type instead of the Body data type. You could also have defined the HTMLBodyCT data type in a separate schema and used the include element in the top-level schema.
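The override rule can be sketched in a few lines: use the type named by xsi:type when the attribute is present, and otherwise fall back to the type declared in the schema. This is an illustration only, using Python's standard xml.etree.ElementTree module (an assumption of this example). The DECLARED_TYPES table and the effective_type helper are hypothetical stand-ins for what a schema processor would derive from the schema, and no validation is performed.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/1999/XMLSchema/instance"
MESSAGE_NS = "http://www.northwindtraders.com/Message"

# Hypothetical table standing in for <element name="doc" type="Body"/>.
DECLARED_TYPES = {"doc": "Body"}


def effective_type(element):
    """Return the type named by xsi:type if present, else the declared type."""
    local_name = element.tag.rsplit("}", 1)[-1]
    # ElementTree exposes xsi:type under its namespace-qualified key.
    override = element.get(f"{{{XSI_NS}}}type")
    if override is not None:
        # Drop any prefix on the type name; only the local part is compared here.
        return override.rpartition(":")[2]
    return DECLARED_TYPES[local_name]


plain = ET.fromstring(
    f'<m:doc xmlns:m="{MESSAGE_NS}"><body bodyText="Hello, world"/></m:doc>'
)
overridden = ET.fromstring(
    f'<m:doc xmlns:m="{MESSAGE_NS}" xmlns:xsi="{XSI_NS}" xsi:type="HTMLBodyCT">'
    '<HTMLBody><h1>Hello, world</h1></HTMLBody></m:doc>'
)

print(effective_type(plain), effective_type(overridden))
```

A real processor would go on to validate the element's content against whichever type is selected; the sketch stops at the selection step.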