Understanding Namespaces | Using XML with Legacy Business Applications

The Schema Recommendation relies heavily on namespaces. Although by now almost everyone involved with XML has at least heard of namespaces, a lot of confusion and misunderstanding still exist about some of the key concepts. A brief discussion is in order.

What Is a Namespace?

To begin, every XML instance document has a namespace, whether it is explicitly named or not. So, in the current environment there is no escaping them. You can't run, you can't hide.

A namespace is nothing more than a set of names. Each namespace is itself identified by a URI, which I'll discuss next. But first let's drill in the concept with an analogy.

Consider a youth soccer league family picnic. The set of everyone attending the picnic forms our universe, or the default, unnamed namespace. Consider two people named Bob who are attending the picnic with their sons and daughters (Figure 4.1).

Figure 4.1. Two Bobs at the Picnic


Suppose that the coach calls out for Bob to help cook the hot dogs. Unless the coach is looking directly at one or the other of them, how are they to know which one should answer? Maybe they're both perfectly happy sitting in their lawn chairs, nursing brewskis, and they both ignore him. The coach gets annoyed because he also needs to get someone else started setting up some more tables. So, he calls out, "Bob Smith!" He has just used a namespace. Figure 4.2 shows the two Bobs "disambiguated" by putting them in the namespaces of the Smith and Jones surnames.

Figure 4.2. The Smith and Jones Namespaces


In a nutshell, that's all there is to namespaces. Now, there are certainly complexities in how they are used and how they are resolved, but that's the basic concept. For most things you'll have to deal with, you need to grasp only this essential concept and a few more details.

URIs, URNs, and URLs

The Schema Recommendation specifies that namespaces have a name expressed as a URI. We're all familiar with URLs from the Web, but fewer people have heard of URIs or URNs or really know what they mean. These terms are all defined in standards maintained by the Internet Engineering Task Force (IETF). IETF standards originate as Requests for Comments (RFCs), so the standards that define these terms are referenced by their RFC numbers. Pointers to them are listed in the Resources section at the end of the chapter.

Uniform Resource Identifier (URI): RFC 2396 defines a URI as "a compact string of characters for identifying an abstract or physical resource." This is the general class to which all other types of identifiers, such as URLs and URNs, belong. Any valid URL or URN is also a URI. A URI can be classified as a name, a location, or both. URIs must conform to the syntax specified in the RFC.
Uniform Resource Locator (URL): These are the http://www.mumble . . . addresses we all know from the World Wide Web. A URL is specifically a URI that references a location. In addition to the familiar http protocol reference, URLs can also have protocol types such as file and ftp. The general syntax for a URL is the protocol type followed by a string identifying a resource that can be accessed via that protocol. URLs were originally specified in RFC 1738, which was later merged into RFC 2396.
Uniform Resource Name (URN): This is a name for a resource that is not necessarily associated with the location of that resource. URNs use the concept of namespaces. While similar at a high level to the use of namespaces in XML documents, the details are different, and URNs have an entirely different syntax. A URN has a "urn" prefix, followed by a namespace ID, followed by the specific name within that namespace. For example, RFC 2141, which deals with URN syntax, could have a URN of urn:ietf:RFC:2141, as opposed to its URL of http://www.ietf.org/rfc2141.txt. URNs are defined in RFC 2141. As described in RFC 2611, the IETF intended that URNs be created and assigned through a managed process, with the namespace IDs being registered through the Internet Assigned Numbers Authority (IANA).
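
To make the difference in syntax concrete, here is a minimal Python sketch that pulls apart the URN and URL quoted above. The split_urn helper is my own illustrative name, not part of any standard library, and it checks only the three-part shape described here rather than the full RFC 2141 grammar.

```python
from urllib.parse import urlparse


def split_urn(urn):
    """Split a URN into (namespace ID, name within that namespace).

    Hypothetical helper for illustration: it checks only the "urn" prefix
    and the three-part shape, not the full RFC 2141 character rules.
    """
    scheme, nid, nss = urn.split(":", 2)
    if scheme.lower() != "urn":
        raise ValueError("not a URN: %r" % urn)
    return nid, nss


# The URN from the text: a name, with no location implied.
nid, nss = split_urn("urn:ietf:RFC:2141")
print(nid, nss)

# The URL from the text: protocol type first, then where to find the resource.
url = urlparse("http://www.ietf.org/rfc2141.txt")
print(url.scheme, url.netloc, url.path)
```

Note that nothing in the URN says where to fetch the document, while the URL is all about location.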

There are two key points to understand: (1) a namespace is a set of names, and (2) the name of that namespace is just a name. Even though a specific namespace name might have the form of a URL, it doesn't necessarily mean that there is anything at that URL location. This can be kind of confusing because the current convention is to use URL names as namespace names and post the schemas that correspond to them at those URLs. However, this is just a convention and not a requirement. The use of URNs will certainly help to resolve this confusion, but they have not yet come into fashion. My hunch is that people are shying away from them because they're not yet ready to step up to the "managed process" implications of using them. Most of the namespace names currently in use are in URL form. This includes the namespaces used by the W3C.
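
The "just a name" point can be seen with a few lines of Python. This is only a sketch using the standard library's ElementTree parser; the namespace name is the babelblaster.org one used for this chapter's examples, and the second, lowercased spelling is something I made up for contrast.

```python
import xml.etree.ElementTree as ET

# A namespace name in URL form. Nothing has to exist at this address, and
# the parser never tries to fetch it.
doc = '<bb:Row xmlns:bb="http://www.babelblaster.org/SimpleCSV"/>'
root = ET.fromstring(doc)

# ElementTree reports each name as "{namespace name}local name".
print(root.tag)  # {http://www.babelblaster.org/SimpleCSV}Row

# The namespace name is compared as a plain string. A made-up variant that
# differs only in letter case names a different namespace entirely.
other = ET.fromstring('<bb:Row xmlns:bb="http://www.babelblaster.org/simplecsv"/>')
print(root.tag == other.tag)  # False
```

The parser treats the URL-shaped string as an opaque label, which is all a namespace name is required to be.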

Namespace Qualification in Instance Documents

The thing that gives most people real heartburn about namespaces is their use in instance documents. This issue has sent some of my standards committee colleagues into apoplectic fits when the topic of namespaces comes up in discussion. Since this chapter is intended to inform you about what you are likely to encounter and is not intended to be a "best practices" guide, I'm going to avoid taking sides in this issue. I will, however, try to point out what some of the controversy and consternation is about.

The most common and simplest way to use namespaces in an instance document is to not explicitly use them at all. The instance document, in essence, has an unnamed default namespace, and all the Elements and Attributes in the document come from that namespace. Very few people have much trouble with this approach.

Another way to use namespaces is to give them a prefix. We've already seen this in the schema declaration Attributes of our instance document.

 <SimpleCSV xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:noNamespaceSchemaLocation="SimpleCSV1.xsd">

The xmlns:xsi Attribute assigns the xsi prefix to the namespace http://www.w3.org/2001/XMLSchema-instance. The next Attribute uses that prefix to say that the noNamespaceSchemaLocation Attribute is defined in that namespace. Also note that this is the only place in the instance document where a namespace or a prefix appears. This minimalist strategy for instance documents is the next simplest strategy (after the choice of not using named namespaces at all).
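
As a rough illustration of what the prefix does and doesn't mean, the following Python sketch parses that start tag (closed off as an empty Element so it stands alone) with the standard ElementTree module. The second document, with an inst prefix, is my own invention to show that the prefix itself carries no meaning.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

doc = ('<SimpleCSV xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
       'xsi:noNamespaceSchemaLocation="SimpleCSV1.xsd"/>')
root = ET.fromstring(doc)

# The root Element has no prefix and no default namespace is declared,
# so its name is unqualified.
print(root.tag)  # SimpleCSV

# The Attribute is looked up by namespace name plus local name, not by prefix.
print(root.get("{%s}noNamespaceSchemaLocation" % XSI))  # SimpleCSV1.xsd

# Same document with a different (made-up) prefix bound to the same namespace.
renamed = ET.fromstring(
    '<SimpleCSV xmlns:inst="http://www.w3.org/2001/XMLSchema-instance" '
    'inst:noNamespaceSchemaLocation="SimpleCSV1.xsd"/>')
print(renamed.attrib == root.attrib)  # True
```

The xsi prefix is a convention; what a namespace-aware parser actually sees is the namespace name it stands for.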

Namespace qualification in instance documents is specified by two Attributes in the schema's root xs:schema Element. elementFormDefault and attributeFormDefault, respectively, say whether or not the locally declared Elements and Attributes in your instance document need to have namespaces specified. There is a general preference for setting these to "unqualified," though I have seen schemas that set them to "qualified."
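
If you want to check how a given schema sets these two Attributes, a small Python sketch along the following lines will do. The schema text and the form_defaults helper are hypothetical, written only for this illustration. The one fact I'm relying on from the Schema Recommendation is that both Attributes default to "unqualified" when they are omitted.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment for illustration only.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.babelblaster.org/SimpleCSV"
           elementFormDefault="qualified">
  <xs:element name="SimpleCSV" type="xs:string"/>
</xs:schema>
"""


def form_defaults(schema_text):
    """Return (elementFormDefault, attributeFormDefault) for a schema."""
    root = ET.fromstring(schema_text)
    if root.tag != "{%s}schema" % XS:
        raise ValueError("root Element is not xs:schema")
    # Both Attributes default to "unqualified" when absent.
    return (root.get("elementFormDefault", "unqualified"),
            root.get("attributeFormDefault", "unqualified"))


print(form_defaults(SCHEMA))  # ('qualified', 'unqualified')
```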

At the other end of the spectrum we can have instance documents in which every Element and Attribute is qualified, sometimes from different namespaces. In this form, elementFormDefault has a value of "qualified." Here's an example.

Qualified Namespace Usage in SimpleCSVNamespaces.xml

 <?xml version="1.0" encoding="UTF-8"?>
 <bb:SimpleCSV
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns:bb="http://www.babelblaster.org/SimpleCSV"
     xmlns:bbc="http://www.babelblaster.org/common"
     xmlns="http://www.usps.gov"
     xsi:schemaLocation="
       http://www.babelblaster.org/SimpleCSV SimpleCSVN.xsd
       http://www.babelblaster.org/common common.xsd
       http://www.usps.gov http://www.usps.gov/ZipCodes.xsd
       ">
   <bb:Row>
     <bbc:Column01>Jones</bbc:Column01>
     <bbc:Column02>Mary</bbc:Column02>
     <bbc:Column03>312 Renner Road</bbc:Column03>
     <bbc:Column04>Apartment C</bbc:Column04>
     <bbc:Column05>Richardson</bbc:Column05>
     <bbc:Column06>TX</bbc:Column06>
     <ZipCode xmlns="http://www.usps.gov">75080</ZipCode>
     <bbc:Column08>USA</bbc:Column08>
     <bbc:Column09>972-996-1051</bbc:Column09>
   </bb:Row>
   <bb:Row>
     <bbc:Column01>Smith</bbc:Column01>
     <bbc:Column02>Sue</bbc:Column02>
     <bbc:Column03>Highway 118</bbc:Column03>
     <bbc:Column05>Terlingua</bbc:Column05>
     <bbc:Column06>TX</bbc:Column06>
     <ZipCode xmlns="http://www.usps.gov">79852</ZipCode>
     <bbc:Column10>desertrat@aol.com</bbc:Column10>
   </bb:Row>
 </bb:SimpleCSV>

Note: This document won't validate since there is no U.S. Post Office namespace that defines a ZipCode Element (at least not yet, anyway).

There are tradeoffs to the two extremes. If only the default, unnamed namespace is used, the instance document is limited to a single set of names. This can make it harder in some senses to support standard vocabularies. At the other extreme, liberal use of different namespaces makes everything from DOM names (and even some method calls) in Java and C++ to XPath expressions in XSLT more cumbersome.
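
The "more cumbersome" part is easy to see in code. Here is a minimal sketch, in Python rather than Java or C++ and using the standard ElementTree module, that looks up the same value in a cut-down unqualified document and a cut-down qualified one. Both documents are abbreviated versions of the SimpleCSV examples, trimmed by me for brevity.

```python
import xml.etree.ElementTree as ET

UNQUALIFIED = """\
<SimpleCSV>
  <Row>
    <Column01>Jones</Column01>
  </Row>
</SimpleCSV>
"""

QUALIFIED = """\
<bb:SimpleCSV xmlns:bb="http://www.babelblaster.org/SimpleCSV"
              xmlns:bbc="http://www.babelblaster.org/common">
  <bb:Row>
    <bbc:Column01>Jones</bbc:Column01>
  </bb:Row>
</bb:SimpleCSV>
"""

# With no named namespaces, a plain path is enough.
plain = ET.fromstring(UNQUALIFIED)
print(plain.findtext("Row/Column01"))  # Jones

# With qualified names, the code must carry its own prefix-to-namespace map
# and use it in every path expression.
ns = {
    "bb": "http://www.babelblaster.org/SimpleCSV",
    "bbc": "http://www.babelblaster.org/common",
}
qualified = ET.fromstring(QUALIFIED)
print(qualified.findtext("bb:Row/bbc:Column01", namespaces=ns))  # Jones

# The plain path silently finds nothing in the qualified document.
print(qualified.findtext("Row/Column01"))  # None
```

The last line is the one that tends to bite people: a path that ignores namespaces doesn't fail loudly, it just matches nothing.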

There are ways to use multiple namespaces in different schema documents while using just one namespace in instance documents that are valid with the schemas. We'll talk about that later in the chapter when discussing structuring schemas using several different schema files.

The W3C XML Schema-Related Namespaces

The W3C defines two specific namespaces associated with the XML Schema language.

http://www.w3.org/2001/XMLSchema: This namespace defines all the constructs specified by the Schema Recommendation, including all the Element and Attribute names we use in schemas. By convention this namespace is referred to by the prefix xs, though you may occasionally see xsd.
http://www.w3.org/2001/XMLSchema-instance: This namespace defines Attributes that can be used in XML instance documents. We have seen two so far, noNamespaceSchemaLocation and schemaLocation. By convention this namespace is referred to by the prefix xsi.

The former is used in every schema document written in the W3C XML Schema language, so it is very important. I suppose it could be used in instance documents other than schemas, but I've never seen it done. The latter is used in every instance document that declares itself to conform to a schema written in the W3C XML Schema language.