The Schema Recommendation relies heavily on namespaces. Although by now almost everyone involved with XML has at least heard of namespaces, a lot of confusion and misunderstanding still exist about some of the key concepts. A brief discussion is in order.
What Is a Namespace?
To begin, every XML instance document has a namespace, whether it is explicitly named or not. So, in the current environment there is no escaping them. You can't run, you can't hide.
A namespace is nothing more than a set of names. Namespaces are themselves identified by a URI, which I'll discuss shortly. But first let's drive the concept home with an analogy.
Consider a youth soccer league family picnic. The set of everyone attending the picnic forms our universe, or the default, unnamed namespace. Consider two people named Bob who are attending the picnic with their sons and daughters (Figure 4.1).
Figure 4.1. Two Bobs at the Picnic
Suppose that the coach calls out for Bob to help cook the hot dogs. Unless the coach is looking directly at one or the other of them, how are they to know which one should answer? Maybe they're both perfectly happy sitting in their lawn chairs, nursing brewskis, and they both ignore him. The coach gets annoyed because he also needs to get someone else started setting up some more tables. So, he calls out, "Bob Smith!" He has just used a namespace. Figure 4.2 shows the two Bobs "disambiguated" by putting them in the namespaces of the Smith and Jones surnames.
Figure 4.2. The Smith and Jones Namespaces
In a nutshell, that's all there is to namespaces. Now, there are certainly complexities in how they are used and how they are resolved, but that's the basic concept. For most things you'll have to deal with, you need to grasp only this essential concept and a few more details.
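Translated into XML, the picnic might look something like the sketch below. The Picnic vocabulary and the family namespace names are made up for illustration (the urn:example: form is set aside for examples). Both Bob elements have the same local name, but the namespaces keep them distinct.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical picnic vocabulary; the URNs are illustrative only -->
<Picnic xmlns:smith="urn:example:family:smith"
        xmlns:jones="urn:example:family:jones">
  <!-- Same local name, different namespaces: two distinct elements -->
  <smith:Bob/>
  <jones:Bob/>
  <!-- No prefix and no default declaration: in the unnamed namespace, like Picnic -->
  <Coach/>
</Picnic>
```

A namespace-aware processor sees the first two elements as the pair (urn:example:family:smith, Bob) and (urn:example:family:jones, Bob), which is the XML equivalent of calling out "Bob Smith!"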
URIs, URNs, and URLs
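The Namespaces in XML Recommendation, on which the Schema Recommendation builds, specifies that a namespace is named by a URI. We're all familiar with URLs from the Web, but fewer people have heard of URIs or URNs or really know what they mean. A URI (Uniform Resource Identifier) is the general category. A URL (Uniform Resource Locator) is a URI that identifies a resource by where it can be retrieved, while a URN (Uniform Resource Name) is a persistent name that says nothing about location. These terms are all defined in standards maintained by the Internet Engineering Task Force (IETF). IETF standards originate as Requests for Comments (RFCs), so the standards that define these terms are referenced by their RFC numbers. Pointers to them are listed in the Resources section at the end of the chapter.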
There are two key points to understand: (1) a namespace is a set of names, and (2) the name of that namespace is just a name. Even though a specific namespace name might have the form of a URL, it doesn't necessarily mean that there is anything at that URL location. This can be kind of confusing because the current convention is to use URL names as namespace names and post the schemas that correspond to them at those URLs. However, this is just a convention and not a requirement. The use of URNs will certainly help to resolve this confusion, but they have not yet come into fashion. My hunch is that people are shying away from them because they're not yet ready to step up to the "managed process" implications of using them. Most of the namespace names currently in use are in URL form. This includes the namespaces used by the W3C.
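The fragment below is a small, hypothetical illustration of the point that a namespace name is just a name. The Rows wrapper and the urn:example:simplecsv name are invented for the example; the SimpleCSV namespace name is the one used later in this chapter.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Rows>
  <!-- These two elements have the same expanded name:
       namespace http://www.babelblaster.org/SimpleCSV, local name Row.
       The prefix is only a local abbreviation for the namespace name. -->
  <bb:Row xmlns:bb="http://www.babelblaster.org/SimpleCSV"/>
  <csv:Row xmlns:csv="http://www.babelblaster.org/SimpleCSV"/>
  <!-- A URN is an equally acceptable namespace name (hypothetical here) -->
  <ex:Row xmlns:ex="urn:example:simplecsv"/>
</Rows>
```

A namespace-aware parser only compares namespace names as strings; it does not need to retrieve anything from them. That string comparison is exact, so http://www.babelblaster.org/SimpleCSV and http://www.babelblaster.org/SimpleCSV/ (with a trailing slash) would name two different namespaces.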
Namespace Qualification in Instance Documents
The thing that gives most people real heartburn about namespaces is their use in instance documents. The topic has sent some of my standards committee colleagues into apoplectic fits whenever it comes up in discussion. Since this chapter is intended to inform you about what you are likely to encounter and is not intended to be a "best practices" guide, I'm going to avoid taking sides on this issue. I will, however, try to point out what some of the controversy and consternation is about.
The most common and simplest way to use namespaces in an instance document is to not explicitly use them at all. The instance document, in essence, has an unnamed default namespace, and all the Elements and Attributes in the document come from that namespace. Very few people have much trouble with this approach.
Another way to use namespaces is to give them a prefix. We've already seen this in the schema declaration Attributes of our instance document.
<SimpleCSV xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="SimpleCSV1.xsd">
The xmlns:xsi attribute binds the xsi prefix to the namespace http://www.w3.org/2001/XMLSchema-instance. The next Attribute uses that prefix to say that the noNamespaceSchemaLocation Attribute is defined in that namespace. Also note that this is the only place in the instance document where you see a namespace or a prefix. This minimalist strategy for instance documents is the next simplest strategy (after the choice of not using named namespaces at all).
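On the schema side, the document that xsi:noNamespaceSchemaLocation points to is simply one that declares no targetNamespace. The sketch below is a hypothetical, simplified stand-in for SimpleCSV1.xsd; the actual file may declare more columns and different types.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical, simplified SimpleCSV1.xsd: no targetNamespace attribute,
     so the elements it declares are in no namespace. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="SimpleCSV">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Row" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <!-- Only two columns shown for brevity -->
              <xs:element name="Column01" type="xs:string" minOccurs="0"/>
              <xs:element name="Column02" type="xs:string" minOccurs="0"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Note that the schema document itself still uses a named namespace (the xs prefix), even though the instance documents it describes use none.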
Namespace qualification in instance documents is controlled by two Attributes on the schema's root xs:schema Element. elementFormDefault and attributeFormDefault say, respectively, whether the locally declared Elements and Attributes in your instance documents must be namespace-qualified. (Globally declared Elements and Attributes are always in the schema's target namespace, whatever these two Attributes say.) Both default to "unqualified" when omitted. There is a general preference for leaving them set to "unqualified," though I have seen schemas that set them to "qualified."
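Here is a minimal sketch of a schema root that sets both Attributes explicitly. The targetNamespace reuses the SimpleCSV name from the example that follows; the version attribute and the simplified Row type are assumptions made only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema: a target namespace plus explicit form defaults -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.babelblaster.org/SimpleCSV"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <!-- Global element: always in the target namespace -->
  <xs:element name="SimpleCSV">
    <xs:complexType>
      <xs:sequence>
        <!-- Local element: must be namespace-qualified in instances
             because elementFormDefault="qualified" -->
        <xs:element name="Row" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
      <!-- Local attribute (hypothetical): written without a prefix
           because attributeFormDefault="unqualified" -->
      <xs:attribute name="version" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A valid instance would then read along the lines of <bb:SimpleCSV xmlns:bb="http://www.babelblaster.org/SimpleCSV" version="1.0"><bb:Row>Jones,Mary</bb:Row></bb:SimpleCSV>. If elementFormDefault were "unqualified" instead, Row would have to appear in no namespace at all, as a plain <Row> with no default namespace in scope.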
At the other end of the spectrum we can have instance documents in which every Element and Attribute is qualified, sometimes from different namespaces. In this form, elementFormDefault has a value of "qualified." Here's an example.
Qualified Namespace Usage in SimpleCSVNamespaces.xml
<?xml version="1.0" encoding="UTF-8"?>
<bb:SimpleCSV xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:bb="http://www.babelblaster.org/SimpleCSV"
    xmlns:bbc="http://www.babelblaster.org/common"
    xmlns="http://www.usps.gov"
    xsi:schemaLocation="
        http://www.babelblaster.org/SimpleCSV SimpleCSVN.xsd
        http://www.babelblaster.org/common common.xsd
        http://www.usps.gov http://www.usps.gov/ZipCodes.xsd">
  <bb:Row>
    <bbc:Column01>Jones</bbc:Column01>
    <bbc:Column02>Mary</bbc:Column02>
    <bbc:Column03>312 Renner Road</bbc:Column03>
    <bbc:Column04>Apartment C</bbc:Column04>
    <bbc:Column05>Richardson</bbc:Column05>
    <bbc:Column06>TX</bbc:Column06>
    <ZipCode xmlns="http://www.usps.gov">75080</ZipCode>
    <bbc:Column08>USA</bbc:Column08>
    <bbc:Column09>972-996-1051</bbc:Column09>
  </bb:Row>
  <bb:Row>
    <bbc:Column01>Smith</bbc:Column01>
    <bbc:Column02>Sue</bbc:Column02>
    <bbc:Column03>Highway 118</bbc:Column03>
    <bbc:Column05>Terlingua</bbc:Column05>
    <bbc:Column06>TX</bbc:Column06>
    <ZipCode xmlns="http://www.usps.gov">79852</ZipCode>
    <bbc:Column10>email@example.com</bbc:Column10>
  </bb:Row>
</bb:SimpleCSV>
Note: This document won't validate since there is no U.S. Post Office namespace that defines a ZipCode Element (at least not yet, anyway).
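There are tradeoffs to the two extremes. If only the default, unnamed namespace is used, the instance document is limited to a single set of names. That makes it harder to mix in standard vocabularies defined by other organizations, because nothing keeps their names from colliding with yours. At the other extreme, liberal use of different namespaces makes processing code more cumbersome: DOM code in Java and C++ has to use the namespace-aware methods (such as getElementsByTagNameNS, which takes a namespace URI as well as a local name), and XPath expressions in XSLT need a prefix for every namespaced name they match.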
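To make the XSLT cost concrete, here is a sketch of an XSLT 1.0 stylesheet that lists the surname and ZIP code from each Row of the qualified example. It assumes the ZipCode elements are in the http://www.usps.gov namespace; the usps prefix is a hypothetical choice. In XSLT 1.0 an unprefixed name in an XPath expression matches only elements in no namespace, so even the default-namespaced ZipCode needs a prefix bound in the stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:bb="http://www.babelblaster.org/SimpleCSV"
    xmlns:bbc="http://www.babelblaster.org/common"
    xmlns:usps="http://www.usps.gov">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Every step in each path needs the right prefix to match -->
    <xsl:for-each select="bb:SimpleCSV/bb:Row">
      <xsl:value-of select="bbc:Column01"/>
      <xsl:text>,</xsl:text>
      <!-- Unprefixed "ZipCode" here would match nothing -->
      <xsl:value-of select="usps:ZipCode"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Against the no-namespace SimpleCSV document, the same loop needs no namespace declarations at all and could be written simply as select="SimpleCSV/Row", with select="Column01" inside it.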
There are ways to use multiple namespaces in different schema documents while using just one namespace in instance documents that are valid with the schemas. We'll talk about that later in the chapter when discussing structuring schemas using several different schema files.
The W3C XML Schema-Related Namespaces
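The W3C defines two specific namespaces associated with the XML Schema language: http://www.w3.org/2001/XMLSchema, conventionally bound to the xs or xsd prefix, and http://www.w3.org/2001/XMLSchema-instance, conventionally bound to the xsi prefix.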
The former is used in every schema document written in the W3C XML Schema language, so it is very important. I suppose it could be used in instance documents other than schemas, but I've never seen it done. The latter is used in every instance document that declares itself to conform to a schema written in the W3C XML Schema language.
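The hypothetical pair below shows both namespaces at work. The schema document takes its own vocabulary, and built-in types such as xs:string, from the XMLSchema namespace. The instance uses the XMLSchema-instance namespace, which defines just four attributes: xsi:type, xsi:nil, xsi:schemaLocation, and xsi:noNamespaceSchemaLocation. The file name Contact.xsd and its elements are assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical Contact.xsd: every schema construct comes from the
     http://www.w3.org/2001/XMLSchema namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Name" type="xs:string"/>
        <!-- nillable="true" permits xsi:nil="true" in instances -->
        <xs:element name="Email" type="xs:string" nillable="true"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The xsi prefix is bound to http://www.w3.org/2001/XMLSchema-instance -->
<Contact xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="Contact.xsd">
  <Name>Sue Smith</Name>
  <!-- Explicitly marks the element as having no value; it must be empty -->
  <Email xsi:nil="true"/>
</Contact>
```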