Choosing a Namespace URI | Effective XML: 50 Specific Ways to Improve Your XML

The first choice to make is the URI scheme. While theoretically any URI scheme can be used, in practice only two are at all common: http and urn. An http scheme is the familiar http URL that is loaded into web browsers, printed in books, advertised on the sides of buses, and painted on building walls. A URN scheme, by contrast, identifies a Uniform Resource Name (as opposed to a Uniform Resource Locator). According to the URN specification, RFC 2141, URNs "are intended to serve as persistent, location-independent, resource identifiers." ^[1] Here are a few examples of URNs:

^[1] R. Moats (ed.). "URN Syntax." 1997. Accessed online in June 2003 at http://www.ietf.org/rfc/rfc2141.txt.

urn:uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882
urn:publicid:-:OASIS:DTD+DocBook+XML+V4.1.2:EN
urn:schemas-microsoft-com:xml-data
urn:schemas-microsoft-com:datatypes
urn:ISBN:059600292
urn:ndw:stylesheets:xsl:docbook:1.15

URNs feel like a very good fit for namespace URIs. They are not resolvable, and they have no particular dependence on location. Nonetheless, many developers prefer http URLs for namespace URIs, some because they want the namespace names to be resolvable, others because they're just not familiar with URNs. Perhaps the most common reason for choosing an http URL is that URN schemes must be registered before use. Except for experimental URNs that look like urn:X-foo, you can't just make them up on the fly. The domain name in an http URL also has to be registered, but the process for doing so is less involved (if more expensive), and many developers already own or have access to domain names they can use.

Assuming you do choose http for the protocol, the URI should be in a domain you own and control. For example, I often use http://namespaces.cafeconleche.org/ because I own the cafeconleche.org domain. You will of course choose something else. The namespace URI does not necessarily have to point to an existing page, host, or even domain. For the long term it's sensible to put something there, especially a RDDL document. (See Item 42.) However, when you're just beginning to design an application, this is not an urgent need. Parsers do not treat namespace URIs as anything more than strings. In particular they do not resolve the URL or load any page that may be found there. You do not need to have a network connection to parse a document that uses namespaces.
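As a rough illustration of that last point, the following sketch parses a small document with Python's standard xml.etree.ElementTree module. The choice of Python and of the sample document is mine, not part of the original discussion. No network access is involved, and nothing has to exist at the namespace URI for the parse to succeed.

```python
import xml.etree.ElementTree as ET

# Namespace name used purely as an identifier; nothing needs to be
# published at this address for parsing to work.
CHESS_NS = "http://namespaces.cafeconleche.org/chess"

doc = f'<game xmlns="{CHESS_NS}"><move>f3</move><move>e5</move></game>'

# The document is parsed entirely in memory; the URI is never dereferenced.
root = ET.fromstring(doc)

# ElementTree reports the namespace as a plain string in front of the local name.
print(root.tag)  # {http://namespaces.cafeconleche.org/chess}game

# Child lookups use the same string, wrapped in braces.
moves = [m.text for m in root.findall(f"{{{CHESS_NS}}}move")]
print(moves)  # ['f3', 'e5']
```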

All namespace URIs should be absolute. That is, http://namespaces.cafeconleche.org/chess is a good choice for a namespace name. However, /chess, /games/chess, games/chess, and chess are bad choices. The exact meaning of a relative namespace URI is unclear. Is it just the string? Does it depend on the base URI of the document, and thus can it change when the document is moved from one system to another? The W3C has not been able to decide this point and seems unlikely to do so in the future. All they've been able to agree on (and that with a lot of argument) is that relative namespace URLs are a very bad idea. Most APIs and tools just compare namespace URIs as strings, but a few may try to resolve the URI against the base URI. The results are unpredictable. Far and away the best solution is to simply make all namespace URIs absolute. This makes the namespace constant when documents are moved from one host to another. After all, do you really want the namespace of a document to change just because you saved it on your local hard drive from a remote server?
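One simple way to enforce this rule in a build or review step is to reject any namespace name that does not begin with a URI scheme. The sketch below is only an assumption about how such a check might look; the function name is_absolute_uri is hypothetical, and the pattern follows the generic scheme syntax (a letter followed by letters, digits, +, -, or ., then a colon).

```python
import re

# A URI reference is absolute only if it starts with "scheme:".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")

def is_absolute_uri(uri: str) -> bool:
    """Return True if the string starts with a URI scheme such as http: or urn:."""
    return _SCHEME.match(uri) is not None

candidates = [
    "http://namespaces.cafeconleche.org/chess",
    "urn:schemas-microsoft-com:xml-data",
    "/chess",
    "/games/chess",
    "games/chess",
    "chess",
]

for uri in candidates:
    verdict = "ok" if is_absolute_uri(uri) else "relative, avoid"
    print(f"{uri}: {verdict}")
```

Only the first two candidates pass; the four relative forms named above are all flagged.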

Occasionally, you'll see URI references with fragment identifiers used as namespace URIs, as in the following example.

<game xmlns="http://namespaces.cafeconleche.org/games#chess">
  <move>f3</move>
  <move>e5</move>
  <move>g4</move>
  <move>Qh4++</move>
</game>

There's nothing wrong with this. However, it does not have any particular specification-defined meaning. Namespace URIs are compared for string equality character for character. That's all. From this perspective, http://namespaces.cafeconleche.org/games#chess and http://namespaces.cafeconleche.org/games#checkers are two completely different namespaces.
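The character-for-character comparison can be seen with a short sketch, again assuming Python's xml.etree.ElementTree as the parser. The two sample documents are invented for illustration and differ only in the fragment identifier of the namespace name.

```python
import xml.etree.ElementTree as ET

chess = ET.fromstring(
    '<game xmlns="http://namespaces.cafeconleche.org/games#chess"/>'
)
checkers = ET.fromstring(
    '<game xmlns="http://namespaces.cafeconleche.org/games#checkers"/>'
)

# Same local name, but the namespace strings differ after the '#',
# so the two elements have different qualified names.
print(chess.tag)
print(checkers.tag)
same_name = chess.tag == checkers.tag
print(same_name)  # False
```

The parser attaches no meaning to the part before the sharp sign. The shared prefix does not make these related namespaces in any sense the parser recognizes.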

Slightly more common is the pattern used by the Resource Description Framework, XML Digital Signatures, XML Encryption, and a few other applications. In these cases, the namespace URI ends in a sharp sign (#), but no fragment identifier is present. The following list gives the namespace URIs for the three vocabularies named above, respectively.

http://www.w3.org/1999/02/22-rdf-syntax-ns#
http://www.w3.org/2000/09/xmldsig#
http://www.w3.org/2001/04/xmlenc#

This pattern enables more URLs to be formed by appending the appropriate fragment identifiers. For example, XML encryption appends algorithm names to identify specific algorithms.

http://www.w3.org/2001/04/xmlenc#tripledes-cbc
http://www.w3.org/2001/04/xmlenc#aes128-cbc
http://www.w3.org/2001/04/xmlenc#aes256-cbc
http://www.w3.org/2001/04/xmlenc#aes192-cbc
http://www.w3.org/2001/04/xmlenc#rsa-1_5
http://www.w3.org/2001/04/xmlenc#rsa-oaep-mgf1p
http://www.w3.org/2001/04/xmlenc#dh

However, this is just a convention. Since these algorithm URLs are not used as namespace URIs, the bare names would probably work equally well. Mostly this usage is just a symptom of the W3C's mania for making all names URIs (and http URLs in particular). It doesn't hurt much, aside from added verbosity; but it doesn't have much practical value either.

Theoretically, query strings could also be used in namespace URIs, as in the following example.

<game xmlns="http://namespaces.cafeconleche.org/games?name=chess">
  <move>f3</move>
  <move>e5</move>
  <move>g4</move>
  <move>Qh4++</move>
</game>

I've never seen this in practice and can't really imagine a reason for doing it. Query strings have meaning only when the URI is processed by an HTTP server. Since that is not normally done to a namespace URI, I suggest you not use query strings in namespace URIs.

Currently namespace URIs are limited to the ASCII character set, just as all other URIs are. (This may change when Namespaces 1.1 is released.) Non-ASCII characters, such as accented letters, have to be percent escaped. Certain other characters like : and = that have particular meaning in a URI context also have to be escaped when they're used in a different way. And some characters such as <, >, and " must always be percent escaped. The namespaces and URI specifications are unclear about how such escaping is treated, including what character set non-ASCII characters are encoded in and whether namespace URIs are compared for equality before or after the escape sequences are resolved. Even such trivial differences as http://www.example.com/%7E versus http://www.example.com/%7e can be important in some environments. Some APIs and tools consider these to be the same. Others consider them to be different. Using any of these guarantees trouble. Make sure all your namespace URIs use only ASCII characters that do not have to be escaped.
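A conservative check along these lines might look like the following sketch. The function name is_plain_ascii_namespace and the exact whitelist are my assumptions, not anything defined by the namespaces or URI specifications. The whitelist is deliberately narrow: ASCII letters and digits plus a handful of punctuation characters that commonly appear unescaped in http and urn namespace names. Anything else, including the percent sign itself, is rejected.

```python
import re

# Hypothetical whitelist: ASCII letters, digits, and . _ ~ : / # + -
# Percent signs are excluded on purpose, so escaped names are rejected too.
_PLAIN = re.compile(r"[A-Za-z0-9._~:/#+-]+")

def is_plain_ascii_namespace(uri: str) -> bool:
    """Return True if the name needs no percent escapes under this whitelist."""
    return _PLAIN.fullmatch(uri) is not None

print(is_plain_ascii_namespace("http://www.w3.org/2000/09/xmldsig#"))  # True
print(is_plain_ascii_namespace("http://www.example.com/%7E"))          # False
print(is_plain_ascii_namespace("http://www.example.com/caf\u00e9"))    # False

# A plain string comparison, which is what most tools do, treats the two
# spellings of the same escape as different names.
print("http://www.example.com/%7E" == "http://www.example.com/%7e")    # False
```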

For similar reasons, it's necessary to pick a case convention, even for normally case-insensitive parts of a URI such as the domain name. HTTP://WWW.EXAMPLE.COM/namespace and http://www.example.com/namespace are not the same namespace name, even if they are the same URL. Almost everybody just makes the scheme and domain name lower case.
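If you adopt the lowercase convention, it can be checked mechanically. The sketch below is one possible approach, assuming Python's urllib.parse module; the helper name follows_lowercase_convention is hypothetical. It inspects the raw scheme text and the authority component as written, and leaves the path alone because paths are case sensitive.

```python
from urllib.parse import urlsplit

def follows_lowercase_convention(uri: str) -> bool:
    """Return True if the scheme and authority (host) are written in lower case."""
    # Take the scheme from the raw string, since urlsplit lowercases it.
    scheme, colon, _ = uri.partition(":")
    # urlsplit preserves the case of the authority component in .netloc.
    netloc = urlsplit(uri).netloc
    return bool(colon) and scheme == scheme.lower() and netloc == netloc.lower()

print(follows_lowercase_convention("http://www.example.com/namespace"))  # True
print(follows_lowercase_convention("HTTP://WWW.EXAMPLE.COM/namespace"))  # False

# As namespace names, the two spellings are simply different strings.
print("HTTP://WWW.EXAMPLE.COM/namespace" == "http://www.example.com/namespace")  # False
```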