Chapter 4. Namespaces | XML in a Nutshell, 2nd Edition

CONTENTS

4.1 The Need for Namespaces
4.2 Namespace Syntax
4.3 How Parsers Handle Namespaces
4.4 Namespaces and DTDs

Namespaces have two purposes in XML:

To distinguish between elements and attributes from different vocabularies that have different meanings but happen to share the same name.
To group all the related elements and attributes from a single XML application together so that software can easily recognize them.

The first purpose is easier to explain and to grasp, but the second purpose is more important in practice.

Namespaces are implemented by attaching a prefix to each element and attribute. Each prefix is mapped to a URI by an xmlns:prefix attribute. An xmlns attribute can also supply a default URI for elements that don't have a prefix. Elements and attributes that are attached to the same URI are in the same namespace. Elements from many XML applications are identified by standard URIs.

4.1 The Need for Namespaces

Some documents combine markup from multiple XML applications. For example, an XHTML document may contain both SVG pictures and MathML equations. An XSLT stylesheet will contain both XSLT instructions and elements from the result-tree vocabulary. And XLinks are always symbiotic with the elements of the document in which they appear since XLink itself doesn't define any elements, only attributes.

In some cases, these applications may use the same name to refer to different things. For example, in SVG a set element sets the value of an attribute for a specified duration of time, while in MathML a set element represents a mathematical set such as the set of all positive even numbers. It's essential to know when you're working with a MathML set and when you're working with an SVG set. Otherwise, validation, rendering, indexing, and many other tasks will get confused and fail.

Consider Example 4-1. This is a simple list of paintings including the title of each painting, the date each was painted, the artist who painted it, and a description of the painting.

Example 4-1. A list of paintings

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<catalog>
  <painting>
    <title>Memory of the Garden at Etten</title>
    <artist>Vincent Van Gogh</artist>
    <date>November, 1888</date>
    <description>
      Two women look to the left. A third works in her garden.
    </description>
  </painting>
  <painting>
    <title>The Swing</title>
    <artist>Pierre-Auguste Renoir</artist>
    <date>1876</date>
    <description>
      A young girl on a swing. Two men and a toddler watch.
    </description>
  </painting>
  <!-- Many more paintings... -->
</catalog>

Now suppose that Example 4-1 is to be served as a web page and you want to make it accessible to search engines. One possibility is to use the Resource Description Framework (RDF) to embed metadata in the page. This describes the page for any search engines or other robots that might come along. Using the Dublin Core metadata vocabulary (http://purl.oclc.org/dc/), a standard vocabulary for library-catalog-style information that can be encoded in XML or other syntaxes, an RDF description of this page might look something like this:

<RDF>
  <Description
     about="http://www.cafeconleche.org/examples/impressionists.xml">
    <title> Impressionist Paintings </title>
    <creator> Elliotte Rusty Harold </creator>
    <description>
      A list of famous impressionist paintings organized
      by painter and date
    </description>
    <date>2000-08-22</date>
  </Description>
</RDF>

Here we've used the Description and RDF elements from RDF and the title, creator, description, and date elements from the Dublin Core. We have no choice about these names; they are established by their respective specifications. If we want standard software, which understands RDF and the Dublin Core, to understand our documents, then we have to use these names. Example 4-2 combines this description with the actual list of paintings.

Example 4-2. A list of paintings including catalog information about the list

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<catalog>
  <RDF>
    <Description
        about="http://www.cafeconleche.org/examples/impressionists.xml">
      <title> Impressionist Paintings </title>
      <creator> Elliotte Rusty Harold </creator>
      <description>
        A list of famous impressionist paintings organized
        by painter and date
      </description>
      <date>2000-08-22</date>
    </Description>
  </RDF>
  <painting>
    <title>Memory of the Garden at Etten</title>
    <artist>Vincent Van Gogh</artist>
    <date>November, 1888</date>
    <description>
      Two women look to the left. A third works in her garden.
    </description>
  </painting>
  <painting>
    <title>The Swing</title>
    <artist>Pierre-Auguste Renoir</artist>
    <date>1876</date>
    <description>
      A young girl on a swing. Two men and a toddler watch.
    </description>
  </painting>
  <!-- Many more paintings... -->
</catalog>

Now we have a problem. Several elements have been overloaded with different meanings in different parts of the document. The title element is used for both the title of the page and the title of a painting. The date element is used for both the date the page was written and the date the painting was painted. One description element describes pages, while another describes paintings.

This presents all sorts of problems. Validation is difficult because catalog and Dublin Core elements with the same name have different content specifications. Web browsers may want to hide the page description while showing the painting description, but not all stylesheet languages can tell the difference between the two. Processing software may understand the date format used in the Dublin Core date element, but not the more free-form format used in the painting date element.

We could change the names of the elements from our vocabulary, painting_title instead of title, date_painted instead of date, and so on. However, this is inconvenient if you already have a lot of documents marked up in the old version of the vocabulary. And it may not be possible to do this in all cases, especially if the name collisions occur not because of conflicts between your vocabulary and a standard vocabulary, but because of conflicts between two or more standard vocabularies. For instance, RDF just barely avoids a collision with the Dublin Core over the Description and description elements.

In other cases, there may not be any name conflicts, but it may still be important for software to determine quickly and decisively to which XML application a given element or attribute belongs. For instance, an XSLT processor needs to distinguish between XSLT instructions and literal result-tree elements.

4.2 Namespace Syntax

Namespaces disambiguate elements with the same name from each other by assigning elements and attributes to URIs. Generally, all the elements from one XML application are assigned to one URI, and all the elements from a different XML application are assigned to a different URI. These URIs are sometimes called namespace names. The URIs partition the elements and attributes into disjoint sets. Elements with the same name but different URIs are different elements. Elements with the same name and the same URIs are the same. Most of the time there's a one-to-one mapping between namespaces and XML applications, though a few applications use multiple namespaces to subdivide different parts of the application. For instance, XSL uses different namespaces for XSL Transformations (XSLT) and XSL Formatting Objects (XSL-FO).

4.2.1 Qualified Names, Prefixes, and Local Parts

Since URIs frequently contain characters such as /, %, and ~ that are not legal in XML names, short prefixes such as rdf and xsl stand in for them in element and attribute names. Each prefix is associated with a URI. Names whose prefixes are associated with the same URI are in the same namespace. Names whose prefixes are associated with different URIs are in different namespaces. Prefixed elements and attributes in namespaces have names that contain exactly one colon. They look like this:

rdf:description
xlink:type
xsl:template

Everything before the colon is called the prefix. Everything after the colon is called the local part. The complete name including the colon is called the qualified name, QName, or raw name. The prefix identifies the namespace to which the element or attribute belongs. The local part identifies the particular element or attribute within the namespace.
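The split into prefix and local part can be sketched in a few lines of Python. The function name split_qname is our own invention for illustration. It only checks the colon rules described in this section, not the full XML name grammar.

```python
from typing import Optional, Tuple


def split_qname(qname: str) -> Tuple[Optional[str], str]:
    """Split a qualified name into (prefix, local part).

    Hypothetical helper: it enforces only the colon rules, not
    the complete XML name production.
    """
    prefix, colon, local = qname.partition(":")
    if not colon:
        # No colon at all: an unprefixed name
        return None, qname
    if not prefix or not local or ":" in local:
        # Empty prefix, empty local part, or a second colon
        raise ValueError(f"not a legal qualified name: {qname!r}")
    return prefix, local


print(split_qname("xsl:template"))  # ('xsl', 'template')
print(split_qname("title"))         # (None, 'title')
```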

In a document that contains both SVG and MathML set elements, one could be an svg:set element, and the other could be a mathml:set element. Then there'd be no confusion between them. In an XSLT stylesheet that transforms documents into XSL formatting objects, the XSLT processor would recognize elements with the prefix xsl as XSLT instructions and elements with the prefix fo as literal result elements.

Prefixes may be composed from any legal XML name character except the colon. Prefixes beginning with the three letters xml (in any combination of case) are reserved for use by XML and its related specifications. Otherwise, you're free to name your prefixes in any way that's convenient. One further restriction namespaces add to XML 1.0 is that the local part may not contain any colons. In short, the only legal uses of a colon in XML are to separate a namespace prefix from the local part in a qualified name or for the attributes XML itself defines, such as xml:space and xml:lang.

4.2.2 Binding Prefixes to URIs

Each prefix in a qualified name must be associated with a URI. For example, all XSLT elements are associated with the http://www.w3.org/1999/XSL/Transform URI. The customary prefix xsl is used in place of the longer URI http://www.w3.org/1999/XSL/Transform.

You can't use the URI in the name directly. For one thing, the slashes in most URIs aren't legal characters in XML names. However, it's occasionally useful to refer to the full name without assuming a particular prefix. One convention used on many XML mailing lists and in XML documentation is to enclose the URI in curly braces and prefix it to the name. For example, the qualified name xsl:template might be written as the full name {http://www.w3.org/1999/XSL/Transform}template. Another convention is to append the local name to the namespace name after a sharp sign so that it becomes a URI fragment identifier. For example, http://www.w3.org/1999/XSL/Transform#template. However, both forms are only conveniences for communication among human beings when the URI is important but the prefix isn't. Neither an XML parser nor an XSLT processor will accept or understand the long forms.
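Although the curly-brace form is never legal inside a document, some APIs borrow it when they report names to your program. Python's standard xml.etree.ElementTree module is one example. The following sketch parses a one-element stylesheet fragment and prints the name the parser reports. The fragment itself is made up for illustration.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# The document uses the prefixed form; the curly-brace form
# would not be well-formed here.
doc = f'<xsl:template xmlns:xsl="{XSLT_NS}" match="/"/>'
root = ET.fromstring(doc)

# ElementTree reports the name as {namespace URI}local part
print(root.tag)  # {http://www.w3.org/1999/XSL/Transform}template

# ET.QName builds the same string from its two halves
full_name = ET.QName(XSLT_NS, "template").text
print(full_name == root.tag)  # True
```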

Prefixes are bound to namespace URIs by attaching an xmlns:prefix attribute to the prefixed element or one of its ancestors. (The prefix should be replaced by the actual prefix used.) For example, the xmlns:rdf attribute of this rdf:RDF element binds the prefix rdf to the namespace URI http://www.w3.org/TR/REC-rdf-syntax#:

<rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#">
  <rdf:Description
      about="http://www.cafeconleche.org/examples/impressionists.xml">
    <title> Impressionist Paintings </title>
    <creator> Elliotte Rusty Harold </creator>
    <description>
      A list of famous impressionist paintings organized
      by painter and date
    </description>
    <date>2000-08-22</date>
  </rdf:Description>
</rdf:RDF>

Bindings have scope within the element where they're declared and within its contents. The xmlns:rdf attribute declares the rdf prefix for the rdf:RDF element, as well as its child elements. An RDF processor will recognize rdf:RDF and rdf:Description as RDF elements because both have prefixes bound to the particular URI specified by the RDF specification. It will not consider the title, creator, description, and date elements to be RDF elements because they do not have prefixes bound to the http://www.w3.org/TR/REC-rdf-syntax# URI.
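To see this scoping from a program's point of view, here is a minimal sketch using Python's xml.etree.ElementTree. It parses a shortened copy of the fragment above and separates the elements that landed in the RDF namespace from those that are in no namespace. The variable names are ours.

```python
import xml.etree.ElementTree as ET

# The namespace URI used in the fragment above
RDF_NS = "http://www.w3.org/TR/REC-rdf-syntax#"

doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#">
  <rdf:Description
      about="http://www.cafeconleche.org/examples/impressionists.xml">
    <title>Impressionist Paintings</title>
    <creator>Elliotte Rusty Harold</creator>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(doc)

# Elements whose prefix is bound to the RDF URI
rdf_elements = [el.tag for el in root.iter()
                if el.tag.startswith("{" + RDF_NS + "}")]

# Elements that are not in any namespace
plain_elements = [el.tag for el in root.iter()
                  if not el.tag.startswith("{")]

print(rdf_elements)
print(plain_elements)  # ['title', 'creator']
```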

The prefix can be declared in the topmost element that uses the prefix or in any ancestor thereof. This may be the root element of the document, or it may be an element at a lower level. For instance, the Dublin Core elements could be attached to the http://purl.org/dc/ namespace by adding an xmlns:dc attribute to the rdf:Description element, as shown in Example 4-3, since all Dublin Core elements in this document appear inside a single rdf:Description element. In other documents that spread the elements out more, it might be more convenient to put the namespace declaration on the root element. If necessary, a single element can include multiple namespace declarations for different namespaces.

Example 4-3. A catalog that uses RDF and Dublin Core namespaces

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<catalog>
  <rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#">
    <rdf:Description xmlns:dc="http://purl.org/dc/"
       about="http://www.cafeconleche.org/examples/impressionists.xml">
      <dc:title> Impressionist Paintings </dc:title>
      <dc:creator> Elliotte Rusty Harold </dc:creator>
      <dc:description>
        A list of famous impressionist paintings organized
        by painter and date
      </dc:description>
      <dc:date>2000-08-22</dc:date>
    </rdf:Description>
  </rdf:RDF>
  <painting>
    <title>Memory of the Garden at Etten</title>
    <artist>Vincent Van Gogh</artist>
    <date>November, 1888</date>
    <description>
      Two women look to the left. A third works in her garden.
    </description>
  </painting>
  <painting>
    <title>The Swing</title>
    <artist>Pierre-Auguste Renoir</artist>
    <date>1876</date>
    <description>
      A young girl on a swing. Two men and a toddler watch.
    </description>
  </painting>
  <!-- Many more paintings... -->
</catalog>

A DTD for this document can include different content specifications for the dc:description and description elements. A stylesheet can attach different styles to dc:title and title. Software that sorts the catalog by date can pay attention to the date elements and ignore the dc:date elements.

In this example, the elements without prefixes, such as catalog, painting, description, artist, and title, are not in any namespace. Furthermore, unprefixed attributes (such as the about attribute of rdf:Description in the previous example) are never in any namespace. Being an attribute of an element in the http://www.w3.org/TR/REC-rdf-syntax# namespace is not sufficient to put the attribute in the http://www.w3.org/TR/REC-rdf-syntax# namespace. The only way an attribute belongs to a namespace is if it has a declared prefix, like xlink:type and xlink:href.
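The difference between prefixed and unprefixed attributes shows up clearly in a namespace-aware API. In this sketch, which again uses Python's xml.etree.ElementTree, we have added an xlink:type attribute to rdf:Description purely for illustration. It is not part of Example 4-3.

```python
import xml.etree.ElementTree as ET

doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <rdf:Description
      about="http://www.cafeconleche.org/examples/impressionists.xml"
      xlink:type="simple"/>
</rdf:RDF>"""

root = ET.fromstring(doc)
description = root[0]

# Attribute names as the parser reports them
attribute_names = sorted(description.attrib)
print(attribute_names)
# ['about', '{http://www.w3.org/1999/xlink}type']
```

The about attribute has no namespace URI even though its element is in the RDF namespace. Only xlink:type, which carries a declared prefix, is in a namespace.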

It is possible to redefine a prefix within a document so that in one element the prefix refers to one namespace URI, while in another element it refers to a different namespace URI. In this case, the closest ancestor element that declares the prefix takes precedence. However, redefining prefixes is almost always a bad idea that only leads to confusion.
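A small sketch makes the precedence rule concrete. The document and the example.com URIs below are invented for illustration. The same qualified name, p:item, appears three times, and the namespace reported for each depends on the closest declaration of the p prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the prefix p is rebound inside section
doc = """<root xmlns:p="http://example.com/outer">
  <p:item/>
  <section xmlns:p="http://example.com/inner">
    <p:item/>
  </section>
  <p:item/>
</root>"""

root = ET.fromstring(doc)

# Collect every item element in document order
item_tags = [el.tag for el in root.iter() if el.tag.endswith("}item")]
for tag in item_tags:
    print(tag)
# {http://example.com/outer}item
# {http://example.com/inner}item
# {http://example.com/outer}item
```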

4.2.3 Namespace URIs

Many XML applications have customary prefixes. For example, SVG elements often use the prefix svg, and RDF elements often have the prefix rdf. However, these prefixes are simply conventions and can be changed based on necessity, convenience, or whim. Before a prefix can be used, it must be bound to a URI like http://www.w3.org/2000/svg or http://www.w3.org/1999/02/22-rdf-syntax-ns#. It is these URIs that are standardized, not the prefixes. The prefix can change as long as the URI stays the same. An RDF processor looks for the RDF URI, not any particular prefix. As long as nobody outside the w3.org domain uses namespace URIs in the w3.org domain, and as long as the W3C can keep a careful eye on what its people are using for namespaces, all conflicts can be avoided.

Namespace URIs do not necessarily point to any actual document or page. In fact, they don't have to use the http scheme. They might even use some other protocol like mailto in which URIs don't even point to documents. However, if you're defining your own namespace using an http URI, it would not be a bad idea to place some documentation for the specification at the namespace URI. The W3C got tired of receiving broken-link reports for the namespace URIs in their specifications, so they added some simple pages at their namespace URIs. For more formal purposes that offer some hope of automated resolution and other features, you can place a Resource Directory Description Language (RDDL) document at the namespace URI. This possibility will be discussed further in Chapter 14. You are by no means required to do this, though. Many namespace URIs lead to 404-Not Found errors when you actually plug them into a web browser. Namespace URIs are purely formal identifiers. They are not the addresses of a page, and they are not meant to be followed as links.

Parsers compare namespace URIs on a character-by-character basis. If the URIs differ in even a single, normally insignificant character, they define separate namespaces. For instance, http://www.w3.org/1999/02/22-rdf-syntax-ns#, http://WWW.W3.ORG/1999/02/22-rdf-syntax-ns#, http://www.w3.org/1999/02/22-rdf-syntax-ns/, and http://www.w3.org/1999/02/22-rdf-syntax-ns/index.rdf all point to the same page. However, only the first is the correct namespace name for RDF. These four URLs identify four separate namespaces.
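The following sketch binds the rdf prefix to each of the four URLs in turn and collects the names that Python's xml.etree.ElementTree reports. The helper function root_tag is ours. Because the comparison is purely textual, the four documents yield four different element names.

```python
import xml.etree.ElementTree as ET

uris = [
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://WWW.W3.ORG/1999/02/22-rdf-syntax-ns#",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns/",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns/index.rdf",
]


def root_tag(uri: str) -> str:
    """Parse a one-element document that binds rdf to the given URI."""
    return ET.fromstring(f'<rdf:RDF xmlns:rdf="{uri}"/>').tag


# A set keeps only distinct names
distinct_names = {root_tag(uri) for uri in uris}
print(len(distinct_names))  # 4
```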

4.2.4 Setting a Default Namespace with the xmlns Attribute

You often know that all the content of a particular element will come from a particular XML application. For instance, inside an SVG svg element, you're only likely to find other SVG elements. You can indicate that an unprefixed element and all its unprefixed descendant elements belong to a particular namespace by attaching an xmlns attribute with no prefix to the top element. For example:

<svg xmlns="http://www.w3.org/2000/svg"
     width="12cm" height="10cm">
  <ellipse rx="110" ry="130" />
  <rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg>

Here, although no elements have any prefixes, the svg, ellipse, and rect elements are in the http://www.w3.org/2000/svg namespace.

The attributes are a different story. Default namespaces only apply to elements, not to attributes. Thus in the previous example the width, height, rx, ry, x, and y attributes are not in any namespace.
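You can check both halves of this rule with a short script. This is a sketch using Python's xml.etree.ElementTree on the SVG fragment above. Every element comes back with the SVG namespace URI, while none of the attribute names carries a namespace.

```python
import xml.etree.ElementTree as ET

doc = """<svg xmlns="http://www.w3.org/2000/svg"
     width="12cm" height="10cm">
  <ellipse rx="110" ry="130" />
  <rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg>"""

root = ET.fromstring(doc)

# All three elements pick up the default namespace
element_names = [el.tag for el in root.iter()]
print(element_names)

# Attribute names are reported without any {URI} part
attribute_names = sorted({name for el in root.iter() for name in el.attrib})
print(attribute_names)
# ['height', 'rx', 'ry', 'width', 'x', 'y']
```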

You can change the default namespace within a particular element by adding an xmlns attribute to the element. Example 4-4 is an XML document that initially sets the default namespace to http://www.w3.org/1999/xhtml for all the XHTML elements. This namespace declaration applies within most of the document. However, the svg element has an xmlns attribute that resets the default namespace to http://www.w3.org/2000/svg for itself and its content. The XLink information is included in attributes, however, so these must be placed in the XLink namespace using explicit prefixes.

Example 4-4. An XML document that uses default namespaces

<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <head><title>Three Namespaces</title></head>
  <body>
    <h1 align="center">An Ellipse and a Rectangle</h1>
    <svg xmlns="http://www.w3.org/2000/svg"
         width="12cm" height="10cm">
      <ellipse rx="110" ry="130" />
      <rect x="4cm" y="1cm" width="3cm" height="6cm" />
    </svg>
    <p xlink:type="simple" xlink:href="ellipses.html">
      More about ellipses
    </p>
    <p xlink:type="simple" xlink:href="rectangles.html">
      More about rectangles
    </p>
    <hr/>
    <p>Last Modified May 13, 2000</p>
  </body>
</html>

The default namespace does not apply to any elements or attributes with prefixes. These still belong to whatever namespace their prefix is bound to. However, an unprefixed child element of a prefixed element still belongs to the default namespace.
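The last point is easy to get wrong, so here is a minimal sketch of it. The markup is invented for illustration: an svg:svg element with an explicit prefix contains an unprefixed p element. The child does not inherit its parent's namespace. It takes the default namespace that is in scope.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a prefixed element with an unprefixed child
doc = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <svg:svg>
    <p>A caption</p>
  </svg:svg>
</html>"""

root = ET.fromstring(doc)
svg = root[0]      # the prefixed svg:svg element
caption = svg[0]   # its unprefixed p child

print(svg.tag)      # {http://www.w3.org/2000/svg}svg
print(caption.tag)  # {http://www.w3.org/1999/xhtml}p
```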

4.2.5 Attribute Declarations for xmlns

When namespaces are only being used to identify the elements and attributes from a particular XML application, and not to distinguish different elements with the same name, a DTD can attach a fixed xmlns attribute to the primary container elements for an application so that everything is placed in the right namespace without explicit xmlns attributes in the document. For example, this ATTLIST declaration fixes the default namespace of all svg elements as http://www.w3.org/2000/svg:

<!ATTLIST svg xmlns CDATA #FIXED "http://www.w3.org/2000/svg">

This allows you to omit xmlns attributes from all your svg elements.

A document does not need to be valid to take advantage of this. All that's required is that the parser read the DTD. All parsers will read the internal DTD subset and process any such ATTLIST declarations they find there. A few nonvalidating parsers may skip external DTD subsets, and thus get confused. Ideally, you should use a parser that can validate so that it will at least be able to read the external DTD subset, though you might use it with validation turned off.
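As a rough check of this behavior, the sketch below puts the ATTLIST declaration in the internal DTD subset and parses the document with Python's xml.etree.ElementTree, which does not validate. It assumes the SVG namespace URI used earlier in this chapter and relies on the underlying parser applying attribute defaults from the internal subset. Other parsers may behave differently, particularly with external subsets.

```python
import xml.etree.ElementTree as ET

# No xmlns attribute appears on the svg element itself; the
# fixed default in the internal DTD subset supplies it.
doc = """<!DOCTYPE svg [
  <!ATTLIST svg xmlns CDATA #FIXED "http://www.w3.org/2000/svg">
]>
<svg width="12cm" height="10cm">
  <rect x="4cm" y="1cm" width="3cm" height="6cm"/>
</svg>"""

root = ET.fromstring(doc)

# Names as reported after the defaulted xmlns attribute is applied
element_names = [el.tag for el in root.iter()]
print(element_names)
```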

4.3 How Parsers Handle Namespaces

Namespaces are not part of XML 1.0. They were invented about a year after the original XML specification was released. However, care was taken to ensure backwards compatibility. Thus, an XML 1.0 parser that does not know about namespaces should not have any trouble reading a document that uses namespaces. Colons are legal characters in XML 1.0 element and attribute names. The parser will simply report that some of the names contain colons. Possible problems arise in the rare cases where different qualified names resolve to the same full name or where the same qualified name indicates a different full name in different parts of a document.

A namespace-aware parser does add a couple of checks to the normal well-formedness checks that a parser performs. Specifically, it checks to see that all prefixes are mapped to URIs. It will reject documents that use unmapped prefixes (except for xml and xmlns when used as specified in the XML 1.0 or Namespaces in XML specifications). It will further reject any element or attribute names that contain more than one colon. Otherwise, it behaves almost exactly like a non-namespace-aware parser. Other software that sits on top of the raw XML parser, an XSLT engine for example, may treat elements differently depending on the namespace to which they belong. However, the XML parser itself mostly doesn't care as long as all well-formedness and namespace constraints are met.
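The first of these checks is simple to demonstrate. In this sketch, Python's xml.etree.ElementTree, which is namespace aware, is handed the same element twice: once with the dc prefix undeclared and once with it bound to a URI. The variable names are ours.

```python
import xml.etree.ElementTree as ET

unbound = "<dc:title>Impressionist Paintings</dc:title>"
bound = ('<dc:title xmlns:dc="http://purl.org/dc/">'
         "Impressionist Paintings</dc:title>")

# The undeclared prefix is an error for a namespace-aware parser
try:
    ET.fromstring(unbound)
    rejected = False
except ET.ParseError as error:
    rejected = True
    print("rejected:", error)

# With the prefix declared, the same element parses normally
title = ET.fromstring(bound)
print(title.tag)  # {http://purl.org/dc/}title
```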

A possible exception occurs in the unlikely event that elements with different prefixes belong to the same namespace. In this case, a namespace-aware parser will report the elements as being the same, while a non-namespace-aware parser will report them as different. About equally unlikely is the case where two elements or attributes with the same qualified name are in different namespaces because the common prefix is bound to different URIs in different places in the document. Slightly more likely is the case where two unprefixed names are placed in different default namespaces. In both these cases, a namespace-aware processor will report them as different, whereas a non-namespace-aware processor will treat them the same. Many parsers let you turn namespace processing on or off as you see fit.
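The first case, two prefixes bound to one namespace, looks like this to a namespace-aware parser. The document is invented for illustration and is parsed with Python's xml.etree.ElementTree. A parser that ignored namespaces would see dc:title and dublin:title as two unrelated names.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: dc and dublin are bound to the same URI
doc = """<catalog xmlns:dc="http://purl.org/dc/"
         xmlns:dublin="http://purl.org/dc/">
  <dc:title>Impressionist Paintings</dc:title>
  <dublin:title>Post-Impressionist Paintings</dublin:title>
</catalog>"""

root = ET.fromstring(doc)
first, second = list(root)

# Different qualified names, identical namespace-aware names
print(first.tag)                # {http://purl.org/dc/}title
print(first.tag == second.tag)  # True
```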

4.4 Namespaces and DTDs

Namespaces are completely independent of DTDs and can be used in both valid and invalid documents. A document can have a DTD but not use namespaces or use namespaces but not have a DTD. It can use both namespaces and DTDs or neither namespaces nor DTDs. Namespaces do not in any way change DTD syntax nor do they change the definition of validity. For instance, the DTD of a valid document that uses an element named dc:title must include an ELEMENT declaration properly specifying the content of the dc:title element. For example:

<!ELEMENT dc:title (#PCDATA)>

The name of the element in the document must exactly match the name of the element in the DTD including the prefix. The DTD cannot omit the prefix and simply declare a title element. The same is true of prefixed attributes. For instance, if an element used in the document has xlink:type and xlink:href attributes, then the DTD must declare the xlink:type and xlink:href attributes, not simply type and href.

Conversely, if an element uses an xmlns attribute to set the default namespace and does not attach prefixes to elements, then the names of the elements must be declared without prefixes in the DTD. The validator neither knows nor cares about the existence of namespaces. All it sees is that some element and attribute names happen to contain colons; as far as it's concerned, such names are perfectly valid as long as they're declared.

4.4.1 Parameter Entity References for Namespace Prefixes

Requiring DTDs to declare the prefixed names, instead of just the local parts or some combination of local part and namespace URI, makes it difficult to change the prefix in valid documents. The problem is that changing the prefix requires changing all declarations that use that prefix in the DTD as well. However, with a little forethought, parameter entity references can alleviate the pain quite a bit.

The trick is to define as parameter entities both the namespace prefix and the colon that separates the prefix from the local name, like this:

<!ENTITY % dc-prefix "dc">
<!ENTITY % dc-colon ":">

The second step is to define the qualified names as more parameter entity references, like these:

<!ENTITY % dc-title       "%dc-prefix;%dc-colon;title">
<!ENTITY % dc-creator     "%dc-prefix;%dc-colon;creator">
<!ENTITY % dc-description "%dc-prefix;%dc-colon;description">
<!ENTITY % dc-date        "%dc-prefix;%dc-colon;date">

Do not omit this step and try to use the dc-prefix and dc-colon parameter entities directly in ELEMENT and ATTLIST declarations. This will fail because XML parsers add extra space around an entity's replacement text when the entity is referenced outside another entity's replacement text.

Then you use the entity references for the qualified name in all declarations, like this:

<!ELEMENT %dc-title; (#PCDATA)>
<!ELEMENT %dc-creator; (#PCDATA)>
<!ELEMENT %dc-description; (#PCDATA)>
<!ELEMENT %dc-date; (#PCDATA)>
<!ELEMENT rdf:Description
  ((%dc-title; | %dc-creator; | %dc-description; | %dc-date; )*)
>

Now a document that needs to change the prefix simply changes the parameter entity definitions. In some cases, this will be done by editing the DTD directly. In others, it may be done by overriding the definitions in the document's own internal DTD subset. For example, to change the prefix from dc to dublin, you'd add this entity definition somewhere in the DTD before the normal definition:

<!ENTITY % dc-prefix "dublin">

If you wanted to use the default namespace instead of explicit prefixes, you'd redefine both the dc-prefix and dc-colon entities as the empty string, like this:

<!ENTITY % dc-prefix "">
<!ENTITY % dc-colon  "">
