10.7 Base URIs | XML in a Nutshell, Third Edition

Relative URL references such as sark.jpg, ../pi1/sark.jpg, and turing/pi1/sark.jpg must be resolved relative to an absolute base URI before being retrieved. When relative URLs are found in XLinks, xml-stylesheet processing instructions, system identifiers, and other locations in XML documents, they are normally resolved relative to the absolute base URL of the document or entity that contains them. For instance, if you find the element <image xlink:type="simple" xlink:href="pi1/sark.jpg"/> in a document at the URL http://www.turing.org.uk/turing/index.html, you would expect to find the file sark.jpg at the URL http://www.turing.org.uk/turing/pi1/sark.jpg. This isn't a surprise. It's pretty much how links have worked in HTML for over a decade.
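As a rough illustration of this resolution step, the sketch below uses urljoin from Python's standard urllib.parse module. Python is not part of the original discussion; it is used here only because urljoin applies the usual relative-reference rules. The base URL is the one from the text, and the relative references are of the kinds mentioned above.

```python
from urllib.parse import urljoin

# Base URL of the document that contains the links (taken from the text).
BASE = "http://www.turing.org.uk/turing/index.html"

# Relative references of the kinds discussed above.
references = ["sark.jpg", "pi1/sark.jpg", "../pi1/sark.jpg"]

# Map each relative reference to the absolute URL it resolves to.
resolved = {ref: urljoin(BASE, ref) for ref in references}

for ref, absolute in resolved.items():
    print(f"{ref:16} -> {absolute}")
```

The last path segment of the base (index.html) is dropped before the relative reference is appended, which is why pi1/sark.jpg lands under /turing/ rather than under /turing/index.html/.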

However, XML does add a couple of new wrinkles to this procedure. First, an XML document may be composed of multiple entities loaded from multiple different URLs, even on different servers. If this is the case, then a relative URL is resolved relative to the base URL of the specific entity in which it appears, not the base URL of the entire document.

Second, the base URL may be reset or changed from within the document by using xml:base attributes. Such an attribute may appear on the XLink element itself or on any ancestor element in the same entity. For example, this XLink points to ftp://ftp.knowtion.net/pub/mirrors/gutenberg/etext93/wizoz10.txt:

 <novel xmlns:xlink = "http://www.w3.org/1999/xlink"
        xml:base="ftp://ftp.knowtion.net/pub/mirrors/gutenberg/etext93/"
        xlink:type = "simple"
        xlink:href = "wizoz10.txt">
   <title>The Wonderful Wizard of Oz</title>
   <author>L. Frank Baum</author>
   <year>1900</year>
 </novel>

So does this one:

 <novel xmlns:xlink = "http://www.w3.org/1999/xlink"
        xml:base="ftp://ftp.knowtion.net/"
        xlink:type = "simple"
        xlink:href = "/pub/mirrors/gutenberg/etext93/wizoz10.txt">
   <title>The Wonderful Wizard of Oz</title>
   <author>L. Frank Baum</author>
   <year>1900</year>
 </novel>

And this one does too:

 <series xml:base="ftp://ftp.knowtion.net/">
   <title>Oz Books</title>
   <author>L. Frank Baum</author>
   <novel xmlns:xlink = "http://www.w3.org/1999/xlink"
          xlink:type = "simple"
          xlink:href = "/pub/mirrors/gutenberg/etext93/wizoz10.txt">
     <title>The Wonderful Wizard of Oz</title>
     <year>1900</year>
   </novel>
   ...
 </series>

All of these link to the URL ftp://ftp.knowtion.net/pub/mirrors/gutenberg/etext93/wizoz10.txt regardless of where the document containing the XLink actually came from. The base URL is taken from the nearest xml:base attribute in the same entity, in preference to the base URL of the entity that contains the element.
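To check that different xml:base and xlink:href combinations really do arrive at the same place, here is a small sketch using Python's urllib.parse.urljoin. The pairs are copied from the first two examples above; DOCUMENT_URL is a hypothetical location for the containing document, included only to show that it has no effect once an absolute xml:base is present.

```python
from urllib.parse import urljoin

TARGET = "ftp://ftp.knowtion.net/pub/mirrors/gutenberg/etext93/wizoz10.txt"

# Hypothetical location of the document that contains the XLinks.
DOCUMENT_URL = "http://www.example.com/library/oz.xml"

# (xml:base value, xlink:href value) pairs from the first two examples.
pairs = [
    ("ftp://ftp.knowtion.net/pub/mirrors/gutenberg/etext93/", "wizoz10.txt"),
    ("ftp://ftp.knowtion.net/", "/pub/mirrors/gutenberg/etext93/wizoz10.txt"),
]

results = []
for xml_base, href in pairs:
    # An absolute xml:base replaces the document's own URL entirely.
    effective_base = urljoin(DOCUMENT_URL, xml_base)
    results.append(urljoin(effective_base, href))

for result in results:
    print(result)
```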

xml:base attributes can themselves contain relative URLs. In this case, the base URL is formed by resolving this relative URL against the base URL specified by xml:base attributes higher up in the tree and/or the base URL of the entity that contains the element. For example, resolving the URLs in the xlink:href attributes in this authors element requires applying the URLs in three separate ancestor elements:

 <authors xml:base="http://www.literature.org/authors/"
          xmlns:xlink = "http://www.w3.org/1999/xlink">
   <author xml:base="baum-l-frank/">
     <name>L. Frank Baum</name>
     <novel  xml:base = "the-wonderful-wizard-of-oz/">
       <title>The Wonderful Wizard of Oz</title>
       <year>1900</year>
       <chapter xlink:type="simple"
                xlink:href="introduction.html">Introduction</chapter>
       <chapter xlink:type="simple"
                xlink:href="chapter-01.html">The Cyclone</chapter>
       <chapter xlink:type="simple"
                xlink:href="chapter-02.html">The Council with the Munchkins</chapter>
       ...
     </novel>
   </author>
 </authors>
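One way to picture the chained resolution is a recursive walk that carries the current base URL down the tree. The sketch below uses Python's standard xml.etree.ElementTree and urllib.parse modules on a trimmed copy of the authors example. The function name resolve_links is an illustrative choice, not part of any XML API, and the sketch ignores entity boundaries, which a real processor would also have to track.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Attribute names in ElementTree's {namespace}local form.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Trimmed copy of the authors example above.
DOCUMENT = """\
<authors xml:base="http://www.literature.org/authors/"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <author xml:base="baum-l-frank/">
    <novel xml:base="the-wonderful-wizard-of-oz/">
      <chapter xlink:type="simple" xlink:href="introduction.html">Introduction</chapter>
      <chapter xlink:type="simple" xlink:href="chapter-01.html">The Cyclone</chapter>
    </novel>
  </author>
</authors>
"""

def resolve_links(element, base=""):
    """Yield absolute link targets, carrying xml:base down the tree."""
    # An xml:base on this element is itself resolved against the inherited base.
    base = urljoin(base, element.get(XML_BASE, ""))
    href = element.get(XLINK_HREF)
    if href is not None:
        yield urljoin(base, href)
    for child in element:
        yield from resolve_links(child, base)

links = list(resolve_links(ET.fromstring(DOCUMENT)))
for link in links:
    print(link)
```

Each chapter link picks up one path segment from each of the three ancestors before its own file name is appended.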

What if the top element has a relative base URL or no xml:base attribute? Then you apply the absolute base URL of the entity that contains the root element. In theory, this entity should always have an absolute base URL against which relative URLs can be resolved as a last resort. After all, the entity had to come from somewhere, right? Unfortunately, there are some corner cases where this isn't true. In particular, many APIs lose track of the base URLs or create documents in memory without any base URLs, so full resolution isn't always possible. The relevant specifications are not perfectly clear on what happens here, though one possible interpretation is to simply declare that the base URI is the empty string. The URI specification defines this to mean the URI of the current document, whatever it is. However, in the common case where a document is read from an actual file or URL, it should always be possible to calculate an absolute base URL for every element.
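The fallback can be sketched as follows, again with Python's urllib.parse. The resolve function, the ENTITY_URL value, and the chapters/ base are all hypothetical names chosen for illustration. Treating a missing entity URL as the empty string is just one possible reading, as noted above, not a rule taken from any specification.

```python
from functools import reduce
from urllib.parse import urljoin, urlparse

def resolve(href, xml_bases, entity_base=""):
    """Resolve href against xml:base values (outermost first), then the entity.

    entity_base is the URL the entity was loaded from. The empty string
    stands in for a document built in memory with no known location.
    """
    base = reduce(urljoin, xml_bases, entity_base)
    return urljoin(base, href)

# Hypothetical entity location, used only for illustration.
ENTITY_URL = "http://www.example.com/books/oz.xml"

# The entity's URL is known, so the relative xml:base can be made absolute.
known = resolve("introduction.html", ["chapters/"], ENTITY_URL)

# No entity URL is available, so the result remains a relative reference.
unknown = resolve("introduction.html", ["chapters/"])

print(known)
print(unknown)
```

When the entity's location is lost, nothing in the chain supplies a scheme or host, so the best a processor can do is hand back a reference that is still relative.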

There's one point we've made a couple of times, but it's worth calling out because it's not obvious and is quite tricky. All base URL resolutions are performed within the scope of a single entity, not a single document. If a document is built from multiple entities, then it's the base URI of the entity that matters, not the base URI of the document. Furthermore, xml:base attributes only have scope within the entity from which they come. They do not apply in any other entities. That is, if entity A includes entity B, no xml:base attributes in entity A will be used to resolve relative URLs in entity B. If the base URL cannot be fully resolved using xml:base attributes from entity B, then the final absolute URL is the URL from which entity B was loaded. xml:base attributes in ancestor elements from different entities are not considered.
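Because standard parsers rarely expose entity boundaries, the scoping rule is easiest to show with a toy model. In the Python sketch below, the Node class, its fields, and all of the example URLs are hypothetical. Each node simply records which entity it came from, and the walk up the tree stops as soon as it would cross into a different entity.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin

@dataclass
class Node:
    """Toy element: remembers its entity, its xml:base, and its parent."""
    entity_uri: str                   # URL the containing entity was loaded from
    xml_base: Optional[str] = None    # value of xml:base, if present
    parent: Optional["Node"] = None

def base_uri(node):
    """Compute a node's base URI from xml:base values in its own entity only."""
    bases = []
    current = node
    # Stop climbing at the first ancestor that belongs to another entity.
    while current is not None and current.entity_uri == node.entity_uri:
        if current.xml_base is not None:
            bases.append(current.xml_base)
        current = current.parent
    # Start from the entity's own URL and apply xml:base values outermost first.
    base = node.entity_uri
    for value in reversed(bases):
        base = urljoin(base, value)
    return base

# Entity A sets an xml:base on its root; entity B is included beneath it.
a_root = Node("http://a.example.com/doc.xml",
              xml_base="http://mirror.example.org/files/")
a_child = Node("http://a.example.com/doc.xml", parent=a_root)
b_elem = Node("http://b.example.net/parts/chapter.xml", parent=a_root)

print(urljoin(base_uri(a_child), "fig.png"))
print(urljoin(base_uri(b_elem), "fig.png"))
```

The element from entity A inherits the mirror URL from its root, while the element from entity B ignores it and falls back to the URL from which B was loaded. Comparing entity URLs for equality is a simplification; a real processor would track entity identity directly.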

Although we've emphasized the application of xml:base attributes to xlink:href attributes in this section, they also apply in many other contexts. For instance, they're used in XInclude and XHTML 2.0. However, xml:base is a relative latecomer to the XML table, so it's not universally applicable. For instance, XHTML 1.0 and 1.1 do not consider xml:base attributes when resolving relative URLs in a and img elements. Instead they use the traditional base element in the document's head.