Item 22. Don't Use Namespace Prefixes in Element Content and Attribute Values
The namespaces specification is ambivalent about using namespace prefixes in places that aren't actually defined to be XML names. The most common such place is attribute values, particularly in schema documents and XSLT stylesheets. For instance, the XSLT template below matches XHTML head elements and then selects the XHTML title element.
<xsl:template match="html:head" xmlns:html="http://www.w3.org/1999/xhtml">
  <fo:block><xsl:value-of select="html:title"/></fo:block>
</xsl:template>
However, element content can also be an issue. For example, consider the following elements.
<feature>uuid:a088355f-2fee-4001-ba17-e9926fca3eb5</feature>
<file path="HD:documents">2003:02:21 Notice.doc</file>
<book>
  <author>John Wogglebug</author>
  <title>Plato: An Analysis</title>
</book>
<classpath>xerces.jar:xalan.jar</classpath>
All of these look more or less like prefixed names, but none of them is. There are also many uses of colons in data that are not XML names but still might appear so at first glance.
<book id="isbn:2841771431" />
<a href="http://www.w3.org/">The W3C</a>
<file path="HD:books:effectivexml" />
<a href="mailto:firstname.lastname@example.org">Elliotte Rusty Harold</a>
<classpath value="/home/elharo/lib:/java/classes/."/>
<module name="XML::Parser"/>
And it gets even worse when you consider things like XPath expressions that may contain one or more prefixed names but are not themselves simple XML names. It is extremely difficult to tell whether a string containing a colon is an XML name when it is used in such places. Most XML APIs, including SAX, DOM, and JDOM, provide only marginal support for resolving prefixes in element content. Generally, they can tell you what URI any given prefix maps to at a certain point in the document, but they cannot tell you whether a word that contains a colon is a prefixed name.
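The ambiguity can be sketched in a few lines of Python using the standard xml.etree.ElementTree module. This is only an illustration under stated assumptions: the HD prefix binding, its URI, and the helper names prefix_bindings and guess_qname are all invented here, and the sketch ignores scoping by collecting every binding in the document into one map.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: the HD prefix and its URI are invented for illustration.
DOC = """<config xmlns:HD="http://example.com/hypothetical/hd">
  <file path="HD:documents">2003:02:21 Notice.doc</file>
  <classpath>xerces.jar:xalan.jar</classpath>
</config>"""


def prefix_bindings(xml_text):
    """Collect every prefix-to-URI binding declared anywhere in the document."""
    bindings = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        bindings[prefix] = uri
    return bindings


def guess_qname(text, bindings):
    """Naively treat 'prefix:rest' as a prefixed name whenever the prefix is bound."""
    prefix, sep, local = text.partition(":")
    if sep and prefix in bindings:
        return (bindings[prefix], local)
    return None  # no colon, or the part before the colon is not a bound prefix


bindings = prefix_bindings(DOC)
# The parser happily reports what HD maps to...
print(bindings)
# ...but it cannot say whether this path was ever meant to be a prefixed name.
print(guess_qname("HD:documents", bindings))
print(guess_qname("xerces.jar:xalan.jar", bindings))
```

The lookup succeeds for HD:documents even though the value is a file path, which is exactly the kind of false positive a consumer has no reliable way to rule out.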
XSLT and the W3C XML Schema Language both require prefixes in content, and this has caused no end of trouble for implementers. Don't follow their bad example. If you need to use namespaces in element content and/or attribute values, use the URI and the local name rather than the prefixed names. This way there will be no ambiguity.
RELAX NG and the W3C XML Schema Language provide an excellent contrast between the right way and the wrong way to fill this need. Both RELAX NG and the W3C XML Schema Language are namespace aware, and thus they need to specify the namespaces of the elements and attributes a schema declares. The W3C XML Schema Language uses prefixed names in attribute values to do this. For instance, here is a W3C XML Schema Language declaration of a Year element in the http://www.example.com/ namespace that has type xsd:gYear.
<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:ex="http://www.example.com/"
             name="ex:Year" type="xsd:gYear"/>
Namespace prefixes are used not only on element names but also in the values of the name and type attributes.
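To see what this costs a consumer, here is a rough Python sketch of resolving the prefixed name in the type attribute. It assumes the standard xml.etree.ElementTree module; the helper name resolve_attribute_qname is invented for illustration. Because the parsed tree does not retain prefix bindings, the code has to track the in-scope declarations itself while parsing.

```python
import io
import xml.etree.ElementTree as ET

XSD = b"""<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:ex="http://www.example.com/"
             name="ex:Year" type="xsd:gYear"/>"""


def resolve_attribute_qname(xml_bytes, attribute):
    """Resolve the first prefixed name found in `attribute` to (URI, local name)."""
    in_scope = []  # stack of (prefix, uri) pairs currently in scope
    events = ("start-ns", "end-ns", "start")
    for event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            in_scope.append(payload)      # payload is a (prefix, uri) tuple
        elif event == "end-ns":
            in_scope.pop()                # a binding has gone out of scope
        else:
            value = payload.get(attribute)  # payload is the element just opened
            if value is None:
                continue
            prefix, sep, local = value.partition(":")
            if not sep:                   # unprefixed: look for a default namespace
                prefix, local = "", value
            for bound_prefix, uri in reversed(in_scope):
                if bound_prefix == prefix:
                    return (uri, local)
            return (None, local)          # prefix was never declared
    return None


print(resolve_attribute_qname(XSD, "type"))
```

All of this bookkeeping exists only because the attribute value hides a second layer of structure that the parser does not expose.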
By contrast, RELAX NG would declare the same element like this:
<rng:element xmlns:rng="http://relaxng.org/ns/structure/1.0"
             name="Year" ns="http://www.example.com/">
  <rng:data type="gYear"
            datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" />
</rng:element>
Here namespace prefixes are used only on the element names. The namespace of the element being declared is identified by an ns attribute. The type set is identified by a URI in the datatypeLibrary attribute that isn't even a namespace URI. There are no prefixes anywhere in the attribute values. It's not that the W3C XML Schema Language approach binds a specific prefix. You can still change the prefix in instance documents. It's merely that the RELAX NG schema is much easier to parse and manipulate. There's no implicit substructure in the attribute values that the parser can't expose. It's all laid out in attribute values and element content.
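A minimal Python sketch shows how little work the RELAX NG form demands. It assumes the standard xml.etree.ElementTree module and uses the http://www.example.com/ namespace URI named in the prose; the function name describe is invented for illustration.

```python
import xml.etree.ElementTree as ET

RNG = """<rng:element xmlns:rng="http://relaxng.org/ns/structure/1.0"
             name="Year" ns="http://www.example.com/">
  <rng:data type="gYear"
            datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"/>
</rng:element>"""

# ElementTree exposes element names as {namespace-uri}local-name.
RNG_NS = "{http://relaxng.org/ns/structure/1.0}"


def describe(rng_text):
    """Return ((namespace, name), (datatype library, type)) for one declaration."""
    element = ET.fromstring(rng_text)
    data = element.find(RNG_NS + "data")
    # Plain attribute reads: no colon splitting and no prefix lookups.
    declared = (element.get("ns"), element.get("name"))
    datatype = (data.get("datatypeLibrary"), data.get("type"))
    return declared, datatype


print(describe(RNG))
```

Every piece of information arrives as a complete attribute value, so the code never needs to know which prefixes were in scope.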