8.5. Well-Formed XHTML

Web browsers are forgiving of sloppy HTML, but XHTML (being an XML application) requires that you play by the rigid rules of XML markup syntax. What sets XHTML documents apart from HTML documents is that they must follow the syntax rules of XML exactly (in other words, they must be well-formed). The sections below summarize the requirements of well-formed XHTML and offer some tips for backward compatibility with older browsers.

8.5.1. All-Lowercase Element and Attribute Names

In XML, all element and attribute names are case-sensitive, which means that <img>, <Img>, and <IMG> are parsed as different elements. In the reformulation of HTML into XHTML, all element and attribute names were defined in lowercase. When writing XHTML documents (and their associated style sheets), be sure that all tags and attribute names are written in lowercase. Attribute values are generally not required to be lowercase, although values drawn from a predefined set (such as the type attribute of input) are defined in lowercase in XHTML.
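The effect of case sensitivity is easy to observe with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the choice of Python and the sample markup are assumptions made for illustration and are not part of XHTML itself.

```python
import xml.etree.ElementTree as ET

# Two elements whose names differ only in case (sample markup, made up here).
root = ET.fromstring("<div><IMG/><img/></div>")
tag_names = [child.tag for child in root]
print(tag_names)  # ['IMG', 'img'] : the parser keeps them as distinct names

# A start tag and end tag that differ in case do not match each other,
# so the parser rejects the fragment as not well-formed.
try:
    ET.fromstring("<P>Hello</p>")
    mismatched_case_ok = True
except ET.ParseError:
    mismatched_case_ok = False
print(mismatched_case_ok)  # False
```

An XML parser alone does not know that XHTML defines only the lowercase names; it simply treats IMG and img as unrelated elements, which is why uppercase tags fail validation against the XHTML DTD.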

If you want to convert the upper- and mixed-case tags in an existing HTML file to well-formed, all-lowercase tags, try the HTML Tidy utility (tidy.sourceforge.net/) or Bare Bones Software's BBEdit (Macintosh only, www.bbedit.com), either of which can automate the process.

8.5.2. Quoted Attribute Values

XHTML requires that all attribute values be contained in quotation marks. Double or single quotation marks are acceptable, as long as the opening and closing marks around each value match. So where previously it was okay to omit the quotes around single words and numeric values, now you need to be careful that every attribute value is quoted.
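If attribute values are generated by a program, a quoting helper avoids missed quotes. The sketch below assumes Python and uses quoteattr from the standard xml.sax.saxutils module; the attribute() wrapper is a hypothetical helper introduced only for this example.

```python
from xml.sax.saxutils import quoteattr

def attribute(name, value):
    """Return name=value with the value quoted and escaped for XML."""
    return f"{name}={quoteattr(str(value))}"

# A numeric value that HTML authors often left unquoted.
print(attribute("width", 50))              # width="50"

# A value containing double quotes is wrapped in single quotes instead.
print(attribute("alt", 'The "big" show'))  # alt='The "big" show'
```

Both outputs are acceptable in XHTML, because each value is enclosed in a matching pair of quotation marks.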

8.5.3. Element Termination

In HTML, it is okay to omit the end tag for certain block elements (such as p and li). The beginning of a new block element is enough to trigger the browser to parse the previous one as closed. Not so in XHTML. To be well-formed, every container element must have its end tag, or it registers as an error and renders the document noncompliant.
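A quick way to see the difference is to hand both styles of markup to an XML parser. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; is_well_formed() and the sample lists are hypothetical names and data used only for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

# HTML-style list with the li end tags omitted.
html_style = "<ul><li>Apples<li>Pears</ul>"

# The same list with every container element terminated.
xhtml_style = "<ul><li>Apples</li><li>Pears</li></ul>"

print(is_well_formed(html_style))   # False
print(is_well_formed(xhtml_style))  # True
```

This checks well-formedness only. It does not validate the markup against an XHTML DTD, so a well-formed document can still be invalid XHTML.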

8.5.4. Empty Elements

This need for termination extends to empty elements as well. So instead of just inserting a line break as <br>, XHTML requires the element to be terminated. You can simply add a slash before the closing bracket, indicating the element's ending. So in XHTML, a line break is entered as <br/>.

Closed empty elements can cause some browsers (namely Netscape 4) to complain, and even newer browsers can have problems when the content is sent as text/html. To keep your XHTML digestible to those browsers, be sure to add a space before the closing slash (<br />). This allows the closed empty tag to slide right through. See Table 8-1 for a complete list of empty elements.
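Some XML tools already write empty elements in the space-before-slash form. As one illustration (an assumption for this example: Python's standard xml.etree.ElementTree serializer, which may differ from the tool you use), building a paragraph with a line break and serializing it produces the compatible syntax:

```python
import xml.etree.ElementTree as ET

# Build <p> with a line break between two runs of text.
p = ET.Element("p")
p.text = "First line"
br = ET.SubElement(p, "br")   # empty element: no text and no children
br.tail = "Second line"       # text that follows the <br /> inside <p>

serialized = ET.tostring(p, encoding="unicode")
print(serialized)  # <p>First line<br />Second line</p>
```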

8.5.5. Explicit Attribute Values

XML (and therefore XHTML) does not support attribute minimization, the SGML practice in which certain attributes can be reduced to just the attribute value. So while HTML has many minimized attributes such as checked and nowrap, in XHTML, the values need to be explicitly declared as checked="checked" and nowrap="nowrap". Table 8-2 lists the attributes that were minimized in HTML but require values in XHTML.

Table 8-2. Explicit attribute values
checked="checked"	disabled="disabled"	noresize="noresize"
compact="compact"	ismap="ismap"	nowrap="nowrap"
declare="declare"	multiple="multiple"	readonly="readonly"
defer="defer"	noshade="noshade"	selected="selected"
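When converting existing HTML, minimized attributes can be expanded mechanically. The sketch below assumes Python's standard html.parser module, which lowercases attribute names and reports a minimized attribute with a value of None; AttributeExpander and expand_attributes() are hypothetical names introduced for this example, and the code rewrites only the attribute lists, not whole documents.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import quoteattr

class AttributeExpander(HTMLParser):
    """Collect each start tag's attributes in explicit, quoted form."""

    def __init__(self):
        super().__init__()
        self.expanded = []

    def handle_starttag(self, tag, attrs):
        pairs = []
        for name, value in attrs:
            if value is None:      # minimized attribute such as CHECKED
                value = name       # becomes checked="checked"
            pairs.append(f"{name}={quoteattr(value)}")
        self.expanded.append(" ".join(pairs))

def expand_attributes(html):
    """Return one explicit attribute string per start tag in the input."""
    parser = AttributeExpander()
    parser.feed(html)
    parser.close()
    return parser.expanded

print(expand_attributes("<INPUT TYPE=checkbox CHECKED>"))
# ['type="checkbox" checked="checked"']
```

The same pass also takes care of the lowercase and quoting requirements described earlier, since the parser normalizes names and quoteattr adds the quotation marks.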

8.5.6. Nesting Requirements

It has always been a rule in HTML that elements should be properly nested within one another. The closing tag of a contained element should always appear before the closing tag of the element that contains it. In XHTML, this rule is strictly enforced. So be sure that your elements are nested correctly, like this:

 <p>I can <em>fly!</em></p>

and not overlapping like this:

 <p>I can <em>fly!</p></em>

In addition, XHTML enforces other nesting restrictions that have always been a part of the HTML specification. The XHTML DTD includes a special "Content Models for Exclusions" note that reinforces the following:

An a element cannot contain another a element.
The pre element cannot contain img, object, applet, big, small, sub, sup, font, or basefont.
The form element may not contain other form elements.
A button element cannot contain a, form, input, select, textarea, label, button, iframe, or isindex.
The label element cannot contain other label elements.
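The five restrictions above can be checked with a short tree walk. This is a sketch under stated assumptions: it uses Python's standard xml.etree.ElementTree module, treats each prohibition as applying at any depth of nesting, and expects markup without an XML namespace (a namespaced document would need the namespace stripped from tag names first). PROHIBITED and find_violations() are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Container element -> elements it may not contain (taken from the list above).
PROHIBITED = {
    "a": {"a"},
    "pre": {"img", "object", "applet", "big", "small",
            "sub", "sup", "font", "basefont"},
    "form": {"form"},
    "button": {"a", "form", "input", "select", "textarea",
               "label", "button", "iframe", "isindex"},
    "label": {"label"},
}

def find_violations(markup):
    """Return (container, offender) pairs for every prohibited nesting."""
    violations = []

    def walk(elem, ancestors):
        for ancestor in ancestors:
            if elem.tag in PROHIBITED.get(ancestor, ()):
                violations.append((ancestor, elem.tag))
        for child in elem:
            walk(child, ancestors + [elem.tag])

    walk(ET.fromstring(markup), [])
    return violations

nested_links = '<p><a href="#x">one <em><a href="#y">two</a></em></a></p>'
print(find_violations(nested_links))  # [('a', 'a')]
```

Note that the nested link is wrapped in an em element; the check still reports it because the inner a is a descendant of the outer one.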

8.5.7. Character Entities

XHTML (as a function of XML) is extremely fussy about special characters such as <, >, and &. All special characters should be represented in the XHTML document by their character entities instead. Common character entities are listed in Table 10-3, and the complete list appears in Appendix C.

Character entity references should be used in place of characters such as < and & in regular text content, as shown in these examples:

 <p> the value of A &lt; B </p>
 <p> Laverne &amp; Shirley </p>

In places where it was common to use special characters, such as in the title of a document or in an attribute value, it is now necessary to use the character entity instead. For instance, the following worked just fine in HTML, despite being invalid:

 <img src="/books/4/439/1/html/2/puppets.jpg" alt="Crocco & Lynch">

But in XHTML, the value must be written like this:

 <img src="/books/4/439/1/html/2/puppets.jpg" alt="Crocco &amp; Lynch" />

This applies to ampersands that occur in URLs as well:

 <a href="mailto:jen@example.com?subject=subject&amp;cc=person">Email Jen</a>
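When text or URLs come from a program rather than from hand-written markup, the escaping can be automated. The following sketch assumes Python and uses escape from the standard xml.sax.saxutils module together with urlencode from urllib.parse; the sample strings mirror the examples in this section.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Text content: escape() replaces &, <, and > with their entities.
print(escape("the value of A < B"))  # the value of A &lt; B
print(escape("Laverne & Shirley"))   # Laverne &amp; Shirley

# URL with a query string: build it first, then escape the ampersand
# before placing the result in an href attribute.
query = urlencode({"subject": "subject", "cc": "person"})
href = escape("mailto:jen@example.com?" + query)
print(href)  # mailto:jen@example.com?subject=subject&amp;cc=person
```

The escaping affects only the markup source. A browser or XML parser converts &amp; back to a plain ampersand before using the URL.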

8.5.8. Protecting Scripts

It is common practice to enclose scripts and style sheets in comments (between <!-- and -->). Unfortunately, XML software thinks of comments as unimportant information and may simply remove the comments from a document before processing it. To avoid this problem, use an XML CDATA section instead. Content enclosed in <![CDATA[...]]> is considered simple text characters and is not parsed (for more information, see Chapter 7). For example:

 <script type="text/javascript">
 <![CDATA[ ...JavaScript here... ]]>
 </script>

The problem with this method is backward compatibility. HTML browsers ignore the contents of the XML CDATA section, while XML browsers ignore the contents of comment-enclosed scripts and style sheets, so you can't please everyone. The common workaround is to avoid both CDATA sections and comments by putting your scripts and styles in separate files and referencing them from the document with the appropriate external links. Although not required, externalizing scripts and styles is strongly recommended, both for XHTML and for document management in general.
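How an XML processor treats the two approaches can be observed directly. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module with its default settings (other XML software may keep comments); the one-line script is made-up sample content.

```python
import xml.etree.ElementTree as ET

# Script hidden in a comment: the default parser discards the comment,
# so the script element ends up with no content at all.
commented = ET.fromstring("<script><!-- if (a < b) go(); --></script>")
print(repr(commented.text))  # None

# Script in a CDATA section: the content survives as plain text, and the
# < character inside it is not treated as markup.
cdata = ET.fromstring("<script><![CDATA[ if (a < b) go(); ]]></script>")
print(repr(cdata.text))      # ' if (a < b) go(); '
```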

8.5.9. id and name Attributes

In HTML, the name attribute may be used for the elements a, applet, form, frame, iframe, img, and map. The name attribute and the id attribute may be used in HTML to identify document fragments.

In XML, only id may be used for fragments and there may only be a single id attribute per element. XHTML documents must use id instead of name for identifying document fragments in the aforementioned elements. In fact, the name attribute for these elements has been deprecated in the XHTML 1.0 specification.

Once again, we run into an issue with browser compatibility. Some legacy browsers (namely Netscape 4) do not recognize the id attribute as an identifier for a document fragment (current standards-conformant browsers handle it just fine). If your fragment identifiers must work in Netscape 4, use both name and id, giving them the same value. Unfortunately, this is likely to cause validation errors if you are complying with XHTML 1.0 Strict or XHTML 1.1, so use only the id attribute for fragment identifiers whenever possible. The name attribute remains valid where it serves a different purpose, most notably on form control elements like input, where it names the value sent when the form is submitted.
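To make the role of id concrete, the sketch below resolves a fragment identifier the way an XML-aware tool might, by looking only at id attributes. It assumes Python's standard xml.etree.ElementTree module; find_fragment() and the sample markup are hypothetical and shown for illustration only.

```python
import xml.etree.ElementTree as ET

def find_fragment(root, fragment):
    """Return the first element whose id equals the fragment name, or None."""
    for elem in root.iter():
        if elem.get("id") == fragment:
            return elem
    return None

doc = ET.fromstring(
    "<div>"
    '<h2 id="intro">Introduction</h2>'
    # Both name and id, with the same value, for legacy browser support.
    '<h2><a name="contact" id="contact">Contact</a></h2>'
    "</div>"
)

target = find_fragment(doc, "contact")
print(target.tag, target.get("name"))  # a contact
print(find_fragment(doc, "missing"))   # None
```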