Section 16.3. HTML Versus XHTML


16.3. HTML Versus XHTML

The majority of HTML is completely compatible with XHTML, and this book is devoted to that majority. In this chapter, however, we talk about the minority: where the HTML 4.01 standard and the XHTML DTD differs . If you truly desire to create documents that are both HTML and XHTML compliant, you must heed the various warnings and caveats we outline in the following sections.

The biggest differencethat's Difference with a capital D and that spells difficultis that writing XHTML documents requires much more discipline and attention to detail than even the most fastidious HTML author ever dreamed necessary. In W3C parlance, that means your documents must be impeccably well formed. Throughout the history of HTMLand in this bookauthors have been encouraged to create well- formed documents, but you have to break rank with the HTML standards for your documents to be considered well formed by XML standards.

Nonetheless, your efforts to master XHTML will be rewarded with documents that are well formed and a sense of satisfaction from playing by the new rules. You will truly benefit in the future, too: through XML, your documents will be able to appear in places you never dreamed would exist (mostly good places, we hope).

16.3.1. Correctly Nested Elements

One requirement of a well-formed XHTML document is that its elements are nested correctly. This isn't any different from the HTML standards: simply close the markup elements in the order in which you opened them. If one element is within another, the end tag of the inner element must appear before the end tag of the outer element.

Hence, in the following well-formed XHTML segment, we end the italics tag before we end the bold one, because we started italicizing after we started bolding the content:

 <b>Close the italics tag <i>first</i></b>. 

On the other hand, the following:

 <b>Well formed, this is <i>not!</b></i> 

is not well formed.

XHTML strictly enforces other nesting restrictions that have always been part of HTML but have not always been enforced. These restrictions are not formally part of the XHTML DTD; they are instead defined as part of the XHTML standard that is based on the DTD. [*]

[*] This is hair splitting within the XHTML standard. The XML standard has no mechanism to define which tags may not be placed within another tag. SGML, upon which XML is based, does have such a feature, but it was removed from XML to make the language easier to use and implement. As a result, these restrictions are simply listed in an appendix of the XHTML standard instead of being explicitly defined in the XHTML DTD.

Nesting restrictions include the following:

  • The <a> tag cannot contain another <a> tag.

  • The <pre> tag cannot contain <img> , <object> , <big> , <small> , <sub> , or <sup> tags.

  • The <button> tag cannot contain <input> , <select> , <textarea> , <label> , <button> , <form> , < fieldset > , <iframe> , or <isindex> tags.

  • The <label> tag cannot contain other <label> tags.

  • The <form> tag cannot contain other <form> tags.

These restrictions apply to nesting at any level. For example, because an <a> tag cannot contain any other <a> tags, any tag contained within that <a> tag cannot itself contain an <a> tag, even though it might otherwise .

16.3.2. End Tags

As we've documented throughout this book, any HTML tag that contains other tags or content has a corresponding end tag. However, one of the hallmarks of HTML (codified in the 4.01 standard) is that you may leave out the end tags if the processing agent can infer their presence. This is why most of us HTML authors commonly leave out the </p> end tag between adjacent paragraphs. Also, lists and tables can be complicated to wade through, and not having to visually stumble over all the </li> , </td> , </th> , and </tr> end tags certainly makes HTML easier to read, albeit a bit more ambiguous.

This is not so for XHTML. Every tag that contains other tags or content must have a corresponding end tag present, correctly nested within the XHTML document. A missing end tag is an error and renders the document noncompliant. Although seemingly draconian, this and the nesting rules nonetheless remove any and all ambiguities as to where one tag starts and another tag ends.

16.3.3. Handling Empty Elements

In XML, and thus XHTML, every tag must have a corresponding end tageven those that aren't allowed to contain other tags or content. Accordingly, XHTML expects the line break to appear as <br></br> in your document. Ugh.

Fortunately, there is an acceptable alternative: include a slash before the closing bracket of the tag to indicate its ending (e.g., <br /> ). If the tag has attributes, the slash comes after all the attributes so that an image could be defined as:

 <img src="kumquat.gif" /> 

While this notation may seem foreign and annoying to an HTML author, it actually serves a useful purpose. Any XHTML element that has no content can be written this way. Thus, an empty paragraph can be written as <p /> , and an empty table cell can be written as <td /> . This is a handy way to mark empty table cells .

Clever as it may seem, writing empty tags in this abbreviated way may confuse HTML browsers. To avoid compatibility problems, you can fool the HTML browsers by placing a space before the forward slash in an empty element using the XHTML version of its end tag. For example, use <br /> , with a space between the br and / , instead of the XHTML equivalents <br/> and <br></br> . Table 16-1 contains all of the empty HTML tags, expressed in their acceptable XHTML (transitional DTD) forms.

Table 16-1. HTML empty tags in XHTML format

<area />

<base />

<basefont />

<br />

<col />

<frame />

<hr />

<img />

<input />

<isindex />

<link />

<meta />

<param />

   

16.3.4. Case Sensitivity

If you thought getting all those end tags in the right place and cleaning up the occasional nesting error would make writing XHTML documents difficult, hold on to your hat. XHTML is case-sensitive for all tag and attribute names . In an XHTML document, <a> and <A> are different tags; src and SRC are different attributes, and so are sRc and SrC ! How forgiving HTML seems now.

The XHTML DTD defines all former HTML tags and attributes using lowercase letters . Uppercase tag or attribute names are not valid XHTML tags or attributes.

This can be a difficult situation for any author wishing to convert existing HTML documents into XHTML-compliant ones. Lots of web pages use uppercase tag and attribute names, to make them stand out from the surrounding lowercase content.

To become compliant, all those names must be converted to lowercaseeven the ones you used in your CSS stylesheet definitions. Fortunately, it's easy to accomplish this kind of change with various editing tools, and XHTML authoring systems should perform the conversion for you.

16.3.5. Quoted Attribute Values

As if all those case-sensitive attribute names weren't aggravating enough, XHTML requires that you enclose every attribute valueeven the numeric onesin double quotes. In HTML, you could quote anything your heart desired, but quote marks are required only if the attribute value included whitespace or other special characters . To be XHTML compliant, every attribute must be enclosed in quotes.

For example:

 <table rows=3> 

is wrong in XHTML. It is correctly written as:

 <table rows="3"> 

16.3.6. Explicit Attribute Values

Within HTML, there are a small number of attributes that have no value. Instead, their mere presence within a tag causes that tag to behave differently. In general, these attributes represent a sort of on/off switch for the tag, like the compact attribute for the various list tags or the ismap attribute for the <img> tag.

In XHTML, every attribute must have a value. Those without values must use their own names. Thus, compact in XHTML is correctly specified as compact= "compact", and checked becomes checked="checked" . Each must contain the required attribute value enclosed in quotes. Table 16-2 contains a list of attributes with the required XHTML values.

Table 16-2. XHTML values for valueless HTML attributes

checked="checked"

compact="compact"

declare="declare"

defer=" defer "

disabled="disabled"

ismap="ismap"

multiple="multiple"

noresize="noresize"

noshade="noshade"

nowrap="nowrap"

readonly="readonly"

selected="selected"


Be aware that this attribute value requirement may cause some old HTML browsers to ignore the attribute altogether. All the modern browsers don't have that problem, so the vast majority of users won't notice any difference. There is no good solution to this problem, other than distributing HTML 4.0-compliant browsers to the needy.

16.3.7. Handling Special Characters

XHTML is more sensitive than HTML to the use of the < and & characters in JavaScript and CSS declarations within your documents. In HTML, you can avoid potential conflicts by enclosing your scripts and stylesheets in comments ( <!-- and --> ). XML browsers, however, may simply remove all the contents of comments from your document, thereby deleting your hidden scripts and stylesheets.

To properly shield your special characters from XML browsers, enclose your styles or scripts in a CDATA section. This tells the XML browser that any characters contained within are plain old characters, without special meanings. For example:

 <script language="JavaScript"> <![CDATA[   ...JavaScript here...   ]]> </script> 

This doesn't solve the problem, though. HTML browsers ignore the contents of the CDATA XML tag but honor the contents of comment-enclosed scripts and stylesheets, whereas XML browsers do just the opposite . We recommend that you put your scripts and styles in external files and reference them in your document with appropriate external links.

Special characters in attribute values are problematic in XHTML, too. In particular, you always should write an ampersand within an attribute value using &amp; and not simply an & character. Similarly, play it safe and encode less-than and greater-than signs using their &lt; and &gt; entities. For example, while:

 <img src=seasonings.gif alt="Salt & pepper"> 

is perfectly valid HTML, you must write it as:

 <img src="seasonings.gif" alt="Salt &amp; pepper" /> 

for it to be compliant XHTML.

16.3.8. The id and name Attributes

Early versions of HTML used the name attribute with the <a> tag to create a fragment identifier in the document. This fragment could then be used in a URL to refer to a particular spot within a document. The name attribute was later added to other tags, such as <frame> and <img> , allowing those elements to be referenced by name from other spots in the document.

With HTML 4.0, the W3C added the id attribute to almost every tag. Like name , id lets you associate an identifier with nearly any element in a document for later reference and use, perhaps by a hyperlink or a script.

XHTML has a strong preference for the id attribute as the anchor of choice within a document. The name attribute is defined but formally deprecated for those elements that have historically used it. With widespread support of HTML 4.0 now in place, you should begin to avoid the name attribute where possible and instead use the id attribute to bind names to elements in your documents. If you must use the name attribute on certain tags, include an identical id attribute to ensure that the tag will behave similarly when processed by a strict XHTML browser.



HTML & XHTML(c) The definitive guide
Data Networks: Routing, Security, and Performance Optimization
ISBN: 596527322
EAN: 2147483647
Year: 2004
Pages: 189
Authors: Tony Kenyon

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net