To complete this discussion of markup languages, we look now at XHTML, a marriage of HTML and XML. As mentioned previously, HTML, although a very powerful markup language, is extremely challenging to work with because of the lack of strict requirements on how markup elements are put together. HTML documents can have some extremely odd markup tag combinations or missing tags, and browsers will still spend time and energy trying to parse and render the content represented in them.
To solve the problem of HTML's lack of defined structure, XHTML was designed to eventually replace it. It is much stricter and cleaner than HTML, and nearly all modern browsers support it. (It is difficult to find any released since the year 2000 that do not support it.) In a nutshell, XHTML consists of all the HTML elements but requires that they conform to XML document syntax and well-formedness rules.
People might wonder why we would bother with such an endeavor. After all, it does seem as though modern browsers function quite well with HTML today. Why fix something that is not obviously broken?
We would still want to do this for a number of reasons. Most modern browsers have to include large amounts of code to account for all the possibilities that malformed or "wonky" HTML presents. This slows down development time significantly, increases the bugginess of applications interpreting the HTML, and results in a much larger memory footprint. It also slows down the processing of the HTML, and different browsers will invariably have slightly different interpretations of unusual tag sequences and render them differently.
Throw into this mix the fact that many smaller devices, such as cell phones, PDAs, and other handhelds, have very limited memory and processing power, and there is little appetite for spending non-trivial amounts of either on ill-formed markup.
If all of our HTML were also XML, a single XML implementation on any platform would verify that the document was well formed and would let programs focus on the rendering of the content. This would reduce time spent processing the markup and probably many of the inconsistencies seen in output too.
We could even use schemas and DTDs to further verify that the input was correct XHTML, reducing the need for us to worry about structure.
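As a rough illustration of letting a generic XML implementation do the well-formedness check, the following minimal sketch uses the XML parser in Python's standard library. The helper name is_well_formed is hypothetical, and the sketch only checks well-formedness; it does not validate against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise.

    Hypothetical helper: it checks well-formedness only and performs
    no DTD or schema validation.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Overlapping elements: many HTML browsers tolerate this, an XML parser does not.
print(is_well_formed("<p><b>bold <i>both</b> italic</i></p>"))

# The same content with the elements properly nested.
print(is_well_formed("<p><b>bold <i>both</i></b><i> italic</i></p>"))
```

The first call should report False and the second True, because only the second fragment nests its elements correctly.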
How to Work with XHTML
The good news about XHTML is that it is extremely similar to the HTML with which you are likely already familiar. If you just make a few small changes, your HTML documents will behave correctly as XHTML and will be quicker, cleaner, and fully compatible with all new browsers.
XHTML Is XML
The first and most important thing to remember is that XHTML documents are XML documents, which means that your XHTML documents must be well-formed XML documents.
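To do this, verify the following: elements are properly nested and never overlap, every non-empty element has a matching closing tag, and every empty element is closed as well.
It is this last point with which users might be least familiar. It is extremely common in HTML to write tags as follows: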
<HR size='1'> <BR><BR> <IMG src='/books/3/445/1/html/2/oink_pig.jpg'>
In XHTML, all these are rewritten as empty tags:
<hr size='1'/> <br/><br/> <img src='/books/3/445/1/html/2/oink_pig.jpg'/>
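If you have many such tags to update, the rewrite can be partly automated. The sketch below is one possible approach, not a robust converter: the function name close_empty_tags and the short EMPTY_ELEMENTS list are assumptions made for illustration, and a regular expression like this will not cope with every piece of real-world HTML.

```python
import re

# Assumed, deliberately incomplete list of elements that never have content.
EMPTY_ELEMENTS = ("br", "hr", "img", "input", "meta", "link")

# Matches an opening tag for one of the elements above, closed or not.
_EMPTY_TAG = re.compile(
    r"<(" + "|".join(EMPTY_ELEMENTS) + r")\b([^<>]*?)\s*/?>",
    re.IGNORECASE,
)


def close_empty_tags(html: str) -> str:
    """Rewrite empty tags such as <BR> as <br/>.

    Only the tag name is lowercased; attribute text is copied unchanged.
    """
    def rewrite(match: re.Match) -> str:
        name = match.group(1).lower()
        attributes = match.group(2)
        return "<" + name + attributes + "/>"

    return _EMPTY_TAG.sub(rewrite, html)


print(close_empty_tags("<HR size='1'> <BR><BR>"))
```

Tags that are already closed pass through unchanged, so the function can be run more than once over the same text.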
The Minimum XHTML Document
All XHTML documents must contain at least the following:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title></title>
</head>
<body>
</body>
</html>
The first line defines the degree of strictness to which the given XHTML document conforms, with possible values being strict, transitional, and frameset (for documents that use frames). We will use the transitional setting mostly because it gives us a bit of flexibility when writing our XHTML documents.
As you can see, most of the elements that you would have in normal HTML are here in the XHTML document, except that now many are no longer optional.
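To make the required skeleton concrete, here is a small sketch that parses a document and checks that the html, head, title, and body elements are all present. The function name has_minimum_structure is hypothetical, the check assumes the document declares no XML namespace, and the DTD named in the DOCTYPE is not downloaded or used for validation.

```python
import xml.etree.ElementTree as ET

MINIMAL_XHTML = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title></title>
</head>
<body>
</body>
</html>"""


def has_minimum_structure(markup: str) -> bool:
    """Check for the html, head/title, and body elements.

    Hypothetical helper: assumes no XML namespace on the elements and
    does not validate against the DTD.
    """
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return False  # not even well formed
    return (
        root.tag == "html"
        and root.find("head/title") is not None
        and root.find("body") is not None
    )


print(has_minimum_structure(MINIMAL_XHTML))
```

The comparisons use "is not None" on purpose, because an element with no children (such as an empty title) is treated as false when tested directly.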
Elements and Attributes
XHTML places a few other minor restrictions on our elements and attributes. Specifically, all markup tag names must be in lowercase. Thus, in HTML where you used to write
<TABLE width='100%'> <TR> <TD>hi mom!</TD> </TR> </TABLE>
you would now write this:
<table width='100%'> <tr> <td>hi mom!</td> </tr> </table>
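A check for stray uppercase names is easy to sketch once the markup is well formed. The helper below, with the hypothetical name non_lowercase_names, lists every element name that is not lowercase; it also reports attribute names, on the understanding that XHTML 1.0 applies the same lowercase rule to them.

```python
import xml.etree.ElementTree as ET


def non_lowercase_names(markup: str) -> list:
    """Return element and attribute names that are not all lowercase.

    Hypothetical helper: expects well-formed markup and raises
    xml.etree.ElementTree.ParseError otherwise.
    """
    root = ET.fromstring(markup)
    offenders = []
    for element in root.iter():  # document order, starting at the root
        if element.tag != element.tag.lower():
            offenders.append(element.tag)
        for attribute in element.attrib:
            if attribute != attribute.lower():
                offenders.append(attribute)
    return offenders


print(non_lowercase_names("<TABLE width='100%'><TR><TD>hi mom!</TD></TR></TABLE>"))
print(non_lowercase_names("<table width='100%'><tr><td>hi mom!</td></tr></table>"))
```

The first call should list TABLE, TR, and TD, and the second should return an empty list.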
Finally, attributes must be well formed in XHTML. This means that two common practices in HTML documents (omitting the quotes around attribute values, and skipping the attribute name altogether and just putting the value in the tag) cause your XHTML documents to be ill formed. For example, the following
<table width=100%> <input checkbox>
should instead be this:
<table width='100%'> <input type='checkbox'/>
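An XML parser will point at attribute problems like these for you. The following sketch, using a hypothetical helper named first_syntax_error, returns the line, column, and message of the first error the Python standard library parser reports, or None when the markup is well formed. The exact wording of the message comes from the underlying parser and may differ between versions.

```python
import xml.etree.ElementTree as ET


def first_syntax_error(markup: str):
    """Return (line, column, message) for the first XML error, or None.

    Hypothetical helper built on the standard library parser.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError as error:
        line, column = error.position
        return line, column, str(error)
    return None


# Unquoted attribute value.
print(first_syntax_error("<table width=100%></table>"))

# Attribute value with no attribute name.
print(first_syntax_error("<input checkbox/>"))

# Quoted value and named attribute: no error expected.
print(first_syntax_error("<form><table width='100%'></table><input type='checkbox'/></form>"))
```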
If you have thus far been writing largely well-formed HTML, this transition should be even less notable for you.
Other Minor Changes
One other minor change of which you might want to take note is that HTML frequently uses the name attribute, specifically on the A, APPLET, FRAME, IFRAME, IMG, and MAP tags. For these elements, you should now use the id attribute rather than name.
For elements in HTML forms, however, you should continue to use the name attribute as the name associated with values sent along with the resulting request.
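To find the places where name should give way to id, something like the following sketch may help. The function name elements_needing_id and the NAME_DEPRECATED_ON set are illustrative assumptions (the set simply mirrors the elements listed above), and form controls such as input are deliberately left alone.

```python
import xml.etree.ElementTree as ET

# Lowercase versions of the elements listed in the text (an assumption
# for this sketch, not an exhaustive list).
NAME_DEPRECATED_ON = {"a", "applet", "frame", "iframe", "img", "map"}


def elements_needing_id(markup: str) -> list:
    """Return (tag, name) pairs for elements that use name but have no id.

    Hypothetical helper: expects well-formed markup without namespaces.
    """
    root = ET.fromstring(markup)
    return [
        (element.tag, element.get("name"))
        for element in root.iter()
        if element.tag in NAME_DEPRECATED_ON
        and "name" in element.attrib
        and "id" not in element.attrib
    ]


sample = (
    "<body>"
    "<a name='top'></a>"
    "<img src='logo.png' name='logo' id='logo'/>"
    "<form><input type='text' name='user'/></form>"
    "</body>"
)
print(elements_needing_id(sample))
```

Only the anchor should be reported here: the image already carries an id, and the input element is a form control whose name is still needed for the submitted request.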
Converting to XHTML
The conversion process to XHTML from HTML for existing code is usually not a traumatic one and can be performed in a short period of time, especially if all the HTML is generated from a few common places in your source code.
Using the rules discussed previously, you can usually make the necessary changes in a few places (making sure all element tags are properly closed, all names are in lowercase, and so on), and verify that there are no overlapping elements.
You can find conversion and validation tools on the Internet to help you with these efforts; these tools help you identify exactly what you need to address when converting documents. The site http://www.w3.org, the World Wide Web Consortium home page, is a good place to start; from there, you can read the XHTML specification and get access to tutorials and validators.
Except for small snippets to demonstrate concepts, we will always use XHTML in this book, and even those snippets will be well-formed XHTML, lacking only in the appropriate headers or complete set of elements needed.