Understand XHTML Modularization | How to Do Everything with HTML & XHTML

In the Web markup world you have two extremes. HTML is an easy-to-learn language, but it isn’t extensible. You’re stuck with it as it is. On the other hand, there is XML, which is more difficult to learn but gives you the freedom to create your own markup language from scratch. But as you think about it, you might begin to wonder if using XML is worth the trouble. After all, HTML might not be perfect, but it does provide pretty much everything you need for doing Web pages. Why should you “reinvent the wheel” by trying to create your own markup language? Isn’t it better to stick with something that has already been developed?

Well, what if you could have the benefits of HTML markup (most of it, anyway) along with the ability to extend it by adding your own elements? That’s what you have in XHTML 1.1, or Module-based XHTML.

Understand XHTML 1.0

The first version of XHTML (1.0) could have been called Transitional XHTML, because it was designed to begin moving users of HTML in the direction of XML. Of course, the W3C didn’t want to make the change too suddenly. That’s why they instituted the three different DTDs you learned about earlier: Transitional, Frameset, and Strict. By doing this, they were able to slowly phase out the presentational elements and attributes by deprecating them. However, these three DTDs were never intended to be ends in themselves. They were intended to be stepping stones, as it were, to a much different method for creating Web pages.

Understand XHTML 1.1

XHTML 1.1, or Module-based XHTML, is the current recommendation of the W3C. This specification contains a number of significant changes from the original version of XHTML. For starters, the deprecated attributes and elements have been dropped entirely. In other words, there is no longer a “transitional” DTD that allows you to work with those presentational elements and attributes you may have grown to love. A second difference is that the lang attribute has been removed for all elements and replaced with the xml:lang attribute, and the name attribute has been removed from the <a> and <map> elements in favor of id. Perhaps the most significant change of all has been the grouping together of related elements into modules.

Understand the Concept of XHTML 1.1 Modules

To understand XHTML modules and how they can be extended, think of a toolbox. Instead of providing an alphabetical listing of elements and attributes, the elements of module-based XHTML are organized into related subgroups. Each module is a tool for accomplishing a particular task. For example, all the table elements are grouped together in the Tables Module, form elements are combined into a Forms Module, and so on. Thus, you can create your own custom markup language by choosing only the tools (modules) you need. You can also design your own modules and add them to your customized markup language. Table 16-1 lists the different modules of XHTML 1.1 and the elements that are contained in each.

Table 16-1: The Modules of XHTML 1.1
Module Name	Includes These Elements
Structure Module	<body>, <head>, <html>, <title>
Text Module	<abbr>, <acronym>, <address>, <blockquote>, <br />, <cite>, <code>, <dfn>, <div>, <em>, <h1>, <h2>, <h3>, <h4>, <h5>, <h6>, <kbd>, <p>, <pre>, <q>, <samp>, <span>, <strong>, <var>
Hypertext Module	<a>
List Module	<dl>, <dt>, <dd>, <ol>, <ul>, <li>
Applet Module (deprecated in favor of the Object Module)	<applet>, <param />
Presentation Module	<b>, <big>, <hr />, <i>, <small>, <sub>, <sup>, <tt>
Edit Module	<del>, <ins>
Bi-directional Text Module	<bdo>
Forms Module	<form>, <input />, <select>, <option>, <textarea>, <button>, <fieldset>, <label>, <legend>, <optgroup>
Basic Forms Module	<form>, <input />, <label>, <select>, <option>, <textarea>
Tables Module	<caption>, <table>, <td>, <th>, <tr>, <col />, <colgroup>, <tbody>, <thead>, <tfoot>
Image Module	<img />
Client-side Image Map Module	<a>, <area />, <img />, <input />, <map>, <object>
Server-Side Image Map Module	<img />, <input />
Object Module	<object>, <param />
Frames Module	<frameset>, <frame />, <noframes>
Target Module	<a>, <area>, <base>, <link>, <form>
Iframe Module	<iframe>
Intrinsic Events Module	<a>, <area>, <frameset>, <form>, <body>, <label>, <input>, <select>, <textarea>, <button>
Metainformation Module	<meta />
Scripting Module	<noscript>, <script>
Stylesheet Module	<style>
Link Module	<link />
Base Module	<base />
Name Identification Module	<a>, <applet>, <form>, <frame>, <iframe>, <img>, <map>
Legacy Module	<basefont>, <center>, <dir>, <font>, <isindex>, <menu>, <s>, <strike>, <u>. This module also restores some of the deprecated attributes of the following elements: <body>, <br />, <caption>, <div>, <dl>, <h1> through <h6>, <hr />, <img />, <input />, <legend>, <li>, <ol>, <p>, <pre>, <script>, <table>, <tr>, <th>, <td>, <ul>

Note

A module is usually a collection of related elements. However, a module can also be a collection of other modules.

Understand How Modules Work

What makes an XHTML module different from an ordinary collection of HTML elements? It has to do with the XHTML 1.1 DTD. A DTD (Document Type Definition) is the blueprint for any markup language created in XML. The module-based XHTML DTD is broken up into smaller subset DTDs that can be referenced individually. For example, in developing your own markup language you can choose to include the Forms Module but ignore the Tables Module. You can also write a list of your own elements and attributes (in other words, your own DTD) and save it as a module. This is what enables you to extend XHTML to include your own markup. The downside is that you must be able to write your own DTD. Sound complicated? Well, actually, it is.
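To make the toolbox idea concrete, here is a minimal Python sketch that treats a few modules from Table 16-1 as named sets of elements and builds a custom vocabulary by picking some modules and leaving others out. It is only an illustration: the `MODULES` dictionary and the `build_vocabulary` function are hypothetical, only a few elements of each module are listed, and real XHTML modularization is done by combining DTD modules, not by running Python.

```python
# Hypothetical model of a few XHTML 1.1 modules from Table 16-1.
# Only some elements of each module are listed; real modularization
# combines DTD files, so this just illustrates "pick the tools you need".
MODULES = {
    "Structure": {"html", "head", "title", "body"},
    "Text": {"p", "h1", "h2", "em", "strong", "div", "span", "br"},
    "Hypertext": {"a"},
    "List": {"ul", "ol", "li", "dl", "dt", "dd"},
    "Forms": {"form", "input", "select", "option", "textarea", "button",
              "fieldset", "label", "legend", "optgroup"},
    "Tables": {"table", "caption", "tr", "td", "th", "col", "colgroup",
               "thead", "tbody", "tfoot"},
}


def build_vocabulary(chosen, custom=None):
    """Return the set of element names allowed by the chosen modules.

    `custom` is an optional extra set of your own elements, standing in
    for a module you would write as your own DTD.
    """
    elements = set()
    for name in chosen:
        elements |= MODULES[name]
    if custom:
        elements |= set(custom)
    return elements


# Include the Forms Module, ignore the Tables Module, add a custom <logo>.
vocab = build_vocabulary(["Structure", "Text", "Hypertext", "List", "Forms"],
                         custom={"logo"})
print("form" in vocab, "table" in vocab, "logo" in vocab)  # True False True
```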

Did You Know?—There Are Four Required Modules

If you want to create truly module-based pages based on the XHTML DTD, you must include the four core modules: Structure, Text, Lists, and Hypertext. If you do not include the core modules, your document cannot be considered part of the XHTML family of documents. It may be a perfectly valid XML document but not truly XHTML.
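The rule about core modules amounts to a simple membership check. The following Python sketch is hypothetical (no tool is implied to work this way); it takes a proposed list of module names, using the names from Table 16-1, and reports which of the four core modules are missing.

```python
# Hypothetical helper: module names follow Table 16-1, and the four core
# modules are the ones every XHTML-family document type must include.
CORE_MODULES = {"Structure", "Text", "Hypertext", "List"}


def missing_core_modules(chosen):
    """Return, in sorted order, any core modules left out of a selection."""
    return sorted(CORE_MODULES - set(chosen))


def is_xhtml_family(chosen):
    """True only when every core module has been included."""
    return not missing_core_modules(chosen)


print(is_xhtml_family(["Structure", "Text", "Hypertext", "List", "Tables"]))  # True
print(missing_core_modules(["Structure", "Text", "Forms"]))  # ['Hypertext', 'List']
```

A selection that fails this check might still describe perfectly valid XML, just not an XHTML-family document type.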

HTML is intuitive and easy to learn. Its immediate successor, “transitional” XHTML, is also easy to learn. Granted, the rules are a bit stricter, but they are not daunting. Unfortunately, ease of use ends with XHTML 1.0. Although XHTML 1.1 is a great concept, its complexity has moved it out of the realm of the casual page author. It is unlikely that the “Mom and Pop” users who have created Web pages with HTML will be able to make the jump to modular XHTML without special software. On top of that, XHTML 2.0 (currently in the working draft stage) moves even further away from its HTML predecessor. Virtually all presentational elements will be removed, along with familiar elements such as <img />, and the markup will be stripped back to a bare minimum. Users will have no choice but to design a good portion of their own markup. To give you a taste of what that will be like, the final project for this book will be to create a page in XML.

Tip

If you want to make sure your documents are ready for the change to XHTML 2.0, you can move a long way in this direction by following the principles laid out at the end of this chapter.

Project 25: Create a Page with XML

Now that you’ve learned some XHTML, you’ll find that you are on familiar ground from the very beginning as you start to explore XML. An XML document is built with elements, attributes, and values, just like the HTML documents you already know. The primary difference is that XML has no predefined set of tags: authors create their own elements and decide which attributes and values those elements accept. For example, the following code listing shows how the markup in an XML document might look:

<webpage>
  <heading>Welcome to my XML page</heading>
  <para1>This is my page of the future.</para1>
  <logo> <!-- This will display a logo --> </logo>
  <para2>It is written in XML.</para2>
  <closing>Good bye!</closing>
</webpage>

To see how an XML page comes together, open a text editor and type in the preceding code listing. Save the page as a text file named webpage.xml.
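If you have Python available, a quick way to confirm that the page is well-formed is to run it through the standard library’s `xml.etree.ElementTree` parser. This sketch keeps the markup in a string so it is self-contained, and it uses the element names `para1` and `para2` that the DTD later in this project declares; to check your saved file instead, you could call `ET.parse("webpage.xml").getroot()`.

```python
import xml.etree.ElementTree as ET

# The webpage markup, held in a string so the example runs on its own.
WEBPAGE = """<webpage>
  <heading>Welcome to my XML page</heading>
  <para1>This is my page of the future.</para1>
  <logo> <!-- This will display a logo --> </logo>
  <para2>It is written in XML.</para2>
  <closing>Good bye!</closing>
</webpage>"""

root = ET.fromstring(WEBPAGE)  # raises ET.ParseError if not well-formed
print(root.tag)  # webpage
for child in root:
    # strip() removes indentation whitespace; the comment is not kept as text
    print(child.tag, repr((child.text or "").strip()))
```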

Note

If you want to view your XML page, you will need Internet Explorer 4 or higher, Netscape 6, or Opera.

Create a Document Type Definition (DTD)

A DTD is where you write down the rules for your new markup language and define your elements. You also decide what attributes and values you want each element to accept. Then, when a validating parser loads your document, it reads the rules in your DTD and checks the document against them. The following code listing shows what a simple DTD for our webpage.xml would look like:

<!ELEMENT webpage (heading, para1, logo, para2, closing)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT para1 ANY>
<!ELEMENT logo ANY>
<!ELEMENT para2 ANY>
<!ELEMENT closing (#PCDATA)>

For the sake of simplicity, this particular DTD is keeping to the very basics: elements. No attributes, values, or entities were defined, although they could have been. If you look at the DTD line by line, it’s not difficult to figure out what each line is doing:

<!ELEMENT webpage (heading, para1, logo, para2, closing)> The first statement is made up of three parts: the ELEMENT declaration, the element name (webpage), and the element’s “children” (the other elements that must be nested inside it, in the order listed).

The first element declaration is arguably the most important part of your DTD because in it you define your root element. You can have only one root element in an XML document, and all the others must be nested within it. For example, in an HTML page, the root element is <html>. All other elements must go inside it. For the page described in the preceding example, the root element is <webpage>.

<!ELEMENT heading (#PCDATA)> This line defines the <heading> element. Again, it has three parts: the ELEMENT declaration (this tells the parser that you are defining an element), the element’s name (heading), and the type of data it can receive (#PCDATA). The #PCDATA statement stands for parsed character data, which simply means that text can be contained in that element.
<!ELEMENT para1 ANY> By now you should be getting a feel for what these lines are doing. This line declares and defines the <para1> element. The main difference in this line is that, instead of (#PCDATA), it uses the ANY keyword, which allows the element to contain any mix of text and other declared elements. For example, <para1> can contain other elements, whereas the preceding <heading> element cannot. Notice that ANY is written without parentheses; (ANY) would instead mean “exactly one child element named ANY.” Although all the lines in this DTD are virtually identical, except for the element names and content types, in a more complex DTD you will have various types of elements (other than the #PCDATA type), attributes, values, and even entities.
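Python’s standard library has no validating parser, but its expat binding does report each element declaration through the `ElementDeclHandler` callback, as a tuple of (content type, quantifier, name, children). The sketch below is a toy check, not a real validator: it reads the declarations from an embedded copy of the DTD (with `logo` in the content model of <webpage> and ANY written without parentheses) and compares the declared child sequence of <webpage> with the children that actually appear. The helper names `read_declarations` and `declared_sequence` are hypothetical.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat
from xml.parsers.expat import model as expat_model

DOC = """<?xml version="1.0"?>
<!DOCTYPE webpage [
  <!ELEMENT webpage (heading, para1, logo, para2, closing)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT para1 ANY>
  <!ELEMENT logo ANY>
  <!ELEMENT para2 ANY>
  <!ELEMENT closing (#PCDATA)>
]>
<webpage>
  <heading>Welcome to my XML page</heading>
  <para1>This is my page of the future.</para1>
  <logo> <!-- This will display a logo --> </logo>
  <para2>It is written in XML.</para2>
  <closing>Good bye!</closing>
</webpage>"""


def read_declarations(text):
    """Map each declared element name to expat's content-model tuple."""
    decls = {}

    def on_element_decl(name, content_model):
        decls[name] = content_model  # (type, quantifier, name, children)

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(text, True)
    return decls


def declared_sequence(content_model):
    """Child names of a plain (a, b, c) sequence, or None for other models."""
    ctype, _quant, _name, children = content_model
    if ctype != expat_model.XML_CTYPE_SEQ:
        return None
    return [child[2] for child in children]


decls = read_declarations(DOC)
expected = declared_sequence(decls["webpage"])
actual = [child.tag for child in ET.fromstring(DOC)]
print(expected == actual)  # True: the children appear in the declared order
print(decls["para1"][0] == expat_model.XML_CTYPE_ANY)  # True
```

In expat’s terms, (#PCDATA) is reported as a “mixed” content type with no children, and ANY as its own content type.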

With a clearly written document type definition, your XML page is nearly complete. However, two more components must be added before it is ready to display: an XML declaration and a document type declaration.

Add an XML Declaration

The XML declaration is simply a line at the beginning of the file that identifies your document as an XML document. Strictly speaking, you don’t have to add this line, but again, it’s a good idea to include it. To add an XML declaration to the beginning of our “webpage” document, you add a line at the very top that looks like this:

<?xml version="1.0"?>
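Because the declaration is optional, a parser accepts the document with or without it, but when it is present it really must come first. This minimal sketch uses Python’s `xml.etree.ElementTree` (built on the expat parser) to show both points; the helper name `parses` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The declaration is optional, so both of these are accepted.
WITH_DECL = '<?xml version="1.0"?>\n<webpage><heading>Hi</heading></webpage>'
WITHOUT_DECL = "<webpage><heading>Hi</heading></webpage>"
# Anything before the declaration, even a single newline, is an error.
MISPLACED = "\n" + WITH_DECL


def parses(text):
    """Return True if ElementTree accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(parses(WITH_DECL), parses(WITHOUT_DECL), parses(MISPLACED))  # True True False
```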

How to: Understand the Difference Between Nonvalidating and Validating Parsers

When discussing how XML documents are processed, you normally hear the term parser rather than browser. This is because XML deals with more than just Web pages. There are two basic kinds of parsers: validating and nonvalidating. A nonvalidating parser checks only that your XML document is “well-formed” (for example, that every element is properly closed and nested) and then displays or otherwise processes it. A validating parser also checks your document against a DTD to see whether it conforms to the DTD’s specifications.
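You can see the difference with Python, whose standard-library XML parsers are built on expat, a nonvalidating parser. In this sketch the embedded DTD requires <webpage> to contain <heading> followed by <closing>, but the document leaves <closing> out. A validating parser would reject it; `xml.etree.ElementTree` only checks well-formedness, so it parses without complaint.

```python
import xml.etree.ElementTree as ET

# The DTD says <webpage> must contain <heading> and then <closing>,
# but the document omits <closing>. It is invalid, yet still well-formed.
INVALID_BUT_WELL_FORMED = """<!DOCTYPE webpage [
  <!ELEMENT webpage (heading, closing)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT closing (#PCDATA)>
]>
<webpage><heading>Hello</heading></webpage>"""

# expat reads the DTD but does not check the document against it,
# so no error is raised here.
root = ET.fromstring(INVALID_BUT_WELL_FORMED)
print([child.tag for child in root])  # ['heading']
```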

With the addition of an XML declaration, the “webpage” document is almost ready to display. However, another optional but helpful addition is a document type declaration.

Add a Document Type Declaration

There are two ways to include a document type declaration. The first is used to link to an external DTD, which is basically the same idea as linking an HTML page to an external style sheet or JavaScript file. The statement is made up of three parts: the !DOCTYPE keyword, the name of the DTD that you want the parser to use (the name of the root element), and the location of the DTD (so the parser can find it). For example, if you plan on saving your DTD as a separate file and linking the page to it, the document type declaration for the “webpage” document might look like this:

           <!DOCTYPE webpage SYSTEM "webpage.dtd">

The SYSTEM keyword indicates that what follows is the location (URL) of the document type definition. Notice that webpage.dtd appears in quotation marks following the SYSTEM keyword.
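To see how a parser breaks this declaration into its parts, the following sketch loads a one-line “webpage” document with Python’s `xml.dom.minidom` and prints the DOCTYPE fields. In this setup minidom only records the system identifier; it does not open webpage.dtd, so the file does not need to exist for the sketch to run.

```python
from xml.dom import minidom

PAGE = '<?xml version="1.0"?>\n<!DOCTYPE webpage SYSTEM "webpage.dtd">\n<webpage/>'

doc = minidom.parseString(PAGE)
doctype = doc.doctype
print(doctype.name)      # webpage (should match the root element)
print(doctype.systemId)  # webpage.dtd (the quoted location after SYSTEM)
print(doctype.publicId)  # None (no PUBLIC identifier was given)
```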

Tip

As is the case with style sheets, it’s a good idea to save your DTD as a separate file and link to it. This way you can use it with more than one document. If you embed it in the document itself, you tie it down to that one page.

If, as in the “webpage” sample, your DTD is relatively simple and you want to embed it in the document, you actually include it as part of the document type declaration. You do this by enclosing the entire DTD in square brackets inside the document type declaration tag. Because you are not linking to an outside document, you can leave out the SYSTEM keyword. With the document type declaration included in the “webpage” sample, the completed code will look like the following listing:

<?xml version="1.0"?>
<!DOCTYPE webpage [
  <!ELEMENT webpage (heading, para1, logo, para2, closing)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT para1 ANY>
  <!ELEMENT logo ANY>
  <!ELEMENT para2 ANY>
  <!ELEMENT closing (#PCDATA)>
]>
<webpage>
  <heading>Welcome to my XML page</heading>
  <para1>This is my page of the future.</para1>
  <logo> <!-- This will display a logo --> </logo>
  <para2>It is written in XML.</para2>
  <closing>Good bye!</closing>
</webpage>
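If you want to check the completed page before opening it in a browser, this Python sketch parses a copy of it (with `logo` in the content model of <webpage> and ANY written without parentheses). It uses expat’s `StartDoctypeDeclHandler` callback to confirm that the DTD is embedded (there is an internal subset and no SYSTEM location), and `ElementTree` to read the actual root element. XML’s validity rules require the name in the document type declaration to match the root element; the hypothetical `describe_doctype` helper only reports both names so you can compare them.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE webpage [
  <!ELEMENT webpage (heading, para1, logo, para2, closing)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT para1 ANY>
  <!ELEMENT logo ANY>
  <!ELEMENT para2 ANY>
  <!ELEMENT closing (#PCDATA)>
]>
<webpage>
  <heading>Welcome to my XML page</heading>
  <para1>This is my page of the future.</para1>
  <logo> <!-- This will display a logo --> </logo>
  <para2>It is written in XML.</para2>
  <closing>Good bye!</closing>
</webpage>"""


def describe_doctype(text):
    """Collect the DOCTYPE details and the actual root element's name."""
    info = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info["doctype_name"] = name
        info["system_id"] = system_id  # None: no SYSTEM location given
        info["internal_subset"] = bool(has_internal_subset)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(text, True)
    info["root"] = ET.fromstring(text).tag  # also confirms well-formedness
    return info


info = describe_doctype(DOC)
print(info)
```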

Save this page and then open it in Internet Explorer 4 or higher, and you will see your first XML page displayed. You might be surprised at how it looks. As shown in the following illustration, it appears to be displaying your XML code:


If you were writing HTML and your page displayed like this, you would wonder what went wrong. However, in this case, the page is displaying exactly as it is supposed to. In fact, if you made even the slightest error, the page simply would not have displayed at all, as the following illustration demonstrates:


You see, whereas HTML is “forgiving,” XML is not. Your code must be written correctly, down to the last quotation mark. If it isn’t, an XML browser or parser will not display it. There’s no room for sloppy coding with XML. However, that doesn’t answer the question of why your XML code displayed instead of the Web page you may have been expecting. The reason is that there is one more “component” an XML page must have: a style sheet.
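To see this strictness outside a browser, the sketch below feeds two nearly identical snippets to Python’s `xml.etree.ElementTree` parser and reports where the first error occurs. The `src` attribute on <logo> is a hypothetical addition used only to show an unquoted attribute value; the DTD in this project does not declare it, but ElementTree does not validate, so only the missing quotation marks matter.

```python
import xml.etree.ElementTree as ET

GOOD = '<webpage><logo src="logo.gif"/></webpage>'
BAD = "<webpage><logo src=logo.gif/></webpage>"  # attribute value not quoted


def first_error(text):
    """Return None if the text is well-formed, else the (line, column) of the error."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return err.position
    return None


print(first_error(GOOD))  # None
print(first_error(BAD))   # (line, column) where the parser gave up
```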