The Web has reached a point where many pages contain "bad" HTML (see Example 5.13). The eXtensible HTML (XHTML) markup language was designed to combine the best of HTML and XML: it supports all of the elements of HTML 4.0 combined with the well-formed syntax of XML. Essentially, XHTML is a reformulation of HTML 4.0 in XML. The HTML code in Example 5.15 will work fine with some browsers, even though it does not follow the HTML rules. The correct equivalent code is presented in Example 5.16.

Example 5.13 Bad HTML with missing closing </p> tags.

    <p>This is a paragraph
    <p>This is another paragraph

Example 5.14 The revision in XHTML, adding the closing </p> tags.

    <p>This is a paragraph</p>
    <p>This is another paragraph</p>

Example 5.15 A bad HTML document that is missing closing tags.

    <html>
    <head>
    <title>This is bad HTML</title>
    <body>
    <h1>Bad HTML

Example 5.16 The revision in good XHTML.

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html>
    <head><title>This is good HTML</title></head>
    <body>
    <h1>Good HTML</h1>
    </body>
    </html>

This section contains a summary of the XHTML DTD as specified and published by the W3C [WWW]. Readers are encouraged to review the detailed specification available from www.W3C.org. Some of the most noteworthy differences between HTML and XHTML include the following:
Above and beyond these differences, because XHTML is based on XML, it is relatively easy to introduce new elements or additional element attributes. The XHTML family is designed to accommodate these extensions through XHTML modules and techniques for developing new XHTML-conforming modules. As an example, the XHTML namespace can be used with other XML namespaces to specify modules with a completely different set of tags. In particular, iTV applications will not require all of the XHTML elements on all receivers. Through modularization and namespaces, the XHTML framework allows combinations of existing and new feature sets to coexist when developing content and when designing new iTV browsers.

SGML gives the author of a DTD the ability to exclude specific elements from being contained within an element. Such prohibitions (called exclusions) are not possible in XML. For example, the HTML 4 Strict DTD forbids the nesting of an a element within another a element to any descendant depth; an XML DTD cannot express such a restriction. In XHTML, certain elements should therefore still not be nested, even though the restriction cannot be defined in the DTD.

There are three key XHTML DTDs:
Following XML, the XHTML standard defines the id attribute for the elements <a/>, <applet/>, <form/>, <frame/>, <iframe/>, <img/>, and <map/>. In addition, the name attribute, although formally deprecated, is still supported by XHTML 1.0 for backward compatibility with HTML. The id attributes are fragment identifiers of type ID, which must be unique within the scope of a single document. To ensure that XHTML documents are well-structured XML documents, the id attribute is required even for elements that historically have also had a name attribute. See the HTML Compatibility Guidelines for additional information.

5.4.1 Character Set

Character sets are a critical aspect of a markup language: one needs to make sure that the character set is supported by the font selected for rendering the content. Such validation could be performed automatically, as all the information needed should be available within the selected font. XHTML uses the ISO 8879 Latin 1 character set. The XHTML DTD specifies the mapping between ISO 8879 and ISO 10646 (Unicode).

5.4.2 Elements

DTD authors need to define the content model for their DTD. XHTML provides a variety of tools, including a set of support modules, instantiated by a main Framework module specified by the XHTML DTD. That framework module is essentially a reformulation of HTML as a modular XML application. It requires that the DTD author specify notations, data types, namespace, qualified names, events, common attributes, document model, and character entities.

The XHTML DTD follows the same principles described earlier. For example, the specification of the anchor element <a/> is given in Figure 5.11: its qualified name is given by a.qname and its content model by a.content (character data mixed with the elements of InlNoAnchor.mix), and it has the optional (i.e., #IMPLIED) attributes href, charset, type, hreflang, rel, rev, accesskey, and tabindex, as well as additional attributes common to many other elements.
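To make the attribute list concrete, the following is a minimal sketch of how an anchor might be written in an XHTML 1.0 document. The first anchor pairs id with the deprecated name attribute, using identical values, so that both XML processors and older HTML browsers can resolve the fragment; the second sets a few of the optional attributes named above. The identifier section2, the URL, and the link text are illustrative assumptions and do not come from the specification.

```xml
<!-- Hypothetical fragment: the identifier, URL, and text are placeholders. -->
<p>
  <!-- id is the fragment identifier of type ID; name repeats the same
       value for backward compatibility with HTML browsers. -->
  <a id="section2" name="section2"></a>

  <!-- All attributes below are optional (#IMPLIED) in the DTD. -->
  <a href="http://www.example.com/guide.html#section2"
     hreflang="en"
     type="text/html"
     rel="next"
     accesskey="n"
     tabindex="1">Next section</a>
</p>
```

The empty anchor is written with an explicit end tag rather than in minimized form, since the a element is not declared EMPTY.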
A very important aspect of XHTML is the object element, used to embed external objects as part of XHTML pages. Although its functionality overlaps that of the applet element, it is more general, and it is typically used to embed iTV (JavaTV) Xlets. The classid attribute typically specifies a reference to a JavaTV Xlet class, and the data attribute specifies the input data for that Xlet. In the context of iTV, the codebase attribute is not often used, because the code is not always available locally at the receiver or otherwise accessible through a return channel (i.e., remote interactivity channel).

An important use of the object element is to invoke a content-rendering plug-in. The data attribute points to the image or data file to be rendered, and the classid attribute points to the plug-in to be used for rendering that data (see the ATSC DASE specification). This concept could be further extended to render future versions of XHTML, or regional variants such as the Broadcast Markup Language (BML) used in Japan: the data attribute points to the BML file and the classid attribute points to the BML renderer or browser.

Figure 5.11 XHTML DTD definition of the anchor element.

    <!ENTITY % a.element "INCLUDE" >
    <![%a.element;[
    <!ENTITY % a.content "( #PCDATA | %InlNoAnchor.mix; )*" >
    <!ENTITY % a.qname "a" >
    <!ELEMENT %a.qname; %a.content; >
    <!-- end of a.element -->]]>

    <!ENTITY % a.attlist "INCLUDE" >
    <![%a.attlist;[
    <!ATTLIST %a.qname;
        %Common.attrib;
        href        %URI.datatype;          #IMPLIED
        charset     %Charset.datatype;      #IMPLIED
        type        %ContentType.datatype;  #IMPLIED
        hreflang    %LanguageCode.datatype; #IMPLIED
        rel         %LinkTypes.datatype;    #IMPLIED
        rev         %LinkTypes.datatype;    #IMPLIED
        accesskey   %Character.datatype;    #IMPLIED
        tabindex    %Number.datatype;       #IMPLIED
    >
    <!-- end of a.attlist -->]]>

As mentioned earlier, specifications of standards and their implementations may select a subset of the XHTML elements, because each element (or group of elements) is regarded as a module.
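The INCLUDE switches in Figure 5.11 suggest how such a subset can be selected in practice. The sketch below shows one way a driver DTD for a constrained receiver profile might drop the anchor element: in XML the first declaration of a parameter entity takes precedence, so setting the switches to IGNORE before the module is read causes the conditional sections in Figure 5.11 to be skipped. The entity name hypertext.mod and the file name hypertext-module.mod are hypothetical placeholders, not identifiers published by the W3C.

```xml
<!-- Hypothetical driver DTD fragment for a receiver profile without anchors. -->

<!-- Declared first, so these override the "INCLUDE" defaults that the
     module itself declares for the same parameter entities. -->
<!ENTITY % a.element "IGNORE" >
<!ENTITY % a.attlist "IGNORE" >

<!-- Pull in the module containing the declarations of Figure 5.11.
     The system identifier is a placeholder for the real module file. -->
<!ENTITY % hypertext.mod SYSTEM "hypertext-module.mod" >
%hypertext.mod;
```

A profile that needs the anchor element omits the two IGNORE declarations and inherits the module's defaults.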
The subset required by an XHTML standard is specified by its framework module, which includes a list of required modules. The list of predefined XHTML elements is a derivative of the HTML 4.0 elements (see Table 5.3).

Table 5.3. Elements Supported by XHTML
Table 5.4. Standard XHTML Element Attributes