Although the terms XML, HTML, and XHTML all sound similar, they're really quite different. XML and HTML are not competing technologies. Both are used for managing and displaying online information, but they do different things. XML doesn't aim to replace HTML as the language for web pages. XHTML is a hybrid of the two languages.
HTML tags deal with both the information in a web page and the way it displays. In other words, HTML works with both presentation and content. It doesn't deal with the structure of the information or the meaning of different pieces of data. You can't use HTML to transform the display of information into a completely different layout. If you store information in a table, you can't easily change it into a list.
HTML was designed as a tool for sharing information online in web pages. The complex designs that appear in today's web pages weren't part of the original scope of HTML. As a result, designers often use HTML in ways that were never dreamed of when the language was first created.
The rules for using HTML aren't terribly strict. For example, you can add headings by using the tags <h1> to <h6>. The <h1> tag is the first level of heading, but there is no requirement to include heading tags in any particular order. The first heading in your HTML page could actually be enclosed in an <h3> or <h4> tag.
Web pages written in HTML can contain errors that don't affect the display of the information. For example, in many browsers, you could include two <title> tags and the page would still load. You can also forget to include a closing </table> tag and the table will still be rendered.
HTML is supposed to be a standard, but it works differently across web browsers. Most web developers know how hard it is to design a website that appears the same way in Internet Explorer, Opera, Firefox, and Netscape Browser on both PCs and Macs.
Like XML, HTML comes from the Standard Generalized Markup Language (SGML). Unlike XML, HTML is not extensible. You're stuck with a standard set of tags that you can't change or extend in any way.
XML deals only with content. It describes the structure of information without concerning itself with the appearance of that information. An XML document can show relationships in your data, much as a database can. This just isn't possible in an HTML document.
XML content is probably easier to understand than HTML. The names of tags normally describe the data they mark up. In the example file address.xml, tag names such as <address> and <phone> tell you what data is contained in the element.
XML may be used to display information directly in a web page. It's more likely, though, that you'll use the XML document behind the scenes. It will probably provide the content for a web application or a Flash movie.
Compared with HTML, XML is much stricter about the way markup is used. There are rules about how tags are constructed, and we've already seen that XML documents have to be well formed. A DTD or schema can also provide extra rules for the way that elements are used. These rules can include the legal names for tags and attributes, whether they're required or optional, as well as the number of times that each element must appear. In addition, schemas can specify what data type must be used for each element and attribute.
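To make the strictness concrete, here is a minimal sketch in Python that asks the standard library's XML parser whether a fragment is well formed. The fragments and the is_well_formed helper are hypothetical illustrations, not part of address.xml, and the check covers well-formedness only; it doesn't validate against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments written for this illustration.
closed = "<contact><name>Sas Jacobs</name><phone>123 456</phone></contact>"
# The <phone> element is never closed.
unclosed = "<contact><name>Sas Jacobs</name><phone>123 456</contact>"
# The <phone> and <name> elements overlap instead of nesting.
bad_nesting = "<contact><name>Sas Jacobs<phone></name>123 456</phone></contact>"

for fragment in (closed, unclosed, bad_nesting):
    print(is_well_formed(fragment), fragment)
```

A browser would probably still render the HTML equivalent of the second and third fragments, whereas an XML parser refuses them outright.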
XML documents don't deal with the display of information. If you need to change the way XML data looks, you can change the appearance by using Cascading Style Sheets (CSS) or Extensible Stylesheet Language (XSL). XSL transformations offer the most power; you can use them to create XHTML from an XML document or to sort or filter a list of XML elements.
XHTML evolved so that the useful features of XML could be applied to HTML. The W3C describes XHTML as a reformulation of HTML in XML. XHTML documents have much stricter construction rules and are generally more robust than their HTML counterparts.
The HTML specification provides a list of legal elements and attributes within XHTML. XML governs the way that the elements are used in documents. For example, in XHTML, you must close all tags. The HTML <br> tag has to be rewritten as <br/> or <br></br>. In XHTML, web designers can't use a single <p> tag to create a paragraph break as they could in HTML.
Another change is that you must write attribute values in full. For example,
<input type="radio" value="JJJ" checked/>
has to be written as
<input type="radio" value="JJJ" checked="checked"/>
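As a rough check of this rule, the following sketch feeds both forms of the tag to Python's standard XML parser. The read_attributes helper is a hypothetical name introduced for the illustration; it returns the attributes when the markup parses and None when it doesn't.

```python
import xml.etree.ElementTree as ET

# The two forms of the tag shown above.
minimized = '<input type="radio" value="JJJ" checked/>'
expanded = '<input type="radio" value="JJJ" checked="checked"/>'


def read_attributes(markup: str):
    """Return the element's attributes as a dict, or None if parsing fails."""
    try:
        return dict(ET.fromstring(markup).attrib)
    except ET.ParseError:
        return None


print(read_attributes(minimized))  # the minimized attribute is rejected
print(read_attributes(expanded))   # the full form parses normally
```

The minimized form fails because XML requires every attribute to have a quoted value.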
You can find the XHTML specification at www.w3.org/TR/xhtml1/. It became a recommendation in 2000 and was revised in 2002.
I've summarized the main changes from HTML to XHTML:
You should include a DOCTYPE declaration specifying that the document is an XHTML document.
You can optionally include an XML declaration.
You must write all element and attribute names in lowercase.
All elements must be closed.
All attribute values must be enclosed in quotation marks.
All tags must be correctly nested.
The id attribute should be used instead of name.
Attributes can't be minimized.
The following listing shows the previous address.xml document rewritten in XHTML. I've done this so you can compare XHTML and XML documents.
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
  <body>
    <table>
      <tr>
        <td>Sas Jacobs</td>
        <td>123 Some Street, Some City, Some Country</td>
        <td>123 456</td>
      </tr>
      <tr>
        <td>John Smith</td>
        <td>4 Another Street, Another City, Another Country</td>
        <td>456 789</td>
      </tr>
    </table>
  </body>
</html>
Notice that the file includes both an XML declaration and a DOCTYPE declaration. You can see the content in the resource file address.html.
You're probably used to seeing information like this in web pages. A table displays the content and lists each contact in a separate row. Figure 2-9 shows this document opened in Internet Explorer.
I've rewritten the content in XHTML so that it conforms with the stricter rules for XML documents. However, the way the document is constructed may still cause some problems. Each piece of information about my contacts is stored in a separate cell within a table. The <td> tags don't give me any clue about what the cell contains. I get a better idea when I open the page in a web browser.
It would be difficult for me to use a software program to extract the content from the web page. I could remove the <td> tags and add the content to a database, but if the order of the table columns changed, I might end up with the wrong data in the wrong database field. Nothing in the markup associates the phone number with the third column.
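The fragility of reading cells by position can be sketched in a few lines of Python. The table below reuses the rows from the XHTML listing; the reordered variant and the phones_by_position helper are hypothetical, added only to show what happens when the columns move.

```python
import xml.etree.ElementTree as ET

# Table rows from the XHTML listing.
table = """<table>
<tr><td>Sas Jacobs</td><td>123 Some Street, Some City, Some Country</td><td>123 456</td></tr>
<tr><td>John Smith</td><td>4 Another Street, Another City, Another Country</td><td>456 789</td></tr>
</table>"""

# Hypothetical variant: the same data with the address and phone columns swapped.
reordered = """<table>
<tr><td>Sas Jacobs</td><td>123 456</td><td>123 Some Street, Some City, Some Country</td></tr>
<tr><td>John Smith</td><td>456 789</td><td>4 Another Street, Another City, Another Country</td></tr>
</table>"""


def phones_by_position(markup: str, column: int = 2) -> list:
    """Read the text of one cell per row, trusting that the column holds phones."""
    root = ET.fromstring(markup)
    return [row.findall("td")[column].text for row in root.findall("tr")]


print(phones_by_position(table))      # phone numbers, as intended
print(phones_by_position(reordered))  # addresses, silently mislabeled as phones
```

The second call raises no error; the program simply files addresses under the phone field, which is the wrong-field problem described above.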
The web page controls the display of information. Although I can make some minor visual adjustments to the table using style sheets, I can't completely transform the display. For example, I can't remove the table and create a vertical listing of all entries without completely rewriting the XHTML.
Each time I print the document, it will look the same. I can't exclude information such as the address column from my printout. I don't have any way to filter or sort the information. I am not able to extract a list of contacts in a specific area or sort them by contact name.
Compare this case with storing the information in an XML document. I can create my own tag names and write a schema that describes how to use these tags. When I view the document in a web browser, the tag names make it very clear what information they're storing.
I can apply a transformation to change the appearance of an XML document, including:
Sorting the document into name order
Filtering the contents to display a single contact
Listing the names in a table or bulleted list
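The operations in this list would normally be written as an XSL transformation. As a stand-in, the sketch below performs the same three operations in Python with the standard library. The <phoneBook>, <contact>, and <name> element names are assumptions about how address.xml might be structured, and the three helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed structure for address.xml; only <address> and <phone> are named in the text.
ADDRESS_XML = """<phoneBook>
  <contact>
    <name>Sas Jacobs</name>
    <address>123 Some Street, Some City, Some Country</address>
    <phone>123 456</phone>
  </contact>
  <contact>
    <name>John Smith</name>
    <address>4 Another Street, Another City, Another Country</address>
    <phone>456 789</phone>
  </contact>
</phoneBook>"""

root = ET.fromstring(ADDRESS_XML)


def sorted_names(book):
    """Sort the contact names alphabetically."""
    return sorted(c.findtext("name") for c in book.findall("contact"))


def find_contact(book, name):
    """Filter the document down to one contact, returned as a tag-to-text dict."""
    for contact in book.findall("contact"):
        if contact.findtext("name") == name:
            return {child.tag: child.text for child in contact}
    return None


def as_bulleted_list(names):
    """Render the names as a plain-text bulleted list."""
    return "\n".join(f"* {n}" for n in names)


print(as_bulleted_list(sorted_names(root)))
print(find_contact(root, "John Smith"))
```

Because each value is looked up by tag name rather than by position, reordering the child elements inside a contact wouldn't change any of these results.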
XML isn't a replacement for XHTML documents, but it certainly provides much more flexibility for working with data. You're likely to use XML documents differently from XHTML documents. XML documents are a way to store structured data that may or may not end up in a web page. You normally use XHTML only to display content in a web browser.
XML offers many advantages compared with other forms of data storage. Before I explore what you can do with XML documents, I think it's important to understand the benefits of working with XML. I'll look at this more closely in the next section.