HTML | Developing XML Solutions (DV-MPS General)


Nearly every computer user is familiar with HTML. HTML is a fairly simple language that has helped promote the wide usage of the Internet. HTML has come a long way since it was originally designed so that scientists could use hyperlinked text documents to share information. Let us begin by looking at HTML's original version.

Early HTML

In its original conception, HTML was supposed to include elements that mark information within the document according to its meaning. Tags such as <title>, <h1>, <h2>, and so on were created to describe what each piece of content is, not how it should look.

How the marked text would actually be interpreted and displayed would depend on the Web browser's settings. Theoretically, any two browsers with the same user settings would present the same HTML document in the same way. This flexibility would enable users with special needs or specific preferences to customize their Web browsers to view HTML pages in their preferred format. This is an especially useful feature for people who have impaired vision or who are using older Web browsers.

In this scenario, the HTML developer uses tags that are displayed according to the user's preferences. This works only if developers and browsers follow the same HTML standard. The current Web standard can be found at http://www.w3.org.

Problems with HTML

HTML has proved to be a great language for the initial development of the Internet. As the Internet matures, however, the need has developed for a language that can be used for more complex and large-scale purposes, such as fulfilling corporate functions, and here HTML falls short. Let's look at some of the problems with HTML.

Conflicting standards

In 1994, Netscape created a set of HTML extensions that worked only in Netscape's Web browser. This was the beginning of the browser wars, and the first casualty was the HTML standard. Using these extensions, the author of an HTML document could now specify font size, font and background color, and other features. Eventually, Netscape added frames. Of course, none of these extensions would display properly in any other browser. The HTML extensions were so popular that by 1996 Netscape was the number one browser.

Although Netscape won a major victory, Web developers and users suffered a major loss. In addition to the problem of handling nonstandard extensions, different browsers handle the standard tags in different ways. This means that Web designers now have to create different versions of the same HTML document for different Web browsers. The extensions force users to accept pages that are formatted according to the author's wishes.

NOTE
In most browsers, you can create default settings that will override the settings in the HTML pages. Unfortunately, most users do not know how to use these settings, and if you do set your own defaults, most pages will not display correctly.

Creating HTML documents that will appear approximately the same in all browsers is a difficult, and at times impossible, task. For information about this topic, see the Web Standards Project at http://www.webstandards.org.

NOTE
It is beyond the scope of this book to go into the details of HTML standardization, but the Web Standards Project site will provide you with the information and resources you need.

No international support

The Internet has created a global community and made the world a much smaller place. Corporations are expanding their businesses into this global marketplace, and they are extending their partnerships and operations around the globe, linking everything through the Internet. A few proposals to create an international HTML standard have been put forward, but no standard has actually materialized. There are no HTML tags that can identify what language an HTML document is written in.

Inadequate linking system

When you create HTML documents, links are hard-coded into the document. If a link changes, the Web developer must search through all the HTML documents to find every reference to the link and then update each one. For Web sites that are dynamic and constantly evolving to meet the needs of their users, this lack of a link management system can create substantial problems. We need a much more sophisticated method of linking documents than HTML can provide. HTML does not allow you to associate links with arbitrary elements, nor does it allow a single link to point to multiple locations, whereas the linking system in XML does provide these features. In Chapter 6, you will learn more about XML's linking capability.
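The search described above can be sketched in a few lines of Python. This is only an illustration of the maintenance burden, not a recommended tool: the page names, the SITE dictionary, and the pages_linking_to helper are all hypothetical, and the pages are held in memory as strings instead of being read from disk.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def pages_linking_to(pages, target):
    """Return the sorted names of all pages with a hard-coded link to target."""
    hits = []
    for page_name, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        if target in collector.hrefs:
            hits.append(page_name)
    return sorted(hits)


# Hypothetical three-page site.
SITE = {
    "index.html": '<a href="products.html">Products</a> <a href="about.html">About</a>',
    "about.html": '<a href="index.html">Home</a>',
    "news.html": '<p>See our <a href="products.html">catalog</a>.</p>',
}

# Every page listed here must be edited by hand if products.html is renamed.
stale = pages_linking_to(SITE, "products.html")
print(stale)
```

Even in this tiny example, renaming one page means touching two others; on a large site the same search has to cover every document.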

Faulty structure and data storage

HTML does have a structure, but this structure is not very rigid. For example, you can place heading 3 (<h3>) tags before heading 1 (<h1>) tags. Within the <body> tag, you can place any legitimate tag anywhere you want. You can validate HTML documents, but this validation only confirms that you have used the tags properly. Even worse, if you leave off end tags, the browser will try to figure out where the end tags should be and add them in. Thus, you can write HTML code that is incorrect but that the browser will still interpret as intended.
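To make this leniency concrete, here is a minimal sketch that feeds a deliberately sloppy fragment to the HTML parser in Python's standard library. The fragment and the TagRecorder class are invented for this illustration, and a real browser's error recovery is considerably more elaborate than this parser's.

```python
from collections import Counter
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Records start and end tags in the order they appear."""

    def __init__(self):
        super().__init__()
        self.opened = []
        self.closed = []

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)

    def handle_endtag(self, tag):
        self.closed.append(tag)


# Hypothetical page: <h3> comes before <h1>, and neither <p> has an end tag.
SLOPPY_HTML = "<body><h3>Details</h3><h1>Title</h1><p>First<p>Second</body>"

recorder = TagRecorder()
recorder.feed(SLOPPY_HTML)   # no exception is raised
recorder.close()

# Headings in document order, and tags that were opened but never closed.
heading_order = [tag for tag in recorder.opened if tag in ("h1", "h3")]
unclosed = Counter(recorder.opened) - Counter(recorder.closed)

print(heading_order)
print(dict(unclosed))
```

The parser accepts the fragment without complaint: the out-of-order headings and the two unclosed <p> tags only become visible because the sketch goes looking for them.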

Another problem arises if you try to put data into an HTML document: it is very difficult to do so reliably. For example, suppose we are trying to put information from a database into an HTML document. We have a database table named Customer with the following fields: customerID, customerName, and customerAddress. Every customer should have a customerID and a customerName value; the customerAddress value is optional. We could present this data in HTML in a table, as follows:

 <body>
   <table border="1" width="100%">
     <tr>
       <th width="33%">Name</th>
       <th width="33%">Address</th>
       <th width="34%">ID</th>
     </tr>
     <tr>
       <td width="33%">John Smith</td>
       <td width="33%">125 Main St. Anytown NY 10001</td>
       <td width="34%">001</td>
     </tr>
     <tr>
       <td width="33%">Jane Doe</td>
       <td width="33%">2 Main St. Anytown NY 10001</td>
       <td width="34%">002</td>
     </tr>
     <tr>
       <td width="33%">Mark Jones</td>
       <td width="33%">35 Main St. Anytown NY 10001</td>
       <td width="34%"></td>
     </tr>
   </table>
 </body>

In a browser, this table would appear as shown in Figure 2-1.

Figure 2-1. Database table created using HTML.

The HTML code for this table is syntactically correct and contains no errors. Yet the data it holds is invalid: the third entry, Mark Jones, is missing an ID, which every customer is required to have. Although it is possible to write applications that perform data validation on HTML documents, such applications are complex and inefficient. HTML was never designed for data validation.
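As a rough sketch of what such a validation application involves, the following Python code extracts the rows of the customer table and reports any customer without an ID. The TableRows class and the column order (Name, Address, ID) are assumptions tied to this one table; a different layout would need different code, which is part of the problem.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = ""

    def handle_data(self, data):
        if self._cell is not None:
            self._cell += data

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append(self._cell.strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:   # the header row has only <th> cells, so skip it
                self.rows.append(self._row)
            self._row = None


# The customer table from the text, with presentation attributes omitted.
CUSTOMER_TABLE = """
<table>
  <tr><th>Name</th><th>Address</th><th>ID</th></tr>
  <tr><td>John Smith</td><td>125 Main St. Anytown NY 10001</td><td>001</td></tr>
  <tr><td>Jane Doe</td><td>2 Main St. Anytown NY 10001</td><td>002</td></tr>
  <tr><td>Mark Jones</td><td>35 Main St. Anytown NY 10001</td><td></td></tr>
</table>
"""

parser = TableRows()
parser.feed(CUSTOMER_TABLE)
parser.close()

# The rule "every customer needs an ID" lives only in this line of code.
missing_id = [name for name, address, customer_id in parser.rows if not customer_id]
print(missing_id)
```

Nothing in the HTML itself says that the third column is an ID or that it is required. That knowledge exists only in the hand-written checking code.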

HTML was also not designed to store data. The table is the most common way of both presenting and storing data in HTML. You can use <div> tags to create more complex structures to store data, but once again you are left with the task of writing your own data validation code.

What we need instead is something that enables us to put the data in a structured format that can be automatically validated for syntactical correctness and proper content structure. Ideally, the author of the document will want to define both the format of the document and the correct structure of the data. As you will see in Chapters 4 and 5, this is exactly what XML and DTDs do.
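The first half of that requirement, automatic checking of syntactical correctness, can be previewed with the XML parser in Python's standard library. This is a sketch under stated assumptions: the element names are borrowed from the Customer fields above but are otherwise hypothetical, the is_well_formed helper is invented here, and the check covers only well-formedness. Validating content structure against a DTD is a separate step that this parser does not perform.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical customer records; element names follow the Customer fields.
GOOD = (
    "<customer>"
    "<customerID>001</customerID>"
    "<customerName>John Smith</customerName>"
    "</customer>"
)

# Same record, but the </customerID> end tag has been left off.
BAD = (
    "<customer>"
    "<customerID>001"
    "<customerName>John Smith</customerName>"
    "</customer>"
)

print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```

Unlike a browser reading HTML, the XML parser does not guess where the missing end tag belongs; it rejects the document outright.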