Before finishing the chapter, we need to take a look at some of the major open questions related to HTML/XHTML. You will encounter these issues over and over again throughout the book, and while they are easy to describe, they are very hard to answer.
No introduction to HTML would be complete without a discussion of the logical versus physical markup battle at the heart of HTML. Physical HTML refers to using HTML to make pages look a particular way; logical HTML refers to using HTML to specify the structure of a document while using another technology, such as Cascading Style Sheets (CSS) as discussed in Chapters 10 and 11, to designate the look of the page.
Most people are already very familiar with physical document design because they normally use WYSIWYG (what you see is what you get) text editors, such as Microsoft Word. When Word users want to make something bold, they simply select the appropriate button, and the text is made bold. In HTML, you can make something bold simply by enclosing it within the <b> and </b> tags, as shown here:
<b>This is important.</b>
This can easily lead people to believe that HTML is nothing more than a simple formatting language. WYSIWYG HTML editors (such as Microsoft FrontPage) also reinforce this view. But as page designers try to use HTML in this simplistic fashion, they sooner or later must face the fact that HTML is not a physical page-description language. Page authors can't seem to make the pages look exactly the way they want, and even when they can, doing so often requires heavy use of <table> tags, giant images or Flash files, and even trick HTML. Other technologies, such as style sheets, might provide a better solution for formatting text than a slew of inconsistently supported tricks and proprietary HTML elements.
According to most markup experts, HTML really was not designed to provide most of the document layout features people have come to expect, and it shouldn't be used for that purpose. Instead, HTML should be used as a logical, or generalized, markup language that defines a document's structure, not its appearance. For example, instead of defining the introduction of a document with a particular margin, font, and size, HTML just labels it as an introduction section and lets another system, such as Cascading Style Sheets, determine the appropriate presentation. In the case of HTML, the browser or a style sheet has the final say on how a document looks.
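The introduction example above can be sketched as follows. This is only a minimal illustration, not a recommended page template: the class name "introduction" and the specific margin, font, and size values are assumptions chosen for the sketch. The markup labels the section, and the style rule alone decides how it looks.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Logical markup sketch</title>
  <style>
    /* Presentation lives in the style sheet, not in the markup.
       The class name and all values here are illustrative assumptions. */
    .introduction {
      margin: 1em 2em;
      font-family: Georgia, serif;
      font-size: 1.2em;
    }
  </style>
</head>
<body>
  <!-- The markup only labels this block as the introduction. -->
  <div class="introduction">
    <p>This chapter introduces the core ideas behind HTML.</p>
  </div>
  <p>The body of the document follows.</p>
</body>
</html>
```

Changing the look of every introduction on a site then means editing one style rule rather than every page.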
Even traditional HTML contains mostly logical elements. An example of a logical element is <strong>, which indicates something of importance, as shown here:
<strong>This is important.</strong>
The strong element says nothing about how the phrase "This is important" will actually appear, although it probably will be rendered in bold. Although many of HTML's logical elements are relatively underutilized, others, such as headings like <h1> and paragraphs <p>, are used regularly, even though most HTML users think of them as physical tags. People generally regard <h1> as a large heading, <h2> as a smaller heading, and <p> as a tag that causes returns, so you can see that, logical or not, the language is physical to most of its users.
The benefits of logical elements might not be obvious to those comfortable with physical markup. To understand the benefits, it's important to realize that on the Web, many browsers render things differently. In addition, predicting what the viewing environment will be is difficult. What browser does the user have? What is his or her monitor's screen resolution? Does the user even have a screen? Consider the extreme case of a user with no screen at all: how would a speaking browser render the <b> tag? What about the <strong> tag? Text tagged with <strong> might be read in a firm voice, but boldfaced text might not have an easily translated meaning outside the visual realm.
Many realistic examples exist of the power of logical elements. Consider the multinational or multilingual aspects of the Web. In some countries, the date is written with the day first, followed by the month and year. In the United States, the date generally is written with the month first, and then the day and year. A <date> tag, if it existed, could tag the information and enable the browser to localize it for the appropriate viewing environment. In short, separation of the logical structure from the physical presentation allows multiple physical displays to be applied to the same content. This is a powerful idea that, even today, is rarely taken advantage of. We'll take a look at this approach to page design in Chapters 10 and 11 when we cover HTML's intersection with style sheets.
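As a rough sketch of applying multiple physical displays to the same content, the document below gives a single heading and paragraph one look on screen and another on paper. The fonts and colors are arbitrary assumptions made for illustration; only the CSS media rules matter here.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>One structure, two presentations</title>
  <style>
    /* On screen: sans-serif text with a colored heading (assumed values). */
    @media screen {
      body { font-family: Verdana, sans-serif; }
      h1 { color: #003366; }
    }
    /* On paper: serif text in plain black (assumed values). */
    @media print {
      body { font-family: Georgia, serif; }
      h1 { color: black; }
    }
  </style>
</head>
<body>
  <!-- The markup is identical for both environments. -->
  <h1>Quarterly Report</h1>
  <p><strong>This is important.</strong> The structure stays the same however it is displayed.</p>
</body>
</html>
```

The markup never changes; each viewing environment picks up only the rules written for it.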
Whether you subscribe to the physical (specific) or logical (general) viewpoint, traditional HTML is not purely a physical or logical language yet. In other words, currently used HTML elements come in both flavors, physical and logical, and developers nearly always think of them as physical. Elements that specify fonts, type sizes, type styles, and so on are physical. Tags that specify content or importance, such as <cite> and <h1>, and let the browser decide how to do things are logical. A quick look at Web pages across the Internet suggests that logical elements and style sheets often go unused: Web developers want more layout control than raw HTML provides, but style sheets are still not well understood by many developers, and browser support continues to be too buggy for some people's taste. Finally, many designers just don't think in the manner required for logical markup, and their WYSIWYG page editors generally don't encourage such thinking! Of course, the strict forms of XHTML, particularly XHTML 2, will change all this, returning the language to a primarily logical one.
Just because a standard is defined doesn't necessarily mean that it will be embraced. Many Web developers simply do not know or care about standards. As long as their page looks right in their favorite browser, they are happy, and they will go on abusing HTML tags such as <table> and using various tricks and proprietary elements. In some sense, HTML is the English language of the Web: poorly spoken by many but widely understood. Yet this does not mean you should embrace every proprietary HTML tag or trick in use. Instead, acknowledge what the rules are, follow them as closely as possible, and break them only on purpose and only when absolutely necessary. Be careful not to follow the standards or a markup validator too religiously in the name of how things ought to be done; your users and clients certainly will not forgive browser errors or page display problems because you followed the rules and the browser vendor did not! With the rise of standards-oriented browsers and the continued refinement of the HTML, XHTML, and CSS specifications, things will improve, but the uptake is still slow, and millions of documents will continue to be authored with no concept of standards-compliant logical structuring. However, this does not mean that XHTML should be avoided. On the contrary, the structure and rigor it provides allow for easier maintenance, faster browsers, the possibility of even higher-quality Web tools, and the ability to automatically exchange information between sites. Even with these possible benefits, given the short-term similarity of XHTML and HTML, some developers still think: Why bother? Web page development continues to provide an interesting study of the difference between what theorists say and what people want and do.
The amount of hearsay, myths, and complete misunderstandings about HTML and XHTML is enormous. Much of this can be attributed to the fact that many people simply view the page source of sites or read quick tutorials to learn HTML. In the text that follows, I cover a few of the more common myths about HTML and try to expose the truth behind them.
HTML isn't a specific, screen- or printer-precise formatting language like PostScript. Many people struggle with HTML on a daily basis, trying to create perfect layouts by using HTML elements inappropriately or by using images to make up for HTML's lack of screen and font-handling features. Interestingly, even the concept of a visual WYSIWYG editor propagates this myth of HTML as a page layout language. Other technologies, such as Cascading Style Sheets (CSS), are far better than HTML for handling presentation issues, and their use returns HTML to its structural roots.
Many people think that making HTML pages is similar to programming. However, HTML is unlike programming in that it does not specify logic; it specifies the structure of a document. With the introduction of scripting languages such as JavaScript, however, dynamic HTML (DHTML) is becoming more and more popular and is used to create highly interactive Web pages. Simply put, DHTML is the idea of a scripting language like JavaScript dynamically modifying HTML elements. DHTML blurs the line between HTML as a layout language and HTML as a programming environment. However, the line should be distinct because HTML is not a programming language. Heavily intermixing JavaScript with HTML markup in the ad hoc manner that many authors do is far worse than trying to use HTML as a WYSIWYG markup language. Programming logic can be cleanly separated from HTML, as discussed in Chapters 13 to 15. If this separation isn't heeded, the page maintenance nightmare that results from tightly binding programming logic to content will dwarf the problems caused by misuse of HTML for presentation purposes.
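To make the contrast concrete, here is a small sketch of the two styles side by side. It is an illustration under assumptions, not the approach prescribed in the later chapters: the id values and the "Sent!" message are invented for the example, and in practice the separated script would more likely live in an external file.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Separating script from markup</title>
</head>
<body>
  <!-- Tightly bound: the logic is embedded in the markup itself. -->
  <button onclick="document.getElementById('message-inline').textContent = 'Sent!';">Send</button>
  <p id="message-inline"></p>

  <!-- Separated: the markup carries only structure and id hooks. -->
  <button id="send-button">Send</button>
  <p id="message-area"></p>

  <script>
    // All behavior is attached here, away from the content.
    // The ids are hypothetical names chosen for this sketch.
    var sendButton = document.getElementById("send-button");
    var messageArea = document.getElementById("message-area");
    sendButton.addEventListener("click", function () {
      messageArea.textContent = "Sent!";
    });
  </script>
</body>
</html>
```

Both buttons behave the same, but only the second lets the markup be edited or restyled without touching any program logic.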
HTML is the foundation of the Web; with literally billions of pages in existence, not every document is going to be upgraded anytime soon. The "legacy" Web will continue for years, and traditional nonstandardized HTML will be lurking underneath even the most advanced Web pages years from now. Beating the standards drum from on high might speed things up a bit, but let's face the facts: there's a long way to go before we are rid of messed-up HTML markup.
Expecting XHTML to take over the public Web quickly is wishful thinking. Having taught HTML for years and having seen firsthand how both editors and people build Web pages, I can tell you that it is very unlikely that XHTML will be the norm anytime soon. The death of traditional HTML has been predicted since the last millennium. Yet today, documents are still primarily created sloppily, both by editor and by hand, rarely conforming to even traditional HTML standards, let alone XHTML.
Although HTML has had rules for years, much of the time people don't bother to follow them because they see little penalty for ignoring them and no obvious benefit to studying the language rigorously. Often, people learn HTML simply through imitation, by viewing the source of existing pages, which are not always written correctly, and going from there. As with a spoken language, HTML's loosely enforced rules have allowed many document authors to get going quickly. This looseness, the language's biggest flaw, is in some sense also its biggest asset and has allowed millions of people to get involved with Web page authoring. Rigor and structure are coming, but they will take time and require new tools.
Although some will continue to craft pages by hand, much like mechanical typesetting, the need to hand-tweak HTML documents will diminish as Web editors improve and reliably produce standard markup. I hope designers will realize that knowledge of the "invisible pixel" trick is not a bankable resume item and will instead focus on developing their talents as they also pursue a firm understanding of HTML markup, CSS, and JavaScript.
Although HTML is the basis for Web pages, you need to know a lot more than HTML to build useful Web pages (unless the page is very simple). Document design, graphic design, and quite often programming are necessary to create sophisticated Web pages. HTML serves as the foundation for all of these tasks, and a complete understanding of HTML technology can only aid document authors.