13.1. HTML: The Notation of the Web
The World Wide Web is mostly text, and most of that text is in the specification language HTML (HyperText Markup Language). HTML is based on SGML (Standard Generalized Markup Language), which is a way of adding additional text to one's text to identify logical parts of the document: "This is the title," "This is a heading," and "This is just a plain ole paragraph." Originally, HTML (like SGML) was supposed to identify just the logical parts of a document; how it looked was up to the browser. Documents were expected to look different from one browser to another. But as the Web has evolved, two separate goals developed: being able to specify lots of logical parts (e.g., prices, part numbers, stock ticker codes, temperatures), and being able to control formatting.
For the first goal, XML (eXtensible Markup Language) evolved; it allows you to define new tags like <partnumber>7834JK</partnumber>. For the second goal, things like cascading style sheets (CSS) were developed, which give you control over the way the page is displayed. Yet another markup language, XHTML, was developed, which is HTML expressed in terms of XML.
For most of this chapter, we'll be introducing XHTML, but we're not going to distinguish it from original HTML. We'll just talk about it as HTML.
We're not going to have a complete tutorial for HTML here. There are many of these available, both in print and on the Web, and many are high-quality. Enter "HTML tutorial" into your favorite search engine and take your pick. Instead, we'll talk here about some general notions of HTML, and mention the tags that you should really know.
A markup language means that additional text is inserted into the original text to identify the parts. In HTML, the inserted text, called tags, is delimited with angle brackets: the less-than and greater-than signs. For example, <p> starts a paragraph, and </p> ends a paragraph.
Web pages have several parts, and the parts nest within each other. The first is a DOCTYPE right at the top of the page that announces the kind of page this is: whether the browser should try to interpret it as HTML, XHTML, or something else, and which version. Following the doctype comes a heading (<head>...</head>) and a body (<body>...</body>). The heading can contain information like the title nested within it, so the ending of the title comes before the ending of the head. The body can have many pieces nested within it, such as images and paragraphs. All of the body and heading nests within <html>...</html> tags. Figure 13.1 shows a simple web page's source, and Figure 13.2 shows how the page appeared in Internet Explorer. Try this yourself! Type it in a simple text editor and save it with an .html file suffix, and then open it in a Web browser. The only difference between this file and any Web page is that this file lives on your disk. If it were on a Web server, it would be a Web page.
Figure 13.1. Simple HTML page source.
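To make the nesting concrete, here is a minimal sketch in Python 3 that assembles a page with the parts described above and can save it to disk. It is not the source from Figure 13.1. The function names build_page and save_page, the XHTML 1.0 Transitional doctype, and the title and body text are all assumptions chosen for illustration.

```python
# Assumption: an XHTML 1.0 Transitional doctype; other doctypes would work too.
DOCTYPE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
           '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">')


def build_page(title, body):
    """Return the text of a simple page: doctype, then head and body nested in html."""
    parts = [
        DOCTYPE,
        '<html xmlns="http://www.w3.org/1999/xhtml">',
        "<head>",
        "<title>%s</title>" % title,   # the title ends before the head ends
        "</head>",
        "<body>",
        body,                          # paragraphs, images, and so on go here
        "</body>",
        "</html>",
    ]
    return "\n".join(parts) + "\n"


def save_page(filename, title, body):
    """Write the page to disk so it can be opened in a Web browser."""
    with open(filename, "w") as out:
        out.write(build_page(title, body))
```

Calling save_page("simple.html", "A Placeholder Title", "<h1>Hello</h1>\n<p>This is a paragraph.</p>") should leave a file that a browser can open directly from your disk.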
Here are some of the tags that you should know:
The <body> tag can take parameters to set the background, text, and link colors. These colors can be simple color names like "red" or "green," or they can be specific RGB colors.
You specify colors in hexadecimal. Hexadecimal is another number system. Decimal is base 10. Hexadecimal is base 16. The decimal numbers 1 to 20 translate to hexadecimal 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, 10, 11, 12, 13, and 14. Think of hexadecimal "14" as 16 plus 4, which is 20.
The advantage of hexadecimal is that each digit corresponds to 4 bits. Two hexadecimal digits correspond to a byte. Thus, the three bytes of RGB colors are six hexadecimal digits, in RGB order. Hexadecimal FF0000 is red: 255 (FF) for red, 0 for green, and 0 for blue. 0000FF is blue. 000000 is black, and FFFFFF is white.
Headings are specified using tags <h1>...</h1> through <h6>...</h6>. Smaller numbers are more prominent.
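The conversion between RGB values and six-digit hexadecimal colors can be sketched in a few lines of Python. The function names rgb_to_hex and hex_to_rgb are made up for this example; the formatting and int() calls are standard Python.

```python
def rgb_to_hex(red, green, blue):
    """Return the six-digit hexadecimal string for three 0-255 components."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each component must be between 0 and 255")
    # %02X prints one byte as exactly two uppercase hexadecimal digits.
    return "%02X%02X%02X" % (red, green, blue)


def hex_to_rgb(hex_color):
    """Split a six-digit hexadecimal color into (red, green, blue) integers."""
    hex_color = hex_color.lstrip("#")
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))


print(rgb_to_hex(255, 0, 0))   # FF0000 (red)
print(hex_to_rgb("0000FF"))    # (0, 0, 255) (blue)
print(int("14", 16))           # 20, that is, 16 plus 4
```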
There are lots of tags for different kinds of styles: emphasis <em>...</em>, italics <i>...</i>, boldface <b>...</b>, bigger <big>...</big> and smaller <small>...</small> fonts, typewriter font <tt>...</tt>, pre-formatted text <pre>...</pre>, block quotes <blockquote>...</blockquote>, and subscripts <sub>...</sub> and superscripts <sup>...</sup> (Figure 13.3). You can also control things like font and color using the <font>...</font> tags.
You can force a new line using <br />.
Use the <img src="image.jpg" /> tag to insert images (Figure 13.4). The image tag takes the location of the image as its src= parameter. The source can be specified in one of several ways.
- If it's just a filename (like "flower1.jpg"), then it's assumed to be an image in the same directory as the HTML file referencing it.
- If it's a path, it's assumed to be a path from the same directory as the HTML page to that image. So if we had an HTML page in "My Documents" that referenced an image in my mediasources directory, we might have a reference to "mediasources/flower1.jpg". You can use UNIX (or DOS) conventions here, e.g., ".." references the parent directory, so "../images/flower1.jpg" would say to go to the parent directory, then down to images to grab image flower1.jpg.
- It can also be a complete URL: you can reference images on other servers entirely!
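The three kinds of source above can be tried out with urljoin from Python 3's standard urllib.parse module, which resolves a reference against the address of the page that contains it, much as a browser does. The page address and the other server name below are hypothetical, and resolve_src is a helper invented for this sketch.

```python
from urllib.parse import urljoin

# Hypothetical address of the HTML page that contains the image tag.
PAGE_URL = "http://www.example.com/docs/page.html"


def resolve_src(src, page_url=PAGE_URL):
    """Return the full address that a src= value refers to from the given page."""
    return urljoin(page_url, src)


# Just a filename: same directory as the page.
print(resolve_src("flower1.jpg"))
# http://www.example.com/docs/flower1.jpg

# A path: followed from the page's directory.
print(resolve_src("mediasources/flower1.jpg"))
# http://www.example.com/docs/mediasources/flower1.jpg

# ".." goes up to the parent directory first.
print(resolve_src("../images/flower1.jpg"))
# http://www.example.com/images/flower1.jpg

# A complete URL is used as-is, even if it is on another server.
print(resolve_src("http://other.example.org/pics/flower1.jpg"))
# http://other.example.org/pics/flower1.jpg
```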
You can also manipulate the width and height of images with options to the image tag, e.g., <img height="100" src="flower.jpg" /> to limit the height to 100 pixels and to adjust the width so that the picture keeps its height-to-width ratio. Using the optional alt, you can specify the text to be presented if the image can't be displayed, e.g., in audio or Braille browsers.
You use the link or anchor tag <a href="someplace.html">from text</a> to create links from the current text (source anchor) to somewhere else (destination anchor). In this example, someplace.html is the destination anchor for the link; it's where you go when you click on the link. The source anchor for the link is the "from text". The source anchor can be text or an image. As seen in Figure 13.5, the destination anchor can also be a complete URL.
Notice, too, in Figure 13.5 that line breaks in the source file don't show up in the browser. We can even have line breaks in the middle of a link tag, and they don't impact the actual display of the link. The breaks that matter (that show up in the browser) are generated by tags like <br /> and <p>.
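One way to see both anchors, and to see that line breaks inside a link don't matter, is to pull the links out of some HTML with Python 3's standard html.parser module. This is only a sketch: the class LinkCollector, the function find_links, and the sample text are invented for illustration, and collapsing whitespace with split() only approximates what a browser does.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (destination anchor, source anchor text) pairs from <a href=...> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None    # destination of the link we are inside, if any
        self._text = []      # pieces of source-anchor text seen so far

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse line breaks and runs of spaces into single spaces.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def find_links(html):
    """Return a list of (destination, source text) pairs found in the HTML."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Line breaks fall in the middle of the first link tag and its text.
sample = """<p>Here is a link to
<a
href="someplace.html">from
text</a> and one to
<a href="http://www.example.com/">another site</a>.</p>"""

print(find_links(sample))
# [('someplace.html', 'from text'), ('http://www.example.com/', 'another site')]
```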
You can create bullet lists (unordered lists) and numbered lists (ordered lists) using the <ul>...</ul> and <ol>...</ol> tags, respectively. Individual items are specified using the tags <li>...</li>.
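Because every item is wrapped the same way, a list is easy to generate from a program. The following is a minimal sketch; make_list is a hypothetical helper, and html.escape comes from the Python 3 standard library.

```python
from html import escape


def make_list(items, ordered=False):
    """Return HTML for a bullet list (<ul>) or a numbered list (<ol>)."""
    tag = "ol" if ordered else "ul"
    lines = ["<%s>" % tag]
    for item in items:
        # escape() keeps characters like < and & from being read as markup.
        lines.append("  <li>%s</li>" % escape(str(item)))
    lines.append("</%s>" % tag)
    return "\n".join(lines)


print(make_list(["Sound", "Pictures", "Text"]))
print(make_list(["First", "Second"], ordered=True))
```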
Tables are created using <table>...</table> tags. Tables are constructed out of table rows using <tr>...</tr> tags, and each row can have several table data items identified with <td>...</td> tags (Figure 13.6). Table rows nest within tables, and table data items nest within table rows.
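The nesting of data items within rows and rows within the table maps naturally onto a list of lists. As a sketch under that assumption, the hypothetical helper make_table below turns each inner list into one table row; the sample data is made up.

```python
from html import escape


def make_table(rows):
    """Return HTML for a table, given a list of rows that are lists of cell values."""
    lines = ["<table>"]
    for row in rows:
        # Each cell becomes a <td> nested inside this row's <tr>.
        cells = "".join("<td>%s</td>" % escape(str(cell)) for cell in row)
        lines.append("  <tr>%s</tr>" % cells)
    lines.append("</table>")
    return "\n".join(lines)


print(make_table([["Color", "Hexadecimal"],
                  ["red", "FF0000"],
                  ["blue", "0000FF"]]))
```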
There is lots more to HTML, such as frames (having subwindows within one's HTML page window), divisions (<div />), horizontal rules (<hr />), applets, and more. We have only covered the most critical tags for understanding the rest of this chapter. You can search the Web for HTML tutorials.