2.1. The Anatomy of a Web Page
Web pages are written in HTML (HyperText Markup Language), which is the language of the Web. It doesn't matter whether your Web page contains a series of plain text blog entries, pictures of your pet lemur, or a heavily formatted screenplay. Odds are that if you're looking at it in a browser, it's an HTML page.
HTML plays two key roles:
HTML tells a Web browser how to format a page. Although there are plenty of computer programs that can format text (take Microsoft Word, for instance), it's almost impossible to find a single standard that's supported on every type of computer, operating system, and Web-enabled device. HTML fills the gap by supplying information that any browser can interpret. These formatting details include specifications about colors, headings, text alignment, and so on.
HTML links different documents together. These links can take several forms. You can use hyperlinks (discussed in Chapter 8) to let people surf from one Web page to another. You can also use HTML instructions to call up pictures (Chapter 7) or even other Web pages (Chapter 10) and combine them into a single Web page.
HTML is such an important standard that you'll spend a good portion of this book digging through most of its features, frills, and shortcomings. Every Web page you'll build along the way is a bona fide HTML document.
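To make those two roles concrete, here's a minimal sketch of a few lines of HTML. It's only an illustration: the file names blog.htm and lemur.jpg are made up, and both are assumed to sit in the same folder as the page. The first part shows formatting instructions, and the second part shows a hyperlink and a picture being pulled in from other files.

```html
<!-- Role 1: formatting instructions the browser interprets -->
<h1>My Pet Lemur</h1>
<p>This paragraph has some <b>bold</b> and some <i>italic</i> text.</p>

<!-- Role 2: links to other documents -->
<!-- blog.htm and lemur.jpg are hypothetical files used for illustration -->
<p><a href="blog.htm">Read my blog entries</a></p>
<img src="lemur.jpg" alt="A picture of my pet lemur">
```

Don't worry about the individual tags yet. The point is simply that the same plain text carries both the formatting details and the references to other files.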
2.1.1. Inside an HTML Document
On the inside, an HTML page is actually nothing more than a plain text file. That means every Web page consists entirely of letters, numbers, and just a few special characters (like spaces, punctuation, and everything else you can spot on your keyboard). This file is quite different from what you would find if you cracked open a typical binary file on your computer. (A binary file contains computer language: a series of 1s and 0s. If another program is foolish enough to try to convert this binary information into text, you end up with gibberish.)
To understand the difference, take a look at Figure 2-1, which examines a Word document under the microscope. Compare that with what you see in Figure 2-2, which dissects an HTML document containing the same content.
To take a look at an HTML document, all you need is an ordinary text editor, like Notepad, which is included on all Windows computers. To run Notepad, click the Start button and select Programs → Notepad. Then choose File → Open and begin hunting around for the HTML file you want. On the Mac, try TextEdit, which you can find at Applications → TextEdit. Choose File → Open and then find the HTML file. If you've downloaded the companion content for this book (all of which you'll find on the "Missing CD" page at www.missingmanuals.com), try opening the file shown in Figure 2-2.
Figure 2-1. Word documents are stored as binary information, as are documents in most file formats used by most computer programs.
Top: Even if your document looks relatively simple in the Word window, it doesn't look nearly as pretty when you bypass Word and open the file in an ordinary text editor like Notepad or TextEdit.
Bottom: Depending on the program you use, the string of ones and zeroes in the file is usually converted into a meaningless stream of intimidating gibberish. The actual text is there somewhere, but it's buried in computer gobbledygook.
Unfortunately, most text editors don't let you open a Web page directly from the Internet. To do that, they'd need to be able to send a request over the Internet to a Web server, which is a job that's best left to the Web browser. However, most browsers do give you the chance to look at the raw HTML for a Web page. Here's what you need to do:
1. Open your preferred browser.
2. Navigate to the Web page you want to examine.
3. In your browser, look for a menu command that lets you view the source content of the Web page. In Internet Explorer (or Opera), select View → Source. In Firefox and Netscape, use View → Page Source. In Safari, View → View Source does the trick. Isn't diversity a wonderful thing?
Once you make your selection, a new window appears showing you the HTML used to create the Web page. This window may represent a built-in text viewer that's included with the browser, or it may just be Notepad or TextEdit. Either way, you'll see the raw HTML.
Figure 2-2. HTML documents are stored as ordinary text.
Top: What you see in the Web browser is much easier to understand than what you see in an ordinary text editor.
Bottom: You can easily spot all the text from the original, along with a few extra pieces of information inside angle brackets (< >). These are HTML tags.
Firefox has a handy feature that lets you home in on part of the HTML in a complex page. Just select the text you're interested in on the page, right-click it, and then choose View Selection Source.
Most Web pages are considerably more complex than the example shown in Figure 2-2, so you'll need to wade through many more HTML tags. But once you've acclimated yourself to all that information, you'll have an extremely useful way to peer under the covers of any Web page. In fact, professional Web developers often use this trick to check out the snazziest work of their competitors.
POWER USERS' CLINIC
Going Beyond HTML
The creators of HTML designed it for publishing research papers and other unchanging documents on the Web. They didn't envision a world of Internet auctions, e-commerce, and browser-based games. To add all these features to the modern Web browsing experience, crafty people have supplemented HTML with some tricky workarounds. And although it's more than a little confusing to consider all the ways you can extend HTML, doing so is the best way to really understand what's possible on your own Web site.
Here's an overview of the two most common ways to go beyond HTML:
Embedded programs. Most modern browsers support Java applets, which are small programs that run inside your Web browser and display information in a window inside a Web page. (To try one out and play some head-scratching Java Checkers against a computer opponent, surf to http://thinks.com/java/checkers/checkers.htm.) Internet Explorer can also host special tools called ActiveX controls. ActiveX is a Microsoft technology for sharing useful widgets between different programs and Web pages. (To see an ActiveX control in use, check out TrendMicro's free virus scanner at http://housecall.trendmicro.com.) Both Java applets and ActiveX controls are miniature programs that can be used in a Web page (if the browser supports them), but neither is written in HTML.
Plug-ins. Browsers are designed to deal with HTML, and they don't recognize other types of content. For example, browsers can't interpret an Adobe PDF document, which is a specialized format used to preserve the formatting of documents. However, depending on how your browser is configured, you may find that when you click a hyperlink that points to a PDF file, a PDF reader launches. The automatic launch happens if you've installed a plug-in from Adobe that runs the Acrobat software (which displays PDF files). (To see for yourself, request the sample chapter from Excel: The Missing Manual at www.oreilly.com/catalog/exceltmm/chapter/ch04.pdf.) Another example of a common plug-in is Macromedia Flash, which shows animations on a Web page. If you surf to a page that includes a Flash animation and you don't have the plug-in, you'll be asked if you want to download it. (Check out www.orsinal.com to play some of the best free Flash games around.)
Unfortunately, there's no surefire way to tell what extensions are at work on a particular page. In time, you'll learn to spot many of the telltale signs, because each type of content looks distinctly different.
2.1.2. Creating Your Own HTML Files
Here's one of the secrets of Web page writing: You don't need a live Web site to start creating your own Web pages. That's because you can easily build and test Web pages using only your own computer. In fact, you don't even need an Internet connection.
The basic approach is simple:
1. Fire up your favorite text editor.
2. Start writing HTML content.
Of course, this part is a little tricky because you haven't explored the HTML standard yet. Hang on; help is on the way.
3. When you've finished your Web page, save the document (a simple File → Save usually does it).
By convention, HTML documents typically have the file extension .htm or .html. Strictly speaking, these extensions aren't necessary, because browsers are perfectly happy displaying Web pages with any file extension. You're free to choose any file extension you want for your Web pages. The only rule is that the file has to contain valid HTML content. However, using the .htm or .html file extension is still a good idea; not only does it save confusion, it also helps your computer recognize that the file contains HTML in other situations. For example, when you double-click a file with the .htm or .html extension, it opens in your Web browser automatically.
4. To take a look at your work, open the file in a Web browser.
If you've used the extension .htm or .html, it's usually as easy as double-clicking the file. If not, you may need to type the full file path in your Web browser's address bar, as shown in Figure 2-3.
Remember, when you compose your HTML document in a text editor, you won't be able to see what the formatting actually looks like. All you'll see is the plain text and the HTML formatting instructions.
If you change and save the file after you open it in your Web browser, you can take a look at your recent changes by clicking the Refresh button.
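If you'd like something to type in while you try the steps above, here's one possible starter page. Treat it as a rough sketch rather than the official way to structure a page: the title, the heading, and the suggested file name (mypage.htm) are all invented for this example.

```html
<!DOCTYPE html>
<!-- Hypothetical starter page; save it as, say, mypage.htm -->
<html>
<head>
<!-- The title appears in the browser's title bar or tab -->
<title>My First Page</title>
</head>
<body>
<!-- Everything inside the body shows up in the browser window -->
<h1>Hello, Web</h1>
<p>If you can read this in your browser, the file works.</p>
</body>
</html>
```

Save the file, open it in your browser, and then try changing the text between the tags. After you save again and refresh, the browser window should show the new wording.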