Although this chapter is concerned mainly with running and maintaining a Web server, it's important that you understand something of how the pages a Web server delivers come into being. Preceding sections have already described some types of Web pages (particularly dynamic content). The most common type of static content is the Hypertext Markup Language (HTML), a text-based format with embedded formatting codes. Several tools are available to help you create HTML, as well as the files upon which HTML pages frequently rely, such as graphics files. Understanding how to use these tools, and how Web browsers interpret HTML, will help you create Web sites that can be handled by most Web browsers available today.
HTML and Other Web File Formats
Although there are many tools for creating Web files, as discussed in the next section, "Tools for Producing Web Pages," it helps to understand something about the various file formats that are common on static Web pages. File formats that are common on the Web include various text file formats, graphics files, and assorted data files.
Most Web pages are built around an HTML text file. This file is a plain text file that you can edit in an ordinary text editor. Listing 20.2 shows a simple HTML file as an example. Most text in an HTML file is displayed in the Web browser's window, but text enclosed in angle brackets (<>) is formatting information. Many of these codes come in pairs, with the second bearing the same name as the first but using a slash (/) to indicate it's the end of the formatted area. The opening code sometimes includes parameters that fine-tune its behavior, such as setting the size and filename of a graphic or specifying the color of text and background. Some of these codes reference other documents on the Web (both on the main document's server and on other Web servers).
Listing 20.2 A sample HTML file
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>Sample Web Page</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<CENTER><H1 ALIGN="CENTER">Sample Web Page</H1></CENTER>
<IMG SRC="graphics/logo.jpg" ALT="Logo" WIDTH="197" HEIGHT="279">
<P>This is a sample Web page, including
<A HREF="http://www.threeroomco.com/anotherpage.html">a link.</A></P>
</BODY></HTML>
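To make the distinction between formatting codes and displayed text concrete, the following minimal Python sketch feeds a shortened version of Listing 20.2 to the standard library's html.parser module. The class name TagLister and the variable names are illustrative choices, not part of any Web server or browser; a real browser does far more than this.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Record opening codes, closing codes, and displayed text separately."""

    def __init__(self):
        super().__init__()
        self.opened = []   # (tag, attributes) for each opening code
        self.closed = []   # tag name for each closing code
        self.text = []     # text a browser would display

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lowercase.
        self.opened.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.closed.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


# A shortened copy of Listing 20.2 (DOCTYPE and heading omitted).
SAMPLE = """<HTML><HEAD><TITLE>Sample Web Page</TITLE></HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<IMG SRC="graphics/logo.jpg" ALT="Logo" WIDTH="197" HEIGHT="279">
<P>This is a sample Web page, including
<A HREF="http://www.threeroomco.com/anotherpage.html">a link.</A></P>
</BODY></HTML>"""

lister = TagLister()
lister.feed(SAMPLE)
lister.close()

# Codes that were opened but never closed, such as IMG, have no partner.
unpaired = [tag for tag, _ in lister.opened if tag not in lister.closed]

# Codes that reference other documents carry the target in a parameter.
links = [attrs["href"] for tag, attrs in lister.opened if tag == "a"]

print("Unpaired codes:", unpaired)
print("Link targets:", links)
```

The sketch shows that most codes in the listing come in pairs, that IMG stands alone, and that parameters such as HREF and SRC are where references to other documents live.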
Some of the formatting codes in Listing 20.2 should be self-explanatory, but others are more obscure. A handful of the more important codes include the following:
It's possible to use nothing but these codes to create a Web page, but HTML supports many more options, including the ability to format tables, specify fonts, display bulleted or numbered lists, break the document into multiple independent frames, and so on. It's possible to overuse advanced HTML features, though. The upcoming section, "Web Page Design Tips," includes some information on this matter.
In addition to HTML, Web servers can deliver other document types to browsers. Indeed, HTML documents often refer to these documents directly, as in the <IMG> code in Listing 20.2. You can link to plain text pages, graphics, downloadable program files, scripts, or any other type of file. One important caveat is that your Web server should have an appropriate MIME type set for your documents, usually in the mime.types file described in the earlier section, "Understanding Apache Configuration Files." If Apache can't determine the MIME type of a file, it usually sends the file as plain text, which can cause problems because the target OS may alter certain characters in the file, thus corrupting it.
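The extension-to-type lookup just described can be sketched in a few lines of Python. This is only an illustration of the idea, not Apache's actual code: the sample entries follow the general layout of a mime.types file (a MIME type followed by the extensions that map to it), the function names are hypothetical, and the text/plain fallback is an assumption that mirrors the behavior described above.

```python
import os

# A few lines in the style of a mime.types file:
# a MIME type followed by the filename extensions that map to it.
MIME_TYPES_TEXT = """\
# MIME type       Extensions
text/html         html htm
text/plain        txt
image/gif         gif
image/jpeg        jpeg jpg jpe
image/png         png
"""

FALLBACK_TYPE = "text/plain"  # assumed default when no extension matches


def parse_mime_types(text):
    """Build an extension-to-type table from mime.types-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        mime_type, *extensions = line.split()
        for ext in extensions:
            table[ext.lower()] = mime_type
    return table


def guess_type(filename, table):
    """Return the MIME type for filename, or the fallback if unknown."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return table.get(ext, FALLBACK_TYPE)


TABLE = parse_mime_types(MIME_TYPES_TEXT)

for name in ("index.html", "graphics/logo.jpg", "setup.exe"):
    print(name, "->", guess_type(name, TABLE))
```

In this sketch, setup.exe has no entry in the table and so falls through to text/plain, which is exactly the situation that can corrupt a binary download.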
Because many Web pages incorporate extensive graphics, it's important to understand something of the graphics file formats that are common on the Web. The three most common formats are as follows:
As a general rule, a lossless format is best for line art and cartoon-like images that use just a handful of colors. These images tend to acquire ugly-looking artifacts when converted to JPEG format. Digitized photos, by contrast, usually look best in a true-color format (PNG or JPEG), and JPEG's lossy compression scheme doesn't impact such images as much. Therefore, JPEGs are common for digitized photos displayed on the Web.
When you use JPEG, your graphics package will give you an option for a compression level. You can save your graphics file with little compression, which produces a large but good-looking image, or use a great deal of compression, which produces a much smaller file that degrades more in quality. The exact scale used to describe the level of compression varies from one package to another, but a 1-100 scale is not uncommon, with 100 representing the best quality. Most images you're likely to put on the Web look acceptable at a moderate setting (say, around 50 on such a scale), and compressing these images can help reduce the load on your Web server and cause the images to appear more quickly in your users' Web browsers. You may want to experiment with different types of graphics files to learn what compression level works best for you.
Tools for Producing Web Pages
Although you can create Web pages by hand by editing the raw HTML in a text editor and using separate tools like The GIMP (http://www.gimp.org) to create or edit graphics, many Web page designers prefer to use GUI HTML design tools. These tools let you type in and edit text much as you can in a what-you-see-is-what-you-get (WYSIWYG) word processor, using buttons or special keystrokes to indicate centering, bold text, new paragraphs, and so on. This approach is certainly convenient, and Apache doesn't really care how you generate your files, so from a server operation point of view, there's no reason to avoid such tools. One exception is that Microsoft's FrontPage can create Web pages that depend on special server extensions, so it's best to avoid it when using Apache.
Examples of Web page creation tools include the following:
If you use a Web page development tool, you should be aware of the limitations of these tools. Because of the nature of the Web, no two browsers are likely to display the same page in precisely the same way, but working with these tools makes it easy to overlook this fact. If the tool creates HTML that's optimized for particular browsers, your Web site's visitors may find your site difficult to read because of the assumptions your HTML editor made.
Web Page Design Tips
Some Web designers like to use HTML features to their fullest, thus creating a layout that can be almost as complex as anything that could be created on a printed page. There are drawbacks to using the more advanced HTML features, though. Specifically, it's impossible to predict precisely how a given browser will handle a code. Indeed, even the codes in Listing 20.2 aren't entirely consistent in their application; as noted in the preceding descriptions, different browsers respond differently to the various codes used to center text, for instance. Font specifications work only if the font is installed on the client's computer; if it's not, the usual result is a fallback to an ugly default, such as Courier. Color specifications may interact poorly with a user's own color choices. (One particularly annoying error is specifying a background color without specifying a text color. If you specify a white background color but no text color, a user who has defaults set to white text on a black background will be unable to read your page. Listing 20.2 specifies background and foreground colors, but it doesn't specify link colors, which can also be important in this equation.)
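The color pitfall described above is easy to check mechanically. The following Python sketch, which uses the standard library's html.parser module, reports which color parameters a page's BODY code leaves to the browser's defaults once it has set at least one of them. The names BodyColorCheck and missing_colors are hypothetical, and the sketch assumes the five BODY color parameters common in HTML of this vintage (BGCOLOR, TEXT, LINK, VLINK, and ALINK).

```python
from html.parser import HTMLParser

# Color parameters that a <BODY> code can carry.
COLOR_ATTRS = ("bgcolor", "text", "link", "vlink", "alink")


class BodyColorCheck(HTMLParser):
    """Collect the color parameters that <BODY> sets and those it omits."""

    def __init__(self):
        super().__init__()
        self.present = []
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "body":
            return
        names = {name for name, _ in attrs}
        self.present = [a for a in COLOR_ATTRS if a in names]
        self.missing = [a for a in COLOR_ATTRS if a not in names]


def missing_colors(html):
    """Return the colors left to browser defaults when others are set."""
    check = BodyColorCheck()
    check.feed(html)
    check.close()
    # A page that sets no colors at all defers entirely to the user,
    # which is consistent, so nothing is reported in that case.
    return check.missing if check.present else []


# The <BODY> code from Listing 20.2 sets two colors but no link colors.
print(missing_colors('<BODY BGCOLOR="#FFFFFF" TEXT="#000000"><P>Hi</P></BODY>'))
```

A page that sets only BGCOLOR would be reported as missing TEXT as well, which is the unreadable white-on-white case.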
Because Web browsers vary wildly, it's best to test your Web pages on multiple browsers. At the very least, you should test on both Netscape Navigator and Microsoft Internet Explorer. If possible, you should test on multiple versions of these browsers. Other browsers that are popular, particularly in the Linux community, include Mozilla (http://www.mozilla.org, an open source cousin to Netscape Navigator), Opera (http://www.opera.com), Konqueror (a part of the KDE project), and Lynx (http://lynx.browser.org, a text-based Web browser). Lynx is particularly important if you want your site to be accessible to all users. Because it's text-based, it will turn up problems you might not notice in a GUI browser, but that might be important to somebody who uses Lynx, or to a visually impaired person who uses a speech synthesizer with a computer. Also, keep in mind that many (perhaps most) of your Web server users won't be using Linux. On Windows, Internet Explorer is the most popular browser, but others (including many of the preceding browsers) are available. MacOS, BeOS, OS/2, and many other platforms all sport their own browsers, some of which are shared with other platforms and some of which are not.
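Testing in a real text-based browser is the best check, but one problem such a browser tends to expose, an image with no text alternative, can also be flagged automatically. The Python sketch below is one possible approach, built on the standard library's html.parser module; the names AltTextCheck and images_without_alt are hypothetical. It treats an empty ALT as missing, which is an assumption on this sketch's part, because an empty ALT is sometimes used deliberately for purely decorative images.

```python
from html.parser import HTMLParser


class AltTextCheck(HTMLParser):
    """List the SRC of every <IMG> that lacks usable ALT text."""

    def __init__(self):
        super().__init__()
        self.without_alt = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # Treat a missing, empty, or valueless ALT as no description at all.
        if not (attributes.get("alt") or "").strip():
            self.without_alt.append(attributes.get("src", "(no SRC)"))


def images_without_alt(html):
    """Return the SRC values of images a text-only user cannot identify."""
    check = AltTextCheck()
    check.feed(html)
    check.close()
    return check.without_alt


PAGE = (
    '<IMG SRC="graphics/logo.jpg" ALT="Logo" WIDTH="197" HEIGHT="279">'
    '<IMG SRC="graphics/bar.gif">'
)
print(images_without_alt(PAGE))
```

A check like this complements, rather than replaces, viewing the page in Lynx and the other browsers mentioned above.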