Chapter 17: Site Management | HTML & XHTML: The Complete Reference (Osborne Complete Reference Series)

Even after all the work of building and delivering a Web site, the Web developer's job is not done. Web sites live on and must be maintained to be effective. There are many aspects to Web site maintenance, from adding new content to upgrading a server. This chapter focuses on Web site maintenance issues controlled by or related to markup. A brief discussion about the potential extent of site maintenance duties is presented at the end of the chapter.

Meta-Information

Meta-information is simply information about information. Information on the Web often involves many pieces of associated, descriptive information that isn't always explicitly represented in the resource itself. Examples of meta-information include the creator of a document, the document's subject, the publisher, the creation date, and even the title. When used properly, descriptive meta-information can assist in a variety of tasks. For example, it makes the indexing task of search engines easier, and helps filtering software determine the presence of objectionable content. As already discussed, meta-information is related to linking because it helps provide meaning for a document's role in a global or local information space. Meta-information also can provide room for miscellaneous information related to the document. HTML and XHTML's primary support for meta-information is through the meta element, which allows authors to add arbitrary forms of metadata. This tag is found in the <head> of an HTML document and generally has name and content attributes, but an alternative form uses http-equiv and content attributes.

The name Attribute

A <meta> tag that uses the name attribute is the easiest to understand. The name attribute specifies the type of information. The content attribute is set to the content of the meta-information itself. For example, the markup

  <meta name="Favorite Sandwich" content="Turkey and Swiss" />

defines meta-information indicating the document author's favorite lunch. Although the metadata inserted into a document can describe characteristics limited only by an author's imagination, there are some well-understood values that have meaning for Web search tools such as Google, AltaVista, and so on. Many search robots understand the author, description, and keywords values for the name attribute. By setting the name and content attributes, developers can add meta-information to the head of their documents and improve the indexing of their pages by Web search robots. The following code sets the description of a Web page for a fictitious company:

  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml" lang="en">
  <head>
  <title>Demo Company Home Page</title>
  <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />
  <meta name="author" content="Demo Company, Inc." />
  <meta name="description" content="Demo Company, the #1 vendor of green gadgets on the Web" />
  <meta name="keywords" content="Demo Company, green gadgets, gadgets" />
  </head>
  <body>
  . . . Content of the page . . .
  </body>
  </html>

As this example demonstrates, HTML authors can improve the indexing of their pages simply by providing the appropriate keywords in the correct <meta> tag format and alerting the search robot to the site's existence. This will be discussed in more depth later in this chapter in the section "Search Engine Promotion." For now, let's turn our attention to the various other uses of the <meta> tag.
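The robot's side of this exchange can be sketched in a few lines. The following Python fragment is an illustration only, not a description of how any particular search engine works: it uses the standard library's html.parser module to pull name and content pairs out of a page head like the Demo Company example. The MetaCollector class name is hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        # XHTML-style <meta ... /> tags also arrive here, because the default
        # handle_startendtag() forwards to handle_starttag().
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


page = """
<head>
<title>Demo Company Home Page</title>
<meta name="author" content="Demo Company, Inc." />
<meta name="description" content="Demo Company, the #1 vendor of green gadgets on the Web" />
<meta name="keywords" content="Demo Company, green gadgets, gadgets" />
</head>
"""

collector = MetaCollector()
collector.feed(page)

# Keywords are conventionally comma-separated, so split them for indexing.
keywords = [word.strip() for word in collector.meta["keywords"].split(",")]
print(collector.meta["description"])
print(keywords)
```

Tags that use http-equiv rather than name are skipped here on purpose, since they carry header information rather than descriptive metadata.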

<meta> and http-equiv

The other form of the meta element uses the http-equiv attribute, which directly allows the document author to insert HTTP header information. The browser can access this information when it reads the document. The server also can access it when the document is sent, but this is rare. The http-equiv attribute is set to a particular HTTP header type, whereas the content value is set to the value associated with the header. For example, the markup

  <meta http-equiv="Expires" content="Thu, 04 Jun 1998 22:34:07 GMT" />

placed in the head of a document sets the expiration date to be June 4, 1998. A variety of HTTP headers can be specified by a <meta> tag. Of course, if you have access to the server, you would generally set them in the actual HTTP headers rather than in the page. If you don't have access to set this up on your Web server, you may find this form of <meta> tag very useful, especially for cache control, client-pull, and site filtering.

Cache Control

Caching on the Web involves keeping a copy of a page or media item either locally on a user's disk drive or up on a proxy server on the network to avoid fetching a brand new copy from a Web site. This is a very good idea because it avoids redundant network traffic. Consider the waste of refetching a page over and over again for numerous visitors if it is not changing. However good the idea of caching might be, very often browsers or proxy servers cache pages too aggressively, which causes users to inadvertently view old content. The <meta> tag can be used to influence caching by setting expiration dates as well as to provide other cache control information.

There are three <meta> tag forms that you can use to control document caching. The first, Expires, specifies an expiration date for the Web page. If you set the expiration date in the past, the browser or proxy should always ask for a new page. For example, as previously shown, we placed

  <meta http-equiv="Expires" content="Thu, 04 Jun 1998 22:34:07 GMT" />

in the head of a document to set the expiration date to be June 4, 1998. Because this obviously is in the past, it should cause the page to expire. However, it's far easier to set the content attribute value to 0, which should indicate an expiration time of "now," and therefore cause the browser to ask for a new version of the page every time:

  <meta http-equiv="Expires" content="0" />

Of course, you also can set a real time value in GMT format, as shown in the previous examples, to indicate a page expiration at a future date.
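Writing the GMT date by hand invites small mistakes, such as a weekday that does not match the date. As a minimal sketch, the value can be generated instead; the Python below relies on the standard library's email.utils.format_datetime, and the expires_meta function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_meta(moment):
    """Return an Expires <meta> tag for a timezone-aware datetime."""
    # usegmt=True produces the "GMT" suffix used in HTTP dates; it requires
    # the datetime to be in UTC, so convert first.
    http_date = format_datetime(moment.astimezone(timezone.utc), usegmt=True)
    return '<meta http-equiv="Expires" content="%s" />' % http_date


# Example: a page that should expire 30 days from now.
print(expires_meta(datetime.now(timezone.utc) + timedelta(days=30)))
```

Because the weekday is computed from the date, the two can never disagree.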

Aside from setting the expiration date, two values for the http-equiv attribute, Pragma and Cache-Control, are specifically designed to prevent (or control) caching, and should take a value of no-cache. So, to prevent your page from being cached in most browsers, you should use the following lines:

  <meta http-equiv="Pragma" content="no-cache" />
  <meta http-equiv="Cache-Control" content="no-cache" />

While cache control is certainly a good idea, far too many sites don't use it or use it too aggressively, thus defeating the value of network caches. Furthermore, some caches on the Web really don't respect the cache control information provided to them. Lastly, the <meta> tag simply does not provide developers all that they need. In most cases, the images and other embedded media items are really the items that need cache control information set and these cannot be set using this tag; instead, the server must be configured in such a way as to add these cache control options to each object's response headers.
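What "configuring the server" means depends entirely on the server in use. Purely as an illustration, the following sketch uses Python's built-in http.server module to attach a Cache-Control header to every response, including images. The extension list, the one-day lifetime, and the names cache_control_for and CachingHandler are assumptions made for this example, not recommendations.

```python
from http.server import SimpleHTTPRequestHandler

# Hypothetical policy: treat these file types as long-lived static media.
LONG_LIVED = (".gif", ".jpg", ".png", ".css", ".js")


def cache_control_for(path):
    """Pick a Cache-Control value from the requested path."""
    # Ignore any query string before looking at the file extension.
    filename = path.split("?", 1)[0].lower()
    if filename.endswith(LONG_LIVED):
        return "max-age=86400"  # cacheable for one day (illustrative value)
    return "no-cache"           # pages: revalidate before each reuse


class CachingHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        # Add the header just before the header block is closed, so it
        # applies to every object served, not only HTML pages.
        self.send_header("Cache-Control", cache_control_for(self.path))
        super().end_headers()
```

To try it, pass CachingHandler to http.server.HTTPServer and call serve_forever(). A production Web server would express the same policy in its own configuration rather than in code.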

Client-Pull

The concept of a page reloading itself or loading another page after a certain period of time is called client-pull. For example, you can build an entry page, or splash page, that welcomes visitors to a site and then automatically follows it with a second page after a certain period of time. The following example <meta> tag loads a page called secondpage.htm two seconds after the first page loads:

  <meta http-equiv="REFRESH" content="2;URL=secondpage.htm" />

Using the client-pull form of the <meta> tag is easy. Just set the content equal to the desired number of seconds, followed by a semicolon and the URL (full or relative) of the page to load. Note, however, that not all browsers support this form of meta-refresh, so people often add a link to the page, along with a note telling the user to click it if the page does not refresh after a certain amount of time. Consider that even in modern browsers such as Internet Explorer, security preferences can be set to disallow client-pull, so don't assume it is always available.
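The structure of the content value, and the fallback link that should accompany it, can be sketched as follows. This Python fragment is a simplified reading of the "seconds; URL" format described above, not a reproduction of any browser's parsing rules, and the function names are hypothetical.

```python
import html
import re

# Delay in seconds, then optionally a semicolon and a URL, with or without
# the "URL=" prefix. Matching is case-insensitive.
REFRESH_PATTERN = re.compile(
    r"^\s*(\d+)\s*(?:;\s*(?:url\s*=\s*)?(.*?))?\s*$", re.IGNORECASE
)


def parse_refresh(content):
    """Split a Refresh content value into (delay_seconds, url_or_None)."""
    match = REFRESH_PATTERN.match(content)
    if match is None:
        raise ValueError("not a recognizable Refresh value: %r" % content)
    return int(match.group(1)), (match.group(2) or None)


def fallback_link(content, text="Continue"):
    """Build a plain link to the refresh target for browsers that ignore it."""
    delay, url = parse_refresh(content)
    if url is None:
        return ""  # the page only reloads itself, so no link is needed
    return '<a href="%s">%s</a>' % (html.escape(url, quote=True), text)


print(parse_refresh("2;URL=secondpage.htm"))
print(fallback_link("2;URL=secondpage.htm"))
```

Generating the link from the same value as the <meta> tag keeps the two from drifting apart when the target page changes.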

The meta element is very open-ended. The World Wide Web Consortium (W3C) is already developing more sophisticated approaches for representing metadata. The most interesting approach probably is PICS, described next, which provides a standard for site filtering.

Site Filtering with PICS

One major use of meta-information for links and pages is site filtering. At its base level, a filter can be used to restrict access to certain files or types of information. As a technology, this sounds rather innocuous, but when extended, site filtering can quickly lead to censorship. Whether filtering information on the Internet is right or wrong is an area of great debate. Obviously, parents and educators are extremely concerned with the availability of pornographic, violent, or other "inappropriate" types of information on the Internet. Deciding what is inappropriate is the key to the censorship problem because definitions of what should be allowed vary from person to person. Regardless of how "inappropriate" is defined, few people would disagree that information considered inappropriate by just about everyone does exist on the Internet. The perceived extent of this information tends to be directly related to a person's belief system. The W3C has proposed the Platform for Internet Content Selection, or PICS (http://www.w3.org/pub/WWW/PICS/), as a way to address the problem of content filtering on the Web.

The idea behind PICS is relatively simple. A rated page or site will include a <meta> tag within the head of an HTML document. This <meta> tag indicates the rating of the particular item. A rating service, which can be any group, organization, or company that provides content ratings, assigns the rating. Rating services include independent, nonprofit groups such as the Internet Content Rating Association (ICRA) (http://www.icra.org/webmasters/). The rating label used by a particular rating service must be based on a well-defined set of rules that describes the criteria for rating, the scale of values for each aspect of the rating, and a description of the criteria used in setting a value.

To add rating information to a site or document, a PICS label in the form of a <meta> tag must be added to the head of an HTML file. This particular <meta> tag must include the URL of the rating service that produced the rating, some information about the rating such as its version, submitter, or date of creation, and the rating itself. Many rating services, such as ICRA, allow free self-rating. Filling out a form and answering a few questions about a site's content is all that is required to generate a PICS label.

After you complete and submit the questionnaire, you will receive the appropriate meta-information, which then can be placed in the head of your XHTML documents. An example of a PICS label using the ICRA rating is shown here:

  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml" lang="en">
  <head>
  <title>PICS Meta Tag Example</title>
  <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />
  <meta http-equiv="pics-label" content='(pics-1.1
    "http://www.icra.org/ratingsv02.html" comment
    "ICRAonline EN v2.0" l gen true for
    "http://www.htmlref.com" r (nz 1 vz 1 lz 1 oz 1 cz 1)
    "http://www.rsac.org/ratingsv01.html" l gen true for
    "http://www.htmlref.com" r (n 0 s 0 v 0 l 0))' />
  </head>
  <body>
  <h1 align="center">XHTML: The Complete Reference</h1>
  <hr />
  <p>There's nothing offensive at this site.</p>
  </body>
  </html>

Under the RSACi rating system, which is now administered by the ICRA, information is rated in four categories (nudity, sex, violence, and language), each on a five-point scale from 0 to 4. You can see the questionnaire and get more information at the ICRA Web site (http://www.icra.org/_en/label/extended/).

Note	While ICRA has taken over most PICS ratings, the older RSACi style is still used because it is recognized by many filtering devices including Internet Explorer.

When filtering software reads a file that contains a rating, it determines whether the information should be allowed or denied. Very strict filtering environments might deny all sites that have no rating, so sites with a broad audience are encouraged to use ratings to avoid restricting readership.
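The allow-or-deny decision can be illustrated with a small sketch. The Python below handles only the ratings portion of an RSACi-style label, such as "(n 0 s 0 v 0 l 0)", and does not attempt to parse the full PICS label syntax. The function names, the limits dictionary, and the choice to treat a missing category as the maximum value are all assumptions made for this example.

```python
# Category letters as they appear in an RSACi ratings string.
RSACI_CATEGORIES = {"n": "nudity", "s": "sex", "v": "violence", "l": "language"}
MAX_LEVEL = 4  # each category is rated from 0 to 4


def parse_rsaci(ratings):
    """Turn a ratings string such as "(n 0 s 0 v 0 l 0)" into a dict."""
    tokens = ratings.strip("() ").split()
    if len(tokens) % 2:
        raise ValueError("expected letter/value pairs: %r" % ratings)
    levels = {}
    for letter, value in zip(tokens[0::2], tokens[1::2]):
        levels[RSACI_CATEGORIES[letter]] = int(value)
    return levels


def is_allowed(ratings, limits, allow_unrated=False):
    """Decide whether a page passes the filter.

    ratings is the ratings string, or None for a page with no label.
    limits maps category names to the highest level the viewer accepts.
    """
    if ratings is None:
        # A strict environment denies every page that carries no rating.
        return allow_unrated
    levels = parse_rsaci(ratings)
    # Assumption: a category missing from the label counts as the maximum.
    return all(
        levels.get(category, MAX_LEVEL) <= limit
        for category, limit in limits.items()
    )


family_limits = {"nudity": 0, "sex": 0, "violence": 1, "language": 1}
print(is_allowed("(n 0 s 0 v 0 l 0)", family_limits))
print(is_allowed(None, family_limits))
```

The allow_unrated switch is what makes unlabeled sites invisible in strict environments, which is why broadly targeted sites have an incentive to label themselves.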

Filtering technology that supports PICS is beginning to achieve widespread acceptance and use. Internet Explorer already includes PICS-based rating filtering, as shown in Figure 17-1.

Figure 17-1: PICS rating support under Internet Explorer

Note	The <meta> tag with PICS information must occur within the head of the document; otherwise, it will not be recognized. However, more than one <meta> tag may be included within the head so that multiple rating services can be used simultaneously.

Numerous filtering software packages, such as CyberPatrol (www.cyberpatrol.com), are extremely popular with both parents and corporate users trying to limit employee Web abuse. Of course, the technology itself can't cure the problem. Trust in a particular ratings system is a major stumbling block in adoption of the filtering idea. Even when trust is gained, if the rating system seems confusing or arbitrary, its value is lowered. In the "real world," Hollywood's MPAA movie rating system assigns each movie a single value of G, PG, PG-13, R, or NC-17. The assignment of a particular movie rating is based on many factors that often seem arbitrary to casual observers. When considering movies, parents might wonder how scenes of a dinosaur ripping a man to shreds merit a PG or PG-13 rating, whereas the use of certain four-letter words brings an R rating. Certainly similar situations occur on the Web. Because of the imprecise nature of ratings, the topic is a loaded one, both off and on the Web.

Beyond simple content rating, some potential benefits of PICS aren't immediately obvious. With PICS-based environments, employers could limit employee access to only those Web sites that are used for day-to-day business. The idea of PICS can be extended not just to deny or allow information, but to prefer it. Imagine a filtering service for search engines that could return sites that have a particular quality of content or level of accuracy. In the general sense, labels are important because they allow documents to move beyond a mere description of where the document is to what the document is about.
