Web Tools


What exactly is a Web site? Technically speaking, this question isn't difficult to answer. A Web site can be defined as a collection of related files, usually written in HTML, that are delivered to users via HTTP.

Beyond these basics, of course, there's much more technology that a Web site can employ. Server- and client-side scripting are common techniques for making Web pages dynamic and interactive; links to databases are increasingly important; ActiveX controls and Netscape plug-ins have extended the range of data types that Web browsers can interpret; and HTTP is no longer the only protocol that matters (for example, UDP audio and video streams are increasingly important).

But for people involved with the Web (consumers, publishers, developers, and those who evaluate their efforts), mere technology doesn't define a Web site. Rather, most of us evaluate a site in terms of its former physical equivalent, whether that be paper or bricks and mortar.

For example, a publisher experienced in producing traditional print media is likely to see a Web site as a magazine, designed to serve readers with interesting articles and, along the way, lure them into reading some paid advertising. A retailer, on the other hand, is likely to see a Web site as a store, designed to showcase products and herd customers toward a metaphorical checkout counter.

To those of you who are Web site developers, or merely interested in their predicament, I would like to propose a less commonly encountered metaphor. Think of a Web site as if it were an airplane: The goal is to carry as many passengers as possible, get each traveler to his or her destination safely and efficiently, and leave customers pleased and eager to return for their next journey.

Clearly, any successful trip involves extensive preflight checking; only this can ensure that nothing in the plane is broken and that there is adequate support for every foreseeable requirement during the journey. The trip also requires in-flight monitoring. Error conditions should be tracked, and preferably responded to automatically via redundant systems.

Finally, after a successful landing (a user leaving the Web site) there is still plenty of postflight work to be done. Since on this particular airline we are both the engineering and the marketing department, we must do everything from keeping track of how many seats were full to surveying customers to find out what they thought of the food and the movie, to doing essential maintenance on the aircraft itself.

The metaphor is useful because Web tools, the subject of this tutorial, parse rather neatly into preflight, in-flight, and postflight categories. Read on, and I will provide an overview of the first two categories, then look at postflight analysis in greater detail.

Preflight Tasks

If a Web site is brand-new, then its developers face a task fortunately unfamiliar to any airline employee: engineering it from the ground up. The first preflight task is, then, creating the text and graphics that will be incorporated into the site.

The next step is conversion into HTML. Many productivity programs, such as both Microsoft Word and Excel in the new Office 2000 versions, can produce HTML-formatted output as a matter of course. However, many developers still favor simple ASCII text editors such as Notepad or BBEdit, using them either to code HTML by hand or to tweak the output of some other program.

Some programs maintain their classic what-you-see-is-not-what-you-get format, while adding menus that insert the most commonly used HTML tags. Others, such as SoftQuad's HoTMetaL Pro, let the user switch between a text editor view and a what-you-see-is-what-you-get view.

Many editors also offer HTML syntax checking, plus additional Web features such as cascading style sheets, dynamic HTML, and Extensible Markup Language (XML). (For more information on XML, see "Lesson 124: XML and XSL," November 1998.)

Whether out of necessity or just the desire to stand out in a crowded marketplace, many editors offer some degree of site management. For example, when used with complementary server extensions, Microsoft's FrontPage lets multiple users author parts of a site simultaneously from remote clients if necessary. It also simplifies creating threaded discussions and site indexes.

Whatever editor you use, the next essential preflight check is to test Web pages using multiple browsers: different releases of Netscape Navigator and Microsoft Internet Explorer at a bare minimum, and preferably others. The purpose of this is merely to evaluate the appearance of the new pages as rendered by the different browsers.

The final preflight step, checking a site to make sure all of its links work, used to be laborious and boring. Now it is easily performed by link-checking software. For detailed information on this, see "Control Web Site Content," June 1999.
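To see the idea behind such a tool, here is a minimal link-checking sketch in Python: it fetches a single page, pulls out the anchor targets, and reports any that fail to answer with a successful HTTP status. The starting URL is only a placeholder, and a real checker would also crawl the rest of the site and throttle its requests.

# Minimal link-checker sketch: fetch one page, extract href targets,
# and report any that do not answer with a successful HTTP status.
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def check_links(start_url):
    page = urlopen(start_url).read().decode("utf-8", errors="replace")
    collector = LinkCollector()
    collector.feed(page)
    for href in collector.links:
        target = urljoin(start_url, href)
        if not target.startswith("http"):
            continue  # skip mailto:, javascript:, and similar schemes
        try:
            status = urlopen(target, timeout=10).status
            print(status, target)
        except (HTTPError, URLError) as err:
            print("BROKEN", target, err)

if __name__ == "__main__":
    check_links("http://www.example.com/")  # placeholder starting page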

In-Flight Monitoring

Once a Web site has been devised, checked out, and placed on a server, some developers consider their job done, until and unless they get e-mail from a user complaining that some feature doesn't work. Given the critical nature of today's sites, however, this is the moral equivalent of waiting for a passenger to press the flight attendant call button in order to find out that an engine is on fire.

High-availability Web sites may require redundant servers that are teamed via simple round-robin DNS or perhaps a dedicated hardware unit such as Cisco Systems' LocalDirector, which watches TCP connections, detects when one server fails to respond, and switches clients to another.
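To make the round-robin half of that concrete, here is a rough Python sketch of what a client sees: one host name resolves to several addresses, and a caller can fall back to the next address in the pool when the first one refuses the connection. The pool name is hypothetical.

# Client-side effect of round-robin DNS: one name, several addresses,
# with fallback to the next address when a connection attempt fails.
import socket

def connect_round_robin(hostname, port=80, timeout=5):
    # getaddrinfo returns every address published for the name; a
    # round-robin DNS server rotates their order between queries.
    addresses = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    for family, socktype, proto, _canon, sockaddr in addresses:
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            print("connected to", sockaddr[0])
            return sock
        except OSError as err:
            print("failed:", sockaddr[0], err)
    raise ConnectionError("no server in the pool responded")

# conn = connect_round_robin("www.example.com")  # hypothetical pool name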

With or without the aid of such hardware, in-flight software tools can monitor Web transactions for a bevy of error conditions. They can then automatically alert an administrator, distribute traffic around a failed server or a broken link, perform load balancing, and monitor the performance of back-end databases.
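As a toy illustration (not a stand-in for any commercial monitor), the Python sketch below polls a few pages on a schedule, times each response, and flags failures at the point where a real tool would page an administrator or pull the failed server out of rotation. The URLs are placeholders.

# Toy in-flight monitor: poll a few URLs and flag error conditions.
# A production tool would also alert an administrator, remove the
# failed server from rotation, and track back-end response times.
import time
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

URLS = [  # placeholder pages to watch
    "http://www.example.com/",
    "http://www.example.com/store/checkout.html",
]

def poll(urls, interval=60):
    while True:
        for url in urls:
            started = time.time()
            try:
                status = urlopen(url, timeout=10).status
                elapsed = time.time() - started
                print("OK", status, round(elapsed, 2), url)
            except (HTTPError, URLError) as err:
                # This is where a real monitor would raise an alert.
                print("ALERT failed:", url, err)
        time.sleep(interval)

# poll(URLS)  # runs until interrupted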

In fact, these tools include Web-specific versions of Application Performance Management (APM) software (see "Application Performance Management Software," May 1999). As such, they can operate by monitoring SNMP traps, sniffing packets, collecting data from specially designed server extensions, or some combination of these.

Back Down On The Ground

While not every Web site requires active, in-flight monitoring, each (to abuse my metaphor one last time) certainly needs a postflight "ground crew." Scrutinizing server logs has always been important. Now, given the commercial importance of today's Web sites, it is vital.

Automatically created as users access a server, log files were originally devised simply for the benefit of system administrators. As well as providing a crude way of tracing broken links and other errors, they gave an ongoing indication of exactly when demand for a site was highest. Today, logs are being used as a primary source of demographic information about visitors. What ad banners do they click on most? What pages do they read often, and how long do they linger before clicking through to another? What companies do they come from?

To answer questions such as these, every Webmaster needs a log file analyzer. I will devote the remainder of this tutorial to this type of program and what it can (and cannot) do.

Over the years, HTTP servers have used a variety of log file formats. The baseline standard, however, is the Common Log Format (CLF), originally created by the National Center for Supercomputing Applications (NCSA) for its HTTPd server software.

The access log in CLF records the bulk of the information. Every successful (or merely attempted) file access results in a new line being added to this log. Each line contains seven data fields that are separated by spaces and further delineated, when necessary, by brackets or quotation marks.

To provide you with a random example, I used my desktop computer at Network Magazine to access a personal home page that is hosted externally (see Figure).

Only the first seven fields in this table would be found in a CLF access log. Servers using CLF record information about what Web page visitors came from last and what type of browser they are using, but they place this in separate referrer and agent log files. Since juggling these files can be cumbersome, most Web servers today use Extended Log Format (ELF), which incorporates referrer and agent information into the access log itself (as seen on the bottom line in the figure).
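For the curious, here is a small Python sketch of how an analyzer might pick apart one line of such a log. The first seven fields are the CLF ones just described; the two optional quoted fields at the end are the referrer and agent data that extended (combined) logs append. The sample line is invented for illustration.

# Parse one access-log line: seven CLF fields, plus optional
# referrer and user-agent fields appended by extended/combined logs.
import re

LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

# Invented combined-format line for illustration:
sample = ('192.0.2.10 - - [05/Jul/1999:10:30:00 -0700] '
          '"GET /index.html HTTP/1.0" 200 5120 '
          '"http://www.example.com/" "Mozilla/4.0"')
print(parse_line(sample))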


Collecting More Information

Other log file formats have been designed to collect more extensive information. For example, Microsoft's Internet Information Server (IIS) adds metrics such as the name or IP address of a server, the amount of time it takes the server to process a request, the number of bytes received from a client, and Windows NT-specific status information.

Since the majority of visitors to any Web site do not have permanent, routable IP addresses, one visitor may tend to look much like another. If, for example, 20 or 30 users from AOL are accessing your site at the same time, your log file will record plenty of AOL activity but will not help you understand the movements of any individual. To get around this problem, cookies were invented. A cookie is a unique identifying code that a Web server can give to a client to store locally when a site is first visited. Later, the server can read the cookie (only the one it created, not others) to see if the user is a repeat visitor. Naturally, the cookie can be linked to a name, password, and other demographic information that you provide on first visiting the Web site.
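The sketch below illustrates that round trip in Python: on the first visit the server mints an identifier and sends it in a Set-Cookie header; on later visits it reads the value back from the Cookie header the browser returns. The cookie name and value are hypothetical.

# Cookie round trip: issue a unique identifier on the first visit,
# then read it back from the browser on subsequent visits.
from http.cookies import SimpleCookie
from uuid import uuid4

# Server side, first visit: mint and send the identifier.
issued = SimpleCookie()
issued["visitor_id"] = uuid4().hex        # hypothetical cookie name
issued["visitor_id"]["path"] = "/"
print(issued.output())                    # "Set-Cookie: visitor_id=...; Path=/"

# Server side, later visit: parse what the browser sends back.
returned = SimpleCookie()
returned.load("visitor_id=" + issued["visitor_id"].value)
print("repeat visitor:", returned["visitor_id"].value)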

Most users now realize that cookies are not a serious privacy threat. They can, however, cause problems in cases where many users share the same machine (in a public library, for example). Conversely, cookies have difficulty keeping track of individuals who habitually use several machines to do their Web browsing.

When you import a log file into analyzer software, the program may start by converting raw IP addresses into domain names, assuming reverse DNS lookup has not already been performed by the Web server. It may then convert file names to actual page titles, making all subsequent reports easier to read.
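Both cleanup passes are simple to sketch in Python: reverse DNS by way of the system resolver, and a lookup table (invented here) that maps file names to page titles.

# Two cleanup passes an analyzer might make before reporting:
# resolve client IP addresses to host names, and map file names
# to human-readable page titles.
import socket

PAGE_TITLES = {  # hypothetical title table
    "/index.html": "Home Page",
    "/store/checkout.html": "Checkout",
}

def resolve_host(ip_address):
    try:
        return socket.gethostbyaddr(ip_address)[0]
    except OSError:
        return ip_address  # no reverse record; keep the raw address

def page_title(path):
    return PAGE_TITLES.get(path, path)

print(resolve_host("192.0.2.10"))   # falls back to the address itself
print(page_title("/index.html"))    # "Home Page"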

Beyond that, an analyzer can then parse the log file to provide a wide variety of reports, both tabular and graphical. High-end analyzers, known as Web mining solutions, can even link to other databases in an enterprise, answering such questions as "How many visitors to the site purchased something from our online store?"
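Building on the log-parsing sketch above, the following Python fragment shows the flavor of such reports: hits per page and traffic by hour. The log file name is a placeholder, and a full analyzer would of course go much further.

# Simple tabular reports built from parsed log entries, using the
# parse_line() helper from the earlier sketch.
from collections import Counter

def summarize(lines):
    pages, hours = Counter(), Counter()
    for line in lines:
        entry = parse_line(line)
        if not entry:
            continue
        parts = entry["request"].split()
        path = parts[1] if len(parts) > 1 else "-"
        pages[path] += 1
        hours[entry["date"][12:14]] += 1  # hour field of the CLF timestamp
    print("Top pages:")
    for path, hits in pages.most_common(10):
        print(" ", hits, path)
    print("Hits by hour:")
    for hour, hits in sorted(hours.items()):
        print(" ", hour + ":00", hits)

# with open("access.log") as log:  # placeholder log file name
#     summarize(log)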

Download one of the many demos available and try it with your own log files. This will let you evaluate the software with all the appreciation, and skepticism, it deserves.

This tutorial, number 132, by Jonathan Angel, was originally published in the July 1999 issue of Network Magazine.

 