Section 11.4. Creating the Web Site | Practical Development Environments

11.4. Creating the Web Site

This section discusses the process of creating a web site for a project. As with many parts of a development environment, the simplest possible solutions are often the best. There are thousands of complicated web sites that describe how to design and build complicated web sites. To me, this seems like too much hard work for a site whose aim in life is to help communication between project members.

Let's assume that you know how to write basic HTML and already have a web server such as Apache (http://www.apache.org) running on a machine. The web server reads files from a group of directories and returns them to the visitors to the web site. You will also want to be able to run some other tasks on the same machine, mostly for generating dynamic web pages.

The first thing to make sure of is that you have a process for updating web pages in a controlled fashion. To make this easy, I suggest using an already familiar tool such as your SCM tool to control your HTML files. Some writers may be more comfortable with HTML editors that have built-in support for copying files to and from a web server. Even so, they should keep versioned copies of the sources to their web pages using an SCM tool.

11.4.1. Static Web Pages

The simplest solution that I have found for web pages that don't change very often starts with a small number of HTML files. Static files make life easy for a web server: they take less work to return to visitors and they can be mirrored on multiple web servers for redundancy and to spread the load. To update the web pages that people see when they are browsing, you can simply arrange for commits to these files to also check them out to whatever location the web server reads them from.

To do this with CVS, first check out the HTML pages once into the directory that the web server reads from, and then add the line:

^html/*          $CVSROOT/CVSROOT/update_web_site

to your CVSROOT/commitinfo file. Then add an executable shell script named update_web_site to your CVSROOT directory and also list it in the checkoutlist file, so that CVS keeps a checked-out copy of the script alongside the other administrative files. Now whenever files in the html top-level directory in the repository are committed, the script update_web_site will be run. If the web server's pages are in /var/www/html, then the script should contain:

#!/bin/bash
cd /var/www/html
sleep 1
cvs -q update -dP &

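The -q argument causes CVS to print only the names of the files that it updates. The -d argument creates any new directories that have been added to the repository, and -P prunes directories that have become empty. The sleep 1 and & are necessary because commitinfo scripts run before the commit itself has completed; running the update in the background gives the commit a chance to finish, so that CVS doesn't interfere with its own system of locks on files in the repository. The disadvantage of this approach is that as the number of HTML files grows, it takes longer to update them all, and thus longer to commit a change to just one file.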
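One way to reduce that cost is to update only the directory that is being committed rather than the whole web tree. The following is a minimal sketch of such a variant, not a tested drop-in replacement. It assumes that commitinfo appends the full repository directory and then the committed filenames to the command, and the settings REPO_ROOT, MODULE, and WEB_ROOT (with their defaults /cvsroot, html, and /var/www/html) are hypothetical values to adjust for your own machine.

```shell
#!/bin/bash
# Hypothetical variant of update_web_site that updates only the part of
# the web tree being committed. Assumes commitinfo invokes it as:
#   update_web_site <full repository directory> <file>...
# REPO_ROOT, MODULE and WEB_ROOT are illustrative defaults.
REPO_ROOT=${REPO_ROOT:-/cvsroot}
MODULE=${MODULE:-html}
WEB_ROOT=${WEB_ROOT:-/var/www/html}

# Map a repository directory such as /cvsroot/html/docs to the
# matching checked-out directory, e.g. /var/www/html/docs.
web_dir_for() {
    local repo_dir=$1 top="$REPO_ROOT/$MODULE"
    case $repo_dir in
        "$top")   printf '%s\n' "$WEB_ROOT" ;;
        "$top"/*) printf '%s\n' "$WEB_ROOT${repo_dir#"$top"}" ;;
        *)        return 1 ;;   # e.g. /cvsroot/htmlextra: not ours
    esac
}

main() {
    local dir
    # Never block the commit: exit 0 on anything unexpected.
    dir=$(web_dir_for "$1") || exit 0
    # A directory that isn't checked out yet needs a full update with -d.
    [ -d "$dir" ] || dir=$WEB_ROOT
    cd "$dir" || exit 0
    # Run the update in the background so the commit can finish first.
    (sleep 1; cvs -q update -dP) > /dev/null 2>&1 &
    exit 0
}

# Only run when CVS passes arguments (lets the functions be tested alone).
if [ "$#" -gt 0 ]; then main "$@"; fi
```

Newly added directories fall back to a full update from the top of the web tree, because cvs update -d can only create them from a parent that already exists.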

Another approach for simple content is to use an SCM browsing tool such as ViewCVS or CVSWeb. Instead of using simple filenames in the web site's <a> links, use links that fetch and display the pages directly from the SCM browsing tool. For instance, a link from the home page to a page about builds might usually look like:

<a href="builds.html">Builds page</a>

but if you want to use ViewCVS to always return to the latest version of the file, you would change the link to look like:

<a href="http://cvs.example.com/cgi-bin/viewcvs.cgi/*checkout*/builds.html?rev=HEAD&amp;content-type=text/html">Builds page</a>

Of course, this approach does increase network traffic and the load on the SCM tool, in this case the CVS server at cvs.example.com. For moderately busy web sites, this approach seems to be responsive enough that most people can't tell when ViewCVS is being used.
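If a site already has many relative links, changing each one by hand is tedious and error-prone. A minimal sketch, assuming the ViewCVS URL layout and the example server cvs.example.com shown above, is a sed filter that rewrites simple relative links such as href="builds.html". The function name to_viewcvs_links and the VIEWCVS_BASE setting are hypothetical; links that start with a scheme or a slash are left alone.

```shell
#!/bin/bash
# Rewrite simple relative links into ViewCVS checkout links.
# VIEWCVS_BASE is an assumption matching the example server.
VIEWCVS_BASE="${VIEWCVS_BASE:-http://cvs.example.com/cgi-bin/viewcvs.cgi/*checkout*}"

to_viewcvs_links() {
    # Matches href="name.html" or href="dir/name.html" where the value
    # contains no ":" and doesn't start with "/", i.e. relative links only.
    # \1 is the filename; \& is a literal "&", written as &amp; because
    # it appears inside an HTML attribute.
    sed -E "s#href=\"([A-Za-z0-9_.-][A-Za-z0-9_./-]*\\.html)\"#href=\"$VIEWCVS_BASE/\\1?rev=HEAD\\&amp;content-type=text/html\"#g"
}
```

For example, to_viewcvs_links < index.html > index-viewcvs.html produces a copy of the home page whose relative links point at ViewCVS.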

For convenience, use CSS stylesheets to define the appearance of your web pages. This makes it easy to change the appearance of the whole web site from a single location.

If possible, avoid JavaScript and HTML frames for this kind of simple page: JavaScript is not always enabled in people's browsers and doesn't work at all with text browsers; frames make me feel like I'm reading a web page through a keyhole.

You can produce the illusion of menus that expand when clicked to show further choices while using only HTML. First, create an HTML fragment for each page that shows the site menu with that page's entry already expanded to list its subpages. Then include that page's fragment in the page itself. When a visitor clicks a menu entry, the page that loads shows the menu with that entry's choices visible, so the menu appears to have expanded.
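The fragments can be included either by the web server at request time or ahead of time when the pages are built. As a minimal sketch of the second option, assume a hypothetical source directory where each page NAME has a NAME.body file with its content and a menu-NAME.html fragment with its menu entry already expanded, plus shared header.html and footer.html files. A short shell script can then assemble every page:

```shell
#!/bin/bash
# Assemble static pages from shared pieces plus a per-page menu fragment.
# Hypothetical layout in the source directory:
#   header.html, footer.html  shared by every page
#   NAME.body                 the content of page NAME
#   menu-NAME.html            the menu with NAME's entry already expanded

# build_page NAME SRC_DIR OUT_DIR
build_page() {
    local name=$1 src=$2 out=$3
    cat "$src/header.html" "$src/menu-$name.html" \
        "$src/$name.body" "$src/footer.html" > "$out/$name.html"
}

# build_site SRC_DIR OUT_DIR: build one page for every NAME.body file.
build_site() {
    local src=$1 out=$2 body
    for body in "$src"/*.body; do
        [ -e "$body" ] || continue   # no .body files: glob stays literal
        build_page "$(basename "$body" .body)" "$src" "$out" || return 1
    done
}
```

Because the result is plain HTML, the pages stay static and can still be served and mirrored as described above.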

11.4.2. Dynamic Web Pages

Dynamic web pages are pages that are regenerated each time someone visits them. If you are using Apache as a web server, then Server Side Includes (SSIs) can work well for small amounts of dynamic information. SSIs are described in the Apache tutorial "Introduction to Server Side Includes" (http://httpd.apache.org/docs/howto/ssi.html); basically, you write HTML comments that tell the web server to insert some text or other content in place of the comment. For instance, the HTML text:

Today is <!--#echo var="DATE_LOCAL" -->

will appear on the visited web page as:

Today is Fri Jan 28 20:59:54 PST 2005

There are all sorts of SSI commands available; for example, #exec runs a command and inserts the command's output, and #include inserts another fragment of HTML. Web pages that use SSIs often have a filename with an .shtml suffix to distinguish them from HTML pages that don't have dynamic content.
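As an illustration of #exec, here is a minimal sketch of a command that a page could call to show the end of a log file, for example with <!--#exec cmd="/usr/local/bin/log_tail /var/log/nightly-build.log" -->. The script name log_tail and both paths are hypothetical. The output is HTML-escaped because whatever the command prints is inserted into the page verbatim. Apache only permits #exec when the directory's Options include Includes rather than IncludesNOEXEC.

```shell
#!/bin/bash
# Print the last lines of a log file as an HTML fragment for an SSI page.
# Usage: log_tail FILE [LINES]   (script name and paths are hypothetical)

# Escape the three characters that would otherwise be parsed as HTML.
html_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

log_tail() {
    local file=$1 lines=${2:-10}
    printf '<pre>\n'
    if [ -r "$file" ]; then
        tail -n "$lines" "$file" | html_escape
    else
        printf '(no log available)\n'
    fi
    printf '</pre>\n'
}

# Only run when given arguments, so the functions can be tested alone.
if [ "$#" -gt 0 ]; then log_tail "$@"; fi
```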

If SSIs don't meet your needs, then it's time to investigate CGI (Common Gateway Interface) scripts. These are executables placed in designated locations on the web server. They are typically written in Perl, Python, or PHP, but can in fact be ordinary shell scripts or any other executable file. When a visitor requests a script's URL, the web server runs the script and returns the HTML that it generates. Since these scripts run on your web server using arguments supplied by the visitor, security becomes a significant problem: every argument must be checked before it is used in a command or a filename. You may also want to limit how often a script can be run, and checking for this within every CGI script can become tedious.
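To make the security point concrete, the following is a minimal sketch of a shell CGI script that shows a stored build status. The query format build=<number>, the BUILD_DIR location, and the NAME.status files are all assumptions for illustration. The script accepts only a parameter that matches a strict pattern, so a visitor cannot slip shell syntax or a path such as ../../etc/passwd into the filename it reads.

```shell
#!/bin/bash
# Minimal CGI sketch: /cgi-bin/build_status?build=42 shows build 42's status.
# BUILD_DIR and the NAME.status files are hypothetical.
BUILD_DIR=${BUILD_DIR:-/var/builds}

cgi_main() {
    local re='^build=([0-9]+)$' id
    # QUERY_STRING is set by the web server. Accept exactly "build=<digits>";
    # anything else is rejected before it reaches a filename or command.
    if [[ ${QUERY_STRING:-} =~ $re ]]; then
        id=${BASH_REMATCH[1]}
    else
        printf 'Status: 400 Bad Request\nContent-Type: text/plain\n\n'
        printf 'Expected ?build=<number>\n'
        return 0
    fi
    printf 'Content-Type: text/html\n\n'
    printf '<html><body><h1>Build %s</h1><pre>\n' "$id"
    if [ -r "$BUILD_DIR/$id.status" ]; then
        # Escape the file's contents so it cannot inject markup.
        sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' "$BUILD_DIR/$id.status"
    else
        printf 'No status recorded.\n'
    fi
    printf '</pre></body></html>\n'
}

# GATEWAY_INTERFACE is set by the web server for CGI requests.
if [ -n "${GATEWAY_INTERFACE:-}" ]; then cgi_main; fi
```

Validating against a whitelist pattern, rather than trying to strip out dangerous characters, keeps the check short enough to audit at a glance.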

For those who feel that CGI scripts are overkill for a simple project web site, there is another approach. Using crontab or a scheduled task in Windows, arrange for the dynamic pages to be regenerated every minute or two. Say you want to include some news headlines in the home page of the web site. You can periodically run a script that will download the headlines, massage them into appropriate HTML for your page, and add a header and footer to the page. So long as people are aware of when the information on the page was last generated and when it will be regenerated, this approach seems to work well enough for small to medium-sized projects.
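The periodic approach can be sketched as a shell function that cron runs every couple of minutes. This is a minimal sketch, assuming the header and footer live in separate files and that some command (hypothetical here, such as fetch_headlines) already produces the headlines as HTML. It writes to a temporary file and renames it into place, so visitors never see a half-written page, and it adds a line saying when the page was generated.

```shell
#!/bin/bash
# Page regenerator meant to be run from cron, e.g. with a crontab entry
# like this (paths and script name are hypothetical):
#   */2 * * * * /usr/local/bin/regen_home_page
#
# regenerate_page OUT HEADER FOOTER COMMAND [ARGS...]
# COMMAND produces the dynamic middle of the page. The page is replaced
# only if every step succeeds; otherwise the previous page is kept.
regenerate_page() {
    local out=$1 header=$2 footer=$3 tmp
    shift 3
    # Temporary file in the same directory, so the final mv is a rename.
    tmp=$(mktemp "$out.XXXXXX") || return 1
    if {
        cat "$header" &&
        "$@" &&
        printf '<p>Last generated: %s. Regenerated every two minutes.</p>\n' "$(date)" &&
        cat "$footer"
    } > "$tmp"; then
        chmod 644 "$tmp"     # mktemp creates files readable only by the owner
        mv -f "$tmp" "$out"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

A cron job might then call regenerate_page /var/www/html/index.html header.html footer.html fetch_headlines, where fetch_headlines stands in for whatever script downloads and formats the headlines.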

11.4.3. Indexing and Searching

Before you can search for anything on a web site, you will need to run an indexing program on the files. This may take some time and effort to get right, so when adding search capability to your project web site, decide carefully about:

Which files and documents you want to index and which ones you don't.
How you want to group files so that visitors can search in specific areas of the web site or exclude other areas from a search.
How much support for complex searching you want. Different indexing and search programs support different search patterns and syntaxes.

For an extensive collection of search tools, see http://www.searchtools.com/tools. One of the better open source search tools is ht://Dig (http://www.htdig.org). ht://Dig is a spider search engine, which means that it follows all the links within your web site. It can use external programs to index a wide variety of file types and it supports a good range of search patterns through the use of Boolean expressions and wildcards.

The higher-end approach to searching and indexing is to purchase a Google Search Appliance (http://www.google.com/enterprise/gsa), which is a piece of hardware that you connect to your network and allow to "crawl," indexing everything that it can reach. As of early 2005, these appliances start at $4,995 for the Google Mini. If your web site is public and paid advertisements are acceptable, you can have Google WebSearch index your site and provide your search interface for you.