
CGI Programming with Perl

5.4. Alternatives for Generating Output

There are many different ways that people output HTML from their CGI scripts. We have just looked at how to do this with CGI.pm, and in the next chapter we will look at how we can use HTML templates to keep the HTML separate from the code. However, let's look here at a couple of other techniques developers use to output HTML from their scripts.

One thing to keep in mind as we look at these techniques is how difficult the HTML is to maintain. Over the lifetime of a CGI application, it is often the HTML that changes the most. Thus much of the maintenance of the application will involve making changes to the design or wording found in the HTML, so the HTML should be easy to edit.

5.4.1. Lots of print Statements

The simplest solution for including HTML in the source code is the hardest to maintain. Many web developers start out writing CGI scripts that contain numerous print statements to return documents, even for large sections of static content -- content that remains the same each time the CGI script is called.

Here is an example:

#!/usr/bin/perl -wT

use strict;

my $timestamp = localtime;

print "Content-type: text/html\n\n";
print "<html>\n";
print "<head>\n";
print "<title>The Time</title>\n";
print "</head>\n";
print "<body bgcolor=\"#ffffff\">\n";
print "<h2>Current Time</h2>\n";
print "<hr>\n";
print "<p>The current time according to this system is: \n";
print "<b>$timestamp</b>\n";
print "</p>\n";
print "</body>\n";
print "</html>\n";

This is a pretty basic example, but you could imagine just how complicated this can get on a large web page with numerous graphics, nested tables, style declarations, etc. Not only is this difficult to read because of the extra noise that each print statement adds, but each double quote in the HTML must be escaped with a backslash. If you forget to do this even once, you will likely generate a syntax error. Making HTML edits to something that looks like this is much more work than it should be. You should definitely avoid this approach in your scripts.

5.4.2. Here Documents

As we have seen in earlier examples, Perl supports a feature called here documents that allows you to express a large block of content separately within your code. To create a here document, use << followed by the token that will mark the end of the here document. You can enclose the token in single or double quotes, and the content will be evaluated as if it were a string within those quotes. In other words, if you use single quotes, variables will not be interpolated. If you omit the quotes, it acts as though you had used double quotes.
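The effect of the quoting style can be seen in a minimal sketch like the following. The variable $name and the token names are made up for this illustration only:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical value, used only to show interpolation.
my $name = "World";

# Double-quoted token: $name is interpolated.
my $double = <<"END_DOUBLE";
Hello, $name!
END_DOUBLE

# Single-quoted token: the text is taken literally.
my $single = <<'END_SINGLE';
Hello, $name!
END_SINGLE

# Bare token: treated like the double-quoted form.
my $bare = <<END_BARE;
Hello, $name!
END_BARE

print $double, $single, $bare;
```

The first and third lines printed contain "World", while the second still contains the literal text $name.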

Here is the previous example using a here document instead:

#!/usr/bin/perl -wT

use strict;
use CGI;

my $timestamp = localtime;

print <<END_OF_MESSAGE;
Content-type: text/html

<html>
  <head>
    <title>The Time</title>
  </head>

  <body bgcolor="#ffffff">
    <h2>Current Time</h2>
    <hr>
    <p>The current time according to this system is: 
    <b>$timestamp</b></p>
  </body>
</html>
END_OF_MESSAGE

This is much cleaner than using lots of print statements, and it allows us to indent the HTML content. The result is that this is much easier to read and to update. You could have accomplished something similar by using one print statement and putting all the content inside one pair of double quotes, but then you would have had to precede each double quote in the HTML with a backslash, and for complicated HTML documents this could get tedious.

Another solution is to use Perl's qq// operator, but with a different delimiter, such as ~. You must find a delimiter that will not appear in the HTML, and remember that if your content includes JavaScript, it can include many characters that HTML might otherwise not. Here documents are generally a safer solution.
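As a rough sketch of the qq approach, here is a shortened version of the same page using ~ as the delimiter. It assumes the content never contains a tilde; if it did, the string would end early:

```perl
#!/usr/bin/perl -w
use strict;

my $timestamp = localtime;

# qq~...~ behaves like a double-quoted string, so $timestamp is
# interpolated and the double quotes in the HTML need no backslashes.
# This only works while the text itself never contains a tilde.
my $page = qq~Content-type: text/html

<html>
  <head>
    <title>The Time</title>
  </head>
  <body bgcolor="#ffffff">
    <p>The current time is: <b>$timestamp</b></p>
  </body>
</html>
~;

print $page;
```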

One drawback to using here documents is that they do not easily indent, so they may look odd inside blocks of otherwise cleanly indented code. Tom Christiansen and Nathan Torkington address this issue in the Perl Cookbook (O'Reilly & Associates, Inc.). The following solutions are adapted from their discussion.

If you do not care about extra leading whitespace in your HTML output, you can simply indent everything. You can also indent the ending token if you use quotes and include the indent in the name (although this is more readable, it may be less maintainable because if the indentation changes, then you must adjust the name of the token to match):

#!/usr/bin/perl -wT

use strict;
use CGI;

my $timestamp = localtime;

display_document( $timestamp );

sub display_document {
    my $timestamp = shift;
    
    # Print the header separately: an indented header line, or a
    # "blank" line that contains spaces, is not a valid CGI header.
    print "Content-type: text/html\n\n";
    
    print <<"    END_OF_MESSAGE";
      <html>
        <head>
          <title>The Time</title>
        </head>
        
        <body bgcolor="#ffffff">
          <h2>Current Time</h2>
          <hr>
          <p>The current time according to this system is: 
          <b>$timestamp</b></p>
        </body>
      </html>
    END_OF_MESSAGE
}

One problem with indenting HTML here documents is that the extra indentation is sent to the client. You can solve this problem by creating a function that "unindents" your text. If you wish to remove all indentation, this is simple; if you want to maintain your HTML's indentation, this is more complex. The challenge is determining the amount of indentation to remove: what portion belongs to the content and what part is incidental to your script? You could assume the first line contains the smallest indent, but this would not work if you were only printing the end of an HTML document, for example, when the last line would probably contain the smallest indent.

In the following code the unindent subroutine looks at all of the lines being printed, finds the smallest indent, and removes that amount from all of the lines:

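sub unindent;

sub display_document {
    my $timestamp = shift;

    print unindent <<"    END_OF_MESSAGE";
      Content-type: text/html

      <html>
        <head>
          <title>The Time</title>
        </head>

        <body bgcolor="#ffffff">
          <h2>Current Time</h2>
          <hr>
          <p>The current time according to this system is: 
          <b>$timestamp</b></p>
        </body>
      </html>
    END_OF_MESSAGE
}

sub unindent {
    local $_ = shift;
    # Collect the leading whitespace of every nonblank line. sort compares
    # strings, so the first element is the shortest indent as long as the
    # lines do not mix tabs and spaces.
    my( $indent ) = sort /^([ \t]*)\S/gm;
    s/^$indent//gm;   # remove that indent from the start of each line
    return $_;
}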

Predeclaring the unindent function, as we do on the first line, allows us to omit parentheses when we use it. This solution, of course, increases the amount of work the server must do for each request, so it would not be appropriate on a heavily used server. Also keep in mind that each additional space increases the number of bytes you must transfer and the user must download, so you may actually want to strip all leading whitespace instead. After all, users probably care more about the page downloading faster than how it looks if they view the source code.
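Stripping every leading space and tab takes a single substitution. The following is a minimal sketch, not a tuned implementation; strip_indent is a hypothetical name chosen for this example, and the page is shortened:

```perl
#!/usr/bin/perl -w
use strict;

# strip_indent is a hypothetical helper name used for this sketch.
sub strip_indent {
    my $text = shift;
    $text =~ s/^[ \t]+//gm;   # delete leading spaces and tabs on every line
    return $text;
}

my $timestamp = localtime;

my $page = strip_indent( <<"    END_OF_MESSAGE" );
    Content-type: text/html

    <html>
      <body>
        <p>The time is: <b>$timestamp</b></p>
      </body>
    </html>
    END_OF_MESSAGE

print $page;
```

Because the substitution never touches newlines, the empty line that ends the header is preserved, and every remaining line starts in the first column.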

Overall, here documents are not a bad solution for large chunks of HTML, but they do not offer CGI.pm's advantages, especially the ability to have your HTML code verified syntactically. It's much harder to forget to close an HTML tag with CGI.pm than it is with a here document. Also, many times you must build HTML programmatically. For example, you may read records from a database and add a row to a table for each record. In these cases, when you are working with small chunks of HTML, CGI.pm is much easier to work with than here documents.
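To illustrate the table case, here is a hedged sketch that builds one row per record with CGI.pm's standard HTML functions. The @records array is made-up data standing in for database rows, and the script assumes CGI.pm is installed (it has not been bundled with core Perl since version 5.22):

```perl
#!/usr/bin/perl -w
use strict;
use CGI qw(:standard);

# Hypothetical records standing in for rows fetched from a database.
my @records = (
    [ "Apples",  3 ],
    [ "Oranges", 5 ],
);

# Build one table row per record; td() given an array reference
# produces one cell per element.
my @rows = map { Tr( td( $_ ) ) } @records;

my $table = table( { -border => 1 },
    Tr( th( [ "Item", "Quantity" ] ) ),
    @rows,
);

print header(), start_html( "Inventory" ), $table, end_html();
```

Each tag is opened and closed by the same function call, which is why it is hard to leave one unclosed. Note that these functions do not escape the text you pass them.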

Using CGI.pm's methods for outputting HTML generates strong reactions from developers. Some love it; others don't. Don't worry if it doesn't match your needs; we will look at a whole class of alternatives in the next chapter.

