Programming CGI Applications in Perl | Mac OS X Tiger Unleashed


Introduction to Web Programming

Writing an application for the Web is not as simple as writing an application or script that executes on a local machine. Web applications must obey the HTTP protocol, which, by design, is stateless and connectionless. This poses a problem for anything beyond simple programs that submit a form.

To understand the problem, consider the steps in running a normal piece of software from the Tiger desktop (this is a generic fictitious application):

1.	Double-click the application to display the Welcome screen.
2.	Provide basic input into the application screen by typing or clicking.
3.	The application provides feedback based on your input.
4.	Repeat steps 2 and 3 as necessary.
5.	Choose Quit from the application menu.
6.	The application saves your changes and preferences, and then exits.

Translating these operations into a web application, however, requires working around the limitations of HTTP.

Understanding the Stateless Nature of HTTP

When HTTP (Hypertext Transfer Protocol) was developed, the Web was never expected to become the consumer-driven mish-mash that it is today. HTTP was created to be simple and fast. When retrieving a web page, the client performs four actions. It first opens a connection to the remote server. The client then requests a resource from the server and sends form data, if necessary. Next, the client receives the results, and finally, it closes the connection.

This happens repeatedly for different page elements (or, depending on the browser and server, multiple requests can be made in one connection). When the browser has finished downloading data, that data is displayed on the user's screen. At this point in time, there is no connection between the client computer and the server. They have effectively forgotten each other's existence.
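The four client actions described above can be sketched with a raw socket. This is a minimal illustration, not a description of how any browser is implemented. The tiny local server, the HelloHandler class, and the fetch and demo functions are hypothetical names invented for the example, and everything runs on the loopback address, so no real website is contacted.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a web server: answers every GET with 'hello'."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(host, port, path):
    """Perform one complete request/response cycle and return the raw reply."""
    # 1. Open a connection to the server.
    with socket.create_connection((host, port), timeout=5) as conn:
        # 2. Request a resource (HTTP/1.0: one request per connection).
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        conn.sendall(request.encode("ascii"))
        # 3. Receive the results until the server has nothing more to send.
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    # 4. Leaving the "with" block closes the connection.
    return b"".join(chunks)


def demo():
    """Start the throwaway server on a free port, fetch one page, shut down."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("127.0.0.1", port, "/")
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns the status line, headers, and body as one byte string. Once fetch returns, nothing links the client and the server any longer, which is the situation the rest of this section has to work around.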

If the user clicks a link to visit another page on the server, the same process is repeated. The server has no advance knowledge of who the client is, even though they've just been talking. If you've seen the movie Memento, you'll understand this concept. The HTTP protocol suffers from a severe lack of short-term memory (statelessness).

Applying this new knowledge to the steps of using an application, the problems become obvious:

1.	Double-click the application. This is the equivalent of clicking a link on a web page or entering a URL into a browser. Launching a web application is nothing more than browsing its URL. No problems so far.
2.	It starts, displaying a welcome screen. An HTML welcome page is easily built with a link into the main application. Still no problems.
3.	You provide basic input into the application screen by typing or clicking. The trouble begins. Data entered on an HTML form is sent all at once. Providing live feedback to data isn't possible, except for rudimentary JavaScript functionality. Clicking links transports the browser to other pages, effectively losing any information you've already entered.
4.	The application provides feedback based on your input. The web application has access only to information provided as input in the form immediately preceding it. For example, assume that there are two forms in which a user inputs data, one right after the other. The first form submits to the second form. The second form, in turn, submits its data to a page that calculates results based on the entries in both form pages. Only the data in the second form will be taken into account. The first form's information no longer exists after submitting the second.
5.	Repeat steps 3 and 4 as necessary. During each repetition, the server is entirely unaware of what has come before. The application cannot build on previous input.
6.	Choose Quit from the application menu. This is a tough one. Remember that the connection to the web server lasts only long enough to retrieve a single page and send form data. This means that the web application effectively quits after any step of execution. Web software must be developed with the knowledge that the user can quit his browser at any given point in time. Doing so must not pose either a functional or security risk to the original software.
7.	The application saves your changes and preferences, and then exits. If a user quits in the middle of running an online application, there is no way for the software to know that this has occurred. It is up to the programmer to make sure that the website keeps track of a user's actions each time it is accessed.

So, how do you work around a protocol that was never designed to keep information between accesses? By employing session management techniques.

Maintaining State Through Session Management

A session, in web-speak, is the equivalent of running an application from start to finish. The goal of session management is to help the web server remember information about a user and what that user has done in previous requests to the server. Using session management techniques, you can quickly create web applications that function like conventional desktop applications. Unfortunately, there is no perfect session management technique. There are several ways to approach the problem, but none offers a completely satisfying solution.

URL Variable Passing

URL variable passing is the simplest form of session management. To make a value available on any number of web pages, you can use the URL to pass information from page to page. For example, suppose that I had a variable, name, with the value of johnray that I wanted to be available even after clicking a link to another portion of the program. I could create links that looked like this:

http://www.acmewebsitecomp.com/webapp.cgi?name=johnray

http://www.acmewebsitecomp.com/reportapp.cgi?name=johnray

http://www.acmewebsitecomp.com/accountapp.cgi?name=johnray

Each of the three web applications would receive the variable name with the value johnray upon clicking the links. These applications could then pass the values along even further by appending the same information (?name=johnray) to links within themselves. Obviously, this would require the web applications to generate links dynamically, but it's a small price to pay for being able to reliably pass information from page to page.

This technique relies on the HTTP GET method. When a browser sends a GET request for a web resource, it can append additional data onto the request by adding it in the format:

 ?<variable>=<value>[&<variable>=<value>...]
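As a rough sketch of how a program might generate and read such links, the following Python uses the standard urllib.parse module. The function names build_link and read_variables are made up for this illustration. The sketch also assumes that each variable appears only once in the URL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_link(base_url, variables):
    """Append variables to base_url as a GET query string."""
    return base_url + "?" + urlencode(variables)


def read_variables(url):
    """Return the query-string variables of url as a plain dict.

    parse_qs maps each name to a list because a name may repeat;
    this sketch keeps only the first value of each name.
    """
    query = urlsplit(url).query
    return {name: values[0] for name, values in parse_qs(query).items()}


if __name__ == "__main__":
    link = build_link("http://www.acmewebsitecomp.com/webapp.cgi",
                      {"name": "johnray"})
    print(link)
    print(read_variables(link))
```

urlencode also takes care of characters that are not allowed in a URL, such as spaces, which is easy to get wrong when links are assembled by hand.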

The trouble with this approach is that to send large amounts of data between pages, you must construct extremely large URLs. Visually, this creates an ugly URL in the browser's URL field. It could also lead users to bookmark a URL that contains information about the current execution of the web application, such as the date or other time-sensitive information, that might not be valid in subsequent executions.

In addition, users can easily modify the URL line of the browser to send back any information to the server that they want. If you've just created a shopping cart application that passes a user's total to a final billing page where it is charged against that user's credit card, it is unlikely that you want him to be able to adjust the price of the merchandise he's purchasing.

Form Variable Passing

Similar to passing variables within a URL (the GET method) is using the POST method of transferring data. Instead of passing data in the URL of the request, POST sends it in the body of the request, so it never appears in the browser's URL field and cannot be changed simply by editing the URL.

With POST, developers can use hidden form fields to hold values before they are needed. Assume that you have two forms: the first collects a first and last name, and the second collects an email address and phone number. Submitting the first form opens the second form, which, when submitted, saves the data to a file.

Each form could save its data to a file independently, but this is problematic when considering applications in which all data must be present before it can be saved. Session management can be used to ensure that all data is present when the final form is submitted.

For example, assume that the first form looks something like this:

 <form action="form2.cgi" method="post">
 First Name: <input type="text" name="first"><br>
 Last Name: <input type="text" name="last"><br>
 <input type="submit">
 </form>

This form submits two fields (first and last) to the form2.cgi. If the second form must collect an email address and phone number and submit them simultaneously with the first and last values, the form2.cgi could dynamically create a form that stored the original two fields in two hidden input fields:

 <form action="savedata.cgi" method="post">
 Email Address: <input type="text" name="email"><br>
 Phone Number: <input type="text" name="phone"><br>
 <input type="hidden" name="first" value="first-value">
 <input type="hidden" name="last" value="last-value">
 <input type="submit">
 </form>

Submitting this form would make all the field data available to the subsequent page (savedata.cgi).
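To give a feel for what "dynamically create a form" means, here is a minimal Python sketch of the step that form2.cgi would perform. The function name render_second_form is hypothetical, and how the script obtains the submitted first and last values is left out. The one detail worth noting is that the values are HTML-escaped before being placed in the hidden fields, so that a quote character in a name cannot break the markup.

```python
from html import escape


def render_second_form(first, last):
    """Return the HTML for the second form, carrying first and last forward."""
    return (
        '<form action="savedata.cgi" method="post">\n'
        'Email Address: <input type="text" name="email"><br>\n'
        'Phone Number: <input type="text" name="phone"><br>\n'
        # escape() converts &, <, > and quote characters to HTML entities.
        f'<input type="hidden" name="first" value="{escape(first)}">\n'
        f'<input type="hidden" name="last" value="{escape(last)}">\n'
        '<input type="submit">\n'
        '</form>'
    )


if __name__ == "__main__":
    print(render_second_form("John", "Ray"))
```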

NOTE

These examples show how you might use different techniques to pass data between web pages. For them to be effective, you must be able to dynamically generate the URLs and forms that contain your data. We're getting to that, so don't panic!

Unfortunately, the trouble with this approach is that only pages with forms can transfer data between one another. Form variable passing is usually used in conjunction with URL passing to cover all bases.

Data integrity is also an issue with this method because a savvy user could easily save an HTML form locally, edit the hidden field values, and then submit the data from the edited form.

NOTE

The URL and form variable passing methods are much more closely related than they appear. The technique of specifying variables and values within a URL is actually also a way of submitting a form called the GET method. When using the GET method, the values sent from a form are appended to the URL requested from the server. By doing this manually, we are simulating a form submission using GET.

The POST method, shown in these examples, sends the variable/value data to the server in the body of the request rather than in the URL. It does not append information to the URL and, from a browser, is normally used only when an actual form is submitted. In some cases, these two methods are used together, but this is not a common coding practice.

In general, POST is a cleaner code choice because it doesn't clutter your URL line. GET, however, creates URLs that can be bookmarked.
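The difference between the two methods is easiest to see in the text of the requests themselves. The following Python sketch builds, but does not send, a simplified GET request and a simplified POST request carrying form data. The function names are made up for this illustration, and a real browser would add further headers.

```python
from urllib.parse import urlencode


def build_get_request(host, path, variables):
    """GET: the variables travel in the URL on the request line."""
    query = urlencode(variables)
    return f"GET {path}?{query} HTTP/1.0\r\nHost: {host}\r\n\r\n"


def build_post_request(host, path, variables):
    """POST: the variables travel in the body, after the blank line."""
    body = urlencode(variables)  # ASCII only, so len() equals the byte count
    return (
        f"POST {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
        f"{body}"
    )


if __name__ == "__main__":
    print(build_get_request("www.acmewebsitecomp.com", "/webapp.cgi",
                            {"name": "johnray"}))
    print(build_post_request("www.acmewebsitecomp.com", "/form2.cgi",
                             {"first": "John", "last": "Ray"}))
```

In both cases the data is encoded the same way. Only its position in the request changes.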

Cookies

Another way to pass information is to use a cookie. Cookies are variable/value pairs that are stored on a user's computer and can be retrieved by the remote web server. Many people are cautious about cookies because of the fear of information being stolen from the cookie without their knowledge. Cookies, however, can be a valuable tool for web developers and users alike.

From the developer's perspective, assigning a cookie is much like setting a variable. You can name the cookie and give it a value and an expiration day/time. That value then becomes globally available regardless of whether the user jumps to another page, retypes the URL, or starts over. Only if the cookie is reassigned or reaches its expiration does the value cease to exist. There is even a special type of cookie expiration that can limit a cookie's lifetime to the current browser session. In this case, the values are never written to the client computer's disk and are forgotten when the user quits the browser. Using this special type of expiration, a programmer can create a web application that, after the user exits, leaves no remnants of the login information. This is as close to traditional programming-language variables as a web developer can hope to get.
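As an illustration of "assigning a cookie," the sketch below uses Python's standard http.cookies module to build the value of a Set-Cookie response header and to read back the Cookie header that a browser would send on later requests. The function names are hypothetical. The sketch expresses the lifetime with the Max-Age attribute (a number of seconds) rather than an expiration date, and leaving it out produces a cookie that lasts only for the browser session.

```python
from http.cookies import SimpleCookie


def make_set_cookie_header(name, value, max_age=None, path="/"):
    """Return the value for a Set-Cookie response header.

    With max_age=None no lifetime is given, so the browser treats the
    cookie as a session cookie and discards it when it quits.
    """
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = path
    if max_age is not None:
        cookie[name]["max-age"] = max_age
    return cookie[name].OutputString()


def read_cookies(cookie_header):
    """Parse the Cookie request header into a plain dict of name -> value."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return {name: morsel.value for name, morsel in cookie.items()}


if __name__ == "__main__":
    print("Set-Cookie:", make_set_cookie_header("user", "johnray", max_age=3600))
    print("Set-Cookie:", make_set_cookie_header("user", "johnray"))
    print(read_cookies("user=johnray; theme=dark"))
```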

From the user's perspective, cookies offer both security and ease-of-use advantages. If a web application stores a user's identifier in a cookie, that user can immediately be recognized when visiting a website. This is commonly used on sites such as Amazon.com to provide a personalized appearance. Because cookies can span multiple pages and applications, a single login can apply to many different portions of a website. Using URL or form variable passing, each link and form on a site must be constructed on the fly, which greatly increases the chance of programming error. No changes need to be made to the links when cookies are used.

Cookies are saved to the local computer's drive and can be viewed in many popular browsers. Safari, for example, enables the user to examine stored cookies within the Security Preferences pane, shown in Figure 24.1.

Figure 24.1. Popular browsers, such as Safari, enable the user to browse stored cookies.

COOKIES: ARE THEY EVIL?

Contrary to popular belief, cookies are not retrieved by a remote server; they are made available by the client browser. When a cookie is first set, it is given a domain and path for which it is valid. If your browser makes a request for a resource (HTML page, image, and so forth) that matches that domain and path, the cookie is automatically sent to the server along with the request. Your browser will send cookies only to the sites and paths where they belong, not to all websites you view.

The contents of a cookie are, indeed, determined by the remote server and can be set to any arbitrary string. They do not provide the capacity to upload binary files or executable applications. It's certainly possible that a cookie could hold a credit card number, but you would have had to enter that number into a web page before it could be stored in a cookie. I have never seen an e-commerce or banking site that worked in this manner, but it is possible that one might exist. If this were the case, other users on your system might be able to find the cookie and extract the sensitive information.

The most alarming use of cookies is the practice of allowing third parties to track browsing information and habits. Some popular websites allow cookies to be set by a common third-party host. Because the third-party host has access to the cookie as long as the main website allows it, information can be shared across a broad range of websites without your knowledge.

If you're concerned about using cookies on your system, the best advice is to inform your users about how cookies are being employed and make sure that they are comfortable with the information being stored. The dangers of cookies have been greatly exaggerated. Use of common sense and caution while programming with cookies will lead to applications that users will trust and enjoy.

Although it is possible to use other techniques for passing information, cookies are the fastest and easiest. Regardless of the technique used to maintain information, two final elements are missing from the big picture: the session database and the session ID. Together they form the Holy Grail of session management: session variables.

Session Variables

A session variable is a variable that can be set to any value, will be accessible by any portion of a web application, and will last only while the web application is being used. In principle, any of the techniques we've looked at so far can do this. Unfortunately, they all fall short when applied to a large system.

For example, imagine that you're passing variables using the URL method:

http://www.mywebsite.com/mywebapp.cgi?variable1=value&variable2=value

This works great for one or two variables, but extend it to a few thousand! Suddenly a two- or three-line URL seems short. There is a limit to the amount of data that can be contained within a URL, making this impossible for large amounts of information.

When using cookies or forms to pass data, you aren't necessarily limited by the size of the request string but by the overhead and complexity of the coding. For each variable that must be stored, a hidden field must be added to a form or a cookie sent back to the server. This process must be repeated on every page. This adds up, in terms of transmission time and processing.

Luckily, there is a solution that can be used with any of the approaches to variable passing: the use of a session database and a session ID.

The concept is simple: when a user comes to a website, his session starts. He is assigned a unique ID, called the session ID, by the remote web application. As the user interacts with the website, the web application passes the session ID from page to page. This process can be done using the URL, forms, or cookies. When the web application software wants to store a value, it stores it on the server, in a local database that is keyed to that particular session ID.
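The idea can be sketched in a few lines of Python. The SessionStore class below is a hypothetical, in-memory stand-in for the session database. A real application would keep the data in a file or database so that it survives between requests, and would also expire old sessions. Only the session ID returned by start() would ever need to travel to the browser.

```python
import secrets


class SessionStore:
    """A toy session database: variables are kept on the server, keyed by ID."""

    def __init__(self):
        self._sessions = {}

    def start(self):
        """Create a new session and return its hard-to-guess ID."""
        session_id = secrets.token_hex(16)  # 32 hexadecimal characters
        self._sessions[session_id] = {}
        return session_id

    def set(self, session_id, name, value):
        """Store a session variable for the given session."""
        self._sessions[session_id][name] = value

    def get(self, session_id, name, default=None):
        """Return a session variable, or default if it (or the session) is unknown."""
        return self._sessions.get(session_id, {}).get(name, default)


if __name__ == "__main__":
    store = SessionStore()
    sid = store.start()               # handed to the browser in a URL, form, or cookie
    store.set(sid, "first", "John")   # the value itself stays on the server
    print(sid, store.get(sid, "first"))
```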

For programmers, this is a dream come true. They can store any information they want (including sensitive data), and it is never transmitted over the network. The only piece of data that is visible on the network wire is the session ID.

Because a single piece of information can keep track of an unlimited number of variables, the session management system can be written to pass the session ID using URL/form methods or a cookie. Either way is entirely feasible. To make things even easier, developers have included these capabilities in programming languages such as JSP and PHP. For example, in PHP, you can activate session management and store a variable for use on another web page using syntax like this:

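 <?php
     session_start();
     // On the first request x is not yet set, so start counting from 0.
     $count = isset($_SESSION["x"]) ? $_SESSION["x"] : 0;
     $_SESSION["x"] = $count + 1;
     print $_SESSION["x"];
 ?>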

This example uses session_start() to start a session, or to resume one that already exists; by default, the session ID is stored in a cookie. Next, the variable x is incremented and stored again in the global $_SESSION array. Finally, the value of x is displayed. The result is a web page that displays an increasing count each time a user loads it.

NOTE

It is important to make the distinction that this is not the same as a web counter. A session ID is specific to a single user, as are all the variables registered with that session. If 50 users were accessing this script simultaneously, each would see a result independent of all the others.
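The independence of each user's count can be shown with a small Python simulation. This is only a sketch of the behavior, not of how PHP implements it. The SESSIONS dictionary and the handle_request function are invented for the example, and the session ID that a browser would return in its cookie is simply passed in as an argument.

```python
import secrets

# Hypothetical in-memory session database: session ID -> that user's variables.
SESSIONS = {}


def handle_request(session_id=None):
    """Simulate one page load; return (session_id, value of x for that user)."""
    if session_id not in SESSIONS:
        # New visitor (or unknown ID): start a fresh session with its own x.
        session_id = secrets.token_hex(16)
        SESSIONS[session_id] = {"x": 0}
    SESSIONS[session_id]["x"] += 1
    return session_id, SESSIONS[session_id]["x"]


if __name__ == "__main__":
    alice, count = handle_request()      # first visit by one user
    print(count)
    _, count = handle_request(alice)     # same user reloads the page
    print(count)
    bob, count = handle_request()        # a different user arrives
    print(count)
```

Each session ID has its own copy of x, so one user's reloads never change the number another user sees.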

More traditional languages (such as Perl or C) weren't created with web programming in mind. To implement session variables within Perl, you must create, manipulate, and manage session IDs and session databases. This has already been done so many times that a number of prebuilt solutions are available to work with, but none is as elegant as a language designed for the purposes of creating web applications.
