Implementing Session Support | MySQL and Perl for the Web


For the remainder of this chapter, we'll discuss how to provide session support that uses active client identification and server-side storage of session records in a database. We'll also write a few applications that demonstrate how to implement sessions, using the following general procedure:

When a user connects initially, generate a random session ID, send it to the client, and initialize a session record in the database.
As each subsequent request arrives from the user, extract the session ID and use it to retrieve the session record. Update the record as necessary using any new information in the request.
At the end of each request, take one of two actions: Close the session if you still need it for future operations, or delete it if you don't. If a customer adds an item to a shopping cart but is not done shopping, for example, you'd just close the session. After the customer has finished shopping and you've stored the final order, you'd delete the session.
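
The steps above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: the %store hash stands in for a database table, the function names are hypothetical, and the ID generator (an MD5 digest of the time, process ID, and a random number) is a simple stand-in rather than the algorithm any particular module uses.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my %store;    # hypothetical in-memory stand-in for the session table

# Initial request: generate an ID and initialize the session record
sub create_session
{
    my $id = md5_hex (join (":", time (), $$, rand ()));
    $store{$id} = { count => 0 };
    return ($id);
}

# Later requests: look up the record for the ID the client sent back
sub fetch_session
{
    my $id = shift;
    return ($store{$id});    # undef if there is no such session
}

# Final request: remove the record when it is no longer needed
sub delete_session
{
    my $id = shift;
    delete ($store{$id});
}

my $id = create_session ();       # user connects initially
my $rec = fetch_session ($id);    # a subsequent request arrives
$rec->{count}++;                  # update the record
delete_session ($id);             # session is finished
```

"Closing" a session has no counterpart here because the data never leaves memory; with a real database, closing is the point at which changes are written back.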

The mechanism for transmitting the session ID between the client and server from request to request is application-specific. Common techniques propagate identifiers in cookies, hidden fields, or URLs. (Those methods should sound familiar; they're the client-side methods discussed earlier in the chapter. In other words, even though server-side storage of state data reduces the amount of information transmitted with each request, you still need a bit of client-side information to hold the session ID.)

What to Include in the Session Record

There are certain standard kinds of information you may want to include in the session record. Most obviously, the record must contain a session ID, otherwise you cannot identify the session. In addition, a couple of other values can be helpful for administrative purposes. You might want to include a timestamp indicating session creation time or last modification time for expiration purposes. If you share a session table between applications, you may want to include an application identifier as well, so that you can identify which application each session record belongs to. (If you use a separate session table per application, this is not a problem.)
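
As a rough illustration of how a timestamp and an application identifier might be used together, the following sketch removes idle records belonging to one application. The record layout, the application names, and the expire_sessions() function are all hypothetical, and the records live in an ordinary hash rather than a table.

```perl
use strict;
use warnings;

# Hypothetical session records keyed by ID; each holds an application
# identifier and a last-modification time in seconds since the epoch.
my %sessions =
(
    "aaaa" => { app => "survey", modified => 1_000 },
    "bbbb" => { app => "cart",   modified => 5_000 },
);

# Delete records for application $app that have been idle longer than
# $max_idle seconds as of time $now; return the IDs that were removed.
sub expire_sessions
{
    my ($store, $app, $now, $max_idle) = @_;
    my @expired;

    foreach my $id (sort (keys (%{$store})))
    {
        my $rec = $store->{$id};
        next unless $rec->{app} eq $app;
        next unless $now - $rec->{modified} > $max_idle;
        delete ($store->{$id});
        push (@expired, $id);
    }
    return (@expired);
}

# Expire "survey" sessions idle for more than an hour as of time 5000
my @gone = expire_sessions (\%sessions, "survey", 5_000, 3_600);
```

With a database-backed table, the same test would more likely be a single DELETE statement with a WHERE clause on the timestamp and application columns.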

The session record generally also will include application-specific data, the content of which depends on individual application requirements. In a multiple-page survey application, for example, you might store the intermediate responses in the session record until the user completes the survey. If you have an application that enables people to request an insurance policy quote online, you may guide the user through a series of forms to gather the information you need. A session record can be useful for storing this information until the request has been completed.

If you design a session table for a particular application, you can include columns in the table specifically for the types of session information you want to record. This is also true when you share a session table among applications, as long as they all store the same kind of state data. If you want to share a session table among applications that have differing requirements, however, you need a uniform storage mechanism that will work for each application that uses the table. This can be accomplished by using a BLOB or TEXT column large enough to accommodate all your data and storing everything in that column. One way to do this is to serialize your state information (and that's what we'll do in this chapter). Serialization converts information to a single string value that can be unserialized later to recover the original data. If you want to access the session table using programs written in several languages, however, you may encounter compatibility issues between different serialization algorithms. A way around this problem is to store your session data using some representation that is neutral with respect to the programming languages you're using. For example, XML can be used to represent arbitrary session data, assuming you can find a suitable XML processor for each of the languages you're using to write scripts.
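
To make serialization concrete, here is a minimal sketch that uses the Storable module's freeze() and thaw() functions to turn a nested structure into a single scalar and back again. The keys and values in %state are sample data invented for the illustration.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Sample state data: a scalar value and a nested array
my %state =
(
    user_name => "alice",
    answers   => [ "yes", "no", "maybe" ],
);

my $string = freeze (\%state);   # one scalar value holding everything
my $copy = thaw ($string);       # reference to a reconstructed hash

print "user: $copy->{user_name}\n";
print "second answer: $copy->{answers}[1]\n";
```

The frozen value is specific to Storable, which is the kind of cross-language compatibility issue mentioned above: a program written in another language would need its own way to decode it.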

The Apache::Session Module

We could implement all the session management code ourselves using the guidelines discussed so far, but an alternative is to use the Apache::Session module, which already does much of what we need. This module has the following characteristics:

It uses an algorithm based on MD5 (Message Digest #5) to generate random session IDs that are not easily guessable by intruders.
It uses server-side storage so that clients cannot modify session information directly.
The module can use either files or a database, which provides persistent storage for session data that survives server and machine restarts. We'll use MySQL to store session records, but Apache::Session also supports several other database engines such as Postgres and Sybase. This may be helpful if you need to port your code to another database.
Session records are represented as hash structures, and Apache::Session ties these hashes to database access methods. This makes it easy to store and access session data just by referring to hash members.
Session values are stored in the database using serialization. In this way, Apache::Session imposes little constraint on the type of data you can store in session records.
Apache::Session handles locking issues for you; while one client has a session open, no other client can modify it.

Setting Up to Use Apache::Session

You should get and install the Apache::Session module from the CPAN if you don't already have it. Apache::Session relies on Storable to serialize session data, so make sure you have that module, too. If you need other modules that are missing, Apache::Session should tell you what they are when you try to build it.

After you've installed Apache::Session, use the following commands to read its general and MySQL-specific documentation:

    % perldoc Apache::Session
    % perldoc Apache::Session::MySQL
    % perldoc Apache::Session::Lock::MySQL
    % perldoc Apache::Session::Store::MySQL

The information provided by the last of these commands shows that to use Apache::Session with MySQL, you must create a table named sessions that contains the following columns:

    CREATE TABLE sessions
    (
        id          CHAR(32) NOT NULL PRIMARY KEY,  # session identifier
        a_session   TEXT                            # session data
    )

Whenever Apache::Session creates a new session record, it generates a unique identifier and stores it in the id column. Any session values you add to the record are stored in the a_session column after being serialized. (You need not worry about creating the ID or storing the data; those operations are handled transparently for you.) Apache::Session itself manipulates only the two columns shown; although you can add other columns to the table if you like, they'll be ignored unless you provide your own mechanism for manipulating the additional columns externally to Apache::Session.

The table used by Apache::Session must be named sessions; there is no provision for specifying an alternative name.^[2] You can put a sessions table in any database you like, but for our applications, I'll assume here that it's in the webdb database. Make sure that the table is both readable and writable to the MySQL account that your scripts use for connecting to the MySQL server.

^[2] This means the table will be shared among applications. If you really want multiple sessions tables, you must put them in different databases.

Opening a Session

To use Apache::Session with MySQL, include the appropriate use statement in your scripts and declare a hash structure in which to store your session data:

    use Apache::Session::MySQL;
    my %session;

When you're ready to open a session, invoke tie and pass it several arguments: the session hash, the name of the submodule that Apache::Session should use for managing sessions, a session identifier, and attributes that indicate how to communicate with MySQL. The attributes can take two forms, because Apache::Session can use the database handle associated with an existing connection, or it can establish a new MySQL connection. To use an existing database handle, open the session like this:

    $dbh = WebDB::connect ();   # get a database handle
    tie %session, "Apache::Session::MySQL", $sess_id,
            {
                Handle => $dbh,
                LockHandle => $dbh
            };

If you don't use an existing connection, you must supply the appropriate parameters to allow Apache::Session to establish the connection itself:

    tie %session, "Apache::Session::MySQL", $sess_id,
            {
                DataSource => "DBI:mysql:host=localhost;database=webdb",
                UserName => "webdev",
                Password => "webdevpass",
                LockDataSource => "DBI:mysql:host=localhost;database=webdb",
                LockUserName => "webdev",
                LockPassword => "webdevpass"
            };

In either case, the value of $sess_id indicates the session record you want to use. To create a new session, pass a value of undef. Otherwise, the value should be the ID of an existing session record.

If you pass the handle for an existing connection to tie, Apache::Session leaves the connection open when you close or delete the session. You should close the connection yourself when you're done with it. If Apache::Session itself establishes a connection to the MySQL server when you open a session, it also closes the connection when you close or delete the session. An implication of this behavior is that if you intend to manage multiple simultaneous sessions from within the same script, it's probably better to open the connection yourself and pass the resulting database handle to all the session-initiation calls. Otherwise, you will have a separate connection to MySQL open for each session.

An additional point to keep in mind when you open a session using a handle to an existing connection: Don't close the connection while the session is still open. If you do, Apache::Session won't be able to save the session contents to the database!

Accessing Session Data

After you've successfully opened a session, you store values in it by creating them as hash values. To store a username in your session, for example, assign it to an appropriately named hash element:

 $session{user_name} = $user_name;

To retrieve the value, access the same element:

 $user_name = $session{user_name};

As a more involved example, another way to use a session record is to store and retrieve form elements. To load a set of form values into the hash, you can use the following loop. It calls param() to get the names of the parameters that are present, and then extracts the value of each one and stuffs it into the session record. The trick here is that you can't store multiple-value elements (such as a set of check box values) as a scalar, so they must be converted to array references:

    foreach my $name (param ())
    {
        my @val = param ($name);
        $session{$name} = (@val > 1 ? [ @val ] : shift (@val));
    }

Conversely, to install the contents of a session record into the parameter environment, do this:

    foreach my $name (keys (%session))
    {
        param (-name => $name, -value => $session{$name});
    }

Then you can generate and display a form, and those elements having names corresponding to session hash keys will be filled in automatically. (This assumes that you take advantage of CGI.pm's sticky form behavior and don't specify the override parameter when you call functions that generate form fields.)

When you choose key names for session hash values, they can be whatever you like, except that they shouldn t begin with underscores. Such names are reserved by Apache::Session for its own purposes, although the only one it actually uses currently is _session_id for the session identifier.

Closing or Terminating a Session

When you're done using a session object in a script, you can release (close) it by calling untie():

 untie (%session);

If you made any changes to the session, Apache::Session normally saves it back to the database automatically when you close it; you don't have to take any explicit action. This behavior is subject to one caveat: A change may not be noticed if a session object references a more complex structure such as an array and you change an element of the structure. (This happens because you're not actually changing the value of the session hash element itself; you're changing the value of the thing the element points to.) To handle this situation, you can force an update using a trivial assignment, such as the following one:

 $session{_session_id} = $session{_session_id};

That's a harmless way of making the session manager think the session has been modified, causing it to save the record to the database when you close the session. You can use this technique with any session element, not just _session_id.
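
The reason a nested change goes unnoticed can be seen with a small tied hash that does nothing except record whether one of its elements has been assigned. The TrackedHash class below is a hypothetical stand-in written for this illustration; it is not how Apache::Session is implemented, but it relies on the same tie mechanism, in which only an assignment to a hash element invokes the STORE method.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a session class: a tied hash that records
# whether any top-level element has been assigned.
package TrackedHash;
require Tie::Hash;
our @ISA = ("Tie::StdHash");

my %modified;    # modification flag for each tied object

sub STORE
{
    my ($self, $key, $value) = @_;
    $modified{$self} = 1;
    $self->{$key} = $value;
}

sub is_modified { return ($modified{$_[0]} ? 1 : 0); }
sub mark_saved  { delete ($modified{$_[0]}); }

package main;

my %session;
my $obj = tie (%session, "TrackedHash");

$session{items} = [ "apple" ];       # element assignment: STORE is called
$obj->mark_saved ();                 # pretend the record was just saved

push (@{$session{items}}, "pear");   # changes the array, not the element
my $after_push = $obj->is_modified ();

$session{items} = $session{items};   # trivial assignment: STORE is called
my $after_assign = $obj->is_modified ();

print "after push: $after_push, after assignment: $after_assign\n";
```

The push() reaches the array through the reference stored in the element, so the tied hash never sees an assignment. The trivial assignment afterward is what sets the flag.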

If you have a session open that you no longer need at all, remove it by deleting the session object:

 tied (%session)->delete ();

untie() and delete() affect the underlying session record differently. When you close a session by calling untie(), you can reopen it later by specifying the appropriate session ID. If you delete a session by invoking delete(), the corresponding session record is removed from the sessions table and cannot be used again.

The WebDB::Session Module

To avoid repeating the session-opening code in each script that uses session management, let's encapsulate it into a module called WebDB::Session that acts as a front end to Apache::Session. The module also provides some other convenience routines. It has the following interface:

Any script that needs the WebDB::Session module should reference it by including the following use statement:
```
 use WebDB::Session; 
```
You need no use statement for Apache::Session, because all the details requiring that module will be handled internally by WebDB::Session.
To open a session and obtain a reference to the session object, invoke the open() method:
```
 $sess_ref = WebDB::Session->open ($dbh, $sess_id); 
```
The first argument, $dbh, indicates the database handle to use. It should be a handle to an open connection to MySQL, or undef if you want open() to contact the MySQL server using built-in connection parameters. The $sess_id argument should be either the ID of an existing session or undef to create a new session.
open() returns a reference to the session object, or undef to indicate an error. In the latter case, you can access the global variable $WebDB::Session::errstr to get the error message.
To close a session or destroy it, use the close() or delete() methods:
```
 $sess_ref->close ();
 $sess_ref->delete ();
```
These object-oriented methods are analogous to the corresponding untie() and delete() operations in Apache::Session that close and remove sessions.
Because open() returns a reference to the hash that represents the session (not the hash itself), the syntax for setting or getting session values requires that you dereference the session object:
```
 $sess_ref->{user_name} = $user_name;    # set session value
 $user_name = $sess_ref->{user_name};    # get session value
```

To make access to session data more generic, however, WebDB::Session provides an attr() accessor method as the preferred interface:

    $sess_ref->attr ("user_name", $user_name);    # set session attribute
    $user_name = $sess_ref->attr ("user_name");   # get session attribute

Why bother doing this? Because setting and getting values this way is more abstract; it decouples session data access from the fact that the session is represented by a hash. That makes it easier to replace the underlying session mechanism if some day you decide to reimplement WebDB::Session without using Apache::Session at all.

Another accessor method, session_id(), can be used to get the session ID from the record. This is a read-only method, because the session ID value is generated and assigned inside the module and should not be changed by your application.

Implementation of the WebDB::Session module is not very complicated. The module begins with a package statement, some use statements that reference the modules it needs, and a declaration for the global variable used to hold the error message if open() cannot open a session:

    package WebDB::Session;
    use strict;
    use DBI;
    use Apache::Session::MySQL;

    # global variable for error message
    $WebDB::Session::errstr = "";

The open() method initiates a session after examining its $dbh argument to determine whether to use an existing MySQL connection or to connect using the built-in default parameters. (You can change the defaults; those shown here are the same as the ones we used for the WebDB::connect() call.)

    sub open
    {
        my ($type, $dbh, $sess_id) = @_;
        my %session;    # session hash
        my %attr;       # connection attributes
        # default connection parameters
        my $dsn = "DBI:mysql:host=localhost;database=webdb";
        my $user_name = "webdev";
        my $password = "webdevpass";

        # Set up connection attributes according to whether
        # or not we're using an existing connection
        if (defined ($dbh))     # use existing connection
        {
            %attr =
            (
                Handle => $dbh,
                LockHandle => $dbh
            );
        }
        else                    # use default connection parameters
        {
            %attr =
            (
                DataSource => $dsn,
                UserName => $user_name,
                Password => $password,
                LockDataSource => $dsn,
                LockUserName => $user_name,
                LockPassword => $password
            );
        }
        eval            # open session, putting tie within eval to trap errors
        {
            tie %session, "Apache::Session::MySQL", $sess_id, \%attr;
        };
        if ($@)         # session establishment failed
        {
            $WebDB::Session::errstr = $@;   # save error message
            return undef;
        }
        # get reference to session hash, bless it into class, and return it
        return (bless (\%session, $type));
    }

For the most part, WebDB::Session behaves similarly to Apache::Session when opening a session. One thing that differs is handling of errors. If Apache::Session cannot open a session, you get a noisy error message in your browser window and your script dies. (This can occur, for example, if a user tries to resume a session for a record that has been deleted.) We avoid this by wrapping the tie call within an eval block and returning undef to indicate an error. Your scripts should check the return value of open() and take appropriate action. The following example calls open() and prints a message to indicate the cause of the problem if an error occurs:

    defined ($sess_ref = WebDB::Session->open (undef, undef))
        or print p ("Could not create session: $WebDB::Session::errstr");

In practice, you'll probably find that $WebDB::Session::errstr contains technical information more useful to you as a developer than to end users of your applications. It may be best to print its value using code that is enabled only during an application's development cycle to help you see what's going on. You can disable it before deploying the application for production use.

The close() and delete() methods map onto the corresponding Apache::Session calls for closing and destroying session objects:

    sub close
    {
        my $self = shift;

        untie (%{$self});
    }

    sub delete
    {
        my $self = shift;

        tied (%{$self})->delete ();
    }

The attr() accessor method through which you set or get session values takes either one or two arguments. The first argument names the data value you're referring to. If that is the only argument, attr() just returns the appropriate session value by extracting the corresponding hash element. If there is a second argument, attr() uses it to set the session value first before returning it:

    sub attr
    {
        my $self = shift;

        return (undef) unless @_;           # no arguments
        $self->{$_[0]} = $_[1] if @_ > 1;   # two arguments; set value first
        return ($self->{$_[0]});            # return current value
    }

If you access a session element that does not exist, attr() returns undef. If you want to store an array or a hash, you should store it by reference. When accessing it later, dereference the value to recover the original array or hash.
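
The following sketch shows the store-by-reference pattern in isolation. So that it runs without a database, it uses a hypothetical DemoSession class built on a plain hash; its attr() method follows the description given above, and the key names and values are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for WebDB::Session that uses a plain hash, so
# the example runs without Apache::Session or a database.
package DemoSession;

sub new { return (bless ({}, shift)); }

sub attr
{
    my $self = shift;

    return (undef) unless @_;           # no arguments
    $self->{$_[0]} = $_[1] if @_ > 1;   # two arguments; set value first
    return ($self->{$_[0]});            # return current value
}

package main;

my $sess_ref = DemoSession->new ();

# store an array and a hash by reference
$sess_ref->attr ("colors", [ "red", "green" ]);
$sess_ref->attr ("sizes", { small => 1, large => 3 });

# dereference the stored values to recover the original structures
my @colors = @{$sess_ref->attr ("colors")};
my %sizes = %{$sess_ref->attr ("sizes")};

# a key that was never set yields undef
my $missing = $sess_ref->attr ("no_such_key");
```

Keep in mind that with a real session, changing an element inside a stored array or hash is the situation in which the change may not be noticed; storing the reference again with attr() performs an assignment to the session element.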

The session_id() method that returns the session identifier is nothing more than a lookup of the session ID key in the session hash. This hides from your applications the magic name that Apache::Session uses for the ID and exposes its value using a public method instead:

    sub session_id
    {
        my $self = shift;

        return ($self->{_session_id});
    }

The WebDB::Session module file should be installed under some directory that is in the Perl search path used by your scripts, and should be named WebDB/Session.pm (or WebDB\Session.pm under Windows). On my system, I install it as /usr/local/apache/lib/perl/WebDB/Session.pm.

Passing Session IDs in URLs

As a demonstration of how WebDB::Session is used, this section describes a script, stages_url.pl, that manipulates sessions. To keep it simple, each session contains only one piece of state data: a counter value that keeps track of the stage the session is at, where stage is defined as the number of pages the script has displayed so far during the current session. stages_url.pl stores the counter on the server in the session record and uses the URL to pass the session ID from page to page so that script invocations can determine which session to use. This script illustrates the following session-related techniques:

How to determine whether to begin a new session or continue an existing one
How to retrieve session records and manipulate their contents
How to terminate a session
How to detect errors for cases when a session record cannot be created or cannot be found.

When you first invoke stages_url.pl, it creates a new session and initializes the stage counter. Then it displays a page that shows the session ID, the stage counter, a link for progressing to the next page, and another for terminating the session. The page looks like this (the ID varies from session to session):

    Session ID: 341cbe58daa5da95f9ced74b7bf7726e
    Session stage: 1
    Select [this link] to continue to the next stage.
    Select [this link] to quit.

If you select the first link to continue the session, the script presents another page that looks the same, except that the Session stage value increments by one. This happens as long as you keep selecting the continue link. When you select the quit link, the script displays a page that shows the final number of stages and a link enabling you to begin a new session:

    Session ID 341cbe58daa5da95f9ced74b7bf7726e has ended after 4 stages.
    Select [this link] to start a new session.

stages_url.pl determines what parameters are present in the URL and uses them to figure out what to do. Initially you invoke the script with no parameters using this URL:

http://www.snake.net/cgi-perl/stages_url.pl

The continue-session and terminate-session URLs are generated by the script itself and displayed as links so you can select them. Both URLs contain the session ID, and the terminate-session URL contains a quit parameter as well:

    http://www.snake.net/cgi-perl/stages_url.pl?sess_id=xxx
    http://www.snake.net/cgi-perl/stages_url.pl?sess_id=xxx;quit=1

stages_url.pl begins with a preamble that references the necessary modules. Then it checks the URL parameters, opens the session, and prints the appropriate type of page:

    use strict;
    use lib qw(/usr/local/apache/lib/perl);
    use CGI qw(:standard escape escapeHTML);
    use WebDB::Session;

    my $sess_id;    # session ID
    my $sess_ref;   # reference to session record

    # Determine whether or not there is a session ID available.  If not,
    # create a new session; otherwise, read the record for an existing session.

    $sess_id = param ("sess_id");   # undef if no param (initial connection)
    if (!defined ($sess_id))        # create new session
    {
        defined ($sess_ref = WebDB::Session->open (undef, undef))
            or error ("Could not create new session: $WebDB::Session::errstr");
    }
    else                            # retrieve existing session
    {
        defined ($sess_ref = WebDB::Session->open (undef, $sess_id))
            or error ("Could not retrieve session record for ID $sess_id: "
                        . $WebDB::Session::errstr);
    }

    # If this is a new session, initialize the stage counter
    $sess_ref->attr ("count", 0) if !defined ($sess_ref->attr ("count"));

    print header (),
        start_html (-title => "Session Stages", -bgcolor => "white");

    if (!defined (param ("quit")))      # continue existing session
    {
        display_next_page ($sess_ref);  # display page for first/next stage
        $sess_ref->close ();            # done with session; close it
    }
    else                                # terminate session
    {
        display_last_page ($sess_ref);  # display termination page
        $sess_ref->delete ();           # destroy session for good
    }

    print end_html ();

stages_url.pl determines whether a session ID is available by looking for a sess_id parameter, and then creates a new session or retrieves an existing one accordingly. The session-opening code checks each case and prints error messages specific to the problems that might occur. The tests used by stages_url.pl are more elaborate than your applications may need: You may not care whether you're opening a new or existing session, or about displaying such specific error messages. For such instances, you may be able to reduce the session-opening code to the following statement:

    $sess_ref = WebDB::Session->open (undef, param ("sess_id"))
        or error ("Could not open session");

After opening the session, we initialize the stage counter if necessary. (The session's count value will be undefined if we just created the session.) This initialization code could have been put into the preceding if statement, where the new session actually was created. However, it can be a good thing to separate the session opening and initialization operations. Code to open a session tends to be fairly stereotypical, whereas session initialization depends heavily on the type of state data you're using, and varies from application to application. Keeping the two separate makes it easier to copy and paste session-opening code between applications without having to tweak it.

For a new or continuing session, stages_url.pl calls display_next_page(). This routine increments the stage counter and displays a page showing the session ID, the current stage, and links for continuing or terminating the session. Incrementing the counter changes the state data; this will cause the session record to be saved to the database automatically when we close the session.

    sub display_next_page
    {
        my $sess_ref = shift;

        # increment stage counter
        $sess_ref->attr ("count", $sess_ref->attr ("count") + 1);
        # display current session information
        print p (escapeHTML ("Session ID: " . $sess_ref->session_id ())),
                p ("Session stage: ", $sess_ref->attr ("count"));
        # display links for continuing or terminating session
        my $url = sprintf ("%s?sess_id=%s", # URL for continuing session
                            url (), escape ($sess_ref->session_id ()));
        print p (sprintf ("Select %s to continue to the next stage.",
                            a ({-href => $url}, "this link")));
        $url .= ";quit=1";                  # URL for quitting session
        print p (sprintf ("Select %s to quit.",
                            a ({-href => $url}, "this link")));
    }

For a session that is to be terminated (which is the case when the quit parameter is present), the script calls display_last_page() to generate a page that displays the session ID, the final stage count, and a link for beginning a new session:

    sub display_last_page
    {
        my $sess_ref = shift;

        # display final session state
        print p (escapeHTML (sprintf ("Session %s has ended after %d stages.",
                            $sess_ref->session_id (),
                            $sess_ref->attr ("count"))));
        # display link for starting new session
        print p (sprintf ("Select %s to start a new session.",
                            a ({-href => url ()}, "this link")));
    }

The error() utility routine prints a message when a problem is detected:

    sub error
    {
        print header (),
                p (escapeHTML ($_[0])),
                end_html ();
        exit (0);
    }

To test stages_url.pl, invoke it with no parameters in the URL, and then select the various links in the pages that it generates. There is no explicit MySQL activity in the script itself, because all that is handled behind the scenes automatically as WebDB::Session invokes Apache::Session operations for you. If MySQL logging is enabled and you have access to the log file, however, you may find it instructive to watch the queries that Apache::Session issues as you progress through the stages of this application:

 % tail -f logfile

I've described the stages_url.pl script in terms of a session through which you move linearly by selecting links in successive pages, but users don't always behave that way. To get an idea of how this script responds when you don't just proceed from page to page, try some of the following actions and observe what happens:

Begin a session to get the initial page, and then reload the page.
Begin a session, select the continue link to go to the next page, and then reload the page.
Begin a session, select the quit link to go to the last page, and then reload the page.
Select your browser's Back button after reaching various session stages, and then reload the page or select one of its links.

Invoke the script with a fake sess_id parameter:

 http://www.snake.net/cgi-perl/stages_url.pl?sess_id=X

Invoke the script without a sess_id parameter but with a quit parameter:
```
 http://www.snake.net/cgi-perl/stages_url.pl?quit=1 
```

When you try these things, do you find any of the behaviors that result to be unacceptable? If so, what can you do about them? Also, consider the fact that if you just close a page when a session is in progress, the session record in the database is not destroyed, resulting in an orphaned session. Can you do anything about that?
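
One possible response to the fake-ID case is to check the form of the sess_id value before trying to open the session. The sketch below assumes that valid IDs always consist of 32 lowercase hexadecimal digits, which matches the CHAR(32) id column and the sample ID shown earlier but is an assumption rather than a documented guarantee. The valid_sess_id() function is hypothetical.

```perl
use strict;
use warnings;

# Return the ID if it has the assumed form (32 lowercase hex digits);
# otherwise return undef so the caller can treat it as "no session".
sub valid_sess_id
{
    my $id = shift;

    return (undef) unless defined ($id);
    return ($id =~ /\A[0-9a-f]{32}\z/ ? $id : undef);
}

my $good = valid_sess_id ("341cbe58daa5da95f9ced74b7bf7726e");
my $fake = valid_sess_id ("X");
```

A well-formed ID can still refer to a record that no longer exists, so a check like this complements the error test on open() rather than replacing it.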

Passing Session IDs in Hidden Fields

stages_url.pl uses a URL parameter to pass the session ID from page to page. If you're displaying a page containing a form, you can use a hidden field to pass the ID by calling the hidden() function when you generate the form:

 print hidden (-name => "sess_id", -value => $sess_id);

After the user submits the form, you can retrieve the ID by calling param(), just as you do when the ID is passed as part of the URL:

 $sess_id = param ("sess_id");

This method of passing session identifiers is fairly straightforward, so I won't discuss it further. The webdb distribution includes a script, stages_hidden.pl, that demonstrates an implementation of this technique.

Passing Session IDs in Cookies

A third ID-propagation technique is to use a cookie (assuming that the user has cookies turned on, of course). Let's see how to do this with a script, stages_cookie.pl. On the surface, this script presents pages that are the same as those displayed by stages_url.pl, the script written in "Passing Session IDs in URLs." However, the underlying details for setting and getting the session ID are quite different. The cookie script also behaves differently than stages_url.pl in some subtle ways, as we'll see shortly.

stages_cookie.pl relies on CGI.pm's cookie-handling interface to create and retrieve cookies. This interface is provided through the cookie() function.^[3]

^[3] You can also create cookies using the CGI::Cookie module. It has an interface quite similar to that of the cookie() function, which in fact uses the module internally. To read the CGI::Cookie documentation, use this command:

 % perldoc CGI::Cookie

For example, a cookie with a name of "my name" and a value of "my value" can be created like this:

 $cookie = cookie (-name => "my name", -value => "my value");

The name and value parameters are required, but there are others you can use to specify additional cookie attributes. The full set of parameters that cookie() understands for creating cookies is as follows:

name is the name of the cookie. It should be a scalar value.
value indicates the cookie's value, which can be a scalar or a reference to an array or a hash.
domain indicates the hosts to which the browser should return the cookie. If no domain value is specified, the browser returns the cookie only to the host from which it came. If the value names a host or domain, the browser returns the cookie to that host or to any host in the domain. Domain names should begin with a dot to distinguish them from host names. A domain name also is required to contain at least two dots (to keep you from trying to match top-level domains). Therefore, to specify a domain of "snake.net", you should specify it as ".snake.net".

path associates a cookie with specific documents at a site; the browser returns the cookie to the current server for any request having a URL that begins with a given path value. For example, "/" matches all pages on the site (in effect creating a site-wide cookie). This is the default if no path is specified. A cookie with a path value of "/cgi-perl" is more specific; the browser will return the cookie only with requests for scripts in or under the /cgi-perl directory. To associate a cookie only with a specific script, pass the script's full pathname. Any CGI.pm script can get its own pathname without hardwiring it in by calling the url() function as follows:
```
 $path = url (-absolute => 1); 
```
Site-wide cookies are useful for associating a client with state information that applies to all pages, such as a preferences profile that affects how your site interacts with the user. For such cookies, try to pick a unique name to keep your application's cookie from being clobbered by cookies sent by other applications that run on your site. More specific path values can be used to limit network traffic (the client browser will return the cookie with fewer requests) and to reduce the probability of cookie name clash between applications.
expires sets an expiration date for the cookie. If you specify no expiration, the browser remembers the cookie only until it exits. For a cookie to have a specific lifetime, you must assign it an explicit expiration date. For a date in the future, the browser remembers the cookie until that date is reached. If the date is in the past, the browser deletes the cookie. This is useful if you associate a session with a cookie: When you destroy a session, tell the browser to destroy the corresponding cookie that it's holding by sending a new cookie with an expiration date in the past. That way the browser won't try to reuse the session.

Expiration values can be given literally as GMT dates, but it's more convenient to specify them using shortcut values relative to the current time. For example, expiration values of +1d and -1d indicate one day in the future and in the past. The shortcut suffix letters are s (seconds), m (minutes), h (hours), d (days), M (months), and y (years). Be careful about specifying relative values far into the future, such as +1000y; you may find that you end up with a cookie containing a date in the past! I assume this is due to some sort of arithmetic wraparound problem.
secure can be set to any true value (such as 1) to tell the browser to return the cookie only if the connection is secure, to protect its contents from snooping.
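To make the relative expiration shortcuts described above more concrete, the following sketch converts a value such as +1d into a literal GMT date string. It is an illustration of the arithmetic only, not CGI.pm's own code; the function name relative_to_gmt and the 30-day month and 365-day year approximations are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Seconds per unit for the shortcut suffix letters. Months and years
# are approximated as 30 and 365 days (an assumption of this sketch).
my %unit_secs = (
    s => 1,
    m => 60,
    h => 60 * 60,
    d => 24 * 60 * 60,
    M => 30 * 24 * 60 * 60,
    y => 365 * 24 * 60 * 60,
);
my @days = qw(Sun Mon Tue Wed Thu Fri Sat);
my @months = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: turn "+7d", "-1d", and so on into a GMT date
# string, relative to $now (which defaults to the current time).
# Returns undef if the value is not in shortcut form.
sub relative_to_gmt
{
my ($spec, $now) = @_;

    $now = time () if !defined ($now);
    my ($sign, $count, $unit) = $spec =~ /^([+-]?)(\d+)([smhdMy])$/
        or return;
    my $offset = $count * $unit_secs{$unit};
    $offset = -$offset if $sign eq "-";
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime ($now + $offset);
    return sprintf ("%s, %02d-%s-%04d %02d:%02d:%02d GMT",
                    $days[$wday], $mday, $months[$mon], $year + 1900,
                    $hour, $min, $sec);
}

# Using a base time of 0 (the epoch) makes the result predictable
print relative_to_gmt ("+1d", 0), "\n";  # Fri, 02-Jan-1970 00:00:00 GMT
print relative_to_gmt ("+7d"), "\n";     # one week from now
```

If the time value is held in a fixed-width integer, adding a very large offset such as the one for +1000y could exceed its range, which would be consistent with the wraparound suspected above.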

After creating a cookie, you can send it to the browser by calling header(). Suppose you want to create a cookie named sess_id to use for session-identification purposes. If the variable $sess_id contains the session ID value, the cookie can be created and sent in the headers as follows:

 $cookie = cookie (-name => "sess_id",               # cookie name
                     -value => $sess_id,             # cookie value
                     -path => url (-absolute => 1),  # use only with this script
                     -expires => "+7d");             # expire in 7 days
 print header (-cookie => $cookie);

The cookie parameter to header() causes the appropriate Set-Cookie: header to be sent, based on the contents of $cookie. The particular cookie shown here is associated only with the current script (so as not to interfere with any other cookies sent from our site), and will expire in seven days. No secure parameter was specified, so a secure connection is not required. If you want to send multiple cookies (which is legal, and sometimes useful), specify them using a reference to an array:

 print header (-cookie => [ $cookie1, $cookie2 ]);

header() sends cookies to the browser, but that's only half the job. You also need to retrieve them from the browser in future requests, so your application can figure out what to do next. If you want the names of all available cookies, call cookie() with no arguments; to get the value of a specific cookie, pass its name to cookie():

 @name = cookie ();              # get names of all available cookies
 $sess_id = cookie ("sess_id");  # get value of cookie named "sess_id"

cookie() returns undef if you call it with the name of a cookie that doesn't exist. This is generally how you distinguish when your script is being invoked for the first time from when it's being invoked again after you've already sent the client the cookie:

 $sess_id = cookie ("sess_id");
 if (!defined ($sess_id))
 {
     # ... it's the first invocation
 }
 else
 {
     # ... it's a subsequent invocation
 }

Choosing a Cookie Name and Path

When you use cookies to store session IDs, choose the name and path parameters that you pass to cookie() appropriately. If a script uses its full pathname for the path value to make the cookie apply only to itself, you can choose any name you want without worrying about the name conflicting with cookies generated by other applications. If you're applying a session to a wider range of pages, the name of the associated cookie becomes more important. If you're using a path of "/" to create a site-wide cookie for a display preferences profile, for example, a generic name value such as sess_id would be ill advised. Names such as display_prefs or viewing_prefs would be better for reducing the probability of conflict with other cookies. In general, the less specific your path value, the more specific your cookie name should be.
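The interplay between name and path can be sketched by modeling which cookies a browser would return for a given request. The following example applies the simple prefix rule described earlier (a cookie is returned when the request path begins with the cookie's path). The function name cookies_for_request, the list of cookies, and the script name other_app.pl are assumptions made for illustration, and real browsers apply additional rules not modeled here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical set of cookies that a browser is holding for one site
my @jar = (
    { name => "display_prefs", path => "/" },
    { name => "sess_id",       path => "/cgi-perl/stages_cookie.pl" },
    { name => "sess_id",       path => "/cgi-perl/other_app.pl" },
);

# Return the names of the cookies whose path is a prefix of the
# request path
sub cookies_for_request
{
my ($request_path, @cookies) = @_;

    return map { $_->{name} }
           grep { index ($request_path, $_->{path}) == 0 } @cookies;
}

# Prints: display_prefs, sess_id
print join (", ", cookies_for_request ("/cgi-perl/stages_cookie.pl", @jar)), "\n";
```

Because each script-specific cookie uses the script's full pathname, the two sess_id cookies never reach the same script, so the generic name is harmless. The site-wide cookie accompanies every request, which is why it needs the more distinctive name.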

Now we're all set to write the stages_cookie.pl script. It needs to perform the following operations:

Detect an existing cookie or create a new one, and open the corresponding session.
Initialize the session if it's new.
Present a page appropriate to the current state of the session.

The first step is to figure out whether a session ID cookie was sent by the browser and what its value is, so that we can determine whether to create a new session or look up an existing one. The following code shows one approach, based on the assumption that the cookie containing the session ID is named sess_id:

 # If the cookie containing the session ID is not present, create a new
 # session and prepare a cookie to send to the client that contains the
 # session ID.  Otherwise, use the ID to look up an existing session.
 # If all attempts to open a session fail, we can't continue.
 $sess_id = cookie ("sess_id");  # get value of sess_id cookie if it's present
 if (!defined ($sess_id))        # create new session and cookie
 {
     defined ($sess_ref = WebDB::Session->open (undef, undef))
         or error ("Could not create new session: $WebDB::Session::errstr");
     $cookie = cookie (-name => "sess_id",
                         -value => $sess_ref->session_id (),
                         -path => url (-absolute => 1));
 }
 else                            # retrieve existing session
 {
     defined ($sess_ref = WebDB::Session->open (undef, $sess_id))
         or error ("Could not retrieve session record for ID $sess_id: "
                     . $WebDB::Session::errstr);
 }

If the cookie doesn't exist, $sess_id will be undefined. This indicates a first-time invocation, so we must create a new session. It's also necessary to create a cookie containing the session ID and send it to the browser, so the browser can return it with subsequent requests. On the other hand, if the cookie does already exist, its value is the ID for an existing session. In this case, we don't have to create a new cookie. (Clearly, the browser must already have one, or it wouldn't have sent it to us.)^[4]

^[4] In some situations you'll want to issue a new cookie even when the browser already has one. If you extend the cookie's expiration time each time your script is requested, for example, you'll need to return a new cookie to tell the browser to update the one it's holding. If you destroy a session, you should tell the browser to delete the cookie associated with the session by sending a new cookie with an expiration date in the past.

The preceding code is fairly straightforward, but subject to an ugly problem if your application creates sessions that have a limited lifetime. Suppose a visitor starts using your application, and then goes on vacation and doesn't use the application again until after the session has expired. On the user's next visit, when the browser sends the cookie, the code will attempt to look up the corresponding session. Because the session is no longer there, the script prints an error message and quits. In fact, for the code just shown, there is now no way for the user to start over except by deleting the cookie on the browser end somehow so that the browser can't send it. There are various approaches to dealing with this problem:

Fail inexplicably with a cryptic error message that confuses and annoys the user.
Begin a new session and send the browser a new cookie to replace the old one.
Tell the user that the session has expired and send the browser a cookie that tells it to delete the old one.

Clearly the first option is the least desirable, although it's what actually happens with the preceding code. The last two options share the characteristic that they cause the browser to get rid of its old cookie, and that's something you should do regardless of the method you use to handle a cookie that has gone out of sync with the session it used to be associated with.

The following code handles the difficulty using the second option. It assumes that a session with a given ID has become unavailable if an error occurs trying to open it, and that a new session should be started. It also generates a new cookie to go with the session. This can be sent to the browser to replace the cookie it currently has, preventing it from sending the bad cookie again in the future:

 # If the cookie containing the session ID is present, use the ID to look up
 # an existing session. If the attempt fails or there was no cookie, create
 # a new session and prepare a cookie to send to the client that contains
 # the session ID. If all attempts to open a session fail, we can't continue.
 $sess_id = cookie ("sess_id");  # get value of sess_id cookie if it's present
 $sess_ref = WebDB::Session->open (undef, $sess_id) if defined ($sess_id);
 if (!defined ($sess_ref))       # no cookie or couldn't retrieve old session
 {
     defined ($sess_ref = WebDB::Session->open (undef, undef))
         or error ("Could not create session: $WebDB::Session::errstr");
     $cookie = cookie (-name => "sess_id",
                         -value => $sess_ref->session_id (),
                         -path => url (-absolute => 1));
 }

This second approach is the one used by stages_cookie.pl (and by the remaining applications in this chapter). If you like, you could modify it to notify the user when an error occurs (for example, by pointing out that the old session has expired and that you've created a new one).

After we check for a cookie as just outlined, $cookie has a value if we've just created a new cookie to be sent to the browser, and undef otherwise. We can pass it as the cookie parameter to the header() function either way, because header() is smart enough not to generate a Set-Cookie: header if $cookie is undef. But should we call header() at this point? It depends. Keep in mind that for cookie-based applications, any cookies must be sent prior to writing any page content. On the other hand, you may not always know whether you have to send a cookie until after you've started executing the code that generates the page! As it happens, in our current script, the cookie-detection code isn't the only place where we may need to create a cookie. If the user selects a quit link to terminate the session, we'll need to delete the session record and present the "session terminated" page, and we'll also need to tell the browser to delete the session cookie by sending a new cookie with an expiration date in the past. Unfortunately, if we've already started printing the page content, it's too late to send the cookie.

One way to deal with the problem of writing headers and page content in the proper order is to defer output: Save the contents of any page you generate in a string, and then print the string after you know what headers must be sent.

stages_cookie.pl does this using a $page variable to hold the HTML for the page content:

 my $page;
 # If this is a new session, initialize the stage counter
 $sess_ref->attr ("count", 0) if !defined ($sess_ref->attr ("count"));
 if (!defined (param ("quit")))  # continue session
 {
     $page .= display_next_page ($sess_ref); # display page for first/next stage
     $sess_ref->close ();                    # done with session; close it
 }
 else                            # terminate session
 {
     $page .= display_last_page ($sess_ref); # display termination page
     $sess_ref->delete ();                   # destroy session for good
     # Create cookie that tells browser to destroy the one it's storing
     $cookie = cookie (-name => "sess_id",
                         -value => $sess_ref->session_id (),
                         -path => url (-absolute => 1),
                         -expires => "-1d");     # "expire yesterday"
 }
 # Send headers, including any cookie we may have created.
 # Then send page contents.
 print header (-cookie => $cookie);
 print start_html (-title => "Session Stages", -bgcolor => "white"),
         $page,
         end_html ();

The display_next_page() and display_last_page() functions are similar to the functions with the same names in stages_url.pl, but they return any HTML they generate as a string rather than printing it immediately. They also create slightly different URLs for continuing and terminating the session. stages_url.pl wrote URLs that included the session identifier:

 http://www.snake.net/cgi-perl/stages_url.pl?sess_id=xxx
 http://www.snake.net/cgi-perl/stages_url.pl?sess_id=xxx;quit=1

In stages_cookie.pl, we're passing the session ID in a cookie, so the URLs don't need any sess_id parameter:

 http://www.snake.net/cgi-perl/stages_cookie.pl
 http://www.snake.net/cgi-perl/stages_cookie.pl?quit=1

The revised versions of display_next_page() and display_last_page() used by stages_cookie.pl look like this:

 sub display_next_page
 {
 my $sess_ref = shift;
 my $page;

     # increment stage counter
     $sess_ref->attr ("count", $sess_ref->attr ("count") + 1);
     # display current session information
     $page .= p (escapeHTML ("Session ID: " . $sess_ref->session_id ()))
             . p ("Session stage: ", $sess_ref->attr ("count"));
     # display links for continuing or terminating session
     my $url = url ();                   # URL for continuing session
     $page .= p (sprintf ("Select %s to continue to the next stage.",
                         a ({-href => $url}, "this link")));
     $url .= "?quit=1";                  # URL for quitting session
     $page .= p (sprintf ("Select %s to quit.",
                         a ({-href => $url}, "this link")));
     return ($page);
 }

 sub display_last_page
 {
 my $sess_ref = shift;
 my $page;

     # display final session state
     $page .= p (escapeHTML (sprintf ("Session %s has ended after %d stages.",
                         $sess_ref->session_id (),
                         $sess_ref->attr ("count"))));
     # display link for starting new session
     $page .= p (sprintf ("Select %s to start a new session.",
                         a ({-href => url ()}, "this link")));
     return ($page);
 }

If you try out stages_cookie.pl at this point, you'll see that it behaves much the same as stages_url.pl. In fact, its behavior is exactly the same as long as you proceed through a session by selecting links in successive pages. To see where the two scripts differ, go back to the end of the discussion of stages_url.pl, where I gave a list of things to try to see how that script acts when you don't go through pages in sequence. Try those same things with stages_cookie.pl; you should notice several differences between the behaviors of the two scripts. When a session ID is propagated in the URL, for instance, the ID disappears if you just close the window. The result is an orphaned session. With a cookie, closing the window doesn't orphan the session that way; if you issue a new request for the script, the cookie gets sent again, and the script can determine what to do.

That's not to say that it's not possible to strand a session when you use cookies. If you quit and restart your browser, the cookie disappears. Then if you invoke stages_cookie.pl again, you get a new session (and the previous one becomes orphaned). This happens because we assigned the cookie no expiration value, so the browser remembers the cookie only as long as it continues to run.
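If you would rather have the session survive a browser restart, one option is to give the cookie an explicit lifetime and to reissue it with each request so that the lifetime keeps sliding forward. The following is a minimal sketch of that idea, not the approach stages_cookie.pl takes; the helper name persistent_session_cookie and the seven-day lifetime are assumptions chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard);

# Hypothetical helper: build a sess_id cookie for the current script
# that the browser keeps for the given lifetime (for example, "+7d")
sub persistent_session_cookie
{
my ($sess_id, $lifetime) = @_;

    return cookie (-name => "sess_id",
                   -value => $sess_id,
                   -path => url (-absolute => 1),
                   -expires => $lifetime);
}

# "xxx" stands in for a real session ID. Sending the cookie with every
# response moves its expiration date forward each time the user visits.
my $cookie = persistent_session_cookie ("xxx", "+7d");
print header (-cookie => $cookie);
```

A cookie that outlives the browser is useful only if the session record on the server lasts at least as long, so the two lifetimes should be chosen together.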

Now that you have some experience with sessions, your head is probably brimming with ideas about using them to eliminate some of the shortcomings in the applications developed in earlier chapters. For example, our electronic greeting card application is in fact session based (although we didn't discuss it in those terms), but the implementation leaves something to be desired:

We generated card IDs using an AUTO_INCREMENT column to make sure each card had a unique value. Unfortunately, because these IDs are sequential, they are easy to guess. An intruder can visit our site, begin a new card, examine its ID, and then try other ID values near to it as likely candidates for other cards that are still in the process of being constructed and that can be hijacked. If we used random session IDs instead (such as those generated by Apache::Session), intruders would have a much more difficult time guessing the ID of an in-progress card and card stealing would become more difficult.
If a visitor begins a card but never finishes it, the expiration date for the corresponding record in the ecard table never gets set. This was a problem for the expiration program expire_ecard.pl, which is able to determine whether to expire a card only if that date is set. One solution to this problem would be to store a card in a session record while it's being constructed, and then move the card to the ecard table only after the card has been completed. That way, records never appear in the ecard table until the expiration date is known, and the expiration process can always tell whether any given card record should be deleted. Unfortunately, this is an example of solving one problem by creating another. We would only transfer the expiration problem to another table, namely, the sessions table. If cards are stored in the session table while they are being constructed, how do we know when to expire a session record for a card that the user never finishes? The answer is to assign the session record itself an expiration date that is distinct from the expiration used for the card. Some ideas for solving this type of problem are discussed in "Expiring Sessions" later in this chapter.
If you were to modify make_ecard.pl to track session records using cookies, you'd create a problem for the user who has second thoughts about sending a card in the middle of the card-construction process. As long as the card ID is propagated by means of hidden fields and URLs, it's possible to forget a card just by closing the window or visiting another site. When the ID is stored in a cookie, that doesn't work because the browser continues to remember it. The user would find that returning to the card-making script would cause the old card to reappear! Fortunately, this is easy to solve. Provide a "Forget Card" button and have make_ecard.pl respond to it by forgetting the session and telling the browser to delete the cookie that holds the ID.
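The "Forget Card" idea in the last item might be handled along the following lines. This is only a sketch of the cookie side of the operation: the cookie name card_id, the button name choice, and the helper expired_cookie are assumptions, and deleting the session record itself is left as a comment because it depends on how the card is stored.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard);

# Hypothetical helper: build a cookie that tells the browser to discard
# the cookie it holds under the given name and path
sub expired_cookie
{
my ($name, $path) = @_;

    return cookie (-name => $name,
                   -value => "deleted",     # value is irrelevant here
                   -path => $path,
                   -expires => "-1d");      # "expire yesterday"
}

my $cookie;
my $choice = param ("choice");
if (defined ($choice) && $choice eq "Forget Card")
{
    # ... delete the session record for the card here ...
    $cookie = expired_cookie ("card_id", url (-absolute => 1));
}
# header() sends a Set-Cookie: header only if $cookie is defined
print header (-cookie => $cookie);
```

The path given when expiring the cookie should be the same one used when the cookie was created; otherwise the browser may treat it as a different cookie and keep the original.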

Another use for cookies would be the res_search2.pl script described in Chapter 7, "Performing Searches." In that application, we implemented display of multiple-page search results by using a set of links to each of the pages, where each link included a lot of information. However, much of that information was constant across links, such as the search parameters needed to derive the result set. An alternative implementation might use a cookie to store all the information that remains constant from page to page, and then include in the URLs only those values that vary between pages.
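As a rough sketch of that alternative, the constant search parameters could be stored as a hash in a single cookie, because the value parameter of cookie() accepts a hash reference. The cookie name res_search and the parameter names and values shown here are assumptions for illustration, not the ones res_search2.pl actually uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard);

# Hypothetical search parameters that stay constant from page to page
my %search = (keyword => "condo", max_price => 150000);

# Store the whole hash in one cookie associated with this script
my $cookie = cookie (-name => "res_search",
                     -value => \%search,
                     -path => url (-absolute => 1));
print header (-cookie => $cookie);

# On a later request, the hash can be recovered like this:
#   my %search = cookie ("res_search");
# The page links then need to carry only the values that vary, such as
# the page number.
```

Browsers limit how much data a cookie can hold, so this works best when the set of constant values is small.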
