 
 | CGI Programming with Perl | 
As we mentioned, there are problems with both of the approaches for maintaining state discussed earlier. Most importantly, if users travel to other web sites and return, there is a good chance that their state information will be lost.
Cookies (originally called "magic cookies") were created by Netscape as a solution to this problem. Cookies allow the web server to ask the browser to store small amounts of information on the client machine and to return that information later. Netscape's original proposal was adopted by most web browsers and has become the standard way of handling cookies. RFC 2109, HTTP State Management Mechanism, which was coauthored by a representative of Netscape, proposed a new protocol for handling cookies. However, browsers have not adopted this new protocol, so Netscape's original protocol continues to be the de facto standard.
When a user requests a document, a web server can provide the web browser with one or more cookies along with the document. The browser adds each cookie to its cookie jar (so to speak) and can pass it back to the server on subsequent requests. As a result, we can store simple information, such as a session identifier, on the client side and use it to reference more complex data we maintain on the server side.
Cookies are ideal for web document personalization. For example, when a user visits our server for the first time (a missing cookie indicates a first-time user), we present the user with a form asking for specific preferences. We store these preferences as cookies, so on every subsequent visit users see documents that match their individual preferences.
Cookies do have restrictions. First, clients do not always accept cookies. Some browsers do not support cookies (though these browsers are becoming less common), and many users disable cookies due to privacy concerns. We will look at how to test for cookies later in this section.
Second, there are restrictions placed on cookie size and the number of cookies. According to Netscape's original cookie specification, no cookie can exceed 4KB, only twenty cookies are allowed per domain, and a total of 300 cookies can be stored on the client side. Some browsers may support more than this, but you should not assume this.
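A script can guard against the size limit before it sends a cookie. The following is only a minimal sketch: the helper name and constant are hypothetical (they are not part of CGI.pm), and it assumes that the 4KB limit covers the name and value together and that both contain single-byte characters.

```perl
use strict;
use warnings;

# Size limit quoted above from Netscape's original specification.
use constant MAX_COOKIE_BYTES => 4096;

# Hypothetical helper: true if "name=value" fits within the size limit.
# Assumes the limit applies to the name and value combined.
sub cookie_fits {
    my ( $name, $value ) = @_;
    return length( "$name=$value" ) <= MAX_COOKIE_BYTES ? 1 : 0;
}

print cookie_fits( "cart_id", 12345 )   ? "ok\n" : "too large\n";
print cookie_fits( "blob", "x" x 5000 ) ? "ok\n" : "too large\n";
```

Keeping only a short identifier in the cookie and the bulky data on the server, as described earlier, makes this limit easy to respect.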
How do cookies work? When a CGI application identifies a new user, it adds an extra header to its response containing an identifier for that user and other information that the server may collect from the client's input. This header tells a cookie-enabled browser to add this information to the client's cookies file. After this, all requests to that URL from the browser will include the cookie information as an extra header in the request. The CGI application uses this information to return a document tailored to that specific client. Because cookies can be stored on the user's hard disk, that information can even remain when the browser is closed and reopened.
In order to set a cookie, you send a Set-Cookie HTTP header to the browser with a number of parameters for the cookie you wish to set. The browser then returns the cookie in its Cookie header. The Set-Cookie header is formatted as follows:
Set-Cookie: cart_id=12345; domain=.oreilly.com; path=/cgi; expires=Wed, 14-Feb-2001 05:53:40 GMT; secure
In this example, the name of the cookie is cart_id, the value is 12345, and the rest of the parameters are set as name-value pairs except for secure, which never has a value -- it is either present or not. Table 11-2 shows a list of the parameters that you can set with a cookie.
| HTTP Cookie Parameter | CGI.pm cookie( ) Parameter | Description | 
|---|---|---|
| Name | -name | The name given to the cookie; it is possible to set multiple cookies with different names and attributes. | 
| Value | -value | The value assigned to the cookie. | 
| Domain | -domain | The browser will only return the cookie for URLs within this domain. | 
| Expires | -expires | This tells the browser when the cookie expires. | 
| Path | -path | The browser will only return the cookie for URLs below this path. | 
| Secure | -secure | The browser will only return the cookie for secure URLs using the https protocol. | 
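To make the header layout concrete, here is a minimal sketch that assembles the example Set-Cookie line by hand from the parameters in the table. The format_set_cookie helper is hypothetical and performs no escaping, so it assumes the name and value are already safe to send.

```perl
use strict;
use warnings;

# Hypothetical helper: format a Set-Cookie header line from named parameters.
# No escaping is done, so name and value must already be safe.
sub format_set_cookie {
    my %p = @_;
    my @parts = ( "$p{name}=$p{value}" );
    for my $attr ( qw( domain path expires ) ) {
        push @parts, "$attr=$p{$attr}" if defined $p{$attr};
    }
    push @parts, "secure" if $p{secure};    # a flag, never a name-value pair
    return "Set-Cookie: " . join( "; ", @parts );
}

print format_set_cookie(
    name    => "cart_id",
    value   => 12345,
    domain  => ".oreilly.com",
    path    => "/cgi",
    expires => "Wed, 14-Feb-2001 05:53:40 GMT",
    secure  => 1,
), "\n";
```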
CGI.pm supports cookies, so you can generate the header above via the following commands:
    my $cookie = $q->cookie( -name    => "cart_id",
                             -value   => 12345,
                             -domain  => ".oreilly.com",
                             -expires => "+1y",
                             -path    => "/cgi",
                             -secure  => 1 );
    print "Set-Cookie: $cookie\n";
However, there's no need to print the Set-Cookie header manually because CGI.pm will format it for you along with other HTTP headers:
print $q->header( -type => "text/html", -cookie => $cookie );
A browser that receives this cookie and accepts it will send it back for all future secure connections to any URL that includes a domain ending in .oreilly.com and a path that starts with /cgi. For example, if the browser requests the URL https://www.oreilly.com/cgi/store/checkout.cgi, it will supply the following header:
Cookie: cart_id=12345
This raw name-value pair is available in the HTTP_COOKIE environment variable or via CGI.pm's raw_cookie method, but it is much simpler to have CGI.pm parse cookies for you. To get the value of a cookie, call the cookie method with the name of the cookie you want:
my $cookie = $q->cookie( "cart_id" );
The following restrictions apply to the parameters that you provide when setting cookies:
Name and value can include any characters. CGI.pm will automatically URL-encode any special characters. Name and value are both required parameters.
Domain must match the domain name of the server setting the cookie. Domains are matched from right to left, so .oreilly.com matches www.oreilly.com as well as server3.oreilly.com or even fred.sf.oreilly.com.
Domains ending with a three-character top-level domain, such as .com, .net, .org, etc., must contain at least two dots. Country top-level domains, such as .au, .uk, .ca, etc., require at least three dots. This prevents someone from setting a cookie for a large common domain such as .com or .co.uk.
If the domain parameter is not explicitly set, it defaults to the full, current domain, such as www.oreilly.com.
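The domain rules above can be sketched as two small checks. Both helpers are hypothetical, the list of three-character top-level domains is illustrative rather than complete, and real browsers may apply additional rules.

```perl
use strict;
use warnings;

# Illustrative (not exhaustive) list of three-character top-level domains.
my %GENERIC_TLD = map { $_ => 1 } qw( com net org edu gov mil int );

# Hypothetical helper: does $host fall within the cookie's $domain?
# A leading dot means "any host ending in this suffix".
sub domain_matches {
    my ( $host, $domain ) = @_;
    ( $host, $domain ) = ( lc $host, lc $domain );
    return 1 if $host eq $domain;
    return ( $domain =~ /^\./ && $host =~ /\Q$domain\E\z/ ) ? 1 : 0;
}

# Hypothetical helper: enforce the minimum number of dots described above.
sub domain_has_enough_dots {
    my $domain = lc shift;
    my ( $tld ) = $domain =~ /\.([^.]+)\z/ or return 0;
    my $dots    = () = $domain =~ /\./g;
    return $dots >= ( $GENERIC_TLD{$tld} ? 2 : 3 ) ? 1 : 0;
}

for my $host ( qw( www.oreilly.com fred.sf.oreilly.com www.example.com ) ) {
    printf "%-22s %s\n", $host,
        domain_matches( $host, ".oreilly.com" ) ? "matches" : "no match";
}
for my $domain ( qw( .oreilly.com .com .co.uk ) ) {
    printf "%-22s %s\n", $domain,
        domain_has_enough_dots( $domain ) ? "allowed" : "rejected";
}
```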
Expires contains a timestamp in the following format:
Wdy, DD-Mon-YYYY HH:MM:SS GMT
Fortunately, you don't have to worry about remembering this because CGI.pm allows you to specify the expiration date using relative values:
    -expires => "+1y"   # 1 year from now
    -expires => "+6M"   # 6 months from now
    -expires => "-1d"   # yesterday (i.e., delete it)
    -expires => "+12h"  # 12 hours from now
    -expires => "+30m"  # 30 minutes from now
    -expires => "+15s"  # 15 seconds from now
    -expires => "now"   # now
Note that M is used for months and m is used for minutes. If a time is specified that's in the past, the browser does not save the cookie and deletes any previous cookies with the same name for the same domain and path.
If an expiration date is not specified, then the browser saves the cookie in memory until it exits.
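CGI.pm builds the timestamp for you, but it can help to see what a relative offset turns into. The following sketch uses a hypothetical cookie_date helper to format an epoch time with a four-digit year, as in the Set-Cookie example earlier in this section. It is not CGI.pm's own implementation.

```perl
use strict;
use warnings;

my @DAYS   = qw( Sun Mon Tue Wed Thu Fri Sat );
my @MONTHS = qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec );

# Hypothetical helper: format an epoch time as a cookie expiration date.
# Day and month names come from fixed lists, so the result does not
# depend on the current locale.
sub cookie_date {
    my $epoch = shift;
    my ( $sec, $min, $hour, $mday, $mon, $year, $wday ) = gmtime $epoch;
    return sprintf "%s, %02d-%s-%04d %02d:%02d:%02d GMT",
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900, $hour, $min, $sec;
}

print cookie_date( 0 ), "\n";                      # Thu, 01-Jan-1970 00:00:00 GMT
print cookie_date( time + 12 * 60 * 60 ), "\n";    # 12 hours from now
print cookie_date( time - 24 * 60 * 60 ), "\n";    # yesterday: deletes the cookie
```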
Path, like domain, controls when the browser should send the cookie to the server. It must be an absolute path, and it must match the path of the request that sets the cookie. Paths are matched from left to right, and any trailing / is removed from the path parameter, so /cgi/ matches /cgi/check_cart.cgi as well as /cgi-bin/calendar.cgi.
If path is not specified, it defaults to the full path of the request that sets the cookie.
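The path rule is a plain prefix comparison, which is why /cgi/ also matches /cgi-bin/calendar.cgi. A minimal sketch, using a hypothetical path_matches helper and only the matching behavior described in this section:

```perl
use strict;
use warnings;

# Hypothetical helper: would a cookie with this path be sent for this request?
# Strips any trailing slash from the cookie path, then checks whether the
# request path starts with what remains.
sub path_matches {
    my ( $cookie_path, $request_path ) = @_;
    $cookie_path =~ s{/\z}{};
    return index( $request_path, $cookie_path ) == 0 ? 1 : 0;
}

for my $request ( qw( /cgi/check_cart.cgi /cgi-bin/calendar.cgi /images/logo.gif ) ) {
    printf "%-24s %s\n", $request,
        path_matches( "/cgi/", $request ) ? "sent" : "not sent";
}
```

If you want a cookie limited to one application, choosing a longer, more distinctive path (such as /cgi/store) reduces the chance of an accidental prefix match.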
Secure tells the browser to return the cookie only on future requests made via https.
Browsers distinguish between cookies with the same name but different domains and/or paths. Thus, it is possible for a browser to send you multiple cookies with the same name. However, the browser should send the most specific cookie first in its request. For example, if you set the following two cookies:
    my $c1 = $q->cookie( -name => "user", -value => "site_value",  -path => "/" );
    my $c2 = $q->cookie( -name => "user", -value => "store_value", -path => "/cgi" );
    print $q->header( -type => "text/html", -cookie => [ $c1, $c2 ] );
        .
        .
then on future requests, the browser should send you the following:
Cookie: user=store_value; user=site_value
Unlike form parameters, CGI.pm will not return multiple values for cookies with the same name; instead, it will always return the first value. The following:
my $user = $q->cookie( "user" );
sets $user to "store_value". If you need to get the second value, you will have to inspect the value of the HTTP_COOKIE environment variable (or CGI.pm's raw_cookie method) yourself.
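If you do need every value, one approach is to split the raw cookie string yourself. This is only a sketch: all_cookie_values is a hypothetical helper, it decodes %XX escapes but nothing else, and the fallback string simply repeats the example header above so the code can run outside a web server.

```perl
use strict;
use warnings;

# Hypothetical helper: return every value sent for a cookie name, in the
# order the browser sent them. %XX escapes are decoded.
sub all_cookie_values {
    my ( $raw, $wanted ) = @_;
    my @values;
    for my $pair ( split /;\s*/, $raw ) {
        my ( $name, $value ) = split /=/, $pair, 2;
        next unless defined $name && defined $value;
        s/%([0-9A-Fa-f]{2})/chr hex $1/ge for $name, $value;
        push @values, $value if $name eq $wanted;
    }
    return @values;
}

# Use the real header when running as a CGI script; otherwise fall back
# to the example from the text.
my $raw   = $ENV{HTTP_COOKIE} || "user=store_value; user=site_value";
my @users = all_cookie_values( $raw, "user" );
print "$_\n" for @users;
```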
Of course, you would probably never set two cookies with the same name in the same script. However, on large sites it is quite possible to end up with different applications that each set a cookie with the same name. Therefore, especially if your site is on a domain that is shared with others, it is a good idea to choose a unique name for each cookie and to restrict its domain and path as much as possible.
Browsers do not consider cookies with different values for secure distinct the way that cookies with different domains and paths are distinct. Thus, you cannot set one value for https connections and another value for http connections to the same domain and path; the second cookie will simply overwrite the first cookie.
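One way to picture this behavior is a toy cookie jar keyed by domain, path, and name, with the secure flag stored alongside the value but left out of the key. The sketch below is a simplified model for illustration, not how any particular browser is implemented. The function names are hypothetical, and the lookup ignores the request's domain and path.

```perl
use strict;
use warnings;

# Toy cookie jar: the key is domain + path + name; secure is not part of it.
my %jar;

sub store_cookie {
    my %c = @_;
    $jar{ join "\0", @c{ qw( domain path name ) } } = { %c };
}

# Values stored for one name, longest (most specific) path first.
sub values_for {
    my $name = shift;
    return map  { $_->{value} }
           sort { length( $b->{path} ) <=> length( $a->{path} ) }
           grep { $_->{name} eq $name } values %jar;
}

my %site = ( name => "user", domain => "www.oreilly.com" );
store_cookie( %site, path => "/",    value => "site_value",   secure => 0 );
store_cookie( %site, path => "/cgi", value => "store_value",  secure => 0 );
store_cookie( %site, path => "/cgi", value => "secure_value", secure => 1 );

# The third cookie replaced the second because only secure differed.
print join( "; ", map { "user=$_" } values_for( "user" ) ), "\n";
```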
If a client does not accept cookies, it will not tell you this; instead it just quietly discards them. Thus, a client who does not accept cookies looks to your CGI scripts just like a new client who has not received any cookies yet. It can be a challenge to tell them apart. Some sites do not put much effort into distinguishing the two and simply add a notice that their site requires cookies and may not work correctly without them. However, a better solution is to test for cookie support via redirection.
Let's say you have an application at http://www.oreilly.com/cgi/store/store.cgi that requires cookies in order to track users' shopping carts. The first thing that this CGI script can do is check to see whether the client sent a cookie. If so, then the user is ready to shop. Otherwise, the CGI script needs to set a cookie first. If the CGI script sets a cookie at the same time that it forwards the user to another URL, such as http://www.oreilly.com/cgi/store/cookie_test.cgi, the second URL can test whether the cookie was in fact set properly. Example 11-4 provides the beginning of the main CGI script.
    #!/usr/bin/perl -wT

    use strict;
    use CGI;

    my $q       = new CGI;
    my $cart_id = $q->cookie( -name => "cart_id" ) || set_cookie( $q );

    # Script continues for users with cookies
        .
        .

    sub set_cookie {
        my $q = shift;
        my $server = $q->server_name;
        my $cart_id = unique_id(  );
        my $cookie  = $q->cookie( -name  => "cart_id",
                                  -value => $cart_id,
                                  -path  => "/cgi/store" );
        print $q->redirect ( -url => "http://$server/cgi/store/cookie_test.cgi",
                             -cookie => $cookie );
        exit;
    }

If we cannot retrieve a cookie for cart_id, we calculate a new unique id for the user and format it as a cookie for the current session that is only visible within our store. The unique_id subroutine is the same one used in Example 11-1 and Example 11-3; we omit it here for brevity. We set the cookie and forward the user to a second CGI script that will test the cookie for us.
There are a number of issues specifically related to setting cookies as part of a redirection:
If the domain of the URL in your redirection is different from the domain of your script, then you cannot set a cookie for the target domain. Browsers are expected to ignore cookies under these circumstances to ensure privacy.
The redirection must use an absolute URL, including the protocol and server name; otherwise, the web server may attempt to avoid another request and response cycle by simply returning the content for the new URL as the content of the initial response via an internal redirect.
The scope of the cookie must include both the CGI script setting the cookie as well as the CGI script testing whether the cookie is set. In our case, they are both below /cgi/store, so we set our cookie's path to this.
Example 11-5 contains the source for cookie_test.cgi .
    #!/usr/bin/perl -wT

    use strict;
    use CGI;

    use constant SOURCE_CGI => "/cgi/store/store.cgi";

    my $q      = new CGI;
    my $cookie = $q->cookie( -name => "cart_id" );

    if ( defined $cookie ) {
        print $q->redirect( SOURCE_CGI );
    }
    else {
        print $q->header( -type => "text/html", -expires => "-1d" ),
              $q->start_html( "Cookies Disabled" ),
              $q->h1( "Cookies Disabled" ),
              $q->p( "Your browser is not accepting cookies. Please upgrade ",
                     "to a newer browser or enable cookies in your preferences and",
                $q->a( { -href => SOURCE_CGI }, "return to the store" ),
                "."
              ),
              $q->end_html;
    }

This script is quite short. First we store the relative URL of the script that we came from in a constant. We could pull this from HTTP_REFERER, but not all browsers send the Referer HTTP field; because of privacy concerns, some browsers allow the user to disable this field. The safe alternative is to hardcode it into our script here.
We then create a new CGI.pm object and check for the cookie. If the cookie is set, we redirect the user back to the original CGI script, which will now see the new cookie and continue. If the cookie is not set, then we display a message telling the user the problem and providing a link back to the original script to try again. Notice that we disable caching for this page by passing an expires parameter with a time in the past to CGI.pm's header method. This ensures that when the user returns, the browser calls the script to test for cookies again instead of displaying a cached copy of the error message.
Copyright © 2001 O'Reilly & Associates. All rights reserved.
