11.6 Cookies

Cookies are the best current way to identify users and allow persistent sessions. They don't suffer many of the problems of the previous techniques, but they often are used in conjunction with those techniques for extra value. Cookies were first developed by Netscape but now are supported by all major browsers.

Because cookies are important, and they define new HTTP headers, we're going to explore them in more detail than we did the previous techniques. The presence of cookies also impacts caching, and most caches and browsers disallow caching of any cookied content. The following sections present more details.

11.6.1 Types of Cookies

You can classify cookies broadly into two types: session cookies and persistent cookies. A session cookie is a temporary cookie that keeps track of settings and preferences as a user navigates a site. A session cookie is deleted when the user exits the browser. Persistent cookies can live longer; they are stored on disk and survive browser exits and computer restarts. Persistent cookies often are used to retain a configuration profile or login name for a site that a user visits periodically.

The only difference between session cookies and persistent cookies is when they expire. As we will see later, a cookie is a session cookie if its Discard parameter is set, or if there is no Expires or Max-Age parameter indicating an extended expiration time.
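The rule in the previous paragraph is simple enough to write as a predicate. The following Python sketch is illustrative only. The function name and the use of None for a missing attribute are assumptions; neither comes from any cookie specification.

```python
from typing import Optional


def is_session_cookie(discard: bool = False,
                      expires: Optional[str] = None,
                      max_age: Optional[int] = None) -> bool:
    """Hypothetical helper: True for a session cookie, False for a persistent one.

    A missing Expires or Max-Age attribute is represented as None.
    """
    # A Discard attribute makes the cookie a session cookie regardless of lifetime.
    if discard:
        return True
    # With no Expires or Max-Age, nothing extends the cookie past the browser session.
    return expires is None and max_age is None


print(is_session_cookie())                             # True
print(is_session_cookie(max_age=86400))                # False
print(is_session_cookie(discard=True, max_age=86400))  # True
```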

11.6.2 How Cookies Work

Cookies are like "Hello, My Name Is" stickers stuck onto users by servers. When a user visits a web site, the web site can read all the stickers attached to the user by that server.

The first time the user visits a web site, the web server doesn't know anything about the user (Figure 11-3a). The web server expects that this same user will return again, so it wants to "slap" a unique cookie onto the user so it can identify this user in the future. The cookie contains an arbitrary list of name=value information, and it is attached to the user using the Set-Cookie or Set-Cookie2 HTTP response (extension) headers.

Cookies can contain any information, but they often contain just a unique identification number, generated by the server for tracking purposes. For example, in Figure 11-3b, the server slaps onto the user a cookie that says id="34294". The server can use this number to look up database information that the server accumulates for its visitors (purchase history, address information, etc.).

However, cookies are not restricted to just ID numbers. Many web servers choose to keep information directly in the cookies. For example:

 Cookie: name="Brian Totty"; phone="555-1212" 

The browser remembers the cookie contents sent back from the server in Set-Cookie or Set-Cookie2 headers, storing the set of cookies in a browser cookie database (think of it like a suitcase with stickers from various countries on it). When the user returns to the same site in the future (Figure 11-3c), the browser will select those cookies slapped onto the user by that server and pass them back in a Cookie request header.

Figure 11-3. Slapping a cookie onto a user

11.6.3 Cookie Jar: Client-Side State

The basic idea of cookies is to let the browser accumulate a set of server-specific information, and provide this information back to the server each time you visit. Because the browser is responsible for storing the cookie information, this system is called client-side state. The official name for the cookie specification is the HTTP State Management Mechanism.

11.6.3.1 Netscape Navigator cookies

Different browsers store cookies in different ways. Netscape Navigator stores cookies in a single text file called cookies.txt. For example:

 # Netscape HTTP Cookie File 
 # http://www.netscape.com/newsref/std/cookie_spec.html 
 # This is a generated file! Do not edit. 
 # 
 # domain allh path secure expires name value 
 
 www.fedex.com FALSE / FALSE 1136109676 cc /us/ 
 .bankofamericaonline.com TRUE / FALSE 1009789256 state CA 
 .cnn.com TRUE / FALSE 1035069235 SelEdition www 
 secure.eepulse.net FALSE /eePulse FALSE 1007162968 cid %FE%FF%002 
 www.reformamt.org TRUE /forum FALSE 1033761379 LastVisit 1003520952 
 www.reformamt.org TRUE /forum FALSE 1033761379 UserName Guest 
 ... 

Each line of the text file represents a cookie. There are seven tab-separated fields:

domain

The domain of the cookie

allh

Whether all hosts in a domain get the cookie, or only the specific host named

path

The path prefix in the domain associated with the cookie

secure

Whether we should send this cookie only if we have an SSL connection

expiration

The cookie expiration date in seconds since Jan 1, 1970 00:00:00 GMT

name

The name of the cookie variable

value

The value of the cookie variable
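Because the format is just seven tab-separated fields per line, it is easy to read programmatically. Here's a minimal Python sketch that splits on tabs as described above. The class and function names are hypothetical. Skipping malformed lines silently is a choice made for this example, not something the file format prescribes.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class NetscapeCookie:
    domain: str
    all_hosts: bool   # the "allh" field: TRUE means every host in the domain
    path: str
    secure: bool      # TRUE means send only over an SSL connection
    expires: int      # seconds since Jan 1, 1970 00:00:00 GMT
    name: str
    value: str

    def expires_utc(self) -> datetime:
        """Convert the expiration field to a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.expires, tz=timezone.utc)


def parse_cookies_txt(text: str) -> List[NetscapeCookie]:
    """Parse the seven tab-separated fields of each non-comment line."""
    cookies = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # skip lines that don't have exactly seven fields
        domain, allh, path, secure, expires, name, value = fields
        cookies.append(NetscapeCookie(domain, allh == "TRUE", path,
                                      secure == "TRUE", int(expires), name, value))
    return cookies


sample = ".cnn.com\tTRUE\t/\tFALSE\t1035069235\tSelEdition\twww\n"
for c in parse_cookies_txt(sample):
    print(c.name, c.value, c.domain, c.expires_utc().isoformat())
```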

11.6.3.2 Microsoft Internet Explorer cookies

Microsoft Internet Explorer stores cookies in individual text files in the cache directory. You can browse this directory to view the cookies, as shown in Figure 11-4. The format of the Internet Explorer cookie files is proprietary, but many of the fields are easily understood. Each cookie is stored one after the other in the file, and each cookie consists of multiple lines.

Figure 11-4. Internet Explorer cookies are stored in individual text files in the cache directory

The first line of each cookie in the file contains the cookie variable name. The next line is the variable value. The third line contains the domain and path. The remaining lines are proprietary data, presumably including dates and other flags.

11.6.4 Different Cookies for Different Sites

A browser can have hundreds or thousands of cookies in its internal cookie jar, but browsers don't send every cookie to every site. In fact, they typically send only two or three cookies to each site. Here's why:

                Moving all those cookie bytes would dramatically slow performance. Browsers would actually be moving more cookie bytes than real content bytes!

                Most of these cookies would just be unrecognizable gibberish for most sites, because they contain server-specific name/value pairs.

                Sending all cookies to all sites would create a potential privacy concern, with sites you don't trust getting information you intended only for another site.

In general, a browser sends to a server only those cookies that the server generated. Cookies generated by joes-hardware.com are sent to joes-hardware.com and not to bobs-books.com or marys-movies.com.

Many web sites contract with third-party vendors to manage advertisements. These advertisements are made to look like they are integral parts of the web site and do push persistent cookies. When the user goes to a different web site serviced by the same advertisement company, the persistent cookie set earlier is sent back again by the browser (because the domains match). A marketing company could use this technique, combined with the Referer header, to potentially build an exhaustive data set of user profiles and browsing habits. Modern browsers allow you to configure privacy settings to restrict third-party cookies.

11.6.4.1 Cookie Domain attribute

A server generating a cookie can control which sites get to see that cookie by adding a Domain attribute to the Set-Cookie response header. For example, the following HTTP response header tells the browser to send the cookie user="mary17" to any site in the domain .airtravelbargains.com:

 Set-cookie: user="mary17"; domain="airtravelbargains.com" 

If the user visits www.airtravelbargains.com, specials.airtravelbargains.com, or any site ending in .airtravelbargains.com, the following Cookie header will be issued:

 Cookie: user="mary17" 
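The browser's domain check is a tail match on the hostname. Here's a minimal Python sketch. The function name is hypothetical. Treating a leading dot in the Domain value as optional is an assumption borrowed from modern browser behavior, not something stated in this section. The sketch only decides where a stored cookie is sent; it does not enforce the separate rules about which hosts may set a cookie for a domain.

```python
def domain_matches(host: str, cookie_domain: str) -> bool:
    """Hypothetical helper: does a request hostname fall inside a cookie's Domain?"""
    host = host.lower()
    # Drop optional quotes and a leading dot: '".acme.com"' -> 'acme.com'.
    domain = cookie_domain.strip('"').lstrip(".").lower()
    # Match the domain itself or any host ending in "." + domain. Requiring the
    # dot keeps "evilairtravelbargains.com" from matching "airtravelbargains.com".
    return host == domain or host.endswith("." + domain)


for h in ["www.airtravelbargains.com", "specials.airtravelbargains.com", "www.cnn.com"]:
    print(h, domain_matches(h, "airtravelbargains.com"))
```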

11.6.4.2 Cookie Path attribute

The cookie specification even lets you associate cookies with portions of web sites. This is done using the Path attribute, which indicates the URL path prefix where each cookie is valid.

For example, one web server might be shared between two organizations, each having separate cookies. The site www.airtravelbargains.com might devote part of its web site to auto rentals (say, http://www.airtravelbargains.com/autos/), using a separate cookie to keep track of a user's preferred car size. A special auto-rental cookie might be generated like this:

 Set-cookie: pref=compact; domain="airtravelbargains.com"; path=/autos/ 

If the user goes to http://www.airtravelbargains.com/specials.html, she will get only this cookie:

 Cookie: user="mary17"

But if she goes to http://www.airtravelbargains.com/autos/cheapo/index.html, she will get both of these cookies, combined into a single Cookie header:

 Cookie: user="mary17"; pref=compact

So, cookies are pieces of state, slapped onto the client by the servers, maintained by the clients, and sent back to only those sites that are appropriate. Let's look in more detail at the cookie technology and standards.

11.6.5 Cookie Ingredients

There are two different versions of cookie specifications in use: Version 0 cookies (sometimes called "Netscape cookies"), and Version 1 ("RFC 2965") cookies. Version 1 cookies are a less widely used extension of Version 0 cookies.

Neither the Version 0 nor the Version 1 cookie specification is documented as part of the HTTP/1.1 specification. Two primary adjunct documents best describe the use of cookies; they are summarized in Table 11-2.

Table 11-2. Cookie specifications

Title                                        Description                                        Location
Persistent Client State: HTTP Cookies        Original Netscape cookie standard                  http://home.netscape.com/newsref/std/cookie_spec.html
RFC 2965: HTTP State Management Mechanism    October 2000 cookie standard; obsoletes RFC 2109   http://www.ietf.org/rfc/rfc2965.txt

11.6.6 Version 0 (Netscape) Cookies

The initial cookie specification was defined by Netscape. These "Version 0" cookies defined the Set-Cookie response header, the Cookie request header, and the fields available for controlling cookies. Version 0 cookies look like this:

 Set-Cookie: name=value [; expires=date] [; path=path] [; domain=domain] [; secure]

 Cookie: name1=value1 [; name2=value2] ...

11.6.6.1 Version 0 Set-Cookie header

The Set-Cookie header has a mandatory cookie name and cookie value. It can be followed by optional cookie attributes, separated by semicolons. The Set-Cookie fields are described in Table 11-3 .

Table 11-3. Version 0 (Netscape) Set-Cookie attributes

Set-Cookie attribute

Description and examples

NAME=VALUE

Mandatory. Both NAME and VALUE are sequences of characters, excluding the semicolon, comma, equals sign, and whitespace, unless quoted in double quotes. The web server can create any NAME=VALUE association, which will be sent back to the web server on subsequent visits to the site.

 Set-Cookie: customer=Mary 

Expires

Optional. This attribute specifies a date string that defines the valid lifetime of that cookie. Once the expiration date has been reached, the cookie will no longer be stored or given out. The date is formatted as:

 Weekday, DD-Mon-YY HH:MM:SS GMT 

The only legal time zone is GMT, and the separators between the elements of the date must be dashes. If Expires is not specified, the cookie will expire when the user's session ends.

 Set-Cookie: foo=bar; expires=Wednesday, 09-Nov-99 23:12:40 GMT 

Domain

Optional. A browser sends the cookie only to server hostnames in the specified domain. This lets servers restrict cookies to only certain domains. A domain of "acme.com" would match hostnames "anvil.acme.com" and "shipping.crate.acme.com", but not "www.cnn.com".

Only hosts within the specified domain can set a cookie for a domain, and domains must have at least two or three periods in them, to prevent cookies from being set for domains of the form ".com", ".edu", and "va.us". Any domain that falls within the fixed set of special top-level domains listed here requires only two periods. Any other domain requires at least three. The special top-level domains are: .com, .edu, .net, .org, .gov, .mil, .int, .biz, .info, .name, .museum, .coop, .aero, and .pro.

If the domain is not specified, it defaults to the hostname of the server that generated the Set-Cookie response.

 Set-Cookie: SHIPPING=FEDEX; domain="joes-hardware.com" 

Path

Optional. This attribute lets you assign cookies to particular documents on a server. If the Path attribute is a prefix of a URL path, a cookie can be attached. The path "/foo" matches "/foobar" and "/foo/bar.html". The path "/" matches everything in the domain.

If the path is not specified, it is set to the path of the URL that generated the Set-Cookie response.

 Set-Cookie: lastorder=00183; path=/orders 

Secure

Optional. If this attribute is included, a cookie will be sent only if HTTP is using an SSL secure connection.

 Set-Cookie: private_id=519; secure 
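The attributes in Table 11-3 map directly onto a helper that builds the header. Here's a Python sketch, which rests on these assumptions:

- The function names are hypothetical.
- The caller passes a timezone-aware datetime for Expires.
- The character check covers only the quoting rule given for NAME=VALUE.

Weekday and month names are spelled out in code rather than taken from strftime, so the output does not depend on the process locale. Attributes are emitted in the order shown in the Version 0 syntax line.

```python
from datetime import datetime, timezone
from typing import Optional

# Spelled out explicitly so the output doesn't depend on the process locale.
_DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def netscape_expires(when: datetime) -> str:
    """Format an aware datetime as 'Weekday, DD-Mon-YY HH:MM:SS GMT'."""
    when = when.astimezone(timezone.utc)  # the only legal zone is GMT
    return (f"{_DAYS[when.weekday()]}, {when.day:02d}-{_MONTHS[when.month - 1]}-"
            f"{when.year % 100:02d} {when:%H:%M:%S} GMT")


def _check(token: str, what: str) -> None:
    # Semicolon, comma, equals sign, and whitespace are allowed only inside quotes.
    quoted = len(token) >= 2 and token[0] == token[-1] == '"'
    if not quoted and any(ch in ";,= \t" for ch in token):
        raise ValueError(f"{what} must be quoted: {token!r}")


def set_cookie_v0(name: str, value: str, expires: Optional[datetime] = None,
                  path: Optional[str] = None, domain: Optional[str] = None,
                  secure: bool = False) -> str:
    """Hypothetical helper that assembles a Version 0 Set-Cookie header line."""
    _check(name, "name")
    _check(value, "value")
    parts = [f"{name}={value}"]
    if expires is not None:
        parts.append("expires=" + netscape_expires(expires))
    if path is not None:
        parts.append("path=" + path)
    if domain is not None:
        parts.append("domain=" + domain)
    if secure:
        parts.append("secure")
    return "Set-Cookie: " + "; ".join(parts)


print(set_cookie_v0("lastorder", "00183", path="/orders"))
# Set-Cookie: lastorder=00183; path=/orders
print(set_cookie_v0("customer", "Mary",
                    expires=datetime(2001, 1, 1, tzinfo=timezone.utc)))
# Set-Cookie: customer=Mary; expires=Monday, 01-Jan-01 00:00:00 GMT
```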

11.6.6.2 Version 0 Cookie header

When a client sends a request to a site, it includes all of the unexpired cookies that pass the domain, path, and secure filters for that request. All of these cookies are combined into a single Cookie header:

 Cookie: session-id=002-1145265-8016838; session-id-time=1007884800 
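On the server side, the combined header has to be split back into name/value pairs. Here's a minimal Python sketch with a hypothetical function name. It returns a list instead of a dict because the same name can legitimately appear more than once, for example two cookies with the same name but different paths. It also assumes no value contains a quoted semicolon, since a plain split would break such a value.

```python
from typing import List, Tuple


def parse_cookie_header(header: str) -> List[Tuple[str, str]]:
    """Split a Version 0 Cookie header into (name, value) pairs, in order."""
    _, _, rest = header.partition(":")  # drop the "Cookie" field name
    pairs = []
    for item in rest.split(";"):
        name, sep, value = item.strip().partition("=")
        if sep:  # ignore fragments without an "="
            pairs.append((name, value))
    return pairs


print(parse_cookie_header(
    "Cookie: session-id=002-1145265-8016838; session-id-time=1007884800"))
# [('session-id', '002-1145265-8016838'), ('session-id-time', '1007884800')]
```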

11.6.7 Version 1 (RFC 2965) Cookies

An extended version of cookies is defined in RFC 2965 (previously RFC 2109). This Version 1 standard introduces the Set-Cookie2 and Cookie2 headers, but it also interoperates with the Version 0 system.

The RFC 2965 cookie standard is a bit more complicated than the original Netscape standard and is not yet completely supported. The major changes of RFC 2965 cookies are:

                Associate descriptive text with each cookie to explain its purpose

                Support forced destruction of cookies on browser exit, regardless of expiration

                Max-Age aging of cookies in relative seconds, instead of absolute dates

                Ability to control cookies by the URL port number, not just domain and path

                The Cookie header carries back the domain, port, and path filters (if any)

                Version number for interoperability

                $ prefix in Cookie header to distinguish additional keywords from usernames

The Version 1 cookie syntax is as follows:

 set-cookie      = "Set-Cookie2:" cookies
 cookies         = 1#cookie
 cookie          = NAME "=" VALUE *(";" set-cookie-av)
 NAME            = attr
 VALUE           = value
 set-cookie-av   = "Comment" "=" value
                 | "CommentURL" "=" <"> http_URL <">
                 | "Discard"
                 | "Domain" "=" value
                 | "Max-Age" "=" value
                 | "Path" "=" value
                 | "Port" [ "=" <"> portlist <"> ]
                 | "Secure"
                 | "Version" "=" 1*DIGIT
 portlist        = 1#portnum
 portnum         = 1*DIGIT

 cookie          = "Cookie:" cookie-version 1*((";" | ",") cookie-value)
 cookie-value    = NAME "=" VALUE [";" path] [";" domain] [";" port]
 cookie-version  = "$Version" "=" value
 NAME            = attr
 VALUE           = value
 path            = "$Path" "=" value
 domain          = "$Domain" "=" value
 port            = "$Port" [ "=" <"> value <"> ]

 cookie2         = "Cookie2:" cookie-version

11.6.7.1 Version 1 Set-Cookie2 header

More attributes are available in the Version 1 cookie standard than in the Netscape standard. Table 11-4 provides a quick summary of the attributes. Refer to RFC 2965 for more detailed explanation.

Table 11-4. Version 1 (RFC 2965) Set-Cookie2 attributes

Set-Cookie2 attribute

Description and examples

NAME=VALUE

Mandatory. The web server can create any NAME=VALUE association, which will be sent back to the web server on subsequent visits to the site. The name must not begin with "$", because that character is reserved.

Version

Mandatory. The value of this attribute is an integer, corresponding to the version of the cookie specification. RFC 2965 is Version 1.

 Set-Cookie2: Part="Rocket_Launcher_0001"; Version="1" 

Comment

Optional. This attribute documents how a server intends to use the cookie. The user can inspect this policy to decide whether to permit a session with this cookie. The value must be in UTF-8 encoding.

CommentURL

Optional. This attribute provides a URL pointer to detailed documentation about the purpose and policy for a cookie. The user can inspect this policy to decide whether to permit a session with this cookie.

Discard

Optional. If this attribute is present, it instructs the client to discard the cookie when the client program terminates.

Domain

Optional. A browser sends the cookie only to server hostnames in the specified domain. This lets servers restrict cookies to only certain domains. A domain of "acme.com" would match hostnames "anvil.acme.com" and "shipping.crate.acme.com", but not "www.cnn.com". The rules for domain matching are basically the same as in Netscape cookies, but there are a few additional rules. Refer to RFC 2965 for details.

Max-Age

Optional. The value of this attribute is an integer that sets the lifetime of the cookie in seconds. Clients should calculate the age of the cookie according to the HTTP/1.1 age-calculation rules. When a cookie's age becomes greater than the Max-Age, the client should discard the cookie. A value of zero means the cookie with that name should be discarded immediately.

Path

Optional. This attribute lets you assign cookies to particular documents on a server. If the Path attribute is a prefix of a URL path, a cookie can be attached. The path "/foo" would match "/foobar" and "/foo/bar.html". The path "/" matches everything in the domain. If the path is not specified, it is set to the path of the URL that generated the Set-Cookie response.

Port

Optional. This attribute can stand alone as a keyword, or it can include a comma-separated list of ports to which a cookie may be applied. If there is a port list, the cookie can be served only to servers whose ports match a port in the list. If the Port keyword is provided in isolation, the cookie can be served only to the port number of the current responding server.

 Set-Cookie2: foo="bar"; Version="1"; Port="80,81,8080" 
 Set-Cookie2: foo="bar"; Version="1"; Port 

Secure

Optional. If this attribute is included, a cookie will be sent only if HTTP is using an SSL secure connection.
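Of the new attributes, Port has the most intricate matching rule, with three cases: no attribute, a bare keyword, or an explicit list. Here's a hedged Python sketch of the check a client might make before sending a cookie. The function name is hypothetical, and so is the convention of passing "" for the bare Port keyword and None for no Port attribute. The setting_port parameter stands for the port of the server that originally sent the Set-Cookie2 header.

```python
from typing import Optional


def port_allowed(port_attr: Optional[str], request_port: int,
                 setting_port: int) -> bool:
    """Hypothetical Port-attribute check for a Version 1 cookie.

    port_attr is None when Set-Cookie2 had no Port attribute, "" when it had
    the bare Port keyword, or the quoted list such as '"80,81,8080"'.
    setting_port is the port of the server that sent the Set-Cookie2 header.
    """
    if port_attr is None:
        return True                          # no port restriction at all
    if port_attr == "":
        return request_port == setting_port  # bare keyword: only the original port
    allowed = {int(p) for p in port_attr.strip('"').split(",")}
    return request_port in allowed


print(port_allowed('"80,81,8080"', 8080, 80))  # True
print(port_allowed("", 8080, 80))              # False
```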

11.6.7.2 Version 1 Cookie header

Version 1 cookies carry back additional information about each delivered cookie, describing the filters each cookie passed. Each matching cookie must include any Domain, Port, or Path attributes from the corresponding Set-Cookie2 header.

For example, assume the client has received these five Set-Cookie2 responses in the past from the www.joes-hardware.com web site:

 Set-Cookie2: ID="29046"; Domain=".joes-hardware.com" 
 Set-Cookie2: color=blue 
 Set-Cookie2: support-pref="L2"; Domain="customer-care.joes-hardware.com" 
 Set-Cookie2: Coupon="hammer027"; Version="1"; Path="/tools" 
 Set-Cookie2: Coupon="handvac103"; Version="1"; Path="/tools/cordless" 

If the client makes another request for path /tools/cordless/specials.html, it will pass along a long Cookie header like this:

 Cookie: $Version="1"; 
 ID="29046"; $Domain=".joes-hardware.com"; 
 color="blue"; 
 Coupon="hammer027"; $Path="/tools"; 
 Coupon="handvac103"; $Path="/tools/cordless" 

Notice that all the matching cookies are delivered with their Set-Cookie2 filters, and the reserved keywords begin with a dollar sign ($).
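Serializing the matched cookies into a Version 1 Cookie header is mostly a matter of echoing each cookie's explicit attributes back with a $ prefix. Here's a Python sketch. The V1Cookie type and function names are hypothetical. The sketch assumes its input cookies have already passed the domain, path, and port filters. It emits them in the order given; a real user agent would also have to order them as RFC 2965 requires. The attributes follow the order in the cookie-value grammar above: path, domain, port.

```python
from typing import List, NamedTuple, Optional


class V1Cookie(NamedTuple):
    name: str
    value: str
    domain: Optional[str] = None  # Domain attribute from Set-Cookie2, if any
    path: Optional[str] = None    # Path attribute from Set-Cookie2, if any
    port: Optional[str] = None    # port list, "" for the bare keyword, None if absent


def _q(text: str) -> str:
    # Wrap in double quotes unless already quoted.
    return text if text.startswith('"') else f'"{text}"'


def cookie_header_v1(matching: List[V1Cookie]) -> str:
    """Build a Version 1 Cookie header from cookies that already passed the filters."""
    parts = ['$Version="1"']
    for c in matching:
        item = f"{c.name}={_q(c.value)}"
        # Echo explicit attributes back in the grammar's order: path, domain, port.
        if c.path is not None:
            item += f"; $Path={_q(c.path)}"
        if c.domain is not None:
            item += f"; $Domain={_q(c.domain)}"
        if c.port == "":
            item += "; $Port"
        elif c.port is not None:
            item += f"; $Port={_q(c.port)}"
        parts.append(item)
    return "Cookie: " + "; ".join(parts)


print(cookie_header_v1([
    V1Cookie("ID", "29046", domain=".joes-hardware.com"),
    V1Cookie("color", "blue"),
    V1Cookie("Coupon", "hammer027", path="/tools"),
    V1Cookie("Coupon", "handvac103", path="/tools/cordless"),
]))
```

For the joes-hardware.com example, this prints the same header as above, on a single line.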

11.6.7.3 Version 1 Cookie2 header and version negotiation

The Cookie2 request header is used to negotiate interoperability between clients and servers that understand different versions of the cookie specification. The Cookie2 header advises the server that the user agent understands new-style cookies and provides the version of the cookie standard supported (it would have made more sense to call it Cookie-Version):

 Cookie2: $Version="1" 

If the server understands new-style cookies, it recognizes the Cookie2 header and should send Set-Cookie2 (rather than Set-Cookie) response headers. If a client gets both a Set-Cookie and a Set-Cookie2 header for the same cookie, it ignores the old Set-Cookie header.

If a client supports both Version 0 and Version 1 cookies but gets a Version 0 Set-Cookie header from the server, it should send cookies with the Version 0 Cookie header. However, the client also should send Cookie2: $Version="1" to give the server indication that it can upgrade.
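Both halves of this negotiation can be sketched in a few lines of Python. The function names are hypothetical, and modeling headers as a plain dict is a simplification. The client side covers only the case described in the previous paragraph: a dual-version client holding cookies it received through Version 0 Set-Cookie headers.

```python
from typing import Dict, List, Tuple


def server_cookie_header_name(request_headers: Dict[str, str]) -> str:
    """Pick Set-Cookie2 if the client advertised new-style cookie support."""
    # HTTP header names are case-insensitive.
    if any(name.lower() == "cookie2" for name in request_headers):
        return "Set-Cookie2"
    return "Set-Cookie"


def client_request_headers(v0_cookies: List[Tuple[str, str]],
                           understands_v1: bool = True) -> Dict[str, str]:
    """Headers a dual-version client might send for cookies it got via Set-Cookie."""
    headers: Dict[str, str] = {}
    if v0_cookies:
        headers["Cookie"] = "; ".join(f"{n}={v}" for n, v in v0_cookies)
        if understands_v1:
            # Hint to the server that this client could upgrade to Version 1.
            headers["Cookie2"] = '$Version="1"'
    return headers


req = client_request_headers([("customer", "Mary")])
print(req)                             # {'Cookie': 'customer=Mary', 'Cookie2': '$Version="1"'}
print(server_cookie_header_name(req))  # Set-Cookie2
```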

11.6.8 Cookies and Session Tracking

Cookies can be used to track users as they make multiple transactions to a web site. E-commerce web sites use session cookies to keep track of users' shopping carts as they browse. Let's take the example of the popular shopping site Amazon.com. When you type http://www.amazon.com into your browser, you start a chain of transactions where the web server attaches identification information through a series of redirects, URL rewrites, and cookie setting.

Figure 11-5 shows a transaction sequence captured from an Amazon.com visit:

                Figure 11-5a: Browser requests Amazon.com root page for the first time.

                Figure 11-5b: Server redirects the client to a URL for the e-commerce software.

                Figure 11-5c: Client makes a request to the redirected URL.

                Figure 11-5d: Server slaps two session cookies on the response and redirects the user to another URL, so the client will request again with these cookies attached. This new URL is a fat URL, meaning that some state is embedded into the URL. If the client has cookies disabled, some basic identification can still be done as long as the user follows the Amazon.com-generated fat URL links and doesn't leave the site.

                Figure 11-5e: Client requests the new URL, but now passes the two attached cookies.

                Figure 11-5f: Server redirects to the home.html page and attaches two more cookies.

                Figure 11-5g: Client fetches the home.html page and passes all four cookies.

                Figure 11-5h: Server serves back the content.

Figure 11-5. The Amazon.com web site uses session cookies to track users

11.6.9 Cookies and Caching

You have to be careful when caching documents that are involved with cookie transactions. You don't want to assign one user some past user's cookie or, worse, show one user the contents of someone else's personalized document.

The rules for cookies and caching are not well established. Here are some guiding principles for dealing with caches:

Mark uncacheable documents explicitly

The document owner knows best if a document is uncacheable. Explicitly mark documents uncacheable if they are; specifically, use Cache-Control: no-cache="Set-Cookie" if the document is cacheable except for the Set-Cookie header. The other, more general practice of using Cache-Control: public for documents that are cacheable promotes bandwidth savings in the Web.

Be cautious about caching Set-Cookie headers

If a response has a Set-Cookie header, you can cache the body (unless told otherwise), but you should be extra cautious about caching the Set-Cookie header. If you send the same Set-Cookie header to multiple users, you may be defeating user targeting.

Some caches delete the Set-Cookie header before storing a response in the cache, but that also can cause problems, because clients served from the cache will no longer get cookies slapped on them that they normally would without the cache. This situation can be improved by forcing the cache to revalidate every request with the origin server and merging any returned Set-Cookie headers with the client response. The origin server can dictate such revalidations by adding this header to the cached copy:

 Cache-Control: must-revalidate, max-age=0 

More conservative caches may refuse to cache any response that has a Set-Cookie header, even though the content may actually be cacheable. Some caches offer modes in which images with Set-Cookie headers are cached but text is not.

Be cautious about requests with Cookie headers

When a request arrives with a Cookie header, it provides a hint that the resulting content might be personalized. Personalized content must be flagged uncacheable, but some servers erroneously fail to mark such content as uncacheable.

Conservative caches may choose not to cache any document that comes in response to a request with a Cookie header. And again, some caches offer modes in which images requested with Cookie headers are cached but text is not. The more accepted policy is to cache images with Cookie headers, with the expiration time set to zero, thus forcing a revalidation every time.
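The conservative options above can be combined into one decision function for a shared cache. This Python sketch encodes one possible policy, not a standard. The Exchange type, the three outcome strings, and the choice to honor only the no-store and private directives are all assumptions made for illustration. The sketch also does not address what happens to a stored Set-Cookie header, which a real cache must avoid replaying to other users.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    request_has_cookie: bool
    response_has_set_cookie: bool
    content_type: str
    cache_control: str = ""


def conservative_cache_decision(ex: Exchange) -> str:
    """Hypothetical policy: return 'no-store', 'store-and-revalidate', or 'store'."""
    directives = {d.strip().lower() for d in ex.cache_control.split(",") if d.strip()}
    # Honor explicit owner instructions first.
    if "no-store" in directives or "private" in directives:
        return "no-store"
    if ex.request_has_cookie or ex.response_has_set_cookie:
        # Cookie traffic hints at personalization: cache images only, and
        # force revalidation on every request (expiration time of zero).
        if ex.content_type.lower().startswith("image/"):
            return "store-and-revalidate"
        return "no-store"
    return "store"


print(conservative_cache_decision(Exchange(True, False, "image/gif")))  # store-and-revalidate
print(conservative_cache_decision(Exchange(True, False, "text/html")))  # no-store
```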

11.6.10 Cookies, Security, and Privacy

Cookies themselves are not believed to be a tremendous security risk, because they can be disabled and because much of the tracking can be done through log analysis or other means. In fact, by providing a standardized, scrutinized method for retaining personal information in remote databases and using anonymous cookies as keys, the frequency of communication of sensitive data from client to server can be reduced.

Still, it is good to be cautious when dealing with privacy and user tracking, because there is always potential for abuse. The biggest misuse comes from third-party web sites using persistent cookies to track users. This practice, combined with IP addresses and information from the Referer header, has enabled these marketing companies to build fairly accurate user profiles and browsing patterns.

In spite of all the negative publicity, the conventional wisdom is that the session handling and transactional convenience of cookies outweighs most risks, if you use caution about who you provide personal information to and review sites' privacy policies.

The Computer Incident Advisory Capability (part of the U.S. Department of Energy) wrote an assessment of the overstated dangers of cookies in 1998. Here's an excerpt from that report:

  CIAC I-034: Internet Cookies  
  (http://www.ciac.org/ciac/bulletins/i-034.shtml)  
   
  PROBLEM:  
   
  Cookies are short pieces of data used by web servers to help identify web users. The  
  popular concepts and rumors about what a cookie can do has reached almost mystical  
  proportions, frightening users and worrying their managers.  
   
  VULNERABILITY ASSESSMENT:  
   
  The vulnerability of systems to damage or snooping by using web browser cookies is  
  essentially nonexistent. Cookies can only tell a web server if you have been there  
  before and can pass short bits of information (such as a user number) from the web  
  server back to itself the next time you visit. Most cookies last only until you quit  
  your browser and then are destroyed. A second type of cookie known as a persistent  
  cookie has an expiration date and is stored on your disk until that date. A  
  persistent cookie can be used to track a user's browsing habits by identifying him  
  whenever he returns to a site. Information about where you come from and what web  
  pages you visit already exists in a web server's log files and could also be used to  
  track users browsing habits, cookies just make it easier.  

HTTP: The Definitive Guide
ISBN: 1565925092
Year: 2001
Pages: 294