Stateless Persistence | Flash and XML: A Developer's Guide

Cookies are conceptually simple. They are tiny text files written on the client's computer. Though they are written on the client's file system, it is the server (classically) that reads and writes the cookies.

The advantages to the software designer are great. The history of interacting with this client can be easily stored and recovered. For example (and this is the example we will use), the most recent login parameters can be saved and restored. But this is just a start. There are plenty of complex and subtle ways to use the power of persistence to create personalized web experiences.

In general, cookies provide a way to recognize the user throughout a session and across sessions. Remember that HTTP is a stateless protocol. Without some explicit mechanism, nothing identifies the multiple independent HTTP dialogs that make up a single web experience. A web developer must work hard to connect the click that tosses a book into a shopping cart with the click five minutes later that brings the shopper to the checkout stand. And the developer must work harder still to recognize the shopper when she returns in a month to shop some more. This problem occurs not just with shopping but with collaborative work, game play, chat, and almost every other web-based activity.

Cookies help immensely with this problem. But they have a few problems of their own:

They are limited in size. Although they can be as large as 4K, they are transmitted with every HTTP transaction in the target domain, so large cookies will create huge, useless overhead.
They are tied to the machine, not the user. This creates problems for those who share a computer (a family, say) or those who access the Internet on a variety of machines.
They excite paranoia. Many proponents of web privacy worry about the potential for cookie abuse.

The first two problems can be solved by an efficient data strategy and a sensible approach to persistence.

A smart designer retains the detailed history of each individual in a database on his own server. Cookies should provide only a tiny key into that database, not the data content itself. Cookies can be set to expire at the end of a session, or to last up to five years.
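As a minimal sketch of this "tiny key" idea, the following PHP generates an opaque key and uses it to index a stand-in for the server's database. The names newVisitorKey, $visitorHistory, and the cookie name visitor are hypothetical, and the sketch assumes PHP 7 or later for random_bytes().

```php
<?php
// Hypothetical helper: build an opaque, hard-to-guess key for the cookie.
function newVisitorKey(): string
{
    // 16 random bytes rendered as 32 hexadecimal characters.
    return bin2hex(random_bytes(16));
}

// Stand-in for a server-side table indexed by the cookie value.
$visitorHistory = [];

$key = newVisitorKey();
$visitorHistory[$key] = ['lastLogin' => 'anonymous', 'visits' => 1];

// In a real web request only the key would travel in the cookie, e.g.:
// setcookie('visitor', $key);
echo strlen($key), "\n"; // prints 32
```

The history itself never leaves the server; the cookie carries only the 32-character key.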

Persistent cookies last longer than a single session. They make it easy for the developer to supply the database key to the returning user. If the user is on another computer, the key can be entered manually. Usually this spawns a nonpersistent cookie, which maintains the visitor's identity through the session.

The widespread fear of cookies is trickier. Many people are unconcerned, and many are adamantly protective of the integrity and isolation of their hard drive. Most fall in the middle, abandoning vague objections in the face of ubiquity and in the name of convenience.

On the one hand, giving an outside entity permission to write to a local file system compromises principles of security. It is possible for an evil marketeer to assemble a composite of your activity on his site. (After all, that is a cookie's basic purpose, except the evil marketeer part.) But there are strong safeguards.

It is not possible for a cookie to be used to monitor general Internet activity. Cookies can be read only by the servers that create them. The server specifies the domain and the directories for which the cookie is valid, but the domain must be the server's own.

It is not possible for the writing of the cookie to perpetrate destructive mischief on the file system. While it does indeed perform a file write, that operation is hidden from the software and managed entirely by the browser.

Cookies are just text strings. They are not malicious agents that can execute instructions of any sort.

It is not possible for cookies to somehow read the identity of the user, the user's machine, or the user's e-mail address. The only information they report is information the user created while visiting the site.

As a web user, you may have mixed or even negative feelings about cookies. But as a web developer, you wish they were accepted universally. No other technique matches cookies for convenient, respectful, and reasonably reliable identification of users. Every HTTP request to your server identifies its sender transparently and absolutely effortlessly.

In fact, the worst downside is that cookies can be turned off. You will need to decide whether you will support both cookie-enabled and cookie-disabled browsers. The alternatives are ugly: Usually the identity is either embedded as GET variables in each URL or is buried again and again in obscure hidden fields in POSTed HTML forms. The GET technique is highly exposed and requires code attention at every link to concatenate the variable ID with the fixed URL. The POST technique has even more code overload, where every link must be formulated as the POST of an HTML form. It also exposes the authentication system to the curious visitor, but less obviously than in GET.
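To illustrate why the GET alternative needs code attention at every link, here is a hedged sketch of the concatenation step. The function name withVisitorId and the parameter name id are assumptions made for this example, not part of the chapter's code.

```php
<?php
// Hypothetical helper: append a visitor ID to a URL as a GET variable.
function withVisitorId(string $url, string $id): string
{
    // Use "?" for the first variable and "&" when the URL already has some.
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . http_build_query(['id' => $id]);
}

echo withVisitorId('catalog.php', 'abc123'), "\n";        // catalog.php?id=abc123
echo withVisitorId('catalog.php?page=2', 'abc123'), "\n"; // catalog.php?page=2&id=abc123
```

Every link the site emits would have to pass through a helper like this, which is the overhead a cookie avoids.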

Some developers create duplicate mechanisms to support both cookies and some alternative identity scheme. Others bravely decide that their site (or features of it) will simply require cookies (just as they decide a site or a feature set requires Flash). This is generally a business decision, not a technical one. A developer can present the trade-offs, while the publisher makes the call. Since the trade-off is usually ease of development versus size of potential market, expect seat-of-the-pants decisions to usually go against you.

Even older surveys (2000) report disabled cookies in about 10 percent to 14 percent of browsers and trending downward. (The number varies widely depending on demographics.) The value of this market segment can best be balanced against the misery of developing both a cookie and non-cookie version when misery is expressed in terms of budget and schedule. Good luck.

Coding Cookies

Although JavaScript programmers can directly manipulate cookies from within the client code, we cannot in ActionScript. We treat cookies in the classic manner, as data belonging to the server but residing on the client machine.

All code that explicitly reads or sets cookies is server-side PHP code. The ActionScript code reacts to the events the cookies set in motion. In our example, we want the system to do the following:

1. Allow the user to fill in the username and password.
2. Record name and password in a cookie.
3. Append the cookie to all transactions in the session.
4. Transparently retrieve the information on next login.

We already have step 1, though we must be careful not to break it.

Step 2 is straightforward PHP code; we set a cookie as soon as we have a new valid username and password. It requires no ActionScript.

Step 3 is automatic.

Step 4 requires cooperation between ActionScript and the server. The server is quite capable of reading the cookie and transmitting it to Flash, but Flash must initiate the transaction. So as this movie script initializes itself, it will send a request to the server and examine the response. This transaction will be an XML exchange.

In fact, we implement this feature by simply augmenting the XML dialogue we already have developed. In it, the client sends <PASSWORD> and <USERNAME> elements upstream and receives an <ACCEPT> or <DENY> token coming downstream. We extend this a bit so that the ActionScript client can intelligently handle <PASSWORD> and <USERNAME> elements when they arrive coming downstream from the server.

XML Cookies

The format of cookies follows URL encoding. The set of cookies is a flat array of string variables arranged in key/value pairs. Each cookie is a single element in this array. The header of our transactions includes something like this:

HTTP

 Cookie: USERNAME=anonymous; PASSWORD=bluefish;

This information is adequate for our purposes, but it lacks the depth of XML, such as hierarchy, attributes, and indirection. So we aim instead at this:

HTTP

 Cookie: login=<LOGIN><USERNAME>anonymous</USERNAME><PASSWORD>bluefish</PASSWORD></LOGIN>

This code affords plenty of flexibility. If you examine a raw header (with a packet sniffer, for instance), you will see that this string is URL encoded, so it actually looks like this:

HTTP

 Cookie: login=%3CLOGIN%3E%3CUSERNAME%3Eanonymous%3C%2FUSERNAME%3E. . .

But this format is transparent when using PHP to set and read the cookie.
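The encoding can be reproduced by hand to see what travels on the wire. This sketch uses PHP's rawurlencode() and rawurldecode() purely for illustration; when a cookie is set with setcookie() and read back from the request, PHP performs the encoding and decoding itself.

```php
<?php
// The XML login object from the text, as a plain string.
$xml = '<LOGIN><USERNAME>anonymous</USERNAME><PASSWORD>bluefish</PASSWORD></LOGIN>';

// Encode it the way it would appear in a raw Cookie header.
$encoded = rawurlencode($xml);
echo $encoded, "\n";
// %3CLOGIN%3E%3CUSERNAME%3Eanonymous%3C%2FUSERNAME%3E%3CPASSWORD%3Ebluefish%3C%2FPASSWORD%3E%3C%2FLOGIN%3E

// Decoding restores the original XML exactly.
$decoded = rawurldecode($encoded);
var_dump($decoded === $xml); // bool(true)
```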

While XML is a bit less efficient in bandwidth, this shortcoming is not significant in any likely scenario. It is more efficient in preserving a different resource. The cookie spec requires that browsers be able to store 20 cookies per domain. It is certainly conceivable that a publisher might want to retain no information about visitors and instead save all history to cookies on the browser's machine. Reasons to do this might involve respecting the Children's Online Privacy Protection Act or a fear of security compromise from hackers on the one hand or a subpoena on the other. Or the developer might simply want to avoid creating and maintaining a real database when all that's needed is casual identification.

In such a scenario, the limit of 20 cookies could be far too little. XML by definition collapses all its variables into one element and thus uses only a single cookie, regardless of its contents.

Client Side

Only a little new ActionScript is required. So far only the user has been able to input the name and password. We need to let that data be set by the XML input. This requires just two tiny access functions (to write the username and password) and two simple cases to find those elements and call the appropriate function.

ActionScript

 // Handlers called when the matching element arrives from the server.
 function xmlAccept(element) {
     statusline = "hello - i have been accepted";
 }
 function xmlDeny(element) {
     statusline = getText(element);
 }
 function xmlUsername(element) {
     username = getText(element);   // fills in the username field
 }
 function xmlPassword(element) {
     password = getText(element);   // fills in the password field
 }
 // Dispatch on the element name to the appropriate handler.
 function findElements() {
     if (this.nodeName eq "ACCEPT") this.target.xmlAccept(this);
     if (this.nodeName eq "DENY") this.target.xmlDeny(this);
     if (this.nodeName eq "PASSWORD") this.target.xmlPassword(this);
     if (this.nodeName eq "USERNAME") this.target.xmlUsername(this);
 }

Now all we need to do is call the server twice. We call the server first to see if it will supply a username and password. If it does, it is because our code just supplied them in the header of our request. This means that they have already been recorded on this machine as cookies. We call the server again when the user punches the button or hits Enter. Both those calls can use the submit() function, which packages the current username and password into an XML object (xmlRequest) and sends it to the server. It takes the returning XML (xmlResponse) and processes it with findElements().

This XML traffic is carried in the content bodies of the HTTP request and response. A parallel and independent cookie traffic is occurring in the HTTP headers. It is not as symmetrical as the content traffic, where the server and client exchange identical data packages.

The server can, on any response, insert a Set-Cookie line into the header. The client, with every request (in the domain for which the cookie is valid), declares the cookie and its value to the server in a Cookie line.

By adding an immediate call to submit() when login initializes, we ask the server if we have a meaningful cookie. (We can't look at our own headers directly. Like a man with something written on his forehead, we need some help.) If the server finds a name and password in a cookie, it will pass them back to us. They will be found by findElements(). When each is assigned to the appropriate variable, it will automatically fill in the form (Figure 16.4).

Figure 16.4. An Immediate Call to submit()


Server Side

The muscle is in the PHP. Let's begin by identifying the cookie. The cookie variable we are interested in is called login. As with all cookies, login appears to the PHP script as a global variable, accessed either as a member of the $GLOBALS associative array (as $GLOBALS["login"]) or by its own name $login after that name has been declared global. We will use the latter method.

 global $login;

This script is called in four situations:

New user, first contact
New user registers
Old user, first contact
Old user signs in

To distinguish these four conditions, the server code must look to see if there is a name/password cookie in the request header and whether there is a name/password XML object in the request content. This yields the four states in Table 16.1.

Table 16.1. Login State Logic

No cookie, no XML: New user, first contact
The client has neither cookie nor XML because it is running for the first time on this computer. The server returns empty XML and lets the user fill in a blank form.

No cookie, XML: New user registers
The user has submitted a name and password for the first time. The server (after presumably communicating with its database) sends XML to authorize or deny access and a Set-Cookie header to record this user's ID.

Cookie, no XML: Old user, first contact
As soon as the client boots up, it sends a cookie containing the ID from the last session and an XML object containing nothing, because the user has not seen the form yet. The server directly returns the cookie's information (already in XML format) to the client, where it is presented to the user for approval.

Cookie, XML: Old user signs in
The user has either confirmed the ID suggested by the cookie or typed in a new one over it. We do not need to differentiate between these cases or even from the previous case. Whenever XML arrives at the server, we use it for authorization and save it as a cookie.

Our PHP code must determine which of these four states is appropriate.
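The decision in Table 16.1 can be sketched as a small function of two flags. This is only an illustration: the function name loginState and its boolean parameters are hypothetical, and how each flag is computed is left to the surrounding code.

```php
<?php
// Hypothetical helper: name the login state from Table 16.1.
// $hasCookie: a meaningful login cookie arrived in the request header.
// $hasXml:    a meaningful name/password XML object arrived in the body.
function loginState(bool $hasCookie, bool $hasXml): string
{
    if ($hasXml) {
        // XML always wins: authorize it and save it as a cookie.
        return $hasCookie ? 'old user signs in' : 'new user registers';
    }
    // No XML yet: answer from the cookie, or with empty XML.
    return $hasCookie ? 'old user, first contact' : 'new user, first contact';
}

echo loginState(false, false), "\n"; // new user, first contact
echo loginState(true, true), "\n";   // old user signs in
```

Because the two XML cases are handled identically, the server needs only two code branches: one for requests without XML and one for requests with it.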

PHP

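 // No meaningful XML in the request body: answer from the cookie instead.
 if (strlen($HTTP_RAW_POST_DATA) < 48) {
     header("Content-type: text/xml");
     if (strlen($login) < 48) {
         echo "<empty/>";    // no usable cookie either: send a nearly void object
     } else {
         echo $login;        // the cookie already holds the login XML
     }
 }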

Note that our test for meaningful content data is to measure the size of the XML string. This works fine, since we know that the username/password object will require more than 50 characters if it is not empty. But it is a dangerous way to code. It ignores the extensibility of XML, a feature important enough to name the protocol after. In the future, someone may wish to add another element to this XML package (maybe <TEAM>), and that might cause even the empty structures to exceed 48 characters and be misread as full structures. Better coding technique is strongly recommended, and relatively easy to imagine, but omitted here for the sake of brevity and clarity.
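One possible shape for that better technique is to parse the string and look for the elements themselves rather than counting characters. The sketch below is an assumption, not the book's code: the helper name hasCredentials is hypothetical, and it relies on PHP's SimpleXML extension rather than the parser used elsewhere in this chapter.

```php
<?php
// Hypothetical helper: true only when $xml is well-formed and contains
// non-empty USERNAME and PASSWORD elements somewhere in the document.
function hasCredentials(string $xml): bool
{
    if (trim($xml) === '') {
        return false;
    }
    // Collect parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($doc === false) {
        return false; // not well-formed XML
    }
    foreach (['USERNAME', 'PASSWORD'] as $name) {
        $found = $doc->xpath("//{$name}");
        if (!$found || trim((string) $found[0]) === '') {
            return false; // element missing or empty
        }
    }
    return true;
}

var_dump(hasCredentials('<LOGIN><USERNAME>anonymous</USERNAME><PASSWORD>bluefish</PASSWORD></LOGIN>'));
var_dump(hasCredentials('<LOGIN><USERNAME></USERNAME><PASSWORD></PASSWORD><TEAM>a very long team name indeed</TEAM></LOGIN>'));
```

The first call reports true and the second false, even though the second string is well over 48 characters long.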

In this fragment, the server detects that no meaningful XML has arrived. It returns an XML package based on the cookie status. If the login cookie is too small to be meaningful, a nearly void object is returned. But if login looks good, it is sent back to the client. Remember that login is an XML cookie. The cookie itself ("login={string}") is a flat URL-encoded pair. But the {string} is pure XML. Labeling it as text/xml and echoing it back to the client works perfectly.

The next code handles the case where the user is submitting an ID for authorization (i.e., there is a meaningful XML upload). It is processed the same as the previous code, except for two things: the uploaded XML is parsed, and it is saved as a cookie.

 else {
     // Meaningful XML was uploaded: parse it.
     $xml_parser = xml_parser_create();
     xml_set_element_handler($xml_parser, "startElement", "endElement");
     xml_set_character_data_handler($xml_parser, "characterData");
     xml_set_default_handler($xml_parser, "unknownXML");
     if (!xml_parse($xml_parser, $HTTP_RAW_POST_DATA, true)) {
         // Function calls are not expanded inside a string, so format the message.
         die(sprintf("XML error: %s at line %d",
             xml_error_string(xml_get_error_code($xml_parser)),
             xml_get_current_line_number($xml_parser)));
     }
     xml_parser_free($xml_parser);

     // Save the uploaded XML as the login cookie, expiring at local midnight one year from today.
     setcookie("login", $HTTP_RAW_POST_DATA,
               mktime(0, 0, 0, date("m"), date("d"), date("Y") + 1));

Cookies travel in the HTTP response headers, so the cookie must be set before the script produces any output. The same constraint applies to header(): PHP requires the header commands to precede any other output, even whitespace.

The setcookie() function takes six parameters:

Name The cookie's name cannot have whitespace, a semicolon or a comma.
Value This string associated with the cookie can be omitted, but that tells the client to delete any existing cookie with this name and namespace.
Expires This is a date expressed as a Unix timestamp. In the previous code, mktime() is used to generate a timestamp that refers to local midnight one year from today. Omission of the date tells the client that this is an ephemeral cookie that should be used only during this session.
Pathname This directory path determines the scope of the cookie. The browser combines pathname and domain to create a mask against which URLs are matched to select cookies. The pathname "/" admits the entire domain.
Domain This is the domain in which the cookie is valid. The server that sends the cookie must be within this domain, so generally this parameter is omitted and defaults to the server's domain.
Secure The final parameter is the optional secure flag, which prevents the cookie from being transmitted in cleartext; it is sent only over Secure Sockets Layer (SSL) connections.
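As a hedged sketch of how these parameters fit together, the helper below assembles them as an options array, a form that setcookie() accepts from PHP 7.3 onward. The helper name loginCookieOptions is hypothetical, and the domain is deliberately left out so that it defaults to the server's own.

```php
<?php
// Hypothetical helper: options for a login cookie that expires at local
// midnight one year from today and is valid for the whole site.
function loginCookieOptions(bool $secure = false): array
{
    return [
        'expires' => mktime(0, 0, 0, (int) date('n'), (int) date('j'), (int) date('Y') + 1),
        'path'    => '/',      // admit the entire domain
        'secure'  => $secure,  // true: send only over encrypted connections
    ];
}

$options = loginCookieOptions(true);

// In a real web request, before any output is sent:
// setcookie('login', $xml, $options);
echo date('Y-m-d H:i:s', $options['expires']), "\n";
```

Leaving out 'expires' altogether would produce the ephemeral, session-only cookie described above.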

PHP

     header("Content-type: text/xml");
     if ($theElements["USERNAME"] == "anonymous") {
         if ($theElements["PASSWORD"] == $anonymousPassword) {
             echo '<ACCEPT privilege="guest"/>';
         } else {
             echo '<DENY error="404">Incorrect anonymous password</DENY>';
         }
     } else {
         // Any other identity: echo the received elements back to the client.
         echo "<RESPONSE>";
         foreach ($theElements as $elementName => $elementContent) {
             echo "<$elementName>$elementContent</$elementName>";
         }
         echo "</RESPONSE>";
     }
 }   // closes the else branch that handles an XML upload

Finally we have the case where an identity cannot be authorized. The server quickly generates a new XML object from the variables it was sent and passes it back to the Flash client. This is hardly the behavior a real-world server would exhibit, but it is useful for us early in our experimental development.