Before we get into how sessions work in PHP and the benefits they can bring your application, we look at the broader concept of sessions in general. What are they, how do they work, and why are they so important?
Consider how a Web request actually works.
When the user's Web browser makes a request, it presents, among other bits and pieces, the following core information to the Web server:
Method of request (either GET or POST) and the protocol version used (1.0 or 1.1)
The actual document required (for example, /index.php)
The server hostname from which the document is being requested (important because many Web sites are hosted on a single server with a single IP address)
The request parameters (for example, foo=bar, username=fred, password=letmein) in URL-encoded format
The browser type (known as the user agent; essentially variant, platform and version)
Any cookies stored on the client's machine that have been previously issued by the server of which the client is now making a request
If you want to see this in action, telnet to port 80 on the Web server of your choice and issue something that looks like the following:
GET /pub/WWW/TheProject.html HTTP/1.1
User-Agent: CERN-LineMode/2.15 libwww/2.17b3
Host: www.w3.org
In response to your request, the Web server will spit out an appropriate Web page.
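To make the list of request components concrete, here is a minimal sketch that splits a raw request like the one above into its parts. The function name parseRawRequest is hypothetical, and the parsing is deliberately naive (no request body, no repeated headers); it is meant only to show that nothing in the request ties it to any earlier request.

```php
<?php
// Hypothetical helper: split a raw HTTP request into method, path,
// protocol version, and headers. Not a complete HTTP parser.
function parseRawRequest(string $raw): array
{
    // Header lines are separated by CRLF (a bare LF is tolerated here).
    $lines = preg_split('/\r?\n/', trim($raw));

    // Request line: METHOD, path, and protocol separated by spaces.
    [$method, $path, $protocol] = explode(' ', array_shift($lines), 3);

    $headers = [];
    foreach ($lines as $line) {
        if ($line === '') {
            break; // a blank line ends the header section
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }

    return [
        'method'   => $method,
        'path'     => $path,
        'protocol' => $protocol,
        'headers'  => $headers,
    ];
}

$request = parseRawRequest(
    "GET /pub/WWW/TheProject.html HTTP/1.1\r\n" .
    "User-Agent: CERN-LineMode/2.15 libwww/2.17b3\r\n" .
    "Host: www.w3.org\r\n\r\n"
);
echo $request['method'], ' ', $request['headers']['host'], PHP_EOL;
```

Within a running PHP script you would not parse the request yourself; the same information is normally available through superglobals such as $_SERVER, $_GET, $_POST, and $_COOKIE.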
You would not be dense if you struggled to see how this protocol could be anything but stateless. The server, after all, is completely unaware at the time of a given request of what went on during a previous request.
This approach causes problems because although the Web server sees a user's requests as stateless, the user does not. The user remembers perfectly well what was done at the time of the last page requested and expects, rather reasonably, the Web server to do the same. A particularly good example of this is when a username and password are required to access a particular page resource. The user expects to enter this information once and only once. The application should remember the details and not ask for them again should the user need to request a similarly restricted page on the site.
After all, this is generally how computers have worked for decades. Mirroring this functionality in your Web applications is vital for the sake of your users' collective sanity.
In the early days of the Web, this functionality was sometimes provided by the server's checking the remote IP address of the connecting user against a database. When the user first logged in, the IP address was recorded and subsequent requests (within a given time frame) from that IP were assumed to have come from the same user.
This was fine in an Internet with no firewalls, Network Address Translation, proxy servers, or other such pesky intrusions. Today's Internet is quite different, of course. Some consumer ISPs will actually present completely different external proxy server addresses with each request a user makes to your Web site, a side effect of their own proxy load balancing technology.
Another common solution to the problem, which does not depend on a consistent IP address from request to request, is to use HTTP Authentication. If you've ever visited a Web site and been presented with a box that looks something like Figure 15-1, then you've met HTTP Authentication.
Provided, again, from the very early days of the Web, HTTP Authentication allows certain files and directories on Web servers to be restricted to a group of users. Traditionally, this user list was a simple plain-text file, but recent advances have allowed Apache to directly consult a MySQL database for this information.
This method works well, and after you have issued a username and password for a given directory on a Web server, any subsequent requests to files in that directory or its subdirectories, whatever they may be, will be automatically presented to the Web server alongside that username and password.
This method is of little use when building PHP applications, however. It is rare to want to protect specific scripts, or even directories; far more likely is needing to restrict functionality delineated in some way that is recognized and understood in PHP but not necessarily in Apache. Even though replicating the HTTP headers required to emulate this functionality from within PHP itself is feasible, it's rarely desirable, for a number of reasons:
You have almost no control over the appearance of this box, so it's not exactly a user-friendly approach.
You can't ask for any other information (for example, to pose a security question, as in "What is your mother's maiden name?").
You can't store any information against the request beyond the identity under which the user has logged in.
The list goes on. Thankfully, a third and far more desirable solution is available, and that is to use sessions.
Strictly speaking, a session is defined as a series of consecutive HTTP requests made at designated time intervals by a single user from a single computer on a single Web application.
The general methodology behind sessions is that the first request made by a user will generate a new session, should one not yet exist. Subsequent requests will be considered to be part of that session, unless they are made outside some arbitrary time period (the session timeout period).
A session is normally used to determine the currently connected user of an application, if any. After a user has successfully logged in for the first time, the application's database should record the user's user ID against that session, such that any subsequent requests proffering that session are understood to have been made by that user and no other.
The real meat of a session is the session identifier. This uniquely identifies the session, which may exist concurrently with hundreds of other user sessions. When a session ID is generated and sent to the client for the first time, it is important that the session identifier is both unique and obscure enough that another, valid session ID could not easily be "invented" by a potentially hostile third party. For example, although issuing session numbers 1, 2, 3, 4, and so on would certainly satisfy the requirement for uniqueness, it fails to satisfy the security side of things, because a user who has been allocated session number 3 could simply claim to be session number 4 and potentially gain access to another user's account as a result.
Session identifiers are more often than not 32-character strings consisting of numbers and letters. This is how PHP's built-in session handling (discussed later in the chapter) generates session identifiers. The stumbling of one user across a valid session ID of another is unlikely, therefore, except through brute force, and later in this chapter we look at a couple of easy ways to stop that from happening, too.
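As an illustration of an identifier that is both unique and hard to guess, the following sketch produces a 32-character hexadecimal string from 16 random bytes. It is not a description of how PHP's built-in session handling works internally; the function name generateSessionId is hypothetical, and random_bytes() assumes PHP 7 or later.

```php
<?php
// Sketch: a 32-character hexadecimal identifier built from 16 random bytes.
// random_bytes() (PHP 7+) draws from the operating system's secure
// random source, so consecutive identifiers are not predictable.
function generateSessionId(): string
{
    return bin2hex(random_bytes(16));
}

$sessionId = generateSessionId();
echo $sessionId, PHP_EOL;
```

Because each of the 16 bytes becomes two hexadecimal characters, the result always has the 32-character shape described above.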
With the first of the user's requests made and a session identifier generated for that request, you are faced with the challenge of ensuring that the session identifier in question is perpetuated with each subsequent request.
There are two ways to do this: URL rewriting and cookies. A good way to begin is to look at the principles of these without getting bogged down just yet in how PHP's own session handling works.
Consider the following example. Assume for the sake of argument that PHP has decided upon the following session identifier:
abcde1234567890abcde1234567890ab
and you want this session identifier to be issued with every subsequent request by the Web browser. Wherever you have anchor links in your HTML, you need to arrange for PHP to doctor them so that
<A HREF="mybasket.php">Go to my basket</A>
becomes
<A HREF="mybasket.php?session_id=abcde1234567890abcde1234567890ab">Go to my basket</A>
Obviously, you would not hard-code this into your HTML; rather, you would work out some clever way to infuse your HTML on the fly with such session identifiers.
It would also be necessary for PHP to doctor form targets, so that
<FORM METHOD="POST" ACTION="mybasket.php">
becomes
<FORM METHOD="POST" ACTION="mybasket.php">
<INPUT TYPE="HIDDEN" NAME="session_id" VALUE="abcde1234567890abcde1234567890ab">
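One way to infuse links and forms with the identifier on the fly might look like the sketch below. Both function names are hypothetical, the parameter name session_id follows the example above, and URLs containing a fragment (#...) are not handled.

```php
<?php
// Hypothetical helper: append the session identifier to a URL.
function addSessionToUrl(string $url, string $sessionId): string
{
    // Use '&' if the URL already has a query string, '?' otherwise.
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . 'session_id=' . rawurlencode($sessionId);
}

// Hypothetical helper: the hidden field that carries the identifier in a form.
function sessionHiddenField(string $sessionId): string
{
    return '<INPUT TYPE="HIDDEN" NAME="session_id" VALUE="'
        . htmlspecialchars($sessionId, ENT_QUOTES) . '">';
}

$sessionId = 'abcde1234567890abcde1234567890ab';
$link = addSessionToUrl('mybasket.php', $sessionId);

echo '<A HREF="' . htmlspecialchars($link) . '">Go to my basket</A>', PHP_EOL;
echo sessionHiddenField($sessionId), PHP_EOL;
```

Every link and form in the application would have to pass through helpers like these, which is exactly why a missed URL is so easy to introduce.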
Indeed, there may well be rare cases of URLs being missed and, consequently, the session lost. The really unpleasant part about URL rewriting is that if you lose the session on one request, it's lost forever, so thoroughness is essential. But this is not the only pitfall.
How do you bookmark pages in this way? If you bookmark the page you're on, you'll also record the session identifier, and this won't be valid next time you log in. So, you'll have to manually scrub out the session identifier when you bookmark.
The biggest and most prevalent pitfall comes when people try to copy and paste links to their friends and colleagues. Joe User simply will not think to strip out the session identifier before he e-mails a page on your site to his friends. As a result, when his friend clicks the link, one of two things will happen depending on the level of security you've implemented. Either the friend will gain full access to the user's original login, or the system will freak out at a potential security breach because the user is claiming to be the rightful owner of a valid session without presenting other credentials associated with that session.
There is a better way, though it's not without its share of (largely unwarranted) controversy: using cookies.
Although URL rewriting is theoretically speaking the simplest form of session perpetuation, using cookies is even simpler in terms of the amount of code required.
Cookies are little nuggets of information sent by a Web server along with the HTML output of a page. The Web browser is instructed to record that information and then volunteer it with every subsequent request made to that Web server.
Cookies have, much the same as variables, a name and a value. Some also have a validity (how long the cookie lasts) and a scope (which server or servers should receive it). With each request to the Web server, the user's Web browser offers the name and value of any cookies within the scope of that Web server's domain that have yet to expire. Expired cookies are deleted automatically by the Web browser, but live cookies can also be instantly deleted by the Web server if necessary.
The implementation is simple. As with the previous example, assume that the session identifier is abcde1234567890abcde1234567890ab.
With the first request made by the user's Web browser in a session, this session identifier needs to be pushed to the user's Web browser so that the Web browser knows to offer it on subsequent requests. Accordingly, a cookie is sent to the user's Web browser instructing it to save the value abcde1234567890abcde1234567890ab to an appropriately named identifier.
With each subsequent request made to the Web server, PHP looks for a cookie offering a valid session identifier, with that validity remaining to be checked against some external set of rules and/or a database of valid sessions. If a valid session identifier has been offered as part of the cookies sent by the Web browser, PHP can assume this to be the correct session identifier and proceed with the script as normal. If none is offered, or that which is offered is deemed to be invalid, a new session identifier is generated and in turn sent as a cookie, as in the previous step.
This cycle continues from request to request throughout the remainder of the user's session.
For the name for your cookie, you should stick with something easily identifiable such as session_id. The scope should be restricted to your own Web server (or, in a pinch, domain), and the validity equal to the maximum dwell time you think users will need on your site. For example, if you don't think a typical user will ever use the site for more than half an hour, setting a validity period of 30 minutes isn't a bad idea.
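The name, scope, and validity just described could be expressed as in the following sketch. The function names sessionCookieOptions and issueSessionCookie are hypothetical, the host name passed in is up to you, and the options-array form of setcookie() assumes PHP 7.3 or later.

```php
<?php
// Hypothetical helper: options for a session cookie that stays valid
// for $lifetime seconds (30 minutes by default).
function sessionCookieOptions(string $domain, int $lifetime = 1800, ?int $now = null): array
{
    $now = $now ?? time();
    return [
        'expires' => $now + $lifetime, // validity
        'path'    => '/',
        'domain'  => $domain,          // scope: your own host, not a broad parent domain
    ];
}

// Sends the cookie to the browser. This must run before any page output,
// because cookies travel in the HTTP response headers.
function issueSessionCookie(string $sessionId, string $domain): bool
{
    return setcookie('session_id', $sessionId, sessionCookieOptions($domain));
}
```

On later requests the value offered by the browser would be read from $_COOKIE['session_id'] and checked against your own record of valid sessions.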
The media have generated much controversy about cookies. The bulk of that controversy has arisen from ignorance on the part of the media, some instances of extraordinarily poor site design, and perhaps some commercially unscrupulous site operators.
Some potential pitfalls do exist, as the next section explains.
Is a session identifier really secure in itself? There are certainly a few risks associated with using session identifiers as the sole means of identifying logged-in users on your site but, mercifully, there are countermeasures you can take to help minimize those risks.
If a malicious visitor to your site happens to fashion a valid session identifier, he or she can hijack the session of another visitor. After all, if user A is making request after request using session identifier X, and then malicious user B comes along and uses session identifier X to make a request as well, what is your Web site to think? It will assume that user B is user A, and user B will have access to everything user A has access to, potentially exposing sensitive information.
For this scenario to be realized, the potential hacker needs to guess at a valid session identifier. How feasible is doing so in a real-world situation?
Consider your 32-character hexadecimal string. This could be generated in any number of ways, but assume that it is largely random. With 32 characters, and 16 possible values for each character (a thru f, 0 thru 9), you have 16^32 (about 3.4 x 10^38) possible combinations for your session identifier. It would take an awfully long time for a potential hacker to cycle through all possible session identifiers: 00000000000000000000000000000000, 00000000000000000000000000000001, 00000000000000000000000000000002, and so on, all the way up to ffffffffffffffffffffffffffffffff.
But what if the identifier were an MD5 hash of a 10-digit number? If the hacker actually knew this fact, there would only be 10,000,000,000 combinations to cycle through, given that the MD5 hash of any given number is always the same. It would be a very small matter to knock up a script (perhaps even in PHP) to try each session identifier on a loop until it struck one that actually worked.
Consider what would happen if the user requested an "update my details" page (mydetails.php). This page behaves in one of two ways, depending on whether a valid session was issued with the request:
If a valid session is issued and the session is known to be logged in, the page displays the user's details and allows the user to reset his or her password, among other things.
If a valid session is not issued, the page issues a 302 redirect to a login page.
The mischievous script would have to iterate through these 10 billion combinations, making an HTTP request to mydetails.php with each one. If it gets a 302 redirect, it ignores it and moves on; if it gets the contents of the "my details" page, it resets the user's password to 12345 by submitting the form on the page, makes a note of the username, and moves on to the next session.
Leave this running for a few weeks, and it can absolutely be guaranteed that the script would have chanced upon a few valid sessions during its run. It may even have alerted the hacker via e-mail or an SMS text message when it struck a user account whose password it managed to reset. Keep in mind that with a fast connection (and the ability to run multiple instances of this script across multiple servers), a really determined hacker can get through many thousands of session identifiers every second.
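The arithmetic behind that "few weeks" can be checked with a short calculation. The rate of 5,000 attempts per second is an assumption chosen to stand in for "many thousands"; the function name daysToExhaust is hypothetical.

```php
<?php
// Days needed to try every identifier in a keyspace at a given request rate.
function daysToExhaust(float $keyspace, float $requestsPerSecond): float
{
    return $keyspace / $requestsPerSecond / 86400; // 86,400 seconds per day
}

$rate = 5000.0; // assumed attack rate, in attempts per second

$hashedTenDigit = daysToExhaust(1e10, $rate);       // MD5 of a 10-digit number
$randomHex32    = daysToExhaust(16.0 ** 32, $rate); // 32 random hex characters

printf("10-digit source: %.1f days\n", $hashedTenDigit);
printf("random 32 hex:   %.2e days\n", $randomHex32);
```

At the assumed rate, the 10-billion keyspace is exhausted in roughly 23 days, whereas the full 16^32 keyspace is far beyond the reach of any such loop. The hashing step adds nothing if the input it hashes is easy to enumerate.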
If the hacker didn't know how your session identifiers were formed, you would be fairly safe. Using a random number between two limits is a pretty bad idea, however, even if the hacker didn't know what those limits were. It would be a simple matter to write a script to bombard your site with cookie-less requests, each of which would generate a new session; soon enough, the laws of probability would dictate that your hacker would be furnished with a complete database of all feasible MD5-hashed session identifiers (all 10 billion, in the previous example). The hacker wouldn't be able to reverse-engineer these to work out how they were formed, but doing so isn't necessary. The hacker can simply feed the list of pre-hashed identifiers into his or her brute-force script.
The only way to prevent such analysis is to make sure that sessions are single use. This means that the same session identifier will never be generated twice, or at least not within a considerable time span of its first generation. A good way to do this is to construct your session identifier based on a combination of a random number and the current timestamp. That way, any script constructed to fish for a complete database of session identifiers will be practically useless because its database of valid session identifiers will (theoretically) grow to near-infinity.
One simple way to completely obliterate session guessing is to send as a second cookie a supplemental key alongside the session identifier. This key would be completely random and generated at the time the session is formed. The key associated with that session would be stored in the database.
With each request, both the session identifier cookie and the session key cookie would be transmitted. Even if the session offered is valid, PHP would still check that the key transmitted matches the key originally issued and, if not, immediately cancel and invalidate the session. By canceling the session offered, any attempt to force further possible combinations of key against that session would be pointless because the first wrong key issued would invalidate that session anyway. The only downside of this approach is that your legitimately logged-in user would be logged out as the session was cancelled, but it's better to be safe than hacked.
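The key check described above could be sketched as follows. An in-memory array stands in for the database of sessions, and the function name checkSessionKey and the sample key are hypothetical; hash_equals() assumes PHP 5.6 or later.

```php
<?php
// In-memory stand-in for the sessions table: session ID => key issued with it.
$sessions = ['abcde1234567890abcde1234567890ab' => 'k3y-issued-at-login'];

// Returns true only if the session exists and the offered key matches.
// A wrong key destroys the session, so keys cannot be tried one after another.
function checkSessionKey(array &$sessions, string $sessionId, string $key): bool
{
    if (!isset($sessions[$sessionId])) {
        return false; // unknown or already cancelled session
    }
    // hash_equals() compares in constant time to avoid leaking
    // how much of the key was correct.
    if (!hash_equals($sessions[$sessionId], $key)) {
        unset($sessions[$sessionId]); // invalidate on the first wrong key
        return false;
    }
    return true;
}

$id = 'abcde1234567890abcde1234567890ab';
$first  = checkSessionKey($sessions, $id, 'k3y-issued-at-login'); // correct key
$second = checkSessionKey($sessions, $id, 'a-guess');             // wrong key
$third  = checkSessionKey($sessions, $id, 'k3y-issued-at-login'); // session is gone
var_dump($first, $second, $third);
```

The third call fails even though the key is correct, which is the trade-off noted above: the legitimate user is logged out along with the attacker.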
For true randomness when generating such keys, try to avoid using any kind of randomization whose seed is based solely on the system clock. Mixing in less predictable inputs such as processor temperature, PHP process identifiers, and network interface statistics produces a more satisfactory random number.
Discovery of valid sessions is a less prominent threat but a harder one to protect against. Essentially, the risk is that a malicious third party can somehow gain access to the cookies on your legitimate user's machine and use them to gain access to the user's session.
Unfortunately, the session key methodology discussed previously offers no protection against this technique. If a hacker can gain access to the session cookie, the key cookie is just as easily obtainable.
There are a few scenarios in which this could happen:
A session cookie is mis-issued by your site so that its scope is broader than it needs to be. A malicious Web site within that overly broad scope then visited by the user would be made privy to that session cookie, too. The operators of that site could then use the cookie to hijack your session on the original site. The user does not even necessarily have to visit a malicious site. Literally millions of pieces of HTML spam e-mail every day are sent out containing images, the purpose of which is simply to cause its recipient to make a Web request, the hope being that session cookies from other poorly constructed sites will be offered accidentally.
A physical intrusion to the machine could allow access to the cookies stored on it, from which data on valid sessions could be extracted.
An intrusion to the machine that somehow modifies the HOSTS file could redirect traffic supposedly for your site's hostname to a third-party site. This third-party site may even relay the genuine data from your Web server, such that the user is none the wiser. It would, crucially, collect the session identifier for later use.
A poorly configured network could allow session identifiers to be sniffed in HTTP traffic by malicious systems administrators on-site.
A malicious employee of an ISP that employs proxy servers (transparent or otherwise) could easily enable session identifiers transmitted to be logged.
The first of these scenarios is easy to overcome. When issuing cookies from your site, always get the scope right! This most basic of errors is exploited with such frequency only because of the frequency with which it is made in the first place.
A physical intrusion is almost impossible to guard against. Any physical exploitation is attributable more to lackluster physical security of the location from which the site is being accessed than to anything else. Good session practice (see the following section) can help minimize the risk of this kind of occurrence.
Another major problem is the modification of the HOSTS file of a machine allowing nameservers to be overridden transparently. It is remarkable there are not more e-mail worms in circulation that exploit this, but it is only a matter of time. The HOSTS file exists on almost every variant of Windows imaginable, and unless a tight local administration rights policy has been set up, users can modify it themselves. An e-mail worm (distributed as an attachment, as are most modern worms) exploiting this vulnerability would work as follows:
The user regularly visits www.myfictionalbank.com to use an online checking account management facility. This resolves to 10.123.123.123.
The viral payload of an attachment (which might be .vbs, .scr, .pif or any of the other numerous file extensions that still aren't universally blocked by e-mail clients) creates an entry in the user's HOSTS file pointing to a malicious foreign IP address, which is actually a server owned by the hacker. The IP address is 192.168.123.123 and is probably located somewhere in Eastern Europe.
Next time the user goes to log into his or her online banking account, he or she enters www.myfictionalbank.com and appears to get the account login page as normal. In fact, the connection is being made to the malicious Web server. The malicious server in turn makes a real connection to the real bank's Web server and mirrors every request made by the user. It re-transmits all the data returned by the real bank's Web server. The one exception comes immediately after login: the malicious relay server records the user's session details and then seizes control of the session, denying the real user any further access. In just a few seconds, several thousand dollars of the user's hard-earned cash are on their way to Eastern Europe.
Of course, a scam like this wouldn't last long. The bank would get wind of it pretty quickly and block off the malicious server's IP address (unless the worm actually acted as a relay for other instances of itself on un-firewalled machines and updated some central database of the IP addresses of places where the worm has nested), but it does serve to illustrate just how easily this kind of scam can be pulled off.
On-site exploitation is as big a threat as exploitation from Eastern Europe, particularly in big companies. Many corporate networks may be switched at their most central point, but when floor socket availability starts to get tight, most systems administrators slap in a hub. It's cheap and quick, and the two PCs on that hub can immediately sniff each other's traffic. If a savvy user is on the same hub as a colleague whose session the user wishes to seize, some simple traffic-sniffing software can retrieve those session identifiers.
Finally, keep in mind that many ISPs these days use transparent proxy servers. That is, all HTTP traffic is intercepted and relayed through a standard proxy server without the user's explicit knowledge (and without any kind of configuration required on the user's part). Ostensibly, this is to improve performance by caching commonly requested pages and images at the ISP so that they are only one hop away from the user. The more genuine reason is that ISPs are under increasing pressure in the current climate to maintain very detailed logs on the behavior of their customers on the Web. This is not for commercial reasons, but rather for the purposes of handing over this information to law enforcement agencies should they demand it. The potential for abuse of this facility is quite obvious. If cookies are logged, seizing them is a simple matter for a bored systems administrator.
As we have seen, physically obtaining the session identifier (and session key if available) is very difficult to guard against. The most effective barriers to such exploitation are physical, political, or economic in nature. They certainly have nothing to do with PHP.
But there are a few neat things you can do in your code should you wish to minimize the risk of these theoretical exploits being put into practice.
First and foremost, use session timeouts. These are not the same as session expiration times. When issuing a user with a cookie, you will almost certainly give it an expiration time, but this does not guard against the user's walking away from his or her computer and forgetting to log out. Using session timeouts involves recording the timestamp of each request against the session and then, with each subsequent request made against that session, measuring the time elapsed since the last request. If it exceeds five or ten minutes, you would be strongly advised to revoke the session and request the user to log in again.
This is a remarkably effective measure. Many of the attacks described earlier rely on being able to make use of a session perhaps only some moments after the user has finished with it. By reducing the window of opportunity to just a few minutes, you reduce the effectiveness of those attacks.
The implementation of session timeouts has nothing to do with cookies; rather, the code you implement to handle sessions has responsibility for measuring the time elapsed between requests and determining when to revoke a session due to a timeout. In the next section of this chapter, you'll find out about the UserSession class, which can do just that.
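Independently of any particular class, the timeout logic itself is small. In the sketch below, the session is represented as a plain array, the field name last_request_at and both function names are hypothetical, and the default timeout of 600 seconds reflects the ten-minute figure mentioned above.

```php
<?php
// Hypothetical helper: true if the last request was recent enough.
function sessionIsFresh(int $lastRequestAt, int $now, int $timeout = 600): bool
{
    return ($now - $lastRequestAt) <= $timeout;
}

// Called once per request: revoke a stale session, otherwise record
// the time of this request for the next comparison.
function touchSession(array &$session, int $now, int $timeout = 600): bool
{
    if (!sessionIsFresh($session['last_request_at'], $now, $timeout)) {
        $session = []; // revoke: the user must log in again
        return false;
    }
    $session['last_request_at'] = $now;
    return true;
}
```

In a real application $now would come from time(), and the session data would be loaded from and saved to server-side storage rather than held in a local array.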
Having a low expiration time in your session cookie does make sense. Users don't mind being asked to log in again occasionally, so setting an expiration time of an hour or so is fine. If you do this, be sure to build the rest of your application so that the interruption to the user is minimal. For example, if you are building a Webmail application, don't instruct your code to arbitrarily end a user's session if he or she is in the middle of sending out an e-mail. Try to design sympathetically.
As mentioned previously, you will see how to effectively implement a low cookie expiration time in the next section of this chapter.
One simple check you can make involves recording the user's Web browser User Agent against the session. This is the string issued by the Web browser identifying the manufacturer, browser name, version, and platform. It is by no means unique to that computer, but there are so many different browsers and browser versions in use that there is a strong chance that any two computers produce slightly different User Agent strings. By ensuring that subsequent requests against a session carry the same User Agent as the original request, you provide an additional line of defense against session hijacking. Interestingly, the user agent string that Internet Explorer produces can be modified in the Windows registry. System administrators can take advantage of this fact by making their network's workstations have organization-specific user agent strings; this further strengthens the usefulness of this mechanism of security.
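A User Agent check of this kind might be sketched as follows. The function name userAgentMatches and the session field user_agent are hypothetical; the request data is passed in as an array so that the function can be exercised outside a Web request, and the ?? operator assumes PHP 7 or later.

```php
<?php
// Hypothetical check: does this request carry the same User Agent
// that was recorded when the session was created?
function userAgentMatches(array $session, array $server): bool
{
    $current = $server['HTTP_USER_AGENT'] ?? '';
    return isset($session['user_agent'])
        && hash_equals($session['user_agent'], $current);
}

$session = ['user_agent' => 'CERN-LineMode/2.15 libwww/2.17b3'];

$same      = userAgentMatches($session, ['HTTP_USER_AGENT' => 'CERN-LineMode/2.15 libwww/2.17b3']);
$different = userAgentMatches($session, ['HTTP_USER_AGENT' => 'SomeOtherBrowser/1.0']);
var_dump($same, $different);
```

In a live script you would pass $_SERVER as the second argument. Bear in mind that the User Agent header is supplied by the client and can be copied, so this is an extra hurdle rather than proof of identity.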
If you're about to let the user place an order, or view and update his or her details, don't be afraid to ask for the username and password again. That way, should a hacker succeed in taking over a session, the damage that can be done will be limited to some degree. Many large commercial sites employ this practice, and the user will not be too irritated at this request if used judiciously.
If you're feeling particularly brave, you could construct some algorithms to watch for unusual traffic against a particular session in order to determine multiple usage. How you define unusual traffic is open to debate, but you could, for example, raise an alarm if more than one request is made in any one-second period and cancel the session as a result. Alternatively, you could map the physical navigation of your site to likely user paths and ensure that the requests made by the user's session are following one of those paths. For example, on an e-commerce site, should the user request the index of a particular product category with one request, it is exceedingly unlikely that he or she would then immediately select the view product page for a product that falls outside that category. To do that he or she would have to have accessed it from a bookmark or third-party link, or have typed the URL in. However, if your algorithms are not rock solid, the false-positive rate could easily start to disgruntle your users.
Although it is not legitimate to determine the validity of a session based on the consistency of the IP address, it is still possible to detect some possible break-in attempts by keeping an eye on its variance. Load balancing proxy servers are likely to change IP address with each request, but by how much? It is incredibly unlikely that anything but the last two octets would change. Certainly, the request would most definitely remain within the same net block (that is, the owner of the block would be consistent from request to request). An excellent way to check this is to consult RIPE, ARIN, and so on in real time.
In your application, you would simply devise a low-overhead means of checking the net block owner, utilizing some kind of caching technology if possible. And if requests of the same session seem to be coming from different owners, it's worth raising the alarm and destroying the session.
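A real net block owner lookup involves querying a registry, which is beyond a short example. As a much cruder stand-in, the following sketch applies the "first two octets" observation directly: it treats two IPv4 addresses as related if their first two octets match. The function name sameSlash16 is hypothetical, IPv6 is not handled, and matching octets do not prove common ownership.

```php
<?php
// Simplified stand-in for a net block owner check (IPv4 only):
// two addresses are considered related if their first two octets match.
function sameSlash16(string $ipA, string $ipB): bool
{
    $a = ip2long($ipA);
    $b = ip2long($ipB);
    if ($a === false || $b === false) {
        return false; // not a valid IPv4 address
    }
    // Shifting right by 16 bits discards the last two octets.
    return ($a >> 16) === ($b >> 16);
}

var_dump(sameSlash16('10.123.4.5', '10.123.200.9'));
var_dump(sameSlash16('10.123.4.5', '10.124.4.5'));
```

A mismatch here would be a reason to look more closely at the session, in the same spirit as a change of net block owner.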
One obvious benefit of sessions is the ability to associate the value of relevant variables to a session. These variables could be the contents of a shopping basket or the population of a search form, for example.
Nevertheless, storing this information on the client side is best avoided. For one thing, this can get privacy activists very upset; for another, the value of those variables is open to modification outside the control of your code (either malicious or accidental).
Pumping that much data upstream with each HTTP request can be quite inefficient. Keep in mind that most broadband connections are asymmetric, meaning that they upload slower than they download. So if you are storing 16K worth of session variables on the client side, that will add a second or so to each request. That may not sound like much but it can quickly drive users crazy, so using cookies in this manner is not good practice.
All you need to store on the client side is the session identifier (and supplemental security key, if used). You should store everything else on the server side somewhere, out of harm's reach, and associate those variables with the session identifier in question.
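The association between a session identifier and its server-side variables can be pictured with a very small store. The class name SessionStore and its methods are hypothetical, and the data lives only in memory for the duration of one script run; a real application would keep it in a database or in files so that it survives from request to request.

```php
<?php
// Minimal in-memory stand-in for server-side session storage.
// Variables are grouped under the session identifier they belong to.
class SessionStore
{
    private $data = [];

    public function set(string $sessionId, string $name, $value): void
    {
        $this->data[$sessionId][$name] = $value;
    }

    public function get(string $sessionId, string $name, $default = null)
    {
        return $this->data[$sessionId][$name] ?? $default;
    }
}

$store = new SessionStore();
$store->set('abcde1234567890abcde1234567890ab', 'basket', ['sku-1', 'sku-2']);
print_r($store->get('abcde1234567890abcde1234567890ab', 'basket'));
```

Only the identifier travels to and from the browser; the basket contents never leave the server.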
The lesson of this part of the chapter is that 100 percent security is always just out of reach. However, there are always steps you can take to minimize your own exposure.