Working with the Server | Core Web Application Development with PHP and MySQL

Now that we know how to present a user interface through which the user can send us data and find out about that data within our scripts, we can turn our attention to learning more about the server, including technical details of what the user sent to us.

Server Considerations

One of the nice things about PHP is that it generally shields us from most of the minute worries about one particular server environment versus another. We usually do not spend much time worrying about whether we are running on Linux, FreeBSD, or Microsoft Windows, nor do we notice huge differences between the Apache Software Foundation's httpd and Microsoft's Internet Information Server (IIS).

However, there are a couple of things to which we will pay attention to help us make sure our code is more portable between servers and systems.

Paths and Directories

One of the obvious differences between most Unix systems and Microsoft Windows is in how they manage file paths. While your web site might end up in a directory named /home/httpd/customerwikiweb/www on a Unix server, it might find itself in D:\WebSites\CustomerWiki\DocumentRoot on a Windows machine. This might make it more difficult to piece together paths and directories since you will be writing code to handle forward slashes and backslashes and worrying about the drive letters.

Fortunately, since most of our web applications will focus on databases or content directories that will not be too far away from our executing scripts, we will not have to worry about drive letters often, and most file and directory functions on Windows correctly handle both forward slashes and backslashes. We will thus strive to do the following:

Take care when setting up our web sites and content that span multiple drives on Windows servers.
Add code to learn about our operating environment and make sure the correct code for the appropriate system is executed when multiple drive use is unavoidable.
Avoid using too many full paths in our scripts and opt for relative paths. If we are looking for /images/banner.png, it does not matter if the root of our web application is in /home/httpd/www or /moo/cow/eek.
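
The relative-path advice above can be sketched as follows. This is a minimal illustration rather than a prescribed layout: the helper name app_path and the images/banner.png location are hypothetical. It builds paths from a base directory (by default the directory of the current file) and uses forward slashes throughout, which the file functions accept on both Unix and Windows.

```php
<?php
// Hypothetical helper: build a path relative to a base directory
// (by default, the directory containing this file) instead of
// hard-coding a full path such as /home/httpd/www/...
function app_path($relative, $base = null)
{
    if ($base === null) {
        $base = dirname(__FILE__);
    }

    // Normalize backslashes so the result looks the same on every system.
    $base = rtrim(str_replace('\\', '/', $base), '/');
    $relative = ltrim(str_replace('\\', '/', $relative), '/');

    return $base . '/' . $relative;
}

// Prints the banner path under whatever directory this file lives in.
echo app_path('images/banner.png'), "\n";
```

Because the base directory is worked out at run time, the same script works whether the application root is /home/httpd/www or /moo/cow/eek.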

Server Variables

The key mechanism we will use to learn about our operating environment will be the $_SERVER super-global array. There is also the corresponding $HTTP_SERVER_VARS array, which is available only when register_long_arrays is turned on in php.ini; whether its contents are also set as global variables depends on register_globals. Both settings remain discouraged (and both were removed in PHP 5.4).

There are numerous fields in this array. We will discuss some of the more commonly used and interesting fields.

PHP_SELF

This key in the $_SERVER array tells us the URI of the currently executing script relative to the root of the web site being accessed. For example, if the user asked to see

http://www.cutefluffybunnies.com/scripts/showbunnies.php

a request to see $_SERVER["PHP_SELF"] would return /scripts/showbunnies.php. Please note that if we asked for the value of this from within a script that is included in another script, the outermost executing script (the one that performed the inclusion) would be the value returned.
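
The note about included scripts can be illustrated with a small sketch. The helper name is_included_file is hypothetical, and the check is only a heuristic: it compares base names, so two files with the same name in different directories would fool it. It relies on the fact that __FILE__ always names the file in which it appears, whereas PHP_SELF names the outermost script.

```php
<?php
// Hypothetical helper: guess whether $file (normally __FILE__) was
// pulled in by another script rather than requested directly.
// $server is passed in (normally $_SERVER) so the function is easy to test.
function is_included_file($server, $file)
{
    // PHP_SELF names the outermost script, even inside an included file.
    $requested = basename($server['PHP_SELF']);

    // Normalize backslashes so Windows paths are handled on any system.
    $current = basename(str_replace('\\', '/', $file));

    return $requested !== $current;
}
```

A script would typically call this as is_included_file($_SERVER, __FILE__).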

SERVER_NAME

This is the name of the server to which the request was sent. It is not prefixed with the http://, but rather the name of the server, such as www.cutefluffybunnies.com. This returns the name of the requested virtual server when the current web server is serving up the content for more than one web site. (Most modern web servers support this feature.)
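
As a rough sketch of how SERVER_NAME and PHP_SELF fit together, the following hypothetical helper (current_url is our own name, not a PHP function) rebuilds the address of the executing script. It assumes the standard port is in use and treats the request as HTTPS only when the HTTPS field is set to something other than an empty string or "off"; a production version would need to consider ports and query strings as well.

```php
<?php
// Hypothetical helper: rebuild the URL of the current script from
// the server variables. $server is normally $_SERVER.
function current_url($server)
{
    // Assumption: HTTPS is non-empty (and not "off") for secure requests.
    $is_https = isset($server['HTTPS'])
        && $server['HTTPS'] !== ''
        && strtolower($server['HTTPS']) !== 'off';

    $scheme = $is_https ? 'https' : 'http';

    // SERVER_NAME carries no scheme prefix, so we add one here.
    return $scheme . '://' . $server['SERVER_NAME'] . $server['PHP_SELF'];
}
```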

SERVER_SOFTWARE

This value tells you what software the server is running. It is rarely useful for anything other than statistics or information, but there might be situations when we want to query a particular web server and need to know whether we are running on it before we do so. (We will see more about querying specific servers later.) The values for the primary servers on which we are running test scripts are

 Microsoft-IIS/5.1

and

 Apache/1.3.33 (Unix) PHP/5.0.4 mod_ssl/2.8.22 OpenSSL/0.9.7f

While there are few situations when we will care about the server on which we are running, we can test the value in a manner similar to the following:

<?php
  if (strcmp(substr($_SERVER['SERVER_SOFTWARE'], 0, 6),
             'Apache') == 0)
  {
    // call some apache-specific function
  }
?>

SERVER_PROTOCOL

This value tells us which protocol the client used to request this page. The value will almost always be "HTTP/1.1", though it is possible that some clients will send us an older version (such as HTTP/1.0), implying that some functionality will not be available or understood. We will learn more about HTTP in Chapter 13.

REQUEST_METHOD

This is the data submission method used by the HTTP request. In addition to the GET and POST methods, this value could alternately contain PUT or HEAD (which we will rarely use). Although we can use this to learn whether a form was sent to us with GET or POST, we will generally know how our scripts are interacting and not query this.
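
For the occasional script that does need to check, a minimal sketch might look like the following. The function name submitted_fields is hypothetical; it simply returns whichever array matches the request method and treats everything else (HEAD, PUT, and so on) as carrying no form data.

```php
<?php
// Hypothetical helper: return the form fields that match the request
// method. Normally called as submitted_fields($_SERVER, $_GET, $_POST).
function submitted_fields($server, $get, $post)
{
    $method = isset($server['REQUEST_METHOD'])
        ? strtoupper($server['REQUEST_METHOD'])
        : '';

    if ($method === 'POST') {
        return $post;
    }
    if ($method === 'GET') {
        return $get;
    }

    // HEAD, PUT, and anything else: no form data expected here.
    return array();
}
```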

REQUEST_TIME

This variable is not available under all servers, but for those that support it, it serves as a way to learn when a request was received by the server. For those who really need this information and are on a server where it is not provided, the date and time functions are a reasonable compromise. You can learn more about these functions in the PHP Online Manual.
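
The fallback described above can be sketched in a few lines. The function name request_time is our own; when the field is missing, it uses the current time from time(), which is only an approximation since it reflects when the script ran rather than when the request arrived.

```php
<?php
// Hypothetical helper: return the request timestamp, falling back to
// the current time on servers that do not provide REQUEST_TIME.
// $server is normally $_SERVER.
function request_time($server)
{
    if (isset($server['REQUEST_TIME'])) {
        return (int) $server['REQUEST_TIME'];
    }

    // Approximation: the moment this code runs, not the moment of arrival.
    return time();
}

// Format the timestamp with the standard date function.
echo date('Y-m-d H:i:s', request_time($_SERVER)), "\n";
```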

DOCUMENT_ROOT

To find the directory that serves as the root of the web site we are executing in, we can query the DOCUMENT_ROOT field. (This is not available on all servers.) Fortunately, even for servers where the field is not available, there is another field called ORIG_PATH_TRANSLATED that provides the full disk path to the currently executing script. That value, minus the value of PHP_SELF at the end, gives us the same directory, as the following function shows:

<?php
  function get_document_root()
  {
    if (isset($_SERVER['DOCUMENT_ROOT']))
    {
      $doc_root = $_SERVER['DOCUMENT_ROOT'];
    }
    else
    {
      // get the information we DO have
      $script = $_SERVER['PHP_SELF'];
      $full_path = $_SERVER['ORIG_PATH_TRANSLATED'];

      // on Windows machines, which will have backslashes
      // these two lines replace all \ chars with /
      $fp_parts = split('\\\\', $full_path);
      $full_path = implode('/', $fp_parts);

      // now go and extract the portion of the full path that
      // isn't the name of the executing script.
      $script_start = strpos($full_path, $script);
      $doc_root = substr($full_path, 0, $script_start);
    }

    return $doc_root;
  }
?>

This function correctly returns the document root on servers regardless of whether the DOCUMENT_ROOT field is visible in $_SERVER. There are two important things to note in this function:

The implode function, which takes an array ($fp_parts), concatenates its values in the given order, and separates them with a given string ('/'), is not multi-byte enabled. However, this is not a problem because it does no processing on the individual array pieces. Even if they contain multi-byte characters, the implode function simply joins them with the intermediate characters.
The call to split, which asks to split the path wherever the character sequence '\\\\' occurs, looks a bit strange. Oddly enough, these four backslashes are what is required to split the given string wherever there is a single backslash in it! Within a single-quoted PHP string, a backslash must itself be escaped (\\) so that PHP does not think it is escaping the next character, such as the closing single quote; the four backslashes in the source therefore produce a string containing two. However, the split function operates on regular expressions. (See Chapter 22, "Data Validation with Regular Expressions," for more detail.) In regular expressions, the backslash character is also used to begin an escape; therefore, we need that pair of backslashes to tell the split function that we just want to split wherever we see a real \ character.

If you are wondering why we do not use the explode function (which takes a string and breaks it apart wherever it sees another character sequence), the answer is simple: it is not multi-byte safe. While most non-Asian computers do not yet use Unicode or other multi-byte character sets for their file systems, their use is increasingly popular, and we would like to be as safe as possible. For those cases where we can be positive that there will be no multi-byte strings, we might consider the explode function since it is faster than split.
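
On current PHP versions the listing above will not run as written, because split() was removed in PHP 7.0. The following is a sketch of one possible adaptation, not the original author's method: it swaps split and implode for str_replace, takes the server array as a parameter (the name document_root_from is hypothetical), and returns null when it cannot work out an answer. Note that str_replace operates on bytes, so the multi-byte caution above applies to it as well; it is safe for UTF-8 paths, where the backslash byte never appears inside a multi-byte character, but not necessarily for other encodings.

```php
<?php
// Hypothetical adaptation of get_document_root() for PHP 7 and later.
// $server is normally $_SERVER.
function document_root_from($server)
{
    if (!empty($server['DOCUMENT_ROOT'])) {
        return $server['DOCUMENT_ROOT'];
    }

    // Without both fallback fields there is nothing to work with.
    if (!isset($server['PHP_SELF'], $server['ORIG_PATH_TRANSLATED'])) {
        return null;
    }

    $script = $server['PHP_SELF'];

    // Replace Windows backslashes with forward slashes (byte-wise).
    $full_path = str_replace('\\', '/', $server['ORIG_PATH_TRANSLATED']);

    // Only strip the script path if the full path really ends with it.
    $tail_start = strlen($full_path) - strlen($script);
    if ($tail_start < 0 || substr($full_path, $tail_start) !== $script) {
        return null;
    }

    return substr($full_path, 0, $tail_start);
}
```

Checking that the full path actually ends with the script path avoids a wrong answer when the script path happens to appear earlier in the directory name.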

HTTP_USER_AGENT

You can look at this field to see through which agent (browser, program, and so on) the client has made the request for the page. Mozilla.org's Firefox browser on Windows reports a value such as the following:

 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

Some web applications insist on being sure that the connecting program is a valid web browser to prevent 'bots' (automated programs that crawl the web without requiring user interaction) from accessing their content. However, this is only a marginally effective tactic since the user agent is an easily forged field in the HTTP request: the client can send any value it likes.

We will take this opportunity to strongly discourage web application authors from using this field to require users to visit their site with a particular browser. This is likely to annoy prospective customers or users of your site and will not save you significant amounts of work.

REMOTE_ADDR

If you want to know (and perhaps even log in your database) from which IP address the client is connecting, this is the field to query. Although not foolproof since advanced users can modify ("spoof") this on incoming packets, it can still be a useful tool for identifying people for applications, such as public forums or discussion areas.

It should be noted that individual requests from the same user in the same 'session' can in fact come from different IP addresses. Depending on the Internet Service Provider through which the user is connecting to the Internet, data might be routed by multiple machines in a short span of time.
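
If we do log the address, it is worth checking that the value at least looks like an IP address before storing it. The following is a minimal sketch under that assumption; the function name client_ip_for_log and the 'unknown' placeholder are hypothetical, while filter_var with FILTER_VALIDATE_IP is a standard PHP function that accepts both IPv4 and IPv6 addresses.

```php
<?php
// Hypothetical helper: return the client address in a form that is
// safe to log, or 'unknown' if it is missing or malformed.
// $server is normally $_SERVER.
function client_ip_for_log($server)
{
    $ip = isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : '';

    // filter_var returns false when the string is not a valid address.
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        return 'unknown';
    }

    return $ip;
}
```

As the text notes, the result identifies a connection rather than a person, so it should be treated as a hint and not as proof of identity.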

Others

There are a number of other interesting fields in the $_SERVER array, and you are encouraged to spend time perusing and experimenting with them. A simple script to dump and view all of them would look something like the code in Listing 7-3 (you can also call phpinfo to see these and other values). Try it on different servers (IIS vs. Apache, Windows vs. Unix, and so on) and see how the results differ. Some output for one of our test servers is shown in Figure 7-5.

Listing 7-3. Viewing Information About Your Server

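<html>
<body>
<table width='100%' border='1'>
<?php
  foreach ($_SERVER as $key => $value)
  {
    // escape keys and values so that they display as plain text;
    // print_r also copes with array values such as argv
    $key = htmlspecialchars($key);
    $value = htmlspecialchars(print_r($value, true));

    echo <<<EOT
<tr>
  <td width='25%'>
     <b>$key</b>
  </td>
  <td>
     $value
  </td>
</tr>

EOT;
  }
?>
</table>
<br/><br/>
</body>
</html>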

Figure 7-5. Browsing all the $_SERVER variables on our test server.

Environment Variables

In addition to the $_SERVER array, which tells us about the server we are operating on, the $_ENV array lets us access the operating system environment variables for the current (server) process. These are usually more specific to the operating system under which we are running, and tell us things such as what the PATH (the set of directories to search for executables), host name, operating system, or command shells are.

We can trivially modify the code from Listing 7-3 to list the members of $_ENV instead of $_SERVER. You are encouraged to try this and learn more about your operating environment.
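
When trying this, be aware that $_ENV can turn out to be empty: PHP fills it in only when the variables_order setting in php.ini includes the letter E. The sketch below, using a hypothetical helper named env_value, looks in the given array first and then falls back to the standard getenv function, which reads the process environment directly.

```php
<?php
// Hypothetical helper: look up an environment variable, first in the
// given array (normally $_ENV) and then through getenv().
// Returns null when the variable is not set anywhere.
function env_value($name, $env)
{
    if (isset($env[$name])) {
        return $env[$name];
    }

    // getenv returns false when the variable does not exist.
    $value = getenv($name);

    return ($value === false) ? null : $value;
}
```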

As an example, if we were slowly migrating a web application from a Microsoft Windows Server to a Unix Apache-based server and wanted to have some code to let it look for a configuration file in a number of locations, we might write the following code:

<?php
  // don't need to use mbcs-safe functions for this
  if (isset($_ENV['OS'])
      and (strcmp($_ENV['OS'], 'Windows_NT') === 0))
  {
    $schema_path = 'n:/webserver/schemas/config.xml';
  }
  else
  {
    $schema_path = '/home/httpd/schemas/config.xml';
  }
?>
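
An alternative that avoids testing the operating system at all is to try each candidate location in turn and use the first one that exists. This is only a sketch of that idea: the function name first_existing_file is hypothetical, and the two paths in the usage note are the ones from the example above.

```php
<?php
// Hypothetical helper: return the first path in $candidates that
// names an existing file, or null if none of them do.
function first_existing_file($candidates)
{
    foreach ($candidates as $path) {
        // is_file is true only for regular files, not directories.
        if (is_file($path)) {
            return $path;
        }
    }

    return null;
}
```

A call such as first_existing_file(array('n:/webserver/schemas/config.xml', '/home/httpd/schemas/config.xml')) would then return whichever configuration file is present on the current server, or null if neither is.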