Section 2.1. Forms and Data | Essential PHP Security

2.1. Forms and Data

When developing a typical PHP application, the bulk of your logic involves data processing: tasks such as determining whether a user has logged in successfully, adding items to a shopping cart, and processing a credit card transaction.

Data can come from numerous sources, and as a security-conscious developer, you want to be able to easily and reliably distinguish between two distinct types of data:

Filtered data
Tainted data

Anything that you create yourself is trustworthy and can be considered filtered. An example of data that you create yourself is anything hardcoded, such as the email address in the following example:

     $email = 'chris@example.org';

This email address, chris@example.org, does not come from any remote source. This obvious observation is what makes it trustworthy. Any data that originates from a remote source is input, and all input is tainted, which is why it must always be filtered before you use it.

Tainted data is anything that is not guaranteed to be valid, such as form data submitted by the user, email retrieved from an IMAP server, or an XML document sent from another web application. In the previous example, $email is a variable that contains filtered data. The data is the important part, not the variable. A variable is just a container for the data, and it can always be overwritten later in the script with tainted data:

     $email = $_POST['email'];

Of course, this is why $email is called a variable. If you don't want the data to change, use a constant instead:

     define('EMAIL', 'chris@example.org');

When defined with the syntax shown here, EMAIL is a constant whose value is chris@example.org for the duration of the script, even if you attempt to assign it another value (perhaps by accident). For example, the following code outputs chris@example.org (the attempt to redefine EMAIL also generates a notice, or a warning as of PHP 8):

     <?php

     define('EMAIL', 'chris@example.org');
     define('EMAIL', 'rasmus@example.org');

     echo EMAIL;

     ?>

For more information about constants, visit http://php.net/constants.

As discussed in Chapter 1, register_globals can make it more difficult to determine the origin of the data in a variable such as $email. Any data that originates from a remote source must be considered tainted until it has been proven valid.
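
As a rough illustration of moving data from tainted to filtered, the following sketch validates an email address before storing it. The filter_email() helper, the $clean array, and the $input array (standing in for $_POST) are names I've assumed for this example, and filter_var() with FILTER_VALIDATE_EMAIL is only one possible filtering rule:

```php
<?php

// Hypothetical helper: returns the address if it passes validation,
// or null if the tainted value cannot be proven valid.
function filter_email($tainted)
{
    // Form fields can arrive as arrays, so reject anything that is not a string.
    if (!is_string($tainted)) {
        return null;
    }

    $result = filter_var($tainted, FILTER_VALIDATE_EMAIL);

    return ($result === false) ? null : $result;
}

// Assumed convention: only data that has passed a filter is stored here.
$clean = array();

// Stand-in for $_POST so that the sketch runs outside of a web request.
$input = array('email' => 'chris@example.org');

$email = filter_email(isset($input['email']) ? $input['email'] : null);

if ($email !== null) {
    $clean['email'] = $email;
}
```

With this approach, the rest of the script reads only from $clean, so a value that failed validation never gets mistaken for filtered data.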

Although a user can send data in multiple ways, most applications take the most important actions as the result of a form submission. In addition, because an attacker can do harm only by manipulating anticipated data (data that your application does something with), forms provide a convenient opening: a blueprint of your application that indicates what data you plan to use. This is why form processing is one of the primary concerns of the web application security discipline.

A user can send data to your application in three predominant ways:

In the URL (e.g., GET data)
In the content of a request (e.g., POST data)
In an HTTP header (e.g., Cookie)
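
In PHP, these three channels typically surface as the $_GET, $_POST, and $_COOKIE superglobal arrays. The following sketch gathers them in one place to emphasize that all three are equally tainted; the collect_tainted_input() helper and its array keys are hypothetical names chosen for illustration:

```php
<?php

// Hypothetical helper: groups raw input by the channel it arrived through.
// Nothing here is filtered, so every value it returns is still tainted.
function collect_tainted_input(array $get, array $post, array $cookie)
{
    return array(
        'url'     => $get,    // data from the query string of the URL
        'content' => $post,   // data from the content of the request
        'header'  => $cookie, // data from the Cookie request header
    );
}

$tainted = collect_tainted_input($_GET, $_POST, $_COOKIE);
```

Other request headers are generally available through $_SERVER under keys that begin with HTTP_, such as $_SERVER['HTTP_USER_AGENT'], and they deserve the same suspicion.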

Because HTTP headers are not directly related to form processing, I do not cover them in this chapter. In general, the same skepticism you apply to GET and POST data should be applied to all input, including HTTP headers.

Form data is sent using either the GET or POST request method. When you create an HTML form, you specify the request method in the method attribute of the form tag:

     <form action="http://example.org/register.php" method="GET">

When the GET request method is specified, as this example illustrates, the browser sends the form data as the query string of the URL. For example, consider the following form:

     <form action="http://example.org/login.php" method="GET">
     <p>Username: <input type="text" name="username" /></p>
     <p>Password: <input type="password" name="password" /></p>
     <p><input type="submit" /></p>
     </form>

If I enter the username chris and the password mypass, I arrive at http://example.org/login.php?username=chris&password=mypass after submitting the form. The simplest valid HTTP/1.1 request for this URL is as follows:

     GET /login.php?username=chris&password=mypass HTTP/1.1
     Host: example.org

It's not necessary to use the HTML form to request this URL. In fact, there is no difference between a GET request sent as the result of a user submitting an HTML form and one sent as the result of a user clicking a link.
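
To make this concrete, here is a minimal sketch that assembles the same URL and request by hand, without any form. The variable names are my own, and http_build_query() stands in for the encoding that a browser performs on submission:

```php
<?php

// Field names and values from the login form example.
$fields = array('username' => 'chris', 'password' => 'mypass');

// URL-encode each name and value, and join the pairs with "&".
$query = http_build_query($fields, '', '&');

$url = 'http://example.org/login.php?' . $query;

// A request line, the Host header, and the empty line that ends the headers.
$request = "GET /login.php?{$query} HTTP/1.1\r\n"
         . "Host: example.org\r\n"
         . "\r\n";

// parse_str() reverses the encoding, much as PHP does when it populates $_GET.
parse_str($query, $parsed);
```

Because anyone can build this request, the server has no reliable way to tell whether it came from your form, a link, or a script.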

Keep in mind that when a form uses the GET request method, any query string you include in the action attribute of the form tag is replaced by the form data.

Also, if the specified method is an invalid value, or if method is omitted entirely, the browser defaults to the GET request method.

To illustrate the POST request method, consider the previous example with a simple modification to the method attribute of the form tag that specifies POST instead of GET:

     <form action="http://example.org/login.php" method="POST">
     <p>Username: <input type="text" name="username" /></p>
     <p>Password: <input type="password" name="password" /></p>
     <p><input type="submit" /></p>
     </form>

If I again specify chris as my username and mypass as my password, I arrive at http://example.org/login.php after submitting the form. The form data is in the content of the request rather than in the query string of the requested URL. The simplest valid HTTP/1.1 request that illustrates this is as follows:

     POST /login.php HTTP/1.1
     Host: example.org
     Content-Type: application/x-www-form-urlencoded
     Content-Length: 30

     username=chris&password=mypass
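
The Content-Length value in this request is simply the number of bytes in the encoded form data. The following sketch, which uses variable names of my own choosing, builds the same request by hand and computes that value with strlen():

```php
<?php

// Field names and values from the login form example.
$fields = array('username' => 'chris', 'password' => 'mypass');

// The content of the request uses the same encoding as a GET query string.
$body = http_build_query($fields, '', '&');

// An empty line separates the headers from the content.
$request = "POST /login.php HTTP/1.1\r\n"
         . "Host: example.org\r\n"
         . "Content-Type: application/x-www-form-urlencoded\r\n"
         . "Content-Length: " . strlen($body) . "\r\n"
         . "\r\n"
         . $body;
```

As with GET, nothing about this request requires a browser or your HTML form, so POST data is no more trustworthy than data in the URL.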

You have now seen the predominant ways that a user provides data to your applications. The following sections discuss how attackers can take advantage of your forms and URLs by using these as openings to your applications.