Recipe 19.5. Collecting Web Input

Problem

You want to extract the input parameters that were submitted as part of a form or specified at the end of a URL.

Solution

Use the facilities your API provides for accessing the names and values of the input parameters in a web script's execution environment.

Discussion

Earlier recipes in this chapter discussed how to retrieve information from MySQL and use it to generate various forms of output, such as static text, lists, hyperlinks, or form elements. In this recipe, we discuss the opposite problem: how to collect input from the Web. Applications for such input are many. For example, you can use the techniques shown here to extract the contents of a form submitted by a user. You might interpret the information as a set of search keywords, and then run a query against a product catalog to show the matching items to a customer. In this case, you use the Web to collect information from which you can determine the client's interests. From that you construct an appropriate search statement and display the results. If a form represents a survey, a mailing list sign-up sheet, or a poll, you might just store the values, using the data to create a new database record (or perhaps to update an existing record).

A script that receives input over the Web and uses it to interact with MySQL generally processes the information in a series of stages:

Extract the input from the execution environment. When a request arrives that contains input parameters, the web server places the input into the environment of the script that handles the request, and the script queries its environment to obtain the parameters. It may be necessary to decode special characters in the parameters to recover the actual values submitted by the client, if the extraction mechanism provided by your API doesn't do it for you. (For example, you might need to convert %20 to space.)
Validate the input to make sure that it's legal. You cannot trust users to send legal values, so it's a good idea to check input parameters to make sure they look reasonable. For example, if you expect a user to enter a number into a field, you should check the value to be sure that it's really numeric. If a form contains a pop-up menu that was constructed using the allowable values of an ENUM column, you might expect the value that you actually get back to be one of these values. But there's no way to be sure except to check. Remember, you don't even know that there is a real user on the other end of the network connection. It might be a malicious script roving your web site, trying to hack into your site by exploiting weaknesses in your form-processing code.
If you don't check your input, you run the risk of entering garbage into your database or corrupting existing content. It is true that you can prevent entry of values that are invalid for the data types in your table columns by enabling strict SQL mode. However, there might be additional semantic constraints on what your application considers legal, in which case it is still useful to check values in your script before attempting to enter them. Also, by checking in your script, you may be able to present more meaningful error messages to users about problems in the input than the messages returned by the MySQL server when it detects bad data. For these reasons, it might be best to consider strict SQL mode a valuable additional level of protection, but one that is not necessarily sufficient in itself. That is, you can combine strict mode on the server side with validation in your script, which is the client from the MySQL server's point of view. See Section 10.20 for additional information about setting the SQL mode for strict input value checking.
Construct a statement based on the input. Typically, input parameters are used to add a record to a database, or to retrieve information from the database for display to the client. Either way, you use the input to construct a statement and send it to the MySQL server. Statement construction based on user input should be done with care, using proper escaping to avoid creating malformed or dangerous SQL statements. Use of placeholders is a good idea here.
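Here is a minimal Python sketch of how the three stages fit together for a request like the price_quote.php URL shown later in this recipe. It is not the book's code. The catalog table and its item and price columns are hypothetical, and the %s placeholder style assumes a DB-API driver such as MySQL Connector/Python. The function does not execute the statement; it returns it with its parameters so that each stage stays visible.

```python
import re
from urllib.parse import parse_qs


def build_quote_statement(query_string):
    """Run the three stages for a price-quote request.

    Returns (sql, params, quantity). The catalog table is hypothetical.
    """
    # Stage 1: extract. parse_qs() also decodes %XX escapes and "+".
    fields = parse_qs(query_string)
    item = fields.get("item", [""])[0]
    quantity = fields.get("quantity", [""])[0]

    # Stage 2: validate. Never trust the client to send legal values.
    if not item:
        raise ValueError("item parameter is missing")
    if not re.fullmatch(r"[0-9]+", quantity):
        raise ValueError("quantity must be a nonnegative integer")

    # Stage 3: construct the statement with a placeholder rather than
    # pasting the item value into the SQL text.
    sql = "SELECT price FROM catalog WHERE item = %s"
    return sql, (item,), int(quantity)
```

A cursor's execute(sql, params) call would then send the statement, and the driver would take care of quoting the item value.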

The rest of this recipe explores the first of these three stages of input processing. Sections 19.6 and 19.7 cover the second and third stages. The first stage (pulling input from the execution environment) has little to do with MySQL, but is covered here because it's how you obtain the information that is used in the later stages.

Input obtained over the Web can be received in several ways, two of which are most common:

As part of a get request, in which case input parameters are appended to the end of the URL. For example, the following URL invokes a PHP script named price_quote.php and specifies item and quantity parameters with values D-0214 and 60:
```
http://localhost/mcb/price_quote.php?item=D-0214&quantity=60 
```
Such requests commonly are received when a user selects a hyperlink or submits a form that specifies method="get" in the <form> tag. A parameter list in a URL begins with ? and consists of name=value pairs separated by ; or & characters. (It's also possible to place information in the middle of a URL, but this book doesn't cover that.)
As part of a post request, such as a form submission that specifies method="post" in the <form> tag. The contents of a form for a post request are sent as input parameters in the body of the request, rather than at the end of the URL.
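The following Python sketch shows what a receiving script ultimately works with. It uses the standard urllib.parse module to split off the parameter list of the price_quote.php URL above and decode it into a mapping from names to lists of values. The values are lists because a name can appear more than once. The function name url_params() is illustrative only; each API discussed below performs this step for you.

```python
from urllib.parse import parse_qs, urlsplit


def url_params(url):
    """Map each parameter name in url's query string to a list of values."""
    query = urlsplit(url).query  # the part after "?", without the "?"
    # keep_blank_values=True keeps parameters such as "note=" as [""]
    return parse_qs(query, keep_blank_values=True)


print(url_params("http://localhost/mcb/price_quote.php?item=D-0214&quantity=60"))
# prints {'item': ['D-0214'], 'quantity': ['60']}
```

In current Python releases, parse_qs() splits only on &, so a URL that uses ; separators would need different handling.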

You may also have occasion to process other types of input, such as uploaded files. Those are sent using post requests, but as part of a special kind of form element. Section 19.8 discusses file uploading.

When you gather input for a web script, you should consider how the input was sent. (Some APIs distinguish between input sent via get and post; others do not.) However, after you have pulled out the information that was sent, the request method doesn't matter. The validation and statement construction stages do not need to know whether parameters were sent using get or post.
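One way to make the request method irrelevant is to merge both sources into a single structure right after extraction. The following Python sketch assumes a WSGI environment (the environ dictionary defined by PEP 3333) and a UTF-8 body. It handles only application/x-www-form-urlencoded bodies, not file uploads, and request_params() is a hypothetical name. Query-string values are placed ahead of body values, so code that takes the first value of a name prefers the URL.

```python
import io
from urllib.parse import parse_qs


def request_params(environ):
    """Combine URL (get) and urlencoded body (post) parameters."""
    params = parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)
    ctype = environ.get("CONTENT_TYPE", "")
    if environ.get("REQUEST_METHOD") == "POST" and ctype.startswith(
        "application/x-www-form-urlencoded"
    ):
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length).decode("utf-8")
        # Append body values after any URL values with the same name
        for name, values in parse_qs(body, keep_blank_values=True).items():
            params.setdefault(name, []).extend(values)
    return params


body = b"name=Bob&color=red"
environ = {
    "REQUEST_METHOD": "POST",
    "QUERY_STRING": "id=3",
    "CONTENT_TYPE": "application/x-www-form-urlencoded",
    "CONTENT_LENGTH": str(len(body)),
    "wsgi.input": io.BytesIO(body),
}
print(request_params(environ))
# prints {'id': ['3'], 'name': ['Bob'], 'color': ['red']}
```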

The recipes distribution includes some scripts in the apache/params directory (tomcat/mcb for JSP) that process input parameters. Each script enables you to submit get or post requests, and shows how to extract and display the parameter values thus submitted. Examine these scripts to see how the parameter extraction methods for the various APIs are used. Utility routines invoked by the scripts can be found in the library modules in the lib directory of the distribution.

Web input extraction conventions

To obtain input parameters passed to a script, you should familiarize yourself with your API's conventions so that you know what it does for you, and what you must do yourself. For example, you should know the answers to these questions:

How do you determine which parameters are available?
How do you pull a parameter value from the environment?
Are values thus obtained the actual values submitted by the client, or do you need to decode them further?
How are multiple-valued parameters handled (for example, when several items in a checkbox group are selected)?
For parameters submitted in a URL, which separator character does the API expect between parameters? This may be & for some APIs and ; for others. ; is preferable as a parameter separator because it's not special in HTML like & is, but many browsers or other user agents separate parameters using &. If you construct a URL within a script that includes parameters at the end, be sure to use a parameter-separator character that the receiving script will understand.
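Each of the questions above is a concrete decision the API makes for you. The hypothetical Python parser below is not part of any API discussed here; it shows one set of answers. It accepts both & and ; as separators, decodes %XX escapes and + characters, and collects repeated names into lists so that multiple-valued parameters are not lost.

```python
from urllib.parse import unquote_plus


def parse_query(query):
    """Parse "a=1&b=2;b=3" into {"a": ["1"], "b": ["2", "3"]}."""
    params = {}
    # Treat ";" like "&". An encoded separator (%3B or %26) inside a value
    # is not affected, because decoding happens after the split.
    for pair in query.replace(";", "&").split("&"):
        if not pair:
            continue  # skip empty segments such as those from "a=1&&b=2"
        name, _, value = pair.partition("=")
        params.setdefault(unquote_plus(name), []).append(unquote_plus(value))
    return params
```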

Perl. The Perl CGI.pm module makes input parameters available to scripts through the param() function. param() provides access to input submitted via either get or post, which simplifies your task as the script writer. You don't need to know which method a form used for submitting parameters. You don't need to perform any decoding, either; param() handles that as well.

To obtain a list of names of all available parameters, call param() with no arguments:

```perl
@param_names = param ();
```

To obtain the value of a specific parameter, pass its name to param(). In scalar context, param() returns the parameter value if it is single-valued, the first value if it is multiple-valued, or undef if the parameter is not available. In array context, param() returns a list containing all the parameter's values, or an empty list if the parameter is not available:

```perl
$id = param ("id");
@options = param ("options");
```

A parameter with a given name might not be available if the form field with that name was left blank, or if there isn't any field with that name. Note too that a parameter value may be defined but empty. For good measure, you may want to check both possibilities. For example, to check for an age parameter and assign a default value of unknown if the parameter is missing or empty, you can do this:

```perl
$age = param ("age");
$age = "unknown" if !defined ($age) || $age eq "";
```
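For comparison, the same missing-or-empty check can be written in Python against a dict that maps names to lists of values, which is the shape urllib.parse.parse_qs() produces. The first_value() helper is hypothetical. Like param() in scalar context, it looks only at the first value.

```python
from urllib.parse import parse_qs


def first_value(params, name, default=None):
    """Return the first value of name, or default if it is missing or empty."""
    values = params.get(name)
    if not values or values[0] == "":
        return default
    return values[0]


params = parse_qs("age=&color=red", keep_blank_values=True)
age = first_value(params, "age", "unknown")      # present but empty
color = first_value(params, "color", "unknown")  # present and nonempty
size = first_value(params, "size", "unknown")    # not present at all
```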

CGI.pm understands both ; and & as URL parameter separator characters.

Ruby. For Ruby scripts that use the cgi module, web script parameters are available through the same cgi object that you create for generating HTML elements. Its params method returns a hash of parameter names and values, so you can access this hash or get the parameter names as follows:

```ruby
params = cgi.params
param_names = cgi.params.keys
```

The value of a particular parameter is accessible either through the hash of parameter names and values or directly through the cgi object:

```ruby
id = cgi.params["id"]
id = cgi["id"]
```

The two access methods differ slightly. The params method returns each parameter value as an array. The array contains multiple entries if the parameter has multiple values, and is empty if the parameter is not present. The cgi object returns a single string. If the parameter has multiple values, only the first is returned. If the parameter is not present, the value is the empty string. For either access method, use the has_key? method to test whether a parameter is present.

The following listing shows how to get the parameter names and loop through each parameter to print its name and value, printing multiple-valued parameters as a comma-separated list:

```ruby
params = cgi.params
param_names = cgi.params.keys
param_names.sort!
page << cgi.p { "Parameter names:" + param_names.join(", ") }
list = ""
param_names.each do |name|
  val = params[name]
  list << cgi.li {
             "type=#{val.class}, name=#{name}, value=" +
             CGI.escapeHTML(val.join(", "))
           }
end
page << cgi.ul { list }
```

The cgi module understands both ; and & as URL parameter separator characters.

PHP. Input parameters can be available to PHP in several ways, depending on your configuration settings:

If the track_vars variable is on, parameters are available in the $HTTP_GET_VARS and $HTTP_POST_VARS arrays. For example, if a form contains a field named id, the value will be available as $HTTP_GET_VARS["id"] or $HTTP_POST_VARS["id"], depending on whether the form was submitted via GET or POST. If you access $HTTP_GET_VARS and $HTTP_POST_VARS in a nonglobal scope, such as within a function, you must declare them using the global keyword to make them accessible.
As of PHP 4.1, parameters also are available in the $_GET and $_POST arrays if track_vars is on. These are analogous to $HTTP_GET_VARS and $HTTP_POST_VARS except that they are "superglobal" arrays that are automatically available in any scope. (For example, it is unnecessary to declare $_GET and $_POST with global inside of functions.) The $_GET and $_POST arrays are the preferred means of accessing input parameters.
If the register_globals variable is on, parameters are assigned to global variables of the same name. In this case, the value of a field named id will be available as the variable $id, regardless of whether the request was sent via GET or POST. This is dangerous, for reasons described shortly.

The track_vars and register_globals settings can be compiled into PHP or configured in the PHP php.ini file. track_vars is always enabled as of PHP 4.0.3, so I'll assume that this is true for your PHP installation.

register_globals makes it convenient to access input parameters through global variables, but the PHP developers recommend that it be disabled for security reasons. Suppose that you write a script that requires the user to supply a password, which is represented by the $password variable. In the script, you might check the password like this:

```php
if (check_password ($password))
  $password_is_ok = 1;
```

The intent here is that if the password matches, the script sets $password_is_ok to 1. Otherwise $password_is_ok is left unset (which compares false in Boolean expressions). But suppose that someone invokes your script as follows:

```
http://your.host.com/chkpass.php?password_is_ok=1
```

If register_globals is enabled, PHP sees that the password_is_ok parameter is set to 1 and sets the corresponding $password_is_ok variable to 1. The result is that when your script executes, $password_is_ok is 1 no matter what password was given, or even if no password was given! The problem with register_globals is that it enables outside users to supply default values for global variables in your scripts. The best solution is to disable register_globals, in which case you need to check the global arrays for input parameter values. If you cannot disable register_globals, take care not to assume that PHP variables have no value initially. Unless you're expecting a global variable to be set from an input parameter, it's best to initialize it explicitly to a known value. The password-checking code should be written as follows to make sure that only $password (and not $password_is_ok) can be set from an input parameter. That way, $password_is_ok is assigned a value by the script itself whatever the result of the test:

```php
$password_is_ok = 0;
if (check_password ($password))
  $password_is_ok = 1;
```
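The lesson that input should never be allowed to set arbitrary variables applies beyond PHP. The following Python sketch copies only the parameter names a script expects, so an extra password_is_ok=1 in the URL is simply ignored. Both expected_params() and the check_password() stand-in are hypothetical and are not taken from any API in this recipe.

```python
from urllib.parse import parse_qs


def expected_params(query, expected):
    """Return {name: first value} for the expected names only."""
    params = parse_qs(query, keep_blank_values=True)
    return {name: params[name][0] for name in expected if name in params}


def check_password(password):
    # Hypothetical stand-in for a real password check.
    return password == "s3cret"


inputs = expected_params("password=guess&password_is_ok=1", ["password"])
# password_is_ok is always assigned by the script, never by the request
password_is_ok = 1 if check_password(inputs.get("password", "")) else 0
print(inputs, password_is_ok)  # prints {'password': 'guess'} 0
```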

The PHP scripts in this book do not rely on the register_globals setting. Instead, they obtain input through the global parameter arrays.

Another complicating factor when retrieving input parameters in PHP is that they may need some decoding, depending on the value of the magic_quotes_gpc configuration variable. If magic quotes are enabled, any quote, backslash, and NUL characters in input parameter values accessed by your scripts will be escaped with backslashes. I suppose that this is intended to save you a step by allowing you to extract values and use them directly in SQL statement strings. However, that's only useful if you plan to use web input in a statement with no preprocessing or validity checking, which is dangerous. You should check your input first, in which case it's necessary to strip out the slashes, anyway. This means that having magic quotes turned on isn't really very useful.

Given the various sources through which input parameters may be available, and the fact that they may or may not contain extra backslashes, extracting input in PHP scripts can be an interesting problem. If you have control of your server and can set the values of the various configuration settings, you can of course write your scripts based on those settings. But if you do not control your server or are writing scripts that need to run on several machines, you may not know in advance what the settings are. Fortunately, with a bit of effort it's possible to write reasonably general-purpose parameter extraction code that works correctly with very few assumptions about your PHP operating environment. The following utility function, get_param_val(), takes a parameter name as its argument and returns the corresponding parameter value. If the parameter is not available, the function returns an unset value. (get_param_val() uses a helper function, strip_slash_helper(), which is discussed shortly.)

```php
function get_param_val ($name)
{
global $HTTP_GET_VARS, $HTTP_POST_VARS;

  $val = NULL;
  # check the superglobal arrays first, then the older long-name arrays
  if (isset ($_GET[$name]))
    $val = $_GET[$name];
  else if (isset ($_POST[$name]))
    $val = $_POST[$name];
  else if (isset ($HTTP_GET_VARS[$name]))
    $val = $HTTP_GET_VARS[$name];
  else if (isset ($HTTP_POST_VARS[$name]))
    $val = $HTTP_POST_VARS[$name];
  # get_magic_quotes_gpc() no longer exists as of PHP 8.0, so test for it
  if (isset ($val) && function_exists ("get_magic_quotes_gpc")
      && get_magic_quotes_gpc ())
    $val = strip_slash_helper ($val);
  return ($val);
}
```

To use this function to obtain the value of a single-valued parameter named id, call it like this:

```php
$id = get_param_val ("id");
```

You can test $id to determine whether the id parameter was present in the input:

```php
if (isset ($id))
{
  # ... id parameter is present ...
}
else
{
  # ... id parameter is not present ...
}
```

For a form field that might have multiple values (such as a checkbox group or a multiple-pick scrolling list), you should represent it in the form using a name that ends in []. For example, a list element constructed from the SET column accessories in the cow_order table has one item for each allowable set value. To make sure PHP treats the element value as an array, name the field accessories[], not accessories. (See Section 19.3 for an example.) When the form is submitted, PHP places the array of values in a parameter named without the []. To access it, do this:

```php
$accessories = get_param_val ("accessories");
```

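The value of the $accessories variable will be an array whether the parameter has multiple values or only a single value. The determining factor is not whether the parameter actually has multiple values, but whether you named the corresponding field in the form using [] notation. If the user selects no items at all, however, the browser does not send the field, so get_param_val() returns an unset value; test $accessories with isset() or is_array() before looping over it.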
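The following Python sketch mimics PHP's naming rule to make it concrete. It handles flat names only and ignores nested forms such as a[x][y], which PHP also supports. A name ending in [] accumulates its values in a list, while a plain name keeps only the last value submitted. php_style_params() is a hypothetical name. The sketch uses urllib.parse.parse_qsl(), which also decodes the %5B%5D form that browsers typically send for [].

```python
from urllib.parse import parse_qsl


def php_style_params(query):
    """Mimic PHP's treatment of "name[]" versus plain "name" parameters."""
    result = {}
    for name, value in parse_qsl(query, keep_blank_values=True):
        if name.endswith("[]"):
            base = name[:-2]
            if not isinstance(result.get(base), list):
                result[base] = []  # start (or restart) an array for base
            result[base].append(value)
        else:
            result[name] = value  # a plain name keeps only its last value
    return result
```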

The get_param_val() function checks the $_GET, $_POST, $HTTP_GET_VARS, and $HTTP_POST_VARS arrays for parameter values. Thus, it works correctly regardless of whether the request was made by GET or POST, or whether register_globals is enabled. The only thing that the function assumes is that track_vars is enabled.

get_param_val() also works correctly regardless of whether magic quoting is enabled. It uses a helper function strip_slash_helper() that performs backslash stripping from parameter values if necessary:

```php
function strip_slash_helper ($val)
{
  if (!is_array ($val))
    $val = stripslashes ($val);
  else
  {
    foreach ($val as $k => $v)
      $val[$k] = strip_slash_helper ($v);
  }
  return ($val);
}
```

strip_slash_helper() checks whether a value is a scalar or an array and processes it accordingly. The reason it uses a recursive algorithm for array values is that in PHP it's possible to create nested arrays from input parameters.
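For readers working outside PHP, the following Python sketch mirrors strip_slash_helper(). The strip_slashes() function is a hypothetical approximation of PHP's stripslashes(). It drops the backslash from each escaped character and turns the \0 sequence that magic quotes used for NUL back into a NUL character. Lists and dicts are handled recursively, matching the nested arrays PHP can build from input parameters.

```python
import re

# A backslash followed by any single character (including a newline)
_ESCAPE = re.compile(r"\\(.)", re.DOTALL)


def strip_slashes(text):
    r"""Undo backslash escaping: \' becomes ', \\ becomes \, \0 becomes NUL."""
    return _ESCAPE.sub(lambda m: "\0" if m.group(1) == "0" else m.group(1), text)


def strip_slash_helper(val):
    """Apply strip_slashes() to a string or, recursively, to a list or dict."""
    if isinstance(val, dict):
        return {k: strip_slash_helper(v) for k, v in val.items()}
    if isinstance(val, list):
        return [strip_slash_helper(v) for v in val]
    return strip_slashes(val)
```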

To make it easy to obtain a list of all parameter names, use another utility function:

```php
function get_param_names ()
{
global $HTTP_GET_VARS, $HTTP_POST_VARS;

  # construct an array in which each element has a parameter name as
  # both key and value.  (Using names as keys eliminates duplicates.)
  $keys = array ();
  if (isset ($_GET))
  {
    foreach ($_GET as $k => $v)
      $keys[$k] = $k;
  }
  else if (isset ($HTTP_GET_VARS))
  {
    foreach ($HTTP_GET_VARS as $k => $v)
      $keys[$k] = $k;
  }
  if (isset ($_POST))
  {
    foreach ($_POST as $k => $v)
      $keys[$k] = $k;
  }
  else if (isset ($HTTP_POST_VARS))
  {
    foreach ($HTTP_POST_VARS as $k => $v)
      $keys[$k] = $k;
  }
  return ($keys);
}
```

get_param_names() returns a list of parameter names present in the HTTP variable arrays, with duplicate names removed if there is overlap between the arrays. The return value is an array with its keys and values both set to the parameter names. This way you can use either the keys or the values as the list of names. The following example prints the names, using the values:

```php
$param_names = get_param_names ();
foreach ($param_names as $k => $v)
  print (htmlspecialchars ($v) . "<br />\n");
```
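In Python, collecting names from several sources with duplicates removed takes a single expression, because dict keys are unique and keep insertion order (Python 3.7 and later). The param_names() function below is a hypothetical sketch. Like get_param_names(), it returns a mapping whose keys and values are both the names.

```python
def param_names(*sources):
    """Return {name: name} for every name in any source, without duplicates."""
    return {name: name for source in sources for name in source}


get_params = {"id": ["3"], "color": ["red"]}
post_params = {"color": ["blue"], "size": ["large"]}
print(list(param_names(get_params, post_params)))  # prints ['id', 'color', 'size']
```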

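To construct URLs that point to PHP scripts and that have parameters at the end, you should separate the parameters by & characters. To make PHP also accept a different character (such as ;) when it parses incoming URLs, add that character to the arg_separator.input configuration variable in the PHP initialization file. The related arg_separator.output variable controls the separator that PHP itself uses when it generates URLs.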

Python. The Python cgi module provides access to the input parameters that are present in the script environment. Import that module, and then create a FieldStorage object:

```python
import cgi
params = cgi.FieldStorage ()
```

The FieldStorage object contains information for parameters submitted via either GET or POST requests, so you need not know which method was used to send the request. The object also contains an element for each parameter present in the environment. Its keys() method returns a list of available parameter names:

```python
param_names = params.keys ()
```

If a given parameter, name, is single-valued, the value associated with it is a scalar that you can access as follows:

```python
val = params[name].value
```

If the parameter is multiple-valued, params[name] is a list of MiniFieldStorage objects that have name and value attributes. Each of these has the same name (it will be equal to name) and one of the parameter's values. To create a list containing all the values for such a parameter, do this:

```python
val = []
for item in params[name]:
  val.append (item.value)
```

You can distinguish single-valued from multiple-valued parameters by checking the type. The following listing shows how to get the parameter names and loop through each parameter to print its name and value, printing multiple-valued parameters as a comma-separated list. This code requires that you import the types module in addition to the cgi module.

```python
import cgi
import types

params = cgi.FieldStorage ()
param_names = params.keys ()
param_names.sort ()
print "<p>Parameter names:", param_names, "</p>"
items = []
for name in param_names:
  if type (params[name]) is not types.ListType:  # it's a scalar
    ptype = "scalar"
    val = params[name].value
  else:                                         # it's a list
    ptype = "list"
    val = []
    for item in params[name]:   # iterate through MiniFieldStorage
      val.append (item.value)   # items to get item values
    val = ",".join (val)        # convert to string for printing
  items.append ("type=" + ptype + ", name=" + name + ", value=" + val)
print make_unordered_list (items)
```

Python raises an exception if you try to access a parameter that is not present in the FieldStorage object. To avoid this, use has_key() to find out if the parameter exists:

```python
if params.has_key (name):
  print "parameter " + name + " exists"
else:
  print "parameter " + name + " does not exist"
```

Single-valued parameters have attributes other than value. For example, a parameter representing an uploaded file has additional attributes you can use to get the file's contents. Section 19.8 discusses this further.

The cgi module expects URL parameters to be separated by & characters. If you generate a hyperlink to a Python script based on the cgi module and the URL includes parameters, don't separate them by ; characters.
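The Python code in this section is written for Python 2 and the cgi module. That module was deprecated in Python 3.11 and removed in Python 3.13 (PEP 594), so current Python versions need another way to reach the parameters. The following minimal sketch reproduces the earlier parameter listing from a raw query string using urllib.parse.parse_qs(). A name is reported as a list when it has more than one value, and names and values are HTML-escaped. describe_params() is a hypothetical name. Obtaining the query string is left to the caller; in a CGI script it would come from the QUERY_STRING environment variable.

```python
from html import escape
from urllib.parse import parse_qs


def describe_params(query):
    """Return one "type=..., name=..., value=..." string per parameter."""
    params = parse_qs(query, keep_blank_values=True)
    items = []
    for name in sorted(params):
        values = params[name]
        ptype = "scalar" if len(values) == 1 else "list"
        items.append("type=%s, name=%s, value=%s"
                     % (ptype, escape(name), escape(",".join(values))))
    return items


for line in describe_params("color=red&accessories=bell&accessories=horns"):
    print(line)
# type=list, name=accessories, value=bell,horns
# type=scalar, name=color, value=red
```

With a plain dict like this one, use `name in params` instead of has_key() to test whether a parameter is present.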

Java. Within JSP pages, the implicit request object provides access to the request parameters through the following methods:

getParameterNames(): Returns an enumeration of String objects, one for each parameter name present in the request.
getParameterValues(String name): Returns an array of String objects, one for each value associated with the parameter, or null if the parameter does not exist.
getParameter(String name): Returns the first value associated with the parameter, or null if the parameter does not exist.

The following example shows one way to use these methods to display request parameters:

```jsp
<%@ page import="java.util.*" %>
<ul>
<%
  Enumeration e = request.getParameterNames ();
  while (e.hasMoreElements ())
  {
    String name = (String) e.nextElement ();
    // use array in case parameter is multiple-valued
    String[] val = request.getParameterValues (name);
    out.println ("<li> name: " + name + "; values:");
    for (int i = 0; i < val.length; i++)
      out.println (val[i]);
    out.println ("</li>");
  }
%>
</ul>
```

Request parameters are also available within JSTL tags, using the special variables param and paramValues. param[name] returns the first value for a given parameter and thus is most suited for single-valued parameters:

```jsp
color value: <c:out value="${param['color']}"/>
```

paramValues[name] returns an array of values for the parameter, so it's useful for parameters that can have multiple values:

```jsp
accessory values:
<c:forEach items="${paramValues['accessories']}" var="val">
  <c:out value="${val}"/>
</c:forEach>
```

If a parameter name is legal as an object property name, you can also access the parameter using dot notation:

```jsp
color value: <c:out value="${param.color}"/>
accessory values:
<c:forEach items="${paramValues.accessories}" var="val">
  <c:out value="${val}"/>
</c:forEach>
```

Iterating over the paramValues variable yields one entry per parameter. Each entry has a key property (the parameter name) and a value property (an array of the parameter's values). To list all parameters and their values, do this:

```jsp
<ul>
  <c:forEach items="${paramValues}" var="p">
    <li>
      name:
      <c:out value="${p.key}"/>;
      values:
      <c:forEach items="${p.value}" var="val">
        <c:out value="${val}"/>
      </c:forEach>
    </li>
  </c:forEach>
</ul>
```

To construct URLs that point to JSP pages and that have parameters at the end, you should separate the parameters by & characters.