Dangers Presented by Remote Users | MySQL and Perl for the Web

only for RuBoard - do not distribute or recompile

Web applications typically respond to information provided by clients. However, writing an application that is driven by client input allows the client to control, at least to some extent, how the application works. This is one basis for Web attacks: a client provides input that makes your application behave in a way you did not anticipate and did not intend. Generally, the input will be something designed to cause you to expose more information than you want disclosed, or something designed to cause your Web or database server to malfunction or crash.

Effective prevention of such attacks requires that you be aware of how they can occur so that you can prepare for them in advance. Bad input doesn't just come from malicious clients trying to make your life miserable. It can also come from innocent users who simply enter the wrong thing by mistake. Nevertheless, if you're not prepared, the result is the same either way: a misbehaving application that does something other than what you want.

Sources of Client Input

Input presented by clients to your applications is not necessarily trustworthy, no matter what form it takes. These forms are as follows:

Information supplied through cookies. Normally, cookies are an invisible part of the client-server interaction as far as the user is concerned. However, cookies are maintained by the browser on the user's local disk in a file. The file can be edited to manipulate cookie contents, so cookies are just as subject to compromise as any other kind of information received from the client.
Parameters supplied in a URL. For example:
```
 http://www.snake.net/cgi-perl/some_app.pl?param1=x;param2=y;param3=z 
```
Applications often generate these URLs themselves. (For example, the real-estate search applications in Chapter 7, "Performing Searches," generate URLs for going to different pages of a search result or for displaying individual listings.) But obviously anyone can invoke your script by entering a URL into their browser manually and typing whatever they want for the parameter values. You cannot therefore assume the parameters will contain only sane values such as your application itself might provide.
Parameters received from a form. These are subject to mischief through a procedure that I'll describe shortly. A malicious user can use this procedure to subvert fields that normally cannot be modified (such as hidden fields) or fields that otherwise provide only the choices you supply when you generate the form (such as radio buttons or check boxes). This means you can't trust the value of any field to be what you expect.

Illegitimate Manipulation of Input Data

To demonstrate the point that you have reason to be suspicious of any information you receive from a client, I'll describe briefly how to subvert a form. This will demonstrate that the only thing you really know about a form is what you send to the client. You know nothing about what happens in the interval between your sending the form to the client and the client returning it with the fields filled in, and it's possible to get back something quite different than what you expect.

When you send a form to a browser, your expectation is that the user will fill in text fields, make selections from radio buttons, check boxes, or lists, and so forth. You may also include hidden fields containing information that the client is not supposed to modify. When the user submits the form, you obviously wouldn't trust text fields of a form to contain only legal values, because users can type whatever they want into them. That's why you routinely verify such fields against content or length constraints. Other types of fields seem to offer better security prospects. Hidden fields have values that are assigned by you when you generate the form, not by the user. And structured form elements such as radio buttons, check boxes, and lists enable the user to select only those choices you provide explicitly when you create the form. Therefore, you might think, "Do I need to check such fields, given that their values are guaranteed to be legal?" If there really were any such guarantee, you wouldn't. But the assumption inherent in that question is flawed; there is no guarantee. These types of form elements are useful for users because they make it more convenient to fill in forms, but they provide no assurance about the integrity of the data you receive back. The sense of security these fields offer script writers is entirely illusory. They are no less subject to modification than text fields.

To illustrate, suppose you have an application that conducts an online shopping session. Near the end of this process, you collect a credit card number from the customer and then display a final confirmation page asking the user to review the order and okay it. The page shows the items ordered and presents item quantities as pop-up menus to enable the customer to make final adjustments. It also contains the user's credit card number in a hidden field. That way, when the user submits the form to confirm the order, you can get all the information you need to process the order by extracting the parameters from the form. For example, if I'm ordering three boxes of light bulbs, the confirmation page you present to me may include a form that looks something like this (for brevity, I've omitted fields for information such as shipping address):

    <form method="post" action="http://www.snake.net/cgi-perl/order_confirm.pl">
    <input type="hidden" name="credit_card" value="012345678901">
    <input type="hidden" name="item1" value="light bulbs">
    Light bulbs (box):
    <select name="quantity">
    <option value="1">1</option>
    <option value="2">2</option>
    <option value="3" selected>3</option>
    <option value="4">4</option>
    </select>
    <input type="submit" name="choice" value="Confirm">
    </form>

If I want to subvert your application so that you send me a hundred boxes of light bulbs and bill someone else for them, here's the procedure I follow:

I save the confirmation page displayed in my browser to a file on my local disk.

I open the file in a text editor, and then modify it by changing the credit_card field value to a different number (perhaps one that I've stolen from someone else). I also add an option to the quantity pop-up menu for 100 units and select it:

    <form method="post" action="http://www.snake.net/cgi-perl/order_confirm.pl">
    <input type="hidden" name="credit_card" value="109876543210">
    <input type="hidden" name="item1" value="light bulbs">
    Light bulbs (box):
    <select name="quantity">
    <option value="1">1</option>
    <option value="2">2</option>
    <option value="3">3</option>
    <option value="4">4</option>
    <option value="100" selected>100</option>
    </select>
    <input type="submit" name="choice" value="Confirm">
    </form>

I save the file, reload it into my browser, and select the Submit button to send the form back to your application.
Your application processes my order. I get 100 boxes of light bulbs, and somebody else gets the bill.

What I've just described is one way an application can be attacked, but it's by no means the only way. A script may find itself invoked from entirely unexpected sources, and being fed input that bears no relation to what it expects. If I want to send you bad information, the preceding example discusses one way I can break the relationship between the form an application sends and the form it gets back. But I don't have to modify a form you send me; I can write my own form and point it at your script. Furthermore, although I'd include fields with the same names as the parameters your application expects to receive, I don't need to use the same types of fields you use in your own form. If I make them text-input fields, I can submit any value whatsoever for any parameter. For example, I need not go through the trouble of actually running your order_confirm.pl application to get to the confirmation page. I can write a simple Web page containing a form that looks like this:

    <form method="post" action="http://www.snake.net/cgi-perl/order_confirm.pl">
    Credit card number:
    <input type="text" name="credit_card">
    Item:
    <input type="text" name="item1">
    Quantity:
    <input type="text" name="quantity">
    <input type="submit" name="choice" value="Confirm">
    </form>

This form points to your script and it has the same field names as a legitimate form generated by your application. But the contents are fully editable directly from my browser window. I can select any item, in any quantity, and bill it to someone else. All I have to do is load this page into my browser, fill in the fields, press the Submit button, and the contents of the form get sent to your script. In this case, I'm forcing your script to process the contents of my form, a form over which you have no control and about which you can make no assumptions.

The implication of this is that no type of form element affords any measure of protection at all. You may as well consider all fields in any form you generate as completely editable, because in fact anyone can write an attack form that treats them that way. (You might even say that the real danger posed by hidden fields and list fields is that form designers come to think their values are safe from tampering and can be used safely.)

Form input need not come even from a form at all. Form input really is just data encoded in a particular way and sent to your Web server. Anyone can write a script that encodes false information so that it appears to originate from a form, then opens a connection to your Web server to invoke your script and shoves the information at it. Writing a program to generate information that looks like a form submission is one way to make it easier to perform automated attacks. There's no need to fill in a form manually; just run the program repeatedly.
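To make "data encoded in a particular way" concrete, here is a minimal sketch of how a program could produce the body of a form submission without any form being involved. The helper names form_escape() and encode_form() are hypothetical, and the sketch assumes plain ASCII values; it only builds the encoded string and does not contact any server.

```perl
use strict;
use warnings;

# Percent-encode one name or value in the style used for form
# submissions (application/x-www-form-urlencoded): unsafe characters
# become %XX and spaces become "+". Assumes ASCII input.
sub form_escape
{
    my ($str) = @_;
    $str =~ s/([^A-Za-z0-9_.~ -])/sprintf ("%%%02X", ord ($1))/ge;
    $str =~ tr/ /+/;
    return $str;
}

# Build a request body from name/value pairs. Nothing here requires
# that the pairs ever appeared in a form generated by the application.
sub encode_form
{
    my (@pairs) = @_;
    my @parts;
    while (my ($name, $value) = splice (@pairs, 0, 2))
    {
        push (@parts, form_escape ($name) . "=" . form_escape ($value));
    }
    return join ("&", @parts);
}

my $body = encode_form (
    credit_card => "109876543210",
    item1       => "light bulbs",
    quantity    => "100"
);
print "$body\n";
```

The resulting string is what a script such as order_confirm.pl would receive as its input, which is why the script cannot tell from the data alone whether a legitimate form was ever displayed.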

In summary, the preceding discussion illustrates several principles to keep in mind when you're writing applications that process form input:

There is no necessary relationship between the form you send to a client and the form you get back.
There is no necessary relationship between the type of form elements you use to collect information and the way the information actually is collected. Don't think that hidden fields or structured fields that provide a fixed set of options offer any guarantee of integrity on the values you'll get back.
Your application may normally expect to be invoked by forms that it generates itself, but other people can point their own forms at your script, or invoke it without using a form at all.

Responding to User Input

Given the extent of the potential for subversion of input by users, should you just throw up your hands and declare that it's not worth trying to write any applications at all? Well, no. The point is not that you should fall into deep despair. The point is that if you want to enforce some constraints on the values of form elements, but you're not willing to trust the client (and there is little reason that you should), you need to enforce the constraints yourself on the server side after the form has been submitted. You can't trust what you get from the client, so you must validate the values of all fields for which you expect a particular kind of information.

In some cases, not much verification may be needed. For example, if you have a search form with a sex field represented by a pop-up menu containing values of male and female, you probably won't care much if someone hacks the form to submit a silly value such as bookcase. All that will accomplish is to ensure that the query makes no sense and returns no records. No harm done. Other kinds of information are relatively free format and require little or no validation. If you're collecting comments for a guestbook, you may care only whether the value is empty. If you're collecting names as well, it may not matter whether the user puts in the correct name. (Besides, if users can't be counted on to supply their own name correctly, you certainly have no way of doing so for them.) If you're running a poll and the user modifies your form to cast a vote for a non-existent candidate, it's likely that the query to update the tally will affect no records, with the result that the vote simply is discarded.

In other cases, however, validation of user input is extremely important to make sure that values are in the expected format. For example, if a field value is supposed to look like an email address, telephone number, Zip code, or credit card number, you'll want to perform some kind of pattern match. We've developed a few functions like this in earlier chapters to help check values (such as strip_non_digits() and looks_like_email()). To augment these, you'll probably write your own arsenal of validation functions suited to the type of information you collect. You may also want to check length, not just content. (You can include a maxlength parameter in a text field, but that's easily bypassed, so you should not assume any value submitted really is within the legal length.)
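As an illustration of the kind of function you might add to that arsenal, here is a minimal sketch of a pattern check and a length check. Both helper names, looks_like_zip() and within_length(), are hypothetical (they are not functions from earlier chapters), and the Zip code test assumes U.S. five-digit or Zip+4 values only.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $val is defined and no longer than $max
# characters. This is the server-side counterpart to maxlength.
sub within_length
{
    my ($val, $max) = @_;
    return defined ($val) && length ($val) <= $max;
}

# Hypothetical helper: true if $val looks like a U.S. Zip code in
# either five-digit or Zip+4 form (for example, 53711 or 53711-1234).
sub looks_like_zip
{
    my ($val) = @_;
    return defined ($val) && $val =~ /\A\d{5}(?:-\d{4})?\z/;
}

for my $input ("53711", "53711-1234", "5371", "53711abc")
{
    printf ("%-12s %s\n", $input,
            looks_like_zip ($input) ? "accepted" : "rejected");
}
```

The pattern is anchored at both ends so that a legal value followed by extra text is rejected rather than accepted on the strength of its first few characters.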

Validation is particularly important when you're using input to construct queries that change the contents of your database. You want the modifications to be legal ones that make sense. Suppose you have a form-based application for updating employee records by applying a percentage pay raise to them. You'll want to verify that the raise is valid each time a form is submitted. (If you have a form element with percent increase values of 2, 4, 6, 8, and 10, make sure the user really submitted one of those values to prevent the user from giving raises of 50 or 100 percent to friends.)
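A check of that kind can be quite short. The following sketch assumes the legal raise values are kept in a single list that the application uses both to generate the form element and to validate the submission; the names @legal_raises and is_legal_raise() are hypothetical.

```perl
use strict;
use warnings;

# The same list should drive both form generation and validation so
# that the two cannot drift apart.
my @legal_raises = (2, 4, 6, 8, 10);

# Hypothetical helper: true only if $val is exactly one of the values
# that were offered in the form. A string comparison is used so that
# variants such as "04" or "4.0" are not accepted.
sub is_legal_raise
{
    my ($val) = @_;
    return 0 unless defined ($val);
    return scalar (grep { $_ eq $val } @legal_raises) ? 1 : 0;
}

for my $submitted ("4", "50", "100")
{
    printf ("%-4s %s\n", $submitted,
            is_legal_raise ($submitted) ? "legal" : "illegal");
}
```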

Does Client-Side Validation Improve Form Integrity?

Some applications use client-side validation to check field contents before a form is returned to the server. If you have a field into which the user must enter a number, for example, you can include JavaScript code that checks the value and pops up an alert if the value is bad. This technique also has the benefit of reducing the number of round trips to the server for invalid values. That sounds helpful, and there's no particular reason not to use such techniques.

However, be sure to recognize that they can only reduce the likelihood of receiving malformed input from the client, not eliminate it. Some browsers don't handle JavaScript at all. But even for those that do, client-side validation cannot be trusted as authoritative. It is an additional check that helps honest users, but is no barrier to malicious ones. There's little you can do to be sure the client hasn't subverted the form to remove the validity-checking code. JavaScript can be hacked just like HTML, or the user can disable JavaScript entirely in the browser preferences. It's still necessary to validate the input yourself on the server side.

Detecting Form Tampering

You may not be able to keep a user from illegitimately modifying the contents of a form, but you can construct the form so that tampering is detectable. For example, if you have a hidden field containing a record ID number that you expect to come back with exactly the same value you sent to the client, you can add a checksum.

This doesn't prevent the user from changing the value, but it does enable you to detect such attempts. If in addition you want to prevent the user from being able to see the original field value, you can encrypt it. These techniques are illustrated later in the chapter (see the section "Writing a Secure Application").
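To give a rough idea of what a checksum involves, here is a minimal sketch that uses a keyed hash from the Digest::SHA module. It is only one possible approach and is not necessarily the method used in the later section; the function names make_checksum() and checksum_ok() and the $secret value are assumptions made for illustration.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Assumption: a secret known only to the server. In practice it would
# not be hardcoded in the script like this.
my $secret = "replace with a long random string";

# Compute a keyed checksum to send along with the hidden field value.
sub make_checksum
{
    my ($value) = @_;
    return hmac_sha256_hex ($value, $secret);
}

# True if the value that came back still matches its checksum.
sub checksum_ok
{
    my ($value, $checksum) = @_;
    return 0 unless defined ($value) && defined ($checksum);
    return make_checksum ($value) eq $checksum ? 1 : 0;
}

my $id = "147";
my $sum = make_checksum ($id);      # sent to the client along with the ID
print checksum_ok ($id, $sum) ? "unchanged\n" : "tampered\n";
print checksum_ok ("148", $sum) ? "unchanged\n" : "tampered\n";
```

Because the checksum depends on a secret the client never sees, a user who changes the ID cannot compute a matching checksum for the new value.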

For fields such as pop-up menus or check boxes that present a specific list of options to the user, you can detect tampering by making sure that the value returned to your application corresponds to a value that actually was present in the form that you sent to the client. This means the form generation and validation components of your application must agree about the set of legal values. That is pretty easy if you're processing forms using methods such as those described in Chapter 6, "Automating the Form-Handling Process." For example, to present a set of radio buttons corresponding to the elements of an ENUM column my_enum in a table my_table, you can get the information you need from the table description. Suppose the column is defined like this:

 my_enum ENUM('value a','value b','value c')

The definition can be used to generate a form element. To generate a set of radio buttons, you might do something like this:

    $tbl_info = WebDB::TableInfo->get ($dbh, "my_table");
    @members = $tbl_info->members ("my_enum");
    print radio_group (-name => "my_enum", -values => \@members);

To verify values that users submit, use the column description again:

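    $tbl_info = WebDB::TableInfo->get ($dbh, "my_table");
    @members = $tbl_info->members ("my_enum");
    $val = param ("my_enum");
    # \Q...\E makes the submitted value match literally, so that pattern
    # metacharacters in user input (such as ".*") cannot match everything
    $legal = defined ($val) && grep (/^\Q$val\E$/i, @members);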

$legal will be true if the value of the my_enum field is one of those listed in the my_enum column description, false otherwise.

Handling User Input Safely for Query Construction

After you've checked the input presented to your application by the user to be sure that it makes sense, are you done? Is the information safe to use? The answer to that question is, "Safe for what?" If you're going to use the input to issue database queries, you must still be careful about how you construct the queries. For example, you might collect search parameters to use in a SELECT statement, or you might store form information by issuing an INSERT statement. In all such cases, you must ensure that clients cannot submit information (either mistakenly or maliciously) that causes the query to be malformed. At best, your statement will just fail with a syntax error, but certain types of input can have serious unintended consequences unless you take steps to prevent them. A SELECT that normally selects just a few rows may select an entire table if constructed incorrectly, leading to excessive server resource consumption or Web server failure. If a client can produce such a result, your machine can be tied up to such an extent that the input serves effectively as a denial-of-service attack. Worse yet, a query may destroy data. For example, a malformed DELETE or UPDATE query can change many more rows than you intend.

To prevent these problems, you must encode information into your queries properly to ensure that you don't generate malformed SQL or queries that do something other than what you want. As it happens, you already know how to do this.

Throughout the earlier chapters, we've been using placeholders or quote() to make sure that data values are properly encoded when they are added to a query string. I haven't said all that much about why we should do this, other than to make sure that characters such as quotes or backslashes are escaped properly; now it's time to discuss how failure to encode values can really hurt you.

Suppose you run a Web site for an organization that maintains a descriptive profile of members and allows members to edit or delete their own profile using a form-based Web application. The application keeps track of the record being edited using an ID number stored in a hidden field. When a form is submitted, you collect the ID value and put it in a variable $id. If the user request is to delete the record, you might construct the DELETE statement like this by inserting the value of $id directly into the query string:

 $dbh->do ("DELETE FROM my_table WHERE id = $id");

That's fine as long as the application is used legitimately for its intended purpose. Suppose, however, that a user hacks the form to supply an id value such as 1 OR id != 1; in that case, the query becomes:

 DELETE FROM my_table WHERE id = 1 OR id != 1

That query is true for every record, so it completely empties the table. (Whoops! I hope you had a backup.) This illustrates that if you don't treat user input with the proper respect, people can attack your database through your own Web applications.

In this particular instance, you might have detected the hack attempt by testing the id parameter to make sure its value really is a number. But sometimes field values are free format and content checks can't be applied so readily. To make sure an input value doesn't get treated as anything other than a single data value in the query string, you should reference it using a placeholder:

 $dbh->do ("DELETE FROM my_table WHERE id = ?", undef, $id);

Alternatively, use the quote() method:

    $dbh->do ("DELETE FROM my_table WHERE id = " . $dbh->quote ($id));

Either way, the attacker's input is encoded safely into the query string, which ends up looking like this:

 DELETE FROM my_table WHERE id = '1 OR id != 1'

MySQL will perform a string-to-number conversion on the id comparison value to produce the value 1. The result is that the query affects only the record with an ID of 1, not every record in the table. (This does not help you detect whether the user actually did modify the form; it just limits the damage. Form modification detection and prevention is covered in the later section "Writing a Secure Application.")
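The numeric test mentioned earlier for the id parameter complements placeholders rather than replacing them: it lets you reject the request outright instead of merely limiting its effect. A minimal sketch follows, assuming record IDs are non-negative integers; the helper name looks_like_id() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $val consists only of decimal digits,
# which is what a record ID is assumed to look like here.
sub looks_like_id
{
    my ($val) = @_;
    return defined ($val) && $val =~ /\A\d+\z/ ? 1 : 0;
}

for my $id ("1", "1 OR id != 1")
{
    printf ("%-14s %s\n", $id,
            looks_like_id ($id) ? "looks like an ID" : "rejected");
}
```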

The preceding example shows how a user might attempt to damage your database by causing records to be changed or lost. A similar attack can be used in search contexts. If you have a record-retrieval application that solicits an id value for the my_table table, you might construct and execute the query like this:

    $sth = $dbh->prepare ("SELECT * FROM my_table WHERE id = $id");
    $sth->execute ();
    while (my $ref = $sth->fetchrow_hashref ())
    {
        # display record here ...
    }
    $sth->finish ();

Here, too, if the user submits an id value such as 1 OR id != 1, and you put it directly into the query string, you end up with a statement that doesn't do what you want:

 SELECT * FROM my_table WHERE id = 1 OR id != 1

This query retrieves the entire contents of the table, causing your database and Web servers to devote more computing resources to processing the query than you intend. Input like this can be used as the basis for a denial-of-service attack that causes your machine to spend so much time doing useless work processing bogus queries that it cannot respond effectively to legitimate client requests. The solution, just as with the preceding DELETE query, is to change the code to use placeholders or quote(). The placeholder version looks like this:

    $sth = $dbh->prepare ("SELECT * FROM my_table WHERE id = ?");
    $sth->execute ($id);
    while (my $ref = $sth->fetchrow_hashref ())
    {
        # display record here ...
    }
    $sth->finish ();

Using quote(), the code looks like this:

    $id = $dbh->quote ($id);
    $sth = $dbh->prepare ("SELECT * FROM my_table WHERE id = $id");
    $sth->execute ();
    while (my $ref = $sth->fetchrow_hashref ())
    {
        # display record here ...
    }
    $sth->finish ();

Placeholders and quote() do have their limitations; they can be used only for data values. For other query elements, they're inapplicable and you must verify the values yourself. These include the following types of information:

Database, table, or column names. These are commonly used as form elements in database-browsing applications. For example, you might provide a pop-up menu containing a list of table names for the user to choose from.

Comparison operators. You might use a form such as the following one that enables the user to specify an operator to be applied to a price value in a search query:

    print start_form (-action => url ()),
        "Find records where the price is", br (),
        radio_group (-name => "operator",
                     -values => [ "<", "=", ">" ],
                     -labels => {
                         "<" => "less than",
                         "=" => "equal to",
                         ">" => "greater than"}),
        br (), " the following value: ", br (),
        textfield (-name => "price"),
        end_form ();

Function names. Suppose you have an application that enables the user to apply a functional transform to a column of data values from a table, using a query like this:
```
 SELECT x, function_name(x) AS y FROM my_table 
```
The application would construct the SELECT query, but you might allow the user to choose the function name by means of a pop-up menu in a form like this:
```
 print start_form (-action => url ()),
     "Pick a function name: ",
     popup_menu (-name => "func_name",
                 -values => [ "SQRT", "EXP", "LOG", "SIN", "COS" ]),
     end_form ();
```

In each of these cases, you must check for yourself that submitted values are valid. For database, table, or column names, verify that the name is legal. For the operator or function name examples, make sure the value that comes back from the user is one you actually provided as a choice.
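For the operator and function name cases, the check might be sketched as follows. The query is constructed only if the submitted value is one of the choices that was offered, and the price itself still goes through a placeholder. The helper names price_query() and function_query() are hypothetical, and the price column is assumed to exist in my_table.

```perl
use strict;
use warnings;

# The sets of choices that were offered in the forms.
my %legal_operator = map { $_ => 1 } ("<", "=", ">");
my %legal_function = map { $_ => 1 } qw(SQRT EXP LOG SIN COS);

# Hypothetical helper: return the query text for a legal operator, or
# undef if the submitted operator was not one of the choices offered.
sub price_query
{
    my ($op) = @_;
    return undef unless defined ($op) && $legal_operator{$op};
    return "SELECT * FROM my_table WHERE price $op ?";
}

# Hypothetical helper: the same idea for the function name.
sub function_query
{
    my ($func) = @_;
    return undef unless defined ($func) && $legal_function{$func};
    return "SELECT x, ${func}(x) AS y FROM my_table";
}

for my $op ("<", "< 0 OR price")
{
    my $query = price_query ($op);
    print defined ($query) ? "$query\n" : "illegal operator: $op\n";
}
```

Only a value that matches one of the hash keys exactly is ever interpolated into the query string, so arbitrary user text never becomes part of the SQL.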

File Upload Issues

Uploaded files are another kind of user input. CGI.pm stores each uploaded file in a temporary directory on the Web server host, and then deletes it after your script terminates. (This is why you must copy the file somewhere else if you want it to exist beyond the lifetime of your script.) CGI.pm's default file upload behavior includes two aspects that may present problems under some circumstances:

File visibility. All scripts run by the Web server normally run with the same file system access privileges unless you take steps to ensure otherwise. Therefore, any script can see temporary files created by another script if they happen to be running at the same time. This won't be a problem if you're the only one who can install Web scripts to be executed by the server, but if multiple users can do so and you don't trust them all, you may have cause for concern.
File size. It's possible for a user to upload a huge file that fills up the file system where the temporary file directory is located. Although the file will be deleted when the script terminates, that doesn't help other programs that try to write to the file system while the upload is in progress.

You can address these problems by setting CGI.pm configuration variables, which are located in the initialize_globals() routine of the CGI.pm source. To change them, you'll need to edit the installed version of CGI.pm.^[1] Three variables are relevant to file uploads:

^[1] If you do this, remember that you'll need to make your changes again each time you install a new version of CGI.pm.

$PRIVATE_TEMPFILES controls whether temporary upload files are private to the script that creates them. By default, this variable is 0 (false), so files are visible to other Web scripts that run at the same time. If you set this to a true value (such as 1), CGI.pm immediately deletes the temporary file after opening it. This is a UNIX idiom that causes the file's name to disappear from the file system (thus rendering it unavailable to other processes) but still be available to your script through the open file handle. (As you might guess from the preceding sentence, setting this variable to true has no effect under Windows.) If you do enable $PRIVATE_TEMPFILES, be aware that this causes the CGI.pm function tmpFileName() to stop working. That function returns the name of the uploaded file, but the private file mechanism causes that name to be gone by the time you'd call tmpFileName(). Normally, this shouldn't be a problem because your script still has access to the file through the file handle. The CGI.pm documentation deprecates use of tmpFileName(), anyway.
$POST_MAX controls the maximum size of uploaded files. The default value is -1 (no limit); set it to a positive value to constrain file sizes to that many bytes.
$DISABLE_UPLOADS controls whether file uploads are allowed at all. By default, the value is false; set it to true to prevent uploads.

If you edit the CGI.pm source to enable these restrictions globally, scripts that need to override them can do so on an individual basis. Suppose you set the variables in CGI.pm as follows to allow uploads but make temporary files private and limited to one kilobyte in size:

    $DISABLE_UPLOADS = 0;      # allow uploads
    $PRIVATE_TEMPFILES = 1;    # make temporary files private
    $POST_MAX = 1024;          # limit file size to 1KB

A script can make files public (perhaps so it can use tmpFileName()) and set the maximum posting size to 1 megabyte by resetting the variables like this:

    $CGI::PRIVATE_TEMPFILES = 0;    # files not private
    $CGI::POST_MAX = 1024 * 1024;   # limit size to 1MB

If $PRIVATE_TEMPFILES is enabled globally, a script can disable it only for its own temporary files. In other words, another user's script cannot gain access to your script's temporary files by setting the variable false. That's small comfort, though. If you save the temporary file somewhere more permanent, the permanent file may be visible to any other Web script that knows where to look, whether the script was written by you or by someone else. The reason for this is explained in the section "Dangers from Other Users with Apache Access."
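When a script does copy an upload somewhere more permanent, it can also apply a size check of its own while copying. The following is a minimal sketch of that idea, not part of CGI.pm; the helper name copy_limited() is hypothetical, and in-memory file handles stand in for the upload handle and the destination file so that the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: copy from $in to $out in chunks, giving up as
# soon as more than $max_bytes have been read. Returns the number of
# bytes copied, or undef if the limit was exceeded (in which case the
# caller should discard whatever was written to $out).
sub copy_limited
{
    my ($in, $out, $max_bytes) = @_;
    my $total = 0;
    my $buf;
    while (my $n = read ($in, $buf, 8192))
    {
        $total += $n;
        return undef if $total > $max_bytes;
        print $out $buf;
    }
    return $total;
}

# Demonstration using in-memory file handles instead of a real upload.
my $data = "x" x 2000;
open (my $in, "<", \$data) or die "cannot open input: $!";
open (my $out, ">", \my $copy) or die "cannot open output: $!";
my $copied = copy_limited ($in, $out, 1024);
print defined ($copied) ? "copied $copied bytes\n" : "upload too large\n";
```

A check like this limits only what your script stores permanently; the temporary file has already been written by the time the script sees it, which is why $POST_MAX remains the first line of defense.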

Using $POST_MAX To Limit Text Field Input

The $POST_MAX variable actually applies not just to a single uploaded file, but to the combined size of all the elements in a form. Therefore, you can set this variable to keep people from pasting huge amounts of junk into text fields, too.
