The basic logic of reviewing code for HTML scripting attacks is as follows:

1. Identify every place where content is returned to the Web browser, or where a client application writes data to the file system (for script injection in local files).
2. Check whether the output could include attacker-supplied data.
3. If attacker-supplied data is returned, verify that it is properly validated and/or encoded before being returned.
For most security issues, we recommend starting where attacker-supplied data enters the application and following it all the way through. The preceding approach is the reverse of that. We still recommend the entry-point approach for a comprehensive security pass, but because HTML scripting attacks are an output problem, starting with the output and working backward is both effective and efficient.
To accomplish the first step in the code review, you must understand which functions are used to return data to the Web browser or file system. The following table shows common functions for returning data to the Web browser.
Language | Function |
---|---|
ASP | Response.Write, Response.BinaryWrite, <%=strVariable%> |
PHP | echo, print, printf, <?=$variable?> |
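As a rough illustration of the first step, the following Python sketch flags lines that contain the output functions from the table above. The pattern list and the `find_sinks` function are hypothetical; a real review must also cover file-system writes and any project-specific output helpers that these patterns would miss.

```python
import re

# Output functions from the table above, as regular expressions.
# Illustrative only: real code may use other helpers or write to files.
SINK_PATTERNS = {
    "asp": [r"\bResponse\.Write\b", r"\bResponse\.BinaryWrite\b", r"<%="],
    "php": [r"\becho\b", r"\bprint\b", r"\bprintf\b", r"<\?="],
}


def find_sinks(source: str, language: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line containing an output function."""
    # VBScript and PHP keywords/function names are case-insensitive, so match that way.
    patterns = [re.compile(p, re.IGNORECASE) for p in SINK_PATTERNS[language]]
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if any(p.search(line) for p in patterns):
            hits.append((number, line.strip()))
    return hits


php_code = """<?php
$name = $_GET['name'];
echo "Hello, ", $name;
?>
<p><?= $name ?></p>"""

for number, line in find_sinks(php_code, "php"):
    print(number, line)
```

Each hit is only a starting point; the next step is deciding whether the data written there can come from an attacker.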
Now that you have identified all of the code that returns data to the browser or the file system, you must determine whether that output includes attacker-supplied data. If the output cannot contain attacker-supplied data, there is no HTML scripting threat. Common sources of attacker data include HTTP form variables, query string parameters, and data read from a database (where an attacker's data might have been stored earlier). The functions shown in the following table are commonly used to read attacker-supplied input.
Language | Function |
---|---|
ASP | Request("variable"), Request.Form("variable"), Request.QueryString("variable"), Request.ServerVariables("QUERY_STRING"), recordSet("columnName") |
PHP | $_GET['variable'], $_POST['variable'], $_REQUEST['variable'], $HTTP_POST_VARS['variable'], $_SERVER['QUERY_STRING'], msql_query, mysql_query, sybase_query |
Once you have identified places where data can be specified by an attacker and returned to the victim (HTML returned from the server or local files for client applications), you must verify that the code validates and/or encodes the data to avoid allowing script to run. Sometimes this is straightforward, but not always. It really depends on the application. The following is a simple example of an XSS bug present in an ASP page and the equivalent code in PHP.
Language | Code |
---|---|
ASP | Response.Write "Hello, " & Request.QueryString("name") & "! Nice to meet you." |
PHP | echo "Hello, ", $_GET['name'], "! Nice to meet you."; |
Both lines of code take untrusted user input from the URL as a GET parameter named name and then echo it back to the Web browser without validating it. By searching through code to find lines that contain common output and input functions, you can quickly find bugs like this. However, this approach will not work with slight variations that accomplish the same effect. For example, the output of the following code is equivalent to the preceding example except the input is retrieved on a line separate from where the output is generated.
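The following Python sketch shows the same-line search described above and its blind spot. It handles PHP only, and the regular expressions and the `same_line_candidates` function are hypothetical; a line is flagged only when a request source and an output function appear together on it.

```python
import re

# Hypothetical PHP patterns: request sources and output functions.
SOURCES = re.compile(r"\$_(GET|POST|REQUEST)\[")
SINKS = re.compile(r"\b(echo|print|printf)\b|<\?=")


def same_line_candidates(source: str) -> list[int]:
    """Flag lines where request input is read and written out on the same line."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if SOURCES.search(line) and SINKS.search(line)
    ]


direct = """echo "Hello, ", $_GET['name'], "! Nice to meet you.";"""
split = """$userName = $_GET['name'];
echo "Hello, ", $userName, "! Nice to meet you.";"""

print(same_line_candidates(direct))  # [1]
print(same_line_candidates(split))   # [] although the code is just as vulnerable
```

The second call finds nothing because the input and the output sit on different lines, which is exactly the variation that forces a reviewer to trace variables.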
ASP:
username = Request.QueryString("name")
Response.Write "Hello, " & username & "! Nice to meet you."

PHP:
$userName = $_GET['name'];
echo "Hello, ", $userName, "! Nice to meet you.";
Once you examine the code, it is easy to determine that userName is untrusted data coming in as a GET parameter. As you might suspect, tracing backward through the code to determine whether the origin is attacker controlled is very common in an XSS code review.
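Part of this backward tracing can be approximated with simple taint tracking: mark a variable as tainted when it is assigned from a request source or from another tainted variable, then flag output statements that use it. The sketch below is a deliberately naive, hypothetical PHP-only model (one assignment per line, no function calls, no awareness of encoding functions), so it produces candidates for manual review rather than confirmed bugs.

```python
import re

# Hypothetical, PHP-only patterns for illustration.
SOURCE = re.compile(r"\$_(GET|POST|REQUEST)\[")
ASSIGNMENT = re.compile(r"^\s*(\$\w+)\s*=\s*(.+);\s*$")
SINK = re.compile(r"\b(echo|print|printf)\b|<\?=")


def uses_any(text: str, variables: set[str]) -> bool:
    """True if text references any variable in the set (whole-name match)."""
    return any(re.search(re.escape(v) + r"\b", text) for v in variables)


def tainted_output_lines(source: str) -> list[int]:
    """Return line numbers of output statements that may emit request data."""
    tainted: set[str] = set()
    flagged: list[int] = []
    for number, line in enumerate(source.splitlines(), start=1):
        assignment = ASSIGNMENT.match(line)
        if assignment:
            target, expression = assignment.groups()
            if SOURCE.search(expression) or uses_any(expression, tainted):
                tainted.add(target)      # taint flows into the assigned variable
            else:
                tainted.discard(target)  # overwritten with untainted data
            continue
        if SINK.search(line) and (SOURCE.search(line) or uses_any(line, tainted)):
            flagged.append(number)
    return flagged


code = """$userName = $_GET['name'];
$greeting = "Hello, " . $userName;
$count = 3;
echo $greeting, "! Nice to meet you.";
echo $count;"""

print(tainted_output_lines(code))  # [4]
```

Taint passes from `$userName` to `$greeting`, so the first `echo` is flagged even though no request variable appears on that line.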
Remember that validating and/or encoding attacker-supplied data doesn't always prevent HTML scripting attacks; you must verify that the correct protection is in place. Sometimes a flaw is easy to spot; other times, using your knowledge of the code to generate test cases is more effective. The following lines of code encode the attacker-controlled input incorrectly before returning it.
Language | Code |
---|---|
ASP | Response.Write "<INPUT type='text' name='username' value='" & Server.HTMLEncode(Request.Form("username")) & "'>" |
PHP | echo "<INPUT type='text' name='username' value='", strip_tags($_GET['name']), "'>"; |
The PHP example removes HTML tags from the input by using the strip_tags function, and the ASP example HTML-encodes the data. However, neither strip_tags nor HtmlEncode modifies single quotation marks, and the attacker's data is enclosed in single quotation marks, so an attacker can close the value attribute with a single quotation mark and inject an attribute of choice. For example, the URL http://server/test.php?name='%20onclick=alert(document.domain);// runs script when the victim clicks the input text control returned on the Web page.
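To see why the single quotation mark matters, the following Python sketch renders the vulnerable markup twice. `asp_style_html_encode` is a hypothetical stand-in that mimics the behavior described in this section (it encodes &, <, >, and " but not '), while Python's `html.escape(..., quote=True)` also converts the single quotation mark to `&#x27;`.

```python
import html
from urllib.parse import unquote


def asp_style_html_encode(value: str) -> str:
    """Hypothetical stand-in for an encoder that handles &, <, >, and "
    but, like the functions discussed here, leaves single quotes alone."""
    return (value.replace("&", "&amp;")  # must run first to avoid double-encoding
                 .replace("<", "&lt;")
                 .replace(">", "&gt;")
                 .replace('"', "&quot;"))


def render_input(encoded_value: str) -> str:
    # The value attribute is delimited by single quotes, as in the vulnerable code.
    return "<INPUT type='text' name='username' value='" + encoded_value + "'>"


# The query-string value from the example URL, after URL decoding.
payload = unquote("'%20onclick=alert(document.domain);//")

weak = render_input(asp_style_html_encode(payload))
strong = render_input(html.escape(payload, quote=True))

print(weak)    # <INPUT type='text' name='username' value='' onclick=alert(document.domain);//'>
print(strong)  # <INPUT type='text' name='username' value='&#x27; onclick=alert(document.domain);//'>
```

In the first result the attacker's quotation mark ends the value attribute and onclick becomes a new attribute, with the trailing `//` commenting out the leftover `'`. In the second, the quotation mark stays inside the attribute value as an entity.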
Table 10-4 shows common encoding functions and how each modifies the data passed to it.
Language | Function | Description |
---|---|---|
ASP | HtmlEncode | Converts angle brackets (< >), double quotation marks ("), and the ampersand (&) to their corresponding HTML entities (`&lt;`, `&gt;`, `&quot;`, and `&amp;`, respectively). This function does not modify single quotation marks. |
ASP | UrlEncode | Encodes all nonalphanumeric characters except the hyphen (-) and underscore (_). For example, the question mark (?), ampersand (&), forward slash (/), double quotation mark ("), and colon (:) are returned as %3f, %26, %2f, %22, and %3a, respectively. This is the encoding described in RFC 1738. |
PHP | htmlspecialchars / htmlentities | Encodes <, >, and & like ASP's HtmlEncode. Single and double quotation marks are encoded depending on the flags passed in. See http://us2.php.net/manual/en/function.htmlspecialchars.php for more information. |
PHP | rawurlencode | Same as ASP's UrlEncode. |
PHP | urlencode | Same as rawurlencode, except that spaces are encoded as the plus sign (+). |
PHP | strip_tags | Removes HTML and PHP tags. For more information, see http://us2.php.net/manual/en/function.strip-tags.php. |
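For comparison, Python's standard library offers rough analogs of these functions. The sketch below is not a port of the ASP or PHP functions; it only illustrates the two distinctions the table draws: whether spaces become %20 or + in URL encoding (compare rawurlencode and urlencode), and whether HTML encoding touches quotation marks (compare the htmlspecialchars flags). Note that `urllib.parse.quote` never encodes letters, digits, or `_.-~`, and leaves `/` alone unless `safe` is overridden.

```python
import html
from urllib.parse import quote, quote_plus

value = "a b/\"'?&<>"

# Percent-encoding. safe="" stops quote() from leaving "/" unencoded.
print(quote(value, safe=""))       # a%20b%2F%22%27%3F%26%3C%3E  (space -> %20)
print(quote_plus(value, safe=""))  # a+b%2F%22%27%3F%26%3C%3E    (space -> +)

# HTML encoding. The quote flag controls whether " and ' are converted.
print(html.escape(value, quote=False))  # a b/"'?&amp;&lt;&gt;
print(html.escape(value, quote=True))   # a b/&quot;&#x27;?&amp;&lt;&gt;
```

As with the PHP flags, the safe choice for data placed inside an attribute is the variant that encodes both quotation marks.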
Classic ASP and PHP both require the programmer to generate all of the HTML output by hand (either in static HTML or code-generated output). ASP.NET has the notion of form controls. Creating an ASP.NET Web page is similar to creating a Windows application.
Any controls the programmer wishes to use are placed on the Web page and assigned a name. Each control has associated properties; for example, a text box has a property called Text. Instead of writing out the HTML tag for the text box with its value set, the programmer only needs to set the value of the text box on the server. For example, the following code sets the value of a text box:
this.txtBox.Text = Request.Form["name"];
If the name form field contained "Tom", ASP.NET generates the following HTML when the page is displayed:
<INPUT type="text" name="txtBox" value="Tom">
At first glance, this appears to be an XSS bug because the programmer isn't encoding the value before assigning it to the Text property of the text box. However, ASP.NET automatically HTML-encodes the Text value of this control before returning it, which prevents the XSS bug.
Not all ASP.NET controls automatically encode data. Developers sometimes introduce cross-site scripting bugs because they believe all controls encode. For example, we found several XSS bugs where the developer believed that the Text property of the Label control automatically encoded the data, which it doesn't. The Excel spreadsheet included on the companion Web site lists many common ASP.NET controls, their properties, and whether each property is automatically encoded. Use this reference when reviewing ASP.NET code for cross-site scripting bugs.
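The difference between controls that encode and controls that don't can be modeled in a few lines. In the Python sketch below, `TextBox` and `Label` are hypothetical classes, not ASP.NET types: `TextBox.render` encodes its text the way the text box described above does, while `Label.render` writes it verbatim, matching the Label behavior described in this section. In real ASP.NET code, the usual fix is to encode explicitly (for example with `HttpUtility.HtmlEncode`) before assigning untrusted data to a property that is not encoded automatically.

```python
import html
from dataclasses import dataclass


@dataclass
class TextBox:
    """Hypothetical control that HTML-encodes its text when rendered."""
    name: str
    text: str = ""

    def render(self) -> str:
        value = html.escape(self.text, quote=True)
        return f'<input type="text" name="{self.name}" value="{value}">'


@dataclass
class Label:
    """Hypothetical control that writes its text into the page unchanged."""
    name: str
    text: str = ""

    def render(self) -> str:
        return f'<span id="{self.name}">{self.text}</span>'


attack = "<script>alert(document.domain)</script>"

print(TextBox("txtBox", attack).render())              # script is encoded, harmless
print(Label("lblName", attack).render())               # script is emitted as markup
print(Label("lblName", html.escape(attack)).render())  # fixed: encode before assigning
```

When reviewing, the question for each control property is which of these two render paths it follows.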