User Input Attacks | Designing Secure Web-Based Applications for Microsoft Windows 2000 with CDROM


Most Web applications take user input, manipulate the data, and create results based on the information. The most common way to gather user input in a Web application is with a Web form, which uses the <FORM> tag. However, few Web developers perform stringent validation of the posted data. The most common user input attacks take these forms:

Posting HTML, script, or specially crafted input to the server

Posting large amounts of data to the server

Let's look at both types of user input attack in turn.

Posting HTML, Script, or Specially Crafted Input to the Server

You must be cautious if your application uses any user input to create results that are displayed in a browser. You should inspect all user input before using it for output or before using it to access requested system resources. You might have the best firewalls and perimeter defenses, but HTTP and HTML traffic travels right through those defenses—they don't protect you from malicious user input.

An Important Rule of Thumb
All user input is bad until proven otherwise!

A good example of an application that might be vulnerable to a user input attack is a guest book application. There are two main uses for a guest book application: to gather user input and to display the results for other users to see. An attacker can easily look at the source code (in Microsoft Internet Explorer, you right-click the page and then choose View Source from the context menu) of the data-gathering form to determine what sort of input the application expects and then attempt to post nonstandard data to the server. Any form that accepts text is a potential target for posting of HTML or script.

Let's look at a very common, real-life example. The attacker goes to the data entry page at www.exair.com/comments.html and looks at the source code. He sees that there are three fields: name, address, and comments. Each of these is a potential target. For fun, he enters his name as <META HTTP-EQUIV="REFRESH" CONTENT="1; URL=http://www.advworks.com"> and clicks the Submit button, which posts the data to the Exploration Air Web server. After one second, any user viewing this person's "name" is whisked away from www.exair.com to www.advworks.com!

How did this work? It's actually very simple. When the comments were redisplayed, the tag he entered was executed by the client's browser and caused the browser to go to www.advworks.com after one second. This is a simple but irritating attack because no one can read the comments.

The easiest way to protect against this kind of attack is to parse all user input as it comes to the server, as described in the following sections. To restate the simple rule you should always follow: never use user input as output without checking it first.
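As a minimal sketch of that rule, the following JavaScript function (in the spirit of the JScript samples later in this section) rejects a guest book field that contains markup characters before it is stored or redisplayed. The name containsMarkup and the sample values are hypothetical; a real application would normally combine a check like this with output encoding.

```javascript
// Hypothetical helper: returns true if the text contains < or >,
// the characters needed to form an HTML tag.
function containsMarkup(text) {
    return /[<>]/.test(String(text));
}

// Two sample guest book "names": an ordinary one, and one carrying
// a META refresh tag like the one in the example above.
var goodName = "Cheryl";
var badName = '<META HTTP-EQUIV="REFRESH" CONTENT="1; URL=http://www.advworks.com">';

var goodNameRejected = containsMarkup(goodName);
var badNameRejected = containsMarkup(badName);
```

Rejecting the field outright is the bluntest option; the sections that follow show how to strip or validate input instead.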

Regular Expressions in VBScript and Microsoft JScript
The following code samples use the RegExp object built into version 5 of the VBScript and JScript script engines. RegExp lets you use a search string—known as a regular expression—to search for or replace a pattern of text in a string. You should use RegExp—which debuted in late 1999—in your ASP files to parse and process user input before you use that input to query databases or to display data.

You'll recognize the syntax if you're familiar with Perl or grep regular expressions. The characters are described in Table 12-8.

Table 12-8. Abridged VBScript and JScript RegExp syntax.

RegExp Character	Comments
\	Marks the next character as a special character or a literal. For example, "n" matches the character n. "\n" matches a newline character. The sequence "\\" matches \, and "\(" matches (.
^	Matches the beginning of the input.
$	Matches the end of the input.
*	Matches the preceding character zero or more times. For example, "mi*" matches m or the mi in michael.
+	Matches the preceding character one or more times. For example, "mi+" matches michael but not m.
?	Matches the preceding character zero or one times. For example, "a?ve" matches the ve in alive.
{n}	Matches a pattern exactly n times.
{n,m}	Matches a pattern at least n times and no more than m times.
.	Matches any single character except a newline character.
(pattern)	Matches the pattern and remembers the match in the Matches collection.
x|y	Matches x or y. For example, "g|food" matches g or food. "(g|f)ood" matches good or food.
[xyz]	A character set. Matches any one of the enclosed characters. For example, "[abc]" matches the a in plain.
[^xyz]	A negative character set. Matches any character not enclosed. For example, "[^abc]" matches the p in plain.
[a-z]	A range of characters. Matches any character in the specified range. For example, "[a-z]" matches any lowercase alphabetic character in the range a through z.
[^m-z]	A negative range of characters. Matches any character not in the specified range. For example, "[^m-z]" matches any character not in the range m through z.
\b	Matches a word boundary (the position between a word and a space). For example, "er\b" matches the er in never but not the er in verb.
\B	Matches a nonword boundary. For example, "ea*r\B" matches the ear in never early.
\d	Matches a digit character. Equivalent to [0-9].
\D	Matches a nondigit character. Equivalent to [^0-9].
\n	Matches a newline character.
\r	Matches a carriage return character.
\s	Matches any white space, including space, tab, and form-feed. Equivalent to "[ \f\n\r\t\v]".
\S	Matches any non-white space character. Equivalent to "[^ \f\n\r\t\v]".
\w	Matches any word character, including the underscore. Equivalent to "[A-Za-z0-9_]".
\W	Matches any nonword character. Equivalent to "[^A-Za-z0-9_]".
\xn	Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, "\x41" matches A. "\x041" is equivalent to \x04 and 1. Allows ASCII codes to be used in regular expressions.

Stripping a string of all characters other than 0-9, a-z, A-Z, and _

In some cases, you know you're dealing with a limited set of characters—for example, names or country names. In this case, you should remove all unnecessary characters from the string, as shown here:

    Set reg = New RegExp
    reg.Global = True       ' Replace every match, not just the first.
    reg.Pattern = "\W+"     ' Characters that are NOT 0-9,
                            ' a-z, A-Z, or _.
    strUnTainted = reg.Replace(strTainted, "")
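The same filter can be sketched in JavaScript, where the g flag plays the role of the VBScript Global property. The function name stripNonWordChars and the sample input are hypothetical.

```javascript
// Remove every character that is not 0-9, a-z, A-Z, or _.
// The g flag makes replace() act on all matches, not just the first.
function stripNonWordChars(strTainted) {
    return String(strTainted).replace(/\W+/g, "");
}

var strUnTainted = stripNonWordChars("New Zealand<script>");
```

Note that \W also matches spaces, so "New Zealand" becomes "NewZealand". If spaces are legitimate in the field, a wider character class such as [^\w ] is one way to keep them.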

Stripping all text after pipe or redirect operators

Attackers often try to use pipe (|) or redirect (< and >) operators to invoke other programs after some data is processed, especially on UNIX platforms. For example, say that an ASP file takes a user's age as input and uses it to build an SQL query such as select * from useraccounts where age = <%=age%>. This code is flawed because if an attacker enters her age as 34 | someprogram.exe, the following SQL is executed: select * from useraccounts where age = 34 | someprogram.exe, which executes someprogram.exe.

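The following code keeps only the text that precedes the first |, <, or > character, which has the side effect of rendering HTML tags useless.

    Set reg = New RegExp
    reg.Pattern = "^([^<>|]*)[<>|][\s\S]*$"  ' The text before the first |, <, or >,
                                             ' then the operator and all that follows.
    strUnTainted = reg.Replace(strTainted, "$1")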
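A JavaScript sketch of the same idea follows: everything from the first pipe or redirect operator onward is discarded. The function name truncateAtOperator is hypothetical, and the pattern is one possible way to express the rule rather than the only one.

```javascript
// Keep only the text that precedes the first |, <, or > character.
// [\s\S] is used instead of . so that text on later lines is dropped too.
function truncateAtOperator(strTainted) {
    return String(strTainted).replace(/^([^<>|]*)[<>|][\s\S]*$/, "$1");
}

var cleaned = truncateAtOperator("34 | someprogram.exe");
var untouched = truncateAtOperator("34");
```

Input that contains no operator is returned unchanged. The space before the operator survives in the first result, so for a numeric field such as age it is stricter to validate the whole value against a pattern like ^\d+$.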

As a side note, if you want to specifically search for and remove HTML tags, don't search for them using the following expression:

    ' INCORRECT HTML parsing code.
    Set reg = New RegExp
    reg.Pattern = "\<.*\>"  ' A < character followed by zero or
                            ' more characters then a > character.
    strUnTainted = reg.Replace(strTainted, "")

This expression works only if the opening < and closing > are on the same line, because the . character does not match a newline. It also replaces only the first match, and because .* is greedy, that match runs from the first < to the last > on the line. However, it's legal HTML for a tag to span several lines, so you must adjust the parsing code to treat all the lines as a single unit and to replace every tag:

    ' CORRECT HTML parsing code.
    Set reg = New RegExp
    reg.Global = True           ' Replace every tag, not just the first.
    reg.Pattern = "\<[^>]*\>"   ' A < character followed by zero or
                                ' more characters other than >
                                ' (including newlines), then a >.
    strUnTainted = reg.Replace(strTainted, "")
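For comparison, here is a JavaScript sketch of tag stripping that copes with a tag split across lines. It relies on a negated character class, which matches newlines, rather than on the . character, which does not. The function name stripTags is hypothetical, and this is a simple filter, not an HTML parser: a stray < with no closing > is left in place.

```javascript
// Remove anything that looks like an HTML tag, even when the opening <
// and the closing > are on different lines.
function stripTags(strTainted) {
    return String(strTainted).replace(/<[^>]*>/g, "");
}

var sample = "Hello <b\nclass='x'>world</b>!";
var stripped = stripTags(sample);
```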

Stripping invalid text from a filename

Windows 2000 has a function called CreateFile for creating and opening files. However, in Windows 2000 a file is more than a sequence of bytes on a disk drive; it can also be a printer or a serial port. If you use the FileSystemObject object (which uses CreateFile) from an ASP page, you should be sure that any data passed to you from a client that is used as part of the filename is valid and not the name of a printer or a serial port.

The following JScript code strips invalid filenames from a string called strIn:

 var strOut = strIn.replace(/(AUX|PRN|NUL|COM\d|LPT\d)+\s*$/i, "");

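This script removes a trailing run of names matching AUX, PRN, NUL, COMn, and LPTn (where n is a digit), together with any spaces that follow, from the end of the string. Because the pattern is anchored with $ and has no g flag, matching text elsewhere in the string is left alone.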
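An alternative to stripping is to reject such filenames outright. The sketch below is one hypothetical way to do that in JavaScript. The helper name isReservedDeviceName is invented for illustration, CON is added to the list from the text as an assumption of this sketch, and treating a device name followed by an extension (such as prn.txt) as reserved is likewise an assumption.

```javascript
// Device names from the text above, plus CON (added for this sketch).
var RESERVED = /^(AUX|PRN|NUL|CON|COM\d|LPT\d)$/i;

// Hypothetical helper: true if the name, ignoring any extension and
// trailing spaces, is a reserved device name.
function isReservedDeviceName(fileName) {
    var base = String(fileName).split(".")[0];
    return RESERVED.test(base.replace(/\s+$/, ""));
}

var checks = {
    com1: isReservedDeviceName("COM1"),
    prnTxt: isReservedDeviceName("prn.txt"),
    report: isReservedDeviceName("report.txt"),
    company: isReservedDeviceName("company.doc")
};
```

Because the pattern is anchored at both ends, an ordinary name that merely starts with a device name, such as company.doc, is accepted.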

A twist on malicious input: script injection and cross-site scripting

An attack on a Web server doesn't always affect the Web server itself, but might use the Web site as a catalyst for attacking other unsuspecting Web users. This type of attack relies on servers that create dynamic output based on user input. A Web site that is a victim of this attack creates malicious content for other unsuspecting Web surfers and makes them vulnerable to any number of attacks.

Let's say that Attacker A wants to attack User B. Here's how the attack works:

Attacker A finds a vulnerable Web site C, which takes script as input and includes the script in the output. For example, the Web site might take a request such as www.exair.com/search.asp?searchfor=code, where code is specially formatted code that will run in a client's browser. As you'll see, code will be sent to User B's browser because Web site C puts the input data into the output data without first checking its validity.

Attacker A sends User B an anonymous e-mail or a posting in a news forum that contains the specially formatted URL for User B to click. The URL contains malicious client code pointing to Web site C.

When User B clicks on the URL, his or her Web browser loads and attempts to connect to the URL.

Web site C receives the request and evaluates the text after the ? in the URL.

Web site C creates a dynamic page that includes the results of the search and the initial data used in the search. It's common practice for search engines to include the search string in the output to the user.

User B's Web browser loads the dynamically generated HTML, but when it comes time to display the search string, the browser executes the code in the security context of User B. The code might do any of a number of things, such as update cookies on User B's machine (using JavaScript's document.cookie object) so that the attacks continue for a long time.

As you can see, it's utterly imperative that your Web sites parse all user input. Some user input might be an attempt to attack you, while other input might be an attempt to attack others and look like it came from you!
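The difference between a vulnerable page and a safer one can be sketched in a few lines of JavaScript. The functions below are hypothetical stand-ins: htmlEncode imitates the kind of encoding that Server.HTMLEncode performs, and the two render functions represent the search page on Web site C.

```javascript
// Hypothetical stand-in for Server.HTMLEncode: replaces the characters
// that let text be interpreted as markup.
function htmlEncode(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Vulnerable: the search string is copied into the page unchanged.
function renderResultsUnsafe(searchFor) {
    return "<p>You searched for: " + searchFor + "</p>";
}

// Safer: the search string is encoded before it is written out.
function renderResultsSafe(searchFor) {
    return "<p>You searched for: " + htmlEncode(searchFor) + "</p>";
}

// What Web site C receives after User B clicks the crafted URL.
var searchFor = "<script>alert(document.cookie)</script>";

var unsafePage = renderResultsUnsafe(searchFor);
var safePage = renderResultsSafe(searchFor);
```

In the unsafe page the script element reaches User B's browser intact; in the safe page it arrives as inert text that the browser displays rather than runs.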

Reviewing ASP code for vulnerability

You can take the following steps to make sure that your ASP code isn't a catalyst in this type of attack:

Look for all code that generates dynamic output—most notably, Response.Write and <%= %> tags.

Determine whether the output includes input parameters from sources such as

Request.Form

Request.QueryString

Databases

Request.Cookies

Session

You must fix any code that creates output based on untested input. Possible fixes include

Validating or filtering the input with regular expressions (as described earlier)

Using Server.HTMLEncode to encode input parameters

Using Server.URLEncode to encode URLs received as input parameters
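URL encoding is a separate step from HTML encoding, and it applies when user input becomes part of a link. As a rough illustration, the JavaScript sketch below uses the built-in encodeURIComponent function as a counterpart to Server.URLEncode (the two do not produce identical output). The function name buildSearchLink is hypothetical.

```javascript
// Build a link whose query string carries user input. Characters such
// as <, >, and " are percent-encoded so they cannot break out of the URL.
function buildSearchLink(searchFor) {
    return "search.asp?searchfor=" + encodeURIComponent(searchFor);
}

var userInput = '"><script>alert(1)</script>';
var link = buildSearchLink(userInput);
```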

For more information about checking your ASP code, see Knowledge Base article Q253119 at support.microsoft.com/support/kb/articles/q253/1/19.asp. You can also find more information about this type of attack at www.microsoft.com/technet/security/crssite.asp and www.microsoft.com/technet/security/CSOverv.asp.

Some useful regular expressions

You can use the expressions listed in Table 12-9 in your code to check the validity of user input using the RegExp.Test method. For example, the following VBScript code checks a number:

    Set reg = New RegExp
    reg.Pattern = "^\d+$"   ' One or more 0-9's and nothing else.
    bIsOK = reg.Test(strTainted)

Table 12-9. Common regular expressions.

Input Type	Regular Expression
U.S. telephone number Example: (abc) def-ghij	^\(\d{3}\) \d{3}-\d{4}$
U.S. zip code Example: 98006 or 98006-9111	^\d{5}(-\d{4}){0,1}$
Dollar amount Example: $34.12, 34.12, $34, or 34	^\${0,1}\d+(\.\d\d){0,1}$
IP address	^([01]?\d?\d|2[0-4]\d|25[0-5])\.([01]?\d?\d|2[0-4]\d|25[0-5])\.([01]?\d?\d|2[0-4]\d|25[0-5])\.([01]?\d?\d|2[0-4]\d|25[0-5])$
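To show how such expressions might be used together, the following JavaScript sketch wraps patterns based on Table 12-9 in a single lookup. The names patterns and isValid are hypothetical, and the sample values are illustrative only.

```javascript
// Patterns based on Table 12-9, written as JavaScript regular expression
// literals. Each is anchored with ^ and $ so the whole value must match.
var patterns = {
    phone:  /^\(\d{3}\) \d{3}-\d{4}$/,
    zip:    /^\d{5}(-\d{4}){0,1}$/,
    dollar: /^\${0,1}\d+(\.\d\d){0,1}$/,
    ip:     /^([01]?\d?\d|2[0-4]\d|25[0-5])\.([01]?\d?\d|2[0-4]\d|25[0-5])\.([01]?\d?\d|2[0-4]\d|25[0-5])\.([01]?\d?\d|2[0-4]\d|25[0-5])$/
};

// Hypothetical helper: true if the value matches the named pattern.
function isValid(kind, value) {
    return patterns[kind].test(String(value));
}

var zipOk = isValid("zip", "98006-9111");
var ipBad = isValid("ip", "256.1.1.1");
```

Keeping the patterns in one table makes it easier to review them and to apply the same rule wherever a given kind of field is accepted.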

IMPORTANT
If you cannot use regular expressions to verify user input, you should HTML-encode any output based on unparsed user input (for example, with Server.HTMLEncode) so that the browser displays the text rather than interpreting it. Do not rely on the <PRE> tag for this purpose: <PRE> preserves white space and line breaks, but the browser still interprets any tags or script that appear inside it.

Posting Large Amounts of Data to the Server

Look at this HTML form code:

    <form method=POST action=Comments.asp
          enctype="application/x-www-form-urlencoded">
    <table border=0 cellspacing=0 cellpadding=0>
      <tr>
        <td width=131 valign=top>Name:</td>
        <td width=288 valign=top>
          <INPUT TYPE="TEXT" MAXLENGTH="64" SIZE="49" NAME="txtName">
        </td>
      </tr>
      <!-- Other fields, including address and comment -->
      <tr>
        <td><INPUT TYPE="SUBMIT" METHOD="POST"></td>
      </tr>
    </table>
    </form>

Your browser prevents you from entering more than 64 characters of data into the name field. Unfortunately, many Web applications do no other data-size checking because they rely on the browser to do the work. An attacker can quite easily "handcraft" an HTML post to include a very large name—say, 64,000 characters long. If the data is unchecked by the server, all kinds of failures can occur, such as database errors if the data is written to a database or access violations if the data is used by a C++ COM component that has a fixed 64-byte array.

The following Perl code is an example of an attack of this kind:

    use HTTP::Request::Common qw(POST);
    use HTTP::Headers;
    use LWP::UserAgent;

    # Create a new, invalid request.
    $ua = LWP::UserAgent->new();
    $req = POST 'http://www.exair.com/comment.asp',
        [ Name    => 'A' x 64000,
          Address => 'B' x 8192,
          Comment => 'C' x 8192
        ];
    $res = $ua->request($req);

    # Display the error from the server.
    $err = $res->status_line;
    print "$err\n";

    # If interested, display the full response from the server.
    print "Display body? (y/N)";
    print $res->as_string if (getc eq 'y');

The line that causes the damage is $req = POST. It builds up a post with long name (64,000 As), address (8192 Bs), and comment (8192 Cs) fields.

You should always check the size of posted data in your server-side code before you use it as input (to a database, for example). The following sample ASP code performs this task:

    If Len(Request.Form("Address")) >= 128 Then
        Response.Write "Address is too long. Try again."
    Else
        ' Normal processing.
    End If
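When a form has several fields, the same check can be driven from a table of limits. The JavaScript sketch below is one hypothetical way to do this. The 64-character limit for txtName mirrors MAXLENGTH in the form shown earlier; the field names txtAddress and txtComment and their limits are assumptions made for illustration.

```javascript
// Hypothetical per-field limits, in characters.
var limits = { txtName: 64, txtAddress: 128, txtComment: 1024 };

// Returns the names of fields that are missing or longer than allowed.
function findInvalidFields(form, fieldLimits) {
    var invalid = [];
    for (var field in fieldLimits) {
        if (!Object.prototype.hasOwnProperty.call(fieldLimits, field)) continue;
        var value = form[field];
        if (typeof value !== "string" || value.length > fieldLimits[field]) {
            invalid.push(field);
        }
    }
    return invalid;
}

// A handcrafted post similar to the one built by the Perl script.
var post = {
    txtName: new Array(64001).join("A"),   // 64,000 characters
    txtAddress: "1 Main Street",
    txtComment: "Nice site."
};
var rejected = findInvalidFields(post, limits);
```

The server would refuse to process the post if the returned list is not empty, regardless of what the browser's MAXLENGTH attribute claims to enforce.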
