Input Validation | Improving Web Application Security: Threats and Countermeasures

Input validation is a challenging issue and the primary burden of a solution falls on application developers. However, proper input validation is one of your strongest measures of defense against today's application attacks. Proper input validation is an effective countermeasure that can help prevent XSS, SQL injection, buffer overflows, and other input attacks.

Input validation is challenging because there is not a single answer for what constitutes valid input across applications or even within applications. Likewise, there is no single definition of malicious input. Adding to this difficulty is that what your application does with this input influences the risk of exploit. For example, do you store data for use by other applications or does your application consume input from data sources created by other applications?

The following practices improve your Web application's input validation:

Assume all input is malicious.
Centralize your approach.
Do not rely on client-side validation.
Be careful with canonicalization issues.
Constrain, reject, and sanitize your input.

Assume All Input Is Malicious

Input validation starts with a fundamental supposition that all input is malicious until proven otherwise. Whether input comes from a service, a file share, a user, or a database, validate your input if the source is outside your trust boundary. For example, if you call an external Web service that returns strings, how do you know that malicious commands are not present? Also, if several applications write to a shared database, when you read data, how do you know whether it is safe?

Centralize Your Approach

Make your input validation strategy a core element of your application design. Consider a centralized approach to validation, for example, by using common validation and filtering code in shared libraries. This ensures that validation rules are applied consistently. It also reduces development effort and helps with future maintenance.

In many cases, individual fields require specific validation, for example, with specifically developed regular expressions. However, you can frequently factor out common routines to validate regularly used fields such as e-mail addresses, titles, names, postal addresses including ZIP or postal codes, and so on. This approach is shown in Figure 4.3.

Figure 4.3: A centralized approach to input validation
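
As a minimal sketch of the centralized approach, the following Python module keeps the rules for regularly used fields in one place so that every caller applies the same check. The field kinds (`us_zip`, `title`), the patterns, and the function name `is_valid` are illustrative assumptions and are not taken from the text.

```python
import re

# Hypothetical shared rules, kept in one module so that every page or
# service applies the same check. The patterns are assumptions.
_PATTERNS = {
    # Five ASCII digits, optionally followed by a hyphen and four more.
    "us_zip": re.compile(r"[0-9]{5}(-[0-9]{4})?"),
    # A small closed set of titles with an optional trailing period.
    "title": re.compile(r"(Mr|Mrs|Ms|Dr)\.?"),
}


def is_valid(field_kind: str, value: str) -> bool:
    """Return True only if value fully matches the shared rule for field_kind."""
    pattern = _PATTERNS.get(field_kind)
    if pattern is None:
        # An unknown field kind is a programming error, not valid input.
        raise KeyError(f"no validation rule registered for {field_kind!r}")
    # fullmatch requires the whole string to match, not just a prefix.
    return isinstance(value, str) and pattern.fullmatch(value) is not None
```

The patterns use `[0-9]` rather than `\d` because, for Python strings, `\d` also matches non-ASCII digits. Adding a new common field then means adding one entry to the shared table rather than writing a new check in each caller.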

Do Not Rely on Client-Side Validation

Server-side code should perform its own validation. What if an attacker bypasses your client, or shuts off your client-side script routines, for example, by disabling JavaScript? Use client-side validation to help reduce the number of round trips to the server but do not rely on it for security. This is an example of defense in depth.

Be Careful with Canonicalization Issues

Data in canonical form is in its most standard or simplest form. Canonicalization is the process of converting data to its canonical form. File paths and URLs are particularly prone to canonicalization issues and many well-known exploits are a direct result of canonicalization bugs. For example, consider the following string that contains a file and path in its canonical form.

 c:\temp\somefile.dat

The following strings could also represent the same file.

 somefile.dat
 c:\temp\subdir\..\somefile.dat
 c:\  temp\   somefile.dat
 ..\somefile.dat
 c%3A%5Ctemp%5Csubdir%5C%2E%2E%5Csomefile.dat

In the last example, characters have been specified in hexadecimal form:

%3A is the colon character.
%5C is the backslash character.
%2E is the dot character.

To avoid canonicalization issues, you should generally not design applications that accept input file names from the user. Consider alternative designs instead. For example, let the application determine the file name for the user.

If you do need to accept input file names, make sure they are strictly formed before making security decisions such as granting or denying access to the specified file.
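
The following Python sketch shows one way to reduce a user-supplied name to canonical form before making a decision about it. It is an illustration under stated assumptions: `BASE_DIR`, `canonicalize`, and `is_inside_base` are hypothetical names, relative names are resolved against `BASE_DIR`, and the standard `ntpath` module is used so that the Windows-style paths shown earlier behave the same on any platform. The normalization is purely lexical; it does not consult the file system or resolve links.

```python
import ntpath
from urllib.parse import unquote

# Hypothetical directory that the application is allowed to read from.
BASE_DIR = r"c:\temp"


def canonicalize(raw_name: str, base_dir: str = BASE_DIR) -> str:
    """Decode, resolve, and normalize a user-supplied file name."""
    # Undo one level of percent-encoding (%3A -> ":", %5C -> "\", %2E -> ".").
    decoded = unquote(raw_name)
    # Relative names are interpreted relative to base_dir (an assumption).
    joined = ntpath.join(base_dir, decoded)
    # Collapse "." and ".." segments, then fold case and separators.
    return ntpath.normcase(ntpath.normpath(joined))


def is_inside_base(raw_name: str, base_dir: str = BASE_DIR) -> bool:
    """Make the access decision on the canonical form, not the raw input."""
    base = ntpath.normcase(ntpath.normpath(base_dir))
    return canonicalize(raw_name, base_dir).startswith(base + "\\")
```

With these definitions, `somefile.dat`, `c:\temp\subdir\..\somefile.dat`, and the percent-encoded variant all canonicalize to `c:\temp\somefile.dat`, while `..\somefile.dat` resolves outside `BASE_DIR` and is refused. Input that has been encoded more than once is decoded only one level here, so the decision should still be made on the final canonical string.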

For more information about how to handle file names and to perform file I/O in a secure manner, see the "File I/O" sections in Chapter 7, "Building Secure Assemblies," and Chapter 8, "Code Access Security in Practice."

Constrain, Reject, and Sanitize Your Input

The preferred approach to validating input is to constrain what you allow from the beginning. It is much easier to validate data for known valid types, patterns, and ranges than it is to validate data by looking for known bad characters. When you design your application, you know what your application expects. The range of valid data is generally a more finite set than potentially malicious input. However, for defense in depth you may also want to reject known bad input and then sanitize the input. The recommended strategy is shown in Figure 4.4.

Figure 4.4: Input validation strategy: constrain, reject, and sanitize input

To create an effective input validation strategy, be aware of the following approaches and their tradeoffs:

Constrain input.
Validate data for type, length, format, and range.
Reject known bad input.
Sanitize input.

Constrain Input

Constraining input is about allowing good data. This is the preferred approach. The idea here is to define a filter of acceptable input by using type, length, format, and range. Define what is acceptable input for your application fields and enforce it. Reject everything else as bad data.

Constraining input may involve setting character sets on the server so that you can establish the canonical form of the input in a localized way.

Validate Data for Type, Length, Format, and Range

Use strong type checking on input data wherever possible, for example, in the classes used to manipulate and process the input data and in data access routines. For example, use parameterized stored procedures for data access to benefit from strong type checking of input fields.

String fields should also be length checked and in many cases checked for appropriate format. For example, ZIP codes, personal identification numbers, and so on have well-defined formats that can be validated using regular expressions. Thorough checking is not only good programming practice; it makes it more difficult for an attacker to exploit your code. The attacker may get through your type check, but the length check may make executing a favorite attack more difficult.
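
One way to layer these checks is to wrap the field in a small type that validates itself on construction, so that downstream code only ever receives a checked value. The sketch below is written in Python and assumes, purely for illustration, that a personal identification number is exactly four ASCII digits; the class name `Pin` and the pattern are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed format for illustration: exactly four ASCII digits.
_PIN_FORMAT = re.compile(r"[0-9]{4}")


@dataclass(frozen=True)
class Pin:
    """A PIN that has passed type, length, and format checks."""

    value: str

    def __post_init__(self) -> None:
        # Type check: refuse anything that is not a string.
        if not isinstance(self.value, str):
            raise TypeError("PIN must be a string")
        # Length check: cheap, and it limits what an attacker can send.
        if len(self.value) != 4:
            raise ValueError("PIN must be exactly 4 characters")
        # Format check: only the digits 0-9 are acceptable.
        if _PIN_FORMAT.fullmatch(self.value) is None:
            raise ValueError("PIN must contain only the digits 0-9")
```

Each check stands on its own, so input that gets past one of them can still be stopped by the next. A numeric field would add a range check after conversion in the same way.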

Reject Known Bad Input

Deny "bad" data, but do not rely completely on this approach. It is generally less effective than the "allow" approach described earlier and is best used in combination with it. Denying bad data assumes that your application knows all the variations of malicious input. Remember that there are multiple ways to represent characters. This is another reason why "allow" is the preferred approach.

While useful for applications that are already deployed and when you cannot afford to make significant changes, the "deny" approach is not as robust as the "allow" approach because bad data, such as patterns that can be used to identify common attacks, does not remain constant. Valid data remains constant while the range of bad data may change over time.
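
The weakness of a deny list can be seen in a small Python sketch. The deny list and the sample strings below are hypothetical; the point is only that the same characters can arrive in more than one representation, and a filter that knows one representation misses the others.

```python
from urllib.parse import unquote

# Hypothetical deny list. Real attack patterns change over time, which is
# exactly why a list like this cannot be treated as complete.
DENY_SUBSTRINGS = ("<script", "javascript:")


def passes_deny_list(value: str) -> bool:
    """Return True if none of the known bad substrings appear in value."""
    lowered = value.lower()
    return not any(bad in lowered for bad in DENY_SUBSTRINGS)


plain = "<script>alert(1)</script>"
encoded = "%3Cscript%3Ealert(1)%3C/script%3E"  # same text, percent-encoded

blocked_plain = not passes_deny_list(plain)               # caught
missed_encoded = passes_deny_list(encoded)                # slips through
blocked_decoded = not passes_deny_list(unquote(encoded))  # caught after decoding
```

Here the plain string is rejected, but the percent-encoded form passes the filter unless it is decoded to canonical form first. An allow list that accepts only expected characters would reject both forms without needing to know about either.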

Sanitize Input

Sanitizing is about making potentially malicious data safe. It can be helpful when the range of input that is allowed cannot guarantee that the input is safe. This includes anything from stripping a null from the end of a user-supplied string to escaping out values so they are treated as literals.

Another common example of sanitizing input in Web applications is using URL encoding or HTML encoding to wrap data and treat it as literal text rather than executable script. HtmlEncode methods escape out HTML characters, and UrlEncode methods encode a URL so that it is a valid URI request.
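
The sketch below illustrates the same ideas with the Python standard library. Note that `html.escape` and `urllib.parse.quote` are Python analogs used here for illustration; they are not the HtmlEncode and UrlEncode methods mentioned above, and the wrapper function names are hypothetical.

```python
import html
from urllib.parse import quote


def strip_trailing_nulls(text: str) -> str:
    """Remove null characters from the end of a user-supplied string."""
    return text.rstrip("\x00")


def html_encode(text: str) -> str:
    """Escape &, <, >, and both quote characters so the browser
    renders the text literally instead of interpreting it as markup."""
    return html.escape(text, quote=True)


def url_encode_component(text: str) -> str:
    """Percent-encode a value for use as a single URL component.
    safe="" means reserved characters such as / are encoded as well."""
    return quote(text, safe="")
```

For example, `html_encode("<script>")` yields `&lt;script&gt;`, and `url_encode_component("a b&c=d/e")` yields `a%20b%26c%3Dd%2Fe`. Which encoding is appropriate depends on where the value is written: HTML encoding for text placed in a page, URL encoding for a value placed in a URL.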

In Practice

The following are examples applied to common input fields, using the preceding approaches:

Last Name field. This is a good example where constraining input is appropriate. In this case, you might allow string data in the range ASCII A-Z and a-z, and also hyphens and curly apostrophes (curly apostrophes have no significance to SQL) to handle names such as O’Dell. You would also limit the length to your longest expected value.
Quantity field. This is another case where constraining input works well. In this example, you might use a simple type and range restriction. For example, the input data may need to be a positive integer between 0 and 1000.
Free-text field. Examples include comment fields on discussion boards. In this case, you might allow letters and spaces, and also common characters such as apostrophes, commas, and hyphens. The set that is allowed does not include less than and greater than signs, brackets, and braces.

Some applications might allow users to mark up their text using a finite set of script characters, such as bold "<b>", italic "<i>", or even include a link to their favorite URL. In the case of a URL, your validation should encode the value so that it is treated as a URL.

For more information about validating free text fields, see "Input Validation" in Chapter 10, "Building Secure ASP.NET Pages and Controls."
An existing Web application that does not validate user input. In an ideal scenario, the application checks for acceptable input for each field or entry point. However, if you have an existing Web application that does not validate user input, you need a stopgap approach to mitigate risk until you can improve your application's input validation strategy. While neither of the following approaches ensures safe handling of input, because that is dependent on where the input comes from and how it is used in your application, they are in practice today as quick fixes for short-term security improvement:
- HTML-encoding and URL-encoding user input when writing back to the client. In this case, the assumption is that no input is treated as HTML and all output is written back in a protected form. This is sanitization in action.
- Rejecting malicious script characters. This is a case of rejecting known bad input. In this case, a configurable set of malicious characters is used to reject the input. As described earlier, the problem with this approach is that bad data is a matter of context.
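
The Last Name, Quantity, and Free-text examples above can be sketched in Python as follows. The maximum lengths, the exact character sets, and the reading of the quantity range as 0 through 1000 inclusive are assumptions made for illustration; adjust them to what your application actually expects.

```python
import re

# Assumed limits; the text says only to use your longest expected value.
LAST_NAME_MAX = 40
FREE_TEXT_MAX = 500
QUANTITY_MIN, QUANTITY_MAX = 0, 1000  # assumed inclusive bounds

# ASCII letters, hyphen, and the curly apostrophe (U+2019).
_LAST_NAME = re.compile("[A-Za-z\u2019-]+")
# Letters, spaces, apostrophes, commas, and hyphens; no < > [ ] { }.
_FREE_TEXT = re.compile(r"[A-Za-z ',-]*")
# One to four ASCII digits, checked before converting to int.
_DIGITS = re.compile(r"[0-9]{1,4}")


def is_valid_last_name(value: str) -> bool:
    """Constrain by length and by an allowed character set."""
    return len(value) <= LAST_NAME_MAX and _LAST_NAME.fullmatch(value) is not None


def parse_quantity(raw: str) -> int:
    """Constrain by type and range; return the integer or raise ValueError."""
    if _DIGITS.fullmatch(raw) is None:
        raise ValueError("quantity must be 1 to 4 ASCII digits")
    quantity = int(raw)
    if not QUANTITY_MIN <= quantity <= QUANTITY_MAX:
        raise ValueError("quantity is out of range")
    return quantity


def is_valid_free_text(value: str) -> bool:
    """Allow only the listed characters, up to a maximum length."""
    return len(value) <= FREE_TEXT_MAX and _FREE_TEXT.fullmatch(value) is not None
```

With these rules, a last name written with a curly apostrophe is accepted while the same name with a straight apostrophe is rejected, a quantity of `1001` raises `ValueError`, and free text containing `<` or `>` is refused.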

For more information and examples of input coding, using regular expressions, and ASP.NET validation controls, see "Input Validation" in Chapter 10, "Building Secure ASP.NET Pages and Controls."