Ways Programmers Try to Prevent HTML Scripting Attacks

As discussed earlier, the most common way to attempt to stop HTML scripting attacks is to HTML-encode the user-supplied data. Also discussed were situations in which HTML encoding wouldn't stop the attack. Programmers use many other methods to attempt to block HTML scripting attacks. The following sections discuss several different approaches, how each approach attempts to block the attack, and some ways attackers might bypass these attempts.

Filters

Filtering user input is a good idea. However, filters that attempt to block characters that are known to be bad usually fail. All an attacker needs to do to defeat such a filter is find one case that the programmer didn't realize was bad, and then use that character in an attack. Allowing only known good characters (known as whitelisting) is always a better approach.
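The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration: the function names, the blacklist contents, and the whitelist pattern are assumptions made for the example, not taken from any real application.

```python
import re

# Hypothetical blacklist: the characters the programmer remembered to block.
BLACKLIST = set("<>\"")

# Hypothetical whitelist: letters, digits, spaces, and a little punctuation.
WHITELIST_PATTERN = re.compile(r"[A-Za-z0-9 .,_-]*")


def passes_blacklist(value):
    """Accept the input unless it contains a character known to be bad."""
    return not any(ch in BLACKLIST for ch in value)


def passes_whitelist(value):
    """Accept the input only if every character is known to be good."""
    return WHITELIST_PATTERN.fullmatch(value) is not None


# The blacklist forgot the single quotation mark, so this input slips through.
attack = "' onmouseover='alert(1)"
print(passes_blacklist(attack))   # True: the filter missed a bad character
print(passes_whitelist(attack))   # False: the quotation mark is not allowed
```

The blacklist fails as soon as the attacker finds one character the programmer overlooked, whereas the whitelist rejects anything that was not explicitly approved.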

Some filters modify the user's data before returning it. HTML encoding can be considered a form of filtering: the programmer specifically looks for such characters as angle brackets (<>), the ampersand (&), and quotation marks (") and modifies them to their encoded equivalents. Other filters return an error and refuse to process the request if the input includes characters on the blacklist. Following are two examples of different types of filtering.

Removing Strings from Input Before Returning It

A few filters attempt to block script by removing the string script. Consider an application that returns the user's data as the value of the src attribute of the <img> tag. Script could run by using a script protocol like javascript:. Sometimes programmers are also aware of this. In this case, the programmer attempts to block the HTML scripting attack by removing the strings script and mocha from the input before returning it. At first it appears the developer has blocked the attack, but as a security tester you want to be persistent and think about this further because an attacker will. Can you find a way to bypass this filter? If programmers simply make a single pass through the input to remove the blacklisted strings and then return the data, they are in for a surprise.

Consider a string such as AAAscriptBBB. After it passes through the filter, the application returns AAABBB. Getting any ideas? What happens if the input is scriscriptpt? The substring script would be removed, resulting in script! In the <img> tag example, attackers want to send in data that ends up being a scripting protocol; they could send in something like javascripscriptt:alert('Gotcha') and end up with javascript:alert('Gotcha'), resulting in an XSS bug.
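The single-pass behavior is easy to reproduce. The following Python sketch assumes a case-sensitive filter that removes the strings script and mocha; the function names are hypothetical and stand in for whatever the application actually does.

```python
BLACKLISTED_STRINGS = ("script", "mocha")


def remove_once(value):
    """Single pass: remove each blacklisted string, then return the data."""
    for bad in BLACKLISTED_STRINGS:
        value = value.replace(bad, "")
    return value


def remove_until_stable(value):
    """Repeat the removal until the output stops changing."""
    previous = None
    while previous != value:
        previous = value
        value = remove_once(value)
    return value


payload = "javascripscriptt:alert('Gotcha')"
print(remove_once(payload))          # javascript:alert('Gotcha')
print(remove_until_stable(payload))  # java:alert('Gotcha')
```

Repeating the removal defeats this particular trick, but the result is still a blacklist: the sketch does nothing about mixed case or other scripting protocols the programmer did not list.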

Blocking Breaking Out of an Attribute by Escaping

In many cases, user-supplied data is returned as the value of a string variable (see the section titled Stuck in a Script Block earlier in this chapter). The developer needs to make sure an attacker cannot get out of the string variable declaration. In a situation in which the returned data looks like the following, if attacker-controlled data is returned, the developer must ensure an attacker cannot close the single quotation marks around the user input:

 <SCRIPT> var strMyVar = 'user input goes here'; more script appears here </SCRIPT>

Single quotation marks can be escaped using a backslash. For example, if the user input is it's fun testing this app and the application correctly escapes the input, the following is returned:

 <SCRIPT> var strMyVar = 'it\'s fun testing this app'; more script appears here </SCRIPT>

Sometimes programmers won't think much past blocking the attacker from entering a single quotation mark to break out. The backslash that is added in the modified output can sometimes be escaped by the attacker's input. Just as a backslash escapes a single quotation mark, a backslash can also escape another backslash (\\ is treated as one backslash in the string variable). So if the attacker input is \';alert(document.domain);// and the programmer doesn't escape backslashes from the input, the application adds its own backslash before the single quotation mark and the following HTML would result:

 <SCRIPT> var strMyVar = '\\';alert(document.domain);//'; more script appears here </SCRIPT>

This HTML runs script in the browser because the single quotation mark is no longer escaped.
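A short Python sketch makes the difference visible. It assumes two hypothetical server-side escaping routines: one that escapes only single quotation marks, and one that escapes backslashes first. The render helper is likewise an illustration of how the value might be placed into the script block.

```python
def escape_quotes_only(value):
    """Flawed: puts a backslash before each single quotation mark only."""
    return value.replace("'", "\\'")


def escape_backslashes_then_quotes(value):
    """Escape backslashes first, then single quotation marks."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def render(escaped):
    """Place the escaped value inside the string variable declaration."""
    return "var strMyVar = '" + escaped + "';"


# The attacker's input starts with one backslash followed by a quotation mark.
attack = "\\';alert(document.domain);//"

print(render(escape_quotes_only(attack)))
# var strMyVar = '\\';alert(document.domain);//';

print(render(escape_backslashes_then_quotes(attack)))
# var strMyVar = '\\\';alert(document.domain);//';
```

In the first output the two backslashes pair up and the quotation mark closes the string. In the second, the quotation mark is still escaped, so the attacker's text stays inside the string. Escaping backslashes addresses only this one bypass; it does nothing about the other techniques described in this chapter.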

Sometimes it's hard to run meaningful code because of character limitations

In the example in which the developer attempts to block an attacker from breaking out of an attribute by escaping, every time a single quotation mark is used the returned data puts a backslash before it. Although it is possible to run script, it might be difficult to run the script the attacker desires. For example, if the data is also HTML encoded, the attacker is severely limited in which characters to use. It might seem difficult to even run script like alert('Hi');, which would become alert(\'Hi\'); and result in a syntax error.

The location.hash property discussed earlier isn't sent to the server, but is accessible to script running on the page. Because it isn't sent to the server, it won't be filtered by any server-side code. The location.hash property can be used to include characters that are normally modified by server-side filters. In the example, if the data was sent through the query string in a URL such as http://server/filter.asp?input=data, where data is filtered as discussed previously, attackers could insert their own <script> block that would not be affected by the filters with a URL like http://server/filter.asp?input=\';document.open();document.write(location.hash);document.close();//#<SCRIPT>alert("Hi!");</SCRIPT>.
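To see why the server never gets a chance to filter the fragment, it helps to split a URL the way a client does before sending the request. The following Python sketch uses the standard urllib.parse module; the URL is a simplified version of the one above, and request_target is a name chosen for this example.

```python
from urllib.parse import urlsplit

url = 'http://server/filter.asp?input=data#<SCRIPT>alert("Hi!");</SCRIPT>'
parts = urlsplit(url)

# Only the path and query string travel to the server in the request line.
request_target = parts.path + "?" + parts.query

print(request_target)   # /filter.asp?input=data
print(parts.fragment)   # <SCRIPT>alert("Hi!");</SCRIPT>
```

Everything after the # stays on the client, where script already running in the page can read it through location.hash.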

Gaining In-Depth Understanding of the Browser's Parser

In the preceding example in which data is returned as the value of a string variable inside a <script> block, surprisingly it isn't necessary to close the single quotation marks. It turns out that the browser is looking for the </script> tag to close the <script> block. Then everything in between the <script> and </script> tags is treated as script and is checked for syntax errors. If programmers aren't aware of this and think script can't run without breaking out of the single quotation marks, they might not worry about filtering such characters as angle brackets. If an attacker sends in </SCRIPT><SCRIPT>alert(document.domain)</SCRIPT> as the input, the following would be returned:

 <SCRIPT> var strMyVar = '</SCRIPT><SCRIPT>alert(document.domain)</SCRIPT>'; more script appears here </SCRIPT>

The browser would interpret this as two separate <script> blocks. The first one has syntax errors and won't run any code, but the second one is syntactically correct. It will appear as <SCRIPT>alert(document.domain)</SCRIPT> and will run script successfully.
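You can observe the same splitting with an HTML parser outside the browser. The sketch below uses Python's standard html.parser module, which also treats the content of a <script> element as raw text up to the closing tag. It is only an approximation of what a browser does, and the ScriptBlockCollector class is a hypothetical helper written for this example.

```python
from html.parser import HTMLParser


class ScriptBlockCollector(HTMLParser):
    """Collects the raw text of each <script> block as the parser splits it."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._current = None  # list of text pieces while inside a script block

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._current = []

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._current is not None:
            self.blocks.append("".join(self._current))
            self._current = None


page = ("<SCRIPT> var strMyVar = '</SCRIPT><SCRIPT>alert(document.domain)</SCRIPT>'; "
        "more script appears here </SCRIPT>")

collector = ScriptBlockCollector()
collector.feed(page)
collector.close()

for block in collector.blocks:
    print(repr(block.strip()))
```

The parser reports two script blocks: an unterminated string declaration, and the attacker's alert call. The quotation marks play no part in deciding where a block ends.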

This is one example of how you and attackers can take advantage of browser idiosyncrasies. Browsers differ in their nuances, so it is important to study each one carefully.

Tip

Another little-known browser implementation detail is that Internet Explorer ignores NULL characters inside the HTML document, which allows <sc [null] ript> to be interpreted as <script>. Most filters looking for the <script> tag will not interpret <sc [null] ript> as script and will allow it to go through the filter.
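The following Python sketch shows why a simple string match misses this case. Both functions are hypothetical filters written for the illustration, and the second one assumes that removing NULL characters before matching is acceptable for the application.

```python
def naive_contains_script_tag(html_text):
    """Looks for the literal string <script, ignoring case."""
    return "<script" in html_text.lower()


def contains_script_tag_ignoring_nulls(html_text):
    """Removes NULL characters first, mirroring what the browser ignores."""
    return naive_contains_script_tag(html_text.replace("\x00", ""))


payload = "<sc\x00ript>alert(1)</sc\x00ript>"
print(naive_contains_script_tag(payload))            # False: filter is bypassed
print(contains_script_tag_ignoring_nulls(payload))   # True
```

The general lesson is that the filter has to see the data the way the browser will, which is the same problem the character set discussion later in this section raises.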

Comments in Styles

In the section titled Using Styles earlier in this chapter, we demonstrated how to run script using a style expression. Some programmers are aware of this issue. They will specifically block styles that include the string expression. Styles support C-style comments anywhere within the style. For example, the following HTML includes a comment in the style:

 <INPUT name="txtInput1" type="text" value="SomeValue" style="font-family:wingdings /* That funky Wingdings font will be used to display the text */">

Comments can be used to help bypass filters. In the example, the developer is looking for the string expression because it is used to run script through a style. Placing a comment in the middle of the word expression will bypass some filters. For example, the following HTML will run script and bypass a filter that is looking for expression:

 <INPUT name="txtInput1" type="text" value="SomeValue" style="font-family:e/**/xpression(alert('Hi!'))">
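A filter that matches the literal word can be compared with one that strips comments first. This Python sketch is an illustration only: the function names and the regular expression are assumptions, and removing comments does not cover every way a style value can be obfuscated.

```python
import re

# Matches a C-style comment, including an empty one such as /**/.
CSS_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def naive_has_expression(style):
    """Looks for the literal string expression in the style value."""
    return "expression" in style.lower()


def has_expression_ignoring_comments(style):
    """Strips C-style comments before looking for the string expression."""
    return naive_has_expression(CSS_COMMENT.sub("", style))


style = "font-family:e/**/xpression(alert('Hi!'))"
print(naive_has_expression(style))               # False: filter is bypassed
print(has_expression_ignoring_comments(style))   # True
```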

Character Sets

The encoding and filter approaches generally take place on the server before it returns the user-supplied data to the client's Web browser. A challenge of writing an effective server-side filter is enabling the server to recognize the data in the same way the client will. One way to bypass some server-side filters is by getting the server to interpret data using one encoding, but have the client use another. Consider the following sample ASP code (example charset.asp is included on the book's companion Web site):

 <HTML> <HEAD><TITLE>XSS Charset Demo</TITLE></HEAD> <BODY> <% response.write Server.HTMLEncode(Request("Name")) %> </BODY> </HTML>

At first the code looks free of cross-site scripting security holes. The user-supplied data that is returned (the name parameter from the query string) is HTML encoded so an attacker can't get <script> returned to the browser.

The goal of this test case is to send data to the server using an encoding/character set different from the one used by the server. If Unicode Transformation Format 7 (UTF-7) is used to represent the angle brackets, the URL will change from http://server/charset.asp?name=<SCRIPT>alert(document.domain)</SCRIPT> to http://server/charset.asp?name=%2B%41%44%77%2DSCRIPT%2B%41%44%34%2D%61%6C%65%72%74%28document.domain%29%3B%2B%41%44%77%2D%2FSCRIPT%2B%41%44%34%2D. Supplying this URL will not run script on the victim's machine, however, unless the user's browser interprets the page as UTF-7. Most users do not have the UTF-7 encoding selected specifically, although Internet Explorer has a feature that can automatically detect which encoding to apply to the page. This feature is not turned on by default but can be enabled; it is recommended that users who want multilanguage support enable this feature: in Internet Explorer, on the View menu, select Encoding, and then click Auto-Select.

With the Auto-Select feature enabled, the preceding UTF-7 data is returned from the Web server and is interpreted as the <script> tag, resulting in script running. This technique isn't limited to UTF-7. You can typically find ways to bypass any filtering logic on the server any time the browser interprets data that uses an encoding or character set different from the one the server uses to filter it.
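The mismatch can be reproduced offline. In the following Python sketch, html.escape stands in for Server.HTMLEncode (an assumption: the two functions are not identical, but both leave plus signs, hyphens, and letters untouched), and decoding the result as UTF-7 plays the part of a browser that has chosen that encoding.

```python
import html
from urllib.parse import unquote

# The value of the name parameter from the URL in the text.
query_value = ("%2B%41%44%77%2DSCRIPT%2B%41%44%34%2D%61%6C%65%72%74%28document.domain%29"
               "%3B%2B%41%44%77%2D%2FSCRIPT%2B%41%44%34%2D")

received = unquote(query_value)      # what the server sees after URL decoding
encoded = html.escape(received)      # stand-in for Server.HTMLEncode

# What a client sees if it interprets the returned bytes as UTF-7.
as_utf7 = encoded.encode("ascii").decode("utf-7")

print(received)   # +ADw-SCRIPT+AD4-alert(document.domain);+ADw-/SCRIPT+AD4-
print(encoded)    # unchanged: there is nothing for the encoder to escape
print(as_utf7)    # <SCRIPT>alert(document.domain);</SCRIPT>
```

The server-side encoder never sees an angle bracket, so it has nothing to encode. The brackets come into existence only when the client decodes the response.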

Internet Explorer will not auto-select the character set if the HTTP response specifies a character set in the Content-Type header or in the meta portion of the HTML returned. Opera and Netscape both support multiple character sets, but don't seem to have the Auto-Select feature present in Internet Explorer.
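As a rough sketch of what specifying the character set looks like, the following Python function builds a response for the charset demo page with the character set declared in both places. The build_response function, its return shape, and the choice of utf-8 are assumptions made for this example; a real application would set the header through its Web framework.

```python
import html


def build_response(name, charset="utf-8"):
    """Return (headers, body) with the character set declared in both places."""
    headers = [("Content-Type", "text/html; charset=" + charset)]
    body = (
        "<HTML> <HEAD>"
        '<META http-equiv="Content-Type" content="text/html; charset=' + charset + '">'
        "<TITLE>XSS Charset Demo</TITLE></HEAD> "
        "<BODY> " + html.escape(name) + " </BODY> </HTML>"
    )
    return headers, body


headers, body = build_response("+ADw-SCRIPT+AD4-alert(document.domain);+ADw-/SCRIPT+AD4-")
print(headers[0])
```

With the character set stated explicitly, the UTF-7 text in the body is displayed as the literal characters the attacker sent rather than being reinterpreted as markup.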

Tip

RSnake maintains an extensive set of test cases for HTML script attacks on his Web site at http://ha.ckers.org/xss.html.

ASP.NET Built-in Filters

Microsoft ASP.NET 1.1 introduces a feature, named ValidateRequest, to help stop attacks from reaching vulnerable ASP.NET code; this feature is enabled by default. When the ValidateRequest property is enabled, the query string and POST data are inspected before being passed to the code contained in the ASP.NET page. If the data is suspicious, an exception is thrown. Data that ValidateRequest perceives as suspicious includes <script>, onload=, and style=. Figure 10-16 shows an example of an error page that ASP.NET displays if the server hasn't disabled error messages or caught the exception.

Figure 10-16: An ASP.NET exception, which is thrown if input that might lead to an HTML scripting attack is encountered

This filter certainly blocks many attacks, but won't stop everything. The bug in the ASP code still exists, but there is a road block preventing you from getting to the vulnerable code easily.

Important

Built-in filters such as the ASP.NET filter stop many attacks, but you should not rely on them exclusively to prevent HTML scripting bugs. It is still worth fixing flaws in code because the built-in filters will not prevent all attacks.