Common Ways Programmers Try to Stop Attacks

The most common way programmers attempt to stop attacks is to HTML-encode an attacker's input before returning it to the Web browser. HTML encoding replaces characters used to create HTML tags, such as angle brackets, with other characters that are not interpreted as special HTML characters. The replacement characters do not affect the way text is displayed in the Web browser; they only stop the HTML rendering engine from recognizing data as HTML tags. So, when <SCRIPT>alert("hi")</SCRIPT> is HTML encoded, it is returned as &lt;SCRIPT&gt;alert(&quot;hi&quot;)&lt;/SCRIPT&gt;. (Table 10-2 lists several characters that are HTML encoded.) This approach often stops XSS attacks, but it does not stop all of them.

Table 10-2: HTML Encoding for Input Characters
Original character	Character after HTML encoding
<	&lt;
>	&gt;
&	&amp;
"	&quot;
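
The encoding in Table 10-2 can be sketched in a few lines of JavaScript. This is only an illustration: the function name htmlEncode is hypothetical, and it covers just the four characters listed in the table, which is an assumption about what a typical encoder does rather than a description of any particular library.

```javascript
// Hypothetical helper: encodes only the four characters listed in Table 10-2.
function htmlEncode(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" };
  // A single pass over the input avoids re-encoding the "&" characters
  // that the replacements themselves introduce.
  return String(text).replace(/[&<>"]/g, (ch) => entities[ch]);
}

const encoded = htmlEncode('<SCRIPT>alert("hi")</SCRIPT>');
console.log(encoded);
// &lt;SCRIPT&gt;alert(&quot;hi&quot;)&lt;/SCRIPT&gt;
```

Note that the single quotation mark is not in the table, so this sketch leaves it unchanged.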

Developers can significantly limit XSS attacks by HTML-encoding all user-supplied data because attackers then often cannot get their data returned from the server as HTML. Encoding is good for security, but many programs want to allow users to use HTML. For example, some Web-based programs such as Web logs and Web-based e-mail systems let users richly format their entries by using HTML tags; however, these applications don't want to allow users to run script. Attempting to block script while allowing other HTML tags is very difficult, and there are many ways to run script without using the <script> tag.

HTML-Encoded Data Doesn't Always Stop the Attack

Programmers can often reduce an attacker's ability to run script by HTML-encoding untrusted data. However, this method won't stop script in all cases. Following are a few situations in which script can run even if the programmer HTML-encodes the attacker's data.

Stuck in a Script Block

Sometimes the attacker's data ends up inside a <script> tag. This usually happens when the data passed in is set as the value of a script variable. For example, look at this code:

 <SCRIPT>  SomeCode  var strEmailAdd = 'attacker data';  MoreCode </SCRIPT>

In this example, attackers don't need to send a <script> tag because their data is already inside a script block. All an attacker needs to do is close the quotation marks in which the variable's value is set. In this example, the programmer of the script chose to use single quotation marks to enclose the value of the string. Single quotation marks aren't modified when the data is HTML encoded. To run script, an attacker could send '; alert('Hi!'); // as the data. The script returned to the browser then would look like this:

 <SCRIPT>  SomeCode  var strEmailAdd = ''; alert('Hi!'); //';  MoreCode </SCRIPT>

Notice that the input closes the value of the string variable strEmailAdd with its first character (a single quotation mark), then uses the statement delimiter (a semicolon), followed by arbitrary code. The data ends with two forward slashes to comment out the rest of the line. Because the input data is always followed by the closing quotation mark and a semicolon (';) in the output HTML, the attacker comments them out to avoid a syntax error in the script that would prevent the exploit from running.
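
The script-block case can be reproduced with a small sketch. The names htmlEncode and renderScriptBlock are hypothetical, the encoder is assumed to handle only the four characters in Table 10-2, and the SomeCode and MoreCode placeholders from the example are omitted.

```javascript
// Hypothetical encoder covering only the characters in Table 10-2.
function htmlEncode(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" };
  return String(text).replace(/[&<>"]/g, (ch) => entities[ch]);
}

// Hypothetical server-side template mirroring the example above: the
// encoded data is placed inside a single-quoted JavaScript string.
function renderScriptBlock(untrusted) {
  return "<SCRIPT> var strEmailAdd = '" + htmlEncode(untrusted) + "'; </SCRIPT>";
}

const payload = "'; alert('Hi!'); //";

// The payload contains none of the encoded characters, so it is unchanged.
console.log(htmlEncode(payload) === payload); // true

console.log(renderScriptBlock(payload));
// <SCRIPT> var strEmailAdd = ''; alert('Hi!'); //'; </SCRIPT>
```

The encoder runs exactly as intended here. The problem is that the data lands in a script context, where the single quotation mark is the character that matters.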

Using Events

In HTML, attributes of a tag can be enclosed in single quotation marks, double quotation marks, or no quotation marks at all (see Figure 10-7). If untrusted data is returned as the attribute of a tag and the data is HTML encoded, an attacker cannot break out of the attribute if the attribute is enclosed in double quotation marks (double quotation marks are converted to &quot;). However, if the HTML author didn't enclose the attribute's value in double quotation marks and is HTML-encoding the user's data, the untrusted data will be confined to the tag, but not to the attribute.

Figure 10-7: Attributes enclosed in single quotation marks, double quotation marks, and no quotation marks

The more knowledge you have (or an attacker has) about HTML, the more effective you will be at finding ways to run script when certain constraints are imposed. For example, most tags have events. When a tag's event occurs, the user-defined script associated with that event runs. In the <input> tag example in Figure 10-7, there are many possible events. One of them is the onclick event. If the untrusted data is HTML encoded and returned in the HTML as follows, script can still run:

 <INPUT name="txtInput2" type="text" value='unTrustedData'>

If OurData' onclick=alert('Hi') junk=' is sent as the untrusted data, the following HTML will be returned:

 <INPUT name="txtInput2" type="text" value='OurData' onclick=alert('Hi') junk=''>

When the user clicks the text box, the onclick event will fire and script will run. Usually, there are many different events for each HTML tag. When you exploit a condition similar to this, it is wise to consult an HTML reference. Sometimes programmers attempt to filter suspicious-looking data, which makes less commonly used events more important to test. By using a less common event, an attacker hopes that the programmer doesn't know about it and therefore leaves it unfiltered.
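
The difference between the two quoting styles can be sketched as follows. The function names are hypothetical, and the encoder is again assumed to cover only the characters in Table 10-2.

```javascript
// Hypothetical encoder covering only the characters in Table 10-2.
function htmlEncode(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" };
  return String(text).replace(/[&<>"]/g, (ch) => entities[ch]);
}

// Hypothetical templates: the same encoded data in a single-quoted
// attribute and in a double-quoted attribute.
function renderSingleQuoted(untrusted) {
  return '<INPUT name="txtInput2" type="text" value=\'' + htmlEncode(untrusted) + "'>";
}
function renderDoubleQuoted(untrusted) {
  return '<INPUT name="txtInput2" type="text" value="' + htmlEncode(untrusted) + '">';
}

// Single quotes are not encoded, so the data escapes the attribute.
console.log(renderSingleQuoted("OurData' onclick=alert('Hi') junk='"));
// <INPUT name="txtInput2" type="text" value='OurData' onclick=alert('Hi') junk=''>

// Double quotes are encoded, so the same trick stays inside the attribute.
console.log(renderDoubleQuoted('OurData" onclick=alert(\'Hi\') junk="'));
// <INPUT name="txtInput2" type="text" value="OurData&quot; onclick=alert('Hi') junk=&quot;">
```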

Using Styles

HTML styles also allow script to be run. Legitimate script in styles is a feature that isn't commonly used, but you should think like an attacker when testing: attackers will use anything to attack. HTML styles are normally used for formatting the page display. For example, the font used in a text box can be specified to be Wingdings by using HTML styles, as shown in Figure 10-8.

Figure 10-8: Using the Style property of the <input> tag to change the font to Wingdings

Expressions in styles can be used to run arbitrary script. For example, <INPUT name="txtInput1" type="text" value="SomeValue" style="font-family:expression(alert('Hi!'))"> will run script. It isn't common to be stuck in a style attribute, but if you are, it could be a way to run script. Styles are more useful in places where the programmer knows to block events but doesn't know about styles.
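
To illustrate how a filter that knows only about events can miss styles, consider the sketch below. The function containsEventHandler and its regular expression are hypothetical, a stand-in for the kind of naive check a programmer might write, and not a recommended defense.

```javascript
// Hypothetical naive filter: flags data that appears to set an event
// handler attribute such as onclick= or onmouseover=.
function containsEventHandler(text) {
  return /\bon[a-z]+\s*=/i.test(String(text));
}

const eventPayload = "OurData' onclick=alert('Hi') junk='";
const stylePayload = "OurData' style=font-family:expression(alert('Hi!')) junk='";

console.log(containsEventHandler(eventPayload)); // true: the filter catches the event
console.log(containsEventHandler(stylePayload)); // false: the style expression slips past
```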

Scripting Protocols

In some situations, untrusted data is HTML-encoded and is returned as the value of the src property of an IMG tag. For example, look at this code:

 <IMG src="untrusted data">

Normally, the data sent would be the filename of a graphics file, such as smiley.gif, or a full URL, such as http://www.example.com/monkey.gif. Sending a URL for a picture won't run script. However, most browsers support JavaScript URLs: the URL begins with javascript: and is followed by code. Often, JavaScript URLs are used in links when the author of the page wants to run some script on the page when the link is clicked. This JavaScript URL syntax can be used to an attacker's advantage.

Almost everywhere a full URL can be placed in a Web page, a JavaScript URL will work. In the preceding example, javascript:alert('Hi!') could be sent as the untrusted data instead of a graphics filename, and a script would run on the page. Angle brackets aren't even needed! The javascript: protocol is the most widely used scripting protocol and should work in most browsers. However, many browsers recognize some additional scripting protocols. For example, older versions of Netscape also support mocha: and livescript:. Internet Explorer currently supports vbscript: in addition to javascript:.
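
The following sketch shows that the payload passes through HTML encoding untouched, and how a tester might flag values that begin with one of the protocols named above. All names are hypothetical, the encoder is assumed to cover only the characters in Table 10-2, and the prefix check is illustrative only, not a complete filter.

```javascript
// Hypothetical encoder covering only the characters in Table 10-2.
function htmlEncode(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" };
  return String(text).replace(/[&<>"]/g, (ch) => entities[ch]);
}

// Hypothetical template mirroring the IMG example above.
function renderImage(untrusted) {
  return '<IMG src="' + htmlEncode(untrusted) + '">';
}

// The scripting protocols mentioned in the text.
const SCRIPTING_PROTOCOLS = ["javascript:", "vbscript:", "mocha:", "livescript:"];

// Illustrative check: does the value start with a scripting protocol?
function usesScriptingProtocol(value) {
  const normalized = String(value).trim().toLowerCase();
  return SCRIPTING_PROTOCOLS.some((protocol) => normalized.startsWith(protocol));
}

console.log(renderImage("javascript:alert('Hi!')"));
// <IMG src="javascript:alert('Hi!')">

console.log(usesScriptingProtocol("javascript:alert('Hi!')")); // true
console.log(usesScriptingProtocol("http://www.example.com/monkey.gif")); // false
```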

Important

To help protect users, Internet Explorer 7 doesn't support scripting protocols as the src property of an image tag.