Chapter 1: Hacking Web Apps 101 | HACKING EXPOSED WEB APPLICATIONS, 3rd Edition

This chapter provides a brief overview of the "who, what, when, where, how, and why" of web application hacking. It's designed to set the stage for the subsequent chapters of the book, which will delve much more deeply into the details of web application attacks and countermeasures. We'll also introduce the basic web application hacking toolset, since these tools will be used throughout the rest of the book for numerous purposes.

What Is Web Application Hacking?

We're not going to waste much time defining "web application": unless you've been hiding under a rock for the last ten years, you likely have firsthand experience with dozens of web applications (Google, Amazon.com, Hotmail, and so on). For a broader background, look up "web application" on Wikipedia.org. We're going to stay focused here and cover purely security-relevant items as quickly and succinctly as possible.

We define a web application as one that is accessed via the HyperText Transfer Protocol, or HTTP (see "References and Further Reading" at the end of this chapter for background reading on HTTP). Thus, the essence of web hacking is tampering with applications via HTTP. There are three simple ways to do this:

Directly manipulating the application via its graphical web interface
Tampering with the Uniform Resource Identifier, or URI
Tampering with HTTP elements not contained in the URI

GUI Web Hacking

Many people are under the impression that web hacking is geeky technical work best left to younger types who inhabit dark rooms and drink lots of Mountain Dew. Thanks to the intuitive graphical user interface (GUI, or "gooey") of web applications, this is not necessarily so.

Here's how easy it can be. In Chapter 7, we'll discuss one of the most devastating classes of web app attacks: SQL injection. Although its underpinnings are somewhat complex, the basic details of SQL injection are available to anyone willing to search the Web for information about it. Such a search usually turns up instructions on how to perform a relatively simple attack that can bypass the login page of a poorly written web application: inputting a simple set of characters that causes the login function to return "access granted" every time! Figure 1-1 shows how easily this sort of attack can be implemented using the simple GUI provided by a sample web application called Hacme Bank from Foundstone, Inc.

Figure 1-1: Entering the string ' OR 1=1 -- bypasses the login screen for Foundstone's sample Hacme Bank application. Yes, it can be this easy!
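To see why such a short string can work, here is a minimal sketch of a login check that pastes user input straight into its SQL text. We don't know how Hacme Bank builds its query, so everything here is an assumption made for illustration: the users table, the sample account, and the function names are hypothetical, and an in-memory SQLite database stands in for the real back end.

```python
import sqlite3

def make_demo_db():
    # In-memory database holding one hypothetical account.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")
    return conn

def vulnerable_login(conn, username, password):
    # User input is concatenated directly into the SQL statement.
    query = (
        "SELECT COUNT(*) FROM users WHERE username = '" + username +
        "' AND password = '" + password + "'"
    )
    return conn.execute(query).fetchone()[0] > 0

def parameterized_login(conn, username, password):
    # Input is bound as data, so quotes and comment markers lose their meaning.
    query = "SELECT COUNT(*) FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchone()[0] > 0

conn = make_demo_db()
payload = "' OR 1=1 --"
print(vulnerable_login(conn, payload, ""))     # True
print(parameterized_login(conn, payload, ""))  # False
```

In the vulnerable version, the quote closes the username string, OR 1=1 makes the WHERE clause true for every row, and -- comments out the password check. The parameterized version treats the same characters as an ordinary (and nonexistent) username.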

Some purists are no doubt scoffing at the notion of performing "true" web app hacking using just the browser, and sure enough, we'll describe many tools later in this chapter and throughout this book that vastly improve upon the capabilities of the basic web browser, enabling industrial-strength hacking. However, don't be too dismissive. In our combined years of web app hacking experience, it's really the basic logic of the application that hackers are trying to defeat, no matter what tools are used to do it. In fact, some of the most elegant attacks we've seen involved only a browser.

Even better, such attacks are also likely to provide the greatest impetus to the web application administrator/developer/manager/executive to fix the problem. There is usually no better way of demonstrating the gravity of a vulnerability than by illustrating how to exploit it with a tool that nearly everyone on the planet is familiar with.

URI Hacking

For those of you waiting for the more geeky technical hacking stuff, here we go.

Anyone who's used a computer in the last five years would instantly recognize the most common example of a Uniform Resource Identifier: it's the string of text that appears in the address bar of your favorite browser when you surf the Web, the thing that usually looks something like "http://www.somethingorother.com".

From a more technical perspective, RFC 2396 describes the structure and syntax of URIs (as well as subcategories, including the more commonly used term Uniform Resource Locator, or URL). Per RFC 2396, URIs are composed of the following pieces:

 scheme://authority/path?query

Translating this into more practical terms, the URI describes a protocol (scheme) for accessing a resource (path) or application (query) on a server (authority). For web applications, the protocol is almost invariably HTTP (the major exception being the "secure" version of HTTP, called HTTPS, in which the session data is protected by either the SSL or TLS protocols; see "References and Further Reading" for more information).

Caution

Standard HTTPS (without client authentication) does nothing for the overall security of a web application other than to make it more difficult to eavesdrop on or interfere with the traffic between client and server.

The server is one or more computers running HTTP software (usually specified by its DNS name, like www.somesite.com), the path describes the hierarchy of folders or directories where application files are located, and the query includes the parameters that need to be fed to application executables stored on the server(s).

Note	Everything to the right of the "?" in a URI is called the query string.

The HTTP client (typically a web browser) simply requests these resources, and the server responds. We've all seen this performed a million times by our favorite web browser, so we won't belabor the point further. Here are some concrete examples:

http://server/file.html
http://server/folder/application?parameter1=value1&parameter2=value2
http://www.webhackingexposed.com/secret/search.php?input=foo&user=joel
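The scheme://authority/path?query breakdown is easy to check for yourself. The following is a small sketch using Python's standard urllib.parse module (our choice of tool for illustration, not something the URI specification requires) to split the last example above into its pieces.

```python
from urllib.parse import urlsplit, parse_qs

uri = "http://www.webhackingexposed.com/secret/search.php?input=foo&user=joel"
parts = urlsplit(uri)

print(parts.scheme)  # http
print(parts.netloc)  # www.webhackingexposed.com  (the authority)
print(parts.path)    # /secret/search.php
print(parts.query)   # input=foo&user=joel

# The query string decodes into parameter names and lists of values.
params = parse_qs(parts.query)
print(params)        # {'input': ['foo'], 'user': ['joel']}
```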

As we noted earlier, web hacking is as simple as manipulating the URI in clever ways. Here are some simple examples of such manipulation:

https://server/folder/../../../../cmd.exe
http://server/folder/application?parameter1=[aaaaa...256 a's...]
http://server/folder/application?parameter1=<script>'alert'</script>

If you can guess what each of these attacks might do, then you're practically an expert web hacker already! If you don't quite get it yet, we'll demonstrate graphically in a moment. First, we have a few more details to clarify.
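As a partial hint for the first two examples, here is a rough sketch of what a server-side component might end up with after naively processing such URIs. The helper names are hypothetical, and real web servers differ widely in how (and whether) they normalize paths; posixpath.normpath is used here only to show how the ../ sequences collapse.

```python
import posixpath
from urllib.parse import urlsplit, parse_qs

def resolved_path(uri):
    # Collapse "." and ".." segments the way a naive path handler might.
    return posixpath.normpath(urlsplit(uri).path)

def query_params(uri):
    # Decode the query string into parameter names and lists of values.
    return parse_qs(urlsplit(uri).query)

# The dot-dot-slash sequences climb out of /folder entirely.
print(resolved_path("https://server/folder/../../../../cmd.exe"))  # /cmd.exe

# A parameter far longer than the application may have planned for.
long_uri = "http://server/folder/application?parameter1=" + "a" * 256
print(len(query_params(long_uri)["parameter1"][0]))  # 256
```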

Methods, Headers, and Body

There's a bit more going on under the covers than the URI lets on (but not much!). HTTP is a stateless request-response protocol. In addition to the information in the URI (everything to the right of the protocol://domain), there is also the method used in the request, several protocol headers, and the data carried in the body. None of these are visible within the URI, but they are important to understanding web applications.

HTTP methods are the type of action performed on the target resource. The HTTP RFC defines a handful of methods, and the Web Distributed Authoring and Versioning (WebDAV) extension to HTTP defines even more. But most web applications use just two: GET and POST. GET requests information. Both GET and POST can send information to the server. There is one important difference. GET leaves all the data in the URI, while POST places the data in the body of the request (not visible in the URI). POST is usually used to submit form data to an application, such as with an online shopping application that asks for name, shipping address, and payment method. It's a common misunderstanding to assume that because of this lack of visibility, POST somehow protects data better than GET. As we'll demonstrate endlessly throughout this book, this is generally a faulty assumption (although sending sensitive information on the query string using GET does open more possibilities for exposing the data in various places, including the client cache and web server logs).
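The difference in where the data travels can be sketched without sending anything over the network. The example below builds (but never sends) one GET and one POST request with Python's standard urllib; the form fields and their values are made up for illustration, and the URL reuses the placeholder from the examples above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields, encoded the way a browser encodes form data.
form = {"name": "Joel", "address": "1 Main St"}
encoded = urlencode(form)  # name=Joel&address=1+Main+St

# GET: the data rides in the URI's query string and there is no body.
get_req = Request("http://server/folder/application?" + encoded)

# POST: the URI stays clean and the same data travels in the request body.
post_req = Request(
    "http://server/folder/application",
    data=encoded.encode("ascii"),
    method="POST",
)

print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

Either way, the same bytes leave the client, and anyone who can tamper with the request can change them just as easily in the body as in the URI.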

HTTP headers are usually used to store additional information about the protocol-level transaction. Some security-relevant examples of HTTP headers include

Authorization Defines whether certain types of authentication are used with the request, which doubles as authorization data in many instances (such as with Basic authentication).
Cache-control Defines whether a copy of the response may be cached, for example on intermediate proxy servers.
Referer (The misspelling is deliberate, per the HTTP RFC) Lists the source URI from which the browser arrived at the current link. Sometimes used in primitive, and trivially defeatable, authorization schemes.
Cookies Commonly used to store custom application authentication/session tokens. We'll talk a lot about these in this book.
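To make the Authorization entry a little more concrete, here is a short sketch of how a Basic authentication header value is assembled and how easily it is reversed. The credentials and function names are hypothetical; the point is only that Base64 is an encoding, not encryption.

```python
import base64

def basic_auth_header(username, password):
    # Basic authentication is "username:password", Base64-encoded.
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

def decode_basic_auth(header_value):
    # Anyone who can see the header can recover the credentials.
    scheme, token = header_value.split(" ", 1)
    username, password = base64.b64decode(token).decode("utf-8").split(":", 1)
    return username, password

header = basic_auth_header("joel", "secret")
print("Authorization:", header)
print(decode_basic_auth(header))
```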

Here's a glimpse of HTTP "under the covers" provided by the popular netcat tool. We first connect to the www.test.com server on TCP port 80 (the standard port for HTTP; HTTPS is TCP 443), and then we request the /test.html resource. The URI for this request would be http://www.test.com/test.html.

 C:\> nc -vv www.test.com 80
 www.test.com [10.124.72.30] 80 (http) open
 GET /test.html HTTP/1.0

 HTTP/1.1 200 OK
 Date: Mon, 04 Feb 2002 01:33:20 GMT
 Server: Apache/1.3.22 (Unix)
 Connection: close
 Content-Type: text/html

 <HTML><HEAD><TITLE>TEST.COM</TITLE>etc.

In this example, it's easy to see the method (GET) in the request, the response headers (Server: and so on), and response body data (<HTML> and so on). Generally, hackers don't need to get to this level of granularity with HTTP in order to be proficient; they just use off-the-shelf tools that automate all this low-level work and expose it for manipulation if required. We'll illustrate this graphically in the upcoming section on "how" web applications are attacked.
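As a rough idea of what those tools do under the hood, the sketch below splits a raw HTTP response into its status line, headers, and body. The response text is copied from the netcat session above (minus the trailing "etc."), the function name is our own, and the parsing is deliberately simplistic compared to a real HTTP library.

```python
# Raw response text, taken from the netcat session shown earlier.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Mon, 04 Feb 2002 01:33:20 GMT\r\n"
    "Server: Apache/1.3.22 (Unix)\r\n"
    "Connection: close\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<HTML><HEAD><TITLE>TEST.COM</TITLE>"
)

def split_response(raw):
    # A blank line separates the status line and headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        # Split on the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return status_line, headers, body

status_line, headers, body = split_response(RAW_RESPONSE)
print(status_line)        # HTTP/1.1 200 OK
print(headers["Server"])  # Apache/1.3.22 (Unix)
print(body)               # <HTML><HEAD><TITLE>TEST.COM</TITLE>
```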