Fingerprinting Authz | HACKING EXPOSED WEB APPLICATIONS, 3rd Edition

Web application authorization can be complex and highly customized. Methodical attackers will thus seek to "fingerprint" the authz implementation first in order to get the lay of the land before launching overt attacks.

Crawling ACLs

The easiest way to check the ACLs across the breadth of a site is to simply crawl it. We discussed web crawling techniques in Chapter 2, including several tools that automate the process (these are sometimes called offline browsers since they retrieve files locally for later analysis). We'll introduce an additional web crawler here called Offline Explorer Pro (from MetaProducts Software Corp.) because it provides better visibility into web ACLs than the ones discussed in Chapter 2.

Like most web crawlers, Offline Explorer Pro (OEP) is pretty simple: just point it at a URL and it grabs all linked resources within the specified depth from the provided URL. The interesting thing about OEP is that it displays the HTTP status code that it receives in response to each request, permitting easy visibility into ACLs on files and folders. For example, in Figure 5-1, OEP's Download Progress pane shows an Error: 401 Unauthorized response, indicating that this resource is ACL'ed and requires authentication.

Figure 5-1: Offline Explorer Pro lists HTTP status codes in the Download Progress pane, indicating resources that might be ACL'ed

OEP also natively supports most popular web authn protocols (including Windows NTLM and HTML forms), which makes it easy to perform differential analysis on the site. Differential analysis involves crawling the site using unauthenticated and authenticated sessions, or sessions authenticated as different users, in order to reveal which portions are protected, and from which users. The authentication configuration option in OEP is a bit hard to find: it's located on the Project Properties page for a given project (File Properties), under the Advanced category, labeled "Passwords." This is shown in Figure 5-2.

Figure 5-2: Offline Explorer Pro's authentication configuration screen
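The same differential analysis can be done on any crawler's output once you have a list of URLs and the status code each one returned. Here is a minimal Python sketch, assuming you have already exported two crawls into dictionaries; the function names and the sample URLs are hypothetical, and we treat 403 Forbidden as an ACL indicator alongside the 401 shown in Figure 5-1.

```python
from typing import Dict, List, Tuple

# Status codes that usually signal an access control decision.
ACL_STATUSES = {401, 403}


def protected_resources(statuses: Dict[str, int]) -> List[str]:
    """Return the URLs whose status code suggests an ACL."""
    return sorted(url for url, code in statuses.items() if code in ACL_STATUSES)


def diff_crawls(first: Dict[str, int], second: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Return (url, first_status, second_status) for URLs whose status differs."""
    shared = first.keys() & second.keys()
    return sorted((url, first[url], second[url]) for url in shared if first[url] != second[url])


if __name__ == "__main__":
    # Hypothetical crawl results: unauthenticated versus logged in as a normal user.
    anonymous = {"/": 200, "/admin/": 401, "/reports/": 401}
    normal_user = {"/": 200, "/admin/": 403, "/reports/": 200}
    print(protected_resources(anonymous))
    print(diff_crawls(anonymous, normal_user))
```

URLs that appear in only one crawl are worth a separate look, since they are probably linked only from authenticated pages.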

Tip	For you command-line junkies, OE.exe can take parameters via the command line.

The only real drawback to this approach is that it only "sees" portions of the web site that are linked from other pages. Thus, you may not get a complete picture using web crawling (for example, the hidden "admin" page may not be linked from any of the site's main pages, and thus be invisible to the crawler). Of course, as we noted in Chapter 2, automated crawling provides a great head start on more rigorous manual analysis that has a better chance of turning up such hidden content. That's the best you can do until someone invents an automated crawler that will perform nifty human tricks like perusing HTML source code for the inevitable hints about hidden directories that developers leave behind.

Identifying Access/Session Tokens

Access/session tokens are sometimes easy to see within web application flows, sometimes not. Table 5-1 lists information commonly found in access/session tokens, along with common abbreviations, to give the reader an idea of what we'll be looking for in later sections.

Table 5-1: Information Commonly Stored in a Web Application Authorization/Session Token
Session Attribute	Common Abbreviation
Username	username, user, uname, customer
User Identifier	id, userid, uid, customerid
User Roles	admin=TRUE/FALSE, role=admin, priv=1
User Profile	profile, prof
Shopping Cart	cart, cartid
Session Identifier	session ID, sid, sessid
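To give a feel for how Table 5-1 gets used in practice, the following Python sketch pulls parameter and cookie names out of a request and flags any that match the table's abbreviations. It is only a starting point: the matching is an exact, case-insensitive comparison (our assumption), the function name is hypothetical, and real applications will use names that are not on this list.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl, urlsplit

# Abbreviations from Table 5-1, grouped by session attribute.
# "sessionid" stands in for the table's "session ID" entry.
TOKEN_HINTS = {
    "Username": ("username", "user", "uname", "customer"),
    "User Identifier": ("id", "userid", "uid", "customerid"),
    "User Roles": ("admin", "role", "priv"),
    "User Profile": ("profile", "prof"),
    "Shopping Cart": ("cart", "cartid"),
    "Session Identifier": ("sessionid", "sid", "sessid"),
}

# Reverse lookup: abbreviation -> attribute.
LOOKUP = {abbr: attr for attr, abbrs in TOKEN_HINTS.items() for abbr in abbrs}


def candidate_tokens(url: str, cookie_header: str = "") -> dict:
    """Map each query parameter or cookie name that matches Table 5-1 to its attribute."""
    names = [name for name, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)]
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    names.extend(cookies.keys())
    return {name: LOOKUP[name.lower()] for name in names if name.lower() in LOOKUP}
```

Anything the function does not flag still deserves a manual look; single-letter or obfuscated names are common.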

COTS Session IDs

Many common off-the-shelf (COTS) web servers have the capability to generate their own pseudo-random session IDs. Table 5-2 lists some common servers and their corresponding session-tracking variables. The IDs generated by more modern servers are generally large enough to preclude guessing attacks, although they are all vulnerable to replay (we'll discuss each of these in the upcoming section on attacking tokens).

Table 5-2: Common COTS Session IDs
Application Server	Session ID Variable Names
IIS	ASPSESSIONID
J2EE-based servers	JSESSIONID
PHP	PHPSESSID
Apache	SESSIONID
ColdFusion	CFID, CFTOKEN, JSESSIONID (runs on top of J2EE)
Miscellaneous	JServSessionID, JWSESSIONID, SESSID, SESSION, SID, session_id
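Table 5-2 lends itself to a quick lookup. The sketch below guesses the platform from a list of cookie names; the function name is ours, and we match on prefixes because classic ASP appends a run of letters to ASPSESSIONID. Treat the result as a hint only, since session variable names can be changed by the administrator.

```python
# (prefix, platform) pairs taken from Table 5-2; compared in uppercase.
SESSION_VARS = [
    ("ASPSESSIONID", "IIS"),
    ("JSESSIONID", "J2EE-based server (or ColdFusion on J2EE)"),
    ("PHPSESSID", "PHP"),
    ("SESSIONID", "Apache"),
    ("CFID", "ColdFusion"),
    ("CFTOKEN", "ColdFusion"),
]


def guess_platform(cookie_names):
    """Return the sorted list of platforms suggested by the cookie names."""
    hits = set()
    for name in cookie_names:
        upper = name.upper()
        for prefix, platform in SESSION_VARS:
            if upper.startswith(prefix):
                hits.add(platform)
    return sorted(hits)
```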

Analyzing Session Tokens

OK, you're fingerprinting a web application's authorization/session management functionality, and you've identified a value that is probably the session token, but it's a visually indecipherable blob of ASCII characters or a jumbled numeric value that offers no immediate visual cues as to how it's being used. Surrender and move on? Of course not! This section discusses some approaches to determining what you're up against.

Even though the session data may not immediately appear to be comprehensible, a little extra analysis (backed by lots of experience!) can reveal subtle clues that in fact enable calculated guessing. For example, some session components tend to be quite predictable because they have a standard format or they behave in a predictable fashion. A datestamp, for example, could be identified by values in the token that continuously increment. We list several common attacks against such deterministic items in Table 5-3.

Table 5-3: Common Session Token Contents
Session Component	Identifying Features	Possible Attacks
Time- and Datestamp	Constantly changes, even if encoded. A literal string, or a number in a 10-digit epoch format.	Changing this value could extend a login period. Replay attacks may depend on this.
Incrementing Number	Changes monotonically with each request.	Changing this value could lead to session hijacking.
User Profile	Encoded forms of known values: first/last name, address, etc.	Session hijacking.
Server IP Address	Four bytes; e.g., 192.168.0.1 could be either 0xC0A80001 (big endian) or 0x0100A8C0 (little endian)	Changing this value would probably break the session, but it helps map out the web server farm.
Client IP Address	Same as server IP address.	Possible dependency for replay attacks and session hijacking.
Salt	May change with each request, may change with each session, or remain static.	Collecting several of these values could lead to guessing secret keys used by the server to encrypt data.
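The IP address rows in Table 5-3 are easy to check mechanically. This sketch computes the big-endian and little-endian hex forms of an IPv4 address, plus its decimal form (an extra encoding we added), and reports which of them appear in a token. The function names and the sample token are hypothetical.

```python
import ipaddress


def ip_encodings(ip: str) -> dict:
    """Return common 4-byte encodings of an IPv4 address."""
    packed = ipaddress.IPv4Address(ip).packed
    big = int.from_bytes(packed, "big")
    little = int.from_bytes(packed, "little")
    return {
        "big_endian_hex": f"{big:08X}",
        "little_endian_hex": f"{little:08X}",
        "big_endian_decimal": str(big),
    }


def find_ip(token: str, ip: str) -> list:
    """Return the names of the encodings of ip that occur inside token."""
    haystack = token.upper()
    return [label for label, value in ip_encodings(ip).items() if value in haystack]


if __name__ == "__main__":
    print(ip_encodings("192.168.0.1"))
    print(find_ip("id=77c0a80001ff", "192.168.0.1"))
```

Run it against your own client address as well as any server addresses you have already identified.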

Tip	Use the GNU date +%s command to view the current epoch time. To convert back to a human-readable format, try the Perl command: perl -e 'use Time::localtime; print ctime(<epoch number>)'
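If you would rather scan a whole token for timestamps, a short Python sketch can pick out 10-digit runs and convert them. The year window is an assumption we use to cut down on false positives; adjust it to suit the application. The function name is hypothetical.

```python
import re
from datetime import datetime, timezone

# A run of exactly ten digits, not embedded in a longer number.
EPOCH_RE = re.compile(r"(?<!\d)\d{10}(?!\d)")


def find_epochs(token: str, earliest_year: int = 2000, latest_year: int = 2035) -> list:
    """Return (digits, ISO 8601 UTC time) for each plausible epoch value in token."""
    results = []
    for match in EPOCH_RE.finditer(token):
        when = datetime.fromtimestamp(int(match.group()), tz=timezone.utc)
        if earliest_year <= when.year <= latest_year:
            results.append((match.group(), when.isoformat()))
    return results
```

Collect the same token over several requests: a value that increases by the number of seconds between requests is almost certainly a timestamp.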

Analyzing Encoding and Encryption

Visually indecipherable blobs of ASCII characters usually mean one of two things: encoding or cryptography is at work. If the former, there is a ray of sunlight. If the latter, your best effort may only allow minimal additional insight into the function of the application.

Defeating Encoding

Base64 is the most popular encoding algorithm used within web applications. If you run into an encoded value that uses upper- and lowercase Roman alphabet characters (A-Z, a-z), the numerals (0-9), and the + and / symbols, and that often ends with one or two = symbols, then the scheme is most likely Base64.

Numerous encoder/decoder tools exist. For example, the Fiddler HTTP analysis tool discussed in Chapter 1 comes with a utility that will encode/decode Base64, URL, and hexadecimal formats.

If you want to write your own Base64 handler, such as for automated session analysis, Perl makes it simple to encode and decode data in Base64. Here are two Perl scripts (actually, two effective lines of Perl) that encode and decode Base64:

 #!/usr/bin/perl
 # be64.pl
 # encode to base 64
 use MIME::Base64;
 print encode_base64($ARGV[0]);

The decoder:

 #!/usr/bin/perl
 # bd64.pl
 # decode from base 64
 use MIME::Base64;
 print decode_base64($ARGV[0]);
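For automated session analysis you usually want to test whether a value is Base64 before decoding it. Here is one way to do that in Python, offered as a sketch: the function name is ours, and the check is deliberately strict (standard alphabet, length a multiple of four), so URL-safe or unpadded variants will be rejected.

```python
import base64
import binascii
import re

# Standard Base64 alphabet with up to two trailing padding characters.
B64_RE = re.compile(r"^[A-Za-z0-9+/]+={0,2}$")


def try_base64(value: str):
    """Return the decoded bytes if value is well-formed Base64, otherwise None."""
    if len(value) % 4 != 0 or not B64_RE.match(value):
        return None
    try:
        return base64.b64decode(value, validate=True)
    except binascii.Error:
        return None


if __name__ == "__main__":
    print(try_base64("dXNlcj1hZG1pbg=="))
```

Short alphanumeric strings such as "user" also pass this test, so check whether the decoded bytes look meaningful before drawing conclusions.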

Analyzing Crypto

Web applications may employ encryption and/or hashing to protect authorization data. The most commonly used algorithms are not trivially reversed, unlike Base64. However, tokens protected this way are still subject to replay and fixation attacks, so it helps the attacker to identify hashed or encrypted values within a token.

For example, the popular hashing algorithm, MD5, is commonly used within web applications. The output of the MD5 algorithm is always 128 bits. Consequently, MD5 hashes can be represented in three different ways:

16-byte Binary Digest: Each byte is a value from 0 to 255 (16 × 8 = 128 bits).
32-byte Hexadecimal Digest: The 32-byte string represents a 128-bit number. Think of four 32-bit numbers, represented in hexadecimal, concatenated in a single string.
22-byte Base64 Digest: The Base64 representation of the 128 bits, with the two trailing = padding characters omitted (24 bytes with padding).
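The three representations are easy to produce and recognize in code. The following Python sketch generates each form for a given input and applies a simple length-and-alphabet test to a candidate string; the function names are hypothetical, and a match only means the value has the right shape for an MD5 digest, not that it is one.

```python
import base64
import hashlib
import re


def md5_forms(data: bytes) -> dict:
    """Return the binary, hexadecimal, and unpadded Base64 forms of an MD5 digest."""
    digest = hashlib.md5(data).digest()
    return {
        "binary": digest,
        "hex": digest.hex(),
        "base64": base64.b64encode(digest).decode("ascii").rstrip("="),
    }


def looks_like_md5(value: str):
    """Return 'hex' or 'base64' if value has the shape of an MD5 digest, else None."""
    if re.fullmatch(r"[0-9a-fA-F]{32}", value):
        return "hex"
    if re.fullmatch(r"[A-Za-z0-9+/]{22}(==)?", value):
        return "base64"
    return None
```

If you suspect a hash, try hashing values you already know (username, user ID, e-mail address) and compare the results with the token.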

An encrypted session token is hard to identify. For example, data encrypted by the Data Encryption Standard (DES) or Triple-DES usually appears random. There's no hard-and-fast rule for identifying the algorithm used to encrypt a string. The ciphertext can be any length, although multiples of eight bytes are common because DES and Triple-DES operate on 8-byte blocks.

We'll talk more about attacking crypto later in this chapter.

Analyzing Numeric Boundaries

When you identify numeric values within a session ID, it can be beneficial to identify the range in which those numbers are valid. For example, if the application gives you a session ID number of 1234567, what can you determine about the pool of numbers that make a valid session ID? Table 5-4 lists several tests and what they can imply about the application.

The benefit of testing for a boundary is that you can determine how difficult it would be to launch a brute-force attack against that particular token. From an input validation or SQL injection point of view, it provides an extra bit of information about the underlying structure of the application.
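The boundary values in Table 5-4 can be generated rather than typed by hand. This Python sketch builds a list of probe values (each limit plus one step beyond it) and maps a limit back to the integer type it implies. The function names are ours, and how you submit the probes and judge success depends entirely on the application.

```python
# (type name, lowest value, highest value) for the integer types in Table 5-4.
INTEGER_TYPES = [
    ("8-bit signed", -2**7, 2**7 - 1),
    ("8-bit unsigned", 0, 2**8 - 1),
    ("16-bit signed", -2**15, 2**15 - 1),
    ("16-bit unsigned", 0, 2**16 - 1),
    ("32-bit signed", -2**31, 2**31 - 1),
    ("32-bit unsigned", 0, 2**32 - 1),
]


def probes() -> list:
    """Return sorted test values: every limit and the value one step past it."""
    values = set()
    for _, low, high in INTEGER_TYPES:
        values.update((low - 1, low, high, high + 1))
    return sorted(values)


def types_with_limit(value: int) -> list:
    """Return the integer types for which value is the upper or negative lower limit."""
    return [
        name
        for name, low, high in INTEGER_TYPES
        if value == high or (low < 0 and value == low)
    ]
```

If the application accepts 32767 but rejects or mangles 32768, for example, a 16-bit signed integer is the likely storage type.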

Table 5-4: Numeric Boundaries
Numeric Test	What a Successful Test Could Mean
Submit various length values consisting of all 9's (e.g., 999, 9999, 99999).	If a value of 20 or more digits is accepted (too large for even a 64-bit integer), the application is most likely using a string storage type.
-128, 127	The session token uses an 8-bit signed integer.
255	The session token uses an 8-bit unsigned integer.
-32768, 32767	The session token uses a 16-bit signed integer.
65535	The session token uses a 16-bit unsigned integer.
-2147483648, 2147483647	The session token uses a 32-bit signed integer.
4294967295	The session token uses a 32-bit unsigned integer.

Differential Analysis

Sometimes it is difficult to craft the right request or even to know what each field does. The authors have had good results with a technique called differential analysis. The technique is simple: crawl the web site with two different accounts and note the differences, such as where the cookies and/or other authorization/state-tracking data differ. For example, some cookie values may reflect differences in profiles or customized settings. Other values, such as ID numbers, might be close together. Still other values might differ based on the permissions of each user.

Note	We provide a real-world example of differential analysis in the "Authorization Attack Case Studies" section later in this chapter.
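Comparing the cookies from two accounts is tedious by eye but trivial in code. Here is a minimal Python sketch, assuming you have captured the Cookie header sent by each account; the function names and sample headers are hypothetical.

```python
from http.cookies import SimpleCookie


def parse_cookies(header: str) -> dict:
    """Turn a Cookie header value into a {name: value} dictionary."""
    cookies = SimpleCookie()
    cookies.load(header)
    return {name: morsel.value for name, morsel in cookies.items()}


def diff_tokens(header_a: str, header_b: str) -> dict:
    """Group cookie names by whether the two accounts share them and agree on the value."""
    a, b = parse_cookies(header_a), parse_cookies(header_b)
    shared = a.keys() & b.keys()
    return {
        "same": sorted(name for name in shared if a[name] == b[name]),
        "differ": sorted(name for name in shared if a[name] != b[name]),
        "only_a": sorted(a.keys() - b.keys()),
        "only_b": sorted(b.keys() - a.keys()),
    }


if __name__ == "__main__":
    print(diff_tokens("UID=1001; role=user; lang=en",
                      "UID=1002; role=admin; lang=en; debug=1"))
```

Cookies that differ, or that only one account receives, are the first candidates for tampering. The same comparison applies to hidden form fields and query parameters.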

Role Matrix

A useful tool to aid the authorization audit process is a role matrix. A role matrix contains a list of all users (or user types) in an application and their corresponding access privileges. The role matrix can help graphically illustrate the relationship between access tokens and ACLs within the application. The idea of the matrix is not necessarily to exhaustively catalog each permitted action, but rather to record notes about how the action is executed and what session tokens the action requires. Table 5-5 has an example matrix.

Table 5-5: An Example Role Matrix
Role	User	Admin
View Own Profile	/profile/view.asp?UID=TB992	/profile/view.asp?UID=MS128
Modify Own Profile	/profile/update.asp?UID=TB992	/profile/update.asp?UID=MS128
View Other's Profile	n/a	/profile/view.asp?UID=MS128&EUID=TB992
Delete User	n/a	/admin/deluser.asp?UID=TB992

The role matrix is similar to a functionality map. When we include the URIs that each user accesses for a particular function, patterns might appear. Notice how the example in Table 5-5 shows that an administrator views another user's profile by adding the EUID parameter. The matrix also helps identify where session information, and consequently authorization methods, are being handled. For the most part, web applications seem to handle session state in a consistent manner throughout the site. For example, an application might rely solely on cookie values, in which case the matrix might be populated with cookie names and values such as AppRole=manager, UID=12345, or IsAdmin=false. Other applications may place this information in the URL, in which case the same values show up as parameters.
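Once the matrix is recorded as data, the patterns can be extracted automatically. The sketch below stores Table 5-5 as a Python dictionary and lists the parameters and paths that only the higher-privileged role uses. The structure and function names are our own choice, not a standard format.

```python
from urllib.parse import parse_qs, urlsplit

# Table 5-5 as data; None marks an action the role cannot perform.
ROLE_MATRIX = {
    "View Own Profile": {"User": "/profile/view.asp?UID=TB992",
                         "Admin": "/profile/view.asp?UID=MS128"},
    "Modify Own Profile": {"User": "/profile/update.asp?UID=TB992",
                           "Admin": "/profile/update.asp?UID=MS128"},
    "View Other's Profile": {"User": None,
                             "Admin": "/profile/view.asp?UID=MS128&EUID=TB992"},
    "Delete User": {"User": None,
                    "Admin": "/admin/deluser.asp?UID=TB992"},
}


def _collect(matrix: dict, role: str):
    """Return (paths, parameter names) seen in every URI recorded for role."""
    paths, params = set(), set()
    for row in matrix.values():
        uri = row.get(role)
        if uri:
            parts = urlsplit(uri)
            paths.add(parts.path)
            params.update(parse_qs(parts.query))
    return paths, params


def privileged_only(matrix: dict, low: str = "User", high: str = "Admin") -> dict:
    """Return the paths and parameters used by the high role but never by the low role."""
    low_paths, low_params = _collect(matrix, low)
    high_paths, high_params = _collect(matrix, high)
    return {
        "paths": sorted(high_paths - low_paths),
        "params": sorted(high_params - low_params),
    }
```

For Table 5-5 this singles out the EUID parameter and /admin/deluser.asp, which are exactly the items a normal user should try to request.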

The matrix helps even more when the application does not use straightforward variable names. For example, the application could simply assign each parameter a single letter, but that doesn't preclude you from modifying the parameter's value in order to bypass authorization. Eventually, you will be able to put together various attack scenarios, which is especially useful when the application contains many tiers of user types.

Next, we'll move on to illustrate some example attacks against web application authorization mechanisms.