Web application authorization can be complex and highly customized. Methodical attackers will therefore seek to "fingerprint" the authz implementation first, to get the lay of the land before launching overt attacks.
The easiest way to check the ACLs across the breadth of a site is to simply crawl it. We discussed web crawling techniques in Chapter 2, including several tools that automate the process (these are sometimes called offline browsers since they retrieve files locally for later analysis). We'll introduce an additional web crawler here called Offline Explorer Pro (from MetaProducts Software Corp.) because it provides better visibility into web ACLs than the ones discussed in Chapter 2.
Like most web crawlers, Offline Explorer Pro (OEP) is pretty simple: just point it at a URL and it grabs all linked resources within the specified depth from that URL. What makes OEP interesting is that it displays the HTTP status code it receives in response to each request, which makes ACLs on files and folders easy to see. For example, in Figure 5-1, OEP's Download Progress pane shows an Error: 401 Unauthorized response, indicating that this resource is ACL-ed and requires authentication.
OEP also natively supports most popular web authn protocols (including Windows NTLM and HTML forms), which makes it easy to perform differential analysis on the site. Differential analysis involves crawling the site with unauthenticated and authenticated sessions, or with sessions authenticated as different users, to reveal which portions are protected and from which users. The authentication configuration option in OEP is a bit hard to find: it's located on the Project Properties page for a given project (File > Properties), under the Advanced category, labeled "Passwords." This is shown in Figure 5-2.
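The comparison at the heart of differential analysis is easy to script once you have the status codes from two crawls. The following Python sketch assumes each crawl has already been reduced to a dictionary mapping URL to HTTP status code (for example, transcribed from OEP's Download Progress pane or logged by any other crawler). The function names `classify` and `diff_crawls` are hypothetical, and the status-code categories are only a rough heuristic.

```python
from typing import Dict, List, Tuple


def classify(status: int) -> str:
    """Map an HTTP status code to a rough access category."""
    if status in (401, 407):
        return "auth-required"      # server demands credentials
    if status == 403:
        return "forbidden"          # request understood but refused
    if 200 <= status < 300:
        return "accessible"
    if 300 <= status < 400:
        return "redirect"           # frequently a bounce to a login form
    return "other"


def diff_crawls(first: Dict[str, int], second: Dict[str, int]) -> List[Tuple[str, str, str]]:
    """List (url, first_category, second_category) wherever the two crawls disagree."""
    rows = []
    for url in sorted(set(first) | set(second)):
        a = classify(first[url]) if url in first else "not-seen"
        b = classify(second[url]) if url in second else "not-seen"
        if a != b:
            rows.append((url, a, b))
    return rows


# Hypothetical results from an anonymous crawl and a logged-in crawl.
anonymous = {"/index.html": 200, "/members/": 401, "/admin/": 401}
logged_in = {"/index.html": 200, "/members/": 200, "/admin/": 403,
             "/members/profile.asp": 200}
for row in diff_crawls(anonymous, logged_in):
    print(row)
```

Resources that flip from auth-required to accessible are protected from anonymous users; "not-seen" entries are links the crawler only discovered once logged in.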
Tip | For you command-line junkies, OE.exe can take parameters via the command line. |
The only real drawback to this approach is that it only "sees" portions of the web site that are linked from other pages. Thus, you may not get a complete picture using web crawling (for example, the hidden "admin" page may not be linked from any of the site's main pages, and thus be invisible to the crawler). Of course, as we noted in Chapter 2, automated crawling provides a great head start on more rigorous manual analysis that has a better chance of turning up such hidden content. That's the best you can do until someone invents an automated crawler that will perform nifty human tricks like perusing HTML source code for the inevitable hints about hidden directories that developers leave behind.
Access/session tokens are sometimes easy to see within web application flows, sometimes not. Table 5-1 lists information commonly found in access/session tokens, along with common abbreviations, to give the reader an idea of what we'll be looking for in later sections.
Session Attribute | Common Abbreviation |
---|---|
Username | username, user, uname, customer |
User Identifier | id, *id, userid , uid, *uid, customerid |
User Roles | admin=TRUE/FALSE, role=admin, priv=1 |
User Profile | profile, prof |
Shopping Cart | cart, cartid |
Session Identifier | session ID, sid, sessid |
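When a site sets dozens of cookies and parameters, a quick name-based triage can point you at the likely candidates first. The Python sketch below turns Table 5-1 into regular expressions. The patterns and the `classify_name` helper are illustrative assumptions, not a standard list, and an application is free to use names that match none of them.

```python
import re
from typing import Dict, List

# Case-insensitive name patterns adapted from Table 5-1. They are heuristics:
# a match marks a name worth a closer look, not a confirmed session attribute.
PATTERNS: Dict[str, str] = {
    "session identifier": r"sess(ion)?_?id|^sid$",
    "user identifier": r"(^|user|customer|u)id$",
    "username": r"^(user(name)?|uname|customer)$",
    "user role": r"admin|role|priv",
    "user profile": r"^prof(ile)?$",
    "shopping cart": r"cart",
}


def classify_name(name: str) -> List[str]:
    """Return every Table 5-1 category whose pattern matches a cookie or parameter name."""
    return [category for category, pattern in PATTERNS.items()
            if re.search(pattern, name, re.IGNORECASE)]


for name in ("PHPSESSID", "UserID", "isAdmin", "cartid", "lang"):
    print(name, classify_name(name))
```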
Many common off-the-shelf (COTS) web servers can generate their own pseudo-random session IDs. Table 5-2 lists some common servers and their corresponding session-tracking variables. The IDs generated by more modern servers are generally large enough to preclude guessing attacks, although they are all vulnerable to replay (we'll discuss both attacks in the upcoming section on attacking tokens).
Application Server | Session ID Variable Names |
---|---|
IIS | ASPSESSIONID |
J2EE-based servers | JSESSIONID |
PHP | PHPSESSID |
Apache | SESSIONID |
ColdFusion | CFID, CFTOKEN, JSESSIONID (runs on top of J2EE) |
Miscellaneous | JServSessionID, JWSESSIONID, SESSID, SESSION, SID, session_id |
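Because these names are defaults, a session cookie's name alone often hints at the server platform. A minimal Python sketch, assuming you have the raw `Cookie` header from a captured request; the prefix table and the `guess_platform` helper are illustrative, and an administrator can always rename the cookie.

```python
from http.cookies import SimpleCookie
from typing import List, Tuple

# Distinctive default session-cookie names from Table 5-2. Generic names such as
# SESSIONID or SID are omitted because they say little about the platform.
SESSION_COOKIE_PREFIXES = {
    "ASPSESSIONID": "IIS (classic ASP)",
    "JSESSIONID": "J2EE-based server (possibly ColdFusion)",
    "PHPSESSID": "PHP",
    "CFID": "ColdFusion",
    "CFTOKEN": "ColdFusion",
}


def guess_platform(cookie_header: str) -> List[Tuple[str, str]]:
    """Return (cookie name, platform hint) pairs for a Cookie header string."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    hints = []
    for name in jar:
        for prefix, platform in SESSION_COOKIE_PREFIXES.items():
            # Prefix match, because classic ASP appends extra characters to the name.
            if name.upper().startswith(prefix):
                hints.append((name, platform))
    return hints


print(guess_platform("PHPSESSID=abc123; ASPSESSIONIDQQGGQGPC=XYZ; lang=en"))
```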
OK, you're fingerprinting a web application's authorization/session management functionality, and you've identified a value that is probably the session token, but it's a visually indecipherable blob of ASCII characters or a jumbled numeric value that offers no immediate visual cues as to how it's being used. Surrender and move on? Of course not! This section discusses some approaches to determining what you're up against.
Even though the session data may not immediately appear comprehensible, a little extra analysis (backed by lots of experience!) can reveal subtle clues that enable calculated guessing. Some session components tend to be quite predictable because they have a standard format or behave in a predictable fashion. A datestamp, for example, could be identified by values in the token that continuously increment. Table 5-3 lists several common attacks against such deterministic items.
Session Component | Identifying Features | Possible Attacks |
---|---|---|
Time- and Datestamp | Constantly changes, even if encoded. A literal string, or a number in a 10-digit epoch format. | Changing this value could extend a login period. Replay attacks may depend on this. |
Incrementing Number | Changes monotonically with each request. | Changing this value could lead to session hijacking. |
User Profile | Encoded forms of known values: first/last name, address, etc. | Session hijacking. |
Server IP Address | Four bytes; e.g., 192.168.0.1 could be either 0xC0A80001 (big endian) or 0x0100A8C0 (little endian) | Changing this value would probably break the session, but it helps map out the web server farm. |
Client IP Address | Same as server IP address. | Possible dependency for replay attacks and session hijacking. |
Salt | May change with each request, may change with each session, or remain static. | Collecting several of these values could lead to guessing secret keys used by the server to encrypt data. |
Tip | Use the GNU date +%s command to view the current epoch time. To convert back to a human-readable format, try the Perl command: perl -e 'use Time::localtime; print ctime(<epoch number>)' |
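A few of the identifying features in Table 5-3 can be checked mechanically. The Python sketch below tests a candidate token field against two of them: a 10-digit epoch timestamp, and a 4-byte value that might be an IP address in either byte order. It assumes you have already isolated the field (decimal for the timestamp, hexadecimal for the address). The function names are hypothetical, and a match only means the value is consistent with that interpretation. `datetime.fromtimestamp` does the same job as the Perl `ctime` one-liner in the tip.

```python
import datetime
import ipaddress
from typing import Optional, Tuple


def as_epoch(field: str) -> Optional[datetime.datetime]:
    """Treat a 10-digit decimal field as Unix epoch seconds and convert it to UTC."""
    if len(field) == 10 and field.isascii() and field.isdigit():
        return datetime.datetime.fromtimestamp(int(field), tz=datetime.timezone.utc)
    return None


def as_ipv4(field: str) -> Optional[Tuple[str, str]]:
    """Read an 8-hex-digit field as an IPv4 address in big- and little-endian order."""
    try:
        raw = bytes.fromhex(field)
    except ValueError:
        return None
    if len(raw) != 4:
        return None
    big = ipaddress.IPv4Address(raw)            # most significant byte first
    little = ipaddress.IPv4Address(raw[::-1])   # least significant byte first
    return str(big), str(little)


print(as_epoch("1000000000"))   # 2001-09-09 01:46:40+00:00
print(as_ipv4("C0A80001"))      # ('192.168.0.1', '1.0.168.192')
```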
Visually indecipherable blobs of ASCII characters usually mean one of two things: encoding or cryptography is at work. If the former, there is a ray of sunlight. If the latter, your best effort may only allow minimal additional insight into the function of the application.
Defeating Encoding
Base64 is the most popular encoding algorithm used within web applications. If you run into an encoding scheme that uses upper- and lowercase Roman alphabet characters (A-Z, a-z), the numerals (0-9), and the + and / symbols, and whose strings often end with one or two = padding characters, the scheme is most likely Base64.
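These rules of thumb translate into a quick heuristic check. This Python sketch assumes padded, standard-alphabet Base64; the URL-safe variant swaps + and / for - and _, and some applications strip the padding, so adjust accordingly. Expect false positives: any alphanumeric string whose length is a multiple of four, such as `abcd`, passes, so treat a hit as a reason to try decoding rather than as proof. The `looks_like_base64` name is hypothetical.

```python
import base64
import binascii
import re

# Standard Base64 alphabet, with at most two "=" padding characters at the end.
BASE64_SHAPE = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def looks_like_base64(token: str) -> bool:
    """Heuristic: True if token is padded standard Base64 that decodes cleanly."""
    if len(token) % 4 != 0 or not BASE64_SHAPE.fullmatch(token):
        return False
    try:
        base64.b64decode(token, validate=True)
    except binascii.Error:
        return False
    return True


print(looks_like_base64("dXNlcj1hZG1pbg=="))
print(looks_like_base64("user=admin"))
```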
Numerous encoder/decoder tools exist. For example, the Fiddler HTTP analysis tool discussed in Chapter 1 comes with a utility that will encode/decode Base64, URL, and hexadecimal formats.
If you want to write your own Base64 handler, such as for automated session analysis, Perl makes it simple to encode and decode data in Base64. Here are two Perl scripts (actually, two effective lines of Perl) that encode and decode Base64. The encoder:
#!/usr/bin/perl
# be64.pl
# encode to base 64
use MIME::Base64;
print encode_base64($ARGV[0]);
The decoder:
#!/usr/bin/perl
# bd64.pl
# decode from base 64
use MIME::Base64;
print decode_base64($ARGV[0]);
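For automated session analysis you may prefer a single routine that tries several encodings at once, much as Fiddler's utility handles Base64, URL, and hexadecimal formats. This Python sketch is an alternative to the Perl scripts above, not a port of Fiddler, and the `try_decodings` name is hypothetical. It returns every decoding that succeeds, because some strings (a hex string, for instance) are also valid Base64, and only the decoded bytes tell you which reading makes sense.

```python
import base64
import urllib.parse
from typing import Dict


def try_decodings(token: str) -> Dict[str, bytes]:
    """Attempt Base64, hexadecimal, and URL decoding; keep whichever succeed."""
    results: Dict[str, bytes] = {}

    # Base64: restore any stripped "=" padding, then decode strictly.
    padded = token + "=" * (-len(token) % 4)
    try:
        results["base64"] = base64.b64decode(padded, validate=True)
    except ValueError:          # binascii.Error is a subclass of ValueError
        pass

    # Hexadecimal: two hex digits per byte.
    try:
        results["hex"] = bytes.fromhex(token)
    except ValueError:
        pass

    # URL (percent) encoding: only meaningful if escapes are present.
    if "%" in token:
        results["url"] = urllib.parse.unquote_to_bytes(token)

    return results


for candidate in ("dXNlcj1hZG1pbg", "757365723d61646d696e", "user%3Dadmin"):
    print(candidate, try_decodings(candidate))
```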
Analyzing Crypto
Web applications may employ encryption and/or hashing to protect authorization data. The most commonly used algorithms cannot be trivially decoded the way Base64 can. However, tokens protected this way are still subject to replay and fixation attacks, so it can be helpful to the attacker to identify hashed or encrypted values within a token.
For example, the popular MD5 hashing algorithm is commonly used within web applications. The output of MD5 is always 128 bits, so MD5 hashes are typically represented in one of three ways:
16-byte binary digest: Each byte is a value from 0 to 255 (16 × 8 = 128 bits).
32-byte hexadecimal digest: The 32-character string represents a 128-bit number, with each hex digit encoding 4 bits (32 × 4 = 128). Think of four 32-bit numbers, represented in hexadecimal, concatenated into a single string.
22-byte Base64 digest: The Base64 representation of the 128 bits, which is 22 characters long without padding (24 characters when the trailing == padding is kept).
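Python's standard `hashlib` and `base64` modules make it easy to compare the three representations side by side. The sketch hashes the empty string (its MD5 value is well known, so the output is easy to check) and adds a length-and-alphabet check. `could_be_md5` is a hypothetical helper, and any 128-bit value (an MD5 hash, a random 16-byte ID, or a single AES block) has exactly the same shape, so a match narrows the possibilities rather than proving MD5.

```python
import base64
import hashlib
import re

digest = hashlib.md5(b"").digest()             # 16 raw bytes (16 x 8 = 128 bits)
hex_form = digest.hex()                         # 32 hex characters
b64_form = base64.b64encode(digest).decode()    # 24 characters including "=="
b64_unpadded = b64_form.rstrip("=")             # 22 characters

print(hex_form)    # d41d8cd98f00b204e9800998ecf8427e
print(b64_form)    # 1B2M2Y8AsgTpgAmY7PhCfg==

# Shape checks for a 128-bit value in hex or Base64 form.
HEX_128 = re.compile(r"[0-9a-fA-F]{32}")
B64_128 = re.compile(r"[A-Za-z0-9+/]{22}(==)?")


def could_be_md5(value: str) -> bool:
    """True if value has the length and alphabet of a hex or Base64 128-bit digest."""
    return bool(HEX_128.fullmatch(value) or B64_128.fullmatch(value))
```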
An encrypted session token is hard to identify. For example, data encrypted with the Data Encryption Standard (DES) or Triple-DES usually appears random. There's no hard-and-fast rule for identifying the algorithm used to encrypt a string, and there is no inherent limit on the length of the ciphertext. However, DES and Triple-DES are block ciphers that operate on 8-byte blocks, so ciphertext lengths that are multiples of eight bytes are a useful hint.
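Once a token has been decoded to raw bytes, two cheap measurements can back up the "appears random" impression: byte-level Shannon entropy, and whether the length divides evenly by common block sizes (8 bytes for DES and Triple-DES, 16 bytes for AES). A Python sketch with hypothetical helper names; neither test identifies an algorithm, and a short sample cannot score high entropy (n bytes can reach at most log2(n) bits per byte), so pool many tokens before reading much into the number.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(data: bytes) -> float:
    """Average information content in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def block_size_hints(data: bytes) -> List[int]:
    """Common cipher block sizes (in bytes) that evenly divide the token length."""
    return [size for size in (8, 16) if len(data) % size == 0]


sample = bytes(range(256))
print(shannon_entropy(sample), block_size_hints(sample))
```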
We'll talk more about attacking crypto later in this chapter.
When you identify numeric values within a session ID, it can be beneficial to identify the range in which those numbers are valid. For example, if the application gives you a session ID of 1234567, what can you determine about the pool of numbers that make up valid session IDs? Table 5-4 lists several tests and what they can imply about the application.
The benefit of testing for a boundary is that you can determine how difficult it would be to launch a brute-force attack against that particular token. From an input validation or SQL injection point of view, it provides an extra bit of information about the underlying structure of the application.
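The boundaries in Table 5-4 are simply the limits of two's-complement integers, and the same arithmetic gives a first estimate of brute-force effort. A Python sketch; `int_range` and `expected_guesses` are hypothetical helpers, and the guessing estimate assumes random guesses with replacement against uniformly scattered IDs, which a real token generator may or may not resemble.

```python
from typing import Tuple


def int_range(bits: int, signed: bool) -> Tuple[int, int]:
    """Minimum and maximum of a two's-complement integer of the given width."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def expected_guesses(keyspace: int, valid_ids: int = 1) -> float:
    """Average random guesses (with replacement) before hitting any valid ID,
    assuming valid IDs are scattered uniformly through the keyspace."""
    return keyspace / valid_ids


for bits in (8, 16, 32, 64):
    print(bits, int_range(bits, signed=True), int_range(bits, signed=False))

# A 16-bit unsigned token with 256 concurrent sessions falls in about 256 guesses.
low, high = int_range(16, signed=False)
print(expected_guesses(high - low + 1, valid_ids=256))
```

The 64-bit unsigned maximum, 18446744073709551615, has 20 digits, and twenty 9's exceed it, which is consistent with reading acceptance of a 20-digit value as a sign of string storage.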
Numeric Test | What a Successful Test Could Mean |
---|---|
Submit values of various lengths consisting of all 9's (e.g., 999, 9999, 99999). | If a 20-digit value is accepted, the application is most likely using a string storage type. |
-128, 127 | The session token uses an 8-bit signed integer. |
255 | The session token uses an 8-bit unsigned integer. |
-32768, 32767 | The session token uses a 16-bit signed integer. |
65535 | The session token uses a 16-bit unsigned integer. |
-2147483648, 2147483647 | The session token uses a 32-bit signed integer. |
4294967295 | The session token uses a 32-bit unsigned integer. |
Sometimes it is difficult to craft the right request or even to know which fields are which. The authors have used a technique called differential analysis with considerable success. The technique is very simple: you essentially crawl the web site with two different accounts and note the differences, such as where the cookies and/or other authorization/state-tracking data differ. For example, some cookie values may reflect differences in profiles or customized settings. Other values, ID numbers for one, might be close together. Still other values might differ based on the permissions of each user.
Note | We provide a real-world example of differential analysis in the "Authorization Attack Case Studies" section later in this chapter. |
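The cookie half of differential analysis is also easy to automate. A Python sketch, assuming you have captured the `Cookie` header sent by each of two accounts at the same point in the application; the cookie names and values in the example are made up for illustration, and `diff_state` is a hypothetical helper.

```python
from http.cookies import SimpleCookie
from typing import Dict, Optional, Tuple


def parse_cookies(header: str) -> Dict[str, str]:
    """Turn a Cookie header string into a name -> value dictionary."""
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}


def diff_state(first: str, second: str) -> Dict[str, Tuple[Optional[str], Optional[str], Optional[int]]]:
    """Report cookies that differ between two accounts' sessions.

    Each entry is (first value, second value, numeric gap); the gap is filled in
    only when both values are plain integers, to spot closely spaced IDs.
    """
    a, b = parse_cookies(first), parse_cookies(second)
    report = {}
    for name in sorted(set(a) | set(b)):
        va, vb = a.get(name), b.get(name)
        if va == vb:
            continue
        gap = None
        if va is not None and vb is not None and va.isdigit() and vb.isdigit():
            gap = int(vb) - int(va)
        report[name] = (va, vb, gap)
    return report


print(diff_state("UID=12345; AppRole=user; lang=en",
                 "UID=12347; AppRole=manager; lang=en"))
```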
A useful tool to aid the authorization audit process is a role matrix. A role matrix contains a list of all users (or user types) in an application and their corresponding access privileges. The role matrix can help graphically illustrate the relationship between access tokens and ACLs within the application. The idea of the matrix is not necessarily to exhaustively catalog each permitted action, but rather to record notes about how the action is executed and what session tokens the action requires. Table 5-5 has an example matrix.
Action | User | Admin |
---|---|---|
View Own Profile | /profile/view.asp?UID=TB992 | /profile/view.asp?UID=MS128 |
Modify Own Profile | /profile/update.asp?UID=TB992 | /profile/update.asp?UID=MS128 |
View Another User's Profile | n/a | /profile/view.asp?UID=MS128&EUID=TB992 |
Delete User | n/a | /admin/deluser.asp?UID=TB992 |
The role matrix is similar to a functionality map. When we include the URIs that each user accesses for a particular function, patterns might appear. Notice how the example in Table 5-5 shows that an administrator views another user's profile by adding the EUID parameter. The matrix also helps identify where session information, and consequently authorization methods, are being handled. For the most part, web applications tend to handle session state consistently throughout the site. For example, an application might rely solely on cookie values, in which case the matrix might be populated with cookie names and values such as AppRole=manager, UID=12345, or IsAdmin=false. Other applications may place this information in the URL, in which case the same values show up as URL parameters.
The matrix helps even more when the application does not use straightforward variable names. For example, the application could simply assign each parameter a single letter, but that doesn't prevent you from modifying the parameter's value in order to bypass authorization. Eventually, you will be able to put together various attack scenarios, which is especially useful when the application contains many tiers of user types.
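Keeping the matrix as data rather than as a document makes it easy to generate test cases from it. The Python sketch below encodes Table 5-5 and derives two things the text describes: the higher-privilege requests worth replaying with a lower-privilege session, and the parameters (such as EUID) that distinguish one role's request from another's. The data layout and the helper names are assumptions made for illustration.

```python
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlsplit

# Table 5-5 as data: action -> role -> URI, with None where the role has no access.
ROLE_MATRIX: Dict[str, Dict[str, Optional[str]]] = {
    "View Own Profile": {"User": "/profile/view.asp?UID=TB992",
                         "Admin": "/profile/view.asp?UID=MS128"},
    "Modify Own Profile": {"User": "/profile/update.asp?UID=TB992",
                           "Admin": "/profile/update.asp?UID=MS128"},
    "View Another User's Profile": {"User": None,
                                    "Admin": "/profile/view.asp?UID=MS128&EUID=TB992"},
    "Delete User": {"User": None, "Admin": "/admin/deluser.asp?UID=TB992"},
}


def escalation_candidates(matrix: Dict[str, Dict[str, Optional[str]]],
                          low: str, high: str) -> List[str]:
    """URIs the higher role uses for actions the lower role should not perform.
    Each one is a request to replay with the lower-privilege user's session."""
    return [row[high] for row in matrix.values()
            if row.get(low) is None and row.get(high)]


def extra_params(base_uri: str, other_uri: str) -> List[str]:
    """Query parameters present in other_uri but not in base_uri."""
    base = parse_qs(urlsplit(base_uri).query)
    other = parse_qs(urlsplit(other_uri).query)
    return sorted(set(other) - set(base))


print(escalation_candidates(ROLE_MATRIX, "User", "Admin"))
print(extra_params(ROLE_MATRIX["View Own Profile"]["Admin"],
                   ROLE_MATRIX["View Another User's Profile"]["Admin"]))   # ['EUID']
```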
Next, we'll move on to illustrate some example attacks against web application authorization mechanisms.