Identifying Interesting Data | Hunting Security Bugs

Sometimes data might not appear to be of any interest to malicious users because the values are not readable or understandable. However, upon further investigation, you might discover that the developer made only a weak attempt, as discussed later, to protect the data. Also, items you might not traditionally think of as data can disclose sensitive information. For instance, monitor burn-in can reveal data that was displayed on the screen for a long period of time, CPU usage reports can indicate the hours when a user works, screen captures published in documentation can show sensitive information, and attackers can use a technique called van Eck phreaking, or TEMPEST (http://en.wikipedia.org/wiki/Van_Eck_Phreaking), to eavesdrop on the contents of a monitor using its electronic emissions. It's up to you to identify the data and interpret how it can be useful to attackers.

Obfuscating Data

Data might be obfuscated, so an information disclosure bug might not be easily detected. Obfuscation refers to the process of modifying data so it isn't easy to understand or read but can still be interpreted, if you know how. For example, data can be encoded to prevent it from appearing in clear text. However, using data obfuscation schemes does not protect the data; it just makes the bugs harder for the tester to find, while the scheme remains easy for the attacker to break once the attacker figures out the encoding.

For instance, data sent across the network might look like %50%41%53%53word=%66%6f%6f. Not too readable, right? However, the data was merely encoded in hexadecimal format: each % is followed by the two-digit hexadecimal code of one character. If you know that and then decode the data, it reveals PASSword=foo. Other types of encoding schemes might not be as simple; however, once the attacker figures out what the encoding scheme is, the game is over.
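As a minimal sketch of how little effort the decoding takes, the following uses Python's standard urllib.parse.unquote on the sample string from the text (with the stray spaces removed). The variable names are illustrative only. Decoded literally, the hexadecimal escapes spell the first four letters in uppercase.

```python
from urllib.parse import unquote

# Sample captured value from the text; each %XX is the hex code of one character.
captured = "%50%41%53%53word=%66%6f%6f"

# unquote() replaces every %XX escape with the character it stands for.
decoded = unquote(captured)
print(decoded)  # PASSword=foo
```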

Here are some common schemes developers might use to protect sensitive data, but make no mistake, they are not protecting anything:

Hexadecimal
Base64
Rot-13/Rot-n
XOR
MIME
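The sketch below shows how a tester might quickly try several of these schemes against a suspicious value. It is an illustration under stated assumptions, not a complete decoder: the helper names (candidate_decodings, xor_bytes, xor_keys_revealing) are hypothetical, the XOR case assumes a single-byte key, and the sample value password=foo is reused from the earlier example.

```python
import base64
import codecs


def candidate_decodings(blob: str) -> dict:
    """Try a few common encodings and return the ones that yield ASCII text."""
    results = {}
    try:
        results["hex"] = bytes.fromhex(blob).decode("ascii")
    except ValueError:  # not valid hex, or not ASCII once decoded
        pass
    try:
        results["base64"] = base64.b64decode(blob, validate=True).decode("ascii")
    except ValueError:  # not valid Base64, or not ASCII once decoded
        pass
    # Rot-13 always "succeeds"; a human still has to judge whether it is readable.
    results["rot13"] = codecs.decode(blob, "rot_13")
    return results


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with the same single-byte key (assumed scheme)."""
    return bytes(b ^ key for b in data)


def xor_keys_revealing(data: bytes, marker: bytes) -> list:
    """Brute-force all 256 single-byte keys; keep those that expose the marker."""
    return [key for key in range(256) if marker in xor_bytes(data, key)]


print(candidate_decodings("cGFzc3dvcmQ9Zm9v")["base64"])  # password=foo
print(candidate_decodings("cnffjbeq=sbb")["rot13"])       # password=foo
```

Because a single-byte XOR key has only 256 possible values, searching for a word you expect to see (such as "password") recovers the key almost instantly.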

Chapter 12, "Canonicalization Issues," goes into greater depth about these types of encoding, but data obfuscation is worth introducing here because you need to understand that the data might be encoded. The data might also be compressed using something like the ZLIB or GZIP algorithms, which make the data harder to recognize but which can easily be uncompressed. Developers might also use their own custom schemes to try to protect the data. The point is that you should know what the data is and how it is being used. If you find data that can easily fall into the hands of an attacker and it is protected only by obfuscation techniques like the examples mentioned, log a bug and get it fixed.
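To illustrate how easily compressed data can be recovered, here is a small sketch using Python's standard gzip and zlib modules. The function name try_decompress is hypothetical. It relies on the fact that GZIP data starts with the two bytes 0x1f 0x8b, and otherwise simply attempts a ZLIB decompression.

```python
import gzip
import zlib
from typing import Optional


def try_decompress(blob: bytes) -> Optional[bytes]:
    """Return the uncompressed bytes if blob is GZIP or ZLIB data, else None."""
    if blob[:2] == b"\x1f\x8b":  # GZIP magic number
        try:
            return gzip.decompress(blob)
        except (OSError, EOFError, zlib.error):
            return None
    try:
        return zlib.decompress(blob)  # fails fast on data without a ZLIB header
    except zlib.error:
        return None


hidden = zlib.compress(b"password=foo")
print(try_decompress(hidden))  # b'password=foo'
```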

Implied Disclosures

Implied disclosures occur when an attacker makes logical guesses to gain access to a resource. Suppose you just bought two phone cards and scratched off the gray area to reveal their activation numbers. If those numbers are sequential, you can probably guess the activation number of the next phone card in the store display. However easy and far-fetched that might seem, it happens. For instance, if you have a Web application that sets the session ID as a client cookie, it might be possible to hijack someone else's session by guessing that user's ID.
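A tester can check for this kind of predictability by collecting a few identifiers and looking at the differences between them. The sketch below is a simplified illustration: the function name predict_next and the sample values are made up, and it only detects a constant step between consecutive numeric IDs.

```python
from typing import List, Optional


def predict_next(observed: List[int]) -> Optional[int]:
    """Return the likely next ID if all consecutive IDs differ by one constant step."""
    if len(observed) < 3:
        return None  # too few samples to call the pattern anything but coincidence
    steps = {later - earlier for earlier, later in zip(observed, observed[1:])}
    if len(steps) == 1:
        return observed[-1] + steps.pop()
    return None


# Hypothetical session IDs gathered by logging in three times in a row.
print(predict_next([1000417, 1000418, 1000419]))  # 1000420
```

If the function returns a value for IDs issued by your application, an attacker could do the same arithmetic, which is a bug worth logging.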

Note

One weekend, I (Bryan Jeffries) was writing an application that interacted with a Web site to programmatically send requests to a form. To prevent people from doing this, the site used a technique called completely automated public Turing test to tell computers and humans apart (CAPTCHA). The goal of CAPTCHA is to make sure the user is a human, not a computer. For example, CAPTCHA could be used on a site that provides free e-mail accounts to prevent an automated script from creating many accounts for spammers to use. In my situation, the Web site showed an image containing six random digits that the human user had to enter for the operation to succeed. When I right-clicked the image, I noticed the filename was also a six-digit number, but not the same value as the one shown on-screen. After refreshing the screen a few times and making note of the number displayed and the image's filename, I realized that the difference between the two numbers remained constant. Thus, I could easily predict the random CAPTCHA number shown on-screen by looking at the image filename and adding to it the constant difference I had determined previously between the two numbers.
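The arithmetic behind that observation can be sketched as follows. All numbers here are invented for illustration (the note does not give the real values), the function names are hypothetical, and the sketch assumes the sum never needs to wrap around past six digits.

```python
from typing import List, Optional, Tuple


def constant_offset(samples: List[Tuple[int, int]]) -> Optional[int]:
    """Given (filename_number, displayed_number) pairs, return their shared difference."""
    offsets = {shown - name for name, shown in samples}
    return offsets.pop() if len(offsets) == 1 else None


def predict_captcha(filename_number: int, offset: int, digits: int = 6) -> str:
    """Predict the on-screen value from the image filename alone."""
    return str(filename_number + offset).zfill(digits)


# Made-up observations recorded after refreshing the page three times.
samples = [(481516, 604973), (230042, 353499), (118200, 241657)]
offset = constant_offset(samples)
print(offset)                           # 123457
print(predict_captcha(700001, offset))  # 823458
```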

Following are more examples of implied disclosures:

Hidden Web pages that are not linked to
- http://www.alpineskihouse.com/admin
- http://www.alpineskihouse.com/admin.asp

Common account or user names and passwords
- Default user names and passwords that are never changed
- If username_read exists, perhaps so does username_write

Backup copies of files (file.bak, file.old, taxes05.tax.bak, etc.)
Using Bitwise XOR to protect secrets
Using cookies or HTTP headers to send secrets
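Several of these guesses are mechanical enough to generate in bulk when testing your own application. The following sketch only builds lists of candidate names and makes no network requests. The helper names, the suffix lists, and the choice of page names are assumptions drawn from the examples above, not an exhaustive set.

```python
from urllib.parse import urljoin

BACKUP_SUFFIXES = (".bak", ".old")        # from the backup-file examples above
ACCOUNT_SUFFIXES = ("_read", "_write")    # from the username_read example above


def hidden_page_candidates(base_url: str, names=("admin", "admin.asp")) -> list:
    """Build URLs for pages that might exist even though nothing links to them."""
    return [urljoin(base_url, name) for name in names]


def backup_candidates(filename: str) -> list:
    """Guess the names of backup copies left next to a known file."""
    return [filename + suffix for suffix in BACKUP_SUFFIXES]


def sibling_accounts(account: str) -> list:
    """If an account ends in a known suffix, guess its counterparts."""
    for suffix in ACCOUNT_SUFFIXES:
        if account.endswith(suffix):
            base = account[: -len(suffix)]
            return [base + other for other in ACCOUNT_SUFFIXES if other != suffix]
    return []


print(hidden_page_candidates("http://www.alpineskihouse.com/"))
print(backup_candidates("taxes05.tax"))
print(sibling_accounts("username_read"))
```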