The Sin Explained

As we've already mentioned, there are two parts to the information leakage sin. Privacy is a topic that concerns a great many users, but we feel it's largely outside the scope of this book. We do believe you should carefully consider the requirements of your user base, being sure to solicit opinions on your privacy policies. But in this chapter, we'll largely ignore those issues and look at the ways in which you can accidentally leak information that is valuable to an attacker.

Side Channels

There are plenty of times when an attacker can glean important information about data by measuring information that the design team wasn't aware was being communicated. Or, at least, the design team wasn't aware that there were potential security implications!

There are two primary forms of these so-called side channel issues: timing channels and storage channels. With timing channels, the attacker learns about the secret internal state of a system by measuring how long operations take to run. For example, in Sin 11 we explain a simple timing side channel in the TENEX login system, in which the attacker could figure out information about a password by timing how long the system took to respond to bad passwords. If the attacker got the first letter of the password right, the system responded faster than if he got the first letter wrong.

The basic problem occurs when an attacker can time the durations between messages, where message contents are dependent on secret data. It all sounds pretty esoteric, but it can be practical in some situations, as we will see.
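To make the TENEX-style flaw concrete, here is a minimal sketch (in Python, with hypothetical function names) of a string comparison that leaks timing, next to a constant-time alternative. The early-exit loop returns sooner the earlier the first mismatch occurs, which is exactly what a timing attacker measures:

```python
import hmac

def leaky_compare(secret: str, guess: str) -> bool:
    # Returns at the first mismatch, so the running time reveals
    # roughly how many leading characters of the guess were correct.
    if len(secret) != len(guess):
        return False
    for s, g in zip(secret, guess):
        if s != g:
            return False
    return True

def safe_compare(secret: str, guess: str) -> bool:
    # hmac.compare_digest examines every byte, so its running time
    # does not depend on where the first mismatch occurs.
    return hmac.compare_digest(secret.encode(), guess.encode())
```

Both functions return the same answers; only their timing behavior differs, which is why the leak is so easy to miss in testing.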

In general, there is probably a lot of cryptographic code out there subject to timing attacks. Most public key cryptography, and even a lot of secret key cryptography, uses time-dependent operations. For example, AES uses table lookups that can run in time that is dependent on the key (that is, the time it takes will change with the key), depending on the AES implementation. When such tables aren't hardened, a statistical attack with precise timing data can extract an AES key just by watching data as it goes by.

While table lookups are usually thought of as a constant-time operation, they usually aren't, because parts of the table may be forced out of Level 1 cache (due to the table being too big, operations in other threads kicking data out of the cache, or even other data elements in the same operation kicking data out of the cache).

We'll look briefly at a few examples of timing attacks on cryptosystems in the "Example Sins" section. There is reason to believe that some of them will be remotely exploitable, at least under some conditions. And you should expect such problems to be generally exploitable when an attacker has local access to a machine performing such operations.

Timing channels are the most common type of side channel problem, but there's another major category: storage channels. Storage channels allow an attacker to look at data and extract information from it that probably was never intended or expected. This can mean inferring information from properties of the communication channel that are not part of the data semantics and could be covered up. For example, simply allowing attackers to see an encrypted message on the wire can give them information, such as the length of the message. The length of the message usually isn't considered too important, but there are cases where it could be. And the length of a message is certainly something that can be kept from attackers, for example, by always sending encrypted data at a fixed rate so that attackers cannot make out message boundaries. Sometimes a storage channel can be the metadata for the actual protocol/system data, such as file system attributes or protocol headers encapsulating an encrypted payload. For example, even if all your data is protected, an attacker can often learn information about who is communicating from the destination IP address in the headers (this is true even in IPSec).
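One way to hide message lengths, as described above, is to pad every record to a fixed size before encrypting it. The sketch below (the `BUCKET` size and function names are hypothetical, and fragmentation of oversized messages is omitted) length-prefixes the payload and pads the record, so every ciphertext the attacker sees is the same size:

```python
BUCKET = 1024  # hypothetical fixed record size, chosen for illustration

def pad_for_transport(message: bytes) -> bytes:
    # Length-prefix the real payload, then pad the record to a fixed
    # size so the ciphertext length reveals nothing about the
    # plaintext length. Longer messages would need fragmenting.
    if len(message) > BUCKET - 4:
        raise ValueError("message too long for one record")
    header = len(message).to_bytes(4, "big")
    record = header + message
    return record + b"\x00" * (BUCKET - len(record))

def unpad(record: bytes) -> bytes:
    # Recover the original payload from the length prefix.
    n = int.from_bytes(record[:4], "big")
    return record[4:4 + n]
```

The record produced here would then be fed to the cipher; the padding costs bandwidth, which is the usual trade-off for closing this storage channel.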

Storage side channels aren't generally as interesting as the primary communication channel. For instance, even if you are doing proper cryptography on the wire, you are likely to expose the username on the wire before authenticating, which gives an attacker useful starting points for launching password guessing or social engineering attacks. As we'll see in the rest of this chapter, both information leakage through the primary channel and timing side-channel attacks tend to be far more practical.

TMI: Too Much Information!

The job of any application is to present information to users so they can use it to perform useful tasks. The problem is that there is such a thing as too much information (TMI). This is particularly true of network servers, which should be conservative about the information they give back in case they're talking to an attacker, or an attacker is monitoring the conversation. But client applications have numerous information disclosure problems, too.

Here are some examples of information that you shouldn't be giving to users.

Whether the Username Is Correct

If your login system gives a different error message for a bad username than for a bad password, then you're giving attackers an indication of when they've guessed a username correctly. That gives them something to target with a brute-force guessing attack or a social engineering attack.
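The fix is to return one generic message for every failure mode. A minimal sketch (the credential store and names are hypothetical; a real system would also hash passwords and compare them in constant time):

```python
USERS = {"alice": "correct-horse"}  # hypothetical credential store

GENERIC_ERROR = "Invalid username or password."

def login(username: str, password: str) -> str:
    # Return the same message whether the username is unknown or the
    # password is wrong, so a failure reveals nothing about which
    # part of the guess was bad.
    stored = USERS.get(username)
    if stored is None or stored != password:
        return GENERIC_ERROR
    return "OK"
```

Note that the response time should also be uniform across the two failure cases; otherwise the timing side channel discussed earlier gives the distinction right back.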

Detailed Version Information

The problem with providing detailed version information is one of aiding attackers and allowing them to operate unnoticed. The goal of attackers is to find vulnerable systems without doing anything that will get them noticed. When attackers try to find network services to attack, they first want to fingerprint the operating system and services. Fingerprinting can be done at several levels and with various degrees of confidence. It's possible to accurately identify many operating systems by sending an unusual collection of packets and checking for responses (or lack of response). At the application level, you can do the same thing. For example, Microsoft's IIS web server won't insist on a carriage return/line feed pair terminating an HTTP GET request, and will also accept just a line feed. Apache insists on proper termination according to the standard. Neither application is wrong, but the behavioral differences can reveal whether you have one or the other. If you create a few more tests, you can narrow down exactly which server you're dealing with, and maybe even which version.

A less reliable method would be to send a GET request to a server and check the banner that's returned. Here's what you'd get from an IIS 6.0 system:

    HTTP/1.1 200 OK
    Content-Length: 1431
    Content-Type: text/html
    Content-Location: http://192.168.0.4/iisstart.htm
    Last-Modified: Sat, 22 Feb 2003 01:48:30 GMT
    Accept-Ranges: bytes
    ETag: "06be97f14dac21:26c"
    Server: Microsoft-IIS/6.0
    Date: Fri, 06 May 2005 17:03:42 GMT
    Connection: close

The Server header tells you which server you're dealing with, but that's something the server's operator could modify. For example, some people run an IIS 6.0 server with the banner set to Apache, and then laugh at people launching the wrong attacks.

The trade-off attackers face is that while the banner information may be less reliable than a more comprehensive test, getting the banner can be done with a very benign probe that's unlikely to be noticed by intrusion detection sensors. So if attackers can connect to your network server and it tells them exact version information, they can then check for attacks known to work against that version and operate with the least chance of getting caught.
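The banner grab described above amounts to nothing more than reading one header out of a response. A sketch of the parsing step (the function name is hypothetical; fetching the raw response over a socket is omitted):

```python
def server_banner(response: bytes):
    # Scan the header block of a raw HTTP response for a Server:
    # line. The value is whatever the operator chose to advertise,
    # so it is a hint, not proof, of what is actually running.
    head = response.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"server":
            return value.strip().decode("latin-1")
    return None  # server sends no banner at all
```

This is why suppressing or genericizing the banner raises the attacker's cost: it forces them to fall back on noisier behavioral fingerprinting.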

If a client application embeds the exact version information in a document, that's a mistake as well; if someone sends you a document created on a known vulnerable system, you know that you can send them a malformed document that causes them to execute arbitrary code.

Host Network Information

The most common mistake is leaking internal network information such as:

  • MAC addresses

  • Machine names

  • IP addresses

If you have a network behind a firewall, Network Address Translation (NAT) router, or proxy server, you probably don't want any of this internal network detail leaked beyond the boundary. Therefore, be very careful about what sort of nonpublic information you include in error and status messages. For example, you really shouldn't leak IP addresses in error messages.
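One practical defense is to scrub internal addresses from any message before it crosses the boundary. A sketch (the replacement text and function name are hypothetical) that masks the RFC 1918 private ranges:

```python
import re

# Matches the RFC 1918 private ranges: 10/8, 192.168/16, 172.16/12.
_PRIVATE_IP = re.compile(
    r"\b(10\.\d{1,3}\.\d{1,3}\.\d{1,3}"
    r"|192\.168\.\d{1,3}\.\d{1,3}"
    r"|172\.(1[6-9]|2\d|3[01])\.\d{1,3}\.\d{1,3})\b"
)

def scrub(message: str) -> str:
    # Mask internal addresses before an error or status message
    # leaves the host; the full detail can still go to a local log.
    return _PRIVATE_IP.sub("[internal address]", message)
```

The same idea extends to machine names and other internal identifiers: log the detail locally, but send the caller only what they need.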

Application Information

Application information leakage commonly centers on error messages. This is discussed in detail in Sin 6. In short, don't leak sensitive data in error messages.

It's worth pointing out that error messages that seem benign often aren't, such as the response to an invalid username, as mentioned earlier. In crypto protocols, it's quickly becoming best practice never to state why there is a failure, and to avoid signaling errors at all when possible, particularly after recent attacks against SSL/TLS took advantage of information from error messages. Generally, if you can communicate an error securely, and you are 100 percent sure about who's receiving the error, you probably don't have much to worry about. But if the error goes out of band where everyone can see it (as was the case in SSL/TLS), then you should consider simply dropping the connection instead.
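The single-failure-path idea can be sketched with an authenticated record check (the key and function names are hypothetical; this stands in for the MAC-then-decrypt step of a real protocol). Every failure, whatever its cause, collapses into one indistinguishable result:

```python
import hashlib
import hmac

KEY = b"hypothetical-mac-key"  # in practice, a negotiated session key

def open_record(record: bytes):
    # Whatever goes wrong -- truncated record, bad tag, malformed
    # contents -- the caller sees the same result: None. Distinct
    # error signals at this layer are exactly what attacks like
    # padding oracles against SSL/TLS exploited.
    if len(record) < 32:
        return None
    body, tag = record[:-32], record[-32:]
    expected = hmac.new(KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return body
```

A protocol built this way would typically just drop the connection on `None`, rather than telling the peer which check failed.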

Path Information

This is a very common vulnerability, and just about everyone has committed it. Telling the bad guys the layout of your hard drive makes it easier for them to identify where they can drop malware if the computer is compromised.

Stack Layout Information

When you're writing in C, C++, or assembly, and you call a function passing too few arguments, the runtime doesn't care. It will just take data off the stack. That data can be the information an attacker needs to exploit a buffer overflow somewhere else in the program, as it may very well give a good picture of the stack layout.

This may not sound likely, but it's actually a common problem, in which people call *printf() with a format string that references more arguments than they actually provide.

A Model for Information Flow Security

In a simple us-vs.-them scenario, it's not too hard to reason about information leakage. Either you're giving sensitive data to the attacker, or you're not. In the real world, though, systems tend to have a lot of users, and there may be concern about access controls between those users. For example, if you're doing business with two big banks, there's a good chance neither bank wants the other to see its data. It should also be easy to imagine more complex hierarchies, where we might want to be able to selectively grant access.

The most common way to model information flow security is with the Bell-LaPadula model (see Figure 13-1). The basic idea is that you have a hierarchy of permissions, where each permission is a node on a graph, and the graph has links between nodes. Relative position is important, as information should only flow up the graph. Intuitively, the top nodes are the most sensitive, and sensitive information shouldn't flow to entities that hold only less sensitive permissions. Nodes at the same height can't flow information to each other unless they have a link, in which case they effectively represent the same permission.


Figure 13-1: The Bell-LaPadula Disclosure model
Note: This illustration is a simplification of the model, but it is good enough. The original description of the model from 1976 is a 134-page document!

Bell-LaPadula is an abstraction of the model that the U.S. government uses for its data classification (for example, Top Secret, Secret, Classified, and Unclassified). Without going into much detail, it's also capable of modeling the notion of compartmentalization that the government uses, meaning that just because you have Top Secret clearance doesn't mean you can see every Top Secret document. There are basically more granular privileges at each level.

This model can also protect against a lot of data mistrust issues. For example, data labeled "untrusted" will have that tag associated with it through the lifetime of the data. If you try to use that data in an operation classified as, say, "highly privileged," the system would barf.

If you're building your own privilege model, you should study the Bell-LaPadula model and implement a mechanism for enforcing it. However, you should be aware that, in practice, there will be cases where you need to relax it, such as the example where you want to use data from an untrusted source in a privileged operation. There may also be cases where you want to release information selectively, such as allowing the credit card company to see someone's credit card number, but not their name. This corresponds to a selective declassification of data. Generally, you should have an API that explicitly allows for declassification, which is really just a call that says, "Yes, I mean to give this information to a lower privilege (or allow the operation requested by the lower privilege to happen). It is okay."
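The enforcement mechanism and the explicit declassification call described above can be sketched in a few lines (the level names, `Labeled` class, and function names are all hypothetical illustrations, not a production design):

```python
# A linear ordering of sensitivity levels, lowest to highest.
LEVELS = {"unclassified": 0, "classified": 1, "secret": 2, "top-secret": 3}

class Labeled:
    # A value tagged with the sensitivity level it carries for life.
    def __init__(self, value, level):
        self.value = value
        self.level = LEVELS[level]

def read(subject_level: str, data: Labeled):
    # "No read up": a subject may only read data at or below its
    # own clearance, so information never flows down the hierarchy.
    if LEVELS[subject_level] < data.level:
        raise PermissionError("insufficient clearance")
    return data.value

def declassify(data: Labeled, new_level: str) -> Labeled:
    # The explicit "yes, I mean to release this" call: the one
    # deliberate, auditable exception to the flow rule.
    return Labeled(data.value, new_level)
```

Keeping declassification as a separate, named operation means every intentional leak shows up in code review and can be audited, instead of being buried in ordinary reads.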

Bell-LaPadula is the model for several language-based security systems. For example, Java's privilege model (most visible with applets) is based on Bell-LaPadula. All objects have permissions attached to them, and the system won't let a call run unless all of the objects involved in a request (the call stack) have the right permissions. The explicit declassification operation is the doPrivileged() method, which allows one to circumvent the call stack check (so-called stack inspection). The Common Language Runtime (CLR) used by .NET code has a similar permission model for assemblies.

Sinful C# (and Any Other Language)

This is one of the most common leakage mistakes we see: giving exception information to the user, er, attacker.

    string Status = "No";
    string sqlstring = "";
    try {
        // SQL database access code snipped
    } catch (SqlException se) {
        Status = sqlstring + " failed\r\n";
        foreach (SqlError e in se.Errors)
            Status += e.Message + "\r\n";
    } catch (Exception e) {
        Status = e.ToString();
    }
    if (Status.CompareTo("No") != 0) {
        Response.Write(Status);
    }

Related Sins

The closest sin to this one is discussed in Sin 6. Another set of sins to consider are cross-site scripting vulnerabilities that can divulge cookie data (Sin 7), and SQL injection vulnerabilities (Sin 4) that allow an attacker to access data by forcing a change in the SQL statement used to query a database.

We gave a specific example of a timing side channel attack in Sin 11, when we talked about the TENEX bug.



19 Deadly Sins of Software Security: Programming Flaws and How to Fix Them. Year: 2003. Pages: 239.