The Sin Explained | Writing Secure Code

The biggest sin you can commit with random numbers is not using them when they should be used. For example, lets say youre writing some web-based banking software. To track client state, youll want to put a session identifier in the clients list of cookies. Lets say you give everybody a sequential session ID. What could happen? If the attacker watches his cookies and sees that hes #12, he could tamper with the cookie, change it to #11, and see if he gets logged into someone elses account. If he wants to log into some particular users account, he can now just wait until he sees that user log in, log in himself, and then keep subtracting from the value he gets looking at the appropriate data each time to determine when he has found targeted users.

The random number generators that have been around for years arent good for security at all. The numbers may look random, but an attacker can generally guess them easily, anyway. Then, even if you do use a good random number generator, you have to make sure its internal state isnt easily guessable, which can actually be an issue.

Lets understand the problem better by looking at the three different kinds of random numbers:

Noncryptographic pseudo-random number generators (noncrytographic PRNG)
Cryptographic pseudo-random number generators (CRNGs)
True random number generators (TRNGs), which are also known as entropy generators

Sinful NonCryptographic Generators

Before the Internet, random numbers werent really used for security-critical applications. Instead, they were used only for statistical simulation. The idea was to have numbers that would pass all statistical tests for randomness, for use in Monte Carlo experiments. Such experiments were designed to be repeatable. Thus, Application Programming Interfaces (APIs) were designed to take a single number, and have that number be the source for a very long stream of numbers that appeared randomly . Such generators usually use a fairly simple mathematical formula to generate numbers in a sequence, starting from the initial value (the seed).

When security became an issue, random number requirements got more stringent. Not only do numbers have to pass statistical random tests, but also you need to ensure that attackers cant guess numbers that are produced, even if they can see some of the numbers.

The ultimate goal is if attackers cant guess the seed, they wont be able to guess any outputs you dont give them. This should hold true, even if you give them a lot of outputs.

With traditional noncryptographic generators, the entire state of the generator can be determined just from looking at a single output. But, most applications dont use the output directly. Instead, they map it onto a small space. Still, that only serves as a minor speed bump for an attacker. Lets say that the attacker starts out knowing nothing about the internal state of the generator. For the most noncryptographic generators, 2^32 possible states exist. Every time the program gives the user one bit of information about a random number (usually, whether its even or odd), the attacker can generally rule out half of the states. Therefore, even if the attacker can only infer minimal information, it only takes a handful of outputs (in this case, about 32 outputs) before the entire state gets revealed anyway.

Clearly, you want generators that dont have this property. But, it turns out that the study of producing good random numbers is basically equal to producing a good encryption algorithm, as many encryption algorithms work by generating a sequence of random numbers from a seed (the key), and then XORing the plaintext with the stream of random numbers. If you treat your random number generator as a cipher and a cryptographer can break it, that means someone could guess your numbers far more easily then youd like.

Sinful Cryptographic Generators

The simplest cryptographic pseudo-random number generators (CRNGs) act very much like traditional random number generators, in that they stretch out a seed into a long sequence of numbers. Anytime you give it the same seed, it produces the same set of numbers. The only real difference is that if the attacker doesnt know the seed, you can give an attacker the first 4,000 outputs, and they shouldnt be able to guess what the 4,001th will be with any probability thats significantly better than chance.

The problem here is that the attacker cant know the seed. For a CRNG to be secure, the seed has to be unguessable, which can prove to be a challenge, as youll see in a little while.

What this really means is that the security of a CRNG can never be much better than the security of the seed. If the attacker has a 1 in 224 chance in guessing the seed, then they have a 1 in 224 chance of guessing which stream of numbers youre getting. Here, the system only has 24 bits of security, even if the underlying crypto is capable of 128 bits of security. The attackers challenge is only a bit harder because they do not know where in the stream of numbers you are.

CRNGs are often considered to be synonymous with stream ciphers. This is technically true. For example, RC4 is a stream cipher, which produces a string of random digits that you can then XOR with your plaintext to produce ciphertext . Or, you can use the output directly, and its a CRNG.

But, we consider CRNGs to include reseeding infrastructure when available, and not just the underlying cryptographic pseudo-random number generator. For that reason, modern CRNGs arent useful as ciphers, because they are taking a highly conservative approach, attempting to mix in new truly random data (entropy), and do so frequently. This is akin to taking a stream cipher, and randomly changing the key without telling anybody. Nobody can use it to communicate.

Another point to note about cryptographic generators is that the strength of their outputs can never be better than the strength of the underlying key. For example, if you want to generate 256-bit Advanced Encryption Standard (AES) keys, because you think 128 bits arent enough, dont use RC4 as your random number generator. Not only is RC4 generally used with 128-bit keys, but also the effective strength of those keys is only 30 bits.

These days, most operating systems come with their own CRNGs, and harvest true random numbers on their own, so its not as important to be able to build these things yourself.

Sinful True Random Number Generators

If CRNGs need a truly random seed to operate , and if youre not doing Monte Carlo experiments you want to be able to repeat, then why not just skip right over them, and go straight to true random number generators (TRNGs)?

If you could, that would be great. But, in practice, thats hard to do, partially because computers are so deterministic. There are few uncertain events that happen on a machine, and, yes, its good to measure those. For example, its common to measure the time between keystrokes, or mouse movements. However, there isnt nearly as much uncertainty in those kinds of events as one would like. This is because while a processor might be capable of running very quickly, keyboard events and the like tend to come in on very regular intervals in comparison, because theyre tied to clocks internal to the device that are much, much slower than the system clock. On the other hand, its hard to tell exactly what the capabilities of an attacker will be; these kinds of sources are generally only estimated to have a few bits of unguessable data per sample.

People will try to get unguessable data out of other parts of the system, but system state doesnt change all that unpredictably. Some of the popular sources (usually kernel and process state) can change far more slowly than expected, as well.

As a result, true random numbers on the typical machine are in short supply relative to the demand for them, especially on server hardware that has nobody sitting in front of the console using the keyboard and mouse. While its possible to solve the problem with hardware, its usually not cost effective. Therefore, it usually pays to be sparse with true random numbers and use them instead to seed CRNGs.

Plus, data that has entropy in it, such as a mouse event, isnt directly usable as a random number. Even data that comes off a hardware random number generator can end up having slight statistical biases. Therefore, its a best practice to whiten true entropy to remove any statistical patterns. One good way to do that is to seed a CRNG and take output from there.

Related Sins

Having guessable random numbers is one of the ways that cryptosystems can fail. In particular, a way to misuse SSL is to not use a good source of randomness, making session keys predictable. We show an example of this later in the chapter.