Section 19.2. Cryptography and Random Numbers | Linux Application Development (paperback) (2nd Edition)

19.2. Cryptography and Random Numbers

The authors of this book are not cryptography experts. Writing cryptographic software is a particularly subtle pursuit, and anyone who attempts it without proper research will fail to write robust and secure cryptographic applications. This chapter has two, and only two, purposes:

To convince people who are not experts in cryptography that this is an area best left to experts.
To let experts in cryptography know that a particularly useful tool is available for their use.

If you are not an expert in cryptography but need to use it, we suggest Applied Cryptography [Schneier, 1996] as an excellent overview and introduction to the topic.

Cryptography is generally no different from other software in its predictability requirements; when you give a program the key to decrypt data, you want it to decrypt the data the same way every time. There is one exception: choosing a truly random key. No matter how sophisticated an encryption algorithm is, it is worthless if an attacker can guess what key was used to generate the data. For example, many encrypted messages include some sort of timestamp that tells approximately when they were created. If you then use the current time to seed a common pseudo-random number generator, it would not take long for the attacker to decrypt the data by simply using the time when the message was created to seed common pseudo-random number generators and try keys based on those numbers.

It does not work much better to ask a human to provide a key. In general, people pick keys that are hardly random. Their keys are generally related to natural language text, which is highly predictable in terms of information theory. Natural language is said to have low entropy; a truly random key has high entropy.

If every computer had a small radiation source, the unpredictable amount of time between the particles emitted by decaying atoms could be used to produce truly random numbers, numbers that could not be predicted based on any other information. No other known data would be sufficient to predict which numbers might have been created by the emission of radiation.

Since most computers are not equipped with such devices, Linux improvises. Ted Ts'o wrote code that examines timings of external events (mouse clicks, keyboard keypresses, and so on) and extracts information from them, storing it in an entropy pool. There are components of human (and other external) interaction with the computer that are essentially random. The code that fills the entropy pool attempts to distinguish the amount of entropy that is being added, which allows the programmer to determine the amount of entropy used to generate random information. Recently, many computers have started to provide hardware-based sources of cryptographically random data; on Linux, this random data is fed into the system entropy pool as needed so that all Linux programs can use the same interface regardless of the hardware they use.

Programmers who need random numbers based on unpredictable events can take random numbers from the entropy pool through one of two similar devices: /dev/random and /dev/urandom. The /dev/random device returns only as many bytes of random data as the device currently estimates are in the entropy pool. The /dev/urandom device does not try to offer any guarantees about the amount of entropy in the information it returns; it generates as much random data as you want, based on the entropy pool. Whenever either device is read, it subtracts the number of bytes read from the entropy count.

When you read data from either device, it does not simply return the data that is in the entropy pool. It returns data stirred by a one-way hash algorithm that does not reveal the state of the pool in its output.

Use neither /dev/random nor /dev/urandom for data you want to replicate. They are particularly useless sources of data for Monte Carlo methods; even 1,2... n-1,n would be better; it is at least a repeatable series.
If you need only a certain amount of entropy, but need more raw data than entropy, you can read a small amount of data from one of the random devices (depending on what quality you need guaranteed) and then extend it with a hash function such as MD5 or SHA.

The source code to the random driver, drivers/char/random.c, includes considerable documentation on the details. If you do intend to write cryptographic code that uses the data provided by one of the interfaces, we recommend that you read that documentation first.