Chapter 11: Large Random Numbers | Cryptography in C and C++

Team-Fly

Mathematics is full of pseudorandomness—plenty enough to supply all would-be creators for all time.

—D. R. Hofstadter, Gödel, Escher, Bach

SEQUENCES OF "RANDOM" NUMERICAL VALUES are used in many statistical procedures, in numerical mathematics, in physics, and also in number-theoretic applications to replace statistical observations or to automate the input of variable quantities. Random numbers are used:

to select random samples from a larger set,
in cryptography to generate keys and in running security protocols,
as initial values in procedures to generate prime numbers,
to test computer programs (a topic to which we shall return),
for fun,

as well as in many additional applications. In computer simulations of natural phenomena random numbers can be used to represent measured values, thereby representing a natural process (Monte Carlo methods). Random numbers are also useful when arbitrary randomly selected numbers are required. Before we set out in this chapter to produce some functions for the generation of large random numbers, which will be required, in particular, for cryptographic applications, we should take care of some methodological preparations.

There are many sources of random numbers, but we should be sure to differentiate between genuine random numbers, which arise as the result of random experiments, and pseudorandom numbers, which are generated algorithmically. Genuine random numbers arise from such processes as the tossing of coins or dice, spinning a (fair) roulette wheel, observing processes of radioactive decay with the aid of suitable measuring equipment, and evaluating the output of electronic components. In contrast to these, pseudorandom numbers are computed by algorithms, generated with the aid of pseudorandom number generators, which are deterministic and therefore both predictable and reproducible. Pseudorandom numbers thus do not arise randomly in the strict sense of the word. The reason that this situation can frequently be ignored is that we are in possession of algorithms that are able to produce pseudorandom numbers of "high quality," where we shall have to explain what we mean by this term.

The first thing that we establish is that in fact, it makes no sense to talk about a single number being "random," but that mathematical requirements for randomness are always satisfied by sequences of numbers. Knuth speaks of a sequence of independent random numbers with a particular distribution, in which every number is produced randomly and independently of all other numbers of the sequence, and every number assumes a value within a certain range of values with a certain probability (see [Knut], Section 3.1). We use the terms "random" and "independent" here to mean that the events leading to the selection of concrete numbers are too complex in their formation and interaction to be detected by statistical or other tests.

This ideal is theoretically unachievable by generating numbers using deterministic procedures. Yet the goal of many different algorithmic techniques is to approach this ideal as closely as possible. In parallel to this has been the development of theoretical and empirical tests to detect structural properties of sequences of pseudorandom numbers and thereby measure the quality of the algorithms for generating such sequences. We shall not go into this more deeply. The theory in this area in deep and complex. Instead, we shall refer the reader to the literature, a good overview of which can be found in [Knut], while a comprehensive presentation of particularly the theoretical evaluation of random number generators is to be found in [Nied]. Some pragmatic suggestions for testing sequences of random numbers can be found in [FIPS].

For our purposes we first select from the many possibilities at hand a proven and frequently used procedure for generating pseudorandom numbers (for the sake of brevity we shall frequently drop the "pseudo" and speak simply of random numbers, random sequences, and random number generators) and spend some time with the method of linear congruences. Beginning with an initial value X₀ the elements of a sequence are generated by the linear recursion

(11.1)

This procedure was developed in 1951 by D. Lehmer, and it has enjoyed considerable popularity since that time. Despite their simplicity, linear congruences can produce excellent random sequences, where their quality, as one might expect, depends on the choice of parameters a, b, and m. In [Knut] it is shown that linear congruences with carefully chosen parameters can pass through the hoops of statistical tests with flying colors, but that on the other hand, a random selection of parameters almost always leads to a poor generator. The moral is this: Be careful in your choice of parameters!

The choice of m as a power of two has at once the advantage that forming the residue modulo m can be accomplished with a mathematical AND. An accompanying disadvantage is that the least-significant binary digits of numbers thus generated demonstrate less random behavior than the most-significant digits, and thus one must be careful in working with such numbers. In general, one must look out for poor random properties of such numbers formed from sequential values of a linear congruence generator modulo a prime divisor of the modulus m, so that the choice of m as a prime number should also be considered, since in this case individual binary digits are no worse than any others.

The choice of a and m has influence on the periodic behavior of the sequence: Since only finitely many, namely at most m distinct, sequence values can appear, the sequence begins to repeat at the latest with the generation of the (m + 1)st number. That is, the sequence is periodic. (One says also that the sequence enters a period or a cycle.) The entry point into a cycle need not be the initial value X₀, but can be some later value X_μ. The numbers X₀, X₁, X₂, ..., X_μ−1 are called the nonrecurring elements. We may thus indicate the periodic behavior of the sequence as shown in Figure 11.1

Figure 11.1: Periodic behavior of a pseudorandom sequence

Since the regular repetition of numbers in short cycles represents poor random behavior according to all reasonable criteria, we must strive to maximize the length of the cycles or indeed to find generators that possess only cycles of maximum length. We can establish criteria by which a linear congruence sequence with parameters a, b, and m possesses exactly the maximal period length. Namely, the following conditions should be fulfilled:

gcd(b, m) = 1.
For all primes p one has p | m ⇒ p | (a − 1).
4 | m ⇒ 4 | (a − 1).

For a proof and additional details see [Knut], Section 3.2.1.2.

As an example of parameters that fulfill these criteria let us consider the linear congruence that the ISO-C standard recommends as exemplary for the function rand():

where m = 2^k, with k determined by 2^k − 1 being the largest number representable by the type unsigned int. The number X_i+1 is not returned as the value of rand(), but rather X_i+1/2¹⁶ mod (RAND_MAX + 1), so that the function rand() generates all values between 0 and RAND_MAX. The macro RAND_MAX is defined in stdio.h and should have a value of at least 32267 (see [Pla1], p. 337). Here the recommendation of Knuth to do without the least-significant binary digits in the case of power-of-two moduli has apparently been taken into account. We easily determine that the above requirements (i)–(iii) are satisfied and that therefore a sequence produced by this generator possesses the maximum period length 2^k.

Whether this happens to be the case for a particular implementation of the C library, whose source code is usually unavailable,^[1] can be tested under favorable circumstances with the aid of the following algorithm by R. P. Brent. The Brent algorithm determines the period length λ of a sequence that is computed by the recursion X_i+1 = F (X_i) on a set of values D using the generating function F : D → D and an initial value X₀ Î D. One needs at most 2 · max {μ, λ} calculations of the function F (cf.[HKW], 4.2).

Algorithm of Brent for determining the period length λ of a sequence generated by X₀, X_i+1 = F (X_i)

Set y ← X₀, r ← 1, and k ← 0.
Set x ← y, j ← k, and r ← r + r.
Set k ← k + 1 and y ← F(y); repeat this step until x = y or k ≥ r.
If x ≠ y, go to step 2. Otherwise, output λ = k − j.

This process is successful only if in step 3 one actually sees the actual sequence values F (y) and not, as in the above ISO recommendation, only their most-significant parts.

This cycle test can be extended with the chi-squared test (also written " χ² test"), which tests how well an empirically obtained probability distribution corresponds to a theoretically expected distribution. The chi-squared test computes the statistic

(11.2)

where for t distinct events X_i we designate H (X_i) the observed frequency of the event X_i, pr (X_i) the probability for the occurrence of X_i, and n the number of observations. For the case to which these distributions correspond, the statistic χ², viewed as a random variable, has the expected value E (χ²) = t − 1. The threshold values that lead to the rejection of the test hypothesis of equality of the distributions for given error probabilities can be read from tables of the chi-squared distribution for t − 1 degrees of freedom (cf. [Bos1], Section 4.1).

The chi-squared test is employed in connection with many empirical tests to measure their results for correspondence with the theoretically calculated test distributions. The test is particularly simple to apply for sequences of uniformly distributed (which is our test hypothesis!) random numbers X_i in a range of values W = { 0, ..., w − 1 }: We assume that each of the numbers in W is taken with the same probability p = 1/w and thus expect that among n random numbers X_i each number from W appears approximately n/w times (where we assume n > w). However, this should not be exactly the case, because the probability P_k that among n random numbers X_i a specific value w Î W appears k times is given by

(11.3)

This binomial distribution indeed has the largest values for k ≈ n/w, but the probabilities P₀ = (1 − p)ⁿ and P_n = pⁿ are not equal to zero. Under the assumption of random behavior we therefore expect to observe in the sequence of the X_i frequencies h_w of individual values w Î W according to the binomial distribution. Whether this actually occurs is established by the chi-squared test in calculating

(11.4)

The test is repeated for several random samples (partial sequences of X_i). A rough approximation to the chi-squared distribution allows us to deduce that in most cases the test result χ² must lie in the interval [w − 2 , w + 2 ]. Otherwise, the given sequence would attest to a lack of randomness. Based on this, the probability of an error, namely, that an actually "good" random sequence is declared "bad" based on the result of the chi-squared test, is about two percent. Take note that the test is valid only for a sufficiently large number of samples: This number must be at least n = 5w (see [Bos2], Section 6.1), with an even larger number to be preferred.

The linear congruence generator in the ISO-C standard that we considered above passes this simple test, as do the pseudorandom number generators that we shall implement below for the FLINT/C package.

After this brief excursion into statistics we should indicate that random sequences must meet other criteria in addition to the statistical requirements, depending on the area of application: Random sequences that are to be used for cryptographic purposes may not in any way be predictable or reproducible from a few elements without the knowledge of additional, secret, information, in order that attackers be denied the possibility of reconstructing cryptographic keys or sequences of keys generated by such a sequence. A random number generator that has been well examined with respect to such properties is the BBS bit generator of L. Blum, M. Blum, and M. Shub, which is based on results of complexity theory. We shall now describe this procedure and then, further below, implement it, though without going into the theoretical details, for which see [Blum] or [HKW], Chapter 4 and Section 6.5.

We shall need two primes p, q ≡ 3 mod 4, which we multiply by a modulus n as well as by a number X that is relatively prime to n. From X₀ := X² mod n we obtain the initial value X₀ for a sequence of numbers that we calculate by repeated squaring modulo n:

As our random number we take the least-significant bit of each X_i.

A random sequence constructed from binary digits thus obtained is secure in the cryptographic sense: The predictability of further binary digits from those that have already been computed is possible only if the factors p and q of the modulus are known. If these are kept secret, then the modulus n must be factored in order to predict future bits with a probability of success greater than or to reconstruct unknown parts of the sequence. The security of the BBS generator rests on the same principle as that of the RSA procedure. The price of the trust that one can derive in the qualities of the BBS generator is the difficulty in calculating the random bits: For each bit one must execute a squaring modulo a large integer, which is clearly reflected in a large time requirement for generating long random sequences. In the development of shorter sequences of random numbers, such as for generating individual cryptographic keys, this is of little consequence. Here the only issue is security, though in judging the degree of security the process for obtaining initial values must also be considered. Since the BBS generator is a deterministic process, "pure chance" can be obtained only if initial values are chosen by a suitable method. One can use dates and times, system statistics such as the number of clock ticks for given processes, or measurements of external events such as the time between keyboard events or mouse clicks executed by the user, as well as other methods that are best used in combination (for further tips on obtaining initial values, see [East] and [Matt].^[2]

We turn our attention now to the actual topic of this chapter and from the resources available compose two methods for generating random numbers in CLINT format. As starting point for the generation of prime numbers, for example, we would like to be able to generate large numbers with a predetermined number of binary digits. For this the most-significant bit should be set to 1, and all remaining bits should be generated randomly.

First we construct a linear congruence generator from whose sequential values we will take the digits of a CLINT random number. The parameters a = 6364136223846793005 and m = 2⁶⁴ for our generator are taken from the table with results of the spectral test in Knuth ([Knut], pages 102–104). The sequence X_i+1 (X_i · a + 1) mod m thus generated possesses a maximal period length λ = m as well as good statistical properties, as we conclude from the test results presented in the table. The generator is implemented in the following function rand64_l(). On each call to rand64_l() the next number in the sequence is generated and then stored in the global CLINT object SEED64, declared as static. The parameter a is stored in the global variable A64. The function returns a pointer to SEED64.

Function:	linear congruence generator with period length 2⁶⁴
Syntax:	clint * rand64_l (void);
Return:	pointer to SEED64 with calculated random number

 clint * rand64_l (void) {   mul_l (SEED64, A64, SEED64);   inc_l (SEED64);

The reduction modulo 2⁶⁴ proceeds simply by setting the length field of SEED64 and costs almost no computational time.

   SETDIGITS_L (SEED64, MIN (DIGITS_L (SEED64), 4));   return ((clint *)SEED64); }

Next, we require a function for setting the initial values for rand64_l(). This function is called seed64_l(), and it accepts a CLINT object as input, from which it takes at most four of the most-significant digits as initial values in SEED64. The previous value of SEED64 is copied into the static CLINT object BUFF64, and a pointer to BUFF64 is returned.

Function:	set an initial value for rand64_l()
Syntax:	clint * seed64_l (CLINT seed_l);
Input:	seed_l (initial value)
Return:	pointer to BUFF64 with previous value of SEED64

 clint * seed64_l (CLINT seed_l) {   int i;   cpy_l (BUFF64, SEED64);   for (i = 0; i <= MIN (DIGITS_L (seed_l), 4); i++)     {       SEED64[i] = seed_l[i];     }   return BUFF64; }

The following variant of the function seed64_l() accepts initial values of type ULONG.

Function:	Set an initial value for rand64_l()
Syntax:	clint * ulseed64_l (ULONG seed);
Input:	seed (initial value)
Return:	pointer to BUFF64 with previous value of SEED64

 clint * ulseed64_l (ULONG seed) {   cpy_l (BUFF64, SEED64);   ul2clint_l (SEED64, seed);   return BUFF64; }

The next function returns random numbers of type ULONG. All numbers are generated with a call to rand64_l(), where the most-significant digits of SEED64 are used to build a number of the requested type.

Function:	generation of a random number of type unsigned long
Syntax:	unsigned long ulrand64_l (void);
Return:	random number of type unsigned long

 ULONG ulrand64_l (void) {   ULONG val;   USHORT l;   rand64_l();   l = DIGITS_L (SEED64);   switch (l)     {     case 4:     case 3:     case 2:       val = (ULONG)SEED64[l-1];       val += ((ULONG)SEED64[l] << BITPERDGT);       break;     case 1:       val = (ULONG)SEED64[l];       break;     default:       val = 0;     }   return val; }

The FLINT/C package contains the additional functions ucrand64_l(void) and usrand64_l(void), which generate random numbers of types UCHAR and USHORT, respectively. However, we shall not discuss them here. We now present the function rand_l(), which generates large random numbers of CLINT type, with the number of binary digits to be specified.

Function:	generation of a random number of type CLINT
Syntax:	void rand_l (CLINT r_l, int l);
Input:	l (number of binary digits of the number to be generated)
Output:	r_l (random number in the interval 2^l−1 ≤ r_l ≤ 2^l − 1)

 void rand_l (CLINT r_l, int l) {   USHORT i, j, ls, lr;

The requested number of binary digits l is first bounded by the maximum permitted value for CLINT objects. Then the number ls of required USHORT digits and the position lr of the most-significant binary digit of the most-significant USHORT are determined.

  l = MIN (l, CLINTMAXBIT);  ls = (USHORT)l >> LDBITPERDGT;  lr = (USHORT)l & (BITPERDGT - 1UL);

Now the digits of r_l are generated by successive calls to the function usrand64_l(). The least-significant binary digits of SEED64 are therefore not used for the construction of CLINT digits.

  for (i = 1; i <= ls; i++)    {      r_l[i] = usrand64_l ();    }

Now follows the precise manufacture of r_l by setting the most-significant bit in position lr − 1 of the (ls + 1)st USHORT digit to 1 and the most-significant bits to 0. If lr == 0, then the most-significant bit of the USHORT digit ls is set to 1.

   if (lr > 0)     {       r_l[++ls] = usrand64_l ();       j = 1U << (lr - 1); /* j <- 2 ^ (lr - 1) */       r_l[ls] = (r_l[ls] | j) & ((j << 1) - 1);     }   else     {       r_l[ls] |= BASEDIV2;     }   SETDIGITS_L (r_l, ls); }

To close this chapter we shall now implement the BBS generator. To this end we determine, with the help of the function prime_l(), primes p ≡ q ≡ 3 mod 4, both with approximately the same number of binary digits (this leads to the result that the modulus is as difficult as possible to factor, upon which the cryptographic security of the BBS generator depends; see page 328), and we form from them a modulus n = pq. Such a modulus with 2048 binary digits is contained in the FLINT/C package, though without the associated factors (these are known only to the author ;-)).

The static variables XBBS and MODBBS hold the members of the sequence X_i and the modulus n. The function randbit() uses these to calculate the next random bit.

Function:	pseudorandom number generator à la Blum-Blum-Shub
Syntax:	int randbit_l (void);
Return:	value from {0, 1}

 static CLINT XBBS, MODBBS; static const char *MODBBSSTR = "81aa5c..."; /* modulus as a character string */ int randbit_l (void) {   msqr_l (XBBS, XBBS, MODBBS);

Output the least-significant bit of XBBS.

   return (*LSDPTR_L (XBBS) & 1); }

The initialization of the BBS generator is accomplished with the help of the function seedBBS_l().

Function:	set initial values for randbit_l() and randBBS_l()
Syntax:	int seedBBS_l (CLINT seed_l);
Input:	seed_l (initial value)

 int seedBBS_l (CLINT seed_l) {   CLINT gcd_l;   str2clint_l (MODBBS, (char *)MODBBSSTR, 16);   gcd_l (seed_l, MODBBS, gcd_l);   if (!EQONE_L (gcd_l))     {       return -1;     }   msqr_l (seed_l, XBBS, MODBBS);   return 0; }

The function ulrandBBS_l() below, which also generates random numbers of type ULONG, is the analogue of the function ulrand64_l().

Function:	generation of a random number of type unsigned long
Syntax:	unsigned long ulrandBBS_l (void);
Return:	random number of type unsigned long

 ULONG ulrandBBS_l (void) {   ULONG i, r = 0;   for (i = 0; i < (sizeof(ULONG) << 3); i++)     {       r = (r << 1) + randbit_l();     }   return r; }

We still lack the function randBBS_l(CLINT r_l, int l), which generates random numbers r_l with exactly l binary digits r_l in the interval 2^l−1 ≤ r_l ≤ 2^l − 1. Since this corresponds to a great extent to the function rand_l(), we shall omit its description. Of course, the function is contained in the FLINT/C package.

^[1]The GNU-C library, of the Free Software Foundation, and the EMX-C library, by Eberhard Mattes, are excellent exceptions. The rand() function of the EMX library uses the parameters a = 69069, b = 5, and m = 2³². The multiplier a = 69069 suggested by G. Marsaglia produces, together with the modulus m = 2³², good statistical results and a maximal period length (see [Knut], pp. 102–104).

^[2]For highly sensitive applications the generation of initial values or entire random sequences made up of truly random numbers should always be done with suitable hardware components.

Team-Fly