Through a generalization of the binary algorithm on page 80 the number of modular multiplications for exponentiation can be reduced even further. The idea is to represent the exponent in a base greater than 2 and to replace multiplication by a in step 3 by multiplication by powers of a. Thus let the exponent e be given by e = (en−1 en−2...e0)M, to a base M yet to be determined. The following algorithm calculates the powers ae mod m.
M-ary algorithm for exponentiation ae mod m
Calculate and store a2 mod m, a3 mod m,..., aM−1 mod m as a table.
Set p ← mod m and i ← n − 2.
Set p ← pM mod m.
If ei ≠ 0, set p ← mod m.
Set i ← i − 1; if i ≥ 0, go to step 3.
Output p.
The number of necessary multiplications evidently depends on the number of digits of the exponent e and thus on the choice of M. Therefore, we would like to determine M such that the exponentiation in step 3 can be computed to the greatest extent possible by means of squaring, as in the example above for 216, and such that the number of multiplications by the precomputed powers of a be minimized to a justifiable cost of storage space for the table.
The first condition suggests that we choose M as a power of 2: M = 2k. In view of the second condition we consider the number of modular multiplications as a function of M:
We require
(6.1) |
squares in step 3 and on average
(6.2) |
modular multiplications in step 4, where
is the probability that a digit ei of e is nonzero. If we include the M − 2 multiplications for the computation of the table, then the M-ary algorithm requires on average
(6.3) |
(6.4) |
modular squarings and multiplications.
For exponents e and moduli m of, say, 512 binary places and M = 2k we obtain the numbers of modular multiplications for the calculation of ae mod m as shown in Table 6.1. The table shows as well the memory requirement for the precomputed powers of a mod m, which result from the product (2k − 2)CLINTMAXSHORT · sizeof(USHORT).
k | Multiplications | Memory (in bytes) |
---|---|---|
1 | 766 | 0 |
2 | 704 | 1028 |
3 | 666 | 3084 |
4 | 644 | 7196 |
5 | 640 | 15420 |
6 | 656 | 31868 |
We see from the table that the average number of multiplications reaches a minimum of 640 at k = 5, where the required memory for each larger k grows by approximately a factor of 2. But what are the time requirements for other orders of magnitude of the exponents?
Table 6.2 gives information about this. It displays the requirements for modular multiplication for exponentiation with exponents with various numbers of binary digits and for various values of M = 2k. The exponent length of 768 digits was included because it is a frequently used key length for the RSA cryptosystem (see Chapter 16). The favorable numbers of multiplications appear in boldface type.
Number of binary digits in the exponent | ||||||||
---|---|---|---|---|---|---|---|---|
k | 32 | 64 | 128 | 512 | 768 | 1024 | 2048 | 4096 |
1 | 45 | 93 | 190 | 766 | 1150 | 1534 | 3070 | 6142 |
2 | 44 | 88 | 176 | 704 | 1056 | 1408 | 2816 | 5632 |
3 | 46 | 87 | 170 | 666 | 996 | 1327 | 2650 | 5295 |
4 | 52 | 91 | 170 | 644 | 960 | 1276 | 2540 | 5068 |
5 | 67 | 105 | 181 | 640 | 945 | 1251 | 2473 | 4918 |
6 | 98 | 135 | 209 | 656 | 954 | 1252 | 2444 | 4828 |
7 | 161 | 197 | 271 | 709 | 1001 | 1294 | 2463 | 4801 |
8 | 288 | 324 | 396 | 828 | 1116 | 1404 | 2555 | 4858 |
In consideration of the ranges of numbers for which the FLINT/C package was developed, it appears that with k = 5 we have found the universal base M = 2k, with which, however, there is a rather high memory requirement of 15 kilobytes for the powers a2, a3,..., a31 to base a that are to be precomputed. The M-ary algorithm can be improved, however, according to [Cohe], Section 1.2, to the extent that we can employ not M − 2, but only M/2, premultiplications and thus require only half the memory. The task now additionally consists in the calculation of the power ae mod m, where e = (en−1en−2...e0)M is the representation of the exponent to the base M = 2k.
M-ary Algorithm for exponentiation with reduced number of premultiplications
Compute and store a3 mod m, a5 mod m, a7 mod m,..., mod m.
If en−1 = 0, set p ← 1.
If en−1 ≠ 0, factor en−1 = 2tu with odd u. Set p ← au mod m.
In each case set i ← n − 2.
If ei = 0, set p ← mod m by calculating mod m (k-fold squaring modulo m).
If ei ≠ 0, factor ei = 2tu with odd u; set p ← mod m and then p ← pau mod m; now set p ← mod m.
Set i ← i − 1; if i ≥ 0, go to step 3.
Output p.
The trick of this algorithm consists in dividing up the squarings required in step 3 in a clever way, such that the exponentiation of a is taken care of together with the even part 2t of ei. Within the squaring process the exponentiation of a by the odd part u of ei remains. The balance between multiplication and squaring is shifted to the more favorable squaring, and only the powers of a with odd exponent need to be precomputed and stored.
For this splitting one requires the uniquely determined representation ei = 2tu, u odd, of the exponent digit ei. For rapid access to t and u a table is used, which, for example, for k = 5 is displayed in Table 6.3.
ei | t | u | ei | t | u | ei | t | u |
---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 11 | 0 | 11 | 22 | 1 | 11 |
1 | 0 | 1 | 12 | 2 | 3 | 23 | 0 | 23 |
2 | 1 | 1 | 13 | 0 | 13 | 24 | 3 | 3 |
3 | 0 | 3 | 14 | 1 | 7 | 25 | 0 | 25 |
4 | 2 | 1 | 15 | 0 | 15 | 26 | 1 | 13 |
5 | 0 | 5 | 16 | 4 | 1 | 27 | 0 | 27 |
6 | 1 | 3 | 17 | 0 | 17 | 28 | 2 | 7 |
7 | 0 | 7 | 18 | 1 | 9 | 29 | 0 | 29 |
8 | 3 | 1 | 19 | 0 | 19 | 30 | 1 | 15 |
9 | 0 | 9 | 20 | 2 | 5 | 31 | 0 | 31 |
10 | 1 | 5 | 21 | 0 | 21 |
To calculate these values we can use the auxiliary function twofact_l(), which will be introduced in Section 10.4.1. Before we can program the improved M-ary algorithm there remains one problem to be solved: How, beginning with the binary representation of the exponent or the representation to base B = 216, do we efficiently obtain its representation to base M = 2k for a variable k > 0? It will be of use here to do a bit of juggling with the various indices, and we can "mask out" the required digits ei to base M from the representation of e to base B. For this we set the following: Let (εr−1εr−2...ε0)2 be the representation of the exponent e to base 2 (we need this on account of the number r of binary digits). Let (eu−1eu−2...e0)B be the representation of e as a CLINT type to base B = 216, and let be the representation of e to the base M = 2k, k ≤ 16 (M should not be greater than our base B). The representation of e in memory as a CLINT object e_l corresponds to the sequence [u + 1],[e0],[e1],...,[eu−1],[0] of USHORT values e_l[i] for i = 0,..., u + 1; one should note that we have added a leading zero.
Let f := , and for i = 0,..., f let si := and di := ki mod 16. With these settings the following statements hold:
There are f + 1 digits in ; that is, n − 1 = f.
contains the least-significant bit of the digit .
di specifies the position of the least-significant bit of in (counting of positions begins with 0). If i < f and di > 16 − k, then not all the binary digits of are in ; the remaining (higher-valued) bits of are in +1. The desired digit thus corresponds to the k least-significant binary digits of
As a result we have for i Î {0,..., f} the following expression for determining :
(6.5) |
Since for the sake of simplicity we set e_l [sf + 2] ← 0, this expression holds as well for i = f.
We have thus found an efficient method for accessing the digits of the exponent in its CLINT representation, which arise from its representation in a power-of-two base 2k with k ≤ 16, whereby we are saved an explicit transformation of the exponent. The number of necessary multiplications and squarings for the exponentiation is now
(6.6) |
where in comparison to μ1(k) (see page 86) the expenditure for the precomputations has been reduced by half. The table for determining the favorable values of k (Table 6.4) now has a somewhat different appearance.
Number of binary digits in the exponent | ||||||||
---|---|---|---|---|---|---|---|---|
k | 32 | 64 | 128 | 512 | 768 | 1024 | 2048 | 4096 |
1 | 47 | 95 | 191 | 767 | 1151 | 1535 | 3071 | 6143 |
2 | 44 | 88 | 176 | 704 | 1056 | 1408 | 2816 | 5632 |
3 | 44 | 85 | 168 | 664 | 994 | 1325 | 2648 | 5293 |
4 | 46 | 85 | 164 | 638 | 954 | 1270 | 2534 | 5066 |
5 | 53 | 91 | 167 | 626 | 931 | 1237 | 2459 | 4904 |
6 | 68 | 105 | 179 | 626 | 924 | 1222 | 2414 | 4798 |
7 | 99 | 135 | 209 | 647 | 939 | 1232 | 2401 | 4739 |
8 | 162 | 198 | 270 | 702 | 990 | 1278 | 2429 | 4732 |
Starting with 768 binary digits of the exponent, the favorable values of k are larger by 1 than those given in the table for the previous version of the exponentiation algorithm, while the number of required modular multiplications has easily been reduced. It is to be expected that this procedure is on the whole more favorable than the variant considered previously. Nothing now stands in the way of an implementation of the algorithm.
To demonstrate the implementation of these principles we select an adaptive procedure that uses the appropriate optimal value for k. To accomplish this we rely again on [Cohe] and look for, as is specified there, the smallest integer value k that satisfies the inequality
(6.7) |
which comes from the formula μ2(k) given previously for the number of necessary multiplications based on the condition μ2(k + 1) − μ2(k) ≥ 0. The constant number of modular squarings ⌊log2 e⌋ for all algorithms for exponentiation introduced thus far is eliminated; here only the "real" modular multiplications, that is, those that are not squarings, are considered.
The implementation of exponentiation with variable k requires a large amount of main memory for storing the precomputed powers of a; for k = 8 we require about 64 Kbyte for 127 CLINT variables (this is arrived at via (27 − 1) * sizeof(USHORT) * CLINTMAXSHORT), where two additional automatic CLINT fields were not counted. For applications with processors or memory models with segmented 16-bit architecture this already has reached the limit of what is possible (see in this regard, for example, [Dunc], Chapter 12, or [Petz], Chapter 7).
Depending on the system platform there are thus various strategies appropriate for making memory available. While the necessary memory for the function mexp5_l() is taken from the stack (as automatic CLINT variables), with each call of the following function mexpk_l() memory is allocated from the heap. To save the expenditure associated with this, one may imagine a variant in which the maximum needed memory is reserved during a one-time initialization and is released only at the end of the entire program. In each case it is possible to fit memory management to the concrete requirements and to orient oneself to this in the commentaries on the following code.
One further note for applications: It is recommended always to check whether it suffices to employ the algorithm with the base M = 25. The savings in time that comes with larger values of k is relatively not so large in comparison to the total calculation time so as to justify in all cases the greater demand on memory and the thereby requisite memory management. Typical calculation times for various exponentiation algorithms, on the basis of which one can decide whether to use them, are given in Appendix D.
The algorithm, implemented with M = 25, is contained in the FLINT/C package as the function mexp5_l(). With the macro EXP_L() contained in flint.h one can set the exponentiation function to be used: mexp5_l() or the following function mexpk_l() with variable k.
Function: | modular exponentiation |
Syntax: |
int mexpk_l (CLINT bas_l, CLINT exp_l, CLINT p_l, CLINT m_l); |
Input: | bas_l (base) exp_l (exponent) m_l (modulus) |
Output: | p_l (power residue) |
Return: | E_CLINT_OK if all is ok E_CLINT_DBZ if division by 0 E_CLINT_MAL if malloc() error |
We begin with a segment of the table for representing ei = 2tu, u odd, 0 ≤ ei < 28. The table is represented in the form of two vectors. The first, twotab[], contains the exponents t of the two-factor 2t, while the second, oddtab[], holds the odd part u of a digit 0 ≤ ei < 25. The complete table is contained, of course, in the FLINT/C source code.
static int twotab[] = {0,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,5, ...}; static USHORT oddtab[]= {0,1,1,3,1,5,3,7,1,9,5,11,3,13,7,15,1,17,9,19,5,21,11,23,3,25,13, ...}; int mexpk_l (CLINT bas_l, CLINT exp_l, CLINT p_l, CLINT m_l) {
The definitions reserve memory for the exponents plus the leading zero, as well as a pointer clint **aptr_l to the memory still to be allocated, which will take pointers to the powers of bas_l to be precomputed. In acc_l the intermediate results of the exponentiation will be stored.
CLINT a_l, a2_l; clint e_l[CLINTMAXSHORT + 1]; CLINTD acc_l; clint **aptr_l, *ptr_l; int noofdigits, s, t, i; unsigned int k, lge, bit, digit, fk, word, pow2k, k_mask;
Then comes the usual checking for division by 0 and reduction by 1.
if (EQZ_L (m_l)) { return E_CLINT_DBZ; } if (EQONE_L (m_l)) { SETZERO_L (p_l); /* modulus = 1 ==> residue = 0 */ return E_CLINT_OK; }
Base and exponent are copied to the working variables a_l and e_l, and any leading zeros are purged.
cpy_l (a_l, bas_l); cpy_l (e_l, exp_l);
Now we process the simple cases a0 = 1 and 0e = 0 (e > 0).
if (EQZ_L (e_l)) { SETONE_L (p_l); return E_CLINT_OK; } if (EQZ_L (a_l)) { SETZERO_L (p_l); return E_CLINT_OK; }
Next, the optimal value for k is determined; the value 2k is stored in pow2k, and in k_mask the value 2k−1. For this the function ld_l() is used, which returns the number of binary digits of its argument.
lge = ld_l (e_l); k = 8; while (k > 1 && ((k - 1) * (k << ((k - 1) << 1))/((1 << k ) - k - 1)) >= lge - 1) { --k; } pow2k = 1U << k; k_mask = pow2k - 1U;
Memory is allocated for the pointers to the powers of a_l to be computed. The base a_l is reduced modulo m_l.
if ((aptr_l = (clint **) malloc (sizeof(clint *) * pow2k)) == NULL) { return E_CLINT_MAL; } mod_l (a_l, m_l, a_l); aptr_l[1] = a_l;
If k > 1, then memory is allocated for the powers to be computed. This is not necessary for k = 1, since then no powers have to be precomputed. In the following setting of the pointer aptr_l[i] one should note that in the addition of an offset to a pointer p a scaling of the offset by the compiler takes place, so that it counts objects of the pointer type of p.
We have already mentioned that the allocation of working memory can be carried out alternatively in a one-time initialization. The pointers to the CLINT objects would in this case be contained in global variables outside of the function or in static variables within mexpk_l().
if (k > 1) { if ((ptr_l = (clint *) malloc (sizeof(CLINT) * ((pow2k >> 1) - 1))) == NULL) { return E_CLINT_MAL; } aptr_l[2] = a2_l; for (aptr_l[3] = ptr_l, i = 5; i < (int)pow2k; i+=2) { aptr_l[i] = aptr_l[i - 2] + CLINTMAXSHORT; }
Now comes the precomputation of the powers of the value a stored in a_l. The values a3, a5, a7,...,ak − 1 are computed (a2 is needed only in an auxiliary role).
msqr_l (a_l, aptr_l[2], m_l); for (i = 3; i < (int)pow2k; i += 2) { mmul_l (aptr_l[2], aptr_l[i - 2], aptr_l[i], m_l); } }
This ends the case distinction for k > 1. The exponent is lengthened by the leading zero.
*(MSDPTR_L (e_l) + 1) = 0;
The determination of the value f (represented by the variable noofdigits).
noofdigits = (lge - 1)/k; fk = noofdigits * k;
Word position si and bit position di of the digit ei in the variables word and bit.
word = fk >> LDBITPERDGT; /* fk div 16 */ bit = fk & (BITPERDGT-1U); /* fk mod 16 */
Calculation of the digit en−1 with the above-derived formula; en−1 is represented by the variable digit.
switch (k) { case 1: case 2: case 4: case 8: digit = ((ULONG)(e_l[word + 1] ) >> bit) & k_mask; break; default: digit = ((ULONG)(e_l[word + 1] | ((ULONG)e_l[word + 2] << BITPERDGT)) >> bit) & k_mask; }
First run through step 3 of the algorithm, the case digit = en−1 ≠ 0.
if (digit != 0) /* k-digit > 0 */ { cpy_l (acc_l, aptr_l[oddtab[digit]]);
Calculation of ; t is set to the two-part of en−1 via twotab[en−1]; p is represented by acc_l.
t = twotab[digit]; for (; t > 0; t--) { msqr_l (acc_l, acc_l, m_l); } } else /* k-digit == 0 */ { SETONE_L (acc_l); }
Loop over noofdigits beginning with f - 1.
for (--noofdigits, fk -= k; noofdigits >= 0; noofdigits--, fk -= k) {
Word position si and bit position di of the digit ei in the variables word and bit.
word = fk >> LDBITPERDGT; /* fk div 16 */ bit = fk & (BITPERDGT - 1U); /* fk mod 16 */
Computation of the digit ei with the formula derived above; ei is represented by the variable digit.
switch (k) { case 1: case 2: case 4: case 8: digit = ((ULONG)(e_l[word + 1] ) >> bit) & k_mask; break; default: digit = ((ULONG)(e_l[word + 1] | ((ULONG)e_l[word + 2] << BITPERDGT)) >> bit) & k_mask; }
Step 3 of the algorithm, the case digit = ei ≠ 0; t is set via the table twotab[ei] to the two-part of ei.
if (digit != 0) /* k-digit > 0 */ { t = twotab[digit];
Calculation of au in acc_l. Access to au with the odd part u of ei is via aptr_l [oddtab [ei]].
for (s = k - t; s > 0; s--) { msqr_l (acc_l, acc_l, m_l); } mmul_l (acc_l, aptr_l[oddtab[digit]], acc_l, m_l);
Calculation of ; p is still represented by acc_l.
for (; t > 0; t--) { msqr_l (acc_l, acc_l, m_l); } } else /* k-digit == 0 */ {
Step 3 of the algorithm, case ei = 0: Calculate .
for (s = k; s > 0; s--) { msqr_l (acc_l, acc_l, m_l); } } }
End of the loop; output of acc_l as power residue modulo m_l.
cpy_l (p_l, acc_l);
At the end, allocated memory is released.
free (aptr_l); if (ptr_l != NULL) free (ptr_l); return E_CLINT_OK; }
The various processes of M-ary exponentiation can be clarified with the help of a numerical example. To this end let us examine the calculation of the power 1234667 mod 18577, which will be carried out by the function mexpk_l() in the following steps:
Precomputations
The representation of the exponent e = 667 can be expressed to the base 2k with k = 2 (cf. the algorithm for M-ary exponentiation on page 86), whereby the exponent e has the representation e = .
The power a3 mod 18577 has the value 17354. Further powers of a do not arise in the precomputation because of the small size of the exponent.
Exponentiation loop
exponent digit ei = 2tu | 21 · 1 | 21 · 1 | 20 · 1 | 21 · 1 | 20 · 3 |
p ← p2 mod n | - | 14132 | 13261 | 17616 | 13599 |
p ← mod n | - | - | 4239 | - | 17343 |
p ← pau mod n | 1234 | 13662 | 10789 | 3054 | 4445 |
p ← p2 mod n | 18019 | 7125 | - | 1262 | - |
Result
p = 1234667 mod 18577 = 4445.
As an extension to the general case we shall introduce a special version of exponentiation with a power of two 2k as exponent. From the above considerations we know that this function can be implemented very easily by means of k-fold exponentiation. The exponent 2k will be specified by k.
Function: | modular exponentiation with exponent a power of 2 |
Syntax: |
int mexp2_l (CLINT a_l, USHORT k, CLINT p_l, CLINT m_l); |
Input: | a_l (base) k (exponent of 2) m_l (modulus) |
Output: | p_l (residue of mod m_l) |
Return: | E_CLINT_OK if all is ok E_CLINT_DBZ if division by 0 |
int mexp2_l (CLINT a_l, USHORT k, CLINT p_l, CLINT m_l) { CLINT tmp_l; if (EQZ_L (m_l)) { return E_CLINT_DBZ; }
If k > 0, then a_l is squared k times modulo m_l.
if (k > 0) { cpy_l (tmp_l, a_l); while (k-- > 0) { msqr_l (tmp_l, tmp_l, m_l); } cpy_l (p_l, tmp_l); } else
Otherwise, if k = 0, we need only to reduce modulo m_l.
{ mod_l (a_l, m_l, p_l); } return E_CLINT_OK; }
Team-Fly |