ALTHOUGH WE NOW HAVE AT our disposal a software package with a well-founded and well-rounded suite of functions, we confront now the question of in what directions our work might be continued. There are possibilities for work in the areas of functionality and performance. With regard to functionality, one can imagine the application of the basic functions in FLINT/C to areas that have been only touched upon or not even mentioned at all, such as factorization or elliptic curves, which have properties that have led to increasing interest in them for application to cryptography. The interested reader can find detailed explications in [Bres], [Kobl], and [Mene], but also in the standard works [Cohe], [Schn], and [MOV], which we have cited frequently and which contain many references to the literature.
A second area for development is that of measures for improving throughput, first and foremost the increase in digit length from 16 to 32 bits (B = 232), as well as through the use of assembler functions and, for platforms that support it, the C/C++ implementation.
The work in development and testing for this last approach could be carried out independent of platform, such as with the help of the GNU compiler gcc, using the gcc type unsigned long long: The type CLINT would be defined by typedef ULONG CLINT[CLINTMAXLONG];. Furthermore, certain constants would have to be adjusted that relate to the base of the internal representation of integers.
In the functions of the FLINT/C package all explicit casts and other references to USHORT must be replaced by ULONG and those to ULONG by unsigned long (or after a suitable typedef by, say, ULLONG. A few functions that make assumptions about the length of a digit in the data type used must be ported. After an extensive test and debugging phase including static syntax checking (cf. Chapter 12) the FLINT/C package would then be ready for CPUs with 64-bit word length.
The inclusion of assembler functions also makes it possible to operate with digits of 32 bits and results of 64 bits, and to do so on processors that themselves have only a 32-bit word length but that nonetheless support a 64-bit result of an arithmetic operation.
Since with the use of assembler functions we are abandoning our previous strategy of independence of special platforms, it is useful to implement such functions in a narrowly targeted way. We must therefore identify those FLINT/C functions that would most profit in speedup from assembler support. It is not difficult to determine which are the functions in question. They are those arithmetic functions with quadratic run-time behavior: multiplication, squaring, and division. Since the basic operations occupy the principal portion of time taken up by most of the number-theoretic functions, time improvements in those functions would be linear, without directly changing the implementation of the algorithms. To benefit from such a potential, for the FLINT/C package the functions
mult(), umul(), sqr(), div_l(),
are implemented in 80x86 assembler. The functions mult(), umul(), and sqr() form the respective kernels of the functions mult_l(), umul_l(), and sqr_l() (see page 70). The functions support arguments up to length 4096 binary digits, which is 256 (= MAXB) digits of a CLINT number, and results of double that length. The assembler functions are, like the corresponding C functions, implemented according to the algorithms given in Chapter 4, where access to the CPU register allows processing of 32-bit arguments and 64-bit results with the arithmetic machine commands (cf. Chapter 2).
The assembler module mult.asm, umul.asm, sqr.asm, and div.asm are contained in the FLINT/C package as source code. They can be assembled using Microsoft MASM (call: ml /Cx /c /Gd <filename>) or Watcom WASM,[1] and these replace the corresponding C functions when the module flint.c is translated with -DFLINT_ASM.[2] The calculation times given again in Appendix D permit a direct comparison of some of the important functions with and without assembler support.
Montgomery exponentiation (see Chapter 6) offers additional savings potential, and also the two auxiliary functions mulmon_l() and sqrmon_l() (cf. page 108) can be implemented profitably as assembler functions with 32-bit digits. A starting point for this is offered by the modules mul.asm and sqr.asm. As the interested reader can see, there is a very large sandbox out there to play in.
For now, this is all we know.
—Jon Hiseman, Colosseum
[1]Depending on the compiler used the indicators mult, umul, sqr, and div_l of the assembler procedures are to be provided with underscores (_mult, _umul, _sqr, and _div_l), since WASM does not generate them.
[2]With the modules mult.asm, sqr.asm, umul.asm, and div.asm this functions on 80x86 compatible platforms. For other platforms one must carry out the corresponding implementations.
| Team-Fly   | 
