Chapter 2: Number Formats: The Representation of Large Numbers in C

Team-Fly

So I have made up my own system for writing large numbers and I am going to use this chapter as a chance to explain it

—Isaac Asimov, Adding a Dimension

The process that has led to the higher organization of this form could also be imagined differently

—J. Weber, Form, Motion, Color

ONE OF THE FIRST STEPS in creating a function library for calculating with large numbers is to determine how large numbers are to represented in the computer's main memory. It is necessary to plan carefully, since decisions made at this point will be difficult to revise at a later time. Changes to the internal structure of a software library are always possible, but the user interface should be kept as stable as possible in the sense of "upward compatibility."

It is necessary to determine the order of magnitude of the numbers to be processed and the data type to be used for coding these numerical values.

The basic function of all routines in the FLINT/C library is the processing of natural numbers of several hundred digits, which far exceeds the capacity of standard data types. We thus require a logical ordering of a computer's memory units by means of which large numbers can be expressed and operated on. In this regard one might imagine structures that automatically create sufficient space for the values to be represented, but no more than is actually needed. One would like to maintain such economically organized housekeeping with respect to main memory by means of dynamic memory management for large numbers that allocates or releases memory according to need in the course of arithmetic operations. Although such can certainly be realized (see, for example, [Skal]), memory management has a price in computation time, for which reason the representation of integers in the FLINT/C package gives preference to the simpler definition of static length.

For representing large natural numbers one might use vectors whose elements are a standard data type. For reasons of efficiency an unsigned data type is to be preferred, which allows the results of arithmetic operations to be stored in this type without loss as unsigned long (defined in flint.h as ULONG), which is the largest arithmetic standard C data type (see [Harb], Section 5.1.1). A ULONG variable can usually be represented exactly with a complete register word of the CPU.

Our goal is that operations on large numbers be reducible by the compiler as directly as possible to the register arithmetic of the CPU, for those are the parts that the computer calculates "in its head," so to speak. For the FLINT/C package the representation of large integers is therefore by means of the type unsigned short int (in the sequel USHORT). We assume that the type USHORT is represented by 16 bits and that the type ULONG can fully accept results of arithmetic operations with USHORT types, which is to say that the informally formulated size relationship USHORT × USHORT ≤ ULONG holds.

Whether these assumptions hold for a particular compiler can be deduced from the ISO header file limits.h (cf. [Harb], Sections 2.7.1 and 5.1). For example, in the file limits.h for the GNU C/C++ compiler (cf. [Stlm]) the following appears:

 #define UCHAR_MAX 0×ffU #define USHRT_MAX 0×ffffU #define UINT_MAX 0×ffffffffU #define ULONG_MAX 0×ffffffffUL

One should note that with respect to the number of binary places there are actually only three sizes that are distinguished. The type USHRT (respectively USHORT in our notation) can be represented in a 16-bit register; the type ULONG fills the word length of a CPU with 32-bit registers. The type ULONG_MAX determines the value of the largest unsigned whole numbers representable by scalar types (cf. [Harb], page 110).^[1] The value of the product of two numbers of type USHRT is at most 0×ffff * 0×ffff = 0×fffe0001 and is thus representable by a ULONG type, where the least-significant 16 bits, in our example the value 0×0001, can be isolated by a cast operation into the type USHRT. The implementation of the basic arithmetic functions of the FLINT/C package is based on the above-discussed size relationship between the types USHORT and ULONG.

An analogous approach, one that used data types with 32-bit and 64-bit lengths in the role of USHORT and ULONG in the present implementation, would reduce the calculation time for multiplication, division, and exponentiation by about 25 percent. Such possibilities are realizable with functions written in assembler with direct access to 64-bit results of machine instructions for multiplication and division or with processors with 64-bit registers that would also allow to C implementations the lossless storage of such results in a ULONG type. The FLINT/C package contains some examples whose gain in speed results from the use of arithmetic assembler functions (see Chapter 18).

The next question is that of the ordering of the USHORT digits within a vector. We can imagine two possibilities: from left to right, with a descending evaluation of digits from lower to higher memory addresses, or the other way round, with an ascending evaluation of digits from lower to higher memory addresses. The latter arrangement, which is the reverse of our usual notation, has the advantage that changes in the size of numbers at constant addresses can take place with the simple allocation of additional digits, without the numbers having to be relocated in memory. Thus the choice is clear: The evaluation of digits of our numerical representation increases with increasing memory address or vector index.

As a further element of the representation the number of digits will be appended and stored in the first element of the vector. The representation of long numbers in memory thus has the format

where B denotes the base of the numerical representation; for the FLINT/C package we have B := 2¹⁶ = 65536. The value of B will be our companion from here on and will appear continually in what follows. The constant CLINTMAXDIGIT represents the maximal number of digits of a CLINT object.

Zero is represented by the length l = 0. The value n of a number that is represented by a FLINT/C variable n_l is calculated as

If n > 0, then the least-significant digit of n to the base B is given by n_l[1], and the most-significant digit by n_l[n_l[0]]. The number of digits of n_l[0] will be read in what follows by the macro DIGITS_L (n_l) and set to l by the macro SETDIGITS_L (n_l, l). Likewise, access to the least-significant and most-significant digits of n_l will be conveyed by LSDPTR_L(n_l) and MSDPTR_L(n_l), each of which returns a pointer to the digit in question. The use of the macros defined in flint.h yields independence from the actual representation of the number.

Since we have no need of a sign for natural numbers, we now have all the required elements for the representation of such numbers. We define the corresponding data type by

 typedef unsigned short clint; typedef clint CLINT[CLINTMAXDIGIT + 1];

In accordance with this, a large number will be declared by

 CLINT n_l;

The declaration of function parameters of type CLINT can follow from the instruction CLINT n_l in the function header.^[2] The definition of a pointer myptr_l to a CLINT object occurs via CLINTPTR myptr_l or clint *myptr_l.

FLINT/C functions can, depending on the setting of the constant CLINTMAXDIGIT in flint.h, process numbers up to 4096 bits in length, which corresponds to 1233 decimal digits or 256 digits to the base 2¹⁶. By changing CLINTMAXDIGIT the maximal length can be adjusted as required. The definitions of other constants depend on this parameter; for example, the number of USHORTs in a CLINT object is specified by

 #define CLINTMAXSHORT CLINTMAXDIGIT + 1;

and the maximal number of processable binary digits is defined by

 #define CLINTMAXBIT CLINTMAXDIGIT << 4;

Since the constants CLINTMAXDIGIT and CLINTMAXBIT are used frequently, yet are rather unwieldy from a typographical point of view, we shall denote these constants by abbreviations MAX_B and MAX₂ (with the exception of program code, where the constants will appear in their normal form).

With this definition it follows that CLINT objects can assume whole-number values in the interval [0, B^MAX_B − 1] respectively [0, 2^MAX₂− 1]. We denote the value B^MAX_B − 1] = 2^MAX₂ − 1, the largest natural number that can be represented by a CLINT object, by N_max.

For some functions it is necessary to process numbers that have more digits than can be accommodated by a CLINT object. For these cases the variants of the CLINT type are defined by

 typedef unsigned short CLINTD[1+(CLINTMAXDIGIT<<1)];

and

 typedef unsigned short CLINTQ[1+(CLINTMAXDIGIT<<2)];

which can hold double, respectively four times, the number of digits.

As support personnel to aid in programming, the module flint.c defines the constants nul_l, one_l, and two_l, which represent the numbers 0, 1, and 2 in CLINT format; and in flint.h there are corresponding macros SETZERO_L(), SETONE_L(), and SETTWO_L(), which set CLINT objects to the corresponding values.

^[1]Without taking into account such practical nonstandard types as unsigned long long in GNU C/C++ and certain other C compilers.

^[2]In this regard compare Chapters 4 and 9 of the extremely readable book [Lind], where there is an extensive explanation of when vectors and pointers in C are equivalent, and above all, when this is not the case and what types of errors can arise from a misunderstanding of these issues.

Team-Fly