1.2 Algorithmic Complexity Rationale

Digital computers are commonly referred to as general purpose machines. The seeming implication is that with sufficient memory and operating speed it should be possible to implement any computable process on such a machine. The concept of computation universality, originally expressed in terms of the Turing model of computation, captures this idea. For the present purposes, the Turing formalism can be equated to a digital machine with no a priori limit on available memory and time. Such an idealized machine would be capable of computing any computable function. Realizable machines are, of course, finite. The memory available may not be sufficient to perform the desired computation; or the computation might require an unacceptable span of time. Here we are especially concerned with a further limitation: The size of the program that can be presented to the machine is also subject to practical restrictions.

The above distinction, between limits on processing capacity and program size, has an important implication. Even if processing speed and memory space could be increased indefinitely, a large class of information processing tasks would still be inaccessible. The programs, or maps describing the input-output behavior of the system, can be too large to practically specify.

Let us take as a computer any system that, starting from a state that encodes a problem description, will change to a state interpretable as the solution of the problem. The limited precision and limited dynamic range of the computer's components, together with the requirement of a finite response time, restrict any computer to a finite set of discernible inputs and a finite repertoire of outputs.

A deterministic computer is a physical realization of a function that takes an input signal pattern as argument and returns as the value the associated output signal pattern. To make the computer perform a desired task, it is necessary to specify the appropriate function. The specification may be provided explicitly by programming or, in the case of an adaptable system, implicitly through training. In either case, the specification has to select the desired system behavior from the set of potential behaviors.

Consider a deterministic computer that is supposed to respond to each n-bit input pattern with an appropriate m-bit output pattern. The function that maps the input into output can, in principle (and for small values of n also in practice), be described by a table. The table would have 2ⁿ rows, one for every possible input, and each row would contain the pattern that the computer should output in response to this input. Programming a computer requires that the table it should implement be communicated to it.

The amount of information necessary to specify the input-output map is given by the number of bits needed to select one specific table from the set of all possible tables. There are 2ⁿ rows corresponding to the possible inputs in the table, and any one of the 2^m possible outputs may be assigned to each row. This gives rise to 2^(m2ⁿ) possible tables. Selecting an arbitrary table from this set requires a specification that is log₂[2^(m2ⁿ)] = m2ⁿ bits long (Ashby 1968). The important implication is this: Even for input patterns of very moderate size, it will almost always be impossible to program a computer to perform a map arbitrarily selected from the set of possible maps. For example, consider a pattern of the size of a single character on a computer screen, say 10 10 black and white pixels (n = 100 bits), and suppose we want to classify such tiny images according to whether or not they contain a certain feature (meaning that m = 1 bit). This could require a program 10²⁰ gigabytes in length.

On the surface, it might seem that for any particular job required, it should be possible to devise an appropriate program of practical size. The following considerations from algorithmic complexity theory reveal that programming a "general" purpose computer is in fact practical only in very special situations.

In the example considered above, every row of the table that describes the classification of the 10 10 pixel images has a 1-bit entry indicating the presence or absence of the feature. The content of the table corresponds to a binary string of length equal to the number of rows in the table. Chaitin (1966) asked the question: How long would a program need to be in order to generate such a sequence? For our purpose, we can take the ability to generate the contents of the table as equivalent to the capacity to implement the input-output map described by the table. Some classifications have short programs. If we want each input image to be classified according to whether it is all black, then all but one row in the table will contain the same bit. A program much shorter than the explicit table will be sufficient to generate the table. This corresponds to the fact that the table is highly compressible—the program being a compressed description of the table. The algorithmic complexity of the table is defined as the length, up to an additive constant, of the shortest program required to generate it (Li and Vit nyi 1997). The additive constant reflects differences in machine architecture that, from a practical point of view, can have an immense impact as the constant becomes large (Kampis 1991).

For most tables, no significant compression is possible, as can be seen from a simple counting argument (Chaitin 1974). Under the assumption that (due to the capacity of the machine or its programmers) the longest practical program is limited to a length of b bits, there exist only 2^b distinct programs. The fraction η of tables describing n bit inputs mapped to m bit outputs, which can be compressed to a b-bit-long specification, is therefore at most

n = 2^(b−m2ⁿ)

Furthermore, this maximum value of η can only be achieved if the machine architecture is not degenerate in the sense that two or more distinct programs yield identical input-output behavior.

The above equation shows that, in practice, only a very small fraction of the conceivable information processing tasks can be implemented by programming a putatively general-purpose computer. However, the compressibility of the tables is relative to the machine architecture on which they are specified. Different architectures can bring different input-output behaviors within reach of practical specifications. An extreme example would be a machine specifically constructed to solve a single large problem instance (Zauner and Conrad 1996).

Every realizable information processing machine can only implement a small subset of the possible input-output transforms and is therefore a special-purpose device (Zauner and Conrad 2001). The common computers, often naively assumed to be general purpose, are in fact specialized devices that have been designed to implement the narrow class of highly compressible input-output maps.