Representation and Simulation

Data structures and algorithms can be derived from the theory to assist the programmer.

Transition Array

One popular way to encode the transitions is to store them inside a 2D array (see Table 38.5). The two dimensions are state and input, and each cell stores the next state. All transitions are therefore explicitly encoded. There needs to be a specific symbol to indicate an empty cell, which allows the finite-state machine to be sparse (that is, not fully connected).

Using only a transition array, a finite-state automaton can be modeled. To represent a finite-state machine, an additional output table is required. This stores the output in terms of the current state and input.

Table 38.5. Transitions Stored as a 2D Array, with Each Cell Pointing to the Next State (Some cells do not have following states.)
State\Input	a	b	c
1	4	3	-
2	-	1	-
3	-	-	2
4	1	-	-

Such an approach does prove quite fast given a reasonable size table because it's a plain lookup. The major benefit lies in simplicity, and minimal knowledge of programming is required to implement a 2D array. Very little effort is required to test and debug it, too.

On the downside, this representation is not compact. In an average case, a lot of redundant information is stored; indeed, most finite-state machines are sparsely connected (that is, many empty cells). An array that can store all the transitions is wasteful, and scales very poorly.

Directed Graph

A directed graph proves a more compact representation of a finite-state machine. This corresponds exactly with the graphical representation, so encoding it as a directed graph is little trouble. Typically, the states are stored as an array (at least conceptually). Each state is then responsible for keeping track of outward transitions. This can be done with a nested list or another array, as shown in Table 38.6.

The active state can be determined with a single lookup. The bottleneck of this approach lies in finding the applicable transition. A traversal of the array at each state is necessary, checking whether the current input symbol matches at each step.

Table 38.6. Representation of a Directed Graph Using an Array of States Containing Lists of Transitions
State	Transitions
1	a: 4, b: 3
2	b: 1
3	c: 2
4	a: 1

Even a naive traversal of this array may prove faster than a lookup in a large array, because of the cache consistency. Indeed, all the transitions can often be stored within the same cache line. To squeeze every possible cycle out of the search, each of the transitions could be sorted according to the input symbol consumed. This would allow a binary search, and reduce the complexity of the traversal to O(log n) rather than O(n). Finally, a hash table could make this a near constant lookup at the cost of additional memory.

These tricks are often unnecessary because of the simplicity of the finite-state machine used and because the overhead is negligible compared to other aspects of the AI. If an early prototype reveals that large finite-state machines are required, however, such optimizations should be considered.

Grammar

Finite-state machines recognize a distinct set of patterns. All these patterns fall into the category of regular languages, which have limited complexity. Finite-state machines are not a model of computation as advanced as the Turing machine (which can emulate any form of computation).

Intuitively, regular languages correspond to regular expressions commonly used for text processing (mainly in the Perl programming language). Such regular expressions are a compact way of representing a regular grammar, which can thereby be compiled into a finite-state machine:

  (baab)* ((.*)ab+){1,2}

Grammars can be used to store the essential structure of finite-state automata, but are not appropriate for finite-state machines with outputs. This representation should be considered mainly for recognizing and accepting text. Those not familiar with regular expressions may prefer the graphical approach to design!

Some constructs are easier to model as finite-state machines, but operations such as counting can be difficult to express in a simple fashion. The {1,2} shown actually specifies that the previous pattern should be present either once or twice; this requires duplicating the sub-FSA capable of accepting that pattern, and connecting it twice. This gets worse for larger numbers, so alternative approaches (that is, counters, using an additional variable to count) are generally used in combination to create simpler, hybrid finite-state machines.

Transition Array

Table 38.5. Transitions Stored as a 2D Array, with Each Cell Pointing to the Next State (Some cells do not have following states.)

Directed Graph

Table 38.6. Representation of a Directed Graph Using an Array of States Containing Lists of Transitions

Grammar