Introducing the Finite State Machine

A Finite State Machine (FSM) is best described as an example of an abstract machine. It's not a machine in the traditional, physical sense, but a logic model implemented entirely in software. In some respects, it is a mini computer in its own right. It has inputs, outputs, and processing in which decisions are made on what its output should be at particular points in time, based on its input.

The true test for any abstract machine is that it could be implemented outside a software environment. For those of you with a background in electronics, the simplest example is an AND gate two inputs, one output, and the output determined by two inputs. The AND gate can be constructed in the real world using two transistors. In programming it is implemented using the AND operator (&& in PHP).

The FSM is an example of a somewhat more sophisticated abstract machine, however.

The traditional textbook definition suggests that a Finite State Machine consists of a set of arbitrary states. These include an initial state, which is applicable as soon as it is first initialized, a set of input events, a set of output events (although the word events is somewhat misleading), and a state transition function.

In other words, an FSM is a device for iterating or traversing some collection of inputs in an intelligent manner to produce an answer of some kind. Unlike a simple foreach statement, or even an implementation of Iterator against a collection class, it is not simply a single action performed against a series of candidates. Rather, an FSM implements an action performed against each candidate that can depend on all the previous candidates encountered before it.

For example, it would not be correct to consider a simple for loop passing over each object in an array of objects to be a Finite State Machine, because the for loop is simply applying the same logic to each object. Although the logic applied may be determined or influenced in some way by the progress through the loop, it is not affected in any manner by the actions it has performed previously, or by those objects it has encountered earlier in the loop. A real-life example of such iterations is traversing the result tuples returned from a database query in order to instantiate corresponding objects. The act of instantiation is the same regardless of the data in the previous iteration.

A Simple FSM: The RPN Calculator

The simplest known example of an FSM, one frequently demonstrated in textbooks explaining the theory, is that of the Reverse Polish Notation (RPN) Calculator. Those of you with UNIX exposure may be familiar with bc and other similar command-line tools. These helpful GNU utilities allow you to easily parse an arithmetic expression, as shown in the following example:

   ed@genesis:$bc    bc 1.05    Copyright 1991, 1992, 1993, 1994, 1997, 1998 Free Software Foundation, Inc.    This is free software with ABSOLUTELY NO    WARRANTY.    For details type 'warranty'.    7 * (9 + 1) * 29    2030 

In this example, bc is used to calculate a simple sum [7 (9+1) 29].

This is all well and good for modern tools such as bc and sophisticated spreadsheet packages such as Excel, but in the earlier days of computing the concept of the precedence of operators (that is, parentheses, indices, multiplication and division, addition and subtraction) was hugely difficult to implement, especially given that parentheses may be nested. Accordingly, the use of parentheses was out and some other means of expressing precedence was required. This was where Reverse Polish Notation, or RPN, came in.

The concept was simple. Operators were written to the right of the numbers to which they related, with those numbers being written in reverse. In the previous example, you would normally write:

   7 * (9 + 1) * 29 

In RPN, you would write the same expression as:

   1 9+29 7 * * 

Essentially, a stack of numbers is maintained. Moving from left to right, every time a number is encountered it is pushed onto the very top of a stack. Every time an operator is encountered the topmost two numbers are taken from the stack and that single expression evaluated in isolation. The result is added to the top of the stack as a number in its own right.

In the previous example, the numbers 1 and 9 are added to the stack. A plus sign is then encountered, meaning that two numbers must be taken from the stack. The 1 and the 9 are taken, leaving the stack empty. The 1 and the 9 are added together, giving 10. This number 10 is added to the empty stack. Next, the number 29 is encountered, then the number 7. These are all added to the top of the stack, giving us 10, 29, and 7 in the stack. Next, a multiplication sign is encountered. The top two numbers, 7 and 29, are taken and multiplied together, to give 203. This number 203 is then put back on the stack, leaving the numbers 230 and 10 on the stack. Finally, a multiplication sign is encountered again, at which point the final two numbers on the stack, 203 and 10, are taken and multiplied together to give 2030. At the end of the series of inputs, there is no more data, so 2030 is the answer.

This is a very good example of a Finite State Machine. There are two states, either accumulating the number stack or performing a basic operation. For each piece of entry data, which will either be a number or an operator, the FSM determines both what it has been doing last (accumulating the stack or performing an operations) and what it is looking at now (an operator or a digit).

It then has to decide what to do now:

Current State

Current Symbol

Action

New State

Accumulating

Number

Add number to stack

Accumulating

Accumulating

Operator

Take top two numbers off the stack, perform the operation on the two numbers, put the result on the top of the stack

Operating

Operating

Number

Add number to the stack

Accumulating

Operating

Operator

Take top two numbers off the stack, perform the operation on the two numbers, put the result on the top of the stack

Operating

This particular FSM does not really need to decide its next state at any stage, because this is not determined by previous inputs, only by the current input.

This variety of FSM is actually known as a Push Down Automaton, or PDA, because it has a concept of memory in the form of its stack. All PDAs are FSMs. Not all FSMs are PDAs.

Theoretical Implementation of FSMs

At the simplest level, an individual Finite State Machine is defined by building what are known as transition tables.

These tables map combinations of input symbols and current machine states into a combination of an action and a subsequent state. There needs to be one rule for each combination of input and state. For example, if two inputs and two states are conceivable, the following transition table might apply:

   {input1, current1} --> {action, next}    {input2, current1} --> {action, next}    {input2, current1} --> {action, next}    {input2, current2} --> {action, next} 

For any given input symbol, the Finite State Machine uses these tables to decide what action to perform and what the next state will be. Generally, a default transitional rule is also defined, either to avoid verbosity or to trap errors in input.

There are two basic variations on the FSM theme: deterministic FSMs (DFSMs) and nondeterministic FSMs (NDFSMs). In a deterministic FSM, the next state of the machine is determined solely by a single input symbol. By contrast, the next state of a nondeterministic FSM depends on both the current input symbol and one or more subsequent symbols. The RPN calculator is an example of an NDFSM in the respect that its next state is determined from whether the next symbol it encounters is an operator or a numeric, and cannot be determined from only the current symbol.

Implementing FSMs in PHP

FSMs are relatively easy to implement syntactically. Indeed, you may well have unconsciously written code in the past that could easily be classified as an FSM to satisfy some algorithmic requirement. However, PHP has an implementation of FSM of its own in the form of a PEAR class known simply as FSM.

Installing the FSM Class

You should install the FSM PEAR class in the normal manner. Use the PEAR package manager to install it from your development server's command prompt:

   root@genesis:# pear install FSM    downloading FSM-1.2.1.tgz ...    ...done: 3,151 bytes    install ok: FSM 1.2.1 

You can test your installation by executing the following command-line input, simply by entering code straight into the PHP command-line application:

   root@genesis:/public_html/prophp5# php    <?    require_once("FSM.php");    ?> 

Press Control+D on UNIX (Control+Z on Windows systems) to indicate EOF and you should see nothing at all. If PHP throws an error message about not finding the class, double-check that the package installed correctly into /usr/local/lib/php (or your setup's equivalent).

The Syntax of the FSM Class

The FSM class is very simple. Unlike other programming frameworks such as PHPUnit, it is not extended in everyday use but is implemented like any other class you may have written yourself. For the curious, the full syntax of the FSM class can be found at author Jon Parise's Web site at http://www.indelible.org/pear/FSM/api/, but we detail the essential methodology in this section.

This particular implementation of the FSM model is limited to coping with a single string as its input. Accordingly, should you wish it to cope with collections of objects or something more complex, you will need to serialize them in some manner. In addition, symbols are read strictly one character at a time, so you must implement transitions to account for characters that are delineating entities (for example, the string '53 15 19 * * =' will be read as characters '5', '3', ' ', '1', '5', ' ', and so on, so your code must be able to rebuild '5' and '3' to '53', and so forth).

Instantiating the Class

The FSM class is instantiated as follows:

   $stack = array();    $fsm = new FSM('INIT', $stack); 

The first parameter passed is the initial state of the machine. States in the FSM class are named using simple strings, and these names have no real significance beyond code readability (consider them equivalent to variable names in this respect). Here, the initial state is named INIT, which seems a sensible choice.

Note also that we have passed a stack variable (which should be of a simple array type) for the FSM to use as its "memory.'' The FSM constructor specifies that this is passed "by reference'' rather than as a copy, so you do not need to pass it as a reference here, too.

Setting the Default Transition

First, you should endeavor to tell your instantiated Finite State Machine the name of the default method to call if it is unable to match another transition against the input and current state. This may be either an error routine (that is, to handle bad input) or a catch-all method.

   $fsm->setDefaultTransition('INIT', 'Error'); 

The first parameter passed refers to the machine state that should be adopted when this transition is encountered. The second parameter is the name of the method, which will be called prior to this state's being set. Calling this method in your own FSM implementation is optional but is highly recommended nonetheless.

Setting a Single Transition

You may set a single transition for a given input symbol and given entry state using the addTransition method.

   $fsm->addTransition('=', 'INIT', 'INIT', 'DoEqual'); 

The first parameter is the symbol in question. The second is the entry state that must be matched for this transition to apply. The third is the state that will be set after this transition has completed. The fourth is the method that should be called when this transition occurs. The final action parameter may be null.

Setting a Transition for an Array of Inputs

You may simplify your code by specifying a given transition for a single state and an array of input symbols. This can be very useful, for example, for specifying a single transition that applies for all digits 0 thru 9.

        $fsm->addTransitions(range(0,9), 'BUILDING_NUMBER', 'BUILDING_NUMBER',    'BuildNumber'); 

The first parameter is the array of matching symbols in question. The second is the entry state that must be matched for this transition to apply. The third is the state that will be set after this transition has completed. The fourth is the method that should be called when this transition occurs. Again, the final action parameter may be null.

Starting the Machine

After you have defined your transitions, the machine is started by calling the processList method as follows:

   $fsm->processList($symbols); 

One parameter is allowed: the string containing the symbols to be processed. If this is directly dependent on user input, you may want to perform some sanity checking and cleansing before it is passed to the FSM class.

Disassembling the RPN Calculator Example

An example implementation of the RPN Calculator can be found within the PEAR class API at http://www.indelible.org/pear/FSM/api/. Although the source code for the RPN Calculator is listed in full within the API at the preceding URL and works perfectly, no explanation is provided as to its inner workings. Accordingly, it's time to look further through this simple example to discover how it works.

The full source code is available online, so if you haven't done so already, copy and paste the source code into a new file and call it rpn.php. Run this script from the command line, not from your Web browser, as shown in the following example:

   root@genesis:/public_html/prophp5# php ./rpn.php    Expression:    1 9 + 29 7 * * =    2030    root@genesis:/public_html/prophp5# 

The RPN implementation from the FSM API documentation is a command-line application rather than a traditional Web-based PHP script, but it would be a relatively trivial matter to adapt it should you feel any great need to do so.

As you can see, we have supplied our expression 29 (1+9) 7 and reached the correct answer. But how does it work?

Take a look at the code itself. The entry point is after the function definitions:

       $stack = Array();        $fsm = new FSM('INIT', &$stack);        $fsm->setDefaultTransition('INIT', 'Error');        $fsm->addTransitionAny('INIT', 'INIT');        $fsm->addTransition('=', 'INIT', 'INIT', 'DoEqual');        $fsm->addTransitions(range(0,9), 'INIT', 'BUILDING_NUMBER',    'BeginBuildNumber');        $fsm->addTransitions(range(0,9), 'BUILDING_NUMBER', 'BUILDING_NUMBER',    'BuildNumber');        $fsm->addTransition(' ', 'BUILDING_NUMBER', 'INIT', 'EndBuildNumber');        $fsm->addTransitions(array('+','-','*','/'), 'INIT', 'INIT', 'DoOperator'); 

As you can see, the first steps are to set up the instance of the FSM class, and set the default and normal-use transitions.

Only two states are defined. The initial state is known as INIT, which is active as soon as the class is instantiated, after an operation has been performed, and after a distinct number has been added to the stack. The second is known as BUILDING_NUMBER, which is active only while the digits of a number are being glued together.

Note that the range() method is used to specify the numerals 0 thru 9 and that the arithmetic operators are specified using a simple, in-line array. Five custom methods are referred to for the transitions: BeginBuildNumber, BuildNumber, EndBuildNumber, DoOperator, and DoEqual. You can now look at these in more detail.

BeginBuildNumber

This method is called whenever the string contains a numeric and the machine had not previously been building a number; for example, when the previous character was a space and this character is a 5 this 5 must be the beginning of a number, so this method is called.

   function BeginBuildNumber($symbol, $payload)    {        array_push($payload, $symbol);    } 

This method, in common with all methods called as part of a transition, takes two non-optional parameters: a reference to the FSM stack and the symbol being passed.

In the case of this method, the symbol passed is added to the stack and nothing else. Don't concern yourself with changing states; this is handled by the FSM directly.

BuildNumber

This method is called whenever the string contains a numeric and the machine had previously been building a number; for example, whena1is encountered, and the last digit encountered was a 5. This is therefore the second digit in a number being built up.

   function BuildNumber($symbol, $payload)    {        $n = array_pop($payload);       $n = $n . $symbol;       array_push($payload, $n);    } 

The number may be 51, or it could be 51310 you not know this yet, so you take the number as it previously existed on the stack off the stack, append this new digit, and place it back on the stack. For example, if the number was previously 5 in the stack, this 5 is removed from the stack, 1 is appended to give 51, and the number is placed back on the stack.

EndBuildNumber

This method is called whenever the string contains a blank space and the machine had previously been building a number; for example, when a space is encountered and the last digit encountered was a 1.

   function EndBuildNumber($symbol, $payload)    {        $n = array_pop($payload);        array_push($payload, (int)$n);    } 

The assembled number is taken from the stack; however, it currently resides there in string form because it has been appended to using the string concatenation operator. Before it is placed back on the stack, it is cast into an integer, ready for arithmetic operations.

DoOperator

This method is called whenever the string contains an operator (plus, minus, multiply by, or divide by) and the state is INIT that is, a number build process is not under way (see the BuildNumber method, discussed previously).

   function DoOperator($symbol, $payload)    {        $ar = array_pop($payload);        $al = array_pop($payload);        if ($symbol == '+') {            array_push($payload, $al + $ar);        } elseif ($symbol == '-') {            array_push($payload, $al-$ar);        } elseif ($symbol == '*') {            array_push($payload, $al * $ar);        } elseif ($symbol == '/') {            array_push($payload, $al / $ar);        }    } 

The last two numbers of the stack are recorded and removed. Because they will have both undergone the EndBuildNumber process, they will be integers, not strings. A simple if construct is then used to determine which operator has been specified, and the two numbers are operated on accordingly. The result of the operation is pushed back onto the stack.

DoEqual

This method is called whenever the string contains an equals sign and the state is INIT that is, a number build process is not under way. This tells the FSM to complete its operation and output the result currently on the stack.

   function DoEqual($symbol, $payload)    {        echo array_pop($payload) . "\n";    } 

One small oversight here is that no checking is done to see whether the stack contains only one value, or more than one. For any valid RPN expression, after all operators have been processed, the stack should be empty but for one final number, which is the evaluated expression.

If you feel strongly about this, it should be straightforward enough to add a further state to allow the FSM to determine whether the stack is ready for exit. Calls to the DoEqual method when this state is not true would not be allowed.

Real-World Examples of FSMs

An RPN calculator is a nice textbook example to follow, but it is not likely to be something you would want to implement in your own applications. In PHP in particular, you will find that you can use the eval() function to evaluate any numeric expression with relative ease, and certainly without recourse to the slightly obscure Reverse Polish Notation.

There are some real-world examples, however, of when the FSM is a genuinely useful approach.

Many implementations of FSM involve string parsing, and almost all could be solved with either a lethally complex regular expression (as is popular with PERL programmers) or some variety of while loop coupled with a series of "flags.'' Neither of these is a particularly elegant solution, and an FSM approach allows you to construct shorter, simpler, more maintainable code.

Configuration Files

One of the most common uses for a Finite State Machine is that of the configuration file. The file must be human readable in order to be human modifiable, but also computer readable in order to be useful in your application. For all the benefits of XML, it still requires some specialist knowledge to accurately modify, and as such is not a particularly useful format for a configuration file (far more common is that in fact used by PHP php.ini). In the next section of this chapter, you'll look at parsing a configuration file of that very format using PHP's built-in methods.

Lexical Analysis for Artificial Intelligence

You will almost certainly have played with "chat-bots.'' Early examples such as Eliza have been superseded by extraordinarily sophisticated models that are almost capable of passing the "Turing Test,'' which dictates that a machine professing to be artificially intelligent will have succeeded only if it can convince a questioner that it is in fact not a machine at all. Chat-bots are a fun implementation of lexical analysis, but real-world implementations are cropping up for "helpdesk'' search algorithms that allow users to express their questions in plain English.

FSMs provide an excellent mechanism for parsing language constructs; they can understand characters, words, and sentences and, at a more complex level, verbs, nouns, adverbs, adjectives, and more. Because the FSM is "state-aware,'' it can be programmed to expect these verbs, nouns, adverbs, and adjectives based on the current "state'' of the sentence.

The precise implementation of such an FSM is outside the scope of this book and probably too sophisticated for the FSM class met in this chapter. However, the Web has many excellent resources on the subject, including a white paper by the Universit t des Saarlandes at http://www.coli.uni-sb.de/kay/motivation.pdf.



Professional PHP5 (Programmer to Programmer Series)
Professional PHP5 (Programmer to Programmer Series)
ISBN: N/A
EAN: N/A
Year: 2003
Pages: 182
BUY ON AMAZON

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net