7.2 Defining Object State Behavior

In state machines designed by traditional structured methods, the portion of the system exhibiting the state behavior is not clearly defined. Some set of functions and data collaborate in a way that lends itself to finite state modeling, but generally this set is only vaguely defined. In object-oriented methods, the programmatic unit exhibiting state behavior is clear only classifiers, such as classes and use cases, can define state models, and only objects execute state machines.^[3]

^[3] We use the term state model to mean the definition of a state machine, which, when used, must defined within a class. The term state machine is an instance of a state model and therefore must belong to an object.

Consider a simple, retriggerable, one-shot timer. Such a timer is generally in one of two possible states: idle and counting down. When the timeout event occurs, the timer issues an event to a client object, (implicitly) resets, and returns to the counting-down state. This model is shown in Figure 7-2.

Figure 7-2. Retriggerable One-Shot Timer FSM

graphics/07fig02.gif

Remember that the UML does not define an action language, although it does define the set of things you can specify with actions the action semantics. We will use C++ and the occasional macro (such as GEN() to generate an event) as the action language here, but any computer or formal language can be used.

The example state machine for the OneShotTimer class shown in Figure 7-2 consists of two states and three transitions. In the Idle state, the timer isn't counting down it sits patiently waiting for the evStart Cmd. The evStartCmd event carries with it a single value, the length of time between timeouts. When the event occurs, the object transitions to the counting-down state. As soon this state is entered, an implicit timer is started because the state has an exiting timeout transition (shown using the common tm() event trigger^[4]). When the timeout occurs, the tm() transition is taken, and the actions in its action list are executed. Actions are shown in an action list, which is separated from the transition label with a slash (/). The action in the tm() transition's action list is to send a signal to a client object. The actions in the action list are normally short, but in any case they will run to completion before the transition completes. The counting-down state is then re-entered. This restarts the timer associated with the timeout transition.

^[4] Note that some other tools use after() rather than tm().

In the counting-down state, the object can respond to two events: a timeout and an evStopCmd. In the latter case, the object enters the Idle state. The timer corresponding to the tm() transition is (implicitly) stopped as soon as its source state (counting down) is exited.

Another example of object state behavior is demonstrated by the model shown in Figure 7-3. In this model, objects that either send or receive such messages have the class CommunicatingObjects; such objects know how to construct messages to be sent or how to extract information from messages received. They associate with a Communicator class that knows how to orchestrate their transmission or reception. Note that the Communicator class has two associations with the CommunicatingObject, as indicated by the sender role and the receiver role. When one object (in the sender role) sends a bus message to a remote object (playing the receiver role) using a reliable communication protocol, an object is temporarily created until it can be verified that the receiver has received the message properly. Such an object is called a transactional object because it only exists for the duration of an interaction, or transaction, between other objects.

Figure 7-3. Message Transaction Structure

graphics/07fig03.gif

The transactional object in Figure 7-3 is called MsgTransaction. Its behavior is specified with the statechart in Figure 7-4. Instances of MsgTransaction begin life in the ReadyToSend state. When commanded (via the evGo event), they enter the sending state and invoke a method in the protocol stack called send. Because MsgTransaction objects mediate reliable message delivery, once done sending the message, they wait for an acknowledgement (an ACKMsg) for that specific message (as specified with a message identifier). If an acknowledgement message for the transmitted message is received, then we're done. If not, the MsgTransaction object, after waiting an appropriate length of time, retransmits the same message and resumes waiting. If after some number of attempts, the acknowledgement is not received, the MsgTransaction object gives up and sends a failure event to the Communicator, which will in turn inform the sender. Either way, the MsgTransaction object is eventually discarded and destroyed.

Figure 7-4. Message Transaction Behavior

graphics/07fig04.gif

As a brief aside, the astute reader will notice that the Communicator knows how to send using both reliable and unreliable means. It is important, for example, that ACKMsgs and NAKMsgs are not sent reliably or the system will enter an infinite recursion.

Also notice the © symbol on the statechart. As discussed in Chapter 3, this is a notation for the conditional pseudostate, a type of junction pseudostate in which a single event triggers the transition and at most one exiting transition branch will be taken, on the basis of the evaluation of the guard conditions. Guards are side effect free Boolean conditions placed in square brackets that are evaluated before any actions are executed.

7.2.1 Cardiac Pacemaker Example

A cardiac pacemaker is an excellent example of a predominantly reactive system in which many objects use finite state machines to specify their behavior. The following problem statement will be developed into a class model that will allow us to see how the various statechart features can be used in a real system.

Problem Statement: A Cardiac Pacemaker

The human heart is a marvelous four-chambered pump made up of two atrial chambers that contract slightly before the two larger ventricular chambers, and an electrical conduction system that, in the healthy heart, ensures the heart beats in an efficient, timely, and effective way. The atrial contraction preloads the larger and more powerful ventricular chamber by pumping extra blood into it, stretching its muscle fibers. The one-way a-v valves then close, and the ventricles contract strongly, sending unoxygenated blood in the right ventricle into the lungs to bind with oxygen and oxygenated blood from the lungs into the arterial system of the body. Myocardial cells contract autonomously because they have leaky membranes with ionic pumps that maintain a charge across the membrane. When the charge reaches a certain level, a set of ionic pores open up in the membrane causing a rapid depolarization of the muscle cell, causing it to contract. Different cells have different rates at which this contraction occurs. The fastest contracting cells are located in the sinoatrial (SA) node; because this area contracts first, and the depolarization spreads from cell to cell electrically, this area is said to be the "pacemaker" of the heart. The upper part of the heart, housing the atria, is separated electrically from the lower part, housing the ventricles. This separation (called the AV node) causes a slight delay in the contraction of the ventricles. This is hemodynamically important because it allows the ventricles to more completely fill before they contract and adds 20% or so to the volume of the blood pumped per beat. That's in the healthy heart. Sometimes the electrical control system fails and this can lead to any number of problems, the most common of which is a heart beat that is too slow to allow normal activity, a condition known as bradycardia.

A cardiac pacemaker is an implanted device that assists cardiac function when underlying pathologies make the intrinsic heart rate too low or absent^[5] by taking over the initiation of a cardiac contraction. Pacemakers operate in quite a number of behavioral modes, each indicated by a three-letter acronym. The first letter is either A, V, or D depending on whether the atrium or the ventricle or both (dual) is being paced (i.e., induced to contract via the delivery of a well-quantified electric shock). The second letter is also A, V, or D, depending on which heart chamber is being monitored for intrinsic heart beats. The last letter is I, T, or D, indicating inhibited, triggered, or dual pacing modes. In an inhibited mode, a sensed heartbeat will inhibit the delivery of a pace from the pacemaker. In triggered mode, a sensed heart event will immediately trigger a pace from the pacemaker. For example, VVI mode means that the ventricle is paced (the first V) if a ventricular sense (the second V) does not occur. A ventricular sense event is generated when the ventricle of the heart contracts. If a ventricular sense does occur, then the pace is inhibited (the I). Dual modes are more complex and will not be discussed here.

Most of the time, a pacing pacemaker waits for a sense event. When it decides to pace, the pacemaker first shuts off the sensing hardware to prevent electrical damage, and then delivers an electric current of a programmable voltage (called the pulse amplitude) for a programmable period of time (called the pulse width). Following a pace, the pacemaker is put into a refractory state for a set period of time, during which all cardiac activity is ignored. Following the refractory period the pacemaker resumes monitoring for the next cardiac event. The rate of pacing is determined by the programmable pacing rate. The period of time the pacemaker will remain in the waiting state is based on the pacing rate and the pulse width. The refractory period is usually fixed. Our problem is to create a pacemaker that operates in VVI, AAI, VVT, AAT, and AVI pacing modes as programmed by the physician.

Pacemaker parameters are programmed via a telemetric interface to an external programmer. Telemetry is sent by pulsing an electromagnetic coil a certain number of times to indicate a 0 bit and a different number of times to indicate a 1 bit. To avoid inadvertent programming by electrical noise, a reed switch must be closed with a magnet before programming is enabled and must remain closed as long as communication is ongoing. Removal of the magnet disables communications. In this particular pacemaker, commands are of the form

 [CD][Command ID] [Length][Data][CRC]

^[5] Other kinds of pacemakers address other pathologies, such as the heart beating too fast (tachycardia) and as well as extra ventricular beats (preventricular contractions or PVCs) and uncoordinated muscular contractions (fibrillations).

The command is an 8-bit value that specifies the command to be executed; the length is an 8-bit value that specifies the number of bytes in the data field, plus the size, plus 4 (to account for the command ID byte, the length byte, and the 16-bit CRC). Table 7-1 lists the messages for the pacemaker.

Table 7-1. Pacemaker Messages
Command ID	Value	Description
Get parameters	0	Request programmable settings in pacemaker.
Get serial number	1	Request manufacturers information from the pacemaker.
Set pacing mode	3	Sets the pacing mode to one of AAI=0, VVI=1, AAT=2, VVT=3, AVI=4. Data is 8 bits in length.
Set parameters	4	Sets the three pacing parameters, each of which is 8 bits long: pulse amplitude in mV from 0 (off) to 15 mV in integral units of mV. Pulse width is specified from 0ms to 30ms. Rate is set in beats per minute in the range of 30 .. 120, inclusive.
Exit low power Mode	32	Command the pacemaker to power the electronics for normal operation.
Enter low power Mode	33	Commands the pacemaker to disable all electronics except as necessary to perform communications.
Parameters	128	Data consists of four values plus 32-bit numbers: pacing mode (1 byte), pulse amplitude (1 byte), pulse width (1 byte), pacing rate (1 byte), and battery voltage (fixed point integer for range of 0.00 to 5.00 volts) (returned from the pacemaker).
Serial number	129	Data consists of four ASCII characters, "ACME," followed by three 32-bit numbers: the serial number, the hardware version number, and the software version number (returned from the pacemaker).
ACK	254	Acknowledgement of last message.
NAK	255	Last command failed validation.

The commands constructed from the bits must be checked with the 16-bit CRC before they can be acted on. If the command validates properly, then an ACK message is returned. Communications occur between two devices: the pacemaker and a device used by the physician called a programmer. The communications protocol between the two devices requires the programmer to receive an ACK before sending the next command. If the pacemaker doesn't respond within 5 seconds or if a NAK is received, the programmer will resend the message. This occurs automatically. After five successive transmission failures, the programmer notifies the physician (this can occur if the programming wand is too far from the pacemaker resulting in a weak telemetry signal). Note that communications cannot affect pacing other than as the result of processing a command that affects pacing; that is, they must occur in parallel.

A typical pacemaker-programmer interaction might be

Programmer:	Get Parameters
Pacemaker:	ACK
Pacemaker:	Parameters (AAI, 12mV, 15mS, 70ppm, 3.66V)
Programmer:	Set Pacing Mode (AVI)
Pacemaker:	ACK
Programmer:	Set Pacing mode (AVI)
Programmer:	Set Parameters (10mV, 20ms, 60 ppm)
Pacemaker:	ACK

Figure 7-5 shows the primary use cases for the CardioNada pacemaker. Three use cases are shown: set pacing parameters, retrieve current pacemaker settings, and pace the heart. Figure 7-6 shows the class model for the pacemaker. There are two packages in this system, each containing a single subsystem and the parts of those subsystems used to achieve the use cases. The Comm_Subsystem contains the classes necessary to send and receive, validate, and process those commands.

Figure 7-5. Pacemaker Use Cases

graphics/07fig05.gif

Figure 7-6. Pacemaker Class Diagram

graphics/07fig06.gif

7.2.1.1 How Communications Work

The first package contains the classes for communications. These are briefly described in Table 7-2.

Table 7-2. Communication Package Classes
Class	Description
Comm_Subsystem	This is a subsystem that manages communications and command processing.
ReedSwitch	Interfaces with the magnetic reed switch and sends events to the CoilDriver and Communications Gnome when the switch opens or closes. Communications is disabled unless the reed switch is closed.
CoilDriver	The CoilDriver pulses the magnetic coil 15 times to produce a 0 bit and 8 times to produce a 1 bit, with a programmed time interval between bits and a longer time interval between bytes and a still longer period between messages. Telemetry is half-duplex; that is, messages can be sent or received at any given time.
Communications gnome	The gnome processes receives income bytes, constructs messages from them, and validates them. If valid, the gnome sends out an ACKMsg (or a NAKMsg if not valid) as well as processes valid commands.
MessageQueue	There are two message queue objects in the Comm_Subsystem, one for holding bytes as them are received (until an EOM End Of Message event is received), and the other for holding messages ready to be transmitted.
Message	A set of bytes defined by the protocol.

Communications is achieved with the collaboration of the elements in the Communications_Subsystem. The ReedSwitch has simple On-Off state behavior. It propagates events to other classes in the communication subsystem to enable and disable communications specifically the CoilDriver and CommunicationsGnome. This prevents inadvertent reprogramming of the pacemaker by ambient radiation, arc welders, and microwave ovens. The GEN() macro sends the named event (and any specified parameters) to the object with the named role. For example, the ReedSwitch knows the CommunicationsGnome as myGnome. Using

myGnome->GEN(RS_Close)

sends the RS_Close event to the CommunicationsGnome. The Reed Switch has simple On-Off state behavior, as shown in Figure 7-7.

Figure 7-7. ReedSwitch State Model

graphics/07fig07.gif

The CoilDriver class has more elaborate state behavior, as shown in Figure 7-8. At the highest level, the CoilDriver has two or-states: Disabled (the default) and Enabled. When the ReedSwitch closes, it propagates an RS_Open event to the CoilDriver and the CoilDriver enters the Enabled states.

Figure 7-8. CoilDriver State Model

graphics/07fig08.gif

Inside of the Enabled state, there are three nested or-states: Idle, Receiving, and Transmitting. Since these are or-states, communication proceeds in a half-duplex fashion.

When the CoilDriver is in the Idle state, an incoming pulse causes it to go into the receiving state. The CoilDriver hardware contains an internal one-shot timer that determines pulse widths and issues an interrupt to the CoilDriver when a pulse is detected that is within the proper time window that results in the evPulse event shown in the statechart.

The ReceivingBit state counts the pulses that occur within the bit window. As long as a pulse is received before the timeout occurs, the timer is restarted. The number of pulses is tracked in the count attribute of the CoilDriver class. When the tm(BIT_WAIT) occurs, the decode() operation determines that it is either a 0 (in the range of 13 to 17 pulses) or a 1 (in the range of 5 to 10 pulses). This decoded bit is then shifted into a value that will hold the transmitted byte. Once all 8 bits have been shifted in, the completed byte is sent over to the Communications Gnome and the CoilDriver enters the WaitingForBit state. In this state, either an evPulse can be received (indicating the start of the next bit) or a timeout (indicating the end of the message) occurs. If the end of the message is found, the end of message (EOM) event is sent to the CommunicationsGnome so that it can process the now completely received message.

The other possibility from the Idle state is that the Communications Gnome sends a byte to be transmitted. For each bit in the byte to be transmitted, the CoilDriver shifts out the next bit, determines the number of pulses needed to send the bit (held in the pulstCt attribute), and sends pulses the coil that many times. Once all the bits have been sent out, the CoilDriver sends a DoneSending event to the CoilDriver to get the next byte to transmit.

The Communications Gnome oversees the communication process for the pacemaker (see Figure 7-9). It is enabled and disabled by the RS_Close and RS_Open events propagated from the Reed Switch. When enabled, but in the absence of incoming or outgoing messages, the Communications Gnome spends its time in its Idle state. The message level is almost completely decoupled from the strict timing requirements of the CoilDriver. In the Receiving state, the Communications Gnome adds the incoming bytes into the message until the EOM event is received. Then the message is validated (to be valid, the command must have a valid command byte, be of the correct length, and must pass the CRC check). If it is validated, an ACKMsg is queued to send and the command is processed. If the command fails validation, a NAKMsg is sent instead.

Figure 7-9. Communications Gnome State Model

graphics/07fig09.gif

In terms of setting pacing modes, the Communications Gnome sends commands to the instances of the AtrialModel and VentricularModel. When the Communications Gnome receives a command to put the pacemaker in AAI mode, for example, it sends a toIdle event to the VentricularModel instance and a toSelfInhibited event to the Atrial Model instance. The effects of the system pacing mode on the Atrial Model and VentricularModel objects are shown in Table 7-3.

Table 7-3. Pacing Modes and States
Pacing Mode	AtrialModel State	VentricularModel State
Off	Idle	Idle
AAI	SelfInhibited	Idle
VVI	Idle	SelfInhibited
AAT	SelfTriggered	Idle
VVT	Idle	SelfTriggered
AVI	DualMode	DualMode

For setting pulse width and pulse amplitude and heart rate, these are sent to both the AtrialModel and VentricularModel instances.

7.2.1.2 How Pacing Works

The Pacing package contains classes, shown in Table 7-4, used in monitoring and pacing the heart.

Table 7-4. Pacing Subsystem Classes
Class	Description
Pacing_Subsystem	This subsystem manages the monitoring and pacing of the heart through delegation to its internal pieces.
Chamber Model	This abstract base class defines the basic functionality for monitoring and pacing a cardiac chamber, including associations to a voltage sensor and an output capacitor.
Atrial Model	A concrete subclass of the Chamber Model class that uses the basic behavior defined in the Chamber Model class and that specializes the behavior of AVI (dual mode) pacing. This class monitors and/or paces the atrium of the heart via its associations to its own voltage sensor and output capacitor.
Ventricular Model	A concrete subclass of the Chamber Model class that uses the basic behavior defined in the Chamber Model class and that specializes the behavior of AVI (dual mode) pacing. This class monitors and/or paces the ventricle of the heart via its associations to its own voltage sensor and output capacitor.
Voltage Sensor	This class drives the hardware voltage sensor with operations that enable it (turn it on) and disable it (turn it off). When a contraction of the associated heart chamber is detected, it issues a Sense event to its associated Atrial_Model or Ventricular_Model instance.
Output Capacitor	This class controls the release of current stored in the output capacitor. The voltage is controlled by setting the voltage level in the enable() operation. The pulse width is the length of time the voltage is delivered.

The Chamber Model specifies most of the pacing behavior (see Figure 7-10). Pacing the ventricle works largely in the same fashion as pacing the atrium. The On state contains four nested or-states corresponding to idle, inhibited, triggered, and dual modes. For all but the last, there is no need for specialization in the subclasses. In the last (dual mode) however, the AtrialModel and VentricularModel play different roles in the interaction and specialization is a natural way to capture those differences.

Figure 7-10. Chamber Model State Model

graphics/07fig10.gif

The statechart in Figure 7-10 references three submachines. A submachine, you will remember, is a specification for the internals of a state when it is shown in a separate diagram. Figure 7-11 shows the internal structure of the SelfInhibited state and Figure 7-12 shows the internal structure for the SelfTriggered state. Note that the actions in these submachines invoke operations of the associated voltage sensor and output capacitor. The Dual state is decomposed as well, but it was left empty and so isn't shown here.

Figure 7-11. Chamber Model SelfInhibited Statechart

graphics/07fig11.gif

Figure 7-12. Chamber Model SelfTriggered Statechart

graphics/07fig12.gif

For the most part, the AtrialModel and VentricularModel state behavior is identical to that specified by the Chamber Model superclass. However, in AVI mode, the atrial model does the pacing and signals the VentricularModel when it should and should not pace. Those statecharts are shown in Figure 7-13 and Figure 7-14.

Figure 7-13. AtrialModel Dual Mode Statechart

graphics/07fig13.gif

Figure 7-14. VentricularModel Dual Mode Statechart

graphics/07fig14.gif

7.2.2 Calculator Example

Another highly reactive system, although admittedly one not normally considered either embedded or real-time, is a calculator. A calculator is an almost ideal example: It is conceptually simple yet complex enough to be difficult to implement in handwritten source code; it is highly reactive with rich state behavior; it has a well-understood behavioral model; and test examples to validate the model's correctness are easy to come up with.

Figure 7-15 shows the calculator's use cases. Because it is a simple device, there is only a single primary use case to evaluate an expression. In this case, that means to take an input string of characters representing a mathematical expression using real numbers and the arithmetic operators +, , *, /, ( and ), and to return a string representing the arithmetic result. The calculator should observe the normal operator precedence rules:

Parentheses have the strongest binding.
The next strongest binding is the unary + or operator.
The next strongest binding is for the multiplicative operators * and /.
The weakest binding is for the binary additive operators + and .

Figure 7-15. Calculator Use Cases

graphics/07fig15.gif

For example, the expression 6 + 3/4 should be evaluated as 3 divided by 4; the result of that subexpression should then be added to 6.

While it is possible to construct a monolithic (and complex) object to read in the string and spit out the answer, the solution provided in the class diagram is shown in Figure 7-16. The system is shown as a composite class named Calculator. It contains a number of parts, each of which fulfills some responsibilities to achieve the use case. Table 7-5 lists the general responsibilities of the various classes.

Figure 7-16. Calculator Classes

graphics/07fig16.gif

Table 7-5. Calculator Class Responsibilities
Class	Responsibility
Calculator	A System class, Calculator is composed of all of the parts of the internal system. As a composite, it is responsible for the creation and destruction of the parts it owns, as well as for wiring them together by instantiating the links between the parts with fixed multiplicities (in this case, all of them).
CharParser	Takes the input string and reads out a character at a time, identifies the type of the character (number, operator, or white space), then sends an appropriate event (along with the character) to the Tokenizer.
Tokenizer	Takes the input characters and constructs tokens either operators or numbers to be evaluated. Once a token is parsed, the token is sent to the Evaluator for evaluation.
Evaluator	The Evaluator takes in numeric and operator tokens and processes them using the operator precedence rules. Numbers are pushed onto the number stack, and operators are pushed onto the operator stack. The numbers and operators are reduced according with the rules of arithmetic and operator precedence.
NumberStack	An instantiation of a parameterized Stack class, this stack holds objects of type double.
OperatorStack	An instantiation of a parameterized Stack class, this stack holds objects of type char, representing the operators.
Stimulator	An object whose only purpose is to provide test vectors and force the calculator through its paces. The stereotype «TestingBuddy» indicates that it is part of the testing and debugging structures.

7.2.2.1 Calculator Class

The Calculator class is the composite class that represents the entire system. As a composite, it has the responsibility to construct and destroy the part instances. This makes it a convenient way to create the entire system we simply create a single Calculator instance and it in turn creates all of the parts and links them together as specified. Notice that all of the parts have single instances if the multiplicity was specified as *, the Calculator class wouldn't have enough information to do its job in its constructor, because the number of part instances and links wouldn't have been specified.

7.2.2.2 CharParser Class

The CharParser class takes a string, decomposes it into an ordered set of characters, identifies the types of each of these characters, and for each character, generates an event for the Tokenizer of the appropriate character type and passes the character. The types of characters are white space, digits (0 through 9), dot (.), or operators. These are computed via the set of utility functions define in the package, as shown in Code Listing 7-1.

Code Listing 7-1. Calculator Utility Functions

 //## operation isDigit(char) OMBoolean isDigit(char  c) {     //#[ operation isDigit(char)     return (c>='0' && c<='9');     //#] } //## operation isDot(char) OMBoolean isDot(char  c) {     //#[ operation isDot(char)     return c == '.';     //#] } //## operation isOp(char) OMBoolean isOp(char  c) {     //#[ operation isOp(char)     return (c=='-' || c=='+' ||         c=='*' || c=='/' || c == '^' ||         c == '(' || c == ')');     //#] } //## operation isWhiteSpace(char) OMBoolean isWhiteSpace(char  c) {     //#[ operation isWhiteSpace(char)     return (c==' ' || c== '/t');     //#] }

The CharParser has the list of attributes shown in Table 7-6.

Table 7-6. CharParser Attributes
Attribute	Type	Description
currentChar	char	Holds the current character extracted from exprString.
exprString	OMString	OMString is a basic string type provided by the Rhapsody tool. The exprString contains the expression to be evaluated.
freeRun	OMBoolean	OMBoolean is a basic Boolean type. This attribute is set to TRUE when you do not want to single step the execution; when set to FALSE, the characters are extracted, typed, and sent to the Tokenizer and the CharParser waits for the evGetNextChar event one at a time.
len	int	Length of exprString.
pos	int	Position of currentChar within exprString.

The behavior of the CharParser is highly state dependent. The statechart for this class is shown in Figure 7-17. The statechart could be simplified somewhat, but allows for both single-stepped operation and for free-run operation. The CharParser statechart begins parsing the expression once it receives an evGo event. Then it extracts the first character and types it, using the utility functions mentioned previously. Once the character has been typed, the CharParser GENs an event for the Tokenizer, passing the character with the event. The CharParser then continues with the next character until all characters have been read. As mentioned in , the freeRun attribute determines whether it will single-step or wait for the evGetNextChar event after each character is read and processed. Note that a special token end of string (EOS) is sent once all the characters have been processed.

Figure 7-17. CharParser Statechart

graphics/07fig17.jpg

7.2.2.3 Tokenizer Class

The Tokenizer class receives the events from the CharParser that correspond to the sequence of characters of the string to be evaluated. These events are evDigit, evDot (for a decimal place), evOp (for an operator), and evWhiteSpace (for a character to be ignored). For operators, the processing is very straightforward it constructs a single character token from the operator (since all operators are a single token), then calls sendOp(op) to send the operator to the Evaluator.

The Tokenizer has a number of attributes, shown in Table 7-7.

Table 7-7. Tokenizer Attributes
Attribute	Type	Description
op	char	The operator currently being tokenized
tensPlace	int	The decimal place for the current digit being processed
tokenStr	OMString	A string constructed of the incoming characters (used only for debugging)
value	double	The value of the number being tokenized

The code for the important operators is shown in Code Listing 7-2.

Code Listing 7-2. Tokenizer Operations

 void Tokenizer::addDigit(char  c) {     //#[ operation addDigit(char)     tokenStr = tokenStr + c; // build up string     value = value*10 + digit(c);     //#] } void Tokenizer::addDot() {     //#[ operation addDot()     tokenStr = tokenStr + '.';     tensPlace = 10;     //#] } void Tokenizer::addFractionalDigit(char  c) {     //#[ operation addFractionalDigit(char)     tokenStr = tokenStr + c;     value = value + digit(c)/tensPlace;     tensPlace *= 10; // get next decimal position     //#] } void Tokenizer::beginToken(char  c) {     //#[ operation beginToken(char)     tokenStr = c;     if (isDigit(c))       value = digit(c); // value of the number so far     else {       value = 0;       if (isOp(c)) op = c;       };     tensPlace = 10; // digit position     //#] } int Tokenizer::digit(char  c) {     //#[ operation digit(char)     return c-'0';     //#] } void Tokenizer::sendOp(int  c) {     //#[ operation sendOp(int)     switch (op) {       case '-':       case '+': itsEvaluator->GEN(evAddOp(op)); break;       case '*':       case '/':       case '^': itsEvaluator->GEN(evMultOp(op));         break;       case '(': itsEvaluator->GEN(evLeftParen(op));         break;       case ')': itsEvaluator->GEN(evRightParen(op));         break;       default: throw "Unknown operator";       };     //#] }

The processing for numbers is more complex. Because we need to properly interpret numbers such as 12, 12.4, 0.4, and .56789, the Tokenizer statechart (Figure 7-18) separates the processing of the whole part of the number from the fractional part. In addition, it uses its operations, shown in Listing 7-2, to compute a value, held in attribute value. As digits are read in, the tensPlace attribute holds the decimal place; this is used to multiply the current value by 10 before adding the next digit. Once we've found a decimal point (dot), we use this to divide the digit by 10^tensPlace to add that fraction to the value. The management of the execution of these operations is controlled by the Tokenizer statechart.

Figure 7-18. Tokenizer Statechart

graphics/07fig18.jpg

Note that the sendOp(op) operation further types the operator tokens into AddOp (+ or -) or MultOp (*, /) before sending them to the Evaluator. This simplifies the job for the Evaluator, as we shall see next.

7.2.2.4 Evaluator Class

So far, we've been getting ready to evaluate the string. The CharParser sends typed characters one at a time to the Tokenizer. The Tokenizer, in turn, constructs arithmetic tokens numbers or operators from these characters. The Evaluator actually does the mechanics of evaluating the string of tokens. As you might expect, it has the most complex statechart (see Figure 7-19).

Figure 7-19. Evaluator Statechart

graphics/07fig19.jpg

Let's walk through this statechart so that we understand how it works. Remember that the Evaluator gets tokens either operators (which are addition operators (+ or ), multiplicative operators (* or /), numbers, or a special token called EOS (end of string)).

Expressions that can be evaluated are of the form

 [ "signed number" operator "signed number" ]

where the elements inside the square brackets may repeat. Signed number is a number with an optional additive operator preceding it. Thus, the following are acceptable expression for the evaluator:

16
+8 45/89+19 1

White space is, as you might expect, discarded as the elements of the expression are tokenized so the Evaluator never "sees" white space.

The first thing the Evaluator expects to find is either a number (via an evNumber event) or a sign (via an AddOp event). In the case of an AddOp, the pushUnary(op) pushes a special token # to signify when a leading minus sign occurs; if a leading plus sign occurs, then it can be ignored (since while syntactically legal, it doesn't change the value).

Eventually, a number is received and the Evaluator enters the GotNumber state. Now it expects either an operator or an EOS token. Additive operators have the same precedence that is, because they are inverse operations, it doesn't matter in which order they are applied. In the same sense, multiplicative operators have the same precedence. However, the multiplicative operators have a higher precedence than additive ones. For example,

 1 + 2*4

returns 9 because the multiplication is performed first and then the addition. If operators were simply evaluated from left to right, this expression would evaluate to 12 (=3*4). Therefore, if we see an additive operator we need to save it for later because we don't want to apply it until we know whether or not there is a multiplicative operator coming. That's why when the Evaluator is in the GotNumber state and we receive an additive operator (via the AddOp(op) event), we push the number on the operator stack. This is also why, when we see a multiplicative operator, we can immediately apply it (okay, not immediately, as we have to get the second number first).

As long as we get additive operators we push the numbers and the operators onto the appropriate stacks. Once we see a multiplicative operator followed by a number, we can pop the numbers off the stack and apply the operators and push the results. This process is called reduction and is done by calling the reduce() operation.

Most of the rest of the transitions are for error handling. To make the statechart less ugly, several diagram connector pseudostates are used (WRONG TOKEN, EARLY EOS, and DONE) when special things happen. The first two of these are for ill-formed expressions such as 5.6 7.7 or 6 + * 8, where the token received wasn't what was expected. The DONE pseudostate is used when an EOS is received.

The methods defined for the Evaluator class are relatively simple, as the complexity of the Evaluator behavior is in the sequencing of the method calls, and that is controlled by the statechart. The methods of interest are shown in Code Listing 7-3.

Code Listing 7-3. Evaluator Methods

 void Evaluator::clear() {     //#[ operation clear()     itsNumberStack->clear();     itsOperatorStack->clear();     //#] } void Evaluator::displayResult() {     //#[ operation displayResult()     try {       result = itsNumberStack->pop();       cout << "Result = " << result << endl;     }     catch (EmptyStackError){       cout << "Found Empty stack in Evaluator.displayResult()" << endl;       };     //#] } void Evaluator::push(double  num) {     //#[ operation push(double)     itsNumberStack->push(num);     //#] } void Evaluator::push(char  op) {     //#[ operation push(char)     itsOperatorStack->push(op);     //#] } void Evaluator::pushUnary(char  unaryOp) {     //#[ operation pushUnary(char)     // '#' is pushed to indicate a unary minus     // if a unary plus is found (the other option)     // if can just be ignored and tossed away     if (unaryOp == '-')         itsOperatorStack->push('#');     //#] } void Evaluator::reduce() {     //#[ operation reduce()     OMBoolean done = FALSE;     while (!done) {      if (itsNumberStack->getTopOfStack()>1)        if (itsOperatorStack->elementAtTop() != '(')                   reduceFactor();        else                   done = TRUE; // done if left parenthesis      else       done = TRUE; // done if no numbers are left     }; // end while     //#] } void Evaluator::reduceFactor() {     //#[ operation reduceFactor()     char opToApply;     double num1,num2;     if (itsOperatorStack->elementAtTop() != '(') {       opToApply = itsOperatorStack->pop();       num1 = itsNumberStack->pop();       num2 = itsNumberStack->pop();       switch (opToApply) {             case '+': result = num1 + num2; break;             case '-': result = num2 - num1; break;             case '/': if (num1 == 0) throw "Divide by Zero";                     else result = num2 / num1;                     break;             case '*': result = num2 * num1; break;             case '(': throw "Unmatched Left Parenthesis";             case ')': throw "Parsing Error: Found Right Parenthesis";             };       itsNumberStack->push(result);     };     //#] } void Evaluator::reduceSubExpr() {     //#[ operation reduceSubExpr()     char op;     OMBoolean done = FALSE;     // reduce stack until left parenthesis found     while (!done) {       if (itsOperatorStack->getTopOfStack()>0) {             op=itsOperatorStack->elementAtTop();             switch (op) {                case '#': reduceUnary(); break;                case '(': op = itsOperatorStack->pop();                              done = TRUE;                              break;                case '+':                case '-':                case '*':                case '/': reduceFactor(); break;                }             }       else // didn't find matching left parenthesis             throw "Unmatched Right parenthesis";       };     //#] } void Evaluator::reduceUnary() {     //#[ operation reduceUnary()     char opToApply;     double num;     OMBoolean done = FALSE;     // reduce all leading signs as in     // 1 - - - - - -6 to 1 + 6     while (!done) {       if (itsOperatorStack->getTop()>0) {         if (itsOperatorStack->elementAtTop() == '#') {             opToApply = itsOperatorStack->pop();             num = itsNumberStack->pop();             result = -num;             itsNumberStack->push(result);             }  // end if         else           done = TRUE;       } // end if         else             done = TRUE;     }; // end while     //#] }

7.2.2.5 NumberStack and OperatorStack Classes

The NumberStack and OperatorStack are instantiations of a parametric class called Stack. C++ and Ada support the notion of parametric classes. A parametric class (called a template class in C++) is simply a class that is incompletely specified. Parametric classes are extremely useful for defining various kinds of containers, such as queues, stacks, lists, and trees, in which the containment behavior is fundamentally independent from the types of the elements being contained. In the definition of a parametric class, these undefined elements are given symbolic names and referred to by those names. Because part of the definition of the parametric class is missing, parametric classes cannot be directly instantiated into objects the missing information must first be provided.

The Stack class defers the specification of two important features the type of element being stored and the maximum number of elements being stored. The code for the stack is standard C++ code, as shown in Code Listing 7-4. The UML representation (not shown) is a class box named Stack with a dashed box in the upper righthand corner listing the parameters (size and T).

Code Listing 7-4. Stack Parameterized Class

 //## class Stack template <int size = 100, class T> class Stack  { public :     #ifdef _OMINSTRUMENT         //## ignore         typedef OMAnimatedStack<size, T> OMAnimatedStackType;     #endif // _OMINSTRUMENT //## class Stack ////    Friends    //// public : #ifdef _OMINSTRUMENT     friend  class OMAnimatedStack<size, T> ; #endif // _OMINSTRUMENT ////    Constructors and destructors    //// public :     //## auto_generated     Stack();     //## auto_generated     ~Stack(); ////    Operations    //// public :     //## operation clear()     void clear();     //## operation isEmpty()     OMBoolean isEmpty();     //## operation isFull()     OMBoolean isFull();     //## operation pop()     T pop();     //## operation push(T)     void push(T  element);     //## operation show()     void show(); ////    Additional operations    //// public :     //## auto_generated     T getStk(int  i1) const;     //## auto_generated     void setStk(int  i1, T  p_stk);     //## auto_generated     int getTop() const;     //## auto_generated     void setTop(int  p_top); ////    Attributes    //// protected :     T stk[size];        //## attribute stk     // top of stack - the index into the array     int top;            //## attribute top }; //## class Stack

A NumberStack is simply a Stack for doubles and an OperatorStack is a stack for operators (base type char).

Code Listing 7-5. NumberStack and OperatorStack Classes

 //## class NumberStack typedef Stack<50, double> NumberStack; //## class OperatorStack typedef Stack<100, char> OperatorStack;

In Code Listing 7-5, we see that the NumberStack can handle up to 50 doubles, while the OperatorStack can store 100 operators. If you want to show that the NumberStack is a parametric instantiation of Stack, you can represent NumberStack as a class with a solid-line rectangle in the upper righthand corner, with the values assigned to the parameters.

The purpose of the stack is to remember the numbers and operators until the rules of evaluation allow them to be processed. When a finite-state machine is combined with one or more stacks, it is called a stack machine. A stack machine is often used in parsing applications such as compiler front ends.

7.2.2.6 Stimulator Class

The Stimulator class is there as a debugging aid; it has the stereotype «TestingBuddy», which indicates it is part of the testing harness for the system. You can see from its statechart (Figure 7-20) that when it receives events such as test1, it sets a specific expression string to be evaluated and then kicks off the evaluation process by sending an evGo to the CharParser. Additionally, the stimulator can set whether the evaluation is done as a free-run process or requires single-stepping through the expression, a character at a time. The TestInput event is used when you want to just type characters into standard input.

Figure 7-20. Stimulator Statechart

graphics/07fig20.gif

7.2.3 Event Hierarchies

Because some events may be specialized versions of other events, it is possible to build event-class generalization hierarchies. Event hierarchies are useful because they allow polymorphic event receptions. This means that different objects (or states within the same object) can accept events at different levels in the hierarchy as appropriate.

Figure 7-21a shows a simple hierarchy for user input events. A user input active object accepting InputEvent objects would also accept any subclass of InputEvent, such as LeftMouseClick or KeyboardEvent, because a LeftMouseClick is a kind of InputEvent.

Figure 7-21a. Event Hierarchy

graphics/07fig21.gif

Figure 7-21b illustrates how one object might use the event hierarchy. The object has four and-states. A key thing to remember is that when the object receives an event, each active and-state logically receives its own copy of the event and is free to act independently on it, or discard it, as appropriate. For example, suppose the user moves the mouse over an element on the screen, selects it with a left mouse click, then double clicks to activate editing of that element, types in the name "Robotoothbrush," and hits the Enter key. The MouseProcessing and-state processes the mouse events while the KeyboardProcessing and-state processes the keystrokes. The EventLogging and-state accepts and logs every input event, because MouseMove, AlphNumeric, Left Click, and so on events are all InputEvents and as such, activate the transition in the EventLogging and-state.

Figure 7-21b. Event Reception

graphics/07fig21_01.gif