
12.1 Random Logic

In this section, we examine classical controller structures based on the standard methods for implementing Moore and Mealy machines. This controller organization is sometimes called random logic, because the next-state and output functions are formulated in terms of discrete logic gates. We could just as easily use programmable logic, such as a PAL/PLA, EPLD, FPGA, or ROM, to implement these functions.

Specifically, we examine two alternative control implementations for the instruction set and data-path introduced in the last chapter. Since we examined the Mealy implementation in Chapter 11, we concentrate on the Moore approach in the next subsection.

A Mealy machine is often the most economical way to implement the controller state machine, but its asynchronous outputs introduce timing problems. We will look at the differences between synchronous and asynchronous Mealy machines and the timing relationship between signal assertions and their effect in the data-path. You'll see that it doesn't take much to convert an asynchronous Mealy machine into its synchronous cousin.

12.1.1 Moore Machine

Figure 12.1 gives the complete state diagram, including register transfer operations, for a Moore machine implementation of the processor of Section 11.3. It requires more states than the equivalent Mealy diagram, but the difference is small. In particular, we need an extra state in the reset/instruction fetch sequence and another one in the branch negative sequence.

The assignment of register transfer operations to states is reasonably straightforward. Only one combination of register transfer operations in the same state is surprising: the memory read request is asserted in the same state in which the memory data bus is latched into the MBR (see states IF2, LD1, and AD1). Does this result in latching invalid data?

No, it doesn't.

Figure 12.2 shows the detailed timing of events for the state sequence IF1, IF2, and IF3. Each time we loop in a state such as IF2, the MBR captures the current value of the memory bus. For the first few passes through the loop, the data captured by the MBR is invalid. However, the Wait signal remains asserted until the memory places valid data on the bus. Once Wait is unasserted, the value latched into the MBR at the next clock transition is valid. This is the same clock transition that advances the state machine to its next state (IF3, LD2, or AD2).
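The argument above can be checked with a small simulation. The following Python sketch is not taken from the text: the function name `run_read_state`, the placeholder data word 0x1234, and the use of `None` to stand for an invalid bus value are all assumptions made for illustration. It models one state that asserts the read request and latches the MBR on every clock edge while it loops on Wait.

```python
def run_read_state(wait_cycles, valid_word=0x1234):
    """Simulate one looping read state (hypothetical model, e.g. IF2).

    wait_cycles: number of clock cycles the memory keeps Wait asserted.
    Returns (trace, mbr), where trace lists (cycle, wait, mbr_after_edge).
    """
    mbr = None          # MBR contents; None stands for "invalid data"
    trace = []
    cycle = 0
    while True:
        wait = cycle < wait_cycles              # memory holds Wait asserted
        bus = None if wait else valid_word      # bus is valid only once Wait drops
        # Clock edge: the MBR latches the bus unconditionally in this state.
        mbr = bus
        trace.append((cycle, wait, mbr))
        # The same edge leaves the state only if Wait was unasserted.
        if not wait:
            return trace, mbr
        cycle += 1


trace, final_mbr = run_read_state(wait_cycles=3)
```

In this model the MBR holds invalid data after each of the first three edges, but the edge that leaves the state is also the edge that latches the valid word, so the next state never sees stale data.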

Moore Machine Block Diagram The block diagram of the Moore machine is given in Figure 12.3.

It requires 16 states. We encode this densely in a 4-bit state register. The next-state logic has nine inputs (four current-state bits, Reset, Wait, two IR bits, and one AC bit) and four outputs (the next state). Since the data-path control signals are decoded from the state, this block of logic has four inputs and 18 outputs.

Implementation Choices: ROM Versus PAL/PLA We can implement the next-state logic block and the output logic block as either ROMs or PAL/PLAs. Using ROMs, we can implement the next-state logic by a 512 by 4-bit memory and the output logic by a 16 by 18-bit memory. Since single device ROMs come in widths that are powers of 2, we would implement the latter with several 4-bit- or 8-bit-wide ROMs.
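The ROM dimensions follow directly from the input and output counts of each logic block. The short Python sketch below reproduces that arithmetic; the helper names `rom_shape` and `chips_needed` are illustrative and do not come from the text.

```python
import math


def rom_shape(n_inputs, n_outputs):
    """Return (words, bits_per_word) for a ROM implementing a logic block."""
    return 2 ** n_inputs, n_outputs


def chips_needed(bits_per_word, chip_width):
    """Number of ROM chips of a given width placed side by side."""
    return math.ceil(bits_per_word / chip_width)


# Next-state logic: 4 state bits + Reset + Wait + 2 IR bits + 1 AC bit.
next_state_rom = rom_shape(4 + 1 + 1 + 2 + 1, 4)   # (512, 4)

# Output logic: decoded from the 4 state bits only.
output_rom = rom_shape(4, 18)                      # (16, 18)

chips_8_wide = chips_needed(18, 8)                 # 3 chips, 6 bits unused
chips_4_wide = chips_needed(18, 4)                 # 5 chips, 2 bits unused
```

An 18-bit word does not fit a power-of-2 width, so some output bits go unused whichever chip width is chosen.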

We begin implementing the controller by obtaining the symbolic next-state table. This is shown in Figure 12.4. A couple of things are worth observing about this table. First, we can make extensive use of the don't cares among the inputs/address lines. Notice that a given input signal is examined in very few states. For example, the IR bits are examined only in state OD and the AC sign bit is tested only in state BR0.

Second, the number of register transfer operations asserted in any given state is rather small. In Figure 12.4, no more than four register transfer operations are asserted in any state. Some of the outputs, like those associated with memory references, are always asserted together. We will exploit this in some of the controller implementation strategies later in this chapter.

Of course, a ROM-based implementation cannot take advantage of don't cares. You must program all 512 ROM words for the next-state logic, a rather tedious task. However, one advantage of using a ROM is that you need not worry about a careful state assignment.
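To see why don't cares give no savings in a ROM, consider how a single symbolic row expands into ROM words. The Python sketch below is an illustration only. The address bit order (four state bits, then Reset, Wait, two IR bits, one AC bit) is an assumption, as are the helper names and the two example rows. The rows model the IF2 wait loop described earlier, using the nova codes IF2 = 1111 and IF3 = 1101, with Reset assumed unasserted.

```python
from itertools import product


def expand_cube(pattern):
    """Yield every ROM address matched by a pattern of '0', '1', and '-'."""
    choices = [("0", "1") if ch == "-" else (ch,) for ch in pattern]
    for bits in product(*choices):
        yield int("".join(bits), 2)


def program_rom(rows, n_inputs, default=0):
    """Fill a 2**n_inputs word ROM from (pattern, value) rows."""
    rom = [default] * (2 ** n_inputs)
    for pattern, value in rows:
        if len(pattern) != n_inputs:
            raise ValueError("pattern width must equal the address width")
        for addr in expand_cube(pattern):
            rom[addr] = value
    return rom


# Assumed address layout: state(4) Reset Wait IR(2) AC(1)
rows = [
    ("111101---", 0b1111),   # IF2, Wait asserted: stay in IF2
    ("111100---", 0b1101),   # IF2, Wait unasserted: advance to IF3
]
rom = program_rom(rows, n_inputs=9)
words_per_row = [len(list(expand_cube(p))) for p, _ in rows]   # [8, 8]
```

Each row with three don't-care inputs still occupies eight ROM words, whereas a PLA would need only one product term for it.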

If you use a PAL or a PLA, then a good state assignment is essential for reducing the complexity of the next-state logic. For example, the naive state assignment suggested by Figure 12.4 (basically, a depth-first enumeration) yields a 21-product-term implementation.

This compares rather favorably with the equivalent of 512 product terms in the ROM case (one term for each ROM word).

Figure 12.5 shows the espresso inputs and outputs for this particular state assignment. The next-state logic is fairly complex. Each next-state bit requires seven to nine product terms for its implementation. This implies that you should use a PAL component with large OR gate fan-ins, like the P22V10. For a PLA-based implementation, all you need is a PLA that provides 21 unique product terms.

A nova state assignment can do even better. It requires only 18 product terms:

state IF0: 0000    state ST0: 0101
state IF1: 1011    state ST1: 0110
state IF2: 1111    state AD0: 0111
state IF3: 1101    state AD1: 1000
state OD:  0001    state AD2: 1001
state LD0: 0010    state BR0: 1010
state LD1: 0011    state BR1: 1100
state LD2: 0100    state RES: 1110
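A state assignment is only usable if every state receives a distinct code of the right width. The Python sketch below records the nova assignment listed above and checks that property; the names `NOVA_CODES` and `is_valid_assignment` are illustrative. It does not reproduce the product-term count, which requires running a logic minimizer such as espresso.

```python
# Nova state assignment, copied from the listing above.
NOVA_CODES = {
    "IF0": "0000", "IF1": "1011", "IF2": "1111", "IF3": "1101",
    "OD":  "0001", "LD0": "0010", "LD1": "0011", "LD2": "0100",
    "ST0": "0101", "ST1": "0110", "AD0": "0111", "AD1": "1000",
    "AD2": "1001", "BR0": "1010", "BR1": "1100", "RES": "1110",
}


def is_valid_assignment(codes, n_bits):
    """True if every code is an n_bits binary string and no code repeats."""
    values = list(codes.values())
    well_formed = all(len(c) == n_bits and set(c) <= {"0", "1"} for c in values)
    return well_formed and len(set(values)) == len(values)


assignment_ok = is_valid_assignment(NOVA_CODES, n_bits=4)
```

With 16 states and 4 bits, a valid assignment uses every code exactly once, so there are no unused codes left over to serve as don't cares.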

12.1.2 Synchronous Mealy Machine

The organization of a synchronous Mealy machine is not very different from the Moore machine just described. The key is merging the next-state and output Boolean functions into a single logic block. For the Mealy machine, the logic has nine inputs and 22 outputs (four state outputs and 18 microoperation control outputs).

The combined next-state and output functions have some interesting implications for a ROM-based implementation. The Moore machine required only 2336 ROM bits for its implementation (512 × 4 + 16 × 18). The Mealy machine needs 11,264 ROM bits (512 × 22). This shows some of the inherent inefficiency of the ROM-based approach. Many of those ROM bits are really don't cares. Of course, ROMs are very dense and even large ROMs are not too expensive. We will see more efficient methods for using ROMs in our discussion of microprogramming.
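The bit counts above are easy to verify. The following Python sketch recomputes them; the function name `rom_bits` is an illustrative choice.

```python
def rom_bits(n_inputs, n_outputs):
    """Total storage bits in a ROM with n_inputs address lines."""
    return (2 ** n_inputs) * n_outputs


# Moore: separate next-state ROM (9 inputs, 4 outputs)
# and output ROM (4 inputs, 18 outputs).
moore_bits = rom_bits(9, 4) + rom_bits(4, 18)    # 2048 + 288 = 2336

# Mealy: one ROM with 9 inputs and 4 + 18 = 22 outputs.
mealy_bits = rom_bits(9, 22)                     # 11264

ratio = mealy_bits / moore_bits
```

The merged Mealy ROM is a little under five times the size of the two Moore ROMs combined, because all 18 control outputs are now replicated across every combination of the five external inputs.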

Synchronous Versus Asynchronous Mealy Machines A conventional Mealy machine is asynchronous. Input changes lead to output changes, independent of the clock. This can play havoc when the outputs are signals that immediately control the data-path. We must be able to assert control signals in a well-behaved, synchronous manner.

To some extent, we can minimize the danger of asynchronous control signals by selecting data-path components with synchronous controls. These inputs need not stabilize until a setup time before the controlling clock edge.

However, there is still a problem for control signals that take effect immediately. The safest remedy is to make the Mealy machine synchronous. In a synchronous Mealy machine, the outputs change only when the state changes and remain stable throughout the state time. We achieve this by placing registers between the input signals and the combinational logic that computes the outputs, between that logic and the output signals, or both. Let's examine the approaches for constructing synchronous Mealy machines next.

Synchronizing a Mealy Machine 

Figure 12.6 shows three possible ways to construct a Mealy machine with synchronized outputs: using edge-triggered devices (a) at both the inputs and the outputs, (b) only at the inputs, and (c) only at the outputs. Each affects the timing of control signals in a slightly different way. In the figure, we assume that the output should be asserted whenever input A is asserted.

Let's start with case (a), which synchronizes both the inputs and the outputs. Assuming that A is asserted in cycle 0, the synchronized output will not be asserted until cycle 2. This delays the output by two cycles. Thus, if A is asserted in state S0, the output is not asserted until state S2. The timing and state diagrams of Figure 12.7 make this clearer.


You should realize that placing synchronizing registers at both the inputs and outputs is overkill. We can get the desired synchronization by placing flip-flops at one side or the other of the output logic. Let's consider case (b): only the inputs are synchronized.

Figure 12.8 shows the effects. If A is asserted in cycle 0, the output is asserted in the following cycle. Alternatively, we can label the state transitions in the next state with the synchronized input A' and the output.

Case (c) places synchronizing logic only on the outputs. The timing diagram is similar to Figure 12.8 and is shown in Figure 12.9.

The synchronized output signal takes effect in the state after the one in which A is first asserted.
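The three register placements can be compared with a cycle-level model. The Python sketch below is a simplification, not the circuit of Figure 12.6: it assumes the output logic simply passes A through, that both flip-flops start cleared, and that A is asserted in cycle 0 and held. The function name `simulate` and its flags are illustrative.

```python
def simulate(a_by_cycle, reg_inputs, reg_outputs):
    """Return the output value the data-path sees in each cycle.

    reg_inputs / reg_outputs select whether an edge-triggered register
    sits before and/or after the combinational output logic.
    """
    in_ff = 0     # input synchronizing flip-flop (assumed cleared)
    out_ff = 0    # output synchronizing flip-flop (assumed cleared)
    seen = []
    for a in a_by_cycle:
        a_used = in_ff if reg_inputs else a
        out = a_used                      # output logic: assert whenever A is
        seen.append(out_ff if reg_outputs else out)
        # Clock edge at the end of the cycle updates both registers.
        in_ff = a
        out_ff = out
    return seen


a = [1, 1, 1, 1]                                            # A asserted from cycle 0
case_a = simulate(a, reg_inputs=True, reg_outputs=True)     # [0, 0, 1, 1]
case_b = simulate(a, reg_inputs=True, reg_outputs=False)    # [0, 1, 1, 1]
case_c = simulate(a, reg_inputs=False, reg_outputs=True)    # [0, 1, 1, 1]
```

Under these assumptions, case (a) first asserts the output in cycle 2, while cases (b) and (c) both assert it in cycle 1, which matches the delays described above.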

Synchronizing the Simple CPU Mealy State Diagram To make these ideas concrete, let's examine a Mealy implementation of the processor control state machine we derived in Section 11.3. Case (b), which places registers on the inputs, makes the most sense for synchronizing this machine. Of the five inputs, IR<15:14> and AC<15> are already synchronized because they are data-path registers clocked by the same clock as the control state machine. The delayed effect of control signals does not apply here, because we are not placing an additional register in the path from the IR and the AC to the control logic. For example, we can load the IR with a new instruction in one state and compute a multiway branch based on the op code in the very next state.

Reset and Wait are another matter. Because these signals come from outside the processor, it is prudent to pass them through synchronizing flip-flops anyway. This means that the external Reset and Wait signals are delayed by one clock cycle before they can influence the state machine.

Delaying reset by one cycle has little effect because the state of the machine will be reset anyway. However, the one state delay of the Wait signal does affect performance. The processor normally loops in a state until there is a change in Wait. This means that the machine stays in the loop for an extra cycle. Even a memory that responds to a request immediately requires one processor clock cycle before the processor can recognize that the operation is complete.
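The cost of synchronizing Wait can be illustrated by counting how many cycles the controller spends in a wait loop. The Python sketch below is a simplified model with assumed names: `cycles_in_wait_loop` is hypothetical, and the synchronizing flip-flop is assumed to hold Wait asserted when the loop is entered.

```python
def cycles_in_wait_loop(wait_by_cycle, synchronized):
    """Count the cycles spent looping until Wait is seen unasserted.

    wait_by_cycle: external Wait value in each cycle after entering the loop.
    synchronized: if True, the controller sees Wait through a flip-flop,
    that is, one cycle late.
    """
    sync_ff = 1   # assumption: flip-flop initially holds Wait asserted
    for cycle, wait in enumerate(wait_by_cycle):
        seen = sync_ff if synchronized else wait
        if not seen:
            return cycle + 1      # leave the loop at the end of this cycle
        sync_ff = wait            # clock edge: capture the external Wait
    raise ValueError("Wait never observed unasserted in the given trace")


slow_memory = [1, 1, 0, 0]        # Wait asserted for two cycles
fast_memory = [0, 0]              # memory responds immediately

direct_slow = cycles_in_wait_loop(slow_memory, synchronized=False)   # 3
synced_slow = cycles_in_wait_loop(slow_memory, synchronized=True)    # 4
direct_fast = cycles_in_wait_loop(fast_memory, synchronized=False)   # 1
synced_fast = cycles_in_wait_loop(fast_memory, synchronized=True)    # 2
```

In both traces the synchronized controller stays in the loop exactly one cycle longer, including the case where the memory responds immediately.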

If we design the memory system to be synchronous with the processor, we can avoid this performance loss. Since the memory system's controller is clocked with the same clock as the processor, the Wait signal no longer needs to be synchronized.



This file last updated on 07/16/96 at 04:34:26.
randy@cs.Berkeley.edu;

