AN INTRODUCTION TO COMMUNICATING MEALY MACHINES AS A MODEL FOR SYSTEM BEHAVIOR

SUZAN SZOLLAR

MOTIVATION:

High level description languages offer significant productivity advantages for designers because systems can be defined purely by their behavior, with no implementation details provided. The shift to higher levels of abstraction is crucial for complexity management in tomorrow's billion transistor designs. Furthermore the uses of these languages allow a more thorough exploration of the design space, a benefit for both large and small designs. The language in which a design is described makes certain implications about the model of time (and therefore behavior/computation) in the system. For many years there has been a continuing struggle between the ideal characteristics of a design language and the incorporation of a mathematical model of behavior into these languages. As a result most approaches to creating a design language today consist of creating a subset of some common language (C or VHDL) or creating an entirely new language fundamentally rooted in the model of computation. Both of these approaches are non-ideal for the end user of the system who wishes to program in any easy to use, widely known design language. Daniel Gajski presents a taxonomy of these languages and the support for the "embedded-system model characteristics." From the fact that many of the languages are completely non-standard and that designers might need to understand as many as nine languages, it is clear that today's solution is non-ideal. Furthermore if a new model characteristic is required, the traditional paradigm requires the designers to drastically alter his design environment (e.g., switch to a yet another language.) For this reason, the WELD team at UC Berkeley has proposed a new paradigm for system description (http://www-cad.eecs.berkeley.edu/~newton) Figure 1, A user describes the system in Java with no constraints. The user then selects a model with which the code should be compared. The selected model depends largely on the type of design and implementation requirements of the design. Once the model is chosen, The Java code is then refined through a series of successive steps to yield code that is guaranteed compliant with the selected model. This paradigm allows the user to work and design in the widely supported Java language and still create code that is explicitly compliant with a given model, eliminating many of the traditional problems.

Figure 1: Refined Java for System Design

This paper analyzes the various models M required for design and proposes a general purpose communicating Mealy machine model with variants (Mi, Mr, Ma). The Mealy machine model is compared against other defined models such as Co-design Finite State Machines - CFSM (UC Berkeley CAD group, Polis project) and finite state machine with datapath - FSMDs (Dan Gajski, UC Irvine). Furthermore, Java is proposed as a basis for a specification language to describe the communicating Mealy machines (as in Figure 1). This work looks at Figure 1 from the bottom up with the end goal of understanding the model required and its implications. An example Mealy machine is defined in Java and implemented in an example of the conditional clocking used to assure low-power on the StrongArm processor, developed by Dan Dobberpuhl at DEC.

INTRODUCTION:

This paper asserts that a good model for describing system behavior is communicating Mealy machines. In Mealy machines, output is based on the input and the present state. We discuss three model variants, a time independent model Mi, a time-relative model Mr, and a time-absolute Ma. The end goal is that a user creates Java code that can be compared against the model to determine the minimum number of constraints necessary to guarantee its compliance with the selected model. After the general purpose Mealy-machine model is presented, several variants are discussed. In the time independent model, there is no notion of time, there can be no assertion of the length of time required for input and output signals. For example, if inside the Mealy machine is a counter, an input event sets off the counter, then another input event comes in, there can be no knowledge as to how many counts it has accomplished. In this model there is no order dependence. This non-deterministic behavior can be eliminated with a handshaking protocol. The time-independent model is a good design choice when there are critical data transfers or when the targeted implementation platform is unpredictable (e.g., deep-submicron interconnects). The second model variant requires a relative notion of time, which means that there is some idea of at what point one event occurs with respect to another. This type of model can be used when the overhead of handshaking is too expensive or not necessary. The partial ordering of events might be described by an approximate number, a min-max range, or a distribution. The third model is the absolute time model, which is mapped directly onto a timeline. This third model can be seen as the time-relative model with the a defined origin and value described in seconds for the event orderings

Other models such as the Polis group's CFSM and Dan Gajski's FSMD are compared against the communicating Mealy machine model to illustrate the differences.

As an example implementation of how the model plays into a real-world situation, the clocking around the StrongArm processor is evaluated. Dan Dopperpuhl's project to build a low power processor has a conditional clocking scheme. The clocking scheme will be modeled in Java and will be evaluated to determine which version of the Mealy machine is appropriate to specify its behavior, speculating that it can be done using Mi.

MEALY MACHINE MODEL AND VARIANTS:

A Mealy machine is defined such that the outputs depend on the present state and the present value of the inputs. We introduce the notion of a clocking element into the machine to provide an element of state. The outputs can change when the inputs change, independent of the clock, and therefore without refinement the model has zero delay (Figure 2).

Figure 2. Mealy Machine Diagram

An issue with the Mealy machine model is that if two are put in a loop, a non-determinable system is produced (Figure 3). However, it is argued that race conditions can easily be identified in a piece of (Java) software, and can be alleviated with the addition of a delay element such as a latch (Figure 4). One method of detecting races within a Java description, would be to trace through one complete cycle of the combinational logic (state machine code) with a watch on the input variables. If both a read and write to the variable occurs within once trace, then a delay needs to be inserted.

Figure 3. Oscillator, Race Conditions

Figure 4. Element of Delay, No Race Conditions

Three time based parameters are introduced into the fundamental Mealy machine model which can be used to define the behavior of a system, by specifying only the time constraints which are necessary for the intended (deterministic) behavior. The time independent model, Mi, is appropriate when there is no knowledge regarding the time ordering of inputs and outputs of a system. If a system implements a handshaking protocol, then this is the appropriate model, because it does not require any information about the length of input and output signals. The handshaking protocol is important for situations where the complexity is too great to reliably utilize the ordering of events. Another instance of its importance is when there is little predictability in the lengths or delays in a system. Such cases are increasingly common in deep-submicron technology. The process variations lead to very hard to predict delays. The time independent model is highlighted below (Figure 4). Implicit in the diagram is that within the Mealy machine is a counter, and that an input event triggers it to start counting. The output is in a feedback loop which continues to generate output pulses while the input is asserted. In this situation, the pulse widths for the inputs and frequency of the outputs are unknown. Therefore, it cannot be determined how many output pulses have been generated while the input pulse was asserted. If the input pulse had a time span of 2ns and was asserted once, and the outputs were each 4ns long, then only one output pulse would have been generated. However, if the input pulse was 4ns long, and each clock tick within the Mealy machine was 1ns, then two output pulses might have been generated. In this case, the output is non-deterministic.


Figure 4. Non-Deterministic Behavior of the Time Independent Mealy Machine

The time relative model is such that there is some knowledge of when events occur with respect to each other i.e., a partial ordering. If for example there are shared variables in a given context, then there must be some constraints placed on when the variable is accessed to guarantee that the right value is available to the methods accessing it. A rudimentary control-dataflow graph (CDFG) is used to illustrate the importance of a relative notion of time for the case of shared variables. (Control-dataflow graphs allow for loops and branches, whereas simple dataflow graphs are not sufficient to represent reactive or embedded systems in which the control sequence is based on external conditions.)

Figure 5. Shared Variable and the Required Partial Ordering

In this case there must be some knowledge about when the C2s compute with respect to each other, because they both access the same variable. If the two series of computations were realized in Java, with concurrency expressed by using threads, then a relationship between the computations must be expressed. Three expressions of the relative time relationship are proposed; an estimate of the time lapse between the two computations, an estimate of the minimum and maximum amount of time between the computations accessing the shared variable, and a distribution model indicating the most likely relative time lapse between the computations at it's peak.

Finally, the clocked or absolute model requires that the time at which events occur, as well as their duration, be mapped directly onto a timeline, making it's behavior fully determined in time. This can occur when the time-relative model is assigned a time origin and the relative units are assigned a time in seconds.

OTHER MODELS:

The Mealy machine model was compared against several other models, of which two are the Co-design Finite State Machine or CFSM (defined by the CAD group working on Polis at UC Berkeley), and the Finite State Machine with a Datapath or FSMD (defined by Dan Gajski of UC Irvine).

CFSMs are asynchronous and communicate through events which are emitted by a CFSM and are detected some time later by other CFSMs, assuming a broadcast model. Information is exchanged via a shared buffer (Figure 6). (It is argued that this is safer than shared variables and easier to implement than rendez-vous, but can be used as a building block for both). Two types of input events are defined, pure-value and trigger events. Trigger events are used only once and cause a transition of a given CFSM. They implement the synchronization mechanism of CFSMs. Pure value events cannot directly cause a transition but can be used to choose among different possibilities involving the same set of trigger events. The model is event driven. The state of CFSMs consists of a set of event types that are both input and output for it. The non-zero reaction time of the feedback loop provides the storage capability to implement state. A discrete model of time is used, each computing element takes a non-zero unbounded time to perform its task.


Figure 6. Co-design Finite State Machine

Unlike CFSMs, the model we have described - communicating Mealy machines - does not require two different kinds of input events to be defined. This distinction of event types does not need to be made for the specification of system behavior. Furthermore, the Mealy machine model does not require that there be buffers at their inputs and outputs. Buffers are inserted only if there is a loop that causes a race condition. In CFSMs, the buffers hold their information for one clock cycle. Buffers are not clocked synchronously, their values are held until the next input/output event is asserted.

FSMDs are described as Finite State Machines augmented by the use of variables, thus an FSM with a Datapath. It is argued that a FSM model does not work as a model for systems with more than a few hundred states because it becomes incomprehensible to the designer. The FSMD is defined according by a relation (such as a + b > x -1 ) and an expression (such as x = (a2 + b2)1/2), and the concept of superstate which introduces into the system. The FSMD is implemented with a control unit and a datapath. Each state in the model corresponds to a clock cycle in the implementation. The FSMD is intended as a general model for describing all hardware designs. Similar to the Mealy machine model, if there is a feedback loop, a latch is inserted to introduce an element of delay and prevent race conditions. The FSMD's next state and outputs depend not only on the present state and the external signals but also on the internal status signals that indicate whether a relation between two datapath quantities is true or false. The Mealy machine differs from the FSMD in that it does not redefine the standard FSM inputs and outputs into relations and expressions, and that it does not rely on communication with the datapath to determine it's behavior. The Mealy machine model is more simple, yet provides the necessary constraints for specification of a system.

JAVA*: JAVA COMPLIANT TO THE MEALY MACHINE MODEL

Because Java supports sequential and concurrent assignments (by implementing threads) conditional constructs, loops, and synchronization, it provides a good foundation for an augmented form of the language to be used for specification and system behavior description. Furthermore, because there are no pointers in Java, and because memory management is not handled by the user (but rather by a garbage collector), it stays clear of the pitfalls of object oriented languages such as C++, which give too much control to the designer and inspire programs that are difficult to analyze. Other languages used for system description are VHDL, HardwareC, Verilog, Silage, CSML. We believe an implementation of a design system such as described in Figure 1 of the Motivation section will avoid several of the pitfalls in the nature of today's widely used hardware description languages VHDL and Verilog. These languages were originally developed for simulation and form a cumbersome interface to behavioral and architecture description. The languages are difficult to master especially for synthesis and have significantly slowed the overall pace of innovation in the design technology community (http://www-cad.eecs.berkeley.edu/~newton). However, these languages have been dictated to users via the suppliers of EDA technology and the desire to standardize on a language already in use in the design community. Java's ease of use and excellent support combined with an interactive refinement process will allow synthesizable Java* code to be produced without having to invent a new language and all new tools.

The classes and methods required to implement the communicating Mealy machine model on a system is being developed initially using threads to encapsulate separate processes. This allows for concurrent processes to run, as well as the implementation of synchronization and handshaking, with methods wait() (implicit buffer), notify() (implicit buffer), and join(). join() waits until a thread it depends on is finished running and then starts the next part. wait() is defined to enable waiting until some condition occurs, and notify() is defined to tell the threads waiting that something has occurred. Handshaking can be implemented using wait() and notify(), or by wrapping a computation or variable assignment with synchronized.

EXAMPLE: STRONGARM PROCESSOR MODIFIED FOR LOW POWER

The low-power version of the StrongArm processor is used to demonstrate an application of the Mealy machine model to a real-time system. In order to achieve low power, conditional clocking (as well as edge triggered latches used to maximize speed and minimize gate load on the clock) is implemented. The various processes are encapsulated in Java threads, both to demonstrate concurrency, as when the bus clock and the internal clock grid run separately, and to implement conditional clocking situations, such as when a cache fill occurs and the internal clock grid runs off the bus clock.

DESCRIPTION: A PLL (phase locked loop) is connected to two separate clocks for I/O and the core. Conditional clocking is used. The chip has five clock regions for: 1)pins; 2)bus interface control & write buffer; 3)core (integer unit and MMUs); 4)Icache; 5)Dcache

FIRST PART RELEVANT TO MEALY MODEL:

The pins, bus-interface, and write-buffers run off the bus clock (MCLK). The core, Icache, and Dcache are connected to the internal clock grid (DCLK). During cache fills, the internal clock grid is clocked by the bus clock (MCLK) rather than the PLL to save power and to minimize the synchronization time during fills. Since the pins and bus interface control are clocked by the MCLK, and since the caches are filled from these inputs, it makes sense to use MCLK on them as well. Clocking off of MCLK does not slow the system down because the system can't clock any faster than the input coming from the pins and bus interface.

Figure 7. Clocking of Low Power StrongArm

Figure 8. Clocking of Low Power StrongArm During Cache Fills

In the Core:

EBOX: integer math except multiplier

EMUL: multiplier

IMMU: instruction memory management unit

DMMU: data memory management unit

WB: write buffer

SECOND PART RELEVANT TO MEALY MODEL:

The write buffer receives clocks sourced by both the internal clock grid and the bus clock grid (DCLK, MCLK) since during stores it must pass data across this clock boundary.

The StrongArm also has a 5-stage pipeline. The instruction unit (IBOX) operates in the first 2 stages, Fetch and Issue. In the fetch cycle, the next instruction is fetched from the instruction cache (ICACHE) and latched into to the instruction buffer. In the issue stage, all register dependencies and execution unit (EBOX) availability is checked and the instruction is issued if there are no conflicts. If there are conflicts, the instruction is held in the IBOX until all the dependencies clear. Result forwarding is provided for all results between generation and register file writing. Most instructions need one issue cycle.

The basic states in which the processor operates are standby, idle, MCLK and/or DCLK active, and the MCLK runs the internal clock grid during cache fills. Figure 9 demonstrates the sequence of

events.

Figure 9. Possible Clocking from Power-up.

CONCLUSION:

We have shown that the currently existing approach to developing a language for describing system behavior is limited by being application specific. Therefore, many languages have been developed which can neither communicate with each other efficiently or render the best description of a given system. We have decided to take a different approach, letting the system designer specify the design in a standard, popular language such as Java, and putting the code through a set of refinement steps compared against a selection of models to determine the one with the appropriate number of constraints to express the behavior of the system accurately. We have introduced three variants of the Mealy machine model, time independent, time relative, and time absolute, as a category of possible models, and we have compared the Mealy model to the CFSM and FSMD to illustrate the differences and advantages, and why the generalized model is sufficient to express behavior.

REFERENCES:

[1]Berry, G. The Constructive Semantics of Pure Esterel. May 22, 1996.

[2]Bowhill, William J., Allmon, Randy L., Bell, Shane L., Cooper, Elizabeth M., Donchin, Dale R., Edmondson, John H., Fischer, Timothy C., Gronowski, Paul E., Jain, Anil K., Kroesen, Patricia L., Loughlin, Bruce J., Preston, Ronald P., Rubinfeld, Paul I., Smith, Michael J., Thierauf, Stephen C., Wolrich, Gilbert M.. A 300MHz 64b Quad-Issue CMOS RISC Microprocessor. 1995 IEEE International Solid-States Circuits Conference, p.182-183.

[3]Chiodo, Massimiliano, Damiano, Antonino, Lavagno, Luciano, Sangiovanni-Vincentelli, Alberto. Design Automation for Reactive Embedded Controller Co-design.

[4]Chiodo, Massimiliano, Giusto, Paolo, Hsieh, Harry, Jurecska, Attila, Lavagno, Luciano, Sangiovanni-Vincentelli, Alberto. A Formal Methodology for Hardware/Software Co-design of Embedded Systems.

[5]Clarke Jr., Edmund M., Long, David E., McMillan, Kenneth L.. A Language for Compositional Specification and Verification of Finite State Hardware Controllers. Proceedings of the IEEE, Vol. 79, No. 9, September 1991, p. 1282-1292.

[6]Dobberpuhl, Dan. The Design of a High Performance Low Power Microprocessor. Slides. (DEC)

[7]Gajski, Daniel D., Ramachandran, Loganath. Introduction to High-Level Synthesis. IEEE Design & Test of Computers, Winter 1994, p.44-54.

[8]Gajski, Daniel D., Vahid, Frank. Specification and Design of Embedded Hardware-Software Systems. IEEE Design and Test of Computers, Spring 1995, p. 53-67.

[9]Gupta, Rajesh K., de Micheli, Giovanni. Hardware-Software Cosynthesis for Digital Systems. IEEE Design and Test of Computers, September 1993, p. 29-41.

[10]Kaenel, Vincent von, Aebischer, Daniel, Piguet, Christian, Dijkstra,

Evert. A 320MHz, 1.5mW at 1.35V CMOS PLL for Microporcessor Clock Generation. 1996 IEEE International Solid-State Circuits Conference, p.132-133.

[11]Kalavade, Asawaree, Lee, Edward A.. A Hardware-Software Codesign Methodology for DSP Applications. IEEE Design and Test of Computers, September 1993, p.16-28.

[12]Montanaro, James, Witek, Richard T., Anne, Krishna, Black, Andrew J., Cooper, Elizabeth M., Dobberpuhl, Daniel W., Donahue, Paul M., Eno, Him, Farell, Alejandro, Hoeppner, Gregory W., Kruckemyer, David, Lee, Thomas H., Lin, Peter, Madden, Liam, Murray, Daniel, Pearce, Mark, Santhanam, Sribalan, Snyder, Kathryn J., Stephany, Ray, Thierauf, Stephen C.. A 160MHz 32b CMOS RISC Microprocessor. 1996 IEEE International Solid-State Circuits Conference, p. 214-215.

[13]Shiple, Thomas R., Berry, Gerard, Touati, Herve. Constructive Analysis of Cyclic Circuits (EDTC, Paris, March 1996).

[14]Shiple, Thomas R., Singhal, Vigyan, Brayton, Robert K., Sangiovanni-Vincentelli, Alberto L.. Analysis of Combinational Cycles in Sequential Circuits.