High level description languages offer significant productivity advantages for designers because systems can be defined purely by their behavior, with no implementation details provided. The shift to higher levels of abstraction is crucial for complexity management in tomorrow's billion transistor designs. Furthermore the uses of these languages allow a more thorough exploration of the design space, a benefit for both large and small designs. The language in which a design is described makes certain implications about the model of time (and therefore behavior/computation) in the system. For many years there has been a continuing struggle between the ideal characteristics of a design language and the incorporation of a mathematical model of behavior into these languages. As a result most approaches to creating a design language today consist of creating a subset of some common language (C or VHDL) or creating an entirely new language fundamentally rooted in the model of computation. Both of these approaches are non-ideal for the end user of the system who wishes to program in any easy to use, widely known design language. Daniel Gajski presents a taxonomy of these languages and the support for the "embedded-system model characteristics." From the fact that many of the languages are completely non-standard and that designers might need to understand as many as nine languages, it is clear that today's solution is non-ideal. Furthermore if a new model characteristic is required, the traditional paradigm requires the designers to drastically alter his design environment (e.g., switch to a yet another language.) For this reason, the WELD team at UC Berkeley has proposed a new paradigm for system description (http://www-cad.eecs.berkeley.edu/~newton) Figure 1, A user describes the system in Java with no constraints. The user then selects a model with which the code should be compared. The selected model depends largely on the type of design and implementation requirements of the design. Once the model is chosen, The Java code is then refined through a series of successive steps to yield code that is guaranteed compliant with the selected model. This paradigm allows the user to work and design in the widely supported Java language and still create code that is explicitly compliant with a given model, eliminating many of the traditional problems.
This paper analyzes the various models
M required for design and proposes a general purpose communicating
Mealy machine model with variants (Mi, Mr,
Ma). The Mealy machine model is compared against other
defined models such as Co-design Finite State Machines - CFSM
(UC Berkeley CAD group, Polis project) and finite state machine
with datapath - FSMDs (Dan Gajski, UC Irvine). Furthermore, Java
is proposed as a basis for a specification language to describe
the communicating Mealy machines (as in Figure 1). This work
looks at Figure 1 from the bottom up with the end goal of understanding
the model required and its implications. An example Mealy machine
is defined in Java and implemented in an example of the conditional
clocking used to assure low-power on the StrongArm processor,
developed by Dan Dobberpuhl at DEC.
This paper asserts that a good model
for describing system behavior is communicating Mealy machines.
In Mealy machines, output is based on the input and the present
state. We discuss three model variants, a time independent model
Mi, a time-relative model Mr, and a time-absolute
Ma. The end goal is that a user creates Java code
that can be compared against the model to determine the minimum
number of constraints necessary to guarantee its compliance
with the selected model. After the general purpose Mealy-machine
model is presented, several variants are discussed. In the time
independent model, there is no notion of time, there can be no
assertion of the length of time required for input and output
signals. For example, if inside the Mealy machine is a counter,
an input event sets off the counter, then another input event
comes in, there can be no knowledge as to how many counts it has
accomplished. In this model there is no order dependence. This
non-deterministic behavior can be eliminated with a handshaking
protocol. The time-independent model is a good design choice
when there are critical data transfers or when the targeted implementation
platform is unpredictable (e.g., deep-submicron interconnects).
The second model variant requires a relative notion of time,
which means that there is some idea of at what point one event
occurs with respect to another. This type of model can be used
when the overhead of handshaking is too expensive or not necessary.
The partial ordering of events might be described by an approximate
number, a min-max range, or a distribution. The third model is
the absolute time model, which is mapped directly onto a timeline.
This third model can be seen as the time-relative model with
the a defined origin and value described in seconds for the event
Other models such as the Polis group's
CFSM and Dan Gajski's FSMD are compared against the communicating
Mealy machine model to illustrate the differences.
As an example implementation of how
the model plays into a real-world situation, the clocking
around the StrongArm processor is evaluated. Dan Dopperpuhl's
project to build a low power processor has a conditional clocking
scheme. The clocking scheme will be modeled in Java and will
be evaluated to determine which version of the Mealy machine
is appropriate to specify its behavior, speculating that it can
be done using Mi.
MEALY MACHINE MODEL AND VARIANTS:
A Mealy machine is defined such that
the outputs depend on the present state and the present value
of the inputs. We introduce the notion of a clocking element
into the machine to provide an element of state. The outputs
can change when the inputs change, independent of the clock, and
therefore without refinement the model has zero delay (Figure
An issue with the Mealy machine model
is that if two are put in a loop, a non-determinable system is
produced (Figure 3). However, it is argued that race conditions
can easily be identified in a piece of (Java) software, and can
be alleviated with the addition of a delay element such as a latch
(Figure 4). One method of detecting races within a Java description,
would be to trace through one complete cycle of the combinational
logic (state machine code) with a watch on the input variables.
If both a read and write to the variable occurs within once trace,
then a delay needs to be inserted.
Three time based parameters are introduced
into the fundamental Mealy machine model which can be used to
define the behavior of a system, by specifying only the time constraints
which are necessary for the intended (deterministic) behavior.
The time independent model, Mi, is appropriate when
there is no knowledge regarding the time ordering of inputs and
outputs of a system. If a system implements a handshaking protocol,
then this is the appropriate model, because it does not require
any information about the length of input and output signals.
The handshaking protocol is important for situations where the
complexity is too great to reliably utilize the ordering of events.
Another instance of its importance is when there is little predictability
in the lengths or delays in a system. Such cases are increasingly
common in deep-submicron technology. The process variations lead
to very hard to predict delays. The time independent model is
highlighted below (Figure 4). Implicit in the diagram is that
within the Mealy machine is a counter, and that an input event
triggers it to start counting. The output is in a feedback loop
which continues to generate output pulses while the input is asserted.
In this situation, the pulse widths for the inputs and frequency
of the outputs are unknown. Therefore, it cannot be determined
how many output pulses have been generated while the input pulse
was asserted. If the input pulse had a time span of 2ns and was
asserted once, and the outputs were each 4ns long, then only one
output pulse would have been generated. However, if the input
pulse was 4ns long, and each clock tick within the Mealy machine
was 1ns, then two output pulses might have been generated. In
this case, the output is non-deterministic.
The time relative model is such that there is some knowledge of when events occur with respect to each other i.e., a partial ordering. If for example there are shared variables in a given context, then there must be some constraints placed on when the variable is accessed to guarantee that the right value is available to the methods accessing it. A rudimentary control-dataflow graph (CDFG) is used to illustrate the importance of a relative notion of time for the case of shared variables. (Control-dataflow graphs allow for loops and branches, whereas simple dataflow graphs are not sufficient to represent reactive or embedded systems in which the control sequence is based on external conditions.)
In this case there must be some knowledge
about when the C2s compute with respect to each other, because
they both access the same variable. If the two series of computations
were realized in Java, with concurrency expressed by using threads,
then a relationship between the computations must be expressed.
Three expressions of the relative time relationship are proposed;
an estimate of the time lapse between the two computations, an
estimate of the minimum and maximum amount of time between the
computations accessing the shared variable, and a distribution
model indicating the most likely relative time lapse between the
computations at it's peak.
Finally, the clocked or absolute
model requires that the time at which events occur, as well as
their duration, be mapped directly onto a timeline, making it's
behavior fully determined in time. This can occur when the time-relative
model is assigned a time origin and the relative units are assigned
a time in seconds.
The Mealy machine model was compared
against several other models, of which two are the Co-design Finite
State Machine or CFSM (defined by the CAD group working on Polis
at UC Berkeley), and the Finite State Machine with a Datapath
or FSMD (defined by Dan Gajski of UC Irvine).
CFSMs are asynchronous and communicate
through events which are emitted by a CFSM and are detected some
time later by other CFSMs, assuming a broadcast model. Information
is exchanged via a shared buffer (Figure 6). (It is argued that
this is safer than shared variables and easier to implement than
rendez-vous, but can be used as a building block for both). Two
types of input events are defined, pure-value and trigger events.
Trigger events are used only once and cause a transition of a
given CFSM. They implement the synchronization mechanism of CFSMs.
Pure value events cannot directly cause a transition but can
be used to choose among different possibilities involving the
same set of trigger events. The model is event driven. The state
of CFSMs consists of a set of event types that are both input
and output for it. The non-zero reaction time of the feedback
loop provides the storage capability to implement state. A discrete
model of time is used, each computing element takes a non-zero
unbounded time to perform its task.
Unlike CFSMs, the model we have described
- communicating Mealy machines - does not require two different
kinds of input events to be defined. This distinction of event
types does not need to be made for the specification of system
behavior. Furthermore, the Mealy machine model does not require
that there be buffers at their inputs and outputs. Buffers are
inserted only if there is a loop that causes a race condition.
In CFSMs, the buffers hold their information for one clock cycle.
Buffers are not clocked synchronously, their values are held
until the next input/output event is asserted.
FSMDs are described as Finite State
Machines augmented by the use of variables, thus an FSM with a
Datapath. It is argued that a FSM model does not work as a model
for systems with more than a few hundred states because it becomes
incomprehensible to the designer. The FSMD is defined according
by a relation (such as a + b > x -1 ) and an expression (such
as x = (a2 + b2)1/2), and the
concept of superstate which introduces into the system. The FSMD
is implemented with a control unit and a datapath. Each state
in the model corresponds to a clock cycle in the implementation.
The FSMD is intended as a general model for describing all hardware
designs. Similar to the Mealy machine model, if there is a feedback
loop, a latch is inserted to introduce an element of delay and
prevent race conditions. The FSMD's next state and outputs depend
not only on the present state and the external signals but also
on the internal status signals that indicate whether a relation
between two datapath quantities is true or false. The Mealy machine
differs from the FSMD in that it does not redefine the standard
FSM inputs and outputs into relations and expressions, and that
it does not rely on communication with the datapath to determine
it's behavior. The Mealy machine model is more simple, yet provides
the necessary constraints for specification of a system.
JAVA*: JAVA COMPLIANT TO THE MEALY MACHINE MODEL
Because Java supports sequential
and concurrent assignments (by implementing threads) conditional
constructs, loops, and synchronization, it provides a good foundation
for an augmented form of the language to be used for specification
and system behavior description. Furthermore, because there are
no pointers in Java, and because memory management is not handled
by the user (but rather by a garbage collector), it stays clear
of the pitfalls of object oriented languages such as C++, which
give too much control to the designer and inspire programs that
are difficult to analyze. Other languages used for system description
are VHDL, HardwareC, Verilog, Silage, CSML. We believe an implementation
of a design system such as described in Figure 1 of the Motivation
section will avoid several of the pitfalls in the nature of today's
widely used hardware description languages VHDL and Verilog.
These languages were originally developed for simulation and form
a cumbersome interface to behavioral and architecture description.
The languages are difficult to master especially for synthesis
and have significantly slowed the overall pace of innovation in
the design technology community (http://www-cad.eecs.berkeley.edu/~newton).
However, these languages have been dictated to users via the suppliers
of EDA technology and the desire to standardize on a language
already in use in the design community. Java's ease of use and
excellent support combined with an interactive refinement process
will allow synthesizable Java* code to be produced without having
to invent a new language and all new tools.
The classes and methods required to implement the communicating Mealy machine model on a system is being developed initially using threads to encapsulate separate processes. This allows for concurrent processes to run, as well as the implementation of synchronization and handshaking, with methods wait() (implicit buffer), notify() (implicit buffer), and join(). join() waits until a thread it depends on is finished running and then starts the next part. wait() is defined to enable waiting until some condition occurs, and notify() is defined to tell the threads waiting that something has occurred. Handshaking can be implemented using wait() and notify(), or by wrapping a computation or variable assignment with synchronized.
EXAMPLE: STRONGARM PROCESSOR MODIFIED
FOR LOW POWER
The low-power version of the StrongArm
processor is used to demonstrate an application of the Mealy machine
model to a real-time system. In order to achieve low power, conditional
clocking (as well as edge triggered latches used to maximize speed
and minimize gate load on the clock) is implemented. The various
processes are encapsulated in Java threads, both to demonstrate
concurrency, as when the bus clock and the internal clock grid
run separately, and to implement conditional clocking situations,
such as when a cache fill occurs and the internal clock grid runs
off the bus clock.
DESCRIPTION: A PLL (phase locked
loop) is connected to two separate clocks for I/O and the core.
Conditional clocking is used. The chip has five clock regions
for: 1)pins; 2)bus interface control & write buffer; 3)core
(integer unit and MMUs); 4)Icache; 5)Dcache
FIRST PART RELEVANT TO MEALY MODEL:
The pins, bus-interface, and write-buffers
run off the bus clock (MCLK). The core, Icache, and Dcache are
connected to the internal clock grid (DCLK). During cache fills,
the internal clock grid is clocked by the bus clock (MCLK) rather
than the PLL to save power and to minimize the synchronization
time during fills. Since the pins and bus interface control are
clocked by the MCLK, and since the caches are filled from these
inputs, it makes sense to use MCLK on them as well. Clocking
off of MCLK does not slow the system down because the system can't
clock any faster than the input coming from the pins and bus interface.
In the Core:
EBOX: integer math except multiplier
IMMU: instruction memory management unit
DMMU: data memory management unit
WB: write buffer
SECOND PART RELEVANT TO MEALY MODEL:
The write buffer receives clocks sourced by both the internal clock grid and the bus clock grid (DCLK, MCLK) since during stores it must pass data across this clock boundary.
The StrongArm also has a 5-stage
pipeline. The instruction unit (IBOX) operates in the first 2
stages, Fetch and Issue. In the fetch cycle, the next instruction
is fetched from the instruction cache (ICACHE) and latched into
to the instruction buffer. In the issue stage, all register dependencies
and execution unit (EBOX) availability is checked and the instruction
is issued if there are no conflicts. If there are conflicts,
the instruction is held in the IBOX until all the dependencies
clear. Result forwarding is provided for all results between
generation and register file writing. Most instructions need
one issue cycle.
The basic states in which the processor operates are standby, idle, MCLK and/or DCLK active, and the MCLK runs the internal clock grid during cache fills. Figure 9 demonstrates the sequence of
Figure 9. Possible Clocking from Power-up.
We have shown that the currently
existing approach to developing a language for describing system
behavior is limited by being application specific. Therefore,
many languages have been developed which can neither communicate
with each other efficiently or render the best description of
a given system. We have decided to take a different approach,
letting the system designer specify the design in a standard,
popular language such as Java, and putting the code through a
set of refinement steps compared against a selection of models
to determine the one with the appropriate number of constraints
to express the behavior of the system accurately. We have introduced
three variants of the Mealy machine model, time independent, time
relative, and time absolute, as a category of possible models,
and we have compared the Mealy model to the CFSM and FSMD to illustrate
the differences and advantages, and why the generalized model
is sufficient to express behavior.
Berry, G. The Constructive Semantics
of Pure Esterel. May 22, 1996.
Bowhill, William J., Allmon, Randy
L., Bell, Shane L., Cooper, Elizabeth M., Donchin, Dale R., Edmondson,
John H., Fischer, Timothy C., Gronowski, Paul E., Jain, Anil K.,
Kroesen, Patricia L., Loughlin, Bruce J., Preston, Ronald P.,
Rubinfeld, Paul I., Smith, Michael J., Thierauf, Stephen C., Wolrich,
Gilbert M.. A 300MHz 64b Quad-Issue CMOS RISC Microprocessor.
1995 IEEE International Solid-States Circuits Conference, p.182-183.
Chiodo, Massimiliano, Damiano,
Antonino, Lavagno, Luciano, Sangiovanni-Vincentelli, Alberto.
Design Automation for Reactive Embedded Controller Co-design.
Chiodo, Massimiliano, Giusto,
Paolo, Hsieh, Harry, Jurecska, Attila, Lavagno, Luciano, Sangiovanni-Vincentelli,
Alberto. A Formal Methodology for Hardware/Software Co-design
of Embedded Systems.
Clarke Jr., Edmund M., Long, David
E., McMillan, Kenneth L.. A Language for Compositional Specification
and Verification of Finite State Hardware Controllers. Proceedings
of the IEEE, Vol. 79, No. 9, September 1991, p. 1282-1292.
Dobberpuhl, Dan. The Design of
a High Performance Low Power Microprocessor. Slides. (DEC)
Gajski, Daniel D., Ramachandran,
Loganath. Introduction to High-Level Synthesis. IEEE Design
& Test of Computers, Winter 1994, p.44-54.
Gajski, Daniel D., Vahid, Frank.
Specification and Design of Embedded Hardware-Software Systems.
IEEE Design and Test of Computers, Spring 1995, p. 53-67.
Gupta, Rajesh K., de Micheli,
Giovanni. Hardware-Software Cosynthesis for Digital Systems.
IEEE Design and Test of Computers, September 1993, p. 29-41.
Kaenel, Vincent von, Aebischer, Daniel, Piguet, Christian, Dijkstra,
Evert. A 320MHz, 1.5mW at 1.35V
CMOS PLL for Microporcessor Clock Generation. 1996 IEEE International
Solid-State Circuits Conference, p.132-133.
Kalavade, Asawaree, Lee, Edward
A.. A Hardware-Software Codesign Methodology for DSP Applications.
IEEE Design and Test of Computers, September 1993, p.16-28.
Montanaro, James, Witek, Richard
T., Anne, Krishna, Black, Andrew J., Cooper, Elizabeth M., Dobberpuhl,
Daniel W., Donahue, Paul M., Eno, Him, Farell, Alejandro, Hoeppner,
Gregory W., Kruckemyer, David, Lee, Thomas H., Lin, Peter, Madden,
Liam, Murray, Daniel, Pearce, Mark, Santhanam, Sribalan, Snyder,
Kathryn J., Stephany, Ray, Thierauf, Stephen C.. A 160MHz 32b
CMOS RISC Microprocessor. 1996 IEEE International Solid-State
Circuits Conference, p. 214-215.
Shiple, Thomas R., Berry, Gerard,
Touati, Herve. Constructive Analysis of Cyclic Circuits (EDTC,
Paris, March 1996).
Shiple, Thomas R., Singhal, Vigyan, Brayton, Robert K., Sangiovanni-Vincentelli, Alberto L.. Analysis of Combinational Cycles in Sequential Circuits.