by
J.P.L. Segers
Department of Mathematics and Computing Science
Eindhoven University of Technology
The Netherlands
This report was presented to the Eindhoven University of Technology in fulfillment of the thesis
requirement for the degree of Ingenieur in de Technische Informatica. The work was done while visiting the
University of Waterloo from September 1992 until May 1993.
Acknowledgements
I am very thankful to my supervisor Jo Ebergen from the University of Waterloo in Canada for
listening to me, answering my questions, and for carefully reading this manuscript. He made my
stay in Waterloo very educational.
I thank all members of the MAVERIC research group for some interesting discussions on designing
up-down counters. In one of those discussions Peter Mayo gave me the idea for an up-down counter
with constant power consumption.
There are more people at the Computer Science Department of the University of Waterloo that
deserve to be acknowledged. I will not try to mention all of them, because I would undoubtedly
forget someone. Hereby I thank all of them.
The International Council for Canadian Studies is acknowledged for their financial support. The
Government of Canada Award that they awarded to me made my stay in Waterloo financially
possible.
I thank Rudolf Mak for getting me in touch with Jo Ebergen, and Franka van Neerven for helping
with all kinds of organizational details that had to be taken care of before I could go to Waterloo.
Finally, I thank Twan Basten for being patient with me and listening to me during the stay in
Waterloo.
Abstract
The goal of this report is to investigate up-down counter implementations in the framework of delay-
insensitive circuits. An up-down counter is a counter on which two operations can be performed:
an increment by one and a decrement by one. For N larger than zero, an up-down N-counter
counts in the range from zero through N. In the counters we design, the value of the counter,
or its count, cannot be read, but it is possible to detect whether the counter's value is zero, N,
or somewhere in between. Up-down counters have many applications. For example, they can be
useful in implementing queues or stacks.
Various implementations for up-down N-counters are presented for any N larger than zero. All
counter designs are analyzed with respect to three performance criteria, namely area complexity,
response time, and power consumption. One of the designs is optimal with respect to all three
performance criteria. Its area complexity grows logarithmically with N, and its response time and
power consumption are independent of N.
Contents
0 Introduction
0.0 Synchronous Up-Down Counter Implementations
0.1 Designing Asynchronous Circuits
0.2 Results of the Thesis
0.3 Thesis Overview
1 Trace Theory and Delay-Insensitive Circuits
1.0 Introduction
1.1 Trace Theory and Commands
1.2 Extending Commands
1.3 Basic Components
1.4 Decomposition
1.5 Delay-Insensitivity and DI decomposition
1.6 Sequence Functions
2 Formal Specification of Up-Down Counters
2.0 Introduction
2.1 An Up-Down Counter with an ack-nak Protocol
2.2 An Up-Down Counter with an empty-ack-full Protocol
3 Some Simple Designs
3.0 Introduction
3.1 Unary Implementations
3.1.0 Specification of the Cells
3.1.1 Correctness of the Implementation
3.1.2 Performance Analysis
3.2 A Binary Implementation
3.2.0 Specification of the Cells
3.2.1 Correctness of the Implementation
3.2.2 Implementations for General N
3.2.3 Performance Analysis
4 An Implementation with Parallelism
4.0 Introduction
4.1 Specification of the Cells
4.2 Correctness of the Implementation
4.3 Implementations for General N
4.4 Performance Analysis
4.4.0 Area Complexity
4.4.1 Response Time Analysis
4.4.2 Power Consumption
4.5 The ack-nak Protocol
5 An Implementation with Constant Power Consumption
5.0 Introduction
5.1 Specification of the Cells
5.2 Correctness of the Implementation
5.3 Performance Analysis
5.3.0 Response Time
5.3.1 Power Consumption
6 Conclusions and Further Research
6.0 Introduction
6.1 Conclusions
6.2 Further Research
Bibliography
Chapter 0
Introduction
Counters of all kinds are used in a variety of digital circuits. The kinds of counters used include
binary counters, Gray code counters, ring counters, and up-down counters. Most of these counters
cycle through a number of states, each state representing a natural number. Because of this cyclic
behavior, the next state can be determined from the present state. An up-down counter behaves
differently. It counts up or down, depending on the input received.
For many counters, the value of the counter, or its count, can be read by the environment.
Sometimes, however, there is no need to be able to read the value of the counter. In the case of a
modulo-N counter, for example, it can be sufficient to detect when the count modulo N is equal
to zero. In the case of an up-down counter with a counting range from zero through N, it can
be sufficient to detect when the counter's value is one of the boundary values. In [DNS92], for
example, an up-down counter is used to find out whether a FIFO-queue is empty, full, or neither
empty nor full. In [EG93a] the use of an up-down counter is proposed for a similar purpose.
In this report we specify and design up-down counters that count in the range from zero through
N, for N larger than zero. We call such counters up-down N-counters, or just N-counters. The
counters we specify and implement are of the kind where the environment of the counter can only
detect whether the counter's value is zero, N, or neither zero nor N. It cannot read the counter's
value. Being able to detect whether a counter's value is zero is called empty detection. Detecting
whether the counter's value is N is called full detection.
We want to investigate whether or not it is possible to design N-counters with empty and full
detection of which emptiness and fullness can be detected within an amount of time independent
of N after an input has been sent to the counter. If only empty and full detection are required
in a specific application, such a counter could have a faster response time than a readable counter
for which the detection of boundary values is implemented a posteriori. A readable up-down N-
counter requires a number of memory cells that is at least logarithmic in N. It is hard to imagine
that in such a counter the new value can be read after an amount of time independent of N, if no
broadcasting of signals is allowed.
In the design of counters we can distinguish between synchronous and asynchronous counters. In
this report we devote our attention to asynchronous counters, i.e., counters that do not use a global
clock signal. Before giving some advantages of asynchronous circuits, we briefly discuss synchronous
counter implementations. We have not found any designs for asynchronous up-down counters in
the existing literature.
0.0 Synchronous Up-Down Counter Implementations
A synchronous counter is a counter that uses a global clock for its operation. Designing synchronous
counters is usually considered a standard exercise. Synchronous designs can be found in many
textbooks on logic design, such as [Man91]. There are not many articles on synchronous counters.
In some of them the counters are used to illustrate particular circuit design methods, as in [LT82,
CSS89]. In those articles the maximum clock frequency usually depends on the size of the counter
and the counter size is limited. In the design proposed in [LT82] this is most apparent in the
circuitry that implements the empty detection: it uses gates with a number of inputs that depends
on the size of the counter. The authors of this article say:
"Traditional counters, both asynchronous and synchronous, suffer either from slow speed
(in the asynchronous case) since there is a carry or borrow propagation during counting,
or from irregularity (in the synchronous case) due to the control gate of each stage being
different."
The counter they design is used as a bracket counter. The object is matching opening and closing
brackets in a string of brackets. Hence the authors are interested in testing whether the counter's
value is zero. We show that asynchronous counters with empty detection are not necessarily slow
due to carry or borrow propagation, and that their structure can be very regular.
Guibas and Liang [GL81] describe an implementation of a binary up-down counter by a transition
table. They do not formally prove that this implementation is correct. The idea for the
implementation is the same as for the counter design presented in Chapter 4 of this report. Their
counter does not have full detection and requires a global clock signal. Guibas and Liang conjecture
that there is a correspondence between their binary up-down counter design and a stack design
presented in the same paper. This conjecture is not explained. A correspondence between stack
implementations and unary counters is much more obvious, as we show in Chapter 3. Finally,
Guibas and Liang claim that their counter design can be made totally asynchronous. They do not
give any justification for this claim.
In [JB88] an up-down 2N-counter is implemented by N identical modules. The inputs for increment-
ing and decrementing the counter are broadcast to all modules. This results in an implementation
where the new value of the counter can be read after a constant number of clock cycles after the
counter has received an input, under the assumption that the input signal can be broadcast to all
modules in a constant amount of time. In this report we look at counter designs in which the inputs
are sent to one module only.
Oberman wrote a textbook on counting and counters [Obe81]. The book contains a large number
of counter implementations of all kinds, among which up-down counters in Chapter 2. Some
commercially available counters are discussed, and some simpler up-down counters are presented
for educational purposes. Oberman does not discuss the performance of the implementations he
presents.
Parhami proposes some up-down counter designs in [Par87]. His counters behave like modulo-N
counters (see e.g. [EP92]) when the counter is incremented in its full state, and the value of the
counter cannot be read. He also considers counters that can represent negative values. His binary
counter design has the drawback that its specification assumes every cell to have two neighbors.
The result is that in the implementation internal signals have to be generated to simulate the
behavior of these (physically nonexistent) cells. Parhami's work is based on [GL81].
In the above articles no binary counter implementations are presented for general N. Usually the
maximum count is a power of two minus one.
0.1 Designing Asynchronous Circuits
At present, most digital circuits use a global clock. They perform their computations in lockstep
with the pulses of the clock. The correctness of the operation of such synchronous circuits depends
on the delays of their elementary building blocks: they should be no longer than a fixed number of
clock periods.
As circuits become larger and larger, distribution of the clock signal over the circuit becomes
increasingly difficult. This becomes apparent when looking at DEC's Alpha microprocessor,
for example [BBB+92]. In asynchronous circuits there is no global clock signal. The goal of this
report is to design asynchronous up-down counters of a special kind; we aim for delay-insensitive
implementations, see e.g. [Ebe89, Udd86]. The correct operation of delay-insensitive circuits does
not depend on bounds for the delays of its parts, i.e. delays of basic components and connection
wires.
The absence of the need to distribute a global clock signal is not the only advantage of delay-
insensitive circuits. The advantages include the following:
- Delay-insensitive circuits have better scalability than synchronous circuits. The reason for
  better scalability is that in scaling down the dimensions of a circuit (size and, in synchronous
  circuits, the clock period), the delays in wires do not scale down proportionally. For
  synchronous circuits this means that the timing constraints have to be verified again and that
  the clock frequency may have to be adjusted. For delay-insensitive circuits it is not a problem
  since the correctness of their operation does not depend on delays in connection wires.
- Delay-insensitive circuits have better modifiability than their synchronous counterparts. In
  a delay-insensitive circuit parts of the circuit can be redesigned, e.g. to obtain better
  performance or a smaller circuit, without affecting the correctness of the whole as long as the
  functionality of the redesigned part does not change. In synchronous designs this is not
  possible without verifying all the timing constraints as well.
- Asynchronous circuits possibly have a better performance than synchronous circuits.
  Synchronous circuits exhibit worst-case behavior, since the clock frequency has to be adjusted to
  the worst-case delay in computations. In asynchronous circuits, a computation step is made
  as soon as possible. So asynchronous circuits tend to exhibit average-case behavior.
- Asynchronous circuits possibly have a lower power consumption. A reason for the lower power
  consumption of asynchronous circuits is the absence of the clock, which ticks continuously,
  even when no computation is being executed. In [BBB+92] it is stated that in DEC's Alpha
  microprocessor chip, 17.3% of the power dissipation comes from the clock generation (clock
  distribution is not included in this number). Thus, circuits without a clock may have a
  lower power consumption. Moreover, absence of a clock may reduce cooling problems in large
  circuits.
- In delay-insensitive circuits metastable behavior does not cause errors. In synchronous
  circuits metastable behavior may cause errors when the behavior lasts longer than the clock
  period. In delay-insensitive circuits the time it takes to reach a stable state only influences
  the performance, not the correctness of the circuit.
Currently a lot of research is devoted to designing and analyzing asynchronous circuits [vBKR+91,
Bru91, Dil89, Ebe89, Gar93, Mar90, JU90, RMCF88, Sut89, Udd86]. There are still many problems
to be solved, like analyzing performance measures of the designed circuits, liveness properties, and
testing.
Some interesting performance criteria for asynchronous circuits are their area complexity, response
time, and power consumption. The area complexity of a circuit can be analyzed by counting the
number of basic elements in the circuit, provided that the circuit does not have long connection
wires between its basic elements. We use this basic element count as a measure for the area
complexity of our designs.
The response time of an asynchronous circuit can be defined as the delay between an output event of
the circuit and the last input event that has to take place before that output can occur. A possible
measure for the response time is the number of internal events that have to occur sequentially
between an input to the circuit and the next output. For a class of asynchronous circuits this can
be formalized by sequence functions [Zwa89, Rem87]. Sometimes counting events is not sufficient.
We show this in a later chapter and propose a different way to estimate the response time.
Van Berkel was one of the first to identify low power consumption as an attractive property of
asynchronous circuits [vB92]. He analyzes power consumption of his implementations by counting
the number of communications of the handshake components in his circuits, see e.g. [vB93]. We
estimate the power consumption of our counter designs by counting the average number of events
per external event in our specification language.
0.2 Results of the Thesis
In this report we concentrate on the design of delay-insensitive up-down counters. The up-down
counters are operated by a handshake protocol between the counter and its environment. The
counters have inputs up and down. If an up is received from the environment, then the counter is
incremented and an acknowledgement is sent to the environment provided that the counter was
not at its maximum value before the up was received. In the same way, the counter is decremented
if a down is received provided that the counter's value was greater than zero before the down was
received. Each input is acknowledged in such a way that the counter's environment knows whether
the counter is empty, full, or neither. The counter's value cannot be read by its environment.
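
The handshake described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not one of the designs of this report: the class name `UpDownCounter`, the reply names `"empty"`, `"ack"`, and `"full"`, and the choice to treat an up on a full counter (or a down on an empty one) as an error of the environment are all hypothetical. The count is kept private to reflect that the environment cannot read it.

```python
class UpDownCounter:
    """Behavioral model of an up-down N-counter (illustrative sketch only)."""

    def __init__(self, n: int) -> None:
        if n < 1:
            raise ValueError("N must be larger than zero")
        self._n = n
        self._count = 0  # hidden: the environment cannot read the count

    def up(self) -> str:
        if self._count == self._n:
            # Assumption: the environment never sends up to a full counter.
            raise RuntimeError("up received while the counter is full")
        self._count += 1
        # The reply tells the environment whether the counter became full.
        return "full" if self._count == self._n else "ack"

    def down(self) -> str:
        if self._count == 0:
            # Assumption: the environment never sends down to an empty counter.
            raise RuntimeError("down received while the counter is empty")
        self._count -= 1
        # The reply tells the environment whether the counter became empty.
        return "empty" if self._count == 0 else "ack"


counter = UpDownCounter(2)
replies = [counter.up(), counter.up(), counter.down(), counter.down()]
print(replies)
```

From the replies alone the environment can tell whether the counter is empty, full, or neither, without ever reading the count.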
The proposed implementations are analyzed with respect to the three performance criteria men-
tioned in the previous section. Since we do not present transistor implementations, we do not have
exact numbers for the measures. We analyze the order of growth of the three performance criteria
in terms of N. To indicate the order of growth we use $\Omega$ to indicate a lower bound, $O$ to indicate
an upper bound, and $\Theta$ to indicate a tight bound. A tight bound is both a lower bound and an
upper bound.
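
For reference, the usual textbook definitions of these three bounds can be written as follows. The report itself does not spell them out, so the formulation below, for functions $f$ and $g$ from the natural numbers to the non-negative reals, is an assumption about the intended meaning.

```latex
\begin{align*}
f \in O(g)      &\;\equiv\; (\exists c, N_0 : c > 0 : (\forall N : N \ge N_0 : f(N) \le c \cdot g(N))) \\
f \in \Omega(g) &\;\equiv\; (\exists c, N_0 : c > 0 : (\forall N : N \ge N_0 : f(N) \ge c \cdot g(N))) \\
f \in \Theta(g) &\;\equiv\; f \in O(g) \,\wedge\, f \in \Omega(g)
\end{align*}
```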
In the following we design a number of up-down counter implementations. Some of them are similar
to synchronous counter implementations found in the literature. These implementations show that
a global clock is not required.
In the response time analysis of one of the counters we show that under certain assumptions
sequence functions may not be adequate to determine the response time of asynchronous circuits.
We analyze the response time under the weaker assumption that basic elements have variable,
but bounded, delays. A definition for bounded response time is proposed, to be used instead of
constant response time as defined in [Zwa89] when the weaker assumptions apply.
Furthermore, an up-down N-counter design is presented, for any N greater than zero, with optimal
growth rates for area complexity, response time, and power consumption. The area complexity of
this type of counter is logarithmic in its size, and the response time and power consumption are
independent of its size. We can even prove that the response time is bounded according to our
definition, which is a stronger result than proving constant response time.
0.3 Thesis Overview
The goal of this report is to examine possible delay-insensitive implementations for up-down N-
counters. Before we can give any implementation, we need a specification. In Chapter 1 we give
an overview of the formalism we use for describing the specifications and implementations, and we
introduce the correctness concerns for implementations. In Chapter 2 we present two specifications
for up-down counters, and Chapters 3, 4, and 5 are devoted to designing and analyzing
implementations. The implementation presented in Chapter 5 is a new one. It presents a method for designing
up-down counters with constant power consumption, bounded response time, and logarithmic area
complexity for any N. Chapter 6 contains some concluding remarks and suggestions for further
research.
Chapter 1
Trace Theory and Delay-Insensitive
Circuits
1.0 Introduction
The formalism we use to specify delay-insensitive circuits and to verify the correctness of imple-
mentations is introduced in [Ebe89]. Behaviors of circuits are described by strings of events, called
traces, over a certain alphabet. This is formalized in trace theory. Our specifications are so-called
commands. They are similar to regular expressions. Commands are a way to specify regular trace
structures, a subclass of trace structures.
Implementations consist of sets of connected components. Each of the components in an
implementation can be specified by a command. A set of components that implements a specification is
called a decomposition of that specification.
This chapter contains an introduction to trace theory, the command language, and decomposition.
A more extensive introduction can be found in [Ebe91].
1.1 Trace Theory and Commands
Components are specified by commands. Commands prescribe the possible sequences of communi-
cations between components and their environment.
The underlying semantics for commands is trace theory [vdS85, Kal86]. Every command is associated
with a (directed) trace structure. A directed trace structure is a triple $\langle I, O, T \rangle$. I and O are
alphabets; I is the input alphabet and O the output alphabet. The input alphabet I represents
the input terminals of the specified component and the output alphabet O represents its output
terminals. The set of possible communication behaviors of the component is given by T; T is called
the trace set of the trace structure. It is a set of sequences over $I \cup O$. A trace structure that
describes the communication between a component and its environment has disjoint input and
output alphabets. There are no bidirectional communication channels.
For a trace structure S, the input alphabet, output alphabet, and trace set are denoted by iS, oS,
and tS respectively. Furthermore we define the alphabet of S as $iS \cup oS$. It is denoted by aS. For
command C, we use iC, oC, aC, and tC to denote the input alphabet, output alphabet, alphabet,
and trace set of the corresponding trace structure.
The command language consists of atomic commands and operators. Since commands are used
to describe the behavior of components, we want the corresponding trace structures to have a
non-empty trace set. Moreover, we want the trace set to be prefix-closed. This means that for any
trace in the trace set, all its prefixes are in the trace set as well. A non-empty, prefix-closed trace
structure is also called a process.
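
As a minimal sketch of prefix-closedness, traces can be modeled as tuples of symbols and trace sets as finite Python sets. The helper names `pref` and `is_process` are hypothetical, and the input and output alphabets are left out for brevity.

```python
def pref(traces):
    """Return the prefix closure of a finite set of traces (tuples of symbols)."""
    return {trace[:k] for trace in traces for k in range(len(trace) + 1)}


def is_process(traces):
    """A trace set qualifies if it is non-empty and prefix-closed."""
    traces = set(traces)
    return bool(traces) and pref(traces) == traces


closed = pref({("a", "b", "a")})
print(sorted(closed, key=len))
print(is_process(closed), is_process({("a", "b")}), is_process(set()))
```

The single trace `("a", "b")` on its own is not prefix-closed, because the empty trace and `("a",)` are missing.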
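The atomic commands are $\emptyset$, $\varepsilon$, b?, b!, and !b?, where b is an element of a sufficiently large set of
names. The atomic commands correspond to trace structures in the following way:
\[
\begin{array}{lcl}
\emptyset   & \mapsto & \langle \emptyset, \emptyset, \emptyset \rangle \\
\varepsilon & \mapsto & \langle \emptyset, \emptyset, \{\varepsilon\} \rangle \\
b?          & \mapsto & \langle \{b\}, \emptyset, \{b\} \rangle \\
b!          & \mapsto & \langle \emptyset, \{b\}, \{b\} \rangle \\
!b?         & \mapsto & \langle \{b\}, \{b\}, \{b\} \rangle.
\end{array}
\]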
In this report we simply write b for the command !b?. This does not cause any confusion; if b occurs
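in a command, it is an atomic command, and not a symbol. There are seven operators defined on
commands. For commands C and D and alphabet A we have:
\[
\begin{array}{lcl}
C ; D           & = & \langle iC \cup iD,\ oC \cup oD,\ (tC)(tD) \rangle \\
C \mid D        & = & \langle iC \cup iD,\ oC \cup oD,\ tC \cup tD \rangle \\
{[C]}           & = & \langle iC,\ oC,\ (tC)^{*} \rangle \\
\mathbf{pref}\ C & = & \langle iC,\ oC,\ \{ t : (\exists u :: tu \in tC) : t \} \rangle \\
C \lceil A      & = & \langle iC \cap A,\ oC \cap A,\ \{ t : t \in tC : t \lceil A \} \rangle \\
C \parallel D   & = & \langle iC \cup iD,\ oC \cup oD,\ \{ t : t \in (aC \cup aD)^{*} \wedge t \lceil aC \in tC \wedge t \lceil aD \in tD : t \} \rangle.
\end{array}
\]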
The seventh operator is treated separately in the next section. The first three operations are well
known from formal language theory. Juxtaposition and $^{*}$ denote concatenation and Kleene closure
of sets of strings. The pref operator constructs prefix-closed trace structures from its arguments.
The projection of a trace t on an alphabet A, denoted by $t \lceil A$, is trace t with all occurrences of
symbols not in A deleted. We also use another notation to describe projection. For command C,
we write
\[ |[\, A :: C \,]| \]
as an alternative for
\[ C \lceil (aC \setminus A). \]
This alternative notation has the advantage that the set of symbols to be hidden, A, appears before
the command C. The symbols occurring in A can be interpreted as internal symbols of C. The trace
set of the weave of two commands C and D, consists of the interleavings of the traces described by
C and D. We stipulate that unary operators have higher binding power than binary operators. Of
the binary operators weaving has the highest priority, followed by concatenation, and finally union.
With these operations every command corresponds to a regular trace structure. This means that
components specified by commands have a finite number of states.
A result from trace theory that we use later is the following.
Property 1.1.0. For trace structures R and S, and alphabet A
\[ (R \parallel S) \lceil A = (R \lceil A) \parallel (S \lceil A) \quad \Leftarrow \quad aR \cap aS \subseteq A. \]
□
A proof can be found in [Kal86].
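
Projection, weaving, and Property 1.1.0 can be explored on small finite trace sets. The sketch below is illustrative only: a trace structure is simplified to a pair (alphabet, trace set), the distinction between inputs and outputs is dropped, and the function names and the example structures `R` and `S` are hypothetical.

```python
def project(trace, alphabet):
    """Delete all symbols that are not in the alphabet."""
    return tuple(sym for sym in trace if sym in alphabet)


def weave_traces(s, t, a_s, a_t):
    """All traces u with project(u, a_s) == s and project(u, a_t) == t."""
    if not s and not t:
        return {()}
    result = set()
    if s and s[0] not in a_t:  # private symbol of the first structure
        result |= {(s[0],) + u for u in weave_traces(s[1:], t, a_s, a_t)}
    if t and t[0] not in a_s:  # private symbol of the second structure
        result |= {(t[0],) + u for u in weave_traces(s, t[1:], a_s, a_t)}
    if s and t and s[0] == t[0]:  # common symbol: both must take part
        result |= {(s[0],) + u for u in weave_traces(s[1:], t[1:], a_s, a_t)}
    return result


def weave(r, s):
    (a_r, t_r), (a_s, t_s) = r, s
    traces = set().union(*(weave_traces(x, y, a_r, a_s) for x in t_r for y in t_s))
    return (a_r | a_s, traces)


def project_structure(r, alphabet):
    a_r, t_r = r
    return (a_r & alphabet, {project(x, alphabet) for x in t_r})


# Hypothetical example: R performs a then b, S performs b then c.
R = (frozenset("ab"), {(), ("a",), ("a", "b")})
S = (frozenset("bc"), {(), ("b",), ("b", "c")})

a_ok = frozenset("bc")   # contains the common symbol b
a_bad = frozenset("ac")  # hides the common symbol b

holds = project_structure(weave(R, S), a_ok) == weave(
    project_structure(R, a_ok), project_structure(S, a_ok))
fails = project_structure(weave(R, S), a_bad) == weave(
    project_structure(R, a_bad), project_structure(S, a_bad))
print(holds, fails)
```

In the second comparison the common symbol b is hidden before weaving, so the synchronization between the two structures is lost and the equality no longer holds. This shows why the condition on the common symbols is needed.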
1.2 Extending Commands
We extend the command language to make it easier to specify finite state machines. Finite state
machines can be expressed with the operators introduced so far, but it is not always easy.
Introducing tail recursion will remedy this. Ebergen introduced tail recursion to specify finite state
machines in [Ebe89]. The proofs of the claims made in this section can be found there. Defining
the meaning of a tail-recursive specification requires some lattice theory. A general introduction to
lattice theory can be found in [Bir67].
A function f is a tail function if it is defined by
\[ f.R.i = \mathbf{pref}\,(\,\mid j : 0 \le j < n : (S.i.j)(R.j)\,) \]
for vector of trace structures R of length n, matrix of trace structures S of size $n \times n$, and for
$0 \le i < n$. Matrix S determines f uniquely. We assume that every row of S contains at least one
non-empty trace structure.
Let I be the union of the input alphabets of the elements of S and let O be the union of the output
alphabets of the elements of S. The set of all vectors of length n of non-empty, prefix-closed trace
structures with input alphabet I and output alphabet O is denoted by $\mathcal{T}^n.I.O$. On $\mathcal{T}^n.I.O$ a partial
order can be defined:
\[ R \sqsubseteq R' \;\equiv\; (\forall i : 0 \le i < n : t(R.i) \subseteq t(R'.i)) \]
for trace structures $R$ and $R'$ in $\mathcal{T}^n.I.O$. With this partial order, $\mathcal{T}^n.I.O$ is a complete lattice with
least element $\bot^n.I.O$, where $\bot.I.O = \langle I, O, \{\varepsilon\} \rangle$. The least upper bound operation on this lattice
is pointwise union of vectors of trace structures and the greatest lower bound operation is pointwise
intersection.
Moreover, for a matrix S that has a non-empty trace structure in each of its rows, the tail function f
induced by S is continuous. This means that this tail function has a least fixpoint (Knaster-Tarski
theorem). As usual, we denote the function that maps tail functions to their least fixpoints by $\mu$.
The relation between finite state machines and fixpoints of tail functions can be described as follows.
Consider a finite state machine with states q.i for $0 \le i < n$ and initial state q.0. If trace structure
S.i.j is non-empty, then there is a transition from q.i to q.j labeled with S.i.j. For $0 \le k$ and
$R.k \in \mathcal{T}^n.I.O$ we define:
\[
\begin{array}{lcl}
R.0 & = & \bot^n.I.O \\
R.k & = & f.(R.(k-1)) \quad \text{for } 1 \le k.
\end{array}
\]
In words this means that $tR.k.i$, for $0 \le i < n$, is the prefix-closure of the union of the trace sets
obtained by concatenating the k trace structures on a path of length k starting in state q.i. The
trace structure corresponding to the finite automaton is $(\,\mid k : 0 \le k : R.k.0\,)$. It can be proved that
$\mu.f.i = (\,\mid k : 0 \le k : R.k.i\,)$, so $\mu.f.0$ is the trace structure corresponding to the finite automaton.
For a trace structure defined by a tail function we can use fixpoint induction to prove properties of
the trace structure. We use this in Chapter 5. Here we formulate the fixpoint induction theorem
for the lattice of vectors of trace structures and tail functions only.
Theorem 1.2.0. (Fixpoint induction theorem)
Let f be a (continuous) tail function and let P be an inductive predicate such that $P.(\bot^n.I.O)$
holds and f maintains P, i.e.,
\[ P.R \Rightarrow P.(f.R), \]
for $R \in \mathcal{T}^n.I.O$. Then $P.(\mu.f)$ holds. □
A predicate P is inductive if
\[ (\forall R : R \in V : P.R) \Rightarrow P.(\sqcup R : R \in V : R), \]
for any non-empty, directed subset V of $\mathcal{T}^n.I.O$ (a directed set or chain in a partial order is a set
of which the elements are totally ordered).
A tail function can also be specified by a matrix of commands instead of trace structures. This is
essentially the way in which we use tail recursion.
As a small example we give a specification using tail recursion and use fixpoint induction to show
that this specification can be simplified.
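Tail function f on $\mathcal{T}^4.\{a\}.\{b\}$ is defined by the matrix
\[
\begin{pmatrix}
\emptyset & a?        & \emptyset & \emptyset \\
\emptyset & \emptyset & b!        & \emptyset \\
\emptyset & \emptyset & \emptyset & a?        \\
b!        & \emptyset & \emptyset & \emptyset
\end{pmatrix}
\]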
To make tail-recursive specifications more readable, we will use the following format in the rest of
this report:
S.0 = pref (a?; S.1)
S.1 = pref (b!; S.2)
S.2 = pref (a?; S.3)
S.3 = pref (b!; S.0).
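We can use fixpoint induction to prove a property of this specification. Predicate P is defined by
\[ P.R \equiv (R.0 = R.2), \]
for $R \in \mathcal{T}^4.\{a\}.\{b\}$. P is an inductive predicate: for any subset V of $\mathcal{T}^4.\{a\}.\{b\}$ we have
\[
\begin{array}{ll}
            & P.(\sqcup R : R \in V : R) \\
\equiv      & \{\ \text{Definition of } P\ \} \\
            & (\sqcup R : R \in V : R).0 = (\sqcup R : R \in V : R).2 \\
\equiv      & \{\ \text{Definition of } \sqcup\ \} \\
            & (\,\mid R : R \in V : R.0\,) = (\,\mid R : R \in V : R.2\,) \\
\Leftarrow  & \{\ \text{Leibniz}\ \} \\
            & (\forall R : R \in V : (R.0 = R.2)) \\
\equiv      & \{\ \text{Definition of } P\ \} \\
            & (\forall R : R \in V : P.R).
\end{array}
\]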
All predicates that express that two components of a vector of trace structures are the same are
inductive predicates. For example, predicate Q, defined by:
\[ Q.R \equiv (R.1 = R.3) \]
is inductive too.
It is obvious that both $P.(\bot^4.\{a\}.\{b\})$ and $Q.(\bot^4.\{a\}.\{b\})$ hold: all components are
equal to $\bot.\{a\}.\{b\}$.
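Next we show that f maintains $P \wedge Q$. For any R such that Q.R holds we derive:
\[
\begin{array}{ll}
  & f.R.0 \\
= & \{\ \text{Definition of } f\ \} \\
  & \mathbf{pref}\,(a?;\, R.1) \\
= & \{\ Q.R\ \} \\
  & \mathbf{pref}\,(a?;\, R.3) \\
= & \{\ \text{Definition of } f\ \} \\
  & f.R.2.
\end{array}
\]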
Similarly we can derive that f.R.1 = f.R.3 if P.R holds. Thus, by the Fixpoint Induction Theorem,
we conclude that $\mu.f.0$ is equal to $\mu.f.2$ and that $\mu.f.1$ is equal to $\mu.f.3$. This means that
$\mu.f.0$ = pref (a?; $\mu.f.1$)
$\mu.f.1$ = pref (b!; $\mu.f.0$).
So ($\mu.f.0$, $\mu.f.1$) is equal to the least fixpoint of the tail function defined by
S.0 = pref (a?; S.1)
S.1 = pref (b!; S.0).
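
The approximation sequence R.0, R.1, ... of a tail function can be computed directly for this example. The following is a minimal sketch under stated assumptions: trace sets are finite Python sets of tuples, alphabets are omitted, the empty set stands for an empty matrix entry, and the names `tail_function` and `approximate` are hypothetical. After k steps every component contains exactly the traces of length at most k, so the comparison below only checks the claimed equalities up to that length; it does not replace the proof by fixpoint induction.

```python
def pref(traces):
    """Prefix closure of a finite set of traces (tuples of symbols)."""
    return {t[:k] for t in traces for k in range(len(t) + 1)}


def tail_function(matrix):
    """Build f with f(R)[i] = pref(union over j of matrix[i][j] concatenated with R[j])."""
    n = len(matrix)

    def f(vector):
        return [
            pref({s + r
                  for j in range(n)
                  for s in matrix[i][j]
                  for r in vector[j]})
            for i in range(n)
        ]

    return f


def approximate(matrix, steps):
    """Apply the tail function `steps` times, starting from the least element."""
    vector = [{()} for _ in matrix]  # every component holds only the empty trace
    f = tail_function(matrix)
    for _ in range(steps):
        vector = f(vector)
    return vector


E = set()        # empty trace set: no transition
A = {("a?",)}    # trace set of the atomic command a?
B = {("b!",)}    # trace set of the atomic command b!

four_states = [[E, A, E, E],
               [E, E, B, E],
               [E, E, E, A],
               [B, E, E, E]]
two_states = [[E, A],
              [B, E]]

r4 = approximate(four_states, 6)
r2 = approximate(two_states, 6)
print(r4[0] == r4[2], r4[1] == r4[3], r4[0] == r2[0], r4[1] == r2[1])
```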
The fixpoint operator has some nice properties. A property we use is that $\mu.f \lceil A$ can be obtained
by removing all symbols not in A from the defining equations for f.
1.3 Basic Components
In this section we discuss a number of basic components that can be used to implement larger
components. We only introduce the components that are used in this report.
The first component is the wire. A wire component has one input and one output. Its
communication behaviors are all sequences in which input and output alternate, starting with an input, if
any.
An iwire component has one input and one output as well. Its communication behaviors are
alternations of inputs and outputs as well, but starting with an output, if any. iwires can be used
for starting computations.
The merge component is a component with two inputs and one output. Again, inputs and outputs
alternate in the communication behaviors. After one of the two inputs is received, an output is
produced.
A toggle has one input and two outputs. Communication behaviors of the toggle start with an
input, if any, and inputs and outputs alternate. The outputs are produced alternately at the
two output terminals.
The next element is the 1-by-1 join, also called join. It has two inputs and one output and is
used for synchronization. After both inputs have been received, an output is produced, and this
behavior repeats. We also use an initialized join. An initialized join behaves as a join for which
one input event has already taken place.
Table 1.0 contains the commands and schematics for the introduced basic components.
1.4 Decomposition
A decomposition of a specification is a set of components that implements the behavior of the
specification according to four correctness concerns. Before we introduce the correctness concerns
of decomposition, we describe how specifications should be interpreted.
Earlier we saw that every specification corresponds to a regular trace structure with disjoint input
and output alphabets. The alphabet of the trace structure represents the connections between the
component and its environment, the terminals of the component. The input alphabet consists of
the terminals at which the environment may produce events. The output alphabet consists of the
terminals at which the component may produce events.
Basic component     Command
wire(a; b)          pref *[a?; b!]
iwire(a; b)         pref *[b!; a?]
merge(a, b; c)      pref *[a?; c! | b?; c!]
toggle(a; b, c)     pref *[a?; b!; a?; c!]
join(a, b; c)       pref *[(a? ‖ b?); c!]
Table 1.0: Basic components. (Schematic column not reproduced.)
A behavior of the component describes the order in which events occur at the terminals. The trace
set of the trace structure describing the component specifies which behaviors can occur. Initially
the sequence of events that have occurred is empty. Consider a communication behavior t such that
ta is also a valid communication behavior. If a is an input symbol, this means that the environment
may produce an a event after behavior t. If a is an output symbol, this means that the component
may produce an a event after behavior t has occurred.
We note two things. First, an input or output is not guaranteed to occur, even though it might
be the only event that can occur after a certain behavior. Second, our specifications prescribe the
behavior of the environment as well as of the component. This means that correct operation of the
component is guaranteed only in the case that the environment behaves as specified.
In the above we mentioned the environment of a component a number of times. The outputs of the
component are the inputs of its environment and the inputs of the component are the outputs of the
environment. We can turn the environment of component $S$ into a component $\overline{S}$ by interchanging
the input and output alphabets.
Definition 1.4.0. Let $S$ be a trace structure. Its reflection $\overline{S}$ is defined by
\[ \overline{S} = \langle oS, iS, tS \rangle. \]
$\Box$
In a decomposition of a component $S$ into components $T.i$ for $1 \le i < n$, the network produces the
outputs as specified by $S$. Its environment produces the inputs as specified by $S$. Equivalently we
can say that the outputs of $\overline{S}$ are the inputs to the network. Therefore we consider the network
consisting of $\overline{S}, T.1, \ldots, T.(n-1)$ in defining decomposition formally.
Definition 1.4.1. Let $1 < n$. Component $S$ can be decomposed into the components $T.1, \ldots, T.(n-1)$,
denoted by
\[ S \rightarrow (i : 1 \le i < n : T.i), \]
if Conditions (0), (1), (2), and (3) below are satisfied.
Let $T.0$ be the reflection of $S$ and define $W = (\parallel i : 0 \le i < n : T.i)$.
(0) The network is closed:
\[ (\cup\, i : 0 \le i < n : o(T.i)) = (\cup\, i : 0 \le i < n : i(T.i)). \]
(1) There is no output interference:
\[ (\forall i, j : 0 \le i < j < n : o(T.i) \cap o(T.j) = \emptyset). \]
(2) There is no computation interference:
\[ t \in tW \;\wedge\; x \in o(T.i) \;\wedge\; tx \upharpoonright a(T.i) \in t(T.i) \;\Rightarrow\; tx \in tW, \]
for any trace $t$, symbol $x$, and index $i$ with $0 \le i < n$.
(3) The set of network behaviors is complete:
\[ tW \upharpoonright aS = tS. \]
$\Box$
Condition (0) states that there are no dangling inputs and outputs. Condition (1) states that no two
outputs are connected to each other. These two conditions are structural conditions on the network.
Conditions (2) and (3) are behavioral conditions. The former states that, after any behavior of
the network, all outputs that can be produced by any of the components can be accepted by the
components for which that symbol is an input. The last condition ensures that all behaviors of the
specification S may occur in the implementation. This means that an implementation that accepts
all inputs, but never produces any output, is not acceptable. The last condition does not guarantee,
however, that after a certain behavior a specified output or input actually occurs. It merely rules
out implementations where this specified output or input is guaranteed never to occur.
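The two structural conditions depend on the alphabets only, so they are easy to check mechanically. The following sketch assumes a hypothetical encoding in which a component is a pair (input alphabet, output alphabet) of Python sets; it checks Conditions (0) and (1) for the decomposition of wire(a; b) into wire(a; c) and wire(c; b). The behavioral conditions (2) and (3) are not covered by it.

```python
from itertools import combinations


def reflect(component):
    """Interchange the input and output alphabets."""
    inputs, outputs = component
    return (outputs, inputs)


def is_closed(network):
    """Condition (0): every output is some input and vice versa."""
    all_inputs = set().union(*(inputs for inputs, _ in network))
    all_outputs = set().union(*(outputs for _, outputs in network))
    return all_inputs == all_outputs


def no_output_interference(network):
    """Condition (1): output alphabets are pairwise disjoint."""
    return all(not (o1 & o2) for (_, o1), (_, o2) in combinations(network, 2))


spec = ({"a"}, {"b"})                      # wire(a; b)
parts = [({"a"}, {"c"}), ({"c"}, {"b"})]   # wire(a; c) and wire(c; b)
network = [reflect(spec)] + parts          # T.0 is the reflection of the spec

print(is_closed(network), no_output_interference(network))
```

Replacing `parts` by two copies of wire(a; b) makes the second check fail, because both copies drive terminal b.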
Verifying absence of computation interference formally is often very laborious and the proofs are
not very readable. In the correctness proofs of the counter implementations we verify absence of
computation interference informally.
An automatic verifier for decompositions has been developed and is described in [EG93b]. This
verification tool, called verdect, has a slightly more restrictive syntax than the command language
described here. But since there is a standard way to specify finite state machines in this restrictive
syntax, all regular trace structures can be described.
A theorem that is useful in verifying decompositions is the Substitution Theorem. It allows
hierarchical decomposition of components.
Theorem 1.4.2. (Substitution Theorem)
Let $S.0$, $S.1$, $S.2$, $S.3$, and $T$ be components such that $S.0 \rightarrow (S.1, T)$ and $T \rightarrow (S.2, S.3)$. Then
$S.0 \rightarrow (S.1, S.2, S.3)$ if $(a(S.0) \cup a(S.1)) \cap (a(S.2) \cup a(S.3)) = aT$. $\Box$
The condition on the alphabets of the components states that the decompositions of S.0 and T
only have symbols from aT in common. By renaming the internal symbols in the decomposition of
T this condition can always be satisfied.
A proof of the Substitution Theorem can be found in [Ebe89]. The theorem can be generalized to
decompositions with larger numbers of components.
The following lemma is useful for showing that the network behaviors in the decomposition
\[ S \rightarrow (T.1, T.2) \]
are complete.
Lemma 1.4.3. Let S, T.1 and T.2 be non-empty, prefix-closed trace structures. Then
t(
pref (up?; ack!; S.(i + 1) | down?; empty!; S.(i - 1))    for i = 1 and 2 < N
pref (up?; ack!; S.(i + 1) | down?; ack!; S.(i - 1))      for 1 < i < N - 1
pref (up?; full!; S.(i + 1) | down?; ack!; S.(i - 1))     for 1 < i and i = N - 1
pref (up?; full!; S.(i + 1) | down?; empty!; S.(i - 1))   for i = 1 and N = 2
S.N = pref (down?; ack!; S.(N - 1)).
$\Box$
From the definition of UDC.N it is obvious that our specifications put constraints on the behavior
of the environment. For the counter to function correctly, its environment should adhere to the
specified protocol. For example, if the counter is in its initial state, the environment is not allowed
to send an up signal followed by two down signals. This means that for this specification the current
count after a certain behavior is simply the number of ups minus the number of downs.
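The protocol can be mimicked by a small behavioral model. The sketch below is not the report's specification: the class name `UDC` and its methods are hypothetical, and the behavior in the initial state and for N = 1 (an up is answered with ack, or with full when the counter thereby reaches N) is an assumption, since that part of the definition is not shown above. Inputs that the environment is not allowed to send raise an exception.

```python
class UDC:
    """Behavioral model of an up-down N-counter with empty/ack/full replies."""

    def __init__(self, n):
        if n < 1:
            raise ValueError("N must be larger than zero")
        self.n = n
        self.count = 0  # number of ups minus number of downs

    def up(self):
        if self.count == self.n:
            raise RuntimeError("protocol violation: up while the counter is full")
        self.count += 1
        return "full" if self.count == self.n else "ack"

    def down(self):
        if self.count == 0:
            raise RuntimeError("protocol violation: down while the counter is empty")
        self.count -= 1
        return "empty" if self.count == 0 else "ack"


counter = UDC(3)
print([counter.up() for _ in range(3)])
print([counter.down() for _ in range(3)])
```

Sending an up followed by two downs to a fresh counter raises the exception on the second down, which corresponds to the forbidden environment behavior mentioned above.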
The counter specified in Definition 2.1.0 allows any sequence of inputs, but even there some
constraints are put on the environment's behavior. The environment is not allowed to send two inputs
to the counter without an ack or nak happening between the two inputs.
Note that it is not hard to build an up-down counter as specified in the previous section, using
an implementation for the specification given in this section. A cell implementing the following
behavior could be added:
S.0 = pref (up?; sup!; S.3 | down?; nak!; S.0)
S.1 = pref (up?; sup!; S.3 | down?; sdown!; S.3)
S.2 = pref (up?; nak!; S.2 | down?; sdown!; S.3)
S.3 = pref (sempty?; ack!; S.0 | sack?; ack!; S.1 | sfull?; ack!; S.2).
The terminals of the UDC implementation should be renamed to sup, sdown, sempty, sack, and
sfull. If the current count of the UDC implementation is zero and the environment sends a down
input, then a nak is sent to the environment. If the current count of the UDC implementation is at
its maximum and an up input is received from the environment, a nak is sent as well. In all other
cases, the input is propagated to the UDC implementation. The type of acknowledgement received
from this counter determines the behavior of the cell upon receiving the next input.
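The behavior of this extra cell can be sketched as follows. The class below is a hypothetical model: it folds the renamed UDC implementation into a plain integer and keeps the cell's named state in `state`, so that states S.0, S.1, and S.2 of the cell correspond to the values 0, 1, and 2, and S.3 corresponds to the call of `_forward`.

```python
class AckNakCounter:
    """Ack/nak front end around a stand-in for the renamed UDC.N."""

    def __init__(self, n):
        self.n = n
        self.count = 0   # count held by the stand-in for the UDC implementation
        self.state = 0   # 0 = S.0 (empty), 1 = S.1 (neither), 2 = S.2 (full)

    def _forward(self, step):
        # Stand-in for S.3: forward the input, then react to sempty, sack, or sfull.
        self.count += step
        if self.count == 0:
            self.state = 0
        elif self.count == self.n:
            self.state = 2
        else:
            self.state = 1
        return "ack"

    def up(self):
        return "nak" if self.state == 2 else self._forward(+1)

    def down(self):
        return "nak" if self.state == 0 else self._forward(-1)


cell = AckNakCounter(2)
print([cell.down(), cell.up(), cell.up(), cell.up()])
```

Unlike the UDC protocol, this front end accepts any sequence of inputs; inputs that would take the count out of range are answered with nak and leave the count unchanged.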
2.2 An Up-Down Counter with an empty-ack-full Protocol 23
Building a UDC.N from a counter with ack-nak protocol is less straightforward.
From now on, we use the words (up-down) N-counter to refer to the counter specified in this
section.
Chapter 3
Some Simple Designs
3.0 Introduction
We design two implementations for UDC.N. The term implementation is used here to denote a
network of components that is a decomposition of the specification (as defined in Chapter 1) and in
which each of the components has a number of states that is independent of the number of states
of the specification. We do not design an implementation at the gate level.
The step from a specification with a variable number of states to a network of a variable number
of components with a constant number of states each is the most important step in the design
on the way to a low-level implementation (e.g. gate-level implementation). The decomposition of
specifications with a fixed, finite number of states into basic components or gates has been studied
extensively [Chu87, RMCF88, LKSV91, MSB91, ND91, DCS93].
In this chapter two implementations for UDC.N are presented and proved correct, using the four
correctness criteria described in Chapter 1. Furthermore a performance analysis of the two
implementations is given.
All our implementations, in this and in following chapters, consist of linear arrays of cells. There
are two types of cells in such an array: the end cell and the other cells. The end cell has only five
terminals for communication with its environment. The other cells have ten terminals: five for
communication with their left environment, and five for communication with their right environment.
Figure 3.0 depicts block diagrams for the two types of cells.
Based on Figure 3.0 we refer to communications at terminals up, down, empty, ack, and full as
communications with the left environment. Communications at the other terminals are referred to
as communications with the right environment or subcomponent. The terminals for communication
with the subcomponent start with an s as a mnemonic reminder.
Figure 3.0: (a) block diagram for the general cell; (b) block diagram for the end cell.
(Diagram not reproduced. The general cell has terminals up?, down?, empty!, ack!, and full! on its
left side and sup!, sdown!, sempty?, sack?, and sfull? on its right side; the end cell has terminals
up?, down?, empty!, ack!, and full! only.)
In an implementation of an up-down counter we consider the cells to be numbered starting at zero.
The leftmost cell is numbered zero.
3.1 Unary Implementations
In unary counter implementations the current count of the counter is the sum of the internal counts
of the cells in the array. Denoting the internal count of cell i by c.i, the current count is
\[ (\Sigma\, i : 0 \le i < N : c.i). \]
Unary counter implementations may be useful when the maximum count is small. For large counting
ranges they are not particularly useful, since the number of cells needed to implement an N-counter
grows linearly with N.
There is a close relation between unary implementations of up-down counters and the control
structures for stack implementations. A unary counter implementation can be seen as a stack in
which only the number of elements on the stack is relevant, not the actual data values. A number of
(control structures for) delay-insensitive stack implementations have been proposed [Mar90, JU91].
Here, a very simple implementation is presented. The response time is not very good, but the
specified cells have only a few states.
3.1.0 Specification of the Cells
Our unary implementation of the N-counter consists of an array of N cells. Each cell can be either
empty or full. We only allow a prefix of the array of cells to be full, i.e., if a cell is empty, all its
successors are empty. Marking all full cells with a 1, the current count is represented by the string
of 1's interpreted as a unary number.
Definition 3.1.0. The end cell of the unary N-counter described above is simply a UDC.1. For
the other cells, let $C^0 = (C^0.0,\ C^0.1,\ C^0.1',\ C^0.2,\ C^0.2',\ C^0.3)$ be the least fixpoint of the following
equations in S:
S.0 = pref (up?; ack!; S.1)
S.1 = pref (up?; sup!; S.1'
)
S.2'
).
Now $C^0.0$ specifies the behavior of the cell. $\Box$
verdect shows that the state graph corresponding to command $C^0.0$ has twelve states. The
components of $C^0$ in Definition 3.1.0 can be considered a subset of those states. We often refer to
the elements of this subset as the named states. A specification of an up-down counter cell based on
the stack design in [JU91] has 38 states; our specification of a counter cell based on Martin's lazy
stack in [Mar90] has sixteen states. The counter cell based on [JU91] may seem to be unnecessarily
complicated, but it has a better response time and the number of cells needed to implement an
N-counter is half the number of $C^0.0$ cells or cells based on the lazy stack needed for an N-counter.
To clarify the behavior of the $C^0.0$ cell we give some assertions that hold in $C^0.0$, $C^0.1$, $C^0.2$, and
$C^0.3$ (in these four states the counter is waiting for input from its environment).
$C^0.0$: the current count is zero,
$C^0.1$: the current count is one,
$C^0.2$: the current count is larger than one and smaller than the maximum count
of the cell and its subcounter,
$C^0.3$: the current count is equal to the maximum count of the cell and its
subcounter.
3.1.1 Correctness of the Implementation
Proving that the implementation presented in the previous section satisfies the specification requires
proving that for all N larger than zero
\[ UDC.N \rightarrow ((i : 0 \le i < N-1 : s^i C^0.0),\ s^{N-1} UDC.1) \]
where $s^i C^0.0$ is $C^0.0$ with all terminals prefixed by $i$ s's.
We give a proof by induction on N. The basic step is easy: the proof obligation is
\[ UDC.1 \rightarrow (UDC.1), \]
which is a property of decomposition.
For the inductive step we reduce the proof obligation by applying the Substitution Theorem. The
remaining proof obligation is:
\[ UDC.(N+1) \rightarrow (C^0.0,\ sUDC.N). \]
Proving that this simplification is justified requires the careful verification of the alphabet conditions
of the Substitution Theorem, but the proof is not very hard.
Verifying the two structural conditions for the decomposition of UDC.(N + 1) into a $C^0.0$ cell and
a UDC.N is easy. The network consisting of UDC.(N + 1), $C^0.0$, and UDC.N is closed and there
is no output interference. We concentrate on the behavioral conditions. First we verify that the
network behaviors are complete, and then we look at absence of computation interference.
The cases N = 1 and N > 1 are treated separately. The reason is that these two cases were also
distinguished in the specification of UDC.N. We present the proof for the case N > 1 only.
First we construct the set of network behaviors $t(UDC.(N+1) \parallel C^0.0 \parallel sUDC.N)$. We do this by
constructing a set of defining equations for $C^0.0 \parallel sUDC.N$, and then looking at the weave of the
result and UDC.(N + 1).
The set of named states for $C^0.0 \parallel sUDC.N$ will correspond to a subset of the Cartesian product of
the named states of the weavands. The starting state corresponds to the product of the two starting
states of the weavands. Then the other states are obtained by looking at the possible events in
the corresponding states of the weavands. An event at a terminal that occurs in both weavands is
possible in the weave only if it is possible in both weavands. An event at a terminal that occurs
in only one of the weavands is possible in the weave if it is possible in that weavand. The named
states of the weave are numbered according to the numbers of the states of the weavands to which
they correspond.
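The construction of the named states of a weave can be sketched in a few lines. The function below is an illustration under assumptions, not the procedure used by verdect: each weavand is given as a hypothetical dictionary from states to enabled symbols together with its alphabet, and the example weaves two wires that share terminal c rather than the counter cells themselves.

```python
def weave(spec1, alpha1, spec2, alpha2, start=(0, 0)):
    """Reachable states of the weave as {state pair: {symbol: next pair}}."""
    result, todo = {}, [start]
    while todo:
        state = todo.pop()
        if state in result:
            continue
        s1, s2 = state
        moves = {}
        for symbol in alpha1 | alpha2:
            # A weavand that does not know the symbol stays where it is;
            # a weavand that knows it must have the symbol enabled.
            t1 = spec1[s1].get(symbol) if symbol in alpha1 else s1
            t2 = spec2[s2].get(symbol) if symbol in alpha2 else s2
            if t1 is not None and t2 is not None:
                moves[symbol] = (t1, t2)
        result[state] = moves
        todo.extend(moves.values())
    return result


wire_ac = {0: {"a": 1}, 1: {"c": 0}}   # wire(a; c)
wire_cb = {0: {"c": 1}, 1: {"b": 0}}   # wire(c; b)
product = weave(wire_ac, {"a", "c"}, wire_cb, {"c", "b"})
for state in sorted(product):
    print(state, product[state])
```

The shared symbol c is possible only in state (1, 0), where both wires enable it, while a and b are governed by one weavand each.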
The defining equations for $C^0.0 \parallel sUDC.N$ are
S.0.0 = pref (up?; ack!; S.1.0)
S.1.0 = pref (up?; sup; sack; ack!; S.2.1
| down?; empty!; S.0.0)
S.2.i =
, $C^1.2$, $C^1.2'$, $C^1.3$, $C^1.4$, $C^1.5$)
be the least fixpoint of
S.0 = pref (up?; ack!; S.1)
S.1 = pref (up?; sup!; S.1'
)
S.2'
)
S.5 = pref (down?; ack!; S.4).
The behavior of the counter cell is specified by $C^1.0$. $\Box$
The two extra named states were introduced to avoid having to write down transitions more than
once. In state $C^1.1'$ an sup has been sent to the subcomponent and the cell is waiting for an output
from the subcomponent. In state $C^1.2'$
)
S.2'
).
Denoting this cell by C, we have
\[ UDC.(2N) \rightarrow (C,\ sUDC.N). \]
The growth rates for the area complexity, response time, and power consumption of an
implementation using cells of this type and $C^1.0$ cells are the same as for an implementation using $C^0.0$ and
$C^1.0$ cells.
3.2.3 Performance Analysis
A $(2^k - 1)$-counter implemented with $C^1$ cells and a UDC.1 consists of k cells. The implementation
of a general N-counter using as many $C^1.0$ cells as possible (and as few $C^0.0$ cells as possible) also
has a number of cells that grows logarithmically with N. Therefore we have achieved the optimal
growth rate for the area complexity of up-down counter implementations.
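The cell count can be illustrated with a small sketch. It rests on two assumptions that should be checked against the definitions of the cells: that a $C^1.0$ cell turns an m-counter into a (2m + 1)-counter, which is what the k-cell claim for the $(2^k - 1)$-counter suggests, and that a $C^0.0$ cell turns an m-counter into an (m + 1)-counter, as in Section 3.1. The function name and the greedy strategy are hypothetical.

```python
def cell_types(n):
    """Cell types, from left to right, for an n-counter built greedily."""
    if n < 1:
        raise ValueError("n must be larger than zero")
    cells = []
    while n > 1:
        if n % 2 == 1:
            cells.append("C1")   # assumed: m-counter -> (2m + 1)-counter
            n = (n - 1) // 2
        else:
            cells.append("C0")   # assumed: m-counter -> (m + 1)-counter
            n -= 1
    cells.append("UDC.1")        # the end cell
    return cells


print(cell_types(7))
print(len(cell_types(2**10 - 1)))
print(cell_types(6))
```

Every "C0" step is followed by a halving step, so under these assumptions the array has at most about twice the binary logarithm of n cells.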
For the response time, we notice that the implementations do not have any parallel behavior. As
was the case in the unary implementation described in Section 3.1, an input may be propagated
from the first cell to the last cell and an output of the last cell is propagated back to the first cell
before an output to the environment occurs. Since the number of cells grows logarithmically with
N, the response time does so as well.
If the implementation consists of $C^1$ cells and a UDC.1 cell only, the response time depends on
the current count as follows. It is determined by the length of the suffix of ones in the binary
representation of the current count (in case the next input is an up), or by the length of the suffix
of zeroes in the binary representation of the count (in case the next input is a down).
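The quantity that governs the response time can be computed directly from the current count. The helper below is a hypothetical illustration; it only measures the suffix length named above and makes no claim about the exact delay per cell.

```python
def suffix_length(count, next_input):
    """Length of the suffix of ones (for "up") or zeroes (for "down")."""
    bit = 1 if next_input == "up" else 0
    length = 0
    while count > 0 and (count & 1) == bit:
        length += 1
        count >>= 1
    return length


print(suffix_length(0b1011, "up"))     # suffix of ones of 1011
print(suffix_length(0b1000, "down"))   # suffix of zeroes of 1000
```

For a count of the form 2^m - 1 the suffix of ones has length m, which is the case in which an up is propagated furthest into the array.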
The power consumption of this implementation grows logarithmically with N too. If all $C^1$ cells in
the implementation are in state $C^1.3$, all $C^0$ cells are in state $C^0.2$, and the last cell has internal
3.2 A Binary Implementation 39
count zero, then an up input is propagated all the way to the last cell in the array. This corresponds
to incrementing the count when all cells except the last have internal count one. If the up input is
followed by a down, this down input is also propagated all the way to the end of the array. Thus,
there are behaviors where, after a bounded prefix, all inputs cause logarithmically many internal
communications. Since the number of these inputs is unbounded, the power consumption grows
logarithmically with N, assuming that the assumptions made in the section on power consumption
of the unary implementation hold for this binary implementation as well.
Chapter 4
An Implementation with Parallelism
4.0 Introduction
The implementations presented in Chapter 3 have a response time that grows linearly with the
number of cells of the implementation. In this chapter we present an implementation that has a
better response time. Under certain assumptions one can conclude that the response time of this
implementation does not depend on the number of cells. If the assumptions are weakened, however,
the response time still depends on the number of cells.
Better response times can be obtained by designing implementations with parallelism. The unary
and binary counters from Chapter 3 do not have any parallelism; their behaviors are strictly
sequential.
Designing implementations with parallelism is more difficult than designing sequential
implementations. A good way to specify parallel behaviors for linear arrays of cells is specifying the behaviors
of a cell with respect to its left and right environments separately. The two partial behaviors of
the cell are then weaved together. The proper synchronization between the partial behaviors is
obtained by introducing internal symbols.
In specifications with parallelism, the command language results in smaller specifications than,
for example, state graphs. The reason is that, due to the weave operator, we do not have to
represent parallelism by giving all interleavings of the events that may occur in parallel. Another
advantage of the command language will become evident in the correctness proof of the proposed
implementation.
In this chapter we analyze the response time of the designed implementations by first abstracting
away from the different inputs and from the different outputs. This idea is based on the response
time analysis of the stack design in [JU91]. The abstract implementation is analyzed using sequence
functions and so-called timing functions. The underlying assumption for sequence functions is that
delays are constant. The assumption for our timing functions is weaker. We assume that delays
may vary between fixed lower and upper bounds. This seems to correspond more naturally to
asynchronous implementations.
4.1 Specification of the Cells
As before, the end cell of an array of counter cells is a UDC.1 cell. The other cells are specified by a
weave of two sequential behaviors, the behavior with respect to the environment and the behavior
with respect to the subcomponent. We start with an explanation of the former.
For the behavior with respect to the environment only the emptiness or fullness of the cell is encoded
in the named states. This is the only information needed to determine whether communication
with the subcomponent must be initiated. If the cell is full and an up input is received, then there
is a carry propagation to the next cell. If the cell is empty, an up input does not cause a carry
propagation. Upon receiving a down input, there is a borrow propagation if and only if the cell is
empty.
Initiation of communication with the subcomponent is done by introducing two internal symbols,
su and sd. They should be interpreted as "send an sup to the subcomponent" and "send an sdown
to the subcomponent".
For determining which output must be sent to the environment after an input has been received,
we introduce three additional internal symbols, viz., se, sn, and sf. The occurrence of an se
event is to be interpreted as "the subcomponent is empty". Similarly, sf can be interpreted as "the
subcomponent is full" and sn as "the subcomponent is neither full nor empty".
We must make sure that our definition for the communication with the subcomponent justifies the
interpretation of the internal symbols.
Definition 4.1.0. We use two named states for the description of the external behavior: state
0, which indicates that the cell is empty, and state 1, which indicates that the cell is full. We get
the following equations in S = (S.0, S.1):
S.0 = pref ( (up?; ((se | sn); ack! | sf; full!)
| down?; sd; ack!
); S.1)
S.1 = pref ( (up?; su; ack!
| down?; ((sf | sn); ack! | se; empty!)
); S.0).
We denote the least fixpoint of these equations by $D^2$.
The behavior with respect to the subcomponent is specified using three states, encoding whether
the subcomponent is empty, neither full nor empty, or full. The internal behavior is described by
$E^2.0$, where $E^2$ is the least fixpoint of
S.0 = pref (se; S.0
| su; sup!; (sack?; S.1 | sfull?; S.2)
)
S.1 = pref (sd; sdown!; (sack?; S.1 | sempty?; S.0)
| sn; S.1
| su; sup!; (sack?; S.1 | sfull?; S.2)
)
S.2 = pref (sf; S.2
| sd; sdown!; (sack?; S.1 | sempty?; S.0)
).
The behavior of the counter cell is specified by the command
\[ C^2 = |[\, se, sn, sf, sd, su :: D^2.0 \parallel E^2.0 \,]|. \]
$\Box$
The state graph for this cell has 29 states. Conceptually the specification is related to the binary
counter of [GL81].
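The interplay of the two partial behaviors can be explored with a sequential model. The sketch below is an abstraction under assumptions, not the specification itself: it ignores all parallelism, the class and function names are hypothetical, the attribute `full` plays the role of the named states of the external behavior, and `sub_state` plays the role of the three states of the behavior with respect to the subcomponent.

```python
class EndCell:
    """Stand-in for UDC.1, the end cell of the array."""

    def __init__(self):
        self.count = 0

    def up(self):
        assert self.count == 0, "protocol violation: up on a full end cell"
        self.count = 1
        return "full"

    def down(self):
        assert self.count == 1, "protocol violation: down on an empty end cell"
        self.count = 0
        return "empty"


class Cell:
    """Sequential abstraction of a counter cell with a subcounter."""

    def __init__(self, sub):
        self.sub = sub
        self.full = False          # external behavior: empty or full
        self.sub_state = "empty"   # internal behavior: empty, neither, or full

    def _record(self, reply):
        self.sub_state = {"empty": "empty", "ack": "neither", "full": "full"}[reply]

    def up(self):
        if self.full:              # su: carry to the subcomponent
            self._record(self.sub.up())
            self.full = False
            return "ack"
        self.full = True           # sf gives full, se or sn gives ack
        return "full" if self.sub_state == "full" else "ack"

    def down(self):
        if not self.full:          # sd: borrow from the subcomponent
            self._record(self.sub.down())
            self.full = True
            return "ack"
        self.full = False          # se gives empty, sf or sn gives ack
        return "empty" if self.sub_state == "empty" else "ack"


def build_counter(k):
    """Array of k - 1 cells followed by an end cell."""
    counter = EndCell()
    for _ in range(k - 1):
        counter = Cell(counter)
    return counter


counter = build_counter(3)
print([counter.up() for _ in range(7)])
print([counter.down() for _ in range(7)])
```

For k = 3 the model answers six ups with ack and the seventh with full, which is the behavior expected of a 7-counter; whether the parallel cell achieves the same without computation interference is exactly what the correctness proof has to establish.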
Before we turn to the correctness proof for the counter implementation consisting of an array of
$C^2$ cells and a UDC.1, we try to specify a cell without using internal symbols.
Suppose that both the cell and its subcounter are empty, so the current count is zero. When an
up input is received, the cell immediately sends an ack to its environment. This is captured by the
following equation:
S.0 = pref (up?; ack!; S.1).
Next the cell waits for another input, which may be an up or a down. If a down arrives, an empty
output is produced and the cell returns to its initial state. If the input is an up, then an sup is
sent to the subcomponent. Since the cell itself becomes empty again, an ack can be sent to the
environment at the same time. This behavior is formalized as follows:
S.1 = pref (down?; empty!; S.0
| up?; sup! ‖ ack!; S.2).
In state 2 inputs from the environment and the subcomponent may arrive in either order. There
are several possibilities:
S.2 = pref (up? ‖ sack?; ack!; S.3
| up? ‖ sfull?; full!; S.4
| down? ‖ sack?; sdown! ‖ ack!; S.5
| down? ‖ sfull?; sdown! ‖ ack!; S.5)
Even though the above three equations describe only a part of the behavior of the proposed cell,
we already have six named states.
Moreover, the (partial) specification of this cell is incorrect. Note that after trace up ack up ack
the environment has no way of knowing that it has to wait for an internal action (sup) to occur
before sending the next input to the counter. A counter implemented by cells like this suffers from
computation interference. This shows that one has to be careful in specifying behaviors in which
things can happen in parallel.
4.2 Correctness of the Implementation
We prove that for $1 \le k$ a network of $k - 1$ components of type $C^2$ and one UDC.1 implements a
$(2^k - 1)$-counter. For k = 1 we only have to prove
\[ UDC.1 \rightarrow (UDC.1). \]
As mentioned before, this is a property of decomposition, so there is nothing left to prove. We use
this case as the basic step for an inductive proof.
As before, the proof obligation for the inductive step can be reduced to
\[ UDC.(2N+1) \rightarrow (C^2,\ sUDC.N) \]
by applying the Substitution Theorem. Before we prove that this last decomposition is valid, we
introduce abbreviations for some alphabets:
\[ A_0 = a(sUDC.N), \qquad A_1 = \{sd, su, se, sn, sf\}. \]
The structural conditions for this decomposition can be verified easily. For the behavioral conditions
we consider the case N > 1 only; the case N = 1 is similar, but easier. For the completeness of the
network behaviors we derive:
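\[
\begin{array}{cl}
 & |[\, A_0 :: C^2 \parallel sUDC.N \,]| \\
= & \{\text{Definition of } C^2\} \\
 & |[\, A_0 :: |[\, A_1 :: D^2.0 \parallel E^2.0 \,]| \parallel sUDC.N \,]| \\
= & \{\text{Property 1.1.0 with } |[\,.\,]| \text{ instead of } \upharpoonright;\ a(sUDC.N) \cap A_1 = \emptyset\} \\
 & |[\, A_0 \cup A_1 :: D^2.0 \parallel E^2.0 \parallel sUDC.N \,]| \\
= & \{\text{Define } F = E^2.0 \parallel sUDC.N \text{, see Property 4.2.0}\} \\
 & |[\, A_0 \cup A_1 :: D^2.0 \parallel F \,]| \\
= & \{\text{Property 1.1.0 with } |[\,.\,]| \text{ instead of } \upharpoonright;\ a(D^2.0) \cap A_0 = \emptyset\} \\
 & |[\, A_1 :: D^2.0 \parallel |[\, A_0 :: F \,]| \,]| \\
= & \{\text{Define } G = |[\, A_0 :: F \,]| \text{, see Property 4.2.1}\} \\
 & |[\, A_1 :: D^2.0 \parallel G \,]| \\
= & \{\text{Define } H = D^2.0 \parallel G \text{, see Property 4.2.2}\} \\
 & |[\, A_1 :: H \,]|.
\end{array}
\]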
This derivation is made possible by the structure of the specification for $C^2$ as a weave of two
sequential behaviors, one with respect to its left environment and one with respect to its right
environment.
The second step of the derivation allows us to circumvent the construction of $D^2.0 \parallel E^2.0$, the state
graph of which has 29 states. The fourth step allows us to hide the symbols of the alphabet of
sUDC.N, which in turn allows for an easy specification of G.
Property 4.2.0. For N > 1 the weave of $E^2.0$ and sUDC.N can be specified as the least fixpoint
of
S.0.0 = pref (se; S.0.0
| su; sup; sack; S.1.1) for 1 < N
S.1.i =
Figure 4.0: Implementation of abstract counter cells: a micropipeline.
(Diagram not reproduced; its terminals are labeled r.0, a.0, r.1, a.1, r.2, a.2, r.3, a.3, ...,
r.(k - 2), a.(k - 2), r.(k - 1), a.(k - 1).)
most $\Delta$ time units occurs between the time that the output becomes enabled and the time that the
output has taken place. Furthermore we assume that the delays in connection wires are zero; this
poses no restriction on the validity of the results: with non-zero wire delays the same result can be
obtained. With these assumptions, the implementation of a $C^2_{abs}$ component with a join satisfies
exactly the delay assumptions made earlier for the $C^2_{abs}$ cell. In Figure 4.0 the implementation is
depicted. The initialized input to the joins is indicated with a bubble.
The idea for obtaining a timing function for which all initialized joins have bounded response time,
but the network does not, is the following. We let the first input to the array propagate to cell
k - 2 as fast as possible, we let the second input propagate to cell k - 3 as fast as possible, and so
on. Then we let the acknowledgements propagate back as slowly as possible. The result is a timing
function for which the delay between (r.0, k - 2) and (a.0, k - 2) depends on k. Although it is not
very likely that this distribution of delays will occur in practice, it is possible in theory.
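For cell j, with j at least zero and smaller than k - 2, we have the following timing function:
\[
\begin{array}{lcll}
T_j.(r.j).i & = & (j + 2i)\,\delta & \text{for } 0 \le i < k-1-j \\
T_j.(r.j).i & = & (k-2)\,\delta + (k - j + 2(i - k + 1 + j))\,\Delta & \text{for } k-1-j \le i \\
T_j.(a.j).i & = & (j + 2i + 1)\,\delta & \text{for } 0 \le i < k-2-j \\
T_j.(a.j).i & = & (k-2)\,\delta + (k - j - 1 + 2(i - k + 2 + j))\,\Delta & \text{for } k-2-j \le i \\
T_j.(r.(j+1)).i & = & (j + 2i + 1)\,\delta & \text{for } 0 \le i < k-2-j \\
T_j.(r.(j+1)).i & = & (k-2)\,\delta + (k - j - 1 + 2(i - k + 2 + j))\,\Delta & \text{for } k-2-j \le i \\
T_j.(a.(j+1)).i & = & (j + 2i + 2)\,\delta & \text{for } 0 \le i < k-3-j \\
T_j.(a.(j+1)).i & = & (k-2)\,\delta + (k - j - 2 + 2(i - k + 3 + j))\,\Delta & \text{for } k-3-j \le i.
\end{array}
\]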
4.4 Performance Analysis 53
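For the last micropipeline cell, i.e. cell k - 2, the timing function $T_{k-2}$ can be used:
\[
\begin{array}{lcll}
T_{k-2}.(r.(k-2)).i & = & (k-2)\,\delta + 2i\,\Delta & \text{for } 0 \le i \\
T_{k-2}.(a.(k-2)).i & = & (k-2)\,\delta + (2i+1)\,\Delta & \text{for } 0 \le i \\
T_{k-2}.(r.(k-1)).i & = & (k-2)\,\delta + (2i+1)\,\Delta & \text{for } 0 \le i \\
T_{k-2}.(a.(k-1)).i & = & (k-2)\,\delta + (2i+1)\,\Delta & \text{for } 0 \le i.
\end{array}
\]
Finally, for cell k - 1, which is just a wire, we have:
\[
\begin{array}{lcll}
T_{k-1}.(r.(k-1)).i & = & (k-2)\,\delta + (2i+1)\,\Delta & \text{for } 0 \le i \\
T_{k-1}.(a.(k-1)).i & = & (k-2)\,\delta + (2i+1)\,\Delta & \text{for } 0 \le i.
\end{array}
\]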
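The dependence of the response time on k can be made concrete by evaluating the timing function of cell 0. The sketch below rests on an assumption about notation: the two delay symbols in the formulas are read as a lower bound `lo` and an upper bound `hi`, with the fast forward propagation using `lo` and the slow acknowledgements using `hi`. The function names are hypothetical, and the formulas for cell 0 require k to be at least 3.

```python
def t_r0(i, k, lo, hi):
    """Time of occurrence i of r.0 according to the timing function of cell 0."""
    if i < k - 1:
        return lo * (2 * i)
    return lo * (k - 2) + hi * (k + 2 * (i - k + 1))


def t_a0(i, k, lo, hi):
    """Time of occurrence i of a.0 according to the timing function of cell 0."""
    if i < k - 2:
        return lo * (2 * i + 1)
    return lo * (k - 2) + hi * (k - 1 + 2 * (i - k + 2))


def response_delay(k, lo, hi):
    """Delay between (r.0, k - 2) and (a.0, k - 2)."""
    i = k - 2
    return t_a0(i, k, lo, hi) - t_r0(i, k, lo, hi)


for k in (3, 5, 10):
    print(k, response_delay(k, lo=1, hi=2), response_delay(k, lo=1, hi=1))
```

Under this reading the delay equals (k - 1) times `hi` minus (k - 2) times `lo`: it grows with k as soon as the upper bound exceeds the lower bound, and it is independent of k when both bounds coincide, which matches the constant-delay assumption behind sequence functions.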
Now we have to show that these timing functions for the separate cells form a timing function for
the whole array of connected cells. According to [Zwa89, Theorem 2.5.13] it is sufficient to show
that for all a such that a occurs in the alphabet of two cells, say j and j
Figure 5.0: An array of $C^3_{abs}$/$C^4_{abs}$ cell implementations.
(Diagram not reproduced; it contains merge elements (M) and terminals labeled r.j, a.j, b.j, and
c.j for the cells of the array.)
If we look only at the structure of the network (see Figure 5.0), the delay between corresponding
r.j's and a.j's increases at most linearly in k - j. Consider cell j - 1, for some j greater than zero.
Under certain assumptions for the minimum delay between an a.(j - 1) event and the next r.(j - 1)
event, we can even prove that the maximum delay between an r.(j - 1) event and the corresponding
a.(j - 1) event is no larger than the maximum delay between an r.j event and the corresponding
a.j event.
The delay between an a.j event and the next r.j event increases with j. As a result, for large
enough k there is a cell for which the assumptions for the minimum delay between its outputs to its
left environment and the following inputs from its left environment are satised. The cell number
of this cell determines the response time of the counter, and that cell number does not depend on
the number of cells. Hence the counter has bounded response time.
We now give a formal proof. Let T be a timing function for the array depicted in Figure 5.0 such
that, according to T, the response time of the join, merge, and toggle elements is bounded from
below by $\delta$ and from above by $\Delta$, and the response time of wires is zero. This assumption
about the response time of wires is not crucial to the argument given below. We prove two properties
of T:
0. $(\forall i, j : 0 \le i \wedge 0 \le j < k - 1 \wedge T.(r.j).(i + 1) - T.(a.j).i \ge 3\Delta - 2\delta : T.(a.j).i - T.(r.j).i \le 3\Delta)$,
and
1. $(\forall i, j : 0 \le i \wedge 0 \le j < k : T.(r.j).(i + 1) - T.(a.j).i \ge 3j\delta)$.
Proof of 0. The proof is by induction.
Basic step ($j = k - 2$). By the assumptions that the delays of the basic components used in
Figure 5.0 are at most $\Delta$ and that wire delays are zero, we see that
\[ T.(a.(k-2)).i - T.(r.(k-2)).i \le 3\Delta \]
for any i.
Inductive step. Let $0 < j < k - 1$ and let $0 \le i$. Assume that the delay between a.(j − 1)
and the consecutive r.(j − 1) is at least $3\Delta - 2\delta$. The delay between occurrence no. 2i of
r.(j − 1) and occurrence no. 2i of a.(j − 1) is the delay of a toggle plus the delay of a merge
component. Thus
\[ T.(a.(j-1)).(2i) - T.(r.(j-1)).(2i) \le 2\Delta . \]
For the other occurrences of r.(j − 1) and a.(j − 1) we derive:
\begin{align*}
 & T.(a.(j-1)).(2i+1) - T.(r.(j-1)).(2i+1) \\
\le{} & \quad \{\, T.(a.(j-1)).(2i+1) \le T.(c.(j-1)).i + \Delta \text{ and } T.(r.(j-1)).(2i+1) \ge T.(b.(j-1)).i - \Delta \,\} \\
 & T.(c.(j-1)).i - T.(b.(j-1)).i + 2\Delta \\
\le{} & \quad \{\, \text{Delay of join is at most } \Delta \,\} \\
 & (T.(b.(j-1)).i \;\max\; T.(a.j).i) - T.(b.(j-1)).i + 3\Delta \\
={} & \quad \{\, \text{Distribution of } + \text{ over } \max \,\} \\
 & 0 \;\max\; (T.(a.j).i - T.(b.(j-1)).i) + 3\Delta \\
\le{} & \quad \{\, \text{Induction hypothesis} \,\} \\
 & 0 \;\max\; (T.(r.j).i + 3\Delta - T.(b.(j-1)).i) + 3\Delta \\
\le{} & \quad \{\, T.(r.j).i \le T.(a.(j-1)).(2i) - \delta \text{ and } T.(b.(j-1)).i \ge T.(r.(j-1)).(2i+1) + \delta \,\} \\
 & 0 \;\max\; (T.(a.(j-1)).(2i) - T.(r.(j-1)).(2i+1) + 3\Delta - 2\delta) + 3\Delta \\
={} & \quad \{\, T.(r.(j-1)).(2i+1) - T.(a.(j-1)).(2i) \ge 3\Delta - 2\delta \,\} \\
 & 3\Delta .
\end{align*}
Proof of 1. This proof is also by induction.
Basic step ($j = 0$). After having sent an r.0, the environment does not send a next r.0 before
receiving the a.0 corresponding to the former r.0 (on valid behaviors). Thus
\[ T.(r.0).(i+1) - T.(a.0).i \ge 0 \]
for any $i \ge 0$.
Inductive step. Let $0 \le j < k - 1$ and let $0 \le i$. We derive:
\begin{align*}
 & T.(r.(j+1)).(i+1) - T.(a.(j+1)).i \\
\ge{} & \quad \{\, T.(r.(j+1)).(i+1) \ge T.(r.j).(2i+2) + \delta \,\} \\
 & T.(r.j).(2i+2) + \delta - T.(a.(j+1)).i \\
\ge{} & \quad \{\, T.(a.(j+1)).i \le T.(a.j).(2i+1) - 2\delta \,\} \\
 & T.(r.j).(2i+2) - T.(a.j).(2i+1) + 3\delta \\
\ge{} & \quad \{\, \text{Induction hypothesis} \,\} \\
 & 3j\delta + 3\delta \\
={} & \quad \{\, \text{Algebra} \,\} \\
 & 3(j+1)\delta .
\end{align*}
Let h be the smallest integer solution for j of the inequality
\[ 3j\delta \ge 3\Delta - 2\delta . \]
Then the response time of cell h is bounded from above by $3\Delta$. Moreover, the response time of
the network is bounded as well. The upper bound for the response time depends on h only. Since h
is determined by $\delta$ and $\Delta$, the upper bound for the response time does not depend on the
number of cells in the network.
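As a small numerical illustration of this bound, the following sketch computes the smallest integer h with 3hδ ≥ 3Δ − 2δ for given delay bounds. The function name `smallest_h` and the use of exact fractions are illustrative assumptions and not part of the original analysis; δ and Δ are taken to be the lower and upper bounds on the delays of the basic components.

```python
from fractions import Fraction
from math import ceil


def smallest_h(delta, Delta):
    """Smallest integer j >= 0 with 3*j*delta >= 3*Delta - 2*delta.

    delta and Delta are the assumed lower and upper bounds on the delay
    of the join, merge, and toggle elements (0 < delta <= Delta).
    """
    delta, Delta = Fraction(delta), Fraction(Delta)
    if not 0 < delta <= Delta:
        raise ValueError("expected 0 < delta <= Delta")
    # Rearranged inequality: j >= (3*Delta - 2*delta) / (3*delta).
    return max(0, ceil((3 * Delta - 2 * delta) / (3 * delta)))


if __name__ == "__main__":
    for d, D in [(1, 1), (1, 2), (1, 3), (2, 5)]:
        print(f"delta={d}, Delta={D}: h={smallest_h(d, D)}")
```

The result depends only on the ratio of the two delay bounds, not on the number of cells k, which is the point of the argument above.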
Incorporating nonzero wire delays into the model results in a bounded response time as well,
although the value found for h might be different.
Given implementations of $C_3$ and $C_4$ cells, we can choose the values for the delays of the merge,
toggle, and join elements such that the timed behaviors of the network of Figure 5.0 correspond
to timed behaviors of the counter implementation. Therefore we conclude that an N-counter can
be implemented with $C_3$ and $C_4$ cells, with a response time that does not depend on N.
5.3.1 Power Consumption
The specifications of the BSC and BSDC cells have one property that is important for proving
constant power consumption, viz., that the number of communications with their subcomponent is
at most half the number of communications with their environment. We do not prove this formally.
It is already indicated by the commands $C_3^{abs}$ and $C_4^{abs}$.
Let N > 0. Consider the weave W of the components of the implementation of the N-counter:
\[ W = (\parallel j : 0 \le j < f.N - 1 : s^j C_{1+g.N.j}) \parallel \mathrm{UDC}.(g.N.(f.N - 1)). \]
Here f is a function that maps a number to the number of digits of the representation described
in Lemma 5.0.0; function g maps the pair (N, i) to digit number i in N's representation.
With V as an abbreviation for {up, down}, we derive:
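\begin{align*}
 & t \in \mathbf{t}W \\
\Rightarrow{} & \quad \{\, t \upharpoonright \mathbf{a}(s^i C_{1+g.N.i}) \in \mathbf{t}(s^i C_{1+g.N.i}) \text{ for } 0 \le i < f.N \,\} \\
 & (\forall i : 0 \le i < f.N - 1 : 2 \cdot \ell.(t \upharpoonright s^{i+1}V) \le \ell.(t \upharpoonright s^i V)) \\
\Rightarrow{} & \quad \{\, \text{Algebra} \,\} \\
 & (\forall i : 0 \le i < f.N : 2^i \cdot \ell.(t \upharpoonright s^i V) \le \ell.(t \upharpoonright V)) \\
\equiv{} & \quad \{\, \text{Algebra} \,\} \\
 & (\forall i : 0 \le i < f.N : \ell.(t \upharpoonright s^i V) \le (1/2^i) \cdot \ell.(t \upharpoonright V)) \\
\Rightarrow{} & \quad \{\, + \text{ is monotonic w.r.t. } \le \,\} \\
 & (\Sigma i : 0 \le i < f.N : \ell.(t \upharpoonright s^i V)) \le (\Sigma i : 0 \le i < f.N : (1/2^i) \cdot \ell.(t \upharpoonright V)) \\
\equiv{} & \quad \{\, \mathbf{a}W = (\cup\, i : 0 \le i < f.N : s^i V); \text{ Algebra} \,\} \\
 & \ell.t \le (\Sigma i : 0 \le i < f.N : 2^{-i} \cdot \ell.(t \upharpoonright V)) \\
\Rightarrow{} & \quad \{\, 2^{-i} \cdot \ell.(t \upharpoonright V) \ge 0 \,\} \\
 & \ell.t \le (\Sigma i : 0 \le i : 2^{-i} \cdot \ell.(t \upharpoonright V)) \\
\equiv{} & \quad \{\, \text{Algebra; definition of } V \,\} \\
 & \ell.t \le 2 \cdot \ell.(t \upharpoonright \{up, down\})
\end{align*}
This proves that this implementation has constant power consumption.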
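The geometric-series step can be checked numerically. The sketch below assumes the extreme case permitted by the halving property, namely that every component passes on exactly half (rounded down) of the up and down communications it receives; `total_communications` is a hypothetical helper introduced only for this illustration.

```python
def total_communications(external, cells):
    """Sum the up/down communications over all levels of the array.

    external: number of up/down communications with the environment.
    cells: number of components in the linear array.
    Level i+1 is assumed to see at most half the communications of level i.
    """
    total, level = 0, external
    for _ in range(cells):
        total += level
        level //= 2  # extreme case: half of the communications are passed on
    return total


if __name__ == "__main__":
    for n in (10, 1000, 10**6):
        t = total_communications(n, cells=64)
        print(n, t, t <= 2 * n)
```

Under this assumption the total never exceeds twice the number of external communications, regardless of how many cells the array contains.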
Chapter 6
Conclusions and Further Research
6.0 Introduction
In this chapter we discuss the obtained results and present some conclusions. Furthermore we give
some suggestions for further research.
6.1 Conclusions
A number of delay-insensitive implementations for up-down counters have been specified and
analyzed with respect to their area complexity, response time, and power consumption.
All counter implementations consist of a linear array of cells. The current count of the counter can
be derived from the states of these cells. For the simplest of the counters, the current count is just
the sum of the internal counts of the cells. This corresponds to unary or radix-1 counting. For the
other implementations we used binary or radix-2 counting.
For specifying the behaviors of the cells we used the commands language described in Chapter 1.
The weave operator allowed for relatively short specifications, compared to state graphs, for
example. Specifying parallel behavior is made easy by specifying the behaviors at the different
boundaries separately and then weaving these partial behaviors. Internal symbols can be used to
obtain the necessary synchronization. Having the partial behaviors was advantageous in proving the
correctness of the proposed implementations. Specifying cells as a weave of partial behaviors also
makes it easier to avoid computation interference.
Unary up-down counter implementations turn out to be closely linked to the control parts of stack
implementations. We proved that the power consumption of unary up-down N-counters grows at
least logarithmically with N. Since every stack implementation can be seen as an up-down counter,
the power consumption of stack implementations grows at least logarithmically with the size of the
stack.
The binary counter implementation presented in Chapter 4 shows that counters described earlier,
for example in [GL81], can be implemented in a delay-insensitive way. In [GL81] counters with
maximum count $2^k - 1$, for some k greater than zero, are designed. We can implement N-counters
for any N greater than zero.
Furthermore, in the analysis of the proposed implementation in Chapter 4 we argued that, under
certain assumptions, using sequence functions and the definition of constant response time for
sequence functions may not be suitable for analyzing the worst-case response time of asynchronous
circuits. These assumptions are that the delays of basic components may vary between a lower and
an upper bound. A suggestion was made for a definition of bounded response time for a particular
class of specifications, namely cubic specifications with alternating inputs and outputs.
Subsequently we showed that, using this definition, the response time of the implementation depends on
the number of cells. For this proof we used an abstraction of the counter cells. The advantage
of analyzing the response time of the abstract cells is that each cell has only one input from its
left environment and one output to its left environment. As a result, only the delays between that
input and that output have to be considered. If there are more inputs and outputs, case analysis
might be required.
The counter implementation of Chapter 5 is a new one and is an improvement on all previous
implementations. It shows that up-down counters can be implemented with constant power consumption.
Constant power consumption was achieved by introducing redundancy in the representation of the
current count. Moreover, the implementation's response time is independent of the number of cells,
even with respect to our stronger definition. Its area complexity grows logarithmically with its size.
Thus, this counter has optimal growth rates with respect to all three performance criteria.
6.2 Further Research
First of all, in this report only high-level implementations of up-down counters are presented. A next
step is the decomposition of the cells into smaller (basic) components or directly into transistors.
Second, there are some possible extensions to the up-down counter specified in Chapter 2. A
possibility is having the counter count modulo N + 1 if the current count is N and another up is
received. This is considered useful by some authors [Par87].
Third, in this report we only considered counter implementations consisting of linear arrays of cells.
Unary counters can also be implemented by cells configured in a binary tree. Then a logarithmic
response time may be obtained without any parallelism in the implementation.
Fourth, counters based on number systems other than the radix-1 and radix-2 number systems can
be designed. For example, in an implementation with a linear structure, one plus the cell number
can be chosen as the weight for its internal count. With digit set {0, 1}, six can be represented
by 100000, 10001, and 1010 (most significant digit on the left). In this way all natural numbers
can be represented. Specifying cells for an implementation using this number system requires the
introduction of an extra output channel, since in each cell the internal count of the subcell is needed
in order to determine the next state upon receiving an input. We give a specification for the general
cells of such a counter. We have not verified its correctness for all N, but verified a small number
of cases using verdect.
Definition 6.2.0. Define D_5 as the least fixpoint of
  S.0 = pref (up?; ((se | sn); ack_1! | sf; empty!); S.1
             | down?; (sd_0; ack_0!; S.0 | sd_1; ack_1!; S.1)
             )
  S.1 = pref (up?; (su_0; ack_0!; S.0 | su_1; ack_1!; S.1)
             | down?; ((sf | sn); ack_0! | se; ack_0!); S.0
             )
and define E_5 as the least fixpoint of
  S.0 = pref (se; S.0
             | su_0; sup!; (sack_1?; S.2 | sfull?; S.3)
             )
  S.1 = pref (sd_0; sdown!; (sack_0?; S.1 | sack_1?; S.2 | sempty?; S.0)
             | sn; S.1
             | su_0; sup!; (sack_1?; S.2 | sfull?; S.3)
             )
  S.2 = pref (sd_1; sdown!; (sack_0?; S.1 | sempty?; S.0)
             | sn; S.2
             | su_0; sup!; (sack_0?; S.1 | sack_1?; S.2 | sfull?; S.3)
             )
  S.3 = pref (sf; S.3
             | sd_1; sdown!; (sack_0?; S.1 | sempty?; S.0)
             ).
Then the counter cell is defined as
  C_5 = |[ se, sn, sf, sd_0, sd_1, su_0, su_1 :: D_5.0 ∥ E_5.0 ]| .
□
To obtain an implementation of an up-down counter as specified in Chapter 2, the outputs ack_0
and ack_1 of the head cell of this proposed implementation can be merged into one signal ack. The
current count of a counter implementation consisting of k − 1 of these cells, a 1-counter, and a
merge(ack_0, ack_1; ack) is
\[ (\Sigma i : 0 \le i < k : c_i \cdot i), \]
where $c_i$ is the internal count of cell i. The implementation of an N-counter requires
$\Theta(\sqrt{N})$ cells of type C_5.
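To make this number system concrete, the sketch below enumerates the representations of a number when cell i carries weight i + 1 and the digit set is {0, 1}, and computes the smallest number of cells whose weights sum to at least N. The function names `representations` and `cells_needed` are hypothetical, and writing the highest-numbered cell leftmost is an assumption made to match the digit strings quoted earlier.

```python
def representations(n, k):
    """Digit strings (most significant digit first, leading zeros stripped)
    that represent n with k cells, digit set {0, 1}, and weight i + 1
    for cell number i."""
    found = []
    for mask in range(2 ** k):
        digits = [(mask >> i) & 1 for i in range(k)]  # digits[i]: count of cell i
        if sum(d * (i + 1) for i, d in enumerate(digits)) == n:
            text = "".join(str(d) for d in reversed(digits)).lstrip("0")
            found.append(text or "0")
    return sorted(found, key=lambda s: (len(s), s))


def cells_needed(n):
    """Smallest k with 1 + 2 + ... + k >= n."""
    k = 0
    while k * (k + 1) // 2 < n:
        k += 1
    return k


if __name__ == "__main__":
    print(representations(6, 6))
    print([cells_needed(n) for n in (1, 6, 7, 100)])
```

For six with six cells this yields 100000, 10001, and 1010 as well as 111, which illustrates the redundancy of the representation. Since the weights 1 through k sum to k(k + 1)/2, the number of cells grows with the square root of N.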
Counters based on other (redundant) number systems may have better average response time than
the binary implementations presented in this report.
Fifth, it is not clear how accurate the proposed response time analysis is. Abstracting away from
the identity of inputs and outputs of cells as we did in Chapters 4 and 5 may not influence the
growth rate of the response time, but it does influence the constant factors, and we do not know
to what extent.
Sixth, we only analyzed the worst-case response time of the proposed designs. One could also
analyze the average-case response time. For counters with the same worst-case response time, the
average response times might still be different. This seems to be the case for the unary counter
implementation of Section 3.1 and a counter based on Martin's lazy stack protocol. For both
implementations the worst-case response time is linear in the maximum count, but we suspect
the average response time of the latter to be much better. For synchronous implementations this
would not be of much interest, since they reflect the worst-case behavior anyway. For asynchronous
implementations the difference influences the performance.
Bibliography
[BBB+92] Roy W. Badeau, R. Iris Bahar, Debra Bernstein, Larry L. Biro, William J. Bowhill,
John F. Brown, Michael A. Case, Ruben W. Castelino, Elizabeth M. Cooper,
Maureen A. Delaney, David R. Deverell, John H. Edmondson, John J. Ellis, Timothy C.
Fischer, Thomas F. Fox, Mary K. Gowan, Paul E. Gronowski, William V. Herrick,
Anil K. Jain, Jeanne E. Meyer, Daniel G. Miner, Hamid Partovi, Victor Peng,
Ronald P. Preston, Chandrasekhara Somanathan, Rebecca L. Stamm, Stephen C.
Thierauf, G. Michael Uhler, Nicholas D. Wade, and William R. Wheeler. A 100
MHz macropipelined VAX microprocessor. IEEE Journal of Solid-State Circuits,
27(11):1585-1598, November 1992.
[Bir67] G. Birkhoff. Lattice Theory, volume 25 of AMS Colloquium Publications. American
Mathematical Society, 1967.
[Bru91] Erik Brunvand. Translating Concurrent Communicating Programs into Asynchronous
Circuits. PhD thesis, Carnegie Mellon University, 1991.
[Chu87] Tam-Anh Chu. Synthesis of Self-Timed VLSI Circuits from Graph-Theoretic
Specifications. PhD thesis, MIT Laboratory for Computer Science, June 1987.
[CSS89] Somsak Choomchuay, Somkiat Supadech, and Manus Sangworasilp. An 8 bit
presettable/programmable synchronous counter/divider. In IEEE Sixth International
Electronic Manufacturing Technology Symposium, pages 230-233. IEEE, 1989.
[DCS93] Al Davis, Bill Coates, and Ken Stevens. Automatic synthesis of fast compact asyn-
chronous control circuits. In Proceedings of the IFIP WG10.5 Working Conference on
Asynchronous Design Methodologies, March 1993.
[Dil89] David L. Dill. Trace Theory for Automatic Hierarchical Verification of
Speed-Independent Circuits. ACM Distinguished Dissertations. MIT Press, 1989.
[DNS92] David L. Dill, Steven M. Nowick, and Robert F. Sproull. Specification and automatic
verification of self-timed queues. Formal Methods in System Design, 1(1):29-60, July
1992.
[Ebe89] Jo C. Ebergen. Translating Programs into Delay-Insensitive Circuits, volume 56 of
CWI Tracts. Centre for Mathematics and Computer Science, 1989.
[Ebe91] Jo C. Ebergen. A formal approach to designing delay-insensitive circuits. Distributed
Computing, 5(3):107-119, 1991.
[EG93a] Jo C. Ebergen and Sylvain Gingras. An asynchronous stack with constant response
time. Technical report, University of Waterloo, 1993.
[EG93b] Jo C. Ebergen and Sylvain Gingras. A verifier for network decompositions of
command-based specifications. In Trevor N. Mudge, Veljko Milutinovic, and Lawrence Hunter,
editors, Proceedings of the Twenty-Sixth Annual Hawaii International Conference on
System Sciences, volume I, pages 310-318. IEEE Computer Society Press, 1993.
[EP92] Jo C. Ebergen and Ad M. G. Peeters. Modulo-N counters: Design and analysis of
delay-insensitive circuits. In Jørgen Staunstrup and Robin Sharp, editors, 2nd Workshop on
Designing Correct Circuits, Lyngby, pages 27-46. Elsevier Science Publishers, 1992.
[Gar93] J.D. Garside. A CMOS VLSI implementation of an asynchronous ALU. In S. Furber
and M. Edwards, editors, IFIP WG 10.5 Working Conference on Asynchronous Design
Methodologies. Elsevier Science Publishers, 1993.
[GL81] Leo J. Guibas and Frank M. Liang. Systolic stacks, queues, and counters. In P. Penfield,
Jr., editor, 1982 Conference on Advanced Research in VLSI, pages 155-164. Artech
House, 1981.
[JB88] Edwin V. Jones and Guoan Bi. Fast up/down counters using identical cascaded
modules. IEEE Journal of Solid-State Circuits, 23(1):283-285, February 1988.
[JU90] Mark B. Josephs and Jan Tijmen Udding. Delay-insensitive circuits: An algebraic
approach to their design. In J. C. M. Baeten and J. W. Klop, editors, CONCUR '90,
Theories of Concurrency: Unification and Extension, volume 458 of Lecture Notes in
Computer Science, pages 342-366. Springer-Verlag, August 1990.
[JU91] Mark B. Josephs and Jan Tijmen Udding. The design of a delay-insensitive stack. In
G. Jones and M. Sheeran, editors, Designing Correct Circuits, pages 132-152.
Springer-Verlag, 1991.
[Kal86] Anne Kaldewaij. A Formalism for Concurrent Processes. PhD thesis, Dept. of Math.
and C.S., Eindhoven Univ. of Technology, 1986.
[LKSV91] Luciano Lavagno, Kurt Keutzer, and Alberto Sangiovanni-Vincentelli. Synthesis of
verifiably hazard-free asynchronous control circuits. In Carlo H. Séquin, editor,
Advanced Research in VLSI: Proceedings of the 1991 UC Santa Cruz Conference, pages
87-102. MIT Press, 1991.
[LT82] X. D. Lu and Philip C. Treleaven. A special-purpose VLSI chip: A dynamic pipeline
up-down counter. Microprocessing and Microprogramming, 10(1):1-10, 1982.
[Man91] M. Morris Mano. Digital Design. Prentice Hall, 2nd edition, 1991.
[Mar90] Alain J. Martin. Programming in VLSI: From communicating processes to delay-
insensitive circuits. In C. A. R. Hoare, editor, Developments in Concurrency and
Communication. Addison-Wesley, 1990. UT Year of Programming Institute on Con-
current Programming.
[MSB91] Cho W. Moon, Paul R. Stephan, and Robert K. Brayton. Synthesis of hazard-free
asynchronous circuits from graphical specifications. In Proceedings of ICCAD-91, pages
322-325. IEEE Computer Society Press, November 1991.
[ND91] Steven M. Nowick and David L. Dill. Automatic synthesis of locally-clocked
asynchronous state machines. In Proceedings of ICCAD-91, pages 318-321. IEEE Computer
Society Press, November 1991.
[Obe81] Roelof M. M. Oberman. Counting and Counters. MacMillan Press, 1981.
[Par87] Behrooz Parhami. Systolic up/down counters with zero and sign detection. In
Mary Jane Irwin and Renato Stefanelli, editors, IEEE Symposium on Computer
Arithmetic, pages 174-178. IEEE Computer Society Press, 1987.
[Par90] Behrooz Parhami. Generalized signed-digit number systems: A unifying framework
for redundant number representations. IEEE Transactions on Computers, 39(1):89-98,
1990.
[Rem87] Martin Rem. Trace theory and systolic computations. In J. W. de Bakker, A. J. Nijman,
and P. C. Treleaven, editors, PARLE: Parallel Architectures and Languages Europe,
Vol. I, volume 258 of Lecture Notes in Computer Science, pages 14-33. Springer-Verlag,
1987.
[RMCF88] Fred U. Rosenberger, Charles E. Molnar, Thomas J. Chaney, and Ting-Pien Fang.
Q-modules: Internally clocked delay-insensitive modules. IEEE Transactions on
Computers, 37(9):1005-1018, September 1988.
[Sut89] Ivan E. Sutherland. Micropipelines. Communications of the ACM, 32(6):720-738,
January 1989.
[Udd86] Jan Tijmen Udding. A formal model for defining and classifying delay-insensitive
circuits. Distributed Computing, 1(4):197-204, 1986.
[vB92] C. H. (Kees) van Berkel. Handshake Circuits: An Intermediary between Communi-
cating Processes and VLSI. PhD thesis, Dept. of Math. and C.S., Eindhoven Univ. of
Technology, 1992.
[vB93] C. H. (Kees) van Berkel. VLSI programming of a modulo-N counter with constant
response time and constant power. In S. Furber and M. Edwards, editors, IFIP WG
10.5 Working Conference on Asynchronous Design Methodologies. Elsevier Science
Publishers, 1993.
[vBKR+91] C.H. (Kees) van Berkel, Joep Kessels, Marly Roncken, Ronald Saeijs, and Frits Schalij.
The VLSI-programming language Tangram and its translation into handshake circuits.
In Proceedings of the European Design Automation Conference, pages 384-389, 1991.
[vdS85] Jan L. A. van de Snepscheut. Trace Theory and VLSI Design, volume 200 of Lecture
Notes in Computer Science. Springer-Verlag, 1985.
[WE85] Neil H. E. Weste and Kamran Eshraghian. CMOS VLSI Design. Addison-Wesley VLSI
Systems Series. Addison-Wesley, 1985.
[Zwa89] Gerard Zwaan. Parallel Computations. PhD thesis, Dept. of Math. and C.S., Eindhoven
Univ. of Technology, 1989.