
SPECIFICATION AND DESIGN

OF
EMBEDDED SYSTEMS
by

Daniel D. Gajski
Frank Vahid
Sanjiv Narayan
Jie Gong
University of California at Irvine
Department of Computer Science
Irvine, CA 92715-3425


Copyright (c) 1994 Daniel D. Gajski, Frank Vahid, Sanjiv Narayan, and Jie Gong

UC Irvine

Design representations

• Behavioral
  Represents functionality but not implementation

• Structural
  Represents connectivity but not dimensionality

• Physical
  Represents dimensionality but not functionality

Introduction


Levels of abstraction

Levels      Behavioral forms           Structural components      Physical objects

Transistor  Differential equations,    Transistors, resistors,    Analog and digital cells
            current-voltage diagrams   capacitors

Gate        Boolean equations,         Gates, flip-flops          Modules, units
            finite-state machines

Register    Algorithms, flowcharts,    Adders, comparators,       Microchips, ASICs
            instruction sets,          registers, counters,
            generalized FSM            register files, queues

Processor   Executable spec.,          Processors, controllers,   PCBs, MCMs
            programs                   memories, ASICs

Design methodologies

• Capture-and-simulate
  Schematic capture
  Simulation

• Describe-and-synthesize
  Hardware description language
  Behavioral synthesis
  Logic synthesis

• Specify-explore-refine
  Executable specification
  Software and hardware partitioning
  Estimation and exploration
  Specification refinement

Motivation
[Figure: design flow from an executable specification (example fragment: "if (x = 0) then y = a*b/2") to a system implementation made of a processor, memory, ASIC, video accelerator, and I/O. Labels along the flow: models, languages; partitioning, estimation, refinement; software compilation, behavioral synthesis, logic synthesis; physical design, test generation, manufacturing.]

Outline
• Introduction
• Design models and architectures
• System-design languages
• An example
• Translation
• Partitioning
• Estimation
• Refinement
• Methodology and environments

Models and architectures

[Figure: the design process takes a specification plus constraints to an implementation. Models describe the specification; architectures describe the implementation.]

Models are conceptual views of the system's functionality.

Architectures are abstract views of the system's implementation.

Models & Architectures


Models and architectures

• Model: a set of functional objects and rules for composing these objects
• Architecture: a set of implementation components and their connections

Models of an elevator controller

(a) English description

"If the elevator is stationary and the floor
requested is equal to the current floor,
then the elevator remains idle.
If the elevator is stationary and the floor
requested is less than the current floor,
then lower the elevator to the requested floor.
If the elevator is stationary and the floor
requested is greater than the current floor,
then raise the elevator to the requested floor."

(b) Algorithmic model

loop
    if (req_floor = curr_floor) then
        direction := idle;
    elsif (req_floor < curr_floor) then
        direction := down;
    elsif (req_floor > curr_floor) then
        direction := up;
    end if;
end loop;

(c) State-machine model

States: Down, Idle, Up. Every state has the same three outgoing transitions:

    (req_floor < curr_floor) / direction := down    to state Down
    (req_floor = curr_floor) / direction := idle    to state Idle
    (req_floor > curr_floor) / direction := up      to state Up

Architectures for implementing the elevator controller

(a) Register level: a state register and combinational logic, with inputs req_floor and curr_floor and output direction

(b) System level: a processor with in/out ports (req_floor, curr_floor, direction), connected to a memory over a bus

Models

• State-oriented models
  Finite-state machine (FSM), Petri net, Hierarchical concurrent FSM

• Activity-oriented models
  Dataflow graph, Flowchart

• Structure-oriented models
  Block diagram, RT netlist, Gate netlist

• Data-oriented models
  Entity-relationship diagram, Jackson's diagram

• Heterogeneous models
  Control/dataflow graph, Structure chart, Programming language paradigm,
  Object-oriented paradigm, Program-state machine, Queueing model

State oriented: Finite-state machine (Mealy model)

[Figure: Mealy state diagram with states S1, S2, S3 and a start arrow. Each arc is labeled input/output, for example r1/n, r2/u1, r1/d1, r3/n.]

S = {s1, s2, s3}
I = {r1, r2, r3}
O = {d2, d1, n, u1, u2}
f: S x I -> S
h: S x I -> O
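The tuple (S, I, O, f, h) can be exercised with a small sketch. The tables behind `f` and `h` below are an assumption, not a transcription of the diagram: they read the arc labels as a three-floor elevator in which state si is the current floor, input rj requests floor j, and the output is n (no move), uK (up K floors), or dK (down K floors). The names `f`, `h`, and `run` are illustrative.

```python
# Mealy machine sketch: the output depends on the current state AND the input.
# Assumption: state "sN" means the elevator is at floor N (three floors).
STATES = ("s1", "s2", "s3")
INPUTS = ("r1", "r2", "r3")


def f(state, inp):
    """Next-state function f: S x I -> S (move to the requested floor)."""
    return "s" + inp[1]


def h(state, inp):
    """Output function h: S x I -> O (direction and distance to travel)."""
    delta = int(inp[1]) - int(state[1])
    if delta == 0:
        return "n"
    return ("u" if delta > 0 else "d") + str(abs(delta))


def run(inputs, state="s1"):
    """Feed a sequence of inputs; collect the output produced on each arc."""
    outputs = []
    for inp in inputs:
        outputs.append(h(state, inp))  # output is attached to the transition
        state = f(state, inp)
    return state, outputs
```

Because the output is computed from the pair (state, input), three states are enough here; a Moore machine for the same behavior needs one state per distinct (floor, output) combination.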

State oriented: Finite-state machine (Moore model)

[Figure: Moore state diagram with a start arrow. Each state is labeled state/output and each arc is labeled with an input (r1, r2, or r3).]

States and their outputs:

S11/d2    S12/d1    S13/n
S21/d1    S22/n     S23/u1
S31/n     S32/u1    S33/u2

State oriented: Finite-state machine with datapath

[Figure: FSMD with a single state S1 (start state) and two self-loop transitions:]

(curr_floor != req_floor) / output := req_floor - curr_floor; curr_floor := req_floor
(curr_floor = req_floor) / output := 0
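A minimal sketch of the single-state FSMD follows, assuming the output is the signed difference req_floor - curr_floor. The class name `ElevatorFSMD` and the method `step` are hypothetical names chosen for illustration.

```python
class ElevatorFSMD:
    """Single-state FSMD: the floor lives in a datapath variable, not in the state."""

    def __init__(self, curr_floor=1):
        self.state = "S1"             # the only control state
        self.curr_floor = curr_floor  # datapath variable

    def step(self, req_floor):
        """Take one transition and return the value assigned to `output`."""
        if self.curr_floor != req_floor:
            # Signed number of floors to move (assumed meaning of the output).
            output = req_floor - self.curr_floor
            self.curr_floor = req_floor
        else:
            output = 0
        return output  # the control state stays S1 on both transitions
```

Moving the floor number into a variable is what keeps the state count at one, regardless of how many floors the building has.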

Finite-state machines

• Merits:
  represent the system's temporal behavior explicitly
  suitable for control-dominated systems

• Demerits:
  lack hierarchy and concurrency, resulting in
  state or arc explosion when representing complex systems

State oriented: Petri nets

[Figure: Petri net with places p1 to p5 and transitions t1 to t4.]

Net = (P, T, I, O, u)
P = {p1, p2, p3, p4, p5}
T = {t1, t2, t3, t4}

I: I(t1) = {p1}            O: O(t1) = {p5}         u: u(p1) = 1
   I(t2) = {p2, p3, p5}       O(t2) = {p3, p5}        u(p2) = 1
   I(t3) = {p3}               O(t3) = {p4}            u(p3) = 2
   I(t4) = {p4}               O(t4) = {p2, p3}        u(p4) = 0
                                                      u(p5) = 1
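The firing rule can be sketched directly from the sets I, O, and u listed above. This assumes an ordinary Petri net (every arc has weight 1); the names `I_PLACES`, `O_PLACES`, `U0`, `enabled`, and `fire` are illustrative.

```python
# Input places I(t), output places O(t), and initial marking u(p) of the net.
I_PLACES = {"t1": ["p1"], "t2": ["p2", "p3", "p5"], "t3": ["p3"], "t4": ["p4"]}
O_PLACES = {"t1": ["p5"], "t2": ["p3", "p5"], "t3": ["p4"], "t4": ["p2", "p3"]}
U0 = {"p1": 1, "p2": 1, "p3": 2, "p4": 0, "p5": 1}


def enabled(marking, t):
    """A transition is enabled when every input place holds at least one token."""
    return all(marking[p] >= 1 for p in I_PLACES[t])


def fire(marking, t):
    """Return the marking after firing t: one token leaves each input place
    and one token enters each output place (arc weight 1 assumed)."""
    if not enabled(marking, t):
        raise ValueError(f"{t} is not enabled")
    new = dict(marking)
    for p in I_PLACES[t]:
        new[p] -= 1
    for p in O_PLACES[t]:
        new[p] += 1
    return new
```

In the initial marking t1, t2, and t3 are enabled, while t4 becomes enabled only after t3 has placed a token in p4.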

Petri nets

[Figure: Petri net structures for (a) sequence, (b) branch, (c) synchronization, (d) resource contention, and (e) concurrency.]

Petri nets

• Merits:
  good at modeling and analyzing concurrent systems

• Demerits:
  flat model that becomes
  incomprehensible as system complexity increases

State oriented: Hierarchical concurrent FSM

[Figure: hierarchical concurrent FSM example. Surviving labels: Y, D, F, u, r, s, and an arc labeled a(P)/c.]

Hierarchical concurrent FSMs

• Merits:
  support both hierarchy and concurrency
  good for representing complex systems

• Demerits:
  concentrate only on modeling control aspects
  and not data and activities

Activity oriented: Dataflow graphs (DFG)

[Figure: dataflow graph at (a) the activity level, with an input, activities A1 and A2 (A2 decomposed into A2.1, A2.2, A2.3), a file, and outputs; and (b) the operation level. Surviving data labels: Y, W.]

Dataflow graphs

• Merits:
  support hierarchy
  suitable for specifying complex transformational systems
  represent problem-inherent data dependencies

• Demerits:
  do not express temporal behaviors or control sequencing
  weak for modeling embedded systems

Activity oriented: Flowchart (CFG)

[Figure: flowchart that scans MEM(1..N) for its maximum value.]

start
J = 1; MAX = 0
loop:
    J > N?           Yes: end
    MEM(J) > MAX?    Yes: MAX = MEM(J)
    J = J + 1

Flowcharts

• Merits:
  useful to represent tasks governed by control flow
  can impose an order that supersedes natural data dependencies

• Characteristics:
  used only when the system's computation is well known

Structure oriented: Component-connectivity diagrams

[Figure: (a) Block diagram: processor, program memory, data memory, I/O coprocessor, and application-specific hardware on a system bus. (b) RT netlist: register file, left bus, right bus, LIR, RIR, ALU. (c) Gate netlist.]

Component-connectivity diagrams

• Merits:
  good at representing the system's structure

• Characteristics:
  often used in the later phases of the design process

Data oriented: Entity-relationship diagram

[Figure: entity-relationship diagram. Surviving labels: Supplier, Customer, Product, Order, P.O. instance, Availability, Request.]

Entity-relationship diagrams

• Merits:
  provide a good view of the data in the system
  suitable for expressing complex relations among various kinds of data

• Demerits:
  do not describe any functional or temporal behavior of the system

Data oriented: Jackson's diagram

[Figure: Jackson's diagram of a Drawing, built from AND, OR, and iteration (*) nodes. Surviving labels: Users*, Shape, Color, Name, Circle, Rectangle, Radius, Width, Height.]

Jackson's diagrams

• Merits:
  suitable for representing data having a complex composite structure

• Demerits:
  do not describe any functional or temporal behavior of the system

Heterogeneous: Control/dataflow graph

[Figure: (a) Activity level: a control flow graph with states S0, S1, S2 enables and disables the dataflow activities A1, A2, A3. Control transitions: start / enable A1, enable A2; W = 10 / disable A1, enable A3; stop / disable A2, disable A3. (b) Operation level: dataflow graphs built from Read X, Read W, constants 2, 3, 5, "+" nodes, Write X, and Write A. Surviving statements: X := X + 2, A := X + 5, A := X + 3, A := X + W.]

Control/dataflow graphs

• Merits:
  correct the inability of a DFG to represent the control of a system
  correct the inability of a CFG to represent data dependencies

Heterogeneous: Structure chart

[Figure: structure chart rooted at Main, with arcs carrying data (A, B, C, D) and control, and with branch and iteration markers. Surviving module names: Get, Transform, Compute, Out_C, Get_A, Get_B, Change_A, Change_B, Do_Loop1, Do_Loop2.]

Structure charts

• Merits:
  represent both data and control

• Characteristics:
  used in the preliminary stages of program design

Heterogeneous: Programming languages

• Imperative vs. declarative programming languages:
  Imperative: C, Pascal, Ada, C++, etc.
  Declarative: LISP, PROLOG, etc.

• Sequential vs. concurrent programming languages:
  Sequential: Pascal, C, etc.
  Concurrent: CSP, Ada, VHDL, etc.

Programming languages

• Merits:
  model data, activity, and control

• Demerits:
  do not explicitly model the system's states

Heterogeneous: Object-oriented paradigm

[Figure: three objects, each encapsulating data and operations, together with a transformation function.]

Object-oriented paradigms

• Merits:
  support information hiding, inheritance, natural concurrency

• Demerits:
  not suitable for systems with complicated transformation functions

Heterogeneous: Program-state machine

[Figure: program-state machine. Surviving labels: state B and arcs e1, e2, e3. The declarations and the program of one leaf state are shown below.]

variable A: array[1..20] of integer;
variable i, max: integer;

max = 0;
for i = 1 to 20 do
    if (A[i] > max) then
        max = A[i];
    end if;
end for

Program-state machines

• Merits:
  represent the system's states, data, control, and activities in a single model
  overcome the limitations of programming languages and HCFSM models

Heterogeneous: Queueing model

[Figure: arriving requests enter a queue that is served by (a) one server or (b) multiple servers.]

Queueing model

• Characteristics:
  used for analyzing the system's performance
  can find utilization, queueing length, throughput
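As an illustration of the kind of figures a queueing model yields, the sketch below evaluates the one-server case under an M/M/1 assumption (Poisson arrivals, exponential service times). The slides do not name a specific queueing discipline, so that choice and the function name `mm1_metrics` are assumptions made for the example.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state metrics of a single-server queue, assuming M/M/1.

    arrival_rate: mean number of requests arriving per unit time
    service_rate: mean number of requests the server can complete per unit time
    """
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    rho = arrival_rate / service_rate  # utilization of the server
    if rho >= 1:
        raise ValueError("queue is unstable: arrival rate >= service rate")
    return {
        "utilization": rho,
        # Mean number of requests waiting (excluding the one in service).
        "queue_length": rho * rho / (1 - rho),
        # In steady state every arriving request is eventually served.
        "throughput": arrival_rate,
    }
```

For instance, `mm1_metrics(2.0, 4.0)` describes a server that is busy half of the time.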

Architectures

• Application-specific architectures
  Controller architecture
  Datapath architecture
  Finite-state machine with datapath (FSMD)

• General-purpose processors
  Complex instruction set computer (CISC)
  Reduced instruction set computer (RISC)
  Vector machine
  Very long instruction word computer (VLIW)

• Parallel processors

Controller architecture

[Figure: controller consisting of a state register, a next-state function, and an output function, with inputs and outputs.]

Datapath architecture

[Figure: datapath that combines the pairs x(i) b(0), x(i-1) b(1), x(i-2) b(2), x(i-3) b(3) through "+" nodes to produce y(i), drawn as (a) a three-stage pipeline and (b) a four-stage pipeline.]
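Read as a four-tap filter, the datapath computes y(i) = x(i) b(0) + x(i-1) b(1) + x(i-2) b(2) + x(i-3) b(3). That reading (each pair is a product, and the "+" nodes sum the products) is an assumption drawn from the figure labels. A minimal functional sketch, with the hypothetical name `fir` and with samples before x(0) taken as zero:

```python
def fir(x, b):
    """Compute y(i) = sum over k of b[k] * x[i - k] for every sample index i.

    Samples with a negative index are treated as zero (an assumption about
    the start-up behavior; the figure does not show it).
    """
    y = []
    for i in range(len(x)):
        acc = 0
        for k, coeff in enumerate(b):
            if i - k >= 0:
                acc += coeff * x[i - k]
        y.append(acc)
    return y
```

The pipelined versions in the figure compute the same function; they differ only in how the multiplications and additions are spread over clock cycles.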

FSMD

[Figure: FSMD architecture. A control unit (state register, next-state function, output function) sends control signals to a datapath and receives status signals from it. The datapath has its own datapath inputs and datapath outputs.]

CISC architecture

[Figure: CISC processor. Control unit: microprogram memory, MicroPC with +1 incrementer, address selection logic, instruction register. Datapath with PC, connected to memory. Control signals go to the datapath and status signals come back.]

RISC architecture

[Figure: RISC processor. Control unit: hardwired output and next-state logic, state register, instruction register. Datapath: register file, ALU. Data cache and instruction cache sit between the processor and memory. Control signals go to the datapath and status signals come back.]

Vector machines

[Figure: vector machine. Interleaved memory connects through memory pipes to vector registers and scalar registers, which feed a vector functional unit and a scalar functional unit.]

VLIW architecture

[Figure: VLIW architecture. Surviving labels: memory, register file.]

Parallel processors: SIMD/MIMD

[Figure: SIMD: one control unit driving processing elements PE 0, PE 1, ..., PE N-1. MIMD: (a) message passing, where Proc. 0 to Proc. N-1 each have a local memory Mem. 0 to Mem. N-1 and communicate over an interconnection network; (b) shared memory, where Proc. 0 to Proc. N-1 reach Mem. 0 to Mem. N-1 through an interconnection network.]

Conclusion

• Different models focus on different aspects
• A proper model needs to represent the system's features
• Models are implemented in architectures
• Smooth transformation of models to architectures increases productivity

System specification

• For every design, there exists a conceptual view
• Conceptual view depends on application
  Computation: conceptualized as a program
  Controller: conceptualized as a state-machine
• Goal of specification language
  Capture conceptual view with minimum designer effort
• Ideal language
  1-to-1 mapping between conceptual model & language constructs

Outline

• Characteristics of commonly used conceptual models:
  Concurrency, hierarchy, synchronization
• Requirements for embedded system specification
• Evaluate HDLs with respect to embedded systems
  VHDL, Verilog, Esterel, CSP, Statecharts, SDL, SpecCharts

Concurrency

• Behavior: a chunk of system functionality
  e.g. process, procedure, state-machine
• System often conceptualized as set of concurrent behaviors
• Concurrency can exist at different abstraction levels:
  Job-level
  Task-level
  Statement-level
  Operation-level
  Bit-level
• Two types of concurrency within a behavior
  Data-driven, Control-driven

Data-driven concurrency
• Operations execute when input data is available
• Execution order determined by data dependencies

1: Q = A + B
2: Y = X + P
3: P = (C - D) * Q

[Figure: dataflow graph of the three statements, with two add nodes, one subtract node, and one multiply node; the add that produces Q and the subtract feed the multiply.]
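The three statements can be executed in data-driven fashion with a small sketch: an operation fires as soon as all of its operands exist, whatever its textual position. The statement 3 right-hand side is taken to be (C - D) * Q, which is an assumption based on the subtract node in the figure; the names `STATEMENTS`, `run_dataflow`, and the intermediate value "C-D" are introduced here for illustration.

```python
import operator

# target -> (operation, operand names). "C-D" is an unnamed intermediate value
# introduced for the subtraction inside statement 3.
STATEMENTS = {
    "Q": (operator.add, ("A", "B")),    # 1: Q = A + B
    "Y": (operator.add, ("X", "P")),    # 2: Y = X + P
    "C-D": (operator.sub, ("C", "D")),  # 3a: C - D
    "P": (operator.mul, ("C-D", "Q")),  # 3b: P = (C - D) * Q
}


def run_dataflow(inputs):
    """Fire every operation whose operands are available until none remain.

    Returns the final values and the order in which operations fired.
    """
    values = dict(inputs)
    pending = dict(STATEMENTS)
    order = []
    while pending:
        ready = [t for t, (_, ops) in pending.items()
                 if all(o in values for o in ops)]
        if not ready:
            raise ValueError("missing inputs or cyclic dependency")
        for target in ready:  # operations in one round are mutually independent
            op, operands = pending.pop(target)
            values[target] = op(*(values[o] for o in operands))
            order.append(target)
    return values, order
```

Statement 2 is written second but fires last, because it has to wait for P; the data dependencies, not the statement numbers, fix the order.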

Control-driven concurrency
• Control thread: set of operations executed sequentially
• Concurrency represented by multiple control threads

Fork-join statement

sequential behavior X
begin
    Q();
    fork A(); B(); C(); join;
    R();
end behavior X;

Process statement

concurrent behavior X
begin
    process A();
    process B();
    process C();
end behavior X;
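The fork-join form of behavior X maps naturally onto threads. The sketch below uses Python's standard `threading` module as a stand-in; the function name `behavior_x` and the logging helper `record` are hypothetical, and the sub-behaviors Q, A, B, C, R are reduced to writing their names into a log.

```python
import threading


def behavior_x(log):
    """Run Q, then A, B, C as concurrent control threads, then R."""
    lock = threading.Lock()

    def record(name):
        # Stand-in for the body of a sub-behavior: just note that it ran.
        with lock:
            log.append(name)

    record("Q")                                   # Q();
    threads = [threading.Thread(target=record, args=(name,))
               for name in ("A", "B", "C")]
    for t in threads:                             # fork
        t.start()
    for t in threads:                             # join: wait for all three
        t.join()
    record("R")                                   # R();
    return log
```

Q always runs first and R always runs last, while A, B, and C may finish in any order between them.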

State-transitions

• Systems often are state-based, e.g. controllers
• State may represent
  mode or stage of being
  computation
• Difficult to capture using programming constructs

[Figure: state diagram between a start point and a finish point. Surviving labels: state Q, arcs u and x.]

Hierarchy

• Required for managing system complexity
  Allows system modeler to focus on one subsystem at a time
  Enhances comprehension of system functionality
  Scoping mechanism for objects like types and variables
• Two types of hierarchy
  Structural hierarchy
  Behavioral hierarchy

Structural hierarchy
• System represented as set of interconnected components
• Interconnections between components represent wires
• Several levels: systems, chips, RT-components, gates

[Figure: a system containing a processor (control logic and datapath) and a memory, connected by a data bus and control lines.]

Behavioral hierarchy

• Ability to successively decompose behavior into sub-behaviors
• Concurrent decomposition
  Fork-join
  Process
• Sequential decomposition
  Procedure
  State-machine

behavior P
variable x, y;
begin
    Q(x);
    R(y);
end behavior P;

[Figure: behavior P decomposed into sub-behaviors Q (containing Q1, Q2, Q3) and R (containing R1, R2), with arcs e1 to e8.]

Programming constructs

• Some behaviors easily conceptualized as sequential algorithms
• Wide variety of constructs available
  Assignment, branching, iteration, subprograms,
  recursion, complex data types (records, lists)

type buffer_type is array (1 to 10) of integer;
variable buf : buffer_type;
variable i, j : integer;

for i = 1 to 10
    for j = i to i
        if (buf(i) > buf(j)) then
            SWAP(buf(i), buf(j));
        end if;
    end for;
end for;

Behavioral completion

• Behavior completes when all computations performed
• Advantages
  Behavior can be viewed without inter-level transitions
  Allows natural decomposition into sequential subbehaviors

[Figure: behavior B with a start point and a final state. Surviving labels: sub-behaviors X1, X2, X3, Y1, Y2; arcs e1 to e5; variable q.]

Communication

• Concurrent behaviors exchange data
• Shared-memory model
  Sender updates common medium
  Persistent, Non-persistent
• Message-passing model
  Data sent over abstract channels
  Unidirectional / bidirectional
  Point-to-point / multiway
  Blocking / non-blocking

[Figure: processes P and Q communicating through shared memory, and processes P and Q communicating through channel C as shown below.]

process P                process Q
begin                    begin
    variable x               variable y
    ....                     ....
    send (x);                receive (y);
    ....                     ....
end                      end
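A message-passing channel with blocking semantics can be sketched with Python's standard `queue.Queue`. This is an approximation: a queue of size 1 buffers one item, so the sender blocks only while that slot is occupied, which is not a strict rendezvous. The function name `run_channel_demo` and the `None` end-of-stream marker are hypothetical additions for the example.

```python
import queue
import threading


def run_channel_demo(values):
    """Send `values` from process P to process Q over a one-slot channel."""
    channel = queue.Queue(maxsize=1)  # abstract channel C, point-to-point
    received = []

    def process_p():
        for x in values:
            channel.put(x)            # send(x): blocks while the channel is full
        channel.put(None)             # hypothetical end-of-stream marker

    def process_q():
        while True:
            y = channel.get()         # receive(y): blocks until data arrives
            if y is None:
                break
            received.append(y)

    p = threading.Thread(target=process_p)
    q = threading.Thread(target=process_q)
    p.start()
    q.start()
    p.join()
    q.join()
    return received
```

The blocking `put` and `get` calls also synchronize the two processes: Q cannot run ahead of the data, and P cannot run more than one item ahead of Q.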

Synchronization

• Concurrent behaviors execute at different speeds
• Synchronization required when
  Data exchanged between behaviors
  Different activities must be performed simultaneously
• Two types of synchronization mechanisms
  Control-dependent
  Data-dependent

Control-dependent synchronization
• Synchronization based on control structure of behavior

Fork-join

behavior X
begin
    Q();
    fork A(); B(); C(); join;
    R();
end behavior X;

[Figure: fork-join of A, B, C after Q, where the join is the synchronization point; and a state diagram ABC containing AB, in which A (A1, A2) and B (B1, B2) run concurrently, with C and a Reset arc.]

Data-dependent synchronization

• Synchronization based on communication of data between behaviors

[Figure: three versions of a behavior AB with concurrent sub-behaviors A (A1, A2) and B (B1, B2), showing synchronization by common event (e), synchronization by status detection (entered A2), and synchronization by common variable (x := 0, x := 1, tested as (x = 1)).]

Exception handling

• Occurrence of event terminates current computation
• Control transferred to appropriate next mode
• Example of exceptions: interrupts, resets

[Figure: behavior P with sub-behaviors P1 and P2.]

Timing

• Required to represent real world implementations
• Functional timing: affects simulation of system specification
    wait for 200 ns;
    A <= A + 1 after 100 ns;
• Timing constraints: guide synthesis and verification tools

[Figure: timing constraints. Behavior B with IN and OUT events on a time axis, annotated min 50 ns and max 10 ms; behaviors P and Q connected by channel C (max 10 Mb/s).]

Embedded system specification

• Embedded system: behavior defined by interaction with environment
• Essential characteristics
  State-transitions
  Behavioral hierarchy
  Programming constructs
  Exceptions
  Concurrency
  Behavioral completion

[Figure: example combining these characteristics. Surviving labels: start, P, P1, P2, fork, join, Q, and arcs u, w, x.]

VHDL

• IEEE standard, intended for documentation and exchange of designs [IEE88]
• Characteristics supported
  Behavioral hierarchy: single level of processes
  Structural hierarchy: nested blocks and component instantiations
  Concurrency: task-level (process), statement-level (signal assignment)
  Programming constructs
  Communication: shared-memory using global signals
  Synchronization: wait on and wait until statements
  Timing: wait for statement, after clause in assignments
• Characteristics not supported
  Exceptions: partially supported by guarded signal assignments
  State transitions

Verilog and Esterel

• Verilog [TM91] developed as proprietary language for specification, simulation
• Esterel [Hal93] developed for specification of reactive systems
• Characteristics supported:
  Behavioral hierarchy: fork-join
  Structural hierarchy: hierarchy of interconnected modules
  Programming constructs
  Communication: shared registers (Verilog) and broadcasting (Esterel)
  Synchronization: wait for an event on a signal
  Timing: modeling of gate, net, assignment delays in Verilog
  Exceptions: disable (Verilog); watching, do-upto, trap statements (Esterel)
• Characteristics not supported: State transitions

SDL (Specification and Description Language)

• CCITT standard in telecommunication for protocol specification [BHS91]
• Characteristics supported
  Behavioral hierarchy: nested dataflow
  Structural hierarchy: nested blocks
  State transitions: state machine in processes
  Communication: message passing
  Timing: timeouts generated by timer object
• Characteristics not supported
  Exceptions
  Programming constructs

[Figure: an SDL system containing blocks connected by channels; blocks contain processes connected by signal routes.]

CSP (Communicating Sequential Processes)

• Intended to specify programs running on multiprocessor machines [Hoa78]
• Characteristics supported
  Behavioral hierarchy: fork-join using parallel command
  Programming constructs
  Communication: message passing using input, output commands
  Synchronization: blocking message passing
• Characteristics not supported
  Exceptions
  State transitions
  Structural hierarchy
  Timing

SpecCharts

• Developed for embedded system specification [NVG92]
• PSM (program-state machine) model + VHDL
• Characteristics supported
  Behavioral hierarchy: sequential/concurrent behaviors
  State transitions: TOC (transition on completion) arcs
  Communication: shared memory, message passing
  Exceptions: TI (transition immediately) arcs
• Characteristics similar to VHDL
  Programming constructs
  Structural hierarchy
  Synchronization and timing

[Figure: SpecCharts example. Surviving labels: behaviors Y, X1, X2 and arcs e1, e2, e3. The declarations and the code of one leaf behavior are shown below.]

port P, Q : in integer;
type INTARRAY is array (natural range <>) of integer;
signal A : INTARRAY (15 downto 0);

variable MAX : integer;

MAX := 0;
for J in 0 to 15 loop
    if (A(J) > MAX) then
        MAX := A(J);
    end if;
end loop

SpecCharts: state transitions

• State transitions represented by TOC and TI arcs between behaviors

behavior MAIN type sequential subbehaviors is
begin
    P : (TOC, u, Q);
    Q : (TOC, v, P), (TOC, w, R);
    R : (TOC, x, Q);

    behavior P .....
    behavior Q .....
    behavior R .....
end MAIN;

[Figure: state diagram of MAIN with a start arrow and behaviors P, Q, R, connected by arcs u (P to Q), v (Q to P), w (Q to R), and x (R to Q).]
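The transition list of MAIN can be read as a table and stepped through. The sketch below is only an illustration of that reading, not SpecCharts semantics in full: it assumes execution starts in P (the first listed sub-behavior), that one event is sampled each time the current sub-behavior completes, and that MAIN stops when no arc matches. The names `TRANSITIONS` and `run_main` are hypothetical.

```python
# Transition table copied from behavior MAIN:
# source behavior -> list of (arc type, event, target behavior).
TRANSITIONS = {
    "P": [("TOC", "u", "Q")],
    "Q": [("TOC", "v", "P"), ("TOC", "w", "R")],
    "R": [("TOC", "x", "Q")],
}


def run_main(events, start="P"):
    """Follow TOC arcs; each event is sampled when the current behavior completes.

    Returns the sequence of sub-behaviors that were executed.
    """
    current = start
    trace = [current]
    for event in events:
        for arc_type, condition, target in TRANSITIONS[current]:
            if arc_type == "TOC" and condition == event:
                current = target
                break
        else:
            break  # no arc matches: stop here (assumption made for this sketch)
        trace.append(current)
    return trace
```

A TI arc would differ only in when the event is sampled: immediately, without waiting for the source behavior to complete.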

SpecCharts: behavioral hierarchy

• Hierarchy represented by nested behaviors
• Behavior decomposed into sequential or concurrent subbehaviors

behavior MAIN type sequential subbehaviors is
begin
    P : (TOC, true, Q_R);
    Q_R : (TOC, true, S);
    S : ;

    behavior P .....

    behavior Q_R type concurrent subbehaviors is
    begin
        Q : (TOC, true, halt);
        R : (TOC, true, halt);

        behavior Q .....
        behavior R .....
    end Q_R;

    behavior S .....
end MAIN;

[Figure: P is followed by a fork into the concurrent behaviors Q and R, which join before S.]

SpecCharts: exceptions

• Exceptions represented by TI (transition immediately) arcs

behavior MAIN type sequential subbehaviors is
begin
    P : (TI, e, Q);
    Q : ;

    behavior P
        behavior P1
        .......
        behavior P2
        .......

    behavior Q
    ......
end MAIN;

[Figure: behavior P, containing P1 and P2, with a TI arc labeled e leading to behavior Q.]

Summary

[Table: support for embedded system features by language. Columns: State Transitions, Behavioral Hierarchy, Concurrency, Program Constructs, Exceptions, Behavioral Completion. Rows: VHDL, Verilog, Esterel, SDL, CSP, Statecharts, SpecCharts. Each cell is marked graphically as feature fully supported, feature partially supported, or feature not supported.]

Specification example

• An executable specification language enables:
  Early verification
  Precision
  Automation
  Documentation
• A good language/model match reduces:
  Capture time
  Comprehension time
  Functional errors

Outline

• Capture an example's model in a particular language
  PSM model in the SpecCharts language
• Point out the benefits of a good language/model match
• Highlight experiments that demonstrate those benefits

Answering machine controller's environment

[Figure: the controller and its environment.
 Line circuitry (on the phone line): tollsaver, beep, offhook, ring, hangup, tone
 Tape unit: tape_cnt, tape_rew, tape_fwd, tape_play, tape_rec
 Announcement unit: ann_done, ann_play, ann_rec
 Panel labels: messages, power, light, on/off, mic, rec ann, hear ann, memo, play msgs, stop, rew, play, fwd]

Highest-level view of the controller

[Figure: behavior Controller with two sub-behaviors, SystemOff and SystemOn, connected by arcs labeled power=1 and power=0.]

The SystemOn behavior

• System usually responds to the line
• Pressing any machine button gets an immediate response

[Figure: SystemOn contains RespondToLine and RespondToMachineButton; the arc between them is labeled rising(any_button_pushed).]

The RespondToMachineButton behavior

    behavior RespondToMachineButton type code is
    begin
        if (play=1) then
            HandlePlay;
        elsif (fwd=1) then
            HandleFwd;
        elsif (rew=1) then
            HandleRew;
        elsif (memo=1) then
            HandleMemo;
        elsif (stop=1) then
            HandleStop;
        elsif (hear_ann=1) then
            HandleHearAnn;
        elsif (rec_ann=1) then
            HandleRecAnn;
        elsif (play_msgs=1) then
            HandlePlayMsgs;
        end if;
    end;

[Figure, panels (a) and (b): the same behavior shown as a program and as a set of sub-behaviors selected by conditions: HandlePlay (play=1), HandleFwd (fwd=1), HandleRew (rew=1), HandleMemo (memo=1), HandleStop (stop=1), HandleHearAnn (hear_ann=1), HandleRecAnn (rec_ann=1), HandlePlayMsgs (play_msgs=1).]

The RespondToLine behavior

• Monitors line for rings
• Answers line
• Responds to exceptions
    Hangup
    Machine turned off

[Figure: RespondToLine contains Monitor and Answer; the exception arcs are labeled rising(hangup) and falling(machine_on).]

The Monitor behavior

• Counts rings until the required number is reached
• Requirements may change

Declared in Monitor:

    signal rings_to_wait : integer range 1 to 20 := 4;

MaintainRingsToWait:

    function DetermineRingsToWait return integer is begin
        if ((num_msgs > 0) and (tollsaver=1) and (machine_on=1)) then
            return(2);
        elsif (machine_on=1) then
            return(4);
        else
            return(15);
        end if;
    end;

    loop
        rings_to_wait <= DetermineRingsToWait;
        wait on tollsaver, machine_on;
    end loop;

CountRings:

    variable i : integer range 0 to 20;
    i := 0;
    while (i < rings_to_wait) loop
        wait on rings_to_wait, ring;
        if (rising(ring)) then
            i := i + 1;
        end if;
    end loop;
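The two Monitor sub-behaviors can be sketched in Python for illustration only. This is a minimal sketch, not the authors' code: the event list passed to `count_rings` is a hypothetical stand-in for the `wait on rings_to_wait, ring` statement, and the function names are assumptions.

```python
def determine_rings_to_wait(num_msgs, tollsaver, machine_on):
    """Mirror of DetermineRingsToWait: 2, 4 or 15 rings."""
    if num_msgs > 0 and tollsaver == 1 and machine_on == 1:
        return 2
    if machine_on == 1:
        return 4
    return 15


def count_rings(events, rings_to_wait):
    """Consume events until enough rings were counted.

    events is a list of ("ring", None) or ("rings_to_wait", new_value)
    tuples (a hypothetical encoding of the signals CountRings waits on).
    Returns the number of events consumed, or None if the count was
    never reached.
    """
    i = 0
    for consumed, (kind, value) in enumerate(events, start=1):
        if kind == "ring":
            i += 1                      # rising(ring)
        elif kind == "rings_to_wait":
            rings_to_wait = value       # requirement changed while counting
        if i >= rings_to_wait:          # loop condition i < rings_to_wait fails
            return consumed
    return None
```

Because the requirement is re-read after every event, lowering `rings_to_wait` while the phone is ringing takes effect immediately, which is the point of the "requirements may change" bullet.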

The Answer behavior

(a) [Figure: Answer contains PlayAnnouncement, RecordMsg, Hangup and RemoteOperation; arcs are labeled rising(hangup) and button="0001" (twice).]

(b)
    behavior PlayAnnouncement type code is
    begin
        ann_play <= 1;
        wait until ann_done = 1;
        ann_play <= 0;
    end;

(c)
    behavior RecordMsg type code is
    begin
        ProduceBeep(1 s);
        if (hangup = 0) then
            tape_rec <= 1;
            wait until hangup=1 for 100 s;
            ProduceBeep(1 s);
            num_msgs <= num_msgs + 1;
            tape_rec <= 0;
        end if;
    end;

The RemoteOperation behavior

• Owner can operate machine remotely by phone
• Owner identifies himself by a four-button ID

(a) [Figure: RemoteOperation contains CheckCode and RespondToCmds; arcs are labeled hangup=1, code_ok=0 and code_ok=1.]

(b)
    behavior CheckUserCode type code is
    begin
        code_ok <= true;
        for (i in 1 to 4) loop
            wait until tone /= "1111" and tone'event;
            if (tone /= user_code(i)) then
                code_ok <= false;
            end if;
        end loop;
    end;

The answering machine controller specification

[Figure: complete behavior hierarchy of the Controller, drawn next to the line circuitry, tape unit and announcement unit. Arc labels are listed where they survive extraction.]

    Controller (arcs: power=1, power=0)
        SystemOff
        SystemOn (arc: rising(any_button_pushed))
            InitializeSystem
            RespondToMachineButton
            RespondToLine (arcs: rising(hangup), falling(machine_on))
                Monitor
                Answer (arcs: rising(hangup), tone="0001")
                    PlayAnnouncement
                    RecordMsg
                    Hangup
                    RemoteOperation (arcs: hangup=1, code_ok, not code_ok)
                        CheckUserCode
                        RespondToCmds (arcs: tone="0010", hangup=1, other)
                            HearMsgsCmds
                            MiscCmds
                            ResetTape

Executable specification use

• Precision
    Readability and precision compete in a natural language
    An executable specification encourages precision
    Designer asks questions, specification answers them
• Language/model match (SpecCharts/PSM):
    Hierarchy
    State-transitions
    Programming constructs
    Concurrency
    Exceptions
    Completion
    Equivalence of states and programs

Specification capture experiment

[Table: VHDL versus SpecCharts, with rows for number of modelers, number of incorrect specifications the first time, number of incorrect specifications the second time, and average specification time in minutes. Only two values survive extraction, 40 and 16; they fit the average-time row (VHDL 40, SpecCharts 16), consistent with the 2.5x ratio below.]

• VHDL modelers required 2.5 times longer
• Two VHDL specifications possessed control errors
• SpecCharts were effective for state-transitions and exceptions

Comparison of SpecCharts, VHDL and Statecharts

Answering machine example

Specification    Conceptual   SpecCharts   VHDL          VHDL     Statecharts
attributes       model                     (hierarch.)   (flat)
Program-states   42           42           42            32       80
Arcs             40           40           40            152      135

Rows whose column positions did not survive extraction (values in original order):
    Control signals: 84
    Lines/leaf: 27, 29
    Lines: 446, 1592, 963
    Words: 1733, 6740, 8088
    Shortcomings: no sequential program constructs; no hierarchy; no exception constructs; no hierarchical events; no state-transition constructs

Design quality experiment

Design attribute        Designed from English   Designed from SpecCharts
Control transistors     3130                    2630
Datapath transistors    2277                    2251
Total transistors       5407                    4881
Total pins              38                      38

• No loss in design quality with an executable language

Summary

• Executable languages encourage precision and automation
• The language should support an appropriate model
    Makes specification easy
• Strongly parallels programming languages
    Structured vs. assembly languages
    Object-oriented model and C++

Translation

• Model often unsupported by a standard language
    (1) Use a standard language anyway
        Many tools available
        But captures model unnaturally
    (2) Use an application-specific language
        Captures model naturally
        But not many tools available
    (3) Use a front-end language
        Captures model naturally
        Many tools available after translating to a standard language

Outline

• Front-end language in VHDL environment
• State machine translation
• Fork-join translation
• Exception translation

A front-end language in a VHDL environment

[Figure: a SpecCharts description passes through a Translator to become VHDL, alongside VHDL written directly. The VHDL feeds the VHDL environment (synthesis tool, simulator, debugger, test generator), which produces the tool output.]

State machine translation

(a) [Figure: state machine with states P, Q and R; the start arc enters P, and an arc labeled not u is among the transitions.]

(b)
    type state_type is (P, Q, R);
    variable state : state_type := P;

    loop
        case (state) is
            when P =>
                <actions for P>
                if (u) then
                    state := Q;
                elsif (not u) then
                    state := R;
                end if;
            when Q =>
                <actions for Q>
                state := P;
            when R =>
                <actions for R>
                state := Q;
        end case;
    end loop;
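The loop-plus-case pattern can be exercised with a small Python sketch, used here only for illustration. It assumes the transitions read off the translated code (P goes to Q on u and to R otherwise, Q goes to P, R goes to Q); the `run` function and its trace output are hypothetical helpers.

```python
from enum import Enum


class State(Enum):
    P = "P"
    Q = "Q"
    R = "R"


def run(u_values, start=State.P):
    """Execute one state per input value and return the visited state names."""
    state = start
    trace = [state.name]
    for u in u_values:
        # One pass of the 'case (state) is' statement.
        if state is State.P:
            state = State.Q if u else State.R
        elif state is State.Q:
            state = State.P
        else:  # State.R
            state = State.Q
        trace.append(state.name)
    return trace
```

The state variable plays the role of the VHDL `state` variable, and each loop iteration corresponds to one pass through the case statement.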

Fork-join translation

(a) Source with a parallel construct

    Main: process
    begin
        statement1;
        parallel
        {
            P1;
            P2;
        }
        statement2;
        ...

(b) Translation

    signal fork, P1_done, P2_done : boolean;

    Main : process
    begin
        statement1;
        fork <= true;
        wait until P1_done and P2_done;
        statement2;
        ...

    P1_process : process
    begin
        wait until fork;
        P1;
        P1_done <= true;
        wait until not fork;
        P1_done <= false;
    end;
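The same handshake can be sketched with Python threads, for illustration only. The `threading.Event` objects stand in for the `fork` and `Pk_done` signals; the function name `fork_join` and the use of callables for the statements are assumptions of this sketch.

```python
import threading


def fork_join(statement1, bodies, statement2):
    """Run statement1, then all bodies concurrently, then statement2."""
    log = [statement1()]
    fork = threading.Event()                       # 'fork' signal
    done = [threading.Event() for _ in bodies]     # 'Pk_done' signals
    results = [None] * len(bodies)

    def process(k, body):
        fork.wait()                                # wait until fork
        results[k] = body()                        # Pk
        done[k].set()                              # Pk_done <= true

    threads = [threading.Thread(target=process, args=(k, b))
               for k, b in enumerate(bodies)]
    for t in threads:
        t.start()                                  # processes exist before the fork
    fork.set()                                     # fork <= true
    for d in done:
        d.wait()                                   # wait until P1_done and P2_done
    log.extend(results)
    log.append(statement2())
    for t in threads:
        t.join()
    return log
```

As in the translated VHDL, the branch processes are created up front and sit idle until the main process raises the fork signal.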

Exception translation

(a) Source: event e transfers control from T to S

    event e : T -> S;
    T:
        statement1;
        statement2;
        statement3;
    S:
        statement4;
        statement5;

(b) Translation using goto

    T
        statement1;
        if (e)
            goto S_start;
        statement2;
        if (e)
            goto S_start;
        statement3;
    S_start: S
        statement4;
        statement5;

(c) Translation using loop and exit

    T
    T_loop : loop
        statement1;
        if (e)
            exit T_loop;
        statement2;
        if (e)
            exit T_loop;
        statement3;
        exit T_loop;
    end loop;
    S
        statement4;
        statement5;

Summary

• The perfect standard language may never exist
• No standard language supports all models
• Using a front-end language solves the problem
    Natural capture
    Large base of tools and expertise
• Translators are simple
    Maps characteristics to existing constructs
    Generates well-structured and consistent output

System partitioning

• System functionality is implemented on system components
    ASICs, processors, memories, buses
• Two design tasks:
    Allocate system components or ASIC constraints
    Partition functionality among components
• Constraints
    Cost, performance, size, power
• Partitioning is a central system design task

Outline

• Structural vs. functional partitioning
• Natural vs. executable language specifications
• Basic partitioning issues and algorithms
• Functional partitioning techniques for hardware
• Hardware/software partitioning
• Functional partitioning techniques for software
• Exploring tradeoffs with functional partitioning

Structural vs. functional partitioning

• Structural: Implement structure, then partition
• Functional: Partition function, then implement
    Enables better size/performance tradeoffs
    Uses fewer objects, better for algorithms/humans
    Permits hardware/software solutions
    But it's harder than graph partitioning

Natural vs. executable language specifications

• Alternative methods for specifying functionality
• Natural languages common in practice
• Executable languages becoming popular
    Automated estimation/partitioning explores solutions
    Early verification reduces costly late changes
    Precision eases integration

Basic partitioning issues

• Specification abstraction level
• Granularity
• Metrics and estimations
• Partitioning algorithms
• Objective and closeness functions
• System-component allocation
• Output

Basic partitioning issues (cont.)

• Specification-abstraction level: input definition
    Just indicating the language is insufficient
    Abstraction level indicates amount of design already done
    e.g. task DFG, tasks, CDFG, FSMD
• Granularity: specification size in each object
    Fine granularity yields more possible designs
    Coarse granularity better for computation, designer interaction
    e.g. tasks, procedures, statement blocks, statements
• Component allocation: types and numbers
    e.g. ASICs, processors, memories, buses
• Output: format and uses
    e.g. new specification, hints to synthesis tool

Basic partitioning issues (cont.)

• Metrics and estimations: "good" partition attributes
    e.g. cost, speed, power, size, pins, testability, reliability
    Estimates derived from quick, rough implementation
    Speed and accuracy are competing goals of estimation
• Objective and closeness functions
    Combine multiple metric values
    Closeness used for grouping before complete partition
    Weighted sum common
    e.g. $k_1 \cdot F(\mathit{area}, c) + k_2 \cdot F(\mathit{delay}, c) + k_3 \cdot F(\mathit{power}, c)$
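A weighted-sum objective function can be sketched in Python, for illustration only. The slide does not define F; this sketch assumes F(metric, c) is the normalized amount by which an estimated metric exceeds its constraint c, and the dictionary-based interface is hypothetical.

```python
def violation(value, constraint):
    """Assumed form of F(metric, c): normalized excess over the constraint."""
    return max(0.0, (value - constraint) / constraint)


def objfct(estimates, constraints, weights):
    """Weighted sum k1*F(area, c) + k2*F(delay, c) + k3*F(power, c)."""
    return sum(weights[m] * violation(estimates[m], constraints[m])
               for m in weights)


# Example: area exceeds its constraint by 20%, power by 50%, delay is met.
example_cost = objfct(
    estimates={"area": 120.0, "delay": 90.0, "power": 150.0},
    constraints={"area": 100.0, "delay": 100.0, "power": 100.0},
    weights={"area": 1.0, "delay": 2.0, "power": 4.0},
)
```

With this choice of F a partition that meets every constraint has cost zero, and the weights express which violations the designer cares about most.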

Basic partitioning issues (cont.)

• Algorithms: control strategies seeking the best partition
    Constructive: creates a partition
    Iterative: improves a partition
    Key is to escape local minima

[Figure: cost plotted against number of moves, with a point A marked on the curve.]

Typical partitioning-system configuration

[Figure: block diagram of a partitioning system. Blocks: user interface, input, model, algorithms, estimators, objective function, design feedback, output.]

Basic partitioning algorithms

• Clustering and multi-stage clustering [Joh67, LT91]
• Group migration (a.k.a. min-cut or Kernighan/Lin) [KL70, FM82]
• Ratio cut [KC91]
• Simulated annealing [KGV83]
• Genetic evolution
• Integer linear programming

Hierarchical clustering

• Constructive algorithm using closeness metrics
• Overview
    Groups closest objects
    Recomputes closenesses
    Repeats until termination condition met
• Cluster tree maintains history of merges
    Cutline across the tree defines a partition

Hierarchical clustering algorithm

    /* Initialize each object as a group */
    for each o_i loop
        p_i = o_i
        P = P ∪ p_i
    end loop

    /* Compute closenesses between objects */
    for each p_i loop
        for each p_j loop
            c_i,j = ComputeCloseness(p_i, p_j)
        end loop
    end loop

    /* Merge closest objects and recompute closenesses */
    while not Terminate(P) loop
        p_i, p_j = FindClosestObjects(P, C)
        P = P − p_i − p_j ∪ p_ij
        for each p_k loop
            c_ij,k = ComputeCloseness(p_ij, p_k)
        end loop
    end loop
    return P
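The clustering loop can be sketched in Python, for illustration only. This sketch assumes the closeness of a merged group to another group is the average of the two old closenesses, and that the termination condition is a target number of groups; both are assumptions, as are the function name and the closeness values in the example.

```python
from itertools import combinations


def hierarchical_clustering(objects, closeness, num_groups):
    """Merge the closest pair of groups until num_groups remain.

    closeness maps frozenset({a, b}) of object names to a number.
    """
    groups = [frozenset([o]) for o in objects]
    C = {}
    for a, b in combinations(groups, 2):
        (x,), (y,) = a, b
        C[frozenset([a, b])] = closeness.get(frozenset([x, y]), 0.0)
    while len(groups) > max(1, num_groups):
        pair = max(C, key=C.get)                  # FindClosestObjects
        a, b = tuple(pair)
        merged = a | b
        groups = [g for g in groups if g not in pair]
        del C[pair]
        for g in groups:                          # recompute closenesses
            old = C.pop(frozenset([a, g])) + C.pop(frozenset([b, g]))
            C[frozenset([merged, g])] = old / 2   # assumed: average
        groups.append(merged)
    return groups


example_closeness = {
    frozenset(["o1", "o2"]): 30, frozenset(["o1", "o3"]): 15,
    frozenset(["o1", "o4"]): 10, frozenset(["o2", "o3"]): 25,
    frozenset(["o2", "o4"]): 10, frozenset(["o3", "o4"]): 18,
}
example_groups = hierarchical_clustering(
    ["o1", "o2", "o3", "o4"], example_closeness, num_groups=2)
```

In this example o1 and o2 merge first (closeness 30); the merged group's closeness to o3 becomes Avg(15, 25) = 20, so o3 joins next and o4 stays alone.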

Hierarchical clustering example

[Figure, panels (a) to (d): four objects o1..o4 connected by edges weighted with closeness values (30, 25, 20, 15 and 10 among the surviving labels). Each panel merges the closest pair and recomputes closenesses by averaging, e.g. Avg(10,10) = 10 and Avg(15,25) = 20, and shows the growing cluster tree over o1 o2 o3 o4.]

Simulated annealing

• Iterative algorithm modeled after physical annealing process
• Overview
    Starts with initial partition and temperature
    Slowly decreases temperature
    For each temperature, generates random moves
    Accepts any move that improves cost
    Accepts some bad moves, less likely at low temperatures
• Results and complexity depend on temperature decrease rate

Simulated annealing algorithm

    temp = initial temperature
    cost = Objfct(P)
    while not Frozen loop
        while not Equilibrium loop
            P_tentative = Move(P)
            cost_tentative = Objfct(P_tentative)
            Δcost = cost_tentative − cost
            if (Accept(Δcost, temp) > Random(0, 1)) then
                P = P_tentative
                cost = cost_tentative
            end if
        end loop
        temp = DecreaseTemp(temp)
    end loop

where: $\mathit{Accept}(\Delta cost, temp) = \min(1, e^{-\Delta cost / temp})$
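The annealing loop can be sketched in Python, for illustration only. The two-way partition encoding, the cut-plus-imbalance objective, the geometric cooling schedule and all parameter values are assumptions of this sketch, not part of the slides.

```python
import math
import random


def accept(delta_cost, temp):
    """min(1, e^(-delta_cost/temp)); improving moves are always accepted."""
    if delta_cost <= 0:
        return 1.0          # also avoids overflow in exp for large gains
    return math.exp(-delta_cost / temp)


def cut_cost(side, edges):
    """Assumed objective: cut edges plus imbalance between the two sides."""
    cut = sum(1 for a, b in edges if side[a] != side[b])
    imbalance = abs(2 * sum(side.values()) - len(side))
    return cut + imbalance


def flip_move(side, rng):
    """Move one randomly chosen object to the other group."""
    new_side = dict(side)
    node = rng.choice(sorted(new_side))
    new_side[node] = 1 - new_side[node]
    return new_side


def anneal(p, objfct, move, rng, temp=5.0, frozen=0.05,
           cooling=0.9, moves_per_temp=30):
    cost = objfct(p)
    while temp > frozen:                         # not Frozen
        for _ in range(moves_per_temp):          # not Equilibrium
            p_tent = move(p, rng)
            cost_tent = objfct(p_tent)
            if accept(cost_tent - cost, temp) > rng.random():
                p, cost = p_tent, cost_tent
        temp *= cooling                          # DecreaseTemp
    return p, cost


EDGES = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
start = {n: n % 2 for n in range(1, 7)}
final, final_cost = anneal(start, lambda s: cut_cost(s, EDGES),
                           flip_move, random.Random(0))
```

The cooling factor and the number of moves per temperature trade run time against result quality, which is the dependence on the temperature decrease rate noted earlier.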

Functional partitioning for hardware: BUD

• Goal: incorporate area/time into synthesis [MK90]
• Clusters CDFG operations into datapath modules
• Closeness metrics:
    Interconnecting wires
    Concurrency
    Shared hardware
• Each clustering corresponds to an allocation/scheduling
• Selects clustering with best area/time

BUD example

Code fragment (bitwidths = 4):

    x := a + b;
    if (a = b)
        c := ((x - y) < z);

[Figure, panels (a) to (c): the fragment, its control/dataflow graph from start to finish with the operation nodes and a cond branch, and the closeness values between operations (.2, .7, 0 and .38 among the surviving labels).]

BUD example (cont.)

(a) [Figure: cluster tree over the operations; merged closenesses are recomputed by averaging, e.g. Avg(.38, 0) = .19. A fourth operator (the one applied to x and y in the code fragment) lost its symbol in this figure and in the cluster labels below.]

(b)
Clusters       Chip area A   Expected cycle time T   Objfct = A x T
+=<            17.5          36                      630
+, =<          15.8          26                      411
+, =, <        13.8          26                      359 (best)
+, , =, <      16.4          26                      426

(c) [Figure: chip containing a controller and the 3 clusters of the best solution.]

Functional partitioning for hardware: Aparty

• Extends BUD clustering to multiple stages [LT91]
    Different closeness metrics for each stage
• Closeness metrics:
    Control transfer reduction
    Data transfer reduction
    Hardware sharing

Aparty example

[Figure, panels (a) to (c): multi-stage clustering of objects o1..o4. (a) cluster tree over o1 o2 o3 o4 in which o1 and o2 form o12; (b) closeness values between o12, o3 and o4 for the next stage (17, 21 and 23 among the surviving labels); (c) cluster tree over o12, o3 and o4.]

Hardware/software partitioning

• Combined hardware/software systems are common
• Software is cheap, modifiable, and quick to design
• Hardware is fast
• Special algorithms are needed to favor software
• Proposed algorithms
    Greedy [GD92]
    Hill climbing [EHB94]
    Binary-constraint search with hill climbing [VGG93]

Functional partitioning for systems: Vulcan, Cosyma

• Vulcan I [GD90]
    Partitions CDFG operations among hardware only
    Group migration and simulated annealing algorithms
• Vulcan II [GD93]
    Partitions operations among hardware/software
    Architecture: processor, hardware, memory, bus
    All communication through memory
    Uses greedy algorithm, extracts behaviors from hardware
• Cosyma [EHB94]
    Partitions statement blocks among hardware/software
    Architecture: processor, hardware, memory, bus
    Simulated annealing, extracts behaviors from software

Functional partitioning for systems: SpecSyn

• Solves three partitioning problems
    Behaviors to processors/ASICs
    Variables to memories
    Communication channels to buses
• Uses fast incremental-update estimators
• Covers both hardware and hardware/software partitioning [GVN94, VG92]

Exploring tradeoffs with functional partitioning

• Each line represents a different vendor's chip set
• Each point represents an allocation and partition
• Many designs quickly examined

[Figure: performance (microseconds, 0 to 1200) plotted against cost (dollars, 0 to 140) for chipset1, chipset2 and chipset3, with design points A, B and C marked.]

Summary

• Partitioning heavily influences design quality
• Functional partitioning is necessary
• Executable specification enables:
    Automation
    Exploration
    Documentation
• Variety of algorithms exist
• Variety of techniques exist for different applications

Future directions

• Metrics from real design to guide partitioning
• Comparison of functional partitioning algorithms
• Impact of metric selections and orderings
• Impact of granularity on partition quality
• Exploitation of regularity in partitioning

Estimation

• Estimates allow
    Evaluation of design quality
    Design space exploration
• Design model
    Represents degree of design detail computed
    Simple vs. complex models
• Issues for estimation
    Accuracy
    Speed
    Fidelity

Outline

• Accuracy versus speed
• Fidelity
• Quality metrics
    Performance metrics
    Hardware and software cost metrics

Accuracy vs. Speed

• Accuracy: difference between estimated and actual value
    $A = 1 - \frac{|E(D) - M(D)|}{M(D)}$, where $E(D)$ is the estimated and $M(D)$ the measured value of a metric for design $D$
• Speed: computation time for obtaining estimate

[Figure: estimation error and computation time plotted against design-model detail, from a simple model to the actual design.]

Fidelity

• Estimates must predict quality metrics for different design alternatives
• Fidelity: % of correct predictions for pairs of design implementations
• Higher fidelity ⇒ correct decisions based on estimates

[Figure: estimated and measured metric values for design points A, B and C.]

    (A, B): E(A) > E(B), M(A) < M(B)   incorrect prediction
    (B, C): E(B) < E(C), M(B) > M(C)   incorrect prediction
    (A, C): E(A) < E(C), M(A) < M(C)   correct prediction

    Fidelity = 33%
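Fidelity as a percentage of correctly ordered design pairs can be sketched in Python, for illustration only. The numeric estimate and measurement values below are made up to reproduce the orderings of A, B and C in the example; the function name is an assumption.

```python
from itertools import combinations


def fidelity(estimated, measured):
    """Percentage of design pairs whose estimated order matches the measured order."""
    def sign(x):
        return (x > 0) - (x < 0)

    pairs = list(combinations(sorted(estimated), 2))
    correct = sum(
        1 for a, b in pairs
        if sign(estimated[a] - estimated[b]) == sign(measured[a] - measured[b])
    )
    return 100.0 * correct / len(pairs)


# Hypothetical values with E(A) > E(B), E(B) < E(C), E(A) < E(C)
# and M(A) < M(B), M(B) > M(C), M(A) < M(C).
E = {"A": 2.0, "B": 1.0, "C": 3.0}
M = {"A": 1.0, "B": 3.0, "C": 2.0}
example_fidelity = fidelity(E, M)
```

Only the pair (A, C) is ordered the same way by estimate and measurement, so one of three pairs is correct.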

Quality metrics

• Performance metrics
    Clock cycle, control steps, execution time, communication rates
• Cost metrics
    Hardware: manufacturing cost (area), packaging cost (pins)
    Software: program size, data memory size
• Other metrics
    Power, testability, design time, time to market

Hardware design model

[Figure: design model with a control unit and a datapath.
 Control unit: state register, control logic, next-state logic, control register, status register (fed by status bits).
 Datapath: registers and register files (R1, R2, RF), muxes, functional units (FU), and a memory with registers AR and DR.]

Clock cycle estimation

• Clock cycle determines:
    Resources, execution time
• Determining clock cycle
    Designer specified [PK89, MK90]
    Maximum delay of any functional unit [PPM86, JMP88]
    Clock utilization [NG92]

[Figure: one dataflow graph (inputs i1..i6, outputs o1 and o2, node delays of 150 ns and 80 ns) scheduled under three clock cycles.]

Clock cycle   Exec. time   Resources
380 ns        380 ns       2 x, 4 +
150 ns        600 ns       1 x, 1 +
80 ns         400 ns       1 x, 1 +

Clock slack and utilization

• Slack: portion of clock cycle for which FU is idle
    $\mathit{slack}(clk, t_i) = \left( \left\lceil \frac{\mathit{delay}(t_i)}{clk} \right\rceil \times clk \right) - \mathit{delay}(t_i)$
• Average slack: FU slack averaged over all operations
    $\mathit{ave\_slack}(clk) = \frac{\sum_{i}^{T} [\mathit{occur}(t_i) \times \mathit{slack}(clk, t_i)]}{\sum_{i}^{T} \mathit{occur}(t_i)}$
• Clock utilization: % of clock cycle utilized for computations
    $\mathit{utilization}(clk) = 1 - \frac{\mathit{ave\_slack}(clk)}{clk}$

Clock utilization

[Figure: number of operations plotted against functional-unit delay (time axis in ns, marked at 50, 100 and 150), with the 1 x CLK, 2 x CLK and 3 x CLK boundaries for Clock = 65 ns and the slack of each operation type shaded.]

    occur(x) = 6, occur(-) = 2, occur(+) = 2

    $\mathit{ave\_slack}(65\ \text{ns}) = \frac{6 \times 32 + 2 \times 9 + 2 \times 17}{6 + 2 + 2} = 24.4\ \text{ns}$
    $\mathit{utilization}(65\ \text{ns}) = 1 - \frac{24.4}{65.0} = 62\%$

Slack minimization algorithm

Clock Slack Minimization [NG92]

    Compute range: clkmax, clkmin
    Compute occurrences: occur(t_i)
    max_utilization = 0

    /* Examine each clock cycle in range */
    for clkmin ≤ clk ≤ clkmax loop
        for all operation types t_i ∈ T loop
            Compute slack: slack(clk, t_i)
        end loop
        Compute average slack: ave_slack(clk)
        Compute utilization: utilization(clk)
        /* If highest utilization */
        if utilization(clk) > max_utilization then
            max_utilization = utilization(clk)
            max_utilization_clk = clk
        end if
    end loop

    clk(SM) = max_utilization_clk
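The slack, utilization and clock search can be sketched in Python, for illustration only. The functional-unit delays below (163, 56 and 48 ns) are assumptions chosen so that a 65 ns clock gives slacks of 32, 9 and 17 ns; the integer clock range and the function names are also assumptions.

```python
import math


def slack(clk, delay):
    """Idle part of the last clock cycle an operation occupies."""
    return math.ceil(delay / clk) * clk - delay


def ave_slack(clk, delays, occur):
    total = sum(occur[t] * slack(clk, delays[t]) for t in delays)
    return total / sum(occur[t] for t in delays)


def utilization(clk, delays, occur):
    return 1 - ave_slack(clk, delays, occur) / clk


def best_clock(clk_min, clk_max, delays, occur):
    """Examine each integer clock in range and keep the best utilization."""
    max_utilization, max_utilization_clk = 0.0, None
    for clk in range(clk_min, clk_max + 1):
        u = utilization(clk, delays, occur)
        if u > max_utilization:
            max_utilization, max_utilization_clk = u, clk
    return max_utilization_clk, max_utilization


DELAYS = {"mult": 163, "sub": 56, "add": 48}    # assumed delays in ns
OCCUR = {"mult": 6, "sub": 2, "add": 2}
clk_sm, util_sm = best_clock(40, 165, DELAYS, OCCUR)
```

With these assumed delays a 65 ns clock yields an average slack of 24.4 ns, and searching 40 to 165 ns selects a 56 ns clock at roughly 92% utilization. The lower bound of the range matters, since very short clocks drive slack toward zero.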

Execution time vs. clock utilization

• Second-order differential equation example
• Clock with highest utilization results in better execution times

[Figure: two plots against utilization (%, 0 to 100), one of clock cycle (ns) and one of execution time (ns). The marked point is a 56 ns clock at 92% utilization, with an execution time of 560 ns.]

Control steps estimation

• Operations in the specification are assigned to control steps
• Number of control steps determines:
    Execution time of design
    Complexity of control unit
• Scheduling
    Granularity is operations in a dataflow graph
    Computationally expensive

Operator-use method

• Granularity is statements in specification
• Faster than scheduling, average error 13%

Available functional units:

    t_i     num(t_i)   clocks(t_i)
    add     1          1
    mult    2          4
    sub     1          1

Specification:

    u1 := u * dx;
    u2 := 5 * w;
    u3 := 3 * y;
    y1 := i * dx;
    w := w + dx;
    u4 := u1 * u2;
    u5 := dx * u3;
    y := y + y1;
    u6 := u - u4;
    u := u6 - u5;

Control steps per macronode (maximum over operator types):

    n1: u1 := u * dx; u2 := 5 * w; u3 := 3 * y; y1 := i * dx; w := w + dx
        add: (1/1)*1 = 1, mult: (4/2)*4 = 8, max(1, 8) = 8
    n2: u4 := u1 * u2; u5 := dx * u3; y := y + y1
        add: (1/1)*1 = 1, mult: (2/2)*4 = 4, max(1, 4) = 4
    n3: u6 := u - u4
        sub: (1/1)*1 = 1, max(1) = 1
    n4: u := u6 - u5
        sub: (1/1)*1 = 1, max(1) = 1

Estimated total control steps = 8 + 4 + 1 + 1 = 14
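The per-macronode calculation can be sketched in Python, for illustration only. The sketch assumes the estimate for a macronode is the maximum over operator types of ceil(uses / num(t)) x clocks(t), as the worked numbers suggest; the function names and the list encoding of macronodes are hypothetical.

```python
import math

NUM = {"add": 1, "mult": 2, "sub": 1}       # available units per type
CLOCKS = {"add": 1, "mult": 4, "sub": 1}    # clocks per operation


def macronode_csteps(ops, num=NUM, clocks=CLOCKS):
    """Control steps for one macronode, given the operator type of each statement."""
    uses = {}
    for t in ops:
        uses[t] = uses.get(t, 0) + 1
    return max(math.ceil(uses[t] / num[t]) * clocks[t] for t in uses)


def total_csteps(macronodes):
    return sum(macronode_csteps(ops) for ops in macronodes)


MACRONODES = [
    ["mult", "mult", "mult", "mult", "add"],   # n1
    ["mult", "mult", "add"],                   # n2
    ["sub"],                                   # n3
    ["sub"],                                   # n4
]
estimated = total_csteps(MACRONODES)
```

Only operator counts are needed, not a schedule, which is why the method is faster than scheduling.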

Branching in behaviors

• Control steps may be shared across exclusive branches
    Sharing schedule: fewer states, status register
    Non-sharing schedule: more states, no status registers

[Figure: (a) basic blocks B1 to B4 containing operations o1..o8, with two exclusive branches; (b) sharing schedule with six states s1..s6, in which o3/o6 share s3 and o4/o7 share s4; (c) non-sharing schedule with eight states s1..s8, in which o6 and o7 get their own states s6 and s7.]

Execution time estimation

• Average start-to-finish time of behavior
• Straight-line code behaviors
    $\mathit{exectime}(B) = \mathit{csteps}(B) \times clk$
• Behavior with branching
    Estimate execution time for each basic block
    Create control flow graph from basic blocks
    Determine branching probabilities
    Formulate equations for node frequencies
    Solve set of equations
    $\mathit{exectime}(B) = \sum_{b_i \in B} \mathit{exectime}(b_i) \times \mathit{freq}(b_i)$

Probability-based flow analysis

Behavior:

    A := A + 1;
    for I in 1 to 10 loop
        B := B + 1;
        C := C - A;
        if (D > A) then
            D := D + 2;
        else
            D := D + 3;
        end if;
        E := D * 2;
    end loop;
    B := B * A;
    C := 3;

[Figure: control flow graph of the behavior.
 v1: A := A + 1
 v2: B := B + 1; C := C - A
 v3: D := D + 2 (branch D > A, probability 0.5)
 v4: D := D + 3 (branch D <= A, probability 0.5)
 v5: E := D * 2
 v6: B := B * A; C := 3
 Edges: e12, e23, e24, e35, e45, e52 (I <= 10, probability 0.9), e56 (I > 10, probability 0.1).]

Probability-based flow analysis

• Flow equations:
    $\mathit{freq}(S) = 1.0$
    $\mathit{freq}(v_1) = 1.0 \times \mathit{freq}(S)$
    $\mathit{freq}(v_2) = 1.0 \times \mathit{freq}(v_1) + 0.9 \times \mathit{freq}(v_5)$
    $\mathit{freq}(v_3) = 0.5 \times \mathit{freq}(v_2)$
    $\mathit{freq}(v_4) = 0.5 \times \mathit{freq}(v_2)$
    $\mathit{freq}(v_5) = 1.0 \times \mathit{freq}(v_3) + 1.0 \times \mathit{freq}(v_4)$
    $\mathit{freq}(v_6) = 0.1 \times \mathit{freq}(v_5)$
• Node execution frequencies:
    freq(v1) = 1.0     freq(v2) = 10.0
    freq(v3) = 5.0     freq(v4) = 5.0
    freq(v5) = 10.0    freq(v6) = 1.0
• Can be used to estimate number of accesses to variables, channels or procedures
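The flow equations form a small linear system, which can be solved as in the Python sketch below, given for illustration only. The Gauss-Jordan solver and the edge-dictionary encoding are choices made for this sketch; the slides do not say how the equations are solved.

```python
def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def node_frequencies(nodes, edges, start):
    """Solve freq(v) = [v is start] + sum over edges (u, v) of prob * freq(u)."""
    idx = {v: k for k, v in enumerate(nodes)}
    n = len(nodes)
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [0.0] * n
    b[idx[start]] = 1.0                     # freq(S) = 1.0 enters the start node
    for (u, v), prob in edges.items():
        A[idx[v]][idx[u]] -= prob
    return dict(zip(nodes, solve(A, b)))


NODES = ["v1", "v2", "v3", "v4", "v5", "v6"]
EDGES = {
    ("v1", "v2"): 1.0, ("v2", "v3"): 0.5, ("v2", "v4"): 0.5,
    ("v3", "v5"): 1.0, ("v4", "v5"): 1.0,
    ("v5", "v2"): 0.9, ("v5", "v6"): 0.1,
}
freq = node_frequencies(NODES, EDGES, "v1")
```

The loop-back edge with probability 0.9 is what makes the loop body execute ten times per activation of the behavior.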

Communication rates

[Figure: bits sent over channel C plotted against time (ns), from 0 to 1000 ns.]

• Average channel rate
    Rate of data transfer over lifetime of behavior
    $\mathit{averate}(C) = \frac{56\ \text{bits}}{1000\ \text{ns}} = 56\ \text{Mb/s}$
• Peak channel rate
    Rate of data transfer of single message
    $\mathit{peakrate}(C) = \frac{8\ \text{bits}}{100\ \text{ns}} = 80\ \text{Mb/s}$

Communication rate estimation

• Total behavior execution time consists of
    Computation time, $\mathit{comptime}(P)$, obtained from flow analysis
    Communication time, $\mathit{commtime}(P, C) = \mathit{access}(P, C) \times \mathit{delay}(C)$
• Total bits transferred by the channel
    $\mathit{total\_bits}(P, C) = \mathit{access}(P, C) \times \mathit{bits}(C)$
• Channel average rate
    $\mathit{averate}(C) = \frac{\mathit{total\_bits}(P, C)}{\mathit{comptime}(P) + \mathit{commtime}(P, C)}$
• Channel peak rate
    $\mathit{peakrate}(C) = \frac{\mathit{bits}(C)}{\mathit{protocol\_delay}(C)}$
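The rate formulas can be sketched in Python, for illustration only. The example numbers are assumptions picked to reproduce the 56 Mb/s average and 80 Mb/s peak quoted earlier: 7 accesses of 8 bits, a 100 ns transfer delay and 300 ns of computation. The function names and the choice of ns and Mb/s as units are also assumptions.

```python
def commtime(accesses, delay_ns):
    """Communication time: number of channel accesses times delay per access."""
    return accesses * delay_ns


def total_bits(accesses, bits):
    return accesses * bits


def averate_mbps(accesses, bits, delay_ns, comptime_ns):
    """Average rate in Mb/s (1 bit/ns equals 1000 Mb/s)."""
    lifetime_ns = comptime_ns + commtime(accesses, delay_ns)
    return total_bits(accesses, bits) / lifetime_ns * 1000.0


def peakrate_mbps(bits, protocol_delay_ns):
    """Peak rate in Mb/s for a single message."""
    return bits / protocol_delay_ns * 1000.0


average = averate_mbps(accesses=7, bits=8, delay_ns=100, comptime_ns=300)
peak = peakrate_mbps(bits=8, protocol_delay_ns=100)
```

The average rate falls as computation time grows, while the peak rate depends only on the message size and the protocol delay.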

Area estimation

• Two tasks:
    Determining number and type of components required
    Estimating component size for a specific technology (FSMD, gate arrays, etc.)
• Behavior implemented as an FSMD (finite state machine with datapath)
    Datapath components: registers, functional units, multiplexers/buses
    Control unit: state register, control logic, next-state logic
• We will discuss
    Datapath component estimation
    Control unit estimation
    Layout area for a custom implementation

Clique-partitioning

 Commonly used for determining datapath components


 Let G = (V, E) be a graph, where V is the set of vertices and E the set of edges
 A clique is a complete subgraph of G
 Clique-partitioning
divides the vertices into a minimal number of cliques
each vertex belongs to exactly one clique

 One heuristic: maximum number of common neighbors [CS86]


Two nodes with maximum number of common neighbors are merged
Edges to two nodes replaced by edges to merged node
Process repeated till no more nodes can be merged
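
The common-neighbor heuristic described above can be sketched as follows. This is an illustrative implementation, not necessarily the exact procedure of [CS86]: ties are broken by vertex name, the merged node keeps only the neighbors common to both merged nodes, and the example edge set is an assumption chosen to be consistent with the cliques shown on the next slide.

```python
def clique_partition(vertices, edges):
    """Greedy clique partitioning: repeatedly merge the adjacent pair of
    (super)nodes that has the most common neighbors."""
    adj = {frozenset([v]): set() for v in vertices}
    for a, b in edges:
        adj[frozenset([a])].add(frozenset([b]))
        adj[frozenset([b])].add(frozenset([a]))
    while True:
        # All adjacent pairs, in a deterministic (name-sorted) order.
        pairs = sorted(
            ((u, w) for u in adj for w in adj[u] if sorted(u) < sorted(w)),
            key=lambda p: (sorted(p[0]), sorted(p[1])),
        )
        if not pairs:
            break
        u, w = max(pairs, key=lambda p: len(adj[p[0]] & adj[p[1]]))
        common = adj[u] & adj[w]
        for node in (u, w):               # remove both nodes and their edges
            for nb in adj.pop(node):
                adj[nb].discard(node)
        merged = u | w
        adj[merged] = set(common)         # merged node keeps common neighbors only
        for nb in common:
            adj[nb].add(merged)
    return sorted(sorted(clique) for clique in adj)

# Assumed example graph on five vertices.
edges = [("v1", "v3"), ("v1", "v4"), ("v3", "v4"),
         ("v2", "v5"), ("v2", "v3"), ("v4", "v5")]
cliques = clique_partition(["v1", "v2", "v3", "v4", "v5"], edges)
```

Keeping only common neighbors guarantees that every supernode remains a clique of the original graph, since a later merge is possible only with a node adjacent to all members.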


Clique-partitioning
[Figure: clique-partitioning example on a graph with vertices v1 to v5. At each step a table lists the number of common neighbors for every edge, and the selected pair is merged into a supernode (v13, v134, v25).]

Cliques: {v1, v3, v4}, {v2, v5}

Storage-unit estimation

 Variables not used concurrently may be mapped to the same storage unit


 To use clique-partitioning, construct a graph where
Each variable is represented by a vertex
Variables with non-overlapping lifetimes have an edge between their vertices
[Figure: lifetime chart for variables v1 to v11 and the corresponding graph, in which edges join variables with non-overlapping lifetimes]

Cliques            Storage unit
{v2, v3}           R1
{v6, v7, v9}       R2
{v4, v5, v8}       R3
{v10, v11}         R4
{v1}               R5
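
As a rough illustration of the graph construction, the sketch below derives the edges from variable lifetimes. The lifetimes of v1 to v11 are not recoverable from the slide, so the four variables and their half-open intervals here are invented, and `compatibility_edges` is a hypothetical helper name.

```python
from itertools import combinations

def compatibility_edges(lifetimes):
    """Return pairs of variables whose half-open lifetimes [start, end)
    do not overlap; such variables may share a storage unit."""
    edges = []
    for (a, (sa, ea)), (b, (sb, eb)) in combinations(sorted(lifetimes.items()), 2):
        if ea <= sb or eb <= sa:
            edges.append((a, b))
    return edges

# Assumed lifetimes in control steps (not from the source).
lifetimes = {"a": (0, 3), "b": (3, 5), "c": (1, 4), "d": (5, 8)}
edges = compatibility_edges(lifetimes)
```

The resulting edge list is the input to a clique-partitioning step; each clique then corresponds to one register.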


Functional-unit and interconnect-unit estimation

 Clique-partitioning can be applied


 For determining the number of FUs required, construct a graph where
Each operation in behavior represented by a vertex
Edge connects two vertices if
Corresponding operations assigned different control steps
There exists an FU that can implement both operations

 For determining the number of interconnect units, construct a graph where


Each connection between two units is represented by a vertex
Edge connects two vertices if the corresponding connections are not used in the same control step


Computing datapath area

[Figure: bit-sliced datapath layout showing bit slices from LSB to MSB, datapath components, control lines, and the routing channel, annotated with the dimensions L_bit, H_cell, and H_rt]

 Bit-sliced datapath

$L_{bit} = \alpha \times tr(DP)$

$H_{rt} = \frac{nets}{nets\_per\_track} \times \beta$

$area(bit) = L_{bit} \times (H_{cell} + H_{rt})$

$area(DP) = bitwidth(DP) \times area(bit)$

Pin estimation

 Number of wires at a behavior's boundary depends on


Global data
Ports accessed
Communication channels used
Procedure calls
variable N : integer;
variable X : bit_vector(15 downto 0);
procedure SUM(A, B, OUT) is
begin
  ....
end SUM;

process Main (ch1, ch2)
  out channel ch1;
  in channel ch2;
{
  send (ch1, N);
  portF <= portG + 4;
  ............
  receive (ch2, Result);
}

process Factorial (ch1, ch2)
  in channel ch1;
  out channel ch2;
{
  receive (ch1, M);
  /* compute factorial */
  ................
  send (ch2, result);
}

[Figure: processes Main and Factorial connected by channels ch1 and ch2; Main also accesses ports portF and portG]

Software estimation models

[Figure: two software estimation models.
Processor-specific model: the specification is compiled separately to 8086, 68000, and MIPS instructions; a dedicated estimator per processor (8086, 68000, MIPS) combines these with that processor's instruction timing and size information to produce software metrics.
Generic model: the specification is compiled once to generic instructions; a single estimator combines them with technology files for the target processors (8086, 68000, and MIPS instruction timing and size information) to produce software metrics.]

Deriving processor technology files

Generic instruction: dmem3 = dmem1 + dmem2

8086 instructions
instruction                       clocks       bytes
mov ax, word ptr[bp+offset1]      (10)         3
add ax, word ptr[bp+offset2]      (9 + EA1)    4
mov word ptr[bp+offset3], ax      (10)         3

68020 instructions
instruction                       clocks       bytes
mov a6@(offset1), d0              (7)          2
add a6@(offset2), d0              (2 + EA2)    2
mov d0, a6@(offset3)              (5)          2

Technology file for 8086
generic instruction               execution time   size
...
dmem3 = dmem1 + dmem2             35 clocks        10 bytes
...

Technology file for 68020
generic instruction               execution time   size
...
dmem3 = dmem1 + dmem2             22 clocks        6 bytes
...
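
A technology-file entry is the sum of the clocks and bytes of the target instructions that implement one generic instruction. The sketch below reproduces the two entries above; the effective-address terms are not given on the slide, so EA1 = 6 and EA2 = 8 are assumptions inferred from the stated totals (35 and 22 clocks), and `technology_entry` is a hypothetical helper.

```python
def technology_entry(instructions):
    """Sum (clocks, bytes) over the target instructions of one generic instruction."""
    clocks = sum(c for _, c, _ in instructions)
    size = sum(b for _, _, b in instructions)
    return clocks, size

EA1 = 6  # assumed: 35 - (10 + 9 + 10)
EA2 = 8  # assumed: 22 - (7 + 2 + 5)

i8086 = [("mov ax, word ptr[bp+offset1]", 10, 3),
         ("add ax, word ptr[bp+offset2]", 9 + EA1, 4),
         ("mov word ptr[bp+offset3], ax", 10, 3)]
m68020 = [("mov a6@(offset1), d0", 7, 2),
          ("add a6@(offset2), d0", 2 + EA2, 2),
          ("mov d0, a6@(offset3)", 5, 2)]

entry_8086 = technology_entry(i8086)
entry_68020 = technology_entry(m68020)
```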


Software estimation

 Program execution time


Create basic blocks and compile into generic instructions
Estimate execution time of basic blocks
Perform probability-based flow analysis
Compute execution time of the entire behavior:
$exectime(B) = \delta \times \left( \sum_{b_i \in B} exectime(b_i) \times freq(b_i) \right)$
$\delta$ accounts for compiler optimizations

 Program memory size
$progsize(B) = \sum_{g \in G} instr\_size(g)$

 Data memory size
$datasize(B) = \sum_{d \in D} datasize(d)$
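
The three software metrics can be sketched in a few lines. All numbers below are invented for illustration: the block frequencies reuse the flow-analysis result (1, 10, 5, 5, 10, 1), while the per-block clock counts, the optimization factor of 0.8, and the instruction and data sizes are assumptions.

```python
def exec_time(blocks, delta=1.0):
    """exectime(B) = delta * sum(exectime(b_i) * freq(b_i)) over basic blocks."""
    return delta * sum(clocks * freq for clocks, freq in blocks)

def prog_size(instr_sizes):
    """progsize(B) = sum of the sizes of all generic instructions in B."""
    return sum(instr_sizes)

def data_size(decl_sizes):
    """datasize(B) = sum of the sizes of all data declarations in B."""
    return sum(decl_sizes)

# (clocks per execution, execution frequency) for six assumed basic blocks.
blocks = [(20, 1), (5, 10), (12, 5), (8, 5), (6, 10), (15, 1)]
total_clocks = exec_time(blocks, delta=0.8)   # about 196 clocks
code_bytes = prog_size([10, 6, 4, 10])        # assumed instruction sizes
data_bytes = data_size([2, 2, 128])           # assumed variable sizes
```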


Summary and future directions

 We described methods for estimating:


Performance metrics: clock, control steps, execution time, communication rates
Cost metrics: design area, pins, program and data memory size

 Future directions:
Incorporating synthesis/compilation optimizations
New metrics for testability, power, integration cost, etc.
New architectural features for the estimation model


Refinement

 Functional objects are grouped and mapped to system components


Functional objects: variables, behaviors, and channels
System components: memories, chips or processors, and buses

 Refinement is the update of the specification to reflect the mapping


 Need for refinement
Makes the specification consistent
Enables simulation of the specification
Generates input for synthesis, compilation, and verification tools


Outline

 Refining variable groups


 Channel refinement
 Resolving access conflicts
 Refining incompatible interfaces


Refining variable groups

 Group of variables mapped to a memory


 Variable folding:
Implementing each variable in a memory with a fixed word size

 Memory address translation


Assignment of addresses to each variable in group
Update references to variable by accesses to memory


Variable folding
variable A : bit_vector( 3 downto 0) ;
variable B : bit_vector(15 downto 0) ;
variable C : bit_vector(11 downto 0) ;
variable D : bit_vector(11 downto 0) ;

[Figure: the four variables folded into an 8-bit memory as the slices A(3 downto 0), B(7 downto 0), B(15 downto 8), C(7 downto 0), C(11 downto 8), D(5 downto 0), and D(11 downto 6), together with the 12-bit read paths that reassemble variable C and variable D from memory]

Memory address translation


Original specification

variable J, K : integer := 0;
variable V : IntArray (63 downto 0);
....
V(K) := 3;
X := V(36);
V(J) := X;
....
for J in 0 to 63 loop
  SUM := SUM + V(J);
end loop;
....

Assigning addresses to V: V(63 downto 0) is mapped to MEM(163 downto 100)

Refined specification

variable J, K : integer := 0;
variable MEM : IntArray (255 downto 0);
....
MEM(K + 100) := 3;
X := MEM(136);
MEM(J + 100) := X;
....
for J in 0 to 63 loop
  SUM := SUM + MEM(J + 100);
end loop;
....

Refined specification without offsets for index J

variable J : integer := 100;
variable K : integer := 0;
variable MEM : IntArray (255 downto 0);
....
MEM(K + 100) := 3;
X := MEM(136);
MEM(J) := X;
....
for J in 100 to 163 loop
  SUM := SUM + MEM(J);
end loop;
....

Refining channel groups

 Channels are virtual entities over which messages are transferred


 Bus is a physical medium that implements groups of channels
 Bus consists of:
wires representing data and control lines
protocol defining sequence of assignments to data and control lines

 Two refinement tasks


Bus generation: determining buswidth, i.e. number of data lines
Protocol generation: specifying mechanism of transfer over bus


Characterizing communication channels

 For a given behavior P that sends data over channel C ,

Message size, bits(C): number of bits in each message
Accesses, accesses(P, C): number of times P transfers data over C
Average rate, averate(C): rate of data transfer of C over lifetime of behavior
Peak rate, peakrate(C): rate of transfer of single message

[Figure: channel X carries three messages X1, X2, X3 within a 400 ns interval, with time marked at t = 0, 100, 200, 300, and 400 ns]

$bits(C) = 8\ \text{bits}$
$averate(C) = \frac{24\ \text{bits}}{400\ \text{ns}} = 60\ \text{Mbits/s}$
$peakrate(C) = \frac{8\ \text{bits}}{100\ \text{ns}} = 80\ \text{Mbits/s}$

Characterizing buses

 For a given bus B ,

Buswidth, buswidth(B): number of data lines in B
Protocol delay, protdelay(B): delay for single message transfer over bus
Average rate, averate(B): rate of data transfer over B over lifetime of system
Peak rate, peakrate(B): maximum rate of transfer of data on bus

$peakrate(B) = \frac{buswidth(B)}{protdelay(B)}$


Determining bus rates

 Idle slots of a channel used for messages of other channels


 To ensure that channel average rates are unaffected by the bus:
$averate(B) \ge \sum_{C \in B} averate(C)$

 Goal: to synthesize a bus that constantly transfers data, i.e.
$peakrate(B) = \sum_{C \in B} averate(C)$

[Figure: channels X and Y merged onto bus B over a 4 s interval. Channel X sends two 8-bit messages (X1, X2); channel Y sends three 16-bit messages (Y1, Y2, Y3).]

Average rate
channel X: (2 x 8 bits) / 4 s = 4 bits/s
channel Y: (3 x 16 bits) / 4 s = 12 bits/s
bus B: (4 + 12) bits/s = 16 bits/s

Constraints for bus generation

 Buswidth: affects number of pins on chip boundaries


 Channel average rates: affect execution time of behaviors
 Channel peak rates: affect time required for single message transfer

[Figure: channel X sends two 16-bit messages, X1 and X2, over a 4 s interval, so averate(X) = 8 bits/s. Two implementations of bus B are compared: an 8-bit bus with averate(B) = 8 bits/s and peakrate(B) = 8 bits/s, and a 16-bit bus with averate(B) = 8 bits/s and peakrate(B) = 16 bits/s.]

Bus generation algorithm [NG94]


/* Determine range of buswidths */
minwidth = 1, maxwidth = Max(bits(C))
mincost = infinity, mincostwidth = infinity

for currwidth in minwidth to maxwidth loop
  /* compute bus peak rate */
  peakrate(B) = currwidth / protdelay(B)

  /* compute sum of channel average rates */
  averatesum = 0;
  for all channels C in B loop
    averate(C) = (access(P, C) * bits(C)) / (comptime(P) + commtime(P))
    averatesum = averatesum + averate(C);
  end loop

  if (peakrate(B) > averatesum) then
    /* feasible solution, determine minimal cost */
    currcost = ComputeCost(currwidth)
    if (currcost < mincost) then
      mincost = currcost, mincostwidth = currwidth
    end if
  end if
end loop
return(mincostwidth)


Bus generation algorithm

 Compute buswidth range:

minwidth = 1, maxwidth = Max(bits(C))

 For minwidth <= currwidth <= maxwidth loop

Compute bus peak rate:
$peakrate(B) = \frac{currwidth}{protdelay(B)}$

Compute channel average rates:
$commtime(P) = access(P, C) \times \left[ \left\lceil \frac{bits(C)}{currwidth} \right\rceil \times protdelay(B) \right]$
$averate(C) = \frac{access(P, C) \times bits(C)}{comptime(P) + commtime(P)}$

if $peakrate(B) \ge \sum_{C \in B} averate(C)$ then
  if bestcost > ComputeCost(currwidth) then
    bestcost = ComputeCost(currwidth)
    bestwidth = currwidth


Bus generation example

 Two behaviors accessing 16-bit data over two channels
 Constraints specified for channel peak rates

[Figure: cost function value (about -1000 to 9000) plotted against buswidth (0 to 24), separating infeasible from feasible implementations and marking the selected buswidth]

Performance vs. buswidth tradeoffs

 Allows a buswidth to be selected, given performance constraints

e.g. behavior P1 has a performance constraint of 2500 clocks:
buswidths of 4 or greater must be selected

[Figure: behavior execution time in clocks (0 to 7000) plotted against buswidth in pins (0 to 24)]

Protocol generation

 Bus consists of several sets of wires:


Data lines, used for transferring message bits
Control lines, used for synchronization between behaviors
ID lines, used for identifying the channel active on the bus

 All channels mapped to bus share these lines


 Number of data lines determined by bus generation algorithm
 Protocol generation consists of six steps


Protocol generation
1. Protocol selection: full handshake, half-handshake etc.
2. ID assignment: N channels require log2(N ) ID lines

behavior P
  variable AD;
begin
  .....
  X <= 32 ;
  .....
  MEM(AD) := X + 7;
  .....
end ;

behavior Q
  variable COUNT;
begin
  .....
  MEM(60) := COUNT ;
  .....
end ;

variable X : bit_vector(15 downto 0) ;
variable MEM : bit_vector(63 downto 0, 15 downto 0);

Channel IDs on bus B:
CH0 "00" (P writes X)
CH1 "01" (P reads X)
CH2 "10" (P writes MEM)
CH3 "11" (Q writes MEM)
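
ID assignment can be sketched as below. The helper name `assign_ids` is hypothetical; it rounds log2(N) up to a whole number of ID lines, which is an assumption for channel counts that are not powers of two.

```python
def assign_ids(channels):
    """Give each channel a unique binary ID using ceil(log2(N)) ID lines."""
    n_lines = (len(channels) - 1).bit_length()   # ceil(log2(N)) for N >= 1
    if n_lines == 0:
        return 0, {ch: "" for ch in channels}    # a single channel needs no ID
    ids = {ch: format(i, f"0{n_lines}b") for i, ch in enumerate(channels)}
    return n_lines, ids

n_lines, ids = assign_ids(["CH0", "CH1", "CH2", "CH3"])
```

For the four channels of this example the result is two ID lines, matching the two-bit ID field of the bus.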


Protocol generation
3. Bus structure definition

type HandShakeBus is record
  START, DONE : bit ;
  ID : bit_vector(1 downto 0) ;
  DATA : bit_vector(7 downto 0) ;
end record ;
signal B : HandShakeBus ;

4. Bus protocol definition

procedure ReceiveCH0( rxdata : out bit_vector) is
begin
  for J in 1 to 2 loop
    wait until (B.START = 1) and (B.ID = "00") ;
    rxdata (8*J-1 downto 8*(J-1)) <= B.DATA ;
    B.DONE <= 1 ;
    wait until (B.START = 0) ;
    B.DONE <= 0 ;
  end loop;
end ReceiveCH0;

procedure SendCH0( txdata : in bit_vector) is
begin
  B.ID <= "00" ;
  for J in 1 to 2 loop
    B.DATA <= txdata(8*J-1 downto 8*(J-1)) ;
    B.START <= 1 ;
    wait until (B.DONE = 1) ;
    B.START <= 0 ;
    wait until (B.DONE = 0) ;
  end loop;
end SendCH0;

Protocol generation

5. Update variable references

process P
  variable AD, Xtemp;
begin
  .....
  SendCH0(32) ;
  .....
  ReceiveCH1(Xtemp);
  SendCH2(AD, Xtemp+7);
  .....
end ;

process Q
  variable COUNT;
begin
  .....
  SendCH3(60, COUNT);
  .....
end ;

6. Generate behaviors for variables

process Xproc
  variable X ;
begin
  wait on B.ID;
  if (B.ID="00") then
    receiveCH0(X);
  elsif (B.ID="01" ) then
    sendCH1(X);
  end if;
end;

process MEMproc
  variable MEM: array(0 to 63);
begin
  wait on B.ID;
  if (B.ID="10") then
    receiveCH2(MEM);
  elsif (B.ID="11" ) then
    receiveCH3(MEM);
  end if;
end;

[Figure: processes P, Q, Xproc, and MEMproc connected to bus B]

Resolving access conflicts

 System partitioning may result in concurrent accesses to a resource


Channels mapped to a bus may attempt data transfer simultaneously
Variables mapped to a memory may be accessed by behaviors simultaneously

 Arbiter needs to be generated to resolve such access conflicts


 Three tasks
Arbitration model selection
Arbitration scheme selection
Arbiter generation


Arbitration models
[Figure: two arbitration models, Static and Dynamic. In both, behaviors P, Q, and R access memory MEM through a MemArbiter with two addr/data ports (port1 and port2), using req/grant signals. The static panel shows two req/grant pairs; the dynamic panel shows one req/grant pair for each of the three behaviors.]

Arbiter generation

 Example of bus arbitration

Two behaviors accessing a single resource, bus B


Behavior P assigned higher priority than Q
Fixed priority implemented with two handshake signals Req and Grant

process B_arbiter
begin
  wait until (Req_P = 1) or (Req_Q = 1);
  if (Req_P = 1) then
    Grant_P <= 1;
    wait until (Req_P = 0);
    Grant_P <= 0;
  elsif (Req_Q = 1) then
    Grant_Q <= 1;
    wait until (Req_Q = 0);
    Grant_Q <= 0;
  end if;
end process;

process P
  variable AD, Xtemp;
begin
  .....
  Req_P <= 1;
  wait until (Grant_P = 1);
  SendCH0(32) ;
  Req_P <= 0;
  .....
end process ;

process Q
  variable COUNT;
begin
  .....
  Req_Q <= 1;
  wait until (Grant_Q = 1);
  SendCH3(60, COUNT);
  Req_Q <= 0;
  .....
end process;

process Xproc
  variable X ;
begin
  wait on B.ID;
  if (B.ID="00") then
    receiveCH0(X);
  elsif (B.ID="01" ) then
    sendCH1(X);
  end if;
end process;

process MEMproc
  variable MEM: array(0 to 63);
begin
  wait on B.ID;
  if (B.ID="10") then
    receiveCH2(MEM);
  elsif (B.ID="11" ) then
    receiveCH3(MEM);
  end if;
end process;

[Figure: processes P and Q connected to B_arbiter through Req_P/Grant_P and Req_Q/Grant_Q, and to Xproc and MEMproc through bus B]
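
The behavior of the fixed-priority arbiter can be approximated with a step-by-step simulation. This is a sketch at a coarser granularity than the process above: each list entry is one snapshot of the request lines, the grant is held until the owner drops its request (as in the `wait until (Req_P = 0)` statements), and the request trace is invented.

```python
def simulate_arbiter(request_trace, priority):
    """Fixed-priority, non-preemptive arbitration over a sequence of
    request snapshots. Returns the granted behavior (or None) per step."""
    owner = None
    grants = []
    for requests in request_trace:
        if owner is not None and not requests.get(owner, False):
            owner = None                      # owner released the resource
        if owner is None:
            owner = next((b for b in priority if requests.get(b, False)), None)
        grants.append(owner)
    return grants

# Assumed request trace; P has higher priority than Q.
trace = [{"P": 0, "Q": 1}, {"P": 1, "Q": 1}, {"P": 1, "Q": 0},
         {"P": 0, "Q": 0}, {"P": 1, "Q": 1}]
grants = simulate_arbiter(trace, priority=["P", "Q"])
```

In the second step Q keeps the bus even though P is requesting, because the arbiter only re-evaluates priorities after the current owner withdraws its request.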


Effect of binding on interfaces


[Figure: three cases of behaviors A and B communicating over channel X with protocols Pa and Pb, depending on whether each behavior is bound to a custom or a standard component: custom/custom, standard/custom, and standard/standard. In the standard/standard case an interface process is placed between Pa and Pb.]

Protocol operations

 Protocols usually consist of five atomic operations


waiting for an event on input control line
assigning value to output control line
reading value from input data port
assigning value to output data port
waiting for fixed time interval

 Protocol operations may be specified in one of three ways


Finite state machines (FSMs)
Timing diagrams
Hardware description languages (HDLs)


Protocol specification: FSMs

 Protocol operations ordered by sequencing between states
 Constraints between events may be specified using timing arcs
 Conditional and repetitive event sequences require extra states and transitions

Protocol Pa (states a1, a2, a3)
start -> a1: ADDRp <= AddrVar(7 downto 0); ARDYp <= 1;
a1 -> a2 on (ARCVp = 1): ADDRp <= AddrVar(15 downto 8); DREQp <= 1;
a2 -> a3 on (DRDYp = 1): DataVar <= DATAp

Protocol Pb (states b1, b2, b3)
start -> b1
transition on (RDp = 1): MAddrVar := MADDRp
timing arc (100 ns): MDATAp <= MemVar (MAddrVar)

Protocol specification: Timing diagrams

 Advantages:
Ease of comprehension, representation of timing constraints

 Disadvantages:
Lack of action language, not simulatable
Difficult to specify conditional and repetitive event sequences

[Figure: timing diagrams. Protocol Pa: ARDYp, ADDRp (7..0, then 15..8), ARCVp, DREQp, DRDYp, and DATAp (15..0). Protocol Pb: RDp, MADDRp (15..0), and MDATAp (15..0), with a 100 ns timing constraint.]

Protocol specification: HDLs

 Advantages:
Functionality can be verified by simulation
Easy to specify conditional and repetitive event sequences

 Disadvantages:
Cumbersome to represent timing constraints between events

Protocol Pa

port ADDRp : out bit_vector(7 downto 0);
port DATAp : in bit_vector(15 downto 0);
port ARDYp : out bit;
port ARCVp : in bit;
port DREQp : out bit;
port DRDYp : in bit;

ADDRp <= AddrVar(7 downto 0);
ARDYp <= 1;
wait until (ARCVp = 1 );
ADDRp <= AddrVar(15 downto 8);
DREQp <= 1;
wait until (DRDYp = 1);
DataVar <= DATAp;

Protocol Pb

port MADDRp : in bit_vector(15 downto 0);
port MDATAp : out bit_vector(15 downto 0);
port RDp : in bit;

wait until (RDp = 1);
MAddrVar := MADDRp ;
wait for 100 ns;
MDATAp <= MemVar (MAddrVar);

[Figure: port diagram of the two protocols. Pa side: ADDRp, DATAp (16), ARDYp, ARCVp, DREQp, DRDYp. Pb side: RDp, MADDRp (16), MDATAp (16).]

Interface process generation

 Input: HDL description of two fixed but incompatible protocols


 Output: HDL process that translates one protocol to the other,
i.e. responds to their control signals and sequences their data transfers

 Four steps required for generating the interface process (IP):


Creating relations
Partitioning relations into groups
Generating interface process statements
Interconnect optimization


IP generation: creating relations

 Protocol represented as an ordered set of relations


 Relations are sequences of events/actions
Protocol Pa

ADDRp <= AddrVar(7 downto 0);
ARDYp <= 1;
wait until (ARCVp = 1 );
ADDRp <= AddrVar(15 downto 8);
DREQp <= 1;
wait until (DRDYp = 1);
DataVar <= DATAp;

Relations

A1 [ (true) :
     ADDRp <= AddrVar(7 downto 0)
     ARDYp <= 1 ]
A2 [ (ARCVp = 1) :
     ADDRp <= AddrVar(15 downto 8)
     DREQp <= 1 ]
A3 [ (DRDYp = 1) :
     DataVar <= DATAp ]
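
The relations above can be derived mechanically by starting a new relation at every `wait until` statement. The sketch below is an assumption about how such a step might look, not the authors' algorithm; `create_relations` is a hypothetical name, and the function only handles protocols that, like Pa, begin with an action and use `wait until` conditions.

```python
def create_relations(statements, prefix="A"):
    """Split a straight-line protocol into relations (condition, actions).
    A new relation begins at each 'wait until (...)' statement."""
    relations = []
    condition, actions = "true", []
    for stmt in statements:
        stmt = stmt.strip().rstrip(";").strip()
        if stmt.startswith("wait until"):
            relations.append((condition, actions))
            condition = stmt[len("wait until"):].strip().strip("()").strip()
            actions = []
        else:
            actions.append(stmt)
    relations.append((condition, actions))
    return {f"{prefix}{i}": rel for i, rel in enumerate(relations, 1)}

protocol_pa = [
    "ADDRp <= AddrVar(7 downto 0);",
    "ARDYp <= 1;",
    "wait until (ARCVp = 1 );",
    "ADDRp <= AddrVar(15 downto 8);",
    "DREQp <= 1;",
    "wait until (DRDYp = 1);",
    "DataVar <= DATAp;",
]
relations = create_relations(protocol_pa)
```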


IP generation: partitioning relations

 Partition the set of relations from both protocols into groups.


 Group represents a unit of data transfer
Protocol Pa          Protocol Pb
A1 (8 bits out)      B1 (16 bits in)
A2 (8 bits out)      B2 (16 bits out)
A3 (16 bits in)

G1 = ( A1 A2 B1 )
G2 = ( B2 A3 )

IP generation: inverting protocol operations

 For each operation in a group, add its dual to interface process


 Dual of an operation represents the complementary operation
 Temporary variable may be required to hold data values
Atomic operation        Dual operation
wait until (Cp = 1)     Cp <= 1
Cp <= 1                 wait until (Cp = 1)
var <= Dp               Dp <= TempVar
Dp <= var               TempVar := Dp
wait for 100 ns         wait for 100 ns

Interface Process

/* (group G1) */
wait until (ARDYp = 1);
TempVar1(7 downto 0) := ADDRp ;
ARCVp <= 1 ;
wait until (DREQp = 1);
TempVar1(15 downto 8) := ADDRp ;
RDp <= 1 ;
MADDRp <= TempVar1;
/* (group G2) */
wait for 100 ns;
TempVar2 := MDATAp ;
DRDYp <= 1 ;
DATAp <= TempVar2 ;

[Figure: interface process between the Pa ports (ADDRp 8, DATAp 16, ARDYp, ARCVp, DREQp, DRDYp) and the Pb ports (MADDRp 16, MDATAp 16, RDp)]
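
The dual-operation table lends itself to a small rewrite function. The sketch below is illustrative only: the function name `dual`, the regular expressions, and the explicit set of data-port names are assumptions, and it produces one dual per operation without the slice bookkeeping (for example `TempVar1(7 downto 0)`) that the full interface process needs.

```python
import re

def dual(op, data_ports, temp="TempVar"):
    """Return the interface-process dual of one atomic protocol operation."""
    op = op.strip().rstrip(";").strip()
    m = re.fullmatch(r"wait until\s*\(\s*(\w+)\s*=\s*1\s*\)", op)
    if m:                                   # wait on control line -> assert it
        return f"{m.group(1)} <= 1"
    if op.startswith("wait for"):           # fixed delays are kept unchanged
        return op
    m = re.fullmatch(r"(\w+)\s*<=\s*1", op)
    if m:                                   # assert control line -> wait on it
        return f"wait until ({m.group(1)} = 1)"
    m = re.fullmatch(r"(\w+)\s*<=\s*(.+)", op)
    if m and m.group(1) in data_ports:      # protocol writes port -> IP reads it
        return f"{temp} := {m.group(1)}"
    if m and m.group(2) in data_ports:      # protocol reads port -> IP drives it
        return f"{m.group(2)} <= {temp}"
    raise ValueError(f"unrecognized operation: {op}")

ports = {"ADDRp", "DATAp", "MADDRp", "MDATAp"}
duals = [dual(s, ports) for s in [
    "ADDRp <= AddrVar(7 downto 0);",
    "ARDYp <= 1;",
    "wait until (ARCVp = 1 );",
    "DataVar <= DATAp;",
    "wait for 100 ns;",
]]
```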


IP generation: interconnect optimization

 Certain ports of both protocols may be directly connected


 Advantages:
Bypassing interface process reduces interconnect cost
Operations related to these ports can be eliminated from interface process
Interface Process

wait until (ARDYp = 1);
TempVar1(7 downto 0) := ADDRp ;
ARCVp <= 1 ;
wait until (DREQp = 1);
TempVar1(15 downto 8) := ADDRp ;
RDp <= 1 ;
MADDRp <= TempVar1;
wait for 100 ns;
DRDYp <= 1 ;

[Figure: the interface process connects ADDRp (8), ARDYp, ARCVp, DREQp, and DRDYp to MADDRp (16) and RDp, while the 16-bit MDATAp port is wired directly to DATAp]

Transducer synthesis [BK87]

 Input: Timing diagram description of two fixed protocols


 Output: Logic circuit description of transducer
 Steps for generating logic circuit from timing diagrams:
Create event graphs for both protocols
Connect graphs based on data dependencies or explicitly specified ordering
Add templates for each output node in combined graph
Merge and connect templates
Satisfy min/max timing constraints
Optimize skeletal circuit


Generating event graphs from timing diagrams


e.g. FIFO stack control cell
[Figure: FIFO stack control cell with signals Ri, Ro, Ai, Ao and internal signal L, its timing diagram, and the event graph derived from the timing diagram]

Deriving skeletal circuit from event graph

[Figure: skeletal circuit derived from the event graph, built from set/reset (S, R, Q) elements driven by combinations of the signals Ri, Ro, Ai, Ao, and L]

 Advantages:
Synthesizes logic for transducer circuit directly
Accounts for min/max timing constraints between events

 Disadvantages:
Cannot interface protocols with different data port sizes
Transducer not simulatable with timing diagram description of protocols


Hardware/Software interface refinement

[Figure: (a) Partitioned specification: a software partition and a hardware partition containing behaviors B1 to B4, variables v1 to v4, signals s1 and s2, and ports p1 to p3, with data accesses between them. (b) Mapping to architecture: the same objects mapped onto a processor, a memory, a buffer, and an ASIC.]

Tasks of hardware/software interfacing

 Data access (e.g., behavior accessing variable) refinement


 Control access (e.g., behavior starting behavior) refinement
 Select bus to satisfy data transfer rate and reduce interfacing cost
 Interface software/hardware components to standard buses
 Schedule software behaviors to satisfy data input/output rate
 Distribute variables to reduce ASIC cost and satisfy performance


Summary and future directions

 In this section, we described:


Refinement of variable groups: variable folding, address translation
Refinement of channel groups: bus and protocol generation
Resolution of access conflicts: arbiter generation
Refinement of incompatible interfaces: IP generation, transducer synthesis

 Future work should address the following issues:


Effects of bus arbitration delays on performance of a behavior
Developing metrics to guide selection of protocols and arbitration schemes
Efficient synthesis of arbiter and interface processes


Methodology

 Past design effort focused on lower levels


 Higher levels lack well-defined methodology and tools
 Paradigm shift to higher levels can increase productivity
 Need methodology and tools for system level


Outline

 Basic concepts in design methodology


 Example
 A design methodology
 A generic synthesis system
 Conceptualization environment


Items a design methodology must specify

 Syntax and semantics of input and output


 Algorithms for transforming input to output
 Components to be used in the design implementation
 Definition and ranges of constraints
 Mechanism for selection of architectural styles
 Control strategies (scenarios or scripts)


Example: Interactive TV processor

[Figure: block diagram of the InteractiveTvProcessor. Blocks: two analog subsystems and a digital subsystem, connected to a keypad receiver IC and a main computer. Internal signals: audio_in, audio_out, video_in, video_out, av_cmd. External signals: video, audio + commands, audio, video, button.]

Example's dataflow behavior

[Figure: dataflow of the digital subsystem.
Behaviors: StoreAudio, GenerateAudio, StoreGenerateVideo, OverlayCharacters, StoreAVCmd, ProcessAVCmd, ProcessMainCmds, ProcessRemoteButtons.
Variables: audio1[100k][8], audio2[100k][8], video[500k][8], av_cmd[8], fonts[128][16][16], screen_chars[30][30][8].
External signals: audio_in, audio_out, video_in, video_out, av_cmd, main_cmds, button.]

Example's implementation after system design

[Figure: digital subsystem after system design, built from Memory1, Memory2, Memory3, ASIC1, ASIC2, and a Processor.
Variables placed in the memories: audio1[100k][8], audio2[100k][8], video[500k][8], fonts[128][16][16], screen_chars[30][30][8], av_cmd[8].
Behaviors placed on the ASICs: StoreAudio, GenerateAudio, StoreGenerateVideo, StoreAVCmd.
Behaviors placed on the Processor: ProcessAVCmd, ProcessRemoteButtons, ProcessMainCmds, OverlayCharacters.
External signals: audio_in, audio_out, video_in, video_out, av_cmd, main_cmds, button.]

An example design methodology


[Figure: current practice compared with the proposed methodology, in three stages.
Functionality specification: natural language in current practice; a functional specification in an executable language in the proposed methodology.
System design: manual in current practice; allocation, partitioning, and refinement in the proposed methodology, producing functional specifications for the processor and each ASIC, variables for the memory, and a bus.
Component implementation: C code for the processor, RTL structure for each ASIC, a memory-mapped address space for the memory, and a detailed bus protocol.]

System-design tasks

                     System-design tasks
Functional objects   Allocation    Partitioning               Refinement
Variables            Memories      Variables to memories      Address assignment
Behaviors            Processors    Behaviors to processors    Interfacing
Channels             Buses         Channels to buses          Arbitration/protocols

One possible ordering of tasks

1. Functionality specification
Specification

2. System design
Memory allocation
Variable-to-memory partitioning
Bus allocation
Channel-to-bus partitioning
ASIC/processor allocation
Behavior-to-ASIC/processor partitioning
Interface synthesis
Arbiter synthesis

3. Component implementation
Implement software
Implement hardware

Generic synthesis system requirements

 Completeness
All levels of design, all implementation styles

 Extensibility
Allow addition of new algorithms and tools

 Controllability
User control of tools, design-quality feedback

 Interactivity
Partial design, design modification

 Upgradability
Evolve to describe-and-synthesize method


A generic synthesis system


[Figure: a generic synthesis system. The designer works on the system specification through the conceptualization environment. System synthesis, ASIC synthesis, software synthesis (compilation), logic/sequential synthesis, and physical design synthesis operate on shared intermediate forms and the SDB and CDB databases, supported by description generators and a verification/simulation suite. Outputs: assembly code and an ASIC description to manufacturing.]

A generic system-synthesis tool


[Figure: block diagram of a generic system-synthesis tool. Input: System behavioral specification. Labeled elements: Compiler; SR; Allocator; Transformer; Partitioner; Estimators; Interface & arbitration synthesis. Output: System-module behavioral specifications, passed to software synthesis and to chip synthesis.]


A generic chip-synthesis tool


[Figure: block diagram of a generic chip-synthesis tool. Input: Behavioral description. Labeled elements: Compiler; CDFG; Scheduler; Component selector; Storage binder; Functional unit binder; Interconnection binder; Module selector; Technology mapper; Microarchitecture optimizer; CDB; Logic/Sequential synthesis. Output: To physical design.]


A generic logic-synthesis tool


[Figure: block diagram of a generic logic-synthesis tool. Inputs: State tables; Boolean expressions; Timing diagrams; Memory specifications. Labeled steps: State minimization; State encoding; Timing graph compiler; Interface synthesis; Memory synthesis; Logic minimization; Technology mapping. Output: Physical design.]


Conceptualization environment

 Tool is only effective if the designer can use it


Understandable display of data
Highlight design parts that need attention

 Must support many design avenues


A system-synthesis tool interface

Module      Type    $         Mappings               Execution time  Area         Pins   Instr
System              105/100*
ASIC1       X100    30                                               16000/20000  46/60
                              CaptureAudio           100/110
                              GenerateAudio          100/110
ASIC2       X100    30                                               18000/20000  48/60
                              CaptureGenerateVideo   100/110
                              CaptureAVCmd           100/110
Memory1     V1000   10
                              audio_array1
                              audio_array2
Memory2     V1000   10
                              video_array
Processor1  Y900    25                                                                   6000/5000*
                              ProcessRemoteButtons
                              ProcessMiscCmds

Cost: 5.43

Options: Allocation, Partition, Estimates, Constraints
Controls: View options, Partition/Allocate, Refine
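The System figure of 105 in this display appears to be the sum of the per-component figures 30, 30, 10, 10 and 25 listed beside the five allocated components. The short Python check below works through that arithmetic. It assumes those unlabeled figures are component costs and that 100 is the system cost constraint; the dictionary name component_cost is hypothetical. It does not attempt to reproduce the separate "Cost: 5.43" value, whose formula is not given on the slide.

```python
# Per-component figures as shown next to each allocated component
# (assumed here to be component costs).
component_cost = {
    "ASIC1": 30,
    "ASIC2": 30,
    "Memory1": 10,
    "Memory2": 10,
    "Processor1": 25,
}

system_cost = sum(component_cost.values())
system_constraint = 100  # from the "105/100" System entry

print(f"System: {system_cost}/{system_constraint}")
print("constraint violated" if system_cost > system_constraint else "within constraint")
```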


An optional design view

Quality metric                          Estimate/Constraint   Violation?
$(System)                               105/100               *
Execution-time(CaptureAudio)            100/110
Execution-time(GenerateAudio)           100/110
Execution-time(CaptureGenerateVideo)    100/110
Execution-time(CaptureAVCmd)            100/110
Area(ASIC1)                             16000/20000
Area(ASIC2)                             18000/20000
Pins(ASIC1)                             56/60
Pins(ASIC2)                             58/60
Instr(Processor1)                       6000/5000             *
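The violation column of this view can be derived mechanically from the estimate/constraint pairs. The Python sketch below shows one way to do so, assuming every constraint is an upper bound, so that a violation means the estimate exceeds the constraint. The function name violations and the tuple keys are hypothetical; the numbers are those listed in this view.

```python
# (metric, object) -> (estimate, constraint), as listed in the design view.
metrics = {
    ("$", "System"): (105, 100),
    ("Execution time", "CaptureAudio"): (100, 110),
    ("Execution time", "GenerateAudio"): (100, 110),
    ("Execution time", "CaptureGenerateVideo"): (100, 110),
    ("Execution time", "CaptureAVCmd"): (100, 110),
    ("Area", "ASIC1"): (16000, 20000),
    ("Area", "ASIC2"): (18000, 20000),
    ("Pins", "ASIC1"): (56, 60),
    ("Pins", "ASIC2"): (58, 60),
    ("Instr", "Processor1"): (6000, 5000),
}


def violations(metrics):
    """Return the entries whose estimate exceeds its constraint.

    Assumption: all constraints are upper bounds.
    """
    return [key for key, (estimate, constraint) in metrics.items()
            if estimate > constraint]


for metric, obj in violations(metrics):
    estimate, constraint = metrics[(metric, obj)]
    print(f"{metric}({obj}): {estimate}/{constraint} violates its constraint")
```

A view like this lets the designer see at a glance which allocation or partitioning decisions need to be revisited.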


Summary

 Three-step design methodology


Functionality specification
System design
Component implementation

 Major tasks in system design


Allocation
Partitioning
Refinement

 Generic synthesis tool


 Conceptualization environment
Crucial to practical use


Future directions

 Advanced estimation methods


 Formal verification
 Testability
 Frameworks and databases
 Regularity exploiting
 System-level transformations
 Feedback incorporation


References
[BHS91] F. Belina, D. Hogrefe, and A. Sarma. SDL with Applications from Protocol Speci cations. Prentice Hall, 1991.
[BK87] G. Borriello and R.H. Katz. \Synthesis and optimization of interface transducer logic,". In Proceedings of the International
Conference on Computer-Aided Design, 1987.
[CS86] C.Tseng and D.P. Siewiorek. \Automated synthesis of datapaths in digital systems,". IEEE Transactions on Computer-Aided
Design, pages 379{395, July 1986.
[EHB94] R. Ernst, J. Henkel, and T. Benner. \Hardware-software cosynthesis for microcontrollers,". In IEEE Design & Test of Computers, pages 64{75, December 1994.
[FM82] C.M. Fiduccia and R.M. Mattheyses. \A linear-time heuristic for improving network partitions,". In Proceedings of the Design
Automation Conference, 1982.
[GD90] R. Gupta and G. DeMicheli. \Partitioning of functional models of synchronous digital systems,". In Proceedings of the International Conference on Computer-Aided Design, pages 216{219, 1990.
[GD92] R. Gupta and G. DeMicheli. \System-level synthesis using re-programmable components,". In Proceedings of the European
Conference on Design Automation (EDAC), pages 2{7, 1992.
[GD93] R. Gupta and G. DeMicheli. \Hardware-software cosynthesis for digital systems,". In IEEE Design & Test of Computers, pages
29{41, October 1993.
[GVN94] D.D. Gajski, F. Vahid, and S. Narayan. \A system-design methodology: Executable-speci cation re nement,". In Proceedings
of the European Conference on Design Automation (EDAC), 1994.
[Hal93] Nicolas Halbwachs. Synchronous Programming of Reactive Systems. Kluwer Academic Publishers, 1993.
[Hoa78] C.A.R. Hoare. \Communicating sequential processes,". Communications of the ACM, 21(8): 666{677, 1978.
[IEE88] IEEE Inc., N.Y. IEEE Standard VHDL Language Reference Manual, 1988.
[JMP88] R. Jain, M. Mlinar, and A. Parker. \Area-time model for synthesis of non-pipelined designs,". In Proceedings of the International Conference on Computer-Aided Design, 1988.
[Joh67] S.C. Johnson. \Hierarchical clustering schemes,". Psychometrika, pages 241{254, September 1967.

[KC91] Y.C. Kirkpatrick and C.K. Cheng. \Ratio cut partitioning for hierarchical designs,". IEEE Transactions on Computer-Aided
Design, 10(7): 911{921, 1991.
[KGV83] S. Kirkpatrick, C.D. Gelatt, and M. P. Vecchi. \Optimization by simulated annealing,". Science, 220(4598): 671{680, 1983.
[KL70] B.W. Kernighan and S. Lin. \An ef cient heuristic procedure for partitioning graphs,". Bell System Technical Journal, February
1970.
[LT91] E.D. Lagnese and D.E. Thomas. \Architectural partitioning for system level synthesis of integrated circuits,". IEEE Transactions
on Computer-Aided Design, July 1991.
[MK90] M.C. McFarland and T.J. Kowalski. \Incorporating bottom-up design into hardware synthesis,". IEEE Transactions on
Computer-Aided Design, September 1990.
[NG92] S. Narayan and D.D. Gajski. \System clock estimation based on clock slack minimization,". In Proceedings of the European
Design Automation Conference (EuroDAC), 1992.
[NG94] S. Narayan and D.D. Gajski. \Synthesis of system-level bus interfaces,". In Proceedings of the European Conference on
Design Automation (EDAC), 1994.
[NVG92] S. Narayan, F. Vahid, and D.D. Gajski. \System speci cation with the SpecCharts language,". In IEEE Design & Test of
Computers, Dec. 1992.
[PK89] P.G. Paulin and J.P. Knight. \Algorithms for high-level synthesis,". In IEEE Design & Test of Computers, Dec. 1989.
[PPM86] A.C. Parker, T. Pizzaro, and M. Mlinar. \MAHA: A program for datapath synthesis,". In Proceedings of the Design Automation
Conference, 1986.
[TM91] D.E. Thomas and P. Moorby. The Verilog Hardware Description Language. Kluwer Academic Publishers, 1991.
[VG92] F. Vahid and D.D. Gajski. \Speci cation partitioning for system design,". In Proceedings of the Design Automation Conference,
1992.
[VGG93] F. Vahid, J. Gong, and D.D. Gajski. \A hardware-software partitioning algorithm for minimizing hardware,". UC Irvine, Dept.
of ICS, Technical Report 93-38,1993.
