Transformations
Acknowledgement:
Materials in this lecture are courtesy of the following sources and are used with permission.
Curt Schurgers
J. Rabaey, A. Chandrakasan, B. Nikolic. Digital Integrated Circuits: A Design Perspective.
Prentice Hall/Pearson, 2003.
Layout 101
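[Figure: three views of a CMOS gate built from an N-channel MOSFET and a P-channel MOSFET, with terminals IN, OUT, VDD, and GND.
3-D Cross-Section: p-type substrate, n-type well, n+ and p+ diffusions, SiO2, metal/pdiff contact.
Circuit Representation: source (S), gate (G), and drain (D) of each transistor.
Layout: metal, poly, n+ diff, p+ diff, contact from metal to ndiff; channel dimensions Wn, Ln, Wp, Lp.]
L15: 6.111 Spring 2006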
Custom Design/Layout
[Figure: custom datapath layout organized in bit slices (bit slice 2 ... bit slice 63), scale bar 1000 um.
Blocks: 9-1 mux, 5-1 mux, 2-1 mux, REG, CARRYGEN, SUMGEN + LU (LU: logical unit), SUMSEL, with outputs sum and sumb going to the cache.
Rows: multiplexers, shifter, adder stages 1 to 3, sum select, wiring, loopback bus.]
Hand crafting the layout to achieve maximum clock rates (> 1 GHz)
Exploits regularity in datapath structure to optimize interconnects
Design Iteration
[Figure: design flow. Design capture (behavioral Verilog or VHDL), pre-layout simulation, logic synthesis (structural), floorplanning, placement, routing (physical), circuit extraction, post-layout simulation, tape-out.]
[Figure: delay (in ns) after placement compared with delay after routing.]
Macro Modules
256×32 (or 8192-bit) SRAM generated by a hard-macro module generator
Clock Distribution
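Clock skew
[Figure: supply-grid model. A driver (Rd, Cd) and a receiver connect to the VDD grid and the ground grid through pads, with interconnect capacitance Cint and coupling capacitance Ccoup.]
[Figure: phase-locked loop built from a PFD (up/down outputs), loop filter, VCO, and divider.]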
Scan Testing
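[Figure: scan chain. Scan flops sit between the primary inputs and the primary outputs; each has a 0/1 multiplexer controlled by ScanShift and is clocked by CLK, forming a chain from shift in to shift out.]
[Figure: timing. Load/unload cycles with ScanShift asserted alternate with normal system operation.]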
Behavioral Transformations
There are a large number of implementations of the same functionality.
Each implementation represents a different point in the area-time-power design space.
Behavioral transformations allow exploring the design space at a high level.
Optimization metrics: power, area, time
Fixed-Coefficient Multiplication
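Conventional Multiplication: $Z = X \cdot Y$ (4-bit operands, 8-bit product $Z_7 \ldots Z_0$)
Each partial product $X_i Y_j$ is added into column $Z_{i+j}$, together with the carries from the lower columns:
$Z_0$: $X_0 Y_0$
$Z_1$: $X_1 Y_0$, $X_0 Y_1$
$Z_2$: $X_2 Y_0$, $X_1 Y_1$, $X_0 Y_2$
$Z_3$: $X_3 Y_0$, $X_2 Y_1$, $X_1 Y_2$, $X_0 Y_3$
$Z_4$: $X_3 Y_1$, $X_2 Y_2$, $X_1 Y_3$
$Z_5$: $X_3 Y_2$, $X_2 Y_3$
$Z_6$: $X_3 Y_3$
$Z_7$: carry out of column $Z_6$
Fixed coefficient: $Y = (1001)_2 = 2^3 + 2^0$
Only the partial-product rows for $Y_3$ and $Y_0$ are nonzero, so $Z = (X \ll 3) + X$.
Signed digits for a run of ones:
$(0\,1\,1 \cdots 1\,1)_2 = 2^{N-2} + \cdots + 2^1 + 2^0 = 2^{N-1} - 2^0$, which is the digit string $1\,0\,0 \cdots 0\,{-1}$.
[Figure: shift-and-add example for the digit string 10010001 using shifters << 7 and << 4; some digits are marked -1.]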
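The shift-and-add idea above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function names `csd_digits` and `shift_add_multiply` are hypothetical, and the recoding shown is one common way to obtain signed digits in {-1, 0, +1} with no two adjacent nonzero digits.

```python
def csd_digits(c):
    """Recode a non-negative integer into signed digits, least significant first.

    Each digit is -1, 0, or +1, and no two adjacent digits are nonzero.
    """
    digits = []
    while c != 0:
        if c & 1:
            # c % 4 == 1 -> digit +1; c % 4 == 3 -> digit -1 (start of a run of ones)
            d = 2 - (c & 3)
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def shift_add_multiply(x, c):
    """Multiply x by the constant c using only shifts, adds, and subtracts."""
    acc = 0
    for shift, d in enumerate(csd_digits(c)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


# Y = (1001)_2 needs one adder: Z = (X << 3) + X
print(csd_digits(0b1001))   # [1, 0, 0, 1]
# A run of ones, (0111)_2, becomes (X << 3) - X
print(csd_digits(0b0111))   # [-1, 0, 0, 1]
```

In hardware each nonzero digit costs one adder or subtractor, so the number of nonzero digits is a rough proxy for area.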
Algebraic Transformations
Commutativity: A + B = B + A
Distributivity: (A + B) C = AC + BC
Associativity: (A + B) + C = A + (B + C)
Common sub-expressions
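As a rough illustration of why these identities matter for hardware, the sketch below compares operator counts and adder depth for equivalent expressions. The function names are hypothetical, and the depth formulas assume two-input adders.

```python
def expanded(a, b, c):
    # A*C + B*C: two multipliers, one adder
    return a * c + b * c


def factored(a, b, c):
    # (A + B)*C: one multiplier, one adder (distributivity)
    return (a + b) * c


def chain_depth(n):
    """Adder levels when n operands are summed as ((a + b) + c) + ..."""
    return max(n - 1, 0)


def tree_depth(n):
    """Adder levels when associativity is used to balance the sum as a tree."""
    return (n - 1).bit_length() if n > 0 else 0


print(chain_depth(8), tree_depth(8))   # 7 3
```

Both forms compute the same value; the transformation only changes how many operators are needed and how long the longest path is.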
[Figure: example of applying distributivity to a signal-flow graph.]
[Figure: signal-flow graph with delay elements (D) before and after retiming across a cutset.]
Cutset retiming: a cutset is a set of edges whose removal splits the graph into two disjoint partitions. To retime, delays are moved from the edges entering a partition to the edges leaving it, or vice versa.
Benefits of retiming:
Modify critical path delay
Reduce total number of registers
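The cutset rule can be sketched as a small graph operation. This is an illustrative sketch under assumed conventions: the graph is a dictionary mapping a directed edge `(u, v)` to its number of delays, and `cutset_retime` is a hypothetical helper name.

```python
def cutset_retime(edges, partition, k=1):
    """Move k delays from edges entering `partition` to edges leaving it.

    edges: dict mapping (source, destination) to a delay count.
    partition: set of nodes on one side of the cutset.
    A negative k moves delays in the opposite direction.
    """
    retimed = {}
    for (u, v), delays in edges.items():
        if u not in partition and v in partition:
            delays -= k          # edge enters the partition
        elif u in partition and v not in partition:
            delays += k          # edge leaves the partition
        if delays < 0:
            raise ValueError(f"edge {u}->{v} would need a negative delay")
        retimed[(u, v)] = delays
    return retimed


graph = {("in", "a"): 1, ("a", "b"): 0, ("b", "a"): 1, ("b", "out"): 0}
print(cutset_retime(graph, {"a"}))
# {('in', 'a'): 0, ('a', 'b'): 1, ('b', 'a'): 0, ('b', 'out'): 0}
```

The number of delays around the loop a -> b -> a and along the path from in to out is unchanged, which is why the retimed circuit keeps the same input-output behavior.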
[Figure: FIR filter with input x(n), output y(n), and coefficients h(0) to h(3); the annotated operator delays are (10) for a multiplier and (4) for an adder.]
Direct form: Tclk = 22 ns
Apply associativity of the addition, then retime.
Transposed form: Tclk = 14 ns
Note: here we use a first-cut analysis that assumes the delay of a chain of operators is the sum of their individual delays. This is not accurate.
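The two clock periods can be reproduced with the same first-cut model. The sketch below assumes a multiplier delay of 10 ns and an adder delay of 4 ns (read from the figure annotations) and a 4-tap filter; the function names are hypothetical. It also checks that the direct and transposed forms compute the same output.

```python
T_MULT, T_ADD = 10, 4  # ns; assumed from the (10) and (4) annotations


def direct_form_tclk(taps):
    # Critical path: one multiplier followed by a chain of (taps - 1) adders
    return T_MULT + (taps - 1) * T_ADD


def transposed_form_tclk(taps):
    # After retiming, a register follows every adder: one multiplier, one adder
    return T_MULT + (T_ADD if taps > 1 else 0)


def fir_direct(h, x):
    """Direct form: a delay line on the input, then a sum of products."""
    line = [0] * len(h)
    out = []
    for sample in x:
        line = [sample] + line[:-1]
        out.append(sum(c * t for c, t in zip(h, line)))
    return out


def fir_transposed(h, x):
    """Transposed form: registers hold partial sums between the adders."""
    state = [0] * (len(h) - 1)
    out = []
    for sample in x:
        products = [c * sample for c in h]
        out.append(products[0] + (state[0] if state else 0))
        state = [
            products[i + 1] + (state[i + 1] if i + 1 < len(state) else 0)
            for i in range(len(state))
        ]
    return out


print(direct_form_tclk(4), transposed_form_tclk(4))   # 22 14
```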
How to pipeline:
1. Add extra registers at
all inputs
2. Retime
[Figure: pipelining example; registers (D) added at the inputs are retimed into the circuit.]
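For a simple chain of operators, the effect of the two steps can be sketched by asking where the added registers should end up after retiming. The code below is a brute-force illustration, not a general retiming algorithm; `best_pipeline` and the example delays are hypothetical, and it uses the first-cut model in which stage delay is the sum of operator delays.

```python
from itertools import combinations


def best_pipeline(delays, n_regs):
    """Place n_regs registers between operators of a chain to minimize Tclk.

    delays: operator delays in chain order.
    Returns (clock_period, register_positions), where position p means
    "between operator p - 1 and operator p".
    """
    n = len(delays)
    best = None
    for cuts in combinations(range(1, n), n_regs):
        bounds = (0, *cuts, n)
        period = max(sum(delays[a:b]) for a, b in zip(bounds, bounds[1:]))
        if best is None or period < best[0]:
            best = (period, cuts)
    return best


chain = [10, 4, 4, 4]            # e.g. one multiplier followed by three adders
print(best_pipeline(chain, 0))   # (22, ())
print(best_pipeline(chain, 1))   # (12, (1,))
```

Each added register increases latency by one cycle, so the shorter clock period is paid for in latency and register count.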
Introductory Digital Systems Laboratory
[Figure: a recursive structure with input x(n), output y(n), coefficient A, and a delay in the feedback loop.]
Try pipelining this structure
[Figure: the same structure transformed step by step by distributivity, associativity, and retiming; the resulting feedback loop contains 2D and the coefficient A^2, which is precomputed.]
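The transformation sequence can be checked numerically. The sketch below assumes the starting structure is the first-order recursion y(n) = x(n) + A y(n-1), which is an assumption about the figure rather than something stated in the text. Substituting the recursion into itself and applying distributivity gives y(n) = x(n) + A x(n-1) + A^2 y(n-2), so the loop now contains two delays and a precomputed A^2. The function names are hypothetical.

```python
def recursive(a, x):
    """Original loop: y(n) = x(n) + A*y(n-1), one delay in the feedback path."""
    out = []
    y_prev = 0
    for sample in x:
        y = sample + a * y_prev
        out.append(y)
        y_prev = y
    return out


def lookahead(a, x):
    """Transformed loop: y(n) = x(n) + A*x(n-1) + A^2*y(n-2)."""
    a2 = a * a  # precomputed constant
    out = []
    for n, sample in enumerate(x):
        x1 = x[n - 1] if n >= 1 else 0
        y2 = out[n - 2] if n >= 2 else 0
        out.append(sample + a * x1 + a2 * y2)
    return out


print(recursive(3, [1, 2, 3, 4]))   # [1, 5, 18, 58]
print(lookahead(3, [1, 2, 3, 4]))   # [1, 5, 18, 58]
```

With two delays in the loop, one of them can be retimed into the multiply-add path, which the single-delay loop does not allow.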
[Figure: device cross-section labeled SOURCE, BODY, Tox, and Leff; a plot over technology generations from 10000 down to 32; path delay plotted against temperature (C).]
Delay varies due to variations in Vdd, Vt, and temperature.
[Figure: datapath block diagram with S reg, X reg, Mult1, Mult2, Mac1, Mac2, and an Add/Sub/Shift unit.]