5 March 2008
FPGA-based ASIC
Design and Verification
Dejan Markovic
Electrical Engineering Department
University of California, Los Angeles
The Issues I am Going to Address
Optimization Approach
Power efficiency: circuit-level (C,V)
Performance and area: architectural techniques
Unified Simulink description
[Figure: one Simulink model of the design (matrix blocks such as U [4x4], W [4x4], r [4x4], y [4x1]) shown in three uses: optimization, hardware design, and I/O verification on FPGA]
Circuit-Level Optimization Framework
Sensitivity-based optimization
– Balance sensitivity to all variables
Reference design
– Dmin sizing @ Vddmax, Vthref
Goal: find optimal E-D tradeoff for a datapath
[Figure: Energy vs. Delay curves for topology A and topology B, with a constraints region; labels SA, (A0,B0), D0]
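The balancing idea can be illustrated with a toy model. The sketch below is not the framework from the slides: the delay and energy expressions, the variable names `w` (gate size) and `v` (supply), and the constant `VTH` are all assumptions chosen for illustration. It defines the sensitivity of a variable x as S_x = (dE/dx) / (dD/dx) and shows that, at the minimum-energy point for a fixed delay target, the two sensitivities come out equal, while at an arbitrary point on the same delay contour they do not.

```python
VTH = 0.3  # assumed normalized threshold voltage (illustrative only)


def delay(w, v):
    """Toy delay: load term (1 + 1/w) times a first-order V/(V - Vth) factor."""
    return (1.0 + 1.0 / w) * v / (v - VTH)


def energy(w, v):
    """Toy switching energy: capacitance (1 + w) times V^2."""
    return (1.0 + w) * v * v


def sensitivity(var, w, v, h=1e-6):
    """S_x = (dE/dx) / (dD/dx), estimated with central differences."""
    if var == "w":
        d_e = energy(w + h, v) - energy(w - h, v)
        d_d = delay(w + h, v) - delay(w - h, v)
    else:
        d_e = energy(w, v + h) - energy(w, v - h)
        d_d = delay(w, v + h) - delay(w, v - h)
    return d_e / d_d


def size_for_delay(v, d_target):
    """Solve delay(w, v) == d_target for w; return None if infeasible."""
    inv_w = d_target * (v - VTH) / v - 1.0  # this quantity equals 1/w
    return 1.0 / inv_w if inv_w > 0 else None


def min_energy_point(d_target, steps=100_000, v_lo=0.31, v_hi=1.2):
    """Scan the supply along the delay contour and keep the lowest energy."""
    best = None
    for i in range(steps + 1):
        v = v_lo + (v_hi - v_lo) * i / steps
        w = size_for_delay(v, d_target)
        if w is None:
            continue
        e = energy(w, v)
        if best is None or e < best[0]:
            best = (e, w, v)
    return best


if __name__ == "__main__":
    e_opt, w_opt, v_opt = min_energy_point(4.0)
    print("optimum:", e_opt, w_opt, v_opt)
    print("S_w:", sensitivity("w", w_opt, v_opt))
    print("S_v:", sensitivity("v", w_opt, v_opt))
```

In this toy model, a design on the same delay contour at v = 1.0 has a supply sensitivity several times larger in magnitude than its sizing sensitivity, which signals that lowering the supply and upsizing gates would save energy at constant delay.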
Circuit-Level Results: Tree Adder
[Figure: energy map for the tree adder, E/Eref vs. D/Dref (both axes 0 to 1.5). Annotated sensitivities: SW→∞, SVth=0.2, SVdd=1.5; SW=22, SVth=22, SVdd=16; SW=1, SVth=1, SVdd=1. Other annotations: 65%, Cin (A0,B0), optimization]
[D. Markovic, V. Stojanovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, JSSC Aug’04]
[Figure: Eop/Eref @ Vddmax vs. Top/Tref @ Vddmax, log axes. Process: L-Vt, S-Vt, H-Vt. Architecture: P2 (parallel 2), T2 (time-mux 2). Annotation: scaling]
Simulink to Silicon Mapping
MDL to RTL conversion, automated P&R flow
[Figure: flow diagram. Simulink (Fix-pt lib) → MDL → Custom tool 1 → ASIC backend; annotated with Speed, Power, Area]
Including FPGA Emulation
XSG hardware library, RTL translation scripts
[Figure: flow diagram. Simulink (Hw lib) → RTL → Custom tool 2 → FPGA backend, ASIC backend; annotated with Speed, Power, Area]
Closing the Loop: I/O Verification
I/O hardware library, automated FPGA flow
[Figure: flow diagram. Simulink (I/O lib, Hw lib) → RTL → Custom tool 3 → FPGA backend, and Custom tool 2 → ASIC backend; annotated with Speed, Power, Area]
FPGA implements ASIC logic analysis
[D. Markovic, C. Chang, B. Richards, H. So, B. Nikolic, R.W. Brodersen, CICC’07]
Design Approach
Hardware-equivalent Simulink blocks
– Add, mult, shift, mux…
● Word-size, latency
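To make "word-size, latency" concrete, here is a minimal bit-true sketch of a hardware-equivalent block in Python. It is an illustration only, not the Simulink library itself: the `Block` class, the `wrap` helper, and the choice of two's-complement wrap-around on overflow are assumptions (a real library may offer saturation or other modes).

```python
from collections import deque


def wrap(value, word_size):
    """Wrap an integer into a signed two's-complement word of word_size bits."""
    mask = (1 << word_size) - 1
    value &= mask
    if value >> (word_size - 1):      # sign bit set: value is negative
        return value - (1 << word_size)
    return value


class Block:
    """Hypothetical arithmetic block with a fixed word size and latency."""

    def __init__(self, op, word_size, latency):
        self.op = op
        self.word_size = word_size
        # One register per cycle of latency, reset to zero.
        self.pipe = deque([0] * latency)

    def step(self, a, b):
        """Advance one clock cycle and return the registered output."""
        result = wrap(self.op(a, b), self.word_size)
        if not self.pipe:             # latency 0: purely combinational
            return result
        self.pipe.append(result)
        return self.pipe.popleft()


if __name__ == "__main__":
    add8 = Block(lambda a, b: a + b, word_size=8, latency=1)
    # First output is the reset value; 100 + 28 overflows an 8-bit word.
    print([add8.step(100, 27), add8.step(100, 28), add8.step(0, 0)])
    # prints [0, 127, -128]
```

Modeling both parameters in the block itself is what lets a simulation match the hardware cycle by cycle and bit by bit.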
Block Characterization
[Figure: Latency vs. Energy for mult and add blocks, showing TClk @ VDDref and TClk @ VDDopt; curves labeled gate sizing and VDD scaling; characterized quantities: Speed, Power, Area]
Goal: balanced logic depth and E/D sensitivity
Methodology for Architecture Selection
Energy-Area-Delay space for architecture comparison
– Time-mux, parallelism, pipelining, VDD scaling, sizing…
[Figure: Energy-Area-Delay plots at block level and datapath level. Each moves from an initial design to an optimal design; moves labeled parallel, pipeline, gate sizing, VDD scaling, time-mux, intl, fold]
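A rough numerical sketch of how parallelism and time-multiplexing move a design in the energy-area-delay space, under strong simplifying assumptions: an alpha-power-law delay model with assumed constants `VTH` and `ALPHA`, energy per operation proportional to VDD squared, area proportional to the number of hardware units, and no overhead for muxes, registers, or leakage. The `slack` parameter (how much slower than the Vddmax speed the direct-mapped design is allowed to run) is also an assumption.

```python
VTH = 0.3      # assumed threshold voltage, normalized to Vddmax
ALPHA = 1.5    # assumed alpha-power-law exponent
VDD_MAX = 1.0  # supply voltages are normalized to Vddmax


def gate_delay(vdd):
    """Alpha-power-law delay in arbitrary units; falls as vdd rises."""
    return vdd / (vdd - VTH) ** ALPHA


def vdd_for_delay(target):
    """Lowest supply that meets the delay target, or None if infeasible."""
    if gate_delay(VDD_MAX) > target * (1 + 1e-12):
        return None
    lo, hi = VTH + 0.01, VDD_MAX
    for _ in range(60):               # bisection: delay is monotonic in vdd
        mid = 0.5 * (lo + hi)
        if gate_delay(mid) > target:
            lo = mid
        else:
            hi = mid
    return hi


def evaluate(factor, slack=2.0):
    """factor > 1 models parallelism; factor < 1 models time-mux (1/N)."""
    budget = factor * slack * gate_delay(VDD_MAX)  # delay allowed per unit
    vdd = vdd_for_delay(budget)
    if vdd is None:
        return None
    return {"vdd": vdd, "energy_per_op": vdd ** 2, "area": factor}


if __name__ == "__main__":
    for name, factor in [("T2", 0.5), ("direct", 1.0), ("P2", 2.0)]:
        print(name, evaluate(factor))
```

At the same throughput, the parallel design spends area to relax timing and lower the supply, while the time-multiplexed design saves area but needs a higher supply. In this model, time-multiplexing by 4 cannot meet timing even at Vddmax, so `evaluate(0.25)` returns `None`.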
Example: 4x4 SVD Algorithm
This complexity is hard to optimize in RTL
– 270 adders, 370 multipliers, 8 sqrt, 8 div
– Recursive LMS-based algorithm (nested feedback loops)
[Figure: Energy-Area-Delay plot starting from the 16b design; annotations: Interl., Fold, 13.8x, 2.6x]
Energy/Area Optimization
Step 1: Word-length optimization
[Figure: Energy-Area-Delay plot starting from the 16b design; annotations: Interl., Fold, 13.8x, 2.6x]
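One common way to frame word-length optimization is to search for the shortest fixed-point word that still meets a signal-quality target. The sketch below illustrates that framing only and is not the procedure used for the SVD design: the SQNR metric, the test signal, and the function names are assumptions.

```python
import math


def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def sqnr_db(signal, frac_bits):
    """Signal-to-quantization-noise ratio in dB for a given fraction width."""
    noise = sum((s - quantize(s, frac_bits)) ** 2 for s in signal)
    power = sum(s * s for s in signal)
    if noise == 0:
        return float("inf")
    return 10 * math.log10(power / noise)


def min_frac_bits(signal, target_db, max_bits=24):
    """Smallest number of fractional bits whose SQNR meets the target."""
    for bits in range(1, max_bits + 1):
        if sqnr_db(signal, bits) >= target_db:
            return bits
    return None


if __name__ == "__main__":
    # Assumed test signal: a sinusoid with amplitude 0.9.
    signal = [0.9 * math.sin(0.37 * k) for k in range(256)]
    for target in (40.0, 60.0):
        print(target, "dB ->", min_frac_bits(signal, target), "fractional bits")
```

Each bit removed shrinks every adder, multiplier, and register on that signal path, which is why this step comes before gate sizing and VDD scaling.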
Energy/Area Optimization
Step 2: Gate size & VDD optimization
[Figure: Energy-Area-Delay plot starting from the 16b design; annotations: Interl., Fold, 13.8x, 2.6x, and 7x after gate size & VDD optimization]
Hardware Results
Result of Energy-Area-Performance Optimization
(90nm ST Micro)
2.1 GOPS/mW
– 70 GOPS @ 100MHz
– Power = 34mW
20 GOPS/mm2
– 70 GOPS in 3.5mm2
Comparison with ISSCC chips
[Figure: area efficiency (GOPS/mm2) vs. energy efficiency (GOPS/mW), log axes; the SVD chip plotted against ISSCC chips labeled 2004 18-5, 1998 18-6, 1998 7-6, 1998 18-3, 2000 4-2, 1999 15-5, 2000 14-8, 2000 14-5]
[D. Markovic, B. Nikolic, R.W. Brodersen, JSSC Apr’07]
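The efficiency figures follow directly from the throughput, power, and area quoted on the slide. A short check (variable names are illustrative):

```python
throughput_gops = 70.0   # from the slide: 70 GOPS
power_mw = 34.0          # from the slide: 34 mW
area_mm2 = 3.5           # from the slide: 3.5 mm2
clock_mhz = 100.0        # from the slide: 100 MHz

energy_eff = throughput_gops / power_mw   # GOPS/mW
area_eff = throughput_gops / area_mm2     # GOPS/mm2
# Operations completed per clock cycle at the quoted throughput.
ops_per_cycle = throughput_gops * 1e9 / (clock_mhz * 1e6)

print(f"{energy_eff:.2f} GOPS/mW, {area_eff:.1f} GOPS/mm2, "
      f"{ops_per_cycle:.0f} ops/cycle")
```

The energy efficiency comes out slightly under 2.1 GOPS/mW before rounding, and reaching 70 GOPS at 100 MHz implies about 700 operations per clock cycle.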
[Figure: block diagram; surviving labels: ASIC, I/O, TB]
Simulink I/O Test Model for the SVD
[Figure: test setup with an ASIC board connected to an FPGA board over GPIO]
[Figure: σ2^2, σ3^2, σ4^2 (vertical axis 0 to 6) vs. Number of Symbols [k] (0 to 32)]
Resulting Simulink/SynDSP Architectures
– Reference: direct mapping
– Architecture 1: folding N = 2
– Architecture 2: folding N = 4
[Figure: 3D plot with normalized Area, Performance, and Energy axes; labels: Constraints, parallelism, time-mux, VDD scaling]
References
ASIC design and verification
– D. Markovic, V. Stojanovic, B. Nikolic, M.A. Horowitz, and R.W. Brodersen, "Methods for True Energy-Performance Optimization," IEEE J. Solid-State Circuits, vol. 39, no. 8, pp. 1282-1293, Aug. 2004.
– D. Markovic, R.W. Brodersen, and B. Nikolic, "A 70GOPS 34mW Multi-Carrier MIMO Chip in 3.5mm2," in Proc. IEEE Int'l Symp. on VLSI Circuits (VLSI'06), June 2006, pp. 196-197.
– D. Markovic, B. Nikolic, and R.W. Brodersen, "Power and Area Minimization for Multidimensional Signal Processing," IEEE J. Solid-State Circuits, vol. 42, no. 4, pp. 922-934, April 2007.
– D. Markovic, C. Chang, B. Richards, H. So, B. Nikolic, and R.W. Brodersen, "ASIC Design and Verification in an FPGA Environment," in Proc. IEEE Custom Integrated Circuits Conf. (CICC'07), Sept. 2007, pp. 737-740.
More publications available online
– www.ee.ucla.edu/~dejan
Acknowledgments
Funding support
– C2S2 Focus Center Research Program, contract 2003-CT-888
Infrastructure support
– ST Microelectronics, Xilinx (hardware)
– Synplicity, Synopsys, Cadence (software)