Professional Documents
Culture Documents
HIGHER
PERFORMANCE
(2x the performance of
V-II Pro)
Virtex-4
XtremeDSP
Solution
SLICE
REDUCTION
(All designs in this
presentation use at
least 50% less slices
than V-II Pro)
MAXIMUM FLEXIBILTY
(Opmode of the XtremeDSP Slice provides over )
Prepared by : Niall Battson (Xilinx) 2004
www.xilinx.com/dsp
Agenda
Virtex-4 Family
The Xtreme DSP Slice
Filter Techniques
Case Studies
Digital Up Converter
2-D
www.xilinx.com/dsp
www.xilinx.com/dsp
Virtex-4 Family
Virtex-4 is the first Xilinx family to introduce three separate platforms optimized
for different application domains. This fundamental shift provided the greatest
silicon efficiency and optimal cost.
Virtex-4 LX
Platform
Optimized for
high-performance
Logic
Virtex-4 FX
Platform
Virtex-4 SX
Platform
Optimized for
Embedded Processing
and high-speed
Serial Connectivity
Optimized for
high-performance
Signal Processing
www.xilinx.com/dsp
Virtex-4 SX
Note:
The SX family emphasizes
Xilinx commitment to DSP
applications by providing a
strong skew toward to
dedicated arithmetic units
versus logic.
4VSX25
2,650 CLB
128 BRAM
6,144 CLB
320 BRAM
www.xilinx.com/dsp
www.xilinx.com/dsp
18
M REG
CE
D
36
18
www.xilinx.com/dsp
CE
D
Q
2-Deep
18
M REG
CE
D
36
A REG
CE
D
Q
2-Deep
18
www.xilinx.com/dsp
B REG
0
1
18
CE
D
Q
2-Deep
Subtract
18
M REG
CE
D
36
P REG
A REG
CE
D
Q
2-Deep
CE
D
18
48
CarryIn
48
BCIN
www.xilinx.com/dsp
BCOUT
B REG
0
1
18
CE
D
Q
2-Deep
Subtract
18
M REG
CE
D
36
P REG
A REG
CE
D
Q
2-Deep
CE
D
18
48
CarryIn
48
48
48
PCIN
BCIN
www.xilinx.com/dsp
BCOUT
A:B
36
48
B REG
0
1
18
CE
D
Q
2-Deep
Subtract
18
M REG
CE
D
A REG
CE
D
Q
2-Deep
72 36 0
36
P REG
CE
D
18
0
17-bit shift
17-bit shift
48
CarryIn
48
7
OpMode
48
48
PCIN
BCIN
www.xilinx.com/dsp
Dynamically Reconfigurable
DSP OPMODEs
OpMode
Zero
Hold P
A:B Select
Multiply
C Select
Feedback Add
36-Bit Adder
P Cascade Select
P Cascade Feedback Add
P Cascade Add
P Cascade Multiply Add
P Cascade Add
P Cascade Feedback Add Add
P Cascade Add Add
Hold P
Double Feedback Add
Feedback Add
Multiply-Accumulate
Feedback Add
Double Feedback Add
Feedback Add Add
C Select
Feedback Add
36-Bit Adder
Multiply-Add
17-Bit Shift P Cascade Select
17-Bit Shift P Cascade Feedback Add
17-Bit Shift P Cascade Add
17-Bit Shift P Cascade Multiply Add
17-Bit Shift P Cascade Add
17-Bit Shift P Cascade Add Add
17-Bit Shift Feedback
17-Bit Shift Feedback Feedback Add
17-Bit Shift Feedback Add
17-Bit Shift Feedback Multiply Add
17-Bit Shift Feedback Add
6
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
1
1
1
1
1
1
1
1
1
Z
5
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
1
1
1
1
1
1
1
1
1
0
0
0
0
0
0
1
1
1
1
1
4
0
0
0
0
0
0
0
1
1
1
1
1
1
1
0
0
0
0
0
0
0
1
1
1
1
1
1
1
1
1
1
0
0
0
0
0
Y
3 2
0 0
0 0
0 0
0 1
1 1
1 1
1 1
0 0
0 0
0 0
0 1
1 1
1 1
1 1
0 0
0 0
0 0
0 1
1 1
1 1
1 1
0 0
0 0
0 0
0 1
0 0
0 0
0 0
0 1
1 1
1 1
0 0
0 0
0 0
0 1
1 1
X
1 0
0 0
1 0
1 1
0 1
0 0
1 0
1 1
0 0
1 0
1 1
0 1
0 0
1 0
1 1
0 0
1 0
1 1
0 1
0 0
1 0
1 1
0 0
1 0
1 1
0 1
0 0
1 0
1 1
0 1
0 0
1 1
0 0
1 0
1 1
0 1
0 0
Output
+/- Cin
+/- (P + Cin)
+/- (A:B + Cin)
+/- (A * B + Cin)
+/- (C + Cin)
+/- (C + P + Cin)
+/- (A:B + C + Cin)
PCIN +/- Cin
PCIN +/- (P + Cin)
PCIN +/- (A:B + Cin)
PCIN +/- (A * B + Cin)
PCIN +/- (C + Cin)
PCIN +/- (C + P + Cin)
PCIN +/- (A:B + C + Cin)
P +/- Cin
P +/- (P + Cin)
P +/- (A:B + Cin)
P +/- (A * B + Cin)
P +/- (C + Cin)
P +/- (C + P + Cin)
P +/- (A:B + C + Cin)
C +/- Cin
C +/- (P + Cin)
C +/- (A:B + Cin)
C +/- (A * B + Cin)
Shift(PCIN) +/- Cin
Shift(PCIN) +/- (P + Cin)
Shift(PCIN) +/- (A:B + Cin)
Shift(PCIN) +/- (A * B + Cin)
Shift(PCIN) +/- (C + Cin)
Shift(PCIN) +/- (A:B + C + Cin)
Shift(P) +/- Cin
Shift(P) +/- (P + Cin)
Shift(P) +/- (A:B + Cin)
Shift(P) +/- (A * B + Cin)
Shift(P) +/- (C + Cin)
www.xilinx.com/dsp
Dynamic Opmode
Complex Multiplier
18
B
sel1
M REG
P REG
CE
D Q
CE
D Q
CE_R
CE
D Q
REAL
CE_I
CE
D Q
IMAG
A REG
18
CE
D Q
CE
D Q
0
sel2
Opmode
OPMODE
Sub
Sel1
Sel2
CE_R
Sub
CE_I
Performance
400 Mhz
Size:
1 XDSP Slice
59 Slices
(5 for control)
www.xilinx.com/dsp
Dynamic Opmode
Complex Multiplier
18
B
sel1
M REG
P REG
CE
D Q
CE
D Q
CE_R
CE
D Q
REAL
CE_I
CE
D Q
IMAG
A REG
18
-BxD+0
BxD
CE
D Q
CE
D Q
0
sel2
Opmode
Multiply Subtract
OPMODE
Sub
0001010
Sel1
Sel2
Sub
CE_R
CE_I
Performance
400 Mhz
Size:
1 XDSP Slice
59 Slices
(5 for control)
www.xilinx.com/dsp
Dynamic Opmode
Complex Multiplier
18
B
sel1
- BA.C
x D++(-B.D)
0
A
BxC
D
CE
D Q
M REG
P REG
CE
D Q
CE
D Q
CE_R
CE
D Q
REAL
CE_I
CE
D Q
IMAG
A REG
18
CE
D Q
0
sel2
Opmode
OPMODE
Sub
Sel1
Sel2
Sub
CE_R
CE_I
Multiply Subtract
0001010
Multiply Accumulate
0101010
Performance
400 Mhz
Size:
1 XDSP Slice
59 Slices
(5 for control)
www.xilinx.com/dsp
Dynamic Opmode
Complex Multiplier
18
B
sel1
- BA.C
B.C
x+D0++(-B.D)
0
AxC
B
D
CE
D Q
M REG
P REG
CE
D Q
CE
D Q
CE_R
CE
D Q
REAL
CE_I
CE
D Q
IMAG
A REG
18
CE
D Q
00
sel2
Opmode
OPMODE
Sub
Sel1
Sel2
Sub
CE_R
CE_I
Multiply Subtract
0001010
Multiply Accumulate
0101010
Multiply
0001010
Performance
400 Mhz
Size:
1 XDSP Slice
59 Slices
(5 for control)
www.xilinx.com/dsp
Dynamic Opmode
Complex Multiplier
18
B
sel1
A.D
- BA.C
B.C
x+D0B.C
++(-B.D)
0
A
BxD
C
CE
D Q
M REG
P REG
CE
D Q
CE
D Q
CE_R
CE
D Q
REAL
CE_I
CE
D Q
IMAG
A REG
18
CE
D Q
00
sel2
Opmode
OPMODE
Sub
Sel1
Sel2
Sub
CE_R
CE_I
Multiply Subtract
0001010
Multiply Accumulate
0101010
Multiply
0001010
Multiply Accumulate
0101010
Performance
400 Mhz
Size:
1 XDSP Slice
59 Slices
(5 for control)
www.xilinx.com/dsp
PCOUT
B
C
A
P
XtremeDSP Tile
B
C
A
P
BCIN
PCIN
Prepared by : Niall Battson (Xilinx) 2004
www.xilinx.com/dsp
90.0
80.0
Virtex-II
Virtex-II Pro
Pro
~39
~39 mW/100MHz
mW/100MHz
70.0
A
verageP
ow
er(m
W
)
60.0
50.0
40.0
30.0
Virtex-4
Virtex-4
3.3
3.3 mW/100MHz
mW/100MHz
20.0
10.0
0.0
0
100
200
300
400
Fre que ncy (M Hz)
500
600
Conditions: TT, 25C, nominal voltage, Fully pipelined multiply-add mode, random vectors
Prepared by : Niall Battson (Xilinx) 2004
www.xilinx.com/dsp
Description
Virtex-4 Logic
Functions
Stratix II Logic
Functions
1.35 W
2.35x
>> 11 Watt
-4 and
-II
Watt of
of power
power difference
difference between
between Virtex
Virtex-4
and Stratix
Stratix-II
in
in DSP
DSP Applications
Applications
Prepared by : Niall Battson (Xilinx) 2004
www.xilinx.com/dsp
Description
Virtex-4 Logic
Functions
1W
1.07x
Stratix II Logic
Functions
>> 11 Watt
-4 and
-II
Watt of
of power
power difference
difference between
between Virtex
Virtex-4
and Stratix
Stratix-II
in
in DSP
DSP Applications
Applications
Prepared by : Niall Battson (Xilinx) 2004
www.xilinx.com/dsp
Filter Techniques
For this analysis the software used was:
ISE 7.1.1i
Quartus 4.1 sp1
FIR Compiler 3.2.1
www.xilinx.com/dsp
i=N-1
Y(n) =
i=0
Viewed as a Diagram
ki.X(n-i)
Accumulate
N times
Z-1
Z-1
Z-1
Z-1
k0
k1
k1
k3
kN-1
Delay
Coefficient
Multiply
X(n)
Multiply
Y(n)
Summation
www.xilinx.com/dsp
100
50
10
5
1
0.5
1
10
20
50
200
100
500
1000
Log Scale
Prepared by : Niall Battson (Xilinx) 2004
www.xilinx.com/dsp
20
DSP48 Slice
opmode = 0100101
18
xn
Input Data
366 x 18
CE
Data Addr
we
Control
Coef Addr
Coefficients
366 x 18
45
Load
opmode (5)
Performance:
392 MHz (-10)
Size:
1 XDSP Slice
1 Block RAM
45 Slices
Prepared by : Niall Battson (Xilinx) 2004
yn
23
z-4
www.xilinx.com/dsp
M4K (512 x 9) x2
xn
18
Input Data
366 x 18
CE
Data Addr
we
Control
Coef Addr
Coefficients
366 x 18
45
yn
M4K (512 x 9) x2
Performance:
217 MHz (-5)
Size:
1 DSP Block (1 Mult)
4 M4K
659 ALMs
45%
45% Slower
Slower
Performance
Performance
Prepared by : Niall Battson (Xilinx) 2004
www.xilinx.com/dsp
200
100
50
10
5
1
0.5
1
10
20
50
200
100
500
1000
www.xilinx.com/dsp
x(n)
Coefficients are
from left to right.
This causes the
latency to be as
large and grow
with the increase of
coefficients
18
K0
K1
K2
K21
K22
DSP48 Slice
opmode = 0000101
41
DSP48 Slice
opmode = 0010101
Dedicated cascade
connections (PCOUT
and PCIN) are exploited
to achieve maximum
performance
Performance:
400 MHz (-10)
Size:
23 XDSP Slice
Prepared by : Niall Battson (Xilinx) 2004
y(n)
www.xilinx.com/dsp
18
K0
K1
K2
K3
K4
K5
K6
48%
48% Slower
Slower
Performance
Performance
K8
K7
K9
K10
The performance
bottleneck will still
always be the fabric
adder, which will greatly
limit performance
K11
Performance:
207 MHz (-5)
Size:
6 DSP Blocks
(23 Mults)
744 ALMs
Prepared by : Niall Battson (Xilinx) 2004
www.xilinx.com/dsp
200
100
Semi-Parallel
FIR Filters
50
10
5
1
0.5
1
10
20
50
200
100
500
1000
www.xilinx.com/dsp
Virtex-4
4 Multiplier Systolic SP FIR
18
K0
K1
K2
K3
K4
K5
K6
K7
K8
K9
K10
K11
K12
K13
K14
K15
20
CE
40
DSP48 Slice
opmode = 0000101
DSP48 Slice
opmode = 0010101
DSP48 Slice
opmode = 0010010
Performance:
400 MHz (-10)
Size:
5 XDSP Slice
104 Slices
Prepared by : Niall Battson (Xilinx) 2004
www.xilinx.com/dsp
y(n)
www.xilinx.com/dsp
Stratix-II
4 Multiplier Semi-Parallel FIR
Filter Specification: Sampling Frequency = 74.176 MHz, Coefficients = 16
18
x(n)
4 x 18
4 x 18
4 x 18
4 x 18
40
CE
4 x 18
4 x 18
4 x 18
4 x 18
DSP Block
opmode = 4 MULTADD
36%
36% Slower
Slower
Performance
Performance
Performance:
256 MHz (-5)
Size:
1 DSP Block
(4 Multipliers)
188 ALMS
www.xilinx.com/dsp
y(n)
Multi-Channel Multi-Rate
FIR Filters
www.xilinx.com/dsp
Virtex-4
Multi-Channel Multi-Rate FIR
Filter Specification:
Input Frequency = 3.84 Mhz, Coefficients = 192, Interpolation Rate Change = 2,
Channels = 8, Data Width = 12-bit, Coefficient Width = 15-bits
Channel 1
96 Coefficient Phase Filter
34
12
CE
12
CE
www.xilinx.com/dsp
Virtex-4
Multi-Channel Multi-Rate FIR
x1(n)
Filter Specification:
Input Frequency = 3.84 Mhz, Coefficients = 192+4, Interpolation Rate Change = 2,
Channels = 8, Data Width = 12-bit, Coefficient Width = 15-bit
Input buffer holds 8 individual channels of 7
of values per channel
x2(n)
x3(n)
x4(n)
12
x5(n)
x6(n)
x7(n)
x8(n)
WE
WE1
Input Data
56 x 12
Input Data
56 x 12
Coefficients
112 x 15
Coefficients
112 x 15
WE12
Input Data
56 x 12
136
Coefficients
112 x 15
y1(n)
y2(n)
24
y3(n)
y4(n)
34
y5(n)
y6(n)
DSP48 Slice1
opmode = 0000101
DSP48 Slice
opmode = 0010101
DSP48 Slice14
opmode = 0010101
DSP48 Slice
opmode = 0010010
Performance
447 MHz (-11)
Size:
15 XDSP Slice
14 Block RAM
233 Slices
(73 for control)
Prepared by : Niall Battson (Xilinx) 2004
www.xilinx.com/dsp
y7(n)
y8(n)
Stratix-II
Multi-Channel Multi-Rate FIR
Filter Specification:
Input Frequency = 3.84 Mhz, Coefficients = 192, Interpolation Rate Change = 2,
Channels = 8, Data Width = 12-bit, Coefficient Width = 15-bit
x1(n)
x2(n)
12
x3(n)
40 x 12
x4(n)
x5(n)
40 x 12
x6(n)
x7(n)
x8(n)
40 x 12
40 x 12
Input Mux is
the same size
as in Virtex-4
as there are
no F5MUXs
4 Mult / Add
DSP Block
Input time
series buffer
Input time
series buffer
Input time
series buffer
Input time
series buffer
4 Mult / Add
DSP Block
4 Mult / Add
DSP Block
4 Mult / Add
DSP Block
4 Mult / Add
DSP Block
34
www.xilinx.com/dsp
Stratix-II
Multi-Channel Multi-Rate FIR
Filter Specification:
Input Frequency = 3.84 Mhz, Coefficients = 192, Interpolation Rate Change = 2,
Channels = 8, Data Width = 12-bit, Coefficient Width = 15-bit
80 x 15
12
15
80 x 15
12
80 x 15
15
12
15
80 x 15
12
15
Performance
270 Mhz (-4)
Size:
5 DSP Blocks
(20 multipliers)
30 M4K Blocks
1271 ALMs
DSP Block
29
40%
40% Slower
Slower
Performance
Performance
www.xilinx.com/dsp
PFIR
CFIR
CIC Filter
64 Tap, Interpolate by 2
32 Tap, Interpolate by 2
Interpolate by 16
Input
DDS
cos (2..f1.t)
sin (2..f2.t)
PFIR
CFIR
CIC Filter
64 Tap, Interpolate by 2
32 Tap, Interpolate by 2
Interpolate by 16
Output
www.xilinx.com/dsp
CIC Coarse
Gain
PFIR
CIC
Rounding
Subtraction
Mixing
DDS
Prepared by : Niall Battson (Xilinx) 2004
www.xilinx.com/dsp
www.xilinx.com/dsp
y (n, m ) = h0 (k ) h1 (l ).x(n k , n l )
k = N
l = N
l =+ N
k = + N
= h1 (k ) h0 (l ).x(n k , n l )
l = N
k = N
k =+ N
Vertical
Line Buffer
Vertical Filter
24 Tap
Horizontal Filter
24 Tap
www.xilinx.com/dsp
x(n)
10
1440 x 10
1440 x 10
1440 x 10
16 x 10
16 x 10
16 x 10
1440 x 10
5
16 x 10
13
Reloadable
Coefficients
CE
26
DSP48 Slice1
opmode = 0000101
DSP48 Slice2
opmode = 0010101
DSP48 Slice4
opmode = 0010101
DSP48 Slice6
opmode = 0010101
Number of Multipliers =
DSP48 Slice
opmode = 0010010
111.38 x 24
450
= 5.94
www.xilinx.com/dsp
y(n)
www.xilinx.com/dsp
www.xilinx.com/dsp
9
9
Knowledge is Power
The next best thing to knowing something is knowing where to find it
- Samuel Johnson
5 Application Notes available in the
Virtex-4 User Guide in regard to
implementation specifics
Many Reference Designs in:
VHDL
Verilog
System Generator for DSP
For Further Details visit..
www.xilinx.com/dsp
Prepared by : Niall Battson (Xilinx) 2004
www.xilinx.com/dsp
Knowledge is Power
The desire of knowledge, like the thirst of riches, increases ever with
the acquisition of it. - Sterne
Load
opmode (5)
Filter Specification:
Sampling Frequency = 1.2288 Mhz, Coefficients = 104, Channels = 3
Control Coefficients
117 x 10 117 x 18
WE
18
xn
Input Data
117 x 18
Counter
CE
CE
Four dimensions?
DSP48 Slice
opmode = 0100101
23
data
addr
x1(n)
4
Cont rol logic is reduced
to only a counter
Max Sample Rate =
2 DSP Courses:
Coef Addr
Clock Rate
Number of Taps
The A port is a
concatenation of the
coefficients, coefficient
address, WE, CE and
Load signals
DSP48
18Slice
opmode = 0100101
x2(n)
Filter Size:
y1(n)
1 XDSP SliceTranspose FIR
y2(n)
0
44
1 Block 50RAM
y3(n)
40 0
Systolic
FIR FIR
(symmetric
Parallel
Filters & non)
28 Slices
30 0
66
opmode (5)
20 0
he re as it has to accomm odate
Load
Increasing
10z0
Semi-Parallel
Filter Size:
1
Xi li nx C
on fide
n ti aldata stre am. The
the
TDM
No. of Multipliers
50
Dist
Mem FIR
embedd ed con tro l can also be
1 XDSP
Slice
1
10
used .
1 Block
RAM
Semi-Parallel
116 Slices
1
Clock Rate
BRAM FIR
Ma x Sample Rate =
(30 for control)
Numbe r o f Taps x Number of Channels
Embedded
x3(n)
Coefficients
104 x 18
-3
0.1
Dist Mem
MACC FIR
Control
MACC FIR
Normal Control
MACC FIR
0.00 1
10 20 30 40
50 60 70
80 90
10 0
20 0 30 0 40 0 50 0 60 0 70 0 80 0 90 0
10 00
Consider Multi-Channel
Multi-Rate FIR Filters
together
MySupport.xilinx.com
- DSP Implementation
Techniques (3 day)
Xilinx Confidential
Over 500
Pages
www.xilinx.com/dsp