
DSP Design

Pipelining and Parallel Processing
Viktor Öwall, Dept. of Electrical and Information Technology, Lund University, Sweden - www.eit.lth.se
Introduction
Pipelining
Splits the logic path by introducing pipeline registers.
This reduces the critical path but introduces latency.
Either increases the clock speed (or sampling speed) or reduces the power consumption at the same speed in a DSP system.
Parallel Processing
Multiple outputs are computed in parallel in a clock period.
The effective sampling speed is increased by the level of parallelism.
Can also be used to reduce the power consumption.
Pipelining
[Figure: x(n) passes through two cascaded multipliers a and b, giving ax(n) and then abx(n). In the pipelined version a register (z^-1) sits between the multipliers, so the second multiplier sees ax(n-1) and the output is abx(n-1).]
Critical path cut in half; latency introduced: 1 clock cycle.
Result: double clock speed, or lower power consumption due to reduced V_DD.
Parallel Processing
[Figure: the serial system computes abx(n) from x(n) = x(0), x(1), x(2), ... The 2-parallel system uses two copies of the a, b multiplier chain: one takes x(2k) = x(0), x(2), x(4), ... and produces abx(2k), the other takes x(2k+1) = x(1), x(3), x(5), ... and produces abx(2k+1), for k = 0, 1, 2, 3, ...]
Two samples are processed in parallel.
Result: double throughput, or lower power consumption due to reduced V_DD.
Direct Form 4-tap FIR
[Figure: direct-form FIR filter with input x(n), three delay elements D, coefficient multipliers h0, h1, h2, h3 and an adder chain producing y(n).]
Clock speed limited by critical path!
$T_{critical} = T_M + (N-1) T_A$
N = number of taps
Definitions
Cutset: a set of edges that, if removed (cut), results in two disjoint graphs.
Feedforward cutset: a cutset in which the data moves in the forward direction on all edges.
[Figure: 4-tap direct-form FIR filter (x(n), delays D, taps h0 to h3, output y(n)) used to illustrate a feedforward cutset.]
Feedback cutset: a cutset in which the data moves in both directions.
[Figure: two nodes A and B connected in a loop containing a delay element D.]
Pipelining
Pipelining: placing delays at feedforward cutsets.
[Figure: 4-tap direct-form FIR filter with a feedforward cutset marked.]
[Figure: the same filter with a delay D (pipeline stage) placed on every edge of the cutset; the output becomes y(n-1).]
Pipelining will not affect functionality but introduces latency!
Pipelining
In an M-level pipelined system, the number of delay elements in any path from input to output is (M-1) greater than that in the same path in the original sequential circuit.
Pipelining reduces the critical path but leads to increased latency.
Latency: the difference in the availability of the first output data in the pipelined system and the sequential system.
Two main drawbacks:
increase in the number of latches
increase in system latency
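The claim that pipelining leaves the functionality unchanged and only adds latency can be checked with a small simulation. The sketch below is an illustration only: the function names `fir` and `pipelined_fir`, the coefficient values and the position of the cutset (`split`) are assumptions, not taken from the slides. It places one register on each edge of one possible feedforward cutset of a 4-tap FIR filter.

```python
def fir(x, h):
    """Reference direct-form FIR: y(n) = sum_i h[i] * x(n - i), with x(n) = 0 for n < 0."""
    def xs(n):
        return x[n] if n >= 0 else 0
    return [sum(h[i] * xs(n - i) for i in range(len(h))) for n in range(len(x))]


def pipelined_fir(x, h, split=2):
    """Same filter with one pipeline register on every edge of a feedforward cutset.

    The (assumed) cutset separates taps h[:split] from taps h[split:]: one register
    sits on the partial-sum edge and one on the input delay line, so the taps after
    the cut see the input one extra cycle late.
    """
    def xs(n):
        return x[n] if n >= 0 else 0
    partial_reg = 0  # register on the partial-sum edge, cleared at start
    y = []
    for n in range(len(x)):
        # Stage after the cut: registered partial sum plus taps fed by the delayed input
        tail = sum(h[i] * xs(n - 1 - i) for i in range(split, len(h)))
        y.append(partial_reg + tail)
        # Stage before the cut: captured by the register for the next cycle
        partial_reg = sum(h[i] * xs(n - i) for i in range(split))
    return y


x = [1, 2, 3, 4, 5, 6]      # illustrative input samples
h = [1, 2, 3, 4]            # illustrative coefficients h0..h3
y_seq = fir(x, h)
y_pipe = pipelined_fir(x, h)
```

Under these assumptions `y_pipe` equals `y_seq` delayed by one sample, i.e. the pipelined filter outputs y(n-1).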
What if
[Figure: two cascaded adders computing x(n)+y(n) and then x(n)+y(n)+z(n). In the pipelined version a register (z^-1) after the first adder holds x(n-1)+y(n-1); with z(n-1) on the second adder the output is x(n-1)+y(n-1)+z(n-1).]
We said before that the critical path was cut in half. Under what assumption?
For example: what if the adders are ripple-carry?
Example: Pipelining with ripple-carry adders
[Figure: 4-bit ripple-carry adder with inputs x0..x3 and y0..y3, carries c0..c4 and sum bits s0..s3; T_critical runs along the carry chain.]
Assume that the delay in calculating the sum, s_delay, is equal to the delay in calculating the carry, c_delay.
What is the critical path?
4 x c_delay
Example: Pipelining with ripple-carry adders
[Figure: two cascaded 4-bit ripple-carry adders. The first adds x0..x3 and y0..y3; its sum bits feed the second adder together with z0..z3, which produces s0..s3.]
What is the critical path now?
4 x c_delay + s_delay = 5 x c_delay
Example: Pipelining with ripple-carry adders
[Figure: the same two cascaded 4-bit ripple-carry adders with delay elements D (pipeline registers) between the first and the second adder.]
What is the critical path now?
4 x c_delay
Pipelining reduces the critical path from 5 x c_delay to 4 x c_delay only; it is not cut in half.
Conclusion
The result depends on the structure of the blocks used, e.g. the type of adder (ripple-carry, carry-save, carry look-ahead, ...).
We have to understand how the blocks work and whether their critical paths are independent or related.
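The ripple-carry example can be reproduced with a small arrival-time model. This is a sketch under the slide's assumption s_delay = c_delay (here 1 time unit each); the function names `ripple_adder` and `critical_path` are hypothetical, and the carry-out of the first adder is not fed into the second one.

```python
def ripple_adder(a_arrival, b_arrival, c_delay=1, s_delay=1):
    """Arrival times of the sum bits and the carry-out of a ripple-carry adder.

    a_arrival, b_arrival: arrival time of every input bit, LSB first.
    The carry-in c0 is assumed to be available at time 0.
    """
    carry = 0
    sums = []
    for ta, tb in zip(a_arrival, b_arrival):
        ready = max(ta, tb, carry)      # a full adder waits for its slowest input
        sums.append(ready + s_delay)
        carry = ready + c_delay
    return sums, carry


def critical_path(bits=4, pipelined=False):
    """Longest delay of (x + y) + z built from two ripple-carry adders."""
    t0 = [0] * bits
    s1, c1 = ripple_adder(t0, t0)       # first adder: x + y
    first = max(s1 + [c1])
    if pipelined:
        # Registers between the adders: the second adder starts again at time 0
        s2, c2 = ripple_adder(t0, t0)
        return max(first, max(s2 + [c2]))
    s2, c2 = ripple_adder(s1, t0)       # second adder sees the staggered sum bits
    return max(s2 + [c2])


single = max(ripple_adder([0] * 4, [0] * 4)[0])
cascaded = critical_path(pipelined=False)
with_registers = critical_path(pipelined=True)
```

With unit delays the model gives 4 for a single adder, 5 for the cascade and 4 for the pipelined cascade, because the second carry chain can start as soon as the first sum bit is ready.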
Feedforward Cutsets
[Figure: data-flow graph with nodes 1 to 6 and delay elements D, labelled T_critical = 3546; a feedforward cutset is marked and a delay D is placed on each of its edges.]
Must place delays on all edges in the cutset.
Feedforward Cutsets
[Figure: the same graph, labelled T_critical = 3456, with a cut that is not a feedforward cutset.]
Not a feedforward cutset: pipelining not possible.
Fine-Grain Pipelining
Let T_M = 10 units and T_A = 2 units. If the multiplier is broken into 2 smaller units with processing times of 6 units and 4 units, respectively (by placing the latches on the horizontal cutset across the multiplier), then the desired clock period can be achieved as (T_M + T_A)/2.
[Figure: data-broadcast FIR filter with input x(n), multipliers h0 to h3 (10 units each), adders (2 units each), delays D and output y(n). Is the horizontal cut across the multipliers a feedforward cutset?]
A fine-grain pipelined version of the 3-tap data-broadcast FIR filter is shown below.
[Figure: each multiplier split into a 6-unit part and a 4-unit part with a latch D in between; adders 2 units each; output y(n-1).]
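The clock-period arithmetic of the fine-grain example can be written out as a short sketch. The helper name `clock_period` and the list-of-stages representation are assumptions made for illustration; the delays are the ones given on the slide.

```python
def clock_period(stages):
    """Clock period = delay of the slowest pipeline stage.

    Each stage is a list of block delays that are traversed in series.
    """
    return max(sum(stage) for stage in stages)


T_M, T_A = 10, 2                                  # multiplier and adder delays from the slide

original = clock_period([[T_M, T_A]])             # multiplier followed by adder in one stage
fine_grain = clock_period([[6], [4, T_A]])        # latch after the 6-unit part of the multiplier
```

Here `original` is T_M + T_A = 12 units and `fine_grain` is max(6, 4 + 2) = 6 units, which matches (T_M + T_A)/2.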
Synchronous Pipelining
[Figure: combinatorial logic between two registers (REG) driven by the same clock. Timing diagram: logic depth versus time from In to Out, bounded by t_min and t_max, within one clock period from 0 to T_clk.]
Wave Pipelining
[Figure: combinatorial logic between two registers (REG); the clock of the output register is delayed. Timing diagram: logic depth versus time from In to Out, bounded by t_min and t_max, with clock period T_clk.]
New input data is applied before the previous computation is done.
T_clk < T_critical path
Wave pipelining: Pros and cons?
+ shorter T_clk
- extensive simulation
- tedious design
- hard to verify
- lack of tools
Double Pumped Bus?
Parallel Processing
Parallel processing and pipelining techniques are duals of each other: if a computation can be pipelined, it can also be processed in parallel. Both of them exploit concurrency available in the computation in different ways.
How to design a parallel FIR system?
Consider a single-input single-output (SISO) FIR filter:
y(n) = ax(n) + bx(n-1) + cx(n-2)
Convert the SISO system into a MIMO (multiple-input multiple-output) system in order to obtain a parallel processing structure.
To get a parallel system with 3 inputs per clock cycle:
y(3k) = ax(3k) + bx(3k-1) + cx(3k-2)
y(3k+1) = ax(3k+1) + bx(3k) + cx(3k-1)
y(3k+2) = ax(3k+2) + bx(3k+1) + cx(3k)
A parallel processing system is also called block processing, and the number of inputs processed in a clock cycle is referred to as the block size.
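The three block equations can be checked against the serial filter with a short sketch. The function names and the numeric values are illustrative assumptions; samples before n = 0 are taken as zero.

```python
def fir_serial(x, a, b, c):
    """SISO filter: y(n) = a*x(n) + b*x(n-1) + c*x(n-2)."""
    def xs(n):
        return x[n] if n >= 0 else 0
    return [a * xs(n) + b * xs(n - 1) + c * xs(n - 2) for n in range(len(x))]


def fir_3_parallel(x, a, b, c):
    """MIMO (block) version with block size 3: three outputs per clock cycle k."""
    if len(x) % 3:
        raise ValueError("input length must be a multiple of the block size 3")

    def xs(n):
        return x[n] if n >= 0 else 0

    y = []
    for k in range(len(x) // 3):        # one loop iteration models one clock cycle
        y0 = a * xs(3 * k) + b * xs(3 * k - 1) + c * xs(3 * k - 2)
        y1 = a * xs(3 * k + 1) + b * xs(3 * k) + c * xs(3 * k - 1)
        y2 = a * xs(3 * k + 2) + b * xs(3 * k + 1) + c * xs(3 * k)
        y.extend([y0, y1, y2])
    return y


x = [3, 1, 4, 1, 5, 9, 2, 6, 5]         # illustrative input, 3 blocks of 3 samples
y_serial = fir_serial(x, 2, -1, 3)
y_block = fir_3_parallel(x, 2, -1, 3)
```

Both functions produce the same output sequence; the block version needs one third as many clock cycles for the same number of samples.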
Parallel Processing
Convert Single-Input-Single-Output (SISO), x(n) -> y(n),
to Multiple-Input-Multiple-Output (MIMO), x(2k) -> y(2k) and x(2k+1) -> y(2k+1).
Parallel Processing (contd)
For example:
When the block size is 2, 1 delay element = 2 sampling delays: x(2k) -> D -> x(2k-2)
When the block size is 10, 1 delay element = 10 sampling delays: x(10k) -> D -> x(10k-10)
Parallel 3-Tap
$y(n) = b_0 x(n) + b_1 x(n-1) + b_2 x(n-2)$
$y(n+1) = b_0 x(n+1) + b_1 x(n) + b_2 x(n-1)$
n changed to 2k:
$y(2k) = b_0 x(2k) + b_1 x(2k-1) + b_2 x(2k-2)$
$y(2k+1) = b_0 x(2k+1) + b_1 x(2k) + b_2 x(2k-1)$
Parallel 3-Tap (2)
$y(2k) = b_0 x(2k) + b_1 x(2k-1) + b_2 x(2k-2)$
$y(2k+1) = b_0 x(2k+1) + b_1 x(2k) + b_2 x(2k-1)$
[Figure: 2-parallel 3-tap FIR filter. Inputs x(2k) and x(2k+1); delay elements D produce x(2k-1) and x(2k-2); two sets of multipliers b0, b1, b2 and adders produce y(2k) and y(2k+1). Shown for k = 0.]
Parallel 3-Tap (3)
For k = 0:
$y(0) = b_0 x(0) + b_1 x(-1) + b_2 x(-2)$
$y(1) = b_0 x(1) + b_1 x(0) + b_2 x(-1)$
[Figure: the same 2-parallel structure with 2 inputs per cycle: x(0), x(2), x(4) on the x(2k) input and x(1), x(3), x(5) on the x(2k+1) input; outputs y(0), y(2), y(4) on y(2k) and y(1), y(3), y(5) on y(2k+1).]
Parallel 3-Tap (4), (5)
[Figure: animation frames of the 2-parallel 3-tap structure for k = 0 with 2 inputs per cycle, repeating the equations for y(0) and y(1).]
Parallel Processing (contd)
Note: the critical path of the block (or parallel) processing system remains unchanged. But since L samples are processed in 1 clock cycle, the iteration (or sample) period is given by the following equations:
$T_{clock} \geq T_M + 2T_A$ for a 3-tap FIR filter
$T_{iteration} = T_{sample} = \frac{1}{L} T_{clock}$
So, it is important to understand that in a parallel system $T_{sample} \neq T_{clock}$, whereas in a pipelined system $T_{sample} = T_{clock}$.
Parallel Processing (contd)
Why use parallel processing when pipelining can be used equally well?
Consider the following chip set: when the critical path is less than the I/O bound (output-pad delay plus input-pad delay and the wire delay between the two chips), we say this system is communication bounded.
So, we know that pipelining can be used only to the extent such that the critical path computation time is limited by the communication (or I/O) bound. Once this is reached, pipelining can no longer increase the speed.
[Figure: Chip 1 drives Chip 2 through an output pad and an input pad; T_communication spans the pads and the wire between the chips, T_computation the logic inside a chip.]
Parallel Processing (contd)
So, in such cases, pipelining can be combined with parallel processing to further increase the speed of the DSP system.
By combining parallel processing (block size: L) and pipelining (pipelining stages: M), the sample period can be reduced to:
$T_{iteration} = T_{sample} = \frac{T_{clock}}{LM}$
Parallel processing can also be used for reduction of power consumption while using slow clocks.
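A small numeric sketch of the sample-period relation follows. The helper name `sample_period` and the delay values T_M = 10, T_A = 2 are assumptions chosen for illustration; `t_clock` is taken as the clock period of the unpipelined 3-tap filter.

```python
def sample_period(t_clock, block_size=1, pipeline_stages=1):
    """T_sample = T_clock / (L * M), with L = block size and M = pipelining stages."""
    return t_clock / (block_size * pipeline_stages)


T_M, T_A = 10, 2                    # assumed multiplier and adder delays
t_clock = T_M + 2 * T_A             # smallest clock period for the 3-tap FIR filter

serial = sample_period(t_clock)
parallel = sample_period(t_clock, block_size=3)
combined = sample_period(t_clock, block_size=3, pipeline_stages=2)
```

With these numbers the serial system needs 14 time units per sample, the 3-parallel system 14/3, and the combination of L = 3 and M = 2 needs 14/6.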
Parallel Processing (contd)
A serial-to-parallel converter
[Figure: x(n) arrives with sample period T/4 and passes through a chain of three delay elements D clocked at T/4; every T (at sample 4k+3) the four values x(4k+3), x(4k+2), x(4k+1), x(4k) are latched in parallel.]
A parallel-to-serial converter
[Figure: y(4k+3), y(4k+2), y(4k+1), y(4k) are loaded in parallel every T (at 4k) into a chain of three delay elements D clocked at T/4, with 0 fed into the far end, and shifted out serially as y(n).]
Parallel FIR with Polyphase Decomposition
Chapter 9.2
Polyphase FIR Filter
$Y(z) = H(z) X(z) = \sum_{n=0}^{N-1} h(n) z^{-n} \cdot \sum_{n=0}^{\infty} x(n) z^{-n}$
[Figure: block h(n) with input x(n) and output y(n), and its direct-form realisation with taps h0, h1, h2, h3.]
Idea: split into an L-parallel filter and reduce the strength.
Split the input sequence
$X(z) = x(0) + x(1) z^{-1} + x(2) z^{-2} + x(3) z^{-3} + x(4) z^{-4} + x(5) z^{-5} + \dots$
$X(z) = \left( x(0) + x(2) z^{-2} + x(4) z^{-4} + \dots \right) + z^{-1} \left( x(1) + x(3) z^{-2} + x(5) z^{-4} + \dots \right)$
$X(z) = X_0(z^2) + z^{-1} X_1(z^2)$
$X_0(z^2)$ and $X_1(z^2)$ are the Z-transforms of x(2k) and x(2k+1).
[Figure: the SISO block h(n), x(n) -> y(n), is replaced by a block h(n) with inputs x(2k), x(2k+1) and outputs y(2k), y(2k+1).]
Split the filter accordingly
$H(z) = h(0) + h(1) z^{-1} + h(2) z^{-2} + h(3) z^{-3} + h(4) z^{-4} + h(5) z^{-5} + \dots$
$H(z) = H_0(z^2) + z^{-1} H_1(z^2)$
and
$Y(z) = H(z) X(z) = \left( H_0(z^2) + z^{-1} H_1(z^2) \right) \left( X_0(z^2) + z^{-1} X_1(z^2) \right)$
$Y(z) = H_0(z^2) X_0(z^2) + z^{-2} H_1(z^2) X_1(z^2) + z^{-1} \left( H_0(z^2) X_1(z^2) + H_1(z^2) X_0(z^2) \right)$
With $Y(z) = Y_0(z^2) + z^{-1} Y_1(z^2)$:
$Y_0(z^2) = X_0(z^2) H_0(z^2) + z^{-2} X_1(z^2) H_1(z^2)$
$Y_1(z^2) = X_0(z^2) H_1(z^2) + X_1(z^2) H_0(z^2)$
Polyphase Filter: 2-parallel filter
Split h(n) into two N/2-tap filters:
H0: even coefficients
H1: odd coefficients
[Figure: x(2k) and x(2k+1) each feed one H0 and one H1 subfilter; y(2k) sums H0 applied to x(2k) and H1 applied to x(2k+1) through a delay D; y(2k+1) sums H1 applied to x(2k) and H0 applied to x(2k+1).]
$Y_0(z^2) = X_0(z^2) H_0(z^2) + z^{-2} X_1(z^2) H_1(z^2)$
$Y_1(z^2) = X_0(z^2) H_1(z^2) + X_1(z^2) H_0(z^2)$
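The 2-parallel polyphase equations can be verified numerically. The sketch below is an illustration with hypothetical names (`conv`, `polyphase_2_parallel`) and made-up data; it assumes that both the input and the filter have an even number of elements, and it treats z^-2 as one delay in the block (2k) domain.

```python
def conv(a, b):
    """Linear convolution of two coefficient lists (polynomial product)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def polyphase_2_parallel(x, h):
    """Return (y(2k), y(2k+1)) computed with four N/2-tap subfilters."""
    x0, x1 = x[0::2], x[1::2]           # X0: x(2k), X1: x(2k+1)
    h0, h1 = h[0::2], h[1::2]           # H0: even coefficients, H1: odd coefficients
    h0x0, h1x1 = conv(h0, x0), conv(h1, x1)
    h0x1, h1x0 = conv(h0, x1), conv(h1, x0)
    # Y0 = H0 X0 + z^-2 H1 X1: the H1 X1 branch is delayed by one block
    y_even = [p + q for p, q in zip(h0x0 + [0], [0] + h1x1)]
    # Y1 = H0 X1 + H1 X0
    y_odd = [p + q for p, q in zip(h0x1, h1x0)]
    return y_even, y_odd


x = [1, 2, 3, 4, 5, 6]                  # illustrative input, even length
h = [1, -2, 3, 4]                       # illustrative 4-tap filter
y_full = conv(h, x)                     # reference: ordinary serial filtering
y_even, y_odd = polyphase_2_parallel(x, h)
```

The even and odd outputs interleave to the serial result, i.e. `y_even` matches `y_full[0::2]` and `y_odd` matches `y_full[1::2]`.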
Example
[Figure: 2-parallel 4-tap filter. x(2n) and x(2n+1) each feed a subfilter with taps h0, h2 (H0) and a subfilter with taps h1, h3 (H1); outputs y(2n) and y(2n+1). The critical path is marked.]
$y_0 = x_0 h_0$
$y_1 = x_0 h_1 + x_1 h_0$
$y_2 = x_0 h_2 + x_1 h_1 + x_2 h_0$
No strength reduction yet: 2N multiplications per two samples.
Power can be saved: slightly increased critical path, but 2 sample periods are available.
3-phase Polyphase Filter
Polyphase Filters
Filter       Mult   Add      Subfilters
FIR          N      N-1      1
2-Parallel   2N     2(N-1)   4
3-Parallel   3N     3(N-1)   9
L-Parallel   LN     L(N-1)   L^2
However, they give L samples each cycle.
Fast FIR Filters
$Y_0 = H_0 X_0 + z^{-2} H_1 X_1$
$Y_1 = H_1 X_0 + H_0 X_1$
$H_0 = \{h_0, h_2, h_4, \dots\}$
$H_1 = \{h_1, h_3, h_5, \dots\}$
$H_0 + H_1 = \{h_0 + h_1, h_2 + h_3, h_4 + h_5, \dots\}$
Rewritten with three subfilters:
$Y_0 = H_0 X_0 + z^{-2} H_1 X_1$
$Y_1 = (H_0 + H_1)(X_0 + X_1) - H_0 X_0 - H_1 X_1$
[Figure: x(2k) feeds H0, x(2k+1) feeds H1, and their sum feeds H0+H1; y(2k) is the H0 output plus the H1 output delayed by z^-2; y(2k+1) is the H0+H1 output minus the H0 and H1 outputs.]
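The three-subfilter form can be checked against ordinary filtering with a short sketch. The names `conv` and `fast_fir_2_parallel` and the data are illustrative assumptions; input and filter lengths are assumed even.

```python
def conv(a, b):
    """Linear convolution of two coefficient lists (polynomial product)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def fast_fir_2_parallel(x, h):
    """2-parallel fast FIR: three N/2-tap subfilters instead of four."""
    x0, x1 = x[0::2], x[1::2]
    h0, h1 = h[0::2], h[1::2]
    h_sum = [p + q for p, q in zip(h0, h1)]     # H0 + H1, can be precomputed
    x_sum = [p + q for p, q in zip(x0, x1)]     # X0 + X1, pre-addition
    p0 = conv(h0, x0)                           # H0 X0
    p1 = conv(h1, x1)                           # H1 X1
    p2 = conv(h_sum, x_sum)                     # (H0 + H1)(X0 + X1)
    # Y0 = H0 X0 + z^-2 H1 X1 (z^-2 is one delay in the block domain)
    y_even = [p + q for p, q in zip(p0 + [0], [0] + p1)]
    # Y1 = (H0 + H1)(X0 + X1) - H0 X0 - H1 X1
    y_odd = [c - a - b for a, b, c in zip(p0, p1, p2)]
    return y_even, y_odd


x = [1, 2, 3, 4, 5, 6]                          # illustrative input, even length
h = [1, -2, 3, 4]                               # illustrative 4-tap filter
y_full = conv(h, x)
y_even, y_odd = fast_fir_2_parallel(x, h)
```

The outputs agree with the serial filter while only three subfilter convolutions are computed, at the cost of extra pre- and post-additions.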
Reduced Complexity
[Figure: the three-subfilter structure with H0, H0+H1 and H1: N/2 multiplications and N/2-1 additions each.]
Total:
3N/2 = 1.5N mults compared to 2N
3(N/2 - 1) + 4 = 1.5N + 1 adds compared to 2(N - 1)
Fast FIR Example
[Figure: 4-tap example. x(2n) feeds the subfilter with taps h0, h2; x(2n+1) feeds the subfilter with taps h1, h3; their sum feeds the precomputed H0+H1 subfilter with taps h0+h1, h2+h3; outputs y(2n) and y(2n+1).]
3 multiplications and 3.5 additions per sample.
Alternative Structure
$Y_0 = H_0 X_0 + z^{-2} H_1 X_1$
$Y_1 = H_1 X_0 + H_0 X_1$
can also be written as
$Y_0 = H_0 X_0 + z^{-2} H_1 X_1$
$Y_1 = H_0 X_0 + H_1 X_1 - (H_0 - H_1)(X_0 - X_1)$
[Figure: x(2k) feeds H0, x(2k+1) feeds H1, and their difference feeds H0-H1; outputs y(2k) and y(2k+1).]
Alternative Structure
H0-H1 might be better for lowpass filters and H0+H1 might be better for highpass filters.
Saves power.
[Figure: impulse responses h_LP(n) and h_HP(n) next to the H0, H0-H1, H1 structure.]
Transposition
Reverse the edges and interchange inputs and outputs.
Reduced Complexity 3-parallel
Reduced Complexity 4-parallel
[Figure: 4-parallel fast FIR built from H0, H0+H1 and H1 subfilters.]
Two cascaded 2-parallel filters.
Fast FIR 6-parallel
3-parallel filters in a 2-parallel structure.
Fast FIR 8-parallel
3 cascaded 2-parallel filters.
Polyphase Filters
             Traditional        Fast FIR
Filter       Mults   Adds       Mults   Adds
FIR          N       N-1
2-Parallel   2N      2(N-1)     1.5N    1.5N+1
3-Parallel   3N      3(N-1)     2N      2N+4
4-Parallel   4N      4(N-1)     9N/4    20+9(N/4-1)
4-Parallel: (nm)-parallel filter with n=2, m=2
Polyphase Filters
N=8          Traditional        Reduced Complexity
Filter       Mults   Adds       Mults   Adds
FIR          8       7
2-Parallel   16      14         12      13
3-Parallel   24      21         16      20
4-Parallel   32      28         18      29
N=1000       Traditional        Reduced Complexity
Filter       Mults   Adds       Mults   Adds
FIR          1000    999
2-Parallel   2000    1998       1500    1501
3-Parallel   3000    2997       2000    2004
4-Parallel   4000    3996       2250    2261
Reduced Complexity: 44%
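The table entries follow directly from the operation-count formulas, which can be evaluated with a short sketch. The function names are hypothetical; reading the 44% figure as the saving in multiplications of the 4-parallel filter for N = 1000 is an assumption, although it matches the numbers.

```python
def traditional(L, N):
    """(mults, adds) of a traditional L-parallel N-tap FIR filter."""
    return L * N, L * (N - 1)


def fast_fir(L, N):
    """(mults, adds) of the reduced-complexity filters, using the formulas from the table."""
    formulas = {
        2: (1.5 * N, 1.5 * N + 1),
        3: (2 * N, 2 * N + 4),
        4: (9 * N / 4, 20 + 9 * (N / 4 - 1)),
    }
    return formulas[L]


N = 1000
mults_traditional, adds_traditional = traditional(4, N)
mults_fast, adds_fast = fast_fir(4, N)
saving = 1 - mults_fast / mults_traditional     # fraction of multiplications saved
```

For L = 4 and N = 1000 this gives 2250 instead of 4000 multiplications, a saving of about 44%, while the number of additions drops from 3996 to 2261.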
Pipelining and Parallel Processing for Low Power
Power Dissipation
Two measures are important:
Peak power (sets dimensions)
$P_{peak} = V_{DD} I_{DDmax}$
Average power (battery and cooling), or rather energy
$P_{av} = \frac{V_{DD}}{T} \int_0^T i_{DD}(t) \, dt$
Power consumption in CMOS
$P_{total} = P_{dynamic} + P_{\text{short-circuit}} + P_{static}$
P_static: gaining more importance with technology scaling.
P_dynamic: the one we will look at and reduce with pipelining and parallelism.
CMOS Power Consumption
$P_{tot} = P_{dynamic} + P_{\text{short-circuit}} + P_{static} = \alpha f C_L V_{DD}^2 + V_{DD} I_{sc} + V_{DD} I_{leakage}$
$\alpha$ = probability for switching
Short Circuit - Current Spikes
Current peak when both the NMOS and the PMOS transistor are conducting.
[Figure: input voltage ramp between the levels V_T and V_DD - V_T, and the resulting short-circuit current spike I_peak.]
Static Power Consumption
Due to leakage current.
I_leakage increases with decreasing V_T.
$P_{stat} = I_{leakage} V_{DD}$
[Figure: transistor between V_DD and ground showing the leakage components: drain leakage and subthreshold current.]
V_T Scaling: V_T and I_OFF Trade-off
Performance vs leakage: V_T decreases, I_OFF increases.
[Figure: ln(I_DS) versus V_G for a low-V_T and a high-V_T device, with thresholds V_TL and V_TH and off-currents I_OFFL and I_OFFH.]
As V_T decreases, sub-threshold leakage increases.
Dynamic Power Consumption
[Figure: CMOS inverter charging the load capacitance from V_DD and discharging it to ground.]
Energy charged in a capacitor:
$E_C = \frac{C V^2}{2} = \frac{C_L V_{DD}^2}{2}$
Energy $E_C$ is also discharged, i.e.
$E_{tot} = C_L V_{DD}^2$
Power consumption:
$P = C_L V_{DD}^2 f$
This assumes V_swing = V_DD; otherwise V_DD^2 is replaced by V_swing V_DD.
Since V_DD is squared, it is the most efficient parameter to reduce.
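The quadratic dependence on the supply voltage is easy to see numerically. The sketch below uses a hypothetical helper `dynamic_power` and made-up component values; the optional `v_swing` argument covers the case where the swing differs from V_DD.

```python
import math


def dynamic_power(c_load, v_dd, f, v_swing=None):
    """Dynamic power P = C_L * V_swing * V_DD * f, with V_swing = V_DD by default."""
    if v_swing is None:
        v_swing = v_dd
    return c_load * v_swing * v_dd * f


C_L = 1e-12     # assumed load capacitance: 1 pF
F = 100e6       # assumed switching frequency: 100 MHz

p_full = dynamic_power(C_L, 1.0, F)         # at V_DD = 1.0 V
p_half_vdd = dynamic_power(C_L, 0.5, F)     # same C_L and f at half the supply
p_half_f = dynamic_power(C_L, 1.0, F / 2)   # same supply at half the frequency
```

Halving V_DD cuts the dynamic power to a quarter, whereas halving f only cuts it to a half.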
Reduce...
Capacitances
Transistor/gate C
Load C
Interconnects, more and more important
External
Activity
Frequency
Power supply: squared, so most efficient
...without reducing performance?
Propagation Delay in CMOS
$T_{pd} = \frac{C_L V_{DD}}{k (V_{DD} - V_T)^2}$, where k depends on technology parameters such as $C_{ox}$ and $W/L$
If $V_{DD} \gg V_T$:
$T_{pd} \approx \frac{C_L}{k V_{DD}} = \frac{1}{f}$
so f is proportional to V_DD.
Propagation Delay in CMOS
Is $V_{DD} \gg V_T$ a valid assumption?
$T_{pd} = \frac{C_L V_{DD}}{k (V_{DD} - V_T)^2}$
Shouri Chatterjee, Yannis Tsividis and Peter Kinget, Analog Circuit Design Techniques at 0.5V
Pipelining, power consumption
[Figure: the two-multiplier example. Original: x(n) -> a -> ax(n) -> b -> abx(n). Pipelined: a register (z^-1) between a and b gives ax(n-1) and output abx(n-1).]
$P = f C V_{DD}^2$
Critical path cut in half:
double f -> double throughput
or
same f -> double time for each multiplication -> reduced V_DD
Reduction of Critical Path
Propagation delay of the original filter and the pipelined filter:
[Figure: critical path of the sequential filter, T_seq at supply V_0, compared with the critical path of the pipelined filter for M = 3, T_pipe, at supply V_0 and at the reduced supply βV_0.]
Pipelining, power consumption
The power consumption in the original architecture:
$P_{seq} = C_L V_{DD}^2 f$
The supply voltage can be reduced to $\beta V_{DD}$, $(0 < \beta < 1)$.
Hence, the power consumption of the pipelined filter is:
$P_{pipe} = C_L \beta^2 V_{DD}^2 f = \beta^2 P_{seq}$
Pipelining
Propagation delays of the sequential and the pipelined architecture:
$T_{seq} = \frac{C_L V_{DD}}{k (V_{DD} - V_T)^2}$, $T_{pipe} = \frac{(C_L / M) \beta V_{DD}}{k (\beta V_{DD} - V_T)^2}$
The capacitance in each stage has been reduced.
Since the same f is maintained, $T_{seq} = T_{pipe}$:
$M (\beta V_{DD} - V_T)^2 = \beta (V_{DD} - V_T)^2$
Pipelining
$M (\beta V_{DD} - V_T)^2 = \beta (V_{DD} - V_T)^2$
[Figure: 5-tap FIR filter with input x(n), delay elements D, multipliers h0 to h4 and output y(n).]
$T_{mult} = 3$, $T_{add} = 1$
M = ?
$T_{original} = T_{mult} + 4 T_{add} = 7$ t.u.
$T_{pipelined} = 4 T_{add} = 4$ t.u.
$M = \frac{7}{4}$
If M doesn't divide the critical path evenly you have to use the real M, i.e. how big the actual reduction of the critical path is.
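The relation M(βV_DD - V_T)^2 = β(V_DD - V_T)^2 is a quadratic in β and can be solved directly. The sketch below uses M = 7/4 from the example; the supply and threshold voltages (5 V and 0.6 V) and the function name `solve_beta` are assumptions for illustration only.

```python
import math


def solve_beta(M, v_dd, v_t):
    """Solve M*(beta*V_DD - V_T)^2 = beta*(V_DD - V_T)^2 for the supply scaling beta.

    Expanding gives a*beta^2 + b*beta + c = 0. The larger root is returned,
    because the smaller one would put the supply below the threshold voltage.
    """
    a = M * v_dd ** 2
    b = -(2 * M * v_dd * v_t + (v_dd - v_t) ** 2)
    c = M * v_t ** 2
    disc = math.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a)


M = 7 / 4                   # actual critical-path reduction from the example
V_DD, V_T = 5.0, 0.6        # assumed supply and threshold voltages

beta = solve_beta(M, V_DD, V_T)
power_ratio = beta ** 2     # P_pipe / P_seq = beta^2
```

The L-parallel case has the same form, with the block size L in place of M.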
Example: simple datapath
$P = f C V^2$
C = the total switched capacitance
[Figure: datapath clocked at f: registers for the inputs a, b and c, followed by combinational logic ending in a Compare block.]
Pipelining
$P_{pipe} = f \cdot 1.1C \cdot (0.58V)^2 \approx 0.37P$
Increased C due to the extra registers.
[Figure: the same datapath with a pipeline stage (extra registers) inserted before the Compare block.]
Parallel Processing
[Figure: serial system x(n) -> a -> b -> abx(n) with x(n) = x(0), x(1), x(2), ..., and the 2-parallel system with x(2k) = x(0), x(2), x(4), ... and x(2k+1) = x(1), x(3), x(5), ... each passing through its own a, b chain, k = 0, 1, 2, 3, ...]
Two samples are processed in parallel:
same f but two samples -> double throughput
or
reduce f and two samples -> same throughput and reduced V_DD
Parallel Processing for Low Power
Total capacitance, C, is increased by a factor L.
To maintain the same sample rate, f is reduced to f/L.
f reduced -> V_DD can be reduced.
Parallel Processing
$P_{para} = (L C_L) (\beta V_{DD})^2 \frac{f}{L} = \beta^2 P_{seq}$
Parallel Processing for Low Power
[Figure: sequential critical path T_seq at supply V_0; parallel critical path for L = 3, 3 T_seq, at the reduced supply βV_0.]
Propagation delay of the L-parallel system is given by $T_{par} = L T_{seq}$:
$L \frac{C_L V_{DD}}{k (V_{DD} - V_T)^2} = \frac{C_L \beta V_{DD}}{k (\beta V_{DD} - V_T)^2}$
$L (\beta V_{DD} - V_T)^2 = \beta (V_{DD} - V_T)^2$
Parallel Processing
$P_{par} = 0.5f \cdot 2.15C \cdot (0.58V)^2 \approx 0.36P$
The factor 2.15 is due to the multiplexer.
[Figure: two copies of the datapath (registers a, b, c and Compare), each clocked at f/2, with the outputs multiplexed.]
Parallel Processing and Pipelining
$P_{par,pipe} = 0.5f \cdot 2.35C \cdot (0.4V)^2 \approx 0.19P$
[Figure: two copies of the pipelined datapath (registers a, b, c, a pipeline stage of registers, and Compare), each clocked at f/2.]
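The three power figures of the datapath example (0.37P, 0.36P and 0.19P) follow from P = f C V^2 with the scaling factors quoted on the slides. The sketch below only redoes that arithmetic; the helper name `relative_power` is an assumption.

```python
def relative_power(c_factor, v_factor, f_factor=1.0):
    """Power relative to the reference P = f*C*V^2 after scaling f, C and V."""
    return f_factor * c_factor * v_factor ** 2


pipelined = relative_power(c_factor=1.1, v_factor=0.58)
parallel = relative_power(c_factor=2.15, v_factor=0.58, f_factor=0.5)
parallel_pipelined = relative_power(c_factor=2.35, v_factor=0.4, f_factor=0.5)
```

Rounded to two decimals the results are 0.37, 0.36 and 0.19, so combining both techniques roughly halves the power again compared with using either one alone.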
