Multiplier
Parallel Multiplier
$$A = \sum_{i=0}^{m-1} A_i 2^i; \quad 0 \le A \le 2^m - 1, \quad A_i \in \{0, 1\}$$

$$B = \sum_{j=0}^{n-1} B_j 2^j; \quad 0 \le B \le 2^n - 1, \quad B_j \in \{0, 1\}$$
for large n, m.
Parallel Multiplier: n = m = 4
Example: m = n = 4; note: r = m + n = 4 + 4 = 8.
$$
\begin{aligned}
P = A \cdot B &= \sum_{i=0}^{4-1} A_i 2^i \cdot \sum_{j=0}^{4-1} B_j 2^j \\
&= (A_0 2^0 + A_1 2^1 + A_2 2^2 + A_3 2^3) \cdot (B_0 2^0 + B_1 2^1 + B_2 2^2 + B_3 2^3) \\
&= A_0 B_0 2^0 + (A_0 B_1 + A_1 B_0) 2^1 + (A_0 B_2 + A_1 B_1 + A_2 B_0) 2^2 \\
&\quad + (A_0 B_3 + A_1 B_2 + A_2 B_1 + A_3 B_0) 2^3 + (A_1 B_3 + A_2 B_2 + A_3 B_1) 2^4 \\
&\quad + (A_2 B_3 + A_3 B_2) 2^5 + (A_3 B_3) 2^6 \\
&= P_0 2^0 + P_1 2^1 + P_2 2^2 + P_3 2^3 + P_4 2^4 + P_5 2^5 + P_6 2^6 + \underbrace{P_7 2^7}_{???}
\end{aligned}
$$
Parallel Multiplier: n = m = 4
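Each P_k keeps only the least-significant bit of the sum on its right-hand side; the higher bits carry over into the next column.
$$
\begin{aligned}
P_0 &= A_0 B_0 \\
P_1 &= A_0 B_1 + A_1 B_0 + 0 \\
P_2 &= A_0 B_2 + A_1 B_1 + A_2 B_0 + \text{previous carry} \\
&\;\;\vdots \\
P_6 &= A_3 B_3 + \text{previous carry} \\
P_7 &= \text{previous carry}
\end{aligned}
$$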
[Figure: array of adders (+). Each adder takes two operands and a carry-in and produces a sum and a carry-out.]
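The column-by-column scheme above can be sketched in software as follows. This is only an illustrative model, not the hardware layout from the slides: the function names `parallel_multiply_4x4` and `bits_to_int` are made up for this example, and the carries are handled as plain integers rather than as a network of adders.

```python
def parallel_multiply_4x4(a, b):
    """Multiply two 4-bit unsigned integers column by column.

    Returns the product bits [P_0, ..., P_7], least-significant bit first.
    """
    n = 4
    a_bits = [(a >> i) & 1 for i in range(n)]  # A_0 .. A_3
    b_bits = [(b >> j) & 1 for j in range(n)]  # B_0 .. B_3
    p_bits = []
    carry = 0
    for k in range(2 * n):  # columns 2^0 .. 2^7
        # Add all partial products A_i * B_j with i + j == k to the carry
        # coming from the previous column.
        column = carry + sum(
            a_bits[i] & b_bits[k - i]
            for i in range(n)
            if 0 <= k - i < n
        )
        p_bits.append(column & 1)  # P_k is the low bit of the column sum
        carry = column >> 1        # the rest carries into column k + 1
    return p_bits


def bits_to_int(bits):
    """Convert a list of bits (least-significant first) to an integer."""
    return sum(bit << k for k, bit in enumerate(bits))


if __name__ == "__main__":
    bits = parallel_multiply_4x4(15, 15)
    print(bits, bits_to_int(bits))
```

Column 7 contains no partial products, so P_7 is produced entirely by the carry out of column 6, which is why it does not show up in the expanded product.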
Parallel Multiplier
- Bus widths: a straightforward implementation requires two buses of width n bits and a third bus of width 2n bits, which is expensive to implement
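To see why the third bus needs 2n bits, one can check the width of the largest possible product. The helper name `product_bus_width` below is hypothetical and only serves this illustration.

```python
def product_bus_width(n_bits):
    """Number of bits needed for the largest product of two n-bit operands."""
    largest_operand = (1 << n_bits) - 1          # 2^n - 1
    largest_product = largest_operand * largest_operand
    return largest_product.bit_length()


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, product_bus_width(n))
```

For n = 4 the largest product is 15 · 15 = 225, which needs 8 bits.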
Shifter
- Q: When is scaling useful?
Shifter
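- To scale numbers down by a factor of 2 (shift right by one bit):
    x1 = 10 = [1 0 1 0]    x̂1 = [0 1 0 1] = 5
    x2 = 5 = [0 1 0 1]     x̂2 = [0 0 1 0] = 2 ≠ 5/2
    x3 = 8 = [1 0 0 0]     x̂3 = [0 1 0 0] = 4
- To add:
    Ŝ = x̂1 + x̂2 + x̂3 = 5 + 2 + 4 = 11 = [1 0 1 1]
    Scaling Ŝ back up by a factor of 2 gives
    S̃ = [1 0 1 1 0] = 22 ≠ 23 = 10 + 5 + 8 = S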
Shifter
- Consider the following related example. To scale numbers down by a factor of 2:
    x1 = 11 = [1 0 1 1]    x̂1 = [0 1 0 1] = 5 ≠ 11/2
    x2 = 5 = [0 1 0 1]     x̂2 = [0 0 1 0] = 2 ≠ 5/2
    x3 = 9 = [1 0 0 1]     x̂3 = [0 1 0 0] = 4 ≠ 9/2
- To add:
    Ŝ = x̂1 + x̂2 + x̂3 = 5 + 2 + 4 = 11 = [1 0 1 1]
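The two scaling examples can be reproduced with a few lines of Python. This is a minimal sketch, assuming unsigned integers and a plain right shift that discards the low bit; the function name `scaled_sum` is made up for this illustration.

```python
def scaled_sum(values, shift=1):
    """Scale each value down by 2**shift, add, then scale the sum back up.

    Returns (scaled values, sum of scaled values, rescaled sum).
    """
    scaled = [v >> shift for v in values]  # right shift drops the low bits
    s_hat = sum(scaled)                    # sum of the scaled-down numbers
    s_tilde = s_hat << shift               # scaled back up to the original range
    return scaled, s_hat, s_tilde


if __name__ == "__main__":
    for values in ([10, 5, 8], [11, 5, 9]):
        scaled, s_hat, s_tilde = scaled_sum(values)
        print(values, scaled, s_hat, s_tilde, "true sum:", sum(values))
```

Every odd input loses half a unit when shifted, so the rescaled sum falls short of the true sum by the number of odd inputs: by 1 in the first example and by 3 in the second.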
Barrel Shifter
Implementation of a 4-bit shift-right barrel shifter:
[Figure: 4-bit shift-right barrel shifter connecting the input bits to the output bits]
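Since the figure is not reproduced here, the following is a behavioural sketch of a 4-bit shift-right barrel shifter. It assumes a logical shift (zeros are shifted in from the left) and models the shifter as two multiplexer stages controlled by a 2-bit shift amount; the circuit on the slide may be organised differently, for example as a switch matrix. The function name `barrel_shift_right` is hypothetical.

```python
def barrel_shift_right(bits, shift):
    """Shift a 4-bit word right by 0..3 positions in a single pass.

    bits  : list of four bits, most-significant bit first
    shift : shift amount, 0 to 3
    """
    if len(bits) != 4 or not 0 <= shift <= 3:
        raise ValueError("expected 4 bits and a shift amount in 0..3")
    s0 = shift & 1         # control bit for the shift-by-1 stage
    s1 = (shift >> 1) & 1  # control bit for the shift-by-2 stage
    # Stage 1: each mux passes the bit straight through or takes its left neighbour.
    stage1 = [0] + bits[:3] if s0 else list(bits)
    # Stage 2: each mux passes the bit straight through or takes the bit two places left.
    stage2 = [0, 0] + stage1[:2] if s1 else stage1
    return stage2


if __name__ == "__main__":
    print(barrel_shift_right([1, 0, 1, 0], 1))
```

With the input [1 0 1 0] (10) and a shift of 1, the output is [0 1 0 1] (5), matching the scaling example from the shifter slides.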
[Figure: block diagram with Multiplier, Product Register, ADD/SUB, and Accumulator blocks]
- can implement A + BC operations
- clearing the accumulator at the right time (e.g., as an initialization to zero) provides the appropriate sum of products
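The behaviour described above can be modelled with a small class. This is a sketch under stated assumptions, not a description of any particular DSP: the class name `MacUnit` and its methods `clear`, `load`, and `mac` are invented for this example, and word lengths and overflow are ignored.

```python
class MacUnit:
    """Toy model of a multiply-accumulate unit."""

    def __init__(self):
        self.accumulator = 0

    def clear(self):
        """Initialise the accumulator to zero before a new sum of products."""
        self.accumulator = 0

    def load(self, a):
        """Preload the accumulator with A, for an A + B*C operation."""
        self.accumulator = a

    def mac(self, b, c, subtract=False):
        """Multiply b by c, then add the product to (or subtract it from) the accumulator."""
        product = b * c  # plays the role of the product register
        if subtract:
            self.accumulator -= product
        else:
            self.accumulator += product
        return self.accumulator


if __name__ == "__main__":
    unit = MacUnit()
    unit.load(3)
    print(unit.mac(4, 5))  # A + B*C with A = 3, B = 4, C = 5
```

Calling `clear()` and then `mac()` once per term yields a sum of products, which is the pattern used by the FIR filter later in the slides.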
Harvard Architecture
[Figure: processor connected to a program memory and a data memory, each through its own address and data lines]
[Figure: processor connected to two data memories, each through its own address and data lines]
On-Chip Memory
- on-chip = on-processor
- helps run DSP algorithms faster than when memory is off-chip
- dedicated address and data buses are available
- speed: on-chip memories should match the speed of the ALU operations
- size: the more chip area the memory takes, the less area is available for other DSP functions
Speed Issues
Hardware Architecture
Parallelism
Parallelism means:
- provision of multiple function units, which may operate in parallel to increase throughput
- multiple memories
- different ALUs for data and address computations
- advantage: algorithms can perform more than one operation at a time, increasing speed
Pipelining Example
Five steps:
Step 1: instruction fetch
Step 2: instruction decode
Step 3: operand fetch
Step 4: execute
Step 5: save
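The benefit of overlapping the five steps can be illustrated with a small timing model. It assumes an ideal pipeline in which every step takes one cycle and no stalls occur; the names `STAGES`, `pipeline_schedule`, and `total_cycles` are chosen for this sketch only.

```python
STAGES = ["fetch", "decode", "operand fetch", "execute", "save"]


def pipeline_schedule(num_instructions):
    """Return {cycle: {stage: instruction index}} for an ideal pipeline."""
    schedule = {}
    for instr in range(num_instructions):
        for stage_idx, stage in enumerate(STAGES):
            cycle = instr + stage_idx  # instruction i enters stage s at cycle i + s
            schedule.setdefault(cycle, {})[stage] = instr
    return schedule


def total_cycles(num_instructions, pipelined=True):
    """Cycles needed to finish all instructions, with or without pipelining."""
    if num_instructions == 0:
        return 0
    if pipelined:
        return len(STAGES) + num_instructions - 1
    return len(STAGES) * num_instructions


if __name__ == "__main__":
    for cycle, stages in sorted(pipeline_schedule(4).items()):
        print(cycle, stages)
    print(total_cycles(4), total_cycles(4, pipelined=False))
```

Under these assumptions four instructions finish in 8 cycles instead of 20, and once the pipeline is full one instruction completes every cycle.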
[Figure: a single MAC unit with two multiplexers, producing y(n)]
- At t = 0, initialization occurs.
  - Accumulator = 0
- At t = 2T
  - Accumulator = h(0)x(n) + h(1)x(n-1)
- At t = 3T
  - Accumulator = h(0)x(n) + h(1)x(n-1) + h(2)x(n-2)
- At t = 4T
  - Accumulator = h(0)x(n) + h(1)x(n-1) + h(2)x(n-2) + h(3)x(n-3)
- At t = 6T
  - Accumulator = h(0)x(n) + h(1)x(n-1) + h(2)x(n-2) + h(3)x(n-3) + h(4)x(n-4) + h(5)x(n-5)
- At t = 7T
  - Accumulator = h(0)x(n) + h(1)x(n-1) + h(2)x(n-2) + h(3)x(n-3) + h(4)x(n-4) + h(5)x(n-5) + h(6)x(n-6)
- At t = 8T
  - Accumulator = h(0)x(n) + h(1)x(n-1) + h(2)x(n-2) + h(3)x(n-3) + h(4)x(n-4) + h(5)x(n-5) + h(6)x(n-6) + h(7)x(n-7)
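The accumulation sequence above can be traced with a short loop. This is a sketch assuming an 8-tap filter computed on one MAC unit, one product per period T; the function name `fir_single_mac` and the argument name `x_history` (with `x_history[k]` standing for x(n-k)) are made up for this example.

```python
def fir_single_mac(h, x_history):
    """Compute one FIR output on a single MAC unit, one tap per cycle.

    h         : coefficients h(0) .. h(N-1)
    x_history : samples x(n), x(n-1), ..., x(n-N+1)
    Returns the accumulator value after each cycle T, 2T, ..., N*T.
    """
    accumulator = 0  # initialization at t = 0
    trace = []
    for k in range(len(h)):
        # The multiplexers select h(k) and x(n-k); the MAC adds their product.
        accumulator += h[k] * x_history[k]
        trace.append(accumulator)
    return trace


if __name__ == "__main__":
    h = [1, 2, 3, 4, 5, 6, 7, 8]
    x_history = [1, 1, 1, 1, 1, 1, 1, 1]
    trace = fir_single_mac(h, x_history)
    print(trace)       # accumulator after T, 2T, ..., 8T
    print(trace[-1])   # y(n), available after 8T
```

The last entry of the trace is y(n); with a single MAC unit it becomes available only after as many cycles as there are taps.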
[Figure: two MAC units side by side, each with two multiplexers, producing y(n)]
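One way to use two MAC units is to give each unit half of the taps and add the two partial sums at the end. The split used below (first half of the taps on one unit, second half on the other) is an assumption made for illustration, as is the requirement of an even number of taps; the slides show only the block diagram. The function name `fir_two_macs` is hypothetical.

```python
def fir_two_macs(h, x_history):
    """Compute one FIR output on two MAC units working in parallel.

    h         : coefficients h(0) .. h(N-1), N even
    x_history : samples x(n), x(n-1), ..., x(n-N+1)
    Returns (y, cycles), where cycles is the number of MAC cycles used.
    """
    if len(h) % 2 != 0:
        raise ValueError("this sketch assumes an even number of taps")
    half = len(h) // 2
    acc_a = 0  # accumulator of the first MAC unit
    acc_b = 0  # accumulator of the second MAC unit
    for k in range(half):
        # Both MAC units perform one multiply-accumulate in the same cycle.
        acc_a += h[k] * x_history[k]
        acc_b += h[half + k] * x_history[half + k]
    return acc_a + acc_b, half


if __name__ == "__main__":
    y, cycles = fir_two_macs([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8)
    print(y, cycles)
```

Under these assumptions the 8-tap output needs 4 MAC cycles instead of 8, plus one final addition to combine the two accumulators.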
Parts of these slides are reused from slides by Dr. Deepa Kundur for the course Real-Time Digital Signal Processing at the University of Toronto.