Professional Documents
Culture Documents
[Adapted from Rabaeys Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
Datapath
Execution units
- Adder, multiplier, divider, shifter, etc.
Control
Interconnect
Memory
1 0 1 0 1 0 x
1 0 1 1 1 0 1 0 1 0
Multiplicand Multiplier
1 0 1 0 1 0
0 0 0 0 0 0
Partial products
1 0 1 0 1 0
1 1 1 0 0 1 1 1 0
Result
Multiply Operation
N
multiplicand multiplier partial product array
2N
Multiplication Approaches
Partial product array rows are accumulated from top to bottom on an N-bit adder
- After each addition, right shift (by one bit) the accumulated partial product to align it with the next row to add
Making it faster
Use a faster adder Use higher radix (e.g., base 4) multiplication O(N/2 Tadder)
- Use multiplier recoding to simplify multiple formation (booth)
Form the partial product array in parallel and add it in parallel Use serial-parallel mult Use an array multiplier
Sp12
- Very regular structure with only short wires to nearest neighbor cells. Thus, very simple and efficient layout in VLSI Can be easily CMPEN 411 L20 S.5 and efficiently pipelined
X3 X3
HA X3 X2
X2 X2
FA X1
X1 X1
FA X0
X0 X0
HA Y1 Z0
Y0
Y2
Z1
FA X3
FA
FA X1
FA
FA X0
HA Y3
HA
Z2
X2
FA
Z7
Z6
Z5
Z4
Z3
Booth multiplier
Booth encoding
Consider first two terms: by looking at three bits of y, we can determine whether to add x, 2x to partial product.
Booth actions
Consider first two terms: by looking at three bits of y, we can determine whether to add x, 2x to partial product.
yi yi-1 yi-2 increment 000 001 010 011 100 101 0 x x 2x -2x -x
110
111
Sp12 CMPEN 411 L20 S.13
-x
0
Booth example
An array multiplier needs N addtions, booth multiplier needs only N/2 additions
Ripple Carry Adder (RCA) built out of 64 FAs Subtraction complement all subtrahend bits (xor gates) and set the low order carry-in RCA
S1
S2
B2
disadvantage: slow (O(N) for N bits) and lots of glitching (so lots of energy consumption)
C63
A63
B63
1-bit FA
S63
C64=Cout
Booth structure
Wallace-Tree Multiplier
Partial products 6 5 4 3 2 1 0 First stage 6 5 4 3 2 1 0 Bit position
FA
HA
(c)
(d)
Wallace-Tree Multiplier
interconnect
(4,2) Counter
all of the inputs (4 external plus one internal) have the same weight (i.e., are in the same bit position) the internal carry output is fed to the next higher weight position (indicated by the )
(3,2)
(3,2)
(3,2)
(3,2)
(3,2)
(3,2)
(3,2)
(3,2)
Tiles with neighboring (4,2) counters Internal carry in at same level (i.e., bit position weight) as the internal carry out
Fast 4x4 multiplication using (4,2) counters How would you lay it out?
multiplicand multiplier partial product array reduced pp array (to CPA) double precision product five (4,2) counters 5-bit CPA
multiplicand
multiplier
8-bit product
icand ier
Multipliers Summary
Optimization Goals Different Vs Binary Adder Once Again: Identify Critical Path Other possible techniques - Logarithmic versus Linear (Wallace Tree Mult) - Data encoding (Booth) - Pipelining FIRST GLIMPSE AT SYSTEM LEVEL OPTIMIZATION
Next lecture