You are on page 1of 23

CMPEN 411

VLSI Digital Circuits


Spring 2009

Lecture 20: Multiplier Design

[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003


J. Rabaey, A. Chandrakasan, B. Nikolic]

Sp09 CMPEN 411 L20 S.1


Review: Basic Building Blocks
 Datapath
Execution units
- Adder, multiplier, divider, shifter, etc.
Register file and pipeline registers
Multiplexers, decoders

 Control
Finite state machines (PLA, ROM, random logic)

 Interconnect
Switches, arbiters, buses

 Memory
Caches (SRAMs), TLBs, DRAMs, buffers

Sp09 CMPEN 411 L20 S.2


The Binary Multiplication

1 0 1 0 1 0 Multiplicand
x 1 0 1 1 Multiplier
1 0 1 0 1 0
1 0 1 0 1 0

0 0 0 0 0 0 Partial products

+ 1 0 1 0 1 0

1 1 1 0 0 1 1 1 0 Result

Sp09 CMPEN 411 L20 S.3


Multiply Operation
 Multiplication is just a a lot of additions

N
multiplicand
multiplier

partial
N product can be formed in parallel
array

double precision product


2N

Sp09 CMPEN 411 L20 S.4


Multiplication Approaches
 Right shift and add
Partial product array rows are accumulated from top to bottom
on an N-bit adder
- After each addition, right shift (by one bit) the accumulated partial
product to align it with the next row to add
Time for N bits Tserial_mult = O(N Tadder) = O(N2) for a RCA
 Making it faster
Use a faster adder
Use higher radix (e.g., base 4) multiplication – O(N/2 Tadder)
- Use multiplier recoding to simplify multiple formation (booth)
Form the partial product array in parallel and add it in parallel
 Making it smaller (i.e., slower)
Use serial-parallel mult
Use an array multiplier
- Very regular structure with only short wires to nearest neighbor
cells. Thus, very simple and efficient layout in VLSI Can be easily
Sp09 CMPEN 411 L20 S.5
and efficiently pipelined
Serial-parallel multiplier structure

Sp09 CMPEN 411 L20 S.6


The Array Multiplier

X3 X2 X1 X0 Y0

X3 X2 X1 X0 Y1

HA FA FA HA

X3 X2 X1 X0 Y2 Z1

FA FA FA HA

X3 X2 X1 X0 Y3

FA FA FA HA

Z7 Z6 Z5 Z4 Z3

Sp09 CMPEN 411 L20 S.7


Booth multiplier
 Encoding scheme to reduce number of stages in
multiplication.
 Performs two bits of multiplication at once—requires half
the stages.
 Each stage is slightly more complex than simple
multiplier, but adder/subtracter is almost as small/fast as
adder.

Sp09 CMPEN 411 L20 S.11


Booth encoding
 Two’s-complement form of multiplier:
y = -2nyn + 2n-1yn-1 + 2n-2yn-2 + ... (first bit is the sign bit)
(example, y=18=010010 y= -18 = 101110 )

 Rewrite using 2a = 2a+1 - 2a:


y = 2n(yn-1-yn) + 2n-1(yn-2 -yn-1) + 2n-2(yn-3 -yn-2) + ...

 Consider first two terms: by looking at three bits of y, we


can determine whether to add x, 2x to partial product.

Sp09 CMPEN 411 L20 S.12


Booth actions

y = 2n(yn-1-yn) + 2n-1(yn-2 -yn-1) + 2n-2(yn-3 -yn-2) + ...

 Consider first two terms: by looking at three bits of y, we


can determine whether to add x, 2x to partial product.
yi yi-1 yi-2 increment

000 0
001 x
010 x
011 2x
100 -2x
101 -x
110 -x
111 0
Sp09 CMPEN 411 L20 S.13
Booth example
 x = 1001 (910), y = 0111 (710).
 P0 = 00000000
 y3y2y1=011 y1y0y-1=11(0)
 y1y0y-1 = 110, P1 = P0 - (1001) = 11110111
 x shift left for 2 bits to be 100100
 y3y2y1 = 011, P2 = P1+ (10*100100) =
11110111+01001000 = 001111111 (6310)
 An array multiplier needs N addtions, booth multiplier
needs only N/2 additions

Sp09 CMPEN 411 L20 S.14


Review: A 64-bit Adder/Subtractor
add/subt C0=Cin
 Ripple Carry Adder (RCA) A0 1-bit
built out of 64 FAs FA S0
B0 C1
 Subtraction – complement
all subtrahend bits (xor A1 1-bit
FA S1
gates) and set the low B1
C2
order carry-in
A2 1-bit
FA S2
 RCA
B2 C3
 advantage: simple logic,

...
so small (low cost)
 disadvantage:
C63
slow (O(N)
for N bits) and lots of A63 1-bit
glitching (so lots of energy FA S63
consumption) B63
C64=Cout

Sp09 CMPEN 411 L20 S.15


Booth structure

Sp09 CMPEN 411 L20 S.16


Wallace-Tree Multiplier

Partial products First stage


6 5 4 3 2 1 0 6 5 4 3 2 1 0 Bit position

(a) (b)

Second stage Final adder


6 5 4 3 2 1 0 6 5 4 3 2 1 0

FA HA
(c) (d)

Sp09 CMPEN 411 L20 S.17


Wallace-Tree Multiplier

x3y2 x2y2 x3y1 x1y2 x3y0 x1y1 x2y0 x0y1


Partial products x3y3 x2y3 x1y3 x0y3 x0y2 x1y0 x0y

First stage
HA HA

Second stage FA FA FA FA

Final adder
z7 z6 z5 z4 z3 z2 z1 z0

Full adder = (3,2) compressor


Sp09 CMPEN 411 L20 S.18
Making it Faster: Tree Multiplier Structure
0 D
Q (‘ier)
0 D
multiple 0 D
forming 0 D (‘icand)
circuits

partial
product mux
array

interconnect
+
reduction
tree
reduction
tree (log N)
+
fast carry
propagate CPA (log N)
adder (CPA)
P (product)

Sp09 CMPEN 411 L20 S.19


(4,2) Counter
 Built out of two (3,2) counters (just FA’s!)
all of the inputs (4 external plus one internal) have
the same weight (i.e., are in the same bit position)
the internal carry output is fed to the next higher
weight position (indicated by the )

(3,2)

Note: Two carry outs - one


(3,2)
“internal” and one “external”

Sp09 CMPEN 411 L20 S.20


Tiling (4,2) Counters

(3,2) (3,2) (3,2)

(3,2) (3,2) (3,2)

 Reduces columns four high to columns only two high


Tiles with neighboring (4,2) counters
Internal carry in at same “level” (i.e., bit position weight) as the
internal carry out

Sp09 CMPEN 411 L20 S.22


4x4 Partial Product Array Reduction
 Fast 4x4 multiplication using (4,2) counters
 How would you
lay it out?
multiplicand
multiplier multiplicand

multiplier
partial
product
array

reduced pp five (4,2) counters


array (to
CPA) 5-bit CPA
double
precision
product 8-bit product

Sp09 CMPEN 411 L20 S.24


8x8 Partial Product Array Reduction
‘icand
 Wallace tree
‘ier
multiplier

partial two rows of


product nine (4,2)
array counters

reduced one row of


partial thirteen
product (4,2)
array counters

to a 13-bit fast CPA


Sp09 CMPEN 411 L20 S.25
An 8x8 Multiplier Layout
 How should it be laid out?
multiplicand

multiplier
nine (4,2) counters

nine (4,2) counters

thirteen (4,2) counters

13-bit CPA

Sp09 CMPEN 411 L20 S.26


Multipliers —Summary

• Optimization Goals Different Vs Binary Adder

• Once Again: Identify Critical Path

• Other possible techniques


- Logarithmic versus Linear (Wallace Tree Mult)
- Data encoding (Booth)
- Pipelining
FIRST GLIMPSE AT SYSTEM LEVEL OPTIMIZATION

Sp09 CMPEN 411 L20 S.32


Next Lecture and Reminders
 Next lecture
Shifters, decoders, and multiplexers
- Reading assignment – Rabaey, et al, 11.5-11.6

Sp09 CMPEN 411 L20 S.33

You might also like