June 1992
The work described by this report was supported by NSF under contract
MIP88-22961
Abstract
This report presents an extension to Booth's algorithm for binary multiplication. Most implementations that utilize Booth's algorithm use the 2 bit version, which reduces the number of partial
products required to half that required by a simple add and shift method. Further reduction in the
number of partial products can be obtained by using higher order versions of Booth's algorithm,
but it is necessary to generate multiples of one of the operands (such as 3 times an operand) by
the use of a carry propagate adder. This carry propagate addition introduces significant delay and
additional hardware. The algorithm described in this report produces such difficult multiples in a
partially redundant form, using a series of small length adders. These adders operate in parallel
with no carries propagating between them. As a result, the delay introduced by multiple generation is minimized and the hardware needed for the multiple generation is also reduced, due to the
elimination of expensive carry lookahead logic.
Key Words and Phrases: multiplication, Booth's algorithm, redundant multiples, computer
arithmetic
Copyright © 1992
by
Gary Bewick
Michael J. Flynn
Contents
1 Introduction
2 Background
3 Redundant Booth
4 Summary
1 Introduction
Multiplication is a basic arithmetic operation that is important in microprocessors, digital signal
processing, and other modern electronic machines. VLSI designers have recognized this importance
by dedicating significant resources and area to integer and floating point multipliers. As a result,
it is desirable to reduce the cost of these multipliers by using efficient algorithms that do not
compromise performance. This report describes an extension to Booth's algorithm for binary
multiplication that reduces the cost of such high performance multipliers.
To explain this scheme, background material on existing algorithms is presented. These algorithms
are then extended to produce the new method. Since the method is somewhat complex, this report
will attempt to stay away from implementation details, but instead concentrate on the algorithms
in a hardware independent manner.
In order that the basic algorithms are not obscured with small details, unsigned multiplication only
will be considered here, but the algorithms presented are easily generalized to deal with signed
numbers.
2 Background
2.1 Add and Shift
The first and simplest method of multiplication to be addressed is the add and shift multiplication
algorithm. This algorithm conditionally adds together copies of the multiplicand (partial products)
to produce the final product, and is illustrated in Figure 1, for a 16 x 16 multiply. Each dot is
a placeholder for a single bit which can be a zero or one. Each horizontal row of dots represents a
single copy of the multiplicand, M, (i.e. one partial product), which is conditioned upon a particular
bit of the multiplier. The conditioning algorithm is shown in the selection table in the upper left
hand corner of the figure. The multiplier itself is represented on the right edge of the figure, with
the least significant bit at the top. The final product is represented by the row of 32 horizontal
dots at the bottom of the figure.
[Figure 1: dot diagram of a 16 x 16 add and shift multiply. Sixteen rows of partial product dots, one per multiplier bit; selection table at upper left; multiplier bits on the right edge with the Lsb at the top; 32 bit product at the bottom.]
An example of the add and shift algorithm using actual numbers is shown in Figure 2.
[Figure 2: add and shift example]
Multiplier = 63669 (decimal) = 1111100010110101
Multiplicand (M) = 40119 (decimal) = 1001110010110111

Multiplier bit   Selection   Partial product (each row shifted one position left of the row above)
1 (Lsb)          M           1001110010110111
0                0           0000000000000000
1                M           1001110010110111
0                0           0000000000000000
1                M           1001110010110111
1                M           1001110010110111
0                0           0000000000000000
1                M           1001110010110111
0                0           0000000000000000
0                0           0000000000000000
0                0           0000000000000000
1                M           1001110010110111
1                M           1001110010110111
1                M           1001110010110111
1                M           1001110010110111
1 (Msb)          M           1001110010110111

Product = 2554336611 (decimal) = 10011000010000000001010101100011
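The add and shift procedure can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a hardware description; the function name `add_and_shift` and the `width` parameter are choices made for this sketch and do not come from the report.

```python
def add_and_shift(multiplicand: int, multiplier: int, width: int = 16) -> int:
    """Unsigned add and shift multiply: one partial product per multiplier bit."""
    product = 0
    for i in range(width):
        bit = (multiplier >> i) & 1           # multiplier bit i, Lsb first
        partial = multiplicand if bit else 0  # selection: bit 0 -> 0, bit 1 -> M
        product += partial << i               # shift by the bit position, then add
    return product


if __name__ == "__main__":
    # Operands of the worked example: M = 40119, multiplier = 63669
    print(add_and_shift(40119, 63669))
```

With the operands of the worked example the result agrees with the product 2554336611 given in the figure.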
Figure 3.
This figure shows the selection logic for a single partial product (a single row of
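[Figure 3: partial product selection logic for the add and shift multiplier. Inputs: the multiplicand (Msb to Lsb) and one multiplier bit. Output: one partial product (Msb to Lsb).]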
2.3 Booth 3
Shift amounts between adjacent partial products of greater than 2 are also possible [2], with a
corresponding reduction in the height and number of dots in the dot diagram. A 3 bit Booth dot
diagram is shown in Figure 7, and an example is shown in Figure 8. Each partial product
could be from the set {±0, ±M, ±2M, ±3M, ±4M}. All multiples with the exception of 3M are
[Figure: 16 bit Booth 2 multiply dot diagram. Nine rows of partial product dots, each shifted two positions left of the row above, with sign encoding bits (S and 1) at the left of each row; multiplier bits on the right edge with the Lsb at the top and zeros appended below the Lsb and above the Msb; product at the bottom.]

Booth 2 partial product selection table:

Multiplier bits   Selection
000               +0
001               + Multiplicand
010               + Multiplicand
011               +2 x Multiplicand
100               -2 x Multiplicand
101               - Multiplicand
110               - Multiplicand
111               -0

[Figure: Booth 2 example, using the same operands as Figure 2. Rows are listed from the Lsb end of the multiplier and are shown as drawn, including the sign encoding bits at the left; S is the sign bit added at the Lsb of the row.]

Selection   Partial product row      S
+M          10001001110010110111     0
+M          1101001110010110111      0
-M          1010110001101001000      1
-M          1010110001101001000      1
+M          1101001110010110111      0
-2M         1001100011010010001      1
-0          1011111111111111111      1
-0          011111111111111111       1
+M          1001110010110111

Product = 10011000010000000001010101100011
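The Booth 2 selections in the example (+M, +M, -M, -M, +M, -2M, -0, -0, +M) can be reproduced with a small recoding routine. The sketch below assumes unsigned operands of even width and uses the standard Booth 2 selection table; the names `BOOTH2`, `booth2_digits` and `booth2_multiply` are hypothetical, and +0 and -0 are both represented by 0.

```python
# Booth 2 selection table: 3 multiplier bits -> multiple of M (+0 and -0 both map to 0)
BOOTH2 = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
          0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def booth2_digits(multiplier: int, width: int = 16) -> list:
    """Recode an unsigned multiplier into Booth 2 digits, Lsb group first."""
    padded = multiplier << 1        # append a 0 below the Lsb
    groups = width // 2 + 1         # 9 overlapping groups for a 16 bit unsigned multiplier
    return [BOOTH2[(padded >> (2 * i)) & 0b111] for i in range(groups)]


def booth2_multiply(multiplicand: int, multiplier: int, width: int = 16) -> int:
    """Sum the selected multiples, each weighted by its 2 bit shift."""
    digits = booth2_digits(multiplier, width)
    return sum(d * multiplicand * 4 ** i for i, d in enumerate(digits))


if __name__ == "__main__":
    print(booth2_digits(63669))
    print(booth2_multiply(40119, 63669))
```

Only the multiples M and 2M are needed, and 2M is a left shift of M, which is why Booth 2 requires no adder for multiple generation.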
3 Redundant Booth
This section presents a new variation on the Booth 3 algorithm, which eliminates much of the delay
and part of the hardware associated with the multiple generation, yet produces a dot diagram
which can be made to approach that of the conventional Booth 3 algorithm. Before introducing
this variation, a simple and similar method (which is not particularly hardware efficient) is explained.
This method is then extended to produce the new variation. Methods of further generalizing to a
Booth 4 algorithm are then discussed.
[Figure: 16 bit Booth 2 partial product selection logic. A Booth decoder takes one multiplier group and produces Select M and Select 2M, which drive a row of And/Or/ExclusiveOr blocks (one shown, "12 more" indicated) that form the partial product from the multiplicand.]

[Figure 7: 16 bit Booth 3 multiply dot diagram. Six rows of partial product dots, each shifted three positions left of the row above, with sign encoding bits (S and 1) at the left of each row; multiplier bits on the right edge with the Lsb at the top and zeros appended below the Lsb and above the Msb; 32 bit product at the bottom.]

Booth 3 partial product selection table:

Multiplier bits   Selection             Multiplier bits   Selection
0000              +0                    1000              -4 x Multiplicand
0001              + Multiplicand        1001              -3 x Multiplicand
0010              + Multiplicand        1010              -3 x Multiplicand
0011              +2 x Multiplicand     1011              -2 x Multiplicand
0100              +2 x Multiplicand     1100              -2 x Multiplicand
0101              +3 x Multiplicand     1101              - Multiplicand
0110              +3 x Multiplicand     1110              - Multiplicand
0111              +4 x Multiplicand     1111              -0
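[Figure 8: Booth 3 example, using the same operands as Figure 2. Rows are listed from the Lsb end of the multiplier and are shown as drawn, including the sign encoding bits at the left; S is the sign bit added at the Lsb of the row. The bits of the first row did not survive extraction.]

Selection   Partial product row       S
-3M         (not recovered)
-M          110110110001101001000     1
+3M         111011101011000100101     0
-4M         110011000110100100011     1
-0          10111111111111111111      1
+2M         10011100101101110

Product = 10011000010000000001010101100011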
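The Booth 3 selections can be computed arithmetically instead of by table lookup: within each overlapping group of four multiplier bits, the weights are -4, 2, 1 and 1 from the most significant bit down. The sketch below assumes unsigned operands; the names `booth3_digits` and `booth3_multiply` are hypothetical. It also makes the cost of the hard multiple explicit, since 3M is the only multiple that needs a real addition.

```python
def booth3_digits(multiplier: int, width: int = 16) -> list:
    """Recode an unsigned multiplier into Booth 3 digits (-4..+4), Lsb group first."""
    padded = multiplier << 1                 # append a 0 below the Lsb
    groups = -(-(width + 1) // 3)            # ceil((width + 1) / 3): 6 groups for 16 bits
    digits = []
    for i in range(groups):
        g = (padded >> (3 * i)) & 0b1111     # 4 bit group, overlapping its neighbour by 1 bit
        digits.append(-4 * (g >> 3) + 2 * ((g >> 2) & 1) + ((g >> 1) & 1) + (g & 1))
    return digits


def booth3_multiply(multiplicand: int, multiplier: int, width: int = 16) -> int:
    m = multiplicand
    # M, 2M and 4M are shifts; 3M = 2M + M is the hard multiple that needs an adder
    multiples = {0: 0, 1: m, 2: m << 1, 3: (m << 1) + m, 4: m << 2}
    total = 0
    for i, d in enumerate(booth3_digits(multiplier, width)):
        magnitude = multiples[abs(d)]
        total += (magnitude if d >= 0 else -magnitude) * 8 ** i
    return total


if __name__ == "__main__":
    print(booth3_digits(63669))
    print(booth3_multiply(40119, 63669))
```

For the example multiplier the digits come out as -3, -1, +3, -4, 0, +2, matching the selections in the Booth 3 example, with six partial products instead of the nine needed by Booth 2.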
[Figure: 16 bit Booth 3 partial product selection logic. A Booth decoder takes one multiplier group and produces Select M, Select 2M, Select 3M and Select 4M, which drive a row of 18 multiplexer blocks (1 of 18 shown).]

Booth 4 partial product selection table:

Multiplier bits   Selection             Multiplier bits   Selection
00000             +0                    10000             -8 x Multiplicand
00001             + Multiplicand        10001             -7 x Multiplicand
00010             + Multiplicand        10010             -7 x Multiplicand
00011             +2 x Multiplicand     10011             -6 x Multiplicand
00100             +2 x Multiplicand     10100             -6 x Multiplicand
00101             +3 x Multiplicand     10101             -5 x Multiplicand
00110             +3 x Multiplicand     10110             -5 x Multiplicand
00111             +4 x Multiplicand     10111             -4 x Multiplicand
01000             +4 x Multiplicand     11000             -4 x Multiplicand
01001             +5 x Multiplicand     11001             -3 x Multiplicand
01010             +5 x Multiplicand     11010             -3 x Multiplicand
01011             +6 x Multiplicand     11011             -2 x Multiplicand
01100             +6 x Multiplicand     11100             -2 x Multiplicand
01101             +7 x Multiplicand     11101             - Multiplicand
01110             +7 x Multiplicand     11110             - Multiplicand
01111             +8 x Multiplicand     11111             -0
[Figure: 16 bit Booth 3 multiply dot diagram in which each partial product occupies two rows of dots (a redundant form), with sign encoding bits (S and 1) at the left of each pair; multiplier bits on the right edge with the Lsb at the top; 32 bit product at the bottom.]

[Figure: example for the two row form, using the same operands as Figure 2. 3M = 120357 (decimal) is carried as the unsummed pair 2M = 10011100101101110 and M = 01001110010110111. Rows are shown as drawn, including the sign encoding bits at the left of the first row of each pair.]

Selection   First row                 Second row          S
-3M         0111001100011010010001    10110001101001000   1
-M          110010110001101001000     11111111111111111   1
+3M         111010011100101101110     01001110010110111   0
-4M         110001100011010010001     01100011010010001   1
-0          10011111111111111111      11111111111111111   1
+2M         10011100101101110

Product = 10011000010000000001010101100011
[Figure: computing 3M in a partially redundant form. M and 2M feed a row of 4 bit adders that operate in parallel; each adder's carry out (C) is kept as a separate bit instead of propagating into the next adder, so 3M appears as a row of sum dots plus C bits.]

[Figure: negating a number in partially redundant form. The dots and the C bits are complemented and a 1 is added; a compensation constant (groups of 1 1 1 aligned with the C bits) accounts for the complemented C bits.]
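The abstract describes generating the difficult multiple with a series of small adders that operate in parallel, with no carries propagating between them. A minimal software sketch of that idea follows. The 4 bit adder length matches the figure labels, but the exact alignment of the adders within the word is an assumption of this sketch, as are the names `partial_3m`, `sum_word` and `carry_word`.

```python
def partial_3m(m: int, width: int = 16, chunk: int = 4):
    """Return (sum_word, carry_word) such that sum_word + carry_word == 3 * m.

    Each chunk of M and 2M is added by its own small adder. The carry out of
    a chunk is NOT fed into the next chunk; it is recorded as a separate bit.
    """
    a, b = m, m << 1                          # M and 2M
    mask = (1 << chunk) - 1
    sum_word = 0
    carry_word = 0
    for low in range(0, width + 2, chunk):    # 3M needs width + 2 bits
        s = ((a >> low) & mask) + ((b >> low) & mask)   # independent small adder
        sum_word |= (s & mask) << low                   # sum bits stay in place
        carry_word |= (s >> chunk) << (low + chunk)     # unresolved carry out
    return sum_word, carry_word


if __name__ == "__main__":
    s, c = partial_3m(40119)
    print(bin(s), bin(c), s + c)
```

The delay of this step is that of one short adder regardless of operand length, at the price of a few extra carry bits that must later be summed along with the partial products.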
[Figure 15: 16 bit Booth 3 multiply dot diagram with biased partial products. Six rows of partial product dots with sign encoding bits (S and 1) at the left; multiplier bits on the right edge with the Lsb at the top; a compensation constant row; 32 bit product at the bottom.]

Selection table for Figure 15:

Multiplier bits   Selection               Multiplier bits   Selection
0000              K + 0                   1000              K - 4 x Multiplicand
0001              K + Multiplicand        1001              K - 3 x Multiplicand
0010              K + Multiplicand        1010              K - 3 x Multiplicand
0011              K + 2 x Multiplicand    1011              K - 2 x Multiplicand
0100              K + 2 x Multiplicand    1100              K - 2 x Multiplicand
0101              K + 3 x Multiplicand    1101              K - Multiplicand
0110              K + 3 x Multiplicand    1110              K - Multiplicand
0111              K + 4 x Multiplicand    1111              K - 0

[Figure: transforming a multiple into K + Multiple. The C bits of the multiple and the bits of K are combined by summing, producing the bits labelled X (EXOR) and Y (OR); the resulting rows are labelled K + Multiple and 2K.]
bits of K + Multiple and adding 1. This is exactly the same procedure used to obtain the negative
of a number when it is represented in its non-redundant form.
This partially redundant form satisfies the two conditions presented earlier, that is it has the same
representation for both positive and negative multiples, and also it is easy to generate the negative
given the positive form (the entries from the right side of the table in Figure 15 will continue to be
considered as negative multiples).
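The negation property can be checked numerically against the values listed in the report's worked example (K = 000010001000100000 and the words given for K+M through K+4M). Two things in the sketch below are assumptions rather than statements from the report: that the three loose bits printed with each word are its extra (Y) bits, most significant first, and that they sit at the bit positions of 2K (columns 14, 10 and 6). Both were inferred because they make every listed word sum to K plus the multiple. All names are hypothetical.

```python
M = 40119
K = 0b000010001000100000          # bias constant given in the report
Y_MASK = 2 * K                    # ASSUMPTION: extra bits live at columns 14, 10, 6
WIDTH = 18

# multiple -> (main word, three extra bits Msb first), as listed in the example
LISTED = {
    0: (0b000010001000100000, 0b000),
    1: (0b001011111010010111, 0b001),
    2: (0b010001101101001110, 0b101),
    3: (0b011011010000000101, 0b111),
    4: (0b100101000011111100, 0b110),
}


def spread_y(y3: int) -> int:
    """Place the three extra bits (Msb first) at columns 14, 10 and 6."""
    return (((y3 >> 2) & 1) << 14) | (((y3 >> 1) & 1) << 10) | ((y3 & 1) << 6)


def negate(word: int, y: int):
    """Complement every bit of the representation (the +1 is added by the caller)."""
    return ~word & ((1 << WIDTH) - 1), ~y & Y_MASK


def check_listed() -> list:
    """Return the multiples for which both identities hold."""
    ok = []
    for mult, (word, y3) in LISTED.items():
        y = spread_y(y3)
        nw, ny = negate(word, y)
        positive_ok = word + y == K + mult * M
        negative_ok = (nw + ny + 1) % (1 << WIDTH) == (K - mult * M) % (1 << WIDTH)
        if positive_ok and negative_ok:
            ok.append(mult)
    return ok


if __name__ == "__main__":
    print(check_listed())
```

Under these assumptions, complementing all bits and adding 1 turns K + Multiple into K - Multiple modulo 2^18, because the complemented extra bits contribute exactly 2K.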
[Figure: producing K + 3M in partially redundant form. M and 2M feed a row of 4 bit adders operating in parallel; the carry outs are combined with the bits of K to give the X and C bits of K + 3M, where K = 000010001000100000.]

[Figure 19: Producing other multiples. The rows K+0, K+M, K+2M and K+4M are formed directly, with fixed 0 and 1 bits inserted at the labelled bit positions (15 down to 10 shown).]
[Figure: 16 bit redundant Booth 3 multiply dot diagram. Each partial product row carries X, Y, S and C bits in addition to its dots and sign encoding bits; multiplier bits on the right edge with the Lsb at the top; 32 bit product at the bottom.]
Compensation constant = 11111101100100000000010011100000

[Figure: redundant Booth 3 example, using the same operands as Figure 2.]

Multiples in partially redundant form (main word, then the three extra bits printed with it):

K      = 000010001000100000   0 0 0
K+M    = 001011111010010111   0 0 1
K+2M   = 010001101101001110   1 0 1
K+3M   = 011011010000000101   1 1 1
K+4M   = 100101000011111100   1 1 0

Partial products, listed from the Lsb end of the multiplier and shown as drawn, including the sign encoding bits at the left:

Selection   Partial product row       Extra bits   S
K-3M        0111100100101111111010    0 0 0        1
K-M         110110100000101101000     1 1 0        1
K+3M        111011011010000000101     1 1 1        0
K-4M        110011010111100000011     0 0 1        1
K-0         10111101110111011111      1 1 1        1
2M          10011100101101110

Compensation constant = 11111101100100000000010011100000
Product = 10011000010000000001010101100011
[Figure: partial product selection logic for the redundant Booth 3 multiplier. The bits of K + 3M (including X and Y) and the multiplicand feed a row of mux blocks with data inputs 3D, D, 2D and 4D; each mux block is controlled by Select M, Select 2M, Select 3M, Select 4M and Invert, and produces one bit (Out) of the partial product. Bit positions 17 down to 10 are labelled.]

[Figure: producing K + 6M from K + 3M. The partially redundant K + 3M is shifted left to give 2K + 6M, and the C bits are then recombined with constant 0 and 1 bits using EXOR, AND and OR terms (labelled X, Y and Z) to give K + 6M.]
the multiplexers) can reduce this to 4, but this still means that there are more routing wires for a
multiple than there are dots in the multiple. Of course since there are now 3 multiples that must
be routed (3M, 5M, and 7M), these few extra wires may not be significant.
There are many other problems, which are inherited from the non-redundant Booth 4 algorithm.
Larger multiplexers are required: each multiplexer must choose from 8 possibilities, twice as many as for the
Booth 3 algorithm. There is also a smaller hardware reduction in going from Booth
3 to Booth 4 than there was in going from Booth 2 to Booth 3. Optimizations are also possible for
generation of the 3M multiple. These optimizations are not possible for the 5M and 7M multiples,
so the small adders that generate these multiples must be of a smaller length (for a given delay).
This means more dots in the partial product section to be summed.
Thus a redundant Booth 4 algorithm is possible to construct, but probably has little speed or
implementation advantage over the redundant Booth 3 algorithm. The hardware savings due to
the reduced number of partial products is exceeded by the cost of the adders needed to produce
the three hard multiples, and the increased complexity of the multiplexers required to select the
partial products.
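The trade-off between Booth 2, Booth 3 and Booth 4 can be tabulated with a short calculation. The sketch assumes an unsigned multiplier, so the partial product count is ceil((n + 1) / k) for an n bit multiplier and a k bit shift; the function name `booth_cost` and the dictionary keys are hypothetical.

```python
import math


def booth_cost(k: int, width: int = 16) -> dict:
    """Rough cost figures for a Booth k multiplier with an unsigned operand."""
    largest = 2 ** (k - 1)                    # largest multiple magnitude, e.g. 4M for Booth 3
    return {
        # one partial product per k multiplier bits, plus the appended zero bit
        "partial_products": math.ceil((width + 1) / k),
        # each multiplexer chooses among M, 2M, ..., largest x M
        "mux_inputs": largest,
        # odd multiples above M cannot be formed by shifting M and need an adder
        "hard_multiples": list(range(3, largest, 2)),
    }


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, booth_cost(k))
```

For 16 bits this gives 9, 6 and 5 partial products for Booth 2, 3 and 4. Going from Booth 3 to Booth 4 saves only one partial product while doubling the multiplexer inputs from 4 to 8 and tripling the hard multiples from one (3M) to three (3M, 5M, 7M), which is the diminishing return described above.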
[Figure: 16 bit redundant Booth 3 multiply dot diagram with two X and two Y bits per partial product row, plus the S and 1 sign encoding bits; constant row 11111101101011011011101110000000; multiplier bits on the right edge with the Lsb at the top and zeros appended below the Lsb and above the Msb; 32 bit product at the bottom.]
for a 4 bit carry interval). This accumulation can be avoided by choosing adder lengths which are
relatively prime to the shift amount between neighboring partial products (in this case, 3). This
spreads the Y bits out so that accumulation won't occur in any particular column.
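A simplified model illustrates the relatively prime argument. It assumes that the extra bits of partial product i fall at columns shift x i + adder length x j, for j = 1, 2, 3, and counts how many land in the same column. The model, the function name `max_stack` and the parameter defaults (6 partial products, 3 extra bits each) are assumptions made for this sketch, not figures taken from the report.

```python
from collections import Counter


def max_stack(adder_len: int, shift: int = 3,
              partial_products: int = 6, extra_bits: int = 3) -> int:
    """Largest number of extra bits that land in any single column."""
    columns = Counter(
        shift * i + adder_len * j             # column of extra bit j of partial product i
        for i in range(partial_products)
        for j in range(1, extra_bits + 1)
    )
    return max(columns.values())


if __name__ == "__main__":
    for length in (3, 4, 6):
        print(length, max_stack(length))
```

In this model, adder lengths of 3 or 6 (both sharing a factor with the shift of 3) stack three extra bits in one column, while a length of 4 leaves at most one per column.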
4 Summary
This report has described a new variation on conventional Booth multiplication algorithms. By
representing partial products in a partially redundant form, hard multiples can be computed without a slow, full length carry propagate addition. With such hard multiples available, a reduction
in the amount of hardware needed for summing partial products is then possible using the Booth 3
multiplication method. A detailed evaluation of implementations using this algorithm is currently
in progress, which will give precise answers to performance, area, and power improvements.
References
[1] A. D. Booth. A signed binary multiplication technique. Quarterly Journal of Mechanics and
Applied Mathematics, 4(2):236-240, June 1951.
[2] O. L. MacSorley. High-speed arithmetic in binary computers. Proceedings of the IRE, 49(1):67-91,
Jan 1961.
[3] Mark Santoro. Design and Clocking of VLSI Multipliers. PhD thesis, Stanford University, Oct
1989.
[4] Mark Santoro and Mark Horowitz. Spim: A pipelined 64x64b iterative array multiplier. IEEE
International Solid State Circuits Conference, pages 35-36, February 1988.
[5] Naofumi Takagi, Hiroto Yasuura, and Shuzo Yajima. High-speed VLSI multiplication algorithm
with a redundant binary addition tree. IEEE Transactions on Computers, C-34(9), Sept 1985.
[6] S. Waser and M. J. Flynn. Introduction to Arithmetic for Digital Systems Designers. Holt,
Rinehart and Winston, 1982.