L8/9: Arithmetic Structures
Acknowledgements:
Materials in this lecture are courtesy of the following sources and are used with permission.
Rex Min
Kevin Atkinson
Prof. Randy Katz (Unified Microelectronics Corporation Distinguished Professor in Electrical
Engineering and Computer Science at the University of California, Berkeley) and Prof. Gaetano
Borriello (University of Washington Department of Computer Science & Engineering) from
Chapter 2 of R. Katz, G. Borriello. Contemporary Logic Design. 2nd ed. Prentice-Hall/Pearson
Education, 2005.
J. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuits: A Design Perspective
Prentice Hall/Pearson, 2003.
L8/9: 6.111 Spring 2006
Introductory Digital Systems Laboratory
Number Systems Basics

How to represent negative numbers?
Three common schemes: sign-magnitude, ones complement, twos complement

Sign-magnitude: MSB = 0 for positive, 1 for negative
Range: $-(2^{N-1}-1)$ to $+(2^{N-1}-1)$
Two representations for zero: 0000 & 1000
Simple multiplication but complicated addition/subtraction

Ones complement: if N is positive then its negative is $\overline{N}$ (the bitwise complement of N)
Example: 0111 = 7, 1000 = -7
Range: $-(2^{N-1}-1)$ to $+(2^{N-1}-1)$
Two representations for zero: 0000 & 1111
Subtraction implemented as addition and negation
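
As a rough illustration of the two schemes above, the following Python sketch decodes a bit pattern under each interpretation. The function names and the 4-bit default width are assumptions made for this example; they do not come from the lecture.

```python
def sign_magnitude_value(bits: int, width: int = 4) -> int:
    """Interpret `bits` as sign-magnitude: the MSB is the sign, the rest is the magnitude."""
    magnitude = bits & ((1 << (width - 1)) - 1)
    return -magnitude if bits >> (width - 1) else magnitude


def ones_complement_value(bits: int, width: int = 4) -> int:
    """Interpret `bits` as ones complement: a negative value is stored as the bitwise complement."""
    mask = (1 << width) - 1
    if bits >> (width - 1):
        return -((~bits) & mask)
    return bits
```

Both decoders map two different patterns to zero (0000 and 1000 for sign-magnitude, 0000 and 1111 for ones complement), which matches the "two representations for zero" noted above.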
Twos Complement Representation

Twos complement = bitwise complement + 1
0111 → 1000 + 1 = 1001 = -7
1001 → 0110 + 1 = 0111 = 7
Asymmetric range: $-2^{N-1}$ to $+2^{N-1}-1$
Only one representation for zero
Simple addition and subtraction
Most common representation
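Worked examples (4 bits; the fifth bit of a result is the carry out, which is ignored):

    4     0100        -4     1100         4     0100        -4     1100
  + 3     0011     + (-3)    1101       - 3     1101       + 3     0011
  ---     ----      -----   -----       ---    -----       ---     ----
    7     0111        -7    11001         1    10001        -1     1111

[Katz05]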
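
The complement-and-increment rule above can be sketched in Python as follows. The helper names and the 4-bit default width are assumptions chosen for illustration only.

```python
def twos_complement_negate(bits: int, width: int = 4) -> int:
    """Negate a pattern: bitwise complement, then add 1, truncated to `width` bits."""
    mask = (1 << width) - 1
    return ((~bits & mask) + 1) & mask


def twos_complement_value(bits: int, width: int = 4) -> int:
    """Signed value of a pattern: the MSB carries weight -2**(width-1)."""
    if bits >> (width - 1):
        return bits - (1 << width)
    return bits
```

For example, `twos_complement_negate(0b0111)` gives `0b1001`, and `twos_complement_value(0b1001)` gives -7, matching the slide. Note that negating 1000 (-8) returns 1000 again, a consequence of the asymmetric range.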
Overflow Conditions
Overflow: adding two positive numbers gives a negative number, or adding two negative numbers gives a positive number
[Figure: two 4-bit twos complement number wheels (0000 = +0 through 0111 = +7, 1000 = -8 through 1111 = -1), one showing 5 + 3 = -8! and the other showing -7 - 2 = +7!]

Worked examples with overflow (4 bits):

     5     0101            -7     1001
   + 3     0011          + -2     1110
   ---     ----           ---    -----
    -8     1000             7    10111

5 + 3: carry in to sign bit = 1, carry out = 0
-7 - 2: carry in to sign bit = 0, carry out = 1
If the carry in to the sign bit equals the carry out, the carry out can be ignored; otherwise overflow has occurred
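
The carry-in versus carry-out rule can be checked with a short Python sketch. The function name, the return layout, and the 4-bit default width are assumptions for this example.

```python
def add_with_overflow(a: int, b: int, width: int = 4):
    """Add two `width`-bit patterns.

    Returns (sum bits, carry out, overflow), where overflow is True when the
    carry in to the sign bit differs from the carry out of the sign bit.
    """
    mask = (1 << width) - 1
    low_mask = mask >> 1  # every bit below the sign bit
    carry_into_sign = ((a & low_mask) + (b & low_mask)) >> (width - 1)
    total = (a & mask) + (b & mask)
    carry_out = total >> width
    return total & mask, carry_out, carry_into_sign != carry_out
```

With this sketch, 0101 + 0011 (5 + 3) and 1001 + 1110 (-7 - 2) both report overflow, while 0101 + 0010 (5 + 2) and 1101 + 1011 (-3 - 5) do not.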
Binary Full Adder

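[Figure: full adder block with inputs A, B, Ci and outputs S, Co]

$S = A \oplus B \oplus C_i = \overline{A}\,\overline{B}\,C_i + \overline{A}\,B\,\overline{C_i} + A\,\overline{B}\,\overline{C_i} + A\,B\,C_i$
$C_o = AB + C_i(A + B)$

A  B  CI | S  CO
0  0  0  | 0  0
0  0  1  | 1  0
0  1  0  | 1  0
0  1  1  | 0  1
1  0  0  | 1  0
1  0  1  | 0  1
1  1  0  | 0  1
1  1  1  | 1  1

[Figure: Karnaugh maps for S and CO, with AB (00, 01, 11, 10) on one axis and CI on the other]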
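
The two full adder equations can be written directly as bitwise operations. This is a minimal Python sketch; the function name is an assumption for illustration.

```python
def full_adder(a: int, b: int, ci: int):
    """One-bit full adder: S = A xor B xor Ci, Co = AB + Ci(A + B)."""
    s = a ^ b ^ ci
    co = (a & b) | (ci & (a | b))
    return s, co
```

Evaluating `full_adder` over all eight input combinations reproduces the truth table: the pair (Co, S) read as a two-bit number always equals A + B + Ci.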
Ripple Carry Adder Structure
[Figure: four full adders chained from bit 0 to bit 3. Stage i takes Ai, Bi and the previous carry (Ci,0 for stage 0, otherwise Co,i-1) and produces Si and Co,i]

Worst case propagation delay linear with the number of bits
$t_{adder} = (N-1)\,t_{carry} + t_{sum}$
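
A bit-level Python sketch of the ripple structure follows. The function names and the 4-bit default width are assumptions for this example. The loop makes the linear delay visible: each iteration needs the carry from the previous one, so for N = 4 the last sum bit waits for three carry delays plus one sum delay.

```python
def full_adder(a: int, b: int, ci: int):
    """One-bit full adder returning (sum, carry out)."""
    return a ^ b ^ ci, (a & b) | (ci & (a | b))


def ripple_carry_add(a: int, b: int, width: int = 4, carry_in: int = 0):
    """Chain `width` full adders; the carry ripples from bit 0 upward."""
    carry = carry_in
    result = 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry
```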
Extension to Subtraction
Under twos complement, subtracting B is the same as
adding the bitwise complement of B then adding 1
Combination addition/subtraction system: a mux selects B for addition, $\overline{B}$ for subtraction

[Figure: 4-bit ripple carry adder (FA stages 0 to 3). At each stage a 2:1 mux controlled by Add/Subtract feeds either Bi or its complement to the full adder, and Add/Subtract also drives the carry in of stage 0]

Add 1 for subtraction using carry in
Overflow occurs if carry in to sign bit differs from final carry out
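
The mux-plus-carry-in trick can be modeled in a few lines of Python. The function name, argument order, and 4-bit default width are assumptions for this sketch.

```python
def add_sub(a: int, b: int, subtract: bool, width: int = 4):
    """Shared adder/subtractor: returns (result bits, carry out).

    For subtraction the mux passes the bitwise complement of B and the
    carry in is set to 1, so the adder computes A + ~B + 1 = A - B.
    """
    mask = (1 << width) - 1
    b_selected = (~b & mask) if subtract else (b & mask)  # 2:1 mux
    carry_in = 1 if subtract else 0                       # Add/Subtract control
    total = (a & mask) + b_selected + carry_in
    return total & mask, total >> width
```

For instance, `add_sub(0b0100, 0b0011, True)` yields `(0b0001, 1)`, the same bits as the 4 - 3 example on the twos complement slide.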
Comparator (one approach)

[Figure: the adder/subtractor configured to compute A - B; the sum outputs S3 to S0 drive two flags]

N: true if negative result
Z: true if zero result
A<B = N
A=B = Z
A≤B = Z+N
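
A small Python sketch of the flag-based comparison is given below. It assumes signed (twos complement) operands whose difference does not overflow; for 4 bits, operands in the range -4 to 3 satisfy this. The function name and the dictionary layout are illustrative choices, not part of the lecture.

```python
def compare(a: int, b: int, width: int = 4):
    """Compare signed values using the N and Z flags of A - B.

    Assumes A - B does not overflow the `width`-bit twos complement range.
    """
    mask = (1 << width) - 1
    diff = ((a & mask) + (~b & mask) + 1) & mask  # A + ~B + 1
    n = diff >> (width - 1)                       # sign bit of the result
    z = int(diff == 0)                            # all result bits are zero
    return {"lt": bool(n), "eq": bool(z), "le": bool(z or n)}
```

When overflow is possible, N alone no longer indicates A < B, so a fuller design would also take the overflow condition into account.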
Alternate Adder Logic Formulation

How to Speed up the Critical (Carry) Path?
(How to Build a Fast Adder?)
[Figure: full adder with inputs A, B, Cin and outputs S, Co]
Generate (G) = AB
Propagate (P) = A ⊕ B
Note: can also use P = A + B for Co
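
To make the G and P formulation concrete, the sketch below rebuilds a full adder from them using the standard identities Co = G + P·Ci and S = P ⊕ Ci (the carry identity is the same form as C1 = G0 + P0 C0 used later in the lecture). The function names are assumptions for illustration.

```python
def generate_propagate(a: int, b: int):
    """Per-bit signals: G = AB, P = A xor B."""
    return a & b, a ^ b


def full_adder_gp(a: int, b: int, ci: int):
    """Full adder expressed through G and P: S = P xor Ci, Co = G + P Ci."""
    g, p = generate_propagate(a, b)
    return p ^ ci, g | (p & ci)


def carry_with_or_propagate(a: int, b: int, ci: int) -> int:
    """Carry out using the alternative P = A + B; valid for Co only, not for S."""
    return (a & b) | ((a | b) & ci)
```

The alternative works for the carry because the only case where A + B and A ⊕ B differ is A = B = 1, where G already forces Co = 1.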

Carry Bypass Adder

[Figure: 4-bit ripple adder built from P,G blocks and FA cells for bits 0 to 3, shown twice: once as a plain ripple chain from Ci,0 to Co,3, and once with a 2:1 mux at the output that selects Ci,0 instead of the rippled carry when BP = P0P1P2P3 is true]

Can compute P, G in parallel for all bits
Key Idea: if (P0 P1 P2 P3) then Co,3 = Ci,0
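
The key idea can be checked with a Python model of one 4-bit bypass block. The function name and the LSB-first list convention are assumptions made for this sketch.

```python
def bypass_block(a_bits, b_bits, ci):
    """One 4-bit carry bypass block; bit lists are LSB first.

    Returns (sum bits, block carry out, BP).
    """
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    carry = ci
    sums = []
    for gi, pi in zip(g, p):          # ordinary ripple through the block
        sums.append(pi ^ carry)
        carry = gi | (pi & carry)
    bp = int(all(p))                  # BP = P0 P1 P2 P3
    carry_out = ci if bp else carry   # 2:1 mux: bypass when every bit propagates
    return sums, carry_out, bp
```

The mux never changes the logical value of the carry out, since all P = 1 implies all G = 0 and the rippled carry equals Ci,0 anyway. What it changes is how quickly that value becomes available.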

16-bit Carry Bypass Adder
[Figure: four 4-bit bypass blocks in series covering bits 0-3, 4-7, 8-11, and 12-15. Each block has P,G logic and four FAs. A 2:1 mux at each block output selects between the rippled carry (Co,3, Co,7, Co,11, Co,15) and the block's carry in, controlled by BP = P0P1P2P3, P4P5P6P7, P8P9P10P11, and P12P13P14P15 respectively]
Assume the following delay for each gate:

P, G from A, B: 1 delay unit
P, G, Ci to Co or Sum for a FA: 1 delay unit
2:1 mux delay: 1 delay unit
What is the worst case propagation delay for the 16-bit adder?
Critical Path Analysis
[Figure: the 16-bit carry bypass adder again, with the block bypass signals labeled BP = P0P1P2P3, BP2 = P4P5P6P7, BP3 = P8P9P10P11, BP4 = P12P13P14P15]
For the second stage, is the critical path BP2 = 0 or BP2 = 1?
Message: Timing Analysis is Very Tricky
Must Carefully Consider Data Dependencies For False Paths
Carry Lookahead Adder

Re-express the carry logic as follows:
C1 = G0 + P0 C0
C2 = G1 + P1 C1 = G1 + P1 G0 + P1 P0 C0
C3 = G2 + P2 C2 = G2 + P2 G1 + P2 P1 G0 + P2 P1 P0 C0
C4 = G3 + P3 C3 = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 C0
Each of the carry equations can be implemented in a two-level logic network
Variables are the adder inputs and carry in to stage 0
Ripple effect has been eliminated!
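
The expanded carry equations translate directly into code. In the Python sketch below, every carry is computed from G, P, and C0 alone, with no carry depending on another carry. The function name and the integer-input convention are assumptions for this example.

```python
def lookahead_carries(a: int, b: int, c0: int = 0):
    """Return [C1, C2, C3, C4] for 4-bit a and b using the expanded equations."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # Gi = Ai Bi
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # Pi = Ai xor Bi
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]
```

For 0101 + 0011 this returns `[1, 1, 1, 0]`: carries ripple logically through bits 0 to 2, but each one is produced by its own two-level expression.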

Carry Lookahead Logic

Adder with propagate and generate outputs
[Figure: per-bit cell with inputs Ai, Bi, Ci and outputs Pi, Gi, Si]

Later stages have increasingly complex logic
[Figure: two-level gate networks computing C1, C2, C3, and C4 from G0 to G3, P0 to P3, and C0]
Block Generate and Propagate

Gj:i and Pj:i denote the Generate and Propagate functions, respectively, for a group of bits
from positions i to j. We call them Block Generate and Block Propagate. Gj:i equals 1 if
the group generates a carry independent of the incoming carry. Pj:i equals 1 if an
incoming carry propagates through the entire group. For example, G3:2 is equal to 1 if a
carry is generated at bit position 3, or if a carry out is generated at bit position 2 and
propagates through position 3. G3:2 = G3 + P3G2. P3:2 is true if an incoming carry
propagates through both bit positions 2 and 3. P3:2 = P3P2
C2 = (G1 + P1 G0 ) + (P1 P0 )C0 = G1:0 + P1:0 C0
C4 = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 C0
= (G3 + P3 G2 ) + (P3 P2 )C2 = G3:2 + P3:2 C2
= G3:2 + P3:2(G1:0 + P1:0 C0) = G3:0 + P3:0 C0
The carry out of a 4-bit block can thus be computed using only the block generate and propagate
signals for each 2-bit section, plus the carry in to bit 0. The same formulation will be used to generate
the carry out signals for a 16-bit adder using the block generate and propagate from 4-bit sections.
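
The 2-bit grouping described above can be sketched in Python as follows. The helper names and tuple layout `(G, P)` are assumptions for illustration.

```python
def block_gp(upper, lower):
    """Merge two adjacent groups: G = G_hi + P_hi G_lo, P = P_hi P_lo."""
    g_hi, p_hi = upper
    g_lo, p_lo = lower
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def carry_out_4bit(a: int, b: int, c0: int = 0) -> int:
    """C4 = G3:0 + P3:0 C0, built from the two 2-bit sections."""
    gp = [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(4)]
    gp_10 = block_gp(gp[1], gp[0])   # (G1:0, P1:0)
    gp_32 = block_gp(gp[3], gp[2])   # (G3:2, P3:2)
    g_30, p_30 = block_gp(gp_32, gp_10)
    return g_30 | (p_30 & c0)
```

Applying `block_gp` once more to four such 4-bit results would give the block signals for a 16-bit adder, in the way the paragraph above describes.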
More Definitions
$(g, p) \bullet (g', p') = (g + p g',\; p p')$
The above dot operator obeys the associative property, but it is not commutative
$(G_{3:2}, P_{3:2}) = (G_3, P_3) \bullet (G_2, P_2)$
$(C_{o,3}, 0) = \big((G_3, P_3) \bullet (G_2, P_2) \bullet (G_1, P_1) \bullet (G_0, P_0)\big) \bullet (C_{i,0}, 0)$
$(G_{3:0}, P_{3:0}) = [(G_3, P_3) \bullet (G_2, P_2)] \bullet [(G_1, P_1) \bullet (G_0, P_0)] = (G_{3:2}, P_{3:2}) \bullet (G_{1:0}, P_{1:0})$
$(C_{o,k}, 0) = \big((G_k, P_k) \bullet (G_{k-1}, P_{k-1}) \bullet \cdots \bullet (G_0, P_0)\big) \bullet (C_{i,0}, 0)$
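
The two stated properties of the dot operator can be verified exhaustively, since each operand is just a pair of bits. The Python sketch below does this and also evaluates the general carry out expression with a left fold. The names `dot`, `PAIRS`, and `carry_out` are assumptions for this example.

```python
from functools import reduce
from itertools import product


def dot(left, right):
    """(g, p) dot (g', p') = (g + p g', p p')."""
    g, p = left
    g2, p2 = right
    return g | (p & g2), p & p2


PAIRS = list(product((0, 1), repeat=2))  # every possible (g, p) pair

# Exhaustive checks of the two properties.
associative = all(dot(dot(x, y), z) == dot(x, dot(y, z))
                  for x in PAIRS for y in PAIRS for z in PAIRS)
commutative = all(dot(x, y) == dot(y, x) for x in PAIRS for y in PAIRS)


def carry_out(gp_msb_first, ci):
    """First component of ((Gk,Pk) dot ... dot (G0,P0)) dot (Ci,0, 0)."""
    return reduce(dot, list(gp_msb_first) + [(ci, 0)])[0]
```

Associativity is what allows the terms to be regrouped into a tree, which is the basis of the logarithmic adder. Commutativity fails because the left operand is always the more significant group: `dot((1, 0), (0, 0))` is `(1, 0)` but `dot((0, 0), (1, 0))` is `(0, 0)`.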
Logarithmic Look-Ahead Adder

[Figure: a function F of inputs A0 to A7 built two ways: as a linear chain, tp: O(N), and as a balanced tree, tp: O(log2 N)]
16-bit Kogge-Stone Tree Adder
[Figure: inputs (A0, B0) through (A15, B15) feed the Propagate, Generate Logic, followed by the prefix tree and the Sum Logic producing S0 through S15]
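
A behavioral Python sketch of a Kogge-Stone style prefix adder is given below. It models only the logic levels, not wiring or fan-out, and folding the carry in into bit 0's generate term is a simplification chosen for this example. The function and variable names are assumptions.

```python
def kogge_stone_add(a: int, b: int, width: int = 16, carry_in: int = 0):
    """Parallel-prefix addition; returns (sum bits, carry out, prefix levels used)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    grp_g, grp_p = list(g), list(p)
    grp_g[0] = g[0] | (p[0] & carry_in)   # fold the carry in into bit 0
    dist, levels = 1, 0
    while dist < width:
        new_g, new_p = list(grp_g), list(grp_p)
        for i in range(dist, width):
            # dot operator: merge the group ending at i with the one ending at i - dist
            new_g[i] = grp_g[i] | (grp_p[i] & grp_g[i - dist])
            new_p[i] = grp_p[i] & grp_p[i - dist]
        grp_g, grp_p = new_g, new_p
        dist *= 2
        levels += 1
    carries = [carry_in] + grp_g[:-1]     # carry into each bit position
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, grp_g[-1], levels
```

For a 16-bit adder the loop runs four times (distances 1, 2, 4, 8), consistent with the O(log2 N) depth noted for the tree structure.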
Adder Performance
[Figure: delay vs. number of bits for ripple, bypass, select, and lookahead adders]
Addition of M, N-bit Numbers

[Figure: array of adders summing M N-bit inputs IN0 through INM-1, with bit positions 0 through N-1; each adder row has Cin = 0]
16-bit Carry Lookahead Schematic

181 configured for A+B: M = 0, S3-0 = 1001
[Figure: four 181 ALUs handle A3:0/B3:0, A7:4/B7:4, A11:8/B11:8, and A15:12/B15:12, producing S3:0 through S15:12. Each 181 has a Cn input and Cn+4, P, and G outputs. The block P and G signals (P0 G0 through P3 G3) feed a 182, which takes Cin on Cn and produces Cn+x, Cn+y, Cn+z along with group G and P]
182 computes Cin for later stages, using block G & P from earlier stages
Binary Multiplication
Partial product computation is simple (single and gate)
[Figure: 4x4 array multiplier. AND gates form the partial products of x3..x0 with each of y0, y1, y2, y3, and rows of half adders (HA) and full adders (FA) sum them into product bits z0 through z7]
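
The array structure can be approximated in Python by forming each row of partial products with AND operations and summing the shifted rows. This sketch abstracts the HA and FA rows into ordinary integer addition; the function name and 4-bit default width are assumptions.

```python
def array_multiply(x: int, y: int, width: int = 4) -> int:
    """Unsigned multiply as a sum of shifted AND-gate partial product rows."""
    product = 0
    for j in range(width):
        yj = (y >> j) & 1
        # Row j: each partial product bit is x_i AND y_j (a single AND gate).
        row = sum((((x >> i) & 1) & yj) << i for i in range(width))
        product += row << j  # the adder rows accumulate the shifted partial products
    return product
```

For example, `array_multiply(0b1011, 0b0110)` returns 66, and the result always fits in 2 × width bits.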
A Serial (Magnitude) Multiplier

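[Figure: serial multiplier datapath. x3..x0 load into an 8-bit register that shifts left, Y3..Y0 load into a 4-bit register (yReg) that shifts right, and Shift/LD selects load or shift. The low bit of yReg selects either the shifted x value or 0 onto xBus [7:0]. An adder sums xBus and acc_out to give add_out, which is registered on CLK (with rst) as the accumulator; a second register with LD holds the product XY]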
Timing Diagram
[Timing diagram: signals CLK, Shift, xreg, yreg, Acc_out, X*Y; successive values shown below]
xreg:    0 0 0 0 x3 x2 x1 x0 | 0 0 0 x3 x2 x1 x0 0 | 0 0 x3 x2 x1 x0 0 0 | 0 x3 x2 x1 x0 0 0 0 | 0 0 0 0 x3 x2 x1 x0
yreg:    y0 y1 y2 y3 | y1 y2 y3 X | y2 y3 X X | y3 X X X | y0 y1 y2 y3
Acc_out: 00000000 | Accum_1 | Accum_2 | Accum_3 | 00000000
X*Y:     PRODUCT
Verilog of Serial Multiplier

module serialmult(shift, clk, x, y, xy);
  input shift, clk;
  input [3:0] x, y;
  output [7:0] xy;

  reg [7:0] xReg;
  reg [3:0] yReg;
  reg [7:0] xBus, acc_out, xy_int;
  wire [7:0] add_out;

  assign add_out = xBus + acc_out;
  assign xy = xy_int;

  // Select the shifted multiplicand or zero, based on the current multiplier bit
  always @ (yReg[0] or xReg)
  begin
    if (yReg[0] == 1'b0) xBus = 8'b0;
    else xBus = xReg;
  end

  always @ (posedge clk)
  begin
    if (shift == 1'b0)
    begin
      // Load: capture new operands, clear the accumulator, latch the finished product
      xReg <= {4'b0, x};
      yReg <= y;
      acc_out <= 8'b0;
      xy_int <= add_out;
    end
    else
    begin
      // Shift: multiplicand left, multiplier right, accumulate the partial sum
      xReg <= {xReg[6:0], 1'b0};
      yReg <= {y[3], yReg[3:1]};
      acc_out <= add_out;
      xy_int <= xy;
    end // if shift
  end // always
endmodule
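
To follow the register behavior cycle by cycle, the Verilog above can be mirrored in a small Python model. This is a sketch for tracing only, not a replacement for simulation; the class name `SerialMult`, the `clock` method, and the `multiply` helper are assumptions introduced here. The model suggests that one load, three shift cycles, and a second load are enough to produce the product.

```python
class SerialMult:
    """Cycle-level model of the serialmult registers (names mirror the Verilog)."""

    def __init__(self):
        self.x_reg = self.y_reg = self.acc_out = self.xy = 0

    def clock(self, shift: int, x: int, y: int) -> None:
        # Combinational logic, evaluated from the register values before the edge.
        x_bus = self.x_reg if (self.y_reg & 1) else 0
        add_out = (x_bus + self.acc_out) & 0xFF
        if shift == 0:                      # load
            self.x_reg = x & 0xF
            self.y_reg = y & 0xF
            self.acc_out = 0
            self.xy = add_out               # latch the finished product
        else:                               # shift and accumulate
            self.x_reg = (self.x_reg << 1) & 0xFF
            self.y_reg = (((y >> 3) & 1) << 3) | (self.y_reg >> 1)
            self.acc_out = add_out          # xy holds its value


def multiply(x: int, y: int) -> int:
    """Drive the model: load, three shift cycles, then a load that latches xy."""
    m = SerialMult()
    m.clock(0, x, y)
    for _ in range(3):
        m.clock(1, x, y)
    m.clock(0, x, y)
    return m.xy
```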
Simulation
[Figure: simulation waveforms for the serial multiplier]
Twos Complement Multiplication
[Figure: 4x4 twos complement array multiplier built from HA and FA cells, with inputs x3..x0 and y3..y0 and outputs z0 through z7]
Summary
Performance of arithmetic blocks dictates the performance of a digital system
Architectural and logic transformations can enable significant speed up (e.g., adder delay from O(N) to O(log2(N)))
Similar concepts and formulation can be applied at the system level
Timing analysis is tricky: watch out for false paths!
Area-Delay trade-offs (serial vs. parallel implementations)
