Chapter 5 Harris Module2

Chapter 5
Digital Design and Computer Architecture: ARM® Edition

Sarah L. Harris and David Money Harris
Digital Design and Computer Architecture: ARM® Edition © 2015 Chapter 5 <1>
Chapter 5 :: Topics
• Introduction
• Arithmetic Circuits
• Number Systems
• Sequential Building Blocks
• Memory Arrays
• Logic Arrays
Introduction
• Digital building blocks:
– Gates, multiplexers, decoders, registers,
arithmetic circuits, counters, memory arrays,
logic arrays
• Building blocks demonstrate hierarchy,
modularity, and regularity:
– Hierarchy of simpler components
– Well-defined interfaces and functions
– Regular structure easily extends to different sizes
• We will use these building blocks in Chapter 7 to build a microprocessor
ARITHMETIC CIRCUITS
Arithmetic circuits are the central building blocks of computers.
Computers and digital logic perform many arithmetic functions: addition, subtraction, comparisons, shifts, multiplication, and division.
This section describes hardware implementations of these operations.
Addition
Addition is one of the most common operations in digital systems.
We first discuss how to add two 1-bit binary numbers, then extend to N-bit binary numbers.
The half adder has two inputs, A and B, and two outputs, S and Cout.
S is the sum of A and B.
If A and B are both 1, the sum is 2, which cannot be represented with a single binary digit.
Instead, it is indicated with a carry out, Cout, in the next column.
The half adder can be built from an XOR gate and an AND gate.
In a multi-bit adder, Cout is added or carried in to the next most significant bit.
1-Bit Adders
Half Adder (inputs A, B; outputs Cout, S)
A B | Cout S
0 0 |  0   0
0 1 |  0   1
1 0 |  0   1
1 1 |  1   0
S = A ⊕ B
Cout = AB
Full Adder (inputs Cin, A, B; outputs Cout, S)
Cin A B | Cout S
 0  0 0 |  0   0
 0  0 1 |  0   1
 0  1 0 |  0   1
 0  1 1 |  1   0
 1  0 0 |  0   1
 1  0 1 |  1   0
 1  1 0 |  1   0
 1  1 1 |  1   1
S = A ⊕ B ⊕ Cin
Cout = AB + ACin + BCin
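The sum and carry equations for the 1-bit adders can be checked with a small software model. The Python sketch below is illustrative only; the function names (half_adder, full_adder, check_adders) are chosen here and are not part of the slides. It evaluates the equations for every input combination and confirms that 2·Cout + S equals the arithmetic sum of the inputs.

```python
def half_adder(a, b):
    """Return (cout, s) for two 1-bit inputs: S = A xor B, Cout = AB."""
    return a & b, a ^ b


def full_adder(a, b, cin):
    """Return (cout, s): S = A xor B xor Cin, Cout = AB + ACin + BCin."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return cout, s


def check_adders():
    """Compare both adders against integer addition for all inputs."""
    for a in (0, 1):
        for b in (0, 1):
            cout, s = half_adder(a, b)
            if 2 * cout + s != a + b:
                return False
            for cin in (0, 1):
                cout, s = full_adder(a, b, cin)
                if 2 * cout + s != a + b + cin:
                    return False
    return True
```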
Multibit Adders (CPAs)
• Types of carry propagate adders (CPAs):
– Ripple-carry (slow)
– Carry-lookahead (fast)
– Prefix (faster)
• Carry-lookahead and prefix adders are faster for large adders but require more hardware
[Figure: CPA symbol. An adder (+) with N-bit inputs A and B, carry in Cin, N-bit output S, and carry out Cout.]
Ripple-Carry Adder
• Chain 1-bit adders together
• Carry ripples through entire chain
• Disadvantage: slow
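[Figure: 32-bit ripple-carry adder. Full adders for bits 0 through 31 are chained from Cin to Cout; the carries C0, C1, …, C29, C30 connect each bit to the next, and the outputs are S0 through S31.]
Ripple-Carry Adder Delay
tripple = N·tFA
where tFA is the delay of a 1-bit full adder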
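As a rough software analogy for the ripple-carry structure, the following Python sketch chains 1-bit full adders so that each column waits for the carry from the column below it. The name ripple_carry_add and the default width n = 32 are assumptions made for illustration.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (cout, s)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return cout, s


def ripple_carry_add(a, b, cin=0, n=32):
    """Add two n-bit unsigned integers one column at a time.

    Returns (cout, sum). The loop mirrors the chain of full adders:
    the carry out of column i is the carry in of column i + 1.
    """
    carry = cin
    total = 0
    for i in range(n):
        carry, s = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return carry, total
```

The loop runs N times, which corresponds to the N full-adder delays in the ripple-carry delay expression.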
Carry-Lookahead Adder
The fundamental reason that large ripple-carry adders are slow is that the carry signals must propagate through every bit in the adder.
A carry-lookahead adder (CLA) is another type of carry propagate adder that solves this problem by dividing the adder into blocks and providing circuitry to quickly determine the carry out of a block as soon as the carry in is known.
Thus it is said to look ahead across the blocks rather than waiting to ripple through all the full adders inside a block.
For example, a 32-bit adder may be divided into eight 4-bit blocks.
CLAs use generate (G) and propagate (P) signals that describe how a column or block determines the carry out.
The ith column of an adder is said to generate a carry if it produces a carry out independent of the carry in.
The ith column is guaranteed to generate a carry, Ci, if Ai and Bi are both 1.
Hence Gi, the generate signal for column i, is calculated as Gi = AiBi.
The column is said to propagate a carry if it produces a carry out whenever there is a carry in.
The ith column will propagate a carry in, Ci−1, if either Ai or Bi is 1. Thus, Pi = Ai + Bi.
Compute Cout for k-bit blocks using generate and propagate signals
Some definitions:
– Column i produces a carry out by either generating a carry out or
propagating a carry in to the carry out
– Generate (Gi) and propagate (Pi) signals for each column:
• Generate: Column i will generate a carry out if Ai and Bi are both 1.
Gi = Ai Bi
• Propagate: Column i will propagate a carry in to the carry out if Ai or Bi is 1.
Pi = Ai + Bi
• Carry out: The carry out of column i (Ci) is:
Ci = Ai Bi + (Ai + Bi )Ci-1 = Gi + Pi Ci-1
Block Propagate and Generate
Now use column Propagate and Generate signals to
compute Block Propagate and Generate signals for
k-bit blocks, i.e.:
• Compute if a k-bit group will propagate a carry in (to the
block) to the carry out (of the block)
• Compute if a k-bit group will generate a carry out (of the
block)
Block Propagate and Generate Signals
• Example: Block propagate and generate
signals for 4-bit blocks (P3:0 and G3:0):
P3:0 = P3P2P1P0
G3:0 = G3 + G2P3 + G1P2P3 + G0P1P2P3
     = G3 + P3(G2 + P2(G1 + P1G0))
• In general, for a block spanning bits i through j:
Pi:j = PiPi-1Pi-2 … Pj
Gi:j = Gi + Pi(Gi-1 + Pi-1(Gi-2 + Pi-2(… (Gj+1 + Pj+1Gj))))
Ci = Gi:j + Pi:j Cj-1
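The block equations can be exercised in software. The sketch below is a minimal Python model, assuming a block width of k = 4 and function names (column_pg, block_pg, block_carry_out) invented for illustration. It folds the column signals from the least significant column upward, which evaluates the nested form of the block generate equation, and then forms the block carry out as G + P·Cin.

```python
def column_pg(a, b, k=4):
    """Per-column generate and propagate lists (index 0 = least significant)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(k)]  # Gi = Ai Bi
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(k)]  # Pi = Ai + Bi
    return g, p


def block_pg(g, p):
    """Block generate and propagate for the columns described by g and p."""
    block_g, block_p = 0, 1
    for gi, pi in zip(g, p):
        block_g = gi | (pi & block_g)  # G = Gi + Pi (G of the lower columns)
        block_p &= pi                  # P = product of all Pi
    return block_g, block_p


def block_carry_out(a, b, cin, k=4):
    """Carry out of a k-bit block: Cout = G + P Cin."""
    g, p = column_pg(a, b, k)
    block_g, block_p = block_pg(g, p)
    return block_g | (block_p & cin)
```

For a 4-bit block, the result agrees with the carry out of ordinary addition for every combination of A, B, and Cin.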
32-bit CLA with 4-bit Blocks
[Figure: 32-bit CLA built from eight 4-bit CLA blocks (bits 3:0, 7:4, …, 27:24, 31:28) with outputs S3:0 through S31:28. The block carries C3, C7, …, C23, C27 link the blocks from Cin to Cout. Each block contains four full adders chained through C0, C1, C2, plus logic that computes G3:0 and P3:0 from G3…G0 and P3…P0 and combines them with Cin to form the block Cout.]
Carry-Lookahead Addition
• Step 1: Compute Gi and Pi for all columns
Gi = Ai Bi
Pi = Ai + Bi
• Step 2: Compute G and P for k-bit blocks
P3:0 = P3P2P1P0
G3:0 = G3 + P3(G2 + P2(G1 + P1G0))
• Step 3: Cin propagates through each k-bit propagate/generate logic (meanwhile computing sums)
• Step 4: Compute sum for most significant k-bit block
[Figure: the 32-bit CLA with 4-bit blocks, repeated on each step with the active logic highlighted.]
Carry-Lookahead Adder Delay
For an N-bit CLA with k-bit blocks:
tCLA = tpg + tpg_block + (N/k – 1)tAND_OR + k·tFA
– tpg: delay to generate all Pi, Gi
– tpg_block: delay to generate all Pi:j, Gi:j
– tAND_OR: delay from Cin to Cout of final AND/OR gate in k-bit CLA block
An N-bit carry-lookahead adder is generally much faster than a ripple-carry adder for N > 16
Prefix adders
Prefix adders extend the generate and propagate logic of the carry-lookahead adder to perform addition even faster.
They first compute G and P for pairs of columns, then for blocks of 4, then for blocks of 8, then 16, and so forth until the generate signal for every column is known.
The sums are computed from these generate signals.
The strategy of a prefix adder is to compute the carry in Ci−1 for each column i as quickly as possible, then to compute the sum using Si = (Ai ⊕ Bi) ⊕ Ci−1.
Prefix Adder
• Computes carry in (Ci-1) for each column, then
computes sum:
Si = (Ai ⊕ Bi) ⊕ Ci-1
• Computes G and P for 1-, 2-, 4-, 8-bit blocks, etc.
until all Gi (carry in) known
• log2N stages
Prefix Adder
• Carry in either generated in a column or propagated from a
previous column.
• Column -1 holds Cin, so
G-1 = Cin
• Carry in to column i = carry out of column i-1:
Ci-1 = Gi-1:-1
Gi-1:-1: generate signal spanning columns i-1 to -1
• Sum equation:
Si = (Ai ⊕ Bi) ⊕ Gi-1:-1
• Goal: Quickly compute G0:-1, G1:-1, G2:-1, G3:-1, G4:-1, G5:-1, …
(called prefixes) (= C0, C1, C2, C3, C4, C5, …)
Prefix Adder
• Generate and propagate signals for a block spanning bits i:j
Gi:j = Gi:k + Pi:k Gk-1:j
Pi:j = Pi:kPk-1:j
• In words:
– Generate: block i:j will generate a carry if:
• upper part (i:k) generates a carry or
• upper part (i:k) propagates a carry generated in
lower part (k-1:j)
– Propagate: block i:j will propagate a carry if both the
upper and lower parts propagate the carry
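The combining rule for Gi:j and Pi:j is enough to build a working prefix network in software. The Python sketch below is one possible arrangement (a recursive-doubling, Kogge-Stone-style network); it is an assumption for illustration and does not reproduce the exact cell placement of any particular schematic. The name prefix_add and the list layout are also chosen here. List position 0 stands for column -1, which holds Cin.

```python
def prefix_add(a, b, cin=0, n=16):
    """Add two n-bit unsigned integers using prefix generate/propagate logic.

    Returns (cout, sum). Position idx in the lists is column idx - 1,
    so position 0 is column -1 with G = Cin and P = 0.
    """
    g = [cin] + [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]
    p = [0] + [((a >> i) & 1) | ((b >> i) & 1) for i in range(n)]

    span = 1
    while span < n + 1:
        new_g, new_p = g[:], p[:]
        for idx in range(span, n + 1):
            # Upper part is position idx, lower part is position idx - span:
            # Gi:j = Gi:k + Pi:k Gk-1:j   and   Pi:j = Pi:k Pk-1:j
            new_g[idx] = g[idx] | (p[idx] & g[idx - span])
            new_p[idx] = p[idx] & p[idx - span]
        g, p = new_g, new_p
        span *= 2

    # g[i] now holds Gi-1:-1, the carry in to column i.
    total = 0
    for i in range(n):
        s = ((a >> i) & 1) ^ ((b >> i) & 1) ^ g[i]
        total |= s << i
    return g[n], total
```

Each pass of the while loop doubles the span covered by every prefix, which is why the number of stages grows with log2 of the width rather than with the width itself. This model also computes Cout, so it carries one more position than the columns -1 through 14 whose prefixes feed the sums.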
16-Bit Prefix Adder Schematic
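[Figure: 16-bit prefix adder. The top row computes Pi:i and Gi:i from Ai and Bi for columns 15 down to -1. Four levels of prefix cells combine Pi:k, Gi:k with Pk-1:j, Gk-1:j to form Pi:j, Gi:j. Level 1 forms 14:13, 12:11, 10:9, 8:7, 6:5, 4:3, 2:1, 0:-1. Level 2 forms 14:11, 13:11, 10:7, 9:7, 6:3, 5:3, 2:-1, 1:-1. Level 3 forms 14:7, 13:7, 12:7, 11:7, 6:-1, 5:-1, 4:-1, 3:-1. Level 4 forms 14:-1 through 7:-1. The bottom row computes Si from Ai, Bi, and Gi-1:-1 for columns 15 down to 0.]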
Prefix Adder Delay
tPA = tpg + log2(N)·tpg_prefix + tXOR
– tpg: delay to produce Pi, Gi (AND or OR gate)
– tpg_prefix: delay of black prefix cell (AND-OR gate)
Adder Delay Comparisons
Compare delay of: 32-bit ripple-carry, CLA, and prefix adders
• CLA has 4-bit blocks
• 2-input gate delay = 10 ps; full adder delay = 30 ps
tripple = N·tFA = 32(30 ps)
        = 960 ps
tCLA = tpg + tpg_block + (N/k – 1)tAND_OR + k·tFA
     = [10 + 60 + (7)20 + 4(30)] ps
     = 330 ps
tPA = tpg + log2(N)·tpg_prefix + tXOR
    = [10 + log2(32)(20) + 10] ps
    = 120 ps
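The three delay expressions are easy to evaluate for other widths and block sizes. The following Python sketch simply transcribes the formulas; the function and parameter names are assumptions, and the example values (tpg = 10 ps, tpg_block = 60 ps, tAND_OR = 20 ps, tpg_prefix = 20 ps, tXOR = 10 ps, tFA = 30 ps) are the ones used in the comparison above.

```python
import math


def t_ripple(n, t_fa):
    """Ripple-carry delay: N full-adder delays."""
    return n * t_fa


def t_cla(n, k, t_pg, t_pg_block, t_and_or, t_fa):
    """Carry-lookahead delay for an N-bit adder with k-bit blocks."""
    return t_pg + t_pg_block + (n // k - 1) * t_and_or + k * t_fa


def t_prefix(n, t_pg, t_pg_prefix, t_xor):
    """Prefix adder delay; N is assumed to be a power of two."""
    return t_pg + int(math.log2(n)) * t_pg_prefix + t_xor


# 32-bit comparison in picoseconds
ripple_ps = t_ripple(32, 30)
cla_ps = t_cla(32, 4, 10, 60, 20, 30)
prefix_ps = t_prefix(32, 10, 20, 10)
```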
Subtracter
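[Figure: subtracter symbol (N-bit inputs A and B, N-bit output Y = A − B) and implementation (an N-bit adder that adds A to the inverted bits of B with the carry in set to 1).]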
Comparator: Equality
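[Figure: 4-bit equality comparator. Symbol with 4-bit inputs A and B and output Equal; the implementation compares A3 with B3, A2 with B2, A1 with B1, and A0 with B0, and combines the four results into Equal.]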
Comparator: Less Than
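[Figure: N-bit less-than comparator. A subtracter computes A − B; bit [N-1] (the sign bit) of the N-bit result is the output A<B.]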
ALU: Arithmetic Logic Unit
ALU should perform:
• Addition
• Subtraction
• AND
• OR
ALUControl1:0  Function
00             Add
01             Subtract
10             AND
11             OR
Example: Perform A OR B
ALUControl1:0 = 11
Mux selects output of OR gate as Result, so
Result = A OR B
Example: Perform A + B
ALUControl1:0 = 00
ALUControl0 = 0, so:
Cin to adder = 0
2nd input to adder is B
Mux selects Sum as Result, so
Result = A + B
ALU with Status Flags
Flag Description
N Result is Negative
Z Result is Zero
C Adder produces Carry out
V Adder oVerflowed
ALU with Status Flags: Negative
N = 1 if:
Result is negative
So, N is connected to
most significant bit of
Result
ALU with Status Flags: Zero
Z = 1 if:
all of the bits of Result
are 0
ALU with Status Flags: Carry
C = 1 if:
Cout of Adder is 1
AND
ALU is adding or
subtracting (ALUControl
is 00 or 01)
ALU with Status Flags: oVerflow
V = 1 if:
The addition of 2 same-signed numbers produces a result with the opposite sign
In terms of the ALU signals, V = 1 if:
ALU is performing addition or subtraction (ALUControl1 = 0)
AND
A and Sum have opposite signs
AND
A and B have same signs upon addition (ALUControl0 = 0) OR
A and B have different signs upon subtraction (ALUControl0 = 1)
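The ALU function table and the four flag conditions can be combined into one behavioral model. The Python sketch below is an illustration, not the slides' circuit: the function name alu, the dictionary of flags, and the default width n = 32 are assumptions. Subtraction is modeled the way the adder sees it, with the second input inverted and the carry in set to ALUControl0.

```python
def alu(a, b, alu_control, n=32):
    """Return (result, flags) for ALUControl 00 add, 01 subtract, 10 AND, 11 OR."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    subtract = alu_control & 1              # ALUControl0
    is_arith = (alu_control >> 1) == 0      # ALUControl1 = 0: add or subtract
    b_in = (~b & mask) if subtract else b   # second adder input
    total = a + b_in + subtract             # carry in = ALUControl0
    adder_sum = total & mask
    cout = total >> n

    if alu_control == 0b10:
        result = a & b
    elif alu_control == 0b11:
        result = a | b
    else:
        result = adder_sum

    def sign(x):
        return (x >> (n - 1)) & 1

    if subtract:
        signs_allow_overflow = sign(a) != sign(b)
    else:
        signs_allow_overflow = sign(a) == sign(b)

    flags = {
        "N": sign(result),
        "Z": int(result == 0),
        "C": int(is_arith and cout == 1),
        "V": int(is_arith and sign(a) != sign(adder_sum) and signs_allow_overflow),
    }
    return result, flags
```

For example, adding 1 to 0x7FFFFFFF sets N and V, while subtracting 5 from 5 sets Z and C.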
Shifters
Logical shifter: shifts value to left or right and fills empty spaces with 0’s
– Ex: 11001 >> 2 = 00110
– Ex: 11001 << 2 = 00100
Arithmetic shifter: same as logical shifter, but on right shift, fills empty
spaces with the old most significant bit (msb)
– Ex: 11001 >>> 2 = 11110
– Ex: 11001 <<< 2 = 00100
Rotator: rotates bits in a circle, such that bits shifted off one end are shifted
into the other end
– Ex: 11001 ROR 2 = 01110
– Ex: 11001 ROL 2 = 00111
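The shift and rotate examples can be reproduced with a few lines of Python. This is a sketch under the assumption of a fixed word width n (5 bits here, to match the examples); the function names are illustrative.

```python
def _mask(n):
    return (1 << n) - 1


def shift_left(x, k, n=5):
    """Logical or arithmetic left shift: fill with 0's on the right."""
    return (x << k) & _mask(n)


def logical_shift_right(x, k, n=5):
    """Logical right shift: fill with 0's on the left."""
    return (x & _mask(n)) >> k


def arithmetic_shift_right(x, k, n=5):
    """Arithmetic right shift: fill with copies of the old msb."""
    x &= _mask(n)
    k = min(k, n)
    fill = (_mask(k) << (n - k)) if (x >> (n - 1)) & 1 else 0
    return (x >> k) | fill


def rotate_right(x, k, n=5):
    """Bits shifted off the right end re-enter on the left."""
    x &= _mask(n)
    k %= n
    return ((x >> k) | (x << (n - k))) & _mask(n)


def rotate_left(x, k, n=5):
    """Bits shifted off the left end re-enter on the right."""
    x &= _mask(n)
    k %= n
    return ((x << k) | (x >> (n - k))) & _mask(n)
```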
Copyright © 2007 Elsevier
Shifter Design
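[Figure: 4-bit shifter. Symbol: A3:0 shifted right (>>) by shamt1:0 gives Y3:0. Implementation: four 4:1 multiplexers, one per output bit Y3, Y2, Y1, Y0, each taking its data inputs (00, 01, 10, 11) from A3:0 and its select inputs S1:0 from shamt1:0.]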
Shifters as Multipliers, Dividers
• A << N = A × 2^N
– Example: 00001 << 2 = 00100 (1 × 2^2 = 4)
– Example: 11101 << 2 = 10100 (-3 × 2^2 = -12)
• A >>> N = A ÷ 2^N
– Example: 01000 >>> 2 = 00010 (8 ÷ 2^2 = 2)
– Example: 10000 >>> 2 = 11100 (-16 ÷ 2^2 = -4)
Multipliers
• Partial products formed by multiplying a single
digit of the multiplier with multiplicand
• Shifted partial products summed to form result
Decimal:
     230   multiplicand
  x   42   multiplier
     460   partial
 +  920    products
    9660   result
230 x 42 = 9660
Binary:
       0101   multiplicand
     x 0111   multiplier
       0101   partial
      0101    products
     0101
  + 0000
    0100011   result
5 x 7 = 35
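The binary example follows the shift-and-add pattern directly: each multiplier bit selects either the multiplicand or zero, and the selected value is shifted into position before summing. A minimal Python sketch, with the function name multiply and the width n = 4 chosen for illustration:

```python
def multiply(a, b, n=4):
    """Multiply two n-bit unsigned integers by summing shifted partial products."""
    product = 0
    for i in range(n):
        partial = a if (b >> i) & 1 else 0  # A ANDed with multiplier bit Bi
        product += partial << i             # shift into column i, then add
    return product
```

With the operands from the example, multiply(0b0101, 0b0111) returns 0b0100011, which is 35.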
4 x 4 Multiplier
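[Figure: 4 x 4 multiplier symbol with 4-bit inputs A and B and 8-bit product P, and an array implementation in which each row combines A3:0 with one multiplier bit (B0 through B3) and adds the shifted partial product.]
                       A3   A2   A1   A0
   x                   B3   B2   B1   B0
                       A3B0 A2B0 A1B0 A0B0
                  A3B1 A2B1 A1B1 A0B1
             A3B2 A2B2 A1B2 A0B2
 +      A3B3 A2B3 A1B3 A0B3
   P7   P6   P5   P4   P3   P2   P1   P0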
Dividers
A/B = Q + R/B
Decimal: 2584/15 = 172 R4
Binary: 1101/0010 = 0110 R1
[Figures: long-hand division, and long-hand division revisited.]
Divider Algorithm
A/B = Q + R/B
Binary: 1101/10 = 0110 R1
R’ = 0
for i = N-1 to 0
  R = {R’ << 1, Ai}
  D = R - B
  if D < 0, Qi = 0; R’ = R
  else Qi = 1; R’ = D
R = R’
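The divider algorithm translates almost line for line into Python. The sketch below is a software rendering for unsigned operands; the function name divide and the default width n = 4 are assumptions, and division by zero is rejected explicitly because the pseudocode does not address it.

```python
def divide(a, b, n=4):
    """Divide n-bit unsigned a by b; returns (quotient, remainder)."""
    if b == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    r_prev = 0  # R'
    q = 0
    for i in range(n - 1, -1, -1):
        r = (r_prev << 1) | ((a >> i) & 1)  # R = {R' << 1, Ai}
        d = r - b                           # D = R - B
        if d < 0:
            r_prev = r                      # Qi = 0, R' = R
        else:
            q |= 1 << i                     # Qi = 1
            r_prev = d                      # R' = D
    return q, r_prev
```

For the example operands, divide(0b1101, 0b0010) gives a quotient of 0110 and a remainder of 1.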
4 x 4 Divider
[Figure: 4 x 4 array divider. Legend: each cell takes R and B with carry signals Cin and Cout, computes the difference D, and uses the signal N to select the cell output R’.]
Each row computes one iteration of the division algorithm.
Number Systems
Numbers we can represent using binary
representations
– Positive numbers
• Unsigned binary
– Negative numbers
• Two’s complement
• Sign/magnitude numbers
What about fractions?
Numbers with Fractions
Two common notations:
• Fixed-point: binary point fixed
• Floating-point: binary point floats to the right of
the most significant 1
Fixed-Point Numbers
• 6.75 using 4 integer bits and 4 fraction bits:
01101100
0110.1100
2^2 + 2^1 + 2^-1 + 2^-2 = 6.75
• Binary point is implied
• The number of integer and fraction bits must be agreed upon beforehand
Fixed-Point Number Example
• Represent 7.5₁₀ using 4 integer bits and 4 fraction bits.
01111000
Signed Fixed-Point Numbers
• Representations:
– Sign/magnitude
– Two’s complement
• Example: Represent -7.5₁₀ using 4 integer and 4 fraction bits
– Sign/magnitude:
11111000
– Two’s complement:
1. +7.5:          01111000
2. Invert bits:   10000111
3. Add 1 to lsb: +       1
                  10001000
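Both signed fixed-point encodings can be generated programmatically. The Python sketch below assumes the 4 integer bit, 4 fraction bit format used in the examples; the function names to_twos_complement and to_sign_magnitude are chosen for illustration. Scaling by 2^4 moves the binary point to the right end, so the value can be handled as an integer.

```python
def to_twos_complement(value, int_bits=4, frac_bits=4):
    """Return the two's complement fixed-point bit string for value."""
    n = int_bits + frac_bits
    scaled = round(value * (1 << frac_bits))
    if not -(1 << (n - 1)) <= scaled < (1 << (n - 1)):
        raise OverflowError("value does not fit in the signed format")
    return format(scaled & ((1 << n) - 1), "0{}b".format(n))


def to_sign_magnitude(value, int_bits=4, frac_bits=4):
    """Return the sign/magnitude fixed-point bit string for value."""
    n = int_bits + frac_bits
    magnitude = round(abs(value) * (1 << frac_bits))
    if magnitude >= 1 << (n - 1):
        raise OverflowError("magnitude does not fit in the format")
    sign = "1" if value < 0 else "0"
    return sign + format(magnitude, "0{}b".format(n - 1))
```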
Floating-Point Numbers
• Binary point floats to the right of the most significant 1
• Similar to decimal scientific notation
• For example, write 273₁₀ in scientific notation:
273 = 2.73 × 10^2
• In general, a number is written in scientific notation as:
± M × B^E
– M = mantissa
– B = base
– E = exponent
– In the example, M = 2.73, B = 10, and E = 2
Floating-Point Numbers
1 bit 8 bits 23 bits
Sign Exponent Mantissa
• Example: represent the value 228₁₀ using a 32-bit floating-point representation
We show three versions; the final version is called the IEEE 754 floating-point standard
Floating-Point Representation 1
1. Convert decimal to binary
228₁₀ = 11100100₂
2. Write the number in “binary scientific notation”:
11100100₂ = 1.11001₂ × 2^7
3. Fill in each field of the 32-bit floating-point number:
– The sign bit is positive (0)
– The 8 exponent bits represent the value 7
– The remaining 23 bits are the mantissa
0 00000111 111 0010 0000 0000 0000 0000
Sign Exponent Mantissa
Floating-Point Representation 2
• First bit of the mantissa is always 1:
– 228₁₀ = 11100100₂ = 1.11001 × 2^7
• So, no need to store it: implicit leading 1
• Store just fraction bits in 23-bit field
0 00000111 110 0100 0000 0000 0000 0000
Sign Exponent Fraction
Floating-Point Representation 3
• Biased exponent: bias = 127 (01111111₂)
– Biased exponent = bias + exponent
– Exponent of 7 is stored as:
127 + 7 = 134 = 10000110₂
• The IEEE 754 32-bit floating-point representation of 228₁₀
0 10000110 110 0100 0000 0000 0000 0000
Sign Biased Exponent Fraction
in hexadecimal: 0x43640000
Floating-Point Example
Write -58.25₁₀ in floating point (IEEE 754)
1. Convert decimal to binary:
58.25₁₀ = 111010.01₂
2. Write in binary scientific notation:
1.1101001 × 2^5
3. Fill in fields:
Sign bit: 1 (negative)
8 exponent bits: (127 + 5) = 132 = 10000100₂
23 fraction bits: 110 1001 0000 0000 0000 0000
1 100 0010 0 110 1001 0000 0000 0000 0000
in hexadecimal: 0xC2690000
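The three encoding steps can be followed in Python and cross-checked against the machine's own single-precision encoding. The sketch below is a simplified illustration: the function names are assumptions, the input is assumed to be a nonzero normal value, and extra fraction bits are truncated rather than rounded. It uses math.frexp to obtain the binary scientific notation and struct to obtain the reference bit pattern.

```python
import math
import struct


def encode_single(x):
    """Build the IEEE 754 single-precision bit pattern for a normal value x."""
    sign = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))                 # abs(x) = m * 2**e with 0.5 <= m < 1
    exponent = e - 1                          # abs(x) = (2m) * 2**(e-1) with 1 <= 2m < 2
    fraction = int((2 * m - 1) * (1 << 23))   # drop the implicit leading 1
    biased = exponent + 127                   # bias = 127
    return (sign << 31) | (biased << 23) | fraction


def reference_single(x):
    """Bit pattern produced by the platform's float-to-single conversion."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits
```

For the two worked values, hex(encode_single(228.0)) is 0x43640000 and hex(encode_single(-58.25)) is 0xc2690000, matching the fields filled in by hand.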
Floating-Point: Special Cases
Number  Sign  Exponent  Fraction
0       X     00000000  00000000000000000000000
∞       0     11111111  00000000000000000000000
-∞      1     11111111  00000000000000000000000
NaN     X     11111111  non-zero
Floating-Point Precision
• Single-Precision:
– 32-bit
– 1 sign bit, 8 exponent bits, 23 fraction bits
– bias = 127
• Double-Precision:
– 64-bit
– 1 sign bit, 11 exponent bits, 52 fraction bits
– bias = 1023
Floating-Point: Rounding
• Overflow: number too large to be represented
• Underflow: number too small to be represented
• Rounding modes:
– Down
– Up
– Toward zero
– To nearest
• Example: round 1.100101 (1.578125) to only 3 fraction bits
– Down: 1.100
– Up: 1.101
– Toward zero: 1.100
– To nearest: 1.101 (1.625 is closer to 1.578125 than 1.5 is)
Floating-Point Addition
1. Extract exponent and fraction bits
2. Prepend leading 1 to form mantissa
3. Compare exponents
4. Shift smaller mantissa if necessary
5. Add mantissas
6. Normalize mantissa and adjust exponent if necessary
7. Round result
8. Assemble exponent and fraction back into floating-point
format
Floating-Point Addition Example
Add the following floating-point numbers:
0x3FC00000
0x40500000
1. Extract exponent and fraction bits
0 01111111 100 0000 0000 0000 0000 0000
0 10000000 101 0000 0000 0000 0000 0000
For first number (N1): S = 0, E = 127, F = .1
For second number (N2): S = 0, E = 128, F = .101
2. Prepend leading 1 to form mantissa
N1: 1.1
N2: 1.101
3. Compare exponents
127 – 128 = -1, so shift N1 right by 1 bit
4. Shift smaller mantissa if necessary
shift N1’s mantissa: 1.1 >> 1 = 0.11 (× 2^1)
5. Add mantissas
   0.11  × 2^1
+  1.101 × 2^1
  10.011 × 2^1
6. Normalize mantissa and adjust exponent if necessary
10.011 × 2^1 = 1.0011 × 2^2
7. Round result
No need (fits in 23 bits)
8. Assemble exponent and fraction back into floating-point format
S = 0, E = 2 + 127 = 129 = 10000001₂, F = 001100..
0 10000001 001 1000 0000 0000 0000 0000
in hexadecimal: 0x40980000
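The eight steps can be traced in a simplified software model. The Python sketch below makes several assumptions that the general procedure does not: both operands are positive normal numbers, bits shifted out during alignment are simply dropped (truncation stands in for the rounding step), and exponent overflow is not handled. The function name fp_add is chosen for illustration.

```python
def fp_add(x_bits, y_bits):
    """Add two positive, normal single-precision numbers given as 32-bit patterns."""
    # 1. Extract exponent and fraction bits
    ex, fx = (x_bits >> 23) & 0xFF, x_bits & 0x7FFFFF
    ey, fy = (y_bits >> 23) & 0xFF, y_bits & 0x7FFFFF
    # 2. Prepend leading 1 to form 24-bit mantissas
    mx, my = fx | (1 << 23), fy | (1 << 23)
    # 3. Compare exponents; 4. shift the smaller number's mantissa right
    if ex < ey:
        mx >>= ey - ex
        e = ey
    else:
        my >>= ex - ey
        e = ex
    # 5. Add mantissas
    m = mx + my
    # 6. Normalize mantissa and adjust exponent if necessary
    if m >> 24:
        m >>= 1
        e += 1
    # 7. Round result (this sketch truncates)
    # 8. Assemble exponent and fraction (sign bit is 0)
    return (e << 23) | (m & 0x7FFFFF)
```

Applied to the example operands, fp_add(0x3FC00000, 0x40500000) returns 0x40980000, that is, 1.5 + 3.25 = 4.75.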
Counters
• Increments on each clock edge
• Used to cycle through numbers. For example,
– 000, 001, 010, 011, 100, 101, 110, 111, 000, 001…
• Example uses:
– Digital clock displays
– Program counter: keeps track of current instruction executing
[Figure: counter symbol (CLK and Reset inputs, N-bit output Q) and implementation (an N-bit resettable register whose input is Q + 1).]
Counter Verilog (FSM style)
module counter #(parameter N = 8)   // counter width; the default of 8 is an arbitrary choice
               (input  logic         clk, reset,
                output logic [N-1:0] q);
  logic [N-1:0] nextq;

  // register
  always_ff @(posedge clk, posedge reset)
    if (reset) q <= 0;
    else       q <= nextq;

  // next state
  assign nextq = q + 1;
endmodule
Counter Verilog (better idiom)
module counter #(parameter N = 8)   // counter width; the default of 8 is an arbitrary choice
               (input  logic         clk, reset,
                output logic [N-1:0] q);
  // register and next-state logic in one block
  always_ff @(posedge clk, posedge reset)
    if (reset) q <= 0;
    else       q <= q + 1;
endmodule
Divide-by-2^N Counter
• Most significant bit of an N-bit counter completes one period every 2^N clock cycles, so its frequency is fclk / 2^N.
• Useful for slowing a clock. Ex: blink an LED
• Example: 50 MHz clock, 24-bit counter
• 50 MHz / 2^24 = 2.98 Hz
Digitally Controlled Oscillator
• N-bit counter
• Add p on each cycle, instead of 1
• Most significant bit toggles at fout = fclk × p / 2^N
• Example: fclk = 50 MHz clock
• How to generate a fout = 200 Hz signal?
• p / 2^N = 200 Hz / 50 MHz
• Try N = 24, p = 67 → fout = 199.676 Hz
• Or N = 32, p = 17179 → fout = 199.990 Hz
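The output frequency relation is simple enough to tabulate for different counter widths. The Python sketch below uses hypothetical helper names (dco_frequency, nearest_increment). Note that nearest_increment rounds to the closest integer, so for N = 32 it returns p = 17180 rather than the p = 17179 used in the example above; both values give an output within about 0.01 Hz of 200 Hz.

```python
def dco_frequency(f_clk, n, p):
    """Output frequency of an n-bit counter that adds p each clock cycle."""
    return f_clk * p / 2 ** n


def nearest_increment(f_clk, n, f_out):
    """Integer p whose output frequency is closest to f_out."""
    return round(f_out * 2 ** n / f_clk)


f_clk = 50_000_000                          # 50 MHz
f_24 = dco_frequency(f_clk, 24, 67)         # about 199.676 Hz
f_32 = dco_frequency(f_clk, 32, 17179)      # about 199.990 Hz
```

Widening the counter from 24 to 32 bits shrinks the frequency step fclk / 2^N, which is why the 32-bit choice lands closer to the target.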
Shift Registers
• Shift a new bit in on each clock edge
• Shift a bit out on each clock edge
• Serial-to-parallel converter: converts serial input (Sin) to
parallel output (Q0:N-1)
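[Figure: shift register symbol (Sin, Sout, N-bit output Q) and implementation as a chain of N flip-flops clocked by CLK, running from Sin through Q0, Q1, Q2, …, QN-1 to Sout.]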
Shift Register with Parallel Load
• When Load = 1, acts as a normal N-bit register
• When Load = 0, acts as a shift register
• Now can act as a serial-to-parallel converter (Sin to Q0:N-1) or
a parallel-to-serial converter (D0:N-1 to Sout)
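[Figure: shift register with parallel load. In front of each flip-flop, a multiplexer controlled by Load selects either the shifted bit (input 0) or the parallel input D0 … DN-1 (input 1). Outputs are Q0 … QN-1 and Sout.]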
Shift Register Verilog Idiom
module shiftreg #(parameter N = 8)   // register width; the default of 8 is an arbitrary choice
                (input  logic         clk,
                 input  logic         load, sin,
                 input  logic [N-1:0] d,
                 output logic [N-1:0] q,
                 output logic         sout);
  always_ff @(posedge clk)
    if (load) q <= d;                  // parallel load
    else      q <= {q[N-2:0], sin};    // shift by one; sin enters at q[0]
  assign sout = q[N-1];
endmodule
Memory Arrays
• Efficiently store large amounts of data
• 3 common types:
– Dynamic random access memory (DRAM)
– Static random access memory (SRAM)
– Read only memory (ROM)
• M-bit data value read/written at each unique N-bit address
[Figure: memory array symbol with an N-bit Address input and an M-bit Data port.]
Memory Arrays
• 2-dimensional array of bit cells
• Each bit cell stores one bit
• N address bits and M data bits:
– 2^N rows and M columns
– Depth: number of rows (number of words)
– Width: number of columns (size of word)
– Array size: depth × width = 2^N × M
Address Data
11      0 1 0
10      1 0 0
01      1 1 0
00      0 1 1
(2-bit Address, 3-bit Data: depth = 4 rows, width = 3 columns)
Memory Array Example
• 2^2 × 3-bit array
• Number of words: 4
• Word size: 3 bits
• For example, the 3-bit word stored at address 10 is 100
Address Data
11      0 1 0
10      1 0 0
01      1 1 0
00      0 1 1
Memory Arrays
[Figure: 1024-word × 32-bit array with a 10-bit Address and 32-bit Data.]
Memory Array Bit Cells
[Figure: bit cell connected to a wordline and a bitline.]
(a) wordline = 1: stored bit = 0 gives bitline = 0; stored bit = 1 gives bitline = 1
(b) wordline = 0: bitline = Z whether the stored bit is 0 or 1
Memory Array
• Wordline:
– like an enable
– single row in memory array read/written
– corresponds to unique address
– only one wordline HIGH at once
[Figure: 4 × 3 memory array. A 2:4 decoder driven by the 2-bit Address asserts one of wordline3 … wordline0; bitline2, bitline1, bitline0 carry Data2, Data1, Data0.]
Address  Wordline   Stored bits (bitline2 bitline1 bitline0)
11       wordline3  0 1 0
10       wordline2  1 0 0
01       wordline1  1 1 0
00       wordline0  0 1 1

Types of Memory
• Random access memory (RAM): volatile
• Read only memory (ROM): nonvolatile
RAM: Random Access Memory
• Volatile: loses its data when power off
• Read and written quickly
• Main memory in your computer is RAM
(DRAM)
Historically called random access memory because any data word is accessed as easily as any other (in contrast to sequential access memories such as a tape recorder)
ROM: Read Only Memory
• Nonvolatile: retains data when power off
• Read quickly, but writing is impossible or
slow
• Flash memory in digital cameras and thumb drives is a type of ROM
Historically called read only memory because ROMs were
written at manufacturing time or by burning fuses. Once
ROM was configured, it could not be written again. This is
no longer the case for Flash memory and other types of
ROMs.
Types of RAM
• DRAM (Dynamic random access memory)
• SRAM (Static random access memory)
• Differ in how they store data:
– DRAM uses a capacitor
– SRAM uses cross-coupled inverters
Robert Dennard, 1932 -
• Invented DRAM in 1966
at IBM
• Others were skeptical
that the idea would
work
• By the mid-1970’s DRAM
in virtually all computers
DRAM
• Data bits stored on capacitor
• Dynamic because the value needs to be refreshed
(rewritten) periodically and after read:
– Charge leakage from the capacitor degrades the value
– Reading destroys the stored value
[Figure: DRAM bit cell. A transistor controlled by the wordline connects the bitline to a capacitor that holds the stored bit; a charged capacitor represents stored bit = 1 and a discharged capacitor represents stored bit = 0.]
SRAM
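[Figure: SRAM bit cell. Cross-coupled inverters hold the stored bit; transistors controlled by the wordline connect the cell to the bitline and its complement.]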
Memory Arrays Review
[Figure: review of the 4 × 3 memory array with its 2:4 decoder, wordlines, and Data2, Data1, Data0 outputs, shown alongside the DRAM bit cell and the SRAM bit cell.]
ROM: Dot Notation
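[Figure: ROM in dot notation. A 2:4 decoder driven by the 2-bit Address selects one wordline per address (11, 10, 01, 00); the bitlines deliver Data2, Data1, Data0. Insets show the bit cell containing 0 and the bit cell containing 1.]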
Types of ROMs
Type   | Name                                   | Description
ROM    | Read Only Memory                       | Chip is hardwired with presence or absence of transistors. Changing requires building a new chip.
PROM   | Programmable ROM                       | Fuses in series with each transistor are blown to program bits. Can’t be changed after programming.
EPROM  | Erasable Programmable ROM              | Charge is stored on a floating gate to activate or deactivate transistor. Erasing requires exposure to UV light.
EEPROM | Electrically Erasable Programmable ROM | Like EPROM, but erasing can be done electrically.
Flash  | Flash Memory                           | Like EEPROM, but erasing is done on large blocks to amortize cost of erase circuit. Low cost per bit, dominates nonvolatile storage today.
Fujio Masuoka, 1944 -
• Developed memories and high speed
circuits at Toshiba, 1971-1994
• Invented Flash memory as an
unauthorized project pursued during
nights and weekends in the late 1970’s
• The process of erasing the memory
reminded him of the flash of a camera
• Toshiba slow to commercialize the
idea; Intel was first to market in 1988
• Flash has grown into a $25 billion per
year market
ROM Storage
[Figure: ROM with a 2:4 decoder on the 2-bit Address and three data columns, Data2, Data1, Data0.]
Address Data
11      0 1 0
10      1 0 0
01      1 1 0
00      0 1 1
(depth = 4 words, width = 3 bits)
ROM Logic
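[Figure: the same ROM viewed as combinational logic, with each data output a function of the address bits A1 and A0.]
Data2 = A1 ⊕ A0
Data1 = ~A1 + A0
Data0 = ~A1 ~A0
(~ denotes complement; the equations follow from the ROM contents 010, 100, 110, 011 at addresses 11, 10, 01, 00)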
Example: Logic with ROMs
Implement the following logic functions using a 2^2 × 3-bit ROM:
– X = AB
– Y = A + B
– Z = A ~B (~ denotes complement)
[Figure: ROM with a 2:4 decoder addressed by A, B and three columns, X, Y, Z, programmed in dot notation.]
Logic with Any Memory Array
[Figure: 4 × 3 memory array with a 2:4 decoder and wordline3 … wordline0, storing 010, 100, 110, 011 at addresses 11, 10, 01, 00.]
Data2 = A1 ⊕ A0
Data1 = ~A1 + A0
Data0 = ~A1 ~A0
(~ denotes complement)
Logic with Memory Arrays
Implement the following logic functions using a 2^2 × 3-bit memory array:
– X = AB
– Y = A + B
– Z = A ~B (~ denotes complement)
[Figure: 2:4 decoder addressed by A, B driving wordline3 … wordline0; bitline2, bitline1, bitline0 provide X, Y, Z.]
A, B | X Y Z
11   | 1 1 0
10   | 0 1 1
01   | 0 1 0
00   | 0 0 0
Called lookup tables (LUTs): look up output at each input combination (address)
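A lookup table is just the truth table stored in memory, so its contents can be generated by evaluating each function at every address. The Python sketch below illustrates the idea; the helper name build_lut is an assumption, and the third function assumes Z = A AND (NOT B).

```python
def build_lut(functions, n_inputs):
    """Return a list of words; the word at address a holds each function's output.

    The most significant address bit is the first function argument.
    """
    table = []
    for address in range(2 ** n_inputs):
        bits = [(address >> (n_inputs - 1 - i)) & 1 for i in range(n_inputs)]
        table.append(tuple(f(*bits) for f in functions))
    return table


# X = AB, Y = A + B, Z = A AND (NOT B)
lut = build_lut(
    [
        lambda a, b: a & b,
        lambda a, b: a | b,
        lambda a, b: a & (1 - b),
    ],
    n_inputs=2,
)
```

Reading lut[0b11] performs the same lookup as asserting the wordline for address 11 and sensing the three bitlines.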
4-word x 1-bit Array
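Truth Table
A B | Y
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1
[Figure: 2:4 decoder with A on A1 and B on A0 selecting one of four bit cells on a single bitline. The cells at addresses 00, 01, and 10 store 0 and the cell at address 11 stores 1; the bitline output is Y.]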
Multi-ported Memories
• Port: address/data pair
• 3-ported memory
– 2 read ports (A1/RD1, A2/RD2)
– 1 write port (A3/WD3, WE3 enables writing)
• Register file: small multi-ported memory
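[Figure: three-ported memory array with CLK, write enable WE3, N-bit addresses A1, A2, A3, M-bit read data RD1 and RD2, and M-bit write data WD3.]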
SystemVerilog Memory Arrays
// 256 x 64 memory module with one read/write port
module dmem(input  logic        clk, we,
            input  logic [7:0]  a,
            input  logic [63:0] wd,
            output logic [63:0] rd);
  logic [63:0] RAM[255:0];

  always_ff @(posedge clk)
    begin
      rd <= RAM[a];       // synchronous read
      if (we)
        RAM[a] <= wd;     // synchronous write
    end
endmodule
SystemVerilog Register File
// 16 x 32 register file with two read ports and one write port
module rf(input  logic        clk, we3,
          input  logic [3:0]  a1, a2, a3,
          input  logic [31:0] wd3,
          output logic [31:0] rd1, rd2);
  logic [31:0] RAM[15:0];

  always_ff @(posedge clk)   // synchronous write
    if (we3)
      RAM[a3] <= wd3;

  assign rd1 = RAM[a1];      // asynchronous read
  assign rd2 = RAM[a2];
endmodule
Logic Arrays
• PLAs (Programmable logic arrays)
– AND array followed by OR array
– Combinational logic only
– Fixed internal connections
• FPGAs (Field programmable gate arrays)
– Array of Logic Elements (LEs)
– Combinational and sequential logic
– Programmable internal connections
PLAs
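• X = ~A~BC + AB~C (~ denotes complement)
• Y = A~B
[Figure: PLA structure. M inputs feed an AND array that forms N implicants; the implicants feed an OR array that produces P outputs. In this example, inputs A, B, C form the implicants ~A~BC, AB~C, and A~B in the AND array, and the OR array combines them into X and Y.]
PLAs: Dot Notation
[Figure: the same PLA drawn in dot notation, with dots marking the programmed connections in the AND array and the OR array.]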
FPGA: Field Programmable Gate Array
• Composed of:
– LEs (Logic elements): perform logic
– IOEs (Input/output elements): interface with outside
world
– Programmable interconnection: connect LEs and
IOEs
– Some FPGAs include other building blocks such as
multipliers and RAMs
General FPGA Layout
LE: Logic Element
• Composed of:
– LUTs (lookup tables): perform combinational logic
– Flip-flops: perform sequential logic
– Multiplexers: connect LUTs and flip-flops
Altera Cyclone IV LE
• The Altera Cyclone IV LE has:
– 1 four-input LUT
– 1 registered output
– 1 combinational output
LE Configuration Example
Show how to configure a Cyclone IV LE to perform the following functions:
– X = ~A~BC + AB~C (~ denotes complement)
– Y = A~B
LE 1: A on data 1, B on data 2, C on data 3, 0 on data 4; the LUT output is X
(A)     (B)     (C)             (X)
data 1  data 2  data 3  data 4  LUT output
0       0       0       X       0
0       0       1       X       1
0       1       0       X       0
0       1       1       X       0
1       0       0       X       0
1       0       1       X       0
1       1       0       X       1
1       1       1       X       0
LE 2: A on data 1, B on data 2, 0 on data 3 and data 4; the LUT output is Y
(A)     (B)                     (Y)
data 1  data 2  data 3  data 4  LUT output
0       0       X       X       0
0       1       X       X       0
1       0       X       X       1
1       1       X       X       0
(An X in a data column marks a don’t care.)
LE Example: AND5
How many LEs are required to build a 5-input AND gate?
Solution: 2. The first LE performs AND4 (a function of 4 variables). The second performs AND2 of the first result and the 5th input.
LE Example: 3-bit counter
How many LEs are required to build a 3-bit counter?
Solution: 3. The counter has 3 flip-flops, so it requires at least 3 LEs. The add logic for each bit is a function of fewer than 4 variables, so it can fit in the LUT before the flop. Hence, 3 LEs are sufficient.
FPGA Design Flow
Using a CAD tool (such as Altera’s Quartus II):
• Enter the design with an HDL
• Simulate the design
• Synthesize the design and map it onto the FPGA
• Download the configuration onto the FPGA
• Test the design
This is an iterative process!
