Chapter 4 The Processor

COMPUTER ORGANIZATION AND DESIGN, 5th Edition
The Hardware/Software Interface
Contents
 4.1 Introduction
 4.3 Building a Datapath
 4.4 A Simple Implementation Scheme
 4.5 An Overview of Pipelining
 4.6 Pipelined Datapath and Control
4.1 Introduction
 Three CPU performance factors
 Instruction count
 Determined by ISA and compiler
 E.g.,
 Pseudo instruction: bgt $t4, $t5, L
 Actual ones: slt $at, $t5, $t4
 bne $at, $zero, L
 CPI and Cycle time
 Determined by CPU hardware (i.e., implementation)
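The pseudo-instruction example above can be sketched in a few lines of Python. This is only an illustration of how an assembler might expand `bgt`; the function name `expand_bgt` is hypothetical and not part of any real assembler.

```python
def expand_bgt(rs, rt, label):
    """Expand the pseudo-instruction `bgt rs, rt, label` into two real MIPS instructions."""
    return [
        f"slt $at, {rt}, {rs}",      # $at = 1 when rt < rs, i.e. when rs > rt
        f"bne $at, $zero, {label}",  # branch to label when $at is non-zero
    ]

if __name__ == "__main__":
    for line in expand_bgt("$t4", "$t5", "L"):
        print(line)
```

One pseudo-instruction becomes two machine instructions, which is why the instruction count depends on the compiler and assembler as well as the ISA.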
Implementation
 This chapter examines two MIPS implementations (i.e., the datapath and the control unit)
 A simplified version
 A more realistic pipelined version
 Simple subset, shows most aspects
 Memory reference: lw, sw
 Arithmetic/logical: add, sub, and, or, slt
 Control transfer: beq, j
Instruction Execution (1/2)
 Phases of instruction execution
 Instruction fetch
 Send the PC to the memory and fetch the instruction
 Decoding
 Operands fetch
 Read one or two registers, using fields of the instruction to select the registers to read
e.g., lw $10, 24($12)
      add $10, $11, $12
Instruction Execution (2/2)
 Execution, depending on instruction class
 memory-reference instructions: an address calculation
 arithmetic-logical instructions: an operation execution
 branches: comparison
 Write back
 memory-reference instructions: access the memory
 arithmetic-logical instructions: write the data from the ALU back into a register
 branches: PC ← target address or PC + 4
e.g., lw $10, 24($12)
      add $10, $11, $12
An Abstract View of the Implementation
Fig. 4.1
lw $10, 24($12)
  op  rs         rt  address
  35  12 (base)  10  24
add $10, $11, $12
  op  rs  rt  rd  shamt  funct
  0   11  12  10  0      32
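As a rough check of the field values listed above, the following Python sketch packs them into 32-bit words using the standard MIPS I-format (6/5/5/16 bits) and R-format (6/5/5/5/5/6 bits) layouts. The helper names `encode_i` and `encode_r` are made up for this illustration.

```python
def encode_i(op, rs, rt, imm):
    """Pack an I-format instruction: op(6) rs(5) rt(5) immediate(16)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack an R-format instruction: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

if __name__ == "__main__":
    print(f"lw  $10, 24($12)  -> 0x{encode_i(35, 12, 10, 24):08X}")
    print(f"add $10, $11, $12 -> 0x{encode_r(0, 11, 12, 10, 0, 32):08X}")
```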
Basic Implementation
 Fig. 4.2 shows the datapath of Fig. 4.1 with the three required multiplexors added, as well as control lines for the major functional units
 In the remainder of the chapter, we refine this view to fill in the details
Multiplexers
 Can’t just join wires together
 Use multiplexers
Fig. 4.1
The Basic Implementation
Fig. 4.2
4.3 Building a Datapath
 Datapath elements
 Elements that process data and addresses
in the CPU
 E.g., Registers, ALUs, mux’s, memories, …
 Instructions supported
 Arithmetic/logical (R-format): add, sub, and, or, slt
 Memory reference: lw, sw
 Control transfer: beq, j
Datapath for Fetching Instruction
Fig. 4.6
Increment by 4 for next instruction
R-Format Instructions
 Register file
 A collection of registers in which any register can be read or written by specifying the number of the register in the file
 Read two register operands
 Perform arithmetic/logical operation
 Write register result (controlled by the write control signal)
lw $10, 24($12)
  op  rs  rt  address
  35  12  10  24
add $10, $11, $12
  op  rs  rt  rd  shamt  funct
  0   11  12  10  0      32
Full Datapath
Fig. 4.11
Load/Store Instructions
 Read register operands
 Calculate address using 16-bit offset
 Use ALU, but sign-extend offset
 Load: Read memory and update register
 Store: Write register value to memory
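The address calculation described above can be sketched as follows. This is a minimal model, assuming 32-bit addresses; the names `sign_extend16` and `effective_address` are illustrative only.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def effective_address(base, offset16):
    """ALU add of the base register and the sign-extended 16-bit offset (32-bit wraparound)."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF

if __name__ == "__main__":
    # lw $10, 24($12) with a hypothetical value of 1000 in $12
    print(effective_address(1000, 24))
    # A negative offset of -4, stored in the instruction as 0xFFFC
    print(effective_address(1000, 0xFFFC))
```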
Exercise: data path for sw $10, 24($12)
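Exercise (Class: / Student ID: / Name:)
Answer: sw $10, 24($12)
[Datapath figure labels: register number 12 reads $12, register number 10 reads $10, offset 24 is sign-extended to 24]
lw $10, 24($12)
[Datapath figure labels: register number 12 reads $12, register number 10 is the write register, offset 24 is sign-extended to 24]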
R-Type/Load/Store Datapath
Fig. 4.10
Branch Instructions
 The datapath for a branch (Fig 4.9)
 e.g., beq $t0, $s5, Exit
 Calculate the branch target address
 The offset field is shifted left 2 bits so that it is a word
offset
 Adding the sign-extended offset field of the instruction
to the PC
 Compare operands
 Do a subtract
 If the Zero signal out of the ALU unit is asserted, we
know that the two values are equal
Review: Showing Branch Offset
Loop: add $t1, $s3, $s3     # location 80000
      add …
      add …
      lw …                  # location 80012
      beq $t0, $s5, Exit    # location 80016
      add …
      j Loop                # location 80024
Exit:                       # location 80028
>> Answer
80028 (Exit) = 80020 (PC) + 8 (Offset)
beq $t0, $s5, Exit
  op  rs  rt  offset
  4   8   21  2
PC-relative addressing
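The branch-offset arithmetic in the review example can be reproduced with a short Python sketch. It assumes the usual MIPS rule that the target is PC + 4 plus the sign-extended offset shifted left by 2; the function names are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc, offset_field):
    """Target address = (PC + 4) + (sign-extended word offset << 2)."""
    return (pc + 4) + (sign_extend16(offset_field) << 2)

def offset_field(pc, target):
    """Word offset stored in the instruction for a branch at pc that goes to target."""
    return ((target - (pc + 4)) // 4) & 0xFFFF

if __name__ == "__main__":
    print(branch_target(80016, 2))     # beq at 80016 with offset field 2
    print(offset_field(80016, 80028))  # offset needed to reach Exit
```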
Branch Instructions
Fig. 4.9
Sign-bit wire replicated
beq $t0, $s5, Exit    (op = 4, rs = 8, rt = 21, offset = 2)
Fig. 4.11 traced for beq $t0, $s5, Exit: PC = 80016, PC + 4 = 80020
4.5 An Overview of Pipelining
 Pipelining is an implementation technique in which multiple instructions are overlapped in execution
 All stages in pipelining are operating concurrently
Laundry Analogy for Pipelining (1/2)
1. Washer
2. Dryer
3. Fold
4. Put away
Fig. 4.25
Laundry Analogy for Pipelining (2/2)
 Four stages in pipeline (Figure 4.25)
 non-pipelined approach: 16 time units for four loads
 pipelined approach: 7 time units
 Pipelining improves the throughput of our laundry system without improving the time to complete a single task
 With many tasks and balanced stages, the speedup due to pipelining approaches the number of stages in the pipeline (here, with only four loads, it is 16/7 ≈ 2.3)
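The 16-versus-7 figures above follow from simple counting, sketched below in Python. The sketch assumes every stage takes exactly one time unit and that there are no stalls; the function names are illustrative.

```python
def sequential_time(tasks, stages):
    """Each task runs all stages before the next task starts."""
    return tasks * stages

def pipelined_time(tasks, stages):
    """The first task takes `stages` units; each later task finishes one unit after the previous."""
    return stages + tasks - 1

def speedup(tasks, stages):
    return sequential_time(tasks, stages) / pipelined_time(tasks, stages)

if __name__ == "__main__":
    print(sequential_time(4, 4), pipelined_time(4, 4))  # four loads of laundry, four stages
    print(round(speedup(4, 4), 2))
    print(round(speedup(1000, 4), 2))  # with many loads the speedup nears the stage count
```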
MIPS Pipeline
 Five stages, one step per stage
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access (data) memory operand
5. WB: Write result back to register
Fig. 4.27 (a) Single-cycle, nonpipelined: clock cycle length = 800 ps

1-Cycle vs Pipelined Performance
Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200 ps       100 ps         200 ps  200 ps         100 ps          800 ps
sw        200 ps       100 ps         200 ps  200 ps         -               700 ps
R-format  200 ps       100 ps         200 ps  -              100 ps          600 ps
beq       200 ps       100 ps         200 ps  -              -               500 ps
Fig. 4.26
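The two clock cycle lengths (800 ps and 200 ps) can be derived from the table with a small Python sketch. It assumes the single-cycle clock must fit the slowest instruction and the pipelined clock must fit the slowest stage; the names are made up for this example.

```python
# Stage latencies in picoseconds, copied from the table above (0 = stage unused).
# Order: instruction fetch, register read, ALU op, memory access, register write.
LATENCY_PS = {
    "lw":       [200, 100, 200, 200, 100],
    "sw":       [200, 100, 200, 200, 0],
    "R-format": [200, 100, 200, 0, 100],
    "beq":      [200, 100, 200, 0, 0],
}

def single_cycle_clock_ps():
    """The clock must be long enough for the slowest whole instruction."""
    return max(sum(stages) for stages in LATENCY_PS.values())

def pipelined_clock_ps():
    """The clock must be long enough for the slowest single stage."""
    return max(max(stages) for stages in LATENCY_PS.values())

def total_time_ps(n_instructions, pipelined):
    """Time to run n instructions on an ideal (stall-free) five-stage pipeline or single-cycle design."""
    if pipelined:
        return (5 + n_instructions - 1) * pipelined_clock_ps()
    return n_instructions * single_cycle_clock_ps()

if __name__ == "__main__":
    print(single_cycle_clock_ps(), pipelined_clock_ps())
    print(total_time_ps(3, pipelined=False), total_time_ps(3, pipelined=True))
```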
Figure 4.27 (b) Single-cycle, pipelined: clock cycle length = 200 ps
ISA Design for Pipelining
 MIPS ISA designed for pipelining
 All instructions are 32 bits
 Easier to fetch and decode in one cycle
 cf. x86: 1- to 17-byte instructions
 Few and regular instruction formats
 Can decode and read registers in one step
 Load/store addressing
 Can calculate address in 3rd stage, access memory in 4th stage
 Alignment of memory operands
 Memory access takes only one cycle
Case Study
 [video] A Day Made of Glass (11:24)
Pipeline Hazards
 There are situations in pipelining when the next instruction cannot execute in the following clock cycle.
 These events are called hazards
 Structural hazard
 Conflict for use of a resource
 Data hazard
 Need to wait for previous instruction to complete its data read/write
 Control hazard
 Deciding on control action depends on previous instruction
Structural Hazards
 Example
 Suppose the pipeline in Figure 4.27(b) had a fourth instruction
 In the same clock cycle, the first instruction would be accessing data from memory while the fourth instruction is fetching an instruction from the same memory
 Without two memories, our pipeline could have a structural hazard
Figure labels: lw $4, 400($0); memory reference
Data Hazards
 Data hazards arise from the dependence of one
instruction on an earlier one that is still in the pipeline
 Example
add $s0, $t0, $t1
sub $t2, $s0, $t3    # data dependency on $s0
 The add instruction does not write its result until the fifth stage, meaning that we would have to add two bubbles to the pipeline (next page)
How to Solve Data Hazards?
 Waiting 2 clock cycles
 Correct, but too pessimistic
Figure label: new $s0 available
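The "wait 2 clock cycles" rule can be sketched as a nop-insertion pass. This is a simplified model, not a real assembler pass: it assumes no forwarding, and it assumes a register written in WB can be read in ID during the same cycle, so a reader must issue at least three slots after the writer. The tuple format `(text, destination, sources)` and the name `insert_nops` are hypothetical.

```python
def insert_nops(program):
    """program: list of (text, dest_register_or_None, [source_registers]).

    Returns the instruction texts with nops inserted so that every reader
    issues at least 3 slots after the instruction that writes its source.
    """
    ready = {}   # register -> earliest issue slot at which it may be read
    out = []
    slot = 0
    for text, dest, sources in program:
        need = max((ready.get(r, 0) for r in sources), default=0)
        while slot < need:       # fill the gap with bubbles
            out.append("nop")
            slot += 1
        out.append(text)
        if dest is not None:
            ready[dest] = slot + 3
        slot += 1
    return out

if __name__ == "__main__":
    program = [
        ("add $s0, $t0, $t1", "$s0", ["$t0", "$t1"]),
        ("sub $t2, $s0, $t3", "$t2", ["$s0", "$t3"]),
    ]
    print("\n".join(insert_nops(program)))
```

Under these assumptions the add/sub pair gets two bubbles, which matches the count stated on the slide.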
Exercise
 Solve three data hazards by specifying
extra waiting clock cycles.
add $3, $4, $6
sub $5, $3, $2
lw $7, 100($5)
add $8, $7, $2
Answer
add $3, $4, $6
    2 waiting clock cycles
sub $5, $3, $2
    2 waiting clock cycles
lw $7, 100($5)
    2 waiting clock cycles
add $8, $7, $2
Forwarding
 Use result ($s0) when it is computed
 Don’t wait for it to be stored in a register
 Requires extra connections in the datapath
Fig. 4.29
Load-Use Data Hazard
 Can’t always avoid stalls by forwarding
 If value not computed when needed
 Can’t forward backward in time!
 Stall one stage
Fig. 4.30
Exercise 1
 Show the forwarding paths needed to
execute the following four instructions:
add $3, $4, $6
sub $5, $3, $2
lw $7, 100($5)
add $8, $7, $2
Answer
add $3, $4, $6
sub $5, $3, $2
lw $7, 100($5)
add $8, $7, $2
Exercise 2
 Identify all of the data dependencies in the
following code.
 Which dependencies are data hazards that
will be resolved via forwarding?
 Which dependencies are data hazards that
will cause a stall?
add $3, $4, $2
sub $5, $3, $1
lw $6, 200($3)
add $7, $3, $6
Answer
add $3, $4, $2
sub $5, $3, $1
lw $6, 200($3)
add $7, $3, $6
 Answer
 The data dependency between the load and the last add
instruction cannot be resolved by using forwarding.
IF ID EXE MEM WB
IF ID EXE MEM WB
IF ID EXE MEM WB
IF ID EXE MEM WB
Reordering Code to Avoid Stalls
 Example.
Consider the following code segment in C:
A = B + E;
C = B + F;
Here is the generated MIPS code for this segment:
lw  $t1, 0($t0)      # load B
lw  $t2, 4($t0)      # load E
add $t3, $t1, $t2    # $t3 = B + E
sw  $t3, 12($t0)     # store A = $t3
lw  $t4, 8($t0)      # load F
add $t5, $t1, $t4    # $t5 = B + F
sw  $t5, 16($t0)     # store C = $t5
Reorder the instructions to avoid any pipeline stalls.
 Answer.
 Both add instructions have a hazard, because each uses the result of the lw immediately before it, e.g.:
lw  $t2, 4($t0)
add $t3, $t1, $t2
 On a pipelined processor with forwarding, the reordered sequence will complete in two fewer cycles than the original version
 Answer (cont’d).
 Moving up the 3rd lw instruction eliminates both hazards:
lw  $t1, 0($t0)
lw  $t2, 4($t0)
lw  $t4, 8($t0)
add $t3, $t1, $t2
sw  $t3, 12($t0)
add $t5, $t1, $t4
sw  $t5, 16($t0)
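The "two fewer cycles" claim for the reordered code can be checked with a small load-use detector. This is a sketch under stated assumptions: a five-stage pipeline with full forwarding, where the only remaining stall is one cycle when an instruction reads a register loaded by the lw immediately before it. The tuple format and function names are hypothetical.

```python
def load_use_stalls(program):
    """program: list of (opcode, dest_register_or_None, [source_registers])."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        if prev_op == "lw" and prev_dest in curr_sources:
            stalls += 1   # loaded value is not available in time, even with forwarding
    return stalls

def total_cycles(program, stages=5):
    return stages + len(program) - 1 + load_use_stalls(program)

ORIGINAL = [
    ("lw", "$t1", ["$t0"]),
    ("lw", "$t2", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw", None, ["$t3", "$t0"]),
    ("lw", "$t4", ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw", None, ["$t5", "$t0"]),
]
# Same instructions with the third lw moved up, as on the slide.
REORDERED = [ORIGINAL[0], ORIGINAL[1], ORIGINAL[4], ORIGINAL[2], ORIGINAL[3], ORIGINAL[5], ORIGINAL[6]]

if __name__ == "__main__":
    print(total_cycles(ORIGINAL), total_cycles(REORDERED))
```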
Control Hazards
 Control hazard arises from the need to make a
decision based on the results of one instruction
while others are executing
 Fetching next instruction depends on branch outcome
 Pipeline can’t always fetch correct instruction
 Still working on ID stage of branch
Fig. 4.21
Stall on Branch
 Example:
beq $1, $2, 40
or $7, $8, $9
We must begin fetching the or instruction on the next clock cycle, but the pipeline cannot possibly know what the next instruction should be.
beq $1, $2, 40    IF ID EXE MEM WB
or $7, $8, $9     IF ID EXE MEM WB
Stall on Branch
 Assume that we put in extra hardware so that we can test registers, calculate the branch address, and update the PC during the 2nd stage of the pipeline.
beq $1, $2, 40    IF ID EXE MEM WB
or $7, $8, $9     IF ID EXE MEM WB
Fig. 4.31
Branch Prediction
 One simple approach is to always predict that branches will be untaken
 When you are right, the pipeline proceeds at full speed (see Figure 4.32 (a))
 Only when branches are taken does the pipeline stall (see Figure 4.32 (b))
 A more sophisticated version of branch prediction would have some branches predicted as taken and some as untaken
MIPS with Predict Not Taken
Fig. 4.32: (a) prediction correct; (b) prediction incorrect
More-Realistic Branch Prediction
 Dynamic branch prediction
 Hardware measures actual branch behavior
 e.g., record recent history of each branch
 Assume future behavior will continue the trend
 When wrong, stall while re-fetching, and update history
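One way to picture "record recent history of each branch" is a one-bit predictor that remembers only the last outcome per branch address. The sketch below is an illustrative assumption, not the scheme of any particular processor; real designs often use two-bit counters and finite tables. The class and function names, and the branch address 0x40, are hypothetical.

```python
class OneBitPredictor:
    """Predict that each branch repeats its most recent outcome."""

    def __init__(self):
        self.last_taken = {}   # branch address -> last observed outcome

    def predict(self, pc):
        # Branches with no history are predicted not taken.
        return self.last_taken.get(pc, False)

    def update(self, pc, taken):
        self.last_taken[pc] = taken

def count_mispredictions(predictor, trace):
    """trace: list of (branch_address, actually_taken)."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1        # the pipeline would stall and re-fetch here
        predictor.update(pc, taken)
    return misses

if __name__ == "__main__":
    # A loop branch at the hypothetical address 0x40: taken 9 times, then falls through.
    trace = [(0x40, True)] * 9 + [(0x40, False)]
    print(count_mispredictions(OneBitPredictor(), trace))
```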
Exercise
 Refer to the following sequence of instructions that are executed in a five-stage pipeline.
lw $1, 40($6)
add $6, $2, $2
sw $5, 50($1)
1) Indicate hazards and add nop instructions to eliminate them.
2) Assume there is full forwarding. Indicate hazards and add nop instructions to eliminate them.
Answer
(1)
lw $1, 40($6)
add $6, $2, $2
sw $5, 50($1)
Hazard: sw reads $1, which lw writes two instructions earlier.
Without forwarding, one nop removes the hazard:
lw $1, 40($6)
add $6, $2, $2
nop
sw $5, 50($1)
(2)
lw $1, 40($6)
add $6, $2, $2
sw $5, 50($1)
With full forwarding, the loaded $1 is forwarded to sw, so no nop is needed.
Case Study: NFC
 [video] Amazing-Smart-Mobile-Life-with-NFC (3:55)
4.6 MIPS Pipelined Datapath
MEM and WB: right-to-left flow leads to hazards
Pipeline registers
 Need registers between stages
 To hold information produced in previous cycle
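The role of the pipeline registers can be sketched with a toy simulation in which four registers (IF/ID, ID/EX, EX/MEM, MEM/WB) pass each instruction to the next stage on every clock edge. This is a minimal model with no hazards and no real data; the function name `simulate` is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def simulate(program):
    """Return one row per clock cycle listing the instruction in each stage (None = empty)."""
    regs = [None] * 4            # IF/ID, ID/EX, EX/MEM, MEM/WB
    pending = list(program)
    timeline = []
    while pending or any(r is not None for r in regs):
        fetched = pending.pop(0) if pending else None
        # IF works on the newly fetched instruction; every other stage works on
        # what the pipeline register in front of it latched in the previous cycle.
        timeline.append([fetched] + regs)
        # Clock edge: each pipeline register latches the output of the stage before it.
        regs = [fetched] + regs[:-1]
    return timeline

if __name__ == "__main__":
    for cycle, row in enumerate(simulate(["lw", "sw", "add"]), start=1):
        cells = [f"{stage}:{instr or '-'}" for stage, instr in zip(STAGES, row)]
        print(f"cycle {cycle}: " + "  ".join(cells))
```

Each printed row corresponds to a "single-clock-cycle" view of the pipeline; the full list of rows corresponds to a "multi-clock-cycle" view.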
Pipeline Operation
 Cycle-by-cycle flow of instructions through
the pipelined datapath
 “Single-clock-cycle” pipeline diagram
 Shows pipeline usage in a single cycle
 Highlight resources used
 c.f. “multi-clock-cycle” diagram
 Graph of operation over time
 We’ll look at “single-clock-cycle” diagrams
for load & store
IF for Load, Store, …
ID for Load, Store, …
EX for Load
MEM for Load
WB for Load
Wrong register number
Corrected Datapath for Load
EX for Store
MEM for Store
WB for Store
