Professional Documents
Culture Documents
Irwin]
Memory
11100 src1 data
32
32
src2 data
Fetch Exec
Decode
32 ALU 32
br offset
32
32 4 0 5 1 6 2 7 3
00000
word address (binary)
32
PC 4
32 Add 32 Add 32 32
32 bits
32
fixed size instructions 32-bits only three instruction formats three instruction formats limited instruction set limited number of registers in register file limited number of addressing modes arithmetic operands from the register file (load-store machine) allow instructions to contain immediate operands
Smaller is faster
memory-reference instructions: lw, sw arithmetic-logical instructions: add, addu, sub, subu, and, or, xor, nor, slt, sltu arithmetic-logical immediate instructions: addi, addiu, andi, ori, xori, slti, sltiu control flow instructions: beq, j use the program counter (PC) to supply the instruction address and fetch the instruction from memory (and update the PC) decode the instruction (and read registers) execute the instruction
Fetch PC PC+4 Exec Decode
Generic implementation:
elements that operate on data values (combinational) elements that contain state (sequential)
Instruction Memory
PC
Address Instruction
Write Data
Register
Reg Addr
Address
File
Reg Addr Reg Addr
Data Memory
Write Data
Read Data
Single cycle operation Split memory (Harvard) model - one memory for instructions and one for data
Clocking Methodologies
Clocking methodology defines when signals can be read and when they can be written
falling (negative) edge
clock cycle
clock period = 1/(clock cycle) e.g., 10 nsec clock cycle = 100 MHz clock period 1 nsec clock cycle = 1 GHz clock period
State Elements
Level
sensitive D latch
D clock
clock Q
!Q
Latches: output changes whenever the inputs change and the clock is asserted (level sensitive methodology) Two-sided timing constraint Outputs reflect inputs for half a period and are guaranteed to be stable for the other half Flip-flop: output changes only on a clock edge (edgetriggered methodology) One-sided timing constraint The beauty is that outputs are guaranteed to be isolated from inputs for the entire clock period
A clocking methodology defines when signals can be read and written wouldnt want to read a signal at the same time it was being written
CECS 440 Datapath & Control_1.8 (c) 2012 R. W. Allison
Our Implementation
An edge-triggered methodology Typical execution
read contents of some state elements send values through some combinational logic write results to one or more state elements
State element 1 State element 2
Combinational logic
clock
Assumes state elements are written on every clock cycle; if not, need explicit write control signal
write occurs only when both the write control is asserted and the clock edge occurs
(c) 2012 R. W. Allison
Fetching Instructions
Fetching instructions involves
reading the instruction from the Instruction Memory updating the PC value to be the address of the next (sequential) instruction
clock
4
Exec
PC is updated every clock cycle, so it does not need an explicit write control signal Instruction Memory is read every clock cycle, so it doesnt need an explicit read control signal
(c) 2012 R. W. Allison
Decoding Instructions
Decoding instructions involves
sending the fetched instructions opcode and function field bits to the control unit
Fetch Decode
clock
Fetch PC = PC+4
Control Unit
Exec
Decode
Instruction Memory
Address Instruction
Read Addr1
Register
Instruction
Read Addr 2
Read Data1
and
File
Write Addr Write Data Read Data2
Since we havent decoded the instruction yet, we dont know what the instruction is! Just in case the instruction uses values from the RegFile, we do work ahead by reading the two source operands
R-type:
op
rs
rt
rd
shamt
funct
perform operation (op and funct) on values in rs and rt store the result back into the Register File (into location rd)
Fetch
Decode Execute RegWrite ALU control
clock
Read Addr1
Fetch PC = PC+4
Register
Instruction
Read Addr 2
Read Data1
File
Write Addr
ALU
Read Data2
overflow zero
Exec
Decode
Write Data
Note that Register File is not written every cycle (e.g. sw), so we need an explicit write control signal for the Register File
(c) 2012 R. W. Allison
Fetch
Decode
Execute
clock
Read Addr1
Fetch PC = PC+4
Register
Instruction
Read Addr 2
Read Data1
File
Write Addr
ALU
Read Data2
overflow zero
Exec
Decode
Write Data
Where does the 1 (or 0) come from to store into $t0 in the Register File at the end of the execute cycle?
(c) 2012 R. W. Allison
I-type:
op
rs
rt
address offset
compute a memory address by adding the base register (in rs) to the 16-bit signed offset field in the instruction
base register was read from the Register File during decode offset value in the low order 16 bits of the instruction must be sign extended to create a 32-bit signed value
store value, read from the Register File during decode, must be written to the Data Memory load value, read from the Data Memory, must be stored in the Register File
(c) 2012 R. W. Allison
RegWrite
MemWrite
Read Addr1
rs Read Data1
Register
Instruction
Read Addr 2
Address
File
Write Addr
Write Data Read Data2
ALU
Data In Write Data if store
Data Memory
Data Out
16
Sign Extend
32
s.e. = { 16{IR[15]}, IR[15:0] }
MemRead
I-type:
op
rs
rt
address offset
compare the operands read from the Register File during decode (rs and rt values) for equality (zero ALU output) compute the branch target address by adding the updated PC to the sign extended 16-bit signed offset field in the instruction base register is the updated PC address offset value in the low order 16 bits of the instruction must be sign extended to create a 32-bit signed value and then shifted left 2 bits to turn it into a word address
(c) 2012 R. W. Allison
Add
Shift left 2
Add
PC
Read Addr1 rs
Register
Instruction
Read Addr 2
Read Data1
File
Write Addr Write Data Read Data2 rt
16
J-Type:
op
replace the lower 28 bits of the PC with the lower 26 bits of the fetched instruction shifted left by 2 bits
28
no datapath resource can be used more than once per instruction, so some must be duplicated (e.g., why we have a separate Instruction Memory and Data Memory) to share datapath elements between two different instruction classes will need multiplexers (muxs) at the input of the shared elements with control lines to do the selection
MemWrite
Instruction Memory PC
Address Instruction
Read Addr1
Register
Read Addr 2
File
Write Addr Write Data Read Data2
ALU
Data In
Data Memory
Data Out
MemRead Sign Extend 32 What instructions require us to have two paths to here?
Multiplexor Insertion
Add
4 RegWrite
MemWrite
MemtoReg
Instruction Memory PC
Address Instruction
Read Addr1
Register
Read Addr 2
File
Write Addr Write Data Read Data2
M U X
ALU
Data In
Data Memory
Data Out
M U X
16
Sign Extend
32
MemRead
Clock Distribution
System Clock clock cycle
Add
4 RegWrite
MemWrite
MemtoReg
Instruction Memory PC
Address Instruction
Read Addr1
Register
Read Addr 2
File
Write Addr Write Data Read Data2
M U X
ALU
Data In
Data Memory
Data Out
M U X
16
Sign Extend
32
MemRead
Shift left 2
Add
M U X
PCSrc MemWrite
Add
4 RegWrite
MemtoReg
Instruction Memory PC
Address Instruction
Read Addr1
Register
Read Addr 2
File
Write Addr Write Data Read Data2
M U X
ALU
Data In
Data Memory
Data Out
M U X
16
Sign Extend
32
MemRead
ALU might not produce right answer right away Memory and RegFile reads are combinational (as are ALU, adders, muxs, shifter, sign-extender) Use write signals along with the clock edge to determine when to write to the sequential elements (to the PC, to the Register File and to the Data Memory)
The clock cycle time is determined by the logic delay through the longest path
We are ignoring some details like register setup and hold times
CECS 440 Datapath & Control_1.25 (c) 2012 R. W. Allison