
DIGITAL LOGIC DESIGN


( CE_118 )

CHAPTER 4:
PROCESSOR DESIGN
(part_1)

Outline
Ø We introduced new concepts:
ü Instruction sets
ü Instruction types
ü Addressing modes
ü Instruction-execution cycle
ü Processor design flow

Ø Including
ü instruction set design,
ü instruction set flowcharts,
ü component allocation,
ü ASM charts
ü processor architecture

Ø We have demonstrated processor design


ü 16-bit CISC design
ü 32-bit RISC design
o data-forwarding
o branch prediction

Basic definitions
Ø The processor controls overall system operation: it supervises I/O devices (keyboards, monitors, disks, tapes), synchronizes data between the system and its components, and performs most of the computational tasks.

Ø An ASIC (Application-Specific Integrated Circuit) is a co-processor that executes one or more specific tasks much faster than the processor itself. For this reason, such ASICs are called accelerators, since the processor offloads computationally intensive tasks to them.

Ø An ASIP (Application-Specific Instruction-set Processor) is a processor with specific instructions that allow some applications to execute much faster than on an ordinary processor.

Basic computer architecture



Instruction Set
Ø An instruction set (IS) is the collection of instructions and instruction formats that are interpreted by the processor's control unit and executed in the processor's datapath.
Ø An instruction is a string of bits grouped into a number of different fields, such as
ü Operation code (op-code)
ü Instruction type
ü Address fields

Ø A typical statement such as a = b + c, where a, b, and c are stored in memory at locations A, B, and C, can be expressed in assembly language format:
ü Add A, B, C    ( Mem[A] ← Mem[B] + Mem[C] )
  (assembly instruction)    (mathematical notation)

Typical instructions: instruction-type field


Ø Register instructions (arithmetic, logic, shift)
ü Add RA, RB, RC    ( RF[A] ← RF[B] + RF[C] )
Ø Move instructions (load and store)
ü Load R2, A    ( RF[2] ← Mem[A] )
ü Store A, R2    ( Mem[A] ← RF[2] )
Ø Control instructions (branch instruction)
ü Beq R2, R3, A    if R2 = R3 then PC ← Mem[A]
                   if R2 ≠ R3 then PC ← PC+1

Typical instructions: instruction-type field

Ø Mode field
ü Lind R2, A    ( RF[2] ← Mem[ Mem[A] ] )

Ø Constant field
ü Add R2, R3, 1    ( RF[2] ← RF[3] + 1 )

Number of address fields vs. performance


Ø Mathematical expression: C = (a+b)(a-b)

Ø Code with three-address instructions:
1. Add X, A, B    ( Mem[X] ← Mem[A] + Mem[B] )
2. Sub C, A, B    ( Mem[C] ← Mem[A] - Mem[B] )
3. Mul C, X, C    ( Mem[C] ← Mem[X] × Mem[C] )
Ø Code size: 3 instructions

Ø Code performance:
- Assume memory sizes of 16 to 256 million words → each address requires 24 to 28 bits → the three addresses in each instruction require 80 to 90 bits (including the op-code field)
- Assume the processor has 32-bit data and a memory with 32-bit words → each instruction occupies three words in memory
⇒ 9 memory accesses to fetch the 3 instructions and 9 memory accesses to get the operands and store the results.
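
The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the 8-bit op-code width is an assumption (the slide gives only the approximate 80 to 90 bit total), "16 to 256 million" is read as 16 Mi to 256 Mi words, and the function names are hypothetical.

```python
def address_bits(memory_words: int) -> int:
    """Bits needed to address `memory_words` distinct word locations."""
    return (memory_words - 1).bit_length()


def instruction_words(memory_words, num_addresses=3, opcode_bits=8, word_bits=32):
    """Memory words occupied by one instruction (opcode_bits is an assumed width)."""
    total_bits = num_addresses * address_bits(memory_words) + opcode_bits
    return -(-total_bits // word_bits)  # ceiling division


NUM_INSTRUCTIONS = 3          # Add, Sub, Mul from the slide
DATA_ACCESSES_PER_INSTR = 3   # two operand reads plus one result write

for size in (16 * 2**20, 256 * 2**20):
    words = instruction_words(size)
    fetches = NUM_INSTRUCTIONS * words
    data = NUM_INSTRUCTIONS * DATA_ACCESSES_PER_INSTR
    print(address_bits(size), words, fetches, data)
```

Under these assumptions both memory sizes give a three-word instruction, which is where the 9 fetch accesses come from.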

Number of address fields vs. performance


Ø Mathematical expression: C = (a+b)(a-b)
Ø Code with two-address instructions (with temporary data in memory):
1. Move X, A    ( Mem[X] ← Mem[A] )
2. Add X, B    ( Mem[X] ← Mem[X] + Mem[B] )
3. Move C, A    ( Mem[C] ← Mem[A] )
4. Sub C, B    ( Mem[C] ← Mem[C] - Mem[B] )
5. Mul C, X    ( Mem[C] ← Mem[C] × Mem[X] )

Ø Code size: 5 instructions

Ø Code performance: 10 memory accesses for instructions (5 instructions × 2 words each), 13 memory accesses for data (2 per Move, 3 per arithmetic instruction)
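
The access counts above can be checked with a small memory-to-memory simulation. This is a minimal sketch, not the course's tooling: the `CountingMemory` class, the two-words-per-instruction figure (inferred from 10 fetches for 5 instructions), and the sample values for a and b are all assumptions made for illustration.

```python
class CountingMemory:
    """Word-addressed memory that counts every data read and write."""

    def __init__(self, contents):
        self.cells = dict(contents)
        self.accesses = 0

    def read(self, addr):
        self.accesses += 1
        return self.cells[addr]

    def write(self, addr, value):
        self.accesses += 1
        self.cells[addr] = value


ALU = {
    "Add": lambda x, y: x + y,
    "Sub": lambda x, y: x - y,
    "Mul": lambda x, y: x * y,
}


def run_two_address(program, mem, words_per_instruction=2):
    """Execute (op, dst, src) instructions; return instruction-fetch accesses."""
    fetches = 0
    for op, dst, src in program:
        fetches += words_per_instruction
        if op == "Move":
            mem.write(dst, mem.read(src))
        else:
            mem.write(dst, ALU[op](mem.read(dst), mem.read(src)))
    return fetches


PROGRAM = [
    ("Move", "X", "A"),
    ("Add", "X", "B"),
    ("Move", "C", "A"),
    ("Sub", "C", "B"),
    ("Mul", "C", "X"),
]

mem = CountingMemory({"A": 7, "B": 3})   # a = 7, b = 3 (sample values)
fetches = run_two_address(PROGRAM, mem)
print(fetches, mem.accesses, mem.cells["C"])  # 10 13 40
```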

Number of address fields vs. performance


Ø Mathematical expression: C = (a+b)(a-b)
Ø Code with one-address instructions (with temporary data in the accumulator (ACC) register):
1. Load A    ( ACC ← Mem[A] )
2. Add B    ( ACC ← ACC + Mem[B] )
3. Store X    ( Mem[X] ← ACC )
4. Load A    ( ACC ← Mem[A] )
5. Sub B    ( ACC ← ACC - Mem[B] )
6. Mul X    ( ACC ← ACC × Mem[X] )
7. Store C    ( Mem[C] ← ACC )
Ø Code size: 7 instructions
Ø Code performance: 7 memory accesses for instructions, 7 memory accesses for data
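
The seven-instruction accumulator program can be traced with a short simulation. This is a sketch under stated assumptions: each one-address instruction is taken to occupy one memory word (consistent with 7 fetches for 7 instructions), and the function name and sample operand values are hypothetical.

```python
def run_accumulator(program, memory):
    """Run (op, address) instructions on a single-accumulator machine.

    Returns (instruction fetch accesses, data accesses).
    """
    acc = 0
    fetches = 0
    data_accesses = 0
    for op, addr in program:
        fetches += 1          # assumed: one word per instruction
        data_accesses += 1    # every instruction here touches memory once
        if op == "Load":
            acc = memory[addr]
        elif op == "Add":
            acc += memory[addr]
        elif op == "Sub":
            acc -= memory[addr]
        elif op == "Mul":
            acc *= memory[addr]
        elif op == "Store":
            memory[addr] = acc
        else:
            raise ValueError(f"unknown op-code: {op}")
    return fetches, data_accesses


PROGRAM = [
    ("Load", "A"), ("Add", "B"), ("Store", "X"),
    ("Load", "A"), ("Sub", "B"), ("Mul", "X"), ("Store", "C"),
]

memory = {"A": 7, "B": 3}     # a = 7, b = 3 (sample values)
counts = run_accumulator(PROGRAM, memory)
print(counts, memory["C"])    # (7, 7) 40
```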

Number of address fields vs. performance


Ø Mathematical expression: C = (a+b)(a-b)
Ø Code with two-address instructions (with temporary data in the register file):
1. Load R1, A    ( RF[1] ← Mem[A] )
2. Load R2, B    ( RF[2] ← Mem[B] )
3. Move R3, R1    ( RF[3] ← RF[1] )
4. Add R1, R2    ( RF[1] ← RF[1] + RF[2] )
5. Sub R3, R2    ( RF[3] ← RF[3] - RF[2] )
6. Mul R1, R3    ( RF[1] ← RF[1] × RF[3] )
7. Store C, R1    ( Mem[C] ← RF[1] )
Ø Code size: 7 instructions
Ø Code performance: 7 memory accesses for instructions, 3 memory accesses for data (Load × 2, Store)

Number of address fields vs. performance


Ø Mathematical expression: C = (a+b)(a-b)
Ø Code with three-address register instructions and two-address memory instructions (with temporary data in the register file):
1. Load R1, A    ( RF[1] ← Mem[A] )
2. Load R2, B    ( RF[2] ← Mem[B] )
3. Add R3, R1, R2    ( RF[3] ← RF[1] + RF[2] )
4. Sub R4, R1, R2    ( RF[4] ← RF[1] - RF[2] )
5. Mul R5, R4, R3    ( RF[5] ← RF[4] × RF[3] )
6. Store C, R5    ( Mem[C] ← RF[5] )
Ø Code size: 6 instructions
Ø Code performance: 6 memory accesses for instructions, 3 memory accesses for data
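
A load/store register machine makes it easy to see why only three data accesses remain. The sketch below is illustrative only: the tuple encoding of instructions, the eight-entry register file, the one-word-per-instruction figure, and the sample operand values are assumptions, not part of the slides.

```python
ALU = {
    "Add": lambda x, y: x + y,
    "Sub": lambda x, y: x - y,
    "Mul": lambda x, y: x * y,
}


def run_load_store(program, memory, num_regs=8):
    """Run a load/store program; return (fetch accesses, data accesses)."""
    rf = [0] * num_regs
    fetches = 0
    data_accesses = 0
    for instr in program:
        fetches += 1                      # assumed: one word per instruction
        op = instr[0]
        if op == "Load":                  # Load Rd, Addr
            _, dest, addr = instr
            rf[dest] = memory[addr]
            data_accesses += 1
        elif op == "Store":               # Store Addr, Rs
            _, addr, src = instr
            memory[addr] = rf[src]
            data_accesses += 1
        else:                             # Op Rd, Rs1, Rs2 (registers only)
            _, dest, src1, src2 = instr
            rf[dest] = ALU[op](rf[src1], rf[src2])
    return fetches, data_accesses


PROGRAM = [
    ("Load", 1, "A"),
    ("Load", 2, "B"),
    ("Add", 3, 1, 2),
    ("Sub", 4, 1, 2),
    ("Mul", 5, 4, 3),
    ("Store", "C", 5),
]

memory = {"A": 7, "B": 3}     # a = 7, b = 3 (sample values)
counts = run_load_store(PROGRAM, memory)
print(counts, memory["C"])    # (6, 3) 40
```

Only Load and Store touch data memory here; the three arithmetic instructions work entirely inside the register file.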

Addressing modes
Implied
(set or reset ACC/status registers)

Immediate
(loop increment/decrement, array indices, coefficients)

Direct

Indirect

Addressing modes

Relative
Branch instructions
Table look-ups

Indexed
Access stack or queue
Access array

What is the difference between “Relative” and “Indexed” addressing?

Instruction-execution cycle

IR = Instruction Register
PC = Program Counter
Memory stores all instructions and data

Processor design flow


Ø Trade-offs between programming efficiency and program size on one side and processor cost (size) and performance on the other. The flow can repeat several times.
v Instruction set (IS): specify the operations performed by each instruction.
v IS flowchart: describe precisely all the operations that are performed in each instruction.
v ASM chart: divide each instruction into clock cycles.
v Design datapath: complete the connections.

Instruction-set design
Ø Programming efficiency vs. program size
Ø Processor cost vs. processor performance
Ø Compromise program size vs. processor size
Ø Complex Instruction-set (CISC)
ü powerful instructions -> shorter programs
ü powerful instructions -> complex datapath, control unit
ü complex instructions -> several clock cycles
ü complex datapath, control unit -> longer clock period
ü complex instructions -> poor pipeline
Ø Reduced Instruction-set (RISC)
ü simple instructions -> longer programs
ü simple instructions -> simple datapath, control unit
ü simple instructions -> single clock cycle
ü simple datapath -> shorter clock period
ü simple instructions -> excellent pipeline

Complex instruction-set for a 16-bit processor


Each instruction is either
- 1 word of 16 bits (no memory access) (2-bit Type field, 5-bit Op field, …)
- 2 words of 16 bits (memory access in word 2) → access 64K memory

a) Register instructions (arithmetic, logic, move and shift)

Name                   Action
Op Dest, Src1, Src2    RF[Dest] ← RF[Src1] Op RF[Src2]

b) Memory instructions (load and store)

Name                 Action
L imm Dest           RF[Dest] ← Address
L dir Dest           RF[Dest] ← Mem[Address]
L rel Dest, Src2     RF[Dest] ← Mem[RF[Src2] + Address]
L in Dest            RF[Dest] ← Mem[Mem[Address]]
S dir Src1           Mem[Address] ← RF[Src1]
S rel Src1, Src2     Mem[RF[Src2] + Address] ← RF[Src1]
S in Src1            Mem[Mem[Address]] ← RF[Src1]
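
The four load addressing modes in the table differ only in how the Address field is turned into the loaded value. A minimal sketch of those register transfers follows; the function name, the dictionary-based memory, and the 16-bit wrap-around of the relative address are assumptions made for illustration.

```python
ADDRESS_MASK = 0xFFFF  # assumed: addresses wrap within the 64K memory


def load(mode, rf, mem, dest, address, src2=None):
    """Apply one of the load transfers from the table to register file `rf`."""
    if mode == "imm":      # RF[Dest] <- Address
        rf[dest] = address
    elif mode == "dir":    # RF[Dest] <- Mem[Address]
        rf[dest] = mem[address]
    elif mode == "rel":    # RF[Dest] <- Mem[RF[Src2] + Address]
        rf[dest] = mem[(rf[src2] + address) & ADDRESS_MASK]
    elif mode == "in":     # RF[Dest] <- Mem[Mem[Address]]
        rf[dest] = mem[mem[address]]
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    return rf[dest]


mem = {0x10: 0x20, 0x14: 55, 0x20: 99}   # sample memory contents
rf = [0] * 8                             # 8 x 16 register file
rf[1] = 4                                # base/offset register for "rel"

print(load("imm", rf, mem, 2, 0x10))           # 16
print(load("dir", rf, mem, 2, 0x10))           # 32
print(load("rel", rf, mem, 2, 0x10, src2=1))   # 55
print(load("in", rf, mem, 2, 0x10))            # 99
```

The same Address field (0x10) yields four different results, one per mode.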

Complex instruction-set for a 16-bit processor


c) Control instructions (jump, branch, call and return)

Name                  Action
Jump Address          PC ← Address
Brel Address          PC ← PC+1 if Status[rel] = 0
                      PC ← Address if Status[rel] = 1
Call Address, Src1    Mem[RF[Src1]] ← PC+1; PC ← Address;
                      RF[Src1] ← RF[Src1] + 1
Return                RF[Src1] ← RF[Src1] - 1;
                      PC ← Mem[RF[Src1]]

d) Miscellaneous instructions (no-op, clear, status, set and reset)

Name                Action
No-op               Do nothing
Clear Dest          RF[Dest] ← 0
Lstat Src1, Src2    Status ← RF[Src1] >=< RF[Src2]    (six relations)
Sstat Dest          Status[Dest] ← 1
Rstat Dest          Status[Dest] ← 0
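
In the Call and Return transfers, RF[Src1] behaves like a stack pointer that grows upward. The following sketch mirrors the table's register transfers literally; the `state` dictionary layout, the function names, and the sample addresses are hypothetical.

```python
def call(state, address, src1):
    """Call Address, Src1: save the return address, jump, bump the pointer."""
    rf, mem = state["rf"], state["mem"]
    mem[rf[src1]] = state["pc"] + 1   # Mem[RF[Src1]] <- PC+1
    state["pc"] = address             # PC <- Address
    rf[src1] += 1                     # RF[Src1] <- RF[Src1] + 1


def ret(state, src1):
    """Return: pop the most recently saved return address into PC."""
    rf, mem = state["rf"], state["mem"]
    rf[src1] -= 1                     # RF[Src1] <- RF[Src1] - 1
    state["pc"] = mem[rf[src1]]       # PC <- Mem[RF[Src1]]


state = {"pc": 100, "rf": [0] * 8, "mem": {}}
state["rf"][7] = 0x200                # register 7 used as the stack pointer

call(state, 0x50, 7)                  # outer call from address 100
call(state, 0x80, 7)                  # nested call from address 0x50
ret(state, 7)
print(state["pc"])                    # 81 (0x50 + 1)
ret(state, 7)
print(state["pc"], hex(state["rf"][7]))  # 101 0x200
```

Nested calls unwind in last-in, first-out order because Return decrements the pointer before reading the saved address.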

CISC Design: Instruction-set flowchart

Instruction-set flowchart
Ø Does not presume any architectural details
Ø Does not presume any particular processor datapath
Ø Does not consider any timing constraints or clock cycle
duration

Purpose:
Ø Give the order in which the operations specified by
each instruction will be executed.

Processor design flow



CISC Design: Instruction-set flowchart (cont.)

Flowchart operations for the call and return instructions:
Call:      Mem[RF[Src1]] ← PC+1;  PC ← Mem[PC];  RF[Src1] ← RF[Src1] + 1
Return:    RF[Src1] ← RF[Src1] - 1;  PC ← Mem[RF[Src1]]

Component allocation for the 16-bit processor


Components:
64K x 16 Memory
8 x 16 Register file
ALSU
Instruction register ( IR )
Program counter ( PC )
Address register ( AR )
Data register ( DR )
Status register ( Status )
Control unit

AR, DR: needed to shorten the clock period and improve the performance of the processor

Processor ASM chart (scheduled IS chart)

Processor schematic

Ø The schematic is obtained by connecting the processor components according to the ASM chart, that is, by adding a connection whenever data or an instruction is moved from one component to another.
Ø Components with several connections at the same input port require selectors for that particular input port.

Reduced instruction-set cycle (RISC design)

Ø Register and miscellaneous instructions do not need address fetch and effective-address computation
Ø Memory and control instructions do not need operand fetch and operation execution
Ø Share operand fetch and address fetch
Ø Share operation execution and effective-address computation
Ø Thus, the instruction cycle is reduced to 4 steps

Instruction-execution cycle
Pipelined execution

Reduced instruction-set for a 32-bit processor (RISC type)

a) Register instructions (arithmetic, logic, move and shift)

Name                          Action
Op Dest, Src1, Src2           RF[Dest] ← RF[Src1] Op RF[Src2]
Op Dest, Src1, Constant       RF[Dest] ← RF[Src1] Op Constant
Move Dest, Src1               RF[Dest] ← RF[Src1]
Shift Dest, Src1, Constant    RF[Dest] ← RF[Src1] shift Constant

b) Memory instructions (load and store)
2 addressing modes: Immediate, Relative

Name                        Action
L immU Dest                 RF[Dest(31…16)] ← Offset
L immL Dest                 RF[Dest(15…0)] ← Offset
L rel Dest, Src2, Offset    RF[Dest] ← Mem[RF[Src2] + Offset]
S rel Src1, Src2, Offset    Mem[RF[Src2] + Offset] ← RF[Src1]

Reduced instruction-set for a 32-bit processor (RISC type)

c) Control instructions (jump and branch)

Name                       Action
Jump Offset                PC ← PC + Offset
Jump Src2, Offset          PC ← RF[Src2] + Offset
Brel Src1, Src2, Offset    PC ← PC+1 if RF[Src1] not rel RF[Src2]
                           PC ← PC+Offset if RF[Src1] rel RF[Src2]

Branch mnemonics (rel): Beq (=), Bgre (>), Bgoeq (≥), Bless (<), Bloeq (≤), Bneq (≠)

d) Miscellaneous instructions (no-op, clear, set and reset)

Name          Action
No-op         Do nothing
Clear Dest    RF[Dest] ← 0
Sstat Dest    Status[Dest] ← 1
Rstat Dest    Status[Dest] ← 0
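
The Brel family can be read as one instruction parameterized by a relation. The sketch below computes the next PC value from the two transfers in the table; the `next_pc` function and the mapping to Python's `operator` module are illustrative assumptions, while the six mnemonics are the ones listed on the slide.

```python
import operator

# The six relations named on the slide, mapped to Python comparisons.
RELATIONS = {
    "Beq": operator.eq,
    "Bgre": operator.gt,
    "Bgoeq": operator.ge,
    "Bless": operator.lt,
    "Bloeq": operator.le,
    "Bneq": operator.ne,
}


def next_pc(mnemonic, pc, rf, src1, src2, offset):
    """PC <- PC+Offset if RF[Src1] rel RF[Src2], otherwise PC <- PC+1."""
    taken = RELATIONS[mnemonic](rf[src1], rf[src2])
    return pc + offset if taken else pc + 1


rf = [0, 5, 3]                              # RF[1] = 5, RF[2] = 3
print(next_pc("Bgoeq", 10, rf, 1, 2, 4))    # 14 (5 >= 3, branch taken)
print(next_pc("Bless", 10, rf, 1, 2, 4))    # 11 (5 < 3 is false)
```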

RISC block diagram
Ø Four-stage pipeline
Ø Separate instruction and data memories
Ø A control register is added in each pipeline stage
Ø Pipeline stalling (flushing) for control instructions

Block diagram labels: Stage 1, Stage 2, Stage 3, Stage 4; 1-stage forward path; 2-stage forward path

RISC operation for a 3-line program


Source program:
x = a + b
y = b - c
z = c + d

Ø Concurrently (pipelined): n+3 clock cycles to execute n instructions
Ø Sequentially (no pipeline): 4n clock cycles for n instructions

Figure: assembly program and timing diagram
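
The two cycle counts follow from the four-stage pipeline: the first instruction needs four cycles, and each further instruction completes one cycle later. A short sketch, assuming an ideal pipeline with no stalls (the function names and the instruction count of 10 are hypothetical):

```python
def pipelined_cycles(n, stages=4):
    """Cycles for n instructions on an ideal pipeline: n + (stages - 1)."""
    return n + stages - 1 if n > 0 else 0


def sequential_cycles(n, stages=4):
    """Cycles for n instructions when each one runs all stages alone."""
    return n * stages


n = 10  # hypothetical instruction count
print(pipelined_cycles(n), sequential_cycles(n))   # 13 40
print(sequential_cycles(n) / pipelined_cycles(n))  # speedup, about 3.08
```

As n grows, the speedup approaches the number of stages, 4.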

RISC operation for a 2-line program with data dependencies


Source program:
sum = a + b
total = sum + c

Ø Data dependence: 5 (or 45%) of the 11 instructions are No-op instructions → substantially decreases the performance of the pipelined processor

Figure: assembly program and timing diagram

RISC operation for a 2-line program with data-forwarding

Source program:
sum = a + b
total = sum + c

Figure: assembly program and timing diagram, using the 1-stage and 2-stage forward paths

RISC operation without branch prediction


Source program:
if a ≥ b then
begin
  max = a
  min = b
end
else
begin
  max = b
  min = a
end
endif

Figure: assembly program; timing diagram when the branch is not taken; timing diagram when the branch is taken

RISC operation with branch prediction


Source program:
if a ≥ b then
begin
  max = a
  min = b
end
else
begin
  max = b
  min = a
end
endif

Figure: assembly program; timing diagram when the branch is not taken; timing diagram when the branch is taken (the diagrams mark the result of the Jump instruction and the result of the Bgoeq instruction)

Chapter Summary
Ø We introduced new concepts:
ü Instruction sets
ü Instruction types
ü Addressing modes
ü Instruction-execution cycle
ü Processor design flow
Ø Including
ü instruction set design,
ü instruction set flowcharts,
ü component allocation,
ü ASM charts
ü processor architecture
Ø We have demonstrated processor design:
ü 16-bit CISC design
ü 32-bit RISC design
o data-forwarding
o branch prediction
