Chapter04 ProcessorDesign PDF

1
DIGITAL LOGIC DESIGN

( CE_118 )
CHAPTER 4:
PROCESSOR DESIGN
(part_1)
2
Outline
➢ We introduced new concepts:
  ✓ Instruction sets
  ✓ Instruction types
  ✓ Addressing modes
  ✓ Instruction-execution cycle
  ✓ Processor design flow
➢ Including
  ✓ instruction set design,
  ✓ instruction set flowcharts,
  ✓ component allocation,
  ✓ ASM charts,
  ✓ processor architecture
➢ We have demonstrated processor design:
  ✓ 16-bit CISC design
  ✓ 32-bit RISC design
    o data-forwarding
    o branch prediction
4
Basic definitions
➢ Processor: controls overall system operation, supervises I/O devices
(keyboards, monitors, disks, tapes), synchronizes data between the
system and its components, and performs most of the computational tasks.
➢ ASIC (Application-Specific Integrated Circuit): a co-processor
that executes one or more specific tasks much faster than the processor
itself. Such ASICs are called accelerators, because the
processor offloads computationally intensive tasks to them.
➢ ASIP (Application-Specific Instruction-set Processor): a processor
with specialized instructions that allow some applications to
execute much faster than on an ordinary processor.
5
Basic computer architecture

6
Instruction Set
➢ An instruction set (IS) is the collection of instructions and
instruction formats that the processor’s control unit interprets
and the processor’s datapath executes.
➢ An instruction is a string of bits grouped into a number of
different fields, such as
  ✓ Operation code (op-code)
  ✓ Addressing fields
➢ A typical statement such as a = b + c, where a, b, and c are
stored in memory at locations A, B, and C, can be expressed in
assembly-language format as
  ✓ Add A, B, C            ( Mem[A] ← Mem[B] + Mem[C] )
    (assembly instruction)  (mathematical notation)

7
Typical instructions: instruction-type field

➢ Register instructions (arithmetic, logic, shift)
  ✓ Add RA, RB, RC   ( RF[A] ← RF[B] + RF[C] )
➢ Move instructions (load and store)
  ✓ Load R2, A       ( RF[2] ← MEM[A] )
  ✓ Store A, R2      ( MEM[A] ← RF[2] )
➢ Control instructions (branch instruction)
  ✓ Beq R2, R3, A    if R2 = R3 then PC ← MEM[A]
                     if R2 != R3 then PC ← PC+1
8
Typical instructions: instruction-type field
➢ Mode field
  Lind R2, A       ( RF[2] ← Mem[ Mem[A] ] )
➢ Constant field
  Add R2, R3, 1    ( RF[2] ← RF[3] + 1 )
9
Number of address fields vs. performance

➢ Mathematical expression: C = (a+b)(a-b)
➢ Code with three-address instructions:
  1. Add X, A, B   ( Mem[X] ← Mem[A] + Mem[B] )
  2. Sub C, A, B   ( Mem[C] ← Mem[A] - Mem[B] )
  3. Mul C, X, C   ( Mem[C] ← Mem[X] x Mem[C] )
➢ Code size: 3 instructions
➢ Code performance:
  - Assume the memory holds 16 to 256 million words → each address
    requires 24 to 28 bits → the three addresses in each instruction
    require 80 to 90 bits (including the opcode field)
  - Assume the processor has 32-bit data and a memory with 32-bit words
    → each instruction occupies three words in memory
    ⇒ 9 memory accesses to fetch the 3 instructions and 9 memory
    accesses to read the operands and store the results.
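The bit-count estimate above can be checked with a short Python sketch. The helper names and the 8-bit opcode width are assumptions made for illustration (the slide does not state the opcode width, so its "80 to 90 bits" figure is only approximately reproduced), and "million" is read here as 2**20.

```python
def address_bits(memory_words: int) -> int:
    """Bits needed to give each of `memory_words` locations a unique address."""
    return (memory_words - 1).bit_length()


def instruction_words(n_addresses: int, memory_words: int,
                      opcode_bits: int = 8, word_bits: int = 32) -> int:
    """Memory words occupied by one instruction, rounded up.

    opcode_bits=8 is an assumed value, not taken from the slides.
    """
    total_bits = opcode_bits + n_addresses * address_bits(memory_words)
    return -(-total_bits // word_bits)  # ceiling division


MEGA = 2 ** 20  # assumption: "16 to 256 million words" means 16 Mi to 256 Mi
for size in (16 * MEGA, 256 * MEGA):
    # memory size, bits per address, words per three-address instruction
    print(size, address_bits(size), instruction_words(3, size))
```

Under these assumptions both memory sizes give three 32-bit words per three-address instruction, which is where the 9 instruction-fetch accesses for 3 instructions come from.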
10

➢ Code with two-address instructions (with temporary data
in memory):
  1. Move X, A   ( Mem[X] ← Mem[A] )
  2. Add X, B    ( Mem[X] ← Mem[X] + Mem[B] )
  3. Move C, A   ( Mem[C] ← Mem[A] )
  4. Sub C, B    ( Mem[C] ← Mem[C] - Mem[B] )
  5. Mul C, X    ( Mem[C] ← Mem[C] x Mem[X] )
➢ Code size: 5 instructions
➢ Code performance: 10 memory accesses for instructions,
13 memory accesses for data
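The access counts for the three-address and two-address memory programs can be reproduced with a small counting model. This is a sketch under stated assumptions: every source operand costs one read, every destination costs one write, and each instruction occupies a fixed number of memory words (three for three-address, two for two-address). The table and function names are hypothetical.

```python
# Data accesses per instruction when all operands live in memory:
# a Move reads one word and writes one; an arithmetic instruction
# reads two operands and writes one result.
DATA_ACCESSES = {"Move": 2, "Add": 3, "Sub": 3, "Mul": 3}


def memory_accesses(program, words_per_instruction):
    """Return (instruction-fetch accesses, data accesses) for a program."""
    fetches = len(program) * words_per_instruction
    data = sum(DATA_ACCESSES[op] for op in program)
    return fetches, data


three_address = ["Add", "Sub", "Mul"]
two_address = ["Move", "Add", "Move", "Sub", "Mul"]

print(memory_accesses(three_address, words_per_instruction=3))  # (9, 9)
print(memory_accesses(two_address, words_per_instruction=2))    # (10, 13)
```

The two-address version fetches shorter instructions but needs more of them, so the total traffic goes up rather than down as long as the temporaries stay in memory.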
11

➢ Code with one-address instructions (with temporary
data in the accumulator (ACC) register):
  1. Load A    ( ACC ← Mem[A] )
  2. Add B     ( ACC ← ACC + Mem[B] )
  3. Store X   ( Mem[X] ← ACC )
  4. Load A    ( ACC ← Mem[A] )
  5. Sub B     ( ACC ← ACC - Mem[B] )
  6. Mul X     ( ACC ← ACC x Mem[X] )
  7. Store C   ( Mem[C] ← ACC )
12

➢ Code with two-address instructions (with temporary
data in the register file):
  1. Load R1, A    ( RF[1] ← Mem[A] )
  2. Load R2, B    ( RF[2] ← Mem[B] )
  3. Move R3, R1   ( RF[3] ← RF[1] )
  4. Add R1, R2    ( RF[1] ← RF[1] + RF[2] )
  5. Sub R3, R2    ( RF[3] ← RF[3] - RF[2] )
  6. Mul R1, R3    ( RF[1] ← RF[1] x RF[3] )
  7. Store C, R1   ( Mem[C] ← RF[1] )
➢ Code performance: 3 memory accesses for data (two Loads and one Store)
13

➢ Code with three-address register instructions and
two-address memory instructions (with temporary data
in the register file):
  1. Load R1, A       ( RF[1] ← Mem[A] )
  2. Load R2, B       ( RF[2] ← Mem[B] )
  3. Add R3, R1, R2   ( RF[3] ← RF[1] + RF[2] )
  4. Sub R4, R1, R2   ( RF[4] ← RF[1] - RF[2] )
  5. Mul R5, R4, R3   ( RF[5] ← RF[4] x RF[3] )
  6. Store C, R5      ( Mem[C] ← RF[5] )
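To check that the six-instruction register program really computes C = (a+b)(a-b) with only three data accesses, here is a minimal interpreter sketch. The tuple encoding of instructions and the `run` function are illustrative assumptions, not part of the course material.

```python
import operator

ALU = {"Add": operator.add, "Sub": operator.sub, "Mul": operator.mul}


def run(program, mem):
    """Execute the program on memory `mem`; return the number of data accesses."""
    rf = {}              # register file, indexed by register number
    data_accesses = 0
    for op, *args in program:
        if op == "Load":             # Load Rd, Addr : RF[d] <- Mem[Addr]
            dest, addr = args
            rf[dest] = mem[addr]
            data_accesses += 1
        elif op == "Store":          # Store Addr, Rs : Mem[Addr] <- RF[s]
            addr, src = args
            mem[addr] = rf[src]
            data_accesses += 1
        else:                        # Op Rd, Rs1, Rs2 : RF[d] <- RF[s1] Op RF[s2]
            dest, src1, src2 = args
            rf[dest] = ALU[op](rf[src1], rf[src2])
    return data_accesses


program = [
    ("Load", 1, "A"),
    ("Load", 2, "B"),
    ("Add", 3, 1, 2),
    ("Sub", 4, 1, 2),
    ("Mul", 5, 4, 3),
    ("Store", "C", 5),
]
mem = {"A": 7, "B": 3}
accesses = run(program, mem)
print(mem["C"], accesses)  # 40 3
```

With a = 7 and b = 3 the result is (7+3)(7-3) = 40, and only the two Loads and the Store touch data memory.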
14
Addressing modes
Implied (used to set or reset the ACC/status registers)
Immediate (used for loop increments/decrements, array indices, coefficients)
Direct
Indirect
15
Addressing modes
Relative (used for branch instructions and table look-ups)
Indexed (used to access a stack or queue, or to access an array)
What is the difference between “Relative” and “Indexed” address?

16
Instruction-execution cycle
IR = Instruction Register
PC = Program Counter
Memory stores all instructions and data
17
Processor design flow

➢ Trade-offs between programming efficiency and program size versus
processor cost (size) and performance
➢ The design flow can be repeated several times
❖ Instruction set (IS): specifies the operations
performed by each instruction.
❖ IS flowchart: describes precisely all the
operations that are performed in each
instruction.
❖ ASM chart: divides each instruction into clock
cycles.
❖ Datapath design: completes the connections.

18
Instruction-set design
➢ Programming efficiency vs. program size
➢ Processor cost vs. processor performance
➢ Compromise program size vs. processor size
➢ Complex instruction set (CISC)
  ✓ powerful instructions → shorter programs
  ✓ powerful instructions → complex datapath, control unit
  ✓ complex instructions → several clock cycles
  ✓ complex datapath, control unit → longer clock period
  ✓ complex instructions → poor pipeline
➢ Reduced instruction set (RISC)
  ✓ simple instructions → longer programs
  ✓ simple instructions → simple datapath, control unit
  ✓ simple instructions → single clock cycle
  ✓ simple datapath → shorter clock period
  ✓ simple instructions → excellent pipeline
19
Complex instruction-set for a 16-bit processor

Each instruction is either
- 1 word of 16 bits (no memory access) (2-bit Type field, 5-bit Op field, …)
- 2 words of 16 bits (memory access, address in word 2) → can access 64K of memory
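As an illustration of how such a one-word instruction might be packed, the sketch below encodes and decodes a 16-bit word. Only the 2-bit Type and 5-bit Op fields come from the slide; the split of the remaining 9 bits into three 3-bit register fields (enough to index an 8-entry register file) is an assumption, as are the function names.

```python
# Field widths: TYPE and OP are from the slide; the three 3-bit
# register fields are an ASSUMED layout for the remaining 9 bits.
TYPE_BITS, OP_BITS, REG_BITS = 2, 5, 3


def encode(type_, op, dest, src1, src2):
    """Pack the five fields into one 16-bit instruction word."""
    fields = ((type_, TYPE_BITS), (op, OP_BITS),
              (dest, REG_BITS), (src1, REG_BITS), (src2, REG_BITS))
    word = 0
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"field value {value} does not fit in {bits} bits")
        word = (word << bits) | value
    return word


def decode(word):
    """Unpack a 16-bit word into (type, op, dest, src1, src2)."""
    src2 = word & 0b111
    src1 = (word >> 3) & 0b111
    dest = (word >> 6) & 0b111
    op = (word >> 9) & 0b11111
    type_ = (word >> 14) & 0b11
    return type_, op, dest, src1, src2


word = encode(1, 17, 2, 3, 4)
print(f"{word:016b}", decode(word))
```

For a two-word instruction the second word would simply hold the 16-bit address, which is what allows 64K words of memory to be reached.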
a) Register instructions (arithmetic, logic, move and shift)
   Name                   Action
   Op Dest, Src1, Src2    RF[Dest] ← RF[Src1] Op RF[Src2]
b) Memory instructions (load and store)
   Name                   Action
   L imm Dest             RF[Dest] ← Address
   L dir Dest             RF[Dest] ← Mem[Address]
   L rel Dest, Src2       RF[Dest] ← Mem[RF[Src2] + Address]
   L in Dest              RF[Dest] ← Mem[Mem[Address]]
   S dir Src1             Mem[Address] ← RF[Src1]
   S rel Src1, Src2       Mem[RF[Src2] + Address] ← RF[Src1]
   S in Src1              Mem[Mem[Address]] ← RF[Src1]
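The four load addressing modes in the table (imm, dir, rel, in) differ only in how the value is derived from the Address field. A minimal Python sketch, with a dictionary standing in for memory and a hypothetical `load` helper:

```python
def load(mode, address, mem, rf, src2=None):
    """Return the value a load would write to RF[Dest] for each mode."""
    if mode == "imm":    # RF[Dest] <- Address
        return address
    if mode == "dir":    # RF[Dest] <- Mem[Address]
        return mem[address]
    if mode == "rel":    # RF[Dest] <- Mem[RF[Src2] + Address]
        return mem[rf[src2] + address]
    if mode == "in":     # RF[Dest] <- Mem[Mem[Address]]
        return mem[mem[address]]
    raise ValueError(f"unknown addressing mode: {mode}")


mem = {10: 20, 14: 55, 20: 99}   # illustrative memory contents
rf = {2: 4}                      # RF[2] holds the base value 4

for mode in ("imm", "dir", "rel", "in"):
    print(mode, load(mode, 10, mem, rf, src2=2))
```

With Address = 10 the four modes return 10, 20, 55, and 99 respectively: the same address field yields a constant, a memory word, a word at a register-relative location, or a word reached through a pointer.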
20
Complex instruction-set for a 16-bit processor

c) Control instructions (jump, branch, call and return)
   Name                  Action
   Jump Address          PC ← Address
   Brel Address          PC ← PC+1 if Status[rel] = 0
                         PC ← Address if Status[rel] = 1
   Call Address, Src1    Mem[RF[Src1]] ← PC+1; PC ← Address;
                         RF[Src1] ← RF[Src1] + 1
   Return                RF[Src1] ← RF[Src1] - 1;
                         PC ← Mem[RF[Src1]]
d) Miscellaneous instructions (no-op, clear, status, set and reset)
   Name                  Action
   No-op                 Do nothing
   Clear Dest            RF[Dest] ← 0
   Lstat Src1, Src2      Status ← RF[Src1] >=< RF[Src2]   (six relations)
   Sstat Dest            Status[Dest] ← 1
   Rstat Dest            Status[Dest] ← 0
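The Call and Return actions in the control-instruction table use RF[Src1] as a stack pointer: Call stores the return address and then increments the pointer, and Return decrements it before reading the address back. The sketch below mirrors those register transfers; the `state` dictionary and function names are assumptions for illustration.

```python
def call(state, address, src1):
    """Call Address, Src1: save the return address, jump, bump the pointer."""
    mem, rf = state["mem"], state["rf"]
    mem[rf[src1]] = state["pc"] + 1   # Mem[RF[Src1]] <- PC+1
    state["pc"] = address             # PC <- Address
    rf[src1] += 1                     # RF[Src1] <- RF[Src1] + 1


def ret(state, src1):
    """Return: step the pointer back, then restore the saved PC."""
    mem, rf = state["mem"], state["rf"]
    rf[src1] -= 1                     # RF[Src1] <- RF[Src1] - 1
    state["pc"] = mem[rf[src1]]       # PC <- Mem[RF[Src1]]


# Illustrative values: register 7 is used as the stack pointer.
state = {"pc": 100, "mem": {}, "rf": {7: 0x8000}}
call(state, 500, 7)   # outer call from address 100
call(state, 900, 7)   # nested call from address 500
ret(state, 7)         # back to 501
ret(state, 7)         # back to 101
print(state["pc"], hex(state["rf"][7]))
```

Because the pointer moves in opposite directions in the two instructions, nested calls unwind in last-in, first-out order and the pointer ends where it started.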
22
CISC Design: Instruction-set flowchart
The instruction-set flowchart
➢ Does not presume any architectural details
➢ Does not presume any particular processor datapath
➢ Does not consider any timing constraints or clock-cycle
duration
Purpose:
➢ Give the order in which the operations specified by
each instruction will be executed.
[Figure: processor design flow]

23
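CISC Design: Instruction-set flowchart
[Figure: instruction-set flowchart; design process]
24
CISC Design: Instruction-set flowchart (cont.)
[Figure: instruction-set flowchart, continued. Operations legible in the figure:]
Mem[RF[src1]] ← PC+1
PC ← Mem[PC]
RF[src1] ← RF[src1] + 1
PC ← Mem[RF[src1]]
RF[src1] ← RF[src1] - 1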
25
Component allocation for the 16-bit processor

Components:
- 64K x 16 Memory
- 8 x 16 Register file
- ALSU
- Instruction register ( IR )
- Program counter ( PC )
- Address register ( AR )
- Data register ( DR )
- Status register ( Status )
- Control unit
AR, DR: needed to shorten the clock period
and improve the performance of the
processor
[Figure: design process, showing the processor ASM chart (scheduled IS chart)]
26
Processor ASM chart
(scheduled IS chart)
27
28
Processor schematic
[Figure: processor schematic; design process]
➢ The schematic is obtained by connecting the processor components
according to the ASM chart, that is, by adding a connection whenever
data or an instruction is moved from one component to another.
➢ Components with several connections to the same input port require
a selector for that input port.
29
Reduced instruction-set cycle (RISC design)
➢ Register and miscellaneous instructions
do not need address fetch and effective
address computation
➢ Memory and control instructions do not
need operand fetch and operation
execution
➢ Share operand fetch and address fetch
➢ Share operation execution and effective
address computation
➢ Thus, the instruction cycle is reduced to
4 steps
[Figures: instruction-execution cycle; pipelined execution]
30
Reduced instruction-set for a 32-bit processor (RISC type)
a) Register instructions (arithmetic, logic, move and shift)
   Name                         Action
   Op Dest, Src1, Src2          RF[Dest] ← RF[Src1] Op RF[Src2]
   Op Dest, Src1, Constant      RF[Dest] ← RF[Src1] Op Constant
   Move Dest, Src1              RF[Dest] ← RF[Src1]
   Shift Dest, Src1, Constant   RF[Dest] ← RF[Src1] shift Constant
b) Memory instructions (load and store), with 2 addressing modes:
   - Immediate
   - Relative
   Name                         Action
   L immU Dest                  RF[Dest(31…16)] ← Offset
   L immL Dest                  RF[Dest(15…0)] ← Offset
   L rel Dest, Src2, Offset     RF[Dest] ← Mem[RF[Src2] + Offset]
   S rel Src1, Src2, Offset     Mem[RF[Src2] + Offset] ← RF[Src1]
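The two immediate loads each write one 16-bit half of a 32-bit register, so a full 32-bit constant takes two instructions. A minimal sketch of that idea follows; the function names are hypothetical, and it is an assumption here that the half not being written keeps its previous value (the table only shows which half is written).

```python
def l_imm_u(reg_value, offset):
    """L immU: write Offset into bits 31..16, keeping bits 15..0 (assumed)."""
    return (reg_value & 0x0000FFFF) | ((offset & 0xFFFF) << 16)


def l_imm_l(reg_value, offset):
    """L immL: write Offset into bits 15..0, keeping bits 31..16 (assumed)."""
    return (reg_value & 0xFFFF0000) | (offset & 0xFFFF)


reg = 0
reg = l_imm_u(reg, 0x1234)   # upper half
reg = l_imm_l(reg, 0xABCD)   # lower half
print(hex(reg))              # 0x1234abcd
```

Splitting the constant this way lets every instruction stay one 32-bit word long, which is what keeps the fetch stage of the pipeline uniform.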
31
Reduced instruction-set for a 32-bit processor (RISC type)
c) Control instructions (jump and branch)
   Name                       Action
   Jump Offset                PC ← PC + Offset
   Jump Src2, Offset          PC ← RF[Src2] + Offset
   Brel Src1, Src2, Offset    PC ← PC+1 if RF[Src1] not rel RF[Src2]
                              PC ← PC+Offset if RF[Src1] rel RF[Src2]
   where Brel is one of: Beq (=), Bgre (>), Bgoeq (≥), Bless (<), Bloeq (≤), Bneq (≠)
d) Miscellaneous instructions (no-op, clear, set and reset)
   Name                       Action
   No-op                      Do nothing
   Clear Dest                 RF[Dest] ← 0
   Sstat Dest                 Status[Dest] ← 1
   Rstat Dest                 Status[Dest] ← 0
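The six branch instructions share one rule and differ only in the relation tested. A short Python sketch of the next-PC computation, using the instruction names from the slide; the `next_pc` helper and the dictionary register file are assumptions for illustration.

```python
import operator

# Relation tested by each branch instruction.
RELATIONS = {
    "Beq": operator.eq,     # =
    "Bgre": operator.gt,    # >
    "Bgoeq": operator.ge,   # >=
    "Bless": operator.lt,   # <
    "Bloeq": operator.le,   # <=
    "Bneq": operator.ne,    # !=
}


def next_pc(name, pc, rf, src1, src2, offset):
    """PC <- PC+Offset if RF[Src1] rel RF[Src2], otherwise PC <- PC+1."""
    taken = RELATIONS[name](rf[src1], rf[src2])
    return pc + offset if taken else pc + 1


rf = {1: 5, 2: 9}
print(next_pc("Bless", 40, rf, 1, 2, 8))   # 5 < 9, branch taken
print(next_pc("Bgoeq", 40, rf, 1, 2, 8))   # 5 >= 9 is false, fall through
```

Note that, unlike the 16-bit CISC design, the comparison is done on two registers inside the branch itself rather than through a separate status-loading instruction.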
32
RISC block diagram
➢ Four-stage pipeline
➢ Separate instruction and data memories
➢ Add control register in each pipeline stage
➢ Pipeline stalling (flushing) for control instructions
[Figure: RISC block diagram with Stage 1 to Stage 4, a 1-stage forward path, and a 2-stage forward path]
33
RISC operation for a 3-line program

Source program:
x = a + b
y = b - c
z = c + d
[Figures: assembly program; timing diagram]
Concurrently (pipelined):
n+3 clock cycles to execute n instructions
Sequentially (no pipeline):
4n clock cycles for n instructions
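The n+3 versus 4n comparison follows from the four-stage pipeline: the first instruction needs four cycles to finish, and each further instruction finishes one cycle later. A minimal sketch, assuming no stalls or No-ops (function names are hypothetical):

```python
def pipelined_cycles(n, stages=4):
    """Cycles for n instructions on an ideal pipeline: n + (stages - 1)."""
    return n + stages - 1 if n > 0 else 0


def sequential_cycles(n, stages=4):
    """Cycles when each instruction finishes all stages before the next starts."""
    return n * stages


def timing_diagram(n, stages=4):
    """One text row per instruction; instruction i is in stage s during cycle i+s."""
    rows = []
    for i in range(n):
        row = ["  "] * pipelined_cycles(n, stages)
        for s in range(stages):
            row[i + s] = f"S{s + 1}"
        rows.append(" ".join(row))
    return rows


n = 3
print(pipelined_cycles(n), sequential_cycles(n))   # 6 12
for row in timing_diagram(n):
    print(row)
```

For large n the ratio 4n / (n+3) approaches 4, the number of stages, which is the best speed-up this pipeline can offer.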
34
RISC operation for a 2-line program with data dependencies

Source program:
sum = a + b
total = sum + c
[Figures: assembly program; timing diagram]
Data dependence:
5 of the 11 instructions (45%) are No-op
instructions → substantially decreases the
performance of the pipelined processor
35
RISC operation for a 2-line program with data-forwarding
Source program:
sum = a + b
total = sum + c
[Figures: assembly program; timing diagram]
36
RISC operation for a 2-line program with data-forwarding

Source program:
sum = a + b
total = sum + c
[Figure: block diagram showing the 1-stage forward path and the 2-stage forward path]
37
RISC operation without branch prediction

Source program:
if a ≥ b then
begin
  max = a
  min = b
end
else
begin
  max = b
  min = a
end
endif
[Figures: assembly program; timing diagram when branch is not taken; timing diagram when branch is taken]
38
RISC operation with branch prediction

Source program:
if a ≥ b then
begin
  max = a
  min = b
end
else
begin
  max = b
  min = a
end
endif
[Figures: assembly program; timing diagram when branch is not taken; timing diagram when branch is taken; annotations mark the result of the Jump instruction and the result of the Bgoeq instruction]
39
Chapter Summary
  ✓ Instruction sets
  ✓ Addressing modes
➢ Including
  ✓ ASM charts
➢ We have demonstrated processor design:
    o data-forwarding
    o branch prediction
