
CCB-524: Computer Architecture

UNIT-III

Instruction Set Architecture


An Introduction

*Data taken from multiple sources


Instruction Set Architecture
• Instruction Set: The complete set of all the instructions in machine code that can be recognized and executed by a microprocessor (μP).

• Instruction Set Architecture: It consists of addressing modes, instructions, native data types, registers, memory architecture, interrupt/exception handling and external I/O.

• The ISA serves as the boundary between software and hardware.

• Visible to the compiler writer.


Instruction Set Architecture (ISA)
• Serves as an interface between software and hardware.

• The software (its instructions) tells the hardware what should be done.

High-level language code: C, C++, Java, Fortran
    ↓ compiler
Assembly language code: architecture-specific statements
    ↓ assembler
Machine language code: architecture-specific bit patterns

The instruction set sits at the boundary between software (above) and hardware (below).
ISA: Design Issues
• Where are operands stored (during operation)?
– registers, memory, stack, accumulator
• How many explicit operands are there?
– 0, 1, 2, or 3
• How is the operand location specified?
– register, immediate, indirect, ...
• What type and size of operands are supported?
– byte, int, float, double, string, vector, ...
• What operations are supported?
– add, sub, mul, move, compare, load, store, ...
Classification (Architectural Complexity)
• CISC
• RISC
Complex Instruction Set Computer (CISC)
Small, slow memories and the need for backward compatibility of the instruction set led to the development of CISC.

• Large number of complex instructions. A single instruction can:
– execute several low-level operations or perform multi-step operations;
– use multiple addressing modes within a single instruction (orthogonal instruction set).
(Example: a load from memory, an arithmetic operation, and a memory store in one instruction.)

• Because its instructions are complex, CISC attempts to minimize the number of instructions per program, at the cost of more cycles per instruction.
• CISC architecture is designed to decrease the memory cost.
Complex Instruction Set Computer (CISC)
• Variable-length instructions, where the length often varies according to the addressing mode.
• Instructions require multiple clock cycles to execute.
• A small number of general-purpose registers, because instructions can operate directly on memory.
• Data transfer is from memory to memory.
• A microprogrammed control unit is found in CISC.


CISC ISA: Intel Pentium
12 addressing modes:
• Register
• Immediate
• Direct
• Base
• Base + Displacement
• Index + Displacement
• Scaled Index + Displacement
• Based Index
• Based Scaled Index
• Based Index + Displacement
• Based Scaled Index + Displacement
• Relative

Operand sizes:
• Can be 8, 16, 32, 48, 64, or 80 bits long.

Instruction encoding:
• The smallest instruction is one byte.
• The longest instruction is 12 bytes long.
• The first bytes generally contain the opcode, mode specifiers, and register fields.
• The remaining bytes are for address displacement and immediate data.
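Most of the register-based modes listed above combine up to three parts into an effective address (EA): a base register, an index register multiplied by a scale factor, and a displacement. The Python sketch below is a simplified model, not Intel's definition: it assumes a 32-bit address space, scale factors of 1, 2, 4 or 8, and hypothetical register values; the function name `effective_address` is illustrative.

```python
def effective_address(base=0, index=0, scale=1, displacement=0):
    """Simplified EA = base + index * scale + displacement, wrapped to 32 bits.

    Leaving out a component models a mode that lacks it, e.g.
    'Base + Displacement' passes no index and
    'Scaled Index + Displacement' passes no base.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


# Hypothetical register contents
EBX, ESI = 0x1000, 3

print(hex(effective_address(base=EBX, displacement=8)))                      # Base + Displacement
print(hex(effective_address(index=ESI, scale=4, displacement=0x200)))        # Scaled Index + Displacement
print(hex(effective_address(base=EBX, index=ESI, scale=4, displacement=8)))  # Based Scaled Index + Displacement
```

One formula covers several of the listed modes because a missing component simply contributes zero.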
Reduced Instruction Set Computer (RISC)
RISC is an ISA that consists of a small, highly optimized set of instructions.
• Focuses on reducing the complexity and the number of operations per instruction.
• Reduced number of cycles needed per instruction.
• Designed with CPU instruction pipelining in mind.
• Fixed-length instruction encoding.
• Incorporates a larger number of registers to avoid frequent interactions with memory (only load/store instructions access memory).
• Simplified addressing modes:
– usually limited to immediate, register indirect, register displacement, and indexed.
CISC v/s RISC
CISC | RISC
Emphasis on hardware | Emphasis on software
Includes multi-clock, complex instructions | Single-clock, reduced instructions only
Memory-to-memory: "LOAD" and "STORE" incorporated in instructions | Register-to-register: "LOAD" and "STORE" are independent instructions
Variable-length instructions | Fixed-length instructions


Microprocessor without Interlocked Pipeline Stages (MIPS) ISA

MIPS is a RISC ISA developed by MIPS Technologies (formerly MIPS Computer Systems).

MIPS is a load/store architecture (also known as a register-register architecture): except for the load/store instructions used to access memory, all instructions operate on registers.

The early MIPS architectures were 32-bit, with 64-bit versions added later.

Versions of MIPS: MIPS I, II, III, IV, and V, followed by five releases of MIPS32/64 (for 32- and 64-bit implementations). The latest version is MIPS32/64 Release 6.
MIPS Instruction Set
• 32-bit architecture
• 32 general-purpose registers

Instruction set functions:
• Computational (uses ALU): ADD, SUB, AND, OR instructions
• Memory access: lw, lb, sw, sb instructions
• Program flow: jump instructions

Instruction formats:
• R-type
• J-type
• I-type
R-Type Instruction
• Arithmetic and logical instructions.
• In addition to the opcode, R-type instructions specify three registers, a shift amount field, and a function field.
• Field widths: op 6 bits, rs 5, rt 5, rd 5, shamt 5, funct 6 (32 bits in total).
• op (opcode): basic operation of the instruction; it also determines the format (op = 0 for all R-type instructions).
• rs: first source operand
• rt: second source operand
• rd: destination
• shamt: shift amount, used only for shift instructions
• funct: function variant (e.g., add has funct = 32 and sub has funct = 34)
Example R-Type Instruction
add Rd, Rs, Rt ; Rd = Rs + Rt

Binary Representation: 000000 10001 10010 00011 00000 100000

sub Rd, Rs, Rt ; Rd = Rs - Rt

Binary Representation: 000000 10001 10010 00011 00000 100010
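The bit patterns above can be reproduced by packing the six fields into one 32-bit word. A minimal Python sketch, assuming the 6-5-5-5-5-6 field widths shown in the binary strings and only the two funct values quoted in the text; the helper names `encode_r_type` and `as_fields` are hypothetical:

```python
# MIPS R-type layout: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)
FUNCT = {"add": 32, "sub": 34}  # funct values quoted in the text


def encode_r_type(mnemonic, rd, rs, rt, shamt=0):
    """Pack the R-type fields into one 32-bit word (op = 0 for every R-type)."""
    for value in (rs, rt, rd, shamt):
        if not 0 <= value < 32:
            raise ValueError(f"{value} does not fit in a 5-bit field")
    op = 0
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | FUNCT[mnemonic]


def as_fields(word, widths=(6, 5, 5, 5, 5, 6)):
    """Split a 32-bit word into the bit groups used on the slides."""
    bits = format(word, "032b")
    groups, pos = [], 0
    for w in widths:
        groups.append(bits[pos:pos + w])
        pos += w
    return " ".join(groups)


# Rs = 17, Rt = 18, Rd = 3, as in the binary strings above
print(as_fields(encode_r_type("add", rd=3, rs=17, rt=18)))  # 000000 10001 10010 00011 00000 100000
print(as_fields(encode_r_type("sub", rd=3, rs=17, rt=18)))  # 000000 10001 10010 00011 00000 100010
```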


J-Type Instruction
• op (opcode): basic operation of the instruction (e.g., opcode = 2 for j or 3 for jal).
• target: 26-bit target word address of the instruction to jump to.

Example J-Type Instruction
j Label1 ; goto Label1
Binary: 000010 xx xxxx xxxx xxxx xxxx xxxx xxxx
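Because MIPS instructions are word aligned, the 26-bit target field stores a word address; the hardware forms the full 32-bit destination by appending two zero bits and taking the upper four bits from PC + 4. The sketch below assumes that standard MIPS rule; the address chosen for `Label1` and the helper names are hypothetical.

```python
def encode_j_type(opcode, target_address):
    """Encode j (opcode 2) or jal (opcode 3); the 26-bit field holds a word address."""
    if target_address % 4:
        raise ValueError("jump targets must be word aligned")
    target = (target_address >> 2) & 0x3FFFFFF  # drop the two zero bits, keep 26 bits
    return (opcode << 26) | target


def jump_destination(pc, word):
    """New PC = upper 4 bits of (PC + 4) : 26-bit target : 00."""
    target = word & 0x3FFFFFF
    return ((pc + 4) & 0xF0000000) | (target << 2)


LABEL1 = 0x00400020  # hypothetical address of Label1
word = encode_j_type(2, LABEL1)
print(format(word, "032b"))
print(hex(jump_destination(0x00400000, word)))  # 0x400020
```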


I-Type Instruction
• Arithmetic, logical, memory (load/store), and branch instructions.
• I-type instructions specify two registers and a 16-bit immediate value.
• In this subset, the opcode is greater than 3 (opcode 0 is R-type; 2 and 3 are J-type).

I-Type Instruction
• op (opcode): basic operation of the instruction (e.g., lw opcode = 35, addi opcode = 8, beq opcode = 4)
• rs: register containing the source operand
• rt: destination register for addi or loads; second operand for beq
• immediate: immediate operand in computation instructions, byte address offset in load/store instructions, or word address offset (relative to PC + 4) in branch instructions; it is always sign-extended to a 32-bit value.
Load instruction: 100011 10011 01000 xxxxxxxxxxxxxxxx
I-Type Instruction
addi Rt, Rs, 10 ; Rt = Rs + 10 (here Rs = $1 and Rt = $6)
Binary code: 001000 00001 00110 0000000000001010
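The same packing idea applies to the I-type layout op(6) rs(5) rt(5) immediate(16). This sketch assumes signed immediates (stored in 16-bit two's complement) and uses the opcodes quoted above; the load offset of -4 is a hypothetical value filling the xxxx field of the load example, and the helper names are illustrative.

```python
def encode_i_type(opcode, rs, rt, immediate):
    """Pack op(6) rs(5) rt(5) immediate(16); negative values use 16-bit two's complement."""
    if not -(1 << 15) <= immediate < (1 << 15):
        raise ValueError("immediate does not fit in a signed 16-bit field")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def as_fields(word):
    bits = format(word, "032b")
    return " ".join((bits[0:6], bits[6:11], bits[11:16], bits[16:]))


ADDI, LW, BEQ = 8, 35, 4  # opcodes quoted in the text

print(as_fields(encode_i_type(ADDI, rs=1, rt=6, immediate=10)))
# 001000 00001 00110 0000000000001010
print(as_fields(encode_i_type(LW, rs=19, rt=8, immediate=-4)))
# 100011 10011 01000 1111111111111100
```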
Building a Datapath (MIPS Implementation)
• A datapath is a collection of functional units, such as arithmetic logic units or multipliers, that perform data processing operations using registers and buses.
• Datapath
– A hardware unit that processes data and addresses in the CPU
– It includes instruction and data memories, registers, ALUs, adders, multiplexers, ...
Building a Datapath
The subset of the core MIPS instruction set:
• Arithmetic/logical instructions: add, sub, and, or
• Memory-reference instructions: lw, sw
• Branch/control-transfer instructions: branch on equal (beq), jump (j)
“Simplicity favors regularity”
For every instruction in all three classes of instructions, the first two steps are identical:
• Send the program counter (PC) to the memory that contains the code and fetch the instruction from that memory.
• Read one or two registers, using fields of the instruction to select the registers to read. For the load word instruction, we need to read only one register, but most other instructions require reading two registers.
“Simplicity favors regularity”
After these two steps, every instruction uses the ALU:
• memory-reference instructions for an address calculation;
• arithmetic-logical instructions for the operation execution;
• branches for comparison.

After using the ALU:
• A memory-reference instruction needs to access the memory, either to read data for a load or to write data for a store.
• An arithmetic-logical instruction must write the data from the ALU into a register.
• A branch instruction may need to change the next instruction address based on the comparison; otherwise, the PC is incremented by 4 to get the address of the next instruction.
“Simplicity favors regularity”
For each of the three instruction classes (memory-reference, arithmetic-logical,
and branches), the actions are almost the same, independent of the exact
instruction. The simplicity and regularity of the MIPS instruction set simplifies
the implementation by making the execution of many of the instruction classes
similar.
Abstract view of the implementation of MIPS Subset
• Fetch instruction.
• Decode and fetch register operands.
• Use the ALU:
– to compute a memory address (for a load or store);
– to compute an arithmetic result (for an arithmetic-logical instruction);
– to compare (for a branch).
Abstract view of the implementation of MIPS Subset
• For an arithmetic-logical instruction, the result must be written to a register.
• For lw/sw, the ALU result is used as an address either to store a value from the registers or to load a value from memory.
• For branches, the ALU output is used to determine the next instruction address, which comes either from the ALU or from an adder that increments the current PC by 4.
Abstract view of the implementation of MIPS Subset
Multiplexers: Data Selector
• Signals from two different sources cannot simply be joined on the same wire.
• Use a multiplexer to select which source drives the input.
Elements of Datapath (Combinational)
• AND gate: Y = A & B
• Adder: Y = A + B
• Multiplexer: Y = S ? I1 : I0
• Arithmetic/Logic Unit: Y = F(A, B), where the control input F selects the function
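The combinational elements above have no internal state, so they can be modelled as pure functions. The Python sketch below assumes 32-bit operands and implements only the four core operations of the MIPS subset (add, sub, and, or); the string operation names stand in for the ALU control bits and are hypothetical. The Zero output is the signal a branch checks after subtracting.

```python
MASK32 = 0xFFFFFFFF


def alu(op, a, b):
    """Y = F(A, B) on 32-bit operands; also returns the Zero output."""
    if op == "and":
        y = a & b
    elif op == "or":
        y = a | b
    elif op == "add":
        y = (a + b) & MASK32   # wrap around like a 32-bit adder
    elif op == "sub":
        y = (a - b) & MASK32   # two's-complement subtraction
    else:
        raise ValueError(f"unsupported ALU operation: {op}")
    return y, y == 0


def mux(s, i0, i1):
    """2-to-1 multiplexer: Y = S ? I1 : I0."""
    return i1 if s else i0


print(alu("add", 2, 3))      # (5, False)
print(alu("sub", 7, 7))      # (0, True)
print(mux(1, "I0", "I1"))    # I1
```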
Elements of Datapath
• Program Counter
The 32-bit register containing the address of the instruction in the program being executed. It is updated at the end of every clock cycle (the instruction fetch step).
• Instruction Memory
Holds the set of instructions (the program). It is read-only at user level (so a program may not modify its own code) and writable only at supervisor level, usually by an OS function.
Elements of Datapath
• Data Memory
Holds the data used by the program being executed. Data memory is expected to be modified by a program, so access to it is freely given to its parent process.
• Register File
A register file is a collection of registers in which any register can be read or written by specifying the number of the register in the file. It is a state element that supports both read and write operations. The processor's 32 general-purpose registers are stored in the register file.
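A register file can be sketched as an array indexed by register number, with two read ports and one write port gated by the RegWrite control signal. This is a minimal Python model under stated assumptions: writes take effect immediately (no clock edge is modelled), and register $0 is hardwired to zero as in MIPS; the class and method names are hypothetical.

```python
class RegisterFile:
    """32 x 32-bit registers with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1, read_reg2):
        """Return (Read data 1, Read data 2) for two register numbers."""
        return self.regs[read_reg1], self.regs[read_reg2]

    def write(self, write_reg, write_data, reg_write):
        """Update a register only when the RegWrite control signal is asserted."""
        if reg_write and write_reg != 0:  # MIPS register $0 always reads as zero
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.write(9, 42, reg_write=True)    # $t1 <- 42
rf.write(10, 7, reg_write=False)   # ignored: RegWrite not asserted
rf.write(0, 5, reg_write=True)     # ignored: $0 is hardwired to zero
print(rf.read(9, 10))              # (42, 0)
```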
Elements of Datapath
Sign Extension Unit
The sign extension unit has an m-bit input that is sign-extended into an n-bit result appearing on the output (in the MIPS datapath: 16-bit input and 32-bit output).

Sign extension is the operation that increases the number of bits of a binary number while preserving the number's sign (positive/negative) and value.
For example, if six bits are used to represent the number "00 1010" (decimal positive 10) and the sign extend operation increases the word length to 16 bits, then the new representation is simply "0000 0000 0000 1010".

If ten bits are used to represent the value "11 1111 0001" (decimal negative 15) using two's complement, and this is sign-extended to 16 bits, the new representation is "1111 1111 1111 0001". Thus, by padding the left side with ones, the negative sign and the value of the original number are maintained.
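The padding rule in the two examples can be written as a short function: copy the sign bit into every new high-order bit. A minimal Python sketch (the function name `sign_extend` is illustrative):

```python
def sign_extend(value, from_bits, to_bits):
    """Replicate bit (from_bits - 1) into the upper (to_bits - from_bits) bits."""
    value &= (1 << from_bits) - 1           # keep only the m input bits
    if value >> (from_bits - 1):            # sign bit is 1: pad with ones
        value |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return value                            # sign bit is 0: padding is zeros


print(format(sign_extend(0b001010, 6, 16), "016b"))      # 0000000000001010
print(format(sign_extend(0b1111110001, 10, 16), "016b"))  # 1111111111110001
```

In the MIPS datapath the same function would be applied with from_bits = 16 and to_bits = 32.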
Datapath: Instruction Fetch and Increment
To execute any instruction, the processor must fetch the instruction from instruction memory. To prepare for executing the next instruction, the processor must also increment the program counter so that it points at the next instruction, 4 bytes later.

Required elements are:
• PC register
• Instruction memory
• Adder
Datapath: Instruction Fetch and Increment of the PC
[Figure: the PC register addresses the instruction memory, and an adder increments the PC by 4 to point to the next 32-bit instruction.]
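The fetch step can be sketched as one function that reads the word addressed by the PC and returns PC + 4 from the separate adder. The Python model below assumes a byte-addressed instruction memory represented as a dictionary of aligned words; the program contents are hypothetical (the add and sub encodings from the R-type examples).

```python
def fetch(pc, instruction_memory):
    """Read the instruction at PC and compute PC + 4 with the dedicated adder."""
    if pc % 4:
        raise ValueError("PC must be word aligned")
    instruction = instruction_memory[pc]
    next_pc = (pc + 4) & 0xFFFFFFFF
    return instruction, next_pc


# Hypothetical program: byte address -> 32-bit word
imem = {0x00400000: 0x02321820, 0x00400004: 0x02321822}

pc = 0x00400000
while pc in imem:                 # stop when the PC runs past the program
    instr, pc = fetch(pc, imem)
    print(hex(pc), hex(instr))
```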
Datapath: R-Format Instructions
• Read two register operands.
• Perform the ALU operation on the contents of the registers.
• Write the result to a register.
• R-format instructions use three register operands: two are read from the register file and one is written back to it.

Required elements are:
• Register file
• ALU
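Datapath: R-format Instructions
Example: add $t1, $t2, $t3
[Figure: R-format datapath. The instruction selects Read register 1, Read register 2, and Write register in the register file. Read data 1 and Read data 2 feed a 32-bit ALU whose 4-bit ALU control selects the operation @ ∈ {+, -, AND, OR, etc.}, giving Result = Read data 1 @ Read data 2 and a Zero output. The Result is written back to the register file when RegWrite is asserted.]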
Memory-Reference Instruction (Load/Store)
• Read one register operand
• Calculate address using base register and 16-bit offset from instruction
• Perform operation
– Load: Read memory and load register
– Store: Write register value to memory

• Elements required
– Data Memory Unit
– ALU
– Register File
– Sign-Extension Unit
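Datapath: Load/Store
Example: lw $t1, Offset_value($t2) (fields: op, rs, rt, address)
[Figure: load/store datapath. Read data 1 (the base register) goes to the ALU, and the 16-bit offset is sign-extended to 32 bits and added to it; the ALU result is the address for the data memory (Dmem). For a store, Read data 2 drives the memory's Write data input (MemWrite); for a load, the memory's Read data is written to the register file (MemRead, RegWrite).]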
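The load/store steps listed above can be sketched in a few lines: the ALU adds the base register to the sign-extended offset, then the data memory is either read (lw) or written (sw). This Python model assumes a word-aligned data memory held in a dictionary and a list of 32 registers; the function names and the base address are hypothetical, while $t1 = 9 and $t2 = 10 are the standard MIPS register numbers.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret a 16-bit field as a signed offset."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def load_store(op, regs, memory, rs, rt, offset16):
    """Execute lw/sw: address = regs[rs] + sign-extended offset (computed by the ALU)."""
    address = (regs[rs] + sign_extend16(offset16)) & MASK32
    if address % 4:
        raise ValueError("word accesses must be aligned")
    if op == "lw":        # MemRead = 1, RegWrite = 1
        regs[rt] = memory.get(address, 0)
    elif op == "sw":      # MemWrite = 1, RegWrite = 0
        memory[address] = regs[rt]
    else:
        raise ValueError(f"unsupported operation: {op}")
    return address


regs = [0] * 32
memory = {}
T1, T2 = 9, 10           # MIPS register numbers of $t1 and $t2
regs[T2] = 0x1000        # hypothetical base address
regs[T1] = 99
load_store("sw", regs, memory, rs=T2, rt=T1, offset16=8)   # sw $t1, 8($t2)
regs[T1] = 0
load_store("lw", regs, memory, rs=T2, rt=T1, offset16=8)   # lw $t1, 8($t2)
print(regs[T1])  # 99
```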
Branch Instructions
• Three operands: two registers that are compared for equality, and a 16-bit offset used to compute the branch target address.
• beq $t1, $t2, Offset
• Branch target address: the address specified in a branch, which becomes the new program counter (PC) if the branch is taken.
• Branch taken: a branch where the branch condition is satisfied and the program counter (PC) becomes the branch target.
• Branch not taken: a branch where the branch condition is false and the program counter (PC) becomes the address of the instruction that sequentially follows the branch.
Branch Instructions
• Read register operands.
• Compare operands:
– use the ALU to subtract and check the Zero output.
• Calculate the target address:
– sign-extend the displacement;
– shift left 2 places (word displacement);
– add the shifted offset to PC + 4.
• PC + 4: already calculated by instruction fetch.
• Elements required:
– Register file
– ALU
– Adder
– Sign-Extension Unit
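The branch steps above combine into one next-PC calculation: sign-extend the offset, shift it left by 2, add it to PC + 4, and select that target only when the ALU's subtraction gives Zero. A minimal Python sketch, assuming 32-bit wraparound; the function names and addresses are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret a 16-bit field as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def beq_next_pc(pc, rs_value, rt_value, offset16):
    """Next PC after 'beq rs, rt, offset' located at address pc."""
    pc_plus_4 = (pc + 4) & MASK32                                    # already computed during fetch
    target = (pc_plus_4 + (sign_extend16(offset16) << 2)) & MASK32   # branch adder
    zero = ((rs_value - rt_value) & MASK32) == 0                     # ALU subtract, test Zero output
    return target if zero else pc_plus_4


print(hex(beq_next_pc(0x00400000, 5, 5, 3)))        # taken: 0x400010
print(hex(beq_next_pc(0x00400000, 5, 6, 3)))        # not taken: 0x400004
print(hex(beq_next_pc(0x00400010, 1, 1, 0xFFFB)))   # taken, backward by 5 words: 0x400000
```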
Datapath: Branch Instructions
[Figure: branch datapath; the sign-extension unit replicates the sign-bit wire.]
Creating Single Datapath
• Single datapath: the hardware unit designed to follow the sequence Fetch-Decode-Execute.
• The simplest datapath will attempt to execute all instructions in one clock cycle.
• Each datapath element can only do one function at a time. No datapath resource can be used more than once per instruction, so any element needed more than once must be duplicated.
• To share a datapath element between two different instruction classes, use a multiplexer where alternate data sources are used for different instructions, with a control signal to select among the multiple inputs.
Creating Single Datapath
To create a single datapath for the core MIPS architecture, the required components are:
– datapath for instruction fetch and increment
– datapath for R-type instructions
– datapath for memory instructions
– datapath for branches
– control signals (decode)
– multiplexers
Single Datapath
Instruction Cycle and Pipelining
Address, Data and Control Buses
• Bus: A bus is a group of signal lines used to connect two or more systems or subsystems.

• Address bus: a system bus used to specify a physical address to memory. When a processor needs to read/write a memory location, it specifies that memory location on the address bus. The width of the address bus determines the amount of memory a system can address. For example, a system with a 32-bit address bus can address 2^32 (4,294,967,296) memory locations. If each memory location holds one byte, the addressable memory space is 4 GB.

• Data bus: a system bus used to transport data from memory to the processor and vice versa. Different kinds of data buses have evolved along with personal computers and other pieces of hardware.

• Control bus: a system bus used by the CPU to communicate with other devices within the computer.

• The address bus carries the information about the device with which the CPU is communicating, and the data bus carries the actual data being processed, while the control bus carries commands from the CPU and returns status signals from the devices. For example, if data is being read from or written to the device, the appropriate line (read or write) will be active (logic one).
Execution of Complete Instruction Cycle
To execute instructions, the processor must follow the instruction cycle:
• Fetch instruction: the processor reads an instruction from memory (register, cache, main memory).
• Decode instruction: the instruction is decoded to determine what action is required.
• Fetch data: the execution of an instruction may require reading data from memory or an I/O module (with additional accesses if indirect addressing is used).
• Execute operation: the execution of an instruction may require performing some arithmetic or logical operation on data.
• Write data: the results of an execution may require writing data to memory or an I/O module.
• Interrupt: if interrupts are enabled and an interrupt has occurred, save the current process state and service the interrupt.
Execution of Complete Instruction Cycle
• An instruction may involve one or more operands in memory.
• For operands with indirect addressing, additional memory accesses are required.
• These can be thought of as an additional instruction sub-cycle (the indirect cycle).
• If interrupts are enabled and an interrupt has occurred, save the current process state and service the interrupt before the next instruction fetch.
Instruction Execution with Indirect Cycle (Software Interrupt)
Instruction Cycle State Diagram
Data Flow
• What is the sequence of data flow in the different stages?

Essential registers for instruction execution:
• Program counter (PC): contains the address of an instruction to be fetched.
• Instruction register (IR): contains the instruction most recently fetched.
• Memory address register (MAR): contains the address of a location in memory.
• Memory buffer register (MBR): contains a word of data to be written to memory or the word most recently read.
Data Flow (Instruction Fetch)
• PC contains the address of the next instruction to be fetched.
• The address is moved to MAR.
• The value of MAR (the address) is placed on the address bus.
• The control unit requests a memory read.
• The result is placed on the data bus.
• The result is copied to MBR.
• The fetched data is moved to IR.
• Meanwhile, PC is incremented by 1.
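The fetch data flow can be traced as a sequence of register transfers between PC, MAR, MBR and IR. The Python sketch below assumes a word-addressed memory (matching the "PC incremented by 1" step) held in a dictionary; bus signalling is not modelled, and the `CPU` class and program contents are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CPU:
    pc: int = 0
    ir: int = 0
    mar: int = 0
    mbr: int = 0
    memory: dict = field(default_factory=dict)  # word address -> word

    def fetch_cycle(self):
        self.mar = self.pc                        # address moved to MAR, placed on the address bus
        self.mbr = self.memory.get(self.mar, 0)   # memory read; data bus result copied to MBR
        self.pc += 1                              # PC incremented by 1 (word-addressed memory)
        self.ir = self.mbr                        # fetched instruction moved to IR


cpu = CPU(pc=100, memory={100: 0x1940, 101: 0x5941})  # hypothetical program
cpu.fetch_cycle()
print(cpu.mar, hex(cpu.ir), cpu.pc)   # 100 0x1940 101
```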
Data Flow (Data Fetch)
• The CU examines the contents of the instruction register (IR).
• If indirect addressing is used, an indirect cycle is performed:
– the rightmost N bits of MBR are transferred to MAR;
– the control unit requests a memory read;
– the result (address of the operand) is moved to MBR;
– the value of MBR is copied to the appropriate register.
Data Flow (Execute)
• Depends on the instruction being executed.
• May include:
– memory read/write
– input/output
– register transfers
– ALU operations
Data Flow (Interrupt)
• The current PC is saved to allow resumption after the interrupt:
– contents of PC are copied to MBR;
– a special memory location (e.g., given by the stack pointer) is loaded into MAR;
– MBR is written to memory.
• PC is loaded with the address of the interrupt-handling routine.
• The next instruction (the first of the interrupt handler) can be fetched.
Data Flow (Interrupt Diagram)
Instruction Execution Improvement

• Cache Memory
• General Purpose registers
• Pipelining
Pipelining
• A way of speeding up the execution of instructions.
• Key idea: overlap the execution of multiple instructions.
• Instruction pipelining is similar to the use of an assembly line in a manufacturing plant.
• Pipelining: new inputs are accepted at one end before previously accepted inputs appear as outputs at the other end.
• To apply this concept to instruction execution, we must decompose instruction execution into independent stages with equal time requirements.
Two Stage Instruction Pipelining
Main memory is not accessed all the time during the execution of an instruction. While an instruction is being executed, the next instruction can be fetched in parallel.

Two stages:
• Fetch instruction
• Execute instruction

• The pipeline has two independent stages. The first stage fetches an instruction and buffers it. When the second stage is free, the first stage passes it the buffered instruction. While the second stage is executing the instruction, the first stage takes advantage of any unused memory cycles to fetch and buffer the next instruction. This is called instruction prefetch or fetch overlap.
Two Stage Instruction Pipeline
Two Stage Instruction Pipelining

• Requires more registers for instruction buffering.
• Conceptually, the instruction cycle time would be halved if the fetch and execute stages were of equal duration. However, there are two reasons why performance falls short of this:
– The execution time will generally be longer than the fetch time, because execution involves reading and storing operands. Thus, the fetch stage may have to wait for some time before it can empty its buffer.
– A conditional branch instruction makes the address of the next instruction to be fetched unknown. Thus, the fetch stage must wait until it receives the next instruction address from the execute stage. The execute stage may then have to wait while the next instruction is fetched.
Pipelining: Stages

To achieve further speed-up, instruction execution can be divided into more than two stages:
• Fetch instruction (FI): Fetch the next instruction from memory.
• Decode instruction (DI): Determine the opcode and the operand specifiers.
• Calculate operands (CO): Calculate the effective address of each source operand.
• Fetch operands (FO): Fetch each operand from memory. Operands in registers need not be fetched.
• Execute instruction (EI): Perform the indicated operation and store the result, if any, in the specified destination operand location.
• Write operand (WO): Store the result in memory.

• Assume each stage consumes approximately equal time.
• Pipelining: overlap these operations for multiple instructions.
Timing Diagram for Instruction Pipeline
If each stage consumes approximately equal time (one time unit), then:
• without pipelining, the time required to execute 9 instructions = 9 × 6 = 54 time units;
• with pipelining, the time required to execute 9 instructions = 6 + (9 - 1) = 14 time units.
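The 54 and 14 figures follow from an ideal six-stage pipeline in which instruction i enters stage j at time unit i + j (both counted from 1), with no branches or stalls. The Python sketch below prints that timing diagram under those assumptions; the function names are illustrative.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def timing_diagram(n, stages=STAGES):
    """Map (instruction, stage) -> time unit for an ideal pipeline (no branches or stalls)."""
    return {(i, s): i + j + 1 for i in range(n) for j, s in enumerate(stages)}


def total_time(n, k, pipelined):
    """Time units to finish n instructions on k equal-length stages."""
    return k + (n - 1) if pipelined else n * k


diagram = timing_diagram(9)
for i in range(9):
    row = ["  "] * total_time(9, 6, pipelined=True)
    for s in STAGES:
        row[diagram[(i, s)] - 1] = s
    print(f"I{i + 1}: " + " ".join(row))

print(total_time(9, 6, pipelined=False), total_time(9, 6, pipelined=True))  # 54 14
```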
Pipelining: Limitations
• The timing diagram assumes that each instruction goes through all six stages of the pipeline. This is not always true; for example, a load instruction does not need the WO stage.
• Memory conflicts: the diagram assumes that all of the stages can be performed in parallel, i.e., that there are no memory conflicts among the FI, FO, and WO stages.
• Register conflict: the CO stage may depend on the contents of a register that could be altered by a previous instruction that is still in the pipeline.
• The diagram assumes each stage consumes approximately equal time. If the six stages are not of equal duration, there will be some waiting at various pipeline stages.
• Conditional branch instruction:
– if the conditional branch is not taken, the pipeline gets the full performance benefit;
– if the conditional branch is taken, it can invalidate several fetched instructions.
• Like conditional branches, interrupts are also unpredictable.
Effect of a Conditional Branch on Instruction Pipeline: I3 is a conditional branch to I15
Pipelining: Performance
• In general, the greater the number of stages in the pipeline, the faster the execution rate.
• The process of increasing performance between two systems processing the same problem is known as speeding up the problem.
• Speed-up: the ratio of the sequential execution time to the time required to execute the same problem in a parallel manner.
• Cycle time (τ): the time required to advance a set of instructions one stage through the pipeline. The cycle time can be determined as
τ = max{τi} + d = τm + d, 1 ≤ i ≤ k
where
k: the number of stages in the instruction pipeline
τi: the time delay of the circuitry in the ith stage of the pipeline
τm: the maximum stage delay
d: the time delay of a latch, needed to advance signals and data from one stage to the next
Pipelining: Performance
• Assume pipelining with no branches.
• A total of k cycles are required to complete the execution of the first instruction, and the remaining (n-1) instructions require (n-1) cycles.
• The total time required for a pipeline with k stages to execute n instructions is
Tk,n = [k + (n-1)]τ
• The total time required without a pipeline to execute n instructions is
T1,n = nkτ
• The speed-up factor is given by
S = T1,n / Tk,n = nk / [k + (n-1)]
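The formulas for τ, Tk,n, T1,n and S can be checked numerically. The Python sketch below assumes hypothetical stage and latch delays; it shows that τ cancels out of the speed-up, that the 9-instruction, 6-stage case gives 54/14, and that S approaches k as n grows.

```python
def cycle_time(stage_delays, latch_delay):
    """tau = max{tau_i} + d = tau_m + d."""
    return max(stage_delays) + latch_delay


def pipelined_time(n, k, tau):
    """T_{k,n} = [k + (n - 1)] * tau."""
    return (k + (n - 1)) * tau


def sequential_time(n, k, tau):
    """T_{1,n} = n * k * tau."""
    return n * k * tau


def speedup(n, k):
    """S = T_{1,n} / T_{k,n} = nk / [k + (n - 1)]; tau cancels out."""
    return sequential_time(n, k, 1) / pipelined_time(n, k, 1)


tau = cycle_time([9, 10, 8, 10, 10, 7], latch_delay=1)   # hypothetical delays in ns
print(tau, pipelined_time(9, 6, tau), sequential_time(9, 6, tau))
for n in (1, 9, 100, 10_000):
    print(n, round(speedup(n, 6), 3))   # approaches k = 6 as n grows
```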
Pipeline Hazards
• Pipeline hazards are situations that prevent the next instruction in the instruction stream from executing in its designated clock cycle.
• A pipeline hazard occurs when the pipeline, or some portion of the pipeline, must stall because conditions do not permit continued execution.
• Hazards reduce performance below the ideal speed-up gained by pipelining.
• No new instructions are fetched during the stall.
• Types of hazards:
– Resource
– Data
– Control
Resource/Structural Hazards
• Occur when two or more instructions in the pipeline need the same resource (such as memory or a single ALU).
• The instructions must then be executed serially rather than in parallel for part of the pipeline.

Example: Memory Conflict
• Assume a simplified five-stage pipeline (each stage takes one clock cycle).
• Initially, the cache is empty.
• In the ideal case, a new instruction enters the pipeline each clock cycle.
• Assume main memory has a single port, i.e., instruction fetch, data read, and data write cannot be performed simultaneously.
• If the source operand of I1 is in memory, the fetch instruction stage must idle for one cycle before fetching I3.

• Example: ALU Conflict
• Example: Register Conflict
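The memory-conflict example can be simulated by reserving the single memory port cycle by cycle. The sketch below is a simplified model under stated assumptions: five one-cycle stages FI, DI, FO, EI, WO; every FI uses the port; FO uses it only for instructions whose source operand is in memory (given as 0-based indices); results go to registers, so WO never uses memory. The function name is hypothetical.

```python
def fetch_schedule(n, memory_operand):
    """Cycle in which each instruction's FI stage runs on a single-port memory."""
    port_busy = set()      # cycles in which the memory port is already reserved
    schedule = []
    cycle = 1
    for i in range(n):
        while cycle in port_busy:       # FI must idle while the port is in use
            cycle += 1
        schedule.append(cycle)
        port_busy.add(cycle)            # this FI uses the port
        if i in memory_operand:
            port_busy.add(cycle + 2)    # FO runs two stages after FI
        cycle += 1
    return schedule


print(fetch_schedule(4, memory_operand=set()))   # [1, 2, 3, 4]: ideal case
print(fetch_schedule(4, memory_operand={0}))     # [1, 2, 4, 5]: I3 waits one cycle
```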
Resource/Structural Hazards
Data Hazards
• Occur when there is a conflict in the access of an operand location.
• Two sequential instructions are to be executed, and both access a particular memory or register operand (assume multiple memory ports exist, so there is no resource conflict).
– If executed sequentially, no problem occurs.
– In a pipeline, the wrong value may be produced.

• Consider the following machine instruction sequence:
ADD EAX, EBX /* EAX = EAX + EBX */
SUB ECX, EAX /* ECX = ECX – EAX */

The ADD instruction does not update EAX until the end of its stage 5, at clock cycle 5, but the SUB instruction needs the value at the beginning of its stage 3 (operand fetch), at clock cycle 4.
Data Hazards
• For correct execution, the pipeline must stall for two clock cycles.
• Without special hardware and specific avoidance algorithms, this results in inefficient pipeline usage.
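The two-cycle stall follows from the clock-cycle numbers in the ADD/SUB example: the ADD writes EAX at the end of cycle 5, and the SUB needs it at the start of cycle 4, so the read must move to cycle 6. A minimal Python sketch, assuming no operand forwarding and a register file that cannot be written and read in the same cycle; the function names are hypothetical.

```python
def raw_stall_cycles(write_cycle, read_cycle):
    """Stall cycles so that the dependent read happens after the write completes.

    write_cycle: cycle at whose end the producer updates the register.
    read_cycle:  cycle at whose start the consumer needs the value.
    """
    return max(0, write_cycle + 1 - read_cycle)


def has_raw(producer_dest, consumer_sources):
    """True if a later instruction reads the register an earlier one writes."""
    return producer_dest in consumer_sources


# ADD EAX, EBX writes EAX; SUB ECX, EAX reads ECX and EAX
if has_raw("EAX", {"ECX", "EAX"}):
    print(raw_stall_cycles(write_cycle=5, read_cycle=4))   # 2
```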
Types of Data Hazards
• True dependency, read after write (RAW), or flow dependency
– Occurs when an instruction modifies a register or memory location and a succeeding instruction reads the data in that location.
– Hazard if the read takes place before the write is complete.
a = b + c;
d = a*e;

• Anti-dependency or write after read (WAR)
– Occurs when an instruction reads a register or memory location and a succeeding instruction writes to that location.
– Hazard if the write completes before the read takes place.
a = b + c;
c = e - f;

• Output dependency or write after write (WAW)
– Two instructions both write to the same location.
– Hazard if the writes take place in the reverse of the intended order.
a = b + c;
a = e - f;
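Bernstein's Conditions
• Bernstein's conditions describe when two program segments P1 and P2 (with input sets I1, I2 and output sets O1, O2) are independent and can be executed in parallel:
– first condition: I2 ∩ O1 = ∅
– second condition: I1 ∩ O2 = ∅
– third condition: O1 ∩ O2 = ∅
• Violation of the first condition introduces a flow dependency.
• Violation of the second condition introduces an anti-dependency.
• Violation of the third condition represents an output dependency.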
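Bernstein's three conditions reduce to set intersections, so the three hazard examples from the previous slides can be classified mechanically. In the Python sketch below each statement is described by its input (read) and output (written) variable sets; the function name `dependencies` is hypothetical.

```python
def dependencies(in1, out1, in2, out2):
    """Check Bernstein's conditions for P1 followed by P2; list the violated ones."""
    found = []
    if in2 & out1:
        found.append("flow (RAW)")     # first condition violated
    if in1 & out2:
        found.append("anti (WAR)")     # second condition violated
    if out1 & out2:
        found.append("output (WAW)")   # third condition violated
    return found


examples = {
    "a = b + c; d = a*e": ({"b", "c"}, {"a"}, {"a", "e"}, {"d"}),
    "a = b + c; c = e - f": ({"b", "c"}, {"a"}, {"e", "f"}, {"c"}),
    "a = b + c; a = e - f": ({"b", "c"}, {"a"}, {"e", "f"}, {"a"}),
}
for code, sets in examples.items():
    print(code, "->", dependencies(*sets) or "independent")
```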
Control/Branch Hazard
• Occurs when the pipeline makes the wrong decision on a branch prediction and therefore brings instructions into the pipeline that must subsequently be discarded.
• In instruction pipelining, it is impossible to determine whether the branch will be taken or not until the instruction is actually executed.
• Different solutions exist:
– Multiple streams
– Prefetch branch target
– Loop buffer
– Branch prediction
– Delayed branching
Control/Branch Hazard
• Multiple streams: use two pipeline streams so that both possible next instructions (the sequential one and the branch target) can be fetched.
• Prefetch branch target: when a conditional branch is recognized, the target of the branch is prefetched in addition to the instructions following the branch. This target is saved until the branch instruction is executed.
• Loop buffer: a small, very-high-speed memory containing the n most recently fetched sequential instructions. If a branch is to be taken, the hardware first checks whether the branch target is within the buffer. If so, the next instruction is fetched from the buffer.
Control/Branch Hazard
• Branch prediction:
– Predict never taken
– Predict always taken
– Predict by opcode
– Taken/not taken switch
– Branch history table
• Delayed branching: automatically rearrange instructions within a program so that branch instructions occur later than actually desired.
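A taken/not taken switch records recent branch behaviour so that a single unusual outcome (such as a loop exit) does not immediately flip the prediction. The Python sketch below uses a 2-bit saturating counter, which is one common way to build such a switch and not necessarily the exact state diagram of any particular processor; the class names and the loop pattern are hypothetical.

```python
class TwoBitPredictor:
    """Saturating 2-bit counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def run(outcomes, predictor):
    """Return, for each branch outcome, whether the prediction was correct."""
    correct = []
    for taken in outcomes:
        correct.append(predictor.predict() == taken)
        predictor.update(taken)
    return correct


# A loop branch taken three times and then not taken, executed twice
outcomes = [True, True, True, False] * 2
results = run(outcomes, TwoBitPredictor(state=0))
print(results, sum(results))
```

After the warm-up, only the loop exit is mispredicted, because one not-taken outcome moves the counter from 3 to 2, which still predicts taken.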
Thank You
