Stored-program concept
1. Instructions are represented as numbers and, as such, are indistinguishable from data
2. Programs are stored in alterable memory (that can be read or written to), just like data
[Figure: memory holding an accounting program (machine code), a C compiler (machine code), payroll data, and the source code in C for the accounting program]
Programs can be shipped as files of binary numbers (binary compatibility)
Computers can inherit ready-made software provided they are compatible with an existing ISA, which leads industry to align around a small number of ISAs
MIPS-32 ISA
Instruction Categories
Registers: R0 - R31, PC, HI, LO
Instruction formats:
    op | rs | rt | rd | sa | funct
    op | rs | rt | immediate
    op | jump target
- fixed size instructions
- small number of instruction formats
- opcode always the first 6 bits
Smaller is faster
- limited instruction set
- limited number of registers in register file
- limited number of addressing modes

- arithmetic operands from the register file (load-store machine)
- allow instructions to contain immediate operands
MIPS assembly language arithmetic statement
    add $t0, $s1, $s2
    sub $t0, $s1, $s2
Each arithmetic instruction performs one operation
Each specifies exactly three operands that are all contained in the datapath's register file ($t0, $s1, $s2)
    destination <- source1 op source2
Instruction format (R format): op | rs | rt | rd | shamt | funct
    op     6 bits  opcode that specifies the operation
    rs     5 bits  register file address of the first source operand
    rt     5 bits  register file address of the second source operand
    rd     5 bits  register file address of the result's destination
    shamt  5 bits  shift amount (for shift instructions)
    funct  6 bits  function code augmenting the opcode
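The field layout above can be sketched as a small packing routine. This is only an illustration: the helper name `encode_r_format` is hypothetical, and the register numbers ($t0 = 8, $s1 = 17, $s2 = 18) and the funct value 0x20 for add are taken from the standard MIPS encoding rather than from the slide.

```python
def encode_r_format(op, rs, rt, rd, shamt, funct):
    """Pack the six R-format fields into one 32-bit instruction word."""
    if not (0 <= op < 64 and 0 <= funct < 64):
        raise ValueError("op and funct are 6-bit fields")
    if not all(0 <= field < 32 for field in (rs, rt, rd, shamt)):
        raise ValueError("rs, rt, rd and shamt are 5-bit fields")
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# add $t0, $s1, $s2 (assumed numbering: $t0 = 8, $s1 = 17, $s2 = 18)
word = encode_r_format(op=0, rs=17, rt=18, rd=8, shamt=0, funct=0x20)
print(f"0x{word:08x}")
```

Because every field has a fixed position, decoding is the same shifts in reverse, which is one reason a small number of fixed formats keeps the hardware simple.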
Register File
[Figure: register file with 32 locations of 32 bits each; 5-bit register addresses; 32-bit src1 and src2 data outputs; 32-bit write data input; write control]
Registers are faster than main memory
- But register files with more locations are slower (e.g., a 64 word file could be as much as 50% slower than a 32 word file)
- Read/write port increase impacts speed quadratically
Name        Register number  Usage                    Preserve on call?
$zero       0                constant 0 (hardware)    n.a.
$at         1                reserved for assembler   n.a.
$v0 - $v1   2-3              returned values          no
$a0 - $a3   4-7              arguments                yes
$t0 - $t7   8-15             temporaries              no
$s0 - $s7   16-23            saved values             yes
$t8 - $t9   24-25            temporaries              no
$gp         28               global pointer           yes
$sp         29               stack pointer            yes
$fp         30               frame pointer            yes
$ra         31               return addr (hardware)   yes
Irwin, PSU, 2008
MIPS has two basic data transfer instructions for accessing memory
    lw $t0, 4($s3)    # load word from memory
    sw $t0, 8($s3)    # store word to memory
The data is loaded into (lw) or stored from (sw) a register in the register file (a 5-bit address)
The memory address (a 32-bit address) is formed by adding the contents of the base address register to the offset value
- A 16-bit field, meaning access is limited to memory locations within a region of ±2^13 or 8,192 words (±2^15 or 32,768 bytes) of the address in the base register
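The address calculation can be sketched as follows. The helper names are hypothetical, and the sketch assumes the 16-bit offset is sign-extended before it is added to the base register, which is what gives the symmetric range around the base address.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset16):
    """Base register contents plus sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend_16(offset16)) & 0xFFFFFFFF


# A base of 0x12004094 with an offset of 24 bytes
addr = effective_address(0x12004094, 24)
print(f"0x{addr:08x}")

# A negative offset (0xFFFC encodes -4) reaches below the base address
below = effective_address(0x12004094, 0xFFFC)
print(f"0x{below:08x}")
```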
[Figure: memory with word addresses (hex) 0x00000000, 0x00000004, 0x00000008, 0x0000000c, ..., 0x12004094, 0x120040ac, ..., 0xffffffff; the address 24 (decimal) + $s3 = 0x120040ac selects the data word transferred to or from $t0]
CSE431 Chapter 2
Byte Addresses
Since 8-bit bytes are so useful, most architectures address individual bytes in memory
- Alignment restriction: the memory address of a word must be on natural word boundaries (a multiple of 4 in MIPS-32)
MIPS provides special instructions to move bytes
    lb $t0, 1($s3)
    sb $t0, 6($s3)
[Figure: I-format encoding with fields 0x28, 19, 8, and a 16-bit offset, alongside the memory]
load byte places the byte from memory in the rightmost 8 bits of the destination register
- what happens to the other bits in the register?
store byte takes the byte from the rightmost 8 bits of a register and writes it to a byte in memory
- what happens to the other bits in the memory word?
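The two questions above can be explored with a small byte-addressed memory model. This is a sketch under stated assumptions: `load_byte` and `store_byte` are hypothetical helpers, lb is modelled as sign-extending the loaded byte into the upper 24 bits of the register, and sb is modelled as leaving the other bytes of the memory word unchanged.

```python
def load_byte(memory, address):
    """Model of lb: fetch one byte and sign-extend it to 32 bits."""
    byte = memory[address]
    signed = byte - 0x100 if byte & 0x80 else byte
    return signed & 0xFFFFFFFF


def store_byte(memory, address, register):
    """Model of sb: write only the rightmost 8 bits of the register."""
    memory[address] = register & 0xFF


memory = bytearray([0x11, 0xF0, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])

t0 = load_byte(memory, 1)          # byte 0xF0 has its top bit set
print(f"0x{t0:08x}")

store_byte(memory, 6, 0x12345690)  # only 0x90 reaches memory
print(memory[4:8].hex())
```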
- put "typical constants" in memory and load them
- create hard-wired registers (like $zero) for constants like 1
- have special instructions that contain constants!
We'd also like to be able to load a 32-bit constant into a register; for this we must use two instructions
A new "load upper immediate" instruction
    lui $t0, 1010101010101010
[Figure: I-format encoding of lui with rs = 0, rt = 8 ($t0), and the 16-bit immediate 1010101010101010 (binary)]
Then must get the lower order bits right, use
    ori $t0, $t0, 1010101010101010

    1010101010101010 0000000000000000    ($t0 after lui)
    0000000000000000 1010101010101010    (ori immediate)
    1010101010101010 1010101010101010    (result in $t0)
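The two-instruction sequence can be checked with a short model. The function names `lui` and `ori` here are plain Python stand-ins for the instructions, not an assembler API, and the model assumes ori zero-extends its 16-bit immediate.

```python
def lui(imm16):
    """Model of lui: place the immediate in the upper 16 bits, zero the lower 16."""
    return (imm16 & 0xFFFF) << 16


def ori(register, imm16):
    """Model of ori: OR the register with the zero-extended 16-bit immediate."""
    return (register | (imm16 & 0xFFFF)) & 0xFFFFFFFF


t0 = lui(0b1010101010101010)
upper_only = t0
t0 = ori(t0, 0b1010101010101010)
print(f"{upper_only:032b}")
print(f"{t0:032b}")
```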
Binary      Decimal
0...0000    0
0...0001    1
0...0010    2
0...0011    3
0...0100    4
0...0101    5
0...0110    6
0...0111    7
0...1000    8
0...1001    9
...
1...1100    2^32 - 4
1...1101    2^32 - 3
1...1110    2^32 - 2
1...1111    2^32 - 1
[Figure: bit weights ..., 2^3, 2^2, 2^1, 2^0 above bit positions ..., 3, 2, 1, 0; the all-ones pattern 1 1 1 ... 1 1 1 1 equals 1 0 0 0 ... 0 0 0 0 - 1 = 2^32 - 1]
2's complement binary    decimal
1000                     -8
1001                     -7
1010                     -6
1011                     -5
1100                     -4
1101                     -3
1110                     -2
1111                     -1
0000                     0
0001                     1
0010                     2
0011                     3
0100                     4
0101                     5
0110                     6
2^3 - 1 = 0111           7

To negate: complement all the bits and add a 1
    1010 -> complement all the bits -> 0101 -> and add a 1 -> 0110
    0101 -> complement all the bits -> 1010 -> and add a 1 -> 1011
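The negation rule (complement all the bits and add a 1) can be tried on 4-bit values. This is a minimal sketch; `negate` and `to_signed` are hypothetical helper names and the default width of 4 simply matches the table above.

```python
def negate(bits, width=4):
    """Two's complement negation: complement all the bits, then add 1."""
    mask = (1 << width) - 1
    return ((bits ^ mask) + 1) & mask


def to_signed(bits, width=4):
    """Read a width-bit pattern as a signed two's complement value."""
    sign_bit = 1 << (width - 1)
    return bits - (1 << width) if bits & sign_bit else bits


for pattern in (0b1010, 0b0101):
    result = negate(pattern)
    print(f"{pattern:04b} ({to_signed(pattern)}) -> {result:04b} ({to_signed(result)})")
```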
Need operations to pack and unpack 8-bit characters into 32-bit words
Shifts move all the bits in a word left or right
    sll $t2, $s0, 8    # $t2 = $s0 << 8 bits
    srl $t2, $s0, 8    # $t2 = $s0 >> 8 bits
Such shifts are called logical because they fill with zeros
- Notice that a 5-bit shamt field is enough to shift a 32-bit value 2^5 - 1 or 31 bit positions
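Packing and unpacking characters with logical shifts might look like the sketch below. The helpers `sll`, `srl`, `pack4` and `unpack4` are illustrative names, and placing the first character in the most significant byte is an assumption made only for this example.

```python
def sll(value, shamt):
    """Shift left logical on a 32-bit word; vacated bits fill with zeros."""
    return (value << shamt) & 0xFFFFFFFF


def srl(value, shamt):
    """Shift right logical on a 32-bit word; vacated bits fill with zeros."""
    return (value & 0xFFFFFFFF) >> shamt


def pack4(chars):
    """Pack four 8-bit characters into one word, first character highest."""
    word = 0
    for ch in chars:
        word = sll(word, 8) | (ord(ch) & 0xFF)
    return word


def unpack4(word):
    """Recover the four characters by shifting right and masking with 0xFF."""
    return "".join(chr(srl(word, shift) & 0xFF) for shift in (24, 16, 8, 0))


packed = pack4("MIPS")
print(f"0x{packed:08x}", unpack4(packed))
```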
There are a number of bit-wise logical operations in the MIPS ISA
    and $t0, $t1, $t2    # $t0 = $t1 & $t2
    or  $t0, $t1, $t2    # $t0 = $t1 | $t2
    nor $t0, $t1, $t2    # $t0 = not($t1 | $t2)
MIPS conditional branch instructions:
    bne $s0, $s1, Lbl    # go to Lbl if $s0 != $s1
    beq $s0, $s1, Lbl    # go to Lbl if $s0 = $s1
- Ex:   if (i==j) h = i + j;
            bne $s0, $s1, Lbl1
            add $s3, $s0, $s1
    Lbl1:   ...
Use a register (like in lw and sw) added to the 16-bit offset
- limits the branch distance to -2^15 to +2^15 - 1 (word) instructions from the (instruction after the) branch instruction, but most branches are local anyway
[Figure: the offset from the low order 16 bits of the branch instruction is sign-extended to 32 bits, 00 is appended, and the result is added to PC + 4; the branch outcome selects between that sum and PC + 4 as the next PC]
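The branch address calculation can be sketched in a few lines. The function name `branch_target` is hypothetical; the model assumes the 16-bit offset counts words relative to the instruction after the branch, as the text states.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, offset16):
    """PC + 4 plus the sign-extended word offset shifted left by 2 (append 00)."""
    return (pc + 4 + (sign_extend_16(offset16) << 2)) & 0xFFFFFFFF


forward = branch_target(0x00400000, 3)        # skip 3 instructions ahead
backward = branch_target(0x00400010, 0xFFFF)  # offset -1 targets the branch itself
print(f"0x{forward:08x} 0x{backward:08x}")
```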
We have beq, bne, but what about other kinds of branches (e.g., branch-if-less-than)? For this, we need yet another instruction, slt
Set on less than instruction:
    slt $t0, $s0, $s1    # if $s0 < $s1 then
                         #   $t0 = 1 else
                         #   $t0 = 0
Can use slt, beq, bne, and the fixed value of 0 in register $zero to create other conditions
- less than:  blt $s1, $s2, Label
        slt $at, $s1, $s2        # $at set to 1 if
        bne $at, $zero, Label    #   $s1 < $s2
- less than or equal to:     ble $s1, $s2, Label
- greater than:              bgt $s1, $s2, Label
- greater than or equal to:  bge $s1, $s2, Label
Such branches are included in the instruction set as pseudo instructions, recognized (and expanded) by the assembler
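One way the four pseudo branches could be built from slt, beq and bne is sketched below. Only the blt expansion is given in the slides; the ble, bgt and bge expansions shown in the comments are one possible choice and should be read as assumptions, since an assembler is free to expand them differently.

```python
def slt(a, b):
    """Model of slt: 1 if a < b (signed), else 0."""
    return 1 if a < b else 0


def blt_taken(s1, s2):
    at = slt(s1, s2)   # slt $at, $s1, $s2
    return at != 0     # bne $at, $zero, Label


def bge_taken(s1, s2):
    at = slt(s1, s2)   # slt $at, $s1, $s2
    return at == 0     # beq $at, $zero, Label


def bgt_taken(s1, s2):
    at = slt(s2, s1)   # slt $at, $s2, $s1 (operands swapped)
    return at != 0     # bne $at, $zero, Label


def ble_taken(s1, s2):
    at = slt(s2, s1)   # slt $at, $s2, $s1 (operands swapped)
    return at == 0     # beq $at, $zero, Label


print(blt_taken(1, 2), ble_taken(2, 2), bgt_taken(2, 2), bge_taken(3, 2))
```

Swapping the slt operands and choosing between beq and bne covers all four comparisons with a single extra register, $at, which is why that register is reserved for the assembler.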
Treating signed numbers as if they were unsigned gives a low cost way of checking if 0 <= x < y (index out of bounds for arrays)
    sltu $t0, $s1, $t2       # $t0 = 0 if
                             #   $s1 >= $t2 (max)
                             #   or $s1 < 0 (min)
    beq  $t0, $zero, IOOB    # go to IOOB if $t0 = 0
The key is that negative integers in two's complement look like large numbers in unsigned notation. Thus, an unsigned comparison of x < y also checks if x is negative as well as if x is less than y.
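The single-comparison bounds check can be modelled as below. `sltu` and `index_out_of_bounds` are illustrative names; masking with 0xFFFFFFFF stands in for reading a 32-bit register as an unsigned value.

```python
def sltu(a, b):
    """Model of sltu: compare the 32-bit patterns of a and b as unsigned numbers."""
    return 1 if (a & 0xFFFFFFFF) < (b & 0xFFFFFFFF) else 0


def index_out_of_bounds(index, length):
    """True when the branch to IOOB would be taken (sltu result is 0)."""
    return sltu(index, length) == 0


for index in (-1, 0, 9, 10):
    print(index, index_out_of_bounds(index, 10))
```

A negative index such as -1 has the bit pattern 0xFFFFFFFF, which is larger than any realistic array length when read as unsigned, so it fails the same test that catches indices past the end.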
MIPS also has an unconditional branch instruction or jump instruction:
    j label    # go to label
[Figure: the 32-bit jump destination is formed by concatenating the upper 4 bits of the PC, the low order 26 bits of the jump instruction, and 00]
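The jump address can be sketched the same way as the branch address. The name `jump_target` is hypothetical, and the sketch assumes the upper 4 bits are taken from PC + 4 (the address of the instruction after the jump), with the 26-bit field shifted left by 2.

```python
def jump_target(pc, address26):
    """Upper 4 bits of PC + 4, then the 26-bit field, then two zero bits."""
    upper = (pc + 4) & 0xF0000000
    return upper | ((address26 & 0x03FFFFFF) << 2)


target = jump_target(0x00400000, 0x0100004)
print(f"0x{target:08x}")
```

Since the upper 4 bits come from the PC, a jump can only reach targets inside the current 256 MB region of the address space.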
What if the branch destination is further away than can be captured in 16 bits?
The assembler comes to the rescue: it inserts an unconditional jump to the branch target and inverts the condition
        beq $s0, $s1, L1
becomes
        bne $s0, $s1, L2
        j   L1
    L2:
The jump and link instruction (jal) saves PC+4 in register $ra to have a link to the next instruction for the procedure return
Machine format (J format): 0x03 | 26 bit address
1. Main routine (caller) places parameters in a place where the procedure (callee) can access them
2. Caller transfers control to the callee
3. Callee acquires the storage resources needed
4. Callee performs the desired task
5. Callee places the result value in a place where the caller can access it
6. Callee returns control to the caller
What if the callee needs to use more registers than allocated to argument and return values?
One of the general registers, $sp ($29), is used to address the stack (which grows from high address to low address)
[Figure: the stack in memory between high addr and low addr, with $sp pointing to the top of stack]
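A stack that grows from high addresses to low addresses can be modelled as below. The `Stack` class is purely illustrative; the starting value of $sp and the convention that a push decrements $sp by 4 before storing are assumptions made for the example.

```python
class Stack:
    """Word-sized stack addressed through a simulated $sp register."""

    def __init__(self, top=0x7FFFFFFC):
        self.sp = top      # assumed initial stack pointer
        self.memory = {}   # sparse word-addressed memory

    def push(self, value):
        self.sp -= 4       # stack grows toward low addresses
        self.memory[self.sp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory.pop(self.sp)
        self.sp += 4       # stack shrinks toward high addresses
        return value


stack = Stack()
stack.push(0x11111111)     # e.g. a saved register
stack.push(0x22222222)
sp_after_pushes = stack.sp
last = stack.pop()
print(f"0x{sp_after_pushes:08x} 0x{last:08x} 0x{stack.sp:08x}")
```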
[Figure: procedure frame between high addr and low addr, with $fp at the top and $sp at the bottom; from high to low: saved argument regs (if any), saved return addr, saved local regs (if any), local arrays & structures (if any)]
The segment of the stack containing a procedure's saved registers and local variables is its procedure frame (aka activation record)
The frame pointer ($fp) points to the first word of the frame of a procedure, providing a stable base register for the procedure
- $fp is initialized using $sp on a call and $sp is restored using $fp on a return
Static data segment for constants and other static variables (e.g., arrays)
Dynamic data segment (aka heap) for structures that grow and shrink (e.g., linked lists)
- Allocate space on the heap with malloc() and free it with free() in C
[Figure: memory map with Reserved at 0x0000 0000, the PC at 0x0040 0000, $gp, and the stack with $sp at 0x7fff fffc]
Frequency
    Integer:  16%, 35%, 12%, 34%, 2%
    Ft. Pt.:  48%, 36%, 4%, 8%, 0%
Need hardware support for synchronization mechanisms to avoid data races, where the results of the program can change depending on how events happen to occur
- Two memory accesses from different threads to the same location, and at least one is a write
Atomic exchange (atomic swap) interchanges a value in a register for a value in memory atomically, i.e., as one operation (instruction)
- Implementing an atomic exchange would require both a memory read and a memory write in a single, uninterruptable instruction. An alternative is to have a pair of specially configured instructions: ll and sc
If the contents of the memory location specified by the ll are changed before the sc to the same address occurs, the sc fails (returns a zero)
    try: add $t0, $zero, $s4    # $t0 = $s4 (exchange value)
         ll  $t1, 0($s1)        # load memory value to $t1
         sc  $t0, 0($s1)        # try to store exchange
                                # value to memory, if fail
                                # $t0 will be 0
         beq $t0, $zero, try    # try again on failure
         add $s4, $zero, $t1    # load value in $s4
If the value in memory between the ll and the sc instructions changes, then sc returns a 0 in $t0, causing the code sequence to try again.
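The retry loop can be simulated in a single thread by injecting an interfering store between the ll and the sc. Everything here is a simplified model: the `Memory` class, its `linked` reservation field, and the `interfere` callback are hypothetical and only approximate how real hardware tracks the linked address.

```python
class Memory:
    """Word memory with a one-entry reservation for ll/sc."""

    def __init__(self):
        self.words = {}
        self.linked = None           # address reserved by the last ll

    def ll(self, addr):
        self.linked = addr
        return self.words.get(addr, 0)

    def store(self, addr, value):
        """Ordinary store (e.g. from another thread); breaks a matching link."""
        self.words[addr] = value
        if self.linked == addr:
            self.linked = None

    def sc(self, addr, value):
        """Store only if the link is intact; return 1 on success, 0 on failure."""
        if self.linked != addr:
            return 0
        self.words[addr] = value
        self.linked = None
        return 1


def atomic_exchange(mem, addr, new_value, interfere=None):
    """Mirror of the try loop; returns (old value, number of attempts)."""
    attempts = 0
    while True:
        attempts += 1
        t0 = new_value               # add $t0, $zero, $s4
        t1 = mem.ll(addr)            # ll  $t1, 0($s1)
        if interfere is not None and attempts == 1:
            interfere()              # simulated write between ll and sc
        t0 = mem.sc(addr, t0)        # sc  $t0, 0($s1)
        if t0 != 0:                  # beq $t0, $zero, try
            return t1, attempts      # add $s4, $zero, $t1


mem = Memory()
mem.words[0x100] = 7
print(atomic_exchange(mem, 0x100, 1))

mem2 = Memory()
mem2.words[0x100] = 7
print(atomic_exchange(mem2, 0x100, 1, interfere=lambda: mem2.store(0x100, 9)))
```

In the second call the first sc fails because the location changed after the ll, so the loop runs again and the exchange observes the newer value.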
Compiler Benefits
To sort 100,000 words with the array initialized to random values on a Pentium 4 with a 3.06 GHz clock rate, a 533 MHz system bus, with 2 GB of DDR SDRAM, using Linux version 2.4.20

Optimization  Relative performance  Clock cycles (M)  Instr count (M)  CPI
None          1.00                  158,615           114,938          1.38
O1            2.37                  66,990            37,470           1.79
O2            2.38                  66,521            39,993           1.66
O3            2.41                  65,747            44,993           1.46

The unoptimized code has the best CPI, the O1 version has the lowest instruction count, but the O3 version is the fastest. Why?
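One way to approach the question is to recompute the derived columns from the raw counts. The sketch below uses only the clock cycle and instruction counts from the table and assumes all four runs share the same clock rate, so execution time is proportional to clock cycles (instruction count times CPI).

```python
# (clock cycles in millions, instruction count in millions) from the table
runs = {
    "None": (158_615, 114_938),
    "O1": (66_990, 37_470),
    "O2": (66_521, 39_993),
    "O3": (65_747, 44_993),
}

baseline_cycles = runs["None"][0]
results = {}
for name, (cycles, instructions) in runs.items():
    cpi = cycles / instructions            # cycles per instruction
    relative = baseline_cycles / cycles    # speedup over unoptimized code
    results[name] = (round(relative, 2), round(cpi, 2))
    print(f"{name:4s} relative={relative:.2f} CPI={cpi:.2f}")
```

Neither CPI nor instruction count alone determines the ranking; only their product, the cycle count, does, and O3 has the smallest product.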
Language  Execution method  Optimization  Relative performance
C         Compiler          None          1.00    1.00
C         Compiler          O1            2.37    1.50
C         Compiler          O2            2.38    1.50
C         Compiler          O3            2.41    1.91
Java      Interpreted                     0.12    0.05
Java      JIT compiler                    2.13    0.29
Observations?
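1. Register addressing
    op | rs | rt | rd | funct; the word operand is in a register
2. Base (displacement) addressing
    op | rs | rt | offset; the word or byte operand is in memory at base register + offset
3. Immediate addressing
    op | rs | rt | operand
4. PC-relative addressing
    op | rs | rt | offset; the branch destination instruction is in memory at Program Counter (PC) + offset
5. Pseudo-direct addressing
    op | jump address; the jump destination instruction is in memory at jump address || Program Counter (PC)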
[Figure: memory of 2^30 words with a 32-bit read/write addr, 32-bit read data and write data, byte addresses 0 through 7 shown across the first two words, and a Decode stage]