
9/18/2014

Advanced Computer Architecture


Dr. Umer Farooq
umerfarooq@ciitlahore.edu.pk

Lecture overview
Previous lecture

Hardware renaissance era


Performance improvement in processors
Computer classes
Instruction Set Architecture overview

Today's lecture

ISA overview (contd)


Technology, power and energy, cost trends
IC dependability
Amdahl's law
Processor performance equations


ISA: Logical operations

Logical ops       C operators   Java operators   MIPS instr
Shift Left        <<            <<               sll
Shift Right       >>            >>>              srl
Bit-by-bit AND    &             &                and, andi
Bit-by-bit OR     |             |                or, ori
Bit-by-bit NOT    ~             ~                nor

ISA: Logical operations

sll $t2, $s0, 4
$s0 = 0000 0000 0000 0000 0000 0000 0000 1001
$t2 = 0000 0000 0000 0000 0000 0000 1001 0000

and $t0, $t1, $t2
$t2 = 0000 0000 0000 0000 0000 1101 0000 0000
$t1 = 0000 0000 0000 0000 0011 1100 0000 0000
$t0 = 0000 0000 0000 0000 0000 1100 0000 0000

or $t0, $t1, $t2
$t0 = 0000 0000 0000 0000 0011 1101 0000 0000

nor $t0, $t1, $t3 (if $t3 is all 0s)
$t0 = 1111 1111 1111 1111 1100 0011 1111 1111
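
The bit patterns on this slide can be checked with the corresponding C operators. The following is a minimal sketch, not part of the original lecture: the helper names (mips_sll, mips_nor, and so on) are hypothetical, and registers are modelled as 32-bit unsigned integers.

```c
#include <stdint.h>

/* Hypothetical helpers: each one mirrors a single MIPS logical
 * instruction on a 32-bit register value. */

/* sll: shift left, filling with zeros (shift amount is a 5-bit field) */
uint32_t mips_sll(uint32_t rt, unsigned shamt)
{
    return rt << (shamt & 31u);
}

/* srl: shift right, filling with zeros (C >> on unsigned, Java >>>) */
uint32_t mips_srl(uint32_t rt, unsigned shamt)
{
    return rt >> (shamt & 31u);
}

/* and: bit-by-bit AND */
uint32_t mips_and(uint32_t rs, uint32_t rt)
{
    return rs & rt;
}

/* or: bit-by-bit OR */
uint32_t mips_or(uint32_t rs, uint32_t rt)
{
    return rs | rt;
}

/* nor: NOT of OR; with one operand equal to 0 it acts as bit-by-bit NOT */
uint32_t mips_nor(uint32_t rs, uint32_t rt)
{
    return (uint32_t)~(rs | rt);
}
```

With $t1 = 0x00003C00 and $t2 = 0x00000D00, these helpers give 0x00000C00 for and, 0x00003D00 for or, and 0xFFFFC3FF for nor with a zero operand, which are the bit patterns listed on the slide.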



ISA: Control instructions


Involve decision making: if a certain condition is satisfied,
do one thing; otherwise do something else.
MIPS assembly language includes two decision-making
instructions (conditional branches)
beq register1, register2, L1   # go to L1 if contents of register1 and register2 are equal
bne register1, register2, L1   # go to L1 if contents of register1 and register2 are not equal

ISA: Control instructions


Convert to assembly:
if (i == j)
f = g+h;
else
f = g-h;
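Variables f through j correspond to
$s0 through $s4
Assembly code:
      bne $s3, $s4, Else     # go to Else if i != j
      add $s0, $s1, $s2      # f = g + h (skipped if i != j)
      j   Exit               # go to Exit
Else: sub $s0, $s1, $s2      # f = g - h (skipped if i == j)
Exit: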

Example
Convert to assembly:
while (save[i] == k)
    i += 1;

i and k are in $s3 and $s5 and
base of array save[] is in $s6

Loop: sll  $t1, $s3, 2       # $t1 = i * 4
      add  $t1, $t1, $s6     # $t1 = address of save[i]
      lw   $t0, 0($t1)       # $t0 = save[i]
      bne  $t0, $s5, Exit    # go to Exit if save[i] != k
      addi $s3, $s3, 1       # i = i + 1
      j    Loop              # unconditional branch back to Loop
Exit:

ISA: More control instructions


Checking for equality is important
What if we want to check whether a certain variable is less
than or greater than another variable?
slt  $t0, $s3, $s4   # $t0 = 1 if $s3 < $s4, else $t0 = 0
slti $t0, $s2, 10    # $t0 = 1 if $s2 < 10, else $t0 = 0
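
As a rough illustration of what slt and slti compute, here is a small C sketch. The function names are hypothetical, and the operands are treated as signed 32-bit values, which is how slt compares (sltu is the unsigned variant).

```c
#include <stdint.h>

/* Hypothetical model of slt: result is 1 if rs < rt (signed), else 0. */
int32_t mips_slt(int32_t rs, int32_t rt)
{
    return (rs < rt) ? 1 : 0;
}

/* Hypothetical model of slti: same test against a 16-bit signed immediate. */
int32_t mips_slti(int32_t rs, int16_t imm)
{
    return (rs < (int32_t)imm) ? 1 : 0;
}
```

The result register can then be tested against $zero with beq or bne, so "branch if less than" is built from slt plus a conditional branch.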


Summary of MIPS assembly language instructions

Summary of MIPS machine language instructions


Procedures
Each procedure (function, subroutine) maintains a scratchpad of register
values. When another procedure is called (the callee), the new procedure
takes over the scratchpad, so values may have to be saved so that we can
safely return to the caller
1. parameters (arguments) are placed where the callee can see them
2. control is transferred to the callee
3. acquire storage resources for callee
4. execute the procedure
5. place result value where caller can access it
6. return control to caller


Registers
MIPS follows the following convention in allocating its 32
registers for procedure calling
$a0 - $a3: four argument registers in which to pass
parameters
$v0 - $v1: two value registers in which to return values
$ra: one return address register to return to the point
of origin


Jump and link


A special register (storage not part of the register file) maintains the
address of the instruction currently being executed; this is the
program counter (PC)
The procedure call is executed by invoking the jump-and-link (jal)
instruction: the current PC (actually, PC+4) is saved in the register
$ra and we jump to the procedure's address (the PC is accordingly
set to this address)

jal NewProcedureAddress
Since jal may overwrite a relevant value in $ra, that value must be saved
somewhere (in memory?) before invoking the jal instruction
How do we return control back to the caller after completing the
callee procedure?

The stack
What if more registers are required than the four parameter
registers ($a0 - $a3) and two value registers ($v0 - $v1)?
The register scratchpad for a procedure seems volatile: it seems to
disappear every time we switch procedures. A procedure's values
are therefore backed up in memory on a stack

High address
    Proc A's values
    Proc B's values
    Proc C's values
    (stack grows this way)
Low address

Call sequence: Proc A calls Proc B, Proc B calls Proc C, followed by
three returns in reverse order

Storage management on call/return


A new procedure must create space for all its variables on the stack
Before executing the jal, the caller must save relevant values in $s0-$s7, $a0-$a3, $ra, temps into its own stack space
Arguments are copied into $a0-$a3; the jal is executed
After the callee creates stack space, it updates the value of $sp
Once the callee finishes, it copies the return value into $v0, frees up
stack space, and $sp is incremented
On return, the caller may bring its stack values, $ra, and temps back into
registers
The responsibility for copies between stack and registers may fall
upon either the caller or the callee


Example 1
int leaf_example (int g, int h, int i, int j)
{
    int f;
    f = (g + h) - (i + j);
    return f;
}

If g, h, i, j correspond to parameter registers $a0 - $a3
and f corresponds to $s0, what will the MIPS assembly
code be?


Example 1
leaf_example:
    addi $sp, $sp, -12     # make room for 3 registers: $t0, $t1, $s0
    sw   $t1, 8($sp)       # save $t1 for use afterwards
    sw   $t0, 4($sp)       # save $t0 for use afterwards
    sw   $s0, 0($sp)       # save $s0 for use afterwards
    add  $t0, $a0, $a1     # $t0 = g+h
    add  $t1, $a2, $a3     # $t1 = i+j
    sub  $s0, $t0, $t1     # f = $t0 - $t1
    add  $v0, $s0, $zero   # return f
    lw   $s0, 0($sp)       # restore register $s0 for caller
    lw   $t0, 4($sp)       # restore register $t0 for caller
    lw   $t1, 8($sp)       # restore register $t1 for caller
    addi $sp, $sp, 12      # adjust stack to delete three items
    jr   $ra               # jump back to calling routine

Example 2
int fact (int n)
{
    if (n < 1) return (1);
    else return (n * fact(n-1));
}
What is the MIPS assembly code?


Example 2
fact:
    addi $sp, $sp, -8      # adjust stack for two items
    sw   $ra, 4($sp)       # save the return address
    sw   $a0, 0($sp)       # save the argument n
    slti $t0, $a0, 1       # test for n<1
    beq  $t0, $zero, L1    # if n>=1 go to L1
    addi $v0, $zero, 1     # return 1
    addi $sp, $sp, 8       # pop two items off stack
    jr   $ra               # return to after jal
L1:
    addi $a0, $a0, -1      # n>=1: argument gets (n-1)
    jal  fact              # call fact with n-1
    lw   $a0, 0($sp)       # return from jal: restore argument n
    lw   $ra, 4($sp)       # restore the return address
    addi $sp, $sp, 8       # adjust stack pointer to pop 2 items
    mul  $v0, $a0, $v0     # return n*fact(n-1)
    jr   $ra               # return to the caller


Example
Stack while computing fact(4): every call pushes its return address and
argument, and $sp moves down by two words

Saved $ra   Saved $a0   $v0 when this frame is popped
ra1         4           24
ra2         3           6
ra3         2           2
ra4         1           1
ra5         0           1
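
The push-then-pop behaviour of the recursive factorial can be imitated in C with an explicit array standing in for the stack. This is only a sketch under stated assumptions: the name fact_explicit_stack and the 16-word stack size are hypothetical, only the saved argument is modelled (not $ra), and n is limited to 12 so the result fits in a 32-bit int.

```c
#include <assert.h>

#define STACK_WORDS 16   /* hypothetical stack size, enough for n <= 12 */

/* Factorial with an explicit stack of saved arguments.
 * The "call phase" pushes n for every frame with n >= 1; the
 * "return phase" pops each saved n and multiplies it into v0,
 * in the same order as the mul after each jal returns. */
int fact_explicit_stack(int n)
{
    int stack[STACK_WORDS];
    int sp = STACK_WORDS;   /* stack grows toward lower indices */
    int v0;

    assert(n <= 12);        /* 13! does not fit in a 32-bit int */

    while (n >= 1) {        /* call phase: save n, continue with n-1 */
        sp = sp - 1;
        stack[sp] = n;
        n = n - 1;
    }

    v0 = 1;                 /* base case: n < 1 returns 1 */

    while (sp < STACK_WORDS) {  /* return phase: restore n, v0 = n * v0 */
        n = stack[sp];
        sp = sp + 1;
        v0 = n * v0;
    }
    return v0;
}
```

For n = 4 the value of v0 goes through 1, 1, 2, 6, 24 as frames are popped, which is the sequence of v0 values in the stack example for fact(4).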


Memory organization
The space allocated on the stack by a procedure is termed the
activation record (includes saved values and data local to the
procedure)
The frame pointer ($fp) points to the start of the record and the stack
pointer ($sp) points to the end
Variable addresses are specified relative to $fp, as $sp may change
during the execution of the procedure


Memory organization
In addition to variables local to a procedure, space is needed for static
variables and for dynamic data structures
The stack starts at the top (high addresses) and grows toward the bottom
$gp points to the area in memory that holds global variables
Dynamically allocated storage (with malloc()) is placed on the heap


MIPS registers


Dealing with characters


Instructions are also provided to deal with byte-sized and half-word
quantities: lb (load byte), sb (store byte), lh (load half-word), sh (store half-word)
lb $t0, 0($sp)   # read a byte from source and store it in $t0
sb $t0, 0($gp)   # write a byte to destination
The rightmost 8 bits of the register are used in byte operations

These data types are most useful when dealing with characters, pixel
values, etc.
C employs the ASCII format to represent characters: each character is
represented with 8 bits and a string ends in the null character
(corresponding to the 8-bit number 0)
For example, "Cal" in C is represented as 67, 97, 108, 0
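
The byte layout of a C string can be inspected directly. The sketch below assumes an ASCII execution character set (true on essentially all current platforms, though not guaranteed by the C standard); string_length is a hypothetical helper that counts bytes up to the terminating null byte.

```c
#include <stddef.h>

/* Counts the characters before the terminating null byte (value 0).
 * Each step reads one byte, much like a loop built around lb. */
size_t string_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n = n + 1;
    return n;
}
```

The literal "Cal" therefore occupies four bytes in memory: three characters plus the terminator, while string_length("Cal") reports 3.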


ASCII code of characters


Example

Consider an example where one string of characters is copied
into another string of characters

Convert to assembly:
void strcpy (char x[], char y[])
{
    int i;
    i = 0;
    while ((x[i] = y[i]) != '\0')
        i += 1;
}
Assume that base addresses for arrays x and y are found in $a0 and $a1,
while i is in $s0


Example
strcpy:
    addi $sp, $sp, -4      # adjust stack for 1 more item
    sw   $s0, 0($sp)       # save $s0
    add  $s0, $zero, $zero # i=0
L1: add  $t1, $s0, $a1     # address of y[i] in $t1
    lb   $t2, 0($t1)       # $t2 = y[i]
    add  $t3, $s0, $a0     # address of x[i] in $t3
    sb   $t2, 0($t3)       # x[i] = y[i]
    beq  $t2, $zero, L2    # if y[i]==0 go to L2
    addi $s0, $s0, 1       # i = i+1
    j    L1                # go to L1
L2: lw   $s0, 0($sp)       # y[i]==0, end of string; restore $s0
    addi $sp, $sp, 4       # pop 1 word off stack
    jr   $ra               # return

Large constants
Immediate instructions can only specify 16-bit constants

The lui instruction is used to store a 16-bit constant into the upper 16 bits
of a register; thus, two immediate instructions are used to specify a 32-bit
constant
For example, what if you want to add a 32-bit immediate number to the
value stored in $s1?
The 32-bit number is 0000 0000 0011 1101 0000 1001 0000 0000
lui $s0, 61          # $s0 = 0000 0000 0011 1101 0000 0000 0000 0000
ori $s0, $s0, 2304   # $s0 = 0000 0000 0011 1101 0000 1001 0000 0000
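
The lui/ori pair can be mimicked in C to check the arithmetic. This is a minimal sketch with hypothetical helper names; registers are modelled as 32-bit unsigned integers and immediates as 16-bit unsigned values.

```c
#include <stdint.h>

/* lui: place the 16-bit immediate in the upper half, lower half is zero */
uint32_t mips_lui(uint16_t imm)
{
    return (uint32_t)imm << 16;
}

/* ori: OR the zero-extended 16-bit immediate into the register */
uint32_t mips_ori(uint32_t rs, uint16_t imm)
{
    return rs | (uint32_t)imm;
}

/* Build any 32-bit constant from its two 16-bit halves, the way an
 * assembler expands a large constant into lui followed by ori. */
uint32_t load_constant32(uint32_t value)
{
    uint32_t reg = mips_lui((uint16_t)(value >> 16));
    return mips_ori(reg, (uint16_t)(value & 0xFFFFu));
}
```

With the slide's values, mips_ori(mips_lui(61), 2304) yields 0x003D0900, which is 4000000 in decimal.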

Large constants
The destination PC-address in a conditional branch is specified as a 16-bit
constant, relative to the current PC

How to branch far away?

A jump (j) instruction can specify a 26-bit constant; if more bits are
required, the jump-register (jr) instruction is used


Overview of different ISAs


Class of ISA:
Register-memory ISA like 80x86, which can access
memory through many instructions
Load-store ISA like ARM and MIPS (RISC): access to
memory through load or store instructions only

Memory addressing
Byte addressing or word addressing? 80x86, ARM, and
MIPS use byte addressing
Must objects be aligned? ARM and MIPS: yes. 80x86: no
An access to an object of size S bytes at byte address A is
aligned if A mod S = 0
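
The alignment rule A mod S = 0 translates directly into a one-line check. The sketch below uses a hypothetical function name and models byte addresses as 32-bit unsigned integers.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if an access of `size` bytes at byte address `address`
 * is aligned, i.e. address mod size == 0, and 0 otherwise. */
int is_aligned(uint32_t address, uint32_t size)
{
    assert(size > 0);   /* an object must occupy at least one byte */
    return (address % size == 0) ? 1 : 0;
}
```

For example, a 4-byte word at address 8 is aligned, the same word at address 6 is not, and a 2-byte half-word at address 6 is.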

ISA: memory alignment


Overview of different ISAs


Addressing modes
Addressing modes specify address of memory objects. Register,
Immediate, and Displacement addressing modes are used

Types and Sizes of operands


80x86, ARM, and MIPS support operand sizes of 8-bit (ASCII character),
16-bit (Unicode character), 32-bit (integer or word), 64-bit
(double word or long integer), 32-bit floating point (single
precision), and 64-bit floating point (double precision); 80-bit floating
point (extended double precision) is supported by 80x86 only

Operations
For example: data transfer, arithmetic/logical, control, and
floating point

Control flow instructions


Like conditional branches, unconditional jumps, procedure calls,
returns etc.

Implementation: Organization and Hardware


Design
Organization: high-level aspects of computer design such as the
memory system, memory interconnect, and design of the CPU.
The term microarchitecture is also used for organization
Computers can have the same ISA but different organizations,
like AMD Opteron and Intel Core i7: x86 ISA but different
pipeline and cache organizations

Hardware: detailed logic design, process/packaging technology
Computers can have the same ISA and organization but
different hardware implementations.
For example, Intel Core i7 and Intel Xeon 7560: same ISA and
organization but different clock rates and memory systems

Trends in technology
A successful ISA must be designed to survive technology evolution
Five implementation technologies have changed at a dramatic pace
Integrated circuit logic technology
Transistor density increases by about 35% per year, quadrupling in a little over 4 years
Growth in transistor count on a chip is about 40% to 55% per year, or doubling
every 18 to 24 months (Moore's law)

Semiconductor DRAM
Capacity increases 25% to 40% per year, doubling roughly every two to three
years
Growth rate is getting slower and may hit the wall sooner rather than later


Trends in technology
Semiconductor flash: standard storage device in PMDs
Capacity increases 50% to 60% per year doubling roughly every two
years
15 to 20 times cheaper per bit than DRAM

Magnetic disk technology

Prior to the 1990s, density increased at 30% per year
In the mid-1990s, the growth rate rose to 100% per year
Now almost stable at 40% per year
300 to 500 times cheaper per bit than DRAM
Central to server and warehouse-scale computers

Network technology
Depends upon performance of both network switches and
transmission systems
From early 80s till today, bandwidth of network switches has increased
10000 times
Latency has improved by 30 times

Bandwidth over latency


Bandwidth or throughput: total work done in a given time.
For example, for disk data transfer it is measured in
megabytes/second
10000-25000 times improvement for processors
300-1200 times improvement for memory and disks

Latency or response time: time between the start and
completion of an event
For example, disk access latency can be in milliseconds.
30-80 times improvement for processors
6-8 times improvement for memory and disks

General rule of thumb: bandwidth grows by at least the square of
the improvement in latency

Bandwidth and latency


Scaling of transistor performance and wires


Feature size: minimum size of a transistor or wire in either the x or y
dimension
Feature size has decreased from 10 microns (in the 1970s) to 22 nm (Core i7)
Transistor performance improves linearly with decreasing
feature size, e.g. the improvement from 4-bit microprocessors to 64-bit
processors today
Wires do not improve with reduced feature size
Wires get shorter with reduced feature size
Wire delay is directly proportional to the product of its resistance
and capacitance
With reduced feature size, capacitance and resistance per unit
length get worse
Wire delay is a major limiting factor today for design of large ICs

Energy in microprocessors
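For a complete transition, i.e. 0-1-0
$Energy_{dynamic} \propto Capacitive\ load \times Voltage^2$

For a single transition
$Energy_{dynamic} \propto \frac{1}{2} \times Capacitive\ load \times Voltage^2$

For power
$Power_{dynamic} \propto \frac{1}{2} \times Capacitive\ load \times Voltage^2 \times Frequency\ switched$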

As process technology improves, the number of transistors
switching and the frequency with which they switch increase,
resulting in increased power and energy consumption

Growth in clock rates for microprocessors

First 32-bit microprocessor required only 2 watts
Intel Core i7 requires 130 watts
Higher clock rate means higher power and more heat dissipation


Power and energy efficiency improvement techniques


How to distribute power, remove heat, prevent hot spots?
Do nothing well
Turn off what is not being used

Dynamic Voltage Frequency Scaling (DVFS)


During periods of inactivity, operate the device at a lower voltage and
frequency
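
The effect of DVFS can be estimated with a small calculation. The sketch below assumes the usual CMOS proportionalities (dynamic energy proportional to capacitive load times voltage squared, dynamic power proportional to that times switching frequency) and a fixed capacitive load; the function names and the 15% figure used in the note are illustrative assumptions, not values from the lecture.

```c
/* Ratio of new to old dynamic energy when the supply voltage is
 * multiplied by voltage_scale (capacitive load assumed unchanged). */
double relative_dynamic_energy(double voltage_scale)
{
    return voltage_scale * voltage_scale;
}

/* Ratio of new to old dynamic power when voltage is multiplied by
 * voltage_scale and switching frequency by frequency_scale. */
double relative_dynamic_power(double voltage_scale, double frequency_scale)
{
    return voltage_scale * voltage_scale * frequency_scale;
}
```

Under these assumptions, lowering both voltage and frequency by 15% (scale factors of 0.85) leaves about 72% of the original dynamic energy and about 61% of the original dynamic power.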


Power and energy efficiency improvement techniques


Design for typical case
PMDs are often idle, so memory provides low-power modes to save
energy
Spin laptop disks at a lower rate during inactive periods

Overclocking
Run your system in turbo mode for short bursts of time and
then go to sleep
For example a 3.3 GHz core i7 can run at 3.6 GHz for short
periods

Issue of static power
Increasing with each process technology generation and can be as high as
50% of total power
Can be decreased with power gating (turning off the power supply to inactive modules)

Trends in cost
The impact of time, volume and commoditization
Time
Improved learning curve, better yield
Twice the yield, half the cost

Volume
Increase in volume decreases cost/chip
Decrease in development cost
Cost decreases by about 10% for each doubling of volume

Commoditize the market
More vendors, more competition, and lower prices


Chip manufacturing process


Wafer containing Intel Core i7 processors

280 Core i7 dies at 100% yield
Die area 20.7 mm x 10.5 mm
Die is manufactured in 32 nm process technology


Cost of integrated circuits

Bose-Einstein formula

Defects/area = 0.1 to 0.3 per square inch or 0.016 to 0.057 per square cm
N is process complexity factor. N = 11.5 to 15.5 for 40 nm

Dependability
Historically, integrated circuits were one of the most reliable
components of a computer
With the advancement in processing technology, transient and
permanent faults have become more common
Module/component reliability is a measure of continuous
service accomplishment from a reference initial time
Measured as Mean Time To Failure (MTTF)
Reciprocal of MTTF is the failure rate, reported as Failures In Time (FIT)
FIT is normally measured as the number of failures per billion hours

Service interruption is measured as Mean Time To Repair (MTTR)
Mean Time Between Failures (MTBF) = MTTF + MTTR
Module availability = MTTF/(MTTF+MTTR)
Primary way to cope with failure is redundancy
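
The dependability measures above are simple enough to compute directly. The following is a minimal sketch with hypothetical function names; all times are in hours, and the numbers in the note are made-up illustrations rather than measured data.

```c
/* Failure rate in FIT (failures per billion hours) from MTTF in hours. */
double fit_from_mttf(double mttf_hours)
{
    return 1.0e9 / mttf_hours;
}

/* Mean Time Between Failures = MTTF + MTTR */
double mtbf_hours(double mttf_hours, double mttr_hours)
{
    return mttf_hours + mttr_hours;
}

/* Module availability = MTTF / (MTTF + MTTR), a value between 0 and 1 */
double availability(double mttf_hours, double mttr_hours)
{
    return mttf_hours / (mttf_hours + mttr_hours);
}
```

As an illustration, a module with an MTTF of 1,000,000 hours has a failure rate of 1000 FIT, and with an MTTR of 24 hours its availability is roughly 0.99998.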

Quantitative principles of computer design


Taking advantage of parallelism
At server level parallelism can be achieved through
multiprocessors and multiple disks
At the single-processor level, the best example is pipelining
At memory level, set-associative cache is an example
where multiple banks are searched in parallel

Principle of locality
90-10 rule of thumb: programs spend 90% of time in 10%
of code
Temporal locality: recently accessed data is likely to be
accessed again in future
Spatial Locality: items whose addresses are near one
another tend to be referenced close together in time

Quantitative principles of computer design


Focus on the common case
In making a design trade-off, favor the frequent case over
infrequent case (applicable when not sure how to spend
resources)

Amdahl's law: the performance improvement to be gained
from using some faster mode of execution is limited by
the fraction of time the faster mode can be used
Speedup: improvement in performance/execution time
that can be gained by using a particular feature

Amdahl's law

Speedup is governed by two factors

Fraction of time in the original computer that can be converted
to take advantage of the enhancement.
Termed fraction(enhanced). Always less than or equal to 1

Improvement gained by the enhanced execution mode
Value is the time of the original mode over the time of the enhanced mode
Termed speedup(enhanced). Always greater than 1

Amdahl's law

Amdahl's law and the law of diminishing returns: the overall speedup
cannot improve beyond a certain limit no matter how large
speedup(enhanced) is
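
Amdahl's law is commonly written as overall speedup = 1 / ((1 - fraction(enhanced)) + fraction(enhanced) / speedup(enhanced)). The sketch below evaluates that standard form together with its upper bound; the function names are hypothetical and the numbers in the note are illustrative.

```c
/* Overall speedup when a fraction of the original execution time
 * (fraction_enhanced, between 0 and 1) runs speedup_enhanced times faster. */
double amdahl_speedup(double fraction_enhanced, double speedup_enhanced)
{
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced);
}

/* Upper bound on the overall speedup as speedup_enhanced grows without
 * limit: only the unenhanced fraction of the time remains.
 * Requires fraction_enhanced < 1. */
double amdahl_limit(double fraction_enhanced)
{
    return 1.0 / (1.0 - fraction_enhanced);
}
```

For instance, if 40% of the execution time can be made 10 times faster, the overall speedup is 1/(0.6 + 0.04) = 1.5625, and it can never exceed 1/0.6, about 1.67, however fast the enhanced mode becomes.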


Processor performance equation
