
Desktop computers: Meant for use by a single user and generally have a GUI.

Servers: Computers that handle either one large workload or many small ones, such
as the incoming requests for a webpage.

Embedded computers: Meant to run a single application; they are integrated with the
necessary hardware and delivered as a single system. E.g. used in cars, television
sets, refrigerators, etc.

Factors affecting program performance:

1. Algorithm: Determines the number of source-level statements and IO operations
executed; often characterised by its complexity in big O notation.

2. Programming language and compiler: Together with the instruction set, they
determine how many machine instructions are executed for each source-level
statement.

3. Processor: Determines how fast instructions are executed. A higher clock speed
means less time per clock cycle.

4. IO system: Slower bus speeds and more IO mean that the OS has to allocate more
resources to the IO system. In general, the more resources that go to things other
than the program, the worse the program performs.

Doubt: What are the different types of RAM available in the system? Such as DRAM,
SRAM, etc.
Doubt: What are the different types of Cache available in the system, such as L1
Instruction cache, L1 data cache, L2 cache, L3 cache, etc.

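Doubt: What is the full form of DDR3, and what is its significance? DDR stands for
Double Data Rate, and DDR3 is its third generation. DDR memory transfers data on
both the rising and the falling edge of the clock, so it has twice the data rate of
conventional (single data rate) SDRAM at the same clock frequency.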
Similarly DDR

Cache is built from SRAM, which is different from DRAM. SRAM is faster but less
dense, and is therefore more expensive per bit. A cache acts as a buffer between the
slower memory and the CPU. The processor can read and write much faster than the
memory devices can respond, so the cache keeps copies of the instructions and data
the processor is likely to need close at hand. Since SRAM is volatile, the cache
loses its contents as soon as power is turned off.
There are typically up to three levels of cache: L1, L2 and L3. The instructions are
located in DRAM, or primary memory. (The BIOS, by contrast, is located in ROM.)
Since accessing instructions in primary memory takes a long time, the following
procedure takes place:
1. First the L3 cache loads a block of the instructions (the transfer occurs a block
at a time, not line by line).
2. The L2 cache then loads a subset of that block of instructions from the L3 cache.
3. A smaller set of the instructions in the L2 cache is then loaded into the L1
cache.
The L1 cache is small but really fast. It is, however, expensive and therefore not
used as primary memory.

Each cache is divided into blocks of a fixed size.
When the cache loads data/instructions from primary memory, the memory address the
block came from is recorded alongside that cache block as a tag. Together the tags
form a lookup table that tells the cache which memory addresses it currently holds.
(The TLB, or translation lookaside buffer, is a separate structure: it caches recent
translations from virtual to physical addresses.)
When a change is to be made in primary memory, the change is first written to the
L1 cache, then propagated to the L2 cache and the L3 cache, and finally to primary
memory. Whether this happens immediately or only when a block is evicted depends on
the cache's write policy.
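
The association between a cache block and the memory address it was loaded from can
be sketched as follows. This is a minimal sketch, assuming a direct-mapped cache
(each memory block can live in exactly one cache block); the block size, the number
of blocks and the class name are hypothetical values chosen for illustration, not
those of any real processor.

```python
BLOCK_SIZE = 16  # bytes per cache block (assumed)
NUM_BLOCKS = 8   # number of blocks in the cache (assumed)


class DirectMappedCache:
    def __init__(self):
        # One tag per cache block; None means the block is empty.
        self.tags = [None] * NUM_BLOCKS

    def access(self, address):
        block_number = address // BLOCK_SIZE  # which memory block holds the address
        index = block_number % NUM_BLOCKS     # which cache block it maps to
        tag = block_number // NUM_BLOCKS      # identifies the memory block stored there
        if self.tags[index] == tag:
            return "hit"
        # Miss: the whole block is loaded and the tag records where it came from.
        self.tags[index] = tag
        return "miss"


cache = DirectMappedCache()
results = [cache.access(a) for a in (0x100, 0x104, 0x10C, 0x180, 0x100)]
print(results)  # ['miss', 'hit', 'hit', 'miss', 'miss']
```

The second and third accesses hit because they fall in the block already loaded by
the first one. Address 0x180 maps to the same cache block and evicts it, so the last
access misses again.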

The instruction cache and the data cache are sometimes separate, and sometimes
combined into one unified cache.
The important thing to note is that the instruction cache is read only, whereas the
data cache is both read and write.
So if the caches are separate, one reason is that a read-only cache is simpler, and
therefore cheaper, to build.
However, if the cache is unified, it is possible to modify instructions on the go.
Thus some instructions can be executed in a single clock cycle after a small
modification to the instruction, whereas with separate caches it may take more
clock cycles.

Instruction Set Architecture: It encompasses the list of all instructions which the
user can give to the processor.

Performance parameters:
1. Response time: The total time taken to complete the task.
2. Throughput or Bandwidth: The total amount of work done in unit time.

If the response time of a processor decreases, then its throughput will increase.
The reason is that the processor will be able to complete more tasks per unit time
than another processor with a higher response time.

It is also possible that the processor has not completed the first task before the
second task arrives, so the second task waits in a queue. This waiting period is
included in the response time, so the response time increases. If the throughput is
increased (by adding another processor), the new processor can take up part of the
load of the first one and reduce the waiting time in the queue. Thus increasing
throughput can also reduce the response time.

Another important equation which measures performance is:

Total time = Number of instructions * CPI (clock cycles per instruction) * Clock
cycle time
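
As a worked example of this equation, here is a minimal sketch in Python. The
function name and the workload numbers are hypothetical, chosen only to make the
arithmetic easy to follow.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """Total time = number of instructions * CPI * clock cycle time."""
    total_cycles = instruction_count * cpi
    # Clock cycle time is 1 / clock rate, so multiplying by the cycle time
    # is the same as dividing by the clock rate.
    return total_cycles / clock_rate_hz


# Hypothetical workload: 1 billion instructions, CPI of 2, 2 GHz clock.
t = cpu_time_seconds(1_000_000_000, 2.0, 2_000_000_000)
print(t)  # 1.0
```

Halving the CPI, halving the instruction count or doubling the clock rate would
each halve the total time.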

As the size of the transistors in a processor decreases, the amount of power
dissipated through leakage increases. This is mainly due to leakage current.
Another factor to consider is that as clock speeds increase, there are power
losses, called dynamic losses, which occur during switching. This greatly limits
the clock rate.

However, power losses have not grown in proportion to clock speeds. This is because
the supply voltage also decreased, from the erstwhile 5V to 3.3V and so on.
Dynamic power is given by the expression: Power = 0.5 * C * V^2 * f, where C is the
capacitive load, V is the supply voltage and f is the switching frequency, i.e. the
clock speed in our case
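
The effect of lowering the voltage can be checked numerically. This is a minimal
sketch: the capacitance and frequency are normalised to hypothetical values of 1,
and only the 5V and 3.3V supply voltages come from the notes.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Dynamic power = 0.5 * C * V^2 * f."""
    return 0.5 * capacitance * voltage ** 2 * frequency


# Same capacitive load; the newer part runs at twice the clock speed
# but at 3.3V instead of 5V (normalised, hypothetical units).
old_power = dynamic_power(1.0, 5.0, 1.0)
new_power = dynamic_power(1.0, 3.3, 2.0)
ratio = new_power / old_power
print(round(ratio, 4))  # 0.8712
```

Because the voltage is squared, the clock speed doubles here while the dynamic
power actually drops to about 87% of its old value.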

bit: either a 1 or a 0
nibble: 4 bits
byte: 8 bits
halfword: 2 bytes or 16 bits
word: 4 bytes or 32 bits
doubleword: 8 bytes or 64 bits

Why are there only 32 registers?

Having too many registers slows the processor down, because electrical signals have
to travel further to reach them, so each register access takes more time.
Secondly, consider the MIPS architecture. It has an instruction size of 32 bits. To
identify one among the 32 registers, we need a unique 5 bit register code. With 3
operands, 15 of the 32 bits in the instruction go to specifying the registers. If
the number of registers is increased, fewer bits are left for the opcode and the
other fields.
There are CPUs with more than 32 registers, for example VLIW processors such as DSP
processors.
The reason for having 3 operands for every operation is simplicity and regularity.
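
The bit budget described above can be worked out directly. This is a minimal
sketch; the function names are hypothetical, and the 64 and 128 register cases are
only there for comparison.

```python
def register_field_bits(num_registers):
    """Bits needed to give each register a unique code."""
    return (num_registers - 1).bit_length()


def bits_left_for_other_fields(num_registers, instruction_bits=32, operands=3):
    """Instruction bits remaining after naming every register operand."""
    return instruction_bits - operands * register_field_bits(num_registers)


for n in (32, 64, 128):
    print(n, register_field_bits(n), bits_left_for_other_fields(n))
# 32 5 17
# 64 6 14
# 128 7 11
```

With 32 registers, 17 bits remain for the opcode and the other fields; each
doubling of the register count costs another 3 bits.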

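Big endian: The most significant byte is stored at the lowest memory address
Little endian: The least significant byte is stored at the lowest memory address

MIPS32 is usually treated as a big endian system (the hardware can be configured
either way)
x86 is a little endian system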
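
The two byte orders can be compared by laying out the same word in memory. This is
a minimal sketch; the value 0x12345678 and the function name are hypothetical, and
the returned list runs from the lowest memory address to the highest.

```python
def byte_layout(value, byte_order):
    """Bytes of a 32 bit word, listed from the lowest address to the highest."""
    return [f"{b:02x}" for b in value.to_bytes(4, byte_order)]


word = 0x12345678
print(byte_layout(word, "big"))     # ['12', '34', '56', '78']
print(byte_layout(word, "little"))  # ['78', '56', '34', '12']
```

In the big endian layout the most significant byte 0x12 sits at the lowest address;
in the little endian layout the least significant byte 0x78 does.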

MIPS32 instructions:
Most of the instructions are in the following format:
opcode destination_register, register_1, register_2
opcode refers to the operation or the function
Except for the load and store instructions below, all the instructions are solely
register based.
There are 32 registers, numbered $0 to $31. All registers are preceded by the dollar
sign.
The registers are also named as
1. $s0 - $s7 (8 registers): Variables
2. $t0 - $t9 (10 registers): Temporary variables
3. $zero (1 register): A register whose value is fixed at zero
4. $a0 - $a3 (4 registers): The registers which contain the parameters that are
passed when a procedure is called
5. $v0 - $v1 (2 registers): The registers which contain the values that will be
returned after the procedure has finished executing
6. $gp: The global pointer
7. $fp: The frame pointer
8. $sp: The stack pointer
9. $ra: The return address register, which holds the address of the instruction to
return to after a procedure call. Generally, this is the instruction pointer + the
instruction size. In this case, since all instructions are 4 bytes long, it is PC+4.
10. $at: Reserved for the assembler, for handling large constants

Memory is byte organized

Arithmetic operations:
1. add $s0,$s1,$s2: $s0 = $s1 + $s2 signed addition, traps on overflow (addu does
not trap)
2. sub $s0,$s1,$s2: $s0 = $s1 - $s2 signed subtraction, traps on overflow (subu
does not trap)
3. addi $s0,$s1,20: $s0 = $s1 + 20

There is no separate subi because the constant in addi can be made negative to
implement a subtraction operation.
(Doubt: Where are the multiplication and division operations?)

Load-Store operations:
1. lw register,memory address: Load a word from memory to a register
2. sw register,memory address: Store a word from a register to memory
3. lh register,memory address: Load a halfword from memory to a register, sign
extended
4. lhu register,memory address: Load a halfword from memory to a register, zero
extended
5. sh register,memory address: Store a halfword from a register to memory
6. lb register,memory address: Load a byte from memory to a register, sign extended
7. lbu register,memory address: Load a byte from memory to a register, zero
extended
8. sb register,memory address: Store a byte from a register to memory
9. lui register,immediate: register = immediate << 16
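
The lui operation is typically paired with ori to build a full 32 bit constant,
such as a memory address, out of two 16 bit immediates. The following is a minimal
sketch that models the two instructions as Python functions; the function names
mirror the MIPS mnemonics, and the address 0x10010004 is just an illustrative
value.

```python
MASK32 = 0xFFFFFFFF


def lui(immediate):
    """Place a 16 bit immediate in the upper half of a 32 bit register."""
    return ((immediate & 0xFFFF) << 16) & MASK32


def ori(register_value, immediate):
    """OR a zero-extended 16 bit immediate into a register value."""
    return (register_value | (immediate & 0xFFFF)) & MASK32


upper = lui(0x1001)           # models: lui $s0, 0x1001
address = ori(upper, 0x0004)  # models: ori $s0, $s0, 0x0004
print(hex(upper), hex(address))  # 0x10010000 0x10010004
```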

The memory location is referenced by its address, which is stored in a register.

The memory location is referenced as such:
If the address of the memory location is 0x10010000 and that value is stored in,
say, $s0, then the address is referenced by enclosing the register containing the
address in ().
So, lw $t0,($s0) will result in register $t0 containing the data located at ($s0),
i.e. at 0x10010000, and sw $t0,($s0) will store the contents of $t0 at that address.
(Doubt: What is the difference between lh/lhu or lb/lbu? Why is such a distinction
not present in sb, sh? Also, why do lw & sw not have such a distinction?)
The difference between lh/lhu, and similarly lb/lbu, is that of sign extension. For
e.g. when you load the halfword 0xffff from a memory location with lh, the register
into which it is loaded contains 0xffffffff; with lhu it contains 0x0000ffff.
A word fills the whole 32 bit register, so there is nothing to extend and lw does
not need a separate opcode. Stores simply write the low byte or halfword of the
register to memory, so sb and sh do not need one either.
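
The difference between the sign-extending loads (lh, lb) and the zero-extending
loads (lhu, lbu) can be sketched in Python. This is only a model of the register
contents after the load; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Widen a `bits`-wide value to 32 bits by copying its top bit (lh, lb)."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return ((value ^ sign_bit) - sign_bit) & MASK32


def zero_extend(value, bits):
    """Widen a `bits`-wide value to 32 bits by padding with zeros (lhu, lbu)."""
    return value & ((1 << bits) - 1)


print(hex(sign_extend(0xFFFF, 16)))  # 0xffffffff, as lh would load it
print(hex(zero_extend(0xFFFF, 16)))  # 0xffff, as lhu would load it
print(hex(sign_extend(0x7FFF, 16)))  # 0x7fff, top bit is 0 so nothing changes
```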

An important concept in load and store instructions is the alignment restriction.
The alignment restriction leads to faster data transfers.
For illustration, let's say the data segment starts at 0x10010000.
Two sw instructions are then executed, so there is a word at 0x10010000 and another
word at 0x10010004.
If I do a lw from 0x10010002, it will raise an error, because the address
0x10010002 is not a multiple of 4. If, however, I do a lhu/lh from the same address,
i.e. 0x10010002, then it will not raise an error because the address is a multiple
of 2.
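
The alignment rule amounts to checking that the address is a multiple of the access
size. This is a minimal sketch; the table and the function name are hypothetical
helpers, not part of any MIPS toolchain.

```python
# Number of bytes each load/store instruction transfers.
ACCESS_SIZE = {
    "lw": 4, "sw": 4,
    "lh": 2, "lhu": 2, "sh": 2,
    "lb": 1, "lbu": 1, "sb": 1,
}


def is_aligned(instruction, address):
    """True if the address is a multiple of the instruction's access size."""
    return address % ACCESS_SIZE[instruction] == 0


print(is_aligned("lw", 0x10010002))  # False: not a multiple of 4
print(is_aligned("lh", 0x10010002))  # True: a multiple of 2
print(is_aligned("lw", 0x10010004))  # True
```

Byte accesses are always aligned, since every address is a multiple of 1.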

Instruction formats in MIPS:

Some terminology:
1. opcode: 6 bits
2. rs: 5 bits
3. rt: 5 bits
4. rd: 5 bits
5. shamt: 5 bits
6. funct: 6 bits

Instead of being named r1 and r2, the source registers were named rs and rt,
because alphabetically 't' comes after 's'.

Now, to the instruction formats:

1. R format
Assembly: opcode rd,rs,rt
Instruction: | opcode | rs | rt | rd | shamt | funct |
             |   6    | 5  | 5  | 5  |   5   |   6   |
Example: add $s0, $t0, $t1
Instruction: | 0 | 8 ($t0) | 9 ($t1) | 16 ($s0) | 0 | 32 | = 0x01098020

2. I format
Assembly: opcode rt, rs, IMM
Instruction: | opcode | rs | rt | IMM |
             |   6    | 5  | 5  | 16  |
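
The two formats can be made concrete by packing the fields into a 32 bit word. This
is a minimal sketch; the function names are hypothetical. The register numbers
($t0 = 8, $t1 = 9, $s0 = 16, $s1 = 17), the funct value 0x20 for add and the opcode
8 for addi are the standard MIPS32 values.

```python
def encode_r(opcode, rs, rt, rd, shamt, funct):
    """Pack the R format fields: 6 + 5 + 5 + 5 + 5 + 6 bits."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(opcode, rs, rt, imm):
    """Pack the I format fields: 6 + 5 + 5 + 16 bits."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


# add $s0, $t0, $t1  ->  rd = $s0 (16), rs = $t0 (8), rt = $t1 (9)
add_word = encode_r(0, 8, 9, 16, 0, 0x20)
# addi $s0, $s1, 20  ->  rt = $s0 (16), rs = $s1 (17)
addi_word = encode_i(8, 17, 16, 20)
print(f"{add_word:#010x} {addi_word:#010x}")  # 0x01098020 0x22300014
```

Note that in the machine word the source registers come before the destination,
even though the assembly syntax lists the destination first.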

Logical Operations
1. Shift left/right: R format
sll rd,rt,shamt #shamt is the number of bits to shift by.
srl rd,rt,shamt #shamt is the number of bits to shift by.

2. and/or/xor/nor: R format, bit by bit logical operation
and rd,rs,rt
or rd,rs,rt
nor rd,rs,rt
xor rd,rs,rt

3. and/or/xor immediate: I format, bit by bit operation
andi rt,rs,IMM
ori rt,rs,IMM
xori rt,rs,IMM
There is no nori instruction in MIPS.

The NOT operation can be implemented by NORing with $zero, e.g. nor rd,rs,$zero
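
The NOT-via-NOR trick can be checked on 32 bit values. This is a minimal sketch
that models the nor instruction as a Python function; the names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def nor(a, b):
    """Bitwise NOR of two 32 bit register values."""
    return ~(a | b) & MASK32


def bitwise_not(a):
    """NOT implemented as NOR with zero, since a | 0 == a."""
    return nor(a, 0)


print(hex(bitwise_not(0x0000FFFF)))  # 0xffff0000
print(hex(bitwise_not(0x00000000)))  # 0xffffffff
```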

Conditional statements/ Branch
These are different from the jump instruction

1. beq: branch if equal
2. bne: branch if not equal

Usage
beq $t0,$t1, Label
bne $t0,$t1, Label

Comparison statements: set less than, set less than unsigned
1. slt
2. sltu: for unsigned comparison

Usage:
slt $t0, $t1, $t2 #$t0 = 1 if $t1 < $t2 (signed comparison) else $t0 = 0
sltu $t0, $t1, $t2 #$t0 = 1 if $t1 < $t2 (unsigned comparison) else $t0 = 0
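
The signed and unsigned set-less-than instructions (slt and sltu in MIPS) give
different answers when the top bit of an operand is set. This is a minimal sketch
that models both on 32 bit register values; the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32 bit pattern as a two's complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def slt(a, b):
    """Set less than: compares the operands as signed integers."""
    return 1 if to_signed(a) < to_signed(b) else 0


def sltu(a, b):
    """Set less than unsigned: compares the raw 32 bit patterns."""
    return 1 if (a & MASK32) < (b & MASK32) else 0


# 0xFFFFFFFF is -1 when signed, but 4294967295 when unsigned.
print(slt(0xFFFFFFFF, 1))   # 1
print(sltu(0xFFFFFFFF, 1))  # 0
```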

jal: jump and link.
Goes to the specified address and stores the address of the next instruction in $ra
(the return address register).
Usage: jal ProcedureLabel
So if jal ProcedureLabel is stored at address PC, then $ra contains PC+4

jr: Jump register
Jumps to the address specified in a register
Usage: jr $t0
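
A procedure call and return can be sketched by tracking the program counter and the
return address. This is a simplified model that follows the PC+4 rule from the
notes and ignores branch delay slots; the addresses and function names are
hypothetical.

```python
INSTRUCTION_SIZE = 4  # every MIPS32 instruction is 4 bytes long


def jal(pc, target):
    """Jump and link: returns (new program counter, return address)."""
    return target, pc + INSTRUCTION_SIZE


def jr(register_value):
    """Jump register: the new program counter is the register's value."""
    return register_value


pc = 0x00400020               # hypothetical address of the jal instruction
pc, ra = jal(pc, 0x00400100)  # call the procedure at 0x00400100
# ... the procedure body would run here ...
pc = jr(ra)                   # return to the caller
print(hex(pc))  # 0x400024
```

Returning with jr on the saved return address resumes execution at the instruction
right after the original jal.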
