Servers: These are computers that handle either a large load (a very high number of incoming requests) or many small loads, such as serving a web page.
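2. Programming language: A program written in a high-level language has to be translated by a compiler into low-level assembly/machine code, and the number and quality of the machine instructions the compiler produces affect how long the program takes to run.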
3. Processor: A higher clock speed means a shorter clock cycle, and therefore less time to execute each instruction.
4. IO system: Slower bus speeds and more IO mean that the OS has to allocate more resources to the IO system. In general, the more resources are spent on work other than the program itself, the worse the program performs.
Doubt: What are the different types of RAM available in the system, such as DRAM, SRAM, etc.?
Doubt: What are the different types of cache available in the system, such as the L1 instruction cache, L1 data cache, L2 cache, L3 cache, etc.?
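Doubt: What is the full form of DDR3, and what is its significance? DDR stands for Double Data Rate; DDR3 is the third generation of DDR SDRAM. DDR memory transfers data on both the rising and falling edges of the clock, so it has twice the data rate of conventional (single data rate) RAM at the same clock frequency.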
Similarly DDR
Cache is built from SRAM, which is different from DRAM. SRAM is faster but less dense (it needs more transistors per bit), and is therefore more expensive per byte. A cache acts as a buffer between slower memory and the CPU: the processor can read and write much faster than the memory devices can respond, so the cache keeps copies of frequently used instructions and data close to the processor. Like DRAM, cache is volatile: its contents are lost as soon as power is turned off.
Caches are organized in levels: L1, L2, L3 and so on. A program's instructions are located in DRAM, i.e. primary memory; by contrast, the BIOS is stored in ROM. Because fetching instructions from primary memory takes a long time, the following procedure takes place.
1. First, the L3 cache loads a block of instructions (the transfer occurs a whole block at a time, not one instruction at a time).
2. The L2 cache then loads a subset of that block from the L3 cache.
3. A smaller set of the instructions in the L2 cache is then loaded into the L1 cache.
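The three steps above can be sketched in Python. This is a simplified model under assumptions of our own: each level is just a set of block numbers, the 64-byte block size is an assumed value, and capacity limits and eviction are ignored. A lookup checks the fastest level first and, on a miss, copies the whole block into the faster levels.

```python
# Simplified model (assumption): each cache level is a set of block numbers,
# with no capacity limit and no eviction.
BLOCK_SIZE = 64  # bytes per block (assumed value)


def access(address, levels):
    """Return the name of the level that supplied the block containing address."""
    block = address // BLOCK_SIZE  # the whole block moves, not one instruction
    for i, (name, contents) in enumerate(levels):
        if block in contents:
            # Hit at this level: copy the block into every faster level above it.
            for _, faster in levels[:i]:
                faster.add(block)
            return name
    # Missed in every cache: fetch from primary memory into L3, L2 and L1.
    for _, contents in levels:
        contents.add(block)
    return "memory"


levels = [("L1", set()), ("L2", set()), ("L3", set())]
print(access(0x1000, levels))  # memory: the first access misses everywhere
print(access(0x1004, levels))  # L1: same 64-byte block, now cached
```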
The L1 cache is small but very fast. However, it is expensive per byte, which is why the same technology is not used for primary memory.
Each cache is divided into blocks (also called lines) of a fixed size. When the cache loads data/instructions from primary memory, it also stores the starting memory address of that data, in the form of a tag, alongside the cache block. The cache compares these tags with the requested address to decide whether the data is present (a hit) or must be fetched from the next level (a miss). This tag lookup is not the TLB: the TLB (Translation Lookaside Buffer) is a separate small cache of virtual-to-physical address translations.
When data is written, the change is made first in the L1 cache, then propagated to the L2 cache and the L3 cache, and finally to primary memory. Whether this happens on every write (write-through) or only when the modified block is evicted (write-back) depends on the cache's write policy.
Some designs have separate instruction and data caches; in others the cache is unified.
The important point to note is that the instruction cache is read-only from the processor's point of view, whereas the data cache is read-write.
When the caches are split, one reason is that a read-only cache is simpler, and therefore cheaper, to build; a split design also lets the processor fetch an instruction and access data in the same clock cycle.
However, if the cache is unified, instructions can be modified on the fly: a store that changes an instruction is immediately visible when that instruction is fetched. A small modification to an instruction can therefore take effect without delay, whereas with split caches the change may take more clock cycles to reach the instruction cache.
Performance parameters:
1. Response time: The total time taken to complete the task.
2. Throughput or Bandwidth: The total amount of work done in unit time.
If the response time of the processor decreases, its throughput increases, because the processor can complete more tasks per unit time than another processor with a higher response time.
It is also possible that the processor has not finished the first task by the time the second task arrives, so the second task waits in a queue. This waiting period is also included in the response time, so the response time increases. If throughput is increased (for example, by adding another processor), the new processor can take up part of the first processor's load and reduce the time tasks spend waiting in the queue. Thus, increasing throughput can also reduce response time.
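The queueing argument can be illustrated with a small simulation. The workload (a new task every 6 ms, each needing 10 ms of processing) and the function name are hypothetical; the point is that one processor cannot keep up, so waiting time piles up, while a second processor removes the queue entirely.

```python
# Hypothetical workload: tasks are served first come, first served; each task
# goes to whichever processor becomes free earliest.
def average_response_time(arrivals, service_time, processors):
    free_at = [0.0] * processors  # time at which each processor becomes free
    total = 0.0
    for arrival in arrivals:
        p = min(range(processors), key=lambda i: free_at[i])
        start = max(arrival, free_at[p])  # the task waits in the queue if busy
        finish = start + service_time
        free_at[p] = finish
        total += finish - arrival  # response time = waiting time + service time
    return total / len(arrivals)


arrivals = [6.0 * i for i in range(100)]  # a task every 6 ms
print(average_response_time(arrivals, 10.0, 1))  # 208.0: the queue keeps growing
print(average_response_time(arrivals, 10.0, 2))  # 10.0: no task ever waits
```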
However, as clock speeds have increased, power consumption has not increased in proportion. This is because the supply voltage has also been reduced, from the earlier 5 V to 3.3 V and lower.
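Dynamic power is given by P = 0.5 * C * V^2 * f, where C is the capacitive load, V is the supply voltage and f is the switching frequency (the clock frequency in our case). Since P depends on the square of V, lowering the voltage offsets much of the increase in f.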
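A quick worked check of the formula, assuming C and f stay the same and only the supply voltage drops from 5 V to 3.3 V. The capacitance and frequency values are placeholders; they cancel out in the ratio.

```python
def dynamic_power(c, v, f):
    """Dynamic power P = 0.5 * C * V^2 * f (C in farads, V in volts, f in hertz)."""
    return 0.5 * c * v ** 2 * f


C = 1e-9  # placeholder: 1 nF of switched capacitance
F = 1e9   # placeholder: 1 GHz clock
ratio = dynamic_power(C, 3.3, F) / dynamic_power(C, 5.0, F)
print(round(ratio, 4))  # 0.4356, i.e. (3.3 / 5)^2
```

Since 1 / 0.4356 is about 2.3, the clock could be raised roughly 2.3 times at 3.3 V before the power returns to the 5 V level.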
bit: either a 1 or a 0
nibble: 4 bits
byte: 8 bits
halfword: 2 bytes or 16 bits
word: 4 bytes or 32 bits
doubleword: 8 bytes or 64 bits
Big endian: The most significant byte (MSB) is stored at the lowest memory address.
Little endian: The least significant byte (LSB) is stored at the lowest memory address.
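Python's int.to_bytes makes the difference between the two byte orders easy to see. The 32-bit value 0x12345678 is an arbitrary example; index 0 of each result plays the role of the lowest memory address.

```python
value = 0x12345678  # an arbitrary 32-bit word

big = value.to_bytes(4, byteorder="big")        # MSB at the lowest address
little = value.to_bytes(4, byteorder="little")  # LSB at the lowest address

for name, memory in (("big", big), ("little", little)):
    # memory[0] stands for the byte at the lowest address
    print(name, " ".join(f"{b:02x}" for b in memory))
# big 12 34 56 78
# little 78 56 34 12
```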
MIPS32 instructions:
Most instructions have the following format:
opcode destination_register, register_1, register_2
opcode specifies the operation (function) to be performed.
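Except for the load and store instructions listed below, all instructions operate solely on registers (plus, for instructions such as addi, a constant encoded in the instruction); only loads and stores access memory.
There are 32 registers, named $0 to $31. All register names are preceded by the dollar sign.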
The registers are also named as
1. $s0 - $s7: x8. Variables
2. $t0 - $t9: x10. Temporary variables
3. $zero: x1. A register whose value is fixed at zero
4. $a0 - $a3: x4. The registers which contain the parameters that are passed when a
procedure is called
5. $v0 - $v1: x2. The registers which contain the values that will be returned
after the procedure has finished executing
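6. $gp: The global pointer, which points into the static data area so that global variables can be reached with a single load or store.
7. $fp: The frame pointer, which points to the current procedure's stack frame.
8. $sp: The stack pointer.
9. $ra: The return address register, which holds the address of the instruction to continue with after a called procedure returns. Generally this is the address of the call instruction plus the instruction size; since all MIPS instructions are 4 bytes long, it is PC+4.
10. $at: Reserved for the assembler (assembler temporary), for example to build large constants.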
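The list above gives the register names but not their numbers. The dictionary below is a sketch that records the standard MIPS numbering, for example to look up that $s0 is register 16 when encoding an instruction. It also includes $k0-$k1, which the list omits; they are reserved for the operating-system kernel.

```python
# Conventional MIPS register names and their numbers ($0 to $31).
REGISTERS = {"$zero": 0, "$at": 1, "$gp": 28, "$sp": 29, "$fp": 30, "$ra": 31}
REGISTERS.update({f"$v{i}": 2 + i for i in range(2)})       # $2-$3
REGISTERS.update({f"$a{i}": 4 + i for i in range(4)})       # $4-$7
REGISTERS.update({f"$t{i}": 8 + i for i in range(8)})       # $8-$15
REGISTERS.update({f"$s{i}": 16 + i for i in range(8)})      # $16-$23
REGISTERS.update({f"$t{i}": 16 + i for i in range(8, 10)})  # $24-$25
REGISTERS.update({f"$k{i}": 26 + i for i in range(2)})      # $26-$27, OS kernel

print(REGISTERS["$s0"], REGISTERS["$ra"])  # 16 31
```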
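Arithmetic operations:
1. add $s0,$s1,$s2: $s0 = $s1 + $s2, signed addition (traps on overflow; addu is the version that does not trap)
2. sub $s0,$s1,$s2: $s0 = $s1 - $s2, signed subtraction (traps on overflow; subu is the version that does not trap)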
3. addi $s0,$s1,20: $s0 = $s1 + 20
There is no separate subi because the constant in addi can be made negative to
implement a subtraction operation.
(Doubt: Where are the multiplication and division operations?)
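A sketch of these semantics on 32-bit values; the helper names mirror the instructions but are our own. Register values are handled as unsigned 32-bit bit patterns: add raises a Python OverflowError where the hardware would trap, addu simply wraps around, and addi sign-extends its 16-bit immediate, which is why a negative constant performs subtraction.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def addu(a, b):
    """addu: 32-bit sum that silently wraps around."""
    return (a + b) & MASK32


def add(a, b):
    """add: same result bits, but signed overflow raises (a trap on MIPS)."""
    if not -(1 << 31) <= to_signed(a) + to_signed(b) < (1 << 31):
        raise OverflowError("signed overflow in add")
    return addu(a, b)


def addi(rs, imm):
    """addi: sign-extend the 16-bit immediate, then add with overflow checking."""
    imm16 = imm & 0xFFFF
    value = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return add(rs, value & MASK32)


print(addi(30, -5))                 # 25: a negative immediate subtracts
print(hex(addu(0x7FFFFFFF, 1)))     # 0x80000000: wraps, no trap
```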
Load-Store operations (the memory address is written as offset(base_register), e.g. 8($s1)):
1. lw register,memory address: Load a word from memory to a register
2. sw register,memory address: Store a word from a register to memory
3. lh register,memory address: Load a halfword from memory to a register, sign-extended
4. lhu register,memory address: Load a halfword from memory to a register, zero-extended (unsigned)
5. sh register,memory address: Store a halfword from a register to memory
6. lb register,memory address: Load a byte from memory to a register, sign-extended
7. lbu register,memory address: Load a byte from memory to a register, zero-extended (unsigned)
8. sb register,memory address: Store a byte from a register to memory
9. lui register,immediate: register = immediate << 16 (load upper immediate)
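The signed and unsigned loads differ only in how the upper bits of the 32-bit register are filled. The sketch below assumes a hypothetical two-byte memory and big-endian byte order (MIPS can run in either byte order); sign_extend is a helper of our own.

```python
def sign_extend(value, bits):
    """Copy the top bit of a bits-wide value into the upper bits of a 32-bit word."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & 0xFFFFFFFF


memory = bytes([0xF0, 0x80])  # hypothetical contents of addresses 0 and 1

lb = sign_extend(memory[0], 8)                       # 0xFFFFFFF0: top bit copied
lbu = memory[0]                                      # 0x000000F0: zero-filled
lh = sign_extend(int.from_bytes(memory, "big"), 16)  # 0xFFFFF080
lhu = int.from_bytes(memory, "big")                  # 0x0000F080
lui = (0x1234 << 16) & 0xFFFFFFFF                    # 0x12340000

print(hex(lb), hex(lbu), hex(lh), hex(lhu), hex(lui))
```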
Instead of naming them r1 and r2, the source registers are named rs and rt, simply because 't' comes after 's' in the alphabet.
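2. I format
Assembly: opcode rt, rs, IMM for arithmetic such as addi (rt is the destination); opcode rt, IMM(rs) for loads and stores; beq/bne are written opcode rs, rt, Label
Instruction: | opcode | rs | rt | IMM |
Bits:        |   6    | 5  | 5  | 16  |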
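Packing the fields is a matter of shifting each one into position. A minimal sketch, using the standard opcodes 8 (addi) and 35 (lw) and register numbers $s0 = 16, $s1 = 17; the function name encode_i is hypothetical.

```python
def encode_i(opcode, rs, rt, imm):
    """Pack an I-format instruction: opcode(6) | rs(5) | rt(5) | IMM(16)."""
    if not (0 <= opcode < 64 and 0 <= rs < 32 and 0 <= rt < 32):
        raise ValueError("field out of range")
    # imm & 0xFFFF stores a negative immediate in 16-bit two's complement
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


# addi $s0, $s1, 20  ->  opcode 8, rs = $s1 = 17, rt = $s0 = 16, IMM = 20
word = encode_i(8, 17, 16, 20)
print(f"{word:08x}")  # 22300014
```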
Logical Operations
1. Shift left/right: R format
sll rd,rt,shamt #rd = rt << shamt; shamt is the number of bits to shift by.
srl rd,rt,shamt #rd = rt >> shamt, zeros shifted in; shamt is the number of bits to shift by.
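The two shifts can be mimicked on 32-bit values as follows; the function names simply mirror the instructions. Shifting left by i bits multiplies by 2^i, as long as no bits fall off the top.

```python
MASK32 = 0xFFFFFFFF


def sll(rt, shamt):
    """rd = rt << shamt, keeping only the low 32 bits."""
    return (rt << shamt) & MASK32


def srl(rt, shamt):
    """rd = rt >> shamt, filling the vacated upper bits with zeros (logical shift)."""
    return (rt & MASK32) >> shamt


print(sll(5, 3))                 # 40: shifting left by 3 multiplies by 2**3
print(hex(srl(0x80000000, 4)))   # 0x8000000: zeros shifted in from the left
```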
Usage:
beq $t0,$t1, Label #branch to Label if $t0 == $t1
bne $t0,$t1, Label #branch to Label if $t0 != $t1
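The assembler turns Label into the 16-bit IMM field of beq/bne, an offset counted in instructions relative to the instruction after the branch. A sketch of the target computation, with hypothetical helper names:

```python
def branch_target(pc, offset16):
    """Address a taken beq/bne jumps to: PC + 4 + (sign-extended offset * 4)."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:
        offset -= 1 << 16  # negative offsets branch backwards
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF


def next_pc(pc, taken, offset16):
    """The next instruction address, whether or not the branch is taken."""
    return branch_target(pc, offset16) if taken else pc + 4


print(hex(next_pc(0x00400000, True, 3)))   # 0x400010
print(hex(next_pc(0x00400000, False, 3)))  # 0x400004
```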
Usage:
slt $t0, $t1, $t2 #$t0 = 1 if $t1 < $t2 (signed comparison) else $t0 = 0
sltu $t0, $t1, $t2 #$t0 = 1 if $t1 < $t2 (unsigned comparison) else $t0 = 0
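The only difference between slt and sltu is how the same 32-bit pattern is interpreted. A sketch with helper names of our own:

```python
def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def slt(rs, rt):
    """1 if rs < rt when both are read as signed 32-bit integers, else 0."""
    return int(to_signed(rs) < to_signed(rt))


def sltu(rs, rt):
    """1 if rs < rt when both are read as unsigned 32-bit integers, else 0."""
    return int(rs < rt)


minus_one = 0xFFFFFFFF  # bit pattern of -1
print(slt(minus_one, 1))   # 1: -1 < 1
print(sltu(minus_one, 1))  # 0: 4294967295 is not < 1
```

An assembler combines slt with bne to provide branch-if-less-than: the pseudo-instruction blt $t1, $t2, Label becomes slt $at, $t1, $t2 followed by bne $at, $zero, Label.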