
(2009/10/23 1:00PM~3:30PM)

(Key)
1. (18%)
(a) (6%) The three key factors for the performance of a computer are instruction count, clock cycles per
instruction, and clock cycle time (or clock rate). Which of these factors are affected by the compiler, the
instruction set architecture, and the implementation of the processor, respectively? Explain the reasons.
Factors affected by the compiler: instruction count (IC), CPI
Factors affected by the instruction set architecture: IC, CPI, clock cycle time
Factors affected by the implementation of the processor: CPI, clock cycle time
(b) (10%) The table below shows the number of instructions executed and the average CPI for executing
applications 1 and 2 on three different machines. Machine A has a clock rate of 4 GHz, machine B 2 GHz,
and machine C 1 GHz. (You must show the computation to get full credit.)

                 Machine A                 Machine B                 Machine C
                 Instructions   Avg CPI    Instructions   Avg CPI    Instructions   Avg CPI
Application 1    4.0E+9         2          1.0E+10        5          2.0E+10        2.5
Application 2    4.0E+11        10         2.0E+11        1          2.0E+10        3.0

i. (4%) If the workload is to run both applications once a week, which machine has the best
performance?

                 Machine A    Machine B    Machine C
Application 1    2 sec        25 sec       50 sec
Application 2    1000 sec     100 sec      60 sec
Total time       1002 sec     125 sec      110 sec

Machine C has the best performance.


ii. (3%) If application 1 must run three times as often as application 2 in a week, which machine is
the fastest?
Weighted total time for machine A = 2*3 + 1000 = 1006 sec
Weighted total time for machine B = 25*3 + 100 = 175 sec
Weighted total time for machine C = 50*3 + 60 = 210 sec
Machine B is the fastest.
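The times in parts i and ii follow from CPU time = instruction count * CPI / clock rate. The sketch below is one way to check them. The names CLOCK_HZ, WORKLOAD, cpu_time and total_time are illustrative, and the CPI values 2, 5 and 1 (application 1 on A and B, application 2 on B) are assumptions chosen because they reproduce the key's times.

```python
# Clock rates from the problem statement, in Hz.
CLOCK_HZ = {"A": 4e9, "B": 2e9, "C": 1e9}

# (instruction count, average CPI) per application and machine.
# Assumption: the CPI values 2, 5 and 1 are back-computed from the key's times.
WORKLOAD = {
    "app1": {"A": (4.0e9, 2.0), "B": (1.0e10, 5.0), "C": (2.0e10, 2.5)},
    "app2": {"A": (4.0e11, 10.0), "B": (2.0e11, 1.0), "C": (2.0e10, 3.0)},
}


def cpu_time(instructions, cpi, clock_hz):
    """CPU time in seconds = instruction count * CPI / clock rate."""
    return instructions * cpi / clock_hz


def total_time(machine, runs_app1=1, runs_app2=1):
    """Total time when each application runs the given number of times."""
    t1 = cpu_time(*WORKLOAD["app1"][machine], CLOCK_HZ[machine])
    t2 = cpu_time(*WORKLOAD["app2"][machine], CLOCK_HZ[machine])
    return runs_app1 * t1 + runs_app2 * t2


for m in "ABC":
    print(m, total_time(m), total_time(m, runs_app1=3))
```

Calling total_time(m) gives the part i totals, and total_time(m, runs_app1=3) gives the part ii weighted totals.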
iii. (5%) If we consider application 1 and application 2 equally important, which machine
should be the fastest?
Pick a machine as the reference and compute the performance ratios. With machine A as the reference
(ratio = time on A / time on the machine):

                  Machine A    Machine B    Machine C
Application 1     1            0.08         0.04
Application 2     1            10           16.667
Geometric mean    1            0.894        0.816

Machine A is the fastest. The answer is the same no matter which machine is used as the reference.
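The claim that the ranking does not depend on the reference machine can be checked numerically. This is a minimal sketch using the execution times from the answer to part i; the names TIMES, relative_performance and fastest are illustrative.

```python
import math

# Execution times in seconds (application 1, application 2) from part i.
TIMES = {"A": (2.0, 1000.0), "B": (25.0, 100.0), "C": (50.0, 60.0)}


def geometric_mean(values):
    """n-th root of the product of n values."""
    return math.prod(values) ** (1.0 / len(values))


def relative_performance(machine, reference):
    """Geometric mean of (reference time / machine time) over both applications."""
    ratios = [ref / t for ref, t in zip(TIMES[reference], TIMES[machine])]
    return geometric_mean(ratios)


def fastest(reference):
    """Machine with the highest geometric-mean performance ratio."""
    return max(TIMES, key=lambda m: relative_performance(m, reference))


for ref in "ABC":
    print("reference", ref, "-> fastest", fastest(ref))
```

A larger ratio means better performance relative to the reference, so the machine with the largest geometric mean is the fastest.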
2. (7%)
(a) (2%) Describe the definition of Amdahl's law.
$T_{\text{improved}} = \dfrac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$

(b) (5%) Consider a computer running programs with CPU times shown in the following table:

FP instr.    INT instr.    L/S instr.    Branch instr.    Total time
100 s        200 s         150 s         50 s             500 s

i. By what percentage is the total time reduced if the time for FP operations is reduced by
50%?
ii. By how much must we improve the speed of the INT instructions if we want this program to run
1.25 times faster? How about making it 2 times faster?
b) i. (100 * 0.5) / 500 = 10%
ii. Speedup = 1.25 = 500 / (200/n + 300), so n = 2.
Speedup = 2 = 500 / (200/n + 300) gives 200/n = -50, i.e. n = -4, which is not a valid improvement factor.
So, it is impossible to make the program 2 times faster by improving only the INT instructions.
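The same arithmetic can be expressed as a small sketch of Amdahl's law. The function names improved_time and required_factor are illustrative, and returning None for an unreachable speedup is a design choice made for this example.

```python
def improved_time(t_affected, t_unaffected, factor):
    """Amdahl's law: only the affected part shrinks by the improvement factor."""
    return t_affected / factor + t_unaffected


def required_factor(t_affected, t_unaffected, target_speedup):
    """Improvement factor needed on the affected part, or None if unreachable."""
    total = t_affected + t_unaffected
    # Time budget left for the affected part once the target total time is fixed.
    budget = total / target_speedup - t_unaffected
    if budget <= 0:
        return None
    return t_affected / budget


# Part i: FP time (100 s) halved, the other 400 s unchanged.
print(1 - improved_time(100, 400, 2) / 500)
# Part ii: INT time is 200 s, everything else 300 s.
print(required_factor(200, 300, 1.25), required_factor(200, 300, 2))
```

The second target fails because the unaffected 300 s alone already exceeds the 250 s that a 2x speedup allows.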

3. (8%) What are the four underlying principles for hardware design? For each principle, give one
example from the MIPS ISA design that follows the principle.
Ans:
1. Simplicity favors regularity
   Example: (a) all ALU instructions have three operands, (b) fixed-length instructions, (c) the $rs field is
   always a source operand.
2. Smaller is faster
   Example: (a) 32 registers and (b) three instruction formats
3. Make the common case fast
   Example: (a) immediate operands are commonly used, (b) most branches compare for equal or not
   equal, or compare to zero, (c) immediates are usually small
4. Good design demands good compromises
   Example: (a) trade large immediates for fixed-length instructions and (b) trade code space for
   more efficient pipeline processing

4. (13%) The following figure shows a C program and its MIPS assembly program.

(a) (3%) In this code, why does it need to save registers $ra and $a0?
Ans: This is a recursive call. The return address and the argument must be saved; otherwise the
calls of fib(n-1) and fib(n-2) will overwrite $ra and $a0.
(b) (3%) In this code, $s1 is used as a temporary to hold fib(n-1)+fib(n-2), so why not use register $t1
instead?
Ans: If $t1 is used, it must be saved and restored across the two calls fib(n-1) and fib(n-2). This
requires four instructions. Using $s1 instead requires only two instructions: one store in the
procedure prologue and one load in the epilogue.
(c) (4%) It takes one instruction to compare $a0 with 2, and one branch instruction to skip the
computations if $a0 is less than 2. If we had a blti $a0, 2, L1 (branch on less than immediate) instruction,
one instruction could be eliminated. Should the MIPS ISA consider adding this instruction?
Ans: No. First, this instruction would require a new format: it fits neither the R-format nor the I-format.
Second, the MIPS ISA keeps branch instructions simple so that the branch functional unit
can quickly determine whether a branch is taken or falls through. This instruction requires a subtraction
and defeats the intention of making quick branch decisions.
(d) (3%) The above assembly code may be generated from the MIPS C compiler because the frame pointer
($r30) was not used. The GNU C MIPS compiler, on the other hand, does use the frame pointer ($r30)
to reference arguments, local variables, and saved registers. Will there be a problem if this fib routine is
called by some separate modules generated from the GNU C MIPS compiler?

Ans: No problem. The MIPS C compiler treats register $r30 as a callee-saved register. Since this fib
routine does not use $r30, there is no need to save and restore it. When the function fib() returns,
the caller still has its frame pointer in $r30, unchanged.

(10%) Assume that the variables a, b, c and d are assigned to registers $s0, $s1, $s2, and $s3, respectively.
Assume that the base addresses of the arrays A and B are in registers $s4 and $s5. For the following MIPS
assembly code, write comments for each instruction and the corresponding C statement for each code
sequence.
(a)
addi $s4, $s4, 20
sll  $t1, $s1, 2
add  $s4, $s4, $t1
lw   $s0, -8($s4)
(b)
sll  $t0, $s1, 2
add  $t2, $s4, $t0
lw   $t1, 4($t2)
sll  $t1, $t1, 2
add  $t2, $t1, $s5
lw   $t1, 0($t2)
add  $s2, $s2, $t1

Ans: Comments for each instruction must be included.
a) a = A[b+3];
b) c = c + B[A[b+1]];
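The two C statements can be checked by tracing the address arithmetic step by step. The sketch below mirrors each MIPS instruction in a comment; the function names, the dictionary-based memory model (byte address to word), and the sample base addresses are assumptions made for illustration.

```python
def address_a(base_A, b):
    """Effective address loaded by sequence (a)."""
    s4 = base_A + 20      # addi $s4, $s4, 20   -> &A[5]
    t1 = b << 2           # sll  $t1, $s1, 2    -> b * 4
    s4 = s4 + t1          # add  $s4, $s4, $t1  -> &A[b+5]
    return s4 - 8         # lw   $s0, -8($s4)   -> &A[b+3]


def sequence_b(mem, base_A, base_B, b, c):
    """Value left in $s2 by sequence (b); mem maps byte address -> word."""
    t0 = b << 2           # sll $t0, $s1, 2     -> b * 4
    t2 = base_A + t0      # add $t2, $s4, $t0   -> &A[b]
    t1 = mem[t2 + 4]      # lw  $t1, 4($t2)     -> A[b+1]
    t1 = t1 << 2          # sll $t1, $t1, 2     -> A[b+1] * 4
    t2 = t1 + base_B      # add $t2, $t1, $s5   -> &B[A[b+1]]
    t1 = mem[t2]          # lw  $t1, 0($t2)     -> B[A[b+1]]
    return c + t1         # add $s2, $s2, $t1   -> c + B[A[b+1]]


# Hypothetical arrays and base addresses, used only to exercise the trace.
A, B = [3, 1, 2, 0], [10, 20, 30, 40]
BASE_A, BASE_B = 0x1000, 0x2000
MEM = {BASE_A + 4 * i: v for i, v in enumerate(A)}
MEM.update({BASE_B + 4 * i: v for i, v in enumerate(B)})
print(sequence_b(MEM, BASE_A, BASE_B, b=1, c=5))
```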
5. (12%) Turn the following MIPS machine instructions into assembly language form based on the
following table and Table 1. You can define label names yourself, if necessary.

0x00ae8020    0x00000000
0x02008140    0x8fb1fffc

Name        $zero    $v0-$v1    $a0-$a3    $t0-$t7    $s0-$s7    $t8-$t9    $sp
Reg. no.    0        2-3        4-7        8-15       16-23      24-25      29

Table 1. MIPS Instruction encoding

Table 2. MIPS assembly language

Ans:
0x00ae8020    add $s0,$a1,$t6
0x00000000    sll $zero,$zero,0
0x02008140    sll $s0,$s0,5
0x8fb1fffc    lw $s1,-4($sp)
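The first step of the translation is splitting each 32-bit word into its fields. The sketch below does only that; mapping opcode and funct values to mnemonics still relies on Table 1. The function name decode_fields is illustrative.

```python
def decode_fields(word):
    """Split a 32-bit MIPS instruction word into its raw fields."""
    imm = word & 0xFFFF
    if imm & 0x8000:            # sign-extend the 16-bit immediate
        imm -= 0x10000
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "imm": imm,             # meaningful for I-format instructions only
    }


for w in (0x00AE8020, 0x00000000, 0x02008140, 0x8FB1FFFC):
    print(hex(w), decode_fields(w))
```

For 0x02008140 the raw fields are rs = 16, rt = 0, rd = 16 and shamt = 5. The answer above takes the shifted source from register 16 ($s0); in the standard MIPS encoding sll reads its source from the rt field, so a strict disassembler would report $zero as the source for this word.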

6. (9%) Given the description of the MIPS instructions ll (load linked word) and sc (store conditional word)
as follows:

Instruction        Meaning
ll rt, addr(rs)    rt = Memory[rs + addr] (load linked word, 1st half of an atomic swap)
sc rt, addr(rs)    Memory[rs + addr] = rt; rt = 1 if succeeded or 0 if failed
                   (store conditional word, 2nd half of an atomic swap)

Each entry in the following table has code and also shows the contents of various registers. The
notation ($s1) shows the contents of the memory location pointed to by register $s1. The assembly
code in the table is executed in the cycle shown on parallel processors with a shared memory space. Fill
out the table with the values of the registers and memory for each given cycle.
Processor 1

Processor 2

Cycle
0

try: add $t0, $zero, $s2


ll $t1, 0($s1)

Processor 1
$s2

$t1

$t0

MEM
($s1)

Processor 2

100

$s2

$t1

$t0

$s2

$t1

$t0

10

20

30

1
try: add $t0, $zero, $s2

ll $t1, 0($s1)

sc $t0, 0($s1)

sc $t0, 0($s1)

beq $t0, $zero, try

beq $t0, $zero, try

add $s2, $zero, $t1

Processor 1

Processor 2

Cycle
$s2

$t1

$t0

MEM
($s1)

100

10

20

30

100

10

20

30

100

100

10

20

10

ll $t1, 0($s1)

100

100

10

100

10

sc $t0, 0($s1)

100

10

10

100

sc $t0, 0($s1)

beq $t0, $zero, try

100

10

10

100

beq $t0, $zero, try

add $s2, $zero, $t1

100

10

100

100

try: add $t0, $zero, $s2


ll $t1, 0($s1)

try: add $t0, $zero, $s2

Processor 1

Processor 2
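The ll/sc semantics described in the question can be illustrated with a simplified software model. This is a sketch under stated assumptions, not a description of real hardware: the class SharedWord, its linked set, and the rule that a successful sc cancels every outstanding reservation are modelling choices made here, and atomic_swap mirrors the try/ll/sc/beq loop from the question.

```python
class SharedWord:
    """One shared memory word with per-processor reservations (simplified model)."""

    def __init__(self, value):
        self.value = value
        self.linked = set()       # processors whose ll is still valid

    def ll(self, cpu):
        """Load linked: read the word and remember the reservation."""
        self.linked.add(cpu)
        return self.value

    def sc(self, cpu, new_value):
        """Store conditional: returns 1 on success, 0 on failure."""
        if cpu not in self.linked:
            return 0
        self.value = new_value
        self.linked.clear()       # assumption: a successful store breaks all reservations
        return 1


def atomic_swap(mem, cpu, new_value):
    """Retry loop: ll, sc, and branch back to try while sc reports failure."""
    while True:
        old = mem.ll(cpu)
        if mem.sc(cpu, new_value):
            return old


mem = SharedWord(100)
mem.ll("P1")
mem.ll("P2")
print(mem.sc("P1", 7), mem.sc("P2", 9), mem.value)
```

In this model the second sc fails because the first processor's store happened between the second processor's ll and sc, so that processor must branch back to try and repeat the sequence.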

7. (10%) About the Booth algorithm.


(a) (6%) Complete the following table for the 2-bit Booth encoding and describe the advantages and
disadvantages of the algorithm:

Current bit a_i    Previous bit a_(i-1)    Operation    Reason

(b) (3%) For a multiplication with multiplier 0110 1110 0010 1111, please compare the number of additions and
subtractions required by the traditional multiplication algorithm and the Booth algorithm.
(c) (1%) A = 1111 0000 1111 0000 and B = 1100 1100 0011 0011. Which one should be used as the
multiplier?
Ans:
(a)
Current bit a_i    Previous bit a_(i-1)    Operation           Reason
0                  0                       Do nothing          Middle of a run of 0s
0                  1                       Add multiplicand    End of a run of 1s
1                  0                       Sub multiplicand    Beginning of a run of 1s
1                  1                       Do nothing          Middle of a run of 1s

Advantages: it may be applied to the multiplication of both unsigned and signed numbers; it may
reduce the number of arithmetic operations for a multiplier with long runs of 1s.
Disadvantages: when the multiplier alternates between 0 and 1, the Booth algorithm must perform many
additions and subtractions.
(b)
Traditional multiplication algorithm: an add is needed whenever the multiplier bit is 1, so 10 adds.
Booth algorithm: needs 4 adds and 4 subs.
c) A
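The counts in (b) and the choice in (c) can be reproduced by scanning the multiplier from the least significant bit with an implicit previous bit of 0, as in the table for (a). The function names below are illustrative.

```python
def traditional_adds(multiplier):
    """Shift-and-add multiplication adds once for every 1 bit in the multiplier."""
    return multiplier.replace(" ", "").count("1")


def booth_ops(multiplier):
    """Return (adds, subs) for 2-bit Booth encoding of a binary string."""
    bits = multiplier.replace(" ", "")
    adds = subs = 0
    prev = 0                              # implicit a_(-1) = 0
    for ch in reversed(bits):             # scan from the least significant bit
        cur = int(ch)
        if cur == 1 and prev == 0:
            subs += 1                     # beginning of a run of 1s
        elif cur == 0 and prev == 1:
            adds += 1                     # end of a run of 1s
        prev = cur
    return adds, subs


print(traditional_adds("0110 1110 0010 1111"), booth_ops("0110 1110 0010 1111"))
print(booth_ops("1111 0000 1111 0000"), booth_ops("1100 1100 0011 0011"))
```

With these counts, A needs 3 Booth operations in total and B needs 7, which is why A is the better multiplier.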

8. (5%) Circle all the correct statements from the following:


a) MIPS is an acronym, it means Million Instructions Per Second
b) The JR instruction can only be used for procedure or function returns
c) MIPS-64 has longer instructions than MIPS-32
d) Since the shamt field in the R-format has only 5 bits, MIPS-64 cannot shift more
than 31 bits.
e) The MIPS ISA has five addressing modes: immediate, register, base, PC-relative,
and pseudodirect addressing.
(e)
9. (4%) In which of the following MIPS instructions does the rt field designate the
destination register?
a) Load word
b) Load linked
c) Store conditional
d) Store word

e) Branch on equal
f) Load upper immediate
(a, b, c, f)
10. (4%) Which of the following may cause integer overflow?
a) Adding two positive integers (2's complement)
b) Adding one positive and one unsigned numbers
c) Adding two negative integers
d) Using SSE instruction to do multimedia (video/audio) computation
(a, c)
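Answers (a) and (c) reflect the rule that two's complement addition can overflow only when both operands have the same sign. The sketch below checks this for 32-bit integers in two ways; the 32-bit width and the function names are assumptions made for illustration.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def add_overflows(a, b):
    """True if the exact sum a + b does not fit in 32-bit two's complement."""
    return not (INT32_MIN <= a + b <= INT32_MAX)


def sign_rule(a, b):
    """Hardware-style check: operands share a sign but the wrapped sum does not."""
    wrapped = (a + b + 2**31) % 2**32 - 2**31   # sum truncated to 32 bits
    return (a >= 0) == (b >= 0) and (wrapped >= 0) != (a >= 0)


cases = [(INT32_MAX, 1), (INT32_MIN, -1), (INT32_MAX, INT32_MIN), (5, -7)]
for a, b in cases:
    print(a, b, add_overflows(a, b), sign_rule(a, b))
```

Both checks agree on every pair: only the positive-plus-positive and negative-plus-negative cases report overflow.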
