
Homework Set 5
Session: Winter 2015 Instructor: Prof. L. He
Total Points: 60 Due Date: Feb 18 (before class starts)

1.

The performance advantage of both the multicycle and the pipelined designs is limited by the time
required to access memory, which is longer than the time needed to use the ALU. Suppose each memory
access takes 2 clock cycles. Draw the modified pipeline. List all possible new forwarding situations and
all possible new hazards, along with their lengths. [10 points]
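Writing out stage occupancy cycle by cycle helps when drawing the modified pipeline. The Python sketch below assumes that both instruction fetch and data access take two cycles, which gives a hypothetical seven-stage pipeline IF1 IF2 ID EX MEM1 MEM2 WB. The stage names and the functions `diagram` and `stall_cycles` are illustrative, not part of the assignment. `stall_cycles` assumes full forwarding, with a result ready at the end of the producer's stage and needed at the start of the consumer's stage.

```python
from typing import List

# Hypothetical seven-stage pipeline: instruction fetch and data memory
# access are each assumed to take two cycles.
STAGES = ["IF1", "IF2", "ID", "EX", "MEM1", "MEM2", "WB"]


def diagram(n_instructions: int, stages: List[str] = STAGES) -> List[str]:
    """One text row per instruction showing the stage it occupies in each
    cycle, assuming no stalls (one instruction enters the pipeline per cycle)."""
    total_cycles = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        cells = []
        for cycle in range(total_cycles):
            s = cycle - i  # stage index of instruction i in this cycle
            cells.append(stages[s] if 0 <= s < len(stages) else "")
        rows.append(" ".join(f"{c:<5}" for c in cells).rstrip())
    return rows


def stall_cycles(producer_stage: str, consumer_stage: str, distance: int,
                 stages: List[str] = STAGES) -> int:
    """Stall cycles with full forwarding. Assumes the producer's result is ready
    at the END of producer_stage and the consumer needs it at the START of
    consumer_stage; distance is 1 for back-to-back instructions."""
    p = stages.index(producer_stage)
    c = stages.index(consumer_stage)
    return max(0, p + 1 - c - distance)


if __name__ == "__main__":
    for row in diagram(3):
        print(row)
    print("lw -> ALU, adjacent:  ", stall_cycles("MEM2", "EX", 1))
    print("lw -> ALU, one apart: ", stall_cycles("MEM2", "EX", 2))
    print("ALU -> ALU, adjacent: ", stall_cycles("EX", "EX", 1))
```

Passing the five-stage list `["IF", "ID", "EX", "MEM", "WB"]` to `stall_cycles("MEM", "EX", 1, ...)` gives the familiar one-cycle load-use stall, which is a quick sanity check on the formula before applying it to the longer pipeline.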

2.

We examine how data dependencies affect execution in the basic five-stage pipeline. Problems in this
exercise refer to the following sequence of instructions:

i. Indicate dependencies in the above instruction sequence.
ii. Assume there is no forwarding in this pipelined processor. Indicate hazards and add nop
instructions to eliminate them.
iii. Assume there is full forwarding. Indicate hazards and add nop instructions to eliminate them.
[10 points]
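The instruction sequence for this problem is not reproduced here, so the Python sketch below uses a hypothetical three-instruction sequence to show how nops can be inserted mechanically. It assumes the classic five-stage MIPS pipeline with a register file that is written in the first half of a cycle and read in the second. Under that assumption, without forwarding a dependent instruction must issue at least three slots after its producer. With full forwarding, only a load followed immediately by a user of its result needs a nop. `Instr`, `insert_nops`, and the sample sequence are illustrative names. The sketch ignores refinements such as `$zero` or forwarding of store data.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str               # assembly text, used only for printing
    dest: Optional[str]     # register written, or None (e.g. sw, beq)
    srcs: Tuple[str, ...]   # registers read
    is_load: bool = False


def min_distance(producer: Instr, forwarding: bool) -> int:
    """Minimum slot distance between a producer and a dependent consumer in the
    classic five-stage pipeline (IF ID EX MEM WB), assuming the register file is
    written in the first half of a cycle and read in the second half."""
    if not forwarding:
        return 3                            # consumer's ID may share a cycle with producer's WB
    return 2 if producer.is_load else 1     # with forwarding, only load-use needs a bubble


def insert_nops(program: List[Instr], forwarding: bool) -> List[Optional[Instr]]:
    """Greedily insert nops (represented as None) so every RAW dependence is met."""
    out: List[Optional[Instr]] = []
    for instr in program:
        needed = 0
        seen = set()  # source registers whose most recent writer was already found
        # A producer more than 3 slots back never needs a bubble, so look back 3 slots.
        for pos in range(len(out) - 1, max(-1, len(out) - 4), -1):
            prev = out[pos]
            if prev is None or prev.dest is None:
                continue
            if prev.dest in instr.srcs and prev.dest not in seen:
                seen.add(prev.dest)
                distance = len(out) - pos
                needed = max(needed, min_distance(prev, forwarding) - distance)
        out.extend([None] * needed)
        out.append(instr)
    return out


if __name__ == "__main__":
    # Hypothetical sequence; substitute the sequence given in the exercise.
    prog = [
        Instr("lw  $1, 0($2)", "$1", ("$2",), is_load=True),
        Instr("add $3, $1, $4", "$3", ("$1", "$4")),
        Instr("sub $5, $3, $1", "$5", ("$3", "$1")),
    ]
    for fwd in (False, True):
        print("full forwarding" if fwd else "no forwarding")
        for slot in insert_nops(prog, fwd):
            print("   ", slot.text if slot else "nop")
```

For the sample sequence this inserts four nops without forwarding (two after `lw`, two after `add`) and a single nop with full forwarding (between `lw` and `add`).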
3.

Write a small program in a language of your choice to compare different pipelined implementations of
the following ISA:
1) The ISA has only three instructions: A, B, and C.
2) Ignore any data or control hazards, i.e., assume all instructions in any program are independent.
3) Assume that every register adds a setup time of 10 ps, and that every pipeline stage incurs this penalty.
4) Assume that every instruction is infinitely divisible into smaller chunks for pipeline stages.
Your program should take as input:
1) the combinational execution time, in ps, of each of A, B, and C;
2) the number of pipeline stages;
3) a program (i.e., a sequence of instructions) in a text file, one instruction per line.

It should output the total execution time of the program.

Make sure your program works. You have to submit a printout of your program and its output.
A sample program will be sent out by Friday. [15 points]
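A minimal Python sketch of such a program, under one possible timing model: each instruction is cut into `stages` equal pieces, the single clock must fit the slowest instruction's piece plus the 10 ps setup, and independent instructions then complete one per cycle after the first has filled the pipeline. The function names and command-line layout are hypothetical; this is not the sample program mentioned above.

```python
import sys
from typing import Dict, List

SETUP_PS = 10  # register setup time charged to every pipeline stage


def clock_period(times: Dict[str, float], stages: int) -> float:
    """Clock period in ps. Assumes each instruction is split evenly into `stages`
    pieces and the clock must fit the slowest instruction's piece plus setup."""
    return max(times.values()) / stages + SETUP_PS


def total_time(times: Dict[str, float], stages: int, program: List[str]) -> float:
    """Total execution time in ps for independent instructions (no stalls): the
    first instruction finishes after `stages` cycles, each later one a cycle apart."""
    if stages < 1:
        raise ValueError("need at least one pipeline stage")
    unknown = sorted({op for op in program if op not in times})
    if unknown:
        raise ValueError(f"unknown instruction(s): {unknown}")
    if not program:
        return 0.0
    return (stages + len(program) - 1) * clock_period(times, stages)


def read_program(path: str) -> List[str]:
    """One instruction per line; blank lines are ignored."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


if __name__ == "__main__" and len(sys.argv) == 6:
    # Hypothetical command line: python pipe_time.py <tA> <tB> <tC> <stages> <program.txt>
    t_a, t_b, t_c = (float(x) for x in sys.argv[1:4])
    k = int(sys.argv[4])
    prog = read_program(sys.argv[5])
    print(f"{total_time({'A': t_a, 'B': t_b, 'C': t_c}, k, prog):.1f} ps")
```

Under this model the total time for k stages and n instructions is (k + n - 1)(t_max/k + 10) ps, where t_max is the slowest of A, B, and C. The clock is sized by all three instructions rather than only those in the program, on the assumption that the hardware is built for the whole ISA. Deeper pipelines shorten the clock but lengthen the fill. For a single instruction the time is t_max + 10k, which grows with k, so very short programs can favor fewer stages.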

4.

We have a program core consisting of five conditional branches. The program core will be executed
thousands of times. Below are the outcomes of each branch for one execution of the program core (T for
taken, N for not taken).
Branch 1: T-T-T
Branch 2: N-N-N-N
Branch 3: T-N-T-N-T-N
Branch 4: T-T-T-N-T
Branch 5: T-T-N-T-T-N-T
Assume the behavior of each branch remains the same for each program core execution. For dynamic
schemes, assume each branch has its own prediction buffer and each buffer is initialized to the same state
before each execution. List the predictions for the following branch prediction schemes:
A. Always taken
B. Always not taken
C. 1-bit predictor, initialized to predict taken
D. 2-bit predictor, initialized to weakly predict taken
[10 points]

5.
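The predictions for the four branch prediction schemes can be checked with a short simulation. The Python sketch below models 2-bit prediction as a saturating counter (0 = strongly not taken up to 3 = strongly taken, starting at 2 = weakly taken). That state machine is an assumption: if the course's 2-bit diagram uses different transitions out of the weak states, `two_bit` would need adjusting. Each function restarts from its initial state, matching the assumption that every buffer is reset before each execution of the program core. All function names are hypothetical.

```python
from typing import Dict

# Branch outcomes for one execution of the program core (from the problem).
BRANCHES: Dict[str, str] = {
    "Branch 1": "TTT",
    "Branch 2": "NNNN",
    "Branch 3": "TNTNTN",
    "Branch 4": "TTTNT",
    "Branch 5": "TTNTTNT",
}


def always(outcomes: str, guess: str) -> str:
    """Static scheme: always taken (guess='T') or always not taken (guess='N')."""
    return guess * len(outcomes)


def one_bit(outcomes: str, init: str = "T") -> str:
    """1-bit predictor: predict whatever the branch did last time."""
    state, preds = init, []
    for actual in outcomes:
        preds.append(state)
        state = actual
    return "".join(preds)


def two_bit(outcomes: str, init: int = 2) -> str:
    """2-bit saturating counter: 0 strong N, 1 weak N, 2 weak T, 3 strong T."""
    state, preds = init, []
    for actual in outcomes:
        preds.append("T" if state >= 2 else "N")
        state = min(3, state + 1) if actual == "T" else max(0, state - 1)
    return "".join(preds)


def accuracy(outcomes: str, preds: str) -> float:
    """Fraction of predictions that match the actual outcomes."""
    return sum(a == p for a, p in zip(outcomes, preds)) / len(outcomes)


if __name__ == "__main__":
    schemes = {
        "A. always taken": lambda o: always(o, "T"),
        "B. always not taken": lambda o: always(o, "N"),
        "C. 1-bit, init taken": one_bit,
        "D. 2-bit, init weakly taken": two_bit,
    }
    for name, predict in schemes.items():
        print(name)
        for branch, outcomes in BRANCHES.items():
            preds = predict(outcomes)
            print(f"  {branch}: {'-'.join(preds)}  ({accuracy(outcomes, preds):.0%})")
```

For Branch 3 (T-N-T-N-T-N), the 1-bit predictor is right only on the first outcome, while the saturating 2-bit counter predicts taken every time and is right half the time. This alternating pattern is the case the second bit is meant to help with.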

In this exercise, we make several assumptions. First, we assume that an N-issue superscalar processor can
execute any N instructions in the same cycle, regardless of their types. Second, we assume that every
instruction is independently chosen, without regard for the instruction that precedes or follows it. Third,
we assume that there are no stalls due to data dependences, that no delay slots are used, and that
branches execute in the EX stage of the pipeline. Finally, we assume that instructions executed in the
program are distributed as follows:
                             a.     b.
ALU                          50%    40%
Correctly predicted beq      18%    10%
Incorrectly predicted beq     2%     5%
lw                           20%    35%
sw                           10%    15%

a. What is the CPI achieved by a 2-issue static superscalar processor on this program?
b. In a 2-issue static superscalar processor that only has one register write port, what speedup is
achieved by adding a second register write port?
c. For a 2-issue static superscalar processor with a classic five-stage pipeline, what speedup is
achieved by making the branch prediction perfect?
d. Repeat part c, but for a 4-issue processor. What conclusion can you draw about the
importance of good branch prediction when the issue width of the processor is increased?
[15 points]
