Fall 2015
Instructor: Gandhi Puvvada
Friday, 9/25/2015 (A 2H 50M exam)
11:00 AM - 01:50 PM in THH201; 02:00 PM - 04:50 PM in THH101
Ques#          Topic                         Page#   Time       Points   Score
1                                            2-4     30 min.    40
2                                            4-5     30 min.    40
3              CPU Performance                       25 min.    30
4                                                    25 min.    34
5              Byte-addressable processors           20 min.    36
6              Single-Cycle CPU              9-10    30 min.    42
Total                                                160 min.   222
Perfect Score                                                   210
1 (40 points) 30 min.
1.1
In preparation for the next part of the question, we have given below a solved question which is
similar to, but simpler than, the question in the next part.
This is similar to your HW#1A problem of finding the largest number divisible by 7 among 16
8-bit unsigned numbers. Here, however, we do not need the largest number divisible by 7. We need
to copy the numbers in array A that are divisible by 5 to array B. Four-bit counters I and J are
the indexes into A and B. The solution below is complete. Please go through it.
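State diagram for Question 1.1 (solved)

Example array contents (first six locations):
A: 15, 35, 20, 14, 70, 0
B: 15, 35, 20, 70, 16, 16

INI
    I <= 0;
    J <= 0;
LOAD_A
    X <= A[I];
    if (A[I] == 5)
        {B[J] <= A[I];
        J <= J + 1;}
    if (A[I] <= 5)
        I <= I + 1;
    Transitions:
        (A[I] > 5)             to DIV_5
        (A[I] <= 5)(I != 15)   stay in LOAD_A
        (A[I] <= 5)(I == 15)   to DONE
DIV_5
    X <= X - 5;
    if (X == 5)
        {B[J] <= A[I];
        J <= J + 1;}
    if (X <= 5)
        I <= I + 1;
    Transitions:
        (X > 5)                stay in DIV_5
        (X <= 5)(I != 15)      to LOAD_A
        (X <= 5)(I == 15)      to DONE
DONE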
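The solved machine of Question 1.1 can be checked with a small clock-by-clock simulation. The Python sketch below is only an illustration, not part of the exam solution. It assumes that INI moves to LOAD_A without waiting for a Start signal, it models `B[J] <= A[I]; J <= J + 1;` as a list append (so the wrap-around of the 4-bit counter J is not modeled), and the function name `copy_multiples_of_5` is made up for this example.

```python
def copy_multiples_of_5(A):
    """Clock-by-clock simulation of the INI / LOAD_A / DIV_5 / DONE machine."""
    assert len(A) == 16, "the state machine scans exactly 16 entries"
    B = []          # B[J] <= A[I]; J <= J + 1 is modeled as an append
    state, I, X = "INI", 0, 0
    clocks = 0      # clocks spent in LOAD_A and DIV_5 only
    while state != "DONE":
        # Non-blocking style: compute next values from the current ones.
        next_state, next_I, next_X = state, I, X
        if state == "INI":
            next_I = 0
            next_state = "LOAD_A"   # assumption: no Start handshake
        elif state == "LOAD_A":
            clocks += 1
            next_X = A[I]
            if A[I] == 5:
                B.append(A[I])
            if A[I] <= 5:
                next_I = I + 1
                next_state = "DONE" if I == 15 else "LOAD_A"
            else:
                next_state = "DIV_5"
        elif state == "DIV_5":
            clocks += 1
            next_X = X - 5
            if X == 5:
                B.append(A[I])
            if X <= 5:
                next_I = I + 1
                next_state = "DONE" if I == 15 else "LOAD_A"
        state, I, X = next_state, next_I, next_X
    return B, clocks


# The six values shown in the figure, padded with zeros (the padding is an assumption).
example = [15, 35, 20, 14, 70, 0] + [0] * 10
print(copy_multiples_of_5(example)[0])  # [15, 35, 20, 70]
```

Note that 0 is never copied, because the machine only copies when the running value equals 5 exactly.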
State diagram for Question 1.2 (40 pts), narrated on the next page. Please complete it.

INI
    I <= 0;
    J <= 0;
    K <= 0;
LOAD_A
    X <= A[I];
    if (A[I] == 5)
        {B[J] <= A[I];
        J <= J + 1;}
    if (A[I] <= 5)
        I <= I + 1;
    Transition condition shown: (A[I] > 5)
DIV_5
    X <= X - 5;
    if (X == 5)
        {B[J] <= A[I];
        J <= J + 1;}
    if (X <= 5)
        I <= I + 1;
    Transition condition shown: (X > 5)
LOAD_B
    X <= B[J];
    if (B[J] == 7)
        {C[K] <= B[J];
        K <= K + 1;}
    if (B[J] <= 7)
        J <= J + 1;
    Transition condition shown: (B[J] > 7)
DIV_7
    X <= X - 7;
    if (X == 7)
        {C[K] <= B[J];
        K <= K + 1;}
    if (X <= 7)
        J <= J + 1;
    Transition condition shown: (X > 7)
DONE
1.2
Example array contents (first six locations):
A: 15, 35, 20, 14, 70, 0
B: 15, 35, 20, 70, 16, 16
C: 35, 70, 16, 16, 16, 16
Mr. Bruin took the completed state diagram of Q 1.1 and made
the (incomplete) state diagram on the previous page. He added
two states LOAD_B and DIV_7 which are similar to the
LOAD_A and DIV_5 states. He assumed a Jmax register and used it in the LOAD_B and DIV_7
states but did not know how and when to set it to the right value in the LOAD_A and DIV_5
states, nor how and when to reinitialize J to zero.
That is when you, Mr. Trojan, were called in to help. Please complete the state diagram. Here,
instead of the begin .. end of Verilog, we use curly braces {}. Also, as in Verilog, we assume
that in our state diagram a later assignment to a reg variable in a procedure overrides an
earlier assignment to the same reg variable.
2 (40 points) 30 min.
2.1
Variation of the min/max lab: In Part 3 Method 1 you tried to hold on to control in the CMx or
the CMn state, as you expected the data to be made up of either ascending or descending chunks.
Now we are told that most of the time the data oscillates, and it is best to give control to the
other party whether or not the current M[I] is useful to you. So there is no loop at all on any of
the 4 significant states. Of course, if the data is not useful to you (i.e. if M[I] < Max in the
CMx state or M[I] > Min in the CMn state), you should not increment the I counter, so that the
other party can take a look at the same data in the CMn_Second or CMx_Second state respectively.
The "_Second" suffix tells us that this state is the second one to look at the data, hence the
I counter is incremented unconditionally in the _Second states.
All needed states, state transition arrows, and RTL within each state are already completed in
the incomplete state diagram on the next page. Please complete the state transition conditions
on the next page.
2.2 (6 pts)
Mr. ___________ (Bruin/Trojan) says that in the Min/Max lab, if all 16 data items are the same
(identical, say 40H = 0100_0000B), it takes 16 clocks to process them in any of the 6 parts.
How many clocks does such data take in the above design? ___________.
Note: We do not count the INI and the DONE states when counting the clocks.
Incomplete state diagram for Question 2.1 (30+4 pts). Complete the state transition conditions.

INI (labels shown: Reset, Start)
    I <= 0;
LOAD
CMx: Compare with Max
CMn: Compare with Min
CMx_Second: Compare with Max
    I <= I + 1;
    if (M[I] >= Max)
        Max <= M[I];
CMn_Second: Compare with Min
    I <= I + 1;
    if (M[I] <= Min)
        Min <= M[I];
DONE
3 (30 points) 25 min.
A multi-cycle CPU has three types of instructions: Q_type (Quick_type), M_type (Medium_type),
and S_type (Slow_type). They are so named because the Quick type instruction takes the
fewest clocks to execute, whereas the Slow type instruction takes the most clocks
to execute. Find X and Y based on the information provided in the table below.
[Table, 14 pts. Columns: Q_type, M_type, S_type. Row labels include "Frequency of occurrence in the dynamic execution trace of the Benchmark". Values shown: 25, 40, 10, 50, 50, 10, 40.]
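The quantities in this question (clocks per instruction type, frequency of occurrence, and percentage of execution time) are tied together by one relation: the execution-time share of a type is its CPI times its frequency, divided by the sum of that product over all types. The Python sketch below illustrates the relation with made-up numbers. The CPI and frequency values and the function name `time_shares` are hypothetical and are not the exam's table entries.

```python
def time_shares(cpi, freq):
    """Return (average CPI, execution-time share per instruction type).

    cpi  maps type -> clocks per instruction
    freq maps type -> fraction of the dynamic instruction count (sums to 1)
    """
    weighted = {t: cpi[t] * freq[t] for t in cpi}
    average_cpi = sum(weighted.values())
    shares = {t: weighted[t] / average_cpi for t in cpi}
    return average_cpi, shares


# Hypothetical instruction mix, chosen only to keep the arithmetic simple.
cpi = {"Q": 2, "M": 4, "S": 8}
freq = {"Q": 0.5, "M": 0.25, "S": 0.25}

average_cpi, shares = time_shares(cpi, freq)
print(average_cpi)   # 4.0
print(shares)        # {'Q': 0.25, 'M': 0.25, 'S': 0.5}

# Changing one type's CPI by d clocks changes the average CPI by freq * d.
cpi_q_faster = dict(cpi, Q=cpi["Q"] - 1)
print(time_shares(cpi_q_faster, freq)[0])  # 3.5
```

In this made-up mix the most frequent type (Q) is not the one with the largest time share (S), which is why frequency and share have to be treated as separate quantities.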
If you were to improve the CPI of one instruction by one clock, you choose _______ (Q / M / S)
because of its ________ (A / B / C / D). Note: CPI improvement is different from speeding up by a factor.
A. highest number of clocks taken
B. highest Percentage of Execution time
C. highest Frequency of occurrence
D. other (you state here if you chose this) __________________________________________________
Your colleague offered to reduce the CPI of one of the three instructions by 2 clocks provided you agree
to increasing the CPI of the other two instructions by 1 clock each. Your detailed response?
(12 pts)
4 (34 points) 25 min.
4.1
X (X2X1X0) and Y (Y2Y1Y0) are 3-bit unsigned numbers. You need to compute S = X + Y - 8 and
produce the 4-bit signed result S (S3S2S1S0) represented in the 2's complement system. Can the
result fit in S for all possible X and Y? Explain. ______________________________________
____________________________________________________________________________
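A range question like this one can be checked by brute force, since there are only 64 input combinations. The Python sketch below enumerates every pair of 3-bit unsigned inputs and compares the extremes of X + Y - 8 with the range of a 4-bit 2's complement number. It is an illustrative check, not the written explanation the question asks for, and the helper name `fits_in_signed` is made up for this example.

```python
def fits_in_signed(value, bits):
    """True if value is representable in a bits-wide 2's complement number."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= value <= high


# All possible results of S = X + Y - 8 for 3-bit unsigned X and Y.
results = [x + y - 8 for x in range(8) for y in range(8)]

print(min(results), max(results))                  # -8 6
print(all(fits_in_signed(s, 4) for s in results))  # True
```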
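4.1.1
4.2 (2+6 pts)
[Figure: 3-bit ripple-carry adder made of three full-adder cells (inputs a, b, cin; outputs s, cout). Labels: X2 X1 X0, Y2 Y1 Y0, carries C0 to C3, sum bits S0 to S3.]
[Figure: 4-bit ripple-carry adder made of four full-adder cells (inputs a, b, cin; outputs s, cout). Labels: A3 to A0, B3 to B0, carries C0 to C4, outputs D0 to D4.]
4.3 (15 pts)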
You are given the first five bits of two 16-bit numbers below.
A = 1 1 0 0_1 X X X_X X X X_X X X X
B = 1 1 0 1_1 X X X_X X X X_X X X X
Among the upper five bits, only the highlighted bit (bit 12) differs. The lower 11 bits can be
any bits and may or may not match between A and B.
A. A and B are unsigned numbers. The higher is ________________ (A / B / can't tell).
B.1. A and B are signed numbers represented in the 2's complement system.
The higher is ________________ (A / B / can't tell).
B.2. A and B are signed numbers represented in the 1's complement system.
The higher is ________________ (A / B / can't tell).
B.3. A and B are signed numbers represented in the Sign-Magnitude system.
The higher is ________________ (A / B / can't tell).
C.1. The Sum S produced by adding A and B is _______________ (right / wrong / can't tell) if A, B, and
S are all unsigned numbers.
C.2. The Sum S produced by adding A and B is _______________ (right / wrong / can't tell) if A, B, and
S are all signed numbers represented in the 2's complement system.
D. If we use the Adder/Subtracter design of our Ch #4 class-notes to produce the difference D (D=A-B),
the Raw Carry C32 will be __________ (a 0 / a 1 / can't tell) and the V bit will be __________ (a 0 / a 1
/ can't tell). The difference D is __________ (right / wrong / can't tell) if A, B, D are all unsigned. The
difference D is __________ (right / wrong / can't tell) if A, B, D are all signed numbers in 2's complement.
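The same 16-bit pattern stands for different values in the four number systems named above. The Python sketch below decodes a bit string under each system. The helper names are made up for this example, and the unknown X bits are filled with zeros purely as an assumption for illustration. The exam leaves those bits unspecified, so this one sample does not by itself settle the "can't tell" cases.

```python
def unsigned(bits):
    return int(bits.replace("_", ""), 2)


def twos_complement(bits):
    b = bits.replace("_", "")
    value = int(b, 2)
    return value - (1 << len(b)) if b[0] == "1" else value


def ones_complement(bits):
    b = bits.replace("_", "")
    value = int(b, 2)
    return value - ((1 << len(b)) - 1) if b[0] == "1" else value


def sign_magnitude(bits):
    b = bits.replace("_", "")
    magnitude = int(b[1:], 2)
    return -magnitude if b[0] == "1" else magnitude


# Assumption: every unknown X bit is 0 in both numbers.
A = "1100_1000_0000_0000"
B = "1101_1000_0000_0000"

for decode in (unsigned, twos_complement, ones_complement, sign_magnitude):
    print(decode.__name__, decode(A), decode(B))
```

To probe the effect of the unknown bits, rerun the loop with other fills for the lower 11 bits, for example all ones in A and all zeros in B.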
5 (36 points) 20 min.
5.1
5.2
5.3
Intel follows ___________ (Little Endian / Big Endian) system. In the Intel 80486 processor
system address space, byte 0000_400FH is the ____________ (most / least) significant byte of the
32-bit word with system address ______________ (state in hexadecimal).
The 32-bit word 4000 consists of the four bytes 4000, 4001, 4002, and 4003 in ________________________________
(Little-Endian / Big-Endian / both kinds of / neither kind of) processor.
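Byte ordering can be made concrete by laying out one 32-bit value at four consecutive byte addresses under each convention. The Python sketch below uses the built-in `int.to_bytes` for this. The addresses, the value 0x11223344, and the helper names `byte_layout` and `containing_word` are all made up for illustration and are not taken from the exam.

```python
def byte_layout(word_address, value, byteorder):
    """Map each byte address of an aligned 32-bit word to the byte stored there."""
    assert word_address % 4 == 0, "word address must be a multiple of 4"
    data = value.to_bytes(4, byteorder)   # byteorder is "little" or "big"
    return {word_address + i: data[i] for i in range(4)}


def containing_word(byte_address):
    """Address of the aligned 32-bit word that holds the given byte."""
    return byte_address & ~0x3


value = 0x11223344   # 0x11 is the most significant byte, 0x44 the least

for order in ("little", "big"):
    layout = byte_layout(0x1000, value, order)
    print(order, {hex(a): hex(b) for a, b in layout.items()})

print(hex(containing_word(0x1007)))  # 0x1004
```

In the little-endian layout the least significant byte sits at the lowest address of the word. In the big-endian layout the most significant byte does.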
5.4
5.5
[Figure: memory chip interface for address decoding, with blanks to fill in. Labels: A[19:4], ____KB, A[ ], A31 to A20, WE, RD, CS, D[7:0], D[ ], BE15. Point labels: 9 pts, 11 pts.]
6.1
The data path on the next page is nearly complete. Complete the connections to the loose ends
marked with 9 arrows.
6.2 (24 pts)
Control Signal Table: Complete the four rows and four columns. Whenever possible, use don't cares.
Columns: Instruction, RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0, JUMP, Link, BZIM, BZIMAL
Rows: R-format, lw, sw, beq, Bzim, Bzimal, J, Jal
It is not difficult to get an A in EE457. You need to work for it and seek help from the 457 teaching team on whatever you do not understand. We are eager to help you.
The next three topics, pipelined CPU, cache and virtual memory are interesting and challenging too. They are the focus of the midterm exam. Then we cover advanced topics. Best!
Gandhi, TAs: Sanmukh, Jizhe, Fangzhou, Pezhman, Mentors: Heqing, Madhusudhan, Ninad, Srikar,
HW Graders: Sukruthi, Shunqing, Sailesh, Jingyu, Tushar, Lab graders: Goutam, Feng, Shuyuan, Yanjingtian, Fengle
[Figure for Question 6.1 (18 pts): single-cycle CPU data path with PC, instruction memory, register file (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2), sign extend (16 to 32), two shift-left-2 units, ALU with Zero and ZVC outputs, ALU control driven by the function field Instruction [5:0], data memory, adders, and multiplexers. Control unit outputs: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp, JUMP, Link, BZIM, BZIMAL. Other labeled signals: PCSrc, BZIM_SUCCESS, RegUpdate, Rt, JA (Jump Address).]