
ECSE 425 Computer Organisation and Architecture Group 7

Prof. Warren Gross Bénédicte Leonard-Cannon (260377592)


Tuesday, April 16, 2013 Payom Meshgin (260431193)

FINAL REPORT: N-BIT LOCAL BRANCH PREDICTION
INTRODUCTION
The objective of this project was to study the effect of a dynamic, local n-bit branch predictor on the
performance of a machine organised under the MIPS architecture, compared to that of a static
predict not-taken predictor. To simulate this machine, we used the EduMIPS64 software as a base.
As the simulator we had at our disposal did not feature a dynamic predictor, the source code of the
software has been modified to include our own. In addition, two different prediction algorithms were
implemented to determine the state of the dynamic predictor.
We were fairly successful in implementing the above features, although a few issues related to the
original simulator were discovered during the testing stage, as described in the Post Mortem section.
APPROACH
N-BIT PREDICTOR
The local n-bit predictor is the central modification to the simulator. The predictor selects a prediction
scheme (i.e. predict taken or predict not taken) based on the outcome of n previously encountered
branches - unlike its static counterpart which predicts the same scheme every time a branch
instruction is encountered.
Initially, our local n-bit predictor predicts a not-taken branch, i.e., it anticipates that the condition
under which the branch occurs will not be satisfied. The predictor can be set to one of two modes: a
consecutive n-level counter or an n-bit saturating counter, where n represents the number of bits in
the prediction.
In the first configuration, the prediction is left unchanged until n consecutive mispredictions occur, in
which case the prediction scheme switches. If the number of mispredictions, which is stored in a
counter, reaches n, the predictor changes its scheme and resets the counter to 0. Moreover, if a correct
prediction is made, the counter is also reset to 0. In other words, the predictor alternates between the
two schemes after a certain number of successive mispredictions.
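As an illustration, the consecutive n-level scheme described above could be sketched as follows. This is a minimal sketch and not the code of our simulator: the class name, the method names and the constructor argument are hypothetical, chosen only to mirror the description.

```java
// Illustrative sketch only: all names here are hypothetical.
final class ConsecutiveMispredictionPredictor {
    private final int n;                   // misses in a row needed to switch scheme
    private boolean predictTaken = false;  // initial scheme: predict not-taken
    private int mispredictions = 0;        // current run of consecutive mispredictions

    ConsecutiveMispredictionPredictor(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        this.n = n;
    }

    /** Current prediction scheme: true for predict taken. */
    boolean predictTaken() {
        return predictTaken;
    }

    /** Records the actual outcome of a branch and updates the scheme. */
    void update(boolean taken) {
        if (taken == predictTaken) {
            mispredictions = 0;            // a correct prediction resets the counter
        } else if (++mispredictions == n) {
            predictTaken = !predictTaken;  // n misses in a row: switch scheme
            mispredictions = 0;
        }
    }

    public static void main(String[] args) {
        ConsecutiveMispredictionPredictor p = new ConsecutiveMispredictionPredictor(2);
        p.update(true);                        // first miss, scheme unchanged
        System.out.println(p.predictTaken());  // prints "false"
        p.update(true);                        // second miss in a row, scheme switches
        System.out.println(p.predictTaken());  // prints "true"
    }
}
```

With n = 2, a single misprediction followed by a correct prediction leaves the scheme unchanged, since the counter is reset before it reaches n.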
In the saturating counter mode, the predictor can be in one of $2^n$ states. There are two boundary
states in the predictor: a strong predict taken (state $2^n - 1$) and a strong predict not-taken
(state 0). In
between these states lie a number of transitional states, which are traversed by a counter. For every
taken branch, this counter is incremented, while for every not-taken branch this counter is
decremented. If the counter value is in the upper half of its possible range (i.e. its most significant bit
is 1), the current branch is predicted taken. Conversely, if the value is in the lower half of the acceptable
range, the branch is predicted not-taken.
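The saturating counter can be sketched in the same way. Again this is only an illustration under stated assumptions: the names are hypothetical, and starting in state 0 is an assumption that is merely consistent with the initial not-taken prediction described earlier.

```java
// Illustrative sketch only: all names here are hypothetical.
final class SaturatingCounterPredictor {
    private final int max;      // 2^n - 1, the strong predict-taken state
    private final int msbMask;  // 2^(n-1), the most significant bit of an n-bit counter
    private int counter = 0;    // assumed start: state 0, strong predict not-taken

    SaturatingCounterPredictor(int n) {
        if (n < 1 || n > 30) {
            throw new IllegalArgumentException("n must be between 1 and 30");
        }
        this.max = (1 << n) - 1;
        this.msbMask = 1 << (n - 1);
    }

    /** Predict taken when the counter is in the upper half of its range (MSB set). */
    boolean predictTaken() {
        return (counter & msbMask) != 0;
    }

    /** Increments on a taken branch, decrements on a not-taken one, saturating at both ends. */
    void update(boolean taken) {
        if (taken) {
            if (counter < max) {
                counter++;
            }
        } else if (counter > 0) {
            counter--;
        }
    }

    /** Current state, between 0 and 2^n - 1. */
    int state() {
        return counter;
    }
}
```

For n = 2 this gives the familiar four-state scheme: two taken branches in a row are needed to leave the not-taken half when starting from state 0, and a single not-taken branch in state 3 does not flip the prediction.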
Figure 1: n-level misprediction counter (left) and saturating counter (right) schemes for n = 2
PREDICT TAKEN
Since our predictor must choose whether or not a branch is predicted taken, the simulator must be
able to behave correctly under each of these schemes. Unfortunately, the original simulator did not
include a static predict taken branch predictor, so much of the work was focused on adding this
feature. Simply put, under the predict taken scheme, the target instruction of the branch must be
fed into the IF stage of the pipeline, as opposed to the fall-through instruction in the not-taken case.
GUI
USER INTERFACE
A few additional controls were added to the GUI of the initial EduMIPS64 software. To facilitate
switching from the default static predictor to our predictor (and vice-versa), a checkbox was added to
the main settings tab of the simulator. When unchecked, the default predictor is enabled; when
checked, our predictor is enabled. Another checkbox selects between our n-level consecutive
misprediction predictor (when unchecked) and our n-bit saturating counter predictor
(when checked). Moreover, a text field was embedded into the same panel to set the
number of prediction bits, permitting us to quickly change this value during the testing phase.
Note that these changes only take effect once the OK button is clicked.
STATISTICS
To gather data on the performance of our simulator, extra statistics are displayed by the GUI. In
particular, the number of branch not-taken stalls and the number of branches encountered were added
to the list of stalls displayed in the statistics window of the simulator. The branch not-taken and
branch taken stall counts correspond to the number of stalls resulting from a misprediction under the
not-taken and taken schemes respectively, while the number of misprediction stalls is the sum of the two.
IMPLEMENTATION
N-BIT PREDICTOR
Our predictor was implemented in a class of its own: OurPredictor.java. This class contains a method
for updating the predictor status (updatePredictor(condition)), which is called in every subclass of
Instruction.java corresponding to a branch instruction (BEQ.java, BNE.java, etc.). Based on the current
mode of the predictor (n-level or saturating counter), the method updates the predictor's counter
and changes the prediction scheme according to the boolean variable condition, which indicates
whether the condition specified in the branch instruction was met.
PREDICT TAKEN
The IF stage of CPU.java has been modified to predict taken when advised to do so by our predictor. In
such a case, the offset of the current branch is fetched, then added to the program counter. Hence,
the next instruction that enters the IF stage is the branch target. Additionally, the program counter
of the branch fall-through instruction is stored so that, in case of a misprediction, RestoreIF.java
can feed it back into IF (see the next section for more details).
Additional code was implemented to restore the IF stage of the pipeline with the fall-through
instruction of a branch when a branch that was predicted taken turns out to be mispredicted.
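The fetch decision described above can be sketched as follows. This is not the code of CPU.java: the names are hypothetical, and the sketch assumes 4-byte instructions and a byte offset taken relative to the instruction after the branch, which may differ from how EduMIPS64 encodes offsets internally.

```java
// Illustrative sketch only: names, the 4-byte instruction size and the
// offset convention (bytes, relative to the instruction after the branch)
// are assumptions, not taken from the simulator.
final class FetchSketch {
    static final long INSTRUCTION_BYTES = 4;

    /** Fall-through address saved for recovery; -1 when nothing is pending. */
    private long savedFallThrough = -1;

    /** Returns the address of the next instruction to feed into IF. */
    long nextFetchAddress(long branchPc, long byteOffset, boolean predictTaken) {
        long fallThrough = branchPc + INSTRUCTION_BYTES;
        if (!predictTaken) {
            return fallThrough;            // predict not-taken: keep fetching sequentially
        }
        savedFallThrough = fallThrough;    // remember where to resume on a misprediction
        return fallThrough + byteOffset;   // predict taken: fetch the branch target
    }

    long savedFallThrough() {
        return savedFallThrough;
    }
}
```

Storing the fall-through address at prediction time means the recovery code does not have to recompute it from the branch instruction when the prediction turns out to be wrong.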
RESTORING THE IF STAGE
We have created an additional class named RestoreIF.java, which is called in the case of a
misprediction detected in one of the branch classes (BEQ.java, BNE.java, etc.). RestoreIF.java has two
functions: SchemeTaken, which is called when a taken prediction was wrong and SchemeNotTaken,
which corresponds to an erroneously predicted not-taken branch. The former restores the
fall-through address into the IF stage, while the latter feeds the branch target back into the IF stage.
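The recovery rule can be summarised by the sketch below. The two method names follow the description above, but their signatures, the sentinel value and the helper redirect are hypothetical and do not reproduce RestoreIF.java.

```java
// Illustrative sketch only: signatures, NO_REDIRECT and redirect() are hypothetical.
final class RestoreIfSketch {
    /** Sentinel meaning "the prediction was correct, nothing to restore". */
    static final long NO_REDIRECT = -1L;

    /** A taken prediction was wrong: refetch from the fall-through address. */
    static long schemeTaken(long fallThroughPc) {
        return fallThroughPc;
    }

    /** A not-taken prediction was wrong: refetch from the branch target. */
    static long schemeNotTaken(long targetPc) {
        return targetPc;
    }

    /** Address to feed back into IF after a branch resolves, or NO_REDIRECT. */
    static long redirect(boolean predictedTaken, boolean actuallyTaken,
                         long fallThroughPc, long targetPc) {
        if (predictedTaken == actuallyTaken) {
            return NO_REDIRECT;
        }
        return predictedTaken ? schemeTaken(fallThroughPc) : schemeNotTaken(targetPc);
    }
}
```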
GUI
To implement the changes described in the above section, three classes and one properties file had to
be altered: Instruction.java, Config.java, GUIConfig.java and MessagesBundle_en.properties. In
Instruction.java, boolean variables representing the GUI objects we implemented were added, as well
as their respective getter and setter methods. In Config.java, simulation parameters were added for
these objects. In GUIConfig.java, code for the buttons and text fields has been added. Finally, the
properties file was modified to display the text corresponding to our GUI objects on the main settings
tab.
STATISTICS
Methods to keep track of the statistics were implemented in the program (including the
non-functioning predicted taken stall and branch misprediction stall counters that are part of the
initial simulator). To implement these values, we created a variable for each type of stall we were
interested in, as well as getter and setter methods in CPU.java. The number of branches
encountered is incremented in every branch class (BEQ.java, BNE.java, etc.), while the numbers
of taken and not-taken stalls are incremented in our RestoreIF.java class, which is called on every
misprediction.
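The bookkeeping described above amounts to a few counters, sketched below. The class and method names are hypothetical and the number of stall cycles per misprediction is left as a parameter, since it depends on the pipeline configuration; only the relation "misprediction stalls = taken stalls + not-taken stalls" is taken from the Statistics section.

```java
// Illustrative sketch only: names are hypothetical; in the project the
// counters live in CPU.java behind getter and setter methods.
final class BranchStatsSketch {
    private int branches;        // branches encountered
    private int takenStalls;     // stalls from mispredictions under predict taken
    private int notTakenStalls;  // stalls from mispredictions under predict not-taken

    /** Called once per executed branch instruction. */
    void branchEncountered() {
        branches++;
    }

    /** Called when a predicted-taken branch was mispredicted. */
    void addTakenStalls(int cycles) {
        takenStalls += cycles;
    }

    /** Called when a predicted-not-taken branch was mispredicted. */
    void addNotTakenStalls(int cycles) {
        notTakenStalls += cycles;
    }

    int branches() {
        return branches;
    }

    /** Total misprediction stalls: the sum of the two per-scheme counters. */
    int mispredictionStalls() {
        return takenStalls + notTakenStalls;
    }
}
```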
RESULTS
We initially ran tests on relatively complex programs taken from the EduMIPS64 samples (mySqrt,
vet20parinum, etc.) to verify the correctness of the results obtained from the simulator (registers and
data). All programs returned the same values on both the original and our modified simulators,
confirming the correct operation of the modified version.
Then, we created our own set of simple tests to verify that the pipeline was functioning correctly based
on the expected number of clock cycles, branch mispredictions and other metrics we implemented.
These tests included static for loops and nested for loops, whose behaviour can be easily determined.
Finally, we modified the EduMIPS samples to extend the number of computations performed on these
programs to obtain more global data on programs with complex branching behaviour.
IMPLEMENTED TESTS
We have used the following test programs to observe the performance of the different configurations
of our predictors compared to the original static not-taken predictor:
1. Static for loop implemented using branch equal (BEQ);
2. Static for loop implemented using branch not equal (BNE);
3. Static nested for loops (2) implemented using BNE;
4. Static nested for loops (2) implemented using BEQ;
5. Extended EduMIPS sample: mins2 (finds the minimum of a vector);
6. Extended EduMIPS sample: isort (insertion sort of a vector);
7. Extended EduMIPS sample: mysqrt (identifies complete squares and computes their square
root);
8. Extended EduMIPS sample: vet20parinum (squares or subtracts 1 from a number depending
on whether or not the number is less than 20);
9. Extended EduMIPS sample: copyvet1_10 (copies a vector element into another vector for
elements between 1 and 10);
10. Extended EduMIPS sample: copyvet50disp (copies the inverse of a vector into another and
squares the entries that are even and greater than or equal to 50).
RESULTS OF TESTS
In all prediction configurations, all test programs saved correct data to the registers and
to the memory of our simulated machine. As for performance benchmarks, statistics were recorded
after every test to observe the performance of the simulator in different configurations. A summary
of our test results is given below; all raw data has been stored in an Excel spreadsheet
included in the project submission.
First, we quantified the performance of the predictors using the CPI (clock cycles per
instruction). As seen in Figure 2, the CPI reaches a minimum for n = 2 in both configurations,
with average CPIs of 1.751 and 1.732, representing improvements of 5.7% and 6.9%
respectively over the default static not-taken predictor. It is interesting to note that beyond
6 bits for the n-level consecutive predictor, or 8 bits for the saturating counter, the prediction
algorithms start behaving similarly to the static predict not-taken predictor.
Next, we looked at the misprediction rate returned by the simulator after running each test.
Averaging the misprediction rates of all programs run under the same configuration yields the plot
in Figure 3. Once again, the optimal number of prediction bits is n = 2, which yields misprediction
rates of 15.1% for the consecutive n-level mode and 15.4% for the n-bit saturating counter mode.
Both configurations were more accurate than the static predictor, which had a misprediction rate
of 39.5%.
Finally, the last metric of note, memory size, was not measured in the code but can be determined
quite easily: as the number of prediction bits in a particular configuration of the predictor
increases, so do the hardware requirements (bigger counters).


COMMENT ON THE TESTS
Based on our tests, we determined that our modifications did improve the performance of the
simulator for small values of n. Moreover, on average, optimal results were observed for n = 2, which
is consistent with the theory discussed in class that 2-bit prediction delivers the most performance gain.
Figure 2: Average CPI of test programs over number of prediction bits (series: Static Predict Not Taken, N-bit Saturating Counter, N-level consecutive)
Figure 3: Misprediction rate over number of prediction bits (series: Static Predict Not Taken, N-bit Saturating Counter, N-level consecutive)
POST-MORTEM
SUCCESSFUL IMPLEMENTATION
According to the tests we ran to compare the data and register values obtained after running
several samples on our simulator and on the original one, our two predictors do not affect the
programs' final outputs. Moreover, we calculated the number of mispredictions (taken and not taken)
expected for simple programs, which coincided with the values returned by the simulator,
with two exceptions caused by simulator bugs (see the Problems with EduMIPS section). Therefore, we
can reasonably conclude that our predictor has been successfully implemented.
As we expected, a value of n equal to two corresponded to the optimal predictor in most
cases, compared to other values of n and to predict not-taken. We also observed that, in general,
the performance of our predictor decreased as n increased past two and converged towards
that of the predict not-taken scheme. This result is consistent with the fact that as n increases, the
probability of changing scheme decreases; the higher the value of n, the more statically the predictor behaves.
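The convergence towards static behaviour can be illustrated on a toy trace. The sketch below is not one of our test programs: it assumes a single loop branch that is taken on every iteration except the last, an n-bit saturating counter that starts in state 0, and hypothetical names throughout.

```java
// Illustrative sketch only: a hypothetical single-loop trace, not project data.
final class ConvergenceSketch {
    /**
     * Mispredictions of an n-bit saturating counter (starting at state 0) on a
     * loop branch taken (iterations - 1) times and then not taken once.
     */
    static int mispredictions(int n, int iterations) {
        int max = (1 << n) - 1;
        int msb = 1 << (n - 1);
        int counter = 0;
        int misses = 0;
        for (int i = 0; i < iterations; i++) {
            boolean taken = i < iterations - 1;
            boolean predictedTaken = (counter & msb) != 0;
            if (predictedTaken != taken) {
                misses++;
            }
            counter = taken ? Math.min(max, counter + 1) : Math.max(0, counter - 1);
        }
        return misses;
    }

    public static void main(String[] args) {
        // For a 10-iteration loop this prints 2, 3, 5, 9, 9, 9 for n = 1..6.
        for (int n = 1; n <= 6; n++) {
            System.out.println("n=" + n + ": " + mispredictions(n, 10));
        }
    }
}
```

On this trace a static predict not-taken scheme mispredicts the 9 taken iterations. For n of 5 or more the counter never reaches the upper half of its range within 10 iterations, so the dynamic predictor makes exactly the same predictions as the static one. A single short loop is of course not representative of the averaged results reported above.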
PROBLEMS WITH EDUMIPS
After running test3.s and test4.s, which each contain two nested loops, we realized that our
predictor was not returning the number of stalls that we were expecting based on our calculations.
We then looked at the cycles displayed on the GUI and noticed very unusual program behaviour
(superimposed IF and ID stages), as shown in Figure 4. To ensure that our modifications were not the
cause of this bug, we ran the same two test files on the original simulator and obtained the same
results. Therefore, we concluded that the EduMIPS64 simulator was unreliable and could cause errors
in our test results.
FUTURE IMPROVEMENTS
For this project, we supplemented the EduMIPS64 simulator with an n-bit dynamic predictor. We ran
various tests on our implementation and observed that optimal performance occurred with two
predictor bits, conforming to the theory.
In the middle of the project, we had contemplated modifying our predictor so that it would work with
forwarding enabled. However, due to time constraints, as well as the anomalous behaviour shown in
Figure 4, we decided against it.
Moreover, it would not be difficult to implement more complex dynamic branch predictors such as a
correlating predictor or a tournament predictor. The only required modification is that the program
counter of the current branch instruction would have to be included as a parameter of these two
predictors.
In the end, we enjoyed tinkering with the simulator and observing the effect of our individual
modifications. The problems encountered due to EduMIPS64 notwithstanding, we gained a deeper
familiarity with the MIPS 5-stage pipeline.
Figure 4: abnormal behavior caused by running test3.s on the original simulator
