Robocode Project
Introduction
To pass EECE 592 you are required to submit a piece of coursework. Your submission
will be marked, and the mark will represent your result for this course. In brief, the
coursework will ask you to implement the Temporal Difference and Backpropagation
learning algorithms taught during the course.
This year, we are going to change the flavour of the coursework a little and venture into
an area that is regarded as somewhat of a research topic! Hopefully this will make it even
more fun. Read on.
Robocode
Robocode is an exciting, interactive environment designed by IBM, originally to promote
Java. However, it has since become a popular tool for the exploration of topics in artificial
intelligence (AI), including neural networks and reinforcement learning (RL). In this
assignment, it is hoped that you will be able to gain hands-on experience of these areas of
AI whilst having fun developing your own intelligent robot tank using the Robocode
environment!
Please read the following problem statement carefully.
S. Sarkaria 2008
Report Guide
As mentioned earlier, function approximation of an RL value or Q-function is in fact a
topic of research (for example, see http://rlai.cs.ualberta.ca/RLAI/RLFA.html). That is,
there is no clear or well-defined solution guaranteed to work in all cases. In fact,
successful application of function approximation to RL is a delicate art! There is, then,
obviously no single correct answer that I will be looking for. Your understanding
and expertise, expressed through your report, will be key to attaining a good mark.
Your report should be well structured, written clearly and demonstrate an understanding
of the backpropagation and reinforcement learning paradigms. For each of these, it
should describe the problem being addressed and provide an analysis of how that learning
mechanism was applied. It is important that you describe how your solution was
evaluated and offer a conclusion. Pay attention to your results and be scientific in
evaluating your solution.
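As a concrete reference point for the value- or Q-function being approximated, the core update rule can be sketched in Java (the language Robocode robots are written in). This is a minimal, illustrative sketch of the tabular Q-learning rule only; the class name, learning rate and discount factor are illustrative assumptions, not part of any required design:

```java
// Illustrative sketch of the tabular Q-learning update; ALPHA and GAMMA
// are example values, not prescribed settings.
class QLearningSketch {
    static final double ALPHA = 0.1;  // learning rate
    static final double GAMMA = 0.9;  // discount factor

    // Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    static double update(double q, double reward, double[] nextStateQ) {
        double maxNext = Double.NEGATIVE_INFINITY;
        for (double v : nextStateQ) maxNext = Math.max(maxNext, v);
        return q + ALPHA * (reward + GAMMA * maxNext - q);
    }
}
```

The research question your report should address is precisely what happens when the table of Q(s,a) entries above is replaced by a trained function approximator.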
To help you, the following set of questions provides a guide for what your report should
contain and how it will be marked. Try to be as thorough and clear as possible with your
answers. The answers to these questions are not unique. You'll be judged based on what
you can deduce from your experiments and how well you understand the theory. Please
also format your report so that each question appears, as written below, with your
answer following it.
Important Note: I expect the entire report to be written IN YOUR OWN WORDS. In
previous years, students have been penalized for paragraphs which
were copied, verbatim, from other assignments done in either the
current or previous years.
Section 3 - BackPropagation
5) Describe your application of BP to Robocode.
a) Describe in your own words, how the backpropagation algorithm works.
b) Describe your results, indicating how well backpropagation learned the
training set.
6) Discuss the number of input, hidden and output units used (assume one hidden
layer).
a) How many hidden nodes did you use? Why?
b) Does the number of hidden nodes matter?
c) How long did the algorithm take to converge using different numbers of
hidden nodes? Provide a graph of the number of training epochs vs. the
number of hidden nodes (it is enough to test a few values). What can you conclude?
d) Note that there are bias nodes present in both input and hidden layers. Are
they necessary?
7) Convergence
a) What do you use as a stopping criterion? That is, what mechanism do you
use to decide when the network training is good enough?
b) How long did learning take to converge under the optimal conditions (in terms
of the number of epochs)?
c) What is overfitting and what are possible ways to avoid it?
8) Overall Conclusions
a) Did your robot perform as you might have expected? What insights are you
able to offer with regard to the practical issues surrounding the application of
RL & BP to your problem? E.g., did the RL algorithm converge? Was
the problem being solved linear or non-linear?
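To make the terms in questions 5-7 concrete (hidden units, bias weights, training steps, stopping criteria), here is a minimal, illustrative one-hidden-layer backpropagation network in Java. The network size, learning rate and class name are assumptions for illustration only, not the required Robocode solution:

```java
import java.util.Random;

// Illustrative sketch: one hidden layer of sigmoid units trained by plain
// gradient-descent backpropagation. Sizes and learning rate are example
// choices, not prescribed values.
class BackpropSketch {
    final int nIn, nHid;
    final double[][] wHid;   // [nHid][nIn+1]; last column is the bias weight
    final double[] wOut;     // [nHid+1]; last entry is the bias weight
    final double rate = 0.5; // learning rate (illustrative)
    final Random rng = new Random(42);

    BackpropSketch(int nIn, int nHid) {
        this.nIn = nIn; this.nHid = nHid;
        wHid = new double[nHid][nIn + 1];
        wOut = new double[nHid + 1];
        for (double[] row : wHid)
            for (int i = 0; i < row.length; i++) row[i] = rng.nextDouble() - 0.5;
        for (int i = 0; i < wOut.length; i++) wOut[i] = rng.nextDouble() - 0.5;
    }

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    double[] hidden(double[] x) {
        double[] h = new double[nHid];
        for (int j = 0; j < nHid; j++) {
            double s = wHid[j][nIn];               // bias contribution
            for (int i = 0; i < nIn; i++) s += wHid[j][i] * x[i];
            h[j] = sigmoid(s);
        }
        return h;
    }

    double forward(double[] x) {
        double[] h = hidden(x);
        double s = wOut[nHid];                     // bias contribution
        for (int j = 0; j < nHid; j++) s += wOut[j] * h[j];
        return sigmoid(s);
    }

    // One backpropagation step on a single pattern; returns the squared error.
    double train(double[] x, double target) {
        double[] h = hidden(x);
        double s = wOut[nHid];
        for (int j = 0; j < nHid; j++) s += wOut[j] * h[j];
        double y = sigmoid(s);
        double deltaOut = (target - y) * y * (1 - y);  // output-layer delta
        for (int j = 0; j < nHid; j++) {
            // hidden delta uses the OLD output weight, so compute it first
            double deltaHid = deltaOut * wOut[j] * h[j] * (1 - h[j]);
            wOut[j] += rate * deltaOut * h[j];
            for (int i = 0; i < nIn; i++) wHid[j][i] += rate * deltaHid * x[i];
            wHid[j][nIn] += rate * deltaHid;           // hidden bias weight
        }
        wOut[nHid] += rate * deltaOut;                 // output bias weight
        return (target - y) * (target - y);
    }
}
```

A stopping criterion of the kind asked about in question 7a could, for example, halt training once the total squared error returned by `train` over one pass through the training set falls below a chosen threshold.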
References:
1. Fausett, L. (1994). Fundamentals of Neural Networks: Architectures, Algorithms and Applications. Prentice Hall.
2. Sutton, R.S., and Barto, A.G. (1998). Reinforcement Learning: An Introduction. The MIT Press.
3. Li, S. (2002). Rock 'em, Sock 'em Robocode! http://www-128.ibm.com/developerworks/java/library/j-robocode/
4. Reinforcement Learning & Function Approximation group at the University of Alberta.
   http://rlai.cs.ualberta.ca/RLAI/RLFA.html
Acknowledgments:
My friend and colleague Julian Rendell, for bringing Robocode to my attention and suggesting its use as a course
project.