
EECE 592

Robocode Project

Introduction
To pass EECE 592 you are required to submit a piece of coursework. Your submission
will be marked and the mark will represent your result for this course. In brief, the
coursework will ask you to implement the Temporal Difference and Backpropagation
learning algorithms that are taught during the course.
This year, we are going to change the flavour of the coursework a little and venture into
an area that is regarded as somewhat of a research topic! Hopefully this will make it even
more fun. Read on.
Robocode
Robocode is an exciting, interactive environment designed by IBM, originally to promote
Java. However, it has since become a popular tool for the exploration of topics in artificial
intelligence (AI), including neural networks and reinforcement learning (RL). In this
assignment, it is hoped that you will be able to gain hands-on experience of these areas of
AI whilst having fun developing your own intelligent robot tank using the Robocode
environment!
Please read the following problem statement carefully.

Problem Statement and Deliverables


You are required to hand in a written report, the size of which should be consistent with
that expected for a 3-credit course. The source code for your robot tank must be included
in an appendix of the report, but will not be marked.
The report must be submitted as softcopy via email and should be in Microsoft Word
format. Please also submit a PDF version of the report. The softcopies will be marked
and returned via email. Remarks will be inserted using Word's commenting feature.
Your work must demonstrate the practical application of both the backpropagation
learning algorithm (BP) and reinforcement learning (RL). Both topics are covered in the
course. You should apply these methods as suggested below.
Application of RL
At the start of a battle, your tank will be faced with 4 enemies in the battlefield.
Your bot must act fast to defend itself but needs to attack and kill to earn more
points. What do you do? Which target do you attack? Which is the easiest to
eliminate? How do you make this selection? Basically, which opponent, if fired at, are
you most likely to hit? Target selection is an area where RL may help to learn
the states that will most likely result in a positive hit. It is suggested that you keep your
actions high-level, e.g. attack, retreat, pursue, rather than, say, move
forward, turn left, etc.
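
To make the suggestion concrete, the sketch below (in Java, the language Robocode robots are written in) shows one possible shape for such a learner: a small table of Q-values indexed by a coarse, discretised state and a handful of high-level actions. All names, sizes and rewards here are illustrative assumptions, not part of the assignment.

    // Illustrative only: a tiny Q-table for learning which high-level behaviour
    // to take against which target.
    public class TargetSelection {

        // High-level actions, as suggested above, rather than low-level motor commands.
        enum Action { ATTACK, RETREAT, PURSUE }

        // A coarse, discretised state index (how you build it is part of your design;
        // e.g. bins over target distance and relative energy).
        static final int NUM_STATES = 9;

        // Q(s, a): the learned estimate of how good action a is in state s.
        final double[][] q = new double[NUM_STATES][Action.values().length];

        // Greedy choice: the action currently believed best in this state.
        Action bestAction(int state) {
            int best = 0;
            for (int a = 1; a < Action.values().length; a++) {
                if (q[state][a] > q[state][best]) best = a;
            }
            return Action.values()[best];
        }

        // A reward signal might be +1 when one of your bullets hits the chosen
        // target and -1 when you are hit; the table is then updated by the TD
        // rule discussed in the report guide below.
    }
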
Application of BP
A key component of RL is the value function, or Q-function in our case. One of
the easiest ways to implement this function is as a look-up table. However, in any
real-world problem the state space is likely to be prohibitively large; for
backgammon, for example, there are about 10^20 different states! In this case such a
look-up table is clearly not tractable. The key is the ability to approximate the value
function, and there are many ways of doing so. For this assignment you are to implement
a feed-forward multi-layer perceptron, trained via backpropagation, to approximate the
value function. There are at least a couple of ways to do this: (A) the network is trained
during RL training itself, or (B) the RL training is implemented using a look-up
table and the contents of the table are then used to train the network, which would
then be used by the RL algorithm during battle. Other approaches may exist too. This
is the research element of the assignment!
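
As an illustration of approach (B), the sketch below replays the contents of a learned look-up table as a supervised training set for a network. The QFunctionApproximator interface and the one-hot encoding are assumptions made for this sketch only; the network behind the interface would be the multi-layer perceptron you implement yourself.

    // Illustrative sketch of approach (B): the look-up table produced by RL
    // training is replayed as a supervised training set, and the trained
    // network then stands in for the table during battle.
    public class OfflineQTraining {

        // A network you would implement yourself (the feed-forward MLP trained
        // by backpropagation required by this assignment).
        interface QFunctionApproximator {
            void train(double[] input, double[] target);   // one supervised update
            double[] forward(double[] input);              // approximate Q(s, a)
        }

        // Replay every (state, action) entry of the table as an input/target pair.
        static void trainFromTable(double[][] qTable, QFunctionApproximator net, int epochs) {
            int numStates = qTable.length;
            int numActions = qTable[0].length;
            for (int e = 0; e < epochs; e++) {
                for (int s = 0; s < numStates; s++) {
                    for (int a = 0; a < numActions; a++) {
                        net.train(encode(s, a, numStates, numActions),
                                  new double[] { qTable[s][a] });
                    }
                }
            }
            // During battle the RL algorithm would read net.forward(encode(s, a, ...))
            // instead of qTable[s][a].
        }

        // One-hot encoding of a discrete (state, action) pair as network input.
        static double[] encode(int s, int a, int numStates, int numActions) {
            double[] x = new double[numStates + numActions];
            x[s] = 1.0;
            x[numStates + a] = 1.0;
            return x;
        }
    }
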


Report Guide
As mentioned earlier, function approximation of an RL value or Q-function is in fact a
topic of research (for example, see http://rlai.cs.ualberta.ca/RLAI/RLFA.html). That is,
there is no clear or well-defined solution guaranteed to work in all cases. In fact,
successful application of function approximation to RL is a delicate art! There is,
then, obviously no single correct answer that I will be looking for. Your understanding
and expertise expressed through your report will be key to attaining a good mark.
Your report should be well structured, written clearly and demonstrate an understanding
of the backpropagation and reinforcement learning paradigms. For each of these, it
should describe the problem being addressed and provide an analysis of how that learning
mechanism was applied. It is important that you describe how your solution was
evaluated and offer a conclusion. Pay attention to your results and be scientific in
evaluating your solution.
To help you, the following set of questions provides a guide for what your report should
contain and how it will be marked. Try to be as thorough and clear as possible with your
answers. The answers to these questions are not unique. You'll be judged based on what
you can deduce from your experiments and how well you understand the theory. Please
also format your report such that each question appears, as written below, with your
answer following it.
Important Note: I expect the entire report to be written IN YOUR OWN WORDS. In
previous years, students have been penalized for paragraphs that
were copied, verbatim, from other assignments done in either the
current or previous years.


Section 1 - Data Representation


1) Robocode provides a wealth of data for your tank.
a) How did you encode the data in your project? In your answer, consider
whether the data is Boolean, nominal or categorical. (One possible encoding of
some of the raw data is sketched after these questions.)
b) In general, why are high dimensional data sets problematic? Describe what
kinds of problems this causes for BP. Describe what kinds of issues this
causes for RL.
c) When applying function approximation to RL, what is it about the nature of
the value function that may affect the level of success? E.g. think about why
TD-Gammon is so successful given that its state space is an astronomical 10^20.
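
As a concrete, purely illustrative example of what "encoding" might mean here, the sketch below quantises a few of the continuous values Robocode reports into a single nominal state index. The particular values chosen and the bin boundaries are arbitrary assumptions, not part of the assignment.

    // Illustrative only: one way to quantise continuous battle data into a
    // small discrete state. The bin boundaries are arbitrary examples.
    public class StateEncoder {

        // 3 distance bins x 3 energy-difference bins -> 9 discrete states.
        public static int encode(double enemyDistance, double myEnergy, double enemyEnergy) {
            int distanceBin = enemyDistance < 200 ? 0 : (enemyDistance < 500 ? 1 : 2);

            double diff = myEnergy - enemyEnergy;
            int energyBin = diff < -20 ? 0 : (diff <= 20 ? 1 : 2);

            return distanceBin * 3 + energyBin;   // a single nominal state index
        }

        public static void main(String[] args) {
            // Example: a close enemy with roughly equal energy.
            System.out.println(encode(150.0, 80.0, 75.0));   // prints 1
        }
    }
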
Section 2 - Reinforcement Learning
2) Describe your application of RL to Robocode
a) Describe what task you tried to optimize using RL in terms of states and
actions. How big is your state space?
b) Describe in detail how you applied BP to approximate the RL value function.
Did you use approach (A) or (B) as described in Application of BP
above, or some other approach? What are the problems and advantages of
each approach? (You might want to read up on some of the literature in this
area to help you).
c) Describe the performance of your robot due to RL training. You will need to
devise a way to do this. For example, use one or more other robots as
benchmarks and show how well your robot did in battle after different levels
of training. Graphing your results may help.
3) In RL, a policy is realized by a value function.
a) Describe what the term policy generally means in RL.
b) The value function implements a mapping from perceived states or state-action
pairs to actions. Describe in your own words what this value function
actually represents. Describe in your own words what TD learning is doing.
c) What is the difference between continuous and episodic tasks? Also what is
the difference between deterministic and non-deterministic tasks? Which of
these does Robocode fall into?
d) Describe why Q-learning, rather than V-learning, must be used.
e) In Robocode, what do you think would happen if you wait for rewards only in
terminal states?
4) While training via RL, the next move is selected randomly with probability ε and
greedily with probability 1 − ε. (A sketch of this selection rule, together with the
Q-learning update it drives, follows the questions in this section.)
a) How well does your robot perform when moves are greedy only?
b) Explain why exploratory moves are necessary.
c) What is the optimal value for ε? Provide a graph of the measured performance
of your tank vs ε.
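
The sketch below shows what one such learning step might look like in Java: an ε-greedy choice of action followed by the standard Q-learning update Q(s,a) ← Q(s,a) + α(r + γ·max_a' Q(s',a') − Q(s,a)). The constants and table sizes are example values only, not recommended settings.

    // Illustrative sketch of one learning step: epsilon-greedy action selection
    // followed by the Q-learning (off-policy TD) update.
    import java.util.Random;

    public class QLearningStep {

        static final double ALPHA   = 0.1;   // learning rate (example value)
        static final double GAMMA   = 0.9;   // discount factor (example value)
        static final double EPSILON = 0.1;   // exploration probability (example value)

        final double[][] q;                  // Q-table indexed by [state][action]
        final Random rng = new Random();

        QLearningStep(int numStates, int numActions) {
            q = new double[numStates][numActions];
        }

        // With probability EPSILON explore (random action); otherwise exploit (greedy).
        int selectAction(int state) {
            if (rng.nextDouble() < EPSILON) {
                return rng.nextInt(q[state].length);
            }
            return argMax(q[state]);
        }

        // Q-learning update after observing reward r and next state s'.
        void update(int state, int action, double reward, int nextState) {
            double tdTarget = reward + GAMMA * q[nextState][argMax(q[nextState])];
            q[state][action] += ALPHA * (tdTarget - q[state][action]);
        }

        static int argMax(double[] values) {
            int best = 0;
            for (int i = 1; i < values.length; i++) {
                if (values[i] > values[best]) best = i;
            }
            return best;
        }
    }
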


Section 3 - Backpropagation
5) Describe your application of BP to Robocode.
a) Describe, in your own words, how the backpropagation algorithm works. (A
minimal sketch appears after the questions in this section.)
b) Describe here your results to indicate how well the backpropagation learned
the training set.
6) Discuss the number of input, hidden and output units used (assume one hidden
layer).
a) How many hidden nodes did you use? Why?
b) Does the number of hidden nodes matter?
c) How long did the algorithm take to converge using different numbers of
hidden nodes? Provide a graph of the number of training epochs vs number of
hidden nodes (it's enough to test a few values). What can you conclude?
d) Note that there are bias nodes present in both input and hidden layers. Are
they necessary?
7) Convergence
a) What do you use as a stopping criterion? That is, what mechanism
do you use to decide when the network training is good enough?
b) How long did learning take to converge under the optimal conditions (in terms
of the number of epochs)?
c) What is overfitting and what are possible ways to avoid it?
8) Overall Conclusions
a) Did your robot perform as you might have expected? What insights are you
able to offer with regard to the practical issues surrounding the application of
RL & BP to your problem? E.g. did the RL learning algorithm converge? Was
the problem being solved linear or non-linear?
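
For reference, the sketch below is a minimal, self-contained example of the kind of network questions 5-7 are about: one hidden layer of sigmoid units, trained by backpropagation, with a simple error-threshold stopping criterion, demonstrated on XOR as a sanity check. Layer sizes, the learning rate and the threshold are example values, not recommendations.

    // Illustrative only: a one-hidden-layer perceptron trained by backpropagation.
    import java.util.Random;

    public class SimpleBackprop {

        final int nIn, nHid, nOut;
        final double[][] wIH, wHO;      // weights; the last entry of each row is the bias weight
        final double rate = 0.5;        // learning rate (example value)
        final Random rng = new Random();

        double[] hidden, output;        // activations kept from the last forward pass

        SimpleBackprop(int nIn, int nHid, int nOut) {
            this.nIn = nIn; this.nHid = nHid; this.nOut = nOut;
            wIH = randomMatrix(nHid, nIn + 1);   // +1 for the bias weight
            wHO = randomMatrix(nOut, nHid + 1);
        }

        double[][] randomMatrix(int rows, int cols) {
            double[][] m = new double[rows][cols];
            for (double[] row : m)
                for (int j = 0; j < cols; j++) row[j] = rng.nextDouble() - 0.5;
            return m;
        }

        static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

        // Forward pass through one hidden layer of sigmoid units.
        double[] forward(double[] x) {
            hidden = new double[nHid];
            for (int h = 0; h < nHid; h++) {
                double sum = wIH[h][nIn];                        // bias
                for (int i = 0; i < nIn; i++) sum += wIH[h][i] * x[i];
                hidden[h] = sigmoid(sum);
            }
            output = new double[nOut];
            for (int o = 0; o < nOut; o++) {
                double sum = wHO[o][nHid];                       // bias
                for (int h = 0; h < nHid; h++) sum += wHO[o][h] * hidden[h];
                output[o] = sigmoid(sum);
            }
            return output;
        }

        // One backpropagation step for a single pattern; returns its squared error.
        double train(double[] x, double[] target) {
            forward(x);
            double[] deltaOut = new double[nOut];
            double error = 0.0;
            for (int o = 0; o < nOut; o++) {
                double e = target[o] - output[o];
                error += e * e;
                deltaOut[o] = e * output[o] * (1.0 - output[o]); // times sigmoid derivative
            }
            double[] deltaHid = new double[nHid];
            for (int h = 0; h < nHid; h++) {
                double back = 0.0;
                for (int o = 0; o < nOut; o++) back += deltaOut[o] * wHO[o][h];
                deltaHid[h] = back * hidden[h] * (1.0 - hidden[h]);
            }
            for (int o = 0; o < nOut; o++) {                     // hidden-to-output weights
                for (int h = 0; h < nHid; h++) wHO[o][h] += rate * deltaOut[o] * hidden[h];
                wHO[o][nHid] += rate * deltaOut[o];
            }
            for (int h = 0; h < nHid; h++) {                     // input-to-hidden weights
                for (int i = 0; i < nIn; i++) wIH[h][i] += rate * deltaHid[h] * x[i];
                wIH[h][nIn] += rate * deltaHid[h];
            }
            return error;
        }

        public static void main(String[] args) {
            double[][] in  = { {0, 0}, {0, 1}, {1, 0}, {1, 1} };
            double[][] out = { {0}, {1}, {1}, {0} };
            SimpleBackprop net = new SimpleBackprop(2, 4, 1);
            int epoch = 0;
            double totalError;
            do {                                                 // example stopping criterion:
                totalError = 0.0;                                // total squared error < 0.01
                for (int p = 0; p < in.length; p++) totalError += net.train(in[p], out[p]);
                epoch++;
            } while (totalError > 0.01 && epoch < 100000);
            System.out.println("Stopped after " + epoch + " epochs, error = " + totalError);
        }
    }
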
References:
1. Fausett, L. (1994). Fundamentals of Neural Networks: Architectures, Algorithms and Applications. Prentice Hall.
2. Sutton, R. S., and Barto, A. G. (1998). Reinforcement Learning. The MIT Press.
3. Li, S. (2002). Rock 'em, sock 'em Robocode! IBM developerWorks. http://www-128.ibm.com/developerworks/java/library/j-robocode/
4. Reinforcement Learning and Function Approximation group at the University of Alberta. http://rlai.cs.ualberta.ca/RLAI/RLFA.html

Acknowledgments:
My friend and colleague Julian Rendell, for bringing Robocode to my attention and suggesting its use as a course
project.

S. Sarkaria 2008
