
Using a Genetic Algorithm to Evolve Rule Based Agents

to Play a Simple Strategy Game

by
William L. Johnson

for
CS 4633, Assignment #4
Abstract

A genetic algorithm is used to evolve script-like agents to play a real-time strategy (RTS) game. The GA is found to be successful at finding simple, effective strategies, but does not easily evolve more complex behavior.

Introduction

This project explored the viability of evolving rule-based agents to play a simple game. Each agent consisted of n commands, where each command was evaluated each turn of game play. Each command was given a probability and an enable/disable flag. When a command was evaluated, it would check whether it was enabled and then test its probability against a uniformly distributed random number. The command was executed only if both conditions were met.
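The per-turn evaluation rule can be sketched as follows. This is a minimal illustration; the class and method names are not taken from the project's source.

```java
import java.util.Random;

// Sketch of the per-turn command evaluation described above.
public class CommandDemo {
    // A command fires only if it is enabled AND a uniform random
    // draw falls below its probability.
    public static boolean shouldExecute(boolean enabled, double probability, Random rng) {
        return enabled && rng.nextDouble() < probability;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        // A disabled command never executes, regardless of probability.
        System.out.println(shouldExecute(false, 1.0, rng)); // false
        // An enabled command with probability 1.0 always executes.
        System.out.println(shouldExecute(true, 1.0, rng));  // true
    }
}
```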

These agents were matched against each other in a simple real-time strategy (RTS) game created by the author, called Melete's Game. The game involved a small number of game regions arranged in a four-by-four grid, with each region containing two zones. Two types of units could be built and used to play the game: a gatherer unit to collect resources, and a combat unit that could attack and defend against other units. Each agent could issue commands to build units, move units between regions and zones, target individual regions, select groups of regions, and modify the flags and variables that controlled the agent's operation.

To evolve better agents, an agent was randomly generated and then used to create a pool of child agents. These children were generated using a standard crossover and mutation scheme. Each child played a game against the parent, and the score of each match was recorded. The top x percent of the tested agents were selected to create a new generation via normal genetic reproduction. This process was repeated until the population stopped improving. The agent which had achieved the highest score was then selected as the new opponent, and the cycle was repeated. These high-scoring agents were saved to a file as they were identified. After a sufficient number of agents had been created this way, the selection algorithm was modified to play games against both the latest winner and a random selection of previous winners. This encouraged the development of more robust strategies.
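The select-the-top-x-percent step in this cycle can be sketched as follows. The use of raw match scores and all identifiers are illustrative, not taken from the project's source.

```java
import java.util.Arrays;

// Sketch of truncation selection: keep the best fraction of the pool.
public class SelectionDemo {
    // Return the indices of the top `fraction` of agents by score, best first.
    public static int[] topIndices(double[] scores, double fraction) {
        Integer[] idx = new Integer[scores.length];
        for (int i = 0; i < scores.length; i++) idx[i] = i;
        // Sort indices by score, descending.
        Arrays.sort(idx, (a, b) -> Double.compare(scores[b], scores[a]));
        int keep = Math.max(1, (int) Math.round(scores.length * fraction));
        int[] out = new int[keep];
        for (int i = 0; i < keep; i++) out[i] = idx[i];
        return out;
    }

    public static void main(String[] args) {
        double[] scores = {3.0, 9.0, 1.0, 7.0};
        // Keeping the top half selects agents 1 and 3.
        System.out.println(Arrays.toString(topIndices(scores, 0.5))); // [1, 3]
    }
}
```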
Methods

This project was implemented in Java. The project required work in three main areas: simulating the
game, running the agents, and evolving the agent pool.

The Game

The game was played on a four-by-four grid of regions. Each region was differentiated from the other regions by a number of features, and each region contained two zones with their own unique feature values.

Feature: Use

Resource Quantity: Gatherer Units can harvest resources, which can be used to build more units and which contribute directly to the agent's score. The agent with the highest score wins the match, and the agents with the highest overall scores are selected for reproduction.

Resource Collection Rate: This value affects how quickly a unit can collect resources.

Resource Accessibility: Resource collection increases linearly with the number of Gatherer Units present until a threshold is reached; after the threshold, resource collection suffers from diminishing returns. This feature determines the threshold number.

Combat Bonus: This feature affects how easy it is to attack or defend this region.

Figure 1 – Region Features for Melete's Game
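The Resource Accessibility rule can be illustrated as follows. The square-root falloff past the threshold is an assumption for the sketch; the report does not specify the exact diminishing-returns function the game used.

```java
// Sketch of zone harvesting with an accessibility threshold.
public class HarvestDemo {
    // Total collection rate for n gatherers in a zone: linear up to the
    // accessibility threshold, then diminishing returns beyond it.
    // The square-root falloff is an assumed stand-in for the game's
    // unspecified diminishing-returns curve.
    public static double collectionRate(int gatherers, int threshold, double ratePerUnit) {
        if (gatherers <= threshold) {
            return gatherers * ratePerUnit;
        }
        int extra = gatherers - threshold;
        return threshold * ratePerUnit + Math.sqrt(extra) * ratePerUnit;
    }

    public static void main(String[] args) {
        // Below the threshold of 5, collection is perfectly linear.
        System.out.println(collectionRate(3, 5, 2.0));  // 6.0
        // Above it, the 9 extra gatherers add only sqrt(9) = 3 units' worth.
        System.out.println(collectionRate(14, 5, 2.0)); // 16.0
    }
}
```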

Each region has two zones, called offensive and defensive, and each zone has its own values for the features above. Units in the defensive zone receive a bonus to defend against attacks, while units in the offensive zone receive a bonus to attack. Since each zone has separate resources, it is necessary to send units to both zones to fully collect a region's resources.

Units automatically perform their functions while deployed to a region. Gatherer Units collect resources from the zone they occupy until the resources there run out. Combat Units attack a random enemy unit each round. Units attack only into the offensive zone until those units are destroyed, and then attack into the defensive zone. A Combat Unit will attack an enemy combat unit, if one is present, before attacking an enemy gatherer unit. An attack consists of a random number being generated for each unit involved. If the attacker's number is higher, the target is damaged based on the difference in the rolls. Once a unit has been sufficiently damaged, it is removed from the game.
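The attack resolution described above can be sketched as follows. The roll range and damage scale are illustrative assumptions; the report only states that damage equals the difference in the rolls.

```java
import java.util.Random;

// Sketch of the opposed-roll attack resolution described above.
public class CombatDemo {
    // Returns damage dealt to the target: the difference in rolls if the
    // attacker rolls higher, otherwise 0 (the attack fails).
    public static int resolveAttack(int attackerRoll, int defenderRoll) {
        return Math.max(0, attackerRoll - defenderRoll);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        // The 0-99 roll range is an assumption for illustration.
        int attacker = rng.nextInt(100);
        int defender = rng.nextInt(100);
        System.out.println("damage: " + resolveAttack(attacker, defender));
        // Deterministic cases:
        System.out.println(resolveAttack(70, 40)); // 30
        System.out.println(resolveAttack(20, 90)); // 0
    }
}
```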

Units are built in a reserve area and may be moved between regions and the reserves with a command. Units always enter and leave a region through its defensive zone. Units may be moved between the defensive and offensive zones using another command. Damaged units are repaired while in the reserves.

The game ends after 1000 turns. Each agent is awarded points based on how many resources it collected, how many units it built, and how many enemy units it destroyed. The agent with the highest score wins the match. For selection purposes, the score of each match is recorded as the difference between the scores of the two players.
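The scoring just described can be sketched as follows. The point weights are assumed for illustration; the report does not give the actual values used by the game.

```java
// Sketch of end-of-match scoring and the match score used for selection.
public class ScoreDemo {
    // Per-agent score. The weights are illustrative assumptions.
    public static int score(int resources, int unitsBuilt, int kills) {
        final int RESOURCE_POINTS = 1, BUILD_POINTS = 5, KILL_POINTS = 10; // assumed
        return resources * RESOURCE_POINTS + unitsBuilt * BUILD_POINTS + kills * KILL_POINTS;
    }

    // For selection, a match is recorded as the difference of the two scores.
    public static int matchScore(int myScore, int opponentScore) {
        return myScore - opponentScore;
    }

    public static void main(String[] args) {
        int a = score(200, 12, 3); // 200 + 60 + 30 = 290
        int b = score(150, 10, 5); // 150 + 50 + 50 = 250
        System.out.println(matchScore(a, b)); // 40
    }
}
```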

The Agents

Each agent has sixteen commands. Each command is evaluated each turn based on its probability and an enable/disable flag. When a command is executed, some event takes place that modifies either the game state or the state of the agent itself.

Command: Effect

(Construction)
Build Unit: Create a new unit in the reserves if the player can pay the cost of the unit.

(Movement)
Deploy Unit: Send a unit from the reserves to a target region.
Recall Unit: Return a unit to the reserves from a target region.
Advance Unit: Move a unit from the Defensive to the Offensive Zone.
Retreat Unit: Move a unit from the Offensive to the Defensive Zone.

(Targeting)
Set Target: Select a Target from a group. Targets are used with Movement commands.
Make Group: Select a set of Regions from the game board. This process takes into account a number of different features of the regions.
Modify Group: Create a new group based on an existing group.

(Agent State)
Set Flag: Set a flag to be enabled or disabled based on game-defined and agent-defined variables.
Set Variable: Store a value in an agent-defined variable.

Figure 2 – Command Descriptions for Agents

The Genetic Algorithm

Each agent can be represented as a fixed-length binary string. This string is created by concatenating the string representations of the agent's commands. Each command string is formatted as follows:

Field: Purpose

Flag: This number determines which flag to watch for this command.
Probability: This determines how likely this command is to be executed each turn.
Command Type: Determines what the command does; see the command table above.
Command Code: The more complex commands can be completed in many different ways; these commands use an extra field to encode that information.
Parameters: These parameters determine what is affected or used while evaluating a command.

Figure 3 – Field Descriptions for Chromosomal Representation of a Command
110100001110010010111000101101110100100110100101110
101011101010010011010101011000001100001100100001110
110011111110001001011011111101101011110010001011011
110110111000111111001111100001110111011000111010101
011101011100000100010011100100001101101100100010000
001111101110001111011001001010110011000011111001000
000010011000011000001110000111011111110011001000100
100001110011011101001011010010101000101001000000001
100100001001000010001101011111111101110001010001111
111010001101011110001111100110000100100010101111110
100101001010101010001111101000111110001001010001011
101001001111100110010011101111001110001100001101010
001110101110011010001000011100110110001000101101010
111001110111110110010111110011011001101011010100001
010011110100010010000000011000010111111011000111011
101011101011110101010100010100100100100001001110000
Figure 4 – Example of a Complete Agent Chromosome
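Decoding a command string back into its fields amounts to slicing fixed-width bit fields out of the string. The field boundaries used below (a 1-bit flag and a 6-bit probability) are illustrative guesses; the report does not specify the actual bit layout.

```java
// Sketch of decoding fixed-width bit fields from a command string.
public class ChromosomeDemo {
    // Decode a fixed-width bit field of a command string into an integer.
    public static int decodeField(String bits, int start, int width) {
        return Integer.parseInt(bits.substring(start, start + width), 2);
    }

    // Map a raw field value onto [0, 1] to recover a probability.
    public static double decodeProbability(String bits, int start, int width) {
        return decodeField(bits, start, width) / (double) ((1 << width) - 1);
    }

    public static void main(String[] args) {
        // First command line of Figure 4. The field positions below are
        // illustrative assumptions, not the project's actual layout.
        String command = "110100001110010010111000101101110100100110100101110";
        boolean enabled = command.charAt(0) == '1';
        double p = decodeProbability(command, 1, 6);
        System.out.println(enabled + " " + p);
    }
}
```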

New agents were created from older agents via a single crossover point and a small mutation probability. In this implementation, a child had a 10% chance to come exclusively from one parent and a 90% chance to be composed from two parents: a random crossover point was selected, and each parent contributed one side of the crossover. After this operation was completed, each bit had a 3% chance to be set to a random value. During each reproduction cycle, only the top x percent was allowed to reproduce. At first the reproducing percentage was 20%, but this was later changed to 35%.
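The reproduction scheme can be sketched as follows. The chromosome length of 816 bits (sixteen commands of 51 bits each, matching Figure 4) and all identifiers are illustrative.

```java
import java.util.Random;

// Sketch of reproduction: 10% chance to clone one parent, 90% chance of
// single-point crossover, then a 3% per-bit mutation chance.
public class ReproductionDemo {
    public static boolean[] reproduce(boolean[] p1, boolean[] p2, Random rng) {
        boolean[] child = new boolean[p1.length];
        if (rng.nextDouble() < 0.10) {
            // Child comes exclusively from one parent.
            boolean[] src = rng.nextBoolean() ? p1 : p2;
            System.arraycopy(src, 0, child, 0, src.length);
        } else {
            // Single crossover point; each parent contributes one side.
            int cut = rng.nextInt(p1.length);
            for (int i = 0; i < child.length; i++) {
                child[i] = (i < cut) ? p1[i] : p2[i];
            }
        }
        // Per-bit mutation: each bit has a 3% chance to be set randomly.
        for (int i = 0; i < child.length; i++) {
            if (rng.nextDouble() < 0.03) child[i] = rng.nextBoolean();
        }
        return child;
    }

    public static void main(String[] args) {
        boolean[] a = new boolean[816]; // all zeros
        boolean[] b = new boolean[816];
        java.util.Arrays.fill(b, true); // all ones
        boolean[] child = reproduce(a, b, new Random(7));
        System.out.println("child length: " + child.length);
    }
}
```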

A pool of 80 agents was tested each cycle. The best agents were allowed to reproduce to make the next generation of agents. This process continued until the population satisfied a victory condition: here, the agents were required to win 85% of all games they played against the current opponent. Additionally, the agents were allowed to continue evolving until the fitness score of the pool stopped increasing. To guard against negative changes in the population, the population saved its state whenever it beat its previous best score, and had a chance of reverting to that state each time it failed to improve; here, the chance to revert per failure was 8%. If the population reverted while it met the victory condition, this was interpreted as a local maximum, and the population's best member was selected as the new opponent to play against.
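The checkpoint-and-revert scheme can be sketched as follows. The population state is stood in for by an array of doubles, and all names are illustrative.

```java
import java.util.Random;

// Sketch of checkpoint-and-revert: snapshot the population whenever it
// beats its best score; each failed cycle has an 8% chance to restore it.
public class RevertDemo {
    private double bestScore = Double.NEGATIVE_INFINITY;
    private double[] checkpoint;           // saved population state (stand-in)
    private double[] current = new double[0];

    // Called at the end of each evolution cycle; returns true on a revert.
    public boolean endOfCycle(double score, double[] state, Random rng) {
        current = state.clone();
        if (score > bestScore) {
            bestScore = score;
            checkpoint = current.clone();  // save the improved state
            return false;                  // improvements never revert
        }
        if (checkpoint != null && rng.nextDouble() < 0.08) {
            current = checkpoint.clone();  // revert to the last good state
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        RevertDemo demo = new RevertDemo();
        // The first cycle is always an improvement over -infinity.
        System.out.println(demo.endOfCycle(10.0, new double[]{1.0}, new Random())); // false
    }
}
```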

Results

The author had two objectives for this project. The first was to see if the agent representation and evolution scheme could produce scripts that would perform well in the designed game. The second was to see if a competitive, complex game could force the evolution of state-based behavior in the agents. The project was completely successful on the first goal and a complete failure on the second. The population of agents was able to rapidly evolve to capture nearly all available resources. The behavior of the agents was determined solely by the reward scheme that determined the victor. The evolution scheme determined the best way to win very quickly and then attempted to maximize that strategy. While this produced very successful agents, it did not produce interesting or varied behavior. Since there was no extra reward for interesting behavior, this makes perfect sense. To develop different strategies, it seems likely that the learning environment would have to be varied, mimicking the situations that encouraged diverse strategies in the real world.

Improvements

Since the project was successful at finding a good strategy for a given environment but failed at producing interesting state transitions, the author suggests dividing the task explicitly: use a GA to evolve strategies without state transitions to fit a particular environment (the behavior of the opponent can be included in the environment), and then evolve another agent to select the best strategy based on features it detects in its current environment.