
Soccer results prediction using neural networks
Július Kočiš

kocisj@neuron.tuke.sk

Supervisor: Ing. Rudolf Jakša, PhD.

jaksa@neuron.tuke.sk

Department of Cybernetics and Artificial Intelligence
Technical University of Košice, Letná 9, 042 00 Košice
Slovak Republic

Abstract: This master's thesis deals with the prediction of soccer results using
neural networks. The goals of the thesis are to design an environment for soccer
results prediction, to run prediction experiments with different configurations of
training data, to compare the results of the experiments with real data and to
evaluate whether the approach can be used in practice.

Keywords: neural networks, prediction, soccer

1. Project Definition and Task Determination

Primary targets of this work:

1. Work out the background of soccer results prediction and of the prediction
methods used.
2. Design an environment for obtaining and transforming data for soccer results
prediction.
3. Design a system for prediction and visualization of results.
4. Implement the designed system.
5. Carry out prediction experiments with different configurations of training
data.
6. Evaluate the results of the experiments and outline the possibilities for real
use of the system.

2. The State of the Art in the Domain

Probably the only company that predicts soccer results using neural
networks (NN) is the Greek company Prosoccer [12].
Their system takes into account the results of previous games of the
participating teams, various information about the league, statistics and many
other complex factors. The program then generates probabilities for all probable
outcomes and scores. The software is written in LISP for the needs of their site
www.prosoccer.gr. It is currently not for sale and will not be in the
near future.
Other neural network software exists on the market, but the people at
Prosoccer doubt that any of it is specific to soccer game prediction.

3. Selected Methods and Approaches

For prediction I use a neural network with 2 hidden layers, trained with
the error backpropagation algorithm. I predict soccer results for the 1st Czech
soccer league.
Every year 16 teams play in the Czech league. The season begins on 1st
July and ends on 30th June of the next year. Each team plays 2 matches against
each of the other 15 teams (one at home, one away), so the teams play 240 matches
in total during one season. At the end of the season, the teams that finished in
the last 2 positions of the table are relegated and replaced by the 2 teams that
finished in the first 2 positions of the 2nd Czech league.
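The match count above can be checked with a few lines of Python. This is only an illustrative sketch; the thesis does not state which language its tools were written in, and the variable names are my own.

```python
from itertools import permutations

TEAMS_PER_SEASON = 16  # stated in the text

# Every ordered (home, away) pair is one fixture, so each pair of teams
# meets twice: once on each team's home field.
fixtures = list(permutations(range(TEAMS_PER_SEASON), 2))
matches_per_season = len(fixtures)

# With 16 teams there are 8 matches per round.
rounds_per_season = matches_per_season // (TEAMS_PER_SEASON // 2)

print(matches_per_season)  # 240
print(rounds_per_season)   # 30
```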
For my experiments I have used data from 1.7.1999 to 31.12.2005 (1567
matches). During these 7 seasons, the following 22 teams have played in the
1st Czech league:
Table 1. Teams that played in the 1st Czech soccer league from 1999.

1. Blšany            7. Most              13. 1.FC Slovácko    19. Drnovice
2. Mladá Boleslav    8. Olomouc           14. Sparta Praha     20. Žižkov Viktoria
3. Brno              9. Ostrava Baník     15. Teplice          21. Bohemians Praha
4. Jablonec          10. Slavia Praha     16. Zlín             22. Hradec Králové
5. Jihlava           11. Plzeň Viktoria   17. Č. Budějovice
6. Liberec           12. Příbram          18. Opava

I have encoded the names of these 22 teams using 22 neurons, of
which 21 have the value -6 and one has the value 6. For example, for the team
Sparta Praha, the 14th neuron has the value 6 and every other neuron has the
value -6. The values of these 22 neurons, together with data representing the
dates of already played matches, statistics for all teams (scores, points) and
bookmaker odds, were the values of the input neurons.
To record the result I have used 3 neurons. If the home team wins, the
1st neuron is activated, i.e. the values of the 1st, 2nd and 3rd neuron are
1, 0 and 0. For a draw the values are 0, 1 and 0, and if the away team wins
they are 0, 0 and 1. The values of these 3 neurons were the values of the
output neurons.
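The team and result encodings described here can be sketched as follows. The function and constant names are hypothetical and chosen only for illustration; the values 6, -6 and the three output patterns are taken from the text.

```python
NUM_TEAMS = 22
ACTIVE, INACTIVE = 6.0, -6.0  # input values stated in the text


def encode_team(team_number):
    """Return the 22 input values for a team; numbering is 1-based as in Table 1."""
    if not 1 <= team_number <= NUM_TEAMS:
        raise ValueError("team_number must be between 1 and 22")
    return [ACTIVE if i == team_number else INACTIVE
            for i in range(1, NUM_TEAMS + 1)]


# Target values of the 3 output neurons for each type of result.
RESULT_TARGETS = {
    "home": [1, 0, 0],
    "draw": [0, 1, 0],
    "away": [0, 0, 1],
}

sparta = encode_team(14)  # Sparta Praha is team 14 in Table 1
print(sparta.index(ACTIVE) + 1)  # 14
```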

4. Design and Implementation

The goal of the neural network is to learn a given number of matches
(training patterns) played up to the 11th round of the 1st Czech league and to
predict the results of the 8 matches of the 12th round.
A sample training pattern with values for the input and output neurons is shown
in Figure 1.

Figure 1. Sample of 1 training pattern

The 1st line of Figure 1 encodes the name of the home team. The
following 6 lines hold statistics for the home team (scores and points). The 8th
line of Figure 1 encodes the name of the away team, and the following 6
lines hold statistics for the away team (scores and points). Then follow the
result, the date and the bookmaker odds.
The training patterns were split into training and testing sets in the ratio
9:1, i.e. every 10th pattern was put into the testing set and the
remaining patterns made up the training set.
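A minimal sketch of this 9:1 split is given here. It assumes that patterns are counted from 1, so that the 10th, 20th, 30th, ... pattern goes to the testing set; the function name is hypothetical.

```python
def split_patterns(patterns, every=10):
    """Put every `every`-th pattern into the testing set, the rest into training."""
    training, testing = [], []
    for position, pattern in enumerate(patterns, start=1):
        if position % every == 0:
            testing.append(pattern)
        else:
            training.append(pattern)
    return training, testing


# Stand-in data: 100 numbered patterns instead of real match records.
training, testing = split_patterns(list(range(1, 101)))
print(len(training), len(testing))  # 90 10
```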
The NN was trained on the training set and tested on the testing set.
In Figure 2, the x-axis shows the number of cycles and the y-axis shows the
learning error on the training or the testing set.
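To illustrate the kind of training loop that produces such a learning curve, here is a small pure-Python sketch of a network with 2 hidden layers trained by error backpropagation. It is not the thesis implementation: the sigmoid activation, the learning rate, the weight initialization, the layer sizes and the toy data are all assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TwoHiddenLayerNet:
    """Feed-forward network with 2 hidden layers, trained by backpropagation."""

    def __init__(self, sizes, seed=0):
        # sizes = [inputs, hidden 1, hidden 2, outputs]
        rng = random.Random(seed)
        # One weight matrix per layer; the last weight in each row is the bias.
        self.weights = [
            [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])
        ]

    def forward(self, inputs):
        """Return the activations of every layer, input layer first."""
        activations = [list(inputs)]
        for layer in self.weights:
            previous = activations[-1] + [1.0]  # constant 1.0 feeds the bias
            activations.append(
                [sigmoid(sum(w * a for w, a in zip(row, previous))) for row in layer]
            )
        return activations

    def train_pattern(self, inputs, targets, rate=0.5):
        """One backpropagation step; returns the squared error before the update."""
        activations = self.forward(inputs)
        outputs = activations[-1]
        error = sum((t - o) ** 2 for t, o in zip(targets, outputs))
        # Output-layer deltas: (target - output) times the sigmoid derivative.
        deltas = [(t - o) * o * (1 - o) for t, o in zip(targets, outputs)]
        for index in reversed(range(len(self.weights))):
            layer = self.weights[index]
            layer_inputs = activations[index]
            below = []
            if index > 0:
                # Deltas for the layer below, computed with the old weights.
                below = [
                    sum(row[i] * d for row, d in zip(layer, deltas)) * a * (1 - a)
                    for i, a in enumerate(layer_inputs)
                ]
            for row, delta in zip(layer, deltas):
                for i, value in enumerate(layer_inputs + [1.0]):
                    row[i] += rate * delta * value
            deltas = below
        return error


# Toy data (hypothetical): 3 patterns, each mapped to its own output neuron.
patterns = [
    ([1, -1, -1], [1, 0, 0]),
    ([-1, 1, -1], [0, 1, 0]),
    ([-1, -1, 1], [0, 0, 1]),
]
net = TwoHiddenLayerNet([3, 6, 4, 3])
errors = []  # summed squared error per cycle, i.e. the points of a learning curve
for cycle in range(2000):
    errors.append(sum(net.train_pattern(x, t) for x, t in patterns))
print(errors[0], errors[-1])
```

The experiments in this paper use 40 or 60 neurons in the 1st hidden layer and 15 or 20 in the 2nd; the toy network is kept smaller only so that the example runs quickly.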

Figure 2. Learning error on the training and testing set

After learning, the NN was used to predict the 8 matches of the 12th round of
the Czech league. The outputs of the NN were values that determined the type of
result for each of the 8 matches (Figure 3).
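One simple way to turn the 3 output values into a predicted result is to take the neuron with the highest value. The text does not say which rule was actually used, so the rule below is an assumption; the labels 1, 0 and 2 for home win, draw and away win follow the tip notation of section 5.2.

```python
TIPS = ("1", "0", "2")  # home win, draw, away win, in output-neuron order


def decode_prediction(outputs):
    """Return the tip whose output neuron has the highest value."""
    if len(outputs) != len(TIPS):
        raise ValueError("expected one value per result type")
    best = max(range(len(TIPS)), key=lambda i: outputs[i])
    return TIPS[best]


print(decode_prediction([0.7, 0.2, 0.1]))  # 1
```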

Figure 3. Sample of results prediction

5. Experiments

5.1 Soccer results prediction


The goal of the experiments is to compare the prediction results for
3 different configurations of training data:
- data configuration nr.1: no statistics are used,
- data configuration nr.2: statistics are used,
- data configuration nr.3: statistics and bookmaker odds are used.
The prediction results for the 3 data configurations are compared using 3
types of experiments:
- experiments of type A: prediction precision analysis,
- experiments of type B: experiments using cross-validation,
- experiments of type C: learning with progressively changing training and
testing sets.
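For the cross-validation used in experiments of type B, a generic k-fold split can be sketched as follows. The number of folds is not stated in the text; k = 5 is an assumption, chosen because the pattern counts in Table 2 (for example 646 training and 162 testing patterns) are consistent with a five-fold split. The function name is hypothetical.

```python
def k_fold_splits(patterns, k=5):
    """Yield (training, testing) pairs so that each pattern is tested exactly once."""
    base, extra = divmod(len(patterns), k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one pattern more than the others.
        size = base + (1 if fold < extra else 0)
        testing = patterns[start:start + size]
        training = patterns[:start] + patterns[start + size:]
        yield training, testing
        start += size


# Stand-in data: 808 numbered patterns (646 + 162, as in the first rows of Table 2).
sizes = [(len(tr), len(te)) for tr, te in k_fold_splits(list(range(808)))]
print(sizes[0])  # (646, 162)
```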
In experiments of type A the NN achieved better results with a greater
number of training and testing patterns. The prediction success rate was
17.7% with 512 and 728 training patterns and 39.58% with 1160 and 1376
training patterns. The best prediction success rate was achieved with data
configuration nr.2 (32.81% on average).
The results of the experiments of type B are given in Table 2. The
first 6 rows contain results for data configuration nr.1, the next 6 rows
contain results for data configuration nr.2 and the last 6 rows contain results
for data configuration nr.3.
Table 2. Results of experiments of type B

Data            Training   Testing    Neurons on 1st   Neurons on 2nd   Prediction success
configuration   patterns   patterns   hidden layer     hidden layer     rate [%]
nr.1            646        162        40               15               27.50
nr.1            646        162        60               20               42.50
nr.1            1030       258        40               15               45.00
nr.1            1030       258        60               20               47.50
nr.1            1222       306        40               15               52.50
nr.1            1222       306        60               20               55.00
nr.2            646        162        40               15               40.00
nr.2            646        162        60               20               30.00
nr.2            1030       258        40               15               52.50
nr.2            1030       258        60               20               52.50
nr.2            1222       306        40               15               47.50
nr.2            1222       306        60               20               47.50
nr.3            646        162        40               15               35.00
nr.3            646        162        60               20               15.00
nr.3            1030       258        40               15               42.50
nr.3            1030       258        60               20               37.50
nr.3            1222       306        40               15               47.50
nr.3            1222       306        60               20               45.00

The NN achieved better results in experiments with a greater number of training
and testing patterns. Table 2 also shows that with data configuration nr.3,
which includes bookmaker odds, the prediction success rate was lower than with
data configurations nr.1 and nr.2.
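The comparison between the configurations can be made concrete by averaging the last column of Table 2 for each configuration. The snippet below is only a worked check of that arithmetic; the per-configuration averages are not quoted in the text for type B.

```python
from statistics import mean

# Prediction rates [%] copied from Table 2, six rows per data configuration.
TABLE_2 = {
    1: [27.50, 42.50, 45.00, 47.50, 52.50, 55.00],
    2: [40.00, 30.00, 52.50, 52.50, 47.50, 47.50],
    3: [35.00, 15.00, 42.50, 37.50, 47.50, 45.00],
}

averages_b = {config: round(mean(rates), 2) for config, rates in TABLE_2.items()}
print(averages_b)  # {1: 45.0, 2: 45.0, 3: 37.08}
```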
The results of the experiments of type C are given in Table 3. The
first 6 rows contain results for data configuration nr.1, the next 6 rows
contain results for data configuration nr.2 and the last 6 rows contain results
for data configuration nr.3.
Table 3. Results of experiments of type C

Data            Training   Testing    Neurons on 1st   Neurons on 2nd   Prediction success
configuration   patterns   patterns   hidden layer     hidden layer     rate [%]
nr.1            600        107        40               15               50.00
nr.1            600        107        60               20               37.50
nr.1            957        170        40               15               50.00
nr.1            957        170        60               20               37.50
nr.1            1136       201        40               15               37.50
nr.1            1136       201        60               20               37.50
nr.2            600        107        40               15               25.00
nr.2            600        107        60               20               37.50
nr.2            957        170        40               15               50.00
nr.2            957        170        60               20               50.00
nr.2            1136       201        40               15               25.00
nr.2            1136       201        60               20               50.00
nr.3            600        107        40               15               62.50
nr.3            600        107        60               20               37.50
nr.3            957        170        40               15               37.50
nr.3            957        170        60               20               37.50
nr.3            1136       201        40               15               75.00
nr.3            1136       201        60               20               62.50

The average prediction success rate is 41.67% for data configuration nr.1,
39.58% for data configuration nr.2 and 52.08% for data configuration nr.3.
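These averages can be recomputed from the last column of Table 3. The snippet is a plain arithmetic check, with the values rounded to two decimal places.

```python
# Prediction rates [%] copied from Table 3, six rows per data configuration.
TABLE_3 = {
    1: [50.00, 37.50, 50.00, 37.50, 37.50, 37.50],
    2: [25.00, 37.50, 50.00, 50.00, 25.00, 50.00],
    3: [62.50, 37.50, 37.50, 37.50, 75.00, 62.50],
}

averages_c = {
    config: round(sum(rates) / len(rates), 2) for config, rates in TABLE_3.items()
}
best_config = max(averages_c, key=averages_c.get)

print(averages_c)   # {1: 41.67, 2: 39.58, 3: 52.08}
print(best_config)  # 3
```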

The learning error on the testing set for all experiments of type C is shown in
Figure 4. The numbers in the legend give the number of training patterns, the
number of testing patterns and the numbers of neurons in the 1st and 2nd hidden
layer. The x-axis shows the number of cycles and the y-axis shows the learning
error on the testing set.

Figure 4. Learning error on the testing set for experiments of type C

5.2 Interaction and tipping reliability prediction for the current player


The aim of the NN is to advise a player whether or not to choose a given
type of tip, based on his previous tips.
The inputs of the NN are:
- the names of all teams on the ticket,
- the names of the 2 teams playing the current match,
- the bookmaker odds for the current match,
- the prediction results for the current match.
The output of the NN compares the player's tip with the correct tip using
6 neurons. For every kind of tip (home win - 1, draw - 0, away win - 2)
2 neurons are used.
If the player did not choose the tip, the value of the 1st neuron is 0.5 and
the value of the 2nd neuron is 1 (if he was right not to choose the tip) or
0 (if he was wrong not to choose it, i.e. he should have chosen the tip).
If the player chose the tip, the value of the 2nd neuron is 0.5 and the
value of the 1st neuron is 1 (if he was right to choose the tip) or 0 (if he
was wrong to choose it, i.e. he should not have chosen the tip).
Table 4. Values of neurons on the output layer

                      Correct decision [0 / 1] of
Player's   Correct    choosing   not choosing   choosing   not choosing   choosing   not choosing
tip        tip        tip 1      tip 1          tip 0      tip 0          tip 2      tip 2
1          1          1          0.5            0.5        1              0.5        1
1          0          0          0.5            0.5        0              0.5        1
1          2          0          0.5            0.5        1              0.5        0
0          1          0.5        0              0          0.5            0.5        1
0          0          0.5        1              1          0.5            0.5        1
0          2          0.5        1              0          0.5            0.5        0
2          1          0.5        0              0.5        1              0          0.5
2          0          0.5        1              0.5        0              0          0.5
2          2          0.5        1              0.5        1              1          0.5
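The rules behind Table 4 can be written as one small function. This is a sketch with hypothetical names, not the original code; it reproduces the rows of the table from the player's tip and the correct tip.

```python
TIP_TYPES = (1, 0, 2)  # home win, draw, away win, in Table 4 column order


def encode_decision(player_tip, correct_tip):
    """Return the 6 output values for one match, 2 neurons per tip type."""
    values = []
    for tip in TIP_TYPES:
        was_right = float(tip == correct_tip)  # 1.0 if this tip was the correct one
        if player_tip == tip:
            # Player chose this tip: 1st neuron rates the choice, 2nd is neutral.
            values += [was_right, 0.5]
        else:
            # Player did not choose it: 1st is neutral, 2nd rates the omission.
            values += [0.5, 1.0 - was_right]
    return values


print(encode_decision(1, 1))  # [1.0, 0.5, 0.5, 1.0, 0.5, 1.0]
```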

Figure 5 shows the results of the tipping reliability prediction for the
current player, based on his 12 previous tickets (38 matches - training patterns).

Figure 5. Results of tipping reliability prediction for the current player


According to the results in Figure 5, for 6 matches the NN advises the
player to choose the home win with a confidence of 0.79. It advises him not to
choose a draw with a confidence of 0.82 and not to choose the loss of the home
team with a confidence of 0.92.
For the match Slavia - Blšany, the prediction cannot advise the user
whether he should choose the win of the home team (0.46 - 0.55) or the draw
(0.57 - 0.59), but it advises him not to choose the win of the away team with a
confidence of 0.76.

For the match Příbram - Slovácko, the NN only weakly (0.13) recommends
that the player tip the win of the home team and only weakly (0.17) recommends
not tipping the win of the away team, and it does not advise the player to
tip the draw. This means that the user should not tip on this match.
The results in Figure 5 are displayed graphically using these colors:
- light green (1.00 - 0.80),
- light blue (0.80 - 0.60),
- pink (0.60 - 0.40),
- light brown (0.40 - 0.20),
- white (0.20 - 0.00).
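The color scale can be expressed as a simple lookup. The text does not say to which band a boundary value such as 0.80 belongs; the sketch below assumes it belongs to the higher band. The names are hypothetical.

```python
# (lower bound, color) pairs, from the highest band to the lowest.
COLOR_BANDS = [
    (0.80, "light green"),
    (0.60, "light blue"),
    (0.40, "pink"),
    (0.20, "light brown"),
    (0.00, "white"),
]


def color_for(value):
    """Return the display color for an output value between 0 and 1."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie in [0, 1]")
    for lower_bound, color in COLOR_BANDS:
        if value >= lower_bound:
            return color


print(color_for(0.79))  # light blue
print(color_for(0.92))  # light green
```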
The betting menu for the 14th round of the Czech soccer league for the current
player, using the results from Figure 5, is shown in Figure 6.

Figure 6. Betting menu for current player with colorized radiobuttons for odds
For a successful choice of a tip (for every match) the user has 3 kinds of
information:
- bookmaker odds,
- soccer results prediction using match results from several years,
- results of the tipping reliability prediction for the user.

6. Contribution to the Research Domain

From the experimental results we can say that the NN achieved the best
prediction results when using data from the last 6 and 7 years together with
data configuration nr.2.
The prediction success rate could be improved by using an NN that would
recommend to the current user whether or not to choose a given type of tip.
Two different players who achieved similar tipping results would be
recommended to tip on similar tips. A player who achieved worse tipping results
than another would be advised by the system not to tip on some tips or
matches.
In the future this system could be used to compare the prediction success rate
for different soccer leagues or different kinds of sports.

7. Conclusion

[1] Angelovič, P., Využitie neurónových sietí s nekontrolovaným učením pri predikcii
časových radov, Diplomová práca, Katedra Kybernetiky a Umelej Inteligencie,
TU Košice, (2003)
[2] Babik, M., Predikčný systém na báze prediktívnych modulárnych sietí, Diplomová
práca, Katedra Kybernetiky a Umelej Inteligencie, TU Košice, (2003)
[3] Babinec, ., Využitie neurónových sietí v predikcii časových radov, Diplomová
práca, Katedra Kybernetiky a Umelej Inteligencie, TU Košice, (2003)
[4] Folvarčík, L., Metódy predikcie výmenných kurzov finančných mien, Diplomová
práca, Katedra Kybernetiky a Umelej Inteligencie, TU Košice, (2004)
[5] Sinčák, P., Andrejková, G., Neurónové siete - inžiniersky prístup (Dopredné
neurónové siete), č.1., Elfa-press, ISBN 80-88786-38-X, (1996)
[6] Sinčák, P., Andrejková, G., Neurónové siete - inžiniersky prístup (Rekurentné
a modulárne neurónové siete), č.2., Elfa-press, ISBN 80-88786-42-8, (1996)
[7] http://www.betexplorer.com/soccer/czech-republic/
[8] Jakša, R., Neuroriadenie: využitie neurónových sietí v inteligentnom riadení,
Dizertačná práca, Technická univerzita Košice, (1999)
[9] Uk, M., Vizualizácia a interakcia v procese učenia neurónových sietí, Diplomová
práca, Technická univerzita Košice, (2005)
[10] idlovsk, J., Využitie neurónových sietí pre rozpoznávanie fólií pri recyklácii,
Diplomová práca, Katedra Kybernetiky a Umelej Inteligencie, TU Košice, (2005)
[11] Duda, Richard O. - Hart, Peter E. - Stork, David G.: Pattern Classification: Second
Edition. New York: Wiley, 2001. ISBN 0-471-05669-3.
[12] http://www.prosoccer.gr
