
Steffen Steffensen Halkjær Eriksen

011189-2725

A win-rate model in Pokémon TCG

Bachelor Project
Bachelor in Mathematics-Business Economics

Supervisor
Professor Jørgen T. Lauridsen

December 2011
Acknowledgements

I would like to thank Professor Jørgen T. Lauridsen for always being there when I needed it.
He gave me good advice and pointed me in the right direction; without his help I am sure I would
not have been able to do the best I could.

I would like to thank Søren Rud Kristensen for his help with Stata and for his
helpful comments during my work.

I would especially like to thank all the people who let me interview them for this project. They
gave me good and wise answers to all my questions, thereby enabling me to get a more
complete view of the game. They have all been a pleasure to interview.

I would also like to express my gratitude to all the people who answered my survey and in that
way helped me get the data I needed for this research.

Finally, heartfelt gratitude goes to my friends and family, who encouraged and
supported me in the process, to my previous neighbor Ákos Kancsal, who helped me get the idea
for this project, and to my fiancée Catarina for being the most amazing girl one could ever
wish for.

Abstract

The strategic card game known as the Pokémon Trading Card Game is a dynamic game in which many
different factors can influence the outcome of a game. This paper investigates what affects a
Pokémon TCG player’s win-rate besides luck, with respect to the 2010-2011 season. To
do so, a cross-sectional dataset containing 84 individuals was collected along with 7 interviews.

Techniques from econometrics are applied in order to determine the effect of the different
factors on a player’s win-rate. In a situation where the dependent variable is a coding of a
qualitative outcome, a probability model is applied to maintain the familiar type of regression.
The dependent variable can then be linked to a list of factors, each of them with a different
impact on the probability of a higher win-rate. The model chosen is the logit model, due to its
mathematical convenience and the fact that the logistic distribution is a continuous
probability distribution, which is consistent with the theory of probability models. To account for
problems with heteroscedasticity, a weighted least-squares logistic regression for grouped data,
known as glogit, has been used to estimate the model.

Results revealed a positive effect of experience, of playing decks containing an SP-engine and of
being a Pokémon professor. Ageing proves to have a negative effect on a player’s win-rate.
There is no difference whether a player is from the USA or not. Having family members
playing, having a job or playing abroad during the season also showed no significant influence on the
win-rate.

Table of Contents

Acknowledgements
Abstract
Table of contents
Chapter 1. Introduction
Chapter 2. Data
2.1. Data collection
2.2. Variables
2.2.1 Dependent variable
2.2.2 Independent variables
Chapter 3. Methods
3.1. Probability models
3.2. The logit model
3.3. The model
Chapter 4. Results
Chapter 5. Discussion
Chapter 6. Conclusions
Chapter 7. Future work
References

Appendix A – Data collection

Appendix B – Scan of the cards contained in an SP-engine

Appendix C – Omitted variable bias

Appendix D – Heteroscedasticity tests


Chapter 1. Introduction

In 1998 the Pokémon Trading Card Game was created. The game, known as Pokémon TCG,
was based on the video game series created by Satoshi Tajiri. In contrast to other TCGs
of that time, like Magic: The Gathering, it appealed more to a younger audience. It quickly
became very popular and spread from Japan throughout the world. The game was published by
Wizards of the Coast from its creation until they lost the license to Nintendo in 2003. The
Pokémon Company International (TPCi), which is affiliated with and owned by Nintendo, is responsible for
the Pokémon franchise and its marketing. Play! Pokémon¹ is the division of TPCi that takes care of the
Pokémon TCG and is responsible for organized tournament play. Today, after 13 years,
Pokémon TCG is still one of the most popular TCGs, along with Magic: The Gathering and Yu-Gi-Oh.

The game itself is a two-player game, where each player has their own pre-constructed 60-card
deck. There are different ways to play the game, the most common being “Modified
Constructed”.² This is the type of game play this report focuses on. The goal for
each player is to knock out the other player’s Pokémon with the help of their own Pokémon,
energy and trainer cards. Each time a player succeeds in knocking out one of his or her opponent’s
Pokémon, the player is allowed to take 1 of his or her initial 6 prize cards. There are in general 4
ways to win the game:

1. Taking all 6 prize cards before the opponent.
2. Knocking out the opponent’s last Pokémon on the field.
3. Decking out the opponent.³
4. Winning by using “Lost World”.⁴

In the event of both players fulfilling one or more of the above win conditions at the same time,
the player fulfilling more win conditions wins the game; otherwise a new game to one prize card
is played. For more specific rules of the game, such as how to play the cards and how to construct a deck,
visit the official Pokémon website. (1)

¹ Play! Pokémon was known as Pokémon Organized Play (POP) until August 2010.
² Modified Constructed means that each player plays with a pre-constructed 60-card deck, where only cards
from a certain number of sets are allowed in the deck.
³ Each player must draw a card from their deck at the beginning of their turn. If there are no more cards left in
the deck and the player is therefore unable to draw a card, that player loses the game.
⁴ The opponent must have at least 6 Pokémon in their “Lost Zone” for a player to declare himself or herself the winner.


Research has previously been done on other card games such as Texas Hold’em Poker (2) (3);
however, no research has yet been done on Pokémon TCG. Unlike a game such as Texas
Hold’em Poker, Pokémon TCG is not “constant”. The set of cards a player may use for his
or her decks changes over time, while in Texas Hold’em Poker players play with the same 52
cards every time. There are usually 4 new sets released every year, each of them containing
around 100 cards. There is then a format change once a year, removing a number of the oldest
sets. At the beginning of the season a player usually has a card pool of at least 500 different
cards to create decks from. During the season more cards are added through the release of
new sets, so a player ends up with a card pool of around 1000 cards before the format change
at the end of the season. All of this gives a dynamic game in constant evolution, where
players constantly have to learn new cards and combos in order not to fall behind.

When playing a game like Pokémon TCG, a player has to think a lot about how to construct
his or her 60-card deck before entering a tournament. Since there is more than one way to win
the game and the game is as dynamic as it is, play testing plays an important role in achieving
more wins during tournament play. Having a test group that is very devoted to the game therefore gives
a player a huge advantage. (3) The more a player play tests, the more the player knows
about what he or she should include in his or her deck and which tactics should be used.
Considerations like these add many factors to the discussion of what makes a player win more
games. It is not just “the luck of the draw” that decides the outcome of a game.

The aim of this paper is to investigate what affects a Pokémon TCG player’s win-rate other
than luck. The win-rate is defined as the percentage of premier rated games⁵ a player wins. This
research will additionally focus on a number of hypotheses, which together will help to
answer the main research question of this paper. The study focuses on the 2010-2011 season
from September 1st until April 24th.⁶ Knowing what affects a player’s win-rate might help
to gain a deeper understanding of the game and of how a player can improve his or her own win-rate.

⁵ At premier rated tournaments players can earn points in order to qualify for the World Championship.
⁶ Normally a season lasts one whole year (Sept. 1st to Sept. 1st), but because of the release of Black & White on
April 25th, which was followed by many rule changes, the study only goes until April 24th.

The objective of this paper is to answer the main research question:

“What significantly affects a Pokémon TCG player’s win-rate?”

In order to answer this question, the following hypotheses will be used:

“Ageing will affect a player’s win-rate negatively”

“Playing the game for a longer time helps a player achieve a higher win-rate”

“Playing a deck with an SP-engine contributes to a higher win-rate”

“A player achieves a higher win-rate if he or she has other family members playing”

“A full-time job affects a player’s win-rate negatively”

“The Pokémon professor status affects a player’s win-rate positively”

“Playing abroad during the season boosts a player’s win-rate”

“Being from the USA does not affect a player’s win-rate”

When researching a subject that heavily involves the players themselves, the best
approach seemed to be to get as much data from as many different players as possible.
By also getting data from people who had another view of the game (mainly judges), a
more complete view of the game can be formed, making it possible to answer the main research
question more precisely. This report makes use of nonexperimental data. There are both
quantitative data, in the form of a survey, and qualitative data, in the form of interviews. The advantage of
gathering the quantitative data in this way is that this kind of data makes it possible to use
techniques from econometrics in order to answer the research questions. The interviews, which
form the qualitative part of the data, were used to create the different hypotheses stated in this
paper.

To provide an answer to these questions, a cross-sectional data set has
been collected for the 2010-2011 season. Chapter 2 discusses the data used in this paper.
Chapter 3 discusses the methods used. Chapter 4 presents the results produced
in this paper, and a discussion of these is provided in chapter 5. In chapter 6 the final conclusions
are drawn. Chapter 7 looks at some possible further research.


Chapter 2. Data

In order to answer the research questions, a cross-sectional analysis of the win-rates of Pokémon TCG
players from around the world in the 2010-2011 season has been conducted. With the
data collected it is then possible to capture effects such as that of a player’s main deck choice during
a season, particularly whether the chosen deck contains a so-called SP-engine.⁷ The dependent variable
is the player’s win-rate throughout all the regressions. The independent variables capture
the factors that may influence the player’s win-rate.

2.1. Data collection

As stated earlier, this report makes use of both quantitative and qualitative data, with both
data types being nonexperimental. The data were collected in the period from
June 1st until August 14th. The quantitative part of the data was collected as survey data. A total
of 84 observations remained after sorting out answers that were incomplete or for some other
reason not valid. An online survey was created at www.surveymonkey.com, so people
could either fill out the survey at that site or send their answers by e-mail. Several forms of publicity
were used to draw attention to the survey. An article was published on one of the big Pokémon
websites, www.sizprizes.com; small paper flyers were handed out at the Dutch National
Championship 2011 and the Danish National Championship 2011;⁸ e-mails were distributed to people
in the Pokémon community; and a copy of the article was posted on the Facebook page for Danish
Pokémon players. A copy of the article and the flyer can be seen in appendix A.

The article and the flyer stated certain conditions for contributing to the project.
These conditions have been carefully considered and, apart from the
reasons stated in the article, some further comments ought to be given.

The first condition was: “Master players only”. The reason for only using players from the
master division,⁹ other than the one stated in the article, is that players who have reached that
age will often be able to play the most advanced decks and also play the game with fewer mistakes
than a player who plays in a lower age division.

⁷ An SP-Engine is when the collection of Supporter and Trainer cards in your deck mainly or partly consists of:
Cyrus’s Conspiracy, Team Galactic's Invention G-101 Energy Gain, Team Galactic's Invention G-103 Power Spray, Team
Galactic's Invention G-105 Poke Turn and Team Galactic's Invention G-109 SP Radar. A scan of each of the cards
can be found in appendix B.
⁸ I was present at these two championships: at one as a player (Dutch) and at the other as the head judge (Danish).
⁹ For season 2010-2011 a player born in 1994 or prior plays in the master division.

The second condition was: “Results from this season only – Prior to Black & White”. The
reason for this is that the game had some rule changes with the release of that set, which
dramatically changed the types of decks being played, making the results very unstable. The first
early rotation in the history of Pokémon TCG also happened on July 1st, removing a number of
the oldest sets in order to keep a healthy game environment. (4)

The third and last condition was: “You must have played at least 25 premier rated games”.
As stated in the article, this number was chosen after some consideration. Too low a number would
give win-rates that are too unstable and therefore less reliable, while too high a number would
limit the amount of data that could be collected.
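
The trade-off behind the 25-game threshold can be illustrated with the standard error of an observed share. The following is a minimal sketch, assuming that games are independent with a constant win probability (a simplification that is not part of the survey design); the function name and the example numbers are illustrative only.

```python
import math


def win_rate_standard_error(p: float, games: int) -> float:
    """Standard error of an observed win-rate under a binomial assumption.

    Assumes independent games with a constant win probability p
    (an illustrative simplification).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if games <= 0:
        raise ValueError("games must be positive")
    return math.sqrt(p * (1.0 - p) / games)


# Hypothetical player with a true win probability of 65 %.
for games in (10, 25, 100):
    se = win_rate_standard_error(0.65, games)
    print(f"{games:>3} games: standard error = {se:.3f}")
```

Under these assumptions the standard error at 25 games is a little under 0.10, and it halves only when the number of games is quadrupled, which is the kind of tension between reliability and sample size described above.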

The qualitative part of the data consists of 7 interviews with players and judges. The interviews
were conducted during the Pokémon TCG World Championship 2011 at the Hilton Bayfront hotel in San
Diego, California, from August 10th to August 14th. The selected people were all asked the same
questions in the same order. As stated earlier, the people selected for the interviews were a mix
of players and judges from all around the world, in order to get a more complete view of the
game. The questions asked during the interviews can be seen in appendix A.

The questions in the interviews were meant to be well connected to the questions asked in
the survey. However some of the questions in the interview also go beyond the questions asked
in the survey, to get some interesting statements from the people who were interviewed. All
together the statements obtained in the 7 interviews formed the different hypotheses stated in
this paper.

2.2. Variables

In this section the different variables used in the cross-sectional regression
of Pokémon TCG players’ win-rates are described. Table 1 summarizes the data that have
been collected.


Table 1: Name, definition, mean, variance, minimum and maximum value of each of the variables described in the following
subsections.

Name        Definition                                                        Mean       Variance   Min  Max
wr          The player’s win-rate, expressed as a share between 0 and 1.      0.6463333  0.009979   0.4  0.904
age         Age of the player in years.                                       21.83333   66.45382   15   64
exp         The experience of the player in whole years.                      4.97619    12.43316   1    12
sp          If the player’s deck contained an sp-engine.                      0.5357143  0.2517212  0    1
job         Indicating if the player has a full time job.                     0.202381   0.1633678  0    1
prof        Telling if the player has earned the title of Pokémon professor.  0.4285714  0.2478485  0    1
abroad      If the player played a tournament abroad in the past season.      0.6666667  0.2248996  0    1
family      Indicating if the player has any family members playing.          0.4404762  0.2494263  0    1
usa_player  Telling if the player is from the USA or not.                     0.5        0.253012   0    1
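
The variances reported for the dummy variables in Table 1 can be cross-checked from their means alone. The sketch below assumes that the reported variances are sample variances with divisor n − 1 (the text does not say which divisor was used) and that all 84 observations enter each statistic; the function name is illustrative.

```python
def dummy_sample_variance(mean: float, n: int) -> float:
    """Sample variance (divisor n - 1) of a 0/1 variable with the given mean."""
    if n < 2:
        raise ValueError("need at least two observations")
    return mean * (1.0 - mean) * n / (n - 1)


N_PLAYERS = 84  # number of observations reported in the text

# Means and variances as printed in Table 1.
table_1 = {
    "sp": (0.5357143, 0.2517212),
    "job": (0.202381, 0.1633678),
    "prof": (0.4285714, 0.2478485),
    "abroad": (0.6666667, 0.2248996),
    "family": (0.4404762, 0.2494263),
    "usa_player": (0.5, 0.253012),
}

for name, (mean, reported) in table_1.items():
    implied = dummy_sample_variance(mean, N_PLAYERS)
    print(f"{name:<11} implied {implied:.7f}  reported {reported:.7f}")
```

For a 0/1 variable the variance carries no information beyond the mean, so agreement between the implied and reported values is a consistency check on the table rather than a new result.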

As mentioned earlier, the dependent variable is the player’s win-rate. Multiple regressions
are run in order to test models with different specifications. In this way the most true and fair
of the tested models can be found. That model can then be used to describe what
influences a Pokémon TCG player’s win-rate.

2.2.1 Dependent variable

When working with a win-rate, the model that fits the data best can be considered to be a
model of a discrete choice, since a player can either win or lose when playing a game of
Pokémon TCG. The model for this paper has therefore been chosen among the models known as
qualitative response (QR) models. In the present case, the dependent variable can be
formulated as a coding of a qualitative outcome, namely the wins (coded as 1) and losses
(coded as 0) of a player. However, there is further information in the present case, as there are
repeated observations for each player, i.e., each player played a number of games. This implies
that the dependent variable can be represented as the share of games won by the player. Thus,
the dependent variable (wr) is a figure between 0 and 1, which can be thought of as the
probability that the player in question would win a game. To fit into a probability model, the
dependent variable (wr) must therefore be transformed. In what follows, the logistic probability
model and thus the so-called logit transformation is applied, which reads as follows:

\[ \ln\left(\frac{wr}{1 - wr}\right) \]

A discussion of the choice of model and how the transformation is done, will be given in the
next section.
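
The logit transformation of the win-rate can be sketched in a few lines. This is an illustrative helper, not the Stata code used for the estimation; the function names are assumptions, and the three example values are the minimum, mean and maximum of wr from Table 1.

```python
import math


def logit(wr: float) -> float:
    """Log-odds ln(wr / (1 - wr)) of a win-rate strictly between 0 and 1."""
    if not 0.0 < wr < 1.0:
        raise ValueError("wr must lie strictly between 0 and 1")
    return math.log(wr / (1.0 - wr))


def inverse_logit(z: float) -> float:
    """Map a log-odds value back to a share between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Minimum, mean and maximum win-rate from Table 1.
for wr in (0.4, 0.6463333, 0.904):
    z = logit(wr)
    print(f"wr = {wr:.4f} -> logit = {z:+.4f} -> back = {inverse_logit(z):.4f}")
```

The transformation is undefined for a win-rate of exactly 0 or 1, which is why the helper rejects those values; the observed range of 0.4 to 0.904 stays clear of both.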

2.2.2 Independent variables

A number of explanatory variables are applied in the regressions.
This section gives a short description of each variable used.

The “age” variable denotes the age of the player at the time he or she answered the survey.
This variable is used to capture the effect age has on a player’s performance. It was also
used to create a new variable, “agesquare”. These two variables were then used to check for a
possible ‘peak’ in a player’s performance and, more importantly, at what age such a peak occurs.
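
The role of the two age variables can be made explicit with a short worked step. As a sketch, assume that “age” and “agesquare” enter the transformed model with coefficients $\beta_{age}$ and $\beta_{agesquare}$ (symbols introduced here for illustration only). Setting the derivative with respect to age to zero gives the age at which the effect turns:

\[ \frac{\partial}{\partial\,\text{age}}\left(\beta_{age}\,\text{age} + \beta_{agesquare}\,\text{age}^2\right) = \beta_{age} + 2\,\beta_{agesquare}\,\text{age} = 0 \quad \Longrightarrow \quad \text{age}^{*} = -\frac{\beta_{age}}{2\,\beta_{agesquare}} \]

This turning point is a peak only if $\beta_{agesquare} < 0$; with $\beta_{agesquare} > 0$ it would instead be a trough.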

“exp” denotes the number of whole years the individual player has been playing the game. It
measures the effect of experience in the game. Together with the “sp” variable, it was used to
extend the model so that the ‘premium’ of playing a deck with an sp-engine may depend on the years of
experience. This interaction term, denoted “sp_exp”, tests the idea that a
player gets a ‘premium’ according to his or her years of experience: a player with little
experience who picks up an sp-deck would gain a rather small boost in the win-rate, while a
player with more experience would gain a bigger boost in his or her win-rate by playing an
sp-deck. (4)(5)(6) Furthermore, “exp” is used to create “expsquare”, to check whether the
return to experience decreases, which would show as a negative coefficient on “expsquare”.
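
The derived regressors described so far can be built directly from the raw survey answers. The sketch below is illustrative only: the `Player` class and the function name are assumptions, while the variable names follow Table 1 and the text.

```python
from dataclasses import dataclass


@dataclass
class Player:
    """Raw survey answers for one player (field names follow Table 1)."""
    age: int
    exp: int  # whole years of experience
    sp: int   # 1 if the main deck contained an sp-engine, else 0


def derived_regressors(player: Player) -> dict:
    """Build the squared terms and the interaction term described in the text."""
    return {
        "agesquare": player.age ** 2,
        "expsquare": player.exp ** 2,
        "sp_exp": player.sp * player.exp,  # equals exp for sp players, 0 otherwise
    }


# Two hypothetical players with the same age and experience.
print(derived_regressors(Player(age=20, exp=5, sp=1)))
print(derived_regressors(Player(age=20, exp=5, sp=0)))
```

With the interaction term included, the ‘premium’ of playing an sp-deck for a player with a given number of years of experience is the coefficient on “sp” plus the coefficient on “sp_exp” times those years, so a positive interaction coefficient means the premium grows with experience.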

The variable “sp” is a dummy variable which indicates whether or not a player mostly played a deck
containing an sp-engine. It measures the effect of playing with a deck containing an
sp-engine. It is important to note that some players played different decks during the season, so
they might not have played a deck containing an sp-engine for the entire season. The
variable is therefore based on the deck a player played the most during the season. As mentioned above, this
variable was used together with “exp” to create an interaction term to test for the ‘premium’
of playing an sp-deck.

To test for the effect of having a full-time job, the variable “job” indicates whether the player in
question had a full-time job rather than studying. It was used to test whether a player’s win-rate is
negatively affected by having a full-time job instead of studying. (3)(7)

The “prof” variable captures the effect of being a Pokémon professor. This variable was
added to test whether passing the professor exam and achieving the title of Pokémon professor
increases a player’s win-rate. The interviews gave the impression that being a Pokémon
professor will not affect the win-rate significantly. (8)(9)

“abroad” denotes whether the player has been playing abroad during the season.¹⁰ It measures the
effect of playing abroad on a player’s win-rate and is used to test whether playing abroad during the
season has a positive effect on his or her win-rate. The general opinion gained from the
interviews suggests a positive effect, since the player in this way gets exposure to
more tournaments, different players, different decks and different strategies. (3) (5) (9)

The dummy variable “family” tells whether a player has other family members playing the game.
It was used to test whether having family members who play has a positive effect. Having
family members who play makes it easy for a player to have a quick game without much planning: the
player can just wake up in the morning and ask another family member if he or she wants to play
Pokémon, which makes it easier to develop and test strategies. (3) (6)

The last dummy, “usa_player”, denotes whether a player comes from the USA. This variable was
used to test whether there is any significant difference in level between players from the USA
and players from outside the USA.¹¹

¹⁰ For players in the USA, playing in another state counts as playing abroad.
¹¹ Countries included in the list of countries outside the USA are: Australia, Canada, Denmark, England, Finland,
Germany, Italy, Mexico, Netherlands, Portugal and Sweden.


Chapter 3. Methods

This chapter discusses the methods used to obtain the results in this paper, as well as
the reasons for the choice of model. It concludes with
the ‘main’ regression model used in this paper.

3.1. Probability models

In a situation where the dependent variable is a coding of a qualitative outcome, it does
not seem that the familiar type of regression can be applied. However, it is possible to construct
models in which each decision is linked to a set of factors. In this way it is still possible to
maintain a regression-like approach. This is done by using probability models with the
general structure:

\[ \mathrm{Prob}(\text{event } j \text{ occurs}) = \mathrm{Prob}(Y = j) = F[\text{relevant effects}, \text{parameters}] \tag{3.1} \]

The interest of this paper is to see how factors such as age, experience, family and more
explain whether a TCG player wins or loses. The information on these factors can be gathered in a
vector $\mathbf{x}$, so that the model can be expressed as (10):

\[ \mathrm{Prob}(Y = 1 \mid \mathbf{x}) = F(\mathbf{x}, \boldsymbol{\beta}) \quad \text{and} \quad \mathrm{Prob}(Y = 0 \mid \mathbf{x}) = 1 - F(\mathbf{x}, \boldsymbol{\beta}) \tag{3.2} \]

The set of parameters $\boldsymbol{\beta}$ reflects the change in the probability following a change in the vector $\mathbf{x}$.
Such a change in $\mathbf{x}$ could be that a player started to play a deck containing an SP-engine
(assuming the player did not play SP beforehand). It is possible to keep the standard linear
regression $F(\mathbf{x}, \boldsymbol{\beta}) = \mathbf{x}'\boldsymbol{\beta}$ and construct a model on that basis. However, it has been shown that this
model has some complications and might therefore not give predictions that look
like probabilities. (10) One of the complications is that the error term is Bernoulli distributed. This
means that $\varepsilon$ will be equal to either $-\mathbf{x}'\boldsymbol{\beta}$ or $1 - \mathbf{x}'\boldsymbol{\beta}$, with probabilities $1 - F$ and $F$,
respectively. These complications are the reason that the linear regression approach is not so
frequently used. The requirement for a model is then that it gives predictions that are
consistent with the theory described in (3.1). Following this theory it would be expected that (10):

\[ \lim_{\mathbf{x}'\boldsymbol{\beta} \to +\infty} \mathrm{Prob}(Y = 1 \mid \mathbf{x}) = 1 \quad \text{and} \quad \lim_{\mathbf{x}'\boldsymbol{\beta} \to -\infty} \mathrm{Prob}(Y = 1 \mid \mathbf{x}) = 0 \tag{3.3} \]


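In principle, any continuous probability distribution defined over the real line will suffice.
The distribution chosen for this paper is the logistic distribution, which gives (10):

\[ \mathrm{Prob}(Y = 1 \mid \mathbf{x}) = \frac{e^{\mathbf{x}'\boldsymbol{\beta}}}{1 + e^{\mathbf{x}'\boldsymbol{\beta}}} = \Lambda(\mathbf{x}'\boldsymbol{\beta}) \tag{3.4} \]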

The model that arises from this distribution is called the logit model. It is closely related to
the probit model, which arises from the normal distribution. The difference between the distributions
lies in the tails: the tails of the logistic distribution are heavier than the tails of the normal
distribution. Similar results would therefore be expected for intermediate
values of $\mathbf{x}'\boldsymbol{\beta}$ (e.g. between -1.2 and +1.2). Furthermore, the logistic distribution tends to give a larger
probability to Y=1 than the normal distribution when $\mathbf{x}'\boldsymbol{\beta}$ is extremely small, and a smaller
probability to Y=1 when $\mathbf{x}'\boldsymbol{\beta}$ is very large. (10) On a theoretical basis it is hard to justify which
distribution should be used, and in most applications the choice between the two distributions
seems to make little difference. In this case the choice of the logit model is based on
mathematical convenience.
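
The difference between the logistic and the normal distribution can be inspected numerically. The sketch below evaluates both cumulative distribution functions at the same index value, without any rescaling of the index between the two models; the function names and the chosen index values are illustrative assumptions.

```python
import math


def logistic_cdf(z: float) -> float:
    """Lambda(z) = exp(z) / (1 + exp(z)), the distribution function in (3.4)."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z: float) -> float:
    """Standard normal CDF, the distribution function behind the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Prob(Y = 1) under both distributions for the same index value x'beta.
for z in (-4.0, -1.2, 0.0, 1.2, 4.0):
    print(f"x'b = {z:+.1f}  logit: {logistic_cdf(z):.4f}  probit: {normal_cdf(z):.4f}")
```

Both functions equal 0.5 at an index of zero, and the gap between them at large negative and large positive index values reflects the heavier tails of the logistic distribution.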

After explaining the choice of the logistic distribution it will be natural to look at how the
logit model is derived and thus how the transformation of the dependent variable became
ln(wr/(1-wr)).

3.2. The logit model

First of all, the data observed for this paper are grouped data. The grouped data
were obtained by observing $n_i$ individuals (in this case 84 individuals were observed
independently of each other), all of them having the same vector $\mathbf{x}_i$. The dependent variable
was then a coding of the qualitative outcome (between 0 and 100). For simplicity it will be
assumed throughout the rest of this section that the dependent variable denotes whether a player
won or lost a game, instead of the win-rate. The idea is still the same and it can be extended
to the win-rate, since the win-rate denotes the proportion of games won out of all games played. The
dependent variable will now consist of the proportion $P_i$ of the $n_i$ individuals $ij$ who
responded with $y_{ij} = 1$ (a game won). A single observation is then expressed as
$[n_i, P_i, \mathbf{x}_i]$, $i = 1, \dots, 84$. In this formulation it is possible to use the familiar regression
methods to analyze the relationship between the proportion $P_i$ and the vector of independent
variables $\mathbf{x}_i$. The observed $P_i$ can be treated as an estimate of the population quantity
$\pi_i = F(\mathbf{x}_i'\boldsymbol{\beta})$. This problem can be treated as a Bernoulli experiment and written as (10):

\[ P_i = F(\mathbf{x}_i'\boldsymbol{\beta}) + \varepsilon_i = \pi_i + \varepsilon_i \tag{3.5} \]

Where the expected value and variance of the error term are given by:

\[ E[\varepsilon_i] = 0, \qquad \mathrm{Var}[\varepsilon_i] = \frac{\pi_i (1 - \pi_i)}{n_i} \tag{3.6} \]

As can be seen, the variance depends on $\mathbf{x}_i$ and the error term is therefore heteroscedastic,
which suggests that the parameters can be estimated using a weighted least squares
regression. However, there is another way to proceed. The function $F(\mathbf{x}_i'\boldsymbol{\beta})$ is strictly
monotonic;¹² it is therefore one-to-one and has an inverse. A Taylor series approximation
of this inverse around the point $P_i = \pi_i$ ($\varepsilon_i = 0$) can be considered (10):

\[ F^{-1}(P_i) = F^{-1}(\pi_i + \varepsilon_i) \approx F^{-1}(\pi_i) + \left[\frac{dF^{-1}(\pi_i)}{d\pi_i}\right](P_i - \pi_i) \tag{3.7} \]

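This expression can then be reduced to (10):

\[ F^{-1}(P_i) \approx \mathbf{x}_i'\boldsymbol{\beta} + \frac{\varepsilon_i}{f(\pi_i)} \tag{3.8} \]

Since (10)

\[ F^{-1}(\pi_i) = \mathbf{x}_i'\boldsymbol{\beta} \quad \text{and} \quad \frac{dF^{-1}(\pi_i)}{d\pi_i} = \frac{1}{F'(F^{-1}(\pi_i))} = \frac{1}{f(\pi_i)} \tag{3.9} \]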

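Equation (3.8) then produces a heteroscedastic linear regression of the form (10):

\[ F^{-1}(P_i) = z_i = \mathbf{x}_i'\boldsymbol{\beta} + u_i \tag{3.10} \]

Where (10)

\[ E[u_i \mid \mathbf{x}_i] = 0 \quad \text{and} \quad \mathrm{Var}[u_i \mid \mathbf{x}_i] = \frac{F(\pi_i)[1 - F(\pi_i)]}{n_i [f(\pi_i)]^2} \tag{3.11} \]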

With this knowledge it is now possible to apply this to the logistic model in (3.4). The
inverse of the logistic function is the following (10):

\[ \Lambda^{-1}(\pi_i) = \ln\left(\frac{\pi_i}{1 - \pi_i}\right) \tag{3.12} \]

¹² This is true for a probability model. (10)


The above function is known as the logit of $\pi_i$, hence the name “logit” model. It has now been
shown how the logit function of $\pi_i$ is derived and therefore why it can be used in this
paper.
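
For the logistic case the variance in (3.11) simplifies, which is what makes a weighted regression on the transformed share practical. The following worked step is a sketch that uses $\lambda$ for the logistic density (a symbol not used elsewhere in this paper) together with the standard property $\lambda = \Lambda(1 - \Lambda)$:

\[ \mathrm{Var}[u_i \mid \mathbf{x}_i] = \frac{\pi_i (1 - \pi_i)}{n_i\,[\pi_i (1 - \pi_i)]^2} = \frac{1}{n_i\, \pi_i (1 - \pi_i)}, \qquad \pi_i = \Lambda(\mathbf{x}_i'\boldsymbol{\beta}) \]

The weights in a weighted least squares regression of $\ln(P_i / (1 - P_i))$ on $\mathbf{x}_i$ are therefore proportional to $n_i \pi_i (1 - \pi_i)$, with $P_i$ used in place of the unknown $\pi_i$ in practice.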

3.3. The model

As mentioned earlier, a cross-sectional analysis of 84 observations has been conducted. The
logit model was chosen because of its mathematical convenience and the fact that a
continuous probability distribution, like the logistic, is consistent with the theory from (3.1).
Furthermore, since the dependent variable in this paper consists of a qualitative outcome, it was
possible to maintain the familiar linear regression with the use of probability models. The
dependent variable could then be linked to a list of factors, each of them with a different
impact on the probability of, in this case, a higher win-rate.

When estimating the model using the logit model, some problems arise due to
certain types of heteroscedasticity. To account for the heteroscedasticity, a weighted
least-squares logistic regression for grouped data has been used for estimation. This type
of model is known as a glogit¹³ model, and it accounts for heteroscedasticity caused by
differences in the group sizes and by an error term which is Bernoulli distributed. The two models
are nevertheless closely related and provide almost the same estimates.
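
The idea of a weighted least-squares logistic regression for grouped data can be sketched for the simplest case of one explanatory variable. This is not Stata's implementation of glogit; it is a minimal illustration that regresses the logit of each observed share on a constant and one regressor, using weights n_i P_i (1 - P_i), which are the inverse of the approximate variance of the transformed error term in the logistic case. The function names and the data are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a share strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def grouped_logit_wls(x, wins, games):
    """Weighted least squares of logit(P_i) on a constant and one regressor.

    Weights n_i * P_i * (1 - P_i) are the inverse of the approximate variance
    of the transformed error term. Returns (intercept, slope).
    """
    p = [w / n for w, n in zip(wins, games)]
    z = [logit(pi) for pi in p]
    wts = [n * pi * (1.0 - pi) for n, pi in zip(games, p)]

    sw = sum(wts)
    x_bar = sum(w * xi for w, xi in zip(wts, x)) / sw
    z_bar = sum(w * zi for w, zi in zip(wts, z)) / sw
    sxz = sum(w * (xi - x_bar) * (zi - z_bar) for w, xi, zi in zip(wts, x, z))
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(wts, x))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: years of experience, games won, games played.
experience = [1, 3, 5, 8]
games_won = [14, 20, 27, 45]
games_played = [25, 32, 40, 60]

intercept, slope = grouped_logit_wls(experience, games_won, games_played)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

Players with more games and with shares closer to 0.5 receive more weight, which is how the differences in group size enter the estimation.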

It is important to note that even though the glogit model is applied to estimate the data and produce the results, all tests performed in this paper are based on the logit model with the same explanatory variables and not on the glogit model. This is because the familiar tests known from OLS cannot readily be used on the transformed glogit model, as they are not standard options in the Stata implementation. Formally, it is possible to derive the tests for the glogit model, but this is beyond the scope and space of the present project. The results from the tests on the logit model can then be used to indicate any possible problems with the glogit model, since the two models produce very similar estimates.

As with any other linear regression model, it is important to know whether the assumptions are violated.14 Even though the glogit model guards against heteroscedasticity caused by the two cases described above, it is preferable also to test for heteroscedasticity as a function of the data matrix X, so the Breusch-Pagan / Cook-Weisberg test in Stata is applied. The assumption of homoscedasticity states that the conditional second moment, which in general is a nonlinear function of X, is a constant, or written in more mathematical terms:

$$E(\varepsilon_i^2 \mid \mathbf{X}) = \sigma^2 > 0 \quad (i = 1, 2, \ldots, n)$$  (10) (3.13)

This assumption is also known as the spherical error variance assumption. If it is violated, the variance is not constant but varies with X. The estimator is then no longer BLUE, and the t- and F-tests are no longer valid. However, the estimator is still unbiased. (11)

13 The name “glogit” comes from the Stata command of the same name. Stata is the statistical software used to estimate the data in this paper.
14 In this paper the formulation of Hayashi will be used. (11)
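The logic of testing for heteroscedasticity as a function of the regressors can be sketched as follows: fit the regression, then check whether the squared residuals can be explained by the regressor. The snippet uses Koenker's studentized variant of the Breusch-Pagan statistic (sample size times the R-squared of the auxiliary regression) with one regressor as a stand-in; this is an assumption for illustration and may differ from the exact statistic Stata reports. All function names and data are hypothetical.

```python
def ols_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def r_squared(x, y):
    """R-squared of the regression of y on x (0 if y has no variation)."""
    a, b = ols_line(x, y)
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    if tss == 0.0:
        return 0.0
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return 1.0 - rss / tss


def koenker_bp_lm(x, y):
    """LM statistic n * R^2 from regressing squared OLS residuals on x."""
    a, b = ols_line(x, y)
    squared_residuals = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, squared_residuals)


# Hypothetical data: the residual spread grows with x in the first case only.
lm_hetero = koenker_bp_lm([-1, -1, 1, 1], [0, 0, 2, -2])
lm_homo = koenker_bp_lm([-1, -1, 1, 1], [1, -1, 1, -1])
```

With one regressor the statistic is compared with a chi-squared distribution with one degree of freedom (5 % critical value of about 3.84); a value below it means the null hypothesis of homoscedasticity is not rejected.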

To check whether the assumption of no multicollinearity is violated, a correlation matrix has been used to inspect the correlation between the different explanatory variables. The idea of this assumption is that the matrix X should have full column rank, i.e. none of the columns of the data matrix X can be written as a linear combination of the other columns of X. The assumption also automatically implies that there are at least as many observations as regressors.

The most important assumption of the linear regression is strict exogeneity, which states that the expectation of the error term, conditional on all regressors for all observations, is zero.15 This can be stated mathematically as:

$$E(\varepsilon_i \mid \mathbf{X}) = 0 \quad (i = 1, 2, \ldots, n)$$  (11) (3.14)

The assumption of strict exogeneity has several useful implications. One of them is that $E(\varepsilon_i) = 0$, i.e. the unconditional mean of the error term is 0.16 Another implication is that the explanatory variables are orthogonal to the error term for all observations:17

$$E(x_{jk}\,\varepsilon_i) = 0 \quad (i, j = 1, \ldots, n;\ k = 1, \ldots, K)$$  (11) (3.15)

If this assumption is not satisfied, one or more explanatory variables are said to be endogenous. There are different ways in which an explanatory variable can be endogenous. The case considered in this paper is omitted variable bias. Recall that the error term captures the impact of variables not included in the regression; in this paper these could be variables like “innate ability”, “Pokémon League”18 and “motivation”. If any such variable is correlated with any of the explanatory variables, the assumption of strict exogeneity does not hold. A closer study of omitted variable bias is provided in appendix C.

15 Some authors define strict exogeneity as xi being independent of εi.
16 The proof of this is an application of the law of total expectation.
17 The proof of this is an application of the law of iterated expectations.
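The mechanism of omitted variable bias can be illustrated with a small noise-free example for the linear case: when the outcome depends on two correlated variables and only one is included, the estimated slope absorbs part of the omitted effect. The variable names, coefficients and data are hypothetical, and the sketch covers a plain linear regression only, not the logit model.

```python
def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical data: x1 is included (e.g. experience), x2 is omitted
# (e.g. motivation) and is positively correlated with x1.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
b1, b2 = 0.5, 1.5                       # assumed true coefficients
y = [b1 * u + b2 * v for u, v in zip(x1, x2)]

short_slope = slope(x1, y)              # regression that omits x2
delta = slope(x1, x2)                   # how x2 moves with x1
expected_slope = b1 + b2 * delta        # omitted variable bias formula
```

Here the short regression attributes the effect of the omitted variable to the included one, so its slope exceeds the assumed true coefficient b1 by b2 times delta.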

No test for endogeneity is applied since it would be rather complicated in the logit model.
Moreover the model does not suggest problems with endogenous variables.

The econometric model is presented below:

$$\text{logit\_wr}_i = \beta_0 + \boldsymbol{\beta}\mathbf{X}_i + \boldsymbol{\delta}\,\mathbf{Dummies}_i + \varepsilon_i$$

X is a set of independent variables.

The dummies capture different effects such as:

$$sp_i,\ job_i,\ prof_i,\ abroad_i,\ family_i,\ usa\_player_i$$

However, a logit model with identical explanatory variables is used to conduct the tests for each glogit model.

18 Dummy if the player regularly attends a Pokémon League.


Chapter 4. Results

This chapter presents the results of the different model specifications used in this research. The first model considered includes all the mentioned variables as independent variables. From there on, different independent variables are removed from the regression in order to find the model which best describes the present data and which can answer the hypotheses stated in this paper. The chapter ends with the final model, describing what has a significant impact on a Pokémon TCG player’s win-rate.

Before presenting the results, table 2 shows the correlation matrix of the variables used in the different specifications, in order to check for any problems with multicollinearity. Correlations above 0.3 between the variables are highlighted in the table. There is a relatively high correlation between “age” and “job”, which is not surprising since an older player is more likely to have a full-time job rather than be studying. The correlation between “exp” and “prof” is also not that surprising, since it is often players with at least some experience who are able to pass the professor exam. The fact that a player has to be 18 in order to take the exam also explains this correlation, as well as the correlation between “prof” and “age”. Moreover, “usa_player” and “abroad” are positively correlated. This might be explained by the fact that the distribution of tournaments in the USA is somewhat different from that in Europe. The USA and Canada have so-called Regional Championships, which are big tournaments that attract players from many states. Only a few Regional Championships are held, and therefore people have to travel across state and country borders in order to attend. Europe, on the other hand, had only one tournament in the season 2010/2011 which could be counted as a Regional Championship.19 The variables “age” and “sp” are negatively correlated. This indicates that the older the player is, the less likely the player is to play a deck containing an SP-engine.

Finally, it can be concluded that the models are unlikely to suffer from problems with multicollinearity, since no variables are highly correlated.

19 The European Challenge Cup held in Arnhem, The Netherlands.


Table 2: Correlation matrix of the variables used in the specifications.

            age       exp       sp        job       prof      abroad    family    usa_player
age         1
exp         -0.1167   1
sp          -0.2519   0.1163    1
job         0.4638    -0.1995   -0.1252   1
prof        0.2642    0.3765    0.0345    0.2224    1
abroad      0.001     0.0096    0.1013    -0.021    0.2041    1
family      0.1573    0.1839    0.0567    0.2096    0.2008    0.0678    1
usa_player  0.0617    -0.0068   0.0716    0.0296    0         0.2525    -0.2158   1
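The screening of the correlation matrix can be sketched as a simple scan of the lower triangle of table 2 for pairs whose absolute correlation exceeds a threshold. The values are copied from table 2; the function name and the second threshold of 0.25 are assumptions made for illustration.

```python
names = ["age", "exp", "sp", "job", "prof", "abroad", "family", "usa_player"]

# Lower triangle of table 2, one row per variable (diagonal included).
lower = [
    [1],
    [-0.1167, 1],
    [-0.2519, 0.1163, 1],
    [0.4638, -0.1995, -0.1252, 1],
    [0.2642, 0.3765, 0.0345, 0.2224, 1],
    [0.001, 0.0096, 0.1013, -0.021, 0.2041, 1],
    [0.1573, 0.1839, 0.0567, 0.2096, 0.2008, 0.0678, 1],
    [0.0617, -0.0068, 0.0716, 0.0296, 0, 0.2525, -0.2158, 1],
]


def flag_pairs(threshold):
    """Return (row variable, column variable, r) for every |r| > threshold."""
    flagged = []
    for i, row in enumerate(lower):
        for j, r in enumerate(row[:-1]):  # row[:-1] skips the diagonal entry
            if abs(r) > threshold:
                flagged.append((names[i], names[j], r))
    return flagged


above_030 = flag_pairs(0.30)
above_025 = flag_pairs(0.25)
```

With the threshold of 0.3 only the pairs job/age and prof/exp are flagged; lowering the threshold to 0.25 additionally picks up sp/age, prof/age and usa_player/abroad, which are the other pairs discussed in the text.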

Table 3 shows the results from seven different specifications. All regressions are done with the glogit model, to account for heteroscedasticity caused by differences in the group sizes and by the Bernoulli distributed error term, which is a common problem with probability models. In regression (1), all variables shown in the correlation matrix above were used. However, this specification showed only a few variables with explanatory power. Therefore a specification with only “age”, “exp” and “sp” as independent variables was run in regression (2). In regressions (3) to (7), each of the variables not used in (2) was added in turn to check whether it would add any explanatory power to the model. In all the specifications a heteroscedasticity test was applied to check for heteroscedasticity as a function of the data matrix. In all the specifications, the null hypothesis of homoscedasticity was not rejected, so there is no evidence that the spherical error variance assumption is violated. All tests for heteroscedasticity can be seen in appendix D.

In all the specifications “age”, “exp” and “sp” were included; they were significant at the 5 % level, except for “age” in regressions (1), (3) and (6). The coefficient for “age” is negative, which is in line with the expectations. It is not strongly negative, which is not surprising. The coefficient for “exp”, which captures the effect of one extra year of experience in the game, is positive. This supports the theory well. The strongly positive sign of the “sp” coefficient is also in line with the expectations. It is large compared to the other coefficients; however, this does not seem so surprising when looking at the statements from the interviews. (4) (8) (13) Even though the effect of having a job was insignificant, the negative sign of the coefficient is in line with the expectations. (4) (10) The positive coefficients for “prof” and “abroad” are also as one would expect, even though the positive sign on “prof” could be discussed. The negative sign on “family” does not meet the expectations. One would have expected this coefficient to be positive (7)(9)(10); however, the coefficient is strongly negative.


Table 3: Estimation results using different model specifications. When using the glogit regression, one must specify the number of positive values and then the total population as the LHS of the regression. gw (games won) is therefore the number of positive values out of the entire number of gp (games played).

            (1)        (2)        (3)        (4)        (5)        (6)        (7)
            gw         gw         gw         gw         gw         gw         gw
age         -0.00937   -0.0117*   -0.00791   -0.0149*   -0.0121*   -0.0101    -0.0117*
            (-1.51)    (-2.18)    (-1.32)    (-2.60)    (-2.25)    (-1.85)    (-2.13)
exp         0.0350*    0.0443**   0.0409**   0.0369*    0.0440**   0.0475***  0.0441**
            (2.36)     (3.34)     (3.05)     (2.62)     (3.34)     (3.54)     (3.30)
sp          0.234*     0.253**    0.247*     0.237*     0.232*     0.261**    0.255**
            (2.47)     (2.68)     (2.64)     (2.53)     (2.44)     (2.77)     (2.64)
job         -0.186                -0.177
            (-1.37)               (-1.39)
prof        0.165                            0.147
            (1.57)                           (1.48)
abroad      0.122                                       0.142
            (1.11)                                      (1.37)
family      -0.111                                                 -0.114
            (-1.11)                                                (-1.23)
usa_player  -0.0624                                                           -0.00253
            (-0.65)                                                           (-0.03)
_cons       0.545**    0.594***   0.565**    0.642***   0.506**    0.586***   0.594***
            (2.97)     (3.50)     (3.33)     (3.76)     (2.81)     (3.48)     (3.46)
N           84         84         84         84         84         84         84

t statistics in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001
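To give a feeling for the size of the coefficients in table 3, the sketch below turns the estimates of regression (2) into a predicted win-rate through the logistic function and expresses the “sp” coefficient as an odds ratio. The coefficients are taken from column (2); the example player (20 years old, 5 years of experience) and the function name are hypothetical, and treating the fitted logit as a win probability is an illustrative simplification.

```python
import math

# Coefficients of regression (2) in table 3.
coef = {"_cons": 0.594, "age": -0.0117, "exp": 0.0443, "sp": 0.253}


def predicted_win_rate(age, exp_years, sp):
    """Predicted win-rate from the fitted logit index of regression (2)."""
    z = (coef["_cons"] + coef["age"] * age
         + coef["exp"] * exp_years + coef["sp"] * sp)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical player: 20 years old with 5 years of experience.
with_sp = predicted_win_rate(20, 5, 1)      # mainly played an SP-deck
without_sp = predicted_win_rate(20, 5, 0)   # did not play an SP-deck
sp_gain = with_sp - without_sp              # difference in predicted win-rate

# Multiplicative effect of an SP-engine on the odds of winning a game.
odds_ratio_sp = math.exp(coef["sp"])
```

For this hypothetical player the predicted win-rate is roughly 64 % without and 70 % with an SP-engine, which corresponds to the odds of winning being about 1.29 times higher.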


Table 4 presents results for five more glogit specifications. In these specifications the new variables “agesquare” and “expsquare” are added, along with an interaction term testing for a potential premium of playing a deck containing an SP-engine when a player has more experience. These variables are added in turn to see their individual effect on the model. In regression (1) the variable “agesquare” was added to test whether the negative effect of ageing was declining. Furthermore, “prof” was added, since it turned out to have a significant impact at the 5 % level. In regression (2) the variable “expsquare” was added to take into account the return to experience. “sp_exp” was added in regression (3) to check for the premium of playing with an SP-engine when a player has more experience. In (4) all three additional variables were added to the model to see their combined effect. Regression (5) is the final specification, in which “sp_exp” was removed due to insignificance. A test for heteroscedasticity as a function of the data matrix was applied in each specification; the null hypothesis of homoscedasticity was not rejected in any of them, indicating that the model does not violate the assumption of spherical error variance.

“agesquare” has a very small positive coefficient in all the specifications in which it was added, which could indicate a small decline in the effect of ageing. The two age coefficients are individually insignificant in the specifications where they are both present; however, they are not dropped, because they are jointly significant at the 5 % level. The coefficient for “expsquare” is negative, although only slightly. This indicates that the return to experience is decreasing. Again the coefficients for “exp” and “expsquare” are individually insignificant, but an F-test shows evidence of joint significance, and they are therefore not dropped from the model. For both “age” and “exp”, adding the squared term makes the individual coefficients insignificant, while each pair remains jointly significant. In the regressions where the interaction term “sp_exp” was added, it turned out insignificant. The sign of the coefficient is negative, which suggests that the premium of playing an SP-deck becomes smaller with more experience. The negative sign was somewhat expected, since such a deck could be some kind of autopilot deck for relatively new players.(4) On the other hand, a positive sign would not have been surprising either, since this deck is also considered a rather complicated deck which requires skill to play. (7)
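The meaning of the negative interaction term can be made concrete by writing the SP premium on the logit scale as a function of experience, i.e. the “sp” coefficient plus the “sp_exp” coefficient times the years of experience. The coefficients are those of regressions (3) and (4) in table 4; the function names are hypothetical, and since “sp_exp” is insignificant the numbers are purely illustrative.

```python
def sp_premium(b_sp, b_sp_exp, exp_years):
    """Effect of playing an SP-deck on the logit at a given experience level."""
    return b_sp + b_sp_exp * exp_years


def break_even_experience(b_sp, b_sp_exp):
    """Years of experience at which the estimated SP premium reaches zero."""
    return -b_sp / b_sp_exp


# (sp, sp_exp) coefficients of regressions (3) and (4) in table 4.
spec3 = (0.388, -0.0297)
spec4 = (0.368, -0.0327)

premium_new_player = sp_premium(*spec3, 1)    # player with 1 year of experience
premium_veteran = sp_premium(*spec3, 10)      # player with 10 years of experience
break_even_3 = break_even_experience(*spec3)
break_even_4 = break_even_experience(*spec4)
```

Under these point estimates the premium shrinks with experience and would vanish only after roughly 11 to 13 years of play.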


Table 4: Estimation results when adding “agesquare”, “expsquare” and the interaction term “sp_exp”.

            (1)        (2)        (3)        (4)        (5)
            gw         gw         gw         gw         gw
age         -0.0586    -0.0145*   -0.0138*   -0.0528    -0.0528
            (-1.99)    (-2.53)    (-2.37)    (-1.78)    (-1.77)
agesquare   0.000626                         0.000567   0.000549
            (1.51)                           (1.35)     (1.31)
exp         0.0356*    0.114      0.0524**   0.121*     0.101
            (2.55)     (1.97)     (2.65)     (2.00)     (1.73)
expsquare              -0.00614              -0.00541   -0.00519
                       (-1.37)               (-1.20)    (-1.15)
sp          0.229*     0.207*     0.388*     0.368*     0.204*
            (2.44)     (2.16)     (2.37)     (2.27)     (2.13)
prof        0.233*     0.149      0.148      0.228*     0.224
            (2.05)     (1.51)     (1.50)     (2.02)     (1.97)
sp_exp                            -0.0297    -0.0327
                                  (-1.12)    (-1.26)
_cons       1.227**    0.486*     0.544**    0.927*     1.023*
            (2.90)     (2.38)     (2.84)     (2.00)     (2.24)
N           84         84         84         84         84

t statistics in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001


Chapter 5. Discussion

This chapter summarizes the results from the different specifications of the model tested in the last chapter. First, evidence is found that age has a negative effect on a player’s win-rate. In all specifications where “agesquare” was added, the negative effect of ageing declined until a player reached the end of his or her forties. At this point the estimated effect of age reaches a minimum; it is a minimum because of the positive sign of “agesquare”. This is in line with the theory and the claims from the interviews. Generally, younger players who have had a few years in the Masters division do best. These players still have a high motivation for playing the game and becoming a champion. (8) (9) During their first years in Masters they are still learning and have not reached their full potential.(6)(7) Players at that age usually also work less and go to college, and therefore usually have more free time at hand. (4) When getting older, a player has more responsibilities and the interest starts to fade, and thereby the player’s win-rate decreases.

The regressions have also shown that experience has explanatory power for the win-rate. However, the return to experience is declining, as indicated by the negative sign of “expsquare”. This supports the basic theory that playing a game for a longer time makes a player better, although at a decreasing rate.
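The turning points implied by the quadratic terms follow from setting the derivative of b1*x + b2*x^2 to zero, which gives x = -b1 / (2*b2). The sketch below applies this to the coefficients in table 4; the function name is hypothetical, and because the individual coefficients are insignificant the turning points should be read as rough indications only.

```python
def turning_point(b_linear, b_squared):
    """Value of x at which b_linear * x + b_squared * x**2 is flat."""
    return -b_linear / (2.0 * b_squared)


# (age, agesquare) coefficients from regressions (1), (4) and (5) in table 4.
age_specs = {1: (-0.0586, 0.000626), 4: (-0.0528, 0.000567), 5: (-0.0528, 0.000549)}
# (exp, expsquare) coefficients from regressions (2), (4) and (5) in table 4.
exp_specs = {2: (0.114, -0.00614), 4: (0.121, -0.00541), 5: (0.101, -0.00519)}

# Age at which the age profile is lowest (a minimum, since agesquare > 0).
age_minimum = {k: turning_point(*v) for k, v in age_specs.items()}
# Years of experience at which the profile peaks (a maximum, since expsquare < 0).
exp_maximum = {k: turning_point(*v) for k, v in exp_specs.items()}
```

The age minimum lies between about 46.6 and 48.1 years, consistent with the end of a player's forties, while the estimated return to experience flattens out after roughly 9 to 11 years.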

In all specifications the variable “sp”, denoting whether a player played with a deck containing an SP-engine, showed a significant positive impact on a player’s win-rate. This supports the findings in the interviews, where many reasons were given for SP-decks being so strong. Some pointed out that these were not necessarily the best decks out there, but they were straightforward to play and players already had the cards. (7) (9) The fact that this deck was more or less carried over from the season before made it easy for the players to construct and play it. While many other decks still had to be formed, SP-decks were already formed, and with the release of new cards a player could often just add one or two new cards and the deck was ready to go. Since this made a huge number of players play the deck, it was obvious that it would also take a lot of the top spots at tournaments.(7) Others also considered these decks to be some kind of autopilot deck, so even a relatively new player could play them. On the other hand, while most other decks were limited to one or two strategies, SP-decks offered a lot of different strategies, so players with more experience could shift the strategy from game to game. (4) (8) Some also viewed the decks as rather complicated and therefore time-consuming to learn, due to the number of different strategies a player could use. So these decks required some skill of the player in order to generate a good win-rate. (5) (6) Furthermore, no evidence is found for earning a ‘premium’ from playing SP-decks with more experience.

The results do not support the hypothesis that a player who has other family members playing will have a higher win-rate. The point estimate of the coefficient is negative, but insignificant. The interviews suggest an increased win-rate if a player has other family members playing, since the support will be greater. There is easier access to resources, since the family is then more likely to spend money on the game, and such families also tend to travel more to tournaments and thereby get to play more games. (5) (12) The fact that a player can wake up in the morning and play straight away with another family member also helps. When a player can only play in the Pokémon league once a week or with friends, the player often gets to play fewer games than if other family members played. (6) On the other hand, the interviews also pointed out that it was mainly younger players who benefited from having family members playing. This could very well explain the negative finding, since only Masters players are considered in this paper. Typically, having family members playing is good in the beginning when getting into the game, and for younger players. (7) (12)

The theory that a full-time job negatively affects a player’s win-rate is supported by the sign of the estimate; however, the connection is not significant. When a player is no longer studying and has a regular full-time job, that player typically has less time to play than before. A job is a bigger responsibility and you have to perform in order not to lose it, whereas a student can slack off a bit and still pass. (4) (6) Also, when studying, the player’s brain is used to being occupied with many different things and to thinking hard, which can help when playing the game. (5) (8)

The results reveal a positive connection between being a Pokémon professor and the win-rate; however, the connection is not strong, since it is significant only at a level between 5 and 10 %. This small positive connection is not surprising, since some have pointed out that there might be a small difference. This is because the player would know exactly what his or her cards do and what the different penalties are for making a mistake during the game. (4) (6) On the other hand, most would have expected no connection between the professor status and the win-rate. This is because many judges and professors do not necessarily know all the strategies and deck building, even though they know the rules very well. The professor status is also not only about knowing the rulings, but also about how to handle and help players, how to judge games and knowing the other mechanics of the game. (5) (7) (8)

The results support the theory that playing abroad increases a player’s win-rate; however, the connection is not significant. When playing abroad, a player gets more exposure to different decks and play styles. A player might see a card combination he or she would not have thought of, and thereby get a broader view of the game. (4) (5) (9) Players who play abroad also get more used to handling stress at big tournaments, and they find out where their strategy is good and where it is weak.(7) (8) Furthermore, when a player chooses to play abroad, he or she often already has a big commitment to the game. (6) One could also argue that it is a case of the chicken and the egg: which came first? Are they better players because they play so much, or do they play so much because they win? Either way, both contribute to a higher win-rate. (12)

Lastly, no evidence is found that being from the USA affects a player’s win-rate, which is in line with the expectations. The coefficient is slightly negative, indicating a negative relationship; however, it is highly insignificant.


Chapter 6. Conclusions

When a person plays a game, whether it is Pokémon TCG or another game, that person will often ask how to achieve the most possible wins. The aim of this research was to look at which factors play an important role when playing Pokémon TCG, so players can get a deeper understanding of what would make them better players and help them achieve the highest possible win-rate. Analyzing a game like Pokémon TCG is not easy because the card pool regularly grows and shrinks; however, understanding general factors that affect a player’s win-rate is an important step towards becoming a better player. For the season 2010/2011, data were collected through surveys and interviews, and together with techniques known from econometric theory a picture of what affects a player’s win-rate could be drawn.

From the research it can be concluded that ageing has a negative effect on a player’s win-rate; however, the effect does not prove to be strong. The effect is also declining and will be at a minimum when a player reaches the end of his or her forties.

Experience is found to have a positive effect in every specification, which supports the hypothesis stated; yet the return to experience is decreasing, so the marginal effect of one extra year of experience gets lower.

Playing with an SP-engine is found to have a strong positive impact on a player’s win-rate. It can then be concluded that a player who mainly played decks containing an SP-engine had a significantly higher win-rate than players who did not. However, no ‘premium’ from playing with an SP-engine with more experience could be shown.

It can also be concluded that having a regular full-time job or playing abroad during the season had the expected sign of effect on a player’s win-rate, whereas having family members playing did not. None of these effects proved to be significant, so none of the corresponding hypotheses could be supported.

The Pokémon professor status turned out to have a positive impact on the win-rate, so that a player who is a Pokémon professor achieves a slightly higher win-rate. Since this result is only significant at a level between 5 and 10 %, the hypothesis can only be partly supported. There is a positive effect; however, it is rather small.


Last but not least, no effect of being a player from the USA has been found, which is consistent with the hypothesis stated; the hypothesis is therefore not rejected.


Chapter 7. Future work

The results obtained during this research showed which factors had a significant impact on a Pokémon TCG player’s win-rate in the season 2010-2011. They showed that techniques known from econometrics are appropriate for describing a Pokémon TCG player’s win-rate. The model described in this paper can also be used for future seasons, though not with the exact same choice of variables, since some of them will no longer be relevant.

Further extensions to the model can also be made. It could be interesting to check for other effects on the win-rate, such as whether a player played regularly in a Pokémon league, the motivation of a player or a player’s innate ability. However, a variable like “innate ability” would be hard to measure, and a proxy variable would have to be used, such as a variable reflecting a score on an IQ test. Caution should be exercised when adding such a variable, since problems with endogeneity can arise.

Regarding the data gathering process in general, a larger sample could be collected in
future studies in order to obtain more precise estimates.

References

1. Play Pokémon. The Official Pokémon Website. www.pokemon.com. [Online] 25 April 2011. [Cited:
4 September 2011.] http://www.pokemon.com/us/news/op_bw_modifiedformat-2011-04-25/.

2. Bragonier, Danny. Statistical Analysis of Texas Holdem Poker. California State Polytechnic
University. [Online] Spring 2010. [Cited: 23 August 2011.]
http://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=1006&context=statsp&sei-
redir=1#search=%22Statistical%20Analysis%20Texas%20Holdem%20Poker%20California%20State%2
0Polytechnic%20University%22.

3. Nguyen, Duy. Regression Analysis on Poker Models. Texas Christian University. [Online] 12 April
2010. [Cited: 3 November 2011.]
http://wwwstu.tcu.edu/duynguyen/Regression%20Analysis%20on%20Poker%20Models.pdf.

4. Craig, Heidi. Interview #1. [interv.] Steffen Eriksen. 11 August 2011.

5. Nelson, David. Interview #4. [interv.] Steffen Eriksen. 11 August 2011.

6. Wittenkeller, Josh. Interview #5. [interv.] Steffen Eriksen. 11 August 2011.

7. Rountree, Ives. Interview #2. [interv.] Steffen Eriksen. 11 August 2011.

8. Ceolin, Andrea. Interview #7. [interv.] Steffen Eriksen. 12 August 2011.

9. Sucevich, Kyle. Interview #8. [interv.] Steffen Eriksen. 14 August 2011.

10. Greene, William H. Econometric Analysis. 6th Edition. s.l. : Prentice Hall, 2007.

11. Hayashi, Fumio. Econometrics. s.l. : Princeton University Press, 2000.

12. Kamada-Fujii, Doreen. Interview #6. [interv.] Steffen Eriksen. 11 August 2011.

13. Pokémon Organized Play. The Official Pokémon Website. www.pokemon.com. [Online] [Cited:
August 23, 2011.] http://www.pokemon.com/us/organized-play/tournaments/rules/.

14. Pokébeach. Pokémon Card Search: Rising Rivals. www.pokébeach.com. [Online] [Cited: 13
November 2011.] http://pokebeach.com/tcg/rising-rivals/scans.

15. PokéBeach. Pokémon Card Search: Platinum. www.pokébeach.com. [Online] [Cited: 13
November 2011.] http://pokebeach.com/tcg/platinum/scans.

Appendix A – Data collection

1. Article on www.Sixprizes.com

Below, a copy of the article published on www.sixprizes.com is given. To see the original
article, visit: http://www.sixprizes.com/uncategorized/dane-bachelor-project-pokemon-tcg/

Dane Bachelor Project on the Pokemon TCG – Need Your Help!
Written by Steffen Eriksen | June 4, 2011

Hello everyone!

Since this is my first article here on SixPrizes I will start off by introducing myself. My name is Steffen Eriksen and I am currently doing my bachelor in Mathematic-Economics in Denmark. I have been playing Pokémon TCG for around 6 years and have played tournaments in many countries.

The article is about the data collection process for my bachelor, which I hope you will be a part of.

Some time ago I got an idea about a so-called “win-rate” model for the Pokémon TCG, so I asked my university if I could write my bachelor project about this model and I got a yes! So now I have this huge opportunity to write my project about my hobby, which is really cool!

To explain a little more about this win-rate model, it is a model where I try to explain what will affect a Pokémon player’s win-rate. What I mean when I say win-rate is the percentage of your premier rated games that you have won.

The methods I will use to estimate such a model is generally different types of regression
models, which is borrowed from econometrics.

This model I can then use to answer some hypothesis I will state in the beginning of my
paper. Such a hypothesis could be: Will playing the game for a longer time give you a
higher win-rate?
So I will with this model maybe be able to answer some pretty interesting hypotheses about
the game.

Before I can even start thinking about setting up such a model I need data from Pokémon
players. Unfortunately I cannot just use data from every player out there. I have to make
some certain conditions in order to make my model valid. I will state these 3 conditions and if
you do not fulfill all 3 of the conditions below I will not be able to use your contribution.

Condition 1 – Masters Division Only

I have chosen only to make this model for players in the Masters division. This is done because different decks might do well in different age divisions. For example, in the Junior division a “Speed Jumpluff” deck might do well.

However I do not think that such a deck will do well in the master division. So such a sample
will give a misleading picture of my model and maybe show that the highest win-rate will
be achieved   with   a   “Speed   Jumpluff”   deck,   when   it   in   reality   not   is.   (Sorry   to   all   Jumpluff  
fans out there).

Condition 2 – Results from This Season Only – Prior to B/W

I am only working with the current season. So all questions below assume the current season. It is then your tournament record from the 1st of September until now that counts.

If you have already played with B&W and the new rules (e.g. in the USA) then it is your tournament record from Regionals and back that counts.

The reason why I will not take B&W into account is because it will have too big of an impact on the decks that are being played and also increase the luck factor even more.

Condition 3 – You must have played at least 25 Premier Rated games

You have to have played at least 25 premier rated games this season. The reason for this is that a low number of games played will give a more unstable win-rate. A single loss will lower your win-rate too much when your number of games played is below 25.

Then you might ask yourself, why 25 and not 30, 35, or 42 (which is the answer to everything)? I just have to pick a suitable number. Picking too low a number will give unstable win-rates. Picking too high a number will result in too few samples.

So after stating these 3 conditions I can now go on to the actual questions. All questions are
simple and can be answered right away, except for the last 2, which requires you to log onto
your Pokémon account on Pokemon.com, unless of course you can remember by heart how
many games you have played this season and how many you have won.

You also have to answer all the questions; otherwise your contribution will not be valid.

Here are the questions:

1. What country are you from?
2. How old are you?
3. How many years have you played the game? (approximately)
4. What type of deck have you used the most this season? (Examples: LuxChomp, BlazeChomp, VileGar, LostGar, DialgaChomp, MagneRock, Gyarados, MewDos, etc…)
5. Do you have a full-time job? (Full-time student does not count as a job for this question.)
6. Are you a Pokémon Professor?
7. Have you played a Premier Rated tournament abroad this season? (For players in the US, I count playing in a different state as playing abroad.)
8. Do you have family members who also play the Pokemon TCG?
9. How many Premier Rated games have you played this season? (You can check on your My Pokemon account.)
10. How many of those games have you won? (This you can check on your my-pokemon account.)

E-mail your answers to: steff5001@hotmail.com

Or simply fill out my survey here: http://www.surveymonkey.com/s/8GWP8F9

Thanks so much for your help!

Image Credits: S.S. Anne Pokemon, PokeBeach, Pokegym, and Pokemon Paradijs
2. Interview Questions

First the person who was interviewed was asked to talk about his or her experience with
the game. They were then asked to answer the following questions:

1. You see a lot of boys playing this game. Why do you think that so few girls play this
game?
2. Do you think that having other family members playing helps a player achieve a higher
win-rate?
3. When you look at all the master players, at what age do you think a player will peak and
why?
4. Do you think it will make a big difference to your win-rate if you are studying or have a
full-time job?
5. Do you think a person who is a Pokémon professor performs better than a player who is
not a Pokémon professor?
6. Looking more at the decks played in the past season, do you think people who played a
deck containing a so-called SP-engine performed better?
7. Can you point out a card or two that you think really made a difference if a player
decided to play that card in his or her deck?
8. A lot of players travel across country borders, and in the case of the USA state borders,
to play tournaments. Do you think that a player who does so ends up with a better
win-rate?

3. Hand out at Dutch Nationals and Danish Nationals

Dear Pokémon player


-Are you playing in the master division?
-Have you played +25 premier rated matches this season?
If so, then please help a fellow Pokémon
player with his bachelor project by filling out a survey on:
http://www.surveymonkey.com/s/8GWP8F9
Appendix B – Scan of cards contained in a SP-engine

Card scans of the cards found in a typical SP-engine are presented below:

Figure 1: Card scan of “Cyrus’s Conspiracy” (13)
Figure 2: Card scan of “SP Radar” (14)
Figure 3: Card scan of “Power Spray” (13)
Figure 4: Card scan of “Energy Gain” (13)
Figure 5: Card scan of “Poké Turn” (13)
Appendix C – Omitted variable bias

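To explain more about the consequences of omitting important variables, suppose that
$\mathit{logit\_wr}$ is determined by:

$$\mathit{logit\_wr}_i = \beta_0 + \beta_1\,\mathit{age}_i + \beta_2\,\mathit{exp}_i + \varepsilon_i \tag{C.1}$$

where $E(\varepsilon_i \mid \mathit{age}_i, \mathit{exp}_i) = 0$.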

Suppose that exp (experience) was not observed and the following model was estimated
instead:

$$\mathit{logit\_wr}_i = \beta_0 + \beta_1\,\mathit{age}_i + v_i \tag{C.2}$$

where $v_i = \beta_2\,\mathit{exp}_i + \varepsilon_i$.

Notice that the error term is no longer mean-independent of the variable age, i.e.:

$$E(v_i \mid \mathit{age}_i) = E(\beta_2\,\mathit{exp}_i + \varepsilon_i \mid \mathit{age}_i) = \beta_2\,E(\mathit{exp}_i \mid \mathit{age}_i) \neq 0$$

This is because $\mathit{exp}$ and $\mathit{age}$ are positively correlated
($E(\mathit{exp}_i \mid \mathit{age}_i) > 0$). In the case of (C.2) the assumption of strict
exogeneity is violated.

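The bias of the omitted variable can be described in the following way. Let $\tilde{\beta}_1$
be the estimator of $\beta_1$ from a simple regression of $\mathit{logit\_wr}$ on $\mathit{age}$
(see equation (C.2)), and let $y_i = \mathit{logit\_wr}_i$, $x_{1i} = \mathit{age}_i$ and
$x_{2i} = \mathit{exp}_i$. Model (C.1) can then be rewritten as follows:

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i \tag{C.3}$$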

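It can then be shown that:

$$E(\tilde{\beta}_1 \mid x_1, x_2) = \beta_1 + \beta_2\,\tilde{\delta}_1 \tag{C.4}$$

where $\tilde{\delta}_1$ is the estimate of $\delta_1$ from the following regression:

$$x_{2i} = \delta_0 + \delta_1 x_{1i} + \varsigma_i \tag{C.5}$$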
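
The step from (C.3) to (C.4) is not spelled out above. A sketch of the standard argument is given here, assuming the usual OLS slope formula for a simple regression and writing $\bar{x}_1$ for the sample mean of $x_1$ (notation introduced only for this illustration):

```latex
\begin{align*}
\tilde{\beta}_1
  &= \frac{\sum_i (x_{1i}-\bar{x}_1)\, y_i}{\sum_i (x_{1i}-\bar{x}_1)^2} \\
  &= \beta_1
     + \beta_2\, \frac{\sum_i (x_{1i}-\bar{x}_1)\, x_{2i}}{\sum_i (x_{1i}-\bar{x}_1)^2}
     + \frac{\sum_i (x_{1i}-\bar{x}_1)\, \varepsilon_i}{\sum_i (x_{1i}-\bar{x}_1)^2} \\
  &= \beta_1 + \beta_2\, \tilde{\delta}_1
     + \frac{\sum_i (x_{1i}-\bar{x}_1)\, \varepsilon_i}{\sum_i (x_{1i}-\bar{x}_1)^2}
\end{align*}
```

The second line substitutes (C.3) for $y_i$, and the ratio multiplying $\beta_2$ is the OLS slope from regression (C.5). Taking expectations conditional on $x_1$ and $x_2$, the last term vanishes because $E(\varepsilon_i \mid x_{1i}, x_{2i}) = 0$, which gives (C.4).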

Equation (C.4) implies that the omitted variable bias is equal to:

$$\mathrm{bias}(\tilde{\beta}_1 \mid x_1, x_2) = E(\tilde{\beta}_1 \mid x_1, x_2) - \beta_1 = \beta_2\,\tilde{\delta}_1 \tag{C.6}$$

From equation (C.6), there are two cases in which the estimator $\tilde{\beta}_1$ is unbiased:

• $\beta_2 = 0$: in this example, experience has no impact on the win-rate.
• $\tilde{\delta}_1 = 0$: this is equivalent to the variable $x_1$ (age) being uncorrelated with
the omitted variable $x_2$ (experience).

In the example stated it is unlikely that either of those conditions would hold. One would expect
that $\beta_2 > 0$ and $\tilde{\delta}_1 > 0$ (positive correlation between age and experience).

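According to equation (C.2) and the bias formula presented in (C.6), the estimator
$\tilde{\beta}_1$ is biased upwards because:

$$E(v_i \mid \mathit{age}_i) = E(\beta_2\,\mathit{exp}_i + \varepsilon_i \mid \mathit{age}_i) = \beta_2\,E(\mathit{exp}_i \mid \mathit{age}_i) > 0 \tag{C.7}$$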

This problem can be partly solved with the use of panel data. However, panel data cannot be used
here because there is a format change at the end of each season, as argued in the introduction
of this paper.
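
The bias formula can be illustrated with a small simulation. The sketch below is not based on the survey data: the sample size, coefficients and noise levels are made-up values chosen only for illustration, and the helper functions `simple_slope` and `two_regressor_slopes` are hypothetical names. It checks that the slope from the short regression equals the slope from the long regression plus the estimated coefficient on experience times the slope from the auxiliary regression of experience on age.

```python
import random


def simple_slope(x, y):
    """OLS slope from regressing y on a single regressor x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def two_regressor_slopes(x1, x2, y):
    """OLS slopes of y on x1 and x2 (with intercept), solving the
    2x2 normal equations on demeaned data."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 2000                # hypothetical sample size, not the paper's sample

# All numbers below are assumptions for illustration, not estimates from the paper.
age = [rng.uniform(15, 40) for _ in range(n)]
experience = [0.3 * a + rng.gauss(0, 2) for a in age]  # experience rises with age
logit_wr = [0.1 + 0.02 * a + 0.05 * e + rng.gauss(0, 0.3)
            for a, e in zip(age, experience)]

beta1_hat, beta2_hat = two_regressor_slopes(age, experience, logit_wr)  # model (C.1)
beta1_tilde = simple_slope(age, logit_wr)      # model (C.2), experience omitted
delta1_tilde = simple_slope(age, experience)   # auxiliary regression (C.5)

print("slope on age, experience included:", beta1_hat)
print("slope on age, experience omitted :", beta1_tilde)
print("estimated bias term              :", beta2_hat * delta1_tilde)
```

With positive coefficients on experience in both the outcome and the auxiliary relationship, the slope on age from the short regression comes out larger than the one from the long regression, which is the upward bias described above.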
Appendix D – Heteroscedasticity tests

This appendix presents the results of the heteroskedasticity tests performed on all the
specifications used in this paper. The test applied is the Breusch-Pagan / Cook-Weisberg test
for heteroskedasticity, with the null hypothesis of constant variance.

Table 5: Results of the heteroskedasticity tests performed on all specifications in this paper.

Specification             Breusch-Pagan / Cook-Weisberg     Conclusion of the test
                          test for heteroskedasticity       (null hypothesis: constant variance)
                          (prob > chi2)
Table 3: Regression (1)   0.1222                            Fail to reject the null hypothesis at the 5% level
Table 3: Regression (2)   0.5557                            Fail to reject the null hypothesis at the 5% level
Table 3: Regression (3)   0.3345                            Fail to reject the null hypothesis at the 5% level
Table 3: Regression (4)   0.4037                            Fail to reject the null hypothesis at the 5% level
Table 3: Regression (5)   0.5535                            Fail to reject the null hypothesis at the 5% level
Table 3: Regression (6)   0.3373                            Fail to reject the null hypothesis at the 5% level
Table 3: Regression (7)   0.6818                            Fail to reject the null hypothesis at the 5% level
Table 4: Regression (1)   0.3254                            Fail to reject the null hypothesis at the 5% level
Table 4: Regression (2)   0.5480                            Fail to reject the null hypothesis at the 5% level
Table 4: Regression (3)   0.3667                            Fail to reject the null hypothesis at the 5% level
Table 4: Regression (4)   0.3928                            Fail to reject the null hypothesis at the 5% level
Table 4: Regression (5)   0.4572                            Fail to reject the null hypothesis at the 5% level
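
The mechanics behind a test of this kind can be sketched in plain Python. This is a minimal sketch and not the Stata routine used for Table 5. It assumes a single regressor, uses simulated data, and tests against the fitted values with the score statistic (half the explained sum of squares of an auxiliary regression of the scaled squared residuals), compared to a chi-square distribution with 1 degree of freedom. Whether the paper's tests were run against fitted values is an assumption here, and the function names and data are hypothetical.

```python
import math
import random


def ols_simple(x, y):
    """Return (intercept, slope) of a least-squares fit of y on a single x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def breusch_pagan(x, y):
    """Score test of constant variance against variance depending on the
    fitted values. Returns (chi2 statistic, p-value), 1 degree of freedom."""
    n = len(x)
    a, b = ols_simple(x, y)
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    sigma2 = sum(e * e for e in resid) / n       # residual variance (divisor n)
    g = [e * e / sigma2 for e in resid]          # scaled squared residuals
    c, d = ols_simple(fitted, g)                 # auxiliary regression of g on fitted
    g_mean = sum(g) / n
    ess = sum((c + d * fi - g_mean) ** 2 for fi in fitted)
    stat = ess / 2.0
    p_value = math.erfc(math.sqrt(stat / 2.0))   # upper tail of chi-square(1)
    return stat, p_value


rng = random.Random(1)  # fixed seed so the run is reproducible
x = [rng.uniform(1, 10) for _ in range(500)]

# Simulated outcomes (assumed values): one with constant error variance,
# one where the error spread grows with x.
y_const = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]
y_het = [1 + 2 * xi + xi * rng.gauss(0, 1) for xi in x]

stat_const, p_const = breusch_pagan(x, y_const)
stat_het, p_het = breusch_pagan(x, y_het)
print("constant variance: chi2 =", stat_const, "p =", p_const)
print("growing variance : chi2 =", stat_het, "p =", p_het)
```

A p-value above 0.05, as in every row of Table 5, means the null hypothesis of constant variance is not rejected at the 5% level. In the simulated case where the error spread grows with the regressor, the p-value falls far below that threshold.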
