Sandro RADOVANOVIĆ
University of Belgrade,
Faculty of Organizational Sciences, Serbia
sandro.radovanovic@gmail.com
Milan RADOJIČIĆ
University of Belgrade,
Faculty of Organizational Sciences, Serbia
radojicic.milan@hotmail.com
Gordana SAVIĆ
University of Belgrade,
Faculty of Organizational Sciences, Serbia
gordana.savic@fon.bg.ac.rs
Received: / Accepted:
1. INTRODUCTION
Contemporary sports have a huge impact on the world economy, so more and
more attention is paid to the analysis of sports teams and athletes. It is therefore
necessary to determine their impact, not only on the field, but also on the economy and
society as a whole. In the field of sports analytics, clubs from the United States of
America (like the Boston Red Sox) and clubs from Europe (like AC Milan) are making
the biggest progress (Schumaker et al., 2010).
Sports analytics is considered primarily as statistical analysis (t-test, χ² test,
ANOVA, descriptive statistics etc.), analysis of efficiency and, more recently, sports data
mining. Usually, events on the field, like the number of shots on goal, the number of passes
in 90 minutes of a football game or the number of home runs in baseball, are analyzed in order
to improve team results and identify weaknesses of opponents. However, with the
growth in popularity and in the amount of funds invested in sport, sports analytics requires
more complex analysis. As stated in (Schumaker et al., 2010), the forerunner of data analysis
in sports is Anatoly Zelentsov, who made a computer application that tested mental
stability, durability, memory, reaction time and coordination at the football club Dynamo
Kiev in the mid-1970s. That application was used to test young players, in order to
determine whether they were able to play for the first team. The results were
surprisingly good, and Dynamo Kiev managed to win the UEFA Cup Winners' Cup in
1975 and 1986. Efficiency analysis began with the work of (Scully, 1974) on baseball and
(Zak et al., 1979) on basketball. After the success of these authors in quantifying the
relationship between sport-related inputs and sporting success, efficiency analysis
found application not only in basketball (Lee & Worthington, 2013; Hill & Jolly,
2012; Moreno & Lozano, 2014), but also in many other sports like football (Ribeiro & Lima,
2012; Fernandez et al., 2012), baseball (Jane, 2012; Regan, 2012) or chess (Jeremic &
Radojicic, 2010).
Inspired by these and other works, the purpose of this paper is to provide a
comprehensive assessment of the efficiency of National Basketball Association (NBA) players.
Fortunately, research in sports analytics and sports economics has recently embraced
statistical and mathematical methods for the assessment of sport efficiency. These new
methods are an important development, as the resulting theoretical and empirical relationships
are useful for management decisions on hiring, play positions, minutes,
play combinations and salaries. Further, many statistical algorithms, such as linear
regression and least median square regression, and machine learning algorithms, such as
neural networks and support vector machines, are used to predict the efficiency boundary.
This approach allows us to predict the relative efficiency of a player who was not evaluated
in the original DEA model. In other words, it allows prediction of the relative
efficiency boundary using machine learning algorithms.
The remainder of the paper is structured as follows. Section 2 explains the
methodology, and Section 3 presents the findings and analysis. Section 3 is divided into the
ranking of NBA players by DEA and the prediction and testing of the efficiency frontier.
Section 4 concludes the paper.
S. Radovanović, M. Radojičić, G. Savić / Two-phased DEA-MLA approach for
predicting efficiency of NBA players
2. METHODOLOGY
NBA players who play at the guard position and had notable results (in terms of
points, assists and other basketball measures) in the 2011/12 season are
considered for the efficiency analysis in this paper. In the first phase, the selected players
are considered as decision making units (DMUs) and a comparative analysis of their
efficiency is performed using data envelopment analysis (DEA). In the second phase, the DEA
results are used as a basis for predicting the efficiency of a new player via the boundary form
learned by a machine learning algorithm.
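\[
\begin{aligned}
\max\ h_k = {} & \sum_{r=1}^{s} \mu_r y_{rk} \\
\text{s.t.}\quad & \sum_{i=1}^{m} v_i x_{ik} = 1 \\
& \sum_{r=1}^{s} \mu_r y_{rj} - \sum_{i=1}^{m} v_i x_{ij} \le 0, \quad j = 1, \dots, n,\ j \ne k \\
& \mu_r \ge \varepsilon, \quad r = 1, \dots, s \\
& v_i \ge \varepsilon, \quad i = 1, \dots, m
\end{aligned}
\tag{1}
\]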
The optimal values of the efficiency scores h_k are obtained by solving the linear
model n times (once for each DMU, in order to compare it with the other DMUs). The efficiency
score h_k is greater than or equal to 1 for all efficient units and smaller than 1 for
inefficient units. In this way, a ranking of units according to their efficiency is enabled
(Ray, 2004).
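The repeated solution of the model, once per DMU, can be sketched in Python as follows. This is a minimal sketch and not the software procedure used in the paper: it assumes the input-oriented multiplier form with the evaluated unit k excluded from the constraints, solves it with SciPy's linprog, and omits any weight restrictions. The function name super_efficiency, the array layout, the value of eps and the toy data are assumptions made for illustration only.

```python
import numpy as np
from scipy.optimize import linprog


def super_efficiency(X, Y, k, eps=1e-6):
    """Super-efficiency score h_k of DMU k (illustrative sketch).

    X: (n, m) array of inputs, Y: (n, s) array of outputs.
    Decision variables are ordered as [mu_1..mu_s, v_1..v_m].
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m = X.shape
    s = Y.shape[1]

    # linprog minimises, so negate the objective sum_r mu_r * y_rk.
    c = np.concatenate([-Y[k], np.zeros(m)])

    # Normalisation constraint: sum_i v_i * x_ik = 1.
    A_eq = np.concatenate([np.zeros(s), X[k]]).reshape(1, -1)
    b_eq = [1.0]

    # For every other DMU j != k: sum_r mu_r*y_rj - sum_i v_i*x_ij <= 0.
    others = [j for j in range(n) if j != k]
    A_ub = np.hstack([Y[others], -X[others]])
    b_ub = np.zeros(len(others))

    # All weights are bounded below by a small positive eps.
    bounds = [(eps, None)] * (s + m)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return -res.fun


# Hypothetical toy data: three DMUs, one input, one output.
X_demo = [[1.0], [1.0], [1.0]]
Y_demo = [[4.0], [2.0], [1.0]]

# Solve the model once per DMU, then rank by score (highest first).
scores = [super_efficiency(X_demo, Y_demo, k) for k in range(3)]
ranking = sorted(range(3), key=lambda k: scores[k], reverse=True)
```

In the toy data only the first unit scores above 1 (about 2.0), so it is the only efficient unit; the other two score about 0.5 and 0.25. With real data, large-valued inputs such as salary may need rescaling so that the lower bound eps does not make the model infeasible.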
The NBA players database consists of eight indicators, of which two are
considered as input factors and six as output factors. The input indicators, for all players
used in the analysis, are gross salary and minutes on the court. The outputs used in the
analysis are the number of points, assists, rebounds, steals,
turnovers and blocked shots that a player made during the 2011/12 regular
season. All data can be found at (National Basketball Association, 2013; ESPN, 2013).
Based on the correlation matrix shown in Table 1, the property of isotonicity is satisfied.
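Isotonicity requires that an increase in an input does not lead to a decrease in an output, and a common way to check it is to verify that no input-output pair is negatively correlated. The following minimal sketch illustrates such a check with Pearson correlations. The function names and the data values are hypothetical and are not taken from Table 1.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def isotonicity_violations(inputs, outputs):
    """Return (input, output, r) for every negatively correlated pair."""
    violations = []
    for in_name, in_values in inputs.items():
        for out_name, out_values in outputs.items():
            r = pearson(in_values, out_values)
            if r < 0:
                violations.append((in_name, out_name, r))
    return violations


# Hypothetical values for four players (not the data used in the paper).
demo_inputs = {"minutes": [2100.0, 2400.0, 1800.0, 2600.0]}
demo_outputs = {"points": [1100.0, 1500.0, 900.0, 1700.0]}
demo_violations = isotonicity_violations(demo_inputs, demo_outputs)
```

An empty list of violations means that every input is non-negatively correlated with every output in the sample.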
\[
0.4 \le \frac{\mu_{\mathrm{Steals}}}{\mu_{\mathrm{Turnovers}}} \le 2 \tag{5}
\]
The modified super-efficiency DEA model (1)-(5) is used for the efficiency evaluation
of the NBA players, which are considered as DMUs.
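Assuming that restriction (5) bounds the ratio of the steals weight to the turnovers weight between 0.4 and 2 (the symbol \mu for output weights is an assumption here), it can be added to a linear model as two linear inequalities, because the weights are strictly positive:

```latex
\[
0.4 \le \frac{\mu_{\mathrm{Steals}}}{\mu_{\mathrm{Turnovers}}} \le 2
\quad\Longleftrightarrow\quad
\begin{cases}
0.4\,\mu_{\mathrm{Turnovers}} - \mu_{\mathrm{Steals}} \le 0 \\
\mu_{\mathrm{Steals}} - 2\,\mu_{\mathrm{Turnovers}} \le 0
\end{cases}
\]
```

Multiplying through by the positive denominator keeps the direction of both inequalities, so the restricted model remains a linear program.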
Phase II Predictive analytics
In order to obtain the DEA results, the software EMS 1.3 (Scheel, 2000) is used for
academic purposes. For the analysis we have chosen 26 players, and the data are taken from
the official statistics of the National Basketball Association (National Basketball
Association, 2013; ESPN, 2013).
Based on the results of the efficiency analysis, ten out of 24 players are efficient
(Table 2). The player with the highest efficiency score is John Wall, with a score of 115.30%.
In second place is Russell Westbrook with 114.26%. O.J. Mayo, James Harden and Mo
Williams are close to the efficiency frontier, while Jason Terry, Kobe Bryant and Joe
Johnson are extremely inefficient.
With this equation for the calculation of player efficiency, the decision maker can see
which attributes have more influence and which attributes do not influence efficiency at all.
Since linear regression first applied the M5 Prime feature selection technique in order to
eliminate highly collinear attributes, simple efficiency is not selected as an important
attribute.
As Figure 2 shows, the support vector machine does not give as good results
as linear regression (there is a bigger difference between actual and predicted scores).
Like linear regression, support vector machines give weights from which attribute
importance can be obtained (Table 4).
Attribute            Weight
Salary               -0.034
Minutes              -0.098
Points                0.073
Assists               0.041
Rebounds              0.027
Steals                0.019
Blocks                0.057
1/Turnovers           0.067
Stat Per Dollar      -0.021
Stat Per Minute      -0.011
Simple Efficiency     0.086
As the mean absolute error shows, neural networks define the best-fitting
efficiency frontier on the whole dataset, which is clear from Figure 3.
In order to further evaluate the results, we repeated the analyses by splitting the data set:
70% was used as training data and the rest (30%) was used for testing. It is worth
noting that a local random seed with value 1992 was used in order to get repeatable results.
Validation results are shown in Table 5.
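The validation protocol described above (a seeded 70/30 split followed by a mean absolute error on the held-out part) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and Python's random number generator will not reproduce the exact split produced by the authors' tool, even with the same seed value of 1992.

```python
import random


def split_train_test(rows, train_share=0.7, seed=1992):
    """Shuffle a copy of rows with a fixed seed and split it into two parts."""
    rng = random.Random(seed)      # local generator, so the split is repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(train_share * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted scores."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# 26 placeholder player indices stand in for the evaluated DMUs.
player_ids = list(range(26))
train_ids, test_ids = split_train_test(player_ids)
```

With 26 players this yields 18 training and 8 testing units. A model would then be fitted on the DEA scores of the training units, and mean_absolute_error would be computed between its predictions and the DEA scores of the testing units.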
The best results on the smaller dataset are obtained by linear regression
(the mean absolute error is 4.4%). This means that this method shows the highest level of
robustness. Finally, we can conclude that DEA results can be used as a basis for
learning the frontier and predicting efficiency using machine learning algorithms
such as linear regression, neural networks and support vector machines for regression.
4. CONCLUSION
This paper, in the first phase, employs DEA to evaluate the efficiency of NBA
players in the 2011/2012 regular season. In the second phase, machine learning techniques
are used to predict the efficiency frontier. Using the data available from NBA.com and an
available source of NBA salaries, we had an opportunity to examine efficiency in a
non-traditional way. We were able to perform the ranking with the Andersen-Petersen model.
In that way we included salary in the calculation of efficiency, which is very important for
making decisions on, among other things, hiring, play positions and salaries.
Afterwards, we used machine learning algorithms such as linear regression,
support vector machines and neural networks to predict the efficiency of a new DMU
(player). In this way, we tried to overcome a weakness of DEA. Namely, DEA is good
at estimating the relative efficiency of a DMU, but in order to evaluate the efficiency of a
new DMU, we need to develop and solve a new DEA model. In this paper, we have shown on
the example of 26 NBA players that DEA efficiency indexes can be used for learning
models by various machine learning algorithms. Results obtained by the neural network
are very reliable, with an expected mean absolute error of around 0.7%. We have also
shown that the linear regression algorithm fits better on the smaller data set:
its expected absolute error on the 30% of the dataset used for testing was the lowest (4.4%).
This paper uses machine learning (regression) algorithms as a suitable method for efficiency
frontier prediction.
As a part of future work, we plan to perform other types of relative efficiency
evaluation, such as distance based analysis (DBA). Further, the efficiency boundary for this
efficiency measure can, as in this paper, be predicted using machine learning algorithms.
We also plan to compare the results obtained with DEA and DBA. One way to improve this
is to consider player efficiency over time.
Additionally, we want to evaluate efficiency at the team level, as in
(Aizemberg et al., 2014), where the cross-efficiency of teams is evaluated over several
seasons. By adding machine learning algorithms, we can predict the efficiency of a team or
player in the following years.
REFERENCES
Aizemberg, L., Roboredo, M. C., Ramos, T. G., de Mello, J. C. C. S., Meza, L. A., and Alves, A. M.,
Measuring the NBA Teams Cross-Efficiency by DEA Game, American Journal of
Operations Research, 4(3), 2014, 101, DOI=http://dx.doi.org/10.4236/ajor.2014.43010.
Andersen, P., and Petersen, N. C., A Procedure for Ranking Efficient Units in Data Envelopment
Analysis, Management Science, 39(10), 1993, 1261-1264,
DOI=http://dx.doi.org/10.1287/mnsc.39.10.1261.
Charnes, A., Cooper, W. W., and Rhodes, E., Measuring the efficiency of decision making units,
European Journal of Operational Research, 2(6), 1978, 429-444,
DOI=http://dx.doi.org/10.1016/0377-2217(78)90138-8.
Drucker, H., Burges, C. J., Kaufman, L., Smola, A., and Vapnik, V., Support vector regression
machines, Advances in neural information processing systems, 9, 1997, 155-161.
ESPN, ESPN, 2013, Retrieved from http://espn.go.com/nba/salaries.
Fernandez, R. C., Gomez Nunez, T., and Garrido, R. S., Analysis of the efficiency of Spanish
Soccer League Players (2009/10) using the metafrontier approach, Estudios de economía
aplicada, 30(2), 2012, 565-578.
Hill, J., and Jolly, N., Salary Distribution and Collective Bargaining Agreements: A Case Study of
the NBA, Industrial Relations: A Journal of Economy and Society, 51(2), 2012, 342-363,
DOI=http://dx.doi.org/10.1111/j.1468-232X.2012.00680.x.
Hong K.H., Ha S.H., Shin C.K., and Park S.C., Evaluating the efficiency of system integration
projects using data envelopment analyses (DEA) and machine learning, Expert Systems with
Applications, 16, 1999, 283-296, DOI=http://dx.doi.org/10.1016/S0957-4174(98)00077-3.
Jane, W.-J., Overpayment and Reservation Salary in the Nippon Professional Baseball League: A
Stochastic Frontier Analysis, Journal of Sports Economics, 2012, 579-598,
DOI=http://dx.doi.org/10.1177/1527002511433857.
Jayaraman, A. R., Srinivasan, M. R., and Jeremic, V., Empirical Analysis of Banks in India
using DBA and DEA, Management, 69, 2013,
DOI=http://dx.doi.org/10.7595/management.fon.2013.0029.
Jeremic, V. and Radojicic, Z., A new approach in the evaluation of team chess championships
rankings, Journal of Quantitative Analysis in Sports, 6(3), 2010,
DOI=http://dx.doi.org/10.2202/1559-0410.1257.
Lee, B. L. and Worthington, A. C., A note on the 'Linsanity' of measuring the relative efficiency
of National Basketball Association guards, Applied Economics, 45(29), 2013, 4193-4202,
DOI=http://dx.doi.org/10.1080/00036846.2013.770125.
Lovell, C., and Rouse, A., Equivalent standard DEA models to provide super-efficiency
scores, Journal of the Operational Research Society, 54, 2003, 101-108,
DOI=http://dx.doi.org/10.1057/palgrave.jors.2601483.
Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M., and Euler, T., Yale: Rapid prototyping for
complex data mining tasks, In Proceedings of the 12th ACM SIGKDD international
conference on Knowledge discovery and data mining, 2006, 935-940, ACM.
Mitchell T., Machine Learning, McGraw Hill, 1997.
Moreno, P., and Lozano, S., A network DEA assessment of team efficiency in the NBA, Annals
of Operations Research, 214(1), 2014, 99-124,
DOI=http://dx.doi.org/10.1007/s10479-012-1074-9.
National Basketball Association, National Basketball Association, 2013, Retrieved from
http://www.nba.com/.
Radovanović, S., Radojičić, M., Jeremić, V., and Savić, G., A Novel Approach in Evaluating
Efficiency of Basketball Players, Management, 67, 2013, 37-45,
DOI=http://dx.doi.org/10.7595/management.fon.2013.0012.
Ray, S., Data Envelopment Analysis: Theory and Techniques for Economics and Operations
Research. Cambridge, United Kingdom: Press Syndicate of the University of Cambridge,
2004.
Regan, C. S., The price of efficiency: examining the effects of payroll efficiency on Major League
Baseball attendance, Applied Economics Letters, 19(11), 2012, 1007-1015,
DOI=http://dx.doi.org/10.1080/13504851.2011.610735.
Ribeiro, A., and Lima, F., Portuguese football league efficiency and players' wages, Applied
Economics Letters, 19(6), 2012, 599-602.
DOI=http://dx.doi.org/10.1080/13504851.2011.591719.
Savic, G., Makajic-Nikolic, D., and Suknovic, M., AHP-DEA Measure for study program
selection, SymOrg 2012 (pp. 1217-1223). Zlatibor: University of Belgrade, Faculty of
Organizational Sciences.
Savic, G., Radosavljevic, M., and Ilijevski, D., DEA Window Analysis Approach for Measuring
the Efficiency of Serbian Banks Based on Panel Data. Management 65, 2013, 5-14,
DOI=http://dx.doi.org/10.7595/management.fon.2012.0028.
Scheel, H., EMS: Efficiency Measurement System User's Manual, Operations Research und
Wirtschaftsinformatik, University of Dortmund, Germany, 2000.
Schumaker, R., Solieman, O., and Chen, H., Sports Data Mining, New York: Springer, 2010,
DOI=http://dx.doi.org/10.1007/978-1-4419-6730-5.
Scully, G. W., Pay and performance in Major League Baseball, The American Economic Review,
64(6), 1974, 915-930.
Sueyoshi, T., and Mika, G., Returns to scale vs. damages to scale in data envelopment analysis:
An impact of U.S. clean air act on coal-fired power plants. Omega, 41(2), 2013, 164-175,
DOI=http://dx.doi.org/10.1016/j.omega.2010.04.005.
Tsang, S.-S., and Chen, Y.-F., Facilitating Benchmarking with Strategic Grouping and Data
Envelopment Analysis: The Case of International Tourist Hotels in Taiwan, Asia Pacific
Journal of Tourism Research, 18(5), 2012, 1-16,
DOI=http://dx.doi.org/10.1080/10941665.2012.695283.
Yeh C.C., Chi D.J., and Hsu M.F., A hybrid approach of DEA, rough set and support vector
machine for business failure prediction, Expert Systems with Applications, 37, 2010, 1535-
1541, DOI=http://dx.doi.org/10.1016/j.eswa.2009.06.088.
Zak, T., Huang, C., and Siegfried, J., Production Efficiency: The Case of Professional Basketball,
The Journal of Business, 52(3), 1979, 379-392.
Zhang W, Tang SY, Zhu YF, and Wang WP, Comparative Studies of Support Vector Regression
between Reproducing Kernel and Gaussian Kernel, World Academy of Science, Engineering
and Technology, 65, 2010, 933-941.