
STATISTICAL INFERENCES AND REGRESSION ANALYSIS IN CRICKET

SUBMITTED BY

GAGANDEEP SINGH - 12PGP015
MANOJ H - 12PGP026
NIKESH AGARWAL - 12PGP030
SOURAV MONDAL - 12PGP042
VIJAYKRISHNAN G - 12PGP016

ABSTRACT
Cricket is a sport that makes extensive use of statistical tools for the representation and analysis of data. In this project, we set out to find how the impact of the toss on match results differs between day and day-night matches. For this inference, we used two-population hypothesis testing to compare the means of the day and day-night populations. The findings showed very little difference in the impact of the toss on the result between day and day-night matches. We also estimated, with ninety percent confidence, the likely interval for the runs scored by the Indian team while chasing against Pakistan, using single-population estimation. The population for this estimate comprised all the matches in which India faced Pakistan and batted second. In addition, we studied the compensation of IPL players and tried to establish the relationship between a player's skill, as captured by his statistical attributes, and the compensation he is paid, using simple and multiple linear regression analysis.

GAGANDEEP SINGH - 12PGP015 (pgp12015.gagandeep@iimraipur.ac.in)

VIJAY KRISHNAN G - 12PGP016 (pgp12016.vijay@iimraipur.ac.in)

MANOJ H - 12PGP026 (pgp12026.manoj@iimraipur.ac.in)

NIKESH AGARWAL - 12PGP030 (pgp12030.nikesh@iimraipur.ac.in)

SOURAV MONDAL - 12PGP042 (pgp12042.sourav@iimraipur.ac.in)

ACKNOWLEDGEMENT
We would like to sincerely thank Prof. Naval Bajpai, Indian Institute of Management Raipur for his valuable guidance in this project right from the conception till the completion of the same. We would also like to thank our beloved Prof. B.S. Sahay, Director of Indian Institute of Management Raipur, for rendering his support during the entire project period. We also thank all the anonymous referees for their valuable comments on the report. Last but not the least; we thank our classmates for their encouragement and support.


TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES

CHAPTER 1 INTRODUCTION
1.1 CRICKET
1.2 STATISTICS IN CRICKET
  1.2.1 INDIVIDUAL STATISTICS
  1.2.2 TEAM STATISTICS
1.3 APPLICATION OF TOOLS
  1.3.1 PIE CHART
  1.3.2 WAGON-WHEEL
  1.3.3 WORM GRAPH
  1.3.4 MANHATTAN CHART
1.4 OBJECTIVE OF THE PROJECT
1.5 STATISTICAL TOOLS EMPLOYED
  1.5.1 CHARTS AND GRAPHS
  1.5.2 SINGLE POPULATION ESTIMATION
  1.5.3 HYPOTHESIS TESTING FOR TWO POPULATION
  1.5.4 SIMPLE LINEAR REGRESSION
  1.5.5 MULTIPLE LINEAR REGRESSION

CHAPTER 2 LITERATURE REVIEW

CHAPTER 3 RESEARCH METHODOLOGY
3.1 WINNING PERCENTAGE USING PIE CHART
  3.1.1 OBJECTIVE
  3.1.2 POPULATION
  3.1.3 PIE CHART
  3.1.4 INFERENCES
3.2 CAPTAINCY RECORD CALCULATION USING BAR CHART
  3.2.1 OBJECTIVE
  3.2.2 POPULATION
  3.2.3 INFERENCES
3.3 ACHIEVABLE SCORE AT THE END OF 50 OVERS
  3.3.1 POPULATION AND SAMPLING
  3.3.2 TECHNIQUE EMPLOYED
3.4 DIFFERENCE IN IMPACT OF TOSS BETWEEN DAY AND DAY-NIGHT MATCHES
  3.4.1 POPULATION AND SAMPLING
  3.4.2 TECHNIQUE EMPLOYED
3.5 VALUATION OF PLAYERS IN IPL
  3.5.1 REGRESSION

CHAPTER 4 STATISTICAL ANALYSIS AND INTERPRETATION
4.1 ESTIMATION OF SINGLE POPULATION
  4.1.1 SET NULL AND ALTERNATE HYPOTHESIS
  4.1.2 DETERMINE APPROPRIATE STATISTICAL TEST
  4.1.3 LEVEL OF SIGNIFICANCE
  4.1.4 SET THE DECISION RULE
  4.1.5 COLLECTION OF DATA
  4.1.6 ANALYZE THE DATA
  4.1.7 STATISTICAL CONCLUSION AND BUSINESS IMPLICATION
4.2 HYPOTHESIS TESTING FOR TWO POPULATION
  4.2.1 SET NULL AND ALTERNATE HYPOTHESIS
  4.2.2 DETERMINE APPROPRIATE STATISTICAL TEST
  4.2.3 LEVEL OF SIGNIFICANCE
  4.2.4 SET THE DECISION RULE
  4.2.5 COLLECTION OF DATA
  4.2.6 ANALYZE THE DATA
  4.2.7 STATISTICAL CONCLUSION AND BUSINESS IMPLICATION
4.3 REGRESSION ANALYSIS OF IPL VALUATION OF PLAYERS
4.4 REGRESSION ANALYSIS
  4.4.1 AMOUNT VERSUS STRIKE RATE
  4.4.2 ANALYSIS OF VARIANCE
  4.4.3 AMOUNT VERSUS WICKETS, STRIKE RATE
  4.4.4 ANALYSIS OF VARIANCE
4.5 DESCRIPTION OF STATISTICS OF BATSMAN
  4.5.1 AMOUNT (IN US DOLLARS) VERSUS RUNS
  4.5.2 ANALYSIS OF VARIANCE
  4.5.3 AMOUNT (IN US DOLLARS) VERSUS RUNS, AVERAGE
  4.5.4 ANALYSIS OF VARIANCE

CHAPTER 5 DISCUSSIONS
5.1 BOWLERS
5.2 BATSMEN
  5.2.1 REASONS FOR NON-EXPLANATION

CHAPTER 6 CONCLUSION
6.1 LIMITATIONS
6.2 FUTURE SCOPE

REFERENCES

LIST OF FIGURES

FIGURE 3.1 PIE CHART FOR WINNING PERCENTAGE
FIGURE 4.1 RESIDUAL PLOTS FOR BOWLERS
FIGURE 4.2 RESIDUAL PLOTS FOR AMOUNT

LIST OF TABLES

TABLE 3.1 POPULATION DATA
TABLE 3.2 INDIA'S WINNING RECORD UNDER MS DHONI
TABLE 3.3 MS DHONI'S CAPTAINCY RECORD
TABLE 4.1 DISTRIBUTION PLOT
TABLE 4.2 DESCRIPTION OF VARIABLES
TABLE 4.3 DESCRIPTION OF STATISTICS OF BOWLERS
TABLE 4.4 BATSMAN STATISTICS

CHAPTER 1

INTRODUCTION

1.1 CRICKET

The game of cricket has fascinated many statisticians because of the sheer amount and variety of statistics it generates. Individual statistics are recorded for each player during a match and aggregated over a career for batting and bowling across formats. Team statistics are recorded and maintained separately for each team in the different formats of cricket: Test matches, One Day Internationals, Twenty20s, First-Class matches and List-A matches. Test matches are the international variant of First-Class matches, and hence the corresponding statistics are included in the First-Class statistics of an individual or team. Similarly, One Day Internationals are a variant of List-A matches, and hence the corresponding statistics are included in the List-A statistics of an individual or team.

1.2 STATISTICS IN CRICKET

The applications of statistics in cricket are very diverse, ranging from analysis of a team's or player's performance in a particular match or over a period of time, to a comprehensive study of the evolution of various aspects of the game. For example, with the help of the game's statistics, one can predict the impact of a particular player on the outcome, which, taken over a period of time, serves as a performance indicator for that player. Based on the analysis of general statistics across the different formats of cricket, venue-based and team-based statistics can be derived, and an in-depth analysis of these tends to reveal a lot about how the game has evolved over the years.

1.2.1 INDIVIDUAL STATISTICS
They are generally calculated for each individual player, either for a certain set of matches or aggregated over his career.
o Matches Played
o Runs Scored
o Highest Score
o Batting/Bowling Averages
o Centuries, Strike Rate
o Maiden Overs
o Economy Rate
o Best Bowling
o Wickets
o Partnerships
o Catches & Stumpings
o Captaincy Statistics

1.2.2 TEAM STATISTICS
They are generally calculated for the whole team taken together, taking all the individual players' statistics into account.
o Match Results
o Result Margins
o Series Results
o Innings Totals
o Match Scores
o Run Rate
o Extras etc.

1.3 APPLICATION OF TOOLS

Of late, the impact of television coverage on the sport has been profound, and it has provided a huge impetus to develop interesting forms of statistical representation for viewers. The television networks have thus pioneered several new ways of presenting cricket statistics. Some of the most widely used forms of statistical representation include:

1.3.1 PIE CHART
The pie chart is one of the most widely used methods of representing cricket statistics. It is a circular chart subdivided into sectors, where the size of each sector depends on the proportion of the total quantity it represents. For example, the extras can be presented as a pie chart with the different sectors representing leg-byes, no-balls, wides, etc.

1.3.2 WAGON-WHEEL
It displays a 2D or 3D plot of the various shots played or runs scored by a player or team on an overhead view of the cricket field.

1.3.3 WORM GRAPH
This is used to represent the runs scored and wickets taken during an innings, plotted against the time or balls bowled during a match.

1.3.4 MANHATTAN CHART
This is used to represent the runs scored and wickets taken in each over during a match. It is a variant of the bar graph/histogram, and it is named the Manhattan Chart because of its similarity to the Manhattan skyline.

The purpose of tools like the ones mentioned above is to help the viewer clearly understand the impact of statistics on the game of cricket. Cricket pundits then devise methods to analyze the statistics and use statistical inference to arrive at estimations and predictions about the game.

1.4 OBJECTIVE OF THE PROJECT

The main objective of this project is to illustrate the application of statistical inference and regression analysis in cricket. The case considered is an India-Pakistan cricket match. For the pre-match analysis, all the One Day Internationals between India and Pakistan that have ended in a result so far are taken into account; the results are represented using a pie chart, and the proportion of results in each team's favor is interpreted. Since the data represented in the pie chart comes from matches spread across a long period of time, another statistic is also considered. The wins, losses and other results achieved by Team India under the leadership of MS Dhoni are represented using a bar chart, which brings out the extremely high win-loss ratio of MS Dhoni and suggests that Pakistan's head-to-head advantage would not have a significant say in the outcome of the game.

The prediction of the outcome of the game is done in two stages:
a) In the pre-match analysis, two-population hypothesis testing is used to determine whether there is a difference in the impact of the toss between day and day-night matches.
b) During the innings break, an achievable target score range for India is estimated with a confidence level of ninety percent.

Then, a regression analysis is carried out to determine whether the pricing of the players in the IPL auction is explained fully by the various statistics of the individual players, or whether the pricing is influenced by other factors as well.

1.5 STATISTICAL TOOLS EMPLOYED

1.5.1 CHARTS AND GRAPHS
A chart is a graphical representation of data, in which the data is represented by symbols, such as bars in a bar chart, lines in a line chart, or slices in a pie chart. A chart can represent tabular numeric data, functions or some kinds of qualitative structures. Charts are often used to ease understanding of large quantities of data and the relationships between parts of the data. Charts can usually be read more quickly than the raw data that they are produced from.

1.5.2 SINGLE POPULATION ESTIMATION
The Z statistic can be used in the calculation of prediction intervals. A prediction interval, consisting of a lower endpoint and an upper endpoint, is an interval such that a future observation X will lie in the interval with high probability.

1.5.3 HYPOTHESIS TESTING FOR TWO POPULATION
A statistical hypothesis test is a method of making decisions using data, whether from a controlled experiment or an observational study. In statistics, a result is called statistically significant if it is unlikely to have occurred by chance alone, according to a pre-determined threshold probability, the significance level.

1.5.4 SIMPLE LINEAR REGRESSION
In statistics, simple linear regression is the least squares estimator of a linear regression model with a single explanatory variable. In other words, simple linear regression fits a straight line through the set of n points in such a way that the sum of squared residuals of the model is as small as possible.

1.5.5 MULTIPLE LINEAR REGRESSION
Multiple linear regression extends this to more than one explanatory variable, with the coefficients again estimated by least squares.
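As a brief illustration of the least-squares idea described above, the simple linear regression fit can be written in closed form. The notation used here ($x_i$, $y_i$, $b_0$, $b_1$, $e_i$) is an assumption introduced for illustration and is not taken from the report.

```latex
\begin{aligned}
\hat{y}_i &= b_0 + b_1 x_i, \qquad e_i = y_i - \hat{y}_i \\
(b_0, b_1) &= \arg\min_{b_0,\, b_1} \sum_{i=1}^{n} e_i^{2} \\
b_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^{2}},
\qquad b_0 = \bar{y} - b_1 \bar{x}
\end{aligned}
```

Multiple linear regression minimizes the same sum of squared residuals, with $\hat{y}_i = b_0 + b_1 x_{i1} + \dots + b_k x_{ik}$ for $k$ explanatory variables.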

CHAPTER 2

LITERATURE REVIEW

Estenson et al. (1994) and Bennett and Flueck (1983) studied player compensation in baseball. The auction results showed that salaries matched marginal revenue products and that the open auction exhibited the declining price anomaly found in real-world auctions. Similarly, Dobson and Goddard (1998) and Kahn (1992) considered the compensation of players in football. Jones and Walsh (1988) made similar studies in ice hockey and concluded that skills are the principal determinant of salaries at all positions. Berri (1999) addresses the question of measuring the productivity of an individual participating in a team sport, using a model that links a player's statistics in the National Basketball Association (NBA) to team wins. An economic model is employed to measure each player's marginal product. Such a study is useful in answering the question offered in its title, as well as a broader list of questions posed by both industry insiders and other interested observers.

There are a few such studies on the game of cricket. Barr and Kantor (2004) set out to determine the important skill set for a batsman in one-day cricket. The batting average has traditionally been used to assess the worth of a batsman. However, in the one-day game, limits on the number of balls bowled have introduced a very important additional dimension to performance. Assessing batting performance in the one-day game requires at least a two-dimensional measurement approach because of the time dimension imposed on limited-overs cricket. They used a new graphical representation with strike rate on one axis and the probability of getting out on the other, akin to the risk-return framework used in portfolio analysis, to obtain useful, direct and comparative insights into batting performance, particularly in the context of the one-day game. However, we have not come across any study that links compensation to player attributes.

Rosen (1974) based his model of product differentiation on the hypothesis that goods are valued for their utility-generating attributes. According to him, while making a purchase decision, consumers evaluate product quality attributes and pay the sum of implicit prices for each quality attribute, which is reflected in the observed market price. Hence, the price of a product is the summation of the prices of all its quality attributes. Shapiro (1983) presented a theoretical framework to examine the halo effect on prices. Developing an equilibrium price-quality schedule for high-quality products, under the assumption of competitive markets and imperfect information, he showed that reputation facilitates a price premium; hence, reputation building can be considered an investment good. Weemaes and Riethmuller (2001) studied the role of quality attributes in preferences for fruit juices. The study involved market valuation of various attributes of fruit juice. It did not consider consumers' preferences, but generated quality attributes from the product label. The study revealed that consumers paid a premium for nutrition, convenience, and information. In a similar study on tea, Deodhar and Intodia (2004) showed that colour and aroma were the two important attributes of a prepared tea.

Extending the analogy to cricket, a cricket player is valued for his on-the-field (and perhaps off-the-field) performance. We propose that a cricket player sells his cricketing skills for the IPL tournament. The franchisee team owners bid for the players' services because they would like to maximize their utility, and player performance is an important argument of their utility function. In equilibrium, the final bid price of a player must be a function of the valuation of the winning attributes of that player.

CHAPTER 3

RESEARCH METHODOLOGY

3.1 WINNING PERCENTAGE USING PIE CHART

3.1.1 OBJECTIVE
To give a clear representation of the results of the ODI matches played between India and Pakistan so far. All the matches played to date are considered; excluding the four matches that ended in no result, this gives 117 matches.

3.1.2 POPULATION
Table 3.1 Population Data

Total Matches    Won by India    Won by Pakistan
117              48              69

3.1.3 PIE CHART
A pie chart is well suited to representing the above data.

[Pie chart. Total Matches: 117; India 41%, Pakistan 59%]

Figure 3.1 Pie Chart for Winning Percentage

3.1.4 INFERENCES
The pie chart shows that, of the 117 matches, Pakistan won 59% and India won 41%.
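The percentages in Figure 3.1 follow directly from the counts in Table 3.1. A minimal sketch of that arithmetic is given below; the variable names are illustrative and not part of the report.

```python
# Counts from Table 3.1 (matches that ended in a result).
results = {"India": 48, "Pakistan": 69}
total = sum(results.values())

# Share of decided matches won by each side, rounded to whole percent.
shares = {team: round(100 * wins / total) for team, wins in results.items()}

for team, pct in shares.items():
    print(f"{team}: {pct}% of {total} matches")
```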

3.2 CAPTAINCY RECORD CALCULATION USING BAR CHART

3.2.1 OBJECTIVE
The objective is to present the captaincy record of MS Dhoni in the most suitable way. The matches won or lost by India under the captaincy of Mahendra Singh Dhoni are taken as the population, and a chart is made for the same.

3.2.2 POPULATION
Table 3.2 India's Winning Record under MS Dhoni

Total Matches    Won    Lost    Tied    No result
117              80     32      2       3

A bar chart is well suited to representing the above data.

[Bar chart. Won: 80, Lost: 32, Tied: 2, No Result: 3]

Table 3.3 MS Dhoni's Captaincy Record

3.2.3 INFERENCES
From the above bar chart we can see that, under the captaincy of Mahendra Singh Dhoni, India played a total of 117 matches, of which India won 80, lost 32 and tied 2. Three of the matches ended in no result.

3.3 ACHIEVABLE SCORE AT THE END OF 50 OVERS

In cricket, the team batting second wins when it exceeds the score set by the team batting first. We therefore set out to estimate the runs that the Indian team could score while chasing a target against Pakistan.

3.3.1 POPULATION AND SAMPLING
The scores of the Indian team while chasing against Pakistan were taken as the population.

3.3.2 TECHNIQUE EMPLOYED
Estimation of a single population mean was applied to get the intended result. We were able to estimate the mean with a confidence level of 90%.

3.4 DIFFERENCE IN IMPACT OF TOSS BETWEEN DAY AND DAY-NIGHT MATCHES

We set out to study how the impact of the toss differs between day and day-night matches played between India and Pakistan.

3.4.1 POPULATION AND SAMPLING
We used the data of matches played between India and Pakistan as the population. From the population, we applied random sampling and arrived at a sample of 38 for each of the day and day-night populations.

3.4.2 TECHNIQUE EMPLOYED
Hypothesis testing for two populations was applied to study the difference between the two population means.

3.5 VALUATION OF PLAYERS IN IPL

Next, our objective was to find whether the valuation of players in the IPL matches their skills, or whether they are over- or undervalued for their skills.

3.5.1 REGRESSION
We developed a regression model to find the relationship between players' compensation and their skills. We chose a sample consisting of 8 batsmen and 8 bowlers and developed the regression.

CHAPTER 4

STATISTICAL ANALYSIS AND INTERPRETATION

4.1 ESTIMATION OF SINGLE POPULATION

4.1.1 SET NULL AND ALTERNATE HYPOTHESIS
In this step, we try to predict whether India will be able to successfully chase a total of 245 runs in 50 overs. From the data, we estimate the single population mean with an assumed standard deviation of 53.

4.1.2 DETERMINE APPROPRIATE STATISTICAL TEST
As the sample size (64) is greater than 30, we use the z-test for a single population mean. We calculate the interval estimate using the corresponding z formula for a population mean.

4.1.3 LEVEL OF SIGNIFICANCE
Alpha = 0.10

4.1.4 SET THE DECISION RULE
For alpha = 0.10, the two-tailed critical value of z from the z distribution table is ±1.645. The null hypothesis will be rejected if the computed value of z lies outside ±1.645.

4.1.5 COLLECTION OF DATA
Sample size (Runs): 64
Standard Deviation: 52.71
Mean of Sample: 243.68

4.1.6 ANALYZE THE DATA

Z        P        90% CI              SE Mean
-0.95    0.340    (232.78, 254.58)    6.63

4.1.7 STATISTICAL CONCLUSION AND BUSINESS IMPLICATION
With 90% confidence, we can say that a total of 245 in 50 overs is achievable for India while chasing, because 245 lies within the estimated range of (232.78, 254.58).
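The 90% interval reported in 4.1.6 can be reproduced from the sample mean, the assumed standard deviation of 53 and the sample size. The sketch below assumes the standard large-sample interval, mean ± z × σ/√n; the variable names are illustrative, and the report does not state which software produced its figures.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from sections 4.1.1 and 4.1.5.
n = 64               # number of chasing innings in the sample
sample_mean = 243.68
sigma = 53.0         # assumed population standard deviation (section 4.1.1)
confidence = 0.90

# Two-tailed critical value: about 1.645 for 90% confidence.
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

se_mean = sigma / sqrt(n)                  # standard error of the mean
lower = sample_mean - z_crit * se_mean
upper = sample_mean + z_crit * se_mean

print(f"SE Mean = {se_mean:.3f}")
print(f"90% CI  = ({lower:.2f}, {upper:.2f})")
print("Target of 245 inside interval:", lower < 245 < upper)
```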

Table 4.1 Distribution Plot

4.2 HYPOTHESIS TESTING FOR TWO POPULATION

4.2.1 SET NULL AND ALTERNATE HYPOTHESIS
Null Hypothesis: $H_0: \mu_1 - \mu_2 = 0$ (no significant difference in runs scored)
Alternate Hypothesis: $H_1: \mu_1 - \mu_2 \neq 0$

4.2.2 DETERMINE APPROPRIATE STATISTICAL TEST
As both samples are larger than 30 and independent, and the population variances are unknown, we use the z-test for the difference between two population means.

4.2.3 LEVEL OF SIGNIFICANCE
Alpha = 0.10

4.2.4 SET THE DECISION RULE
For alpha = 0.10, the two-tailed critical value of z from the z distribution table is ±1.645. The null hypothesis will be rejected if the computed value of z lies outside ±1.645.

4.2.5 COLLECTION OF DATA
Sample size 1: 38
Sample size 2: 38
Variance of sample 1: 2370.775
Variance of sample 2: 3119.37
Mean of sample 1: 7.539473684
Mean of sample 2: 5.039473684

4.2.6 ANALYZE THE DATA

z                       0.207988776
P(Z<=z) one-tail        0.417618865
z Critical one-tail     1.281551566
P(Z<=z) two-tail        0.835237731
z Critical two-tail     1.644853627

4.2.7 STATISTICAL CONCLUSION AND BUSINESS IMPLICATION
The observed z-value (0.208) is less than the tabular z-value of 1.645. Hence the null hypothesis is not rejected. It can be concluded that there is no significant difference in the impact of the toss between day and day-night matches.
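The test statistic in 4.2.6 can be recomputed from the summary data in 4.2.5. The sketch below assumes the usual large-sample form, z = (mean1 - mean2) / sqrt(var1/n1 + var2/n2); the variable names are illustrative rather than taken from the report.

```python
from math import sqrt
from statistics import NormalDist

# Summary data from section 4.2.5.
n1, n2 = 38, 38
mean1, mean2 = 7.539473684, 5.039473684
var1, var2 = 2370.775, 3119.37
alpha = 0.10

# Standard error of the difference between the two sample means.
se_diff = sqrt(var1 / n1 + var2 / n2)
z = (mean1 - mean2) / se_diff

std_normal = NormalDist()
p_two_tail = 2 * (1 - std_normal.cdf(abs(z)))
z_crit_two_tail = std_normal.inv_cdf(1 - alpha / 2)

# Reject H0 only if |z| exceeds the two-tailed critical value.
reject_h0 = abs(z) > z_crit_two_tail

print(f"z = {z:.6f}, two-tail p = {p_two_tail:.6f}")
print(f"critical z = {z_crit_two_tail:.6f}, reject H0: {reject_h0}")
```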

4.3REGRESSION ANALYSIS OF IPL VALUATION OF PLAYERS


We hypothesized that the equilibrium final bid price of an IPL cricket player, as given in equation (1), is nothing but the sum total of the prices of his performance over a given period of time in IPL League. 1) Pi = f ( zi1, ,zij, , zin), where Pi is the final bid price paid to a cricketer i for the IPL tournament and zij is the value of the attribute j of the cricket player i. The hedonic price equation, in this context, is a locus of equilibrium final bid prices and player attributes, where buyers (team owners) and sellers (cricket players) participate in an auction. IPL cricket being both a sport and a source of entertainment, we postulate that cricketing attributes at its best must have been promoted and incentivized which in turn must be contributing to the final bid price of the players. But from the point of view of the organizers and the team owners perspective, winning and crowd-pulling abilities of a player, are very crucial for IPL. Higher the win ability and crowd pulling attributes of the players, higher will be the revenue earned from sale of tickets, broadcasting rights, sports merchandize, memorabilia, and advertisements. Thus we arrive at a fix as to which quality of a player acts as the crucial factor. While non-cricketing attributes matter in IPL we have deliberately excluded them from the analysis process as measurement of such qualitative attributes is a subjective process and at the same time money, cost and time constraints restricts the scope of digging deep into the abstruse process of fair evaluation of such attributes. Thus we have premised and hypothesized that the final bid price and the consequent success of IPL depends mainly on the core competencies of the cricket players, i.e., their cricketing attributes. Variables that capture the batting and bowling performances of cricket players must contribute to the players final bid price. 
Data on final bid prices and values of cricketing attributes of players are readily available for IPL 2008-IPL 2012. 12

The data sources include the official website of IPL and two other websites, Cricinfo and Wikipedia. For the sake of convenience we have considered only 8 Indian players in each category i.e. Bowlers and Batsman. While we have considered final bidding price as the dependent variable, there is a wealth of data available on the cricketing attributes of IPL players hypothesized above. We have data relating to the individual performances of these 16 players spanning across all the IPLs taken place till date 1) Batsman: For the multiple regression analysis we have taken 2 important independent variables which are the prime determinant of the performances of the players in the long run. The two variables are the Total runs scored and the Batting averages. 2) Bowlers: For the multiple regression analysis of bowlers also we have taken 2 important independent variables which are the prime determinant of the performances of the players in the long run. These are the wickets taken and strike rate. The relevant variables are drawn from observations on skills that are considered important for Twenty20 form of the game. For example, in this shorter version of the game, no one is likely to make centuries frequently. However, a player contributing many runs on a continuous basis and having high batting average would be an asset for the team. While IPL is a Batsmans game, a wicket taking bowler could put a lot of pressure on the opposition, and hence, he would be considered quite useful. To paraphrase the estimated variable coefficients should be having the right signs and are statistically significant, the equation has a reasonably high (adjusted) R-square and maintains parsimony, and there are sufficient degrees of freedom. Based on such guidelines, the variables chosen for estimating equation (1) and their description is reported in Table. 
We considered which combination of variables offered the best goodness of fit in terms of R-square, adjusted R-square, correct signs of the coefficients, t-statistics and F-statistics. The exact specification of the regressions is given in equation (2):

P (Batsman) = b0 + b1 (Runs) + b2 (Average)
P (Bowler)  = b0 + b1 (Wickets) + b2 (Strike Rate)    (2)

Table 4.2 Description of Variables

Variable       Description
P              Final bid price of a player.
Runs           Total runs scored over a span of 5 IPL seasons.
Average        Average runs scored in the same period.
Wickets        Total number of wickets taken by a bowler in 5 IPL seasons.
Strike Rate    Bowling strike rate, i.e., balls per wicket.

Table 4.3 Description of statistics of bowlers

Name of the bowler    Wickets    Strike rate    Amount (in US Dollars)
Harbhajan Singh       54         18.88          1300000
Ishant Sharma         36         29.33          450000
Munaf Patel           70         21.22          700000
Pragyan Ojha          69         25.63          500000
Praveen Kumar         53         22.35          800000
R. Ashwin             49         20.66          850000
R.P. Singh            74         19.78          500000
Zaheer Khan           65         19.22          900000

4.4 REGRESSION ANALYSIS

4.4.1 AMOUNT VERSUS STRIKE RATE

The regression equation is
Amount = 1899656 - 51941 Strike Rate

Predictor      Coef       SE Coef    T        P
Constant       1899656    529535     3.59     0.012
Strike Rate    -51941     23649      -2.20    0.070

S = 226442    R-Sq = 44.6%    R-Sq(adj) = 35.3%
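The fitted line for bowlers can be checked directly from the figures in Table 4.3. The following is a minimal sketch in plain Python, not the software used for the report; the function name `simple_ols` and the variable names are our own illustrative choices.

```python
# Data transcribed from Table 4.3, in the same row order as the table.
strike_rate = [18.88, 29.33, 21.22, 25.63, 22.35, 20.66, 19.78, 19.22]
amount = [1300000, 450000, 700000, 500000, 800000, 850000, 500000, 900000]


def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x.

    Returns (b0, b1, r_sq, r_sq_adj).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares
    b1 = s_xy / s_xx                            # slope
    b0 = y_bar - b1 * x_bar                     # intercept
    r_sq = b1 * s_xy / s_yy                     # regression SS / total SS
    r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - 2)   # one predictor
    return b0, b1, r_sq, r_sq_adj


b0, b1, r_sq, r_sq_adj = simple_ols(strike_rate, amount)
print(f"Amount = {b0:.0f} + ({b1:.0f}) * Strike Rate")
print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}")
```

The slope is negative because a lower strike rate (fewer balls per wicket) marks a better bowler, so a negative sign is the expected one for this variable.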

4.4.2 ANALYSIS OF VARIANCE

Source            DF    SS             MS             F       P
Regression        1     2.47345E+11    2.47345E+11    4.82    0.070
Residual Error    6     3.07655E+11    51275908679
Total             7     5.55000E+11

Durbin-Watson statistic = 1.42023

dl = 0.76    du = 1.33    4 - du = 2.67    4 - dl = 3.24

The Durbin-Watson statistic lies between du and 4 - du (1.33 < 1.42 < 2.67). Hence there is no autocorrelation.

4.4.3 AMOUNT VERSUS WICKETS, STRIKE RATE

The regression equation is
Amount = 3190691 - 13138 Wickets - 75398 Strike Rate

Predictor      Coef       SE Coef    T        P
Constant       3190691    716164     4.46     0.007
Wickets        -13138     5955       -2.21    0.078
Strike Rate    -75398     21286      -3.54    0.017

S = 176571    R-Sq = 71.9%    R-Sq(adj) = 60.7%

4.4.4 ANALYSIS OF VARIANCE

Source            DF    SS             MS             F       P
Regression        2     3.99114E+11    1.99557E+11    6.40    0.042
Residual Error    5     1.55886E+11    31177190382
Total             7     5.55000E+11

Source         DF    Seq SS
Wickets        1     7940674349
Strike Rate    1     3.91173E+11

Durbin-Watson statistic = 1.39505

Hence there is no autocorrelation.
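The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below recomputes it for both bowler models from the Table 4.3 data, in plain Python with helper names of our own choosing. The bounds dl and du used in the final check are the ones quoted in Section 4.4.2 for the one-predictor model. Bounds for the two-predictor model are not given in this report, so the sketch does not test that model against any bounds.

```python
# Data transcribed from Table 4.3; observation order follows the table rows.
wickets = [54, 36, 70, 69, 53, 49, 74, 65]
strike_rate = [18.88, 29.33, 21.22, 25.63, 22.35, 20.66, 19.78, 19.22]
amount = [1300000, 450000, 700000, 500000, 800000, 850000, 500000, 900000]


def centered(values):
    """Subtract the mean so that the intercept drops out of the fit."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def residuals_one(x, y):
    """Residuals of the least-squares fit y = b0 + b1 * x."""
    dx, dy = centered(x), centered(y)
    b1 = dot(dx, dy) / dot(dx, dx)
    return [yi - b1 * xi for xi, yi in zip(dx, dy)]


def residuals_two(x1, x2, y):
    """Residuals of y = b0 + b1 * x1 + b2 * x2 via the 2x2 normal equations."""
    d1, d2, dy = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(d1, d1), dot(d2, d2), dot(d1, d2)
    s1y, s2y = dot(d1, dy), dot(d2, dy)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return [yi - b1 * a - b2 * b for a, b, yi in zip(d1, d2, dy)]


def durbin_watson(e):
    """Sum of squared successive differences over sum of squared residuals."""
    numerator = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return numerator / sum(et ** 2 for et in e)


dw_simple = durbin_watson(residuals_one(strike_rate, amount))
dw_multiple = durbin_watson(residuals_two(wickets, strike_rate, amount))

# Bounds quoted in Section 4.4.2 (eight observations, one predictor).
dl, du = 0.76, 1.33
no_autocorrelation_simple = du < dw_simple < 4 - du

print(f"DW (strike rate only)          = {dw_simple:.5f}")
print(f"DW (wickets and strike rate)   = {dw_multiple:.5f}")
print(f"Simple model inside (du, 4-du): {no_autocorrelation_simple}")
```

Because the statistic depends on the order of the observations, the rows must be kept in the order used in the report.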


[Figure: Residual Plots for Amount, four panels: Normal Probability Plot (Percent versus Residual), Versus Fits (Residual versus Fitted Value), Histogram (Frequency versus Residual), Versus Order (Residual versus Observation Order)]

Figure 4.1 Residual Plots for Bowlers

4.5 DESCRIPTION OF STATISTICS OF BATSMEN

Table 4.4 Batsman Statistics

Name of the batsman    Runs    Average    Amount (in US Dollars)
M.S. Dhoni             1782    37.12      1300000
S.K. Raina             2254    33.64      1800000
V. Sehwag              1879    30.30      1800000
S.R. Tendulkar         2047    37.90      2400000
V. Kohli               1639    28.25      500000
R.G. Sharma            1975    31.35      2000000
G. Gambhir             2065    33.31      1800000
R. Dravid              1703    27.91      1800000

4.5.1 AMOUNT (IN US DOLLARS) VERSUS RUNS

The regression equation is
Amount (in US Dollars) = - 1663273 + 1740 Runs

Predictor    Coef        SE Coef    T        P
Constant     -1663273    1649219    -1.01    0.352
Runs         1740.5      855.5      2.03     0.088

S = 467406    R-Sq = 40.8%    R-Sq(adj) = 31.0%
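The standard errors and t-ratios in the output above follow from the residual standard error S. The sketch below shows the arithmetic for the batsmen data of Table 4.4, using the textbook formulas for simple linear regression; it is an illustration in plain Python with variable names of our own, not the routine that produced the report's output.

```python
import math

# Data transcribed from Table 4.4, in the same row order as the table.
runs = [1782, 2254, 1879, 2047, 1639, 1975, 2065, 1703]
amount = [1300000, 1800000, 1800000, 2400000, 500000, 2000000, 1800000, 1800000]

n = len(runs)
x_bar = sum(runs) / n
y_bar = sum(amount) / n
s_xx = sum((x - x_bar) ** 2 for x in runs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(runs, amount))

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residual standard error: "S" in the regression output (n - 2 degrees of freedom).
residuals = [y - (intercept + slope * x) for x, y in zip(runs, amount)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Standard errors of the coefficients and the resulting t-ratios.
se_slope = s / math.sqrt(s_xx)
se_intercept = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)
t_slope = slope / se_slope
t_intercept = intercept / se_intercept

print(f"Runs:     coef = {slope:.1f}, SE = {se_slope:.1f}, T = {t_slope:.2f}")
print(f"Constant: coef = {intercept:.0f}, SE = {se_intercept:.0f}, T = {t_intercept:.2f}")
print(f"S = {s:.0f}")
```

With only eight players the slope's standard error is about half the size of the slope itself, which is why the coefficient on runs is significant at the 10% level but not at the 5% level.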

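4.5.2 ANALYSIS OF VARIANCE

Source            DF    SS             MS             F       P
Regression        1     9.04188E+11    9.04188E+11    4.14    0.088
Residual Error    6     1.31081E+12    2.18469E+11
Total             7     2.21500E+12

Durbin-Watson statistic = 2.59499

The statistic lies between du = 1.33 and 4 - du = 2.67, the bounds used in Section 4.4.2 for eight observations and one predictor. Thus there is no autocorrelation.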

4.5.3 AMOUNT (IN US DOLLARS) VERSUS RUNS, AVERAGE

The regression equation is
Amount (in US Dollars) = - 1946333 + 1562 Runs + 19235 Average

Predictor    Coef        SE Coef    T        P
Constant     -1946333    1992016    -0.98    0.373
Runs         1562        1080       1.45     0.207
Average      19235       59656      0.32     0.760

S = 506777    R-Sq = 42.0%    R-Sq(adj) = 18.8%

4.5.4 ANALYSIS OF VARIANCE

Source            DF    SS             MS             F       P
Regression        2     9.30888E+11    4.65444E+11    1.81    0.256
Residual Error    5     1.28411E+12    2.56822E+11
Total             7     2.21500E+12

Source     DF    Seq SS
Runs       1     9.04188E+11
Average    1     26699560534

Durbin-Watson statistic = 2.39651

Thus there is no autocorrelation.
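The analysis of variance for the two-predictor batsmen model can be reproduced from Table 4.4 as sketched below. The code is plain Python with names of our own choosing. The p-value uses the closed-form upper-tail probability of the F distribution that holds when the numerator has exactly 2 degrees of freedom, P(F > f) = (1 + 2f/d2)^(-d2/2); the 5% significance level is our assumption, since the report does not state the level used.

```python
# Data transcribed from Table 4.4, in the same row order as the table.
runs = [1782, 2254, 1879, 2047, 1639, 1975, 2065, 1703]
average = [37.12, 33.64, 30.3, 37.9, 28.25, 31.35, 33.31, 27.91]
amount = [1300000, 1800000, 1800000, 2400000, 500000, 2000000, 1800000, 1800000]


def centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


d_runs, d_avg, d_amt = centered(runs), centered(average), centered(amount)
s11, s22, s12 = dot(d_runs, d_runs), dot(d_avg, d_avg), dot(d_runs, d_avg)
s1y, s2y = dot(d_runs, d_amt), dot(d_avg, d_amt)

# Solve the 2x2 normal equations for the two slope coefficients.
det = s11 * s22 - s12 ** 2
b_runs = (s22 * s1y - s12 * s2y) / det
b_avg = (s11 * s2y - s12 * s1y) / det

n, k = len(amount), 2
df_error = n - k - 1
ss_total = dot(d_amt, d_amt)
ss_regression = b_runs * s1y + b_avg * s2y
ss_error = ss_total - ss_regression

r_sq = ss_regression / ss_total
f_stat = (ss_regression / k) / (ss_error / df_error)

# Upper-tail F probability; this closed form is valid only for k = 2.
p_value = (1 + 2 * f_stat / df_error) ** (-df_error / 2)

alpha = 0.05   # assumed significance level
reject_h0 = p_value < alpha

print(f"b(Runs) = {b_runs:.0f}, b(Average) = {b_avg:.0f}")
print(f"R-Sq = {r_sq:.1%}, F = {f_stat:.2f}, P = {p_value:.3f}")
print(f"Reject H0 (both slopes zero) at 5%: {reject_h0}")
```

The null hypothesis of this F-test is that both slope coefficients are zero, so a p-value above the chosen level means the data do not show that runs and average jointly explain the bid price.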


[Figure: Residual Plots for Amount (in US Dollars), four panels: Normal Probability Plot (Percent versus Residual), Versus Fits (Residual versus Fitted Value), Histogram (Frequency versus Residual), Versus Order (Residual versus Observation Order)]

Figure 4.2 Residual Plots for Amount

CHAPTER 5 DISCUSSIONS

5.1 BOWLERS

The coefficient of determination for strike rate as an individual factor, i.e., in the simple linear regression, is 44.6%. In other words, only about 44.6% of the variation in the amount paid to a bowler in the IPL is explained by his strike rate; the rest is unexplained. In the multiple regression model with wickets and strike rate, the coefficient of determination rises to 71.9% (adjusted R-square 60.7%), which still leaves more than a quarter of the variation unexplained.

The F-test of the multiple regression examines the following hypotheses:
H0: The key performance indicators (wickets and strike rate) have no linear relationship with the amount paid to bowlers in the IPL, i.e., both slope coefficients are zero.
H1: At least one of the two indicators is related to the amount paid.

The corresponding p-value is 0.042, which is below the 5% significance level and therefore lies in the rejection region. The null hypothesis is rejected: taken together, wickets and strike rate are significantly related to the bid price. However, the coefficient of wickets is negative, which is not the expected sign, and it is not significant at the 5% level (p = 0.078). The estimates also rest on only eight observations. The performance factors therefore do not fully determine the bid price of the bowlers, and a variety of other reasons, discussed in Section 5.2.1, are likely to contribute to the high amounts paid to them.
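The R-square and adjusted R-square figures for the two bowler models follow directly from the sums of squares reported in Sections 4.4.2 and 4.4.4. The sketch below shows the arithmetic in plain Python; the function name `fit_summary` and the dictionary labels are our own.

```python
n = 8                      # bowlers in Table 4.3
ss_total = 5.55000e11      # total sum of squares (same for both models)

# Number of predictors k and residual (error) sum of squares for each model.
models = {
    "strike rate only": (1, 3.07655e11),
    "wickets and strike rate": (2, 1.55886e11),
}


def fit_summary(k, ss_error):
    """Return (R-square, adjusted R-square) from the ANOVA sums of squares."""
    r_sq = 1 - ss_error / ss_total
    # Adjusted R-square compares mean squares, penalising extra predictors.
    r_sq_adj = 1 - (ss_error / (n - k - 1)) / (ss_total / (n - 1))
    return r_sq, r_sq_adj


results = {name: fit_summary(k, sse) for name, (k, sse) in models.items()}
for name, (r_sq, r_sq_adj) in results.items():
    print(f"{name}: R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}, "
          f"unexplained = {1 - r_sq:.1%}")
```

The adjusted figure is the fairer basis for comparing the two models, because R-square can never fall when a predictor is added, whereas adjusted R-square rises only if the added predictor reduces the residual mean square.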

5.2 BATSMEN

The coefficient of determination for runs is quite low at 40.8%: only 40.8% of the variation in the amount paid to a batsman in the IPL is explained by the runs he has scored in the T-20 format, and the rest is unexplained. The coefficient of determination of the multiple regression model with runs and average is 42.0% (adjusted R-square 18.8%), which is also low.

The F-test of the multiple regression examines the following hypotheses:
H0: The key performance indicators (runs and average) have no linear relationship with the amount paid to batsmen in the IPL, i.e., both slope coefficients are zero.
H1: At least one of the two indicators is related to the amount paid.

The corresponding p-value is 0.256, which is above the 5% significance level and therefore does not lie in the rejection region. The null hypothesis cannot be rejected: for these eight batsmen, runs and average do not significantly explain the bid price. This drives home the point that various other factors may be responsible for the amounts being so high. These high premiums, over and above the compensation for cricketing attributes, seem to reflect the players' ability to draw huge crowds nationally, owing to their charismatic association with film stars, the racial controversies surrounding them, etc.

5.2.1 REASONS FOR NON-EXPLANATION

Some of the reasons which may account for the unexplained variation are as follows:
1. Iconic value.
2. Glamour.
3. Controversy.
4. Age.
5. Popularity.

CHAPTER 6 CONCLUSION

6.1 LIMITATIONS
The limitations of our study are:
1. The pie chart and the bar chart were appropriate for representing the statistics of earlier India-Pakistan matches. Using those representations to make predictions about the current match, however, is not really possible because of the inherent unpredictability of the game of cricket.
2. While estimating the achievable target score with a ninety percent confidence interval, we take into account only the matches already played between the two teams. We do not consider other factors such as the difference in the set of players between those games and the current match, the current form of the individual players, the nature of the pitch, weather conditions, etc. This might result in an incorrect interval estimate.
3. In determining the difference in the impact of the toss, we calculate the net run-rate difference between the teams batting first and second, and arrive at two populations, one each for the day and day-night matches. The net run-rate difference is, however, calculated across the maximum overs for all matches, so the event of teams chasing down targets easily without losing wickets is not captured by our populations.
4. In developing a regression model for the pricing of an IPL player based on his statistical attributes, we ignore the many intangible attributes of an individual player. For example, a player's brand value, image and relevance to the franchise are all taken into account when his price is determined, but these aspects are not included in our regression model. This probably explains the low correlation between the independent variables and the pricing of the player.

6.2 FUTURE SCOPE
Single population estimation could be used to estimate, with a stated level of confidence, the likely scores of players based on their previous performances. This could help teams formulate strategies against the opponent. The regression model could be applied to fix a player's compensation based on his skill set. This could help a team franchise set a ceiling price for each player before going into the auction, spend its money accordingly, and thus achieve the maximum return on money.
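As a sketch of how a franchise might use the fitted equations, the snippet below evaluates the two bowler regressions of Sections 4.4.1 and 4.4.3 for a hypothetical bowler. The bowler's figures (60 wickets, strike rate 20.0) and the function names are invented for illustration only. The result is a point prediction from a model fitted on eight players, so it should be read as a rough guide rather than a firm ceiling.

```python
def price_from_strike_rate(strike_rate):
    """Simple regression of Section 4.4.1: Amount = 1899656 - 51941 Strike Rate."""
    return 1899656 - 51941 * strike_rate


def price_from_wickets_and_strike_rate(wickets, strike_rate):
    """Multiple regression of Section 4.4.3."""
    return 3190691 - 13138 * wickets - 75398 * strike_rate


# Hypothetical bowler, not one of the players in Table 4.3.
hypothetical_wickets = 60
hypothetical_strike_rate = 20.0

simple_estimate = price_from_strike_rate(hypothetical_strike_rate)
multiple_estimate = price_from_wickets_and_strike_rate(
    hypothetical_wickets, hypothetical_strike_rate
)

print(f"Strike rate only:        US$ {simple_estimate:,.0f}")
print(f"Wickets and strike rate: US$ {multiple_estimate:,.0f}")
```

The two models give different figures for the same bowler, which is a reminder that any ceiling derived this way depends on the specification chosen.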


REFERENCES
Armstrong, J and Willis, R J (1993). Scheduling the Cricket World Cup: A Case Study, The Journal of the Operational Research Society, 44(11), 1067-1072.
Barr, G D I and Kantor, B S (2004). A Criterion for Comparing and Selecting Batsmen in Limited Overs Cricket, Journal of the Operational Research Society, 55(12), 1266-1274.
Bennett, J M and Flueck, J A (1983). An Evaluation of Major League Baseball Offensive Performance Models, The American Statistician, 37(1), 76-82.
Berri, D J (1999). Who Is Most Valuable? Measuring the Player's Production of Wins in the National Basketball Association, Managerial and Decision Economics, 20(8), 411-427.
Cricinfo. http://www.cricinfo.com/, as on September 13, 2012.
Estenson, P S (1994). Salary Determination in Major League Baseball: A Classroom Exercise, Managerial and Decision Economics, 15(5), 537-541.
Jones, J C H and Walsh, W D (1988). Salary Determination in the National Hockey League: The Effects of Skills, Franchise Characteristics, and Discrimination, Industrial and Labor Relations Review, 41(4), 592-604.
Rastogi, S K (2009). Player Pricing and Valuation of Cricketing Attributes: Exploring the IPL Twenty20 Vision, Vikalpa, 34 (April-June), 15-23.
