
Mark Russeff

Econ 482
Final Project

Gold Miner Stock Returns and The Price of Gold: A regression analysis

Introduction
Due to the economic turmoil brought on by the financial crisis of 2008, the price of gold
has surged to new highs; for investors this presents a substantial opportunity. However, the cost
of safely storing gold bullion keeps most investors from owning physical gold. For many
investors the best way to invest in gold is through a gold ETF, like the GLD Gold ETF. In
addition to gold ETFs there are also gold mining companies, but are the monthly returns of
these gold miners directly influenced by the price of gold, or do their returns follow the general
market? This is the question I hope to answer with the following analysis.
The question of what influences the returns of gold mining companies is an important
question because gold is often thought to be uncorrelated with the market, which gives investors
a way to hedge against market uncertainty. So if the returns of the gold miners are heavily
influenced by the price of gold rather than the returns of the market, then gold miners too can be
used as a hedge against market uncertainty.
For this paper I will analyze the stock returns for gold mining companies by performing
an OLS regression of the GDX Gold Miners ETF on the GLD Gold ETF and the SPY S&P 500
Index ETF. Then by interpreting my results I hope to garner some evidence of a relationship
between the returns of the gold miners and the price of gold or the general market.

Literature Review
There has been some research on the influence of gold prices on the prices of the gold
miners. The first research on the subject was that of Tufano (1998), who studied the pricing of
the stock of gold miners and found that gold prices played a significant role in their price
movement. Specifically, he found that a 1% move in gold corresponded to a 2% change in the
price of the gold miners' stock. Building on the research of Tufano (1998), Faff (2004) analyzed
gold miners in Australia, where he found that only the volatility in gold prices affected the price
of the miners' stock.
Data
For this analysis I used monthly price data downloaded from Yahoo! Finance using the
get.hist.quote() function in R. I used price data from the GDX Gold Miners ETF, which is an
ETF that seeks to replicate the price movements of the AMEX Gold Miners Index. I also used
the GLD Gold ETF, which replicates the spot price of gold, and the SPY S&P 500 Index ETF, which
replicates the price movements of the S&P 500 Index. The returns were calculated using the
lagged differences in the log(prices) and are available in Chart 1 of Appendix II or Table 1 in
Appendix III. The means and standard deviations of the return data are below:


           Mean            Standard deviation
GDX        0.009463229     0.12638073
SPY       -0.001123934     0.05670286
GLD        0.015871765     0.12638073

Modeling Approach
For this project I have chosen to perform a multiple linear regression analysis using
ordinary least squares (OLS) estimators in R. For this analysis I will be using two explanatory
variables to construct a model of the general form:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u$$

In order to analyze the effect of the returns of gold and the market on the returns of the
gold miners I will run a regression of the GDX Gold Miner ETF on the GLD Gold ETF and the
SPY S&P 500 Index ETF. This would construct a model of the form:

$$\widehat{GDX}_t = \hat{\beta}_0 + \hat{\beta}_{GLD}\,GLD_t + \hat{\beta}_{SPY}\,SPY_t$$

Where $\widehat{GDX}_t$ is the monthly return predicted by the model for the GDX Gold Miner ETF,
$GLD_t$ is the actual monthly return of the GLD Gold ETF and $SPY_t$ is the actual monthly return on
the SPY S&P 500 Index ETF. This model is not very different from the one I started with,
because it was able to pass the tests for proper specification, homoscedasticity, and
autocorrelation.
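
As a rough illustration of the steps described so far (log returns from prices, then an OLS fit), the sketch below uses simulated prices rather than the Yahoo! Finance data. The data frame names `prices` and `returns`, the seed, and the simulation parameters are all assumptions made for this example and are not taken from the original analysis.

```r
# Hypothetical illustration: simulated prices stand in for the downloaded data.
set.seed(482)
n <- 49  # 49 monthly prices give 48 monthly returns

# Simulated monthly log returns (assumed parameter values, for illustration only)
gld.ret <- rnorm(n - 1, mean = 0.015, sd = 0.05)
spy.ret <- rnorm(n - 1, mean = 0.000, sd = 0.05)
gdx.ret <- -0.02 + 1.9 * gld.ret + 0.6 * spy.ret + rnorm(n - 1, sd = 0.05)

# Build price series whose log differences reproduce those returns
prices <- data.frame(
  GLD = 100 * exp(cumsum(c(0, gld.ret))),
  SPY = 100 * exp(cumsum(c(0, spy.ret))),
  GDX = 100 * exp(cumsum(c(0, gdx.ret)))
)

# Monthly log returns: lagged differences of log(prices)
returns <- as.data.frame(lapply(prices, function(p) diff(log(p))))

# OLS regression of GDX returns on GLD and SPY returns
fit <- lm(GDX ~ GLD + SPY, data = returns)
print(coef(fit))
```

With real data, `prices` would simply be replaced by the three downloaded monthly price series; the `diff(log(...))` and `lm()` steps stay the same.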

Results
The multiple linear regression of the GDX on the GLD and the SPY produced the following
model, with standard errors in parentheses (R output in Appendix I):

$$\begin{array}{ccccccc}
\widehat{GDX}_t & = & -0.0196 & + & 1.8721\,GLD_t & + & 0.6141\,SPY_t \\
 & & (0.0083) & & (0.1422) & & (0.1405)
\end{array}$$

The resulting model tells us that for every 1% move in the GLD the GDX moves by
1.87% in the same direction, suggesting a strong positive relationship between the return on gold
and the returns of the gold mining companies' stock. Additionally, for every 1% move in the market
represented by the SPY the GDX moves by only 0.61%, which means that the monthly return on gold
has a much larger effect on the return of the gold miners than does the return of the market. The
$R^2 = 0.8178$, which is very large, telling us that the model explains about 82% of the variation
in the monthly returns of the GDX.


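In order to test whether or not the explanatory variables actually have an effect on the return
of the GDX, we will use a t-test on each coefficient. For the GLD coefficient:

$$H_0: \beta_{GLD} = 0 \qquad H_1: \beta_{GLD} \neq 0 \qquad t = \frac{1.872082}{0.142201} = 13.165$$

We can reject the null hypothesis at the 1% level with df = 45 (the two-sided critical value is
about 2.69), so we know that the return on the GLD does have some effect on the returns of the GDX.
For the SPY coefficient:

$$H_0: \beta_{SPY} = 0 \qquad H_1: \beta_{SPY} \neq 0 \qquad t = \frac{0.614073}{0.140480} = 4.371$$

We can reject the null hypothesis at the 1% level with df = 45, so we know that the return on the
SPY does have some effect on the returns of the GDX, so the GDX's returns are influenced somewhat
by the market's returns.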
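
The t-tests can be checked by hand from the coefficient table. The sketch below copies the estimates and standard errors from the summary(gdx.fit) output in Appendix I; the variable names (`est`, `se`, `t.stat`, `t.crit`, `p.val`) are chosen only for this illustration.

```r
# Estimates and standard errors copied from summary(gdx.fit) in Appendix I
est <- c(GLD = 1.872082, SPY = 0.614073)
se  <- c(GLD = 0.142201, SPY = 0.140480)
df  <- 45  # 48 observations minus 3 estimated coefficients

t.stat <- est / se                  # t statistic for H0: coefficient = 0
t.crit <- qt(1 - 0.01 / 2, df)      # two-sided critical value at the 1% level
p.val  <- 2 * pt(-abs(t.stat), df)  # two-sided p-values

print(round(t.stat, 3))
print(round(t.crit, 3))
print(t.stat > t.crit)              # TRUE means H0 is rejected at the 1% level
```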
Another issue we could have is the problem of multicollinearity. Intuitively we wouldn't
think that the returns on gold and the returns on the market were strongly positively correlated,
but it is good to check. From the correlation matrix in Appendix I:

$$r_{GLD,SPY} = 0.0737$$

There is only a very slight positive correlation between the two, which suggests that
multicollinearity should not be a problem.
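
One way to put a number on "should not be a problem" is the variance inflation factor (VIF). This diagnostic is not part of the original analysis; it is added here only as an illustration. With just two explanatory variables, the R-squared from regressing one on the other equals their squared correlation, so the VIF follows directly from the correlation reported in Appendix I.

```r
# Correlation between GLD and SPY returns, from the matrix in Appendix I
r <- 0.07372496

# With two regressors, the auxiliary R^2 is r^2, so VIF = 1 / (1 - r^2)
vif <- 1 / (1 - r^2)
print(round(vif, 4))
```

A VIF this close to 1 means the coefficient variances are barely inflated by the correlation between the two regressors.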
Next we must check for heteroscedasticity in order to make sure that the variance of the
probability distribution of the disturbance term u is not different across observations, since this
would cause our OLS estimators to be inefficient. For our OLS estimators to be efficient we want the
variance of the probability distribution of the disturbance term u to be the same across all
observations; this is known as homoscedasticity.


In order to test for the presence of heteroscedasticity, a White test must be performed in which
the squared residuals are regressed on the explanatory variables, their squares and their cross
product. The R output for the regression is located in Appendix I; the important figure we need
for the White test is $R^2 = 0.1222$. Now we can perform the White test for heteroscedasticity:

$$H_0: \text{homoscedasticity} \qquad nR^2 = 48 \times 0.1222 = 5.8656 < \chi^2_{5,\,0.05} = 11.070$$

Using the critical value of chi-squared with five degrees of freedom (11.070) at the 5% level, we
fail to reject the null hypothesis of homoscedasticity. To confirm these results a Goldfeld-Quandt
test was also run using the R function gqtest() and the output is contained in Appendix I.
The GQ test orders the observations by the GLD return, splits the data into 3 parts, drops the
middle 18 observations and runs separate regressions on the top and bottom 15 observations. The
ratio of the residual sums of squares from the two regressions is then used in an F-test:

$$F(12, 12) = \frac{RSS_2}{RSS_1} = 1.4003$$

Using the critical value of F(12,12) at the 5% level (the p-value reported in Appendix I is
0.2844), we confirm the finding of the White test and we fail to reject the null hypothesis of
homoscedasticity. Based on the above tests we can say with some confidence that the distribution
of the disturbance term is homoscedastic.
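
Both heteroscedasticity decisions can be reproduced from the figures in Appendix I using base R distribution functions. This is a minimal sketch; the variable names are chosen for this illustration, and the statistics themselves are copied from the R output rather than recomputed from the data.

```r
n  <- 48      # number of monthly observations
r2 <- 0.1222  # R-squared of the auxiliary White regression (Appendix I)

white.stat <- n * r2                    # White test statistic, n * R^2
white.crit <- qchisq(0.95, df = 5)      # 5% critical value, 5 auxiliary regressors
white.p    <- pchisq(white.stat, df = 5, lower.tail = FALSE)

gq.stat <- 1.4003                       # GQ statistic reported by gqtest()
gq.crit <- qf(0.95, df1 = 12, df2 = 12) # 5% critical value of F(12, 12)
gq.p    <- pf(gq.stat, 12, 12, lower.tail = FALSE)

print(c(white.stat = white.stat, white.crit = white.crit, white.p = white.p))
print(c(gq.stat = gq.stat, gq.crit = gq.crit, gq.p = gq.p))
```

In both cases the statistic falls below its critical value, which is the "fail to reject" outcome described above.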
Finally, in order to be assured that our OLS estimates are efficient we must check for
autocorrelation. This is especially important in this regression because we are dealing with
time-series data, which often suffers from autocorrelation caused by the disturbance term picking up
the influence of variables that affect the dependent variable but are not included in the
regression equation. Visually, when plotting the autocorrelation function of the disturbance
term (Chart 3 in Appendix II), we see no significant autocorrelation. However, we must also test
for autocorrelation using the Durbin-Watson d statistic. The d statistic will be sufficient in our
case because our regression does not contain a lagged dependent variable. First the d statistic is
calculated using a while loop in R (see Appendix I) and then we test for autocorrelation:

$$H_0: \rho = 0 \qquad d = \frac{\sum_{t=2}^{48} (e_t - e_{t-1})^2}{\sum_{t=1}^{48} e_t^2} = 2.3091 > d_U$$

We used the critical values for n = 50 because our true number of observations is n = 48, and we
know that the critical value $d_U$ at n = 50 is more than that at n = 48, so we can be sure that the
null hypothesis of no autocorrelation would also not be rejected at the true number of
observations. This test result tells us that there is no evidence of autocorrelation in this
model.
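
The d statistic can also be computed without an explicit loop. The sketch below uses simulated values in place of the actual regression residuals (the vector `e` and the seed are assumptions for this example) and checks that the loop form and the vectorized form agree.

```r
set.seed(1)
e <- rnorm(48)  # hypothetical stand-in for the 48 regression residuals

# Loop form: sum of squared successive differences over the residual sum of squares
num <- 0
for (i in 2:length(e)) {
  num <- num + (e[i] - e[i - 1])^2
}
d.loop <- num / sum(e^2)

# Vectorized form using diff()
d.vec <- sum(diff(e)^2) / sum(e^2)

print(c(loop = d.loop, vectorized = d.vec))
```

The statistic always lies between 0 and 4, and values near 2 indicate little first-order autocorrelation.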

Conclusion
In conclusion, we can see that this model is fairly well specified, which I originally did not
anticipate. Because I was dealing with time-series data I expected that I would have
to deal with the problem of autocorrelation, but I did not. The model also yielded very similar
results to the prior research on the subject, which found a two percent move in the mining stocks
for every one percent move in the price of gold. The model I created suggests a 1.87% move in the
price of the GDX Gold Miner ETF for every 1% move in the GLD Gold ETF, and the 2% move
found in the prior research is within the 95% confidence interval for the coefficient of the GLD.
So my findings do support those of the past research, but I am surprised at how closely I was
able to replicate their result with a very simple multiple regression model.
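
The confidence interval claim can be verified from the GLD estimate and standard error in Appendix I. This is a minimal sketch with variable names chosen for illustration; it should reproduce the confint() interval for the GLD up to rounding.

```r
# GLD estimate and standard error copied from summary(gdx.fit) in Appendix I
est <- 1.872082
se  <- 0.142201
df  <- 45

# 95% confidence interval: estimate +/- t(0.975, df) * standard error
ci <- est + c(-1, 1) * qt(0.975, df) * se
print(round(ci, 4))

# Does the interval contain the 2% move reported in the prior research?
print(ci[1] < 2 && 2 < ci[2])
```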













Appendix I - R output

> summary(gdx.fit)
Call: lm(formula = GDX ~ GLD + SPY, data = projectReturns.z)
Coefficients:
Estimate Std. Error t value
(Intercept) -0.019560 0.008276 -2.364
GLD 1.872082 0.142201 13.165
SPY 0.614073 0.140480 4.371
---
Residual standard error: 0.05513 on 45 degrees of freedom
Multiple R-squared: 0.8178, Adjusted R-squared: 0.8097
F-statistic: 101 on 2 and 45 DF, p-value: < 2.2e-16
> confint(gdx.fit, level=0.95)  # 95% confidence interval
2.5 % 97.5 %
(Intercept) -0.03622780 -0.002891873
GLD 1.58567481 2.158488818
SPY 0.33113260 0.897013961
> cov.mat = var(projectReturns.z)
> cor.mat = cov2cor(cov.mat)
> cor.mat
SPY GLD GDX
SPY 1.00000000 0.07372496 0.3408144
GLD 0.07372496 1.00000000 0.8605025
GDX 0.34081442 0.86050251 1.0000000

White Test
> white.fit = lm(residuals ~ GLD + SPY + GLD*SPY + I(GLD^2) + I(SPY^2), data=projectReturns.z)
> summary(white.fit)
Call:
lm(formula = residuals ~ GLD + SPY + GLD * SPY + I(GLD^2) + I(SPY^2), data = projectReturns.z)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0016614 0.0007894 2.105 0.0413 *
GLD 0.0133351 0.0106804 1.249 0.2187
SPY 0.0170795 0.0111570 1.531 0.1333
I(GLD^2) 0.0771926 0.1452605 0.531 0.5979
I(SPY^2) 0.2399527 0.1559507 1.539 0.1314
GLD:SPY -0.1913851 0.1529494 -1.251 0.2178
---
Residual standard error: 0.003636 on 42 degrees of freedom
Multiple R-squared: 0.1222, Adjusted R-squared: 0.01775
F-statistic: 1.17 on 5 and 42 DF, p-value: 0.34
> 48*0.1222
[1] 5.8656

Goldfeld-Quandt Test
> gqtest(gdx.fit, fraction= 0.375, order.by= ~GLD, data=projectReturns.z)
Goldfeld-Quandt test
data: gdx.fit
GQ = 1.4003, df1 = 12, df2 = 12, p-value = 0.2844

Durbin-Watson d statistic Using While Loop
> i <- 2
> sum <- 0
> while(i < 49) {
+ sum <- sum + ((residuals[i]-residuals[i-1])^2)
+ i <- i + 1
+ }
> rss=sum(residuals^2)
> sum/rss
2.3091
Using dwtest() function
> dwtest(gdx.fit)
Durbin-Watson test
data: gdx.fit
DW = 2.3091, p-value = 0.8545
alternative hypothesis: true autocorrelation is greater than 0














Appendix II - Charts
Chart 1: Returns

Chart 2: 3D box plot with regression plane.

Chart 3: Autocorrelation function chart for the residuals


Appendix III - Data

> summary(projectReturns.z)
SPY GLD GDX
Min. :-0.180562 Min. :-0.17602 Min. :-0.478045
1st Qu.:-0.033083 1st Qu.:-0.01508 1st Qu.:-0.068608
Median : 0.012158 Median : 0.02333 Median : 0.010018
Mean :-0.001124 Mean : 0.01587 Mean : 0.009463
3rd Qu.: 0.037605 3rd Qu.: 0.05170 3rd Qu.: 0.059438
Max. : 0.094735 Max. : 0.12033 Max. : 0.293983

> projectReturns.z
SPY GLD GDX
Jan 2007 0.0149583510 2.530594e-02 -0.0076923456
Feb 2007 -0.0198416203 2.513269e-02 0.0097336834
Mar 2007 0.0115590641 -1.119358e-02 -0.0105061825
Apr 2007 0.0433531903 2.032743e-02 -0.0002576324
May 2007 0.0333221774 -2.337436e-02 -0.0100997880
Jun 2007 -0.0147440151 -1.956768e-02 -0.0330761730
Jul 2007 -0.0317693312 2.337489e-02 0.0526644282
Aug 2007 0.0127566697 1.103480e-02 -0.0588714377
Sep 2007 0.0379495549 9.991880e-02 0.1859694700
Oct 2007 0.0135066988 6.720467e-02 0.1095325155
Nov 2007 -0.0395042071 -1.667347e-02 -0.0875242915
Dec 2007 -0.0113727563 6.436067e-02 0.0052631700
Jan 2008 -0.0623499040 1.029322e-01 0.0940766188
Feb 2008 -0.0261498724 5.097596e-02 0.0534907587
Mar 2008 -0.0089772202 -6.186656e-02 -0.1076907116
Apr 2008 0.0465417796 -4.247786e-02 -0.0898814783
May 2008 0.0149696604 9.190185e-03 0.0541519317
Jun 2008 -0.0872160560 4.417828e-02 0.0542620916
Jul 2008 -0.0090476024 -1.454731e-02 -0.1111770920
Aug 2008 0.0153325678 -9.752177e-02 -0.1440945796
Sep 2008 -0.0989620799 4.029805e-02 -0.1078788490
Oct 2008 -0.1805624151 -1.760173e-01 -0.4780453703
Nov 2008 -0.0720489886 1.184370e-01 0.2377728613
Dec 2008 0.0096674709 7.448145e-02 0.2429387822
Jan 2009 -0.0856898986 5.388471e-02 0.0103017835
Feb 2009 -0.1136581631 1.435275e-02 -0.0258059701
Mar 2009 0.0800319537 -2.569711e-02 0.1002694532
Apr 2009 0.0947351603 -3.390919e-02 -0.1138833376
May 2009 0.0568127215 9.742260e-02 0.2939826209
Jun 2009 -0.0006656313 -5.359378e-02 -0.1551348946
Jul 2009 0.0719257174 2.352030e-02 0.0504047768
Aug 2009 0.0363049403 5.354752e-04 -0.0065756433
Sep 2009 0.0348405921 5.671220e-02 0.1366349198
Oct 2009 -0.0194237436 3.655189e-02 -0.0665768965
Nov 2009 0.0597647145 1.203265e-01 0.1875519205
Dec 2009 0.0189418666 -7.476008e-02 -0.0984333102
Jan 2010 -0.0370258792 -1.266018e-02 -0.1264768526
Feb 2010 0.0307517054 3.222341e-02 0.0749671315
Mar 2010 0.0550125393 -4.396014e-03 0.0117781656
Apr 2010 0.0153354901 5.716861e-02 0.1287066672
May 2010 -0.0828309817 3.005691e-02 -0.0129522584
Jun 2010 -0.0531081460 2.328007e-02 0.0412551137
Jul 2010 0.0660470739 -5.221070e-02 -0.0747003190
Aug 2010 -0.0459904765 5.549262e-02 0.1059617451
Sep 2010 0.0857615908 4.665032e-02 0.0423652910
Oct 2010 0.0374904193 3.616100e-02 0.0241997149
Nov 2010 0.0000000000 2.089316e-02 0.0376756888
Dec 2010 0.0493164916 -7.384706e-05 0.0356591099
