
STAT 151A Hypothesis Testing Writeup Demonstration

These notes cover a partial solution to the last problem from Homework 5, and
also one other problem where we are concerned with testing for interactions of two
factors. The point here is not to provide a complete homework solution, but to
illustrate (1) the writing style of an acceptable answer in a write-up where your
explanations should stand alone, and (2) correct and incorrect ways of phrasing the
explanation of hypothesis tests.
First, we load the data:

Ericksen =
  read.table("http://socserv.socsci.mcmaster.ca/jfox/Books/Applied-Regression-2E/datasets/Ericksen.txt",
             header = TRUE)

attach(Ericksen)

minority = log(minority)
crime = log(crime)

Describing the Regression and T-tests


This part I did not expect you to get, but it is worth thinking about what we
are actually doing in this regression.
The data in question is Ericksen’s data, consisting of crime rates and various
demographic variables for 66 geographic areas of the United States.
We fit the model

Y = α + β1 X1 + β2 X2 + β3 X3 + ε    (1)

with Y representing the logarithm of the crime rate, X1 representing the
percentage of residents in poverty, X2 representing the logarithm of the
percentage of residents who are minorities, and X3 representing the
percentage of residents with a high school education. Here we assume ε is normally
distributed with mean zero, although any finite-variance mean-zero error leads to
approximately correct inference. Each of these variables is observed once for each
area, and we assume the errors are independent.
We did not attempt to investigate the variables in detail; our decisions to take the
logarithm of the crime and minority variables were based on their highly skewed one-dimensional distributions. One disadvantage of doing this is that the coefficients
of the model are hard to interpret, but our main interest here is testing whether
certain variables have any association after controlling for the others, and for those
purposes the units of the coefficients are not important.
We are not checking the model assumptions. The linearity of the relationship is
hard to check, and the methods for trying to do so are not covered as of Chapter
6. As for independent errors, this assumption is surely imperfect: the areas are
not a random sample, and in any reasonable model they are dependent. But we accept
that the model is not perfect, and hope it proves useful. All the results to follow
assume the model above is valid.
The output of the regression is given below. (I am including the code; you should
not do this on your next lab. In a real research report, the table should be made
more readable than the raw R output, but I am willing to accept raw R output as
long as you provide some explanation.)

CODE

model = lm(crime~poverty+minority+highschool)

summary(model)

OUTPUT:

Call:
lm(formula = crime ~ poverty + minority + highschool)

Residuals:
Min 1Q Median 3Q Max
-0.76675 -0.14504 -0.04639 0.16388 0.82748

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.057495 0.129794 31.261 < 2e-16 ***
poverty 0.024410 0.011596 2.105 0.0393 *
minority 0.217060 0.032441 6.691 7.35e-09 ***
highschool -0.024877 0.005674 -4.384 4.59e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.256 on 62 degrees of freedom


Multiple R-squared: 0.5306, Adjusted R-squared: 0.5079
F-statistic: 23.36 on 3 and 62 DF, p-value: 3.088e-10

Based on the p-values in the table of t-statistics, we conclude that all of the
explanatory variables have statistically significant association with crime at the
five-percent level.
This could also be phrased: we have statistically significant evidence, at the
five-percent level, that each of the βj (tested individually) is nonzero in the model.
This can also be phrased: we have statistically significant evidence that
eliminating any one variable Xj from the model leads to a model with less accurate
predictions.
This can also be phrased: we have statistically significant evidence that the model
(1) fits the data better than any of the corresponding two-variable models.
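The "fits better than the two-variable model" phrasing corresponds to an explicit comparison of nested models. A sketch in R (assuming the data are loaded and transformed as above): for dropping a single coefficient, the F-statistic from the model comparison equals the square of that variable's t-statistic, so the two tests give the same p-value.

```r
# The t-test for poverty is equivalent to the F-test comparing the
# two-variable model (poverty removed) against the full model (1).
full    = lm(crime ~ poverty + minority + highschool)
reduced = lm(crime ~ minority + highschool)

# The F statistic in this table equals the squared t statistic for
# poverty in summary(full), and the p-values agree.
anova(reduced, full)
```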
Note: It is actually better not to use one sentence to describe all three tests,
because technically there is a multiple testing problem. I won’t worry too much
about that though. The other advantage of writing a sentence for each test is that
then you can refer to the variables by name. I’m too lazy to type out each of the
phrasings above for each of the three tests, so I’ll do a sample of this better answer:
(Ideal solution, with some variation in the phrasing:)
• The t-test of the hypothesis H0 : β2 = 0 against the full model in (1) results
in a very small p-value, indicating that we have statistically significant
evidence that removing minority from the model results in a worse
model fit.
• The t-test of the hypothesis H0 : β1 = 0 against the full model in (1) results
in a small p-value of about 0.04, indicating that the crime rate and the
poverty rate have a statistically significant association after adjusting for
the effects of minority percentage and highschool education demographics.
(Note: the word ‘effects’ is dangerous, but above it was less inappropriate
because I did not talk about the statistically significant effect of poverty; it
is okay to use the word effect because it is already in the statistical jargon,
i.e. “main effects,” but don’t use it as the main object of a sentence, because
then you sound like you are saying something causal.)
• The t-test of the hypothesis H0 : β3 = 0 against the full model in (1)
results in a very small p-value, indicating strong statistical evidence that
highschool graduation demographics are useful for predicting the crime rate,
even after accounting for poverty and minority demographics.
More Notes on this:
It is even better to talk about linear association and linear predictive power,
but if you tell the reader you fit a linear model, this is understood, and at
some point it is okay to save words.
It is not appropriate to say that any test provides evidence that variable X has
an effect on the crime rate. This phrasing suggests causation, and the only way to
establish causation is to know that your model is a valid causal model. This is only
possible if you make stronger assumptions than we like to make, or if you run an
experiment, which this is not.
It is also undesirable to talk about the association between variable X and the
crime rate without mentioning that you are accounting for the other variables in (1)
because the association depends on the model. As you saw in the lab, for instance,
highschool does not have a statistically significant association with the crime rate
in the marginal model Y = α + β3 X3 + , but it does have a statistically significant
association after accounting for the other variables X1 and X2 .
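This contrast between marginal and adjusted associations is easy to reproduce directly. A sketch, assuming the data are loaded and transformed as above:

```r
# Marginal model: highschool alone. The t-test here answers a different
# question than the t-test for highschool in the full model.
summary(lm(crime ~ highschool))

# Full model: the t-test for highschool now adjusts for poverty and
# minority, and can reach the opposite conclusion.
summary(lm(crime ~ poverty + minority + highschool))
```

Both outputs are valid answers to different questions; neither contradicts the other.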

Some other tests and how to phrase them


You were asked to perform simple linear regressions and test the hypothesis
βj = 0 for these models, where there is only one Xj as an explanatory variable.
The most interesting case was X3, the highschool variable. The p-value was large,
close to one half.
So the model is

Y = α + β3 X3 + ε    (2)
Sample Explanation of this Test: Based on the t-test of H0 : β3 = 0 against the
alternative (2), we obtain a large p-value, so we do not reject the null hypothesis.
In other words, the data do not give sufficient evidence to conclude that there is a
nonzero linear association between the highschool demographics of a state and its
crime rate. (This part is optional, but helpful given that we did a similar-looking
test and came to the opposite conclusion earlier.) Note that in this case, we are
finding insufficient evidence to conclude there is a nonzero association marginally,
i.e. without adjusting for the association of crime rate with minority and poverty
demographics.
It is not correct to say something to the effect of “we are testing H0 under the
assumption that poverty and minority do not effect the crime rate.” This is in-
correct for two reasons: (1) the word ‘effect’ is always risky; this isn’t a causal
model, so it doesn’t make sense to talk about assuming cause-and-effect type re-
lationships or the lack thereof, and (2) even if you change ‘effect’ to ‘association,’
this is still wrong. It is perfectly okay for both models (1) and (2) to be correct,
even if β1, β2 ≠ 0 in (1). This just means that the models are different. The value
of β3 in the two models might not be the same, and the value of ε will change. The
errors actually get bigger in the simple model if β1, β2 ≠ 0 in the full model; this is
why it is appropriate to talk about the predictions getting worse, or the model fit
getting worse, when discussing F-tests.
The point here is that the fact that the model fit gets worse does not mean that the
simple linear regression model is wrong, or that in performing it we are somehow
assuming that other variables don’t matter. We are just ignoring those other vari-
ables, and letting their associations show up in the value of β3 from the simple
regression, and also in the error term. The book has discussion about this on page
110—the relationships involved in our statistical tests are what he calls ‘empirical
associations,’ and it is perfectly okay for apparently contradictory empirical models
to all be valid. So, in your discussion of a simple linear regression, don’t say things
like “we are assuming that other variables don’t matter.” This is not true, if we
take the empirical point of view rather than the structural point of view, and the
empirical point of view is the only one that can be supported by pure statistics
(except in an experimental context).

An ANOVA Interaction Test Example


CODE:

interlockDat =
read.table("http://socserv.socsci.mcmaster.ca/jfox/Books/
Applied-Regression-2E/datasets/Ornstein.txt", header = T)
attach(interlockDat)

anovaModel = lm(interlocks ~ sector*nation)


anova(anovaModel)

OUTPUT:

Analysis of Variance Table

Response: interlocks
Df Sum Sq Mean Sq F value Pr(>F)
sector 9 20263 2251 13.0872 < 2.2e-16 ***
nation 3 4125 1375 7.9917 4.439e-05 ***
sector:nation 16 1823 114 0.6624 0.829
Residuals 219 37675 172

Recall that in the interlock data set, we have the economic sector, assets, and
nation of ownership for various firms operating in Canada. As in the preceding
dataset, it is debatable whether this data should be treated as a random sample,
but again even a wrong model can tell us something.
The ANOVA table above gives F -tests for a model with just an intercept against
one accounting for sector, and then for a model accounting for both nation and
sector against one accounting for just sector, and finally for the full two-way model
with interactions against the additive model.
The F -test of interest is the one in the last row, testing whether the interactions
are statistically significant. The p-value is quite high, so we do not reject the null
hypothesis. In other words, we do not have enough evidence to conclude that the
model with interactions fits the data any better than the model without interactions.
Hence, we conclude that it is appropriate to model the effects of sector and nation as
additive. (Here again I used the word effects, but again I was using it in a situation
where it is statistical jargon and will be understood by informed readers. I did NOT
say that some variable has an effect, which seems to suggest causation; I simply
referred to ‘the effects’ of two variables. The distinction is subtle, but important if
you want to avoid misleading statements.)
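The interaction F-test in the last row of the ANOVA table is the same test as an explicit comparison of the additive model against the model with interactions. A sketch, assuming the Ornstein data are attached as above:

```r
# Fit the additive model and the model with sector-by-nation interactions.
additive = lm(interlocks ~ sector + nation)
withInt  = lm(interlocks ~ sector * nation)

# This comparison reproduces the F-test in the sector:nation row of the
# sequential ANOVA table, since the interaction is the last term entered.
anova(additive, withInt)
```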
