
STATS 330 / STATS 762

Mock Midterm Test 1


SC 2015

Instructions
Answer all questions.

The test is worth 20% of your grade.

You have 1 hour.

1. In data cleaning, which attributes describe data as inconsistent? [4 marks]

2. In a study, the percentage of body fat of 50 male workers was measured, and their daily
schedule of work hours and exercise hours was collected. The data are summarised in the
following trellis plot. Describe and interpret the plot. [6 marks]

[Trellis plot: Body Fat (%) on the vertical axis (5 to 30) against Work (h) on the horizontal axis (6 to 14), in six panels conditioned on exercise.]

3. Describe the concept of a confounding variable in multiple linear regression. [4 marks]

4. An insurance company is interested in predicting the cost of insurance claims from the age,
experience, and amount of sleep of the driver involved. A linear model was fitted;
its output is shown below. Interpret all hypotheses involved and give an interpretation
of the outcome. [8 marks]

> summary(lm(cost~age+experience+sleep,data=insure.df))
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1005.8148    32.2365  31.201   <2e-16 ***
age            0.2005     4.1410   0.048    0.962
experience    -2.5843     1.9939  -1.296    0.201
sleep         -2.2768     2.6040  -0.874    0.386
---
Residual standard error: 41.98 on 46 degrees of freedom
Multiple R-squared: 0.6742, Adjusted R-squared: 0.6529
F-statistic: 31.73 on 3 and 46 DF, p-value: 2.865e-11

> diag(solve(cor(insure.df[-1])))
      age experience      sleep
61.074171  61.070165   1.001787

5. What do we mean by over-fitting? What is the consequence of over-fitting? [4 marks]

6. A research group compared the effect of a new drug to reduce blood pressure against a
placebo. Over 4 weeks, 50 patients were administered a placebo and 50 patients received
the drug. Further, previous research had indicated that the method of administration
might have an effect too. To incorporate this, within each drug group 25 patients
received the drug orally and 25 received it via injection. The measured response is the
difference in systolic blood pressure before and after treatment. Below you find the
anova output for the study. Which model is the most adequate? Did the researchers
find evidence that the method of administration has an effect? [4 marks]

> anova(lm(bpd~drug.*meth.,data=blood.df))
Analysis of Variance Table

Response: bpd
            Df  Sum Sq Mean Sq F value    Pr(>F)
drug.        1 2422.81 2422.81 95.0174 5.182e-16 ***
meth.        1   65.31   65.31  2.5611    0.1128
drug.:meth.  1    0.84    0.84  0.0329    0.8565
Residuals   96 2447.87   25.50
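
Note on reading the table: the following is a minimal sketch, not part of the original test, of how the columns of the ANOVA table above relate to each other. The sums of squares and degrees of freedom are copied from the printed output. The variable names (sum_sq, res_ms, and so on) are illustrative assumptions. Because the printed values are rounded, the recomputed figures should match the table only up to rounding.

```r
# Values copied from the printed ANOVA table (rounded as printed).
df     <- c(drug = 1, meth = 1, interaction = 1)
sum_sq <- c(drug = 2422.81, meth = 65.31, interaction = 0.84)
res_df <- 96
res_ss <- 2447.87

mean_sq <- sum_sq / df        # Mean Sq = Sum Sq / Df
res_ms  <- res_ss / res_df    # residual mean square
f_value <- mean_sq / res_ms   # F value = Mean Sq / residual Mean Sq

# Upper-tail probability of the F distribution on (Df, residual Df).
p_value <- pf(f_value, df, res_df, lower.tail = FALSE)

print(round(cbind(mean_sq, f_value, p_value), 4))
```

Each row of the sequential table is tested against the same residual mean square, so the three F values share the denominator res_ms.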
