Sept 24, 2009
Lecture 6
REMIND Problem (Chatterjee S. & Hadi A.S., 2006): Consider the case of a company that markets and
repairs small computers. To study the relationship between the length of a service call and the number
of electronic components in the computer that must be repaired or replaced, a sample of records on
service calls was taken. The data consist of the length of service calls in minutes (the response variable)
and the number of components repaired (the predictor variable). (See the data in Lecture 4.)
PREVIOUS LECTURE IN CLASS
Construct a simple linear regression model (Chatterjee S. & Hadi A. S., Sections 2.5, 2.7)
a. OLS estimators for the regression model: $\hat{Y}_i = 4.15 + 15.51\,X_i$
b. Calculate the coefficient of determination ($R^2$) to interpret the relationship: $R^2 = 0.9874$
c. Confidence intervals for $\beta_0$ and $\beta_1$:
$P(-3.160103 < \beta_0 < 11.46010) = 0.95$; $P(14.40975 < \beta_1 < 16.61025) = 0.95$
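The estimates in item a can be checked directly from the fourteen service-call records tabulated later in these notes. The following is a minimal sketch using only the Python standard library; the variable names (`units`, `minutes`, `beta0_hat`, and so on) are illustrative and not from the lecture. Without intermediate rounding the intercept comes out close to 4.16; the 4.15 quoted above is what results when the slope is first rounded to 15.51.

```python
import math

# Data as listed in the lecture's table: units repaired (X), call length in minutes (Y).
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar = sum(units) / n
y_bar = sum(minutes) / n

# Corrected sum of squares (SXX) and sum of cross-products (SXY).
sxx = sum((x - x_bar) ** 2 for x in units)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes))

# OLS estimators for the slope and the intercept.
beta1_hat = sxy / sxx
beta0_hat = y_bar - beta1_hat * x_bar

# Residual standard error on n - 2 degrees of freedom.
sse = sum((y - (beta0_hat + beta1_hat * x)) ** 2 for x, y in zip(units, minutes))
sigma_hat = math.sqrt(sse / (n - 2))

print(f"SXX = {sxx:.0f}, SXY = {sxy:.0f}")
print(f"beta1_hat = {beta1_hat:.4f}, beta0_hat = {beta0_hat:.4f}")
print(f"sigma_hat = {sigma_hat:.4f}")
```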
In this lecture: We will answer some questions based on the regression model that we get.
(Chatterjee S & Hadi A. S, Section 2.6, 2.8, 2.9)

$\hat{Y}_i = 4.15 + 15.51\,X_i$


QUESTION 1: Does the length of service call depend on the number of computer units?
QUESTION 2: Can we expect the increase in service time for each additional unit to be repaired
to be 16 minutes? Do the data support this conjecture?
QUESTION 3: Can I say what the length of the service call will be if the customer calls
regarding 9 computer units? And what is the confidence interval for this value with
confidence coefficient (1-α)?
QUESTION 4: What if one calls for 18 computer units? How long will the service call be? And
what is the 95% confidence interval for that new observation?
QUESTION 5: What are the tools to examine the quality of fit?





WHAT KIND OF QUESTIONS CAN WE ANSWER BY USING THIS SIMPLE LINEAR REGRESSION MODEL?
1) Test the dependence between X and Y.
The simple linear regression model is given by $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$. Testing the dependence
between X and Y is equivalent to testing the null hypothesis $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$. If we
fail to reject $H_0$, the data give no evidence of a linear relationship between X and Y.
Otherwise, if we reject $H_0$ in favor of $H_1: \beta_1 \neq 0$, the response depends on the predictor.

QUESTION 1: Does the length of service call depend on the number of computer units?
ANSWER: Use t-test for the regression coefficients.
Step 1: Construct the hypothesis
$H_0: \beta_1 = 0$ (There is no relationship between X and Y)
$H_1: \beta_1 \neq 0$ (There is a relationship between X and Y, i.e. Y depends on X)
Step 2: Fix the α value and obtain the t-critical value
α value: α = 0.01 (chosen arbitrarily by the researcher)
t-critical value: t(0.995, 12) = 3.054540
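Step 3: Calculation of the t-test statistic
\[ t = \frac{\hat{\beta}_1 - \beta_1}{\mathrm{s.e.}(\hat{\beta}_1)} \]
where $\beta_1$ takes its value under $H_0$.
\[ \hat{\beta}_1 = \frac{\mathrm{SXY}}{\mathrm{SXX}} = \frac{1768}{114} = 15.51 \]
\[ \mathrm{s.e.}(\hat{\beta}_1) = \frac{\hat{\sigma}}{\sqrt{\mathrm{SXX}}} = \frac{5.3917}{\sqrt{114}} = 0.504979 \]
\[ t = \frac{15.51 - 0}{0.504979} = 30.71415 \]
t.calc = 30.71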
Step 4: Decision
t.calc = 30.71 > t(0.995, 12) = 3.054540, or p.val = 4.454215e-13 < α/2 = 0.005.
H0 is rejected. At the 1% significance level (99% confidence) we can say that there is a significant
relationship between the length of a service call and the number of computer components.
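The four steps of Question 1 can be retraced numerically. This is a small sketch under stated assumptions: the summary values (SXY, SXX, the residual standard error, and the critical value t(0.995, 12)) are copied from the lecture rather than recomputed, and the helper name `slope_t_statistic` is hypothetical. Because the slope is not rounded to 15.51 here, the statistic may differ from 30.71415 in the third decimal place.

```python
import math

# Summary values quoted in the lecture.
SXY = 1768.0
SXX = 114.0
SIGMA_HAT = 5.3917
T_CRIT = 3.054540  # t(0.995, 12), as quoted in Step 2 for alpha = 0.01


def slope_t_statistic(beta1_hat, beta1_null, se_beta1):
    """t statistic for H0: beta1 = beta1_null."""
    return (beta1_hat - beta1_null) / se_beta1


beta1_hat = SXY / SXX                  # OLS slope estimate
se_beta1 = SIGMA_HAT / math.sqrt(SXX)  # standard error of the slope
t_calc = slope_t_statistic(beta1_hat, 0.0, se_beta1)

# Two-sided decision rule: reject H0 when |t| exceeds the critical value.
reject_h0 = abs(t_calc) > T_CRIT

print(f"s.e.(beta1_hat) = {se_beta1:.6f}")
print(f"t.calc = {t_calc:.2f}, reject H0: {reject_h0}")
```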


2) How does a one-unit increase in X affect Y?


QUESTION 2: Can we expect the increase in service time for each additional unit to be repaired
is 16 minutes? Do the data support this conjecture?
ANSWER: Use t-test for the regression coefficients.
Step 1: Construct the hypothesis
$H_0: \beta_1 = 16$
(Each additional unit to be repaired adds 16 minutes to the length of the service call)
$H_1: \beta_1 \neq 16$ (Note that this is a two-sided hypothesis)
(The increase in the length of the service call per additional unit is not 16 minutes)
Step 2: Fix the α value and obtain the t-critical values
α = 0.05; t(0.975, 12) = 2.178813; t(0.025, 12) = -2.178813
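Step 3: Calculation of the t-test statistic and p-value
\[ t = \frac{\hat{\beta}_1 - \beta_1}{\mathrm{s.e.}(\hat{\beta}_1)} \]
where $\beta_1$ takes its value under $H_0$.
\[ \hat{\beta}_1 = \frac{\mathrm{SXY}}{\mathrm{SXX}} = \frac{1768}{114} = 15.51 \]
\[ \mathrm{s.e.}(\hat{\beta}_1) = \frac{\hat{\sigma}}{\sqrt{\mathrm{SXX}}} = \frac{5.3917}{\sqrt{114}} = 0.504979 \]
\[ t = \frac{15.51 - 16}{0.504979} = -0.9703374 \]
t.calc = -0.9703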
Step 4: Decision
t.calc = -0.9703 > t(0.025, 12) = -2.1788
OR
p-value = 0.1755154 > α/2 = 0.025 (since the test is two-sided, the one-tailed p-value is compared with α/2)
H0 is not rejected. The data are consistent with the conjecture that each additional unit to be
repaired increases the service time by 16 minutes, although failing to reject H0 does not prove it.
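A two-sided t-test at level α and the (1-α) confidence interval for the slope always give the same decision: H0 is rejected exactly when the hypothesized value falls outside the interval. The sketch below illustrates this for the conjectured 16 minutes, taking the slope estimate, its standard error, and the critical value from the lecture as given constants; the variable names are illustrative.

```python
# Values quoted in the lecture.
BETA1_HAT = 15.51
SE_BETA1 = 0.504979
T_CRIT = 2.178813   # t(0.975, 12), as quoted in Step 2 for alpha = 0.05
BETA1_NULL = 16.0   # conjectured increase per additional unit, in minutes

# Decision via the t statistic.
t_calc = (BETA1_HAT - BETA1_NULL) / SE_BETA1
reject_by_t = abs(t_calc) > T_CRIT

# Decision via the 95% confidence interval for the slope.
ci_low = BETA1_HAT - T_CRIT * SE_BETA1
ci_high = BETA1_HAT + T_CRIT * SE_BETA1
reject_by_ci = not (ci_low <= BETA1_NULL <= ci_high)

print(f"t.calc = {t_calc:.4f}, reject H0 by t-test: {reject_by_t}")
print(f"95% CI for beta1: ({ci_low:.5f}, {ci_high:.5f}), reject H0 by CI: {reject_by_ci}")
```

The interval reproduces the one reported in the previous lecture, and 16 lies inside it, which agrees with the t-test decision.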


QUESTION 3: Can I say what the length of the service call will be if the customer calls
regarding 9 computer units? And what is the confidence interval for this value with
confidence coefficient (1-α)?
ANSWER: Using the simple linear regression model, find the fitted value at the given value
of X, and construct the confidence interval for this fitted value.
min(Y)   units(X)   Y.fit ($\hat{Y}_i = 4.15 + 15.51\,X_i$)
23       1          19.66
29       2          35.17
49       3          50.68
64       4          66.19
74       4          66.19
87       5          81.70
96       6          97.21
97       6          97.21
109      7          112.72
119      8          128.23
149      9          143.74
145      9          143.74
154      10         159.25
166      10         159.25
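\[ \hat{y}_{11} = 143.74 \]
\[ P\left(\hat{y}_{11} \pm t_{(\alpha/2,\,n-2)} \times \mathrm{s.e.}(\hat{y}_{11})\right) = 1 - \alpha \]
\[ \mathrm{s.e.}(\hat{y}_{11}) = \hat{\sigma}\left[\frac{1}{n} + \frac{(x_{11} - \bar{x})^2}{\mathrm{SXX}}\right]^{1/2} \]
$x_{11} = 9$; $\bar{x} = 6$; SXX = 114; n = 14; $\hat{\sigma} = 5.3917$.
\[ \mathrm{s.e.}(\hat{y}_{11}) = 5.3917\left[\frac{1}{14} + \frac{(9 - 6)^2}{114}\right]^{1/2} = 2.090812 \]
\[ t_{(\alpha/2,\,n-2)} = t(0.025, 12) = 2.178813 \]
\[ P(143.74 \pm 2.1788 \times 2.090812) = 0.95 \]
\[ P(139.1845 < y_{11} < 148.2955) = 0.95 \]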
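The arithmetic of this interval can be verified with a few lines of Python. This is only a sketch: the coefficients, the residual standard error, and the critical value are copied from the lecture as constants, and the function name `mean_response_interval` is hypothetical.

```python
import math

# Summary values quoted in the lecture.
N = 14
X_BAR = 6.0
SXX = 114.0
SIGMA_HAT = 5.3917
BETA0_HAT = 4.15
BETA1_HAT = 15.51
T_CRIT = 2.178813  # critical value for a 95% interval with 12 degrees of freedom


def mean_response_interval(x0):
    """Fitted value at x0, its standard error, and the 95% interval limits."""
    fit = BETA0_HAT + BETA1_HAT * x0
    se = SIGMA_HAT * math.sqrt(1.0 / N + (x0 - X_BAR) ** 2 / SXX)
    half_width = T_CRIT * se
    return fit, se, fit - half_width, fit + half_width


fit, se, low, high = mean_response_interval(9)
print(f"fit = {fit:.2f}, s.e. = {se:.6f}")
print(f"95% interval: ({low:.4f}, {high:.4f})")
```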

QUESTION 4: What if one calls for 18 computer units? How long will the service call be? And
what is the 95% confidence interval for that new observation?
ANSWER: Find the predicted value for this new value of the predictor.
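\[ \hat{y}_{15} = 4.15 + 15.51 \times 18 \]
\[ \hat{y}_{15} = 283.33 \text{ min} \]
\[ \mathrm{s.e.}(\hat{y}_{15}) = \hat{\sigma}\left[1 + \frac{1}{n} + \frac{(x_{15} - \bar{x})^2}{\mathrm{SXX}}\right]^{1/2} = 5.3917\left[1 + \frac{1}{14} + \frac{(18 - 6)^2}{114}\right]^{1/2} = 8.238169 \]
\[ t_{(\alpha/2,\,n-2)} = t(0.025, 12) = 2.178813 \]
\[ P\left(\hat{y}_{15} \pm t_{(\alpha/2,\,n-2)} \times \mathrm{s.e.}(\hat{y}_{15})\right) = 1 - \alpha \]
\[ P(283.33 \pm 2.1788 \times 8.238169) = 0.95 \]
\[ P(265.3807 < y_{15} < 301.2793) = 0.95 \]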
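The prediction interval differs from the interval for a fitted value only by the extra 1 inside the square root, which accounts for the variability of a single new observation. A minimal sketch of the calculation follows, again treating the lecture's summary values as given constants; the function name `prediction_interval` is hypothetical.

```python
import math

# Summary values quoted in the lecture.
N = 14
X_BAR = 6.0
SXX = 114.0
SIGMA_HAT = 5.3917
BETA0_HAT = 4.15
BETA1_HAT = 15.51
T_CRIT = 2.178813  # critical value for a 95% interval with 12 degrees of freedom


def prediction_interval(x0):
    """Predicted value at x0, its standard error, and the 95% limits for a new observation."""
    pred = BETA0_HAT + BETA1_HAT * x0
    # The leading 1.0 is the variance contribution of the new observation itself.
    se = SIGMA_HAT * math.sqrt(1.0 + 1.0 / N + (x0 - X_BAR) ** 2 / SXX)
    half_width = T_CRIT * se
    return pred, se, pred - half_width, pred + half_width


pred, se, low, high = prediction_interval(18)
print(f"prediction = {pred:.2f} min, s.e. = {se:.6f}")
print(f"95% prediction interval: ({low:.4f}, {high:.4f})")
```

Because 18 lies well outside the observed range of 1 to 10 units, the $(x_{15} - \bar{x})^2$ term is large and the interval is much wider than the one obtained at 9 units.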

QUESTION 5: What are the tools to examine the quality of fit?
ANSWER: The important tools are,
1) t-test for regression coefficients: The larger the absolute t value, or the smaller the p-value, the
stronger the relationship between X and Y.
2) Correlation coefficient between Y and X: $\mathrm{Corr}(Y, X) = \dfrac{\mathrm{SXY}}{\sqrt{(\mathrm{SXX})(\mathrm{SYY})}}$
i. Corr(Y, X)<0 negative relationship
ii. Corr(Y, X)>0 positive relationship
iii. Corr(Y, X) closer to 1 or -1 stronger relationship
3) Examine the scatterplot of Y versus $\hat{Y}$: the closer the set of points to a straight line, the stronger
the relationship.
4) Coefficient of determination ($R^2$): The proportion of the variability of the response explained by the predictor.
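Tools 2 and 4 are closely linked: in simple linear regression, $R^2$ equals the square of Corr(Y, X). The sketch below computes both from the service-call data listed in the Question 3 table, using only the Python standard library; the variable names are illustrative.

```python
import math

# Data as listed in the lecture's table: units repaired (X), call length in minutes (Y).
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar = sum(units) / n
y_bar = sum(minutes) / n

# Corrected sums of squares and cross-products.
sxx = sum((x - x_bar) ** 2 for x in units)
syy = sum((y - y_bar) ** 2 for y in minutes)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes))

# Correlation coefficient and coefficient of determination.
corr = sxy / math.sqrt(sxx * syy)
r_squared = corr ** 2

print(f"Corr(Y, X) = {corr:.4f}")
print(f"R^2 = {r_squared:.4f}")
```

A correlation close to 1 and an $R^2$ close to the 0.9874 reported earlier both point to a strong positive linear relationship.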
