BES Tutorial Sample Solutions, S2 2010

This document will be posted on the BES website with one week's delay.
WEEK 10 TUTORIAL EXERCISES (To be discussed in the week starting
September 27)
1. State whether the normal distribution, t distribution or neither would be
used to test hypotheses regarding the population mean in the following
situations:

(a) Population normally distributed, $\sigma^2$ unknown, sample size less than
30.
t distribution.

(b) Population normally distributed, $\sigma^2$ unknown, sample size greater
than 30.
t distribution, although as the sample size gets very large this effectively
becomes the same as using the normal.

(c) Population normally distributed, $\sigma^2$ known, sample size less than 30.
Normal distribution.

(d) Population not normally distributed, $\sigma^2$ unknown, sample size greater
than 30.
Because the sample size is large you can invoke the CLT and use the fact
that $s^2$ is a consistent estimator of $\sigma^2$ to justify using the normal
distribution.

(e) Population not normally distributed, $\sigma^2$ unknown, sample size less
than 30.
Here the sampling distribution is unknown and hence we don't know how to
test a hypothesis about $\mu$ in this circumstance. In practice you could either
assume the population is approximately normally distributed and proceed
as in (a); or alternatively invoke the CLT and proceed as in (d). How well
either of these solutions works ultimately depends on the (unknown) extent
of non-normality of the population distribution.
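The decision rules in Question 1 can be summarised as a small lookup. The sketch below is only an illustration: the function name `choose_distribution` is hypothetical, and treating exactly n = 30 as a large sample is an assumption, since the question only covers "less than 30" and "greater than 30".

```python
def choose_distribution(population_normal: bool, sigma_known: bool, n: int) -> str:
    """Return 'normal', 't' or 'neither' for a test about the population mean."""
    # Assumption: n = 30 itself is counted as a large sample.
    large_sample = n >= 30
    if population_normal:
        # Known variance gives a normal test statistic; unknown variance gives t.
        return "normal" if sigma_known else "t"
    if large_sample:
        # CLT: the sample mean is approximately normal, and s^2 is a
        # consistent estimator of sigma^2 when sigma^2 is unknown.
        return "normal"
    # Small sample from a non-normal population: sampling distribution unknown.
    return "neither"


# The five situations (a)-(e): (population normal?, sigma known?, sample size)
situations = {
    "a": (True, False, 20),
    "b": (True, False, 50),
    "c": (True, True, 20),
    "d": (False, False, 50),
    "e": (False, False, 20),
}
for label, args in situations.items():
    print(label, choose_distribution(*args))
```

The sample sizes 20 and 50 are arbitrary stand-ins for "less than 30" and "greater than 30".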

2. Reconsider Question 2 of the Week 9 exercises. In that exercise, a real
estate expert claimed the current mean value of houses in a particular area
was more than $250,000. A random sample of 150 recent sales prices in
the area yielded a sample mean of $265,000 and it is known that house
values in the area are approximately normally distributed with a standard
deviation of $50,000.
(a) If in fact the population mean house value in the area is $260,000,
what is the probability of committing a type II error in performing an
upper tail test of the null hypothesis that the mean house value price
in the area is $250,000, as in Question 1 part (a) of the Week 9
exercises? What is the power of the test in these circumstances?
State in words what the power of the test means.

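Let X = value of a house in the area.

$\bar{x} = \$265,000$, $\sigma = \$50,000$, $n = 150$, $X \sim N(\mu, \sigma^2)$

$H_0: \mu = 250,000$; $H_1: \mu > 250,000$

Rejection region:

$$\bar{X} > \mu_0 + z_{0.05}\frac{\sigma}{\sqrt{n}} = 250,000 + 1.645 \times \frac{50,000}{\sqrt{150}} = 256,715.68$$

Thus Type II error (probability of not rejecting $H_0$ when it is false):

$$\beta = P(\bar{X} < 256,715.68 \mid \mu = 260,000) = P\left(Z < \frac{256,715.68 - 260,000}{50,000/\sqrt{150}}\right) = P(Z < -0.8) = 0.2119$$

$$\text{Power} = 1 - \beta = 0.7881$$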

The power of the test gives the probability of correctly rejecting the null
hypothesis when it is false.
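As a numerical check of part (a), the following sketch recomputes the critical value, the type II error probability and the power using Python's standard library. The significance level of 0.05 is an assumption inferred from the critical value 1.645 used in the solution; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 250_000      # mean under the null hypothesis
mu1 = 260_000      # true mean assumed in part (a)
sigma = 50_000     # known population standard deviation
n = 150            # sample size
alpha = 0.05       # assumption: implied by z = 1.645 in the solution

std_error = sigma / sqrt(n)

# Upper-tail test: reject H0 when the sample mean exceeds this value.
critical_value = mu0 + NormalDist().inv_cdf(1 - alpha) * std_error

# Type II error: the sample mean falls below the critical value
# even though the true mean is mu1.
beta = NormalDist(mu1, std_error).cdf(critical_value)
power = 1 - beta

print(f"critical value = {critical_value:.2f}")
print(f"beta = {beta:.4f}, power = {power:.4f}")
```

The results differ slightly in the third decimal place from the hand calculation, because the solution rounds z to 1.645 and then to -0.8 before using the normal table.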

(b) Illustrate your answer to part (a) above by showing on a diagram the
areas representing the probability of a type II error and the power of
the test.

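[Diagram: sampling distributions of $\bar{X}$ under $H_0$ ($\mu = 250,000$) and
under $\mu = 260,000$, with the critical value 256,715.68 marked. The area
under the $\mu = 260,000$ curve to the left of the critical value is $\beta$,
the probability of a type II error; the area to its right is the power,
$1 - \beta$.]

3. A company running an urban rail service wishes to estimate its daily
average number of late running trains on week days. For 10 randomly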
selected week days, it finds the following numbers of late running trains:
32, 10, 9, 18, 25, 15, 14, 18, 22, 16
(a) Assuming the number of late running trains on a weekday is
approximately normally distributed, calculate a 90% confidence
interval for the mean number of late running trains on a week day.

Let X = number of late trains on a weekday.

$\alpha = 0.1$, $\bar{x} = 17.9$, $s^2 = 48.32$, $s = 6.9514$

Since $\sigma^2$ is unknown, n is small and the underlying distribution is
normal, we construct the confidence interval using the t distribution.

Required interval is

$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} = 17.9 \pm t_{0.05,\,9}\frac{6.9514}{\sqrt{10}} = 17.9 \pm 1.833 \times \frac{6.9514}{\sqrt{10}} = 17.9 \pm 4.029 = (13.871, 21.929)$$

(b) If we did not have the assumption of normality, could we still
calculate a confidence interval in this example? If not, suggest a way
of overcoming this problem.

Everything else the same, we could not construct a confidence interval in the
same way as in (a) since the t distribution is only valid if the underlying
distribution is normal. This problem could be overcome by obtaining a larger
sample size and then making use of the central limit theorem (and replacing
$\sigma$ by s).
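The 90% interval from part (a) can be reproduced from the raw data. This is a minimal sketch using only the standard library; since the standard library has no t distribution, the critical value 1.833 is simply taken from the solution (tabulated t with 9 degrees of freedom and 0.05 in the upper tail) rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

late_trains = [32, 10, 9, 18, 25, 15, 14, 18, 22, 16]

# Assumption: tabulated critical value t_{0.05, 9}, as used in the solution.
t_crit = 1.833

n = len(late_trains)
x_bar = mean(late_trains)
s = stdev(late_trains)          # sample standard deviation (divisor n - 1)

margin = t_crit * s / sqrt(n)   # half-width of the interval
lower, upper = x_bar - margin, x_bar + margin

print(f"x_bar = {x_bar:.1f}, s^2 = {s**2:.2f}, s = {s:.4f}")
print(f"90% CI = ({lower:.3f}, {upper:.3f})")
```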

4. Reconsider Question 5 of the Week 8 exercises. Would normality be a good
approximation for the population distribution of distance traveled by used
passenger cars? (Hint: look at the summary statistics and a histogram.) Do
you need to assume normality? Redo the 95% confidence interval for the
population mean distance traveled by used passenger cars without assuming
a known population standard deviation.

EXCEL summary statistics and histogram for distance traveled indicate
non-normality. The distribution is skewed to the right, the median is much
less than the mean, and the sample mean is only 1.35 standard deviations
from zero:

Odometer (km)
Mean                    78560.83
Standard Error           5384.86
Median                     67980
Mode                      147000
Standard Deviation      58246.19
Sample Variance       3392618896
Kurtosis                   3.426
Skewness                   1.528
Range                     315597
Minimum                      403
Maximum                   316000
Sum                      9191617
Count                        117

[Figure: Frequency histogram for odometer readings for cars in Anzac Garage
data. Horizontal axis: Odometer (kms), 20,000 to 300,000; vertical axis:
Frequency, 0 to 45.]


While the population distribution seems non-normal, the sample size is large
enough to invoke the CLT and hence to assume the sample mean is
approximately normally distributed.
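The large-sample interval that this CLT argument justifies can be sketched from the EXCEL summary statistics alone. The code below is an illustration, not part of the original solution: it plugs the sample standard deviation in for the unknown population value and uses a normal critical value for 95% confidence.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics reported by EXCEL for the odometer readings
x_bar = 78560.83
s = 58246.19
n = 117

# How far the sample mean sits from zero, in standard deviations
mean_to_sd_ratio = x_bar / s

# 95% interval using the normal critical value, with s in place of sigma
z_crit = NormalDist().inv_cdf(0.975)
std_error = s / sqrt(n)
margin = z_crit * std_error
lower, upper = x_bar - margin, x_bar + margin

print(f"mean / sd = {mean_to_sd_ratio:.2f}")
print(f"standard error = {std_error:.2f}")
print(f"95% CI = ({lower:.0f}, {upper:.0f})")
```

The computed standard error should agree with the Standard Error row of the EXCEL output.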

In Question 5 of the Week 8 exercises we assumed $\sigma$ known, but here we
consider the more likely situation where it is unknown and we replace
$\sigma$ by s as calculated by EXCEL. The 95% confidence interval is given by

$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} = 78,561 \pm 1.96 \times \frac{58,246}{\sqrt{117}} = 78,561 \pm 10,554 = (68,007, 89,115)$$

5. It is known that 80% of people suffering from a particular disease are cured
by a certain medication. Calculate the probability that out of a random
sample of 400 people with the disease, less than 330 will be cured by using
the medication. (Hint: Use the normal approximation and ignore continuity
correction).
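$p = 0.8$, $n = 400$ and

$$P(X < 330) = P\left(\hat{p} < \frac{330}{400}\right) = P(\hat{p} < 0.825)$$

With $np = 320$ and $n(1 - p) = 80$ both large, we can use the normal
approximation to the binomial, i.e.

$$\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right) = N\left(0.8, \frac{0.8 \times 0.2}{400}\right)$$

So, ignoring the continuity correction:

$$P(\hat{p} < 0.825) = P\left(Z < \frac{0.825 - 0.8}{\sqrt{0.8 \times 0.2/400}}\right) = P(Z < 1.25) = 0.8944$$

(We could of course also work in terms of the binomial random variable X,
calculating $P(X < 330)$.)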
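The normal approximation in Question 5 can be checked numerically, alongside the exact binomial probability mentioned in the closing remark. This is a sketch with illustrative variable names; the exact sum is included only for comparison and is not part of the original solution.

```python
from math import comb, sqrt
from statistics import NormalDist

p = 0.8        # probability a patient is cured
n = 400        # sample size
cutoff = 330   # we want P(X < 330)

# Normal approximation for the sample proportion, no continuity correction
p_hat_cutoff = cutoff / n
std_error = sqrt(p * (1 - p) / n)
z_score = (p_hat_cutoff - p) / std_error
approx_prob = NormalDist().cdf(z_score)

# Exact binomial probability P(X < 330) = P(X <= 329), for comparison
exact_prob = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(cutoff))

print(f"z = {z_score:.2f}, normal approximation = {approx_prob:.4f}")
print(f"exact binomial = {exact_prob:.4f}")
```

The exact value comes out somewhat below the approximation, which is the gap a continuity correction would mostly close.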

6. A unisex hairdressing salon is interested in determining the proportion of its
clients who are male (p), as this will influence its advertising strategy. A
random sample of 100 of the salon's clients is taken and leads to the
calculation of a confidence interval for p of (0.6102, 0.7898).
(a) What is the value of the sample proportion on which the reported
confidence interval is based?

Since the confidence interval for the population proportion is always centered
around the point estimate, the sample proportion $\hat{p}$ is always the
midpoint, i.e.

$$\hat{p} = \frac{0.6102 + 0.7898}{2} = 0.7$$

(b) What level of confidence was used in the calculation of the reported
confidence interval?

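Assuming $\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right)$, then we have
(replacing p by $\hat{p}$):

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = (0.6102, 0.7898)$$

Thus the margin of error is
$0.7898 - 0.7 = 0.0898 = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, and

$$\sqrt{\frac{0.7 \times 0.3}{100}} = 0.0458$$

so that

$$z_{\alpha/2} = \frac{0.0898}{0.0458} = 1.96$$

implying $\alpha/2 = 0.025$ and hence $\alpha = 0.05$ or 5%, i.e. a 95% level of
confidence.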
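Both parts of Question 6 amount to running the confidence-interval formula backwards. The sketch below, with illustrative variable names, recovers the sample proportion, the critical value and the confidence level from the reported interval using only the standard library.

```python
from math import sqrt
from statistics import NormalDist

lower, upper = 0.6102, 0.7898   # reported confidence interval for p
n = 100                         # sample size

# (a) The interval is symmetric about the point estimate.
p_hat = (lower + upper) / 2
margin = (upper - lower) / 2

# (b) margin = z * sqrt(p_hat * (1 - p_hat) / n), so solve for z.
std_error = sqrt(p_hat * (1 - p_hat) / n)
z_crit = margin / std_error

# Two-sided confidence level implied by z_crit
confidence = 2 * NormalDist().cdf(z_crit) - 1

print(f"p_hat = {p_hat:.1f}, margin = {margin:.4f}")
print(f"z = {z_crit:.2f}, confidence level = {confidence:.2%}")
```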
