
University of the Free State

Department of Mathematical Statistics and Actuarial Science


STS618 / STS718
First Semester Test: 3 March 2014

Total marks: 50
Time: 90 min

1. Question 1 [10]
A study was carried out in July 2010 to investigate the profile of soccer enthusiasm
in Bloemfontein. A random sample of 1000 adults in Bloemfontein was drawn.
Among other questions, participants were asked whether or not they had tickets
for at least one of the world cup matches. Participants were cross-classified
according to gender, and whether or not they had at least one world cup ticket.
Of the 329 males in the sample, 159 had a world cup ticket. Of the females, 210
had a world cup ticket.
Summarize the data in a 2 × 2 table and then answer the following questions
about soccer enthusiasm in Bloemfontein:

Table 1: Data
                Have world cup ticket
Gender          Yes       No        Total
Male            159       170       329
Female          210       461       671
Total           369       631       1000

(a) Estimate the probability that a male has a world cup ticket. Is this a
conditional, marginal or joint probability? [2]
Conditional probability: P(Have world cup ticket | male) = 159/329 = 0.483
(b) Estimate the probability that somebody who has no ticket is a female. Is
this a conditional, marginal or joint probability? [2]
Conditional probability: P(Female | Have no ticket) = 461/631 = 0.731
(c) Estimate the probability that an adult in Bloemfontein has a world cup
ticket. Is this a conditional, marginal or joint probability? [2]
Marginal probability: P(Have ticket) = 369/1000 = 0.369
(d) Estimate the probability of being female and having a world cup ticket. Is
this a conditional, marginal or joint probability? [2]
Joint probability: P(Female and Have ticket) = 210/1000 = 0.210
(e) Who was more likely to have a world cup ticket, males or females? Motivate
your answer. [2]
Males were more likely to have a world cup ticket. The probability of a male
having a world cup ticket is estimated as 159/329 = 0.483 (see answer to
question 1(a)); the conditional probability of having a world cup ticket given
that one is female is estimated as 210/671 = 0.313, which is less than 0.483.
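
The estimates in Question 1 can be checked with a short script. This is only an illustrative sketch: the counts are those of Table 1, while the helper `total` and the variable names are hypothetical and chosen here for readability.

```python
# Cell counts from Table 1, keyed by (gender, has ticket).
counts = {
    ("male", "yes"): 159, ("male", "no"): 170,
    ("female", "yes"): 210, ("female", "no"): 461,
}
n = sum(counts.values())  # total sample size, 1000


def total(gender=None, ticket=None):
    """Sum all cells that match the given gender and/or ticket status."""
    return sum(
        v for (g, t), v in counts.items()
        if (gender is None or g == gender) and (ticket is None or t == ticket)
    )


# Conditional probabilities divide by a row or column total.
p_ticket_given_male = total("male", "yes") / total(gender="male")
p_female_given_no_ticket = total("female", "no") / total(ticket="no")
p_ticket_given_female = total("female", "yes") / total(gender="female")

# Marginal and joint probabilities divide by the grand total.
p_ticket = total(ticket="yes") / n
p_female_and_ticket = total("female", "yes") / n

for name, p in [
    ("P(ticket | male)", p_ticket_given_male),
    ("P(female | no ticket)", p_female_given_no_ticket),
    ("P(ticket)", p_ticket),
    ("P(female and ticket)", p_female_and_ticket),
    ("P(ticket | female)", p_ticket_given_female),
]:
    print(f"{name} = {p:.3f}")
```

The denominators make the distinction visible: conditional estimates divide by a margin of the table, marginal and joint estimates by the grand total.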

2. Question 2 [7]
Consider the following Model 2 × 2 table.
Table 2.1: Model 2 × 2 Table
                    Characteristic B
Characteristic A    Present   Absent    Total
Present             n11       n12       n1+
Absent              n21       n22       n2+
Total               n+1       n+2       n

(a) Under cross-sectional sampling, specify the distribution of the cell frequencies n11, n12, n21 and n22. [1]
Multinomial distribution.
(b) Which total counts are considered fixed under cross-sectional sampling? [1]
Total count n.
(c) Under prospective sampling, specify the distribution of the cell frequency
n21. [1]
Binomial distribution: Binomial(n2+, π2)
(d) Which total counts are considered fixed under prospective sampling? [1]
Row totals n1+ and n2+ (and therefore also Total count n = n1+ + n2+).
(e) Under retrospective sampling, specify the distribution of the cell frequency
n12. [1]
Binomial distribution: Binomial(n+2, π2)
(f) Which total counts are considered fixed under retrospective sampling? [1]
Column totals n+1 and n+2 (and therefore also Total count n = n+1 + n+2).
(g) Specify the distribution of the cell frequency n21 when all marginal totals
are fixed. [1]
Hypergeometric distribution

3. Question 3 [4]
Before and during the 2010 World Cup the most convenient way to acquire tickets
for a World Cup match was via the internet, with credit card payment. Therefore, a study was carried out on 27 June 2010 to investigate whether credit card
ownership was associated with having a ticket for Germany's famous World Cup
victory over England (4:1 !!) in Bloemfontein. That is, the research question
was the following: Is an owner of a credit card more likely to have a ticket and
attend the Germany-England match than somebody who did not own a credit
card? In the stadium during the match, a sample of 100 adult spectators was
asked whether or not they owned a credit card. At the same time, but outside
the stadium in the Waterfront shopping centre, a control sample of 100 adults
was asked the same question.
Was this a prospective, a retrospective or a cross-sectional study? Motivate your
answer. [4]
This was a retrospective study. The explanatory variable is ownership of credit
card, and the outcome variable is attendance at the Germany-England match.
Two samples of fixed size (100 adults each) were taken for the two outcome
categories (100 adults attending the match and 100 adults not attending the
match), and then the presence or absence of the explanatory characteristic was
determined.

4. Question 4 [13]
Let π be the probability that a student registered for STK114 attends class.
A random sample of size n = 120 of students registered for STK114 is taken,
and the random variable X denotes the number of students in the sample who
are actually found to be in class. We assume that X follows the binomial
distribution. The probability that exactly X = n1 members of the sample are
observed to be in class is denoted by Bin(n1, n, π), and is given by the probability
function
\[
\mathrm{Prob}(X = n_1) = \mathrm{Bin}(n_1, n, \pi)
  = \binom{n}{n_1} \pi^{n_1} (1-\pi)^{n-n_1}
  = \frac{n!}{n_1!\,(n-n_1)!}\, \pi^{n_1} (1-\pi)^{n-n_1}
\]

Of the n = 120 students in the sample, n1 = 85 were found to be in class.


(a) Derive the maximum likelihood estimate for π. [5]
See Lecture Notes pages 26-27.
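
The lecture notes are not reproduced here. As a sketch of the standard argument, writing π for the attendance probability and ℓ(π) for the log-likelihood (notation assumed here, not taken from the notes), the estimate follows from setting the derivative of the log-likelihood to zero:

```latex
\begin{align*}
L(\pi) &= \binom{n}{n_1} \pi^{n_1} (1-\pi)^{n-n_1} \\
\ell(\pi) &= \log L(\pi)
  = \log\binom{n}{n_1} + n_1 \log \pi + (n-n_1)\log(1-\pi) \\
\frac{d\ell}{d\pi} &= \frac{n_1}{\pi} - \frac{n-n_1}{1-\pi} = 0
  \;\Longrightarrow\; n_1 (1-\pi) = (n-n_1)\,\pi
  \;\Longrightarrow\; \hat{\pi} = \frac{n_1}{n} \\
\frac{d^2\ell}{d\pi^2} &= -\frac{n_1}{\pi^2} - \frac{n-n_1}{(1-\pi)^2} < 0
\end{align*}
```

The negative second derivative confirms that the stationary point is a maximum. For the sample in this question the estimate is 85/120 ≈ 0.708.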
(b) We want to test the two-sided null-hypothesis H0: π = 0.8 against the
alternative HA: π ≠ 0.8. As test statistic we use the number n1 of students
in the sample who were in class. State the exact distribution of n1 under
the null-hypothesis. [2]
Binomial distribution Bin(n1, n, π0) = Bin(n1, 120, 0.8); see Lecture Notes
page 28.
(c) In order to test the above null hypothesis, derive the exact P-value, that is,
the probability, given the null-hypothesis, of observing X = n1 , or a more
extreme outcome than n1 . (State the results in terms of the formula for the
probability function of the distribution of n1 under the null-hypothesis; there is
no need to work out the actual P-value). [6]
See Lecture Notes page 28: Since π̂ = 85/120 ≈ 0.71 < π0 = 0.8, all counts
smaller than n1 = 85 are more extreme than n1. Thus the P-value is
\[
P = \sum_{i=0}^{n_1} \mathrm{Bin}(i, n, \pi_0)
  = \sum_{i=0}^{n_1} \frac{n!}{i!\,(n-i)!}\, \pi_0^{i} (1-\pi_0)^{n-i}
  = \sum_{i=0}^{85} \frac{120!}{i!\,(120-i)!}\, 0.8^{i}\, 0.2^{120-i}
\]
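
Although the question does not ask for the numerical value, the sum in the answer to 4(c) is easy to evaluate. The following is a minimal sketch using only the Python standard library; the function name `binom_pmf` is a hypothetical helper, and the sum is the lower-tail sum exactly as written in the answer, not a general two-sided procedure.

```python
from math import comb


def binom_pmf(i, n, p):
    """Binomial probability of exactly i successes in n trials."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


n, n1, pi0 = 120, 85, 0.8

# Probability under H0 of observing n1 = 85 or fewer students in class.
p_value = sum(binom_pmf(i, n, pi0) for i in range(n1 + 1))

print(f"P-value (sum over i = 0..{n1}) = {p_value:.4f}")
```

Summing over `range(n1 + 1)` includes the observed count itself, matching the definition of the P-value as the probability of the observed outcome or a more extreme one.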

5. Question 5 [16]
A cross-sectional study was carried out to investigate the attitude of students at
the UFS to compulsory class attendance. A random sample of students was drawn
from the total student population, and cross-classified according to whether or
not they were postgraduates, and whether or not they approved of compulsory
class attendance. The data for the sub-sample of students who study Actuarial
Science is as follows:
Table 5.1: Student seniority and approval of compulsory classes
                          Approve compulsory class attendance
Seniority                 Yes       No        Total
Postgraduate student      5         2         7
Undergraduate student     2         8         10
Total                     7         10        17

We want to test the null hypothesis H0: No association between the variables
"Seniority" and "Approve compulsory class attendance". As test statistic we
choose the frequency n11.
(a) What is the exact distribution of n11 under the null-hypothesis? [2]
Hypergeometric
(b) Why is it a good idea to carry out Fisher's exact test (instead of a chi-square
test)? Motivate your answer. [3]
Three of the four expected cell frequencies are smaller than 5. For example,
\[
m_{11} = \frac{7 \times 7}{17} = 2.88 < 5
\]
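
All four expected frequencies can be computed as (row total × column total) / n. The sketch below assumes the observed counts 5, 2, 2 and 8, which follow from the figures quoted in this question (5/7 postgraduates and 2/10 undergraduates approve, 17 students in total); the variable names are illustrative only.

```python
# Observed counts: rows = postgraduate, undergraduate; columns = yes, no.
observed = [[5, 2],
            [2, 8]]

row_totals = [sum(row) for row in observed]        # [7, 10]
col_totals = [sum(col) for col in zip(*observed)]  # [7, 10]
n = sum(row_totals)                                # 17

# Expected frequency under no association: m_ij = n_i+ * n_+j / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Count how many expected frequencies fall below the usual threshold of 5.
n_small = sum(m < 5 for row in expected for m in row)

for row in expected:
    print("  ".join(f"{m:.2f}" for m in row))
print(f"{n_small} of 4 expected frequencies are below 5")
```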

(c) Specify the 5 easy steps to test a null-hypothesis for Fisher's exact test
for a 2 × 2 table. [5]
i. Null-hypothesis: H0 : No association between row and column variable
ii. Collect data: The data in Table 2 above
iii. Test statistic: count n11 (say); exact distribution of n11: Hypergeometric
iv. Calculate P-value: Probability of observed n11 plus probability of all
counts in this cell of the table which are more extreme than the observed
count.
v. Reject H0 if P < 0.05.
(d) Carrying out Fisher's exact test for Table 5.1 above, the table probabilities
are as follows:
Table 5.2: Table Probabilities
            Table Cell
n11     n12     n21     n22     Probability
7       0       0       10      0.0001
6       1       1       9       0.0036
5       2       2       8       0.0486
4       3       3       7       0.2160
3       4       4       6       0.3779
2       5       5       5       0.2721
1       6       6       4       0.0756
0       7       7       3       0.0062

Calculate the exact two-sided P-value and the exact one-sided P-value for
testing the null-hypothesis. [3]
The two-sided P-value is given by the probability of the observed table (0.0486)
plus the probabilities of the more extreme tables, that is, tables associated
with probabilities that are smaller than 0.0486.
Thus P = 0.0486 + 0.0036 + 0.0001 + 0.0062 = 0.0585.
The one-sided P-value is the probability of the observed table plus the
probabilities of the tables that are more extreme in the same direction:
P = 0.0486 + 0.0036 + 0.0001 = 0.0523 (0.0522 when unrounded table
probabilities are summed).
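
The table probabilities and both P-values can be reproduced from the hypergeometric distribution. This is a sketch under stated assumptions: the margins (row total 7, column total 7, n = 17) and the observed count n11 = 5 are inferred from the figures quoted in this question, and `table_prob` is a hypothetical helper name. Because it sums unrounded probabilities, the results differ slightly in the fourth decimal from sums of the rounded table entries.

```python
from math import comb

row1, col1, n = 7, 7, 17   # first row total, first column total, grand total
observed_n11 = 5


def table_prob(k):
    """Hypergeometric probability that n11 = k with all margins fixed."""
    return comb(row1, k) * comb(n - row1, col1 - k) / comb(n, col1)


# All values of n11 that are compatible with the fixed margins.
support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
probs = {k: table_prob(k) for k in support}

p_obs = probs[observed_n11]

# Two-sided: every table at most as probable as the observed one
# (small tolerance guards against floating-point ties).
p_two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))

# One-sided: observed table plus tables with a larger n11.
p_one_sided = sum(p for k, p in probs.items() if k >= observed_n11)

for k in sorted(probs, reverse=True):
    print(f"n11 = {k}: {probs[k]:.4f}")
print(f"two-sided P = {p_two_sided:.4f}, one-sided P = {p_one_sided:.4f}")
```

With unrounded probabilities the two-sided value is about 0.0584 and the one-sided value about 0.0522, so both remain above 0.05.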
(e) What can you conclude from the data in Table 5.1 and from the result of
the hypothesis test? [3]
Considering the observed data, there is an apparent association between
seniority and approval of compulsory class attendance: 5/7 = 71% of postgraduate
students, but only 2/10 = 20% of undergraduate students approve
of compulsory class attendance. However, at the conventional significance
level of α = 0.05 we cannot reject the null hypothesis of no association between
seniority and approval of compulsory class attendance, since
P = 0.0585 > 0.05 = α. Thus the apparent association could have arisen
by chance, although the failure to reject the null hypothesis could also have
been caused by the small sample size.
