
CONFIDENTIAL

SQQS2013

UNIVERSITI UTARA MALAYSIA FINAL EXAMINATION SECOND SEMESTER 2009/2010 SESSION


CODE / COURSE NAME : SQQS2013 / APPLIED STATISTICS
DATE               : 1st MAY 2010
TIME               : 2.30 PM - 5.00 PM (2 1/2 HOURS)
VENUE              : DMS, TE, KYM, IKIP, PMI, NEGERI, KIA, KTB

INSTRUCTIONS:
1. This script book contains FOUR (4) questions in TWENTY (20) printed pages excluding the cover page.
2. A list of formulae and distributions is provided on pages THIRTEEN (13) to TWENTY (20).
3. Answer ALL questions in the SPACE provided.
4. Show all calculations (if relevant), and use FOUR (4) decimal places in your calculations.

MATRIC NO.        : _______________________________ (in words)
                    _______________________________ (in numbers)
IDENTITY CARD NO. :
LECTURER          : _____________________________
GROUP             :
TABLE NO.         :

PLEASE DO NOT OPEN THIS QUESTION BOOKLET UNTIL FURTHER INSTRUCTION IS GIVEN


QUESTION 1 (25 MARKS)


a) Tick the correct answer.

i) A numerical quantity computed from the data of a sample and used in reaching a decision on whether or not to reject the null hypothesis is referred to as: (1 mark)
[ ] significance level
[ ] critical value
[ ] test statistic
[ ] parameter

ii) In developing an 87.4% confidence interval estimate for a population mean, the value of z to use is (1 mark)
[ ] 1.15
[ ] 0.32
[ ] 1.53
[ ] 0.16

iii) Given a significance level of 4.4%, the critical value for testing that the proportion in population A is different from that in population B is (1 mark)
[ ] 2.12
[ ] 1.82
[ ] 2.00
[ ] 1.96

iv) When the p-value is found to be equal to 0.076, the result at the 0.05 significance level is (1 mark)
[ ] reject H0
[ ] fail to reject H0

v) In Levene's test of equality of variances, we proceed with the assumption that the variances of the two populations are equal when we (1 mark)
[ ] reject H0
[ ] fail to reject H0

vi) The manager of a cyber café claims that the mean daily revenue was $700 with a standard deviation of $70. A sample of 32 days reveals a mean daily revenue of $620. The test we would use is the (1 mark)
[ ] z-test
[ ] t-test

vii) To determine if the mean test score of English students, $\mu_E$, is higher than the mean test score of American students, $\mu_A$, the alternative hypothesis is (1 mark)
[ ] H1: $\mu_E$ [?] $\mu_A$
[ ] H1: $\mu_E < \mu_A$
[ ] H1: $\mu_E$ [?] $\mu_A$
[ ] H1: $\mu_E > \mu_A$

viii) For a left-tailed test of the difference between two means of independent populations, the alternative hypothesis for Levene's test of equality of variances is: (1 mark)
[ ] H1: $\sigma_1^2 = \sigma_2^2$
[ ] H1: $\sigma_1^2$ [?] $\sigma_2^2$
[ ] H1: $\sigma_1^2 < \sigma_2^2$
[ ] H1: $\sigma_1^2$ [?] $\sigma_2^2$

ix) If the lower limit of a confidence interval is [...], what is the upper limit for this interval? (1 mark)
[ ] 320
[ ] 340
[ ] 380
[ ] 400

x) If there are two unbiased estimators, the one whose variance is smaller is said to be relatively efficient. (1 mark)
[ ] True
[ ] False
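The z values behind parts (ii), (iii) and (vi) can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `two_sided_z` is illustrative and not part of the exam.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution: mean 0, sd 1


def two_sided_z(alpha):
    """Critical z value that leaves alpha/2 in each tail."""
    return std_normal.inv_cdf(1 - alpha / 2)


z_ci = two_sided_z(1 - 0.874)  # part (ii): 87.4% confidence interval
z_test = two_sided_z(0.044)    # part (iii): two-sided test at 4.4%

# Part (vi): z statistic for the revenue claim, using the stated
# population standard deviation of 70 and a sample of 32 days.
z_revenue = (620 - 700) / (70 / 32 ** 0.5)

print(round(z_ci, 2), round(z_test, 2), round(z_revenue, 4))
```

For part (iii) the computed value is close to 2.01, so the nearest listed option is 2.00.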

b) Mary, the owner of two laundry shops (Perfect Laundry and Best Laundry), would like to determine the number of complaints due to dissatisfaction with her laundry services. Customer satisfaction is the key to success. She has therefore decided that a week with at most 5 complaints counts as a success for the shop; otherwise the week counts as a failure. The numbers of complaints over 45 weeks for her two laundry shops are given in Table 1.

Table 1

Perfect Laundry
Week  Number of complaints    Week  Number of complaints    Week  Number of complaints
  1            4               16            0               31            0
  2            2               17            3               32            0
  3            0               18            2               33            4
  4            7               19            6               34            5
  5            6               20            5               35            5
  6            0               21            1               36            7
  7            6               22            0               37            5
  8            4               23            1               38            0
  9            2               24            0               39            5
 10            6               25            4               40            5
 11            5               26            1               41            4
 12            2               27            6               42            5
 13            5               28            0               43            6
 14            1               29            4               44            2
 15            0               30            6               45            1

Best Laundry
Week  Number of complaints    Week  Number of complaints    Week  Number of complaints
  1            8               16            0               31            0
  2            2               17            2               32            2
  3            1               18            2               33            8
  4            4               19            6               34            6
  5            6               20            5               35            6
  6            7               21            6               36            7
  7            6               22            4               37            5
  8            0               23            6               38            3
  9            7               24            6               39            7
 10            2               25            2               40            8
 11            1               26            4               41            0
 12            3               27            6               42            5
 13            5               28            5               43            6
 14            2               29            3               44            3
 15            0               30            3               45            7

i) Construct a 90% confidence interval for the difference in the proportion of weeks with failure status between Perfect Laundry and Best Laundry. (5 marks)

-1m, -1m, -1m (for 1.6449), -1m

ii) Based on your answer in (i), does Mary have significant evidence to conclude that there is no difference in the proportion of weeks with failure status between her two laundry shops? Give your reason. (2 marks)

0 is not in the interval. -1m
At the 90% confidence level, there is not enough evidence to conclude that there is no difference in the proportion of weeks with failure status between the two laundries. -1m
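As a cross-check of parts (i) and (ii), the sketch below counts the failure weeks (more than 5 complaints) in Table 1 and builds the 90% interval with z = 1.6449. The list and function names are illustrative, and the unpooled standard error is an assumption, since the working is not legible in the marking scheme.

```python
from math import sqrt

# Weekly complaint counts transcribed from Table 1 (weeks 1 to 45).
perfect = [4, 2, 0, 7, 6, 0, 6, 4, 2, 6, 5, 2, 5, 1, 0,
           0, 3, 2, 6, 5, 1, 0, 1, 0, 4, 1, 6, 0, 4, 6,
           0, 0, 4, 5, 5, 7, 5, 0, 5, 5, 4, 5, 6, 2, 1]
best = [8, 2, 1, 4, 6, 7, 6, 0, 7, 2, 1, 3, 5, 2, 0,
        0, 2, 2, 6, 5, 6, 4, 6, 6, 2, 4, 6, 5, 3, 3,
        0, 2, 8, 6, 6, 7, 5, 3, 7, 8, 0, 5, 6, 3, 7]


def failure_proportion(counts, limit=5):
    """Share of weeks whose complaint count exceeds the limit."""
    return sum(c > limit for c in counts) / len(counts)


p1 = failure_proportion(perfect)  # Perfect Laundry
p2 = failure_proportion(best)     # Best Laundry

z = 1.6449  # z value for 90% confidence, as quoted in the marking note
se = sqrt(p1 * (1 - p1) / len(perfect) + p2 * (1 - p2) / len(best))
lower = (p1 - p2) - z * se
upper = (p1 - p2) + z * se

print(p1, p2, round(lower, 4), round(upper, 4))
```

Under these assumptions the interval is roughly (-0.3551, -0.0449), which excludes 0, in line with the marking note for (ii).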

iii) For nearly 10 years, Perfect Laundry had a good record in its services, with at most 10% failures for every 45 weeks the laundry operated. Can we conclude that Perfect Laundry's record remains the same now? Do an appropriate test at the 5% significance level. (6 marks)

H0: p [...] versus [...] -2m
[...] -1m
Reject H0 -1m
[...] -1m
The Perfect Laundry achievement has changed now. -1m

iv) If the significance level changes to 1%, is there any change in your conclusion in (iii)? Give your reason. (2 marks)

[...], fail to reject H0 -1m
The conclusion in (iii) changes at the 1% level of significance. -1m
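The decisions in (iii) and (iv) can be reproduced with a one-proportion z test. This is only a sketch: the hypotheses are not legible in the marking scheme, so an upper-tailed test (H0: p <= 0.10 versus H1: p > 0.10) is assumed here, and the count of 9 failure weeks is taken from Table 1. The critical values are standard normal table values.

```python
from math import sqrt

n = 45            # weeks observed at Perfect Laundry
failures = 9      # weeks with more than 5 complaints in Table 1
p0 = 0.10         # long-run failure rate stated in the question

p_hat = failures / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Assumed upper-tailed critical values from the standard normal table.
critical = {0.05: 1.6449, 0.01: 2.3263}
decisions = {
    alpha: "reject H0" if z > z_crit else "fail to reject H0"
    for alpha, z_crit in critical.items()
}

print(round(z, 4), decisions)
```

A two-sided test (critical values 1.96 and 2.5758) would lead to the same two decisions, so the change of conclusion between 5% and 1% does not depend on this assumption.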

QUESTION 2 (25 MARKS)

a) Choose the correct answer.

i) Analysis of variance is used to (1 mark)
[ ] compare nominal data.
[ ] compute t test.
[ ] compare population proportion.
[ ] simultaneously compare several population means.

ii) In ANOVA, the F statistic is used to test a null hypothesis such as: (1 mark)
[ ] H0: $s_1^2 = s_2^2 = s_3^2 = s_4^2$
[ ] H0: $\sigma_1^2$ [?] $\sigma_2^2$ [?] $\sigma_3^2$ [?] $\sigma_4^2$
[ ] H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$
[ ] H0: $\bar{x}_1$ [?] $\bar{x}_2$ [?] $\bar{x}_3$ [?] $\bar{x}_4$

iii) If an ANOVA test is conducted and the null hypothesis is rejected, what does this indicate? (1 mark)
[ ] Too many degrees of freedom
[ ] No difference between the population means
[ ] Difference between at least one pair of population means
[ ] None of the above

b) A study was conducted to compare the final scores obtained by students from 5 different schools in four different subjects. The researcher wanted to show that schools have an effect on the scores. He believed that the subjects have an effect on the scores too. The following data represent the final scores obtained by randomly selected students from 5 different schools in Mathematics, English, Science and Biology.

                      Subject
School   Mathematics   English   Science   Biology
1        68            57        73        61
2        83            94        91        86
3        72            81        63        59
4        55            73        77        66
5        92            68        75        87

i) Based on the data, complete the analysis of variance table. (10 marks)

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square    F
Treatment             1618.7 (2m)      4                    404.675 (1m)   4.3666 (1m)
Block                 42.15 (2m)       3                    14.05 (1m)
Error                 1112.1 (1m)      12                   92.675 (1m)
Total                 2772.95          19 (1m)

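School totals $\sum_j x_{ij}$ and $n_i$:
i = 1: 259, $n_1 = 4$
i = 2: 354, $n_2 = 4$
i = 3: 275, $n_3 = 4$
i = 4: 271, $n_4 = 4$
i = 5: 322, $n_5 = 4$

Subject totals $\sum_i x_{ij}$ and $n_j$:
j = 1: 370, $n_1 = 5$
j = 2: 373, $n_2 = 5$
j = 3: 379, $n_3 = 5$
j = 4: 359, $n_4 = 5$

$\sum\sum x_{ij} = 1481$, $n = 20$, $\sum\sum x_{ij}^2 = 112441$

$SST = 112441 - \frac{1481^2}{20} = 2772.95$
$SSA = \frac{259^2}{4} + \frac{354^2}{4} + \frac{275^2}{4} + \frac{271^2}{4} + \frac{322^2}{4} - \frac{1481^2}{20} = 1618.7$
$SSB = \frac{370^2}{5} + \frac{373^2}{5} + \frac{379^2}{5} + \frac{359^2}{5} - \frac{1481^2}{20} = 42.15$
$SSE = 2772.95 - 1618.7 - 42.15 = 1112.1$
$MSA = \frac{1618.7}{4} = 404.675$, $MSB = \frac{42.15}{3} = 14.05$, $MSE = \frac{1112.1}{12} = 92.675$
$F = \frac{404.675}{92.675} = 4.3666$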
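The sums of squares above can be reproduced directly from the score table. The sketch below treats schools as the treatment and subjects as the block, as in the working; the variable names are illustrative.

```python
# Final scores per school, in the order Mathematics, English, Science, Biology.
scores = {
    1: [68, 57, 73, 61],
    2: [83, 94, 91, 86],
    3: [72, 81, 63, 59],
    4: [55, 73, 77, 66],
    5: [92, 68, 75, 87],
}

rows = list(scores.values())
a = len(rows)        # number of schools (treatments)
b = len(rows[0])     # number of subjects (blocks)
n = a * b

grand_total = sum(sum(r) for r in rows)
correction = grand_total ** 2 / n    # (sum of all scores)^2 / n

sst = sum(x * x for r in rows for x in r) - correction
ssa = sum(sum(r) ** 2 for r in rows) / b - correction           # schools
ssb = sum(sum(col) ** 2 for col in zip(*rows)) / a - correction  # subjects
sse = sst - ssa - ssb

msa = ssa / (a - 1)
msb = ssb / (b - 1)
mse = sse / ((a - 1) * (b - 1))
f_school = msa / mse

print(round(ssa, 2), round(ssb, 2), round(sse, 2), round(f_school, 4))
```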
ii) Use a 0.05 level of significance to test the researcher's interest. (4 marks)

H0: Schools have no effect on the scores.
H1: Schools have an effect on the scores.
F = 4.3666 -1m
$F_{0.05,4,12} = 3.2592$ -1m
Since F = 4.3666 > $F_{0.05,4,12}$ = 3.2592, reject H0. -1m
We conclude that schools have an effect on the scores. -1m

c) A study on rental rates in four cities has been done. Based on OUTPUT 2.1, what is your conclusion on the rental rates between the four cities at $\alpha = 0.05$?

OUTPUT 2.1
Rental per month for two-bedded apartments

                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   44947.000        3    14982.333     3.802   .013
Within Groups    378299.040       96   3940.615
Total            423246.040       99

(4 marks)
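The entries of OUTPUT 2.1 are linked by MS = SS / df and F = MS(between) / MS(within). A short sketch of that check, together with the p-value decision rule at the 0.05 level (variable names are illustrative; the p-value is the Sig. value as printed, not recomputed):

```python
ss_between, df_between = 44947.000, 3
ss_within, df_within = 378299.040, 96

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

p_value = 0.013   # Sig. column of OUTPUT 2.1
alpha = 0.05
reject_h0 = p_value < alpha   # reject equal mean rentals when p < alpha

print(round(ms_between, 3), round(ms_within, 3), round(f_stat, 3), reject_h0)
```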
H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$
H1: The rental rates are different for different cities. -1m
p-value = 0.013 -1m
Since p-value = 0.013 < $\alpha$ = 0.05, H0 is rejected. -1m
We conclude that the rental rates differ between cities. -1m

d) A researcher in a manufacturing company has studied the effect of incentives given by the company on the workers' productivity. To reduce the experimental error, the workers' commitment was also considered in the study. The collected data were analysed using SPSS and the output is shown below.

OUTPUT 2.2
Tests of Between-Subjects Effects
Dependent Variable: productivity

Source       Type III Sum of Squares   df   Mean Square   F         Sig.
Model        543.222(a)                5    108.644       115.035   .000
Incentive    27.556                    2    13.772        B         .015
Commitment   D                         2    C             24.482    .006
Error        3.776                     A    .944
Total        547.000                   9

a. R Squared = .993 (Adjusted R Squared = .984)

Based on OUTPUT 2.2, find the values of A, B, C and D.
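The missing entries follow from the same two identities, df = SS / MS and F = MS / MS(error). A minimal sketch using the printed values from OUTPUT 2.2 (the variable names simply mirror the letters in the table):

```python
ms_error = 0.944

A = round(3.776 / ms_error)   # error df = SS(error) / MS(error)
B = 13.772 / ms_error         # F for Incentive = MS(Incentive) / MS(error)
C = 24.482 * ms_error         # MS for Commitment = F * MS(error)
D = C * 2                     # SS for Commitment = MS * df, with df = 2

print(A, round(B, 3), round(C, 3), round(D, 3))
```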


$A = \frac{3.776}{0.944} = 4$
$B = \frac{13.772}{0.944} = 14.589$
$C = 24.482 \times 0.944 = 23.111$
$D = 23.111 \times 2 = 46.222$
(4 marks)

QUESTION 3 (25 MARKS)

a) Answer the following questions.

i) Give one measurement scale that can be analysed using the Chi-square test. (1 mark)
Nominal or ordinal

ii) One guideline to ensure a good approximation to the Chi-square distribution is that the expected frequency for the ith category is at least 5 ($E_i \geq 5$). If this were not possible, what would be a possible solution? (1 mark)
Combine rows or columns, or increase the sample size

b) A social worker believes that the age distribution of regular users of marijuana in a certain population is as follows: below 21, 30%; 21-30, 60%; 31-40, 8%; and over 40, 2% of the total population. A random sample of 300 drawn from the population yielded the age breakdown shown in Table 3.1. Do these data provide sufficient evidence to support the social worker's belief at the 5% level of significance? (7 marks)

Table 3.1
Age, years   Below 21   21-30   31-40   Over 40
Number       96         171     22      11

H0: The age distribution of regular users of marijuana follows the social worker's belief.
H1: The age distribution of regular users of marijuana is different from the social worker's belief. -1m

The observed and expected frequencies are shown in the table below, where E = np.

           Below 21   21-30   31-40   Over 40   Total
Observed   96         171     22      11        300
Expected   90         180     24      6         300

Correct expected values -1m
$\chi^2 = \sum (O - E)^2 / E = 0.4 + 0.45 + 0.1667 + 4.1667 = 5.1833$ -2m
The critical value: $\chi^2_{0.05,3} = 7.8147$ -1m
Fail to reject H0. -1m
There is not sufficient evidence at the 0.05 level of significance to show that the age distribution of regular users of marijuana is different from the social worker's belief. -1m
We assume that the social worker's belief is true.
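The goodness-of-fit calculation above can be sketched in a few lines. The critical value is taken from the answer rather than computed, and the variable names are illustrative.

```python
observed = [96, 171, 22, 11]          # Table 3.1
believed = [0.30, 0.60, 0.08, 0.02]   # social worker's stated proportions

n = sum(observed)
expected = [n * p for p in believed]  # E = np for each age group

terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(terms)

critical = 7.8147                     # chi-square critical value, alpha = 0.05, df = 3
reject_h0 = chi_sq > critical

print([round(t, 4) for t in terms], round(chi_sq, 4), reject_h0)
```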

c) A graduate student in psychology recorded the number of people contributing to a solicitor for a charity organization stationed in a shopping mall during the Christmas season. The numbers of people contributing during five-minute time intervals were counted. The results are shown in Table 3.2.

Table 3.2
Number of contributors   0    1    2    3    4    5    6
Number of intervals      15   30   36   33   22   12   6

i) Find the sample mean for the number of contributors. (1 mark)

ii) Test the hypothesis that the number of people contributing during five-minute time intervals follows a Poisson distribution at the 1% significance level. (8 marks)

Number of contributors   Number of intervals   $P_i$    $E_i$     $(O_i - E_i)^2 / E_i$
0                        15                    0.0821   12.6434   0.4392
1                        30                    0.2052   31.6008   0.0811
2                        36                    0.2565   39.501    0.3103
3                        33                    0.2138   32.9252   0.0002
4                        22                    0.1336   20.5744   0.0988
5                        12                    0.0668   10.2872   0.2852
6                        6                     0.042    6.468     0.0339

Correct $P_i$ -1m
Correct $E_i$ -1m

H0: The number of people contributing during five-minute time intervals follows a Poisson distribution.
H1: The number of people contributing during five-minute time intervals does not follow a Poisson distribution. -1m
$\chi^2 = \sum (O_i - E_i)^2 / E_i = 1.2487$ -2m
The critical value: [...] -1m
Fail to reject H0. -1m
We do not have enough evidence that the number of people contributing during five-minute time intervals does not follow a Poisson distribution. We assume that the number of people contributing during five-minute time intervals follows a Poisson distribution. -1m
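A sketch of the Poisson fit follows. Two points are assumptions rather than statements from the marking scheme: the last class is treated as "6 or more" (which matches the tabulated $P_i$ of 0.042), and, because the critical value is not legible in the source, the standard table value for df = 7 - 1 - 1 = 5 at the 1% level is used. Unrounded probabilities are used here, so the statistic differs slightly from 1.2487.

```python
from math import exp, factorial

contributors = [0, 1, 2, 3, 4, 5, 6]
intervals = [15, 30, 36, 33, 22, 12, 6]   # observed frequencies, Table 3.2

n = sum(intervals)
mean = sum(k * f for k, f in zip(contributors, intervals)) / n  # estimate of lambda

# Poisson probabilities for 0..5; the final class is taken as "6 or more".
probs = [exp(-mean) * mean ** k / factorial(k) for k in contributors[:-1]]
probs.append(1 - sum(probs))

expected = [n * p for p in probs]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(intervals, expected))

df = len(contributors) - 1 - 1   # one parameter (the mean) was estimated
critical = 15.0863               # assumed: chi-square table value, alpha = 0.01, df = 5
reject_h0 = chi_sq > critical

print(mean, [round(p, 4) for p in probs], round(chi_sq, 4), reject_h0)
```

The estimated mean is 2.5 contributors per interval, which is the value the tabulated probabilities are based on.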

d) A study is conducted to see if there is any association between the colour of cars involved in accidents and the time the accidents occur. The result of the analysis is displayed in Output 3.1.

Output 3.1
Car_Colour * Time Crosstabulation

                                    Time
Car_Colour                  Morning   Noon   Night   Total
Bright   Count              30        50     35
         Expected Count     A         -      -
Dark     Count              80        B      120
         Expected Count     -         60.8   -
Total    Count                                       355
         Expected Count

Chi-Square Tests
                               Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square             30.179(a)   C    .000
Likelihood Ratio               29.010      2    .000
Linear-by-Linear Association   1.611       1    .204
N of Valid Cases               355

i) Based on Output 3.1, find the values of A, B and C. (3 marks)

A = 35.6338
B = 40
C = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2
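The values of A and C, and the Pearson chi-square in Output 3.1, can be checked from the cell counts. The sketch assumes the Dark row counts are 80, 40 (the answer for B) and 120, as recovered from the flattened output; with those counts the computed statistic agrees with the printed 30.179.

```python
# Observed counts by time of day: Morning, Noon, Night.
observed = {
    "Bright": [30, 50, 35],
    "Dark": [80, 40, 120],   # assumed layout of the Dark row
}

row_totals = {colour: sum(counts) for colour, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(row_totals.values())

# Expected count = row total * column total / grand total.
expected = {
    colour: [row_totals[colour] * ct / n for ct in col_totals]
    for colour in observed
}

chi_sq = sum(
    (o - e) ** 2 / e
    for colour in observed
    for o, e in zip(observed[colour], expected[colour])
)
df = (len(observed) - 1) * (len(col_totals) - 1)

print(round(expected["Bright"][0], 4), round(expected["Dark"][1], 1),
      round(chi_sq, 3), df)
```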

ii) At the 2.5% significance level, test whether there is an association between the colour of cars involved in accidents and the time the accidents occur. (4 marks)

H0: There is no association between the colour of cars involved in accidents and the time the accidents occur.
H1: There is an association between the colour of cars involved in accidents and the time the accidents occur. -1m
p-value = 0.000 < $\alpha$ = 0.025 -1m
Reject H0. -1m
There is an association between the colour of cars involved in accidents and the time the accidents occur. -1m

QUESTION 4 (25 MARKS)

a) A biologist assumes that there is a linear relationship between the amount of fertilizer supplied to tomato plants and the subsequent yield of tomatoes obtained. Eight tomato plants, of the same variety, were selected at random and treated, weekly, with a solution in which fertilizer (in grams) was dissolved in a fixed quantity of water. The yield of tomatoes (in kilograms) was recorded.

Tomato plant                       A     B     C     D     E     F     G     H
Fertilizer (in grams)              1.0   1.5   2.0   2.5   3.0   3.5   4.0   4.5
Yield of tomatoes (in kilograms)   3.9   4.4   5.8   6.6   7.0   7.1   7.3   7.7

i) Identify the dependent and independent variables. (2 marks)
Dependent variable: yield of tomato plant -- 1M
Independent variable: amount of fertilizer -- 1M

ii) Find the Pearson correlation coefficient and interpret it. (2 marks)
r = 0.9444 -- 1M
The correlation coefficient suggests a strong positive linear relationship between the yield of the tomato plant and the amount of fertilizer. -- 1M

iii) Test the relationship between yield of tomato plant and amount of fertilizer based on the Pearson correlation coefficient at $\alpha$ = 5%. (6 marks)

H0: $\rho = 0$ vs H1: $\rho \neq 0$ -- 1M
$t_{test} = r \sqrt{\frac{n - 2}{1 - r^2}} = (0.9444) \sqrt{\frac{8 - 2}{1 - (0.9444)^2}} = 7.0356$ -- 2M
$t_{0.05/2,(8-2)} = t_{0.025,6} = 2.4467$ -- 1M
$t_{test} > t_{0.025,6}$, so reject H0. -- 1M
There is enough evidence to conclude that the positive relationship between y and x is significant. -- 1M
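The correlation coefficient and its t statistic can be reproduced from the fertilizer data as follows. This is a sketch with illustrative variable names; because r is not rounded before the t statistic is formed, the result differs from 7.0356 in the third decimal place. The critical value is the one quoted in the answer.

```python
from math import sqrt

fertilizer = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]   # x, grams
yield_kg = [3.9, 4.4, 5.8, 6.6, 7.0, 7.1, 7.3, 7.7]     # y, kilograms

n = len(fertilizer)
mean_x = sum(fertilizer) / n
mean_y = sum(yield_kg) / n

sxx = sum((x - mean_x) ** 2 for x in fertilizer)
syy = sum((y - mean_y) ** 2 for y in yield_kg)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fertilizer, yield_kg))

r = sxy / sqrt(sxx * syy)                  # Pearson correlation coefficient
t_stat = r * sqrt((n - 2) / (1 - r ** 2))  # test statistic for H0: rho = 0

t_critical = 2.4467                        # t(0.025, 6) as quoted in the answer
reject_h0 = abs(t_stat) > t_critical

print(round(r, 4), round(t_stat, 4), reject_h0)
```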

iv) Fit a least squares line. (3 marks)
a = 3.2524 -- 1M
b = 1.0810 -- 1M
$\hat{y} = 3.2524 + 1.0810x$ -- 1M

v) Interpret the slope. (1 mark)
An increase of 1 gram in the amount of fertilizer (x) increases the yield of the tomato plant (y) by 1.0810 kilograms. -- 1M

vi) Estimate the yield of a plant treated, weekly, with 3.2 grams of fertilizer. (2 marks)

$\hat{y} = 3.2524 + 1.0810(3.2)$ -- 1M
$\hat{y} = 6.7116$ -- 1M
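The least squares coefficients and the estimate for 3.2 grams can be checked with the usual formulas b = Sxy / Sxx and a = ȳ - b x̄. The sketch below uses illustrative names; since the coefficients are not rounded before predicting, the estimate agrees with 6.7116 only to about three decimal places.

```python
fertilizer = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]   # x, grams
yield_kg = [3.9, 4.4, 5.8, 6.6, 7.0, 7.1, 7.3, 7.7]     # y, kilograms

n = len(fertilizer)
mean_x = sum(fertilizer) / n
mean_y = sum(yield_kg) / n

sxx = sum((x - mean_x) ** 2 for x in fertilizer)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fertilizer, yield_kg))

b = sxy / sxx              # slope
a = mean_y - b * mean_x    # intercept

predicted = a + b * 3.2    # estimated yield at 3.2 grams of fertilizer

print(round(a, 4), round(b, 4), round(predicted, 4))
```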

b) A researcher wants to find out the relationship between the concentration of cholesterol in blood serum (y), age (x1), and body mass index (x2). He ran a regression analysis using a computer and the output obtained is shown in OUTPUT 4.1.

OUTPUT 4.1

ANOVA(b)
Model          Sum of Squares   df   Mean Square   Sig.
1 Regression   23.132           2    11.566        .000(a)
  Residual     26.571           27   .984
  Total        49.703           29
a. Predictors: (Constant), x1, x2
b. Dependent Variable: y

Coefficients(a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B        Std. Error           Beta                        t       Sig.
1 (Constant)     -.740    1.896                                            -.390   .700
  x1             .041     .014                 .462                        3.006   .006
  x2             .201     .089                 .349                        2.269   .031
a. Dependent Variable: y

i) Write down the estimated regression model obtained. (1 mark)

$\hat{y} = -0.740 + 0.041 x_1 + 0.201 x_2$ -- 1M

ii) Briefly explain the significance of the estimated regression model and its coefficients. (Use $\alpha$ = 5%.) (4 marks)

The ANOVA table shows that the model is significant since the p-value = 0.000 is less than $\alpha$ = 5%. -- 2M

The coefficients table shows that both x1 and x2 make a significant contribution to the model since the p-value of x1 = 0.006 and the p-value of x2 = 0.031 are less than $\alpha$ = 5%. -- 2M

iii) How many variables have a positive significant effect on the model at $\alpha$ = 1%? (1 mark)

One -- 1M


iv) Interpret the effect of x2 on the model. (1 mark)

b2 = 0.201 indicates that, holding the other variable constant, an increase of 1 unit in body mass index increases the average concentration of cholesterol in blood serum by 0.201. -- 1M

v) What is the estimated concentration of cholesterol in blood serum for a person who is 47 years old and has a body mass index of 23.1? (2 marks)

$\hat{y} = -0.740 + (0.041)(47) + (0.201)(23.1)$ -- 1M
$\hat{y} = 5.8301$ -- 1M
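The answers to parts (ii), (iii) and (v) can be read off OUTPUT 4.1 mechanically. A small sketch, with the coefficients and p-values copied from the Coefficients table (the dictionary and function names are illustrative):

```python
# Unstandardized coefficients and p-values from OUTPUT 4.1.
coefficients = {"const": -0.740, "x1": 0.041, "x2": 0.201}
p_values = {"x1": 0.006, "x2": 0.031}


def predict(age, bmi):
    """Estimated cholesterol concentration from the fitted model."""
    return coefficients["const"] + coefficients["x1"] * age + coefficients["x2"] * bmi


estimate = predict(47, 23.1)

# Predictors whose p-value is below each significance level.
significant_at_5 = [name for name, p in p_values.items() if p < 0.05]
significant_at_1 = [name for name, p in p_values.items() if p < 0.01]

print(round(estimate, 4), significant_at_5, significant_at_1)
```

Both predictors have positive coefficients, so the single predictor that stays significant at the 1% level (x1) accounts for the answer "One" in part (iii).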

~ END OF QUESTIONS~
