
Department of Economics, NSU

Spring 2014. BUS173 Applied Statistics- 2- Section 16


Instructor: Humaira Husain (HHn) Lecturer, Dept. of Economics
Office: NAC 819
Consultation hours: ST: 10am to 1pm
MW: 9:10am to 9:40am & 1pm to 1:30pm
Email: humaira@northsouth.edu
Course Objective:
The aim of this course is to help students become familiar with the standard
statistical techniques frequently used in Business and Economics. This course
introduces advanced topics in statistics and their application in Business and
Economics. The course is thoroughly application oriented and serves as a
prerequisite for courses such as Introduction to Econometrics (ECO372) and
higher level quantitative research based courses in Business studies.
Upon completion of this course a student is expected to be able to carry out
applied statistical research on various topics in Business.
Prerequisite: BUS172
Text Book: Statistics for Business and Economics, 6th edition / 7th edition,
by Paul Newbold, William L. Carlson and Betty Thorne.
References:
1. Basic Statistics for Business and Economics by Lind / Marchal / Wathen, 7th edition
2. Statistics for Management and Economics by Gerald Keller, 9th edition


Grading Policy: The course grade will be based on the NSU grading policy. The
weights are as follows:
Best 1 of Quiz-1 & Quiz-2 : 15%
Assignment-1              : 10%
Mid-Term-1                : 25%
Mid-Term-2                : 25%
Final Examination         : 25%
Total                     : 100%


Topics expected to be covered:


1. Hypothesis Testing part-1 (Chapter 10)
2. Hypothesis Testing part-2 (Chapter 11)
3. Simple regression (Chapter 12)
4. Multiple regression (Chapter 13)
5. Nonparametric Tests (Chapter 15)
6. Goodness of Fit and Contingency Tables (Chapter 16)
7. Analysis of Variance (Chapter 17)

Course regulations:
1. Mobile phones must be switched off during the class hour.
2. Make-up exams will be arranged only in case of emergency, subject to
submission of genuine documents. There will be NO MAKE-UP for quizzes
under any circumstances.
3. Distracting the instructor by talking to other classmates is not allowed.
4. Students are free to consult the instructor regarding the class material only
during the office hours mentioned.
5. Attendance is important for earning a satisfactory grade in this course.

Spring 2014
BUS 173 Applied Statistics -2
Worksheet -1
Instructor: Humaira Husain

Topic: Sampling distribution of the sample mean and the central limit theorem
Q1. According to an IRS study, it takes a mean of 330 minutes for taxpayers to
prepare, copy and electronically file a tax form. The distribution of times follows
a Normal distribution and the standard deviation is 80 minutes. A consumer
surveillance agency selects a random sample of 40 taxpayers.
a. Calculate standard error of the mean in this sample.
b. What is the likelihood the sample mean is greater than 320 minutes?
c. What is the likelihood the sample mean is between 320 and 350 minutes?
d. What is the likelihood the sample mean is greater than 350 minutes?
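A minimal sketch of the calculation behind Q1, using only the Python standard library. It assumes the sample mean is Normal with mean 330 and standard error 80/sqrt(40), as the question states; the helper name `sample_mean_dist` is illustrative and not part of the course material.

```python
from math import sqrt
from statistics import NormalDist


def sample_mean_dist(mu, sigma, n):
    """Sampling distribution of the sample mean: Normal(mu, sigma / sqrt(n))."""
    return NormalDist(mu, sigma / sqrt(n))


# Figures taken from Q1: mean 330 minutes, sd 80 minutes, n = 40.
xbar = sample_mean_dist(330, 80, 40)

standard_error = xbar.stdev                          # part a
p_above_320 = 1 - xbar.cdf(320)                      # part b
p_between_320_350 = xbar.cdf(350) - xbar.cdf(320)    # part c
p_above_350 = 1 - xbar.cdf(350)                      # part d

print(f"standard error       = {standard_error:.3f}")
print(f"P(mean > 320)        = {p_above_320:.4f}")
print(f"P(320 < mean < 350)  = {p_between_320_350:.4f}")
print(f"P(mean > 350)        = {p_above_350:.4f}")
```

The same helper can be reused for Q2 to Q4 by changing the three arguments.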

Q2. The rent for a one-bedroom apartment in California follows a Normal
distribution with a mean of 2200 dollars per month and a standard deviation of
250 dollars per month. What is the probability of selecting a sample of 50
one-bedroom apartments and finding the mean to be at least 1950 dollars per month?
Q3. Antelope Coffee is considering the possibility of opening a gourmet coffee
shop. Shops will be successful if per capita annual income is above $60000.
The standard deviation of income is $5000. A random sample of 36 people was
obtained and the mean income was $62300. Does this sample provide evidence
that the shop should be opened?
Q4. Given a population with mean \(\mu = 100\) and variance \(\sigma^2 = 81\), the
central limit theorem applies when the sample size \(n \ge 25\). A random sample of
size \(n = 25\) is obtained.

a. What are the mean and variance of the sampling distribution of the sample mean?
b. What is the probability that \(\bar{x} > 102\)?
c. What is the probability that \(98 \le \bar{x} \le 101\)?
d. What is the probability that \(\bar{x} \le 101.5\)?

Spring 2014
BUS 173 Applied Statistics -2
Worksheet 2
Instructor: Humaira Husain

Topic: Estimation and Confidence Interval


Q1. A personnel manager has found that historically the scores on aptitude tests
given to applicants for entry level positions follow a normal distribution with a
standard deviation of 32.4 points. A random sample of nine test scores from the
current group of applicants had a mean score of 187.9 points.

a. Find a 90% confidence interval for the population mean score of the current
group of applicants.
b. Based on these sample results, a statistician found for the population mean
a confidence interval extending from 165.8 to 210.0 points. Find the
confidence level of this interval.
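The two parts of Q1 can be sketched as follows, assuming a known population standard deviation so that the z interval applies. The function names `z_interval` and `implied_level` are illustrative, not from the textbook.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, level):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # e.g. about 1.645 for 90%
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


def implied_level(xbar, sigma, n, upper):
    """Confidence level of a symmetric interval whose upper limit is given."""
    z = (upper - xbar) / (sigma / sqrt(n))
    return 2 * NormalDist().cdf(z) - 1


# Figures from Q1: sigma = 32.4, n = 9, sample mean = 187.9.
low, high = z_interval(187.9, 32.4, 9, 0.90)
level_b = implied_level(187.9, 32.4, 9, 210.0)

print(f"90% CI: ({low:.2f}, {high:.2f})")
print(f"level of (165.8, 210.0): {level_b:.4f}")
```

Part b works backwards: the half-width of the given interval is divided by the standard error to recover z, and the level follows from the Normal table.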

Q2. A college admissions officer for an M.B.A program has determined that
historically applicants have undergraduate grade point averages that are
normally distributed with standard deviation 0.45. From a random sample of 25
applications from the current year, the sample mean grade average is 2.90.
a. Find 95% confidence interval for population mean.

b. Based on these sample results, a statistician computes for the population mean
a confidence interval extending from 2.81 to 2.99. Find the confidence level
associated with this interval.
Q3. The owner of Britten's Egg Farm wants to estimate the mean number of eggs laid
per chicken. A sample of 20 chickens shows they laid an average of 20 eggs per
month with a standard deviation of 2 eggs per month.
a. What is the best estimate of the population mean?
b. Explain why we need to use the t distribution. What assumption do you need
to make?
c. For a 95% confidence interval, what is the value of t?
d. Develop a 90% confidence interval for the population mean.

Spring 2014 BUS 173 Applied Statistics -2 , Worksheet 3


Instructor: Humaira Husain
Topic: Hypothesis Testing
Q1. Test the hypotheses
\(H_0: \mu = 100\)
\(H_1: \mu > 100\)
using a random sample of size \(n = 25\), a probability of Type I error equal to .05, and the
following sample statistics.
a. \(\bar{x} = 106\), \(s = 15\)
b. \(\bar{x} = 104\), \(s = 10\)
c. \(\bar{x} = 95\), \(s = 10\)
d. \(\bar{x} = 92\), \(s = 18\)
Q2. Test the hypotheses
\(H_0: \mu = 100\)
\(H_1: \mu < 100\)
using a random sample of size \(n = 36\), a probability of Type I error equal to 0.05, and the
following sample statistics.
a. \(\bar{x} = 106\), \(s = 15\)
b. \(\bar{x} = 104\), \(s = 10\)
c. \(\bar{x} = 95\), \(s = 10\)
d. \(\bar{x} = 92\), \(s = 18\)
Q3. The accounts of a corporation show that, on average, accounts payable are $125.32.
An auditor checked a random sample of 16 of these accounts. The sample mean was
$131.78 and the sample standard deviation was $25.41. Assume that the population
distribution is normal. Test at the 5% significance level against a two sided alternative
the null hypothesis that the population mean is $125.32.
Q4. A process that produces bottles of shampoo, when operating correctly, produces
bottles whose contents weigh, on average, 20 ounces. A random sample of nine bottles
from a single production run yielded the following content weights (in ounces):

21.4  19.7  19.7  20.6  20.8  20.1  19.7  20.3  20.9

Assuming that the population distribution is normal, test at the 5% level against a two
sided alternative the null hypothesis that the process is operating correctly.
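A short sketch of the one-sample t test in Q4, standard library only. The critical value 2.306 is the two-sided 5% point of the t distribution with 8 degrees of freedom, copied from a t table because the standard library has no t distribution; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Content weights (ounces) of the nine bottles listed in Q4.
weights = [21.4, 19.7, 19.7, 20.6, 20.8, 20.1, 19.7, 20.3, 20.9]
mu0 = 20.0        # hypothesised mean under H0
T_CRIT = 2.306    # t table value: two-sided 5%, n - 1 = 8 df

n = len(weights)
x_bar = mean(weights)
s = stdev(weights)                     # sample sd (divisor n - 1)
t_stat = (x_bar - mu0) / (s / sqrt(n))
reject = abs(t_stat) > T_CRIT

print(f"mean = {x_bar:.4f}, s = {s:.4f}, t = {t_stat:.3f}, reject H0: {reject}")
```

The same pattern covers Q3 and Q5, where the sample mean and standard deviation are given directly instead of raw data.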
Q5. In contract negotiations a company claims that a new incentive scheme has resulted
in average weekly earnings of at least $400 for all customer service workers. A union
representative takes a random sample of 15 workers and finds that their weekly earnings
have a mean of $381.35 and a standard deviation of $48.60. Assume a normal
distribution.
a. Test the company's claim.
b. If the same sample results had been obtained from a random sample of 50 employees,
could the company's claim be rejected at a lower significance level than that used in part
(a)?

Q6. The production manager of Northern Windows has asked you to evaluate a proposed
new procedure for producing its Regal line of double hung windows. The present process
has a mean production of 80 units per hour with a population standard deviation of
8. Is there strong evidence that the mean production level is higher with the
new process? Consider the level of risk to be 5%; the sample size is 25 and the resulting
sample mean is 83.
Q7. The production manager of Twin Forks Ball Bearing has asked for your assistance in
evaluating a modified ball bearing production process. Ball bearing weights are
normally distributed with a mean of 5 ounces and a standard deviation of 0.1 ounces. A new
raw material supplier was used for a recent production run and the manager wants to
know if that change has resulted in a lowering of the mean weight of ball bearings.
Consider the level of risk to be .05; the sample size is 16 and the sample mean is 4.962.
Q8. The production manager of Circuits Unlimited has asked for your assistance in
analyzing a production process. This process involves drilling holes whose diameters are
normally distributed with population mean 2 inches and population standard deviation
of 0.06 inches. A random sample of 9 measurements had a sample mean of 1.95
inches. Use a significance level of 5% to determine if the observed sample mean is
unusual and suggests that the drilling machine should be adjusted.

Q9. You have been asked to evaluate single employer plans after the establishment of the
health benefit guarantee corporation. A random sample of 76 percentage changes in
promised health benefits was observed. The sample mean percentage change was .078
and the sample standard deviation was .201. Find and interpret the p value of a test of
null hypothesis that the population percentage change is 0 against the two sided
alternative.
Q10. A sample of 64 observations is selected from a normal population. The sample mean
is 215 and the population standard deviation is 15. Conduct the following test of hypothesis
using the 3% level of significance.
\(H_0: \mu = 220\)
\(H_1: \mu < 220\)

Spring 2014 BUS 173 Applied Statistics -2 , Worksheet 4
Instructor: Humaira Husain
Topic: Hypothesis Testing [Computing Type 2 Error and the p value of the test.]
[Reference: Keller, G., 9th edition]
Q1. Calculate the probability of a Type 2 error for the following test of hypothesis, given
that \(\mu = 203\).
\(H_0: \mu = 200\)
\(H_1: \mu \ne 200\)
\(\alpha = .05\), \(\sigma = 10\), \(n = 100\)
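One way to organise the Type 2 error calculation in Q1 is sketched below. It reads the question as a two-sided z test with null value 200, true mean 203, population standard deviation 10, n = 100 and significance level .05; the function name `beta_two_sided` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def beta_two_sided(mu0, mu_true, sigma, n, alpha):
    """P(fail to reject H0: mu = mu0) when the true mean is mu_true."""
    se = sigma / sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 1: acceptance region for the sample mean, computed under H0.
    lower, upper = mu0 - z * se, mu0 + z * se
    # Step 2: probability of landing in that region under the true mean.
    sampling = NormalDist(mu_true, se)
    return sampling.cdf(upper) - sampling.cdf(lower)


beta = beta_two_sided(200, 203, 10, 100, 0.05)
print(f"beta = {beta:.4f}")
```

For the one-sided tests in the later questions only one boundary of the acceptance region is needed, so the corresponding `cdf` term is replaced by 0 or 1.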

Q2. A statistics practitioner wants to test the following hypotheses with \(\sigma = 20\) and
\(n = 100\).
\(H_0: \mu = 100\)
\(H_1: \mu > 100\)
a. Using \(\alpha = 0.10\), find the probability of a Type 2 error, given that
\(\mu = 102\).
b. Repeat part a with \(\alpha = 0.02\).
c. Describe the effect on \(\beta\) of decreasing \(\alpha\).
d. Describe the effect on \(\beta\) of increasing the sample size to 200.

Q3. Determine \(\beta\) for the following test of hypothesis, given that \(\mu = 48\).
\(H_0: \mu = 50\)
\(H_1: \mu < 50\)
\(\alpha = .05\), \(\sigma = 10\), \(n = 40\)

Q4. Compute the p value in order to test the following hypotheses, given that \(\bar{x} = 52\),
\(n = 9\), \(\sigma = 5\) and \(\alpha = .03\).
\(H_0: \mu = 50\)
\(H_1: \mu > 50\)
a. Repeat the calculation with \(n = 25\).
b. Repeat the calculation with \(n = 100\).
c. Describe what happens to the value of the test statistic and the p value when the sample
size increases.
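A brief sketch of the p value calculation in Q4 for the three sample sizes, assuming an upper-tail z test with sample mean 52, null value 50 and population standard deviation 5. The helper name `upper_tail_p_value` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_p_value(xbar, mu0, sigma, n):
    """Return (z, p value) for H1: mu > mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return z, 1 - NormalDist().cdf(z)


# Same sample mean, three sample sizes (stem of Q4, then parts a and b).
results = {n: upper_tail_p_value(52, 50, 5, n) for n in (9, 25, 100)}

for n, (z, p) in results.items():
    print(f"n = {n:3d}: z = {z:.2f}, p value = {p:.5f}")
```

Because the standard error shrinks as n grows, the same difference of 2 produces a larger z and a smaller p value.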
Q5. A statistics practitioner formulated the following hypotheses and learned that
\(\bar{x} = 190\), \(n = 9\) and \(\sigma = 50\). Compute the p value in order to test the following
hypotheses.
\(H_0: \mu = 200\)
\(H_1: \mu < 200\)
a. Repeat the calculation with \(\sigma = 30\).
b. Repeat the calculation with \(\sigma = 10\).
c. Describe what happens to the value of the test statistic and the p value when the
standard deviation decreases.

Q6.
\(H_0: \mu = 200\)
\(H_1: \mu < 200\)
Find the probability of a Type 2 error for the above test of hypothesis, given that
\(\mu = 196\). Consider the significance level to be 10%; the population standard deviation
is 30 and the sample size is 25.
a. Repeat the calculation with sample size = 100.
b. Describe the effect of increasing the sample size on the probability of a Type 2 error.

Q7. Compute the p value in order to test the following hypotheses, given that \(\bar{x} = 990\),
\(n = 100\) and \(\sigma = 50\).
\(H_0: \mu = 1000\)
\(H_1: \mu < 1000\)
a. Repeat the calculation with \(\sigma = 50\).
b. Repeat the calculation with \(\sigma = 100\).
c. Describe what happens to the value of the test statistic and the p value when the
standard deviation increases.

Spring 2014 BUS 173 Applied Statistics -2 , Worksheet 5


Instructor: Humaira Husain
Topic: Two sample test of Hypothesis (Hypothesis testing II )
Q1. In random samples of 12 from each of two normal populations, we found the
following statistics:
\(\bar{x}_1 = 74\), \(s_1 = 18\)
\(\bar{x}_2 = 71\), \(s_2 = 16\)

a. Test with \(\alpha = .05\) to determine whether we can infer that the population means
differ.
b. Repeat part (a), increasing the standard deviations to \(s_1 = 210\) and \(s_2 = 198\),
and describe the result.
c. Repeat part (a) with sample size = 150 and discuss the effect of increasing
the sample size.
d. Repeat part (a), changing the mean of sample 1 (\(\bar{x}_1\)) to 76. Discuss the effect of
increasing \(\bar{x}_1\).
Q2. A number of restaurants feature a device that allows credit card users to
swipe their cards at the table. It allows user to specify a percentage or a dollar
amount to leave as a tip. In an experiment to see how it works , a random sample
of credit card users was drawn. Some paid the usual way and some used new
device. The percent left as a tip was recorded and listed below. Can you infer that
users of new device leave larger tips?
Usual:   10.3  15.2  13    9.9   12.1  13.4  12.2  14.9  13.2  12.0
Device:  13.6  15.7  12.9  13.2  12.9  13.4  12.1  13.9  15.7  15.4  17.4

Q3. Every month a clothing store conducts an inventory and calculates losses
from theft. The store would like to reduce these losses and is considering two
methods. The first is to hire a security guard, and the second is to install
cameras. To help decide which method to choose, the manager hired a security
guard for 6 months. During the next 6 month period, the store installed cameras.
The monthly losses were recorded and are listed here. The manager decided that
because the cameras were cheaper than the guard, he would install the cameras
unless there was enough evidence to infer that the guard was better. What
should the manager do?
Security guard:  355  284  401  398  477  254
Cameras:         486  303  270  386  411  435

Q4. How do drivers react to sudden large increases in the price of gasoline? To
help answer the question, a statistician recorded the speeds of cars as they
passed a large service station. He recorded the speeds (mph) in the same location
after the service station sign showed that the price of gasoline had risen by 15
cents. Can we conclude that speeds differ?

Speeds before price increase:
43  36  31  30  28  36  27  36  35  30  32  36  26  30  32  30

Speeds after price increase:
32  33  36  31  32  29  28  39

Q5. An investigation of the effectiveness of an antibacterial soap in reducing
operating room contamination resulted in the accompanying table. The new soap
was tested in a sample of eight operating rooms in the greater Seattle area
during the last year.
Operating room   A     B     C     D      E      F     G     H
Before           6.6   6.5   9.0   10.3   11.2   8.1   6.3   11.6
After            6.8   2.4   7.4   8.5    8.1    6.1   3.4   2.0

At the .05 significance level, can we conclude the contamination measurements
are lower after use of the new soap?
(To solve the above problems, assume that the population variances are unknown but
EQUAL, so \(\sigma_1^2 = \sigma_2^2\), and use the t distribution with \(n_1 + n_2 - 2\) df.)
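The equal-variance (pooled) t statistic described in the note above can be sketched as follows, using the summary statistics of Q1 (samples of 12, means 74 and 71, standard deviations 18 and 16). The critical value 2.074 is the two-sided 5% point for 22 degrees of freedom, copied from a t table; the function name `pooled_t` is illustrative.

```python
from math import sqrt


def pooled_t(x1, s1, n1, x2, s2, n2):
    """Two-sample t statistic assuming equal population variances."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (x1 - x2) / se, df


T_CRIT = 2.074   # t table value: two-sided 5%, 22 df

t_stat, df = pooled_t(74, 18, 12, 71, 16, 12)
reject = abs(t_stat) > T_CRIT
print(f"t = {t_stat:.4f} with {df} df, reject H0: {reject}")
```

Parts b to d of Q1 only change the arguments; when the sample size changes, the critical value must be looked up again for the new degrees of freedom.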
Reference: 1. Gerald Keller, Statistics for Management and Economics, 9th
edition, Chapter 13 exercises.
2. Lind / Marchal / Wathen, Basic Statistics for Business & Economics, Chapter
11 exercises.
(The Z statistic is rarely used in TWO SAMPLE tests of hypothesis because in most
cases the population variances are NOT known.)
Worksheet-6 BUS173.3
Topic: Simple regression

Instructor: Humaira Husain


Q1. It was hypothesized that the number of bottles of an imported premium beer
sold per evening in the restaurants of a city depends linearly on the average cost
of meals in the restaurants. The following results were obtained for a sample of
\(n = 17\) restaurants, of approximately equal size, where
\(y\) = Number of bottles sold per evening
\(x\) = Average cost, in dollars, of a meal

\[ \bar{x} = 25.5, \qquad \bar{y} = 16.0 \]
\[ \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = 350, \qquad \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = 180 \]

a. Find the sample regression line.
b. Interpret the slope of the sample regression line.
c. Is it possible to provide a meaningful interpretation of the intercept of the
sample line?
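A minimal sketch of part a of Q1. It assumes the summary figures are read as a mean of x of 25.5, a mean of y of 16.0, a sample variance of x of 350 and a sample covariance of 180; the function name `least_squares_line` is illustrative.

```python
def least_squares_line(x_bar, y_bar, var_x, cov_xy):
    """Slope and intercept of the least squares line from summary statistics."""
    b1 = cov_xy / var_x          # slope = sample covariance / sample variance of x
    b0 = y_bar - b1 * x_bar      # the line passes through (x_bar, y_bar)
    return b0, b1


b0, b1 = least_squares_line(x_bar=25.5, y_bar=16.0, var_x=350, cov_xy=180)
print(f"y_hat = {b0:.4f} + {b1:.4f} x")
```

The common divisor n - 1 cancels in the slope, so the ratio of the two quantities given in the question is all that is needed.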
Q2. Find and interpret the coefficient of determination for the regression of DVD
system sales on price, using the following data.
Sales:  420   380   350   400   440   380   450   420
Price:  5.5   6     6.5   6     5     6.5   4.5   5
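The coefficient of determination for Q2 can be computed directly from the eight observations, as in the sketch below. It assumes each sales figure is paired with the price listed alongside it; the function name `slope_and_r_squared` is illustrative.

```python
from statistics import mean

# Data from Q2, paired in the order given.
price = [5.5, 6, 6.5, 6, 5, 6.5, 4.5, 5]
sales = [420, 380, 350, 400, 440, 380, 450, 420]


def slope_and_r_squared(x, y):
    """Least squares slope and R^2 for a simple regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # In simple regression R^2 equals the squared sample correlation.
    return sxy / sxx, sxy ** 2 / (sxx * syy)


b1, r2 = slope_and_r_squared(price, sales)
print(f"slope = {b1:.2f}, R^2 = {r2:.4f}")
```

R^2 is the share of the variation in sales that the fitted line on price accounts for.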

Q3. A fast food chain decided to carry out an experiment to assess the influence of
advertising expenditure on sales. Different relative changes in advertising
expenditure, compared to the previous year, were made in eight regions of the
country and the resulting changes in sales levels were observed. The accompanying
table shows the results.

Increase in advertising expenditure (%):  0     4     14     10    9      8     6     1
Increase in sales (%):                    2.4   7.2   10.3   9.1   10.2   4.1   7.6   3.5

a. Estimate by least squares the linear regression of increase in sales on
increase in advertising expenditure.
b. Find a 90% confidence interval for the slope of the population regression
line.
Q4. A sample of 25 blue collar employees at a production plant was taken. Each
employee was asked to assess his or her own job satisfaction (\(x\)) on a scale from 1
to 10. In addition, the number of days absent (\(y\)) from work during the last year
was found for these employees. The sample regression line \(\hat{y} = 12.6 - 1.2x\)
was estimated by least squares for these data. Also found were

\[ \bar{x} = 6.0, \qquad \sum_{i=1}^{25} (x_i - \bar{x})^2 = 130.0, \qquad SSE = 80.6 \]

Test at the 1% significance level against the appropriate one-sided alternative
the null hypothesis that job satisfaction has no linear effect on absenteeism.

Q5. For problem number 1 it was found that

\[ \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} = 250 \]

Test against a two-sided alternative the null hypothesis that the slope of the
population regression line is 0.
Q6. It might be that watching television reduces the amount of physical exercise,
causing weight gains. The number of pounds each child was overweight was
recorded ( a negative number indicates the child is underweight ). In addition, the
number of hours of television viewing per week was also recorded. These data are
listed here.
Television
42
38 28 29

34

Overweight
8
5
3

18

Television

36

18

25

35

-1

37

13

38

31

33

19

29

14

-9

Overweight
14
-7
a) Calculate the sample regression line and describe what the coefficients tell you
about the relationship between the two variables.
b) Determine the coefficient of determination and describe what it tells you.
c) Conduct a test to determine whether there is evidence of a linear relationship
between weight and watching television.
d) Estimate with 90% confidence the mean overweight for children who
watch television 20 hours per week.
Worksheet -7

Topic: Simple regression Reference: G. Keller


Instructor: Humaira Husain

Spring 2014

Q1. In an attempt to determine the factors that affect the amount of energy used,
200 households were analyzed. In each, the number of occupants and the amount
of electricity used were measured. We have the following sample statistics:
x bar = 4.75, y bar = 762.6, variance of x = 4.84, variance of y = 56725,
covariance between x and y = 310.0
a) Determine the regression line and interpret the results.
b) Assess the fit of the regression line (compute the standard error of the
estimate and R square).
c) Estimate with 90% confidence the mean electricity consumption for households
in which the number of occupants = 5.
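Parts a and b of Q1 can be worked from the summary statistics alone, as in this sketch. It uses the identity SSE = (n - 1)(variance of y - covariance squared / variance of x) for simple regression; the variable names are illustrative.

```python
from math import sqrt

# Sample statistics from Q1 (x = occupants, y = electricity used).
n = 200
x_bar, y_bar = 4.75, 762.6
var_x, var_y, cov_xy = 4.84, 56725.0, 310.0

b1 = cov_xy / var_x                          # slope
b0 = y_bar - b1 * x_bar                      # intercept
r2 = cov_xy ** 2 / (var_x * var_y)           # coefficient of determination
sse = (n - 1) * (var_y - cov_xy ** 2 / var_x)
s_e = sqrt(sse / (n - 2))                    # standard error of the estimate
y_hat_5 = b0 + b1 * 5                        # point estimate for 5 occupants

print(f"y_hat = {b0:.2f} + {b1:.2f} x")
print(f"R^2 = {r2:.4f}, s_e = {s_e:.2f}, y_hat(5) = {y_hat_5:.2f}")
```

The interval in part c is then built around `y_hat_5` with a t value for n - 2 degrees of freedom taken from a table.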

Q2. An economist of the federal government is attempting to produce a better
measure of poverty than is currently in use. To help acquire information she
recorded the annual household income (in thousand dollars) and the amount of
money spent on food during one week for a random sample of households. We
have the following sample statistics:
x bar = 59.42, y bar = 270.3, variance of x = 115.24, variance of y = 1797.25,
covariance between x and y = 225.66, n = 150

a) Determine the least squares line and interpret the coefficients.
b) Determine the coefficient of determination and describe what it tells you.
c) Conduct a test of whether there is evidence of a linear relationship between
household income and food budget.
d) Predict the mean food budget of a family when the household
income is 50000 dollars. Use a 90% confidence level.
Q3. An economist wanted to investigate the relationship between office rents (the
dependent variable) and vacancy rates. Accordingly he took a random sample of
monthly office rents and the percentage of vacant office space in 30 different
cities. The sample statistics are as follows: x bar = 11.33, y bar = 17.2,
variance of x = 35.47, variance of y = 11.24, covariance between x and y = 10.78, n = 30.

a) Determine the least squares line and interpret the coefficients.
b) Can we infer that office rents and vacancy rates are linearly related?
c) Predict with 95% confidence the mean office rent when the vacancy rate is 10%.

Worksheet -8 (For Final examination)


Topic: Multiple regression

Reference: Paul Newbold and G. Keller


Instructor: Humaira Husain

Spring 2014

Q1. The following model was fitted to a sample of 30 families in order to
explain household milk consumption:
\[ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i \]
where
\(y_i\) = Milk consumption, in quarts per week
\(x_1\) = Weekly income (in hundreds of dollars)
\(x_2\) = Family size
The least squares estimates of the regression parameters were as follows:
\(b_0 = 0.025\), \(b_1 = 0.052\), \(b_2 = 1.14\)

a) Interpret the estimates \(b_1\) and \(b_2\).
b) Is it possible to provide a meaningful interpretation of the estimate \(b_0\)?

In the above problem, SST = 162.1 and SSR = 88.2.

c) Find and interpret the coefficient of determination.
d) Find the adjusted coefficient of determination.
In the above problem the standard errors are as follows:
\(s_{b_1} = 0.023\), \(s_{b_2} = 0.35\)

e) Test against the appropriate one sided alternative the null hypothesis that, for
fixed family size, milk consumption does not depend linearly on income.
f) Find a 95% confidence interval for \(\beta_2\).
Q2. A study was conducted to determine whether certain features could be used
to explain variability in the prices of furnaces. For a sample of 19 furnaces the
following regression model was estimated:

\[ \hat{y}_i = 68.286 + 0.0023 x_1 + 19.729 x_2 + 7.653 x_3, \qquad R^2 = 0.84 \]

where
\(y_i\) = Price in dollars
\(x_1\) = Rating of furnace
\(x_2\) = Energy efficiency ratio
\(x_3\) = Number of settings

a) Find a 95% confidence interval for the expected increase in price resulting from
an additional setting when the values of the rating and the energy efficiency ratio
remain fixed.
b) Test the null hypothesis that, all else being equal, the energy efficiency ratio of
furnaces does not affect their price against the alternative that the higher the
energy efficiency ratio, the higher the price.
Q3. In Question number 1
a) Test the null hypothesis \(H_0: \beta_1 = \beta_2 = 0\).
b) Set out the analysis of variance table.
Q4. The president of a company that manufactures drywall wants to analyze
the variables that affect demand for his product. Drywall is used to construct the
walls in houses and offices. Consequently the president decides to develop a
regression model in which the dependent variable \(y_i\) is monthly sales of drywall and
the following are the independent variables:
\(x_1\) = Number of building permits issued in the county
\(x_2\) = Five year mortgage rates
\(x_3\) = Vacancy rate in apartments (in %)
\(x_4\) = Vacancy rate in office buildings (in %)

Following is the computer generated output of the sample regression:

R square = .8935   Adjusted R square = .8711
S of epsilon = 40.13
F = 39.86

                    Coefficients   Standard error   t statistic
Intercept           -111.83        134.34           -.83
Permits             4.76           .395             12.06
Mortgage            16.99          15.16            1.12
Apartment vacancy   -10.53         6.39             -1.65
Office vacancy      1.31           2.79             .47

a) Analyze the data using multiple regression.
b) What is the standard error of the estimate? Can you use this statistic to assess
the model's fit?
c) Interpret the R square value.
d) Test the overall validity of the model. (Conduct the F test.)
e) Interpret each of the coefficients.
f) Test to determine whether each of the independent variables is linearly
related to drywall demand in this model.
g) Predict next month's drywall sales with 95% confidence if the number of
building permits is 50, the 5 year mortgage rate is 9%, and vacancy rates are 3.6% in
apartments and 14.3% in office buildings.

Question 5:
Consider the following software generated result sheet of a multiple
regression.
Analysis of Variance
SOURCE       DF    SS     MS
Regression   5     100    20
Error        20    40     2

Predictor   Coef     St Dev.   t statistic
Constant    3.00     1.50      2.00
X1          4.00     3.00      1.33
X2          3.00     0.20      15.00
X3          0.20     0.05      4.00
X4          -2.50    1.00      -2.5
X5          3.00     4.00      0.75

a. Compute the multiple standard error of the estimate.
b. Write the population regression model and the sample regression line.
c. Check whether the model is valid or not.
d. Test the regression coefficients individually. Would you consider omitting any
variables? If so, which one(s)? Use the .05 significance level.
e. Compute the R square and adjusted R square and interpret the result.
f. What assumption do you make regarding the model error variance?
g. Write briefly what basic problem you will have if the standard
assumptions of the multiple regression model do not hold in the above case, for
example heteroscedasticity, and if the residual terms are serially correlated
[\(E(\varepsilon_i \varepsilon_j) \ne 0\)].
DF = Degrees of freedom
SS = Sum of squares
MS = Mean square
Coef = Coefficient
St. Dev = Standard deviation
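The quantities asked for in parts a, c and e follow directly from the ANOVA table, as the sketch below shows. It assumes the sample size is recovered as n = regression DF + error DF + 1; the critical value 2.71 is the 5% point of F with (5, 20) degrees of freedom, copied from an F table, and the variable names are illustrative.

```python
from math import sqrt

# ANOVA table values from Question 5.
ssr, sse = 100.0, 40.0
df_reg, df_err = 5, 20
n = df_reg + df_err + 1            # 26 observations

msr = ssr / df_reg                 # mean square, regression
mse = sse / df_err                 # mean square, error
f_stat = msr / mse                 # overall F statistic (part c)
std_error = sqrt(mse)              # multiple standard error of the estimate (part a)

sst = ssr + sse
r2 = ssr / sst                                        # R square (part e)
adj_r2 = 1 - (sse / df_err) / (sst / (n - 1))         # adjusted R square

F_CRIT = 2.71                      # F table value: 5%, (5, 20) df
model_valid = f_stat > F_CRIT

print(f"s_e = {std_error:.3f}, F = {f_stat:.1f}, valid: {model_valid}")
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}")
```

For part d, each t statistic in the predictor table is compared with the two-sided t value for 20 degrees of freedom.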

Worksheet - 9 (Final Exam)

Topic: Analysis of variance (One way)


Reference: Keller and Newbold
Instructor: Humaira Husain

Q1. A statistics practitioner computed the following statistics:

                 Treatment
_________________________________
Statistic        1     2     3
_________________________________
Sample size
\(\bar{x}\)      10    15    20
\(s^2\)          50    50    50

a) Complete the ANOVA table.
b) Repeat part a by increasing the sample size to 10.
c) Describe what happens to the F statistic when the sample size increases.
Q2. You are given the following statistics:

                 Treatment
_________________________________
Statistic        1     2     3
_________________________________
Sample size      4     4     4
\(\bar{x}\)      20    22    25
\(s^2\)          10    10    10

a) Complete the ANOVA table.
b) Repeat part a by changing the variances to 25.
c) Describe what happens to the F statistic when the sample variance increases.

Q3. A management scientist believes that one way of judging whether a computer
came equipped with enough memory is to determine the age of the computer. In a
preliminary study a random sample of computer users were asked to identify the
brand of the computer and its age (in months). The categorized responses are
shown below. Do these data provide sufficient evidence to conclude that there are
differences in age between the computer brands? Use level of significance = .05

IBM    DELL    HEWLETT-PACKARD    OTHER
17     24      10                 15
12     13      21                 15

Q4. In early 2001 the economy was slowing down and companies were laying off
workers. A Gallup poll asked a random sample of workers how long it would be
before they had significant financial hardships if they lost their jobs and could not
find new ones. They also classified their income. The classifications are
More than $50000
$30000 to $50000
$20000 to $30000
Less than $20000

Sample   \(\bar{x}_i\)   \(s_i^2\)   \(n_i\)
____________________________________________________
1        22.21           121.6       39
2        18.46           90.39       14
3        15.49           85.25       81
4        9.31            65.4        67

Can we infer that differences exist between the four groups?

Q5. Given the following analysis of variance table

Source of variation   Sum of Squares   Degrees of freedom
Between groups        1000             4
Within groups         750              15
Total                 1750             19

Compute the mean squares for between groups and within groups. Compute the F
ratio and test the hypothesis that the group means are equal.

Q6. A corporation is trying to decide which of three makes of automobile to order
for its fleet: domestic, Japanese or European. Five cars of each type were ordered
and after 10000 miles of driving the operating cost per mile of each was assessed.
The accompanying results, in cents per mile, were obtained.

Domestic   Japanese   European
_________________________________________________
18         20.1       19.3
17.6       17.6       17.4
17.4       16.1       17.1
19.1       17.3       18.6
16.9       17.4       16.1

a) Set out the analysis of variance table for these data.
b) Test the null hypothesis that the population mean operating costs per mile are
the same for these three types of car.
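A compact sketch of the one-way ANOVA calculation for Q6, standard library only. The critical value 3.89 is the 5% point of F with (2, 12) degrees of freedom, copied from an F table; the function name `one_way_anova` and the dictionary layout are illustrative.

```python
from statistics import mean

# Operating costs (cents per mile) from Q6, five cars per make.
costs = {
    "Domestic": [18, 17.6, 17.4, 19.1, 16.9],
    "Japanese": [20.1, 17.6, 16.1, 17.3, 17.4],
    "European": [19.3, 17.4, 17.1, 18.6, 16.1],
}


def one_way_anova(groups):
    """Return (SSG, SSW, df between, df within, F) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-groups sum of squares: group means around the grand mean.
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: observations around their own group mean.
    ssw = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ssg / df_between) / (ssw / df_within)
    return ssg, ssw, df_between, df_within, f_stat


F_CRIT = 3.89   # F table value: 5%, (2, 12) df

ssg, ssw, df_b, df_w, f_stat = one_way_anova(list(costs.values()))
reject = f_stat > F_CRIT
print(f"SSG = {ssg:.4f}, SSW = {ssw:.2f}, F = {f_stat:.4f}, reject H0: {reject}")
```

For Q5, where the sums of squares are given, only the last two steps (mean squares and their ratio) are needed.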

Assignment-1 (Marks 10)


Applied Statistics- II

Topic: Multiple regression


Consider the following software generated result sheet of a multiple
regression.

Analysis of Variance
SOURCE       DF    SS     MS
Regression   5     100    20
Error        20    40     2

Predictor   Coef     St Dev.   t statistic
Constant    3.00     1.50      2.00
X1          4.00     3.00      1.33
X2          3.00     0.20      15.00
X3          0.20     0.05      4.00
X4          -2.50    1.00      -2.5
X5          3.00     4.00      0.75

a. Compute the multiple standard error of the estimate.
b. Write the population regression model and the sample regression line.
c. Check whether the model is valid or not.
d. Test the regression coefficients individually. Would you consider omitting any
variables? If so, which one(s)? Use the .05 significance level.
e. Compute the R square and adjusted R square and interpret the result.
f. What assumption do you make regarding the model error variance?
g. Write briefly what basic problem you will have if the standard
assumptions of the multiple regression model do not hold in the above case, for
example heteroscedasticity, and if the residual terms are serially correlated
[\(E(\varepsilon_i \varepsilon_j) \ne 0\)].
DF = Degrees of freedom
SS = Sum of squares
MS = Mean square
Coef = Coefficient
St. Dev = Standard deviation
IMPORTANT GUIDELINES!
Use A4 plain paper.
The assignment should be submitted in formal binding (spiral binding).
Write your Name and ID. Do NOT include any cover page.
Identical answers submitted by different students will result in a deduction of
marks. You are not allowed to ask your instructor questions regarding the
assignment. Please read the relevant handouts.
