
Group Assignment

Subject Name : Statistics for Business Decisions

Subject Code : HI6007

Lecturer Name : Dr Serguei Mikhailitchenko

Prepared By:
1 SHOUGANAG CHEN (DC5003)

2 MAULIK THAKAR (EGU8594)

3 KIRANDEEP KAUR (EMV8687)


Task 1
The data for Task 1 in the assignment data file represent the starting costs, in thousands of dollars, for
different kinds of businesses.

1. Mean, median, mode, range, variance and standard deviation

Statistic             X1          X2           X3

Mean                  83          92.090909    72.3
Standard Error        9.46722     11.726779    9.918613
Median                80          87           70
Mode                  35          #N/A         #N/A
Standard Deviation    34.1345     38.893327    31.36541
Sample Variance       1165.17     1512.6909    983.7889
Kurtosis              -1.04192    -0.4369227   -0.95897
Skewness              0.13297     0.5098441    0.546078
Range                 105         120          90
Minimum               35          40           35
Maximum               140         160          125
Sum                   1079        1013         723
Count                 13          11           10

Statistic             X4          X5

Mean                  87          51.625
Standard Error        11.3539     6.76872
Median                97.5        49
Mode                  100         30
Standard Deviation    35.9042     27.0749
Sample Variance       1289.11     733.05
Kurtosis              -0.4857     -0.47673
Skewness              0.07729     0.63311
Range                 115         90
Minimum               35          20
Maximum               150         110
Sum                   870         826
Count                 10          16
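The entries in the summary tables are linked by two simple identities: the sample variance is the square of the standard deviation, and the standard error is the standard deviation divided by the square root of the count. The short Python sketch below checks both for X1, using only the reported figures; the variable names are illustrative and do not come from the assignment.

```python
import math

# Reported summary figures for X1 (taken from the table above).
n = 13
std_dev = 34.1345
reported_variance = 1165.17
reported_std_error = 9.46722

# Sample variance is the squared sample standard deviation.
variance = std_dev ** 2

# Standard error of the mean is s / sqrt(n).
std_error = std_dev / math.sqrt(n)

print(f"variance   = {variance:.2f} (reported {reported_variance})")
print(f"std. error = {std_error:.5f} (reported {reported_std_error})")
```

Small differences in the last digit are expected because the reported standard deviation is itself rounded.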
2.

a) Frequency and relative frequency distributions


b) Relative frequency Histogram

For X1
[Figure: Relative Frequency Histogram of X1. x-axis: Range (class limits 30, 60, 90, 120, 150, 180); y-axis: Relative Frequency]

For X2
[Figure: Relative Frequency Histogram of X2. x-axis: Range (class limits 30, 60, 90, 120, 150, 180); y-axis: Relative Frequency]

For X3
[Figure: Relative Frequency Histogram of X3. x-axis: Range (class limits 30, 60, 90, 120, 150, 180); y-axis: Relative Frequency]

For X4
[Figure: Relative Frequency Histogram of X4. x-axis: Range (class limits 30, 60, 90, 120, 150, 180); y-axis: Relative Frequency]

For X5
[Figure: Relative Frequency Histogram of X5. x-axis: Range (class limits 30, 60, 90, 120, 150, 180); y-axis: Relative Frequency]

3. Results obtained in parts 1 and 2

Parts 1 and 2 give the descriptive statistics and the frequency distributions of the given data set. From the
Part 1 results, the mean start-up cost is highest for baker/donuts (X2), at about 92.09 thousand dollars, and
lowest for pet stores (X5), at about 51.63 thousand dollars. The range is also highest for baker/donuts (X2), at
120, which suggests that the mean may be less representative of this data set. The range is lowest for shoe
stores (X3) and pet stores (X5), at 90 each, because the individual values differ less. In addition, the higher
variance and standard deviation of the start-up costs for baker/donuts (X2) indicate greater dispersion and
possibly the existence of outliers (Newbold et al. 2012). These values are lowest for pet stores (X5), showing
low variability in relation to the mean.
The Part 2 outcomes show the frequency and relative frequency of the given data set. They imply that the
start-up costs for baker/donuts (X2) contain outliers, showing higher variability in relation to the average
value. The distribution is right-skewed (skewness of about 0.51) because of the presence of outliers in the data
set (Weiers, 2010). By contrast, the distribution of start-up costs for pet stores (X5) appears closer to normal
because of its low variability in relation to the mean.

4. Test whether there is a significant difference in the starting costs for these types of business.

H0: There is no difference in the starting costs for these types of business.

H1: There is a significant difference in the starting costs for these types of business.

Test:
Results:

F-value > F-critical

p-value (0.018) < significance level (0.05)

The null hypothesis is rejected, which means there is a significant difference in the starting costs for these types of business.
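The comparison of an F-value with an F-critical value suggests a one-way ANOVA, although the test output itself is not reproduced here. Under that assumption, the F statistic can be rebuilt from the Part 1 summary statistics alone (count, mean and sample variance per group). The sketch below is a minimal illustration; the function name `one_way_anova_f` is hypothetical.

```python
# Group summaries (count, mean, sample variance) taken from Part 1.
groups = {
    "X1": (13, 83.0, 1165.17),
    "X2": (11, 92.090909, 1512.6909),
    "X3": (10, 72.3, 983.7889),
    "X4": (10, 87.0, 1289.11),
    "X5": (16, 51.625, 733.05),
}


def one_way_anova_f(summaries):
    """Return (F, df_between, df_within) from per-group (n, mean, variance)."""
    total_n = sum(n for n, _, _ in summaries)
    grand_mean = sum(n * mean for n, mean, _ in summaries) / total_n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in summaries)
    # Within-group sum of squares: (n - 1) * sample variance for each group.
    ss_within = sum((n - 1) * var for n, _, var in summaries)
    df_between = len(summaries) - 1
    df_within = total_n - len(summaries)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = one_way_anova_f(list(groups.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

With these inputs the statistic comes out at roughly 3.25 on 4 and 55 degrees of freedom. Turning it into a p-value needs the F distribution, for example `scipy.stats.f.sf(f_stat, df_b, df_w)` if SciPy is available.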
Task 2 (20 marks)
The data for Task 2 in the data file for Assignment represents the following variables for franchisees of All
Greens Pty Ltd: annual sales ($000), the floor area (sq.ft.000), inventory ($000), advertising expenditure
($000), the size of the area where the business operates (number of families, 000) and the number of
competitors in the area.

Data: Table 1 presents the data for Task 2.


All Greens Franchise
The data (X1, X2, X3, X4, X5, X6) are for each franchise store.
X1 = annual net sales/$1000
X2 = number sq. ft./1000
X3 = inventory/$1000
X4 = amount spent on advertising/$1000
X5 = size of sales district/1000 families
X6 = number of competing stores in district

X1     X2    X3    X4     X5     X6
231    3.0   294   8.2    8.2    11
156    2.2   232   6.9    4.1    12
10     0.5   149   3.0    4.3    15
519    5.5   600   12.0   16.1   1
437    4.4   567   10.6   14.1   5
487    4.8   571   11.8   12.7   4
299    3.1   512   8.1    10.1   10
195    2.5   347   7.7    8.4    12
20     1.2   212   3.3    2.1    15
68     0.6   102   4.9    4.7    8
570    5.4   788   17.4   12.3   1
428    4.2   577   10.5   14.0   7
464    4.7   535   11.3   15.0   3
15     0.6   163   2.5    2.5    14
65     1.2   168   4.7    3.3    11
98     1.6   151   4.6    2.7    10
398    4.3   342   5.5    16.0   4
161    2.6   196   7.2    6.3    13
397    3.8   453   10.4   13.9   7
497    5.3   518   11.5   16.3   1
528    5.6   615   12.3   16.0   0
99     0.8   278   2.8    6.5    14
0.5    1.1   142   3.1    1.6    12
347    3.6   461   9.6    11.3   6
341    3.5   382   9.8    11.5   5
507    5.1   590   12.0   15.7   0
400    8.6   517   7.0    12.0   8
Table 1: Original data (see endnote i)
We have one dependent variable y, annual net sales, and five independent variables: floor area (number of
sq. ft.), inventory, amount spent on advertising, size of sales district and number of competing stores in the
district. A multiple regression model is suitable for this task.

1. Present the Microsoft Excel output and write down the estimated regression equation (3 marks)
Table 2 is the Microsoft Excel regression output for annual net sales ($y$) on number of sq. ft. ($x_2$),
inventory ($x_3$), amount spent on advertising ($x_4$), size of sales district ($x_5$) and number of competing
stores in district ($x_6$).

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.996583914
R Square 0.993179497
Adjusted R Square 0.991555568
Standard Error 17.64924165
Observations 27

ANOVA
df SS MS F Significance F
Regression 5 952538.9415 190507.7883 611.5903672 5.39731E-22
Residual 21 6541.410344 311.4957306
Total 26 959080.3519

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -18.85941416 30.15022791 -0.625514812 0.538372333 -81.56024554 43.84141723 -81.56024554 43.84141723
X2 16.20157356 3.544437306 4.570986073 0.000165985 8.830512669 23.57263445 8.830512669 23.57263445
X3 0.174635154 0.057606068 3.031540961 0.006346793 0.054836778 0.294433531 0.054836778 0.294433531
X4 11.52626903 2.5321033 4.55205324 0.000173652 6.260471952 16.79206611 6.260471952 16.79206611
X5 13.5803129 1.770456609 7.670514392 1.60543E-07 9.898446822 17.26217897 9.898446822 17.26217897
X6 -5.31097141 1.70542654 -3.114160174 0.005248873 -8.857600053 -1.764342766 -8.857600053 -1.764342766
Table 2: MS Excel regression output for annual net sales
From the Excel output we can get the estimated regression equation, which is:

$$\hat{y} = -18.86 + 16.20x_2 + 0.17x_3 + 11.53x_4 + 13.58x_5 - 5.31x_6$$

2. How well does the model fit the data? (2 marks)

R Square ($R^2$), the coefficient of determination, tells us what proportion of the variation in y is explained
by the regression. Because we have more than one x variable, we should use Adjusted R Square, which
accounts for the number of independent variables. Our Adjusted R Square is 0.99, which means that about 99%
of the variation of the y-values around their mean is explained by the x-values. Therefore, the model fits the
data very well.
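Both fit measures can be recovered from the ANOVA block of Table 2. The sketch below is a minimal check, assuming the usual definitions $R^2 = SSR/SST$ and adjusted $R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)$; the variable names are illustrative.

```python
# Sums of squares from the ANOVA block of Table 2.
ss_regression = 952538.9415
ss_total = 959080.3519
n_obs = 27  # number of observations
k = 5       # number of independent variables

# Share of the total variation in y explained by the regression.
r_squared = ss_regression / ss_total

# Adjusted R Square penalises the number of independent variables.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - k - 1)

print(f"R Square          = {r_squared:.6f}")
print(f"Adjusted R Square = {adj_r_squared:.6f}")
```

The two printed values should agree with the R Square and Adjusted R Square rows of Table 2 up to rounding.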

3. Test the hypothesis that there is no significant relationship between the dependent and any of the
independent variables (2 marks)

Annual net sales (y) is the dependent variable, and the others are independent variables.

1) $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$

   $H_1: \beta_j \neq 0$ for at least one $j$

2) $\alpha = 0.05$
3) P-value = 5.39731E-22 $\approx 0 < \alpha = 0.05$
4) Conclusion: Reject $H_0$ at $\alpha = 0.05$; there is not sufficient evidence to support
$\beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$. Therefore, we reject the hypothesis that there is no
significant relationship between the dependent variable and any of the independent variables.

Alternatively, Table 3 shows the p-value test and results for each independent variable.


Dependent      Independent                  p-value   Test             Relationship
Annual sales   Area                         0.000     p-value < 0.05   Significant relationship exists
Annual sales   Inventory                    0.006     p-value < 0.05   Significant relationship exists
Annual sales   Advertising spending         0.000     p-value < 0.05   Significant relationship exists
Annual sales   Size of sales district       0.000     p-value < 0.05   Significant relationship exists
Annual sales   Number of competing stores   0.005     p-value < 0.05   Significant relationship exists
Table 3: P-value test and results
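The overall test in step 3 rests on the F statistic in Table 2, which is the ratio of the regression mean square to the residual mean square. The sketch below recomputes it from the sums of squares and degrees of freedom. The critical value of about 2.68 for F(5, 21) at the 5% level is an assumption taken from a standard F table, not from the assignment.

```python
# ANOVA entries from Table 2.
ss_regression = 952538.9415
ss_residual = 6541.410344
df_regression = 5
df_residual = 21

# Approximate upper 5% point of F(5, 21), read from a standard F table (assumption).
F_CRIT = 2.68

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

print(f"F = {f_stat:.4f}")
print("Reject H0" if f_stat > F_CRIT else "Do not reject H0")
```

Because the statistic is far above the critical value, the conclusion does not depend on the exact table value used.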

4. Interpret individual slope coefficients (3 marks)

Table 4 shows the interpretation of individual slope coefficients.

Variables                  Coefficients   Interpretation

Intercept                  -18.86         Setting $x_2 = x_3 = x_4 = x_5 = x_6 = 0$ gives y = -18.86.
(annual net sales)                        -18.86 means that without a store, inventory, advertising expense,
                                          sales district and competing stores in the district, sales would be
                                          -18.86 thousand dollars. Obviously, this is meaningless.

Number sq. ft.             16.20          16.20 means that each additional 1,000 sq. ft. of store area
                                          increases sales by 16.20 thousand dollars, holding the other
                                          variables constant.

Inventory                  0.17           0.17 means that each additional 1 thousand dollars of inventory
                                          increases sales by 170 dollars.

Advertising expense        11.53          11.53 means that each additional 1 thousand dollars of advertising
                                          expense increases sales by 11.53 thousand dollars.

Size of sales district     13.58          13.58 means that if the sales district grows by 1,000 families,
                                          sales increase by 13.58 thousand dollars.

Number of competing        -5.31          -5.31 means that each additional competing store in the sales
stores in district                        district decreases sales by 5.31 thousand dollars.

Table 4: Interpretation of individual slope coefficients

5. Construct a 95% confidence interval for the slope coefficients of individual variables (3 marks)

Table 5 shows the 95% confidence intervals for the intercept and the slope coefficients of individual variables.

Variables                                Lower 95% limit   Upper 95% limit

Intercept                                -81.56024554      43.84141723
Store area                               8.830512669       23.57263445
Inventory                                0.054836778       0.294433531
Advertising expense                      6.260471952       16.79206611
Size of sales district                   9.898446822       17.26217897
Number of competing stores in district   -8.857600053      -1.764342766

Table 5: 95% confidence intervals for the slope coefficients
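Each interval has the form estimate ± t-critical × standard error. The sketch below rebuilds the slope intervals from the coefficients and standard errors in Table 2. The critical value 2.0796 (two-sided 95%, 21 residual degrees of freedom) is an assumption read from a standard t table, and `confidence_interval` is a hypothetical helper name.

```python
# Two-sided 95% critical value of Student's t with 21 df (assumption, from a t table).
T_CRIT = 2.0796

# name: (estimate, standard error), copied from Table 2.
coefficients = {
    "X2": (16.20157356, 3.544437306),
    "X3": (0.174635154, 0.057606068),
    "X4": (11.52626903, 2.5321033),
    "X5": (13.5803129, 1.770456609),
    "X6": (-5.31097141, 1.70542654),
}


def confidence_interval(estimate, std_error, t_crit=T_CRIT):
    """Return (lower, upper) limits of estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


intervals = {name: confidence_interval(b, se) for name, (b, se) in coefficients.items()}
for name, (low, high) in intervals.items():
    print(f"{name}: [{low:.3f}, {high:.3f}]")
```

The printed limits should match the Lower 95% and Upper 95% columns of Table 2 to about three decimal places, the small gap coming from the rounded critical value.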

6. Test the estimated slope coefficients for individual variables for significance (3 marks)

Criterion: reject H0 if t-stat > t-critical; otherwise accept it.

Variables                    t-stat    t-critical   Decision on H0   Result
Area                         7.7354    2.0555       Rejected         Statistically significant
Inventory                    -8.2877   2.0555       Accepted         Not significant
Advertising spending         7.6716    2.0555       Rejected         Statistically significant
Size of sales market         7.6869    2.0555       Rejected         Statistically significant
Number of competing stores   7.3719    2.0555       Rejected         Statistically significant
Table 6: Test of the estimated slope coefficients for individual variables

Table 6 shows the results of testing the estimated slope coefficients for individual variables. From the table,
the estimated slope coefficients of all variables except inventory are significant.
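For reference, the t statistic of a slope is its coefficient divided by its standard error, and a two-sided test at the 5% level compares the absolute value with the critical value for the residual degrees of freedom. The sketch below applies this to the coefficients and standard errors reported in Table 2, with a critical value of 2.0796 for 21 degrees of freedom assumed from a standard t table. Its inputs are not the t-stat column of Table 6, so its output should be read against Table 2 and need not agree with Table 6.

```python
# Two-sided 5% critical value of Student's t with 21 df (assumption, from a t table).
T_CRIT = 2.0796

# name: (estimate, standard error), copied from Table 2.
coefficients = {
    "X2": (16.20157356, 3.544437306),
    "X3": (0.174635154, 0.057606068),
    "X4": (11.52626903, 2.5321033),
    "X5": (13.5803129, 1.770456609),
    "X6": (-5.31097141, 1.70542654),
}

# t statistic = coefficient / standard error.
t_stats = {name: b / se for name, (b, se) in coefficients.items()}

# Two-sided decision: reject H0 (slope = 0) when |t| exceeds the critical value.
reject_h0 = {name: abs(t) > T_CRIT for name, t in t_stats.items()}

for name, t in t_stats.items():
    decision = "reject H0" if reject_h0[name] else "do not reject H0"
    print(f"{name}: t = {t:.4f} -> {decision}")
```

The computed ratios can be checked directly against the t Stat column of Table 2.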
7. Remove all insignificant variables and re-estimate the model (1 mark)

From Table 6, inventory is not significant, so we remove it and obtain the result in Table 7.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.995085241
R Square 0.990194637
Adjusted R Square 0.988411844
Standard Error 20.67511795
Observations 27

ANOVA
df SS MS F Significance F
Regression 4 949676.2208 237419.0552 555.4175271 9.57989E-22
Residual 22 9404.131054 427.4605024
Total 26 959080.3519

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept -39.460022 34.41055873 -1.14674168 0.263807827 -110.8231531 31.90310896
X2 20.44388672 3.814801407 5.359095937 2.21824E-05 12.53247282 28.35530062
X4 16.96614275 2.092787626 8.10695865 4.73185E-08 12.62596685 21.30631864
X5 15.67296189 1.90985556 8.206359798 3.85791E-08 11.71216388 19.6337599
X6 -4.04330128 1.936828415 -2.08758879 0.048629066 -8.060037571 -0.026565
Table 7: MS Excel regression output after removing inventory
From the Excel output we can get the estimated regression equation, which is:

$$\hat{y} = -39.46 + 20.44x_2 + 16.97x_4 + 15.67x_5 - 4.04x_6$$

8. Using the model from part 7, predict annual sales for a franchisee with 1,000 sq ft floor area,
$150,000 inventory, $5,000 spent on advertising, 5,000 families in the area of operation and 2
competitors. (3 marks)

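In the units of Table 1, the inputs are $x_2 = 1$, $x_4 = 5$, $x_5 = 5$ and $x_6 = 2$; inventory does not enter
the re-estimated model.

$$\hat{y} = -39.46 + 20.44(1) + 16.97(5) + 15.67(5) - 4.04(2) = 136.10$$

The predicted annual sales are therefore about 136.10 thousand dollars, or roughly $136,100.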
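The substitution can be sketched in a few lines of Python. The coefficients are the full-precision values from the re-estimated Excel output, the inputs are expressed in the units of Table 1 (thousands of sq. ft., thousands of dollars, thousands of families), and `predict_sales` is a hypothetical helper name rather than anything from the assignment.

```python
# Coefficients of the re-estimated model (inventory removed).
INTERCEPT = -39.460022
SLOPES = {
    "x2": 20.44388672,  # floor area, 1000 sq. ft.
    "x4": 16.96614275,  # advertising, $000
    "x5": 15.67296189,  # district size, 1000 families
    "x6": -4.04330128,  # number of competing stores
}


def predict_sales(floor_area, advertising, families, competitors):
    """Predicted annual net sales in $000, inputs in the units of Table 1."""
    return (
        INTERCEPT
        + SLOPES["x2"] * floor_area
        + SLOPES["x4"] * advertising
        + SLOPES["x5"] * families
        + SLOPES["x6"] * competitors
    )


# 1,000 sq ft, $5,000 advertising, 5,000 families, 2 competitors.
prediction = predict_sales(floor_area=1.0, advertising=5.0, families=5.0, competitors=2)
print(f"Predicted annual sales: {prediction:.2f} thousand dollars")
```

With the full-precision coefficients the prediction is about 136.09 thousand dollars. The $150,000 inventory figure is not an argument because inventory was dropped from the model.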


References
Black, K., 2009. Business statistics: Contemporary decision making. USA: John Wiley & Sons.

Groebner, D.F., Shannon, P.W., Fry, P.C. and Smith, K.D., 2011. Business statistics: A decision making approach. UK: Prentice Hall/Pearson.

Newbold, P., Carlson, W. and Thorne, B., 2012. Statistics for business and economics. UK: Pearson.

Weiers, R.M., 2010. Introduction to business statistics. USA: Cengage Learning.


i Source: http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/owan/frames/frame.html
