Prepared By:
1 SHOUGANAG CHEN (DC5003)
X1 X2 X3
X4 X5
For X1
[Figure: Relative Frequency Histogram of X1. x-axis: Range (30 to 180); y-axis: Relative Frequency]
For X2
[Figure: Relative Frequency Histogram of X2. x-axis: Range (30 to 180); y-axis: Relative Frequency]
For X3
[Figure: Relative Frequency Histogram of X3. x-axis: Range (30 to 180); y-axis: Relative Frequency]
For X4
[Figure: Relative Frequency Histogram of X4. x-axis: Range (30 to 180); y-axis: Relative Frequency]
For X5
[Figure: Relative Frequency Histogram of X5. x-axis: Range (30 to 180); y-axis: Relative Frequency]
Parts 1 and 2 give the descriptive statistics and frequency distributions of the given data set. From the Part 1
results, the start-up cost for baker/donuts (X2) is the highest, while the start-up cost for pet stores (X5) is the
lowest. The range is also the largest for baker/donuts (X2), which implies that the mean start-up cost for this
business is not representative of the data. By contrast, the ranges for shoe stores (X3) and pet stores (X5) are
smaller because the individual values differ less from one another. In addition, the higher variance and standard
deviation of the start-up costs for baker/donuts (X2) suggest the presence of outliers (Newbold et al. 2012).
These values are lower for pet stores (X5), showing low variability around the mean.
The outcomes of Part 2 give the frequency and relative frequency distributions of the given data set. They
indicate that the start-up costs for baker/donuts (X2) contain outliers and vary more widely around the average
value. The distribution is left-skewed because of the outliers in the data set (Weiers, 2010). The distribution of
start-up costs for pet stores (X5), by contrast, is approximately normal, with low variability around the mean.
4. Test whether there is a significant difference in the starting costs for these types of business.
H0: There is no difference in the starting costs for these types of business.
H1: There is a significant difference in the starting costs for these types of business.
Test:
Results:
F value > F critical
The null hypothesis is rejected, which means there is a significant difference in the starting costs for these
types of business.
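The test output itself is not reproduced above, so the following is only a minimal sketch of how an F value of this kind could be obtained with a one-way ANOVA. The sample values, the function name `one_way_anova_f`, and the critical value (taken from a standard F table for alpha = 0.05 and 2 and 12 degrees of freedom) are assumptions for illustration, not the assignment's data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, df between, df within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical start-up costs ($000) for three business types (not the real data)
samples = [
    [80, 125, 35, 58, 110],
    [150, 40, 120, 75, 160],
    [48, 35, 95, 45, 75],
]
F_CRITICAL = 3.89  # assumed F-table value for alpha = 0.05, df = (2, 12)

f_stat, df_b, df_w = one_way_anova_f(samples)
decision = "Reject H0" if f_stat > F_CRITICAL else "Do not reject H0"
print(f"F = {f_stat:.3f}, df = ({df_b}, {df_w}): {decision}")
```

The decision rule matches the one used above: H0 is rejected only when the computed F value exceeds the critical value.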
Task 2 (20 marks)
The data for Task 2 in the data file for Assignment represents the following variables for franchisees of All
Greens Pty Ltd: annual sales ($000), the floor area (sq.ft.000), inventory ($000), advertising expenditure
($000), the size of the area where the business operates (number of families, 000) and the number of
competitors in the area.
X1     X2     X3     X4     X5     X6
231    3      294    8.2    8.2    11
156    2.2    232    6.9    4.1    12
10     0.5    149    3      4.3    15
519    5.5    600    12     16.1   1
437    4.4    567    10.6   14.1   5
487    4.8    571    11.8   12.7   4
299    3.1    512    8.1    10.1   10
195    2.5    347    7.7    8.4    12
20     1.2    212    3.3    2.1    15
68     0.6    102    4.9    4.7    8
570    5.4    788    17.4   12.3   1
428    4.2    577    10.5   14     7
464    4.7    535    11.3   15     3
15     0.6    163    2.5    2.5    14
65     1.2    168    4.7    3.3    11
98     1.6    151    4.6    2.7    10
398    4.3    342    5.5    16     4
161    2.6    196    7.2    6.3    13
397    3.8    453    10.4   13.9   7
497    5.3    518    11.5   16.3   1
528    5.6    615    12.3   16     0
99     0.8    278    2.8    6.5    14
0.5    1.1    142    3.1    1.6    12
347    3.6    461    9.6    11.3   6
341    3.5    382    9.8    11.5   5
507    5.1    590    12     15.7   0
400    8.6    517    7      12     8
Table 1: Original data
We have one dependent variable, y (annual net sales), and five independent variables: floor area (number of
sq. ft.), inventory, amount spent on advertising, size of the sales district, and number of competing stores in
the district. A multiple regression model is therefore appropriate for this task.
1. Present the Microsoft Excel output and write down the estimated regression equation (3 marks)
Table 2 is the Microsoft Excel regression output for annual net sales (y) on number of sq. ft. (X2), inventory
(X3), amount spent on advertising (X4), size of sales district (X5), and number of competing stores in district (X6).
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.996583914
R Square 0.993179497
Adjusted R Square 0.991555568
Standard Error 17.64924165
Observations 27
ANOVA
df SS MS F Significance F
Regression 5 952538.9415 190507.7883 611.5903672 5.39731E-22
Residual 21 6541.410344 311.4957306
Total 26 959080.3519
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -18.85941416 30.15022791 -0.625514812 0.538372333 -81.56024554 43.84141723 -81.56024554 43.84141723
X2 16.20157356 3.544437306 4.570986073 0.000165985 8.830512669 23.57263445 8.830512669 23.57263445
X3 0.174635154 0.057606068 3.031540961 0.006346793 0.054836778 0.294433531 0.054836778 0.294433531
X4 11.52626903 2.5321033 4.55205324 0.000173652 6.260471952 16.79206611 6.260471952 16.79206611
X5 13.5803129 1.770456609 7.670514392 1.60543E-07 9.898446822 17.26217897 9.898446822 17.26217897
X6 -5.31097141 1.70542654 -3.114160174 0.005248873 -8.857600053 -1.764342766 -8.857600053 -1.764342766
Table 2: MS Excel regression output for annual net sales
From the Excel output we can get the estimated regression equation, which is:
$\hat{y} = -18.86 + 16.20X_2 + 0.17X_3 + 11.53X_4 + 13.58X_5 - 5.31X_6$
R Square ($R^2$), the coefficient of determination, measures the proportion of the variation in y that is
explained by the regression. Because we have more than one x variable, we should use Adjusted R Square, which
accounts for the number of predictors. Our Adjusted R Square is 0.99, which means that about 99% of the
variation of the y-values around their mean is explained by the x-variables. Therefore, the model fits the data
very well.
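Both goodness-of-fit figures can be recomputed from the ANOVA block of Table 2. The sketch below uses the standard formulas with the sums of squares, the 27 observations and the 5 predictors reported there; the function names are illustrative only.

```python
def r_squared(ss_regression, ss_total):
    """Share of the total variation in y explained by the regression."""
    return ss_regression / ss_total


def adjusted_r_squared(r2, n_obs, n_predictors):
    """R Square penalised for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Values taken from Table 2 (ANOVA rows "Regression" and "Total")
r2 = r_squared(952538.9415, 959080.3519)
adj_r2 = adjusted_r_squared(r2, n_obs=27, n_predictors=5)
print(f"R Square = {r2:.4f}, Adjusted R Square = {adj_r2:.4f}")
```

The adjustment matters little here because the model has only 5 predictors against 27 observations and R Square is already very high.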
3. Test the hypothesis that there is no significant relationship between the dependent and any of the
independent variables (2 marks)
Annual net sales (y) is the dependent variable, and the others are independent variables.
1) $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$
   $H_1$: at least one $\beta_i \neq 0$
2) $\alpha = 0.05$
3) P-value = 5.39731E-22 $\approx 0 < \alpha = 0.05$
4) Conclusion: Reject $H_0$ at $\alpha = 0.05$; the data provide sufficient evidence that at least one slope
coefficient differs from zero. Therefore, we reject the hypothesis that there is no significant relationship
between the dependent variable and any of the independent variables.
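The F statistic behind this decision is the ratio of the regression and residual mean squares. A minimal sketch using the Table 2 values follows; the p-value is copied from the Excel output rather than recomputed, and the function name is illustrative.

```python
def overall_f(ss_regression, ss_residual, df_regression, df_residual):
    """F statistic for the joint test that all slope coefficients are zero."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression / ms_residual


ALPHA = 0.05
P_VALUE = 5.39731e-22  # "Significance F" as reported in Table 2

f_stat = overall_f(952538.9415, 6541.410344, df_regression=5, df_residual=21)
reject_h0 = P_VALUE < ALPHA
print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```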
Intercept (-18.86): without floor area, inventory, advertising expense, sales district and competing stores in
the district, predicted sales would be -18.86 thousand dollars. Obviously, this is meaningless.
Number sq. ft. (16.20): each additional 1,000 sq. ft. of store area increases sales by 16.20 thousand dollars.
Inventory (0.17): each additional 1 thousand dollars of inventory increases sales by about 170 dollars.
Advertising expense (11.53): each additional 1 thousand dollars of advertising expense increases sales by
11.53 thousand dollars.
Size of sales district (13.58): each additional 1,000 families in the sales district increases sales by 13.58
thousand dollars.
Number of competing stores in district (-5.31): each additional competing store in the sales district decreases
sales by 5.31 thousand dollars.
Each slope is interpreted holding the other variables constant.
5. Construct a 95% confidence interval for the slope coefficients of individual variables (3 marks)
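Each interval is the estimate plus or minus the critical t value times the standard error. The sketch below applies this to the coefficients and standard errors in Table 2. The critical value 2.0796 is an assumption read from a standard t-table (two-tailed, 5%, 21 residual degrees of freedom), and the results should agree with the "Lower 95%" and "Upper 95%" columns of Table 2 up to rounding.

```python
T_CRIT_DF21 = 2.0796  # assumed two-tailed 5% critical value for 21 df

# (coefficient, standard error) pairs copied from Table 2
ESTIMATES = {
    "X2": (16.20157356, 3.544437306),
    "X3": (0.174635154, 0.057606068),
    "X4": (11.52626903, 2.5321033),
    "X5": (13.5803129, 1.770456609),
    "X6": (-5.31097141, 1.70542654),
}


def confidence_interval(coef, std_err, t_crit):
    """Return (lower, upper) bounds of coef +/- t_crit * std_err."""
    margin = t_crit * std_err
    return coef - margin, coef + margin


for name, (coef, std_err) in ESTIMATES.items():
    lower, upper = confidence_interval(coef, std_err, T_CRIT_DF21)
    print(f"{name}: [{lower:.4f}, {upper:.4f}]")
```

An interval that excludes zero corresponds to a slope that is significant at the 5% level in a two-tailed test.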
6. Test the estimated slope coefficients for individual variables for significance (3 marks)
Criteria: reject H0 if |t-stat| > t-critical; otherwise do not reject it.
Variables   t-stat   t-critical   Criteria   Result
Area        7.7354   2.0555       Rejected   Statistically significant
Table 6 shows the results of testing the estimated slope coefficients for individual variables. From the table,
the estimated slope coefficients for all variables except inventory are significant.
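The decision rule used in Table 6 can be sketched as a two-tailed comparison of the t statistic with the critical value. Only the Area row is taken from the table; the function name is illustrative.

```python
def slope_is_significant(t_stat, t_critical):
    """Two-tailed test: reject H0 (slope = 0) when |t| exceeds the critical value."""
    return abs(t_stat) > t_critical


# Area row from Table 6: t-stat = 7.7354, t-critical = 2.0555
area_significant = slope_is_significant(7.7354, 2.0555)
print("Area:", "Rejected H0" if area_significant else "Did not reject H0")
```

Taking the absolute value means a large negative t statistic, such as that of a competitor count, is treated the same way as a large positive one.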
7. Remove all insignificant variables and re-estimate the model (1 mark)
From Table 6, inventory is not significant, so we remove it and obtain the result in Table 7 below.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.995085241
R Square 0.990194637
Adjusted R Square 0.988411844
Standard Error 20.67511795
Observations 27
ANOVA
df SS MS F Significance F
Regression 4 949676.2208 237419.0552 555.4175271 9.57989E-22
Residual 22 9404.131054 427.4605024
Total 26 959080.3519
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept -39.460022 34.41055873 -1.14674168 0.263807827 -110.8231531 31.90310896
X2 20.44388672 3.814801407 5.359095937 2.21824E-05 12.53247282 28.35530062
X4 16.96614275 2.092787626 8.10695865 4.73185E-08 12.62596685 21.30631864
X5 15.67296189 1.90985556 8.206359798 3.85791E-08 11.71216388 19.6337599
X6 -4.04330128 1.936828415 -2.08758879 0.048629066 -8.060037571 -0.026565
From the Excel output we can get the estimated regression equation, which is:
$\hat{y} = -39.46 + 20.44X_2 + 16.97X_4 + 15.67X_5 - 4.04X_6$
8. Using the model from part (g), predict annual sales for a franchisee with 1,000 sq ft floor area, $150,000
inventory, $5,000 spent on advertising, 5,000 families in the area of operation and 2 competitors. (3 marks)
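A minimal sketch of this prediction, assuming the re-estimated coefficients in the second regression output above and that each input is expressed in the units of the data file (floor area in thousand sq. ft., advertising in $000, families in thousands). Inventory does not enter because X3 was removed from the model. The function and variable names are illustrative.

```python
# Coefficients copied from the re-estimated regression output (inventory removed)
COEFFICIENTS = {
    "intercept": -39.460022,
    "X2": 20.44388672,  # floor area (thousand sq. ft.)
    "X4": 16.96614275,  # advertising ($000)
    "X5": 15.67296189,  # families in the area (thousands)
    "X6": -4.04330128,  # number of competitors
}


def predict_sales(floor_area, advertising, families, competitors):
    """Predicted annual sales in $000 from the four-variable model."""
    return (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["X2"] * floor_area
        + COEFFICIENTS["X4"] * advertising
        + COEFFICIENTS["X5"] * families
        + COEFFICIENTS["X6"] * competitors
    )


# 1,000 sq ft -> 1.0; $5,000 advertising -> 5.0; 5,000 families -> 5.0; 2 competitors
predicted = predict_sales(floor_area=1.0, advertising=5.0, families=5.0, competitors=2)
print(f"Predicted annual sales: {predicted:.2f} ($000)")
```

Converting the inputs to thousands before substituting is the step most easily missed, since the coefficients were estimated on data recorded in those units.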
Groebner, D.F., Shannon, P.W., Fry, P.C. and Smith, K.D., 2011. Business statistics: A decision making approach. UK: Prentice Hall/Pearson.
Newbold, P., Carlson, W. and Thorne, B., 2012. Statistics for business and economics. UK: Pearson.