Student:
Coordinator:
January, 2014
Introduction
The subject of this project comes from a personal curiosity about whether price influences the
durability of a product. The product chosen, nail polish, is one that women use frequently.
I chose it because women are the respondents who usually take the time to answer
questionnaires and to report their actual consumption honestly.
Database description
The data selected from the questionnaire cover the three most important factors on which I will
build this econometrics project; two of them influence the third.
Durability- x1
Time- x2
Price- y
This case study uses a sample of 37 observations.
Hypothesis testing
Based on the chosen model, we will conduct two hypothesis tests that examine certain features of,
and assumptions about, our data.
First category of hypothesis testing
First, I will test the most used brand. According to the questionnaire, most of the women use
Flormar nail polish because of its good quality-to-price ratio, and it seems to last longer than
the other brands mentioned. The average durability reported by Flormar users is 4.428571
minutes. Sample results for 23 observations show that the average durability of the other brands
(all except Flormar) is 4.152173913 minutes, with a standard deviation of 1.76734767 minutes.
We use hypothesis testing to see whether the sample supports the belief held by these women,
whose average age is 22.
Survey data:
$\bar{x}_1 = 4.42$
$\bar{x}_2 = 4.15$
$n_1 = 23$
$n_2 = 14$
$s_1^2 = 7.76$
$s_2^2 = 3.12$
The computations will be made in minutes
Step 1
Initial assumption: Women with an average age of 22 believe that Flormar nail polish lasts
longer than other brands.
Alternative hypothesis: These women are wrong, and Flormar nail polish does not last longer
than other brands.
Step 2
$H_0: \mu = 4.428$
$H_1: \mu \neq 4.428$
Step 3
This is a two-sided test on the mean, because of the form of the alternative hypothesis.
Step 4
The significance level chosen is $\alpha = 5\%$, and therefore the rejection region is
$(-\infty, -1.96) \cup (1.96, \infty)$.
Step 5
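$$Z_{calc} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{4.42 - 4.15}{\sqrt{\frac{7.76}{23} + \frac{3.12}{14}}} = \frac{0.27}{\sqrt{0.560}} \approx 0.36$$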
Step 6
As $Z_{calc}$ does not fall into the rejection region, we cannot reject $H_0$.
We do not have enough sample evidence to infer that $H_1$ is true.
At the 95% confidence level, we cannot say for certain that Flormar nail polish lasts longer than
other brands, nor that it lasts less.
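The Step 5 computation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the s1 and s2 entries of the survey data are treated as sample variances and paired with n1 and n2 as labelled there, and the function name `two_sample_z` is my own.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, var1, var2, n1, n2):
    """Z statistic for the difference between two independent sample means."""
    standard_error = sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error


# Survey data as labelled in the report (assumption: 7.76 and 3.12 are variances).
z = two_sample_z(4.42, 4.15, 7.76, 3.12, 23, 14)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # two-sided critical value, about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided p-value
reject_h0 = abs(z) > z_crit

print(f"z = {z:.3f}, critical = {z_crit:.3f}, p = {p_value:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the statistic lies well inside (-1.96, 1.96), so the decision in Step 6 does not change.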
$\hat{p} = \frac{14}{31} \approx 0.4515 = 45.15\%$, $n = 31$
This is a left-tailed test on the proportion, because of the form of the alternative hypothesis.
Step 4
The significance level chosen is $\alpha = 5\%$, and therefore the rejection region is
$(-\infty, -1.645)$.
Step 5
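$$Z_{calc} = \frac{p - \pi_0}{\sqrt{\frac{\pi_0 (100 - \pi_0)}{n}}} = \frac{45.15 - 37}{\sqrt{\frac{37 (100 - 37)}{31}}} = \frac{8.15}{\sqrt{75.19}} \approx 0.94$$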
Step 6
As $Z_{calc}$ does not fall into the rejection region $(-\infty, -1.645)$, we cannot reject
$H_0$. At the 5% significance level, we do not have enough sample evidence to support $H_1$.
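The proportion test can be checked the same way. The sketch below assumes that the hypothesized proportion is the 37% that appears in the Step 5 formula and works with proportions rather than percentages; the function name `proportion_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(p_hat, p0, n):
    """Z statistic for a one-sample test on a proportion."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


p_hat = 14 / 31          # sample proportion, about 45.15%
p0 = 0.37                # hypothesized proportion (assumed from the formula above)
n = 31

z = proportion_z(p_hat, p0, n)
z_crit = NormalDist().inv_cdf(0.05)   # left-tail critical value, about -1.645
reject_h0 = z < z_crit

print(f"z = {z:.3f}, critical = {z_crit:.3f}, reject H0: {reject_h0}")
```

Because the statistic is positive, it cannot fall in a left-tail rejection region, whatever the exact rounding of the sample proportion.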
SIMPLE LINEAR REGRESSION MODEL
We will first analyze the influence of durability on price. This is a model with one regressor.
Consider the general form of the simple linear regression function:
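$$Y_i = \beta_1 + \beta_2 X_2 + \varepsilon_i$$
The variables of this model are $X_2$ and $Y_i$:
$Y_i$ = Value of the dependent variable, price
$X_2$ = Value of the independent variable, durability
The specific model for this sample is: Price = 6.89 + 1.511 Durability + $\varepsilon$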
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.2587
R Square             0.06693
Adjusted R Square    0.04027
Standard Error       12.4494
Observations         37

ANOVA
             df    SS        MS         F         Significance F
Regression   1     389.08    389.0798   2.51041   0.122091661
Residual     35    5424.5    154.9865
Total        36    5813.6

             Coefficients   Standard Error   t Stat     P-value   Lower 95%     Upper 95%
Intercept    6.89038        4.5474           1.515252   0.13869   -2.34122916   16.122
Durability   1.51147        0.954            1.584428   0.12209   -0.42515703   3.448088
The level of correlation between the variables is shown by multiple R. In this case it is 0.2587,
which does not belong to the interval [0.75, 1]. This shows a low level of correlation between the
variables.
To interpret the coefficients, we look first at the intercept. It represents the predicted price
when durability is 0. However, since the regressor cannot be 0, the intercept has no meaningful
interpretation.
The slope is 1.511. This shows a positive relationship between Price and Durability: each
additional unit of Durability is associated with a predicted increase of 1.511 units in Price.
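To make the interpretation of the slope concrete, the fitted line can be written as a small function. This is only a sketch: the coefficients are the ones reported in the summary output, while the function name `predict_price` and the durability values tried are my own choices.

```python
# Coefficients as reported in the regression summary output.
INTERCEPT = 6.89038
SLOPE = 1.51147


def predict_price(durability):
    """Predicted price from the fitted simple regression line."""
    return INTERCEPT + SLOPE * durability


# Predicted prices for a few illustrative durability values.
for durability in (1, 5, 10):
    print(durability, round(predict_price(durability), 2))

# Moving durability up by one unit always changes the prediction by the slope.
step = predict_price(5) - predict_price(4)
print(round(step, 5))
```

The difference between any two predictions one unit apart equals the slope, which is what the sentence above states in words.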
To test the validity of the model, we shall hypothesize that all values of the Price are the
same.
$H_0: Price_1 = Price_2 = Price_3 = \dots = Price_{37}$
$H_1: Price_i \neq Price_j$
To test this claim, we can compare the calculated F with the critical F for this model,
or compare Significance F with $\alpha = 5\%$. Significance F (0.12) > 0.05, therefore
we cannot reject $H_0$, and we say at the 95% confidence level that the model is NOT valid.
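The F statistic and the R Square values can be recomputed from the sums of squares in the ANOVA table, which is a useful check on the Excel output. In the sketch below the critical value 4.12 for F(1, 35) at the 5% level is taken from a standard table and should be treated as an assumption; the variable names are my own.

```python
# Sums of squares and sample size from the ANOVA table.
ss_regression = 389.0798
ss_residual = 5424.5
n_obs = 37
k = 1                      # number of regressors

ss_total = ss_regression + ss_residual
ms_regression = ss_regression / k
ms_residual = ss_residual / (n_obs - k - 1)

f_stat = ms_regression / ms_residual
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - k - 1)

F_CRITICAL = 4.12          # F(0.05; 1, 35) from a standard table (assumed)
model_significant = f_stat > F_CRITICAL

print(f"F = {f_stat:.4f}, R^2 = {r_squared:.4f}, adjusted R^2 = {adj_r_squared:.4f}")
print("model significant at 5%:", model_significant)
```

Comparing F with its critical value and comparing Significance F with 0.05 are two views of the same test, so they lead to the same decision.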
To test the inference on the slope, we examine its confidence interval.
The 95% confidence interval is (-0.425, 3.44). This interval contains the value 0, so we check
significance by comparing the p-value (0.12) with $\alpha$ (0.05). The p-value is higher than
0.05, therefore the slope is not statistically significant.
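The t statistic and the confidence interval for the slope follow directly from the coefficient and its standard error. The sketch below uses the values from the summary output; the critical value 2.0301 for t(0.975; 35) comes from a standard table and is an assumption of this example.

```python
# Slope estimate and standard error from the summary output.
coefficient = 1.51147
standard_error = 0.954

T_CRITICAL = 2.0301   # t(0.975; 35 degrees of freedom), from a standard table (assumed)

t_stat = coefficient / standard_error
lower = coefficient - T_CRITICAL * standard_error
upper = coefficient + T_CRITICAL * standard_error
contains_zero = lower < 0 < upper

print(f"t = {t_stat:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
print("interval contains 0:", contains_zero)
```

An interval that contains 0 and a p-value above 0.05 are equivalent statements for a two-sided test at the 5% level.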
[Figures: normal probability plot (x-axis: Sample Percentile); Durability residual plot (x-axis: Durability); Durability line fit plot (Price and Predicted Price against Durability)]
From the residual plot above, we can see that the errors are randomly scattered, which suggests
there is no correlation between the errors. From the line fit plot, it can be seen that the errors
are not equally spread around the mean, which suggests the model is heteroskedastic.
Finally, I conducted a Durbin-Watson test for this model. In the Excel file, I calculated d,
which is 2.22 for the simple regression. dL and dU are 1.217 and 1.322 respectively; since d is
higher than dU, there is no statistical evidence that the errors are positively
autocorrelated.
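The Durbin-Watson statistic and the decision rule used above can be sketched as follows. The residual values are hypothetical placeholders, since the Excel residuals are not reproduced in this report, and the function names are my own.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def dw_decision(d, d_lower, d_upper):
    """Decision rule for a test of positive autocorrelation."""
    if d < d_lower:
        return "positive autocorrelation"
    if d > d_upper:
        return "no evidence of positive autocorrelation"
    return "inconclusive"


# Hypothetical residuals, only to show how d is computed.
example_residuals = [1.2, -0.8, 0.5, -1.1, 0.9, -0.4]
d_example = durbin_watson(example_residuals)
print(round(d_example, 3))

# Decision for the values reported in the text: d = 2.22, dL = 1.217, dU = 1.322.
print(dw_decision(2.22, 1.217, 1.322))
```

The statistic always lies between 0 and 4, with values near 2 indicating little first-order autocorrelation.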
MULTIPLE LINEAR REGRESSION MODEL
$$Y_i = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon_i$$
The variables of this model are $Y_i$, $X_2$ and $X_3$:
$Y_i$ = Value of the dependent variable, price
$X_2$ = Value of the independent variable, durability
$X_3$ = Value of the independent variable, time
The specific model for our sample is: Price = 6.89 + 1.38 Durability - 0.17 Time + $\varepsilon$
To analyze the correlation between the variables, we look at multiple R. In this case it is
0.26, which does not belong to the interval [0.75, 1]. This shows a low level of correlation
between the variables, only slightly improved by adding an extra regressor.
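Multiple R cannot decrease when a regressor is added, so a fairer comparison between the two models uses the adjusted R Square, which penalizes extra regressors. The sketch below is approximate: it uses the multiple R of 0.2587 from the simple regression output and the rounded 0.26 quoted for the multiple model, and the function name is my own.

```python
def adjusted_r_squared(multiple_r, n_obs, k):
    """Adjusted R Square from multiple R, sample size and number of regressors."""
    r_squared = multiple_r ** 2
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - k - 1)


n_obs = 37
adj_simple = adjusted_r_squared(0.2587, n_obs, k=1)     # durability only
adj_multiple = adjusted_r_squared(0.26, n_obs, k=2)     # durability and time (rounded R)

print(f"simple model:   {adj_simple:.4f}")
print(f"multiple model: {adj_multiple:.4f}")
```

With these rounded inputs the adjusted value falls once Time is added, which suggests that the small gain in multiple R does not reflect a real improvement in fit.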
To interpret the coefficients, we look first at the intercept. It represents the predicted
price when both regressors are 0. However, since the two regressors cannot be 0, the intercept
has no meaningful interpretation.
The first slope is 1.38. This shows a positive relationship between Price and Durability: holding
Time constant, each additional unit of Durability is associated with an increase of 1.38 units in Price.
The second slope is -0.17. This shows a negative relationship between Price and Time: holding
Durability constant, each additional unit of Time is associated with a decrease of 0.17 units in Price.
To test the validity of the model, we shall hypothesize that all values of the Price are the
same.
$H_0: Price_1 = Price_2 = Price_3 = \dots = Price_{37}$
$H_1: Price_i \neq Price_j$
To test this claim, we can compare the calculated F with the critical F for this model,
or compare Significance F with $\alpha = 5\%$. Significance F (0.28) > 0.05, therefore
we cannot reject $H_0$, and we say at the 95% confidence level that the model is NOT
valid.
To test the inference on the slopes, we examine their confidence intervals.
The first confidence interval is (-0.684, 3.44). This interval contains the value 0, so we check
significance by comparing the p-value (0.18) with $\alpha$ (0.05). The p-value is higher
than 0.05, therefore the first slope is not statistically significant.
[Figures: normal probability plot (x-axis: Sample Percentile); Time residual plot (x-axis: Time)]
The second confidence interval is (-1.01, 0.67). This interval contains the value 0, so we check
significance by comparing the p-value (0.68) with $\alpha$ (0.05). The p-value is higher than
0.05, therefore the second slope is not statistically significant.
[Figures: Time line fit plot (Price and Predicted Price against Time); Durability line fit plot (Price and Predicted Price against Durability); Durability residual plot (x-axis: Durability)]
From the normal probability plot, we can see that the distribution of errors is skewed to the right.
Both residual plots show random scattering of the errors, which suggests there is no correlation
between the errors. Also, in both line fit plots the errors are dispersed unevenly around the
mean, which suggests that the model is heteroskedastic with respect to both regressors.
I conducted a Durbin-Watson test for this model as well, and the results were the same as for the
simple regression: d is higher than dU, so there is no statistical evidence that
the errors are positively autocorrelated.
Finally, I analyzed the two independent variables to obtain their coefficient of
correlation, which in this case is -0.31. This shows that the two variables, durability and time,
are weakly negatively correlated. It also indicates that multicollinearity is not a
concern.
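The link between the correlation of the two regressors and multicollinearity can be made explicit with the variance inflation factor (VIF), which for two regressors is 1 / (1 - r^2). The sketch below is an illustration only: the helper names are my own, the small data set is hypothetical, and the threshold of about 5 to 10 often quoted for a problematic VIF is a rule of thumb, not a result of this project.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def variance_inflation_factor(r):
    """VIF for a model with two regressors whose correlation is r."""
    return 1 / (1 - r ** 2)


# Hypothetical data, only to show how r is computed.
print(pearson_r([1, 2, 3], [2, 4, 6]))

# VIF for the correlation reported in the text.
vif = variance_inflation_factor(-0.31)
print(round(vif, 3))
```

A VIF this close to 1 means the standard errors of the slopes are barely inflated by the correlation between durability and time.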
Conclusion
Based on the limited sample evidence and the low correlation between the variables, the test must
be repeated, because we cannot be sure that drying time and the durability of the nail polish on
the nail are the only factors that influence the price of such a product.