1 November, 2011
To find out the effectiveness of two business school preparation courses, how should we obtain the sample observations from these two courses?
Should students be randomly selected as related samples or as independent samples? Discuss these two sampling methods.
STAT 101 -- Part IX
Correlation versus regression
Simple linear regression model
Least squares method
Measures of variation in regression
Coefficient of determination
Standard error of estimate
Assumptions and residual analysis
Inference about the slope
Estimation of mean values and prediction of individual values
STAT 101 -- Part X
Introduction
Regression analysis is used primarily for the purpose of prediction. We will develop a regression model to predict the values of a dependent (or response) variable based on the values of independent (or explanatory) variables. In this section, we focus only on simple linear regression, i.e., a model with one dependent variable and one independent variable.
The sample covariance measures the strength of the linear relationship between two variables (called bivariate data).
The sample covariance:
$cov(X,Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
Only concerned with the strength of the linear relationship.
No causal effect is implied.
Sample covariance between two variables:
cov(X,Y) > 0: X and Y tend to move in the same direction
cov(X,Y) < 0: X and Y tend to move in opposite directions
cov(X,Y) = 0: X and Y have no linear relationship
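The three sign cases can be illustrated with a minimal sketch. The function name `sample_covariance` and the small data sets are hypothetical, made up only for illustration; the formula is the usual sample covariance with divisor n - 1.

```python
def sample_covariance(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


# Hypothetical data, chosen only to show the three sign cases.
x = [1, 2, 3, 4, 5]
same_direction = [2, 4, 5, 4, 5]      # tends to rise with x
opposite_direction = [5, 4, 5, 4, 2]  # tends to fall as x rises
u_shaped = [4, 1, 0, 1, 4]            # y = (x - 3)^2, related to x but not linearly

print(sample_covariance(x, same_direction))      # positive
print(sample_covariance(x, opposite_direction))  # negative
print(sample_covariance(x, u_shaped))            # zero
```

The last case is worth noting: a covariance of zero rules out a linear relationship only, since here Y is completely determined by X.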
Coefficient of correlation
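Measures the relative strength of the linear relationship between two variables.
Sample coefficient of correlation:
$r = \frac{cov(X,Y)}{S_X S_Y}$
where $S_X$ and $S_Y$ are the sample standard deviations of X and Y.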
Unit free
Ranges between -1 and 1
The closer to -1, the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship
The closer to 0, the weaker any linear relationship
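A short sketch of these properties, assuming made-up data and a hypothetical helper named `sample_correlation`. It computes r from the deviations directly; the n - 1 divisors in the covariance and the standard deviations cancel.

```python
import math


def sample_correlation(x, y):
    """Sample coefficient of correlation r = cov(X,Y) / (S_X * S_Y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
perfect_positive = [3, 5, 7, 9, 11]  # exactly 2x + 1
perfect_negative = [10, 8, 6, 4, 2]  # exactly 12 - 2x
noisy_positive = [2, 4, 5, 4, 5]

print(sample_correlation(x, perfect_positive))  # r = 1
print(sample_correlation(x, perfect_negative))  # r = -1
print(sample_correlation(x, noisy_positive))    # between 0 and 1

# Unit free: rescaling x (for example, changing its unit) leaves r unchanged.
x_rescaled = [1000 * xi for xi in x]
print(sample_correlation(x_rescaled, noisy_positive))
```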
Correlation analysis is used to measure the relative strength of the association (linear relationship) between two variables.
Correlation is only concerned with the strength of the relationship.
No causal effect is implied with correlation.
Regression analysis is used to predict the dependent variable based on the independent variable.
Changes in the dependent variable are assumed to be caused by changes in the independent variable.
Only one independent variable, X
Only one dependent variable, Y
Relationship between X and Y is described by a linear function
Changes in Y are assumed to be caused by changes in X
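The formula for the linear function did not survive on this slide. In the usual textbook notation, which is an assumption here (the symbols $\beta_0$, $\beta_1$, $\varepsilon_i$, $b_0$ and $b_1$ do not appear in the recovered text), the simple linear regression model and its least squares estimates can be written as:

```latex
% Population model: intercept, slope and random error
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i

% Fitted (prediction) line from the sample
\hat{Y}_i = b_0 + b_1 X_i

% Least squares estimates, which minimise \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2
b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
b_0 = \bar{Y} - b_1 \bar{X}
```

The slope $b_1$ is the sample covariance divided by the sample variance of X, so its sign always agrees with the sign of cov(X,Y).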
Types of relationships
Linear relationship
Non-Linear relationship
No relationship
Numerical example
The source of the data is a full-page advertisement placed in the Straits Times newspaper issue of 29 February, 1992 by a Singapore-based retailer of diamond jewelry. The advertisement contained pictures of diamond rings and listed their prices, diamond content, and gold purity. There were 48 such rings of varying designs. The weights of the diamond stones ranged from 0.12 to 0.35 carats (a one-carat diamond stone weighs 0.2 gram). The price of diamond jewelry depends on the four Cs: carat, cut, color, clarity.
Dependent variable (Y) = price in Singapore dollars
Independent variable (X) = weight in carats
Weight of diamond in carats (X): 0.17 0.16 0.17 0.18 0.25 0.16 0.15 0.19 0.21 0.15 0.18 0.28 0.16 0.2 0.23 0.29 0.12 0.26 0.25 0.27 0.18 0.16 0.17 0.16 0.17 0.18 0.17 0.18 0.17 0.15 0.17 0.32 0.32 0.15 0.16 0.16 0.23 0.23 0.17 0.33 0.25 0.35 0.18 0.25 0.25 0.15 0.26 0.15
Diamond price in Singapore dollars (Y), in the same order: 355 328 350 325 642 342 322 485 483 323 462 823 336 498 595 860 223 663 750 720 468 345 352 332 353 438 318 419 346 315 350 918 919 298 339 338 595 553 345 945 655 1086 443 678 675 287 693 316
[Figure: plot of the diamond data, vertical axis in Singapore dollars]
                        Coefficients    Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept               -259.6259072
Weight of Diamond(g)    3721.024852
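The coefficients in this output can be reproduced from the data with a minimal sketch of the least squares calculation. It assumes the weights and prices are paired in the order listed in the example. The variable names are illustrative, and the standard error and t statistic formulas are the standard ones for simple linear regression rather than anything shown on the slide.

```python
import math

# Diamond data from the numerical example: weight in carats and price in
# Singapore dollars, assumed to be paired in the order listed.
carats = [
    0.17, 0.16, 0.17, 0.18, 0.25, 0.16, 0.15, 0.19, 0.21, 0.15, 0.18, 0.28,
    0.16, 0.20, 0.23, 0.29, 0.12, 0.26, 0.25, 0.27, 0.18, 0.16, 0.17, 0.16,
    0.17, 0.18, 0.17, 0.18, 0.17, 0.15, 0.17, 0.32, 0.32, 0.15, 0.16, 0.16,
    0.23, 0.23, 0.17, 0.33, 0.25, 0.35, 0.18, 0.25, 0.25, 0.15, 0.26, 0.15,
]
prices = [
    355, 328, 350, 325, 642, 342, 322, 485, 483, 323, 462, 823,
    336, 498, 595, 860, 223, 663, 750, 720, 468, 345, 352, 332,
    353, 438, 318, 419, 346, 315, 350, 918, 919, 298, 339, 338,
    595, 553, 345, 945, 655, 1086, 443, 678, 675, 287, 693, 316,
]

n = len(carats)
x_bar = sum(carats) / n
y_bar = sum(prices) / n

# Sums of squares and cross-products around the means.
ss_x = sum((x - x_bar) ** 2 for x in carats)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(carats, prices))

# Least squares slope and intercept.
b1 = ss_xy / ss_x
b0 = y_bar - b1 * x_bar

# Standard error of the estimate from the residuals (n - 2 degrees of freedom).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(carats, prices))
s_yx = math.sqrt(sse / (n - 2))

# Standard errors and t statistics of the two coefficients.
se_b1 = s_yx / math.sqrt(ss_x)
se_b0 = s_yx * math.sqrt(1 / n + x_bar ** 2 / ss_x)
t_b1 = b1 / se_b1
t_b0 = b0 / se_b0

print(f"Intercept: {b0:.4f}  (SE {se_b0:.4f}, t {t_b0:.2f})")
print(f"Slope:     {b1:.4f}  (SE {se_b1:.4f}, t {t_b1:.2f})")
```

The slope says that, within the observed range of 0.12 to 0.35 carats, each additional 0.01 carat is associated with an increase of roughly 37 Singapore dollars in price. The negative intercept has no practical meaning, since a weight of zero lies outside the data.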
[Figure: plot against weight in carats, horizontal axis from 0 to 0.4]
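The results are usually presented in the Analysis of Variance (ANOVA) table.
Total variation is made up of two parts:
$SST = SSR + SSE$
where SST is the total sum of squares, SSR is the regression sum of squares (explained variation) and SSE is the error sum of squares (unexplained variation).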
Measure of variation
Coefficient of determination
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by the variation in the independent variable. The coefficient of determination is also called r-squared and is denoted as $r^2$:
$r^2 = \frac{SSR}{SST}$
Examples of r-squared
http://istics.net/stat/Correlations/
The standard deviation of the variation of observations around the regression line is estimated by
$S_{YX} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}}$
ANOVA
              df    SS             MS    F    Significance F
Regression    1     2098595.999               6.75126E-40
Residual      46    46635.66747
Total         47    2145231.667
97.826% of the variation in diamond prices can be explained by the variation in weight
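The sums of squares in the ANOVA table, the coefficient of determination and the standard error of the estimate can be recomputed from the raw data. The following is a sketch under the assumption that the weights and prices are paired in the order listed in the numerical example; the variable names are illustrative.

```python
import math

# Diamond data from the numerical example (weight in carats, price in
# Singapore dollars), assumed to be paired in the order listed.
carats = [
    0.17, 0.16, 0.17, 0.18, 0.25, 0.16, 0.15, 0.19, 0.21, 0.15, 0.18, 0.28,
    0.16, 0.20, 0.23, 0.29, 0.12, 0.26, 0.25, 0.27, 0.18, 0.16, 0.17, 0.16,
    0.17, 0.18, 0.17, 0.18, 0.17, 0.15, 0.17, 0.32, 0.32, 0.15, 0.16, 0.16,
    0.23, 0.23, 0.17, 0.33, 0.25, 0.35, 0.18, 0.25, 0.25, 0.15, 0.26, 0.15,
]
prices = [
    355, 328, 350, 325, 642, 342, 322, 485, 483, 323, 462, 823,
    336, 498, 595, 860, 223, 663, 750, 720, 468, 345, 352, 332,
    353, 438, 318, 419, 346, 315, 350, 918, 919, 298, 339, 338,
    595, 553, 345, 945, 655, 1086, 443, 678, 675, 287, 693, 316,
]

n = len(carats)
x_bar = sum(carats) / n
y_bar = sum(prices) / n

# Least squares fit.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(carats, prices))
      / sum((x - x_bar) ** 2 for x in carats))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in carats]

# Partition of the total variation: SST = SSR + SSE.
sst = sum((y - y_bar) ** 2 for y in prices)                # total
ssr = sum((y_hat - y_bar) ** 2 for y_hat in fitted)        # explained
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(prices, fitted))  # unexplained

r_squared = ssr / sst
msr = ssr / 1          # regression mean square, 1 degree of freedom
mse = sse / (n - 2)    # residual mean square, n - 2 degrees of freedom
f_stat = msr / mse
s_yx = math.sqrt(mse)  # standard error of the estimate

print(f"SSR = {ssr:.3f}, SSE = {sse:.3f}, SST = {sst:.3f}")
print(f"r-squared = {r_squared:.5f}")
print(f"MSR = {msr:.3f}, MSE = {mse:.3f}, F = {f_stat:.2f}")
print(f"S_YX = {s_yx:.3f}")
```

In simple linear regression the F statistic equals the square of the t statistic for the slope, so the two tests of the slope always agree.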
Assumptions of regression
Linear relationship: The relationship between X and Y is linear (if there is any relationship)
Normality of error: Error values are normally distributed for any given value of X
Homoscedasticity: The probability distribution of errors has constant variance
Independence of errors: Error values are statistically independent
Residual analysis
The residual for observation i is the difference between its observed value and its predicted value: $e_i = Y_i - \hat{Y}_i$
Check the assumptions of regression by examining the residuals
Examine for linearity
Examine for constant variance for all levels of X
Examine for normal distribution
Examine for independence
Plot the residuals against X
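A minimal sketch of computing residuals for such checks. The helper names `fit_line` and `residuals` and both data sets are hypothetical, made up for illustration. In practice the residuals would be plotted against X; here they are simply printed in order of X.

```python
def fit_line(x, y):
    """Least squares intercept b0 and slope b1 for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(x, y):
    """Residuals e_i = observed Y_i minus predicted Y_i from the fitted line."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
roughly_linear = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
curved = [xi ** 2 for xi in x]  # a clearly non-linear relationship

e_linear = residuals(x, roughly_linear)
e_curved = residuals(x, curved)

# Residuals scattered around zero with no pattern support the linear model.
print([round(e, 2) for e in e_linear])

# A systematic U-shaped pattern (positive, negative, positive) signals that a
# straight line is the wrong functional form.
print([round(e, 2) for e in e_curved])
```

By construction, least squares residuals always sum to zero and are uncorrelated with X, so those two facts say nothing about the assumptions; what matters is the pattern of the residuals.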
[Figure: residual analysis for linearity. Panels: Non-linear, Linear]
[Figure: residual analysis for constant variance. Panels: Non-constant variance, Constant variance]
[Figure: residual analysis for independence. Panels: Not independent, Not independent, Independent]
[Figure: residual analysis for normality. Panels: Non-normal distribution, Normal distribution]
[Figure: Residual Plot. Residuals against weight of diamond in carats]
[Figure: price in Singapore dollars against weight in carats]
[Figure: residuals against Z value]
http://www.causeweb.org/repository/statjava/Regression.html