
Highlights of the last lecture

Two Sample Tests

Test Population Means, Independent Samples

Test Population Means, Related Samples

Test Two Population Proportions

1 November, 2011

STAT 101 -- Part IX

Discussion: two-sample testing of population means

We want to find out the effectiveness of two business school preparation courses. How should we obtain the sample observations for these two courses?

Should students be randomly selected as related samples or independent samples? Discuss these two sampling methods.

X: Simple Linear Regression


Correlation versus regression
Simple linear regression model
Least squares method
Measures of variation in regression
Coefficient of determination
Standard error of estimate
Assumptions and residual analysis
Inference about the slope
Estimation of mean values and prediction of individual values

STAT 101 -- Part X

Introduction

Regression analysis is used primarily for the purpose of prediction. We will develop a regression model to predict the values of a dependent (or response) variable based on the values of independent (or explanatory) variables. In this section, we only focus on simple linear regression, i.e. only one dependent variable and one independent variable.


The sample covariance

The sample covariance measures the strength of the linear relationship between two variables (called bivariate data).

The sample covariance:

$$\mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

Only concerned with the strength of the linear relationship
No causal effect is implied
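A minimal sketch of the sample covariance calculation, assuming the usual definition with the n - 1 divisor. The function name and the toy data are hypothetical and chosen only for illustration.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length samples (n - 1 divisor)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products of deviations from the two sample means.
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / (n - 1)


# Hypothetical toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]    # tends to rise as x rises
y_down = [5, 4, 5, 4, 2]  # tends to fall as x rises

cov_up = sample_covariance(x, y_up)
cov_down = sample_covariance(x, y_down)
print(cov_up, cov_down)
```

The sign of the result shows the direction of the linear relationship. Its size depends on the units of X and Y, so it is hard to judge on its own.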

Interpreting sample covariance

Sample covariance between two variables:

cov(X,Y) > 0: X and Y tend to move in the same direction
cov(X,Y) < 0: X and Y tend to move in opposite directions
cov(X,Y) = 0: X and Y have no linear relationship (they may be independent)


Coefficient of correlation

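Measures the relative strength of the linear relationship between two variables.

Sample coefficient of correlation:

$$r = \frac{\mathrm{cov}(X, Y)}{S_X S_Y}$$

where S_X and S_Y are the sample standard deviations of X and Y.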


Features of correlation coefficient, r


Unit free
Ranges between -1 and 1
The closer to -1, the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship
The closer to 0, the weaker the linear relationship
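A short sketch illustrating two of these features, assuming r is the sample covariance divided by the product of the two sample standard deviations. The function name and data are hypothetical.

```python
import math


def sample_correlation(xs, ys):
    """Sample coefficient of correlation r for two equal-length samples."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    # The (n - 1) divisors of the covariance and the two standard
    # deviations cancel, so they are left out here.
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = sample_correlation(x, y)

# Unit free: rescaling X (for example, changing its units) leaves r unchanged.
r_scaled = sample_correlation([1000 * v for v in x], y)

# A perfect negative linear relationship gives r = -1.
r_neg = sample_correlation(x, [10 - 2 * v for v in x])

print(round(r, 4), round(r_scaled, 4), round(r_neg, 4))
```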


Correlation vs. regression

Correlation analysis is used to measure the relative strength of the association (linear relationship) between two variables.
Correlation is only concerned with the strength of the relationship
No causal effect is implied with correlation

Regression analysis is used to predict the dependent variable based on the independent variable.
Changes in the dependent variable are assumed to be caused by changes in the independent variable

Simple linear regression model


Only one independent variable, X
Only one dependent variable, Y
Relationship between X and Y is described by a linear function
Changes in Y are assumed to be caused by changes in X


Types of relationships

[Figure: example plots of a linear relationship, a non-linear relationship, and no relationship]


Simple linear regression model


The Population Regression Model

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

Y_i: dependent variable
X_i: independent variable
\beta_0: population parameter, Y intercept
\beta_1: population parameter, slope coefficient
\varepsilon_i: random error term


Simple linear regression equation


The simple linear regression equation provides an estimate of the population regression line:

$$\hat{Y}_i = b_0 + b_1 X_i$$

\hat{Y}_i: estimated Y value for observation i
b_0: estimate of the regression intercept
b_1: estimate of the regression slope
X_i: value of X for observation i

The individual random error terms have a mean of zero.



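Least squares method

The intercept b_0 and the slope b_1 are the values that minimize the sum of the squared differences between the observed Y_i and the predicted \hat{Y}_i:

$$\min_{b_0, b_1} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \min_{b_0, b_1} \sum_{i=1}^{n} \left(Y_i - (b_0 + b_1 X_i)\right)^2$$

$$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}$$

Alternatively, you can use Excel to determine the coefficients.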
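For readers who prefer code to a spreadsheet, here is a minimal sketch of the least squares calculation. It assumes the usual closed-form expressions, with the slope as the sum of cross-products of deviations divided by the sum of squared X deviations, and the intercept as the mean of Y minus the slope times the mean of X. The function names and the toy data are hypothetical.

```python
def least_squares_fit(xs, ys):
    """Return (b0, b1) for the line Y-hat = b0 + b1 * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ssxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ssx = sum((x - x_bar) ** 2 for x in xs)
    b1 = ssxy / ssx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_errors(xs, ys, b0, b1):
    """Sum of squared differences between observed and predicted Y."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares_fit(x, y)
sse_fit = sum_squared_errors(x, y, b0, b1)

# Any other line should do no better; here the slope is nudged slightly.
sse_other = sum_squared_errors(x, y, b0, b1 + 0.1)

print(round(b0, 4), round(b1, 4), round(sse_fit, 4), round(sse_other, 4))
```

The comparison with a nudged slope is only a spot check of the "least squares" property, not a proof.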



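Interpretation of the slope and the intercept

The intercept b_0 is the estimated mean value of Y when the value of X is zero.
The slope b_1 is the estimated change in the mean value of Y for a one-unit increase in X.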


Numerical example

The source of the data is a full-page advertisement placed in the Straits Times newspaper issue of 29 February, 1992 by a Singapore-based retailer of diamond jewelry. The advertisement contained pictures of diamond rings and listed their prices, diamond content, and gold purity. There were 48 such rings of varying designs. The weights of the diamond stones ranged from 0.12 to 0.35 carats (a one-carat diamond stone weighs 0.2 gram). The price of diamond jewelry depends on the four Cs: carat, cut, color, clarity.

Dependent variable (Y) = price in Singapore dollars
Independent variable (X) = weight in carats

Weight of diamond in carats (X): 0.17 0.16 0.17 0.18 0.25 0.16 0.15 0.19 0.21 0.15 0.18 0.28 0.16 0.2 0.23 0.29 0.12 0.26 0.25 0.27 0.18 0.16 0.17 0.16 0.17 0.18 0.17 0.18 0.17 0.15 0.17 0.32 0.32 0.15 0.16 0.16 0.23 0.23 0.17 0.33 0.25 0.35 0.18 0.25 0.25 0.15 0.26 0.15

Diamond price in Singapore dollars (Y): 355 328 350 325 642 342 322 485 483 323 462 823 336 498 595 860 223 663 750 720 468 345 352 332 353 438 318 419 346 315 350 918 919 298 339 338 595 553 345 945 655 1086 443 678 675 287 693 316

[Figure: Relationship between weights of diamond and price in Singapore in February 1992. Scatter plot of price in Singapore dollars (0 to 1200) against weight in carats (0 to 0.4).]


                             Coefficients   Standard Error   t Stat         P-value       Lower 95%     Upper 95%
Intercept                    -259.6259072   17.31885619      -14.99093845   2.52327E-19   -294.486956   -224.7648583
Weight of Diamond (carats)   3721.024852    81.78588037      45.4971547     6.75126E-40   3556.398415   3885.651288
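As a cross-check on the reported coefficients, the sketch below applies the closed-form least squares expressions to the 48 weight and price pairs listed in the numerical example. The variable and function names are hypothetical. The assumption is that the values are paired in the order listed.

```python
# Weight in carats (X) and price in Singapore dollars (Y), in listed order.
weights = [
    0.17, 0.16, 0.17, 0.18, 0.25, 0.16, 0.15, 0.19, 0.21, 0.15, 0.18, 0.28,
    0.16, 0.20, 0.23, 0.29, 0.12, 0.26, 0.25, 0.27, 0.18, 0.16, 0.17, 0.16,
    0.17, 0.18, 0.17, 0.18, 0.17, 0.15, 0.17, 0.32, 0.32, 0.15, 0.16, 0.16,
    0.23, 0.23, 0.17, 0.33, 0.25, 0.35, 0.18, 0.25, 0.25, 0.15, 0.26, 0.15,
]
prices = [
    355, 328, 350, 325, 642, 342, 322, 485, 483, 323, 462, 823,
    336, 498, 595, 860, 223, 663, 750, 720, 468, 345, 352, 332,
    353, 438, 318, 419, 346, 315, 350, 918, 919, 298, 339, 338,
    595, 553, 345, 945, 655, 1086, 443, 678, 675, 287, 693, 316,
]


def least_squares_fit(xs, ys):
    """Return (b0, b1) for the line Y-hat = b0 + b1 * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ssxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ssx = sum((x - x_bar) ** 2 for x in xs)
    b1 = ssxy / ssx
    return y_bar - b1 * x_bar, b1


b0, b1 = least_squares_fit(weights, prices)
print(f"intercept b0 = {b0:.4f}")
print(f"slope     b1 = {b1:.4f}")
```

If the data are paired as assumed, the printed values should agree with the Coefficients column of the table.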


[Figure: plot of price in Singapore dollars (0 to 1200) against weight in carats (0 to 0.4)]


Interpretation of slope and prediction of price
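Using the reported coefficients, the slope of about 3721 means that each additional 0.01 carat is associated with an estimated increase of roughly 37 Singapore dollars in mean price. The intercept is negative and has no practical meaning here, because a weight of zero lies outside the observed range of 0.12 to 0.35 carats. The sketch below turns this into a small prediction helper. The function name and the choice to reject weights outside the observed range are assumptions made for illustration.

```python
# Coefficients as reported in the regression output for the diamond data.
B0 = -259.6259072  # intercept, Singapore dollars
B1 = 3721.024852   # slope, Singapore dollars per carat

# Observed range of weights in the sample, in carats.
X_MIN, X_MAX = 0.12, 0.35


def predict_price(weight_carats):
    """Predicted price in Singapore dollars for a given weight in carats."""
    if not X_MIN <= weight_carats <= X_MAX:
        # Assumed policy: refuse to extrapolate beyond the observed weights.
        raise ValueError("weight is outside the observed range of the data")
    return B0 + B1 * weight_carats


price_quarter_carat = predict_price(0.25)
increase_per_hundredth = B1 * 0.01

print(f"predicted price at 0.25 carat: {price_quarter_carat:.2f}")
print(f"estimated increase per 0.01 carat: {increase_per_hundredth:.2f}")
```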


Measures of variation in regression


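The results are always presented in the Analysis of Variance (ANOVA) table.

Total variation is made up of two parts:

$$SST = SSR + SSE$$

Total Sum of Squares: $SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
Regression Sum of Squares: $SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$
Error Sum of Squares: $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$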


Measure of variation


Coefficient of determination

The coefficient of determination is the proportion of the total variation in the dependent variable that is explained by the variation in the independent variable. The coefficient of determination is also called r-squared and is denoted as r^2:

$$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$


Examples of r-squared

http://istics.net/stat/Correlations/


Standard error of estimate

The standard deviation of the variation of observations around the regression line is estimated by

$$S_{YX} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}}$$

It is a measure of the variation of observed Y values from the regression line.
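The sketch below ties the measures of variation together on a small example. It assumes the usual definitions: the total, regression, and error sums of squares, r-squared as the regression share of the total, and the standard error of estimate as the square root of the error sum of squares divided by n - 2. The function name and the toy data are hypothetical.

```python
import math


def regression_variation(xs, ys):
    """Fit a least squares line and return the measures of variation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * x for x in xs]

    sst = sum((y - y_bar) ** 2 for y in ys)                # total
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))   # unexplained
    return {
        "SST": sst,
        "SSR": ssr,
        "SSE": sse,
        "r_squared": ssr / sst,
        "S_YX": math.sqrt(sse / (n - 2)),
    }


# Hypothetical toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

result = regression_variation(x, y)
for name, value in result.items():
    print(f"{name} = {value:.4f}")
```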


Diamond example (Contd)

ANOVA

             df   SS            MS            F             Significance F
Regression   1    2098595.999   2098595.999   2069.991086   6.75126E-40
Residual     46   46635.66747   1013.818858
Total        47   2145231.667

Diamond example (Contd)

97.826% of the variation in diamond prices can be explained by the variation in weight
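The 97.826% figure can be reproduced from the sums of squares reported in the ANOVA output. The sketch below does this arithmetic and also derives the standard error of estimate and the F statistic, assuming n = 48 rings and one independent variable. The variable names are hypothetical.

```python
import math

# Sums of squares as reported in the ANOVA output for the diamond data.
SSR = 2098595.999   # regression sum of squares
SSE = 46635.66747   # error (residual) sum of squares
SST = 2145231.667   # total sum of squares
N = 48              # number of rings in the sample

r_squared = SSR / SST        # coefficient of determination
mse = SSE / (N - 2)          # mean square error, 46 degrees of freedom
s_yx = math.sqrt(mse)        # standard error of estimate
f_stat = (SSR / 1) / mse     # regression mean square over mean square error

print(f"SSR + SSE = {SSR + SSE:.3f} (reported SST = {SST})")
print(f"r-squared = {r_squared:.5f}")
print(f"S_YX      = {s_yx:.2f} Singapore dollars")
print(f"F         = {f_stat:.3f}")
```

The standard error of estimate comes out at roughly 32 Singapore dollars, which is small relative to prices that range from about 200 to about 1100.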


Assumptions of regression

Linear relationship: the relationship between X and Y is linear (if there is any relationship)
Normality of error: error values are normally distributed for any given value of X
Homoscedasticity: the probability distribution of errors has constant variance
Independence of errors: error values are statistically independent


Residual analysis

The residual for observation i is the difference between its observed value and its predicted value:

$$e_i = Y_i - \hat{Y}_i$$

Check the assumptions of regression by examining the residuals:

Examine for linearity
Examine for constant variance for all levels of X
Examine for normal distribution
Examine for independence

Graphical analysis of residuals: plot the residuals against X
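A minimal sketch of how the residuals could be computed before plotting them against X. It fits the least squares line with the usual closed-form expressions and pairs each X value with its residual. The function name and the toy data are hypothetical, and the plotting step itself is left out.

```python
def fit_and_residuals(xs, ys):
    """Fit a least squares line and return the residuals e_i = Y_i - Y-hat_i."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

residuals = fit_and_residuals(x, y)

# These (X, residual) pairs are what a residual plot would display.
for xi, ei in zip(x, residuals):
    print(f"X = {xi}: residual = {ei:+.2f}")

# With an intercept in the model, least squares residuals sum to zero
# (up to floating point rounding).
residual_sum = sum(residuals)
```

In a residual plot, a curved pattern suggests non-linearity and a fan shape suggests non-constant variance. A patternless band around zero is consistent with the assumptions.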


Residual analysis for linearity

[Figure: example plots, one panel labelled Non-Linear and one labelled Linear]


Residual analysis for homoscedasticity

[Figure: example plots, one panel labelled Non-constant variance and one labelled Constant variance]


Residual analysis for independence

[Figure: example plots, two panels labelled Not independent and one labelled Independent]


Normal probability plot of residuals

[Figure: example normal probability plots, one panel labelled Non-normal distribution and one labelled Normal distribution]


Diamond example (contd)


[Figure: plot of price in Singapore dollars (0 to 1200) against weight in carats (0 to 0.4)]
[Figure: Residual Plot, residuals (-100 to 100) against weight of diamond in carats (0 to 0.4)]
[Figure: Normal Probability Plot, residuals against Z value]


[Figure: Normal Probability Plot, residuals (-100 to 100) against Z value (-2.5 to 2.5)]


Useful and interesting websites


http://istics.net/stat/Correlations/
Guessing the coefficient of correlation

http://www.causeweb.org/repository/statjava/Regression.html
Changes of y-intercept and slope


Recommended questions from the textbook


Questions 13.4, 13.6, 13.8, 13.10, 13.16, 13.18, 13.20, 13.22
Pages 509, 510, 515
