
Regression Analysis

Intro to OLS Linear Regression

Regression Analysis
Defined as the analysis of the statistical relationship among variables. In its simplest form there are only two variables:
- Dependent or response variable (labeled as Y)
- Independent or predictor variable (labeled as X)


Statistical Relationships: A Warning

Be aware that, as with correlation and other measures of statistical association, a relationship does not guarantee or even imply causality between the variables. Also be aware of the difference between a mathematical or functional relationship based upon theory and a statistical relationship based upon data and its imperfect fit to a mathematical model.

Simple Linear Regression: $Y = \alpha + \beta X + \varepsilon$


The basic function for linear regression is Y = f(X), but the equation typically takes the following form:

$$Y = \alpha + \beta X + \varepsilon$$

- Alpha ($\alpha$): an intercept component of the model that represents the model's value for Y when X = 0
- Beta ($\beta$): a coefficient that loosely denotes the nature of the relationship between Y and X and, more specifically, denotes the slope of the linear equation that specifies the model
- Epsilon ($\varepsilon$): a term that represents the errors associated with the model
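To make the three components concrete, here is a minimal sketch (assuming Python with NumPy; the parameter values are illustrative, loosely echoing the ice cream example that follows) that simulates data from this model:

```python
import numpy as np

rng = np.random.default_rng(0)

alpha = 138.0   # hypothetical intercept: Y when X = 0
beta = -16.0    # hypothetical slope: change in Y per one-unit change in X
n = 10

x = rng.uniform(1.0, 3.5, size=n)       # predictor values
epsilon = rng.normal(0.0, 5.0, size=n)  # random error term
y = alpha + beta * x + epsilon          # the model: Y = alpha + beta*X + epsilon
```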

Example

$$Y_i = \alpha + \beta X_i + \varepsilon_i$$

where i is a counter representing the ith observation in the data set.

| Observation (i) | (Y) Number of ice cream cones sold | (X) Cost of ice cream cones |
|---|---|---|
| 1 | 84 | $2.50 |
| 2 | 89 | $3.00 |
| 3 | 92 | $3.25 |
| 4 | 96 | $2.25 |
| 5 | 98 | $1.75 |
| 6 | 102 | $2.75 |
| 7 | 113 | $2.00 |
| 8 | 114 | $1.50 |
| 9 | 122 | $1.25 |
| 10 | 127 | $1.00 |

Accompanying Scatterplot

[Figure: "Ice Cream Demand" scatterplot — y-axis: Ice Cream Cones Sold (80–130); x-axis: Ice Cream Cone Cost ($0.75–$3.50)]

Accompanying Scatterplot with Regression Equation

[Figure: the same "Ice Cream Demand" scatterplot with a fitted trendline: y = -16.094x + 137.81, R² = 0.7078]
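A minimal sketch (assuming Python with NumPy and Matplotlib, which are not part of the lecture) that reproduces a plot like this from the table above; `np.polyfit` stands in for the spreadsheet's trendline:

```python
import numpy as np
import matplotlib.pyplot as plt

# Ice cream data from the table above
cost = np.array([2.50, 3.00, 3.25, 2.25, 1.75, 2.75, 2.00, 1.50, 1.25, 1.00])
sold = np.array([84, 89, 92, 96, 98, 102, 113, 114, 122, 127])

# A degree-1 polynomial fit is a simple linear regression (slope, intercept)
slope, intercept = np.polyfit(cost, sold, 1)

plt.scatter(cost, sold)
xs = np.linspace(cost.min(), cost.max(), 100)
plt.plot(xs, intercept + slope * xs)
plt.title("Ice Cream Demand")
plt.xlabel("Ice Cream Cone Cost")
plt.ylabel("Ice Cream Cones Sold")
plt.show()
```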

What does the additional info mean?

- Alpha: about 138 cones
- Beta: about -16 cones per $1 increase in cost
- Epsilon: still present, as evidenced by the fact that the model does not fit the data perfectly
- R²: a new term, the Coefficient of Determination. A value of 0.71 is pretty good, considering that the value is scaled between 0 and 1, with 1 being a model in perfect agreement with the data


Coefficient of Determination
In this simple example, R² is indeed the square of r. Recall that r is often the symbol for the Pearson Product Moment Correlation (PPMC), which is a parametric measure of association between two variables. Here r(X, Y) = -0.84, and (-0.84)² ≈ 0.71. We will get into why this is the case and how these are related on Thursday.
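A quick check of this relationship (a sketch assuming Python with NumPy; the results should land near the slide's -0.84 and 0.7078, with small differences likely due to rounding in the slide):

```python
import numpy as np

cost = np.array([2.50, 3.00, 3.25, 2.25, 1.75, 2.75, 2.00, 1.50, 1.25, 1.00])
sold = np.array([84, 89, 92, 96, 98, 102, 113, 114, 122, 127])

r = np.corrcoef(cost, sold)[0, 1]  # Pearson's r between X and Y
print(r)       # negative: sales fall as cost rises
print(r ** 2)  # R^2, the coefficient of determination
```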

A Digression into History

- Adrien-Marie Legendre: the original author of the method of least squares, published in 1805
- The guy that got the credit: Carl Friedrich Gauss, the giant of early statistics, published the theory of least squares in 1821

Back on Topic: a recap of PPMC, or r

From last semester:
The PPMC coefficient is essentially the sum of the products of the z-scores for each variable, divided by the degrees of freedom. Its computation can take on a number of forms depending on your resources.


What it looks like in equation form:

$$r = \frac{\sum z_x z_y}{n - 1}$$

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}, \qquad \text{where } s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Mathematically simplified:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

Computationally easier:

$$r = \frac{\sum xy - \left(\sum x\right)\left(\sum y\right)/n}{\sqrt{\left[\sum x^2 - \left(\sum x\right)^2/n\right]\left[\sum y^2 - \left(\sum y\right)^2/n\right]}}$$

Covariance

The sample covariance is the second equation above without the sample standard deviations in the denominator:

$$\operatorname{cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}$$

It measures how two variables covary, and it is this measure that serves as the numerator in Pearson's r.
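A sketch (assuming Python with NumPy) that computes r on the ice cream data both by the z-score definition and by the computationally easier form; the two should agree:

```python
import numpy as np

x = np.array([2.50, 3.00, 3.25, 2.25, 1.75, 2.75, 2.00, 1.50, 1.25, 1.00])
y = np.array([84, 89, 92, 96, 98, 102, 113, 114, 122, 127], dtype=float)
n = len(x)

# Definition: sum of products of z-scores over the degrees of freedom
zx = (x - x.mean()) / x.std(ddof=1)   # ddof=1 -> sample standard deviation
zy = (y - y.mean()) / y.std(ddof=1)
r_def = (zx * zy).sum() / (n - 1)

# Computationally easier form, using only raw sums
r_comp = (x @ y - x.sum() * y.sum() / n) / np.sqrt(
    (x @ x - x.sum() ** 2 / n) * (y @ y - y.sum() ** 2 / n)
)

print(r_def, r_comp)  # identical up to floating-point error
```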

Take home message

- Correlation is a measure of association between two variables.
- Covariance is a measure of how the two variables vary with respect to one another.
- Both of these are parametrically based statistical measures: note that PPMC is based upon z-scores, and z-scores are based upon the normal or Gaussian distribution. Thus these measures, as well as linear regression based upon the method of least squares, are predicated upon the assumption of normality and other parametric assumptions.

OLS defined
OLS stands for Ordinary Least Squares. This is the method of estimation used in linear regression. Its defining and nominal criterion is that it minimizes the errors associated with predicting values for Y. It uses a least squares criterion because simply minimizing the raw deviations would allow positive and negative deviations from the model to cancel each other out (using the same logic that is used for computations of variance and a host of other statistical measures).

$$\min \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2$$
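To illustrate why the deviations are squared, here is a sketch (assuming Python with NumPy) comparing the fitted line to a deliberately bad flat line at Ȳ: both have raw residuals summing to roughly zero, so only the squared criterion tells them apart:

```python
import numpy as np

x = np.array([2.50, 3.00, 3.25, 2.25, 1.75, 2.75, 2.00, 1.50, 1.25, 1.00])
y = np.array([84, 89, 92, 96, 98, 102, 113, 114, 122, 127], dtype=float)

slope, intercept = np.polyfit(x, y, 1)    # OLS fit
resid_fit = y - (intercept + slope * x)   # residuals from the fitted line
resid_flat = y - y.mean()                 # residuals from a flat line at Y-bar

print(resid_fit.sum(), resid_flat.sum())            # both ~0: raw sums cancel
print((resid_fit**2).sum(), (resid_flat**2).sum())  # SSE: fitted line is far smaller
```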

The math behind OLS

Recall that the linear regression equation for a single independent variable takes this form:

$$Y_i = \alpha + \beta X_i + \varepsilon_i$$

| (i) | (Y) | (X) |
|---|---|---|
| 1 | 84 | $2.50 |
| 2 | 89 | $3.00 |
| 3 | 92 | $3.25 |
| 4 | 96 | $2.25 |
| 5 | 98 | $1.75 |
| 6 | 102 | $2.75 |
| 7 | 113 | $2.00 |
| 8 | 114 | $1.50 |
| 9 | 122 | $1.25 |
| 10 | 127 | $1.00 |

Since Y and X are known for all i and the error term is immutable, minimizing the model errors is really based upon our choice of alpha and beta.
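A sketch (assuming Python with NumPy) that treats the sum of squared errors as a function of the chosen alpha and beta, showing that nudging either one away from the fitted values only increases it:

```python
import numpy as np

x = np.array([2.50, 3.00, 3.25, 2.25, 1.75, 2.75, 2.00, 1.50, 1.25, 1.00])
y = np.array([84, 89, 92, 96, 98, 102, 113, 114, 122, 127], dtype=float)

def sse(alpha, beta):
    """Sum of squared errors for a candidate intercept and slope."""
    return ((y - (alpha + beta * x)) ** 2).sum()

slope, intercept = np.polyfit(x, y, 1)  # the OLS choice of alpha and beta

print(sse(intercept, slope))      # the minimum
print(sse(intercept + 5, slope))  # worse: intercept perturbed
print(sse(intercept, slope + 2))  # worse: slope perturbed
```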

This criterion,

$$\min \sum_{i=1}^{n} \left(Y_i - \alpha - \beta X_i\right)^2,$$

can be stated by letting S be the total sum of squared deviations, from i = 1 to n over all Y and X, for a given alpha and beta:

$$S(\alpha, \beta) = \sum_{i=1}^{n} \left(Y_i - \alpha - \beta X_i\right)^2$$

The alpha and beta that minimize S can be found by taking the partial derivative of S with respect to each of alpha and beta and setting each partial derivative equal to zero, yielding
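For reference, the explicit partial derivatives (a standard calculus step the slides pass over) are:

$$\frac{\partial S}{\partial \alpha} = -2 \sum_{i=1}^{n} \left(Y_i - \alpha - \beta X_i\right) = 0$$

$$\frac{\partial S}{\partial \beta} = -2 \sum_{i=1}^{n} X_i \left(Y_i - \alpha - \beta X_i\right) = 0$$

Dividing each by -2 gives the two conditions below.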

$$\sum_{i=1}^{n} \left(Y_i - \alpha - \beta X_i\right) = 0 \quad \text{for alpha, and}$$

$$\sum_{i=1}^{n} X_i \left(Y_i - \alpha - \beta X_i\right) = 0 \quad \text{for beta,}$$

which can be further simplified to

$$\sum_{i=1}^{n} Y_i = n\alpha + \beta \sum_{i=1}^{n} X_i \quad \text{for alpha, and}$$

$$\sum_{i=1}^{n} X_i Y_i = \alpha \sum_{i=1}^{n} X_i + \beta \sum_{i=1}^{n} X_i^2 \quad \text{for beta}$$
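These two conditions are the normal equations: a 2×2 linear system in alpha and beta. A sketch (assuming Python with NumPy) that solves them directly:

```python
import numpy as np

x = np.array([2.50, 3.00, 3.25, 2.25, 1.75, 2.75, 2.00, 1.50, 1.25, 1.00])
y = np.array([84, 89, 92, 96, 98, 102, 113, 114, 122, 127], dtype=float)
n = len(x)

# Normal equations in matrix form:
#   [ n       sum(x)   ] [alpha]   [ sum(y)  ]
#   [ sum(x)  sum(x^2) ] [beta ] = [ sum(xy) ]
A = np.array([[n, x.sum()],
              [x.sum(), (x * x).sum()]])
b = np.array([y.sum(), (x * y).sum()])

alpha, beta = np.linalg.solve(A, b)
print(alpha, beta)  # matches np.polyfit(x, y, 1) up to floating-point error
```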

Refer to page 436 for the text's more detailed description of the computations for solving for alpha and beta.

$$\sum_{i=1}^{n} Y_i = n\alpha + \beta \sum_{i=1}^{n} X_i$$

$$\sum_{i=1}^{n} X_i Y_i = \alpha \sum_{i=1}^{n} X_i + \beta \sum_{i=1}^{n} X_i^2$$

Given these, we can easily solve for the simpler alpha via algebra. Since X̄ is the sum of all X_i from 1 to n divided by n, and the same can be said for Ȳ, we are left with

$$\bar{Y} - \beta \bar{X} = \alpha$$

Dividing the first normal equation,

$$\sum_{i=1}^{n} Y_i = n\alpha + \beta \sum_{i=1}^{n} X_i,$$

through by n gives

$$\frac{\sum_{i=1}^{n} Y_i}{n} - \beta\, \frac{\sum_{i=1}^{n} X_i}{n} = \alpha$$

Since the mean of both X and Y can be obtained from the data, we can calculate the intercept (alpha) very simply if we know the slope (beta).
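As a quick worked check using the ice cream data (X̄ = 2.125 and Ȳ = 103.7, computed from the table) and the plotted slope of about -16.09:

$$\alpha = \bar{Y} - \beta\bar{X} \approx 103.7 - (-16.09)(2.125) \approx 137.9,$$

which is in line with the plotted intercept of about 138 (small differences reflect rounding in the trendline shown on the chart).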

Once we have a simple equation for alpha, we can plug it into the equation for beta and then solve for the slope of the regression equation. Substituting $\alpha = \bar{Y} - \beta\bar{X}$ into the second normal equation:

$$\sum_{i=1}^{n} X_i Y_i = \left( \frac{\sum_{i=1}^{n} Y_i}{n} - \beta\, \frac{\sum_{i=1}^{n} X_i}{n} \right) \sum_{i=1}^{n} X_i + \beta \sum_{i=1}^{n} X_i^2$$

$$\sum_{i=1}^{n} X_i Y_i = \frac{\sum_{i=1}^{n} Y_i \sum_{i=1}^{n} X_i}{n} - \beta\, \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n} + \beta \sum_{i=1}^{n} X_i^2$$

Multiply by n and you get

$$n \sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} Y_i \sum_{i=1}^{n} X_i = n\beta \sum_{i=1}^{n} X_i^2 - \beta \left(\sum_{i=1}^{n} X_i\right)^2$$

Isolate beta and we have

$$\frac{n \sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} Y_i \sum_{i=1}^{n} X_i}{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2} = \beta$$

Alpha, or the regression intercept:

$$\bar{Y} - \beta \bar{X} = \alpha$$

Beta, or the regression slope:

$$\frac{n \sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} Y_i \sum_{i=1}^{n} X_i}{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2} = \beta$$
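These two closed-form expressions translate directly into code; a sketch in plain Python (no libraries assumed), using the ice cream data:

```python
x = [2.50, 3.00, 3.25, 2.25, 1.75, 2.75, 2.00, 1.50, 1.25, 1.00]
y = [84, 89, 92, 96, 98, 102, 113, 114, 122, 127]
n = len(x)

# Raw sums needed by the closed-form formulas
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)

# Beta (slope) first, then alpha (intercept) from the formulas above
beta = (n * sum_xy - sum_y * sum_x) / (n * sum_x2 - sum_x ** 2)
alpha = sum_y / n - beta * (sum_x / n)  # alpha = Y-bar - beta * X-bar

print(alpha, beta)  # should land near the plotted 137.81 and -16.094
```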

Given this info, let's:

Head over to the lab and get some hands-on practice using the small and relatively simple ice cream sales data set. We will cover the math behind the coefficient of determination on Thursday and introduce regression with multiple independent variables.
