
Simple regression

Statistics for dummies

Statistics

Gabriel V. Montes-Rojas

Gabriel Montes-Rojas Statistics



y = β0 + β1x + u
Much of applied econometrics is concerned with the simple linear
regression model, which explains the relationship between y and x:

y = β0 + β1 x + u

where

y                       x
dependent variable      independent variable
explained variable      explanatory variable
response variable       control variable
regressand              regressor or covariate

u is called the error term, residual or disturbance, and represents
all other factors, other than x, that affect y.

Our interest is the effect of x on the variable y in some
population. The error term u is assumed to have no systematic
influence on y, and therefore only x is of importance. Then,
setting u aside, we believe that y ≡ f(x) = β0 + β1 x.
The following definitions will be used extensively during the course:
β0 is the intercept, f(0) = β0.
This represents the value of y when x is set at 0.
β1 is the slope, ∆y/∆x = β1.
This represents the change in y after a one-unit change in x.
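
As a quick numerical illustration of these two definitions, the sketch below evaluates a line with made-up coefficients and checks that f(0) returns the intercept and that ∆y/∆x returns the slope. The values beta0 = 2.0 and beta1 = 0.5 are hypothetical, chosen only for this example.

```python
# Hypothetical coefficients, chosen only for illustration.
beta0 = 2.0
beta1 = 0.5


def f(x):
    """Systematic part of the model: f(x) = beta0 + beta1 * x."""
    return beta0 + beta1 * x


# Intercept: the value of f when x is set at 0.
intercept = f(0)

# Slope: change in f divided by the change in x between two points.
x_a, x_b = 3.0, 4.0
slope = (f(x_b) - f(x_a)) / (x_b - x_a)

print(intercept, slope)
```

Any two distinct points x_a and x_b would give the same slope, because the model is linear in x.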


Example 2.7 (p.41 in Wooldridge): Returns to education

wage = β0 + β1 educ + u

Wages are expected to be an increasing function of education,
i.e. more education means, on average, higher wages. Then, in
this linear model, we expect that β1 > 0.
What does u mean? It collects the other factors, apart from
education, that affect wages, such as age or ability.



Statistics for dummies

Topics: Expectation, Variance, Regression model

Random variables (RV)


Why do we need random variables in econometrics?
We will (almost) never observe the whole population, only a
small portion of it.
A random sample is a subset of a population.
If we consider the random variable X, a random sample is
{x_i}_{i=1}^n, or x_1, x_2, ..., x_n, which consists of n
realisations of the variable X, indexed by i.
Example: If X is the return of an asset, a random sample consists
of actual observations of the asset's returns in the market. Say,
for a sample of three observations,
x_1 = $1000, x_2 = −$567, x_3 = $0
Example: Flipping a coin: let X = 0 be HEADS and X = 1
be TAILS. Then X takes values in {0, 1}. Moreover,
P[X = 0] = P[X = 1] = 0.5. (This is called the Bernoulli
distribution.)
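
To make the idea of a random sample concrete, here is a minimal sketch that simulates n realisations of the coin-flip variable and reports the share of each outcome. The sample size and the seed are arbitrary choices for this illustration, not part of the course material.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

n = 10_000  # sample size (an arbitrary choice for this sketch)

# Each draw is one realisation x_i of X: 0 = HEADS, 1 = TAILS.
sample = [random.randint(0, 1) for _ in range(n)]

share_heads = sample.count(0) / n
share_tails = sample.count(1) / n

print(sample[:10], share_heads, share_tails)
```

The two shares are sample quantities, so they will be close to, but generally not exactly equal to, the population probabilities of 0.5.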

Discrete vs Continuous RVs

A discrete random variable is one that takes on only a finite or
countably infinite number of values.

Example: Flipping a coin: let X = 0 be HEADS and X = 1 be
TAILS. Two possible values: 0 or 1.

Example: Number of £50 bills in your wallet: X can take any
value in 0, 1, 2, 3, ...

Each outcome of X has an associated probability,
p_j = P(X = x_j), j = 1, ..., k. This probability measure satisfies:
p_j ≥ 0, j = 1, 2, ..., k
∑_{j=1}^k p_j = 1
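
The two conditions can be checked mechanically. The sketch below does so for a fair six-sided die, which is a hypothetical example not taken from the slides. The helper name is_valid_pmf is also made up for this illustration. Exact fractions are used so that the sum is compared with 1 without rounding error.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die, so x_j = j and p_j = 1/6.
pmf = {j: Fraction(1, 6) for j in range(1, 7)}


def is_valid_pmf(p):
    """Check p_j >= 0 for every j and that the p_j sum to exactly 1."""
    return all(pj >= 0 for pj in p.values()) and sum(p.values()) == 1


print(is_valid_pmf(pmf))
```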


A continuous random variable is one that can take any value in a
continuum, such as an interval of the real line.

Let X be a continuous random variable. Its probability measure is
described by a density function f(x) that satisfies
f(x) ≥ 0 for all x ∈ 𝒳, where 𝒳 is the domain of X, usually
𝒳 = ℝ
∫_𝒳 f(x) dx = 1

Although the density function plays the role of the probability of
each value of x, it has a tricky interpretation: there are so many
values in 𝒳 that each one individually has probability zero (!).
Probabilities are instead attached to intervals,
P(a ≤ X ≤ b) = ∫_a^b f(x) dx.
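
A small numerical sketch may help with the interpretation of a density. It assumes a standard normal density purely as a convenient example (the slides do not single out any particular distribution) and uses a simple midpoint rule, so the results are approximations.

```python
import math


def density(x):
    """Standard normal density, used here only as a convenient example."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    if a == b:
        return 0.0
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# Total probability: the tails beyond +-10 are negligible for this density.
total = integrate(density, -10.0, 10.0)

# Probability of an interval: P(-1 <= X <= 1).
interval = integrate(density, -1.0, 1.0)

# Probability of a single value: an interval of zero width.
point = integrate(density, 0.5, 0.5)

print(total, interval, point)
```

The density at 0.5 is strictly positive, yet the probability of observing exactly 0.5 is zero. Only intervals carry positive probability.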


Expectation of a RV

Random variables can be described by some of their features:

Expectation: E[X]

What value should we expect from X? If we had a large number of
draws from the random variable X, what would their average be?
For the coin example:
E[X] = 0 × P[X = 0] + 1 × P[X = 1] = 0 × 0.5 + 1 × 0.5 = 0.5.
For discrete RVs: E[X] = ∑_{j=1}^k x_j × P[X = x_j].
For continuous RVs: E[X] = ∫_𝒳 x f(x) dx, where 𝒳 is the domain of X.
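
The discrete formula translates directly into code. The sketch below applies it to the coin example and, for comparison, averages a large number of simulated draws. The function name expectation, the seed and the number of draws are choices made for this illustration only.

```python
import random


def expectation(pmf):
    """E[X] = sum over j of x_j * P[X = x_j], for a pmf given as {x_j: p_j}."""
    return sum(x * p for x, p in pmf.items())


coin = {0: 0.5, 1: 0.5}  # the coin example: HEADS = 0, TAILS = 1
e_coin = expectation(coin)

random.seed(1)  # fixed seed for reproducibility
draws = [random.choice([0, 1]) for _ in range(20_000)]
sample_average = sum(draws) / len(draws)

print(e_coin, sample_average)
```

The first number is the population expectation. The second is an average over simulated draws and should be close to it.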


Property of expectation: Let A and B be two random variables,
and c and d two constants. Then, E[cA + dB] = cE[A] + dE[B].
Property of expectation: Let A and B be two independent
random variables. Then, E[A × B] = E[A] × E[B].
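
Both properties can be verified exactly on a small example. The sketch below assumes that A and B are two independent fair dice, a hypothetical setup chosen for this illustration, and uses exact fractions so the equalities hold without rounding error. The constants c and d are arbitrary.

```python
from fractions import Fraction
from itertools import product

# Hypothetical setup: A and B are independent fair dice, so every pair
# (a, b) has probability 1/36.
faces = range(1, 7)
joint = {(a, b): Fraction(1, 36) for a, b in product(faces, faces)}


def expect(g):
    """E[g(A, B)] under the joint distribution above."""
    return sum(p * g(a, b) for (a, b), p in joint.items())


c, d = 2, -3  # arbitrary constants

e_a = expect(lambda a, b: a)
e_b = expect(lambda a, b: b)

lhs_linear = expect(lambda a, b: c * a + d * b)   # E[cA + dB]
lhs_product = expect(lambda a, b: a * b)          # E[A x B]

print(lhs_linear == c * e_a + d * e_b, lhs_product == e_a * e_b)
```

Linearity holds for any pair of random variables, whereas the product rule relies on the independence built into the joint distribution here.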


An estimator of the expectation of a random variable X is the
sample average.
Given a random sample {x_i}_{i=1}^n, define x̄ = n^{-1} ∑_{i=1}^n x_i,
which is simply the average.

An estimator µ̂ is unbiased for a given parameter µ if E(µ̂) = µ.

In words, if we consider all possible random samples, on average,
we will obtain the parameter we want to estimate.
In our case, we can prove that E(x̄) = E(X).
Proof:...
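
One way the argument might go, assuming each draw x_i in the random sample has the same expectation as X and using the linearity property of expectation. This is a sketch and not necessarily the proof given in the lecture:

```latex
\begin{aligned}
E(\bar{x}) &= E\left( n^{-1} \sum_{i=1}^{n} x_i \right)
            = n^{-1} \sum_{i=1}^{n} E(x_i) \\
           &= n^{-1} \sum_{i=1}^{n} E(X)
            = n^{-1} \, n \, E(X)
            = E(X).
\end{aligned}
```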


Variance of a RV

However, for a given realisation of X, denoted x, we may have
x ≠ E[X].
But how much does this random variable deviate from E[X]?

Variance: Var[X] ≡ E[(X − E[X])^2]


Prove that Var[X] = E[X^2] − (E[X])^2.

Property of variance: Var[aX] = a^2 × Var[X]
Property of variance:
Var[aX + bY] = a^2 × Var[X] + b^2 × Var[Y] + 2ab × Cov[X, Y],
where Cov[X, Y] = E[XY] − E[X]E[Y]
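
The variance of a linear combination is easy to get wrong, in particular the cross term, which enters with a factor of 2. The sketch below checks the identity exactly on a small joint distribution of (X, Y). The distribution and the constants a and b are hypothetical values picked for this illustration.

```python
from fractions import Fraction

# Hypothetical joint distribution of (X, Y) with dependence between them.
joint = {
    (0, 0): Fraction(2, 5),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(2, 5),
}


def expect(g):
    """E[g(X, Y)] under the joint distribution above."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


def var(g):
    """Var[Z] = E[(Z - E[Z])^2] for Z = g(X, Y)."""
    mean = expect(g)
    return expect(lambda x, y: (g(x, y) - mean) ** 2)


a, b = 3, -2  # arbitrary constants

var_x = var(lambda x, y: x)
var_y = var(lambda x, y: y)
cov_xy = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

lhs = var(lambda x, y: a * x + b * y)
rhs = a**2 * var_x + b**2 * var_y + 2 * a * b * cov_xy

# The shortcut formula Var[X] = E[X^2] - (E[X])^2, checked on the same example.
shortcut = expect(lambda x, y: x**2) - expect(lambda x, y: x) ** 2

print(lhs == rhs, var_x == shortcut)
```

A numerical check on one example is not a proof, but it is a cheap way to catch a missing factor in the cross term.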


Covariance

The covariance of the random variables A and B measures how
much co-movement they have.

Covariance: Cov[Y, X] ≡ E[YX] − E[Y]E[X]

Property of covariance: Let A and B be two independent random
variables. Then, Cov[A, B] = 0.
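
The property follows in one line from the definition of covariance together with the product rule for independent random variables, E[A × B] = E[A] × E[B]:

```latex
\operatorname{Cov}[A, B] = E[AB] - E[A]\,E[B] = E[A]\,E[B] - E[A]\,E[B] = 0 .
```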


In the simple regression model...

In the simple regression model, Y, X and U are random
variables. β0 and β1 are population parameters, i.e. constants
that describe the relation between Y and X. Then,

E[Y] = E[β0 + β1 X + U] = β0 + β1 E[X] + E[U]
(Since U captures other factors, we will assume that E[U] = 0.)
However, our main interest is in the conditional expectation that
defines the population regression model:

E[Y|X] = E[β0 + β1 X + U|X] = β0 + β1 X + E[U|X] = β0 + β1 X

Assumption: U and X are independent, so that E[U|X] = E[U] = 0.
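
The conditional expectation can be illustrated by simulation. The sketch below generates data from the model with X and U drawn independently, then compares the average of Y at each value of X with β0 + β1 X. The parameter values, the distributions of X and U, the sample size and the seed are all assumptions made for this illustration.

```python
import random
from collections import defaultdict

random.seed(2)  # fixed seed for reproducibility

beta0, beta1 = 1.0, 0.5  # hypothetical population parameters
n = 100_000

sums = defaultdict(float)
counts = defaultdict(int)
for _ in range(n):
    x = random.choice([0, 1, 2, 3, 4])  # X drawn independently of U
    u = random.gauss(0.0, 1.0)          # error term with E[U] = 0
    y = beta0 + beta1 * x + u
    sums[x] += y
    counts[x] += 1

# Sample analogue of E[Y | X = x], shown next to beta0 + beta1 * x.
cond_mean = {x: sums[x] / counts[x] for x in sorted(counts)}
for x, m in cond_mean.items():
    print(x, round(m, 3), beta0 + beta1 * x)
```

X is restricted to a few discrete values here only so that the averages of Y at each value of X are easy to compute.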


Parameters vs Estimators

Note:
β0 and β1 are population parameters to be estimated.
β̂0 and β̂1 will be their estimators.
The parameters are just numbers; they are fixed. The estimators,
however, are random variables: their values change from one
random sample to another.
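
The distinction can be seen in a simulation. Since the estimators β̂0 and β̂1 have not been defined at this point, the sketch below uses the sample average as a stand-in estimator, with a fair die as a hypothetical population. The sample size, the number of samples and the seed are arbitrary.

```python
import random

random.seed(3)  # fixed seed for reproducibility

faces = [1, 2, 3, 4, 5, 6]
mu = sum(faces) / len(faces)  # the parameter: a fixed number, 3.5


def sample_average(n):
    """One realisation of the estimator: the average of n fresh draws."""
    return sum(random.choice(faces) for _ in range(n)) / n


# Draw 200 random samples of size 30 and compute the estimate in each.
estimates = [sample_average(30) for _ in range(200)]
mean_of_estimates = sum(estimates) / len(estimates)

print(mu, min(estimates), max(estimates), mean_of_estimates)
```

The parameter mu never changes, while the estimates vary from sample to sample and are centred near mu.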
