
Basiskurs Finance

1. Regression Analysis

Course Overview

Date                  Topic                                                                Type
Tue. 08.04., 12–16    Regression Analysis                                                  Lecture
Tue. 15.04., 12–16    Regression Analysis                                                  Tutorial
Tue. 22.04.           No class (Easter holiday)
Tue. 29.04., 12–16    Event Studies                                                        Lecture
Tue. 06.05., 12–16    Event Studies                                                        Tutorial
Fri. 09.05., 14–18    Monte-Carlo Simulation / Game Theory (Bayesian Equilibrium Models)   Lecture
Tue. 13.05., 12–16    Monte-Carlo Simulation / Game Theory (Bayesian Equilibrium Models)   Tutorial
Please bring your laptop to the tutorials to follow along during the Excel exercises.
People:
Lecture: Dr. Nikolas Breitkopf (breitkopf@bwl.lmu.de)
Tutorial: Janis Bauer (janis.bauer@bwl.lmu.de)
Grading:
60-minute exam
Exam date: 02.06.2014, 18:30–19:30 (please register for the exam via the LSF)

Content
Core questions
What is an estimator?
Properties of estimators
What problems result from violations of the OLS assumptions?

Agenda
Motivation
Ordinary Least Squares (OLS)
Effects of violation of assumptions
- Heteroscedasticity
- Correlation of the regressors with the error term (Endogeneity)

Fixed Effects Panel Estimation


Literature
Basic literature
Barreto, H. and Howland, F. M.: Introductory Econometrics, Cambridge University Press, latest edition

Additional literature
Johnston, J. and DiNardo, J.: Econometric Methods, McGraw-Hill, latest edition
Greene, W.: Econometric Analysis, Prentice Hall, latest edition


Motivation
What are regressions used for in the field of finance?
Estimation of beta of a stock
Asset pricing tests
Determinants of capital structure
Event studies
Determination of trends
Forecasting

Definition
A regression estimates the linear relationship between independent
variables (x) and the dependent variable (y).
The true relationship in the population is inferred from a sample.


Example: Prohibition of naked short sales of stocks

Dependent variable: Bid/Ask-Spread (t-statistics in parentheses)

                                 Stocks with Listed Options   Stocks without Listed Options
Constant                          0.60 ***  (196.48)           4.23 ***  (1,015.57)
Naked Short Sale Ban (Dummy)      0.33 ***  (5.94)             1.40 ***  (12.24)
Covered Short Sale Ban (Dummy)    0.67 ***  (9.66)             2.14 ***  (25.95)
Disclosure Requirement (Dummy)   -0.20 *** (-3.42)            -0.72 *** (-6.54)
Stock-level Fixed Effects         Yes                          Yes
#Obs                              427,164                      4,716,000
#Stocks                           1,306                        15,185

Source: Beber/Pagano 2010, WP


Inference from a Random Sample

The population is the true, data-generating process that determines
the relationship between the variables of interest.
Usually, one cannot observe the full population.
Statistical inference is the process of learning about the population
from a random sample.
Population variables are assumed to be random variables, i.e. there
is no deterministic relationship between variables.
Then, any statistic calculated from the sample is a random variable as
well.


Illustrative Example: Inference on the Average of a Random Variable

Assume the population consists of a normally distributed random variable
$X \sim N(\mu = 200, \sigma = 20)$.

Experiment
- Draw 8 random samples from the population, each having 10 observations.
- Calculate the mean of each sample.
- Calculate the mean of the experiments' means and its standard deviation.
- What can you learn about the true random variable?

The sample mean is the best estimator of the population mean.
- Since the population variable X is a random variable, so is the sample mean.
- To learn about the population, you have to know the distribution of the estimator
  (here the mean) in repeated samples.
- The standard error describes the uncertainty (standard deviation) of the estimator.
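
This experiment is easy to replicate numerically. A minimal sketch in Python (numpy assumed; the seed and the exact draws are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed, for reproducibility only

mu, sigma = 200, 20               # population parameters from the slide
n_samples, n_obs = 8, 10          # 8 experiments with 10 observations each

samples = rng.normal(mu, sigma, size=(n_samples, n_obs))
sample_means = samples.mean(axis=1)

print("Sample means:            ", np.round(sample_means, 2))
print("Mean of the means:       ", round(sample_means.mean(), 2))
print("Std. error (experiments):", round(sample_means.std(ddof=1), 2))
print("Std. error (theoretical):", round(sigma / np.sqrt(n_obs), 2))
```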


Illustrative Example (continued)

Obs        E1       E2       E3       E4       E5       E6       E7       E8
1        176.27   185.11   223.63   207.93   181.65   254.48   197.73   233.67
2        185.75   182.41   222.41   214.81   202.57   189.30   195.00   164.47
3        200.09   215.10   212.60   187.94   236.40   217.17   234.46   212.61
4        201.17   203.69   193.54   182.40   188.33   202.42   199.96   218.67
5        232.02   197.46   214.99   208.36   196.82   216.74   199.91   166.11
6        202.67   225.78   219.02   165.95   189.38   168.10   210.96   221.77
7        210.55   236.46   246.32   191.26   219.76   202.98   198.10   227.80
8        203.00   202.90   218.72   218.34   228.23   203.15   210.51   194.88
9        166.32   179.36   219.29   197.47   204.82   194.76   184.78   173.09
10       221.19   162.50   209.35   194.19   201.30   199.27   151.77   199.35
Average  199.90   199.08   217.99   196.86   204.93   204.84   198.32   201.24
Std.Dev.  19.73    22.62    13.20    16.06    17.94    22.34    21.03    25.89

Mean (Averages):               202.89
Standard Error (Experiments):    6.75
Standard Error (theoretical):    6.32

Standard Error of the Mean: $\sigma / \sqrt{n}$


Estimator Properties (I)

Estimation
An estimator is a method to determine unknown parameters of a
population with the help of a random sample from this population.

[Figure: a sample (here n = 10) is drawn from the population; the
estimated population mean approximates the real population mean.]


Estimator Properties (II)

Desirable Properties
Unbiasedness
- The expected value of the estimator is the true parameter: $E(b) = \beta$

Efficiency
- The variance of the estimator is the smallest among all unbiased
  estimators.
- Example: OLS is the best linear unbiased estimator (BLUE).

Consistency
- An estimator (even a biased one) is consistent if it converges
  asymptotically to the true parameter.
- Example: the two variance estimators
$$s_1^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad
  s_2^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
  satisfy
$$E(s_1^2) = \sigma^2, \qquad E(s_2^2) = \frac{n-1}{n}\,\sigma^2, \qquad
  \lim_{n\to\infty} E(s_2^2) = \sigma^2,$$
  i.e. $s_1^2$ is unbiased, while $s_2^2$ is biased but consistent.
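
A quick numerical check of this bias (a sketch; numpy's ddof argument switches between the two estimators, and the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)    # arbitrary seed
true_var = 20**2                  # sigma = 20 as in the earlier example

for n in (5, 50, 500):
    x = rng.normal(200, 20, size=(20_000, n))   # 20,000 repeated samples
    s1 = x.var(axis=1, ddof=1).mean()           # unbiased estimator s1^2
    s2 = x.var(axis=1, ddof=0).mean()           # biased estimator s2^2
    print(f"n={n:4d}  E(s1^2)~{s1:6.1f}  E(s2^2)~{s2:6.1f}  true={true_var}")
```

The average of $s_2^2$ approaches the true variance as n grows, illustrating consistency despite the bias.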


OLS in the two-dimensional case

OLS
- Estimation of the best straight line describing the relationship between
  x and y:
$$y = a + bx + e$$
- Approach: minimization of the squared errors:
$$\text{RSS} = \sum_i e_i^2, \qquad e_i = y_i - \hat{y}_i = y_i - a - b x_i$$

Properties
- Minimization of the residual sum of squares (RSS).
- The fitted line passes through the point $(\bar{x}, \bar{y})$.

[Figure: scatter plot with fitted line, intercept a, slope b, and
residuals e_1, e_2.]

OLS in the multivariate case (I)

OLS in matrix notation. Population model:
$$y = X\beta + u$$
with
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix};\quad
  X = \begin{pmatrix} 1 & x_{1,2} & \cdots & x_{1,k} \\
                      1 & x_{2,2} & \cdots & x_{2,k} \\
                      \vdots & \vdots & & \vdots \\
                      1 & x_{n,2} & \cdots & x_{n,k} \end{pmatrix};\quad
  \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{k-1} \end{pmatrix};\quad
  u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}$$

u describes the error term in the (unobserved) population, e the error
term of the sample.

The vector of the sample residuals is: $e = y - Xb$

The optimization problem is then as follows:
$$\min_b\; e'e = \min_b\; \left( y'y - 2b'X'y + b'X'Xb \right)$$

First order condition (FOC):
$$\frac{\partial\, \text{RSS}}{\partial b} = -2X'y + 2X'Xb = 0
  \;\Rightarrow\; (X'X)\,b = X'y
  \;\Rightarrow\; b = (X'X)^{-1}X'y$$

(Dimensions: $X'X$ is $k \times k$, $X'y$ is $k \times 1$, and hence
$b$ is $k \times 1$.)
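
A minimal numpy sketch of these normal equations (simulated data; the true coefficient values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)            # arbitrary seed

# Simulated data: y = 1 + 2*x2 - 0.5*x3 + u (arbitrary true coefficients)
n = 1000
X = np.column_stack([np.ones(n),          # column of ones for the intercept
                     rng.normal(size=n),
                     rng.normal(size=n)])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=n)

# Solve the normal equations (X'X) b = X'y instead of inverting explicitly
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)                                  # close to [1.0, 2.0, -0.5]
```

Solving the linear system is numerically more stable than computing $(X'X)^{-1}$ explicitly, although the textbook formula is of course equivalent.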


Linear Algebra Basics

Transpose of a matrix:
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad
  A' = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$$

Product of two matrices:
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad
  B = \begin{pmatrix} e & f \\ g & h \end{pmatrix}, \quad
  AB = \begin{pmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{pmatrix}$$

Identity matrix I:
$$AI = A, \qquad IA = A, \qquad
  I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\
                      0 & 1 & \cdots & 0 \\
                      \vdots & \vdots & \ddots & \vdots \\
                      0 & 0 & \cdots & 1 \end{pmatrix}$$

Properties of the inverse of a (square) matrix:
$$A A^{-1} = I, \qquad A^{-1} A = I$$

Matrix Multiplication: Typical Cases

General shape rule: $(n \times k)(k \times m) = (n \times m)$

Example   Shape                        Interpretation
e'e       (1 x n)(n x 1) = (1 x 1)     Sum of squared elements of e (inner product)
Xb        (n x k)(k x 1) = (n x 1)     Linear combination of the columns of X
X'X       (k x n)(n x k) = (k x k)     Co-variation of the columns of X (second-moment matrix)
uu'       (n x 1)(1 x n) = (n x n)     Product of all combinations of the elements of u (outer product)


What is the intercept?

OLS estimates a linear function that passes through the means of X and y.
To integrate the intercept into the estimation equation, a vector
consisting of ones has to be added to the matrix X.

Example with an intercept only:
$$y = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \qquad
  X = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$

OLS: $b = (X'X)^{-1} X'y$

$$(X'X)^{-1} = \left( \begin{pmatrix} 1 & 1 & 1 \end{pmatrix}
  \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right)^{-1}
  = (3)^{-1} = \frac{1}{3}$$

$$X'y = \begin{pmatrix} 1 & 1 & 1 \end{pmatrix}
  \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = 1 + 2 + 3 = 6$$

$$b = \frac{1}{3} \cdot 6 = 2 = \bar{y}$$

Central OLS assumptions (I)

X has full rank k
- A solution of OLS is only possible if the matrix (X'X) is invertible,
  i.e. it has to be positive definite (all eigenvalues > 0).
- X has to consist of linearly independent columns (full rank).
- A violation of this assumption is also called perfect collinearity and
  usually results from a wrong specification of the problem.
- Wrong specification of dummy variables:
  - If a variable consisting of c attributes is separated into c dummy
    variables, then X no longer possesses full column rank.
  - Example: gender is separated into female (1 if female, otherwise 0)
    and male (1 if male, otherwise 0):
$$X = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$$
  - The column of ones equals the sum of the two dummy columns, so (X'X)
    is singular; see the sketch below.
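
A small numpy check of this dummy-variable trap (the matrix is the one from the slide):

```python
import numpy as np

# Intercept column plus both gender dummies: the ones column equals
# the sum of the two dummy columns
X = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)

print(np.linalg.matrix_rank(X))    # 2, not 3: X lacks full column rank
print(np.linalg.det(X.T @ X))      # ~0: X'X is singular
# np.linalg.inv(X.T @ X) would raise numpy.linalg.LinAlgError
```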

Central OLS assumptions (II)

E(X'u) = 0
- Regressors are not correlated with the error terms.
- Note:
$$\text{Cov}(a, b) = E\big[(a - E[a])(b - E[b])\big] = E[ab] - E[a]E[b]$$

Homoscedasticity: error terms are iid $(0, \sigma^2)$
- $E(u_i) = 0$: the expected value of the error term is zero.
- $\text{Var}(u_i) = \sigma^2$: the variance of the error terms is constant.
- $E(u_i u_j) = 0$ for $i \neq j$: individual errors are independent
  (no autocorrelation).

Properties of OLS

Unbiasedness: $E(b) = \beta$
$$b = (X'X)^{-1}X'y, \qquad y = X\beta + u$$
$$b = (X'X)^{-1}X'(X\beta + u) = (X'X)^{-1}(X'X)\beta + (X'X)^{-1}X'u$$
$$(b - \beta) = (X'X)^{-1}X'u$$
$$E(b - \beta) = (X'X)^{-1}X'E(u) = 0 \;\Rightarrow\; E(b) = \beta$$

The regressors are uncorrelated with the residual: Cov(e, X) = 0
$$b = (X'X)^{-1}X'y$$
$$(X'X)b = X'(Xb + e) = (X'X)b + X'e$$
$$\Rightarrow\; X'e = 0 \;\Rightarrow\; E(X'e) = 0
  \;\Rightarrow\; \text{Cov}(e, X) = 0$$

Note: E(X'e) = 0 is not to be confused with the assumption E(X'u) = 0.
OLS is calculated such that E(X'e) = 0 holds by construction; nevertheless,
E(ee') can be a biased and inconsistent estimate of E(uu').
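
The orthogonality X'e = 0 can be verified numerically. A sketch with simulated data (all values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)    # arbitrary seed
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b                     # sample residuals

print(X.T @ e)                    # ~[0, 0, 0] up to floating-point error
```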

OLS in the multivariate case (II)

Variance-covariance matrix of the error terms:
$$E(uu') = E\!\left[\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}
  \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix}\right]
  = \begin{pmatrix} E(u_1^2) & E(u_1 u_2) & \cdots & E(u_1 u_n) \\
                    E(u_2 u_1) & E(u_2^2) & \cdots & \vdots \\
                    \vdots & \vdots & \ddots & \vdots \\
                    E(u_n u_1) & E(u_n u_2) & \cdots & E(u_n^2) \end{pmatrix}
  \overset{!}{=} \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\
                    0 & \sigma^2 & \cdots & 0 \\
                    \vdots & \vdots & \ddots & \vdots \\
                    0 & 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 I$$

Inference
- Estimation of $\sigma^2$: $s^2 = e'e/(n-k)$, whose square root is the
  so-called standard error of the regression.
- Variance of the OLS coefficients:
$$\text{var}(b) = E[(b - \beta)(b - \beta)']
  \quad \text{because } E(b) = \beta$$
$$= E[(X'X)^{-1}X'uu'X(X'X)^{-1}]
  = (X'X)^{-1}X'\,E(uu')\,X(X'X)^{-1}
  = (X'X)^{-1}X'\,\sigma^2 I\,X(X'X)^{-1}$$
$$\text{var}(b) = \sigma^2 (X'X)^{-1}, \qquad
  \widehat{\text{var}}(b) = s^2 (X'X)^{-1}$$
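
These formulas translate directly into code. A sketch computing coefficients, standard errors, and t-statistics by hand (simulated data; all values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)    # arbitrary seed
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b

s2 = e @ e / (n - k)              # estimate of sigma^2
var_b = s2 * XtX_inv              # var(b) = s^2 (X'X)^(-1)
se_b = np.sqrt(np.diag(var_b))    # standard errors of the coefficients
t = b / se_b                      # t-statistics

print(np.column_stack([b, se_b, t]))
```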

Hypothesis testing
- The standard errors of the coefficients are the square roots of the
  diagonal of var(b). They can be used to calculate the t-statistic of an
  estimate: t = b/se(b).
- A joint hypothesis test can be conducted by:
$$H_0: R\beta = r \quad \text{with} \quad
  (Rb - r) \sim N\big[0,\; \sigma^2 R(X'X)^{-1}R'\big]$$
  where R describes a (q x k) matrix and r a vector of length q.
- The test statistic is:
$$\frac{(Rb - r)'\big[R(X'X)^{-1}R'\big]^{-1}(Rb - r)\,/\,q}
       {e'e\,/\,(n-k)} \;\sim\; F(q,\, n-k)$$

Example: $H_0: \beta_i = 0,\; i = 1, \dots, 4$
$$R = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\
                      0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad
  r = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad q = 4$$

Example: Estimates as random variables

True model:
$$y = \alpha + \beta x + u, \qquad \alpha = 1,\; \beta = 1$$

100 simulations of random samples of (x, y):
- x ~ N(0,1), one-time sampled (fixed regressors).
- Error terms u are sampled randomly from N(0,1).
- For each sample, conduct the OLS estimation.

Results:
E(a) = 0.9921,  StdDev(a) = 0.1083
E(b) = 0.9993,  StdDev(b) = 0.1015
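
A sketch of this simulation (numpy; the seed is arbitrary, so the exact numbers will differ from the slide):

```python
import numpy as np

rng = np.random.default_rng(5)        # arbitrary seed
n, n_sim = 100, 100
alpha, beta = 1.0, 1.0

x = rng.normal(size=n)                # sampled once: fixed regressors
X = np.column_stack([np.ones(n), x])
proj = np.linalg.inv(X.T @ X) @ X.T   # (X'X)^(-1) X' is fixed across samples

coefs = np.empty((n_sim, 2))
for s in range(n_sim):
    u = rng.normal(size=n)            # fresh error terms in every sample
    coefs[s] = proj @ (alpha + beta * x + u)

print("E(a), E(b):          ", coefs.mean(axis=0))
print("StdDev(a), StdDev(b):", coefs.std(axis=0, ddof=1))
```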


Properties of the variance-covariance matrix under the OLS assumptions

Properties
- Variance-covariance matrix of the coefficients:
$$\text{var}(b) = E[(b - \beta)(b - \beta)']
  = E[(X'X)^{-1}X'uu'X(X'X)^{-1}]
  = (X'X)^{-1}X'\,\Omega\,X(X'X)^{-1}$$
- $\Omega$ is the covariance matrix of the error terms.
- OLS makes the assumption $\Omega = \sigma^2 I$:
$$\Omega = \sigma^2 I
  = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\
                    0 & \sigma^2 & \cdots & 0 \\
                    \vdots & \vdots & \ddots & \vdots \\
                    0 & 0 & \cdots & \sigma^2 \end{pmatrix}$$
- Then:
$$\text{var}(b) = \sigma^2 (X'X)^{-1}X'\,I\,X(X'X)^{-1}
  = \sigma^2 (X'X)^{-1} \qquad \text{as } AA^{-1} = I$$
- If the OLS assumptions are violated, the standard significance
  statements are wrong.

Heteroscedasticity
- OLS assumption: the error term u_i possesses a constant variance for all
  observations i (homoscedasticity).
- Example of a violation: $\sigma_i^2 = f(x_i) = x_i^2$
- Heteroscedasticity:
$$E(uu') = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\
                    0 & \sigma_2^2 & \cdots & 0 \\
                    \vdots & \vdots & \ddots & \vdots \\
                    0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$$

Results of a simulation study with heteroscedastic error terms

Monte-Carlo simulation
- Assume the true model is: $y_i = 1 + x_i + u_i$ with $u_i \sim N(0, x_i^2)$
- Create 1,000 samples, each with a sample size of N = 100 observations.
- A regression of y on x is run for each of the 1,000 data sets, and the
  resulting intercept and slope are recorded.

N = 100                                   α = 1       β = 1
Coefficients (OLS)                        1.0091      0.99751
OLS s.e. (avg. / incorrect)               0.60031     0.19875
OLS s.e. (sim. distribution / correct)    0.38269     0.21949
White s.e.                                0.37341     0.21262

- The standard errors of OLS are biased here: the estimated standard error
  of the intercept is too large; the estimated standard error of the slope
  is too small.
- Note: the coefficient estimates of OLS are still unbiased even in the
  presence of heteroscedasticity. However, unbiased standard errors are
  essential for inference.

White Correction of the Variance-Covariance Matrix
- The White correction determines an adapted covariance matrix from the
  sample residuals to correct for heteroscedasticity.
- The covariance matrix under heteroscedasticity is:
$$\text{var}(b) = (X'X)^{-1}
  \underbrace{\big(X'\,\Omega\,X\big)}_{S_0:\; k \times k}
  (X'X)^{-1}, \qquad
  \Omega = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\
                    0 & \sigma_2^2 & \cdots & 0 \\
                    \vdots & \vdots & \ddots & \vdots \\
                    0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$$
  (dimensions: $X'$ is $k \times n$, $\Omega$ is $n \times n$,
  $X$ is $n \times k$)
- $\Omega$ has n parameters, so it cannot be estimated from
  n observations.
- The coefficient estimators of OLS are unbiased, so the residuals e are
  unbiased estimates of u.
- White:
$$S_0 = \sum_{i=1}^{n} e_i^2\, x_i x_i'$$
- The White matrix is asymptotically unbiased under any type of
  heteroscedasticity, and only k parameters have to be estimated.
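
A sketch of the White (sandwich) standard errors for the heteroscedastic model above (numpy; seed arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)            # arbitrary seed
n = 100
x = rng.normal(size=n)
u = rng.normal(size=n) * np.abs(x)        # heteroscedastic: Var(u_i) = x_i^2
y = 1 + x + u

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b

# Classical OLS standard errors (incorrectly assume sigma^2 * I)
s2 = e @ e / (n - X.shape[1])
se_ols = np.sqrt(np.diag(s2 * XtX_inv))

# White sandwich: S0 = sum_i e_i^2 x_i x_i'
S0 = (X * (e**2)[:, None]).T @ X
se_white = np.sqrt(np.diag(XtX_inv @ S0 @ XtX_inv))

print(se_ols, se_white)
```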

Correlation of the regressors with the error term (Endogeneity)
- OLS assumes that the regressors X and the error term u are uncorrelated:
  E(X'u) = 0.
- A violation of this assumption results particularly if
  - the X-variables are measured with error,
  - an X-variable is endogenous with y,
  - an X-variable that is relevant in the population is omitted from the
    regression.

Simulation experiment (S = 1,000 samples; true slope $\beta = 0$):
$$y = 0 + 0 \cdot x + u \quad \text{with} \quad x = \delta u + \eta, \qquad
  u, \eta \sim N(0, 1)$$

OLS estimate of the slope    δ = 1       δ = 0.5     δ = 0.1
Parameter                    0.5075      0.40766     0.10106
Avg. s.e.                    0.050628    0.081588    0.10124
Std. Dev.                    0.035988    0.067452    0.10102
1%tile                       0.42086     0.24095     -0.14092
99%tile                      0.59003     0.55394     0.32374

- OLS is extremely biased.
- Solution: Instrumental Variables Regression.
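
A sketch of this experiment for δ = 0.5 (numpy; the probability limit δ/(δ² + 1) = 0.4 follows from the setup above):

```python
import numpy as np

rng = np.random.default_rng(7)      # arbitrary seed
n, n_sim, delta = 100, 1000, 0.5

slopes = np.empty(n_sim)
for s in range(n_sim):
    u = rng.normal(size=n)
    eta = rng.normal(size=n)
    x = delta * u + eta             # regressor correlated with the error term
    y = 0 + 0 * x + u               # true slope is zero
    X = np.column_stack([np.ones(n), x])
    slopes[s] = np.linalg.solve(X.T @ X, X.T @ y)[1]

print(slopes.mean())                # ~0.4 = delta/(delta^2 + 1), not 0
```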

Simulation results with E(X'u) ≠ 0

[Figure: simulated distribution of the OLS slope estimates for the case
δ = 0.5.]


Panel Estimation
- Panel data consists of cross-section and time-series data:
  N individuals, repeatedly observed at T points in time.
- Simple OLS would pool all N*T observations, assuming independence.
- Obviously, with economic individuals (like firms, stocks, countries,
  etc.) in the cross-section,
  - repeated observations of the same individual will be more similar than
    observations between individuals;
  - OLS will then be inconsistent and biased.

Solutions
- Estimate a system of equations, one for each individual.
- Estimate a system of equations with restrictions requiring some
  homogeneity (e.g. same slope, different intercepts).

Pooled OLS: $y = Xb + e$

[Figure: scatter plot of pooled observations from individuals I1, I2, I3
with a single fitted line.]

Fixed Effects Model (FE)
- Assumption: individual differences are captured by differences in the
  constant term (intercept).
- This amounts to including one dummy variable per individual:
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
  = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}\beta
  + \begin{pmatrix} i & 0 & \cdots & 0 \\
                    0 & i & \cdots & 0 \\
                    \vdots & \vdots & \ddots & \vdots \\
                    0 & 0 & \cdots & i \end{pmatrix}
    \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}
  + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$
$$y = \begin{bmatrix} X & d_1 & \dots & d_n \end{bmatrix}
  \begin{pmatrix} \beta \\ \alpha \end{pmatrix} + \varepsilon$$
- y_1: vector of observations of the dependent variable of individual 1.
- X_1: explanatory variables of individual 1.
- i is a vector of ones with length corresponding to y_1.

Properties:
- Computationally intensive for large N.
- Significance of the fixed effects: F-test for the joint significance of
  the dummies.
- The model is robust against misspecification: every time-invariant
  explanatory variable is captured by the dummies (e.g. legal form of
  firms, industry affiliation, etc.).
- The individual effects can be correlated with the regressors.
- Specific time-invariant variables can only be estimated / included as
  interaction terms with other regressors.
- The fixed effects themselves are biased estimates.
- This is just a classical regression! Basically, the individual time
  series are demeaned and then estimated by OLS.
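
A sketch of this within (demeaning) transformation versus pooled OLS (numpy; all parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)            # arbitrary seed
N, T, beta = 50, 10, 2.0                  # 50 individuals, 10 periods

ids = np.repeat(np.arange(N), T)
alpha = rng.normal(size=N)                # individual fixed effects
x = rng.normal(size=N * T) + alpha[ids]   # regressor correlated with effects
y = alpha[ids] + beta * x + rng.normal(size=N * T)

def demean(v):
    """Subtract each individual's time-series mean (within transformation)."""
    means = np.bincount(ids, weights=v) / T
    return v - means[ids]

x_dm, y_dm = demean(x), demean(y)
b_fe = (x_dm @ y_dm) / (x_dm @ x_dm)      # FE slope from demeaned data

b_pooled = np.polyfit(x, y, 1)[0]         # pooled OLS slope, biased here
print(b_fe, b_pooled)                     # b_fe ~ 2.0; b_pooled is off
```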