
Basiskurs Finance

1. Regression Analysis

Course Overview

Date                  Topic                                                                Type
Tue. 08.04., 12–16    Regression Analysis                                                  Lecture
Tue. 15.04., 12–16    Regression Analysis                                                  Tutorial
Tue. 22.04.           No class (Easter holiday)
Tue. 29.04., 12–16    Event Studies                                                        Lecture
Tue. 06.05., 12–16    Event Studies                                                        Tutorial
Fri. 09.05., 14–18    Monte-Carlo Simulation / Game Theory (Bayesian Equilibrium Models)   Lecture
Tue. 13.05., 12–16    Monte-Carlo Simulation / Game Theory (Bayesian Equilibrium Models)   Tutorial
Please bring your laptop to the tutorials to follow along during the Excel exercises.
People:
Lecture: Dr. Nikolas Breitkopf (breitkopf@bwl.lmu.de)
Tutorial: Janis Bauer (janis.bauer@bwl.lmu.de)
Grading:
60-minute exam
Exam date: 02.06.2014, 18:30–19:30 (please register for the exam via the LSF)

Content
Core questions
What is an estimator?
Properties of estimators
What problems result from violations of the OLS assumptions?

Agenda
Motivation
Ordinary Least Squares (OLS)
Effects of violation of assumptions
- Heteroscedasticity
- Correlation of the regressors with the error term (Endogeneity)

Fixed Effects Panel Estimation


Literature
Basic literature
Barreto, H. and Howland, F. M.: Introductory Econometrics, Cambridge University Press, latest edition

Additional literature
Johnston, J. and DiNardo, J.: Econometric Methods, McGraw-Hill, latest edition
Greene, W.: Econometric Analysis, Prentice Hall, latest edition


Motivation
What are regressions used for in the field of finance?
Estimation of beta of a stock
Asset pricing tests
Determinants of capital structure
Event studies
Determination of trends
Forecasting

Definition
A regression estimates the linear relationship between independent
variables (x) and the dependent variable (y).
The true relationship in the population is inferred from a sample.


Example: Prohibition of naked short sales of stocks

Dependent variable: Bid/Ask-Spread (t-statistics in parentheses)

                                 Stocks with Listed Options   Stocks without Listed Options
Constant                          0.60 ***  (196.48)           4.23 ***  (1,015.57)
Naked Short Sale Ban (Dummy)      0.33 ***  (5.94)             1.40 ***  (12.24)
Covered Short Sale Ban (Dummy)    0.67 ***  (9.66)             2.14 ***  (25.95)
Disclosure Requirement (Dummy)   -0.20 *** (-3.42)            -0.72 *** (-6.54)
Stock-level Fixed Effects         Yes                          Yes
#Obs                              427,164                      4,716,000
#Stocks                           1,306                        15,185

Source: Beber/Pagano 2010, WP


Inference from a Random Sample

The population is the true, data-generating process that determines
the relationship between the variables of interest.
Usually, one cannot observe the full population.
Statistical inference is the process of learning about the population
from a random sample.
Population variables are assumed to be random variables, i.e. there
is no deterministic relationship between variables.
Then, any statistic calculated from the sample is a random variable as
well.


Illustrative Example: Inference on the Average of a Random Variable

Assume the population consists of a normally distributed random variable
$X \sim N(\mu = 200, \sigma = 20)$.

Experiment
- Draw 8 random samples from the population, each having 10 observations.
- Calculate the mean of each sample.
- Calculate the mean of the experiments' means and its standard deviation.
- What can you learn about the true random variable?

The sample mean is the best estimator of the population mean.
- Since the population variable X is a random variable, so is the sample mean.
- To learn about the population, you have to know the distribution of the estimator
  (here the mean) in repeated samples.
- The standard error describes the uncertainty (standard deviation) of the estimator.
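
This experiment is easy to replicate numerically. A minimal sketch in Python (numpy assumed; the seed and the exact draws are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed, for reproducibility only

mu, sigma = 200, 20               # population parameters from the slide
n_samples, n_obs = 8, 10          # 8 experiments with 10 observations each

samples = rng.normal(mu, sigma, size=(n_samples, n_obs))
sample_means = samples.mean(axis=1)

print("Sample means:            ", np.round(sample_means, 2))
print("Mean of the means:       ", round(sample_means.mean(), 2))
print("Std. error (experiments):", round(sample_means.std(ddof=1), 2))
print("Std. error (theoretical):", round(sigma / np.sqrt(n_obs), 2))
```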


Illustrative Example (continued)

Obs        E1       E2       E3       E4       E5       E6       E7       E8
1        176.27   185.11   223.63   207.93   181.65   254.48   197.73   233.67
2        185.75   182.41   222.41   214.81   202.57   189.30   195.00   164.47
3        200.09   215.10   212.60   187.94   236.40   217.17   234.46   212.61
4        201.17   203.69   193.54   182.40   188.33   202.42   199.96   218.67
5        232.02   197.46   214.99   208.36   196.82   216.74   199.91   166.11
6        202.67   225.78   219.02   165.95   189.38   168.10   210.96   221.77
7        210.55   236.46   246.32   191.26   219.76   202.98   198.10   227.80
8        203.00   202.90   218.72   218.34   228.23   203.15   210.51   194.88
9        166.32   179.36   219.29   197.47   204.82   194.76   184.78   173.09
10       221.19   162.50   209.35   194.19   201.30   199.27   151.77   199.35
Average  199.90   199.08   217.99   196.86   204.93   204.84   198.32   201.24
Std.Dev.  19.73    22.62    13.20    16.06    17.94    22.34    21.03    25.89

Mean (Averages):               202.89
Standard Error (Experiments):    6.75
Standard Error (theoretical):    6.32

Standard Error of the Mean: $\sigma / \sqrt{n}$


Estimator Properties (I)

Estimation
An estimator is a method to determine unknown parameters of a
population with the help of a random sample from this population.

[Figure: a sample (here n = 10) is drawn from the population; the
estimated population mean approximates the real population mean.]


Estimator Properties (II)

Desirable Properties
Unbiasedness
- The expected value of the estimator is the true parameter: $E(b) = \beta$

Efficiency
- The variance of the estimator is the smallest among all unbiased
  estimators.
- Example: OLS is the best linear unbiased estimator (BLUE).

Consistency
- An estimator (even a biased one) is consistent if it converges
  asymptotically to the true parameter.
- Example: the two variance estimators
$$s_1^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad
  s_2^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
  satisfy
$$E(s_1^2) = \sigma^2, \qquad E(s_2^2) = \frac{n-1}{n}\,\sigma^2, \qquad
  \lim_{n\to\infty} E(s_2^2) = \sigma^2,$$
  i.e. $s_1^2$ is unbiased, while $s_2^2$ is biased but consistent.
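
A quick numerical check of this bias (a sketch; numpy's ddof argument switches between the two estimators, and the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)    # arbitrary seed
true_var = 20**2                  # sigma = 20 as in the earlier example

for n in (5, 50, 500):
    x = rng.normal(200, 20, size=(20_000, n))   # 20,000 repeated samples
    s1 = x.var(axis=1, ddof=1).mean()           # unbiased estimator s1^2
    s2 = x.var(axis=1, ddof=0).mean()           # biased estimator s2^2
    print(f"n={n:4d}  E(s1^2)~{s1:6.1f}  E(s2^2)~{s2:6.1f}  true={true_var}")
```

The average of $s_2^2$ approaches the true variance as n grows, illustrating consistency despite the bias.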


OLS in the two-dimensional case

OLS
- Estimation of the best straight line describing the relationship between
  x and y:
$$y = a + bx + e$$
- Approach: minimization of the squared errors:
$$\text{RSS} = \sum_i e_i^2, \qquad e_i = y_i - \hat{y}_i = y_i - a - b x_i$$

Properties
- Minimization of the residual sum of squares (RSS).
- The fitted line passes through the point $(\bar{x}, \bar{y})$.

[Figure: scatter plot with fitted line, intercept a, slope b, and
residuals e_1, e_2.]

OLS in the multivariate case (I)

OLS in matrix notation. Population model:
$$y = X\beta + u$$
with
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix};\quad
  X = \begin{pmatrix} 1 & x_{1,2} & \cdots & x_{1,k} \\
                      1 & x_{2,2} & \cdots & x_{2,k} \\
                      \vdots & \vdots & & \vdots \\
                      1 & x_{n,2} & \cdots & x_{n,k} \end{pmatrix};\quad
  \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{k-1} \end{pmatrix};\quad
  u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}$$

u describes the error term in the (unobserved) population, e the error
term of the sample.

The vector of the sample residuals is: $e = y - Xb$

The optimization problem is then as follows:
$$\min_b\; e'e = \min_b\; \left( y'y - 2b'X'y + b'X'Xb \right)$$

First order condition (FOC):
$$\frac{\partial\, \text{RSS}}{\partial b} = -2X'y + 2X'Xb = 0
  \;\Rightarrow\; (X'X)\,b = X'y
  \;\Rightarrow\; b = (X'X)^{-1}X'y$$

(Dimensions: $X'X$ is $k \times k$, $X'y$ is $k \times 1$, and hence
$b$ is $k \times 1$.)
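
A minimal numpy sketch of these normal equations (simulated data; the true coefficient values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)            # arbitrary seed

# Simulated data: y = 1 + 2*x2 - 0.5*x3 + u (arbitrary true coefficients)
n = 1000
X = np.column_stack([np.ones(n),          # column of ones for the intercept
                     rng.normal(size=n),
                     rng.normal(size=n)])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=n)

# Solve the normal equations (X'X) b = X'y instead of inverting explicitly
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)                                  # close to [1.0, 2.0, -0.5]
```

Solving the linear system is numerically more stable than computing $(X'X)^{-1}$ explicitly, although the textbook formula is of course equivalent.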


Linear Algebra Basics

Transpose of a matrix:
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad
  A' = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$$

Product of two matrices:
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad
  B = \begin{pmatrix} e & f \\ g & h \end{pmatrix}, \quad
  AB = \begin{pmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{pmatrix}$$

Identity matrix I:
$$AI = A, \qquad IA = A, \qquad
  I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\
                      0 & 1 & \cdots & 0 \\
                      \vdots & \vdots & \ddots & \vdots \\
                      0 & 0 & \cdots & 1 \end{pmatrix}$$

Properties of the inverse of a (square) matrix:
$$A A^{-1} = I, \qquad A^{-1} A = I$$

Matrix Multiplication: Typical Cases

General shape rule: $(n \times k)(k \times m) = (n \times m)$

Example   Shape                        Interpretation
e'e       (1 x n)(n x 1) = (1 x 1)     Sum of squared elements of e (inner product)
Xb        (n x k)(k x 1) = (n x 1)     Linear combination of the columns of X
X'X       (k x n)(n x k) = (k x k)     Co-variation of the columns of X (second-moment matrix)
uu'       (n x 1)(1 x n) = (n x n)     Product of all combinations of the elements of u (outer product)


What is the intercept?

OLS estimates a linear function that passes through the means of X and y.
To integrate the intercept into the estimation equation, a vector
consisting of ones has to be added to the matrix X.

Example with an intercept only:
$$y = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \qquad
  X = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$

OLS: $b = (X'X)^{-1} X'y$

$$(X'X)^{-1} = \left( \begin{pmatrix} 1 & 1 & 1 \end{pmatrix}
  \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right)^{-1}
  = (3)^{-1} = \frac{1}{3}$$

$$X'y = \begin{pmatrix} 1 & 1 & 1 \end{pmatrix}
  \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = 1 + 2 + 3 = 6$$

$$b = \frac{1}{3} \cdot 6 = 2 = \bar{y}$$

Central OLS assumptions (I)

X has full rank k
- A solution of OLS is only possible if the matrix (X'X) is invertible,
  i.e. it has to be positive definite (all eigenvalues > 0).
- X has to consist of linearly independent columns (full rank).
- A violation of this assumption is also called perfect collinearity and
  usually results from a wrong specification of the problem.
- Wrong specification of dummy variables:
  - If a variable consisting of c attributes is separated into c dummy
    variables, then X no longer possesses full column rank.
  - Example: gender is separated into female (1 if female, otherwise 0)
    and male (1 if male, otherwise 0):
$$X = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$$
  - The column of ones equals the sum of the two dummy columns, so (X'X)
    is singular; see the sketch below.
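
A small numpy check of this dummy-variable trap (the matrix is the one from the slide):

```python
import numpy as np

# Intercept column plus both gender dummies: the ones column equals
# the sum of the two dummy columns
X = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)

print(np.linalg.matrix_rank(X))    # 2, not 3: X lacks full column rank
print(np.linalg.det(X.T @ X))      # ~0: X'X is singular
# np.linalg.inv(X.T @ X) would raise numpy.linalg.LinAlgError
```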

Central OLS assumptions (II)

E(X'u) = 0
- Regressors are not correlated with the error terms.
- Note:
$$\text{Cov}(a, b) = E\big[(a - E[a])(b - E[b])\big] = E[ab] - E[a]E[b]$$

Homoscedasticity: error terms are iid $(0, \sigma^2)$
- $E(u_i) = 0$: the expected value of the error term is zero.
- $\text{Var}(u_i) = \sigma^2$: the variance of the error terms is constant.
- $E(u_i u_j) = 0$ for $i \neq j$: individual errors are independent
  (no autocorrelation).

Properties of OLS

Unbiasedness: $E(b) = \beta$
$$b = (X'X)^{-1}X'y, \qquad y = X\beta + u$$
$$b = (X'X)^{-1}X'(X\beta + u) = (X'X)^{-1}(X'X)\beta + (X'X)^{-1}X'u$$
$$(b - \beta) = (X'X)^{-1}X'u$$
$$E(b - \beta) = (X'X)^{-1}X'E(u) = 0 \;\Rightarrow\; E(b) = \beta$$

The regressors are uncorrelated with the residual: Cov(e, X) = 0
$$b = (X'X)^{-1}X'y$$
$$(X'X)b = X'(Xb + e) = (X'X)b + X'e$$
$$\Rightarrow\; X'e = 0 \;\Rightarrow\; E(X'e) = 0
  \;\Rightarrow\; \text{Cov}(e, X) = 0$$

Note: E(X'e) = 0 is not to be confused with the assumption E(X'u) = 0.
OLS is calculated such that E(X'e) = 0 holds by construction; nevertheless,
E(ee') can be a biased and inconsistent estimate of E(uu').
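
The orthogonality X'e = 0 can be verified numerically. A sketch with simulated data (all values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)    # arbitrary seed
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b                     # sample residuals

print(X.T @ e)                    # ~[0, 0, 0] up to floating-point error
```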

OLS in the multivariate case (II)

Variance-covariance matrix of the error terms:
$$E(uu') = E\!\left[\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}
  \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix}\right]
  = \begin{pmatrix} E(u_1^2) & E(u_1 u_2) & \cdots & E(u_1 u_n) \\
                    E(u_2 u_1) & E(u_2^2) & \cdots & \vdots \\
                    \vdots & \vdots & \ddots & \vdots \\
                    E(u_n u_1) & E(u_n u_2) & \cdots & E(u_n^2) \end{pmatrix}
  \overset{!}{=} \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\
                    0 & \sigma^2 & \cdots & 0 \\
                    \vdots & \vdots & \ddots & \vdots \\
                    0 & 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 I$$

Inference
- Estimation of $\sigma^2$: $s^2 = e'e/(n-k)$, whose square root is the
  so-called standard error of the regression.
- Variance of the OLS coefficients:
$$\text{var}(b) = E[(b - \beta)(b - \beta)']
  \quad \text{because } E(b) = \beta$$
$$= E[(X'X)^{-1}X'uu'X(X'X)^{-1}]
  = (X'X)^{-1}X'\,E(uu')\,X(X'X)^{-1}
  = (X'X)^{-1}X'\,\sigma^2 I\,X(X'X)^{-1}$$
$$\text{var}(b) = \sigma^2 (X'X)^{-1}, \qquad
  \widehat{\text{var}}(b) = s^2 (X'X)^{-1}$$
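
These formulas translate directly into code. A sketch computing coefficients, standard errors, and t-statistics by hand (simulated data; all values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)    # arbitrary seed
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b

s2 = e @ e / (n - k)              # estimate of sigma^2
var_b = s2 * XtX_inv              # var(b) = s^2 (X'X)^(-1)
se_b = np.sqrt(np.diag(var_b))    # standard errors of the coefficients
t = b / se_b                      # t-statistics

print(np.column_stack([b, se_b, t]))
```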

Hypothesis testing
- The standard errors of the coefficients are the square roots of the
  diagonal of var(b). They can be used to calculate the t-statistic of an
  estimate: t = b/se(b).
- A joint hypothesis test can be conducted by:
$$H_0: R\beta = r \quad \text{with} \quad
  (Rb - r) \sim N\big[0,\; \sigma^2 R(X'X)^{-1}R'\big]$$
  where R describes a (q x k) matrix and r a vector of length q.
- The test statistic is:
$$\frac{(Rb - r)'\big[R(X'X)^{-1}R'\big]^{-1}(Rb - r)\,/\,q}
       {e'e\,/\,(n-k)} \;\sim\; F(q,\, n-k)$$

Example: $H_0: \beta_i = 0,\; i = 1, \dots, 4$
$$R = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\
                      0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad
  r = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad q = 4$$

Example: Estimates as random variables

True model:
$$y = \alpha + \beta x + u, \qquad \alpha = 1,\; \beta = 1$$

100 simulations of random samples of (x, y):
- x ~ N(0,1), one-time sampled (fixed regressors).
- Error terms u are sampled randomly from N(0,1).
- For each sample, conduct the OLS estimation.

Results:
E(a) = 0.9921,  StdDev(a) = 0.1083
E(b) = 0.9993,  StdDev(b) = 0.1015
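
A sketch of this simulation (numpy; the seed is arbitrary, so the exact numbers will differ from the slide):

```python
import numpy as np

rng = np.random.default_rng(5)        # arbitrary seed
n, n_sim = 100, 100
alpha, beta = 1.0, 1.0

x = rng.normal(size=n)                # sampled once: fixed regressors
X = np.column_stack([np.ones(n), x])
proj = np.linalg.inv(X.T @ X) @ X.T   # (X'X)^(-1) X' is fixed across samples

coefs = np.empty((n_sim, 2))
for s in range(n_sim):
    u = rng.normal(size=n)            # fresh error terms in every sample
    coefs[s] = proj @ (alpha + beta * x + u)

print("E(a), E(b):          ", coefs.mean(axis=0))
print("StdDev(a), StdDev(b):", coefs.std(axis=0, ddof=1))
```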


Properties of the variance-covariance matrix under the OLS assumptions

Properties
- Variance-covariance matrix of the coefficients:
$$\text{var}(b) = E[(b - \beta)(b - \beta)']
  = E[(X'X)^{-1}X'uu'X(X'X)^{-1}]
  = (X'X)^{-1}X'\,\Omega\,X(X'X)^{-1}$$
- $\Omega$ is the covariance matrix of the error terms.
- OLS makes the assumption $\Omega = \sigma^2 I$:
$$\Omega = \sigma^2 I
  = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\
                    0 & \sigma^2 & \cdots & 0 \\
                    \vdots & \vdots & \ddots & \vdots \\
                    0 & 0 & \cdots & \sigma^2 \end{pmatrix}$$
- Then:
$$\text{var}(b) = \sigma^2 (X'X)^{-1}X'\,I\,X(X'X)^{-1}
  = \sigma^2 (X'X)^{-1} \qquad \text{as } AA^{-1} = I$$
- If the OLS assumptions are violated, the standard significance
  statements are wrong.

Heteroscedasticity
- OLS assumption: the error term u_i possesses a constant variance for all
  observations i (homoscedasticity).
- Example of a violation: $\sigma_i^2 = f(x_i) = x_i^2$
- Heteroscedasticity:
$$E(uu') = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\
                    0 & \sigma_2^2 & \cdots & 0 \\
                    \vdots & \vdots & \ddots & \vdots \\
                    0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$$

Results of a simulation study with heteroscedastic error terms

Monte-Carlo simulation
- Assume the true model is: $y_i = 1 + x_i + u_i$ with $u_i \sim N(0, x_i^2)$
- Create 1,000 samples, each with a sample size of N = 100 observations.
- A regression of y on x is run for each of the 1,000 data sets, and the
  resulting intercept and slope are recorded.

N = 100                                   α = 1       β = 1
Coefficients (OLS)                        1.0091      0.99751
OLS s.e. (avg. / incorrect)               0.60031     0.19875
OLS s.e. (sim. distribution / correct)    0.38269     0.21949
White s.e.                                0.37341     0.21262

- The standard errors of OLS are biased here: the estimated standard error
  of the intercept is too large; the estimated standard error of the slope
  is too small.
- Note: the coefficient estimates of OLS are still unbiased even in the
  presence of heteroscedasticity. However, unbiased standard errors are
  essential for inference.

White Correction of the Variance-Covariance Matrix
- The White correction determines an adapted covariance matrix from the
  sample residuals to correct for heteroscedasticity.
- The covariance matrix under heteroscedasticity is:
$$\text{var}(b) = (X'X)^{-1}
  \underbrace{\big(X'\,\Omega\,X\big)}_{S_0:\; k \times k}
  (X'X)^{-1}, \qquad
  \Omega = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\
                    0 & \sigma_2^2 & \cdots & 0 \\
                    \vdots & \vdots & \ddots & \vdots \\
                    0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$$
  (dimensions: $X'$ is $k \times n$, $\Omega$ is $n \times n$,
  $X$ is $n \times k$)
- $\Omega$ has n parameters, so it cannot be estimated from
  n observations.
- The coefficient estimators of OLS are unbiased, so the residuals e are
  unbiased estimates of u.
- White:
$$S_0 = \sum_{i=1}^{n} e_i^2\, x_i x_i'$$
- The White matrix is asymptotically unbiased under any type of
  heteroscedasticity, and only k parameters have to be estimated.
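
A sketch of the White (sandwich) standard errors for the heteroscedastic model above (numpy; seed arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)            # arbitrary seed
n = 100
x = rng.normal(size=n)
u = rng.normal(size=n) * np.abs(x)        # heteroscedastic: Var(u_i) = x_i^2
y = 1 + x + u

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b

# Classical OLS standard errors (incorrectly assume sigma^2 * I)
s2 = e @ e / (n - X.shape[1])
se_ols = np.sqrt(np.diag(s2 * XtX_inv))

# White sandwich: S0 = sum_i e_i^2 x_i x_i'
S0 = (X * (e**2)[:, None]).T @ X
se_white = np.sqrt(np.diag(XtX_inv @ S0 @ XtX_inv))

print(se_ols, se_white)
```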

Correlation of the regressors with the error term (Endogeneity)
- OLS assumes that the regressors X and the error term u are uncorrelated:
  E(X'u) = 0.
- A violation of this assumption results particularly if
  - the X-variables are measured with error,
  - an X-variable is endogenous with y,
  - an X-variable that is relevant in the population is omitted from the
    regression.

Simulation experiment (S = 1,000 samples; true slope $\beta = 0$):
$$y = 0 + 0 \cdot x + u \quad \text{with} \quad x = \delta u + \eta, \qquad
  u, \eta \sim N(0, 1)$$

OLS estimate of the slope    δ = 1       δ = 0.5     δ = 0.1
Parameter                    0.5075      0.40766     0.10106
Avg. s.e.                    0.050628    0.081588    0.10124
Std. Dev.                    0.035988    0.067452    0.10102
1%tile                       0.42086     0.24095     -0.14092
99%tile                      0.59003     0.55394     0.32374

- OLS is extremely biased.
- Solution: Instrumental Variables Regression.
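
A sketch of this experiment for δ = 0.5 (numpy; the probability limit δ/(δ² + 1) = 0.4 follows from the setup above):

```python
import numpy as np

rng = np.random.default_rng(7)      # arbitrary seed
n, n_sim, delta = 100, 1000, 0.5

slopes = np.empty(n_sim)
for s in range(n_sim):
    u = rng.normal(size=n)
    eta = rng.normal(size=n)
    x = delta * u + eta             # regressor correlated with the error term
    y = 0 + 0 * x + u               # true slope is zero
    X = np.column_stack([np.ones(n), x])
    slopes[s] = np.linalg.solve(X.T @ X, X.T @ y)[1]

print(slopes.mean())                # ~0.4 = delta/(delta^2 + 1), not 0
```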

Simulation results with E(X'u) ≠ 0

[Figure: simulated distribution of the OLS slope estimates for the case
δ = 0.5.]


Panel Estimation
- Panel data consists of cross-section and time-series data:
  N individuals, repeatedly observed at T points in time.
- Simple OLS would pool all N*T observations, assuming independence.
- Obviously, with economic individuals (like firms, stocks, countries,
  etc.) in the cross-section,
  - repeated observations of the same individual will be more similar than
    observations between individuals;
  - OLS will then be inconsistent and biased.

Solutions
- Estimate a system of equations, one for each individual.
- Estimate a system of equations with restrictions requiring some
  homogeneity (e.g. same slope, different intercepts).

Pooled OLS: $y = Xb + e$

[Figure: scatter plot of pooled observations from individuals I1, I2, I3
with a single fitted line.]

Fixed Effects Model (FE)
- Assumption: individual differences are captured by differences in the
  constant term (intercept).
- This amounts to including one dummy variable per individual:
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
  = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}\beta
  + \begin{pmatrix} i & 0 & \cdots & 0 \\
                    0 & i & \cdots & 0 \\
                    \vdots & \vdots & \ddots & \vdots \\
                    0 & 0 & \cdots & i \end{pmatrix}
    \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}
  + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$
$$y = \begin{bmatrix} X & d_1 & \dots & d_n \end{bmatrix}
  \begin{pmatrix} \beta \\ \alpha \end{pmatrix} + \varepsilon$$
- y_1: vector of observations of the dependent variable of individual 1.
- X_1: explanatory variables of individual 1.
- i is a vector of ones with length corresponding to y_1.

Properties:
- Computationally intensive for large N.
- Significance of the fixed effects: F-test for the joint significance of
  the dummies.
- The model is robust against misspecification: every time-invariant
  explanatory variable is captured by the dummies (e.g. legal form of
  firms, industry affiliation, etc.).
- The individual effects can be correlated with the regressors.
- Specific time-invariant variables can only be estimated / included as
  interaction terms with other regressors.
- The fixed effects themselves are biased estimates.
- This is just a classical regression! Basically, the individual time
  series are demeaned and then estimated by OLS.
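
A sketch of this within (demeaning) transformation versus pooled OLS (numpy; all parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)            # arbitrary seed
N, T, beta = 50, 10, 2.0                  # 50 individuals, 10 periods

ids = np.repeat(np.arange(N), T)
alpha = rng.normal(size=N)                # individual fixed effects
x = rng.normal(size=N * T) + alpha[ids]   # regressor correlated with effects
y = alpha[ids] + beta * x + rng.normal(size=N * T)

def demean(v):
    """Subtract each individual's time-series mean (within transformation)."""
    means = np.bincount(ids, weights=v) / T
    return v - means[ids]

x_dm, y_dm = demean(x), demean(y)
b_fe = (x_dm @ y_dm) / (x_dm @ x_dm)      # FE slope from demeaned data

b_pooled = np.polyfit(x, y, 1)[0]         # pooled OLS slope, biased here
print(b_fe, b_pooled)                     # b_fe ~ 2.0; b_pooled is off
```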