by

JACKIE JEN-CHY HSU

We accept this thesis as conforming
to the required standard

(c) Jackie Jen-Chy Hsu, 1980
In presenting this thesis in partial fulfilment of the requirements for an advanced degree at The University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Commerce & Business Admin.

The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada
V6T 1W5

Date: Feb. 8, 1980
ABSTRACT

The presence of multicollinearity can induce large variances in the ordinary least-squares estimates of regression coefficients, and the presence of serially correlated error terms likewise has adverse effects on estimation. Although the two conditions often occur together in regression analysis, they are usually dealt with separately. This thesis explores the mean square error properties of the ordinary ridge estimator as well as the ordinary least-squares estimator when multicollinearity and autocorrelation are jointly present, and derives a ridge estimator that is adjusted for autocorrelation. Finally, using simulation experiments with different degrees of multicollinearity and autocorrelation, we compare the mean square error properties of various two-stage estimators.
TABLE OF CONTENTS

1. INTRODUCTION
2. NOTATION AND PRELIMINARIES
3. MULTICOLLINEARITY
   3.1 Sources
   3.2 Effects
   3.3 Detection
4. AUTOCORRELATION
   4.1 Sources
   4.2 Effects
   4.3 Detection
5.
   5.2
   5.3
   5.4
6. RIDGE REGRESSION: ESTIMATION AND PREDICTION
   6.1
   6.2
   6.3
   6.4 Estimation
   6.5 Prediction
7.
   7.1 Design of the Experiments
   7.2 Sampling Results
      7.2a. Results assuming ρ is known
      7.2b. Results assuming ρ is unknown
      7.2c. Forecasting
CONCLUSIONS
REFERENCES
INTRODUCTION
M u l t i c o l l i n e a r i t y and A u t o c o r r e l a t i o n a r e two v e r y
in regression analysis.
common problems
of m u l t i c o l l i n e a r i t y r e s u l t s i n e s t i m a t i o n ,
instability
and model m i s -
s p e c i f i c a t i o n w h i l e the presence o f s e r i a l l y c o r r e l a t e d e r r o r s l e a d s t o
u n d e r e s t i m a t i o n o f the v a r i a n c e s
prediction.
estimation
o f parameter e s t i m a t e s and i n e f f i c i e n t
reduce t h e i r impact.
c o r r e l a t i o n problems a r e d e a l t w i t h s e p a r a t e l y
preceedings.
In t h i s t h e s i s we address the q u e s t i o n
of m u l t i c o l l i n e a r i t y and a u t o c o r r e l a t i o n on e s t i m a t i o n
Thereafter
e f f e c t i v e n e s s of various
these two c o n d i t i o n s .
estimator
adjusted
p r o p e r t i e s a r e i n v e s t i g a t e d by c o n d u c t i n g a s i m u l a t i o n
We b r i e f l y o u t l i n e t h i s t h e s i s .
our
and p r e d i c t i o n ? "
analysis.
Sections
Section 2 provides
3 and 4 g i v e a g e n e r a l
of m u l t i c o l l i n e a r i t y and a u t o c o r r e l a t i o n .
the v a l i d i t y o f v a r i o u s
study.
the s e t t i n g f o r
d i s c u s s i o n of the problems
In a d d i t i o n , we comment on
The a n a l y t i c a l study
of the j o i n t e f f e c t s of m u l t i c o l l i n e a r i t y and a u t o c o r r e l a t i o n i s p r e s e n t ed i n S e c t i o n 5 .
I n S e c t i o n 6 , a new r i d g e e s t i m a t o r
adjusted
f o r auto-
i n practice.
-2-
7.
The
t h e s i s concludes w i t h the p r e s e n t a t i o n
of s e v e r a l two-stage
predictions.
will
2. NOTATION AND PRELIMINARIES

Consider the equation

    Y = Xβ + e                                                     (2.1)

where Y is an n×1 vector of observations on the dependent variable, X is an n×p matrix of observations on the explanatory variables, β is a p×1 vector of regression coefficients to be estimated and e is an n×1 vector of true error terms. The assumptions of the classical linear regression (CLR) model are:

(1) E(e) = 0, where 0 is the zero vector;
(2) X is a nonstochastic matrix of full rank;
(3) E(ee^T) = σ²I, where I is the identity matrix; that is, the error terms are uncorrelated and share a common variance.

The ordinary least-squares (OLS) estimator of β is given by

    β̂_OLS = (X^T X)^{-1} X^T Y                                     (2.2)

with

    Var(β̂_OLS) = σ²(X^T X)^{-1}.                                   (2.3)

For simplicity, we will assume that (X^T X) is in correlation form. Let P be the p×p orthogonal matrix such that P^T X^T X P = Λ, where Λ is a diagonal matrix with the eigenvalues of (X^T X), λ₁, ..., λ_p, displayed on the diagonal of Λ. We assume further that λ₁ ≥ λ₂ ≥ ... ≥ λ_p.
After applying an orthogonal rotation P, it follows from (2.1) that

    E(Y) = X P P^T β = X* α                                        (2.4)

where X* = XP is the data matrix represented in the rotated coordinates, and the columns of X* are linearly independent. The vector α = P^T β is the vector of regression coefficients of the principal components, and the OLS estimator of α is given by

    α̂_OLS = P^T β̂_OLS.                                             (2.5)

We will consider ridge estimators for β of the form

    β̂_R(k) = (X^T X + kI)^{-1} X^T Y,    0 < k < 1,                 (2.6)

where k is independent of Y. If k is allowed to depend on the data, the estimator is said to be an "adaptive ridge estimator" [12]; if kI is replaced by a symmetric nonnegative definite matrix, it is called a "generalized ridge estimator" [11:p.63].

Expressed in the rotated coordinates, by substituting X* = XP in (2.6), the ridge estimator of α is

    α̂_R(k) = (Λ + kI)^{-1} Λ α̂_OLS = Z α̂_OLS                        (2.7)

where Z = (Λ + kI)^{-1} Λ. It follows from (2.7) that

    β̂_R(k) = P α̂_R(k).                                             (2.8)
When the error terms are serially correlated, assumption (3) of the CLR model fails. This leads to the formulation of the Autocorrelated Linear Regression (ALR) model. Mathematically, the ALR model retains the assumptions of the CLR model with (3) replaced by (3') below:

(3') E(ee^T) = σ²Ω, where Ω is a nondiagonal positive definite matrix.

Each observation on the explanatory variables is assumed uncorrelated with the succeeding errors. We assume that the error term follows a first-order autoregressive scheme, that is,

    e_t = ρ_e e_{t-1} + U_t                                        (2.9)

where ρ_e is the autocorrelation coefficient and U_t satisfies the following for all t:

    E(U_t) = 0,
    E(U_t U_{t+s}) = σ_u²   for s = 0,                             (2.10)
                   = 0      for s ≠ 0.

It follows that

    E(ee^T) = σ_u² V                                               (2.11)

where

                     ⎡ 1           ρ_e         ρ_e²    ...  ρ_e^{n-1} ⎤
    V = 1/(1-ρ_e²) · ⎢ ρ_e         1           ρ_e     ...  ρ_e^{n-2} ⎥
                     ⎢ ...                                           ⎥
                     ⎣ ρ_e^{n-1}   ρ_e^{n-2}   ...           1       ⎦

We require that |ρ_e| < 1.
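The structure of V can be written down directly. A small sketch (n and ρ_e are arbitrary illustration values) builds V as in (2.11) and confirms the well-known closed form of its inverse, the tridiagonal matrix that underlies the transformation used in the next section:

```python
import numpy as np

n, rho = 6, 0.5

# V from (2.11): V_ij = rho^|i-j| / (1 - rho^2)
idx = np.arange(n)
V = rho ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho ** 2)

# Known closed-form inverse: tridiagonal, with 1 at the two corner
# diagonal entries, (1 + rho^2) on the interior diagonal, -rho off it.
Vinv = np.zeros((n, n))
np.fill_diagonal(Vinv, 1 + rho ** 2)
Vinv[0, 0] = Vinv[-1, -1] = 1.0
Vinv[idx[:-1], idx[1:]] = -rho
Vinv[idx[1:], idx[:-1]] = -rho

assert np.allclose(np.linalg.inv(V), Vinv)
```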
Generalized least-squares (GLS) will give the best linear unbiased estimator (BLUE) of β, denoted as β̂_GLS. The matrix Ω can be written as

    Ω = Q Q^T

where Q is nonsingular; hence Q^{-1} Ω (Q^{-1})^T = I. Transforming the model by Q^{-1}, it follows that

    Q^{-1}Y = Q^{-1}Xβ + Q^{-1}e.                                  (2.12)

Since the transformed errors Q^{-1}e satisfy the CLR assumptions, applying OLS to (2.12) will give the BLUE of β. Hence it follows that

    β̂_GLS = (X^T Ω^{-1} X)^{-1} X^T Ω^{-1} Y.                      (2.13)

For prediction, the following formula will be used:

    Ŷ_{t+1} = X_{t+1} β̂_GLS + ρ_e e_t                              (2.15)

where e_t is the current residual.
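Formula (2.13) and the transformation argument leading to (2.12) can be illustrated numerically: factoring Ω = QQ^T by a Cholesky decomposition and running OLS on the transformed data reproduces the GLS estimate exactly. The data below are invented for the illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho = 30, 0.6
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
Y = rng.standard_normal(n)

# Omega taken as the AR(1) matrix V of (2.11)
idx = np.arange(n)
Omega = rho ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho ** 2)

# Factor Omega = Q Q^T and transform the model by Q^{-1}, as in (2.12)
Q = np.linalg.cholesky(Omega)
Xt = np.linalg.solve(Q, X)
Yt = np.linalg.solve(Q, Y)
beta_ols_transformed = np.linalg.lstsq(Xt, Yt, rcond=None)[0]

# Direct GLS formula (2.13)
Oi = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X.T @ Oi @ X, X.T @ Oi @ Y)

assert np.allclose(beta_ols_transformed, beta_gls)
```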
3. MULTICOLLINEARITY

In applying multiple regression models, some degree of interdependence among explanatory variables can be expected. As this interdependence grows and the correlation matrix (X^T X) approaches singularity, multicollinearity constitutes a problem. Therefore it is a matter of degree rather than of "existence" or "nonexistence".

3.1 Sources

In general, multicollinearity can be considered to be a symptom of poor experimental design. The sources can be classified as follows [20:p.99-101].

(i) Not enough data or too many variables

As the number of variables extracted from the data increases, each highly collinear variable only has little information content. In this case, deleting some variables or collecting more data can help.

(ii) Model constraints

Variables that are related to one another through mathematical or theoretical constraints are included in the model.

(iii) Sampling singularity

Due to expense, accident or mistake, sampling was conducted in only a small region of the design space.
3.2 Effects

The major effects of multicollinearity are the following.

(i) Estimation instability

As the correlation matrix (X^T X) becomes ill-conditioned, the elements of the inverse matrix (X^T X)^{-1} explode, and the inverse matrix may even be numerically impossible to obtain. (2.3) shows that the variances of the OLS estimates of the regression coefficients are governed by the diagonal elements of the inverse matrix (X^T X)^{-1}. As a result of serious multicollinearity, the OLS estimates are quite sensitive to small changes in the data set; in any case, they have large variances.

(ii) Structure misspecification

The information content of each explanatory variable depends on the size of the variable set of X. As the variable set increases, each variable's contribution to the explained variance of Y decreases, thereby decreasing the apparent significance of each member of the set, even though Y really depends on many of them. As asserted by many authors [6:p.94][13:p.160][15], there is a tendency in the process of theoretical model-building, with data limitation partly responsible, to underspecify models rather than to retain a relatively large variable set. Therefore, erroneous deletion of variables may happen.

(iii) Forecast inaccuracy

If an important variable is omitted because it is highly collinear with the retained variables, but its behavior later changes and it moves independently of the other variables in the prediction period, then any forecasting under this oversimplified model will be very inaccurate.

(iv) Numerical problems

The matrix (X^T X) is singular when the columns of X are linearly dependent. With the matrix (X^T X) being singular, the OLS estimates of β, represented by (2.2), are completely indeterminate. In case of an almost singular set of variables, the numerical instability in calculating the inverse matrix (X^T X)^{-1} still remains.

3.3 Detection
Tests for the presence and location of serious multicollinearity are briefly outlined and followed by comments.

(i) Tests based on various correlation coefficients

Here, harmful multicollinearity is generally recognized by rules of thumb. A simple rule requires the pair-wise correlation coefficients of the explanatory variables to be less than 0.8. Certainly, more sophisticated rules give more satisfactory results. The following rule of thumb is generally considered to be superior to other rules: a variable is said to be highly multicollinear if its coefficient of multiple correlation, R_i², with the remaining (p-1) variables is greater than the coefficient of multiple correlation, R², of Y with all the explanatory variables [14:p.101]. The variance of the estimate of β_i can be expressed as follows [9]:

    Var(β̂_i) = (σ_y² / σ_{X_i}²) · (1 − R²)/(1 − R_i²)             (3.1)

where σ_y² is the variance of the dependent variable Y and σ_{X_i}² is the variance of X_i. From (3.1), it is obvious that multicollinearity constitutes a problem only when R_i² is relatively high compared to R². Unfortunately, the geometric interpretation of this rule of thumb is apparent only when there are two explanatory variables [6:p.98].
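The R_i² rule of thumb is easy to compute by auxiliary regressions. A sketch on invented data (x1 and x2 nearly collinear, x3 independent; the helper name is ours):

```python
import numpy as np

def r_squared(y, X):
    """R^2 of an OLS regression of y on X (with intercept)."""
    Xc = np.column_stack([np.ones(len(y)), X])
    resid = y - Xc @ np.linalg.lstsq(Xc, y, rcond=None)[0]
    yc = y - y.mean()
    return 1 - (resid @ resid) / (yc @ yc)

rng = np.random.default_rng(2)
n = 50
x1 = rng.standard_normal(n)
x2 = x1 + 0.05 * rng.standard_normal(n)      # nearly collinear with x1
x3 = rng.standard_normal(n)
X = np.column_stack([x1, x2, x3])
y = x1 + x3 + rng.standard_normal(n)

R2 = r_squared(y, X)                         # Y on all explanatory variables
R2_each = [r_squared(X[:, i], np.delete(X, i, axis=1)) for i in range(3)]
for i, R2_i in enumerate(R2_each):
    print(i, round(R2_i, 3), "collinear" if R2_i > R2 else "ok")
```

The collinear pair is flagged because each member is almost perfectly explained by the other, while x3 is not.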
(ii) Three-stage hierarchy test

This is proposed by Farrar and Glauber [6]. At the first stage, if the null hypothesis H₀: |X^T X| = 1 is rejected based on the Wilks-Bartlett test, we may assert that multicollinearity is severe and move toward the second stage. The F statistic is then computed for each R_i²:

    F_i = (R_i²/(p − 1)) / ((1 − R_i²)/(n − p)),    i = 1, ..., p.

A statistically significant F_i implies that X_i is collinear. At the third stage, inspection of the partial correlation coefficients between X_i and the remaining (p − 1) variables, and the associated t-ratios, can show the pattern of interdependency among the explanatory variables. Farrar and Glauber claimed that detecting, localizing and learning the pattern of interdependence among the explanatory variables can be respectively achieved at the three different stages of their test.

(iii) Haitovsky test

In 1969, Haitovsky [9] proposed a heuristic Chi-square statistic for the hypothesis of severe multicollinearity. This heuristic statistic is a function of the determinant of the correlation matrix (X^T X), and is approximately distributed as Chi-square. Applications suggested its use at the first stage of the three-stage test; therefore Haitovsky claimed the superiority of his test and suggested a replacement of the Wilks-Bartlett test by his test in the three-stage test.

However, any test based on the determinant of the correlation matrix has some built-in deficiencies. A matrix with one very small eigenvalue is treated equivalently to those having several relatively small eigenvalues, so it is difficult if not impossible to infer from the results of any test based on the determinant the relative magnitude of the eigenvalues, e.g. the ratio λ₁/λ_p. However, the Haitovsky test gives a fairly good indication in the usual case. Since the trace of the correlation matrix (X^T X) is equal to the number of explanatory variables p, a standard of comparison for judging an eigenvalue small is available; further discussion is given in [21:p.13-14]. Among all these tests and methods proposed, examining the eigenvalues of the correlation matrix (X^T X) directly is the most informative.
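The second-stage F statistic and the eigenvalue inspection can both be sketched in a few lines (invented data; the first two regressors are made nearly collinear):

```python
import numpy as np

def farrar_glauber_F(X):
    """Second-stage F statistics: F_i = (R_i^2/(p-1)) / ((1-R_i^2)/(n-p))."""
    n, p = X.shape
    Fs = []
    for i in range(p):
        A = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        resid = X[:, i] - A @ np.linalg.lstsq(A, X[:, i], rcond=None)[0]
        xc = X[:, i] - X[:, i].mean()
        R2_i = 1 - (resid @ resid) / (xc @ xc)
        Fs.append((R2_i / (p - 1)) / ((1 - R2_i) / (n - p)))
    return np.array(Fs)

rng = np.random.default_rng(3)
n = 40
x1 = rng.standard_normal(n)
X = np.column_stack([x1, x1 + 0.1 * rng.standard_normal(n),
                     rng.standard_normal(n)])

F = farrar_glauber_F(X)
print(F.round(1))

# Eigenvalues of the correlation matrix: trace = p, so an eigenvalue far
# below 1 flags a nearly collinear direction.
Xs = (X - X.mean(0)) / (X.std(0) * np.sqrt(n))
lam = np.linalg.eigvalsh(Xs.T @ Xs)
print(lam.round(3))
```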
4. AUTOCORRELATION

One of the basic assumptions of the CLR model is that the error terms are independent of each other. However, when regression analysis is applied to time series data, the error terms are often serially correlated. This is another widespread problem in applying regression models. For simplicity, first-order autocorrelation is assumed in our study.

4.1 Sources

The sources are mainly the following:

(i) Omission of variables
The time-ordered effects of the omitted variables will be included in the error terms, keeping them from displaying random behavior. In this case, finding the missing variables and identifying the correct relationship can solve the problem.

(ii) Systematic measurement error in the dependent variable

Again, the error terms absorb the systematic measurement error in the dependent variable and then display non-random behavior.

(iii) Error structure is time dependent

The great impacts of some random events or shocks, such as war, strikes, flood, etc., are spread over several periods of time, causing the error terms to be serially correlated. This is so-called "true autocorrelation".
4.2 Effects

When the OLS technique is still used for estimation, the major effects are:

(i) Unbiased but inefficient estimator of β

GLS provides the BLUE of β when the dispersion matrix of e, σ²Ω, is nondiagonal. OLS remains unbiased but does not attain the sampling variance of the BLUE of β; hence OLS is inefficient compared with GLS.

(ii) Underestimation of the variances of the estimates of β

As an illustration, consider the very simple model

    y_t = β x_t + e_t,    e_t = ρ_e e_{t-1} + u_t,

where u_t satisfies assumptions (2.10). The variance of the OLS estimate of β is [13:p.247]

    Var(β̂_OLS) = (σ_e² / Σ_{t=1}^n x_t²) [ 1 + 2ρ_e (Σ_{t=1}^{n-1} x_t x_{t+1}) / (Σ_{t=1}^n x_t²)
                 + 2ρ_e² (Σ_{t=1}^{n-2} x_t x_{t+2}) / (Σ_{t=1}^n x_t²) + ...
                 + 2ρ_e^{n-1} (x_1 x_n) / (Σ_{t=1}^n x_t²) ].       (4.1)

Ignoring the serial correlation in (4.1) gives the variance of the estimate of β as σ_e²/Σ x_t². If both e and x are positively autocorrelated, the expression in brackets exceeds one, so the conventional formula underestimates the true sampling variance.

(iii) Inefficient predictor of Y

When autocorrelation is present, error made at one point in time gives information about the error made at a subsequent point in time; a predictor that ignores this information is inefficient.
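The understatement described by (4.1) is easy to exhibit by simulation. In the sketch below (all values invented: ρ_e = 0.8 and an AR(1) regressor), the Monte-Carlo variance of the OLS slope comes out several times the conventional value σ_e²/Σ x_t²:

```python
import numpy as np

rng = np.random.default_rng(4)
n, rho, reps = 40, 0.8, 2000

# one fixed, positively autocorrelated regressor
x = np.empty(n)
x[0] = rng.standard_normal()
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()

slopes = []
for _ in range(reps):
    e = np.empty(n)
    e[0] = rng.standard_normal() / np.sqrt(1 - rho ** 2)
    for t in range(1, n):
        e[t] = rho * e[t - 1] + rng.standard_normal()
    y = 2.0 * x + e                     # true beta = 2, no intercept
    slopes.append(x @ y / (x @ x))      # OLS slope

mc_var = np.var(slopes)                 # true sampling variance (Monte Carlo)
conventional = (1 / (1 - rho ** 2)) / (x @ x)   # sigma_e^2 / sum x_t^2
print(round(mc_var / conventional, 1))
```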
4.3 Detection

The tests which are commonly used to recognize the existence of first-order autocorrelation are the following.

(i) Eye-ball tests

The OLS residuals e_t are plotted against time or against the lagged value e_{t-1}. Any nonrandom behavior can be considered as an indication of autocorrelation.

(ii) von Neumann ratio

In 1941, the ratio of the mean square successive difference to the variance was proposed by von Neumann as a test statistic for the existence of first-order autocorrelation [22]. The test requires the observations to be independently distributed under the null hypothesis; in practice, however, the OLS residuals used to compute the von Neumann ratio are usually not independently distributed even when the true error terms are.

(iii) Durbin-Watson test

This test, named after its originators Durbin and Watson, is widely used for small sample sizes [4][5]. There are some shortcomings. First, there exist two regions in which the test is inconclusive. Secondly, the Durbin-Watson test is derived for regressions without lagged dependent variables; as shown by Henshaw [17], when a lagged dependent variable is present the statistic d is biased towards the value for a random error, that is, d is biased towards 2, thereby giving very misleading information. It is as necessary as it is important to test for serial correlation in models containing lagged dependent variables, since autocorrelated models are usually repaired by inserting lagged Y values into the right-hand side of the regression equation. To this end, Durbin developed a test based on the h statistic in 1970 [3]. "h" is defined as the following:

    h = ρ̂ √( n / (1 − n V̂(b₁)) )

where ρ̂ is the estimated first-order autocorrelation coefficient of the residuals and V̂(b₁) is the estimated variance of the coefficient of the lagged dependent variable. This test is computationally cheap but only applicable for large sample sizes; its small-sample properties are still unknown.
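The von Neumann ratio and the Durbin-Watson statistic are both simple functions of successive differences. A sketch, applied here to simulated series rather than to regression residuals, purely for illustration:

```python
import numpy as np

def durbin_watson(e):
    """d = sum (e_t - e_{t-1})^2 / sum e_t^2; near 2 for random errors,
    near 0 under strong positive first-order autocorrelation."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def von_neumann(e):
    """Mean square successive difference divided by the sample variance."""
    return (np.sum(np.diff(e) ** 2) / (len(e) - 1)) / np.var(e, ddof=1)

rng = np.random.default_rng(5)
white = rng.standard_normal(200)        # independent errors
ar = np.empty(200)
ar[0] = rng.standard_normal()
for t in range(1, 200):
    ar[t] = 0.9 * ar[t - 1] + rng.standard_normal()

print(round(durbin_watson(white), 2), round(durbin_watson(ar), 2))
print(round(von_neumann(white), 2), round(von_neumann(ar), 2))
```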
5.

In statistical analysis, a point estimate is usually of little use unless accompanied by an estimate of its accuracy. In this connection, Mean Square Error (MSE) can be used to determine the merit of an estimator, since it is true that accurate parameter estimates constitute an effective model. In their 1970 paper, Hoerl and Kennard presented expressions for the MSE of the ridge estimator; thereafter the results of various studies supported the claim that ridge regression will reduce the MSE in the presence of severe multicollinearity. In this section, we will present expressions for the MSE of β̂_OLS and β̂_R(k) when the error terms follow the autoregressive scheme (2.9). Our analysis can be reduced to that of Hoerl and Kennard by setting ρ_e = 0.

5.1

Consider the ALR model. Let

    L₁ = Distance from β̂_OLS to β,

so that

    L₁² = (β̂_OLS − β)^T (β̂_OLS − β).

We define the MSE of β̂_OLS to be E(L₁²).

Proposition 5.1

    E(L₁²) = σ_u² Σ_{i=1}^n Σ_{j=1}^n V_{ij} (D D^T)_{ij}           (5.1)

where

    D = X(X^T X)^{-1}.                                             (5.2)

Proof: From (2.1) and (2.2),

    β̂_OLS − β = (X^T X)^{-1} X^T (Xβ + e) − β = (X^T X)^{-1} X^T e = D^T e,

so that

    E(L₁²) = E[(β̂_OLS − β)^T (β̂_OLS − β)] = E[e^T D D^T e].

Noting that E(e) = 0, it follows from Theorem 4.6.1 of Graybill [7:p.139] that

    E(L₁²) = σ_u² tr[X(X^T X)^{-2} X^T V],                          (5.3)

which is (5.1).

Proposition 5.2

In the rotated coordinates,

    E(L₁²) = (σ_u²/(1 − ρ_e²)) [ Σ_{i=1}^p 1/λ_i
             + 2 Σ_{i=1}^p (1/λ_i²) Σ_{j=1}^{n−1} ρ_e^j Σ_{t=1}^{n−j} x*_{ti} x*_{t+j,i} ]    (5.4)

where x*_{ti} is the t-th observation on the i-th principal component.

Proof: From (2.4),

    α̂_OLS − α = (X*^T X*)^{-1} X*^T Y − α = (X*^T X*)^{-1} X*^T e = Λ^{-1} X*^T e.    (5.5)

Since P is orthogonal,

    L₁² = (β̂_OLS − β)^T P P^T (β̂_OLS − β) = (α̂_OLS − α)^T (α̂_OLS − α),

and by (5.5) and the argument of Proposition 5.1,

    E(L₁²) = σ_u² tr[X* Λ^{-2} X*^T V].

Substituting the structure of V from (2.11) and using X*^T X* = Λ gives (5.4), as was to be demonstrated.

Several remarks follow. First, if ρ_e is positive and the principal components are positively autocorrelated, the cross-product term in (5.4) is positive and inflates the MSE beyond its value under independent errors; that is, the MSE depends on the autocorrelation coefficient ρ_e as well as on the eigenvalues. Second, if the matrix (X^T X) is ill-conditioned, the joint effect of multicollinearity and autocorrelation is extremely harmful: both terms of (5.4) are magnified by the small eigenvalues, and E(L₁²) can be very large. Finally, from (5.4) we are able to tell how each principal component, through its eigenvalue λ_i and its serial correlation characteristics, contributes to the MSE.
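Propositions 5.1 and 5.2 give the same quantity in two forms, which provides a useful numerical check. The sketch below (invented data, with X scaled so that X^T X is in correlation form) evaluates both the trace form (5.3) and the eigenvalue expansion (5.4):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, rho, s2u = 15, 3, 0.6, 1.0
X = rng.standard_normal((n, p))
X = (X - X.mean(0)) / (X.std(0) * np.sqrt(n))     # correlation form

idx = np.arange(n)
V = rho ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho ** 2)

# (5.3): E(L1^2) = sigma_u^2 tr[X (X'X)^-2 X' V]
XtX = X.T @ X
mse_trace = s2u * np.trace(
    X @ np.linalg.matrix_power(np.linalg.inv(XtX), 2) @ X.T @ V)

# (5.4): expansion in the rotated coordinates
lam, P = np.linalg.eigh(XtX)
Xs = X @ P
S = np.zeros(p)                                   # sum_j rho^j sum_t x*_ti x*_{t+j,i}
for j in range(1, n):
    S += rho ** j * np.sum(Xs[:-j] * Xs[j:], axis=0)
mse_eigen = s2u / (1 - rho ** 2) * np.sum(1 / lam + 2 * S / lam ** 2)

assert np.allclose(mse_trace, mse_eigen)
```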
5.2

In parallel with 5.1, we define

    L₂(k) = Distance from β̂_R(k) to β.

The MSE of β̂_R(k) is given by E[L₂(k)²] = E[(β̂_R(k) − β)^T (β̂_R(k) − β)].

Proposition 5.3

    E[L₂(k)²] = γ₁(k) + γ₂(k) + γ₃(k)                               (5.6)

where

    γ₁(k) = (σ_u²/(1 − ρ_e²)) Σ_{i=1}^p λ_i/(λ_i + k)²,

    γ₂(k) = k² Σ_{i=1}^p α_i²/(λ_i + k)²,

    γ₃(k) = (2σ_u²/(1 − ρ_e²)) Σ_{i=1}^p (1/(λ_i + k)²) Σ_{j=1}^{n−1} ρ_e^j Σ_{t=1}^{n−j} x*_{ti} x*_{t+j,i}.

Proof: Since P is orthogonal,

    E[L₂(k)²] = E[(α̂_R(k) − α)^T (α̂_R(k) − α)] = E[(Z α̂_OLS − α)^T (Z α̂_OLS − α)]
              = E[(α̂_OLS − α)^T Z^T Z (α̂_OLS − α)] + (Zα − α)^T (Zα − α).      (5.7)

For the first term, by (5.5),

    E[(α̂_OLS − α)^T Z^T Z (α̂_OLS − α)] = σ_u² tr[X* (Λ + kI)^{-2} X*^T V]
        = (σ_u²/(1 − ρ_e²)) [ Σ_{i=1}^p λ_i/(λ_i + k)²
          + 2 Σ_{i=1}^p (1/(λ_i + k)²) Σ_{j=1}^{n−1} ρ_e^j Σ_{t=1}^{n−j} x*_{ti} x*_{t+j,i} ]
        = γ₁(k) + γ₃(k).                                            (5.8)

For the second term, note that

    Z − I = Z(I − Z^{-1}) = Z(−kΛ^{-1}) = −k(Λ + kI)^{-1},          (5.9)

so that

    (Zα − α)^T (Zα − α) = α^T (Z − I)^T (Z − I) α = k² α^T (Λ + kI)^{-2} α
        = k² Σ_{i=1}^p α_i²/(λ_i + k)² = γ₂(k),

completing the proof.
The MSE of β̂_R(k) consists of three parts: γ₁(k), γ₂(k) and γ₃(k). γ₁(k) can be considered to be the total variance of the parameter estimates and is a monotonically decreasing function of k; γ₂(k) is a monotonically increasing function of k; while γ₃(k) is related to the autocorrelation in the error terms. Hoerl and Kennard claim that when (X^T X) is ill-conditioned it is possible to reduce the MSE substantially by taking a little bias, that is, choosing k > 0: γ₁(k) drops sharply while γ₂(k) will only increase slightly as k increases [11:p.60-61]. After incorporating autocorrelation in the context of ridge regression, their assertion will still be true only if certain conditions are satisfied. From our analysis, the effects of multicollinearity and autocorrelation are the following.

(i) If the error terms and the weak components are also positively autocorrelated, then the ridge method will be even more desirable than OLS. This is because the drop in γ₁(k) + γ₃(k) is large while the increase in γ₂(k) is relatively small on moving to k > 0.

(ii) If the principal components are not autocorrelated, the situation is essentially the same as in the uncorrelated case.

(iii) Since ridge regression is similar to shrinking the model by dropping principal components of small importance, (5.6) gives a theoretical justification to shrink the model if both the last components and the error terms are positively autocorrelated, for the sake of estimation stability.
5.3

Hoerl and Kennard derived a condition on k such that ridge regression gives better parameter estimates than OLS in terms of MSE: namely, when k is smaller than σ_u²/α²_max, where α_max is the largest regression coefficient in magnitude. When autocorrelation is present, the conditions on k such that ridge regression will perform better than OLS regression are described below.

Consider the derivatives of γ₁(k) and γ₂(k):

    dγ₁/dk = −(2σ_u²/(1 − ρ_e²)) Σ_{i=1}^p λ_i/(λ_i + k)³,

    dγ₂/dk = 2k Σ_{i=1}^p λ_i α_i²/(λ_i + k)³.                      (5.10)

When (X^T X) approaches singularity, which implies that λ_p → 0, the values of the first two derivatives in the neighborhood of the origin are given by

    (dγ₁/dk) = −∞,    (dγ₂/dk) = 0.

As k increases from zero, a huge drop in γ₁ with only a slight increase in γ₂ may therefore be expected, while γ₃ may increase or decrease at various rates as k increases. The use of ridge regression is thus most favourable when there is severe multicollinearity and the error terms are positively autocorrelated. We now formalize and present a condition on k such that ridge regression will be better than OLS regression in the MSE criterion.
Let

    F(k) = E(L₁²) − E[L₂(k)²]
         = (σ_u²/(1 − ρ_e²)) Σ_{i=1}^p [ 1/λ_i − λ_i/(λ_i + k)² ]
           + (2σ_u²/(1 − ρ_e²)) Σ_{i=1}^p [ 1/λ_i² − 1/(λ_i + k)² ] Σ_{j=1}^{n−1} ρ_e^j Σ_{t=1}^{n−j} x*_{ti} x*_{t+j,i}
           − k² Σ_{i=1}^p α_i²/(λ_i + k)².                           (5.12)

Then

    dF/dk = (2σ_u²/(1 − ρ_e²)) Σ_{i=1}^p [ λ_i + 2 Σ_{j=1}^{n−1} ρ_e^j Σ_{t=1}^{n−j} x*_{ti} x*_{t+j,i} ] / (λ_i + k)³
            − 2k Σ_{i=1}^p λ_i α_i²/(λ_i + k)³.                      (5.13)

In the neighborhood of the origin, moving towards k > 0, from (5.12) and (5.13) we may obtain a condition for (dF/dk) > 0.

Theorem 5.1. If

    k < (σ_u²/(1 − ρ_e²)) min_i { 1 + (2/λ_i) Σ_{j=1}^{n−1} ρ_e^j Σ_{t=1}^{n−j} x*_{ti} x*_{t+j,i} } / α²_max    (5.14)

then (dF/dk) > 0. In other words, over the range (5.14) we may expect F(k) to increase as k increases, so that the OLS estimates have higher MSE than the ridge estimates.

When ρ_e = 0, (5.14) reduces to the Hoerl-Kennard condition k < σ_u²/α²_max. However, (5.14) is just a sufficient, not a necessary, condition on k for E(L₁²) to be greater than E[L₂(k)²], since F(k) is increasing in k over the range shown by (5.14) but may remain positive beyond it. In practice the quantities in (5.14) are unknown: if either ρ_e or α is unknown, it must be estimated from the data gathered, for example by conducting a principal component analysis and estimating the parameters.
5.4

Hoerl and Kennard proposed the ridge trace as a diagnostic tool to select a single value of k and a unique estimate of β in practice [11:p.65]. The trace is a plot of the parameter estimates as k varies; it shows how singularity is causing instability in the system, including over/under-estimations and incorrect signs. Therefore, instead of suppressing dimensions either by deleting collinear variables or dropping principal components of small importance, one may read off the value of k at which the estimates stabilize and adopt the corresponding ridge estimate. In connection with autocorrelation, a ridge estimator adjusted for the error structure will be of great help in getting better point estimates and thereby better predictions, even when ρ_e is negative or the principal components are not positively autocorrelated.
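Hoerl and Kennard's ridge trace, the plot of the estimates against k, can be sketched in a few lines. One property that always holds, and that the sketch checks, is that the squared length of the coefficient vector shrinks monotonically as k grows (the data below are invented and deliberately ill-conditioned):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 30
x1 = rng.standard_normal(n)
X = np.column_stack([x1, x1 + 0.05 * rng.standard_normal(n)])  # nearly collinear
X = (X - X.mean(0)) / (X.std(0) * np.sqrt(n))                  # correlation form
y = rng.standard_normal(n)

lengths = []
for k in [0.0, 0.01, 0.05, 0.1, 0.5]:
    b = np.linalg.solve(X.T @ X + k * np.eye(2), X.T @ y)
    lengths.append(float(b @ b))
    print(k, b.round(3))
```

The rapid movement of the coefficients near k = 0, followed by stabilization, is exactly the behavior the trace is meant to reveal.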
6. RIDGE REGRESSION: ESTIMATION AND PREDICTION

In this section we derive a ridge estimator for the ALR model, beginning with a review of the derivation for the CLR model. Since β̂_OLS is unbiased,

    E(L₁²) = E(β̂_OLS^T β̂_OLS) − β^T β,                              (6.1)

so on the average the OLS vector is too long when its MSE is large. For any estimate B of β, the residual sum of squares is

    φ(B) = (Y − XB)^T (Y − XB)
         = (Y − Xβ̂_OLS)^T (Y − Xβ̂_OLS) + (B − β̂_OLS)^T X^T X (B − β̂_OLS)     (6.2)
         = φ_min + φ₀(B).

The ridge estimate is derived by choosing a B to

    Minimize  B^T B
    Subject to  (B − β̂_OLS)^T X^T X (B − β̂_OLS) = φ₀,               (6.3)

where (1/k) is the multiplier corresponding to the constraint (6.3). The problem is to minimize

    F = B^T B + (1/k)[(B − β̂_OLS)^T X^T X (B − β̂_OLS) − φ₀].         (6.4)

A necessary condition for B to minimize (6.4) is that

    ∂F/∂B = 2B + (1/k)[2(X^T X)B − 2(X^T X)β̂_OLS] = 0.

Hence

    B* = β̂_R(k) = (X^T X + kI)^{-1} X^T Y,

where k is chosen to satisfy constraint (6.3). In practice, we usually work the other way around: k is chosen first, and the implied increment φ₀ in the residual sum of squares is then accepted.
It is clear that for a fixed increment φ₀, there is a continuum of values of B that will satisfy the relationship φ = φ_min + φ₀, and the ridge estimate so derived is the one with the minimum length. Therefore, we may well expect the ridge estimates to yield less MSE than the OLS estimates. It is true to a large extent that minimizing the length of the regression vector is conducive to reducing the MSE of the parameter estimates. In addition, ridge regression stabilizes the estimates without an appreciable increase in the residual sum of squares, even as (X^T X) is approaching singularity.
In 1971, the square-error loss was adopted as evaluation criterion to evaluate proposals of this kind, and from this standpoint we may see the connection between the constrained derivation above and the MSE criterion. In the rotated coordinates, for any estimate A of α,

    φ₀ = (A − α̂_OLS)^T Λ (A − α̂_OLS),

and the problem is

    Minimize  A^T A
    Subject to  (A − α̂_OLS)^T Λ (A − α̂_OLS) = φ₀                    (6.5)

or, equivalently,

    Subject to  Σ_{i=1}^p λ_i (A_i − α̂_OLS,i)² = φ₀,                 (6.6)

where α̂_OLS,i is the OLS estimate of the coefficient of the i-th principal component. (6.6) shows that the constraint has incorporated the concept of a square-error-loss function as well: it pays most to shrink the estimates of the coefficients for those components that have small eigenvalues, i.e. the ones most subject to instability.
6.2

The derivation of the new estimator, β̂_GR(k), will parallel the derivation above. Again let B be any estimate of β. In the transformed model (2.12), the residual sum of squares equals the value of the minimum sum of squares, φ_min, plus the distance from B to β̂_GLS weighted through (X^T Ω^{-1} X):

    φ(B) = (Q^{-1}Y − Q^{-1}XB)^T (Q^{-1}Y − Q^{-1}XB)
         = (Q^{-1}Y − Q^{-1}Xβ̂_GLS)^T (Q^{-1}Y − Q^{-1}Xβ̂_GLS)
           + (B − β̂_GLS)^T X^T Ω^{-1} X (B − β̂_GLS).

We have φ(B) = φ_min + φ₀(B). For a specific value of the increment φ₀, the new ridge estimate is derived by choosing B to minimize B^T B subject to

    (B − β̂_GLS)^T X^T Ω^{-1} X (B − β̂_GLS) = φ₀.                     (6.7)

The Lagrangian is given by

    F = B^T B + (1/k)[(B − β̂_GLS)^T X^T Ω^{-1} X (B − β̂_GLS) − φ₀].

A necessary condition for a minimum is that

    2B + (1/k)[2(X^T Ω^{-1} X)B − 2(X^T Ω^{-1} X)β̂_GLS] = 0.

This reduces to

    B* = β̂_GR(k) = (X^T Ω^{-1} X + kI)^{-1} X^T Ω^{-1} Y,            (6.8)

where k is chosen to satisfy (6.7). The characterization of the ridge estimate carries over: for a specific increment φ₀, the β̂_GR(k) so derived is the regression vector of minimum length among all estimates B that satisfy the relationship φ = φ_min + φ₀.
It should be noted that in some rare cases multicollinearity may no longer be a substantial problem after transforming X into Q^{-1}X; for instance, in studies where multicollinearity is a result of the explanatory variables increasing together over time rather than being intrinsically collinear with each other, the transformation may remove most of the collinearity. In most cases, if not all, however, the matrix (X^T Ω^{-1} X) is very likely to have a broad eigenvalue spectrum if (X^T X) does. Since the new estimator rests on the motivation of minimizing the length of the regression vector, the interpretation and implications of the constraint in the derivation of the ridge estimator of β for a CLR model will be applicable to the derivation of β̂_GR(k).
6.3 MSE of the Estimators

Let L₃ = Distance from β̂_GLS to β. Then the previous discussion applied to the transformed model, together with (5.3), gives the MSE of β̂_GLS as follows:

    E(L₃²) = σ_u² tr[(X^T Ω^{-1} X)^{-1}].                            (6.9)

Let L₄(k) = Distance from β̂_GR(k) to β. Setting ρ_e = 0 in (5.8) and applying the argument of Proposition 5.3 to the transformed model,

    E[L₄(k)²] = σ_u² tr[(X^T Ω^{-1} X + kI)^{-2}(X^T Ω^{-1} X)]
                + k² β^T (X^T Ω^{-1} X + kI)^{-2} β.                  (6.10)

(6.10) gives the MSE of β̂_GR(k). The effect of autocorrelation is difficult to infer from (6.9) and (6.10), since Ω is not a diagonal matrix; however, normally we may expect E(L₃²) to be smaller than E(L₁²), and (6.10) to behave in k much as in the uncorrelated case.
6.4 Estimation

Theoretically, the GLS gives the BLUE of β for an ALR model. In practice ρ_e is unknown, but the iterative two-stage procedure is actually quite straightforward once an estimate of ρ_e is incorporated. We illustrate how ridge regression can be combined with the two-stage procedure to obtain better estimates of β in a model with only two collinear explanatory variables:

    Y_t = β₀ + β₁ X_t1 + β₂ X_t2 + e_t,    e_t = ρ_e e_{t−1} + u_t,    t = 1, 2, ..., n,    (6.11)

where u_t satisfies, for all t,

    E(u_t) = 0,
    E(u_t u_{t+s}) = σ_u²   for s = 0,
                   = 0      for s ≠ 0.

The transformed relation is given by

    Y_t − ρ_e Y_{t−1} = β₀(1 − ρ_e) + β₁(X_t1 − ρ_e X_{t−1,1})
                        + β₂(X_t2 − ρ_e X_{t−1,2}) + u_t.             (6.12)

Ridge regression is then run on (6.12), regressing (Y_t − ρ_e Y_{t−1}) on (X_t1 − ρ_e X_{t−1,1}) and (X_t2 − ρ_e X_{t−1,2}). The coefficient estimate of (X_ti − ρ_e X_{t−1,i}) is our approximation of β̂_GR,i(k), and the intercept term divided by (1 − ρ_e) is our approximation of β̂_GR,0(k). As stated above, this combination improves estimation stability for the collinear variable set, and the correlation form of the transformed (X^T X) may be satisfactory.
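The two-stage procedure around (6.12) can be sketched as follows. Everything here is illustrative: the true coefficients, the value of k, and the choice to leave the intercept unpenalized are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(8)
n, rho, k = 60, 0.7, 0.1

# generate (6.11): two collinear regressors and AR(1) errors
x1 = rng.standard_normal(n)
x2 = x1 + 0.1 * rng.standard_normal(n)
e = np.empty(n)
e[0] = rng.standard_normal() / np.sqrt(1 - rho ** 2)
for t in range(1, n):
    e[t] = rho * e[t - 1] + rng.standard_normal()
y = 1.0 + 2.0 * x1 + 0.5 * x2 + e

# stage 1: OLS residuals give an estimate of rho_e
X = np.column_stack([np.ones(n), x1, x2])
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
rho_hat = resid[1:] @ resid[:-1] / (resid[:-1] @ resid[:-1])

# stage 2: ridge regression on the transformed relation (6.12)
yt = y[1:] - rho_hat * y[:-1]
Xt = np.column_stack([np.ones(n - 1),
                      x1[1:] - rho_hat * x1[:-1],
                      x2[1:] - rho_hat * x2[:-1]])
K = np.diag([0.0, k, k])                 # leave the intercept unpenalized
b = np.linalg.solve(Xt.T @ Xt + K, Xt.T @ yt)
beta0_hat = b[0] / (1 - rho_hat)         # intercept recovered as in the text

print(round(rho_hat, 2), round(beta0_hat, 2), b[1:].round(2))
```

Because x1 and x2 are nearly collinear, only the sum of their coefficients is well determined; the ridge term stabilizes the split between them.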
6.5 Prediction

Consider a first-order ALR model; (2.15) gives the minimum mean square error predictor. In practice, both β and ρ_e are unknown and must be replaced by estimated values:

Ŷ_{t+1} = X_{t+1} β̂_GR(k) + ρ̂_e ê_t,

where X_{t+1} is the (t+1)-st observation on the explanatory variables, β̂_GR(k) is the vector of estimated regression coefficients, ρ̂_e is an estimate of the autocorrelation coefficient, and ê_t is the residual at time t. ρ̂_e may be obtained as

ρ̂_e = Σ_{t=2}ⁿ ê_t ê_{t-1} / Σ_{t=2}ⁿ ê_{t-1}²,

or by minimizing the residual sum of squares with respect to β₀, β₁, β₂ and ρ_e jointly. The accuracy of the prediction depends not only on the estimated parameters and the true error terms but also on the order of the autocorrelation structure, which is unknown.
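The one-step-ahead predictor above can be sketched as follows (the function name is our own; the coefficients are taken as given for illustration):

```python
import numpy as np

def predict_next(x_next, b0, b, rho_hat, last_resid):
    """Y_hat_{t+1} = b0 + x_{t+1}'b + rho_hat * e_hat_t:
    the systematic part plus the predictable part of the AR(1) error."""
    return b0 + x_next @ b + rho_hat * last_resid

# Example with known coefficients: the last residual carries over at rate rho.
b0, b, rho_hat = 5.0, np.array([1.1, 1.0]), 0.9
x_next = np.array([10.0, 8.0])
last_resid = 2.0
y_hat = predict_next(x_next, b0, b, rho_hat, last_resid)
# Systematic part: 5 + 11 + 8 = 24; carried-over error: 0.9 * 2 = 1.8.
assert np.isclose(y_hat, 25.8)
```

The term ρ̂_e ê_t is what distinguishes this predictor from the usual regression forecast: under first-order autocorrelation, part of the current error is predictable one step ahead.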
7.1 Design of the Experiments

The sampling experiments examine the estimators under different joint degrees of multicollinearity and autocorrelation. Basically, the degree of multicollinearity is controlled by r₁₂, the correlation coefficient of the two explanatory variables, and the degree of autocorrelation by ρ_e. The nine experiments are summarized in Table 1.

Table 1

Experiment    r₁₂    ρ_e
1             .05    .05
2             .05    .50
3             .05    .90
4             .50    .05
5             .50    .50
6             .50    .90
7             .95    .05
8             .95    .50
9             .95    .90

The explanatory variables are normally considered to be seriously collinear when r₁₂ is as high as 0.8 or 0.9. In addition, the error terms are normally considered to be independent, moderately and highly autocorrelated when ρ_e = .05, .50 and .90 respectively.
Three series of e_t are generated according to (2.9), each with a different value of ρ_e. We generate the series of X_t1 and X_t2 that are suitable for the first three experiments; by varying the correlation coefficient of X_t1 and X_t2, another two series of X_t2 are generated for the remaining six experiments. We have also assured that there is no significant first-order autocorrelation in X_t1 and X_t2, so that the error structure is first-order and the only source of significant autocorrelation is in the error terms. Solving for Y_t based on the data, nine different sets of forty observations are generated. The observations from t = 1 to t = 30 are employed to estimate the equation by appropriate methods; the Y_t's for the remaining periods are reserved for examining the prediction properties of the estimators. Special care has to be exercised in controlling the serial correlation properties of the error terms.
In this connection, an OLS regression has to be run on

(7.2)    e_jt = ρ_e e_{j,t-1} + u_jt,    j = 1,2,...,10;  t = 1,2,...,40.

However, as is well known, the OLS estimates of the parameters for small samples may be unreliable [23]: E(û_jt û_{j,t+s}) will no longer equal 0 for s ≠ 0, and the estimated coefficient of e_{j,t-1} is biased. The usual t test on the estimate of the regression coefficient may be quite misleading; therefore we can only ascertain that the desired serial correlation properties are obtained by assuring that the û_jt are randomly distributed. We first test whether the series of u_t is consistent with the assumed probability structure, and then test whether it is randomly distributed. Only those series of u_t that pass both tests are retained.

We are now ready to estimate the regression equation. First, the equation is estimated by OLS. The Durbin-Watson statistic is used as a filter to test the existence of autocorrelation; whenever the Durbin-Watson statistic indicates autocorrelation, the appropriate two-stage method is used for estimation. In addition, whenever the existence of autocorrelation is recognized and ρ_e is known, β̂_GLS and β̂_GR will simply be the straightforward multiplication of matrices as shown by (2.13) and (6.8).
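The error-series construction and the randomness screen described above can be sketched as follows. The Durbin-Watson acceptance band used here is an illustrative choice of our own, not the thesis's actual cutoff:

```python
import numpy as np

def durbin_watson(resid):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2); d is near 2 for a random series."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def gen_ar1_series(rho, n, rng, lo=1.5, hi=2.5, max_tries=200):
    """Draw e_t = rho*e_{t-1} + u_t, keeping only draws whose innovations u_t
    pass a Durbin-Watson randomness screen (d within [lo, hi])."""
    for _ in range(max_tries):
        u = rng.normal(size=n)
        if not (lo < durbin_watson(u) < hi):
            continue                      # innovations not serially random enough
        e = np.zeros(n)
        for t in range(1, n):
            e[t] = rho * e[t - 1] + u[t]
        return e
    raise RuntimeError("no acceptable series found")

rng = np.random.default_rng(2)
e = gen_ar1_series(rho=0.9, n=40, rng=rng)
# The resulting errors are strongly autocorrelated (d well below 2),
# while their innovations were screened to be serially random.
```

Screening the innovations rather than the errors is the point of the exercise: the desired serial correlation of e_t is then governed by ρ alone.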
Table 2

[Estimation methods applied in each experiment; method abbreviations:]

OLS:       Ordinary Least-squares
RR:        Ridge Regression
Durb.:     Durbin's Two-step Method
Durb.+RR:  Durbin's Two-step in conjunction with Ridge Regression
GLS:       Generalized Least-squares
GR:        Ridge Regression adjusted for Autocorrelation

As is expected, no correction for autocorrelation is necessary for experiments 1, 4 and 7. In order to minimize the computational burden, the value of k is selected to minimize the MSE averaged over the ten samples of each experiment. That is to say, a unique value of k is selected that will generally be the best for all ten samples. Obviously, no single value of k can be the best for every sample; therefore, the minimum of the MSE of the ridge estimates of β achieved for each experiment is slightly upward biased.
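The selection rule for k, one common value per experiment chosen to minimize the MSE averaged over the samples, can be sketched as follows (the grid and the toy estimates are illustrative, not the experimental data):

```python
import numpy as np

def best_common_k(beta_true, estimates_by_k):
    """estimates_by_k: {k: list of coefficient estimates, one per sample}.
    Return the single k minimizing the MSE averaged over all samples,
    together with the average-MSE table."""
    avg_mse = {
        k: np.mean([np.sum((b - beta_true) ** 2) for b in ests])
        for k, ests in estimates_by_k.items()
    }
    return min(avg_mse, key=avg_mse.get), avg_mse

# Toy illustration: at k = 0 the estimates are unbiased but very noisy;
# at k = 0.05 they are slightly shrunk but far less variable.
beta = np.array([1.1, 1.0])
ests = {
    0.0:  [beta + d for d in ([1.0, -1.2], [-0.9, 1.1], [1.2, -1.0])],
    0.05: [0.95 * beta + d for d in ([0.1, -0.1], [-0.1, 0.1], [0.1, 0.1])],
}
k_star, avg = best_common_k(beta, ests)
assert k_star == 0.05
```

Because the same k must serve every sample, the reported minimum average MSE sits slightly above what per-sample tuning of k would achieve, which is the upward bias noted above.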
7.2 Sampling Results

For each experiment, the average MSE of the estimates of β over the ten samples is computed for each method and each value of k; the average R̄ₐ² and Durbin-Watson statistics are also computed. In addition, we report the bias of ρ̂_e and the mean Haitovsky heuristic statistic. u_t is assumed to follow a normal distribution. The mean of X_t1 is 10 and that of X_t2 is 8; the respective variances of X_t1 and X_t2 are 18 and 15. The [...] is chosen to be 3 for each sample. The true value of β₀ is 5, β₁ is 1.1 and β₂ is 1.

7.2a Results assuming ρ_e is Known

First we assume ρ_e is known. The results here will indicate the performance of the estimators in the best of situations. Table 3 contains the average MSE of β̂_GLS and β̂_GR(k).
Table 3
Average MSE of β̂_GR(k); k = 0.0 corresponds to β̂_GLS. (—: value illegible in source.)

           Exp. 2     Exp. 3     Exp. 5     Exp. 6     Exp. 8     Exp. 9
   k      (ρ=.50)    (ρ=.90)    (ρ=.50)    (ρ=.90)    (ρ=.50)    (ρ=.90)
   0.0     .4824     3.2913      .0834     2.3940      .1011     2.1681
   .025    .0591     1.8084      .0018     1.4991        —          —
   .05     .0363      .8073      .1215      .8337      .0561      .9405
   .075    .3603      .2274      .3033      .4263        —          —
   .1      .9849      .0156      .8871      .1092      .4929      .2901
   .2     5.7786     2.0100     4.0851      .6153     2.4426      .4242
   .3     2.9771     7.0896     9.3743     4.0521     5.4924     1.0113
   .5    30.1494    21.7887    21.1581    11.4165    13.7628     4.8285
   .7    47.5367    36.6241    35.2063    22.2489    23.7003    10.9524
   1.0   75.6732    63.4542    56.5869    39.9762    38.5830       —
Though the GLS regression yields the BLUE of β, the behavior of the MSE of β̂_GR relative to that of β̂_GLS is very difficult to infer from (6.10). For a given degree of multicollinearity, the MSE of β̂_GLS increases as autocorrelation increases, and it is larger if multicollinearity is accompanied by high autocorrelation, since the variable set remains ill-conditioned after the data are adjusted for autocorrelation. Moreover, Table 3 shows that there exists at least one value of k for each experiment such that the MSE of β̂_GR is less than that of β̂_GLS, and that this holds over a range of k; the improvement is most pronounced for the estimates of β in experiment 9. Predicting the behavior of the MSE of β̂_GR is very difficult, if not impossible: (6.10) shows that the MSE of β̂_GR is comprised of a variance term and a squared-bias term, and how the two terms move jointly with the degrees of multicollinearity and autocorrelation is hard to infer.

7.2b Results assuming ρ_e is unknown

In practice the autocorrelation coefficient ρ_e is unknown and we must estimate it.
Table 4
(r₁₂ = .05)

          Exp. 1 (ρ=.05)     Exp. 2 (ρ=.50)     Exp. 3 (ρ=.90)
   k       R̄ₐ²*     d**       R̄ₐ²      d          R̄ₐ²      d
   0.0    .8640   2.0791     .8979   1.8766      .9149   1.8380
   .025   .8640   2.0903     .8884   1.8713      .9144   1.8269
   .05    .8620   2.1001     .8868   1.8728      .9128   1.8286
   .075   .8579   2.1083     .8844   1.8805      .9104   1.8409
   .1     .8567   2.1154     .8813   1.8916      .9071   1.8613
   .2     .8401   2.1353     .8619   1.9619      .8888   1.9809
   .3     .8167   2.1463     .8399   2.0383      .8648   2.1030
   .5     .7651   2.1536     .7865   2.1563      .8102   2.2789
   1.0    .6394   2.1527     .6571   2.2960      .6780   2.4740

* R̄ₐ²: the average adjusted coefficient of determination.
** d: the average Durbin-Watson statistic.
Table 5
(r₁₂ = .50)

          Exp. 4 (ρ=.05)     Exp. 5 (ρ=.50)     Exp. 6 (ρ=.90)
   k       R̄ₐ²      d         R̄ₐ²      d          R̄ₐ²      d
   0.0    .8973   2.0984     .9178   1.8958      .9475   2.0754
   .025   .8970   2.1040     .9175   1.9054      .9472   2.0865
   .05    .8962   2.1095     .9168   1.9194      .9464   2.1059
   .075   .8950   2.1147     .9155   1.9371      .9451   2.1317
   .1     .8933   2.1198     .9138   1.9576      .9434   2.1622
   .2     .8832   2.1381     .9032   2.0540      .9336   2.3024
   .3     .8702   2.1592     .8895   2.1504      .9184   2.4532
   .5     .8351   2.1721     .8546   2.2988      .8834   2.6086
   .7     .7970   2.1826     .8187   2.4203      .8440   2.7084
   1.0    .7412   2.1900     .7572   2.4732      .7846   2.7887
Table 6
(r₁₂ = .95)

          Exp. 7 (ρ=.05)     Exp. 8 (ρ=.50)     Exp. 9 (ρ=.90)
   k       R̄ₐ²      d         R̄ₐ²      d          R̄ₐ²      d
   0.0    .9208   2.0511     .9391   1.8785      .9575   1.8707
   .05    .9201   2.0873     .9380   1.9058      .9565   1.8883
   .1     .9181   2.1128     .9359   1.9437      .9544   1.9335
   .2     .9116   2.1473     .9293   2.0280      .9677   2.0562
   .3     .9024   2.1659     .9199   2.1084      .9382   2.1660
   .5     .8786   2.1768     .8957   2.2312      .9136   2.8381
   .7     .8505   2.1725     .8673   2.3082      .8847   2.4417
   1.0    .8056   2.1594     .8217   2.3739      .8399   2.5259
The average R̄ₐ² increases as the degree of autocorrelation increases for a given value of k and a given degree of multicollinearity; this is intuitively plausible. R̄ₐ² decreases as k increases for all estimators. Besides, the best R̄ₐ² achieved for each experiment is pretty high; that is, the model is reasonably well fitted, and the d statistic indicates no remaining autocorrelation. Since the model is reasonably well fitted, simulation comparisons of the experimental results should be meaningful as well as informative.

The average MSE of the estimates of β is computed for each method for each experiment and is reported in Table 7.
Table 7
Average MSE of the estimates of β, by experiment. (—: value illegible in source.)

   k      Exp. 1   Exp. 2   Exp. 3   Exp. 4   Exp. 5   Exp. 6   Exp. 7   Exp. 8   Exp. 9
   0.0     .1101    .4824    .9594    .0030    .0342    .3996    .0180    .0720    .6951
   .025    .0192    .0570    .2691    .1104    .0210    .0945      —        —        —
   .05     .2865    .0390    .0087    .4158    .1965    .0024    .1833    .0719    .0939
   .075    .8820    .3744    .1200    .8973    .5778    .1026      —        —        —
   .1     1.7430   1.0167    .5559   1.5366   1.1097    .3765    .8307    .5643    .0041
   .2     7.3539   5.9115   4.7694   5.3823   4.5690   2.7540   3.2001   1.2549    .5822
   .3    15.1383  13.2456  11.5962  10.8432   9.5949   6.8219   6.6531   5.7624   3.3012
   .5    33.5067  31.1559  28.8369  23.9694  22.3449  18.6255  15.6804  14.2377  10.0251
   .7    38.6334  36.9396  32.0214  26.3049  24.3789  18.5115     —        —        —
   1.0   79.3134  76.8708  73.8630  60.7620  50.3176  52.8276  43.3464  40.2351  32.3991
However, the behavior when ρ_e must be estimated is somewhat different: the average MSE first decreases and then increases as the degree of multicollinearity increases for k = 0 and a given degree of autocorrelation. Table 7 shows that, except for experiments 4 and 7, better estimates of β in the MSE criterion can be obtained if Durbin's two-step method is used in conjunction with ridge regression for estimation. We are thus able to obtain better estimates of β in terms of MSE even if the true autocorrelation coefficient ρ_e is unknown. For clarity, we shall compare these results with the ρ_e known case: the minimum MSE of the ridge estimates of β achieved for each experiment, in both the ρ_e known and unknown cases, is summarized in Table 8 together with the estimation method and the value of k of each experiment.
Table 8
Minimum MSE of the estimates of β. (—: value illegible in source.)

                       (ρ_e unknown)                  (ρ_e known)
Experiment (r₁₂, ρ_e)   Method      k     Min. MSE     k     Min. MSE of β̂_GR
(.05, .05)             RR         .025    .0192        —        —
(.05, .50)             Durb.+RR   .05     .0390       .05      .0363
(.05, .90)             Durb.+RR   .05     .0087       .1       .0156
(.50, .05)             OLS        0.0     .0030        —        —
(.50, .50)             Durb.+RR   .025    .0210       .025     .0018
(.50, .90)             Durb.+RR   .05     .0024       .1       .1092
(.95, .05)             OLS        0.0     .0180        —        —
(.95, .50)             Durb.+RR   .05     .0719       .05      .0561
(.95, .90)             Durb.+RR   .1      .0441       .1       .2901
First, for a given degree of multicollinearity, the average MSE of the parameter estimates will first increase then decrease as the degree of autocorrelation increases; for a given degree of autocorrelation, it will first decrease then increase as the degree of collinearity increases. These patterns are intuitively plausible, since sufficient collinearity usually results in very unstable parameter estimates. Secondly, we observe that the minimum MSE does not deteriorate under the joint presence of multicollinearity and autocorrelation; this is consistent with our analytic findings.
Table 9

Experiment   Mean ρ̂_e   Bias in ρ̂_e   Mean Haitovsky χ² (df = 3)
1               —           —             125.7
2             .3581       .1419           123.1
3             .7182       .1818           111.7
4               —           —              38.7
5             .3586       .1414            37.4
6             .7231       .1769            39.1
7               —           —               2.78
8             .3849       .1151             2.53
9             .7498       .1502             2.40

The Haitovsky statistic correctly signals a serious degree of multicollinearity in experiments 7, 8 and 9. However, it does not give any warning when there exists a fairly high degree of multicollinearity; i.e., based on the Haitovsky test, multicollinearity is insignificant in experiments 4, 5 and 6.
Forecasting

Tables 10, 11 and 12 report the average residual sums of squares and the mean square error of prediction from the given values for the forecast period of each experiment, under the assumption that ρ_e is unknown.
Table 10
(r₁₂ = .05, σ_u² = 6)

          Experiment 1          Experiment 2          Experiment 3
   k       σ̄²*     MSE_F/C**    σ̄²      MSE_F/C      σ̄²       MSE_F/C
   0.0    5.9700    8.1132     5.7055    9.1966      5.6351   10.939
   .025   5.9924    8.0343     5.7332    9.6623      5.6721   10.952
   .05    6.0554    7.9961     5.8113    9.0620      5.7762   11.034
   .075   6.1536    7.9986     5.9328    9.1072      5.9382   11.173
   .1     6.2824    8.0343     6.0918    9.1913      6.1501   11.360
   .2     7.0250    8.4293     7.0074    9.8181      7.3716   12.470
   .3     8.0022    9.0877     8.2087   10.739       8.9754   13.932
   .5    10.245    10.755     10.957    12.944      12.6521   17.249
   1.0   15.733    15.075     17.656    18.419      21.649    25.164

* σ̄²: the average of the residual sums of squares over ten samples.
** MSE_F/C: the mean square error of prediction over the forecast period.
Table 11
(r₁₂ = .50, σ_u² = 6)

          Experiment 4          Experiment 5          Experiment 6
   k       σ̄²      MSE_F/C      σ̄²      MSE_F/C      σ̄²      MSE_F/C
   0.0    6.0169    8.2093     5.8625    9.3838      5.7757   10.733
   .025   6.0331    8.1743     5.8828    9.3691      5.8038   10.731
   .05    6.0797    8.1690     5.9409    9.3874      5.8838   10.672
   .075   6.1541    8.1905     6.0330    9.4351      6.0109   10.850
   .1     6.2518    8.2360     6.1559    9.5093      6.1803   10.961
   .2     6.8444    8.6142     6.8849   10.021       7.1976   11.679
   .3     7.2030    9.2476     7.9231   10.783       8.6992   12.482
   .5     9.7190   10.851     10.470    12.735      12.128    15.327
   .7    11.933    12.726     13.070    14.851      15.759    18.018
   1.0   15.429    15.620     17.566    18.301      21.929    22.680
Table 12
(r₁₂ = .95, σ_u² = 6)

          Experiment 7          Experiment 8          Experiment 9
   k       σ̄²      MSE_F/C      σ̄²      MSE_F/C      σ̄²      MSE_F/C
   0.0    6.0443    8.3699     5.6725    9.8589      5.4033   11.335
   .05    6.1186    8.1165     5.7759    9.4141      5.5390   10.758
   .1     6.2753    8.1220     6.0473    9.3568      5.8110   10.754
   .2     6.7890    8.4120     6.6160    9.5945      6.7433   11.172
   .3     7.5180    8.9408     7.5161   10.116       7.9187   12.001
   .5     9.4101    9.8466    10.445    11.683      11.120    14.301
   .7    11.643    12.290     12.584    13.651      14.881    17.123
   1.0   15.198    15.343     16.973    16.918      20.713    21.591
The average residual sum of squares and the MSE of prediction still increase as the degree of autocorrelation increases. Collinearity will adversely affect prediction when the disturbances are highly serially correlated: the commonly held belief that prediction is little affected by the existence of multicollinearity is only true if the problem of autocorrelation is not serious. Fortunately, Durbin's method gives satisfactory results on various diagnostic tests, and we are able to perform well on those tests even in the joint presence of multicollinearity and autocorrelation. We also observed that the value of k giving the best estimates of β in the MSE criterion tends to yield less MSE of prediction, though it is not always the value of k that yields the best MSE of prediction for each experiment; hence we may still use the MSE criterion in the evaluation of the parameter estimates. To avoid confusion, we have not reported the MSE of prediction based on β̂_GLS and the true ρ_e obtained with ridge regression. In general, ridge regression gives better estimates of β, but it underestimates ρ_e.
CONCLUSIONS

It has been shown that in the presence of multicollinearity with sufficiently high degrees of autocorrelation, the OLS estimates of the regression coefficients can be highly inaccurate, and improving the estimation procedure is obviously necessary. Combining generalized least-squares with ordinary ridge regression, we derived a new estimator,

β̂_GR(k) = (X'Ω⁻¹X + kI)⁻¹X'Ω⁻¹Y.

β̂_GR(k), though biased, adjusts simultaneously for multicollinearity and autocorrelation. However, since Ω is unknown, parameter estimates based on the biased estimator β̂_GR(k) cannot be obtained directly in practice. The effectiveness of our estimators was therefore examined by simulation.

The simulations suggest that the optimal value of k is inversely proportioned to the degree of multicollinearity for a sufficiently high degree of autocorrelation; this agrees with the conventional ridge regression findings. However, since in practice neither the regression vector nor the autocorrelation coefficient is known, no "optimal" estimates can possibly be obtained. We were pleased to find that Durbin's two-step technique (with ρ_e estimated) in conjunction with ridge regression gives better estimates of β in the MSE criterion, and that the GLS gives the minimal MSE of prediction in all the cases. Though the value of k giving better estimates of β tends to yield less MSE of prediction, it still does not give the minimal MSE of prediction in all the cases. The Durbin-Watson test for detecting the existence of first-order autocorrelation, and the Haitovsky test for detecting the existence of multicollinearity either with or without autocorrelation, remain informative in the presence of autocorrelated error terms.

Empirical research has hitherto been confined to the search for optimal estimation techniques dealing with multicollinearity and autocorrelated errors as separate and independent phenomena. Our results also suggest that it might be possible to find an estimation technique that combines the information on multicollinearity and autocorrelation, and ordinary ridge regression, i.e., adding a constant k on the diagonal of the correlation matrix (X'X), proves to be such a technique. Even though satisfactory estimation and prediction are obtained by ordinary ridge regression adjusted for autocorrelation, there may still exist other, more powerful solutions to the joint multicollinearity and autocorrelation problems. For instance, the combination of the Cochrane-Orcutt procedure with Generalized Ridge regression, a more flexible estimation technique, should lead to better estimation and prediction, since allowing a separate k for each diagonal element gives additional flexibility.
BIBLIOGRAPHY

[1] Cochrane, D. and Orcutt, G. H. (1949). Application of least-squares regressions to relationships containing autocorrelated error terms. J. Am. Statist. Assoc., 44, 32-61.
[2] Durbin, J. (1960). Estimation of parameters in time-series regression models. J. Royal Statist. Soc., Series B, 139-153.
[3] Durbin, J. (1970). Testing for serial correlation in least-squares regression when some of the regressors are lagged dependent variables. Econometrica, 38, 410-421.
[4] Durbin, J. and Watson, G. S. (1950). Testing for serial correlation in least-squares regression (part 1). Biometrika, 37, 409-428.
[5] Durbin, J. and Watson, G. S. (1951). Testing for serial correlation in least-squares regression (part 2). Biometrika, 38, 159-178.
[6] Farrar, D. C. and Glauber, R. R. (1967). Multicollinearity in regression analysis: the problem revisited. Rev. Economics Statistics, 49, 92-107.
[7] Graybill, F. A. (1976). Theory and Application of the Linear Model. Duxbury Press, North Scituate, Mass.
[8] Griliches, Z. and Rao, P. (1969). Small-sample properties of several two-stage regression methods in the context of autocorrelated errors. JASA, 64, 253-272.
[9] Haitovsky, Y. (1969). Multicollinearity in regression analysis: comment. Rev. Economics Statistics, 486-489.
[10] Henshaw, R. C., Jr. (1966). Testing single-equation least-squares regression models for autocorrelated disturbances. Econometrica, 34, 646-660.
[11] Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12, 55-67.
[12] Hoerl, A. E., Kennard, R. W., and Baldwin, K. F. (1975). Ridge regression: some simulations. Comm. Stat., 4, 105-123.
[13] Johnston, J. (1972). Econometric Methods.
[14] Klein, L. R. (1962). An Introduction to Econometrics. Prentice-Hall.
[15]
[16]
[17]
[18]
[19] Smith, V. K. (1973).
[20]
[21]
[22]
[23]