Multicollinearity Vs Autocorrelation

MULTICOLLINEARITY, AUTOCORRELATION, AND RIDGE REGRESSION
by
JACKIE JEN-CHY HSU

B.A. i n Econ., The N a t i o n a l Taiwan U n i v e r s i t y , 1977
A THESIS SUBMITTED IN PARTIAL FULFILMENT OF

THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE fBUSISJESS ADMINISTRATION)

IN:
THE FACULTY OF GRADUATE STUDIES"
THE FACULTY OF COMMERCE AND BUSINESS ADMINISTRATION
We accept
t h i s t h e s i s as conforming
to the r e q u i r e d
standard
THE UNIVERSITY OF BRITISH COLUMBIA

February 1980
c)
J a c k i e Jen-Chy Hsu,
1980
In p r e s e n t i n g
an a d v a n c e d
the
this
degree
Library shall
I further
for
agree
scholarly
by
his
of
this
thesis
in p a r t i a l
fulfilment
at
University
of
the
make
that
thesis
purposes
for
may
avai1able
It
financial
is
of
The U n i v e r s i t y
by
the
understood
gain
shall
British
2 0 7 5 Wesbrook Place
Vancouver, Canada
1W5
Feti.
8,
1980
Columbia
requirements
reference
copying
Head o f
that
not
Commerce & B u s i n e s s A d m i n .
of
for
for extensive
be g r a n t e d
the
B r i t i s h Co 1umbia,
permission.
Department
Date
freely
permission
representatives.
written
V6T
it
of
of
I agree
and
this
be a l l o w e d
or
that
study.
thesis
my D e p a r t m e n t
copying
for
or
publication
without
my
ABSTRACT
The
presence of m u l t i c o l l i n e a r i t y
can induce l a r g e
i n the o r d i n a r y L e a s t - s q u a r e s estimates
of r e p r e s s i o n
coefficients.
I t has been shown t h a t r i d g e r e g r e s s i o n can reduce t h i s

effect
on e s t i m a t i o n .
The presence o f s e r i a l l y
methods, have been proposed to o b t a i n good estimates

i n t h i s case.
adverse
correlated error
terms can a l s o cause s e r i o u s e s t i m a t i o n problems.
coefficients
variances
Various
two-stage
of the r e g r e s s i o n
Although the m u l t i c o l l i n e a r i t y and
a u t o c o r r e l a t i o n problems have l o n g been r e c o g n i z e d
i n regression
a n a l y s i s , they a r e u s u a l l y d e a l t w i t h
This thesis
explores
separately.
the j o i n t e f f e c t s of these two c o n d i t i o n s on the mean
square e r r o r p r o p e r t i e s of the o r d i n a r y r i d g e e s t i m a t o r
as the o r d i n a r y
'least-squares
estimator.
as w e l l
We show t h a t r i d g e
r e g r e s s i o n i s doubly advantageous when m u l t i c o l l i n e a r i t y i s

accompanied by a u t o c o r r e l a t i o n i n b o t h , t h e e r r o r s and the p r i n c i p a l .
components.
adjusted
We then d e r i v e a new r i d g e type e s t i m a t o r
that i s
for autocorrelation.
F i n a l l y , using
of - m u l t i c o l l i n e a r i t y
s i m u l a t i o n experiments w i t h d i f f e r e n t
degrees
and a u t o c o r r e l a t i o n , we compare the mean
square e r r o r p r o p e r t i e s o f v a r i o u s
estimators.
TABLE OF CONTENTS
INTRODUCTION
NOTATION AND PRELIMINARIES
MULTICOLLINEARITY
3.1
Sources
3.2
Effects
3.3
Detection
AUTOCORRELATION
4.1
Sources
4.2
Effects
4.3
Detection
JOINT EFFECTS OF MULTICOLLINEARITY AND AUTOCORRELATION

5.1
Mean Square Error of the OLS Estimates of '
5.2
Mean Square Error of the Ridge Estimates of,3
5.3
When w i l l Ridge estimates be better than the

OLS estimates?
5.4
Use of the "Ridge Trace"
RIDGE REGRESSION:
PREDICTION
ESTIMATES, MEAN SQUARE ERROR AND
6.1
Derivation of Ridge Estimator for a CLR Model
6.2
Derivation of Ridge Estimator for an ALR Model
6.3
Mean Square Error of the "Generalized Estimates"
6.4
Estimation
6.5
Prediction
TABLE OF CONTENTS (cont'd)
THE MONTE CARLO STUDY

7.1
D e s i g n o f t h e Experiments
7.2
Sampling R e s u l t s
7.2a.
R e s u l t s assuming p
i s known
7.2b.
R e s u l t s assuming p
i s unknown
7.2c.
Forecasting
CONCLUSIONS
REFERENCES
INTRODUCTION
M u l t i c o l l i n e a r i t y and A u t o c o r r e l a t i o n a r e two v e r y
in regression analysis.
common problems
As i s well-known, the presence o f some degree
of m u l t i c o l l i n e a r i t y r e s u l t s i n e s t i m a t i o n ,
instability
and model m i s -
s p e c i f i c a t i o n w h i l e the presence o f s e r i a l l y c o r r e l a t e d e r r o r s l e a d s t o
u n d e r e s t i m a t i o n o f the v a r i a n c e s
prediction.
estimation
o f parameter e s t i m a t e s and i n e f f i c i e n t
Because these two c o n d i t i o n s have adverse e f f e c t s on

and p r e d i c t i o n , a wide range o f t e s t s have, been developed t o
reduce t h e i r impact.
I n v a r i a b l y , the m u l t i c o l l i n e a r i t y and auto-
c o r r e l a t i o n problems a r e d e a l t w i t h s e p a r a t e l y
i n most i f not a l l the
preceedings.
In t h i s t h e s i s we address the q u e s t i o n
"What are. the j o i n t e f f e c t s
of m u l t i c o l l i n e a r i t y and a u t o c o r r e l a t i o n on e s t i m a t i o n
Thereafter
we s h a l l study a n a l y t i c a l l y the p o s s i b l e changes i n the
e f f e c t i v e n e s s of various
these two c o n d i t i o n s .
estimator
adjusted
e s t i m a t i o n methods i n the j o i n t presence of
As a r e s u l t o f these new f i n d i n g s , a new r i d g e
f o r a u t o c o r r e l a t i o n i s then proposed and i t s
p r o p e r t i e s a r e i n v e s t i g a t e d by c o n d u c t i n g a s i m u l a t i o n
We b r i e f l y o u t l i n e t h i s t h e s i s .
our
and p r e d i c t i o n ? "
analysis.
Sections
Section 2 provides
3 and 4 g i v e a g e n e r a l
of m u l t i c o l l i n e a r i t y and a u t o c o r r e l a t i o n .
the v a l i d i t y o f v a r i o u s
study.
the s e t t i n g f o r
d i s c u s s i o n of the problems
In a d d i t i o n , we comment on
existing diagnostic tests.
The a n a l y t i c a l study
of the j o i n t e f f e c t s of m u l t i c o l l i n e a r i t y and a u t o c o r r e l a t i o n i s p r e s e n t ed i n S e c t i o n 5 .
I n S e c t i o n 6 , a new r i d g e e s t i m a t o r
adjusted
f o r auto-
c o r r e l a t i o n i s d e r i v e d and i t s mean square e r r o r p r o p e r t i e s a r e a n a l y z e d .

A l s o , we d i s c u s s how these new e s t i m a t e s can be obtained
The
i n practice.
methodology and the r e s u l t s o f sampling experiments appear i n S e c t i o n
-2-
7.
The
t h e s i s concludes w i t h the p r e s e n t a t i o n
methods t h a t can be used w i t h the new

a c h i e v e b e t t e r e s t i m a t e s and
of s e v e r a l two-stage
ridge r u l e that hopefully
predictions.
will
-3-
2. NOTATION AND PRELIMINARIES

The C l a s s i c a l L i n e a r R e g r e s s i o n
the
(CLR) model can be r e p r e s e n t e d by
equation
Y = x + e
(2.1)
where Y i s a
n x l . v e c t o r o f o b s e r v a t i o n s on t h e dependent v a r i a b l e , X
i s a nxp m a t r i x o f o b s e r v a t i o n s on t h e e x p l a n a t o r y v a r i a b l e s , 3 i s a
p x l v e c t o r o f r e g r e s s i o n c o e f f i c i e n t s t o be e s t i m a t e d and e i s a n x l
vector of true
e r r o r terms.
The s t a n d a r d assumptions o f the l i n e a r
r e g r e s s i o n model a r e :
(1) E ( e ) = 0 , where 0 i s t h e zero v e c t o r
(2) E(ee ) = a I , where I i s t h e i d e n t i t y m a t r i x .

(3) The e x p l a n a t o r y v a r i a b l e s a r e n o n - s t o c h a s t i c , hence they
a r e independent o f the e r r o r
(4) Rank (X) = p < n .
The O r d i n a r y L e a s t - s q u a r e s
(2.2)
0 L g
terms.
(OLS) e s t i m a t o r o f 3 i s g i v e n
(fp^fY
with variance-covariance matrix

(2.3)
Var(3
) = a (X X)
2
Q L S
_ 1
T
For s i m p l i c i t y , we w i l l assume t h a t (X X) i s i n c o r r e l a t i o n form. L e t
X T
P be the pxp o r t h o g o n a l m a t r i x such t h a t PX XP = A where A i s a
T
d i a g o n a l m a t r i x w i t h t h e e i g e n v a l u e s o f (X X), X^,...,X , d i s p l a y e d on
the d i a g o n a l o f A.
1 2
We assume f u r t h e r t h a t
p
-4-
After
applying
(2.4)
an o r t h o g o n a l
E(Y) = X P P
T
rotation,
P, i t f o l l o w s
from
(2.1) t h a t
= X*a
T
w h e r e X* = XP
i s the data
matrix
represented
and
t h e c o l u m n s o f X* a r e l i n e a r l y
the
vector
of regression
follows
that
We w i l l
consider
(2.6)
B (k) =(X X+kI)~

T
is
ridge
symmetric
be
an " a d a p t i v e
0 L S
ridge
LS
estimator"
estimator"
i n the rotated
[12].
where Z =
By
SfcCk) - Z a
(A+kI)
substituting
8 (k)
R
- 1
the ridge
0 L S
A.
= (P AP+kI)
T
? (
T
A + k
_ 1
-PÂ+kl)"
It
follows
(2.8)
from
(2.7) t h a t
a ( k )= P (k) .
R
- 1
I found
P AP
T
??
T A
Q L S
?OLS
Aa ...
QLS
by a
i s said to
[ll:p.63].
coordinates,
X*P = X i n ( 2 . 6 ) ,
I f k l i s replaced
k, t h e n t h e e s t i m a t o r
given by
(2.7)
When k i s a f u n c t i o n o f |3Q > ( k )
nonnegative d e f i n i t e matrix
a "generalized
Expressed
Q L S
It
0 < k < l
of
components.
f o r 3 o f the form
(X X)
by
coordinates
a =-P i s
The v e c t o r
of the p r i n c i p a l
of a i s given
ridge estimators
Where k i s i n d e p e n d e n t
called
independent.
coefficients
t h e OLS e s t i m a t o r
i n the rotated
estimator
fora i s
For the CLR
model, assumption
i s often violated i n practice.

Autoregressive
Linear Regression
(2) t h a t the e r r o r s are
This leads
(ALR)
to the f o r m u l a t i o n
model.
Mathematically
model i s given by r e p l a c i n g assumptions (2) and

(2') and
(3) by
T
E(ee
) = a^Q
We
the
ALR
assumptions
(3 )
t ; L > 2 > tp-' ' ^
fc
where 2 i s a n o n d i a g o n a l p o s i t i v e d e f i n i t e
ie
observation
on the
v a r i a b l e s , i s independent of the contemporaneous and

E
an
(2 )
1
t'
of
(3') below.
uncorrelated
matrix.
explanatory
succeeding
errors,
t+1'
assume t h a t the e r r o r term 'e
follows a f i r s t - o r d e r
autoregressive
scheme, t h a t i s
e
(2.9)
where p
and
i s the a u t o c o r r e l a t i o n c o e f f i c i e n t .
that U
(2.10)
= p e - + U
e t-1
t
s a t i s f i e s the f o l l o w i n g f o r a l l t
E(U
E
) = 0
t t
U
E(U U
t
t + s
"
s = 0
)=0
4 o
and
(2.11)
E(ee
~~
N
) =
'
a2u~V
where
n-1
n-2
V
n-1
n-2
We
require that
|p |<l
-6-
For an ALR model, the " G e n e r a l i z e d L e a s t - s q u a r e s "

"Best L i n e a r Unbiased E s t i m a t o r "
(GLS) w i l l
(BLUE) o f 3, denoted as 3
r T C
g i v e the
.
The
m a t r i x ft can be w r i t t e n
T
0 = 9Q
where Q i s n o n s i n g u l a r .
-1
-1 T
ft(Q
-1 T
(Q )
and
^GLS
Hence
) = I
-3
-1
n
b ^ - ^ by making
t a
the f o l l o w i n g s u b s t i t u t i o n i n the ALR model;
n e
S*
_ 1
Then i t f o l l o w s t h a t
(2.12)
Since
= X g + ,.
(2.12) s a t i s f i e s a l l the assumptions o f a CLR model, OLS
g i v e the BLUE of 3.
( 2
1 3 )
i Ls
Hence i t f o l l o w s t h a t
T -1 -1 T -1
= (X ft X)
X ft Y
For p r e d i c t i o n , formula
where e
t + 1
= X
t + 1
G L S
i s the t*"* GLS

1
'
(2.15) g i v e s the "Best L i n e a r Unbiased P r e d i c t o r "
(BLUP) i n a f i r s t - o r d e r ALR model
(2.15)
will
p e
residual.
-7-
3. MULTICOLLINEARITY
In a p p l y i n g m u l t i p l e r e g r e s s i o n models, some degree of
dependence among e x p l a n a t o r y
v a r i a b l e s can be
expected.
inter-
As
this
T
interdependence grows and
the c o r r e l a t i o n m a t r i x (X X)
s i n g u l a r i t y , m u l t i c o l l i n e a r i t y c o n s t i t u t e s a problem.
approaches
Therefore
it is
p r e f e r a b l e to t h i n k of m u l t i c o l l i n e a r i t y i n terms of i t s " s e v e r i t y "

rather
3.1
than i t s " e x i s t e n c e "
or
"nonexistence".
Sources
In g e n e r a l , m u l t i c o l l i n e a r i t y can be
poor e x p e r i m e n t a l d e s i g n .
be c l a s s i f i e d as f o l l o w s
(i)
Not
The
considered
to be
a symptom
of
sources of severe m u l t i c o l l i n e a r i t y may
[20:p.99-101].
enough d a t a or two
many v a r i a b l e s
In many cases l a r g e d a t a s e t s o n l y c o n t a i n a few b a s i c f a c t o r s .

As
the number of v a r i a b l e s e x t r a c t e d
from the d a t a i n c r e a s e s ,
each
v a r i a b l e tends to measure the d i f f e r e n t nuances of the same b a s i c

f a c t o r s and
each h i g h l y c o l l i n e a r v a r i a b l e o n l y has
t i o n content.
little
In t h i s case, d e l e t i n g some v a r i a b l e s or
informa-
collecting
more data can u s u a l l y s o l v e the problem.

( i i ) P h y s i c a l or s t r u c t u r a l s i n g u l a r i t y
Sometimes h i g h l y c o l l i n e a r v a r i a b l e s , due
p h y s i c a l c o n s t r a i n t s , are i n a d v e r t e n t l y
to mathematical or
included
i n the model.
( i i i ) Sampling s i n g u l a r i t y
Due
to expense, a c c i d e n t
or m i s t a k e , sampling was
conducted i n a s m a l l r e g i o n of the d e s i g n
space.
only
3.2
Effects
The
(i)
major
effects
Estimation
As
the
T
(X X)
the
T
-1
(X X)
explode.
inverse matrix
of
becomes i l l - c o n d i t i o n e d ,
the
OLS
estimates
f o r B are
elements of
the
inverse matrix
instability
of
the
be
numerically
the
inverse matrix
regression coefficients
variances
(ii)
of
and
impossible
are
of
quite sensitive
the
As
a result
of
small
OLS
estimates
v a r i a b l e s might
case
they
changes
have
i n the
large
data
set.
Structure misspecification
The
the
increase
information
decreasing
to
the
the
i n the
content
sample
explained
d e p e n d s on
size
of
the v a r i a b l e set of X
o f each e x p l a n a t o r y
significance
variance
e a c h member o f
o f Y.
authors
data
a relatively
[6:p.94][13:p.l60][15],
limitation
responsible
Forecast
If
then
f o r the
than
tendency
the
thereby
each v a r i a b l e ' s c o n t r i b u t i o n
even though Y
large v a r i a b l e set
happen.
i n the
variable,
decreases
As
process
theoretical
to underspecify
really
of
a s s e r t e d by
of
many
model-building,
limitation
the
X,
is
models.
inaccuracy
an
collinear,
changes
rather
of
Therefore,
e r r o n e o u s d e l e t i o n o f v a r i a b l e s may
(iii)
, the
I n any
to
that
-1
collinear
to obtain.
shows
the
-1
(X X)
the
(2.3)
a f f e c t e d by
T
(XX)
T
the
are
c o r r e l a t i o n matrix
variances
diagonal
serious multicollinearity
instability
elements of
the
of
important
v a r i a b l e i s omitted
but
in- the
later
i t s behavior
and
prediction period,
moves i n d e p e n d e n t l y
any. f o r e c a s t i n g u n d e r . t h i s
inaccurate.
because
of
i t i s highly
this
omitted
variable
other v a r i a b l e s ,
o v e r s i m p l i f i e d model w i l l
be
very
-9-
(iv)
Numerical problems
T
The
c o r r e l a t i o n m a t r i x (X X) i s not i n v e r t i b l e i f the columns

T
o f X a r e l i n e a r l y dependent.
the
With the m a t r i x (X X) b e i n g
singular,
OLS e s t i m a t e s of , r e p r e s e n t e d by ( 2 . 2 ) , a r e c o m p l e t e l y
indeterminate.
In case o f an almost s i n g u l a r
s e t of v a r i a b l e s ,
T -1
the n u m e r i c a l i n s t a b i l i t y i n c a l c u l a t i n g i n v e r s e m a t r i x (X X)
still
3.3
remains.
Detection
T e s t s f o r the presence and l o c a t i o n o f s e r i o u s
are b r i e f l y o u t l i n e d
(1)
multicollinearity
and f o l l o w e d by comments.
T e s t s based o n . v a r i o u s c o r r e l a t i o n
coefficients
Here, h a r m f u l m u l t i c o l l i n e a r i t y i s g e n e r a l l y
r u l e s o f thumb.
r e c o g n i z e d by
For i n s t a n c e , an admitted r u l e o f thumb
requires
simple p a i r - w i s e c o r r e l a t i o n c o e f f i c i e n t s of e x p l a n a t o r y
variables
to be l e s s than 0.8.
sophisticated
Certainly,
those more extended and
r u l e s o f thumb w i t h prudent use o f v a r i o u s c o r r e l a t i o n

w i l l give
more s a t i s f a c t o r y r e s u l t s .
i s generally
c o n s i d e r e d t o be s u p e r i o r
The f o l l o w i n g
coefficients
r u l e o f thumb
to o t h e r r u l e s :
a variable
i s s a i d t o be h i g h l y m u l t i c o l l i n e a r i f i t s c o e f f i c i e n t o f
2
m u l t i p l e c o r r e l a t i o n , R., w i t h the remaining (p-1) v a r i a b l e s i s
2
g r e a t e r than the c o e f f i c i e n t o f m u l t i p l e c o r r e l a t i o n , R , w i t h a l l
the e x p l a n a t o r y v a r i a b l e s [I4:p.101]. The v a r i a n c e o f the e s t i m a t e
of
3^ can be expressed as f o l l o w s
-,
(3.1)
1 - R
Var(B ) = ^--4
1
[9]
\
a
X.
l
1 - R
2
2
where a i s the v a r i a n c e o f the dependent v a r i a b l e Y and a,, i s the
y
X.
l
3
-10-
variance of the explanatory variable
X^.
From
(3.1), i t i s obvious
2
that m u l t i c o l l i n e a r i t y
constitutes
a problem
o n l y when R i s r e l -
2
high to R^.
atively
this
(ii)
rule
U n f o r t u n a t e l y the geometric
o f thumb i s a p p a r e n t
o n l y when t h e r e a r e two
variables
[6:p.98],
Three-stage hierarchy test
T h i s i s p r o p o s e d b y F a r r a r and G l a u b e r
stage,
i f the n u l l
hypothesis H
the W i l k s - B a r t l e t t ' s
severe
a n d move
test,
toward
i n t e r p r e t a t i o n of
: |X
At the f i r s t
X| = 1 i s rejected
we may a s s e r t
the second
[6].
explanatory
based
on
that m u l t i c o l l i n e a r i t y i s
stage.
The F s t a t i s t i c
i s then
2
f o r each R^
computed
RJ/(P-D
F
i = 1,
. . . , p
(l-R*)/(n-p)
Statistical
stage,
inspection o f the p a r t i a l
X ^ and t h e r e m a i n i n g
can
F^ i m p l i e s X ^ i s c o l l i n e a r .
significant
(p-1) v a r i a b l e s
severe
F a r r a r and G l a u b e r
multicollinearity
among e x p l a n a t o r y v a r i a b l e s
different
Haitovsky
In
stages
of their
C h i Square
statistic
test
between
tratios
among t h e e x p l a n a t o r y
that detecting,
the pattern of
can be r e s p e c t i v e l y
localizing
interdependence
achieved
at three
test.
test
1969, H a i t o v s k y
hypothesis
claimed
and l e a r n i n g
coefficients
and t h e a s s o c i a t e d
show t h e p a t t e r n o f i n t e r d e p e n d e n c y
variables.
(iii)
correlation
At the t h i r d
[9] proposed
of severe
a heuristic
multicollinearity.
i s a f u n c t i o n o f the determinant
statistic
This
f o r the
heuristic
of the c o r r e l a t i o n
matrix
T
( X X ) , and a p p r o x i m a t e l y
distributed
as C h i Square.
Applications
to F a r r a r and Glauber's d a t a show t h a t t h i s t e s t g i v e s more

s a t i s f a c t o r y r e s u l t s than the W i l k s - B a r t l e t t ' s t e s t t h a t i s
adopted
test.
and
at the f i r s t
stage of
the F a r r a r and Glauber
three-stage
T h e r e f o r e H a i t o v s k y c l a i m e d the s u p e r i o r i t y of h i s t e s t
suggested
a replacement
of W i l k - B a r t l e t t ' s t e s t by h i s t e s t i n
the F a r r a r and Glauber
three-stage t e s t .
on the the determinant
of c o r r e l a t i o n m a t r i x has some b u i l t - i n
deficiencies..
However, any
test
As w i l l be shown l a t e r , the mean square
based
error
T
p r o p e r t i e s depend o n l y on the e i g e n v a l u e s of the m a t r i x (X X ) .

T
Only when the
(X X)has a broad e i g e n v a l u e spectrum, that i s to say
the r a t i o of the l a r g e s t e i g e n v a l u e to the s m a l l e s t one,

l a r g e , the performance of the OLS
the determinant
e s t i m a t e s may
X^/X
deteriorate.
, is
Since
of the c o r r e l a t i o n m a t r i x i s e q u a l t o the product
of a l l the e i g e n v a l u e s , t h i s t e s t w i l l t r e a t the m a t r i x having

broad e i g e n v a l u e spectrum
e q u i v a l e n t l y to those h a v i n g
relatively
narrow e i g e n v a l u e s p e c t r a , so l o n g as they have the same or n e a r l y

the same d e t e r m i n a n t s .
is difficult
The
r e l a t i v e magnitude of the e i g e n v a l u e s
i f n o t i m p o s s i b l e t o i n f e r from the r e s u l t s of
t e s t t h a t i s based
on the determinant
However, H a i t o v s k y t e s t g i v e s a f a i r l y
any
of the c o r r e l a t i o n m a t r i x .
good i n d i c a t i o n i n the
presence of severe m u l t i c o l l i n e a r i t y i n our s i m u l a t i o n study.

T
Examining the spectrum o f m a t r i x (X X)
T
I f the m a t r i x (X X) has a broad e i g e n v a l u e spectrum, t h a t i s ,
A^/Ap i s l a r g e , then the mean square e r r o r of the OLS e s t i m a t e s of
becomes v e r y l a r g e .
S i n c e the t r a c e of the c o r r e l a t i o n m a t r i x
T
(X X) i s e q u a l to the number o f e x p l a n a t o r y v a r i a b l e s p, an
arbitrary
-12-
r u l e of thumb may consider \,/\ i s large i f A,/X > p.

1 p
1 p
Besides,
the minimax index MMI = Z X. /X

i
P
i s a useful indicator too. Small
MMI
, say, less than two implies the presence of m u l t i c o l l i n e a r i t y
[21:p.13-14].
Among a l l these tests and methods proposed, examining the
T
eigenvalue spectrum of the matrix (X X) provides not only a sound

t h e o r e t i c a l basis but also the l i g h t e s t computation burden.
-13-
4. AUTOCORRELATION
One o f t h e b a s i c assumptions o f t h e CLR model i s t h a t the e r r o r
terms a r e independent o f each o t h e r .
i s applied
However, when r e g r e s s i o n
analysis
t o time s e r i e s d a t a , the r e s i d u a l s a r e o f t e n found t o be
s e r i a l l y correlated.
Like multicollinearity, autocorrelation
w i d e s p r e a d problem i n a p p l y i n g
r e g r e s s i o n models.
i s another
For s i m p l i c i t y ,
first-
o r d e r a u t o c o r r e l a t i o n i s assumed i n our s t u d y .
4.1
Sources
The s o u r c e s a r e m a i n l y t h e f o l l o w i n g :
(i)
Omission of v a r i a b l e s
The t i m e - o r d e r e d e f f e c t s o f t h e o m i t t e d v a r i a b l e s w i l l be
included
i n t h e e r r o r terms.
d i s p l a y i n g random b e h a v i o r .
This prevents the e r r o r s
from
I n t h i s case, f i n d i n g the m i s s i n g
v a r i a b l e s and i d e n t i f y i n g t h e c o r r e c t r e l a t i o n s h i p can s o l v e t h e
problem.
(ii)
S y s t e m a t i c measurement e r r o r i n t h e dependent v a r i a b l e
A g a i n , t h e e r r o r terms absorb the s y s t e m a t i c measurement e r r o r
i n t h e dependent v a r i a b l e and t h e n d i s p l a y non-random b e h a v i o r .
(iii)
E r r o r s t r u c t u r e i s time dependent
The g r e a t impacts o f some random e v e n t s o r s h o c k s , such as
war, s t r i k e s , f l o o d , e t c . , a r e spread over s e v e r a l p e r i o d s o f t i m e ,
c a u s i n g t h e e r r o r terms t o be s e r i a l l y c o r r e l a t e d .
"true-autocorrelation".
This i s so-called
-14-
4.2
Effects
When the OLS technique i s s t i l l used f o r e s t i m a t i o n , the major
effects are:
(i)
Unbiased but i n e f f i c i e n t e s t i m a t o r of B
GLS p r o v i d e s t h e BLUE o f B when the d i s p e r s i o n m a t r i x o f e,
2
oM}, i s n o n d i a g o n a l .
That i s t o say on the average
the sampling
v a r i a n c e s o f GLS e s t i m a t e s of B a r e l e s s than t h a t o f OLS e s t i m a t e s

of
g, hence OLS i s i n e f f i c i e n t
compared w i t h GLS.
U n d e r e s t i m a t i o n o f the v a r i a n c e s o f the e s t i m a t e s o f B
(ii)
As an i l l u s t r a t i o n , c o n s i d e r t h e v e r y simple model
y
V
where u
= Bx
e
e
+ c
t - i
s a t i s f i e s assumptions ( 2 . 1 0 ) .
I t has been shown t h a t the
v a r i a n c e o f OLS e s t i m a t e o f B i s [ 1 3 : p . 2 4 7 ]
n-1
a
(4.1)
Var(6
0 L S
i>i i l
- -f
i-i
n-2
[ 1 + 2^
+.2p
1=1
+ 2 p e
n
-f^*
i - i
X
1
"
n
i=l
The OLS formula ( 2 . 3 ) i g n o r e s
J, l i+2
xi
1
the term i n parentheses
i n ( 4 . 1 ) and
2
2
g i v e s the v a r i a n c e s o f the e s t i m a t e s o f B as a / x ^ I f b o t h e
. i=l
and
x a r e p o s i t i v e l y a u t o c o r r e l a t e d the e x p r e s s i o n i n parentheses
n
i s almost c e r t a i n l y greater than unity, therefore the OLS formula

w i l l underestimate the true variance of 3
(iii)
n T C
I n e f f i c i e n t predictor of Y
When autocorrelation i s present,, error made at one point i n
time gives information about the error made at a subsequent point
i n time.
The OLS predictor f a i l s to take this information into
account, hence i t i s not the BLUP of Y [13:p.265-266].

A.3
Detection
The tests which are commonly used to recognize
the existence of
f i r s t - o r d e r autocorrelation are the following.

(i)
Eye-ball tests
The plot of OLS residuals e
Any nonrandom bechaior
autocorrelation.
lagged value e
fc
of e
fc
against time t can be informative.
can be considered
as an i n d i c a t i o n of
We may also plot the OLS residual e

^.
I f the observations
against i t s
are hot evenly spread over
the four quadrants, we may conclude the f i r s t - o r d e r autocorrelation

i s present.
These eye-ball tests are quite e f f e c t i v e , however they
are imprecise and do not lend themselves to c l a s s i c a l i n f e r e n t i a l

methods.
(ii)
yon-Neumann r a t i o
In 1941, the r a t i o of the mean square successive difference to
the variance was proposed by von-Neumann as a test s t a t i s t i c for the
existence of f i r s t - o r d e r autocorrelation [22].
Though various
applications have proven the usefulness of the von-Neumann r a t i o ,

we emphasize that this test i s applicable only when e values are
independently d i s t r i b u t e d and the sample size i s large.
In practice,
the OLS r e s i d u a l s used
to compute t h e von-Neumann r a t i o
are not i n d e p e n d e n t l y d i s t r i b u t e d
usually
even when the t r u e e r r o r
terms
are.
(iii)
Durbin-Watson t e s t
T h i s t e s t , named a f t e r i t s o r i g i n a t o r s D u r b i n and Watson, i s
w i d e l y used
f o r s m a l l sample s i z e s
comings o f the Durbin-Watson t e s t .

o f indeterminancy.
[4][5]..
There a r e some s h o r t -
First,
t h e r e e x i s t two r e g i o n s
Though an exact t e s t was suggested
i n 1966, i t s heavy c o m p u t a t i o n a l burden, p r e v e n t s

a p p l i c a t i o n s ' . [10].
Secondly,
the .test" from wide
the Durbin-Watson t e s t i s d e r i v e d f o r
non-stochastic explanatory v a r i a b l e s only.

if
by Henshaw
I t has been shown t h a t
the lagged dependent v a r i a b l e s a r e p r e s e n t e i t h e r i n s i n g l e
r e g r e s s i o n e q u a t i o n models or i n systems o f simultaneous r e g r e s s i o n

e q u a t i o n s , t h e Durbin-Watson t e s t
i s b i a s e d towards the v a l u e f o r
a random e r r o r , t h a t i s , d i s b i a s e d towards 2
v e r y m i s l e a d i n g i n f o r m a t i o n [17].
thereby
giving
I t i s as n e c e s s a r y as important
to t e s t f o r s e r i a l c o r r e l a t i o n f o r models c o n t a i n i n g lagged
dependent v a r i a b l e s s i n c e a u t o c o r r e l a t e d models a r e u s u a l l y r e p a i r e d
by i n s e r t i n g lagged Y v a l u e s i n t o the r i g h t - h a n d s i d e of the
regression equation.
To t h i s end, D u r b i n
on the h s t a t i s t i c i n 1970[3].
"h" i s d e f i n e d as the f o l l o w i n g ,
n
where b' i s the c o e f f i c i e n t o f Y
t-1*
developed
a test
based
-17-
This test
i s c o m p u t a t i o n a l cheap but o n l y a p p l i c a b l e f o r l a r g e
sample s i z e s .
The s m a l l sample p r o p e r t i e s of the "h" s t a t i s t i c
are s t i l l unknown.
-18-
5.
JOINT EFFECTS OF MULTICOLLINEARITY AND AUTOCORRELATION

In
s t a t i s t i c a l a n a l y s i s , a p o i n t e s t i m a t e i s u s u a l l y o f l i t t l e use
u n l e s s accompanied by an e s t i m a t e o f i t s a c c u r a c y .
Mean Square E r r o r
it
In t h i s
connection,
(MSE) i s w i d e l y used as a measure of a c c u r a c y .
Since
i s t r u e t h a t a c c u r a t e parameter e s t i m a t e s c o n s t i t u t e an e f f e c t i v e
model.
MSE
can be used
t o determine
the models e f f e c t i v e n e s s when
the u n d e r l y i n g o b j e c t i v e i s s i m p l y t o o b t a i n good parameter e s t i m a t e s .

In
1970 paper,
H o e r l and Kennard p r e s e n t e d
OLS and Ridge e s t i m a t e s of [11].

s t u d i e s have confirmed
the MSE p r o p e r t i e s f o r the
T h e r e a f t e r the r e s u l t s o f v a r i o u s
that ridge r e g r e s s i o n w i l l
improve the MSE
of
e s t i m a t i o n and p r e d i c t i o n s i n the presence
In
t h i s s e c t i o n , we w i l l p r e s e n t e x p r e s s i o n s f o r the MSE of
B (k)
(2.9).
and
These e x p r e s s i o n s w i l l enable us t o examine the e f f e c t o f these
c o n d i t i o n s on the r i d g e and the OLS e s t i m a t e s .
be reduced
5.1
multicollineary.
when the e r r o r terms f o l l o w a f i r s t - o r d e r a u t o c o r r e l a t e d p a t t e r n
two
o f severe
Our a n a l y s i s can
t o t h a t o f H o e r l and Kennard by s e t t i n g p
= 0.
Mean Square E r r o r of the OLS E s t i m a t e s of 8

We b e g i n w i t h the a n a l y s i s f o r the OLS e s t i m a t e s f o r a f i r s t - o r d e r A L R
model. L e t
= D i s t a n c e from
^0LS"
) T (
Q L S
to B
^0LS~^ '
"
We d e f i n e the MSE of B
ul_ib
2
to be E ( L ) .
_L
P r o p o s i t i o n 5.1
(5.D
-BO*) =
1
j = l =1
D
J J i
-19-
where
D = X(X X)
T
Proof:
( 5 . 2 )
From (2.1)
ÔLS "
(2.2)
? P ? I ~
- 1
= (X X) X (X+e) - 3T
-1
= (X X)
l
-1
X. e
By d e f i n i t i o n and (5.2) i f follows that
E(L ) = E [ ( 6
2
-B) (i
T
O L S
O L S
-3)]
= gt^xonp X e] .
Noting that E(e) = 0 i t follows from Theorem 4.6.1
G r a y b i l l [7:p.l39]
that
(5.3)
E(L ) = a t . [ X ( X X ) " X V ]
2
From the d e f i n i t i o n of V and D, (5.1) follows.

(5.1) does not give much insight into the effect of m u l t i c o l l i n e a r i t y
and autocorrelation on the MSE
of 3Q
By rotating axes (using
LS
p r i n c i p a l components) the effect can be more c l e a r l y
Proposition
5.2
1=1
where x*. i s the i
demonstrated.
j > I
observaton on the i
1=1
p r i n c i p a l component.
^ o r a matrix A we use the notation t ( A ) to denote the trace of A.

r
-20-
Proof:
(5.5)
From (2.4) and (2.5)

a
- a = (X* X*) X* Y - a
T
Q L S
-1
= (X* X*) X* (X*a+e) - a

T
T
-1 T
= (X* X*) X* g
By d e f i n i t i o n and (5.5) i t follows that
^"ibLS-^VPCgoLS-^l
< 1>
L
E [ (
?0LS^
^0LS 2
_
= E[e X*(X* X*)

T
)]
X* e]
T
Hence by the same argument used i n proving Proposition 5.1
E(L )
2
a t [X*A X* V]
u r ~ ~ ~ ~
2
xl i
!i !i
X
2
1 AT
l
L
X* X* .
l i nx
au2 tr (,
V .)
X* . x* . p X* . X* .
nx l x Y nx 2x
2
^ 2
AT
1
AT
X
p x*
y nx
2
1 A
x
2
By d e f i n i t i o n of A and V, (5.4) follows.

After orthogonal rotation, the effect of m u l t i c o l l i n e a r i t y and
autocorrelation
becomes apparent from (5.4).
First, i f p
i s positive
and most of the p r i n c i p a l components are also p o s i t i v e l y autocorrelated,

almost c e r t a i n l y the second term i n (5.4) w i l l be p o s i t i v e .
That i s to
-21-
say t h a t the MSE o f g
w i l l be l a r g e r than when these e f f e c t s a r e n o t
p r e s e n t ; moreover the d i f f e r e n c e w i l l be i n p r o p o r t i o n to the magnitude

of
p .
Secondly, we o b t a i n a c r o s s term o f e i g e n v a l u e s , A_ and a u t o T
c o r r e l a t i o n c o e f f i c i e n t , p^.
that i s ,
I f the matrix
(X X) i s i l l - c o n d i t i o n e d ,
i s c l o s e to zero and t h e r e is' a h i g h degree o f p o s i t i v e
c o r r e l a t i o n both
i n the p*"* component and the e r r o r terms, then the

1
second term i n (5.4) dominates and the MSE o f g

It
auto-
i s then extremely
characteristics.
can be v e r y l a r g e .
dangerous t o a p p l y OLS t o data w i t h the above
However, the problem w i l l n o t be t h a t s e r i o u s i f p
i s n e g a t i v e o r t h e p r i n c i p a l components, e s p e c i a l l y those weak components,

are n o t a u t o c o r r e l a t e d .
Finally,
from (5.4) we a r e a b l e to t e l l by
how much the MSE o f Pg^g changes because o f the e x i s t e n c e o f f i r s t - o r d e r

a u t o c o r r e l a t e d e r r o r s i n g e n e r a l r e g r e s s i o n models c o n t a i n i n g p
explanatory v a r i a b l e s .
Note t h a t when p^ = 0, (5.4) reduces t o
E ( L ) = a t (xV
I
u r ~ ~
2
?
^.
> A.
i=l
= a
5.2
Mean Square E r r o r o f the Ridge E s t i m a t e s
- '* " ' '-'
of 0
In p a r a l l e l w i t h 5.1, we d e f i n e
L.(k) = D i s t a n c e from B_(k)
L
~K
to g
~
The MSE o f g ( k ) i s g i v e n by
D
E[L (k)] = E[g (k)
B ) ( B ( k ) - )].
P r o p o s i t i o n 5.3
(5.6)
E[L (k)] =
where
2
y^k) = ^
2
Y
( k )
Y l
( k ) + Y (k) + Y (k)
2
iJ.
\^
(A +k)2
.
"i
-22-
n n-1
Y (k) - l o l l
l
.
j > I
3
Proof:
A.
] p -'
px*.x*
l
\
i=l( Aj+k)
From (2.7) and (2.8), the MSE of R.(k) can be written
as
E [ L ( k ) ] = E[( (k) - B ) V p ( 3 ( k ) - )]
2
"
E [ ( Z
?OLS " ?
= E[(a
(5.7)
) T ( Z
5oLS * ?
- a) Z Z(a
T
Q L S
) ]
- a)] + (Za - a ) ( Z a - a)
T
Q L S
Since the f i r s t term i n (5.7) i s a scalar, from (2.7) and Proposition

5.2, i t follows
(5.8)
E[(a
-a) Z Z(a
T
Q L S
0 L S
-a)] = a
t [^r^CA+kl)" ^" ^*^]

2
= a j t [X*(A+kI)" X* V]
2
2
u
2o
1=1 (XH-k)
n
n-1
l i t !
j > ,
P x*.x*
- ^
.
2
] Pf
i = l (X +k)
Since the matrix (Z-I) can be written as
(5.9)
Z - I = Z(I-Z )
-1
= Z(-kA )
-1
= -k(A+kI)
_1
From (5.9), the second term i n (5.7) can be expressed as follows.
(Za-a) -(Za-a) =
J
a (Z-I)^'a
i
2 T
-?
= kV(M-kl)
a
Z
= k
I
i=l
a. '
(A +k)
= Y (k)
'
Completing the p r o o f .
The
Y^(k)
MSE of B ( k ) c o n s i s t s o f t h r e e p a r t s , yîk),
R
can be c o n s i d e r e d
t o be the t o t a l v a r i a n c e
e s t i m a t e s and i s a m o n o t o n i c a l l y
the
decreasing
Y ( k ) and Y^Ck).
2
o f the parameter
f u n c t i o n o f k, Y ( k ) i s
2
square o f the b i a s brought by the augmented m a t r i x k l and i s
m o n o t o n i c a l l y i n c r e a s i n g f u n c t i o n o f k w h i l e Y-j(k) i s r e l a t e d to the
a u t o c o r r e l a t i o n i n the e r r o r terms.
Hoerl
the presence o f severe m u l t i c o l l i n e a r i t y ,
and Kennard c l a i m t h a t i n
i t i s p o s s i b l e t o reduce MSE
s u b s t a n t i a l l y by t a k i n g a l i t t l e b i a s , t h a t i s , c h o o s i n g k > 0.
This
i s because i n the neighborhood o f o r i g i n , Y^(k) w i l l drop s h a r p l y

Y (k)
2
w i l l only increase
s l i g h t l y as k i n c r e a s e s
[ll:p.60-61].
After
i n c o r p o r a t i n g a u t o c o r r e l a t i o n i n the c o n t e x t o f r i d g e r e g r e s s i o n
t h e i r a s s e r t i o n w i l l s t i l l be t r u e o n l y
satisfied.
From (5.6) we see t h a t
while
analys
i f c e r t a i n conditions are
the e f f e c t s o f m u l t i c o l l i n e a r i t y and
a u t o c o r r e l a t i o n a r e the f o l l o w i n g .
(i)
If
i s p o s i t i v e and the p r i n c i p a l components, e s p e c i a l l y t h e
weak components, a r e a l s o p o s i t i v e l y a u t o c o r r e l a t e d ,
method w i l l be even more d e s i r a b l e than OLS.
then
ridge
T h i s i s because
s u b s t a n t i a l decrease i n b o t h Y-^(k) and Y^(k) can be a c h i e v e d by

c h o o s i n g k > 0 w h i l e the i n c r e a s e
i n Y ( k ) i s r e l a t i v e l y s m a l l as
2
moving t o k > 0.
(ii)
If p
i s n e g a t i v e or almost a l l o f the p r i n c i p a l components a r e
-24 -
not
autocorrelated,
then on the average Y ( k ) i s c l o s e to z e r o ,

3
hence the r i d g e and the OLS e s t i m a t e s w i l l perform

the
same as i n the u n c o r r e l a t e d
( i i i ) Since
ridge regression
relatively
case,
i s s i m i l a r to s h r i n k i n g the model by d r o p p i n g
the l e a s t important component [21:p.24-28]
(5.6) g i v e s a t h e o r e t i c a l
j u s t i f i c a t i o n t o s h r i n k t h e model i f both t h e l a s t
e r r o r terms a r e p o s i t i v e l y a u t o c o r r e l a t e d .
of e s t i m a t i o n
stability,
component and the
From the p o i n t o f view
t h e r i d g e method w i l l be h e l p f u l when severe
m u l t i c o l l i n e a r i t y i s accompanied by h i g h degree o f p o s i t i v e autoc o r r e l a t i o n both i n the weakest component
5.3
and the e r r o r terms.
When w i l l Ridge e s t i m a t e s be b e t t e r than the OLS e s t i m a t e s ?

T a k i n g the d e r i v a t i v e s o f Y-^(k) and Y 2 ( k ) , H o e r l and Kennard found
a c o n d i t i o n on k such that r i d g e r e g r e s s i o n g i v e s b e t t e r
e s t i m a t e s than OLS i n terms of MSE.
a
That i s when k i s s m a l l e r
2
, where a
i s the l a r g e s t r e g r e s s i o n c o e f f i c i e n t
max
max
MSE o f 6 ,(k) w i l l be l e s s than that o f g. [11].

T
i s present,
Y ( k ) and Y ( k ) .
2
below.
than a /
u
i n magnitude, the
When a u t o c o r r e l a t i o n
t h e c o n d i t i o n on k such t h a t r i d g e r e g r e s s i o n w i l l
b e t t e r than OLS r e g r e s s i o n a r e d e s c r i b e d
of Y ( k ) ,
parameter
perform
C o n s i d e r the d e r i v a t i v e s
"P
dy
/dk
2k
A. af
'I
i=l
(5.10) d y / d k
3
When (X X)
values
(A^k)-
approaches s i n g u l a r i t y which i m p l i e s t h a t A
of the f i r s t
are g i v e n
two
d e r i v a t i v e s i n the neighborhood of
the
origin
by
(5.11) Lim Lim

A ->0 k->0
P
(dy /dk)
= -<*>
(dy
= 0
(5.12) Lira Lim

A ->0 k+0
P
/dk)
As k i n c r e a s e s , a huge drop i n y^ w i t h s l i g h t
expected.
-> 0,
increase
in y
may
However (5.10) shows t h a t the b e h a v i o r of y^ depends on
degree of a u t o c o r r e l a t i o n b o t h i n the p r i n c i p a l components and

terms.
Therefore
increases.
be
The
y^ may
use
increase
or d e c r e a s e at v a r i o u s
of r i d g e r e g r e s s i o n
the
rate's as
i s most f a v o u r a b l e
the
when
error
k
there
i s a h i g h degree of p o s i t i v e a u t o c o r r e l a t i o n both i n the components

the e r r o r terms.
We
now
formalize
these arguments and
present
c o n d i t i o n on k such t h a t r i d g e r e g r e s s i o n w i l l be b e t t e r than
regression
i n MSE
criterion.
a
OLS
and
-26-
Let F(k) = E ( L ) - E [ L ( k ) ]
2
2
i = i At;
(A.+k)"
" j = i *=i
fcJ
=i
(A.+k)
Then
dF/dk = 2 a J
(5.13)
j -^[x/f
i=l
(X + k )
j=l =1
X*.X*
pj]-2k
P A
1=1 ( A + k )
Assume t h a t Yg(k) i s a n o n - i n c r e a s i n g f u n c t i o n of k i n the neighborhood

From ( 5 . 1 1 ) and
of o r i g i n .
moving
towards k > 0 ,
( 5 . 1 2 ) we may
^ ( d F / d k ) > 0.
i.e.
e x i s t s k > 0 such t h a t the OLS
Theorem 5 . 1 .
/r- , ,
o*
a
max
implies
If
2
\
max
ri-1
1 3=1
then E(L?; ) - E [ L ( k ) ] > 0 .

2
In o t h e r words,
e s t i m a t e s have h i g h e r MSE
estimates.
j J J + (dF/dk) > 0
expect F ( k ) to i n c r e a s e as
n-j
1=1
there
than the r i d g e
-27-
Again (5.14) w i l l reduce to Hoerl and Kennard's result i f p
=0.
When
p o s i t i v e autocorrelation exists i n the error terms and the p r i n c i p a l

components, the second term i n (5.14) may well be p o s i t i v e , hence the
range of k for ridge estimates to be better than OLS estimates i n MSE
c r i t e r i o n w i l l be larger than what Hoerl and Kennard asserted i n
uncorrelated case.
(5.14) shows that the extension i n the range of k
i s p o s i t i v e l y related with the magnitude of p .
However, (5.14) i s
2
2
just a necessary condition on k for E(L^) to be greater than E[L2(k)]
since F(k)- i s increasing i n k over the range shown by (5.14).
It
i s possible that for some values of k, F(k) i s decreasing i n k while

2
the function value i s s t i l l p o s i t i v e , that i s E(L^) i s s t i l l greater
2
than E[L (k)]. Therefore, we may
consider (5.14) as a stringent
condition on k for ridge estimates to be better than OLS

i n MSE
estimates
criterion.
If either p
i s negative or the p r i n c i p a l components, especially
those weak ones, are not autocorrelated, the behavior of Y^k)

thereby F(k) as k increases w i l l be hard to predict.
and
The effect of
autocorrelation on the range of k depends on the data set we
gathered.
In practice, the true parameters are unknown, the range of k shown

by (5.14) can be approximated
by conducting a p r i n c i p a l component
analysis and substituting the estimates for the

5.4
parameters.
Use of the "Ridge Trace"

In ridge regression the augmented matrix (kl) i s used to cause
the system to have the general c h a r a c t e r i s t i c s of an orthogonal system.
-28-
H o e r l and Kennard claimed t h a t at c e r t a i n v a l u e of k,

stabilize
[ll':p.65].
They proposed
the system w i l l
the usage of a "Ridge T r a c e " as a
d i a g n o s t i c t o o l t o s e l e c t a s i n g l e v a l u e of k and a unique
of
g i n practice.
The
"Ridge T r a c e " w i l l p o r t r a r y the b e h a v i o r of a l l
the parameter e s t i m a t e s as k v a r i e s .
dimensions
T h e r e f o r e i n s t e a d of s u p p r e s s i n g
e i t h e r by d e l e t i n g c o l l i n e a r v a r i a b l e s or dropping
p r i n c i p a l components of s m a l l importance,
how
s i n g u l a r i t y i s causing i n s t a b i l i t y ,
incorrect
ridge estimate
signs.
the
the Ridge Trace w i l l show
over/under-estimations
and
In c o n n e c t i o n w i t h a u t o c o r r e l a t i o n where r i d g e
r e g r e s s i o n i s even more d e s i r a b l e , c e r t a i n l y the "Ridge T r a c e " w i l l

of
g r e a t h e l p i n g e t t i n g b e t t e r p o i n t e s t i m a t e s and
predictions.
Even when p
be
thereby b e t t e r
i s n e g a t i v e or the p r i n c i p a l components
e
are not a u t o c o r r e l a t e d , the m e r i t s and u s e f u l n e s s of the "Ridge T r a c e "

and the "Ridge R e g r e s s i o n " a r e s t i l l p r e s e r v e d i n d e a l i n g w i t h
problem of m u l t i - c o l l i n e a r i t y .
the
-29-
6.
RIDGE REGRESSION:
ESTIMATES, MEAN SQUARE ERROR AND PREDICTION
The MSE of OLS estimates of g can be written as the difference

in the length between two vectors, $ g and g [11:p.56]
0Tj
(6.1)
E(L*) = E ( i
T
0 L S
) - gg
'
0 L S
(6.1) shows that i n the presence of severe m u l t i c o l l i n e a r i t y , the MSE

can be improved by shortening the OLS estimates of g.
In this section
we w i l l show that t h i s reasoning appears to be compatible with the

derivation of ridge estimator of 'g.
Hence ridge regression can be
expected to be better i n terms of MSE.
61
Derivation of Ridge Estimator for a CLR Model

Let B be any estimate of g.
I t s residual sum of squares, 0 ,
can be written as the value of minimum sum of squares, 0 . , plus

^
mm'
c
the distance from B to &Q g weighted through ( X X ) .

L
0 = (Y-XB) (Y-XB)
T
(6.2)
= ^Xg
= 0
min
) (Y-xg
T
O L S
+ 0(B)
o L s
) + (B-B
) X X(B-g
T
0LS
0LS
For a s p e c i f i c value of 0(B), 0
, the ridge estimator i s founded
by choosing a B to
T
Minimize B B
(6.3)
Subject to ( B - g
) X X (B-g ) = 0
T
0LS
QLS
This problem can be solved by use of Lagrange m u l t i p l i e r techniques
-30-
where (1/k)
The
i s the m u l t i p l i e r c o r r e s p o n d i n g to the c o n s t r a i n t
(6.3).
problem i s to minimize
(6.4)
F = BB
T
(l/k)[(B-6
) X (.?-0LS
T
0 L S
T x
A n e c e s s a r y c o n d i t i o n f o r B to minimize (6.4)
|f
=2B
i[2(X
X)B-2(X
"
0*
i s that
X)^
Hence
[J+^? ?)]?4 ? ?^0LS

T
and
B* = B ( k )
(X X+kI) X Y
T
_ 1
where k i s chosen to s a t i s f y c o n s t r a i n t
work the other way
(6.3).
In p r a c t i c e , we
usually
round s i n c e i t i s e a s i e r to choose a k > 0 and
compute the a d d i t i o n a l r e s i d u a l sum
of squares,
then
0.
Q
I t i s c l e a r t h a t f o r a f i x e d increment 0 , t h e r e i s a continuum of
o'
v a l u e s of B
t h a t w i l l s a t i s f y the r e l a t i o n s h i p 0 = 0 .
+ 0 , and
mxn
o
nu
the r i d g e e s t i m a t e so d e r i v e d
Therefore,
we
may
i s the one
w i t h the minimum l e n g t h .
w e l l expect the r i d g e e s t i m a t e s to y i e l d
l e s s MSE
the presence of m u l t i c o l l i n e a r i t y s i n c e they are o r i g i n a l l y

by m i n i m i z i n g the l e n g t h of the r e g r e s s i o n v e c t o r .
c e r t a i n extent
equivalent
(6.2)
derived
I t i s t r u e to a
t h a t m i n i m i z i n g the l e n g t h of the r e g r e s s i o n v e c t o r
to r e d u c i n g
the MSE
of parameter e s t i m a t e s .
Q L S
T
increase
i n the r e s i d u a l sum
approaching s i n g u l a r i t y .
That i s to say,
is
In a d d i t i o n ,
shows t h a t i t i s p o s s i b l e to move f u r t h e r away from 8
an a p p r e c i a b l e
in
of squares as
ridge regression
(X
X)
may
without
achieve large reduction i n MSE
at v i r t u a l l y no cost i n terms of the

T
r e s i d u a l sum
In 1971,
of squares i f the conditioning of (X X) i s poor enough.
Newhouse and Oman [18] used MSE
as evaluation c r i t e r i o n i n
t h e i r Monte Carlo studies of ridge regression.

the standard way
to evaluate proposals
Since then i t has become
for ridge estimators.
the above derivation of the ridge estimator, obviously we

that ridge estimates are designed to be better i n MSE
Now
From
realize
criterion.
we would l i k e to study the implications of the constraint
i n deriving the ridge estimator.

i n t e r p r e t a t i o n , we represent
Let A = PB
Since orthogonalization can ease
(6.2) i n the rotated axes.
Then
0
= ^ O L s ' V ^ O L S
min
ÔLS V
)
T x
*CA-a
( J L S
X1=1 - V O L S . l * i
(
) 2
Where A^ i s the estimate of regression c o e f f i c i e n t for the i t h component

and a
i s the OLS
ULio
estimate of regression c o e f f i c i e n t for the i t h
component.
The problem i s
T
Minimize A A
(6.5)
Subject
to ( - ?
A
) A(A-a
T
0 L S
0 L S
) = 0
or equivalently
(6.6)
(6.5)
Subject
to
\ (A.-
i=l
Shows that the vector
) X.
2
0LS
A _
= 0
i
?
0 L S
] i s normed through A to have the
length equal to 0 .
Since the eigenvalue X^ can be considered as an
indicator of the information content and explanatory power of the i t h

p r i n c i p a l component, we may well conclude that the derivation of the
ridge estimator has already taken the r e l a t i v e information content
and explaining power of the explanatory variables into account.
(6.6)
shows that the constraint has incorporated the concept of square-errorloss function as w e l l .
It increases the length of A the most when the
parameter estimates of the important
components deviate from OLS
estimates
since i t i s found by taking the square of the deviations multiplied by

their corresponding eigenvalues.
This implies that i t i s best to
shrink the estimate of for those components that have small eigenvalues,
i . e . the ones most subject to
6.2
instability.
Derivation of Ridge Estimator for an ALR Model

In the presence of autocorrelated error terms, the OLS
estimator
of 8 w i l l no longer have the minimum-variance property; the GLS

estimator w i l l be the BLUE of B.
type
Our derivation O f a new ridge
estimator adjusted for autocorrelation, 8
(k), w i l l p a r a l l e l with
~GR
the derivation of 6 (k) i n the previous section.
Again l e t B be
r)
estimate of B..
any
Its residual sum of squares, 0, can be written as
the value of the minimum sum of squares, 0 . , plus the distance from
mm'
r,
B to
rj-i
G L S
weighted through ( X X ) .
A
(See (2.10) for notation).
0 = (Y*-X*Bj (Y*-X*B)
T
" CY -xJ
A
) (Y,-xJ
T
G L S
G L S
) 4- ( B - B
) X^X (B-B
T
G L S
G L S
We have
-33rp
0(B) = 0 , 3
For a s p e c i f i c v a l u e
(6.7)
subject
The
to ( B -
) X^(B-
i s derived
= 0
G L S
)^ ] -
to minimize B B
L a g r a n g i a n i s g i v e n by
* = ? ? + C(?-
T
G L S
> &(?-e
T
A n e c e s s a r y c o n d i t i o n f o r a minimum i s t h a t
2B
+ i
[2X^B
2 ( X ^ ) 3
T h i s reduces t o
(6.8)
B* = 3
G R
(k)
= (X*X* +
for
T -1
'
-1
T
( X Q X + kl)
X n
Where k i s chosen t o s a t i s f y
trace of 3
k l ) "
(6.7).
-1
The c h a r a c t e r i z a t i o n o f the r i d g e
(k) w i l l be e s s e n t i a l l y the same as t h a t of B ^ k ) .
a specific
increment 0 , the 3 , ^ so d e r i v e d
o
~(JK.
i s the r e g r e s s i o n
v e c t o r w i t h the minimum l e n g t h among a continuum o f v a l u e s

will
satisfy
the r e l a t i o n s h i p 0 = 0 ^
m
+ 0 .
Q
For i n s t a n c e ,
of B
that
However m u l t i c o l l i n e a r i t y
may no l o n g e r be a s u b s t a n t i a l problem a f t e r t r a n s f o r m i n g
i n some r a r e cases.
That i s ,
X into X
t h i s may happen i n time s e r i e s
s t u d i e s where m u l t i c o l l i n e a r i t y
i s a r e s u l t o f the e x p l a n a t o r y
i n c r e a s i n g together
I t i s then p o s s i b l e t h a t the transformed
over time.
v a r i a b l e s are not close to being

the case, the r e d u c t i o n
c o l l i n e a r w i t h each o t h e r .
i n MSE can n o t be obtained
variables
I f that i s
with only a s l i g h t
i n c r e a s e i n the r e s i d u a l sum o f squares.

T h i s i s because o f t h e low
"
T
MSE of
a l r e a d y achieved and the n o n - s i n g u l a r i t y o f the ( X ^ X ) .
G T s
In most c a s e s , i f not a l l ,
the m a t r i x
T -1
(X ft X)
i s very l i k e l y
to have
X
a broad
eigenvalue
spectrum i f (X X) does.
on the m o t i v a t i o n of m i n i m i z i n g
the i n t e r p r e t a t i o n and
the l e n g t h of the r e g r e s s i o n v e c t o r ,
i m p l i c a t i o n s o f - t h e c o n s t r a i n t i n the d e r i v a t i o n
of r i d g e e s t i m a t o r of 8 f o r a CLR
Mean Square E r r o r of the " G e n e r a l i z e d

The MSE
Y
p
model w i l l be a p p l i c a b l e to t h a t i n
3.
~GR
the d e r i v a t i o n of
6-3
= X8
=0
of 8^
&
and
Tc
(k)
are r e a d i l y e s t a b l i s h e d .
= a
tr
( X ^ ) '
with
to 3
-1
T
t r (X
model, (5.3)
(6.9)
Since
of 3 as f o l l o w s ,
-GLS
L e t L . = D i s t a n c e from $
2
Estimators"
s a t i s f i e s a l l the assumptions of a CLR
g i v e s the MSE
E(L )
Then the p r e v i o u s d i s c u s s i o n
-1
X)
'
e
S e t t i n g p^ = 0 i n (5.8),
L e t L^(k) = D i s t a n c e
(6.10)
E[L (k)] = a
2
(6.10) g i v e s
the MSE
from ( k ) to 8
of
8 (k).
r R
G R
t r [ ( X f t X+kI) ( X f t X ) ] +
T
2 T T -1
-2
k (X ?+kI) 8
The
(6.10),
expect
effect
since
E(L-j)
a n
of a u t o c o r r e l a t i o n Is d i f f i c u l t
to i n f e r from (6.9)
0, i s not a d i a g o n a l m a t r i x , however, n o r m a l l y we
^
E [ L ^ ( k ) ] to.be l e s s than E(L^) and
and
may
E[L (k)] respectively.

2
-35-
6.4
Estimation
T h e o r e t i c a l l y , t h e GLS g i v e s the BLUE o f 8 f o r an ALR model.
But u s u a l l y i n p r a c t i c e , n e i t h e r the order o f a u t o c o r r e l a t i o n

s t r u c t u r e nor t h e v a l u e o f the parameter p' i s known.
or GR e s t i m a t e s
can not be computed d i r e c t l y .
Many two-stage methods
have been proposed t o approximate the GLS e s t i m a t e s

to be q u i t e e f f e c t i v e .
process
Hence the GLS
and have proven
These i n c l u d e the Cochrane-Ocutt
iterative
[1] and Durbin's two-step method [ 2 ] .
In the j o i n t presence o f m u l t i c o l l i n e a r i t y and a u t o c o r r e l a t i o n ,

it
one
i s a c t u a l l y quite straightforward
incorporated
of .
We i l l u s t r a t e how r i d g e r e g r e s s i o n can be
i n Durbin's two-step method f o r a simple model w i t h
c o l l i n e a r explanatory
(6.11)
.
and
- 3
B l
e t-l
= P
t l
only
variables:
+ 8 X
2
t 2
t=l,2,,..,n
' J
for a l l t
E(u )
= 0
E ( u
t' t+s
u
for s = 0
=0
The
with
o f t h e two-stage r e g r e s s i o n methods i n the hope o f a c h i e v i n g
b e t t e r estimates
two
t o combine r i d g e r e g r e s s i o n
for s 0
transformed r e l a t i o n i s g i v e n by
(6.12)
- P
Y _
t
- 3 (l-p ) + ^ ( X ^ - P ^ . ^
o
+ u.
e (X
2
t 2
-P X
e
l j 2
Combining (6.11) and (6.12) gives

(6.13) Y
- 3 (1-P >-+ ^
~ h e t-l,l
X
2 t2 " W t - l , 2
X
Y
E
t-l
+ U
The f i r s t step i s to estimate the parameters of (6.13) using OLS. Then

use the estimated c o e f f i c i e n t of Y
(Y -p Y _ ),
t
^ to compute the transformed variables
( X ^ - p g X ^ . ^ and (\ ~ e t-l
P
^.
At the second step,
ridge regression i s highly recommended to be used i n place of OLS

and applied to relationship (6.12) containing those transformed
variables.
The c o e f f i c i e n t estimate of (X . - p X
.) i s our
ti
e t-li
approximation of 3
and the intercept term divided by (1-p ) i s our
n
/\
approximation of 3
G R
It might seem reasonable to apply ridge regression at the f i r s t

step of Durbin's method since X,, and X^ are c o l l i n e a r .
t l
t2
v
As stated
e a r l i e r i n Section 3, high pair-wise c o r r e l a t i o n c o e f f i c i e n t of

explanatory
variables does not necessarily r e s u l t i n estimation' i n - .
stability.
Besides, the lagged values of X ^ and X ^ are inserted
into the explanatory
variable set.
I f X ^ and X ^ are not autocorrelated,

T
the conditioning of the enlarged

u
fc
(X X) may be s a t i s f a c t o r y . Moreover
i n (6.12) has a scalar dispersion matrix, therefore OLS gives
consistent estimates of regression c o e f f i c i e n t s .

estimates,
Also among these
only the c o e f f i c i e n t estimate of Y' ^ w i l l be used to compute
the transformed variables.
Hence OLS technique i s recommended to be
used at the f i r s t step even when Xj.^ and X ^ are c o l l i n e a r .

This combination of ridge regression and Durbin's two-step method
can e a s i l y be extended to a p-variable model with higher order of
autocorrelation.
-37-
6.5
Prediction
Consider a f i r s t - o r d e r ALR model, (2.15) gives the minimum
variance predictor (BLUP).
In p r a c t i c e , both $
replaced by their estimated
values.
r T O
and p are
If ridge regression i s used i n conjunction with some other

methods to cope with the j o i n t problem of m u l t i c o l l i n e a r i t y and autoc o r r e l a t i o n , the p r e d i c t i o n i s given by
(6.14)
f c + 1
= X ^ B ^ k )
+P e
st
where X ^
t +
i s a 1 x p vector of the (t+1)
explanatory v a r i a b l e s , B
regression c o e f f i c i e n t s , P
observation on the
(k) i s a p x 1 vector of approximated
i s an estimate of autocorrelation c o e f f i c i e n t
and e^ i s the ridge r e s i d u a l at time t .
THE MONTE CARLO STUDY

Consider a f i r s t - o r d e r ALR model with two explanatory variables,
the error terms i n the transformed r e l a t i o n , as shown by (6.12), have
a scalar dispersion.
The residual sum of squares from (6.12) i s given
by
t=i
t=i
If
t 2 -
e W
and Y^ are given, the summation can run from 1 to n, other-
wise i t can only run from 2 to n.

/N
/S
respect to B , 6^, 8
Q
The direct minimization of (7.1) with
A
a n
2
d P
leads to non-linear equations, therefore,

/N
/S
/\
the analytic expressions for 8 B^> 8^ and

Q
can not be obtained.
As mentioned before, many two-stage methods have been proposed to

approximate these parameters.
Usually i n practice, not only the
parameters and the true error terms but also the order of autocorrelation
structure i s unknown.
As indicated previously, j o i n t presence of
autocorrelation and m u l t i c o l l i n e a r i t y w i l l further complicate the s i t u a tion.
Under t h i s circumstance, the r e l a t i v e effectiveness of those
two-stage methods can best be studied by the Monte Carlo experiments

[19].
7.1
Design of the Experiments

The main purpose of the experiment i s to give an empirical
support to the inference drawn from our analytic studies.

experiments are conducted i n the following manner.
The sampling
B a s i c a l l y , the
sampling experiments comprise nine d i f f e r e n t experiments with d i f f e r e n t
-39-
degree
of multicollinearity and autocorrelation.
They are
summarized in Table 1.
' Table 1
Experiment
1
2
3
4
5
6
7
8
9
12
.05
.05
.05
.50
.50
.50
.95
.95
.95
In our experiments, y ^ and
.05
.50
.90
.05
.50
.90
.05
.50
.90
are used to indicate the severity
of multicollinearity and autocorrelation respectively.. Usually in

practice, multicollinearity constitutes a problem only when Y-^
1 S
a s
high as 0.8 or 0.9. In addition, the error terms are normally considered
to be independent, moderately and highly autocorrelated when p = .05,
.50 and .90 respectively.
As shown by Table 1; the experiments are
set up to have different characteristics. Through this design, we can

study the effects of autocorrelation on estimation and prediction for
a given degree of multicollinearity.
Moreover, we can observe how these
effects of autocorrelation change as the degree of multicollinearity

varies.
The data is generated as follows; f i r s t , values are assigned to
8^, 8 , B and the probability characteristics of error terms u
1
fc
in
-40(2.9).
Three s e r i e s of e
the v a l u e s o f e
i n (2.9) are s u b s e q u e n t l y generated, g i v e n
and d i f f e r e n t v a l u e of p .
e
The
probability
c h a r a c t e r i s t i c s o f the j o i n t d i s t r i b u t i o n X, ., and X^ a r e chosen to

t l
t.2
J
generate the s e r i e s of X ^
t h r e e experiments.
and X ^
We
and X ^
a r e generated f o r the r e m a i n i n g
have a l s o a s s u r e d t h a t t h e r e i s no
f i r s t - o r d e r a u t o c o r r e l a t i o n i n X ^ and X ^
is first-order.
Solving for Y
of f o r t y o b s e r v a t i o n s a r e generated.
so t h a t the e r r o r
structure
sets
F o r each experiment, ten samples

In each sample, t h i r t y
from t = 1 to t = 30 a r e employed
by a p p r o r p i a t e methods.
significant
based on the d a t a , n i n e d i f f e r e n t
can be generated f o r the experiments.
on the Y 's
first
By v a r y i n g the c o r r e l a t i o n c o e f f i c i e n t o f X ^ and
X 2 another two s e r i e s o f X ^
s i x experiments.
t h a t a r e s u i t a b l e f o r the
observations
to e s t i m a t e the e q u a t i o n
O b s e r v a t i o n s 31 to 40 a r e used to study the
p r e d i c t i o n p r o p e r t i e s of e s t i m a t o r s .
The BLUP i s used i n the presence
of s i g n i f i c a n t a u t o c o r r e l a t i o n i n the e r r o r
terms.
S p e c i a l c a r e has to be e x e r c i s e d i n c o n t r o l l i n g the s e r i a l
c o r r e l a t i o n p r o p e r t i e s of the e r r o r terms.
In t h i s c o n n e c t i o n , OLS
r e g r e s s i o n has t o be r u n on
(7.2)
=pe
3
,+u
J
j = 1,2,...,10
t = 1,2
40
to determine whether the e s t i m a t e d r e g r e s s i o n c o e f f i c i e n t

c o n s i s t e n t w i t h the p which i s used to generate them.
i s well-known,
the OLS
is
However, as
e s t i m a t e s o f parameters f o r s m a l l samples
may
be b a d l y b i a s e d i f some o f the r e g r e s s o r s are lagged dependent

variables
[23].
T h i s i s because the e r r o r terms, u_.
fc
be independent of the r e g r e s s o r s , j _ ^ >

E
jt''"'' j40"
G
w i l l no
*
longer
^ )>
2
-41-
( -i.>
jt
)
jt+s
-4.j_
0 for s
4- 0
and a l l t , hence the OLS e s t i m a t e f o r the
c o e f f i c i e n t of ..-, i s b i a s e d .
3t-l
The u s u a l t t e s t on the e s t i m a t e of
r e g r e s s i o n c o e f f i c i e n t may be q u i t e m i s l e a d i n g , t h e r e f o r e we can o n l y
a s c e r t a i n t h a t the d e s i r e d s e r i a l
by a s s u r i n g t h a t u ^
first
t e s t whether
c o r r e l a t i o n p r o p e r t i e s are o b t a i n e d
are randomly d i s t r i b u t e d .
the s e r i e s o f u
For each sample,
i s c o n s i s t e n t w i t h the p r o b a b i l i t y
c h a r a c t e r i s t i c s chosen t o generate them, then we use run

determine whether u
fc
we
i s randomly d i s t r i b u t e d .
t e s t to
Only those s e r i e s of u
passed a l l the t e s t s are adopted i n our s i m u l a t i o n study.
We are
now
ready to e s t i m a t e the r e g r e s s i o n e q u a t i o n .
First,
the
f o r each experiment, the OLS p r i n c i p l e i s a p p l i e d to e s t i m a t e
parameters.
The Durbin-Watson
statistic
t e s t the e x i s t e n c e of a u t o c o r r e l a t i o n .
statistic
i s used as a f i l t e r
Whenever the
to
Durbin-Watson
computed from the f i t t e d model i s l e s s than the c o r r e s p o n d i n g
upper c r i t i c a l v a l u e d^ (a=0.05), a u t o c o r r e l a t i o n i s assumed to'be

p r e s e n t i n the e r r o r terms, then D u r b i n ' s two-step method i s used i n con j u n c t i o n ' w i t h Ridge r e g r e s s i o n f o r e s t i m a t i o n as d e s c r i b e d i n
S e c t i o n 6.4;
o n l y OLS
purposes.
o t h e r w i s e , a u t o c o r r e l a t i o n i s assumed to be absent, and
and Ridge r e g r e s s i o n s t e c h n i q u e s are employed
for- e s t i m a t i o n
In a d d i t i o n , whenever the e x i s t e n c e of a u t o c o r r e l a t i o n i s
~
r e c o g n i z e d , 3
~
Tc
and B
-'
(k) are computed f o r comparison purposes.
Since the t r u e v a l u e of the a u t o c o r r e l a t i o n c o e f f i c i e n t

each experiment, c a l c u l a t i o n s o f 3
and 3(k)
- GLo
~ GR
i s known f o r
w i l l simply be the
s t r a i g h t f o r w a r d m u l t i p l i c a t i o n of m a t r i c e s as shown by
(2.13) and
(6.8).
The methods adopted f o r e s t i m a t i o n i n each experiment are r e c o r d e d i n

T a b l e 2.
-42-
Table 2
Experiment
Method
OLS,
Durb., Durb.+RR, GLS,
GR
GR
OLS,
OLS:
RR
RR
GR
Durb., Durb.+RR,. GLS,
GR
OLS,
GR
9.
GR
RR
O r d i n a r y L e a s t --squares
Durb.:
Durbin's
Durb.+RR:
Regression
Two--step Method
Durbin's Two-step i n c o n j u n c t i o n w i t h
Ridge R e g r e s s i o n
GLS:
GR:
C e n t r a l i z e d Least-squares
Regression
Ridge R e g r e s s i o n a d j u s t e d f o r A u t o c o r r e l a t i o n
As i s expected, no c o r r e c t i o n f o r a u t o c o r r e l a t i o n i s n e c e s s a r y f o r
experiments
1, 4 and 7.
Whenever t h e r i d g e method i s a p p l i e d seven o r
e i g h t v a l u e s o f k have been used i n our study.
I n o r d e r t o minimize
the e f f e c t s o f s u b j e c t i v i t y r e s u l t i n g from s e l e c t i n g the v a l u e o f k
A.
i n r i d g e r e g r e s s i o n s , we compute the mean 6 o f the samples f o r every

~R
s p e c i f i c v a l u e o f k i n each experiment.
based
on a "Mean Ridge T r a c e " .
That
Then the v a l u e o f k i s s e l e c t e d
i s to say, a unique
s e l e c t e d w i l l g e n e r a l l y be t h e b e s t f o r a l l then samples.
value of k
Obviously,
the v a l u e o f k s o - s e l e c t e d may w e l l n o t t o be the b e s t f o r every

i n d i v i d u a l sample.
T h e r e f o r e , t h e minimum o f t h e MSE o f r i d g e
e s t i m a t e s o f 3 a c h i e v e d f o r each experiment
i s s l i g h t l y upward b i a s e d .
C e r t a i n l y t h i s way o f s e l e c t i n g the v a l u e o f k cannot be used i n

practice.
7.2
Sampling R e s u l t s
For
each method f o r each experiment, the MSE o f the e s t i m a t e s

2
of
t h e a d j u s t e d R , t h e r e s i d u a l sum o f squares, the MSE f o r e c a s t
and the Durbin-Watson s t a t i s t i c a r e averaged over t e n samples.

a d d i t i o n , t h e mean e s t i m a t e
statistic
a r e a l s o computed,
In
o f p . and t h e mean H a i t o v s k y h e u r i s t i c
u
i s assumed to f o l l o w a normal
d i s t r i b u t i o n w i t h mean zero and v a r i a n c e e q u a l t o 6 i . e . u ~'N(0,'6).

The t r u e mean o f X
i s 10 and t h a t o f X
tl
variance of X
i s 8.
and X ^
18 and 15.
i s chosen t o be 3 f o r each
sample.
The t r u e v a l u e o f BQ i s 5, 8^ i s 1.1 and
7.2a R e s u l t s assuming p i s Known
F i r s t we assume p
i s known.
best of s i t u a t i o n s .
i s 1.
The r e s u l t s here w i l l
whether the methods d e s c r i b e d i n S e c t i o n 6.4

the
The r e s p e c t i v e
t2
indicate
can show promise i n
T a b l e 3 c o n t a i n s t h e average MSE o f 8
~ Lrii b
and 3 f o r experiments 2, 3, 5, 6, 8 and 9.

_GR
Table 3
\5xperiment
k
\.
(Y =.05) (Y =.05) (Y =.50) (Y =-50) (Y =.95) (Y =.95)

12
(P
0.0*
12
=.50)
(P
12
=.90)
(P
12
=.50)
12
=.90)
(P
12
=.50)
(P.
=.90)
.4824
3.2913
.0834
2.3940
.025
.0591
1.8084
.0018
1.4991
.05
.0363
.8073
.1215
.8337
.075
.3603
.2274
.3033
.4263
.1
.9849
.0156
.8871
.1092
.4929
.2901
.2
5.7786
2.0100
4.0851
.6153
2.4426
.4242
.3
2.9771
7.0896
9.3743
4.0521
5.4924
1.0113
.5
30.1494
21.7887
21.1581
11.4165
13.7628
4.8285
.7
47.5367
36.6241
35.2063
22.2489
23.7003
10.9524
1.0
75.6732
63.4542
56.5869
39.9762
38.5830
.1011
2.1681
.0561
.9405
.
* GLS regressions can be considered as a s p e c i a l case of Ridge

regressions, adjusted for autocorrelation, with k = 0.
In Section 5, i t has been shown that the MSE of 8
will
increase
rapidly i f s i g n i f i c a n t l y p o s i t i v e autocorrelation exists both i n the

disturbances and i n the p r i n c i p a l components.
Correction for auto-
c o r r e l a t i o n w i l l then be necessary i n estimating the regression
equation.
Though the GLS regression y i e l d s the BLUE of 8 , the behavior of the MSE
of 8
i s very d i f f i c u l t to i n f e r from (6.10).
~Gljb
From the MSE of
experiments 3, 6 and 9 when k = 0, we observe that the MSE of 8

decreases as the degree of m u l t i c o l l i n e a r i t y increases for s u f f i c i e n t l y
high degree of autocorrelation.
On the other hand, for a given degree
of m u l t i c o l l i n e a r i t y , the MSE.of 8
autocorrelation increases.
w i l l increase as the degree of
But the magnitude of the increase i n the MSE
-45of B
decreases as the r e l a t i o n among explanatory variables increases.
For instance, the difference i n MSE
of 3
r T 0
between experiments 8 and
~ GL< o
9 i s less than that between experiments 5 and 6.
Moreover, Table 3
shows that there exists at least one value of k for each experiment
such that the MSE
of 8
i s less than that of 8 .

TC
= 9,
Note that when
~GLiD
~GR
k = .1 obtains the minimum MSE
of the estimates of g i n
T
experiment 9.
This also implies that the transformed matrix (X^X^) i s
s t i l l ill-conditioned.
In Section 5.3 we have shown that the range
of k such that the MSE
of 3- i s less than that of B

w i l l be larger
~R
~OLS
T c
i f m u l t i c o l l i n e a r i t y i s accompanied by high degree

Now
P
with parameter estimates f u l l y adjusted
of autocorrelation.
for autocorrelation (since
i s known) experiment 9 s t i l l has the largest admissible
range of k.
That i s , the range of k such that the MSE

B is s t i l l
of g
i s less than that of
~GR
larger i f m u l t i c o l l i n e a r i t y i s accompanied by high
~Gijb
degree of autocorrelation and autocorrelation has been f u l l y

We
adjusted.
also observe that as the degree of autocorrelation increases, a
larger reduction i n the MSE of the estimates of 3 can be obtained by

replacing B with $ . For instance, the difference i n the MSE of
~GLo
~GR
3 and 3
(.05) i n experiment 8 i s less than that i n experiment 9.
~G1JO
~GK
c
n T
of B i s very d i f f i c u l t , i f not
~GR
(6.10) shows that the MSE of $
i s comprised
~GR
However, the behavior of the MSE

impossible,to
of two
terms.
predict.
How
each term behaves w i l l depend not only on
the
data matrix X and the degree of autocorrelation but also on the
way
the matrix X. i s linked with the matrix 0. ^.

7.2b
Results assuming p
In practice p
i s unknown
i s unknown.
c o e f f i c i e n t i s unknown and we
We
assume that the true autocorrelation
t r y to f i t the equation using h e u r i s t i c
-46-
techniques akin to the Durbin's two-step method i n which has been

shown by G r i l i c h e s and Rao [8] to perform well when there i s autocorrelation.
We apply these techniques as described i n Section 7.1.

2
Tables 4,5 and 6 report the mean adjusted R
and the mean Durbin-
Watson s t a t i s t i c for each experiment. (du(a = 5%) = 1.57 for

experiments L-4 and 7; du(a = 5%) = 1.56 f o r theremaining six
experiments)
Table 4 ( Y
Experiment
1 2
= .05)
1
V
2
.05
= .50
.90
2*
R
a
d**
.8640
2.0791
.8979
1.8766
.9149
1.8380
.025
.8640
2.0903
.8884
1.8713
.9144
1.8269
.05
.8620
2.1001
. 8868
1.8728
.9128
1.8286
.075
.8579
2.1083
.8844
1.8805
.9104
1.8409
.1
.8567
2.1154
.8813
1.8916
.9071
1.8613
.2
. 8401
2.1353
.8619
1.9619
.8888
1.9809
.3
.8167
2.1463
.8399
2 .0383
.8648
2.1030
.5
.7651
2.1536
.7865
2.1563
.8102
2.2789
1.0
.6394
2.1527
.6571
2.2960
.6780
2.4740
k
0.0
* R :
*d :
2
R
a
the mean adjusted R'

the mean Durbin-Watson
statistic
2
R
a
Table 5
(Y,
= -50)
Experiment
5
05
.50
P
0.0
2
R
.90
' d
.8973
2.0984
.9178
1.8958
.9475
2.0754
.025
.8970
2.1040
.9175
1.9054
.9472
2.0865
.05
.8962
2.1095
.9168
1.9194
.9464
2.1059
.075
.8950
2.1147
.9155
1.9371
.9451
2.1317
.1
.8933
2.1198
.9138
1.9576
.9434
2.1622
.2
.8832
2.1381
.9032
2.0540
.9336
2.3024
.3
.8702
2.1592
.8895
2.1504
.9184
2.4532
.5
.8351
2.1721
.8546
2.2988
.8834
2.6086
.7
.7970
2.1826
.8187
2.4203
.8440
2.7084
1.0
.7412
2.1900
.7572
2.4732
.7846
2.7887
Table 6
(y., = . 9 5 )
Experiment
V
k
8
.05
9
50
90
.9208
2.0511
.9391
1.8785
.9575
1.8707
.05
.9201
2.0873
.9380
1.9058
.9565
1.8883
.1
.9181
2.1128
.9359
1.9437
.9544
1.9335
.2
.9116
2.1473
.9293
2.0280
.9677
2.0562
.9024
2.1659
.9199
2.1084
.9382
2.1660
.5
.8786
2.1768
.8957
2.2312
.9136
2.8381
.7
.8505
2.1725
.8673
2.3082
.8847
2.4417
1.0
.8056
2.1594
.8217
2.3739
.8399
2.5259
0.0
-48-
From T a b l e s 4-6, we observe t h a t the a d j u s t e d
I n c r e a s e s as the
degree of a u t o c o r r e l a t i o n i n c r e a s e s f o r a g i v e n v a l u e o f k and a g i v e n
degree o f m u l t i c o l l i n e a r i t y .
This i s i n t u i t i v e l y
plausible
since
a u t o c o r r e l a t i o n can account f o r p a r t o f the v a r i a t i o n i n the e r r o r s ,

thereby d e c r e a s i n g the r e s i d u a l sum of squares and i n c r e a s i n g the
2
adjusted R .
increases.
F o r each experiment, the a d j u s t e d R
d e c r e a s e s as k
The r e a s o n i s obvious from the d e r i v a t i o n of r i d g e
B e s i d e s , the b e s t R
is,
2
cl
estimators.
a c h i e v e d f o r each experiment i s p r e t t y h i g h , t h a t
the e s t i m a t e d model can e x p l a i n most o f the v a r i a t i o n i n Y^_.
This
a l s o i m p l i e s that the e s t i m a t i o n methods adopted i n our experiments are

fairly efficient
and powerful.. The mean Durbin-Watson
statistic
computed f o r each method f o r each experiment i s h i g h enough to

a s c e r t a i n t h a t the f i t t e d model has s u c c e s s f u l l y removed the problem
of
autocorrelation.
S i n c e the model i s r e a s o n a b l y w e l l f i t t e d ,
simulation
comparisons o f t h e e x p e r i m e n t a l r e s u l t s s h o u l d be m e a n i n g f u l as w e l l as
informative.
The average MSE o f the e s t i m a t e s o f i s computed f o r each method
for
each experiment and r e c o r d e d i n T a b l e 7.
Table 7
^Êxperiment.
1
.1101
.4824
.9594
.0030
.0342
.3996
.0180
.0720
.6951
.025
.0192
.0570
.2691
.1104
.0210
.0945
.05
.2865
.0390
.0087
.4158
.1965
.0024
.1833
.0719
.0939
.075
.8820
.3744
.1200
.8973
.5778
.1026
.1
1.7430
1.0167
.5559
1.5366
1.1097
.3765
.8307
.5643
.0041
.2
7.3539
5.9115
4.7694
5.3823
4.5690
2.7540
3.2001
1.2549
.5822
.3
15.1383
13.2456
11.5962
10.8432
9.5949
6.8219
6.6531
5.7624
3.3012
.5
33.5067
31.1559
28.8369
23.9694
22.3449
18.6255
15.6804
14.2377
10.0251
.7
38.6334
36.9396
32.0214
26.3049
24.3789
18.5115
79.3134
76.8708
60.7620
50.3176
52.8276
43.3464
40.2351
32.3991
k
0.0
1.0
-
73.8630
-50-
As t h e known a u t o c o r r e l a t i o n case, when k = 0 t h e MSE o f

e s t i m a t e s o f B i n c r e a s e s as t h e degree o f a u t o c o r r e l a t i o n i n c r e a s e s ,
g i v e n the degree o f m u l t i c o l l i n e a r i t y .
However, b e i n g
different
from t h e known a u t o c o r r e l a t i o n case, t h e MSE o f e s t i m a t e s o f 8 f i r s t

decreases
then i n c r e a s e s as t h e degree o f m u l t i c o l l i n e a r i t y i n c r e a s e s
f o r k = 0 and a g i v e n degree o f a u t o c o r r e l a t i o n .
except
f o r experiments
T a b l e 7 shows t h a t
4 and 7, b e t t e r e s t i m a t e s o f B i n MSE
criterion
can be o b t a i n e d i f D u r b i n ' s
two-step method i s combined w i t h r i d g e
regression f o r estimation.
B e s i d e s , amazingly we have found t h a t we
a r e a b l e t o o b t a i n b e t t e r e s t i m a t e s o f B i n terms o f MSE i f t h e t r u e
autocorrelation coefficient P
i s unknown.
F o r c l a r i t y , we
shall
compare o n l y t h e minimum o f t h e average MSE o f t h e e s t i m a t e s o f B

a c h i e v e d f o r each experiment
p,
known c a s e .
i n the p
unknown case w i t h t h a t i n the
T a b l e 8 r e p o r t s t h e minima o f t h e average MSE o f t h e
e s t i m a t e s o f 8 a c h i e v e d f o r each experiment
unknown c a s e s .
i n both the
known and
I n a d d i t i o n , t h e e s t i m a t i o n method and the c h a r a c t e r i s t i c s
of each experiment
are also tabulated.
-51-
Table 8
(p
(p
unknown)
known)
^ s E x p e r imen t
Experiment
(r
1 2
,P )
Estimation
Method
Min. MSE
of g
Min.
^GR
(.05,.05)
RR
.025
.0192
(.05,.50)
. Durb.+RR
.05
.0390
.05
.0363
(.05,.90)
Durb.+RR
.05
.0087
.1
.0156
(.50,.05)
.0030
(.50,,50)
Durb.+RR
.025
.0210
. 025
.0018
(.50,.90)
Durb.+RR
.05
.0024
.1
.1092
(.95,.05)
.0180
(.95,.50)
Durb.+RR
.05
.0719
.05
.0561
(.95,.90)
Durb.+RR
.1 .
.0441
.1
.2901
OLS
0.0
OLS
0.0
A couple o f i n t e r e s t i n g o b s e r v a t i o n s can be made from T a b l e 8.

if
First
the degree o f m u l t i c o l l i n e a r i t y i s h e l d c o n s t a n t , the minimum o f the
average
MSE o f parameter e s t i m a t e s w i l l
first
the degree o f a u t o c o r r e l a t i o n i n c r e a s e s .
i n c r e a s e then decrease as
On the o t h e r hand, g i v e n the
degree o f a u t o c o r r e l a t i o n , t h e minimum o f the average

estimates w i l l
first
decrease
c o l l i n e a r i t y increases.
MSE o f the parameter
then i n c r e a s e as the degree o f m u l t i -
These a r e i n t u i t i v e l y p l a u s i b l e s i n c e s u f f i c i e n t
h i g h degree o f a u t o c o r r e l a t i o n s h o u l d - l e a d to more s t a b l e parameter

e s t i m a t e s w h i l e s u f f i c i e n t h i g h degree o f m u l t i c o l l i n e a r i t y
usually
r e s u l t s i n v e r y u n s t a b l e parameter e s t i m a t e s .
observe
t h a t the v a l u e o f r i d g e parameter k, used

of
MSE
Secondly, we
t o a c h i e v e the minimum MSE
the e s t i m a t e s o f 8, i n c r e a s e s w i t h the degree o f m u l t i c o l l i n e a r i t y
and a u t o c o r r e l a t i o n .
T h i s i s c o n s i s t e n t w i t h our a n a l y t i c
findings
shown i n Section 5.3. Moreover, we have found that knowing
does not
give better estimates of 6 for s u f f i c i e n t high degree of autocorrelation.

This may r e s u l t from sample sizes being small.
Table 9 contains the mean estimates of p^ obtained i n the f i r s t step
of Durbin's method and the mean Haitovsky h e u r i s t i c s t a t i s t i c for each
experiment.
Table 9
Experiment
Bias i n p
e
H
x^
df = 3
2
.3581
125.7
.7182
.3586
.7231
.1419
.1818
.1414
.1769
123.1
111.7
38.7
37.4
39.1
2.78
.3849
.7498
.1151 .1502
2.53
2.40
In a l l cases, Durbin's two-step method tends to underestimate the true

autocorrelation c o e f f i c i e n t .
This results from the presence of the lagged
Y values among the explanatory variables [16].
I f the degree of multi-
c o l l i n e a r i t y Is held constant, the bias of estimate of p^ increases as the

degree of autocorrelation increases; while given the degree of autocorrelation, the bias decreases as the degree of m u l t i c o l l i n e a r i t y increases.
In our simulation study, Haitovsky h e u r i s t i c s t a t i s t i c can recognize the
existence of severe m u l t i c o l l i n e a r i t y i n experiments
7, 8 and 9.
However,
i t does not give any warning when there exists a f a i r l y high degree of
m u l t i c o l l i n e a r i t y , i . e . based on the Haitovsky test, m u l t i c o l l i n e a r i t y
i s i n s i g n i f i c a n t i n experiments
4, 5 and 6.
Since the Haitovsky test
i s based on the determinant of correlation matrix, i t has some b u i l t - i n

defficiencies
(see Section 3.3 for d e t a i l s ) .
Our experiments have
-53-
disclosed these d e f f i c i e n c i e s to a certain extent, hence we suggest that

s p e c i a l care has to be exercised i n applying this test.
7.2.C.
Forecasting
Tables 10, 11 and 12 report the average residual sums of squares and
the mean square error of prediction from the given values f o r the forecast period of each experiment, under the assumption that
Table 10
Experiment
k
1 2
= .05, a* = 6)
1
a
0.0
(r
i s unknown.
AA
2*
AA
F/C
F/C
AA
M S E
F/C
5.9700
8.1132
5.7055
9.1966
5.6351
10.939
.025
5.9924
8.0343
5.7332
9.6623
5.6721
10.952
.05
6.0554
7.9961
5.8113
9.0620
5.7762
11.034
.075
6.1536
7.9986
5.9328
9.1072
5.9382
11.173
.1
6.2824
8.0343
6.0918
9.1913
6.1501
11.360
.2
7.0250
8.4293
7.0074
9.8181
7.3716
12.470
.3
8.0022
9.0877
8.2087
10.739
8.9754
13.932
.5
10.245
10.755
10.957
12.944
12.6521
17.249
1.0
15.733
15.075
17.656
18.419
21.649
25.164
"2
a : the average of the residual sums of squares over ten samples.
** MSE ^ :
F
the average MSE of predictions from the given values for

the forecast period.
-54-
Table 11
(r, = .50, a
k
0.0
Experiment
-2
a
u
= 6)
12
M S E
F/C
-2
. a
u
M S E
F/C
"2
a
u
M S E
F/C
6.0169
8.2093
5.8625
9.3838
5.7757
10.733
.025
6.0331
8.1743
5.8828
9.3691
5.8038
10.731
.05
6.0797
8.1690
5.9409
9.3874
5.8838
10.672
.075
6.1541
8.1905
6.0330
9.4351
6.0109
10.850
.1
6.2518
8.2360
6.1559
9.5093
6.1803
10.961
.2
6.8444
8.6142
6.8849
10.021
7.1976
11.679
.3
7.2030
9.2476
7.9231
.10.783
8.6992
12 .482
.5
9.7190
10.851
10.470
12.735
12.128
15.327
.7
11.933
12.726
13.070
14.851
15.759
18.018
1.0
15.429
15.620
17.566
18.301
21.929
22.680
2
Table 12
a
u
0.0
= .95, a
^F/C
= 6)
Experiment
k
(r
-2
a
u
MSE .
F/C
9
-2
a
u
M S E
F/C
6.0443
8.3699
5.6725
9.8589
5.4033
11.335
.05
6.1186
8.1165
5.7759
9.4141
5.5390
10.758
.1
6.2753
8.1220
6.0473
9.3568
5.8110
10.754
.2
6.7890
8.4120
6.6160
9.5945
6.7433
11.172
.3
7.5180
8.9408
7.5161
10.116
7.9187
12.001
.5
9.4101
9.8466
11.683
11.120
. 14.301
10.445
.7
11.643
.12.290
12.584
13.651
14.881
17.123
1.0
15.198
15.343
16.973
16.918
20.713
21.591
Though the BLUP i s adopted f o r f o r e c a s t purposes, the MSE

prediction w i l l
increases.
still
i n c r e a s e as the degree of
of
autocorrelation
However, the main p o i n t i s t h a t the presence of m u l t i -
c o l l i n e a r i t y w i l l adversely
disturbances
a f f e c t the p r e d i c t i v e performance i f the
are h i g h l y s e r i a l l y c o r r e l a t e d .
The
t h a t the p r e d i c t i v e power o f the model i s not
commonly h e l d
a f f e c t e d by
belief
existence
of m u l t i - c o l l i n e a r i t y i s o n l y t r u e i f the problem of a u t o c o r r e l a t i o n
is
not
serious.
I n the 9th experiment, the model f i t t e d by
method g i v e s s a t i s f a c t o r y r e s u l t s on v a r i o u s
diagnostic tests,
the p r e d i c t i o n of the BLUP l e a v e s much t o be d e s i r e d .
determine the model which w i l l y i e l d l e s s MSE

c r i t e r i a and
c o l l i n e a r i t y and
not
l e s s MSE
We
to
perform
a l s o observed t h a t the v a l u e
of p r e d i c t i o n s .
e s t i m a t e s of B i n MSE
of
of squares w i l l
However, the
c r i t e r i o n tends to
of p r e d i c t i o n f o r each experiment.
the v a l i d i t y of the MSE
o f p r e d i c t i o n and
e s t i m a t e of the r e s i d u a l sum
u s u a l l y g i v e the minimum MSE
o f k g i v i n g the b e s t
are a b l e
t e s t s i n the j o i n t presence of m u l t i -
autocorrelation.
k t h a t y i e l d s the b e s t
still
Fortunately,
w i t h Durbin's method combined w i t h Ridge r e g r e s s i o n , we
w e l l on v a r i o u s
Durbin's
Hence, we
may
value
yield
conclude
c r i t e r i o n i n the e v a l u a t i o n of parameter
estimates.
To
avoid
c o n f u s i o n , we
based on
g
~GLS
and
obtained
However, we
true p
have not
reported
found t h a t the MSE
the MSE
of p r e d i c t i o n based
i s l e s s than t h a t based on the parameter e s t i m a t e s
by Durbin's two-step method i n c o n j u n c t i o n
regression.
of p r e d i c t i o n
with
ridge
Though Durbin's two-step method combined w i t h
r e g r e s s i o n g i v e s b e t t e r e s t i m a t e s o f 3, but
underestimates p .
Therefore,
ridge
i n general i t
the BLUP based on
(3
and
true p
on
g i v e s the minimal MSE o f p r e d i c t i o n i n each o f the experiments 2, 3,

5, 6, 8, and 9.
CONCLUSIONS
I t has been shown t h a t i n the presence o f m u l t i c o l l i n e a r i t y w i t h
s u f f i c i e n t h i g h degrees o f a u t o c o r r e l a t i o n .
The OLS e s t i m a t e s o f
r e g r e s s i o n c o e f f i c i e n t s can be h i g h l y i n a c c u r a t e .
estimation
procedure i s o b v i o u s l y n e c e s s a r y .
r e g r e s s i o n , we d e r i v e d a new
Combining GLS and Ridge
estimator.
(k) = (xV'-'-X+kl)'" X n~ Y
~ .. ~
~
~
~
T
~GR
Improving the
where 0 < k < 1
andft i s defined i n (2').
&(k), though b i a s e d ,
i s expected to perform w e l l i n the j o i n t presence o f m u l t i c o l l i n e a r i t y

and
autocorrelation.
based on the b i a s e d
Therefore,
However, s i n c e ft i s unknown,parameter e s t i m a t e s
estimator
B^Ck)
cannot be o b t a i n e d
i n practice.
we combined Durbin's two-step method w i t h o r d i n a r y
r e g r e s s i o n to approximate those parameters.
The e f f e c t i v e n e s s o f our
a p p r o x i m a t i o n can then b e s t be examined by the Monte C a r l o

Our
simulation.
study has confirmed t h a t , f o r a g i v e n degree o f m u l t i -
c o l l i n e a r i t y , the MSE o f the GLS e s t i m a t e s o f 3 i s d i r e c t l y

to the degree o f a u t o c o r r e l a t i o n .
wisdom.
Ridge
T h i s agrees w i t h
proportioned
conventional
Unexpectedly, we found t h a t the MSE o f t h e GLS e s t i m a t e s o f
3 i s i n v e r s e l y p r o p o r t i o n a l to the degree of m u l t i c o l l i n e a r i t y f o r a
s u f f i c i e n t l y high
degree o f a u t o c o r r e l a t i o n .
T h i s i m p l i e s t h a t i n the
a p p l i c a t i o n o f the GLS t e c h n i q u e , the symptom o f the e x i s t e n c e o f

m u l t i c o l l i n e a r i t y may be d i s g u i s e d .
However, s i n c e i n p r a c t i c e n e i t h e r
the t r u e e r r o r terms nor

GLS
in
the a u t o c o r r e l a t i o n c o e f f i c i e n t
e s t i m a t e s can p o s s i b l y be o b t a i n e d .
We
were p l e a s e d
the j o i n t presence of m u l t i c o l l i n e a r i t y and
i s known, no
to f i n d
autocorrelation;
whatever the degree i s , Durbin's two-step method i n c o n j u n c t i o n

Ridge r e g r e s s i o n
the GLS
with
(p^ unknown) y i e l d s even b e t t e r e s t i m a t e s of 8 than
technique ( p
known) does i n MSE
criterion.
Though the
o f k g i v i n g b e t t e r e s t i m a t e s of 3 tends to y i e l d l e s s MSE
prediction, s t i l l
the cases.
that
the GLS
gives
value
of
the minimal-MSE o f p r e d i c t i o n i n a l l
Besides, our e x p e r i m e n t a l r e s u l t s have shown t h a t
Durbin-Watson t e s t f o r d e t e c t i n g the e x i s t e n c e
the
of f i r s t - o r d e r
auto-
c o r r e l a t i o n remains p o w e r f u l i n the presence of m u l t i c o l l i n e a r i t y w h i l e

the H a i t o v s k y h e u r i s t i c s t a t i s t i c g i v e s r e l a t i v e l y l i m i t e d
about the e x i s t e n c e
of - m u l t i c o l l i n e a r i t y e i t h e r w i t h or without
presence of a u t o c o r r e l a t e d
Our
"optimal
autocorrelation.
to the s e a r c h
independent phenomena.
Empirical research
f o r optimal
d e a l i n g w i t h m u l t i c o l l i n e a r i t y and
and
find,an
package" t h a t d e a l s w i t h the j o i n t problem of
m u l t i c o l l i n e a r i t y and
been c o n f i n e d
the
e r r o r terms.
r e s u l t s a l s o suggest t h a t i t might be p o s s i b l e to
estimation
information
The
estimation
autocorrelated
ordinary
has
hitherto
techniques i n
e r r o r s as
separate
ridge regression, i . e . ,
T
adding a constant
k on the d i a g o n a l
of c o r r e l a t i o n m a t r i x (X X)
Durbin's two-step method have been shown to be v e r y

techniques i n h a n d l i n g
m u l t i c o l l i n e a r i t y and
Even though s a t i s f a c t o r y e s t i m a t i o n
and
the combination o f Durbin's method and

t h e r e may
still
e x i s t some o t h e r
and
powerful
a u t o c o r r e l a t i o n problems.
p r e d i c t i o n are o b t a i n e d
ordinary
ridge
by
regression,
even more e f f i c i e n t approaches to
the j o i n t problem o f m u l t i c o l l i n e a r i t y and
autocorrelation.
For
-58-
i n s t a n c e , the combination
of the Cochrane-Orcutt
procedure
G e n e r a l i z e d Ridge r e g r e s s i o n i s a more f l e x i b l e e s t i m a t i o n
and
thereby
with
technique
s h o u l d l e a d to b e t t e r e s t i m a t i o n and p r e d i c t i o n .
Allowing
f o r h i g h e r o r d e r and mixed o r d e r a u t o c o r r e l a t i o n w i l l be a good

d i r e c t i o n to pursue as w e l l .
BIBLIOGRAPHY
Cochrane, D. and Orcutt, G. H. (1949). Application of l e a s t squares regressions to relationships containing autocorrelated
error terms. J . Am. S t a t i s t . Assoc., 44, 32-61.
Durbin, J . (1960). Estimation of parameters i n time-series
regression models. J . Royal S t a t i s t . S o c , Series B,
139-153.
Durbin, J . (1970). Testing for s e r i a l c o r r e l a t i o n i n l e a s t squares regression when some of the regressors are lagged
dependent variables. Econometrica, 38, 410-421.
Durbin, J . and Watson, G. S. (1950). Testing f o r s e r i a l
c o r r e l a t i o n i n least-squares regression (part 1). Biometrica,
J 3 7 , 409-428.
Durbin, J . and Watson, G. S. (1951). Testing for s e r i a l c o r r e l a t i o n
i n least-squares regression (part 2). Biometrica, 38, 159-178.
Farrar, D. C. and Glanber, R. R. (1967). M u l t i c o l l i n e a r i t y i n
regression analysis: the problem r e v i s i t e d . Rev. Economics
S t a t i s t i c s , 49, 92-107.
G r a y b i l l , F. A. (1976). Theory and Application of the Linear
.Model. Daxburg Press, North Scituate, Mass.
G r i l i c h e s , Z. and Rao, P. (1969). Small-sample properties'of
several two-stage regression methods i n the context of .
autocorrelated errors. JASA, 64, 253-272.
Hailovsky, Y. (1969). M u l t i c o l l i n e a r i t y i n regression analysis:
comment. Rev. Economics S t a t i s t i c s , 486-489.
Henshaw, R. C., J r . (1966). Testing single-equation least squares
regression models for autocorrelated disturbances.
Econometrica,
34, 646-660.
Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression:
biased
estimation for nonorthogonal problems. Tech., 12, 55-67.
Hoerl, A. E., Kennard, R. W., and Baldwin, K. F. (1975). Ridge
regression: some simulations. Comm. Stat. , k_, 105-123.
Johnston, J . (1972).
Econometric Methods.
2nd edn., McGraw-Hill.
K l e i n , L. R. (1962).
Hall.
An Introduction to Econometrics.
Prentice-
-60-
[15]
L i n , T. C. (1960). Underidentification, s t r u c t u r a l estimation and

forecasting. Econometrica, 28, 856.
[16]
Marriott, F. H. C. and Pope, J . A. (1954). Bias i n the estimation

of autocorrelation. Biometrica, 41, 390-402.
[17]
Nerlove, M. and W a l l i s , K. F. (1966). Use of the Durbin-Watson

S t a t i s t i c i n inappropriate s i t u a t i o n s . Econometrica, 34,
235-238.
[18]
Newhouse, J . P. and Oman, S. D. (1971). An evaluation of ridge

estimators. Rand report No. R-716-PR.
[19]
Smith, V. K. (1973).
[20]
Thisted, R. A. (1976). Ridge regression, minimax estimation and

empirical Bayes method. Ph.D. thesis, Tech. Report 28,
B i o s t a t i s t i c s Dept., Stanford University.
[21]
Thisted, R. A. (1978). M u l t i c o l l i n e a r i t y , information and ridge

regression. S t a t i s t i c s Dept., University of Chicago.
[22]
-yon Neumann, J . (1941). D i s t r i b u t i o n of the r a t i o of the mean

square successive difference to the variance. Ann. Math. Stat.
12, 367-395.
[23]
White, J . S. (1961). Asymptotic expansions f o r the mean and

variance of the s e r i a l . c o r r e l a t i o n c o e f f i c i e n t .
Biometrica,
48, 85-94.
Monte Carlo Methods.. Lexington, Mass.

Multicollinearity Vs Autocorrelation

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Multicollinearity Vs Autocorrelation

Uploaded by

Copyright:

Available Formats

MULTICOLLINEARITY, AUTOCORRELATION, AND RIDGE REGRESSION

JACKIE JEN-CHY HSU

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF

MASTER OF SCIENCE fBUSISJESS ADMINISTRATION)

THE UNIVERSITY OF BRITISH COLUMBIA

I t has been shown t h a t r i d g e r e g r e s s i o n can reduce t h i s

methods, have been proposed to o b t a i n good estimates

terms can a l s o cause s e r i o u s e s t i m a t i o n problems.

Although the m u l t i c o l l i n e a r i t y and

a u t o c o r r e l a t i o n problems have l o n g been r e c o g n i z e d

the j o i n t e f f e c t s of these two c o n d i t i o n s on the mean

r e g r e s s i o n i s doubly advantageous when m u l t i c o l l i n e a r i t y i s

We then d e r i v e a new r i d g e type e s t i m a t o r

and a u t o c o r r e l a t i o n , we compare the mean

JOINT EFFECTS OF MULTICOLLINEARITY AND AUTOCORRELATION

Mean Square Error of the OLS Estimates of '

Mean Square Error of the Ridge Estimates of,3

When w i l l Ridge estimates be better than the

Use of the "Ridge Trace"

ESTIMATES, MEAN SQUARE ERROR AND

Derivation of Ridge Estimator for a CLR Model

Derivation of Ridge Estimator for an ALR Model

Mean Square Error of the "Generalized Estimates"

TABLE OF CONTENTS (cont'd)

THE MONTE CARLO STUDY

As i s well-known, the presence o f some degree

Because these two c o n d i t i o n s have adverse e f f e c t s on

I n v a r i a b l y , the m u l t i c o l l i n e a r i t y and auto-

i n most i f not a l l the

"What are. the j o i n t e f f e c t s

we s h a l l study a n a l y t i c a l l y the p o s s i b l e changes i n the

e s t i m a t i o n methods i n the j o i n t presence of

As a r e s u l t o f these new f i n d i n g s , a new r i d g e

f o r a u t o c o r r e l a t i o n i s then proposed and i t s

existing diagnostic tests.

c o r r e l a t i o n i s d e r i v e d and i t s mean square e r r o r p r o p e r t i e s a r e a n a l y z e d .

methodology and the r e s u l t s o f sampling experiments appear i n S e c t i o n

methods t h a t can be used w i t h the new

ridge r u l e that hopefully

2. NOTATION AND PRELIMINARIES

(CLR) model can be r e p r e s e n t e d by

The s t a n d a r d assumptions o f the l i n e a r

(2) E(ee ) = a I , where I i s t h e i d e n t i t y m a t r i x .

with variance-covariance matrix

B (k) =(X X+kI)~

When k i s a f u n c t i o n o f |3Q > ( k )

For the CLR

i s often violated i n practice.

(2) t h a t the e r r o r s are

model i s given by r e p l a c i n g assumptions (2) and

t ; L > 2 > tp-' ' ^

v a r i a b l e s , i s independent of the contemporaneous and

assume t h a t the e r r o r term 'e

For an ALR model, the " G e n e r a l i z e d L e a s t - s q u a r e s "

the f o l l o w i n g s u b s t i t u t i o n i n the ALR model;

(2.12) s a t i s f i e s a l l the assumptions o f a CLR model, OLS

i s the t*"* GLS

(2.15) g i v e s the "Best L i n e a r Unbiased P r e d i c t o r "

(BLUP) i n a f i r s t - o r d e r ALR model

p r e f e r a b l e to t h i n k of m u l t i c o l l i n e a r i t y i n terms of i t s " s e v e r i t y "

than i t s " e x i s t e n c e "

sources of severe m u l t i c o l l i n e a r i t y may

In many cases l a r g e d a t a s e t s o n l y c o n t a i n a few b a s i c f a c t o r s .

i s the t" GLS