There are three important types of time series that one is likely to encounter in
financial econometrics: stationary, trend-stationary, and unit-root processes.

Trend Stationary

$$y_t = \alpha + \mu t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2)$$

Notice that the mean of this process varies with time but the variance is
constant:

$$E(y_t) = \alpha + \mu t$$

Notice that if you define a new variable, say $y_t^* = y_t - (\alpha + \mu t)$, then $y_t^*$
is stationary.
Unit roots
A Random Walk.
The simplest example of a process with a unit root is a random walk, i.e.,
$$y_t = y_{t-1} + \varepsilon_t \tag{1}$$
where εt is i.i.d. with zero mean and constant variance.
Lagging the process one period and substituting, we can write

$$y_t = y_{t-2} + \varepsilon_{t-1} + \varepsilon_t,$$

and substituting recursively back to the initial condition,

$$y_t = y_0 + \varepsilon_1 + \varepsilon_2 + \dots + \varepsilon_{t-1} + \varepsilon_t.$$
Mean and variance of a RW process

If $y_0$ is fixed, the mean is constant over time: $E(y_t) = y_0$. The variance, however,
is $\mathrm{Var}(y_t) = t\sigma^2$, which grows linearly with $t$. As we move further into the
future this expression grows without bound, and we conclude that the variance of a
unit root process is infinite.

A unit root process will cross the mean of the sample only very infrequently,
and the process will experience long positive and negative strays away from the
sample mean.
A process that has a unit root is also called integrated of order one,
denoted as I(1). By contrast a stationary process is an integrated of order
zero process, denoted as I(0).
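These properties are easy to verify numerically. The following sketch (a simulation under assumed Gaussian increments with $\sigma = 1$; only NumPy is required) estimates the cross-sectional variance of $y_t$ at several horizons and checks that it grows roughly linearly in $t$:

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, T = 20_000, 400
# each row is one random walk y_t = y_{t-1} + eps_t with y_0 = 0, eps_t ~ N(0, 1)
y = rng.standard_normal((n_paths, T)).cumsum(axis=1)

for t in (100, 200, 400):
    v = y[:, t - 1].var()          # cross-sectional variance of y_t across paths
    print(f"t = {t:3d}: Var(y_t) ~ {v:6.1f}   (theory: {t})")
```

The estimated variances track the theoretical value $t\sigma^2$ closely, in contrast to a stationary series, whose variance would stabilize at a constant.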
ARIMA(p, d, q)

A common practice in the Box-Jenkins methodology was simply to take
first differences of the series and analyze the differenced process. We can then
define an ARIMA(p, d, q) process as

$$(1 - \phi_1 L - \dots - \phi_p L^p)(1 - L)^d y_t = (1 + \theta_1 L + \dots + \theta_q L^q)\varepsilon_t$$
A natural idea is to compute the usual t-statistic under these circumstances, but it
turns out that its distribution is not a t-distribution and that it is biased to the left.
Consider the following model

$$y_t = \alpha y_{t-1} + \varepsilon_t \tag{2}$$

where $\varepsilon_t$ is assumed to be $N(0, \sigma^2)$. When $|\alpha| < 1$, it can easily be shown that
asymptotically

$$\sqrt{T}(\hat{\alpha}_T - \alpha) \xrightarrow{L} N(0, 1 - \alpha^2).$$

If we were able to use this distribution for $\alpha = 1$, then

$$\sqrt{T}(\hat{\alpha}_T - \alpha) \xrightarrow{P} 0.$$
Even though this is valid, it is not very useful for hypothesis testing.
To obtain a non-degenerate asymptotic distribution for $\hat{\alpha}_T$ in the unit root
case, it turns out that we have to multiply by $T$ and not by the square root
of $T$: the unit root coefficient converges at the faster rate $T$, rather than the rate
$\sqrt{T}$ of the stationary case.

To get a better sense of why scaling by $T$ is necessary when the true value
of $\alpha$ is unity, consider

$$\hat{\alpha}_T - \alpha = \frac{\sum_{t=1}^{T} y_{t-1}\varepsilon_t}{\sum_{t=1}^{T} y_{t-1}^2},$$

and then

$$T(\hat{\alpha}_T - \alpha) = \frac{T^{-1}\sum_{t=1}^{T} y_{t-1}\varepsilon_t}{T^{-2}\sum_{t=1}^{T} y_{t-1}^2}.$$
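A small Monte Carlo illustrates this difference in convergence rates. The sketch below (NumPy assumed; `ols_ar1` is just an illustrative helper, not a function from the text) shows that the dispersion of $T(\hat{\alpha}_T - 1)$ is roughly stable as $T$ grows, while the dispersion of $\sqrt{T}(\hat{\alpha}_T - 1)$ shrinks toward zero:

```python
import numpy as np

rng = np.random.default_rng(1)

def ols_ar1(y):
    """OLS estimate of alpha in y_t = alpha*y_{t-1} + eps_t (no intercept)."""
    return (y[:-1] * y[1:]).sum() / (y[:-1] ** 2).sum()

sd = {}  # Monte Carlo standard deviation of (alpha_hat - 1) for each T
for T in (100, 400, 1600):
    est = np.array([ols_ar1(rng.standard_normal(T).cumsum())
                    for _ in range(4000)])
    sd[T] = (est - 1.0).std()
    print(f"T = {T:4d}   sd of sqrt(T)*(a - 1): {np.sqrt(T) * sd[T]:.3f}   "
          f"sd of T*(a - 1): {T * sd[T]:.3f}")
```

The $T$-scaled column stabilizes while the $\sqrt{T}$-scaled column collapses, which is exactly the superconsistency argument above.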
Now, under the null that $\alpha = 1$, $y_t$ can be written as

$$y_t = y_0 + \varepsilon_1 + \varepsilon_2 + \dots + \varepsilon_{t-1} + \varepsilon_t
     = \varepsilon_1 + \varepsilon_2 + \dots + \varepsilon_{t-1} + \varepsilon_t \quad \text{if we assume } y_0 = 0.$$

Then under the null that $\alpha = 1$, $y_t \sim N(0, \sigma^2 t)$. To find the distribution of
the numerator we need to do some easy but tedious algebra. We start by noting
that under the null,

$$y_t^2 = (y_{t-1} + \varepsilon_t)^2 = y_{t-1}^2 + 2 y_{t-1}\varepsilon_t + \varepsilon_t^2,$$

so that $y_{t-1}\varepsilon_t = \tfrac{1}{2}(y_t^2 - y_{t-1}^2 - \varepsilon_t^2)$, and the first two terms telescope
when summed over $t$. Then, recalling that $y_0 = 0$ and multiplying by $T^{-1}$, we obtain the
numerator as the sum of two terms:

$$T^{-1}\sum_{t=1}^{T} y_{t-1}\varepsilon_t = \frac{1}{2T}\, y_T^2 - \frac{1}{2T}\sum_{t=1}^{T} \varepsilon_t^2.$$
Consider the first term of this expression. Since we have shown above that
$y_t \sim N(0, \sigma^2 t)$, standardizing we obtain

$$y_T / (\sigma\sqrt{T}) \sim N(0, 1),$$

and then squaring this expression we find that the first term of the numerator
is distributed chi-square:

$$\left( y_T / (\sigma\sqrt{T}) \right)^2 \sim \chi^2(1).$$
It can be shown using the law of large numbers that the second term converges
in probability to $\sigma^2$, i.e.

$$(1/T)\sum_{t=1}^{T} \varepsilon_t^2 \xrightarrow{P} \sigma^2, \qquad \text{or} \qquad (1/(\sigma^2 T))\sum_{t=1}^{T} \varepsilon_t^2 \xrightarrow{P} 1.$$
If we put both results together we can see that the numerator converges to

$$(\sigma^2 T)^{-1}\sum_{t=1}^{T} y_{t-1}\varepsilon_t \xrightarrow{L} \tfrac{1}{2}(X - 1), \qquad \text{where } X \sim \chi^2(1).$$
Note, however, that the denominator, $T^{-2}\sum_{t=1}^{T} y_{t-1}^2$, does not converge in
probability to a constant: unlike in the stationary case, the law of large numbers does
not apply here, and the denominator instead converges in law to a random variable, a
functional of Brownian motion that is derived below.
A convenient way of finding the distributions of processes with unit roots is
using continuous time stochastic processes defined below.
Brownian Motion
To show what a BM is, we may start by first considering a simple RW without
drift:

$$y_t = y_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, 1)$$

or

$$y_t = \varepsilon_1 + \varepsilon_2 + \dots + \varepsilon_{t-1} + \varepsilon_t, \qquad y_t \sim N(0, t).$$

Now consider the change between dates $t$ and $s$, with $s > t$:

$$y_s - y_t = \varepsilon_{t+1} + \varepsilon_{t+2} + \dots + \varepsilon_s.$$

We may associate $e_{1t}$ with the change between $y_{t-1}$ and $y_t$ up to some interim
point, say $y_{t-1/2}$, such that

$$y_{t-1/2} - y_{t-1} = e_{1t}$$

and

$$y_t - y_{t-1/2} = e_{2t},$$

where $e_{1t}$ and $e_{2t}$ are i.i.d. $N(0, 1/2)$. Sampled at integer dates $y_t - y_{t-1} \sim N(0, 1)$,
but we can consider $n$ possible subdivisions as above such that

$$y_t - y_{t-1} = e_{1t} + e_{2t} + \dots + e_{nt}, \qquad e_{it} \sim \text{i.i.d. } N(0, 1/n).$$
Definition of a SBM

$W(\cdot)$ is a continuous-time process associating each date $t \in [0, 1]$ with a
scalar $W(t)$ such that:

(a) $W(0) = 0$;

(b) for any dates $0 \le t_1 < t_2 < \dots < t_k \le 1$, the increments
$W(t_2) - W(t_1), \dots, W(t_k) - W(t_{k-1})$ are independent Gaussian variables, with
$W(s) - W(t) \sim N(0, s - t)$;

(c) for any given realization, $W(t)$ is continuous in $t$ with probability 1.
Consider now the sample mean constructed from the first fraction $r$ of the observations,

$$X_T(r) = \frac{1}{T}\sum_{t=1}^{[Tr]^*} \varepsilon_t,$$

where $[Tr]^*$ denotes the integer part of $Tr$; for example, $[T/2]^* = T/2$ for $T$ even
and $(T-1)/2$ for $T$ odd. Then

$$\sqrt{T}\, X_T(r) = \frac{1}{\sqrt{T}}\sum_{t=1}^{[Tr]^*} \varepsilon_t
= \frac{\sqrt{[Tr]^*}}{\sqrt{T}} \cdot \frac{1}{\sqrt{[Tr]^*}}\sum_{t=1}^{[Tr]^*} \varepsilon_t,$$

but

$$\frac{1}{\sqrt{[Tr]^*}}\sum_{t=1}^{[Tr]^*} \varepsilon_t \xrightarrow{L} N(0, \sigma^2)
\qquad \text{while} \qquad \frac{\sqrt{[Tr]^*}}{\sqrt{T}} \to \sqrt{r}.$$
Hence the asymptotic distribution of $\sqrt{T}\, X_T(r)$ is that of $\sqrt{r}$ times a $N(0, \sigma^2)$,
or $\sqrt{T}\, X_T(r) \xrightarrow{L} N(0, \sigma^2 r)$. Clearly this implies that

$$\frac{\sqrt{T}\, X_T(r)}{\sigma} \xrightarrow{L} N(0, r).$$

Notice that this is evaluated at a given value $r$ (NB: $\sqrt{T}\, X_T(r)/\sigma$ is a random
variable).
We could consider the behaviour of the sample mean based on observations $[Tr_1]^*$ through
$[Tr_2]^*$ for $r_2 > r_1$ and conclude that

$$\frac{\sqrt{T}\,\left( X_T(r_2) - X_T(r_1) \right)}{\sigma} \xrightarrow{L} N(0, r_2 - r_1).$$
Then the sequence of stochastic functions $\{\sqrt{T}\, X_T(\cdot)/\sigma\}_{T=1}^{\infty}$ has an asymptotic
probability law that is described by a standard Brownian motion:

$$\frac{\sqrt{T}\, X_T(\cdot)}{\sigma} \xrightarrow{L} W(\cdot) \qquad \text{(this is a random function)}.$$

This function evaluated at $r = 1$ is just the sample mean, $X_T(1) = T^{-1}\sum_{t=1}^{T}\varepsilon_t$;
then when $r = 1$ the CLT is a special case of this result, that is,

$$\frac{\sqrt{T}\, X_T(1)}{\sigma} \xrightarrow{L} W(1) \sim N(0, 1).$$
Convergence in Functions

Let $S(\cdot)$ represent a continuous-time stochastic process with $S(r)$ representing
its value at some date $r$ for $r \in [0, 1]$. Suppose further that for any
given realization $S(\cdot)$ is a continuous function of $r$ with probability 1, and let
$\{S_T(\cdot)\}_{T=1}^{\infty}$ be a sequence of such continuous functions.

We say that $S_T(\cdot) \xrightarrow{L} S(\cdot)$ whenever the following holds:

a) For any finite collection of $k$ particular dates $0 < r_1 < r_2 < \dots < r_k \le 1$, the
sequence of $k$-dimensional random vectors satisfies $y_T \xrightarrow{L} y$, where

$$y_T \equiv \begin{bmatrix} S_T(r_1) \\ \vdots \\ S_T(r_k) \end{bmatrix}
\qquad \text{and} \qquad
y \equiv \begin{bmatrix} S(r_1) \\ \vdots \\ S(r_k) \end{bmatrix}.$$

b) For each $\varepsilon > 0$, the probability that $S_T(r_1)$ differs from $S_T(r_2)$ by more than
$\varepsilon$ for any dates $r_1$ and $r_2$ within a distance $\delta$ of each other goes to zero
uniformly in $T$ as $\delta \to 0$.

c) $P\{|S_T(0)| > \lambda\} \to 0$ uniformly in $T$ as $\lambda \to \infty$.
Now let $\{V_T(\cdot)\}_{T=1}^{\infty}$ be another sequence of continuous functions and fix a date $r$.
Then $\{S_T(r) - V_T(r)\}_{T=1}^{\infty}$ is a sequence of random variables, and we can talk about its
probability limit. If the sequence converges in probability to 0, we write

$$S_T(r) \xrightarrow{P} V_T(r), \qquad \text{or} \qquad \sup_r |S_T(r) - V_T(r)| \xrightarrow{P} 0$$

for the corresponding uniform statement about the whole function. This can be generalized to
sequences of functions: if $\{S_T(\cdot)\}_{T=1}^{\infty}$ and $\{V_T(\cdot)\}_{T=1}^{\infty}$ are sequences of
continuous functions with $V_T(\cdot) \xrightarrow{P} S_T(\cdot)$ and $S_T(\cdot) \xrightarrow{L} S(\cdot)$, for
$S(\cdot)$ a continuous function, then $V_T(\cdot) \xrightarrow{L} S(\cdot)$.
(Examples of continuous functionals)

$$y = \int_0^1 S(r)\,dr \qquad \text{or} \qquad y = \int_0^1 [S(r)]^2\,dr$$

Example

Given $\dfrac{\sqrt{T}\, X_T(\cdot)}{\sigma} \xrightarrow{L} W(\cdot)$, we can simply use the theorem to get

$$\sqrt{T}\, X_T(\cdot) \xrightarrow{L} \sigma W(\cdot)$$

or

$$[\sqrt{T}\, X_T(\cdot)]^2 \xrightarrow{L} [\sigma W(\cdot)]^2.$$
Consider again the random walk

$$y_t = y_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \text{i.i.d.}(0, \sigma^2)$$

or

$$y_t = \varepsilon_1 + \varepsilon_2 + \dots + \varepsilon_{t-1} + \varepsilon_t.$$

This can be used to express the stochastic function

$$X_T(r) = \begin{cases}
0 & \text{for } 0 \le r < 1/T \\
y_1/T & \text{for } 1/T \le r < 2/T \\
y_2/T & \text{for } 2/T \le r < 3/T \\
\vdots & \\
y_T/T & \text{for } r = 1
\end{cases}$$

Notice that the $t$th rectangle has width $1/T$ and height $y_{t-1}/T$. The total
area is then

$$\int_0^1 X_T(r)\,dr = y_1/T^2 + y_2/T^2 + \dots + y_{T-1}/T^2.$$

Multiplying both sides by $\sqrt{T}$:

$$\int_0^1 \sqrt{T}\, X_T(r)\,dr = T^{-3/2}\sum_{t=1}^{T} y_{t-1}.$$
Since $[\sqrt{T}\, X_T(\cdot)] \xrightarrow{L} [\sigma W(\cdot)]$, this implies

$$T^{-3/2}\sum_{t=1}^{T} y_{t-1} \xrightarrow{L} \int_0^1 \sigma W(r)\,dr.$$
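This limit can be checked numerically: the random variable $\int_0^1 W(r)\,dr$ is known to be $N(0, 1/3)$, so with $\sigma = 1$ the statistic $T^{-3/2}\sum y_{t-1}$ should have mean near 0 and variance near $1/3$. A quick simulation sketch (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
n_paths, T = 20_000, 500
y = rng.standard_normal((n_paths, T)).cumsum(axis=1)   # random walks, sigma = 1

# T^{-3/2} * sum_{t=1}^{T} y_{t-1}  (the y_0 = 0 term contributes nothing)
stat = T ** (-1.5) * y[:, :-1].sum(axis=1)
print("mean:", stat.mean(), "  variance:", stat.var(), "  (theory: 0 and 1/3)")
```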
Consider now the AR(1) model

$$y_t = \rho y_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \text{i.i.d.}(0, \sigma^2),$$

with

$$\hat{\rho}_T - \rho = \frac{\sum_{t=1}^{T} y_{t-1}\varepsilon_t}{\sum_{t=1}^{T} y_{t-1}^2},$$

and assume that we want to test the null

$$H_0: \rho = 1.$$

Then

$$T(\hat{\rho}_T - 1) = \frac{T^{-1}\sum_{t=1}^{T} y_{t-1}\varepsilon_t}{T^{-2}\sum_{t=1}^{T} y_{t-1}^2}.$$

Because

$$[\sqrt{T}\, X_T(\cdot)] \xrightarrow{L} [\sigma W(\cdot)],$$
$$[\sqrt{T}\, X_T(\cdot)]^2 \xrightarrow{L} [\sigma W(\cdot)]^2,$$

$$\int_0^1 [\sqrt{T}\, X_T(r)]^2\,dr \xrightarrow{L} \int_0^1 [\sigma W(r)]^2\,dr,$$

and

$$T^{-2}\sum_{t=1}^{T} y_{t-1}^2 = \int_0^1 [\sqrt{T}\, X_T(r)]^2\,dr$$

(since

$$(X_T(r))^2 = \begin{cases}
0 & \text{for } 0 \le r < 1/T \\
(y_1/T)^2 & \text{for } 1/T \le r < 2/T \\
(y_2/T)^2 & \text{for } 2/T \le r < 3/T \\
\vdots & \\
(y_T/T)^2 & \text{for } r = 1
\end{cases}$$

and

$$\int_0^1 [\sqrt{T}\, X_T(r)]^2\,dr = T\left( y_1^2/T^3 + y_2^2/T^3 + \dots + y_{T-1}^2/T^3 \right)\Big).$$
Also

$$T^{-1}\sum_{t=1}^{T} y_{t-1}\varepsilon_t \xrightarrow{L} \tfrac{1}{2}\sigma^2 \left[ W(1)^2 - 1 \right]$$

(recall that $T^{-1}\sum_{t=1}^{T} y_{t-1}\varepsilon_t = \frac{1}{2T}\, y_T^2 - \frac{1}{2T}\sum_{t=1}^{T}\varepsilon_t^2$ when $y_0 = 0$).
Then

$$T(\hat{\rho}_T - 1) \xrightarrow{L} \frac{\tfrac{1}{2}\sigma^2 \left[ W(1)^2 - 1 \right]}{\int_0^1 [\sigma W(r)]^2\,dr}.$$

Recall that $W(1)^2$ is chi-square with one degree of freedom. The probability that a
chi-square(1) variable is less than 1 is about 0.68, and since the denominator is positive,
the probability that $\hat{\rho}_T - 1$ is negative approaches 0.68 as $T$ tends to infinity.
In other words, in roughly two thirds of the samples generated by a RW the estimate will be
less than the true value of unity: negative values of $\hat{\rho}_T - 1$ are about twice as
likely as positive values.
In practice, critical values for the random variable $T(\hat{\rho}_T - 1)$ are found by
calculating the exact small-sample distribution, assuming the innovations are
Gaussian, generally by Monte Carlo.
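The 0.68 figure is easy to reproduce by simulation. The sketch below (NumPy assumed; OLS without intercept, as in the derivation above) estimates $\hat{\rho}$ on many simulated random walks and reports the fraction of estimates below unity:

```python
import numpy as np

rng = np.random.default_rng(3)
T, n_reps = 200, 20_000

def ols_ar1(y):
    """OLS estimate of rho in y_t = rho*y_{t-1} + eps_t (no intercept)."""
    return (y[:-1] * y[1:]).sum() / (y[:-1] ** 2).sum()

rho_hat = np.array([ols_ar1(rng.standard_normal(T).cumsum())
                    for _ in range(n_reps)])
frac_below = (rho_hat < 1.0).mean()
print("fraction of samples with rho_hat < 1:", frac_below)   # theory: about 0.68
```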
Note also that

$$\sqrt{T}(\hat{\rho}_T - 1) = \frac{T^{-3/2}\sum_{t=1}^{T} y_{t-1}\varepsilon_t}{T^{-2}\sum_{t=1}^{T} y_{t-1}^2},$$

but the numerator converges to $T^{-1/2}\,\tfrac{1}{2}\sigma^2\left[ W(1)^2 - 1 \right]$. The chi-square
distribution has finite variance, so the variance of the numerator is of order $1/T$,
meaning that the numerator converges in probability to zero; hence

$$\sqrt{T}(\hat{\rho}_T - 1) \xrightarrow{P} 0.$$
The Dickey-Fuller Test

Consider the model

$$y_t = \mu + \beta t + \alpha y_{t-1} + \varepsilon_t \tag{3}$$

where $\varepsilon_t$ is assumed to be $N(0, \sigma^2)$.

We want to test the hypothesis of the existence of a unit root; therefore we
set the following null and alternative hypotheses:

$$H_0: \alpha = 1 \quad \text{(unit root)}$$
$$H_1: \alpha < 1 \quad \text{(integrated of order zero)}$$

The obvious estimator of $\alpha$ is the OLS estimator $\hat{\alpha}$. The problem is that under
the null hypothesis there is considerable evidence of the inadequacy of the usual
asymptotic (approximate in large samples) distribution. Therefore,
equation (3) can be reparameterized as

$$\Delta y_t = \mu + \lambda y_{t-1} + \beta t + \varepsilon_t, \qquad \lambda = \alpha - 1,$$

so that the hypotheses become

$$H_0: \lambda = 0 \quad \text{(unit root)}$$
$$H_1: \lambda < 0 \quad \text{(integrated of order zero)}$$
Fuller (1976) tabulated, using Monte Carlo methods, critical values for the
alternative cases; for example, for a sample size of 100 the 2.5% critical values
are

$$\mu = 0,\ \beta = 0: \quad -2.24$$
$$\mu \neq 0,\ \beta = 0: \quad -3.17$$
$$\mu \neq 0,\ \beta \neq 0: \quad -3.73$$

Therefore the method simply consists of checking the t-statistic of $\hat{\lambda}$ against the
critical values of Fuller (1976). Notice that the critical values depend on
i) the sample size
ii) whether you include a constant and/or a time trend.
In practice we augment the regression with $k$ lags of $\Delta y_t$ to whiten the residuals;
this is usually denoted as ADF(k). Several procedures have been proposed in the literature
to choose the order of augmentation of the DF regression; a good practice is to start with
a quite general model and delete the non-significant lags of $\Delta y_t$.

The power (the ability to reject the null hypothesis) of the Dickey-Fuller
tests is notoriously weak. Thus it can be difficult to reject the null of a unit root
even if the true series is stationary. That is, most of the time an ADF test
will not reject the null of a unit root even if the true model is an autoregressive
model with an autoregressive coefficient of, say, 0.8 (that is, an integrated of order
zero series). In addition, the ADF tests often come up with conflicting results
depending on the order of the lag structure.
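A small Monte Carlo sketch can illustrate both the size and the low power of the DF t-test (NumPy assumed; the DF regression with a constant and no trend is estimated by OLS, and the t-statistic on $\lambda$ is compared with the commonly tabulated 5% critical value of about $-2.89$ for $T = 100$; the `df_tstat` helper is illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(4)
T, n_reps = 100, 500

def df_tstat(y):
    """t-statistic on lambda in the DF regression dy_t = mu + lambda*y_{t-1} + e_t."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = (resid ** 2).sum() / (len(dy) - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

reject = {}
for phi in (1.0, 0.9, 0.5):
    stats = []
    for _ in range(n_reps):
        eps = rng.standard_normal(T)
        y = np.zeros(T)
        for t in range(1, T):          # simulate y_t = phi*y_{t-1} + eps_t
            y[t] = phi * y[t - 1] + eps[t]
        stats.append(df_tstat(y))
    # approximate 5% critical value for the constant, no-trend case at T = 100
    reject[phi] = float(np.mean(np.array(stats) < -2.89))
    print(f"phi = {phi}: rejection frequency = {reject[phi]:.2f}")
```

Under the null ($\phi = 1$) the rejection frequency stays near the nominal 5%, but for a stationary series with $\phi$ close to one the test rejects far less often than for, say, $\phi = 0.5$, which is the low-power problem described above.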
The ADF test includes additional lagged terms to account for the fact that
the DGP might be more complicated than an AR(1). An alternative approach
is that suggested by Phillips (1987) and Phillips and Perron (1988). They make a
non-parametric correction to the test statistic, based on a consistent estimator
of the long-run variance.

They use

$$S_{Tl}^2 = T^{-1}\sum_{t=1}^{T} \varepsilon_t^2 + 2 T^{-1}\sum_{j=1}^{l}\sum_{t=j+1}^{T} \varepsilon_t \varepsilon_{t-j}$$

$$S_\varepsilon^2 = T^{-1}\sum_{t=1}^{T} \varepsilon_t^2$$

where $l$ is the lag truncation parameter, chosen to ensure that the autocorrelation
of the residuals is fully captured.
An asymptotically valid test of $\alpha = 1$,
when the underlying DGP is not necessarily an AR(1) process, is given by the
Phillips $Z$-test:

$$Z(\tau_\mu) = (S_\varepsilon / S_{Tl})\,\tau_\mu
- \tfrac{1}{2}\left( S_{Tl}^2 - S_\varepsilon^2 \right)
\left[ S_{Tl}\left( T^{-2}\sum_{t=2}^{T} (y_{t-1} - \bar{y})^2 \right)^{1/2} \right]^{-1}$$

where $\tau_\mu$ is the t-statistic associated with testing the null hypothesis $\alpha = 1$. The
critical values for this test statistic are the same as those used for the corresponding case
in the Fuller table. Monte Carlo work suggests that the Phillips-type test has
poor size properties (a tendency to over-reject the null when it is true) when the underlying
DGP has large negative MA components.
Spurious Regressions

Granger and Newbold (1974) have shown that, using I(1) variables, you can obtain
an apparently significant regression (say, with a high $R^2$) even if the regressor
and the dependent variable are independent. They generated independent random
walks, regressed one against the other, and obtained very high $R^2$ values for this
equation. They conclude that this result is spurious. As a rule of thumb, whenever you
obtain a very high $R^2$ and a very low DW statistic you should suspect that the results
are spurious.

Another important issue is that whenever you try to explain a variable, say
$y_t$, by another variable, say $x_t$, you should check that these variables are integrated
of the same order in order to obtain meaningful results.
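The Granger-Newbold experiment is easy to replicate. The sketch below (NumPy assumed) regresses one simulated random walk on another, independent one, and records the $R^2$ and the conventional t-statistic of the slope; the naive 5% t-test rejects far more often than 5% of the time:

```python
import numpy as np

rng = np.random.default_rng(6)
T, n_reps = 100, 2000
r2, tstats = [], []
for _ in range(n_reps):
    # two completely independent random walks
    y = rng.standard_normal(T).cumsum()
    x = rng.standard_normal(T).cumsum()
    X = np.column_stack([np.ones(T), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    s2 = (e ** 2).sum() / (T - 2)
    tstats.append(b[1] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1]))
    r2.append(1.0 - (e ** 2).sum() / ((y - y.mean()) ** 2).sum())

print("mean R^2:", np.mean(r2))
print("fraction with |t| > 1.96:", np.mean(np.abs(tstats) > 1.96))
```

Even though the true slope is zero, the conventional t-test rejects in a large fraction of the samples, which is exactly the spurious-regression phenomenon.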
Recursive t-statistics

The recursive ADF statistic is computed using subsamples $t = 1, \dots, k$ for $k =
k_0, \dots, T$, where $k_0$ is the start-up value and $T$ is the sample size of the full
sample. The most general model (with drift and trend) is estimated for each subsample,
and the minimum value of $\tau_\tau(k/T)$ across all the subsamples is chosen
and compared with the table provided by Banerjee, Lumsdaine and Stock.
This method could also be applied using a (large enough) rolling window to see if
there are clear changes in the pattern of a series.

Testing stationarity as the null. Consider the components model

$$y_t = \delta t + \xi_t + \varepsilon_t,$$

where $\xi_t$ is a random walk component, and base the test on the residuals $e_t$ from a
regression of $y_t$ on an intercept and trend. For testing the null of level stationarity
instead of trend stationarity, the test is constructed the same way except that $e_t$ is
obtained as the residual from a regression of $y_t$ on an intercept only. The test is an
upper-tail test. When the errors are i.i.d., the asymptotic distribution of the test is
derived in Nabeya and Tanaka (1988); in other cases it needs to be conveniently adjusted.
Variance Ratio Tests

Given the low power of the ADF test, the variance ratio test provides us
with another tool to discriminate between stationary and non-stationary series.
Consider $y_t$ and assume that it follows a random walk, i.e.,

$$y_t = y_{t-1} + \varepsilon_t.$$

Then, since $\mathrm{Var}(\Delta_k y_t) = \mathrm{Var}(\varepsilon_t + \dots + \varepsilon_{t-k+1}) = k\sigma^2$,

$$\lambda_1(k) \equiv \frac{\mathrm{Var}(\Delta_k y_t)}{\mathrm{Var}(\Delta_1 y_t)} = k.$$
Proof

To see how a stationary series behaves instead, consider the following AR(1) process:

$$y_t = \phi_1 y_{t-1} + \varepsilon_t, \qquad t = 1, \dots, T.$$

Then, subtracting $y_{t-k}$ from both sides of the recursively substituted equation, we get
the expression

$$y_t - y_{t-k} = (\phi_1^k - 1)\, y_{t-k} + \sum_{j=0}^{k-1} \phi_1^j \varepsilon_{t-j}.$$

Notice that

$$V(y_{t-k}) = V(y_t) = \frac{\sigma^2}{1 - \phi_1^2}$$

and

$$V\!\left( \sum_{j=0}^{k-1} \phi_1^j \varepsilon_{t-j} \right) = \frac{1 - \phi_1^{2k}}{1 - \phi_1^2}\,\sigma^2.$$

In the same way we can express, for a stationary process, the variance of the first
difference of $y_t$: $(y_t - y_{t-1}) = (\phi_1 - 1)\, y_{t-1} + \varepsilon_t$.
Then the limit of $\lambda_1(k)$ when $k$ tends to infinity for a stationary process is

$$\lim_{k \to \infty} \lambda_1(k) = \frac{1}{1 - \phi_1},$$

which is a constant provided that $\phi_1 \neq 1$.

It should be clear from the previous result that the limit of $\lambda_2(k) \equiv \mathrm{Var}(\Delta_k y_t)/k$
equals 0 when $k$ tends to infinity.
More generally, consider a trend stationary model in which the deviations from trend
have an MA representation, $y_t = \alpha + \mu t + \sum_{j=0}^{\infty} \theta_j \varepsilon_{t-j}$. The $k$th
difference can be obtained simply by subtracting the corresponding expressions for $y_t$ and
$y_{t-k}$. Then the variance of $y_t - y_{t-k}$ may be written as

$$V(y_t - y_{t-k}) = \sigma^2 \sum_{j=0}^{k-1} \theta_j^2 + \sigma^2 \sum_{j=0}^{\infty} (\theta_{k+j} - \theta_j)^2.$$

From the previous equation we can see that when $k$ tends to infinity the variance
of $\Delta_k y_t$ tends to

$$V(\Delta_k y_t) \to 2\sigma^2 \sum_{j=0}^{\infty} \theta_j^2.$$

This expression is a constant, whose magnitude depends on the $\theta_j$ values. Therefore
we can distinguish between the two models simply by noting that under the random walk
assumption $\lambda_1(k)$ increases with $k$, while under the trend stationary assumption
$\lambda_1(k)$ tends to a constant. Alternatively, we can consider
$\lambda_2(k) \equiv \mathrm{Var}(\Delta_k y_t)/k$ and note both that when the model is a random walk
this expression tends to the constant $\sigma^2$ (see proof above), and that this ratio tends
to zero when $k$ tends to infinity for the trend stationary model, since
$\mathrm{Var}(\Delta_k y_t)$ is constant in the limit.
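The behaviour of $\lambda_1(k)$ under the two models can be checked by simulation. The sketch below (NumPy assumed; `var_ratio` is an illustrative helper) computes the ratio for a simulated random walk, where it grows roughly like $k$, and for a stationary AR(1) with $\phi_1 = 0.5$, where it approaches $1/(1 - \phi_1) = 2$:

```python
import numpy as np

def var_ratio(y, k):
    """lambda_1(k) = Var(y_t - y_{t-k}) / Var(y_t - y_{t-1})."""
    return np.var(y[k:] - y[:-k]) / np.var(y[1:] - y[:-1])

rng = np.random.default_rng(7)
T = 200_000
rw = rng.standard_normal(T).cumsum()      # random walk
eps = rng.standard_normal(T)
ar = np.zeros(T)                          # stationary AR(1) with phi_1 = 0.5
for t in range(1, T):
    ar[t] = 0.5 * ar[t - 1] + eps[t]

for k in (2, 5, 10, 20):
    print(f"k = {k:2d}:  RW ratio = {var_ratio(rw, k):6.2f} (~ k)   "
          f"AR(1) ratio = {var_ratio(ar, k):5.2f} (-> 2)")
```

The contrast is sharp: the random-walk ratio keeps growing with $k$, while the stationary ratio flattens out at a constant, which is exactly the discrimination criterion described above.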