
ECON 762 Lecture Notes

Curtis Aquino
January 12, 2017

Contents

1 Lecture 1: Introduction to Time Series Models (2017-01-10)
  1.1 Overview
  1.2 Patterns in Time Series
  1.3 Stationary versus Nonstationary Series
  1.4 Characterizing Time Series
      1.4.1 The Autocorrelation Function
      1.4.2 Sample Autocorrelation Function
      1.4.3 Nonstationarity and Differencing

2 Lecture 2: Introduction to Time Series Models (2017-01-12)
  2.1 Examples of Univariate Random Processes
      2.1.1 White Noise Processes
      2.1.2 Random Walk Processes
  2.2 Tests for White Noise Processes
      2.2.1 Individual Tests: Bartlett's Test
      2.2.2 Joint Tests: Ljung-Box's Test
      2.2.3 White Noise Test Example
      2.2.4 A Simulated Illustration

3 Lecture 3: Random Walks, Unit Root Tests, and Spurious Relationships (2017-01-17)
  3.1 Overview
  3.2 Properties of a Random Walk
  3.3 The Autocorrelation Function for a Random Walk
  3.4 Classical Least Squares Estimators and Random Walks
  3.5 Unit Root Tests
  3.6 Random Walks and Spurious Regression

4 Lecture 4: Univariate Linear Time Series Models (2017-01-19)
  4.1 Overview
  4.2 Moving Average Models (MA(q))
      4.2.1 Structure of MA(q) Processes
      4.2.2 Properties of MA(q) Processes
      4.2.3 Stationarity and Forecasting MA(q) Processes
          4.2.3.1 Forecasting MA(q) Processes where $\varepsilon_{T-i}$, $\theta_i$ are Known

5 Lecture 5: Univariate Linear Time Series Models (2017-01-24)

1 Lecture 1: Introduction to Time Series Models (2017-01-10)

1.1 Overview

This course covers four topics: time series, robust inference, robust estimation, and model uncertainty. We will use R, RStudio, LaTeX, and GitHub to learn reproducible research. Typically, we discuss the structure in data in terms of cause and effect, that is, by identifying predetermined conceptual predictors. We believe that $X$, an $n \times k$ matrix, has an effect on $Y$, an $n \times 1$ vector, so the independent variables affect the dependent variable, often in the form $Y = X\beta + u$, where $X$ causes changes in $Y$ through the marginal effects $\beta$. Time series (TS) is different. The structure lies in the underlying stochastic process: a TS is data ordered temporally, and the structure of the data depends fundamentally on that temporal ordering.
We assume that we have a set of data, ordered in time, generated by a stochastic process that can be characterized and described. A TS model refers to the characterization of the underlying stochastic process that generates the observed data $\{y_t\}_{t=1}^{T}$. Our aim is not to identify parameter estimates of marginal effects for causal analysis; instead, we want to forecast, purely on the basis of past observations of the series itself. Suppose we have data $\{y_t\}_{t=1}^{T}$ and we want to forecast. What is a forecast? We use $\hat{y}_{T+h} = E[y_{T+h} \mid y_1, \ldots, y_T]$ where $h > 0$ for an $h$-step forecast. The old rules no longer apply here; for example, least-squares estimation (LSE) will not work as usual.
Summary
Typical causal econometric models take the form $y_i = x_i'\beta + u_i$, where the $x$'s cause movements in $y$. Univariate time series models instead characterize the underlying stochastic process that generates the series, with the purpose of forecasting $h$ steps ahead by $\hat{y}_{T+h} = E[y_{T+h} \mid \{y_t\}_{t=1}^{T}]$.

1.2 Patterns in Time Series

We define a trend to be a steady rise or fall in the series over time. A seasonal pattern is said to exist when a time series is affected by seasonal factors that occur within a year or, more generally, within a defined and fixed period. In R, we use the function seasonplot() from the forecast package; the plot below is produced by seasonplot(AirPassengers, main="Seasonal Plot, Air Passengers 1949-1960", xlab="Month", ylab="Passengers"). A cycle or cyclical pattern is said to exist when the data exhibit rises and falls that are not of a fixed period, like a business cycle.

[Figure: Seasonal Plot, Air Passengers 1949-1960 — Passengers (roughly 100 to 600) by Month (Jan through Dec), one line per year.]
1.3 Stationary versus Nonstationary Series

We consider the difference between stationarity and non-stationarity. Stationarity refers to when data appears
to be the same regardless of when it is observed or measured and the properties of the series do not depend
on time. Consequently, a stationary time series is a time series whose statistical properties, such as mean,
variance, and autocorrelation, are constant over time.
In the context of our previous discussion, trends and seasonal patterns are nonstationary, while cyclical patterns can be stationary because cycles are not of fixed length, so the timing of the pattern is not predictable. Stationarity is important because we are forecasting future data and trying to build models on the basis of past data. We will use estimation techniques to accomplish this (such as fitting models by maximum likelihood), and stationarity ensures the estimated coefficients are in some sense stable for future predictions.
Formally, a series $\{y_t\}_{t=1}^{T}$ is said to be stationary if its joint probability distribution is invariant with respect to time. The joint probability distribution $f(y_1, \ldots, y_T)$ is time invariant if $f(y_1, \ldots, y_T) = f(y_{1+h}, \ldots, y_{T+h})$, that is, shifting the series by $h$ periods leaves the joint distribution unchanged for any $h$. A consequence of stationarity is that the unconditional mean, variance, and covariances cannot depend on time. This results in the properties that:

$$\begin{aligned}
E[y_t] &= \mu_y \quad \forall t \\
V[y_t] &= E[(y_t - E[y_t])^2] = \sigma_y^2 \quad \forall t \\
\mathrm{Cov}(y_t, y_{t+h}) &= E[(y_t - E[y_t])(y_{t+h} - E[y_{t+h}])] = \gamma_h \quad \forall t
\end{aligned}$$

Summary
Stationarity is defined by the joint probability distribution satisfying $f(y_1, \ldots, y_T) = f(y_{1+h}, \ldots, y_{T+h})$, and a consequence is that $E[y_t] = \mu_y$, $V[y_t] = \sigma_y^2$, and $\mathrm{Cov}[y_t, y_{t+h}] = \gamma_h$ do not depend on $t$. The properties of a stationary time series do not depend on the time at which the series is observed. Seasonal and non-seasonal trends (rises or falls) are nonstationary, while some cycles are stationary.

1.4 Characterizing Time Series

1.4.1 The Autocorrelation Function

It is typically not possible to specify the underlying joint PDF of a stochastic process, but we can examine the amount of correlation between neighbouring data points separated by $k$ periods. The autocorrelation function (ACF) measures the amount of dependence present in neighbouring values of a series (i.e., between a time series and its lagged version). For a stationary series this simplifies greatly because the moments are time invariant:

$$\mathrm{ACF} = \rho_k = \frac{E[(y_t - E[y_t])(y_{t+k} - E[y_{t+k}])]}{\sqrt{V[y_t]}\sqrt{V[y_{t+k}]}} = \frac{\mathrm{Cov}[y_t, y_{t+k}]}{V[y_t]} = \frac{\gamma_k}{\gamma_0}$$

We can see immediately that for a stationary series $\rho_0 = \gamma_0/\gamma_0 = 1$ when $k = 0$. This definition requires knowledge of the population moments. When these are not known, we use the sample ACF.

1.4.2 Sample Autocorrelation Function


$$\mathrm{SACF} = \hat\rho_k = \frac{\sum_{t=1}^{T-k}(y_t - \bar{y})(y_{t+k} - \bar{y})}{\sum_{t=1}^{T}(y_t - \bar{y})^2}$$

Unless otherwise stated, we work with the sample autocorrelation function (hereafter ACF), which is useful for diagnostics, determining stationarity, and other applications. The ACF can be used to determine stationarity because if a series is stationary, its ACF decays towards zero very rapidly, almost immediately; if it does not, the series is nonstationary.
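As a quick check of the formula above (a minimal sketch of my own, not from the lecture), we can compute the lag-$k$ sample autocorrelation directly and compare it with R's built-in acf():

## Compute the lag-k sample autocorrelation by hand
sacf <- function(y, k) {
    T <- length(y)
    y.bar <- mean(y)
    sum((y[1:(T - k)] - y.bar) * (y[(k + 1):T] - y.bar))/sum((y - y.bar)^2)
}
set.seed(42)
y <- arima.sim(n = 100, list(order = c(0, 0, 0)))
## The hand computation should match the lag-1 value reported by acf()
sacf(y, 1)
acf(y, lag.max = 1, plot = FALSE)$acf[2]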
1.4.3 Nonstationarity and Differencing

We can determine whether a series is stationary or nonstationary from the ACF: the ACF of a stationary series quickly drops to zero. We can then model a nonstationary series by first differencing once or twice to render the series stationary (in R, ndiffs() from the forecast package identifies the number of differences needed):

$$\begin{aligned}
\Delta y_t &= y_t - y_{t-1} \\
\Delta^2 y_t &= \Delta y_t - \Delta y_{t-1} \\
&= (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) \\
&= y_t - 2y_{t-1} + y_{t-2}
\end{aligned}$$

A seasonal difference is the difference between an observation and the corresponding observation from the previous period. Hence, we use $\Delta_m y_t = y_t - y_{t-m}$, where $m$ denotes the number of seasons (or, more generally, periods) in a year. For example, for quarterly data $m = 4$ and for monthly data $m = 12$. These are also known as lag-$m$ differences, since we subtract the observation having a lag of $m$ periods.
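As a small illustration (assuming the forecast package, which supplies ndiffs(), is installed), we can ask how many differences a series needs and apply them with diff():

## Requires the forecast package for ndiffs()
library(forecast)
## Number of first differences needed to render AirPassengers stationary
ndiffs(AirPassengers)
## First (lag-1) difference and seasonal (lag-12) difference
d1 <- diff(AirPassengers, lag = 1, differences = 1)
d12 <- diff(AirPassengers, lag = 12)
## Compare the ACFs of the original and differenced series
par(mfrow = c(1, 3))
acf(AirPassengers); acf(d1); acf(d12)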
Summary
The autocorrelation function measures the amount of dependence between a time series and its lags. It is the covariance between a series and its lag divided by the square root of the product of the variances of the series and its lag. For a stationary series this simplifies to $\rho_k = \gamma_k/\gamma_0$ and decays rapidly towards zero. We can render a nonstationary series stationary by differencing $k$ times, $\Delta^k y_t = \Delta^{k-1} y_t - \Delta^{k-1} y_{t-1}$, or by taking lag-$m$ (seasonal) differences $\Delta_m y_t = y_t - y_{t-m}$.

2 Lecture 2: Introduction to Time Series Models (2017-01-12)

2.1 Examples of Univariate Random Processes

2.1.1 White Noise Processes

A consequence of stationarity is that the unconditional mean ($\mu_y$), variance ($\sigma_y^2$), and covariances ($\gamma_k$) are time invariant. The simplest time series is known as a white noise process: random with no pattern. A white noise process is a stochastic process given by $y_t = \varepsilon_t$, where the error $\varepsilon_t \sim (0, \sigma^2)$ is identically and independently distributed. The process itself is stationary, and the ACF of a white noise process drops off almost immediately.

2.1.2 Random Walk Processes

A random walk process is given by $y_t = y_{t-1} + \varepsilon_t$, where the error $\varepsilon_t \sim (0, \sigma^2)$ is identically and independently distributed. An analogy for this type of process is a drunk stumbling home. A random walk with drift is given by $y_t = d + y_{t-1} + \varepsilon_t$, where $d \neq 0$ ensures that there is some long-run trend.
Summary
A white noise process is a stationary process given by $y_t = \varepsilon_t$, $\varepsilon_t \sim (0, \sigma^2)$. A random walk process is given by $y_t = y_{t-1} + \varepsilon_t$, $\varepsilon_t \sim (0, \sigma^2)$, and a random walk with drift process is given by $y_t = d + y_{t-1} + \varepsilon_t$, $\varepsilon_t \sim (0, \sigma^2)$.

2.2 Tests for White Noise Processes

White noise processes have $\rho_k = 0$ for all $k > 0$, so we may want to test whether a series is white noise for analysis and diagnostics. Consider the null hypothesis $H_0: \rho_k = 0$, tested using the sample estimate $\hat\rho_k$. We may be interested in testing a single value of $k$ (an individual test) or a range of $k$ (a joint test). These tests are referred to as white noise tests, and we are testing that the time series follows the white noise process $y_t = \varepsilon_t$, $\varepsilon_t \sim (0, \sigma^2)$.
2.2.1 Individual Tests: Bartlett's Test

Bartlett (1946) showed that under the null hypothesis $H_0: \rho_k = 0$, where $y_t = \varepsilon_t$, $\varepsilon_t \sim (0, \sigma^2)$, is a stationary white noise process, $\hat\rho_k \overset{H_0}{\sim} N(0, V[\hat\rho_k])$, where the variance is given by:

$$V[\hat\rho_k] = \begin{cases} \dfrac{1}{T} & \text{if } k = 1 \\[1ex] \dfrac{1}{T}\left(1 + 2\displaystyle\sum_{j=1}^{k-1}\hat\rho_j^2\right) & \text{if } k > 1 \end{cases}$$

We can therefore base a test of the hypothesis $H_0: \rho_k = 0$ on the statistic:

$$Z = \frac{\hat\rho_k - 0}{\sqrt{V[\hat\rho_k]}}$$

Since $\hat\rho_k \sim N(0, V[\hat\rho_k])$ under the null, $Z$ is approximately standard normal, and we fail to reject the null whenever $|Z| < Z_{\alpha/2}$. For example, given $(\hat\rho_1, \hat\rho_2, T) = (0.1512, 0.0697, 100)$, we can test $H_0: \rho_1 = 0$ by $Z = \hat\rho_1/\sqrt{V[\hat\rho_1]} = 0.1512/\sqrt{1/100} = 1.51 < 1.96$, and $H_0: \rho_2 = 0$ by $Z = \hat\rho_2/\sqrt{V[\hat\rho_2]} = 0.0697/\sqrt{(1 + 2(0.1512)^2)/100} = 0.68 < 1.96$, so in both cases we fail to reject the null at the 5% level.


2.2.2 Joint Tests: Ljung-Box's Test

Ljung and Box (1978) propose the statistic:

$$Q_{lb} = T(T+2)\sum_{j=1}^{k}\frac{\hat\rho_j^2}{T-j}$$

Under the null hypothesis, $Q_{lb} \sim \chi^2(k)$, so we can test the null hypothesis by comparing the $Q_{lb}$ statistic with the chi-squared distribution.
Summary
White noise processes have $\rho_k = 0$ for all $k > 0$, and we can use the SACF to test the hypothesis that a process is white noise. For an individual test, $\hat\rho_k \overset{H_0}{\sim} N(0, V[\hat\rho_k])$, so we fail to reject when $|Z| < Z_{\alpha/2}$, where $Z = (\hat\rho_k - 0)/\sqrt{V[\hat\rho_k]}$ and $V[\hat\rho_k] = 1/T + (2/T)\sum_{j=1}^{k-1}\hat\rho_j^2$. For a joint test, $Q_{lb} \sim \chi^2(k)$, so we fail to reject when $Q_{lb} < \chi^2_{1-\alpha}(k)$, where $Q_{lb} = T(T+2)\sum_{j=1}^{k}\hat\rho_j^2/(T-j)$.

2.2.3 White Noise Test Example

Consider again $(\hat\rho_1, \hat\rho_2, T) = (0.1512, 0.0697, 100)$. Suppose we want to jointly test $H_0: \rho_1 = \rho_2 = 0$. We find:

$$Q_{lb} = (100)(100+2)\left[\frac{0.1512^2}{100-1} + \frac{0.0697^2}{100-2}\right] = 2.86$$

At $\alpha = 0.05$, $\chi^2_{2,0.95} = 5.99147$. We would fail to reject since $2.86 < 5.99$.
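To verify the arithmetic in the two examples above, here is a minimal sketch that reproduces the Bartlett $Z$ and Ljung-Box $Q_{lb}$ values by hand (for an actual series, Box.test() in base R computes the Ljung-Box statistic directly):

## Hand computation of Bartlett's Z and the Ljung-Box Q for the example values
rho.hat <- c(0.1512, 0.0697)
T <- 100
## Bartlett variances: 1/T for k = 1, (1 + 2*rho1^2)/T for k = 2
v <- c(1/T, (1 + 2 * rho.hat[1]^2)/T)
rho.hat/sqrt(v)                         ## approximately 1.51 and 0.68
## Ljung-Box joint statistic for k = 2 and its critical value
T * (T + 2) * sum(rho.hat^2/(T - 1:2))  ## approximately 2.86
qchisq(0.95, df = 2)                    ## 5.99, so we fail to reject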
2.2.4 A Simulated Illustration

First, I will generate data for a white noise process, a random walk, and a random walk with drift. Then, I compute Bartlett's Z statistic and the Ljung-Box Q statistic for the white noise series and print the results in a table.
set.seed(42)
## Generate data
y.wn <- arima.sim(n = 50, list(order = c(0, 0, 0)))
y.rw <- ts(cumsum(rnorm(100)))
y.rwd <- ts(cumsum(rnorm(100, mean = -0.25)))
## Set maximum lag to be considered
lag.max <- 5
## Extract the length of the series, T
T <- length(y.wn)
## Extract the values of rho computed by acf()
rho.hat <- acf(y.wn, lag.max = lag.max, plot = FALSE)$acf
## Determine how many lags there are
K <- length(rho.hat)
## Compute the variance of each rho value
var.rho.hat <- c(NA, 1/T, (1 + cumsum(2 * rho.hat[3:K]^2))/T)
## Compute the Ljung-Box Q statistics
Q.lb <- c(NA, T * (T + 2) * cumsum((rho.hat^2/(T - 0:lag.max))[-1]))
## Compute Bartlett's Z statistic
Z.Bartlett <- c(NA, (rho.hat/sqrt(var.rho.hat))[-1])
## Compute the critical values for the Q statistics
Q.crit <- c(NA, qchisq(0.95, df = 1:(K - 1)))
## Print to a table
foo <- data.frame(lag = 0:lag.max, ACF = rho.hat, var.rho.hat,
                  Z.Bartlett, Q.lb, Q.crit)
names.foo <- c("\\text{Lag} $(k)$", "$\\hat\\rho_k$", "$Var[\\hat\\rho_k]$",
               "\\text{Bartlett's} $Z$", "$\\mathcal{Q}_{lb}$", "$\\chi^2_{k,0.95}$")
knitr::kable(foo, caption = "ACF Summary (White Noise).", col.names = names.foo,
             escape = FALSE)
Table 1: ACF Summary (White Noise).

| Lag $(k)$ | $\hat\rho_k$ | $Var[\hat\rho_k]$ | Bartlett's $Z$ | $Q_{lb}$ | $\chi^2_{k,0.95}$ |
|-----------|--------------|-------------------|----------------|-----------|--------------------|
| 0 | 1.0000000 | NA        | NA        | NA        | NA        |
| 1 | 0.0586076 | 0.0200000 | 0.4144185 | 0.1822575 | 3.841459  |
| 2 | 0.0305210 | 0.0200373 | 0.2156152 | 0.2327154 | 5.991465  |
| 3 | 0.0117585 | 0.0200428 | 0.0830567 | 0.2403640 | 7.814728  |
| 4 | 0.0677539 | 0.0202264 | 0.4764037 | 0.4998325 | 9.487729  |
| 5 | 0.0952305 | 0.0205892 | 0.6636771 | 1.0238109 | 11.070498 |

Finally, I plot each process and its first difference along with the ACF of both.
par(mfrow=c(2,2))
plot(y.wn,main="WN, No Differencing",ylab="y")
acf(y.wn,main="ACF(WN), No Differencing")
plot(diff(y.wn,differences=1),main="WN, One Difference",ylab="y")
acf(diff(y.wn,differences=1),main="ACF(WN), One Difference")

[Figure: four panels — "WN, No Differencing", "ACF(WN), No Differencing", "WN, One Difference", "ACF(WN), One Difference".]

par(mfrow=c(2,2))
plot(y.rw,main="RW, No Differencing",ylab="y")
acf(y.rw,main="ACF(RW), No Differencing")
plot(diff(y.rw,differences=1),main="RW, One Difference",ylab="y")
acf(diff(y.rw,differences=1),main="ACF(RW), One Difference")

[Figure: four panels — "RW, No Differencing", "ACF(RW), No Differencing", "RW, One Difference", "ACF(RW), One Difference".]

par(mfrow=c(2,2))
plot(y.rwd,main="RWD, No Differencing",ylab="y")
acf(y.rwd,main="ACF(RWD), No Differencing")
plot(diff(y.rwd,differences=1),main="RWD, One Difference",ylab="y")
acf(diff(y.rwd,differences=1),main="ACF(RWD), One Difference")

[Figure: four panels — "RWD, No Differencing", "ACF(RWD), No Differencing", "RWD, One Difference", "ACF(RWD), One Difference".]

3 Lecture 3: Random Walks, Unit Root Tests, and Spurious Relationships (2017-01-17)

3.1 Overview

A random walk describes a series of random steps in any direction, given by $y_t = y_{t-1} + \varepsilon_t$, $\varepsilon_t \sim N(0, \sigma^2)$. If we rewrite our random walk process by recursive substitution:

$$\begin{aligned}
y_t &= y_{t-1} + \varepsilon_t \\
&= y_{t-2} + \varepsilon_{t-1} + \varepsilon_t \\
&= y_{t-j} + \sum_{i=t-j+1}^{t}\varepsilon_i \\
&= y_0 + \sum_{i=1}^{t}\varepsilon_i
\end{aligned}$$

$$\text{Example: } y_t|_{t=3} = y_2 + \varepsilon_3 = y_1 + \varepsilon_2 + \varepsilon_3 = y_0 + \varepsilon_1 + \varepsilon_2 + \varepsilon_3$$

$\sum_{i=1}^{t}\varepsilon_i$ is a cumulative sum of the normally distributed stochastic errors, so we can apply cumsum() to a white noise process to simulate a random walk series. Random walks play a central role in time series analysis because, for example, if prices follow a random walk we cannot consistently outperform the market net of transaction costs, and the optimal investment strategy would be to buy and hold an index fund. Classical inference fails for random walk processes, so we need to know how to properly test for the presence of a random walk.

Summary
A random walk is given by $y_t = y_{t-1} + \varepsilon_t$, $\varepsilon_t \sim N(0, \sigma^2)$. We can recursively substitute out $y_{t-1}$ until $y_t = y_0 + \varepsilon_1 + \cdots + \varepsilon_t = y_0 + \sum_{i=1}^{t}\varepsilon_i$, or $y_t = y_{t-j} + \varepsilon_{t-j+1} + \cdots + \varepsilon_t = y_{t-j} + \sum_{i=t-j+1}^{t}\varepsilon_i$.

3.2 Properties of a Random Walk

We now derive the unconditional mean, variance, and covariance of a random walk process. The mean is given by:

$$E[y_t] = E\left[y_0 + \sum_{i=1}^{t}\varepsilon_i\right] = y_0$$

$$\text{Example: } E[y_t|t=3] = E\left[y_0 + \sum_{i=1}^{3}\varepsilon_i\right] = E[y_0] + E[\varepsilon_1 + \varepsilon_2 + \varepsilon_3] = y_0$$

Conditional on the observed series, the mean of a future value is the last observed value:

$$E[y_{t+s}|\{y_\tau\}_{\tau=1}^{t}] = E[y_t + \varepsilon_{t+1} + \cdots + \varepsilon_{t+s}|\{y_\tau\}_{\tau=1}^{t}] = y_t + E\left[\sum_{i=1}^{s}\varepsilon_{t+i}\right] = y_t$$

$$\text{Example: } E[y_{t+s}|\{y_\tau\}; t=2, s=1] = E[y_2 + \varepsilon_3|y_1, y_2] = y_2$$
The variance is given by:

$$\begin{aligned}
V[y_t] &= E[(y_t - E[y_t])^2] = E\left[\left(y_0 + \sum_{i=1}^{t}\varepsilon_i - y_0\right)^2\right] \\
&= E\left[\left(\sum_{i=1}^{t}\varepsilon_i\right)\left(\sum_{i=1}^{t}\varepsilon_i\right)\right] \\
&= E\left[\sum_{i=1}^{t}\varepsilon_i^2 + \sum_{i}\sum_{j \neq i}\varepsilon_i\varepsilon_j\right] \\
&= \sum_{i=1}^{t}E[(\varepsilon_i - E[\varepsilon_i])^2] \\
&= t\sigma^2
\end{aligned}$$

$$\begin{aligned}
\text{Example: } V[y_t|t=3] &= E[(\varepsilon_1 + \varepsilon_2 + \varepsilon_3)(\varepsilon_1 + \varepsilon_2 + \varepsilon_3)] \\
&= E[\varepsilon_1^2 + \varepsilon_1\varepsilon_2 + \varepsilon_1\varepsilon_3 + \varepsilon_2^2 + \varepsilon_2\varepsilon_1 + \varepsilon_2\varepsilon_3 + \varepsilon_3^2 + \varepsilon_3\varepsilon_1 + \varepsilon_3\varepsilon_2] \\
&= E\left[\sum_{i=1}^{3}\varepsilon_i^2 + \sum_{i=1}^{3}\sum_{j \neq i}\varepsilon_i\varepsilon_j\right] = E\left[\sum_{i=1}^{3}\varepsilon_i^2\right] \\
&= E[\varepsilon_1^2 + \varepsilon_2^2 + \varepsilon_3^2] \\
&= E[(\varepsilon_1 - E[\varepsilon_1])^2 + (\varepsilon_2 - E[\varepsilon_2])^2 + (\varepsilon_3 - E[\varepsilon_3])^2] \\
&= \sigma^2 + \sigma^2 + \sigma^2 = t\sigma^2
\end{aligned}$$

Similarly, for an earlier observation:

$$\begin{aligned}
V[y_{t-s}] &= E[(y_{t-s} - E[y_{t-s}])^2] = E\left[\left(y_0 + \sum_{i=1}^{t-s}\varepsilon_i - y_0\right)^2\right] \\
&= E\left[\left(\sum_{i=1}^{t-s}\varepsilon_i\right)\left(\sum_{i=1}^{t-s}\varepsilon_i\right)\right] \\
&= E\left[\sum_{i=1}^{t-s}\varepsilon_i^2 + \sum_{i}\sum_{j \neq i}\varepsilon_i\varepsilon_j\right] \\
&= \sum_{i=1}^{t-s}E[(\varepsilon_i - E[\varepsilon_i])^2] \\
&= (t-s)\sigma^2
\end{aligned}$$

$$\begin{aligned}
\text{Example: } V[y_{t-s}|t=3, s=2] &= E[(y_1 - E[y_1])^2] = E[(y_0 + \varepsilon_1 - y_0)^2] \\
&= E[\varepsilon_1^2] = E[(\varepsilon_1 - E[\varepsilon_1])^2] \\
&= \sigma^2 = (3-2)\sigma^2
\end{aligned}$$
The covariance is given by:

$$\begin{aligned}
\mathrm{cov}(y_t, y_{t-s}) &= E[(y_t - E[y_t])(y_{t-s} - E[y_{t-s}])] \\
&= E\left[\left(y_0 + \sum_{i=1}^{t}\varepsilon_i - y_0\right)\left(y_0 + \sum_{i=1}^{t-s}\varepsilon_i - y_0\right)\right] \\
&= E\left[\left(\sum_{i=1}^{t}\varepsilon_i\right)\left(\sum_{i=1}^{t-s}\varepsilon_i\right)\right] \\
&= E\left[\sum_{i=1}^{t-s}\varepsilon_i^2 + \sum_{i}\sum_{j \neq i}\varepsilon_i\varepsilon_j\right] \\
&= \sum_{i=1}^{t-s}E[(\varepsilon_i - E[\varepsilon_i])^2] \\
&= (t-s)\sigma^2
\end{aligned}$$

$$\begin{aligned}
\text{Example: } \mathrm{cov}[y_t, y_{t-s}|t=3, s=2] &= E[(y_3 - E[y_3])(y_1 - E[y_1])] \\
&= E[(y_0 + \varepsilon_1 + \varepsilon_2 + \varepsilon_3 - y_0)(y_0 + \varepsilon_1 - y_0)] \\
&= E[\varepsilon_1^2 + \varepsilon_2\varepsilon_1 + \varepsilon_3\varepsilon_1] \\
&= E[\varepsilon_1^2] + E[\varepsilon_2\varepsilon_1] + E[\varepsilon_3\varepsilon_1] \\
&= E[(\varepsilon_1 - E[\varepsilon_1])^2] \\
&= \sigma^2 = (3-2)\sigma^2
\end{aligned}$$
We have shown that the variance and covariance of a random walk depend on time. A process can be
stationary only if its moments are time invariant so a random walk cannot be stationary.
Summary
The expected value of a random walk is $E[y_t] = E[y_{t-1} + \varepsilon_t] = y_0$, found by recursively substituting out $y_{t-1}$. The variance of a random walk is $V[y_t] = E[(y_t - E[y_t])^2] = t\sigma^2$, using the expectation of a random walk and the usual variance and expectation rules. The covariance of a random walk is found the same way: $\mathrm{cov}[y_t, y_{t-s}] = E[(y_t - E[y_t])(y_{t-s} - E[y_{t-s}])] = (t-s)\sigma^2$.
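A quick simulation (my own sketch, not part of the original notes) illustrates the result that $V[y_t] = t\sigma^2$: across many replications of a random walk with $\sigma^2 = 1$, the sample variance of $y_t$ should be close to $t$.

## Simulate many random walks and check that Var(y_t) is roughly t*sigma^2
set.seed(42)
R <- 5000     ## number of replications
t.max <- 100
## Each column of Y is one random walk of length t.max with sigma^2 = 1
Y <- apply(matrix(rnorm(R * t.max), nrow = t.max, ncol = R), 2, cumsum)
## Sample variance of y_t across replications at t = 25, 50, 100
apply(Y[c(25, 50, 100), ], 1, var)   ## should be close to 25, 50, 100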

3.3 The Autocorrelation Function for a Random Walk

The autocorrelation function for a finite sample of data generated by a random walk is given by:

$$\rho_s = \frac{\mathrm{cov}[y_t, y_{t-s}]}{\sqrt{V[y_t]}\sqrt{V[y_{t-s}]}} = \frac{(t-s)\sigma^2}{\sqrt{t\sigma^2(t-s)\sigma^2}} = \frac{t-s}{\sqrt{t(t-s)}} = \sqrt{\frac{t-s}{t}} < 1$$

Since the sample autocorrelation function is an unbiased estimator of the ACF, $\rho_s < 1$ will be reflected in the sample ACF as well:

$$\hat\rho_s = \frac{\sum_{t=s+1}^{T}(y_t - \bar{y})(y_{t-s} - \bar{y})}{\sum_{t=1}^{T}(y_t - \bar{y})^2}$$

3.4 Classical Least Squares Estimators and Random Walks

Consider the OLS estimator for the model $y = X\beta + u$, here a regression of $y_t$ on its own first lag. The slope estimate is essentially the lag-one sample autocorrelation:

$$\hat\beta_1 = \frac{\sum_{t=2}^{T}(y_t - \bar{y})(y_{t-1} - \bar{y})}{\sum_{t=1}^{T}(y_t - \bar{y})^2}$$

We know that $\hat\rho_1$ is an unbiased estimator of $\rho_1 = \sqrt{(t-1)/t} < 1$, and so for the OLS estimator $E[\hat\beta_1] \approx \rho_1 < 1$. However, this cannot be the truth, since the random walk process has an implicit coefficient of one: $y_t = 1 \cdot y_{t-1} + \varepsilon_t$. In other words, OLS is biased downwards if the underlying stochastic process is a random walk. Suppose that we have $\hat\beta_1 = 0.87$ and we would like to test $H_0: \beta_1 = 1$ using $t = (\hat\beta_1 - 1)/SE(\hat\beta_1)$. The Student-$t$ distribution requires $E[\hat\beta_1] = 1$ under the null, which we have shown is not true. We should observe $E[t] = 0$, but in fact it will be less than zero, rendering classical inference useless here. We can confirm this with a Monte Carlo simulation:
## The R package dyn implements a version of lm() that can
## handle ts() objects, dyn$lm().
require(dyn)
## Loading required package: dyn
## Create a vector for storing the t-statistics
t2 <- numeric()
## Conduct 1000 Monte Carlo replications
for (i in 1:1000) {
## Generate a random walk (y=lag(y,-1)+epsilon, coefficient on
## lag(y,-1) is 1)
y <- ts(cumsum(rnorm(100)))
## Fit a linear model via OLS regressing y on its lagged value
model <- dyn$lm(y ~ lag(y, -1))
## Compute the t-statistic for a test of the hypothesis
## beta2=1 using the OLS asymptotics
t2[i] <- (coef(model)[2] - 1)/sqrt(diag(vcov(model)))[2]
}
t2 <- sort(t2)
## Plot the empirical density of the vector of t-statistics
## and the (incorrect) asymptotic distribution
plot(density(t2, bw = "nrd"), xlab = "$t$-statistic", main = "")
lines(t2, dt(t2, df = 100), col = 2, lty = 2)
legend("topleft", c("Empirical Distribution", "Student-$t$ Distribution"),
lty = 1:2, col = 1:2, bty = "n")

[Figure: empirical density of the Monte Carlo $t$-statistics compared with the Student-$t$ distribution; the empirical distribution is skewed to the left.]

Summary
We can use the variance and covariance formulas of a random walk process to determine the ACF of a random walk: $\rho_s = \mathrm{cov}[y_t, y_{t-s}]/(\sqrt{V[y_t]}\sqrt{V[y_{t-s}]}) = \sqrt{(t-s)/t} < 1$. Since $E[\hat\rho_s] = \rho_s < 1$, classical regression applied to a random walk is biased downwards, given that a random walk has $\beta = 1$ by definition. The empirical distribution of the classical test statistic, $t = (\hat\beta - 1)/SE(\hat\beta)$, applied to a random walk is skewed to the left.

3.5 Unit Root Tests

A unit root is a feature of a random walk process: shocks in the distant past never stop influencing the future, and classical statistical inference breaks down. Consider the model $y_t = \beta y_{t-1} + \varepsilon_t$. We can't test for a random walk using a Student-$t$ distribution because classical inference for $H_0: \beta = 1$ is biased downwards, and we would incorrectly reject the unit root hypothesis in the presence of a unit root. Dickey and Fuller (1979) consider the random walk model differenced once:

$$\begin{aligned}
\Delta y_t &= \beta y_{t-1} + \varepsilon_t - y_{t-1} \\
&= (\beta - 1)y_{t-1} + \varepsilon_t \\
&= \gamma y_{t-1} + \varepsilon_t
\end{aligned}$$

The differenced model is rendered stationary, and we can still test the null $H_0: \beta = 1$. Why? A white noise process does not contain a unit root, so testing the null now involves testing $H_0: \gamma = 0$ to show that the differenced process is indeed white noise. Dickey and Fuller considered three different regression equations to test for the presence of a unit root:

$$\begin{aligned}
\Delta y_t &= \gamma_1 y_{t-1} + \varepsilon_t && \text{Random Walk} \\
\Delta y_t &= \gamma_0 + \gamma_1 y_{t-1} + \varepsilon_t && \text{Random Walk with Drift} \\
\Delta y_t &= \gamma_0 + \gamma_1 y_{t-1} + \gamma_2 t + \varepsilon_t && \text{Random Walk with Drift and Trend}
\end{aligned}$$
The Dickey-Fuller (DF) unit root test uses special critical values for the test statistic $\tau = \hat\gamma_1/SE(\hat\gamma_1)$. The test statistic does not have a symmetric distribution, so both upper ($\tau_U$) and lower ($\tau_L$) critical values must be checked:
If $\tau < \tau_L$ we reject the null that $\gamma = 0$ ($\beta = 1$) of a random walk and conclude the variable is stationary.
If $\tau > \tau_U$ we reject the null that $\gamma = 0$ ($\beta = 1$) of a random walk and conclude the variable is nonstationary and explosive.
If $\tau_L < \tau < \tau_U$ we fail to reject the null, with some probability of a Type II error ($\Pr(\text{fail to reject} \mid H_0 \text{ false})$).
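As an illustration of the three regressions (a sketch assuming the urca package is installed; its ur.df() function estimates these three specifications and reports Dickey-Fuller critical values), we could run:

## ur.df() from the urca package runs the three Dickey-Fuller regressions:
## type = "none" (random walk), "drift", and "trend"
library(urca)
set.seed(42)
y <- cumsum(rnorm(200))                      ## a pure random walk
summary(ur.df(y, type = "none", lags = 0))   ## Delta y_t = gamma*y_{t-1} + e_t
summary(ur.df(y, type = "drift", lags = 0))  ## adds an intercept
summary(ur.df(y, type = "trend", lags = 0))  ## adds an intercept and a trend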
We can also consider an augmented Dickey-Fuller (ADF) unit root test:

$$\Delta y_t = \gamma_0 + \gamma_1 y_{t-1} + \gamma_2 t + \sum_{j=1}^{p-1}\beta_j\Delta y_{t-j} + \varepsilon_t$$

The ADF framework can also be used to test joint hypotheses involving $\gamma = 0$ by forming an $F$-ratio. The $F$-ratio no longer has the standard $F$ distribution under the null hypothesis, so the ADF test uses a distribution for the $F$-statistic derived under the null of a unit root. In R, we can run an ADF test directly (adf.test() is provided by the tseries package):
library(tseries)
e <- ts(scan(file="https://socserv.mcmaster.ca/racinej/762/files/usdmspot.dat"))
adf.test(e)
##
## Augmented Dickey-Fuller Test
##
## data: e
## Dickey-Fuller = -1.7591, Lag order = 11, p-value = 0.6803
## alternative hypothesis: stationary

3.6 Random Walks and Spurious Regression

Granger and Newbold (1974) point out the problem of spurious regression: regressions of statistically independent random walks on one another frequently produce high $R^2$ values and apparently significant coefficients. We can see this in R:
## Generate two independent time series that have unit roots
n <- 50
set.seed(42)
x <- ts(cumsum(rnorm(n)))
y <- ts(cumsum(rnorm(n)))
model.ts <- dyn$lm(y~x)
summary(model.ts)
## 
## Call:
## lm(formula = dyn(y ~ x))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.0402 -0.8390  0.3903  1.1701  3.1612 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.93945    0.31171  15.846  < 2e-16 ***
## x           -0.44794    0.07835  -5.717 6.76e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.754 on 48 degrees of freedom
## Multiple R-squared:  0.4051, Adjusted R-squared:  0.3927
## F-statistic: 32.68 on 1 and 48 DF,  p-value: 6.763e-07
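The single regression above could be dismissed as a fluke, so the following sketch (my own illustration, not from the notes) repeats the experiment many times and records how often the slope on x is "significant" at the 5% level; for independent random walks this frequency is far above 5%.

## Monte Carlo: how often does regressing one independent random walk on
## another yield a "significant" slope at the 5% level?
set.seed(42)
n <- 50
reject <- logical(1000)
for (i in 1:1000) {
    x <- cumsum(rnorm(n))
    y <- cumsum(rnorm(n))
    reject[i] <- summary(lm(y ~ x))$coefficients[2, 4] < 0.05
}
mean(reject)  ## rejection frequency, typically well above 0.05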
Summary
A unit root is present in the equation $y_t = \beta y_{t-1} + \varepsilon_t$ when $\beta = 1$, which implies a random walk. Testing $\beta$ using the $t$-distribution will be biased since $E[\hat\beta] < 1$, so we difference our process to render it stationary, $\Delta y_t = \beta y_{t-1} + \varepsilon_t - y_{t-1} = \gamma y_{t-1} + \varepsilon_t$, and test $\gamma = 0$ to show $\beta = 1$. We use the asymmetric Dickey-Fuller distribution with $\tau_L < \tau = \hat\gamma/SE(\hat\gamma) < \tau_U$. To test a joint hypothesis (for example, for a random walk with drift) we use an augmented Dickey-Fuller test. Spurious regression is a case where two independently generated random walks with unit roots display significant correlation or a high $R^2$ value.

4 Lecture 4: Univariate Linear Time Series Models (2017-01-19)

4.1 Overview

Univariate linear time series models are widely used for forecasting a data series when little is known about the causal relationships between the variable to be forecasted and potential explanatory variables. The goal is to forecast by modeling the underlying stochastic process of the series. We work with stationary processes because the time-invariant nature of their moments means they can be modeled with fixed coefficients. For example, we use $\mathrm{ACF} = \rho_k = \gamma_k/\gamma_0$.

4.2 Moving Average Models (MA(q))

4.2.1 Structure of MA(q) Processes

A time series generated by a moving average process of order $q$ is expressed as:

$$y_t = \mu + \varepsilon_t - \theta_1\varepsilon_{t-1} - \cdots - \theta_q\varepsilon_{t-q}, \quad \varepsilon_t \sim (0, \sigma^2)$$

We define the backward shift operator $B$ so that $Ba_t = a_{t-1}$, $B^2 a_t = a_{t-2}$, and $B^i a_t = a_{t-i}$. We can express an MA(q) process as:

$$\begin{aligned}
y_t &= \mu + \varepsilon_t - \theta_1\varepsilon_{t-1} - \cdots - \theta_q\varepsilon_{t-q} \\
&= \mu + \varepsilon_t - \theta_1 B\varepsilon_t - \cdots - \theta_q B^q\varepsilon_t \\
&= \mu + (1 - \theta_1 B - \cdots - \theta_q B^q)\varepsilon_t \\
&= \mu + \theta(B)\varepsilon_t
\end{aligned}$$

Here $\theta(B)$ is a polynomial in the backward shift operator.

4.2.2 Properties of MA(q) Processes

A moving average process is characterized by dependence on a finite past, given by the order of the process. The expected value of an MA(q) process is:

$$\begin{aligned}
E[y_t] &= E[\mu + \varepsilon_t - \theta_1\varepsilon_{t-1} - \cdots - \theta_q\varepsilon_{t-q}] \\
&= \mu + E[\varepsilon_t] - \theta_1 E[\varepsilon_{t-1}] - \cdots - \theta_q E[\varepsilon_{t-q}] \\
&= \mu
\end{aligned}$$

The variance of a moving average process is:

$$\begin{aligned}
\gamma_0 = V[y_t] &= E[(y_t - E[y_t])^2] \\
&= E[(\mu + \varepsilon_t - \theta_1\varepsilon_{t-1} - \cdots - \theta_q\varepsilon_{t-q} - \mu)^2] \\
&= E[(\varepsilon_t - \theta_1\varepsilon_{t-1} - \cdots - \theta_q\varepsilon_{t-q})(\varepsilon_t - \theta_1\varepsilon_{t-1} - \cdots - \theta_q\varepsilon_{t-q})] \\
&= E\left[\varepsilon_t^2 + \sum_{i=1}^{q}\theta_i^2\varepsilon_{t-i}^2 + \text{cross-product terms}\right] \\
&= E[(\varepsilon_t - E[\varepsilon_t])^2] + \sum_{i=1}^{q}\theta_i^2 E[(\varepsilon_{t-i} - E[\varepsilon_{t-i}])^2] \\
&= \sigma^2 + \sum_{i=1}^{q}\theta_i^2\sigma^2 \\
&= \left(1 + \sum_{i=1}^{q}\theta_i^2\right)\sigma^2
\end{aligned}$$

$$\begin{aligned}
\text{Example: } \gamma_0 = V[y_t|t=3, q=2] &= E[(\mu + \varepsilon_3 - \theta_1\varepsilon_2 - \theta_2\varepsilon_1 - \mu)^2] \\
&= E[(\varepsilon_3 - \theta_1\varepsilon_2 - \theta_2\varepsilon_1)(\varepsilon_3 - \theta_1\varepsilon_2 - \theta_2\varepsilon_1)] \\
&= E[\varepsilon_3^2 + \theta_1^2\varepsilon_2^2 + \theta_2^2\varepsilon_1^2 + \text{cross-product terms}] \\
&= \sigma^2 + \theta_1^2\sigma^2 + \theta_2^2\sigma^2 \\
&= (1 + \theta_1^2 + \theta_2^2)\sigma^2
\end{aligned}$$
The covariance of a moving average process is given by:

$$\begin{aligned}
\gamma_k = \mathrm{cov}[y_t, y_{t-k}] &= E[(y_t - E[y_t])(y_{t-k} - E[y_{t-k}])] \\
&= E[(\varepsilon_t - \theta_1\varepsilon_{t-1} - \cdots - \theta_q\varepsilon_{t-q})(\varepsilon_{t-k} - \theta_1\varepsilon_{t-k-1} - \cdots - \theta_q\varepsilon_{t-k-q})] \\
&= (-\theta_k + \theta_{k+1}\theta_1 + \cdots + \theta_q\theta_{q-k})\sigma^2 \quad \text{for } k \le q, \quad \gamma_k = 0 \text{ for } k > q
\end{aligned}$$
Given our variance and covariance, the ACF for $k \le q$ (with $\rho_k = 0$ for $k > q$) becomes:

$$\rho_k = \frac{(-\theta_k + \theta_{k+1}\theta_1 + \cdots + \theta_q\theta_{q-k})\sigma^2}{\sqrt{\left(1 + \sum_{i=1}^{q}\theta_i^2\right)\sigma^2}\sqrt{\left(1 + \sum_{i=1}^{q}\theta_i^2\right)\sigma^2}} = \frac{-\theta_k + \theta_{k+1}\theta_1 + \cdots + \theta_q\theta_{q-k}}{1 + \theta_1^2 + \cdots + \theta_q^2}$$
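To illustrate, the following sketch (my own; note that arima.sim() uses the convention $y_t = \varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2}$, with plus signs on the MA coefficients) simulates an MA(2) process and shows the ACF dropping to roughly zero beyond lag $q = 2$:

## Simulate an MA(2) process; arima.sim() parameterizes the MA part with
## plus signs, i.e. y_t = e_t + 0.6*e_{t-1} + 0.3*e_{t-2}
set.seed(42)
theta <- c(0.6, 0.3)
y <- arima.sim(n = 500, list(ma = theta))
## Theoretical ACF: nonzero up to lag q = 2, zero afterwards
ARMAacf(ma = theta, lag.max = 5)
## The sample ACF should be close to the theoretical one
acf(y, lag.max = 5, plot = FALSE)$acf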

4.2.3 Stationarity and Forecasting MA(q) Processes

Given that the moments are time invariant, MA(q) models are stationary provided the variance is finite. By our definition of the variance of a moving average process, this will be true if $\theta_1^2 + \cdots + \theta_q^2 < \infty$. When forecasting we encounter two types of errors: we do not know the values of the future errors $\varepsilon_{T+i}$, and we do not know the true parameters of the process, which gives rise to errors in the parameter estimates.
4.2.3.1 Forecasting MA(q) Processes where $\varepsilon_{T-i}$, $\theta_i$ are Known

We consider the $l$-step forecast made at period $T$ and use $E[\varepsilon_{T+i}|\{y_t\}_{t=1}^{T}] = 0$ to form forecasts. We assume that the $\varepsilon_i$ for $i \le T$ and the $\theta_i$ are known. Then, the 1-step forecast is:

$$\hat{y}_{T+1} = E[\mu + \varepsilon_{T+1} - \theta_1\varepsilon_T - \cdots - \theta_q\varepsilon_{T-q+1}|\{y_t\}_{t=1}^{T}] = \mu - \theta_1\varepsilon_T - \cdots - \theta_q\varepsilon_{T-q+1}$$

The 1-step forecast error has expectation given by:

$$\begin{aligned}
E[\hat{y}_{T+1} - y_{T+1}|\{y_t\}_{t=1}^{T}] &= E[(\mu - \theta_1\varepsilon_T - \cdots - \theta_q\varepsilon_{T-q+1}) - (\mu + \varepsilon_{T+1} - \theta_1\varepsilon_T - \cdots - \theta_q\varepsilon_{T-q+1})] \\
&= -E[\varepsilon_{T+1}|\{y_t\}_{t=1}^{T}] \\
&= 0
\end{aligned}$$

The $l$-step forecast also has an expected forecast error of zero, so the forecasts are unbiased. The 1-step forecast error variance is given by:

$$\begin{aligned}
V[\hat{y}_{T+1} - y_{T+1}|\{y_t\}_{t=1}^{T}] &= E[(\hat{y}_{T+1} - y_{T+1} - E[\hat{y}_{T+1} - y_{T+1}])^2|\{y_t\}_{t=1}^{T}] \\
&= E[(\hat{y}_{T+1} - y_{T+1})^2|\{y_t\}_{t=1}^{T}] \\
&= E[((\mu - \theta_1\varepsilon_T - \cdots - \theta_q\varepsilon_{T-q+1}) - (\mu + \varepsilon_{T+1} - \theta_1\varepsilon_T - \cdots - \theta_q\varepsilon_{T-q+1}))^2|\{y_t\}_{t=1}^{T}] \\
&= E[\varepsilon_{T+1}^2|\{y_t\}_{t=1}^{T}] \\
&= E[(\varepsilon_{T+1} - E[\varepsilon_{T+1}])^2] \\
&= \sigma^2
\end{aligned}$$
The $l$-step forecast error variance is given by:

$$\begin{aligned}
V[\hat{y}_{T+l} - y_{T+l}|\{y_t\}_{t=1}^{T}] &= E[(\hat{y}_{T+l} - y_{T+l} - E[\hat{y}_{T+l} - y_{T+l}])^2|\{y_t\}_{t=1}^{T}] \\
&= E[(\hat{y}_{T+l} - y_{T+l})^2|\{y_t\}_{t=1}^{T}]
\end{aligned}$$

5 Lecture 5: Univariate Linear Time Series Models (2017-01-24)
