Ani Katchova
We have time series data on ppi (producer price index). Data are quarterly from 1960 to 2002.
Summary statistics: mean(ppi) = 64, mean(Δppi) = 0.464.
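The differenced variable is the quarter-to-quarter change, Δppi(t) = ppi(t) - ppi(t-1). A minimal sketch of that step follows; the function name and the sample values are hypothetical and are not the actual ppi data.

```python
from statistics import mean

def difference(series, d=1):
    """Apply first differencing d times: y_t - y_{t-1}."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

# Hypothetical values for illustration only (not the actual ppi series).
ppi_example = [25.2, 25.4, 25.9, 26.1, 26.8]
d_ppi = difference(ppi_example)

# Each round of differencing shortens the series by one observation.
print(len(ppi_example), len(d_ppi), round(mean(d_ppi), 3))
```

Setting d=2 differences twice, which corresponds to the D2.y variable used in the Dickey-Fuller regression for the differenced series.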
Graphs in Stata
[Figure: two time-series plots against time in quarters, 1960q1 to 2000q1. Left: original variable (ppi), producer price index, vertical axis from 20 to 120. Right: differenced variable (Δppi), producer price index, D, vertical axis from -4 to 4.]
Graphs in R of the original and differenced variable
Dickey-Fuller Test
                      Original variable   With trend     Differenced variable
Dependent variable    D.y or Δy_t         D.y or Δy_t    D2.y or Δ²y_t
Const                 0.5036*             0.5861*        0.2067*
L1.y or y_{t-1}       -0.0006             -0.0084
                      (-0.26)             (-0.793)
LD.y or Δy_{t-1}                                         -0.4452*
                                                         (-6.86)
Trend or t                                0.0050
Test statistics are in parentheses.
The Dickey-Fuller test shows that the original variable is not stationary but the differenced
variable is stationary, so we need to use differencing (d=1) in the ARIMA models.
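The regression behind the test can be sketched as follows. This is a minimal illustration, assuming the constant-only form Δy_t = const + gamma * y_{t-1} + e_t with no trend and no extra lags; the function name and the simulated random walk are hypothetical and are not the ppi data. The statistic gamma / se is compared with Dickey-Fuller critical values (roughly -2.9 at the 5% level for this form and a sample of this size), not with the usual t table.

```python
import math
import random

def dickey_fuller(y):
    """OLS fit of dy_t = const + gamma * y_{t-1} + e_t.

    Returns (const, gamma, se_gamma); the test statistic is gamma / se_gamma.
    """
    x = y[:-1]                                   # lagged level y_{t-1}
    dy = [b - a for a, b in zip(y, y[1:])]       # first difference dy_t
    n = len(dy)
    x_bar = sum(x) / n
    dy_bar = sum(dy) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (di - dy_bar) for xi, di in zip(x, dy))
    gamma = sxy / sxx
    const = dy_bar - gamma * x_bar
    rss = sum((di - const - gamma * xi) ** 2 for xi, di in zip(x, dy))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)
    return const, gamma, se_gamma

# Simulated random walk with drift (hypothetical data, not the ppi series).
random.seed(0)
walk = [0.0]
for _ in range(171):
    walk.append(walk[-1] + 0.5 + random.gauss(0, 1))
d_walk = [b - a for a, b in zip(walk, walk[1:])]

for label, series in [("level", walk), ("difference", d_walk)]:
    const, gamma, se = dickey_fuller(series)
    print(label, "gamma =", round(gamma, 4), "statistic =", round(gamma / se, 2))
```

Applying the same function to the differenced series corresponds to the third column of the table, where D2.y is regressed on LD.y.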
Correlograms, Autocorrelation function (ACF), and partial autocorrelation function (PACF)
Original variable (ppi)          Differenced variable (Δppi)
LAG    ACF      PACF             LAG    ACF      PACF
1      0.990    0.999            1      0.553    0.555
2      0.978    -0.555           2      0.336    0.066
3      0.966    -0.069           3      0.319    0.203
4      0.952    -0.209           4      0.217    -0.031
5      0.937    0.023            5      0.086    -0.130
6      0.923    0.125            6      0.153    0.149
7      0.908    -0.153           7      0.082    -0.118
8      0.894    0.114            8      -0.078   -0.213
9      0.880    0.210            9      -0.080   -0.051
10     0.866    0.049            10     0.023    0.166
ACF and PACF of original variable
[Figure: correlograms of the original variable for lags 0 to 40. Left: ACF, with Bartlett's formula for MA(q) 95% confidence bands. Right: PACF, with 95% confidence bands (se = 1/sqrt(n)).]
For the original variable, the ACF decays slowly (indicating non-stationarity) and the PACF cuts
off at lag 1 or 2.
ACF and PACF of the differenced variable
[Figure: correlograms of the differenced variable for lags 0 to 40, vertical axis from -0.20 to 0.60. Left: ACF, with Bartlett's formula for MA(q) 95% confidence bands. Right: PACF, with 95% confidence bands (se = 1/sqrt(n)).]
For the differenced variable, the ACF tails off and the PACF cuts off after lag 1, which suggests trying an AR(1) model.
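The quantities in these correlograms can be sketched as follows. This is a minimal illustration, assuming the standard sample autocorrelation and the Durbin-Levinson recursion for the PACF; statistical packages may use a different PACF estimator (for example, one based on regressions), so the numbers need not match the table exactly. The function names and the toy series are hypothetical.

```python
import math

def acf(y, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag."""
    n = len(y)
    y_bar = sum(y) / n
    dev = [v - y_bar for v in y]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

def pacf(y, max_lag):
    """Partial autocorrelations phi_kk via the Durbin-Levinson recursion."""
    r = acf(y, max_lag)
    phi_prev = []                      # coefficients of the order k-1 fit
    out = []
    for k in range(1, max_lag + 1):
        num = r[k - 1] - sum(phi_prev[j] * r[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append phi_kk.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out

# Toy series for illustration only (not the ppi data).
toy = [1.0, 2.0, 3.0, 4.0, 5.0]
band = 1.96 / math.sqrt(len(toy))      # approximate 95% band with se = 1/sqrt(n)
print(acf(toy, 2), pacf(toy, 2), round(band, 3))
```

A PACF value outside the band at lag 1 and inside it at higher lags is the "cuts off after lag 1" pattern described above.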
ARIMA Models
         ARIMA    ARIMA    ARIMA    ARIMA    ARIMA    ARIMA    ARIMA    ARIMA    ARIMA
         (1,0,0)  (2,0,0)  (0,0,1)  (1,0,1)  (1,1,0)  (0,1,1)  (1,1,1)  (1,1,3)  (2,1,3)
Const    64.37    64.18    64.69*   64.67*   0.46*    0.47*    0.43*    0.43*    0.44*
L1.ar    0.999*   1.64*             0.99*    0.55*             0.72*    0.73*    1.51*
L2.ar             -0.64*                                                         -0.71*
L1.ma                      1.00*    0.53*             0.48*    -0.25*   -0.24    -1.05*
L2.ma                                                                   -0.10    0.21
L3.ma                                                                   0.12     0.32*
AIC      502      424      1401     441      393      405      393      392      390
BIC      511      426      1420     543      402      414      406      411      412
* These are the Stata results. R gives very similar coefficients. In the SAS output, the MA components have the
opposite signs from those reported in this table, and some coefficients have different magnitudes.
We know that the variable is not stationary, so we need to use the differenced variable, that is,
ARIMA(p,1,q) models. Here we also include models for the original variable, ARIMA(p,0,q).
In the ARIMA(1,0,0) and ARIMA(1,0,1) models, the coefficient on the lagged dependent variable is close to 1 (0.999 and 0.99), indicating non-stationarity.
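For reference, an ARIMA(1,1,1) model can be written out as below. This is a sketch under an assumed convention: the constant μ is taken to be the mean of the differenced series (consistent with the reported mean of 0.464), and the MA term enters with a plus sign. The symbols μ, φ, θ and ε are notation introduced here, not taken from the table.

```latex
\begin{aligned}
\Delta y_t &= y_t - y_{t-1}, \\
\Delta y_t - \mu &= \phi\,(\Delta y_{t-1} - \mu) + \varepsilon_t + \theta\,\varepsilon_{t-1}.
\end{aligned}
```

Software that instead writes the MA part as ε_t - θ ε_{t-1} would report θ with the opposite sign, which may be what lies behind the note on the SAS output.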
To select a model to use, look at the significance of the coefficients and also low AIC or BIC.
Usually, there are a few models that perform similarly, so it is up to the researcher to try a few
models and decide which one to use. The recommendation is to go with the simplest model.
ARIMA(1,1,1) is a good choice based on low AIC and BIC.
ARIMA(2,1,3) is also a good choice based on the significance of the lags.
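The comparison by information criteria can be sketched as follows, using the AIC and BIC values from the table for the five differenced models. The dictionary layout and the function name are hypothetical conveniences.

```python
# AIC and BIC values copied from the ARIMA table for the d=1 models.
criteria = {
    (1, 1, 0): {"aic": 393, "bic": 402},
    (0, 1, 1): {"aic": 405, "bic": 414},
    (1, 1, 1): {"aic": 393, "bic": 406},
    (1, 1, 3): {"aic": 392, "bic": 411},
    (2, 1, 3): {"aic": 390, "bic": 412},
}

def rank_models(table, key):
    """Return model orders sorted from lowest to highest criterion value."""
    return sorted(table, key=lambda order: table[order][key])

best_aic = rank_models(criteria, "aic")[0]
best_bic = rank_models(criteria, "bic")[0]
print("lowest AIC:", best_aic)
print("lowest BIC:", best_bic)
```

The two criteria do not pick the same model here, and ARIMA(1,1,1) sits within 3 AIC points and 4 BIC points of the respective minimum, which illustrates why several models perform similarly and the final choice is left to the researcher.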
Forecasting in R
[Figure: two forecast plots. Left: original variable after ARIMA(1,0,1). Right: differenced variable after ARIMA(1,1,1).]
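How point forecasts are built from a fitted model can be sketched with the simpler ARIMA(1,1,0) specification, which is used here in place of ARIMA(1,1,1) because it needs no estimate of the last residual. The sketch assumes the constant is the mean of the differenced series; the coefficients 0.46 and 0.55 come from the ARIMA(1,1,0) column of the table, while the function name and the last two observations are hypothetical.

```python
def forecast_arima_110(last, prev, mu, phi, steps):
    """Point forecasts of the level from an ARIMA(1,1,0) model.

    Assumes (d_t - mu) = phi * (d_{t-1} - mu) + e_t, with d_t = y_t - y_{t-1}.
    """
    level, diff = last, last - prev
    path = []
    for _ in range(steps):
        diff = mu + phi * (diff - mu)   # forecast of the next difference
        level += diff                   # add it back to get the level
        path.append(level)
    return path

# Hypothetical last two observations; coefficients from the ARIMA(1,1,0) column.
print(forecast_arima_110(last=100.0, prev=99.0, mu=0.46, phi=0.55, steps=4))
```

As the horizon grows, the forecast difference approaches the mean mu, so the forecast of the level approaches a straight line with slope mu.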