Contents

Part I: Time Series

1 Intro
  1.1 Time Series
  1.2 Goal
  1.3 Objectives
  1.4 Time series data
  1.5 Time and Frequency
  1.6 Graphical Methods for Description
  1.7 Time Series Plot
2 Correlogram and Periodogram
  2.1.2 Properties
  2.2 Correlogram
    2.2.1 Short memory time series
  2.3 Periodogram
    2.3.1 Properties
    2.3.2 Computation
    2.3.3 Tapering
    2.3.4 Interpreting the Periodogram
    2.3.5 Transformation
3 Estimating Trends
  3.1 Concept
  3.2 Reg/Curve Fitting
  3.3 Filtering
  3.4 Differencing
    3.4.1 Second Differencing Operation
4 Stochastic Process
  4.1 Probability Theory
  4.2 Stochastic Process
    4.2.1 Probability Distribution
    4.2.2 Stationarity
5 Spectral Representation
  5.1 Application
  5.2 Important Stationary Stochastic Processes
  5.3 General Result
6 ARIMA Models
  6.1 Autoregressive Process
    6.1.1 AR(1)
    6.1.2 AR(1)
  6.2 Stationarity Condition
  6.3 Autocovariance Function
    6.3.1 Yule-Walker Equations
    6.3.2 Applying the Spectral Density Function
  6.4 Random Walk
  6.5 Unit Root AR(p) Process
7 Filtering Theorem
  7.1 Application to Moving Average and Autoregressive Processes
    7.1.1 AR(p)
    7.1.2 ARMA Process (Autoregressive Moving Average Process)
  7.2 Properties
  7.3 Summary
  7.4 Prediction and Partial Autocorrelations
    7.4.1 ρ(s) vs π(s)
    7.4.2 Computing π(s)
    7.4.3 Kolmogorov's Formula
Part I
Time Series
1 Intro
1.1 Time Series
Observations occur in a temporal order induced by time,
e.g. daily/hourly stock prices; an EEG potential recorded every 2 seconds;
monthly revenues of a business.
Univariate vs. multivariate time series (we concentrate on the univariate case).
Related: panel or longitudinal data, where n individuals are followed over time.
1.2 Goal
The goal is to understand the structure of the process generating the time series data.
Data may also be indexed by space (as well as time).
1.3 Objectives
1. Description: understand, in crude terms, the structure of the data.
2. Modeling / inference.
3. Prediction.
   Example (geostatistics): a variable is observed at some locations (marked x); use those data to predict its value at an unobserved location (marked o).
4. Control: assume the process has inputs and outputs (a multivariate time series).
   Example: input = interest rate; output = value of the dollar; use the interest rate to control the value of the dollar.
Process of TSA: description → modeling/inference → prediction or control.

The frequency-domain view represents a time series as a sum of sinusoids,
x_t = Σ_j [ A_j cos(2π w_j t) + B_j sin(2π w_j t) ],
where the A_j and B_j are random variables and the w_j are frequencies.
In practice the time domain approach is more useful for modeling and prediction,
but frequency domain approaches can provide insight for building time domain models.
Special case: white noise, a purely random process (no memory at all).
2. Long memory: the past gives significant information about the future,
   e.g. seasonal/monthly patterns (January 2015 is informative about January 2016 and 2017).
A small simulation contrasting a purely random series with a strongly seasonal one is sketched below.
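As a rough illustration (not from the notes; the sample size and the seasonal amplitude are arbitrary choices), the following R sketch simulates the two cases:

    # Illustrative sketch: white noise (no memory) vs. a strongly seasonal series (long memory).
    set.seed(1)
    n <- 240
    wn <- ts(rnorm(n), frequency = 12)                        # purely random process
    seasonal <- ts(2 * cos(2 * pi * (1:n) / 12) + rnorm(n),   # monthly cycle: Januaries resemble each other
                   frequency = 12)
    par(mfrow = c(2, 1))
    plot(wn, main = "White noise")
    plot(seasonal, main = "Seasonal series")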
2 Correlogram and Periodogram
Correlogram: time-domain approach.
Periodogram: frequency-domain approach.
2.1.2 Properties
Properties of the autocorrelation function ρ(s):
1. ρ(0) = 1
2. −1 ≤ ρ(s) ≤ 1 for all s
3. ρ(s) = ρ(−s) (even function)
4. ρ is positive semi-definite: for any constants a_1, …, a_n,
   Σ_{t=1}^{n} Σ_{s=1}^{n} a_t a_s ρ(s − t) ≥ 0.
2.2 Correlogram
Plot ρ̂(s) against s for s ∈ {0, …, max lag}.
R function: acf().
Horizontal lines are drawn at ±2/√n (approximate bounds under white noise).
This allows us to consider whether the data have the attributes of short- or long-term memory.
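A minimal sketch of the correlogram in R (the AR(1) coefficient and sample size are arbitrary illustrative choices):

    # Correlogram of a simulated short-memory series.
    set.seed(1)
    x <- arima.sim(model = list(ar = 0.7), n = 200)   # AR(1) with coefficient 0.7
    acf(x, lag.max = 30)                              # dashed horizontal bounds are roughly +/- 2/sqrt(n)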
2.2.1 Short memory time series
ρ(s) decays to 0 relatively quickly as s increases.
2.3 Periodogram
The primary motivation behind using a Periodogram is identifying periodicity/cycles within the
Time Series (TS)
Idea: given some frequency w (period = 1/w), look at the correlation between x_t and sinusoids with frequency w.
Assume a regression model for x_t:
x_t = β_0 + β_1 cos(2πwt) + β_2 sin(2πwt) + ε_t
The least-squares estimates are
β̂ = (AᵀA)⁻¹ Aᵀ x,  x = (x_1, …, x_n)ᵀ,
where A is the n × 3 design matrix with rows (1, cos(2πwt), sin(2πwt)).
If n is moderately large (i.e. not small), then approximately
β̂_1 ≈ (2/n) Σ_t (x_t − x̄) cos(2πwt)  and  β̂_2 ≈ (2/n) Σ_t (x_t − x̄) sin(2πwt).
Thus the estimated amplitude of the sinusoid of frequency w is √(β̂_1² + β̂_2²)
(the square root of the sum of squares of the two estimates above).
Repeating this for w between 0 and 1/2 gives the periodogram:
I(w) = (1/n) [ ( Σ_{t=1}^n (x_t − x̄) cos(2πwt) )² + ( Σ_{t=1}^n (x_t − x̄) sin(2πwt) )² ]
     = (1/n) | Σ_{t=1}^n (x_t − x̄) e^{−2πiwt} |²
The above two formulas give a fundamental formulation of the periodogram. Below we derive another:
I(w) = (1/n) | Σ_{t=1}^n (x_t − x̄) e^{−2πiwt} |²
     = (1/n) Σ_s Σ_t (x_t − x̄)(x_s − x̄) e^{−2πiwt} e^{2πiws}
     = (1/n) Σ_s Σ_t (x_t − x̄)(x_s − x̄) e^{−2πiw(t−s)}
Using the change of variables v = t − s and summing along the diagonals:
I(w) = Σ_{v=−(n−1)}^{n−1} [ (1/n) Σ_{t=1}^{n−|v|} (x_t − x̄)(x_{t+|v|} − x̄) ] e^{−2πiwv}
     = Σ_{v=−(n−1)}^{n−1} ĉ(v) e^{−2πiwv},
where ĉ(v) is the sample autocovariance at lag v.
The periodogram is very difficult to interpret directly (it is very noisy).
2.3.1 Properties
1. I(w) is completely determined by its values for 0 ≤ w ≤ 1/2 (w = 1/2 is the Nyquist frequency).
2. I(−w) = I(w)
3. I(w) = I(w + k) for every integer k
Note: the definition of I(w) often varies from book to book, e.g.
I(ω) = (1/(2πn)) | Σ_{t=1}^n (x_t − x̄) e^{−iωt} |²,  0 ≤ ω ≤ π.
2.3.2 Computation
At which frequencies w should we evaluate I(w)?
If n is not too small, evaluate I(w) at the Fourier frequencies w_k = (k − 1)/n for k ∈ {1, …, [n/2] + 1}.
If n can be factored into a product of small prime numbers, i.e. n = n_1 n_2 … n_p, then we can evaluate I(w) at the Fourier frequencies very efficiently (the fast Fourier transform). Best case: n = 2^p.
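A sketch of the direct computation at the Fourier frequencies using the FFT (the helper name periodogram() and the choice n = 256 are mine; base R's spec.pgram() does essentially the same job):

    # Periodogram I(w_k) at the Fourier frequencies w_k = (k-1)/n.
    periodogram <- function(x) {
      n <- length(x)
      d <- fft(x - mean(x))                  # DFT of the centred series
      I <- Mod(d)^2 / n                      # I(w_k) = |sum_t (x_t - xbar) e^{-2 pi i w_k t}|^2 / n
      k <- 1:(floor(n / 2) + 1)              # keep 0 <= w <= 1/2 (Nyquist)
      list(freq = (k - 1) / n, spec = I[k])
    }
    set.seed(1)
    x <- rnorm(256)                          # n = 2^8, the best case for the FFT
    p <- periodogram(x)
    plot(p$freq, log(p$spec + 1e-12), type = "l", xlab = "w", ylab = "ln I(w)")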
2.3.3 Tapering
The periodogram can be improved by tapering. With a taper {h_t},
I*(w) = (1/H) | Σ_{t=1}^n h_t (x_t − x̄) e^{−2πiwt} |²,  where H = Σ_{t=1}^n h_t².
2.3.4 Interpreting the Periodogram
Always plot ln I(w) or ln I*(w) against w.
2.3.5 Transformation
Original scale or transformed scale?
The goal is to transform the data so that the local variability is more or less constant.
If the time series is positive and takes values over several orders of magnitude, then taking logs is a good idea.
3 Estimating Trends
3.1 Concept
1. The trend itself may be very important to know.
2. Removing the trend allows us to find more interesting structure in the time series.
e.g. seasonal adjustment: time series x_t (monthly data), with model
x_t = T_t + S_t + ε_t
(a) T_t: trend (varies slowly over time)
(b) S_t: seasonal component (S_t = S_{t+12})
(c) ε_t: irregular component
Seasonally adjusted data (ideally):
x_t^{(adj)} = T_t + ε_t = x_t − S_t.
But we don't know S_t and T_t, so we estimate them:
find T̂_t and Ŝ_t so that x_t^{(adj)} = x_t − Ŝ_t.
(We do need T̂_t as well.)
Methods
1. Regression / curve fitting
2. Filtering (e.g. running averages)
Don't use this in practice unless you know what you are doing: it is very sensitive to small changes.
Non-parametric alternative: e.g. loess (local polynomial regression), available in R as loess(); a sketch of its use follows below.
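A sketch of loess-based trend estimation (the simulated series and the span value are arbitrary choices):

    # Estimate a slowly varying trend with loess and remove it.
    set.seed(1)
    t <- 1:120
    x <- 0.05 * t + 2 * cos(2 * pi * t / 12) + rnorm(120)   # trend + seasonal + noise
    fit <- loess(x ~ t, span = 0.4)                          # local polynomial regression
    trend <- fitted(fit)
    detrended <- x - trend
    plot(t, x, type = "l"); lines(t, trend, lwd = 2)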
3.3 Filtering
Estimate the trend f(t) by a weighted average of the values around t:
f̂(t) = Σ_{u=−r}^{r} c_u x_{t−u},
where Σ_{u=−r}^{r} c_u = 1 and {c_u} is called a filter.
e.g. the simple running average:
c_u = 1/(2r + 1) for −r ≤ u ≤ r, so that f̂(t) = ( Σ_{u=−r}^{r} x_{t−u} ) / (2r + 1).
Issues
1. Choice of filter and length of filter.
   Simple approach: start with a simple running average and apply it several times
   (iterating, as in the sketch below).
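A sketch of a simple running average applied repeatedly, via stats::filter (the helper name running_avg(), the value r = 3 and the use of two passes are arbitrary choices):

    # Length-(2r+1) running average; applying it twice gives a smoother (triangular-weight) filter.
    running_avg <- function(x, r) stats::filter(x, rep(1 / (2 * r + 1), 2 * r + 1), sides = 2)
    set.seed(1)
    x <- ts(cumsum(rnorm(200)) + rnorm(200))
    f1 <- running_avg(x, 3)        # one pass
    f2 <- running_avg(f1, 3)       # second pass
    ts.plot(x, f1, f2, col = c("grey", "blue", "red"))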
Filtering can also be written in matrix form: the vector of smoothed values is f̂ = A x, where x = (x_1, …, x_n)ᵀ. Conditions which the matrix A should satisfy:
e.g. if x_1, …, x_n are smooth, then A x ≈ x, i.e. the relevant eigenvalues of A should be ≈ 1, and the eigenvectors corresponding to those eigenvalues should be smooth functions.
Minimal requirements:
1. First property: constants and linear trends are preserved,
   A (1, 1, …, 1)ᵀ = (1, 1, …, 1)ᵀ  and  A (1, 2, …, n)ᵀ = (1, 2, …, n)ᵀ.
2. Second property: if
   (f̂(1), …, f̂(n))ᵀ = A (x_1, …, x_n)ᵀ
   then
   (f̂(n), …, f̂(1))ᵀ = A (x_n, …, x_1)ᵀ
   (A is called a centro-symmetric matrix: reversing the input reverses the output).
3.4 Differencing
Useful for removing a trend (so we can focus on other structure in the series).
y_t = x_t − x_{t−1} = x_t − B x_t = (1 − B) x_t = ∇ x_t
(B is the backshift operator; ∇ = 1 − B is the difference operator).
3.4.1 Second Differencing Operation
z_t = y_t − y_{t−1} = ∇ y_t = ∇² x_t
∇² x_t = (1 − B)² x_t = (1 − 2B + B²) x_t = x_t − 2 x_{t−1} + x_{t−2}
Idea behind how differencing removes trends: suppose the trend is polynomial,
x_t = F(t) = β_0 + β_1 t + β_2 t² + … + β_p t^p.
Then ∇^p x_t is constant.
e.g. p = 1: x_t = β_0 + β_1 t and x_{t−1} = β_0 + β_1 (t − 1), so ∇ x_t = x_t − x_{t−1} = β_1.
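A two-line check of this in R (β_0 = 5 and β_1 = 2 are arbitrary values):

    # Differencing a linear trend leaves the constant beta_1; a second difference gives zero.
    t <- 1:20
    x <- 5 + 2 * t
    diff(x)                     # all equal to 2
    diff(x, differences = 2)    # all equal to 0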
4 Stochastic Process
4.1 Probability Theory
The goal of time series analysis: descriptive methods → models → theory.
4.2.2 Stationarity
{x_t : t ∈ I} is strictly stationary if the distribution of (x_{t_1}, …, x_{t_p}) is the same as that of (x_{t_1+Δ}, …, x_{t_p+Δ}) for every Δ, p, and t_1, …, t_p such that t_1, …, t_p ∈ I and t_1+Δ, …, t_p+Δ ∈ I.
If in addition E(x_t²) < ∞, then
1. E(x_t) = μ(t) = μ (constant)
2. V(x_t) = σ²(t) = σ² (constant)
3. Cor(x_t, x_{t+s}) = γ(s)/γ(0) = ρ(s), where γ(s) = Cov(x_t, x_{t+s}) is the autocovariance function of {x_t}.
Strict stationarity is almost impossible to verify given a finite realization (i.e. a time series).
A weaker condition is 2nd-order stationarity (also called weak or covariance stationarity).
A stochastic process {x_t} is second-order stationary if E(x_t²) < ∞ and
1. E(x_t) = μ (constant)
2. V(x_t) = σ² (constant)
3. Cor(x_t, x_{t+s}) = γ(s)/γ(0) = ρ(s), a function of the lag s only.
5 Spectral Representation
{x_t} in discrete time (assume I = {0, 1, 2, 3, …}); assume {x_t} is (second-order) stationary. The spectral representation of the autocovariance function is
γ(s) = lim_{N→∞} Σ_{k=1}^{N} cos(2π (k/N) s) [ F(k/N) − F((k−1)/N) ] = ∫_0^1 cos(2πws) dF(w),
where F is the spectral distribution function.
Important notes:
1. Σ_{s=1}^∞ |γ(s)| < ∞ implies that γ(s) → 0 as s → ∞; in fact it requires that γ(s) converge to 0 reasonably fast.
   e.g. γ(s) = k/(1 + s^α) for some α > 0: the sum is finite only when α > 1.
2. When Σ_s |γ(s)| < ∞, F has a derivative f, the spectral density function (s.d.f.):
   f(w) = γ(0) + 2 Σ_{s=1}^∞ γ(s) cos(2πws),  γ(s) = ∫_0^1 cos(2πws) f(w) dw.
3. The relationship between f(w) and γ(s) is analogous to that between I(w) and ĉ(s):
   I(w) = ĉ(0) + 2 Σ_{s=1}^{n−1} ĉ(s) cos(2πws),  ĉ(s) = ∫_0^1 cos(2πws) I(w) dw.
This suggests that I(w) is an estimate of f(w). (It is, but not a good one.)
Examples
1.
2.
Note that for {y_t} the spectral density function does not exist, since
Σ_s |γ(s)| = A² + σ_x² + Σ_{s=1}^∞ |cos(2πws)| = ∞, because Σ_{s=1}^∞ |cos(2πws)| diverges.
Spectral Distribution Function
F is linear (slope σ_x²) with jumps at the frequencies w and 1 − w: the same slope on each segment, and the same jump of (1/4)A² at each of the two frequencies.
5.1 Application
Simulate a stationary stochastic process with a given s.d.f. f(w) (note: f(w) = f(1 − w)).
Take N very large. Define random variables
A_1, A_2, …, A_N, B_1, …, B_N
(2N random variables in total), with
E(A_k) = E(B_k) = 0 for all k,
Cov(A_j, A_k) = 0 for j ≠ k,
Cov(B_j, B_k) = 0 for j ≠ k,
Cov(A_j, B_k) = 0 for all j, k,
Var(A_k) = Var(B_k) = (1/N) f(k/N), where f is the s.d.f.
Define:
x_t = μ + Σ_{k=1}^{N} A_k cos(2π (k/N) t) + Σ_{k=1}^{N} B_k sin(2π (k/N) t).
Then
Var(x_t) = Σ_{k=1}^{N} [ Var(A_k) cos²(2π(k/N)t) + Var(B_k) sin²(2π(k/N)t) ] = (1/N) Σ_{k=1}^{N} f(k/N) ≈ ∫_0^1 f(w) dw (since N is large).
Likewise, Cov(x_t, x_{t+s}) ≈ ∫_0^1 f(w) cos(2πws) dw, by a similar calculation.
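A sketch of this construction in R (the s.d.f. f(w) = 1 + 0.8 cos(2πw) and the values of N and n are arbitrary illustrative choices):

    # Simulate a stationary series from a given s.d.f. via the A_k, B_k construction.
    f <- function(w) 1 + 0.8 * cos(2 * pi * w)          # a valid s.d.f. with f(w) = f(1 - w)
    N <- 500; n <- 200; k <- 1:N
    A <- rnorm(N, sd = sqrt(f(k / N) / N))              # Var(A_k) = f(k/N) / N
    B <- rnorm(N, sd = sqrt(f(k / N) / N))              # Var(B_k) = f(k/N) / N, independent of the A_k
    x <- sapply(1:n, function(t)
      sum(A * cos(2 * pi * (k / N) * t) + B * sin(2 * pi * (k / N) * t)))
    plot.ts(x)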
5.2 Important Stationary Stochastic Processes
1. White Noise
(a) Most stochastic processes used in modeling are driven by white noise.
(b) Time series models produce residuals; if the model fits the data well, the residuals should look like white noise (WN).
2. Moving Average Process
Let {ε_t} be a white noise process with E(ε_t) = 0 and Var(ε_t) = σ².
x_t = Σ_{u=0}^{q} β_u ε_{t−u}   or   x_t = μ + Σ_{u=0}^{q} β_u ε_{t−u}.
Typically we assume β_0 = 1, so that
x_t = μ + ε_t + Σ_{u=1}^{q} β_u ε_{t−u}.
We write: {x_t} is an MA(q) (moving average) process.
Autocovariance:
Cov(x_t, x_{t−s}) = Σ_{u=0}^{q} Σ_{v=0}^{q} β_u β_v Cov(ε_{t−u}, ε_{t−s−v}),
and so
γ(s) = σ² Σ_{u=0}^{q−|s|} β_u β_{u+|s|}  for |s| ≤ q,   γ(s) = 0  for |s| > q.
Graphical representation: the correlogram of an MA(q) process cuts off after lag q (a simulation illustrating this is sketched below).
S.d.f.:
f(w) = γ(0) + 2 Σ_{s=1}^{∞} γ(s) cos(2πws)
     = γ(0) + 2 Σ_{s=1}^{q} γ(s) cos(2πws)
     = σ² [ Σ_{u=0}^{q} β_u² + 2 Σ_{s=1}^{q} Σ_{u=0}^{q−s} β_u β_{u+s} cos(2πws) ]
     = σ² | Σ_{u=0}^{q} β_u e^{−2πiwu} |².
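The sketch below simulates an MA(2) and compares its sample ACF with the theoretical γ(s) (β_1 = 0.6, β_2 = −0.3, σ² = 1 and n = 500 are arbitrary choices):

    # MA(2): the ACF should be (near) zero beyond lag q = 2.
    set.seed(1)
    x <- arima.sim(model = list(ma = c(0.6, -0.3)), n = 500)
    acf(x, lag.max = 15)
    beta <- c(1, 0.6, -0.3)                                           # beta_0 = 1, beta_1, beta_2
    sapply(0:2, function(s) sum(beta[1:(3 - s)] * beta[(1 + s):3]))   # gamma(0), gamma(1), gamma(2); gamma(s) = 0 for s > 2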
6 ARIMA Models
6.1 Autoregressive Process
Autoregressive process of order one, abbreviated AR(1):
{ε_t} white noise (WN), E(ε_t) = 0, Var(ε_t) = σ².
x_t = α x_{t−1} + ε_t,  or  x_t = μ + α(x_{t−1} − μ) + ε_t, where ε_t is called the innovation.
Assume Cov(x_{t−1}, ε_t) = 0. Substituting repeatedly:
x_t = α(α x_{t−2} + ε_{t−1}) + ε_t
    = α² x_{t−2} + α ε_{t−1} + ε_t
    = α^s x_{t−s} + α^{s−1} ε_{t−s+1} + … + α ε_{t−1} + ε_t.
6.1.1 AR(1)
x_t = α x_{t−1} + ε_t,
{ε_t} white noise; ε_t is uncorrelated with x_{t−1}, x_{t−2}, …;
{x_t} is stationary when |α| < 1.
6.1.2 AR(1)
Generalization for AR(p):
if |a_1|, |a_2|, …, |a_p| < 1, then
x_t = ( Σ_{s=0}^{∞} a_1^s B^s ) ( Σ_{s=0}^{∞} a_2^s B^s ) ⋯ ( Σ_{s=0}^{∞} a_p^s B^s ) ε_t
    = ( Σ_{s=0}^{∞} b_s B^s ) ε_t
    = Σ_{s=0}^{∞} b_s ε_{t−s},
where the a_j come from factoring the AR polynomial (see the factorization of α(z) below).
Example:
f(w) = γ(0) + 2 Σ_{s=1}^{∞} γ(s) cos(2πws).
It is difficult to apply this formula directly here, but there are other, easier methods (see the filtering theorem below).
(Note that the final equality depends on t, and so the process is not stationary.)
We often write ∇x_t = ε_t, where ∇ = 1 − B.

6.5 Unit Root AR(p) Process
Define
α(z) = 1 − α_1 z − α_2 z² − … − α_p z^p = (1 − a_1 z)(1 − a_2 z) ⋯ (1 − a_p z),
and write α(z) = (1 − a_1 z) α*(z) with α*(z) = (1 − a_2 z) ⋯ (1 − a_p z).
Suppose a_1 = 1 (a unit root). Then α(B) x_t = ε_t means α*(B) ∇x_t = ε_t, so the differenced series ∇x_t is stationary while {x_t} itself is not.
The corresponding test has H_0: a_1 = 1 (unit root) versus H_a: {x_t} stationary.
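A small simulation of the unit-root case (the sample size is an arbitrary choice): the random walk itself has a very slowly decaying ACF, while its first difference looks like white noise.

    # Random walk (1 - B) x_t = eps_t versus its first difference.
    set.seed(1)
    x <- cumsum(rnorm(300))
    par(mfrow = c(2, 2))
    plot.ts(x);       acf(x)          # slowly decaying ACF: typical unit-root behaviour
    plot.ts(diff(x)); acf(diff(x))    # differenced series looks like white noise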
7 Filtering Theorem
Let {x_t} be stationary with spectral density function (s.d.f.) f_x(w), and let
y_t = Σ_{u=−∞}^{∞} c_u x_{t−u},
where {c_u} is a linear filter. Then the s.d.f. of {y_t} is f_y(w):
f_y(w) = γ_y(0) + 2 Σ_{s=1}^{∞} γ_y(s) cos(2πws) = Σ_{s=−∞}^{∞} γ_y(s) e^{−2πiws}.
Assume E(x_t) = 0, so that E(y_t) = 0. Then
f_y(w) = |Γ(w)|² f_x(w).
Here Γ(w) is:
1. the transfer function;
2. the effect of the filter at the different frequencies;
3. Γ(w) = Σ_{u=−∞}^{∞} c_u e^{−2πiwu} = |Γ(w)| e^{−2πi φ(w)},
   where |Γ(w)| is the gain and e^{−2πi φ(w)} is the phase shift (degree of distortion);
4. given a time series x_1, …, x_n with periodogram I_x(w), if y_t = Σ_u c_u x_{t−u} then the periodograms are related in the same way, I_y(w) ≈ |Γ(w)|² I_x(w).
1.
2.
3. Cascaded filters:
   y_t = Σ_u c_u x_{t−u},  z_t = Σ_u d_u y_{t−u}
   (double filtering), e.g. y_t = ∇x_t and z_t = ∇y_t = ∇²x_t, the second difference of x_t.
   f_z(w) = | Σ_{u=−∞}^{∞} d_u e^{−2πiwu} |² f_y(w),  where f_y(w) = | Σ_{u=−∞}^{∞} c_u e^{−2πiwu} |² f_x(w),
   so f_z(w) = |Γ_d(w)|² |Γ_c(w)|² f_x(w).
   (Note that the order of the filters does not matter.)
   We can improve the running-average filter by cascading it several times, as sketched below.
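A sketch of the squared gain of a running-average filter and of the same filter cascaded twice (the helper name gain2() and the choice r = 2 are mine):

    # |Gamma(w)|^2 for a length-(2r+1) running average; cascading multiplies the gains.
    gain2 <- function(w, r) {
      u <- -r:r
      c_u <- rep(1 / (2 * r + 1), 2 * r + 1)
      sapply(w, function(wi) Mod(sum(c_u * exp(-2i * pi * wi * u)))^2)
    }
    w <- seq(0, 0.5, length.out = 200)
    plot(w, gain2(w, 2), type = "l", ylab = "|Gamma(w)|^2")   # single pass
    lines(w, gain2(w, 2)^2, lty = 2)                          # two cascaded passes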
7.1 Application to Moving Average and Autoregressive Processes
7.1.1 AR(p)
x_t = α_1 x_{t−1} + … + α_p x_{t−p} + ε_t (where ε_t is WN)
x_t − α_1 x_{t−1} − … − α_p x_{t−p} = ε_t
Note that
x_t − α_1 x_{t−1} − … − α_p x_{t−p} = Σ_{u=0}^{p} c_u x_{t−u}
with c_0 = 1 and c_u = −α_u, so the filtering theorem gives σ² = f_ε(w) = | 1 − Σ_{u=1}^{p} α_u e^{−2πiwu} |² f_x(w), i.e.
f_x(w) = σ² / | 1 − Σ_{u=1}^{p} α_u e^{−2πiwu} |².
e.g. for an AR(1) with α = 0.9,
f_x(w) = σ² / (1.81 − 1.8 cos(2πw)),
and for α = −0.9,
f_x(w) = σ² / (1.81 + 1.8 cos(2πw)).
7.1.2 ARMA Process (Autoregressive Moving Average Process)
x_t = α_1 x_{t−1} + … + α_p x_{t−p} + ε_t + β_1 ε_{t−1} + … + β_q ε_{t−q},
or, with a mean μ,
x_t = μ + α_1 (x_{t−1} − μ) + α_2 (x_{t−2} − μ) + … + α_p (x_{t−p} − μ) + ε_t + β_1 ε_{t−1} + … + β_q ε_{t−q}.
The time series {x_t} is then an ARMA(p, q) process.
We need restrictions on the α_i and β_j, i = 1, …, p and j = 1, …, q.
Reason: stationarity and identifiability.
1. Define α(z) = 1 − α_1 z − α_2 z² − … − α_p z^p.
   All solutions of α(z) = 0 must satisfy |z| > 1.
2. Define β(z) = 1 + β_1 z + … + β_q z^q.
   All solutions of β(z) = 0 must satisfy |z| > 1.
3. The equations α(z) = 0 and β(z) = 0 have no common solutions,
   i.e. there is no z with α(z) = β(z) = 0.
Reasons for the conditions:
1. They allow us to write the ARMA(p, q) process as an MA(∞) (moving average, or linear) process:
   x_t = ε_t + Σ_{u=1}^{∞} c_u ε_{t−u},
   where the coefficients c_u depend on the values of the α_i and β_j.
2. They allow us to write the time series {x_t} as an AR(∞) process:
   x_t = Σ_{u=1}^{∞} b_u x_{t−u} + ε_t (where the b_u depend on the α_i and β_j).
A numerical check of the stationarity and invertibility conditions (1 and 2 above) is sketched below.
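A minimal check using base R's polyroot() (the ARMA(2,1) coefficients are arbitrary examples):

    # All roots of alpha(z) and beta(z) should lie outside the unit circle.
    alpha <- c(0.5, 0.3)            # alpha(z) = 1 - 0.5 z - 0.3 z^2
    beta  <- 0.4                    # beta(z)  = 1 + 0.4 z
    Mod(polyroot(c(1, -alpha)))     # AR roots: all moduli > 1  => stationary
    Mod(polyroot(c(1, beta)))       # MA roots: all moduli > 1  => invertible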
7.2 Properties
The autocovariance/autocorrelation functions of an ARMA process are very complicated in general.
The s.d.f., however, is easy to compute. Write
x_t − α_1 x_{t−1} − … − α_p x_{t−p} = ε_t + β_1 ε_{t−1} + … + β_q ε_{t−q} = y_t.
By the filter theorem:
1. f_y(w) = | 1 − Σ_{s=1}^{p} α_s e^{−2πiws} |² f_x(w)   (filtering {x_t});
2. f_y(w) = | 1 + Σ_{s=1}^{q} β_s e^{−2πiws} |² σ²        (filtering {ε_t}).
Equating the two expressions,
f_x(w) = σ² | 1 + Σ_{s=1}^{q} β_s e^{−2πiws} |² / | 1 − Σ_{s=1}^{p} α_s e^{−2πiws} |².
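A direct numerical implementation of this formula (the helper name arma_sdf() and the coefficient values are arbitrary choices):

    # ARMA(p, q) spectral density from the filter-theorem formula.
    arma_sdf <- function(w, alpha, beta, sigma2 = 1) {
      num <- Mod(1 + sapply(w, function(wi) sum(beta  * exp(-2i * pi * wi * seq_along(beta)))))^2
      den <- Mod(1 - sapply(w, function(wi) sum(alpha * exp(-2i * pi * wi * seq_along(alpha)))))^2
      sigma2 * num / den
    }
    w <- seq(0, 0.5, length.out = 200)
    plot(w, arma_sdf(w, alpha = c(0.5, 0.3), beta = 0.4), type = "l", ylab = "f(w)")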
7.3 Summary:
1. ARMA processes
   (include AR and MA as special cases)
   provide a rich class of stationary processes, useful for modeling stationary time series.
2. ARIMA processes
   (AutoRegressive / Integrated / Moving Average)
   A time series is ARIMA(p, d, q) if
   ∇^d x_t = (1 − B)^d x_t is ARMA(p, q),
   where the ARMA(p, q) satisfies the three conditions above.
   Modeling procedure:
   (a) check whether {x_t} is stationary (if not, difference the series until it is);
   (b) fit an ARMA(p, q) model to the (possibly differenced) data;
   (c) diagnostics: white noise tests for the residuals, etc.
7.4 Prediction and Partial Autocorrelations
Predict x_t by a linear function of the previous p values,
x̂_t = φ_0 + φ_1 x_{t−1} + … + φ_p x_{t−p},
and measure performance by the prediction mean squared error E[(x_t − x̂_t)²].
Three main questions.
1. What are the optimal φ_0, …, φ_p?
   Minimize E[(x_t − φ_0 − φ_1 x_{t−1} − … − φ_p x_{t−p})²] by setting the partial derivative with respect to each φ_i to zero. The optimal {φ_k} satisfy
   ρ(s) = φ_1 ρ(s−1) + … + φ_p ρ(s−p),  s = 1, …, p
   (equivalently γ(s) = φ_1 γ(s−1) + … + φ_p γ(s−p)),
   and
   φ_0 = (1 − φ_1 − φ_2 − … − φ_p) E(x_t).
   We then define
   σ_p² = E[(x_t − x̂_t)²] = the optimal PMSE,
   which satisfies the recursion σ_p² = σ_{p−1}² (1 − π²(p)),
   where π(1), π(2), … are the partial autocorrelations of the time series {x_t}.
To compute π(s), solve the order-s system
[ 1         ρ(1)      …   ρ(s−1) ] [ φ_1 ]   [ ρ(1) ]
[ ρ(1)      1         …   ρ(s−2) ] [ φ_2 ] = [ ρ(2) ]
[ ⋮         ⋮              ⋮     ] [ ⋮   ]   [ ⋮    ]
[ ρ(s−1)    ρ(s−2)    …   1      ] [ φ_s ]   [ ρ(s) ]
and set π(s) = φ_s, the last coefficient.
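A sketch computing π(s) this way and comparing with R's pacf() (the AR(1) data are an arbitrary illustration):

    # pi(s): solve the order-s system in the sample autocorrelations, keep the last coefficient.
    set.seed(1)
    x <- arima.sim(model = list(ar = 0.6), n = 500)
    rho <- acf(x, lag.max = 10, plot = FALSE)$acf[-1]       # rho(1), ..., rho(10)
    pi_s <- sapply(1:5, function(s) {
      R <- toeplitz(c(1, rho)[1:s])                         # s x s matrix with entries rho(|i - j|)
      solve(R, rho[1:s])[s]                                 # last coefficient phi_s = pi(s)
    })
    round(pi_s, 3)
    round(pacf(x, lag.max = 5, plot = FALSE)$acf[1:5], 3)   # should closely agree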
Comments
1. If {x_t} is an AR(p) process, then π(s) = 0 for s = p+1, p+2, …, i.e. for all s > p. Since
   x_t = μ + α_1 (x_{t−1} − μ) + … + α_p (x_{t−p} − μ) + ε_t,
   we have π(p) = α_p.
2. If {x_t} is WN, then π(s) = 0 for all s ≥ 1.
3. Typically π(s) ≠ 0 for all s, but π(s) → 0 as s → ∞.
Suppose {x_t} is stationary; how well can x_t be predicted from the infinite past?
x̂_t = φ_0 + Σ_{s=1}^{∞} φ_s x_{t−s},
σ_∞² = E[(x_t − x̂_t)²].
Special cases: for models such as an AR(p), the prediction error is the innovation, R_t = ε_t, with V(R_t) = σ².

7.4.3 Kolmogorov's Formula
σ_∞² = exp( ∫_0^1 ln f(w) dw ) = σ² = V(ε_t),
i.e. the one-step prediction error variance is determined by the s.d.f.
This result can be thought of as a consequence of the fact that we can write the invertible AR(∞) representation
x_t = Σ_{u=1}^{∞} b_u x_{t−u} + ε_t.