Professional Documents
Culture Documents
Yanchang Zhao
http://www.RDataMining.com
1
Presented at UJAT and Canberra R Users Group
1 / 39
Outline
Introduction
Online Resources
2 / 39
2
Time Series Analysis with R
2
Chapter 8: Time Series Analysis and Mining, in book
R and Data Mining: Examples and Case Studies.
http://www.rdatamining.com/docs/RDataMining.pdf
3 / 39
R
4 / 39
Time Series Data in R
I class ts
I represents data which has been sampled at equispaced points
in time
I frequency=7: a weekly series
I frequency=12: a monthly series
I frequency=4: a quarterly series
5 / 39
Time Series Data in R
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2011 1 2 3 4 5 6 7 8 9 10
## 2012 11 12 13 14 15 16 17 18 19 20
str(a)
attributes(a)
## $tsp
## [1] 2011 2013 12
##
## $class
## [1] "ts"
6 / 39
Outline
Introduction
Online Resources
7 / 39
What is Time Series Decomposition
8 / 39
Data AirPassengers
Data AirPassengers: monthly totals of Box Jenkins international
airline passengers, 1949 to 1960. It has 144(=1212) values.
plot(AirPassengers)
600
500
AirPassengers
400
300
200
100
Time
9 / 39
Decomposition
apts <- ts(AirPassengers, frequency = 12)
f <- decompose(apts)
plot(f$figure, type = "b") # seasonal figures
60
40
20
f$figure
0
20
40
2 4 6 8 10 12
Index
10 / 39
Decomposition
plot(f)
2 4 6 8 10 12
Time
11 / 39
Outline
Introduction
Online Resources
12 / 39
Time Series Forecasting
13 / 39
Forecasting
ts.plot(AirPassengers, fore$pred, U, L,
col = c(1, 2, 4, 4), lty = c(1, 1, 2, 2))
legend("topleft", col = c(1, 2, 4), lty = c(1, 1, 2),
c("Actual", "Forecast", "Error Bounds (95% Confidence)"))
14 / 39
Forecasting
Actual
700 Forecast
Error Bounds (95% Confidence)
600
500
400
300
200
100
Time
15 / 39
Outline
Introduction
Online Resources
16 / 39
Time Series Clustering
17 / 39
Dynamic Time Warping (DTW)
DTW finds optimal alignment between two time series.
library(dtw)
idx <- seq(0, 2 * pi, len = 100)
a <- sin(idx) + runif(100)/10
b <- cos(idx)
align <- dtw(a, b, step = asymmetricP1, keep = T)
dtwPlotTwoWay(align)
1.0
0.5
Query value
0.0
0.5
1.0
0 20 40 60 80 100
Index
18 / 39
Synthetic Control Chart Time Series
19 / 39
Synthetic Control Chart Time Series
# read data into R sep='': the separator is white space, i.e., one
# or more spaces, tabs, newlines or carriage returns
sc <- read.table("./data/synthetic_control.data", header = F, sep = "")
# show one sample from each class
idx <- c(1, 101, 201, 301, 401, 501)
sample1 <- t(sc[idx, ])
plot.ts(sample1, main = "")
20 / 39
201 101 1
25 30 35 40 45 15 20 25 30 35 40 45 24 26 28 30 32 34 36
0
Six Classes
10
20
30
Time
40
50
60
501 401 301
10 15 20 25 30 35 25 30 35 40 45 0 10 20 30
0
10
20
30
Time
40
50
60
21 / 39
Hierarchical Clustering with Euclidean distance
22 / 39
Hierarchical Clustering with Euclidean distance
140
120
100
Height
80
60
2
2
5
6
6
40
2
2
4
2
2
6
2
2
5
6
6
6
4
5
5
2
2
5
5
6
6
5
5
3
1
4
4
4
3
4
4
3
3
1
5
5
3
3
20
3
3
4
4
1
6
6
1
3
3
1
1
1
1
1
1
dist(sample2)
hclust (*, "average")
23 / 39
Hierarchical Clustering with Euclidean distance
## memb
## observedLabels 1 2 3 4 5 6 7 8
## 1 10 0 0 0 0 0 0 0
## 2 0 3 1 1 3 2 0 0
## 3 0 0 0 0 0 0 10 0
## 4 0 0 0 0 0 0 0 10
## 5 0 0 0 0 0 0 10 0
## 6 0 0 0 0 0 0 0 10
24 / 39
Hierarchical Clustering with DTW Distance
## memb
## observedLabels 1 2 3 4 5 6 7 8
## 1 10 0 0 0 0 0 0 0
## 2 0 4 3 2 1 0 0 0
## 3 0 0 0 0 0 6 4 0
## 4 0 0 0 0 0 0 0 10
## 5 0 0 0 0 0 0 10 0
## 6 0 0 0 0 0 0 0 10
25 / 39
Height
3
3
3
3
3
3
5
5
3
3
3
3
5
5
5
5
5
5
5
5
6
6
6
4
6
4
4
4
4
4
4
4
myDist
4
4
6
6
6
1
1
1
2
2
2
2
2
2
2
2
2
2
26 / 39
Outline
Introduction
Online Resources
27 / 39
Time Series Classification
28 / 39
Decision Tree (ctree)
29 / 39
Decision Tree
## pClassId
## classId 1 2 3 4 5 6
## 1 100 0 0 0 0 0
## 2 1 97 2 0 0 0
## 3 0 0 99 0 1 0
## 4 0 0 0 100 0 0
## 5 4 0 8 0 88 0
## 6 0 3 0 90 0 7
# accuracy
(sum(classId == pClassId))/nrow(sc)
## [1] 0.8183
30 / 39
DWT (Discrete Wavelet Transform)
I Wavelet transform provides a multi-resolution representation
using wavelets.
I Haar Wavelet Transform the simplest DWT
http://dmr.ath.cx/gfx/haar/
31 / 39
DWT (Discrete Wavelet Transform)
32 / 39
Decision Tree with DWT
## pClassId
## classId 1 2 3 4 5 6
## 1 98 2 0 0 0 0
## 2 1 99 0 0 0 0
## 3 0 0 81 0 19 0
## 4 0 0 0 74 0 26
## 5 0 0 16 0 84 0
## 6 0 0 0 3 0 97
(sum(classId==pClassId)) / nrow(wtSc)
## [1] 0.8883
33 / 39
plot(ct, ip_args = list(pval = F), ep_args = list(digits = 0))
1
V57
13 > 13 3 >3
Node 4 (n = 68) Node 5 (n = 6) Node 7 (n = 9) Node 8 (n = 86) Node 10 (n = 31) Node 13 (n = 80) Node 15 (n = 9) Node 16 (n = 99) Node 18 (n = 12) Node 20 (n = 103) Node 21 (n = 97)
1 1 1 1 1 1 1 1 1 1 1
0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8
0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6
0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4
0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2
0 0 0 0 0 0 0 0 0 0 0
123456 123456 123456 123456 123456 123456 123456 123456 123456 123456 123456
34 / 39
k-NN Classification
k <- 20
newTS <- sc[501, ] + runif(100) * 15
distances <- dist(newTS, sc, method = "DTW")
s <- sort(as.vector(distances), index.return = TRUE)
# class IDs of k nearest neighbours
table(classId[s$ix[1:k]])
##
## 4 6
## 3 17
35 / 39
k-NN Classification
k <- 20
newTS <- sc[501, ] + runif(100) * 15
distances <- dist(newTS, sc, method = "DTW")
s <- sort(as.vector(distances), index.return = TRUE)
# class IDs of k nearest neighbours
table(classId[s$ix[1:k]])
##
## 4 6
## 3 17
35 / 39
The TSclust Package
3
http://cran.r-project.org/web/packages/TSclust/
36 / 39
Outline
Introduction
Online Resources
37 / 39
Online Resources
38 / 39
The End
Thanks!
Email: yanchang(at)rdatamining.com
39 / 39