
Article DOI: 10.1111/exsy.12164

Forecasting financial time series using a methodology based on autoregressive integrated moving average and Taylor expansion
Guisheng Zhang (1,2), Xindong Zhang (1,2)* and Hongyinping Feng (3)
(1) Institute of Management and Decision, Shanxi University, Taiyuan, Shanxi 030006, China
E-mail: xindongzhang@sxu.edu.cn
(2) School of Economics and Management, Shanxi University, Taiyuan, Shanxi 030006, China
(3) School of Mathematical Sciences, Shanxi University, Taiyuan, Shanxi 030006, China

Abstract: Financial time series prediction is regarded as one of the most challenging jobs because of its inherent complexity, and hybrid forecasting models incorporating the autoregressive integrated moving average and the support vector machine (SVM) have been widely implemented to deal with both the linear and nonlinear patterns in time series data. However, the SVM model does not take into consideration the time correlation knowledge between different data points in a time series, which impairs the learning efficiency of the SVM in real applications. To overcome this restriction, this paper proposes the Taylor Expansion Forecasting model as an alternative to the SVM and develops a novel hybrid methodology combining the autoregressive integrated moving average and Taylor Expansion Forecasting to exploit their comprehensive forecasting capacity on noisy financial time series data. Both theoretical proof and empirical results obtained on several commodity futures prices demonstrate that the proposed hybrid model greatly improves forecasting accuracy.
Keywords: financial time series forecasting, ARIMA, SVM, Taylor expansion, tracking differentiator

1. Introduction

In recent years, financial time series forecasting based on previously observed values has become an important issue in investment and financial decision-making and has drawn considerable attention as an active research area (Shi et al., 1999; Zhu et al., 2008). As a special kind of time series, the financial time series, usually spaced at uniform time intervals (Brillinger, 1974), is influenced by a great number of factors, most of which interact in very complex ways. For this reason, forecasting financial time series through fundamentalist approaches using well-defined trading strategies has been regarded as one of the most difficult tasks of modern time series forecasting (Abu-Mostafa & Atiya, 1996; Atsalakis & Valavanis, 2009; Polimenis & Neokosmidis, 2014; Wei et al., 2014; Kauffman et al., 2015). Because a single linear or nonlinear model may not be totally sufficient to identify all the characteristics of time series datasets, many hybrid models have been proposed to complement each other and make use of each model's unique features (Shi et al., 1988; Tay & Cao, 2001; Tay & Cao, 2002; Kim, 2003; Zhang, 2003; Huang et al., 2005; Pai & Lin, 2005; Hansen et al., 2006; Lu et al., 2009; Hadavandi et al., 2010; Lee & Tong, 2011; Nie et al., 2012; Wang et al., 2012; Kao et al., 2013; Zhu & Wei, 2013; Guo et al., 2014; Cheng & Yang, 2015; Xiong et al., 2015).

Zhang (2003) proposes a combining approach to time series forecasting. Firstly, the autoregressive integrated moving average (ARIMA) model is applied to estimate the linear component, and the residuals from the ARIMA model are estimated by the Artificial Neural Network (ANN) model in the second step. Finally, the linear and nonlinear forecasting values obtained from the different models are combined together as the final forecasting values of the time series data. The empirical results with three real data sets clearly demonstrate that the hybrid model is superior to each component model used in isolation. However, the ANN also suffers from a number of shortcomings such as local minimum traps, difficulty in determining the hidden layer size and risk of model over-fitting (Tay & Cao, 2001; Cao, 2003). On the contrary, the support vector machine (SVM) proposed by Vapnik enjoys a global optimum and exhibits better prediction accuracy. Based on the structured risk minimization principle, SVM seeks to minimize an upper bound of the generalization error instead of the empirical error as in other neural networks (Vapnik, 1995). Therefore, the linear ARIMA model and the nonlinear SVM model are applied jointly in many hybrid methodologies, aiming to improve the forecasting performance on time series data. Pai and Lin (2005) demonstrate that a hybrid methodology of

© 2016 Wiley Publishing Ltd. Expert Systems, October 2016, Vol. 33, No. 5, page 501.
ARIMA and SVM takes advantage of the unique strengths of ARIMA and SVM in linear and nonlinear modelling and obtains promising computational results on 10 different stocks. Nie et al. (2012) present a hybrid of ARIMA and SVM to forecast short-term load series and demonstrate that the SVM extracts the sensitive components to correct the deviation of the preceding ARIMA forecast greatly. Wang et al. (2012) propose a novel approach combining ARIMA and SVM to forecast the different returns of equities, and experiments show that the SVM can deal with textual information to improve financial time series forecasting accuracy greatly. Zhu and Wei (2013) develop a methodology combining ARIMA and the least squares SVM to predict carbon prices. Experimental results reveal that the proposed hybrid methodology is suitable for the carbon price forecasting problem.

Although the hybrid models exhibit favourable overall forecasting performance, there are still some limitations affecting the forecasting accuracy of these models. As a kind of special data, the relations between different data points in a time series become increasingly strong as the time interval decreases; that is to say, one data point in a time series is influenced by its previous or next neighbouring points more easily than by others. However, the SVM model does not take into account this kind of autocorrelation between different points in real time series. Theoretically, SVM employs a kernel function K(x_i, x_j) = φ(x_i) · φ(x_j) to determine, via dot products between patterns, how close an input vector is to each stored datum, and the relationship of the training instances expressed by Euclidean distance in the SVM cannot reflect the real correlations of series data depending on the time sequence. This limitation influences the learning efficiency of the machine and could result in a dramatic decrease in the generalization performance of the SVM when dealing with time series data in real situations. Furthermore, as a supervised learning machine with multiple inputs X_i = {x_i1, …, x_it, …, x_iN} and one output y_i ∈ R, the dimension N of the input vector affects the generalization performance of the SVM greatly. However, the determination of the input variable dimension N is a hard job when modelling the nonlinear part by the SVM model; it is usually determined by experience in real situations or by the backtracking steps value p in the ARIMA(p, d, q) model (Zhu & Wei, 2013). And once the number of input variables is determined, the backtracking steps to the previous data points in the time series are fixed, and the information before the feedback steps will be omitted. Therefore, the SVM method cannot realize a dynamic backward search for the historical correlation knowledge, which hinders the absorption of information hidden in the series data and further undermines the generalization performance of the SVM in real applications.

In order to overcome the limitations discussed earlier, a new Taylor Expansion Forecasting (TEF) model based on the tracking differentiator is proposed as an alternative to the SVM to forecast the nonlinear components in the time series forecasting problem. In the TEF model, the time series data are supposed to be generated from a function f, and then the time series data can be forecasted using the previous historical data by Taylor expansion, in which the tracking differentiator is used to compute the specified order derivatives of the time series function. The highlight is that the TEF model can be converted into an integration form of the historical data and express the differing relationships between different data points in the time series through a weighting function instead of the Euclidean distance in the SVM. Therefore, compared with the SVM, the TEF model not only incorporates the real correlations among the data points of the time series but also realizes dynamic backtracking to the historical information hidden in the time series data points. So a novel hybrid methodology combining the ARIMA and TEF models is developed to forecast financial time series. In this proposed methodology, the ARIMA and the TEF model are employed to capture the linear and nonlinear components of the financial time series, respectively, and their forecasting values are integrated into the final forecasting results. Finally, the authors evaluate the forecasting performance of the hybrid ARIMA and SVM model and the hybrid ARIMA and TEF model by forecasting several main futures prices. The results of the computational tests are found to be promising.

The rest of this paper is organized as follows: in the next section, the individual ARIMA, SVM and TEF models for time series forecasting are described. Section 3 introduces the development of the novel hybrid methodology. Section 4 reports the experimental results, and Section 5 provides conclusions and predicts some future research directions.

2. Individual forecasting models employed in the hybrid model

2.1. Autoregressive integrated moving average model

The ARIMA method introduced by Box and Tiao (1975) has had an enormous impact on the theory and practice of modern time series analysis and forecasting (Gooijer & Hyndman, 2006; Pan et al., 2014). In the ARIMA model, if the transformed time series is stationary, the future value of a variable can be forecasted by a linear combination of past values and past errors. Suppose X = {x_i}_{i=1}^N is a time series; an ARIMA(p, d, q) model can be formulated as follows:

    \hat{l}_t = \theta_0 + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}    (2.1)

where the parameters p, d and q are non-negative integers that refer to the order of the autoregressive, integrated and moving average parts of the model, respectively, x_t is the actual value of the time series obtained by differencing d times, \hat{l}_t is the forecast value of x_t, ε_t is the residual term between the actual data and the forecasting value, and φ_i (i = 1, 2, …, p) and θ_j (j = 1, 2, …, q) are the coefficients to be estimated. Having completed the order determination (determining the values of p and q) and parameter estimation

(determining the values of φ and θ), the ARIMA model can be applied to forecast the value of the time series. The major advantage of the ARIMA model is that it is basically a data-oriented approach that models the linear patterns of the time series well and is relatively easy to use. However, being a class of linear model, the ARIMA model can only capture linear patterns in a time series (Zhang, 2003; Wang et al., 2012; Zhu & Wei, 2013). Therefore, the nonlinear components in the time series necessitate a proper nonlinear model.

2.2. Support vector machine model

The support vector machine is a novel neural-network classification technique which enjoys good generalization performance and implements a global optimization solution simultaneously (Vapnik, 1995). Based on statistical learning theory, the SVM overcomes the curse of dimensionality and the over-fitting problem.

In a regression problem, the ε-insensitive loss function was introduced into the regression model to obtain SVR, which has been applied to financial time series problems and has shown quite good performance (Tay & Cao, 2002; Kim, 2003; Yuan et al., 2010; Rubio et al., 2011; Duan & Xu, 2012; Wang, 2015). If we use a set of training patterns {(x_i, y_i)}_{i=1}^N, where x_i ∈ R^n and y_i ∈ R, the target of ε-SV regression is to find a regression hyperplane with an ε-insensitive band from the actually obtained targets y_i for all the training data. Therefore, the objective function and constraints for SVR can be formulated as a convex optimization problem:

    \min \ \frac{1}{2} \|w\|^2
    \text{s.t.} \quad y_i - \langle w, x_i \rangle - b \le \varepsilon, \qquad \langle w, x_i \rangle + b - y_i \le \varepsilon    (2.2)

The minimization of the norm ½‖w‖² in the previous case is used to ensure the function flatness, and ε is a free parameter that serves as a threshold: all predictions have to be within an ε range of the true targets. The smaller its value, the higher the required learning accuracy, and the more support vectors need to be found by the algorithm. Sometimes, some errors should be allowed. Therefore, two relaxation factors, ξ_i ≥ 0 and ξ_i* ≥ 0, are introduced into the constraints of the optimization problem, and the optimization problem referred to previously becomes

    \min \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*)
    \text{s.t.} \quad y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i, \qquad \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i, \xi_i^* \ge 0    (2.3)

where C > 0, as the penalty degree of samples with error exceeding ε, is a regularization constant determining the trade-off between the training error and the model flatness. Using the dual formulation, the optimization problem can be solved more easily. The optimization problem can be described in the following form by a standard dualization method utilizing Lagrange multipliers:

    \min \ \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j) + \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^*) - \sum_{i=1}^{N} y_i (\alpha_i - \alpha_i^*)    (2.4)

    \text{s.t.} \quad \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) = 0, \qquad 0 \le \alpha_i, \alpha_i^* \le \frac{C}{N}, \quad i = 1, 2, \ldots, N    (2.5)

Here, α_i and α_i* are Lagrange multipliers, and the data points with non-zero (α_i, α_i*) pairs are called support vectors. K(x_i, x_j) is called the kernel function and is obtained by K(x_i, x_j) = φ(x_i) · φ(x_j) in the feature space. Some common kernels include the radial basis function (Gaussian), polynomial, spline or even sigmoidal functions (Vapnik, 1998).

2.3. Taylor Expansion Forecasting model based on the tracking differentiator

In fact, at least from the mathematical point of view, there are a large number of financial time series which can be forecasted just by themselves. For example, suppose that {x_i}_{i=1}^N is a financial time series which is generated from a smooth function f(t) with sampling interval h. It follows from the Taylor expansion that, for any t_{i−1} ≥ 0, t_i > t_{i−1}, i ∈ [1, N],

    f(t_i) = f(t_{i-1} + h) = f(t_{i-1}) + \dot{f}(t_{i-1}) h + \frac{1}{2} \ddot{f}(t_{i-1}) h^2 + \frac{1}{6} \dddot{f}(t_{i-1} + \theta h) h^3, \quad \theta \in (0, 1)    (2.6)

where h is the tuning parameter that represents the time interval between t_{i−1} and t_i of the time series. Provided h is small enough, the previous formula implies that

    f(t_i) = f(t_{i-1} + h) \approx f(t_{i-1}) + \dot{f}(t_{i-1}) h + \frac{1}{2} \ddot{f}(t_{i-1}) h^2    (2.7)

If we know some values of f at time t_{i−1}, for example, f(t_{i−1}), \dot{f}(t_{i−1}) and \ddot{f}(t_{i−1}), then we are able to obtain the value of f at time (t_{i−1} + h) by (2.7). This means that we finish a forecast at time t_{i−1} to the future time (t_{i−1} + h), with the error

    \epsilon = \frac{1}{6} \dddot{f}(t_{i-1} + \theta h) h^3    (2.8)

Remark 1. Although a financial price series is neither smooth nor differentiable, it still can be forecasted by the model. The reason can be explained by the following arguments.

For any function f ∈ L²(0, T) with T > 0 (f can be neither smooth nor differentiable), where L²(0, T) = { f | ∫₀ᵀ f²(t) dt < ∞ }, it follows from well-known real-analysis theory that there exists a sequence {f_n}_{n≥1} ⊂ C^∞(0, T) such that

    f_n \to f \ \text{as} \ n \to \infty \ \text{in} \ L^2(0, T)    (2.9)

That is to say, ∫₀ᵀ (f_n − f)² dt → 0 as n → ∞, and this convergence implies that f_n can be considered as an estimation of f, provided n is large enough. We divide f into two parts:

    f(t) = f_n(t) + (f(t) - f_n(t)) =: f_1(t) + f_2(t), \quad t \ge 0    (2.10)

where f₁ ∈ C^∞(0, T) is differentiable and f₂ ∈ L²(0, T) is very small in the norm of L²(0, T) for large n. We regard the second term f₂ as the high-frequency disturbance or noise that is useless to the forecasting.

Remark 2. Obviously, the forecasting method (2.6) or (2.7) can be extended to the nth order theoretically. However, owing to the properties of the tracking differentiator (Feng and Li, 2013), the recommended order is n = 2, 3, 4. In this work, without loss of generality, we only consider the second-order forecasting formula (2.7).

For this purpose, we define

    F_2 := \{ f \in H^3(0, \infty) \mid \|\dot{f}\|_\infty + \|\ddot{f}\|_\infty + \|\dddot{f}\|_\infty < \infty \}    (2.11)

where H³(0, ∞) is a Hilbert space and ‖·‖_∞ is defined by

    \|f\|_\infty := \sup_{t \in [0, \infty)} |f(t)|    (2.12)

Obviously, all the functions in F₂ can be forecasted mathematically in the sense of (2.7) with error (2.8). So we will call F₂ the second-order mathematical-forecasting set.

For any given time series {x_i}_{i=1}^N, we assume that {x_i}_{i=1}^N are the sampling points of a function φ ∈ F₂ with step h. That is,

    \varphi(0) = x_1, \quad \varphi(h) = x_2, \quad \ldots, \quad \varphi((N-1)h) = x_N    (2.13)

In order to make a one-step forecast for {x_i}_{i=1}^N (forecast x_{N+1} = φ(Nh)), according to (2.7), we have to estimate φ̇((N−1)h) and φ̈((N−1)h) (the value of φ((N−1)h) is known in the one-step forecasting model). However, because of the sensitivity to noise, the derivative of the usually rapidly varying noise will drown out the derivative of the signal. Therefore, it is not feasible to use finite differences to approximate the derivatives of φ. Fortunately, the derivatives of φ can be approximated by applying the tracking differentiator, which is used to extract derivatives from corrupted signals (Dabroom & Khalil, 1999; Guo et al., 2002; Ahrens & Khalil, 2009; Feng & Li, 2013). For details, see the appendix. A simulation of the tracking differentiator is plotted in Figure 1.

By using the tracking differentiator, a one-step forecast can be made for {x_i}_{i=1}^N in the sense of (2.7). More specially,

    x_{N+1} = \varphi(Nh) \approx x_N + z_1 h + \frac{1}{2} z_2 h^2    (2.14)

where z₁ and z₂, obtained from the tracking differentiator, are used to approximate φ̇((N−1)h) and φ̈((N−1)h), respectively. Although this idea is simple, a large number of numerical experiments show that it is quite effective.

Figure 1: Simulation of the tracking differentiator; parameter values 0.05 and 0.01.
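A simulation in the spirit of Figure 1 can be reproduced in outline with a third-order high-gain tracking differentiator, here integrated by a simple forward-Euler scheme. The sinusoidal test input, the noise level and the parameter values below are illustrative assumptions, not the exact settings behind the figure.

```python
import numpy as np

def track(phi, t, eps):
    """Third-order high-gain tracking differentiator, integrated by
    forward Euler; returns estimates of phi, phi' and phi''."""
    dt = t[1] - t[0]
    z1, z2, z3 = 0.0, 0.0, 0.0
    out = np.zeros((len(t), 3))
    for i, ti in enumerate(t):
        e = z1 - phi(ti)                    # tracking error against the input
        z1 += dt * (z2 - 3.0 / eps * e)
        z2 += dt * (z3 - 6.0 / eps**2 * e)
        z3 += dt * (-6.0 / eps**3 * e)
        out[i] = (z1, z2, z3)
    return out

rng = np.random.default_rng(0)
t = np.arange(0.0, 12.0, 0.01)
signal = lambda s: np.sin(s) + 0.001 * rng.standard_normal()  # corrupted input
z = track(signal, t, eps=0.05)

# After the initial transient, z1 should follow sin(t) and z2 should follow cos(t)
err1 = np.max(np.abs(z[-200:, 0] - np.sin(t[-200:])))
err2 = np.mean(np.abs(z[-200:, 1] - np.cos(t[-200:])))
print(err1, err2)
```

The gains 3/ε, 6/ε² and 6/ε³ mirror the differentiator structure introduced in the text; smaller ε tracks faster but amplifies the derivative of the noise, which is why ε cannot be taken arbitrarily small in practice.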

We call this method the TEF model. Finally, the previous discussion illustrates that in order to make a forecast for the time series {x_i}_{i=1}^N by (2.14), what we have to do is to obtain z₁ and z₂. Consequently, the key issue is how to obtain z₁ and z₂ by the tracking differentiator.

Next, we introduce some notation. Let X = (x₁, x₂, x₃) ∈ R³. Define

    \|X\|_2 = \sqrt{x_1^2 + x_2^2 + x_3^2}    (2.15)

We denote by ‖A‖_F the Frobenius norm of the matrix A. The authors apply the following third-order high-gain tracking differentiator (Dabroom & Khalil, 1999; Feng & Li, 2013):

    \dot{z}_1(t) = z_2(t) - \frac{3}{\varepsilon} \left( z_1(t) - \varphi(t) \right),
    \dot{z}_2(t) = z_3(t) - \frac{6}{\varepsilon^2} \left( z_1(t) - \varphi(t) \right),
    \dot{z}_3(t) = - \frac{6}{\varepsilon^3} \left( z_1(t) - \varphi(t) \right),
    z_1(0) = z_{10}, \quad z_2(0) = z_{20}, \quad z_3(0) = z_{30}    (2.16)

where (z₁₀, z₂₀, z₃₀) ∈ R³ is the initial value and ε is the tuning parameter. In (2.16), φ(t) is the input, and z₁(t), z₂(t), z₃(t) are the outputs, which are used to approximate φ(t), φ̇(t), φ̈(t), respectively. That is, after a short transient, z₁(t) ≈ φ(t), z₂(t) ≈ φ̇(t) and z₃(t) ≈ φ̈(t). More specially, we have the following lemma.

Lemma 1. Suppose that φ ∈ F₂. Then, for any initial data (z₁₀, z₂₀, z₃₀) ∈ R³, there exist two positive constants Γ₁ and Γ₂, independent of t, such that system (2.16) satisfies

    |z_1(t) - \varphi(t)| + |z_2(t) - \dot{\varphi}(t)| + |z_3(t) - \ddot{\varphi}(t)| \le \Gamma_1 \varepsilon + \Gamma_2 e^{-\lambda t / \varepsilon}, \quad t \ge 0    (2.17)

Proof. If we let η₁ = z₁ − φ, η₂ = z₂ − φ̇ and η₃ = z₃ − φ̈, the error system is governed by the following:

    \dot{\eta}_1(t) = \eta_2(t) - \frac{3}{\varepsilon} \eta_1(t),
    \dot{\eta}_2(t) = \eta_3(t) - \frac{6}{\varepsilon^2} \eta_1(t),
    \dot{\eta}_3(t) = - \frac{6}{\varepsilon^3} \eta_1(t) - \dddot{\varphi}(t),
    \eta_1(0) = \eta_{10}, \quad \eta_2(0) = \eta_{20}, \quad \eta_3(0) = \eta_{30}    (2.18)

where η₁₀ = z₁₀ − φ(0), η₂₀ = z₂₀ − φ̇(0) and η₃₀ = z₃₀ − φ̈(0). Then (2.18) can be rewritten as follows:

    \varepsilon^3 \dddot{\eta}_1(t) + 3 \varepsilon^2 \ddot{\eta}_1(t) + 6 \varepsilon \dot{\eta}_1(t) + 6 \eta_1(t) = -\dddot{\varphi}(t) \, \varepsilon^3    (2.19)

Let t = εs and v(s) = η₁(εs). Then it follows that

    \dddot{v}(s) + 3 \ddot{v}(s) + 6 \dot{v}(s) + 6 v(s) = -\dddot{\varphi}(\varepsilon s) \, \varepsilon^3    (2.20)

which can be rewritten as an evolution equation:

    \frac{d}{ds} V(s) = A V(s) + D(s)    (2.21)

where

    A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -6 & -6 & -3 \end{pmatrix}, \quad
    V(s) = \begin{pmatrix} v(s) \\ \frac{d}{ds} v(s) \\ \frac{d^2}{ds^2} v(s) \end{pmatrix}, \quad
    D(s) = \begin{pmatrix} 0 \\ 0 \\ -\dddot{\varphi}(\varepsilon s) \, \varepsilon^3 \end{pmatrix}    (2.22)

The solution of system (2.21) is found to be

    V(s) = e^{As} V(0) + \int_0^s e^{A(s - \tau)} D(\tau) \, d\tau    (2.23)

Because A is Hurwitz, there exist constants λ, L > 0 such that ‖e^{As}‖_F ≤ L e^{−λs}. It follows from (2.22) and (2.23) that

    \|V(s)\|_2 \le \|e^{As}\|_F \|V(0)\|_2 + \int_0^s \|e^{A(s-\tau)}\|_F \|D(\tau)\|_2 \, d\tau
    \le \|V(0)\|_2 \, L e^{-\lambda s} + \int_0^s L e^{-\lambda (s - \tau)} \|D(\tau)\|_2 \, d\tau
    \le \|V(0)\|_2 \, L e^{-\lambda s} + \frac{L}{\lambda} \|\dddot{\varphi}\|_\infty \varepsilon^3    (2.24)

More specially, we have

    \left\| \left( \eta_1(t), \ \varepsilon \dot{\eta}_1(t), \ \varepsilon^2 \ddot{\eta}_1(t) \right) \right\|_2 \le \|V(0)\|_2 \, L e^{-\lambda t / \varepsilon} + \frac{L}{\lambda} \|\dddot{\varphi}\|_\infty \varepsilon^3    (2.25)

On the other hand, it follows from (2.18) that

    \eta_2(t) = \dot{\eta}_1(t) + \frac{3}{\varepsilon} \eta_1(t), \qquad \eta_3(t) = \ddot{\eta}_1(t) + \frac{3}{\varepsilon} \dot{\eta}_1(t) + \frac{6}{\varepsilon^2} \eta_1(t)    (2.26)

Combining (2.15), (2.25) and (2.26), we are able to conclude (2.17) easily, with Γ₁ and Γ₂ determined by max{ L‖V(0)‖₂, (L/λ)‖φ⃛‖_∞ }. So the proof is complete.

In order to use the forecasting algorithm (2.14), we discretize the system (2.16) by the backward Euler method as follows:

    Z_1(i+1) = Z_1(i) + Z_2(i) h - \frac{3}{\varepsilon} \left( Z_1(i) - x_i \right) h,
    Z_2(i+1) = Z_2(i) + Z_3(i) h - \frac{6}{\varepsilon^2} \left( Z_1(i) - x_i \right) h,
    Z_3(i+1) = Z_3(i) - \frac{6}{\varepsilon^3} \left( Z_1(i) - x_i \right) h,
    Z_1(1) = z_{10}, \quad Z_2(1) = z_{20}, \quad Z_3(1) = z_{30}    (2.27)

which, together with (2.13) and (2.14), leads easily to our forecasting formula

    x_{N+1} = x_N + Z_2(N) h + \frac{1}{2} Z_3(N) h^2    (2.28)

Theorem 1. Suppose that {x_i}_{i=1}^N is a financial time series sampled from a function φ ∈ F₂ with sampling interval h. That is, φ(0) = x₁, φ(h) = x₂, …, φ((N−1)h) = x_N. Then, for any initial data (z₁₀, z₂₀, z₃₀) ∈ R³, there exists a positive constant C* independent of h such that (2.27) and (2.28) satisfy

    \lim_{\varepsilon \to 0} |x_{N+1} - \varphi(Nh)| \le C^* h^2 + \frac{1}{6} \|\dddot{\varphi}\|_\infty h^3    (2.29)

Proof. The Taylor expansion of φ at t = (N−1)h is as follows:

    \varphi(Nh) = \varphi((N-1)h) + \dot{\varphi}((N-1)h) h + \frac{1}{2!} \ddot{\varphi}((N-1)h) h^2 + \frac{1}{3!} \dddot{\varphi}(\zeta) h^3    (2.30)

where (N−1)h < ζ < Nh. Combining (2.16), (2.28) and (2.30), we have

    |x_{N+1} - \varphi(Nh)| \le |Z_2(N) - \dot{\varphi}((N-1)h)| h + \frac{1}{2!} |Z_3(N) - \ddot{\varphi}((N-1)h)| h^2 + \frac{1}{3!} \|\dddot{\varphi}\|_\infty h^3
    \le |Z_2(N) - z_2((N-1)h)| h + |z_2((N-1)h) - \dot{\varphi}((N-1)h)| h
       + \frac{1}{2!} |Z_3(N) - z_3((N-1)h)| h^2 + \frac{1}{2!} |z_3((N-1)h) - \ddot{\varphi}((N-1)h)| h^2
       + \frac{1}{3!} \|\dddot{\varphi}\|_\infty h^3    (2.31)

Because (2.27) is the discretization of (2.16) by the backward Euler method (the convergence of the backward Euler method can be found in Butcher (2003)), there exists a positive constant C* independent of h such that

    |Z_2(N) - z_2((N-1)h)| + |Z_3(N) - z_3((N-1)h)| \le C^* h    (2.32)

Applying Lemma 1, there exist two positive constants Γ₁ and Γ₂ such that

    |z_2((N-1)h) - \dot{\varphi}((N-1)h)| + |z_3((N-1)h) - \ddot{\varphi}((N-1)h)| \le \Gamma_1 \varepsilon + \Gamma_2 e^{-\lambda (N-1) h / \varepsilon}    (2.33)

Combining (2.31), (2.32) and (2.33), we get (2.29). The proof is complete.

Remark 3. Equation (2.29) means that x_{N+1} can be regarded as a one-step forecast for the time series {x_i}_{i=1}^N, provided h is small enough. Consequently, our forecasting strategy (2.14) is valid for financial time series if we choose ε sufficiently small.

Remark 4. Here, we note that the high-gain tracking differentiator (2.16) enjoys a very good property for our forecasting, that is, its insensitivity to small high-frequency noise (Guo et al., 2002; Ahrens & Khalil, 2009). Consequently, the forecasting model (2.7) is still feasible for the following set

    \tilde{F}_2 := \{ f \mid f = g + \omega, \ g \in H^3(0, \infty), \ \|\dot{g}\|_\infty + \|\ddot{g}\|_\infty + \|\dddot{g}\|_\infty < \infty, \ \omega \ \text{is some high-frequency noise} \}    (2.34)

provided ω is sufficiently small. Therefore, our forecasting strategy (2.14) seems still valid for most financial time series. The numerical experiments can support our statements.

In fact, system (2.16) can be written in the following abstract form:

    \frac{d}{dt} Z(t) = A_h Z(t) + V_h f(t), \quad Z(0) = 0    (2.35)

where Z(t) = (z₁(t), z₂(t), z₃(t))ᵀ and

    A_h = \begin{pmatrix} -3/h & 1 & 0 \\ -6/h^2 & 0 & 1 \\ -6/h^3 & 0 & 0 \end{pmatrix}, \quad V_h = \begin{pmatrix} 3/h \\ 6/h^2 \\ 6/h^3 \end{pmatrix}

We solve equation (2.35) to obtain

    Z(t) = \int_0^t e^{A_h (t - \tau)} V_h f(\tau) \, d\tau = \int_0^t e^{A_h \tau} V_h f(t - \tau) \, d\tau    (2.36)

Because the matrix A_h is Hurwitz for any h > 0, there exist two positive constants L and λ, independent of time t, such that

    \|e^{A_h t}\| \le L e^{-\lambda t} \to 0 \ \text{as} \ t \to \infty    (2.37)

It follows from (2.36) that Z(t) is actually a weighted average of f over the interval (0, t), regarding e^{A_h(t−τ)} as the weighting function. If we let 0 < t₁ < t₂ < t, then f(t₂) is more closely related to f(t) than f(t₁) because of (2.37), which expresses the increasingly tight relationship as the time intervals between neighbouring points in the time series decrease, and this instinctive ability to incorporate the time correlations helps to improve the forecasting performance of the TEF model in real applications.

In essence, the solving process for ḟ(t_{i−1}) and f̈(t_{i−1}) by (2.36) is the integration of the function e^{A_h(t−τ)} V_h f(τ) before time t. This integration expression shows that the TEF model can realize the dynamic backward search to make full use of the historical knowledge in the time series and avoid the selection problem of the optimal backtracking steps that serve as the input variables in the SVM model. Furthermore, the one-parameter setting mechanism of the TEF model simplifies the application.
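The discretized forecaster (2.27)–(2.28) can be sketched in a few lines: run the recursion over the sampled series, then apply the one-step formula. The smooth test function sin(t) and the values of h and ε below are illustrative assumptions, not the paper's selected settings.

```python
import numpy as np

def tef_one_step(x, h, eps):
    """Run the discretized tracking differentiator (2.27) over the series,
    then apply the one-step forecasting formula (2.28)."""
    Z1, Z2, Z3 = x[0], 0.0, 0.0            # initial data
    for xi in x:
        e = Z1 - xi
        Z1 = Z1 + Z2 * h - 3.0 / eps * e * h      # uses the pre-update Z2
        Z2 = Z2 + Z3 * h - 6.0 / eps**2 * e * h   # uses the pre-update Z3
        Z3 = Z3 - 6.0 / eps**3 * e * h
    return x[-1] + Z2 * h + 0.5 * Z3 * h**2       # (2.28)

h = 0.01
t = np.arange(0.0, 4.0, h)
x = np.sin(t)                               # sampled smooth function
pred = tef_one_step(x, h, eps=0.05)
print(pred, np.sin(t[-1] + h))              # forecast vs. true next value
```

On a smooth signal the forecast error behaves like the C*h² + O(h³) bound of Theorem 1, so with h = 0.01 the prediction is close to the true next sample.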

3. The hybrid autoregressive integrated moving average and Taylor Expansion Forecasting methodology development

Because a real-world time series is complex in nature, there is universal agreement that a hybrid strategy incorporating both linear and nonlinear modelling abilities is a good alternative for this problem. Following the well-established linear and nonlinear modelling framework (Zhang, 2003; Pai & Lin, 2005; Chen, 2011; Lee & Tong, 2011; Wang et al., 2012; Zhu & Wei, 2013; Xiong et al., 2015), this study builds a novel hybrid forecasting model, ARIMA–TEF, via combining the ARIMA and the TEF model to improve the forecasting accuracy. In the hybrid model, the real-world time series {x_i}_{i=1}^N is assumed to be composed of the sum of linear and nonlinear parts as follows:

    x_i = l_i + nl_i    (3.1)

where l_i and nl_i are the linear and nonlinear parts of the time series data {x_i}_{i=1}^N to be estimated by the ARIMA and the TEF model, respectively.

Firstly, use the ARIMA to forecast the time series and obtain the predicted results. We denote by \hat{l}_i the forecasting value of l_i through the ARIMA model. Let

    \epsilon_i = x_i - \hat{l}_i    (3.2)

which is called the residual containing the nonlinear component at time t_i and can be forecasted by the TEF model using the previous residual series. That is,

    \epsilon_i = f(\epsilon_{i-1}, \ldots, \epsilon_{i-l}) + \Delta_i    (3.3)

where f is the nonlinear function modelled by the TEF and Δ_i is the error. The ε_i can be forecasted to obtain its predicted result \hat{nl}_i. Therefore, we can obtain the hybrid final forecast

    \hat{x}_t = \hat{l}_t + \hat{nl}_t    (3.4)

where \hat{l}_t is the ARIMA forecast value of the linear part l_t and \hat{nl}_t is the TEF model forecast value of the nonlinear part nl_t.

Combined with the previous analysis, a one-step forecast algorithm for any given time series {x_i}_{i=1}^N can be summarized as Algorithm 1:

Algorithm 1. The one-step ahead forecasting methodology (ARIMA–TEF)
Input: {x_i}_{i=1}^t, where x_i ∈ R.
Output: \hat{x}_{t+1}.
1. Train the ARIMA model on the time series dataset {x_i}_{i=1}^t, selecting the parameters p, d and q to minimize the mean absolute error (MAE).
2. Apply the ARIMA model obtained in the previous step to the given time series {x_i}_{i=1}^t to obtain the linear estimation results {\hat{l}_i}_{i=1}^t.
3. Employ the trained ARIMA model to obtain the one-step ahead linear forecast value \hat{l}_{t+1} of x_{t+1}.
4. Following equation (3.2), calculate the residual series {ε_i}_{i=1}^t, which represents the nonlinear components in the time series {x_i}_{i=1}^t.
5. Train the TEF model on the residual series {ε_i}_{i=1}^t, selecting the parameter h to minimize the MAE.
6. Employ the trained TEF model to make the one-step ahead nonlinear forecast value \hat{nl}_{t+1} of ε_{t+1}.
7. Perform equation (3.4) and calculate the final forecasting value \hat{x}_{t+1} = \hat{l}_{t+1} + \hat{nl}_{t+1}.

4. Experimentation design and results

4.1. Data description

Commodity prices are widely believed to influence price levels more broadly and thus are of interest to those whose decisions depend on their expectations of future inflation (Gargano & Timmermann, 2014). The samples employed in this study consist of historical daily closing prices for several representative commodity futures in the United States (oil, wheat, soy bean, corn, gold and silver futures on the Chicago Board of Trade) and commodity futures in Britain (copper and aluminium futures on the London Metal Exchange) from the Wind database. All the closing price data sets of the futures were for the nearest expiration contracts, measured in US dollars, between January 19, 2011 and January 18, 2013. Excluding public holidays, the authors learn that the lengths of the time series for the oil, wheat, soy bean, corn, gold and silver futures were all 506, while the data were 505 for the copper and aluminium futures. Figure 2 describes the curve of the daily oil futures prices and indicates that the futures prices are highly uncertain, nonlinear, dynamic and complicated. Furthermore, in Table 1, the authors present brief descriptive statistics of these futures price series: the mean, standard deviation, kurtosis, skewness, Jarque–Bera test, etc. The statistics relating to skewness, kurtosis and the Jarque–Bera test all reveal that these futures prices are all non-normal.

Figure 2: Original oil future price series (US dollars per barrel) from January 19, 2011 to January 18, 2013.

In order to verify the effectiveness of the proposed algorithm, different ratios (90% and 70%) of these futures price datasets are used as in-sample training sets and the remaining values (10% and 30%) as out-of-sample testing sets, respectively. The training data set is used exclusively to develop the forecasting model and the test sample set to evaluate the forecasting ability of the forecasting model using different evaluation criteria. The detailed data compositions for the training and testing datasets are given in Table 2.

Table 1: Descriptive statistics

Statistic      Oil        Wheat      Soy bean   Corn       Gold        Silver     Copper     Aluminium
Mean           94.76251   736.9941   1387.284   686.6981   1628.474    33.27209   8358.996   2226.830
Median         94.685     731.875    1383       685.5      1644.4      32.735     8196.5     2160.500
Maximum        113.73     940.2      1762.2     838.2      1891.9      48.46      10161.75   2786.750
Minimum        75.67      579.25     1103.4     535.4      1313.2      26.35      6784.75    1837.000
Std. Dev.      7.839859   100.9768   138.059    65.32318   118.2565    4.256417   798.9826   251.5635
Skewness       0.117016   0.22013    0.385734   0.068843   -0.544236   0.795744   0.446017   0.397166
Kurtosis       2.277688   1.609501   3.266146   2.073321   2.838125    3.531055   2.070822   1.901548
Jarque–Bera    12.15465   44.85093   14.04141   18.50468   25.53135    59.34652   34.91017   38.66530
Probability    0.002294   0.000000   0.000893   0.000096   0.000003    0.000000   0.000000   0.000000
Observations   506        506        506        506        506         506        505        505

4.2. Forecasting evaluation criteria

Considering that no single accuracy measurement can capture the distributional features of the errors (Armstrong & Collopy, 1992; Mathews & Diamantopoulos, 1994; DeLurgio, 1998; Goodwin & Lawton, 1999; Díaz-Robles et al., 2008; Xiong et al., 2013), the authors employ several traditional performance measures to evaluate the prediction performance of the trained forecasting models on the test data. These criteria are the MAE, root mean square error (RMSE), mean absolute percentage error (MAPE), percent standard error of prediction (SEP) (Ventura et al., 1995) and Willmott's index of agreement (WIA). The related definitions of these criteria are as follows:

    MAE = \frac{1}{l} \sum_{i=1}^{l} |a_i - y_i|    (4.1)

    RMSE = \sqrt{ \frac{1}{l} \sum_{i=1}^{l} (a_i - y_i)^2 }    (4.2)

    MAPE = \frac{100}{l} \sum_{i=1}^{l} \left| \frac{y_i - a_i}{a_i} \right|    (4.3)

    SEP = \frac{100}{\bar{y}} \, RMSE    (4.4)

    WIA = 1 - \frac{ \sum_{i=1}^{l} (a_i - y_i)^2 }{ \sum_{i=1}^{l} \left( |a_i - \bar{y}| + |y_i - \bar{y}| \right)^2 }    (4.5)

In the previous formulas, a_i and y_i represent the predicted result and the measured value, ā and ȳ are the respective means, and l is the number of data. In general, MAE, RMSE, MAPE and SEP are adopted as the evaluation criteria of level prediction to quantify the errors in the same units as the variable. The SEP allows the comparison of forecasts from different models and different problems because of its dimensionlessness (Díaz-Robles et al., 2008). The smaller their values, the closer the values predicted by a model or an estimator approach the actually observed ones; namely, a better predictor is associated with smaller values. Besides, during the analysis of the experimental results, the proportion of the total variance in the outcomes explained by the model is also described by the coefficient of determination (R²). In regression, the R² coefficient of determination is a statistical measure of how well the regression line approximates the real data points. An R² of 1 indicates that the regression line perfectly fits the data (DeLurgio, 1998).
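The criteria (4.1)–(4.5) translate directly into code. A sketch following the notation above, with a_i as the predicted and y_i as the measured values:

```python
import numpy as np

def metrics(a, y):
    """Evaluation criteria (4.1)-(4.5): a = predicted, y = measured."""
    a, y = np.asarray(a, float), np.asarray(y, float)
    err = a - y
    ybar = np.mean(y)
    mae = np.mean(np.abs(err))                           # (4.1)
    rmse = np.sqrt(np.mean(err ** 2))                    # (4.2)
    mape = 100.0 * np.mean(np.abs((y - a) / a))          # (4.3)
    sep = 100.0 / ybar * rmse                            # (4.4)
    wia = 1.0 - np.sum(err ** 2) / np.sum(
        (np.abs(a - ybar) + np.abs(y - ybar)) ** 2)      # (4.5)
    return dict(MAE=mae, RMSE=rmse, MAPE=mape, SEP=sep, WIA=wia)

m = metrics([101.0, 99.0, 102.0], [100.0, 100.0, 100.0])
print({k: round(v, 4) for k, v in m.items()})
```

Note that when the measured values equal their own mean, the WIA denominator reduces to the squared errors themselves, so WIA collapses to 0; on realistic test sets with varying observations it lies in (0, 1], with 1 indicating perfect agreement.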

Table 2: Sample compositions in four data sets
Series Sample size Training set (size) Test set (size)

Oil 506 Jan. 19, 2011–Nov. 6, 2012 (456) Nov. 7, 2012–Jan. 18, 2013 (50)
506 Jan. 19, 2011–Jun. 14, 2012 (355) Jun. 15, 2012–Jan. 18, 2013 (151)
Wheat 506 Jan. 19, 2011–Nov. 6, 2012 (456) Nov. 7, 2012–Jan. 18, 2013 (50)
506 Jan. 19, 2011–Jun. 14, 2012 (355) Jun. 15, 2012–Jan. 18, 2013 (151)
Soy bean 506 Jan. 19, 2011–Nov. 6, 2012 (456) Nov. 7, 2012–Jan. 18, 2013 (50)
506 Jan. 19, 2011–Jun. 14, 2012 (355) Jun. 15, 2012–Jan. 18, 2013 (151)
Corn 506 Jan. 19, 2011–Nov. 6, 2012 (456) Nov. 7, 2012–Jan. 18, 2013 (50)
506 Jan. 19, 2011–Jun. 14, 2012 (355) Jun. 15, 2012–Jan. 18, 2013 (151)
Gold 506 Jan. 19, 2011–Nov. 6, 2012 (456) Nov. 7, 2012–Jan. 18, 2013 (50)
506 Jan. 19, 2011–Jun. 14, 2012 (355) Jun. 15, 2012–Jan. 18, 2013 (151)
Silver 506 Jan. 19, 2011–Nov. 6, 2012 (456) Nov. 7, 2012–Jan. 18, 2013 (50)
506 Jan. 19, 2011–Jun. 14, 2012 (355) Jun. 15, 2012–Jan. 18, 2013 (151)
Copper 505 Jan. 19, 2011–Nov. 6, 2012 (455) Nov. 7, 2012–Jan. 18, 2013 (50)
505 Jan. 19, 2011–Jun. 14, 2012 (354) Jun. 15, 2012–Jan. 18, 2013 (151)
Aluminium 505 Jan. 19, 2011–Nov. 6, 2012 (455) Nov. 7, 2012–Jan. 18, 2013 (50)
505 Jan. 19, 2011–Jun. 14, 2012 (354) Jun. 15, 2012–Jan. 18, 2013 (151)
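Each row of Table 2 amounts to a chronological split of one series, with the most recent 10% or 30% of observations held out for testing. A minimal sketch (synthetic stand-in data; the function name is illustrative):

```python
def chronological_split(series, test_ratio):
    """Hold out the most recent fraction of a time-ordered series for testing."""
    n_test = int(len(series) * test_ratio)
    return series[: len(series) - n_test], series[len(series) - n_test:]

prices = list(range(506))                        # stand-in for 506 daily observations
train, test = chronological_split(prices, 0.10)  # 456 training / 50 test, as in Table 2
```

With a 30% test ratio the same 506-point series yields the 355/151 composition listed in the table.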

4.3. The models' parameters determination

The future prices vary with daily changes featured by strong non-stationarity. Because the ARIMA model is only fit to a stationary time series, an initial differencing step (corresponding to the integrated part of the model) can be applied to remove the non-stationarity. The Akaike Information Criterion is used to identify the best model. By trial and error, the optimal ARIMA models generated from the different ratios of the several future price datasets are listed in Tables 3 and 4.

For the SVM, appropriate parameters can improve the generalization of the learning machine. In this study, the poly function is selected as the kernel function, and the parameters C and e of the model for all the experimental future prices are adjusted based on the principle of minimizing the MAE of the model (Díaz-Robles et al., 2008), as presented in Tables 3 and 4. In addition, following the point of view of Zhu and Wei (2013), the authors determine the dimension p of the input data X_t = (t − 1, t − 2, …, t − p) of the SVM by the autoregressive order in the ARIMA(p, d, q) model.
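The order-selection step described above (difference once to remove non-stationarity, then compare candidate orders by AIC) can be sketched with a small pure-NumPy routine. This is only a stand-in for the full ARIMA(p, d, q) search performed in Eviews: it fits AR-only candidates by least squares and uses the least-squares form of the AIC; the grid bound `max_p` is illustrative.

```python
import numpy as np

def ar_aic(x, p):
    """Least-squares AR(p) fit on a (differenced) series.

    Uses the least-squares AIC, n*ln(RSS/n) + 2k, with k = p + 1 parameters
    (p lag coefficients plus an intercept).
    """
    n_obs = len(x)
    X = np.column_stack([x[p - i - 1: n_obs - i - 1] for i in range(p)])  # lags 1..p
    X = np.column_stack([np.ones(len(X)), X])
    y = x[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n = len(y)
    return n * np.log(rss / n) + 2 * (p + 1)

def select_order(prices, max_p=5):
    """Difference once (d = 1), then pick the AR order p minimizing the AIC."""
    dx = np.diff(np.asarray(prices, dtype=float))
    return min(range(1, max_p + 1), key=lambda p: ar_aic(dx, p))
```

In practice one would also iterate over MA orders q and check the residual diagnostics; the sketch only illustrates the AIC-comparison principle.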

Table 3: Parameters of different models on 10% testing set


ARIMA-SVM ARIMA-TEF

Oil p = 3, d = 1, q = 3, C = 1, e = 1 p = 3, d = 1, q = 3, h = 55
Wheat p = 3, d = 1, q = 2, C = 1, e = 1 p = 3, d = 1, q = 2, h = 60
Soy bean p = 2, d = 1, q = 2, C = 2, e = 1 p = 2, d = 1, q = 2, h = 60
Corn p = 2, d = 1, q = 3, C = 2, e = 1 p = 2, d = 1, q = 3, h = 60
Gold p = 2, d = 1, q = 2, C = 1.2, e = 0.9 p = 2, d = 1, q = 2, h = 65
Silver p = 2, d = 1, q = 1, C = 2, e = 1 p = 2, d = 1, q = 1, h = 60
Copper p = 3, d = 1, q = 3, C = 1, e = 1 p = 3, d = 1, q = 3, h = 55
Aluminium p = 2, d = 1, q = 1, C = 1, e = 1 p = 2, d = 1, q = 1, h = 60
ARIMA-SVM, autoregressive integrated moving average and support vector machine; ARIMA-TEF, autoregressive integrated moving average and Taylor Expansion Forecasting.

Table 4: Parameters of different models on 30% testing set


ARIMA-SVM ARIMA-TEF

Oil p = 2, d = 1, q = 2, C = 1, e = 1 p = 2, d = 1, q = 2, h = 60
Wheat p = 3, d = 1, q = 3, C = 1, e = 1 p = 3, d = 1, q = 3, h = 55
Soy bean p = 3, d = 1, q = 2, C = 2, e = 1.1 p = 3, d = 1, q = 2, h = 60
Corn p = 3, d = 1, q = 2, C = 3, e = 1 p = 3, d = 1, q = 2, h = 55
Gold p = 2, d = 1, q = 2, C = 1, e = 1 p = 2, d = 1, q = 2, h = 60
Silver p = 2, d = 1, q = 2, C = 3, e = 1 p = 2, d = 1, q = 2, h = 60
Copper p = 2, d = 1, q = 2, C = 2, e = 1 p = 2, d = 1, q = 2, h = 55
Aluminium p = 2, d = 1, q = 1, C = 2, e = 1 p = 2, d = 1, q = 1, h = 60
ARIMA-SVM, autoregressive integrated moving average and support vector machine; ARIMA-TEF, autoregressive integrated moving average and Taylor Expansion Forecasting.

2016 Wiley Publishing Ltd Expert Systems, October 2016, Vol. 33, No. 5 509
Figure 3 presents the MAE values of the hybrid ARIMA-TEF methodology corresponding to the parameter h of the TEF model on different datasets. In statistics, the MAE is used to measure how close forecasts or predictions are to the eventual outcomes. As shown in Figure 3, in the case of the 10% testing set of the oil future price, the value of the MAE decreases rapidly as h increases, reaching the optimal point at 1/55; as the value of h continues to increase, the MAE of the ARIMA-TEF forecasting results does not slide further but stays at a certain level with slight fluctuation. This result indicates that the forecasting model ARIMA-TEF


Figure 3: Mean absolute error (MAE) of the autoregressive integrated moving average and Taylor Expansion Forecasting model with different values of the parameter h.
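The tuning rule behind Figure 3 (evaluate candidate h values and keep the one with the smallest MAE on held-out data) is generic and can be sketched as follows. Here `forecast` is a hypothetical stand-in for the TEF predictor, and the toy error profile only mimics the U-shape of Figure 3; neither is the authors' code.

```python
def pick_h(candidates, actual, forecast):
    """Return the candidate h (and its MAE) minimizing validation error."""
    def mae(h):
        pred = forecast(h)
        return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)
    best = min(candidates, key=mae)
    return best, mae(best)

# Toy illustration with a made-up error profile shaped like Figure 3:
candidates = [1 / 70, 1 / 55, 1 / 45, 1 / 40, 1 / 35, 1 / 30, 1 / 25]
actual = [1.0] * 5

def toy_forecast(h):
    return [1.0 + 3.0 * abs(h - 1 / 55)] * 5  # error minimized at h = 1/55

h, err = pick_h(candidates, actual, toy_forecast)
```

The same loop applies to any one-parameter forecaster, which is one reason the single-parameter TEF setup is convenient in practice.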


Figure 4: Time series and scatter plots of daily original oil future prices with forecasted ones by autoregressive integrated moving average and support vector machine (ARIMA-SVM) and autoregressive integrated moving average and Taylor Expansion Forecasting (ARIMA-TEF). The lines in the scatter plots represent linear regression lines.

converges rapidly in h and then stays within a relatively stable interval. As discussed previously, h should be small to guarantee the accuracy of the differentiator. However, the differentiator becomes more sensitive to high-frequency noise as h goes to zero. Hence, there is a trade-off between the tracking accuracy and the noise toleration, and the choice of h depends on the experimental data. Some specific settings of the parameter h are listed in Tables 3 and 4.

4.4. Experimental results

This section reports the experimental results. The ARIMA model is established with Eviews (Version 6.0), produced by Quantitative Micro Software, while the simulation with the SVM is performed with the WEKA 3 toolbox (developed by the University of Waikato, New Zealand), and the Taylor expansion forecasting model is developed in MATLAB, produced by MathWorks. The estimated results on the different future price series are similar, and the experimental results for the oil future are employed in the following analysis.

The actual and predicted values provided by the two hybrid models on different ratios of the oil future price dataset are presented in Figure 4. The point-to-point comparisons of the experimental results (Figure 4(a) and (c)) show that the fitting values of the ARIMA-TEF

Table 5: Performance comparison on 10% testing sample size


MAE MAPE RMSE WIA SEP

Oil ARIMA-SVM 1.009011 1.138186 1.302891 0.954081 1.459920
ARIMA-TEF 0.836160 0.942360 0.969798 0.969798 1.217591
Wheat ARIMA-SVM 8.652815 1.045811 10.75393 0.986294 1.31388396
ARIMA-TEF 5.739791 0.704931 7.64003 0.993202 0.93343669
Soy bean ARIMA-SVM 13.16859 0.917355 17.92699 0.937918 1.25300949
ARIMA-TEF 10.41991 0.730895 13.21264 0.967434 0.92349939
Corn ARIMA-SVM 6.259969 0.864372 7.551768 0.971811 1.04407279
ARIMA-TEF 4.425204 0.614458 5.791064 0.983947 0.80064595
Gold ARIMA-SVM 7.502658 0.442401 10.039095 0.969306 0.591931
ARIMA-TEF 7.413046 0.436839 9.298504 0.974817 0.548264
Silver ARIMA-SVM 0.332304 1.040035 0.418906 0.974631 1.30947381
ARIMA-TEF 0.230778 0.724665 0.285899 0.98883 0.89370261
Copper ARIMA-SVM 61.10567 0.756731 80.19316 0.945171 1.01225235
ARIMA-TEF 52.48895 0.648645 64.41928 0.965541 0.81314373
Aluminium ARIMA-SVM 19.95401 0.952813 27.84308 0.952069 1.35775994
ARIMA-TEF 17.64447 0.840733 23.78068 0.966478 1.15965807
ARIMA-SVM, autoregressive integrated moving average and support vector machine; ARIMA-TEF, autoregressive integrated moving average and Taylor Expansion Forecasting; MAE, mean absolute error; RMSE, root mean square error; MAPE, mean absolute percentage error; WIA, Willmott's index of agreement; SEP, standard error of prediction.

Table 6: Performance comparison on 30% testing sample size


MAE MAPE RMSE WIA SEP

Oil ARIMA-SVM 1.441800 1.619349 1.835592 0.956335 2.042644
ARIMA-TEF 1.121012 1.257105 1.400932 0.975145 1.558955
Wheat ARIMA-SVM 12.69989 1.517837 16.49783 0.982893 1.954397
ARIMA-TEF 9.177549 1.089095 12.42827 0.990174 1.472301
Soy bean ARIMA-SVM 18.49713 1.198798 24.04017 0.988424 1.567922
ARIMA-TEF 14.47236 0.940876 18.64128 0.993077 1.215801
Corn ARIMA-SVM 11.05631 1.534577 14.91971 0.983289 2.016412
ARIMA-TEF 8.787177 1.210936 11.78222 0.989543 1.592378
Gold ARIMA-SVM 9.111308 0.544668 12.509227 0.980355 0.745035
ARIMA-TEF 7.492380 0.447127 9.615850 0.994366 0.572709
Silver ARIMA-SVM 0.33578 1.089005 0.455701 0.992326 1.471489
ARIMA-TEF 0.265055 0.858851 0.344308 0.99563 1.111795
Copper ARIMA-SVM 67.79504 0.870927 89.59409 0.974563 1.14514
ARIMA-TEF 54.44996 0.697545 67.67018 0.985855 0.864921
Aluminium ARIMA-SVM 20.26977 1.021324 25.84553 0.981005 1.301883
ARIMA-TEF 15.273978 0.766806 19.65335 0.989348 0.989973
ARIMA-SVM, autoregressive integrated moving average and support vector machine; ARIMA-TEF, autoregressive integrated moving average and Taylor Expansion Forecasting; MAE, mean absolute error; RMSE, root mean square error; MAPE, mean absolute percentage error; WIA, Willmott's index of agreement; SEP, standard error of prediction.

model are closer to the real data than those of the ARIMA-SVM model for the different testing datasets, and it can be seen clearly from the graph that the proposed ARIMA-TEF model has a greater capability to detect the local extremum points than the ARIMA-SVM model. The main reason is that the ARIMA-TEF model is insensitive to the noise disturbance in dealing with the complex time series data. In Figure 4(b) and (d), the estimated values of the hybrid ARIMA-TEF model cluster around the perfect-fit line (R2 = 1), while the scatter plot for the ARIMA-SVM is more dispersed. That is to say, more than 88% of the variance between the observed and estimated future datasets in the forecasting phase is captured by the hybrid ARIMA-TEF model, which reveals the powerful analysing capability of the new hybrid model for the future time series data forecasting problem.

The detailed statistical results are listed in Tables 5 and 6. Compared with the ARIMA-SVM, the ARIMA-TEF model performs much better on different ratios of testing sets by all criteria. For the out-of-sample precision comparisons, the MAE of the ARIMA-SVM model is 1.0090 and 1.4418, whereas that of the ARIMA-TEF is much lower, at 0.8362 and 1.1210, respectively. Similar results are also obtained from the other level prediction indicators. All the statistical indices listed in Tables 5 and 6 indicate that the ARIMA-TEF model boasts an outstanding capability to forecast the future price series.

To evaluate the robustness of the ARIMA-TEF model, the authors test the performance of the different models on several other representative commodity future prices.
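The explained-variance figure quoted above is the coefficient of determination. One common form, computed directly between the observed and predicted series, can be sketched as:

```python
def r_squared(actual, predicted):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

An R^2 of 1 means the predictions reproduce the observations exactly, matching the interpretation given in Section 4.2.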

Figure 5: Time series and scatter plots of several representative future daily prices (wheat, soy bean, corn, gold, silver, aluminium and copper) with forecasted ones by autoregressive integrated moving average and support vector machine (ARIMA-SVM) and autoregressive integrated moving average and Taylor Expansion Forecasting (ARIMA-TEF) on the 10% testing set. The lines in the scatter plots represent linear regression lines.

Similarly, two relative ratios of 70% and 90% are considered as training sets. The point-to-point comparisons of results and the scatter plots are plotted in Figures 5 and 6, and the forecasting performances of the two different models are summarized in Tables 5 and 6. It can be observed again that the ARIMA-TEF model has much better out-of-sample forecasting accuracy because of its smaller MAE, RMSE, MAPE and SEP and larger WIA and R2 values on the different future price datasets, which demonstrates that the ARIMA-TEF model is a preferable hybrid model and can be quite robust in terms of producing more accurate forecasting results for the financial time series forecasting problem.

The possible explanations for the ARIMA-TEF model's superiority to the ARIMA-SVM could be twofold. On one hand, besides its nonlinear modelling ability, the TEF model expresses different relationships between neighbouring points with different time intervals in the time series data through the notion of the weighted function, which overcomes the shortcomings of the equivalence relationship expressed by the Euclidean distance in the SVM. This instinctive ability to incorporate the time correlation improves the forecast performance of the ARIMA-TEF model. On the other hand, different input variables affect the SVM generalization ability significantly. However, in real application, the scope of the backtracking steps selected as the input variables is difficult to determine, which makes the effort to use the SVM for forecasting time series data based on limited information a very difficult task. By contrast, the TEF model can realize dynamic backtracking to the historical data by equation (2.36), which avoids the difficulty of searching back for the best time lags in the time series and makes full use of the historical information hidden in the previous data. Additionally,

Figure 6: Time series and scatter plots of several representative future daily prices (wheat, soy bean, corn, gold, silver, aluminium and copper) with forecasted ones by autoregressive integrated moving average and support vector machine (ARIMA-SVM) and autoregressive integrated moving average and Taylor Expansion Forecasting (ARIMA-TEF) on the 30% testing set. The lines in the scatter plots represent linear regression lines.

compared with other methods, the tracking differentiator is more feasible in extracting the derivatives of the time series data even if the specific model of the target is unknown (Guo et al., 2002; Ibrir, 2004; Davila et al., 2005). The hybrid methodology therefore not only exploits the unique strengths of the individual models but also yields a more powerful hybrid model that improves the comprehensive analysing ability for time series data with complex characteristics. Considering the superior performance of the ARIMA-TEF forecasts, investors could develop an effective decision support system based on this forecasting model to improve investment efficiency.

5. Conclusions

A novel hybrid model, ARIMA-TEF, incorporating the ARIMA and TEF models is proposed to forecast financial time series. The main highlights of our work are as follows:

1. We propose a new TEF model based on the tracking differentiator to predict the nonlinear components of the time series dataset. Compared with the SVM, the TEF forecasting model incorporates the time correlation knowledge and makes full use of the historical information in the time series data to improve the forecast accuracy, and the one-parameter setting of the TEF model simplifies its application in real situations.
2. The novel hybrid model ARIMA-TEF enjoys a superior ability to detect directional change and is robust in dealing with different time series because of the fact that the tracking differentiator is insensitive to the noise disturbance in extracting the specific derivatives of the time series trajectories. The empirical results show that the hybrid model produces the lowest level-prediction errors and the highest index-of-agreement and R2 values on the different future price datasets.
3. This study, motivated by the evidence that different forecasting models can complement each other in modelling complex data sets, develops a novel hybrid methodology, ARIMA-TEF, by combining the ARIMA and TEF models. In the hybrid model, the ARIMA model is applied to capture the linear patterns hidden in the time series data, whilst the TEF model is used to capture the nonlinear patterns. Experimental results obtained on real future datasets show that the proposed hybrid methodology ARIMA-TEF greatly improves prediction performance.

The literature shows that there is no universal model applicable to any environment. The proposed model in this paper is designed to improve forecasting accuracy from the data-driven perspective by incorporating the intrinsic characteristics of the historical financial time series data. Considering the special environment in a crisis, some new event-driven forecasting models, which incorporate important information such as the differential signal of the market behaviour and the linkage or contagion effects between different financial markets through some specific learning mechanism, should be more reasonable.

Additionally, as a newly proposed model, some problems should be addressed in future research. Firstly, only several commodity future price datasets are used as illustrative examples to evaluate the performance of the ARIMA-TEF model in this study; therefore, applying the proposed methodology to other kinds of financial time series is subject to future investigation. Secondly, the problem of making multi-step-ahead forecasts based on the TEF model is still open for both theoretical and applied research. Finally, the tracking differentiator is applied to extract the derivatives from the given series in this paper. However, some recent studies propose alternative approaches, such as those of Su et al. (2005) and Guo and Zhao (2011). Thus, in the future, a novel TEF model with a new tracking differentiator to approximate the derivatives of the time series trajectory could be investigated.

Acknowledgements

We would like to express our sincere appreciation to the reviewers for their valuable comments, which have greatly helped us improve the paper quality. This work also obtained funding from the following two projects: National Natural Science Foundation of China (No. 71371113) and Humanity & Social Science Foundation of the Ministry of Education of China (No. 13YJA790154).

Appendix A: Introduction to the tracking differentiator

Although the tracking differentiator is widely used in control theory and practice, to the best of our knowledge, it has not been used in the forecasting of financial time series. The tracking differentiator is therefore a new mathematical tool in forecasting financial time series. In control practice, when the signal is corrupted by noise, the derivative of the usually rapidly varying noise will drown out the derivative of the signal. For example, assume that the sinusoidal signal is corrupted by some Gauss noise \eta(t) of standard normal distribution N(0, 1) with intensity \varepsilon, that is,

u = \sin t + \varepsilon \eta(t)    (4.6)

The simulation of differentiating the signal u directly is plotted in Figure 6 (left). It is seen that the derivative of the noise drowns out the derivative of the signal even if the intensity \varepsilon is small.

The tracking differentiator is designed to extract the derivative of given signals corrupted by some noise. Much research has contributed to differential trackers, such as the high-gain observer-based differentiator (Dabroom & Khalil, 1999; Guo & Zhao, 2013), the super-twisting second-order sliding-mode algorithm (Davila et al., 2005), the linear time-derivative tracker (Guo et al., 2002; Ibrir, 2004) and robust exact differentiation (Levant, 1998), to name just a few.
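A quick numerical check of this point under the stated model u = sin t + εη(t): a forward difference of the noisy samples produces errors of order ε/Δt, which swamp the true derivative cos t even for small ε. The step size and noise intensity below are illustrative.

```python
import math
import random

random.seed(1)
dt, eps, n = 1e-3, 0.01, 1000                 # illustrative step size and intensity
t = [k * dt for k in range(n + 1)]
u = [math.sin(tk) + eps * random.gauss(0.0, 1.0) for tk in t]  # u = sin t + eps*eta(t)

# Forward-difference "derivative" of the noisy signal: the noise contributes
# errors on the scale of eps/dt (around 10 here), while max |cos t| is only 1.
du = [(u[k + 1] - u[k]) / dt for k in range(n)]
err = max(abs(du[k] - math.cos(t[k])) for k in range(n))
```

By contrast, the same forward difference applied to the noise-free sin t tracks cos t to within O(Δt), which is exactly the gap the tracking differentiator is designed to close.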

More specifically, if we assume that f makes the following free system

\dot{y}_1 = y_2, \qquad \dot{y}_2 = f(y_1, y_2)    (4.7)

globally stable in the sense that \lim_{t \to \infty} y_i(t) = 0, i = 1, 2, then the system

\dot{y}_1 = y_2, \qquad \dot{y}_2 = \frac{1}{\varepsilon^2} f(y_1 - u, \varepsilon y_2)    (4.8)

satisfies, for any T > 0 (Su et al., 2005; Guo & Zhao, 2011),

\int_0^T \left( |y_1(t) - u(t)| + |y_2(t) - \dot{u}(t)| \right) dt \to 0 \quad \text{as } \varepsilon \to 0    (4.9)

Therefore, y_1 and y_2 can be regarded as approximations to u and \dot{u}, respectively.

The main advantage of the tracking differentiator lies in the fact that it is noise tolerant. For example, we choose the following linear tracking differentiator of type (4.8):

\dot{y}_1 = y_2, \qquad \dot{y}_2 = -\frac{1}{\varepsilon^2}(y_1 - u) - \frac{2}{\varepsilon} y_2    (4.10)

where the signal u is given by (4.6). The simulation of the tracking results is plotted in Figure 1 (right). It is easy to find that the tracking differentiator is robust to the small white noise.

References

ABU-MOSTAFA, Y. and A. ATIYA (1996) Introduction to financial forecasting, Applied Intelligence, 6, 205–213.
AHRENS, J.H. and H.K. KHALIL (2009) High-gain observers in the presence of measurement noise: a switched-gain approach, Automatica, 45, 936–943.
ARMSTRONG, J.S. and F. COLLOPY (1992) Error measures for generalizing about forecasting methods: empirical comparisons, International Journal of Forecasting, 8, 69–80.
ATSALAKIS, G.S. and K.P. VALAVANIS (2009) Surveying stock market forecasting techniques Part II: soft computing methods, Expert Systems with Applications, 36(3, Part 2), 5932–5941.
BOX, G.E.P. and G.C. TIAO (1975) Intervention analysis with applications to economic and environmental problems, Journal of the American Statistical Association, 70, 70–79.
BRILLINGER, D.R. (1974) Time Series 45, Holt, Rinehart and Winston, Inc, New York-Montreal, Que-London, 661–712.
BUTCHER, J.C. (2003) Numerical Methods for Ordinary Differential Equations, John Wiley & Sons, New York.
CAO, L.J. (2003) Support vector machines experts for time series forecasting, Neurocomputing, 51, 321–339.
CHEN, K.Y. (2011) Combining linear and nonlinear model in forecasting tourism demand, Expert Systems with Applications, 38, 10368–10376.
CHENG, G. and Y. YANG (2015) Forecast combination with outlier protection, International Journal of Forecasting, 31, 223–237.
DABROOM, A.M. and H.K. KHALIL (1999) Discrete-time implementation of high-gain observers for numerical differentiation, International Journal of Control, 72, 1523–1537.
DAVILA, J., L. FRIDMAN and A. LEVANT (2005) Second-order sliding-modes observer for mechanical systems, IEEE Transactions on Automatic Control, 50, 1785–1789.
DELURGIO, S.A. (1960) Principles and Procedures of Statistics with Special Reference to the Biological Sciences, McGraw-Hill, New York etc.
DELURGIO, S.A. (1998) Forecasting Principles and Applications, McGraw-Hill, Boston etc.
DÍAZ-ROBLES, L.A., J.C. ORTEGA, J.S. FU, G.D. REED, J.C. CHOW, J.G. WATSON and J.A. MONCADA-HERRERA (2008) A hybrid ARIMA and artificial neural networks model to forecast particulate matter in urban areas: the case of Temuco, Chile, Atmospheric Environment, 42, 8331–8340.
DUAN, L. and L.D. XU (2012) Business intelligence for enterprise systems: a survey, IEEE Transactions on Industrial Informatics, 8, 679–687.
FENG, H.Y.P. and S.J. LI (2013) A tracking differentiator based on Taylor expansion, Applied Mathematics Letters, 26, 735–740.
GARGANO, A. and A. TIMMERMANN (2014) Forecasting commodity price indexes using macroeconomic and financial predictors, International Journal of Forecasting, 30, 825–843.
GOODWIN, P. and R. LAWTON (1999) On the asymmetry of the symmetric MAPE, International Journal of Forecasting, 15, 405–408.
GOOIJER, J.G.D. and R.J. HYNDMAN (2006) 25 years of time series forecasting, International Journal of Forecasting, 22, 443–473.
GUO, B.Z., J.Q. HAN and F.B. XI (2002) Linear tracking-differentiator and application to online estimation of the frequency of a sinusoidal signal with random noise perturbation, International Journal of Systems Science, 33, 351–358.
GUO, B.Z. and Z.L. ZHAO (2011) On convergence of tracking differentiator, International Journal of Control, 84, 693–701.
GUO, B.Z. and Z.L. ZHAO (2013) Weak convergence of nonlinear high-gain tracking differentiator, IEEE Transactions on Automatic Control, 58, 1074–1080.
GUO, J., S. XU and Z. BI (2014) An integrated cost-based approach for real estate appraisals, Information Technology & Management, 15, 131–139.
HADAVANDI, E., H. SHAVANDI and A. GHANBARI (2010) Integration of genetic fuzzy systems and artificial neural networks for stock price forecasting, Knowledge-Based Systems, 23, 800–808.
HANSEN, J.V., J.B. MCDONALD and R.D. NELSON (2006) Some evidence on forecasting time-series with support vector machines, The Journal of the Operational Research Society, 57, 1053–1063.
HUANG, W., Y. NAKAMORI and S.Y. WANG (2005) Forecasting stock market movement direction with support vector machine, Computers & Operations Research, 32, 2513–2522.
IBRIR, S. (2004) Linear time-derivative trackers, Automatica, 40, 397–405.
KAO, L.J., C.C. CHIU, C.J. LU and J.L. YANG (2013) Integration of nonlinear independent component analysis and support vector regression for stock price forecasting, Neurocomputing, 99, 534–542.
KAUFFMAN, R., J. LIU and D. MA (2015) Technology investment decision-making under uncertainty, Information Technology and Management, 16, 153–172.
KIM, K. (2003) Financial time series forecasting using support vector machines, Neurocomputing, 55, 307–319.
LEE, Y.S. and L.I. TONG (2011) Forecasting time series using a methodology based on autoregressive integrated moving average and genetic programming, Knowledge-Based Systems, 24, 66–72.
LEVANT, A. (1998) Robust exact differentiation via sliding mode technique, Automatica, 34, 379–384.
LU, C.J., T.S. LEE and C.C. CHIU (2009) Financial time series forecasting using independent component analysis and support vector regression, Decision Support Systems, 47, 115–125.

MATHEWS, B.P. and A. DIAMANTOPOULOS (1994) Towards a taxonomy of forecast error measures: a factor-comparative investigation of forecast error dimensions, Journal of Forecasting, 13, 409–416.
NIE, H., G. LIU, X. LIU and Y. WANG (2012) Hybrid of ARIMA and SVMs for short-term load forecasting, Energy Procedia, 16, 1455–1460.
PAI, P.F. and C.S. LIN (2005) A hybrid ARIMA and support vector machines model in stock price forecasting, Omega, 33, 497–505.
PAN, S., L. WANG, K. WANG, Z. BI, S. SHAN and B. XU (2014) A knowledge engineering framework for identifying key impact factors from safety-related accident cases, Systems Research and Behavioral Science, 31, 383–397.
POLIMENIS, V. and I. NEOKOSMIDIS (2014) The global financial crisis and its transmission to Asia Pacific, Journal of Management Analytics, 1, 266–284.
RUBIO, G., H. POMARES, I. ROJAS and L.J. HERRERA (2011) A heuristic method for parameter selection in LS-SVM: application to time series prediction, International Journal of Forecasting, 27, 725–739.
SHI, S., L. XU and B. LIU (1988) Applications of artificial neural networks to the nonlinear combination of forecasts, Expert Systems, 110, 195–201.
SHI, S., L. XU and B. LIU (1999) Improving the accuracy of nonlinear combined forecasting using neural networks, Expert Systems with Applications, 16, 49–54.
SU, Y., C. ZHENG, S. DONG and B. DUAN (2005) A simple nonlinear velocity estimation for high-performance motion control, IEEE Transactions on Industrial Electronics, 52, 1161–1169.
TAY, F.E.H. and L.J. CAO (2001) Application of support vector machines in financial time series forecasting, Omega, 29, 309–317.
TAY, F.E.H. and L.J. CAO (2002) Modified support vector machines in financial time series forecasting, Neurocomputing, 48, 847–861.
VAPNIK, V.N. (1995) The Nature of Statistical Learning Theory, Springer, New York.
VAPNIK, V.N. (1998) Statistical Learning Theory, Wiley, New York.
VENTURA, S., M. SILVA, D. PEREZ-BENDITO and C. HERVAS (1995) Artificial neural networks for estimation of kinetic analytical parameters, Analytical Chemistry, 67, 1521–1525.
WANG, B., H. HUANG and X. WANG (2012) A novel text mining approach to financial time series forecasting, Neurocomputing, 83, 136–145.
WANG, X. (2015) Support vector machine and ROC curves for modeling of aircraft fuel consumption, Journal of Management Analytics, 2, 22–34.
WEI, L., W. ZHANG, X. XIONG and Y. ZHAO (2014) A multi-agent system for policy design of tick size in stock index futures markets, Systems Research and Behavioral Science, 31, 512–526.
XIONG, T., Y. BAO and Z. HU (2013) Beyond one-step-ahead forecasting: evaluation of alternative multi-step-ahead forecasting models for crude oil prices, Energy Economics, 40, 405–415.
XIONG, T., C. LI, Y. BAO, Z. HU and L. ZHANG (2015) A combination method for interval forecasting of agricultural commodity futures prices, Knowledge-Based Systems, 77, 92–102.
YUAN, R., Z. LI, X. GUAN and L. XU (2010) An SVM-based machine learning method for accurate internet traffic classification, Information Systems Frontiers, 12, 149–156.
ZHANG, G.P. (2003) Time series forecasting using a hybrid ARIMA and neural network model, Neurocomputing, 50, 159–175.
ZHU, B.Z. and Y.M. WEI (2013) Carbon price forecasting with a novel hybrid ARIMA and least squares support vector machines methodology, Omega, 41, 517–524.
ZHU, X., H. WANG, L. XU and H. LI (2008) Predicting stock index increments by neural networks: the role of trading volume under different horizons, Expert Systems with Applications, 34, 3043–3054.

The authors

Guisheng Zhang

Guisheng Zhang is currently a lecturer at the School of Economics and Management, Shanxi University, China. He received his Bachelor's degree and Master's degree from the School of Computer and Information Technology, Shanxi University, China, in 2000 and 2007, respectively. He then obtained his PhD degree from the School of Economics and Management, Shanxi University, in 2016. His research interests focus on financial time series analysis and forecasting.

Xindong Zhang

Xindong Zhang is currently a Professor at the School of Economics and Management, Shanxi University, China, where she obtained her Bachelor's and Master's degrees in Mathematics. She then received her PhD degree in Management from the Tianjin University of Finance and Economics, China. Her research interests include asset pricing and corporate finance.

Hongyinping Feng

Hongyinping Feng received the BSc, MSc and PhD degrees in mathematics in 2003, 2006 and 2013, respectively, all from Shanxi University, China. He was a visiting scholar at the University of the Witwatersrand, South Africa, from 2013 to 2014. He is currently a Professor in the School of Mathematical Sciences, Shanxi University, China. His research interests focus on distributed parameter systems control.

