
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, PART B: CYBERNETICS, VOL. 40, NO. 5, OCTOBER 2010

A Stochastic HMM-Based Forecasting Model for Fuzzy Time Series
Sheng-Tun Li, Member, IEEE, and Yi-Chung Cheng

Abstract: Recently, fuzzy time series have attracted more academic attention than traditional time series due to their capability of dealing with the uncertainty and vagueness inherent in the data collected. The formulation of fuzzy relations is one of the key issues affecting forecasting results. Most of the present works adopt IF-THEN rules for relationship representation, which leads to higher computational overhead and rule redundancy. Sullivan and Woodall proposed a Markov-based formulation and a forecasting model to reduce computational overhead; however, its applicability is limited to handling one-factor problems. In this paper, we propose a novel forecasting model based on the hidden Markov model by enhancing Sullivan and Woodall's work to allow handling of two-factor forecasting problems. Moreover, in order to make the nature of conjecture and randomness of forecasting more realistic, the Monte Carlo method is adopted to estimate the outcome. To test the effectiveness of the resulting stochastic model, we conduct two experiments and compare the results with those from other models. The first experiment consists of forecasting the daily average temperature and cloud density in Taipei, Taiwan, and the second experiment is based on the Taiwan Weighted Stock Index by forecasting the exchange rate of the New Taiwan dollar against the U.S. dollar. In addition to improving forecasting accuracy, the proposed model adheres to the central limit theorem, and thus, the result statistically approximates to the real mean of the target value being forecast.

Index Terms: Forecasting, fuzzy time series, hidden Markov model (HMM), Monte Carlo method.

Manuscript received December 9, 2007; revised June 24, 2008 and April 20, 2009; accepted November 1, 2009. Date of publication December 18, 2009; date of current version September 15, 2010. This work was supported by the National Science Council, Taiwan, under Contracts NSC96-2416-H-006-011-MY2 and NSC97-2410-H-165-003. This paper was recommended by Associate Editor A. Nuernberger.
S.-T. Li is with the Institute of Information Management and the Department of Industrial and Information Management, National Cheng Kung University, Tainan 701, Taiwan (e-mail: stli@mail.ncku.edu.tw).
Y.-C. Cheng is with the Department of International Business Management, Tainan University of Technology, Yongkang, Tainan County 710, Taiwan (e-mail: t20042@mail.tut.edu.tw).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSMCB.2009.2036860

I. INTRODUCTION

The forecasting problem of time series data plays an important role in various domains, such as air pollution, population growth, rainfall prediction, and stock forecasting. It deals with forecasting future outcomes from a temporally ordered sequence of past observed data points, whose values are usually real numbers. However, traditional time series analysis cannot handle the vagueness and uncertainty inherent in certain data due to inaccuracies in measurements, incomplete sets of observations, or difficulties in obtaining the measurements [1]. Recently, the theory of fuzzy logic has been widely recognized as a successful approach for dealing with data uncertainty. For time series, the uncertain values can be modeled as fuzzy variables, resulting in so-called fuzzy time series [1]. The term fuzzy time series has been used with several different meanings [2]: 1) time series with uncertain single data (fuzzy data) at each point in time [3], [4]; 2) time series with fuzzified real-valued single data at each point in time [5]; and 3) fuzzy time series based on a set of elementary finite time series and composed of several significant representative courses [6]. It is the second definition that will be used for the rest of this paper.

In 1993, Song and Chissom introduced the theory of fuzzy logic into forecasting time series problems and proposed a new paradigm known as fuzzy time series, capable of dealing with vague and incomplete data represented as linguistic values under uncertain circumstances [5], [7], [8]. They established a four-step framework to manipulate the forecasting problem: 1) determine and partition the universe of discourse into intervals; 2) define fuzzy sets from the universe of discourse and fuzzify the time series; 3) derive fuzzy relationships existing in the fuzzified time series; and 4) forecast and defuzzify the forecasting outputs. Finally, they validated their model using enrollment data from the University of Alabama.

Since Song and Chissom's pioneering work, a number of related research works have been reported that follow their framework and aim to improve forecasting accuracy and/or reduce the computational overhead. These works include order-1 [9]–[15], high-order [16]–[21], single-factor [9], [10], [12]–[14], [18], [21], two-factor [11], [17], [19], [20], and multifactor models [15]. All of them rely on constructing groups of fuzzy relationships in advance represented as IF-THEN rules, which come with higher computational overhead and rule redundancy [22]. In order to alleviate the computational overhead in deriving the fuzzy relationship rules, Sullivan and Woodall proposed a so-called Markov-based model by using conventional matrix multiplication [13]. In their model, the fuzzy relationships are represented by a time-invariant Markov process, in which the relationship matrix is determined by the probability distributions of linguistic values and each row of the matrix is unit normalized. However, their model does not take into consideration the frequency of state transitions on the estimation of the relationship matrix, resulting in missing information and causing the forecasting to be unsatisfactory [23]. In addition, it is only applicable to one-factor forecasting problems. In the real world, an event may be affected by interrelated factors, and thus, a forecasting model that can handle multiple ones will be more realistic; therefore, the results that it produces will be more useful [19], [24].

1083-4419/$26.00 © 2009 IEEE



In this paper, we propose a novel forecasting model by enhancing Sullivan and Woodall's Markov-based forecasting one to allow handling two-factor forecasting problems. This model is built on the basis of the hidden Markov model (HMM), a probabilistic model that is commonly applied to time series [25]. Moreover, by applying the Monte Carlo method when estimating the forecasting outcome, the nature of conjecture and randomness of the forecasting are made more realistic [1]. To test the effectiveness of our model, we conduct two experiments in forecasting daily average temperature and the Taiwan Weighted Stock Index and compare the results with those from other models.

The remainder of this paper is organized as follows. In Section II, the basic concept of fuzzy time series is briefly introduced, and in Section III, the new forecasting model based on HMM is proposed. Section IV presents a performance evaluation of the model and a comparison of the results. The last section describes our conclusions and directions for future work.

II. FUZZY TIME SERIES

In this section, we briefly describe the concept of fuzzy time series and its forecasting framework. The definition of fuzzy time series used in this paper was first proposed by Song and Chissom [7].

Definition 1: Let $Y(t)$ $(t = \ldots, 0, 1, 2, \ldots)$, a subset of $\mathbb{R}$, be the universe of discourse on which fuzzy sets $f_i(t)$ $(i = 1, 2, \ldots)$ are defined, and let $F(t)$ be a collection of $f_i(t)$. Then, $F(t)$ is called a fuzzy time series on $Y(t)$ $(t = \ldots, 0, 1, 2, \ldots)$.

Song and Chissom employed a fuzzy relational equation to develop their forecasting model under the assumption that the observations at time $t$ are dependent only upon the accumulated results of the observations at previous times, which is defined as follows.

Definition 2: If, for any $f_j(t) \in F(t)$, where $j \in J$, there exist an $f_i(t-1) \in F(t-1)$, where $i \in I$, and a fuzzy relation $R_{ij}(t, t-1)$, such that $f_j(t) = f_i(t-1) \circ R_{ij}(t, t-1)$, let $R(t, t-1) = \bigcup_{i,j} R_{ij}(t, t-1)$, where $\cup$ is the union operator and $\circ$ is the composition. $R(t, t-1)$ is called the fuzzy relation between $F(t)$ and $F(t-1)$, which can be represented using the following fuzzy relational equation:

$$F(t) = F(t-1) \circ R(t, t-1).$$

Definition 3: If we suppose that $F(t)$ is caused by $F(t-1)$, $F(t-2)$, $\ldots$, or $F(t-m)$ $(m > 0)$, then the first-order model of $F(t)$ can be expressed as

$$F(t) = F(t-1) \circ R(t, t-1) \tag{1}$$

or

$$F(t) = \bigl(F(t-1) \cup F(t-2) \cup \cdots \cup F(t-m)\bigr) \circ R_o(t, t-m) \tag{2}$$

where $\cup$ is the union operator and $\circ$ is the composition. $R(t, t-1)$ is called the fuzzy relation between $F(t)$ and $F(t-1)$, and $R_o(t, t-k)$ is the fuzzy relation that joins $F(t)$ with $F(t-1)$, $F(t-2)$, $\ldots$, or $F(t-k)$, where the subscript $o$ denotes the relationship "or."

In the literature, the fuzzy relation $R_{ij}(t, t-1)$ is usually represented by a fuzzy logical relationship rule (IF-THEN rule), as in [5], [8]–[10], [16], [17], [19]–[22]. In this paper, the fuzzy relation is realized by an HMM, which will be discussed in the following section.

III. STOCHASTIC HMM-BASED FORECASTING MODEL

Under realistic circumstances, there are usually multiple related factors that influence the behavior and outcome of any event. For example, when trying to predict today's temperature, we could easily look up and observe the clouds in the sky. If there are dense clouds, it can be intuitively inferred that the temperature will be low. However, temperature depends on not only cloud density but also temperature values in previous days. We thus might obtain a better forecast for today's temperature by combining knowledge about what happened in previous days with the observed cloud state. These kinds of problems are constantly encountered in the real world, which is why our paper focuses on targeting them. Of course, the state of the temperature is not merely controlled by both factors, as elements such as winds and air pressure are also likely to have an impact. However, in this paper, we limit ourselves to problems concerning two factors, in which both are probabilistically related. This can be formally represented as follows.

Given two fuzzy time series $F(t) = \{f_i(t) \mid t = 1, 2, \ldots, T,\ i = 1, 2, \ldots, n\}$ and $G(t) = \{g_i(t) \mid t = 1, 2, \ldots, T,\ i = 1, 2, \ldots, m\}$, where $f_i(t)$ and $g_i(t)$ are the respective states at time $t$, the fuzzy relation among $F(t)$, $G(t)$, and $F(t-1)$ can be formulated as a fuzzy relational equation

$$F(t) = \bigl(F(t-1), G(t)\bigr) \circ R(t, t-1).$$

To solve the forecasting problem of $f_i(t)$, which is dependent on $f_i(t-1)$ and $g_i(t)$, the theory of HMM is applied, in which $F(t)$ and $G(t)$ are the hidden and observed state sequences, respectively.

A. HMM

HMM is a statistical model to deal with symbols or signal sequences that are assumed to be a Markov process [25], [26]. The hidden Markov process is based on two essential assumptions: 1) The next state is dependent only upon the current state, and 2) each state-transition probability does not vary in time, i.e., it is a time-invariant model.

An HMM consists of two state sets and three matrices of probabilities. The two sets are the hidden state set $S = \{s_1, s_2, \ldots, s_n\}$ and the observable state set $O = \{o_1, o_2, \ldots, o_m\}$, where the hidden states are probabilistically related to the observable states and $n$ and $m$ are the number of hidden and observable states, respectively. The three relational probability matrices are defined between the observable and hidden states: $\pi$, $A$, and $B$. A triple compact notation $\lambda = (\pi, A, B)$ is given to indicate the complete parameter set of an HMM.

$\pi$ is a $1 \times n$ initial state vector, denoted as $\pi = [\pi_1, \pi_2, \ldots, \pi_n] = [\Pr(s_{1,t=1}), \Pr(s_{2,t=1}), \ldots, \Pr(s_{n,t=1})]$, where $\pi_i$ is the probability of each state occurring at initial time step $t = 1$.

$A$ is an $n \times n$ state-transition matrix $A = \{a_{ij}\}$, in which $a_{ij}$ is the state-transition probability from states $s_i$ to $s_j$, i.e., $a_{ij} = \Pr(s_{j,t} \mid s_{i,t-1}) = \Pr(s_{i,t-1} \to s_{j,t})$.

$B$ is an $n \times m$ confusion matrix $B = \{b_{ij}\}$, where $b_{ij}$ is the probability of observing a state $o_j$, given the hidden state $s_i$, i.e., $b_{ij} = \Pr(o_{j,t} \mid s_{i,t})$.

The following are three major problems that the HMM solution has been successfully applied to:
1) evaluation: finding the probability of an observed sequence, given an HMM;
2) decoding: finding the sequence of hidden states that most probably generated an observed sequence;
3) learning: generating an HMM, given a sequence of observations.

For problem 2), the Viterbi algorithm [27], [28] provides an effective way of finding the sequence of hidden states that most probably generated an observed sequence, i.e., to find the single best state sequence $Q = q_1 q_2 \ldots q_t \ldots q_T$, for a given observation sequence $P = p_1 p_2 \ldots p_t \ldots p_T$, where $q_t$ is a hidden state at time $t$, $q_t \in S$, and $p_t$ is an observable state at time $t$, $p_t \in O$. Let $\delta_t(i)$ be the best probability along a single path at time $t$, which accounts for the first $t$ observations and ends in state $s_i$

$$\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} \Pr(q_1, q_2, \ldots, q_t = s_i, p_1, p_2, \ldots, p_t \mid \lambda). \tag{3}$$

To correctly retrieve the state sequence, the next best state is obtained by

$$\delta_{t+1}(j) = \Bigl[\max_i \delta_t(i)\, a_{ij}\Bigr]\, b_j(p_{t+1}). \tag{4}$$

In this paper, our objective is to perform a short-term forecast in a fuzzy time series, which fits problem 2), as described.

B. Forecasting Model

The proposed forecasting model expands Sullivan and Woodall's model by combining HMM and Monte Carlo simulation and consists of the following five steps. Please note that if the given time series is a fuzzy time series, steps 1 and 2 are unnecessary.

Step 1) Partitioning the universe of discourse into several intervals of equal length: Let $U^s$ and $U^o$ be the discourse universes of hidden and observation variables, respectively. In general, $U^s$ and $U^o$ are defined as $U^s = [D^s_{\min} - D^s_1,\ D^s_{\max} + D^s_2]$ and $U^o = [D^o_{\min} - D^o_1,\ D^o_{\max} + D^o_2]$, where $D^s_{\min}$, $D^s_{\max}$, $D^o_{\min}$, and $D^o_{\max}$ are the respective minimal and maximal values of the historical data of hidden and observation variables, and $D^s_1$, $D^s_2$, $D^o_1$, and $D^o_2$ are proper positive numbers. $U^s$ is then partitioned into $n$ equal intervals, $u_1, u_2, \ldots, u_n$, with length $l_s$ being defined as $l_s = (1/n)[(D^s_{\max} + D^s_2) - (D^s_{\min} - D^s_1)]$. $U^o$ is partitioned into $m$ equal intervals, $v_1, v_2, \ldots, v_m$, with length $l_o$ being defined as $l_o = (1/m)[(D^o_{\max} + D^o_2) - (D^o_{\min} - D^o_1)]$.

Step 2) Defining the fuzzy sets on the universe of discourse and fuzzifying the time series: Given a traditional crisp time series, one needs a fuzzification procedure to obtain the corresponding fuzzy one. For hidden states, $n$ fuzzy sets $s_1, s_2, \ldots, s_n$ can be defined on $U^s$ using general membership functions, as expressed as follows:

$$s_i = \sum_{j=1}^{n} \mu_{ij} / u_j \tag{5}$$

where $\mu_{ij}$ is the membership degree of $s_i$ belonging to $u_j$ and is defined by

$$\mu_{ij} = \begin{cases} 1, & \text{if } j = i \\ 0.5, & \text{if } j = i - 1 \text{ or } i + 1 \\ 0, & \text{otherwise.} \end{cases}$$

Then, for a given historical datum $Y_t$, its membership degree belonging to interval $u_i$ is determined by the following heuristic rules.
Rule 1) If $Y_t$ is located at $u_1$, the membership degrees are 1 for $u_1$, 0.5 for $u_2$, and 0 otherwise.
Rule 2) If $Y_t$ belongs to $u_i$, $1 < i < n$, then the degrees are 1, 0.5, and 0.5 for $u_i$, $u_{i-1}$, and $u_{i+1}$, respectively, and 0 otherwise.
Rule 3) If $Y_t$ is located at $u_n$, the membership degrees are 1 for $u_n$, 0.5 for $u_{n-1}$, and 0 otherwise.
Then, $Y_t$ is fuzzified as $s_j$, where the membership degree in interval $j$ is maximal.

For observable states, $m$ fuzzy sets $o_1, o_2, \ldots, o_m$ can be defined on $U^o$, as expressed as follows:

$$o_i = \sum_{j=1}^{m} \mu_{ij} / v_j \tag{6}$$

where $\mu_{ij}$ is the membership degree of $o_i$ belonging to $v_j$ and is defined by

$$\mu_{ij} = \begin{cases} 1, & \text{if } j = i \\ 0.5, & \text{if } j = i - 1 \text{ or } i + 1 \\ 0, & \text{otherwise.} \end{cases}$$

The observation variable can then be fuzzified in the same way as the hidden variable.

Step 3) Modeling fuzzy logical relationships using HMM: The objective of forecasting is to estimate the probability of hidden state $s_{i,t}$ at time $t$, given the condition that observable state $o_{k,t}$ is obtained at the same time, i.e., $\Pr(s_{i,t} \mid o_{k,t})$. Following Bayes' theorem, we have the following:

$$\Pr(s_{i,t} \mid o_{k,t}) = \frac{\Pr(s_{i,t}) \Pr(o_{k,t} \mid s_{i,t})}{\Pr(o_{k,t})} \tag{7}$$

$$= \frac{\sum_h \Pr(s_{i,t} \mid s_{h,t-1}) \Pr(s_{h,t-1}) \Pr(o_{k,t} \mid s_{i,t})}{\Pr(o_{k,t})}. \tag{8}$$
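Before the HMM parameters are estimated, Steps 1 and 2 reduce every crisp observation to a state index. The following is a minimal Python sketch of that reduction, not the authors' implementation. The function names are hypothetical, and `lo` and `hi` are assumed to be the end points of the universe of discourse with the margins $D_1$ and $D_2$ already applied.

```python
def interval_index(y, lo, hi, n):
    """Return the 1-based index i of the equal-length interval u_i of [lo, hi] that contains y."""
    if not lo <= y <= hi:
        raise ValueError("value lies outside the universe of discourse")
    length = (hi - lo) / n
    # Assumption: intervals are half-open [a, b), except the last one, which is closed.
    return min(int((y - lo) // length) + 1, n)


def membership_vector(i, n):
    """Membership degrees over u_1..u_n per Rules 1)-3): 1 at u_i, 0.5 at its neighbours."""
    mu = [0.0] * n
    mu[i - 1] = 1.0
    if i > 1:
        mu[i - 2] = 0.5
    if i < n:
        mu[i] = 0.5
    return mu


def fuzzify(y, lo, hi, n):
    """Return (state index with maximal membership, membership vector) for datum y."""
    mu = membership_vector(interval_index(y, lo, hi, n), n)
    return mu.index(max(mu)) + 1, mu


# Usage with illustrative bounds: a universe [21, 33] split into 6 intervals.
state, mu = fuzzify(25.3, 21, 33, 6)
print(state, mu)
```

Because the membership degree is always 1 on the interval containing the datum, the fuzzified state index coincides with the interval index; the membership vector is kept only to mirror the rules as stated.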

Therefore, a particular HMM can be characterized by the following three matrices:

$$A = \{a_{ij}\}, \quad \text{where } a_{ij} = \Pr(s_{j,t} \mid s_{i,t-1})$$
$$B = \{b_{ij}\}, \quad \text{where } b_{ij} = \Pr(o_{j,t} \mid s_{i,t})$$
$$\pi = \{\pi_i\}, \quad \text{where } \pi_i = \Pr(s_{i,1}).$$

The state-transition matrix $A$ provides information about the relation of two contiguous hidden states. The relation of observation and hidden states is characterized as the confusion matrix $B$. $\pi$ is a vector with the probability of the initial state.

Now, we construct an HMM $\lambda = (\pi, A, B)$ whose parameters are to be estimated by relative frequencies as follows. First, the initial state vector $\pi$ is set to be a $1 \times n$ matrix and is defined as

$$\pi_i = \Pr(s_{i,1}) = \frac{n(s_{i,1})}{N_1} \tag{9}$$

where $n(s_{i,1})$ is the number of the initial state $s_i$ in the data set and $N_1 = \sum_{i=1}^{n} n(s_{i,1})$.

Next, the state-transition matrix $A = \{a_{ij}\}$ is an $n \times n$ matrix defined as follows:

$$a_{ij} = \Pr(s_{j,t} \mid s_{i,t-1}) = \frac{n(s_{j,t}, s_{i,t-1})}{n(s_{i,t-1})} \tag{10}$$

where $a_{ij} \ge 0$ and $\sum_{j=1}^{n} a_{ij} = 1$, $1 \le i \le n$.

Finally, the confusion matrix $B = \{b_{ij}\}$ is an $n \times m$ matrix represented as follows:

$$b_{ij} = \Pr(o_{j,t} \mid s_{i,t}) = \frac{n(o_{j,t}, s_{i,t})}{n(s_{i,t})} \tag{11}$$

where $b_{ij} \ge 0$ and $\sum_{j=1}^{m} b_{ij} = 1$, $1 \le i \le n$.

Step 4) Forecasting by stochastic simulation: As mentioned earlier, the probability of $s_{i,t}$ is determined by $\Pr(s_{i,t} \mid s_{h,t-1})$ and $\Pr(o_{k,t} \mid s_{i,t})$, as derived in (8). The probabilities of all possible hidden states occurring at time $t$ $(t \ge 2)$ with the transition influence of the previous hidden state $s_{h,t-1}$ and the observation state $o_{k,t}$ are then computed. Such an influential relation can be represented by the function $\gamma_t(s_{h,t-1}, o_{k,t})$ defined as follows:

$$\begin{aligned} \gamma_t(s_{h,t-1}, o_{k,t}) &= B(:, [k])^T \odot A([h], :) \\ &= \bigl[\Pr(o_k \mid s_1)\Pr(s_{1,t} \mid s_{h,t-1}),\ \Pr(o_k \mid s_2)\Pr(s_{2,t} \mid s_{h,t-1}),\ \ldots,\ \Pr(o_k \mid s_n)\Pr(s_{n,t} \mid s_{h,t-1})\bigr] \end{aligned} \tag{12}$$

where $A([h], :)$ is the $h$th row of state-transition matrix $A$ and $B(:, [k])$ is the $k$th column of confusion matrix $B$. The symbol of operator $\odot$ is an array multiplication, and thus, $A \odot B$ means the element-by-element vector multiplication of $A$ and $B$.

For the special case when $t = 1$, where the observation state is the only information available, the previous hidden state does not exist; therefore, $\Pr(s_{i,1} \mid o_{k,1})$ in (7) is derived as

$$\Pr(s_{i,1} \mid o_{k,1}) = \frac{\Pr(s_{i,1}) \Pr(o_{k,1} \mid s_{i,1})}{\Pr(o_{k,1})} = \frac{\Pr(s_{i,1}, o_{k,1})}{\Pr(o_{k,1})}. \tag{13}$$

In such a way, the probability of hidden state $s_i$ at $t = 1$ can be determined by a function $\gamma_1(o_{k,1})$ defined as

$$\begin{aligned} \gamma_1(o_{k,1}) &= \pi \odot B(:, [k])^T \\ &= [\Pr(s_{1,1}), \Pr(s_{2,1}), \ldots, \Pr(s_{n,1})] \odot [\Pr(o_{k,1} \mid s_{1,1}), \Pr(o_{k,1} \mid s_{2,1}), \ldots, \Pr(o_{k,1} \mid s_{n,1})] \\ &= [\Pr(s_{1,1})\Pr(o_{k,1} \mid s_{1,1}),\ \Pr(s_{2,1})\Pr(o_{k,1} \mid s_{2,1}),\ \ldots,\ \Pr(s_{n,1})\Pr(o_{k,1} \mid s_{n,1})] \\ &= [\Pr(o_{k,1}, s_{1,1}),\ \Pr(o_{k,1}, s_{2,1}),\ \ldots,\ \Pr(o_{k,1}, s_{n,1})]. \end{aligned} \tag{14}$$

In general, the forecasting result is located at the state with the highest probability. However, according to the probability theorem, an event with a higher probability only has a greater chance of occurring but will not necessarily occur. Due to the nature of conjecture and randomness, a stochastic simulation, in particular the Monte Carlo method, is adopted to estimate the outcome. The Monte Carlo method provides approximate solutions by stochastic sampling experiments and solves problems based on random numbers and probability statistics. The forecasting process consists of two subsequent tasks: normalization and Monte Carlo simulation.

First, in order to compute the distribution of all possible forecasting outcomes, normalization is needed for the probability vectors of functions $\gamma_t(s_{h,t-1}, o_{k,t})$ and $\gamma_1(o_{k,1})$, which are expressed, respectively, as

$$N\gamma_t(s_{h,t-1}, o_{k,t}) = [\gamma_t^{s_1}, \gamma_t^{s_2}, \ldots, \gamma_t^{s_n}] \tag{15}$$

$$N\gamma_1(o_{k,1}) = [\gamma_1^{s_1}, \gamma_1^{s_2}, \ldots, \gamma_1^{s_n}] \tag{16}$$

where

$$\gamma_t^{s_i} = \frac{\Pr(o_{k,t} \mid s_{i,t}) \Pr(s_{i,t} \mid s_{h,t-1})}{\sum_{i=1}^{n} \Pr(o_{k,t} \mid s_{i,t}) \Pr(s_{i,t} \mid s_{h,t-1})} \tag{17}$$

$$\gamma_1^{s_i} = \frac{\Pr(o_{k,1}, s_{i,1})}{\sum_{i=1}^{n} \Pr(o_{k,1}, s_{i,1})}. \tag{18}$$

Second, a stochastic experiment consisting of $l$ Monte Carlo simulations is conducted to determine the forecasting result $s_i$ from $N\gamma_t(s_{h,t-1}, o_{k,t})$ or $N\gamma_1(o_{k,1})$. Such forecasting results can be represented with a vector $C$

$$C = [c_1, c_2, \ldots, c_n] \tag{19}$$

where $c_i$ is the number of forecasting hidden states belonging to $s_i$ and $\sum_{i=1}^{n} c_i = l$.

Step 5) Defuzzifying the forecasting outputs: There are several defuzzification methods that can be chosen. For simplicity, we use the most popular one, namely, center of gravity, which is expressed as

$$\frac{\sum_{i=1}^{n} c_i t_i}{\sum_{i=1}^{n} c_i} \tag{20}$$

where

$$t_i = \begin{cases} \dfrac{1}{1.5}\,(m_1 + 0.5\, m_2), & i = 1 \\[2mm] \dfrac{1}{2}\,(0.5\, m_{i-1} + m_i + 0.5\, m_{i+1}), & i = 2, 3, \ldots, n - 1 \\[2mm] \dfrac{1}{1.5}\,(0.5\, m_{n-1} + m_n), & i = n \end{cases} \tag{21}$$

and $m_i$ is the middle point of interval $u_i$.

IV. EXPERIMENT AND EVALUATION

In order to demonstrate the effectiveness of the proposed forecasting model, we conducted two experiments using real-world data. The first experiment consisted of forecasting temperature with observable cloud density in Taiwan, while the second experiment was conducted to solve the forecasting problem of Taiwan's stock index with the observable exchange rate of the New Taiwan dollar against the U.S. dollar.

A. Experiments Forecasting Temperature With Cloud Density Data

The time span of the data used is from June to September for 1993, 1994, 1995, and 1996. The data were divided into two parts, i.e., that for 1993, 1994, and 1995 was used as the training set (Tables I–VI) to establish the parameter of $\lambda$, and that for 1996 (Tables VII–X) was used as the testing set to evaluate forecasting performance.

TABLE I. Training data of average temperature and cloud density for June and July 1993 in Taipei, Taiwan
TABLE II. Training data of average temperature and cloud density for August and September 1993 in Taipei, Taiwan
TABLE III. Training data of average temperature and cloud density for June and July 1994 in Taipei, Taiwan
TABLE IV. Training data of average temperature and cloud density for August and September 1994 in Taipei, Taiwan
TABLE V. Training data of average temperature and cloud density for June and July 1995 in Taipei, Taiwan
TABLE VI. Training data of average temperature and cloud density for August and September 1995 in Taipei, Taiwan
TABLE VII. Testing data of average temperature and cloud density for June 1996 in Taipei, Taiwan, and forecasted temperature
TABLE VIII. Testing data of average temperature and cloud density for July 1996 in Taipei, Taiwan, and forecasted temperature
TABLE IX. Testing data of average temperature and cloud density for August 1996 in Taipei, Taiwan, and forecasted temperature
TABLE X. Testing data of average temperature and cloud density for September 1996 in Taipei, Taiwan, and forecasted temperature

Following the proposed forecasting model, the first step was to determine and partition the universe of discourse. We partitioned temperature and cloud density into $n$ and $m$ even intervals $u_1, u_2, \ldots, u_n$ and $v_1, v_2, \ldots, v_m$, respectively. For the instance when $n = 6$ and $m = 5$, the temperature intervals were $u_1 = [21, 23)$, $u_2 = [23, 25)$, $u_3 = [25, 27)$, $u_4 = [27, 29)$, $u_5 = [29, 31)$, and $u_6 = [31, 33]$, and the cloud density intervals were $v_1 = [0, 20)$, $v_2 = [20, 40)$, $v_3 = [40, 60)$, $v_4 = [60, 80)$, and $v_5 = [80, 100]$.

Next, the fuzzy sets were defined, and the time series was fuzzified using (5). The linguistic hidden fuzzy sets were defined as $s_1$ = (freezing), $s_2$ = (cold), $s_3$ = (cool), $s_4$ = (mild), $s_5$ = (warm), and $s_6$ = (hot), and the linguistic observable fuzzy sets were $o_1$ = (very low), $o_2$ = (low), $o_3$ = (medium), $o_4$ = (high), $o_5$ = (very high). With the fuzzy sets being properly defined, the raw time series could be fuzzified. For example, the temperature for June 2, 1993, was 25.3 °C. It was located at interval $u_3$, and the membership degree vector was $\mu_{\text{June 2,1993},s} = [0, 0.5, 1, 0.5, 0, 0]$. The maximum membership degree appeared at $u_3$, with the corresponding fuzzy set being $s_3$. On the other hand, the cloud density for June 2, 1993, was 96 and belonged to interval $v_5$, and the membership degree vector was $\mu_{\text{June 2,1993},o} = [0, 0, 0, 0.5, 1]$. The fuzzy set was $o_5$ because the maximum membership degree was at $v_5$. All the fuzzified results of the temperature and cloud density from 1993 to 1996 are shown in Tables I–X.

In the third step, the fuzzy logical relationships existing in the fuzzy time series were modeled by $\lambda = (\pi, A, B)$ by counting the frequency that hidden states occurred, as shown in (9)

$$\pi = [0.3333, 0, 0, 0.6667, 0, 0].$$

The state-transition matrix $A = \{a_{ij}\}$ was a $6 \times 6$ matrix resulting from (10)

$$A = \begin{bmatrix} 0.2500 & 0.2500 & 0.5000 & 0 & 0 & 0 \\ 0.1053 & 0.5263 & 0.3684 & 0 & 0 & 0 \\ 0 & 0.1379 & 0.5345 & 0.3103 & 0.0172 & 0 \\ 0 & 0.0174 & 0.1478 & 0.5217 & 0.3130 & 0 \\ 0 & 0 & 0 & 0.2230 & 0.7095 & 0.0676 \\ 0 & 0 & 0.0526 & 0.1579 & 0.3158 & 0.4737 \end{bmatrix}.$$

The confusion matrix $B = \{b_{ij}\}$ was a $6 \times 5$ matrix and was obtained from (11)

$$B = \begin{bmatrix} 0 & 0 & 0 & 0 & 1.0000 \\ 0.1429 & 0.0952 & 0.0952 & 0.0476 & 0.6190 \\ 0.0690 & 0.0690 & 0.0690 & 0.1724 & 0.6207 \\ 0 & 0.0259 & 0.1034 & 0.4138 & 0.4569 \\ 0.0270 & 0.1014 & 0.3514 & 0.3649 & 0.1554 \\ 0.1053 & 0.1579 & 0.4211 & 0.2105 & 0.1053 \end{bmatrix}.$$

To forecast the temperature of June 1, 1996, on which the cloud density at the same day was $o_5$, we obtained from (14)

$$\begin{aligned} \gamma_{\text{June 1,1996}}(o_{5,\text{June 1,1996}}) &= \pi \odot B(:, [5])^T \\ &= [0.3333, 0, 0, 0.6667, 0, 0] \odot [1, 0.6190, 0.6207, 0.4569, 0.1554, 0.1053] \\ &= [0.3333, 0, 0, 0.304752, 0, 0] \end{aligned}$$

and the normalized vector was

$$N\gamma_{\text{June 1,1996}}(o_{5,\text{June 1,1996}}) = [0.522371, 0, 0, 0.477629, 0, 0].$$
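The June 1, 1996, calculation and the center-of-gravity rule of (20) and (21) can be checked with a short Python sketch. The inputs are the rounded values printed above, so the normalized probabilities agree with the reported 0.522371 and 0.477629 only to about three decimal places; the small difference presumably comes from that rounding. The helper names, the random seed, and the use of `random.Random.choices` for sampling are assumptions made for illustration.

```python
import random

pi = [0.3333, 0, 0, 0.6667, 0, 0]
b_col5 = [1.0, 0.6190, 0.6207, 0.4569, 0.1554, 0.1053]  # B(:, [5])^T as printed

# Element-by-element product of (14), then the normalization of (18).
gamma_1 = [p * b for p, b in zip(pi, b_col5)]
total = sum(gamma_1)
normalized = [g / total for g in gamma_1]


def centers(midpoints):
    """Defuzzified value t_i of each state from the interval midpoints m_i, following (21)."""
    n = len(midpoints)
    t = []
    for i, m in enumerate(midpoints):
        if i == 0:
            t.append((m + 0.5 * midpoints[1]) / 1.5)
        elif i == n - 1:
            t.append((0.5 * midpoints[-2] + m) / 1.5)
        else:
            t.append((0.5 * midpoints[i - 1] + m + 0.5 * midpoints[i + 1]) / 2)
    return t


def defuzzify(counts, t):
    """Center of gravity of (20) for a count vector C."""
    return sum(c * ti for c, ti in zip(counts, t)) / sum(counts)


# Midpoints of u_1..u_6 for the partition [21, 23), ..., [31, 33].
t = centers([22, 24, 26, 28, 30, 32])

# One stochastic experiment with l = 100 draws (the seed is arbitrary).
rng = random.Random(2010)
counts = [0] * 6
for i in rng.choices(range(6), weights=normalized, k=100):
    counts[i] += 1

print([round(p, 4) for p in normalized])
print(counts, round(defuzzify(counts, t), 4))
```

Since only $s_1$ and $s_4$ carry probability mass, every run returns a forecast between $t_1$ and $t_4$, and the exact value depends on the random draws.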

We executed the Monte Carlo simulation 100 ($l = 100$) times on the probability vector $N\gamma_{\text{June 1,1996}}(o_{5,\text{June 1,1996}})$ and obtained 100 stochastic forecasting hidden states, represented by vector $C$

$$C = [53, 0, 0, 47, 0, 0]$$

of which 53 and 47 hidden states belonged to $s_1$ and $s_4$, respectively. Finally, the crisp forecasting output was calculated by carrying out defuzzification, which is defined in (20)

$$\frac{\sum_{i=1}^{n} c_i t_i}{\sum_{i=1}^{n} c_i} = \frac{53 \times 22.6667 + 0 \times 24 + 0 \times 26 + 47 \times 28 + 0 \times 30 + 0 \times 31.3333}{100} = 25.1733.$$

Taking forecasting the temperature for June 4, 1996, as another example, the cloud density on that day was $o_2$, and the temperature state for June 3 was $s_5$. Because the second column of confusion matrix $B$ was $B(:, [2])^T = [0, 0.0952, 0.0690, 0.0259, 0.1014, 0.1579]$ and the fifth row of state-transition matrix $A$ was $A([5], :) = [0, 0, 0, 0.2230, 0.7095, 0.0676]$, the probability vector of hidden states for June 4, 1996, could be calculated from (12)

$$\begin{aligned} \gamma_{\text{June 4,1996}}(s_{5,\text{June 3,1996}}, o_{2,\text{June 4,1996}}) &= B(:, [2])^T \odot A([5], :) \\ &= [0, 0, 0, 0.000598, 0.025279, 0.0045]. \end{aligned}$$

This was further normalized to be

$$N\gamma_{\text{June 4,1996}}(s_{5,\text{June 3,1996}}, o_{2,\text{June 4,1996}}) = [0, 0, 0, 0.019695, 0.832164, 0.148141].$$

It should be noted that the temperature for June 4, 1996, had a greater chance of being $s_5$ (warm), compared to $s_4$ and $s_6$, when the cloud density was high ($o_2$) on that day and the temperature for the previous day was warm ($s_5$).

Next, we repeated the sampling experiments 100 ($l = 100$) times with the Monte Carlo simulation with $N\gamma_{\text{June 4,1996}}(s_{5,\text{June 3,1996}}, o_{2,\text{June 4,1996}})$ and obtained 100 stochastic forecasting hidden states, represented by vector $C$

$$C = [0, 0, 0, 5, 84, 11]$$

of which 5, 84, and 11 hidden states belonged to $s_4$, $s_5$, and $s_6$, respectively. Finally, the crisp forecasting output was calculated by carrying out defuzzification, as defined in (20)

$$\frac{\sum_{i=1}^{n} c_i t_i}{\sum_{i=1}^{n} c_i} = \frac{0 \times 22.6667 + 0 \times 24 + 0 \times 26 + 5 \times 28 + 84 \times 30 + 11 \times 31.3333}{100} = 30.0467.$$

All the forecasting results from June to September 1996 are illustrated in Tables VII–X.

With the forecasting results, we then evaluated the performance of the proposed fuzzy time series model by predicting the temperature and comparing it with previous models. The forecasting accuracy was measured in terms of the mean square error (mse)

$$\text{mse} = \frac{\sum_{i=1}^{N} (\text{Forecasting\_Value}_i - \text{Actual\_Value}_i)^2}{N}.$$

The performance comparison was further conducted using the average forecasting error percentage (AFEP), which is defined as follows:

$$\text{AFEP} = \frac{1}{N} \sum_{i=1}^{N} \frac{|\text{Forecasting\_Value}_i - \text{Actual\_Value}_i|}{\text{Actual\_Value}_i} \times 100\%.$$

In order to test the superiority of the proposed model, we compared its performance with that of Lee's model [19], which is reported as having the best performance in the current literature. To make the comparison fair, the same intervals and factors were used as that in Lee's model. For Lee's model, the previous temperature state $[F(t-1)]$ and the previous cloud density state $[G(t-1)]$ are the main and second factors, respectively. In the proposed stochastic HMM-based model, $F(t-1)$ and $G(t)$ are considered as the hidden and observable states, respectively. The current temperature state $[F(t)]$ is the forecasting target of both models. Tables XI and XII illustrate the results of Lee's model with orders one to eight and that of our model using 100 Monte Carlo simulations, in terms of mse and AFEP based on two different interval partitions, namely, $n = 6$ and $m = 5$, and $n = 12$ and $m = 10$. The results show that the proposed model outperformed Lee's in all cases and that it was also better than the classic HMM, which uses maximum probability to decide the forecasting result (denoted as "HMM-based" in the table). This demonstrates that our approach is not only better at reflecting real-life situations but also achieves superior performance results.

In order to investigate the sensitiveness of the forecasting result to the number of iterations of the Monte Carlo simulation, we tested several simulation runs, from 30 to 300 times, with two different interval partitioning sets. The outcomes shown in Tables XIII and XIV confirm that our model achieves better results, compared to Lee's model, in all numbers of simulation runs. In addition, our results adhere to the central limit theorem, and thus, the sampling distribution will have the same mean as that of the population for large sample sizes ($N \ge 30$), confirming that the proposed model is indeed more accurate than others in the literature.

TABLE XI. Performance comparison between the proposed model and Lee's model (n = 6, m = 5)
TABLE XII. Performance comparison between the proposed model and Lee's model (n = 12, m = 10)
TABLE XIII. Impact of the number of iterations of Monte Carlo simulation on forecasting performance (n = 6, m = 5)
TABLE XIV. Comparison of the forecasting accuracy of the proposed model with various Monte Carlo simulation times (n = 12, m = 10)

B. Experiments Forecasting the Taiwan Weighted Stock Index With Exchange Rate Data

Fig. 1. Trend of the Taiwan Weighted Stock Index for 2005, 2006, and 2007.
Fig. 2. Trend of the exchange rate of the New Taiwan dollar against the U.S. dollar for 2005, 2006, and 2007.
TABLE XV. Performance comparison of the Taiwan Weighted Stock Index (n = 6, m = 6)

In the second experiment, we applied the proposed model to forecast the Taiwan Weighted Stock Index (hidden state) using data from the exchange rate of the New Taiwan dollar against the U.S. dollar (observable state) collected from 2005 to 2007, as shown in Figs. 1 and 2. The data set was divided into the training set covering each January to October from 2005 to 2007 to build up the stochastic HMM model, with the test set including each November to December from 2005 to 2007 for forecasting. Table XV illustrates the yearly comparison of the proposed stochastic HMM model with Lee's and traditional HMM models, in which six intervals ($n = 6$, $m = 6$) were used for both factors and all models. It shows that the proposed model obtained better performance in terms of both mse and AFEP. It can be seen that the forecasting results of all the models were comparatively worse in 2007, and this is because the yearly variation (range = 2465.32) was much larger than those in 2005 (942.56) and 2006 (1569.92). Fig. 3 shows the scatter diagrams of the predicted and actual indexes using the proposed stochastic HMM and traditional HMM models, showing a strong positive relationship between the two forecasting indexes. However, our model demonstrated a stronger

Fig. 3. Correlation among predicted and actual stock indexes. (a) November to December of 2005. (b) November to December of 2006. (c) November to
December of 2007.

relationship with the correlation coefficients of r = 0.968 (p < 0.01), r = 0.971 (p < 0.01), and r = 0.939 (p < 0.01) for the years 2005, 2006, and 2007, respectively, as compared to the traditional HMM model, with correlation coefficients of r = 0.962 (p < 0.01), r = 0.959 (p < 0.01), and r = 0.939 (p < 0.01). Table XVI shows the forecasting performance for different numbers of iterations of the Monte Carlo simulation, from 30 to 300, all of which confirmed the superiority of the proposed model over Lee's and traditional HMM models.

TABLE XVI
IMPACT OF THE NUMBER OF ITERATIONS OF MONTE CARLO SIMULATION ON FORECASTING PERFORMANCE (n = 6, m = 6) FOR THE TAIWAN WEIGHTED STOCK INDEX

V. CONCLUSION AND DIRECTIONS FOR FUTURE WORK

In this paper, we have proposed a novel stochastic forecasting model for fuzzy time series by extending Sullivan and Woodall's Markov-based model. It was built upon an HMM in which the fuzzy relationships were formulated as state transitions so that it can handle two-factor forecasting problems. In order to allow the model to more effectively reflect the real-world situation and randomness of forecasting, Monte Carlo simulation was applied to estimate the stochastic outcome. The computations involved in forecasting were simple matrix operations, and thus, our model is more efficient than other IF-THEN-based models. We used the forecasting problem of daily average temperature and cloud density in Taipei, Taiwan, as the benchmark and conducted performance comparisons with other models. The results demonstrated the superiority of our model in forecasting. Moreover, the proposed probabilistic forecasting model adheres to the central limit theorem, as supported by the sensitivity experiment, and thus, the forecasting results statistically approximate the real mean of the target value being forecast. However, there are some limitations with our model. First, it is developed based on HMM, in which the observed and hidden states are probabilistically related, which, of course, does not hold for all problems. The other shortcoming is that when more zeros occur in state-transition matrix A and confusion matrix B, because fewer fuzzy relationships exist in the historical temporal data, this tends to lead to the hidden state being forecast as a zero probability vector. Dealing with these limitations is of crucial importance for future work. Other interesting directions include enhancing the proposed model so that it is high order and/or time variant to improve its forecasting power.

ACKNOWLEDGMENT

The authors would like to thank J.-Y. Li for his help with the experimentation.

REFERENCES

[1] B. Möller and U. Reuter, Uncertainty Forecasting in Engineering. Berlin, Germany: Springer-Verlag, 2007.
[2] B. Möller and U. Reuter, "Prediction of uncertain structural responses using fuzzy time series," Comput. Struct., vol. 86, no. 10, pp. 1123–1139, May 2008.
[3] D. Hareter, "Time series analysis with non-precise data - Part I," in Proc. 9th Spec. Conf. Probabilistic Mech. Struct. Reliability, 2004. [Online]. Available: http://www.cfd.sandia.gov.PMCpostconf.html
[4] D. Hareter, "Time series analysis with non-precise data - Part II," in Proc. 9th Spec. Conf. Probabilistic Mech. Struct. Reliability, 2004. [Online]. Available: http://www.cfd.sandia.gov.PMCpostconf.html
[5] Q. Song and B. S. Chissom, "Forecasting enrollments with fuzzy time series - Part I," Fuzzy Sets Syst., vol. 54, no. 1, pp. 1–9, Feb. 1993.
[6] S. F. Bocklisch and M. Pässler, "Fuzzy time series analysis," in Advances in Soft Computing - Fuzzy Control, R. Hampel, M. Wagenknecht, and N. Chaker, Eds. Berlin, Germany: Physica-Verlag, 2000, pp. 331–345.
[7] Q. Song and B. S. Chissom, "Fuzzy time series and its models," Fuzzy Sets Syst., vol. 54, no. 3, pp. 269–277, Mar. 1993.
[8] Q. Song and B. S. Chissom, "Forecasting enrollments with fuzzy time series - Part II," Fuzzy Sets Syst., vol. 62, no. 1, pp. 1–8, Feb. 1994.
[9] S.-M. Chen, "Forecasting enrollments based on fuzzy time series," Fuzzy Sets Syst., vol. 81, no. 3, pp. 311–319, Aug. 1996.
[10] S.-M. Chen and C.-C. Hsu, "A new method to forecast enrollments using fuzzy time series," Int. J. Appl. Sci. Eng., vol. 2, no. 3, pp. 234–244, 2004.
[11] Y.-Y. Hsu, S.-M. Tse, and B. Wu, "A new approach of bivariate fuzzy time series analysis to the forecasting of a stock index," Int. J. Uncertain. Fuzziness Knowl.-Based Syst., vol. 11, no. 6, pp. 671–690, 2003.
[12] K. Huarng, "Heuristic models of fuzzy time series for forecasting," Fuzzy Sets Syst., vol. 123, no. 3, pp. 369–386, Nov. 2001.
[13] J. Sullivan and W. H. Woodall, "A comparison of fuzzy forecasting and Markov modeling," Fuzzy Sets Syst., vol. 64, no. 3, pp. 279–293, Jun. 1994.
[14] R.-C. Tsaur, J.-C. Yang, and H.-F. Wang, "Fuzzy relation analysis in fuzzy time series model," Comput. Math. Appl., vol. 49, no. 4, pp. 539–548, Feb. 2005.
[15] K.-H. Huarng, T. H.-K. Yu, and Y. W. Hsu, "A multivariate heuristic model for fuzzy time-series forecasting," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 37, no. 4, pp. 836–846, Aug. 2007.
[16] S.-M. Chen, "Forecasting enrollments based on high-order fuzzy time series," Cybern. Syst., Int. J., vol. 33, no. 1, pp. 1–16, Jan. 2002.
[17] S.-M. Chen and J.-R. Hwang, "Temperature prediction using fuzzy time series," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 30, no. 2, pp. 263–275, Apr. 2000.
[18] J.-R. Hwang, S.-M. Chen, and C.-H. Lee, "Handling forecasting problems using fuzzy time series," Fuzzy Sets Syst., vol. 100, no. 1–3, pp. 217–228, Nov. 1998.
[19] L.-W. Lee, L.-H. Wang, S.-M. Chen, and Y.-H. Leu, "Handling forecasting problems based on two-factors high-order fuzzy time series," IEEE Trans. Fuzzy Syst., vol. 14, no. 3, pp. 468–477, Jun. 2006.
[20] L.-W. Lee, L.-H. Wang, and S.-M. Chen, "Temperature prediction and TAIFEX forecasting based on fuzzy logical relationships and genetic algorithms," Expert Syst. Appl., vol. 33, no. 3, pp. 539–550, Oct. 2007.
[21] S.-T. Li and Y.-C. Cheng, "Deterministic fuzzy time series model for forecasting enrollments," Comput. Math. Appl., vol. 53, no. 12, pp. 1904–1920, Jun. 2007.
[22] S.-T. Li and Y.-C. Cheng, "An enhanced deterministic fuzzy time series forecasting model," Cybern. Syst., Int. J., vol. 40, no. 3, pp. 211–235, Apr. 2009.
[23] H.-K. Yu, "A refined fuzzy time-series model for forecasting," Phys. A, vol. 346, no. 3/4, pp. 657–681, Feb. 2005.
[24] T. A. Jilani, S. M. Aqil Burney, and C. Ardil, "Multivariate high order fuzzy time series forecasting for car road accidents," in Proc. World Acad. Sci. Eng. Technol., Jan. 2007, vol. 21, pp. 288–293.
[25] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257–286, Feb. 1989.
[26] Y. Li, "Hidden Markov models with states depending on observations," Pattern Recognit. Lett., vol. 26, no. 7, pp. 977–984, May 2005.
[27] G. D. Forney, "The Viterbi algorithm," Proc. IEEE, vol. 61, no. 3, pp. 268–278, Mar. 1973.
[28] A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimal decoding algorithm," IEEE Trans. Inf. Theory, vol. IT-13, no. 2, pp. 260–269, Apr. 1967.
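
As a supplementary illustration of the Monte Carlo sensitivity analysis in Section IV and of the central-limit-theorem remark in Section V, the following is a minimal sketch and not the authors' implementation. It assumes that a forecast is available as a probability vector over hidden-state intervals, that each simulation run draws one interval midpoint according to that vector, and that the stochastic forecast is the average of the draws. The probability vector, the midpoints, and the function names `stochastic_forecast` and `max_probability_forecast` are hypothetical and chosen only for illustration.

```python
import random
import statistics


def stochastic_forecast(probs, midpoints, n_runs, rng):
    """Mean of n_runs interval midpoints drawn with probabilities probs."""
    draws = rng.choices(midpoints, weights=probs, k=n_runs)
    return statistics.fmean(draws)


def max_probability_forecast(probs, midpoints):
    """Midpoint of the single most probable interval."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return midpoints[best]


# Hypothetical values for illustration only; they are not taken from the paper.
probs = [0.05, 0.10, 0.40, 0.30, 0.10, 0.05]
midpoints = [26.5, 27.5, 28.5, 29.5, 30.5, 31.5]

# Population mean of the assumed forecast distribution.
expected = sum(p * m for p, m in zip(probs, midpoints))

rng = random.Random(0)  # fixed seed so the experiment is repeatable
centre, spread = {}, {}
for n_runs in (30, 100, 300):
    # Repeat the whole simulation 200 times to see how the estimate varies.
    estimates = [stochastic_forecast(probs, midpoints, n_runs, rng)
                 for _ in range(200)]
    centre[n_runs] = statistics.fmean(estimates)
    spread[n_runs] = statistics.stdev(estimates)

print("expected value:", round(expected, 2))
print("max-probability forecast:", max_probability_forecast(probs, midpoints))
for n_runs in centre:
    print(n_runs, round(centre[n_runs], 3), round(spread[n_runs], 3))
```

Under these assumptions, the averaged estimate stays centred on the mean of the forecast distribution for every number of runs, and its spread shrinks as the number of runs grows from 30 to 300. The max-probability rule instead returns a single interval midpoint regardless of how the remaining probability mass is distributed.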

Sheng-Tun Li (M'94) received the B.S. and M.S. degrees in computer engineering from Tamkang University, Tamsui, Taipei County, Taiwan, and the Ph.D. degree in computer science from the University of Houston, University Park, TX.
He is currently a Professor with the Institute of Information Management and the Department of Industrial and Information Management, National Cheng Kung University, Tainan, Taiwan. He is an author or coauthor of five books, over 50 journal articles, and numerous conference papers. He is a holder of one patent. His research interests include knowledge engineering, data mining, knowledge management system, and soft computing.
Dr. Li is a Member of IEEE.

Yi-Chung Cheng received the B.S. degree in business mathematics from Soochow University, Taipei, Taiwan, the M.S. degree in statistics from National Central University, Jhongli City, Taiwan, and the Ph.D. degree in industrial and information management from National Cheng Kung University, Tainan, Taiwan, in 1985, 1989, and 2008, respectively.
She is currently an Associate Professor with the Department of International Business Management, Tainan University of Technology, Yongkang, Tainan, Taiwan. Her research interests include time series forecasting and data mining.
