Abstract—In spatiotemporal data commonly encountered in geographical systems, biomedical signals, and the like, each datum is composed of features comprising a spatial component and a temporal part. Clustering of data of this nature poses challenges, especially in terms of a suitable treatment of the spatial and temporal components of the data. In this study, proceeding with the objective function-based clustering (such as, e.g., fuzzy C-means), we revisit and augment the algorithm to make it applicable to spatiotemporal data. An augmented distance function is discussed, and the resulting clustering algorithm is provided. Two optimization criteria, i.e., a reconstruction error and a prediction error, are introduced and used as a vehicle to optimize the performance of the clustering method. Experimental results obtained for synthetic and real-world data are reported.

Index Terms—Fuzzy clustering, reconstruction and prediction criteria, spatiotemporal data, weather data.

Manuscript received January 3, 2012; revised May 23, 2012 and September 8, 2012; accepted November 6, 2012. Date of publication December 11, 2012; date of current version October 2, 2013. This work was supported in part by the Alberta Innovates—Technology Futures and Alberta Advanced Education and Technology, the Natural Sciences and Engineering Research Council of Canada, and the Canada Research Chair Program.
H. Izakian is with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada, T6G 2V4 (e-mail: izakian@ualberta.ca).
W. Pedrycz is with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada, T6G 2V4, with the Department of Electrical and Computer Engineering, Faculty of Engineering, King Abdulaziz University, Jeddah 21589, Kingdom of Saudi Arabia, and with the System Research Institute, Polish Academy of Sciences, Warsaw 00-716, Poland (e-mail: wpedrycz@ualberta.ca).
I. Jamal is with AQL Management Consulting Inc., Edmonton, AB, Canada, T6J 2R8 (e-mail: iqbaljamal@aqlmc.com).
Digital Object Identifier 10.1109/TFUZZ.2012.2233479

I. INTRODUCTION

GIVEN the unprecedented growth of spatiotemporal data encountered in different application domains such as, e.g., geography, climatology, and health surveillance systems, their analysis has become more important and practically relevant. In spatiotemporal data, each data point is composed of two parts, namely, a spatial component, typically denoting its location (say, x − y or latitude–longitude coordinates), and a temporal part, comprising one or more time series associated with the spatial coordinates. Daily average temperature recorded at different weather stations, number of disease cases reported in different cities in a monthly period, and hourly air pollution recordings are examples of this kind of data.

Clustering of spatiotemporal data reveals interesting structures that could be used in different applications. The fuzzy C-means (FCM) algorithm [9] is one of the commonly used clustering techniques and is inherently associated with some underlying objective function. To cope with the specificity of the spatiotemporal data, the generic objective function of the FCM requires a thorough examination and revision of its formulation. In this paper, we introduce a concept and offer the ensuing algorithmic developments by using the generic FCM algorithm (although the main line of thought is equally valid for any objective function-based clustering). The crux of the method is to effectively handle the data reflecting the spatial and temporal facets of the problem in order to preserve its essence. For this purpose, we revisit the distance function and augment the “standard” Euclidean distance. Equally important is the fact that the augmented distance is endowed with a substantial level of flexibility so that the contributions coming from the temporal and spatial parts of the data could be carefully balanced and optimized. The resulting flexibility is exploited to minimize one of two performance indexes, namely, a reconstruction error or a prediction error. Dealing with the reconstruction error is essential when assessing the quality of clusters (information granules) and quantifying the role they play in the processes of information granulation and de-granulation. The prediction aspects are of interest when forecasting a temporal component of the data given their specific location (spatial information).

Interestingly enough, the objective function of the FCM algorithm has been subject to various modifications in order to cope with the specificity of the problem. In [19], by adding a gain field, the FCM objective function has been reformulated and optimized in an iterative fashion for segmentation and classification of M-FISH images to detect chromosomal abnormalities and support a genetic disease diagnosis. In [62], a fuzzy clustering approach for data points comprising various object types was proposed by reformulating the FCM objective function and solving a constrained optimization problem. A membership matrix and a ranking matrix have been employed in the optimization procedure, where the membership matrix comprises membership degrees of objects to clusters, while the ranking matrix measures how representative an object is in comparison with other objects in various clusters. In [61], a general definition of distance functions that preserve the applicability of the centroid-based alternating optimization in FCM is provided. The authors showed that any distance function that can be used in the FCM algorithm is an instance of the generalized point-to-centroid distance and can be derived from a differentiable convex function. In addition, in [49], some methods and guidelines to design collaborative fuzzy clustering algorithms for clustering data distributed among different data sites were developed.

This study is organized as follows. We start with a brief review of the research being reported so far. The two
fundamental concepts being of essential relevance in the context of the study, that is, a representation of time series and quantifying distance between time series are discussed. In Section III, we introduce spatiotemporal clustering and formulate the ensuing optimization problem. In Section IV, two performance indexes (evaluation criteria) casting the clustering results in the setting of reconstruction and prediction problems are investigated. In Sections V and VI, experimental results dealing with synthetic data and real-world problems are reported. Conclusions are covered in Section VII.

II. CLUSTERING SPATIOTEMPORAL DATA—A FOCUSED LITERATURE REVIEW

In real-world applications, we encounter different kinds of spatiotemporal data. Kisilevich et al. [51] divided spatiotemporal data into five categories including spatiotemporal events, georeferenced variables, georeferenced time series, moving objects, and trajectories.

In spatiotemporal event data, there is a set of events, each occurring in a spatial location and coming with its timestamp. Clustering this type of data aims to find a set of events that are close to each other in both space and time. One of the commonly used methods for clustering these types of data is scan statistics [52], [53]. In this method, one moves a cylindrical window of variable size and shape across a geographical region to detect clusters of events with the highest likelihood ratios. In [54], an extended version of FCM has been proposed to find circular clusters of hotspots in spatiotemporal geographical information system data. For each timestamp, the events are clustered based on their spatial location, and then the clusters obtained in consecutive timestamps are compared in order to interpret the events. Wang et al. [55] proposed two spatiotemporal clustering methods, which are called ST-GRID and ST-DBSCAN, to detect seismic events in China and neighboring countries. The ST-GRID method used a multidimensional grid that covers the entire spatiotemporal feature space. Then, by merging the dense neighbor cells, spatiotemporal clusters were formed. ST-DBSCAN extended DBSCAN [56] by redefining density reachability using spatial and temporal radius. Both methods exploited an ordered k-dist graph [56] to determine their parameters.

Georeferenced time series are composed of a set of fixed geographical coordinates, each corresponding to one or more time series. Georeferenced variables data form a special case of georeferenced time series where only the most recent point of time series is available. Clustering this type of data aims to group objects based on their spatial closeness and temporal similarities. In [57], FCM has been used to cluster weather time series. The Pearson correlation coefficient was employed as the similarity measure expressing closeness of two time series and a method to determine the number of clusters has been proposed. However, the method does not involve the spatial part of data in the clustering process. Deng et al. [58] proposed a density-based spatiotemporal clustering. In this method, a spatial proximate network has been constructed using Delaunay triangulation and a spatiotemporal autocorrelation analysis was employed to define the spatiotemporal neighborhood. In [44], an extended version of FCM was proposed for image segmentation by considering the spatial location of pixels. This method has been considered by Coppi et al. [59] for clustering spatiotemporal data. In this approach, a spatial penalty term that was calculated using a spatial contiguity matrix has been added to the objective function to guarantee an approximate spatial homogeneity of the clusters.

Trajectories capture the movement behavior of a set of spatial objects in the form of time series. When only the most recent position of the objects is available, the data are called moving objects data. Clustering of this kind of data aims to discover a behavior of a collection of objects, e.g., those occurring in urban traffic or animals’ migration. In [15], the Euclidean distance between trajectories was used as a dissimilarity measure, whereas OPTICS [14] has been extended to cluster trajectories. Two methods, trajectory-OPTICS and a time-focused version of that (called TF-OPTICS) were proposed. In [13], a probabilistic regression model for trajectory detection was proposed and the expectation-maximization (EM) algorithm [12] has been employed to model trajectories. Kalnis et al. [47] proposed algorithms to discover moving clusters in spatiotemporal data. In these methods, the set of objects of a moving cluster changes over time. At each time step, the location of objects has been considered as a snapshot and a spatial clustering method like DBSCAN was used for clustering. Two snapshot clusters in consecutive time steps were considered as moving clusters if the value of their Jaccard coefficient exceeds a certain threshold. A fuzzy clustering for three-way data was proposed in [40]. In this structure, each data point was composed of objects, attributes, and situations. The data are clustered based on not only individual time instances, but in addition, the similarity between structures has been considered in different time steps. A survey of clustering spatiotemporal data is reported in [51].

A. Time-Series Representation Methods

Time series have been investigated in a variety of problems of data mining such as clustering [36], [39], classification [8], [45], [46], forecasting [42], [43], [60], and modeling [38], [41]. Based on the type of data being used, the methods of time-series clustering can be split into three categories [16], [27], namely those using raw time-series data [32]–[34], model-based methods [24], [35], [37], and representation-based methods [16], [28]–[30].

There are a number of methods proposed in the literature to represent time series. In general, such representation methods are categorized into data-adaptive and non-data-adaptive methods [17], [18], [20]. Adaptive piecewise constant approximation [18], piecewise linear approximation [22], singular value decomposition [6], and symbolic aggregate approximation [17] are examples of data-adaptive methods. Discrete Fourier transform (DFT) [1], Chebyshev polynomials [21], discrete wavelet transform (DWT) [3], [4], and piecewise aggregate approximation (PAA) [2] are well-known methods belonging to the second category.
IZAKIAN et al.: CLUSTERING SPATIOTEMPORAL DATA: AN AUGMENTED FUZZY C-MEANS 857
In this paper, we use three commonly studied methods to represent time series, namely, DFT, PAA, and DWT. They can be viewed as sound representatives of the large set of the methods existing in the literature. In what follows, we review them very briefly.

1) Discrete Fourier Transform: The DFT models the time series using a set of sine and cosine waves. It represents the time series in a frequency domain. For a time series $y$ of length $N$, DFT is composed of $N$ complex numbers, each describing a sine/cosine wave given by

$$f_k = \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} y_i \exp(-j 2\pi k i / N), \quad k = 0, 1, \ldots, N-1 \tag{1}$$

where $j = \sqrt{-1}$. The original time series can be reconstructed by running an inverse transform given by

$$y_i = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} f_k \exp(j 2\pi k i / N), \quad i = 0, 1, \ldots, N-1. \tag{2}$$

Faloutsos et al. [1] employed DFT to index time series. They noted that the most important features of each sequence are the first $k$ (real and imaginary) coefficients ($k \ll N$) of the DFT transform, while the other coefficients assume values close to zero. By having these $k$ coefficients, the original time series can be reconstructed with a little loss of information.

2) Piecewise Aggregate Approximation: This method provides a simple and efficient way of time-series representation in time domain offering a substantial dimensionality reduction [2]. PAA divides the time series $y$ into $k$ ($k \ll N$) segments of equal length and determines the mean value of data points lying within each segment as the representatives of the original time series. More formally, we have the representation in the form of a vector $f$ whose coordinates are expressed as follows:

$$f_i = \frac{k}{N} \sum_{j=\frac{N}{k} i}^{\frac{N}{k}(i+1)-1} y_j, \quad i = 0, 1, \ldots, k-1. \tag{3}$$

3) Discrete Wavelet Transform: Wavelets are basis functions that describe time series in a time–frequency joint representation. In [3] and [4], DWT is used as an efficient representation method to index time-series data. A suitable method to calculate the DWT coefficients is a pyramid algorithm [5]. In this method, the length of time series $N$ has to be a power of two. For time series that do not satisfy this condition, zero padding is realized. DWT converts the time series into two types of coefficients resulting from low-pass filters (also called scaling function) and high-pass filters (also called wavelet function), each with length $N/2$, given by

$$a_i = \frac{1}{2} \sum_{j=0}^{N-1} c_{2i-j+1}\, y_j, \quad i = 0, 1, \ldots, \frac{N}{2}-1 \tag{4}$$

$$f_i = \frac{1}{2} \sum_{j=0}^{N-1} (-1)^{j} c_{j-2i}\, y_j, \quad i = 0, 1, \ldots, \frac{N}{2}-1 \tag{5}$$

where $a = [a_0, a_1, \ldots, a_{N/2-1}]^T$ are scaling coefficients, and $f = [f_0, f_1, \ldots, f_{N/2-1}]^T$ are wavelet coefficients present at the first level. To calculate the wavelet coefficients at the next level, the aforementioned calculations are performed over the scaling coefficients $a$. The procedure is repeated recursively until the required number of iterations has been reached. For each wavelet function, there are a number of nonzero coefficients. For example, for the Haar function, the nonzero coefficients are $c_0 = c_1 = 1$, so that (4) and (5) reduce to the half-sum and half-difference of consecutive pairs $y_{2i}$, $y_{2i+1}$.

One has to stress that the representation method of time series is problem-dependent. For example, one may be interested in analyzing time series based on their frequency characteristics (using DFT), time characteristics (where PAA could be of interest), or time–frequency joint characteristics (DWT). In this paper, we used these three representation methods in clustering time-series data.

B. Distance Functions

Distance functions (distances, for brevity) used in time series can be divided into three general categories: $L_p$-norm distances, elastic measures, and statistical measures. Euclidean distance $L_2$ has been widely used as a dissimilarity measure [20] and is suitable to compare equal-length time series. Dynamic time warping distance [7] is an elastic measure used to determine an optimal match between two time series by stretching or compressing their segments, and concentrates on the similarity of time series with respect to their shapes. Longest common subsequence [25] is another example of the elastic-based distance measures. This method uses the length of the longest subsequence occurring in two time series to quantify their similarity. In addition, an edit distance of real-number sequences [48], which is another elastic-based distance measure, considers the number of insert, delete, and replace operations that are required to convert one sequence to another to express the similarity. Pearson coefficient is a statistics-based method that is used to quantify the correlation between two time series. The Kullback–Leibler distance [24] is another statistical measure useful in expressing the dissimilarity between two time series represented by their Markov chain. A comparison between a number of representation methods and similarity measures used for various types of time series was reported in [20] in the problem of indexing time series. The suitability of each similarity measure is application-oriented. Nevertheless, the Euclidean distance is in common usage.

III. CONCEPT OF CLUSTERING OF SPATIOTEMPORAL DATA

In clustering spatiotemporal data, we assume that there are $n$ data $x_1, x_2, \ldots, x_n$, each comprising its spatial and temporal components. The $i$th data $x_i$ is represented as a concatenation of its spatial and temporal parts, namely, $x_i = [x_i(s) \,|\, x_i(t)]^T$, where $x_i(s)$ is the spatial part of $x_i$, while $x_i(t)$ denotes the temporal part (or its representation) of the same data point. By considering $r$ features in the spatial part and $q$ features in the temporal one, we have

$$x_i = [x_i(s) \,|\, x_i(t)]^T = [x_{i1}(s), \ldots, x_{ir}(s) \,|\, x_{i1}(t), \ldots, x_{iq}(t)]^T. \tag{6}$$
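As a rough illustration of the representations reviewed in Section II-A and of the concatenation in (6), the following Python sketch builds a feature vector from a pair of spatial coordinates and a time series. The function names (`paa`, `dft_features`, `haar_level`, `spatiotemporal_vector`) are hypothetical, the PAA routine assumes that N is divisible by k, and storing the first k complex DFT coefficients as 2k real numbers is only one possible reading of the coefficient count. None of this is taken from the authors' implementation.

```python
import numpy as np

def paa(y, k):
    """PAA as in (3): mean of k equal-length segments (assumes N % k == 0)."""
    y = np.asarray(y, dtype=float)
    if len(y) % k != 0:
        raise ValueError("this sketch assumes N is divisible by k")
    return y.reshape(k, -1).mean(axis=1)

def dft_features(y, k):
    """First k coefficients of the unitary DFT in (1), as real and imaginary parts."""
    f = np.fft.fft(np.asarray(y, dtype=float), norm="ortho")[:k]
    return np.concatenate([f.real, f.imag])

def haar_level(y):
    """One level of the Haar DWT, i.e., (4) and (5) with c0 = c1 = 1."""
    y = np.asarray(y, dtype=float)
    a = 0.5 * (y[0::2] + y[1::2])  # scaling coefficients
    f = 0.5 * (y[0::2] - y[1::2])  # wavelet coefficients
    return a, f

def spatiotemporal_vector(xy, y, k, represent=paa):
    """Concatenate the spatial part and a temporal representation, as in (6)."""
    return np.concatenate([np.asarray(xy, dtype=float), represent(y, k)])
```

For a station at hypothetical coordinates `(53.5, -113.5)` with a series `y` of length 256, `spatiotemporal_vector((53.5, -113.5), y, 32)` would give r = 2 spatial features followed by q = 32 PAA features.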
858 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 21, NO. 5, OCTOBER 2013
$$J = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m}\, d^{2}(v_i, x_k) \tag{7}$$
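A minimal sketch of how the objective function (7) could be evaluated is given below. It assumes an augmented distance of the form $d_\lambda^2(v_i, x_k) = \|v_i(s) - x_k(s)\|^2 + \lambda \|v_i(t) - x_k(t)\|^2$, which is the single-series case (M = 1) of the additive distance (24) stated in the conclusions. The membership update shown is the standard FCM formula applied to that distance. All function names and the `eps` guard against zero distances are hypothetical choices made for the illustration.

```python
import numpy as np

def augmented_sq_distance(V_s, V_t, X_s, X_t, lam):
    """Assumed d_lambda^2 for every prototype i and datum k; result has shape (c, n)."""
    d_s = ((V_s[:, None, :] - X_s[None, :, :]) ** 2).sum(axis=2)  # spatial part
    d_t = ((V_t[:, None, :] - X_t[None, :, :]) ** 2).sum(axis=2)  # temporal part
    return d_s + lam * d_t

def fcm_memberships(D2, m=2.0, eps=1e-12):
    """Standard FCM update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))."""
    D2 = np.maximum(D2, eps)  # avoid division by zero when a datum sits on a prototype
    ratio = (D2[:, None, :] / D2[None, :, :]) ** (1.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)

def fcm_objective(U, D2, m=2.0):
    """J in (7): sum over i and k of u_ik^m * d^2(v_i, x_k)."""
    return float(((U ** m) * D2).sum())
```

With lam = 0 the temporal part is ignored and the sketch reduces to FCM on the spatial coordinates alone, while larger values of lam give the temporal part more influence.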
and

$$\|x_k(t) - \hat{x}_k(t)\|^2 = \frac{1}{q} \sum_{j=1}^{q} \frac{(x_{kj}(t) - \hat{x}_{kj}(t))^2}{\sigma_j^2}$$

and $\sigma_j^2$ is the variance of the $j$th feature. Given that the spatial part and the temporal part are commonly expressed in spaces of very different dimensionalities (typically $r \ll q$), we use the normalized Euclidean distance for both parts in order to avoid any bias toward any particular component of the distance. The reconstruction error $E(\lambda)$ is a function of $\lambda$ and its minimum is determined by a systematic sweeping through a certain range of the values of $\lambda$. This approach, instead of any more sophisticated 1-D search, is considered because learning about the form of this index as a function of $\lambda$ is also of interest.
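The normalized Euclidean distance and the systematic sweep over λ described above can be sketched as follows. The helper names are hypothetical, and `error_fn` stands for whichever criterion is being minimized (it would typically run the clustering for the given λ and return the resulting error). The grid of λ values is likewise an assumption of this sketch.

```python
import numpy as np

def normalized_sq_distance(X, X_hat, var):
    """Row-wise (1/q) * sum_j (x_kj - xhat_kj)^2 / sigma_j^2."""
    X = np.asarray(X, dtype=float)
    X_hat = np.asarray(X_hat, dtype=float)
    return (((X - X_hat) ** 2) / var).mean(axis=1)

def sweep_lambda(error_fn, grid):
    """Evaluate error_fn for every lambda in grid; return the minimizer and the full curve."""
    errors = np.array([error_fn(lam) for lam in grid])
    best = int(np.argmin(errors))
    return grid[best], errors
```

Returning the whole error curve, and not only its minimizer, reflects the remark that the shape of the index as a function of λ is of interest in its own right.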
B. Prediction Criterion

The essence of the PC is to “predict” the temporal component of the data by using the available spatial structure. Since each data point is composed of the spatial and the temporal parts, the cluster centers (prototypes) are composed of the spatial part $v(s)$ and the temporal part $v(t)$ as well. Using the spatial part of data along with the spatial part of the calculated cluster centers, we form a new partition matrix, which is denoted by $\tilde{U}$, as follows [11]:

$$\tilde{u}_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \frac{\|v_i(s) - x_k(s)\|}{\|v_j(s) - x_k(s)\|} \right)^{2/(m-1)}}. \tag{17}$$

With the use of this new partition matrix and the temporal part of the cluster centers $v(t)$, we minimize the following sum of distances:

$$F = \sum_{i=1}^{c} \sum_{k=1}^{n} \tilde{u}_{ik}^{m} \|v_i(t) - \hat{x}_k(t)\|^2 \tag{18}$$

where $\hat{x}_k(t)$ is the predicted temporal part of the $k$th data. By zeroing the gradient of $F$ with respect to $\hat{x}_k(t)$, we have

$$\hat{x}_k(t) = \frac{\sum_{i=1}^{c} \tilde{u}_{ik}^{m} v_i(t)}{\sum_{i=1}^{c} \tilde{u}_{ik}^{m}}. \tag{19}$$

The quality of prediction is evaluated using the following prediction error:

$$E(\lambda) = \sum_{k=1}^{n} \|x_k(t) - \hat{x}_k(t)\|^2 = \sum_{k=1}^{n} \frac{1}{q} \sum_{j=1}^{q} \frac{(x_{kj}(t) - \hat{x}_{kj}(t))^2}{\sigma_j^2}. \tag{20}$$

It takes on a form of the sum of the normalized Euclidean distances between the temporal part of the data and the predicted temporal part. As in the previous criterion, the intent is to minimize $E(\lambda)$ by adjusting the value of $\lambda$. Algorithm 1 shows the pseudocode of the proposed algorithm.

V. EXPERIMENTAL STUDIES: USE OF SYNTHETIC DATA

In this section, we investigate the behavior of the clustering results quantified in terms of the criteria of reconstruction and prediction for two synthetic datasets. Fig. 2(a) shows the spatial component of these datasets where P1, P2, P3, and P4 are groups associated with four categories of time series, each 256 samples long. We considered two scenarios. In the first one, Fig. 2(b), the time series are clearly distinguishable, while those shown in Fig. 2(c) exhibit a significant level of overlap (less distinguishable data). The generated time series in these figures are a kind of increasing and decreasing time series encountered in control chart patterns [50].

Fig. 2. Synthetic spatiotemporal data. (a) Spatial component, (b) temporal component of more distinguishable dataset, and (c) temporal component of less distinguishable dataset.

In Fig. 3, we presented one of the time series along with its corresponding representations, namely DFT(32), PAA(32), and DWT(32). The notation DFT(32) means the DFT with length 32.

Fig. 3. (a) Selected time series and its representations with the use of (b) DFT(32), (c) PAA(32), and (d) DWT(32).

We systematically sweep through the range of values of $\lambda$ to find its value where the reconstruction or prediction error (based on the evaluation criterion) attains its minimum. Table I presents the optimal values of $\lambda$ along with the corresponding reconstruction error reported for several numbers of clusters, i.e., $c$ = 2, 3, and 4, and different representation methods with lengths 8, 16, and 32. Notice that the reported reconstruction error is a sum of the squared Euclidean distances between the
ALGORITHM 1
PSEUDOCODE OF THE CLUSTERING METHOD USING RC AND PC
TABLE I
OPTIMAL VALUES OF λ AND THE ASSOCIATED RECONSTRUCTION ERROR FOR THE SYNTHETIC DATASETS
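The prediction criterion in (17)–(20) can be illustrated with the minimal Python sketch below. It is not the authors' pseudocode. It assumes that the prototypes `V_s` (spatial parts, shape c × r) and `V_t` (temporal parts, shape c × q) have already been obtained by clustering with some value of λ. The function names and the `eps` guard for data that coincide with a prototype are hypothetical.

```python
import numpy as np

def spatial_memberships(V_s, X_s, m=2.0, eps=1e-12):
    """Partition matrix of (17), computed from spatial distances only; shape (c, n)."""
    d = np.linalg.norm(V_s[:, None, :] - X_s[None, :, :], axis=2)
    d = np.maximum(d, eps)  # guard against a datum lying exactly on a prototype
    ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)

def predict_temporal(U_tilde, V_t, m=2.0):
    """Predicted temporal parts of (19): weighted average of v_i(t); shape (n, q)."""
    W = U_tilde ** m
    return (W.T @ V_t) / W.sum(axis=0)[:, None]

def prediction_error(X_t, X_t_hat, var):
    """Prediction error of (20): sum over data of the normalized squared distances."""
    return float((((X_t - X_t_hat) ** 2) / var).mean(axis=1).sum())
```

A datum that is spatially equidistant from two prototypes receives the average of their temporal parts, whereas a datum located at a prototype essentially inherits that prototype's temporal part.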
TABLE II
OPTIMAL VALUES OF λ AND THE ASSOCIATED PREDICTION ERROR FOR THE
SYNTHETIC DATASETS
TABLE III
OPTIMAL VALUE OF λ AND THE ASSOCIATED RECONSTRUCTION ERROR FOR
246 STATIONS IN THE ALBERTA TEMPERATURE DATASET IN DIFFERENT
SEASONS OF 2009
Fig. 6. (a) Snapshot of the Alberta Agriculture and Rural Development sys-
tem and three highlighted stations (www.agric.gov.ab.ca). (b) Daily average
temperature in 2009 for the highlighted stations.
TABLE IV
PC FOR ALBERTA TEMPERATURE DATASET DURING 2009–2011. EACH CELL
COMPRISES TWO ENTRIES: THE OPTIMAL VALUE OF λ AND THE ASSOCIATED
PREDICTION ERROR
TABLE V
COMPARISON OF RC, PC, AND RFCM OVER THE EVALUATION CRITERIA (22)
FOR DIFFERENT REPRESENTATIONS AND NUMBER OF CLUSTERS
TABLE VI
AVERAGE AND STANDARD DEVIATION OF TESTING ERROR, TRAINING ERROR,
AND ERROR RATE REPORTED OVER 100 INDEPENDENT RUNS
Fig. 10. (a) Selected testing samples with three labeled stations a, b, and c for prediction. (b) Clusters of training samples with two labeled prototypes P1 and P2.

Fig. 11. Original and predicted time series for (a) station a, (b) station b, and (c) station c.

runs. In addition, we define the error rate as

$$E = \frac{\text{testing error}}{\text{training error}}. \tag{23}$$

In Table VI, with the increase of the number of clusters, both testing and training errors are reduced. This is quite reasonable since having more clusters means having more prototypes and more information about data, and as a result, the prediction can be more accurate. Moreover, because the clustering is performed on training samples, the error rate defined in (23) is always higher than 1, and as the number of clusters increases, the reduction in training error is higher than the reduction in testing error so that the ratio of testing error to training error increases.

Fig. 10(a) shows an example of selected stations as testing samples (star symbols) and the others as training samples. Three stations a, b, and c from testing samples have been labeled in this figure. Fig. 10(b) shows the optimal clustering ($\lambda_{opt}$ = 0.65) of the training samples. In this figure, two prototypes, i.e., P1 and P2, are labeled. The number of clusters was set to 5, and the DFT(32) representation of time series is used.

Fig. 11 shows the time series reconstructed from the original features (32 DFT features) and from the predicted features. Using the PC, the temporal part of stations a and b has been predicted with a high accuracy. However, the prediction for station c is not accurate because this station is between two clusters P1 and P2 (see Fig. 10) with two very different temporal patterns. In fact, the spatial part of c is close to P1, but its temporal part is close to P2.

Fig. 12 shows the original and predicted time series of station c along with the time series corresponding to the prototypes P1 and P2. Both predicted and original time series of station c lie almost between the time series corresponding to P1 and P2. P1 has more effect on prediction, because the spatial part of station c is closer to the spatial part of P1, and as a result, P1 has a higher weight (in the form of membership degree $\tilde{U}$)
for prediction. One may consider more clusters to achieve more accurate prediction. For example, the prediction error for station c with number of clusters 2, 5, 8, and 12 is 1.283, 1.240, 0.684, and 0.511, respectively.

Fig. 12. Original and predicted time series for station c [in Fig. 10(a)] and the time series corresponding to the prototypes P1 and P2.

In the next step, we consider the entire data as training samples, and predict the temporal part of some unseen spatial coordinates in the map. The procedure is the same as used in the previous experiment. Fig. 13 shows two generated spatial points a and b in the map. In addition, for each point, a number of stations is selected as their neighbors. Fig. 14(a) and (b) shows the predicted time series for a and b along with the time series corresponding to their neighbors. As seen from these figures, the predicted time series for points a and b are similar to their neighbors (time series).

Fig. 14. Predicted time series and the time series corresponding to the neighbors of (a) station a and (b) station b highlighted in Fig. 13.

The PC that has been used in this paper is different from the time-series forecasting methods proposed in literature in both methodology and purpose. Our PC predicts the time series based on their spatial location and the time series formed in the cluster centers. In addition, in this method, the objective is to find an optimal tradeoff to regulate the interaction between spatial and temporal patterns in the clustering process and not forecasting the time series for the future time steps. Time-series forecasting methods proposed in the literature (e.g., [23], [26], and [31]) usually assume that the times series follow a linear or nonlin-

VII. CONCLUSION

We have introduced the concept and algorithmic framework of fuzzy clustering for spatiotemporal data. It was shown that given a different nature of spatial and temporal components of the data, their different treatment is realized through a flexible distance function where the parameter $\lambda$, controlling the influence of temporal and spatial components, is optimized through the minimization of the RC or PC.

In this research, we confined ourselves to univariate time series. An interesting extension could be to consider multivariate time series. Here, the data come in the form $x_i = [x_i(s) \,|\, x_{i1}(t), x_{i2}(t), \ldots, x_{iM}(t)]^T$ where $x_{ik}(t)$ is the $k$th variable (e.g., temperature), and $M$ is number of variables present in the temporal part of data. As each time series might come with its own specificity, this could be reflected in the augmented additive distance function expressed as

$$d_\lambda^2(v_i, x_k) = \|v_i(s) - x_k(s)\|^2 + \lambda_1 \|v_{i1}(t) - x_{k1}(t)\|^2 + \cdots + \lambda_M \|v_{iM}(t) - x_{kM}(t)\|^2 \tag{24}$$

where $M$ weight coefficients $\lambda_1, \lambda_2, \ldots, \lambda_M$ offer the required flexibility, and the values of these coefficients could be subject to optimization again by taking advantage of the RC or PC.

Another interesting development worth pursuing would be to investigate some other distance measures, e.g., the dynamic time
warping distance, longest common subsequence distance, etc. [22] E. Keogh, S. Chu, D. Hart, and M. Pazzani, “An online algorithm for
One has to be aware of the fact that as we encounter various segmenting time series,” in Proc. IEEE Int. Conf. Data Mining, 2001,
pp. 289–296.
distance functions, this may pose challenges at the end of fuzzy [23] G. E. P. Box and G. Jenkins, Time Series Analysis: Forecasting and Con-
clustering and further refinements of the generic FCM method trol. San Francisco, CA: Holden-Day, 1976.
to cope with the diversity of distance measures different from [24] M. Ramoni, P. Sebastiani, and P. Cohen, “Bayesian clustering by dynam-
ics,” Mach. Learn., vol. 47, no. 1, pp. 91–121, 2002.
the Euclidean one. [25] M. Vlachos, D. Gunopulos, and G. Kollios, “Discovering similar multidi-
mensional trajectories,” in Proc. Int. Conf. Data Eng., 2002, pp. 673–684.
[26] M. H. Magalhães, R. Ballini, and F A. C. Gomide, “Granular mod-
REFERENCES els for time-series forecasting,” in Handbook of Granular Computing,
W. Pedrycz, A. Skowron, and V. Kreinovich, Eds. New York: Wiley-
[1] C. Faloutsos, M. Ranganathan, and Y. Manolopoulos, “Fast subsequence Interscience, 2008.
matching in time-series databases,” in Proc. ACM SIGMOD Int. Conf. [27] T. W. Liao, “Clustering of time series data—a survey,” Pattern Recognit.,
Manage. Data, 1994, pp. 419–429. vol. 38, no. 11, pp. 1857–1874, Nov. 2005.
[2] E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra, “Dimensionality reduction for fast similarity search in large time series databases,” J. Knowl. Inf. Syst., vol. 3, no. 3, pp. 263–286, Aug. 2001.
[3] K.-P. Chan and A. W.-C. Fu, “Efficient time series matching by wavelets,” in Proc. Int. Conf. Data Eng., 1999, pp. 126–133.
[4] K.-P. Chan, A. W.-C. Fu, and C. Yu, “Haar wavelets for efficient similarity search of time-series: With and without time warping,” IEEE Trans. Knowl. Data Eng., vol. 15, no. 3, pp. 686–705, May/Jun. 2003.
[5] S. Mallat, “A theory for multiresolution signal decomposition: The wavelet representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 2, pp. 674–693, Jul. 1989.
[6] F. Korn, H. V. Jagadish, and C. Faloutsos, “Efficiently supporting ad-hoc queries in large datasets of time sequences,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, New York, 1997, pp. 289–300.
[7] D. Berndt and J. Clifford, “Using dynamic time warping to find patterns in time series,” in Proc. Workshop Knowledge Discovery Databases, 1994, pp. 359–370.
[8] J. Caiado, N. Crato, and D. Peña, “A periodogram-based metric for time series classification,” Comput. Statist. Data Anal., vol. 50, no. 10, pp. 2668–2684, Jun. 2006.
[9] J. C. Bezdek, Pattern Recognition With Fuzzy Objective Function Algorithms. New York: Plenum, 1981.
[10] W. Pedrycz and J. V. de Oliveira, “A development of fuzzy encoding and decoding through fuzzy clustering,” IEEE Trans. Instrum. Meas., vol. 57, no. 4, pp. 829–837, Apr. 2008.
[11] W. Pedrycz and A. Bargiela, “Fuzzy clustering with semantically distinct families of variables: Descriptive and predictive aspects,” Pattern Recognit. Lett., vol. 31, no. 13, pp. 1952–1958, Oct. 2010.
[12] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. Roy. Statist. Soc., Ser. B, vol. 39, no. 1, pp. 1–38, 1977.
[13] S. Gaffney and P. Smyth, “Trajectory clustering with mixtures of regression models,” in Proc. 5th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 1999, pp. 63–72.
[14] M. Ankerst, M. M. Breunig, H. P. Kriegel, and J. Sander, “OPTICS: Ordering points to identify the clustering structure,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, Philadelphia, PA, 1999, pp. 49–60.
[15] M. Nanni and D. Pedreschi, “Time-focused clustering of trajectories of moving objects,” J. Intell. Inf. Syst., vol. 27, no. 3, pp. 267–289, Nov. 2006.
[16] Y. Yang and K. Chen, “Time series clustering via RPCL network ensemble with different representations,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 41, no. 2, pp. 190–199, Mar. 2011.
[17] J. Lin, E. Keogh, L. Wei, and S. Lonardi, “Experiencing SAX: A novel symbolic representation of time series,” Data Mining Knowl. Discovery, vol. 15, no. 2, pp. 107–144, Aug. 2007.
[18] K. Chakrabarti, E. Keogh, S. Mehrotra, and M. Pazzani, “Locally adaptive dimensionality reduction for indexing large time series databases,” ACM Trans. Database Syst., vol. 27, no. 2, pp. 188–228, Jun. 2002.
[19] H. Cao, H. W. Deng, and Y. P. Wang, “Segmentation of M-FISH images for improved classification of chromosomes with an adaptive fuzzy C-means clustering algorithm,” IEEE Trans. Fuzzy Syst., vol. 20, no. 1, pp. 1–9, Feb. 2012.
[20] H. Ding, G. Trajcevski, P. Scheuermann, X. Wang, and E. Keogh, “Querying and mining of time series data: Experimental comparison of representations and distance measures,” in Proc. VLDB Endowment, Auckland, New Zealand, 2008, pp. 1542–1552.
[21] Y. Cai and R. Ng, “Indexing spatio-temporal trajectories with Chebyshev polynomials,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2004, pp. 599–610.
[28] E. A. Maharaj, P. D’Urso, and D. U. A. Galagedera, “Wavelet-based fuzzy clustering of time series,” J. Classif., vol. 27, no. 2, pp. 231–275, 2010.
[29] P. D’Urso and E. A. Maharaj, “Autocorrelation-based fuzzy clustering of time series,” Fuzzy Sets Syst., vol. 160, no. 24, pp. 3565–3589, Dec. 2009.
[30] E. A. Maharaj and P. D’Urso, “Fuzzy clustering of time series in the frequency domain,” Inf. Sci., vol. 181, no. 7, pp. 1187–1211, Apr. 2011.
[31] H. G. Seedig, R. Grothmann, and T. A. Runkler, “Forecasting of clustered time series with recurrent neural networks and a fuzzy clustering scheme,” in Proc. Int. Joint Conf. Neural Netw., Atlanta, GA, 2009, pp. 2846–2853.
[32] C. S. Möller-Levet, F. Klawonn, K.-H. Cho, and O. Wolkenhauer, “Fuzzy clustering of short time series and unevenly distributed sampling points,” in Proc. 5th Int. Symp. Intell. Data Anal., 2003, pp. 28–30.
[33] X. Zhang, J. Liu, Y. Du, and T. Lv, “A novel clustering method on time series data,” Expert Syst. Appl., vol. 38, no. 9, pp. 11891–11900, Sep. 2011.
[34] F. Petitjean, A. Ketterlin, and P. Gancarski, “A global averaging method for dynamic time warping, with applications to clustering,” Pattern Recognit., vol. 44, no. 3, pp. 678–693, Mar. 2011.
[35] K. Kalpakis, D. Gada, and V. Puttagunta, “Distance measures for effective clustering of ARIMA time-series,” in Proc. IEEE Int. Conf. Data Mining, 2001, pp. 273–280.
[36] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. New York: Wiley, 1990.
[37] Y. Xiong and D. Yeung, “Time series clustering with ARMA mixtures,” Pattern Recognit., vol. 37, no. 8, pp. 1675–1689, Aug. 2004.
[38] G. Schwarz, “Estimating the dimension of a model,” Ann. Statist., vol. 6, no. 2, pp. 461–464, 1978.
[39] P. D’Urso, “Fuzzy clustering for data time arrays with inlier and outlier time trajectories,” IEEE Trans. Fuzzy Syst., vol. 13, no. 5, pp. 583–604, Oct. 2005.
[40] M. Sato and Y. Sato, “On a multicriteria fuzzy clustering method for 3-way data,” Int. J. Uncertainty Fuzziness Knowl.-Based Syst., vol. 2, no. 2, pp. 127–142, Jun. 1994.
[41] A. Lemos, W. Caminhas, and F. Gomide, “Multivariable Gaussian evolving fuzzy modeling system,” IEEE Trans. Fuzzy Syst., vol. 19, no. 1, pp. 91–104, Feb. 2011.
[42] Z. Chen, S. Aghakhani, J. Man, and S. Dick, “ANCFIS: A neuro fuzzy architecture employing complex fuzzy sets,” IEEE Trans. Fuzzy Syst., vol. 19, no. 2, pp. 305–322, Apr. 2011.
[43] S. Chen and C. Chen, “TAIEX forecasting based on fuzzy time series and fuzzy variation groups,” IEEE Trans. Fuzzy Syst., vol. 19, no. 1, pp. 1–12, Feb. 2011.
[44] D. L. Pham, “Spatial models for fuzzy clustering,” Comput. Vis. Image Understand., vol. 84, no. 2, pp. 285–297, 2001.
[45] V. Petridis and A. Kehagias, “Predictive modular fuzzy systems for time-series classification,” IEEE Trans. Fuzzy Syst., vol. 5, no. 3, pp. 381–397, Aug. 1997.
[46] S. M. Arafat and M. Skubic, “Modeling fuzziness measures for best wavelet selection,” IEEE Trans. Fuzzy Syst., vol. 16, no. 5, pp. 1259–1270, Oct. 2008.
[47] P. Kalnis, N. Mamoulis, and S. Bakiras, “On discovering moving clusters in spatio-temporal data,” in Proc. Int. Symp. Spatial Temporal Databases, 2005, pp. 364–381.
[48] L. Chen, M. T. Özsu, and V. Oria, “Robust and fast similarity search for moving object trajectories,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2005, pp. 491–502.
[49] L. F. S. Coletta, L. Vendramin, E. R. Hruschka, R. J. G. B. Campello, and W. Pedrycz, “Collaborative fuzzy clustering algorithms: Some refinements and design guidelines,” IEEE Trans. Fuzzy Syst., vol. 20, no. 3, pp. 444–462, Jun. 2012.
868 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 21, NO. 5, OCTOBER 2013
[50] D. T. Pham and A. B. Chan, “Control chart pattern recognition using a new type of self organizing neural network,” Proc. Inst. Mech. Eng., Part I: J. Syst. Control Eng., vol. 212, no. 2, pp. 115–127, 1998.
[51] S. Kisilevich, F. Mansmann, M. Nanni, and S. Rinzivillo, “Spatio-temporal clustering,” in Data Mining and Knowledge Discovery Handbook. New York: Springer, 2010, pp. 855–874.
[52] M. Kulldorff, “Prospective time periodic geographical disease surveillance using a scan statistic,” J. Roy. Statist. Soc. A, vol. 164, no. 1, pp. 61–72, 2001.
[53] H. Izakian and W. Pedrycz, “A new PSO-optimized geometry of spatial and spatio-temporal scan statistics for disease outbreak detection,” Swarm Evol. Comput., vol. 4, pp. 1–11, Jun. 2012.
[54] F. Di Martino and S. Sessa, “The extended fuzzy C-means algorithm for hotspots in spatio-temporal GIS,” Expert Syst. Appl., vol. 38, no. 9, pp. 11829–11836, Sep. 2011.
[55] M. Wang, A. Wang, and A. Li, “Mining spatial-temporal clusters from geo-databases,” in Proc. 2nd Int. Conf. Adv. Data Mining Appl., 2006, pp. 63–270.
[56] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” Data Mining Knowl. Discovery, pp. 226–231, 1996.
[57] Z. Liu and R. George, “Fuzzy cluster analysis of spatio-temporal data,” in Proc. 18th Int. Symp. Comput. Inf. Sci., Antalya, Turkey, 2003, pp. 984–991.
[58] M. Deng, Q. Liu, J. Wang, and Y. Shi, “A general method of spatio-temporal clustering analysis,” Sci. Chin. Inf. Sci., pp. 1–14, 2011.
[59] R. Coppi, P. D’Urso, and P. Giordani, “A fuzzy clustering model for multivariate spatial time series,” J. Classif., vol. 27, no. 1, pp. 54–88, Mar. 2010.
[60] Y. C. Cheng and S. T. Li, “Fuzzy time series forecasting with a probabilistic smoothing hidden Markov model,” IEEE Trans. Fuzzy Syst., vol. 20, no. 2, pp. 291–304, Apr. 2012.
[61] J. Wu, H. Xiong, C. Liu, and J. Chen, “A generalization of distance functions for fuzzy C-means clustering with centroids of arithmetic means,” IEEE Trans. Fuzzy Syst., vol. 20, no. 3, pp. 557–571, Jun. 2012.
[62] J. P. Mei and L. Chen, “A fuzzy approach for multitype relational data clustering,” IEEE Trans. Fuzzy Syst., vol. 20, no. 2, pp. 358–371, Apr. 2012.

Hesam Izakian (S’12) received the M.S. degree in computer engineering (artificial intelligence) from the University of Isfahan, Isfahan, Iran. He is currently working toward the Ph.D. degree with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada.
He is working under the supervision of Prof. W. Pedrycz. His research interests include computational intelligence, knowledge discovery and data mining, pattern recognition, and software engineering.

Witold Pedrycz (M’88–SM’94–F’99) received the M.Sc., Ph.D., and D.Sci. degrees from the Silesian University of Technology, Gliwice, Poland.
He is currently a Professor and Canada Research Chair (CRC computational intelligence) with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada. In 2009, he was elected as a foreign member of the Polish Academy of Sciences, Warsaw, Poland. He is the author of 14 research monographs covering various aspects of computational intelligence and software engineering. He is also with the Department of Electrical and Computer Engineering, Faculty of Engineering, King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia. His main research interests include computational intelligence, fuzzy modeling and granular computing, knowledge discovery and data mining, fuzzy control, pattern recognition, knowledge-based neural networks, relational computing, and software engineering. He has published numerous papers in this area.
Prof. Pedrycz was elected as a Fellow of the Royal Society of Canada in 2012. He has been a member of numerous program committees of IEEE conferences in the area of fuzzy sets and neurocomputing. He is intensively involved in editorial activities. He is an Editor-in-Chief of Information Sciences and Editor-in-Chief of the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS. He currently serves as an Associate Editor of the IEEE TRANSACTIONS ON FUZZY SYSTEMS and is a member of a number of editorial boards of other international journals. In 2007, he received a prestigious Norbert Wiener Award from the IEEE Systems, Man, and Cybernetics Council. He received the IEEE Canada Computer Engineering Medal in 2008. In 2009, he received a Cajastur Prize for soft computing from the European Centre for Soft Computing for “pioneering and multifaceted contributions to granular computing.”

Iqbal Jamal received the M.S. degree in management science and the M.A.Sc.Eng. degree from the University of British Columbia, Vancouver, BC, Canada.
He is currently a Principal of AQL Management Consulting (AQLMC) Inc., Edmonton, AB, Canada, a data mining/analytics-based company. AQLMC specializes in developing and implementing data analytics in support of anomaly detection for animal, human, and environmental health. AQLMC also conducts operations analysis for public sector services.