Izakian, 2013 - Clustering Spatiotemporal Data An Augmented Fuzzy C-Means

IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 21, NO. 5, OCTOBER 2013 855
Clustering Spatiotemporal Data: An Augmented Fuzzy C-Means
Hesam Izakian, Student Member, IEEE, Witold Pedrycz, Fellow, IEEE, and Iqbal Jamal
Abstract—In spatiotemporal data commonly encountered in geographical systems, biomedical signals, and the like, each datum is composed of features comprising a spatial component and a temporal part. Clustering of data of this nature poses challenges, especially in terms of a suitable treatment of the spatial and temporal components of the data. In this study, proceeding with the objective function-based clustering (such as, e.g., fuzzy C-means), we revisit and augment the algorithm to make it applicable to spatiotemporal data. An augmented distance function is discussed, and the resulting clustering algorithm is provided. Two optimization criteria, i.e., a reconstruction error and a prediction error, are introduced and used as a vehicle to optimize the performance of the clustering method. Experimental results obtained for synthetic and real-world data are reported.

Index Terms—Fuzzy clustering, reconstruction and prediction criteria, spatiotemporal data, weather data.

Manuscript received January 3, 2012; revised May 23, 2012 and September 8, 2012; accepted November 6, 2012. Date of publication December 11, 2012; date of current version October 2, 2013. This work was supported in part by the Alberta Innovates—Technology Futures and Alberta Advanced Education and Technology, the Natural Sciences and Engineering Research Council of Canada, and the Canada Research Chair Program.
H. Izakian is with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada, T6G 2V4 (e-mail: izakian@ualberta.ca).
W. Pedrycz is with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada, T6G 2V4, with the Department of Electrical and Computer Engineering, Faculty of Engineering, King Abdulaziz University, Jeddah 21589, Kingdom of Saudi Arabia, and with the System Research Institute, Polish Academy of Sciences, Warsaw 00-716, Poland (e-mail: wpedrycz@ualberta.ca).
I. Jamal is with AQL Management Consulting Inc., Edmonton, AB, Canada, T6J 2R8 (e-mail: iqbaljamal@aqlmc.com).
Digital Object Identifier 10.1109/TFUZZ.2012.2233479
1063-6706 © 2013 IEEE

I. INTRODUCTION

GIVEN the unprecedented growth of spatiotemporal data encountered in different application domains such as, e.g., geography, climatology, and health surveillance systems, their analysis has become more important and practically relevant. In spatiotemporal data, each data point is composed of two parts, namely, a spatial component, typically denoting its location (say, x − y or latitude–longitude coordinates), and temporal part, comprising one or more time series associated with the spatial coordinates. Daily average temperature recorded at different weather stations, number of disease cases reported in different cities in a monthly period, and hourly air pollution recordings are examples of this kind of data.

Clustering of spatiotemporal data reveals interesting structures that could be used in different applications. The fuzzy C-means (FCM) algorithm [9] is one of the commonly used clustering techniques and is inherently associated with some underlying objective function. To cope with the specificity of the spatiotemporal data, the generic objective function of the FCM requires a thorough examination and revision of its formulation. In this paper, we introduce a concept and offer the ensuing algorithmic developments by using the generic FCM algorithm (although the main line of thought is equally valid for any objective function-based clustering). The crux of the method is to effectively handle the data reflecting the spatial and temporal facet of the problem (data) in order to preserve the essence of the problem. For this purpose, we revisit the distance function and augment the “standard” Euclidean distance. Equally important is the fact that the augmented distance is endowed with a substantial level of flexibility so that the contributions coming from the temporal and spatial parts of the data could be carefully balanced and optimized. The resulting flexibility is exploited to minimize two performance indexes, namely, a reconstruction error or a prediction error. To deal with the reconstruction error is essential when assessing the quality of clusters—information granules and quantifying their role being played in the processes of information granulation and de-granulation. The prediction aspects are of interest when forecasting a temporal component of the data given their specific location (spatial information).

Interestingly enough, the objective function of the FCM algorithm has been subject to various modifications in order to cope with the specificity of the problem. In [19], by adding a gain field, the FCM objective function has been reformulated and optimized in an iterative fashion for segmentation and classification of M-FISH images to detect chromosomal abnormalities and support a genetic disease diagnosis. In [62], a fuzzy clustering approach for data points comprising various object types was proposed by reformulating the FCM objective function and optimizing a constrained optimization problem. A membership matrix and a ranking matrix have been employed in the optimization procedure, where the membership matrix comprises membership degrees of objects to clusters, while the ranking matrix measures how representative an object is in comparison with other objects in various clusters. In [61], a general definition of distance functions that preserve the applicability of the centroid-based alternating optimization in FCM is provided. They showed that any distance function that can be used in the FCM algorithm is an instance of the generalized point-to-centroid distance and can be derived by a differentiable convex function. In addition, in [49], some methods and guidelines to design collaborative fuzzy clustering algorithms for clustering distributed data among different data sites were developed.

This study is organized as follows. We start with a brief review of the research being reported so far. The two

fundamental concepts being of essential relevance in the context of the study, that is, a representation of time series and quantifying distance between time series are discussed. In Section III, we introduce spatiotemporal clustering and formulate the ensuing optimization problem. In Section IV, two performance indexes (evaluation criteria) casting the clustering results in the setting of reconstruction and prediction problems are investigated. In Sections V and VI, experimental results dealing with synthetic data and real-world problems are reported. Conclusions are covered in Section VII.

II. CLUSTERING SPATIOTEMPORAL DATA—A FOCUSED LITERATURE REVIEW

In real-world applications, we encounter different kinds of spatiotemporal data. Kisilevich et al. [51] divided spatiotemporal data into five categories including spatiotemporal events, georeferenced variables, georeferenced time series, moving objects, and trajectories.

In spatiotemporal event data, there is a set of events, each occurring at a spatial location and coming with its timestamp. Clustering this type of data aims to find a set of events that are close to each other in both space and time. One of the commonly used methods for clustering these types of data is scan statistics [52], [53]. In this method, one moves a cylindrical window of variable size and shape across a geographical region to detect clusters of events with the highest likelihood ratios. In [54], an extended version of FCM has been proposed to find circular clusters of hotspots in spatiotemporal geographical information system data. For each timestamp, the events are clustered based on their spatial location, and then the clusters obtained in consecutive timestamps are compared to draw conclusions about the events. Wang et al. [55] proposed two spatiotemporal clustering methods, which are called ST-GRID and ST-DBSCAN, to detect seismic events in China and neighboring countries. The ST-GRID method used a multidimensional grid that covers the entire spatiotemporal feature space. Then, by merging the dense neighbor cells, spatiotemporal clusters were formed. ST-DBSCAN extended DBSCAN [56] by redefining density reachability using spatial and temporal radius. Both methods exploited an ordered k-dist graph [56] to determine their parameters.

Georeferenced time series are composed of a set of fixed geographical coordinates, each corresponding to one or more time series. Georeferenced variables data form a special case of georeferenced time series where only the most recent point of time series is available. Clustering this type of data aims to group objects based on their spatial closeness and temporal similarities. In [57], FCM has been used to cluster weather time series. The Pearson correlation coefficient was employed as the similarity measure expressing closeness of two time series and a method to determine the number of clusters has been proposed. However, the method does not involve the spatial part of data in the clustering process. Deng et al. [58] proposed a density-based spatiotemporal clustering. In this method, a spatial proximate network has been constructed using Delaunay triangulation and a spatiotemporal autocorrelation analysis was employed to define the spatiotemporal neighborhood. In [44], an extended version of FCM was proposed for image segmentation by considering the spatial location of pixels. This method has been considered by Coppi et al. [59] for clustering spatiotemporal data. In this approach, a spatial penalty term that was calculated using a spatial contiguity matrix has been added to the objective function to guarantee an approximate spatial homogeneity of the clusters.

Trajectories capture the movement behavior of a set of spatial objects in the form of time series. When the most recent position of the objects is available, the data are called moving objects data. Clustering of this kind of data aims to discover a behavior of a collection of objects, e.g., those occurring in urban traffic or animals’ migration. In [15], the Euclidean distance between trajectories was used as a dissimilarity measure, whereas OPTICS [14] has been extended to cluster trajectories. Two methods, trajectory-OPTICS and a time-focused version of that (called TF-OPTICS) were proposed. In [13], a probabilistic regression model for trajectory detection was proposed and the expectation maximization algorithm [12] has been employed to model trajectories. Kalnis et al. [47] proposed algorithms to discover moving clusters in spatiotemporal data. In these methods, the set of objects of a moving cluster changes over time. At each time step, the location of objects has been considered as a snapshot and a spatial clustering method like DBSCAN was used for clustering. Two snapshot clusters in consecutive time steps were considered as moving clusters if the value of their Jaccard coefficient exceeds a certain threshold. A fuzzy clustering for three-way data was proposed in [40]. In this structure, each data point was composed of objects, attributes, and situations. The data are clustered based not only on individual time instances; in addition, the similarity between structures has been considered in different time steps. A survey of clustering spatiotemporal data is reported in [51].

A. Time-Series Representation Methods

Time series have been investigated in a variety of problems of data mining such as clustering [36], [39], classification [8], [45], [46], forecasting [42], [43], [60], and modeling [38], [41]. Based on the type of data being used, the methods of time-series clustering can be split into three categories [16], [27], namely those using raw time-series data [32]–[34], model-based methods [24], [35], [37], and representation-based methods [16], [28]–[30].

There are a number of methods proposed in the literature to represent time series. In general, such representation methods are categorized into data-adaptive and non-data-adaptive methods [17], [18], [20]. Adaptive piecewise constant approximation [18], piecewise linear approximation [22], singular value decomposition [6], and symbolic aggregate approximation [17] are examples of data-adaptive methods. Discrete Fourier transform (DFT) [1], Chebyshev polynomials [21], discrete wavelet transform (DWT) [3], [4], and piecewise aggregate approximation (PAA) [2] are well-known methods belonging to the second category.
In this paper, we use three commonly studied methods to represent time series, namely, DFT, PAA, and DWT. They can be viewed as sound representatives of the large set of the methods existing in the literature. In what follows, we review them very briefly.

1) Discrete Fourier Transform: The DFT models the time series using a set of sine and cosine waves. It represents the time series in a frequency domain. For a time series $y$ of length $N$, DFT is composed of $N$ complex numbers, each describing a sine/cosine wave given by

$$f_k = \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} y_i \exp(-j 2\pi k i / N), \quad k = 0, 1, \ldots, N-1 \tag{1}$$

where $j = \sqrt{-1}$. The original time series can be reconstructed by running an inverse transform given by

$$y_i = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} f_k \exp(j 2\pi k i / N), \quad i = 0, 1, \ldots, N-1 \tag{2}$$

Faloutsos et al. [1] employed DFT to index time series. They noted that the most important features of each sequence are the first $k$ (real and imaginary) coefficients ($k \ll N$) of the DFT transform, while the other coefficients assume values close to zero. By having these $k$ coefficients, the original time series can be reconstructed with a little loss of information.

2) Piecewise Aggregate Approximation: This method provides a simple and efficient way of time-series representation in time domain offering a substantial dimensionality reduction [2]. PAA divides the time series $y$ into $k$ ($k \ll N$) segments of equal length and determines the mean value of data points lying within each segment as the representatives of the original time series. More formally, we have the representation in the form of a vector $f$ whose coordinates are expressed as follows:

$$f_i = \frac{k}{N} \sum_{j=\frac{N}{k} i}^{\frac{N}{k}(i+1)-1} y_j, \quad i = 0, 1, \ldots, k-1. \tag{3}$$

3) Discrete Wavelet Transform: Wavelets are basis functions that describe time series in a time–frequency joint representation. In [3] and [4], DWT is used as an efficient representation method to index time-series data. A suitable method to calculate the DWT coefficients is a pyramid algorithm [5]. In this method, the length of time series $N$ has to be a power of two. For time series that do not satisfy this condition, zero padding is realized. DWT converts the time series into two types of coefficients resulting from low-pass filters (also called scaling function) and high-pass filters (also called wavelet function), each with length $N/2$, given by

$$a_i = \frac{1}{2} \sum_{j=0}^{N-1} c_{2i-j+1}\, y_j, \quad i = 0, 1, \ldots, \frac{N}{2}-1 \tag{4}$$

$$f_i = \frac{1}{2} \sum_{j=0}^{N-1} (-1)^j c_{j-2i}\, y_j, \quad i = 0, 1, \ldots, \frac{N}{2}-1 \tag{5}$$

where $a = [a_0, a_1, \ldots, a_{N/2-1}]^T$ are scaling coefficients, and $f = [f_0, f_1, \ldots, f_{N/2-1}]^T$ are wavelet coefficients present at the first level. To calculate the wavelet coefficients at the next level, the aforementioned calculations are performed over the scaling coefficients $a$. The procedure is recursive until the required number of iterations has been reached. For each wavelet function, there are a number of nonzero coefficients. For example, for the Haar function, the nonzero coefficients are $c_0 = c_1 = 1$.

One has to stress that the representation method of time series is problem-dependent. For example, one may be interested in analyzing time series based on their frequency characteristics (using DFT), time characteristics (where PAA could be of interest), or time–frequency joint characteristics (DWT). In this paper, we used these three representation methods in clustering time-series data.

B. Distance Functions

Distance functions (distances, for brief) used in time series can be divided into three general categories: $L_p$-norm distances, elastic measures, and statistical measures. Euclidean distance $L_2$ has been widely used as a dissimilarity measure [20] and is suitable to compare equal-length time series. Dynamic time warping distance [7] is an elastic measure used to determine an optimal match between two time series by stretching or compressing their segments, and concentrates on the similarity of time series with respect to their shapes. Longest common subsequence [25] is another example of the elastic-based distance measures. This method uses the length of the longest subsequence occurring in two time series to quantify their similarity. In addition, an edit distance of real-number sequences [48], which is another elastic-based distance measure, considers the number of insert, delete, and replace operations that are required to convert one sequence to another to express the similarity. Pearson coefficient is a statistics-based method that is used to quantify the correlation between two time series. The Kullback–Leibler distance [24] is another statistical measure useful in expressing the dissimilarity between two time series represented by their Markov chain. A comparison between a number of representation methods and similarity measures used for various types of time series was reported in [20] in the problem of indexing time series. The suitability of each similarity measure is application-oriented. Nevertheless, the Euclidean distance is in common usage.

III. CONCEPT OF CLUSTERING OF SPATIOTEMPORAL DATA

In clustering spatiotemporal data, we assume that there are $n$ data $x_1, x_2, \ldots, x_n$, each comprising its spatial and temporal components. The $i$th data $x_i$ is represented as a concatenation of its spatial and temporal parts, namely, $x_i = [x_i(s) \mid x_i(t)]^T$, where $x_i(s)$ is the spatial part of $x_i$, while $x_i(t)$ denotes the temporal part (or its representation) of the same data point. By considering $r$ features in the spatial part and $q$ features in the temporal one, we have

$$x_i = [x_i(s) \mid x_i(t)]^T = [x_{i1}(s), \ldots, x_{ir}(s) \mid x_{i1}(t), \ldots, x_{iq}(t)]^T. \tag{6}$$
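To make the representations of Section II-A and the concatenation in (6) concrete, the following is a minimal Python sketch and not the authors' implementation. The function names (`paa`, `dft_features`, `haar_dwt_level`, `build_feature_vector`) are hypothetical, NumPy is assumed, the PAA sketch assumes that k divides N, and storing the first k DFT coefficients as 2k real values is only one possible reading of the representation length.

```python
import numpy as np

def paa(y, k):
    """PAA as in (3): mean of each of k equal-length segments (assumes k divides N)."""
    y = np.asarray(y, dtype=float)
    if y.size % k != 0:
        raise ValueError("this sketch assumes that k divides the series length")
    return y.reshape(k, y.size // k).mean(axis=1)

def dft_features(y, k):
    """First k coefficients of the unitary DFT in (1), stored as real then imaginary parts."""
    f = np.fft.fft(np.asarray(y, dtype=float), norm="ortho")[:k]
    return np.concatenate([f.real, f.imag])

def haar_dwt_level(y):
    """One pyramid level of (4)-(5) for the Haar case c0 = c1 = 1 (even length assumed)."""
    y = np.asarray(y, dtype=float)
    a = 0.5 * (y[0::2] + y[1::2])  # scaling coefficients a_i
    f = 0.5 * (y[0::2] - y[1::2])  # wavelet coefficients f_i
    return a, f

def build_feature_vector(spatial, series, k, represent=paa):
    """Concatenate the spatial part and the represented temporal part, as in (6)."""
    return np.concatenate([np.asarray(spatial, dtype=float), represent(series, k)])
```

For instance, `build_feature_vector([10.0, 20.0], np.arange(8.0), k=2)` yields a vector with r = 2 spatial features followed by q = 2 PAA features; deeper DWT levels would be obtained by applying `haar_dwt_level` again to the returned scaling coefficients.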
As noted earlier, our interest is in the augmentation of the FCM algorithm so that the spatiotemporal nature of the data can be fully utilized in the clustering process. The aim of the FCM is to construct a collection of “c” information granules—clusters with the structure of data described by a collection of prototypes $v_1, v_2, \ldots, v_c$ and a fuzzy partition matrix $U = [u_{ik}]$, $i = 1, 2, \ldots, c$, $k = 1, 2, \ldots, n$, where $u_{ik} \in [0, 1]$, $\sum_{i=1}^{c} u_{ik} = 1\ \forall k$, and $0 < \sum_{k=1}^{n} u_{ik} < n\ \forall i$. This structure arises through the minimization of the following objective function:

$$J = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m}\, d^2(v_i, x_k) \tag{7}$$

where $m$ ($m > 1$) is a fuzzification coefficient. The distance $d$ used in the objective function is usually viewed as the Euclidean distance or its relative such as the weighted Euclidean or the Mahalanobis distance [9]. When it comes to the spatiotemporal data, the key point is to prudently capture a notion of distance, which will clearly distinguish between the spatial and the temporal components in the problem at hand. Likewise, we may like to accommodate a crucial possibility to strike a sound tradeoff between the distance determined with regard to the spatial and the temporal parts of the feature vector. This is accomplished by forming an additive form of the distance function composed of the two components

$$d_\lambda^2(v_i, x_k) = \|v_i(s) - x_k(s)\|^2 + \lambda \|v_i(t) - x_k(t)\|^2, \quad \lambda \ge 0. \tag{8}$$

This augmented distance allows us to control the effect of each part of data in the determination of the overall Euclidean distance and helps strike a sound balance between the impact of the spatial and temporal components of the data. When λ = 0, the spatial component is considered and the temporal part is completely ignored. The higher the value of λ, the more substantial the impact of the temporal part of the spatiotemporal data on the discovery of the structure. Subsequently, the aforementioned distance function is used in the objective function

$$J = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m}\, d_\lambda^2(v_i, x_k). \tag{9}$$

Carrying out the optimization of $J$, we arrive at the following expressions for the prototypes and the partition matrix

$$v_i = \frac{\sum_{k=1}^{n} u_{ik}^{m} x_k}{\sum_{k=1}^{n} u_{ik}^{m}} \tag{10}$$

$$u_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \dfrac{d_\lambda(v_i, x_k)}{d_\lambda(v_j, x_k)} \right)^{2/(m-1)}}. \tag{11}$$

As usual, these two formulas are used in an iterative way in which the partition matrix and the prototypes are updated in a consecutive fashion. While the weight factor (λ) offers a badly needed flexibility to the method and could help in its optimization, it becomes crucial to arrive at a constructive way of selecting its optimal value. In what follows, we introduce two evaluation criteria using which the factor’s value becomes optimized.

Fig. 1. Overall scheme of evaluation of the clustering process completed with the aid of (a) RC and (b) PC.

IV. EVALUATION CRITERIA

The two criteria of interest are concerned with a way in which the results of clustering are evaluated. Those are the reconstruction criterion (RC) [10] and prediction criterion (PC) [11]. Fig. 1 highlights the essence of these two criteria.

Our starting point is the result of clustering expressed in terms of the prototypes and the partition matrix. The clustering was realized for a certain value of λ.

A. Reconstruction Criterion

The essence of this evaluation process is to “reconstruct” the original data using the cluster prototypes and the partition matrix by minimizing the following sum of distances [10]:

$$F = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \|v_i - \hat{x}_k\|^2 \tag{12}$$

where $\hat{x}_k$ is the reconstructed version of $x_k$. By zeroing the gradient of $F$ with respect to $\hat{x}_k$, we have

$$\hat{x}_k = \frac{\sum_{i=1}^{c} u_{ik}^{m} v_i}{\sum_{i=1}^{c} u_{ik}^{m}}. \tag{13}$$

Once the reconstruction has been completed, viz., $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n$ were constructed with the use of (13), the quality of reconstruction regarded as a function of λ is expressed in the form

$$E(\lambda) = \sum_{k=1}^{n} \|x_k - \hat{x}_k\|^2 = \sum_{k=1}^{n} \|x_k(s) - \hat{x}_k(s)\|^2 + \sum_{k=1}^{n} \|x_k(t) - \hat{x}_k(t)\|^2 \tag{14}$$

where

$$\|x_k(s) - \hat{x}_k(s)\|^2 = \frac{1}{r} \sum_{j=1}^{r} \frac{(x_{kj}(s) - \hat{x}_{kj}(s))^2}{\sigma_j^2} \tag{15}$$

and

$$\|x_k(t) - \hat{x}_k(t)\|^2 = \frac{1}{q} \sum_{j=1}^{q} \frac{(x_{kj}(t) - \hat{x}_{kj}(t))^2}{\sigma_j^2} \tag{16}$$

and $\sigma_j^2$ is the variance of the $j$th feature. Given that commonly the spatial part and the temporal part are expressed in spaces of very different dimensionalities (typically $r \ll q$), in these two, we use the normalized Euclidean distances in order to avoid any bias toward any particular component of the distance. The reconstruction error $E(\lambda)$ is a function of λ and its minimum is determined by a systematic sweeping through a certain range of the values of λ. This approach, instead of any more sophisticated 1-D search, is considered because learning about the form of this index as a function of λ is also of interest.
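The iteration (10)-(11) with the augmented distance (8), the reconstruction (13), the error (14)-(16), and the sweep over λ can be sketched as follows. This is a hedged illustration under stated assumptions rather than the authors' code: the function names, the random initialization, the stopping tolerance, and the guard for zero-variance features are all choices of this sketch, and NumPy is assumed.

```python
import numpy as np

def augmented_fcm(Xs, Xt, c, lam, m=2.0, iters=200, tol=1e-6, seed=0):
    """FCM with d^2 = ||v(s)-x(s)||^2 + lam * ||v(t)-x(t)||^2, following (8)-(11)."""
    X = np.hstack([Xs, Xt])
    # Per-feature weights: 1 for spatial features, lam for temporal ones.
    w = np.concatenate([np.ones(Xs.shape[1]), np.full(Xt.shape[1], float(lam))])
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)  # memberships of each data point sum to 1
    for _ in range(iters):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)                   # prototypes, (10)
        d2 = (((V[:, None, :] - X[None, :, :]) ** 2) * w).sum(axis=2)  # distances, (8)
        inv = np.maximum(d2, 1e-12) ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)                                  # memberships, (11)
        done = np.abs(U_new - U).max() < tol
        U = U_new
        if done:
            break
    Um = U ** m
    V = (Um @ X) / Um.sum(axis=1, keepdims=True)  # prototypes consistent with final U
    return V, U

def reconstruction_error(Xs, Xt, V, U, m=2.0):
    """Normalized reconstruction error E(lambda) of (14)-(16), using (13)."""
    X = np.hstack([Xs, Xt])
    r, q = Xs.shape[1], Xt.shape[1]
    Um = U ** m
    X_hat = (Um.T @ V) / Um.sum(axis=0)[:, None]  # reconstructed data, (13)
    var = X.var(axis=0)
    var = np.where(var > 0, var, 1.0)  # guard for constant features (assumption)
    z = (X - X_hat) ** 2 / var
    return float(z[:, :r].sum() / r + z[:, r:].sum() / q)

def sweep_lambda(Xs, Xt, c, lambdas, m=2.0):
    """Evaluate E(lambda) on a grid and return the minimizing value with all errors."""
    errors = [reconstruction_error(Xs, Xt, *augmented_fcm(Xs, Xt, c, lam, m), m)
              for lam in lambdas]
    return lambdas[int(np.argmin(errors))], errors
```

Returning the whole list of errors, and not only the minimizer, mirrors the stated interest in the shape of the index as a function of λ; the grid of λ values is left to the caller.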
B. Prediction Criterion

The essence of the PC is to “predict” the temporal component of the data by using the available spatial structure. Since each data point is composed of the spatial and the temporal parts, the cluster centers (prototypes) are composed of the spatial part $v(s)$ and the temporal part $v(t)$ as well. Using the spatial part of data along with the spatial part of the calculated cluster centers, we form a new partition matrix, which is denoted by $\tilde{U}$, as follows [11]:

$$\tilde{u}_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \dfrac{\|v_i(s) - x_k(s)\|}{\|v_j(s) - x_k(s)\|} \right)^{2/(m-1)}}. \tag{17}$$

With the use of this new partition matrix and the temporal part of the cluster centers $v(t)$, we minimize the following sum of distances:

$$F = \sum_{i=1}^{c} \sum_{k=1}^{n} \tilde{u}_{ik}^{m} \|v_i(t) - \hat{x}_k(t)\|^2 \tag{18}$$

where $\hat{x}_k(t)$ is the predicted temporal part of the $k$th data. By zeroing the gradient of $F$ with respect to $\hat{x}_k(t)$, we have

$$\hat{x}_k(t) = \frac{\sum_{i=1}^{c} \tilde{u}_{ik}^{m} v_i(t)}{\sum_{i=1}^{c} \tilde{u}_{ik}^{m}}. \tag{19}$$

The quality of prediction is evaluated using the following prediction error:

$$E(\lambda) = \sum_{k=1}^{n} \|x_k(t) - \hat{x}_k(t)\|^2 = \sum_{k=1}^{n} \frac{1}{q} \sum_{j=1}^{q} \frac{(x_{kj}(t) - \hat{x}_{kj}(t))^2}{\sigma_j^2}. \tag{20}$$

It takes on a form of the sum of the normalized Euclidean distances between the temporal part of the data and the predicted temporal part. As in the previous criterion, the intent is to minimize $E(\lambda)$ by adjusting the value of λ. Algorithm 1 shows the pseudocode of the proposed algorithm.

V. EXPERIMENTAL STUDIES: USE OF SYNTHETIC DATA

In this section, we investigate the behavior of the clustering results quantified in terms of the criteria of reconstruction and prediction for two synthetic datasets. Fig. 2(a) shows the spatial component of these datasets where P1, P2, P3, and P4 are groups associated with four categories of time series of length of 256 samples. We considered two scenarios. In the first one, Fig. 2(b), the time series are clearly distinguishable, while those shown in Fig. 2(c) exhibit a significant level of overlap (less distinguishable data). The generated time series in these figures are a kind of increasing and decreasing time series encountered in control chart patterns [50].

Fig. 2. Synthetic spatiotemporal data. (a) Spatial component, (b) temporal component of more distinguishable dataset, and (c) temporal component of less distinguishable dataset.

In Fig. 3, we presented one of the time series along with its corresponding representations, namely DFT(32), PAA(32), and DWT(32). The notation DFT(32) means the DFT with length 32.

Fig. 3. (a) Selected time series and its representations with the use of (b) DFT(32), (c) PAA(32), and (d) DWT(32).

We systematically sweep through the range of values of λ to find its value where the reconstruction or prediction error (based on the evaluation criterion) attains its minimum. Table I presents the optimal values of λ along with the corresponding reconstruction error reported for several numbers of clusters, i.e., c = 2, 3, and 4, and different representation methods with lengths 8, 16, and 32. Notice that the reported reconstruction error is a sum of the squared Euclidean distances between the
original extracted features and the reconstructed features [see (14)]. In all experiments, the value of the fuzzification coefficient m was set to 2.

ALGORITHM 1
PSEUDOCODE OF THE CLUSTERING METHOD USING RC AND PC

TABLE I
OPTIMAL VALUES OF λ AND THE ASSOCIATED RECONSTRUCTION ERROR FOR THE SYNTHETIC DATASETS

The table visualizes the effect of different parameters on the optimal value of λ and the resulting reconstruction error. Among different representation methods, the DFT representation has the lowest value of the optimal λ, while the DWT assumes the highest value. The reason is that the magnitude of features is different depending on the representation method used.

As shown in this table, the higher the dimensionality of the representation space used for the temporal part of data, the lower the optimal value of λ, which prevents a bias toward the temporal part in the clustering process. With the increase of the number of clusters, the reconstruction error becomes reduced. Having more visible structure in the more distinguishable dataset [see Fig. 2(b)], its reconstruction error usually is lower than the one reported for the less distinguishable dataset. Table II shows the results obtained when using the PC.

TABLE II
OPTIMAL VALUES OF λ AND THE ASSOCIATED PREDICTION ERROR FOR THE SYNTHETIC DATASETS

We can see that most of the conclusions obtained when dealing with the RC hold here. There is an exception, however: Sometimes with the increase in the number of clusters, the error does not decrease. For example, the value of the error for c = 3 is higher than the one for c = 2 because for the generated datasets, by considering the number of clusters c = 3, the “position” of the spatial part of prototypes and the “structure” of the temporal part of prototypes are not efficient for prediction, as the predicted time series are the weighted (calculated by the position of spatial part of prototypes in the form of $\tilde{U}$) average of temporal parts of prototypes.
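The weighted average mentioned above, i.e., the prediction step (17)-(20), can be sketched as follows, given prototypes that have already been obtained by clustering. This is an illustrative sketch under stated assumptions: the function names and the guard for zero distances and zero variances are hypothetical, NumPy is assumed, and `Vs` and `Vt` denote the spatial and temporal parts of the prototypes.

```python
import numpy as np

def predict_temporal(Xs, Vs, Vt, m=2.0):
    """Predict temporal parts from spatial parts only, following (17) and (19)."""
    d2 = ((Vs[:, None, :] - Xs[None, :, :]) ** 2).sum(axis=2)  # squared spatial distances, shape (c, n)
    inv = np.maximum(d2, 1e-12) ** (-1.0 / (m - 1.0))
    U_tilde = inv / inv.sum(axis=0)                            # spatial partition matrix, (17)
    Um = U_tilde ** m
    Xt_hat = (Um.T @ Vt) / Um.sum(axis=0)[:, None]             # weighted average of v_i(t), (19)
    return Xt_hat, U_tilde

def prediction_error(Xt, Xt_hat):
    """Normalized prediction error of (20)."""
    var = Xt.var(axis=0)
    var = np.where(var > 0, var, 1.0)  # guard for constant features (assumption)
    return float((((Xt - Xt_hat) ** 2) / var).sum() / Xt.shape[1])
```

Because every prediction is a convex combination of the temporal prototypes, a data point located midway between two spatial prototypes is predicted as the plain average of their temporal parts, which illustrates why an unfavorable placement of prototypes can raise the prediction error even when more clusters are used.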
A. Effect of λ on the Performance and Results of Clustering

In this experiment, we show how λ impacts the effect arising from the temporal and spatial components of the data. We use the less distinguishable dataset [see Fig. 2(c)], set the number of clusters to 2, and use PAA(16) as the representation method of the time series; both the RC and the PC are considered. Fig. 4 shows the results in the form of a contour plot of the obtained membership functions. The values λ = 0 and λ = 10 000 are treated as the extreme cases: when λ = 0, only the spatial part is involved in clustering, while the second boundary focuses on the temporal part of the data. It becomes visible that the changes of λ lead to the shift of the contour plots which are reflective of the growing impact of the temporal or spatial component of the data. In the sequel, we investigate the impact of λ on the reconstruction and prediction errors. In the series of experiments, we set the number of clusters to c = 3. The DFT(16) is used as the representation method. Fig. 5 displays the plots for (a) RC and (b) PC. The optimal value of λ is clearly visible.

Fig. 4. Contour plots of membership functions for selected values of λ and c = 2, PAA(16) representation, and less distinguishable dataset. (a) λ = 0. (b) λ = 1 and RC. (c) λ = 1 and PC. (d) λ = 3 and RC. (e) λ = 3 and PC. (f) λ = 10 000.

Fig. 5. Plots of reconstruction and prediction errors versus λ for c = 3 and DFT(16) representation. (a) Reconstruction error and (b) prediction error.

VI. EXPERIMENTAL STUDIES: USE OF REAL-WORLD DATA

In this section, we investigate the proposed method in application to the Alberta temperature dataset, including daily average temperature.

A. Analysis of Alberta Temperature Data

Alberta Agriculture and Rural Development provides updated agriculture-related data including daily temperature, humidity, precipitation, etc. The data are recorded by a number of stations located within the province of Alberta, Canada. For each station, the geographical coordinates in the form of its latitude and longitude are provided. These data are available online at www.agric.gov.ab.ca. In this system, the end-user can select the required stations and pertinent agriculture-related variables to download the data. Fig. 6(a) shows a snapshot of the system with three highlighted stations located in South East, South West, and North West Alberta.

Fig. 6(b) shows the average daily temperature recorded at these stations in 2009. The collected data are of high relevance to various groups of users. Epidemiologists form one such group: they seek to better understand the relationships between measures of environmental health and measures of animal health, for example, the relationships between province-wide precipitation, temperature, and humidity, and the dynamics of the prevalence of endemic diseases and possible outbreaks. Animal health information includes temporal and spatial field level veterinarian observations (e.g., preliminary syndromes and clinical diagnoses) and laboratory results from submitted field samples. While this understanding enhances knowledge of the dynamics of interacting environmental and health domains through ad hoc analyses, it also supports the development of near-real-time surveillance systems. These systems provide operational insight into these dynamic relationships needed for ongoing monitoring, response, and improved control of biosafety and risk of diseases. As can be seen from Fig. 6(b), different stations located in different parts of the province come with different temperature patterns. Therefore, grouping
TABLE III
OPTIMAL VALUE OF λ AND THE ASSOCIATED RECONSTRUCTION ERROR FOR
246 STATIONS IN THE ALBERTA TEMPERATURE DATASET IN DIFFERENT
SEASONS OF 2009
Fig. 6. (a) Snapshot of the Alberta Agriculture and Rural Development sys-
tem and three highlighted stations (www.agric.gov.ab.ca). (b) Daily average
temperature in 2009 for the highlighted stations.
(clustering) these stations based on their locations and their

daily average temperature (or any other variable e.g. precipita-
tion) generates some useful insights with potential applicability
to various domains. We consider the temperature data recorded
during 2009–2011 at 246 stations located across Alberta. Notice
that in the experiments, in the first step, we project latitude and
longitude coordinates to Cartesian coordinates to be used in the
calculations of the Euclidean distance.
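As an illustration of this preprocessing step, the following Python sketch projects station coordinates to a plane and evaluates the augmented squared distance in its single-variable form (the case M = 1 of (24)). The text does not state which projection was used, so the equirectangular approximation, the Earth radius constant, and the function names below are assumptions made only for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def project_equirectangular(lat_deg, lon_deg, lat0_deg, lon0_deg):
    """Map (lat, lon) in degrees to planar (x, y) in km around a reference point.

    Equirectangular approximation; this is an assumed projection, not
    necessarily the one used in the experiments.
    """
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    lat0, lon0 = math.radians(lat0_deg), math.radians(lon0_deg)
    x = EARTH_RADIUS_KM * (lon - lon0) * math.cos(lat0)
    y = EARTH_RADIUS_KM * (lat - lat0)
    return (x, y)


def augmented_sq_distance(v_s, v_t, x_s, x_t, lam):
    """Squared spatial distance plus lam times the squared temporal distance."""
    spatial = sum((a - b) ** 2 for a, b in zip(v_s, x_s))
    temporal = sum((a - b) ** 2 for a, b in zip(v_t, x_t))
    return spatial + lam * temporal


if __name__ == "__main__":
    # Hypothetical station and prototype positions (degrees), projected
    # around an arbitrary reference point.
    ref = (54.0, -115.0)
    station_xy = project_equirectangular(53.5, -113.5, *ref)
    prototype_xy = project_equirectangular(51.0, -114.0, *ref)
    # Hypothetical two-feature temporal representations.
    station_t, prototype_t = (1.0, 2.0), (1.5, 1.0)
    for lam in (0.0, 1.0, 10000.0):
        d2 = augmented_sq_distance(prototype_xy, prototype_t, station_xy, station_t, lam)
        print(lam, d2)
```

With `lam = 0` the temporal term vanishes and only the projected coordinates matter, while a very large `lam` makes the temporal term dominate, which mirrors the two extreme cases discussed for λ. Note that the spatial term is in squared kilometres here, so the useful range of `lam` depends on how both parts are scaled.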
1) Alberta Temperature Data in Different Seasons: Reconstruction Criterion. We split the daily average temperature data recorded in 2009 into four seasons (Spring, Summer, Fall, and Winter) and run the experiments using the RC, while the number of clusters varies from 2 to 5. The length of each time series is about 90 (depending on the season), and a representation length of 8 is chosen for each representation method. Table III summarizes the results.
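To make the notion of a length-8 representation concrete, the sketch below computes a piecewise aggregate approximation, PAA(8), of a series of about 90 daily values and expands it back to the original length. Since 90 is not divisible by 8, the near-equal segment boundaries used here are an assumption, as are the function names; the experiments may have handled uneven segments differently.

```python
def paa(series, n_segments):
    """Piecewise aggregate approximation: the mean of each of n_segments chunks.

    Chunk boundaries are floor(j * n / n_segments), so chunk lengths differ
    by at most one when n is not divisible by n_segments (an assumption).
    """
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    features = []
    for j in range(n_segments):
        start = j * n // n_segments
        end = (j + 1) * n // n_segments
        chunk = series[start:end]
        features.append(sum(chunk) / len(chunk))
    return features


def paa_reconstruct(features, n):
    """Expand PAA features back to length n by repeating each segment mean."""
    n_segments = len(features)
    out = []
    for j, value in enumerate(features):
        start = j * n // n_segments
        end = (j + 1) * n // n_segments
        out.extend([value] * (end - start))
    return out


if __name__ == "__main__":
    # Synthetic stand-in for one season of daily average temperatures.
    season = [10.0 + 0.1 * day for day in range(90)]
    features = paa(season, 8)
    approx = paa_reconstruct(features, len(season))
    error = sum((a - b) ** 2 for a, b in zip(season, approx))
    print(len(features), len(approx), error)
```

The squared difference between the series and its expansion shows how much detail a length-8 representation discards before any clustering takes place.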
As could have been expected, the reconstruction error decreases as more clusters are formed. Furthermore, from this table, we can see that in some cases, we have λopt = 0. This means that involving temporal information in these cases does not help the method to reconstruct the data more accurately. Fig. 7 shows the contour plot of the membership degrees of the clusters obtained for different seasons of the year.

For different seasons, we encounter different structures. This is quite reasonable because in some seasons, several locations on the map are similar in temperature, while in some other seasons, they might be very different. Moreover, we can see that the Spring clusters are similar to the Winter clusters, while the Summer clusters are similar to the Fall clusters. The reason is that in the Spring and Winter, the temperature is low in most parts of Alberta so that there is no significant difference in temperature in most stations. As a result, the spatial part of data has more effect on the resulting clusters. On the other hand, in the Summer and Fall, the magnitude of temperature in the Rocky Mountains area (south west Alberta) is significantly different from the temperature recorded in some other areas [as can be seen from Fig. 6(b)] so that the temporal part of the data has more effect. Fig. 8 shows the clusters obtained for the Summer 2009 data, using λopt and c = 3. The stars denote the spatial prototypes. There are clear differences between the clusters when using different representations of the time series. This is not surprising as different representation methods capture different facets of the time series. In addition, for each representation method, the distinguishability of the features can be different, and as a result, for different representation methods, the revealed structures in the temporal part of data can be more or less significant.

Fig. 7. Clusters visualized in the form of contour plot of the membership degrees for successive seasons of 2009, c = 2, and PAA(8) representation: (a) Spring, (b) Summer, (c) Fall, and (d) Winter.

Fig. 8. Clusters of spatiotemporal data: Summer 2009 data, c = 3, and (a) DFT(8), (b) PAA(8), and (c) DWT(8). The optimal values of λ are 0.35, 45, and 125, respectively.

2) Alberta Daily Average Temperature During 2009–2011: Prediction Criterion. We considered daily average temperature for 246 stations in Alberta in the time period 2009–2011 and built the clusters to investigate the PC. Table IV shows the optimal value of λ and its corresponding prediction error for these 246 stations and number of clusters c = 2, 4, 6, 8, and 10. The length of time series in each dataset is 365, and the length of representation methods is set to 32.

TABLE IV
PC FOR ALBERTA TEMPERATURE DATASET DURING 2009–2011. EACH CELL COMPRISES TWO ENTRIES: THE OPTIMAL VALUE OF λ AND THE ASSOCIATED PREDICTION ERROR

The plots in Fig. 9 illustrate the obtained clusters for c = 4. The clusters vary, depending upon the value of λ. The use of the optimal value gives rise to clusters that form a sound balance between the spatial and temporal resemblance. The results identify the region of the Rocky Mountains, prairie region (that is composed of the southern and northern sections of the province), and the northern part of the province (an upper portion of the map).

Fig. 9. Plot of spatiotemporal clusters for 2009 for (a) λ = 0, (b) λ = 10 000, and (c) λ = λopt using PC. The number of clusters is c = 4, and DFT(32) is used as the representation method.

B. Comparative Study

Pham [44] proposed a spatial model of FCM, called the Robust Fuzzy C-means Algorithm (RFCM), for image segmentation. This method uses a spatial penalty on membership degrees. The proposed objective function is

\[ V = \sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^{m}\,\lVert x_k - v_i\rVert^{2} + \frac{\beta}{2}\sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^{m}\sum_{l\in M_i}\sum_{j\in N_k} u_{lj}^{m} \tag{21} \]

where Nk denotes the neighbors of station k, and Mi = {1, 2, . . . , c} − {i}. Equation (21) is composed of two parts: the FCM objective function for the temporal part of data and a spatial regularization term. β is a weight that controls the effect of each part in clustering (like λ in our method). The aforementioned objective function can be minimized by calculating the partition matrix and prototypes in an iterative process. Suppose that the kth object has a high membership degree to the ith cluster. Minimizing (21) then leads to the reduction of the membership degrees of objects in Nk to the cluster centers in Mi. Coppi et al. [59] extended this method to cluster spatial time series. To compare our method (using the RC and PC) with the RFCM, we propose the following evaluation criterion:

\[ Q = \frac{J(x(s)\mid U)}{J(x(s))} + \frac{J(x(t)\mid U)}{J(x(t))} \tag{22} \]

where U is the optimal partition matrix in spatiotemporal clustering (resulting from the optimal λ in our methods and the optimal β in RFCM). J(x(s)|U) is the FCM objective function for the spatial part of data, obtained by considering U as its partition matrix and calculating new prototypes. J(x(s)) is the FCM objective function resulting from clustering the spatial part of data separately. In addition, x(t) denotes the temporal part of data. In fact, J(x(s)) and J(x(t)) are two normalization terms. The intuition behind the proposed criterion is that we consider a clustering as an “appropriate” clustering if it is suitable for both the spatial part and the temporal part of data. The lower the value of Q, the more appropriate the spatiotemporal clusters. Notice that since, in clustering the spatial (or temporal) part of data separately, we do not consider the other part, the resulting partition matrix will be the optimal one for that part, and obviously, we will have J(x(s)|U) ≥ J(x(s)) and J(x(t)|U) ≥ J(x(t)), and as a result, we always have Q ≥ 2 in (22). We calculated Q for RC, PC, and RFCM. In RFCM, to find the optimal value of β, a heuristic can be used. In [44] and [59], different values of β in a range are checked to optimize an objective function. This objective function minimizes a cross-validation error in [44] and maximizes a spatial autocorrelation in [59]. Since the evaluation criterion in this comparison is Q in (22), we check different values of β and select the one that minimizes it. Table V shows the comparison for different representations and different numbers of clusters for Alberta temperature data in 2009. As can be seen from this table, in most cases, RC and PC have a lower value of Q because these methods consider the same importance for each part of data in clustering, while RFCM pays less attention to the spatial part. In fact, in RFCM, the spatial part of data has been used to smooth the temporal clusters (like spatial smoothing of pixels in image processing). In addition, we can see that for different representation methods, there are different values of Q because each representation method captures special kinds of features, and based on these features, the temporal structures are different.

TABLE V
COMPARISON OF RC, PC, AND RFCM OVER THE EVALUATION CRITERION (22) FOR DIFFERENT REPRESENTATIONS AND NUMBER OF CLUSTERS

C. Prediction Abilities

In this experiment, we consider a part of the 2009 Alberta temperature dataset as the training samples xtrain, and the others as testing samples xtest, and predict the temporal part of the testing samples based on their spatial coordinates. The procedure of this experiment is given in the following.
1) Cluster the training samples using the augmented FCM and PC to find the optimal clusters (using optimal λ). The result is a set of spatiotemporal prototypes in the form of [vtrain(s)|vtrain(t)].
2) Using the spatial part of the testing samples xtest(s) and the spatial part of the calculated prototypes vtrain(s), calculate the new partition matrix Ũ using (17).
3) Predict the temporal part of the testing samples using Ũ and the temporal part of the calculated prototypes vtrain(t).
In this experiment, we consider ntest = 74 (around 30%) stations of the 2009 Alberta temperature dataset as the testing samples and the other stations as training samples.
Table VI shows the average prediction error for the testing set (called testing error), average prediction error for the training set (training error), and an average error rate for different representations and different number of clusters over 100 independent
runs. In addition, we define the error rate as

\[ E = \frac{\text{testing error}}{\text{training error}}. \tag{23} \]

In Table VI, with the increase of the number of clusters, both testing and training errors are reduced. This is quite reasonable since having more clusters means having more prototypes and more information about data, and as a result, the prediction can be more accurate. Moreover, because the clustering is performed on training samples, the defined error rate in (23) is always higher than 1, and by increasing the number of clusters, the reduction in training error is higher than the reduction in testing error so that the ratio of testing error to training error is increased.

TABLE VI
AVERAGE AND STANDARD DEVIATION OF TESTING ERROR, TRAINING ERROR, AND ERROR RATE REPORTED OVER 100 INDEPENDENT RUNS

Fig. 10(a) shows an example of selected stations as testing samples (star symbols) and the others as training samples. Three stations a, b, and c from testing samples have been labeled in this figure. Fig. 10(b) shows the optimal clustering (λopt = 0.65) of the training samples. In this figure, two prototypes, i.e., P1 and P2, are labeled. The number of clusters was set to 5, and the DFT(32) representation of time series is used.

Fig. 10. (a) Selected testing samples with three labeled stations a, b, and c for prediction. (b) Clusters of training samples with two labeled prototypes P1 and P2.

Fig. 11 shows the time series reconstructed from the original features (32 DFT features) and from the predicted features. Using the PC, the temporal part of stations a and b has been predicted with a high accuracy. However, the prediction for station c is not accurate because this station is between two clusters P1 and P2 (see Fig. 10) with two very different temporal patterns. In fact, the spatial part of c is close to P1, but its temporal part is close to P2.

Fig. 11. Original and predicted time series for (a) station a, (b) station b, and (c) station c.

Fig. 12 shows the original and predicted time series of station c along with the time series corresponding to the prototypes P1 and P2. Both predicted and original time series of station c are almost between the time series corresponding to P1 and P2. P1 has more effect on prediction, because the spatial part of station c is closer to the spatial part of P1, and as a result, P1 has a higher weight (in the form of membership degree Ũ) for prediction. One may consider more clusters to achieve more accurate prediction. For example, the prediction error for station c with number of clusters 2, 5, 8, and 12 is 1.283, 1.240, 0.684, and 0.511, respectively.

Fig. 12. Original and predicted time series for station c [in Fig. 10(a)] and the time series corresponding to the prototypes P1 and P2.

In the next step, we consider the entire data as training samples, and predict the temporal part of some unseen spatial coordinates in the map. The procedure is the same as used in the previous experiment. Fig. 13 shows two generated spatial points a and b in the map. In addition, for each point, a number of stations is selected as their neighbors. Fig. 14(a) and (b) shows the predicted time series for a and b along with the time series corresponding to their neighbors. As seen from these figures, the predicted time series for points a and b are similar to the time series of their neighbors.

Fig. 13. Generated two unseen spatial points a and b and their neighbors in the map.

Fig. 14. Predicted time series and the time series corresponding to the neighbors of (a) station a and (b) station b highlighted in Fig. 13.

The PC that has been used in this paper is different from the time-series forecasting methods proposed in the literature in both methodology and purpose. Our PC predicts the time series based on their spatial location and the time series formed in the cluster centers. In addition, in this method, the objective is to find an optimal tradeoff to regulate the interaction between spatial and temporal patterns in the clustering process and not to forecast the time series for future time steps. Time-series forecasting methods proposed in the literature (e.g., [23], [26], and [31]) usually assume that the time series follow a linear or nonlinear model and try to find the parameters of the corresponding model using historical data. Then, the generated model is used to forecast the time series in the future.

VII. CONCLUSION

We have introduced the concept and algorithmic framework of fuzzy clustering for spatiotemporal data. It was shown that given the different nature of the spatial and temporal components of the data, their different treatment is realized through a flexible distance function where the parameter λ, controlling the influence of temporal and spatial components, is optimized through the minimization of the RC or PC.

In this research, we confined ourselves to univariate time series. An interesting extension could be to consider multivariate time series. Here, the data come in the form xi = [xi(s)|xi1(t), xi2(t), . . . , xiM(t)]^T where xik(t) is the kth variable (e.g., temperature), and M is the number of variables present in the temporal part of data. As each time series might come with its own specificity, this could be reflected in the augmented additive distance function expressed as

\[ d_{\lambda}^{2}(v_i, x_k) = \lVert v_i(s) - x_k(s)\rVert^{2} + \lambda_1 \lVert v_{i1}(t) - x_{k1}(t)\rVert^{2} + \cdots + \lambda_M \lVert v_{iM}(t) - x_{kM}(t)\rVert^{2} \tag{24} \]

where M weight coefficients λ1, λ2, . . . , λM offer the required flexibility, and the values of these coefficients could be subject to optimization again by taking advantage of the RC or PC.

Another interesting development worth pursuing would be to investigate some other distance measures, e.g., the dynamic time
warping distance, longest common subsequence distance, etc. One has to be aware of the fact that as we encounter various distance functions, this may pose challenges for fuzzy clustering and call for further refinements of the generic FCM method to cope with the diversity of distance measures different from the Euclidean one.

REFERENCES

[1] C. Faloutsos, M. Ranganathan, and Y. Manolopoulos, “Fast subsequence matching in time-series databases,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 1994, pp. 419–429.
[2] E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra, “Dimensionality reduction for fast similarity search in large time series databases,” J. Knowl. Inf. Syst., vol. 3, no. 3, pp. 263–286, Aug. 2001.
[3] K.-P. Chan and A. W.-C. Fu, “Efficient time series matching by wavelets,” in Proc. Int. Conf. Data Eng., 1999, pp. 126–133.
[4] K.-P. Chan, A. W.-C. Fu, and C. Yu, “Haar wavelets for efficient similarity search of time-series: With and without time warping,” IEEE Trans. Knowl. Data Eng., vol. 15, no. 3, pp. 686–705, May/Jun. 2003.
[5] S. Mallat, “A theory for multiresolution signal decomposition: The wavelet representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 2, pp. 674–693, Jul. 1989.
[6] F. Korn, H. V. Jagadish, and C. Faloutsos, “Efficiently supporting ad-hoc queries in large datasets of time sequences,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, New York, 1997, pp. 289–300.
[7] D. Berndt and J. Clifford, “Using dynamic time warping to find patterns in time series,” in Proc. Workshop Knowledge Discovery Databases, 1994, pp. 359–370.
[8] J. Caiado, N. Crato, and D. Peña, “A periodogram-based metric for time series classification,” Comput. Statist. Data Anal., vol. 50, no. 10, pp. 2668–2684, Jun. 2006.
[9] J. C. Bezdek, Pattern Recognition With Fuzzy Objective Function Algorithms. New York: Plenum, 1981.
[10] W. Pedrycz and J. V. de Oliveira, “A development of fuzzy encoding and decoding through fuzzy clustering,” IEEE Trans. Instrum. Meas., vol. 57, no. 4, pp. 829–837, Apr. 2008.
[11] W. Pedrycz and A. Bargiela, “Fuzzy clustering with semantically distinct families of variables: Descriptive and predictive aspects,” Pattern Recognit. Lett., vol. 31, no. 13, pp. 1952–1958, Oct. 2010.
[12] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. Royal Statist. Soc., Series B, vol. 39, no. 1, pp. 1–38, 1977.
[13] S. Gaffney and P. Smyth, “Trajectory clustering with mixtures of regression models,” in Proc. 5th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 1999, pp. 63–72.
[14] M. Ankerst, M. M. Breunig, H. P. Kriegel, and J. Sander, “OPTICS: Ordering points to identify the clustering structure,” in Proc. ACM SIGMOD Int. Conf. Manag. Data, Philadelphia, PA, 1999, pp. 49–60.
[15] M. Nanni and D. Pedreschi, “Time-focused clustering of trajectories of moving objects,” J. Intell. Inf. Syst., vol. 27, no. 3, pp. 267–289, Nov. 2006.
[16] Y. Yang and K. Chen, “Time series clustering via RPCL network ensemble with different representations,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 41, no. 2, pp. 190–199, Mar. 2011.
[17] J. Lin, E. Keogh, L. Wei, and S. Lonardi, “Experiencing SAX: A novel symbolic representation of time series,” Data Mining Knowl. Discovery, vol. 15, no. 2, pp. 107–144, Aug. 2007.
[18] K. Chakrabarti, E. Keogh, S. Mehrotra, and M. Pazzani, “Locally adaptive dimensionality reduction for indexing large time series databases,” ACM Trans. Database Syst., vol. 27, no. 2, pp. 188–228, Jun. 2002.
[19] H. Cao, H. W. Deng, and Y. P. Wang, “Segmentation of M-FISH images for improved classification of chromosomes with an adaptive fuzzy C-means clustering algorithm,” IEEE Trans. Fuzzy Syst., vol. 20, no. 1, pp. 1–9, Feb. 2012.
[20] H. Ding, G. Trajcevski, P. Scheuermann, X. Wang, and E. Keogh, “Querying and mining of time series data: Experimental comparison of representations and distance measures,” in Proc. VLDB Endowment, Auckland, New Zealand, 2008, pp. 1542–1552.
[21] Y. Cai and R. Ng, “Indexing spatio-temporal trajectories with Chebyshev polynomials,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2004, pp. 599–610.
[22] E. Keogh, S. Chu, D. Hart, and M. Pazzani, “An online algorithm for segmenting time series,” in Proc. IEEE Int. Conf. Data Mining, 2001, pp. 289–296.
[23] G. E. P. Box and G. Jenkins, Time Series Analysis: Forecasting and Control. San Francisco, CA: Holden-Day, 1976.
[24] M. Ramoni, P. Sebastiani, and P. Cohen, “Bayesian clustering by dynamics,” Mach. Learn., vol. 47, no. 1, pp. 91–121, 2002.
[25] M. Vlachos, D. Gunopulos, and G. Kollios, “Discovering similar multidimensional trajectories,” in Proc. Int. Conf. Data Eng., 2002, pp. 673–684.
[26] M. H. Magalhães, R. Ballini, and F. A. C. Gomide, “Granular models for time-series forecasting,” in Handbook of Granular Computing, W. Pedrycz, A. Skowron, and V. Kreinovich, Eds. New York: Wiley-Interscience, 2008.
[27] T. W. Liao, “Clustering of time series data – a survey,” Pattern Recognit., vol. 38, no. 11, pp. 1857–1874, Nov. 2005.
[28] E. A. Maharaj, P. D’Urso, and D. U. A. Galagedera, “Wavelet-based fuzzy clustering of time series,” J. Classif., vol. 27, no. 2, pp. 231–275, 2010.
[29] P. D’Urso and E. A. Maharaj, “Autocorrelation-based fuzzy clustering of time series,” Fuzzy Sets Syst., vol. 160, no. 24, pp. 3565–3589, Dec. 2009.
[30] E. A. Maharaj and P. D’Urso, “Fuzzy clustering of time series in the frequency domain,” Inf. Sci., vol. 181, no. 7, pp. 1187–1211, Apr. 2011.
[31] H. G. Seedig, R. Grothmann, and T. A. Runkler, “Forecasting of clustered time series with recurrent neural networks and a fuzzy clustering scheme,” in Proc. Int. Joint Conf. Neural Netw., Atlanta, GA, 2009, pp. 2846–2853.
[32] C. S. Möller-Levet, F. Klawonn, K.-H. Cho, and O. Wolkenhauer, “Fuzzy clustering of short time series and unevenly distributed sampling points,” in Proc. 5th Int. Symp. Intell. Data Anal., 2003, pp. 28–30.
[33] X. Zhang, J. Liu, Y. Du, and T. Lv, “A novel clustering method on time series data,” Expert Syst. Appl., vol. 38, no. 9, pp. 11891–11900, Sep. 2011.
[34] F. Petitjean, A. Ketterlin, and P. Gancarski, “A global averaging method for dynamic time warping, with applications to clustering,” Pattern Recognit., vol. 44, no. 3, pp. 678–693, Mar. 2011.
[35] K. Kalpakis, D. Gada, and V. Puttagunta, “Distance measures for effective clustering of ARIMA time-series,” in Proc. IEEE Int. Conf. Data Mining, 2001, pp. 273–280.
[36] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. New York: Wiley, 1990.
[37] Y. Xiong and D. Yeung, “Time series clustering with ARMA mixtures,” Pattern Recognit., vol. 37, no. 8, pp. 1675–1689, Aug. 2004.
[38] G. Schwarz, “Estimating the dimension of a model,” Ann. Statist., vol. 6, no. 2, pp. 461–464, 1978.
[39] P. D’Urso, “Fuzzy clustering for data time arrays with inlier and outlier time trajectories,” IEEE Trans. Fuzzy Syst., vol. 13, no. 5, pp. 583–604, Oct. 2005.
[40] M. Sato and Y. Sato, “On a multicriteria fuzzy clustering method for 3-way data,” Int. J. Uncertainty Fuzziness Knowl.-Based Syst., vol. 2, no. 2, pp. 127–142, Jun. 1994.
[41] A. Lemos, W. Caminhas, and F. Gomide, “Multivariable Gaussian evolving fuzzy modeling system,” IEEE Trans. Fuzzy Syst., vol. 19, no. 1, pp. 91–104, Feb. 2011.
[42] Z. Chen, S. Aghakhani, J. Man, and S. Dick, “ANCFIS: A neuro fuzzy architecture employing complex fuzzy sets,” IEEE Trans. Fuzzy Syst., vol. 19, no. 2, pp. 305–322, Apr. 2011.
[43] S. Chen and C. Chen, “TAIEX forecasting based on fuzzy time series and fuzzy variation groups,” IEEE Trans. Fuzzy Syst., vol. 19, no. 1, pp. 1–12, Feb. 2011.
[44] D. L. Pham, “Spatial models for fuzzy clustering,” Comput. Vis. Image Understand., vol. 84, no. 2, pp. 285–297, 2001.
[45] V. Petridis and A. Kehagias, “Predictive modular fuzzy systems for time-series classification,” IEEE Trans. Fuzzy Syst., vol. 5, no. 3, pp. 381–397, Aug. 1997.
[46] S. M. Arafat and M. Skubic, “Modeling fuzziness measures for best wavelet selection,” IEEE Trans. Fuzzy Syst., vol. 16, no. 5, pp. 1259–1270, Oct. 2008.
[47] P. Kalnis, N. Mamoulis, and S. Bakiras, “On discovering moving clusters in spatio-temporal data,” in Proc. Int. Symp. Spatial Temporal Databases, 2005, pp. 364–381.
[48] L. Chen, M. T. Özsu, and V. Oria, “Robust and fast similarity search for moving object trajectories,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2005, pp. 491–502.
[49] L. F. S. Coletta, L. Vendramin, E. R. Hruschka, R. J. G. B. Campello, and W. Pedrycz, “Collaborative fuzzy clustering algorithms: Some refinements and design guidelines,” IEEE Trans. Fuzzy Syst., vol. 20, no. 3, pp. 444–462, Jun. 2012.
[50] D. T. Pham and A. B. Chan, “Control chart pattern recognition using a new type of self organizing neural network,” Proc. Inst. Mech. Eng., Part I: J. Syst. Control Eng., vol. 212, no. 2, pp. 115–127, 1998.
[51] S. Kisilevich, F. Mansmann, M. Nanni, and S. Rinzivillo, “Spatio-temporal clustering,” in Data Mining and Knowledge Discovery Handbook. New York: Springer, 2010, pp. 855–874.
[52] M. Kulldorff, “Prospective time periodic geographical disease surveillance using a scan statistic,” J. Roy. Statist. Soc. A, vol. 164, no. 1, pp. 61–72, 2001.
[53] H. Izakian and W. Pedrycz, “A new PSO-optimized geometry of spatial and spatio-temporal scan statistics for disease outbreak detection,” Swarm Evol. Comput., vol. 4, pp. 1–11, Jun. 2012.
[54] F. Di Martino and S. Sessa, “The extended fuzzy C-means algorithm for hotspots in spatio-temporal GIS,” Expert Syst. Appl., vol. 38, no. 9, pp. 11829–11836, Sep. 2011.
[55] M. Wang, A. Wang, and A. Li, “Mining spatial-temporal clusters from geo-databases,” in Proc. 2nd Int. Conf. Adv. Data Mining Appl., 2006, pp. 63–270.
[56] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” Data Mining Knowl. Discovery, pp. 226–231, 1996.
[57] Z. Liu and R. George, “Fuzzy cluster analysis of spatio-temporal data,” in Proc. 18th Int. Symp. Comput. Inf. Sci., Antalya, Turkey, 2003, pp. 984–991.
[58] M. Deng, Q. Liu, J. Wang, and Y. Shi, “A general method of spatio-temporal clustering analysis,” Sci. Chin. Inf. Sci., pp. 1–14, 2011.
[59] R. Coppi, P. D’Urso, and P. Giordani, “A fuzzy clustering model for multivariate spatial time series,” J. Classif., vol. 27, no. 1, pp. 54–88, Mar. 2010.
[60] Y. C. Cheng and S. T. Li, “Fuzzy time series forecasting with a probabilistic smoothing hidden Markov model,” IEEE Trans. Fuzzy Syst., vol. 20, no. 2, pp. 291–304, Apr. 2012.
[61] J. Wu, H. Xiong, C. Liu, and J. Chen, “A generalization of distance functions for fuzzy C-means clustering with centroids of arithmetic means,” IEEE Trans. Fuzzy Syst., vol. 20, no. 3, pp. 557–571, Jun. 2012.
[62] J. P. Mei and L. Chen, “A fuzzy approach for multitype relational data clustering,” IEEE Trans. Fuzzy Syst., vol. 20, no. 2, pp. 358–371, Apr. 2012.

Hesam Izakian (S’12) received the M.S. degree in computer engineering (artificial intelligence) from the University of Isfahan, Isfahan, Iran. He is currently working toward the Ph.D. degree with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada.
He is working under the supervision of Prof. W. Pedrycz. His research interests include computational intelligence, knowledge discovery and data mining, pattern recognition, and software engineering.

Witold Pedrycz (M’88–SM’94–F’99) received the M.Sc., Ph.D. and D.Sci. degrees from the Silesian University of Technology, Gliwice, Poland.
He is currently a Professor and Canada Research Chair (CRC computational intelligence) with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada. In 2009, he was elected as a foreign member of the Polish Academy of Sciences, Warsaw, Poland. He is the author of 14 research monographs covering various aspects of computational intelligence and software engineering. He is also with the Department of Electrical and Computer Engineering, Faculty of Engineering, King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia. His main research interests include computational intelligence, fuzzy modeling and granular computing, knowledge discovery and data mining, fuzzy control, pattern recognition, knowledge-based neural networks, relational computing, and software engineering. He has published numerous papers in this area.
Prof. Pedrycz was elected as a Fellow of the Royal Society of Canada in 2012. He has been a member of numerous program committees of IEEE conferences in the area of fuzzy sets and neurocomputing. He is intensively involved in editorial activities. He is an Editor-in-Chief of Information Sciences and Editor-in-Chief of the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS–PART A: SYSTEMS AND HUMANS. He currently serves as an Associate Editor of the IEEE TRANSACTIONS ON FUZZY SYSTEMS and is a member of a number of editorial boards of other international journals. In 2007, he received a prestigious Norbert Wiener Award from the IEEE Systems, Man, and Cybernetics Council. He received the IEEE Canada Computer Engineering Medal in 2008. In 2009, he received a Cajastur Prize for soft computing from the European Centre for Soft Computing for “pioneering and multifaceted contributions to granular computing.”

Iqbal Jamal received the M.S. degree in management science and the M.A.Sc.Eng. degree from the University of British Columbia, Vancouver, BC, Canada.
He is currently a Principal of AQL Management Consulting (AQLMC) Inc., Edmonton, AB, Canada: a data mining/analytics-based company. AQLMC specializes in developing and implementing data analytics in support of anomaly detection for animal, human, and environmental health. AQLMC also conducts operations analysis for public sector services.
