
Performance of Fuzzy C-Means Method in Rainfall Clustering

Achmad Fanany Onnilita Gaffar1, Haviluddin2, Novianti Puspitasari3, Edy Budiman4

1 Department of Information Technology, State of Polytechnic Samarinda, Indonesia
2,3,4 Faculty of Computer Science and Information Technology, Mulawarman University, Indonesia

*Email: haviluddin@gmail.com, miechan.novianti@gmail.com

Abstract. Accurate clustering of rainfall is important for daily life. For this purpose, a
computational intelligence method, namely Fuzzy C-Means (FCM), was applied to investigate
rainfall in East Kalimantan, Indonesia. The experiment showed that the FCM method performed
well in modelling the rainfall and was able to preserve the clustering of the observed data. In
other words, the method can be used for modelling and clustering rainfall time series datasets.
The clusters describe high, medium, and low rainfall at certain times. Furthermore, the
clustering results could serve as a recommendation for daily activities in areas such as
socio-economics, agriculture, plantation, fisheries, and so forth.

Keywords: FCM, rainfall, clustering

1. Introduction
Weather is one of the determinants of climatic conditions, and rainfall is one of the factors that
directly affects the type or variation of climate. Therefore, information on rainfall is an important
element that greatly affects many aspects of life in a region, such as public safety, socio-economics,
agriculture, plantation, fisheries, and even aviation. East Kalimantan Province receives considerable
rainfall every year, with rainfall patterns that vary by place and time and are difficult to predict.
Therefore, appropriate methods are needed to obtain information on precipitation patterns, so that
people can know and use rainfall pattern information according to their needs [1].
To date, various data processing methods have been implemented for prediction, clustering, and
classification, including fuzzy methods [2]. Fuzzy methods can group data based on similarity. One of
the fuzzy clustering methods is FCM. Some researchers note that the FCM method has been widely
applied to support decisions in various studies because of its good performance [3, 4]. Several studies
have compared the performance of the FCM method with the K-Means method on connection-oriented
telecommunication data and meteorological data, reporting that FCM performs better than K-Means
based on the computing time of each method [3, 5]. Other studies suggest that the FCM algorithm
shows excellent performance with regard to the quality of the data grouping.
In this research, the FCM method was applied to group the rainfall patterns that occurred in the East
Kalimantan region into three categories: low, medium, and high. The aim of this study is to minimize
the objective function of the clustering process, so that the density of the data clusters based on the
degree of membership is good [5]. Section 2 describes the FCM model. Section 3 presents the analysis
and discussion of the results. Finally, conclusions are summarized in Section 4.

2. Method
This section briefly describes the rainfall clustering model based on Fuzzy C-Means (FCM).

2.1. Fuzzy C-Means


In this study, the FCM method was implemented to cluster rainfall. In principle, the FCM method is
based on a criterion that sums the distances between the data and the cluster centroids. The FCM
clustering method therefore partitions a set of data into a number of clusters, in which all objects in
each cluster have a certain degree of similarity. Each feature vector is then assigned a membership
degree in the interval [0, 1] for each cluster by the membership function.
The FCM method is also a Point-Prototype Clustering (PPC) method whose output is the optimum
partition of centroids. The optimum centroid partition is obtained by minimizing the objective
function in Eq. 1, with the distance defined in Eq. 2.

$$J_{FCM}(U, V) = \sum_{j=1}^{N} \sum_{i=1}^{C} (u_{ij})^{q} (d_{ji})^{2} \qquad (1)$$

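Where:
U = fuzzy C-partition of the dataset (membership matrix)
V = set of prototype centroids, $V = \{v_1, v_2, \ldots, v_C\} \subset R^{P}$

$$(d_{ji})^{2} = \|x_j - v_i\|^{2} = (x_{j(row)} - v_{i(row)})^{2} + (x_{j(col)} - v_{i(col)})^{2} \qquad (2)$$

$d_{ji}$ = Euclidean distance between $x_j$ and $v_i$, written in Eq. 2 for two components (row, col), i.e. P = 2
$x_j$ = j-th feature vector, with data feature vectors $X = \{x_1, x_2, \ldots, x_N\} \subset R^{P}$
$v_i$ = centroid of the i-th cluster
$u_{ij}$ = membership degree of $x_j$ in the i-th cluster
N = total number of data
C = total number of clusters
q = fuzzifier parameter, q > 1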
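
As an illustration of Eq. 1, the objective function can be evaluated directly from a membership matrix and a set of centroids. The following is only a minimal sketch in plain Python, not the MATLAB code used in this study; the function names `squared_distance` and `fcm_objective` and the toy data are hypothetical.

```python
def squared_distance(x, v):
    """Squared Euclidean distance (d_ji)^2 between a feature vector and a centroid."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def fcm_objective(X, V, U, q=2.0):
    """J_FCM(U, V) = sum over j and i of (u_ij)^q * (d_ji)^2, as in Eq. 1.

    X: list of N feature vectors; V: list of C centroids;
    U: C x N membership matrix with U[i][j] = u_ij.
    """
    return sum(
        (U[i][j] ** q) * squared_distance(X[j], V[i])
        for j in range(len(X))
        for i in range(len(V))
    )


# Toy data (hypothetical, not from the rainfall dataset).
X = [(0.0, 0.0), (1.0, 1.0)]
V = [(0.0, 0.0), (1.0, 1.0)]
U = [[1.0, 0.0],   # memberships of both points in cluster 1
     [0.0, 1.0]]   # memberships of both points in cluster 2
print(fcm_objective(X, V, U))  # each point lies on its own centroid, so J = 0.0
```

With fully shared memberships (all u_ij = 0.5) the same data give a larger objective value, which is why minimizing Eq. 1 pushes each point towards the centroid it is closest to.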

The steps of the FCM clustering algorithm are as follows.

1. Initialize the centroid vectors $v_i$ (prototypes).
2. Calculate the distance between each feature vector in X and each centroid vector in V. Each
   feature vector is assigned as a member of the cluster whose centroid is closest.
3. Calculate the membership degree of all feature vectors in all clusters using Eq. 3.

$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left[ \frac{(d_{ji})^{2}}{(d_{jk})^{2}} \right]^{1/(q-1)}} = \frac{\left[ \frac{1}{(d_{ji})^{2}} \right]^{1/(q-1)}}{\sum_{k=1}^{C} \left[ \frac{1}{(d_{jk})^{2}} \right]^{1/(q-1)}} \qquad (3)$$

4. Calculate the new centroids using Eq. 4.

$$\hat{v}_i = \frac{\sum_{j=1}^{N} (u_{ij})^{q} x_j}{\sum_{j=1}^{N} (u_{ij})^{q}} \qquad (4)$$

5. Recalculate the membership degrees (step 3) with the new centroids, $u_{ij} \to \hat{u}_{ij}$.
6. If $\max_{ij} |u_{ij} - \hat{u}_{ij}| < \varepsilon$, where $\varepsilon$ is a termination criterion in [0, 1], the iteration
   stops; otherwise, return to step 4.

[Flowchart: START; read feature vectors X = (x1, x2, ..., xn); set cluster number K and target error ε; initialize centroid vectors C = (c1, c2, ..., cK); calculate distance between feature and centroid vectors, D = dist(X', C); initialize matrix of cluster members G = (1..n; 1..K); determine cluster members by minimum D; calculate membership degree u of X; calculate new centroid vectors Cnew; if |Cnew - C| > ε, set C = Cnew and repeat; otherwise END.]

Figure 1. FCM algorithm
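
The iteration of the FCM algorithm can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the MATLAB implementation used in this study: the initial centroids are supplied by the caller (the paper does not state how they are initialized), the stopping rule is the change in membership degrees, and the function names and toy data are hypothetical.

```python
def sqdist(x, v):
    """Squared Euclidean distance (d_ji)^2."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def update_memberships(X, V, q):
    """Eq. 3: membership degree u_ij of every point j in every cluster i."""
    C, N = len(V), len(X)
    U = [[0.0] * N for _ in range(C)]
    for j, x in enumerate(X):
        d2 = [sqdist(x, v) for v in V]
        zeros = [i for i, d in enumerate(d2) if d == 0.0]
        if zeros:
            # The point coincides with one or more centroids: share membership among them.
            for i in zeros:
                U[i][j] = 1.0 / len(zeros)
            continue
        inv = [(1.0 / d) ** (1.0 / (q - 1.0)) for d in d2]
        total = sum(inv)
        for i in range(C):
            U[i][j] = inv[i] / total
    return U


def update_centroids(X, U, q):
    """Eq. 4: membership-weighted mean (assumes each cluster has non-zero total membership)."""
    dim = len(X[0])
    V = []
    for row in U:
        w = [u ** q for u in row]
        s = sum(w)
        V.append(tuple(sum(wj * x[k] for wj, x in zip(w, X)) / s for k in range(dim)))
    return V


def fcm(X, V0, q=2.0, eps=1e-5, max_iter=100):
    """Alternate Eq. 4 and Eq. 3 until the largest membership change is below eps."""
    V = list(V0)
    U = update_memberships(X, V, q)
    for _ in range(max_iter):
        V = update_centroids(X, U, q)
        U_new = update_memberships(X, V, q)
        change = max(abs(a - b) for old, new in zip(U, U_new) for a, b in zip(old, new))
        U = U_new
        if change < eps:
            break
    return V, U


# Toy data (hypothetical): two well-separated groups of three points each.
X = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0), (5.0, 5.1), (5.1, 5.0), (5.0, 5.0)]
V, U = fcm(X, V0=[X[0], X[3]])
# Hard label of each point = cluster with the highest membership degree.
labels = [max(range(len(V)), key=lambda i: U[i][j]) for j in range(len(X))]
print(labels)
```

The membership degrees of each point sum to one over the clusters, so a hard assignment (as used for the cluster mappings) is obtained by taking the cluster with the largest degree.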

2.2. Performance of Fuzzy C-Means


In the FCM method, data are grouped into several clusters based on the similarity of their attributes.
The quality of the grouping is then measured using a distance-based measure [6, 7]. In this research,
the Sum of Squared Error (SSE) was used as that measure. In principle, SSE quantifies the error as the
distance of the data to their cluster centroids. A smaller SSE value means that the clustering is
better. The SSE is formulated in Eq. 5.

$$SSE = \sum_{i=1}^{K} \sum_{p \in C_i} d(p, m_i)^{2} \qquad (5)$$

Where K is the number of clusters; $p \in C_i$ is each data point in cluster i; $m_i$ is the centroid of
cluster i; and d is the distance between a data point and the centroid of its cluster.

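
As an illustration of Eq. 5, the SSE can be computed from hard cluster labels and the cluster centroids. The sketch below assumes Euclidean distance for d; the function name `sse` and the toy data are hypothetical.

```python
def sse(points, labels, centroids):
    """Eq. 5: sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for p, i in zip(points, labels):
        total += sum((a - b) ** 2 for a, b in zip(p, centroids[i]))
    return total


# Toy data (hypothetical): two points in cluster 0, one point in cluster 1.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
labels = [0, 0, 1]
centroids = [(1.0, 0.0), (10.0, 0.0)]
print(sse(points, labels, centroids))  # 1 + 1 + 0 = 2.0
```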
2.3. Datasets
In this study, rainfall data from 1986-2008 (a series of 276 samples) taken from 13 meteorology
stations in Tenggarong, East Kalimantan, Indonesia were used. Before analysis, the rainfall datasets
were normalized using Eq. 6. The rainfall datasets were summarized into three attributes, namely
highest, average, and lowest, and then analyzed using MATLAB R2013b. The rainfall dataset can be
seen in Table 1.

$$\bar{X} = \frac{x - x_{min}}{x_{max} - x_{min}} \qquad (6)$$

Where $\bar{X}$ is the normalized value of $x$; $x_{max}$ is the maximum value; and $x_{min}$ is the minimum value.
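
Eq. 6 can be illustrated with the "Highest" rainfall values of the 13 stations. The sketch below assumes that normalization is applied per attribute (per column), which the text does not state explicitly; the function name `min_max_normalize` is hypothetical.

```python
def min_max_normalize(values):
    """Eq. 6: rescale values to [0, 1] using the minimum and maximum of the list."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; Eq. 6 is undefined")
    return [(x - lo) / (hi - lo) for x in values]


# "Highest" rainfall (mm) of the 13 stations, in the order listed in the rainfall table.
highest = [4598.4, 4273.0, 4036.5, 7692.0, 3051.0, 7263.0, 1611.0,
           2757.5, 3828.0, 2859.0, 3654.4, 3785.6, 3296.0]
normalized = min_max_normalize(highest)
# The smallest value (Muara Ancalong) maps to 0 and the largest (Muara Kaman) maps to 1.
print(min(normalized), max(normalized))
```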

Table 1. Rainfall Data Normalization

                          Rainfall (mm)
Areas              Highest    Lowest   Average
Long Iram           4598.4    1057.0    2814.3
Melak               4273.0       0.0    1723.9
Kota Bangun         4036.5    1346.0    2365.7
Muara Kaman         7692.0       0.0    3923.3
Teluk Dalam         3051.0     955.0    2054.1
Tenggarong          7263.0     797.0    3585.7
Muara Ancalong      1611.0     396.0    1072.1
Temindung           2757.5    1565.9    2217.7
Baqa                3828.0       0.0    1967.1
Samboja             2859.0     111.5    1494.0
Klandasan           3654.4     744.0    2082.3
Sepinggan           3785.6    1483.0    2690.2
Waru                3296.0     637.0    2126.3

3. Results and Discussion

In this section, the results of testing the FCM method on the rainfall dataset of East Kalimantan
Province are described. After the data normalization process, the number of clusters was established.
In this study, the rainfall datasets were grouped into three clusters, namely low (C1), average (C2),
and high (C3) rainfall, with a maximum of 100 iterations. Furthermore, three rainfall parameters were
used: $X_{i1}$ as the highest, $X_{i2}$ as the average, and $X_{i3}$ as the lowest rainfall. From this process, the
value of the objective function, the centroids, and the degree of membership of each cluster in the
last iteration were obtained. The training and testing data patterns are shown in the figures below.
[Figure: 3-D scatter plot titled "RainFall Pattern" showing the 13 stations (labelled 1-13); axes: Max, Min, Average.]

Figure 1. Rainfall Training Pattern

[Figures: four 3-D scatter plots titled "Mapping of RainFall Pattern" for 3, 4, 5, and 6 clusters, each showing the 13 stations (labelled 1-13); axes: Max, Min, Average.]

Figure 2. Rainfall in three clusters
Figure 3. Rainfall in four clusters
Figure 4. Rainfall in five clusters
Figure 5. Rainfall in six clusters

Table 1. Moving centroid values in each cluster

No. of                                        Centroid
Clusters  Attribute         1          2          3          4          5          6
3         Min         760.710    495.627    729.342
          Max        3600.650   7016.902   3321.239
          Average    2106.354   3550.253   2035.559
4         Min         734.244    734.244    760.490    486.455
          Max        3387.089   3387.089   3639.785   7219.854
          Average    2053.622   2053.622   2114.250   3633.189
5         Min         649.068   1094.850    658.262    472.491    658.262
          Max        3569.541   3872.804   3310.356   7303.370   3310.356
          Average    2043.896   2369.027   1996.135   3667.904   1996.135
6         Min         687.263    637.925    637.925    637.924   1178.920    466.631
          Max        3627.538   3339.128   3339.128   3339.129   3954.122   7341.638
          Average    2073.427   1996.919   1996.919   1996.918   2419.612   3684.786

Table 1 (moving centroid values) shows that an even number of centroids can still produce an odd
number of clusters, because two centroids share the same position. With four centroids, for example,
centroids 1 and 2 coincide, leaving three distinct clusters. In other words, the final positions found
with an odd number of centroids are largely retained when one more centroid is added. In this
experiment, clustering with 3, 4, 5, and 6 clusters was implemented. The clustering results can be
seen in Table 2.
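
The coincidence of centroids can be checked directly from the tabulated centroid values. The sketch below counts the distinct centroids for the four- and six-centroid runs; the function name `distinct_centroids` and the tolerance of 0.01 are assumptions made only for this illustration.

```python
def distinct_centroids(centroids, tol=0.01):
    """Keep a centroid only if it differs from every kept one by more than tol in some coordinate."""
    kept = []
    for c in centroids:
        if not any(all(abs(a - b) <= tol for a, b in zip(c, k)) for k in kept):
            kept.append(c)
    return kept


# (Min, Max, Average) centroid values copied from the centroid table.
four = [(734.244, 3387.089, 2053.622), (734.244, 3387.089, 2053.622),
        (760.490, 3639.785, 2114.250), (486.455, 7219.854, 3633.189)]
six = [(687.263, 3627.538, 2073.427), (637.925, 3339.128, 1996.919),
       (637.925, 3339.128, 1996.919), (637.924, 3339.129, 1996.918),
       (1178.920, 3954.122, 2419.612), (466.631, 7341.638, 3684.786)]
print(len(distinct_centroids(four)), len(distinct_centroids(six)))  # 3 4
```

For the six-centroid run, centroids 2, 3, and 4 agree to within rounding, so four distinct positions remain.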

Table 2. The clustering results

                   Data      No. of Clusters
Areas               No.     3     4     5     6
Long Iram            1      1     3     2     5
Melak                2      1     3     1     1
Kota Bangun          3      1     3     2     5
Muara Kaman          4      2     4     4     6
Teluk Dalam          5      3     1     5     2
Tenggarong           6      2     4     4     6
Muara Ancalong       7      3     1     5     2
Temindung            8      3     1     5     2
Baqa                 9      1     3     1     1
Samboja             10      3     1     5     4
Klandasan           11      1     3     1     1
Sepinggan           12      1     3     2     5
Waru                13      3     1     3     2

4. Conclusion
In this paper, the FCM method was presented and applied to rainfall datasets from 13 meteorology
stations in East Kalimantan, Indonesia. The experiment showed that the FCM method gives good
results in clustering analysis. Nevertheless, we also concluded that the FCM algorithm was relatively
slow in processing. As future work, an optimization method to obtain more accurate centroids is
proposed.

Acknowledgments
The authors thank the Artificial Intelligence research group, Faculty of Computer Science and
Information Technology (CSIT), and the IDB Project.

References

[1] Mislan, Haviluddin, S. Hardwinarto, Sumaryono, and M. Aipassa, "Rainfall Monthly Prediction
Based on Artificial Neural Network: A Case Study in Tenggarong Station, East Kalimantan - Indonesia,"
in International Conference on Computer Science and Computational Intelligence (ICCSCI 2015),
Jakarta, Indonesia, 2015, pp. 142-151.
[2] Purnawansyah, Haviluddin, A. F. O. Gafar, and I. Tahyudin, "Comparison between K-Means and
Fuzzy C-Means Clustering in Network Traffic Activities," in 2017 International Conference on
Management Science and Engineering Management (ICMSEM), 2017.
[3] W. Wang, W. Pedrycz, and X. Liu, "Time series long-term forecasting model based on
information granules and fuzzy clustering," Engineering Applications of Artificial Intelligence, vol.
41, pp. 17–24, 2015.
[4] S. Cramer, M. Kampouridis, A. A. Freitas, and A. K. Alexandridis, "An extensive evaluation of
seven machine learning methods for rainfall prediction in weather derivatives," Expert Systems with
Applications, vol. 85, pp. 169-181, 2017.
[5] T. Velmurugan, "Performance based analysis between k-Means and Fuzzy C-Means clustering
algorithms for connection oriented telecommunication data," Applied Soft Computing, vol. 19,
pp. 134–146, 2014.
[6] A. Agrawal and H. Gupta, "Global K-Means (GKM) Clustering Algorithm: A Survey,"
International Journal of Computer Applications, vol. 79, 2013.
[7] J. Wu, J. Long, and M. Liu, "Evolving RBF neural networks for rainfall prediction using hybrid
particle swarm optimization and genetic algorithm," Neurocomputing, vol. 148, pp. 136–142, 2015.
