2014 China International Conference on Electricity Distribution (CICED 2014)
Shenzhen, 23-26 Sep. 2014
Electricity Information Big Data Based Load Curve Clustering
Haiyan Zheng1, Nong Jin2, Zheng Xiong1, Cong Ji1, Chao Fang1, Chunlin Zhong1
1. Jiangsu Frontier Electric Technology Co., Ltd., Nanjing 211102, China;
2. Jiangsu Electric Power Company, Nanjing 210024, China
Abstract
Clustering analysis of load curves based on electricity information big data is an important basis for analysing the load characteristics and electricity consumption habits of large users. Because the traditional K-means clustering algorithm is slow on big data, a parallel K-means clustering algorithm is proposed to speed up the clustering procedure. Firstly, all the load curves are de-noised by wavelet decomposition in order to reduce the influence of small fluctuations. Secondly, a multi-core parallel technology based K-means clustering algorithm is applied to load curve clustering. Thirdly, more than 40,000 load curves are clustered by the multi-core parallel technology based K-means clustering algorithm. Test results show that the proposed parallel K-means clustering algorithm can speed up the clustering procedure effectively.
Key words: big data; wavelet denoising; load curve cluster; K-means clustering algorithm; multi-core parallel technology
1 Introduction
Improvement of the electricity information acquisition system of Jiangsu Province makes it possible to obtain daily load curves of large customers. How to make full use of the existing electricity information big data resources is both an opportunity and a challenge for the Jiangsu power grid[1].
With the rapid growth of power load demand and the continuous increase of the peak-valley difference, power supply tension at peak time rises, as does peak shaving pressure. Analysis of large customer load characteristics is an important basis for power pricing strategy and load prediction. Faced with an enormous number of load curves, it is impossible to analyse large customers one by one. Load curve clustering is an effective way to find groups with similar load characteristics. Besides, clustering can also reduce the workload of load characteristics analysis and improve the accuracy of load modelling.
The K-means algorithm is an iterative partitioning clustering algorithm that separates data into k mutually exclusive groups, taking Euclidean distance as the similarity measure. But the resulting set of clusters strongly depends on the selection of initial centroids[2]. An improved K-means clustering algorithm with better initial centroids based on weighted average is proposed in [3]. By refining the initial centers according to the distribution of all data, [4][5] obtain more accurate and stable K-means algorithms.
(CICED 2014 Session3)
The original K-means algorithm is computationally expensive when processing a large amount of data, which restricts its application in big data clustering. The calculation speed of the K-means clustering algorithm is greatly improved in [6] by the early termination of static centres. A parallel K-means algorithm based on a distributed computer system achieves good results in clustering large data sets[7], but a costly distributed computer system has to be built first. Multi-core CPUs are found in ordinary computers today. Thus we propose a multi-core parallel technology based K-means algorithm, which costs less and is also effective.
2 Wavelet De-noising of Load Curves
Small fluctuations in a load curve do not represent the electricity consumption trend of large consumers. On the contrary, these fluctuations degrade the clustering result[8]. Owing to its good time-domain characteristics, wavelet decomposition is widely applied in signal de-noising. Therefore, we use the dbN wavelet to de-noise load curves in order to avoid the influence of small fluctuations. The specific steps are as follows.
Step 1 Wavelet decomposition. Select an appropriate wavelet function and number of decomposition layers, and then perform the wavelet decomposition.
Step 2 Wavelet de-noising. Choose an appropriate threshold for self-defined soft-threshold de-noising.
Step 3 Wavelet reconstruction. Perform the inverse wavelet transform using the coefficients of the different frequency components.
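The three steps above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it uses the Haar wavelet (db1, the simplest member of the Daubechies family) instead of an unspecified dbN, and the function names, the number of levels and the threshold value are assumptions chosen for illustration.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_decompose(signal, levels):
    """Step 1: multi-level Haar (db1) decomposition.

    Returns (approximation, [detail_level_1, ..., detail_level_n]),
    where level 1 holds the finest details.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2 != 0:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details

def soft_threshold(coeffs, threshold):
    """Step 2: shrink every coefficient toward zero by `threshold`."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]

def haar_reconstruct(approx, details):
    """Step 3: inverse Haar transform, starting from the coarsest level."""
    signal = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for s, d in zip(signal, detail):
            rebuilt.append((s + d) / SQRT2)
            rebuilt.append((s - d) / SQRT2)
        signal = rebuilt
    return signal

def denoise(signal, levels=2, threshold=0.1):
    """Decompose, soft-threshold the detail coefficients, reconstruct."""
    approx, details = haar_decompose(signal, levels)
    details = [soft_threshold(d, threshold) for d in details]
    return haar_reconstruct(approx, details)
```

Only the detail coefficients are thresholded, so the coarse shape of the curve carried by the approximation coefficients is left untouched. With a threshold of zero, `denoise` returns the input curve up to rounding error.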
A sample load curve is shown in Fig.1 to verify the wavelet de-noising effect.
Fig.1 Comparison of load curve before and after wavelet de-noising
Paper No. CP0843
After de-noising with a hard threshold, the load curve loses its characteristic decrease in power load at noon, whereas the curve de-noised with a soft threshold becomes smoother while its basic characteristics are preserved.
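For reference, the standard hard and soft thresholding rules for a wavelet coefficient $w$ and a threshold $\lambda$ are written out below. The symbols $w$, $\lambda$ and $\eta$ are introduced here for illustration only; the paper does not give the form of its self-defined threshold.

```latex
\eta_{\text{hard}}(w,\lambda) =
\begin{cases}
  w, & |w| > \lambda \\
  0, & |w| \le \lambda
\end{cases}
\qquad
\eta_{\text{soft}}(w,\lambda) =
\operatorname{sgn}(w)\,\max\bigl(|w| - \lambda,\; 0\bigr)
```

The hard rule is discontinuous at $|w| = \lambda$, while the soft rule is continuous everywhere, which is consistent with the smoother curve obtained by soft thresholding.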
3 Multi-core parallel technology based K-means algorithm
3.1 K-means clustering algorithm
K-means is one of the most widely used clustering algorithms because of its simple principle and significant effect[10, 11]. A set of initial centroids is selected first, and the algorithm then iterates so that clusters stay separate from each other and compact within themselves. During the iteration, each centroid is updated to the mean value of all data in its subset. The specific steps of separating a data set {x1, x2, ... , xN} into K clusters are as follows.
Step 1 Select K data points randomly as initial centroids.
Step 2 Calculate the degree of similarity between the data and the centroids, and assign each data point to the cluster with the highest degree of similarity.
Step 3 Update the centroids by the mean value of all data in each subset. Check whether any centroid has changed. If so, return to Step 2; otherwise, end the clustering procedure and output the clustering result.
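The three steps above can be sketched in plain Python as follows. The function names, the fixed random seed, the iteration cap and the handling of empty clusters are assumptions made for this illustration; Euclidean distance is used as the similarity measure, as stated in the Introduction.

```python
import math
import random

def nearest(point, centroids):
    """Index of the centroid with the smallest Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda j: math.dist(point, centroids[j]))

def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(data, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: pick K data points at random as initial centroids.
    centroids = [list(p) for p in rng.sample(data, k)]
    for _ in range(max_iter):
        # Step 2: assign each point to its most similar (nearest) centroid.
        labels = [nearest(p, centroids) for p in data]
        # Step 3: recompute each centroid as the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            # An empty cluster keeps its old centroid (an assumption).
            updated.append(mean(members) if members else centroids[j])
        if updated == centroids:  # no centroid changed: stop
            break
        centroids = updated
    return centroids, labels
```

Because the result depends on the random initial centroids, different seeds may give different partitions on real load curves.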
3.2 Parallel K-means clustering algorithm
There are tens of thousands of load curves to be clustered in the context of electricity consumption big data, which is hard for the original K-means clustering algorithm to deal with. Since mainstream computers have several cores, it is possible to develop a multi-core parallel technology based K-means clustering algorithm to speed up clustering. A sketch of multi-core parallel clustering is shown in Fig.2.
Fig.2 The diagram of multi-core parallel clustering (diagram boxes: divide data into n parts; transfer data to n cores; computing similarity between data)
The main steps of the multi-core parallel technology based K-means algorithm are as follows.
Step 1 Detect the number of cores n of the computer, activate all cores and get ready for clustering.
Step 2 Select K data points randomly as initial centroids.
Step 3 Divide the data into n parts and distribute them to the n cores. Calculate the degree of similarity between the data and the centroids, and assign each data point to the cluster with the highest degree of similarity.
Step 4 Update the centroids by the mean value of all data in each subset. Check whether any centroid has changed. If so, return to Step 3; otherwise, turn to Step 5.
Step 5 Close the parallel computing process and display the clustering results.
4 Clustering analysis of Jiangsu Large Consumer
Workday load curves of 45000 large customers are analyzed from the perspective of the electricity information big data of Jiangsu Province. The procedure of load curve analysis is shown in Fig.3.
Fig.3 Procedure of load curve analysis (flowchart boxes: data selected; data normalization; wavelet denoising; two stage clustering)
Fig.4 16 kinds of load curves with obvious characteristics
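As a reference for the experiments in this section, the five steps of the multi-core parallel K-means algorithm of Section 3.2 can be sketched as follows. This is an illustrative sketch, not the authors' code: the use of Python's `concurrent.futures`, the function names, the `executor_cls` parameter, the fixed seed and the handling of empty clusters are all assumptions. Each worker labels its own chunk and returns per-cluster partial sums, so only small summaries travel back to the main process.

```python
import math
import os
import random
from concurrent.futures import ProcessPoolExecutor

def assign_chunk(args):
    """Worker task: label one chunk and return per-cluster partial sums."""
    chunk, centroids = args
    k, dim = len(centroids), len(centroids[0])
    labels = []
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in chunk:
        j = min(range(k), key=lambda c: math.dist(point, centroids[c]))
        labels.append(j)
        counts[j] += 1
        for d, value in enumerate(point):
            sums[j][d] += value
    return labels, sums, counts

def split(data, n):
    """Divide data into at most n nearly equal contiguous parts."""
    size = math.ceil(len(data) / n)
    return [data[i:i + size] for i in range(0, len(data), size)]

def parallel_kmeans(data, k, n_workers=None, seed=0, max_iter=100,
                    executor_cls=ProcessPoolExecutor):
    # Step 1: detect the number of cores and start the worker pool.
    n = n_workers or os.cpu_count() or 1
    # Step 2: pick K data points at random as initial centroids.
    centroids = [list(p) for p in random.Random(seed).sample(data, k)]
    chunks = split(data, n)  # Step 3a: divide the data into n parts
    with executor_cls(max_workers=n) as pool:
        for _ in range(max_iter):
            # Step 3b: each worker assigns its chunk to the nearest centroid.
            results = list(pool.map(assign_chunk,
                                    [(c, centroids) for c in chunks]))
            # Step 4: merge the partial sums and update the centroids.
            updated = []
            for j in range(k):
                count = sum(r[2][j] for r in results)
                if count == 0:  # empty cluster keeps its centroid (assumption)
                    updated.append(centroids[j])
                    continue
                updated.append([sum(r[1][j][d] for r in results) / count
                                for d in range(len(centroids[j]))])
            if updated == centroids:  # no centroid changed: stop
                break
            centroids = updated
    # Step 5: the pool is closed on leaving the `with` block.
    labels = [lab for r in results for lab in r[0]]
    return centroids, labels
```

On platforms where worker processes are started with the spawn method, `parallel_kmeans` should be called from code guarded by `if __name__ == "__main__":`. Passing a thread-based executor class through `executor_cls` is convenient for quick checks, although it does not use several cores for this CPU-bound work.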
First, the load curves are preprocessed by the following steps:
(1) Delete customers whose load data is incomplete or whose load capacity is zero. 41,487 load curves remain after this step.
(2) Normalize every load curve by its daily maximum load capacity to facilitate the follow-up load curve cluster analysis.
(3) Wavelet de-noising. Firstly, decompose every load curve with the Daubechies wavelet series. Secondly, de-noise every curve by wavelet self-defined soft-thresholding. Thirdly, obtain the de-noised load curve by wavelet reconstruction.
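Preprocessing steps (1) and (2) can be sketched as follows. The representation of a load curve as a list of numbers, the use of `None` for a missing reading and the function names are assumptions made for this illustration; step (3) would then be applied to each normalized curve.

```python
def is_valid(curve):
    """Keep a curve only if it has no missing points and is not all zero."""
    return all(v is not None for v in curve) and max(curve, default=0.0) > 0.0

def normalize(curve):
    """Scale a curve by its daily maximum so that its peak equals 1."""
    peak = max(curve)
    return [v / peak for v in curve]

def preprocess(curves):
    """Steps (1) and (2): drop unusable curves, then normalize the rest."""
    return [normalize(c) for c in curves if is_valid(c)]
```

Normalizing by the daily maximum makes curves of customers with very different capacities comparable in shape, which is what the clustering is meant to group.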

The multi-core parallel technology based K-means algorithm clusters all load curves into 20 categories, and then 16 kinds of load curves with obvious characteristics are chosen (35,204 load curves in total, a proportion of 84.86%). The thick red curves shown in Fig.4 represent the cluster centers of the categories.
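The stated proportion can be checked from the counts given in the text, taking the 41,487 preprocessed load curves as the denominator (an assumption that is consistent with the quoted figure):

```latex
\frac{35\,204}{41\,487} \approx 0.8486 = 84.86\%
```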
The distribution of all 16 kinds of load curves is shown in the chart below.
Fig.5 Distribution of all 16 kinds of load curves
The speed-up of the improved K-means clustering method based on multi-core parallel technology is shown in Tab.1.
Tab.1 Computing time of multi-core parallel technology based K-means clustering method
number of cores    computing time/s
1                  20.61
2                  11.90
4                  7.25
With 4 cores, the parallel computing time is shortened to 35.18% of the single-core time, which is favorable for big data based load clustering and analysis.
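The figures in Tab.1 can be turned into time ratios, speed-up and parallel efficiency with a few lines of Python. The function name and the efficiency column are additions for illustration; speed-up is taken as single-core time divided by n-core time, and efficiency as speed-up divided by the number of cores.

```python
# Computing times from Tab.1, keyed by number of cores (seconds).
times = {1: 20.61, 2: 11.90, 4: 7.25}

def speedup_table(times):
    """Return {cores: (time ratio, speed-up, parallel efficiency)}."""
    base = times[1]
    return {n: (t / base, base / t, base / (t * n)) for n, t in times.items()}

for n, (ratio, speedup, efficiency) in speedup_table(times).items():
    print(f"{n} cores: {ratio:.2%} of 1-core time, "
          f"speed-up {speedup:.2f}, efficiency {efficiency:.2f}")
```

The 4-core row reproduces the 35.18% quoted above, which corresponds to a speed-up of about 2.84 over a single core.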
The above 16 kinds of curves can be classified into
the following five categories according to their load
characteristics.
Tab.2 Definition of five categories
(columns: categories; kinds; number of curves; load characteristics)
categories: I, II, III, IV, V
kinds (legible entries): [illegible],2,3,4; 12,13; 14,[illegible]
number of curves (legible entries): 567; 7896; 9842; 4911
load characteristics: short time high load; load changes smoothly; high load on [illegible] time; high load on day time (decreases slightly at noon); high load in the [illegible]
The clustering result shows that the parallel computing technology based clustering algorithm proposed in the paper is significant and feasible, and is an effective way to cluster load curves in the context of big data.
5 Conclusion
In the context of big data clustering, the following work is carried out in the paper:
(1) In order to reduce the influence of small fluctuations, all load curves are de-noised by wavelet decomposition.
(2) A multi-core parallel technology based K-means clustering algorithm is proposed to speed up the clustering procedure.
(3) The multi-core parallel technology based K-means clustering algorithm is applied to the clustering analysis of large consumers of Jiangsu Province. Test results show that the proposed parallel K-means clustering algorithm can speed up the clustering procedure effectively.
Reference
[1] Viktor Mayer-Schonberger, Kenneth Cukier. Big Data: A Revolution That Will Transform How We Live, Work, and Think[M]. Boston: Houghton Mifflin Harcourt, 2013.
[2] Nazeer K A A and Sebastian M P. Improving the Accuracy and Efficiency of the k-means Clustering Algorithm[C]. Proceedings of the World Congress on Engineering 2009 Vol I, London, 2009.
[3] Sohrab Mahmud, Mostafizer Rahman, Nasim Akhtar. Improvement of K-means Clustering algorithm with better initial centroids based on weighted average[C]. 7th International Conference on Electrical and Computer Engineering, Dhaka, 2012.
[4] Xuhui Chen, Yong Xu. K-Means Clustering Algorithm with Refined Initial Center[C]. 2nd International Conference on Biomedical Engineering and Informatics, Tianjin, 2009.
[5] Jianwen Xie, Yuanbiao Zhang, Weigang Jiang. A K-means Clustering Algorithm with Meliorated Initial Centers and its Application to Partition of Diet Structures[C]. International Symposium on Intelligent Information Technology Application Workshops, 2008.
[6] Lai Jim Z C, Tsung Jen Huang, Yi-ching Liaw. A fast k-means clustering algorithm using cluster center displacement[J]. Pattern Recognition, pp.2551-2556, Vol.42, 2009.
[7] Jitendra Kumar, Mills Richard T, Hoffman Forrest M, et al. Parallel k-Means Clustering for Quantitative Ecoregion Delineation Using Large Data Sets[C]. International Conference on Computational Science, Singapore, 2011.
[8] Hongwei Guo, Yanchi Liu, Helan Liang, et al. An Application on Time Series Clustering Based on Wavelet Decomposition and Denoising[C]. Fourth International Conference on Natural Computation, Jinan, 2008.
[9] Sen Ouyang, Zhengxiang Song, Degui Chen, et al. Application of wavelet soft-threshold de-noising technique to power quality detection[J]. Automation of Power System, pp.56-60, Vol.26, No.19, 2002.
[10] Mora-Flórez J, Cormane-Angarita J, Ordóñez-Plata G. k-means algorithm and mixture distributions for locating faults in power systems[J]. Electric Power Systems Research, pp.714-721, Vol.79, No.5, 2009.
[11] Kalyani S, Swarup K S. Particle swarm optimization based K-means clustering approach for security[J]. Expert Systems with Applications, pp.10839-10846, Vol.38, No.9, 2011.
Author's brief introduction and contact information:
Haiyan Zheng was born in Zhenjiang City, Jiangsu Province in 1979. He received his master's degree in Computer Science, and he is now a Senior Engineer of Power System. His current interest is design and development of power system software.
Nong Jin was born in Nanjing City, Jiangsu Province in 1957. He received his undergraduate degree in Power System and its Automation, and he is now a Senior Engineer of Power System. His current interest is design and development of power system software.
Zheng Xiong was born in Nanchang City, Jiangxi Province in 1978. He received his undergraduate degree in Computer Science, and he is now an Engineer of Power System. His current interest is design and development of power system software.
Cong Ji was born in Rudong City, Jiangsu Province in 1988. He received his master's degree in Power System and its Automation, and he is now an Assistant Engineer of Power System. His current interest is design and development of power system software.
Chao Fang was born in Jurong City, Jiangsu Province in 1985. He received his associate's degree in Computer Science, and he is now an Engineer of Computer Science. His current interest is design and development of power system software.
Chunlin Zhong was born in Jurong City, Jiangsu Province in 1983. He received his undergraduate degree in Power System and its Automation, and he is now an Engineer of Computer Science. His current interest is design and development of power system software.
