
Electric Customer Classification Using

Hopfield Recurrent ANN


López, José J.; Aguado, José A; Martín, Francisco; Muñoz, F; Rodríguez, Alejandro and Ruiz, José E.
Departamento de Ingeniería Eléctrica. Universidad de Málaga, España
Tel:+34 952131306, Fax:+34 952131091, e-mail: jjlopez@uma.es

Keywords— Non-supervised classification, Hopfield ANN, Principal Components.

Abstract— In retail power markets, precise information about electric customers is highly relevant. Efficient tariff design and pricing require accurate classification and segmentation of electric customers. This paper proposes a methodology for clustering electric customers based on a recurrent Hopfield Artificial Neural Network (H-ANN). Several filtering techniques are used to reduce the size of the input set of the clustering algorithm. The effectiveness of the proposed approach is measured using characterization indexes. Results on a set of distribution customers are presented to demonstrate the efficiency of the approach.

I. INTRODUCTION

In liberalized electrical energy markets, customers play an increasingly key role in achieving efficiency in a competitive market setting. Electrical energy suppliers are interested in developing new strategies and products oriented to the customers, with the objective of offering the best possible service at the least cost.

The new liberalized scheme of electrical energy systems has promoted the need to precisely monitor and control the activities of the customers. Correct knowledge of demand behavior gives providers the opportunity to formulate dedicated tariffs and design efficient pricing schemes. Therefore, it is necessary to develop tools that correctly classify demand consumption patterns.

In order to identify the consumption patterns of the customers, power suppliers group customers exhibiting similar patterns into clusters. The amount of customer data is usually huge, so a filtering technique is applied before clustering [1].

Statistical techniques have been used for the analysis of data series, but more recently techniques based on Artificial Intelligence have attracted much attention. Among the most successful techniques are the "modified follow-the-leader" [1], [2], [4], Self-Organized Maps (SOM) [3], [5], K-means [1]-[5], and fuzzy K-means [1], [3]. In addition, several indexes have been proposed to measure the quality of the clustering algorithms [1], [3], [8]. In this paper, a Hopfield Artificial Neural Network (H-ANN) is proposed as a clustering technique for the classification of electric customers.

This paper is organized as follows. In Section II, data characterization and reduction techniques are reviewed. The Hopfield recurrent artificial neural network is detailed in Section III. In Section IV, several cluster validation indexes are described and formulated. Then, in Section V, results of applying the Hopfield Neural Network to a data set of distribution customers are shown. Finally, conclusions are given.

II. CHARACTERIZATION AND DATA REDUCTION

Classification and clustering of large databases of electric customers usually require a reduction of the amount of data. This must be done without any loss of relevant information. The most widely used techniques are the hour load profile, the form factors method, harmonic analysis (Discrete Fourier Transform), time-frequency analysis (Discrete Wavelet Transform) and Principal Component Analysis (PCA).

A. Hour load profile

Electrical customers can be considered as elements of an N-dimensional vector space. Each vector is formed by the values of consumption at given time intervals.

Consider a set of Q vectors representing electrical customers. Each customer $P^{(q)}$ can be considered as a vector:

$$P^{(q)} = \left[ p_1^{(q)}, p_2^{(q)}, \ldots, p_N^{(q)} \right], \qquad q = 1, 2, \ldots, Q \tag{1}$$

where N is the number of variables that characterize the electrical customer.

B. Form factors

A form factor is defined as a parameter that characterizes a vector $(x_1, \ldots, x_n)$. Examples of form factors are the arithmetic mean, the RMS value, the standard deviation, etc. Many form factors can be defined. The proposed method consists of expressing the information contained in the vector by means of a small number of parameters.

The form factors are a set of factors in the range [0, 1]. Each one of them emphasizes a particular aspect of the load curve. The nine indexes used to classify the electrical consumers are shown in Table I (see [1], [5]).
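As an illustration of the form-factor idea, the following minimal sketch reduces one load curve to a few parameters. It computes the three generic examples named above (arithmetic mean, RMS value, standard deviation) and a mean-to-peak ratio with the shape of ff1 in Table I. The function name `basic_form_factors`, the sample profile and the reading of `P_med` as the mean power are assumptions made for this example, not part of the original method.

```python
import math
import statistics


def basic_form_factors(load):
    """Summarise one load curve (a sequence of non-negative power values)."""
    mean = statistics.fmean(load)
    rms = math.sqrt(sum(p * p for p in load) / len(load))
    std = statistics.pstdev(load)
    peak = max(load)
    # Mean-to-peak ratio over the whole day: the shape of ff1 in Table I,
    # assuming P_med denotes the mean power (assumption of this sketch).
    ff1 = mean / peak if peak > 0 else 0.0
    return {"mean": mean, "rms": rms, "std": std, "ff1": ff1}


# Hypothetical 24-value hourly profile, already normalised to [0, 1].
profile = [0.2] * 6 + [0.6] * 12 + [1.0] * 2 + [0.4] * 4
factors = basic_form_factors(profile)
```

Because the mean never exceeds the peak, this ratio always lies in [0, 1], which matches the range stated for the form factors. The remaining factors in Table I depend on the definitions of the solar-day, night and noon intervals, which are not given here and are therefore left out of the sketch.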
978-1-4244-1744-5/08/$25.00 ©2008 IEEE
TABLE I
FORM FACTORS

Annotation   Definition
ff1          P_med,day / P_max,day
ff2          P_med,solar day / P_max,solar day
ff3          P_min,day / P_med,solar day
ff4          1/3 · (P_med,night / P_med,day)
ff5          1/5 · (P_med,noon / P_med,solar day)
ff6          P_min,solar day / P_med,solar day
ff7          P_med,night / P_med,solar day
ff8          1/5 · (P_max,solar day / P_max,day)
ff9          P_min,day / P_min,solar day

C. Harmonic analysis

Harmonic analysis consists of extracting a series of indices or variables from the frequency response of the load curve. The frequency response of the electrical consumers is computed using the Discrete Fourier Transform (DFT).

For each harmonic coefficient, three variables are defined. One is the modulus of the coefficient ($A_h$) and the other two are values related to the phase ($X'_h$ and $X''_h$):

$$X'_h = \frac{1 - \cos(\theta_h)}{2} \cdot \frac{A_h}{\sum_{j \in \Theta_n} A_j^2} \tag{2}$$

$$X''_h = \frac{1 - \sin(\theta_h)}{2} \cdot \frac{A_h}{\sum_{j \in \Theta_n} A_j^2} \tag{3}$$

where $A_h$ and $\theta_h$ are the modulus and phase of the harmonic coefficient of order h, and $\Theta_n$ is the set of harmonic coefficients ordered by decreasing value of $\zeta_h$.

For a set of n harmonics, the variables are defined as:

$$f_n = \{ (A_k, X'_k, X''_k),\ (k \neq 0) \cap (k \in \Theta_n) \} \cup \{ (A_k),\ (k = 0) \cap (k \in \Theta_n) \} \tag{4}$$

For a customer q, the set of n harmonics is:

$$f_n^{(q)} = \{ f_j^{(q)},\ j = 1, \ldots, n \} \tag{5}$$

and for Q customers, it is:

$$F_n = \{ f_n^{(q)},\ q = 1, \ldots, Q \} \tag{6}$$

D. Multiresolution analysis (MRA)

Multiresolution analysis consists of extracting a series of indexes or variables from the time-frequency response of the load curve. This response is computed with a Discrete Wavelet Transform (DWT).

The first step in obtaining a DWT consists of feeding the discrete-time signal to a low-pass filter with impulse response l(n). Then the signal is subsampled in order to remove half of the samples, according to Nyquist's rule. This process can be expressed as:

$$y(k) = \sum_{n} x(n) \cdot l(2k - n) \tag{7}$$

where y(k) is the filtered and subsampled signal, l(2k-n) represents the subsampling together with the filtering operation, and x(n) is the original signal.

Decomposition of the signal into different frequency bands is accomplished by passing it through a low-pass filter l(n) and a high-pass filter u(n). This operation is expressed as:

$$y_{high}(k) = \sum_{n} x(n) \cdot u(2k - n), \qquad y_{low}(k) = \sum_{n} x(n) \cdot l(2k - n) \tag{8}$$

where $y_{high}(k)$ and $y_{low}(k)$ are the outputs of the high-pass and low-pass filters, respectively.

The set of variables obtained from an MRA consists of the values of the approximation signal at the highest decomposition level. These values are used as indices in the time-frequency domain, and the number of indices depends on the level of the DWT.

E. Principal components analysis (PCA)

PCA consists of representing a set of n observations of p variables by another set with a smaller number of variables r. The new set is a linear combination of the original observations. Mathematically, it is necessary to calculate the matrix of eigenvectors (V).

With the matrix of eigenvectors, a matrix Z is calculated that relates the original observations X to the matrix V in the following way:

$$Z = X \cdot V \tag{9}$$

To represent the data in a space of dimension r < p, the first r columns of V (those of maximum variance) are selected. The original data are then expressed in a space of r dimensions:

$$Z_r = X \cdot V_r \tag{10}$$

where $Z_r$ are the data in the r-dimensional space, X is the matrix of original data and $V_r$ is the matrix with the r selected components.
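The projection in (10) can be sketched in a few lines. The example below is a minimal illustration, not the authors' implementation: it assumes the data are mean-centred first, takes V from the sample covariance matrix, and replaces a full eigen-solver with power iteration plus deflation. All function names and the sample data are hypothetical.

```python
import math


def covariance(X):
    """Mean-centre X (n observations, p variables) and return
    (centred data, sample covariance matrix)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in X]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    return centred, cov


def top_eigenvectors(cov, r, iters=500):
    """First r eigenvectors of a symmetric matrix by power iteration with
    deflation. Assumes the r leading eigenvalues are distinct and positive."""
    p = len(cov)
    A = [row[:] for row in cov]
    vectors = []
    for _ in range(r):
        # Deliberately asymmetric start vector, normalised to unit length.
        v = [1.0 + 0.1 * j for j in range(p)]
        scale = math.sqrt(sum(x * x for x in v))
        v = [x / scale for x in v]
        for _ in range(iters):
            w = [sum(A[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue of the vector just found.
        lam = sum(v[a] * sum(A[a][b] * v[b] for b in range(p))
                  for a in range(p))
        vectors.append(v)
        # Deflation: remove the component just found from the matrix.
        A = [[A[a][b] - lam * v[a] * v[b] for b in range(p)]
             for a in range(p)]
    return vectors


def project(X, r):
    """Return Z_r = X V_r for the mean-centred data, as in (10)."""
    centred, cov = covariance(X)
    V_r = top_eigenvectors(cov, r)
    return [[sum(row[j] * v[j] for j in range(len(row))) for v in V_r]
            for row in centred]


# Hypothetical data: five observations of two strongly correlated variables.
X = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.9]]
Z = project(X, 1)  # one principal component per observation
```

In practice a library eigen-solver would normally be preferred; power iteration is used here only to keep the sketch self-contained.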
III. HOPFIELD NEURAL NETWORK

This model was introduced in the early 1980s by John Hopfield [9].

The Hopfield Artificial Neural Network is a recurrent neural network, i.e., a neural network where the connections between the units form a directed cycle. Therefore, the output of the network becomes its input. Recurrent neural networks must be approached differently from feed-forward neural networks when designing them, analyzing their behavior and training them. Recurrent neural networks can also behave chaotically. Usually, dynamical systems theory is used to model and analyze them.

Essentially, a Hopfield-ANN consists of N interconnected processing units, so that the inputs of each unit are the outputs of the other units. An illustrative graph of an H-ANN is shown in Figure 1.

Fig. 1. Example of a Hopfield neural network with units 1, ..., N; $w_{ij}$ are the connection weights, with $w_{ij} = w_{ji}$.

The connection weights are represented by a weight matrix $W = (w_{ij})$, whose entries are called synaptic weights. A main feature of W is that it is a symmetric matrix with $w_{ii} = 0$.

When the output of a processing unit is activated, the unit is in state 1. If the output is deactivated, the unit is in state 0. The Hopfield-ANN begins with an initial state defined by the sequence $s_1(0), s_2(0), \ldots, s_N(0)$, where $s_i(0) \in \{0, 1\}$. During each iteration, a processing unit is randomly selected. If unit i is selected during iteration k+1, a recurrent update rule determines the new state of this unit based on the sequence at iteration k, i.e. $s_1(k), s_2(k), \ldots, s_N(k)$. The recurrent update rule is the following:

$$s_i(k+1) = \begin{cases} 1 & \text{if } \sum_{j=1}^{N} w_{ij}\, s_j(k) \geq \theta \\ 0 & \text{if } \sum_{j=1}^{N} w_{ij}\, s_j(k) < \theta \end{cases} \tag{11}$$

where the parameters $w_{ij}$ are the synaptic weights and $\theta$ is a threshold parameter.

At iteration k, a computational energy function E(k) can be defined as

$$E(k) = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij}\, s_i(k)\, s_j(k) + \sum_{i=1}^{N} \theta_i\, s_i(k) \tag{12}$$

The main characteristic of the H-ANN is that, as iterations progress, the computational energy function decreases and stabilizes [9].

IV. INDICATORS OF CLUSTER VALIDATION

In order to measure the efficiency of the Hopfield-ANN, several indexes are defined. These indexes measure the quality of the clusters provided by the Hopfield-ANN. They were previously proposed in [1]-[5] and [8].

A. Mean index adequacy (MIA)

This index averages the distances between the centroid of each cluster and all the curves that belong to that cluster. The distance is computed for all clusters formed in the classification process.

$$MIA = \frac{1}{K} \sum_{k=1}^{K} d^2\left(C^{(k)}, S^{(k)}\right) \tag{13}$$

where $C^{(k)}$ is the centroid of cluster k and $S^{(k)}$ is the set of customers in cluster k.

B. Clustering dispersion indicator (CDI)

This index determines the dispersion of each cluster.

$$CDI = \frac{1}{\hat{d}(C)} \cdot \frac{1}{K} \sum_{k=1}^{K} \hat{d}^2\left(S^{(k)}\right) \tag{14}$$

where $\hat{d}(C)$ is the average distance between the cluster centroids and $\hat{d}(S^{(k)})$ is the average distance between the customers of cluster $S^{(k)}$.

C. Davies-Bouldin index (DBI)

This index is the average, over all clusters, of the similarity between each cluster and its closest cluster.

$$DBI = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} \left\{ \frac{\hat{d}(S^{(i)}) + \hat{d}(S^{(j)})}{d\left(C^{(i)}, C^{(j)}\right)} \right\} \tag{15}$$

where $\hat{d}(S^{(i)})$ is the average distance between the customers of cluster i, $\hat{d}(S^{(j)})$ is the average distance between the customers of cluster j, and $d(C^{(i)}, C^{(j)})$ is the distance between the centroids of clusters i and j.

V. TEST RESULTS

The H-ANN was applied to a data set of medium-voltage customers of a distribution utility in southern Spain. The data consist of 230 measurements grouped into three categories: Public Service (P = 19), Residential (R = 36) and Industry (I = 175). The data set was normalized to the range [0, 1].
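As a reminder of the network dynamics used in these tests, the asynchronous update rule (11) and the energy function (12) of Section III can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the small weight matrix and the per-unit thresholds `theta[i]` (a common threshold θ is the special case where all entries are equal) are chosen for the example, and the mapping from load data to weights is not shown.

```python
import random


def energy(W, theta, s):
    """Computational energy of state s, following (12)."""
    n = len(s)
    pair = sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
    return -0.5 * pair + sum(theta[i] * s[i] for i in range(n))


def run_hopfield(W, theta, s0, steps=200, seed=0):
    """Asynchronous updates following (11).

    Returns the final state and the energy after every update."""
    rng = random.Random(seed)
    s = list(s0)
    trace = [energy(W, theta, s)]
    for _ in range(steps):
        i = rng.randrange(len(s))  # select one processing unit at random
        field = sum(W[i][j] * s[j] for j in range(len(s)))
        s[i] = 1 if field >= theta[i] else 0
        trace.append(energy(W, theta, s))
    return s, trace


# Hypothetical 4-unit network: symmetric weights with a zero diagonal.
W = [[0, 1, -1, -1],
     [1, 0, -1, -1],
     [-1, -1, 0, 1],
     [-1, -1, 1, 0]]
theta = [0.5, 0.5, 0.5, 0.5]
state, trace = run_hopfield(W, theta, [1, 0, 1, 0])
```

With a symmetric W and a zero diagonal, each single-unit update changes the energy by $-\Delta s_i \left( \sum_j w_{ij} s_j - \theta_i \right)$, which is never positive, so the recorded energy trace cannot increase.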
First, test results are presented showing the efficiency of the filtering techniques described in Section II combined with the H-ANN.

Figs. 2, 3 and 4 show the evolution of the validation indexes (MIA, CDI and DBI) with respect to the number of clusters. It can be concluded that DFT, DWT and PCA give very similar results and are the best filtering procedures.

Fig. 2. Validation index MIA.

Fig. 3. Validation index CDI.

Fig. 4. Validation index DBI.

In the next subsections, the behavior of the H-ANN in combination with DWT and PCA is analyzed.

A. Classification with PCA

The classification results using PCA for a range of 8 to 20 clusters are shown in Table II.

TABLE II
NUMBER OF CUSTOMER CLASSES

Number of       Number of customer classes
clusters (K)    P     R     I
8               3     7     2
9               3     8     2
10              3     9     2
11              4     9     2
12              4     9     3
13              4     10    3
14              4     11    3
15              5     12    3
16              5     13    3
17              4     14    3
18              5     14    3
19              5     14    3
20              5     14    4

This table shows the number of customer classes for different numbers of clusters. As the number of clusters increases, the number of customer classes stabilizes. The evolution of the computational energy function of the H-ANN is plotted in Fig. 5.

Fig. 5. Energy of stabilization with principal components.

For the PCA, the contribution of each principal component is shown in Table III.

TABLE III
PRINCIPAL COMPONENTS CONTRIBUTION

                  MC 1     MC 2     MC 3
% Contribution    95.45    2.60     1.24

For a given number of clusters K=10, Table IV shows the number of customers in each cluster.
TABLE IV
CUSTOMER NUMBER FOR K=10

Cluster    Customers
           P     R     I
1          -     4     -
2          -     16    -
3          7     23    -
4          -     7     14
5          -     16    -
6          -     36    -
7          -     34    -
8          -     -     22
9          4     16    -
10         8     23    -

It can be observed that for K=10, five clusters (1, 2, 5, 6 and 7) contain only customers of type R, only one cluster (8) contains only customers of type I and, finally, four clusters contain a mix of customers (3, 4, 9 and 10).

B. Classification with DWT

The classification results using DWT for a range of 8 to 20 clusters are shown in Table V. A level 5 DWT is employed for all simulations.

TABLE V
NUMBER OF CUSTOMER CLASSES

Number of       Number of customer classes
clusters (K)    P     R     I
8               3     7     2
9               3     8     2
10              4     9     2
11              5     10    2
12              5     11    3
13              5     12    3
14              5     12    3
15              5     13    3
16              5     14    3
17              6     15    3
18              7     16    4
19              7     17    4
20              7     17    5

As in the previous subsection, Table V shows the number of customer classes for different numbers of clusters. As the number of clusters increases, the number of customer classes stabilizes. As can be observed, the number of customer classes is higher than the one obtained with PCA.

The evolution of the computational energy function of the H-ANN for the DWT is plotted in Fig. 6. A higher value of the energy function is obtained using DWT.

Fig. 6. Energy of stabilization with DWT analysis.

For a given number of clusters K=10, Table VI shows the number of customers in each cluster.

TABLE VI
CUSTOMER NUMBER FOR K=10

Cluster    Customers
           P     R     I
1          -     5     -
2          -     20    -
3          5     16    -
4          10    24    -
5          -     30    -
6          -     30    -
7          -     16    -
8          3     5     16
9          1     29    -
10         -     -     20

It can be observed that for K=10, five clusters (1, 2, 5, 6 and 7) contain only customers of type R, only one cluster (10) contains only customers of type I and, finally, four clusters contain a mix of customers (3, 4, 8 and 9).

VI. CONCLUSIONS

A set of load profiles has been classified using the different procedures presented in this paper. Results of the different approaches have been compared. In all cases the Hopfield-ANN stabilized and reached a minimum.

The cluster validation indexes employed (MIA, CDI and DBI) indicate that, except for the form factors, all the procedures give very similar results.

The best results are reached with data characterization by DWT and PCA, although PCA is the most efficient method with respect to computation time, since only three parameters per load curve are used.

REFERENCES
[1] G. Chicco, R. Napoli, and F. Piglione, "Comparisons among clustering techniques for electricity customer classification," IEEE Trans. Power Systems, vol. 21, no. 2, May 2006.

[2] G. Chicco, R. Napoli, F. Piglione, M. Scutariu, P. Postolache, and C. Toader, "Load pattern-based classification of electricity customers," IEEE Trans. Power Systems, vol. 19, no. 2, May 2004.

[3] G. Chicco, R. Napoli, and F. Piglione, "Application of clustering algorithms and self organizing maps to classify electricity customers," Proc. IEEE Bologna Power Tech, Bologna, Italy, Jun. 23-26, 2003.

[4] G. Chicco, R. Napoli, F. Piglione, M. Scutariu, P. Postolache, and C. Toader, "Emergent electricity customer classification," IEEE Generation, Transmission and Distribution, vol. 152, no. 2, pp. 164-172, March 2005.

[5] S. Valero, M. Ortiz, F. García, N. Encinas, A. Gabaldón, A. Molina, and E. Gómez, "Characterization and identification of electrical customers through the use of self-organizing map and daily load parameters," IEEE Power Energy System Conference, October 2004.

[6] S. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, 1989.

[7] J. Li, L. Mann Bruce, and C. H. Koger, "Dimensionality reduction of hyperspectral data using discrete wavelet transform feature extraction," IEEE Transactions on Geoscience and Remote Sensing, vol. 40, no. 10, October 2002.

[8] D. W. Bouldin and D. L. Davies, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, no. 2, pp. 224-227, 1979.

[9] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. Natl. Acad. Sci. USA, vol. 79, pp. 2554-2558, April 1982.

BIOGRAPHIES

José Jesús López Vázquez (b. 1967) received a degree in Electrical Engineering and a Master's degree in Computer Science and Artificial Intelligence from the University of Málaga, Málaga, Spain. Currently, he is an Associate Professor at the Department of Electrical Engineering of the University of Málaga (1993-2008). His research topics include operation and planning of electric power systems.

José A. Aguado received the Ingeniero Industrial and Ph.D. degrees from the University of Málaga, Málaga, Spain, in 1997 and 2001, respectively. Currently, he is an Associate Professor and Head of the Department of Electrical Engineering at the University of Málaga. His research interests include operation, planning, and deregulation of electric energy systems and numerical optimization techniques.

Francisco Ignacio Martín Moreno received his Licenciado en Matemáticas degree from the University of Granada (Spain), his Ingeniero Industrial degree from the UNED, and the Ph.D. degree from the University of Málaga (Spain). He was a visiting professor at the University of British Columbia (Canada). He is currently a Professor at the Department of Electrical Engineering, University of Málaga. His research areas of interest are power system protection, time-frequency analysis, and power quality.

Francisco Jesús Muñoz Gutiérrez (b. 1964) received a degree in Electronic and Industrial Automation and a Master's degree in Computer Science and Artificial Intelligence from the University of Málaga, Málaga, Spain. Presently, he is Dean of the Polytechnic School at the University of Málaga, where he is also an Associate Professor (1986-2008) at the Department of Electrical Engineering. His research topics include transmission lines and artificial intelligence techniques applied to power systems.

Alejandro Rodríguez Gómez (b. 1971) received a degree in Electrical Engineering and a Master's degree in Computer Science and Artificial Intelligence from the University of Málaga. Presently, he is an Associate Professor at the Department of Electrical Engineering of the University of Málaga (1995-2008). His research topics include power quality and renewable energies.

José Ernesto Ruiz González (b. 1971) received a degree in Electronic and Industrial Automation Engineering and a Master's degree in Computer Science and Artificial Intelligence from the University of Málaga. Presently, he is a Lecturer at the Department of Electrical Engineering of the University of Málaga (1995-2008). His research topics include power quality and renewable energies.
