
UNSUPERVISED LEARNING STRATEGIES FOR THE DETECTION AND CLASSIFICATION OF TRANSIENT PHENOMENA ON ELECTRIC POWER DISTRIBUTION SYSTEMS

David L. Lubkeman, Chris D. Fallon, Adly A. Girgis

Department of Electrical and Computer Engineering
Clemson University
Clemson, South Carolina 29634-0915

ABSTRACT

A number of utilities are currently installing high-speed data acquisition equipment in their distribution substations. This equipment will make it possible to record the transient waveforms due to events such as low- and high-impedance faults, capacitor switching, and load switching. This paper describes the potential of applying unsupervised learning strategies to the classification of the various events observed by a substation recorder. Several strategies are tested using simulation studies and the effectiveness of unsupervised learning is compared to current classification strategies as well as supervised learning.

Keywords: Fault Classification, Unsupervised Learning, Power Distribution Fault Analysis.

INTRODUCTION

The classification of transient disturbances on electric power distribution feeders is a challenging problem. Traditional protective relaying schemes are capable of correctly identifying major disturbances, such as low-impedance faults. However, there are presently no commercial on-line detection schemes that can be applied to differentiate between transient disturbances due to events such as high-impedance faults, transformer inrush and load switching. Modern data recording systems are capable of sampling three-phase voltages and currents at sampling rates up to 6 kHz. This makes it possible to calculate lower-order harmonics and the extent of higher-frequency noise. Although different types of disturbances appear to have certain unique signature characteristics, the classification of these events is not a trivial task. Each feeder has different load and line characteristics, which would mean that any detection scheme would have to be adaptable to a particular system. Also, the voltage and current waveforms in a distribution system are rich in high-frequency noise with a very wide frequency spectrum (white noise) and a number of harmonics. Any successful scheme for event detection and classification would have to be based on some type of adaptive pattern recognition technique.

Neural networks have successfully been applied to a number of power engineering-oriented classification problems [1]. They have been especially useful when the essential features of the patterns are unknown or difficult to characterize. Initial efforts to train a neural network to differentiate between high-impedance faults and other transient events have already been described by Ebron, et al. [2]. In this study, a number of training cases for a typical 12 kV distribution system were developed through the use of the Electromagnetic Transients Program (EMTP). The standard backpropagation learning algorithm was then applied to train the network. The results indicated that this type of supervised learning could be successfully applied. Unfortunately, the problem with a classification approach based on supervised learning is that a substantial amount of effort would be required to obtain the training cases. Also, because actual fault waveforms differ from simulation results, it would eventually be necessary to train the network on real network data. Since the types of events needed to train the network would not naturally occur over a short period of time, it would be necessary to stage the events in order to gather the data.

A more desirable approach to the classification of transient events would involve unsupervised learning. A network that learns without supervision is appropriate since there is no requirement for a priori knowledge of the relationship between input patterns and the events to be classified. Unfortunately, if an event is not consistently associated with a characteristic pattern of activity, then that event cannot be classified. One could also employ unsupervised learning as a tool to discover the kinds of events that could potentially be classified with a given feature set.

This paper discusses the feasibility of applying unsupervised learning techniques to the classification of transient events on distribution networks. The specific unsupervised learning schemes applied include the self-organizing mapping scheme introduced by Kohonen as well as a model based on adaptive resonance theory which is attributed to Grossberg. Training and testing cases are based on EMTP program simulations of typical transient events on a model power distribution system. The performance of the unsupervised learning schemes is also compared to that obtained by applying supervised learning based on backpropagation.

OVERVIEW OF LEARNING STRATEGIES

Unsupervised vs. Supervised Learning

What the distribution engineer would like to have as a disturbance classifier is a black box that can be hooked up to a transient recorder with minimal setup. The engineer would not normally have the resources necessary to develop a set of cases that would be needed to train a neural network-based classifier requiring supervised training, such as a backpropagation network. Instead, it would be desirable that this classifier could learn how to differentiate between the various events on its own, with minimal interaction on the part of the engineer.

Neural networks are trained by applying sets of input pattern vectors and adjusting the network weights according to a predetermined learning strategy. There are two basic types of learning strategies: supervised and unsupervised. Supervised learning requires a set of training pairs, consisting of input vectors with target vectors corresponding to the desired output. The goal of a supervised training strategy is to minimize the error between the output produced by each input vector and the desired output for a set of training vectors. A popular neural network based on supervised training is the backpropagation network.

The basic problem with strategies employing supervised training is that there are many situations in which the relationships between the classifier inputs and the appropriate output target patterns cannot be determined ahead of time. It is for these types of situations that unsupervised learning strategies have been developed that do not require an output target vector. The training set would only consist of input vectors corresponding to the features of the event to be classified. The goal of the training algorithm in this case would be to adjust network weights to produce output vectors that are consistent. In other words, the application of similar input vectors, corresponding to the same class of events, would produce the same output pattern. Although a vector from a certain event class would produce a specific output, there would be no way to determine, before training the network, which output pattern would be produced by a given input vector belonging to a certain class of events. Although a number of conventional algorithms for performing the unsupervised clustering described above have already been developed, such as the K-Means and ISODATA algorithms [3], this paper will only focus on two common neural network approaches for unsupervised learning.

Self-organized Mapping

One strategy for unsupervised learning is self-organized mapping, based on work done by T. Kohonen [4]. The self-organizing map is a neural network which maps an n-dimensional input space onto a two-dimensional grid. A Kohonen layer employs a competitive or "winner-take-all" strategy. For a given input vector, only one neuron will output a logical one, with all

Authorized licensed use limited to: University of Houston Clear Lake. Downloaded on February 13, 2009 at 00:42 from IEEE Xplore. Restrictions apply.
the other outputs set to zero. The Kohonen layer is capable of grouping input vectors into clusters which correspond to the categories that the input vectors belong to. This is accomplished by adjusting the Kohonen layer weights so that input vectors with similar features activate the same Kohonen neuron. Eventually, the weights of a neuron will be the average of the class of input vectors that activate it. Kohonen used this type of network for speech recognition in which he created phoneme maps.

The structure of this network consists of three layers: an input layer, a Kohonen layer and a competition layer, as shown in Figure 1. The Kohonen layer is basically a two-dimensional array of neurons, where each input neuron is fully connected to those in the Kohonen layer. The Kohonen layer is also fully connected to the competition layer, which only contains a single neuron. Also, the neurons of the Kohonen layer are connected to neighboring neurons. This interconnection allows the self-organized mapping strategy to maintain spatial relationships among nearby elements of the grid.

Before training, the Kohonen layer is initialized such that the neurons' weights are set to points on a grid in the unit square, defined by the first two coordinates of the input space, where the coordinates vary between zero and one in each dimension. When an input vector is presented to the network, each Kohonen neuron computes the distance between its weight vector and the input vector, where this distance is given by

$$ d_k = \lVert x - w_k \rVert $$

The neuron in the competition layer then determines the "winner", which is the Kohonen neuron with the smallest distance. This is referred to as a competitive learning strategy since the neurons compete against each other for the ability to modify their weights. The winning Kohonen neuron's weights are then adjusted to move it closer to the input vector. In this particular implementation of the self-organizing map, the winning neuron's neighbors are also adjusted such that their weight vectors are also moved closer to the input vector. The adjustment of the neighboring neurons is needed to maintain the spatial integrity of the grid. The weight adjustments are as follows:

$$ w_k^{new} = w_k^{old} + \alpha \, (x - w_k^{old}), \quad \text{if } k \text{ is the winner} $$

$$ w_k^{new} = w_k^{old} + \beta \, (x - w_k^{old}), \quad \text{if } k \text{ is a neighbor of the winner} $$

where $\alpha$ and $\beta$ denote the learning rates for the winner and for its neighbors. As the learning progresses, the neuron weight vectors are spread out such that each weight vector represents a region in the input space. That is, each neuron's weight vector becomes the prototype for inputs in that region. The learning rate for the neighbors is also subject to a cooling factor, which determines how quickly the neighborhood effect is reduced to zero.

One potential problem with Kohonen networks is that if the input vectors are very similar and the initialization process spreads the initial weight vectors over a wide range, then only a few neurons will get involved in the learning process. To rectify this problem, this implementation of the self-organizing map also includes a conscience mechanism developed by Desieno [5]. The conscience mechanism is used to monitor each Kohonen neuron's history of success in the competitive learning process. If a neuron wins too often, then the conscience mechanism takes that unit temporarily out of the competition. This will allow the neurons in undersampled areas to get involved in the competitive learning process.

A preprocessing strategy is needed to normalize the training input vectors before using them as inputs to the network. This is accomplished by dividing each component of an input vector by that vector's length. This has the effect of converting each input into a unit vector in n-dimensional space. Hence, each input vector terminates on the surface of a hypersphere.

Figure 1 Self-Organized Mapping Network

Adaptive Resonance Theory

The human brain has the ability to process new memories as they arrive and still keep from erasing or corrupting existing memories. One of the disadvantages of a backpropagation network is that the addition of a new input vector to the training set may require that the network be completely retrained. This makes the backpropagation network unsuited for incremental learning in certain environments. A neural network designed for incremental learning is the ART network, based on the application of adaptive resonance theory and developed by Carpenter and Grossberg [6]. There are several variations of the ART network. ART1 was developed for binary signals and ART2 was developed for analog signals.

The ART network is basically a pattern vector classifier which can also be used for unsupervised learning. That is, it accepts an input vector pattern and classifies it into a category depending on patterns already seen by the network. The desired output does not have to be known ahead of time to train the network. If an input vector pattern does not match up to anything stored by the ART network, then a new category is created. If the input vector pattern is matched with a pattern category, then the weights corresponding to that pattern category are modified to make it more like the input vector. Hence new input pattern vectors will modify weights corresponding to stored patterns if the match is within a certain tolerance referred to as the vigilance factor.

An ART2, two-layer network, is illustrated in Figure 2 below. The classification strategy consists of three stages: recognition, comparison and search. New input pattern vectors are learned and classified by modifying the bottom-up weights from the F1 neurons to the F2 neurons. Neurons in the F2 layer then compete for the ability to match up with the input pattern, where each neuron in the F2 layer corresponds to a pattern category. The top-down weights from the F2 layer then provide an expectation to the F1 layer of what a typical pattern should look like for a given category. A vigilance factor specified by the user then determines what degree of recognition is required for a match to be declared. If the F2 match is close enough to the input vector pattern, then resonance is said to occur. Network weights are then adjusted to make the stored pattern look more like the input. In this manner, the weights for a given pattern reflect the average of the patterns in a given category. However, if a mismatch occurs, then the F2 neuron is inhibited and the process is repeated until a match occurs. If the network is unable to match the input pattern with any existing category, then the network creates a new pattern category, using the input vector as a prototype for a new category.
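
The recognition, comparison and search cycle of the ART network can be illustrated with a much-reduced sketch. This is not the ART2 architecture of Carpenter and Grossberg: it assumes unit-length inputs, uses cosine similarity as the match measure, and replaces the bottom-up and top-down weight sets with a single prototype per category. The names `classify`, `prototypes`, `vigilance` and `rate` are hypothetical and chosen only for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    length = math.sqrt(sum(c * c for c in v))
    return list(v) if length == 0.0 else [c / length for c in v]


def classify(pattern, prototypes, vigilance, rate=0.5):
    """Assign `pattern` to a category, creating a new one if nothing matches.

    Simplified ART-style cycle (an assumption, not the full ART2 dynamics):
      recognition: rank stored prototypes by similarity to the input
      comparison:  accept a candidate if its similarity >= vigilance
      search:      otherwise reject it and try the next candidate
    Returns the index of the category that matched or was created.
    """
    x = normalize(pattern)
    # Recognition: dot product of unit vectors, i.e. cosine similarity.
    scores = [sum(a * b for a, b in zip(x, p)) for p in prototypes]
    order = sorted(range(len(prototypes)), key=lambda j: scores[j], reverse=True)
    # Search: because ranking and vigilance use the same measure here, the
    # loop effectively stops at the first candidate; ART2 uses separate
    # measures, so its search can visit several categories.
    for j in order:
        if scores[j] >= vigilance:
            # Resonance: move the stored prototype toward the input.
            moved = [p + rate * (a - p) for p, a in zip(prototypes[j], x)]
            prototypes[j] = normalize(moved)
            return j
    # No category is close enough: the input becomes a new prototype.
    prototypes.append(x)
    return len(prototypes) - 1


categories = []
for event in ([1.0, 0.0], [0.0, 1.0], [0.99, 0.05]):
    print(classify(event, categories, vigilance=0.95))
```

In this sketch a vigilance close to one splits inputs into many categories, while a vigilance of zero merges every input into the first category.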

The network user needs to specify the size of the F1 layer, which corresponds to the size of the input vector, the number of neurons to be used in the F2 layer, a vigilance parameter, and some other miscellaneous learning parameters. The successful application of this type of network is highly dependent on the selection of the vigilance factor. If the vigilance is set too high, then input vector patterns will fail to match up to those stored in memory, resulting in a large number of pattern categories being created. In this case, the network fails to generalize correctly, since only a slight variation of a pattern will create a new class. However, if the vigilance is set too low, then different categories will become indistinguishable and get grouped together. This usually necessitates some type of supervision to adjust the vigilance factor.

Figure 2 ART2 Network (blocks: input pattern, short-term memory F1, LTM weights, short-term memory F2, STM reset)

DESIRED FEATURES OF A NEURAL NETWORK-BASED EVENT CLASSIFICATION STRATEGY

The success of a strategy for classifying transient events will be highly dependent on the features presented to the neural network. Neural networks do not normally operate on raw data. Some form of preprocessing involving filtering, computing the discrete Fourier transform components or scaling is usually essential. A neural network approach to the classification of power system disturbances, as illustrated in Figure 3, would consist of three basic tasks: collecting a set of sampled feeder line currents and voltages corresponding to abnormal and normal conditions, using this set to train a neural network, and testing the network on a separate set of processed line currents and voltages. The preprocessor is an integral part of this strategy since it conditions the raw data into a form suitable for input into the neural net, as illustrated in Figure 4. Such a detection strategy could be based on parameters such as changes in sequence components, variations in the non-60 Hz components in the current waveform, and abnormal high-frequency noise. These parameters would be calculated by a preprocessor for a number of windows, where each window represents a certain time period. This allows the network to make use of changes in parameters over time. The parameters which existed for the n-window range would then form one input vector for a neural network. The associated output vector would then be used to indicate the type of transient event.

Feature selection is more of an art than a science. The goal of feature selection is to eliminate as much unnecessary information as possible while still retaining the salient information in a compact form. Useful features are those which vary widely from class to class, are easy to measure and calculate and which are not correlated with other features. This process is difficult to automate and must be based on an intuitive understanding of the classification problem. For identifying disturbances on distribution systems, the following items have typically been looked at as possible features:

Magnitude and phase of currents
Magnitude and phase of voltages
Harmonic components
Deviations in frequency
Difference between pre- and post-event values
Rate of change of above quantities.

The detection and classification of low-impedance faults is fairly straightforward since this only involves discriminating between the high current magnitudes associated with faults and the normal load currents. However, the detection of events associated with switching or high-impedance faults requires a more detailed analysis of transients.

Each disturbance due to capacitor switching, faults, etc. is accompanied by transients in the current and voltage waveforms. Certain aspects of transients are unique to the type of disturbance, while others are common to all of them. The frequency and rate of decay of these transients depend on the type of disturbance and the location of the event causing the disturbance. For example, capacitor switching creates both voltage and current transients. The voltage transients are based on the natural frequency of the system, which normally varies between 250 and 1000 Hz. These transients may decay within half a cycle. The capacitor switching can also magnify the harmonic distortion.

High-impedance faults also produce transients. However, these transients may decay faster due to the high attenuation produced by the fault impedance. It is interesting to note that this type of fault is not easily detected by conventional relaying schemes since this fault's characteristics are similar to other transient events. Switching a parallel transformer to satisfy loading conditions may result in transients or inrush current. In some instances, this inrush may be incorrectly interpreted as a fault condition. High-impedance faults can exhibit arcing of a highly random nature, resulting in fault currents with noticeable high-frequency components. Yet this same behavior can result from such normal operations as capacitor switching and transformer tap changing, so frequency monitors are also unreliable. These attributes of high-impedance faults make them very difficult to detect and the identification of salient features is an ongoing area of research [7].

The detection of certain types of events based on the monitoring of waveforms at the substation is not a trivial task. The waveforms of the voltage or the current in a distribution system are rich in high-frequency noise with a very wide frequency spectrum and a number of harmonics. It is reasonable to expect that the level of these harmonics and high-frequency transients will increase in the future. Also, each feeder has different load and line characteristics, which would mean that any detection scheme would have to be adaptable to a particular system.

Figure 3 Creation and Management of Fault Data
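
As one illustration of the windowed preprocessing described in this section, the following sketch computes harmonic magnitudes for consecutive one-cycle windows and concatenates them into a single input vector. It assumes 100 samples per cycle (a 6 kHz sampling rate on a 60 Hz system) and a plain single-bin discrete Fourier transform; the function names and the choice of one-cycle windows are hypothetical and not taken from the paper.

```python
import cmath
import math


def harmonic_magnitudes(samples, max_harmonic):
    """Peak magnitudes of harmonics 1..max_harmonic for one full cycle.

    `samples` must cover exactly one fundamental cycle, and `max_harmonic`
    must stay below half the number of samples.
    """
    n = len(samples)
    mags = []
    for h in range(1, max_harmonic + 1):
        # Single-bin discrete Fourier transform at harmonic h.
        acc = sum(s * cmath.exp(-2j * math.pi * h * k / n)
                  for k, s in enumerate(samples))
        mags.append(2.0 * abs(acc) / n)
    return mags


def window_features(waveform, samples_per_cycle, max_harmonic):
    """Concatenate harmonic magnitudes from consecutive one-cycle windows."""
    features = []
    last_start = len(waveform) - samples_per_cycle
    for start in range(0, last_start + 1, samples_per_cycle):
        window = waveform[start:start + samples_per_cycle]
        features.extend(harmonic_magnitudes(window, max_harmonic))
    return features


# Two cycles of a synthetic current: fundamental plus a 20% third harmonic.
N = 100  # assumed samples per cycle (6 kHz / 60 Hz)
current = [math.sin(2 * math.pi * k / N) + 0.2 * math.sin(2 * math.pi * 3 * k / N)
           for k in range(2 * N)]
print(window_features(current, N, 3))
```

Keeping only a few harmonics per window is one way to limit the length of the input vector as the number of windows grows.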

Figure 4 Neural Network-Based Classification Scheme (blocks: phase current and neutral current sample sets, raw data analysis, data preprocessor, neural network, event classification; output classes: fault, HIF fault, capacitor switching, transformer inrush, load switching)

SIMULATION RESULTS

In order to evaluate the suitability of unsupervised strategies such as self-organized mapping and adaptive resonance theory, as opposed to supervised strategies such as backpropagation, it was necessary to derive suitable test and training sets. A simulation of a typical radial distribution feeder with multiphase laterals and loads was used to create case studies. For the purpose of the transient simulations, the system was modelled as mutually coupled resistive and inductive transmission lines with lumped loads to best simulate actual distribution feeders. Transient data was generated using EMTP by creating faults at different locations within the network. The fault type, loading conditions, fault resistance and the point on the voltage waveform when the fault occurred were varied throughout the simulations. The 60 Hz phasor quantities of the fault-induced transient data were estimated by means of an optimal estimation algorithm [8]. The data produced was the pre-fault voltage and current phasors and the post-fault voltage and current phasors taken after the Kalman filter algorithm converged, typically one-half to three-quarters of a cycle after the detection of the transient. A training set with 75 events was constructed as well as a test set with 75 events.

In this initial study, only three types of events were considered: single-line-to-ground faults, ungrounded-line-to-line faults, and grounded-line-to-line faults. It was decided to fully explore this simpler set of events before moving on to more complex types of events involving additional frequency domain information. Hence the goal of the classification process was to select which of the three fault events occurred. Single-line-to-ground faults can be characterized by a large increase in current on a single phase only, while line-to-line faults are characterized by substantial increases in current on two phases. What distinguishes an ungrounded-line-to-line fault from a grounded-line-to-line fault is the fact that the latter results in an increase in zero sequence current [9].

The first set of studies involved a 12 element input pattern vector consisting of three-phase pre- and post-fault current magnitudes and angles, as measured at the substation. To test whether the inputs were sufficient for characterizing the fault types, a backpropagation network consisting of 1 hidden layer with 12 neurons was presented with the training data. The network was able to correctly classify 98% of the cases after about 100 iterations. A self-organized mapping network with a seven by seven Kohonen layer was then presented with the training set. The mapping results are shown in Figure 5a. The grounded-line-to-line faults correspond to the triangles, the ungrounded-line-to-line faults are represented by the squares while the single-line-to-ground faults are represented by the circles. As shown in the map, the single-line-to-ground faults are well grouped, while there is good, but not ideal, differentiation between the two sets of line-to-line faults.

Next, an ART2 network with 12 neurons in the F1 layer and 4 neurons in the F2 layer with a vigilance factor of 0.95 was tested. The network accurately classified the single-line-to-ground faults, but could only obtain a success rate of about 80% when trying to differentiate between the two types of line-to-line faults.

Figure 5a Results of Self-Organized Map for First Set of Cases

A second set of studies with a reduced pattern vector size was then created. The new pattern consisted of the difference between the pre- and post-fault current magnitudes for each phase, as well as the difference between pre- and post-fault zero sequence current, to make a total of 4 inputs. A backpropagation network consisting of 1 hidden layer with 4 neurons was able to achieve the same 98% classification success as cited before. The same data set was also applied to the same self-organized mapping network as described above, with results as shown in Figure 5b. Again there is a noticeable differentiation made in the three types of events.

An ART2 network with 4 neurons in the F1 layer and 5 neurons in the F2 layer with a vigilance factor of 0.9r was then tested. As before, the network was able to differentiate between single-line-to-ground and line-to-line faults. However, the network was not able to completely differentiate between grounded and ungrounded line-to-line faults. A number of different variations were made to the network without much success. The two types of faults differ in that one has a large zero sequence component while the other doesn't. Apparently the problem was that there was too great a difference in [illegible] within each class of line-to-line faults. When the vigilance factor was increased, instead of differentiating between the two types of faults, the network would only take each type of line-to-line fault and divide its cases into smaller subclasses.
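
The four-element pattern used in the second set of studies can be sketched as follows. Two details are assumptions made for illustration rather than statements from the paper: the differences are taken as post-fault minus pre-fault, and the zero sequence entry is the change in magnitude of the zero sequence phasor, computed as one third of the phasor sum of the phase currents. The function names are hypothetical.

```python
import cmath
import math


def zero_sequence(ia, ib, ic):
    """Zero sequence phasor: one third of the phasor sum of the phase currents."""
    return (ia + ib + ic) / 3.0


def fault_pattern(pre, post):
    """Build a 4-element pattern from pre- and post-fault current phasors.

    `pre` and `post` are (Ia, Ib, Ic) tuples of complex phasors.
    Elements 0..2: change in current magnitude on each phase.
    Element 3:     change in zero sequence current magnitude.
    """
    pattern = [abs(after) - abs(before) for before, after in zip(pre, post)]
    pattern.append(abs(zero_sequence(*post)) - abs(zero_sequence(*pre)))
    return pattern


# Balanced 1.0 per-unit load currents before the event.
pre_fault = tuple(cmath.rect(1.0, math.radians(angle))
                  for angle in (0.0, -120.0, 120.0))
# Illustrative single-line-to-ground case: phase a current rises to 5.0 per unit.
post_fault = (cmath.rect(5.0, 0.0), pre_fault[1], pre_fault[2])
print(fault_pattern(pre_fault, post_fault))
```

For an ungrounded line-to-line event the last element would stay near zero, which is the property the text relies on to separate the two kinds of line-to-line faults.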

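
For reference, the competitive learning step behind the self-organized maps used in both sets of studies (see the Self-organized Mapping section) can be sketched as below. Several choices are assumptions made only for illustration: the rate names `alpha` and `beta`, the eight surrounding grid cells as the neighborhood, weight coordinates beyond the first two starting at zero, and a multiplicative cooling of the neighbor rate once per pass. The conscience mechanism is omitted.

```python
import math


def unit(v):
    """Normalize a nonzero vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def make_grid(rows, cols, dim):
    """Place initial weights on a grid in the unit square of the first two inputs.

    Requires rows >= 2, cols >= 2 and dim >= 2.
    """
    grid = {}
    for r in range(rows):
        for c in range(cols):
            w = [0.0] * dim  # assumption: remaining coordinates start at zero
            w[0] = r / (rows - 1)
            w[1] = c / (cols - 1)
            grid[(r, c)] = w
    return grid


def winner(grid, x):
    """Grid position whose weight vector is closest to the input x."""
    return min(grid, key=lambda pos: distance(grid[pos], x))


def train_step(grid, x, alpha, beta):
    """Move the winner (rate alpha) and its grid neighbors (rate beta) toward x."""
    win = winner(grid, x)
    for pos in list(grid):
        near = abs(pos[0] - win[0]) <= 1 and abs(pos[1] - win[1]) <= 1
        rate = alpha if pos == win else (beta if near else 0.0)
        if rate:
            grid[pos] = [w + rate * (xi - w) for w, xi in zip(grid[pos], x)]
    return win


def train(grid, patterns, passes, alpha=0.3, beta=0.1, cooling=0.9):
    """Present every pattern once per pass, cooling the neighbor rate each pass."""
    for _ in range(passes):
        for p in patterns:
            train_step(grid, unit(p), alpha, beta)
        beta *= cooling  # the neighborhood effect decays toward zero


kohonen = make_grid(7, 7, 4)  # seven by seven layer, four inputs
train(kohonen, [[4.0, 0.1, 0.1, 1.3], [3.0, 3.0, 0.1, 0.1]], passes=10)
print(winner(kohonen, unit([4.0, 0.1, 0.1, 1.3])))
```

Plotting the winning grid position for each training case, labeled by its known event type, gives the kind of map shown in Figures 5a and 5b.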
Figure 5b Results of Self-Organized Map for Second Set of Cases

CONCLUSIONS

It is apparent that developing an event classification scheme based totally on unsupervised learning will be a difficult task. Obviously some type of initial supervised training would be required before such a device is used in the field. This could be accomplished by using a simulation of the network to which this type of device is to be attached. Although this would take quite a bit of extra work, an event classifier would then only require unsupervised learning to compensate for the difference between the simulation model and the waveforms which actually occur on the feeder.

The practical application of neural networks will also involve the integration of small, special-purpose networks with conventional fault detection algorithms. It will be difficult to construct a large neural network with a large number of inputs that will be able to classify a multitude of different events. There is no need to have a network learn that the zero sequence current, which is the phasor sum of the three-phase currents, corresponds to a fault involving ground. One could incorporate this type of knowledge into a procedural algorithm. However, a special-purpose neural network component could be embedded to help differentiate between two events which are very similar, given that only a limited comparison between the two is required.

Future work is also required on how to best incorporate additional salient features into an input pattern vector. There is a need to mix in additional frequency domain information related to non-60 Hz phenomena. This will be necessary to classify other types of events, such as capacitor switching and high-impedance faults. However, problems with dimensionality will be encountered if one were to attempt to add a complete set of three-phase values for each measurable harmonic.

REFERENCES

[1] Proceedings of the Workshop on Applications of Artificial Neural Network Methodology in Power Systems Engineering, April 8-10, 1990, Clemson University.

[2] Sonja Ebron, David Lubkeman and Mark White, "Neural Net Processing Approach to the Detection of High Impedance Faults", IEEE Transactions on Power Delivery, Vol. 5, No. 2, April 1990, pp. 905-914.

[3] Yoh-Han Pao, Adaptive Pattern Recognition and Neural Networks, Addison-Wesley, Reading, Massachusetts, 1989.

[4] T. Kohonen, Self-Organization and Associative Memory, Springer-Verlag, Berlin, 1984.

[5] Duane Desieno, "Adding a Conscience to Competitive Learning", Proceedings of the IEEE International Conference on Neural Networks, Volume 1, July 1988, IEEE Press, pp. 117-124.

[6] Gail A. Carpenter and Stephen Grossberg, "ART 2: Self-Organization of Stable Category Recognition Codes for Analog Input Patterns", Applied Optics, Vol. 26, No. 23, 1987, pp. 4919-30.

[7] Adly A. Girgis, Wenbin Chang, Elham B. Makram, "Analysis of High-Impedance Fault Generated Signals using a Kalman Filtering Approach", IEEE Transactions on Power Delivery, Vol. 5, No. 4, November 1990, pp. 1714-1724.

[8] A.A. Girgis, R.G. Brown, "Application of Kalman Filtering in Computer Relaying", IEEE Transactions on Power Apparatus and Systems, Vol. PAS-100, No. 7, July 1981, pp. 3387-3397.

[9] A.G. Phadke, M. Ibrahim, T. Hlibka, "Fundamental Basis for Distance Relaying with Symmetrical Components", IEEE Transactions on Power Apparatus and Systems, Vol. PAS-96, No. 2, March/April 1977, pp. 635-646.