
IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 20, NO. 1, FEBRUARY 2005

Feature Extraction via Multiresolution Analysis for Short-Term Load Forecasting
Agnaldo J. Rocha Reis, Member, IEEE, and Alexandre P. Alves da Silva, Senior Member, IEEE

Abstract—The importance of short-term load forecasting has been increasing lately. With deregulation and competition, energy price forecasting has become a big business. Bus-load forecasting is essential to feed analytical methods utilized for determining energy prices. The variability and nonstationarity of loads are becoming worse, due to the dynamics of energy prices. Besides, the number of nodal loads to be predicted does not allow frequent interactions with load forecasting experts. More autonomous load predictors are needed in the new competitive scenario. This paper describes two strategies for embedding the discrete wavelet transform into neural network-based short-term load forecasting. Its main goal is to develop more robust load forecasters. Hourly load and temperature data for North American and Slovakian electric utilities have been used to test the proposed methodology.

Index Terms—Load forecasting, neural networks, wavelets.

Manuscript received May 3, 2004. This work was supported by the Brazilian Research Council (CNPq). Paper no. TPWRS-00593-2003.
A. J. da Rocha Reis is with the Systems Engineering Group (GESis), Federal University of Itajubá, AV BPS 1303, Itajubá, MG, 37.500-903, Brazil, and the Department of Control Engineering and Automation, Federal University of Ouro Preto, Campus Morro do Cruzeiro, Ouro Preto, MG, 35.400-000, Brazil (e-mail: agnaldo_reis@yahoo.com).
A. P. Alves da Silva is with PEE-COPPE, Federal University of Rio de Janeiro, C.P. 68504, Rio de Janeiro, RJ, 21945-970, Brazil (e-mail: alex@coep.ufrj.br).
Digital Object Identifier 10.1109/TPWRS.2004.840380
0885-8950/$20.00 © 2005 IEEE

I. INTRODUCTION

ARTIFICIAL neural networks (NNs) have been successfully applied to Short-Term Load Forecasting (STLF) [1]–[4]. Many electric utilities that previously had employed STLF tools based on classical statistical techniques now are using NN-based STLF tools [5]. A comprehensive review of the application of NNs to STLF can be found in [6]. A certain regularity of the data is an important precondition for the successful application of NNs [7]. When using classical statistical techniques, a stationary process is assumed for the data. For load time series, an assumption of stationarity has to be discarded most of the time. Besides, one has to bear in mind that different kinds of nonstationarities may exist.

In order to tackle the problem of nonstationarity, wavelets have been utilized because they can produce a good local representation of the signal in both time and frequency domains [8]. Moreover, the wavelet decomposition, as a multiscale analysis tool, can be used to unfold inner load characteristics, which are useful for more precise forecasting. In this paper, the effectiveness of forecasting strategies that exploit multiresolution representation via wavelet transform is investigated. The goal is to identify different sources of useful information embedded in a load time series.

The MultiResolution Analysis (MRA) approach for tackling feature extraction problems splits up the load series into one low-frequency and some high-frequency subseries in the wavelet domain. Using this new representation of the original load signal, two different alternatives are investigated. The first one is new. It consists of creating a model for STLF whose inputs are based on information from the original load sequence and from the wavelet domain subseries, too. The second alternative predicts the load's future behavior by independently forecasting each subseries in the wavelet domain. The latter has been proposed in [9]–[11].

The basic difference between the two alternatives is the way that features from the subseries in the wavelet domain are treated. In the previous work, after predicting each load subseries, the final forecast is obtained by returning to the original domain (reconstruction). In the proposed alternative, reconstruction is not needed, and features from different subseries are combined in the same model. There are also a few papers on another possible way of merging wavelets with NNs [12], [13]. These papers have shown the viability of the so-called wavenets for load forecasting. On the other hand, our main contribution is related to the representation aspect of the STLF problem, not to a particular model by itself. In any case, no previous work on load forecasting via decomposition using wavelets has considered the distortion at the end points of the load series caused by filtering.

The feasibility of the aforementioned approaches for STLF is verified using hourly load and temperature data from North American and Slovakian electric utilities. The tests evaluate 1- to 24-hr ahead predictions. This paper is divided as follows: Section II deals with some basic theoretical aspects of the wavelet transform and MRA. In Section III, the proposed forecasting models are described. They are compared through forecasting simulations in Section IV. Finally, Section V presents the main conclusions of this work.

II. WAVELET TRANSFORM

Wavelets can be described as a pulse of short duration with finite energy that integrates to zero. The basic fact about them is that they are located in time (or space), unlike trigonometric functions. This characteristic enables them to analyze a great deal of nonstationary signals [14].

A. Motivation for Using Wavelet Transforms

When using Fourier transforms for analyzing a signal, it is very hard to say when a particular event took place because the basis functions used in Fourier analysis are precisely located in

Fig. 1. Examples of mother wavelets. (a) Haar. (b) Daubechies of order 2. (c) Daubechies of order 3. (d) Daubechies of order 4.

frequency but are applied all the time. This drawback implies that the conventional Fourier analysis is suited for dealing with frequencies that do not evolve with time, i.e., stationary signals. The short-time Fourier transform, which takes into account short data windows, has been tried in order to deal with nonstationary signals. However, low-frequency estimates require long windows, while the high-frequency ones need small windows.

On the other hand, it is very well known that electric load series contain several nonstationary features such as trends, changes in level and slope, and seasonalities, to name a few. These features are often the most important and challenging parts of the load signal and must be taken into account when dealing with nonstationarity. Hence, it is plain to note that loads' characteristics challenge the traditional Fourier analysis. Wavelet analysis overcomes the limitations of the Fourier methods by using functions that retain a useful compromise between time location and frequency information. Implicitly, wavelets have a window that automatically adapts itself to give the appropriate resolution.

B. Continuous and Discrete Wavelet Transforms

Wavelet analysis employs a prototype function called mother wavelet ψ(t). This function has null mean and sharply drops in an oscillatory way, as in Fig. 1. Data are represented via superposition of scaled and translated versions of the prespecified mother wavelet. The Continuous Wavelet Transform (CWT) of a given signal f(t), with respect to ψ(t), is defined in (1), where a and b are the scale and translation factors, respectively.

\mathrm{CWT}(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} f(t)\, \psi\!\left(\frac{t - b}{a}\right) dt \qquad (1)

A CWT coefficient, at a particular scale and translation, represents how well the original signal and the scaled/translated mother wavelet match. Thus, the set of all wavelet coefficients CWT(a, b), associated with a particular signal f(t), is the wavelet representation of the signal with respect to the mother wavelet ψ(t).

Since the CWT is achieved by continuously scaling and translating the mother wavelet, substantial redundant information is generated. Therefore, instead of doing that, the mother wavelet can be scaled and translated using certain scales and positions based on powers of two. This scheme is more efficient and just as accurate as the CWT. It is known as the Discrete Wavelet Transform (DWT) and defined as

\mathrm{DWT}(m, k) = \frac{1}{\sqrt{2^m}} \sum_{t} f(t)\, \psi\!\left(\frac{t - k\,2^m}{2^m}\right) \qquad (2)

The scaling and translation parameters a and b, in (1), are functions of the integer variable m (a = 2^m and b = k 2^m). In (2), k is an integer variable that refers to a particular point of the input signal, and t is the discrete time index.

Fig. 2. Single-resolution analysis via Mallat's algorithm (S = A1 + D1) (adapted from [16]).

Fig. 3. Multiple-level decomposition scheme (S = A2 + D2 + D1).

C. Multiresolution and Mallat's Pyramidal Algorithm

A fast DWT variant, using filters, was developed by Mallat in [8]. Multiresolution via Mallat's pyramidal algorithm is a procedure to obtain "approximations" and "details" from a given signal. An approximation is a low-frequency representation of the original signal, whereas a detail is the difference between two successive approximations. An approximation holds the general trend of the original signal, whereas a detail depicts its high-frequency components.

The pyramidal algorithm presents two stages: decomposition (analysis) and reconstruction (synthesis). The former calculates the Fast Wavelet Transform (FWT), while the latter computes the Inverse Fast Wavelet Transform (IFWT). Multiresolution can be obtained by using a filter bank composed of L, H, L′, and H′, as shown in Fig. 2. The lowpass and highpass decomposition filters (L and H), together with their corresponding reconstruction filters (L′ and H′), form a system called quadrature mirror filters. In this paper, the filters' coefficients [analogous to ψ in (2)] are based on Daubechies mother wavelets [15].

Starting from a signal S [i.e., A0], two sets of coefficients are produced by the FWT: approximation coefficients (A1) and detail coefficients (D1). This decomposition is obtained by convolving S with the lowpass filter L for the approximation and with the highpass filter H for the detail, followed by downsampling, i.e., by throwing away every other coefficient. Conversely, starting from A1 and D1, the IFWT reconstructs S by inverting the decomposition stage. Inversion is achieved by inserting zeros between the wavelet coefficients (upsampling) and by convolving the resulting signal with the reconstruction filters L′ and H′ (Fig. 2). Notice that a multilevel decomposition process can be achieved, according to Fig. 3. The successive approximations are decomposed, so that S is broken down into lower resolution components.

D. Tackling Border Distortion

When one performs filtering/convolution on finite-length signals, border distortions arise. Therefore, multilevel load decomposition via wavelets corrupts the information at both sides of each subseries. Distortion on the left side (i.e., the oldest information is corrupted) degrades the forecaster estimation. On the other side, corruption of the most recent load information affects both model estimation and forecasting. In order to deal with this problem, signal extension (known as padding) at the borders of the original load series is applied. The goal is to minimize the amount of distortion on both edges of the subseries (in the wavelet domain).

Four different padding schemes have been compared in this work. The first padding scheme, named zero-padding (zpd), includes zeros at the beginning and at the end of the load series. The second approach supposes that signals can be recovered by symmetric boundary replication (sym). Smooth padding (spd) is the third scheme evaluated here, and it employs a first-order derivative extrapolation. These padding schemes are discussed in depth in [16]. Finally, the proposed padding strategy appends the previously measured load values at the beginning of the series under analysis and forecasted values at the end of it. By using the proposed padding, one can expect no distortion at the beginnings of the decomposed signals and a reduced amount of distortion at their ends. Bear in mind that the better the quality of the forecasts, the better the quality of the padding, and, consequently, the lesser the border distortion. Another important question is related to the length of the attached information. Empirically, it has been found that the use of 72 padding values (at least) at each extremity of the load series is enough to reduce distortions.

Fig. 4 shows the deviations of these four schemes from the ideal padding, i.e., measurements attached to the beginning and to the end of the series, too. For convenience, only subseries A3 is presented. Notice that the ideal padding cannot be used in practice, because the exact padding values at the right side border are unknown (related to the future). Traditional padding schemes introduce very heavy oscillations on both sides of A3. On the other hand, the proposed padding scheme (prev) produces an A3 subseries quite similar to that generated by the ideal padding.
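The border effect described above can be demonstrated with a toy lowpass filter standing in for the wavelet filter bank. The sketch below contrasts zero-padding (zpd) with the proposed scheme (prev) of attaching actual previous measurements on the left and forecasted values on the right; the filter, the data, and the function names are invented for the illustration.

```python
def extend(signal, n, mode, history=None, forecasts=None):
    """Pad a series with n values on each side.

    mode 'zpd': zeros; 'sym': symmetric reflection; 'prev': attach actual
    previous measurements on the left and forecasted values on the right
    (the scheme proposed in the paper; history/forecasts supplied by caller).
    """
    if mode == "zpd":
        left, right = [0.0] * n, [0.0] * n
    elif mode == "sym":
        left, right = signal[:n][::-1], signal[-n:][::-1]
    elif mode == "prev":
        left, right = list(history[-n:]), list(forecasts[:n])
    else:
        raise ValueError(mode)
    return left + list(signal) + right

def smooth(signal):
    """3-point moving average standing in for the lowpass filter L."""
    return [(signal[i - 1] + signal[i] + signal[i + 1]) / 3.0
            for i in range(1, len(signal) - 1)]

# A linearly rising load: zero-padding bends the filtered series down at the
# left border, while 'prev' padding with the true preceding value does not.
load = [float(v) for v in range(10, 20)]
filt_zpd = smooth(extend(load, 1, "zpd"))
filt_prev = smooth(extend(load, 1, "prev", history=[9.0], forecasts=[20.0]))
```

On this toy series, prev padding reproduces the interior behavior of the filter at both borders, and the quality of the right-side extension then depends only on the forecast quality, as noted above.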

E. Representation of a Load Series via MRA

In this subsection, one can find an example of the decomposition of a load series via MRA. The main objective of this example is to depict the typical waveforms that arise in the MRA of a load series, normalized between [−0.5; 0.5]. However, before performing the wavelet analysis, two tasks must be accomplished: selection of the mother wavelet and definition of the number of levels of decomposition.

There are many types of mother wavelets that can be used in practice. To choose the most suitable, the attributes of the mother wavelet and the characteristics of the signal must be taken into account. Different wavelet families have been considered in [15]. This reference shows that the Daubechies wavelet is the most appropriate for treating load series. In this work, the feasibility of the Daubechies wavelets of orders 2–4 has been evaluated, and results have shown the superiority of the first one.

It is advisable to select a suitable number of decomposition levels based on the nature of the signal or on another criterion such as entropy [17]. In this paper, based on features of typical load curves, three, four, and five levels of decomposition have been considered. It has been found that three levels of decomposition is the most promising choice, because it describes the load series in a more thorough and meaningful way than the others. This conclusion is mainly due to the low-frequency band (approximation), which is the most significant part of a load signal. The three-level decomposition scheme emphasizes the regular behavior of a load series. It reveals hidden patterns that cannot be seen at higher resolutions. Periodic details provide different sources of useful information. However, high-frequency details associated with higher resolutions are mainly superfluous random noise.

Fig. 5 shows a load's approximation A3 and the corresponding levels of detail D3, D2, and D1. The series is corrupted by two outliers (located at 27h and 80h) and 12 missing values (153h through 164h). It can be noticed that approximation A3 is a very good representation of the original load signal. The most essential information can be found at this level. A3 represents the daily load seasonality, and it is a smoothed version of the load curve. The levels of detail D3 and D2 contain useful higher frequency information. These subseries exhibit some regularities, similar shapes, and comparable mean values. Irregularities in those levels of detail are due to load random variations and measurement errors. The level of detail D1 shows some peaks that allow time localization of the peak load. The effects of the outliers on the details are clear. It can be noticed that they are synchronized with D1 and D2. Besides, the flat part of A3, ranging from 153h through 164h, can be associated with the missing values.

III. MODELS FOR LOAD FORECASTING

This section describes the proposed load forecasters. For convenience, the design task is divided into four parts: data set definition, data preprocessing, input variable selection, and selection of type and structure of NNs.

A. Data Set

Hourly load and temperature data from a North-American electric utility [18], over the period of January 1, 1988, through October 12, 1992, have been used in this work. Some information about these hourly load and temperature data can be found in Table I. As in any research area, in load forecasting it is also important to allow the reproduction of one's results. The only way of doing that is using public domain data sets, such as the one mentioned above, provided at http://www.ee.washington.edu/class/559/2002spr.

B. Data Preprocessing

Before a data set is ready to compose a training set for NNs, it is advisable to preprocess it. Techniques such as normalization, standardization, first-order differencing, and MRA have been applied in this work. Besides ordinary normalization, a complementary procedure based on the standardization of variables can be very useful. The process of first-order differencing subtracts adjacent load values. A good reason for differencing is that, besides series detrending, the variations can be as important as the original values. Finally, MRA—the main interest of this work—extracts the important features of the load dynamics. The most important purpose of this paper is to show that there are patterns in the wavelet domain that are not so clear in the original load time series.

C. Input Variable Selection

The basic set of input variables has been selected by applying the partial and standard autocorrelation functions. Temperature variables were added for every time instant for which a load variable was included, plus the forecasted temperature for the target hour. Finally, two additional inputs, sin(2πh/24) and cos(2πh/24), codifying the hour of the day h, were included, too.

It has been shown that the best set of input variables for linear predictors is not necessarily good for nonlinear models [19]. Nevertheless, the present work keeps using the most popular input variables for STLF models, because the main aim of this paper is to show that feature extraction via MRA can be very beneficial for STLF.

D. Selection of NN Structure

Multilayer Perceptrons (MLPs) have been used in this work as the basic forecasting engine. One single hidden layer has been used. Hyperbolic and linear activation functions have been employed in the hidden and output layers, respectively.

1) Model 1 (M1): The first load forecaster is the benchmark [1]. Input and output variables are standardized (zero mean and standard deviation equal to one) and then normalized in the [−0.5; 0.5] range. Fig. 6 shows this MLP-based load predictor. The architecture consists of 11 inputs, four hidden neurons, and one neuron in the output layer.

2) Model 2 (M2): The second load predictor adds differenced variables to the M1 set of inputs. Therefore, three time series are employed in the selection of the input variables: hourly standardized and normalized load (P), temperature (T), and first-order differenced load (D). Table II displays the inputs for M2.

The output of the MLP is also the standardized and normalized differenced load [i.e., D(t+1)]. The architecture consists of 15 inputs, four hidden neurons, and one output neuron. M2 also provides 1- to 72-hr ahead predictions for the padding required by models 3 and 4.

3) Model 3 (M3): Model 3 combines standardization/normalization and first-order differencing with MRA via Daubechies wavelets of order 2 (daub2). Several combinations of different input variables have been tried. It has been verified that the information provided by the differenced series decomposition is more useful than the one from the original load. Three levels of decomposition have been considered. There are two main reasons for doing that. First, important high-frequency features of a load series can be emphasized. Second, the noisy part of a load series can be separated.
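The preprocessing steps of Sections III-B and III-C, standardization, normalization into [−0.5, 0.5], first-order differencing, and the cyclic hour-of-day encoding, can be sketched as below; the data and function names are ours, not the authors'.

```python
import math

def standardize(series):
    """Zero mean and unit standard deviation (population std)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    return [(x - mean) / std for x in series]

def normalize(series, lo=-0.5, hi=0.5):
    """Linear rescaling of the series into [lo, hi]."""
    mn, mx = min(series), max(series)
    return [lo + (hi - lo) * (x - mn) / (mx - mn) for x in series]

def first_difference(series):
    """D(t) = P(t) - P(t-1): detrends and exposes hourly variations."""
    return [b - a for a, b in zip(series, series[1:])]

def hour_encoding(h):
    """Cyclic inputs sin(2*pi*h/24) and cos(2*pi*h/24) for hour-of-day h."""
    angle = 2.0 * math.pi * h / 24.0
    return math.sin(angle), math.cos(angle)
```

The cyclic encoding keeps hour 23 and hour 0 close together in input space, which a single raw hour number would not.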

Fig. 4. Evaluated techniques zpd, sym, spd, and prev appear in the legend of the upper panel. Magnified views of both edges are shown below it.

Fig. 7 shows the sketch of M3.

Analysis of the frequency bands has shown that the most informative subseries are the differenced A3 and D3. The differenced D2 and D1 are very noisy because the differenced series is already a high-frequency component of the original load series. Therefore, the information provided by them has been disregarded from M3. The input set for M3 can be found in Table III. The corresponding output is the same as in M2. The NN's architecture consists of 23 inputs, four hidden neurons, and one output neuron.

4) Model 4 (M4): Model 4 presents a different approach [9]–[11]. Here, the idea is to decompose the load series using MRA with daub2 and to model it via individual fitting of each level of resolution (frequency bands). That is, each component (A3, D3, D2, and D1) is modeled separately, and the final forecast is obtained by adding those four forecasts. Notice that for M4, the approximation and all three levels of detail must necessarily be taken into account to "reconstruct" the load series. Fig. 8 shows the diagram for this model. Each of its submodels will be introduced in the sequel.

a) Submodel for A3: Since the approximation A3 is a smoothed version of the load series, analogous input variables, as in M2, have been selected (Table IV). This submodel consists of three hidden neurons and one neuron in the output layer.

b) Submodels for D3 and D2: Autocorrelation analysis has also been used to select input variables for D3 and D2. Table IV presents the sets of inputs/outputs for these submodels. It can be noticed that both submodels share the same structure. Nonetheless, they are quite different from the previous submodel. The main differences are 1) five load input variables (instead of four) have been selected by autocorrelation—note that a lag of 12 hr has been included; and 2) the temperature information has not been considered—this is due to the fact that the cross-correlations are not significant. The submodels for D3 and D2 consist of three hidden neurons and one output neuron each.

c) Submodel for D1: Since the level of detail D1 is more related to the noisy part of the load series, predictions for this level of resolution are based on mean values only. The mean is estimated over the previous six weeks of data.

IV. TESTS

In the absence of temperature forecasting, two case studies have been considered. In Case Study A, measured temperatures have been employed as temperature predictions for the next day. Yet in Case Study B, the effect of adding Gaussian noise to the measured temperatures, as a way to simulate temperature forecasting errors, has been investigated. Last, more tests, using load forecasting competition data sets, have been carried out (Case Study C).

Tables V–X summarize the results for regular workdays only. Holidays and anomalous days should be treated with special schemes, usually system dependent. Holidays and anomalous days represent a very hard task for any load forecasting tool. This is due to the fact that those atypical load conditions are rare and quite different from regular workdays. For instance, there is one example of Labor Day per year, and the corresponding load behavior varies depending on the day of the week and the nature of the load.
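Model 4's decompose-and-add strategy can be sketched as follows: one predictor per wavelet-domain band, a historical mean for the noisy D1 band, and a final forecast equal to the sum of the band forecasts. The band series and the trivial per-band predictors below are placeholders standing in for the MLP submodels, not the paper's models.

```python
def mean_forecast(history, window):
    """D1 submodel: forecast the mean of the last `window` samples."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def persistence_forecast(history):
    """Placeholder band predictor standing in for the MLP submodels."""
    return history[-1]

def m4_forecast(bands, d1_window):
    """Forecast each wavelet-domain band separately and add the results.

    `bands` maps band name -> historical subseries; the final load forecast
    is the sum of the four band forecasts (the M4 recombination step).
    """
    predictors = {
        "A3": persistence_forecast,
        "D3": persistence_forecast,
        "D2": persistence_forecast,
        "D1": lambda h: mean_forecast(h, d1_window),
    }
    return sum(predictors[name](series) for name, series in bands.items())

# Made-up band histories: a slowly rising approximation plus small details.
bands = {
    "A3": [100.0, 101.0, 102.0],
    "D3": [2.0, -1.0, 0.5],
    "D2": [0.3, -0.2, 0.1],
    "D1": [0.2, -0.2, 0.4, -0.4],
}
forecast = m4_forecast(bands, d1_window=4)
```

Because the bands partition the signal's frequency content, summing the band forecasts plays the role of the reconstruction step mentioned above.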

Fig. 5. Load series corrupted by two outliers and 12 missing values.

TABLE I: SUMMARY OF THE FIVE-YEAR TEMPERATURE AND LOAD DATA

TABLE II: SET OF INPUT VARIABLES FOR M2

Fig. 6. Structure of M1, where T̃(t) represents the forecasted temperature.

A. Case Study A

A comparison of the four proposed models has been performed. Concatenations of six-week windows for the current year and for the equivalent periods—one year before and two years before [e.g., Oct./Nov. of Year(N), Oct./Nov. of Year(N−1), and Oct./Nov. of Year(N−2)]—have been used for training and validation purposes, with data grouping according to the day of the week. For each day of the week, an MLP has been trained, applying an improved version of the original backpropagation algorithm with cross-validation [20]. The early stop criterion has been adopted to finish the training sessions.

Fig. 7. Diagram for M3.

TABLE III: INPUT VARIABLES FOR M3

TABLE IV: INPUT VARIABLES FOR M4

TABLE V: HOURLY MAPE FOR THE 4 MODELS (1–24 STEPS AHEAD)

TABLE VI: AVERAGE GLOBAL EVALUATION FOR 1–24 STEPS AHEAD WITH "PERFECT" TEMPERATURE PREDICTIONS

Fig. 8. Diagram for M4.

Different partitions for the training and validation subsets have been randomly created every 500 epochs.

After the one-step ahead training, next-hour (online forecasting) and 1- to 24-hr ahead (off-line forecasting always beginning at midnight) predictions are computed. Multiple steps ahead are reached via recursion, i.e., by feeding input variables with the model's outputs. Next-hour forecasts are performed for every hour of the day. The load forecasters are retrained at the end of each day to incorporate the most recent load information. The concatenation of six-week training windows, for a particular day, is shifted one day ahead, and the forecasts for the next 24 hr are evaluated. This test procedure is repeated for the last two years of the available load/temperature data.

The load forecast accuracy is reported in terms of the following error measures: mean absolute percentage error (MAPE), mean square error (MSE), and mean error (ME). Furthermore, ANalysis Of VAriance (ANOVA) has been used to compare the means of the forecasting errors for the models, in order to find out if the concepts behind the four models are significantly different. The first group of errors consists of daily MAPEs for M1, considering the test period. The second group consists of daily MAPEs for M2, and so forth.

1) One-day Ahead Load Forecast Analysis: Based on Table VI, it can be noticed that the benchmark, i.e., M1, has not performed very well. Nevertheless, the addition of differenced load variables to it has slightly improved its accuracy (M2). With M3, which exploits multiresolution, further enhancement can be noted. Yet model 4, on average, is the most accurate one. For multiple steps ahead, its forecasting errors are significantly smaller than those of the other models (see Tables V and VI). With M3 and M4, average accuracy improvements of 11.73% and 26.26%, with respect to M1, have been obtained (see Table VI).

TABLE VII: HOURLY MAPE FOR THE FOUR MODELS (NEXT-HOUR FORECAST)

TABLE VIII: AVERAGE GLOBAL EVALUATION FOR NEXT-HOUR FORECASTING WITH "PERFECT" TEMPERATURE PREDICTIONS

TABLE IX: AVERAGE GLOBAL EVALUATION FOR 1–24 STEPS AHEAD WITH SIMULATED TEMPERATURE PREDICTIONS

TABLE X: AVERAGE GLOBAL EVALUATION FOR NEXT-HOUR FORECASTING WITH SIMULATED TEMPERATURE PREDICTIONS

Fig. 9. Actual load and corresponding forecasts (+/−) obtained with M4.

It is clear that a larger forecasting lead-time does not necessarily imply a larger forecasting error. That depends on the data variability for the different periods of the day and the kind of model at hand [21]. Notice that the MSE, in Table VI, points out a smaller number of high errors for M4.

The outcome of ANOVA also suggests that the differences among the means of the four groups of errors, corresponding to each model, are not likely to be just chance. The value of the F statistic assures, with a confidence of 95%, that the source of the variation has a significant effect. Fig. 9 illustrates a 24-h ahead recursive forecasting for each band, using M4.

2) Next-Hour Forecast Analysis: Based on Tables VII and VIII, it can be noticed that M2 and M3 have presented the best results and similar performances. The fact that M4 has not performed very well for one-step ahead predictions has to do with its own nature. Its performance for this horizon is highly dependent on the quality of the attached information used as padding.

B. Case Study B

In this case, the effect of adding Gaussian noise of zero mean and standard deviation of 0.6 °C to the measured temperature series, in order to simulate temperature forecasting errors, has been investigated (Tables IX and X). Comparing Table VI with Table IX, it can be noticed that the accuracy of M1, for 1–24 steps ahead, has been considerably degraded with the "forecasted" temperatures. On the other hand, M2 has not degraded much. Concerning M3, one can say that the temperature forecast errors have not meaningfully impacted it. The MAPE of 3.16% for the noiseless case increases to 3.38% with the "forecasted" temperatures. Finally, M4 has not suffered with noise addition to the perfect temperature forecasts

either. Its overall MAPE changes from 2.64% to 2.82%. An average accuracy improvement of 36.77% (Table IX), with respect to M1, can be noticed when using M4.

Table X presents a global average evaluation of the models for next-hour forecasts. Again, as in the perfect temperature forecast case (Table VIII), M2 and M3 have been the winning models. The differences in the MAPE and MSE indices for models 2 and 3 have no statistical significance. Regarding forecasting bias, the sums of forecasting errors, i.e., the ME indices, show no significant tendency of underestimating or overestimating the load values. An average accuracy improvement of 46.98% (Table X), with respect to M1, is achieved with M3.

C. Case Study C

A comparison of the forecasting accuracy of M4 (the best proposal for one-day ahead predictions) with two neural models introduced in [18] has been performed. Only the MAPE index has been considered. Load and temperature data from the same North American utility previously mentioned have been employed. The test period ranges from November 7, 1990, through March 31, 1991 (major holidays were not considered), such that each model has been tested with 137 days. The tests consider 1- to 24-hr ahead predictions only. The recurrent-neural-net-based load forecaster known as ANN2 provided a global MAPE of 5.4%. Yet OH2, an adaptive neural network, produced a global MAPE of 6.1%. On the other hand, M4 has provided a MAPE of 3.7%.

Recently, the European Network on Intelligent Technologies for Smart Adaptive Systems (EUNITE) organized a load forecasting competition with 56 contestants. Two years of half-hourly load and daily average temperature data from a Slovakian utility were supplied to design the models (http://neuron.tuke.sk/competition). The goal was to estimate one month ahead daily maximum load. On the other hand, the objective in this paper is associated with hourly forecasts for a shorter horizon. However, for future comparisons, 1- to 24-hr ahead forecasts have been estimated for the same period of evaluation used in the competition. MAPEs of 3.1% and 2.8%,

V. CONCLUSIONS

An important decision concerning the estimation of models 3 and 4 is related to the type of padding. Attaching load measurements at the beginning of the series and forecasted values at the end of it has been proposed in this paper as the best way to minimize distortion due to filtering. The best load predictor without multiresolution analysis (M2) has been used for padding. In fact, it is very competitive, too, since the basic concept behind M2 is a simplified version of the one behind M3. Model 2 combines features from the original load series with a high-frequency component of it, which is a byproduct of first-order differencing. However, its MSEs for one-day ahead forecasts indicate a higher incidence of large prediction errors.

It is true that the computational effort increases with the application of MRA, since M2 is also required. However, considering the performance gains, one can conclude that the combination of wavelets with NN-based STLF is worthwhile. Future work will estimate confidence intervals for the forecasts [21]. Moreover, better ways to select input variables from the load subseries will be investigated.

REFERENCES

[1] D. C. Park, M. A. El-Sharkawi, R. J. Marks II, L. E. Atlas, and M. J. Damborg, "Electric load forecasting using an artificial neural network," IEEE Trans. Power Syst., vol. 6, no. 2, pp. 442–449, May 1991.
[2] K. Y. Lee, Y. T. Cha, and J. H. Park, "Short-term load forecasting using an artificial neural network," IEEE Trans. Power Syst., vol. 7, no. 1, pp. 124–132, Feb. 1992.
[3] A. G. Bakirtzis, V. Petridis, S. J. Klartzis, M. C. Alexiadis, and A. H. Maissis, "A neural network short-term load forecasting model for the Greek power system," IEEE Trans. Power Syst., vol. 11, no. 2, pp. 858–863, May 1996.
[4] A. D. Papalexopoulos, S. Hao, and T. M. Peng, "An implementation of a neural network based load forecasting model for the EMS," IEEE Trans. Power Syst., vol. 9, no. 4, pp. 1956–1962, Nov. 1994.
[5] A. Khotanzad, R. Afkhami-Rohani, and D. Maratukulam, "ANNSTLF—artificial neural network short-term load forecaster—generation three," IEEE Trans. Power Syst., vol. 13, no. 4, pp. 1413–1422, Nov. 1998.
[6] H. S. Hippert, C. E. Pedreira, and R. C. Souza, "Neural networks for short-term load forecasting: A review and evaluation," IEEE Trans. Power Syst., vol. 16, no. 1, pp. 44–55, Feb. 2001.
[7] D. W. Bunn, "Forecasting loads and prices in competitive power markets," Proc. IEEE, vol. 88, no. 2, pp. 163–169, Feb. 2000.
[8] S. Mallat, "A theory for multiresolution signal decomposition—the
wavelet representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol.
for M3 and M4, respectively, have been verified. 11, no. 7, pp. 674–693, Jul. 1989.
[9] M. Ning and C. Yunping, “An ANN and wavelet transformation based
method for short term load forecast,” in Proc. Int. Conf. Energy Manage.
V. CONCLUSION Power Del., vol. 2, Singapore, Mar. 1998, pp. 405–410.
[10] B.-L. Zhang and Z.-Y. Dong, “An adaptive neural-wavelet model for
In this paper, two approaches for enhancing NN-based STLF short term load forecasting,” Elect. Power Syst. Res., vol. 59, no. 2, pp.
via multiresolution analysis are compared (M3 and M4). Next- 121–129, Sep. 2001.
hour and recursive load forecasts from 1 to 24 hr ahead have [11] C.-I. Kim, I.-K. Yu, and Y. H. Song, “Kohonen neural network and
wavelet transform based approach to short-term load forecasting,” Elect.
been carried out by MLPs in both approaches. The modeling of Power Syst. Res., vol. 63, no. 3, pp. 169–176, Oct. 2002.
a load time series via individual fitting of each subseries in the [12] A. Oonsivilai and M. E. El-Hawary, “Wavelet neural network based
short term load forecasting of electric power system commercial load,”
wavelet domain M4 has not presented the same adaptability as in Proc. IEEE Canadian Conf. Elect. Comput. Eng., vol. 3, Edmonton,
M3 for the very short term horizon. The prediction errors of M4 AB, Canada, May 1999, pp. 1223–1228.
for the next hour are more sensitive to border distortion. On the [13] C.-M. Huang and H.-T. Yang, “Evolving wavelet-based networks for
short-term load forecasting,” Proc. Inst. Elect. Eng. Gen. Trans. Distrib.,
other hand, M4 has been the most precise model for one-day vol. 148, no. 3, pp. 222–228, May 2001.
ahead forecasts, although M3 has produced very competitive [14] G. Strang and T. Nguyen, Wavelets and Filter Banks. Cambridge, MA:
Wellesley-Cambridge, 1996, p. 490.
results. These tests suggest that, considering both horizons, M3 [15] M. Misiti, Y. Misiti, G. Oppenheim, and J.-M. Poggi, “Décomposition
has been the most successful load predictor. Even though there par ondelettes et méthodes comparatives: étude d’une courbe de charge
are more inputs for M3 to handle, the combination of features électrique,” Revue de Statistique Appliquée, vol. XLII, no. 2, pp. 57–77,
1994.
from the original load series, the differenced load, MRA, and [16] , Wavelet Toolbox Manual—User’s Guide. Natick, MA: Math
the temperature series provides a richer information content. Works, 1996, p. 626.
[17] R. R. Coifman and M. V. Wickerhauser, "Entropy-based algorithms for best basis selection," IEEE Trans. Inf. Theory, vol. 38, no. 2, pp. 713–718, Mar. 1992.
[18] M. C. Brace, J. Schmidt, and M. Hadlin, "Comparison of the forecasting accuracy of neural networks with other established techniques," in Proc. First Int. Forum Appl. Neural Netw. Power Syst., Jul. 1991, pp. 31–35.
[19] I. Drezga and S. Rahman, "Input variable selection for ANN-based short-term load forecasting," IEEE Trans. Power Syst., vol. 13, no. 4, pp. 1238–1244, Nov. 1998.
[20] T. Masters, Neural, Novel & Hybrid Algorithms for Time Series Prediction. New York: Wiley, 1995, p. 514.
[21] A. P. Alves da Silva and L. S. Moulin, "Confidence intervals for neural network based short-term load forecasting," IEEE Trans. Power Syst., vol. 15, no. 4, pp. 1191–1196, Nov. 2000.

Agnaldo José da Rocha Reis (M'04) received the B.Sc., M.Sc., and D.Sc. degrees in electrical engineering from the Catholic University of Minas Gerais, Belo Horizonte, MG, Brazil, in 1996, and the Federal University of Itajubá (UNIFEI), Itajubá, MG, in 1999 and 2003, respectively.
Currently, he is with the Systems Engineering Group (GESIS), UNIFEI, and the Department of Control Engineering and Automation (DECAT) at the Federal University of Ouro Preto (UFOP), Ouro Preto, Brazil.

Alexandre Pinto Alves da Silva (SM'00) received the B.Sc., M.Sc., and Ph.D. degrees in electrical engineering from the Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil, in 1984 and 1987, and the University of Waterloo, Waterloo, ON, Canada, in 1992, respectively.
He worked at the Electric Power Research Center (CEPEL), Rio de Janeiro, from 1987 to 1988. During 1999, he was a Visiting Professor in the Department of Electrical Engineering, University of Washington, Seattle. From 1993 to 2002, he was with the Federal University of Itajubá (UNIFEI), Itajubá, MG, Brazil. Currently, he is a Professor of electrical engineering at the Federal University of Rio de Janeiro (COPPE/UFRJ).
Prof. Alves da Silva was the TPC Chairman of the First Brazilian Conference on Neural Networks in 1994 and of the International Conference on Intelligent System Applications to Power Systems in 1999.
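The additive wavelet split that underlies both M3 and M4 can be made concrete with a small numerical sketch. This is not the paper's implementation (the authors worked with the MATLAB Wavelet Toolbox [16]); the Haar wavelet, the three-level split, and the toy load series below are illustrative choices only.

```python
import numpy as np

def haar_mra(x, level=3):
    """Additive multiresolution split of a series (length divisible by
    2**level) into one low-frequency approximation and `level`
    high-frequency detail subseries, using the Haar wavelet."""
    a = np.asarray(x, dtype=float)
    details = []                      # finest detail band first
    for _ in range(level):
        details.append((a[0::2] - a[1::2]) / 2.0)
        a = (a[0::2] + a[1::2]) / 2.0

    def upsample(approx, det):
        out = np.empty(2 * approx.size)
        out[0::2] = approx + det      # inverse of the averaging step
        out[1::2] = approx - det
        return out

    parts = []
    # Approximation subseries: reconstruct with every detail band zeroed.
    v = a
    for d in reversed(details):
        v = upsample(v, np.zeros_like(d))
    parts.append(v)
    # Detail subseries: reconstruct keeping one band at a time (coarsest first).
    for k in range(level - 1, -1, -1):
        v = np.zeros_like(a)
        for j in range(level - 1, -1, -1):
            v = upsample(v, details[j] if j == k else np.zeros_like(details[j]))
        parts.append(v)
    return parts                      # [A3, D3, D2, D1] for level=3

# Toy hourly "load": two weeks of a daily cycle plus noise (336 samples).
rng = np.random.default_rng(0)
t = np.arange(24 * 14)
load = 100.0 + 10.0 * np.sin(2 * np.pi * t / 24) + rng.normal(0.0, 1.0, t.size)
parts = haar_mra(load)
# The subseries add back up to the original load series.
residual = np.max(np.abs(np.sum(parts, axis=0) - load))
```

In an M4-style scheme, each returned subseries would be forecast by its own MLP and the forecasts summed; in an M3-style scheme, features drawn from these subseries augment the inputs of a single predictor.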