

Stochastic model of residual demand curves with decision trees

Conference Paper · August 2003


DOI: 10.1109/PES.2003.1270443 · Source: IEEE Xplore


All content following this page was uploaded by Enrique Lobato on 02 July 2014.


Stochastic Model of Residual Demand Curves with Decision Trees

A. Ugedo, E. Lobato, A. Franco, L. Rouco, Member IEEE, J. Fernández-Caro, J. de-Benito, J. Chofre and J. De-la-Hoz

This work has been supported by the Spanish generating firm Viesgo, S. L. of Enel Group.
A. Ugedo (e-mail: Alejandro.Ugedo@iit.upco.es), E. Lobato and L. Rouco are with the School of Engineering of Universidad Pontificia Comillas, C/Alberto Aguilera, 23, 28015 Madrid, Spain. A. Franco is currently with Carboex, S.L.U. (Endesa), C/Manuel Cortina 2, Planta 2. J. Fernández-Caro, J. de-Benito, J. Chofre and J. De-la-Hoz are with Viesgo, S. L., C/Orense 11, 8ª planta, Madrid.

Abstract— Generating firms operating in deregulated markets need strategic bidding procedures to maximize their expected profits. In some electricity markets, due to the number and size of the participants, the clearing price may be affected by the production supplied to the market. To model this effect, the residual demand curve (RDC) is considered. This paper proposes a methodology based on decision trees to estimate the probabilistic RDC that a generating agent faces in each hourly period of the market. The method explains the behavior of the RDC patterns (obtained through clustering techniques) by a set of factors (linear combinations of explanatory variables) determined by the statistical technique of factor analysis. A decision tree is built to compute the probability of each RDC pattern, taking as input estimations of the numerical values of the explanatory factors. In addition, the paper describes the stochastic programming formulation of the RDC patterns used to obtain optimal bidding curves. The proposed methodology is illustrated with a case study applied to the first intradaily market of the Spanish electricity market.

Index Terms—competitive electricity market, strategic bidding, clustering, factor analysis, decision trees.

I. INTRODUCTION

The electricity industry has undergone a process of deregulation in an increasing number of countries. Within this deregulated framework, a generating firm must develop strategic bidding procedures in order to maximize its total expected profit [1-3]. In some electricity markets, due to the number and size of the participants, the clearing price may be affected by the production supplied to the market. Following the theory of supply function equilibrium, this effect is captured by the residual demand curve. The RDC of an agent is obtained by subtracting the aggregated supply curve of the remaining competitors from the total demand curve. However, residual demand curves are not known in advance, since there is uncertainty in the market conditions and in the strategic behavior of the competitors. Therefore, they need to be estimated in order to model market power within a stochastic bidding procedure. If a stochastic model of the RDC is obtained, a stochastic programming approach can be formulated to obtain optimal bidding curves. This paper develops a procedure to build a stochastic model of the RDC to be included in an optimization program for optimal bidding curves. This research work is oriented towards its application to the Spanish electricity market.

A generation bid consists of a set of non-decreasing energy-price blocks. In the same way, demand bid curves are formed by non-increasing energy-price blocks. Hence, the RDC of an agent is a non-increasing set of energy-price blocks for each hour. A useful representation consists of sampling the $i$-th RDC at a vector of $n$ equally spaced energy values $(q_1,\dots,q_n)$, obtaining the corresponding vector of prices $(p_1^i,\dots,p_n^i)$, as depicted in Fig 1. It should be noted that every curve must be sampled at the same energy vector $(q_1,\dots,q_n)$ in order to perform mathematical computations between curves or to classify them.

[Fig 1 plot: price ($p_1^i, \dots, p_k^i, \dots, p_n^i$) versus amount of energy ($q_1, \dots, q_k, \dots, q_n$)]
Fig 1: Sampling of the RDC at a set of $n$ equally spaced values of energy

Different approaches to estimating the RDC, suitable for use with a stochastic optimization model, have been proposed in the literature. In [1], equal-probability RDCs are selected by clustering past realizations together with future estimations of the explanatory variables. In [2], the strategic behavior of a firm is optimized for a large number of RDC scenarios; the optimal bidding curve is then built by fitting a hinges model to all the optimal energy-price pairs obtained. In this way, it is not necessary to consider RDC estimations, only past realizations. Time series together with regression techniques are used in [4] to obtain RDCs for the model proposed in [3]. Other approaches combine clustering techniques and neural networks to estimate both the RDC patterns and their probabilities [5]. Although neural networks perform rather well in terms of
accuracy, they act as a black box: given a set of input variables, it is not easy to interpret their output.

A different methodology, based on decision trees, is proposed in this paper to obtain residual demand curve patterns and their corresponding probabilities. The process comprises five steps.

The first step consists of an initial selection of explanatory variables, based on knowledge and experience of market performance, followed by a correlation study.

In the second step, the information contained in the explanatory variables is reduced to a small number of factors (expressed as linear combinations of the explanatory variables) using the statistical technique of factor analysis.

The third step classifies the different RDCs into a finite number of patterns obtained by applying clustering techniques to the whole set of available RDCs.

A decision tree is built in the fourth step to compute the probability of each RDC pattern, taking as input the estimations of the explanatory factors.

In the fifth step, in order to include each estimated RDC pattern and its probability in a linear optimization program, these curves are approximated by linear regression. The resulting quadratic income function is modelled linearly by a set of tangent cuts. In addition, integer variables must be included in order to obtain an increasing bid that can be submitted to the market.

The proposed methodology has been applied to estimate the RDC patterns, and their associated probabilities, that a Spanish generating utility faces in the market. The Spanish electricity market is organized into two different kinds of markets: (a) energy markets (which include the day-ahead electricity market and six intradaily markets) and (b) ancillary services markets (comprising the secondary and tertiary reserve markets and the deviation management markets). It should be noted that the methodology proposed in this paper is general and can be applied to any of the energy or ancillary services markets. However, in this study, only intradaily RDC patterns have been estimated. On the one hand, the Spanish generating utility under consideration acts as a price-taker firm in the daily and secondary reserve markets. On the other hand, the tertiary reserve and deviation management markets are difficult to predict. The estimated intradaily residual demand curve patterns have been included in the stochastic optimization model described in [6].

The paper is organised as follows. Section II gives an overview of the methodology proposed to estimate the RDC patterns and their corresponding probabilities. The next five sections describe each step of the methodology in detail, using a case study of the first intradaily market in Spain. Section III contains the correlation study. Factor analysis of the explanatory variables is performed in Section IV. Section V details the clustering of the residual demand curves. Section VI deals with the decision tree built to predict the probability of each RDC pattern. Section VII explains how the RDC patterns are modelled within a mixed-integer linear optimisation program. Finally, conclusions are presented in Section VIII.

II. OVERVIEW OF THE METHODOLOGY

Fig 2 depicts the methodology proposed in this paper for estimating the RDC patterns, and their probabilities, that a generating firm will face in a specific energy or ancillary service market. The process comprises five steps.

Firstly, a set of possible explanatory variables of the RDC behaviour is selected. The initial selection of explanatory variables is based on knowledge and experience of market performance. A correlation study is then performed on the selected variables.

Afterwards, a factor analysis of the set of explanatory variables is carried out. Factor analysis is a statistical technique that reduces the number of explanatory variables, excluding the non-significant ones. It selects the smallest number of factors (each factor being a linear combination of the initial explanatory variables) that explains most of the data variability.

Then, the set of RDC data is classified through clustering techniques, obtaining a finite number of significant patterns that group similar RDC behaviours.

The next step builds a decision tree that explains the patterns of the residual demand curves in terms of the values of the factors computed in the factor analysis.

[Fig 2 flowchart: Correlations study (initial selection of variables; study of the correlations of the explanatory variables) → Factor analysis (variables analysis; explanatory variables reduction) → Clustering (residual demand curves classification; patterns of behaviour are obtained) → Decision tree (rules such as $x_1 < X_1$?, $x_2 < X_2$?, giving the probability of each RDC pattern) → Model (linear regression of RDC patterns; income function; scenarios of residual demand (patterns) with their probabilities (decision tree))]
Fig 2: Proposed methodology for building residual demand curves

Finally, taking as input the estimations of the explanatory factors, the decision tree is applied to obtain the probability of each RDC pattern. A linear regression of each RDC pattern is computed to feed the stochastic mixed-integer linear optimization
model formulated in [6], where binary variables model the non-decreasing constraints of the generation bid.

The following sections describe each step of the methodology in detail, using a case study applied to the first intradaily market of the Spanish power system.

III. CORRELATIONS STUDY

An initial set of variables that explain the behaviour of the residual demand curves is chosen. The initial selection of explanatory variables is based on knowledge and experience of market performance. A correlation study is then carried out in order to establish the level of correlation between the variables. This correlation analysis allows a better interpretation of the results of the factor analysis and increases the accuracy of its conclusions.

For estimating the RDC of the first intradaily market, seven initial explanatory variables were selected. For confidentiality reasons these variables are not named; they will be referred to as $x_1,\dots,x_7$. Fig 3 shows the scatter plot of the selected explanatory variables.

Fig 3: Scatter plot of the selected explanatory variables

Direct inspection of Fig 3 shows that variables $x_2$, $x_3$ and $x_5$ are highly correlated. There is also a significant correlation between variables $x_1$ and $x_2$ and between variables $x_6$ and $x_3$. This correlation analysis suggests the existence of two or three main factors that explain most of the variability of the data. Factor analysis will confirm this result.

IV. FACTOR ANALYSIS

Factor analysis is a statistical technique that identifies the underlying structure of a set of explanatory variables. It reduces the number of explanatory variables, excluding the non-significant ones, by selecting the smallest number of factors that explain most of the data variability. In this way, factor analysis reduces the data dimensionality without significant loss of information [7].

A first factor analysis has been applied to the explanatory variables of the case study of the first Spanish intradaily market. Bartlett's test indicates that $x_7$ must be removed to obtain significant conclusions from the factor analysis. A second factor analysis is then performed with the first six variables. The scree plot of this analysis, depicted in Fig 4, shows that 2 factors explain most of the data variability (about 95% of the total variability).

[Fig 4 plot: scree plot, eigenvalue (0 to 4) versus component number (1 to 4)]
Fig 4: Factor scree plot of factor analysis performed with variables $x_1$ through $x_6$.

Fig 5 shows the component plot in rotated space, which indicates the underlying structure of the explanatory variables. Factor 1 comprises variables $x_2$ through $x_6$ and is related to the power system demand. Factor 2 is explained mainly by variable $x_1$ and is related to the marginal price of the daily market.

The conclusions obtained in the factor analysis confirm the results yielded by the correlation study performed in step one.

[Fig 5 plot: component plot in rotated space, component 1 versus component 2, both axes from -1.0 to 1.0, with the explanatory variables and Factors 1 and 2 labelled]
Fig 5: Underlying structure of the main factors

V. CLUSTERING

The aim of clustering is to classify the total set of RDCs into a given number of groups (called clusters). Each group contains RDCs with similar behaviour and can be modelled by the prototype that represents its pattern [8].

Before clustering is carried out, the number of clusters must be selected. An adequate number of clusters can be obtained from the graph of the total clustering error against the number of clusters. The clustering error is a measure of the dispersion of the curves contained in each group and, therefore, of the quality of the clustering. It can be computed, for instance, as the sum of the Euclidean distances from each RDC to the prototype of its group. An adequate number of clusters corresponds to the elbow of this curve (see Fig 6): a higher number of clusters is not
worthwhile, because the additional decrease in error is not significant, while a smaller number of groups does not represent the curves correctly, and the error therefore grows quickly.

An initial set of 4992 residual demand curves of the first intradaily market was available in the case study presented. The clustering error versus number of clusters curve of Fig 6 indicates that an adequate number of clusters is 6.

[Fig 6 plot: clustering error (×10^5, from about 0.4 to 2.4) versus number of clusters (2 to 16)]
Fig 6: Clustering error vs number of clusters for intradaily market 1 residual demand curves

The clustering of the residual demand curves of the first intradaily market has been performed with the k-means algorithm. Fig 7 depicts the prototype of each cluster and the dispersion of the RDCs around it.

[Fig 7 panels: Pattern 1 (1905 curves), Pattern 2 (152 curves), Pattern 3 (1507 curves), Pattern 4 (399 curves), Pattern 5 (779 curves), Pattern 6 (250 curves)]
Fig 7: Clusters of intradaily market 1 residual demand curves

VI. DECISION TREE

A decision tree is made up of a set of nodes connected in a tree structure. Each node contains a decision rule that helps in the classification of patterns. Terminal nodes have no successor nodes, which means that no further separation rule is applied. A terminal node is reached by evaluating the separation rules with the numerical estimations of the explanatory variables. This terminal node contains the proportion (or probability) of each type of pattern according to the explanatory variable predictions. Fig 8 illustrates how a terminal node can be used to estimate the probability of each pattern. Each node records the total number of examples traversing it (referred to as $L$ in Fig 8) and the number of examples of each pattern (referred to as $a$ for the third pattern in Fig 8). If the node of Fig 8 is the terminal node reached when the tree is applied, the probability of pattern three is given by the ratio $a/L$. The classification rule of each node is derived from a mathematical process that minimizes the impurity of the resulting nodes, using a group of examples called the learning set. For an excellent reference on the applications of decision trees to power systems, see [9].

[Fig 8 diagram: a tree node traversed by $L$ examples, $a$ of which belong to pattern 3, so that $\rho_{P_3} = a/L$]
Fig 8: Structure of a node of a decision tree and computation of the probabilities of each pattern.

In the case study presented, a decision tree has been developed to estimate the probability of occurrence of each RDC pattern obtained in the cluster analysis. The separation rules are built according to the numerical values of the two factors extracted in the factor analysis. Fig 9 illustrates a simplification of the decision tree obtained for the case study. For instance, if the numerical estimation of factor 1 were 0.2 and that of factor 2 were 0.4, we would end in the terminal node circled in the figure, where the probability of each pattern can be determined as sketched in Fig 8.

[Fig 9 diagram: simplified tree with separation rules Factor2 < -0.31111?, Factor2 < 1.3518?, Factor1 < -0.58201? and Factor1 < 0.53759? (yes/no branches); terminal nodes show the proportions of patterns 1 to 6]
Fig 9: Decision tree to estimate the probability of each RDC pattern according to separation rules.
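The sampling described in Section I and the clustering step of Section V can be illustrated with the minimal Python sketch below. It is an illustration under stated assumptions, not the implementation used by the authors: each RDC is assumed to be given as a list of (cumulative energy, price) blocks, clustering is a plain Lloyd-style k-means written with the standard library only, and the clustering error is the sum of Euclidean distances to the prototypes, as suggested in Section V. The function names `sample_rdc`, `kmeans` and `clustering_errors` are hypothetical.

```python
import bisect
import math
import random


def sample_rdc(blocks, grid):
    """Sample a stepwise RDC at a common vector of energies (as in Fig 1).

    blocks: (cumulative energy in MWh, price) pairs sorted by energy; each price
    holds up to its cumulative energy. Energies beyond the last block take the
    last price (an assumption of this sketch).
    """
    ends = [energy for energy, _ in blocks]
    prices = [price for _, price in blocks]
    sampled = []
    for q in grid:
        i = bisect.bisect_left(ends, q)  # first block whose end is >= q
        sampled.append(prices[min(i, len(prices) - 1)])
    return sampled


def distance(u, v):
    """Euclidean distance between two sampled price vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def kmeans(curves, k, iters=100, seed=0):
    """Plain k-means (Lloyd iterations); returns (prototypes, labels, error)."""
    rng = random.Random(seed)
    prototypes = [list(c) for c in rng.sample(curves, k)]
    labels = None
    for _ in range(iters):
        # Assign every curve to its nearest prototype.
        new_labels = [min(range(k), key=lambda j: distance(c, prototypes[j]))
                      for c in curves]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        # Recompute each prototype as the mean curve of its members.
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:  # an empty cluster keeps its previous prototype
                prototypes[j] = [sum(col) / len(members) for col in zip(*members)]
    # Clustering error: sum of distances from each curve to its prototype.
    error = sum(distance(c, prototypes[lab]) for c, lab in zip(curves, labels))
    return prototypes, labels, error


def clustering_errors(curves, ks):
    """Clustering error for each candidate number of clusters (elbow search)."""
    return {k: kmeans(curves, k)[2] for k in ks}
```

Evaluating `clustering_errors(curves, range(2, 17))` on the sampled curves and plotting the result against the number of clusters gives a curve like Fig 6, whose elbow suggests the number of patterns. Note that k-means itself minimizes the sum of squared distances; the reported error uses plain distances only to follow the definition given in Section V.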
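The computation of pattern probabilities at a terminal node (Fig 8) can be sketched as follows. The paper does not state which impurity measure or stopping rule was used, so the Gini index, the depth limit and the minimum leaf size below are assumptions, and the function names are hypothetical. Each learning example is a tuple of factor estimations (index 0 for factor 1, index 1 for factor 2) paired with the RDC pattern observed; a terminal node stores the counts $a$ of each pattern and the total $L$, and returns $\rho = a/L$.

```python
from collections import Counter


def gini(patterns):
    """Gini impurity of a list of pattern labels (assumed impurity measure)."""
    n = len(patterns)
    return 1.0 - sum((count / n) ** 2 for count in Counter(patterns).values())


def best_split(rows):
    """Return the (factor index, threshold) that most reduces impurity, or None.

    rows: list of ((factor_1, factor_2, ...), pattern) learning examples.
    """
    n = len(rows)
    best, best_score = None, gini([p for _, p in rows])
    for f in range(len(rows[0][0])):
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate separation rule "factor_f < t ?"
            left = [p for x, p in rows if x[f] < t]
            right = [p for x, p in rows if x[f] >= t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


def build_tree(rows, depth=0, max_depth=3, min_leaf=2):
    """Grow a small tree; each terminal node stores pattern counts a and total L."""
    can_split = depth < max_depth and len(rows) >= 2 * min_leaf
    split = best_split(rows) if can_split else None
    if split is not None:
        f, t = split
        yes = [r for r in rows if r[0][f] < t]
        no = [r for r in rows if r[0][f] >= t]
        if len(yes) >= min_leaf and len(no) >= min_leaf:
            return {"factor": f, "threshold": t,
                    "yes": build_tree(yes, depth + 1, max_depth, min_leaf),
                    "no": build_tree(no, depth + 1, max_depth, min_leaf)}
    return {"counts": Counter(p for _, p in rows), "L": len(rows)}


def pattern_probabilities(tree, factors):
    """Follow the separation rules to a terminal node and return rho = a / L."""
    node = tree
    while "counts" not in node:
        go_yes = factors[node["factor"]] < node["threshold"]
        node = node["yes"] if go_yes else node["no"]
    return {pattern: a / node["L"] for pattern, a in sorted(node["counts"].items())}
```

Given a learning set of factor estimations labelled with their k-means pattern, a call such as `pattern_probabilities(tree, (0.2, 0.4))` returns the pattern proportions of the terminal node reached, in the same spirit as the example given for Fig 9. The thresholds found depend entirely on the learning data, so this sketch does not reproduce the rules of Fig 9.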

VII. LINEAR MODEL OF RDC

In order to include the RDC patterns of a specific market (and their associated probabilities) in a mixed-integer optimization program, a linear regression of each pattern is needed [4]. Table I contains the correlation coefficient of the linear regression of the six RDC patterns obtained in the cluster analysis for the case study presented in the paper. The correlation coefficient is above 0.51 for the worst-fitted pattern (number 1) and above 0.73 for the remaining patterns.

TABLE I
CORRELATION COEFFICIENT OF THE LINEAR REGRESSION OF RDC PATTERNS

Pattern    Correlation coefficient
1          0.51071271
2          0.73480497
3          0.87855298
4          0.94161119
5          0.87430791
6          0.8327726

In a stochastic optimization model that maximizes the firm's profit in a specific market, one term of the objective function is the expected income from the energy sold:

$$\max \; \cdots + \sum_k \rho_k \cdot In_k + \cdots \tag{1}$$

where $In_k$ is the income obtained if RDC pattern $k$ (with associated probability $\rho_k$) occurs. The income $In_k$ can be formulated as:

$$In_k = q_k \cdot p_k = q_k \cdot (a_k + b_k \cdot q_k) = a_k \cdot q_k + b_k \cdot q_k^2 \tag{2}$$

where $q_k$ is the energy sold in pattern $k$, $p_k$ is the price of energy in pattern $k$, and $a_k$ and $b_k$ are the coefficients of the linear regression of RDC pattern $k$. It should be noted that (2) is a quadratic function of the energy $q_k$, concave because $b_k \le 0$ for a non-increasing RDC. It can therefore be included in a linear optimization program by approximating (2) with a set of $j$ linear tangent cuts [10], each of which bounds the income from above:

$$
\begin{aligned}
In_k &\le In_k^1 + \left.\frac{\partial In_k}{\partial q_k}\right|_{q_k^1} \cdot \left(q_k - q_k^1\right) \\
&\;\;\vdots \\
In_k &\le In_k^j + \left.\frac{\partial In_k}{\partial q_k}\right|_{q_k^j} \cdot \left(q_k - q_k^j\right)
\end{aligned} \tag{3}
$$

where $In_k^1, \dots, In_k^j$ are the values of (2) at the cut points $q_k^1, \dots, q_k^j$, and:

$$\left.\frac{\partial In_k}{\partial q_k}\right|_{q_k^1} = a_k + 2 \cdot b_k \cdot q_k^1 \tag{4}$$

with the analogous expression at every other cut point.

[Fig 10 plot: income $In_k$ [€/h] versus energy $q_k$ [MWh], with tangent cuts at several points $q_k$]
Fig 10: Approximation of the quadratic concave income function by a set of tangent cuts.

In addition, non-decreasing constraints must be imposed in the optimization program in order to obtain an increasing bid that can be submitted to the market.

On the one hand, if the price of pattern $k$ is greater than the price of pattern $k+1$, the energy must also be greater: $p_k > p_{k+1} \rightarrow q_k > q_{k+1}$. This condition can be imposed by introducing a binary variable $\delta_k^1$, so that:

$$p_k > p_{k+1} \rightarrow \delta_k^1 = 1 \rightarrow q_k > q_{k+1} \tag{5}$$

The logic restrictions contained in (5) are equivalent to the following two constraints:

$$
\begin{aligned}
p_k - p_{k+1} &\le M \cdot \delta_k^1 \\
q_k - q_{k+1} &> -M \cdot \left(1 - \delta_k^1\right)
\end{aligned} \tag{6}
$$

where $M$ is an upper bound for both constraints (in practice, it is set to a large number, say $10^6$).

On the other hand, if the price of pattern $k$ is smaller than the price of pattern $k+1$, the energy must also be smaller: $p_k < p_{k+1} \rightarrow q_k < q_{k+1}$. This condition can be imposed by defining the binary variable $\delta_k^2$, so that:

$$p_k < p_{k+1} \rightarrow \delta_k^2 = 1 \rightarrow q_k < q_{k+1} \tag{7}$$

The logic restrictions contained in (7) are equivalent to the following two constraints:

$$
\begin{aligned}
p_k - p_{k+1} &\ge -M \cdot \delta_k^2 \\
q_k - q_{k+1} &< M \cdot \left(1 - \delta_k^2\right)
\end{aligned} \tag{8}
$$

Note that $\delta_k^1$ and $\delta_k^2$ cannot both take the value 1, so the following constraint is added to help the branch-and-bound solution method of the mixed-integer linear program:

$$\delta_k^2 + \delta_k^1 \le 1 \tag{9}$$

VIII. CONCLUSIONS

Optimal bidding procedures have to be developed by a generating agent to optimize its strategy. In order to model market power within optimal bidding procedures, the residual demand curve that an agent faces within a specific market must be considered. This paper has proposed a methodology based on decision trees to estimate the stochastic RDC patterns and their associated probabilities. In addition, a linear formulation of the patterns has been sketched in order to use the results as input to a stochastic optimization bidding model. Binary variables need to be introduced to impose non-decreasing requirements on the generation bids.
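The linear model of Section VII can be checked numerically with the minimal Python sketch below, written with the standard library only. It is an illustration, not the optimization model of [6]: it fits the regression line of one pattern, builds the tangent cuts (3)-(4), and enumerates the binaries of (6), (8) and (9) for one pair of patterns $k$, $k+1$ instead of calling a mixed-integer solver. All numerical values used with it are hypothetical.

```python
from itertools import product


def fit_line(qs, ps):
    """Least-squares fit p = a + b*q of one RDC pattern sampled at energies qs."""
    n = len(qs)
    mq, mp = sum(qs) / n, sum(ps) / n
    sxx = sum((q - mq) ** 2 for q in qs)
    sxy = sum((q - mq) * (p - mp) for q, p in zip(qs, ps))
    b = sxy / sxx
    return mp - b * mq, b


def income(q, a, b):
    """Eq. (2): In = a*q + b*q**2, concave when b <= 0 (non-increasing RDC)."""
    return a * q + b * q * q


def tangent_cuts(a, b, points):
    """Eqs. (3)-(4): one (slope, intercept) pair per cut point q0."""
    cuts = []
    for q0 in points:
        slope = a + 2 * b * q0                     # derivative at q0, eq. (4)
        intercept = income(q0, a, b) - slope * q0  # tangent passes through (q0, In(q0))
        cuts.append((slope, intercept))
    return cuts


def linearized_income(q, cuts):
    """Largest income allowed by all cuts In <= slope*q + intercept."""
    return min(slope * q + intercept for slope, intercept in cuts)


def big_m_feasible(p_k, p_k1, q_k, q_k1, M=1e6):
    """True if some binaries (d1, d2) satisfy constraints (6), (8) and (9)."""
    for d1, d2 in product((0, 1), repeat=2):
        if (p_k - p_k1 <= M * d1 and q_k - q_k1 > -M * (1 - d1)          # (6)
                and p_k - p_k1 >= -M * d2 and q_k - q_k1 < M * (1 - d2)  # (8)
                and d1 + d2 <= 1):                                        # (9)
            return True
    return False
```

For the hypothetical pattern $p = 60 - 0.1q$, cuts at 100, 200 and 300 MWh reproduce the income exactly at those points and overestimate it in between (7000 instead of 6750 at 150 MWh), so more cuts give a tighter approximation. Likewise, `big_m_feasible(50, 40, 80, 100)` is `False`, because a higher price paired with a lower energy would violate the non-decreasing bid.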
IX. ACKNOWLEDGMENT

The optimization tool described in this paper has been developed under the leadership of the Spanish utility Viesgo S. L. of Enel Group. The authors gratefully acknowledge the fruitful comments of J. Torné and all the staff of the energy management center of Viesgo.

X. REFERENCES
[1] A. Baillo, "A methodology to develop optimal schedules and offering strategies for a generation company operating in a short-term electricity market", PhD Thesis, Comillas University, October 2002.
[2] J. García-González, J. Barquín, J. Román, "Building supply functions under uncertainty for a day-ahead electricity market", in Proc. 6th International Conference on Probabilistic Methods Applied to Power Systems, Madeira, Portugal, September 2000.
[3] D. Berzal, J. I. De-la-Fuente, T. Gómez, "Building Generation Supply Curves under Uncertainty in Residual Demand Curves for the Day-Ahead Electricity Market", in Proc. IEEE Power Tech Conference, Porto, Portugal, September 2001.
[4] A. Martín-Calmarza, I. De-la-Fuente, "New forecasting method for the residual demand curves using time series (ARIMA) models", in Proc. 7th International Conference on Probabilistic Methods, Naples, Italy, September 2002.
[5] J. Villar, A. Muñoz, E. F. Sánchez-Úbeda, A. Mateo, M. Casado, A. Campos, J. Maté, E. Centeno, S. Rubio, J. J. Marcos, R. González, "SGO: Management information system for strategic bidding in electrical markets", in Proc. 2001 IEEE Power Tech Conference, Porto, Portugal, POM6-394, September 2001.
[6] A. Franco, E. Lobato, L. Rouco, A. Ugedo, J. Fernández-Caro, J. De-Benito, J. Chofre, "Optimization of the Spanish Market Sequence by a Price-Taker Generating Firm", submitted for presentation at 2003 IEEE Power Tech Conference, Bologna, Italy, September 2003.
[7] A. Basilevsky, Statistical Factor Analysis and Related Methods: Theory and Applications, New York: Wiley, 1994.
[8] B. S. Everitt, Cluster Analysis, Third Edition, New York: Wiley, 1993.
[9] L. Wehenkel, Automatic Learning Techniques in Power Systems, Boston: Kluwer Academic, 1999.
[10] J. García-González, J. Barquín, J. Román, A. González, "Strategic Bidding in Deregulated Power Systems", in Proc. 1999 Power System Computation Conference, Trondheim, Norway, July 1999.

XI. BIOGRAPHIES

Alejandro Ugedo Álvarez-Ossorio was born in Madrid, Spain, in 1979. He obtained the Industrial Engineer degree in 2002 from Universidad Pontificia Comillas, Madrid, Spain. Since September 2002, he has been a Researcher at the Instituto de Investigación Tecnológica, Universidad Pontificia Comillas. His areas of interest include analysis, planning, operation and economics of electric power systems. At present, he is participating in a research project for a firm related to the energy industry.

Enrique Lobato Miguélez was born in Burgos, Spain, in 1974. He received the degree of Electrical Engineer in 1998 and the PhD degree in 2002 from Universidad Pontificia Comillas, Madrid, Spain. Since June 1998, he has been a Researcher at the Instituto de Investigación Tecnológica, Universidad Pontificia Comillas. His areas of interest include analysis, planning, operation and economics of electric power systems. He has participated in several research projects for different firms related to the energy industry.

Luis Rouco Rodríguez (Student Member 1989, Member 1991) obtained the Electrical Engineer degree and the Ph.D. degree from Universidad Politécnica de Madrid in 1985 and 1990, respectively. He is Associate Professor at the School of Engineering of Universidad Pontificia Comillas. His areas of interest are modeling, analysis, simulation and identification of electric power systems. He has been a visiting researcher at Ontario Hydro, MIT and ABB Power Systems.

Alvaro Franco Ugidos was born in Leon, Spain, in 1978. He received the degree of Industrial Engineer in 2001 from Universidad Pontificia Comillas, Madrid, Spain. From December 2001 to September 2002, he was a Researcher at the Instituto de Investigación Tecnológica, Universidad Pontificia Comillas. He is currently a gas analyst at Endesa and is pursuing a Business Administration degree at UNED. His areas of interest include operation and economics of electric and gas power systems.

Joaquín Fernández-Caro was born in Seville, Spain, in 1971. He received the degree of Mechanical Engineer in 1996. He started working at Saltos del Guadiana in 1997, in charge of hydro scheduling. He then moved to the Spanish utility Endesa, where he spent 3 years in the wholesale market bidding group before moving to strategy. In 2001 he moved to Viesgo S. L. of Enel Group, where he is currently in charge of wholesale market operation and medium-term planning.

Julián de Benito was born in Madrid, Spain, in 1972. He received the degree of Electrical Engineer in 1997. He started working at ABB Service, a branch of ABB AG, in 1998, as a consultant in plant maintenance. In 1999, he joined Endesa, becoming part of the wholesale market bidding group. In 2001, he moved to Viesgo Generación S. L. of Enel Group, where he is in charge of the energy management center of Viesgo Generación.

Jorge de la Hoz Ardiz was born in Madrid, Spain, in 1975. He received the degree of Electrical Engineer in 1999 from Universidad Pontificia Comillas, Madrid, Spain. In 1999, he joined SchlumbergerSema as Energy Business Consultant in charge of Electricity and Gas Projects. In January 2002, he moved to Viesgo S.L. of Enel Group, where he is in charge of the energy management and bidding of Viesgo in the Spanish wholesale electricity market.

Javier Chofre Álvarez was born in Denia (Alicante), Spain, in 1973. He received the degree of Mining Engineer (Energy) in 1998 from Universidad Politécnica, Madrid, Spain. In 1998, he joined Tecnatom (Engineering Department) at the ELCOGAS GICC plant. In 2000, he was in charge of O&M at the CND Cogeneration Plant, in the Dominican Republic. In January 2002, he moved to Viesgo S.L. of Enel Group, where he is in charge of the energy management and bidding of Viesgo in the Spanish wholesale electricity market.
