Article history:
Received 1 March 2016
Received in revised form 22 April 2016
Accepted 15 May 2016
Available online 26 May 2016

Keywords:
Automatic generation control
Islanding smart distribution network
Wolf pack hunting
Virtual consensus variable

Abstract: As the conventional centralized automatic generation control (AGC) is inadequate to handle the ever-increasing penetration of renewable energy and the plug-and-play requirement of the smart grid, this paper proposes a mixed homogeneous and heterogeneous multi-agent based wolf pack hunting (WPH) strategy to achieve fast AGC power dispatch, optimal coordinated control, and electric power autonomy of an islanding smart distribution network (ISDN). A virtual consensus variable is employed to deal with the topology variation resulting from the excess of power limits and to achieve the plug-and-play of AGC units. Then an integrated objective of frequency deviation and short-term economic dispatch is developed, such that all units can maintain optimal operation in the presence of load disturbances. Four case studies are undertaken on an ISDN with various distributed generations and microgrids. Simulation results demonstrate that WPH offers greater robustness and faster dynamic optimization than conventional approaches, which can increase the utilization rate of renewable energy and effectively resolve the coordination and electric power autonomy of ISDN.

© 2016 Published by Elsevier Ltd.

http://dx.doi.org/10.1016/j.enconman.2016.05.039
0196-8904/© 2016 Published by Elsevier Ltd.
L. Xi et al. / Energy Conversion and Management 122 (2016) 10–24
multi-step Q(λ) was designed for optimal power flow of a large-scale power grid. It is a single-agent based approach, which needs a large amount of computation time as the agent number increases. Moreover, multiple equilibriums may emerge, which would result in an undesired system instability. In contrast, this paper develops a robust decentralized AGC controller, which can achieve a coordinated control between multiple agents to improve the global performance of the whole system with an easy implementation. It is a multiple-agent based approach which consumes a much smaller amount of computation time as the agent number increases compared to that of [19]. Furthermore, it only has a single equilibrium, thus the system stability can be maintained. However, the above methods have not taken the collaborative consensus (CC) of decentralized control systems into account, and cannot achieve a smart collaboration as each region is independent.

In MAS, a consensus among all agents is defined as the same selection of an objective variable value through information exchanging with adjacent agents [20]. In the past decades, the application
Family: a group of units sharing similar regulation features in a region, such as hydro, micro gas turbine, and diesel generator units.
Patriarch: the leader of the generation control units (a big wolf in WPH) with significant dispatch capacity; it can perform highly active searching and execute complex generation commands independently.
Family member: a follower of the generation control units (a small wolf in WPH); it can only follow the patriarch's behavior and execute some simple generation commands.
Reserve: a standby group of small hydropower units; it will only be put into operation if a load disturbance exceeds 50% of the default value.

2.1. MAS-SG framework

A wolf king adopts MAS-SG to achieve a control objective while each control area contains one and only one wolf pack. The DWoLF-PHC(λ) method based on MAS-SG has been developed by the authors. Some basic results are recalled in this section, while more details can be found in [33,39].

The optimal target state value function Vπ*(s) and strategy π*(s) obtained under state s in Q-learning can be expressed as follows:

Vπ*(s) = max_{a∈A} Q(s, a)    (1)

U(sk, ak) = min(U(sk, ak), φi/(|Ai| − 1))    (9)

where φ is a variable learning rate with φlose > φwin. If the average mixed strategy value is lower than the current value, then the agent wins and φwin will be selected; otherwise φlose will be chosen. The updating law is given as

φi = φwin, if Σ_{ai∈A} U(sk, ai)Q(sk, ai) > Σ_{ai∈A} Ũ(sk, ai)Q(sk, ai); φlose, otherwise    (10)

where Ũ(sk, ai) is the average mixed strategy. After an action ai is executed, the average mixed strategy table of all actions is updated under state sk by

Ũ(sk, ai) ← Ũ(sk, ai) + (U(sk, ai) − Ũ(sk, ai))/visit(sk),  ∀ai ∈ A    (11)

where visit(sk) is the total number of visits to state sk from the initial state to the current state.

2.2. MAS-CC framework

MAS-CC is introduced into WPH; it is adopted by the family members with homogeneous MAS to follow the patriarch of a wolf pack.
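The variable-learning-rate selection (10) and the average mixed strategy update (11) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are mine, and the default rates assume Table 1's φ = 0.06 is φlose, with φwin = 0.015 obtained from the ratio φlose/φwin = 4 given in Section 3.2.

```python
import numpy as np

def select_learning_rate(U, U_avg, Q, phi_win=0.015, phi_lose=0.06):
    # Eq. (10): the agent "wins" when its current mixed strategy U earns a
    # higher expected Q-value than the average mixed strategy U_avg; it then
    # learns cautiously (phi_win), otherwise it adapts fast (phi_lose).
    if np.dot(U, Q) > np.dot(U_avg, Q):
        return phi_win
    return phi_lose

def update_average_strategy(U, U_avg, visits):
    # Eq. (11): running average of the mixed strategy over visits to state s_k.
    return U_avg + (U - U_avg) / visits
```

This "win or learn fast" rule is what keeps the wolf pack stable near an equilibrium (small steps when winning) while escaping poor strategies quickly (large steps when losing).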
Initialize Q0(s, a), R(0), U0(s, a), Ũ0(s, a), visit(s0), and e0(s, a), for all s ∈ S, a ∈ A;
Set parameters φwin, φlose, φ, γ, λ, α, and Tstep = AGC decision time;
Give the initial state s0, k = 0;
Repeat
1. Choose an exploration action ak based on the mixed strategy set U(sk, ak);
2. Execute the exploration action ak on the AGC units and run the LFC system for the next Tstep seconds;
3. Observe a new state sk+1 via ∆f and generation costs;
4. Obtain a short-term reward R(k) using (25);
5. Calculate the one-step Q-function error ρk by (4);
6. Estimate the SARSA(0) value function error δk through (5);
7. For each state-action pair (s, a), execute:
   i) Let ek+1(s, a) ← γλek(s, a);
   ii) Update the Q-function Qk(s, a) to Qk+1(s, a) according to (6);
8. Resolve the mixed strategy Uk(sk, ak) by (8) and (9);
9. Update the value function Qk(sk, ak) to Qk+1(sk, ak) using (7);
10. Update the eligibility trace using (3), let e(sk, ak) ← e(sk, ak) + 1;
11. Select the variable learning rate φ according to (10);
12. Resolve the average mixed strategy table based on (11);
13. Set visit(sk) ← visit(sk) + 1;
14. Output the total power reference ∆P;
15. Apply the consensus algorithm through (20) or (21);
16. Calculate the unit's regulation power ∆PGi using (19);
17. If the generation limit is not exceeded, then go to step 19;
18. Calculate the consensus variable xi and the unit's regulation power ∆PGi by (23);
19. Calculate the power error ∆Perror according to (22);
20. If |∆Perror| < ∆Perror^max is not satisfied, go to step 15;
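The inner dispatch loop (steps 15–20) can be sketched as below. This is a simplified stand-in, not the paper's exact eqs. (19)–(22), which are not reproduced in this excerpt: D is assumed row-stochastic, the mapping from the consensus variable x to unit power uses P_Gi = x_i/(2a_i) (the equal-incremental inverse with b_i omitted for brevity), and the leader correction adds ε∆Perror to agent 0. All names and defaults are illustrative.

```python
import numpy as np

def consensus_dispatch(delta_p_total, D, x0, alpha, p_min, p_max,
                       eps=0.05, p_err_max=1.0, max_iter=1000):
    # D: row-stochastic consensus (topology) matrix; alpha: cost curvatures a_i.
    x = np.asarray(x0, dtype=float)
    p = np.zeros_like(x)
    p_err = delta_p_total
    for _ in range(max_iter):
        x = D @ x                                     # step 15: consensus update
        p = np.clip(x / (2.0 * alpha), p_min, p_max)  # step 16: unit power from x
        p_err = delta_p_total - p.sum()               # step 19: power error
        if abs(p_err) < p_err_max:                    # step 20: termination test
            break
        x[0] += eps * p_err                           # leader correction by eps*dP_error
    return p, p_err
```

With an averaging matrix D, the leader's injection ε∆Perror is spread over all agents, so the residual power error shrinks geometrically until the termination criterion is met.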
Table 1
The parameter values used in WPH.

Parameter | Value
λ (trace-attenuation factor) | 0.9
γ (discount factor) | 0.9
α (Q-learning rate) | 0.5
φ (variable learning rate) | 0.06

Table 2
Transfer function parameters of units used in ISDN model.

Ci(PGi) = ai PGi² + bi PGi + ci    (15)

where PGi is the active power of the ith unit; Ci is the generation cost of the ith unit; and the positive constants ai, bi, ci are the coefficients of the generation cost. Hence, the generation costs of the assigned AGC power dispatch will be amended accordingly.
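The quadratic cost (15), its marginal (incremental) cost, and the inverse mapping underlying the equal incremental principle can be sketched as follows; the function names are illustrative, and the inverse assumes the standard result that at a common incremental cost λ each unit generates P_Gi = (λ − b_i)/(2a_i).

```python
def generation_cost(p_gi, a_i, b_i, c_i):
    # Eq. (15): quadratic generation cost of the ith unit.
    return a_i * p_gi ** 2 + b_i * p_gi + c_i

def incremental_cost(p_gi, a_i, b_i):
    # Marginal cost dC_i/dP_Gi = 2 a_i P_Gi + b_i.
    return 2.0 * a_i * p_gi + b_i

def unit_power_at_incremental_cost(lam, a_i, b_i):
    # Inverse mapping: at a common incremental cost lam, unit i generates
    # P_Gi = (lam - b_i) / (2 a_i); equal lam across units minimizes total cost.
    return (lam - b_i) / (2.0 * a_i)
```

Dispatching all units at a common incremental cost is what the consensus variable below converges to.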
∆Perror = ∆PΣ − Σ_{i=1}^{n} ∆PGi    (22)

Thus, the consensus update with consideration of generation constraints can be described as

xi = xi,lower, if ∆PGi < ∆PGi,min
xi[k + 1] = Σ_{j=1}^{n} dij xj[k], if ∆PGi,min ≤ ∆PGi ≤ ∆PGi,max    (23)
xi = xi,upper, if ∆PGi > ∆PGi,max

where xi,lower and xi,upper are the minimum and maximum of the ith agent consensus variable.

2.4. Virtual consensus variable

It can be readily found from (23) that the consensus variable update is restricted by the upper and lower limits of the adjustable capacity of the units. Basically, if the active power of a unit exceeds its limit, then this limit will be taken as the consensus variable, which will not be further updated. A change of the updating law means a variation of the dimension and elements of the topology matrix D. In addition, an effective solution to the time-varying topology needs to be found to satisfy the plug-and-play requirement of ISDN. As a result, a virtual consensus variable is proposed to tackle the above issues, which is the same as (20) and (21) and does not require the power limit of units to update itself, thus the
Table 3
System parameters of units used in ISDN model.
Fig. 6. The pre-learning obtained under sinusoidal load disturbance. (a) The pre-learning of different methods. (b) The system frequency of different methods.
Fig. 7. Q-function differences of different methods (WPH, Q(λ), DWoLF-PHC(λ), and Q-learning) obtained during the pre-learning.
computational burden can be dramatically reduced. Moreover, one can use the virtual consensus variable to virtually connect the standby units, and obtain the real consensus variable through a correction of the power limit without any further modification of the system topology, such that the plug-and-play can be achieved. The real consensus variable xi can be calculated after the virtual consensus variable xi,virtual is obtained as

xi = xi,lower, if xi,virtual < xi,lower
xi = xi,virtual, if xi,lower ≤ xi,virtual ≤ xi,upper    (24)
xi = xi,upper, if xi,virtual > xi,upper

2.5. WPH procedure

The consensus variable xi and the unit's regulation power ∆PGi cannot be updated simultaneously in all regions, which results in an undesirable time-delay for the obtained optimal strategy. The overall WPH procedure is illustrated by Fig. 4.

3. WPH designed for AGC

This section aims to design WPH for an adaptive coordinated AGC. During each iteration, the wolf king monitors the current operation state online to update the value function and Q-function, then an action will be executed based on the average mixed strategy.
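The constrained update (23) and the virtual-variable saturation (24) can be sketched together as follows; the vectorized form and the names are illustrative assumptions rather than the paper's notation.

```python
import numpy as np

def consensus_update_constrained(x, D, p, p_min, p_max, x_lower, x_upper):
    # Eq. (23): units whose regulation power violates a limit pin their
    # consensus variable to the corresponding bound; unconstrained units
    # take the weighted average of their neighbours' variables via D.
    x_next = D @ x
    x_next = np.where(p < p_min, x_lower, x_next)
    x_next = np.where(p > p_max, x_upper, x_next)
    return x_next

def real_from_virtual(x_virtual, x_lower, x_upper):
    # Eq. (24): the virtual consensus variable is updated without regard to
    # power limits; the real variable is obtained by saturating it afterwards,
    # so the topology matrix D never needs to change.
    return np.clip(x_virtual, x_lower, x_upper)
```

The contrast between the two functions shows the point of Section 2.4: (23) changes which rows of D are active, whereas (24) leaves the consensus iteration untouched and applies the limits only as an output clamp.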
Fig. 8. Control performance of different methods obtained under step load disturbance. (a) The controller output of different methods. (b) The system frequency of different methods.
Fig. 9. Control performance of different methods obtained under an impulsive load disturbance. (a) The controller output of different methods. (b) The system frequency of different methods.
A weighted sum of |∆f| and Cinstantaneous is selected as the reward function, in which a larger weighted sum results in a smaller reward. The reward function R is written as

R(sk−1, sk, ak−1) = −μ|∆f|² − (1 − μ)Cinstantaneous/50000    (25)

where |∆f| and Cinstantaneous denote the instantaneous absolute value of the frequency deviation and the actual generation costs of all units at the kth iteration, respectively. μ and (1 − μ) represent the weights of |∆f| and the generation costs, respectively. Here μ = 0.5 is chosen.
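Eq. (25) transcribes directly into code; the function signature is an illustrative assumption.

```python
def reward(delta_f, cost_instantaneous, mu=0.5):
    # Eq. (25): penalize the squared frequency deviation and the instantaneous
    # generation costs (scaled by 1/50000 to a comparable magnitude); larger
    # deviations or costs yield a smaller (more negative) reward.
    return -mu * abs(delta_f) ** 2 - (1.0 - mu) * cost_instantaneous / 50000.0
```

For example, a 0.2 Hz deviation with $10,000 of instantaneous cost gives R = −0.02 − 0.1 = −0.12, so the two terms contribute on a similar scale, which is the purpose of the 1/50000 factor.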
3.2. Parameter setting

There are four parameters, λ, γ, α, and φ, that need to be appropriately selected as follows [9,39]:

The trace-attenuation factor 0 < λ < 1, which allocates the credits among state-action pairs. It determines the convergence rate and the non-Markov decision process (MDP) effects for large time-delay systems. A smaller value assigns fewer credits to the historical state-action pairs for Q-function errors.

The discount factor 0 < γ < 1, which discounts the future rewards of Q-functions. A value close to 1 should be chosen, as the latest rewards in the thermal-dominated load frequency control (LFC) process are the most important [9].

The Q-learning rate 0 < α < 1, which trades off the convergence rate against the algorithm stability of Q-functions. Note that a small value decelerates the learning rate but enhances the system stability.

The variable learning rate 0 < φ < 1, which derives an optimal policy by maximizing the action value. In particular, the algorithm degrades into Q-learning if φ = 1, as a maximal action value is permanently executed in every iteration. For a fast convergence rate, a stochastic game ratio φlose/φwin = 4 is selected.

The parameter values used in WPH are given in Table 1.

3.3. Analysis of convergence coefficient ε

It can be seen from (21) that the convergence coefficient ε determines the convergence rate of the algorithm: a small value results in slow convergence, while a large value may lead to non-convergence. Hence, a proper ε is very important for the trade-off between the convergence rate and the stability of the algorithm.

Simulation tests have found that the convergence rate is significantly decelerated as ∆Perror decreases; thus a variable convergence coefficient ε is proposed to improve the convergence rate. For the leader, ε∆Perror is chosen as a constant when the magnitude of ∆Perror is less than a specific value, e.g., 10% of the maximum power deviation, such that the convergence rate can be increased. Note that the value of ε∆Perror should be properly chosen to guarantee the convergence of the algorithm.

For a system consisting of n agents, the leader consensus variable is increased by ε∆Perror while each agent consensus variable is increased by ε∆Perror/n on average. According to (19), the total AGC power increment is calculated as Σ_i (ε∆Perror/(2nai)) with the termination criterion ∆Perror ≤ ∆Perror^max. The sufficient condition for convergence is

|ε∆Perror| ≤ ∆Perror^max / Σ_{i=1}^{n} (1/(2nai))    (26)

where ∆Perror^max > 0 is the maximum tolerated power error of ISDN considering both the convergence rate and the stability of the algorithm.

4. Case studies

WPH is an extension of DWoLF-PHC(λ). Details of the case study of DWoLF-PHC(λ) carried out on the IEEE two-area LFC power system [43] can be found in the authors' previous work [33].

An ISDN model consisting of various small distributed generations (small hydropower plants, wind farms, biomass power plant, diesel generator, photovoltaics, etc.) is illustrated by Fig. 5. Note that this is a simplified model, as photovoltaics and wind farms do not join FR; the photovoltaics simulate 24 h radiation intensity from [26], while the wind farms have a cut-in wind speed of 3 m/s, a cut-off wind speed of 20 m/s, and a rated wind speed of 11 m/s. The physical meanings of the transfer function parameters of the units are given in Table 2, while the parameter values are
Fig. 10. Control performance of different methods obtained under a white noise load disturbance. (a) The controller output of different methods. (b) The system frequency of different methods.
Fig. 11. The generation plot using WPH. (a) Active power of renewable energy produced during 24 hours (wind farms, photovoltaics). (b) Active power of stochastic load disturbance and total power of AGC. (c) Active power of different types of unit (hydro, biomass).
taken from [44–47]. Moreover, other system parameters of the units used in the ISDN model are summarized in Table 3.

The structure of the ISDN model is demonstrated in Fig. 5, which involves one DN, 3 microgrids, and 19 adjustable units with a total regulation power of 2760 kW, while nonadjustable units are considered as load disturbances. Furthermore, each adjustable unit has a corresponding agent, and the connection weight bij between agents is chosen to be 1.

4.1. Pre-learning

Fig. 6 presents the pre-learning of the ISDN model, in which a consistent 10-min sinusoidal load disturbance is applied. It is obvious that WPH can converge to the optimal strategy. Additionally, a Q-matrix 2-norm criterion ||Qi^k(s, a) − Qi^(k−1)(s, a)||₂ ≤ 0.0001 is used as the termination criterion for the pre-learning of an optimal strategy [9]. Both the Q values and the look-up table are saved after the pre-learning, such that WPH can be applied to a real power system. The convergence of the Q-function differences obtained during the pre-learning is given by Fig. 7. Apparently, WPH can accelerate the convergence rate by about 51.3–57.4% over that of the others.

4.2. Step load disturbance

The control performance of WPH, DWoLF-PHC(λ), Q(λ)-learning, and Q-learning is compared in the presence of a step load disturbance applied to the ISDN model. Fig. 8 shows that their overshoots are around 2.7%, 8.6%, 6.3%, and 17.4%, respectively, while their error averages are 0.5%, 2%, 4%, and 4.5%, respectively. As a consequence, WPH can provide a better control performance for AGC units with lower control costs, such that the wear-and-tear of the units can be significantly reduced.

4.3. Impulsive and white noise load disturbance

The output of each unit is controlled by its own governor, and its set point is obtained according to an optimal dispatch. The long-term control performance of WPH is evaluated by statistical experiments under a specific disturbance applied over a period of 30 days. Four types of controller are tested, i.e., WPH, DWoLF-PHC(λ), Q(λ)-learning, and Q-learning. The statistical experiments are obtained under an impulsive and a white noise load disturbance. Figs. 9(a) and 10(a) illustrate that WPH has smoother regulation commands, while Figs. 9(b) and 10(b) show WPH can
Fig. 12. Comparison results of different algorithms. (a) Hourly generation costs of different algorithms obtained in 24 hours. (b) Total generation costs of different algorithms: WPH $122,511; QP $122,770; GWO $124,430; GA $125,220; PROP $130,124.
Table 4
Feature comparison of different algorithms.
improve |∆f| over that of the others by 2 times and 3 times, respectively.

Remark 1. Similar to our published work [9,10,14,32,33], a sinusoidal load disturbance was first adopted in the pre-learning to obtain the Q values and look-up table, which were saved and will be employed for future online operation. Then, in the online operation, a step change of load disturbance was used to simulate a sudden load increase, which often occurs in power system operation, to evaluate the control performance of WPH. In this paper, we have considered more practical operation conditions to further investigate the effectiveness and control performance of WPH by using an additional pulse-wave and a white noise load disturbance, which simulate a regular series of sudden load increases and drops (pulse-wave) as well as a random load disturbance representing an unpredictable penetration of distributed generations (white noise).

4.4. Stochastic load disturbance

A real-time simulation in the presence of a 24 h stochastic disturbance is undertaken on the ISDN model, in which the combinatorial effect of wind farms and photovoltaics is aggregated and considered as a square stochastic load with a cycle of 3600 s and a disturbance amplitude smaller than 2000 kW.

Fig. 11(a) shows the active power of wind farms and photovoltaics produced during 24 h, while Fig. 11(b) shows that the total active power can accurately and rapidly track the load disturbance. In particular, the load disturbance consists of wind farms, photovoltaics, and square disturbances. Note that the spikes appearing in the AGC active power are used to balance the stochastic power disturbance of wind farms and photovoltaics. Fig. 11(c) demonstrates the 24 h power regulation of the different AGC units. For a positive disturbance, small hydropower plants and micro-turbines are regulated first; otherwise biomass and diesel generators are regulated first. Therefore, each unit can achieve a load ED if the equal incremental principle is satisfied.

The control performance of WPH, GWO, PROP [48], quadratic programming (QP) [49], and GA [50] is compared here. Both the hourly generation costs in 24 h and the total generation costs of the different algorithms are represented in Fig. 12. Fig. 12(a) shows the generation costs of PROP are the highest while those of WPH are the lowest. Note that there exists a continuous fluctuation in the obtained results of GWO, thus its control performance is not
stable. Moreover, Fig. 12(b) presents that WPH can save about $259, $1919, $2709, and $7613 compared with QP, GWO, GA, and PROP, respectively.

As a result, WPH is more adaptive under various operation conditions and has a superior self-learning capability compared with the others, particularly when the system is disturbed by stochastic load fluctuations. Since both the joint decision actions and previous state-action pairs are employed, WPH uses the average policy value to design a variable learning rate to achieve ISDN coordination. Since the average mixed strategy needs to be resolved online for the mixed strategy update of the ISDN model, real-time control performance must be considered in designing the variable learning rate and the average policy value. Furthermore, it is straightforward to obtain a relative weight of each unit, which can dynamically update its Q-function look-up table through experience sharing, such that the controller can be properly and timely tuned to optimize the overall control performance. The experimental results verify that the utilization rate of renewable energy has been dramatically increased with reduced generation costs.

5. Discussion

The differences between the algorithms are provided in Table 4. One can find that WPH is convergent, decentralized, strongly robust, and has the lowest generation costs. This paper proposes a novel decentralized autonomous control, which has two main advantages as follows:

• WPH is based on active power control and area frequency autonomy, while the existing automatic voltage control (AVC) is based on reactive power control and node voltage control. This similarity inspires a combination of WPH and AVC for future studies. As a result, the implementation of WPH in a decentralized EMS is feasible with acceptable generation costs.

• The power generation can be optimized by WPH in the presence of the ever-increasing penetration of wind, solar, and flywheel energy storage. Furthermore, the introduction of decentralized autonomy can fully exploit the power generated from large centralized sources (hydro, thermal, gas, nuclear energy, etc.), small distributed sources (wind, solar, ocean energy, etc.), controllable loads, and static/dynamic storage systems.

Note that WPH has the fastest convergence rate for AGC, which is within a control period of 4–16 s. Hence it is adequate for the control design of many small time-scale systems, such as drone groups and robot groups.

6. Conclusion

The contributions of this paper can be summarized as follows:

(1) An equal incremental principle based WPH is designed by combining MAS-SG and MAS-CC to realize an optimal coordinated control of ISDN, which can simultaneously realize an SGC based on mixed homogeneous and heterogeneous MA.

(2) A virtual consensus variable has been employed in WPH to resolve the topology variation caused by the AGC power exceeding its limits, while the startup and shutdown of units can be transformed into an actual and virtual connection between agents. Besides, the use of a variable convergence coefficient significantly improves the convergence rate such that an AGC dynamic optimal dispatch can be achieved.

(3) Simulation results verify that WPH is highly adaptive and robust to the multi-regional, intensively stochastic, and interconnected complex ISDN, which can dramatically increase the utilization rate of renewable energy and reduce generation costs.

References

[1] Karavas CS, Kyriakarakos G, Arvanitis KG, Papadakis G. A multi-agent decentralized energy management system based on distributed intelligence for the design and control of autonomous polygeneration microgrids. Energy Convers Manage 2015;103:166–79.
[2] Torreglosa JP, García P, Fernández LM, Jurado F. Hierarchical energy management system for stand-alone hybrid system based on generation costs and cascade control. Energy Convers Manage 2014;77:514–26.
[3] Azzam M, Mohamed YS. Robust controller design for automatic generation control based on Q-parameterization. Energy Convers Manage 2012;43(13):1663–73.
[4] Tripathy SC, Bhardwaj V. Automatic generation control of a small hydro-turbine driven generator. Energy Convers Manage 1996;37(11):1635–45.
[5] Howlader HR, Matayoshi H, Senjyu T. Distributed generation incorporated with the thermal generation for optimum operation of a smart grid considering forecast error. Energy Convers Manage 2015;96:303–14.
[6] Shayeghi H, Ghasemi A, Moradzadeh M, Nooshyar M. Simultaneous day-ahead forecasting of electricity price and load in smart grids. Energy Convers Manage 2015;95:371–84.
[7] Bevrani H, Habibi F, Babahajyani P, Watanabe M, Mitani Y. Intelligent frequency control in an AC microgrid: online PSO-based fuzzy tuning approach. IEEE Trans Smart Grid 2012;3(4):1935–44.
[8] Mallesham G, Mishra S, Jha AN. Automatic generation control of microgrid using artificial intelligence techniques. In: Proceedings of the IEEE Power and Energy Society General Meeting, vol. 59; 2012. p. 1–8.
[9] Yu T, Zhou B, Chan KW, Chen L, Yang B. Stochastic optimal relaxed automatic generation control in non-Markov environment based on multi-step Q(λ) learning. IEEE Trans Power Syst 2011;26(3):1272–82.
[10] Yu T, Zhou B, Chan KW, Yuan Y, Yang B, Wu Q. R(λ) imitation learning for automatic generation control of interconnected power grids. Automatica 2012;48(9):2130–6.
[11] Yu T, Wang Y, Ye W, Zhou B, Chan KW. Stochastic optimal generation command dispatch based on improved hierarchical reinforcement learning approach. IET Gener Transm Distrib 2011;5(8):789–97.
[12] Zhou B, Chan KW, Yu T. Equilibrium-inspired multiple group search optimizer with synergistic learning for multi-objective electric power dispatch. IEEE Trans Power Syst 2013;28(4):3534–45.
[13] Zhou B, Chan KW, Yu T, Wei H, Tang J. Strength pareto multi-group search optimizer for multiobjective optimal VAR dispatch. IEEE Trans Ind Inform 2014;10(2):1012–22.
[14] Yu T, Zhou B, Chan KW, Lu E. Stochastic optimal CPS relaxed control methodology for interconnected power systems using Q-learning method. J Energy Eng-ASCE 2011;137(3):116–29.
[15] Doostizadeh M, Aminifar F, Lesani H, Ghasemi H. Multi-area market clearing in wind-integrated interconnected power systems: a fast parallel decentralized method. Energy Convers Manage 2016;113:131–42.
[16] Torreglosa JP, García-Triviño P, Fernández-Ramirez LM, Jurado F. Decentralized energy management strategy based on predictive controllers for a medium voltage direct current photovoltaic electric vehicle charging station. Energy Convers Manage 2016;108:1–13.
[17] Sayari NA, Chilipi R, Barara M. An adaptive control algorithm for grid-interfacing inverters in renewable energy based distributed generation systems. Energy Convers Manage 2016;111:443–52.
[18] Mehrasa M, Pouresmaeil E, Mehrjerdi H, Jørgensen BN, Catalão JP. Control technique for enhancing the stable operation of distributed generation units within a microgrid. Energy Convers Manage 2015;97:362–73.
[19] Yu T, Liu J, Hu X, Chan KW, Wang J. Distributed multi-step Q(λ) learning for optimal power flow of large-scale power grids. Int J Electr Power Energy Syst 2012;42(1):614–20.
[20] Degroot MH. Reaching a consensus. J Am Stat Assoc 1974;69(345):118–21.
[21] Zhang Z, Ying XC, Chow MY. Decentralizing the economic dispatch problem using a two-level increment cost consensus algorithm in a smart grid environment. In: North American Power Symposium. Charlotte: IEEE Press; 2011. p. 1–7.
[22] Zhang Z, Chow MY. The leader election criterion for decentralized economic dispatch using incremental cost consensus algorithm. In: IECON 2011 - 37th Annual Conference of the IEEE Industrial Electronics Society. Melbourne: IEEE Press; 2011. p. 2730–5.
[23] Zhang Z, Chow MY. Convergence analysis of the incremental cost consensus algorithm under different communication network topologies in a smart grid. IEEE Trans Power Syst 2012;27(4):1761–8.
[24] Yang S, Tan S, Xu JX. Consensus based approach for economic dispatch problem in a smart grid. IEEE Trans Power Syst 2013;28(4):4416–26.
[25] Zhang Y, Rahbari-Asr N, Chow MY. A robust distributed system incremental cost estimation algorithm for smart grid economic dispatch with communications information losses. J Netw Comput Appl 2016;59:315–24.
[26] Binetti G, Davoudi A, Lewis FL, Naso D, Turchiano B. Distributed consensus-based economic dispatch with transmission losses. IEEE Trans Power Syst 2014;29(4):1711–20.
[27] Kar S, Hug G. Distributed robust economic dispatch in power systems: a consensus + innovations approach. In: IEEE Power and Energy Society General Meeting. San Diego: IEEE Press; 2012. p. 1–8.
[28] Rahbari-Asr N, Ojha U, Zhang Z, Chow MY. Incremental welfare consensus algorithm for cooperative distributed generation/demand response in smart grid. IEEE Trans Smart Grid 2014;5(6):2836–45.
[29] Loia V, Vaccaro A. Decentralized economic dispatch in smart grids by self-organizing dynamic agents. IEEE Trans Syst, Man, Cybern, Syst 2014;44(4):397–408.
[30] Gupta E, Saxena A. Grey wolf optimizer based regulator design for automatic generation control of interconnected power system. Cogent Eng 2016;8459:49–64.
[31] Mallick RK, Debnath MK, Haque F, Rout RR. Application of grey wolves-based optimization technique in multi-area automatic generation control. In: International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT); 2016.
[32] Yu T, Xi L, Yang B, Xu Z, Jiang L. Multiagent stochastic dynamic game for smart generation control. J Energy Eng 2016;142(1):04015012.
[33] Xi L, Yu T, Yang B, Zhang X. A novel multi-agent decentralized win or learn fast policy hill-climbing with eligibility trace algorithm for smart generation control of interconnected complex power grids. Energy Convers Manage 2015;103:82–93.
[34] Pudjianto D, Ramsay C, Strbac G. Virtual power plant and system integration of distributed energy resources. IET Renew Power Gen 2007;1(1):10–6.
[35] Busoniu L, Babuska R, De Schutter B. A comprehensive survey of multiagent reinforcement learning. IEEE Trans Syst Man Cybern C Appl Rev 2008;38(2):156–72.
[36] You H, Vittal V, Yang Z. Self-healing in power systems: an approach using islanding and rate of frequency decline-based load shedding. IEEE Trans Power Syst 2003;18(1):174–81.
[37] Ali R, Mohamed TH, Qudaih YS, Mitani Y. A new load frequency control approach in an isolated small power systems using coefficient diagram method. Int J Electr Power Energy Syst 2014;56(3):110–6.
[38] Bevrani H, Hiyama T. Intelligent automatic generation control. CRC Press; 2011.
[39] Bowling M, Veloso M. Multiagent learning using a variable learning rate. Artif Intell 2002;136(2):215–50.
[40] Godsil C, Royle G. Algebraic graph theory. New York: Springer-Verlag; 2001.
[41] Moreau L. Stability of multiagent systems with time-dependent communication links. IEEE Trans Autom Control 2005;50(2):169–82.
[42] Ren W, Beard RW. Distributed consensus in multi-vehicle cooperative control: theory and applications. London: Springer-Verlag; 2008.
[43] Ray G, Prasad AN, Prasad GD. A new approach to the design of robust load frequency controller for large scale power systems. Electr Power Syst Res 1999;52(1):13–22.
[44] Kundur P. Power system stability and control. New York: McGraw-Hill; 1994.
[45] Saha AK, Chowdhury S, Chowdhury SP, Crossley PA. Modeling and simulation of microturbine in islanded and grid-connected mode as distributed energy resource. In: IEEE Power & Energy Society General Meeting. Pittsburgh: IEEE Press; 2008. p. 1–7.
[46] Moreira C. Microgrids - operation and control under emergency conditions. LAP LAMBERT Academic Publishing; 2012.
[47] Awad B, Ekanayake JB, Jenkins N. Intelligent load control for frequency regulation in microgrids. Intell Autom Soft Comput 2010;16(2):303–18.
[48] Gao ZH, Teng XL, Tu LQ. Hierarchical AGC mode and CPS control strategy for interconnected power systems (in Chinese). Autom Electr Power Syst 2004;28(1):78–81.
[49] Haddadian H, Hosseini SH, Shayeghi H, Shayanfar HA. Determination of optimum generation level in DTEP using a GA-based quadratic programming. Energy Convers Manage 2011;52(1):382–90.
[50] Golpîra H, Bevrani H, Golpîra H. Application of GA optimization for automatic generation control design in an interconnected power system. Energy Convers Manage 2011;52(5):2247–55.