Article history:
Received 1 March 2016
Received in revised form 22 April 2016
Accepted 15 May 2016
Available online 26 May 2016

Keywords:
Automatic generation control
Islanding smart distribution network
Wolf pack hunting
Virtual consensus variable

Abstract: As the conventional centralized automatic generation control (AGC) is inadequate to handle the ever-increasing penetration of renewable energy and the plug-and-play requirement of the smart grid, this paper proposes a mixed homogeneous and heterogeneous multi-agent based wolf pack hunting (WPH) strategy to achieve fast AGC power dispatch, optimal coordinated control, and electric power autonomy of an islanding smart distribution network (ISDN). A virtual consensus variable is employed to deal with the topology variation resulting from the excess of power limits and to achieve the plug-and-play of AGC units. Then an integrated objective of frequency deviation and short-term economic dispatch is developed, such that all units can maintain optimal operation in the presence of load disturbances. Four case studies are undertaken on an ISDN with various distributed generations and microgrids. Simulation results demonstrate that WPH offers greater robustness and faster dynamic optimization than conventional approaches, which can increase the utilization rate of renewable energy and effectively resolve the coordination and electric power autonomy of ISDN.

© 2016 Published by Elsevier Ltd.

http://dx.doi.org/10.1016/j.enconman.2016.05.039
0196-8904/© 2016 Published by Elsevier Ltd.
L. Xi et al. / Energy Conversion and Management 122 (2016) 10–24
multi-step Q(λ) was designed for optimal power flow of a large-scale power grid. It is a single-agent based approach, which needs a large amount of computation time as the agent number increases. Moreover, multiple equilibriums may emerge, which would result in an undesired system instability. In contrast, this paper develops a robust decentralized AGC controller, which can achieve a coordinated control between multiple agents to improve the global performance of the whole system with an easy implementation. It is a multiple-agent based approach which consumes a much smaller amount of computation time as the agent number increases compared to that of [19]. Furthermore, it only has a single equilibrium, thus the system stability can be maintained. However, the above methods have not taken the collaborative consensus (CC) of decentralized control systems into account, and cannot achieve a smart collaboration as each region is independent.

In MAS, a consensus among all agents is defined as the same selection of an objective variable value through information exchanging with adjacent agents [20]. In the past decades, the application
Family: a group of units sharing similar regulation features in a region, such as hydro, micro gas turbine, and diesel generator units.
Patriarch: the leader of the generation control units (a big wolf in WPH) with significant dispatch capacity; it can perform highly active searching and execute complex generation commands independently.
Family member: a follower of the generation control units (a small wolf in WPH); it can only follow the patriarch's behavior and execute some simple generation commands.
Reserve: a standby group of small hydropower units; it will only be put into operation if a load disturbance exceeds 50% of the default value.

2.1. MAS-SG framework

A wolf king adopts MAS-SG to achieve a control objective while each control area contains one and only one wolf pack. The DWoLF-PHC(λ) method based on MAS-SG has been developed by the authors. Some basic results are recalled in this section, while more details can be found in [33,39].

The optimal target state value function Vπ*(s) and strategy π*(s) obtained under state s in Q-learning can be expressed as follows:

Vπ*(s) = max_{a∈A} Q(s, a)    (1)

U(sk, ak) = min(U(sk, ak), φi/(|Ai| − 1))    (9)

where φ is a variable learning rate with φlose > φwin. If the average mixed strategy value is lower than the current value, then the agent wins and φwin will be selected; otherwise φlose will be chosen. The updating law is given as

φi = φwin, if Σ_{ai∈A} U(sk, ai)Q(sk, ai) > Σ_{ai∈A} Ũ(sk, ai)Q(sk, ai); φlose, otherwise    (10)

where Ũ(sk, ai) is the average mixed strategy. After an action ai is executed, the average mixed strategy table of all actions is updated under state sk by

Ũ(sk, ai) ← Ũ(sk, ai) + (U(sk, ai) − Ũ(sk, ai))/visit(sk),  ∀ai ∈ A    (11)

where visit(sk) is the total number of visits to state sk from the initial state to the current state.

2.2. MAS-CC framework

MAS-CC is introduced into WPH; it is adopted by the family members with homogeneous MAS to follow the patriarch of a wolf pack.
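The variable-learning-rate selection (10) and the average mixed strategy update (11) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are mine, and the default rates assume Table 1's φ = 0.06 is φlose, with φwin = 0.015 obtained from the ratio φlose/φwin = 4 given in Section 3.2.

```python
import numpy as np

def select_learning_rate(U, U_avg, Q, phi_win=0.015, phi_lose=0.06):
    # Eq. (10): the agent "wins" when its current mixed strategy U earns a
    # higher expected Q-value than the average mixed strategy U_avg; it then
    # learns cautiously (phi_win), otherwise it adapts fast (phi_lose).
    if np.dot(U, Q) > np.dot(U_avg, Q):
        return phi_win
    return phi_lose

def update_average_strategy(U, U_avg, visits):
    # Eq. (11): running average of the mixed strategy over visits to state s_k.
    return U_avg + (U - U_avg) / visits
```

This "win or learn fast" rule is what keeps the wolf pack stable near an equilibrium (small steps when winning) while escaping poor strategies quickly (large steps when losing).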
Initialize Q0(s, a), R(0), U0(s, a), Ũ0(s, a), visit(s0), and e0(s, a), for all s ∈ S, a ∈ A;
Set parameters φwin, φlose, φ, γ, λ, α, and Tstep = AGC decision time;
Give the initial state s0, k = 0;
Repeat
1. Choose an exploration action ak based on the mixed strategy set U(sk, ak);
2. Execute the exploration action ak on the AGC units and run the LFC system for the next Tstep seconds;
3. Observe a new state sk+1 via ∆f and generation costs;
4. Obtain a short-term reward R(k) using (25);
5. Calculate the one-step Q-function error ρk by (4);
6. Estimate the SARSA(0) value function error δk through (5);
7. For each state-action pair (s, a), execute:
   i) Let ek+1(s, a) ← γλek(s, a);
   ii) Update the Q-function Qk(s, a) to Qk+1(s, a) according to (6);
8. Resolve the mixed strategy Uk(sk, ak) by (8) and (9);
9. Update the value function Qk(sk, ak) to Qk+1(sk, ak) using (7);
10. Update the eligibility trace using (3), let e(sk, ak) ← e(sk, ak) + 1;
11. Select the variable learning rate φ according to (10);
12. Resolve the average mixed strategy table based on (11);
13. Set visit(sk) ← visit(sk) + 1;
14. Output the total power reference ∆P;
15. Apply the consensus algorithm through (20) or (21);
16. Calculate the unit's regulation power ∆PGi using (19);
17. If the generation limit is not exceeded, then go to step 19;
18. Calculate the consensus variable xi and the unit's regulation power ∆PGi by (23);
19. Calculate the power error ∆Perror according to (22);
20. If |∆Perror| < ∆Perror^max is not satisfied, go to step 15;
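The inner dispatch loop (steps 15–20) can be sketched as below. This is a simplified stand-in, not the paper's exact eqs. (19)–(22), which are not reproduced in this excerpt: D is assumed row-stochastic, the mapping from the consensus variable x to unit power uses P_Gi = x_i/(2a_i) (the equal-incremental inverse with b_i omitted for brevity), and the leader correction adds ε∆Perror to agent 0. All names and defaults are illustrative.

```python
import numpy as np

def consensus_dispatch(delta_p_total, D, x0, alpha, p_min, p_max,
                       eps=0.05, p_err_max=1.0, max_iter=1000):
    # D: row-stochastic consensus (topology) matrix; alpha: cost curvatures a_i.
    x = np.asarray(x0, dtype=float)
    p = np.zeros_like(x)
    p_err = delta_p_total
    for _ in range(max_iter):
        x = D @ x                                     # step 15: consensus update
        p = np.clip(x / (2.0 * alpha), p_min, p_max)  # step 16: unit power from x
        p_err = delta_p_total - p.sum()               # step 19: power error
        if abs(p_err) < p_err_max:                    # step 20: termination test
            break
        x[0] += eps * p_err                           # leader correction by eps*dP_error
    return p, p_err
```

With an averaging matrix D, the leader's injection ε∆Perror is spread over all agents, so the residual power error shrinks geometrically until the termination criterion is met.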
Table 1
The parameter values used in WPH.

Parameter | Value
λ (trace-attenuation factor) | 0.9
γ (discount factor) | 0.9
α (Q-learning rate) | 0.5
φ (variable learning rate) | 0.06

Table 2
Transfer function parameters of units used in ISDN model.

Ci(PGi) = ai PGi² + bi PGi + ci    (15)

where PGi is the active power of the ith unit; Ci is the generation cost of the ith unit; and the positive constants ai, bi, ci are the coefficients of the generation cost. Hence, the generation costs of the assigned AGC power dispatch will be amended accordingly.
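The quadratic cost (15), its marginal (incremental) cost, and the inverse mapping underlying the equal incremental principle can be sketched as follows; the function names are illustrative, and the inverse assumes the standard result that at a common incremental cost λ each unit generates P_Gi = (λ − b_i)/(2a_i).

```python
def generation_cost(p_gi, a_i, b_i, c_i):
    # Eq. (15): quadratic generation cost of the ith unit.
    return a_i * p_gi ** 2 + b_i * p_gi + c_i

def incremental_cost(p_gi, a_i, b_i):
    # Marginal cost dC_i/dP_Gi = 2 a_i P_Gi + b_i.
    return 2.0 * a_i * p_gi + b_i

def unit_power_at_incremental_cost(lam, a_i, b_i):
    # Inverse mapping: at a common incremental cost lam, unit i generates
    # P_Gi = (lam - b_i) / (2 a_i); equal lam across units minimizes total cost.
    return (lam - b_i) / (2.0 * a_i)
```

Dispatching all units at a common incremental cost is what the consensus variable below converges to.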
∆Perror = ∆PΣ − Σ_{i=1}^{n} ∆PGi    (22)

Thus, the consensus update with consideration of generation constraints can be described as

xi = xi,lower, if ∆PGi < ∆PGi,min
xi[k + 1] = Σ_{j=1}^{n} dij xj[k], if ∆PGi,min ≤ ∆PGi ≤ ∆PGi,max    (23)
xi = xi,upper, if ∆PGi > ∆PGi,max

where xi,lower and xi,upper are the minimum and maximum of the ith agent consensus variable.

2.4. Virtual consensus variable

It can be readily found from (23) that the consensus variable update is restricted by the upper and lower limits of the adjustable capacity of the units. Basically, if the active power of a unit exceeds its limit, then this limit will be taken as the consensus variable, which will not be further updated. A change of the updating law means a variation of the dimension and elements of the topology matrix D. In addition, an effective solution to the time-varying topology needs to be found to satisfy the plug-and-play requirement of ISDN. As a result, a virtual consensus variable is proposed to tackle the above issues, which is the same as (20) and (21) and does not require the power limit of units to update itself, thus the
Table 3
System parameters of units used in ISDN model.
Fig. 6. The pre-learning obtained under sinusoidal load disturbance. (a) The pre-learning of different methods. (b) The system frequency of different methods.
Fig. 7. Q-function differences of different methods (WPH, Q(λ), DWoLF-PHC(λ), and Q-learning) obtained during the pre-learning.
computational burden can be dramatically reduced. Moreover, one can use the virtual consensus variable to virtually connect the standby units, and obtain the real consensus variable through a correction of the power limit without any further modification of the system topology, such that the plug-and-play can be achieved. The real consensus variable xi can be calculated after the virtual consensus variable xi,virtual is obtained as

xi = xi,lower, if xi,virtual < xi,lower
xi = xi,virtual, if xi,lower ≤ xi,virtual ≤ xi,upper    (24)
xi = xi,upper, if xi,virtual > xi,upper

2.5. WPH procedure

The consensus variable xi and the unit's regulation power ∆PGi cannot be updated simultaneously in all regions, which results in an undesirable time-delay for the obtained optimal strategy. The overall WPH procedure is illustrated by Fig. 4.

3. WPH designed for AGC

This section aims to design WPH for an adaptive coordinated AGC. During each iteration, the wolf king monitors the current operation state online to update the value function and Q-function, then an action will be executed based on the average mixed strategy.
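The constrained update (23) and the virtual-variable saturation (24) can be sketched together as follows; the vectorized form and the names are illustrative assumptions rather than the paper's notation.

```python
import numpy as np

def consensus_update_constrained(x, D, p, p_min, p_max, x_lower, x_upper):
    # Eq. (23): units whose regulation power violates a limit pin their
    # consensus variable to the corresponding bound; unconstrained units
    # take the weighted average of their neighbours' variables via D.
    x_next = D @ x
    x_next = np.where(p < p_min, x_lower, x_next)
    x_next = np.where(p > p_max, x_upper, x_next)
    return x_next

def real_from_virtual(x_virtual, x_lower, x_upper):
    # Eq. (24): the virtual consensus variable is updated without regard to
    # power limits; the real variable is obtained by saturating it afterwards,
    # so the topology matrix D never needs to change.
    return np.clip(x_virtual, x_lower, x_upper)
```

The contrast between the two functions shows the point of Section 2.4: (23) changes which rows of D are active, whereas (24) leaves the consensus iteration untouched and applies the limits only as an output clamp.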
Fig. 8. Control performance of different methods obtained under step load disturbance. (a) The controller output of different methods. (b) The system frequency of different methods.
Fig. 9. Control performance of different methods obtained under an impulsive load disturbance. (a) The controller output of different methods. (b) The system frequency of different methods.
A weighted sum of |∆f| and Cinstantaneous is selected as the reward function, in which a larger weighted sum results in a smaller reward. The reward function R is written as

R(sk−1, sk, ak−1) = −μ|∆f|² − (1 − μ)Cinstantaneous/50000    (25)

where |∆f| and Cinstantaneous denote the instantaneous absolute value of the frequency deviation and the actual generation costs of all units at the kth iteration, respectively. μ and (1 − μ) represent the weights of |∆f| and the generation costs, respectively. Here μ = 0.5 is chosen.
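Eq. (25) transcribes directly into code; the function signature is an illustrative assumption.

```python
def reward(delta_f, cost_instantaneous, mu=0.5):
    # Eq. (25): penalize the squared frequency deviation and the instantaneous
    # generation costs (scaled by 1/50000 to a comparable magnitude); larger
    # deviations or costs yield a smaller (more negative) reward.
    return -mu * abs(delta_f) ** 2 - (1.0 - mu) * cost_instantaneous / 50000.0
```

For example, a 0.2 Hz deviation with $10,000 of instantaneous cost gives R = −0.02 − 0.1 = −0.12, so the two terms contribute on a similar scale, which is the purpose of the 1/50000 factor.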
3.2. Parameter setting

There are four parameters, λ, γ, α, and φ, that need to be appropriately selected as follows [9,39]:

The trace-attenuation factor 0 < λ < 1, which allocates the credits among state-action pairs. It determines the convergence rate and the non-Markov decision process (MDP) effects for large time-delay systems. A smaller value assigns fewer credits to the historical state-action pairs for Q-function errors.

The discount factor 0 < γ < 1, which discounts the future rewards of Q-functions. A value close to 1 should be chosen, as the latest rewards in the thermal-dominated load frequency control (LFC) process are the most important [9].

The Q-learning rate 0 < α < 1, which trades off the convergence rate against the algorithm stability of Q-functions. Note that a small value decelerates the learning rate but enhances the system stability.

The variable learning rate 0 < φ < 1, which derives an optimal policy by maximizing the action value. In particular, the algorithm degrades into Q-learning if φ = 1, as a maximal action value is permanently executed in every iteration. For a fast convergence rate, a stochastic game ratio φlose/φwin = 4 is selected.

The parameter values used in WPH are given in Table 1.

3.3. Analysis of convergence coefficient ε

It can be seen from (21) that the convergence coefficient ε determines the convergence rate of the algorithm: a small value results in slow convergence, while a large value may lead to non-convergence. Hence, a proper ε is very important for the trade-off between the convergence rate and the stability of the algorithm.

Simulation tests have found that the convergence rate is significantly decelerated as ∆Perror decreases; thus a variable convergence coefficient ε is proposed to improve the convergence rate. For the leader, ε∆Perror is chosen as a constant when the magnitude of ∆Perror is less than a specific value, e.g., 10% of the maximum power deviation, such that the convergence rate can be increased. Note that the value of ε∆Perror should be properly chosen to guarantee the convergence of the algorithm.

For a system consisting of n agents, the leader consensus variable is increased by ε∆Perror while each agent consensus variable is increased by ε∆Perror/n on average. According to (19), the total AGC power increment is calculated as Σ_i (ε∆Perror/(2nai)) with the termination criterion ∆Perror ≤ ∆Perror^max. The sufficient condition for convergence is

|ε∆Perror| ≤ ∆Perror^max / Σ_{i=1}^{n} (1/(2nai))    (26)

where ∆Perror^max > 0 is the maximum tolerated power error of ISDN considering both the convergence rate and the stability of the algorithm.

4. Case studies

WPH is an extension of DWoLF-PHC(λ). Details of the case study of DWoLF-PHC(λ) carried out on the IEEE two-area LFC power system [43] can be found in the authors' previous work [33].

An ISDN model consisting of various small distributed generations (small hydropower plants, wind farms, biomass power plant, diesel generator, photovoltaics, etc.) is illustrated by Fig. 5. Note that this is a simplified model, as photovoltaics and wind farms do not join FR; the photovoltaics simulate 24 h radiation intensity from [26], while the wind farms have a cut-in wind speed of 3 m/s, a cut-off wind speed of 20 m/s, and a rated wind speed of 11 m/s. The physical meanings of the transfer function parameters of the units are given in Table 2, while the parameter values are
Fig. 10. Control performance of different methods obtained under a white noise load disturbance. (a) The controller output of different methods. (b) The system frequency of different methods.
Fig. 11. The generation plot using WPH. (a) Active power of renewable energy produced during 24 hours (wind farms, photovoltaics). (b) Active power of stochastic load disturbance and total power of AGC. (c) Active power of different types of unit (hydro, biomass).
taken from [44–47]. Moreover, other system parameters of the units used in the ISDN model are summarized in Table 3.

The structure of the ISDN model is demonstrated in Fig. 5, which involves one DN, 3 microgrids, and 19 adjustable units with a total regulation power of 2760 kW, while nonadjustable units are considered as load disturbances. Furthermore, each adjustable unit has a corresponding agent, and the connection weight bij between agents is chosen to be 1.

4.1. Pre-learning

Fig. 6 presents the pre-learning of the ISDN model, in which a consistent 10-min sinusoidal load disturbance is applied. It is obvious that WPH can converge to the optimal strategy. Additionally, a Q-matrix 2-norm criterion ||Qi^k(s, a) − Qi^(k−1)(s, a)||₂ ≤ 0.0001 is used as the termination criterion for the pre-learning of an optimal strategy [9]. Both the Q values and the look-up table are saved after the pre-learning, such that WPH can be applied to a real power system. The convergence of the Q-function differences obtained during the pre-learning is given by Fig. 7. Apparently, WPH can accelerate the convergence rate by about 51.3–57.4% over that of the others.

4.2. Step load disturbance

The control performance of WPH, DWoLF-PHC(λ), Q(λ)-learning, and Q-learning is compared in the presence of a step load disturbance applied to the ISDN model. Fig. 8 shows that their overshoots are around 2.7%, 8.6%, 6.3%, and 17.4%, respectively, while their error averages are 0.5%, 2%, 4%, and 4.5%, respectively. As a consequence, WPH can provide a better control performance for AGC units with lower control costs, such that the wear-and-tear of the units can be significantly reduced.

4.3. Impulsive and white noise load disturbance

The output of each unit is controlled by its own governor, and its set point is obtained according to an optimal dispatch. The long-term control performance of WPH is evaluated by statistical experiments under a specific disturbance applied over a period of 30 days. Four types of controller are tested, i.e., WPH, DWoLF-PHC(λ), Q(λ)-learning, and Q-learning. The statistical experiments are obtained under an impulsive and a white noise load disturbance. Figs. 9(a) and 10(a) illustrate that WPH has smoother regulation commands, while Figs. 9(b) and 10(b) show WPH can
Fig. 12. Comparison results of different algorithms. (a) Hourly generation costs of different algorithms obtained in 24 hours. (b) Total generation costs of different algorithms: WPH $122,511; QP $122,770; GWO $124,430; GA $125,220; PROP $130,124.
Table 4
Feature comparison of different algorithms.
improve |∆f| over that of the others by 2 times and 3 times, respectively.

Remark 1. Similar to our published work [9,10,14,32,33], a sinusoidal load disturbance was first adopted in the pre-learning to obtain the Q values and look-up table, which were saved and will be employed for future online operation. Then, in the online operation, a step change of load disturbance was used to simulate a sudden load increase, which often occurs in power system operation, to evaluate the control performance of WPH. In this paper, we have considered more practical operation conditions to further investigate the effectiveness and control performance of WPH by using an additional pulse-wave and a white noise load disturbance, which simulate a regular series of sudden load increases and drops (pulse-wave) as well as a random load disturbance representing an unpredictable penetration of distributed generations (white noise).

4.4. Stochastic load disturbance

A real-time simulation in the presence of a 24 h stochastic disturbance is undertaken on the ISDN model, in which the combinatorial effect of wind farms and photovoltaics is aggregated and considered as a square stochastic load with a cycle of 3600 s and a disturbance amplitude smaller than 2000 kW.

Fig. 11(a) shows the active power of wind farms and photovoltaics produced during 24 h, while Fig. 11(b) shows that the total active power can accurately and rapidly track the load disturbance. In particular, the load disturbance consists of wind farms, photovoltaics, and square disturbances. Note that the spikes appearing in the AGC active power are used to balance the stochastic power disturbance of wind farms and photovoltaics. Fig. 11(c) demonstrates the 24 h power regulation of the different AGC units. For a positive disturbance, small hydropower plants and micro-turbines are regulated first; otherwise biomass and diesel generators are regulated first. Therefore, each unit can achieve a load ED if the equal incremental principle is satisfied.

The control performance of WPH, GWO, PROP [48], quadratic programming (QP) [49], and GA [50] is compared here. Both the hourly generation costs in 24 h and the total generation costs of the different algorithms are represented in Fig. 12. Fig. 12(a) shows the generation costs of PROP are the highest while those of WPH are the lowest. Note that there exists a continuous fluctuation in the obtained results of GWO, thus its control performance is not
stable. Moreover, Fig. 12(b) presents that WPH can save about $259, $1919, $2709, and $7613 compared with QP, GWO, GA, and PROP, respectively.

As a result, WPH is more adaptive under various operation conditions and has a superior self-learning capability compared with the others, particularly when the system is disturbed by stochastic load fluctuations. Since both the joint decision actions and previous state-action pairs are employed, WPH uses the average policy value to design a variable learning rate to achieve ISDN coordination. Since the average mixed strategy needs to be resolved online for the mixed strategy update of the ISDN model, real-time control performance must be considered in designing the variable learning rate and the average policy value. Furthermore, it is straightforward to obtain a relative weight of each unit, which can dynamically update its Q-function look-up table through experience sharing, such that the controller can be properly and timely tuned to optimize the overall control performance. The experimental results verify that the utilization rate of renewable energy has been dramatically increased with reduced generation costs.

5. Discussion

The differences between the algorithms are provided in Table 4. One can find that WPH is convergent, decentralized, strongly robust, and has the lowest generation costs. This paper proposes a novel decentralized autonomous control, which has two main advantages as follows:

• WPH is based on active power control and area frequency autonomy, while the existing automatic voltage control (AVC) is based on reactive power control and node voltage control. This similarity inspires a combination of WPH and AVC for future studies. As a result, the implementation of WPH in a decentralized EMS is feasible with acceptable generation costs.

• The power generation can be optimized by WPH in the presence of the ever-increasing penetration of wind, solar, and flywheel energy storage. Furthermore, the introduction of decentralized autonomy can fully exploit the power generated from large centralized sources (hydro, thermal, gas, nuclear energy, etc.), small distributed sources (wind, solar, ocean energy, etc.), controllable loads, and static/dynamic storage systems.

Note that WPH has the fastest convergence rate for AGC, which is within a control period of 4–16 s. Hence it is adequate for the control design of many small time-scale systems, such as drone groups and robot groups.

6. Conclusion

The contributions of this paper can be summarized as follows:

(1) An equal incremental principle based WPH is designed by combining MAS-SG and MAS-CC to realize an optimal coordinated control of ISDN, which can simultaneously realize an SGC based on mixed homogeneous and heterogeneous MA.

(2) A virtual consensus variable has been employed in WPH to resolve the topology variation caused by the AGC power exceeding its limits, while the startup and shutdown of units can be transformed into an actual and virtual connection between agents. Besides, the use of a variable convergence coefficient significantly improves the convergence rate such that an AGC dynamic optimal dispatch can be achieved.

(3) Simulation results verify that WPH is highly adaptive and robust to the multi-regional, intensively stochastic, and interconnected complex ISDN, which can dramatically increase the utilization rate of renewable energy and reduce generation costs.

References

[1] Karavas CS, Kyriakarakos G, Arvanitis KG, Papadakis G. A multi-agent decentralized energy management system based on distributed intelligence for the design and control of autonomous polygeneration microgrids. Energy Convers Manage 2015;103:166–79.
[2] Torreglosa JP, García P, Fernández LM, Jurado F. Hierarchical energy management system for stand-alone hybrid system based on generation costs and cascade control. Energy Convers Manage 2014;77:514–26.
[3] Azzam M, Mohamed YS. Robust controller design for automatic generation control based on Q-parameterization. Energy Convers Manage 2012;43(13):1663–73.
[4] Tripathy SC, Bhardwaj V. Automatic generation control of a small hydro-turbine driven generator. Energy Convers Manage 1996;37(11):1635–45.
[5] Howlader HR, Matayoshi H, Senjyu T. Distributed generation incorporated with the thermal generation for optimum operation of a smart grid considering forecast error. Energy Convers Manage 2015;96:303–14.
[6] Shayeghi H, Ghasemi A, Moradzadeh M, Nooshyar M. Simultaneous day-ahead forecasting of electricity price and load in smart grids. Energy Convers Manage 2015;95:371–84.
[7] Bevrani H, Habibi F, Babahajyani P, Watanabe M, Mitani Y. Intelligent frequency control in an AC microgrid: online PSO-based fuzzy tuning approach. IEEE Trans Smart Grid 2012;3(4):1935–44.
[8] Mallesham G, Mishra S, Jha AN. Automatic generation control of microgrid using artificial intelligence techniques. In: Proceedings of the IEEE Power and Energy Society General Meeting, vol. 59; 2012. p. 1–8.
[9] Yu T, Zhou B, Chan KW, Chen L, Yang B. Stochastic optimal relaxed automatic generation control in non-Markov environment based on multi-step Q(λ) learning. IEEE Trans Power Syst 2011;26(3):1272–82.
[10] Yu T, Zhou B, Chan KW, Yuan Y, Yang B, Wu Q. R(λ) imitation learning for automatic generation control of interconnected power grids. Automatica 2012;48(9):2130–6.
[11] Yu T, Wang Y, Ye W, Zhou B, Chan KW. Stochastic optimal generation command dispatch based on improved hierarchical reinforcement learning approach. IET Gener Transm Distrib 2011;5(8):789–97.
[12] Zhou B, Chan KW, Yu T. Equilibrium-inspired multiple group search optimizer with synergistic learning for multi-objective electric power dispatch. IEEE Trans Power Syst 2013;28(4):3534–45.
[13] Zhou B, Chan KW, Yu T, Wei H, Tang J. Strength pareto multi-group search optimizer for multiobjective optimal VAR dispatch. IEEE Trans Ind Inform 2014;10(2):1012–22.
[14] Yu T, Zhou B, Chan KW, Lu E. Stochastic optimal CPS relaxed control methodology for interconnected power systems using Q-learning method. J Energy Eng-ASCE 2011;137(3):116–29.
[15] Doostizadeh M, Aminifar F, Lesani H, Ghasemi H. Multi-area market clearing in wind-integrated interconnected power systems: a fast parallel decentralized method. Energy Convers Manage 2016;113:131–42.
[16] Torreglosa JP, García-Triviño P, Fernández-Ramirez LM, Jurado F. Decentralized energy management strategy based on predictive controllers for a medium voltage direct current photovoltaic electric vehicle charging station. Energy Convers Manage 2016;108:1–13.
[17] Sayari NA, Chilipi R, Barara M. An adaptive control algorithm for grid-interfacing inverters in renewable energy based distributed generation systems. Energy Convers Manage 2016;111:443–52.
[18] Mehrasa M, Pouresmaeil E, Mehrjerdi H, Jørgensen BN, Catalão JP. Control technique for enhancing the stable operation of distributed generation units within a microgrid. Energy Convers Manage 2015;97:362–73.
[19] Yu T, Liu J, Hu X, Chan KW, Wang J. Distributed multi-step Q(λ) learning for optimal power flow of large-scale power grids. Int J Electr Power Energy Syst 2012;42(1):614–20.
[20] Degroot MH. Reaching a consensus. J Am Stat Assoc 1974;69(345):118–21.
[21] Zhang Z, Ying XC, Chow MY. Decentralizing the economic dispatch problem using a two-level increment cost consensus algorithm in a smart grid environment. In: North American Power Symposium. Charlotte: IEEE Press; 2011. p. 1–7.
[22] Zhang Z, Chow MY. The leader election criterion for decentralized economic dispatch using incremental cost consensus algorithm. In: IECON 2011 - 37th Annual Conference of the IEEE Industrial Electronics Society. Melbourne: IEEE Press; 2011. p. 2730–5.
[23] Zhang Z, Chow MY. Convergence analysis of the incremental cost consensus algorithm under different communication network topologies in a smart grid. IEEE Trans Power Syst 2012;27(4):1761–8.
[24] Yang S, Tan S, Xu JX. Consensus based approach for economic dispatch problem in a smart grid. IEEE Trans Power Syst 2013;28(4):4416–26.
[25] Zhang Y, Rahbari-Asr N, Chow MY. A robust distributed system incremental cost estimation algorithm for smart grid economic dispatch with communications information losses. J Netw Comput Appl 2016;59:315–24.
[26] Binetti G, Davoudi A, Lewis FL, Naso D, Turchiano B. Distributed consensus-based economic dispatch with transmission losses. IEEE Trans Power Syst 2014;29(4):1711–20.
[27] Kar S, Hug G. Distributed robust economic dispatch in power systems: a consensus + innovations approach. In: IEEE Power and Energy Society General Meeting. San Diego: IEEE Press; 2012. p. 1–8.
[28] Rahbari-Asr N, Ojha U, Zhang Z, Chow MY. Incremental welfare consensus algorithm for cooperative distributed generation/demand response in smart grid. IEEE Trans Smart Grid 2014;5(6):2836–45.
[29] Loia V, Vaccaro A. Decentralized economic dispatch in smart grids by self-organizing dynamic agents. IEEE Trans Syst, Man, Cybern, Syst 2014;44(4):397–408.
[30] Gupta E, Saxena A. Grey wolf optimizer based regulator design for automatic generation control of interconnected power system. Cogent Eng 2016;8459:49–64.
[31] Mallick RK, Debnath MK, Haque F, Rout RR. Application of grey wolves-based optimization technique in multi-area automatic generation control. In: International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT); 2016.
[32] Yu T, Xi L, Yang B, Xu Z, Jiang L. Multiagent stochastic dynamic game for smart generation control. J Energy Eng 2016;142(1):04015012.
[33] Xi L, Yu T, Yang B, Zhang X. A novel multi-agent decentralized win or learn fast policy hill-climbing with eligibility trace algorithm for smart generation control of interconnected complex power grids. Energy Convers Manage 2015;103:82–93.
[34] Pudjianto D, Ramsay C, Strbac G. Virtual power plant and system integration of distributed energy resources. IET Renew Power Gen 2007;1(1):10–6.
[35] Busoniu L, Babuska R, De Schutter B. A comprehensive survey of multiagent reinforcement learning. IEEE Trans Syst Man Cybern C Appl Rev 2008;38(2):156–72.
[36] You H, Vittal V, Yang Z. Self-healing in power systems: an approach using islanding and rate of frequency decline-based load shedding. IEEE Trans Power Syst 2003;18(1):174–81.
[37] Ali R, Mohamed TH, Qudaih YS, Mitani Y. A new load frequency control approach in an isolated small power systems using coefficient diagram method. Int J Electr Power Energy Syst 2014;56(3):110–6.
[38] Bevrani H, Hiyama T. Intelligent automatic generation control. CRC Press; 2011.
[39] Bowling M, Veloso M. Multiagent learning using a variable learning rate. Artif Intell 2002;136(2):215–50.
[40] Godsil C, Royle G. Algebraic graph theory. New York: Springer-Verlag; 2001.
[41] Moreau L. Stability of multiagent systems with time-dependent communication links. IEEE Trans Autom Control 2005;50(2):169–82.
[42] Ren W, Beard RW. Distributed consensus in multi-vehicle cooperative control: theory and applications. London: Springer-Verlag; 2008.
[43] Ray G, Prasad AN, Prasad GD. A new approach to the design of robust load frequency controller for large scale power systems. Electr Power Syst Res 1999;52(1):13–22.
[44] Kundur P. Power system stability and control. New York: McGraw-Hill; 1994.
[45] Saha AK, Chowdhury S, Chowdhury SP, Crossley PA. Modeling and simulation of microturbine in islanded and grid-connected mode as distributed energy resource. In: IEEE Power & Energy Society General Meeting. Pittsburgh: IEEE Press; 2008. p. 1–7.
[46] Moreira C. Microgrids - operation and control under emergency conditions. LAP LAMBERT Academic Publishing; 2012.
[47] Awad B, Ekanayake JB, Jenkins N. Intelligent load control for frequency regulation in microgrids. Intell Autom Soft Comput 2010;16(2):303–18.
[48] Gao ZH, Teng XL, Tu LQ. Hierarchical AGC mode and CPS control strategy for interconnected power systems (in Chinese). Autom Electr Power Syst 2004;28(1):78–81.
[49] Haddadian H, Hosseini SH, Shayeghi H, Shayanfar HA. Determination of optimum generation level in DTEP using a GA-based quadratic programming. Energy Convers Manage 2011;52(1):382–90.
[50] Golpîra H, Bevrani H, Golpîra H. Application of GA optimization for automatic generation control design in an interconnected power system. Energy Convers Manage 2011;52(5):2247–55.