
Distributed Quasi-Newton Method and its Application to the Optimal Reactive Power Flow Problem

Saverio Bolognani ∗  Sandro Zampieri ∗∗

∗ Department of Information Engineering, University of Padova, Padova, Italy (e-mail: saverio.bolognani@dei.unipd.it).
∗∗ Department of Information Engineering, University of Padova, Padova, Italy (e-mail: zampi@dei.unipd.it).

Abstract: We consider a distributed system of N agents, on which we define a quadratic optimization problem subject to a linear equality constraint. We assume that the nodes can estimate the gradient of the cost function by measuring the steady state response of the system. Even if the cost function cannot be decoupled into individual terms for the agents, and the linear constraint involves the whole system state, we are able to design a distributed, gradient-driven algorithm for the solution of the optimization problem. This algorithm belongs to the class of quasi-Newton methods and requires minimal knowledge of the system to behave fairly well. We prove finite time convergence of the algorithm in its centralized version, and we design its distributed implementation for the case in which a communication graph is given. In this latter case, the tool of average consensus turns out to be fundamental for the distribution of the algorithm. As a testbed for the proposed method, we consider the problem of optimal distributed reactive power compensation in smart microgrids.

Keywords: Distributed optimization; quasi-Newton method; consensus; reactive power compensation; smart microgrids.

1. INTRODUCTION

Distributed optimization has been a key research subject ever since, in computer science, the possibility was explored of minimizing a cost function by distributing the processing task among agents (processors). This framework was first presented in the seminal work by Tsitsiklis et al. (1986) and gave birth to a well known literature on the field, e.g. Bertsekas and Tsitsiklis (1997).

More recently, the problem of distributed optimization has been applied to the more challenging scenario of complex, large-scale systems. In these systems different issues may coexist: agents are in a large, unknown and time-varying number; individual agents do not know the whole cost function and cannot evaluate it; communication is constrained; sensing and actuation on an underlying physical system have to be performed together with data processing; the physical system is partially unknown. A notable "success story" in this sense is the application of distributed optimization to the Internet: since the work of Kelly et al. (1998), large-scale data networks have probably been the preferred testbed for these algorithms.

Among the main tools for distributed optimization in complex systems are the subgradient methods (see Nedic and Ozdaglar (2009) and references therein). The largest part of these works assume however that the cost function to be minimized is the sum of individual cost functions of the agents (which is true, for example, in the problem of utility maximization in data networks). Here we consider the simple case of a quadratic convex function, but we do not assume separability of the cost function in individual terms; moreover, a linear equality constraint couples the decision variables of all nodes.

To tackle this problem, we specialize some quite classical tools in convex optimization, namely quasi-Newton methods (Dennis and Schnabel (1983), Nocedal and Wright (2006)), to the case of a linearly constrained problem. Then we exploit the tool of average consensus (Olfati-Saber et al. (2007), Fagnani and Zampieri (2008)) to apply these methods to a large scale complex system. As a result, we prove global finite time convergence of a quasi-Newton method for constrained minimization, and we show, via some numerical simulations, how well the algorithm behaves when it is distributed among nodes that can only communicate with a subset of neighbors.

As a motivation for this work, we present the problem of optimal reactive power compensation in power distribution networks. This application is part of the extremely important framework of ancillary services in the so called smart-grids (Santacana et al. (2010), Ipakchi and Albuyeh (2009)), which can be considered among the most interesting and intriguing testbeds at the moment.

2. PROBLEM FORMULATION

Consider a distributed system described by a state q ∈ R^N. Each component q_i of q is the (scalar) state of a single agent i ∈ V = {1, ..., N}. Every agent updates its state synchronously at times t_k = kT, k ∈ Z, where T is a constant positive integer and t_0 = 0.

Suppose that an underlying physical system exists, described by the N-dimensional nonlinear ODE

  v̇(t) = h(v(t), q(t)).   (1)

Assume that its steady state response to the constant input q(t) = q is v(t) = v = g(q) ∈ R^N. Following Chapter 8.1 in Isidori (1995), this is guaranteed if:

• h(g(q), q) = 0;
• ∃ q̄ such that v = 0 is exponentially stable for (1) with constant input q(t) = q̄ (and g(q̄) = 0).

Assume that each node is capable of measuring the i-th element v_i of v.

Let us consider a quadratic cost function of q

  F(q) = (1/2) q^T M q + m^T q,   M > 0   (2)

whose gradient ∇F(q) = Mq + m coincides with the function g(q) given above. In other words, by measuring the steady-state response of the system (1) to the constant input q, the nodes can estimate (element-wise) the gradient of the cost function (2) in q. Notice that, as g(q̄) = 0, then q̄ = arg min F(q).

In this work we focus on the problem of designing a distributed algorithm for the minimization of (2) subject to a linear equality constraint, that is solving

  min_q F(q)   subject to   a^T q = b.   (3)
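As a reference for the iterative methods developed in the following sections, the minimal sketch below (written for this note, with purely synthetic data M, m, a, b) solves problem (3) in closed form through its KKT system; any candidate distributed algorithm can be checked against this centralized solution.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5

# Synthetic problem data (illustrative only): M > 0, arbitrary m, a, b.
B = rng.standard_normal((N, N))
M = B @ B.T + N * np.eye(N)            # positive definite Hessian
m = rng.standard_normal(N)
a = rng.standard_normal(N)
b = 1.0

# KKT system for  min 1/2 q'Mq + m'q  s.t.  a'q = b:
#   [M  a ] [q     ]   [-m]
#   [a' 0 ] [lambda] = [ b]
KKT = np.block([[M, a[:, None]], [a[None, :], np.zeros((1, 1))]])
sol = np.linalg.solve(KKT, np.concatenate([-m, [b]]))
q_opt = sol[:N]

print("constraint residual:", a @ q_opt - b)   # ~ 0
Omega = np.eye(N) - np.outer(a, a) / (a @ a)
print("projected gradient:", Omega @ (M @ q_opt + m))  # ~ 0
```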
2.1 Communication constraints

Communication between agents is constrained to be consistent with a given communication graph G = (V, E), where E ⊆ V × V is a set of edges (i, j), with i, j ∈ V. Therefore, for every node i, there exists a set of neighbors N_i = {j ∈ V | (j, i) ∈ E} ⊆ V, which is the set of agents from which node i can gather data. We assume that (i, i) ∈ E for all i, and that G is strongly connected.

3. OPTIMAL REACTIVE POWER COMPENSATION IN MICROGRIDS

As a motivating example, we introduce in this section the problem of optimal reactive power compensation in power distribution networks.

Let us define a smart microgrid as a portion of the electrical power distribution network that connects to the larger distribution grid (or to the transmission grid) in one point and that is managed autonomously from the rest of the network. In particular, ancillary services like reactive power compensation, voltage support, and voltage profile quality enhancement are taken care of by some microgrid controllers, whose objective is to provide a high quality of service to the microgrid users while satisfying some constraint on how the microgrid interfaces with the larger grid. We focus here on reactive power compensation.

Consider the steady state of the network, when voltages and currents are sinusoidal signals at frequency ω0/2π at any point (at any port) of the network:

  u_i(t) = U_i sin(ω0 t + θ_i^u),   i_i(t) = I_i sin(ω0 t + θ_i^i).

Fig. 1. Schematic representation of the controller structure proposed in Tedeschi et al. (2008).

Both residential and industrial users may require a sinusoidal current which is not in phase with the voltage. A convenient description for this consists in stating that they demand reactive power together with active power, associated with the out-of-phase and in-phase components of the current, respectively. Precisely, the active power p_i and reactive power q_i delivered at node i are

  p_i(t) = U_i I_i cos φ,   q_i(t) = U_i I_i sin φ,

where φ is the phase difference θ_i^u − θ_i^i.

These power terms can be defined also in the case in which the signals are not sinusoidal and can be considered perturbed versions of periodic signals with period T0 = 2π/ω0. Following Tenti and Mattavelli (2003), we define the homo-integral of a generic function x(t) as

  x̂(t) = ω0 (X(t) − X̄(t)),

where X(t) = ∫_0^t x(τ) dτ and X̄(t) = (1/T0) ∫_t^{t+T0} X(τ) dτ.

By introducing the scalar product (function of time)

  ⟨x, y⟩_t = (1/T0) ∫_t^{t+T0} x(τ) y(τ) dτ   (4)

we can then define active and reactive powers in this more general framework as

  p_i(t) = ⟨u, i⟩_t,   q_i(t) = ⟨û, i⟩_t.
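As an illustration of these definitions, the sketch below (synthetic sampled waveforms; the 50 Hz frequency and sampling rate are assumptions of this note) computes p and q at t = 0 from one period of data via the homo-integral and the scalar product (4); for pure sinusoids with RMS values U and I this returns U I cos φ and U I sin φ up to discretization error.

```python
import numpy as np

omega0 = 2 * np.pi * 50.0            # 50 Hz grid frequency (assumption of this sketch)
T0 = 2 * np.pi / omega0
fs = 100_000.0                       # sampling rate (assumption)
n = int(round(T0 * fs))              # samples per period
tau = np.arange(2 * n) / fs          # two periods of synthetic data

U, I, phi = 230.0, 10.0, 0.5         # RMS voltage, RMS current, phase lag (synthetic)
u = np.sqrt(2) * U * np.sin(omega0 * tau)
i = np.sqrt(2) * I * np.sin(omega0 * tau - phi)

# Homo-integral of u: u_hat = omega0 * (X - X_bar), with X the running integral of u
# and X_bar its average over the window [t, t + T0].
X = np.cumsum(u) / fs
X_bar = np.array([X[k:k + n].mean() for k in range(n)])
u_hat = omega0 * (X[:n] - X_bar)

# Active and reactive power at t = 0, using the scalar product (4) as a window average.
p = np.mean(u[:n] * i[:n])
q = np.mean(u_hat * i[:n])

print(p, U * I * np.cos(phi))        # the two values agree up to discretization error
print(q, U * I * np.sin(phi))
```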
Like active power flows, reactive power flows contribute to power losses on the transmission and distribution lines, cause voltage drops in the network, and may lead to grid instability (see Kundur (1994) and references therein).

Reactive power is not a "real" physical power, meaning that no energy conversion nor fuel cost is involved in producing it. It is therefore preferable to minimize reactive power flows by producing reactive power as close as possible to the users that need it.

One possible approach has been proposed in Tedeschi et al. (2008), and is sketched in Figure 1. It consists in a centralized controller that measures the reactive power flow at the input port of the microgrid, i.e. where the microgrid connects with the main grid.
Fig. 2. Tree of users and compensators. The direction of the flow and the naming convention is indicated for node i ∈ C and j ∈ U.

Fig. 3. The subset F_i, defined as the set of nodes containing the root for which the edge i is the only bridge.
According to this measurement, the controller produces a reference for the amount of reactive power that has to be produced inside the microgrid. This reference has to be split by a power sharing unit (PSU) among those agents in the network that can produce a commanded amount of reactive power (compensators), in a way that minimizes reactive power flows in the microgrid. The number of compensators in a microgrid can be very large, as the electronic interface of any distributed generator (wind turbines, combined heat and power generators, micro hydroelectric, solar panels) can also produce reactive power at no additional cost.

Let the electrical connections in the microgrid be described by a tree of N̄ agents (see Figure 2). Each agent injects a quantity q_i(t) of reactive power into the network. N of them (the compensators, whose indices belong to C) can be commanded to inject a given amount of reactive power, while the other nodes (users, whose indices belong to U) inject (or are supplied with, if negative) a fixed and unknown amount. Flows on the tree edges are oriented outbound with respect to the tree root (node 1), and indexed as in Figure 2. Reactive power obeys regular flow conservation equations.

As power losses are a quadratic function of the reactive power flowing on a line, the optimization problem of having minimal power losses in the microgrid corresponds to minimizing

  F(f_2, ..., f_N) = Σ_{i=2}^{N} k_i f_i^2,

where k_i is the resistance of the edge i (which grows linearly with the length of the line). The constraints are

• Σ_{i ∈ C∪U} q_i = P_0
• f_i = f_i(q) = Σ_{j ∈ F_i} q_j, for i = 2, ..., N,

where the first constraint enforces reactive power conservation in the network (assuming that the amount of reactive power lost in the distribution lines is negligible compared to the amount supplied to the users), and the second set of constraints allows the flow on each edge to be expressed as the sum of the reactive power injected by a subset F_i of the nodes, as illustrated in Figure 3.

By stacking all the flows in a vector f we can then obtain the linear form f = Aq + Bq_0, where q and q_0 contain all the q_i's for i ∈ C and i ∈ U, respectively. By defining K = 2 diag(k_2, ..., k_N), we can rewrite the problem as

  min F(q) = (1/2) q^T A^T K A q + q_0^T B^T K A q = (1/2) q^T M q + m^T q
  subject to 1^T q = −1^T q_0 = c,

and therefore we have cast the optimal reactive power flow problem into the framework described in Section 2.

The gradient of F(q) can be expressed as g(q) = A^T K A q + A^T K B q_0 = A^T K f, where

  [A^T K f]_i = Σ_{ℓ ∈ P_i} k_ℓ f_ℓ   (5)

and P_i ⊆ E is the path from the root to node i. Let us consider the difference g_i(q) − g_j(q), where g_i(q) and g_j(q) are the elements of g(q) corresponding to compensators i and j. As Figure 4 shows, this difference corresponds to

  g_i(q) − g_j(q) = Σ_{ℓ ∈ P_ij} δ_ℓ(i, j) k_ℓ f_ℓ(q)   (6)

where P_ij ⊆ E is the path from node i to node j, and δ_ℓ(i, j) ∈ {+1, −1} depends on whether the edge ℓ appears in forward or backward direction in the path from i to j.

Fig. 4. Dependence of the gradient g on the flows in the tree; e.g. g_i − g_j equals the flow on the path from i to j. According to the given convention, flows have to be counted with negative sign when going upwards in the tree.
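The following toy example (a 6-node tree invented for this note, with arbitrary line resistances and injections) builds matrices A, B and K for such a network, with each compensator contributing to the flow on every edge along its path to the root (one possible orientation convention), and checks numerically that the gradient Mq + m of the recast problem equals A^T K f, consistent with (5).

```python
import numpy as np

# Toy radial network (assumption: 6 nodes, node 0 is the root / PCC).
# parent[i] is the parent of node i; edge i is the line from parent[i] to i.
parent = {1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
k = {1: 0.3, 2: 0.5, 3: 0.2, 4: 0.4, 5: 0.1}      # line resistances (synthetic)
C = [3, 4, 5]                                      # compensator nodes
U = [1, 2]                                         # user nodes

def path_to_root(i):
    """Edges crossed when walking from node i up to the root."""
    edges = []
    while i != 0:
        edges.append(i)
        i = parent[i]
    return edges

edges = sorted(k)
# A[e, j] = 1 if compensator C[j] lies below edge e (i.e. e is on its path to the root).
A = np.array([[1.0 if e in path_to_root(j) else 0.0 for j in C] for e in edges])
B = np.array([[1.0 if e in path_to_root(j) else 0.0 for j in U] for e in edges])
K = 2.0 * np.diag([k[e] for e in edges])

q0 = np.array([-1.2, -0.8])            # fixed (unknown to the algorithm) user demands
M = A.T @ K @ A                        # Hessian of the recast problem
m = A.T @ K @ B @ q0                   # linear term

q = np.array([0.5, -0.3, 0.2])         # any compensator injection
f = A @ q + B @ q0                     # flows on the edges
print(np.allclose(M @ q + m, A.T @ K @ f))   # True: the gradient equals A'Kf
```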
Let us suppose that each node is capable of measuring the root-mean-square value v_i(t) = ‖u_i‖_t of the voltage at its location, and consider the steady state response of the network to a constant q (corresponding to sinusoidal voltages and currents across the network).
The voltage drop across edge (i, j) is described by v_j − v_i = f_j − f_j^0, where i is the parent of node j, units of measurement have been properly normalized, and f_j^0 is the voltage drop due to the active power flow on edge (i, j). If the lines' reactance is larger than the lines' resistance, then f_j^0 ≪ f_j and therefore f_j ≈ v_j − v_i. By this assumption, we have g_i − g_j ≈ v_i − v_j and therefore

  g(q) = v(q) − ξ1,

where v is the vector of all voltage measurements and ξ is an unknown scalar. While we did not consider this uncertainty of the gradient estimate in Section 2, this uncertainty is not harmful, as the uncertain term ξ1 is orthogonal to the constraint 1^T q = c.

Notice that by solving the constrained optimization problem, we obtain orthogonality of the gradient g with respect to the constraint, i.e. g = µ1 for some µ. This corresponds to having constant voltage across the whole network. Therefore, by neglecting the contribution of active power flows to the voltage drop, we are treating the two problems of minimizing power losses and of achieving the optimal (flattest) voltage profile as equivalent. These two objectives are not equivalent if the resistance of the lines is not negligible compared with their reactance.
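The claim that the unknown offset is harmless can also be verified numerically. In the sketch below (synthetic data; the descent-plus-projection step anticipates the update law discussed in Section 4), adding any multiple of 1 to the gradient estimate leaves the computed step unchanged when a = 1.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 5
one = np.ones(N)

v = rng.standard_normal(N)            # voltage measurements (synthetic)
B = rng.standard_normal((N, N))
H = B @ B.T + np.eye(N)               # any (here positive definite) inverse-Hessian estimate
xi = 3.7                              # unknown offset

def step(g, a, H):
    """Descent-plus-projection step -H g + (a'Hg / a'Ha) H a, cf. Section 4."""
    return -H @ g + (a @ H @ g) / (a @ H @ a) * (H @ a)

g_true = v - xi * one                 # g(q) = v(q) - xi*1
g_meas = v                            # what the nodes can actually build from v alone

# With a = 1 the unknown term xi*1 is aligned with the constraint and cancels out.
print(np.allclose(step(g_true, one, H), step(g_meas, one, H)))   # True
```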
4. ITERATIVE OPTIMIZATION ALGORITHMS

Two main issues arise when trying to design a distributed optimization algorithm for large-scale complex systems like the one described in Section 3.

The first issue is the fact that the agents do not know the structure of the whole system (the number of agents, their connections in the communication graph, the underlying physical system, etc.). Every agent has only a local knowledge of this information, and in many cases there is no central unit that has a broader view of the system. Moreover, it is often the case that the structure of the system changes in time, due to some external events, reconfiguration, node appearance and disappearance.

The second issue is given by the communication capabilities among nodes, resulting in constraints on information and decision sharing among the agents. Agents willing to coordinate their behavior and exchange data may be forced to interact with a smaller subset of neighbors, and in some cases to deal with quantization, data rate constraints, and unreliable communication.

We will deal with the issue of algorithm distribution among agents in the next section. In this section we focus on the first issue, reviewing some methods for convex optimization, specializing them to the linearly constrained case, and discussing their effectiveness for large-scale complex systems.

The class of optimization problems introduced in Section 2 is generally tackled via gradient-driven iterative methods (see for example Boyd and Vandenberghe (2008)). A general formulation of the iterative update law of these methods is the following:

  q^+ = q − Γ g(q)
  Γ^+ = Φ(q, Γ, g(q))   (7)

where Γ is an N × N gain matrix. ¹

¹ In this and in the following sections, we introduce the shorter notation q^+ = q(t_{k+1}) and q = q(t_k) (and similarly for other quantities that appear in the algorithms), when this does not lead to confusion.

In the special, although interesting, case in which feasibility of the decision variable is required at any iteration, the gain matrix Γ must satisfy

  a^T Γ u = 0   for all u ∈ R^N.

The next lemma shows what is the best choice for Γ, if the Hessian M is fully known.

Lemma 1. (Constrained Newton algorithm). Let q be a feasible point for the optimization problem (3). Assume

  Γ = M^{-1} − (M^{-1} a a^T M^{-1}) / (a^T M^{-1} a).   (8)

Then q^+ defined by (7) is the solution of the constrained optimization problem (i.e. the algorithm converges in one step).

Proof. To show that q^+ is the solution of the constrained optimization problem, we have to show that
• q^+ is feasible;
• the gradient g(q^+) is orthogonal to the constraint.
The first claim is true for any Γ of the form H − (H a a^T H) / (a^T H a), for any H, as

  a^T q^+ = a^T q − a^T H g(q) + (a^T H a / a^T H a) a^T H g(q) = b.

For the second claim we have to prove that g(q^+) ∈ ker(a^T)^⊥ = Im(a). Indeed, we have

  g(q^+) = m + M q^+
         = m + M q − M M^{-1} g(q) + (M M^{-1} a / a^T M^{-1} a) a^T M^{-1} g(q)
         = a (a^T M^{-1} g(q)) / (a^T M^{-1} a) ∈ Im(a).   □

If we know an approximation H of the inverse of the Hessian M, we can plug H in (8), and we obtain the approximate Newton update step

  q^+ = q − (H − H a a^T H / (a^T H a)) g(q)
      = q − H g(q) + (a^T H g(q) / a^T H a) H a.   (9)

In the update step (9) two parts can be recognized:

  ∆q = q^+ − q = ∆q_desc + ∆q_proj   (10)

where
• ∆q_desc = −H g(q) is a descent step towards the optimum of the unconstrained quadratic problem;
• ∆q_proj = (a^T H g(q) / a^T H a) H a projects q + ∆q_desc onto the constraint (as the proof of Lemma 1 shows, feasibility of q^+ is guaranteed regardless of the choice of H).

According to the available level of knowledge of the system, different degrees of approximation of the inverse of M can be achieved, resulting in different algorithms.
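A minimal numerical illustration of the update (9), on synthetic data invented for this note: feasibility is preserved for any choice of H, and with H = M^{-1} the step recovers the one-step convergence of Lemma 1.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6
B = rng.standard_normal((N, N))
M = B @ B.T + N * np.eye(N)           # synthetic positive definite Hessian
m = rng.standard_normal(N)
a = rng.standard_normal(N)
b = 2.0

grad = lambda q: M @ q + m

def newton_step(q, H):
    """Approximate constrained Newton update (9) with inverse-Hessian estimate H."""
    g = grad(q)
    return q - H @ g + (a @ H @ g) / (a @ H @ a) * (H @ a)

q0 = a * b / (a @ a)                  # a feasible starting point: a'q0 = b
Omega = np.eye(N) - np.outer(a, a) / (a @ a)

# Exact inverse Hessian: one step reaches the constrained optimum (Lemma 1).
q1 = newton_step(q0, np.linalg.inv(M))
print(np.isclose(a @ q1, b))                    # feasibility preserved
print(np.allclose(Omega @ grad(q1), 0))         # projected gradient is zero

# A rough approximation (diagonal of M^-1): still feasible, but not optimal in one step.
q1_rough = newton_step(q0, np.diag(1 / np.diag(M)))
print(np.isclose(a @ q1_rough, b))
```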
On one hand, we saw in Lemma 1 that if M^{-1} is completely known, it can be exploited to obtain the fastest (one-step) convergence. However, the Newton method requires that node i knows the whole i-th row of M^{-1}. This may not be possible in large-scale systems, and it jeopardizes the possibility of node insertion and removal.

On the other hand, when minimal knowledge is available, a diagonal H = αI can be used, obtaining the specialization of the steepest descent method to the linearly constrained case. Unfortunately, the steepest descent method may require a large number of iterations to converge, depending on the condition number of the Hessian M (Boyd and Vandenberghe (2008)). This turns out to be a major problem in our framework, where estimating the gradient g(q) for a given q requires that the system is driven into the state q, and consists in measuring the steady state response of a dynamical system (therefore introducing an implicit tradeoff between accuracy and time delay in the measurement).

Quasi-Newton methods (see for example Dennis and Schnabel (1983) and Nocedal and Wright (2006)) have instead the useful feature of building an estimate of the inverse of the Hessian from the previous steps of the algorithm. In the framework of complex systems, these methods deserve special attention, as they require minimal knowledge of the problem, and they can deal with time-varying structures via their adaptive behavior.

Consider the following specialization of Broyden's algorithm (belonging to the class of quasi-Newton methods) for the constrained optimization problem (3):

  q^+ = q − G d
  G^+ = G + [∆q − G ∆d] ∆d^T / (∆d^T ∆d)   (11)

where d = Ω_a g, with Ω_a = I − a a^T / a^T a, is the projection of the gradient on the constraint, and

  ∆d = d^+ − d,   ∆q = q^+ − q.

Suppose that G is initialized as αI, for some α > 0, and that it is not updated if ‖∆d‖ = 0. ²

² It will be clear in the proof of Theorem 5 that if ‖∆d‖ = 0, then the algorithm has converged.

It is easy to see that (11) is a special case of (7), in which Γ = G Ω_a and in which the rank-1 update for G satisfies the secant condition G^+ ∆d = ∆q.
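The sketch below implements the constrained Broyden iteration (11) on a synthetic problem (data invented for this note); as stated by Theorem 5 below, the iteration terminates at the constrained optimum within 2N steps, up to numerical round-off.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 8
B = rng.standard_normal((N, N))
M = B @ B.T + N * np.eye(N)              # synthetic positive definite Hessian
m = rng.standard_normal(N)
a = rng.standard_normal(N)
b = 1.0

grad = lambda q: M @ q + m
Omega = np.eye(N) - np.outer(a, a) / (a @ a)   # Omega_a = I - aa'/a'a

q = a * b / (a @ a)                      # feasible initial state, a'q = b
G = 0.1 * np.eye(N)                      # G(0) = alpha * I
d = Omega @ grad(q)

steps = 0
while np.linalg.norm(d) > 1e-10 and steps < 2 * N:
    q_next = q - G @ d                   # state update of (11)
    d_next = Omega @ grad(q_next)
    dq, dd = q_next - q, d_next - d
    if np.linalg.norm(dd) > 1e-12:       # rank-1 secant update of (11)
        G += np.outer(dq - G @ dd, dd) / (dd @ dd)
    q, d = q_next, d_next
    steps += 1

print("steps:", steps)                                          # at most 2N
print("constraint residual:", a @ q - b)                        # ~ 0
print("projected gradient norm:", np.linalg.norm(Omega @ grad(q)))  # ~ 0
```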
The following lemmas will be helpful in proving the global convergence of this algorithm. ³

³ For these lemmas and for Theorem 5 we need to express the time dependence explicitly.

Lemma 2. For any G(t_k) returned by the algorithm (11), and for all u ∈ R^N,

  a^T u = 0  ⇒  a^T G(t_k) u = 0.

Proof. Consider the base case G(0) = αI. We have a^T G(0) u = α a^T u = 0. Let us now suppose that the condition is verified for G(t_k), and consider G(t_{k+1}). We have

  a^T G(t_{k+1}) u = a^T G(t_k) u + [a^T ∆q(t_k) ∆d^T(t_k) u − a^T G(t_k) ∆d(t_k) ∆d^T(t_k) u] / (∆d^T(t_k) ∆d(t_k)) = 0,

where we used ∆q(t_k) = −G(t_k) d(t_k), and the fact that a^T ∆d(t_k) = 0 and a^T d(t_k) = 0. Therefore by induction the thesis is verified. □

Lemma 2 guarantees that a^T ∆q = 0, or in other words it guarantees that the update step for q always returns a feasible point for the constrained optimization problem, if q(0) is feasible.

Lemma 3. The estimate G(t_{k+1}) has full rank as long as G(0) is full rank and d(t_j) ≠ 0 for 0 ≤ t_j ≤ t_k.

Proof. By the Sherman-Morrison formula, G(t_{k+1}) has full rank whenever G(t_k) is full rank and

  ∆d^T(t_k) G(t_k)^{-1} ∆q(t_k) ≠ 0.

By substituting we have

  ∆d^T(t_k) G(t_k)^{-1} ∆q(t_k) = −∆d^T(t_k) d(t_k) = d^T(t_k) G(t_k) M Ω_a d(t_k) = d^T(t_k) G(t_k) M d(t_k),

which is zero if and only if d(t_k) = 0. Therefore by induction G(t_{k+1}) has full rank. □

Lemma 4. (Lemma 2.1 in Gay (1979)). Consider the k-th iteration of algorithm (11). If d(t_k) and ∆d(t_{k−1}) are linearly independent, then for 1 ≤ j ≤ ⌊(k + 1)/2⌋, the j + 1 vectors

  [Ω_a M G(t_{k−2j+1})]^i d(t_{k−2j+1}),   0 ≤ i ≤ j,

are linearly independent.

Proof. The proof is given in Gay (1979). □

We can now state the following result on the global, finite-time convergence of (11).

Theorem 5. Consider the algorithm (11) initialized with G(0) = αI, with α > 0, and q(0) any feasible state. Then the algorithm converges in at most 2N steps to the solution of the constrained quadratic problem (3).

Proof. By Lemma 4, there exists k with 1 ≤ k ≤ 2N such that d(t_k) and ∆d(t_{k−1}) are linearly dependent. Trivially, if d(t_k) = 0, this solves the optimization problem (3), as the gradient is orthogonal to the constraint and Lemma 2 ensures that q(t_k) is a feasible point. If instead ∆d(t_{k−1}) = 0, then from the definition of d we have Ω_a M ∆q(t_{k−1}) = 0, and therefore

  M q(t_k) = M q(t_{k−1}) + βa   for some β ∈ R.

M being invertible, this means that ∆q(t_{k−1}) = βM^{-1}a. By left multiplying both terms by a^T and using Lemma 2, we get

  β a^T M^{-1} a = a^T ∆q(t_{k−1}) = 0.

Therefore β = 0 and ∆q(t_{k−1}) = 0. By Lemma 3 and (11), this implies that d(t_j) = 0 for some j ≤ k − 1, and therefore the solution has been reached in at most k ≤ 2N steps. As a last case, suppose that d(t_k) and ∆d(t_{k−1}) are both nonzero, but linearly dependent. Then there exists λ ≠ 0 such that

  d(t_k) = λ ∆d(t_{k−1}).
From the algorithm equations and from the secant condition we have that ∆d(t_{k−1}) = Ω_a M ∆q(t_{k−1}) = Ω_a M G(t_k) ∆d(t_{k−1}). The same is then true for d(t_k), yielding d(t_k) = Ω_a M G(t_k) d(t_k). By rearranging the algorithm equations it is easy to see that d(t_{k+1}) = d(t_k) + Ω_a M ∆q(t_k) and therefore

  d(t_{k+1}) = d(t_k) − Ω_a M G(t_k) d(t_k) = 0.

Even in this case, d(t_{k+1}) = 0 together with Lemma 2 guarantees that q(t_{k+1}) is the solution of the constrained optimization problem. □

5. ALGORITHM DISTRIBUTION

In this section we deal with the problem of distributing the algorithm among the nodes, in a way that is consistent with the capabilities of the nodes to gather information from neighbors according to the communication constraints given by the problem.

Consider the decomposition of the generic algorithm (7) into update laws for the single agents. The generic agent i has to perform the update

  q_i^+ = q_i − Γ_i^T g(q)
  Γ_i^+ = Φ_i(q, Γ, g(q))   (12)

where Γ_i^T is the i-th row of Γ. It is not possible, in general, to implement (12) in a distributed manner, as both the update for q_i and the update for the vector Γ_i require quantities that are not available at node i: q, g(q), and Γ.

In the special case in which, at every iteration of the algorithm,

  Γ_ij ≠ 0  ⇒  (i, j) ∈ E,   (13)

the update law for q_i only requires information that can be gathered from the set of neighbors N_i. The issue of distributing the algorithm among the agents then reduces to a sparsity condition on Γ (and a similar condition can be stated for the update law for Γ_i).

However, note that the decision variables in the optimization problem (3) are coupled in two ways: by a possibly non diagonal Hessian M, and by the complicating constraint in which all the variables appear. For this reason, it is inherently hard to solve the optimization problem via some purely local laws.

To tackle this issue, we introduce some shared piece of information among all agents: let x ∈ R^p be an auxiliary fusion vector, function of the whole system state, and suppose that all the nodes agree on the value of x.

The local update laws (12) can then be replaced by

  q_i^+ = q_i − γ_i(q_j, g_j(q), j ∈ N_i; η_i, x)
  η_i^+ = φ_i(q_j, g_j(q), j ∈ N_i; η_i, x)   (14)

where η_i is a local parameter vector. The fusion vector x is a function of all the data available in the system:

  x = f(q, g(q), η_i, i ∈ V).   (15)

Algorithm (14) is now consistent with the communication graph, because the update laws for both q_i and η_i are functions of local data (q_i, g_i(q), η_i), of data that can be gathered from the neighbors j ∈ N_i, and of the fusion vector x.

Of course, the way in which x is computed and the way in which all agents agree on its value is a key point in the design of the algorithm. For example a finer time scale might exist: the fusion vector x can be obtained as the result of another distributed algorithm, which exploits the same communication graph. This faster algorithm is initialized locally on the basis of the data stored in the nodes and on the basis of the measured steady state of the underlying system. As it runs at a much faster pace, by the end of the period of time T it is able to implement the function f of the data and to guarantee that all nodes agree on a common x.

Among the main algorithms that can be exploited to obtain the fusion vector x, the tool of average consensus (as described for example in Olfati-Saber et al. (2007) and in Fagnani and Zampieri (2008)) is of particular interest.

Average consensus can be quite useful when dealing with optimization problems subject to linear equality constraints. Consider indeed the decomposition (10) of the update step in an approximate Newton method, consisting in an unconstrained descent step followed by its projection over the constraint:

  q^+ = q − H g(q) + (a^T H g(q) / a^T H a) H a = q − H g(q) + x H a.

The proposed decomposition has the major advantage that the scalar x, on which all nodes have to agree, can be obtained via average consensus, once the communication graph is strongly connected. Indeed, if every node is capable of initializing a vector

  z^(i)(0) = [ a_i H_i g(q) ; a_i H_i a ],   (16)

where a_i and H_i are the i-th component of a and the i-th row of H, respectively, then by running an average consensus algorithm the nodes eventually agree on the quantity

  z̄ = [ (1/N) a^T H g(q) ; (1/N) a^T H a ],

from which every node can obtain the desired quantity x = a^T H g(q) / a^T H a.

Therefore if we choose an estimate H of M^{-1} that satisfies the sparsity constraint given by the communication graph, i.e.

  H_ij ≠ 0  ⇒  (i, j) ∈ E,

then both the computation of ∆q_desc and the initialization of z for the computation of ∆q_proj require only communication between neighbors in G.
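As an illustration (not part of the original algorithm description), the sketch below runs a simple average consensus with Metropolis weights — one of several possible weight choices — on an assumed undirected graph, and shows that every node recovers the common scalar x = a^T H g(q) / a^T H a from the locally initialized vectors (16).

```python
import numpy as np

rng = np.random.default_rng(3)
N = 6
# Assumed undirected communication graph (edge list); every node also knows itself.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 4)]
neighbors = {i: {i} for i in range(N)}
for i, j in edges:
    neighbors[i].add(j)
    neighbors[j].add(i)

# Synthetic local quantities: a_i and the scalars H_i g(q), H_i a computed by each node.
a = rng.standard_normal(N)
Hg = rng.standard_normal(N)           # i-th entry plays the role of H_i g(q)
Ha = rng.standard_normal(N)           # i-th entry plays the role of H_i a

z = np.stack([a * Hg, a * Ha], axis=1)        # z^(i)(0), one row per node, cf. (16)

# Metropolis weights give a symmetric, doubly stochastic matrix -> average consensus.
deg = {i: len(neighbors[i]) - 1 for i in range(N)}
W = np.zeros((N, N))
for i in range(N):
    for j in neighbors[i] - {i}:
        W[i, j] = 1.0 / (1 + max(deg[i], deg[j]))
    W[i, i] = 1.0 - W[i].sum()

for _ in range(200):                  # consensus iterations (the "faster time scale")
    z = W @ z

x_consensus = z[:, 0] / z[:, 1]               # every node computes the same ratio
x_exact = (a @ Hg) / (a @ Ha)                 # a^T H g(q) / a^T H a
print(np.max(np.abs(x_consensus - x_exact)))  # ~ 0: all nodes recover x
```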
6. DISTRIBUTED QUASI-NEWTON METHODS

The approach presented in the last section for algorithm distribution will now be applied to the Broyden quasi-Newton method proposed in Section 4.

Let us define g^(i) as the part of the gradient corresponding to the set of neighbors N_i = {j_1, ..., j_{n_i}} of node i:

  g^(i) ∈ R^{n_i},   g^(i)_ℓ = g_{j_ℓ}   for ℓ = 1, ..., n_i.   (17)
Following the sparse secant method proposed in Dennis and Schnabel (1983) (Theorem 11.2.1) and including a projection step as proposed in Section 5, let us consider the following algorithm:

  q^+ = q − H g(q) + (a^T H g(q) / a^T H a) H a
  H^+ = H + P_E ( D^+ [∆q − H ∆g(q)] ∆g^T(q) )   (18)

where P_E is the projection operator onto the sparsity constraint induced by E:

  (P_E(A))_ij = A_ij if (i, j) ∈ E,   0 if (i, j) ∉ E,

and where D^+ is the diagonal matrix defined as

  D^+_ii = 1 / (g^(i)(q(t_k))^T g^(i)(q(t_k)))  if g^(i)(q(t_k)) ≠ 0,   0 if g^(i)(q(t_k)) = 0.

It is easy to see that (18) corresponds to (11) when the graph is complete, with the only difference that in the complete-graph case, thanks to Lemma 2, the projection on the constraint is performed on the measured gradient and not on the descent step.

Algorithm (18) can be distributed among the agents, obtaining the following update equations for q_i and H_i ∈ R^{n_i}:

  q_i^+ = q_i − H_i^T g^(i)(q) + x H_i^T a^(i)
  H_i^+ = H_i + [∆q_i − H_i^T ∆g^(i)(q)] ∆g^(i)(q) / (∆g^(i)(q)^T ∆g^(i)(q))

where a^(i) is defined similarly to the definition (17) of g^(i), and x = z̄_1 / z̄_2, where z̄_1, z̄_2 are the elements of the 2-dimensional vector resulting from the average consensus algorithm initialized as

  z^(i)(0) = [ a_i H_i^T g^(i)(q) ; a_i H_i^T a^(i) ].

Both the measurement of v for the estimation of g(q) and the initialization of z take place as soon as the steady state response of the underlying system is available. Note that the update laws for the q_i's and the H_i's are consistent with the communication graph, and the memory requirements of every node scale with the number of neighbors in the communication graph.
results to be connected, but not complete.
The effectiveness of the quasi-Newton algorithm subject to
sparsity contraints depends of course on the structure of In Figure 5 the behavior of the different proposed al-
the inverse of the Hessian, and therefore on the particular gorithms has been plotted. For the sake of fairness, the
optimization problem that the algorithm has to solve. In step length for the fixed-step-length, steepest descent al-
the next section we will show how the problem of optimal gorithm, has been optimized for fastest convergence, and
reactive power compensation is a notable example in this the same initial condition has been set for quasi-Newton
sense. methods. Note however that this choice for the steplength
requires global knowledge of the problem: if this knowledge
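One way to set up a qualitatively similar test case is sketched below; the exact generation procedure is not specified in the paper, so this construction is only indicative and only loosely matches the stated height and branching statistics.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(6)
N_NODES, MAX_DEPTH, COMM_RANGE = 33, 6, 4

# Random tree: attach each new node to a random existing node of depth < MAX_DEPTH.
parent, depth = {0: None}, {0: 0}
while len(parent) < N_NODES:
    cand = [v for v in parent if depth[v] < MAX_DEPTH]
    p = int(rng.choice(cand))
    v = len(parent)
    parent[v], depth[v] = p, depth[p] + 1

def ancestors(v):
    """Path from v up to the root (inclusive)."""
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path

def tree_distance(u, v):
    """Electrical distance = path length in the tree, via the lowest common ancestor."""
    au, av = ancestors(u), ancestors(v)
    common = next(x for x in au if x in av)
    return au.index(common) + av.index(common)

# Nodes communicate iff their electrical path is short enough (Power Line Communication).
comm_edges = [(u, v) for u, v in combinations(range(N_NODES), 2)
              if tree_distance(u, v) <= COMM_RANGE]
print("communication edges:", len(comm_edges),
      "out of", N_NODES * (N_NODES - 1) // 2)
# The communication graph contains the tree, hence it is connected; typically not complete.
```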
In Figure 5 the behavior of the different proposed algorithms is plotted. For the sake of fairness, the step length of the fixed-step-length steepest descent algorithm has been optimized for fastest convergence, and the same initial condition has been set for the quasi-Newton methods. Note however that this choice of the step length requires global knowledge of the problem: if this knowledge is not available or is approximate, more conservative choices would be preferred (corresponding to a smaller step length), and therefore the advantage of quasi-Newton methods would be even larger.
Fig. 6. Estimates of the inverse of the Hessian returned by the quasi-Newton algorithm, together with the real one and with the sparsity constraint. (Panels: top left M^{-1}, top right H_complete, bottom left H_sparse, bottom right sparsity constraint.)

In the top left quadrant of Figure 6, the inverse of the Hessian is plotted. One can see how the sparsity constraint induced by the communication graph (and plotted in the lower right quadrant) is meaningful in describing the largest elements (in absolute value) of the inverse of the Hessian. The other two quadrants of Figure 6 show the estimates of the Hessian inverse returned by the quasi-Newton methods when the algorithm has converged.

8. CONCLUSION

In this paper we developed a distributed algorithm for the solution of a quadratic, convex optimization problem with a linear equality constraint. Distributing the algorithm among the agents is complicated by the fact of having a non-separable cost function and a global constraint. These two aspects have been tackled, respectively, by the construction of an estimate of the Hessian's inverse which is consistent with the communication graph, and by exploiting consensus to perform the projection step.

As a notable example of application of this algorithm, we described the problem of optimal reactive power compensation in a microgrid. As the gradient can be estimated locally (up to a term orthogonal to the constraint), and because the Hessian's inverse turns out to be well described by an approximation consistent with the sparsity constraint induced by the communication graph, the proposed quasi-Newton method behaves well on this testbed.

While we considered a static optimization problem in this work, it is interesting to include the possibility of a dynamic optimization problem. This would allow us, for example, to include slowly time-varying reactive power demands in our testbed, which is a very reasonable assumption. As the estimate of the Hessian's inverse obtained by the algorithm remains valid even if the demands change (because reactive power demands affect the linear term and not the quadratic term of the cost function), we expect the quasi-Newton method to perform much better than simpler fixed-step-length steepest descent algorithms.

Another direction of investigation is the introduction of approximate averaging in the consensus algorithms: while we assumed that the consensus returns the average of the nodes' initial conditions, this is an approximation because of the finite available time. Studying the effects of this approximation is one of the very next directions of investigation.

ACKNOWLEDGEMENTS

The authors wish to thank Carlo Fischione for his useful collaboration on the review of available distributed optimization techniques and for the kind hospitality at KTH.
REFERENCES

Bertsekas, D.P. and Tsitsiklis, J.N. (1997). Parallel and Distributed Computation: Numerical Methods. Athena Scientific.
Boyd, S. and Vandenberghe, L. (2008). Convex Optimization. Cambridge University Press.
Dennis, Jr., J.E. and Schnabel, R.B. (1983). Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall.
Fagnani, F. and Zampieri, S. (2008). Randomized consensus algorithms over large scale networks. IEEE Journal on Selected Areas in Communications, 26(4), 634–649.
Gay, D.M. (1979). Some convergence properties of Broyden's method. SIAM Journal on Numerical Analysis, 16(4), 623–630.
Ipakchi, A. and Albuyeh, F. (2009). Grid of the future: are we ready to transition to a smart grid? IEEE Power and Energy Magazine, 7(2), 52–62.
Isidori, A. (1995). Nonlinear Control Systems, volume 1. Springer Verlag, London, 3rd edition.
Kelly, F.P., Maulloo, A.K., and Tan, D.K.H. (1998). Rate control for communication networks: Shadow prices, proportional fairness and stability. Journal of the Operational Research Society, 49, 237–252.
Kundur, P. (1994). Power System Stability and Control. McGraw-Hill.
Nedic, A. and Ozdaglar, A. (2009). Distributed subgradient methods for multi-agent optimization. IEEE Transactions on Automatic Control, 54(1), 48–61.
Nocedal, J. and Wright, S.J. (2006). Numerical Optimization. Springer.
Olfati-Saber, R., Fax, J.A., and Murray, R.M. (2007). Consensus and cooperation in networked multi-agent systems. Proceedings of the IEEE, 95(1), 215–233.
Santacana, E., Rackliffe, G., Tang, L., and Feng, X. (2010). Getting smart: with a clearer vision of the intelligent grid, control emerges from chaos. IEEE Power and Energy Magazine, 8(2), 41–48.
Tedeschi, E., Tenti, P., and Mattavelli, P. (2008). Synergistic control and cooperative operation of distributed harmonic and reactive compensators. In Proceedings of the Power Electronics Specialists Conference.
Tenti, P. and Mattavelli, P. (2003). A time-domain approach to power term definitions under non-sinusoidal conditions. In 6th International Workshop on Power Definitions and Measurements under Non-Sinusoidal Conditions, Milano, Italy.
Tsitsiklis, J.N., Bertsekas, D.P., and Athans, M. (1986). Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE Transactions on Automatic Control, 31(9), 803–812.
