
EE364 Review Session 1

Administrative info:

• TAs: Alessandro Magnani, Argyris Zymnis and Joelle Skaf

• office hours: Tue 5:30 - 7:30pm, Wed 4:00 - 8:00pm, Packard

• review session: example problems and hw hints :)

• homeworks due Thursdays by 5pm

• hw file-cabinet, Packard 2nd floor near lunch area

• newsgroup: su.class.ee364
Important sets

• subspace

• affine set

• convex set

• cones

examples: line (say passing through the origin), hyperplanes, halfspaces, polyhedra (bounded and unbounded), norm balls, etc.

EE364 Review Session 1


Combinations and hulls

y = θ1x1 + · · · + θk xk is a

• linear combination of x1, . . . , xk
• affine combination if ∑i θi = 1
• convex combination if ∑i θi = 1, θi ≥ 0
• conic combination if θi ≥ 0

(linear, affine, . . . ) hull of S = {x1, . . . , xk} is the set of all (linear, affine, . . . ) combinations of points from S

linear hull: span(S)


affine hull: Aff (S)
convex hull: conv(S)
conic hull: Cone(S)

EE364 Review Session 1


example: S = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}

what is its linear hull? affine hull? convex hull? conic hull?

EE364 Review Session 1
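As a quick aside (not in the original notes), membership of a point in conv(S) can be checked numerically as a small CVX feasibility problem (CVX is introduced in Review Session 2); the test point z below is made up for illustration:

X = eye(3);              % columns are the points (1,0,0), (0,1,0), (0,0,1)
z = [0.2; 0.3; 0.5];     % made-up test point
cvx_begin
    variable theta(3)
    X*theta == z;
    theta >= 0;
    sum(theta) == 1;
cvx_end
% cvx_status is 'Solved' exactly when z lies in the convex hull of the points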


Important rules

• intersection:

Sα is a (subspace / affine set / convex set / convex cone) for α ∈ A   =⇒   ⋂_{α∈A} Sα is a (subspace / affine set / convex set / convex cone)

example: a polyhedron is the intersection of a finite number of halfspaces and hyperplanes.

• affine functions and convexity

• epigraph: connection between convex sets and convex functions

EE364 Review Session 1


Dual cones

if K is a cone, dual cone is defined as

K∗ = { y | xT y ≥ 0 for all x ∈ K }

(figure: a cone K and its dual cone K∗)

EE364 Review Session 1


Exercise 2.10: Solution set of a quadratic inequality

Let C ⊆ Rn be the solution set of a quadratic inequality,

C = {x ∈ Rn | xT Ax + bT x + c ≤ 0},

with A ∈ Sn, b ∈ Rn, and c ∈ R.

1. Show that C is convex if A ⪰ 0.

2. Show that the intersection of C and the hyperplane defined by gT x + h = 0 (where g ≠ 0) is convex if A + λggT ⪰ 0 for some λ ∈ R.

EE364 Review Session 1


Exercise 2.13: Conic hull of outer products

Consider the set of rank-k outer products, defined as {XX T | X ∈ Rn×k, rank X = k}. Describe its conic hull in simple terms.

EE364 Review Session 1


EE364 Review

EE364 Review Session 2

• Convex sets and convex functions

• Examples from chapter 3

• Installing and using CVX

1
Operations that preserve convexity

Convex sets:
• intersection
• affine transformation
• perspective transformation

Convex functions:
• nonnegative weighted sum
• composition with an affine function
• perspective transformation
• pointwise maximum and supremum
• minimization

• Convex sets and convex functions are related via the epigraph.

• Composition rules are extremely important.

EE364 Review Session 2 2


Simple composition rules

Let h : R → R and g : Rn → R. Let f (x) = h(g(x)). Then:

• f is convex if h is convex and nondecreasing, and g is convex

• f is convex if h is convex and nonincreasing, and g is concave

• f is concave if h is concave and nondecreasing, and g is concave

• f is concave if h is concave and nonincreasing, and g is convex

EE364 Review Session 2 3


Ex. 3.6 Functions and epigraphs. When is the epigraph of a function a
halfspace? When is the epigraph of a function a convex cone? When is the
epigraph of a function a polyhedron?

Solution:

If the function is affine, positively homogeneous (f (αx) = αf (x) for α ≥ 0), and piecewise-affine, respectively.

Ex. 3.19 Nonnegative weighted sums and integrals.

1. Show that f (x) = ∑_{i=1}^r αi x[i] is a convex function of x, where α1 ≥ α2 ≥ · · · ≥ αr ≥ 0, and x[i] denotes the ith largest component of x. (You can use the fact that f (x) = ∑_{i=1}^k x[i] is convex on Rn.)

2. Let T (x, ω) denote the trigonometric polynomial

T (x, ω) = x1 + x2 cos ω + x3 cos 2ω + · · · + xn cos(n − 1)ω.

EE364 Review Session 2 4


Show that the function

f (x) = −∫_0^{2π} log T (x, ω) dω

is convex on {x ∈ Rn | T (x, ω) > 0, 0 ≤ ω ≤ 2π}.

Solution:

1. We can express f as

f (x) = αr (x[1] + x[2] + · · · + x[r]) + (αr−1 − αr )(x[1] + x[2] + · · · + x[r−1])


+(αr−2 − αr−1)(x[1] + x[2] + · · · + x[r−2]) + · · · + (α1 − α2)x[1],

which is a nonnegative sum of the convex functions

x[1], x[1]+x[2], x[1]+x[2]+x[3], ..., x[1]+x[2]+· · ·+x[r].

EE364 Review Session 2 5


2. The function

g(x, ω) = − log(x1 + x2 cos ω + x3 cos 2ω + · · · + xn cos(n − 1)ω)

is convex in x for fixed ω. Therefore

f (x) = ∫_0^{2π} g(x, ω) dω

is convex in x.

EE364 Review Session 2 6


Ex. 3.22 Composition rules. Show that the following functions are convex.

1. f (x) = − log(− log(∑_{i=1}^m e^{aTi x+bi})) on dom f = {x | ∑_{i=1}^m e^{aTi x+bi} < 1}. You can use the fact that log(∑_{i=1}^n e^{yi}) is convex.

2. f (x, u, v) = −√(uv − xT x) on dom f = {(x, u, v) | uv > xT x, u, v > 0}. Use the fact that xT x/u is convex in (x, u) for u > 0, and that −√(x1x2) is convex on R2++.

3. f (x, u, v) = − log(uv − xT x) on dom f = {(x, u, v) | uv > xT x, u, v > 0}.

EE364 Review Session 2 7


Solution:

1. g(x) = log(∑_{i=1}^m e^{aTi x+bi}) is convex (composition of the log-sum-exp function and an affine mapping), so −g is concave. The function h(y) = − log y is convex and decreasing. Therefore f (x) = h(−g(x)) is convex.

2. We can express f as f (x, u, v) = −√(u(v − xT x/u)). The function h(x1, x2) = −√(x1x2) is convex on R2++, and decreasing in each argument. The functions g1(u, v, x) = u and g2(u, v, x) = v − xT x/u are concave. Therefore f (u, v, x) = h(g(u, v, x)) is convex.

3. We can express f as

f (x, u, v) = − log u − log(v − xT x/u).

The first term is convex. The function v − xT x/u is concave because v


is linear and xT x/u is convex on {(x, u) | u > 0}. Therefore the second
term in f is convex: it is the composition of a convex decreasing
function − log t and a concave function.

EE364 Review Session 2 8


Ex. 3.18a Show that f (X) = tr X−1 is convex on dom f = Sn++.

Solution:

Define g(t) = f (Z + tV ), where Z ≻ 0 and V ∈ Sn.

g(t) = tr((Z + tV )−1)
     = tr(Z−1(I + tZ−1/2V Z−1/2)−1)
     = tr(Z−1Q(I + tΛ)−1QT )
     = tr(QT Z−1Q(I + tΛ)−1)
     = ∑_{i=1}^n (QT Z−1Q)ii (1 + tλi)−1,

where we used the eigenvalue decomposition Z−1/2V Z−1/2 = QΛQT . In the last equality we express g as a positive weighted sum of the convex functions 1/(1 + tλi), hence g (and therefore f ) is convex.

EE364 Review Session 2 9


CVX and disciplined convex programming

• CVX is a Matlab-based modeling system for convex optimization

• Can solve any problem that obeys the disciplined convex programming
ruleset

• Converts a problem to standard form and solves it using the solver


SeDuMi

You can download CVX from:

http://www.stanford.edu/~boyd/cvx/

EE364 Review Session 2 10


Using CVX

Typical CVX script:

cvx_begin

variable [name]([size1],[size2]) (optional)[type]

(optional)minimize(convex scalar function of the variables)

constraints

cvx_end

EE364 Review Session 2 11


Example:
minimize ‖Ax − b‖2,
where x ∈ Rn, b ∈ Rm.

CVX source code:

cvx_begin
variable x(n)
minimize(norm(A*x-b))
cvx_end

Note: This is for demonstration purposes only. It is perhaps easier to solve


this least-squares problem using x=A\b.

EE364 Review Session 2 12


Example:
minimize 1T x
subject to Ax = b
x ≥ 0,
where x ∈ Rn, b ∈ Rm.

CVX source code:

cvx_begin
variable x(n)
minimize(ones(1,n)*x)
subject to
A*x == b
x >= 0
cvx_end

EE364 Review Session 2 13


Example:
minimize maxk=1,...,m max(aTk x, 1/aTk x)
subject to 0 ≤ x ≤ 1,
where x ∈ Rn.

CVX source code:

cvx_begin
variable x(n)
minimize(max(max([A*x inv_pos(A*x)]’)))
subject to
x >= 0
x <= 1
cvx_end

EE364 Review Session 2 14


EE364: Convex Optimization
Section 3

April 25, 2005

EE364 §3
Outline

• Generalized eigenvalues

• Hyperbolic constraints

• Homework hints

• Conjugate function example

• Proof of Hölder’s inequality

EE364 §3 1
Generalized eigenvalues

the maximum generalized eigenvalue of a pair of symmetric matrices (X, Y ),


with Y ≻ 0, is

λmax(X, Y ) = sup_{u≠0} (uT Xu)/(uT Y u),   dom f = Sn × Sn++

for each u ≠ 0, the function

(uT Xu)/(uT Y u)

is linear-fractional in (X, Y ), hence a quasiconvex function of (X, Y ).

the function λmax(X, Y ) is quasiconvex, since it is the supremum of a


family of quasiconvex functions

EE364 §3 2
Hyperbolic constraints as SOC constraints.
Problem 4.26

xT x ≤ yz, y ≥ 0, z≥0

if and only if

‖(2x, y − z)‖2 ≤ y + z,   y ≥ 0, z ≥ 0,

with x ∈ Rn, y, z ∈ R

EE364 §3 3
Proof

 
‖(2x, y − z)‖2 ≤ y + z, y ≥ 0, z ≥ 0 ⇐⇒

‖(2x, y − z)‖2² ≤ (y + z)², y ≥ 0, z ≥ 0 ⇐⇒

4xT x + (y − z)² ≤ y² + z² + 2yz, y ≥ 0, z ≥ 0 ⇐⇒

xT x ≤ yz, y ≥ 0, z ≥ 0

EE364 §3 4
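In CVX (see Review Session 2), this reformulation can be entered directly; a minimal sketch, assuming the dimension n and the rest of the problem are given:

% sketch: encode x'*x <= y*z, y >= 0, z >= 0 as a single SOC constraint
cvx_begin
    variables x(n) y z
    % ... objective and any other constraints would go here ...
    norm([2*x; y - z]) <= y + z;
    y >= 0;
    z >= 0;
cvx_end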
Maximizing harmonic mean

maximize (∑_{i=1}^m 1/(aTi x + bi))−1,

with domain {x | Ax + b ≻ 0}, where aTi is the ith row of A

• the function is log-concave

• can be cast as an SOCP

EE364 §3 5
the problem is equivalent to

minimize 1T t
subject to ti(aTi x + bi) ≥ 1, i = 1, . . . , m
t ⪰ 0

writing the hyperbolic constraints as SOC constraints yields an SOCP

minimize 1T t
subject to ‖(2, aTi x + bi − ti)‖2 ≤ aTi x + bi + ti,   i = 1, . . . , m
           ti ≥ 0, aTi x + bi ≥ 0,   i = 1, . . . , m

EE364 §3 6
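A hedged CVX sketch for the harmonic-mean problem (A, b and n assumed given); it uses the built-in inv_pos atom, which handles the hyperbolic/SOC reformulation internally:

% sketch: maximize the harmonic mean of a_i'*x + b_i
cvx_begin
    variable x(n)
    minimize( sum( inv_pos(A*x + b) ) )   % equivalent to maximizing (sum_i 1/(a_i'x+b_i))^{-1}
cvx_end
% the objective value of the original problem at the solution is 1/cvx_optval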
Maximizing geometric mean

maximize (∏_{i=1}^m (aTi x − bi))^{1/m},

with domain {x | Ax ≻ b}, where aTi is the ith row of A

• the function is concave

• can be cast as an SOCP

EE364 §3 7
consider m = 4 as an example
the problem is equivalent to

maximize y1y2y3y4
subject to y = Ax − b
           y ⪰ 0,

and

maximize t1t2
subject to y = Ax − b
           y1y2 ≥ t1², y3y4 ≥ t2²
           y ⪰ 0, t1 ≥ 0, t2 ≥ 0,

and also

maximize t
subject to y = Ax − b
           y1y2 ≥ t1², y3y4 ≥ t2², t1t2 ≥ t²
           y ⪰ 0, t1, t2, t ≥ 0

EE364 §3 8
expressing the three hyperbolic constraints

y1y2 ≥ t1², y3y4 ≥ t2², t1t2 ≥ t²

as SOC constraints yields an SOCP:

minimize −t
subject to ‖(2t1, y1 − y2)‖2 ≤ y1 + y2, y1 ≥ 0, y2 ≥ 0
           ‖(2t2, y3 − y4)‖2 ≤ y3 + y4, y3 ≥ 0, y4 ≥ 0
           ‖(2t, t1 − t2)‖2 ≤ t1 + t2, t1 ≥ 0, t2 ≥ 0
           y = Ax − b

EE364 §3 9
General case

• we can assume without loss of generality that m = 2^K for some integer K

• we can recursively expand the objective function as for m = 4

EE364 §3 10
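A hedged CVX sketch for the geometric-mean problem (A, b, n assumed given); CVX's geo_mean atom performs the recursive SOC expansion described above:

% sketch: maximize the geometric mean of a_i'*x - b_i over {x | A*x > b}
cvx_begin
    variable x(n)
    maximize( geo_mean(A*x - b) )
cvx_end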
Problem 3.49 log-concavity

(a) linear − log-sum-exp

(c) to show

f (x) = (∏_{i=1}^n xi) / (∑_{i=1}^n xi),   dom f = Rn++

is log-concave, we need to show

g(x) = log f (x) = ∑_{i=1}^n log xi − log(∑_{i=1}^n xi)

is concave on Rn++

EE364 §3 11
Problem 3.49 (c) continued...

we will show ∇2g(x) ⪯ 0.

partial derivatives:

∂g(x)/∂xi = 1/xi − 1/(∑_{i=1}^n xi),

and

∂²g(x)/∂xi² = −1/xi² + 1/(∑_{i=1}^n xi)²,
∂²g(x)/∂xi∂xj = 1/(∑_{i=1}^n xi)²,   i ≠ j

therefore

∇2g(x) = (1/(∑_{i=1}^n xi)²) 11T − diag(1/x1², . . . , 1/xn²)

EE364 §3 12
Problem 3.49 (c) continued...
to show uT ∇2g(x)u ≤ 0 for all u ∈ Rn, i.e.,

uT ( diag(1/x1², . . . , 1/xn²) − (1/(∑_{i=1}^n xi)²) 11T ) u ≥ 0,

same as

∑_{i=1}^n ui²/xi² ≥ (∑_{i=1}^n ui)² / (∑_{i=1}^n xi)²

using the Cauchy–Schwarz inequality,

(∑_{i=1}^n ui)² ≤ (∑_{i=1}^n ui²/xi²)(∑_{i=1}^n xi²) ≤ (∑_{i=1}^n ui²/xi²)(∑_{i=1}^n xi)²

EE364 §3 13
Problem 3.49 (d)

• restrict the function to a line X = Z + tV

• use part c of this problem

EE364 §3 14
Problem 4.8

(a) Minimizing a linear function over an affine set

minimize cT x
subject to Ax = b

• if b ∉ R(A), infeasible, optimal value is +∞


• we can write
c = AT λ + ĉ
where ĉ ∈ N (A)
• if ĉ = 0
cT x = λT Ax = λT b
optimal value is λT b, all feasible solutions are optimal
• if ĉ ≠ 0 and x0 is feasible, x = x0 − tĉ is feasible too

cT x = λT Ax + ĉT (x0 − tĉ) = λT b + ĉT x0 − ‖ĉ‖² t


the problem is unbounded (p? = −∞)

EE364 §3 15
Problem 4.8 continued...

(c) Minimizing a linear function over a rectangle

minimize cT x
subject to l  x  u,

where l and u satisfy l  u


• the objective and the constraints are separable
• we can solve the problem by minimizing over each component of x
independently

EE364 §3 16
the optimal x?i minimizes cixi subject to li ≤ xi ≤ ui
• if ci > 0, then x?i = li
• if ci < 0, then x?i = ui
• if ci = 0, then any xi in [li, ui] is optimal
the optimal value is

p? = lT c+ − uT c−,

where c+i = max{ci, 0} and c−i = max{−ci, 0}

EE364 §3 17
Conjugate function

find the conjugate of f (x) = |x|p/p, p > 1, x ∈ R.

f ∗(y) = sup(xy − f (x))


x

let y > 0; taking the derivative and setting it to zero,

y − x^{p−1} = 0,   i.e., x = y^{1/(p−1)},

and

f∗(y) = y^{p/(p−1)} − y^{p/(p−1)}/p
define q such that 1/p + 1/q = 1, then (with similar analysis for y ≤ 0),

f ∗(y) = |y|q /q

EE364 §3 18
Proof of Hölder’s inequality

Hölder’s inequality: for p > 1, 1/p + 1/q = 1 and x, y ∈ Rn,

xT y ≤ ‖x‖p ‖y‖q

Young’s inequality: For f ∗, the conjugate function of f ,

xT y ≤ f (x) + f ∗(y)

for p > 1, the conjugate function of

f (x) = ‖x‖p^p / p

is

f∗(y) = ‖y‖q^q / q,

where 1/p + 1/q = 1

EE364 §3 19
Proof continued ...

thus

xT y ≤ ‖x‖p^p / p + ‖y‖q^q / q

applying this to x/‖x‖p and y/‖y‖q,

xT y / (‖x‖p‖y‖q) ≤ (1/p) ‖x/‖x‖p‖p^p + (1/q) ‖y/‖y‖q‖q^q = 1/p + 1/q = 1

therefore

xT y ≤ ‖x‖p ‖y‖q

EE364 §3 20
EE364 Review

EE364 Review Session 4

Outline:

• convex optimization examples

• solving quasiconvex problems by bisection

• exercise 4.47

1
Convex optimization problems

we have seen

• linear programming (LP)

• quadratic programming (QP)

• quadratically constrained QP (QCQP)

• second-order cone programming (SOCP)

• geometric programming (GP)

• semidefinite programming (SDP)

EE364 Review Session 4 2


Force/moment generation with thrusters

• rigid body with center of mass at origin p = 0 ∈ R2

• n forces with magnitude ui, acting at pi = (pix, piy ), in direction θi

(figure: thruster i with force magnitude ui applied at (pix, piy) in direction θi)

EE364 Review Session 4 3


• resulting horizontal force: Fx = ∑_{i=1}^n ui cos θi

• resulting vertical force: Fy = ∑_{i=1}^n ui sin θi

• resulting torque: T = ∑_{i=1}^n (piy ui cos θi − pix ui sin θi)

• force limits: 0 ≤ ui ≤ 1 (thrusters)

• fuel usage: u1 + · · · + un

problem: find thruster forces ui that yield given desired forces and torques
Fxdes, Fydes, T des, and minimize fuel usage (if feasible)

EE364 Review Session 4 4


can be expressed as LP:

minimize 1T u
subject to F u = f des
0 ≤ ui ≤ 1, i = 1, . . . , n

where
 
cos θ1 ··· cos θn
F = sin θ1 ··· sin θn ,
p1y cos θ1 − p1x sin θ1 · · · pny cos θn − pnx sin θn

f des = (Fxdes, Fydes, T des), 1 = ( 1, 1, · · · 1 )

EE364 Review Session 4 5


clear all; close all;

% input data
% ----------
% thrusters x-coordinates
px = [-3 -2 -1 1.5 2 ];
% thrusters y-coordinates
py = [ 0 1 -2 1 -2.5];
% angles
thetas = [-85 30 -150 0 85]*pi/180;

F = [ cos(thetas);
sin(thetas);
py.*cos(thetas) - px.*sin(thetas)];
% different problem specified by each column of f_des
f_des = [ 0 0 1 -.5 0 0; ...
.5 -1 0 0 0 0; ...
0 0 0 0 2 -2];

EE364 Review Session 4 6


% problem solution

thrus = [];

for i=1:6

cvx_begin
variable u(5)
minimize ( sum ( u ) )
F*u == f_des(:,i)
u >= 0
u <= 1
cvx_end

thrus = [thrus u];

end

EE364 Review Session 4 7


(figure: optimal thruster forces for Fxdes = 0, Fydes = 0.5, T des = 0)

EE364 Review Session 4 8


(figure: optimal thruster forces for Fxdes = 0, Fydes = −1, T des = 0)

EE364 Review Session 4 9


(figure: optimal thruster forces for Fxdes = 1, Fydes = 0, T des = 0)

EE364 Review Session 4 10


(figure: optimal thruster forces for Fxdes = −0.5, Fydes = 0, T des = 0)

EE364 Review Session 4 11


(figure: optimal thruster forces for Fxdes = 0, Fydes = 0, T des = 2)

EE364 Review Session 4 12


(figure: optimal thruster forces for Fxdes = 0, Fydes = 0, T des = −2)

EE364 Review Session 4 13


Extensions of thruster problem

• opposing thruster pairs:


minimize ‖u‖1 = ∑_{i=1}^n |ui|
subject to F u = f des
           |ui| ≤ 1, i = 1, . . . , n

can express as LP

• more accurate fuel use model:


minimize ∑_{i=1}^n φi(ui)
subject to F u = f des
           0 ≤ ui ≤ 1, i = 1, . . . , n

φi are piecewise linear increasing convex functions


can express as LP

EE364 Review Session 4 14


• minimize maximum force/moment error:

minimize ‖F u − f des‖∞
subject to 0 ≤ ui ≤ 1, i = 1, . . . , n

can express as LP

• minimize number of thrusters used:

minimize # thrusters on
subject to F u = f des
0 ≤ ui ≤ 1, i = 1, . . . , n

can’t express as LP
(but we could check feasibility of each of the 2^n subsets of thrusters)

EE364 Review Session 4 15


Optimizing structural dynamics

linear elastic structure

(figure: linear elastic structure with applied forces f1, f2, f3, f4)

dynamics (ignoring damping): M d̈ + Kd = 0

• d(t) ∈ Rk : vector of displacements


• M = M T ≻ 0 is mass matrix; K = K T ≻ 0 is stiffness matrix

EE364 Review Session 4 16


Fundamental frequency

• solutions have form

di(t) = ∑_{j=1}^k αij cos(ωj t − φj )

where 0 ≤ ω1 ≤ ω2 ≤ · · · ≤ ωk are the modal frequencies, i.e., positive


solutions of det(ω 2M − K) = 0

• fundamental frequency: ω1 = λmin(K, M )^{1/2} = λmin(M −1/2KM −1/2)^{1/2}

– structure behaves like mass at frequencies below ω1


– gives stiffness measure (the larger ω1, the stiffer the structure)

• ω1 ≥ Ω ⇐⇒ Ω2M − K ⪯ 0, so ω1 is a quasiconcave function of M , K

EE364 Review Session 4 17


• design variables: xi, cross-sectional area of structural member i
(geometry of structure fixed)
• M (x) = M0 + ∑i xiMi,   K(x) = K0 + ∑i xiKi

• structure weight w = w0 + ∑i xiwi

• problem: minimize weight s.t. ω1 ≥ Ω, limits on cross-sectional areas

as SDP:

minimize w0 + ∑i xiwi
subject to Ω2M (x) − K(x) ⪯ 0
           li ≤ xi ≤ ui

EE364 Review Session 4 18
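A CVX sketch of this SDP (illustrative only; M0, K0, the cell arrays M{i}, K{i}, the weights w0, w, the bounds l, u, and Omega are assumed given):

% sketch: minimize structure weight subject to omega_1 >= Omega
cvx_begin sdp
    variable x(n)
    Mx = M0;  Kx = K0;
    for i = 1:n
        Mx = Mx + x(i)*M{i};      % M(x)
        Kx = Kx + x(i)*K{i};      % K(x)
    end
    minimize( w0 + w'*x )
    subject to
        Kx - Omega^2*Mx >= 0;     % i.e. Omega^2 M(x) - K(x) <= 0
        x >= l;
        x <= u;
cvx_end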


Solving quasiconvex problems via bisection

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
Ax = b
fi convex, f0 quasiconvex

idea: express sublevel set f0(x) ≤ t as sublevel set of convex function:

f0(x) ≤ t ⇔ φt(x) ≤ 0

where φt : Rn → R is convex in x for each t

now solve quasiconvex problem by bisection on t, solving convex feasibility


problem
φt(x) ≤ 0, fi(x) ≤ 0, i = 1, . . . , m, Ax = b
(with variable x) at each iteration

EE364 Review Session 4 19


bisection method for quasiconvex problem:

given l < p∗; feasible x; ε > 0


u := f0(x)
repeat
t := (u + l)/2
solve convex feasibility problem
φt(x) ≤ 0, fi(x) ≤ 0, Ax = b
if feasible,
u := t
x := any solution of feas. problem
else l := t
until u − l ≤ ε

• reduces quasiconvex problem to sequence of convex feasibility problems

EE364 Review Session 4 20
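As an illustration (not in the original notes), here is a minimal Matlab/CVX sketch of this bisection loop for a quasiconvex linear-fractional objective; a, b, c, d, F, g, n and the initial bracket [l, u] (with l ≤ p∗ ≤ u) are assumed given:

% minimize (a'*x + b)/(c'*x + d)  s.t.  F*x <= g,  c'*x + d > 0
epsilon = 1e-3;
while u - l > epsilon
    t = (l + u)/2;
    cvx_begin quiet
        variable x(n)
        a'*x + b - t*(c'*x + d) <= 0;   % phi_t(x) <= 0
        F*x <= g;
        c'*x + d >= 1e-6;
    cvx_end
    if strcmp(cvx_status, 'Solved')
        u = t;  x_opt = x;              % objective value <= t is achievable
    else
        l = t;                          % optimal value exceeds t
    end
end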


4.47 Maximum determinant positive semidefinite matrix completion.

We consider a matrix A ∈ Sn, with some entries specified, and the others
not specified. Say
 
A = [ 3.0   0.5    ?    0.25
      0.5   2.0   0.75   ?
       ?    0.75  1.0    ?
      0.25   ?     ?    5.0 ]

where the entries marked ? are unspecified.

The positive semidefinite matrix completion problem is to determine values


of the unspecified entries of the matrix so that A ⪰ 0 (or to determine
that such a completion does not exist).

EE364 Review Session 4 21


(a) Why can we assume (w.l.o.g) that Aii are specified?

(b) Formulate this problem as an SDP feasibility problem?

EE364 Review Session 4 22


(c) Assume that A has at least one completion that is positive definite, and
the diagonal entries of A are specified (i.e., fixed). The positive definite
completion with largest determinant is called the maximum determinant
completion. Show that the maximum determinant completion is unique.
Show that if A? is the maximum determinant completion, then (A?)−1
has zeros in all the entries of the original matrix that were not specified.

(d) Suppose A is specified on its tridiagonal part, i.e., we are given


A11, . . . , Ann and A12, . . . , An−1,n . Show that if there exists a positive
definite completion of A, then there is a positive definite completion
whose inverse is tridiagonal.

EE364 Review Session 4 23


Matlab code:

n = 4;

% create and solve the problem


cvx_begin sdp
% A is a PSD symmetric matrix (n-by-n)
variable A(n,n) symmetric;

maximize( det_rootn( A ) )

A >= 0;
% constrained matrix entries.
A(1,1) == 3;
A(2,2) == 2;
A(3,3) == 1;
A(4,4) == 5;
A(1,2) == .5;
A(1,4) == .25;
A(2,3) == .75;
cvx_end

EE364 Review Session 4 24


Matrix A with maximum determinant (20.578) is:

A =
3.0000 0.5000 0.1874 0.2500
0.5000 2.0000 0.7500 0.0417
0.1874 0.7500 1.0000 0.0156
0.2500 0.0417 0.0156 5.0000

Its eigenvalues are:

eigs =
0.5964
2.0908
3.2773
5.0355

The inverse of matrix A is:

0.3492 -0.0870 0.0000 -0.0167


-0.0870 0.7174 -0.5217 -0.0000
0.0000 -0.5217 1.3913 0.0000
-0.0167 -0.0000 0.0000 0.2008

EE364 Review Session 4 25


EE364 Review

EE364: Review Session 5

Outline:

• Duality examples

• Strong duality

• Farkas’ lemma

• Mixed strategies for matrix games

• Homework hints

1
Duality

• Primal problem

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
hi(x) = 0, i = 1, . . . , p

• Lagrangian L(x, λ, ν) = f0(x) + ∑_{i=1}^m λifi(x) + ∑_{i=1}^p νihi(x)

• Dual g(λ, ν) = inf x L(x, λ, ν)

• For λ ⪰ 0, g(λ, ν) ≤ p∗

• Dual problem
maximize g(λ, ν)
subject to λ ⪰ 0

EE364: Review Session 5 2


LP duality example

Example: Show that the dual of the following LP

minimize 3x1 + 2x2 + x3


subject to x1 + x2 ≥ 4
x2 + 2x3 ≥ 2
x1, x2, x3 ≥ 0,

can be expressed as
maximize 4y1 + 2y2
subject to y1 ≤ 3
y1 + y2 ≤ 2
2y2 ≤ 1
y1, y2 ≥ 0.

EE364: Review Session 5 3


Dual of an SOCP

Example: Find the dual of the SOCP

minimize f T x
subject to ‖Ax + b‖2 ≤ cT x + d,

with variables x ∈ Rn. The problem data are f, c ∈ Rn, A ∈ Rm×n,


b ∈ Rm and d ∈ R.

Solution: We can express the problem as

minimize f T x
subject to ‖y‖2 ≤ t
Ax + b = y
cT x + d = t.

EE364: Review Session 5 4


Let ν ∈ R, u ∈ Rm and µ ∈ R be the Lagrange multipliers for the above
problem. The Lagrangian is

L(x, y, t, ν, u, µ) = f T x + ν(‖y‖2 − t) + uT (y − Ax − b) + µ(t − cT x − d)
                    = (f − AT u − µc)T x + (ν‖y‖2 + uT y) + t(µ − ν) − uT b − νd.

Therefore the dual function is

g(ν, u, µ) = inf_x (f − AT u − µc)T x + inf_y (ν‖y‖2 + uT y) + inf_t t(µ − ν) − uT b − νd.

Using the Cauchy–Schwarz inequality, we have

inf_y (ν‖y‖2 + uT y) = 0 if ‖u‖2 ≤ ν, and −∞ otherwise.

EE364: Review Session 5 5


Thus

g(ν, u, µ) = −uT b − νd if AT u + µc = f, ‖u‖2 ≤ ν, and µ = ν; and −∞ otherwise.

The dual of the original SOCP can be written as

maximize −uT b − νd
subject to ‖u‖2 ≤ ν
AT u + νc = f.

EE364: Review Session 5 6


Strong duality

• Slater’s condition: strong duality holds for a convex problem if it is


strictly feasible, i.e.,

∃x ∈ int D : fi(x) < 0, i = 1, . . . , m, Ax = b

• Sharper version: Affine constraints need not be strictly feasible, only


feasible

• Example: convex problem for which strong duality fails

minimize e−x
subject to x2/y ≤ 0,

with domain D = {(x, y) | y > 0}

• Optimal value p? = 1

EE364: Review Session 5 7


• Lagrangian L(x, y, λ) = e−x + λx2/y; the dual function is

g(λ) = inf_{x, y>0} (e−x + λx2/y) = 0 if λ ≥ 0, and −∞ if λ < 0

• Dual problem:
maximize 0
subject to λ ≥ 0,

• Dual optimal value d? = 0

• Global sensitivity inequality p?(u) ≥ p?(0) − λ?u does not hold:


– p?(u) = 1 if u = 0
– p?(u) = 0 if u > 0
– p?(u) = ∞ if u < 0

EE364: Review Session 5 8


Strong duality for LPs

• Strong duality holds if either primal or dual is feasible (Slater’s


condition)

• No strong duality only if both primal and dual are infeasible: p? = +∞,
d? = −∞

• Example
  minimize x
  subject to [0; 1] x ⪯ [−1; 1]

• Dual LP
maximize z1 − z2
subject to z2 + 1 = 0
z1 , z 2 ≥ 0

EE364: Review Session 5 9


Farkas’ lemma

• Two sets of inequalities are called strong alternatives if exactly one of


the two is feasible

• Farkas’ lemma: The system of inequalities

Ax ⪯ 0, cT x < 0, (1)

where A ∈ Rm×n and c ∈ Rn, and the system of inequalities

AT y + c = 0, y ⪰ 0, (2)

are strong alternatives

• Can be proved using strong duality for LPs

EE364: Review Session 5 10


Farkas’ lemma (contd.)

• Consider the LP
  minimize cT x
  subject to Ax ⪯ 0

• Optimal value is 0 if (1) is infeasible and −∞ if (1) is feasible

• Dual of this LP:
  maximize 0
  subject to AT y + c = 0
             y ⪰ 0

• Dual has optimal value 0 if (2) is feasible and −∞ if (2) is infeasible

• Since the primal LP is feasible, strong duality holds

• Thus (1) and (2) are strong alternatives

EE364: Review Session 5 11


Mixed strategies for matrix games

• Two players: player 1 and player 2

• Player 1 makes a choice k ∈ {1, . . . , m}. Player 2 makes a choice


l ∈ {1, . . . , n}

• Player 1 then makes a payment Pkl to player 2, where P ∈ Rm×n is the


payoff matrix

• The goal of player 1 is to minimize the payment to player 2, while the


goal of player 2 is to maximize it

• We assume that the players use randomized or mixed strategies

prob(k = i) = ui, i = 1, . . . , m, prob(l = i) = vi, i = 1, . . . , n

EE364: Review Session 5 12


Game from player 1’s perspective:

• Assume that player 1’s strategy u is known to player 2

• This clearly gives an advantage to player 2

• Player 2 will choose his/her strategy v to maximize uT P v, which results


in the expected payoff

sup{uT P v | v ⪰ 0, 1T v = 1} = max_{i=1,...,n} (P T u)i

• Best thing that player 1 can do is to choose u to minimize this


worst-case payoff

minimize maxi=1,...,n (P T u)i
subject to u ⪰ 0, 1T u = 1          (3)

• Let the optimal value of this problem be p∗

EE364: Review Session 5 13


Game from player 2’s perspective:

• Assume that player 2’s strategy v is known to player 1

• This clearly gives an advantage to player 1

• Player 1 will choose his/her strategy u to minimize uT P v, which results


in the expected payoff

inf{uT P v | u ⪰ 0, 1T u = 1} = min_{i=1,...,m} (P v)i

• Best thing that player 2 can do is to choose v to maximize this


worst-case payoff

maximize mini=1,...,m (P v)i
subject to v ⪰ 0, 1T v = 1          (4)

• Let the optimal value of this problem be q ∗

EE364: Review Session 5 14


Analysis:

• Clearly, knowing the opponent's strategy gives an advantage (or at least cannot hurt), thus p∗ ≥ q ∗

• We will prove that p∗ = q ∗, using strong duality for LPs

• We can express problem (3) as the LP

minimize t
subject to u ⪰ 0, 1T u = 1
           P T u ⪯ t1,

with extra variable t ∈ R

• Introduce multipliers λ ∈ Rn for P T u ⪯ t1, µ ∈ Rm for u ⪰ 0, and ν ∈ R for 1T u = 1

EE364: Review Session 5 15


• The Lagrangian is

L(u, t, λ, µ, ν) = t + λT (P T u − t1) − µT u + ν(1 − 1T u)
                 = ν + (1 − 1T λ)t + (P λ − ν1 − µ)T u

• Thus the dual function is

g(λ, µ, ν) = ν if 1T λ = 1 and P λ − ν1 = µ, and −∞ otherwise

• The dual problem is

maximize ν
subject to λ ⪰ 0, 1T λ = 1, µ ⪰ 0
           P λ − ν1 = µ

EE364: Review Session 5 16


• By eliminating µ we can rewrite the dual as

maximize ν
subject to λ ⪰ 0, 1T λ = 1
           P λ ⪰ ν1

• This problem is clearly equivalent to problem (4)

• Since both LPs are feasible, we have strong duality

• The optimal values of (3) and (4) are the same

EE364: Review Session 5 17
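As an illustration (not from the original notes), problem (3) can be solved directly with CVX; P is the given m-by-n payoff matrix:

% sketch: optimal mixed strategy for player 1
cvx_begin
    variables u(m) t
    minimize( t )
    subject to
        P'*u <= t*ones(n,1);
        u >= 0;
        sum(u) == 1;
cvx_end
% by the LP duality argument above, cvx_optval is the value of the game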


EE364 Review

EE364: Review Session 6

Outline:

• Variable bounds and dual feasibility

• SDP relaxation

• Monotone transformation of the objective

• Homework hints

1
Variable bounds and dual feasibility

in many problems the constraints include variable bounds, as in

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
li ≤ xi ≤ ui, i = 1, . . . , n

the Lagrangian is

L(x, λ, µ, ν) = f0(x) + ∑_{i=1}^m λifi(x) + µT (x − u) + ν T (l − x)

EE364: Review Session 6 2


for any x ∈ Rn and any λ, we can choose µ ⪰ 0 and ν ⪰ 0 so that x
minimizes L(x, λ, µ, ν)

we have

∇xL(x, λ, µ, ν) = ∇f0(x) + ∑_{i=1}^m λi∇fi(x) + (µ − ν)

if x minimizes L, we have ∇xL = 0 and therefore

ν − µ = ∇f0(x) + ∑_{i=1}^m λi∇fi(x)

EE364: Review Session 6 3

ν = [∇f0(x) + ∑_{i=1}^m λi∇fi(x)]+
  = (1/2)( |∇f0(x) + ∑_{i=1}^m λi∇fi(x)| + ∇f0(x) + ∑_{i=1}^m λi∇fi(x) )

and

µ = [∇f0(x) + ∑_{i=1}^m λi∇fi(x)]−
  = (1/2)( |∇f0(x) + ∑_{i=1}^m λi∇fi(x)| − ∇f0(x) − ∑_{i=1}^m λi∇fi(x) )

where | · | and [ · ]± are applied componentwise

EE364: Review Session 6 4


• therefore, if λ ⪰ 0 then (λ, µ, ν) is dual feasible

• we can obtain a lower bound for any λ ⪰ 0

EE364: Review Session 6 5


Example

with x = (l + u)/2 and λ = 0 we can find a dual feasible point and a lower
bound on f ?

we have

ν = (1/2)( ∇f0((l + u)/2) + |∇f0((l + u)/2)| )
µ = (1/2)( −∇f0((l + u)/2) + |∇f0((l + u)/2)| )

EE364: Review Session 6 6


and therefore the lower bound becomes

L(x, 0, µ, ν) = f0((l+u)/2) + (1/2)( −∇f0((l+u)/2) + |∇f0((l+u)/2)| )T ((l+u)/2 − u)
             + (1/2)( ∇f0((l+u)/2) + |∇f0((l+u)/2)| )T (l − (l+u)/2)
             = f0((l+u)/2) − ((u−l)/2)T |∇f0((l+u)/2)|

EE364: Review Session 6 7


this bound can also be derived directly: since f0 is convex,

f ? ≥ f0((u+l)/2) + ∇f0((u+l)/2)T (x? − (u+l)/2)
   ≥ f0((u+l)/2) + inf_{l⪯x⪯u} ∇f0((u+l)/2)T (x − (u+l)/2)

but inf_{l⪯x⪯u} ∇f0((u+l)/2)T (x − (u+l)/2) is attained at

• xi = ui if ∇f0((l+u)/2)i ≤ 0,
• xi = li if ∇f0((l+u)/2)i > 0

therefore

inf_{l⪯x⪯u} ∇f0((u+l)/2)T (x − (u+l)/2) = −|∇f0((u+l)/2)|T (u − l)/2

and we get

f ? ≥ f0((u+l)/2) − |∇f0((u+l)/2)|T (u − l)/2

EE364: Review Session 6 8
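A small numerical illustration of this bound (made-up data, with f0(x) = ‖Ax − b‖2²):

% lower bound on f* from the box midpoint
A = randn(20, 5);  b = randn(20, 1);
l = -ones(5, 1);   u = ones(5, 1);
xmid  = (l + u)/2;
g     = 2*A'*(A*xmid - b);                          % gradient of f0 at xmid
lower = norm(A*xmid - b)^2 - ((u - l)/2)'*abs(g);   % f0(xmid) - ((u-l)/2)'*|grad|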


SDP relaxations of two-way partitioning problem

consider the problem

minimize xT W x
subject to xi² = 1, i = 1, . . . , n,

• xi = 1 if it belongs to the one partition

• xi = −1 if it belongs to the other partition

• Wij is the cost of having i and j in the same partition

EE364: Review Session 6 9


the Lagrangian is

L(x, ν) = xT W x + ∑_{i=1}^n νi(xi² − 1) = xT (W + diag(ν))x − 1T ν

and therefore the dual problem is

maximize −1T ν
subject to W + diag(ν) ⪰ 0

the optimal value of the dual is a lower bound on the optimal value of the
partitioning problem

EE364: Review Session 6 10


Another bound

since

xT W x = tr(xT W x) = tr(W xxT )

and

(xxT )ii = xi²

we can write the original problem as

minimize tr(W X)
subject to X ⪰ 0, rank X = 1
           Xii = 1, i = 1, . . . , n,

EE364: Review Session 6 11


the problem

minimize tr(W X)
subject to X ⪰ 0, rank X = 1
           Xii = 1, i = 1, . . . , n,

is not convex, but we can write a relaxation by removing the rank constraint:

minimize tr(W X)
subject to X ⪰ 0
           Xii = 1, i = 1, . . . , n,

• this problem is convex (SDP) and gives a lower bound on the original
problem

• if the solution has rank 1 we solved the original problem

EE364: Review Session 6 12
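A CVX sketch of the SDP relaxation (W and n are the given problem data; not part of the original notes):

% sketch: SDP relaxation of two-way partitioning
cvx_begin sdp
    variable X(n, n) symmetric
    minimize( trace(W*X) )
    subject to
        X >= 0;             % X positive semidefinite
        diag(X) == 1;
cvx_end
% cvx_optval lower-bounds the partitioning problem; if X has rank one,
% the sign pattern of its leading eigenvector solves the original problem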


we now find the dual of the dual problem

minimize 1T ν
subject to W + diag(ν) ⪰ 0

introducing a Lagrange multiplier X ∈ Sn for the matrix inequality

L(ν, X) = 1T ν − tr(X(W + diag(ν)))
        = 1T ν − tr(XW ) − ∑_{i=1}^n νiXii
        = − tr(XW ) + ∑_{i=1}^n νi(1 − Xii)

this is bounded below as a function of ν only if Xii = 1 for all i, so we

EE364: Review Session 6 13


obtain the dual problem

maximize − tr(W X)
subject to X ⪰ 0
Xii = 1, i = 1, . . . , n

this is the same as the relaxation problem

EE364: Review Session 6 14


Monotone transformation of the objective

consider the convex optimization problem

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m

suppose φ : R → R is increasing and convex


then the problem

minimize f˜0(x) = φ(f0(x))


subject to fi(x) ≤ 0, i = 1, . . . , m

is convex and equivalent to it

• we are interested in the connection between the duals

• we consider φ(a) = exp a

EE364: Review Session 6 15


suppose λ is feasible for the dual of the first problem and x̄ minimizes

f0(x) + ∑_{i=1}^m λifi(x)

it can be shown that x̄ also minimizes

exp f0(x) + ∑_{i=1}^m λ̃ifi(x)

for appropriate choice of λ̃


thus, λ̃ is dual feasible for the second problem

EE364: Review Session 6 16


since x̄ minimizes f0(x) + ∑_{i=1}^m λifi(x), we have

∇f0(x̄) + ∑_{i=1}^m λi∇fi(x̄) = 0

but

(∂/∂x)[ exp f0(x) + ∑_{i=1}^m λ̃ifi(x) ] at x = x̄
    = exp(f0(x̄)) ∇f0(x̄) + ∑_{i=1}^m λ̃i∇fi(x̄)
    = exp(f0(x̄)) [ ∇f0(x̄) + ∑_{i=1}^m λ̃i e^{−f0(x̄)} ∇fi(x̄) ]

so if we take λ̃i = exp(f0(x̄)) λi ≥ 0,

(∂/∂x)[ exp f0(x) + ∑_{i=1}^m λ̃ifi(x) ] at x = x̄  =  0

EE364: Review Session 6 17


if p? denotes the optimal value of the first problem,
the optimal value of the second is exp p?

we have bound
p? ≥ g(λ),
where g is the dual function of the first problem and

exp p? ≥ g̃(λ̃)

where g̃ is the dual function of the second problem or equivalently

p? ≥ log g̃(λ̃)

EE364: Review Session 6 18


we have g̃(λ̃) = e^{f0(x̄)} + ∑_{i=1}^m e^{f0(x̄)} λifi(x̄), and therefore

log g̃(λ̃) = log( e^{f0(x̄)} + ∑_{i=1}^m e^{f0(x̄)} λifi(x̄) )
          = f0(x̄) + log( 1 + ∑_{i=1}^m λifi(x̄) )

the bound we get from the modified problem is always worse, i.e., log g̃(λ̃) ≤ g(λ); in fact

log g̃(λ̃) − g(λ) = log( 1 + ∑_{i=1}^m λifi(x̄) ) − ∑_{i=1}^m λifi(x̄)

and from the identity log(1 + y) − y ≤ 0 we conclude log g̃(λ̃) ≤ g(λ)

EE364: Review Session 6 19


Additional problem hint

square( square( x + y ) ) <= x - y

• the problem is that square() can only accept affine arguments,


because it is convex, but not increasing

• we can restrict square() to R+ so that it’s convex and increasing

• we use square_pos() instead:

square_pos( square( x + y ) ) <= x - y

• we can introduce additional variable

variable t
square( x+y ) <= t;
square( t ) <= x - y

EE364: Review Session 6 20


EE364 Review

EE364 Review Session 7

Outline:

• numerical linear algebra examples

• gradients and chain rule

• homework hints

1
Numerical linear algebra: factor and solve

factor-solve method for Ax = b

• computation cost f + s

• consider set of n linear equations in n variables, i.e., A is square

• use LU factorization, A = LU

• f = (2/3)n3 (Gaussian elimination)

• s = 2n2 (back and forward solve)

• for example, can compute n × n matrix inverse with cost


f + ns = (8/3)n3 (why?)
ans: solve AX = I system with factor-solve method

EE364 Review Session 7 2
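A minimal Matlab sketch of factor-solve reuse (A, b1, b2 assumed given):

% factor once (about (2/3)n^3 flops), solve many times (2n^2 flops each)
[L, U, p] = lu(A, 'vector');    % A(p,:) = L*U
x1 = U \ (L \ b1(p));
x2 = U \ (L \ b2(p));
% computing inv(A) amounts to solving A*X = I, hence f + n*s = (8/3)n^3 flops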


Numerical linear algebra examples

examples setup:
we give naive but correct algorithms (in Matlab notation)

• compute flop count for each algorithm using LU factor-solve


• if possible, give a more efficient method (and its flop count)

1. calculate cT A−1b where c ∈ Rn, A ∈ Rn×n, b ∈ Rn, and matrix A is


nonsingular

naive algorithm: val = c'*(inv(A)*b)

flop count:
ans: (8/3)n3 + 2n2 + 2n ≈ (8/3)n3
more efficient method:
ans: val = c'*(A\b) with flop count about (2/3)n3

EE364 Review Session 7 3


2. calculate cT A−1B where c ∈ Rn, A ∈ Rn×n, B ∈ Rn×m, and matrix A
is nonsingular

naive algorithm: val = c'*(inv(A)*B)

flop count:
ans: (8/3)n3 + 2n2m + 2nm ≈ (8/3)n3 + 2n2m

more efficient method:


ans: val = (A'\c)'*B with flop count about (2/3)n3

EE364 Review Session 7 4


3. solve the set of equations

   [ A  0 ]     [ b ]
   [ 0  B ] x = [ c ]

where A ∈ Rn×n, B ∈ Rn×n, b ∈ Rn, c ∈ Rn, and matrices A and B


are nonsingular

naive algorithm:
x = [A, zeros(n,n); zeros(n,n), B] \ [b; c]

flop count:
ans: (2/3)(2n)3 = (16/3)n3

more efficient method:


ans: [A\b; B\c] with flop count 2(2/3)n3 = (4/3)n3

EE364 Review Session 7 5


4. solve the set of equations

   [ A  B ]     [ b1 ]
   [ C  I ] x = [ b2 ]

where A ∈ Rn×n, B ∈ Rn×10n, C ∈ R10n×n, b1 ∈ Rn, b2 ∈ R10n, and I


is the 10n × 10n identity matrix; also assume that the whole matrix is
nonsingular

naive algorithm: x = [A, B; C eye(10*n)] \ [b1; b2]

flop count:
ans: (2/3)(11n)3 = (2662/3)n3
more efficient method:
ans: we use elimination of variables to get equations

(A − BC)x1 = b1 − Bb2 and x2 = b2 − Cx1

flop count: forming A − BC costs 20n3, b1 − Bb2 is 20n2, backslash is


(2/3)n3, and computing x2 costs 20n2, overall about (62/3)n3.

EE364 Review Session 7 6


5. solve the set of equations

   [ I  B ]     [ b1 ]
   [ C  I ] x = [ b2 ]

where B ∈ Rm×n, C ∈ Rn×m, and n > m; also assume that the whole
matrix is nonsingular

naive algorithm: x = [eye(m), B; C, eye(n)] \ [b1; b2]

flop count:
ans: (2/3)(n + m)3
more efficient method:
ans: we use elimination of variables to get equations

(I − BC)x1 = b1 − Bb2 and x2 = b2 − Cx1

flop count: forming I − BC costs 2nm2, b1 − Bb2 is 2nm, backslash is


(2/3)m3, and computing x2 costs 2nm, overall about 2nm2 + (2/3)m3.

EE364 Review Session 7 7


Solving almost separable linear equations

Consider the following system of equations

Ax + By = c
Dx + Ey + F z = g
Hy + Jz = k

where A, J ∈ Rn×n, B, H ∈ Rn×m, D, F ∈ Rm×n, E ∈ Rm×m, c, k ∈ Rn,


g ∈ Rm and n > m

In other words, we need to solve the following system


    
[ A  B  0 ] [ x ]   [ c ]
[ D  E  F ] [ y ] = [ g ]
[ 0  H  J ] [ z ]   [ k ]

EE364 Review Session 7 8


The naive way would be to do the following:

[ x ]   [ A  B  0 ]−1 [ c ]
[ y ] = [ D  E  F ]   [ g ]
[ z ]   [ 0  H  J ]   [ k ]

but we can take advantage of the structure by first reordering the equations and variables:

[ A  0  B ] [ x ]   [ c ]
[ 0  J  H ] [ z ] = [ k ]
[ D  F  E ] [ y ]   [ g ]

The system now looks like an ”arrow” system, which we can efficiently
solve by block elimination.

EE364 Review Session 7 9


Since

[ A  0 ] [ x ]   [ B ]     [ c ]
[ 0  J ] [ z ] + [ H ] y = [ k ]

then

[ x ]   [ A−1c ]   [ A−1B ]
[ z ] = [ J−1k ] − [ J−1H ] y

We know that

[ D  F ] [ x; z ] + Ey = g

so, using the expression derived before,

[ D  F ] ( [ A−1c; J−1k ] − [ A−1B; J−1H ] y ) + Ey = g

and therefore

(E − DA−1B − F J −1H)y = g − DA−1c − F J −1k

EE364 Review Session 7 10


We can therefore solve the system of equations efficiently by taking
advantage of structure in the following way

• Form

M = A−1B, n = A−1c,
P = J −1H, q = J −1k.

• Compute r = g − Dn − F q.

• Compute S = E − DM − F P .

• Find
y = S −1r, x = n − M y, z = q − P y.

EE364 Review Session 7 11
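A Matlab sketch of this recipe (all blocks assumed given; the vector called n on the slide is renamed nvec below to avoid clashing with the dimension n):

% block elimination for the reordered "arrow" system
M    = A \ B;    nvec = A \ c;      % A^{-1}B, A^{-1}c
P    = J \ H;    q    = J \ k;      % J^{-1}H, J^{-1}k
r    = g - D*nvec - F*q;
S    = E - D*M - F*P;               % small m-by-m Schur complement
y    = S \ r;
x    = nvec - M*y;
z    = q - P*y;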


Derivative and gradients

When f is real-valued (i.e., f : Rn → R) the derivative Df (x) is a 1 × n


matrix, i.e., it is a row vector.

Its transpose is called the gradient of the function:

∇f (x) = Df (x)T ,

which is a (column) vector, i.e., in Rn.

Its components are the partial derivatives of f :

∂f (x)
∇f (x)i = , i = 1, . . . , n.
∂xi

The first-order approximation of f at a point x can be expressed as (the


affine function of z)
f (x) + ∇f (x)T (z − x).

EE364 Review Session 7 12


• Consider f : Rn → R,

f (x) = (1/2)xT P x + q T x + r,

where P ∈ Sn, q ∈ Rn, and r ∈ R.


Its derivative at x is the row vector Df (x) = xT P + q T , and its
gradient is ∇f (x) = P x + q.

• Consider g : Rn → R, g(x) = log ∑_{i=1}^n exp(xi). Its gradient is

∇g(x) = (1 / ∑_{i=1}^n exp xi) (exp x1, . . . , exp xn)       (1)

EE364 Review Session 7 13
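A quick finite-difference check of formula (1) (made-up point; not in the original notes):

% numerical check of the log-sum-exp gradient
g  = @(x) log(sum(exp(x)));
x0 = randn(5, 1);
grad_analytic = exp(x0) / sum(exp(x0));
grad_numeric  = zeros(5, 1);
h = 1e-6;
for i = 1:5
    e = zeros(5, 1);  e(i) = h;
    grad_numeric(i) = (g(x0 + e) - g(x0 - e)) / (2*h);
end
norm(grad_analytic - grad_numeric)   % should be tiny (on the order of 1e-10)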


Matrix derivatives

Let f : Sn → R.

One (tedious) way to find the gradient of f is to introduce a basis for Sn,
find the gradient of the associated function, and finally translate the result
back to Sn.

Instead, we will directly find the first-order approximation of f at


X ∈ Sn++.

Let Z ∈ Sn++ be close to X, and let ∆X = Z − X (which is assumed to


be small).

We need to find the matrix D such that

f (Z) ≈ f (X) + tr(D(Z − X))

EE364 Review Session 7 14


Let f (X) = log det X, dom f = Sn++.

Let Z ∈ Sn++ be close to X, and let ∆X = Z − X (which is assumed to


be small).

log det Z = log det(X + ∆X)
          = log det( X1/2 (I + X −1/2∆XX −1/2) X1/2 )
          = log det X + log det(I + X −1/2∆XX −1/2)
          = log det X + ∑_{i=1}^n log(1 + λi),

where λi is the ith eigenvalue of X −1/2∆XX −1/2.

Now we use the fact that ∆X is small, which implies λi are small, so to
first order we have log(1 + λi) ≈ λi.

EE364 Review Session 7 15


we get

log det Z ≈ log det X + ∑_{i=1}^n λi
          = log det X + tr(X −1/2∆XX −1/2)
          = log det X + tr(X −1∆X)
          = log det X + tr(X −1(Z − X)).

The first-order approximation of f at X is the affine function of Z given by

f (Z) ≈ f (X) + tr(X −1(Z − X)).

Thus, we can write the simple formula

∇f (X) = X −1.

This result should not be surprising, since the derivative of log x, on R++,
is 1/x.

EE364 Review Session 7 16


Chain rule

Suppose f : Rn → Rm is differentiable at x ∈ int dom f and


g : Rm → Rp is differentiable at f (x) ∈ int dom g. Define h : Rn → Rp
by h(z) = g(f (z)). Then

Dh(x) = Dg(f (x))Df (x). (2)

Composition with an affine function:

Suppose f : Rn → Rm is differentiable, A ∈ Rn×p, and b ∈ Rn. Define


g : Rp → Rm as g(x) = f (Ax + b), with dom g = {x | Ax + b ∈ dom f }.

The derivative of g is Dg(x) = Df (Ax + b)A.

When f is real-valued (i.e., m = 1),

∇g(x) = AT ∇f (Ax + b).

EE364 Review Session 7 17


• Consider the function f : Rn → R, with dom f = Rn and

f (x) = log ∑_{i=1}^m exp(aTi x + bi),

where a1, . . . , am ∈ Rn, and b1, . . . , bm ∈ R.

Note that f is the composition of the affine function Ax + b, where A ∈ Rm×n has rows aT1, . . . , aTm, and the function g : Rm → R given by g(y) = log(∑_{i=1}^m exp yi).

Then by the composition formula we have

∇f (x) = (1/(1T z)) AT z

where zi = exp(aTi x + bi), i = 1, . . . , m.

EE364 Review Session 7 18


• Consider

h(x) = log det(F0 + x1F1 + · · · + xnFn),

where F0, . . . , Fn ∈ Sp, and

dom h = {x ∈ Rn | F0 + x1F1 + · · · + xnFn ≻ 0}.

The function h is the composition of the affine mapping from x ∈ Rn to F0 + x1F1 + · · · + xnFn ∈ Sp, with the function log det X.

∂h(x)/∂xi = tr(Fi ∇ log det(F )) = tr(F −1Fi),

where F = F0 + x1F1 + · · · + xnFn. Thus we have

∇h(x) = ( tr(F −1F1), . . . , tr(F −1Fn) ).

EE364 Review Session 7 19


Homework hints

• Let f (X) = tr AX, dom f = Sn then ∇f (X) = A.

• The backward difference matrix

D = [ −1             ]
    [  1  −1         ]
    [      ⋱    ⋱    ]
    [          1  −1 ]   ∈ RN×N

can be built in MATLAB as follows:

D = diag(-ones(N,1)) + diag(ones(N-1,1),-1);

or

D = toeplitz([-1; 1; zeros(N-2,1)], [-1 zeros(1,N-1)]);

EE364 Review Session 7 20


• The (sparse) tridiagonal matrix ∆ ∈ Rn×n

∆ = [  1 −1  0 · · ·  0  0  0 ]
    [ −1  2 −1 · · ·  0  0  0 ]
    [  0 −1  2 · · ·  0  0  0 ]
    [  ⋮             ⋱      ⋮ ]
    [  0  0  0 · · ·  2 −1  0 ]
    [  0  0  0 · · · −1  2 −1 ]
    [  0  0  0 · · ·  0 −1  1 ]

can be built in MATLAB as follows:

d_1 = 2*ones(n,1); d_1(1) = 1; d_1(n) = 1;


d_2 = -ones(n,1);
D = spdiags([d_2 d_1 d_2],[-1 0 1], n,n);

EE364 Review Session 7 21


EE364 Review

EE364 Review Session 8

Outline:

• Approximate TV de-noising

• Inertia of the KKT matrix

• Inequality constrained problems

1
Approximate TV de-noising

• TV denoising, a bicriterion optimization problem

minimize ‖x − xcor‖2 + µ ∑_{i=1}^{n−1} |xi+1 − xi|

• xcor ∈ Rn is the corrupted signal and x ∈ Rn is the de-noised signal to


be computed

• Problem can be formulated as an SOCP or a QP

• Approximate TV denoising

minimize ‖x − xcor‖2² + µ φatv(x),

where

φatv(x) = ∑_{i=1}^{n−1} ( √(ε² + (xi+1 − xi)²) − ε )

EE364 Review Session 8 2


• Objective (ψ(x)) is twice differentiable

• Can solve this problem using Newton’s method

• Gradient and Hessian are

∇ψ(x) = 2(x − xcor) + µ∇φatv(x), ∇2ψ(x) = 2I + µ∇2φatv (x)

• Use chain rule to compute ∇φatv(x) and ∇2φatv(x)

• Let f : R → R denote the function

f (u) = √(ε² + u²) − ε

• Its first and second derivatives are

f ′(u) = u(ε² + u²)−1/2,   f ′′(u) = ε²(ε² + u²)−3/2

EE364 Review Session 8 3


• Define F : Rn−1 → R as

F (u1, . . . , un−1) = ∑_{i=1}^{n−1} f (ui)

• Its gradient and Hessian are

∇F (u) = (f ′(u1), . . . , f ′(un−1)),   ∇2F (u) = diag(f ′′(u1), . . . , f ′′(un−1))

• We have φatv(x) = F (Dx), where

D = [ −1  1             ]
    [     −1  1         ]
    [          ⋱    ⋱   ]
    [             −1  1 ]   ∈ R(n−1)×n

EE364 Review Session 8 4


• Using the chain rule

∇φatv (x) = D T ∇F (Dx), ∇2φatv(x) = D T ∇2F (Dx)D

• Therefore

∇ψ(x) = 2(x − xcor) + µD T ∇F (Dx)


∇2ψ(x) = 2I + µD T ∇2F (Dx)D.

• Hessian is tridiagonal

• We can compute the Newton step in O(n) operations.

EE364 Review Session 8 5


Matlab code for Newton method:

% Newton method for approximate total variation de-noising


D = spdiags([-1*ones(n,1) ones(n,1)], 0:1, n-1, n);

% Newton's method
ALPHA = 0.01;
BETA = 0.5;
MAXITERS = 100;
NTTOL = 1e-10;

x = zeros(n,1);
newt_dec = [];

for iter = 1:MAXITERS


d = (D*x);
val = (x-xcor)'*(x-xcor) + ...
      MU*sum(sqrt(EPSILON^2+d.^2)-EPSILON*ones(n-1,1));
grad = 2*(x - xcor) + ...
       MU*D'*(d./sqrt(EPSILON^2+d.^2));

EE364 Review Session 8 6


hess = 2*speye(n) + ...
MU*D'*spdiags(EPSILON^2*(EPSILON^2+d.^2).^(-3/2),...
0,n-1,n-1)*D;

v = -hess\grad;
lambdasqr = -grad'*v;
newt_dec = [newt_dec sqrt(lambdasqr)];

if (lambdasqr/2) < NTTOL, break; end;

t = 1;
while ((x+t*v-xcor)'*(x+t*v-xcor) + ...
MU*sum(sqrt(EPSILON^2+(D*(x+t*v)).^2)-...
EPSILON*ones(n-1,1)) > val - ALPHA*t*lambdasqr)
t = BETA*t;
end;
x = x+t*v;
end;

EE364 Review Session 8 7


Progress of Newton's method

(figure: Newton decrement λ(x) versus iteration number)

EE364 Review Session 8 8


Approximate TV de-noising with ε = 0.001, µ = 50

(figure: de-noised signal x and corrupted signal xcor)

EE364 Review Session 8 9


Tikhonov regularized smoothing with µ = 250

(figure: smoothed signal xtikh and corrupted signal xcor)

EE364 Review Session 8 10


Tikhonov regularized smoothing with µ = 20000

(figure: smoothed signal xtikh and corrupted signal xcor)

EE364 Review Session 8 11


Inertia of KKT matrix

• KKT matrix for equality-constrained quadratic minimization


 T

P A
M= ,
A 0

P ∈ Sn+, A ∈ Rp×n , rank A = p < n

• Inertia of a matrix: (p, z, n), where p is number of positive eigenvalues,


z is number of 0 eigenvalues, and n is number of negative eigenvalues

• Condition for non-singularity of KKT matrix: P + AT A ≻ 0

• If M is non-singular, there exists nonsingular matrix R ∈ Rn×n such


that
RT (P + AT A)R = I

EE364 Review Session 8 12


• Let AR = U ΣV1T be the singular value decomposition of AR, with U ∈ Rp×p, Σ = diag(σ1, . . . , σp) ∈ Rp×p and V1 ∈ Rn×p

• Let V2 ∈ Rn×(n−p) be such that V = [V1 V2] is orthogonal, and define S = [Σ 0] ∈ Rp×n

• Since AR = U SV T ,

V T RT (P + AT A)RV = V T RT P RV + S T S = I

• Therefore V T RT P RV = I − S T S is a diagonal matrix Λ:

Λ = V T RT P RV = diag(1 − σ1², . . . , 1 − σp², 1, . . . , 1)

• Congruence transformations preserve inertia:

[ V T RT   0  ] [ P  AT ] [ RV  0 ]   [ Λ   S T ]
[   0      UT ] [ A  0  ] [ 0   U ] = [ S   0   ]

EE364 Review Session 8 13


• Applying a permutation to the matrix on the right gives a block diagonal matrix with diagonal blocks

[ λi  σi ]
[ σi  0  ],   i = 1, . . . , p,      and   λi = 1,   i = p + 1, . . . , n.

• The eigenvalues of the 2 × 2 blocks are

( λi ± √(λi² + 4σi²) ) / 2,

i.e., one eigenvalue is positive and one is negative

• So (p) + (n − p) = n positive eigenvalues, and p negative eigenvalues

EE364 Review Session 8 14


Inequality constrained problems

• Problems with inequality constraints:

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
Ax = b

• Exact reformulation with indicator function:

minimize f0(x) + ∑_{i=1}^m I−(fi(x))
subject to Ax = b

where I−(u) = 0 if u ≤ 0, I−(u) = ∞ otherwise

• Approximation via barrier function:

minimize f0(x) + (1/t) ∑_{i=1}^m h(fi(x))
subject to Ax = b

EE364 Review Session 8 15


• h convex (twice differentiable) increasing function with domain −R++

• Approximation improves as t → ∞
• Logarithmic barrier function: φ(x) = −∑_{i=1}^m log(−fi(x)), with
  dom φ = {x | f1(x) < 0, . . . , fm(x) < 0}

• Approximation via log-barrier:

minimize tf0(x) + φ(x)


subject to Ax = b

is an equality constrained problem

EE364 Review Session 8 16


• Example:
minimize x2 + 1
subject to 2 ≤ x ≤ 4,

• Feasible set is [2, 4], and optimal point x? = 2

• Log-barrier function Î(x) = − log(x − 2) − log(4 − x)

Figure 1: f0(x) + (1/t)Î(x) for t = 10−1, 10−0.8, 10−0.6, . . . , 100.8, 10.

EE364 Review Session 8 17


• Newton step for the approximate problem:

[ t∇2f0(x) + ∇2φ(x)   AT ] [ ∆xnt ]      [ t∇f0(x) + ∇φ(x) ]
[         A            0  ] [ νnt  ]  = − [        0         ]

• Gradient and Hessian of the logarithmic barrier function φ are given by

∇φ(x) = ∑_{i=1}^m (1/(−fi(x))) ∇fi(x),

∇2φ(x) = ∑_{i=1}^m (1/fi(x)²) ∇fi(x)∇fi(x)T + ∑_{i=1}^m (1/(−fi(x))) ∇2fi(x)

EE364 Review Session 8 18
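For linear inequalities fi(x) = aTi x − bi (so ∇fi(x) = ai and ∇2fi(x) = 0), these formulas specialize nicely; a Matlab sketch, assuming A, b, and a strictly feasible x (A*x < b) are given:

% gradient and Hessian of the log barrier phi(x) = -sum(log(b - A*x))
d    = 1 ./ (b - A*x);          % d_i = 1/(-f_i(x))
grad = A' * d;
hess = A' * diag(d.^2) * A;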
