John Duggan
University of Rochester
June 21, 2010
Contents

1 Opening Remarks
2 Unconstrained Optimization
3 Pareto Optimality
  3.1 Existence of Pareto Optimals
  3.2 Characterization with Concavity
  3.3 Characterization with Differentiability
4 Constrained Optimization
5 Equality Constraints
  5.1 First Order Analysis
  5.2 Examples
  5.3 Second Order Analysis
  5.4 Multiple Equality Constraints
6 Inequality Constraints
  6.1 First Order Analysis
  6.2 Concave Programming
  6.3 Second Order Analysis
8 Mixed Constraints
1 Opening Remarks
These notes were written for PSC 408, the second semester of the formal modeling
sequence in the political science graduate program at the University of Rochester. I
hope they will be a useful reference on optimization and Pareto optimality for political scientists, who otherwise would see very little of these subjects, and economists
wanting deeper coverage than one gets in a typical first-year micro class. I do not
invent any new theory, but I try to draw together results in a systematic way and to
build up gradually from the basic problems of unconstrained optimization and optimization with a single equality constraint. That said, Theorem 8.2 may be a slightly
new way of presenting results on convex optimization, and I've strived for quantity
and quality of figures to aid intuition. As alternatives to these notes, I suggest Simon
and Blume (1994), who cover a greater range of topics, and Sundaram (1996), who
is more thorough and technically rigorous. Unfortunately, my notes are not entirely
self-contained and do presume some sophistication with calculus and a bit of linear
algebra and matrix algebra (not too much), and worse yet, I haven't been entirely
consistent with notation for partial derivatives; I hope the meaning of my notation is
clear from context.
2 Unconstrained Optimization

Given a direction t, the directional derivative of f at x is

  Dt f(x) = lim_{λ→0} [f(x + λt) − f(x)]/λ.

If x is an interior local maximizer, then for all sufficiently small λ > 0 we have f(x + λt) − f(x) ≤ 0, so that

  Dt f(x) ≤ 0,

and applying the same argument to the direction −t yields Dt f(x) = 0.
Solving the second equation for x1, we have x1 = 2x2. Substituting this into the first equation, we have x2 − 64x2³ = 0, which has three solutions: x2 = 0, 1/8, −1/8. Then the first order condition has three solutions,

  (x1, x2) = (0, 0), (1/4, 1/8), (−1/4, −1/8),

but the last of these is not in the domain of f, and the first is on the boundary of the domain. Thus, we have a unique solution in the interior of the domain: (x1, x2) = (1/4, 1/8).
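As a quick numerical cross-check of this computation (a sketch, not part of the original development, assuming the sympy library is available), we can solve the first order condition symbolically and recover the same three critical points:

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    f = x1*x2 - 2*x1**4 - x2**2

    # First order condition: both partial derivatives vanish.
    solutions = sp.solve([sp.diff(f, x1), sp.diff(f, x2)], [x1, x2], dict=True)
    real = [s for s in solutions if all(v.is_real for v in s.values())]
    print(real)  # {x1: 0, x2: 0}, {x1: 1/4, x2: 1/8}, {x1: -1/4, x2: -1/8}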
The usual necessary second order condition from univariate calculus extends as well.

Theorem 2.2 Let X ⊆ Rⁿ, let x ∈ X be interior to X, and let f : X → R be twice differentiable. If x is a local maximizer of f, then for every direction t, we have Dt² f(x) ≤ 0.
Proof Assume x is an interior local maximizer of f, let t be an arbitrary direction, and let ε > 0 be such that B_ε(x) ⊆ X and for all y ∈ B_ε(x), f(x) ≥ f(y). Consider a sequence {λn} such that λn ↓ 0, so for sufficiently high n, we have x + λn t ∈ B_ε(x), and therefore f(x + λn t) ≤ f(x). For each such n, the mean value theorem yields λ̃n ∈ (0, λn) such that

  Dt f(x + λ̃n t) = [f(x + λn t) − f(x)]/λn,

and since f(xn) ≤ f(x) for the points xn = x + λn t, we have

  [f(xn) − f(x)]/‖xn − x‖ ≤ 0.
We assume for simplicity that X is closed, though not necessarily compact. To represent the values of f at the extremes of the domain, if it is unbounded, let

  y∞ = lim_{‖x‖→∞} f(x),

where we assume for simplicity that this limit exists, and let

  ȳ = sup{f(x) : x ∈ bd X}

represent the highest value of f on the boundary of its domain. Assume for simplicity that the function f has at most a finite number of critical points. There are then three possibilities.
1. The first order condition has a unique solution x*.

   (a) If Dt² f(x*) < 0 for every direction t, then x* is the unique maximizer.

   (b) If Dt² f(x*) > 0 for every direction t, then x* is the unique minimizer. An element x is a maximizer if and only if it is a boundary point and f(x) ≥ max{y∞, ȳ}. There may be no maximizer.

   (c) Else, x* is the unique maximizer if and only if f(x*) ≥ max{y∞, ȳ}. If this inequality does not hold, then an element x is a maximizer if and only if it is a boundary point and f(x) ≥ max{y∞, ȳ}. There may be no maximizer.

2. There are multiple solutions, say x1, . . . , xk, to the first order condition. An element x is a maximizer if and only if it is a critical point or boundary point and

  f(x) ≥ max{y∞, ȳ, f(x1), . . . , f(xk)}.

   There may be no maximizers.

3. The first order condition has no solution. An element x is a maximizer if and only if it is a boundary point and f(x) ≥ max{y∞, ȳ}. There may be no maximizer.
If X is compact, then f has a maximizer, simplifying the situation somewhat.
Example Returning one last time to X = R²₊ and f(x) = x1x2 − 2x1⁴ − x2², we have noted that (1/4, 1/8) is the unique interior solution to the first order condition and that the second directional derivatives of f are non-positive at this point. Thus, we are in case 1(c). Note that when x1 = 0 or x2 = 0, we have f(x1, x2) ≤ 0, and f(0) = 0. Thus, ȳ = 0. Furthermore, we claim that y∞ = −∞. To see this, take any value c < 0, and suppose ‖(x1, x2)‖ = k. We argue that when k is sufficiently large, we necessarily have f(x1, x2) ≤ c. Rewriting f(x1, x2) as

  f(x1, x2) = −(x2 − x1)² + x1² − x1x2 − 2x1⁴,

we see that f(x1, x2) ≤ c if x1² − 2x1⁴ ≤ c. This in turn holds if x1 ≥ a = max{1, |c|}. Otherwise, x1 < a, and then

  x2 = √(k² − x1²) > √(k² − a²).
Then we have

  xn = (‖y − xn‖/‖y − x‖) x + (‖xn − x‖/‖y − x‖) y,

and concavity of f implies

  f(xn) ≥ (‖y − xn‖/‖y − x‖) f(x) + (‖xn − x‖/‖y − x‖) f(y),

so that

  [f(xn) − f(x)]/‖xn − x‖ ≥ [f(y) − f(x)]/‖y − x‖ > 0,

a contradiction.
Example Assume X = Rⁿ, and note that f(x) = −‖x − y‖² is strictly concave. The first order condition has the unique solution x = y, and we conclude that this is the unique maximizer of the function. (Of course, we could have verified that directly.)
The next result lays the foundation for comparative statics analysis, in which we consider how local maximizers vary with respect to underlying parameters of the problem. Specifically, we study the effect of letting a parameter, say α, vary in the objective function.
In case the matrix algebra in the preceding theorem is a bit hard to digest, we can state the derivative of ξ in terms of partial derivatives when n = 1: it is

  Dξ(α*) = − D_{xα} f(x*, α*) / D_x² f(x*, α*).
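To illustrate (a minimal sketch under an assumed functional form, using sympy), take the parameterized objective f(x, α) = αx − x², whose interior maximizer is ξ(α) = α/2; differentiating the maximizer directly agrees with the formula above:

    import sympy as sp

    x, alpha = sp.symbols('x alpha')
    f = alpha*x - x**2                      # assumed concave example objective

    xi = sp.solve(sp.diff(f, x), x)[0]      # maximizer: xi(alpha) = alpha/2
    direct = sp.diff(xi, alpha)             # derivative of the maximizer
    formula = -sp.diff(f, x, alpha) / sp.diff(f, x, 2)
    print(direct, sp.simplify(formula))     # both equal 1/2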
3 Pareto Optimality

[Figure 2: ideal points x̃1 and x̃2 of the two individuals, their indifference curves through an alternative x, and the lens of alternatives both strictly prefer to x.]
An alternative y Pareto dominates an alternative x if ui(y) ≥ ui(x) for all i, with strict inequality for at least one individual. An alternative is Pareto optimal if there is no alternative that Pareto dominates it.
Consider the case of two individuals and quadratic utility, i.e., ui(x) = −‖x − x̃i‖², and an alternative x, as in Figure 2. It is clear that any alternative in the shaded lens is strictly preferred to x by both individuals, which implies that x is Pareto dominated and, therefore, not Pareto optimal. In fact, this will be true whenever the individuals' indifference curves through an alternative create a lens shape like this. The only way the individuals' indifference curves won't create such a lens is if they meet at a tangency at the alternative x, and this happens only when x lies directly between the two individuals' ideal points. We conclude that, when there are just two individuals and both have Euclidean preferences, the set of Pareto optimal alternatives is the line segment connecting the two ideal points. See Figure 3 for elliptical indifference curves, in which case the set of Pareto optimal alternatives is a curve connecting the two ideal points. This motivates the standard terminology: when there are just two individuals, we refer to the set of Pareto optimal alternatives as the contract curve.
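The lens logic can be checked numerically. The sketch below (not from the notes; the ideal points and the grid are assumed for illustration, using numpy) searches a grid for alternatives that make both individuals strictly better off, finding none at a point on the segment between the ideal points and at least one at a point off the segment:

    import numpy as np

    ideal1, ideal2 = np.array([0.0, 0.0]), np.array([1.0, 1.0])  # assumed ideal points
    u = lambda x, ideal: -np.sum((x - ideal)**2)                 # quadratic utility

    def strictly_dominated(x, grid):
        # True if some grid point is strictly better for both individuals.
        return any(u(y, ideal1) > u(x, ideal1) and u(y, ideal2) > u(x, ideal2)
                   for y in grid)

    grid = [np.array([a, b]) for a in np.linspace(-0.5, 1.5, 81)
            for b in np.linspace(-0.5, 1.5, 81)]
    print(strictly_dominated(np.array([0.5, 0.5]), grid))  # False: on the segment
    print(strictly_dominated(np.array([0.8, 0.2]), grid))  # True: off the segment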
3.1 Existence of Pareto Optimals

Theorem 3.1 Assume λ1, . . . , λn > 0. If x solves

  max_{y∈A} Σ_{i=1}^n λi ui(y),

then x is Pareto optimal.
Proof Suppose x solves the above maximization problem but there is some alternative y that Pareto dominates it. Since ui(y) ≥ ui(x) for each i, each term λi ui(y) is at least as great as λi ui(x). And since there is some individual, say j, such that uj(y) > uj(x), and since λj > 0, there is at least one y-term that is strictly greater than the corresponding x-term. But then

  Σ_{i=1}^n λi ui(y) > Σ_{i=1}^n λi ui(x),

a contradiction.
From the preceding sufficient condition, we can then deduce the existence of at least
one Pareto optimal alternative very generally.
Theorem 3.2 Assume A ⊆ R^d is compact and each ui is continuous. Then there
exists a Pareto optimal alternative.
Proof Define the function f : A → R by f(x) = Σ_{i=1}^n λi ui(x) for all x, where λ1, . . . , λn > 0. Note that f is continuous, and so it achieves a maximum over the compact set A. Letting x be a maximizer, this alternative is Pareto optimal by Theorem 3.1.
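For a finite set of alternatives, this argument can be run directly: maximize a positively weighted sum of utilities and confirm that the maximizer is undominated. A small sketch (with assumed random data; numpy):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.random((50, 2))                      # 50 alternatives in the plane (assumed)
    ideals = [np.array([0.2, 0.8]), np.array([0.9, 0.1])]
    U = np.column_stack([-np.sum((A - c)**2, axis=1) for c in ideals])

    lam = np.array([0.3, 0.7])                   # strictly positive weights
    star = np.argmax(U @ lam)                    # maximizer of the weighted sum
    dominated = np.any((U >= U[star]).all(axis=1) & (U > U[star]).any(axis=1))
    print(dominated)                             # False: the maximizer is Pareto optimal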
We have shown that if an alternative maximizes the sum of utilities for strictly positive
weights, then it is Pareto optimal. The next result imposes Euclidean structure on
the set of alternatives and individual utilities, namely strict quasi-concavity, and
strengthens the result of Theorem 3.1 by weakening the sufficient condition to allow
some weights to be zero.
Theorem 3.3 Assume A ⊆ R^d is convex and each ui is strictly quasi-concave. Let λ1, . . . , λn ≥ 0 with at least one λi > 0. If x solves

  max_{y∈A} Σ_{i=1}^n λi ui(y),

then x is Pareto optimal.
Our sufficient condition for Pareto optimality for general utilities, Theorem 3.1, relies on all coefficients λi being strictly positive, while Theorem 3.3 weakens this for strictly quasi-concave utilities to at least one positive λi. In general, we cannot state a sufficient condition that allows some coefficients to be zero, even if we replace strict quasi-concavity with concavity.
Example Let there be two individuals, A = [0, 1], u1(x) = x, and u2(x) = 0. These utilities are concave, and x = 0 maximizes λ1u1(x) + λ2u2(x) with weights λ1 = 0 and λ2 = 1, but it is obviously not Pareto optimal.

In the latter example, of course the problem max_{x∈[0,1]} λ1u1(x) + λ2u2(x) (with λ1 = 0 and λ2 = 1) has multiple (in fact, an infinite number of) solutions. Next, we provide a different sort of sufficient condition, relying on uniqueness of solutions to the social welfare problem, for Pareto optimality.
Theorem 3.4 Assume that for weights λ1, . . . , λn ≥ 0 (not all zero), the problem

  max_{y∈A} Σ_{i=1}^n λi ui(y)

has a unique solution. If x solves the above maximization problem, then it is Pareto optimal.
[Figure 4: the convex set U of utility imputations and the open set V; the utility vector (u1(x*), . . . , un(x*)) lies on the boundary of U, with the weight vector λ = (λ1, . . . , λn) normal to a separating hyperplane. Axes: utility for 1, utility for 2.]
3.2 Characterization with Concavity

As yet, we have derived sufficient, but not necessary, conditions for Pareto optimality. To provide a more detailed characterization of the Pareto optimal alternatives under convexity and concavity conditions, we define the set of utility imputations as

  U = {z ∈ Rⁿ : there exists x ∈ A s.t. (u1(x), . . . , un(x)) ≥ z}.
Intuitively, given an alternative x, we may consider the vector (u1 (x), . . . , un (x))
of utilities generated by x. Note that this vector lies in Rⁿ, whose dimension equals the number of individuals. The set of utility imputations consists
of all such utility vectors, as well as any vectors less than or equal to them. See Figure
4 for the n = 2 case.
The next lemma gives some useful technical properties of the set of utility imputations.
In particular, assuming the set of alternatives is convex and utilities are concave, it
establishes that the set U of imputations is convex. See Figure 4.
Lemma 3.5 Assume A ⊆ R^m is convex and each ui is concave. Then U is convex. Furthermore, if each ui is strictly concave, then for all distinct x, y ∈ A and all . . .

. . . and (u1(x*), . . . , un(x*)) ≥ z*.
Next, assuming utilities are concave, we derive a necessary condition for Pareto optimality: if an alternative x is Pareto optimal, then there is a vector of non-negative weights λ = (λ1, . . . , λn) (not all zero) such that x maximizes the sum of individual utilities with those weights. Note that we do not claim that x must maximize the sum of utilities with strictly positive weights.
Theorem 3.6 Assume A ⊆ R^d is convex and each ui is concave. If x* is Pareto optimal, then there exist weights λ1, . . . , λn ≥ 0 (not all zero) such that x* solves

  max_{y∈A} Σ_{i=1}^n λi ui(y).
Proof Define V ⊆ Rⁿ as the set of vectors strictly greater than the utility vector (u1(x*), . . . , un(x*)) in each coordinate. For the remainder of the proof, let z* = (u1(x*), . . . , un(x*)) be the utility vector associated with x*. The set V is nonempty, convex, and open (and so has nonempty interior).
The set U of imputations is nonempty and, by Lemma 3.5, convex. Note that U ∩ V = ∅, for suppose otherwise. Then there exists z ∈ U ∩ V, which implies the existence of x ∈ A such that

  (u1(x), . . . , un(x)) ≥ z > z*.

But then we have x Pi x* for all i ∈ N, contradicting our assumption that x* is Pareto optimal. Therefore, by the separating hyperplane theorem, there is a hyperplane H that separates U and V. Let H be generated by the linear function f at value c, and let λ = (λ1, . . . , λn) ∈ Rⁿ be the non-zero gradient of f. Then we may assume without loss of generality that for all z ∈ U and all w ∈ V, we have f(z) ≤ c ≤ f(w), i.e., λ · z ≤ c ≤ λ · w. We claim that λ · z* = c, and in particular that x* solves the maximization problem in the theorem. Since z* ∈ U, it follows immediately that f evaluated at this vector is less than or equal to c. Suppose it is strictly less, i.e., λ · z* < c. Given ε > 0, define w = z* + ε(1, 1, . . . , 1), and note that w ∈ V, and therefore λ · w ≥ c. But for ε sufficiently small, we in fact have λ · w < c, a contradiction. That x* solves the maximization problem in the theorem then follows immediately: for all x ∈ A, we have (u1(x), . . . , un(x)) ∈ U, and then

  λ · (u1(x), . . . , un(x)) ≤ c = λ · z*,

or equivalently,

  Σ_{i∈N} λi ui(x) ≤ Σ_{i∈N} λi ui(x*),

as claimed. Finally, we claim that λ ∈ Rⁿ₊, i.e., λi ≥ 0 for all i ∈ N. To see this, suppose that λi < 0 for some i. Then we may define the vector w = z* + γei, and for γ high enough, we have

  λ · w = λ · z* + γλi < λ · z*.

For all ε > 0, we have w_ε = w + ε(1, 1, . . . , 1) ∈ V, and therefore λ · w_ε ≥ c. But we may choose ε > 0 sufficiently small that λ · w_ε < λ · z* = c, a contradiction. Thus, λ ∈ Rⁿ₊ \ {0}.
The proof of the previous result uses the separating hyperplane theorem and the following insight. We can think of the social welfare function above as merging two steps: first we apply individual utility functions to an alternative x to get a vector, say z = (z1, . . . , zn), of individual utilities, and then we take the dot product λ · z to get the social welfare from x. Of course, dot products are equivalent to linear functions, so we can view the second step as applying a linear function f : Rⁿ → R to the vector of utilities. Geometrically, when n = 2, we can draw the level sets of the linear function, and if x* maximizes social welfare with weights λ, then the vector of utilities from x*, denoted (u1(x*), . . . , un(x*)), must maximize the linear function over the set U of utility imputations. See Figure 4.
[Figure 5: the set U of utility imputations, the point (u1(1), u2(1)) on its boundary, and a weight vector λ = (λ1, λ2) ≥ 0. Axes: utility for 1, utility for 2.]
Corollary 3.7 Assume A ⊆ R^d is convex and each ui is strictly concave. Then x is Pareto optimal if and only if there exist weights λ1, . . . , λn ≥ 0 (not all zero) such that x solves max_{y∈A} Σ_{i=1}^n λi ui(y).

The condition that the weights are non-negative but not all zero cannot be strengthened to the condition that they are all strictly positive in the necessary condition of Theorem 3.6 and Corollary 3.7.
Example Suppose there are two individuals who must choose an alternative in the unit interval, A = [0, 1], with quadratic utilities: u1(x) = −x² and u2(x) = −(1 − x)². Then x = 1 is Pareto optimal, yet there do not exist strictly positive weights λ1, λ2 > 0 such that x maximizes λ1u1(y) + λ2u2(y). See Figure 5. Given any strictly positive weights, λ1 and λ2, the level set through (u1(1), u2(1)) of the linear function with gradient (λ1, λ2) cuts through the set of utility imputations; thus, (u1(1), u2(1)) does not maximize the linear function over the set of imputations.
The previous corollary uses the assumption of strict concavity to provide a full characterization of Pareto optimality. It is simple to deduce a more general conclusion
that relies instead on the uniqueness condition of Theorem 3.4.
Corollary 3.8 Assume that for all weights λ1, . . . , λn ≥ 0 (not all zero), the problem

  max_{y∈A} Σ_{i=1}^n λi ui(y)

has a unique solution. Then x is Pareto optimal if and only if there exist weights λ1, . . . , λn ≥ 0 (not all zero) such that x solves the above maximization problem.
One direction follows immediately from Theorem 3.6. Under the conditions of the
corollary, suppose x solves the maximization problem for some non-negative weights
(not all zero). Then Theorem 3.4 implies x is Pareto optimal, as required.
With the necessary condition for Pareto optimality established in Theorem 3.6, we can use calculus techniques to calculate contract curves in simple examples with two individuals. Let x* ∈ int X be Pareto optimal, which therefore maximizes λ1u1(x) + λ2u2(x) for some λ1, λ2 ≥ 0 such that λ1 + λ2 > 0. Then the first order necessary condition holds, and for all coordinates j, k = 1, . . . , n, we have

  λ1 Dj u1(x*) + λ2 Dj u2(x*) = 0
  λ1 Dk u1(x*) + λ2 Dk u2(x*) = 0.

Note that when Dk u1(x*) ≠ 0 and Dk u2(x*) ≠ 0, we have

  Dj u1(x*)/Dk u1(x*) = Dj u2(x*)/Dk u2(x*).
That is, the marginal rates of substitution of k for j are equal for the two individuals,
i.e., their indifference curves are tangent, as in Figures 2 and 3. And although the
machinery we have developed thus far requires the utilities u1 and u2 in the preceding
discussion to be concave, we will see that the analysis extends more generally.
Example Suppose A = R^d and each ui is quadratic, i.e., ui(x) = −‖x − x̃i‖². Since quadratic utilities are strictly concave, it follows that x is Pareto optimal if and only if there exist weights λ1, . . . , λn ≥ 0 (not all zero) such that x solves

  max_{y∈A} Σ_{i=1}^n λi ui(y).

Furthermore, since each ui is strictly concave, the function Σ_{i=1}^n λi ui(x) is strictly concave, so x is a solution to the above maximization problem if and only if it solves the first order condition

  0 = D Σ_{i=1}^n λi ui(x) = Σ_{i=1}^n 2λi(x̃i − x),

or

  x* = Σ_{i=1}^n [λi / Σ_{j=1}^n λj] x̃i.

Finally, writing αi = λi / Σ_{j=1}^n λj, we have αi ≥ 0 for all i, Σ_{i=1}^n αi = 1, and

  x* = Σ_{i=1}^n αi x̃i,

so the Pareto optimal alternatives are exactly the convex combinations of the individuals' ideal points.
3.3 Characterization with Differentiability

When utilities are differentiable, we can sharpen the characterization of the previous subsection. We first note that at an interior Pareto optimal alternative, the gradients of the individuals are linearly dependent.

Theorem 3.9 Assume A ⊆ R^d, let x be interior to A, and assume each ui is differentiable at x. If x is Pareto optimal, then there exist λ1, . . . , λn ≥ 0 (not all zero) such that Σ_{i=1}^n λi Dui(x) = 0.
Proof If there do not exist such weights, then 0 ∉ conv{Du1(x), . . . , Dun(x)}. Then by the separating hyperplane theorem, there is a non-zero vector p ∈ R^d such that p · Du1(x) > 0, . . . , p · Dun(x) > 0. Then there exists ε > 0 such that x + εp ∈ A and ui(x + εp) > ui(x) for all i, contradicting Pareto optimality of x.
An easy implication of Theorem 3.9 is a differentiable version of Theorem 3.6. Indeed, if each ui is differentiable and concave and x is Pareto optimal, then there are weights λ1, . . . , λn ≥ 0 (not all zero) such that x satisfies the first order condition for max_{y∈A} Σ_{i=1}^n λi ui(y), and by concavity, x solves the maximization problem.
We can take a geometric perspective by defining the mapping u : X → Rⁿ from alternatives to vectors of utilities, i.e., u(x) = (u1(x), . . . , un(x)). Then the derivative of u at x is the matrix

  Du(x) = [ ∂u1/∂x1(x)  · · ·  ∂u1/∂xd(x) ]
          [     ⋮         ⋱        ⋮      ]
          [ ∂un/∂x1(x)  · · ·  ∂un/∂xd(x) ]

The span of the columns is a linear subspace of Rⁿ called the tangent space of u at x. Theorem 3.9 implies that at a Pareto optimal alternative, the rank of this derivative is n − 1 or less. By Pareto optimality, u(x) belongs to the boundary of u(X). Furthermore, the theorem implies
  (λ1, . . . , λn) [ ∂u1/∂x1(x)  · · ·  ∂u1/∂xd(x) ]
                   [     ⋮         ⋱        ⋮      ]  =  0.
                   [ ∂un/∂x1(x)  · · ·  ∂un/∂xd(x) ]
[Figure 7: Unique weights. The vectors ∂u/∂x1(x), ∂u/∂x2(x), ∂u/∂x3(x) span the tangent space at u(x) on the boundary of u(X), and the normal space at u(x) is one-dimensional. Axes: utility for 1, utility for 2, utility for 3.]
See Figure 7 for the three-individual case. Then the normal space is one-dimensional, and the uniqueness claim follows.
Theorem 3.10 Assume A ⊆ R^d, let x be interior to A, and assume each ui is differentiable at x and that Du(x) has rank n − 1. Then there exist λ1, . . . , λn ≥ 0 (not all zero) such that Σ_{i=1}^n λi Dui(x) = 0, and these weights are unique up to a positive scaling.
The rank condition used in the previous result, while reasonable in some contexts,
is restrictive; it implies, for example, that the set of alternatives has dimension at
least n − 1. Note that the condition that the weights are non-negative and not all
zero implies that the tangent line at u(x) is downward sloping when n = 2, and it
formalizes the idea that the boundary of u(X) at u(x) is downward sloping for any
number of individuals.
4 Constrained Optimization

The general problem with inequality constraints takes the form

  max_{x∈Rⁿ} f(x)
  s.t. g1(x) ≤ c1
       ⋮
       gm(x) ≤ cm,
now defining f on the entire Euclidean space and building any restrictions on the domain into the constraints of the problem. Of course, it may be that gj(x) = −xj and cj = 0, so the jth constraint is just a non-negativity constraint: xj ≥ 0. Note that the problem of equality constraints is a special case of inequality constraints: we can always convert g(x) = c into two inequalities, g(x) ≤ c and −g(x) ≤ −c.
Finally, we consider the general problem with mixed constraints, where we are given equality constraints g1 : Rⁿ → R, . . . , gℓ : Rⁿ → R and inequality constraints h1 : Rⁿ → R, . . . , hm : Rⁿ → R. Then the form of the constraint set is

  C = {x ∈ Rⁿ | g1(x) = c1, . . . , gℓ(x) = cℓ, h1(x) ≤ d1, . . . , hm(x) ≤ dm}.

We consider the hybrid optimization

  max_{x∈Rⁿ} f(x)
  s.t. gj(x) = cj, j = 1, . . . , ℓ
       hj(x) ≤ dj, j = 1, . . . , m,

again defining f on the entire Euclidean space.
5 Equality Constraints

5.1 First Order Analysis
[Figure 8: Constrained local maximizer. Level sets of f, the constraint curve g = c, the gradients Df(x) and Dg(x) at the constrained local maximizer x, and another point y on the constraint curve.]
[Figure 9: A level set of f, the constraint curve g = c, the gradients Df(x) and Dg(x) at a point x, and another point y on the constraint curve.]
[Figure 10: Proof of Lagrange. The constraint curve g = c through x = 0, the gradient Dg(x), the interval I on the x1-axis, and the graph of the implicit function ψ(z).]
Theorem 5.1 (Lagrange) Let X ⊆ Rⁿ, f : X → R, and g : Rⁿ → R. Assume f and g are continuously differentiable in an open neighborhood around x, an interior point of X. Also assume Dg(x) ≠ 0. If x is a constrained local maximizer of f subject to g(x) = c, then there is a unique multiplier λ ∈ R such that

  Df(x) = λDg(x).   (1)
Proof I provide a heuristic argument for the case of two variables. The idea is to transform the constrained problem into an unconstrained one. The theorem assumes that Dg(x) ≠ 0, and (only to simplify notation) we will assume x = 0 and D2g(x) ≠ 0. The implicit function theorem implies that in an open interval I around x1 = 0, we may then view the level set of g at c as the graph of a function ψ : I → R such that for all z ∈ I, g(z, ψ(z)) = c. See Figure 10. Note that 0 = x = (0, ψ(0)). Furthermore, ψ is continuously differentiable with derivative

  Dψ(z) = − D1g(z, ψ(z)) / D2g(z, ψ(z)).   (2)

Because x is interior to X, we can choose the interval I small enough that each (z, ψ(z)) belongs to the domain X of the objective function. Then z = 0 is a local maximizer of the unconstrained problem

  max_{z∈I} f(z, ψ(z)),

and we know the first order condition holds, i.e., differentiating with respect to z and using the chain rule, we have

  D1f(0) + D2f(0)Dψ(0) = 0,

which implies

  D1f(0) − D2f(0) D1g(0)/D2g(0) = 0.

Defining λ = D2f(0)/D2g(0), we have Df(0) = λDg(0), as desired.
Of course, the first order condition from Lagrange's theorem can be written in terms of partial derivatives:

  ∂f/∂xi(x) = λ ∂g/∂xi(x)

for all i = 1, . . . , n. Thus, the theorem gives us n + 1 equations (including the constraint) in n + 1 unknowns (including λ). If we can solve for all of the solutions of this system, then we have an upper bound on the interior constrained local maximizers. Remember: the theorem of Lagrange gives a necessary condition for a constrained local maximizer, not a sufficient one; the solutions to the first order condition may not be local maximizers.

The number λ is the Lagrange multiplier corresponding to the constraint. The condition Dg(x) ≠ 0 is called the constraint qualification. Without it, the result would not be true.
Example Consider X = R, f(x) = (x + 1)², and g(x) = x². Consider the problem of maximizing f subject to g(x) = 0. The maximizer is clearly x = 0. But Dg(0) = 0 and Df(0) = 2, so there can be no λ such that Df(0) = λDg(0).
There is an easy way to remember the conditions in Lagrange's theorem: if x is an interior constrained local maximizer of f subject to g(x) = c, and if Dg(x) ≠ 0, then there exists λ ∈ R such that (x, λ) is a critical point of the function L : X × R → R defined by

  L(x, λ) = f(x) + λ(c − g(x)).
That is, there exists λ ∈ R such that

  ∂L/∂x1(x, λ) = ∂f/∂x1(x) − λ ∂g/∂x1(x) = 0
       ⋮
  ∂L/∂xn(x, λ) = ∂f/∂xn(x) − λ ∂g/∂xn(x) = 0
  ∂L/∂λ(x, λ) = c − g(x) = 0,
which is equivalent to the first order condition (1). The function L is called the Lagrangian function.

Though it's not quite technically correct, it's as though we've converted a constrained maximization problem into an unconstrained one: maximizing the Lagrangian L(x, λ) with respect to x. Imagine allowing x's that violate the constraint; for example, suppose, at a constrained maximizer x*, that we could increase the value of f by moving from x* to a nearby point x with g(x) > c. Since this x violates the constraint, we don't want this to be profitable, so the Lagrangian has to impose a cost of doing so in the amount λ(c − g(x)) (here, λ has to be positive). Then λ is like a price of violating the constraint imposed by the Lagrangian. The reason why this is not technically correct is that given the multiplier λ, a constrained local maximizer need not be a local maximizer of L(·, λ).
Example Consider X = R, f(x) = (x − 1)³ + x, g(x) = x, and

  max_{x∈R} f(x)
  s.t. g(x) = 1.

The unique solution to the constraint, and therefore to the maximization problem, is x = 1. Note that Df(x) = 3(x − 1)² + 1 and Dg(x) = 1, and evaluating at the solution x = 1, we have Df(1) = 1 = Dg(1). Thus, the multiplier for this problem is λ = 1. The Lagrangian is

  L(x, λ) = (x − 1)³ + x + λ(1 − x),

and evaluated at λ = 1, this becomes

  L(x, 1) = (x − 1)³ + 1.

But note that this function is strictly increasing at x = 1, i.e., for arbitrarily small ε > 0, we have L(1 + ε, 1) > L(1, 1), so x = 1 is not a local maximizer of L(·, 1).
Note the following implication of Lagrange's theorem: at a constrained local maximizer x, we have

  [∂f/∂xi(x)] / [∂f/∂xj(x)] = [∂g/∂xi(x)] / [∂g/∂xj(x)]

for all i and j. The lefthand side is the marginal rate of substitution telling us the value of xi in terms of xj. The righthand side tells us the cost of xi in terms of xj. Lagrange tells us that, at an interior local maximizer, those have to be the same.
Recall that Lagrange's theorem only gives us a necessary, not a sufficient, condition for a constrained local maximizer. To see why the first order condition is not generally sufficient, consider the following example.
Theorem 5.2 Let X be open and convex, let f : X → R be quasi-concave and continuously differentiable, and let g : Rⁿ → R be linear. If g(x) = c and there exists λ ∈ R such that the first order condition (1) holds at x, then x is a constrained global maximizer of f subject to g(x) = c provided either of two conditions holds:

1. Df(x) ≠ 0, or

2. f is concave.

The preceding example shows that the first order condition is not sufficient for a local maximizer (and a fortiori, not for a global maximizer). One approach to this problem, taken above, is to add the assumption of non-zero gradient. An alternative is to strengthen the first order condition to the assumption that x is a local maximizer... but this hope is not realized: in the previous example, re-define f to be constant at zero whenever x2 < 0, leaving the definition unchanged whenever x2 ≥ 0; then every vector with x2 < 0 is a local maximizer (right?) but not a global maximizer. We end by strengthening these assumptions even further, and deducing an even stronger condition: if f is quasi-concave and x is a constrained strict local maximizer, then x is the unique global maximizer.
Theorem 5.3 Let X ⊆ Rⁿ be convex, let f : X → R be quasi-concave, and let g : Rⁿ → R be linear. If x ∈ X is a constrained strict local maximizer, then it is the unique constrained global maximizer, i.e., it is the unique maximizer of f subject to g(x) = c.

Proof Assume x ∈ X is a constrained strict local maximizer, and suppose there exists y ∈ X with y ≠ x such that g(y) = c and f(y) ≥ f(x). Let ε > 0 be such that for all z ∈ X ∩ C ∩ B_ε(x) with z ≠ x, we have f(x) > f(z). Given any α with 0 < α < 1, define z(α) = αy + (1 − α)x. Then quasi-concavity implies f(z(α)) ≥ min{f(x), f(y)} = f(x). Furthermore, with g(x) = g(y) = c, linearity of g implies g(z(α)) = c. But for small enough α > 0, we have z(α) ∈ X ∩ C ∩ B_ε(x) and f(z(α)) ≥ f(x), a contradiction.
Of course, if f is strictly quasi-concave and x is a constrained local maximizer, then
it is a constrained strict local maximizer, and the theorem can be applied.
5.2 Examples
Note that we impose the constraint that the consumer must spend all of his income; since we assume monotonicity, this is without loss of generality. The set X ∩ C = R²₊ ∩ {(x1, x2) | p1x1 + p2x2 = I} is compact (since p1, p2 > 0), and u is continuous, so the maximization problem has a solution. We can apply Lagrange's theorem with

  f(x1, x2) = u(x1, x2)
  g(x1, x2) = p1x1 + p2x2
  c = I

to find all the constrained local maximizers (x1, x2) interior to R²₊ (i.e., x1, x2 > 0) satisfying Dg(x1, x2) ≠ 0. In fact, for all (x1, x2) ∈ R²₊,

  Dg(x1, x2) = (p1, p2) ≠ 0,

so the constraint qualification is always met. Letting (x1, x2) be an interior constrained local maximizer, there exists λ ∈ R such that (x1, x2, λ) is a critical point of the Lagrangian:

  L(x1, x2, λ) = u(x1, x2) + λ(I − p1x1 − p2x2).
That is,

  ∂L/∂x1(x1, x2, λ) = ∂u/∂x1(x1, x2) − λp1 = 0
  ∂L/∂x2(x1, x2, λ) = ∂u/∂x2(x1, x2) − λp2 = 0
  ∂L/∂λ(x1, x2, λ) = I − p1x1 − p2x2 = 0.
Solving these equations gives us the critical points of the Lagrangian, and if a maximizer (x1, x2) is interior to R²₊ (x1, x2 > 0), then it will be one of these critical points. Note that

  [∂u/∂x1(x1, x2)] / [∂u/∂x2(x1, x2)] = p1/p2,
i.e., the relative value of x1 in terms of x2 equals the relative price.

Consider the Cobb-Douglas special case u(x1, x2) = x1^α x2^β, where α, β > 0. It's clear that every maximizer must be interior to R²₊. (Right?) The critical points of the Lagrangian satisfy

  α x1^{α−1} x2^β − λp1 = 0
  β x1^α x2^{β−1} − λp2 = 0.

Divide to get

  (α x2)/(β x1) = p1/p2,  or  x2 = (β p1)/(α p2) x1.

Substituting into the budget constraint and solving, we have

  x1 = αI/((α + β)p1)  and  x2 = βI/((α + β)p2).

Since this critical point is unique, it is the unique maximizer, and we call

  x1(p1, p2, I) = αI/((α + β)p1)
  x2(p1, p2, I) = βI/((α + β)p2)

demand functions. They tell us the consumer's consumption for different prices and incomes. Fixing p2 and I, we can graph x1 as a function of p1, which gives us the demand curve for good 1. We can also solve for λ by substituting into α x1^{α−1} x2^β = λp1. This gives us

  λ = (α/p1)^α (β/p2)^β (I/(α + β))^{α+β−1}.
If α + β = 1, then the last term drops out. Note that we can always take a strictly increasing transformation of Cobb-Douglas utilities to obtain α + β = 1 without altering the consumer's demand functions, but such a transformation can affect the Lagrange multiplier.
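The Cobb-Douglas computation can be reproduced symbolically. The sketch below (assuming sympy, with the exponents fixed at α = β = 1 so the system is easy for the solver) solves the three critical-point equations of the Lagrangian:

    import sympy as sp

    x1, x2, lam, p1, p2, I = sp.symbols('x1 x2 lam p1 p2 I', positive=True)
    u = x1 * x2                               # Cobb-Douglas with alpha = beta = 1
    L = u + lam*(I - p1*x1 - p2*x2)           # the Lagrangian

    sol = sp.solve([sp.diff(L, x1), sp.diff(L, x2), sp.diff(L, lam)],
                   [x1, x2, lam], dict=True)[0]
    print(sol[x1], sol[x2], sol[lam])
    # I/(2*p1), I/(2*p2), I/(2*p1*p2): the demand functions and multiplier above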
Example Now consider the distributive model of social choice, where the set of alternatives is the unit simplex,

  X = {x ∈ Rⁿ₊ | Σ_{i=1}^n xi = 1},

and each individual simply wants more of this scarce resource for him- or herself. Formally, assume each i's utility function ui(x1, . . . , xn) is strictly increasing in xi and invariant with respect to reallocations of the resource among other individuals. Consider the welfare maximization problem of a social planner with non-negative weights λ1, . . . , λn (not all zero):

  max_{x∈X} Σ_{i=1}^n λi ui(x)
  s.t. Σ_{i=1}^n xi = 1.
In contrast to unconstrained maximization, where the first order condition means that the marginal impact of changing each choice variable is zero, now an interior allocation can be a local maximizer only if the marginal impacts are equalized across individuals. If a local maximum involves some individuals receiving an allocation of zero, then the logic extends to all individuals who receive a positive amount of the resource. Now consider the special case ui(x) = ln(xi). (Henceforth, we only consider alternatives in which each individual receives a strictly positive amount of the resource, so utilities are well-defined.) These utilities are concave in x but not strictly concave or even strictly quasi-concave. Given the structure of the set of alternatives and utilities, we can write the first order condition as

  ∂/∂xi [Σ_{j=1}^n λj ln(xj)] = λi/xi = μ for all i,  and  Σ_{i=1}^n xi = 1,

where μ is the multiplier on the resource constraint. Note that

  e^{Σ_{i=1}^n λi ln(xi)} = Π_{i=1}^n e^{λi ln(xi)} = Π_{i=1}^n xi^{λi},

which has the form of a Cobb-Douglas utility function with exponent λi on xi; thus, the above problem is isomorphic to the problem of a Cobb-Douglas consumer facing unit prices p1 = · · · = pn = 1 and income I = 1. Because the maximization problem has a unique solution for all such weights, the characterization result of Corollary 3.8 applies, and so we have solved for all Pareto optimal alternatives. In fact, by varying the weights λ1, . . . , λn, we conclude that every alternative is Pareto optimal, a fact that was pretty obvious from the outset (right?).
Example Prior to a national election, suppose a political party must decide how much to spend in a number of electoral districts i = 1, . . . , n. Let xi denote the amount spent in district i, and assume x1 ≥ 0, . . . , xn ≥ 0, Σ_{i=1}^n xi = I. The probability the party wins district i is Pi(xi), where Pi : R₊ → R is a differentiable function. The party seeks to maximize the expected number of districts it wins, i.e.,

  max_{(x1,...,xn)∈Rⁿ₊} Σ_{i=1}^n Pi(xi)
  s.t. x1 + · · · + xn = I.
Again, the first order conditions reduce to the following simple principle: allocate money to districts in a way that equalizes the marginal probability of victory across the districts. Note that the special case Pi(xi) = αi ln(xi) is equivalent to the Cobb-Douglas specification of the consumer's problem. For an alternative parameterization, it could be that

  Pi(xi) = (βi + xi)/(β + xi),

where βi < β and βi may vary across districts. Writing γi = β − βi, the first order condition is

  γ1/(β + x1)² = γ2/(β + x2)² = · · · = γn/(β + xn)².
The solutions to these equations will include all interior maximizers, if any. (Whether there are any will depend on the βi's. If βi is close to β, so the probability of victory is close to one, spending in district i will be low, and for extreme enough parameters spending in some districts will be zero.) The n = 2 case is analytically tractable:

  x1 = (δ1 I + β(δ1 − δ2))/(δ1 + δ2)
  x2 = (δ2 I + β(δ2 − δ1))/(δ1 + δ2),

where δ1 = √γ1 and δ2 = √γ2.
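These closed forms can be checked against a numerical solver. The sketch below (assumed parameter values; scipy's SLSQP handles the budget constraint) maximizes the expected number of districts won and compares the result with the formulas for x1 and x2:

    import numpy as np
    from scipy.optimize import minimize

    beta, I = 1.0, 1.0
    gamma = np.array([1.0, 0.81])            # gamma_i = beta - beta_i (assumed values)
    beta_i = beta - gamma

    # Maximize sum_i P_i(x_i) = sum_i (beta_i + x_i)/(beta + x_i).
    obj = lambda x: -np.sum((beta_i + x) / (beta + x))
    res = minimize(obj, x0=np.array([0.5, 0.5]), method='SLSQP',
                   bounds=[(0, None)]*2,
                   constraints={'type': 'eq', 'fun': lambda x: x.sum() - I})

    d = np.sqrt(gamma)
    x1 = (d[0]*I + beta*(d[0] - d[1])) / d.sum()
    print(res.x, np.array([x1, I - x1]))     # both approximately (0.579, 0.421)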
5.3 Second Order Analysis
Second order necessary and sufficient conditions are more complicated than they were in unconstrained optimization. As in the first order analysis, a condition on second order directional derivatives needs to be satisfied at an interior constrained local maximizer, but only in directions t that are orthogonal to the gradient of g at x, i.e., tangent to the level set of g at c. Once again, we must deal with the complication that we can only move along the level set of g at c; moving from a constrained local maximizer in such a direction t can violate the constraint if g is non-linear. Once again, the insight is to convert the constrained optimization problem into an unconstrained one using the implicit function theorem.
Theorem 5.4 Let f : X → R and g : Rⁿ → R be twice continuously differentiable in an open neighborhood around x, an interior point of X. Assume Dg(x) ≠ 0. Assume x is a constrained local maximizer of f subject to g(x) = c, and let λ ∈ R satisfy the first order condition (1). Then

  t′[D²f(x) − λD²g(x)]t ≤ 0   (3)

for all directions t with Dg(x) · t = 0.
Proof We give a heuristic proof for the two-variable case similar to that of Lagrange's theorem. As above, we have Dg(x) ≠ 0, and we assume for simplicity that x = 0 and D2g(x) ≠ 0. To further simplify matters, assume that the gradient of g at x points straight up, so D1g(x) = 0. (This just amounts to a rotation of axes that doesn't affect the second order analysis.) Again, we have an open interval I around x1 = 0 and a continuously differentiable function ψ : I → R such that for all z ∈ I, we have g(z, ψ(z)) = c. Because we assume Dg(0) = (0, D2g(0)), this means Dg(x) · t = 0 for exactly two directions, i.e., t = (1, 0) and t = (−1, 0). In either case, the necessary second order condition is

  D1²f(0) ≤ λD1²g(0).

To obtain this, we note again that z = 0 is a local maximizer of the unconstrained problem

  max_{z∈I} F(z), where F(z) = f(z, ψ(z)).

Since Dψ(0) = 0, differentiating (2) yields

  D²ψ(0) = − D1²g(0)/D2g(0).

Substituting the latter expression for D²ψ(0) into the inequality D²F(0) ≤ 0, we have

  D1²f(0) − D2f(0) D1²g(0)/D2g(0) ≤ 0.

Finally, recall that the first order condition (1) implies D2f(0) = λD2g(0), so the preceding inequality becomes D1²f(0) ≤ λD1²g(0), as required.
We can write the necessary second order condition in (3) in terms of the Lagrangian. Recall the Lagrangian of an equality constrained maximization problem is defined as

  L(x, λ) = f(x) + λ(c − g(x)).

Suppose the first order condition (1) holds at x with multiplier λ, i.e.,

  D_x L(x, λ) = Df(x) − λDg(x) = 0,

and define the Hessian of the Lagrangian with respect to x as

  D_x²L(x, λ) = [ ∂²L/∂x1²(x, λ)     · · ·  ∂²L/∂x1∂xn(x, λ) ]
                [       ⋮             ⋱           ⋮          ]
                [ ∂²L/∂xn∂x1(x, λ)  · · ·  ∂²L/∂xn²(x, λ)    ]

Then condition (3) is equivalent to

  t′ D_x²L(x, λ) t ≤ 0  for all t with Dg(x) · t = 0.
How do we check whether the Hessian satisfies these inequalities? We can form the bordered Hessian of the Lagrangian,

  [ 0        Dg(x)       ]
  [ Dg(x)′   D_x²L(x, λ) ],

and then check signs of the last n − 1 leading principal minors of the matrix. But this takes us beyond the scope of these notes. See Simon and Blume (1994) for a nice explanation.
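For the Cobb-Douglas consumer with u(x1, x2) = x1x2 (a sketch under that assumption; prices are assumed values), the check is easy because the constraint is linear, so D_x²L reduces to the Hessian of u; both the direct inequality and the bordered Hessian sign test can be evaluated numerically:

    import numpy as np

    p = np.array([2.0, 3.0])                 # assumed prices
    H = np.array([[0.0, 1.0], [1.0, 0.0]])   # Hessian of u(x1,x2) = x1*x2; g linear,
                                             # so D_x^2 L equals D^2 u
    t = np.array([p[1], -p[0]])              # spans {t : Dg(x) . t = 0}
    print(t @ H @ t)                         # -2*p1*p2 < 0: condition (4) holds

    B = np.block([[np.zeros((1, 1)), p[None, :]],
                  [p[:, None], H]])          # bordered Hessian
    print(np.linalg.det(B))                  # 2*p1*p2 > 0: the sign required for a
                                             # maximum when n = 2 and m = 1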
Of course, the latter result provides only a necessary second order condition, not a sufficient one. Strengthening the condition to strict inequality, we have a stronger second order condition that is sufficient for a constrained strict local maximizer. Note that, in contrast to the analysis of necessary conditions, the next result does not rely on the constraint qualification.

Theorem 5.5 Let f : X → R and g : Rⁿ → R be twice continuously differentiable in an open neighborhood around x, an interior point of X. Assume x satisfies the constraint g(x) = c and the first order condition (1) with multiplier λ ∈ R. If

  t′[D²f(x) − λD²g(x)]t < 0   (4)

for all directions t with Dg(x) · t = 0, then x is a constrained strict local maximizer of f subject to g(x) = c.
Again, we can write the sufficient second order condition in terms of the Lagrangian as t′ D_x²L(x, λ) t < 0 for all t with Dg(x) · t = 0.

In fact, with the first order condition (1) from Lagrange's theorem, the second order sufficient condition implies much more. It implies that x is locally isolated, i.e., there is an open set Y ⊆ Rⁿ around x such that x is the unique constrained local maximizer belonging to Y. Furthermore, following the analysis of unconstrained optimization, we can consider the possibility that the objective function f and the constraint function g contain parameters, notationally suppressed until now, and we can study the effect of letting one parameter, say α, vary. Of course, if x is a constrained local maximizer given parameter α, and then the value of the parameter changes a small amount to α′, then x may no longer be a constrained local maximizer, but assuming the second order sufficient condition, the new constrained local maximizer will be close to x, and its location will vary smoothly as we vary the parameter. Note that the constraint qualification is reinstated in the next result.
Theorem 5.6 Let I be an open interval and X ⊆ Rⁿ, and let f : X × I → R and g : Rⁿ × I → R be twice continuously differentiable in an open neighborhood of (x*, α*), an interior point of X × I. Assume D_x g(x*, α*) ≠ 0. Assume x* satisfies the constraint g(x*, α*) = c and the first order condition at α*, i.e.,

  D_x f(x*, α*) − λ* D_x g(x*, α*) = 0,

with multiplier λ* ∈ R. If

  t′[D_x²f(x*, α*) − λ* D_x²g(x*, α*)]t < 0

for all t with D_x g(x*, α*) · t = 0, then there are an open set Y ⊆ Rⁿ with x* ∈ Y, an open interval J ⊆ R with α* ∈ J, and continuously differentiable mappings ξ : J → Y and λ : J → R such that for all α ∈ J, (i) ξ(α) is the unique maximizer of f(·, α) subject to g(x, α) = c belonging to Y, (ii) the unique multiplier for which ξ(α) satisfies the first order necessary condition (1) at α is λ(α), and (iii) ξ(α) satisfies the second order sufficient condition (4) at α with multiplier λ(α).
The preceding result lays the theoretical groundwork necessary for studying the effect of a parameter on the solution to a given optimization problem. This exercise is referred to as comparative statics. For example, under the conditions of the preceding theorem, we can take partial derivatives,

  ∂x1/∂p1, ∂x1/∂p2, ∂x1/∂I, etc.,

that tell us how the consumer's maximizer changes with respect to market parameters.
Recall the Cobb-Douglas demand functions

  x1(p1, p2, I) = αI/((α + β)p1)  and  x2(p1, p2, I) = βI/((α + β)p2).

The conditions of Theorem 5.6 are satisfied here, and indeed we can directly compute partial derivatives of demand for good 1 as

  ∂x1/∂p1(p1, p2, I) = −αI/((α + β)p1²)
  ∂x1/∂p2(p1, p2, I) = 0
  ∂x1/∂I(p1, p2, I) = α/((α + β)p1).

Obviously, partial derivatives for good 2 are similar. Interesting features of Cobb-Douglas demands are that demand curves are downward-sloping and the demand for any good is invariant with respect to price changes in other goods. Indeed, the share of income spent on good 1 is always α/(α + β) and the share spent on good 2 is β/(α + β).
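These comparative statics can be verified by brute force: re-solve the consumer's problem at perturbed prices and take a finite difference. A sketch (assumed parameter values; scipy):

    import numpy as np
    from scipy.optimize import minimize

    a, b = 1.0, 1.0                          # assumed Cobb-Douglas exponents

    def demand1(p1, p2, I):
        # Numerically solve the consumer's problem and return x1.
        obj = lambda x: -(x[0]**a * x[1]**b)
        con = {'type': 'eq', 'fun': lambda x: I - p1*x[0] - p2*x[1]}
        return minimize(obj, x0=[I/(2*p1), I/(2*p2)], method='SLSQP',
                        bounds=[(1e-6, None)]*2, constraints=con).x[0]

    p1, p2, I, h = 2.0, 3.0, 10.0, 1e-3
    fd = (demand1(p1 + h, p2, I) - demand1(p1 - h, p2, I)) / (2*h)
    print(fd, -a*I/((a + b)*p1**2))          # finite difference vs. the formula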
Given the preceding result and the mapping ξ, which specifies a constrained local maximizer as a continuously differentiable function of the parameter, the locally maximized value

  F(α) = f(ξ(α), α)

is itself a continuously differentiable mapping. The next result is an extension of the envelope theorem to equality constrained maximization problems; it provides a simple technique for performing comparative statics on this maximized value function: basically, we can take a simple partial derivative of the parameterized Lagrangian,

  L(x, λ, α) = f(x, α) + λ(c − g(x, α)),

only through the α argument. That is, although the location of the constrained local maximizer may indeed change when we vary α, we can ignore that variation, treating the constrained local maximizer as fixed in taking the derivative.
Theorem 5.7 Let I be an open interval and X ⊆ Rⁿ, and let f : X × I → R and g : Rⁿ × I → R be twice continuously differentiable in an open neighborhood around (x*, α*), an interior point of X × I. Let ξ : I → Rⁿ and λ : I → R be continuously differentiable mappings such that for all α ∈ I, ξ(α) is a constrained local maximizer satisfying the first order condition (1) with multiplier λ(α) at α. Let x* = ξ(α*) and λ* = λ(α*), and define the mapping F : I → R by F(α) = f(ξ(α), α). Then

  DF(α*) = ∂L/∂α(x*, λ*, α*).
Proof Using the chain rule and the first order condition (1), we have

  DF(α*) = Σ_{i=1}^n ∂f/∂xi(x*, α*) dξi/dα(α*) + ∂f/∂α(x*, α*)
         = λ* Σ_{i=1}^n ∂g/∂xi(x*, α*) dξi/dα(α*) + ∂f/∂α(x*, α*)
         = ∂f/∂α(x*, α*) − λ* ∂g/∂α(x*, α*) = ∂L/∂α(x*, λ*, α*).

To verify the third equality, we write G(α) = g(ξ(α), α) and use the chain rule to conclude

  DG(α) = Σ_{i=1}^n ∂g/∂xi(ξ(α), α) dξi/dα(α) + ∂g/∂α(ξ(α), α) = 0,

where the second equality above follows since g(ξ(α), α) takes a constant value of c on I, so its derivative is zero. Then

  ∂g/∂α(x*, α*) = −Σ_{i=1}^n ∂g/∂xi(x*, α*) dξi/dα(α*),

which yields the desired expression.
The previous analysis looks less general than it is, and in fact, it provides an intuitive interpretation of the Lagrange multiplier. Although the parameter does not explicitly enter the value of the constraint, c, we can consider a simple linear specification, g(x, α) = g(x) − α, so the Lagrangian becomes

  L(x, λ, α) = f(x) + λ(c − g(x) + α),

and then

  ∂L/∂α(x*, λ*, α*) = λ*.

That is, the value of the multiplier at a constrained local maximizer tells us the marginal effect of increasing the value of the constraint.
Example In the consumer's problem, given prices and income p1, p2, and I, let x1(p1, p2, I) and x2(p1, p2, I) be demands satisfying the first order condition and second order sufficient condition. Then the consumer's maximum utility is

  U(p1, p2, I) = u(x1(p1, p2, I), x2(p1, p2, I)),

and the function U(·) is called the consumer's indirect utility function. How does this vary with respect to prices and income? Consider I. According to the envelope theorem, we take the partial derivative of the Lagrangian,

  u(x1, x2) + λ(I − p1x1 − p2x2),

with respect to I, where x1 and x2 are fixed at their maximized values and λ is the associated multiplier. That's just λ! Thus, we see that the Lagrange multiplier measures the rate at which the consumer's utility increases with her income, i.e., it is the marginal utility of money. How does the consumer's maximum utility vary with the price p1? It is simply −λx1.
5.4 Multiple Equality Constraints

Theorem 5.8 Let f : X → R and g1 : Rⁿ → R, . . . , gm : Rⁿ → R be continuously differentiable in an open neighborhood around x, an interior point of X. Assume the gradients of the constraints, {Dg1(x), . . . , Dgm(x)}, are linearly independent. If x is a local constrained maximizer of f subject to g1(x) = c1, . . . , gm(x) = cm, then there are unique multipliers λ1, λ2, . . . , λm ∈ R such that

  Df(x) = Σ_{j=1}^m λj Dgj(x).   (5)
Of course, the first order condition can be written in terms of partial derivatives:

  ∂f/∂xi(x) = Σ_{j=1}^m λj ∂gj/∂xi(x).
When there is one constraint (m = 1), the requirement is that Dg1(x) ≠ 0 (the same as before); when there are two constraints (m = 2), the requirement is that the two gradients, Dg1(x) and Dg2(x), are not collinear; and when there are three constraints (m = 3), the requirement is that the gradients of the constraints do not lie on the same plane.
As for the case of a single equality constraint, the constraint qualification is needed
for the result. To provide some geometric insight into the condition, consider the
case of two constraints in Figure 11. Here, x is a constrained local maximizer of f
(in fact, it is the unique element of the constraint set), but Df (x) cannot be written
as a linear combination of Dg1 (x) and Dg2 (x), which are linearly dependent.
Put differently, Lagrange's theorem says that if x is an interior constrained local maximizer, there exist λ1, . . . , λm ∈ R such that (x, λ1, . . . , λm) is a critical point of the Lagrangian function L : X × R^m → R, now defined by

  L(x, λ1, . . . , λm) = f(x) + Σ_{j=1}^m λj(cj − gj(x)).
[Figure 11: Constraint qualification needed. The curves g1 = c1 and g2 = c2 meet at x, where the gradients Dg1(x) and Dg2(x) are collinear and Df(x) lies outside their span.]
The analysis of second order conditions and envelope theorems is very much the same as with a single equality constraint. Indeed, the interpretation of the Lagrange multipliers is the same: λj tells us the rate at which the maximized value of f changes if we increase cj in the jth constraint.
Example Consider the problem of a social planner in an exchange economy. There are n consumers and K commodities. The social endowment (the amount in existence) of good k is Wk. The planner has to decide on an allocation of the goods to consumers, where x^i = (x^i_1, x^i_2, . . . , x^i_K) is the bundle for consumer i and x^1_k + x^2_k + · · · + x^n_k = Wk for each good k. Given continuously differentiable utility functions u1, u2, . . . , un representing the preferences of consumers and non-negative weights λ1, . . . , λn (not all zero), the planner solves

  max_{x^i_k ∈ R₊, i=1,...,n, k=1,...,K} Σ_{i=1}^n λi ui(x^i_1, . . . , x^i_K)
  s.t. Σ_{i=1}^n x^i_k = Wk, k = 1, 2, . . . , K.
This is a maximization problem subject to multiple equality constraints, one for each commodity. The Lagrangian for the problem is

  L(x^1, . . . , x^n, μ) = Σ_{i=1}^n λi ui(x^i_1, . . . , x^i_K) + Σ_{k=1}^K μk (Wk − Σ_{i=1}^n x^i_k).
The first order conditions are

  λi ∂ui/∂x^i_k(x^i_1, . . . , x^i_K) = μk,  i = 1, . . . , n, k = 1, . . . , K,
  Σ_{i=1}^n x^i_k = Wk,  k = 1, . . . , K.

Dividing the conditions for goods k and ℓ, we have

  [∂ui/∂x^i_k(x^i)] / [∂ui/∂x^i_ℓ(x^i)] = μk/μℓ.

That is, if we look at any i's marginal rate of substitution between any goods k and ℓ (measuring the value of k for i in terms of ℓ), it is μk/μℓ. This is independent of i, so the marginal rates of substitution of the consumers are equal. Indeed, recall that μk is the rate at which maximized social welfare increases with an increase in the total amount of good k (and similarly for ℓ). So μk/μℓ is the social value of good k in terms of good ℓ, i.e., an extra unit of good k is worth roughly μk/μℓ units of good ℓ. The first order condition says that the planner must equate the individual values of the goods to the social value. Second, the first order conditions imply
  [λi ∂ui/∂x^i_k(x^i)] / [λj ∂uj/∂x^j_k(x^j)] = 1,

or equivalently,

  [∂ui/∂x^i_k(x^i)] / [∂uj/∂x^j_k(x^j)] = λj/λi.

The lefthand side compares the increased utility consumer i would get from more of good k to the increased utility consumer j would get. If it is high, i's weight in the welfare function must be low compared to j's. Is this the opposite of what you expected? If i's weight weren't relatively low, the planner would go ahead and give i more of good k, and that would raise social welfare, but then the original allocation couldn't have been optimal! To continue the example, recall the definition of a Walrasian equilibrium allocation
couldnt have been optimal! To continue the example, recall the definition of a Walrasian equilibrium allocation (
x1 , . . . , xn ): there exist prices p1 , p2 , . . . , pK such that
1. for all i = 1, . . . , n, xi = (
xi1 , . . . , xiK ) solves
maxxi1 ,...,xiK 0 ui (xi1 , . . . , xiK )
i
s.t. p1 xi1 + + pK xiK p1 w1i + + p1 wK
.
40
2. for all k = 1, . . . , K, Wk =
Pn
i=1
xik .
Letting μ^i denote the multiplier on consumer i's budget constraint, the first order condition for consumer i's problem is

  ∂ui/∂x^i_k(x̄^i_1, . . . , x̄^i_K) = μ^i pk,

or equivalently,

  (1/μ^i) ∂ui/∂x^i_k(x̄^i_1, . . . , x̄^i_K) = pk.

Clearly, then, the Walrasian allocation (x̄^1, . . . , x̄^n) satisfies the first order conditions of the planner's problem with weights λi = 1/μ^i and multipliers μk = pk. Adding concavity of consumer utilities (see Theorem 5.9), we can conclude that the Walrasian allocation is indeed the social optimum given weights 1/μ^i, and then the multiplier pk represents the social value of good k.
Other results from the analysis of a single equality constraint carry over to the case of multiple constraints. First, we note the implications of quasi-concave objective and linear constraints. Again, the result follows from Theorem 6.2.

Theorem 5.9 Let X be open and convex, let f : X → R be quasi-concave and continuously differentiable, and let g1 : Rⁿ → R, . . . , gm : Rⁿ → R be linear. If g1(x) = c1, . . . , gm(x) = cm, and there exist λ1, . . . , λm ∈ R such that the first order condition (5) holds with respect to x, then x is a constrained global maximizer of f subject to g1(x) = c1, . . . , gm(x) = cm provided either of two conditions holds:

1. Df(x) ≠ 0, or

2. f is concave.
Theorem 5.10 Let X Rn be convex, let f : X R be quasi-concave, and let
g1 : Rn R, . . . , gm : Rn R be linear. If x X is a constrained strict local
maximizer, then it is the unique constrained global maximizer, i.e., it is the unique
maximizer of f subject to g1 (x) = c1 , . . . , gm (x) = cm .
Moving to the second order analysis, the necessary condition is again that the second directional derivative of the Lagrangian be non-positive, now in every direction that is orthogonal to the gradient of each constraint function.

Theorem 5.11 Let f : X → R and g1 : Rⁿ → R, . . . , gm : Rⁿ → R be twice continuously differentiable in an open neighborhood around x, an interior point of X. Assume the gradients {Dg1(x), . . . , Dgm(x)} are linearly independent. Assume x is a constrained local maximizer of f subject to g1(x) = c1, . . . , gm(x) = cm, and let λ1, . . . , λm ∈ R satisfy the first order condition (5). Consider any direction t such that Dgj(x) · t = 0 for all j = 1, . . . , m. Then

  t′[D²f(x) − Σ_{j=1}^m λj D²gj(x)]t ≤ 0.
Again, strengthening the weak inequality to strict gives us a second order condition that, in combination with the first order condition, is sufficient for a constrained strict local maximizer. Note that, in contrast to the analysis of necessary conditions, the next result does not rely on the constraint qualification.

Theorem 5.12 Let f : X → R and g1 : Rⁿ → R, . . . , gm : Rⁿ → R be twice differentiable in an open neighborhood around x, an interior point of X. Assume x satisfies the constraints g1(x) = c1, . . . , gm(x) = cm and the first order condition (5) with multipliers λ1, . . . , λm ∈ R. Assume that for all directions t with Dgj(x) · t = 0 for all j = 1, . . . , m, we have

  t′[D²f(x) − Σ_{j=1}^m λj D²gj(x)]t < 0.   (6)

Then x is a constrained strict local maximizer of f subject to g1(x) = c1, . . . , gm(x) = cm.

Parameterized versions of these results also carry over: the first order condition becomes

  D_x f(x*, α*) − Σ_{j=1}^m λj* D_x gj(x*, α*) = 0,

and the envelope theorem again gives DF(α*) = ∂L/∂α(x*, λ*, α*).
Again, we can use the envelope theorem to characterize λj as the marginal effect of increasing the value of the jth constraint.
6 Inequality Constraints

6.1 First Order Analysis
[Figure 12: Kuhn-Tucker conditions. The constraint set C bounded by g1 = c1 and g2 = c2; at an interior point x, Df(x) = 0; at y, where only the first constraint binds, Df(y) is proportional to Dg1(y); at z, where both bind, Df(z) lies in the cone generated by Dg1(z) and Dg2(z).]
Theorem 6.1 (Kuhn-Tucker) Let X ⊆ Rⁿ, f : X → R, and g1 : Rⁿ → R, . . . , gm : Rⁿ → R be continuously differentiable in an open neighborhood around x, an interior point of X, and assume the gradients of the constraints that bind at x are linearly independent. If x is a constrained local maximizer of f subject to g1(x) ≤ c1, . . . , gm(x) ≤ cm, then there are unique multipliers λ1, . . . , λm ∈ R such that

  Df(x) = Σ_{j=1}^m λj Dgj(x)   (7)
  λj(cj − gj(x)) = 0,  j = 1, . . . , m   (8)
  λj ≥ 0,  j = 1, . . . , m.   (9)
Proof Under the conditions of the theorem, suppose x is a constrained local maximizer, and let g1, . . . , gk be the constraints that bind at x. By Gale's (1960) Theorem 2.9, either there exist μ0, μ1, . . . , μk ≥ 0 (not all zero) such that

  μ0 Df(x) + Σ_{j=1}^k μj(−Dgj(x)) = 0,

or there exists a direction t such that Df(x) · t > 0 and −Dgj(x) · t > 0 for all j = 1, . . . , k. In the latter case, however, we can choose ε > 0 sufficiently small so that f(x + εt) > f(x) and gj(x + εt) < gj(x) = cj for all j = 1, . . . , k, but then x is not a constrained local maximizer, a contradiction. In the former case, note that linear independence of {Dg1(x), . . . , Dgk(x)} implies that μ0 ≠ 0, and so we can define λj = μj/μ0 for the binding constraints and λj = 0 for the slack constraints, which yields (7)-(9).
As before, we define the Lagrangian

  L(x, λ1, . . . , λm) = f(x) + Σ_{j=1}^m λj(cj − gj(x)),

and then condition (7) from Theorem 6.1 is the requirement that x is a critical point of the Lagrangian given multipliers λ1, . . . , λm.
An important difference from the case of equality constraints is that the constraint qualification now applies only to the gradients of binding constraints. (With equality constraints, every constraint is binding, but now some may not be.) Another important difference, touched on above, is that the multipliers are non-negative. Indeed, the interpretation of λj is as before: it tells us the rate of change of the maximized value of the objective function as we increase the constrained value cj of the jth constraint. But now only the inequality gj(x) ≤ cj needs to be maintained, so increasing cj can't hurt, so the multipliers are non-negative. Yet another difference is that the equality constraints gj(x) = cj have been replaced in (8) by the conditions λj(cj − gj(x)) = 0, j = 1, . . . , m, which are called the complementary slackness conditions. Put differently, complementary slackness says that

  λj > 0  ⇒  cj − gj(x) = 0
  cj − gj(x) > 0  ⇒  λj = 0.

In words, the multiplier of every slack constraint is zero, and every constraint with a positive multiplier is binding.
Referring back to Figure 12, first consider a constrained local maximizer such as x. Assuming the constraint qualification holds, the Karush-Kuhn-Tucker theorem says that the gradient of f at x can be written as

  Df(x) = λ1 Dg1(x) + λ2 Dg2(x),

and since both constraints are slack, complementary slackness implies λ1 = λ2 = 0, which gives us Df(x) = 0. At y, the second constraint is slack, so λ2 = 0, and we have Df(y) = λ1 Dg1(y) for λ1 ≥ 0, as depicted. At z, the first order condition (7) implies that the gradient of the objective lies in the semi-positive cone generated by the gradients of the binding constraints, as depicted.
To see why the constraint qualification is needed, consider Figure 13, a simple adaptation of Figure 11. Again, x is a constrained local maximizer of f (in fact, it is the unique element of the constraint set), but Df(x) cannot be written as a non-negative linear combination of Dg1(x) and Dg2(x), because they are linearly dependent.
Example The consumer's problem is most accurately formulated in terms of inequality constraints. We can now think of u defined on all of R² and impose non-negativity constraints on the consumer's choice. The problem is

  max_{x∈R²} u(x1, x2)
  s.t. p1x1 + p2x2 ≤ I
       −x1 ≤ 0
       −x2 ≤ 0.

Defining g1(x1, x2) = p1x1 + p2x2, g2(x1, x2) = −x1, and g3(x1, x2) = −x2, this is a maximization problem subject to the inequality constraints (1) g1(x1, x2) ≤ I, (2) g2(x1, x2) ≤ 0, and (3) g3(x1, x2) ≤ 0.
[Figure 13: Constraint qualification needed. The constraint curves g1 ≤ c1 and g2 ≤ c2 meet at x, where Dg1(x) and Dg2(x) are collinear and Df(x) lies outside the cone they generate.]
Note that the constraints cannot all bind simultaneously. First, consider the possibility that only (2) binds, i.e., p1x1 + p2x2 < I, x1 = 0, and x2 > 0. Note that Dg2(x) = (−1, 0) ≠ 0, so the constraint qualification is met. By complementary slackness, it follows that λ1 = λ3 = 0, so the first order condition becomes

  ∂u/∂x1(x1, x2) = λ2 ∂g2/∂x1(x1, x2) = −λ2
  ∂u/∂x2(x1, x2) = λ2 ∂g2/∂x2(x1, x2) = 0
  g2(x1, x2) = 0, λ2 ≥ 0,

but this is incompatible with monotonicity of u, so we discard this case. Similarly for the case in which only (3) binds, the case in which (2) and (3) both bind, and the case in which no constraints bind. Next, consider the case in which (1) and (2) bind, i.e., p1x1 + p2x2 = I, x1 = 0, x2 > 0. Note that Dg1(x) = (p1, p2) and Dg2(x) = (−1, 0) are linearly independent, so the constraint qualification is met. Since x2 > 0, complementary slackness implies λ3 = 0, so the first order conditions are

  ∂u/∂x1(x1, x2) = λ1 ∂g1/∂x1(x1, x2) + λ2 ∂g2/∂x1(x1, x2) = λ1p1 − λ2
  ∂u/∂x2(x1, x2) = λ1 ∂g1/∂x2(x1, x2) + λ2 ∂g2/∂x2(x1, x2) = λ1p2
  g1(x1, x2) = I, g2(x1, x2) = 0, λ1, λ2 ≥ 0.
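The case analysis can be mirrored numerically. In the sketch below (assumed utility u = √x1 + √x2 and parameter values; scipy), the solver finds an optimum with both goods positive and only the budget binding, and the budget multiplier recovered from either first order condition agrees, illustrating complementary slackness:

    import numpy as np
    from scipy.optimize import minimize

    p1, p2, I = 2.0, 3.0, 10.0
    u = lambda x: np.sqrt(x[0]) + np.sqrt(x[1])     # assumed monotone, concave utility

    res = minimize(lambda x: -u(x), x0=[1.0, 1.0], method='SLSQP',
                   bounds=[(0, None), (0, None)],   # non-negativity constraints
                   constraints={'type': 'ineq',     # budget: I - p1*x1 - p2*x2 >= 0
                                'fun': lambda x: I - p1*x[0] - p2*x[1]})
    x = res.x                                       # approximately (3.0, 1.333)
    lam_a = 0.5/np.sqrt(x[0])/p1                    # budget multiplier from the x1 FOC
    lam_b = 0.5/np.sqrt(x[1])/p2                    # ... and from the x2 FOC
    print(x, lam_a, lam_b)                          # multipliers agree; x1, x2 > 0, so
                                                    # lambda2 = lambda3 = 0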
so all such bundles are optimal. If the razor's-edge condition on marginal rates of substitution and relative prices does not hold, then either a/b > p1/p2 or the opposite obtains, and the only possible optimal bundles are the corner solutions. In the former case,

  u(I/p1, 0) = aI/p1 > bI/p2 = u(0, I/p2),

so the consumer optimally spends all of his money on good 1, and in the remaining case he spends everything on good 2.
6.2 Concave Programming
For each binding constraint j, we have

  Dt gj(x) = lim_{λ↓0} [gj(x + λt) − gj(x)]/λ ≤ 0,

and of course, for each slack constraint, we have λj = 0. Combining these observations, we conclude

  Dt f(x) = Σ_{j=1}^m λj Dt gj(x) ≤ 0.
Now suppose, in order to derive a contradiction, that $f(y) > f(x)$. Then by the mean value theorem there exists $\alpha \in (0, 1]$ such that
\[ D_t f(z(\alpha)) = Df(z(\alpha)) \cdot t > 0, \]
and by quasi-concavity of f, we have $f(z(\alpha)) \geq f(x)$. See Figure 14 for a visual depiction. By continuity of the dot product, there exists $\varepsilon > 0$ sufficiently small that $Df(z(\alpha)) \cdot (t - \varepsilon Df(x)) > 0$. Letting $t' = \frac{1}{\|t - \varepsilon Df(x)\|}(t - \varepsilon Df(x))$ point in the direction of the perturbed vector $t - \varepsilon Df(x)$, it follows that the derivative of f at $z(\alpha)$ in this direction is positive, i.e., $D_{t'} f(z(\alpha)) > 0$. This means that for sufficiently small $\delta > 0$, we can define $w = z(\alpha) + \delta t'$ such that $f(w) > f(z(\alpha)) \geq f(x)$. Given $\beta \in (0, 1]$, define
\[ v(\beta) = x + \beta(w - x) = (1 - \beta)x + \beta w, \]
and let $s = \frac{1}{\|w - x\|}(w - x)$ denote the direction from x toward w. Since $f(w) > f(x)$, quasi-concavity of f implies $f(v(\beta)) \geq f(x)$ for all $\beta \in (0, 1]$, and therefore
\[ D_s f(x) = \lim_{\beta \downarrow 0} \frac{f(v(\beta)) - f(x)}{\beta \|w - x\|} \geq 0. \tag{10} \]
But writing this directional derivative as a dot product and decomposing $w - x = \alpha \|y - x\| \, t + \delta t'$, we have
\[ D_s f(x) = \frac{\alpha \|y - x\| \, Df(x) \cdot t}{\|w - x\|} + \frac{\delta \, Df(x) \cdot (t - \varepsilon Df(x))}{\|w - x\| \, \|t - \varepsilon Df(x)\|}, \]
and since $Df(x) \cdot t = D_t f(x) \leq 0$ and $Df(x) \cdot (t - \varepsilon Df(x)) = D_t f(x) - \varepsilon \|Df(x)\|^2 < 0$, this contradicts (10).
[Figure 14: the construction in the proof: the segment from x to y with intermediate point $z(\alpha)$, the directions t and $t'$, the perturbed point w, and a level set of f.]
In words, given $x^*$, $\lambda^*$ minimizes $f(x^*) + \sum_{j=1}^{m} \lambda_j (c_j - g_j(x^*))$ over $\lambda \geq 0$; and given $\lambda^*$, $x^*$ maximizes $f(x) + \sum_{j=1}^{m} \lambda_j^* (c_j - g_j(x))$. Note that the maximization problem over x is unconstrained, but if $(x^*, \lambda^*)$ is a saddlepoint, then $x^*$ will indeed satisfy $g_j(x^*) \leq c_j$ for each j; indeed, if $c_j - g_j(x^*) < 0$, then the term $\lambda_j (c_j - g_j(x^*))$ could be made arbitrarily negative by choice of arbitrarily large $\lambda_j$, so $(x^*, \lambda^*)$ could not be a saddlepoint.
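In symbols, the saddlepoint property being described is the standard one: with the Lagrangian $L(x, \lambda) = f(x) + \sum_{j=1}^{m} \lambda_j (c_j - g_j(x))$, the pair $(x^*, \lambda^*)$ with $\lambda^* \geq 0$ is a saddlepoint if
\[ L(x, \lambda^*) \;\leq\; L(x^*, \lambda^*) \;\leq\; L(x^*, \lambda) \quad \text{for all } x \in \mathbb{R}^n \text{ and all } \lambda \geq 0. \]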
Theorem 6.4 Let $f : \mathbb{R}^n \to \mathbb{R}$ be concave, let $g_1 : \mathbb{R}^n \to \mathbb{R}, \ldots, g_m : \mathbb{R}^n \to \mathbb{R}$ be convex, and let $x^* \in \mathbb{R}^n$. If there exist $\lambda_1^*, \ldots, \lambda_m^* \in \mathbb{R}_+$ such that $(x^*, \lambda^*)$ is a saddlepoint of the Lagrangian, then $x^*$ is a global constrained maximizer of f subject to $g_1(x) \leq c_1, \ldots, g_m(x) \leq c_m$. Conversely, assume there is some $\bar{x} \in \mathbb{R}^n$ such that $g_1(\bar{x}) < c_1, \ldots, g_m(\bar{x}) < c_m$. If $x^*$ is a constrained local maximizer of f subject to $g_1(x) \leq c_1, \ldots, g_m(x) \leq c_m$, then there exist $\lambda_1^*, \ldots, \lambda_m^* \in \mathbb{R}_+$ such that $(x^*, \lambda^*)$ is a saddlepoint of the Lagrangian. Furthermore, if $f, g_1, \ldots, g_m$ are differentiable at $x^*$, then the first order condition (7)-(9) holds.
The condition $g_1(\bar{x}) < c_1, \ldots, g_m(\bar{x}) < c_m$ is called Slater's condition. To gain intuition for the saddlepoint theorem and the need for Slater's condition, consider Figure 15. Here, we consider maximizing a function of any number of variables, but to illustrate the problem in a two-dimensional graph, we assume there is a single inequality constraint, $g(x) \leq c$. On the horizontal axis, we graph values of $f(x)$ as x varies over $\mathbb{R}^n$, and on the vertical axis, we graph $c - g(x)$ as x varies over the Euclidean space. When f is concave and g is convex (so $c - g(x)$ is concave), you can check that the set $\{(f(x), c - g(x)) \mid x \in \mathbb{R}^n\}$, which is shaded in the figure, is convex. The values $(f(x), c - g(x))$ corresponding to vectors x satisfying the constraint $g(x) \leq c$ lie above the horizontal axis, the darker shaded regions in the figure. The ordered pairs $(f(x^*), c - g(x^*))$ corresponding to solutions of the constrained maximization problem are indicated by the black dots.
Consider the problem of minimizing $f(x^*) + \lambda(c - g(x^*))$ with respect to $\lambda \geq 0$, holding $x^*$ fixed. This simply means that at a saddlepoint, (i) if $c - g(x^*) > 0$, then $\lambda = 0$, and (ii) if $c - g(x^*) = 0$, then $\lambda$ can be any non-negative number. Figure 15 depicts the first possibility in Panel (a) and the second possibility in Panels (b) and (c). Now consider the problem of maximizing $f(x) + \lambda(c - g(x))$ with respect to x, holding $\lambda$ fixed. Let's write the objective function as a dot product: $(1, \lambda) \cdot (f(x), c - g(x))$. Viewed this way, we can understand the problem as choosing the ordered pair $(f(x), c - g(x))$ in the shaded region that maximizes the linear function with gradient $(1, \lambda)$. This is depicted in Panels (a) and (b). The difference between the two panels is that in (a), the constraint is not binding at the solution to the optimization problem (so $Df(x^*) = \lambda Dg(x^*) = 0$), while in (b) it is binding (so $\lambda$ may be positive).
The difference between Panels (b) and (c) is that Slater's condition is not satisfied in the latter: there is no x such that $g(x) < c$; graphically, the shaded region does not contain any points above the horizontal axis. The pair $(f(x^*), c - g(x^*))$ corresponding to the solution of the maximization problem is indicated by the black dot; we then must choose $\lambda$ such that $(f(x^*), c - g(x^*))$ maximizes the linear function with gradient $(1, \lambda)$. The difficulty is that for any finite $\lambda$, the pair $(f(x^*), c - g(x^*))$ does not maximize the linear function; instead, the maximizing pair will correspond to a vector x that violates the constraint, i.e., $c - g(x) < 0$. To make $(f(x^*), c - g(x^*))$ the maximizing pair, the gradient of the linear function would have to point straight up, which would correspond to something like an infinite $\lambda$ (whatever that would mean). In other words, if Slater's condition is not satisfied, then there may be no way to choose a multiplier to solve the saddlepoint problem.
Example For a formal example demonstrating the need for Slater's condition, let $n = 1$, $f(x) = x$, $m = 1$, $c_1 = 0$, and $g_1(x) = x^2$. The only point in $\mathbb{R}$ satisfying $g_1(x) \leq 0$ is $x = 0$, so this is trivially the constrained maximizer of f. But $Df(0) = 1$ and $Dg_1(0) = 0$, so there is no $\lambda \geq 0$ such that $Df(0) = \lambda Dg_1(0)$.
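The failure can also be computed directly at the level of the Lagrangian. For this example,
\[ L(x, \lambda) = x + \lambda(0 - x^2) = x - \lambda x^2, \]
and for any $\lambda > 0$ the unconstrained maximum over x is attained at $x = 1/(2\lambda)$ with value $1/(4\lambda) > 0 = L(0, \lambda)$, while for $\lambda = 0$ the Lagrangian is unbounded above. Either way, $x^* = 0$ never maximizes $L(\cdot, \lambda)$, so no saddlepoint exists.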
[Figure 15: saddlepoint geometry. Panel (a): the constraint is slack at the solution, $\lambda = 0$, and the maximizing gradient is $(1, 0)$. Panel (b): the constraint binds, and the gradient $(1, \lambda)$ with $\lambda > 0$ supports $(f(x^*), c - g(x^*))$. Panel (c): Slater's condition fails, and no gradient $(1, \lambda)$ with finite $\lambda$ makes $(f(x^*), c - g(x^*))$ the maximizing pair.]
6.3 Second Order Analysis
The second order analysis parallels that for multiple equality constraints, modified to
accommodate the different first order conditions. Again, the necessary condition is
that the second directional derivative of the Lagrangian be non-positive in a restricted
set of directions. A difference is that now the inequality must hold only for directions
orthogonal to the gradients of binding constraints.
Theorem 6.5 Let $f : \mathbb{R}^n \to \mathbb{R}$, $g_1 : \mathbb{R}^n \to \mathbb{R}, \ldots, g_m : \mathbb{R}^n \to \mathbb{R}$ be twice continuously differentiable in an open neighborhood around x. Suppose the first k constraints are the binding ones at x, and assume the gradients of the binding constraints, $\{Dg_1(x), \ldots, Dg_k(x)\}$, are linearly independent. Assume x is a constrained local maximizer of f subject to $g_1(x) \leq c_1, \ldots, g_m(x) \leq c_m$, and let $\lambda_1, \ldots, \lambda_m \in \mathbb{R}_+$ satisfy the first order conditions (7)-(9). Consider any direction t such that $Dg_j(x) \cdot t = 0$ for all binding constraints $j = 1, \ldots, k$. Then
\[ t \cdot \left[ D^2 f(x) - \sum_{j=1}^{m} \lambda_j D^2 g_j(x) \right] t \leq 0. \]
Note that the range of directions for which the above inequality must hold is the set of directions that are orthogonal to the gradients of the binding constraints. One might think it should hold as well for directions t such that $Dg_j(x) \cdot t \leq 0$ for all $j = 1, \ldots, m$, since any direction with $Dg_j(x) \cdot t < 0$ is feasible for that constraint. In fact, the stronger version of the condition (using the larger range of directions) is not necessary.
Example Let $n = 1$, $f(x) = e^x$, $m = 1$, $g_1(x) = x$, and $c_1 = 0$. Clearly, $x = 0$ maximizes f subject to $g_1(x) \leq 0$, and the first order condition $Df(0) = \lambda_1 Dg_1(0)$ holds with $\lambda_1 = 1$. But $D^2 f(0) - \lambda_1 D^2 g_1(0) = 1 > 0$, so the inequality fails in the feasible direction $t = -1$, which satisfies $Dg_1(0) \cdot t < 0$; the condition of Theorem 6.5 is vacuously satisfied, since no non-zero direction is orthogonal to $Dg_1(0) = 1$. Thus the stronger version of the second order necessary condition fails here even though $x = 0$ is a constrained maximizer.
Again, strengthening the weak inequality to strict gives us a second order condition
that, in combination with the first order condition, is sufficient for a constrained strict
local maximizer. In contrast to the analysis of necessary conditions, the next result
does not rely on the constraint qualification.
Theorem 6.6 Let $f : \mathbb{R}^n \to \mathbb{R}$, $g_1 : \mathbb{R}^n \to \mathbb{R}, \ldots, g_m : \mathbb{R}^n \to \mathbb{R}$ be twice continuously differentiable in an open neighborhood around x. Assume x satisfies the constraints $g_1(x) \leq c_1, \ldots, g_m(x) \leq c_m$ and the first order condition with multipliers $\lambda_1, \ldots, \lambda_m \in \mathbb{R}_+$ satisfying (7)-(9). Assume that for all directions t with $Dg_j(x) \cdot t \leq 0$ for all binding constraints $j = 1, \ldots, k$, we have
\[ t \cdot \left[ D^2 f(x) - \sum_{j=1}^{m} \lambda_j D^2 g_j(x) \right] t < 0. \tag{11} \]
Then x is a strict constrained local maximizer of f subject to $g_1(x) \leq c_1, \ldots, g_m(x) \leq c_m$.
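Condition (11) is straightforward to check numerically. A minimal sketch, for a hypothetical problem of my own choosing (maximize $f(x) = 2x_1 - x_1^2 - x_2^2$ subject to $g_1(x) = x_1 \leq 0$, which has the binding solution $x = (0, 0)$ with $\lambda_1 = 2$):

```python
import numpy as np

# Hypothetical problem: max f(x) = 2*x1 - x1^2 - x2^2  s.t.  g(x) = x1 <= 0.
# FOC at x = (0,0): Df = (2, 0) = lam * Dg = lam * (1, 0), so lam = 2 >= 0.
lam = 2.0
Hf = np.array([[-2.0, 0.0], [0.0, -2.0]])  # D^2 f (constant here)
Hg = np.zeros((2, 2))                      # D^2 g = 0 (g is linear)
HL = Hf - lam * Hg                         # Hessian term in condition (11)
Dg = np.array([1.0, 0.0])

rng = np.random.default_rng(0)
for _ in range(1000):
    t = rng.standard_normal(2)
    t /= np.linalg.norm(t)
    if Dg @ t <= 0:              # direction allowed by the binding constraint
        assert t @ HL @ t < 0    # strict second order condition (11)
print("condition (11) holds on all sampled directions")
```

In this example the quadratic form is negative in every direction, so Theorem 6.6 confirms that $(0, 0)$ is a strict constrained local maximizer.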
Theorem 6.7 Let I be an open interval, let $f : \mathbb{R}^n \times I \to \mathbb{R}$ and $g_1 : \mathbb{R}^n \times I \to \mathbb{R}, \ldots, g_m : \mathbb{R}^n \times I \to \mathbb{R}$ be twice continuously differentiable, and let $\theta^* \in I$. Assume $x^*$ and multipliers $\lambda_1^*, \ldots, \lambda_m^* \in \mathbb{R}_+$ satisfy, with strict complementary slackness,
\[ D_x f(x^*, \theta^*) = \sum_{j=1}^{m} \lambda_j^* D_x g_j(x^*, \theta^*) \]
\[ \lambda_j^* (c_j - g_j(x^*, \theta^*)) = 0, \quad j = 1, \ldots, m \]
\[ \lambda_j^* \geq 0, \quad j = 1, \ldots, m, \]
and assume that for all directions t with $D_x g_j(x^*, \theta^*) \cdot t \leq 0$ for all binding constraints j, we have
\[ t \cdot \left[ D_x^2 f(x^*, \theta^*) - \sum_{j=1}^{m} \lambda_j^* D_x^2 g_j(x^*, \theta^*) \right] t < 0. \]
Then there are an open set $Y \subseteq \mathbb{R}^n$ with $x^* \in Y$, an open interval $J \subseteq \mathbb{R}$ with $\theta^* \in J$, and continuously differentiable mappings $\xi : J \to Y$, $\lambda_1 : J \to \mathbb{R}, \ldots, \lambda_m : J \to \mathbb{R}$ such that for all $\theta \in J$, (i) $\xi(\theta)$ is the unique maximizer of $f(\cdot, \theta)$ subject to $g_1(x, \theta) \leq c_1, \ldots, g_m(x, \theta) \leq c_m$ belonging to Y, (ii) the unique multipliers for which $\xi(\theta)$ satisfies the first order necessary condition (7)-(9) with strict complementary slackness at $\theta$ are $\lambda_1(\theta), \ldots, \lambda_m(\theta)$, and (iii) $\xi(\theta)$ satisfies the second order sufficient condition (11) at $\theta$ with multipliers $\lambda_1(\theta), \ldots, \lambda_m(\theta)$.
Fortunately, the statement of the envelope theorem carries over virtually unchanged.
Theorem 6.8 Let I be an open interval, and let $f : \mathbb{R}^n \times I \to \mathbb{R}$ and $g_1 : \mathbb{R}^n \times I \to \mathbb{R}, \ldots, g_m : \mathbb{R}^n \times I \to \mathbb{R}$ be twice continuously differentiable in an open neighborhood of $(x^*, \theta^*)$. Let $\xi : I \to \mathbb{R}^n$ and $\lambda_1 : I \to \mathbb{R}, \ldots, \lambda_m : I \to \mathbb{R}$ be continuously differentiable mappings such that for all $\theta \in I$, $\xi(\theta)$ is a constrained local maximizer satisfying the first order condition (7)-(9) at $\theta$ with multipliers $\lambda_1(\theta), \ldots, \lambda_m(\theta)$. Let $x^* = \xi(\theta^*)$ and $\lambda_j^* = \lambda_j(\theta^*)$, $j = 1, \ldots, m$, and define the mapping $F : I \to \mathbb{R}$ by
\[ F(\theta) = f(\xi(\theta), \theta) \]
for all $\theta \in I$. Then F is continuously differentiable and
\[ DF(\theta^*) = \frac{\partial L}{\partial \theta}(x^*, \lambda^*, \theta^*). \]
Again, we can use the envelope theorem to characterize $\lambda_j$ as the marginal effect of increasing the value of the jth constraint; with inequality constraints, of course, this cannot diminish the maximized value of the objective, so the multipliers are non-negative.
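This shadow-price interpretation is easy to test numerically: solve the problem at nearby constraint values and compare the finite difference of maximized values with the multiplier. A minimal sketch, with an illustrative quadratic objective of my own choosing:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative problem: max f(x) = -(x - 2)^2  s.t.  g(x) = x <= c, with c = 1.
f = lambda x: -(x - 2.0) ** 2

def F(c):
    # The constrained maximum; for c < 2 the constraint binds and x* = c.
    res = minimize_scalar(lambda x: -f(x), bounds=(-10.0, c), method="bounded")
    return f(res.x)

c = 1.0
lam = 2.0 * (2.0 - c)              # analytic multiplier: Df(c) = lam * Dg(c)
h = 1e-5
dF_dc = (F(c + h) - F(c - h)) / (2 * h)
print(lam, dF_dc)                  # both approximately 2: relaxing the
                                   # constraint raises the value at rate lam
```

The finite difference matches the multiplier, and since relaxing an inequality constraint enlarges the feasible set, the rate is non-negative, as the text notes.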
We now return to the topic of characterizing Pareto optimal alternatives and explore
an alternative approach using the framework of constrained optimization. First, we
give a general characterization in terms of inequality constrained optimization. Second, we establish a necessary first order condition for Pareto optimality that adds a rank condition on the gradients of individual utilities to the assumptions of Theorem 3.9, deduces strictly positive coefficients, and provides an interpretation of the coefficients in terms of shadow prices of utilities. Finally, we establish that with quasi-concave utilities, the first order condition is actually sufficient for Pareto optimality as well.
This gives us a full characterization that, in comparison with Corollary 3.7, weakens the assumption of concavity to quasi-concavity but adds the rank condition on
gradients.
The next result is structure free, extending our earlier analysis by dropping all convexity, concavity, and differentiability conditions. It gives a full characterization: an alternative is Pareto optimal if and only if it solves n different maximization problems (one for each individual) subject to inequality constraints. The proof follows directly from definitions and is omitted.
Theorem 7.1 Let $x \in A$ be an alternative, and let $\bar{u}_i = u_i(x)$ for all i. Then x is Pareto optimal if and only if it solves
\[ \max_{y \in X} u_i(y) \quad \text{s.t.} \quad u_j(y) \geq \bar{u}_j, \; j = 1, \ldots, i-1, i+1, \ldots, n \]
for all i.
Note that the sufficiency direction of Theorem 7.1 uses the fact that the alternative x solves n constrained optimization problems, one for each individual. Figure 16 demonstrates that this feature is needed for the result: there, x maximizes $u_2(y)$ subject to $u_1(y) \geq \bar{u}_1$, but it is Pareto dominated by $x^*$. Obviously, $x^*$ is Pareto optimal, as it maximizes $u_1(y)$ subject to $u_2(y) \geq \bar{u}_2$ and it maximizes $u_2(y)$ subject to $u_1(y) \geq \bar{u}_1$.
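Theorem 7.1 also suggests a direct computational test for Pareto optimality: for each individual i, maximize $u_i$ while holding every other $u_j$ at least at $u_j(x)$, and declare x Pareto optimal only if none of these problems can improve on $u_i(x)$. A rough sketch under assumed utilities (a two-person division problem chosen purely for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# Assumed example: alternatives are allocations in the simplex
# A = {x >= 0 : x1 + x2 <= 1}, with u1(x) = x1 and u2(x) = x2.
utils = [lambda x: x[0], lambda x: x[1]]
feasible = [{"type": "ineq", "fun": lambda x: 1.0 - x[0] - x[1]}]
bounds = [(0, None), (0, None)]

def is_pareto_optimal(x, tol=1e-6):
    ubar = [u(x) for u in utils]
    for i, ui in enumerate(utils):
        # Hold every j != i at its level ubar[j]; try to improve i.
        cons = feasible + [
            {"type": "ineq", "fun": (lambda u, lvl: lambda y: u(y) - lvl)(u, lvl)}
            for j, (u, lvl) in enumerate(zip(utils, ubar)) if j != i
        ]
        res = minimize(lambda y: -ui(y), x, method="SLSQP",
                       bounds=bounds, constraints=cons)
        if -res.fun > ubar[i] + tol:   # individual i can be made better off
            return False
    return True

print(is_pareto_optimal(np.array([0.5, 0.5])))  # True: on the frontier
print(is_pareto_optimal(np.array([0.3, 0.3])))  # False: dominated by (0.5, 0.5)
```

As in the figure, a point can pass one individual's problem while failing another's; the test requires all n problems to be solved by x.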
Of course, we can use our analysis of maximization subject to multiple inequality constraints to draw implications of Theorem 7.1. Consider a Pareto optimal alternative x: applying the Karush-Kuhn-Tucker theorem to each of the n problems in Theorem 7.1 yields, under a suitable constraint qualification, multipliers $\lambda^i_j \geq 0$, $j \neq i$, for each individual i such that
\[ Du_i(x) = -\sum_{j \neq i} \lambda^i_j Du_j(x); \tag{12} \]
this condition, together with a linear independence assumption on the gradients, is the content of Theorem 7.2.
[Figure 16: the utility possibility set for two individuals (utility for 1 on the horizontal axis, utility for 2 on the vertical axis), showing the dominated point $(u_1(x), u_2(x))$, the Pareto optimal point $(u_1(x^*), u_2(x^*))$, and the constraint level $\bar{u}_2$.]
Recall that the multiplier on a constraint has the interpretation of giving the rate of change of the maximized objective function as we increase the value of the constraint. In this context, the multiplier $\lambda^i_j$ has a special meaning: it is the rate at which we can increase i's utility by taking utility away from individual j. Put differently, it is the rate at which i's utility would decrease if we increase j's utility (holding all other individuals at the constraint). Thus, it is the shadow price of utility for j in terms of utility for i. Geometrically, viewed in $\mathbb{R}^d$, the gradient $Du_i(x)$ of individual i lies in the $(n-1)$-dimensional subspace spanned by the other individuals' gradients.
Now recall the mapping $u : X \to \mathbb{R}^n$ defined by $u(x) = (u_1(x), \ldots, u_n(x))$. Then $u(X)$ is the set of possible utility vectors, and the linear independence assumption in Theorem 7.2 is equivalent to the requirement that the derivative of u at x, which is the matrix
\[ \begin{pmatrix} \frac{\partial u_1}{\partial x_1}(x) & \cdots & \frac{\partial u_1}{\partial x_d}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial u_n}{\partial x_1}(x) & \cdots & \frac{\partial u_n}{\partial x_d}(x) \end{pmatrix}, \]
has rank $n - 1$. This means that there is a uniquely defined hyperplane that is tangent
to u(X) at the point u(x). When there are just two individuals, this implies there is
a unique tangent line at (u1 (x), u2 (x)), as in Figure 16. See Figure 7 for the case of
three individuals. This hyperplane has a normal vector p that is uniquely defined up
to a non-zero scalar. The first order condition (12) from Theorem 7.2 can be written
in matrix terms as
\[ \begin{pmatrix} \lambda^i_1 & \cdots & \lambda^i_{i-1} & 1 & \lambda^i_{i+1} & \cdots & \lambda^i_n \end{pmatrix} \begin{pmatrix} \frac{\partial u_1}{\partial x_1}(x) & \cdots & \frac{\partial u_1}{\partial x_d}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial u_n}{\partial x_1}(x) & \cdots & \frac{\partial u_n}{\partial x_d}(x) \end{pmatrix} = 0, \]
and we conclude that p is, up to a non-zero scalar, equal to the vector of multipliers (with a coefficient of one for i) for individual i's problem.
An implication of the above analysis is that the vectors $(\lambda^i_1, \ldots, \lambda^i_{i-1}, 1, \lambda^i_{i+1}, \ldots, \lambda^i_n)$ of multipliers corresponding to individuals $i = 1, \ldots, n$ are collinear. Indeed, they are each normal to the tangent hyperplane at $u(x)$, and the set of normal vectors is one-dimensional, so the claim follows. The claim can also be verified mechanically by multiplying both sides of
\[ Du_i(x) = -\sum_{j \neq i} \lambda^i_j Du_j(x) \]
by $1/\lambda^i_j$ and solving for $Du_j(x)$ to obtain
\[ Du_j(x) = -\frac{1}{\lambda^i_j} Du_i(x) - \sum_{k \neq i,j} \frac{\lambda^i_k}{\lambda^i_j} Du_k(x), \]
so that the multipliers in individual j's problem are $\lambda^j_i = 1/\lambda^i_j$ and $\lambda^j_k = \lambda^i_k / \lambda^i_j$ for $k \neq i, j$, as claimed.
Three interesting conclusions follow from these observations. First, the multipliers from Theorem 7.2 are actually strictly positive. Second, the utility shadow prices for any two individuals are reciprocal: we can transfer utility from j to i at rate $\lambda^i_j$, and we can transfer utility from i to j at rate $\lambda^j_i = 1/\lambda^i_j$. Third, the relative prices of any two individuals are independent of the problem we consider. To see this, consider any two individuals h and i, and let j and k be any two individuals. Then from the analysis in the preceding paragraph, we have
\[ \frac{\lambda^i_j}{\lambda^i_k} = \frac{\lambda^h_j / \lambda^h_i}{\lambda^h_k / \lambda^h_i} = \frac{\lambda^h_j}{\lambda^h_k}. \]
If, for example, it is twice as expensive, in terms of i's utility, to increase j's utility as it is to increase k's utility, then it is also twice as expensive in terms of h's utility.
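A concrete instance, with made-up rates: suppose $\lambda^i_j = 2$ and $\lambda^i_k = 1$, so that for individual i, raising j's utility is twice as expensive as raising k's. Reciprocity then gives $\lambda^j_i = 1/\lambda^i_j = 1/2$, and for any other individual h,
\[ \frac{\lambda^h_j}{\lambda^h_k} = \frac{\lambda^i_j}{\lambda^i_k} = 2, \]
so j's utility is twice as expensive as k's from every individual's point of view.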
A final, and important, geometric insight stems from the sign of the multipliers: they are all non-negative, and at least one is strictly positive. Thus, the tangent hyperplane to $u(X)$ has a normal vector with all non-negative coordinates, at least one positive. When there are just two individuals, this means that the utility frontier is sloping downward at $(u_1(x), u_2(x))$, as in Figure 16, and the idea extends to a general number of individuals, as in Figure 7. We conclude that at a Pareto optimal alternative for which the constraint qualification is satisfied, the boundary of $u(X)$ is sloping downward, in a precise sense.
This is only a necessary condition, as Figure 17 illustrates: the boundary of $u(X)$ is downward sloping at $(u_1(x), u_2(x))$, but x is Pareto dominated by y. Though conceptually possible, the anomaly depicted in the figure is precluded under the typical assumption of quasi-concave utility. Recall that, by Theorem 6.2, the first order condition is sufficient for a maximizer when the objective and constraints are quasi-concave. With Theorem 7.1, this yields the following result.
Theorem 7.3 Assume $A \subseteq \mathbb{R}^d$ is convex, let $x \in A$ be an alternative, and assume each $u_i : \mathbb{R}^d \to \mathbb{R}$ is continuously differentiable and quasi-concave. Suppose that for each i, $Du_i(x) \neq 0$ and there exist multipliers $\lambda^i_j \geq 0$, $j \neq i$, such that
\[ Du_i(x) = -\sum_{j \neq i} \lambda^i_j Du_j(x). \]
Then x is Pareto optimal.
Thus, under quite general conditions, the first order condition (12) is necessary and
sufficient for Pareto optimality.
[Figure 17: a utility possibility set whose boundary is downward sloping at $(u_1(x), u_2(x))$, even though x is Pareto dominated by y.]
Equivalently, the first order condition can be written symmetrically: there exist non-negative multipliers $\mu_1, \ldots, \mu_n$, not all zero, such that
\[ \sum_{i=1}^{n} \mu_i Du_i(x) = 0. \]
8 Mixed Constraints
The goal of this section is simply to draw together results for equality constrained
and inequality constrained maximization into a general framework. Conceptually,
nothing new is added.
Let $f : \mathbb{R}^n \to \mathbb{R}$, $g_1 : \mathbb{R}^n \to \mathbb{R}, \ldots, g_\ell : \mathbb{R}^n \to \mathbb{R}$, and $h_1 : \mathbb{R}^n \to \mathbb{R}, \ldots, h_m : \mathbb{R}^n \to \mathbb{R}$.
\[ Df(x) = \sum_{j=1}^{\ell} \lambda_j Dg_j(x) + \sum_{j=1}^{m} \mu_j Dh_j(x) \tag{13} \]
\[ \mu_j (d_j - h_j(x)) = 0, \quad j = 1, \ldots, m \tag{14} \]
\[ \mu_j \geq 0, \quad j = 1, \ldots, m. \tag{15} \]
The Lagrangian now takes the form
\[ L(x, \lambda, \mu) = f(x) + \sum_{j=1}^{\ell} \lambda_j (c_j - g_j(x)) + \sum_{j=1}^{m} \mu_j (d_j - h_j(x)), \]
and condition (13) from Theorem 8.1 is then the requirement that x is a critical point of the Lagrangian given multipliers $\lambda_1, \ldots, \lambda_\ell, \mu_1, \ldots, \mu_m$.
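As a sanity check on the mixed first order condition (13)-(15), one can solve a small problem numerically and recover the multipliers from the gradients by least squares. A sketch with an assumed objective and constraints of my own choosing:

```python
import numpy as np
from scipy.optimize import minimize

# Assumed problem: max f(x) = -(x1 - 2)^2 - (x2 - 2)^2
#   s.t.  g(x) = x1 + x2 = 2    (equality, multiplier lam)
#         h(x) = x1      <= 0.5 (inequality, multiplier mu)
f = lambda x: -((x[0] - 2) ** 2) - ((x[1] - 2) ** 2)

res = minimize(lambda x: -f(x), np.array([0.0, 0.0]), method="SLSQP",
               constraints=[{"type": "eq",   "fun": lambda x: x[0] + x[1] - 2.0},
                            {"type": "ineq", "fun": lambda x: 0.5 - x[0]}])
x = res.x                                  # approximately (0.5, 1.5)

# Recover (lam, mu) from condition (13): Df(x) = lam*Dg(x) + mu*Dh(x).
Df = np.array([-2 * (x[0] - 2), -2 * (x[1] - 2)])
Dg, Dh = np.array([1.0, 1.0]), np.array([1.0, 0.0])
lam, mu = np.linalg.lstsq(np.column_stack([Dg, Dh]), Df, rcond=None)[0]
print(x, lam, mu)   # mu is approximately 2 >= 0, as (15) requires
```

Here the inequality constraint binds, so (14) holds with $h(x) = 0.5$, and the recovered $\mu$ is non-negative, consistent with (15); the equality multiplier $\lambda$ is unrestricted in sign.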
Our results for quasi-concave objective functions with non-zero gradient go through
in the general setting, now with the assumption that all equality constraints are linear
and all inequality constraints are quasi-convex. Again, we rely on Theorem 6.2 for
the proof.
\[ t \cdot \left[ D^2 f(x) - \sum_{j=1}^{\ell} \lambda_j D^2 g_j(x) - \sum_{j=1}^{m} \mu_j D^2 h_j(x) \right] t \leq 0. \]
Again, strengthening the weak inequality to strict yields the second order sufficient
condition for a local maximizer. See Theorem 4 of Fiacco and McCormick (1968).
\[ t \cdot \left[ D^2 f(x) - \sum_{j=1}^{\ell} \lambda_j D^2 g_j(x) - \sum_{j=1}^{m} \mu_j D^2 h_j(x) \right] t < 0. \tag{16} \]
\[ D_x f(x^*, \theta^*) = \sum_{j=1}^{\ell} \lambda_j^* D_x g_j(x^*, \theta^*) + \sum_{j=1}^{m} \mu_j^* D_x h_j(x^*, \theta^*) \]
\[ \mu_j^* (d_j - h_j(x^*, \theta^*)) = 0, \quad j = 1, \ldots, m \]
\[ \mu_j^* \geq 0, \quad j = 1, \ldots, m, \]
and, for the relevant directions t,
\[ t \cdot \left[ D_x^2 f(x^*, \theta^*) - \sum_{j=1}^{\ell} \lambda_j^* D_x^2 g_j(x^*, \theta^*) - \sum_{j=1}^{m} \mu_j^* D_x^2 h_j(x^*, \theta^*) \right] t < 0. \]
Then there are an open set $Y \subseteq \mathbb{R}^n$ with $x^* \in Y$, an open interval $J \subseteq \mathbb{R}$ with $\theta^* \in J$, and continuously differentiable mappings $\xi : J \to Y$, $\lambda_1 : J \to \mathbb{R}, \ldots, \lambda_\ell : J \to \mathbb{R}$, $\mu_1 : J \to \mathbb{R}, \ldots, \mu_m : J \to \mathbb{R}$ with the properties analogous to those in Theorem 6.7, and the envelope theorem again takes the same form:
\[ DF(\theta^*) = \frac{\partial L}{\partial \theta}(x^*, \lambda^*, \mu^*, \theta^*). \]
Finally, we can again use the envelope theorem to characterize $\lambda_j$ as the marginal effect of increasing the value of the jth equality constraint, and $\mu_j$ as the marginal effect of increasing the value of the jth inequality constraint, which of course cannot reduce the maximized value of the objective.
References
[1] A. Fiacco and Y. Ishizuka (1990) Sensitivity and Stability Analysis for Nonlinear Programming, Annals of Operations Research, 27: 215-236.
[2] A. Fiacco and G. McCormick (1968) Nonlinear Programming: Sequential Unconstrained Minimization Techniques, McLean, VA: Research Analysis Corporation.
[3] D. Gale (1960) The Theory of Linear Economic Models, Chicago, IL: University of Chicago Press.
[4] C. Simon and L. Blume (1994) Mathematics for Economists, New York, NY: Norton.
[5] R. Sundaram (1996) A First Course in Optimization Theory, New York, NY: Cambridge University Press.