
WMS

MA225
Differentiation

Revision Guide

Written by Alexander Burn


Contents
1 Continuity in Normed Vector Spaces
1.1 Norms, Metrics and Continuity
1.2 Open, Closed and Sequentially Compact Sets
1.3 Linear Maps and Continuity
2 Differentiation of functions between normed vector spaces
2.1 Defining Differentiability
2.2 The Fréchet derivative
2.3 Properties of Differentiable Functions
2.4 Sufficient Conditions for Differentiability
2.5 Properties of the Differential
2.6 Higher Order Fréchet Derivatives
3 The Implicit and Inverse Function Theorems
3.1 Partial Fréchet Derivatives
3.2 Implicit and Inverse Function Theorems
4 Manifolds and Tangent Spaces
4.1 Manifolds
4.2 Tangent Spaces
5 Critical Point Theory
5.1 Constrained Optimization and Lagrange Multipliers

Introduction
This revision guide for MA225 Differentiation has been designed as an aid to revision, not a substitute
for it. Differentiation is a fairly long course with lots of confusing definitions and big theorems; hopefully
this guide will help you make sense of how everything fits together. Only some proofs are included; the
inclusion of a particular proof is no indication of how likely it is to appear on the exam. However, the
exam format doesn’t change much from year to year, so the best way of revising is to do past exam
papers, as well as assignment questions, using this guide as a reference.
For further practice, T. W. Körner’s A Companion to Analysis is both an invaluable reference and a
source of literally hundreds of questions. For the braver of heart, Rudin’s Principles of Mathematical
Analysis is a classic, unparalleled textbook, with some of the most challenging analysis questions known
to man.

Disclaimer: Use at your own risk. No guarantee is made that this revision guide is accurate or
complete, or that it will improve your exam performance. Use of this guide will increase entropy,
contributing to the heat death of the universe. Contains no GM ingredients. Your mileage may vary.
All your base are belong to us.

Authors
Written by A. Burn (a.burn@warwick.ac.uk).
Based upon lectures given by Timothy Sullivan in 2015 at the University of Warwick.
Some material, including all of chapter one, taken from previous editions by Dave McCormick, based
upon lectures given by Vassili Gelfreich in 2007.
Any corrections or improvements should be entered into our feedback form at http://tinyurl.com/WMSGuides
(alternatively email revision.guides@warwickmaths.org).

History
First Edition: December 10, 2015. Current Edition: January 24, 2016.

1 Continuity in Normed Vector Spaces


Most of the basic material on normed vector spaces was covered in MA244 Analysis III; we repeat it
here for reference purposes, and because it could easily come up in Differentiation as well.

1.1 Norms, Metrics and Continuity


Recall that a vector space over the real numbers R is a non-empty set of elements that can be added
together and multiplied by real numbers under certain axioms. To do analysis on vector spaces, we need
something like the absolute value function on R; such a function is called a norm:
Definition 1.1. A norm on a vector space V over R is a function k · k : V → R satisfying the following
conditions:
(i) positive-definiteness: kvk = 0 ⇐⇒ v = 0 ∈ V
(ii) non-negativity: for all v ∈ V , kvk ≥ 0;
(iii) homogeneity: for all λ ∈ R and v ∈ V , kλvk = |λ|kvk;
(iv) triangle inequality: for all v, w ∈ V , kv + wk ≤ kvk + kwk.
We say (V, k · k) is a normed vector space.
Proposition 1.2. Let x = (x1 , . . . , xn ) ∈ Rn . For any 1 ≤ p < ∞, k · kp defined by

kxkp := ( |x1 |^p + · · · + |xn |^p )^{1/p}

is a norm on Rn . The case p = 2 defines the Euclidean norm. Furthermore, k · k∞ defined by
kxk∞ := max_{1≤j≤n} |xj | is also a norm on Rn ; we justify the notation by the fact that kxk∞ = lim_{p→∞} kxkp .
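These norms are easy to experiment with numerically. The following Python sketch (our own illustration, not part of the course) computes k · kp and k · k∞ and shows kxkp approaching kxk∞ as p grows:

```python
import math

def p_norm(x, p):
    """The l^p norm on R^n: (sum |x_j|^p)^(1/p)."""
    return sum(abs(xj) ** p for xj in x) ** (1.0 / p)

def sup_norm(x):
    """The l^infinity norm: max |x_j|."""
    return max(abs(xj) for xj in x)

x = [3.0, -4.0, 1.0]
print(p_norm(x, 1))   # 8.0
print(p_norm(x, 2))   # sqrt(26) ≈ 5.099
print(sup_norm(x))    # 4.0

# ||x||_p -> ||x||_inf as p -> infinity:
for p in (1, 2, 10, 100):
    print(p, p_norm(x, p))
```

For p = 100 the value is already indistinguishable from the sup norm to several decimal places.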

If we want to do analysis on a space which is not necessarily a vector space, we need a metric:
Definition 1.3. A metric or distance on a set X is a function d : X × X → R satisfying the following
conditions:
(i) positive-definiteness: d(x, y) = 0 ⇐⇒ x = y;
(ii) non-negativity: for all x ∈ X, d(x, x) ≥ 0;
(iii) symmetry: for all x, y ∈ X, d(x, y) = d(y, x);
(iv) triangle inequality: for all x, y, z ∈ X, d(x, y) ≤ d(x, z) + d(z, y).
We say (X, d) is a metric space.
Proposition 1.4. If k · k : V → R is a norm on V , then d(x, y) := kx − yk is a metric on V .
Sometimes when we use different norms, they deliver the same answers when we ask questions of
convergence and continuity. If they do, we say they are equivalent:
Definition 1.5. Let V be a real vector space and let k · k′ and k · k′′ be two norms on V . The two norms
k · k′ and k · k′′ are said to be equivalent if there exist real constants λ1 , λ2 > 0 such that for all v ∈ V ,

λ1 kvk′ ≤ kvk′′ ≤ λ2 kvk′ .

This defines an equivalence relation on the set of norms on V . We are only concerned with finite-
dimensional vector spaces, and in particular Rn . In this case¹ what norm we use does not matter,
because:
¹ In general it does matter; for instance, the sup norm and the L1 norm on the space of continuous functions C 0 ([a, b])
are not equivalent.

Theorem 1.6. On Rn , any two norms are equivalent.
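For the concrete norms of Proposition 1.2, explicit equivalence constants can be checked directly: kxk∞ ≤ kxk2 ≤ √n kxk∞ and kxk2 ≤ kxk1 ≤ √n kxk2 . A Python sketch (our own, for illustration) verifying these bounds on random vectors:

```python
import math, random

def norm1(x):
    return sum(abs(t) for t in x)

def norm2(x):
    return math.sqrt(sum(t * t for t in x))

def norm_inf(x):
    return max(abs(t) for t in x)

random.seed(0)
n = 5
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    # ||x||_inf <= ||x||_2 <= sqrt(n) ||x||_inf
    assert norm_inf(x) <= norm2(x) + 1e-9
    assert norm2(x) <= math.sqrt(n) * norm_inf(x) + 1e-9
    # ||x||_2 <= ||x||_1 <= sqrt(n) ||x||_2  (Cauchy-Schwarz for the right-hand bound)
    assert norm2(x) <= norm1(x) + 1e-9
    assert norm1(x) <= math.sqrt(n) * norm2(x) + 1e-9
print("all equivalence bounds hold")
```

The point of Theorem 1.6 is that such constants exist for any pair of norms on Rn, not just these three.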


Having a norm on a vector space allows us to do analysis in much the same way as we do on R,
simply by substituting the norm k · k for the absolute value function | · |.
Definition 1.7. A sequence of points xk ∈ Rn , k ∈ N, is said to converge to x ∈ Rn if kxk − xk → 0 as
a sequence in R. Equivalently, (xk ) converges to x if

∀ε > 0, ∃ N = N (ε) ∈ N s.t. k ≥ N =⇒ kxk − xk < ε.

Definition 1.8. Given a subset X ⊂ Rn , a function f : X → Rp is said to be continuous at x0 ∈ X, if

∀ε > 0, ∃ δ = δx0 (ε) > 0 s.t. (x ∈ X and kx − x0 k < δ) =⇒ kf (x) − f (x0 )k < ε.

If f is continuous at each x ∈ X, we say that f is continuous.


Since any two norms on Rn are equivalent, the continuity of f at x0 does not depend on the choice
of norms in Rn and Rp . Furthermore, just as in R, we can show that f is continuous at x ∈ X iff given
any sequence (xn ) → x, we have f (xn ) → f (x).
Proposition 1.9. For X ⊂ Rn , if f : X → Rp is written as f (x) = (f1 (x), f2 (x), . . . , fp (x)), then f is
continuous at x0 ∈ X if and only if every component fi is continuous at x0 .
Examples 1.10. These are some important examples of continuous functions on Rn :
1. Addition and multiplication are continuous functions R2 → R defined by (x1 , x2 ) 7→ x1 + x2 and
(x1 , x2 ) 7→ x1 x2 .
2. Composition of continuous functions is continuous. Hence sums and products of continuous func-
tions are continuous.
3. Any linear map Rn → Rp is continuous; also any polynomial map is continuous.

1.2 Open, Closed and Sequentially Compact Sets


We do not do analysis at points, we do it on sets; in R these sets are usually intervals. The natural
generalisation of an open interval is an open ball :
Definition 1.11. In Rn with norm k · k, given x0 ∈ Rn and r > 0, the open ball with centre x0 and
radius r is defined by
B(x0 , r, k · k) := {x ∈ Rn | kx − x0 k < r}.
We normally omit the k · k and just write² B(x0 , r).
In particular, an open interval (a, b) = B((a + b)/2, (b − a)/2, | · |). Even more general than an open
ball is an open subset:
Definition 1.12. A subset U ⊂ Rn is said to be open if

∀x ∈ U, ∃ r = rx > 0 s.t. B(x, r) ⊂ U.

A subset Z ⊂ Rn is closed if Rn \ Z is open.


Lemma 1.13. Z ⊂ Rn is closed if and only if, given a sequence (xk ) in Z which converges to some
point x ∈ Rn , we have x ∈ Z.
Proof. (=⇒) Suppose x ∈ Rn \ Z, which is open; then ∃ ε > 0 s.t. B(x, ε) ⊂ Rn \ Z, and hence
ky − xk < ε =⇒ y ∈ Rn \ Z, i.e. y ∈/ Z. This contradicts kxk − xk < ε for k ≥ N (ε), hence x ∈ Z.
(⇐=) Suppose Z is not closed; then Rn \ Z is not open, so there exists x ∈ Rn \ Z s.t. ∀ε > 0,
B(x, ε) ∩ Z ≠ ∅. Hence for each k ∈ N, B(x, 1/k) ∩ Z is nonempty, so pick xk ∈ B(x, 1/k) ∩ Z. Then
(xk ) is a sequence in Z but (xk ) → x ∈/ Z.
² In some texts the notation Br (x0 ) is used to refer to B(x0 , r).

The Bolzano–Weierstrass theorem states that every bounded sequence in R has a convergent subse-
quence. This is a very useful theorem, so we now search for its generalisation to Rn .
Definition 1.14. A set X is sequentially compact if every sequence in X has a convergent subsequence,
which converges to a point of X.

Theorem 1.15. A subset X ⊂ Rn is sequentially compact if and only if it is closed and bounded.
Proof. (=⇒) If (xk ) ⊂ X converges, then any subsequence must converge to the same limit. By
sequential compactness this limit lies in X, and hence by lemma 1.13 X is closed. Furthermore, if X is
not bounded, then ∀ k ∈ N, ∃ xk ∈ X s.t. kxk k > k. Hence any subsequence of (xk ) is unbounded,
and hence cannot converge, contradicting X being sequentially compact. Hence X is bounded.
(⇐=) Take any sequence (xk ) in X and write xk = (xk^1 , . . . , xk^n ) with each xk^j ∈ R. As X is bounded, (xk )
is bounded, so by the Bolzano–Weierstrass theorem there is a subsequence such that the first component
converges. Take a further subsequence of this such that the second component converges.
Repeat n times; the final subsequence converges, and since X is closed its limit lies in X.
Proposition 1.16. If X ⊂ Rn is sequentially compact and f : X → Rp is continuous, then f (X) is
sequentially compact.
Theorem 1.17 (Extreme Value Theorem). If X is sequentially compact and f : X → R is continuous,
then f is bounded and attains its bounds.

1.3 Linear Maps and Continuity


Linear maps are particularly important in many areas of mathematics, and they are vital in defining
differentials of functions of several variables. A useful technical notion is the operator norm of a linear
map Rn → Rp :
Definition 1.18. Let T : Rn → Rp be a linear map. The operator norm of T is defined by

kT kop := sup_{kxk=1} kT xk = sup_{x∈Rn , x≠0} kT xk/kxk.

It follows that for all x ∈ Rn , we have kT xk ≤ kT kop kxk.
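The supremum over the unit sphere can be explored numerically. In the sketch below (our own; with respect to k · k2 the operator norm of a diagonal matrix is its largest absolute diagonal entry), random sampling gives a lower estimate that approaches the true value from below:

```python
import math, random

def apply(T, x):
    """Apply a matrix T (list of rows) to a vector x."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in T]

def norm2(x):
    return math.sqrt(sum(t * t for t in x))

def op_norm_estimate(T, n, samples=20000):
    """Monte Carlo lower estimate of sup_{||x||=1} ||Tx|| (w.r.t. ||.||_2)."""
    random.seed(1)
    best = 0.0
    for _ in range(samples):
        x = [random.gauss(0.0, 1.0) for _ in range(n)]
        r = norm2(x)
        if r > 1e-12:
            best = max(best, norm2(apply(T, x)) / r)
    return best

# For this diagonal matrix, ||T||_op is the largest |diagonal entry|, namely 3.
T = [[3.0, 0.0], [0.0, 1.0]]
est = op_norm_estimate(T, 2)
print(est)  # close to, and never above, 3.0
```

Note the estimate is always a lower bound, since each sample gives one value of kT xk/kxk.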


Proposition 1.19. Let L(Rn , Rp ) denote the set of linear maps Rn → Rp . Then L(Rn , Rp ) is a vector
space and the operator norm k · kop is a norm on L(Rn , Rp ).

It was proved in MA244 Analysis III that a linear map is continuous if and only if its operator
norm is finite. For linear maps Rn → Rp , the operator norm is always finite, but in infinite-dimensional
spaces this is not always the case.

2 Differentiation of functions between normed vector spaces


2.1 Defining Differentiability
Recall that f : R → R is differentiable at x0 ∈ R if the derivative f ′ (x0 ) := lim_{h→0} (f (x0 + h) − f (x0 ))/h exists.
We would like to generalise this to functions f : X → Y between normed vector spaces.
This derivative, df (x), should have the following properties:
• We would like df (x) to be a linear map between X and Y, i.e.

(df (x))(αu + βv) = αdf (x)(u) + βdf (x)(v).

• In addition, we would like the operation of differentiation f 7→ df (x) to be linear,

d(αf + βg)(x) = αdf (x) + βdg(x).

This is even more important than the previous bullet point, as it is required for the derivative
to satisfy the usual algebraic rules for differentiation, such as the product rule, quotient rule,
sum rule, and so on.
• The derivative of compositions should satisfy the chain rule,

d(g ◦ f )(x) = dg(f (x)) ◦ df (x).

• If X = Rn and Y = Rm , then the matrix which represents the linear map df (x) should be the m × n
Jacobian matrix of partial derivatives of f at x, denoted Jf (x).
• Finally, we would like to be able to generalise many of the results from univariate calculus, proved
in earlier Analysis modules, such as the Mean Value Theorem, Taylor’s Theorem, etc.
Definition 2.1. Let ϕ : R → R. We say ϕ(h) = o(h) if lim_{h→0} ϕ(h)/h = 0.
More generally, if α : Rn → Rp , we say α(h) = o(h) if lim_{khk→0} kα(h)k/khk = 0.

Proposition 2.2. A function f : R → R is differentiable at x0 ∈ R if and only if there exists a ∈ R such
that

f (x0 + h) = f (x0 ) + ah + ϕ(x0 , h)

with ϕ(x0 , h) = o(h). We say f ′ (x0 ) = a.
Before we consider functions f : X → Y, we first consider the special case of f : Rn → R.
Definition 2.3 (Partial derivatives). Let f : Rn → R, and let x = (x1 , . . . , xn ) ∈ Rn . Define

∂f /∂xk (x1 , . . . , xn ) := lim_{h→0} [f (x1 , . . . , xk−1 , xk + h, xk+1 , . . . , xn ) − f (x1 , . . . , xn )]/h

if the limit exists. Alternatively, if ek = (0, . . . , 0, 1, 0, . . . , 0) is the kth standard basis vector of Rn , then,
if the partial derivative exists,

∂f /∂xk (x) = lim_{h→0} [f (x + hek ) − f (x)]/h.
Theorem 2.4. The partial derivative satisfies the sum, product and quotient rules.
Unfortunately, it is possible to have a discontinuous function for which all the partial derivatives
exist! This will motivate us to define the Fréchet derivative later on.
For example, consider the function f : R2 → R defined by

f (x, y) := xy/(x^2 + y^2 ) if (x, y) ≠ (0, 0), and f (0, 0) := 0.

This function is discontinuous at (0, 0), but the partial derivatives exist there.
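A quick numerical check of this example (a Python sketch of our own): both partials at the origin exist and equal 0 because f vanishes on the axes, yet f is constantly 1/2 on the line y = x away from the origin.

```python
def f(x, y):
    """f(x,y) = xy/(x^2+y^2) away from the origin, with f(0,0) = 0."""
    return 0.0 if (x, y) == (0.0, 0.0) else x * y / (x * x + y * y)

# Both partial derivatives at the origin exist and equal 0,
# since f vanishes identically on both coordinate axes:
h = 1e-8
print((f(h, 0.0) - f(0.0, 0.0)) / h)  # 0.0
print((f(0.0, h) - f(0.0, 0.0)) / h)  # 0.0

# But f is discontinuous at the origin: along the line y = x,
# f(t, t) = 1/2 for every t != 0, which does not tend to f(0,0) = 0.
for t in (0.1, 0.001, 1e-9):
    print(f(t, t))  # 1/2 each time
```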

More generally:
Definition 2.5 (Directional derivatives). Let U ⊆ X be an open set, and let f : U → Y. The directional
derivative of f at x0 ∈ U in direction v ∈ X is defined as

Df (x0 ; v) := lim_{h→0 in K} [f (x0 + hv) − f (x0 )]/h,

if the limit exists.


Lemma 2.6. If a function f : Rn → R has directional derivatives in all directions v at x, then it has
all partial derivatives at x, and

∂f /∂xk (x) = Df (x; ek ),

where ek is the kth standard basis element of Rn .
Definition 2.7 (Gâteaux derivative). Let X and Y be normed vector spaces, let U ⊆ X be open, let
x ∈ U , and let f : U → Y. If the directional derivative

Df (x)(v) := (d/dh) f (x + hv)|_{h=0} = lim_{h→0 in K} [f (x + hv) − f (x)]/h ∈ Y

exists for all directions v ∈ X, then f is said to be Gâteaux differentiable at x, and the function Df (x) :
X → Y is called the Gâteaux derivative of f at x.
We now generalise proposition 2.2 and use it to define differentiability for maps f : X → Y.

2.2 The Fréchet derivative


Definition 2.8 (Fréchet derivative). For an open subset U ⊆ X, a map f : U → Y is differentiable at
x ∈ U if there exists a bounded linear map Ax : X → Y such that

lim_{v→0 in X} [f (x + v) − f (x) − Ax v]/kvkX = 0.

If so, we call Ax the Fréchet derivative of f at x; it is usually denoted by df (x) or dx f .


Sometimes the Fréchet derivative df (x) will simply be referred to as the differential of f at x.
Proposition 2.9. If the Fréchet derivative df (x) exists, it is unique.
Example 2.10. A linear map T : X → Y is differentiable at every point x ∈ X and dT (x) = T .

2.3 Properties of Differentiable Functions


Proposition 2.11. Given an open subset U ⊂ X, if f : U → Y is differentiable at x ∈ U then f is
continuous at x.
Proof. Let ε > 0. Since f is Fréchet differentiable at x, there exists a δ0 > 0 such that

kf (y) − f (x) − df (x)(y − x)kY ≤ (ε/2) ky − xkX whenever ky − xkX < δ0 .

Hence, for such y, the triangle inequality gives that

kf (y) − f (x)kY ≤ kf (y) − f (x) − df (x)(y − x)kY + kdf (x)(y − x)kY
≤ (ε/2) ky − xkX + kdf (x)kop ky − xkX .

Now choose 0 < δ ≤ δ0 small enough that δ (ε/2 + kdf (x)kop ) < ε; then kf (y) − f (x)kY < ε whenever
ky − xkX < δ, so f is continuous at x.

Proposition 2.12. Given an open subset U ⊂ X, if f : U → Y is differentiable at x ∈ U , then the
directional derivative of f at x exists in every direction v ∈ X, and Df (x; v) = df (x)(v).
Proposition 2.13. Given an open subset X ⊂ Rn , if f : X → R is differentiable at x0 ∈ X, then each
of the partial derivatives exists at x0 . Moreover, if h = (h1 , . . . , hn ),

df (x0 )(h) = ∂f /∂x1 (x0 ) h1 + ∂f /∂x2 (x0 ) h2 + · · · + ∂f /∂xn (x0 ) hn .
Corollary 2.14. Given an open subset X ⊂ Rn , if f : X → Rp is differentiable at x0 ∈ X then, for
h = (h1 , . . . , hn ),

df (x0 )(h) = Jf (x0 ) h,

where Jf (x0 ) is the p × n matrix whose (i, j) entry is ∂fi /∂xj (x0 ):

⎛ ∂f1 /∂x1 (x0 )  ∂f1 /∂x2 (x0 )  · · ·  ∂f1 /∂xn (x0 ) ⎞
⎜ ∂f2 /∂x1 (x0 )  ∂f2 /∂x2 (x0 )  · · ·  ∂f2 /∂xn (x0 ) ⎟
⎜       ⋮               ⋮          ⋱          ⋮        ⎟
⎝ ∂fp /∂x1 (x0 )  ∂fp /∂x2 (x0 )  · · ·  ∂fp /∂xn (x0 ) ⎠

The p × n matrix (∂fi /∂xj )_{1≤i≤p, 1≤j≤n} is called the Jacobian matrix³ of f .

The differential df (x0 ) : Rn → Rp is a linear map which is described by the Jacobian matrix. In the
case of f : R → R, the differential is df (x0 )(h) = h f ′ (x0 ), so in that case the Jacobian matrix is just the
1 × 1 matrix (f ′ (x0 )).
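In practice a Jacobian can be approximated by finite differences, one column per coordinate direction. A Python sketch (the particular map f below is our own illustrative choice, not from the course):

```python
def f(v):
    """An illustrative map f : R^2 -> R^2, f(x,y) = (x^2 y, x + y^2)."""
    x, y = v
    return [x * x * y, x + y * y]

def jacobian_fd(f, v, h=1e-6):
    """Finite-difference approximation to the Jacobian matrix of f at v."""
    base = f(v)
    p, n = len(base), len(v)
    J = [[0.0] * n for _ in range(p)]
    for j in range(n):
        vh = list(v)
        vh[j] += h
        fh = f(vh)
        for i in range(p):
            J[i][j] = (fh[i] - base[i]) / h
    return J

v = [2.0, 3.0]
J = jacobian_fd(f, v)
# The analytic Jacobian is [[2xy, x^2], [1, 2y]], which at (2,3) is [[12, 4], [1, 6]].
print(J)
```

The approximation agrees with the analytic matrix to roughly the order of the step size h.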

2.4 Sufficient Conditions for Differentiability


While all the conditions in the previous section were necessary consequences of differentiability, they are
not sufficient conditions to ensure differentiability:
Examples 2.15. 1. The function f : R2 → R,

f (x1 , x2 ) := x1 x2 /(x1^2 + x2^2 ) if (x1 , x2 ) ≠ (0, 0), and f (0, 0) := 0,

has all partial derivatives on R2 , but f is not differentiable at 0.


2. The function f : R2 → R,

f (x1 , x2 ) := x1^2 x2 /(x1^4 + x2^2 ) if (x1 , x2 ) ≠ (0, 0), and f (0, 0) := 0,

has all directional derivatives at every x0 ∈ R2 in every direction v ∈ R2 , but f is not differentiable
at 0.
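The second example can be probed numerically (a Python sketch of our own): every directional derivative at 0 exists, yet f takes the constant value 1/2 along the parabola x2 = x1², so f is not even continuous at 0, let alone differentiable.

```python
def f(x1, x2):
    """f = x1^2 x2 / (x1^4 + x2^2) away from the origin, f(0,0) = 0."""
    return 0.0 if (x1, x2) == (0.0, 0.0) else x1 * x1 * x2 / (x1 ** 4 + x2 * x2)

# Directional derivative at 0 in direction v = (a, b):
# (f(ha, hb) - f(0,0))/h -> a^2/b when b != 0 (and 0 when b = 0).
a, b = 1.0, 2.0
for h in (1e-2, 1e-4, 1e-6):
    print((f(h * a, h * b) - f(0.0, 0.0)) / h)  # -> a^2/b = 0.5

# Yet f is not continuous at 0: along the parabola x2 = x1^2,
# f(t, t^2) = 1/2 for every t != 0.
for t in (0.1, 1e-3, 1e-6):
    print(f(t, t * t))  # ≈ 1/2 each time
```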
However, if all the partial derivatives exist and are continuous, then this is enough to guarantee
differentiability:
Theorem 2.16. Given an open subset X ⊂ Rn , let f : X → R. If all partial derivatives ∂f /∂xj (1 ≤ j ≤ n)
exist and are continuous in a neighbourhood of x ∈ X, then f is differentiable at x. Moreover, if
h = (h1 , . . . , hn ),

df (x)(h) = ∂f /∂x1 (x) h1 + ∂f /∂x2 (x) h2 + · · · + ∂f /∂xn (x) hn .
Corollary 2.17. Given an open subset X ⊂ Rn , let f : X → Rp . If all partial derivatives ∂fi /∂xj (1 ≤ i ≤ p,
1 ≤ j ≤ n) exist and are continuous on X, then f is differentiable at every point x ∈ X and the map
x 7→ df (x) (as a map X → L(Rn , Rp )) is continuous.
³ There are many notations for the Jacobian; one of the most common is ∂(f1 , . . . , fp )/∂(x1 , . . . , xn ).

2.5 Properties of the Differential


Proposition 2.18 (Linearity). Let U ⊂ X be open and let f, g : U → Y. If f and g are both differentiable
at a point x ∈ U , then for any α, β ∈ K, αf + βg is also differentiable at x and

d(αf + βg)(x) = αdf (x) + βdg(x).

Proposition 2.19 (Product and quotient rules). Let U ⊆ X be open and let f, g : U → K. If f and g
are differentiable at x ∈ U , then f g is differentiable at x and

d(f g)(x) = f (x)dg(x) + g(x)df (x).


Furthermore, if g(x) ≠ 0, then f /g is differentiable at x and

d(f /g)(x) = [g(x) df (x) − f (x) dg(x)] / g(x)^2 .

Theorem 2.20 (Chain rule). Let U ⊂ X and V ⊂ Y be open, and let f : U → Y and g : V → Z be
such that f is Fréchet differentiable at x ∈ U , and g is Fréchet differentiable at y := f (x) ∈ V . Then
g ◦ f : U → Z is Fréchet differentiable at x and

d(g ◦ f )(x) = dg(f (x)) ◦ df (x) : X → Z.
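In coordinates, the chain rule says the Jacobian of g ◦ f is the matrix product of the Jacobians. The following Python sketch (the maps f and g are our own illustrative choices) checks this with finite-difference Jacobians:

```python
def f(v):
    x, y = v
    return [x * y, x + y]          # f : R^2 -> R^2

def g(w):
    u, s = w
    return [u * u + s]             # g : R^2 -> R^1

def jac(fn, v, h=1e-6):
    """Finite-difference Jacobian of fn at v."""
    base = fn(v)
    J = [[0.0] * len(v) for _ in base]
    for j in range(len(v)):
        vh = list(v)
        vh[j] += h
        fh = fn(vh)
        for i in range(len(base)):
            J[i][j] = (fh[i] - base[i]) / h
    return J

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

v = [1.0, 2.0]
lhs = jac(lambda t: g(f(t)), v)          # Jacobian of g o f at v
rhs = matmul(jac(g, f(v)), jac(f, v))    # Jg(f(v)) * Jf(v)
print(lhs)
print(rhs)  # the two agree to finite-difference accuracy
```

Analytically both sides equal [9, 5] here, since (g ◦ f)(x, y) = x²y² + x + y.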

Theorem 2.21 (Mean value theorem). Let U ⊆ X be open, and let f : U → Y be Fréchet differentiable
everywhere in U . Let x, v ∈ X be such that the line segment L := {x + tv | t ∈ [0, 1]} lies entirely in U .
Then

f (x + v) − f (x) = [ ∫_0^1 df (x + tv) dt ] (v).

Proof. Let h(t) := x + tv and let g(t) := f (h(t)) = f (x + tv), so that h : [0, 1] → X and g : [0, 1] → Y are
continuously differentiable. By the chain rule,

g ′ (t) = (df (h(t)) ◦ dh(t))(1) = df (x + tv)(v).

Then

g(1) − g(0) = ∫_0^1 g ′ (t) dt = ∫_0^1 df (x + tv)(v) dt = [ ∫_0^1 df (x + tv) dt ] (v).
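The integral form of the mean value theorem is easy to check numerically for a concrete f : R² → R, approximating the integral with a midpoint Riemann sum. A Python sketch (the function f below is our own choice):

```python
def f(v):
    x, y = v
    return x * x * y + y ** 3

def grad(v):
    """df(v) as a row vector: (2xy, x^2 + 3y^2)."""
    x, y = v
    return [2 * x * y, x * x + 3 * y * y]

x = [1.0, 2.0]
v = [0.5, -1.0]

# Right-hand side: [ integral_0^1 df(x+tv) dt ](v), via a midpoint sum.
N = 10000
integral = 0.0
for k in range(N):
    t = (k + 0.5) / N
    g = grad([x[0] + t * v[0], x[1] + t * v[1]])
    integral += (g[0] * v[0] + g[1] * v[1]) / N

lhs = f([x[0] + v[0], x[1] + v[1]]) - f(x)
print(lhs, integral)  # the two agree up to quadrature error
```

Here both sides equal −6.75, and the midpoint rule error is far below the printed precision.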

2.6 Higher Order Fréchet Derivatives


Definition 2.22. Let X and Y be vector spaces, and let p ∈ N. A p-multilinear map is a function

A : X × X × · · · × X → Y (p copies of X)

that is linear with respect to each of the p arguments individually. The vector space of p-multilinear
maps from X to Y will be denoted Lp (X; Y).

Definition 2.23 (Hessian matrix). For f : Rn → R twice differentiable at x0 , the Hessian matrix is
the n × n matrix of second-order partial derivatives

Hf (x0 ) := ( ∂²f /∂xi ∂xj (x0 ) )_{1≤i,j≤n} ,

whose (i, j) entry is ∂²f /∂xi ∂xj (x0 ).
The Hessian matrix represents the bilinear map corresponding to the second order Fréchet derivative.
If X and Y are normed vector spaces, then Lp (X; Y) denotes the vector space of bounded (equivalently,
continuous) p-multilinear maps. These are the A ∈ Lp (X; Y) with finite norm

kAk := sup{ kA(x1 , x2 , . . . , xp )kY | kx1 kX ≤ 1, . . . , kxp kX ≤ 1 }.
We can now define higher order Fréchet derivatives inductively using multilinear maps:
Definition 2.24. Let U ⊆ X be open, and let f : U → Y. Then f is p times Fréchet differentiable if
f is Fréchet differentiable at every x ∈ U and df : U → L(X; Y) is p − 1 times Fréchet differentiable. The
space of all such functions f : U → Y with continuous pth derivative dp f := d(d(p−1) f ) : U → Lp (X; Y)
is denoted by C p (U ; Y), and C ∞ (U ; Y) := ∩p≥1 C p (U ; Y) denotes the space of infinitely differentiable
(or smooth) functions.
We will now formulate Taylor’s Theorem for the Fréchet derivative:
Theorem 2.25 (Taylor’s Theorem). Let U ⊆ X be open, and let f : U → Y be of class C p . Let x ∈ U and
v ∈ X be such that the line segment L := {x + tv | t ∈ [0, 1]} ⊆ U . Then, with v (n) := (v, v, . . . , v) ∈ Xn ,

f (x + v) = f (x) + df (x)v + (1/2!) d²f (x)v (2) + · · · + (1/(p − 1)!) d^(p−1) f (x)v (p−1) + Rp ,

where Rp is the remainder term, given by

Rp = [ ∫_0^1 ((1 − t)^(p−1) /(p − 1)!) d^p f (x + tv) dt ] (v (p) ).
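For a concrete function one can watch the remainder shrink at the expected rate. In the Python sketch below (our own choice of f, a cubic on R², so the second-order Taylor error is exactly the third-order term), halving v divides the error by 2³ = 8:

```python
def f(v):
    x, y = v
    return x * y * y + x ** 3          # a cubic, hence C^infinity

def grad(v):
    x, y = v
    return [y * y + 3 * x * x, 2 * x * y]

def hess(v):
    x, y = v
    return [[6 * x, 2 * y], [2 * y, 2 * x]]

x = [1.0, 1.0]
errs = []
for s in (1.0, 0.5, 0.25):
    v = [0.3 * s, -0.2 * s]
    g, H = grad(x), hess(x)
    quad = sum(H[i][j] * v[i] * v[j] for i in range(2) for j in range(2))
    taylor2 = f(x) + g[0] * v[0] + g[1] * v[1] + 0.5 * quad
    err = abs(f([x[0] + v[0], x[1] + v[1]]) - taylor2)
    errs.append(err)
    print(s, err)

# Halving v divides the error by 8, since the remainder is third order:
print(errs[0] / errs[1], errs[1] / errs[2])  # both 8.0 up to rounding
```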

3 The Implicit and Inverse Function Theorems


Informally, the implicit function theorem states that if x ∈ X and y ∈ Y satisfy an equation

g(x, y) = z

for some “nice enough” function g : X × Y → Z and some fixed value z, then we may write y as a
function of x in a neighbourhood of some base point (x∗ , y ∗ ) ∈ g −1 (z). That is, there are open sets U
about x∗ and V about y ∗ , and a function ϕ : U → V , such that

for (x, y) ∈ U × V, g(x, y) = z ⇐⇒ y = ϕ(x).

We say that y is an implicit function of x near (x∗ , y ∗ ). Whether the theorem can be applied is
determined by the partial derivative of g with respect to y at (x∗ , y ∗ ).
Example 3.1. A simple example is the function g(x, y) = y^2 − x^2 . In the case g(x, y) = 1,
y can be written as an implicit function of x near the points (0, 1) and (0, −1):

g(x, y) = y^2 − x^2 = 1 =⇒ y^2 = x^2 + 1,

so that y = √(x^2 + 1) in a neighbourhood of (0, 1), and y = −√(x^2 + 1) in a neighbourhood of (0, −1).

3.1 Partial Fréchet Derivatives


The partial Fréchet derivatives of a function f : X × Y → Z at the point (x∗ , y ∗ ) are the bounded
linear maps ∂x f (x∗ , y ∗ ) and ∂y f (x∗ , y ∗ ), where ∂x f (x∗ , y ∗ ) is the linear map such that

lim_{u→0 in X} [f (x∗ + u, y ∗ ) − f (x∗ , y ∗ ) − ∂x f (x∗ , y ∗ )u]/kukX = 0,

and ∂y f (x∗ , y ∗ ) is the linear map such that

lim_{v→0 in Y} [f (x∗ , y ∗ + v) − f (x∗ , y ∗ ) − ∂y f (x∗ , y ∗ )v]/kvkY = 0.

3.2 Implicit and Inverse Function Theorems


Theorem 3.2 (Implicit Function Theorem). Let X, Y, Z be Banach spaces, and let W ⊆ X × Y be an
open set. Suppose that g : W → Z is of class C 1 . Fix (x∗ , y ∗ ) ∈ W , and let z := g(x∗ , y ∗ ). If the partial
derivative ∂y g(x∗ , y ∗ ) : Y → Z is invertible, then there exist open sets U about x∗ and V about y ∗ , with
U × V ⊆ W , and a C 1 function ϕ : U → V such that

{(x, ϕ(x)) | x ∈ U } = {(x, y) ∈ U × V | g(x, y) = z}.

The Fréchet derivative of ϕ is given by

dϕ(x) = −[∂y g(x, ϕ(x))]^{−1} ◦ ∂x g(x, ϕ(x)) for all x ∈ U .

The implicit function theorem can now be applied to the example given in the introduction to this
section.

Example 3.3. Let g : R × R → R be defined by g(x, y) = y^2 − x^2 , and let (x∗ , y ∗ ) := (0, 1), so
that z := g(0, 1) = 1. In a neighbourhood W of (0, 1), g is of class C 1 . The Jacobian matrix of g is given
by

Jg(x, y) = [−2x, 2y].

The partial derivative ∂y g(x, y) = 2y is invertible exactly when y ≠ 0. At the point (0, 1) it equals 2,
so ∂y g(0, 1) is invertible. Hence the implicit function theorem applies, and y can be written as a
function of x in a neighbourhood of (0, 1).
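The conclusion can be checked against the explicit implicit function ϕ(x) = √(x² + 1) from Example 3.1: the theorem's formula dϕ(x) = −(∂y g)⁻¹ ∂x g = −(2y)⁻¹(−2x) = x/y matches a finite-difference derivative of ϕ. A Python sketch:

```python
import math

def g(x, y):
    return y * y - x * x

def phi(x):
    """The implicit function near (0, 1): g(x, phi(x)) = 1."""
    return math.sqrt(x * x + 1.0)

# The theorem's formula: dphi(x) = x / phi(x).
x0 = 0.7
formula = x0 / phi(x0)

# Compare with a central finite-difference derivative of phi:
h = 1e-7
fd = (phi(x0 + h) - phi(x0 - h)) / (2 * h)
print(formula, fd)  # the two agree

# And phi really does satisfy the constraint g(x, phi(x)) = 1:
print(g(x0, phi(x0)))  # ≈ 1.0
```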

Theorem 3.4 (Inverse Function Theorem). Let X and Y be Banach spaces, let W ⊆ X be open, let
f : W → Y be of class C 1 , and let x∗ ∈ W . If the Fréchet derivative df (x∗ ) : X → Y is invertible, then
there exist open sets U and V with

x∗ ∈ U ⊆ W and y ∗ := f (x∗ ) ∈ V ⊆ Y

such that f |U : U → V is invertible, with a ‘local inverse’ g : V → U such that

g ◦ f = idU , f ◦ g = idV and dg(y ∗ ) = (df (x∗ ))^{−1} .

The Implicit Function Theorem and the Inverse Function Theorem each imply the other, so once
one of the theorems has been proved, the other can be proved as an easy corollary.

Definition 3.5. If U ⊆ X and V ⊆ Y are open, and f : U → V is of class C k with a C k inverse


f −1 : V → U , then f is called a diffeomorphism of class C k .

4 Manifolds and Tangent Spaces

4.1 Manifolds
The implicit function theorem allows us to formulate the following definition of a manifold.

Definition 4.1. A subset M ⊆ RM is called an m-dimensional regularly embedded manifold of
smoothness C k if it can locally be expressed as the graph of a C k function of m variables.
In other words, for any p ∈ M, we can reorder the components of RM so that we can write RM =
Rm × RM −m , and find open sets U ⊆ Rm and V ⊆ RM −m with p ∈ U × V , and a C k function ϕ : U → V ,
such that

M ∩ (U × V ) = graph(ϕ) := {(x, ϕ(x)) | x ∈ U }.

Definition 4.2. Let X and Y be normed vector spaces, U ⊆ X an open set, and let f : U → Y. Then
x ∈ U is a regular point of f if df (x) is a surjective linear map.
Furthermore, y ∈ f (U ) ⊆ Y is a regular value if every x ∈ f −1 (y) is a regular point.

Theorem 4.3 (Writing manifolds as level sets). M ⊆ RM is a C k m-dimensional manifold if and only
if it can locally be written as a pre-image f −1 (y), where y is a regular value of some C k RM −m -valued
function f .

Theorem 4.4 (Manifolds via local flattenings). M ⊆ RM is a C k m-dimensional manifold if and only if,
for every p ∈ M, there are open sets W ⊆ RM about p and X ⊆ RM about 0 and a C k diffeomorphism
φ : W → X such that

φ(M ∩ W ) = {(x1 , ..., xM ) ∈ X | xm+1 = ... = xM = 0}.

4.2 Tangent Spaces


Definition 4.5. Let M ⊆ RM be any subset, and let p ∈ M. A short curve in M based at p is a
continuously differentiable function γ : (−δ, δ) → M, with γ(0) = p.

Definition 4.6. Let M be an m-dimensional submanifold of RM , and let p ∈ M. A vector v ∈ RM is
called a tangent vector to M at p if there is a short curve γ in M based at p such that γ ′ (0) = v. The
tangent space to M at p is

Tp M := {v ∈ RM | v is a tangent vector to M at p}
= {γ ′ (0) ∈ RM | γ : (−δ, δ) → M is C 1 and γ(0) = p}.

The tangent bundle of M is

T M := {(p, v) ∈ R2M | p ∈ M and v ∈ Tp M}.

Theorem 4.7. Let M ⊆ RM be a C k m-dimensional manifold, given near p ∈ M as the pre-image of a
regular value of some RM −m -valued C k function g. Then Tp M = ker dg(p), which has dimension m. That
is,

v ∈ Tp M ⇐⇒ dg(p)v = 0.
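For the sphere S² = g⁻¹(1) with g(x, y, z) = x² + y² + z², the theorem says tangent vectors at p are exactly those killed by dg(p), i.e. orthogonal to the gradient. A Python sketch (our own) that differentiates a short curve in the sphere:

```python
import math

def grad_g(p):
    """Gradient of g(x,y,z) = x^2+y^2+z^2, i.e. the 1x3 Jacobian dg(p)."""
    return [2 * p[0], 2 * p[1], 2 * p[2]]

# A short curve in S^2 based at p = (0,0,1): rotation towards the x-axis.
def gamma(t):
    return [math.sin(t), 0.0, math.cos(t)]

p = gamma(0.0)                      # (0, 0, 1)
h = 1e-7
v = [(gamma(h)[i] - gamma(-h)[i]) / (2 * h) for i in range(3)]
print(v)                            # ≈ (1, 0, 0), a tangent vector at p

# Theorem 4.7: v lies in T_p S^2 exactly when dg(p)v = 0.
print(sum(grad_g(p)[i] * v[i] for i in range(3)))  # ≈ 0
```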

5 Critical Point Theory


In univariate calculus, if f : U → R is a differentiable function defined on an open set U ⊆ R, then the
locally extremal values of the function only occur at critical points x ∈ U , defined as those points for
which f 0 (x) = 0.
These results can be generalised to the multivariate case.

Definition 5.1. Let X and Y be normed vector spaces of dimension m and n, U ⊆ X an open set, and
let f : U → Y be differentiable. A point x ∈ U is called a critical point if rank df (x) < min{m, n}, and
its image f (x) is called a critical value.

Definition 5.2. Let X be a normed vector space, U ⊆ X an open set, and let f : U → R be differentiable.
A point x∗ ∈ U is a local minimiser for f , and f (x∗ ) is a local minimum, if x∗ ∈ V ⊆ U for some open set V such that

f (x∗ ) ≤ f (x) for all x ∈ V.

Similarly, x∗ is a local maximiser for f , and f (x∗ ) is a local maximum, if x∗ ∈ V ⊆ U for some open
set V , such that
f (x∗ ) ≥ f (x) for all x ∈ V.

Theorem 5.3 (Fermat’s principle). Let X be a normed vector space, U ⊆ X an open set, and let
f : U → R. If f has a local extremum at a point x∗ ∈ U and is Fréchet differentiable there, then x∗ is a
critical point for f .

Definition 5.4. Let X be a normed vector space, U ⊂ X an open set, and let f : U → R. A critical
point x∗ ∈ U is called a saddle point if there exist short curves γ and σ in U based at x∗ such that
γ 0 (0) and σ 0 (0) are linearly independent, and such that t = 0 is a local minimiser of (f ◦ γ)(t), and a
local maximiser of (f ◦ σ)(t).

In univariate calculus, we can use second derivatives to determine the nature of a critical point. For
example, if U ⊆ R is an open set, and f : U → R has a critical point at x∗ ∈ U , then x∗ is a local
minimiser if f 00 (x∗ ) > 0, and a local maximiser if f 00 (x∗ ) < 0.
Again, these results can be generalised to multivariate calculus.

Definition 5.5. A bilinear map A : X × X → R is said to be positive definite if inf_{kuk=1} A(u, u) > 0. If
the inequality holds as ≥ rather than >, then A is said to be positive semi-definite. We define A to
be negative (semi-)definite in a similar way.

Theorem 5.6. Let U ⊆ X be open, and let f : U → R have a critical point at x∗ ∈ U . If d2 f (x∗ ) is
positive definite, then x∗ is a local minimiser of f , and if d2 f (x∗ ) is negative definite, then x∗ is a local
maximiser of f .
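For X = R² this test amounts to checking the signs of the eigenvalues of the Hessian. A Python sketch (the closed-form eigenvalue formula for a symmetric 2×2 matrix is standard):

```python
import math

def hessian_eigs(H):
    """Eigenvalues of a symmetric 2x2 matrix [[a,b],[b,c]], smallest first."""
    a, b, c = H[0][0], H[0][1], H[1][1]
    m = (a + c) / 2.0
    d = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return m - d, m + d

# f(x,y) = x^2 + y^2: critical point at 0 with Hessian 2I.
print(hessian_eigs([[2.0, 0.0], [0.0, 2.0]]))   # (2.0, 2.0): positive definite, local minimiser

# f(x,y) = x^2 - y^2: critical point at 0 with Hessian diag(2, -2).
print(hessian_eigs([[2.0, 0.0], [0.0, -2.0]]))  # (-2.0, 2.0): indefinite, a saddle point
```

Both eigenvalues positive gives a local minimiser, both negative a local maximiser, and mixed signs a saddle; a zero eigenvalue means the critical point is degenerate (Definition 5.7) and the test is inconclusive.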

Definition 5.7. Let U ⊆ Rn be open. A critical point x∗ ∈ U of f ∈ C 2 (U ; R) is called non-degenerate


if all n eigenvalues of d2 f (x∗ ) are non-zero. Otherwise, it is called degenerate.

5.1 Constrained Optimization and Lagrange Multipliers


In practice, one often searches for the extreme values of a function subject to some constraints, rather
than over its entire domain.

Theorem 5.8 (Lagrange multiplier theorem). Let X and Y be real Banach spaces, let U ⊆ X be open,
and let f : U → R and g : U → Y be C 1 . Suppose that, for some v ∈ Y, x ∈ U is a local extremiser of
f restricted to g −1 (v), and a regular point of g. Then there exists a continuous linear functional
λ : Y → R, called a Lagrange multiplier, such that

df (x) = λ ◦ dg(x).

Example 5.9. Consider the unit sphere S2 := {(x, y, z) ∈ R3 | g(x, y, z) := x2 + y 2 + z 2 = 1}. Define
a height function f : R3 → R by f (x, y, z) := z, and consider the problem of maximising this height
function over the unit sphere S2 .

The Jacobian matrices are given by

Jf (x, y, z) = [0 0 1] and Jg(x, y, z) = [2x 2y 2z].

Clearly, as Jg has full rank everywhere on S2 , every point of the unit sphere is a regular point of g.
Thus, to extremise f over S2 = g −1 (1), we need to find a scalar λ ∈ R such that

[0 0 1] = λ [2x 2y 2z].

From this, one can easily determine that z = ±1 and that λ = ±1/2. Simply evaluating f at these points
allows us to determine that f (0, 0, −1) = −1 is the minimum value of f over S2 , and that f (0, 0, 1) = 1
is the maximum value.
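This computation is easy to check numerically: at (0, 0, 1) with λ = 1/2 the residual Jf − λJg vanishes, and nearby points of the sphere are indeed no higher. A Python sketch:

```python
import math

# f(x,y,z) = z, g(x,y,z) = x^2 + y^2 + z^2, constraint g = 1.
def Jf(p):
    return [0.0, 0.0, 1.0]

def Jg(p):
    return [2 * p[0], 2 * p[1], 2 * p[2]]

p = [0.0, 0.0, 1.0]            # the maximiser found above
lam = 0.5
print([Jf(p)[i] - lam * Jg(p)[i] for i in range(3)])   # [0.0, 0.0, 0.0]

# Sanity check: nearby points of the sphere have height at most 1.
for theta in (0.01, 0.1, 0.5):
    q = [math.sin(theta), 0.0, math.cos(theta)]        # still satisfies g = 1
    print(q[2] <= p[2])                                # True
```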
