MA225
Differentiation
Revision Guide
WMS
Contents
1 Continuity in Normed Vector Spaces
1.1 Norms, Metrics and Continuity
1.2 Open, Closed and Sequentially Compact Sets
1.3 Linear Maps and Continuity
Introduction
This revision guide for MA225 Differentiation has been designed as an aid to revision, not a substitute
for it. Differentiation is a fairly long course with lots of confusing definitions and big theorems; hopefully
this guide will help you make sense of how everything fits together. Only some proofs are included; the
inclusion of a particular proof is no indication of how likely it is to appear on the exam. However, the
exam format doesn’t change much from year to year, so the best way of revising is to do past exam
papers, as well as assignment questions, using this guide as a reference.
For further practice, T. W. Körner’s A Companion to Analysis is both an invaluable reference and a
source of literally hundreds of questions. For the braver of heart, Rudin's Principles of Mathematical
Analysis is a classic, unparalleled textbook, with some of the most challenging analysis questions known
to man.
Disclaimer: Use at your own risk. No guarantee is made that this revision guide is accurate or
complete, or that it will improve your exam performance. Use of this guide will increase entropy,
contributing to the heat death of the universe. Contains no GM ingredients. Your mileage may vary.
All your base are belong to us.
Authors
Written by A. Burn (a.burn@warwick.ac.uk).
Based upon lectures given by Timothy Sullivan in 2015 at the University of Warwick.
Some material, including all of chapter one, taken from previous editions by Dave McCormick, based
upon lectures given by Vassili Gelfreich in 2007.
Any corrections or improvements should be entered into our feedback form at http://tinyurl.com/WMSGuides
(alternatively email revision.guides@warwickmaths.org).
History
First Edition: December 10, 2015. Current Edition: January 24, 2016.
If we want to do analysis on a space which is not necessarily a vector space, we need a metric:
Definition 1.3. A metric or distance on a set X is a function d : X × X → R satisfying the following
conditions:
(i) positive-definiteness: d(x, y) = 0 ⇐⇒ x = y;
(ii) non-negativity: for all x, y ∈ X, d(x, y) ≥ 0;
(iii) symmetry: for all x, y ∈ X, d(x, y) = d(y, x);
(iv) triangle inequality: for all x, y, z ∈ X, d(x, y) ≤ d(x, z) + d(z, y).
We say (X, d) is a metric space.
Proposition 1.4. If ‖·‖ : V → R is a norm on V, then d(x, y) := ‖x − y‖ is a metric on V.
Sometimes when we use different norms, they deliver the same answers when we ask questions of
convergence and continuity. If they do, we say they are equivalent:
Definition 1.5. Let V be a real vector space and let ‖·‖′ and ‖·‖″ be two norms on V. The two norms
‖·‖′ and ‖·‖″ are said to be equivalent if there exist real constants λ₁, λ₂ > 0 such that for all v ∈ V,
λ₁‖v‖′ ≤ ‖v‖″ ≤ λ₂‖v‖′.
This defines an equivalence relation on the set of norms on V. We are only concerned with finite-dimensional
vector spaces, and in particular Rⁿ. In this case¹ what norm we use does not matter, because:
¹ In general it does matter; for instance, the sup norm and the L¹ norm on the space of continuous functions C⁰([a, b])
are not equivalent.
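Since this guide only works in Rⁿ, it is worth seeing equivalence constants concretely. The following sketch (plain Python; the ℓ¹, ℓ² and ℓ∞ norms are an illustrative choice) checks the chain of bounds ‖v‖∞ ≤ ‖v‖₂ ≤ ‖v‖₁ ≤ n‖v‖∞ on random vectors in R⁵:

```python
import math
import random

def norm_inf(v): return max(abs(x) for x in v)
def norm_2(v): return math.sqrt(sum(x * x for x in v))
def norm_1(v): return sum(abs(x) for x in v)

# On R^n the three classical norms are equivalent; one concrete chain of
# equivalence constants is  ||v||_inf <= ||v||_2 <= ||v||_1 <= n * ||v||_inf.
random.seed(0)
for _ in range(1000):
    v = [random.uniform(-10, 10) for _ in range(5)]
    assert norm_inf(v) <= norm_2(v) + 1e-12
    assert norm_2(v) <= norm_1(v) + 1e-12
    assert norm_1(v) <= 5 * norm_inf(v) + 1e-12
print("equivalence bounds hold")
```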
The Bolzano–Weierstrass theorem states that every bounded sequence in R has a convergent subsequence.
This is a very useful theorem, so we now search for its generalisation to Rⁿ.
Definition 1.14. A set X is sequentially compact if every sequence in X has a subsequence which
converges to a point of X.
Theorem 1.15. A subset X ⊂ Rn is sequentially compact if and only if it is closed and bounded.
Proof. (⟹) If (x_k)_{k=1}^∞ ⊂ X converges, then any subsequence must converge to the same limit. By
sequential compactness this limit lies in X, and hence by lemma 1.13 X is closed. Furthermore, if X is
not bounded, then for all k ∈ N there exists x_k ∈ X such that ‖x_k‖ > k. Hence no subsequence of
(x_k)_{k=1}^∞ is bounded, so no subsequence can converge, contradicting X being sequentially compact.
Hence X is bounded.
(⟸) Take any sequence (x_k)_{k=1}^∞ in X and write x_k = (x_k^1, . . . , x_k^n) for x_k ∈ Rⁿ. As X is
bounded, (x_k^1) is bounded, so by the Bolzano–Weierstrass theorem there is a subsequence (x_{k_j}) such
that the first component (x_{k_j}^1) converges. Now take a subsequence of (x_{k_j}) such that the second
component converges. Repeat n times; the last subsequence converges in every component, hence converges;
since X is closed this limit lies in X.
Proposition 1.16. If X ⊂ Rⁿ is sequentially compact and f : X → Rᵖ is continuous, then f(X) is
sequentially compact.
Theorem 1.17 (Extreme Value Theorem). If X is sequentially compact and f : X → R is continuous,
then f is bounded and attains its bounds.
The operator norm of a linear map T : Rⁿ → Rᵖ is defined by
‖T‖_op := sup_{‖x‖=1} ‖Tx‖ = sup_{x ∈ Rⁿ, x ≠ 0} ‖Tx‖ / ‖x‖.
It was proved in MA244 Analysis III that a linear map is continuous if and only if its operator
norm is finite. For linear maps Rⁿ → Rᵖ, the operator norm is always finite, but in infinite-dimensional
spaces this is not always the case.
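The supremum in the definition can be explored numerically. This sketch estimates the operator norm of an assumed example matrix by sampling random unit vectors, which gives a lower bound that approaches the true supremum:

```python
import math
import random

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def op_norm_estimate(A, n, samples=20000):
    """Estimate sup_{||x||=1} ||Ax|| by sampling random unit vectors."""
    random.seed(1)
    best = 0.0
    for _ in range(samples):
        x = [random.gauss(0, 1) for _ in range(n)]
        r = math.sqrt(sum(xi * xi for xi in x))
        x = [xi / r for xi in x]
        y = matvec(A, x)
        best = max(best, math.sqrt(sum(yi * yi for yi in y)))
    return best

# For a diagonal matrix the operator norm is the largest |entry|: here 3.
A = [[3.0, 0.0], [0.0, 1.0]]
est = op_norm_estimate(A, 2)
print(est)  # close to (and never above) 3
```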
This is even more important than the previous bullet point, as it is required for the derivative
to satisfy all of the usual algebraic rules for differentiation, such as the product rule,
quotient rule, sum rule, and so on.
• The derivative of compositions should satisfy the chain rule.
• If X = Rⁿ and Y = Rᵐ, then the matrix which represents the linear map df(x) should be the m × n
Jacobian matrix of partial derivatives of f at x, denoted Jf(x).
• Finally, we would like to be able to generalise many of the results from univariate calculus, proved
in earlier Analysis modules, such as the Mean Value Theorem, Taylor’s Theorem, etc.
Definition 2.1. Let ϕ : R → R. We say ϕ(h) = o(h) if lim_{h→0} ϕ(h)/h = 0.
More generally, if α : Rⁿ → Rᵖ, we say α(h) = o(h) if lim_{‖h‖→0} ‖α(h)‖/‖h‖ = 0.
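The defining ratio can be watched converging to zero. A sketch with the illustrative remainder α(h) = h², which is o(h):

```python
# The remainder alpha(h) = h**2 is o(h): the ratio |alpha(h)| / |h| -> 0
# as h -> 0.  Evaluate the ratio along h = 10^-1, 10^-2, ..., 10^-6.
def alpha(h):
    return h * h

ratios = [abs(alpha(10.0 ** -k)) / (10.0 ** -k) for k in range(1, 7)]
print(ratios)  # each ratio is 10x smaller than the last
```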
This function is discontinuous at (0, 0), but the partial derivatives exist there.
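A standard function with this behaviour (used here as an illustrative stand-in, since the guide's own formula may differ) is f(x, y) = xy/(x² + y²) for (x, y) ≠ (0, 0), with f(0, 0) := 0:

```python
# Illustrative (assumed) example: f(x, y) = xy / (x^2 + y^2) off the origin,
# f(0, 0) = 0.  Both partials exist at the origin, yet f is discontinuous there.
def f(x, y):
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y / (x * x + y * y)

# f vanishes on both coordinate axes, so both partial derivatives at the
# origin exist and equal 0:
h = 1e-8
dfdx = (f(h, 0.0) - f(0.0, 0.0)) / h  # = 0
dfdy = (f(0.0, h) - f(0.0, 0.0)) / h  # = 0

# Yet along the line y = x, f is identically 1/2, so f is discontinuous at 0.
print(dfdx, dfdy, f(1e-9, 1e-9))  # 0.0 0.0 0.5
```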
More generally:
Definition 2.5 (Directional derivatives). Let U ⊆ X be an open set, and let f : U → Y. The directional
derivative of f at x₀ ∈ U in direction v ∈ X is defined as
Df(x₀; v) := lim_{h→0} (f(x₀ + hv) − f(x₀))/h,
the limit being taken in Y. If this limit exists for all directions v ∈ X, then f is said to be Gâteaux
differentiable at x₀, and the function Df(x₀) : X → Y is called the Gâteaux derivative of f at x₀.
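The limit in the definition is easy to approximate by its difference quotient. A sketch for the illustrative function f(x, y) = x² + 3y:

```python
# Approximate Df(x; v) = lim_{h->0} (f(x + h v) - f(x)) / h
# for the illustrative function f(x, y) = x**2 + 3*y.
def f(p):
    x, y = p
    return x * x + 3 * y

def directional_derivative(f, x, v, h=1e-6):
    xp = [xi + h * vi for xi, vi in zip(x, v)]
    return (f(xp) - f(x)) / h

# At x = (1, 0) in direction v = (1, 2):
# grad f = (2x, 3) = (2, 3), so Df(x; v) = 2*1 + 3*2 = 8.
d = directional_derivative(f, [1.0, 0.0], [1.0, 2.0])
print(d)  # approximately 8
```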
We now generalise proposition 2.2 and use it to define differentiability for functions f : X → Y.
The differential df(x₀) : Rⁿ → Rᵖ is a linear map which is described by the Jacobian matrix. In the
case of f : R → R, the differential is df(x₀)(h) = hf′(x₀), so in that case the Jacobian matrix is just the
1 × 1 matrix (f′(x₀)).
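The Jacobian can be approximated column by column with difference quotients. A sketch for the illustrative map f(x, y) = (x², xy), whose exact Jacobian is [[2x, 0], [y, x]]:

```python
# Finite-difference Jacobian of an illustrative map f : R^2 -> R^2,
# f(x, y) = (x**2, x*y), whose exact Jacobian is [[2x, 0], [y, x]].
def f(p):
    x, y = p
    return [x * x, x * y]

def jacobian(f, x, h=1e-6):
    """Forward-difference Jacobian: column j comes from perturbing x[j]."""
    fx = f(x)
    J = []
    for i in range(len(fx)):
        row = []
        for j in range(len(x)):
            xp = list(x)
            xp[j] += h
            row.append((f(xp)[i] - fx[i]) / h)
        J.append(row)
    return J

J = jacobian(f, [2.0, 3.0])
print(J)  # approximately [[4, 0], [3, 2]]
```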
Proposition 2.19 (Product and quotient rules). Let U ⊆ X be open and let f, g : U → K. If f and g
are differentiable at x ∈ U, then fg is differentiable at x and
d(fg)(x) = f(x) dg(x) + g(x) df(x).
Theorem 2.20 (Chain rule). Let U ⊂ X and V ⊂ Y be open, and let f : U → Y and g : V → Z be
such that f is Fréchet differentiable at x ∈ U, and g is Fréchet differentiable at y := f(x) ∈ V. Then
g ∘ f : U → Z is Fréchet differentiable at x and
d(g ∘ f)(x) = dg(f(x)) ∘ df(x).
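In coordinates the chain rule says J_{g∘f}(x) = J_g(f(x)) J_f(x); this can be checked numerically for the illustrative maps f(x, y) = (xy, x + y) and g(u, v) = u² + v:

```python
# Numerical check of the chain rule J_{g o f}(x) = J_g(f(x)) J_f(x)
# for the illustrative maps f(x, y) = (x*y, x + y) and g(u, v) = u**2 + v.
def f(p):
    x, y = p
    return [x * y, x + y]

def g(p):
    u, v = p
    return [u * u + v]

def jac(func, x, m, h=1e-6):
    """Forward-difference m-by-len(x) Jacobian of func at x."""
    f0 = func(x)
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp = list(x)
        xp[j] += h
        fj = func(xp)
        for i in range(m):
            J[i][j] = (fj[i] - f0[i]) / h
    return J

x = [2.0, 3.0]
Jf = jac(f, x, 2)                 # Jacobian of f at x
Jg = jac(g, f(x), 1)              # Jacobian of g at f(x)
rhs = [sum(Jg[0][k] * Jf[k][j] for k in range(2)) for j in range(2)]
lhs = jac(lambda p: g(f(p)), x, 1)[0]   # Jacobian of g o f directly
print(lhs, rhs)  # both approximately [37.0, 25.0]
```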
Theorem 2.21 (Mean value theorem). Let U ⊆ X be open, and let f : U → Y be Fréchet differentiable
everywhere in U. Let x, v ∈ X be such that the line segment L := {x + tv | t ∈ [0, 1]} lies entirely in U.
Then
f(x + v) − f(x) = [∫₀¹ df(x + tv) dt](v).
Proof. Let h(t) := x + tv and let g(t) := f(h(t)) = f(x + tv), so that h : [0, 1] → X and g : [0, 1] → Y are
continuously differentiable. By the chain rule,
g′(t) = df(h(t))(h′(t)) = df(x + tv)(v).
Then
g(1) − g(0) = ∫₀¹ g′(t) dt = ∫₀¹ df(x + tv)(v) dt = [∫₀¹ df(x + tv) dt](v).
Since g(1) = f(x + v) and g(0) = f(x), this is the claimed identity.
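The integral form of the mean value theorem can be verified numerically: for the illustrative function f(x, y) = x²y, a Riemann sum of df(x + tv)(v) over [0, 1] should reproduce f(x + v) − f(x):

```python
# Check f(x + v) - f(x) = [ integral_0^1 df(x + t v) dt ] (v)
# for the illustrative f(x, y) = x**2 * y, along the segment from
# x = (1, 1) with v = (1, 2).  Here grad f = (2xy, x**2).
def f(p):
    x, y = p
    return x * x * y

def grad_f(p):
    x, y = p
    return [2 * x * y, x * x]

x, v = [1.0, 1.0], [1.0, 2.0]

# Midpoint-rule approximation of the integral of df(x + t v)(v) over [0, 1].
N = 100000
integral = 0.0
for k in range(N):
    t = (k + 0.5) / N
    p = [x[0] + t * v[0], x[1] + t * v[1]]
    gp = grad_f(p)
    integral += (gp[0] * v[0] + gp[1] * v[1]) / N

print(integral, f([2.0, 3.0]) - f(x))  # both approximately 11
```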
A : X × X × ⋯ × X → Y  (p copies of X)
that is linear with respect to each of the p arguments individually. The vector space of p-multilinear
maps from X to Y will be denoted Lp (X; Y).
The Hessian matrix represents the bilinear map corresponding to the second order Fréchet derivative.
If X and Y are normed vector spaces, then Lᵖ(X; Y) denotes the vector space of bounded (equivalently,
continuous) multilinear maps. These are A ∈ Lᵖ(X; Y) with finite norm
‖A‖ := sup_{‖x₁‖ = ⋯ = ‖x_p‖ = 1} ‖A(x₁, . . . , x_p)‖.
We can now define higher order Fréchet derivatives inductively using multilinear maps:
Definition 2.24. Let U ⊆ X be open, and let f : U → Y. Then f is p times Fréchet differentiable if
f is Fréchet differentiable at every x ∈ U and df : U → L(X; Y) is p − 1 times Fréchet differentiable. The
space of all such functions f : U → Y with continuous pth derivative dᵖf := d(d^{p−1}f) : U → Lᵖ(X; Y)
is denoted by Cᵖ(U; Y), and C^∞(U; Y) := ⋂_{p≥1} Cᵖ(U; Y) denotes the space of infinitely differentiable
(or smooth) functions.
We will now formulate Taylor’s Theorem for the Fréchet derivative:
Theorem 2.25 (Taylor's Theorem). Let U ⊆ X be open, and let f : U → Y be of class Cᵖ. Let x ∈ U and
v ∈ X be such that the line segment L := {x + tv | t ∈ [0, 1]} ⊆ U. Then, with v⁽ⁿ⁾ := (v, v, . . . , v) ∈ Xⁿ,
f(x + v) = f(x) + df(x)v + (1/2!) d²f(x)v⁽²⁾ + ⋯ + (1/(p − 1)!) d⁽ᵖ⁻¹⁾f(x)v⁽ᵖ⁻¹⁾ + Rₚ.
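In one dimension this reduces to the familiar Taylor expansion, with the remainder Rₚ shrinking like ‖v‖ᵖ. A sketch with f = exp at x = 0 (an illustrative choice) and p = 4:

```python
import math

# 1-D instance of Taylor's theorem: for f = exp at x = 0,
# f(v) = 1 + v + v**2/2! + v**3/3! + R_4, and R_4 = O(v**4).
def taylor_poly(v, p):
    """Taylor polynomial of exp at 0 of degree p - 1."""
    return sum(v ** k / math.factorial(k) for k in range(p))

p = 4
for v in [0.1, 0.01]:
    R = math.exp(v) - taylor_poly(v, p)
    print(v, R / v ** p)  # ratio stays bounded, near 1/p! = 1/24
```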
Suppose two variables are related by an equation
g(x, y) = z
for some "nice enough" function g : X × Y → Z, and for some fixed value z. Then we may write y as a
function of x in a neighbourhood of some base point (x∗, y∗) ∈ g⁻¹(z). That is, there are open sets U
about x∗, V about y∗, and a function ϕ : U → V such that
g(x, ϕ(x)) = z for all x ∈ U.
We say that y is an implicit function of x near (x∗, y∗). Whether the implicit function theorem can be
applied will be determined by the partial derivative of g with respect to y at (x∗, y∗).
Example 3.1. A simple example is to consider the function g(x, y) = y² − x². In the case g(x, y) = 1,
y can be written as an implicit function of x near the points (0, 1) and (0, −1):
g(x, y) = y² − x² = 1 ⟹ y² = x² + 1,
so that y = √(x² + 1) in a neighbourhood of (0, 1), and y = −√(x² + 1) in a neighbourhood of (0, −1).
The implicit function theorem can now be applied to the example given in the introduction to this
section. For g(x, y) = y² − x², the Jacobian is
Jg(x, y) = (−2x  2y),
which has full rank providing that at least one of the columns is non-zero. At the point (0, 1),
the second column is non-zero, and so the partial derivative ∂y g(0, 1) is invertible. This proves that the
implicit function theorem applies, and that y can be written as a function of x in a neighbourhood of
(0, 1).
Theorem 3.4 (Inverse Function Theorem). Let X and Y be Banach spaces, let W ⊆ X be open, let
f : W → Y be C¹, and let x∗ ∈ W. If the Fréchet derivative df(x∗) : X → Y is invertible, then there exist
open sets U and V with
x∗ ∈ U ⊆ W and y∗ := f(x∗) ∈ V ⊆ Y
such that f|_U : U → V is invertible, with a 'local inverse' g : V → U such that
g(f(x)) = x for all x ∈ U, and f(g(y)) = y for all y ∈ V.
The Implicit Function Theorem is implied by the Inverse Function Theorem, and vice versa, so once
one of the theorems has been proved, the other can be proved as an easy corollary.
4.1 Manifolds
The implicit function theorem allows us to formulate the following definition of a manifold.
Definition 4.1. A set M ⊆ R^M is a C^k m-dimensional manifold if, for every point p ∈ M, there are
(after a permutation of the coordinates of R^M) open sets U ⊆ Rᵐ and V ⊆ R^{M−m} with p ∈ U × V,
and a C^k function ϕ : U → V, such that
M ∩ (U × V) = graph(ϕ) := {(x, ϕ(x)) | x ∈ U}.
Definition 4.2. Let X and Y be normed vector spaces, U ⊆ X an open set, and let f : U → Y. Then
x ∈ U is a regular point of f if df (x) is a surjective linear map.
Furthermore, y ∈ f (U ) ⊆ Y is a regular value if every x ∈ f −1 (y) is a regular point.
Theorem 4.3 (Writing manifolds as level sets). M ⊆ R^M is a C^k m-dimensional manifold if and only
if it can locally be written as a pre-image f⁻¹(y), where y is a regular value of some C^k R^{M−m}-valued
function f.
Theorem 4.4 (Manifolds via local flattenings). M ⊆ R^M is a C^k m-dimensional manifold if and only if,
for every p ∈ M, there are open sets W ⊆ R^M about p and X ⊆ R^M about 0 and a C^k diffeomorphism
φ : W → X such that
φ(M ∩ W) = X ∩ (Rᵐ × {0}).
The tangent space to M at a point p ∈ M is defined as
T_pM := {v ∈ R^M | v is a tangent vector to M at p}
= {γ′(0) ∈ R^M | γ : (−δ, δ) → M is C¹ and γ(0) = p}.
Definition 5.1. Let X and Y be normed vector spaces of dimension m and n, U ⊆ X an open set, and
let f : U → Y be differentiable. A point x ∈ U is called a critical point if rank df (x) < min{m, n}, and
its image f (x) is called a critical value.
Definition 5.2. Let X be a normed vector space, and let f : U → R be differentiable. A point x∗ is a
local minimiser for f, and f(x∗) is a local minimum, if x∗ ∈ V ⊆ U for some open set V, such that
f(x∗) ≤ f(x) for all x ∈ V.
Similarly, x∗ is a local maximiser for f, and f(x∗) is a local maximum, if x∗ ∈ V ⊆ U for some open
set V, such that
f(x∗) ≥ f(x) for all x ∈ V.
Theorem 5.3 (Fermat’s principle). Let X be a normed vector space, U ⊆ X an open set, and let
f : U → R. If f has a local extremum and is Fréchet differentiable at a point x∗ ∈ U , then x∗ is a critical
point for f .
Definition 5.4. Let X be a normed vector space, U ⊂ X an open set, and let f : U → R. A critical
point x∗ ∈ U is called a saddle point if there exist short curves γ and σ in U based at x∗ such that
γ 0 (0) and σ 0 (0) are linearly independent, and such that t = 0 is a local minimiser of (f ◦ γ)(t), and a
local maximiser of (f ◦ σ)(t).
In univariate calculus, we can use second derivatives to determine the nature of a critical point. For
example, if U ⊆ R is an open set, and f : U → R has a critical point at x∗ ∈ U, then x∗ is a local
minimiser if f″(x∗) > 0, and a local maximiser if f″(x∗) < 0.
Again, these results can be generalised to multivariate calculus.
Definition 5.5. A bilinear map A : X × X → R is said to be positive definite if inf_{‖u‖=1} A(u, u) > 0. If
the inequality holds as ≥ rather than >, then A is said to be positive semi-definite. We define A to
be negative (semi-)definite in a similar way.
Theorem 5.6. Let U ⊆ X be open, and let f : U → R have a critical point at x∗ ∈ U. If d²f(x∗) is
positive definite, then x∗ is a local minimiser of f, and if d²f(x∗) is negative definite, then x∗ is a local
maximiser of f.
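In Rⁿ, d²f(x∗) is represented by the Hessian matrix, so theorem 5.6 becomes a definiteness check on a symmetric matrix. A sketch for the illustrative f(x, y) = x² + 3y², using finite differences and Sylvester's criterion for 2 × 2 matrices:

```python
# Second-derivative test in R^2 for the illustrative f(x, y) = x**2 + 3*y**2,
# which has a critical point at the origin.  Its exact Hessian is [[2, 0], [0, 6]].
def f(p):
    x, y = p
    return x * x + 3 * y * y

def hessian_2d(f, p, h=1e-4):
    """Central-difference 2x2 Hessian of f at p."""
    x, y = p
    fxx = (f([x + h, y]) - 2 * f([x, y]) + f([x - h, y])) / h ** 2
    fyy = (f([x, y + h]) - 2 * f([x, y]) + f([x, y - h])) / h ** 2
    fxy = (f([x + h, y + h]) - f([x + h, y - h])
           - f([x - h, y + h]) + f([x - h, y - h])) / (4 * h ** 2)
    return [[fxx, fxy], [fxy, fyy]]

H = hessian_2d(f, [0.0, 0.0])
# Sylvester's criterion for a symmetric 2x2 matrix:
# positive definite iff H[0][0] > 0 and det(H) > 0.
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
print(H[0][0] > 0 and det > 0)  # True: the origin is a local minimiser
```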
Theorem 5.8 (Lagrange multiplier theorem). Let X and Y be real Banach spaces, let U ⊆ X be open,
and let f : U → R and g : U → Y be C¹. Suppose that x ∈ U is a local extremiser of f in g⁻¹(v), and
a regular point of g. Then there exists a continuous linear functional λ : Y → R, called a Lagrange
multiplier, such that
df(x) = λ ∘ dg(x).
Example 5.9. Consider the unit sphere S2 := {(x, y, z) ∈ R3 | g(x, y, z) := x2 + y 2 + z 2 = 1}. Define
a height function f : R3 → R by f (x, y, z) := z, and consider the problem of maximising this height
function over the unit sphere S2 .
Here the Jacobian is Jg(x, y, z) = (2x 2y 2z). Clearly, as Jg has full rank everywhere on S² (the
constraint forces (x, y, z) ≠ 0), every point of the unit sphere is a regular point of g.
Thus, to extremise f over S² = g⁻¹(1), we need to find a scalar λ ∈ R such that
(0 0 1) = λ (2x 2y 2z).
From this, one can easily determine that x = y = 0, z = ±1 and λ = ±1/2. Simply evaluating f at these
points allows us to determine that f(0, 0, −1) = −1 is the minimum value of f over S², and that f(0, 0, 1) = 1
is the maximum value.
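The Lagrange condition from this example is easy to verify directly at the maximiser (0, 0, 1), where ∇f = (0, 0, 1), ∇g = (0, 0, 2) and λ = 1/2:

```python
# Verify the Lagrange condition from example 5.9 at the maximiser (0, 0, 1):
# grad f = (0, 0, 1) and grad g = (2x, 2y, 2z) = (0, 0, 2), so lambda = 1/2.
def grad_f(x, y, z):
    return [0.0, 0.0, 1.0]

def grad_g(x, y, z):
    return [2 * x, 2 * y, 2 * z]

p = (0.0, 0.0, 1.0)
lam = 0.5
gf, gg = grad_f(*p), grad_g(*p)
print(all(a == lam * b for a, b in zip(gf, gg)))  # True: df = lambda * dg
# And p lies on the constraint: g(0, 0, 1) = 0**2 + 0**2 + 1**2 = 1.
```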