Three important theorems in advanced calculus

Sean Fitzpatrick
February 25, 2013
Source: Marsden, J. E. and Tromba, A. J., Vector Calculus, 4th ed. W. H. Freeman and
Company, New York, 1996.
We state three theorems of theoretical importance in multivariable calculus: the chain
rule, the implicit function theorem, and the inverse function theorem. The first two are
mentioned in Stewart's text, but not in their most general form, and the last is not mentioned
at all. We'll give a general proof of the chain rule, and state the implicit and inverse function
theorems. (The proofs can be found in more advanced texts on real analysis.)
1 The Chain Rule
For a general function $f : U \subseteq \mathbb{R}^n \to \mathbb{R}^m$, the derivative $Df(x_0)$ at a point $x_0 \in U$ is the
$m \times n$ matrix whose entries are given by the partial derivatives of $f$. That is, if
\[
f(x_1, \ldots, x_n) = (f_1(x_1, \ldots, x_n), \ldots, f_m(x_1, \ldots, x_n)),
\]
where each of the component functions $f_i$ is a real-valued function of the variables $x_1, \ldots, x_n$,
then the entry in the $i$th row and $j$th column of $Df(x_0)$ is $\dfrac{\partial f_i}{\partial x_j}(x_0)$. The chain rule tells us
that the derivative of a composition is given by the product of the derivatives, just as for
the case of single-variable functions. First, we need a fact about linear functions:
Lemma 1.1. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear function given by $T(x) = A \cdot x$, where $A = [a_{ij}]$
is an $m \times n$ matrix. Then $T$ is continuous, and in particular, $\|T(x)\| \le M\|x\|$, where
\[
M = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2}.
\]
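Proof. The components of $T$ are given by $T_i(x) = \sum_{j=1}^{n} a_{ij} x_j = a_i \cdot x$, where $a_i = \langle a_{i1}, \ldots, a_{in} \rangle$.
Thus,
\[
\begin{aligned}
\|T(x)\| &= \sqrt{T_1(x)^2 + \cdots + T_m(x)^2} \\
&= \sqrt{|a_1 \cdot x|^2 + \cdots + |a_m \cdot x|^2} \\
&\le \sqrt{\|a_1\|^2 \|x\|^2 + \cdots + \|a_m\|^2 \|x\|^2} \quad \text{(Cauchy-Schwarz inequality)} \\
&= \sqrt{(\|a_1\|^2 + \cdots + \|a_m\|^2)\,\|x\|^2} = M\|x\|.
\end{aligned}
\]
Continuity follows from linearity, since $\|T(x) - T(y)\| = \|T(x - y)\| \le M\|x - y\|$.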
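The bound in Lemma 1.1 can be spot-checked numerically. The following is a minimal sketch using only the Python standard library; the matrix `A`, the helper names, and the random sample points are illustrative choices, not part of the source text.

```python
import math
import random

def frobenius_bound(A):
    """M = square root of the sum of squares of all entries, as in Lemma 1.1."""
    return math.sqrt(sum(a * a for row in A for a in row))

def apply(A, x):
    """Matrix-vector product T(x) = A x, with A stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(c * c for c in v))

# Hypothetical 2 x 3 matrix, chosen only for illustration.
A = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.0]]
M = frobenius_bound(A)

random.seed(0)
samples = [[random.uniform(-10, 10) for _ in range(3)] for _ in range(1000)]

# The small tolerance guards against floating-point rounding only.
bound_holds = all(norm(apply(A, x)) <= M * norm(x) + 1e-12 for x in samples)
print(M, bound_holds)
```

A finite sample cannot prove the inequality, of course; it only illustrates that the constant $M$ is large enough at the points tested.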
Theorem 1.2 (Chain Rule). Let $U \subseteq \mathbb{R}^n$ and $V \subseteq \mathbb{R}^m$ be open. Let $g : U \to \mathbb{R}^m$ and
$f : V \to \mathbb{R}^p$ be given functions such that the range of $g$ is contained in $V$, so that $f \circ g$ is
defined. If $g$ is differentiable at $x_0 \in U$ and $f$ is differentiable at $y_0 = g(x_0) \in V$, then $f \circ g$
is differentiable at $x_0$ and
\[
D(f \circ g)(x_0) = Df(g(x_0))\,Dg(x_0). \tag{1}
\]
Proof. Using the definition of differentiability, we need to prove that the right-hand side of
(1) defines a linear function from $\mathbb{R}^n$ to $\mathbb{R}^p$ such that
\[
\lim_{x \to x_0} \frac{\|f(g(x)) - f(g(x_0)) - Df(y_0)Dg(x_0) \cdot (x - x_0)\|}{\|x - x_0\|} = 0.
\]
The result then follows from the uniqueness of the derivative. By adding and subtracting
$Df(y_0) \cdot (g(x) - g(x_0))$ in the numerator and applying the triangle inequality, we get, with
$y = g(x)$ and $y_0 = g(x_0)$,
\[
\begin{aligned}
\|f(y) - f(y_0) - Df(y_0)Dg(x_0) \cdot (x - x_0)\| \le{} & \|f(y) - f(y_0) - Df(y_0) \cdot (y - y_0)\| \\
& + \|Df(y_0) \cdot (g(x) - g(x_0) - Dg(x_0) \cdot (x - x_0))\|.
\end{aligned}
\]
Let $\varepsilon > 0$ be given. According to the lemma above, $\|Df(y_0) \cdot v\| \le M\|v\|$ for any $v \in \mathbb{R}^m$,
for some constant $M > 0$. We will apply this for $v = g(x) - g(x_0) - Dg(x_0) \cdot (x - x_0)$. Since
$g$ is differentiable at $x_0$, we can find a $\delta_1 > 0$ such that $0 < \|x - x_0\| < \delta_1$ implies
\[
\frac{\|g(x) - g(x_0) - Dg(x_0) \cdot (x - x_0)\|}{\|x - x_0\|} < \frac{\varepsilon}{2M}.
\]
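Also since $g$ is differentiable at $x_0$, we can find a $\delta_2 > 0$ and a constant $N$ such that
$0 < \|x - x_0\| < \delta_2$ implies $\|g(x) - g(x_0)\| \le N\|x - x_0\|$. Since $f$ is differentiable at
$y_0 = g(x_0)$, we can find a $\delta_3 > 0$ such that $0 < \|y - y_0\| < \delta_3$ implies that
\[
\|f(y) - f(y_0) - Df(y_0) \cdot (y - y_0)\| \le \frac{\varepsilon}{2N}\|y - y_0\| = \frac{\varepsilon}{2N}\|g(x) - g(x_0)\| \le \frac{\varepsilon}{2}\|x - x_0\|,
\]
provided that $\|x - x_0\| < \min\{\delta_2, \delta_3/N\}$. (If $y = y_0$, the left-hand side is zero and the inequality holds trivially.) Thus if we let $\delta = \min\{\delta_1, \delta_2, \delta_3/N\}$, then for $0 < \|x - x_0\| < \delta$ we have
\[
\frac{\|f(g(x)) - f(g(x_0)) - Df(g(x_0))Dg(x_0) \cdot (x - x_0)\|}{\|x - x_0\|} < \frac{\varepsilon}{2} + M\,\frac{\varepsilon}{2M} = \varepsilon.
\]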
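As a sanity check on formula (1), one can compare a finite-difference Jacobian of $f \circ g$ with the matrix product $Df(g(x_0))\,Dg(x_0)$. The sketch below uses only the Python standard library; the particular maps `g` and `f`, the point `x0`, and the step size `h` are assumptions made for illustration and do not come from the source.

```python
import math

def g(x):
    """Illustrative map g : R^2 -> R^3."""
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]

def f(y):
    """Illustrative map f : R^3 -> R^2."""
    y1, y2, y3 = y
    return [y1 + y2 * y3, math.exp(y1)]

def Dg(x):
    """Hand-computed 3 x 2 matrix of partial derivatives of g."""
    x1, x2 = x
    return [[x2, x1], [math.cos(x1), 0.0], [0.0, 2.0 * x2]]

def Df(y):
    """Hand-computed 2 x 3 matrix of partial derivatives of f."""
    y1, y2, y3 = y
    return [[1.0, y3, y2], [math.exp(y1), 0.0, 0.0]]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def numeric_jacobian(func, x, h=1e-6):
    """Central-difference approximation to the derivative matrix of func at x."""
    m = len(func(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = func(xp), func(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

x0 = [0.7, -1.3]
lhs = numeric_jacobian(lambda x: f(g(x)), x0)   # approximates D(f o g)(x0)
rhs = matmul(Df(g(x0)), Dg(x0))                 # Df(g(x0)) Dg(x0)
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_gap)
```

The two matrices should agree up to the finite-difference error, which is small for the step size chosen here.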
2 The Implicit and Inverse Function Theorems
Recall that for a level curve $g(x, y) = c$, we can solve for $y$ as a function of $x$ locally near a
given point $(x_0, y_0)$ on the curve, provided that the curve has a well-defined tangent line at
that point, and that tangent line is not vertical. Notice that finding $y' = dy/dx$ by implicit
differentiation is the same as finding $y'$ via the relationship
\[
\frac{\partial g}{\partial x} + \frac{\partial g}{\partial y}\frac{dy}{dx} = 0.
\]
Thus, we can solve for $y'$ provided $\dfrac{\partial g}{\partial y}(x_0, y_0) \ne 0$.
We will first state a special case that will be useful for dealing with level surfaces in $\mathbb{R}^n$
before stating the general result.
Theorem 2.1. Suppose $F : \mathbb{R}^{n+1} \to \mathbb{R}$ is continuously differentiable. Denote points in $\mathbb{R}^{n+1}$
by $(x, z)$, where $x \in \mathbb{R}^n$ and $z \in \mathbb{R}$. If at a point $(x_0, z_0) \in \mathbb{R}^{n+1}$ we have
\[
F(x_0, z_0) = 0 \quad \text{and} \quad \frac{\partial F}{\partial z}(x_0, z_0) \ne 0,
\]
then there is a ball $U$ containing $x_0$ in $\mathbb{R}^n$ and an interval $(a, b)$ containing $z_0$ in $\mathbb{R}$ such
that there is a unique function $z = g(x)$ defined for $x \in U$ and $z \in (a, b)$ that satisfies
$F(x, g(x)) = 0$. Moreover, if $x \in U$ and $z \in (a, b)$ satisfy $F(x, z) = 0$, then $z = g(x)$.
Finally, $z = g(x)$ is continuously differentiable, with the derivative given by
\[
Dg(x) = -\frac{1}{\dfrac{\partial F}{\partial z}(x, z)}\, D_x F(x, z)\Big|_{z = g(x)},
\]
where $D_x F$ denotes the matrix of partial derivatives of $F$ with respect to the variables
$x_1, \ldots, x_n$. Equivalently, we have
\[
\frac{\partial g}{\partial x_i} = -\frac{\partial F/\partial x_i}{\partial F/\partial z}, \quad i = 1, \ldots, n.
\]
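To see the derivative formula of Theorem 2.1 in action, consider the unit sphere, an example chosen here for illustration and not taken from the source: $F(x_1, x_2, z) = x_1^2 + x_2^2 + z^2 - 1$. Near a point with $z_0 > 0$ the solution is $g(x_1, x_2) = \sqrt{1 - x_1^2 - x_2^2}$. The sketch below compares the partial derivatives predicted by the theorem with a finite-difference estimate; the function names and the sample point are hypothetical.

```python
import math

def F(x1, x2, z):
    """Illustrative F : R^3 -> R whose zero set is the unit sphere."""
    return x1 ** 2 + x2 ** 2 + z ** 2 - 1.0

def g(x1, x2):
    """Explicit local solution of F(x1, x2, z) = 0 on the upper hemisphere."""
    return math.sqrt(1.0 - x1 ** 2 - x2 ** 2)

def grad_g_from_theorem(x1, x2):
    """Partials of g from the formula dg/dx_i = -(dF/dx_i) / (dF/dz) at z = g(x)."""
    z = g(x1, x2)
    F_x1, F_x2, F_z = 2.0 * x1, 2.0 * x2, 2.0 * z
    return [-F_x1 / F_z, -F_x2 / F_z]

def grad_g_numeric(x1, x2, h=1e-6):
    """Central-difference estimate of the partial derivatives of g."""
    return [(g(x1 + h, x2) - g(x1 - h, x2)) / (2 * h),
            (g(x1, x2 + h) - g(x1, x2 - h)) / (2 * h)]

x0 = (0.3, 0.4)                      # sample point with dF/dz = 2 z0 != 0
residual = F(x0[0], x0[1], g(*x0))   # should be zero up to rounding
predicted = grad_g_from_theorem(*x0)
estimated = grad_g_numeric(*x0)
gap = max(abs(p - e) for p, e in zip(predicted, estimated))
print(residual, predicted, gap)
```

The hypothesis $\partial F/\partial z \ne 0$ fails on the equator $z = 0$, which is exactly where the sphere cannot be written locally as a graph over the $(x_1, x_2)$-plane.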
Note that the theorem essentially tells us when we can solve the equation $F(x, z) = 0$ for $z$
in terms of $x$. More generally, suppose we are given a system of equations of the form
\[
\begin{aligned}
F_1(x_1, \ldots, x_n, z_1, \ldots, z_m) &= 0 \\
F_2(x_1, \ldots, x_n, z_1, \ldots, z_m) &= 0 \\
&\;\;\vdots \\
F_m(x_1, \ldots, x_n, z_1, \ldots, z_m) &= 0.
\end{aligned}
\]
The general implicit function theorem tells us that, near a point where these equations hold, we can solve the system for the $z_i$ in
terms of the $x_j$, giving $z_i = f_i(x_1, \ldots, x_n)$ for unique smooth functions $f_1, \ldots, f_m$, provided
that the determinant of the $m \times m$ matrix
\[
\begin{bmatrix}
\dfrac{\partial F_1}{\partial z_1} & \cdots & \dfrac{\partial F_1}{\partial z_m} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial F_m}{\partial z_1} & \cdots & \dfrac{\partial F_m}{\partial z_m}
\end{bmatrix}
\]
is non-zero at that point. A special case of the general implicit function theorem is when $m = n$ and
$F_i(y_1, \ldots, y_n, x_1, \ldots, x_n) = y_i - f_i(x_1, \ldots, x_n)$ (here the $x_i$ above are now the $y_i$, and the $z_i$
above are now the $x_i$, just to keep you on your toes), so that we are trying to solve the system
of equations
\[
\begin{aligned}
f_1(x_1, \ldots, x_n) &= y_1 \\
&\;\;\vdots \\
f_n(x_1, \ldots, x_n) &= y_n,
\end{aligned}
\]
which means we are trying to invert the system of equations to express the $x_i$ as functions
of the $y_j$. Note that this is equivalent to writing $y = F(x)$ for the vector-valued function
$F = \langle f_1, \ldots, f_n \rangle$ and asking for the inverse function such that $x = F^{-1}(y)$. Given such a
function $F$, we define the Jacobian $J(F)$ of $F$ as the determinant of the derivative of $F$:
\[
J(F)(x) = \det DF(x).
\]
Theorem 2.2. Let $U \subseteq \mathbb{R}^n$ be open and let $F : U \to \mathbb{R}^n$ be continuously differentiable. For
any $x_0 \in U$, if $J(F)(x_0) \ne 0$, then there is a neighbourhood $N$ of $x_0$ contained in $U$ and a
unique function $G$ that is also continuously differentiable, such that for each $x \in N$ and each
$y = F(x)$, we have $x = G(y)$. Moreover, we have
\[
DG(y) = (DF(x))^{-1}
\]
for each $x \in N$. (The $-1$ on the right-hand side denotes the matrix inverse.)
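The identity $DG(y) = (DF(x))^{-1}$ can be checked on a familiar map. As an illustrative example not taken from the source, take the polar coordinate map $F(r, \theta) = (r\cos\theta, r\sin\theta)$, whose Jacobian is $J(F) = r$, with local inverse $G(x, y) = (\sqrt{x^2 + y^2}, \operatorname{atan2}(y, x))$ near a point with $r > 0$ and $-\pi < \theta < \pi$. The helper names and the sample point below are assumptions of this sketch.

```python
import math

def F(r, theta):
    """Polar coordinate map (r, theta) -> (x, y)."""
    return (r * math.cos(theta), r * math.sin(theta))

def DF(r, theta):
    """Hand-computed derivative matrix of F."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]

def DG(x, y):
    """Hand-computed derivative matrix of the local inverse G(x, y) = (rho, atan2(y, x))."""
    rho2 = x * x + y * y
    rho = math.sqrt(rho2)
    return [[x / rho, y / rho],
            [-y / rho2, x / rho2]]

def det2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inverse2(A):
    """Inverse of a 2 x 2 matrix with non-zero determinant."""
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d],
            [-A[1][0] / d, A[0][0] / d]]

r0, theta0 = 2.0, 0.5          # sample point with J(F) = r0 != 0
x, y = F(r0, theta0)
jacobian = det2(DF(r0, theta0))
lhs = DG(x, y)                 # DG(y) evaluated at y = F(x0)
rhs = inverse2(DF(r0, theta0)) # (DF(x0))^{-1}
gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(jacobian, gap)
```

At $r = 0$ the Jacobian vanishes and the theorem gives no information, which matches the fact that the polar map is not invertible near the origin.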
Note: The inverse function theorem only applies to maps $F : \mathbb{R}^n \to \mathbb{R}^n$ where the number
of variables is equal to the number of components. Notice that even if $F$ is not defined on
all of $\mathbb{R}^n$, $DF(x)$ is for each $x$ in the domain of $F$: the domain of any linear function is all of
$\mathbb{R}^n$. The condition that the determinant of $DF(x)$ is nonzero is equivalent to requiring the
linear function $DF(x)$ to be both one-to-one and onto (and therefore invertible). Since this
condition may hold at some points $x$ in the domain of $F$ and not at others, the invertibility
of $F$ only holds locally (e.g. on the neighbourhood $N$ in the statement of the theorem).
If $F : \mathbb{R}^n \to \mathbb{R}^m$ with $m > n$, $DF(x)$ is no longer a square matrix, and therefore cannot
be invertible. In this case, the best we can ask for is that $DF(x)$ is one-to-one. If this is
true at each $x$ in the domain of $F$, then $F$ is called an immersion. An immersion is a map
that preserves structure in a sense that is made precise in more advanced courses. For
example, if $F : \mathbb{R}^2 \to \mathbb{R}^3$ is an immersion, and $C$ is a curve contained in the domain of $F$,
then the image of $C$ under $F$ will still be a curve in $\mathbb{R}^3$, and if $D$ is a region in $\mathbb{R}^2$ (such as a
disk or a rectangle), then the image of $D$ will be a surface. (Roughly speaking, $F$ preserves
the dimension of these objects: it doesn't collapse a curve to a point or a region to a curve.)
If $m < n$, then the strongest condition one can impose is that $DF(x) : \mathbb{R}^n \to \mathbb{R}^m$ be onto.
When this is the case, $F$ looks locally like a projection. Such maps are called submersions.
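For a map $F : \mathbb{R}^2 \to \mathbb{R}^3$, the condition that $DF(x)$ is one-to-one amounts to the $3 \times 2$ matrix having rank 2, that is, at least one of its $2 \times 2$ minors being non-zero. The sketch below tests this for two maps chosen here purely for illustration (neither appears in the source): the paraboloid graph $(u, v) \mapsto (u, v, u^2 + v^2)$, which passes at every point, and $(u, v) \mapsto (u^2, v^2, uv)$, which fails at the origin.

```python
def DF_graph(u, v):
    """Derivative matrix of the illustrative map (u, v) -> (u, v, u^2 + v^2)."""
    return [[1.0, 0.0],
            [0.0, 1.0],
            [2.0 * u, 2.0 * v]]

def DF_squares(u, v):
    """Derivative matrix of the illustrative map (u, v) -> (u^2, v^2, u*v)."""
    return [[2.0 * u, 0.0],
            [0.0, 2.0 * v],
            [v, u]]

def is_one_to_one(J, tol=1e-12):
    """A 3 x 2 matrix is one-to-one exactly when some 2 x 2 minor is non-zero."""
    row_pairs = [(0, 1), (0, 2), (1, 2)]
    minors = [J[i][0] * J[k][1] - J[i][1] * J[k][0] for i, k in row_pairs]
    return any(abs(m) > tol for m in minors)

graph_at_origin = is_one_to_one(DF_graph(0.0, 0.0))
squares_at_origin = is_one_to_one(DF_squares(0.0, 0.0))
squares_elsewhere = is_one_to_one(DF_squares(1.0, 1.0))
print(graph_at_origin, squares_at_origin, squares_elsewhere)
```

Since the second map fails the test at one point of its domain, it is not an immersion on all of $\mathbb{R}^2$, even though its derivative is one-to-one at the other point tested.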