Advanced Calculus: Lecture Notes


for Mathematics 217-317
James S. Muldowney
Department of Mathematical and Statistical Sciences
The University of Alberta
Edmonton, Alberta, Canada
January 10, 2010

Contents

Preface

1 The Real Number System & Finite Dimensional Cartesian Space
  1.1 The Real Number System R
    1.1.1 Fields
    1.1.2 Ordered Fields
    Exercises
    1.1.3 Complete Ordered Field
    1.1.4 Properties of R
    Exercises
    Exercises
  1.2 Cartesian Spaces
    1.2.1 Functions
    Exercises
    1.2.2 Convexity
    Exercises
  1.3 Topology
    Exercises

2 Limits, Continuity, and Differentiation
  2.1 Sequences
    Exercises
    Exercises
  2.2 Continuity
  2.3 Global Properties of Continuous Functions
    Exercises
  2.4 Uniform Continuity
    Exercises
  2.5 Limits
  2.6 Differentiation of real valued functions of a real variable
    Exercises

3 Riemann Integration
  3.1 Content and the Riemann Integral
    3.1.1 Partition of I
    3.1.2 Riemann Sums
    Exercises
  3.2 Cauchy Criteria and Properties of Integrals
    Exercises
  3.3 Evaluation of Integrals
    3.3.1 Real valued functions of a real variable
    3.3.2 Real valued functions on R^2
    3.3.3 Real valued functions on R^n
    Exercises

4 Differentiation of Functions of Several Variables
  4.1 Preliminaries
    4.1.1 Linear Functions
    Exercises
    4.1.2 Straight Lines and Curves
  4.2 The Directional Derivative and the Differential
    Exercises
    Exercises
    4.2.1 Differentiation Rules
    Exercises
  4.3 Partial Derivatives of Higher Order
    4.3.1 Min-Max Theory
    Exercises
  4.4 Local Properties of C^1 Functions
    Exercises
  4.5 Implicit Function Theorem
    Exercises
    4.5.1 Dimension
    4.5.2 Application: Lagrange Multipliers
    Exercises

5 Further Topics in Integration
  5.1 Changes of Variables in Integrals
    Exercises
  5.2 Integration on Curves and Surfaces
    5.2.1 Curves (1-surfaces)
    5.2.2 Surfaces (2-surfaces)
    Exercises


Preface
These notes are not intended as a textbook. It is hoped however that they
will minimize the amount of notetaking activity which occupies so much of a student's class time in most courses in mathematics. Since the material is presented in
the sterile "definition, theorem, proof" form without much background colour or
discussion, most students will find it profitable to use the notes in conjunction with
a textbook recommended by the instructor.
Probably the most important aspect of the notes is the set of exercises. You
should develop the practice of attempting several of these problems every week.
Many of the problems are quite difficult so please consult your instructor if you
are not blessed with success initially. Do not acquire the habit of abandoning a
problem if it does not yield to your first attempt; a defeatist attitude is your greatest
adversary. Solution of a problem, even with some assistance from the teacher when
necessary, is a fine boost to your morale. You will find that a strong effort expended
on the earlier part of the courses will be rewarded by growing self-confidence and
easier success later.

NOTATION: Except when specified otherwise, upper case (capital) letters will
denote sets and lower case (small) letters will denote elements of sets.
$a \in A$ means $a$ is an element of the set $A$.
$A \subset B$ means $A$ is a subset of $B$.
$a \notin A$ means $a$ is NOT an element of the set $A$.
$A \not\subset B$ means $A$ is NOT a subset of $B$.
Remark The slash through any symbol will mean the negation of the corresponding statement.
$B \supset A$ means $B$ contains $A$.
$A \subsetneq B$ means $A$ is a proper subset of $B$.
$P \implies Q$ means statement $P$ implies statement $Q$.
$P \iff Q$ means $P$ holds if and only if $Q$ holds.
$\exists$ there exists.
$\forall$ for all.
s.t. such that.
$\square$ end of proof.
$\{x : \ldots\}$ means the set of all things $x$ that satisfy the conditions specified in $\ldots$. For
example, see the following.
$A \cup B$ The union of sets $A$ and $B$, $\{x : x \in A \text{ or } x \in B\}$.
$A \cap B$ The intersection of sets $A$ and $B$, $\{x : x \in A \text{ and } x \in B\}$.
$A \setminus B$ Set difference, $\{x : x \in A \text{ and } x \notin B\}$.
$A \times B$ Cartesian product of sets $A$ and $B$, $\{(x, y) : x \in A,\ y \in B\}$.
$\emptyset$ The empty set, a set with no elements.
intervals For $a, b \in \mathbb{R}$ with $a < b$,
$[a, b]$ means $\{x \in \mathbb{R} : a \le x \le b\}$ (closed interval).
$(a, b)$ means $\{x \in \mathbb{R} : a < x < b\}$ (open interval).
$(a, b]$ means $\{x \in \mathbb{R} : a < x \le b\}$.
$[a, b)$ means $\{x \in \mathbb{R} : a \le x < b\}$.

Chapter 1

The Real Number System & Finite Dimensional Cartesian Space

We begin by defining our basic tool, the real numbers $\mathbb{R}$. The real numbers can
be constructed from more primitive notions such as the natural numbers $\mathbb{N} :=
\{1, 2, 3, \ldots\}$, or even from the fundamental axioms of set theory. Here we shall be
content with a precise description of $\mathbb{R}$.

1.1

The Real Number System

Definition 1.1. R is a complete ordered field.


In the next several subsections we will explain the three underlined words.

1.1.1

Fields

A field is a set $F$ together with two binary operations $+$ and $\cdot$ (addition, multiplication) which satisfy the following axioms: For all $a, b, c, \ldots$ in $F$
F1 $a + b \in F$ and $a \cdot b \in F$ (closure)
F2 $a + b = b + a$ and $a \cdot b = b \cdot a$ (commutativity)
F3 $a + (b + c) = (a + b) + c$ and $a \cdot (b \cdot c) = (a \cdot b) \cdot c$ (associativity)
F4 $(a + b) \cdot c = (a \cdot c) + (b \cdot c)$ (distributivity)
F5 There exist unique elements $0$ and $1$ in $F$, $0 \neq 1$, such that
$a + 0 = a$ and $a \cdot 1 = a$, $\forall a \in F$.
F6 For each $a \in F$, there exists $-a \in F$ such that $a + (-a) = 0$, and if $a \neq 0$, there
exists $a^{-1} \in F$ such that $a \cdot a^{-1} = 1$ (existence of inverses).
Please note the following conventions and easy consequences:
1. $a \cdot b$ will henceforth be written $ab$.
2. $a(b + c) = ab + ac$ is a consequence of axioms F2 and F4, and need not be
separately assumed. (Verify for yourself.)
3. The elements $(-a)$ and $a^{-1}$ are uniquely determined by $a$. Suppose, for
example, that there are two elements $(-a_1)$, $(-a_2)$ that satisfy $a + (-a_1) =
0 = a + (-a_2)$. Then
\[ (-a_1) = (-a_1) + 0 = (-a_1) + (a + (-a_2)) = ((-a_1) + a) + (-a_2) = 0 + (-a_2) = (-a_2). \]
4. It is customary to write $a - b$ for $a + (-b)$ and $\frac{a}{b}$ for $ab^{-1}$.
5. $aa, aaa, \ldots$ are usually denoted $a^2, a^3, \ldots$.
6. $\{1, 1 + 1, 1 + 1 + 1, 1 + 1 + 1 + 1, \ldots\}$ is usually denoted $\{1, 2, 3, 4, \ldots\} = \mathbb{N}$.
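Example 1.2
i The simplest (and least interesting) field is the set $\{\theta, e\}$ with the operations
\[
\begin{array}{c|cc} + & \theta & e \\ \hline \theta & \theta & e \\ e & e & \theta \end{array}
\qquad
\begin{array}{c|cc} \cdot & \theta & e \\ \hline \theta & \theta & \theta \\ e & \theta & e \end{array}
\]
ii The set $\mathbb{Q}$ of rational numbers, i.e., numbers of the form $\frac{m}{n}$ with $m, n$ integers ($n \neq 0$), with the
usual addition and multiplication is a field.
iii The sets $\mathbb{R}$ and $\mathbb{C}$ of real and complex numbers respectively with the usual
addition and multiplication are fields.
iv The set $\mathbb{Q}(t)$ of rational functions with rational coefficients (i.e. functions of
the form $\frac{p(t)}{q(t)}$ where $p(t)$ and $q(t)$ are polynomials with rational coefficients, $q \neq 0$)
is a field.
v The set $\mathbb{N} = \{1, 2, 3, 4, \ldots\}$ of natural numbers and the set $\mathbb{Z}$ of integers are
NOT fields.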
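
For a finite set the axioms F1 to F6 can be checked mechanically. The following is a minimal sketch, not part of the notes: the helper names `is_field` and `mod_ops` are invented for illustration, and the two-element field of Example 1.2(i) is modelled as the integers modulo 2, with 0 playing the role of the additive identity and 1 the role of $e$. The integers modulo 4 are included as a contrast, since there the element 2 has no multiplicative inverse.

```python
from itertools import product


def is_field(elements, add, mul):
    """Brute-force check of axioms F1-F6 on a finite set with operations add, mul."""
    E = list(elements)
    # F1 (closure) and F2 (commutativity)
    for a, b in product(E, repeat=2):
        if add(a, b) not in E or mul(a, b) not in E:
            return False
        if add(a, b) != add(b, a) or mul(a, b) != mul(b, a):
            return False
    # F3 (associativity) and F4 (distributivity)
    for a, b, c in product(E, repeat=3):
        if add(a, add(b, c)) != add(add(a, b), c):
            return False
        if mul(a, mul(b, c)) != mul(mul(a, b), c):
            return False
        if mul(add(a, b), c) != add(mul(a, c), mul(b, c)):
            return False
    # F5 (unique identities 0 and 1, with 0 != 1)
    zeros = [z for z in E if all(add(a, z) == a for a in E)]
    ones = [u for u in E if all(mul(a, u) == a for a in E)]
    if len(zeros) != 1 or len(ones) != 1 or zeros[0] == ones[0]:
        return False
    zero, one = zeros[0], ones[0]
    # F6 (additive inverses, and multiplicative inverses of nonzero elements)
    for a in E:
        if not any(add(a, b) == zero for b in E):
            return False
        if a != zero and not any(mul(a, b) == one for b in E):
            return False
    return True


def mod_ops(n):
    """Addition and multiplication modulo n (an illustrative model only)."""
    return (lambda a, b: (a + b) % n), (lambda a, b: (a * b) % n)


two_element = is_field(range(2), *mod_ops(2))  # the field of Example 1.2(i)
mod_four = is_field(range(4), *mod_ops(4))     # fails F6: 2 has no inverse
```

A check of this kind only works for finite sets; for $\mathbb{Q}$, $\mathbb{R}$ or $\mathbb{Q}(t)$ the axioms have to be proved.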

1.1.2

Ordered Fields

A field $F$ is ordered if there is a subset $P$ of $F$ (called the positive elements) such
that the following order axioms hold:
O1 $a, b \in P \implies a + b \in P$ and $ab \in P$.
O2 $0 \notin P$.
O3 $x \in F$, $x \neq 0 \implies x \in P$ or $-x \in P$, but not both.


Remark: Every ordered field contains Q as a subfield (we do not prove this). Thus,
Q may be characterized as an ordered field containing no ordered proper subfield,
i.e., Q is the smallest ordered field. (Two ordered fields are considered the same if
they are isomorphic and the isomorphism preserves the order.) A discussion of this
point may be found in
C. Goffman, Real Functions, Proposition 1 and 2 in Chapter 3.
E. Hewitt and K. Stromberg, Real and Abstract Analysis, Theorem 5.9.
We define a relation $>$ on an ordered field as follows: If $a, b \in F$, write $a > b$
(equivalently, $b < a$) if $a - b \in P$. Then the axioms O1, O2, and O3 have the
following consequences:
Proposition 1.3 (Properties of $<$).
i $a > b$, $b > c$ implies $a > c$.
ii If $a, b \in F$, then exactly one of the following holds: $a > b$, $b > a$, $a = b$.
iii $a \ge b$, $b \ge a$ implies $a = b$.
iv $a > b$ implies $a + c > b + c$ for each $c \in F$.
v $a > b$, $c > d$ implies $a + c > b + d$.
vi $a > b$, $c > 0$ implies $ac > bc$, and $a > b$, $c < 0$ implies $ac < bc$.
vii $a > 0$ implies $a^{-1} > 0$, and $a < 0$ implies $a^{-1} < 0$.
viii $a > b$ implies $a > \frac{a + b}{2} > b$.
ix $ab > 0$ implies either $a > 0$ and $b > 0$, or $a < 0$ and $b < 0$.

Exercises
1.1. Establish the following properties of a field:
(a) $a \cdot 0 = 0$
(b) $a \cdot (-1) = (-a)$
(c) $(-a)(-b) = ab$
(d) $(ab^{-1})(cd^{-1}) = ac(bd)^{-1}$
(f) If $ab = 0$, then $a = 0$ or $b = 0$.
1.2. Observe that for $P$ being the set of positive elements
(a) $1 \in P$
(b) $a \neq 0 \implies a^2 \in P$
(c) If $n \in \mathbb{N}$, then $n \in P$.

(d) The field $\{\theta, e\}$ in Example 1.2(i) cannot be ordered.
(e) The field $\mathbb{C}$ of complex numbers cannot be ordered.
(f) The fields $\mathbb{Q}$ and $\mathbb{R}$ are ordered by the usual notion of positivity.
(g) The field $\mathbb{Q}(t)$ in Example 1.2(iv) is ordered if $\frac{p(t)}{q(t)} \in P$ whenever the
coefficient of the highest power of $t$ in the product $p(t)q(t)$ is positive.
1.3. Prove the statements (i)-(ix) of Proposition 1.3.

1.1.3

Complete Ordered Field

The notion of completeness for an ordered field will stretch your imagination a little
further. We will introduce some terminology necessary to discuss this topic. Let S
be a subset of an ordered field F .
(a) An element $u \in F$ is an upper bound of the set $S$ if $s \le u$, $\forall s \in S$.
(b) $w \in F$ is a lower bound of $S$ if $w \le s$, $\forall s \in S$.
(c) $S$ is bounded above (below) if it has an upper (lower) bound in $F$. $S$ is bounded
if it is bounded above and below. For example, the set of natural numbers $\mathbb{N}$
is bounded below and unbounded above; the interval $[0, 1) = \{x : 0 \le x < 1\}$
is bounded.
(d) $u$ is the least upper bound (or supremum) of $S$ if
(i) $s \le u$, $\forall s \in S$, that is $u$ is an upper bound, and
(ii) $s \le v$, $\forall s \in S \implies u \le v$, that is $u$ is smaller than any other upper
bound for $S$.
We write either $u = \sup S$ or $u = \operatorname{lub} S$.
(e) Similarly, the greatest lower bound (or infimum) of $S$ is a number $w$ which is
a lower bound for $S$ and exceeds all other lower bounds. We write $w = \inf S$
or $w = \operatorname{glb} S$.
Definition 1.4. An ordered field F is complete if each nonempty subset S of F
which has an upper bound has a least upper bound (supremum).
The only (to within an isomorphism) complete ordered field is R. Again we
do not prove this. A discussion may be found in the book of Hewitt and Stromberg,
p.45.

1.1.4

Properties of R

The first property is for an ordered field F to be Archimedean:


Definition 1.5. An ordered field $F$ is Archimedean if $\forall a \in F$, there exists $n \in \mathbb{N} =
\{1, 2, 3, 4, \ldots\}$ such that $n > a$. (That is, the set $\mathbb{N}$ is not bounded above.)


Theorem 1.6. R is Archimedean.


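Proof. Suppose not. Then there is an $a$ with $a \ge n$, $\forall n \in \mathbb{N}$, that is, an upper bound of $\mathbb{N}$. Then there
exists a least upper bound $b = \sup \mathbb{N}$, since $\mathbb{R}$ is complete. Thus, $b \ge n$, $\forall n \in \mathbb{N}$, and, since $b - 1$ is not an upper bound,
$b - 1 < n_0$ for some $n_0 \in \mathbb{N}$. But then, $b < n_0 + 1 \in \mathbb{N}$, contradicting $b = \sup \mathbb{N}$.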
Corollary 1.7. If $a > 0$, there exists $n \in \mathbb{N}$ such that $0 < \frac{1}{n} < a$.
Proof. There exists $n > a^{-1} > 0$. (Why?) Thus, $a > \frac{1}{n} > 0$.

Corollary 1.8. Q is an Archimedean ordered field.


Corollary 1.9. If $a, b \in \mathbb{R}$, $a < b$, then there is a rational $r$ such that $a < r < b$.
Proof. There exists an $n \in \mathbb{N}$ such that $n(b - a) > 1$ (why?). Let $m$ be the least
integer such that $m > na$. Hence, $m - 1 \le na$ and so
\[ na < m \le na + 1 < na + n(b - a) = nb \implies a < \frac{m}{n} < b. \]
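
The proof of Corollary 1.9 is constructive, and its two steps can be followed in exact arithmetic. The sketch below is illustrative only: the function name `rational_between` is invented here, and the endpoints are assumed to be given as `Fraction` values so that no rounding occurs (for irrational endpoints one would need exact comparisons of a different kind).

```python
from fractions import Fraction
import math


def rational_between(a, b):
    """Return a Fraction m/n with a < m/n < b, following the proof of Corollary 1.9."""
    a, b = Fraction(a), Fraction(b)
    if not a < b:
        raise ValueError("need a < b")
    # Archimedean property: choose a natural number n with n(b - a) > 1.
    n = math.floor(1 / (b - a)) + 1
    # m is the least integer with m > na, so that m - 1 <= na.
    m = math.floor(n * a) + 1
    return Fraction(m, n)


r = rational_between(Fraction(1, 3), Fraction(1, 2))
```

With $a = \frac{1}{3}$ and $b = \frac{1}{2}$ the construction picks $n = 7$ and $m = 3$, so `r` is $\frac{3}{7}$.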

Exercises
1.3. Guess the supremum and infimum of the following sets (when they exist):
$(0, 1) = \{x : 0 < x < 1\}$, $[0, 1] = \{x : 0 \le x \le 1\}$,
$\mathbb{N} = \{1, 2, 3, 4, \ldots\}$,
$\{\frac{1}{n} : n = 1, 2, 3, 4, \ldots\}$.
1.4. If $a > 0$, there exists $n \in \mathbb{N}$ such that $0 < \frac{1}{2^n} < a$. Hint: Show $2^n > n$,
$\forall n \in \mathbb{N}$.
1.5. The field $\mathbb{Q}(t)$, with the ordering of Exercise 1.2(g), is not Archimedean.
1.6. Let $F$ be an Archimedean ordered field containing an irrational element $\xi$.
Show that if $a, b \in F$, $a < b$, then there is an irrational element $\eta$ such that
$a < \eta < b$.
1.7. Show that $\mathbb{R}$ contains an irrational element. Hint: Show first that no rational
$p$ satisfies $p^2 = 2$. Then show that $p = \sup\{x > 0 : x^2 < 2\}$ must satisfy
$p^2 = 2$.
Theorem 1.10. Let $I_n = [a_n, b_n]$, and $I_{n+1} \subset I_n$, $n \in \mathbb{N}$. Then $\bigcap_{n=1}^{\infty} I_n \neq \emptyset$. In
other words, a nested sequence of closed intervals has at least one point common to
all intervals.
Proof. First note that $a_n < b_m$ for all $n, m$. Thus, each $b_m$ is an upper bound for
the set $\{a_n : n \in \mathbb{N}\}$. Therefore, $a := \sup\{a_n : n \in \mathbb{N}\} \le b_m$ for all $m$. It follows
that $a_n \le a \le b_n$ for all $n$. Thus, $a \in I_n$ for all $n$.
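
As an illustration of Theorem 1.10, the following sketch builds a nested sequence of closed intervals with rational endpoints by repeated bisection of $[1, 2]$, always keeping $a_n^2 \le 2 \le b_n^2$. The function name and the choice of starting interval are assumptions made for this example. The lengths $b_n - a_n$ halve at each step, and the point common to all the intervals is the number whose square is 2.

```python
from fractions import Fraction


def nested_intervals_sqrt2(steps):
    """Bisect [1, 2] repeatedly, keeping a_n^2 <= 2 <= b_n^2; return all intervals."""
    a, b = Fraction(1), Fraction(2)
    intervals = [(a, b)]
    for _ in range(steps):
        mid = (a + b) / 2
        if mid * mid <= 2:
            a = mid  # keep the right half [mid, b]
        else:
            b = mid  # keep the left half [a, mid]
        intervals.append((a, b))
    return intervals


intervals = nested_intervals_sqrt2(20)
a_n, b_n = intervals[-1]
```

Every endpoint produced this way is rational, while the common point is not, which is one way to see why the nested interval property fails in $\mathbb{Q}$.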


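Definition 1.11. For $x \in \mathbb{R}$, the absolute value $|x|$ is defined via
\[ |x| = \begin{cases} x, & \text{if } x \ge 0; \\ -x, & \text{if } x < 0. \end{cases} \]
The main properties of the absolute value are listed in
Proposition 1.12 (Properties of $|\cdot|$). For $x, y, a, b, c \in \mathbb{R}$, there holds
(i) $|x| = 0 \iff x = 0$.
(ii) $|-x| = |x|$.
(iii) $|xy| = |x|\,|y|$.
(iv) If $c \ge 0$, then $|x| \le c \iff -c \le x \le c$.
(v) $\big||a| - |b|\big| \le |a \pm b| \le |a| + |b|$.
Proof. Parts (i)-(iv) are an exercise.
For the proof of (v), note
\begin{align*}
\text{(iv)} &\implies -|a| \le a \le |a|, \quad -|b| \le b \le |b| \\
&\implies -(|a| + |b|) \le a + b \le |a| + |b| \\
\text{(iv)} &\implies |a + b| \le |a| + |b|,
\end{align*}
which is the desired right hand inequality. This implies the left hand inequality
since
\begin{align*}
|b| = |b - a + a| \le |b - a| + |a| &\implies |b| - |a| \le |b - a| = |a - b|, \\
\text{interchanging } a \text{ and } b &\implies |a| - |b| \le |a - b|, \\
&\implies \big||a| - |b|\big| \le |a - b| \quad \text{by definition of } |\cdot|.
\end{align*}
The rest of (v) follows from replacing $b$ by $-b$.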

Exercises
1.9. Let $F$ be an ordered field, with the property that if $\{I_n\}$ is a nested sequence
of closed intervals in $F$, then $\bigcap_{n=1}^{\infty} I_n \neq \emptyset$. Show that $F$ is complete. (Remark:
This exercise and Theorem 1.10 show that the supremum (completeness)
property and the nested interval property are equivalent.)
1.10. Let $I_n = (0, \frac{1}{n})$. Show that $\bigcap_{n=1}^{\infty} I_n = \emptyset$.
1.11. Let $K_n = [n, \infty) := \{x : x \ge n\}$. Show that $\bigcap_{n=1}^{\infty} K_n = \emptyset$.


1.12. If a set $S$ of real numbers contains one of its upper bounds $a$, then $a = \sup S$.
Such a supremum is called a maximum.
1.13. Show that $S \subset \mathbb{R}$ cannot have two suprema.
1.14. Show that an ordered field $F$ is complete if and only if every non-empty
subset of $F$ which has a lower bound has an infimum.
1.15. Show that $\mathbb{Q}$ is not complete.
1.16. If $S \subset \mathbb{R}$ is bounded and $S_0 \subset S$, show
\[ \inf S \le \inf S_0 \le \sup S_0 \le \sup S. \]
1.17. If $S = \{(-1)^n (1 - \frac{1}{n}) : n = 1, 2, \ldots\}$, find $\sup S$, $\inf S$. Prove any statements
you make.
1.18. Prove (i)-(iv) of Proposition 1.12.

1.2

Cartesian Spaces

The Euclidean spaces $\mathbb{R}^n$ of dimension $n$ are defined as the Cartesian products
of the real numbers $\mathbb{R}$. The following definition provides the notation and basic
operations on these spaces.
Definition 1.13. The Euclidean spaces $\mathbb{R}^n$ are defined as
\[ \mathbb{R}^n = \underbrace{\mathbb{R} \times \mathbb{R} \times \cdots \times \mathbb{R}}_{n \text{ times}} = \{(x_1, \ldots, x_n) : x_i \in \mathbb{R},\ i = 1, \ldots, n\}. \]
The components of $(x_1, \ldots, x_n)$ are the $x_i$, $i = 1, \ldots, n$. A point, or vector, in $\mathbb{R}^n$
is $x := (x_1, \ldots, x_n)$. The zero vector, or origin, is the point $O = (0, \ldots, 0)$.
For two points $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$, we define the operations
of addition
\[ x + y := (x_1 + y_1, \ldots, x_n + y_n), \]
and scalar multiplication by any scalar $\lambda$ (a number from $\mathbb{R}$)
\[ \lambda x = (\lambda x_1, \ldots, \lambda x_n). \]
Geometrically, the sum of points $x$ and $y$ and scalar multiplication are shown
in Figure 1.1.

Figure 1.1. The sum of vectors $p$, $q$ and the scalar multiple $kp$, $k > 1$.

The following properties of $\mathbb{R}^n$ follow from the properties of $\mathbb{R}$:
Proposition 1.14. $\mathbb{R}^n$ forms a vector space.
(i) $x + y = y + x$
(ii) $(x + y) + z = x + (y + z)$
(iii) $x + O = O + x = x$
(iv) $x + (-1)x = O$
(v) $1x = x$, and $0x = O$
(vi) $\lambda(\mu x) = (\lambda\mu)x$
(vii) $\lambda(x + y) = \lambda y + \lambda x$
There is another kind of product on $\mathbb{R}^n$:
Definition 1.15. The inner product or dot product of two vectors $x$ and $y$ is the
quantity
\[ x \cdot y := \sum_{i=1}^{n} x_i y_i = x_1 y_1 + \ldots + x_n y_n. \]
The absolute value, or norm, of a point $x$ in $\mathbb{R}^n$ is
\[ |x| = \sqrt{x \cdot x} = (x_1^2 + \ldots + x_n^2)^{1/2}. \]
The main properties of the inner product are summarized by
Proposition 1.16.
(i) $x \cdot x \ge 0$, with equality if and only if $x = O$.
(ii) $x \cdot y = y \cdot x$.
(iii) $x \cdot (y + z) = x \cdot y + x \cdot z$.
(iv) $(\lambda x) \cdot y = \lambda(x \cdot y) = x \cdot (\lambda y)$.
There is a famous inequality that links inner products and the norms of vectors.


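Theorem 1.17 (Cauchy-Bunyakowski-Schwarz inequality). For all $x, y \in
\mathbb{R}^n$, we have
\[ x \cdot y \le |x|\,|y|, \]
with equality if and only if either one of $x$, $y$ is $O$, or $x = \lambda y$ with $\lambda > 0$.
Proof. Let $z = \alpha x - \beta y$ with $\alpha, \beta \in \mathbb{R}$. From the properties of inner products, we
have
\begin{align*}
0 &\le z \cdot z && \text{by (i)} \\
&= \alpha^2\, x \cdot x - 2\alpha\beta\, x \cdot y + \beta^2\, y \cdot y && \text{by (ii), (iii) and (iv)} \\
&= |y|^2 |x|^2 - 2|x|\,|y|\, x \cdot y + |x|^2 |y|^2 && \text{choosing } \alpha = |y|,\ \beta = |x| \\
&= 2|x|\,|y| \left[\, |x|\,|y| - x \cdot y \,\right].
\end{align*}
Hence, $x \cdot y \le |x|\,|y|$ when $|x|\,|y| > 0$; if $x = O$ or $y = O$, both sides are zero. If equality holds in the last expression with $x, y \neq O$, then working backwards through the proof yields
\[ O = z = |y|x - |x|y, \quad \text{i.e.} \quad x = \frac{|x|}{|y|}\, y. \]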

Corollary 1.18. For all $x, y \in \mathbb{R}^n$, we have
\[ |x \cdot y| \le |x|\,|y|, \]
or
\[ |x_1 y_1 + \ldots + x_n y_n| \le \left(x_1^2 + \ldots + x_n^2\right)^{1/2} \left(y_1^2 + \ldots + y_n^2\right)^{1/2}. \]

Corollary 1.19 (Triangle Inequality).
\[ \big||x| - |y|\big| \le |x \pm y| \le |x| + |y|. \]
Proof. We have
\begin{align*}
|x + y|^2 &= (x + y) \cdot (x + y) = x \cdot x + 2x \cdot y + y \cdot y \\
&= |x|^2 + 2x \cdot y + |y|^2 \\
&\le |x|^2 + 2|x|\,|y| + |y|^2 && \text{(CBS inequality)} \\
&= \left(|x| + |y|\right)^2 \\
\implies |x + y| &\le |x| + |y|.
\end{align*}
The left-hand inequality follows from this just as in the scalar triangle inequality.
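
The inner product, the norm, and the two inequalities just proved are easy to experiment with numerically. The sketch below is not a proof and is not part of the notes: the function names `dot` and `norm`, the random test vectors, and the small tolerance `TOL` (which only absorbs floating-point rounding) are all choices made for this illustration.

```python
import math
import random


def dot(x, y):
    """Inner product x . y = x_1 y_1 + ... + x_n y_n."""
    return sum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    """Norm |x| = sqrt(x . x)."""
    return math.sqrt(dot(x, x))


random.seed(0)
TOL = 1e-9  # allowance for floating-point rounding only
for _ in range(1000):
    n = random.randint(1, 6)
    x = [random.uniform(-10, 10) for _ in range(n)]
    y = [random.uniform(-10, 10) for _ in range(n)]
    s = [xi + yi for xi, yi in zip(x, y)]
    assert abs(dot(x, y)) <= norm(x) * norm(y) + TOL   # Corollary 1.18
    assert abs(norm(x) - norm(y)) <= norm(s) + TOL     # left-hand inequality
    assert norm(s) <= norm(x) + norm(y) + TOL          # right-hand inequality

# Equality case of Theorem 1.17: y is a positive multiple of x.
x = [1.0, 2.0, 2.0]
y = [2.0, 4.0, 4.0]
equality_gap = norm(x) * norm(y) - dot(x, y)
```

In the last three lines $y = 2x$, so the gap $|x|\,|y| - x \cdot y$ vanishes, in agreement with the equality statement of the theorem.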
Proposition 1.20 (Properties of Norm).
(i) $|x| \ge 0$ with equality if and only if $x = O$.
(ii) $|\lambda x| = |\lambda|\,|x|$.
(iii) $\big||x| - |y|\big| \le |x \pm y| \le |x| + |y|$.

Definition 1.21. An interval in $\mathbb{R}^n$ is the cartesian product of intervals in $\mathbb{R}$:
\[ I = I_1 \times \ldots \times I_n, \]
where $I_i$ are intervals in $\mathbb{R}$. If each $I_i = [a_i, b_i]$, $i = 1, \ldots, n$, is closed, then $I$ is a
closed interval in $\mathbb{R}^n$; thus,
\[ I = \{(x_1, x_2, \ldots, x_n) : a_i \le x_i \le b_i,\ i = 1, \ldots, n\}. \]

Figure 1.2. An interval in $\mathbb{R}^2$ on the left and in $\mathbb{R}^3$ on the right.

As in $\mathbb{R}$, we have a nested interval property in $\mathbb{R}^n$.
Theorem 1.22 (Nested Interval Property). If $\{I_k\}$, $k \in \mathbb{N}$, is a sequence of
closed intervals in $\mathbb{R}^n$ such that $I_{k+1} \subset I_k$, $k = 1, 2, \ldots$, then
$\bigcap_{k=1}^{\infty} I_k \neq \emptyset$.
Proof. If $I_k = I_{k,1} \times \ldots \times I_{k,n}$, $k = 1, 2, \ldots$, where the $I_{k,i}$ are closed intervals in
$\mathbb{R}$, then for each $i$, $I_{k+1,i} \subset I_{k,i}$, $k = 1, 2, \ldots$, and $\bigcap_{k=1}^{\infty} I_{k,i} \neq \emptyset$ by Theorem 1.10. Thus, there exists
$x_i \in I_{k,i}$, $\forall k \in \mathbb{N}$, for each $i = 1, 2, \ldots, n$. Hence, $(x_1, \ldots, x_n) \in I_{k,1} \times \ldots \times I_{k,n} =
I_k$, $\forall k \in \mathbb{N}$.

1.2.1

Functions

Definition 1.23. A subset $f$ of $A \times B$ is a function from $A$ to $B$ if
\[ (x, y_1), (x, y_2) \in f \implies y_1 = y_2. \]
We write $f : A \to B$ ($f$ is a function from $A$ to $B$). For $(x, y) \in f$, we use the
notation $y = f(x)$. The set $R_f := \{y : (x, y) \in f \text{ for some } x\}$ is called the range
of $f$, and the set $D_f := \{x : (x, y) \in f \text{ for some } y\}$ is called the domain of $f$. If
$U \subset A$, then $f(U) := \{f(x) : x \in U\}$ is called the image of $U$. If $V \subset B$, then
$f^{-1}(V) := \{x : f(x) \in V\}$ is called the inverse image of $V$.
Example 1.24 Consider $f = \{(x, x^2) : -1 \le x \le 1\}$; the function $f(x) = x^2$.
Then
\[ D_f = [-1, 1], \qquad R_f = [0, 1], \]
\[ f\left([-1, 1/2]\right) = [0, 1], \qquad f^{-1}\left([-1, 1/2]\right) = \left[-1/\sqrt{2},\ 1/\sqrt{2}\right]. \]
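
Definition 1.23 treats a function literally as a set of ordered pairs, and for a finite set of pairs each notion in the definition becomes a one-line computation. The following sketch is an illustration under stated assumptions: the helper names are invented here, and the function $f(x) = x^2$ of Example 1.24 is only sampled at the nine points $-1, -\frac{3}{4}, \ldots, 1$, so the sets computed are finite stand-ins for the intervals in the example.

```python
from fractions import Fraction


def is_function(pairs):
    """A set of pairs is a function if no x is paired with two different y."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True


def domain(f):
    return {x for x, _ in f}


def range_of(f):
    return {y for _, y in f}


def image(f, U):
    """f(U) = {f(x) : x in U}."""
    return {y for x, y in f if x in U}


def inverse_image(f, V):
    """f^{-1}(V) = {x : f(x) in V}."""
    return {x for x, y in f if y in V}


# Finite sample of f(x) = x^2 at the points -1, -3/4, ..., 3/4, 1.
points = [Fraction(k, 4) for k in range(-4, 5)]
f = {(x, x * x) for x in points}

U = {x for x in points if x <= Fraction(1, 2)}      # sample of [-1, 1/2]
V = {y for y in range_of(f) if y <= Fraction(1, 2)}  # values in [-1, 1/2]
image_of_U = image(f, U)
preimage_of_V = inverse_image(f, V)
```

Here `preimage_of_V` consists of the sample points $x$ with $x^2 \le \frac{1}{2}$, namely $-\frac{1}{2}, -\frac{1}{4}, 0, \frac{1}{4}, \frac{1}{2}$, which all lie in $[-1/\sqrt{2}, 1/\sqrt{2}]$.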
Definition 1.25. A function $f : A \to B$ is one-to-one if it also satisfies
\[ (x_1, y), (x_2, y) \in f \implies x_1 = x_2. \]
We write this $f : A \xrightarrow{1-1} B$.
Note that if $f : A \xrightarrow{1-1} B$, then $\{(y, x) : (x, y) \in f\}$ is also a one-to-one function,
which is denoted by $f^{-1} : B \xrightarrow{1-1} A$, and is called the inverse function of $f$.
If $f : A \to B$ and $g : B \to C$ are two functions, then the composition of $g$
with $f$ is the function
\[ g \circ f(x) = g(f(x)) \quad \text{with domain given by } D_{g \circ f} = f^{-1}(D_g). \]
Example 1.26 As an example of a composition,
\begin{align*}
f &: \mathbb{R} \to \mathbb{R}^2, & f(x) &= (|x|,\ x^2 + 1), \\
g &: \mathbb{R}^2 \to \mathbb{R}^2, & g(u, v) &= (u + v,\ u - v), \\
g \circ f &: \mathbb{R} \to \mathbb{R}^2, & g \circ f(x) &= (|x| + x^2 + 1,\ |x| - x^2 - 1).
\end{align*}
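
The composition in Example 1.26 can be reproduced directly. This is a small sketch in which the helper `compose` is a name introduced for illustration; it unpacks the pair returned by the inner function because $g$ takes two arguments.

```python
def f(x):
    """f : R -> R^2, f(x) = (|x|, x^2 + 1)."""
    return (abs(x), x * x + 1)


def g(u, v):
    """g : R^2 -> R^2, g(u, v) = (u + v, u - v)."""
    return (u + v, u - v)


def compose(outer, inner):
    """Return x -> outer(inner(x)), unpacking the pair produced by inner."""
    return lambda x: outer(*inner(x))


g_after_f = compose(g, f)
value = g_after_f(-2)
```

For $x = -2$ this gives $f(-2) = (2, 5)$ and then $g(2, 5) = (7, -3)$, matching $(|x| + x^2 + 1,\ |x| - x^2 - 1)$.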

Exercises
1.19. Prove Corollary 1.18.
1.20. Show $|x + y|^2 + |x - y|^2 = 2(|x|^2 + |y|^2)$ for all $x, y \in \mathbb{R}^n$. (parallelogram
identity)
1.21. If $x = (x_1, \ldots, x_n)$, show
\[ |x_i| \le |x| \le \sqrt{n}\, \sup\{|x_1|, \ldots, |x_n|\}, \quad \text{for } i = 1, \ldots, n. \]


1.22. Show that $|x + y|^2 = |x|^2 + |y|^2 \iff x \cdot y = 0$. In this case, $x$ and $y$ are said
to be orthogonal. This is sometimes denoted by $x \perp y$.
1.23. Is it true that
\[ |x + y| = |x| + |y| \iff x = \lambda y \ \text{ or } \ y = \lambda x \ \text{ with } \lambda \ge 0? \]
1.24. Two sets $A$ and $B$ have the same cardinality if there is a one-to-one function
$\varphi : A \to B$ such that $\varphi(A) = B$ and $\varphi^{-1}(B) = A$. Show that the following
sets have the same cardinality:
(a) $\mathbb{N} = \{1, 2, 3, \ldots\}$ and $2\mathbb{N} = \{2, 4, 6, 8, \ldots\}$.
(b) $[0, 1]$ and $[0, 2]$.
(c) $(0, 1)$ and $(0, \infty) = \{x : x > 0\}$.
(d) $[0, 1]$ and $[0, 1)$.

1.25. A set A is said to be finite if it has the same cardinality as some initial segment
{1, . . . , n} of the natural numbers, and is said to be infinite otherwise. (Thus,
finite means that the elements can be labeled a1 , . . . , an .) Show that a finite
set of real numbers contains its inf and its sup. Hint: Induction.
1.26. A set A is countable if it has the same cardinality as N, the set of natural
numbers, or if it is finite. Otherwise, the set is said to be uncountable. Show
that a countable set need not contain its sup or its inf. (Countability means
all elements of the set can be labeled by the natural numbers {a1 , a2 , a3 , . . .}.)
1.27. Show that the union of a countable collection of countable sets is countable.
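Hint:
\[
\begin{array}{lllll}
S_1 : & a_{1,1} & a_{1,2} & a_{1,3} & \ldots \\
S_2 : & a_{2,1} & a_{2,2} & a_{2,3} & \ldots \\
S_3 : & a_{3,1} & a_{3,2} & a_{3,3} & \ldots \\
S_4 : & a_{4,1} & a_{4,2} & a_{4,3} & \ldots \\
\ \vdots & \ \vdots & \ \vdots & \ \vdots &
\end{array}
\]
The elements may be counted along the successive diagonals of this array ($a_{1,1}$; $a_{1,2}, a_{2,1}$; $a_{1,3}, a_{2,2}, a_{3,1}$; $\ldots$). Deduce that $\mathbb{Q}$ is
countable.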
1.28. $[0, 1]$ is uncountable. Complete this sketch of proof: Suppose $[0, 1]$ is countable and that
\[ [0, 1] = \{a_1, a_2, \ldots\}. \]
At least one of the intervals $[0, \frac{1}{3}]$, $[\frac{1}{3}, \frac{2}{3}]$, $[\frac{2}{3}, 1]$ does not contain $a_1$; call this
interval $I_1$. Subdivide $I_1$ into three closed intervals; then $a_2$ is not in one of
those three; call it $I_2$. Continuing in this manner, we obtain a nested sequence
of closed intervals, $I_n$, with the property that $a_n \notin I_n$. Therefore, none of
the $a_n$ are in $\bigcap_{k=1}^{\infty} I_k$. But, by the nested interval theorem, the intersection
is non-empty so there must be an $x \in [0, 1]$ with $x \neq a_n$ for any $n$. This
contradicts our assumption.

1.2.2

Convexity

Definition 1.27. For two points $x, y \in \mathbb{R}^n$ with $x \neq y$, we define
(i) the line through $x$ and $y$ as $\{x + t(y - x) : t \in \mathbb{R}\}$;
(ii) and the line segment between $x$ and $y$ as the set $\{x + t(y - x) : t \in [0, 1]\}$.
A subset $C$ of $\mathbb{R}^n$ is convex if
\[ x, y \in C \implies x + t(y - x) \in C, \quad \forall t \in [0, 1]; \]
that is, if the line segment between any two points of the set is a subset of $C$.
Example 1.28 $S := \{x : |x| \le 1\}$ is convex.
Proof. If $|x| \le 1$ and $|y| \le 1$, then for $0 \le t \le 1$
\begin{align*}
|x + t(y - x)| = |(1 - t)x + ty| &\le |(1 - t)x| + |ty| && \text{(triangle inequality)} \\
&= (1 - t)|x| + t|y| \le (1 - t) + t = 1.
\end{align*}
Thus, $x + t(y - x) \in S$, for $0 \le t \le 1$; that is, $S$ is convex.
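
The estimate in the proof of Example 1.28 can be observed numerically by sampling pairs of points of the closed unit ball in $\mathbb{R}^3$ and a random point on the segment joining them. The sketch below is only an experiment, not a proof; the helper names, the dimension 3, the sample size and the rejection-sampling scheme are all assumptions made for the illustration.

```python
import math
import random


def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))


def segment_point(x, y, t):
    """The point x + t(y - x) on the line segment from x to y."""
    return [xi + t * (yi - xi) for xi, yi in zip(x, y)]


def random_point_in_ball(n):
    """Rejection sampling: draw from the cube [-1, 1]^n until the norm is <= 1."""
    while True:
        p = [random.uniform(-1, 1) for _ in range(n)]
        if norm(p) <= 1:
            return p


random.seed(1)
worst = 0.0  # largest norm seen among the sampled segment points
for _ in range(2000):
    x, y = random_point_in_ball(3), random_point_in_ball(3)
    t = random.random()  # a parameter in [0, 1)
    worst = max(worst, norm(segment_point(x, y, t)))
```

Up to rounding, `worst` never exceeds 1, as the inequality $|(1 - t)x + ty| \le (1 - t)|x| + t|y| \le 1$ predicts.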

Exercises
1.29. Prove that $\{x : |x| = 1\}$ is not convex.
1.30. Prove that $\{(x, y) \in \mathbb{R}^2 : y > 0\}$ is convex.
1.31. Let $\mathcal{C}$ be any collection of convex sets. Show that $\bigcap_{A \in \mathcal{C}} A$ is convex. Is $\bigcup_{A \in \mathcal{C}} A$
necessarily convex?
1.32. The convex hull $H(A)$ of a set $A$ is the intersection of all convex sets containing $A$ as a subset. Prove that $H(A)$ is convex. What is $H(A)$ if $A$ is a
set consisting of two points only?
1.33. A subset $C$ of $\mathbb{R}^n$ is a cone if $\{tx : x \in C\} \subset C$ for all $t \ge 0$.
(i) Prove that a cone $C$ is a convex set if and only if
\[ \{x + y : x \in C,\ y \in C\} \subset C. \]
(ii) Draw pictures of convex and non-convex cones in $\mathbb{R}^2$.

1.3

Topology

Definition 1.29. If $\delta > 0$ and $x_0 \in \mathbb{R}^n$, then the open ball of center $x_0$ and radius
$\delta$ is the set
\[ B(x_0, \delta) := \{x : |x - x_0| < \delta\}. \]
A neighborhood of $x_0$ is any set $U$ which contains an open ball with center $x_0$ as a
subset.
A set $A$ is open in $\mathbb{R}^n$ if it is a neighborhood of each of its points.


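Example 1.30 Consider the following examples:
1. $\mathbb{R}^n$ is open.
2. $\emptyset$ is open.
3. $(0, 1)$ is open in $\mathbb{R}$.
4. $[0, 1)$ is not open in $\mathbb{R}$.
5. $\{(x, y) : 0 < x < 1,\ y = 1\}$ is not open in $\mathbb{R}^2$.
6. $B(x_0, \delta)$ is open in $\mathbb{R}^n$.
Proof. (1), (2) and (5) are left to the reader. For (3) note that if $x_0 \in (0, 1)$, then
$B(x_0, \delta) = (x_0 - \delta, x_0 + \delta) \subset (0, 1)$ when $\delta = \min(x_0, 1 - x_0)$.
For (4), note that $B(0, \delta) = (-\delta, \delta)$ is not contained in $[0, 1)$ for any $\delta > 0$.
Finally, to see (6), let $x_1 \in B(x_0, \delta)$. We will show that $B(x_1, \delta_1) \subset B(x_0, \delta)$
when $\delta_1 = \delta - |x_1 - x_0| > 0$, so that $B(x_0, \delta)$ is a neighborhood of each of its points,
and hence is open. Let $x \in B(x_1, \delta_1)$, then
\begin{align*}
|x - x_0| &= |x - x_1 + x_1 - x_0| \\
&\le |x - x_1| + |x_1 - x_0| && \text{(triangle inequality)} \\
&< \delta_1 + |x_1 - x_0| && (x \in B(x_1, \delta_1)) \\
&= \delta - |x_1 - x_0| + |x_1 - x_0| && \text{(definition of } \delta_1\text{)} \\
&= \delta.
\end{align*}
Thus, $x \in B(x_0, \delta)$, and so $B(x_1, \delta_1) \subset B(x_0, \delta)$.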


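Proposition 1.31 (Open set properties). For open sets in $\mathbb{R}^n$, we have
(a) $\emptyset$ and $\mathbb{R}^n$ are open.
(b) If $A$ and $B$ are open sets, then $A \cap B$ is an open set.
(c) The union of any collection of open sets is open.
Proof. For (a), note that both $\emptyset$ and $\mathbb{R}^n$ are neighborhoods of their points (for $\emptyset$,
it is true because there are no points).
For (b), let $x_0 \in A \cap B$. Then
\begin{align*}
x_0 \in A &\implies \exists\, \delta_1 \text{ s.t. } B(x_0, \delta_1) \subset A && \text{(since } A \text{ is open)} \\
x_0 \in B &\implies \exists\, \delta_2 \text{ s.t. } B(x_0, \delta_2) \subset B && \text{(since } B \text{ is open)} \\
&\implies B(x_0, \delta) \subset A \text{ and } B(x_0, \delta) \subset B \text{ for } \delta = \min(\delta_1, \delta_2) \\
&\implies B(x_0, \delta) \subset A \cap B \\
&\implies A \cap B \text{ is open.}
\end{align*}
For (c) let $\mathcal{C}$ be a collection of open sets, and select any $x \in \bigcup_{A \in \mathcal{C}} A$. Then
\begin{align*}
x \in A \text{ for some } A \in \mathcal{C}
&\implies B(x, \delta) \subset A \text{ for some } \delta > 0 && (A \text{ is open)} \\
&\implies B(x, \delta) \subset \textstyle\bigcup_{A \in \mathcal{C}} A \\
&\implies \textstyle\bigcup_{A \in \mathcal{C}} A \text{ is open.}
\end{align*}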

Definition 1.32. A set $A$ is closed in $\mathbb{R}^n$ if its complement
\[ A^c := \mathbb{R}^n \setminus A := \{x \in \mathbb{R}^n : x \notin A\} \]
is an open set.
Proposition 1.33 (Properties of closed sets).
(a) $\emptyset$ and $\mathbb{R}^n$ are closed.
(b) If $A$ and $B$ are closed sets, then $A \cup B$ is a closed set.
(c) The intersection of any collection of closed sets is closed.
Example 1.34 Consider the following examples:
1. $[0, \infty) := \{x : x \ge 0\}$ is closed in $\mathbb{R}$. (Thus, $(-\infty, 0)$ is open.)
2. $[0, 1]$ is closed in $\mathbb{R}$, i.e. $(-\infty, 0) \cup (1, \infty)$ is open.
3. $[0, 1)$ is not closed in $\mathbb{R}$. (Why?)
4. $\{x : |x| \ge 1\}$ is closed in $\mathbb{R}^n$. (Since $B(O, 1)$ is open.)
5. $\{x : |x| \le 1\}$ is closed in $\mathbb{R}^n$. (Exercise.)

Proposition 1.35. For a closed set $C$ and an open set $V$ in $\mathbb{R}^n$
(a) $C \setminus V$ is closed;
(b) $V \setminus C$ is open.
Proof. For (a) note that
\[ C \setminus V := \{x : x \in C \text{ and } x \notin V\} = C \cap V^c, \]
which is closed since $C$ and $V^c$ are closed.
Part (b) is an exercise.

Figure 1.3. A sequence of bisections.


Notice that there are sets which are neither open nor closed (for example,
$[0, 1)$ in $\mathbb{R}$). The sets $\mathbb{R}^n$ and $\emptyset$ are both open and closed. We will see that they are
the only sets in $\mathbb{R}^n$ with this property.
Definition 1.36. For a set $S$ in $\mathbb{R}^n$, a point $x_0$ is a cluster point of $S$ if each
neighborhood of $x_0$ contains a point $x \in S$ with $x \neq x_0$.
A cluster point of $S$ need not be an element of $S$. For example, the set
$S = \{\frac{1}{n} : n = 1, 2, 3, \ldots\}$ has $0$ as a cluster point.
Definition 1.37. A set $S$ in $\mathbb{R}^n$ is bounded if there is a $\delta > 0$ such that
\[ S \subset B(O, \delta) \quad (\iff |x| < \delta,\ \forall x \in S). \]
Equivalently, $S$ is bounded if it is contained in some closed interval in $\mathbb{R}^n$.
For a closed interval $I = [a_1, b_1] \times \ldots \times [a_n, b_n]$ in $\mathbb{R}^n$, the quantity
\[ \lambda(I) := \sqrt{(b_1 - a_1)^2 + \ldots + (b_n - a_n)^2} \]
will be called the diameter of $I$. Note that if $x, y$ are two points in $I$, then $|x - y| \le
\lambda(I)$.
The next theorem has important consequences.
Theorem 1.38 (Bolzano-Weierstrass Theorem). Every bounded infinite subset
of $\mathbb{R}^n$ has a cluster point.
Proof. Let $K$ be a bounded infinite subset of $\mathbb{R}^n$. Then, by definition, there is
a closed interval $I_1$ such that $K \subset I_1$. Bisect the sides of $I_1$ to partition $I_1$ into
$2^n$ subintervals. At least one of these new intervals must contain infinitely many
points of $K$; pick one of these and call it $I_2$. Bisect the sides of $I_2$. Again, at least
one of the $2^n$ subintervals of $I_2$ must contain infinitely many points of $K$; pick one
of these and call it $I_3$.
Continue this procedure inductively to define a nested sequence of closed intervals $I_k$, $I_{k+1} \subset I_k$,
with the properties:
1. Each $I_k$ contains infinitely many points of $K$.
2. $0 < \lambda(I_k) = \frac{1}{2^{k-1}} \lambda(I_1)$, $k = 1, 2, 3, \ldots$.
3. There is a point $x_0 \in \bigcap_k I_k \neq \emptyset$ by Theorem 1.22.
We must show that $x_0$ is a cluster point of $K$. If this were not true, then there
would be a radius $\delta > 0$ such that the ball $B(x_0, \delta)$ contains no point of $K$ other than possibly $x_0$ itself:
\[ B(x_0, \delta) \cap K \subset \{x_0\}. \]
Now, choose $k$ so that
\[ 0 < \frac{1}{2^{k-1}} \lambda(I_1) < \delta. \quad \text{Why is this possible?} \]
Then $0 < \lambda(I_k) < \delta$. Remember that for any $x \in I_k$, $|x - x_0| \le \lambda(I_k) < \delta$ since $x_0$
is also in $I_k$. Thus, $I_k \subset B(x_0, \delta)$. But, $I_k$ contains infinitely many points of $K$;
a contradiction. Therefore, $x_0$ is a cluster point of $K$.

Theorem 1.39. A subset $K$ of $\mathbb{R}^n$ is closed if and only if $K$ contains all of its
cluster points.
Proof. ($\Longrightarrow$) Let $K$ be closed. Let $x_0$ be a cluster point of $K$. If $x_0 \notin K$, then $x_0$
is in the open set $K^c$. Since $K^c$ is open, there exists $\delta > 0$ so that $B(x_0, \delta) \subset K^c$.
That contradicts the definition of $x_0$ being a cluster point of $K$. Hence, $x_0 \in K$.
($\Longleftarrow$) Let $K$ contain all of its cluster points. If $x_1 \in K^c$, then $x_1 \notin K$, and $x_1$
is not a cluster point of $K$. Therefore, by definition of cluster points, there exists
$\delta > 0$ such that $B(x_1, \delta) \subset K^c$. Thus, $K^c$ is open and $K$ is closed.
Definition 1.40. A collection $\mathcal{G}$ of open sets is an open cover of a set $K$ if
$$K \subset \bigcup_{U \in \mathcal{G}} U.$$
A set $K$ in $\mathbb{R}^n$ is compact if every open cover of $K$ has a finite subcover; that is, for
any open cover $\mathcal{G}$, there is a finite collection of sets $U_1, \ldots, U_m$ from $\mathcal{G}$ such that
$K \subset \bigcup_{j=1}^{m} U_j$.
Some simple examples:
Example 1.41 The following examples illustrate the concept of compactness:
1. A finite set is compact.
2. $[0, \infty)$ is not compact. Indeed, the sets $G_k := (-1, k)$, $k \in \mathbb{N}$, form an open
cover of $[0, \infty)$, but there cannot be a finite subcover. (Why?)

3. $(0, 1)$ is not compact. (Find an infinite open cover that does not have a finite
subcover.)
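As a small illustration of item 2, the sketch below takes any finite selection of the sets $G_k = (-1, k)$ and exhibits a point of $[0, \infty)$ that the selection misses. The helper names are hypothetical; the idea is simply that the largest chosen index is itself left uncovered.

```python
def covered(x, indices):
    """True if x lies in at least one of the chosen sets G_k = (-1, k)."""
    return any(-1 < x < k for k in indices)


def uncovered_point(indices):
    """Return a point of [0, infinity) missed by finitely many chosen G_k."""
    k_max = max(indices)
    # Every chosen G_k is contained in (-1, k_max), so k_max itself is missed.
    return float(k_max)


chosen = [3, 7, 42]
x = uncovered_point(chosen)
print(x, covered(x, chosen))
```

The same reasoning applies to any finite list of indices, which is why no finite subcover exists.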

The next theorem is a very important characterization of compact subsets of $\mathbb{R}^n$.
Theorem 1.42 (Heine-Borel Theorem). A subset $K$ of
$\mathbb{R}^n$ is compact if and only if $K$ is closed and bounded.
Proof. ($\Rightarrow$) Let $K$ be compact.
We first show $K$ is closed. Let $x_0$ be any point in $K^c$. We wish to find a ball
$B(x_0, \varepsilon) \subset K^c$ for some $\varepsilon > 0$. Then $K^c$ would be open, and hence, $K$ would be closed.
Consider the collection $\mathcal{G}$ of open sets $G_k$, $k \in \mathbb{N}$, where
$$G_k := \{x : |x - x_0| > \tfrac{1}{k}\}.$$
(Why is $G_k$ open?) Clearly, the collection $\mathcal{G}$ covers all of $\mathbb{R}^n \setminus \{x_0\}$, and hence covers
$K$. Since $K$ is compact, there is a finite subcover $\{G_{k_i} : i = 1, \ldots, m\}$ of $K$. Now,
the $G_j$ are nested with $G_j \subset G_i$ if $i > j$. Hence, letting $k_0 = \max\{k_1, \ldots, k_m\}$, we
have $K \subset G_{k_0}$. In other words,
$$B(x_0, \tfrac{1}{k_0}) := \{x : |x - x_0| < \tfrac{1}{k_0}\} \subset \{x : |x - x_0| \le \tfrac{1}{k_0}\} = G_{k_0}^c \subset K^c.$$
Hence, $K^c$ is open and $K$ is closed.


To show that $K$ is bounded, consider the open cover consisting of all balls
$B(O, k)$, $k \in \mathbb{N}$. This collection covers all of $\mathbb{R}^n$, and so is an open cover of $K$.
Therefore, there is a finite subcover $\{B(O, k_i) : i = 1, \ldots, m\}$ of $K$. Hence, $K \subset
B(O, k_0)$ where $k_0 := \max\{k_1, \ldots, k_m\}$. Thus, $K$ is bounded.
($\Leftarrow$) Let $K$ be closed and bounded. The proof will be by contradiction. If $K$
is not compact, then there exists an open cover $\mathcal{G} = \{G_\alpha\}$ of $K$ such that $K$ is not
contained in any union of finitely many $G_\alpha$'s. Since $K$ is bounded, it is contained in
some closed interval $I_1$ in $\mathbb{R}^n$. Bisect the sides of $I_1$ to obtain $2^n$ subintervals (as in
the proof of the Bolzano-Weierstrass Theorem). At least one of these subintervals
intersects $K$ in such a way that this intersection cannot be covered by finitely many
$G_\alpha$'s. Select such a subinterval as $I_2$. Proceeding inductively in this way, we obtain
a nested sequence of closed intervals $I_k$, $k = 1, 2, \ldots$, such that
1. Each $K \cap I_k$ cannot be covered by finitely many $G_\alpha$'s, $k = 1, 2, \ldots$.
2. $\lambda(I_k) = \lambda(I_1)/2^{k-1}$, $k = 1, 2, \ldots$.
By the Nested Intervals Theorem, there is a point $x_0 \in \bigcap_k I_k$, and this point is a
cluster point of $K$ (as in the proof of the Bolzano-Weierstrass Theorem). Since $K$ is
closed, $x_0 \in K$. Therefore, $x_0 \in G_{\alpha_0}$ for some $\alpha_0$, since the $G_\alpha$ cover $K$. But $G_{\alpha_0}$
is open, so there is an $\varepsilon > 0$ so that $B(x_0, \varepsilon) \subset G_{\alpha_0}$. Now, by (2), we can choose $k$ so
large that $\lambda(I_k) < \varepsilon$. Hence,
$$I_k \subset B(x_0, \varepsilon) \subset G_{\alpha_0},$$
which contradicts (1).

Figure 1.4. A disconnection.

Definition 1.43. A subset $D$ of $\mathbb{R}^n$ is disconnected if there are open sets $A$ and
$B$ such that
(i) $A \cap D \neq \emptyset$, $B \cap D \neq \emptyset$.
(ii) $(A \cap D) \cap (B \cap D) = \emptyset$.
(iii) $(A \cap D) \cup (B \cap D) = D$.
The sets $A$ and $B$ are called a disconnection of $D$.
A subset $D$ of $\mathbb{R}^n$ is said to be connected if it is not disconnected.
Example 1.44
1. The set $\mathbb{N} = \{1, 2, 3, \ldots\}$ is disconnected. A disconnection is
given, for example, by $(A, B) = ((-\infty, \frac{3}{2}), (\frac{3}{2}, \infty))$.
2. $\mathbb{Q}$ is disconnected, with a disconnection given by $(A, B) = ((-\infty, \sqrt{2}), (\sqrt{2}, \infty))$.
3. $[0, 1]$ is connected.
Proof. We will prove that $[0, 1]$ is connected. Suppose it were disconnected and
$(A, B)$ provides the disconnection:
(i) $A \cap [0, 1] \neq \emptyset$, $B \cap [0, 1] \neq \emptyset$.
(ii) $(A \cap [0, 1]) \cap (B \cap [0, 1]) = \emptyset$.
(iii) $(A \cap [0, 1]) \cup (B \cap [0, 1]) = [0, 1]$.

Figure 1.5. The line determined by x and y and a disconnection of R.


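Now, 1 belongs to one of the sets, say $1 \in B$. Let
$$c := \sup\{A \cap [0, 1]\},$$
which implies $0 \le c \le 1$.
Thus, $c \in A \cup B$, so $c \in A$ or $c \in B$ (but not both, by (ii)).
If $c \in A$, then $c \in A \cap [0, 1]$, and $1 \in B$ implies $c \in A \cap [0, 1)$. Since $A$ is open
and $c < 1$, there is an $\varepsilon > 0$ with $[c, c + \varepsilon) \subset A \cap [0, 1)$. This contradicts the definition
of sup.
If $c \in B$, then since $B$ is open, there exists an $\varepsilon > 0$ such that $(c - \varepsilon, c] \subset B$.
By (ii), no point of $A \cap [0, 1]$ lies in $(c - \varepsilon, c]$, so $c - \varepsilon$ is an upper bound of $A \cap [0, 1]$.
This again contradicts the definition of $c$ as $\sup\{A \cap [0, 1]\}$.
In either case, we arrive at a contradiction, so no disconnection of $[0, 1]$ exists.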
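Theorem 1.45. $\mathbb{R}^n$ is connected.
Proof. Suppose $(A, B)$ is a disconnection of $\mathbb{R}^n$. Then
(i) $A \neq \emptyset$, $B \neq \emptyset$.
(ii) $A \cap B = \emptyset$.
(iii) $A \cup B = \mathbb{R}^n$.
Choose any points $x \in A$ and $y \in B$, and consider the subsets of $\mathbb{R}$ determined by
$$A_1 := \{t \in \mathbb{R} : x + t(y - x) \in A\}, \qquad B_1 := \{t \in \mathbb{R} : x + t(y - x) \in B\}.$$
Since $A$, $B$ are open in $\mathbb{R}^n$, both $A_1$ and $B_1$ are open in $\mathbb{R}$ (this is not obvious, so
think how to show it). Now the statements (i), (ii) and (iii) above imply
(i)$_1$ $A_1 \cap [0, 1] \neq \emptyset$, $B_1 \cap [0, 1] \neq \emptyset$. (Indeed, $0 \in A_1$ and $1 \in B_1$.)
(ii)$_1$ $(A_1 \cap [0, 1]) \cap (B_1 \cap [0, 1]) = \emptyset$.
(iii)$_1$ $(A_1 \cap [0, 1]) \cup (B_1 \cap [0, 1]) = [0, 1]$.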


This means $(A_1, B_1)$ is a disconnection of $[0, 1]$ which, we have seen, is connected. The contradiction implies that $\mathbb{R}^n$ is connected.

Corollary 1.46. The only sets in $\mathbb{R}^n$ which are both open and closed are $\mathbb{R}^n$ and
$\emptyset$.
Proof. If the nonempty set $A \neq \mathbb{R}^n$ is both open and closed, then so is $A^c$. Then
$(A, A^c)$ would provide a disconnection of $\mathbb{R}^n$.
More restricted notions of connectedness are sometimes used. A set $C$ is
polygonally connected if, for each pair of points $x_0$, $x_m$ in $C$, there is a finite subset
$\{x_1, \ldots, x_{m-1}\}$ of $C$ such that the polygon
$$\{x_{i-1} + t(x_i - x_{i-1}) : t \in [0, 1],\ i = 1, 2, \ldots, m\}$$
is a subset of $C$. A set $C$ is arcwise connected if, for each pair of points $x, y \in C$,
there is a path joining $x$ to $y$ lying entirely in $C$. That is, there is a continuous
function $f : [0, 1] \to C$ such that $f(0) = x$ and $f(1) = y$. Either polygonally
connected or arcwise connected will imply the set is connected in the sense defined
here. However, the converse is not true. For example, the set $\{(x, y) \in \mathbb{R}^2 : x \neq
0,\ 0 < y \le x^2\} \cup \{(0, 0)\}$ is connected (in fact, arcwise connected), but is not
polygonally connected. The set $\{(x, y) \in \mathbb{R}^2 : y = \sin(\frac{1}{x}),\ x \neq 0\} \cup \{(0, y) : -1 \le
y \le 1\}$ is connected, but is not arcwise connected.

Exercises
1.33. Property (b) of Proposition 1.31 implies that the intersection of any finite
collection of open sets is open. Show that it is not true that this holds for an
infinite collection of open sets.
1.34. Prove items (a), (b), and (c) of Proposition 1.33.
1.35. Prove that the two definitions of a bounded set are equivalent.
1.36. Prove that the intersection of any finite collection of open sets is open. Hint:
Use Property (b) of open sets and induction.
1.37. Prove that $\{x : |x| \le 1\}$ is closed in $\mathbb{R}^n$.
1.38. Prove that a subset $U$ of $\mathbb{R}^n$ is open if and only if it is the union of a collection
of open balls.
1.39. If $A$ is a subset of $\mathbb{R}^n$, then $\bar{A}$, the closure of $A$, is the intersection of all closed
sets which contain $A$ as a subset. Show that
(a) $\bar{A}$ is closed.
(b) $A \subset \bar{A}$.
(c) $\bar{\bar{A}} = \bar{A}$.

(d) $\overline{A \cup B} = \bar{A} \cup \bar{B}$.
(e) $\bar{\emptyset} = \emptyset$.
(f) Observe that $\bar{A}$ is the smallest closed set containing $A$.
(g) Prove that $\overline{B(O, 1)} = \{x : |x| \le 1\}$.
(h) If $A$ and $B$ are subsets of $\mathbb{R}$, then is $\overline{A \cap B} = \bar{A} \cap \bar{B}$?

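1.40. If $A$ is a subset of $\mathbb{R}^n$, then $A^\circ$, the interior of $A$, is the union of all open sets
contained in $A$. Show that
(a) $A^\circ$ is open.
(b) $A^\circ \subset A$.
(c) $(A^\circ)^\circ = A^\circ$.
(d) $(A \cap B)^\circ = A^\circ \cap B^\circ$.
(e) $(\mathbb{R}^n)^\circ = \mathbb{R}^n$.
(f) Observe that $A^\circ$ is the largest open set contained in $A$.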

(g) Prove that B(O, 1) = B(O, 1).


(h) Is there a subset $A \subset \mathbb{R}$ such that $A^\circ = \emptyset$ and $\bar{A} = \mathbb{R}$?

1.41. Let $A \subset \mathbb{R}^n$. The derived set $A'$ of $A$ is the set of all cluster points of $A$.
(a) Prove that $A'$ is closed.
(b) Prove that $\bar{A} = A \cup A'$.
(c) Theorem 1.39 says that $A$ is closed if and only if $A' \subset A$. A set $A$ for
which $A' = A$ is called perfect. Give examples of perfect and non-perfect
sets.
1.42. If $A \subset \mathbb{R}^n$, then $\partial A$, the boundary of $A$, is the set of all points $x$ such that
each neighborhood of $x$ contains a point of $A$ and a point of $A^c$.
(a) Show $A$ is closed if and only if $\partial A \subset A$.
(b) Show that $\overline{(\partial A)} = \partial A$; hence $\partial A$ is closed.
(c) Show that $\partial A = \bar{A} \setminus A^\circ$.
1.43. For each of the following sets give its closure, interior, derived set, and boundary.
(a) $\{x \in \mathbb{R}^n : 0 < |x| < 1\}$
(b) $\{\frac{1}{n} \in \mathbb{R} : n \in \mathbb{N}\}$
(c) $\{(\frac{1}{n}, \frac{1}{m}) \in \mathbb{R}^2 : n, m \in \mathbb{N}\}$
(d) $\{x \in \mathbb{R}^n : |x| < 1\}$.
1.44. Without using the Heine-Borel Theorem show that $\{(x, y) : x^2 + y^2 < 1\}$ is
not compact in $\mathbb{R}^2$.
1.45. Let $A$ and $B$ be open in $\mathbb{R}$. Prove that $A \times B$ is open in $\mathbb{R}^2$.
1.46. Show that a finite subset of $\mathbb{R}^n$ is closed.


1.47. Show that a countable subset of $\mathbb{R}$ is not open. Show that it may or may not
be closed.
1.48. Let $S$ be an uncountable subset of $\mathbb{R}$. Show that $S$ has a cluster point.
Hint: Show that at least one of the intervals $[n, n + 1]$, $n \in \mathbb{Z}$, must contain
uncountably many points of $S$.
1.49. Show that a closed interval is closed.
1.50. Let $\mathbb{Q}^2$ denote the set of points in $\mathbb{R}^2$ with rational coordinates. What is the
interior of $\mathbb{Q}^2$? The boundary of $\mathbb{Q}^2$? Show that $\mathbb{Q}^2$ is not connected.
1.51. Show that $\mathbb{Q}^c$ is not connected. Show that $(\mathbb{Q}^2)^c$ is connected, in fact, polygonally connected.
1.52. Show that an open connected set in $\mathbb{R}^n$ is also polygonally connected.


Chapter 2
Limits, Continuity, and Differentiation

2.1 Sequences

Definition 2.1. A sequence in $\mathbb{R}^n$ is a function $p : \mathbb{N} \to \mathbb{R}^n$. Sequences are
usually denoted by $\{p_k\}$ where $p_k := p(k)$.
Example 2.2 Two simple sequences:
1. $p_k := \frac{1}{k}$; $\{p_k\}$ is a sequence in $\mathbb{R}$.
2. $p_k := (\frac{1}{k}, \sin(k^2))$; $\{p_k\}$ is a sequence in $\mathbb{R}^2$.

Definition 2.3. A sequence $\{p_k\}$ in $\mathbb{R}^n$ is convergent if there exists $p \in \mathbb{R}^n$ such
that for each neighborhood $U$ of $p$, there is a natural number $N = N(U)$ for which
$k \ge N$ implies $p_k \in U$. Write $\lim_{k \to \infty} p_k = p$, or $\lim\{p_k\} = p$.
Equivalently, $\{p_k\}$ converges if there is a point $p \in \mathbb{R}^n$ such that for each
$\varepsilon > 0$, there exists an $N$ such that $k \ge N$ implies
$$|p_k - p| < \varepsilon.$$
$p$ is called the limit of the sequence.
The sequence $\{p_k\}$ is said to be divergent if it is not convergent.
Proposition 2.4. A convergent sequence cannot have two limits.
Proof. Suppose $\lim\{p_k\} = p$ and $\lim\{p_k\} = q$. If $\varepsilon > 0$, then
$\exists\, N_1$ such that $k \ge N_1 \Rightarrow |p_k - p| < \varepsilon$,
$\exists\, N_2$ such that $k \ge N_2 \Rightarrow |p_k - q| < \varepsilon$.
If $k = \max\{N_1, N_2\}$, then
$$|q - p| = |q - p_k + p_k - p| \le |p_k - q| + |p_k - p| < 2\varepsilon.$$
Therefore, $|q - p| < 2\varepsilon$ for each $\varepsilon > 0$; thus $|p - q| = 0$ (Archimedean Property).
Hence, $p = q$.
Example 2.5 Two simple limits:
1. If $x_k = 1$, $k = 1, 2, 3, \ldots$, then $\lim_{k \to \infty} x_k = 1$.
2. $\lim_{k \to \infty} \frac{1}{k} = 0$ (from the Archimedean Property, Theorem 1.6).

Remarks: Notice that if $\lim_{k \to \infty} p_k = p$, then either
(a) $p$ is a cluster point of the set $\{p_k : k \in \mathbb{N}\}$, or
(b) $\{p_k\}$ is ultimately constant and equal to $p$. (That is, there is an $N$ such that
$k \ge N \Rightarrow p_k = p$.)
Note also that $\lim_{k \to \infty} p_k = p$ if and only if $\lim_{k \to \infty} |p_k - p| = 0$.
Theorem 2.6. A convergent sequence is bounded.
Proof. If $\lim_{k \to \infty} p_k = p$, there exists a natural number $N$ such that $k \ge N \Rightarrow
|p_k - p| < 1$. By the triangle inequality,
$|p_k| - |p| \le |p_k - p| < 1$, if $k \ge N$
$\Rightarrow |p_k| < |p| + 1$, if $k \ge N$
$\Rightarrow |p_k| \le \max\{|p_1|, \ldots, |p_{N-1}|, |p| + 1\}$, $k \in \mathbb{N}$.

Example 2.7 If $p_k = k$, $\{p_k\}$ is divergent. By the Archimedean Property, $\{p_k\}$ is
unbounded and so is divergent by Theorem 2.6.
Definition 2.8. If $\{k_j\}$ is a sequence of natural numbers such that $k_1 < k_2 < k_3 <
\ldots$, then $\{p_{k_j}\}$ is called a subsequence of $\{p_k\}$.
Theorem 2.9. A bounded sequence has a convergent subsequence.
Proof. There are two cases to consider. Either $\{p_k : k \in \mathbb{N}\}$ is a finite set or it
is an infinite set. If it is a finite set, then there is at least one value $p$ such that
$p_k = p$ for infinitely many $k$; these terms in the sequence form a constant, hence
convergent, subsequence of $\{p_k\}$. In the other case, $\{p_k : k \in \mathbb{N}\}$ is a bounded infinite set and so by the
Bolzano-Weierstrass Theorem 1.38, it has a cluster point $p$. For the ball $B(p, 1)$,
there is a $k_1$ such that $p_{k_1} \in B(p, 1)$. Suppose $k_j$ is such that $p_{k_j} \in B(p, \frac{1}{j})$; then
there must exist $k_{j+1} > k_j$ such that $p_{k_{j+1}} \in B(p, \frac{1}{j+1})$. So, by induction, there
exists a subsequence $\{p_{k_j}\}$ of $\{p_k\}$ such that $|p_{k_j} - p| < \frac{1}{j}$, $j = 1, 2, 3, \ldots$. The
Archimedean Property implies $\lim_{j \to \infty} p_{k_j} = p$.

Corollary 2.10. If $K \subset \mathbb{R}^n$, then $K$ is compact if and only if each sequence of
points in $K$ has a subsequence convergent to a point in $K$.
Proof. ($\Rightarrow$) The Heine-Borel Theorem shows that $K$ is compact if and only if
it is closed and bounded. Thus, any sequence of points in $K$ is bounded, and so
by Theorem 2.9 contains a convergent subsequence. The limit of this subsequence is
either a cluster point of the subsequence (and hence of $K$), and so is contained in $K$
since $K$ is closed, or else the subsequence is ultimately constant, so its limit is one of
the members of the sequence and is in $K$.
($\Leftarrow$) Suppose each sequence in $K$ has a subsequence convergent to a point
of $K$. To show that $K$ is closed, suppose it is not. Then there exists a cluster point
$p$ of $K$ such that $p \notin K$. Thus, there exists a sequence of points $\{p_k\}$ from $K$ ($p_k$
chosen in $B(p, \frac{1}{k}) \cap K$) such that $\lim_{k \to \infty} p_k = p \notin K$. Every subsequence of $\{p_k\}$
also converges to $p$, contradicting our hypothesis that some subsequence has its limit in $K$.
To show that $K$ is bounded, again suppose it is not. Then there exist $p_k \in K$
such that $|p_k| > k$, $k = 1, 2, 3, 4, \ldots$. Every subsequence of $\{p_k\}$ is unbounded and so cannot
converge, contradicting our hypothesis.
Thus, $K$ is closed and bounded, and thus compact, by the Heine-Borel Theorem.

Theorem 2.11. A sequence is convergent to a limit $p$ if and only if each subsequence is convergent and has limit $p$.
Proof. ($\Rightarrow$) Suppose $\lim_{k \to \infty} p_k = p$, that is, for each $\varepsilon > 0$ there exists $N$ such
that $j \ge N \Rightarrow |p_j - p| < \varepsilon$. If $\{p_{k_j}\}$ is a subsequence of $\{p_k\}$, then
$$j \ge N \Rightarrow k_j \ge j \ge N \Rightarrow |p_{k_j} - p| < \varepsilon.$$
Thus, $\lim_{j \to \infty} p_{k_j} = p$.
($\Leftarrow$) Suppose each subsequence $\{p_{k_j}\}$ of $\{p_k\}$ satisfies $\lim_{j \to \infty} p_{k_j} = p$.
But $\{p_k\}$ is a subsequence of itself.
Example 2.12 If $p_k = (-1)^k$, then $\{p_k\}$ is divergent. Indeed,
$$\lim p_{2k} = 1, \qquad \lim p_{2k+1} = -1.$$
If $\{p_k\}$ were convergent, these two limits would be the same number by the previous
theorem.


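Theorem 2.13. If $\{p_k\}$ and $\{q_k\}$ are sequences in $\mathbb{R}^n$ such that
$$\lim p_k = p \quad \text{and} \quad \lim q_k = q,$$
then
(i) $\lim_{k \to \infty} (p_k + q_k) = p + q$,
(ii) $\lim_{k \to \infty} (p_k \cdot q_k) = p \cdot q$.
Further, if $\{x_k\}$ is a sequence in $\mathbb{R}$ such that $\lim_{k \to \infty} x_k = x$, then
(iii) $\lim_{k \to \infty} x_k p_k = xp$, and
(iv) $\lim_{k \to \infty} \frac{1}{x_k} p_k = \frac{1}{x} p$, if $x_k, x \neq 0$.
Proof. Exercise 2.2.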
The following result shows that it is sufficient to consider only sequences in R
when considering convergence or divergence.
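Theorem 2.14. Let $p_k = (x_{1,k}, \ldots, x_{n,k}) \in \mathbb{R}^n$. Then $\{p_k\}$ is convergent if and
only if each of the sequences $\{x_{i,k}\}$ is convergent for $i = 1, 2, \ldots, n$. Furthermore,
$$\lim p_k = p = (x_1, \ldots, x_n) \iff \lim x_{i,k} = x_i, \quad i = 1, 2, \ldots, n.$$
Proof. ($\Rightarrow$) $|p_k - p| \ge |x_{i,k} - x_i|$, for all $k \in \mathbb{N}$ and for $i = 1, 2, \ldots, n$. The
remainder of the proof is an exercise.
($\Leftarrow$) $|p_k - p| \le \sqrt{n}\, \max\{|x_{i,k} - x_i| : i = 1, \ldots, n\}$. Thus, if $\lim_{k \to \infty} x_{i,k} = x_i$,
$i = 1, \ldots, n$, then given any $\varepsilon > 0$, there exists $N_i$ such that $k \ge N_i \Rightarrow |x_{i,k} - x_i| < \frac{\varepsilon}{\sqrt{n}}$,
for each $i = 1, \ldots, n$. Let $N = \max\{N_i : i = 1, \ldots, n\}$, so $k \ge N$ implies
$$|x_{i,k} - x_i| < \frac{\varepsilon}{\sqrt{n}}, \quad i = 1, \ldots, n \ \Rightarrow\ |p_k - p| < \varepsilon \ \Rightarrow\ \lim_{k \to \infty} p_k = p.$$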
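The two inequalities behind this proof, $\max_i |x_i| \le |p| \le \sqrt{n}\,\max_i |x_i|$, are easy to check numerically. The sketch below does so for the hypothetical sequence $p_k = (\frac{1}{k}, 1 + \frac{1}{k^2})$ with limit $p = (0, 1)$; the function names and the small floating-point tolerance are assumptions of this illustration.

```python
import math


def norm(v):
    """Euclidean norm |v| of a tuple of reals."""
    return math.sqrt(sum(c * c for c in v))


def max_abs(v):
    """Largest absolute value among the coordinates of v."""
    return max(abs(c) for c in v)


def bounds_hold(v, tol=1e-12):
    """Check max|v_i| <= |v| <= sqrt(n) * max|v_i| up to rounding error."""
    n = len(v)
    return max_abs(v) <= norm(v) + tol and norm(v) <= math.sqrt(n) * max_abs(v) + tol


p = (0.0, 1.0)
# Differences p_k - p for p_k = (1/k, 1 + 1/k^2): both coordinates tend to 0.
diffs = [(1.0 / k - p[0], (1.0 + 1.0 / k**2) - p[1]) for k in range(1, 200)]
print(all(bounds_hold(d) for d in diffs), norm(diffs[-1]))
```

Because the norm is squeezed between the largest coordinate error and a fixed multiple of it, the vector sequence converges exactly when every coordinate sequence does.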

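Example 2.15 Further examples of limits of sequences:
(1) $\lim_{k \to \infty} (\frac{1}{k}, \frac{1}{k^2} + 1) = (0, 1)$. Use Theorems 2.13 and 2.14 together with $\lim_{k \to \infty} \frac{1}{k} =
0$ (Archimedean Property), to note that $\lim_{k \to \infty} \frac{1}{k^2} = 0$ and $\lim_{k \to \infty} (\frac{1}{k^2} + 1) = 1$
(both by Theorem 2.13), and then Theorem 2.14 to complete the example.
(2) $\lim_{k \to \infty} \frac{2k+3}{k+2} = 2$. Indeed,
$$\frac{2k + 3}{k + 2} = \frac{2 + \frac{3}{k}}{1 + \frac{2}{k}}$$
and
$$\lim_{k \to \infty} \frac{3}{k} = 3 \lim_{k \to \infty} \frac{1}{k} = 0, \qquad \lim_{k \to \infty} \frac{2}{k} = 0,$$
and the result follows by an application of Theorem 2.13.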


Exercises
2.1. Show that the two definitions of the limit of a sequence are equivalent.
2.2. Prove Theorem 2.13.
2.3. Finish the proof of ($\Rightarrow$) in Theorem 2.14.
2.4. Let $x_k \le z_k \le y_k$. Prove that if $\{x_k\}$ and $\{y_k\}$ are convergent with limit $c$,
then $\{z_k\}$ is convergent with limit $c$.
2.5. Discuss the convergence or divergence of the sequences whose $k$th terms are
given by:
(a) $\frac{k}{k+1}$
(b) $\frac{(-1)^k k}{k+1}$
(c) $\frac{2k}{3k^2+1}$
(d) $\frac{2k^2+3}{3k^2+1}$
(e) $(\frac{1}{k}, k)$
(f) $((-1)^k, \frac{1}{k})$

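2.6. If $\{x_k\}$ is a sequence of nonnegative real numbers which converges to $x$, show
that $\lim_{k \to \infty} \sqrt{x_k} = \sqrt{x}$. HINT: $\sqrt{x_k} - \sqrt{x} = \dfrac{x_k - x}{\sqrt{x_k} + \sqrt{x}}$.
2.7. Let $x_k = \sqrt{k + 1} - \sqrt{k}$. Discuss the convergence of $\{x_k\}$ and $\{\sqrt{k}\, x_k\}$.
2.8. Show that a set $C$ in $\mathbb{R}^n$ is closed if and only if each convergent sequence in
$C$ has its limit in $C$.
2.9. Show $\lim_{k \to \infty} p_k = p$ if and only if $\lim_{k \to \infty} |p_k - p| = 0$.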
2.10. If $0 < r < 1$, show that $\lim_{k \to \infty} r^k = 0$. HINT: Set $r = \frac{1}{1+s}$, $s > 0$. Show
that $(1 + s)^k \ge 1 + ks$, $k = 1, 2, \ldots$, so $0 < r^k = \frac{1}{(1+s)^k} \le \frac{1}{1+ks}$. From this
show that
$$\lim_{k \to \infty} r^k = \begin{cases} 0, & \text{if } |r| < 1; \\ 1, & \text{if } r = 1, \end{cases}$$
and $\{r^k\}$ is divergent if $r = -1$ or $|r| > 1$.
2.11. (Ratio Test) Let $\{x_k\}$ be such that $\lim_{k \to \infty} \left| \frac{x_{k+1}}{x_k} \right| = r$. Show
(a) If $0 \le r < 1$, then $\lim_{k \to \infty} x_k = 0$.
(b) If $r > 1$, then $\{x_k\}$ is divergent.
(c) If $r = 1$, give examples which show that $\{x_k\}$ may be convergent or
divergent.
HINT: In (a), if $r < s < 1$, show that for some constant $A$ and all large
enough $k$, $0 \le |x_k| \le A s^k$ (using the previous exercise).
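2.12. Show that the sequence $\left\{ \frac{x^k}{k} \right\}$ has limit 0 if $-1 \le x \le 1$, and is divergent if
$|x| > 1$.
2.13. Show that $\lim_{k \to \infty} \frac{x^k}{k!} = 0$ for all real numbers $x$.
2.14. If $x > 0$ show $\lim_{k \to \infty} x^{1/k} = 1$. HINT: If $0 < x < 1$, given $\varepsilon > 0$ there exists
$N$ such that $k \ge N$ implies $\frac{1}{1+k\varepsilon} < x < 1$. (Why?) Therefore
$$\frac{1}{(1 + \varepsilon)^k} \le \frac{1}{1 + k\varepsilon} < x < 1$$
$$\Rightarrow\ \frac{1}{1 + \varepsilon} < x^{1/k} < 1, \quad \text{if } k \ge N$$
$$\Rightarrow\ |x^{1/k} - 1| < \frac{\varepsilon}{1 + \varepsilon} < \varepsilon, \quad \text{if } k \ge N.$$
Do the case $x > 1$.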


2.15. Show $\lim_{k \to \infty} k^{1/k} = 1$. HINT: Let $x_k = k^{1/k} - 1 > 0$ for $k \ge 2$. Show $k = (1 + x_k)^k >
\frac{k(k-1)}{2} x_k^2$, and hence, $\lim_{k \to \infty} x_k = 0$.
2.16. (Root Test) Let $\{x_k\}$ be such that $\lim_{k \to \infty} |x_k|^{1/k} = r$. Show
(a) If $0 \le r < 1$, then $\lim_{k \to \infty} x_k = 0$.
(b) If $r > 1$, then $\{x_k\}$ is divergent.
(c) If $r = 1$, then $\{x_k\}$ may be either convergent or divergent.
2.17. If $a$ and $b$ are nonnegative real numbers, show that $\lim_{k \to \infty} (a^k + b^k)^{1/k} = \max\{a, b\}$.

Definition 2.16. A real sequence $\{x_k\}$ is increasing if $x_1 \le x_2 \le x_3 \le \ldots$, and is
decreasing if $x_1 \ge x_2 \ge x_3 \ge \ldots$. The sequence is monotone if it is either increasing
or decreasing.
Theorem 2.17. A monotone sequence is convergent if and only if it is bounded.
Proof. Let $\{x_k\}$ be an increasing sequence. By Theorem 2.6, $\{x_k\}$ convergent
implies $\{x_k\}$ is bounded. On the other hand, $\{x_k\}$ bounded implies $\sup\{x_k : k =
1, 2, \ldots\} = x^*$ exists. If $\varepsilon > 0$, then $x^* - \varepsilon$ is not an upper bound for $\{x_k\}$, so there
exists an $N$ such that $x^* - \varepsilon < x_N \le x^*$. Since $\{x_k\}$ is increasing,
$$k \ge N \Rightarrow x^* - \varepsilon < x_N \le x_k \le x^*,$$
that is, $k \ge N \Rightarrow |x_k - x^*| < \varepsilon$. Thus, the increasing sequence converges to its
supremum. The proof for a decreasing sequence is similar, with the infimum in place of the supremum.
Remark: Given any ordered field $F$, each monotone bounded sequence in $F$ is
convergent (with its limit in $F$) if and only if $F$ is complete.
Example 2.18
1. $\lim_{k \to \infty} r^k = 0$ if $0 \le r < 1$.
Note that $0 \le r^{k+1} = r r^k \le r^k < 1$, so $\{r^k\}$ is decreasing and bounded, hence
is convergent with $0 \le \lim_{k \to \infty} r^k < 1$. Now $\{r^{k+1}\}$ is a subsequence of $\{r^k\}$,
so they have the same limit. But $\lim_{k \to \infty} r^k > 0$ implies
$$\lim r^{k+1} = r \lim r^k < \lim r^k,$$
a contradiction. (A different proof is given in the exercises.)
2. If $a_k = \frac{k}{2^k}$, then $\lim_{k \to \infty} a_k = 0$. Indeed,
$$a_{k+1} = \frac{k+1}{2^{k+1}} = \frac{k+1}{2k}\, a_k \ \Rightarrow\ 0 < a_{k+1} \le a_k.$$
Hence, $\{a_k\}$ is convergent and $\lim_{k \to \infty} a_k \ge 0$. Now,
$$\lim a_k = \lim a_{k+1} = \lim \frac{k+1}{2k}\, a_k = \frac{1}{2} \lim a_k$$
implies $\lim_{k \to \infty} a_k$ must be zero.


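3. For $a_k = (1 + \frac{1}{k})^k$, the sequence $\{a_k\}$ is convergent and $2 \le \lim_{k \to \infty} a_k \le 3$.
Indeed, by the binomial theorem
$$a_k = \left(1 + \frac{1}{k}\right)^k = 1 + \frac{k}{1!} \frac{1}{k} + \frac{k(k-1)}{2!} \frac{1}{k^2} + \ldots + \frac{k!}{k!} \frac{1}{k^k}$$
$$= 1 + 1 + \frac{1}{2!}\left(1 - \frac{1}{k}\right) + \frac{1}{3!}\left(1 - \frac{1}{k}\right)\left(1 - \frac{2}{k}\right) + \ldots + \frac{1}{k!}\left(1 - \frac{1}{k}\right) \cdots \left(1 - \frac{k-1}{k}\right).$$
Notice that the typical term after the first two has the form
$$\frac{1}{\ell!}\left(1 - \frac{1}{k}\right) \cdots \left(1 - \frac{\ell - 1}{k}\right)$$
and this increases in $k$. Thus, each term of $a_k$ is no greater than the corresponding term in $a_{k+1}$, and $a_{k+1}$ has one more term. Thus, $a_{k+1} > a_k$ and
$\{a_k\}$ is increasing.
We next show $\{a_k\}$ is bounded. Examining the above, we find
$$2 \le a_k \le 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \ldots + \frac{1}{k!} \le 1 + 1 + \frac{1}{2} + \frac{1}{2^2} + \ldots + \frac{1}{2^{k-1}} \le 3.$$
Thus, $2 \le \lim_{k \to \infty} (1 + \frac{1}{k})^k \le 3$. This limit is sometimes taken as the definition
of $e$.
Note: We used the fact that $1 + r + r^2 + \ldots + r^k = \frac{1 - r^{k+1}}{1 - r}$ if $r \neq 1$. You
will recall that this is easily proved by denoting the left-hand side by $s_k$ and
showing $s_k - r s_k = 1 - r^{k+1}$.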

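4. If $A > 0$, $x_1 > 0$, and $x_{k+1} = \frac{1}{2}(x_k + \frac{A}{x_k})$, $k = 1, 2, \ldots$, then $\lim_{k \to \infty} x_k = \sqrt{A}$.
This is an algorithm for computing square roots. Notice that $x_1 > 0 \Rightarrow x_k >
0$ by induction. Then
$$x_{k+1}^2 = \frac{1}{4}\left(x_k^2 + 2A + \frac{A^2}{x_k^2}\right)$$
$$\Rightarrow\ x_{k+1}^2 - A = \frac{1}{4}\left(x_k^2 - 2A + \frac{A^2}{x_k^2}\right) = \frac{1}{4}\left(x_k - \frac{A}{x_k}\right)^2 \ge 0$$
$$\Rightarrow\ x_{k+1}^2 \ge A \ \Rightarrow\ x_{k+1} \ge \sqrt{A}, \quad k = 1, 2, \ldots.$$
Hence, $x_k \ge \sqrt{A}$, for $k = 2, 3, \ldots$. Thus, for $k \ge 2$,
$$x_k - x_{k+1} = x_k - \frac{1}{2}\left(x_k + \frac{A}{x_k}\right) = \frac{1}{2}\left(x_k - \frac{A}{x_k}\right) = \frac{1}{2}\, \frac{x_k^2 - A}{x_k} \ge 0.$$
So, from $k = 2$ on, $\{x_k\}$ is decreasing and bounded below by $\sqrt{A} > 0$. Therefore, $\lim_{k \to \infty} x_k =
L \ge \sqrt{A}$. But, on the other hand,
$$x_{k+1} = \frac{1}{2}\left(x_k + \frac{A}{x_k}\right) \ \Rightarrow\ L = \frac{1}{2}\left(L + \frac{A}{L}\right) \ \Rightarrow\ L^2 = A \ \Rightarrow\ L = \sqrt{A}.$$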
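Items 3 and 4 of the example above lend themselves to a quick numerical check. The following sketch iterates $x_{k+1} = \frac{1}{2}(x_k + A/x_k)$ and tabulates $a_k = (1 + \frac{1}{k})^k$; the function names, the starting value $x_1 = 10$, and the choice $A = 2$ are illustrative assumptions, and floating-point arithmetic only approximates the exact sequences.

```python
import math


def sqrt_iterates(A, x1, steps):
    """Return [x_1, ..., x_steps] for the recursion x_{k+1} = (x_k + A / x_k) / 2."""
    xs = [x1]
    for _ in range(steps - 1):
        x = xs[-1]
        xs.append(0.5 * (x + A / x))
    return xs


def e_sequence(n):
    """Return [a_1, ..., a_n] with a_k = (1 + 1/k)^k."""
    return [(1.0 + 1.0 / k) ** k for k in range(1, n + 1)]


xs = sqrt_iterates(2.0, 10.0, 12)
es = e_sequence(50)
print(xs[-1], math.sqrt(2.0))  # the iterates settle near sqrt(2)
print(es[0], es[-1])           # increasing values, all between 2 and 3
```

From the second term on, the square-root iterates decrease toward $\sqrt{A}$, as the monotone convergence argument predicts, while the values $a_k$ increase and stay below 3.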

The results on monotone sequences are interesting in that, unlike the preceding examples, it is not necessary first to guess the limit of a sequence in order to
show that it converges. Fortunately, we are able to do this in general.
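Definition 2.19. A sequence $\{p_k\}$ in $\mathbb{R}^n$ is a Cauchy sequence if, for each $\varepsilon > 0$,
there exists a natural number $N = N(\varepsilon)$ such that if $k, m \ge N$, then
$$|p_k - p_m| < \varepsilon.$$
Theorem 2.20 (Cauchy Criterion). $\{p_k\}$ is convergent if and only if $\{p_k\}$ is a
Cauchy sequence.
Proof. ($\Rightarrow$) Suppose $\lim_{k \to \infty} p_k = p$. Thus, for each $\varepsilon > 0$, there exists $N$ such
that $k \ge N \Rightarrow |p_k - p| < \varepsilon/2$. Hence, if $k, m \ge N$, we have
$$|p_k - p_m| = |p_k - p + p - p_m| \le |p_k - p| + |p - p_m| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Thus, $\{p_k\}$ is a Cauchy sequence.
($\Leftarrow$) We first show that $\{p_k\}$ is bounded and so by the Bolzano-Weierstrass
Theorem (Theorem 1.38) or Theorem 2.9 it has a convergent subsequence. Indeed,
choose $N$ so that $k, m \ge N \Rightarrow |p_k - p_m| < 1$. Then,
$$k > N \Rightarrow |p_k - p_N| < 1 \Rightarrow |p_k| < |p_N| + 1.$$
Hence, $|p_k| \le \max\{|p_1|, \ldots, |p_{N-1}|, |p_N| + 1\}$ for all $k$.
Suppose now that $\{p_{k_j}\}$ is the subsequence that converges. We claim $\lim_{j \to \infty} p_{k_j} =
p \Rightarrow \lim_{k \to \infty} p_k = p$. Indeed, since $\lim_{j \to \infty} p_{k_j} = p$, for each $\varepsilon > 0$ there is an $N_1$
such that $j \ge N_1 \Rightarrow |p_{k_j} - p| < \varepsilon/2$. Since $\{p_k\}$ is a Cauchy sequence, there is
also an $N_2$ such that $m, k \ge N_2 \Rightarrow |p_k - p_m| < \varepsilon/2$. Take any $j \ge \max\{N_1, N_2\}$, so that $k_j \ge j \ge \max\{N_1, N_2\}$.
Then for $k \ge \max\{N_1, N_2\}$, we have
$$|p_k - p| = |p_k - p_{k_j} + p_{k_j} - p| \le |p_k - p_{k_j}| + |p_{k_j} - p| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Thus, $\lim_{k \to \infty} p_k = p$.
Remark: An ordered field $F$ is complete if and only if each Cauchy sequence in $F$
is convergent (with its limit in $F$).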

Example 2.21
1. The sequence $\{(-1)^k\}$ is divergent.
In fact, we have for $x_k = (-1)^k$ that $|x_k - x_{k+1}| = 2$ for all $k$, and so $\{x_k\}$
cannot be a Cauchy sequence.
2. For $x_k = 1 + \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{k}$, $\{x_k\}$ is divergent. Indeed,
$$|x_{2k} - x_k| = \frac{1}{k+1} + \frac{1}{k+2} + \ldots + \frac{1}{2k} \ge \underbrace{\frac{1}{2k} + \ldots + \frac{1}{2k}}_{k \text{ times}} = \frac{1}{2}.$$
Therefore, $|x_{2k} - x_k| \ge \frac{1}{2}$ for all $k$ and $\{x_k\}$ cannot be a Cauchy sequence.
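The estimate for the harmonic partial sums can be confirmed with exact rational arithmetic. This is a minimal sketch; the helper name `harmonic` and the range of $k$ tested are arbitrary choices for illustration.

```python
from fractions import Fraction


def harmonic(k):
    """Exact partial sum x_k = 1 + 1/2 + ... + 1/k."""
    return sum(Fraction(1, j) for j in range(1, k + 1))


# Gaps x_{2k} - x_k for k = 1, ..., 8; each should be at least 1/2.
gaps = [harmonic(2 * k) - harmonic(k) for k in range(1, 9)]
print([str(g) for g in gaps])
```

Since the gap between $x_{2k}$ and $x_k$ never drops below $\frac{1}{2}$, no choice of $N$ can satisfy the Cauchy condition for $\varepsilon = \frac{1}{2}$.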

Exercises
2.17. Show that the following sequences are divergent by proving directly they are
not Cauchy sequences.
(a) $\{k\}$ (b) $\{(-1)^k (1 - \frac{1}{k})\}$
2.18. Show directly that the following are Cauchy sequences and hence are convergent.
(a) $\{\frac{k+1}{k}\}$ (b) $\{1 + \frac{1}{1!} + \frac{1}{2!} + \ldots + \frac{1}{k!}\}$.
2.19. Determine whether each of the following sequences is convergent or divergent.
In the case
convergence,
find the
o
n limit.
o
n of
k3 +k2 +1
4k2 +k+4
(b)
(a)
3
4
(k+1)
(1+k)(2+ k
o
n
2
2
(k +1)
(d) {k 2 + k}
(c)
(k+1)(k+2)(k+3)



sin k
+3k
3
3
(e) {1 k }
(f)
k
(g)

sin

k
3

+2k2

(h)

k2 +1
k3 +1

cos2

k
4

o


k
k
(i)
3 [3] ,
where [x] denotes the greatest integer not exceeding x.
2.20. For what values of x are the following convergent, divergent? Wherever you
can, give
n the limit.
o


xk
(a)
(b)
(k + 1)(k + 2)xk
(k+2)(k+1)
o
n k k1 o
n
kx +x
+1
xk +k
(d)
(c)
xk+1 +(k+1)
kxk1 +1
n ko
 k
x
(f)
k!x
(e)
k!
2.21. If $a_1 = 2$, $a_{k+1} = \sqrt{6 + a_k}$, $k \ge 1$, show that $\{a_k\}$ is increasing and
$\lim_{k \to \infty} a_k = 3$.


2.22. Show that the sequence $1, 1.4, \ldots, a_k, \ldots$, in which $(2a_k + 3)\, a_{k+1} = 4 + 3a_k$,
is monotone and that $\lim_{k \to \infty} a_k = \sqrt{2}$.
2.23. If $a_1 = 1$ and $a_{k+1}(1 + a_k) = 12$, show that $\lim_{k \to \infty} a_k = 3$. Hint: Show
$\{a_{2k+1}\}$ is increasing and $\{a_{2k}\}$ is decreasing.
2.24. Show the following.
(a) If $\lim_{k \to \infty} a_k = 0$, then $\lim_{k \to \infty} \sigma_k = 0$, where $\sigma_k = \frac{a_1 + \ldots + a_k}{k}$.
(b) If $\lim_{k \to \infty} a_k = a$, then $\lim_{k \to \infty} \sigma_k = a$. Hint: $\lim_{k \to \infty} (a_k - a) = 0$.
(c) Give an example to show that $\{\sigma_k\}$ may be convergent even though $\{a_k\}$
is not.
2.25. We have seen that $\lim_{k \to \infty} x^{1/k} = 1$ if $x > 0$ (Exercise 2.14). An easier proof is
now available to us from our results on monotone sequences. Show $\{x^{1/k}\}$ is
monotone and bounded, hence convergent. Consider the subsequence $\{x^{1/(2k)}\}$
and deduce the result.
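2.26. Let $\{x_k\}$ be a sequence of positive real numbers. Show that
$$\lim_{k \to \infty} \frac{x_{k+1}}{x_k} = r \ \Rightarrow\ \lim_{k \to \infty} x_k^{1/k} = r.$$
Hint: If $r > 0$ and $0 < \varepsilon < r$, then there exist positive numbers $A$, $B$ and a natural
number $N$ such that
$$A(r - \varepsilon)^k \le x_k \le B(r + \varepsilon)^k, \quad k > N.$$
2.27. By applying the last exercise to the sequence $\{\frac{k^k}{k!}\}$ show that
$$\lim_{k \to \infty} \frac{k}{(k!)^{1/k}} = e, \quad \text{where} \quad e = \lim_{k \to \infty} \left(1 + \frac{1}{k}\right)^k.$$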
2.28. Let $s_k = (1 + \frac{1}{k})^k$, $t_k = 1 + \frac{1}{1!} + \frac{1}{2!} + \ldots + \frac{1}{k!}$. We have seen that both
sequences are convergent. Show that they have the same limit. Hint: First
use the Binomial Theorem to show
$$s_k = 1 + \frac{1}{1!} + \frac{1}{2!}\left(1 - \frac{1}{k}\right) + \ldots + \frac{1}{k!}\left(1 - \frac{1}{k}\right) \cdots \left(1 - \frac{k-1}{k}\right) \le t_k.$$
Next, with $m$ fixed, and $k \ge m$, show that
$$s_k \ge 1 + \frac{1}{1!} + \frac{1}{2!}\left(1 - \frac{1}{k}\right) + \ldots + \frac{1}{m!}\left(1 - \frac{1}{k}\right) \cdots \left(1 - \frac{m-1}{k}\right)$$
and deduce that $\lim_{k \to \infty} s_k \ge t_m$ for each $m$.
2.29. Show that every sequence of real numbers has a monotone subsequence. (Be
careful.)
2.30. Let $F$ be an ordered field. Show that the following statements are equivalent.
(a) $F$ is complete (sets bounded above have a supremum in $F$).
(b) $F$ has the nested interval property.
(c) Each bounded monotone sequence in $F$ converges to an element of $F$.
(d) Each Cauchy sequence in $F$ converges to an element of $F$.
Discussion: All four statements in this problem are equivalent. However, (a),
(b), (c) are inapplicable if one wishes to consider completeness for sets which
are not ordered, whereas (d) is applicable to any set $X$ for which distance
between points, i.e., a metric, has been defined. A function $\rho : X \times X \to \mathbb{R}$
is a metric if
(i) $\rho(x, y) \ge 0$, $x, y \in X$.
(ii) $\rho(x, y) = 0 \iff x = y$.
(iii) $\rho(x, y) = \rho(y, x)$, $x, y \in X$.
(iv) $\rho(x, y) \le \rho(x, z) + \rho(z, y)$ for all $x, y, z \in X$. (Triangle inequality)
The pair $(X, \rho)$ is called a metric space. A sequence $\{x_k\}$ in $X$ is a Cauchy
sequence (or fundamental sequence) if, for each $\varepsilon > 0$, there exists a natural
number $N$ such that $m, k \ge N \Rightarrow \rho(x_m, x_k) < \varepsilon$. A metric space $(X, \rho)$ is
called complete if each Cauchy sequence in $(X, \rho)$ is convergent (i.e., there
exists $x \in X$ such that $\lim_{k \to \infty} \rho(x_k, x) = 0$ whenever $\{x_k\}$ is a Cauchy
sequence). For example, the space $(\mathbb{R}^n, \rho)$ is a complete metric space if
$\rho(p, q) = |p - q|$. It is easily seen that $\rho$ is a metric from the properties
(i), (ii), (iii), and (iv) above. Theorem 2.20 implies the completeness of $\mathbb{R}^n$.

2.2 Continuity

Let $f : \mathbb{R}^n \to \mathbb{R}^m$, and let $D \subset \mathbb{R}^n$ be the domain of $f$.
Definition 2.22. Suppose $p_0 \in D$. We say $f$ is continuous at $p_0$ if, for each
neighborhood $V$ of $f(p_0)$, there exists a neighborhood $U$ of $p_0$ such that $p \in U \cap D \Rightarrow
f(p) \in V$ (i.e., $f(U) \subset V$). Equivalently, $f$ is continuous at $p_0$ if, for each $\varepsilon > 0$,
there is a $\delta > 0$ such that
$$p \in D \quad \text{and} \quad |p - p_0| < \delta \ \Rightarrow\ |f(p) - f(p_0)| < \varepsilon.$$
In general, $\delta = \delta(\varepsilon, p_0)$.

Definition 2.23. The function $f$ is continuous on $D$ if it is continuous at each
point $p_0$ of $D$.
Example 2.24 Let $f(x) = x^2$, $-1 \le x \le 1$. Then $D = [-1, 1]$, $f(D) = [0, 1]$, and
$$|f(x) - f(x_0)| = |x^2 - x_0^2| = |(x - x_0)(x + x_0)| \le |x - x_0|(|x| + |x_0|)$$
$$\Rightarrow\ |f(x) - f(x_0)| \le 2|x - x_0| \quad \text{for } x, x_0 \in [-1, 1].$$
If $\varepsilon > 0$, take $\delta = \varepsilon/2$; then $|x - x_0| < \delta$ and $x, x_0 \in [-1, 1]$ implies
$$|f(x) - f(x_0)| < \varepsilon,$$
so $f$ is continuous on $[-1, 1]$.
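The choice $\delta = \varepsilon/2$ in this example can be spot-checked by sampling. The sketch below is not a proof: it only evaluates $|f(x) - f(x_0)|$ at finitely many points $x$ with $|x - x_0| < \delta$ inside $[-1, 1]$, and the helper names and sample counts are assumptions of the illustration.

```python
def f(x):
    """The function of the example, f(x) = x^2."""
    return x * x


def delta_for(eps):
    """The delta chosen in the example: delta = eps / 2."""
    return eps / 2.0


def worst_gap(x0, eps, samples=1000):
    """Largest sampled |f(x) - f(x0)| over x in (x0 - delta, x0 + delta) within [-1, 1]."""
    d = delta_for(eps)
    worst = 0.0
    for i in range(1, samples):
        # Sample points strictly inside the interval (x0 - d, x0 + d).
        x = x0 - d + 2.0 * d * i / samples
        if -1.0 <= x <= 1.0:
            worst = max(worst, abs(f(x) - f(x0)))
    return worst


for x0 in (-1.0, -0.3, 0.0, 0.5, 1.0):
    for eps in (1.0, 0.1, 0.001):
        print(x0, eps, worst_gap(x0, eps) < eps)
```

Note that the same $\delta$ works for every $x_0$ in $[-1, 1]$ here, although in general $\delta$ depends on both $\varepsilon$ and $p_0$.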

Figure 2.1. $f$ is continuous at $p_0$.


Theorem 2.25. Let $f : D \to \mathbb{R}^m$, $D \subset \mathbb{R}^n$. Then $f$ is continuous at $p_0 \in
D$ if and only if for each sequence $\{p_k\}$ in $D$ such that $\lim_{k \to \infty} p_k = p_0$ we have
$\lim_{k \to \infty} f(p_k) = f(p_0)$.
Proof. ($\Rightarrow$) $f$ continuous at $p_0$ implies that for every $\varepsilon > 0$, there is a $\delta > 0$ such
that for $p \in D$ and $|p - p_0| < \delta$ we have $|f(p) - f(p_0)| < \varepsilon$. Now, $\{p_k\} \subset D$ and
$\lim_{k \to \infty} p_k = p_0$ implies the existence of $N$ such that $k \ge N \Rightarrow |p_k - p_0| < \delta$.
Hence, for this $N$ we have $k \ge N \Rightarrow |f(p_k) - f(p_0)| < \varepsilon$; that is, $\lim_{k \to \infty} f(p_k) =
f(p_0)$.
($\Leftarrow$) Suppose $\lim_{k \to \infty} f(p_k) = f(p_0)$ for every sequence $\{p_k\}$ in $D$ with
$\lim_{k \to \infty} p_k = p_0$. Assume $f$ is not continuous at $p_0$. Then, negating the definition
of continuity, there exists an $\varepsilon_0 > 0$ such that each neighborhood $U$ of $p_0$ contains a
point $p_U \in D$ for which $|f(p_U) - f(p_0)| \ge \varepsilon_0$. Consider $U = B(p_0, \frac{1}{k})$ and set $p_U = p_k$,
for each $k = 1, 2, \ldots$. Then $\lim_{k \to \infty} p_k = p_0$, and $|f(p_k) - f(p_0)| \ge \varepsilon_0$, so $f(p_k)$
does not converge to $f(p_0)$, contrary to our hypothesis. Hence, the assumption that
$f$ is not continuous at $p_0$ is false.

Corollary 2.26.
(i) If $f, g : \mathbb{R}^n \to \mathbb{R}^m$ are continuous at $p_0$, then $f + g$ and $f \cdot g$ are continuous at
$p_0$.
(ii) If $f : \mathbb{R}^n \to \mathbb{R}^m$ and $\varphi : \mathbb{R}^n \to \mathbb{R}$ are continuous at $p_0$, then $\varphi f$ is continuous
at $p_0$ and $\frac{1}{\varphi} f$ is continuous at $p_0$ when $\varphi(p_0) \neq 0$.
Proof. This follows immediately from the corresponding theorem for sequences,
Theorem 2.13.

Corollary 2.27. A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is continuous at $p_0$ if and only if each
component of $f$ is continuous at $p_0$.
Proof. If $f(p) = (f_1(p), \ldots, f_m(p))$ and $\lim_{k \to \infty} p_k = p_0$, then
$$\lim f(p_k) = f(p_0) \iff \lim f_i(p_k) = f_i(p_0), \quad i = 1, 2, \ldots, m,$$
by Theorem 2.14.
Example 2.28 Some examples of continuous and discontinuous functions:
(1) $D = [0, 1]$, $f(x) = 1$ for $0 < x \le 1$, and $f(0) = 0$. Then $f$ is discontinuous
at $x = 0$. Indeed, if $\{x_k\}$ is any sequence in $(0, 1]$ with $\lim_{k \to \infty} x_k = 0$, then
$f(x_k) = 1$, so $\lim_{k \to \infty} f(x_k) = 1 \neq 0 = f(0)$.
(2) Let $D = \mathbb{R}$, $f(x) = \sin(\frac{1}{x})$ for $x \neq 0$, and $f(0) = 0$. Then $f$ is discontinuous at
$x = 0$ since
$$\lim_{k \to \infty} \frac{2}{(2k + 1)\pi} = 0 \quad \text{and} \quad f\left(\frac{2}{(2k + 1)\pi}\right) = (-1)^k,$$
and the latter sequence is not convergent.
Remark: The discontinuity in the first example is removable, i.e., the discontinuity at $x = 0$ can be removed by changing the value of the function
at 0. The discontinuity in the second example is essential since no matter
what value is assigned to the function at $x = 0$, the discontinuity cannot be
removed.
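(3) $D = \mathbb{R}^2$, $f(x, y) = \frac{xy}{x^2 + y^2}$ for $(x, y) \neq O$, and $f(0, 0) = 0$. The function $f$ is
discontinuous at $(0, 0)$. Indeed, for the sequence of points
$$p_k = \left(\frac{1}{k}, \frac{1}{k}\right), \quad k \in \mathbb{N},$$
we have $\lim p_k = O$ and
$$\lim f(p_k) = \lim \frac{1}{k^2} \Big/ \left(2 \cdot \frac{1}{k^2}\right) = \frac{1}{2} \neq 0.$$
The discontinuity at $(0, 0)$ is essential since if $q_k = (\frac{1}{k}, \frac{1}{2k})$, then
$\lim q_k = O$, but
$$\lim f(q_k) = \lim \frac{1}{2k^2} \Big/ \left(\frac{1}{k^2} + \frac{1}{4k^2}\right) = \frac{2}{5}.$$
So we obtain two different limits from two different choices of sequences.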


(4) $D = \mathbb{R}^2$, $f(x, y) = \frac{x^2 y^2}{x^2 + y^2}$ for $(x, y) \neq O$, and $f(0, 0) = 0$. This function $f$ is
continuous on $\mathbb{R}^2$.
We first show continuity at $p_0 = (x_0, y_0) \neq O$. If $\lim_{k \to \infty} (x_k, y_k) = (x_0, y_0)$,
then $\lim_{k \to \infty} x_k^2 y_k^2 = x_0^2 y_0^2$, and $\lim_{k \to \infty} (x_k^2 + y_k^2) = x_0^2 + y_0^2 \neq 0$. So, by
Theorem 2.13, $\lim_{k \to \infty} f(x_k, y_k) = f(p_0)$, and $f$ is continuous at $p_0$ by Theorem 2.25.
When $p_0 = O$, we have $x_0^2 + y_0^2 = 0$, so because the denominator becomes
zero, we must prove continuity directly: if $(x, y) \neq (0, 0)$, then $x^2 + y^2 \neq 0$ and
$$|f(x, y) - f(0, 0)| = \frac{x^2 y^2}{x^2 + y^2} \le \frac{x^4 + 2x^2 y^2 + y^4}{2(x^2 + y^2)} = \frac{x^2 + y^2}{2} = \frac{1}{2} |(x, y)|^2.$$
Hence, if $|(x, y)| = |(x, y) - (0, 0)| < \sqrt{2\varepsilon}$, we have $|f(x, y) - f(0, 0)| < \varepsilon$.
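The two limits found in item (3) of the example above can be reproduced with exact rational arithmetic. This is only an illustrative sketch; the function name and the handful of indices $k$ are arbitrary choices.

```python
from fractions import Fraction


def f(x, y):
    """f(x, y) = xy / (x^2 + y^2) away from the origin, and f(0, 0) = 0."""
    if x == 0 and y == 0:
        return Fraction(0)
    return Fraction(x * y) / (x * x + y * y)


# Values along p_k = (1/k, 1/k) and along q_k = (1/k, 1/(2k)).
along_p = [f(Fraction(1, k), Fraction(1, k)) for k in range(1, 6)]
along_q = [f(Fraction(1, k), Fraction(1, 2 * k)) for k in range(1, 6)]
print(along_p[0], along_q[0])
```

The function is constant along each of the two sequences, with different constants, so no value assigned at the origin can make it continuous there.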


Recall that if $f : \mathbb{R}^n \to \mathbb{R}^\ell$ and $g : \mathbb{R}^\ell \to \mathbb{R}^m$, then $g \circ f$ is defined as $g \circ f(p) = g(f(p))$. The domain of $g \circ f$ is $\{p \in D_f : f(p) \in D_g\} = f^{-1}(D_g) \cap D_f$.
Theorem 2.29. Suppose $p_0 \in D_{g \circ f}$. Then $g \circ f$ is continuous at $p_0$ if
(i) $f$ is continuous at $p_0$, and
(ii) $g$ is continuous at $f(p_0)$.
Proof. If $\{p_k\}$ is a sequence in $D_{g \circ f} \subset D_f$ such that $\lim_{k\to\infty} p_k = p_0$, then $\{f(p_k)\}$ is a sequence in $D_g$ such that $\lim_{k\to\infty} f(p_k) = f(p_0)$, since $f$ is continuous at $p_0$. Since $g$ is continuous at $f(p_0)$, we then have $\lim_{k\to\infty} g(f(p_k)) = g(f(p_0))$. Thus, $\lim_{k\to\infty} g \circ f(p_k) = g \circ f(p_0)$, and so, by Theorem 2.25, $g \circ f$ is continuous at $p_0$.
Corollary 2.30. If $f : \mathbb{R}^n \to \mathbb{R}^m$ is continuous at $p_0$, then $|f| : \mathbb{R}^n \to \mathbb{R}$ is continuous at $p_0$.
Proof. If $g(q) = |q|$ for $q \in \mathbb{R}^m$, then $g$ is continuous on $\mathbb{R}^m$ since
\[ \big||q| - |q_0|\big| \le |q - q_0|. \]
Thus, if $\varepsilon > 0$ is given, $|q - q_0| < \varepsilon \implies \big||q| - |q_0|\big| < \varepsilon$. Therefore, $g \circ f = |f|$ is continuous at $p_0$ since $g$ is continuous at $f(p_0)$.

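2.3 Global Properties of Continuous Functions

Again recall the notation that if $f : \mathbb{R}^n \to \mathbb{R}^m$ with domain $D \subset \mathbb{R}^n$ and range $f(D) \subset \mathbb{R}^m$, then for $A \subset \mathbb{R}^n$, $f(A) := \{f(p) : p \in A \cap D\}$, and for $B \subset \mathbb{R}^m$, $f^{-1}(B) := \{p : f(p) \in B\}$.
Example 2.31 Some functions and inverse images of part of their range:
(1) $f(x) = x^2$, $-1 \le x \le 1$. Then
\[ f([0, \tfrac{1}{2}]) = [0, \tfrac{1}{4}], \qquad f([\tfrac{1}{2}, 2]) = [\tfrac{1}{4}, 1], \qquad f^{-1}([\tfrac{1}{4}, 3]) = [-1, -\tfrac{1}{2}] \cup [\tfrac{1}{2}, 1]. \]
(2) $f(x) = \sin(x)$, $-\infty < x < \infty$. Then
\[ f^{-1}([0, 1)) = \bigcup_{k=-\infty}^{\infty} \Big( [2k\pi, (2k+1)\pi] \setminus \{(4k+1)\pi/2\} \Big). \]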


Figure 2.2. Figure for Example 2.31 (3).

Figure 2.3. Figure for Example 2.31 (4).


(3) $f(x, y) = \frac{x^2}{4} + y^2$, $(x, y) \in \mathbb{R}^2$. Then $f(\mathbb{R}^2) = \{x \in \mathbb{R} : x \ge 0\}$. For any fixed $c > 0$, the set $S := \{(x, y) : f(x, y) = c\}$ is an ellipse in $\mathbb{R}^2$, and $f^{-1}(\{c\}) = S$.
(4) $f(x, y) = xy$ for $(x, y) \in \mathbb{R}^2$. Then $f(\mathbb{R}^2) = \mathbb{R}$. The inverse images of $1$ and $-1$ are the hyperbolas $S_1 := \{(x, y) : xy = 1\} = f^{-1}(1)$ and $S_2 := \{(x, y) : xy = -1\} = f^{-1}(-1)$, while
\[ \{x\text{-axis}\} \cup \{y\text{-axis}\} = f^{-1}(0). \]

Theorem 2.32. If $f : \mathbb{R}^n \to \mathbb{R}^m$, then $f$ is continuous on its domain $D$ if and only if for each open set $V \subset \mathbb{R}^m$, there is an open set $U \subset \mathbb{R}^n$ such that $f^{-1}(V) = U \cap D$.
Proof. ($\Rightarrow$) Let $f$ be continuous on $D$ and $V$ be an open set in $\mathbb{R}^m$. If $p_0 \in f^{-1}(V) \subset D$, then $f$ is continuous at $p_0$ and $f(p_0) \in V$. Since $V$ is open and $f$ is continuous at $p_0$, there is a $\delta(p_0) > 0$ such that
\[ p \in B(p_0, \delta(p_0)) \cap D \implies f(p) \in V, \]


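which implies
\[ B(p_0, \delta(p_0)) \cap D \subset f^{-1}(V), \quad \forall\, p_0 \in f^{-1}(V). \tag{2.1} \]
Hence, if we set
\[ U := \bigcup_{p_0 \in f^{-1}(V)} B(p_0, \delta(p_0)), \]
then $U$ is open and (2.1) implies
\[ U \cap D \subset f^{-1}(V). \tag{2.2} \]
However, from the definition of $U$, $p_0 \in f^{-1}(V)$ implies $p_0 \in U \cap D$, i.e.
\[ f^{-1}(V) \subset U \cap D. \tag{2.3} \]
So, $U$ is an open subset of $\mathbb{R}^n$ and, from (2.2) and (2.3),
\[ U \cap D = f^{-1}(V). \]
($\Leftarrow$) Suppose that for each open $V \subset \mathbb{R}^m$, there exists an open $U \subset \mathbb{R}^n$ such that
\[ U \cap D = f^{-1}(V). \]
Let $p_0 \in D$, and $\varepsilon > 0$ be given. Consider $V = B(f(p_0), \varepsilon)$. Then there exists an open $U \subset \mathbb{R}^n$ with $U \cap D = f^{-1}(V)$. In particular, $p_0 \in U$, so $U$ is a neighborhood of $p_0$, and
\[ p \in U \cap D \implies f(p) \in V. \]
Thus, $f$ is continuous at $p_0$.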
Theorem 2.33 (Preservation of Connectedness). Suppose $f : \mathbb{R}^n \to \mathbb{R}^m$. If $f$ is continuous on $D$ and $D$ is connected, then $f(D)$ is connected.
Proof. Suppose $f(D)$ is not connected. Then there exist open sets $V_1$ and $V_2$ in $\mathbb{R}^m$ such that
(i) $V_1 \cap f(D) \ne \emptyset$ and $V_2 \cap f(D) \ne \emptyset$;
(ii) $(V_1 \cap f(D)) \cap (V_2 \cap f(D)) = \emptyset$; and
(iii) $(V_1 \cap f(D)) \cup (V_2 \cap f(D)) = f(D)$.
By the Global Continuity Theorem (Theorem 2.32), there exist open sets $U_1$ and $U_2$ in $\mathbb{R}^n$ such that
\[ U_1 \cap D = f^{-1}(V_1), \qquad U_2 \cap D = f^{-1}(V_2). \]
Then
(i) $U_1 \cap D \ne \emptyset$, $U_2 \cap D \ne \emptyset$, from (i),
(ii) $(U_1 \cap D) \cap (U_2 \cap D) = \emptyset$, from (ii),


(iii) $(U_1 \cap D) \cup (U_2 \cap D) = D$, from (iii).
Thus, $D$ is disconnected, a contradiction to our hypothesis that $D$ is connected. Therefore, our assumption that $f(D)$ is disconnected is false.
Corollary 2.34. Suppose $f : \mathbb{R}^n \to \mathbb{R}$. Let $D$ be a connected subset of $\mathbb{R}^n$, and $f$ be a continuous real-valued function on $D$. If $p, q \in D$ and $f(p) < k < f(q)$, then there is a $p_0 \in D$ such that $f(p_0) = k$.
Proof. If there is no such $p_0$, then the sets $V_1 = (-\infty, k)$ and $V_2 = (k, +\infty)$ provide a disconnection of $f(D)$, a contradiction to Theorem 2.33.
Example 2.35 At any time there are two antipodal points on the equator at the same temperature. Indeed, let $T(x)$ be the temperature at the point of the equator with angular position $x$ (in radians). Then $T(x + 2\pi) = T(x)$. We suppose $T$ is continuous and consider $f(x) = T(x + \pi) - T(x)$. Then $f(0) = T(\pi) - T(0)$, while $f(\pi) = T(2\pi) - T(\pi) = T(0) - T(\pi) = -f(0)$. If $f(0) = 0$, then $T(\pi) = T(0)$; otherwise, $f(0)$ and $f(\pi)$ are of opposite sign and $k = 0$ is a value between them. In the latter case, by Corollary 2.34, there is a point $x_0$ with $f(x_0) = 0 = T(x_0 + \pi) - T(x_0)$; i.e. $T(x_0 + \pi) = T(x_0)$.
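The argument of Example 2.35 is constructive enough to try numerically. The sketch below assumes a made-up temperature profile `temperature` (any continuous $2\pi$-periodic function would do) and locates an antipodal pair by bisection on $f(x) = T(x + \pi) - T(x)$; bisection is our choice of root finder here, not something the notes prescribe.

```python
import math

def temperature(x):
    """Hypothetical continuous, 2*pi-periodic temperature along the equator."""
    return 15.0 + 8.0 * math.cos(x - 0.7) + 3.0 * math.sin(2.0 * x)

def f(x):
    """f(x) = T(x + pi) - T(x); a zero of f is an antipodal pair."""
    return temperature(x + math.pi) - temperature(x)

def bisect_zero(g, lo, hi, tol=1e-12):
    """Find a zero of a continuous g on [lo, hi], given g(lo), g(hi) of opposite sign."""
    g_lo = g(lo)
    if g_lo == 0.0:
        return lo
    if g(hi) == 0.0:
        return hi
    if g_lo * g(hi) > 0:
        raise ValueError("g must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_mid == 0.0:
            return mid
        if g_lo * g_mid < 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)

# f(pi) = -f(0), so f changes sign on [0, pi] unless f(0) = 0 already.
x0 = bisect_zero(f, 0.0, math.pi)
```

For this particular profile $f(x) = -16\cos(x - 0.7)$, so the point found is $x_0 = 0.7 + \pi/2$, and `temperature(x0)` agrees with `temperature(x0 + math.pi)` up to rounding.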
Theorem 2.36 (Preservation of Compactness). Let $f : \mathbb{R}^n \to \mathbb{R}^m$. If $D$ is compact and $f$ is continuous on $D$, then $f(D)$ is compact.
Proof. Let $\mathcal{G}$ be an open covering of $f(D)$. For each $V \in \mathcal{G}$ there is an open set $U \subset \mathbb{R}^n$ such that
\[ U \cap D = f^{-1}(V) \quad \text{by Theorem 2.32.} \tag{2.4} \]
Since $\{V : V \in \mathcal{G}\}$ covers $f(D)$, the collection $\{U : U \cap D = f^{-1}(V),\ V \in \mathcal{G}\}$ covers $D$. Since $D$ is a compact set, there must be a finite subcover $\{U_1, \ldots, U_k\}$. From (2.4), the $V$'s corresponding to the $U_i$ satisfy $f(U_i \cap D) \subset V_i$, so $\{V_1, \ldots, V_k\}$ covers $f(D)$. Thus, the cover $\mathcal{G}$ contains a finite subcover of $f(D)$. Since we began with an arbitrary open cover of $f(D)$, $f(D)$ must be compact.
Corollary 2.37. A continuous function on a compact set is bounded.
Proof. $f(D)$ is compact and therefore closed and bounded by the Heine-Borel Theorem.
Corollary 2.38. A continuous real-valued function $f : D \subset \mathbb{R}^n \to \mathbb{R}$ on a compact set $D$ achieves its supremum and infimum, that is, it has a maximum and a minimum value.
Proof. Let $f : D \subset \mathbb{R}^n \to \mathbb{R}$ be continuous. Then $f(D)$ is a compact subset of $\mathbb{R}$ (Theorem 2.36). Let $M := \sup\{f(p) : p \in D\} = \sup f(D)$. We wish to show that


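$M \in f(D)$, that is, that $M = f(p_0)$ for some $p_0 \in D$. Suppose $M \notin f(D)$, which implies $M > f(p)$ for all $p \in D$. Consider the function
\[ g(p) = \frac{1}{M - f(p)} > 0, \quad \text{for } p \in D. \]
Then $g$ is continuous on $D$. (Why?) Since $D$ is compact, $g$ is bounded by Corollary 2.37. Hence, there exists a finite number $A$ such that
\begin{align*}
& 0 < g(p) \le A, \quad \forall p \in D \\
\implies\ & 0 < \frac{1}{M - f(p)} \le A, \quad \forall p \in D \\
\implies\ & \frac{1}{A} \le M - f(p), \quad \forall p \in D \\
\implies\ & f(p) \le M - \frac{1}{A} < M, \quad \forall p \in D.
\end{align*}
The last inequality contradicts the fact that $M$ is the least upper bound of $f(D)$. Hence, we must have $M \in f(D)$. The argument for the infimum is similar.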

Exercises
2.32. Prove that the two definitions of continuity are equivalent.
2.33. Let $f(x) = \sqrt{x}$, $x \ge 0$. Show that $f$ is continuous on $[0, \infty)$.
2.34. Show that every real polynomial of odd degree has at least one real root.
2.35. Show that $x^4 + 7x^3 - 9$ has at least two real roots.
2.36. Suppose $f$ and $g$ are continuous real valued functions on $[0, 1]$ such that $f(0) < g(0)$ and $f(1) > g(1)$. Prove that there exists $x \in (0, 1)$ such that $f(x) = g(x)$. Hint: Draw a picture.
2.37. Give an alternative proof of Theorem 2.36 by showing directly that $f(D)$ is closed and bounded if $f$ is continuous on $D$ and $D$ is closed and bounded. Hint: If $f(D)$ is not bounded, then there exist $p_k \in D$ such that $|f(p_k)| > k$, $k = 1, 2, \ldots$. If $f(D)$ is not closed, there is at least one cluster point $q$ of $f(D)$ with $q \notin f(D)$. The latter means there are points $p_k \in D$ so that $\lim_{k\to\infty} f(p_k) = \lim_{k\to\infty} q_k = q \notin f(D)$. Show that each case leads to a contradiction of the hypothesis.
2.38. Show that $f(x) = \frac{1}{x}$ is not uniformly continuous on $(0, 1]$.

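2.4 Uniform Continuity

Definition 2.39. A function $f : D \subset \mathbb{R}^n \to \mathbb{R}^m$ is uniformly continuous on $D$ if, for each $\varepsilon > 0$, there is a $\delta > 0$ so that
\[ (p, q \in D \ \text{ and } \ |p - q| < \delta) \implies |f(p) - f(q)| < \varepsilon. \]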


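Here the $\delta$ depends only on $\varepsilon$ (and possibly $D$), but not on the choice of points.
Example 2.40 Some examples on uniform continuity:
(1) $f(x) = x$, $-\infty < x < +\infty$ ($D = \mathbb{R}$), is uniformly continuous on $\mathbb{R}$. Obviously, if $x, y \in \mathbb{R}$ and $|x - y| < \varepsilon$, then $|f(x) - f(y)| < \varepsilon$.
(2) $f(x) = x^2$, $0 \le x \le 1$ ($D = [0, 1]$), is uniformly continuous on $D$. Indeed,
\[ |f(x) - f(y)| = |x^2 - y^2| = |x - y||x + y| \le 2|x - y| \quad \text{for } x, y \in [0, 1]. \]
Thus, if $x, y \in D$ and $|x - y| < \frac{\varepsilon}{2}$, then $|f(x) - f(y)| < \varepsilon$, so $f$ is uniformly continuous on $D = [0, 1]$.
(3) $f(x) = x^2$, $0 \le x < +\infty$ ($D = [0, +\infty)$), is not uniformly continuous on $D$. We will show the condition is not satisfied for $\varepsilon = 1$ by any choice of $\delta$. Let $\delta > 0$ be given and choose $x = \frac{1}{\delta}$, $y = \frac{1}{\delta} + \frac{\delta}{2}$. Then
\[ |x - y| = \frac{\delta}{2} < \delta \quad \text{and} \quad |f(x) - f(y)| = |x - y||x + y| = \frac{\delta}{2}\left(\frac{2}{\delta} + \frac{\delta}{2}\right) > 1. \]
Thus, $f$ is not uniformly continuous on $[0, \infty)$.
(4) $f(x) = \sqrt{x}$, $1 \le x < +\infty$ ($D = [1, +\infty)$). Then $f$ is uniformly continuous on $D$. Indeed,
\[ |f(x) - f(y)| = |\sqrt{x} - \sqrt{y}| = \frac{|x - y|}{\sqrt{x} + \sqrt{y}} \le \frac{1}{2}|x - y|, \quad x, y \in [1, +\infty). \]
Thus, if $|x - y| < 2\varepsilon$ and $x, y \in [1, +\infty)$, we have $|f(x) - f(y)| < \varepsilon$.
(5) $f(x) = \sqrt{x}$, $D = [0, +\infty)$, is uniformly continuous on $D$. We again use
\[ |f(x) - f(y)| = \frac{|x - y|}{\sqrt{x} + \sqrt{y}}. \]
We may assume that $x > y$ and suppose $|x - y| < \delta$. If $x \ge \delta$, then $|f(x) - f(y)| < \frac{\delta}{\sqrt{\delta}} = \sqrt{\delta}$. If $y < x < \delta$, then $|f(x) - f(y)| = \sqrt{x} - \sqrt{y} < \sqrt{\delta} - 0 = \sqrt{\delta}$. In either case, we see that
\[ |x - y| < \delta = \varepsilon^2 \ \text{ and } \ x, y \in [0, +\infty) \implies |f(x) - f(y)| < \varepsilon. \]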
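The witness points used in Example 2.40 (3) can be verified exactly for any particular $\delta$. The sketch below uses rational arithmetic to avoid rounding; the function name `witness` is a hypothetical helper, and checking a few values of $\delta$ only illustrates the general computation above.

```python
from fractions import Fraction

def witness(delta):
    """Return (gap, jump) for x = 1/delta, y = 1/delta + delta/2 and f(x) = x^2.

    gap  = |x - y|        (should be below delta)
    jump = |f(x) - f(y)|  (should exceed epsilon = 1)
    """
    x = 1 / delta
    y = 1 / delta + delta / 2
    return abs(x - y), abs(x * x - y * y)

# However small delta is chosen, the two points are delta-close
# while their images stay more than 1 apart.
results = {d: witness(d) for d in (Fraction(1), Fraction(1, 10), Fraction(1, 1000))}
```

In each case the gap is $\delta/2$ and the jump is $1 + \delta^2/4$, so no single $\delta$ works for $\varepsilon = 1$ on all of $[0, +\infty)$.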
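Theorem 2.41. Let $f : D \subset \mathbb{R}^n \to \mathbb{R}^m$. If $f$ is continuous on $D$ and $D$ is compact, then $f$ is uniformly continuous on $D$.
Proof. Let $\varepsilon > 0$ be given. By the continuity of $f$ on $D$, for each $p_0 \in D$, there exists a $\delta(\varepsilon, p_0) > 0$ such that
\[ p \in B(p_0, \delta(\varepsilon, p_0)) \cap D \implies |f(p) - f(p_0)| < \frac{\varepsilon}{2}. \]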


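Now, $\mathcal{G} := \{B(p_0, \delta(\varepsilon, p_0)/2) : p_0 \in D\}$ forms an open cover of $D$. Since $D$ is compact, there are $p_1, \ldots, p_k$ in $D$ such that the corresponding balls form a finite subcover of $D$:
\[ D \subset \bigcup_{j=1}^{k} B(p_j, \delta(\varepsilon, p_j)/2). \tag{2.5} \]
Let $\delta(\varepsilon) = \min\{\delta(\varepsilon, p_j)/2 : j = 1, \ldots, k\}$. If $p, q \in D$ and $|p - q| < \delta(\varepsilon)$, then $p \in B(p_j, \delta(\varepsilon, p_j)/2)$ for some $j$ by (2.5). Thus,
\[ |p - p_j| < \frac{1}{2}\delta(\varepsilon, p_j). \tag{2.6} \]
But $|p - q| < \delta(\varepsilon) \le \frac{1}{2}\delta(\varepsilon, p_j)$, and hence from (2.6)
\[ |q - p_j| \le |q - p| + |p - p_j| < \delta(\varepsilon, p_j). \tag{2.7} \]
Therefore,
\[ (2.6) \implies |f(p) - f(p_j)| < \frac{\varepsilon}{2} \quad \text{and} \quad (2.7) \implies |f(q) - f(p_j)| < \frac{\varepsilon}{2}. \]
Therefore, from the triangle inequality, if $p, q \in D$ and $|p - q| < \delta(\varepsilon)$, then $|f(p) - f(q)| < \varepsilon$. So, $f$ is uniformly continuous on $D$.
Alternative Proof: Suppose $f$ is not uniformly continuous on $D$. Negating the definition of uniform continuity, there exists an $\varepsilon_0 > 0$ such that for every $\delta > 0$, there exist $p, q \in D$ with $|p - q| < \delta$ but $|f(p) - f(q)| \ge \varepsilon_0$. In particular, there exist $p_k, q_k \in D$ such that $|p_k - q_k| < \frac{1}{k}$ and $|f(p_k) - f(q_k)| \ge \varepsilon_0 > 0$.
Now $\{p_k\} \subset D$ and $D$ is compact, hence closed and bounded. Therefore, there is a subsequence $\{p_{k_j}\}$ converging to some point $p_0$ in $D$; $\lim_{j\to\infty} p_{k_j} = p_0$ (Corollary 2.10). Now
\[ |q_{k_j} - p_0| \le |q_{k_j} - p_{k_j}| + |p_{k_j} - p_0| \le \frac{1}{k_j} + |p_{k_j} - p_0|, \]
and the right hand side tends to zero as $j \to \infty$. Hence, $\lim_{j\to\infty} q_{k_j} = p_0$ also. But the continuity of $f$ at the point $p_0 \in D$ implies that
\[ \lim_{j\to\infty} \big(f(p_{k_j}) - f(q_{k_j})\big) = f(p_0) - f(p_0) = 0, \]
contradicting $|f(p_k) - f(q_k)| \ge \varepsilon_0 > 0$ for all $k$.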

Exercises
2.38. If $f(x) = \frac{1}{x}$, $x \ne 0$, then $f$ is continuous on its domain.
2.39. Show that a polynomial
\[ f(x) = a_n x^n + a_{n-1} x^{n-1} + \ldots + a_1 x + a_0, \quad (a_i \text{ constants}) \]
is continuous on $\mathbb{R}$. Hint: Show that $f(x) = \text{constant}$ and $g(x) = x$ are continuous on $\mathbb{R}$, and deduce the general result from Corollary 2.26.


2.40. Let $f : \mathbb{R} \to \mathbb{R}$ be defined by
\[ f(x) = \begin{cases} 1 - x, & x \in \mathbb{Q}; \\ x, & x \notin \mathbb{Q}. \end{cases} \]
Show that $f$ is continuous at $x = \frac{1}{2}$ and discontinuous elsewhere.
2.41. Let $f : \mathbb{R} \to \mathbb{R}$ be continuous on $\mathbb{R}$. Show that if $f(x) = 0$ for all $x \in \mathbb{Q}$, then $f(x) = 0$ for all $x \in \mathbb{R}$.
2.42. Use the inequalities $|\sin(x)| \le |x|$ and $|\cos(x)| \le 1$ and the formula
\[ \sin(x) - \sin(u) = 2 \sin\left(\frac{x - u}{2}\right) \cos\left(\frac{x + u}{2}\right) \]
to show that the sine function is uniformly continuous on $\mathbb{R}$.
2.43. Using the results of exercises 38 and 42 show that the function defined by
\[ g(x) = \begin{cases} x \sin(\frac{1}{x}), & x \ne 0; \\ 0, & x = 0; \end{cases} \]
is continuous on $\mathbb{R}$.
2.44. Is it possible for $f$ and $g$ to be discontinuous and yet $g \circ f$ be continuous?
2.45. Let $f$ be continuous on $D$.
(a) Is it true that $D$ open implies $f(D)$ open? (This one is easy.)
(b) Is it true that $D$ closed implies $f(D)$ closed? Hint: Consider $f(x) = \frac{1}{1+x^2}$ on $\mathbb{R}$.
2.46. If $f(x) = \frac{1}{1+x^2}$, then $f$ is uniformly continuous on $\mathbb{R}$.
2.47. Let $f$ be a real-valued function with domain $D$, $D$ being an open subset of $\mathbb{R}^n$. Prove that $f$ is continuous if and only if the sets
\[ \{p : f(p) > \alpha\}, \qquad \{p : f(p) < \alpha\} \]
are open for each $\alpha \in \mathbb{R}$.


2.48. If f (x, y) = xy + x4 y 4 , then the equation f (x, y) = 0 has at least four
solutions on any circle x2 + y 2 = R2 > 0.
2.49. Prove the following facts about uniformly continuous functions:
(a) Let $f$ be uniformly continuous on $(0, 1]$ and $\{x_k\}$ be a sequence of real numbers such that $0 < x_k \le 1$ and $\lim_{k\to\infty} x_k = 0$. Prove that $\lim_{k\to\infty} f(x_k)$ exists. Hint: What was Cauchy's name?
(b) If $f$ is uniformly continuous on $(0, 1]$, then $f$ may be defined at $x = 0$ so that $f$ is continuous on $[0, 1]$.
(c) If $f : \mathbb{R}^n \to \mathbb{R}^m$, and $f$ is uniformly continuous on the domain $D \subset \mathbb{R}^n$, then $f$ may also be defined on $\overline{D} \setminus D$ so that $f$ is continuous on $\overline{D}$. ($\overline{D}$ denotes the closure of $D$.)
See Exercises 1.39 and 1.41.


2.50. If $f : \mathbb{R}^n \to \mathbb{R}^m$ with domain $D \subset \mathbb{R}^n$, the set
\[ G := \{(p, f(p)) : p \in D\} \subset \mathbb{R}^{n+m} \]
is called the graph of $f$. Suppose $D$ is connected. Is it true that $f$ is continuous on $D$ if and only if $G$ is connected? What if "connected" is replaced by "closed"? "Compact"? Prove any statements you believe to be true and find a counterexample for any false statements.
2.51. If $f$ is real valued and $f$ is continuous at $p_0$ with $f(p_0) > 0$, then $f(p) > 0$ for each $p$ in a neighborhood of $p_0$. Hint: $\varepsilon = \frac{1}{2} f(p_0)$.
2.52. Prove that $f$ is continuous at a point $p_0$ in its domain $D$ if and only if, for each sequence $\{p_k\}$ of points in $D$ such that $\lim_{k\to\infty} p_k = p_0$, we have $\{f(p_k)\}$ convergent. (We say nothing about $\lim_{k\to\infty} f(p_k)$.)

2.5 Limits

The notion of a limit, which was introduced for sequences, can, as you recall from first year Calculus, be extended to any function.
Definition 2.42. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ with domain $D \subset \mathbb{R}^n$. Let $p_0$ be a cluster point of $D$ (it is not assumed that $p_0 \in D$). A point $L \in \mathbb{R}^m$ is the limit at $p_0$ of $f$ if, for each $\varepsilon > 0$, there exists a $\delta > 0$ such that
\[ (p \in D \ \text{ and } \ 0 < |p - p_0| < \delta) \implies |f(p) - L| < \varepsilon. \]
Write
\[ \lim_{p \to p_0} f(p) = L. \]
If no such $L$ exists, we say the limit at $p_0$ of $f$ does not exist.


Example 2.43 (1) $f(x, y) = \frac{xy}{\sqrt{x^2 + y^2}}$ is defined for $(x, y) \ne O$, i.e. $D = \mathbb{R}^2 \setminus \{O\}$. We claim $\lim_{(x,y)\to(0,0)} f(x, y) = 0$. Indeed, using $(|a| - |b|)^2 \ge 0 \implies a^2 + b^2 \ge 2|ab|$, we see that
\[ \left|\frac{xy}{\sqrt{x^2+y^2}}\right| \le \frac{1}{2}\,\frac{x^2 + y^2}{\sqrt{x^2+y^2}} = \frac{1}{2}\sqrt{x^2 + y^2} = \frac{1}{2}|(x, y) - (0, 0)|. \]
If $\varepsilon > 0$ is given, then $0 < |(x, y)| < 2\varepsilon$ implies $|f(x, y)| < \varepsilon$.
(2) The function defined on $\mathbb{R}^2$ by
\[ f(x, y) = \begin{cases} 0, & \text{if } y \ne x^2; \\ 1, & \text{if } y = x^2; \end{cases} \]
does not have a limit at $O = (0, 0)$ since each neighborhood of $(0, 0)$ contains points $(x, y) \ne (0, 0)$ at which $f(x, y) = 0$ and at which $f(x, y) = 1$.


Just as in the case of sequences there is a Cauchy criterion for the existence of $\lim_{p\to p_0} f(p)$.
Theorem 2.44 (Cauchy Criterion). The limit $\lim_{p\to p_0} f(p)$ exists if and only if for each $\varepsilon > 0$, there is a $\delta > 0$ such that
\[ p, q \in D \cap \big(B(p_0, \delta) \setminus \{p_0\}\big) \implies |f(p) - f(q)| < \varepsilon. \]
Proof. ($\Rightarrow$) Suppose $\lim_{p\to p_0} f(p) = L$; thus, if $\varepsilon > 0$ is given, there exists a $\delta > 0$ such that
\[ p \in D, \ 0 < |p - p_0| < \delta \implies |f(p) - L| < \varepsilon/2. \]
Then, if $p, q \in D \cap \big(B(p_0, \delta) \setminus \{p_0\}\big)$, we have
\[ |f(p) - f(q)| \le |f(p) - L| + |f(q) - L| < \varepsilon/2 + \varepsilon/2 = \varepsilon, \]
so Cauchy's Criterion is satisfied.
($\Leftarrow$) Let Cauchy's Criterion be satisfied, that is, for a given $\varepsilon > 0$, there is a $\delta(\varepsilon) > 0$ such that
\[ p, q \in D \cap \big(B(p_0, \delta(\varepsilon)) \setminus \{p_0\}\big) \implies |f(p) - f(q)| < \varepsilon. \]
Let $\{p_k\}$ be any sequence in $D$ for which $\lim_{k\to\infty} p_k = p_0$, with all $p_k \ne p_0$, $k = 1, 2, 3, \ldots$. Hence, for the aforementioned $\varepsilon > 0$, there is an $N$ such that $k \ge N \implies 0 < |p_k - p_0| < \delta(\varepsilon)$. Therefore, if $k, m \ge N$, then $p_k, p_m \in D \cap \big(B(p_0, \delta(\varepsilon)) \setminus \{p_0\}\big)$, and by the Cauchy Criterion, this implies $|f(p_k) - f(p_m)| < \varepsilon$.
Thus, $\{f(p_k)\}$ is a Cauchy sequence, and therefore has a limit; say $\lim_{k\to\infty} f(p_k) = L$. We need to show that $\lim_{p\to p_0} f(p) = L$. From the facts that $\lim_{k\to\infty} p_k = p_0$ and $\lim_{k\to\infty} f(p_k) = L$, we can choose an $N_1$ so that
\[ 0 < |p_{N_1} - p_0| < \delta(\varepsilon/2) \quad \text{and} \quad |f(p_{N_1}) - L| < \varepsilon/2. \]
Therefore, if $p \in D$ and $0 < |p - p_0| < \delta(\varepsilon/2)$, we have
\[ |f(p) - L| \le |f(p) - f(p_{N_1})| + |f(p_{N_1}) - L| < \varepsilon/2 + \varepsilon/2 = \varepsilon \]
(by the Cauchy Criterion and the choice of $N_1$). Thus, $\lim_{p\to p_0} f(p) = L$.
In fact, general limits are reduced to consideration of limits of sequences by the following result.
Theorem 2.45. The limit $\lim_{p\to p_0} f(p)$ exists and equals $L$ if and only if $\lim_{k\to\infty} f(p_k)$ exists and equals $L$ for each sequence $\{p_k\}$ in $D \setminus \{p_0\}$ such that $\lim_{k\to\infty} p_k = p_0$.
Proof. Exercise 55.


A point $p_0 \in D$ is called isolated if it is not a cluster point of $D$. Check back to the definition of continuity and observe that a function is always continuous at an isolated point of its domain.
Corollary 2.46. A function $f$ is continuous at a cluster point $p_0$ of its domain if and only if $\lim_{p\to p_0} f(p) = f(p_0)$.
Proof. Follows from Theorems 2.25 and 2.45.
The following corollaries follow immediately from the corresponding statements for sequences.
Corollary 2.47. Let $f : D \subset \mathbb{R}^n \to \mathbb{R}^m$, $f = (f_1, \ldots, f_m)$, $L = (L_1, \ldots, L_m)$, and $p_0 \in D$. Then $\lim_{p\to p_0} f(p) = L$ if and only if $\lim_{p\to p_0} f_i(p) = L_i$, for all $i = 1, 2, \ldots, m$.
Corollary 2.48. If $f, g : \mathbb{R}^n \to \mathbb{R}$, $p_0 \in (D_f \cap D_g)$, and $\lim_{p\to p_0} f(p) = L$, $\lim_{p\to p_0} g(p) = M$, then
(i) $\lim_{p\to p_0} (f(p) + g(p)) = L + M$.
(ii) $\lim_{p\to p_0} (f(p)g(p)) = LM$.
(iii) $\lim_{p\to p_0} \frac{f(p)}{g(p)} = \frac{L}{M}$, if $M \ne 0$.
If $p_0$ is a cluster point of $D_0 \subset D$ we may talk about the limit with respect to $D_0$ at $p_0$, i.e., $(D_0)\lim_{p\to p_0} f(p) = L$ if, for each $\varepsilon > 0$ there is a $\delta > 0$ such that $p \in D_0$ and $0 < |p - p_0| < \delta$ implies $|f(p) - L| < \varepsilon$.
Special Notation: One sided limits.
\[ D_0 = (x_0, +\infty): \quad (D_0)\lim_{x\to x_0} f(x) = \lim_{x\to x_0^+} f(x), \]
\[ D_0 = (-\infty, x_0): \quad (D_0)\lim_{x\to x_0} f(x) = \lim_{x\to x_0^-} f(x). \]

Example 2.49 Some one-sided limits:
(1) If $f(x) = 1$ for $x > 0$ and $f(x) = x$ for $x \le 0$, then
\[ \lim_{x\to 0^+} f(x) = 1, \qquad \lim_{x\to 0^-} f(x) = 0. \]
(2) Suppose $f : \mathbb{R}^2 \to \mathbb{R}$ is defined by
\[ f(x, y) = \begin{cases} 0, & \text{if } y \ne x^2; \\ 1, & \text{if } y = x^2. \end{cases} \]
Evidently, the limit at $O = (0, 0)$ with respect to points on the parabola $y = x^2$ is 1, while the limit with respect to the complement of the parabola


is 0. Notice that in this case, the limit along any line through the origin is 0 at $O$, i.e.
\[ \lim_{t\to 0} f(tx_0, ty_0) = 0 \quad \text{if } (x_0, y_0) \ne O. \]
However, the limit $\lim_{(x,y)\to(0,0)} f(x, y)$ does not exist.
The moral of this is that when the limit properties of functions of more than one variable are being considered, it is not enough to look at their behaviour on straight lines.
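The contrast in Example 2.49 (2) between straight lines and the parabola can be observed with exact rational arithmetic (floating point would make the test $y = x^2$ unreliable). This is only a sketch with hypothetical helper names `along_line` and `along_parabola`; sampling $t = 1/k$ illustrates, but does not prove, the limit statements.

```python
from fractions import Fraction

def f(x, y):
    """f = 1 on the parabola y = x^2 and 0 elsewhere."""
    return 1 if y == x * x else 0

def along_line(x0, y0, t):
    """Value of f at the point (t*x0, t*y0) of the line through O with direction (x0, y0)."""
    return f(t * x0, t * y0)

def along_parabola(t):
    """Value of f at the point (t, t^2) of the parabola."""
    return f(t, t * t)

small_ts = [Fraction(1, k) for k in range(3, 50)]
directions = [(1, 0), (0, 1), (1, 1), (1, 2)]

# Along each sampled line f vanishes for all small t != 0,
# while along the parabola f is identically 1.
line_values = {d: [along_line(d[0], d[1], t) for t in small_ts] for d in directions}
parabola_values = [along_parabola(t) for t in small_ts]
```

A line with direction $(x_0, y_0)$, $x_0 y_0 \ne 0$, meets the parabola away from $O$ only at $t = y_0/x_0^2$, which is why the sampled values along lines are 0 once $t$ is small.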

2.6 Differentiation of real valued functions of a real variable

We review the most important results on differentiation from first year calculus. Throughout this section $f : \mathbb{R} \to \mathbb{R}$.
Definition 2.50. If $f$ is defined in a neighborhood of $x_0$ and
\[ \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x - x_0} \]
exists, then this limit is called the derivative of $f$ at $x_0$ and is denoted $f'(x_0)$.
Proposition 2.51. If $f'(x_0)$ exists, then $f$ is continuous at $x_0$.
Proof. Since $f'(x_0)$ exists, given $\varepsilon = 1$, there is a $\delta > 0$ such that
\[ 0 < |x - x_0| < \delta \implies \left|\frac{f(x) - f(x_0)}{x - x_0} - f'(x_0)\right| < 1. \]
Hence, $|f(x) - f(x_0)| < [|f'(x_0)| + 1]\,|x - x_0|$. From this it follows that $\lim_{x\to x_0} f(x) = f(x_0)$.
Definition 2.52. The function $f$ has an interior relative maximum (respectively, minimum) at $c$ if there is a neighborhood $U$ of $c$ such that
\[ f(x) \le f(c) \quad \forall x \in U \qquad (\text{respectively, } f(x) \ge f(c) \quad \forall x \in U). \]
Theorem 2.53. If $f$ has an interior relative maximum or minimum at $c$ and $f'(c)$ exists, then $f'(c) = 0$.
Proof. We are given that
\[ f'(c) = \lim_{x\to c} \frac{f(x) - f(c)}{x - c}. \]

Suppose that $f$ has an interior relative maximum at $x = c$. Then there exists a $\delta > 0$ so that
\[ |x - c| < \delta \implies f(x) \le f(c). \]
Therefore,
\[ c < x < c + \delta \implies \frac{f(x) - f(c)}{x - c} \le 0 \implies \lim_{x\to c^+} \frac{f(x) - f(c)}{x - c} \le 0. \]
Similarly,
\[ c - \delta < x < c \implies \frac{f(x) - f(c)}{x - c} \ge 0 \implies \lim_{x\to c^-} \frac{f(x) - f(c)}{x - c} \ge 0. \]
Since both one-sided limits equal $f'(c)$, the two inequalities together imply $f'(c) = 0$. The case of a relative minimum is similar.
Corollary 2.54 (Rolle's Theorem). Suppose that
(i) $f$ is continuous on $[a, b]$
(ii) $f'$ exists on $(a, b)$
(iii) $f(b) = f(a)$.
Then there exists $c \in (a, b)$ such that $f'(c) = 0$.
Proof. First, if $f(x) = f(a) = f(b)$ for all $x \in (a, b)$, then $f'(x) = 0$ for all $x \in (a, b)$.
If there is an $x_1 \in (a, b)$ with $f(x_1) > f(a) = f(b)$, then $f$, being continuous on the compact set $[a, b]$, achieves its maximum value $m = \sup\{f(x) : x \in [a, b]\}$ at some point $c \in (a, b)$. Thus, $f$ has an interior relative maximum at $c$ and $f'(c) = 0$.
If there is an $x_1 \in (a, b)$ with $f(x_1) < f(a) = f(b)$, then $f$ has an interior relative minimum at some $c \in (a, b)$ and $f'(c) = 0$.
Corollary 2.55 (Mean Value Theorem). Suppose
(i) $f$ is continuous on $[a, b]$
(ii) $f'$ exists on $(a, b)$
Then there exists $c \in (a, b)$ such that
\[ f'(c)(b - a) = f(b) - f(a). \]


Proof. Consider the function $\varphi(x) = [f(x) - f(a)](b - a) - [f(b) - f(a)](x - a)$. The function $\varphi$ is continuous on $[a, b]$, $\varphi'$ exists on $(a, b)$ and $\varphi(a) = \varphi(b) = 0$. By Rolle's Theorem, there is a $c \in (a, b)$ such that
\[ 0 = \varphi'(c) = f'(c)(b - a) - [f(b) - f(a)]. \]
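For a concrete function, a point $c$ promised by the Mean Value Theorem can be located numerically as a zero of $\varphi'(x) = f'(x)(b - a) - [f(b) - f(a)]$. The sketch below uses bisection, which additionally assumes that $f'$ is continuous and that $\varphi'$ changes sign between the endpoints; neither assumption is part of the theorem, and the names `mvt_point`, `cube` and `dcube` are illustrative only.

```python
import math

def mvt_point(f, df, a, b, tol=1e-12):
    """Bisection for c in [a, b] with df(c) * (b - a) = f(b) - f(a).

    Assumes df is continuous and that the auxiliary derivative
    phi'(x) = df(x) * (b - a) - (f(b) - f(a)) changes sign on [a, b].
    """
    def dphi(x):
        return df(x) * (b - a) - (f(b) - f(a))

    lo, hi = a, b
    d_lo = dphi(lo)
    if d_lo * dphi(hi) > 0:
        raise ValueError("phi' does not change sign at the endpoints")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        d_mid = dphi(mid)
        if d_lo * d_mid <= 0:
            hi = mid
        else:
            lo, d_lo = mid, d_mid
    return 0.5 * (lo + hi)

def cube(x):
    return x ** 3

def dcube(x):
    return 3.0 * x ** 2

# For f(x) = x^3 on [0, 2] the mean slope is 4, so 3 c^2 = 4 and c = 2 / sqrt(3).
c = mvt_point(cube, dcube, 0.0, 2.0)
```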
Corollary 2.56 (Cauchy Mean Value Theorem). Suppose that
(i) $f, g$ are continuous on $[a, b]$
(ii) $f'$ and $g'$ exist on $(a, b)$
Then there exists $c \in (a, b)$ such that
\[ f'(c)[g(b) - g(a)] = g'(c)[f(b) - f(a)]. \]
Proof. Consider the function
\[ \varphi(x) = [f(x) - f(a)][g(b) - g(a)] - [f(b) - f(a)][g(x) - g(a)]. \]

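Theorem 2.57 (L'Hospital's Rule). If $f$ and $g$ are differentiable on $(a, b)$, $g'(x) \ne 0$ for all $x \in (a, b)$, and either
(i) $\lim_{x\to b^-} f(x) = 0$ and $\lim_{x\to b^-} g(x) = 0$; or
(ii) $\lim_{x\to b^-} f(x) = \infty$ and $\lim_{x\to b^-} g(x) = \infty$,
then
\[ \lim_{x\to b^-} \frac{f'(x)}{g'(x)} = L \implies \lim_{x\to b^-} \frac{f(x)}{g(x)} = L. \]
Note: $\lim_{x\to b^-} f(x) = \infty$ means that, for each real number $N$, there is a $\delta > 0$ such that $x \in (b - \delta, b) \implies f(x) > N$.
Proof. Case (i): Suppose $\lim_{x\to b^-} f(x) = \lim_{x\to b^-} g(x) = 0$ and $\lim_{x\to b^-} \frac{f'(x)}{g'(x)} = L$. If $\varepsilon > 0$ is given, there is a $\delta > 0$ such that
\[ x \in (b - \delta, b) \implies \left|\frac{f'(x)}{g'(x)} - L\right| < \varepsilon. \tag{2.8} \]
Let $f, g$ be defined at $b$ by $f(b) = g(b) = 0$. Then $f$ and $g$ are continuous on $[x, b]$ if $x > a$. From the Cauchy Mean Value Theorem,
\[ \frac{f(x)}{g(x)} = \frac{f(x) - f(b)}{g(x) - g(b)} = \frac{f'(c)}{g'(c)}, \quad b - \delta < x < c < b. \]
Therefore,
\[ \left|\frac{f(x)}{g(x)} - L\right| = \left|\frac{f'(c)}{g'(c)} - L\right| < \varepsilon, \quad \text{for } x \in (b - \delta, b). \]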

Thus, $\lim_{x\to b^-} \frac{f(x)}{g(x)} = L$.

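Case (ii): Suppose $\lim_{x\to b^-} f(x) = \lim_{x\to b^-} g(x) = \infty$ and $\lim_{x\to b^-} \frac{f'(x)}{g'(x)} = L$. As before, (2.8) holds if $x \in (b - \delta, b)$ for some $\delta > 0$. However, we cannot define $f(b)$ and $g(b)$ in this case so that $f$ and $g$ are continuous at $b$. Let $x_0 = b - \delta$ and $x \in (b - \delta, b)$. Then
\[ \frac{f(x) - f(x_0)}{g(x) - g(x_0)} = \frac{f'(c)}{g'(c)}, \quad \text{for some } c \in (b - \delta, b). \]
Thus, by (2.8), for $x_0 = b - \delta$ and $x \in (b - \delta, b)$,
\[ \left|\frac{f(x) - f(x_0)}{g(x) - g(x_0)} - L\right| = \left|\frac{f(x)}{g(x)}\,h(x) - L\right| < \varepsilon \tag{2.9} \]
where
\[ h(x) = \frac{1 - \frac{f(x_0)}{f(x)}}{1 - \frac{g(x_0)}{g(x)}}. \tag{2.10} \]
Notice that $\lim_{x\to b^-} h(x) = 1$, so there is a $\delta_1 < \delta$ such that
\[ x \in (b - \delta_1, b) \implies |h(x) - 1| < \varepsilon \ \text{ and } \ h(x) > \frac{1}{2}. \tag{2.11} \]
Thus, for $x \in (b - \delta_1, b)$,
\begin{align*}
\frac{1}{2}\left|\frac{f(x)}{g(x)} - L\right| &\le \left|\frac{f(x)}{g(x)}\,h(x) - L\,h(x)\right|, \quad \text{from (2.11)} \\
\implies \frac{1}{2}\left|\frac{f(x)}{g(x)} - L\right| &\le \left|h(x)\frac{f(x)}{g(x)} - L\right| + |L||1 - h(x)| \\
&< \varepsilon + |L|\varepsilon, \quad \text{by (2.9) and (2.11).}
\end{align*}
Thus, if $\varepsilon > 0$, there exists $\delta_1$ such that
\[ x \in (b - \delta_1, b) \implies \left|\frac{f(x)}{g(x)} - L\right| < 2(1 + |L|)\varepsilon. \]
Therefore,
\[ \lim_{x\to b^-} \frac{f(x)}{g(x)} = L. \]
Remark: Define $\lim_{x\to\infty} f(x) = L$ if, for each $\varepsilon > 0$, there exists $N$ such that $x \ge N$ implies $|f(x) - L| < \varepsilon$. L'Hospital's Rule is also valid if $\lim_{x\to b^-}$ is replaced by $\lim_{x\to\infty}$. Minor changes are required for the proof, however.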
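A quick numerical illustration of case (i) of L'Hospital's Rule: take the hypothetical pair $f(x) = 1 - \cos x$, $g(x) = x^2$ on $(a, b) = (-1, 0)$, where $g'(x) = 2x \ne 0$ and both functions tend to 0 as $x \to 0^-$. The sketch simply tabulates both quotients near $b = 0$; it illustrates the statement and is of course not a proof.

```python
import math

def f(x):
    return 1.0 - math.cos(x)

def g(x):
    return x * x

def df(x):
    return math.sin(x)

def dg(x):
    return 2.0 * x

def quotients(x):
    """Return (f(x)/g(x), f'(x)/g'(x)) at a point x in (-1, 0)."""
    return f(x) / g(x), df(x) / dg(x)

# Approach b = 0 from the left; both quotients settle near L = 1/2.
table = [(x, *quotients(x)) for x in (-1e-1, -1e-2, -1e-3)]
```

The step sizes stop at $10^{-3}$ on purpose: much closer to 0 the subtraction in $1 - \cos x$ loses accuracy in floating point.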

Exercises
2.54. If $f(x) = \frac{x^2 - a^2}{x - a}$ for $x \ne a$, show that $\lim_{x\to a} f(x) = 2a$.
2.55. Prove Theorem 2.45.
2.56. Show that the limit
\[ \lim_{(x,y)\to(0,0)} \frac{xy^2}{x^2 + y^4} \]
does not exist even though
\[ (D_0)\lim_{(x,y)\to(0,0)} \frac{xy^2}{x^2 + y^4}, \quad D_0 \text{ any straight line through } O, \]
exists and equals 0.


2.57. Recall the rules for differentiation of constants, sums, products, composites (chain rule) which you have learned from calculus. Prove one or two of them.
2.58. Draw the graph of the function $f(x) = |\sin(x)|$, $x \in \mathbb{R}$. Check that the derivative exists and is 0 at each relative maximum but does not exist at each relative minimum.
2.59. Prove that there is no real number $k$ such that the equation $x^3 - 3x + k = 0$ has two distinct solutions in $[0, 1]$.
2.60. If $c_0, \ldots, c_n$ are real numbers such that
\[ c_0 + \frac{c_1}{2} + \ldots + \frac{c_{n-1}}{n} + \frac{c_n}{n+1} = 0, \]
prove that
\[ c_0 + c_1 x + \ldots + c_{n-1} x^{n-1} + c_n x^n = 0 \]
has at least one solution in $(0, 1)$.
2.61. If $f(x) = \sqrt{|x|}$, then $f'(x)$ exists if $x \ne 0$ and does not exist if $x = 0$.
2.62. Prove that $10.243 < \sqrt{105} < 10.250$ (no calculator!). Hint: By the Mean Value Theorem $\sqrt{105} = \sqrt{100} + \frac{5}{2\sqrt{c}}$ where $100 < c < 105$.
2.63. Prove L'Hospital's Rule with $\lim_{x\to b^-}$ replaced by $\lim_{x\to\infty}$. Prove that $\lim_{x\to\infty} x e^{-x} = 0$.
2.64. Prove the following limits
(i) $\lim_{x\to 0} \frac{\arctan(x)}{x} = 1$.
(ii) $\lim_{x\to 0} \frac{e^x - e^{-x} - 2\tan(x)}{\sin^3(x)} = -\frac{1}{3}$.
(iii) $\lim_{x\to 2} \frac{x^4 - 5x^3 + 6x^2 + 4x - 8}{x^4 - 7x^3 + 18x^2 - 20x + 8} = 3$.
(iv) $\lim_{x\to 0} \left(\cot(x) - \frac{1}{x}\right) = 0$.
(v) $\lim_{x\to 0} \big(1 + \tan(x)\big)^{\csc(x)} = e$.
(vi) $\lim_{x\to \frac{\pi}{2}^+} \frac{\log\left(x - \frac{\pi}{2}\right)}{\tan(x)} = 0$.
2.65. If $f'(x)$ exists for all $x$ near $c$, $f$ is continuous at $c$ and $\lim_{x\to c} f'(x) = A$, then $f'(c)$ exists and equals $A$.

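2.66. (Darboux Property of the Derivative) Let $f'(x)$ exist for each $x \in [a, b]$ and $f'(a) = \alpha$, $f'(b) = \beta$. Suppose $\alpha < \gamma < \beta$. Show that there is a $c \in (a, b)$ such that $f'(c) = \gamma$. Hint: Show that $g(x) = f(x) - \gamma x$ must achieve its minimum in $(a, b)$. This exercise shows that derivatives, like continuous functions, have the Intermediate Value Property. However, a derivative need not be continuous on its domain. See Exercise 70 (i) and (ii).
2.67. Let
\[ f(x) = \begin{cases} 0, & \text{if } -1 \le x \le 0; \\ 1, & \text{if } 0 < x \le 1. \end{cases} \]
Is there a function $g$ such that $g'(x) = f(x)$ on $-1 \le x \le 1$?
2.68. The function defined by
\[ g(x) = \begin{cases} x^2, & x \in \mathbb{Q}; \\ 0, & x \notin \mathbb{Q}; \end{cases} \]
is continuous at exactly one point. Is it differentiable there?
2.69. (Taylor's Theorem) Suppose that
(i) $f^{(n-1)}$ exists and is continuous on $[\alpha, \beta]$, and
(ii) $f^{(n)}$ exists on $[\alpha, \beta]$.
(a) Show that for $m > 0$
\[ f(\beta) = f(\alpha) + \frac{(\beta - \alpha)}{1!} f'(\alpha) + \ldots + \frac{(\beta - \alpha)^{n-1}}{(n-1)!} f^{(n-1)}(\alpha) + R_n(f; \alpha, \beta), \]
where
\[ R_n(f; \alpha, \beta) = \frac{(\beta - \alpha)^m (\beta - \gamma)^{n-m}}{m\,(n-1)!} f^{(n)}(\gamma) \]
for some point $\gamma \in (\alpha, \beta)$. This is called Schlömilch's form of the remainder. Special cases are given by taking
\[ m = n: \quad R_n(f; \alpha, \beta) = \frac{(\beta - \alpha)^n}{n!} f^{(n)}(\gamma) \quad \text{(Lagrange form)} \]
\[ m = 1: \quad R_n(f; \alpha, \beta) = \frac{(\beta - \alpha)(\beta - \gamma)^{n-1}}{(n-1)!} f^{(n)}(\gamma) \quad \text{(Cauchy form)} \]
The Lagrange form is the easiest to remember and is adequate for most purposes. Hint: Let the constant $C$ be defined by
\[ f(\beta) - f(\alpha) - \frac{(\beta - \alpha)}{1!} f'(\alpha) - \ldots - \frac{(\beta - \alpha)^{n-1}}{(n-1)!} f^{(n-1)}(\alpha) - (\beta - \alpha)^m C = 0, \]
and apply Rolle's Theorem to
\[ \varphi(x) = f(\beta) - f(x) - \frac{(\beta - x)}{1!} f'(x) - \ldots - \frac{(\beta - x)^{n-1}}{(n-1)!} f^{(n-1)}(x) - (\beta - x)^m C. \]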


(b) (i) If $f(x) = e^x$, show that $\lim_{n\to\infty} R_n(f; 0, x) = 0$ for all $x$. That is, show that
\[ \lim_{n\to\infty} \left(1 + \frac{x}{1!} + \frac{x^2}{2!} + \ldots + \frac{x^{n-1}}{(n-1)!}\right) = e^x, \quad \forall x. \]
(b) (ii) If $f(x) = \sin(x)$, then $\lim_{n\to\infty} R_n(f; 0, x) = 0$ for all $x$.
(b) (iii) If $f(x) = \frac{1}{1-x}$, $x \ne 1$, then $\lim_{n\to\infty} R_n(f; 0, x) = 0$ for $-1 < x < 1$.
(c) (i) How large must I take $n$ to approximate $e$ to four decimal places by the expression
\[ 1 + \frac{1}{1!} + \frac{1}{2!} + \ldots + \frac{1}{(n-1)!}. \]
(c) (ii) Approximate $\sqrt[3]{e}$ to five decimal places.
(c) (iii) Prove that the closest integer to $\frac{n!}{e}$ is divisible by $(n - 1)$. Hint: Use Taylor's formula for $e^{-1}$.
(c) (iv) Compute $\sqrt{97}$ to four decimal places.
(d) Suppose $f''(x)$ exists and is non-negative (non-positive) in a neighborhood of $c$ and that $f'(c) = 0$. Show that $f$ has a relative minimum (maximum) at $c$.
(e) Suppose $f''(x)$ exists in a neighborhood of $c$ and is continuous at $c$. Show that $f$ has a relative minimum (maximum) at $c$ if
\[ f'(c) = 0 \ \text{ and } \ f''(c) > 0 \ (< 0). \]
2.70. Let $f_0(x) = \sin(\frac{1}{x})$, $f_1(x) = x \sin(\frac{1}{x})$, $f_2(x) = x^2 \sin(\frac{1}{x})$, $f_3(x) = x^3 \sin(\frac{1}{x})$, for $x \ne 0$ and $f_i(0) = 0$ for $i = 0, 1, 2, 3$.
(i) $f_i$ are differentiable at any point $x \ne 0$ (Chain Rule).
(ii) $f_0$ is discontinuous at $x = 0$.
(iii) $f_1$ is continuous at $x = 0$, but is not differentiable at $x = 0$.
(iv) $f_2$ is differentiable at $x = 0$ but $f_2'$ is discontinuous at $x = 0$.
(v) $f_3$ is differentiable at $x = 0$ and $f_3'$ is continuous at $x = 0$.
2.71. Let $p_n$ be defined by
\[ p_n(x) = \frac{d^n}{dx^n} (x^2 - 1)^n, \quad n = 1, 2, \ldots. \]
(i) Show that $p_n$ is a polynomial of degree $n$.
(ii) The equation $p_n(x) = 0$ has exactly $n$ roots in $(-1, 1)$.
2.72. Doesn't time fly when you're having fun?


Chapter 3

Riemann Integration

The definition of the Riemann integral is essentially the same in higher dimensions
as in one dimension. You may have learned Riemann integration of functions of one
variable by the lower and upper Riemann sums approach; the treatment adopted
here is slightly different but equivalent to that approach (see Exercise 17).

3.1 Content and the Riemann Integral

Recall that a closed interval in $\mathbb{R}^n$ has the form
\[ I = [a_1, b_1] \times [a_2, b_2] \times \ldots \times [a_n, b_n] := \{(x_1, x_2, \ldots, x_n) : a_i \le x_i \le b_i, \ i = 1, 2, \ldots, n\}. \]
The diameter of $I$ was defined to be
\[ \lambda(I) := \big((b_1 - a_1)^2 + \ldots + (b_n - a_n)^2\big)^{1/2} = \sqrt{\sum_{k=1}^{n} (b_k - a_k)^2}. \]
If $p, q \in I$, then $|p - q| \le \lambda(I)$.
Definition 3.1. The content of $I$, or Jordan measure of $I$, is defined to be
\[ \mu(I) = (b_1 - a_1)(b_2 - a_2) \cdots (b_n - a_n) = \prod_{k=1}^{n} (b_k - a_k). \]
In $\mathbb{R}$, $\mathbb{R}^2$, $\mathbb{R}^3$, $\mu$ is length, area, and volume respectively. We write $\mu_n$ if the dimension of the space needs to be emphasized. A set $S \subset \mathbb{R}^n$ has content zero (Jordan measure zero) if, for each $\varepsilon > 0$, there is a finite collection of intervals $I_i$, $i = 1, \ldots, k$, such that
\[ S \subset \bigcup_{i=1}^{k} I_i \quad \text{and} \quad \sum_{i=1}^{k} \mu(I_i) < \varepsilon. \]
For such an $S$, we write simply $\mu(S) = 0$.



Example 3.2 Some sets with content zero:
(1) A set containing only a single point has content zero in $\mathbb{R}^n$.
(2) A set containing a finite number of points has content zero in $\mathbb{R}^n$.
(3) The set $\{(x, 0) : 0 \le x \le 1\}$ has content zero in $\mathbb{R}^2$. The proof is essentially given in Figure 3.1.
(4) $\{(x, y) : |x| + |y| = 1\}$ has content zero in $\mathbb{R}^2$. The proof is essentially given by Figure 3.1. It also will follow from (5) and (6).

Figure 3.1. (a) The rectangle has height $\varepsilon$ and length 1. (b) The rectangles are squares of side length $1/k$, so $\sum \mu(I_i) = 4/k < \varepsilon$, if $k$ is sufficiently large.

(5) If $f$ is a continuous real valued function on $[0, 1]$, then $\{(x, f(x)) : 0 \le x \le 1\}$, the graph of $f$, has content zero in $\mathbb{R}^2$.
Proof. Let $\varepsilon > 0$. Since $f$ is uniformly continuous on $[0, 1]$ (Theorem 2.41), there exists $\delta > 0$ such that
\[ x, y \in [0, 1] \ \text{ and } \ |x - y| < \delta \implies |f(x) - f(y)| \le \varepsilon/2. \]
Now let $m$ be a natural number so that $m\delta < 1$ and $(m + 1)\delta \ge 1$. Consider the following intervals in $\mathbb{R}^2$:
\[ I_k = [k\delta, (k+1)\delta] \times [f(k\delta) - \varepsilon/2, f(k\delta) + \varepsilon/2], \quad k = 0, \ldots, m - 1, \]
\[ I_m = [m\delta, 1] \times [f(m\delta) - \varepsilon/2, f(m\delta) + \varepsilon/2]. \]
If $x \in [k\delta, (k+1)\delta] \cap [0, 1]$, then $|x - k\delta| < \delta$, so
\[ |f(x) - f(k\delta)| \le \varepsilon/2 \implies f(x) \in [f(k\delta) - \varepsilon/2, f(k\delta) + \varepsilon/2]. \]
Thus, $(x, f(x)) \in I_k$ if $x \in [k\delta, (k+1)\delta] \cap [0, 1]$. Hence,
\[ \{(x, f(x)) : 0 \le x \le 1\} \subset \bigcup_{k=0}^{m} I_k \quad \text{and} \quad \sum_{k=0}^{m} \mu(I_k) = (m\delta + 1 - m\delta)\varepsilon = \varepsilon, \]
so $\mu\{(x, f(x)) : 0 \le x \le 1\} = 0$.


Figure 3.2. A covering of a graph of a continuous function on [0, 1] by rectangles.

(6) The union of a finite collection of sets with zero content has zero content.
(7) A circle has zero content in $\mathbb{R}^2$. Indeed, the circle of radius $r$ can be written as the union of the two half circles which are the graphs of $f(x) = \pm\sqrt{r^2 - x^2}$, and thus has zero content by (5) and (6).
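The covering constructed in the proof of Example 3.2 (5) can be built explicitly for a concrete function. The sketch below takes $f(x) = x^2$ on $[0, 1]$ and uses the fact that $|x^2 - y^2| \le 2|x - y|$ there, so $\delta = \varepsilon/4$ is an admissible (assumed, not optimal) choice; the helper names are hypothetical, and rational arithmetic is used so that the total content can be compared with $\varepsilon$ exactly.

```python
from fractions import Fraction
import math

def cover_graph(f, eps, delta):
    """Rectangles I_0, ..., I_m from the proof, as (left, right, bottom, top)."""
    m = math.ceil(1 / delta) - 1          # m*delta < 1 <= (m + 1)*delta
    rectangles = []
    for k in range(m + 1):
        left = k * delta
        right = min((k + 1) * delta, Fraction(1))   # the last strip stops at 1
        centre = f(left)
        rectangles.append((left, right, centre - eps / 2, centre + eps / 2))
    return rectangles

def total_content(rectangles):
    """Sum of the areas (2-dimensional contents) of the rectangles."""
    return sum((r - l) * (t - b) for l, r, b, t in rectangles)

def is_covered(x, y, rectangles):
    return any(l <= x <= r and b <= y <= t for l, r, b, t in rectangles)

def square(x):
    return x * x

eps = Fraction(1, 10)
delta = eps / 4                            # valid because f is 2-Lipschitz on [0, 1]
rects = cover_graph(square, eps, delta)
```

With these values there are 40 rectangles of total content exactly $\varepsilon = 1/10$, and each sampled point of the graph lies in at least one of them.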

3.1.1 Partition of I

Let $I$ be the rectangle
\[ I = [a_1, b_1] \times [a_2, b_2] \times \dots \times [a_n, b_n]. \]
Let $P_i$ be finite sets of points
\[ P_i := \{t_{i,j} : j = 0, \dots, m_i\}, \quad\text{with } a_i = t_{i,0} < \dots < t_{i,m_i} = b_i, \qquad i = 1, \dots, n. \]
Then $P = P_1 \times P_2 \times \dots \times P_n$ is said to be a partition of $I$. Notice that a partition generates a subdivision of $I$ into a finite collection of closed non-overlapping subintervals of the form
\[ [t_{1,j_1}, t_{1,j_1+1}] \times [t_{2,j_2}, t_{2,j_2+1}] \times \dots \times [t_{n,j_n}, t_{n,j_n+1}]. \]
The total number of subintervals generated by $P$ is $m_1 m_2 \cdots m_n$. A partition $Q$ is a refinement of $P$ if $P \subset Q$. Notice that a refinement of $P$ further subdivides the intervals $I_i$ generated by $P$.
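As a small illustration of the definition, the following sketch lists the subintervals generated by a partition of a rectangle and adds up their contents. The helper names `subintervals` and `content` are hypothetical and chosen only for this example.

```python
from itertools import product

def subintervals(partition):
    """partition: list [P_1, ..., P_n] of increasing point lists, one per coordinate.
    Returns every box [t_{1,j1}, t_{1,j1+1}] x ... x [t_{n,jn}, t_{n,jn+1}]."""
    edges = [list(zip(pts[:-1], pts[1:])) for pts in partition]
    return list(product(*edges))

def content(box):
    """Product of the side lengths of a box given as ((lo_1, hi_1), ..., (lo_n, hi_n))."""
    vol = 1
    for lo, hi in box:
        vol *= hi - lo
    return vol

P = [[0, 1, 3], [0, 2, 4, 5]]        # a partition of [0, 3] x [0, 5]
Q = [[0, 1, 2, 3], [0, 2, 4, 5]]     # a refinement of P: every point of P is kept
boxes = subintervals(P)
```

Here $m_1 = 2$ and $m_2 = 3$, so `boxes` has $m_1 m_2 = 6$ entries, and their contents sum to the content of the whole rectangle.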

3.1.2 Riemann Sums

Let $I$ be a closed interval in $\mathbb{R}^n$ and $f : I \to \mathbb{R}^m$. If $P$ is a partition of $I$ which generates a subdivision of $I$ into subintervals $I_i$, then any sum of the form
\[ S(P, f) = \sum_i f(p_i)\,\mu(I_i), \qquad p_i \in I_i, \]
is a Riemann sum of $f$ corresponding to the partition $P$.

Figure 3.3. A covering of a graph of a continuous function on $[0, 1]$ by rectangles.

Definition 3.3 (Riemann Integral). Let $f : I \to \mathbb{R}^m$ where $I$ is a closed interval in $\mathbb{R}^n$. Let $\lambda \in \mathbb{R}^m$. If, for each $\varepsilon > 0$, there is a partition $P_\varepsilon$ of $I$ such that, whenever $P \supset P_\varepsilon$ and $S(P, f)$ is any Riemann sum corresponding to $P$,
\[ |S(P, f) - \lambda| < \varepsilon, \]
then the function $f$ is said to be Riemann integrable on $I$ and to have Riemann integral $\lambda$. We write
\[ \lambda = \int_I f \quad\text{or}\quad \lambda = \int_I f \, d\mu. \]
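A Riemann sum depends on the choice of the points $p_i$, but for an integrable function all choices give nearly the same value once the partition is fine. The sketch below uses a hypothetical helper `riemann_sum` with randomly chosen tags on the unit square and the sample integrand $f(x, y) = x + y$, whose integral over $[0,1] \times [0,1]$ is 1.

```python
import random

def riemann_sum(f, k, rng):
    """Riemann sum of f over [0, 1] x [0, 1] for the uniform k-by-k partition,
    with one randomly chosen tag p_i in each subinterval I_i."""
    h = 1.0 / k                          # side length, so mu(I_i) = h * h
    total = 0.0
    for i in range(k):
        for j in range(k):
            x = (i + rng.random()) * h   # tag coordinates inside the (i, j) cell
            y = (j + rng.random()) * h
            total += f(x, y) * h * h
    return total

rng = random.Random(0)                   # fixed seed so the run is repeatable
sums = [riemann_sum(lambda x, y: x + y, 50, rng) for _ in range(5)]
```

For this integrand each cell has oscillation $2/k$, so every such sum lies within $1/k$ of 1 whatever tags are drawn.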

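Proposition 3.4. If $f$ is Riemann integrable on $I$, then the integral of $f$ on $I$ is unique.

Proof. Suppose $\lambda_1 = \int_I f$ and $\lambda_2 = \int_I f$. Let $\varepsilon > 0$ be given. Then there are partitions $P_{1,\varepsilon}$, $P_{2,\varepsilon}$ of $I$ such that
\[ P \supset P_{i,\varepsilon} \implies |S(P, f) - \lambda_i| < \varepsilon/2, \qquad i = 1, 2. \]
But the partition $P_{1,\varepsilon} \cup P_{2,\varepsilon} \supset P_{1,\varepsilon}$ and $P_{1,\varepsilon} \cup P_{2,\varepsilon} \supset P_{2,\varepsilon}$, so that
\[ |\lambda_1 - \lambda_2| \le |S(P_{1,\varepsilon} \cup P_{2,\varepsilon}, f) - \lambda_1| + |S(P_{1,\varepsilon} \cup P_{2,\varepsilon}, f) - \lambda_2| < \varepsilon/2 + \varepsilon/2 = \varepsilon. \]
We have thus shown $|\lambda_1 - \lambda_2| < \varepsilon$ for each $\varepsilon > 0$. Hence, $|\lambda_1 - \lambda_2| = 0$, i.e. $\lambda_1 = \lambda_2$.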

Exercises
3.1. If $f$ is integrable on $I$, then $f$ is bounded on $I$.
3.2. $f : I \to \mathbb{R}^m$ is integrable on $I$ if and only if each component function $f_i$, $i = 1, \dots, m$, of $f$ is integrable on $I$. Moreover,
\[ \int_I f = \left( \int_I f_1, \dots, \int_I f_m \right). \]
It therefore suffices to consider only real-valued functions, just as in the case of sequences.

3.2 Cauchy Criteria and Properties of Integrals

Theorem 3.5 (Cauchy Criterion). Let $f : I \to \mathbb{R}^m$. $\int_I f$ exists if and only if for each $\varepsilon > 0$ there exists a partition $P_\varepsilon$ of $I$ such that if $P, Q \supset P_\varepsilon$, then
\[ |S(P, f) - S(Q, f)| < \varepsilon \]
for all Riemann sums $S(P, f)$, $S(Q, f)$ corresponding to $P$, $Q$.

Proof. ($\Longrightarrow$) If $\lambda = \int_I f$ exists, then for each $\varepsilon > 0$ there is a partition $P_\varepsilon$ of $I$ such that if $P \supset P_\varepsilon$, then
\[ |S(P, f) - \lambda| < \varepsilon/2. \tag{3.1} \]
Let $P, Q \supset P_\varepsilon$; then (3.1) implies
\[ |S(P, f) - S(Q, f)| \le |S(P, f) - \lambda| + |\lambda - S(Q, f)| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon, \]
that is, the Cauchy Criterion is satisfied.

($\Longleftarrow$) Suppose the Cauchy Criterion is satisfied. There exist partitions $P_k$ of $I$ such that $P, Q \supset P_k$ implies
\[ |S(P, f) - S(Q, f)| < \frac{1}{k}, \qquad k = 1, 2, \dots. \tag{3.2} \]
Let $Q_k = \bigcup_{j=1}^{k} P_j$, for each $k = 1, 2, \dots$, and consider a fixed Riemann sum $S_0(Q_k, f)$. If $\ell, m \ge k$, then $Q_m \supset P_k$ and $Q_\ell \supset P_k$ so, from (3.2),
\[ |S_0(Q_\ell, f) - S_0(Q_m, f)| < \frac{1}{k}, \qquad \ell, m \ge k. \]
Thus, $\{S_0(Q_k, f)\}$ is a Cauchy sequence in $\mathbb{R}^m$ so
\[ \lim_{k \to \infty} S_0(Q_k, f) = \lambda \quad\text{exists.} \tag{3.3} \]
It remains to show that $\lambda = \int_I f$ (Theorem 2.20). From (3.3) it follows that, if $\varepsilon > 0$, there is a natural number $N$ such that
\[ \frac{1}{N} < \frac{\varepsilon}{2} \quad\text{and}\quad |S_0(Q_N, f) - \lambda| < \frac{\varepsilon}{2}. \tag{3.4} \]
If $P$ is any refinement of $Q_N$, then both $P$ and $Q_N$ are refinements of $P_N$ (from the definition of $Q_N$), so each Riemann sum $S(P, f)$ satisfies
\[ |S(P, f) - \lambda| \le |S(P, f) - S_0(Q_N, f)| + |S_0(Q_N, f) - \lambda| < \frac{1}{N} + \frac{\varepsilon}{2} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon \]
by (3.2) and (3.4).

Lemma 3.6. Let $I$ be an interval in $\mathbb{R}^n$. If $P$ is a partition of $I$ into a finite collection of non-overlapping subintervals $\{I_i\}$, then
\[ \mu(I) = \sum_i \mu(I_i). \]

Proof. This can be proved by a straightforward induction on the number of subintervals into which $I$ is partitioned. Do it.

The next corollary gives an equivalent but more readily applicable version of the Cauchy Criterion.

Corollary 3.7 (Cauchy Criterion). $\int_I f$ exists if and only if for each $\varepsilon > 0$, there exists a partition $P_\varepsilon$ such that if $S_1(P_\varepsilon, f)$ and $S_2(P_\varepsilon, f)$ are any two Riemann sums corresponding to $P_\varepsilon$, then
\[ |S_1(P_\varepsilon, f) - S_2(P_\varepsilon, f)| < \varepsilon. \]

Proof. It is evident from Theorem 3.5 that the condition is necessary. To see that it is sufficient, let $P_\varepsilon$ satisfy the requirement of Corollary 3.7, and let $P$ and $Q$ be refinements of $P_\varepsilon$, generating subintervals $\{A_k\}$ and $\{B_j\}$ respectively. Then
\[ |S(P, f) - S(Q, f)| = \Big| \sum_k f(p_k)\mu(A_k) - \sum_j f(q_j)\mu(B_j) \Big| = \Big| \sum_i \Big[ \sum_{A_k \subset I_i} f(p_k)\mu(A_k) - \sum_{B_j \subset I_i} f(q_j)\mu(B_j) \Big] \Big| \tag{3.5} \]
where $I_i$ are the subintervals generated by $P_\varepsilon$. Now there exist points $x_i'$, $x_i''$ in $I_i$ such that
\[ \Big| \sum_{A_k \subset I_i} f(p_k)\mu(A_k) - \sum_{B_j \subset I_i} f(q_j)\mu(B_j) \Big| \le \big[ f(x_i') - f(x_i'') \big]\, \mu(I_i). \tag{3.6} \]
Indeed, to see that (3.6) is true, let
\[ f(x_i') = \max_{k,j} \{f(p_k), f(q_j)\} \quad\text{and}\quad f(x_i'') = \min_{k,j} \{f(p_k), f(q_j)\}, \]
where the max and min are taken over the $k$ and $j$ for which $A_k \subset I_i$ and $B_j \subset I_i$, two finite sets. Then
\[ f(x_i'') \sum_{A_k \subset I_i} \mu(A_k) - f(x_i') \sum_{B_j \subset I_i} \mu(B_j) \le \sum_{A_k \subset I_i} f(p_k)\mu(A_k) - \sum_{B_j \subset I_i} f(q_j)\mu(B_j) \le f(x_i') \sum_{A_k \subset I_i} \mu(A_k) - f(x_i'') \sum_{B_j \subset I_i} \mu(B_j) \]
and (from the lemma)
\[ \sum_{A_k \subset I_i} \mu(A_k) = \sum_{B_j \subset I_i} \mu(B_j) = \mu(I_i), \]
so (3.6) follows.

From (3.5) and (3.6), we find
\[ |S(P, f) - S(Q, f)| \le \sum_i \big[ f(x_i') - f(x_i'') \big]\, \mu(I_i) = |S_1(P_\varepsilon, f) - S_2(P_\varepsilon, f)| < \varepsilon. \]
Thus, the Cauchy Criterion for integrability of $f$ (as proved in Theorem 3.5) is satisfied, i.e. $\int_I f$ exists.
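Theorem 3.8. If $f$ is a continuous real-valued function on $I$, then $\int_I f$ exists.

Proof. Since $I$ is compact (why?), $f$ is uniformly continuous on $I$ (Theorem 2.41). If $\varepsilon > 0$ is given, there is a $\delta > 0$ such that $|p - q| < \delta$ implies $|f(p) - f(q)| < \varepsilon/\mu(I)$. We may choose a partition $P_\varepsilon$ sufficiently fine to ensure that the subintervals $I_i$ which it generates have diameter
\[ \operatorname{diam}(I_i) < \delta. \]
Hence, if $p, q \in I_i$, then
\[ |p - q| \le \operatorname{diam}(I_i) < \delta \implies |f(p) - f(q)| < \varepsilon/\mu(I), \]
and consequently, when $S_1(P_\varepsilon, f)$ and $S_2(P_\varepsilon, f)$ are any two Riemann sums corresponding to $P_\varepsilon$, we have
\[ |S_1(P_\varepsilon, f) - S_2(P_\varepsilon, f)| = \Big| \sum_i \big[ f(p_i) - f(q_i) \big] \mu(I_i) \Big| \le \sum_i |f(p_i) - f(q_i)|\, \mu(I_i) < \frac{\varepsilon}{\mu(I)} \sum_i \mu(I_i) = \frac{\varepsilon}{\mu(I)}\, \mu(I) = \varepsilon. \]
Thus, $\int_I f$ exists by the Cauchy Criterion (Corollary 3.7).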

Theorem 3.9. Suppose that
(i) $f$ is bounded on $I$,
(ii) the set of points of discontinuity of $f$ has content zero.
Then $\int_I f$ exists.

Proof. Suppose
\[ |f(p)| \le N, \qquad p \in I \tag{3.7} \]
and $K$ is the set of points of discontinuity of $f$ in $I$. Let $\varepsilon > 0$. Since $K$ has content zero, there exists a partition $P_0$ of $I$ such that if $I_i$ are the subintervals generated by $P_0$, then
\[ \sum_{i,\ K \cap I_i \ne \emptyset} \mu(I_i) < \frac{\varepsilon}{4N}. \tag{3.8} \]
Let
\[ L_1 = \bigcup_{i,\ K \cap I_i \ne \emptyset} I_i, \qquad L_2 = \bigcup_{i,\ K \cap I_i = \emptyset} I_i. \]

Figure 3.4. The subintervals meeting $K$ (whose union is $L_1$) and those not meeting $K$ (whose union is $L_2$).

Then $f$ is continuous on $L_2$, which is a compact subset, and so $f$ is uniformly continuous on $L_2$. There exists $\delta > 0$ such that $p, q \in L_2$ and
\[ |p - q| < \delta \implies |f(p) - f(q)| < \frac{\varepsilon}{2\mu(I)}. \]
Let $P_\varepsilon$ be a partition of $I$, generating subintervals $I_i'$, such that
\[ P_0 \subset P_\varepsilon \quad\text{and}\quad \operatorname{diam}(I_i') < \delta. \]
Thus, if $p_i, q_i \in I_i' \subset L_2$, we have $|p_i - q_i| \le \operatorname{diam}(I_i') < \delta$, and so
\[ |f(p_i) - f(q_i)| < \frac{\varepsilon}{2\mu(I)}. \tag{3.9} \]
Thus,
\[ |S_1(P_\varepsilon, f) - S_2(P_\varepsilon, f)| = \Big| \sum_i \big[ f(p_i) - f(q_i) \big] \mu(I_i') \Big| \le \sum_i |f(p_i) - f(q_i)|\, \mu(I_i') \]
\[ = \sum_{I_i' \subset L_1} |f(p_i) - f(q_i)|\, \mu(I_i') + \sum_{I_i' \subset L_2} |f(p_i) - f(q_i)|\, \mu(I_i') \]
\[ \le 2N \sum_{I_i' \subset L_1} \mu(I_i') + \frac{\varepsilon}{2\mu(I)} \sum_{I_i' \subset L_2} \mu(I_i') < 2N\, \frac{\varepsilon}{4N} + \frac{\varepsilon}{2\mu(I)}\, \mu(I) = \varepsilon \]
by (3.7), (3.8), and (3.9). Thus, $f$ satisfies the Cauchy Criterion (Corollary 3.7) and $\int_I f$ exists.
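Example 3.10 Let $f(x, y) = xy^2$, $0 \le x \le 1$, $0 \le y \le 1$ and $I = [0, 1] \times [0, 1]$. Then $f$ is continuous on $I$ so $\int_I f$ exists. Partition $I$ into $k^2$ intervals of side length $\frac{1}{k}$:
\[ P_k := \left\{ I_{i,j} = \left[ \frac{i-1}{k}, \frac{i}{k} \right] \times \left[ \frac{j-1}{k}, \frac{j}{k} \right],\ i = 1, \dots, k,\ j = 1, \dots, k \right\}. \]
If $p_{i,j} \in I_{i,j}$, then $\frac{i-1}{k} \left( \frac{j-1}{k} \right)^2 \le f(p_{i,j}) \le \frac{i}{k} \left( \frac{j}{k} \right)^2$, so that
\[ \sum_{i,j=1}^{k} \frac{i-1}{k} \left( \frac{j-1}{k} \right)^2 \frac{1}{k^2} \le S(P_k, f) \le \sum_{i,j=1}^{k} \frac{i}{k} \left( \frac{j}{k} \right)^2 \frac{1}{k^2} \]
and the same estimate holds for any $S(P, f)$ if $P \supset P_k$. Now
\[ \frac{1}{k^2} \sum_{i,j=1}^{k} \frac{i-1}{k} \left( \frac{j-1}{k} \right)^2 = \frac{1}{k^5} \sum_{i=1}^{k-1} \sum_{j=1}^{k-1} i\, j^2 = \frac{1}{k^5} \cdot \frac{k(k-1)}{2} \cdot \frac{(k-1)k(2k-1)}{6} \to \frac{1}{6}, \]
\[ \frac{1}{k^2} \sum_{i,j=1}^{k} \frac{i}{k} \left( \frac{j}{k} \right)^2 = \frac{1}{k^5} \sum_{i=1}^{k} \sum_{j=1}^{k} i\, j^2 = \frac{1}{k^5} \cdot \frac{k(k+1)}{2} \cdot \frac{k(k+1)(2k+1)}{6} \to \frac{1}{6}. \]
Thus, $\int_I f = \frac{1}{6}$.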
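The two bounding sums in Example 3.10 can be evaluated exactly for a given $k$. The following sketch (the function name `lower_upper_sums` is an assumption made for this illustration) uses rational arithmetic so that the closed forms can be compared without rounding.

```python
from fractions import Fraction

def lower_upper_sums(k):
    """Smallest and largest Riemann sums of f(x, y) = x * y**2 for the uniform
    k-by-k partition of [0, 1] x [0, 1]; f is increasing in both variables, so
    the extremes occur at the lower-left and upper-right corner of each cell."""
    area = Fraction(1, k * k)
    lower = sum(Fraction(i - 1, k) * Fraction(j - 1, k) ** 2 * area
                for i in range(1, k + 1) for j in range(1, k + 1))
    upper = sum(Fraction(i, k) * Fraction(j, k) ** 2 * area
                for i in range(1, k + 1) for j in range(1, k + 1))
    return lower, upper

lower, upper = lower_upper_sums(10)
```

Both sums trap $\frac{1}{6}$, and their difference shrinks as $k$ grows, in line with the limits computed in the example.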

Definition 3.11. Suppose that
(i) $D$ is a bounded subset of $\mathbb{R}^n$ (so $D \subset I$ for some closed interval $I$ in $\mathbb{R}^n$),
(ii) $f : D \to \mathbb{R}$.
Extend the domain of $f$ by defining $f(p) = 0$ if $p \notin D$. Then we say that $\int_D f$ exists if and only if $\int_I f$ exists and define
\[ \int_D f = \int_I f. \]

Theorem 3.12. Suppose
(i) $D$ is bounded and the boundary of $D$, $\partial D$, has content zero,
(ii) $f$ is continuous and bounded on $D$.
Then $\int_D f$ exists.

Proof. The function $f$ with its domain extended to the interval $I \supset D$ as in the previous definition, is continuous at each point of $I$ except possibly at points in $\partial D$. But $\mu(\partial D) = 0$, so $\int_I f$ exists by Theorem 3.9. Therefore, $\int_D f$ exists and, by definition, is equal to $\int_I f$.
Definition 3.13. A bounded set $D \subset \mathbb{R}^n$ has content if $\int_D 1$ exists and
\[ \mu(D) := \int_D 1. \]
Equivalently, we define the characteristic function $\chi_D$ of $D$ by
\[ \chi_D(p) = \begin{cases} 1, & \text{if } p \in D; \\ 0, & \text{if } p \notin D, \end{cases} \]
and let $I$ be any closed interval containing $D$ as a subset. Then $D$ has content if $\int_I \chi_D$ exists and then
\[ \mu(D) := \int_I \chi_D. \]

Theorem 3.14. A bounded subset $D \subset \mathbb{R}^n$ has content if and only if $\mu(\partial D) = 0$.

Proof. Exercise 4.

Theorem 3.15 (Properties of the Integral).
(a) If $\int_D f$ and $\int_D g$ exist and $\alpha, \beta \in \mathbb{R}$, then $\int_D (\alpha f + \beta g)$ exists.
(b) If $f(p) \ge 0$ for all $p \in D$ and $\int_D f$ exists, then
\[ \int_D f \ge 0. \]
(c) If $\int_D f$ exists, then $\int_D |f|$ exists, and
\[ \Big| \int_D f \Big| \le \int_D |f|. \]
(d) If $\int_{D_1} f$ and $\int_{D_2} f$ exist, and $\mu(D_1 \cap D_2) = 0$, then
\[ \int_{D_1 \cup D_2} f = \int_{D_1} f + \int_{D_2} f. \]
(e) If (i) $D$ has content, (ii) $\int_D f$ exists, and (iii) $m \le f(p) \le M$ for all $p \in D$, then
\[ m\,\mu(D) \le \int_D f \le M\,\mu(D). \]
(f) (Mean Value Theorem for Integrals). If $D$ is a compact connected set with content (i.e. $\mu(\partial D) = 0$), and $f$ is continuous on $D$, then there exists $p_0 \in D$ such that
\[ \int_D f = f(p_0)\,\mu(D). \]

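Proof. (a) There is no loss of generality by assuming $D$ is an interval in this case. Let $\varepsilon > 0$ be given. Then
1. $\int_D f$ exists implies there exists a partition $P_{1,\varepsilon}$ of $D$ such that for $P \supset P_{1,\varepsilon}$, $|S(P, f) - \int_D f| < \varepsilon$.
2. $\int_D g$ exists implies there exists a partition $P_{2,\varepsilon}$ of $D$ such that for $P \supset P_{2,\varepsilon}$, $|S(P, g) - \int_D g| < \varepsilon$.
Let $P_\varepsilon = P_{1,\varepsilon} \cup P_{2,\varepsilon}$. Then if $P \supset P_\varepsilon$, we have
\[ \Big| S(P, \alpha f + \beta g) - \alpha \int_D f - \beta \int_D g \Big| = \Big| \alpha S(P, f) + \beta S(P, g) - \alpha \int_D f - \beta \int_D g \Big| \]
\[ \le |\alpha| \Big| S(P, f) - \int_D f \Big| + |\beta| \Big| S(P, g) - \int_D g \Big| \le (|\alpha| + |\beta|)\,\varepsilon. \]
Thus, $\int_D (\alpha f + \beta g)$ exists and equals $\alpha \int_D f + \beta \int_D g$.

Proof of (b). Exercise 6.

Proof of (c). Exercise 7.

Proof of (d): Let $\chi_{D_i}$ be the characteristic functions of $D_i$, $i = 1, 2$, and let $I$ be a closed interval containing $D_1 \cup D_2$. By definition
\[ \int_{D_1} f = \int_I f \chi_{D_1} = \int_{D_1 \cup D_2} f \chi_{D_1}, \qquad \int_{D_2} f = \int_I f \chi_{D_2} = \int_{D_1 \cup D_2} f \chi_{D_2}. \]
By part (a) of the present Theorem, $\int_{D_1 \cup D_2} f (\chi_{D_1} + \chi_{D_2})$ exists and
\[ \int_{D_1 \cup D_2} f (\chi_{D_1} + \chi_{D_2}) = \int_{D_1 \cup D_2} f \chi_{D_1} + \int_{D_1 \cup D_2} f \chi_{D_2} = \int_{D_1} f + \int_{D_2} f. \]
From Exercise 3 and the fact that
\[ f(p) \big( \chi_{D_1}(p) + \chi_{D_2}(p) \big) \chi_{D_1 \cup D_2}(p) = f(p)\, \chi_{D_1 \cup D_2}(p), \qquad p \notin D_1 \cap D_2, \]
with $\mu(D_1 \cap D_2) = 0$, then also by definition we have
\[ \int_{D_1 \cup D_2} f (\chi_{D_1} + \chi_{D_2}) = \int_I f (\chi_{D_1} + \chi_{D_2}) \chi_{D_1 \cup D_2} = \int_I f \chi_{D_1 \cup D_2} = \int_{D_1 \cup D_2} f. \]
Thus, $\int_{D_1 \cup D_2} f$ exists and equals $\int_{D_1} f + \int_{D_2} f$.

Proof of (e): Exercise 8.

Proof of (f): There will be two cases.
Case (i): If $\mu(D) = 0$, then since $m\,\mu(D) = M\,\mu(D) = 0$, by part (e) we have
\[ \int_D f = 0 = f(p_0)\,\mu(D) \quad\text{for any } p_0 \in D. \]
Case (ii): If $\mu(D) \ne 0$, then by (e), with
\[ m := \inf\{f(p) : p \in D\}, \qquad M := \sup\{f(p) : p \in D\}, \]
we have
\[ m \le \frac{\int_D f}{\mu(D)} \le M. \]
By Corollary 2.38, $D$ compact and $f$ continuous on $D$ implies that there exist points $p_1, p_2 \in D$ with $f(p_1) = m$ and $f(p_2) = M$. Therefore, by Corollary 2.34 (the Intermediate Value Theorem), there exists $p_0 \in D$ such that
\[ f(p_0) = \frac{\int_D f}{\mu(D)} \implies f(p_0)\,\mu(D) = \int_D f. \]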
Discussion: A subset $K$ of $\mathbb{R}^n$ has Jordan measure zero if, for each $\varepsilon > 0$, there is a finite collection of intervals $\{I_k\}$ such that
\[ K \subset \bigcup_k I_k \quad\text{and}\quad \sum_k \mu(I_k) \le \varepsilon. \]
$K$ has Lebesgue measure zero if, for each $\varepsilon > 0$, there is a countable collection of intervals $\{I_k\}$ satisfying the displayed relation. This simple extension of the concept of measure zero has profound consequences. We know that the set of rationals in $[0, 1]$ does not have Jordan content. However, if this set is enumerated as $\{r_k : k = 1, 2, 3, \dots\}$, then $r_k \in I_k$ with $\mu(I_k) = \varepsilon 2^{-k}$ yields
\[ \{r_k : k = 1, 2, 3, \dots\} \subset \bigcup_k I_k \quad\text{and}\quad \sum_k \mu(I_k) = \sum_{k=1}^{\infty} \frac{\varepsilon}{2^k} = \varepsilon. \]
Thus the set of rationals in $[0, 1]$ has Lebesgue measure zero. In fact, the same argument shows that $\mathbb{Q}$, or any countable set of real numbers, has Lebesgue measure zero. Countable sets are not the only sets of Lebesgue measure zero, however; in fact, such sets can be quite complicated. The remarkable Cantor set, which you will study in Real Analysis, has Lebesgue measure zero but nonetheless has the same cardinality as $[0, 1]$, i.e. it has zero "length" but has the same "number of points" as $[0, 1]$. A simple discussion of the Cantor set can be found in Bartle's book.

An interesting theorem of Lebesgue states that, for a bounded real-valued function $f$, the Riemann integral $\int_I f$ exists if and only if the set of points in $I$ at which $f$ is discontinuous has Lebesgue measure zero. For example, for the functions in Exercises 15 and 16, the one in the first exercise is discontinuous at each point in $[0, 1]$, while the function in the second exercise is only discontinuous at the rational points in $[0, 1]$.
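The covering of the rationals described in the Discussion can be tried out on finitely many terms. In this sketch the enumeration order (by increasing denominator) and the names `enumerate_rationals` and `cover` are assumptions for illustration; any enumeration would do.

```python
from fractions import Fraction

def enumerate_rationals(n_terms):
    """First n_terms rationals in [0, 1], listed by increasing denominator."""
    seen, out, q = set(), [], 1
    while len(out) < n_terms:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen and len(out) < n_terms:
                seen.add(r)
                out.append(r)
        q += 1
    return out

def cover(rationals, eps):
    """Interval I_k of length eps / 2**k centred at r_k, for k = 1, 2, ..."""
    eps = Fraction(eps)
    return [(r - eps / 2 ** (k + 1), r + eps / 2 ** (k + 1))
            for k, r in enumerate(rationals, start=1)]

eps = Fraction(1, 100)
rs = enumerate_rationals(30)
intervals = cover(rs, eps)
total_length = sum(hi - lo for lo, hi in intervals)
```

The lengths form a geometric series, so the total stays below $\varepsilon$ however many rationals are covered.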

Exercises
3.3. Suppose $f$ and $g$ are bounded on $I$ and $f(p) = g(p)$ for all $p \in I \setminus K$, where $K$ has content zero. Prove that
\[ \int_I f \text{ exists} \implies \int_I g \text{ exists and equals } \int_I f. \]
3.4. Prove Theorem 3.14.
3.5. Give an example of a countable set which has content and one which does not have content (contented and discontented sets??). Is there a countable set with positive content?
3.6. Prove Theorem 3.15 (b).
3.7. Prove Theorem 3.15 (c).
3.8. Prove Theorem 3.15 (e). Hint: Use parts (a) and (b).
3.9. Prove that $\{(\frac{1}{m}, \frac{1}{k}) : m, k = 1, 2, \dots\}$ has content zero in $\mathbb{R}^2$.
3.10. Prove that the set of points in $[0, 1] \times [0, 1]$ with rational coordinates does not have content in $\mathbb{R}^2$.
3.11. Let $D$ be a subset of $\mathbb{R}^n$ which has content zero and $f$ be any bounded function on $D$. Prove that $\int_D f$ exists and equals zero.
3.12. Let $D$ be any bounded subset of $\mathbb{R}^n$ and $f(p) = 0$ for all $p \in D$. Prove that $\int_D f$ exists and equals 0. Note that $D$ does not necessarily have content.
3.13. Show that $\int_0^1 \sin(\frac{1}{x})\, dx$ exists and is independent of the value assigned to the integrand at $x = 0$.
3.14. If $\int_D f$ exists and $D_1$ is any subset of $D$ with content, then $\int_{D_1} f$ exists ($D$ itself does not necessarily have content). Note that this exercise proves in particular that the existence of $\int_a^b f$ implies the existence of $\int_a^c f$ for any $c \in (a, b)$.
3.15. If $f(x) = 0$ for $x \notin \mathbb{Q}$ and $f(x) = 1$ for $x \in \mathbb{Q}$, then $f$ is not Riemann integrable on $[0, 1]$.
3.16. If
\[ f(x) = \begin{cases} 0, & \text{if } x \notin \mathbb{Q} \\ \frac{1}{q}, & \text{if } x = \frac{p}{q} \text{ and } \gcd(p, q) = 1, \end{cases} \]
then $\int_0^1 f = 0$.
3.17. If $P$ is a partition of the interval $I$ and $f : I \to \mathbb{R}$, $f$ bounded, define $\underline{S}(P, f)$ and $\overline{S}(P, f)$, the lower and upper Riemann sums corresponding to the partition $P$, to be $\inf\{S(P, f)\}$ and $\sup\{S(P, f)\}$ respectively, the inf and sup being taken over all Riemann sums corresponding to the partition $P$ and the function $f$. Let
\[ \underline{\int_I} f = \sup_P \{\underline{S}(P, f)\} \quad\text{and}\quad \overline{\int_I} f = \inf_P \{\overline{S}(P, f)\}, \]
the sup and inf here being taken over all partitions $P$ of $I$.
(i) Show that $Q \supset P$ implies
\[ \underline{S}(P, f) \le \underline{S}(Q, f) \le \overline{S}(Q, f) \le \overline{S}(P, f). \]
(ii) Show that $\underline{\int_I} f \le \overline{\int_I} f$.
(iii) (Definition). $f$ is integrable on $I$ if $\underline{\int_I} f = \overline{\int_I} f$, and then we define
\[ \int_I f = \underline{\int_I} f = \overline{\int_I} f. \]
(iv) Show that $f$ is integrable in this sense if and only if there is exactly one number $\lambda$ such that
\[ \underline{S}(P, f) \le \lambda \le \overline{S}(P, f) \]
for every partition $P$ of $I$, in which case $\int_I f = \lambda$.
(v) (Cauchy Criterion). Show that $\int_I f$ exists in this sense if and only if, for each $\varepsilon > 0$, there exists a partition $P_\varepsilon$ of $I$ such that
\[ |\overline{S}(P_\varepsilon, f) - \underline{S}(P_\varepsilon, f)| < \varepsilon. \]
(vi) Show that for a bounded $f$, $f$ is integrable on $I$ in the sense of (iii) if and only if $f$ is integrable in the sense adopted in these notes and that both definitions give the same value for $\int_I f$.

3.3 Evaluation of Integrals

3.3.1 Real valued functions of a real variable

Theorem 3.16 (Fundamental Theorem of Calculus). If $f$ is continuous on $[a, b]$, then there is a function $F$ on $[a, b]$ such that
\[ F'(x) = f(x), \qquad a \le x \le b. \]

Proof. Consider the function $F(x) = \int_a^x f$, $a \le x \le b$, which exists by Exercise 14. From Theorem 3.15 (d) and (f) we have
\[ F(x + h) - F(x) = \int_x^{x+h} f = f(c_h)\, h, \]
where $c_h$ is in the interval between $x$ and $x + h$. Since $f$ is continuous on $[a, b]$,
\[ \lim_{h \to 0} \frac{F(x + h) - F(x)}{h} = \lim_{h \to 0} f(c_h) = f(x). \]

Any function $F$ with the property that $F'(x) = f(x)$, $a \le x \le b$, is called an antiderivative of $f$ on $[a, b]$.

Proposition 3.17. If $F_1$ and $F_2$ are antiderivatives of $f$, then $F_1 - F_2$ is a constant function.

Proof. $F_1'(x) - F_2'(x) = f(x) - f(x) = 0$, for all $x$. Therefore, $F_1 - F_2$ is constant.

Theorem 3.18. If $f$ is continuous on $[a, b]$ and $F$ is any antiderivative of $f$ on $[a, b]$, then
\[ \int_a^b f = F(b) - F(a). \]

Proof. By the preceding Proposition, there is a constant $C$ so that $\int_a^x f = F(x) - C$. When $x = a$, this yields $\int_a^a f = 0 = F(a) - C$, while using this when $x = b$ we find $\int_a^b f = F(b) - C = F(b) - F(a)$.
Corollary 3.19 (Change of Variable Formula). Suppose that
(i) $\varphi'$ exists and is continuous on $[a, b]$.
(ii) $f$ is continuous on $\varphi([a, b])$.
Then
\[ \int_{\varphi(a)}^{\varphi(b)} f = \int_a^b (f \circ \varphi)\, \varphi', \quad\text{or}\quad \int_{\varphi(a)}^{\varphi(b)} f(x)\, dx = \int_a^b f(\varphi(u))\, \varphi'(u)\, du. \]

Proof. If $F$ is an antiderivative of $f$, then
\[ \frac{d}{du} F(\varphi(u)) = f(\varphi(u))\, \varphi'(u), \]
so that $F \circ \varphi$ is an antiderivative of $(f \circ \varphi)\, \varphi'$. Therefore,
\[ \int_{\varphi(a)}^{\varphi(b)} f = F(\varphi(b)) - F(\varphi(a)) = \int_a^b (f \circ \varphi)\, \varphi'. \]

You will have noticed that we used the Chain Rule here even though it was not yet proven in these Notes. A proof will be given in the next chapter in a more general context.

Example 3.20 To evaluate $\int_0^{\pi/2} \sin^2(u) \cos(u)\, du$, we take $f(x) = x^2$, $\varphi(u) = \sin(u)$, and $\varphi'(u) = \cos(u)$. Then
\[ \int_0^{\pi/2} \sin^2(u) \cos(u)\, du = \int_{\varphi(0)}^{\varphi(\pi/2)} x^2\, dx = \int_0^1 x^2\, dx = \frac{1}{3}. \]
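The two sides of the change of variable in Example 3.20 can be compared numerically. This sketch uses a composite midpoint rule as a stand-in for the integral; the helper name `midpoint_rule` and the number of subintervals are arbitrary choices made for the illustration.

```python
import math

def midpoint_rule(g, a, b, n=10_000):
    """Composite midpoint approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# Left side: integrate sin^2(u) cos(u) over [0, pi/2].
lhs = midpoint_rule(lambda u: math.sin(u) ** 2 * math.cos(u), 0.0, math.pi / 2)
# Right side: integrate x^2 between phi(0) = sin(0) and phi(pi/2) = sin(pi/2).
rhs = midpoint_rule(lambda x: x ** 2, math.sin(0.0), math.sin(math.pi / 2))
```

Both approximations agree with the exact value $\frac{1}{3}$ to well within the accuracy of the rule.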

Corollary 3.21 (Integration by Parts Formula). If $f'$ and $g'$ are continuous on $[a, b]$, then
\[ \int_a^b f g' + \int_a^b f' g = f(b) g(b) - f(a) g(a). \]

Proof. An antiderivative of $f g' + f' g$ is $f g$. Therefore, the formula follows immediately.

Example 3.22
\[ \int_0^1 x e^x\, dx = x e^x \Big|_0^1 - \int_0^1 e^x\, dx = e - (e^x) \Big|_0^1 = e - (e - 1) = 1. \]

3.3.2 Real valued functions on $\mathbb{R}^2$

The important theorem of Fubini allows us to extend the use of the Fundamental Theorem of Calculus to functions of more than one variable.

Theorem 3.23 (Fubini's Theorem). Let $f : I = [a, b] \times [c, d] \to \mathbb{R}$. Suppose that
(i) $\int_I f$ exists,
(ii) $\int_c^d f(x, y)\, dy = F(x)$ exists for each $x \in [a, b]$.
Then $\int_a^b F(x)\, dx$ exists and
\[ \int_a^b F(x)\, dx = \int_a^b \left[ \int_c^d f(x, y)\, dy \right] dx = \int_I f. \]

Proof. The definition of $\int_I f$ states that if $\varepsilon > 0$ is given, there is a partition $P_\varepsilon$ of $I$ such that
\[ P \supset P_\varepsilon \implies \Big| S(P, f) - \int_I f \Big| < \varepsilon, \]
for any Riemann sum $S(P, f)$. Writing this in detail,
\[ P = \{x_0, x_1, \dots, x_m\} \times \{y_0, y_1, \dots, y_k\} \supset P_\varepsilon \implies \Big| \sum_{i=1}^{m} \sum_{j=1}^{k} f(x_i^*, y_j^*)(x_i - x_{i-1})(y_j - y_{j-1}) - \int_I f \Big| < \varepsilon, \]
where $x_{i-1} \le x_i^* \le x_i$ and $y_{j-1} \le y_j^* \le y_j$. This may be written as
\[ \Big| \sum_{i=1}^{m} (x_i - x_{i-1}) \sum_{j=1}^{k} f(x_i^*, y_j^*)(y_j - y_{j-1}) - \int_I f \Big| < \varepsilon. \tag{3.10} \]
Condition (ii) implies that for any fixed set of numbers $x_i^*$, $i = 1, \dots, m$, the partition $\{y_j\}$ of $[c, d]$ may be chosen to be so fine that
\[ \Big| \sum_{j=1}^{k} f(x_i^*, y_j^*)(y_j - y_{j-1}) - \int_c^d f(x_i^*, y)\, dy \Big| < \frac{\varepsilon}{b - a}, \qquad i = 1, \dots, m, \]
since each of the integrals $\int_c^d f(x_i^*, y)\, dy$ exists. Therefore, for the $x_i^*$ we have
\[ \Big| \sum_{j=1}^{k} f(x_i^*, y_j^*)(y_j - y_{j-1}) - F(x_i^*) \Big| < \frac{\varepsilon}{b - a}, \qquad i = 1, \dots, m. \tag{3.11} \]
Then (3.11) implies
\[ \Big| \sum_{i=1}^{m} (x_i - x_{i-1}) \sum_{j=1}^{k} f(x_i^*, y_j^*)(y_j - y_{j-1}) - \sum_{i=1}^{m} (x_i - x_{i-1}) F(x_i^*) \Big| < \frac{\varepsilon}{b - a} \sum_{i=1}^{m} (x_i - x_{i-1}) = \frac{\varepsilon}{b - a} (b - a) = \varepsilon. \tag{3.12} \]
The triangle inequality with (3.10) and (3.12) gives
\[ \Big| \sum_{i=1}^{m} F(x_i^*)(x_i - x_{i-1}) - \int_I f \Big| < 2\varepsilon. \]
Thus, from the definition of the integral, $\int_a^b F$ exists and equals $\int_I f$.

Corollary 3.24 (Interchanging the order of integration). Let $I = [a, b] \times [c, d]$. If
(i) $\int_I f$ exists,
(ii) $\int_c^d f(x, y)\, dy$ exists for each $x \in [a, b]$,
(iii) $\int_a^b f(x, y)\, dx$ exists for each $y \in [c, d]$,
then
\[ \int_a^b \left[ \int_c^d f(x, y)\, dy \right] dx = \int_c^d \left[ \int_a^b f(x, y)\, dx \right] dy. \]

Corollary 3.25. Let $I = [a, b] \times [c, d]$. Suppose
(i) $f$ is bounded on $I$,
(ii) $f$ is continuous on $I \setminus K$ with $\mu_2(K) = 0$ ($\mathbb{R}^2$ content zero),
(iii) $\mu_1(K \cap \{(x, y) : c \le y \le d\}) = 0$ for each $x \in [a, b]$ ($\mathbb{R}^1$ content on vertical segments).
Then
\[ \int_I f = \int_a^b \left[ \int_c^d f(x, y)\, dy \right] dx. \]

Proof. Conditions (i) and (ii) imply that the integral $\int_I f$ exists by Theorem 3.9. Condition (iii) says that the intersection of $K$ with each vertical line (considered as a set in $\mathbb{R}$) has content zero so that $\int_c^d f(x, y)\, dy$ exists for each $x \in [a, b]$, again by Theorem 3.9. Therefore, all the conditions of Fubini's Theorem are satisfied.

Corollary 3.26. If $D := \{(x, y) : a \le x \le b,\ \varphi(x) \le y \le \psi(x)\}$, where $\varphi$ and $\psi$ are continuous on $[a, b]$ and $f : \mathbb{R}^2 \to \mathbb{R}$ is continuous on $D$, then
\[ \int_D f = \int_a^b \left[ \int_{\varphi(x)}^{\psi(x)} f(x, y)\, dy \right] dx. \]

Proof. Use Corollary 3.25. The graphs of $\varphi$ and $\psi$ have zero content in $\mathbb{R}^2$. Each vertical line intersects each graph once, and these two points have 1-dimensional content zero. See Figure 3.5.

Figure 3.5. Area between two curves.

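Example 3.27 Applications of Fubini's Theorem:
(1) $f(x, y) = xy^2$, $I = [0, 1] \times [0, 1]$. Then
\[ \int_I f = \int_0^1 \left[ \int_0^1 xy^2\, dy \right] dx = \int_0^1 \frac{x}{3}\, dx = \frac{1}{6}. \]
Also,
\[ \int_I f = \int_0^1 \left[ \int_0^1 xy^2\, dx \right] dy = \int_0^1 \frac{y^2}{2}\, dy = \frac{1}{6}. \]
(2) $f(x, y) = x^3 y^2$, $D = \{(x, y) : 0 \le x \le 1,\ x^2 \le y \le x\}$. Then (see Figure 3.6)
\[ \int_D f = \int_0^1 \left[ \int_{x^2}^{x} x^3 y^2\, dy \right] dx = \int_0^1 \frac{1}{3} x^3 y^3 \Big|_{y=x^2}^{y=x} dx = \int_0^1 \left( \frac{x^6}{3} - \frac{x^9}{3} \right) dx = \left( \frac{x^7}{21} - \frac{x^{10}}{30} \right) \Big|_0^1 = \frac{1}{70}. \]
Also,
\[ \int_D f = \int_0^1 \left[ \int_y^{\sqrt{y}} x^3 y^2\, dx \right] dy = \int_0^1 \frac{1}{4} x^4 y^2 \Big|_{x=y}^{x=\sqrt{y}} dy = \int_0^1 \left( \frac{y^4}{4} - \frac{y^6}{4} \right) dy = \left( \frac{y^5}{20} - \frac{y^7}{28} \right) \Big|_0^1 = \frac{1}{70}. \]

Figure 3.6. Second area in Example 3.27 (the region between the curves $y = x^2$ and $y = x$).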
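The two iterated integrals in part (2) of Example 3.27 can be checked numerically. The sketch below nests a composite midpoint rule; the helper `midpoint` and the resolution `n = 400` are assumptions made only for this illustration.

```python
def midpoint(g, a, b, n=400):
    """Composite midpoint approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x, y):
    return x ** 3 * y ** 2

# Integrate in y first: for fixed x, y runs from x^2 up to x.
dy_first = midpoint(lambda x: midpoint(lambda y: f(x, y), x * x, x), 0.0, 1.0)
# Integrate in x first: for fixed y, x runs from y up to sqrt(y).
dx_first = midpoint(lambda y: midpoint(lambda x: f(x, y), y, y ** 0.5), 0.0, 1.0)
```

Both orders approximate the common value $\frac{1}{70}$, as Fubini's Theorem predicts.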


The examples just given have iterated integrals which are all easily evaluated. However, one sometimes encounters iterated integrals where the antiderivatives cannot be found in terms of elementary functions. A simplification is sometimes achieved by using Fubini's Theorem to reverse the order of the integration.

Example 3.28 (1) Let $\int_0^1 \left[ \int_y^1 e^{y/x}\, dx \right] dy = \int_D f$. By Fubini's Theorem, this integral may be rewritten as
\[ \int_0^1 \left[ \int_0^x e^{y/x}\, dy \right] dx = \int_0^1 x e^{y/x} \Big|_{y=0}^{y=x} dx = \int_0^1 x(e - 1)\, dx = (e - 1)/2. \]
(2) Suppose that
\[ \int_1^2 \left[ \int_0^x f(x, y)\, dy \right] dx = \int_D f = \int_0^2 \left[ \int_{\varphi(y)}^{2} f(x, y)\, dx \right] dy \]
where
\[ \varphi(y) = \begin{cases} 1, & \text{if } 0 \le y \le 1 \\ y, & \text{if } 1 \le y \le 2. \end{cases} \]
Hence, the last integral may be rewritten as
\[ \int_0^1 \left[ \int_1^2 f(x, y)\, dx \right] dy + \int_1^2 \left[ \int_y^2 f(x, y)\, dx \right] dy. \]
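Part (1) of Example 3.28 can be checked numerically in both orders. This is a rough sketch with a nested midpoint rule (the helper `midpoint` and `n = 200` are illustrative assumptions); the original order converges more slowly because the inner integrand varies rapidly near $x = y$ when $y$ is small.

```python
import math

def midpoint(g, a, b, n=200):
    """Composite midpoint approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x, y):
    return math.exp(y / x)           # only evaluated at midpoints, where x > 0

# Original order: x from y to 1 inside, y from 0 to 1 outside.
original = midpoint(lambda y: midpoint(lambda x: f(x, y), y, 1.0), 0.0, 1.0)
# Reversed order: y from 0 to x inside, x from 0 to 1 outside.
reversed_order = midpoint(lambda x: midpoint(lambda y: f(x, y), 0.0, x), 0.0, 1.0)
exact = (math.e - 1) / 2
```

The reversed order is the one with an elementary inner antiderivative, which is why the example switches to it.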

Further worked examples may be found in Buck, pp. 115-119.

3.3.3 Real valued functions on $\mathbb{R}^n$

Figure 3.7. The figures for Example 3.28 (first region bounded by $y = x$, $x = 1$, $y = 0$; second region bounded by $y = x$, $x = 1$, $x = 2$, $y = 0$).

Fubini's Theorem may be stated for $n$-dimensional intervals as follows.

Theorem 3.29 (Fubini's Theorem). Let $I_\ell$ and $I_m$ be closed intervals in $\mathbb{R}^\ell$ and $\mathbb{R}^m$ respectively, with $n = \ell + m$. Thus,
\[ I_n = I_\ell \times I_m = \{(p, q) : p \in I_\ell,\ q \in I_m\} \]
is a closed interval in $\mathbb{R}^n$. Let $f : I_n \to \mathbb{R}$ and suppose
(i) $\int_{I_n} f$ exists,
(ii) $F(p) = \int_{I_m} f(p, q)\, dq$ exists for each $p \in I_\ell$.
Then
\[ \int_{I_\ell} F = \int_{I_\ell} \left[ \int_{I_m} f(p, q)\, dq \right] dp \]
exists and equals $\int_{I_n} f$.

Notes:
(1) The proof of Theorem 3.29 is exactly the same as that given before when $n = 2$.
(2) The symbols $dp$, $dq$ above are simply used as devices to indicate the spaces on which we are integrating.
(3) For a more general formulation of Fubini's Theorem where the condition (ii) is dropped see Calculus on Manifolds by M. Spivak (p. 58). However, the above statement of the theorem is sufficient for our needs.
(4) We will prove a change of variables formula for integrals in higher dimensions in a later chapter.

Exercises
3.18. Let $f$ be a real-valued function on $[a, b]$ such that $f'(x)$ exists for each $x \in [a, b]$ and $\int_a^b f'$ exists. Prove that
\[ f(b) - f(a) = \int_a^b f'. \]
(You may not use the Fundamental Theorem of Calculus. Why?)
3.19. Let $f$ be a non-negative continuous function on $[a, b]$. Prove that $\int_a^b f$ is the content in $\mathbb{R}^2$ of the set
\[ D := \{(x, y) : a \le x \le b,\ 0 \le y \le f(x)\}. \]
That is, show $\int_D 1 = \int_a^b f$.
3.20. Let $D := \{(x, y) : 1 \le x \le 3,\ x^2 \le y \le x^2 + 1\}$. Show that
\[ \mu(D) = \int_1^3 \left[ \int_{x^2}^{x^2+1} dy \right] dx = 2. \]
3.21. Let $f(x, y) = x^2 + y^2$, $D = \{(x, y) : 0 \le x \le a,\ 0 \le y \le b\}$. Show
\[ \int_D f = \frac{1}{3} ab(a^2 + b^2). \]
3.22. Let $f(x, y) = xy$, $D$ be the region in the diagram. Show $\int_D f = \frac{131}{120}$. (Do this in two ways.)

Figure 3.8. Figure for Exercise 22 (boundary curves labelled $y = x$, $y = 1$, and $y = (x - 3)^2$).

3.23. Find $\int_D f$ where $f(x, y) = \sin(\frac{x}{a} + \frac{y}{b})$ and $D = \{(x, y) : 0 \le x \le a/2,\ 0 \le y \le b/2\}$.
3.24. Let $D$ be the region bounded by the curves $x^2 - y^2 = 1$, $x^2 + y^2 = 4$, which contains $(0, 0)$. Find $\int_D f$ where $f(x, y) = x^2$.
3.25. Let $f(x, y) = x$. Prove that $\int_D f = 77/4$, where $D$ is the region in $\mathbb{R}^2$ illustrated in Figure 3.9.
3.26. Let $f(x, y) = g(x)$ with $(x, y) \in [a, b] \times [c, d] = I$.
(i) Prove that if $\int_a^b g$ exists, then $\int_I f$ exists. Deduce from this that $\int_D f$ exists where $D$ is any subset of $I$ which has content.
(ii) Suppose $g$ is defined on $[0, 1]$ and $\int_0^1 g$ exists. Prove that
\[ \int_0^1 \left[ \int_x^1 g(t)\, dt \right] dx = \int_0^1 t\, g(t)\, dt. \]
3.27. Let $f : I \to \mathbb{R}$ be a bounded function on the closed interval $I$.
(i) If $\int_I f$ exists, then $\int_I f^2$ exists. [Hint:
\[ |f(p)^2 - f(q)^2| = |f(p) - f(q)|\,|f(p) + f(q)| \le 2M |f(p) - f(q)|, \]
where $M := \sup\{|f(p)| : p \in I\}$.]
(ii) Deduce from (i) that if $\int_I f$ and $\int_I g$ exist, then $\int_I f g$ exists. [Hint: $(f - g)^2 = f^2 - 2fg + g^2$.]

Figure 3.9. Figure for Exercise 25 (boundary curves labelled $y = 4$, $y = (2x + 4)/3$, $y = x^2 - 4x + 5$, $y = x/2$, $x = 0$, $x = 4$).

3.28. The Mean Value Theorem for Integrals (Theorem 3.15) implies the following: If $f$ is continuous on $[a, b]$, then there exists $c \in [a, b]$ such that
\[ \int_a^b f = f(c)(b - a). \]
Can you replace "continuous" by a less restrictive condition which still implies the result? Hint: [What was Darboux's first name?]
3.29. (Cavalieri's Principle) Let $A$ and $B$ be subsets of $\mathbb{R}^2$ with content. If $x \in \mathbb{R}$, define
\[ A_x := \{y : (x, y) \in A\}, \qquad B_x := \{y : (x, y) \in B\} \]
(sections of $A$ and $B$). Suppose that, for each $x$, $A_x$ and $B_x$ have content in $\mathbb{R}$ and $\mu_1(A_x) = \mu_1(B_x)$. Prove that $\mu_2(A) = \mu_2(B)$. [Hint: Spell Fubini.]
3.30. Let $f$ be a real-valued function on $[0, 1]$ such that $\int_0^1 f$ exists. Define $a_k$ by
\[ a_k := \frac{1}{k} \sum_{j=1}^{k} f\Big( \frac{j}{k} \Big), \qquad k = 1, 2, 3, \dots. \]
Show that $\{a_k\}$ is a convergent sequence and that
\[ \lim_{k \to \infty} a_k = \int_0^1 f. \]
3.31. Let $f, g$ be integrable on $[a, b]$ with $g(x) \ge 0$, $a \le x \le b$, and $f$ continuous on $[a, b]$. Prove that there exists $c \in [a, b]$ such that
\[ \int_a^b f g = f(c) \int_a^b g. \]
3.32. Let $D$ be a compact subset in $\mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}^m$ be continuous on $D$. Show that $\{(p, f(p)) : p \in D\}$ has Jordan content zero in $\mathbb{R}^{n+m}$.
3.33. Let $f$ be a continuous real-valued function on $[0, 1]$.
(i) Prove that
\[ \int_0^\pi x f(\sin(x))\, dx = \frac{\pi}{2} \int_0^\pi f(\sin(x))\, dx. \]
(ii) From part (i) deduce that
\[ \int_0^\pi \frac{x \sin(x)}{2 - \sin^2(x)}\, dx = \frac{\pi^2}{4}. \]

3.34. Use Fubini's Theorem to show that


#
Z 1 "Z
Z 1 "Z 2y
f (x, y) dx dy =

x2

f (x, y) dy dx+

"Z

(x1)2

f (x, y) dy dx.

3.35. Show that $\int_D 1 = \frac{1}{6}$ where $D := \{(x, y, z) : x \ge 0,\ y \ge 0,\ z \ge 0,\ 0 \le x + y + z \le 1\}$.
3.36. Let $D$ be a subset of $\mathbb{R}^2$ with content and $f$ be a positive continuous function on $D$. Use Fubini's Theorem to show that if
\[ K := \{(x, y, z) : (x, y) \in D,\ 0 \le z \le f(x, y)\}, \]
then
\[ \mu_3(K) = \int_K 1 = \int_D f. \]
Deduce that $m\,\mu_2(D) \le \mu_3(K) \le M\,\mu_2(D)$ where $m$, $M$ are lower and upper bounds for $f$ on $D$.
3.37. Evaluate
\[ \int_0^2 \int_y^2 e^{x^2}\, dx\, dy \quad\text{and}\quad \int_0^\pi \int_x^\pi \frac{\sin(y)}{y}\, dy\, dx. \]
3.38. Show

"Z 2
1y

1y 2

sin

1 x2

dx dy = 1.

Explain carefully why each integral you consider exists.


3.39. See also exercises pp. 122-124 in Buck.
References for Chapter III
R. G. Bartle: Chapter VI
R. C. Buck: Chapter III.

Chapter 4

Differentiation of Functions of Several Variables

4.1 Preliminaries

4.1.1 Linear Functions

Definition 4.1. A function $L : \mathbb{R}^n \to \mathbb{R}^m$ is linear if, for all $p, q \in \mathbb{R}^n$ and $\alpha \in \mathbb{R}$, there holds
(i) $L(p + q) = L(p) + L(q)$ ($L$ is additive),
(ii) $L(\alpha p) = \alpha L(p)$ ($L$ is homogeneous).
If $v_0 \in \mathbb{R}^m$ and $L$ is linear, then the function $M : \mathbb{R}^n \to \mathbb{R}^m$ defined by $M(p) = v_0 + L(p)$ is an affine function.

Linear algebra plays a prominent background role in this chapter, as the next theorem suggests.

Theorem 4.2. A function $L : \mathbb{R}^n \to \mathbb{R}^m$ is linear if and only if there is a matrix $C := [C_{i,j}]$, $i = 1, \dots, m$, $j = 1, \dots, n$, such that if $p = (x_1, \dots, x_n) \in \mathbb{R}^n$ and $L(p) = q = (y_1, \dots, y_m) \in \mathbb{R}^m$, then
\[ \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} C_{1,1} & \dots & C_{1,n} \\ \vdots & & \vdots \\ C_{m,1} & \dots & C_{m,n} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \quad\text{or}\quad q^T = C p^T \quad (T \text{ is the transpose}). \]
In particular, $y_i = \sum_{j=1}^{n} C_{i,j} x_j$, $i = 1, \dots, m$.

Proof. ($\Longleftarrow$) If $L$ is representable by a matrix as above, then $L$ is linear by the nature of matrix multiplication from linear algebra.
($\Longrightarrow$) Conversely, let $L$ be linear,
\[ L(p) = (L_1(p), \dots, L_m(p)) = (y_1, \dots, y_m). \]
If $e_j$ is the vector with 1 in the $j$th coordinate and zeros elsewhere, $j = 1, \dots, n$, then
\[ p = (x_1, \dots, x_n) = x_1 e_1 + \dots + x_n e_n \implies L(p) = L(x_1 e_1 + \dots + x_n e_n) = x_1 L(e_1) + \dots + x_n L(e_n). \]
Thus, for the coordinate functions,
\[ L_i(p) = L_i(x_1 e_1 + \dots + x_n e_n) = x_1 L_i(e_1) + \dots + x_n L_i(e_n), \qquad i = 1, \dots, m. \]
Setting $C_{i,j} = L_i(e_j)$, we see that $y_i = \sum_{j=1}^{n} L_i(e_j) x_j$, and the matrix $C$ has the form
\[ C = [\, L(e_1)^T \ \dots \ L(e_n)^T \,]. \]

Recall from linear algebra that the rank of a matrix $C$ is the number of vectors in the largest linearly independent set which can be chosen from the columns of $C$. Also, if $L : \mathbb{R}^n \to \mathbb{R}^m$ is linear with the matrix representation $C$, then
\[ \operatorname{rank} C = \dim(L(\mathbb{R}^n)), \]
where $L(\mathbb{R}^n) := \{x_1 L(e_1) + \dots + x_n L(e_n) : x_i \in \mathbb{R},\ i = 1, \dots, n\}$. So, the minimum number of vectors which it takes to span $L(\mathbb{R}^n)$ in $\mathbb{R}^m$ is precisely equal to $\operatorname{rank} C$. The range $L(\mathbb{R}^n)$ of $L$ is a linear subspace of $\mathbb{R}^m$ (e.g., if $m = 3$, it is either the origin, or a line through the origin, or a plane containing the origin, or all of $\mathbb{R}^3$). If $v_0$ is a vector in $\mathbb{R}^m$, then
\[ v_0 + L(\mathbb{R}^n) := \{v_0 + v : v \in L(\mathbb{R}^n)\} = \{v_0 + L(u) : u \in \mathbb{R}^n\} \]
is called an affine space and is evidently $L(\mathbb{R}^n)$ translated by the vector $v_0$.

Figure 4.1. An affine plane in $\mathbb{R}^3$.
Example 4.3
\[ \operatorname{rank} \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix} = 2. \]
Thus, for the linear map $L : \mathbb{R}^3 \to \mathbb{R}^3$ corresponding to this matrix, the range $L(\mathbb{R}^3)$ has dimension 2. In fact, it consists of all vectors of the form $(s, s, t)$, that is, the plane $x = y$. An example of an affine space in $\mathbb{R}^3$ is $(0, 1, 0) + L(\mathbb{R}^3)$, which is the parallel plane through the point $(0, 1, 0)$, i.e., the plane $y = x + 1$.
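The rank in Example 4.3 can be confirmed by row reduction. This is a minimal sketch using exact rational arithmetic; the function name `rank` and the row-reduction strategy are choices made for this illustration, not something taken from the notes.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (list of rows) by Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0                                        # number of pivots found so far
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                             # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

C = [[1, 1, 0],
     [1, 1, 0],
     [0, 1, 1]]

def L(p):
    """The linear map with matrix C: row i of C dotted with p."""
    return tuple(sum(c * x for c, x in zip(row, p)) for row in C)
```

Since the first two rows of `C` coincide, every image `L(p)` has equal first and second coordinates, which is the plane $x = y$ described in the example.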
Theorem 4.4. If $L$ is linear, there is a constant $M$ such that
\[ |L(p)| \le M |p|, \qquad p \in \mathbb{R}^n. \]

Proof. Using the matrix representation of a linear map, we have by the Cauchy-Schwarz inequality
\[ |y_i| = |L_i(p)| = \Big| \sum_{j=1}^{n} C_{i,j} x_j \Big| \le \sqrt{\sum_{j=1}^{n} C_{i,j}^2}\ \sqrt{\sum_{j=1}^{n} x_j^2} \]
\[ \implies |L(p)| = \sqrt{\sum_{i=1}^{m} y_i^2} \le \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} C_{i,j}^2}\ \sqrt{\sum_{j=1}^{n} x_j^2}, \]
or $|L(p)| \le M |p|$ where $M := \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} C_{i,j}^2}$.
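The bound of Theorem 4.4 is easy to test on random vectors. In this sketch the matrix `C` is an arbitrary example and the helper names `apply`, `norm`, and `bound_constant` are hypothetical; the constant is the one produced in the proof.

```python
import math
import random

def apply(C, p):
    """Matrix-vector product: the i-th entry is sum_j C[i][j] * p[j]."""
    return [sum(c * x for c, x in zip(row, p)) for row in C]

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def bound_constant(C):
    """M = square root of the sum of the squares of all entries of C."""
    return math.sqrt(sum(c * c for row in C for c in row))

C = [[2.0, -1.0], [0.0, 3.0], [1.0, 4.0]]        # an arbitrary map from R^2 to R^3
rng = random.Random(1)                           # fixed seed for repeatability
samples = [[rng.uniform(-10, 10), rng.uniform(-10, 10)] for _ in range(1000)]
worst_ratio = max(norm(apply(C, p)) / norm(p) for p in samples)
```

The observed ratios $|L(p)|/|p|$ never exceed $M$; in general $M$ is not the smallest constant that works, only a convenient one.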

Corollary 4.5. A linear function L : Rn 7 Rm is uniformly continuous on Rn .


Proof.

By the previous theorem


|L(p1 ) L(p2 )| = |L(p1 p2 )| M |p1 p2 |.

Thus, |p1 p2 | < /M = |L(p1 ) L(p2 )| < .


Definition 4.6. L : Rn 7 Rm is one-to-one (1-1) if L(p1 ) = L(p2 ) implies
p1 = p2 . Equivalently, p1 =
6 p2 implies L(p1 ) 6= L(p2 ).
The next theorem says that for one-to-one linear functions, the inequality in
Theorem 4.4 can be reversed!
Theorem 4.7. Suppose L : Rn 7 Rm is linear. Then L is one-to-one if and only
if there exists a constant k > 0 such that |L(p)| k|p| for all p Rn .
Proof.

= If such a constant k exists, then


|L(p1 ) L(p2 )| = |L(p1 p2 )| k|p1 p2 |.

Thus, p1 6= p2 = |L(p1 ) 6= L(p2 ).

i
i

84

muldown
2010/1/10
page 84
i

Chapter 4. Differentiation 0f Functions of Several Variables

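($\Rightarrow$) $L$ is continuous on $\mathbb{R}^n$ by Corollary 4.5, which implies that $|L|$ is continuous on $\mathbb{R}^n$, and in particular, $|L|$ is continuous on $S := \{p : |p| = 1\}$. Since $S$ is compact and $|L|$ is continuous on $S$, $|L|$ achieves its minimum on $S$ at some point, call it $p_0$. Thus, we define
$$k := \min\{|L(p)| : p \in S\} = |L(p_0)|, \qquad\text{and}\qquad |p_0| = 1.$$
Since $L(O) = O$ by linearity and $L$ is one-to-one, we have $k = |L(p_0)| > 0 = |L(O)|$.
Finally, for an arbitrary $p \in \mathbb{R}^n \setminus \{O\}$, $p/|p| \in S$ and
$$k \le \Big|L\Big(\frac{1}{|p|}\,p\Big)\Big| = \frac{1}{|p|}\,|L(p)|,$$
that is, $|L(p)| \ge k|p|$. The inequality holds trivially for $p = O$.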
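The constant $k$ in Theorem 4.7 is the minimum of $|L|$ on the unit sphere, which can be estimated by sampling. The sketch below is only an illustration under stated assumptions: the $3 \times 2$ matrix `C` is a hypothetical example chosen here (it is one-to-one), and a grid of angles on the unit circle stands in for the compactness argument of the proof. For this matrix the exact minimum is 1, attained at $p_0 = (1, -2)/\sqrt{5}$.

```python
import math

# Hypothetical one-to-one linear map L : R^2 -> R^3 (not taken from the notes).
C = [[2, 1],
     [1, 0],
     [0, 1]]

def apply(C, p):
    """Return L(p) = C p for a matrix stored as a list of rows."""
    return [sum(c * x for c, x in zip(row, p)) for row in C]

def norm(v):
    """Euclidean norm |v|."""
    return math.sqrt(sum(x * x for x in v))

# Sample |L(p)| over points p = (cos(theta), sin(theta)) of the unit circle S.
N = 3600
k_est = min(
    norm(apply(C, (math.cos(2 * math.pi * i / N), math.sin(2 * math.pi * i / N))))
    for i in range(N)
)

# The exact minimizer for this particular matrix, for comparison.
p0 = (1 / math.sqrt(5), -2 / math.sqrt(5))
k_exact = norm(apply(C, p0))
print(k_est, k_exact)
```

A grid estimate can only overshoot the true minimum, so `k_est` is an upper estimate of $k$ that approaches it as the grid is refined.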

Exercises
4.1. Let $L : \mathbb{R}^2 \to \mathbb{R}^3$ be linear with $L(e_1) = (2, 1, 0)$, $L(e_2) = (1, 0, 1)$, where $e_1 = (1, 0)$, $e_2 = (0, 1)$. Find $L(2, 0)$, $L(1, 1)$, $L(1, 3)$. Draw pictures.
4.2. Show that $L(\mathbb{R}^2) \ne \mathbb{R}^3$ for the function in the last exercise.
4.3. Show that if $L : \mathbb{R}^2 \to \mathbb{R}^3$ is linear, then $L(\mathbb{R}^2) \ne \mathbb{R}^3$.
4.4. Let $L : \mathbb{R}^3 \to \mathbb{R}^2$ be linear. Show that there are non-zero vectors $p \in \mathbb{R}^3$ such that $L(p) = O$.
4.5. If $L : \mathbb{R}^2 \to \mathbb{R}^2$ has a matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$, show that
(i) $L(\mathbb{R}^2)$ is a point if and only if $a = b = c = d = 0$.
(ii) $L(\mathbb{R}^2)$ is a line if and only if $\Delta = ad - bc = 0$ and $a^2 + b^2 + c^2 + d^2 > 0$.
(iii) $L(\mathbb{R}^2) = \mathbb{R}^2$ if and only if $\Delta \ne 0$.
4.6. Show that in case (iii) of the last exercise (and only in that case) $L$ is one-to-one, and that the inverse function $L^{-1}$ is linear with matrix
$$\Delta^{-1}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$
4.7. Show that the sum and composition of two linear functions are linear. What are the matrix representations of these?
4.8. If $L : \mathbb{R}^n \to \mathbb{R}^m$ is linear and one-to-one, then $L^{-1}$ is linear on its domain.
4.9. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be such that
(i) $f(p + q) = f(p) + f(q)$, for all $p, q \in \mathbb{R}^n$ (additive),
(ii) $f$ is continuous at $O$.
Prove that $f$ is a linear function.
4.10. Let $f : \mathbb{R} \to \mathbb{R}^m$ be such that
$$f(\lambda x) = \lambda f(x) \qquad (f \text{ is homogeneous}).$$


Notice that $f$ must be linear since $f(x) = f(1)x$. Show that homogeneity does not imply linearity for functions of more than one variable. HINT: Consider the function
$$f(x, y) = \begin{cases} \dfrac{x^3 + y^3}{x^2 + y^2}, & \text{if } (x, y) \ne O, \\ 0, & \text{otherwise.} \end{cases}$$

4.1.2 Straight Lines and Curves

If $c$ and $u$ are points in $\mathbb{R}^n$, then $\{c + tu : t \in \mathbb{R}\}$ is the straight line through $c$ in the direction of $u$. Recall that $\{p + t(q - p) : t \in \mathbb{R}\}$ is the straight line through $p$ and $q$. It may also be considered as the straight line through $p$ in the direction of $q - p$.

Figure 4.2. The straight line through $p$ in the direction of $q - p$.


Notice that this representation of a line is not unique; you may replace $u$ by any multiple $\lambda u$, $\lambda \ne 0$, and still get the same line.
If $f : \mathbb{R} \to \mathbb{R}^n$, then we say that the set $\{f(t) : t \in \mathbb{R}\}$ is a curve in $\mathbb{R}^n$ (a continuous curve if $f$ is continuous). The line through $f(\tau_0)$ and $f(\tau)$, when $f(\tau) \ne f(\tau_0)$, is $\{f(\tau_0) + t(f(\tau_0) - f(\tau)) : t \in \mathbb{R}\}$, or equivalently,
$$\Big\{ f(\tau_0) + t\,\frac{f(\tau) - f(\tau_0)}{\tau - \tau_0} : t \in \mathbb{R} \Big\}.$$
Thus, it is the line through $f(\tau_0)$ in the direction $\dfrac{f(\tau) - f(\tau_0)}{\tau - \tau_0}$. We therefore define the tangent to the curve at $f(\tau_0)$ to be $\{f(\tau_0) + t f'(\tau_0) : t \in \mathbb{R}\}$, the line through $f(\tau_0)$ in the direction of $f'(\tau_0)$, provided
$$f'(\tau_0) = \lim_{\tau \to \tau_0} \frac{f(\tau) - f(\tau_0)}{\tau - \tau_0}$$
exists.

Figure 4.3. The tangent line to a curve and a secant line.

Evidently,
$$f'(\tau_0) = \Big( \lim_{\tau \to \tau_0} \frac{f_1(\tau) - f_1(\tau_0)}{\tau - \tau_0}, \ldots, \lim_{\tau \to \tau_0} \frac{f_n(\tau) - f_n(\tau_0)}{\tau - \tau_0} \Big) = (f_1'(\tau_0), \ldots, f_n'(\tau_0)).$$
It is also convenient to think of $f(t)$ as being the position of a particle at time $t$. Then, $f'(t) = (f_1'(t), \ldots, f_n'(t))$ is the velocity vector of the particle at time $t$.
Notice also that the linear function $L(t) = t f'(\tau_0)$ is the best linear approximation to $f(\tau_0 + t) - f(\tau_0)$ in a neighborhood of $t = 0$ in the sense that
$$\lim_{t \to 0} \frac{f(\tau_0 + t) - f(\tau_0) - t f'(\tau_0)}{t} = 0.$$

Example 4.8 The direction of the tangent to $C = \{(t, t^2) : t \in \mathbb{R}\}$ in $\mathbb{R}^2$ at the origin is $(1, 2t)|_{t=0} = (1, 0)$. So the tangent line to $C$ at the origin is $\{O + t(1, 0) : t \in \mathbb{R}\}$ or $\{(t, 0) : t \in \mathbb{R}\}$.
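The limit defining the tangent direction can be watched numerically. The following sketch, which is an illustration rather than part of the notes, evaluates the secant direction $(f(\tau) - f(\tau_0))/(\tau - \tau_0)$ for the parabola of Example 4.8 at $\tau_0 = 0$ with shrinking steps; the function name `secant_direction` and the step sizes are assumptions made here.

```python
def f(t):
    """The curve of Example 4.8: f(t) = (t, t^2)."""
    return (t, t * t)

def secant_direction(f, tau0, tau):
    """(f(tau) - f(tau0)) / (tau - tau0), computed componentwise."""
    a, b = f(tau0), f(tau)
    return tuple((bi - ai) / (tau - tau0) for ai, bi in zip(a, b))

# Secant directions through f(0) for smaller and smaller steps.
directions = [secant_direction(f, 0.0, h) for h in (1e-1, 1e-3, 1e-6)]
for d in directions:
    print(d)
```

The second component of each secant direction equals the step size up to rounding, so the directions approach $(1, 0)$, in agreement with the example.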

4.2 The Directional Derivative and the Differential

Definition 4.9. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a function with domain $D \subset \mathbb{R}^n$.
(i) If $c$ is an interior point of $D$, $u \in \mathbb{R}^n$, and
$$\lim_{t \to 0} \frac{f(c + tu) - f(c)}{t} \qquad (t \in \mathbb{R})$$
exists, then this limit is called the directional derivative of $f$ at $c$ in the direction of $u$, and is denoted $f_u(c)$.
(ii) If $e_i$ is the vector in $\mathbb{R}^n$ with 1 in the $i$-th coordinate and zeros elsewhere, then $f_{e_i}(c)$ is usually denoted by
$$\frac{\partial f}{\partial x_i}(c)$$
and is called the partial derivative of $f$ at $c$ with respect to the variable $x_i$, $i = 1, \ldots, n$.
When we compute
$$\lim_{t \to 0} \frac{f(c + tu) - f(c)}{t}$$
we are in fact restricting our attention to the behaviour of the function $f$ at $c$ with respect to a straight line through $c$ in the direction of $u$. This straight line in $\mathbb{R}^n$ (at least the portion of it that lies in $D$) is mapped by $f$ into a curve in $f(D) \subset \mathbb{R}^m$.


Figure 4.4. The vector $f_u(c)$ as the direction of the tangent in $\mathbb{R}^m$.

The vector $f_u(c)$ is the direction of the tangent to this curve in $\mathbb{R}^m$. (See Figure 4.4.)
It is also instructive to consider the graph $G(D) \subset \mathbb{R}^{n+m}$ of $f$, where $G : \mathbb{R}^n \to \mathbb{R}^{n+m}$ is defined by $G(p) = (p, f(p))$, $p \in D$. Computing
$$G_u(c) = \lim_{t \to 0} \frac{G(c + tu) - G(c)}{t} = \lim_{t \to 0} \frac{(c + tu, f(c + tu)) - (c, f(c))}{t} = \lim_{t \to 0} \frac{(tu, f(c + tu) - f(c))}{t} = (u, f_u(c)).$$
Thus, $G_u(c) = (u, f_u(c))$ is the direction (in $\mathbb{R}^{n+m}$) of the tangent to the curve
$$\{G(c + tu) : t \in \mathbb{R}\} = \{(c + tu, f(c + tu)) : t \in \mathbb{R}\}$$
at the point $(c, f(c))$. (See Figure 4.5.)

Figure 4.5. The directional derivative and the graph of a function.


Example 4.10 We give several examples illustrating properties of directional derivatives:
(1) Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x, y) = x^2 + y^2$. Then its graph is given by the function $G : \mathbb{R}^2 \to \mathbb{R}^3$ defined by $G(x, y) = (x, y, x^2 + y^2)$. The directional derivative of $f$ at $(x, y)$ in the direction of $e_1 = (1, 0)$ is $\frac{\partial f}{\partial x} = 2x$. In particular, $\frac{\partial f}{\partial x}(0, 0) = 0$, which implies that the $x$-axis in $\mathbb{R}^2$ is mapped by $G$ onto a curve in $\mathbb{R}^3$ which has a horizontal tangent at $G(0, 0) = (0, 0, 0)$.



Figure 4.6. The figure for Example 4.10 (1): the surface $z = x^2 + y^2$.


(2) For the function $f(x_1, x_2) = (x_1, x_2, x_1^2 + x_2^2)$ and points $c = (c_1, c_2)$, $u = (u_1, u_2)$, we have
$$\frac{f(c + tu) - f(c)}{t} = \frac{1}{t}\big[(c_1 + tu_1,\; c_2 + tu_2,\; (c_1 + tu_1)^2 + (c_2 + tu_2)^2) - (c_1, c_2, c_1^2 + c_2^2)\big]$$
$$= \frac{1}{t}\,(tu_1,\; tu_2,\; 2tu_1c_1 + 2tu_2c_2 + t^2u_1^2 + t^2u_2^2) = (u_1,\; u_2,\; 2u_1c_1 + 2u_2c_2 + tu_1^2 + tu_2^2).$$
Therefore,
$$f_u(c) = \lim_{t \to 0} \big(f(c + tu) - f(c)\big)\frac{1}{t} = (u_1, u_2, 2u_1c_1 + 2u_2c_2).$$
Notice that
$$\begin{bmatrix} u_1 \\ u_2 \\ 2u_1c_1 + 2u_2c_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 2c_1 & 2c_2 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix},$$
so that $f_u(c)$ is a linear function of $u$.

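(3) Let $f : \mathbb{R} \to \mathbb{R}$ be such that $f'(c)$ exists. Then, for $u \ne 0$,
$$\frac{f(c + tu) - f(c)}{t} = \frac{f(c + tu) - f(c)}{tu}\,u$$
so that
$$f_u(c) = \lim_{t \to 0} \frac{f(c + tu) - f(c)}{t} = \lim_{h \to 0} \frac{f(c + h) - f(c)}{h}\,u = f'(c)u.$$
Again, $f_u(c)$ is linear in $u$ (with matrix $[f'(c)]$!).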

(4) Now for the bad news. Define $f : \mathbb{R}^2 \to \mathbb{R}$ by
$$f(x, y) = \begin{cases} 0, & \text{when } y \ne x^2; \\ 1, & \text{when } y = x^2 \text{ and } (x, y) \ne O; \\ 0, & \text{when } (x, y) = O. \end{cases}$$
Then $f_u(0, 0) = 0$ for every $u = (u_1, u_2) \in \mathbb{R}^2$ and is linear in $u$, but $f$ is not continuous at $(0, 0)$. In other words, the directional derivative exists in every direction at the origin, but the function is not continuous there. What happened to the univariate "differentiable implies continuous"?

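(5) This example shows that the directional derivative does not have to be linear in $u$. Define $f : \mathbb{R}^2 \to \mathbb{R}$ by
$$f(x, y) = \begin{cases} \dfrac{x^2 y}{x^3 - y^2}, & \text{when } x^3 \ne y^2; \\ 0, & \text{when } x^3 = y^2. \end{cases}$$
Then $f$ is not continuous at $(0, 0)$ (why?!). Computing $f_{\mathbf{u}}(0, 0)$, for $\mathbf{u} = (u, v) \in \mathbb{R}^2$, we find
$$f_{(u,v)}(0, 0) = \begin{cases} -\dfrac{u^2}{v}, & \text{if } v \ne 0; \\ 0, & \text{if } v = 0. \end{cases}$$
In particular, $f_{(u,v)}(0, 0)$ is not linear in $\mathbf{u} = (u, v)$.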

The preceding examples show that when $f$ is a nice function at $c$, then $f_u(c)$ is linear in $u$, but in general it does not need to be linear even if it exists for all $u$. Observe however, that all the $f_u(c)$ in the above examples are homogeneous in $u$, that is, $f_{\lambda u}(c) = \lambda f_u(c)$ for all $\lambda \in \mathbb{R}$. When they fail to be linear, it is the additive property that fails. The homogeneity in $u$ is always true when $f_u(c)$ exists.
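Homogeneity without additivity can be seen numerically for the function $f(x, y) = x^2 y/(x^3 - y^2)$ of Example 4.10 (5), whose directional derivative at the origin works out to $-u^2/v$ for $v \ne 0$. The sketch below is a hedged illustration: it replaces the limit by a single difference quotient with a small fixed step `t` (an assumption made here), so the printed values are approximations only.

```python
def f(x, y):
    """f(x, y) = x^2 y / (x^3 - y^2), set to 0 where the denominator vanishes."""
    denom = x**3 - y**2
    return 0.0 if denom == 0 else x * x * y / denom

def dir_deriv(f, u, v, t=1e-6):
    """Difference quotient (f(tu, tv) - f(0, 0)) / t approximating f_(u,v)(0, 0)."""
    return (f(t * u, t * v) - f(0.0, 0.0)) / t

d10 = dir_deriv(f, 1.0, 0.0)   # direction e1
d01 = dir_deriv(f, 0.0, 1.0)   # direction e2
d11 = dir_deriv(f, 1.0, 1.0)   # direction e1 + e2
d22 = dir_deriv(f, 2.0, 2.0)   # direction 2(e1 + e2)

print(d10, d01, d11, d22)
```

The quotients in the directions $e_1$ and $e_2$ vanish while the one in the direction $e_1 + e_2$ is close to $-1$, so additivity fails; doubling the direction roughly doubles the value, as homogeneity predicts.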

Exercises
4.11. Prove that if $f_u(c)$ exists, then $f_{\lambda u}(c) = \lambda f_u(c)$.
4.12. Prove that the expressions given for the directional derivatives in (4) and (5) of Example 4.10 are correct.
4.13. Check that the following partial derivatives are correct:
(a) The function $f(x, y) = x^2 + y^3$ has partial derivatives
$$\frac{\partial f}{\partial x}(x, y) = 2x, \qquad \frac{\partial f}{\partial y}(x, y) = 3y^2.$$
(b) The function $f(x, y) = (\sin(xy), e^x, \cos(y))$ has partial derivatives
$$\frac{\partial f}{\partial x}(x, y) = (y\cos(xy), e^x, 0), \qquad \frac{\partial f}{\partial y}(x, y) = (x\cos(xy), 0, -\sin(y)).$$
4.14. For the function $f : \mathbb{R}^2 \to \mathbb{R}^3$ defined by $f(x, y) = (x^2 + y^3, \sin(y), e^x)$, prove that $f_{\mathbf{u}}(x, y)$ exists for each $(x, y) \in \mathbb{R}^2$ and each direction $\mathbf{u} = (u, v)$. Show also that $f_{\mathbf{u}}(x, y)$ is linear in $\mathbf{u} = (u, v)$ with matrix
$$\begin{bmatrix} 2x & 3y^2 \\ 0 & \cos(y) \\ e^x & 0 \end{bmatrix}.$$
Motivation: At first glance it seems that the directional derivative determines the properties of functions in the same way that the derivative in one variable does. Unfortunately, that is not quite the case. For example, in Example 4.10 (4), we have shown that a function may be discontinuous at a point where all directional derivatives exist. Therefore, a more rigorous approach must be used to extend the role of differentiation. By reformulating the definition, we can recover the expected properties, and hence the usefulness, of the derivative. To this end, we reformulate the definition of the derivative of a function in one variable to showcase the linearity found in Example 4.10 (3).
Definition 4.11 (Derivative in one variable reformulated). A function $f : \mathbb{R} \to \mathbb{R}$ is differentiable at $c$ if there exists a number $f'(c)$ such that
$$\lim_{x \to c} \frac{f(x) - f(c)}{x - c} = f'(c)$$
$$\Longleftrightarrow\quad \lim_{x \to c} \Big| \frac{f(x) - f(c)}{x - c} - f'(c) \Big| = \lim_{x \to c} \frac{|f(x) - f(c) - f'(c)(x - c)|}{|x - c|} = 0.$$
Thus, the function $f : \mathbb{R} \to \mathbb{R}$ is differentiable at $c$ if and only if there exists a linear function $L : \mathbb{R} \to \mathbb{R}$ such that
$$\lim_{x \to c} \frac{|f(x) - f(c) - L(x - c)|}{|x - c|} = 0 \tag{4.1}$$
and $L$ is given by $L(u) = f'(c)(u)$, for all $u$ in $\mathbb{R}$.

We can consider the linear function $L$ to be the best linear approximation in the sense of (4.1) to $f(x) - f(c)$ in a neighborhood of $c$, i.e. $f(x) - f(c) \approx L(x - c)$, or $f(x) \approx L(x - c) + f(c) = f'(c)(x - c) + f(c)$. This is the approximation along the tangent line (affine function) learned in calculus.
Definition 4.12 (Differential in several variables). Let $f : D \subset \mathbb{R}^n \to \mathbb{R}^m$ with $c$ an interior point of $D$, the domain of $f$. Suppose $L : \mathbb{R}^n \to \mathbb{R}^m$ is linear and
$$\lim_{p \to c} \frac{|f(p) - f(c) - L(p - c)|}{|p - c|} = 0.$$


Figure 4.7. Derivative as best linear approximation.


Then $f$ is said to be differentiable at $c$, while $L$ is called the differential of $f$ at $c$ and is denoted $Df(c)$, that is, $L(u) = Df(c)u$, for all $u \in \mathbb{R}^n$.
Example 4.13 If $f : \mathbb{R} \to \mathbb{R}$ is given by $f(x) = x^3$, then $f'(c) = 3c^2$, so $L(u) = 3c^2u$ for all $u \in \mathbb{R}$. Indeed, computing
$$|f(p) - f(c) - L(p - c)| = |p^3 - c^3 - 3c^2(p - c)| = |p - c|\,|p^2 + pc + c^2 - 3c^2| = |p - c|\,|p^2 + pc - 2c^2|$$
$$\Longrightarrow\quad \lim_{p \to c} \frac{|f(p) - f(c) - L(p - c)|}{|p - c|} = \lim_{p \to c} |p^2 + pc - 2c^2| = 0.$$

Theorem 4.14. For a function $f : D \subset \mathbb{R}^n \to \mathbb{R}^m$ and a point $c$ interior to $D$, we have
$$Df(c)(u) = L(u) \iff Df_i(c)(u) = L_i(u), \qquad u \in \mathbb{R}^n \text{ and } i = 1, \ldots, m,$$
where
$$f(p) = (f_1(p), \ldots, f_m(p)), \qquad L(u) = (L_1(u), \ldots, L_m(u)).$$
Proof. This follows because the limit exists if and only if the limit of each component exists (this uses the equivalence of $d(p, c)$ and $d_\infty(p, c)$):
$$\lim_{p \to c} \frac{|f(p) - f(c) - L(p - c)|}{|p - c|} = 0 \iff \lim_{p \to c} \frac{|f_i(p) - f_i(c) - L_i(p - c)|}{|p - c|} = 0, \quad i = 1, \ldots, m.$$
The matrix for the differential must have the form provided by the next theorem.
Theorem 4.15. If $f : D \subset \mathbb{R}^n \to \mathbb{R}^m$ and $f$ is differentiable at an interior point $c$ of $D$, then


(i) $f_u(c)$ exists for each $u \in \mathbb{R}^n$ and $Df(c)(u) = f_u(c)$.
(ii) The matrix of the linear transformation $Df(c)$ is
$$\begin{bmatrix} \frac{\partial f_1}{\partial x_1}(c) & \cdots & \frac{\partial f_1}{\partial x_n}(c) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(c) & \cdots & \frac{\partial f_m}{\partial x_n}(c) \end{bmatrix}.$$
This matrix is the Jacobian matrix of $f$ at $c$, and can be denoted also as $\big[\frac{\partial f}{\partial x}(c)\big]$, or simply $f'(c)$.

Proof. (i) If $Df(c)$ exists, then for $u \in \mathbb{R}^n \setminus \{O\}$ and $p = c + tu$,
$$\lim_{t \to 0} \frac{|f(c + tu) - f(c) - Df(c)(tu)|}{|tu|} = 0$$
$$\Longrightarrow\quad \frac{1}{|u|} \lim_{t \to 0} \Big| \frac{f(c + tu) - f(c)}{t} - Df(c)(u) \Big| = 0$$
$$\Longrightarrow\quad \lim_{t \to 0} \frac{f(c + tu) - f(c)}{t} = Df(c)(u).$$
Here we used the homogeneous property of linearity, $Df(c)(tu) = tDf(c)(u)$, for the second step.
(ii) Recall from Theorem 4.2 that the matrix of $L$ is $[C_{i,j}] = [L_i(e_j)]$. Here
$$L_i(e_j) = Df_i(c)(e_j) = (f_i)_{e_j}(c) \quad\text{(by part (i))}\quad = \frac{\partial f_i}{\partial x_j}(c), \qquad i = 1, \ldots, m, \; j = 1, \ldots, n.$$
Notes: In the case when $n = m$, the determinant $\det f'(c)$ is called the Jacobian of $f$ at $c$ and is often denoted by
$$J_f(c) \qquad\text{or}\qquad \frac{\partial(f_1, \ldots, f_n)}{\partial(x_1, \ldots, x_n)}(c).$$
Part (ii) of the Theorem means that if $u = (u_1, \ldots, u_n) \in \mathbb{R}^n$, then for $i = 1, \ldots, m$,
$$L_i(u) = Df_i(c)(u) = \sum_{j=1}^{n} \frac{\partial f_i}{\partial x_j}(c)\,u_j.$$
Corollary 4.16. If the differential exists, it is unique.

Proof. $f_u(c)$ is unique (from the uniqueness of limits) for each $u \in \mathbb{R}^n$. Alternatively, the partials $\frac{\partial f_i}{\partial x_j}(c)$ are unique and hence the matrix $f'(c) = \big[\frac{\partial f_i}{\partial x_j}(c)\big]$ is unique.


Theorem 4.17. If $f$ is differentiable at $c$, then $f$ is continuous at $c$.

Proof. Let $L = Df(c)$. From the definition of $Df(c)$, given $\epsilon > 0$, there is a $\delta(\epsilon) > 0$ such that
$$0 < |p - c| < \delta(\epsilon) \implies \frac{|f(p) - f(c) - L(p - c)|}{|p - c|} < \epsilon,$$
i.e. $|f(p) - f(c) - L(p - c)| < \epsilon|p - c|$. In particular, with $\epsilon = 1$, we have
$$0 < |p - c| < \delta(1) \implies |f(p) - f(c) - L(p - c)| < |p - c|$$
$$\implies |f(p) - f(c)| \le (1 + M)|p - c| \quad\text{(Theorem 4.4)}$$
$$\implies \lim_{p \to c} f(p) = f(c).$$
Theorem 4.18 (Important). Let $f : \mathbb{R}^n \to \mathbb{R}^m$. If the partial derivatives
$$\frac{\partial f_i}{\partial x_j}, \qquad i = 1, \ldots, m, \quad j = 1, \ldots, n,$$
exist in a neighborhood of $c$ and are continuous at $c$, then $f$ is differentiable at $c$.

Proof. By Theorem 4.14, it is sufficient to prove the case $m = 1$ (i.e. to assume $f$ is a real-valued function). By the hypothesis, for each $\epsilon > 0$, there exists a $\delta > 0$ such that
$$|p - c| < \delta \implies \frac{\partial f}{\partial x_i}(p) \text{ exists and } \Big| \frac{\partial f}{\partial x_i}(p) - \frac{\partial f}{\partial x_i}(c) \Big| < \epsilon.$$
Let $p = (x_1, \ldots, x_n)$, $c = (\gamma_1, \ldots, \gamma_n)$, and $|p - c| < \delta$. Define
$$p = p_0, \quad p_1 = (\gamma_1, x_2, \ldots, x_n), \quad p_2 = (\gamma_1, \gamma_2, x_3, \ldots, x_n), \quad \ldots, \quad c = p_n.$$
(See Figure 4.8.)

Figure 4.8. Moving in successive coordinate directions.

Note that
$$|p_k - c| \le |p - c| < \delta \tag{4.2}$$
and
$$f(p) - f(c) = f(p_0) - f(p_n) = \sum_{k=1}^{n} \big[f(p_{k-1}) - f(p_k)\big]. \tag{4.3}$$
i

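Now on each line segment between $p_{k-1}$ and $p_k$, $k = 1, \ldots, n$, we are really considering a real-valued differentiable function of a real variable, so we may use the Mean Value Theorem:
$$f(p_{k-1}) - f(p_k) = \frac{\partial f}{\partial x_k}(p_k^*)(x_k - \gamma_k) \tag{4.4}$$
where $p_k^*$ is on the line segment from $p_{k-1}$ to $p_k$, $k = 1, \ldots, n$. Clearly, $p_k^* \in \{p : |p - c| < \delta\}$ by (4.2), since this set is convex. From (4.3) and (4.4), we find
$$f(p) - f(c) = \sum_{k=1}^{n} \frac{\partial f}{\partial x_k}(p_k^*)(x_k - \gamma_k).$$
If $u = (u_1, \ldots, u_n)$, let $L(u)$ be defined by
$$L(u) = \sum_{k=1}^{n} \frac{\partial f}{\partial x_k}(c)\,u_k.$$
Then since $|p_k^* - c| < \delta$ when $|p - c| < \delta$, we have
$$|f(p) - f(c) - L(p - c)| = \Big| \sum_{k=1}^{n} \Big[ \frac{\partial f}{\partial x_k}(p_k^*) - \frac{\partial f}{\partial x_k}(c) \Big](x_k - \gamma_k) \Big|$$
$$\le \Big( \sum_{k=1}^{n} \Big[ \frac{\partial f}{\partial x_k}(p_k^*) - \frac{\partial f}{\partial x_k}(c) \Big]^2 \Big)^{1/2} \Big( \sum_{k=1}^{n} (x_k - \gamma_k)^2 \Big)^{1/2} \quad\text{(Cauchy-Schwarz)}$$
$$\le \sqrt{n}\,\epsilon\,|p - c|$$
$$\Longrightarrow\quad \frac{|f(p) - f(c) - L(p - c)|}{|p - c|} \le \sqrt{n}\,\epsilon, \quad\text{when } 0 < |p - c| < \delta.$$
Therefore, $Df(c)$ exists and $Df(c) = L$.

Please note that the last Theorem gives only a sufficient condition for the function to be differentiable at a point.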
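Example 4.19 Let $f : \mathbb{R}^2 \to \mathbb{R}^3$ be given by $f(x_1, x_2) = (x_1, x_2, x_1^2 + x_2^2)$. The partial derivatives are continuous on $\mathbb{R}^2$ so $f$ is differentiable and $Df(x_1, x_2)$ has the matrix representation
$$f'(x_1, x_2) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 2x_1 & 2x_2 \end{bmatrix}.$$
For example, if $L = Df(0, 0)$, then $L : \mathbb{R}^2 \to \mathbb{R}^3$ and
$$\begin{bmatrix} L_1 u \\ L_2 u \\ L_3 u \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}, \quad\text{where } u = (u_1, u_2).$$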


In other words, $Df(0, 0)(u_1, u_2) = (u_1, u_2, 0)$ for each $(u_1, u_2) \in \mathbb{R}^2$. Check for yourself that
$$Df(0, 1)(u_1, u_2) = (u_1, u_2, 2u_2),$$
and
$$Df(1, 1)(u_1, u_2) = (u_1, u_2, 2u_1 + 2u_2).$$
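The Jacobian matrix of Theorem 4.15 can be approximated column by column with difference quotients in the coordinate directions. The sketch below does this for $f(x_1, x_2) = (x_1, x_2, x_1^2 + x_2^2)$ at $c = (1, 1)$ and then applies the resulting matrix to a direction $u$. It is only an illustration: the use of central differences, the step `h`, and the helper names `jacobian` and `differential` are choices made here, not part of the notes.

```python
def f(x1, x2):
    """The map of Example 4.19."""
    return (x1, x2, x1 * x1 + x2 * x2)

def jacobian(f, c, h=1e-6):
    """Central-difference approximation of the m x n Jacobian matrix of f at c."""
    n = len(c)
    columns = []
    for j in range(n):
        plus = list(c)
        plus[j] += h
        minus = list(c)
        minus[j] -= h
        fp, fm = f(*plus), f(*minus)
        # Column j approximates the partial derivative of f with respect to x_j.
        columns.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    # columns[j][i] is d f_i / d x_j; transpose so that rows are indexed by i.
    return [list(row) for row in zip(*columns)]

def differential(J, u):
    """Df(c)(u) computed as the matrix-vector product J u."""
    return [sum(a * b for a, b in zip(row, u)) for row in J]

J = jacobian(f, (1.0, 1.0))
value = differential(J, (3.0, -2.0))
print(J)
print(value)
```

Up to rounding, `J` has rows $(1, 0)$, $(0, 1)$, $(2, 2)$, and for $u = (3, -2)$ the product agrees with $(u_1, u_2, 2u_1 + 2u_2) = (3, -2, 2)$.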


Example 4.20 Let $f : \mathbb{R} \to \mathbb{R}^3$ be defined as $f(t) = (\cos(t), \sin(t), t)$. Then
$$f'(t) = \begin{bmatrix} -\sin(t) \\ \cos(t) \\ 1 \end{bmatrix};$$
and $Df(0)(u) = (0, u, u)$, $Df(\pi/2)(u) = (-u, 0, u)$.

Example 4.21 Let $f : \mathbb{R}^3 \to \mathbb{R}$ be defined as $f(x, y, z) = x^2y + z$. Then $f'(x, y, z) = (2xy, x^2, 1)$ and $Df(0, 0, 0)(u_1, u_2, u_3) = u_3$, while $Df(1, 1, 0)(u_1, u_2, u_3) = 2u_1 + u_2 + u_3$.
Example 4.22 Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be defined as $f(x, y) = (x + y, (x + y)^2)$. Then
$$f'(x, y) = \begin{bmatrix} 1 & 1 \\ 2(x + y) & 2(x + y) \end{bmatrix} \quad\text{and}\quad Df(x_0, y_0)(u, v) = (u + v,\; 2(x_0 + y_0)(u + v)).$$
Interpretation: Suppose $f : \mathbb{R}^n \to \mathbb{R}^m$ and let $L : \mathbb{R}^n \to \mathbb{R}^m$ be linear. Now
$$\lim_{u \to O} \frac{|f(c + u) - f(c) - L(u)|}{|u|} = 0 \quad\text{if } L = Df(c),$$
so $L(u)$ is the best linear approximation to $f(c + u) - f(c)$ near $u = O$. Equivalently, $f(c) + L(u)$ is the best affine approximation to $f(c + u)$ near $u = O$. You can think of this as saying that the affine set $f(c) + L(\mathbb{R}^n)$ is tangent to $f(\mathbb{R}^n)$ at $f(c)$.
In Example 4.19 above, $f(x_1, x_2) = (x_1, x_2, x_1^2 + x_2^2)$. If $L = Df(0, 0)$, the tangent at $f(0, 0) = (0, 0, 0)$ is $f(0, 0) + L(\mathbb{R}^2)$, the set of all vectors of the form $(0, 0, 0) + (u_1, u_2, 0)$, i.e., $\{(u_1, u_2, 0) : (u_1, u_2) \in \mathbb{R}^2\}$, which is the plane $y_3 = 0$ in $\mathbb{R}^3$. Similarly, with $L = Df(0, 1)$, the tangent at $f(0, 1) = (0, 1, 1)$ is
$$f(0, 1) + L(\mathbb{R}^2) = \{(0, 1, 1) + (u_1, u_2, 2u_2) : (u_1, u_2) \in \mathbb{R}^2\} = \{(u_1, u_2 + 1, 2u_2 + 1) : (u_1, u_2) \in \mathbb{R}^2\}.$$
Thus, the tangent is the plane $y_3 - 1 = 2(y_2 - 1)$. Notice that the tangent at every point on $f(\mathbb{R}^2)$ is a plane since the rank of $f'(x_1, x_2)$ is 2 for all $(x_1, x_2) \in \mathbb{R}^2$.
In Example 4.20, $f(t) = (\cos(t), \sin(t), t)$. The tangent at $f(0) = (1, 0, 0)$ is $f(0) + L(\mathbb{R})$, where $L = Df(0)$, i.e., $\{(1, 0, 0) + (0, u, u) : u \in \mathbb{R}\} = \{(1, u, u) : u \in \mathbb{R}\}$, a straight line through $(1, 0, 0)$ and $(1, 1, 1)$.
In Example 4.21, $f(x, y, z) = x^2y + z$. The tangent at $f(0, 0, 0) = 0$, if $L = Df(0, 0, 0)$, is $f(0, 0, 0) + L(\mathbb{R}^3) = \{u_3 : (u_1, u_2, u_3) \in \mathbb{R}^3\}$, i.e., the whole set $\mathbb{R}$. The picture is not very informative in this case.



Figure 4.9. Illustration for Example 4.19.

Figure 4.10. Illustration for Example 4.20.

Figure 4.11. Illustration for Example 4.21.

Exercises
4.16. In Example 4.22, sketch the range $f(\mathbb{R}^2)$ (it is a curve in $\mathbb{R}^2$). Find the tangent at one or two points of the range.
4.17. Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be defined as $f(x, y) = (x^2 + y, y^2)$. Check that
$$f'(x_0, y_0) = \begin{bmatrix} 2x_0 & 1 \\ 0 & 2y_0 \end{bmatrix}.$$


Write down $Df(x_0, y_0)(\xi, \eta)$ and verify that
$$\lim_{(\xi, \eta) \to (0, 0)} \frac{|f(\xi, \eta) - f(0, 0) - Df(0, 0)(\xi, \eta)|}{|(\xi, \eta)|} = 0.$$
4.18. If $f : \mathbb{R}^n \to \mathbb{R}$ is such that
(i) $f$ has an interior relative minimum (maximum) at $c$,
(ii) $Df(c)$ exists,
then $Df(c)(u) = 0$ for all $u \in \mathbb{R}^n$. [Hint: Show $\frac{\partial f}{\partial x_i}(c) = 0$, for $i = 1, \ldots, n$.]
4.19. Let $f$ be a real-valued continuous function on a compact subset $K$ of $\mathbb{R}^n$ such that
(i) $f(p) = 0$ if $p \in \partial K$, the boundary of $K$,
(ii) $Df(p)$ exists if $p \in K^\circ \ne \emptyset$ (the derivative exists in the interior of $K$).
Show that there is a point $p_0 \in K^\circ$ such that
$$Df(p_0)(u) = 0, \quad\text{for all } u \in \mathbb{R}^n.$$
[This is a generalization of Rolle's Theorem.]

4.20. For each of the following functions, write down the Jacobian matrix at the point indicated:
(a) $f(x, y) = 3x^2y - xy^3 + 2$ at $(1, 2)$.
(b) $h(u, v) = (u\sin(uv), v\cos(uv))$ at $(\pi/4, 1)$.


(c) g(x, y, z) = (x2 yz, 3xz 2) at (1, 2, 1).

4.21. Let $f$ be a real-valued function on an open set $U$ in $\mathbb{R}^2$ such that $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ exist and are bounded on $U$. Prove that $f$ is continuous on $U$. [Hint: $f(x, y) - f(x_0, y_0) = f(x, y) - f(x_0, y) + f(x_0, y) - f(x_0, y_0)$, the old polygon-in-a-convex-set trick.]
4.22. Let $f$ be a real-valued function on an open connected set $U$ in $\mathbb{R}^2$ such that $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ exist and are zero at each point of $U$. Prove that $f$ is constant.
4.23. If $f$ is a real-valued differentiable function on $\mathbb{R}^n$ and $c \in \mathbb{R}^n$, then the vector
$$\operatorname{grad} f(c) = \nabla f(c) = \Big( \frac{\partial f}{\partial x_1}(c), \ldots, \frac{\partial f}{\partial x_n}(c) \Big)$$
is called the gradient of $f$ at $c$. Show that
$$f_u(c) = Df(c)(u) = \nabla f(c) \cdot u, \quad\text{for each } u \in \mathbb{R}^n.$$
Deduce that the largest value of $f_u(c)$ under the restriction $|u| = 1$ is $|\nabla f(c)|$ and this value is attained when $u = \nabla f(c)/|\nabla f(c)|$. This means that the direction of maximum rate of increase of $f$ at $c$ is the direction of the gradient vector. [Hint: What was Cauchy-Schwarz's first name?]


4.24. Sketch the surface $\{(x, y, xy) : (x, y) \in \mathbb{R}^2\}$ in $\mathbb{R}^3$ (i.e. $z = xy$) and show that the tangent to this surface at the point $(1, 1, 1)$ is $(1, 1, 1) + \{(u, v, u + v) : (u, v) \in \mathbb{R}^2\}$ (i.e. $z + 1 = x + y$).
4.25. If $L : \mathbb{R}^n \to \mathbb{R}^m$ is linear, then the differential $DL$ exists and equals $L$ at each point in $\mathbb{R}^n$.

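4.2.1 Differentiation Rules

Theorem 4.23. Let $\varphi$, $\psi$ be functions from $\mathbb{R}^n$ to $\mathbb{R}^m$ which are differentiable at $c \in \mathbb{R}^n$.
(i) If $h = \alpha\varphi + \beta\psi$, $\alpha, \beta \in \mathbb{R}$, then $h$ is differentiable at $c$ and
$$Dh(c)(u) = \alpha D\varphi(c)(u) + \beta D\psi(c)(u), \quad\text{for each } u \in \mathbb{R}^n.$$
(ii) If $k = \varphi \cdot \psi = \langle \varphi, \psi \rangle$ (so $k : \mathbb{R}^n \to \mathbb{R}$), then $k$ is differentiable at $c$ and
$$Dk(c)(u) = \psi(c) \cdot D\varphi(c)(u) + \varphi(c) \cdot D\psi(c)(u), \quad\text{for each } u \in \mathbb{R}^n.$$
Proof. (i) Let $L_\varphi = D\varphi(c)$, $L_\psi = D\psi(c)$, and consider the function $L : \mathbb{R}^n \to \mathbb{R}^m$ defined as $L(u) = \alpha L_\varphi(u) + \beta L_\psi(u)$. Clearly, $L$ is linear and
$$\frac{|h(p) - h(c) - L(p - c)|}{|p - c|} = \frac{|\alpha\varphi(p) + \beta\psi(p) - \alpha\varphi(c) - \beta\psi(c) - \alpha L_\varphi(p - c) - \beta L_\psi(p - c)|}{|p - c|}$$
$$\le |\alpha|\,\frac{|\varphi(p) - \varphi(c) - L_\varphi(p - c)|}{|p - c|} + |\beta|\,\frac{|\psi(p) - \psi(c) - L_\psi(p - c)|}{|p - c|},$$
by the triangle inequality. Both terms in this last expression have a limit of zero as $p \to c$; therefore, $Dh(c)$ exists and equals the specified $L$.
(ii) Let $L : \mathbb{R}^n \to \mathbb{R}$ be defined by
$$L(u) = \psi(c) \cdot L_\varphi(u) + \varphi(c) \cdot L_\psi(u), \qquad u \in \mathbb{R}^n.$$
Then
$$\frac{|k(p) - k(c) - L(p - c)|}{|p - c|} = \frac{|\varphi(p) \cdot \psi(p) - \varphi(c) \cdot \psi(c) - \psi(c) \cdot L_\varphi(p - c) - \varphi(c) \cdot L_\psi(p - c)|}{|p - c|}$$
$$= \big|\psi(p) \cdot [\varphi(p) - \varphi(c) - L_\varphi(p - c)] + \varphi(c) \cdot [\psi(p) - \psi(c) - L_\psi(p - c)] + [\psi(p) - \psi(c)] \cdot L_\varphi(p - c)\big| \,/\, |p - c|$$
$$\le |\psi(p)|\,\frac{|\varphi(p) - \varphi(c) - L_\varphi(p - c)|}{|p - c|} + |\varphi(c)|\,\frac{|\psi(p) - \psi(c) - L_\psi(p - c)|}{|p - c|} + |\psi(p) - \psi(c)|\,\frac{|L_\varphi(p - c)|}{|p - c|}.$$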
i
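The last inequality was the result of several applications of the Cauchy-Schwarz inequality. As $p \to c$ in the last expression, the first term goes to zero since $\psi$ is continuous and $D\varphi(c)$ exists, the second term goes to zero since $D\psi(c)$ exists, while the third term goes to zero by the continuity of $\psi$ at $c$ and the fact that $|L_\varphi(p - c)| \le M|p - c|$ for some constant $M$ since $L_\varphi$ is linear.
Theorem 4.24 (The Chain Rule). Suppose $\varphi : \mathbb{R}^n \to \mathbb{R}^m$ and $\psi : \mathbb{R}^m \to \mathbb{R}^\ell$, and
(i) $\varphi$ is differentiable at $c \in \mathbb{R}^n$,
(ii) $\psi$ is differentiable at $b = \varphi(c) \in \mathbb{R}^m$.
Then $f = \psi \circ \varphi$, defined by $f(p) = \psi(\varphi(p))$, is differentiable at $c$ and
$$Df(c) = D\psi(b) \circ D\varphi(c).$$
In particular, the Jacobian matrices $f'$, $\psi'$, $\varphi'$ satisfy
$$f'(c) = \psi'(b)\,\varphi'(c),$$
that is,
$$\begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_\ell}{\partial x_1} & \cdots & \frac{\partial f_\ell}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \frac{\partial \psi_1}{\partial y_1} & \cdots & \frac{\partial \psi_1}{\partial y_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial \psi_\ell}{\partial y_1} & \cdots & \frac{\partial \psi_\ell}{\partial y_m} \end{bmatrix} \begin{bmatrix} \frac{\partial \varphi_1}{\partial x_1} & \cdots & \frac{\partial \varphi_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial \varphi_m}{\partial x_1} & \cdots & \frac{\partial \varphi_m}{\partial x_n} \end{bmatrix}.$$
Writing this entry-wise,
$$\frac{\partial f_i}{\partial x_j}(c) = \sum_{k=1}^{m} \frac{\partial \psi_i}{\partial y_k}(b)\,\frac{\partial \varphi_k}{\partial x_j}(c), \qquad i = 1, \ldots, \ell, \quad j = 1, \ldots, n.$$
Another form often used in textbooks is the following: If $f = \psi(y_1, \ldots, y_m)$, $y_k = \varphi_k(x_1, \ldots, x_n)$, $k = 1, \ldots, m$, then
$$\frac{\partial f}{\partial x_j} = \sum_{k=1}^{m} \frac{\partial f}{\partial y_k}\,\frac{\partial y_k}{\partial x_j},$$
provided all the functions involved are smooth.

Example 4.25 $n = m = \ell = 1$ and $f(t) = \psi(\varphi(t))$. If $\varphi'(t_0)$ and $\psi'(\varphi(t_0))$ exist, then $f'(t_0)$ exists and
$$f'(t_0) = \psi'(\varphi(t_0))\,\varphi'(t_0).$$
Less precisely, $\dfrac{df}{dt} = \dfrac{d\psi}{d\varphi}\,\dfrac{d\varphi}{dt}$.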
i
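Example 4.26 $n = m = 2$, $\ell = 1$. Write
$$f(x, y) = \psi(u, v), \quad\text{where}\quad u = u(x, y), \; v = v(x, y).$$
In other words, $f = \psi \circ \varphi$ where $\varphi(x, y) = (u(x, y), v(x, y))$. If $D\varphi(x_0, y_0)$ and $D\psi(u_0, v_0)$ exist, where $u_0 = u(x_0, y_0)$ and $v_0 = v(x_0, y_0)$, then $Df(x_0, y_0)$ exists and
$$f'(x_0, y_0) = \psi'(u_0, v_0)\,\varphi'(x_0, y_0).$$
In matrix form,
$$\begin{bmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{bmatrix}_{(x_0, y_0)} = \begin{bmatrix} \frac{\partial \psi}{\partial u} & \frac{\partial \psi}{\partial v} \end{bmatrix}_{(u_0, v_0)} \begin{bmatrix} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} \\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} \end{bmatrix}_{(x_0, y_0)}.$$
Less precisely,
$$\frac{\partial f}{\partial x} = \frac{\partial \psi}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial \psi}{\partial v}\frac{\partial v}{\partial x}, \qquad \frac{\partial f}{\partial y} = \frac{\partial \psi}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial \psi}{\partial v}\frac{\partial v}{\partial y}.$$
As a particular example, if $h(r, \theta) = g(u, v)$ where $u = r\cos(\theta)$ and $v = r\sin(\theta)$, then
$$\frac{\partial h}{\partial r} = \frac{\partial g}{\partial u}\cos(\theta) + \frac{\partial g}{\partial v}\sin(\theta), \qquad \frac{\partial h}{\partial \theta} = \frac{\partial g}{\partial u}(-r\sin(\theta)) + \frac{\partial g}{\partial v}\,r\cos(\theta).$$
Example 4.27 $n = 1$, $m = 2$, $\ell = 1$, $F(t) = h(x, y)$ where $x = r(t)$, $y = s(t)$. Then
$$\frac{dF}{dt} = \begin{bmatrix} \frac{\partial h}{\partial x} & \frac{\partial h}{\partial y} \end{bmatrix} \begin{bmatrix} \frac{dx}{dt} \\ \frac{dy}{dt} \end{bmatrix},$$
or,
$$\frac{dF}{dt} = \frac{\partial h}{\partial x}\frac{dx}{dt} + \frac{\partial h}{\partial y}\frac{dy}{dt}.$$
Example 4.28 Suppose a particle's position $(x, y, z)$ in space is given at time $t$ by $x = \cos(t)$, $y = \sin(t)$, $z = t$ (it is moving on a helix), and the temperature at any point $(x, y, z)$ is given by $T(x, y, z) = x^2 + y^2 + z^2$. If $H(t)$ is the temperature of the particle at time $t$, find $dH/dt$.
In this case, $H = T \circ f$, where
$$T(x, y, z) = x^2 + y^2 + z^2, \qquad f(t) = (\cos(t), \sin(t), t).$$
From the Chain Rule, $H'(t) = T'(x, y, z)\,f'(t)$; more exactly,
$$\frac{dH}{dt} = \begin{bmatrix} 2x & 2y & 2z \end{bmatrix} \begin{bmatrix} -\sin(t) \\ \cos(t) \\ 1 \end{bmatrix}, \quad\text{or}\quad \frac{dH}{dt} = -2x\sin(t) + 2y\cos(t) + 2z = 2t.$$
In fact for this case, it is easier to find $\frac{dH}{dt}$ directly:
$$H(t) = T(x, y, z) = x^2 + y^2 + z^2 = \cos^2(t) + \sin^2(t) + t^2 = 1 + t^2,$$
so $dH/dt = 2t$.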
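The Chain Rule computation for the helix can be cross-checked numerically. The following is a small sketch under stated assumptions: it evaluates $H'(t) = T'(f(t))\,f'(t)$ as a row vector times a column vector and compares the result with a central difference quotient of $H = T \circ f$. The sample time `t0`, the step `h`, and the function names are illustrative choices.

```python
import math

def grad_T(x, y, z):
    """Row vector T'(x, y, z) = (2x, 2y, 2z) for T = x^2 + y^2 + z^2."""
    return (2 * x, 2 * y, 2 * z)

def f(t):
    """Position on the helix at time t."""
    return (math.cos(t), math.sin(t), t)

def f_prime(t):
    """Velocity vector f'(t)."""
    return (-math.sin(t), math.cos(t), 1.0)

def dH_dt(t):
    """Chain Rule: H'(t) = T'(f(t)) f'(t), a row vector times a column vector."""
    return sum(g * v for g, v in zip(grad_T(*f(t)), f_prime(t)))

def H(t):
    """Temperature of the particle at time t, H = T(f(t))."""
    x, y, z = f(t)
    return x * x + y * y + z * z

t0 = 1.3
chain = dH_dt(t0)
h = 1e-6
numeric = (H(t0 + h) - H(t0 - h)) / (2 * h)  # central difference quotient
print(chain, numeric, 2 * t0)
```

Both values agree with $2t_0$ up to rounding, matching the direct computation $H(t) = 1 + t^2$.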

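Proof of Theorem 4.24. Let $L_\varphi = D\varphi(c)$, $L_\psi = D\psi(b)$ and $b = \varphi(c)$. We make two assertions:
1. $\lim_{p \to c} |\psi(\varphi(p)) - \psi(\varphi(c)) - L_\psi(\varphi(p) - \varphi(c))| / |p - c| = 0$.
2. $\lim_{p \to c} |L_\psi(\varphi(p) - \varphi(c) - L_\varphi(p - c))| / |p - c| = 0$.
Assuming these, if $f = \psi \circ \varphi$, then
$$\frac{|f(p) - f(c) - L_\psi \circ L_\varphi(p - c)|}{|p - c|} = \frac{|\psi(\varphi(p)) - \psi(\varphi(c)) - L_\psi(\varphi(p) - \varphi(c)) + L_\psi(\varphi(p) - \varphi(c) - L_\varphi(p - c))|}{|p - c|}$$
$$\le \frac{|\psi(\varphi(p)) - \psi(\varphi(c)) - L_\psi(\varphi(p) - \varphi(c))|}{|p - c|} + \frac{|L_\psi(\varphi(p) - \varphi(c) - L_\varphi(p - c))|}{|p - c|} \to 0$$
as $p \to c$ by (1) and (2). Since $L_\psi \circ L_\varphi$ is a linear function, we have shown that $Df(c)$ exists and equals $L_\psi \circ L_\varphi$.
It remains to prove assertions (1) and (2). For assertion (1), let $\epsilon > 0$ be given. Since $\varphi$ is differentiable at $c$, there is a neighborhood $U$ of $c$ and a constant $K > 0$ such that if $p \in U$, then
$$|\varphi(p) - \varphi(c)| \le K|p - c|$$
(see the proof of Theorem 4.17). Since $L_\psi$ is the differential of $\psi$ at $b = \varphi(c)$, there exists a $\delta > 0$ such that
$$|q - b| < \delta \implies |\psi(q) - \psi(b) - L_\psi(q - b)| \le \frac{\epsilon}{K}\,|q - b|.$$
Thus, from these two inequalities, if $p$ is sufficiently close to $c$, we have
$$|\psi(\varphi(p)) - \psi(\varphi(c)) - L_\psi(\varphi(p) - \varphi(c))| \le \frac{\epsilon}{K}\,|\varphi(p) - \varphi(c)| \le \epsilon|p - c|.$$
Therefore, assertion (1) is proved.

For the proof of assertion (2), from the linearity of $L_\psi$, there is a constant $M > 0$ such that $|L_\psi(u)| \le M|u|$, for all $u \in \mathbb{R}^m$. Thus,
$$\frac{|L_\psi(\varphi(p) - \varphi(c) - L_\varphi(p - c))|}{|p - c|} \le M\,\frac{|\varphi(p) - \varphi(c) - L_\varphi(p - c)|}{|p - c|} \to 0$$
as $p \to c$ by definition of $L_\varphi$.
The Chain Rule in higher dimensions is more interesting than in one dimension; for example, it includes the rules for differentiating sums and products as special cases. Thus, Theorem 4.23 is a corollary to Theorem 4.24. This can be seen by considering the functions
$$F : \mathbb{R}^n \to \mathbb{R}^{2m}, \quad F(p) = (\varphi(p), \psi(p)),$$
$$G : \mathbb{R}^{2m} \to \mathbb{R}^m, \quad G(q_1, q_2) = \alpha q_1 + \beta q_2,$$
$$H : \mathbb{R}^{2m} \to \mathbb{R}, \quad H(q_1, q_2) = q_1 \cdot q_2, \qquad q_1, q_2 \in \mathbb{R}^m.$$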


Then $G \circ F = \alpha\varphi + \beta\psi$ and $H \circ F = \varphi \cdot \psi$.
Theorem 4.29 (Mean Value Theorem). Suppose $f : \mathbb{R}^n \to \mathbb{R}$. If $a, b \in \mathbb{R}^n$ and $f$ is differentiable at each point of the line segment $S$ between $a$ and $b$, then there is a point $c \in S$, $c \ne a, b$, such that
$$f(b) - f(a) = Df(c)(b - a).$$
In the notation of Exercise 4.23, this may be written as
$$f(b) - f(a) = \nabla f(c) \cdot (b - a) = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}(c)(b_j - a_j),$$
where $b = (b_1, \ldots, b_n)$ and $a = (a_1, \ldots, a_n)$.
Proof. We reduce the problem to one variable by introducing the function $F : [0, 1] \to \mathbb{R}$ defined by
$$F(t) = f(a + t(b - a)) = f(\varphi(t)), \qquad 0 \le t \le 1, \quad \varphi(t) = a + t(b - a).$$
By the Chain Rule, $F'(t)$ exists for $0 \le t \le 1$ and
$$F'(t) = DF(t)(1) = Df(\varphi(t)) \circ D\varphi(t)(1) = Df(\varphi(t))\,\varphi'(t) = Df(\varphi(t))(b - a).$$
By the Mean Value Theorem for $F(t)$,
$$F(1) - F(0) = F'(t_0)(1 - 0) = F'(t_0),$$
for some $t_0 \in (0, 1)$. Thus,
$$f(b) - f(a) = Df(\varphi(t_0))(b - a) = Df(c)(b - a), \qquad c = \varphi(t_0).$$
The Mean Value Theorem does not hold for functions $f : \mathbb{R}^n \to \mathbb{R}^m$ if $m > 1$. Can you tell why? If you cannot, try to carry out a proof with $m > 1$. See Exercise 4.30 for what can be said.
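For a concrete real-valued function the point $c$ of the Mean Value Theorem can be located by following the proof: form $F'(t) = Df(a + t(b - a))(b - a)$ and solve $F'(t_0) = f(b) - f(a)$ on $(0, 1)$. The sketch below is an illustration only. The function $f(x, y) = x^2 + y^2$, the endpoints, and the use of bisection are assumptions made here, and bisection is justified only because $F'$ happens to be increasing for this $f$.

```python
def f(x, y):
    """Sample real-valued function (an illustrative choice)."""
    return x * x + y * y

def grad_f(x, y):
    """Gradient of the sample function, so Df(p)(u) is the dot product with u."""
    return (2 * x, 2 * y)

a, b = (0.0, 0.0), (1.0, 2.0)
d = (b[0] - a[0], b[1] - a[1])  # the vector b - a

def F_prime(t):
    """F'(t) = Df(a + t(b - a))(b - a) for F(t) = f(a + t(b - a))."""
    p = (a[0] + t * d[0], a[1] + t * d[1])
    g = grad_f(*p)
    return g[0] * d[0] + g[1] * d[1]

target = f(*b) - f(*a)

# Bisection on F'(t) - target; valid here because F' is increasing for this f.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if F_prime(mid) < target:
        lo = mid
    else:
        hi = mid

t0 = (lo + hi) / 2
c = (a[0] + t0 * d[0], a[1] + t0 * d[1])
print(t0, c, F_prime(t0), target)
```

Here $F(t) = 5t^2$, so $t_0 = 1/2$ and $c$ is the midpoint of the segment; for a general $f$ the point need not be the midpoint or unique.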

Exercises
4.26. Let $f : \mathbb{R} \to \mathbb{R}$ be differentiable. If $F : \mathbb{R}^2 \to \mathbb{R}$ is defined by
(a) $F(x, y) = f(xy)$, then $x\frac{\partial F}{\partial x} = y\frac{\partial F}{\partial y}$,
(b) $F(x, y) = f(ax + by)$, then $b\frac{\partial F}{\partial x} = a\frac{\partial F}{\partial y}$,
(c) $F(x, y) = f(x^2 + y^2)$, then $y\frac{\partial F}{\partial x} = x\frac{\partial F}{\partial y}$.
4.27. Define $f : \mathbb{R}^2 \to \mathbb{R}$ by
$$f(x, y) = \begin{cases} (x^2 + y^2)\sin\Big(\dfrac{1}{x^2 + y^2}\Big), & \text{if } (x, y) \ne (0, 0), \\ 0, & \text{if } (x, y) = (0, 0). \end{cases}$$
Show that $f$ is differentiable at $(0, 0)$, but $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are not continuous at $(0, 0)$.
4.28. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a differentiable function and $C$ be a smooth curve in $\mathbb{R}^n$ on which the function is constant. Prove that for any point $c \in C$, the tangent to $C$ at $c$ is perpendicular to
$$\nabla f(c) = \Big( \frac{\partial f}{\partial x_1}(c), \ldots, \frac{\partial f}{\partial x_n}(c) \Big).$$
4.29. Suppose $f : \mathbb{R}^n \to \mathbb{R}^n$ is differentiable. If for all collections of $n$ co-linear points $\{p_1, \ldots, p_n\}$ in $\mathbb{R}^n$ the determinant
$$\begin{vmatrix} \frac{\partial f_1}{\partial x_1}(p_1) & \cdots & \frac{\partial f_1}{\partial x_n}(p_1) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1}(p_n) & \cdots & \frac{\partial f_n}{\partial x_n}(p_n) \end{vmatrix}$$
is non-zero, then show that $f$ is 1-to-1 on $\mathbb{R}^n$. Thus, if $n = 1$ and $f'(x) \ne 0$ for each $x \in \mathbb{R}$, then $f$ is 1-to-1 on $\mathbb{R}$. Show that the function $f(x, y) = (x^3 - y, e^{x+y})$ is 1-to-1 on $\mathbb{R}^2$.
4.30. If $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at each point $c$ in the line segment between $a$ and $b$ and satisfies $|Df(c)(u)| \le M|u|$, for each $u \in \mathbb{R}^n$, then $|f(b) - f(a)| \le M|b - a|$.
4.31. (Euler's Theorem) The real-valued function $f(x) = f(x_1, \ldots, x_n)$ is homogeneous of degree $m$ if
$$f(tx) = f(tx_1, \ldots, tx_n) = t^m f(x_1, \ldots, x_n) = t^m f(x), \qquad t \in \mathbb{R}.$$
For example, the functions
$$x^3 + y^3 + 3x^2y, \qquad \frac{x^5 + y^5 + z^5}{(x + y + z)^5}, \qquad x^2y^7 + 2z^4x^5,$$
are homogeneous of degree 3, 0, 9 respectively. Prove that if $f$ is differentiable and homogeneous of degree $m$, then
$$x \cdot \nabla f = x_1\frac{\partial f}{\partial x_1} + \ldots + x_n\frac{\partial f}{\partial x_n} = mf.$$
[Hint: Differentiate the formula defining the concept of homogeneity with respect to $t$ and set $t = 1$. Was Euler an Edmonton hockey player???]


4.32. (Project) Carry out the following:
(i) Suppose $f$ is continuous on $[a, b] \times [c, d] = I$ to $\mathbb{R}$. Prove that if
$$F(x) = \int_c^d f(x, y)\,dy,$$
then $F$ is continuous on $[a, b]$. [Hint: $f$ is uniformly continuous on $I$.]
(ii) Suppose $f$ and $\frac{\partial f}{\partial x}$ are continuous on $I$. Let $F$ be as in part (i). Prove that $F'$ exists, is continuous on $[a, b]$, and
$$F'(x) = \int_c^d \frac{\partial f}{\partial x}(x, y)\,dy.$$
[Hint: Let
$$\phi(x) = \int_c^d \frac{\partial f}{\partial x}(x, y)\,dy;$$
then $\phi$ is continuous by part (i). Use the Fubini Theorem to show that $\int_a^x \phi = F(x) - F(a)$. Hence, $F$ is an antiderivative of $\phi$ so $F'$ exists and equals $\phi$.]
(iii) Suppose $\alpha(x)$ and $\beta(x)$ have continuous derivatives on $[a, b]$, and $f$ and $\frac{\partial f}{\partial x}$ are continuous on $I$. Show
$$\frac{d}{dx}\int_{\alpha(x)}^{\beta(x)} f(x, t)\,dt = f(x, \beta(x))\,\beta'(x) - f(x, \alpha(x))\,\alpha'(x) + \int_{\alpha(x)}^{\beta(x)} \frac{\partial f}{\partial x}(x, t)\,dt.$$
[Hint: Apply the Chain Rule and (ii) to
$$F(x, y, z) = \int_y^z f(x, t)\,dt, \quad\text{at } (x, y, z) = (x, \alpha(x), \beta(x)).]$$
(iv) From the formula
$$\int_0^\pi \frac{dx}{a + b\cos(x)} = \frac{\pi}{(a^2 - b^2)^{1/2}}, \qquad a > 0, \quad |b| < a,$$
establish the results
$$\int_0^\pi \frac{dx}{(a + b\cos(x))^2} = \frac{\pi a}{(a^2 - b^2)^{3/2}}, \qquad \int_0^\pi \frac{\cos(x)\,dx}{(a + b\cos(x))^2} = \frac{-\pi b}{(a^2 - b^2)^{3/2}}.$$
(v) Evaluate $\int_0^b \frac{dx}{x^2 + a^2}$, $a > 0$, and from your result deduce that
$$\int_0^b \frac{dx}{(x^2 + a^2)^2} = \frac{1}{2a^3}\arctan\Big(\frac{b}{a}\Big) + \frac{b}{2a^2(b^2 + a^2)}, \qquad a > 0.$$
Show also that
$$\int_0^\infty \frac{dx}{(x^2 + a^2)^2} = \frac{\pi}{4a^3}, \qquad a > 0.$$
[Hint: $\int_0^\infty f = \lim_{T \to \infty} \int_0^T f$ if this limit exists.]

4.3 Partial Derivatives of Higher Order

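Definition 4.30. Let $f : \mathbb{R}^n \to \mathbb{R}$ be such that $\frac{\partial f}{\partial x_i}$ exists in a neighborhood of some point. If $\frac{\partial}{\partial x_j}\big(\frac{\partial f}{\partial x_i}\big)$ also exists at this point, then we write
$$\frac{\partial^2 f}{\partial x_j \partial x_i} = \frac{\partial}{\partial x_j}\Big(\frac{\partial f}{\partial x_i}\Big)$$
and, in particular,
$$\frac{\partial^2 f}{\partial x_i^2} = \frac{\partial}{\partial x_i}\Big(\frac{\partial f}{\partial x_i}\Big).$$
These are the partial derivatives of second order. Partial derivatives of third and higher order are similarly defined.
Example 4.31 Let $f(x, y) = x^3 + 3y^2 + 2xy$. Then
$$\frac{\partial f}{\partial x} = 3x^2 + 2y, \qquad \frac{\partial f}{\partial y} = 6y + 2x,$$
$$\frac{\partial^2 f}{\partial x^2} = 6x, \qquad \frac{\partial^2 f}{\partial x \partial y} = 2 = \frac{\partial^2 f}{\partial y \partial x}, \qquad \frac{\partial^2 f}{\partial y^2} = 6.$$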
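The second-order partials of $f(x, y) = x^3 + 3y^2 + 2xy$ can be approximated by nesting first-order difference quotients, which mirrors the definition $\frac{\partial^2 f}{\partial x_j \partial x_i} = \frac{\partial}{\partial x_j}\big(\frac{\partial f}{\partial x_i}\big)$. The sketch below is an illustration only: the central-difference operators `d_dx` and `d_dy`, the step `h`, and the sample point are assumptions made here.

```python
def f(x, y):
    """The function of Example 4.31."""
    return x**3 + 3 * y**2 + 2 * x * y

def d_dx(g, h=1e-4):
    """Return a function approximating the partial derivative of g in x."""
    return lambda x, y: (g(x + h, y) - g(x - h, y)) / (2 * h)

def d_dy(g, h=1e-4):
    """Return a function approximating the partial derivative of g in y."""
    return lambda x, y: (g(x, y + h) - g(x, y - h)) / (2 * h)

f_xy = d_dx(d_dy(f))  # d/dx of (df/dy)
f_yx = d_dy(d_dx(f))  # d/dy of (df/dx)
f_xx = d_dx(d_dx(f))  # d/dx of (df/dx)

point = (1.5, -0.7)
vals = (f_xy(*point), f_yx(*point), f_xx(*point))
print(vals)
```

At the sample point the two mixed quotients both come out close to 2 and the pure second quotient close to $6x = 9$, in line with the values computed in the example.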

It is usually the case (but not always, see Exercise 13, page 349 in Buck) that the successive partial derivatives may be taken in any order we please; e.g., in the above example we have seen
$$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}.$$
The following two theorems give sufficient conditions for this. There is no loss of generality in the fact that these theorems are proved in $\mathbb{R}^2$ only; we are only concerned with the behaviour of $f$ with respect to two of the variables in any case.
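The equality of the mixed partials for $f(x, y) = x^3 + 3y^2 + 2xy$ can also be observed numerically. The sketch below nests central differences in the two possible orders; the helper names `d_dx` and `d_dy` and the step size are illustrative choices, and the output only approximates the exact value 2.

```python
def f(x, y):
    # The polynomial from Example 4.31.
    return x**3 + 3 * y**2 + 2 * x * y

def d_dx(g, h=1e-4):
    """Central-difference approximation to the partial derivative of g in x."""
    return lambda x, y: (g(x + h, y) - g(x - h, y)) / (2 * h)

def d_dy(g, h=1e-4):
    """Central-difference approximation to the partial derivative of g in y."""
    return lambda x, y: (g(x, y + h) - g(x, y - h)) / (2 * h)

f_xy = d_dx(d_dy(f))  # differentiate in y first, then in x
f_yx = d_dy(d_dx(f))  # differentiate in x first, then in y

if __name__ == "__main__":
    print(f_xy(1.0, 2.0), f_yx(1.0, 2.0))  # both should be close to 2
```

Agreement at sample points is of course no substitute for the theorems that follow, which explain when the order of differentiation is irrelevant.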


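Theorem 4.32. Let $f : \mathbb{R}^2 \to \mathbb{R}$. If the mixed partials
$$\frac{\partial^2 f}{\partial x \partial y}, \qquad \frac{\partial^2 f}{\partial y \partial x}$$
exist and are continuous on an open set $U \subset \mathbb{R}^2$, then they are equal at each point of $U$.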
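Proof. Let $I = [a, b] \times [c, d] \subset U$. By the Fubini Theorem,
$$J_1 = \int_I \frac{\partial^2 f}{\partial x \partial y} = \int_c^d \left( \int_a^b \frac{\partial^2 f}{\partial x \partial y}\, dx \right) dy = \int_c^d \left( \frac{\partial f}{\partial y}(b, y) - \frac{\partial f}{\partial y}(a, y) \right) dy = f(b, d) - f(b, c) - f(a, d) + f(a, c),$$
$$J_2 = \int_I \frac{\partial^2 f}{\partial y \partial x} = \int_a^b \left( \int_c^d \frac{\partial^2 f}{\partial y \partial x}\, dy \right) dx = \int_a^b \left( \frac{\partial f}{\partial x}(x, d) - \frac{\partial f}{\partial x}(x, c) \right) dx = f(b, d) - f(a, d) - f(b, c) + f(a, c).$$
Therefore, $J_1 = J_2$ for each interval $I \subset U$.

Now, suppose that
$$\frac{\partial^2 f}{\partial y \partial x}(x_0, y_0) \neq \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0)$$
for some $(x_0, y_0) \in U$. We may even say that one is larger than the other, say,
$$\frac{\partial^2 f}{\partial x \partial y}(x, y) - \frac{\partial^2 f}{\partial y \partial x}(x, y) > 0$$
in a small interval $I$ containing $(x_0, y_0)$, by continuity of the mixed partials. But then
$$J_1 - J_2 = \int_I \left( \frac{\partial^2 f}{\partial x \partial y} - \frac{\partial^2 f}{\partial y \partial x} \right) > 0,$$
by Theorem 3.15(a); a contradiction. Hence, we must have equality throughout $U$.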

The next Theorem is more general than the one just proved, but is also more
difficult to prove.
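Theorem 4.33. If $f : \mathbb{R}^2 \to \mathbb{R}$ satisfies
(i) $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, $\frac{\partial^2 f}{\partial x \partial y}$ exist in a neighborhood $U$ of $(x_0, y_0)$, and
(ii) $\frac{\partial^2 f}{\partial x \partial y}$ is continuous at $(x_0, y_0)$,
then $\frac{\partial^2 f}{\partial y \partial x}(x_0, y_0)$ exists and equals $\frac{\partial^2 f}{\partial x \partial y}(x_0, y_0)$.

Proof.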

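We want to show that
$$\frac{\partial^2 f}{\partial y \partial x}(x_0, y_0) = \lim_{k \to 0} \frac{\frac{\partial f}{\partial x}(x_0, y_0 + k) - \frac{\partial f}{\partial x}(x_0, y_0)}{k} \quad \text{exists and equals} \quad \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0), \tag{4.5}$$
or equivalently,
$$\lim_{k \to 0} \left( \frac{\frac{\partial f}{\partial x}(x_0, y_0 + k) - \frac{\partial f}{\partial x}(x_0, y_0)}{k} - \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) \right) = 0.$$
The idea of the proof is to show that
$$\frac{\frac{\partial f}{\partial x}(x_0, y_0 + k) - \frac{\partial f}{\partial x}(x_0, y_0)}{k} = \frac{\partial^2 f}{\partial x \partial y}(x^*(k), y^*(k)) + \text{terms going to zero with } k \tag{4.6}$$
with $(x^*(k), y^*(k)) \to (x_0, y_0)$ as $k \to 0$, so that the continuity of $\frac{\partial^2 f}{\partial x \partial y}$ at $(x_0, y_0)$ may be used. To that end, given $\epsilon > 0$, choose $\delta = \delta_\epsilon > 0$ so that
$$\max\{|x_0 - x^*|, |y_0 - y^*|\} < \delta \implies \left| \frac{\partial^2 f}{\partial x \partial y}(x^*, y^*) - \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) \right| < \epsilon/2. \tag{4.7}$$
Fix $k$ with $0 < |k| < \delta$. We rewrite the numerator of the quotient in the limit of (4.5) using
(a) the definition of the partial derivative,
(b) the fact that the Mean Value Theorem for one variable can be applied to $g(y) = f(x_0 + h, y) - f(x_0, y)$ for any fixed $h$ (why?), and
(c) that the Mean Value Theorem for one variable can be applied to $x \mapsto \frac{\partial f}{\partial y}(x, y_1)$ for any fixed $y_1$ (why?),
as follows:
$$\frac{\partial f}{\partial x}(x_0, y_0 + k) - \frac{\partial f}{\partial x}(x_0, y_0) = \frac{\partial}{\partial x}\Big[ f(x, y_0 + k) - f(x, y_0) \Big]_{x = x_0}$$
$$= \frac{[f(x_0 + h, y_0 + k) - f(x_0 + h, y_0)] - [f(x_0, y_0 + k) - f(x_0, y_0)]}{h} + \eta(h), \quad \text{with } \eta(h) \to 0 \text{ as } h \to 0 \text{ by (a)}$$
$$= \frac{g(y_0 + k) - g(y_0)}{h} + \eta(h) = \frac{dg}{dy}(y_0 + \theta_{h,k} k)\, \frac{k}{h} + \eta(h)$$
$$= \frac{\frac{\partial f}{\partial y}(x_0 + h, y_0 + \theta_{h,k} k) - \frac{\partial f}{\partial y}(x_0, y_0 + \theta_{h,k} k)}{h}\, k + \eta(h), \quad \text{where } |\theta_{h,k}| < 1 \text{ by (b)}$$
$$= k\, \frac{\partial^2 f}{\partial x \partial y}(x_0 + \tilde{\theta}_{h,k} h, y_0 + \theta_{h,k} k) + \eta(h), \quad \text{where } |\tilde{\theta}_{h,k}| < 1 \text{ by (c)}.$$
If $h = h(k)$ is chosen to be small enough that $|h| < |k|$ and $|\eta(h)| < |k|^2$, then, after dividing by $k$, we obtain (4.6) with $x^* = x^*(k) = x_0 + \tilde{\theta}_{h,k} h$, $y^* = y^*(k) = y_0 + \theta_{h,k} k$, and $\eta(h)/k$ as the term that goes to zero with $k$. Since
$$\max\{|x^*(k) - x_0|, |y^*(k) - y_0|\} = \max\{|x_0 + \tilde{\theta}_{h,k} h - x_0|, |y_0 + \theta_{h,k} k - y_0|\} \le |k| < \delta,$$
all the requirements are met to achieve
$$\left| \frac{\frac{\partial f}{\partial x}(x_0, y_0 + k) - \frac{\partial f}{\partial x}(x_0, y_0)}{k} - \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) \right| = \left| \frac{\partial^2 f}{\partial x \partial y}(x^*, y^*) - \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) + \frac{\eta(h)}{k} \right| < \epsilon,$$
using (4.7), provided also $|k| < \epsilon/2$, for then $|\eta(h)/k| < |k| < \epsilon/2$.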

Notation: Let $D \subset \mathbb{R}^n$. By $C(D)$ we will denote the set of all continuous functions on $D$. By $C^k(D)$, we mean all functions defined on $D$ having all $k$-th order partial derivatives continuous on $D$. The range of the functions will be clear from the context in which the notation is used.
We first give Taylor's Theorem in an expanded form emphasizing two variables. The Theorem can be stated cleanly in a form similar to the univariate form even in higher dimensions, which we do in a second pass.
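Theorem 4.34 (Taylor's Theorem). Let $f : U \subset \mathbb{R}^n \to \mathbb{R}$, $f \in C^k(U)$, where $U$ is a convex open subset of $\mathbb{R}^n$, and let $a, b \in U$.
(i) For $n = 2$, with $b = (b_1, b_2)$, $a = (a_1, a_2)$, we can write $f(b)$ as
$$f(b) = \sum_{j=0}^{k-1} \frac{1}{j!} \left( (b_1 - a_1)\frac{\partial}{\partial x_1} + (b_2 - a_2)\frac{\partial}{\partial x_2} \right)^j f(a) + R_k,$$
where for some point $c$ on the line segment joining $b$ and $a$,
$$R_k = \frac{1}{k!} \left( (b_1 - a_1)\frac{\partial}{\partial x_1} + (b_2 - a_2)\frac{\partial}{\partial x_2} \right)^k f(c).$$
(ii) For an arbitrary $n$, with $b = (b_1, b_2, \ldots, b_n)$, $a = (a_1, a_2, \ldots, a_n)$, we have
$$f(b) = \sum_{j=0}^{k-1} \frac{1}{j!} \left( (b_1 - a_1)\frac{\partial}{\partial x_1} + \ldots + (b_n - a_n)\frac{\partial}{\partial x_n} \right)^j f(a) + R_k,$$
where for some point $c$ on the line segment joining $b$ and $a$,
$$R_k = \frac{1}{k!} \left( (b_1 - a_1)\frac{\partial}{\partial x_1} + \ldots + (b_n - a_n)\frac{\partial}{\partial x_n} \right)^k f(c).$$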

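Proof. As in the proof of the Mean Value Theorem, let
$$F(t) = f(\gamma(t)), \qquad \gamma(t) = a + t(b - a).$$
By the Chain Rule, $F'(t) = Df(\gamma(t))(b - a)$, that is,
$$F'(t) = \begin{bmatrix} \frac{\partial f}{\partial x_1}(\gamma(t)) & \frac{\partial f}{\partial x_2}(\gamma(t)) \end{bmatrix} \begin{bmatrix} b_1 - a_1 \\ b_2 - a_2 \end{bmatrix} = (b_1 - a_1)\frac{\partial f}{\partial x_1}(\gamma(t)) + (b_2 - a_2)\frac{\partial f}{\partial x_2}(\gamma(t)) = \left( (b_1 - a_1)\frac{\partial}{\partial x_1} + (b_2 - a_2)\frac{\partial}{\partial x_2} \right) f(\gamma(t)).$$
The last expression is again a differentiable function of $t$ (if $k \ge 2$), so we can apply the same argument to it as we did for $f$ to get inductively
$$F''(t) = \left( (b_1 - a_1)\frac{\partial}{\partial x_1} + (b_2 - a_2)\frac{\partial}{\partial x_2} \right) \left( (b_1 - a_1)\frac{\partial}{\partial x_1} + (b_2 - a_2)\frac{\partial}{\partial x_2} \right) f(\gamma(t)),$$
$$F^{(j)}(t) = \left( (b_1 - a_1)\frac{\partial}{\partial x_1} + (b_2 - a_2)\frac{\partial}{\partial x_2} \right)^j f(\gamma(t)).$$
Applying Taylor's Theorem in one variable on $0 \le t \le 1$ to obtain $F(1)$ in terms of the derivatives of $F$ at zero, we find
$$F(1) = \sum_{j=0}^{k-1} \frac{1}{j!} F^{(j)}(0) + R_k, \qquad R_k = \frac{1}{k!} F^{(k)}(\tau), \qquad \text{for some } 0 < \tau < 1.$$
Part (i) of the Theorem follows by substituting
$$F(1) = f(b), \qquad F^{(j)}(0) = \left( (b_1 - a_1)\frac{\partial}{\partial x_1} + (b_2 - a_2)\frac{\partial}{\partial x_2} \right)^j f(a), \qquad c = \gamma(\tau).$$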

Part (ii) is exactly the same, except more terms are used in the sum of partials
because there are more variables.
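To make the two-variable formula concrete, the following sketch evaluates the Taylor sum for the polynomial $f(x, y) = x^3 + 3y^2 + 2xy$ of Example 4.31, whose partial derivatives are entered by hand. The function names are illustrative. Because $f$ is a cubic, the sum with $k = 4$ reproduces $f(b)$ exactly, and with $k = 3$ the missing amount is the remainder $R_3$.

```python
from math import factorial

def directional_terms(a, h):
    """Values of ((h . grad)^j f)(a) for j = 0, 1, 2, 3, where
    f(x, y) = x^3 + 3y^2 + 2xy. Partials are written out by hand."""
    x, y = a
    h1, h2 = h
    f = x**3 + 3 * y**2 + 2 * x * y
    fx, fy = 3 * x**2 + 2 * y, 6 * y + 2 * x
    fxx, fxy, fyy = 6 * x, 2, 6
    fxxx = 6  # the only nonzero third-order partial derivative
    return [
        f,
        h1 * fx + h2 * fy,
        h1 * h1 * fxx + 2 * h1 * h2 * fxy + h2 * h2 * fyy,
        h1**3 * fxxx,
    ]

def taylor_sum(a, b, k):
    """Sum of the first k Taylor terms (k <= 4) of f about a, evaluated at b."""
    h = (b[0] - a[0], b[1] - a[1])
    terms = directional_terms(a, h)
    return sum(terms[j] / factorial(j) for j in range(k))

if __name__ == "__main__":
    a, b = (1, 1), (2, 3)
    print(taylor_sum(a, b, 4))  # equals f(b) = 8 + 27 + 12
    print(taylor_sum(a, b, 3))  # differs from f(b) by the remainder R_3
```

Here $R_3 = \frac{1}{3!}(b_1 - a_1)^3 \cdot 6$ regardless of the intermediate point $c$, since the third-order partials of this $f$ are constant.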
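To view Taylor's Theorem somewhat differently, we define the gradient (see Exercise 23, which you may have done already), or the operator $\nabla$ defined on real-valued differentiable functions on $\mathbb{R}^n$ with values in the vector-valued functions on $\mathbb{R}^n$:
$$\nabla := \left( \frac{\partial}{\partial x_1}, \ldots, \frac{\partial}{\partial x_n} \right), \qquad \nabla : f \mapsto \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right). \tag{4.8}$$
The gradient vector has many important applications (see Exercise 23). One is that the directional derivative in the direction $u \in \mathbb{R}^n$ at a point $c$ can be given by the inner product with the gradient vector at the point:
$$f_u(c) = Df(c)(u) = \sum_{j=1}^n u_j \frac{\partial f}{\partial x_j}(c) = \nabla f(c) \cdot u = u \cdot \nabla f(c) =: \langle \nabla f(c), u \rangle. \tag{4.9}$$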


Below we phrase Taylor's Theorem using the gradient notation, which is simpler in form but richer in meaning. When properly viewed it allows the concept to expand to several variables without unnecessary clutter. (Unfortunately, our minds sometimes demand a struggle with the clutter before we can fully conceptualize. Don't be afraid to get dirty with this.) Before stating Taylor's Theorem, we do some house-keeping on notation and look at iterations of the particular operator $u \cdot \nabla$. Look at how the gradient operator is used to define, for a fixed direction $u$, the directional derivative as an operator on differentiable functions:
$$\text{Given } u \text{ in } \mathbb{R}^n: \qquad \nabla_u = u \cdot \nabla : f \mapsto \sum_{j=1}^n u_j \frac{\partial f}{\partial x_j}.$$
If the resulting function $\sum_{j=1}^n u_j \frac{\partial f}{\partial x_j} : \mathbb{R}^n \to \mathbb{R}$ is differentiable, the operator may be applied once again. Quite literally, this gives
$$\sum_{\ell=1}^n u_\ell \frac{\partial}{\partial x_\ell} \left( \sum_{j=1}^n u_j \frac{\partial f}{\partial x_j} \right) = \sum_{\ell=1}^n \sum_{j=1}^n u_\ell u_j \frac{\partial^2 f}{\partial x_\ell \partial x_j} = \sum_{\ell, j=1}^n u_\ell u_j \frac{\partial^2 f}{\partial x_\ell \partial x_j}. \tag{4.10}$$
This is the operator
$$(u \cdot \nabla)^2 f := (u \cdot \nabla)\big( (u \cdot \nabla) f \big).$$
As long as the resulting function (the function defined as the last sum in (4.10)) is differentiable, we can apply the operator again. In this way, we define inductively the operator
$$(u \cdot \nabla)^k f := (u \cdot \nabla)\big( (u \cdot \nabla)^{k-1} f \big).$$
This can be written out in long form (in the same manner as the binomial formula) as a $k$-fold sum
$$(u \cdot \nabla)^k f = \sum_{1 \le j_1, \ldots, j_k \le n} u_{j_1} \cdots u_{j_k} \frac{\partial^k f}{\partial x_{j_1} \cdots \partial x_{j_k}}.$$
Since we can only apply this operator again if all the partial derivatives of each
$$\frac{\partial^k f}{\partial x_{j_1} \cdots \partial x_{j_k}} \tag{4.11}$$
exist, it is defined on the class of functions $C^N(U)$, the space of functions defined on an open set $U \subset \mathbb{R}^n$ with all partial derivatives (4.11) continuous up to order $k = N$.
Now we are ready to restate and prove Taylor's Theorem:
Theorem 4.35 (Taylor's Theorem). Let $f \in C^N(U)$ where $U$ is an open convex set in $\mathbb{R}^n$. If $a$ is any point in $U$, then the value of the function $f$ at any point $b \in U$ is given by
$$f(b) = f(a) + \sum_{k=1}^{N-1} \frac{1}{k!} \big( (b - a) \cdot \nabla \big)^k f(a) + R_N,$$
where the remainder term $R_N$ is given by
$$R_N = \frac{1}{N!} \big( (b - a) \cdot \nabla \big)^N f(c)$$
for some point $c$ on the line segment joining $a$ to $b$.

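Proof. We will reduce the theorem to the univariate Taylor theorem. In fact, one can see directly the analogy when one views $\big( (b - a) \cdot \nabla \big)^k f$ as the $k$-th directional derivative in the direction $b - a$.
Let $\gamma(t) := a + t(b - a)$, and define
$$F(t) := f(\gamma(t)), \qquad 0 \le t \le 1.$$
Then by the Chain Rule for several variables, we have
$$F'(t) = Df(\gamma(t))(b - a) = \begin{bmatrix} \frac{\partial f}{\partial x_1}(\gamma(t)), & \ldots, & \frac{\partial f}{\partial x_n}(\gamma(t)) \end{bmatrix} \begin{bmatrix} b_1 - a_1 \\ \vdots \\ b_n - a_n \end{bmatrix} = (b_1 - a_1)\frac{\partial f}{\partial x_1}(\gamma(t)) + \ldots + (b_n - a_n)\frac{\partial f}{\partial x_n}(\gamma(t)) = (b - a) \cdot \nabla f(\gamma(t)).$$
Applying this formula inductively to the result, we find
$$F^{(k)}(t) = \big( (b - a) \cdot \nabla \big) \big( (b - a) \cdot \nabla \big)^{k-1} f(\gamma(t)) = \big( (b - a) \cdot \nabla \big)^k f(\gamma(t)). \tag{4.12}$$
Therefore, applying the univariate Taylor formula to $F(t)$, we obtain
$$F(1) = \sum_{k=0}^{N-1} \frac{F^{(k)}(0)}{k!} + R_N, \qquad \text{where} \qquad R_N = \frac{F^{(N)}(t_0)}{N!}, \quad \text{for some } t_0,\ 0 < t_0 < 1.$$
Substituting (4.12) into this formula and noting $\gamma(0) = a$ and $\gamma(1) = b$, we obtain the Theorem with $c = a + t_0(b - a)$.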

4.3.1 Min-Max Theory

We use Taylor's Theorem to look for tests for relative maxima and minima, as was done with the second derivative tests for functions of one variable. Suppose $f$ has continuous second order partial derivatives. Then
$$f(b) = f(a) + (b - a) \cdot \nabla f(a) + \big( (b - a) \cdot \nabla \big)^2 f(c)/2!$$
for some $c$ on the line segment from $a$ to $b$. If $f$ has a relative max or min at $x = a$, then $f$ restricted to any coordinate direction is a real-valued function of one variable with a relative max or relative min at $a$. Therefore (see Exercise 18),


Theorem 4.36. If the function $f : \mathbb{R}^n \to \mathbb{R}$ has a relative maximum or relative minimum at an interior point $a$ of its domain where its partial derivatives exist, then
$$\frac{\partial f}{\partial x_j}(a) = 0, \qquad \text{for all } j = 1, \ldots, n. \tag{4.13}$$

Points that satisfy (4.13) are called stationary or critical points. These are the potential points for relative maxima and minima. At a stationary point $a$, Taylor's theorem of order 2 reads
$$f(b) = f(a) + \big( (b - a) \cdot \nabla \big)^2 f(c)/2.$$
Thus, we find
$$\big( (b - a) \cdot \nabla \big)^2 f(c) > 0 \implies f(b) > f(a), \qquad \text{and} \qquad \big( (b - a) \cdot \nabla \big)^2 f(c) < 0 \implies f(b) < f(a).$$
Hence the behavior of the operator $\big( (b - a) \cdot \nabla \big)^2 f$ is important for determining whether the critical point is a max or min.
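We will view this a little differently, using matrix notation. Let $u \in \mathbb{R}^n$, and consider $(u \cdot \nabla)^2 f$ written in some imaginative ways:
$$(u \cdot \nabla)^2 f = (u \cdot \nabla)\big( (u \cdot \nabla) f \big) = u^T \big( \nabla^T \nabla f \big) u$$
$$= [u_1, \ldots, u_n] \begin{bmatrix} \frac{\partial}{\partial x_1} \\ \vdots \\ \frac{\partial}{\partial x_n} \end{bmatrix} \left[ \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right] \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix}$$
$$= [u_1, \ldots, u_n] \begin{bmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1} & \ldots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \ldots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \ldots & \frac{\partial^2 f}{\partial x_n \partial x_n} \end{bmatrix} \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix}$$
$$= [u_1, \ldots, u_n] \begin{bmatrix} \sum_{\ell=1}^n \frac{\partial^2 f}{\partial x_1 \partial x_\ell} u_\ell \\ \vdots \\ \sum_{\ell=1}^n \frac{\partial^2 f}{\partial x_n \partial x_\ell} u_\ell \end{bmatrix} = \sum_{j=1}^n \sum_{\ell=1}^n u_j u_\ell \frac{\partial^2 f}{\partial x_j \partial x_\ell}. \tag{4.14}$$
If we define the $n \times n$ matrix $A_c$ by
$$A_c := \begin{bmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1}(c) & \ldots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(c) \\ \frac{\partial^2 f}{\partial x_2 \partial x_1}(c) & \ldots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(c) \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1}(c) & \ldots & \frac{\partial^2 f}{\partial x_n \partial x_n}(c) \end{bmatrix}, \tag{4.15}$$
then the above equation (4.14) may be written as
$$(u \cdot \nabla)^2 f(c) = u^T A_c u.$$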
We make the following observations: If $f$ has continuous partial derivatives of second order, then
1. $A_c$ is symmetric (Theorem 4.32),
2. if $(b - a)^T A_c (b - a) = ((b - a) \cdot \nabla)^2 f(c) > 0$ for all $b$ and $c$ close to $a$, then $f(a)$ is a relative minimum,
3. if $(b - a)^T A_c (b - a) = ((b - a) \cdot \nabla)^2 f(c) < 0$ for all $b$ and $c$ close to $a$, then $f(a)$ is a relative maximum,
4. if in any small neighborhood of $a$ there are $b$ and $b'$ (with corresponding intermediate points $c$ and $c'$) such that
$$(b - a)^T A_c (b - a) = ((b - a) \cdot \nabla)^2 f(c) > 0 \quad \text{and} \quad (b' - a)^T A_{c'} (b' - a) = ((b' - a) \cdot \nabla)^2 f(c') < 0,$$
then $f(a)$ is neither a max nor a min, and $a$ is called a saddle point because $f(b) > f(a)$ while $f(b') < f(a)$.
5. Since the second order derivatives are continuous, the function in (4.14) is continuous in $c$ and in $u = b - a$, and therefore (1)-(4) will be true for $A_c$ replaced by $A_a$ if $b$ is sufficiently close to $a$.
Thus, we need to consider the properties of the mapping $u \mapsto u^T A u$ on $\mathbb{R}^n$ for a given $n \times n$ matrix $A$. These mappings are called quadratic forms, since the values $u^T A u$ are homogeneous multivariate polynomials of degree 2 in the variables $u_1, \ldots, u_n$, that is, they have the form
$$\sum_{1 \le \ell, j \le n} a_{\ell,j} u_\ell u_j.$$

Definition 4.37. For an $n \times n$ matrix $A$, we say
(i) $A$ is positive definite (respectively, positive semi-definite) if $u^T A u > 0$ (respectively, $u^T A u \ge 0$) for all $u \in \mathbb{R}^n \setminus O$;
(ii) $A$ is negative definite (respectively, negative semi-definite) if $u^T A u < 0$ (respectively, $u^T A u \le 0$) for all $u \in \mathbb{R}^n \setminus O$; and
(iii) indefinite otherwise.
Some of the importance of positive definite matrices can be seen from the
following remark.
Remark: A symmetric real matrix is positive definite if and only if its eigenvalues
are positive. (Result from linear algebra which we will take as known.)
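Theorem 4.38. Suppose $f : \mathbb{R}^n \to \mathbb{R}$, $f \in C^2(U)$ in an open set $U$ containing $a$, and suppose further that
(i) $a$ is a stationary point of $f$, that is,
$$\frac{\partial f}{\partial x_j}(a) = 0, \qquad j = 1, \ldots, n,$$
and
(ii) the symmetric matrix $A(a)$ is given by
$$A(a) = \nabla^T \nabla f(a) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1}(a) & \ldots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(a) \\ \frac{\partial^2 f}{\partial x_2 \partial x_1}(a) & \ldots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(a) \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1}(a) & \ldots & \frac{\partial^2 f}{\partial x_n \partial x_n}(a) \end{bmatrix}.$$
Then $f$ has
(a) a relative maximum at $a$ if $A$ is negative definite;
(b) a relative minimum at $a$ if $A$ is positive definite;
(c) a saddle point at $a$ if $A$ is indefinite; and
(d) unknown properties (test doesn't work) at $a$ if $A$ is semi-definite.

Proof. The proof follows from the discussion above.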

Before going further, we return to $\mathbb{R}^2$ and look at these ideas more concretely. A $2 \times 2$ symmetric matrix being positive (negative) semidefinite means
$$Q(x, y) = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} a & b \\ b & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = ax^2 + 2bxy + cy^2 \ge 0 \ (\le 0), \qquad (x, y) \in \mathbb{R}^2.$$
It is said to be positive (negative) definite if $Q(x, y) > 0$ $(< 0)$ for all $(x, y) \neq O$. For two by two matrices this expression reminds one of the quadratic formula, and we have the following easily checked criterion.
Lemma 4.39. Let the $2 \times 2$ symmetric matrix be as in the preceding paragraph.
(i) The matrix is positive (negative) semidefinite if and only if
$$ac - b^2 \ge 0 \quad \text{and} \quad a \ge 0 \ (\le 0), \quad c \ge 0 \ (\le 0).$$
(ii) The matrix is positive (negative) definite if and only if
$$ac - b^2 > 0 \quad \text{and} \quad a > 0 \ (< 0), \quad c > 0 \ (< 0).$$
(iii) The matrix is indefinite if and only if $ac - b^2 < 0$.

Proof. The lemma will follow from three observations:
(a) $Q(x, 0) = ax^2$ has the same sign as $a$ when $x \neq 0$.
(b) $Q(0, y) = cy^2$ has the same sign as $c$ when $y \neq 0$.
(c) $aQ(x, y) = (ax + by)^2 + (ac - b^2)y^2$.
For example, if $ac - b^2 < 0$ and $a > 0$, then $Q(x, 0) > 0$ for any $x \neq 0$, but the choice $x_0 = -by/a$ gives $Q(x_0, y) < 0$ for any $y > 0$.
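The criterion of Lemma 4.39 translates directly into a small decision procedure. The sketch below is one possible encoding; the function name `classify_2x2` and the returned labels are illustrative. The zero matrix, which is both positive and negative semidefinite, is reported as positive semidefinite here purely by convention.

```python
def classify_2x2(a, b, c):
    """Classify the symmetric matrix [[a, b], [b, c]] following Lemma 4.39."""
    det = a * c - b * b
    if det < 0:
        return "indefinite"
    if det > 0:
        # det > 0 forces a and c to be nonzero with the same sign.
        return "positive definite" if a > 0 else "negative definite"
    # det == 0: the matrix is semidefinite but not definite.
    if a >= 0 and c >= 0:
        return "positive semidefinite"
    return "negative semidefinite"

if __name__ == "__main__":
    print(classify_2x2(2, 1, 2))    # positive definite
    print(classify_2x2(2, 4, 2))    # indefinite
    print(classify_2x2(-6, 0, -2))  # negative definite
```

With exact (integer or rational) entries the comparisons with zero are reliable; with floating-point entries a tolerance would be needed near `det == 0`.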
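Before restating Theorem 4.38 for two variables, we introduce another commonly used notation for partial derivatives:
$$f_x := \frac{\partial f}{\partial x}, \qquad f_{xy} := \frac{\partial^2 f}{\partial y \partial x}, \qquad f_{xx} := \frac{\partial^2 f}{\partial x \partial x},$$
$$f_y := \frac{\partial f}{\partial y}, \qquad f_{yx} := \frac{\partial^2 f}{\partial x \partial y}, \qquad f_{yy} := \frac{\partial^2 f}{\partial y \partial y}.$$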

Theorem 4.40. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be in $C^2(U)$ for some neighborhood $U$ of $(x_0, y_0)$. Suppose $(x_0, y_0)$ is a stationary point for $f$; that is,
$$f_x(x_0, y_0) = f_y(x_0, y_0) = 0,$$
and let
$$A(x, y) = \begin{bmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{yx}(x, y) & f_{yy}(x, y) \end{bmatrix}.$$
Then at the point $(x_0, y_0)$, the function $f$ has
(a) a relative maximum if $A(x_0, y_0)$ is negative definite, i.e. if
$$f_{xx} f_{yy} - (f_{xy})^2 > 0, \qquad f_{xx} < 0 \qquad \text{at } (x_0, y_0);$$
(b) a relative minimum if $A(x_0, y_0)$ is positive definite, i.e. if
$$f_{xx} f_{yy} - (f_{xy})^2 > 0, \qquad f_{xx} > 0 \qquad \text{at } (x_0, y_0);$$
(c) a saddle point or Minimax if $A(x_0, y_0)$ is indefinite, i.e. if
$$f_{xx} f_{yy} - (f_{xy})^2 < 0 \qquad \text{at } (x_0, y_0);$$
(d) all possibilities at $(x_0, y_0)$, including none-of-the-above, if $A(x_0, y_0)$ is properly semi-definite:
$$\det A(x_0, y_0) = f_{xx} f_{yy} - (f_{xy})^2 = 0 \qquad \text{at } (x_0, y_0).$$

Figure 4.12. Graphical illustrations of a relative maximum (top), a relative minimum (middle), and a saddle point (bottom).

Example 4.41 For the function $f(x, y) = x^2 + xy + y^2$, from the equations $f_x = 2x + y = 0$ and $f_y = x + 2y = 0$, we find that $(x, y) = (0, 0)$ is the only stationary point. Now
$$\begin{bmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{yx}(x, y) & f_{yy}(x, y) \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$$
satisfies $f_{xx}(0, 0) = 2 > 0$ and $2 \cdot 2 - 1^2 = 3 > 0$, so that the matrix is positive definite and $f$ has a relative minimum at $(0, 0)$. In fact, it is a global minimum since the matrix above is positive definite for all $(x, y)$, so that the remainder in Taylor's Theorem satisfies $R_2(x, y) > 0$ for any $(x, y) \neq (0, 0)$.

Example 4.42 For the function $f(x, y) = x^2 + 4xy + y^2$, from the equations $f_x = 2x + 4y = 0$ and $f_y = 4x + 2y = 0$, we find that $(x, y) = (0, 0)$ is the only stationary point, but in this case
$$\begin{bmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{yx}(x, y) & f_{yy}(x, y) \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 4 & 2 \end{bmatrix} \quad \text{satisfies} \quad 2 \cdot 2 - 4^2 = -12 < 0,$$
so the matrix is indefinite. Hence, $f$ has a saddle point at $(0, 0)$. Alternately, one can see this from the observation that $f(x, y) > 0$ on the coordinate axes except at the origin, but $f$ is negative on the line $y = -x$ when $x > 0$.
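Example 4.43 For the function $f(x, y) = 3x^2 - y^2 + x^3$, from the equations
$$f_x = 6x + 3x^2 = 3x(2 + x) = 0, \qquad \text{and} \qquad f_y = -2y = 0,$$
we see that the stationary points are $(0, 0)$ and $(-2, 0)$. We observe
$$\text{for } (0, 0): \quad \begin{bmatrix} 6 & 0 \\ 0 & -2 \end{bmatrix} \text{ is indefinite} \implies \text{saddle point},$$
$$\text{for } (-2, 0): \quad \begin{bmatrix} -6 & 0 \\ 0 & -2 \end{bmatrix} \text{ is negative definite} \implies \text{relative maximum}.$$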
Example 4.44 For the function $f(x, y) = x^4 + y^4$, from the equations
$$f_x = 4x^3 = 0, \qquad \text{and} \qquad f_y = 4y^3 = 0,$$
we see that the stationary point is $(0, 0)$. We observe
$$\begin{bmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{yx}(x, y) & f_{yy}(x, y) \end{bmatrix} = \begin{bmatrix} 12x^2 & 0 \\ 0 & 12y^2 \end{bmatrix} \quad \text{has zero determinant at } (0, 0).$$
Thus, we have no information from the theorem. However, $12x^2 \ge 0$, $12y^2 \ge 0$ and $144x^2y^2 \ge 0$, so that the matrix is positive semidefinite for all $(x, y)$ and positive definite if $x \neq 0$ and $y \neq 0$. Thus, the remainder in Taylor's theorem satisfies $R_2(x, y) \ge 0$ for all $(x, y)$ and $f$ has a minimum at $(0, 0)$. Of course, one can easily observe that $f(x, y) > 0$ when $(x, y) \neq (0, 0)$ without resorting to second derivatives.


Example 4.45 For the function $f(x, y) = x^4 - y^4$, from the equations
$$f_x = 4x^3 = 0, \qquad \text{and} \qquad f_y = -4y^3 = 0,$$
we see that the stationary point is $(0, 0)$. We observe
$$\begin{bmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{yx}(x, y) & f_{yy}(x, y) \end{bmatrix} = \begin{bmatrix} 12x^2 & 0 \\ 0 & -12y^2 \end{bmatrix} \quad \text{has zero determinant at } (0, 0).$$
However, the matrix is positive semidefinite on the line $y = 0$ and negative semidefinite on $x = 0$, so that $(0, 0)$ is a saddle point. In this case, it is also obvious without considering second order partials that
$$f(x, 0) > f(0, 0) \text{ if } x \neq 0 \qquad \text{and} \qquad f(0, y) < f(0, 0) \text{ if } y \neq 0.$$

The last two examples illustrate the fact that anything can happen in the case that $f_{xx} f_{yy} - (f_{xy})^2 = 0$ at a stationary point. Our next example looks at a function on $\mathbb{R}^3$. However, we need some more applicable criteria for the definiteness of a matrix.
The following criteria may be found, say, in the book by Gantmacher, Matrix
Theory.
Theorem 4.46. Let $A$ be a symmetric $n \times n$ matrix. Then
(a) $A$ is positive semidefinite if and only if the determinants of all the $k \times k$ submatrices of $A$ symmetric about the main diagonal are non-negative ($\ge 0$).
(b) $A$ is negative semidefinite if and only if the determinants of all the $k \times k$ submatrices of $A$ symmetric about the main diagonal have sign $(-1)^k$ or are zero.
(c) $A$ is positive definite if and only if the determinants
$$\det \begin{bmatrix} a_{1,1} & \ldots & a_{1,k} \\ \vdots & \ddots & \vdots \\ a_{k,1} & \ldots & a_{k,k} \end{bmatrix} > 0, \qquad k = 1, 2, \ldots, n.$$
(d) $A$ is negative definite if and only if the determinants
$$(-1)^k \det \begin{bmatrix} a_{1,1} & \ldots & a_{1,k} \\ \vdots & \ddots & \vdots \\ a_{k,1} & \ldots & a_{k,k} \end{bmatrix} > 0, \qquad k = 1, 2, \ldots, n.$$
(e) $A$ is indefinite if and only if it satisfies none of (a), (b), (c), or (d).
The determinants in parts (c) and (d), taken from the upper-left $k \times k$ corners of $A$, are sometimes called the principal minors.
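Parts (c) and (d) are easy to automate. The sketch below computes the upper-left $k \times k$ determinants with exact rational arithmetic and applies the two sign tests; it deliberately does not attempt the semidefinite cases (a) and (b), which require all symmetric submatrices. The function names are illustrative and not taken from the notes.

```python
from fractions import Fraction

def det(m):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    a = [[Fraction(v) for v in row] for row in m]
    n, sign, result = len(a), 1, Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign  # a row swap flips the sign of the determinant
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result

def leading_minors(m):
    """Determinants of the upper-left k x k blocks, k = 1, ..., n."""
    return [det([row[:k] for row in m[:k]]) for k in range(1, len(m) + 1)]

def is_positive_definite(m):
    return all(d > 0 for d in leading_minors(m))

def is_negative_definite(m):
    return all((-1) ** k * d > 0 for k, d in enumerate(leading_minors(m), start=1))

if __name__ == "__main__":
    print(leading_minors([[2, 1], [1, 2]]))          # minors 2 and 3
    print(is_negative_definite([[-6, 0], [0, -2]]))  # True
```

The input is assumed to be a symmetric matrix given as a list of rows; symmetry is not checked.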


Example 4.47 For the function $f(x, y, z) = x^2 + y^2 + z^2 + 2xyz$, solving the equations
$$f_x = 2x + 2yz = 0, \qquad f_y = 2y + 2zx = 0, \qquad \text{and} \qquad f_z = 2z + 2xy = 0$$
yields stationary points at
$$(0, 0, 0), \quad (-1, 1, 1), \quad (1, -1, 1), \quad (1, 1, -1), \quad (-1, -1, -1).$$
The matrix $A(x, y, z)$ is
$$\begin{bmatrix} f_{xx} & f_{xy} & f_{xz} \\ f_{yx} & f_{yy} & f_{yz} \\ f_{zx} & f_{zy} & f_{zz} \end{bmatrix} = \begin{bmatrix} 2 & 2z & 2y \\ 2z & 2 & 2x \\ 2y & 2x & 2 \end{bmatrix}.$$
At $(0, 0, 0)$, $A(0, 0, 0) = 2I_{3 \times 3}$, which is positive definite; hence $f$ has a relative minimum at $(0, 0, 0)$. At the other stationary points, $|x| = |y| = |z| = 1$ and $xyz = -1$, so the principal minors of $A$ satisfy
$$\det[2] > 0, \qquad \det \begin{bmatrix} 2 & 2z \\ 2z & 2 \end{bmatrix} = 4 - 4z^2 = 0,$$
$$\det \begin{bmatrix} 2 & 2z & 2y \\ 2z & 2 & 2x \\ 2y & 2x & 2 \end{bmatrix} = 8(1 - x^2 - y^2 - z^2 + 2xyz) < 0.$$
Thus, the matrix is indefinite at all the other critical points, so that these are all saddle points.
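The arithmetic in this example is easy to confirm by machine. The following sketch, with illustrative function names, checks that the gradient of $f(x, y, z) = x^2 + y^2 + z^2 + 2xyz$ vanishes at the five listed points and evaluates the three upper-left minors of the Hessian using the closed forms $2$, $4 - 4z^2$ and $8(1 - x^2 - y^2 - z^2 + 2xyz)$.

```python
from itertools import product

def grad(x, y, z):
    """Gradient of f(x, y, z) = x^2 + y^2 + z^2 + 2xyz."""
    return (2 * x + 2 * y * z, 2 * y + 2 * z * x, 2 * z + 2 * x * y)

def hessian_minors(x, y, z):
    """Upper-left minors of the Hessian [[2, 2z, 2y], [2z, 2, 2x], [2y, 2x, 2]]."""
    return (2, 4 - 4 * z * z, 8 * (1 - x * x - y * y - z * z + 2 * x * y * z))

# The origin together with the sign patterns (+-1, +-1, +-1) whose product is -1.
candidates = [(0, 0, 0)] + [p for p in product((-1, 1), repeat=3)
                            if p[0] * p[1] * p[2] == -1]

if __name__ == "__main__":
    for p in candidates:
        print(p, grad(*p), hessian_minors(*p))
```

At the origin the minors are all positive (a relative minimum); at each of the other four points the minors are $2$, $0$, $-32$, which rules out both definite cases.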
In the following exercises, assume all the differentiability you need unless otherwise specified.

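Exercises
4.32. Find $V_x$, $V_y$, $V_z$, $V_u$, $V_{xy}$, $V_{xyz}$, $V_{xyzu}$ for the following functions $V$:
(i) $\dfrac{x^2 y^2 u}{a^2 - z^2}$,
(ii) $\dfrac{x}{y} + \dfrac{y}{z} + \dfrac{z}{u} + \dfrac{u}{x} + \dfrac{xy}{zu}$.

Solution for (i):
$$V_x = \frac{2xy^2 u}{a^2 - z^2}, \qquad V_y = \frac{2x^2 yu}{a^2 - z^2}, \qquad V_z = \frac{2x^2 y^2 uz}{(a^2 - z^2)^2}, \qquad V_u = \frac{x^2 y^2}{a^2 - z^2},$$
$$V_{xy} = \frac{4xyu}{a^2 - z^2}, \qquad V_{xyz} = \frac{8xyzu}{(a^2 - z^2)^2}, \qquad V_{xyzu} = \frac{8xyz}{(a^2 - z^2)^2}.$$

Solution for (ii):
$$V_x = \frac{1}{y} - \frac{u}{x^2} + \frac{y}{zu}, \qquad V_y = -\frac{x}{y^2} + \frac{1}{z} + \frac{x}{zu}, \qquad V_z = -\frac{y}{z^2} + \frac{1}{u} - \frac{xy}{z^2 u}, \qquad V_u = -\frac{z}{u^2} + \frac{1}{x} - \frac{xy}{u^2 z},$$
$$V_{xy} = -\frac{1}{y^2} + \frac{1}{zu}, \qquad V_{xyz} = -\frac{1}{uz^2}, \qquad V_{xyzu} = \frac{1}{u^2 z^2}.$$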

4.33. If $V = x^2 - y^2$, $x = 3u - 4v + 2$, $y = 2u + 3v - 1$, show that $V_u = 6x - 4y$, $V_v = -8x - 6y$.
4.34. If $V = x^3 + 8y^3 - 18xy$, find the points $(x, y)$ at which $V_x = V_y = 0$ and investigate the nature of $V$ at these points. [Solution: $(0, 0)$ is a saddle point, $(3, 3/2)$ is a minimum.]
4.35. If $P = x^2 y - y^3 - y^2 z$, $Q = xy^2 - x^3 - x^2 z$, $R = xy^2 + x^2 y$, show that
$$P(Q_z - R_y) + Q(R_x - P_z) + R(P_y - Q_x) = 0.$$
4.36. If $u = x + y + z$, $v = x^2 + y^2 + z^2$, $w = x^3 + y^3 + z^3 - 3xyz$, show
$$\frac{\partial(u, v, w)}{\partial(x, y, z)} = 0.$$
4.37. If $V, P, Q, R$ are $C^1$ functions of $(x, y, z)$ satisfying the relations
$$V_x = \mu P, \qquad V_y = \mu Q, \qquad V_z = \mu R, \qquad \text{for some } \mu \neq 0,$$
show that
$$P(Q_z - R_y) + Q(R_x - P_z) + R(P_y - Q_x) = 0.$$
4.38. Let $P, Q : \mathbb{R}^2 \to \mathbb{R}$, with $P$, $Q$, $\frac{\partial P}{\partial y}$, $\frac{\partial Q}{\partial x}$ continuous.
(i) Prove that there is a real valued function $f$ on $\mathbb{R}^2$ such that
$$f'(x, y) = [\, P(x, y) \quad Q(x, y) \,]$$
if and only if $\frac{\partial P}{\partial y} = \frac{\partial Q}{\partial x}$.
[HINT: "Only if" is easy. For "if", consider
$$f(x, y) = \int_{x_0}^x P(t, y_0)\, dt + \int_{y_0}^y Q(x, t)\, dt.$$
See Exercise 32 (ii).]
(ii) State what you would consider to be a generalization of part (i) for functions of three variables.
(iii) Verify that each of the following is the Jacobian matrix $f'(x, y)$ of some function $f(x, y)$ and find $f(x, y)$:
(a) $[\, 2xy \quad x^2 + 3y^2 \,]$,
(b) $[\, 2ye^{2x} + 2x\cos(y) \quad e^{2x} - x^2 \sin(y) \,]$.
4.39. If $F(x, y) = x/(x^2 + y^2)$ and $G(x, y) = y/(x^2 + y^2)$, show that
(i) $F_y = G_x$, $F_x = -G_y$;
(ii) $\nabla^2 F = \nabla^2 G = 0$, where $\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$.
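4.40. Show the following:
(i) If $V = V(x, y)$, $x = x(u, v)$, and $y = y(u, v)$, then
(a) $$[\, V_u \quad V_v \,] = [\, V_x \quad V_y \,] \begin{bmatrix} x_u & x_v \\ y_u & y_v \end{bmatrix},$$
(b) $$\begin{bmatrix} V_{uu} & V_{uv} \\ V_{vu} & V_{vv} \end{bmatrix} = \begin{bmatrix} x_u & y_u \\ x_v & y_v \end{bmatrix} \begin{bmatrix} V_{xx} & V_{xy} \\ V_{yx} & V_{yy} \end{bmatrix} \begin{bmatrix} x_u & x_v \\ y_u & y_v \end{bmatrix} + V_x \begin{bmatrix} x_{uu} & x_{uv} \\ x_{vu} & x_{vv} \end{bmatrix} + V_y \begin{bmatrix} y_{uu} & y_{uv} \\ y_{vu} & y_{vv} \end{bmatrix}.$$
(ii) If $U = U(x, y)$, $V = V(x, y)$, $x = x(u, v)$, $y = y(u, v)$, show that
$$\frac{\partial(U, V)}{\partial(u, v)} = \frac{\partial(U, V)}{\partial(x, y)} \, \frac{\partial(x, y)}{\partial(u, v)}.$$
[Hint: Use (i)(a).]
(iii) If $x = r\cos(\theta)$, $y = r\sin(\theta)$, show
$$\frac{\partial(U, V)}{\partial(x, y)} = \frac{1}{r} \frac{\partial(U, V)}{\partial(r, \theta)}.$$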
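4.41. If $V = 3x^2 + 2y^2 + \varphi(x^2 - y^2)$, prove that
$$yV_x + xV_y = 10xy.$$
4.42. If $V = x^2 + y^2 + \varphi(xy) + \psi(y/x)$, prove that
$$x^2 V_{xx} - y^2 V_{yy} + xV_x - yV_y = 4(x^2 - y^2).$$
4.43. If $V = V(x, y)$, $x = \rho\cos(\varphi)$, $y = \rho\sin(\varphi)$, show that
$$\nabla^2 V := V_{xx} + V_{yy} = V_{\rho\rho} + \frac{1}{\rho} V_\rho + \frac{1}{\rho^2} V_{\varphi\varphi}.$$
[Hint: $\rho = \sqrt{x^2 + y^2}$, $\varphi = \arctan(y/x)$.]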
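4.44. If $V = V(r, \theta)$, $\rho = c^2/r$, $\varphi = 2\pi - \theta$, show that
$$\rho^2 V_{\rho\rho} + \rho V_\rho + V_{\varphi\varphi} = r^2 V_{rr} + rV_r + V_{\theta\theta}.$$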
4.45. If $V = V(x, y)$, $x = x(u, v)$, $y = y(u, v)$, and $x_u = y_v$, $x_v = -y_u$, show
$$\frac{V_{uu} + V_{vv}}{V_{xx} + V_{yy}} = x_u^2 + x_v^2 = y_u^2 + y_v^2.$$
4.46. (Euler's Theorem continued, see Exercise 31) If $V$ is a homogeneous function of $(x, y, z)$ of the $m$th degree, show that
$$x^2 V_{xx} + y^2 V_{yy} + z^2 V_{zz} + 2xyV_{xy} + 2yzV_{yz} + 2zxV_{zx} = m(m - 1)V.$$


4.47. If $z_y = F(z_x)$, $F' \neq 0$, then $z_{xx} z_{yy} = z_{xy}^2$.
4.48. If $z = xF(x + y) + G(x + y)$, show that
$$z_{xx} - 2z_{xy} + z_{yy} = 0.$$
4.49. Do the following:
(a) If $z = F(y + m_1 x) + G(y + m_2 x)$ and $m_1$, $m_2$ are the roots of the quadratic equation $am^2 + 2hm + b = 0$, then verify that $az_{xx} + 2hz_{xy} + bz_{yy} = 0$. Show this equation is satisfied by $z = xF(y + m_1 x) + G(y + m_1 x)$ if $m_1 = m_2$ is a double root of the quadratic equation.
(b) The equation of lateral vibration of a taut string is
$$z_{tt} - c^2 z_{xx} = 0.$$
Deduce from (a) that $z = F(x + ct) + G(x - ct)$ is a solution.
(c) A vibrating string for which initial displacement and velocity are specified is governed by relations of the form
$$z_{tt} - c^2 z_{xx} = 0, \qquad z(x, 0) = f(x), \qquad z_t(x, 0) = g(x).$$
Show that
$$z(x, t) = \frac{1}{2}\{f(x + ct) + f(x - ct)\} + \frac{1}{2c} \int_{x - ct}^{x + ct} g$$
is a solution to this problem.
4.50. Do the following:
(a) If $z = x\varphi(y/x) + \psi(y/x)$, then
$$x^2 z_{xx} + 2xyz_{xy} + y^2 z_{yy} = 0.$$
(b) If $z = x^3 \varphi(y/x) + \psi(y/x)/x^3$, then
$$x^2 z_{xx} + 2xyz_{xy} + y^2 z_{yy} + xz_x + yz_y = 9z.$$
[Hint: If you are observant you don't have to do all that differentiation.]
4.51. Discuss the nature of the stationary points of the functions
(a) $34x^2 - 24xy + 41y^2$. [Solution: $(0, 0)$ is a minimum.]
(b) $3x^2 + 4xy - 4y^2$. [Solution: $(0, 0)$ is a saddle.]
(c) $x^2 y - 4x^2 - y^2$. [Solution: $(0, 0)$ is a max; $(\pm 2\sqrt{2}, 4)$ are saddles.]
(d) $x^2 y + 2x^2 - 2xy + 3y^2 - 4x + 7y$. [Solution: $(1, -1)$ is a min; $(1 \pm \sqrt{6}, -2)$ are saddles.]
(e) $x^2 yz - 2xyz + x^2 z + x^2 + y^2 + z^2 + yz - 2xz - 2x + 2y + z$. [Solution: $(1, -1, 0)$ is a min; $(1 \pm \sqrt{2}, -2, 1)$ and $(1 \pm \sqrt{2}, 0, -1)$ are saddles.]


4.52. Show that the function
$$f(x, y) = (y - x^2)(y - 2x^2)$$
does not have a relative extremum at $(0, 0)$ even though it has a relative minimum at this point when the domain is restricted to any line $x = \alpha t$, $y = \beta t$. [In the $(x, y)$-plane sketch the curves $f(x, y) = 0$, then determine the regions $f(x, y) < 0$, $f(x, y) > 0$; check that on each line through $(0, 0)$, $f(x, y) > 0 = f(0, 0)$ if $(x, y) \neq (0, 0)$ is close enough to $(0, 0)$.]
4.53. Show that the box of maximum volume which can be inscribed in the ellipsoid
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$$
has sides of length $2a/\sqrt{3}$, $2b/\sqrt{3}$, $2c/\sqrt{3}$.
4.54. Suppose we are given $n$ points $(x_j, y_j) \in \mathbb{R}^2$ and desire to find a function $F(x) = Ax + B$ for which the quantity
$$\sum_{j=1}^n \big( F(x_j) - y_j \big)^2$$
is minimized. Show that this problem leads to the equations
$$A \sum_{j=1}^n x_j^2 + B \sum_{j=1}^n x_j = \sum_{j=1}^n x_j y_j,$$
$$A \sum_{j=1}^n x_j + nB = \sum_{j=1}^n y_j,$$
which are easily solved for $A$ and $B$. The line $y = Ax + B$ is the line which best fits the given set of points in the sense of least squares.
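Once the two normal equations of Exercise 4.54 are in hand, solving them is a $2 \times 2$ linear problem. The sketch below does this by Cramer's rule; the function name `fit_line` is illustrative, and the code assumes the $x_j$ are not all equal (otherwise the system is singular). It does not replace the derivation the exercise asks for.

```python
def fit_line(points):
    """Solve the least-squares normal equations
         A*sum(x^2) + B*sum(x) = sum(x*y)
         A*sum(x)   + n*B      = sum(y)
    for the line y = A*x + B through the given (x, y) points."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    det = n * sxx - sx * sx  # determinant of the 2 x 2 coefficient matrix
    if det == 0:
        raise ValueError("all x_j coincide; the line is not determined")
    A = (n * sxy - sx * sy) / det
    B = (sxx * sy - sx * sxy) / det
    return A, B

if __name__ == "__main__":
    # Points lying exactly on y = 2x + 1 are recovered exactly.
    print(fit_line([(0, 1), (1, 3), (2, 5)]))
```

For data that lie exactly on a line the fit returns that line; for noisy data it returns the minimizer of the sum of squared vertical deviations.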
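4.55. Let $\{\varphi_k\}$ be a sequence of real-valued continuous functions on $[a, b]$ such that
$$\int_a^b \varphi_k \varphi_m = \begin{cases} 1, & \text{if } k = m; \\ 0, & \text{if } k \neq m. \end{cases}$$
Let $f$ be a real-valued continuous function on $[a, b]$. Prove that the choice of constants $\alpha_1, \ldots, \alpha_\ell$ which minimizes the quantity
$$\int_a^b \left( f - \sum_{k=1}^\ell \alpha_k \varphi_k \right)^2,$$
for a given $\ell$, is
$$\alpha_k = \int_a^b f \varphi_k, \qquad k = 1, \ldots, \ell.$$
This problem arises in the theory of Fourier Series.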


4.56. The capacity of a condenser formed by two concentric spherical conductors of radii $a$ and $b$, $0 < a < b$, is
$$C(a, b) = \frac{ab}{b - a}.$$
Suppose that in measuring the radii $a$ and $b$ of the spheres the measurements are subject to errors of amount $\delta a$ and $\delta b$ respectively. Given that products of the errors are negligible relative to the other quantities involved, derive the following approximation for the error in $C$:
$$\frac{\delta C}{C} \approx \frac{\delta a}{a} \, \frac{b}{b - a} - \frac{\delta b}{b} \, \frac{a}{b - a}.$$
4.57. The breaking weight $W$ of a cantilever beam is given by the formula $W = kbd^2/\ell$, where $b$ is breadth, $\ell$ is length, $d$ is depth, and $k$ is a constant depending on the material of the beam. If the breadth is increased by 2% and the depth by 5%, show that the length should be increased by about 12% if the breaking weight is to remain unchanged.
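A quick numerical comparison shows how close the first-order (differential) estimate is to the exact answer. The sketch assumes the breaking weight has the form $W = k b d^2 / \ell$, with the length in the denominator, which is what the 12% figure requires; the function name is illustrative.

```python
def required_length_factor(breadth_factor, depth_factor):
    """Factor by which the length must grow so that W = k*b*d**2/length
    is unchanged when b and d are scaled by the given factors."""
    return breadth_factor * depth_factor**2

# Exact scaling for a 2% increase in breadth and a 5% increase in depth.
exact = required_length_factor(1.02, 1.05)

# First-order estimate: relative changes add, with the exponent 2 on d.
linear = 1 + 0.02 + 2 * 0.05

if __name__ == "__main__":
    print(exact, linear)
```

The exact factor is slightly larger than the linear estimate because the estimate drops the products of the small relative errors.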
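4.58. In a triangle $ABC$, the area is calculated from the elements $a$, $B$, $C$, the measurements being subject to errors $\delta a$, $\delta B$, $\delta C$. Show that the error $\delta S$ in the area is approximately given by
$$\frac{\delta S}{S} \approx 2\frac{\delta a}{a} + \frac{c\, \delta B}{a \sin(B)} + \frac{b\, \delta C}{a \sin(C)}.$$
4.59. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be of class $C^2$ on a convex subset $D$ of $\mathbb{R}^2$. Suppose that at each point $p \in D$, the matrix
$$\begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix}$$
is positive semidefinite. Prove that if $p, q \in D$, then
$$f\left( \frac{p + q}{2} \right) \le \frac{1}{2} \big( f(p) + f(q) \big).$$
What does this result mean geometrically?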

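4.4 Local Properties of $C^1$ Functions

Recall that if $f : \mathbb{R}^n \to \mathbb{R}^n$, that is $f(p) = (f_1(p), \ldots, f_n(p))$, $p = (x_1, \ldots, x_n)$, is differentiable at $p_0$, then the Jacobian of $f$ at $p_0$ is
$$\det f'(p_0) = \left| \begin{matrix} \frac{\partial f_1}{\partial x_1} & \ldots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \ldots & \frac{\partial f_n}{\partial x_n} \end{matrix} \right|_{p_0} = \frac{\partial(f_1, \ldots, f_n)}{\partial(x_1, \ldots, x_n)} = J_f(p_0).$$

Definition 4.48. A function $f$ is locally one-to-one on a subset $D$ of its domain if each point $p \in D$ has a neighborhood $U$ on which $f$ is one-to-one.

Lemma 4.49. If $f : \mathbb{R}^n \to \mathbb{R}^n$ is of class $C^1$ at $p_0$ and $J_f(p_0) \neq 0$, then there is a neighborhood $U$ of $p_0$ on which $f$ is one-to-one.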

Proof. Since the partials of $f$ are continuous at $p_0$ and $J_f(p_0) \neq 0$, there is a convex neighborhood $U$ of $p_0$ such that if $p_i \in U$, $i = 1, \ldots, n$, then
$$\det \begin{bmatrix} \frac{\partial f_1}{\partial x_1}(p_1) & \ldots & \frac{\partial f_1}{\partial x_n}(p_1) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1}(p_n) & \ldots & \frac{\partial f_n}{\partial x_n}(p_n) \end{bmatrix} \neq 0. \tag{4.16}$$
If $a, b \in U$, then, applying the Mean-Value Theorem to each component of $f$, we find
$$\begin{aligned} f_1(b) - f_1(a) &= \sum_{i=1}^n (b_i - a_i) \frac{\partial f_1}{\partial x_i}(p_1) \\ &\ \ \vdots \\ f_n(b) - f_n(a) &= \sum_{i=1}^n (b_i - a_i) \frac{\partial f_n}{\partial x_i}(p_n) \end{aligned} \tag{4.17}$$
where $p_i$, $i = 1, \ldots, n$, are some points on the line segment between $a$ and $b$. But (4.16) and (4.17) imply
$$\big( f_i(b) - f_i(a) = 0, \quad i = 1, \ldots, n \big) \implies b - a = 0,$$
since the system of equations has full rank. Thus, $f(a) = f(b)$ if and only if $a = b$, and $f$ is one-to-one on $U$.
The following is an immediate consequence of this lemma:
Theorem 4.50. Let $f : \mathbb{R}^n \to \mathbb{R}^m$, $m \ge n$. Suppose $D$ is an open subset of $\mathbb{R}^n$ and
(i) $f \in C^1(D)$,
(ii) $\operatorname{rank} f'(p) = n$ for all $p \in D$;
then $f$ is locally one-to-one on $D$.

Proof. Suppose $f = (f_1, \ldots, f_m)$, $m \ge n$. We wish to show that $f$ is one-to-one on a neighborhood $U$ of each point $p_0 \in D$. Since $\operatorname{rank} f'(p_0) = n$, we may assume without loss of generality that
$$\frac{\partial(f_1, \ldots, f_n)}{\partial(x_1, \ldots, x_n)}(p_0) \neq 0.$$
(This can be achieved by relabelling the $f$'s.) Thus, if $\tilde{f} : \mathbb{R}^n \to \mathbb{R}^n$ is defined by $\tilde{f} = (f_1, \ldots, f_n)$, then $\tilde{f} \in C^1(D)$ and $J_{\tilde{f}}(p_0) \neq 0$, so by the Lemma, $\tilde{f}$ is one-to-one on a neighborhood of $p_0$. But since $f(p) = f(q)$ implies $\tilde{f}(p) = \tilde{f}(q)$, $\tilde{f}$ being one-to-one implies $f$ is one-to-one.
Example 4.51 This example shows that in the preceding theorem, $f$ need not be globally one-to-one. Take $f(x, y) = (e^x \cos(y), e^x \sin(y))$. Then
$$J_f(x, y) = \left| \begin{matrix} e^x \cos(y) & -e^x \sin(y) \\ e^x \sin(y) & e^x \cos(y) \end{matrix} \right| = e^{2x} \neq 0, \qquad \text{for all } (x, y) \in \mathbb{R}^2.$$
Thus, from Theorem 4.50, $f$ is locally one-to-one on $\mathbb{R}^2$. However, $f$ is not globally one-to-one on $\mathbb{R}^2$ since
$$f(x, y + 2\pi) = f(x, y), \qquad \text{for all } (x, y) \in \mathbb{R}^2.$$
Note: The strip $\{(x, y) : x \in \mathbb{R}, y \in [-\pi, \pi)\}$ is mapped onto $\mathbb{R}^2 \setminus O$. In fact, the vertical lines $x = c$ are mapped to circles $u^2 + v^2 = e^{2c}$, and the horizontal lines $y = k$, $k \neq \pm\pi/2$, are mapped to rays along the lines $v = u\tan(k)$; $y = \pi/2$ is mapped to the positive $v$ axis and $y = -\pi/2$ to the negative $v$ axis. (See Figure 4.13.)

Figure 4.13. The image of a strip for Example 4.51.
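Both claims in Example 4.51, the Jacobian $e^{2x}$ and the $2\pi$-periodicity in $y$, can be spot-checked numerically. The following sketch uses illustrative function names and evaluates at sample points only; it is a check, not a proof.

```python
import math

def f(x, y):
    """The map (x, y) -> (e^x cos y, e^x sin y) of Example 4.51."""
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def jacobian_det(x, y):
    """Determinant of the Jacobian matrix, computed from its four entries."""
    ex = math.exp(x)
    du_dx, du_dy = ex * math.cos(y), -ex * math.sin(y)
    dv_dx, dv_dy = ex * math.sin(y), ex * math.cos(y)
    return du_dx * dv_dy - du_dy * dv_dx

if __name__ == "__main__":
    x, y = 0.3, 1.1
    print(jacobian_det(x, y), math.exp(2 * x))  # the two values agree
    print(f(x, y), f(x, y + 2 * math.pi))       # same image point
```

The nonvanishing determinant gives local injectivity, while the second printout exhibits two distinct points with the same image, so the map is not globally one-to-one.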

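Lemma 4.52. Suppose $f : \mathbb{R}^n \to \mathbb{R}^n$ is in $C^1(D)$ for an open set $D \subset \mathbb{R}^n$. If $J_f(p) \ne 0$ for all $p \in D$, then $f(D)$ is an open set of $\mathbb{R}^n$.
Proof. In order to prove $f(D)$ is open, we must show that for any $q_0 \in f(D)$, there is a $\delta_0 = \delta(q_0) > 0$ such that $B(q_0, \delta_0) \subset f(D)$. Let $q_0 = f(p_0)$ for some $p_0 \in D$. We may choose $\varepsilon_0 > 0$ so that both
(a) $\bar B(p_0, \varepsilon_0) \subset D$, since $D$ is open, and
(b) $f$ is one-to-one on $\bar B(p_0, \varepsilon_0)$ by Theorem 4.50.
Let $C = \partial B(p_0, \varepsilon_0) = \{p : |p - p_0| = \varepsilon_0\}$. Then $f(C)$ is compact (why?) and $q_0 = f(p_0) \notin f(C)$. Note that
\[ \delta = \inf\{|q - q_0| : q \in f(C)\} \implies \delta > 0 \quad\text{(Why?).} \]
(See Figure 4.14.)
We claim that $B(q_0, \delta/3) \subset f(D)$ and therefore $f(D)$ is open. To prove this claim, for an arbitrary $q \in B(q_0, \delta/3)$, we will show that $q \in f(D)$. To this end, consider the function $\phi : \bar B(p_0, \varepsilon_0) \subset \mathbb{R}^n \to \mathbb{R}$ defined by
\[ \phi(p) = |f(p) - q|^2, \quad p \in \bar B(p_0, \varepsilon_0). \]
Now $\phi$ is continuous and $\bar B(p_0, \varepsilon_0)$ is compact, so there exists
\[ p^* \in \bar B(p_0, \varepsilon_0) \quad\text{such that}\quad \phi(p^*) \le \phi(p) \quad\text{for all } p \in \bar B(p_0, \varepsilon_0). \]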


Figure 4.14. The point $q_0$ is away from the image of the boundary.
Now $p^* \notin C$ since otherwise we would have
\[ p^* \in C \implies \sqrt{\phi(p^*)} \ge \frac{2\delta}{3} > \frac{\delta}{3} > \sqrt{\phi(p_0)}, \]
contradicting that $\phi$ has a minimum at $p^*$ over $\bar B(p_0, \varepsilon_0)$. Therefore, $p^*$ is an interior minimum so the partial derivatives of $\phi$ are all zero at $p^*$ (Theorem 4.36). Thus, from the definition of $\phi$,
\[ \phi(p) = |f(p) - q|^2 = \sum_{i=1}^{n} (f_i(p) - q_i)^2 \implies \frac{\partial \phi}{\partial x_j}(p^*) = 0 = 2 \sum_{i=1}^{n} \frac{\partial f_i}{\partial x_j}(p^*)\,\bigl(f_i(p^*) - q_i\bigr), \quad j = 1, \dots, n. \]
But the last line can be viewed as a homogeneous system of linear equations with coefficient matrix having determinant $J_f(p^*) \ne 0$. Consequently, we must have
\[ \bigl\{ f_i(p^*) = q_i,\ i = 1, \dots, n \bigr\} \implies f(p^*) = q \implies q \in f(D). \]
The proof is complete, and $f(D)$ is open.

Remark: The condition that $J_f(p) \ne 0$ cannot be dropped even at a single point in $D$. Indeed, take $D = \mathbb{R}$ and $f(x) = x^2$. Then $f(\mathbb{R}) = [0, \infty)$ is not open even though $f'(x) \ne 0$ except at the single point $x = 0$.
Theorem 4.53. Suppose $f : D \subset \mathbb{R}^n \to \mathbb{R}^m$, $m \le n$, for some open set $D$. If $f \in C^1(D)$ and the Jacobian matrix has full rank, i.e. $\operatorname{rank} f'(p) = m$, for every $p \in D$, then $f(D)$ is open.
Proof. Given $c \in D$, we must prove that $f(c)$ belongs to the interior of $f(D)$. By relabeling the $x_j$ if necessary, we may assume that
\[ \frac{\partial(f_1, \dots, f_m)}{\partial(x_1, \dots, x_m)}(c) \ne 0. \]
Since $f \in C^1(D)$, there is an open neighborhood $U$ of $c$ such that
\[ \frac{\partial(f_1, \dots, f_m)}{\partial(x_1, \dots, x_m)}(p) \ne 0, \quad p \in U. \]


Define $\tilde f : \mathbb{R}^m \to \mathbb{R}^m$ on the open set
\[ \tilde U := \{(x_1, \dots, x_m) : (x_1, \dots, x_m, c_{m+1}, \dots, c_n) \in U\} \]
by
\[ \tilde f(x_1, \dots, x_m) = f(x_1, \dots, x_m, c_{m+1}, \dots, c_n). \]
Then $\tilde f \in C^1(\tilde U)$ and $J_{\tilde f}(x_1, \dots, x_m) \ne 0$ for all $(x_1, \dots, x_m) \in \tilde U$. From Lemma 4.52, $\tilde f(\tilde U)$ is an open set in $\mathbb{R}^m$. Hence
\[ f(c) = \tilde f(c_1, \dots, c_m) \in \tilde f(\tilde U) \subset f(U) \subset f(D) \implies f(c) \in \operatorname{int} f(D), \quad c \in D, \]
and so $f(D)$ is open.

Question: In the proof it was stated that $\tilde U$ was open. Why is that true?

Lemma 4.54. If $f$ is continuous and one-to-one on a compact set $S$, then the inverse function $f^{-1}$ is continuous on $f(S)$.
Proof. Let $q_0 \in f(S)$. We wish to show $f^{-1}$ is continuous at $q_0$. If $f^{-1}$ were not continuous, then for some $\varepsilon_0 > 0$, there is a sequence of points $\{q_k\} \subset f(S)$ for which
\[ \lim_{k \to \infty} q_k = q_0 \quad\text{and}\quad \bigl|f^{-1}(q_k) - f^{-1}(q_0)\bigr| \ge \varepsilon_0. \tag{4.18} \]
Now the image of the sequence, $\{f^{-1}(q_k)\}$, is a sequence in the compact set $S$, hence there is a subsequence $\{f^{-1}(q_{k_j})\}$ such that
\[ \lim_{j \to \infty} f^{-1}(q_{k_j}) = p_1 \in S. \tag{4.19} \]
By (4.18), $p_1 \ne f^{-1}(q_0)$. But $f$ is continuous at $p_1$ and $f$ is one-to-one so, from (4.19),
\[ \lim_{j \to \infty} q_{k_j} = \lim_{j \to \infty} f\bigl(f^{-1}(q_{k_j})\bigr) = f(p_1) \ne f\bigl(f^{-1}(q_0)\bigr) = q_0, \]
a contradiction. Thus, $f^{-1}$ must be continuous at $q_0$.

In general, the requirement that $S$ be compact cannot be dropped. It cannot even be replaced by the requirement that $f(S)$ be compact, as the next example shows:
Example 4.55 Let $f : [0, 2\pi) \to \{x^2 + y^2 = 1\}$ be defined by $f(\theta) = (\cos(\theta), \sin(\theta))$. Then $f \in C([0, 2\pi))$, $f$ is one-to-one and $f([0, 2\pi))$ is compact in $\mathbb{R}^2$. However, $f^{-1}$ is discontinuous at $(1, 0)$ (nearby points on the circle on either side of this point are at opposite ends of the interval under $f^{-1}$).
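
The jump of the inverse in Example 4.55 is easy to see numerically. This is a small sketch under the assumption that the inverse is computed with `math.atan2` reduced modulo $2\pi$; the names `f_inverse`, `above`, `below` and the offset `eps` are hypothetical and chosen only for illustration.

```python
import math

def f(theta):
    """f(theta) = (cos theta, sin theta) on [0, 2*pi), as in Example 4.55."""
    return (math.cos(theta), math.sin(theta))

def f_inverse(u, v):
    """Angle in [0, 2*pi) of the point (u, v) on the unit circle."""
    return math.atan2(v, u) % (2 * math.pi)

eps = 1e-3  # small angular offset from the point (1, 0)

# A point on the circle just above (1, 0) and one just below it.
above = f_inverse(*f(eps))
below = f_inverse(math.cos(eps), -math.sin(eps))

# The two points are close on the circle, but their preimages are far apart.
gap = below - above
print(above, below, gap)
```

The gap stays close to $2\pi$ however small `eps` is taken, which is exactly the discontinuity of $f^{-1}$ at $(1, 0)$.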


Lemma 4.56. Let $f : \mathbb{R}^n \to \mathbb{R}^n$. If $f \in C^1$ in a neighborhood of $p_0$ and $J_f(p_0) \ne 0$, then there is a neighborhood $U$ of $p_0$ and a constant $m > 0$ such that
\[ p \in U \implies |f(p) - f(p_0)| \ge m\,|p - p_0|. \]
Proof. Since $J_f(p_0) \ne 0$, the differential $Df(p_0) : \mathbb{R}^n \to \mathbb{R}^n$ is one-to-one. Therefore, there exists $m > 0$ such that
\[ |Df(p_0)(u)| \ge 2m|u| \quad\text{for all } u \in \mathbb{R}^n \text{ (Theorem 4.7)}, \]
\[ \implies |Df(p_0)(p - p_0)| \ge 2m|p - p_0| \quad\text{for all } p. \tag{4.20} \]
But from the definition of $Df(p_0)$, there is a neighborhood $U$ of $p_0$ such that
\[ p \in U \implies |f(p) - f(p_0) - Df(p_0)(p - p_0)| \le m|p - p_0|. \tag{4.21} \]
But then, for $p \in U$,
\begin{align*}
m|p - p_0| &\ge |f(p) - f(p_0) - Df(p_0)(p - p_0)| && \text{by (4.21)} \\
&\ge |Df(p_0)(p - p_0)| - |f(p) - f(p_0)| \\
&\ge 2m|p - p_0| - |f(p) - f(p_0)| && \text{by (4.20)} \\
\implies m|p - p_0| &\le |f(p) - f(p_0)|.
\end{align*}

We are now in a position to say something about the differentiability of the inverse function.
Theorem 4.57. Let $f : \mathbb{R}^n \to \mathbb{R}^n$, and suppose
(i) $D$ is an open set of $\mathbb{R}^n$, $f \in C^1(D)$, and $J_f(p) \ne 0$ at every $p \in D$, and
(ii) $f$ is one-to-one on $D$.
Then $f^{-1} \in C^1(f(D))$ and $Df^{-1} = (Df)^{-1}$ (the inverse matrix).
Proof. Let $g : \mathbb{R}^n \to \mathbb{R}^n$ be defined by
\[ g(p) := \frac{1}{|p - p_0|}\Bigl(f(p) - f(p_0) - Df(p_0)(p - p_0)\Bigr) \tag{4.22} \]
\[ \implies \lim_{p \to p_0} g(p) = O \quad\text{(by definition of } Df(p_0)\text{)}. \tag{4.23} \]
Since $J_f(p_0) \ne 0$, the inverse matrix $\bigl(Df(p_0)\bigr)^{-1}$ exists and from (4.22),
\begin{align*}
|p - p_0|\,\bigl(Df(p_0)\bigr)^{-1} g(p) &= \bigl(Df(p_0)\bigr)^{-1}\Bigl(f(p) - f(p_0) - Df(p_0)(p - p_0)\Bigr) \\
&= \bigl(Df(p_0)\bigr)^{-1}\bigl(f(p) - f(p_0)\bigr) - (p - p_0). \tag{4.24}
\end{align*}


By Theorem 4.53, $f(D)$ is open and, in particular, it is a neighborhood of $f(p_0)$. Hence, with $q_0 = f(p_0)$ and $q = f(p)$, we may deduce from (4.24) and Lemma 4.56 that
\[ \frac{1}{m}\,|q - q_0|\,\Bigl|\bigl(Df(p_0)\bigr)^{-1}(g(p))\Bigr| \ge \Bigl|f^{-1}(q) - f^{-1}(q_0) - \bigl(Df(p_0)\bigr)^{-1}(q - q_0)\Bigr| \tag{4.25} \]
if $p$ is in some neighborhood of $p_0$. Furthermore, from Lemma 4.54, $f$ and $f^{-1}$ are continuous at $p_0$ and $q_0$ respectively, so that
\[ q \to q_0 \iff f^{-1}(q) \to f^{-1}(q_0) \iff p \to p_0. \]
Therefore, from (4.25) and (4.23) we deduce
\[ \lim_{q \to q_0} \frac{\Bigl|f^{-1}(q) - f^{-1}(q_0) - \bigl(Df(p_0)\bigr)^{-1}(q - q_0)\Bigr|}{|q - q_0|} = 0, \]
since
\[ \lim_{q \to q_0} \bigl(Df(p_0)\bigr)^{-1}(g(p)) = \lim_{p \to p_0} \bigl(Df(p_0)\bigr)^{-1}(g(p)) = 0. \]
Thus, since $\bigl(Df(p_0)\bigr)^{-1}$ is linear, $Df^{-1}(q_0)$ exists and equals $\bigl(Df(p_0)\bigr)^{-1}$.
From the fact that $Df^{-1}(f(p_0)) = \bigl(Df(p_0)\bigr)^{-1}$, it follows that $f^{-1} \in C^1(f(D))$ since the partial derivatives of $f^{-1}$ are rational functions of the partial derivatives of $f$ in which the denominator $J_f(p)$ is not zero.
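Example 4.58 We have seen that $f(x, y) = (e^x \cos(y), e^x \sin(y))$ is one-to-one on the strip $0 \le y < 2\pi$. Recall
\[ \det f'(x, y) = \det\begin{pmatrix} e^x \cos(y) & -e^x \sin(y) \\ e^x \sin(y) & e^x \cos(y) \end{pmatrix} = e^{2x} \ne 0, \]
\[ \implies [f'(x, y)]^{-1} = e^{-2x}\begin{pmatrix} e^x \cos(y) & e^x \sin(y) \\ -e^x \sin(y) & e^x \cos(y) \end{pmatrix}. \]
To find $f^{-1}$, solve $(u, v) = (e^x \cos(y), e^x \sin(y))$ for $x$ and $y$:
\[ u^2 + v^2 = e^{2x}, \quad \tan(y) = \frac{v}{u} \implies x = \frac{1}{2}\log(u^2 + v^2), \quad y = \arctan\Bigl(\frac{v}{u}\Bigr), \]
and
\[ f^{-1}(u, v) = \Bigl(\frac{1}{2}\log(u^2 + v^2),\ \arctan\Bigl(\frac{v}{u}\Bigr)\Bigr). \]
Then
\[ (f^{-1})'(u, v) = \begin{pmatrix} \dfrac{u}{u^2 + v^2} & \dfrac{v}{u^2 + v^2} \\[2mm] \dfrac{-v}{u^2 + v^2} & \dfrac{u}{u^2 + v^2} \end{pmatrix} = \frac{1}{u^2 + v^2}\begin{pmatrix} u & v \\ -v & u \end{pmatrix} = e^{-2x}\begin{pmatrix} e^x \cos(y) & e^x \sin(y) \\ -e^x \sin(y) & e^x \cos(y) \end{pmatrix}. \]
Thus, we see that $Df^{-1}(u, v) = \bigl(Df(x, y)\bigr)^{-1}$.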
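
The identity $Df^{-1}(u, v) = (Df(x, y))^{-1}$ from Example 4.58 can be checked at a sample point. The sketch below is illustrative only: the function names and the sample point are assumptions of this example, and $y$ is taken in $(-\pi/2, \pi/2)$ so that the formula $y = \arctan(v/u)$ applies.

```python
import math

def df(x, y):
    """Jacobian matrix of f(x, y) = (e^x cos y, e^x sin y)."""
    e = math.exp(x)
    return [[e * math.cos(y), -e * math.sin(y)],
            [e * math.sin(y),  e * math.cos(y)]]

def d_f_inverse(u, v):
    """Jacobian matrix of f^{-1}(u, v) = (log(u^2+v^2)/2, arctan(v/u))."""
    r2 = u * u + v * v
    return [[u / r2, v / r2],
            [-v / r2, u / r2]]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det],
            [-c / det, a / det]]

x, y = 0.4, 0.7  # sample point with -pi/2 < y < pi/2
u, v = math.exp(x) * math.cos(y), math.exp(x) * math.sin(y)

lhs = d_f_inverse(u, v)       # derivative of the inverse at (u, v)
rhs = inverse_2x2(df(x, y))   # inverse of the derivative at (x, y)

max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_diff)
```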

Exercises
4.60. In Exercise 29 a sufficient condition was given for $f : \mathbb{R}^n \to \mathbb{R}^n$ to be globally one-to-one. Verify that this condition is not satisfied by Example 4.51. (If you did not do Exercise 29, a proof is given in Lemma 4.49.)
4.61. Show that if $f : \mathbb{R} \to \mathbb{R}$ and $f'(x) \ne 0$ for each $x \in \mathbb{R}$, then $f$ is one-to-one globally on $\mathbb{R}$.
4.62. Let $f : \mathbb{R} \to \mathbb{R}$. Show that if $f$ is continuous and one-to-one on a connected subset $S$ of $\mathbb{R}$, then $f^{-1}$ is continuous on $f(S)$.
4.63. Let $y = y(x)$, that is $y_i = y_i(x_1, \dots, x_n)$, for $i = 1, \dots, n$, be $C^1$. Show that
\[ \frac{\partial(y_1, \dots, y_n)}{\partial(x_1, \dots, x_n)} \ne 0 \implies \frac{\partial(y_1, \dots, y_n)}{\partial(x_1, \dots, x_n)} \cdot \frac{\partial(x_1, \dots, x_n)}{\partial(y_1, \dots, y_n)} = 1. \]
4.64. Let $f(x, y) = (x^2, y/x)$ when $x > 0$. Find $f'(x, y)$. Show that $f$ is one-to-one on its domain by finding $f^{-1}$. Check that $(f')^{-1} = (f^{-1})'$.
4.65. Let
\[ f(x, y) = \Bigl(\frac{x}{\sqrt{x^2 + y^2}},\ \frac{y}{\sqrt{x^2 + y^2}}\Bigr). \]
Show that $J_f(x, y) = 0$ for all $(x, y) \in \mathbb{R}^2 \setminus O$ and $f$ is not locally one-to-one anywhere on its domain. Show that the range of $f$ is the circle $u^2 + v^2 = 1$ and thus contains no open subset.

4.5 Implicit Function Theorem

Question: If we are given a system of $n$ equations in $m + n$ unknowns, when and how can we choose $n$ of the variables, call them $y_1, \dots, y_n$, to be defined implicitly as functions of the remaining $m$ variables, call them $x_1, \dots, x_m$?
Writing the independent variables $x_1, \dots, x_m$ first and the dependent variables $y_1, \dots, y_n$ second, we are asking: when does the system of equations
\[ f_i(x_1, \dots, x_m, y_1, \dots, y_n) = 0, \quad i = 1, 2, \dots, n \tag{4.26} \]
uniquely define $y_1, \dots, y_n$ implicitly as functions of $x_1, \dots, x_m$, i.e. when are there functions $\varphi_i$ so that
\[ y_i = \varphi_i(x_1, \dots, x_m), \quad i = 1, 2, \dots, n? \tag{4.27} \]
Of course, the way the question was originally stated, part of the problem is to find which are the independent variables and which are the dependent variables. The equations will not be written for you with the independent variables so conveniently


labelled. However, since we can always rearrange variables (once we identify them), for the sake of the statement of the Theorem, we assume the given order.
If the equations (4.26) are linear in the variables $y_i$, then we know we have a solution if the coefficient matrix is nonsingular. But observe, the coefficient matrix of linear functions is nothing more than the matrix of partial derivatives of those functions with respect to the linear variables. This suggests that the answer lies in whether or not the Jacobian of the functions with respect to the dependent variables is nonzero.
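Theorem 4.59 (Implicit Function Theorem). Suppose that
(i) $f : \mathbb{R}^{m+n} \to \mathbb{R}^n$ is $C^1(D)$ for some open domain $D \subset \mathbb{R}^{m+n}$, and
(ii) there is a point $(p_0, q_0)$ in $D$ for which
\[ f(p_0, q_0) = O \quad\text{and}\quad \frac{\partial(f_1, \dots, f_n)}{\partial(y_1, \dots, y_n)}(p_0, q_0) \ne 0, \tag{4.28} \]
where $p = (x_1, \dots, x_m)$ denotes points in $\mathbb{R}^m$, $q = (y_1, \dots, y_n)$ denotes points in $\mathbb{R}^n$, and $(p, q)$ denotes points in $\mathbb{R}^{m+n}$.
Then there is a unique function $\varphi : \mathbb{R}^m \to \mathbb{R}^n$ defined in a neighborhood $U$ of $p_0$ such that $\varphi \in C^1(U)$ and
\[ \varphi(p_0) = q_0 \quad\text{and}\quad f(p, \varphi(p)) = O, \quad p \in U. \tag{4.29} \]
Moreover, if $y_i = \varphi_i(p)$, $i = 1, \dots, n$, we have
\[ \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_m} \end{pmatrix} = -\begin{pmatrix} \frac{\partial f_1}{\partial y_1} & \cdots & \frac{\partial f_1}{\partial y_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial y_1} & \cdots & \frac{\partial f_n}{\partial y_n} \end{pmatrix} \begin{pmatrix} \frac{\partial \varphi_1}{\partial x_1} & \cdots & \frac{\partial \varphi_1}{\partial x_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial \varphi_n}{\partial x_1} & \cdots & \frac{\partial \varphi_n}{\partial x_m} \end{pmatrix} \quad \Bigl(\Bigl[\tfrac{\partial f}{\partial x}\Bigr] = -\Bigl[\tfrac{\partial f}{\partial y}\Bigr]\Bigl[\tfrac{\partial \varphi}{\partial x}\Bigr]\Bigr) \tag{4.30} \]
and
\[ \frac{\partial y_i}{\partial x_j} = \frac{\partial \varphi_i}{\partial x_j} = -\frac{\dfrac{\partial(f_1, \dots, f_n)}{\partial(y_1, \dots, y_{i-1}, x_j, y_{i+1}, \dots, y_n)}}{\dfrac{\partial(f_1, \dots, f_n)}{\partial(y_1, \dots, y_n)}}, \quad i = 1, \dots, n, \quad j = 1, \dots, m. \tag{4.31} \]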

Before we proceed to the proof, we illustrate the theorem by several examples:

Example 4.60 Take $f(x, y) = x^2 - y$ and note $f(0, 0) = 0$, $(\partial f/\partial y)(0, 0) = -1$. Hence, $f(x, y) = 0$ can be solved by $y = \varphi(x) = x^2$ in a neighborhood of $x = 0$. Notice further that
\[ \frac{dy}{dx} = 2x = -\frac{2x}{-1} = -\frac{\partial f/\partial x}{\partial f/\partial y}. \]
i

What about solving for $x$ in terms of $y$? Now $f(0, 0) = 0$ and $(\partial f/\partial x)(0, 0) = 0$ means that the condition (4.28) is not satisfied. The equation $f(x, y) = 0$ cannot be solved in the form $x = \psi(y)$ with $\psi$ defined in a full neighborhood of $y = 0$. Notice there are two solutions of the form $x = \pm\sqrt{y}$ defined on $[0, \infty)$, but even these are not $C^1$ at $y = 0$. There are infinitely many discontinuous solutions defined on $[0, \infty)$; for example, one is $\psi(y) = \sqrt{y}$, $y \in \mathbb{Q}$, $\psi(y) = -\sqrt{y}$, $y \notin \mathbb{Q}$.
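
Both halves of Example 4.60 can be mirrored in a few lines of code. This is a hedged sketch: `implicit_slope` is a hypothetical helper that returns $-f_{\text{indep}}/f_{\text{dep}}$ when the Jacobian condition (4.28) holds and `None` when it fails, and the equation is taken to be $f(x, y) = x^2 - y = 0$.

```python
def implicit_slope(df_dindep, df_ddep):
    """Return -f_indep / f_dep, or None when the hypothesis f_dep != 0 fails."""
    if df_ddep == 0:
        return None
    return -df_dindep / df_ddep

def fx(x, y):
    """Partial derivative of f(x, y) = x^2 - y with respect to x."""
    return 2.0 * x

def fy(x, y):
    """Partial derivative of f(x, y) = x^2 - y with respect to y."""
    return -1.0

# Solving for y in terms of x: the condition f_y != 0 holds everywhere.
x0 = 1.5
y0 = x0 ** 2
dy_dx = implicit_slope(fx(x0, y0), fy(x0, y0))

# Solving for x in terms of y at the origin: f_x(0, 0) = 0, so the test fails.
dx_dy_at_origin = implicit_slope(fy(0.0, 0.0), fx(0.0, 0.0))

print(dy_dx, dx_dy_at_origin)
```

A `None` result only says that the theorem gives no information at that point; it does not by itself prove that no solution exists.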


Example 4.61 Given the equations
\begin{align*}
x + y + z &= 0 & (f_1(x, y, z) = 0) \\
x^2 + y^2 + z^2 - 2xz - 1 &= 0 & (f_2(x, y, z) = 0)
\end{align*}
which of the variables can be solved in terms of the others and at what points?
Basically, we want to test when (4.28) holds. Therefore, we choose pairs of variables and look at the Jacobian:
\[ \frac{\partial(f_1, f_2)}{\partial(x, y)} = \begin{vmatrix} 1 & 1 \\ 2x - 2z & 2y \end{vmatrix} = 2(y - x + z). \]
Therefore, we can solve for $x, y$ as a function of $z$ in a neighborhood of any point except those lying on the plane $x - z - y = 0$. Substituting $x = -z - y$ into the second equation gives
\[ 4z^2 + 4zy + 2y^2 - 1 = 0, \quad\text{or}\quad z^2 + (z + y)^2 - \frac{1}{2} = 0, \]
which implies $y = -z \pm \sqrt{\tfrac{1}{2} - z^2}$. Substituting this in the first equation gives $x$ as a function of $z$.
Testing another pair of variables, we find



(f1 , f2 )
1
1
= 0.
=
2x 2z 2z 2x
(x, z)

Hence, we can never solve for x, z as a function of y. One can check this directly:
from the first equation y = (x + z) which substituted into the second equation
gives 2(x + z)2 1 = 0, or 2y 2 = 1. Thus, eliminating x also eliminates z.
Since the equations are symmetric in x and z, it should be no surprise to
discover that we can solve for z, y as a function of x in a neighborhood of any point
except those lying on the plane x z y = 0.
Example 4.62 Consider the equations
\begin{align*}
2x^2 - u + v &= 0 & (f_1(x, y, z, u, v) = 0) \\
2y^3 - u - v &= 0 & (f_2(x, y, z, u, v) = 0) \\
z + u - v^2 &= 0 & (f_3(x, y, z, u, v) = 0)
\end{align*}
Checking several possibilities, we find
\begin{align*}
\frac{\partial(f_1, f_2, f_3)}{\partial(u, v, z)} &= \begin{vmatrix} -1 & 1 & 0 \\ -1 & -1 & 0 \\ 1 & -2v & 1 \end{vmatrix} = 2, \\
\frac{\partial(f_1, f_2, f_3)}{\partial(u, v, x)} &= \begin{vmatrix} -1 & 1 & 4x \\ -1 & -1 & 0 \\ 1 & -2v & 0 \end{vmatrix} = 4x(2v + 1), \\
\frac{\partial(f_1, f_2, f_3)}{\partial(u, v, y)} &= \begin{vmatrix} -1 & 1 & 0 \\ -1 & -1 & 6y^2 \\ 1 & -2v & 0 \end{vmatrix} = -6y^2(2v - 1), \\
\frac{\partial(f_1, f_2, f_3)}{\partial(x, y, z)} &= \begin{vmatrix} 4x & 0 & 0 \\ 0 & 6y^2 & 0 \\ 0 & 0 & 1 \end{vmatrix} = 24xy^2.
\end{align*}
This shows that we can always solve for $u, v, z$ in terms of $x$ and $y$, but for the other combinations tested, there are points at which we cannot apply the theorem. The first case leads to the solution
\[ v = y^3 - x^2, \quad u = x^2 + y^3, \quad z = -x^2 - y^3 + y^6 - 2y^3x^2 + x^4. \]
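
The explicit solution in Example 4.62 can be substituted back into the system as a sanity check. The sketch assumes the system reads $2x^2 - u + v = 0$, $2y^3 - u - v = 0$, $z + u - v^2 = 0$; the helper names `solve_uvz` and `residuals` and the sample grid are illustrative only.

```python
def solve_uvz(x, y):
    """Closed-form solution (u, v, z) of the system in terms of x and y."""
    u = x ** 2 + y ** 3
    v = y ** 3 - x ** 2
    z = v ** 2 - u  # expands to -x^2 - y^3 + y^6 - 2 y^3 x^2 + x^4
    return u, v, z

def residuals(x, y):
    """Left-hand sides of the three equations at the computed solution."""
    u, v, z = solve_uvz(x, y)
    return (2 * x ** 2 - u + v,
            2 * y ** 3 - u - v,
            z + u - v ** 2)

# Largest residual over a small grid of sample points.
worst = max(
    abs(r)
    for x in (-1.0, 0.5, 2.0)
    for y in (-0.5, 1.0, 1.5)
    for r in residuals(x, y)
)
print(worst)
```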

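Example 4.63 Consider the equations
\[ x^2 - yu = 0, \qquad xy + uv = 0. \tag{4.32} \]
We will view this as
\[ f(x, y, u, v) = (x^2 - yu,\ xy + uv) = O \]
and define implicitly $u, v$ as functions of $x$ and $y$ in a neighborhood of $(x_0, y_0)$. This requires from (4.28) that we have a point $(x_0, y_0, u_0, v_0)$ such that
\[ x_0^2 - y_0u_0 = 0, \quad x_0y_0 + u_0v_0 = 0, \quad\text{and}\quad \frac{\partial(f_1, f_2)}{\partial(u, v)} = \begin{vmatrix} -y_0 & 0 \\ v_0 & u_0 \end{vmatrix} = -y_0u_0 \ne 0. \]
Now, the last equation implies $y_0 \ne 0$ and $u_0 \ne 0$, and putting this in (4.32) indicates that $x_0 \ne 0$ as well. The equations are easy to solve as
\[ u = \frac{x^2}{y}, \qquad v = -\frac{y^2}{x}. \]
According to (4.31), we should have
\begin{align*}
\frac{\partial u}{\partial x} &= -\frac{\begin{vmatrix} 2x & 0 \\ y & u \end{vmatrix}}{-yu} = \frac{2x}{y}, &
\frac{\partial u}{\partial y} &= -\frac{\begin{vmatrix} -u & 0 \\ x & u \end{vmatrix}}{-yu} = -\frac{u}{y} = -\frac{x^2}{y^2}, \\
\frac{\partial v}{\partial x} &= -\frac{\begin{vmatrix} -y & 2x \\ v & y \end{vmatrix}}{-yu} = -\frac{y}{u} - \frac{2xv}{yu} = -\frac{y^2}{x^2} + \frac{2y^2}{x^2} = \frac{y^2}{x^2}, &
\frac{\partial v}{\partial y} &= -\frac{\begin{vmatrix} -y & -u \\ v & x \end{vmatrix}}{-yu} = -\frac{x}{u} + \frac{v}{y} = -\frac{2y}{x}.
\end{align*}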
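
The four partial derivatives of Example 4.63 can be compared against finite differences of the explicit solution $u = x^2/y$, $v = -y^2/x$. This is a sketch under those assumptions; the helper `partial`, the dictionary keys, and the sample point are hypothetical names introduced for the check.

```python
def u(x, y):
    """Explicit solution u = x^2 / y of the system (4.32)."""
    return x ** 2 / y

def v(x, y):
    """Explicit solution v = -y^2 / x of the system (4.32)."""
    return -y ** 2 / x

def partial(g, x, y, wrt, h=1e-6):
    """Central-difference partial derivative of g with respect to 'x' or 'y'."""
    if wrt == "x":
        return (g(x + h, y) - g(x - h, y)) / (2 * h)
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

x0, y0 = 1.2, 0.8  # sample point with x0, y0 nonzero
u0, v0 = u(x0, y0), v(x0, y0)

# Values predicted by the Implicit Function Theorem, written in terms of x, y, u, v.
formulas = {
    ("u", "x"): 2 * x0 / y0,
    ("u", "y"): -u0 / y0,
    ("v", "x"): -y0 / u0 - 2 * x0 * v0 / (y0 * u0),
    ("v", "y"): -x0 / u0 + v0 / y0,
}

# The same derivatives estimated directly from the explicit solution.
numeric = {
    ("u", "x"): partial(u, x0, y0, "x"),
    ("u", "y"): partial(u, x0, y0, "y"),
    ("v", "x"): partial(v, x0, y0, "x"),
    ("v", "y"): partial(v, x0, y0, "y"),
}

max_error = max(abs(formulas[k] - numeric[k]) for k in formulas)
print(max_error)
```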


You will see from the simple exercises for this section that when the basic condition on the Jacobian in (4.28) is not satisfied there may be no solution, or no $C^1$ solution, or indeed infinitely many solutions.

Proof of Theorem 4.59. Consider the function $F : \mathbb{R}^{n+m} \to \mathbb{R}^{n+m}$ defined by
\[ F(p, q) = (p, f(p, q)). \]
Then clearly $f \in C^1(D)$ implies $F \in C^1(D)$ and
\[ J_F(p_0, q_0) = \begin{vmatrix} I_{m \times m} & O_{m \times n} \\[1mm] \Bigl[\dfrac{\partial f_i}{\partial x_j}\Bigr] & \Bigl[\dfrac{\partial f_i}{\partial y_k}\Bigr] \end{vmatrix}_{(p_0, q_0)} = \frac{\partial(f_1, \dots, f_n)}{\partial(y_1, \dots, y_n)}(p_0, q_0) \ne 0. \]
By Theorem 4.50, $F$ has an inverse function in a neighborhood $W$ of $(p_0, q_0)$ and by Theorem 4.57, $F^{-1} \in C^1(F(W))$, and $F(W)$ is a neighborhood (Theorem 4.53) of $(p_0, O) = F(p_0, q_0)$.

Figure 4.15. Illustrating the proof of the Implicit Function Theorem.

Now setting
\begin{align*}
F(p, q) &= (p, f(p, q)) = (u, v) \quad\text{(notice } u = p\text{)} \\
\implies (p, q) &= F^{-1}(u, v) = (u, \psi(u, v)) \tag{4.33} \\
\implies p &= u \quad\text{and}\quad q = \psi(u, v) \\
\implies (u, v) &= F(F^{-1}(u, v)) = F(u, \psi(u, v)) = \bigl(u, f(u, \psi(u, v))\bigr)
\end{align*}
for all $(u, v)$ in the neighborhood $F(W)$ of $(p_0, O)$. In particular,
\[ v = f(u, \psi(u, v)) \quad (u = p) \]
and thus,
\[ O = f(p, \psi(p, O)) \]
for all $p$ in a neighborhood $U$ of $p_0$. Therefore,
\[ O = f(p, \varphi(p)) \quad\text{if}\quad \varphi(p) = \psi(p, O), \quad\text{and}\quad \varphi(p_0) = \psi(p_0, O) = q_0 \]


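so that (4.29) holds. Also $\varphi \in C^1(U)$ follows from (4.33) since $F^{-1} \in C^1(F(W))$.
To prove (4.31), we apply the Chain Rule to $O = f(p, \varphi(p))$ to obtain
\[ O = Df(p, \varphi(p)) \begin{pmatrix} I_{m \times m} \\ D\varphi(p) \end{pmatrix} \implies 0 = \frac{\partial f_i}{\partial x_j} + \sum_{k=1}^{n} \frac{\partial f_i}{\partial y_k} \frac{\partial \varphi_k}{\partial x_j}, \quad i = 1, \dots, n, \quad j = 1, \dots, m, \]
which gives (4.30). Viewing that as a linear system of equations for the unknowns $\partial \varphi_k/\partial x_j$, $k = 1, \dots, n$, Cramer's Rule gives (4.31).
Remark: The function $\varphi$ is the unique solution to $f(p, \varphi(p)) = O$ with $\varphi(p_0) = q_0$ since if to the contrary we had two solutions
\[ f(p, \varphi_1(p)) = f(p, \varphi_2(p)) = O \quad\text{and}\quad \varphi_1(p) \ne \varphi_2(p) \]
for some $p \in U$, then
\[ F(p, \varphi_1(p)) = F(p, \varphi_2(p)) = (p, O), \]
contradicting the fact that $F$ is one-to-one on $W$. This means that the graph of $\varphi$, the set $\{(p, \varphi(p)) : p \in U\}$, is the whole set $f^{-1}(O) \cap W$.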
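Corollary 4.64. Let $f : \mathbb{R}^2 \to \mathbb{R}$. If $f$ is of class $C^1$ in a neighborhood of $(x_0, y_0)$ with
\[ f(x_0, y_0) = 0 \quad\text{and}\quad \frac{\partial f}{\partial y}(x_0, y_0) \ne 0, \]
then the equation $f(x, y) = 0$ has a unique solution $y = \varphi(x)$ with $\varphi(x_0) = y_0$, which exists and is continuously differentiable in a neighborhood $U$ of $x_0$ and
\[ \varphi'(x) = \frac{dy}{dx} = -\frac{\partial f/\partial x}{\partial f/\partial y}. \]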
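
As an illustration of Corollary 4.64, the sketch below solves $f(x, y) = x^2 + y^2 - 1 = 0$ for $y$ near the point $(0.6, 0.8)$ by Newton's method in $y$ and compares a finite-difference slope of the resulting $\varphi$ with $-f_x/f_y$. The choice of the circle, the Newton iteration, and all names here are assumptions of this example, not part of the notes.

```python
def f(x, y):
    """Sample equation f(x, y) = x^2 + y^2 - 1 (the unit circle)."""
    return x * x + y * y - 1.0

def f_x(x, y):
    return 2.0 * x

def f_y(x, y):
    return 2.0 * y

def phi(x, y_guess=0.8, steps=50):
    """Solve f(x, y) = 0 for y near y_guess by Newton's method in y."""
    y = y_guess
    for _ in range(steps):
        y -= f(x, y) / f_y(x, y)
    return y

x0, y0 = 0.6, 0.8  # a point on the circle with f_y(x0, y0) != 0
h = 1e-5

# Slope of the implicitly defined function, estimated by central differences.
numeric_slope = (phi(x0 + h) - phi(x0 - h)) / (2 * h)

# Slope predicted by the corollary: -f_x / f_y.
formula_slope = -f_x(x0, y0) / f_y(x0, y0)

print(numeric_slope, formula_slope)
```

Near $(1, 0)$ the same construction breaks down, since $f_y$ vanishes there and the Newton step divides by it.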

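Corollary 4.65. Let $f : \mathbb{R}^3 \to \mathbb{R}$. If $f$ is of class $C^1$ in a neighborhood of $(x_0, y_0, z_0)$ with
\[ f(x_0, y_0, z_0) = 0 \quad\text{and}\quad \frac{\partial f}{\partial z}(x_0, y_0, z_0) \ne 0, \]
then the equation $f(x, y, z) = 0$ has a unique solution $z = \varphi(x, y)$ with $\varphi(x_0, y_0) = z_0$, which exists and is continuously differentiable in a neighborhood $U$ of $(x_0, y_0)$ and
\[ \frac{\partial z}{\partial x} = -\frac{\partial f/\partial x}{\partial f/\partial z}, \qquad \frac{\partial z}{\partial y} = -\frac{\partial f/\partial y}{\partial f/\partial z}. \]
Corollary 4.66. Let $f : \mathbb{R}^5 \to \mathbb{R}^2$. If $f = (f_1, f_2)$ is of class $C^1$ in a neighborhood of $(x_0, y_0, z_0, u_0, v_0)$ with
\[ f_1(x_0, y_0, z_0, u_0, v_0) = 0, \quad f_2(x_0, y_0, z_0, u_0, v_0) = 0, \quad\text{and}\quad \frac{\partial(f_1, f_2)}{\partial(u, v)}(x_0, y_0, z_0, u_0, v_0) \ne 0, \]
then the equations
\[ f_1(x, y, z, u, v) = 0 \quad\text{and}\quad f_2(x, y, z, u, v) = 0 \]
have a unique solution
\[ u = \varphi_1(x, y, z), \quad v = \varphi_2(x, y, z) \quad\text{with}\quad u_0 = \varphi_1(x_0, y_0, z_0), \quad v_0 = \varphi_2(x_0, y_0, z_0), \]
which exist and are continuously differentiable in a neighborhood $U$ of $(x_0, y_0, z_0)$ and
\begin{align*}
\frac{\partial u}{\partial x} &= \frac{\partial \varphi_1}{\partial x} = -\frac{\partial(f_1, f_2)/\partial(x, v)}{\partial(f_1, f_2)/\partial(u, v)}, &
\frac{\partial v}{\partial x} &= \frac{\partial \varphi_2}{\partial x} = -\frac{\partial(f_1, f_2)/\partial(u, x)}{\partial(f_1, f_2)/\partial(u, v)}, \\
\frac{\partial u}{\partial y} &= \frac{\partial \varphi_1}{\partial y} = -\frac{\partial(f_1, f_2)/\partial(y, v)}{\partial(f_1, f_2)/\partial(u, v)}, &
\frac{\partial v}{\partial y} &= \frac{\partial \varphi_2}{\partial y} = -\frac{\partial(f_1, f_2)/\partial(u, y)}{\partial(f_1, f_2)/\partial(u, v)}, \\
\frac{\partial u}{\partial z} &= \frac{\partial \varphi_1}{\partial z} = -\frac{\partial(f_1, f_2)/\partial(z, v)}{\partial(f_1, f_2)/\partial(u, v)}, &
\frac{\partial v}{\partial z} &= \frac{\partial \varphi_2}{\partial z} = -\frac{\partial(f_1, f_2)/\partial(u, z)}{\partial(f_1, f_2)/\partial(u, v)}.
\end{align*}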

Exercises
4.66. Prove Corollary 4.64. That is, work through the proof of Theorem 4.59 in this special case.
4.67. The equation $y^2 - x^2 = 0$ has two $C^1$ solutions, $y = \pm x$, in a neighborhood of $x = 0$; this shows uniqueness may not hold. What condition of the Implicit Function Theorem does not hold? Check that there are four solutions of class $C$ and infinitely many real-valued solutions.
4.68. Show that the equations
\[ x^2 - yu = 0, \qquad xy + uv = 0, \]
define $u, v$ implicitly as functions of $x$ and $y$, with $u = u_0$, $v = v_0$ at $(x_0, y_0)$, in a neighborhood of any point $(x_0, y_0)$ if $y_0u_0 \ne 0$ and $x_0^2 - y_0u_0 = 0$, $x_0y_0 + u_0v_0 = 0$. Check that the Implicit Function Theorem gives
\[ \frac{\partial u}{\partial x} = \frac{2x}{y}, \quad \frac{\partial u}{\partial y} = -\frac{u}{y}, \quad \frac{\partial v}{\partial x} = -\frac{y}{u} - \frac{2vx}{uy}, \quad \frac{\partial v}{\partial y} = -\frac{x}{u} + \frac{v}{y} \]
and verify these results by solving the equations directly.


4.69. Consider f (x, y, u, v) = (x3 + yu + v, xv + y 3 u). At what points (x, y, u, v)
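can one solve $f(x, y, u, v) = (0, 0)$ in the form $(x, y) = (\varphi_1(u, v), \varphi_2(u, v))$? Find the differential matrix
\[ \begin{pmatrix} \dfrac{\partial \varphi_1}{\partial u} & \dfrac{\partial \varphi_1}{\partial v} \\[2mm] \dfrac{\partial \varphi_2}{\partial u} & \dfrac{\partial \varphi_2}{\partial v} \end{pmatrix}. \]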


4.70. Suppose $\varphi_i(u_1, \dots, u_n, x_1, \dots, x_n) = 0$, $i = 1, \dots, n$, is satisfied by $u_j = u_j(x_1, \dots, x_n)$, the $\varphi$'s and $u$'s being $C^1$ functions. Prove that
\[ \frac{\partial(\varphi_1, \dots, \varphi_n)}{\partial(u_1, \dots, u_n)} \cdot \frac{\partial(u_1, \dots, u_n)}{\partial(x_1, \dots, x_n)} = (-1)^n \frac{\partial(\varphi_1, \dots, \varphi_n)}{\partial(x_1, \dots, x_n)}. \]
[Hint: Use the Chain Rule and think matrices.]
4.71. If $u^2 + v^2 + 2xuv + y = 0$, $uv + (u + v)y + x^2 = 0$, prove that
\[ \frac{\partial(u, v)}{\partial(x, y)} = \frac{uv(u + v) - x}{(u - v)[(u + v) + y(1 - x)]}. \]
4.72. If $u_1 = x_1 + x_2 + x_3 + x_4$, $u_1u_2 = x_2 + x_3 + x_4$, $u_1u_2u_3 = x_3 + x_4$, and $u_1u_2u_3u_4 = x_4$, show that
\[ \frac{\partial(x_1, x_2, x_3, x_4)}{\partial(u_1, u_2, u_3, u_4)} = u_1^3 u_2^2 u_3. \]
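4.73. Do the following.
(a) If $V = \varphi(u, v)$, $\psi(u, v) = E(x, y)$, $\chi(u, v) = F(x, y)$, and $\partial(\psi, \chi)/\partial(u, v) \ne 0$, prove
\[ \frac{\partial V}{\partial x} \cdot \frac{\partial(\psi, \chi)}{\partial(u, v)} = \frac{\partial E}{\partial x} \cdot \frac{\partial(\varphi, \chi)}{\partial(u, v)} + \frac{\partial F}{\partial x} \cdot \frac{\partial(\psi, \varphi)}{\partial(u, v)}. \]
(b) If $V = u^2 + v^2 + uv$, $u + v = x^2 + y^2$, $u^3 + v^3 = 2xy$, prove that
\[ 3\,\frac{\partial V}{\partial y} = \frac{2x(x^2 - y^2)}{(x^2 + y^2)^2} + 8y(x^2 + y^2). \]
4.74. If $V = \varphi(u_1, \dots, u_n)$, $\psi_k(u_1, \dots, u_n) = f_k(x_1, \dots, x_m)$, $k = 1, \dots, n$, and $\partial(\psi_1, \dots, \psi_n)/\partial(u_1, \dots, u_n) \ne 0$, show that
\[ \frac{\partial V}{\partial x_j} \cdot \frac{\partial(\psi_1, \dots, \psi_n)}{\partial(u_1, \dots, u_n)} = \sum_{k=1}^{n} \frac{\partial f_k}{\partial x_j} \cdot \frac{\partial(\psi_1, \dots, \psi_{k-1}, \varphi, \psi_{k+1}, \dots, \psi_n)}{\partial(u_1, \dots, u_n)}. \]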

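4.75. If $F \in C^1$ show that the equation
\[ F(F(x, y), y) = 0 \]
may be solved for $y$ as a function of $x$ near $(0, 0)$ provided $F(0, 0) = 0$, $F_x(0, 0) \ne -1$ and $F_y(0, 0) \ne 0$.
4.76. Suppose $f(x, y) = 0$ where $f \in C^2$ and $\partial f/\partial y \ne 0$. Prove that
\[ \frac{d^2y}{dx^2} = \frac{1}{\bigl(\frac{\partial f}{\partial y}\bigr)^3} \begin{vmatrix} 0 & \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} \\[2mm] \dfrac{\partial f}{\partial x} & \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\[2mm] \dfrac{\partial f}{\partial y} & \dfrac{\partial^2 f}{\partial y \partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{vmatrix}. \]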

4.5.1 Dimension

We have a notion of dimension for linear objects, specifically for vector spaces, namely, the number of vectors in a basis. If $L : \mathbb{R}^k \to \mathbb{R}^n$ is linear, $L(x) = Ax$, then the dimension of $L(\mathbb{R}^k) \subset \mathbb{R}^n$ is the rank of $A$. In particular, if $\operatorname{rank} A = k$, then $L(\mathbb{R}^k) \subset \mathbb{R}^n$ is $k$-dimensional, the same dimension as $\mathbb{R}^k$.
Definition 4.67. A subset $S$ of $\mathbb{R}^n$ is a $k$-dimensional segment, $k > 0$, if there is an open connected set $D \subset \mathbb{R}^k$ and a function $f : \mathbb{R}^k \to \mathbb{R}^n$ such that
(a) $f \in C^1(D)$ and $f : D \to S$ is one-to-one and onto (i.e. $f(D) = S$),
(b) $\operatorname{rank} f'(p) = k$ for all $p \in D$.
A 0-dimensional segment in $\mathbb{R}^n$ is a single point in $\mathbb{R}^n$.
A subset $S$ of $\mathbb{R}^n$ is a $k$-dimensional manifold if for every $q \in S$, there exists an open neighborhood $V \subset \mathbb{R}^n$ of $q$ such that $V \cap S$ is a $k$-dimensional segment.
Remarks: Note that $k \le n$ always. A $k$-dimensional segment is automatically a $k$-dimensional manifold. If the condition of one-to-one is dropped in (a), then $f$ is still locally one-to-one by Theorem 4.50, and so $f(U)$ is a $k$-dimensional segment if $U$ is a sufficiently small open subset of $D$.
Example 4.68 The set $S := \{(x, y) : y = 3x,\ 0 < x < 1\}$ is a 1-dimensional manifold (segment) in $\mathbb{R}^2$. Indeed, let $D = (0, 1) \subset \mathbb{R}$ and define $f(t) = (t, 3t)$, $0 < t < 1$. Then $f'(t) = [\,1 \ \ 3\,]^{T}$ and $\operatorname{rank} f' = 1$.
Example 4.69 The set $S := \{(x, y) : x^2 + y^2 = 1\}$ is a 1-dimensional manifold in $\mathbb{R}^2$. To see this consider the map $f(t) = (\cos(t), \sin(t))$, $t \in \mathbb{R}$. Then
\[ \operatorname{rank} f'(t) = \operatorname{rank} \begin{pmatrix} -\sin(t) \\ \cos(t) \end{pmatrix} = 1. \]
Thus, $f$ is locally one-to-one and is in fact one-to-one on any open interval of length $2\pi$. Then
\[ S = f\bigl((-\pi, \pi)\bigr) \cup f\bigl((0, 2\pi)\bigr). \]

Figure 4.16. The unit circle in Example 4.69 as the union of two 1-segments $S = f((-\pi, \pi)) \cup f((0, 2\pi))$.

Example 4.70 The set $S := \{(x, y, z) : x^2 + y^2 + z^2 = 1,\ x > 0,\ y > 0,\ z > 0\}$ is a 2-dimensional segment in $\mathbb{R}^3$. In this case, consider
\[ f(\theta, \phi) := \bigl(\cos(\theta)\sin(\phi),\ \sin(\theta)\sin(\phi),\ \cos(\phi)\bigr) = (x, y, z), \quad 0 < \theta < \frac{\pi}{2}, \quad 0 < \phi < \frac{\pi}{2}. \]
Then $f \in C^1\bigl((0, \pi/2) \times (0, \pi/2)\bigr)$, $f$ is one-to-one, and
\[ f'(\theta, \phi) = \begin{pmatrix} -\sin(\theta)\sin(\phi) & \cos(\theta)\cos(\phi) \\ \cos(\theta)\sin(\phi) & \sin(\theta)\cos(\phi) \\ 0 & -\sin(\phi) \end{pmatrix}. \]
Now $\operatorname{rank} f'(\theta, \phi) = 2$ since for example
\[ \frac{\partial(y, z)}{\partial(\theta, \phi)} = -\cos(\theta)\sin^2(\phi) \ne 0 \quad\text{when } 0 < \theta < \frac{\pi}{2} \text{ and } 0 < \phi < \frac{\pi}{2}. \]

Figure 4.17. The parametrization of the sphere in Example 4.70.

Example 4.71 The set $S := \{(x, y) : (x + 1)^2 + y^2 = 1\} \cup \{(x, y) : (x - 1)^2 + y^2 = 1\}$ is a 1-dimensional segment. This is two circles which are tangent at the origin. We use the mapping $f : (-2\pi, 2\pi) \to \mathbb{R}^2$ defined by
\[ f(t) = \begin{cases} (-1 + \cos(t),\ \sin(t)) & \text{if } t \in (-2\pi, 0]; \\ (1 + \cos(\pi - t),\ \sin(\pi - t)) & \text{if } t \in (0, 2\pi). \end{cases} \]
One can check that $f \in C^1(-2\pi, 2\pi)$, that $f$ is one-to-one, and that $\operatorname{rank} f' = 1$ throughout the interval. In particular, take special care to check that there is no trouble at $(0, 0)$.
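
The "no trouble at $(0, 0)$" check in Example 4.71 can be previewed numerically before doing it by hand. The sketch below assumes the piecewise map $f(t) = (-1 + \cos t, \sin t)$ for $t \le 0$ and $f(t) = (1 + \cos(\pi - t), \sin(\pi - t))$ for $t > 0$; the helper `velocity` (a one-sided difference quotient) and the sample parameters are illustrative names only, and the numerical agreement is evidence rather than a proof.

```python
import math

def f(t):
    """Parameterization of the two tangent circles on (-2*pi, 2*pi)."""
    if t <= 0:
        return (-1.0 + math.cos(t), math.sin(t))
    return (1.0 + math.cos(math.pi - t), math.sin(math.pi - t))

def velocity(t, h):
    """One-sided difference quotient of f at t with signed step h."""
    a, b = f(t), f(t + h)
    return ((b[0] - a[0]) / h, (b[1] - a[1]) / h)

# One-sided derivatives at t = 0 from the left and from the right.
left = velocity(0.0, -1e-6)
right = velocity(0.0, 1e-6)

# Each branch should stay on its own circle.
on_left_circle = all(
    math.isclose((f(t)[0] + 1) ** 2 + f(t)[1] ** 2, 1.0, abs_tol=1e-12)
    for t in (-6.0, -3.0, -1.0, 0.0)
)
on_right_circle = all(
    math.isclose((f(t)[0] - 1) ** 2 + f(t)[1] ** 2, 1.0, abs_tol=1e-12)
    for t in (0.5, 2.0, 4.0, 6.0)
)

print(left, right, on_left_circle, on_right_circle)
```

Both one-sided velocities come out close to $(0, 1)$, which is consistent with $f$ being $C^1$ with rank 1 at $t = 0$.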
Example 4.72 The $n$-dimensional manifolds in $\mathbb{R}^n$ are precisely the open sets; that is, $S$ is an $n$-dimensional manifold in $\mathbb{R}^n$ if and only if $S$ is open. This follows from the definition and Theorem 4.53.
Example 4.73 $S$ is a 0-dimensional manifold in $\mathbb{R}^n$ if and only if for every $p \in S$, there is an open set $V$ such that $V \cap S = \{p\}$; that is, $S$ consists of isolated points. Thus, $\{1/k : k = 1, 2, 3, \dots\}$ is a 0-dimensional manifold in $\mathbb{R}$, but $\{1/k : k = 1, 2, 3, \dots\} \cup \{0\}$ is not.
The function $f$ in the definition of a $k$-dimensional segment is called a parameterization of the segment (the variable of $f$ being called the parameter). A local parameterization of a manifold is essentially a local coordinate system for the manifold; for example, in Example 4.70, the two parameters of $f$ serve as coordinates on $S$ (see Figure 4.19).

Figure 4.18. The 1-dimensional segment of Example 4.71.

Figure 4.19. Parametrization as a local coordinate system for the manifold.

Exercise 82 below shows that the definition of the dimension of a manifold is consistent in the sense that an object cannot be both an $r$-dimensional and a $k$-dimensional manifold with $r \ne k$.
From calculus in $\mathbb{R}^2$ and $\mathbb{R}^3$ you are familiar with non-parametric representations of manifolds in the form of the equations $f(p) = O$ (shorthand for $\{p : f(p) = O\} = f^{-1}(O)$). For example, $x^2 + y^2 - 1 = 0$ represents a 1-dimensional manifold in $\mathbb{R}^2$; the same equation represents a 2-dimensional manifold in $\mathbb{R}^3$, namely a cylinder; while the two equations $x^2 + y^2 - 1 = 0$, $z = 0$ together represent the 1-dimensional circle in $\mathbb{R}^3$. The latter is the intersection of a cylinder and a plane perpendicular to the axis of the cylinder. The next theorem makes this linkage between manifolds and solution sets of equations more precise.
Theorem 4.74. Let $f : \mathbb{R}^n \to \mathbb{R}^k$. Suppose that
(i) $f \in C^1(D)$, $D$ an open set in $\mathbb{R}^n$ with $O \in f(D)$,
(ii) $\operatorname{rank} f'(p) = k \le n$ for all $p \in D$.
Then $f^{-1}(O) := \{p : f(p) = O\}$ is an $(n - k)$-dimensional manifold in $\mathbb{R}^n$.



Less formally, the system of equations
\[ f_i(x_1, \dots, x_n) = 0, \quad i = 1, \dots, k \]
represents an $(n - k)$-dimensional manifold in $\mathbb{R}^n$ if the matrix $\bigl[\partial f_i/\partial x_j\bigr]$ has rank $k$.

Proof. If $k = n$, then by Theorem 4.50, $f$ is locally one-to-one on $D$. Thus, if $f(p_0) = O$ there is a neighborhood $U$ of $p_0$ such that $f(p) \ne O$ if $p \in U$ and $p \ne p_0$. Thus, $f^{-1}(O)$ is a set of isolated points in $\mathbb{R}^n$, that is, a 0-dimensional manifold.
If $0 < k < n$, and $c = (\gamma_1, \dots, \gamma_n) \in f^{-1}(O)$, we may assume, without loss of generality, that
\[ \frac{\partial(f_1, \dots, f_k)}{\partial(x_1, \dots, x_k)}(c) \ne 0. \]
By the Implicit Function Theorem, the system of equations
\[ f_i(x_1, \dots, x_k, x_{k+1}, \dots, x_n) = 0, \quad i = 1, \dots, k, \]
may be solved uniquely in the form $x_i = \varphi_i(x_{k+1}, \dots, x_n)$, with $\gamma_i = \varphi_i(\gamma_{k+1}, \dots, \gamma_n)$, $i = 1, \dots, k$, in an open neighborhood $U$ of $(\gamma_{k+1}, \dots, \gamma_n)$. Thus, for some neighborhood $V$ of $c$, the set $f^{-1}(O) \cap V$ is the range of the function
\[ F(x_{k+1}, \dots, x_n) = \bigl(\varphi_1(x_{k+1}, \dots, x_n), \dots, \varphi_k(x_{k+1}, \dots, x_n), x_{k+1}, \dots, x_n\bigr) \]
with
\[ F' = \begin{pmatrix} \varphi' \\ I_{(n-k) \times (n-k)} \end{pmatrix}, \quad\text{and}\quad \operatorname{rank} F' = n - k. \]
Now $F$ is one-to-one on $U$ if $U$ is sufficiently small by Theorem 4.50. Thus, $f^{-1}(O)$ is an $(n - k)$-dimensional manifold in $\mathbb{R}^n$.
Remark: The dimension of the range of $f$ (that is, the number of equations $f_i = 0$) is immaterial here. If $\operatorname{rank} f'(p) = k$ on some open set $D \supset f^{-1}(O)$, then $f^{-1}(O)$ is an $(n - k)$-dimensional manifold no matter how many equations there are. You are no doubt familiar with the corresponding statement for linear functions: If $L : \mathbb{R}^n \to \mathbb{R}^m$ is linear, $L(x) = Ax$, for a matrix $A$ with $\operatorname{rank} A = k$, then $L^{-1}(O)$ is a vector space of dimension $n - k$. The number $n - k$ is called the nullity of the matrix $A$.
Corollary 4.75. If $f : \mathbb{R} \to \mathbb{R}$ is $C^1$ on an open set $D$ in $\mathbb{R}$, then the graph of $f$ restricted to $D$ is a 1-dimensional manifold (smooth curve) in $\mathbb{R}^2$.
Proof. Let $F(x, y) = f(x) - y$, $(x, y) \in D \times f(D)$. Then $\{(x, y) : y = f(x),\ x \in D\} = F^{-1}(0)$, and
\[ F'(x, y) = [\, f'(x) \ \ {-1} \,] \quad\text{and}\quad \operatorname{rank} F' = 1. \]
Corollary 4.76. Let $f : \mathbb{R}^2 \to \mathbb{R}$. If $f$ is $C^1$ on its domain, an open set in $\mathbb{R}^2$, then the graph of $f$ is a 2-dimensional manifold (smooth surface) in $\mathbb{R}^3$.
Proof. This time take $F(x, y, z) = f(x, y) - z$. Then the graph is $F^{-1}(0)$,
\[ F'(x, y, z) = [\, f_x \ \ f_y \ \ {-1} \,], \quad\text{and}\quad \operatorname{rank} F' = 1. \]

Example 4.77 Let $f(x, y, z) = x^2 + y^2 - z^2$. Then $f'(x, y, z) = [\, 2x \ \ 2y \ \ {-2z} \,]$. Thus, $f(x, y, z) = 0$ is a 2-dimensional manifold (smooth surface) in $\mathbb{R}^3$ if the origin is omitted.

Figure 4.20. The set where $x^2 + y^2 - z^2 = 0$ as a 2-dimensional manifold in Example 4.77.

Example 4.78 Let $f = (f_1, f_2)$ where $f_1(x, y, z) = x^2 + y^2 + z^2 - 1$ and $f_2(x, y, z) = Ax + By + Cz$. Then
\[ f'(x, y, z) = \begin{pmatrix} 2x & 2y & 2z \\ A & B & C \end{pmatrix}. \]
$f^{-1}(O)$ is a 1-dimensional manifold in $\mathbb{R}^3$ if $\operatorname{rank} f' = 2$ on an open set $D \supset f^{-1}(O)$.

Figure 4.21. The set where $x^2 + y^2 + z^2 = 1$ intersected with a plane $Ax + By + Cz = 0$ as a 1-dimensional manifold in Example 4.78.

Thus, if $A, B, C$ are not all zero (i.e. $A^2 + B^2 + C^2 > 0$), we have
\[ \operatorname{rank} f' < 2 \iff (A, B, C) = \lambda(x, y, z) \text{ with } \lambda \ne 0. \]
Substituting the last relation into $f_2 = 0$, we find $x^2 + y^2 + z^2 = 0$ if $\operatorname{rank} f' < 2$, contradicting $f_1 = 0$. Thus, $\operatorname{rank} f' = 2$ at each point in $f^{-1}(O)$, so $\operatorname{rank} f' = 2$ on a neighborhood of each point in $f^{-1}(O)$. Note that we have assumed $f^{-1}(O) \ne \emptyset$ here, which is obvious geometrically in this case, but may not always be at all clear in general problems.


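Question: Let
\[ A(x) = \begin{pmatrix} a(x) & b(x) & c(x) \\ d(x) & e(x) & f(x) \end{pmatrix}. \]
Does $\operatorname{rank} A(x_0) = 2$ imply that $\operatorname{rank} A(x) = 2$ near $x_0$ if the entries in $A$ are continuous? Does $\operatorname{rank} A(x_0) = 1$ imply $\operatorname{rank} A(x) = 1$ near $x_0$?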
Theorem 4.79. Let $D \subset \mathbb{R}^n$ be open. Suppose
(i) $f : D \to \mathbb{R}^n$ and $f \in C^1(D)$,
(ii) $\operatorname{rank} f'(p) = k$ for every $p \in D$.
Then for each $c \in D$, there is a neighborhood $U_c$ of $c$ such that $f(U_c)$ is a $k$-dimensional segment.
This theorem is analogous to the statement that the dimension of the range of any linear function $L(x) = Ax$ is the rank of $A$. The proof of this theorem is notationally quite complicated, so we will consider a few examples and special cases first. Actually, all the essential ideas of the proof are contained in the special cases, so you may skip the proof if you wish.
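Example 4.80 Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be defined by $f(x, y) = (x + y, (x + y)^2)$. Then
\[ \operatorname{rank} f'(x, y) = \operatorname{rank} \begin{pmatrix} 1 & 1 \\ 2(x + y) & 2(x + y) \end{pmatrix} = 1 \quad \forall (x, y). \]
Thus, $f(\mathbb{R}^2)$ is the 1-dimensional segment $\{(t, t^2) : t \in \mathbb{R}\}$, a parabola. We have used $t = x + y$ to parameterize the curve.
Example 4.81 Let $f : \mathbb{R}^3 \to \mathbb{R}^3$ be defined by
\[ f(x, y, z) = \bigl(x - y,\ y - z,\ x(x - 2y) - z(z - 2y)\bigr). \]
Then
\[ \operatorname{rank} f'(x, y, z) = \operatorname{rank} \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 2(x - y) & 2(z - x) & 2(y - z) \end{pmatrix} = 2. \]
Here $f(\mathbb{R}^3)$ is the 2-dimensional segment $\{(u, v, u^2 - v^2) : (u, v) \in \mathbb{R}^2\}$ (a hyperbolic paraboloid $w = u^2 - v^2$) in $\mathbb{R}^3$. We used $(u, v) = (x - y, y - z)$ to parameterize the surface.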
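
The two facts used in Example 4.81, that the image lies on $w = u^2 - v^2$ and that the $3 \times 3$ Jacobian is singular, can be spot-checked at a few points. This is a sketch assuming $f(x, y, z) = (x - y,\ y - z,\ x(x - 2y) - z(z - 2y))$; the helper names and sample points are illustrative.

```python
def f(x, y, z):
    """The map of Example 4.81."""
    return (x - y, y - z, x * (x - 2 * y) - z * (z - 2 * y))

def jacobian(x, y, z):
    """Jacobian matrix of f, computed by hand from the formula above."""
    return [[1.0, -1.0, 0.0],
            [0.0, 1.0, -1.0],
            [2 * (x - y), 2 * (z - x), 2 * (y - z)]]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

points = [(0.3, -1.2, 2.0), (1.0, 1.0, 1.0), (-2.5, 0.4, 0.9)]

# Distance of each image point from the surface w = u^2 - v^2.
surface_gap = max(abs(w - (u * u - v * v)) for u, v, w in (f(*p) for p in points))

# The full 3x3 determinant vanishes, so the rank is at most 2.
det_gap = max(abs(det3(jacobian(*p))) for p in points)

print(surface_gap, det_gap)
```

The first two rows of the Jacobian are visibly independent, so a vanishing determinant pins the rank at exactly 2.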
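Proposition 4.82. Let $f : \mathbb{R}^2 \to \mathbb{R}^2$, and $f \in C^1(D)$, $D$ open in $\mathbb{R}^2$. If $\operatorname{rank} f'(p) = 1$ for every $p \in D$, and $u = f_1(x, y)$, $v = f_2(x, y)$, then in a neighborhood of any point $(x_0, y_0) \in D$,
\[ \text{either}\quad u = \varphi(v) \quad\text{or}\quad v = \psi(u), \quad \varphi, \psi \in C^1, \]
that is, $f(D)$ is composed of 1-dimensional segments.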

i


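Proof. If $(\partial f_1/\partial x)(x_0, y_0) \ne 0$, then $\partial f_1/\partial x \ne 0$ in an open neighborhood $U$ of $(x_0, y_0)$, so $f_1(U)$ is open (Theorem 4.53). By the Implicit Function Theorem, near $(x_0, y_0, u_0)$, the equation $u = f_1(x, y)$ may be solved uniquely for $x$ in the form
\[ x = \chi(u, y), \quad\text{with } \chi \in C^1 \implies v = f_2(\chi(u, y), y). \]
We show that the last equation is independent of $y$:
\begin{align*}
\frac{\partial v}{\partial y} &= \frac{\partial f_2}{\partial x}\frac{\partial \chi}{\partial y} + \frac{\partial f_2}{\partial y} = \frac{\partial f_2}{\partial x}\left(-\frac{\partial f_1/\partial y}{\partial f_1/\partial x}\right) + \frac{\partial f_2}{\partial y} \\
&= \frac{\dfrac{\partial f_1}{\partial x}\dfrac{\partial f_2}{\partial y} - \dfrac{\partial f_2}{\partial x}\dfrac{\partial f_1}{\partial y}}{\dfrac{\partial f_1}{\partial x}} = \frac{\dfrac{\partial(f_1, f_2)}{\partial(x, y)}}{\dfrac{\partial f_1}{\partial x}} = 0,
\end{align*}
since $\operatorname{rank} f' = 1$. Thus, $v = \psi(u)$; that is, we have a local parameterization of $f(U_0)$ in the form
\[ f(U_0) = \{(u, \psi(u)) : u \in f_1(U_0)\}, \]
where $U_0$ is an open neighborhood of $(x_0, y_0)$.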

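Proposition 4.83. Let $f : \mathbb{R}^3 \to \mathbb{R}^3$ and $f \in C^1(D)$ for an open set $D \subset \mathbb{R}^3$. If $\operatorname{rank} f'(p) = 2$ for all $p \in D$ and
$$u = f_1(x, y, z), \quad v = f_2(x, y, z), \quad w = f_3(x, y, z),$$
then in some neighborhood $U$ of each point $(x_0, y_0, z_0) \in D$,
$$\text{either}\quad w = \phi(u, v), \quad\text{or}\quad u = \psi(v, w), \quad\text{or}\quad v = \chi(w, u), \qquad \phi, \psi, \chi \in C^1,$$
that is, $f(U)$ is a 2-dimensional segment in $\mathbb{R}^3$.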


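Proof. From the Implicit Function Theorem,
$$\frac{\partial(f_1, f_2)}{\partial(x, y)}(x_0, y_0, z_0) \neq 0 \implies u = f_1(x, y, z) \text{ and } v = f_2(x, y, z)$$
are uniquely solvable for $(x, y)$ in the form
$$x = \theta_1(u, v, z), \quad y = \theta_2(u, v, z), \quad \theta_1, \theta_2 \in C^1;$$
hence
$$w = f_3(\theta_1(u, v, z), \theta_2(u, v, z), z).$$
We observe that this last function is actually independent of $z$:
$$\frac{\partial w}{\partial z} = \frac{\partial f_3}{\partial x}\frac{\partial \theta_1}{\partial z} + \frac{\partial f_3}{\partial y}\frac{\partial \theta_2}{\partial z} + \frac{\partial f_3}{\partial z}$$
$$= -\frac{\partial f_3}{\partial x}\,\frac{\dfrac{\partial(f_1, f_2)}{\partial(z, y)}}{\dfrac{\partial(f_1, f_2)}{\partial(x, y)}} - \frac{\partial f_3}{\partial y}\,\frac{\dfrac{\partial(f_1, f_2)}{\partial(x, z)}}{\dfrac{\partial(f_1, f_2)}{\partial(x, y)}} + \frac{\partial f_3}{\partial z}$$
$$= \frac{\dfrac{\partial f_3}{\partial x}\dfrac{\partial(f_1, f_2)}{\partial(y, z)} - \dfrac{\partial f_3}{\partial y}\dfrac{\partial(f_1, f_2)}{\partial(x, z)} + \dfrac{\partial f_3}{\partial z}\dfrac{\partial(f_1, f_2)}{\partial(x, y)}}{\dfrac{\partial(f_1, f_2)}{\partial(x, y)}} = \frac{\dfrac{\partial(f_1, f_2, f_3)}{\partial(x, y, z)}}{\dfrac{\partial(f_1, f_2)}{\partial(x, y)}} = 0,$$
since $\operatorname{rank} f' = 2$. Therefore,
$$w = \phi(u, v) \quad\text{and}\quad f(U_0) = \{(u, v, \phi(u, v)) : (u, v) \in (f_1, f_2)(U_0)\},$$
for some open neighborhood $U_0$ of $(x_0, y_0, z_0)$.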


Proof of Theorem 4.79. (Optional) If $k = n$, then $f$ is locally one-to-one (Theorem 4.50) so the theorem reduces to the definition of a $k$-dimensional segment. If $k = 0$, then
$$\frac{\partial f_i}{\partial x_j} = 0 \ \ \forall i, j \implies f \text{ is constant}$$
in a convex neighborhood $U$ of $c$, thus $f(U) = \{f(c)\}$ which is a 0-dimensional segment.
When $0 < k < n$, consider
$$u_i = f_i(x_1, \dots, x_n), \quad i = 1, \dots, n. \tag{4.34}$$
We are given that $\operatorname{rank} f'(p) = k$. So if $c \in D$ and if (say)
$$\frac{\partial(f_1, \dots, f_k)}{\partial(x_1, \dots, x_k)}(p) \neq 0 \quad\text{at } p = c, \tag{4.35}$$
then (4.35) also holds in an open neighborhood $U$ of $c$ since $f \in C^1(D)$. Therefore, the set
$$V = (f_1, \dots, f_k)(U) \subset \mathbb{R}^k \text{ is open in } \mathbb{R}^k$$
by Lemma 4.52. By the Implicit Function Theorem, condition (4.35) implies that the first $k$ equations in (4.34) may be solved for $(x_1, \dots, x_k)$ in the form
$$x_i = \phi_i(u_1, \dots, u_k, x_{k+1}, \dots, x_n), \quad i = 1, \dots, k. \tag{4.36}$$
Substituting (4.36) into the remaining $(n - k)$ equations in (4.34) yields
$$u_i = f_i(\phi_1, \dots, \phi_k, x_{k+1}, \dots, x_n), \quad i = k + 1, \dots, n. \tag{4.37}$$

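We claim that these are functions of $(u_1, \dots, u_k)$ only. We show only that there is no dependence on $x_{k+1}$. By the derivative formula (4.31) of the Implicit Function Theorem,
$$\frac{\partial \phi_\ell}{\partial x_{k+1}} = -\frac{\dfrac{\partial(f_1, \dots, f_k)}{\partial(x_1, \dots, x_{\ell-1}, x_{k+1}, x_{\ell+1}, \dots, x_k)}}{\dfrac{\partial(f_1, \dots, f_k)}{\partial(x_1, \dots, x_k)}}, \quad \ell = 1, \dots, k.$$
Substituting these into the Chain Rule applied to equations from (4.37), using the effect of interchanging rows in determinants, and using the expansion of a determinant about row $i$, we obtain
$$\frac{\partial u_i}{\partial x_{k+1}} = \sum_{\ell=1}^{k} \frac{\partial f_i}{\partial x_\ell}\,(-1)\,\frac{\dfrac{\partial(f_1, \dots, f_k)}{\partial(x_1, \dots, x_{\ell-1}, x_{k+1}, x_{\ell+1}, \dots, x_k)}}{\dfrac{\partial(f_1, \dots, f_k)}{\partial(x_1, \dots, x_k)}} + \frac{\partial f_i}{\partial x_{k+1}}$$
$$= \sum_{\ell=1}^{k+1} (-1)^{k+1-\ell}\,\frac{\partial f_i}{\partial x_\ell}\,\frac{\dfrac{\partial(f_1, \dots, f_k)}{\partial(x_1, \dots, x_{\ell-1}, x_{\ell+1}, \dots, x_k, x_{k+1})}}{\dfrac{\partial(f_1, \dots, f_k)}{\partial(x_1, \dots, x_k)}} = \frac{\dfrac{\partial(f_1, \dots, f_k, f_i)}{\partial(x_1, \dots, x_{k+1})}}{\dfrac{\partial(f_1, \dots, f_k)}{\partial(x_1, \dots, x_k)}} = 0,$$
since $\operatorname{rank} f' = k$.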

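Thus, (4.37) may be written in the form
$$u_i - \psi_i(u_1, \dots, u_k) = 0, \quad i = k + 1, \dots, n, \quad\text{with}\quad \psi_i \in C^1(V_0) \tag{4.38}$$
where $V_0 = (f_1, \dots, f_k)(U_0)$ is open for a suitable open neighborhood $U_0$ of $c$. The Jacobian matrix of the $(n - k)$ equations (4.38) with respect to $(u_1, \dots, u_n)$ is the $(n - k) \times n$ matrix of the form
$$\left[\; * \;\middle|\; I_{(n-k) \times (n-k)} \;\right]$$
which has rank $(n - k)$. Therefore, by Theorem 4.74, $f(U_0)$ is a $k$-dimensional manifold.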


[Notice that the pen never left the hand!!]

4.5.2 Application: Lagrange Multipliers

Here we provide an application of the Implicit Function Theorem to extremal problems subject to constraints.
Definition 4.84. Let $f : D \subset \mathbb{R}^n \to \mathbb{R}$ be a real-valued function on the domain $D$. The function $f$ has a relative maximum (respectively, minimum) with respect to a set $S$ at $p_0 \in S \cap D$ if, for some neighborhood $U$ of $p_0$,
$$f(p) \le f(p_0) \quad \forall p \in S \cap U \qquad (\text{respectively, } f(p) \ge f(p_0) \quad \forall p \in S \cap U).$$

Theorem 4.85 (Lagrange Multiplier Rule). Let $f : \mathbb{R}^n \to \mathbb{R}$ and $g = (g_1, \dots, g_k) : \mathbb{R}^n \to \mathbb{R}^k$, $k < n$. If $f$ and $g$ are $C^1$ functions near $p_0$ and
(i) $f$ has a relative extremum at $p_0$ with respect to the set
$$S = g^{-1}(O) = \{p : g_i(p) = 0,\ i = 1, \dots, k\},$$
(ii) $\operatorname{rank} g'(p_0) = k$,
then there exist real numbers $\lambda_1, \dots, \lambda_k$ such that
$$\frac{\partial f}{\partial x_i}(p_0) + \sum_{j=1}^{k} \lambda_j \frac{\partial g_j}{\partial x_i}(p_0) = 0, \quad i = 1, \dots, n.$$
The equations $g_i(p) = 0$, $i = 1, \dots, k$ are called constraints. The numbers $\lambda_i$ are called Lagrange multipliers.
Before we give a couple of proofs of this theorem, we give some applications.
Example 4.86 Find the distance $d$ from the point $(0, 0)$ to the line $x + y = 1$. The problem is to minimize $x^2 + y^2$ subject to the constraint $x + y - 1 = 0$. We note that a minimum exists since $x^2 + y^2$ goes to infinity as $|(x, y)| \to \infty$ and that $x^2 + y^2$ is non-negative. At the minimum value, the Lagrange Multiplier Rule says that there is a $\lambda$ such that
$$2x + \lambda = 0, \quad 2y + \lambda = 0, \quad\text{with}\quad x + y - 1 = 0.$$
Solving this linear system of equations for $(x, y, \lambda)$ we find
$$(x, y, \lambda) = \left(\tfrac{1}{2}, \tfrac{1}{2}, -1\right) \implies d^2 = \left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2 = \tfrac{1}{2}, \quad\text{or}\quad d = \tfrac{1}{\sqrt{2}}.$$
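The linear system of Example 4.86 is small enough to solve exactly by machine. The sketch below is an illustration only: it uses Cramer's rule with rational arithmetic, and the helper names `det3` and `solve3` are our own.

```python
from fractions import Fraction
import math

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve the 3x3 system a * s = b by Cramer's rule (a must be nonsingular)."""
    d = det3(a)
    solution = []
    for col in range(3):
        m = [row[:] for row in a]          # copy, then replace one column by b
        for r in range(3):
            m[r][col] = b[r]
        solution.append(Fraction(det3(m), d))
    return solution

# Unknowns (x, y, lam):  2x + lam = 0,  2y + lam = 0,  x + y = 1.
A = [[2, 0, 1],
     [0, 2, 1],
     [1, 1, 0]]
rhs = [0, 0, 1]

x, y, lam = solve3(A, rhs)
d_squared = x * x + y * y      # squared distance from the origin to the line
d = math.sqrt(d_squared)       # the distance itself, as a float
```

The exact values recovered are $x = y = 1/2$, $\lambda = -1$ and $d^2 = 1/2$, in agreement with the example.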

Example 4.87 Find the maximum and minimum of $f(x, y) = 2x^2 - 3y^2 - 2x$ in the set $\{(x, y) : x^2 + y^2 \le 1\}$.
If either the max or min occurs in the open set $\{(x, y) : x^2 + y^2 < 1\}$, then
$$\left.\begin{aligned} \frac{\partial f}{\partial x} &= 4x - 2 = 0 \\ \frac{\partial f}{\partial y} &= -6y = 0 \end{aligned}\right\} \implies (x, y) = \left(\tfrac{1}{2}, 0\right).$$
But,
$$\begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix} = \begin{pmatrix} 4 & 0 \\ 0 & -6 \end{pmatrix} \quad\text{is indefinite},$$
so $f$ has a saddle point at $(\tfrac{1}{2}, 0)$. Therefore, the maximum and minimum occur in the set
$$\{(x, y) : x^2 + y^2 - 1 = 0\}.$$
The Lagrange Multiplier Rule then gives the existence of a $\lambda$ such that at the extrema
$$4x - 2 + 2\lambda x = 0$$
$$-6y + 2\lambda y = 0$$
$$x^2 + y^2 - 1 = 0.$$
From the second equation $y = 0$ or $\lambda = 3$. If $y = 0$, then from the constraint $x = \pm 1$. If $\lambda = 3$, then putting this into the first equation we find $x = 1/5$, and from the constraint $y = \pm\sqrt{24/25}$. Testing these points, we find
$$f(1, 0) = 0, \quad f(-1, 0) = 4, \quad\text{and}\quad f(1/5, \pm\sqrt{24/25}) = -16/5.$$
Therefore the maximum of $4$ is achieved at $(-1, 0)$, and the minimum of $-16/5$ is achieved at two points, namely $(1/5, \pm\sqrt{24/25})$.
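As a sanity check on Example 4.87, the candidate points can be evaluated directly and compared with a brute-force scan of the closed unit disk. This is only a sketch: the polar grid resolution (`n_r`, `n_t`) is an arbitrary choice, and a scan of this kind can suggest, but not prove, the location of the extrema.

```python
import math

def f(x, y):
    """Objective function of Example 4.87."""
    return 2 * x * x - 3 * y * y - 2 * x

# Candidates produced by the Lagrange Multiplier Rule on x^2 + y^2 = 1.
y_star = math.sqrt(24 / 25)
candidates = [(1.0, 0.0), (-1.0, 0.0), (0.2, y_star), (0.2, -y_star)]
values = [f(x, y) for (x, y) in candidates]

# Independent brute-force scan of the closed unit disk on a polar grid.
n_r, n_t = 200, 2000
grid_values = [
    f(r * math.cos(t), r * math.sin(t))
    for r in (i / n_r for i in range(n_r + 1))
    for t in (2 * math.pi * j / n_t for j in range(n_t))
]
grid_max, grid_min = max(grid_values), min(grid_values)
```

The scan agrees with the candidates to within the grid resolution: no grid value exceeds $4$ or falls below $-16/5$ beyond rounding.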

The Lagrange multipliers are of no interest per se, and if they can be eliminated from the equations so much the better. You will observe that either of the examples above could have been done by solving the constraints to reduce the dimension of the problem and completed by applying standard methods. For example, in the first of the examples, we could solve the constraint for $y = 1 - x$ and rewrite $x^2 + y^2 = x^2 + (1 - x)^2 = 2x^2 - 2x + 1$, which has a minimum $1/2$ at $x = 1/2$. Such a reduction technique often leads to unwieldy calculations; nonetheless, we will use it in one of the proofs of the theorem.
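Proof 1 of Theorem 4.85. Let $f$ have a relative extremum at $p_0$ with respect to the set $S = \{p : g(p) = O\}$. Consider the function $F : \mathbb{R}^n \to \mathbb{R}^{k+1}$ given by
$$F(p) = (f(p), g(p)) = (f(p), g_1(p), \dots, g_k(p)).$$
Then
$$F'(p) = \begin{pmatrix} f'(p) \\ g'(p) \end{pmatrix} = \begin{pmatrix} \dfrac{\partial f}{\partial x_1} & \dots & \dfrac{\partial f}{\partial x_n} \\ \dfrac{\partial g_1}{\partial x_1} & \dots & \dfrac{\partial g_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial g_k}{\partial x_1} & \dots & \dfrac{\partial g_k}{\partial x_n} \end{pmatrix}.$$
If $\operatorname{rank} F'(p_0) = k + 1$, then $\operatorname{rank} F'(p) = k + 1$ for each $p$ in a sufficiently small open neighborhood $U$ of $p_0$. Then for each such $U$, $F(U)$ is an open neighborhood of $F(p_0) = (f(p_0), g(p_0)) = (f(p_0), O)$ (Theorem 4.53). Hence, every neighborhood $U$ of $p_0$ contains points $p_1, p_2$ such that $F(p_i) = (f(p_i), O)$, $i = 1, 2$ (thus, $g(p_1) = g(p_2) = O$) with
$$f(p_1) > f(p_0) \quad\text{and}\quad f(p_2) < f(p_0),$$
a contradiction to $f(p_0)$ being an extremum relative to the set $S$.
Therefore, $\operatorname{rank} F'(p_0) < k + 1$, and in fact, $\operatorname{rank} F'(p_0) = k$ since $\operatorname{rank} g'(p_0) = k$ by assumption. Therefore the rows of $F'(p_0)$ are linearly dependent, which means there are real numbers $\lambda_0, \dots, \lambda_k$, not all zero, such that at $p_0$
$$\lambda_0\left(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\right) + \lambda_1\left(\frac{\partial g_1}{\partial x_1}, \dots, \frac{\partial g_1}{\partial x_n}\right) + \dots + \lambda_k\left(\frac{\partial g_k}{\partial x_1}, \dots, \frac{\partial g_k}{\partial x_n}\right) = O.$$
In fact $\lambda_0 \neq 0$ for otherwise $\operatorname{rank} g'(p_0) < k$. Therefore we may assume $\lambda_0 = 1$ (by dividing by it) to obtain
$$\frac{\partial f}{\partial x_i}(p_0) + \lambda_1\frac{\partial g_1}{\partial x_i}(p_0) + \dots + \lambda_k\frac{\partial g_k}{\partial x_i}(p_0) = 0, \quad i = 1, \dots, n.$$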

Figure 4.22. Every neighborhood $U$ of $p_0$ contains points $p_1, p_2$ such that $F(p_i) = (f(p_i), O)$, $i = 1, 2$. (The sketch shows $U \subset \mathbb{R}^n$ around $p_0$ on $S : g(p) = 0$, and its image $F(U)$ around $(f(p_0), 0)$.)
These $n$ equations together with the $k$ constraint equations
$$g_j(p_0) = 0, \quad j = 1, \dots, k,$$
serve to determine the $(n + k)$ numbers which are the coordinates of $p_0$ and the $\lambda_1, \dots, \lambda_k$.
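Proof 2 of Theorem 4.85. This proof works in general but we prove just the case of one constraint. Suppose $f, g : \mathbb{R}^n \to \mathbb{R}$ are $C^1$ functions and $f(p)$, $p = (x_1, \dots, x_n)$, has an extremum at $c = (\gamma_1, \dots, \gamma_n)$, with respect to the constraint
$$g(p) = 0. \tag{4.39}$$
If $\operatorname{rank} g'(c) = 1$, then we may assume that $(\partial g/\partial x_1)(c) \neq 0$. Then, equation (4.39) may be solved for $x_1$ in a neighborhood of $(\gamma_2, \dots, \gamma_n)$ in the form
$$x_1 = \phi(x_2, \dots, x_n), \quad\text{with } \phi \in C^1 \text{ and } \gamma_1 = \phi(\gamma_2, \dots, \gamma_n).$$
Therefore, $f(\phi(x_2, \dots, x_n), x_2, \dots, x_n)$ has an interior relative extremum at $c^* = (\gamma_2, \dots, \gamma_n)$. Hence all of its partial derivatives vanish at $c^*$; so by the Chain Rule
$$\frac{\partial f}{\partial x_i}(c) + \frac{\partial f}{\partial x_1}(c)\,\frac{\partial \phi}{\partial x_i}(c^*) = 0, \quad i = 2, \dots, n, \quad\text{or}$$
$$\frac{\partial f}{\partial x_i}(c) + \frac{\partial f}{\partial x_1}(c)\left(-\frac{\dfrac{\partial g}{\partial x_i}(c)}{\dfrac{\partial g}{\partial x_1}(c)}\right) = 0, \quad i = 2, \dots, n.$$
The last equation holds trivially for $i = 1$ as well. Hence we find
$$\frac{\partial f}{\partial x_i}(c) + \lambda\frac{\partial g}{\partial x_i}(c) = 0, \quad i = 1, \dots, n, \quad\text{with}\quad \lambda = -\frac{\dfrac{\partial f}{\partial x_1}(c)}{\dfrac{\partial g}{\partial x_1}(c)}.$$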

Exercises
4.77. Let $f(x, y) = (x + y, 2x + ay) = (u, v)$.
(i) Show that $\operatorname{rank} f'(x, y) = 2$ if and only if $a \neq 2$.
(ii) Find the image of the square
$$K := \{(x, y) : 0 \le x \le 1,\ 0 \le y \le 1\}$$
in the three cases $a = 1$, $a = 2$, $a = 3$.
(iii) If a point moves around the boundary of $K$ in the counterclockwise sense, find what its image does in the cases $a = 1$, $a = 3$. What is the sign of $J_f$ in each case?
4.78. Let $f(x, y) = (x, xy) = (u, v)$. Draw some curves $u$ = constant, $v$ = constant in the $(x, y)$ plane and some curves $x$ = constant, $y$ = constant in the $(u, v)$ plane. Is this map one-to-one? Into what region of the $(u, v)$ plane does $f$ map the rectangle $\{(x, y) : 1 \le x \le 2,\ 0 \le y \le 2\}$? What points in the $(x, y)$ plane map into the rectangle $\{(u, v) : 1 \le u \le 2,\ 0 \le v \le 2\}$?
4.79. If $f(x, y) = (\cos(x + y), \sin(x + y)) = (u, v)$, show from $f'(x, y)$ that $(u, v)$ lies on a curve in the $(u, v)$ plane. What is its equation?
4.80. In each of the following show that $(u, v, w)$ lies on a surface in $\mathbb{R}^3$ and find an equation for the surface.
(i) $u = \dfrac{x + y}{z}$, $v = \dfrac{z + y}{x}$, $w = \dfrac{y(x + y + z)}{xz}$.
(ii) $u = x + y + z$, $v = xy + yz + zx$, $w = x^3 + y^3 + z^3 - 3xyz$.
(iii) $u = x + y + z$, $v = x^2 + y^2 + z^2$, $w = x^3 + y^3 + z^3 - 3xyz$.
[Solutions: (i) $w + 1 = uv$, (ii) $w = u(u^2 - 3v)$, (iii) $w = u(3v - u^2)/2$.]


4.81. Show that $\{(u\cos(v), u\sin(v), u) : 0 < u < 1,\ 0 < v < 2\pi\}$ is a 2-dimensional segment in $\mathbb{R}^3$. Draw a picture of it.
4.82. Show that a set cannot be both an $r$-dimensional segment and a $k$-dimensional segment if $r \neq k$. [Hint: Suppose $S = \phi(U) = \psi(V)$ where $\phi, \psi$ are $C^1$ functions on $U, V$ respectively, $U, V$ are open sets in $\mathbb{R}^r, \mathbb{R}^k$ respectively with (say) $r < k$. Show that if $\phi$ and $\psi$ are one-to-one and $\operatorname{rank} \phi' = r$, $\operatorname{rank} \psi' = k$, there is a contradiction. Use the Implicit Function Theorem.]
4.83. If $\phi_i$, $i = 1, \dots, k$ are $C^1$ functions on an open set $D \subset \mathbb{R}^n$, then the set of points $(x_1, \dots, x_n, y_1, \dots, y_k)$ in $\mathbb{R}^{n+k}$ satisfying
$$y_i = \phi_i(x_1, \dots, x_n), \quad i = 1, \dots, k$$
is an $n$-dimensional manifold in $\mathbb{R}^{n+k}$. [That is, the graph of a $C^1$ function from $\mathbb{R}^n \to \mathbb{R}^k$ is an $n$-dimensional manifold if its domain is open.]
4.84. Let $V = x^2 + y^2 + z^2$, $\phi = x^2 + y^2 - 1$, $\psi = x + y + z$. Find the maximum and minimum of $V$ subject to the constraints $\phi = 0$, $\psi = 0$. This means: Find the length (squared) of the major and minor semi-axes of the ellipse $C$ which is the intersection of the cylinder $\phi = 0$ and the plane $\psi = 0$. [Solution: $\max V = V(\pm(1/\sqrt{2}, 1/\sqrt{2}, -\sqrt{2})) = 3$ and $\min V = V(\pm(1/\sqrt{2}, -1/\sqrt{2}, 0)) = 1$.]

Figure 4.23. Illustration for Exercise 84. (Labels in the sketch: the cylinder $\phi = 0$, the plane $\psi = 0$, and the ellipse $C$.)

4.85. Find the minimum value of $V = x^2 + y^2 + z^2$ when
(i) $x + y + z = 3a$ [Solution: $3a^2$ at $(a, a, a)$.]
(ii) $xy + yz + zx = 3a^2$ [Solution: $3a^2$ at $(a, a, a)$.]
(iii) $xyz = a^3$ [Solution: $3a^2$ at $(a, a, a)$, $(a, -a, -a)$, $(-a, a, -a)$, $(-a, -a, a)$.]

4.86. Find the maximum and minimum values of $x^2 + y^2 - 3x + 5y$ when $(x + y)^2 = 4(x - y)$.
4.87. Find the maximum and minimum values of $2x^2 + y^2 + 2x$ if $|x| + |y| \le 1$.
4.88. Do Exercise 53 again using the Lagrange Multiplier Rule.
4.89. Prove that the stationary values of $V = x^2 + y^2 + z^2$, subject to the constraints $\ell x + my + nz = 0$, $ax^2 + by^2 + cz^2 = 1$ are given by the quadratic
$$\frac{\ell^2}{1 - aV} + \frac{m^2}{1 - bV} + \frac{n^2}{1 - cV} = 0.$$
4.90. Show that the following inequalities are true.
(a) $(x_1 x_2 \cdots x_n)^2 \le \dfrac{1}{n^n}$ if $x_1^2 + x_2^2 + \dots + x_n^2 = 1$.
(b) (Arithmetic-geometric mean inequality)
$$(a_1 a_2 \cdots a_n)^{1/n} \le \frac{1}{n}(a_1 + a_2 + \dots + a_n), \quad a_i \ge 0,\ i = 1, \dots, n.$$

4.91. Given two smooth plane curves
$$f(x, y) = 0 \quad\text{and}\quad g(x, y) = 0$$
show that when the distance between point $(\alpha, \beta)$ and $(\xi, \eta)$ lying on the respective curves has an extremum, then
$$\frac{\alpha - \xi}{\beta - \eta} = \frac{f_x(\alpha, \beta)}{f_y(\alpha, \beta)} = \frac{g_x(\xi, \eta)}{g_y(\xi, \eta)}.$$
Use this to find the shortest distance between the ellipse $x^2 + 2xy + 5y^2 - 16y = 0$ and the line $x + y - 8 = 0$.

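4.92. Let $f$ be a real valued function of class $C^1$ on $\mathbb{R}^3$. Prove that there are at least two points on the sphere $x^2 + y^2 + z^2 = R^2$, $R > 0$, at which the equations
$$y\frac{\partial f}{\partial x} - x\frac{\partial f}{\partial y} = 0$$
$$z\frac{\partial f}{\partial y} - y\frac{\partial f}{\partial z} = 0$$
$$x\frac{\partial f}{\partial z} - z\frac{\partial f}{\partial x} = 0$$
are satisfied.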
4.93. (The Hölder and Minkowski inequalities.) Let $p > 1$, $q > 1$ and $\dfrac{1}{p} + \dfrac{1}{q} = 1$.
(a) Show that the minimum of $f(x, y) = \dfrac{x^p}{p} + \dfrac{y^q}{q}$ subject to the constraints $x > 0$, $y > 0$, and $xy = 1$ is $1$.
(b) Show that $a \ge 0$, $b \ge 0$ implies $ab \le \dfrac{a^p}{p} + \dfrac{b^q}{q}$.
(c) Show that $a_k \ge 0$, $b_k \ge 0$, $k = 1, \dots, n$ implies
$$\sum_{k=1}^{n} a_k b_k \le \left(\sum_{k=1}^{n} a_k^p\right)^{1/p} \left(\sum_{k=1}^{n} b_k^q\right)^{1/q}$$
(Hölder's Inequality). [Hint: Let $A = \left(\sum_{k=1}^{n} a_k^p\right)^{1/p}$ and $B = \left(\sum_{k=1}^{n} b_k^q\right)^{1/q}$ and consider $a = a_k/A$, $b = b_k/B$.]
(d) If $a_k, b_k$, $k = 1, \dots, n$ are real numbers and $p \ge 1$, then
$$\left(\sum_{k=1}^{n} |a_k + b_k|^p\right)^{1/p} \le \left(\sum_{k=1}^{n} |a_k|^p\right)^{1/p} + \left(\sum_{k=1}^{n} |b_k|^p\right)^{1/p}$$
(Minkowski's inequality) [Hint: $|a + b|^p = |a + b||a + b|^{p/q} \le |a||a + b|^{p/q} + |b||a + b|^{p/q}$ if $p > 1$. Use Hölder's inequality.]
References for Chapter IV
R. G. Bartle: Chapter V.
R. C. Buck: Chapter V.


Chapter 5

Further Topics in Integration

5.1 Changes of Variables in Integrals

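Every $n \times n$ matrix $A$ can be expressed as a product of a finite number of matrices of the (block) form
$$A_1 = \begin{pmatrix} I & O & O \\ O & a & O \\ O & O & I \end{pmatrix} \begin{matrix} \\ \leftarrow k\text{-row} \\ \\ \end{matrix}$$
$$A_2 = \begin{pmatrix} I & O & O & O & O \\ O & 1 & O & 1 & O \\ O & O & I & O & O \\ O & 0 & O & 1 & O \\ O & O & O & O & I \end{pmatrix} \begin{matrix} \\ \leftarrow k\text{-row} \\ \\ \leftarrow j\text{-row} \\ \\ \end{matrix}$$
$$A_3 = \begin{pmatrix} I & O & O & O & O \\ O & 0 & O & 1 & O \\ O & O & I & O & O \\ O & 1 & O & 0 & O \\ O & O & O & O & I \end{pmatrix} \begin{matrix} \\ \leftarrow k\text{-row} \\ \\ \leftarrow j\text{-row} \\ \\ \end{matrix}.$$
That is, every linear function from $\mathbb{R}^n$ to $\mathbb{R}^n$ can be expressed as a composition of linear functions of the form
$$L_1(x_1, \dots, x_{k-1}, x_k, x_{k+1}, \dots, x_n) = (x_1, \dots, x_{k-1}, a x_k, x_{k+1}, \dots, x_n)$$
$$L_2(x_1, \dots, x_{k-1}, x_k, x_{k+1}, \dots, x_n) = (x_1, \dots, x_{k-1}, x_k + x_j, x_{k+1}, \dots, x_n)$$
$$L_3(x_1, \dots, x_{k-1}, x_k, \dots, x_{j-1}, x_j, \dots, x_n) = (x_1, \dots, x_{k-1}, x_j, \dots, x_{j-1}, x_k, \dots, x_n).$$
We shall not prove this statement. You should prove it yourself as an exercise; it is obviously true for $n = 1$, so use induction on $n$. But first prove it for $n = 2$ to see how the general proof should go.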

Recall the concepts of an interval in $\mathbb{R}^n$, $I = \prod_{j=1}^{n} [a_j, b_j]$; the Jordan content of an interval in $\mathbb{R}^n$, $\mu(I) = \prod_{j=1}^{n} (b_j - a_j)$; and the diameter of an interval $I$ in $\mathbb{R}^n$, $\operatorname{diam}(I) = \left(\sum_{j=1}^{n} (b_j - a_j)^2\right)^{1/2}$. A special case of an interval is an $n$-cube.
Definition 5.1. If for the interval $I$, $b_j - a_j = a$, $j = 1, \dots, n$, then $I$ is an $n$-cube of side $a$ and center $\left(\dfrac{a_1 + b_1}{2}, \dots, \dfrac{a_n + b_n}{2}\right)$.

Our immediate goal is to look at sets with content and to investigate what
happens to the content after mapping by nice functions. We begin with investigating the content of intervals under linear functions, and get progressively more
sophisticated. This is background for change of variables in integration.
Lemma 5.2. If $I$ is an interval in $\mathbb{R}^n$ and $\phi : \mathbb{R}^n \to \mathbb{R}^n$ is linear, then
$$\mu(\phi(I)) = |J_\phi|\,\mu(I).$$

Proof. The matrices $A_j$ are the Jacobian matrices for the linear functions $L_j$, $j = 1, 2, 3$, respectively. It is easy to determine what these matrices do to intervals $I$. Only the linear function $L_1$ changes the content by changing the length of one side.
Indeed, both $L_1(I)$ and $L_3(I)$ are again intervals. For $L_1(I)$, the $k$-th interval becomes either $[a a_k, a b_k]$ ($a > 0$), or $[a b_k, a a_k]$ ($a < 0$), and the content is $\mu(L_1(I)) = |a|\,\mu(I)$. For $L_3(I)$, the intervals of the $k$-th and $j$-th coordinates are swapped, and thus, the overall content, namely the product of the lengths of the intervals, remains unchanged.
Since $L_2(I)$ is no longer an interval, we need to use the theory developed in Chapter 3. In passing from $I$ to $L_2(I)$ only the description of the $k$-th coordinate changes:
$$L_2(I) = \{p = (x_1, \dots, x_n) : a_m \le x_m \le b_m,\ m = 1, \dots, k - 1, k + 1, \dots, n,\ \ a_k + x_j \le x_k \le b_k + x_j\}.$$
Let $\chi_{L_2(I)}$ denote the characteristic function of $L_2(I)$. Choose an interval $K \subset \mathbb{R}^n$ so that $L_2(I) \subset K$. Let $K_2$ be the 2-dimensional subinterval for the variables $x_j$ and $x_k$, and let $K_{n-2}$ be the subinterval for the remaining variables; write $I_{n-2}$ for the corresponding subinterval of $I$. Then by the definition of content and Theorem 3.29, we have
$$\mu(L_2(I)) = \int_K \chi_{L_2(I)} = \int_{K_{n-2}} \int_{K_2} \chi_{L_2(I)} = \int_{I_{n-2}} \left(\int_{a_j}^{b_j} \left(\int_{a_k + x_j}^{b_k + x_j} 1\,dx_k\right) dx_j\right) \quad\text{(Corollary 3.26)}$$
$$= (b_j - a_j)(b_k - a_k) \int_{I_{n-2}} 1 = \mu(I).$$
Thus, the Lemma holds for the elementary linear functions. Any linear function is a composition of finitely many elementary linear functions, and the differential of a composition is the product of the differentials, so that the Jacobian is the product of the Jacobians; the general case follows.
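For $n = 2$ the statement of Lemma 5.2 is easy to test by machine, because the image of a rectangle under a linear map is a parallelogram whose area the shoelace formula gives exactly. The sketch below is an illustration under our own choices: the rectangle $[1, 3] \times [2, 5]$, the scale factor $a = -3$, and all helper names are assumptions, not taken from the notes.

```python
from fractions import Fraction

def apply(a, p):
    """Apply the 2x2 matrix a to the point p."""
    return (a[0][0] * p[0] + a[0][1] * p[1], a[1][0] * p[0] + a[1][1] * p[1])

def polygon_area(vertices):
    """Unsigned area of a simple polygon via the shoelace formula."""
    total = 0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

# Elementary maps for n = 2: scale x_1 by a = -3, add x_2 to x_1, swap coordinates.
A1 = [[Fraction(-3), 0], [0, 1]]
A2 = [[1, 1], [0, 1]]
A3 = [[0, 1], [1, 0]]

# The interval I = [1, 3] x [2, 5], corners listed counterclockwise.
corners = [(Fraction(1), Fraction(2)), (Fraction(3), Fraction(2)),
           (Fraction(3), Fraction(5)), (Fraction(1), Fraction(5))]
content_I = polygon_area(corners)

# For each map, record (content of image / content of I, |det|).
ratios = {}
for name, m in [("A1", A1), ("A2", A2), ("A3", A3),
                ("A3*A2*A1", matmul(A3, matmul(A2, A1)))]:
    image = [apply(m, p) for p in corners]
    ratios[name] = (polygon_area(image) / content_I, abs(det(m)))
```

In each case the two entries of the pair coincide, which is the identity $\mu(\phi(I)) = |J_\phi|\,\mu(I)$ for these particular maps.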
The next three lemmas and Theorem 5.7 that follows are technical in nature.
In view of the last lemma, you will find it easy to accept Theorem 5.7, so on your
first reading, you may proceed directly to the important change of variables theorem,
Theorem 5.8; however, read the statement of Theorem 5.7 first.
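Lemma 5.3. Suppose that $\phi : G \subset \mathbb{R}^n \to \mathbb{R}^n$ is $C^1(G)$ on the open set $G$. Then
$$D \subset G, \quad D \text{ compact and } \mu(D) = 0 \implies \mu(\phi(D)) = 0.$$
In particular, this holds if $\phi$ is a linear map of rank $n$.
Proof. If $\epsilon > 0$, there is a finite collection of $n$-cubes $I_j$, $j = 1, \dots, m$, such that
$$D \subset \bigcup_{j=1}^{m} I_j \subset G \quad\text{and}\quad \sum_{j=1}^{m} \mu(I_j) < \epsilon \quad\text{(Why?)}$$
$$\phi \in C^1(G) \implies \phi \in C^1\Big(\bigcup_{j=1}^{m} I_j\Big).$$
Thus, the partial derivatives of $\phi$ are continuous on the compact set $\bigcup_{j=1}^{m} I_j$. Therefore, there exists $M > 0$ such that
$$|D\phi(p)(u)| \le M|u|, \quad \forall p \in \bigcup_{j=1}^{m} I_j, \quad \forall u \in \mathbb{R}^n.$$
From the Mean Value Theorem 4.29 applied to $\phi$, we find
$$|\phi(p) - \phi(q)| \le M|p - q|, \quad \forall p, q \in I_j,\ j = 1, \dots, m,$$
$$\le M \operatorname{diam}(I_j) = \sqrt{n}\,M\,\mu(I_j)^{1/n} \quad (I_j \text{ is an } n\text{-cube}).$$
Therefore, the diameter of $\phi(I_j)$ is at most $\sqrt{n}\,M\,\mu(I_j)^{1/n}$, and so we can find an $n$-cube $K_j$ with
$$\phi(I_j) \subset K_j \quad\text{and}\quad \mu(K_j) \le \left(2\sqrt{n}\,M\right)^n \mu(I_j), \quad j = 1, \dots, m.$$
Consequently,
$$\phi(D) \subset \bigcup_{j=1}^{m} K_j \quad\text{and}\quad \mu(\phi(D)) \le \sum_{j=1}^{m} \mu(K_j) \le \left(2\sqrt{n}\,M\right)^n \epsilon.$$
It follows that $\phi(D)$ has content zero.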


Lemma 5.4. Suppose $G \subset \mathbb{R}^n$ is an open set and $\phi : G \to \mathbb{R}^n$ satisfies
$$\phi \in C^1(G) \quad\text{and}\quad J_\phi(p) \neq 0, \quad \forall p \in G.$$
If $D$ is a compact subset of $G$ and $D$ has content, then $\phi(D)$ is a compact set with content.
Proof. We know that $\phi \in C(D)$ and $D$ compact implies $\phi(D)$ is compact (Theorem 2.36). In particular, $\phi(D)$ is closed, so the boundary of $\phi(D)$ is contained in $\phi(D)$. By Theorem 4.50 and Theorem 4.53, $\phi$ is locally one-to-one on $G$ and maps open sets onto open sets. Therefore,
$$\text{both}\quad \phi(G) \quad\text{and}\quad \phi(G) \setminus \phi(D) \quad\text{are open sets.}$$
Now, if $\phi(p) = q$ is on the boundary of $\phi(D)$, then each neighborhood $V$ of $q$ contains a point $q_1 \in \phi(G) \setminus \phi(D)$, which implies that each neighborhood $U$ of $p$ contains a point in $G \setminus D$. Therefore, $p$ is in the boundary of $D$ and we have shown
$$\partial(\phi(D)) \subset \phi(\partial D). \tag{5.1}$$
The set $D$ has content if and only if the content of its boundary is zero (Theorem 3.14). Therefore
$$\mu(\partial D) = 0 \implies \mu(\phi(\partial D)) = 0 \quad\text{(Lemma 5.3)}$$
$$\implies \mu(\partial(\phi(D))) = 0 \quad\text{(by (5.1))}$$
$$\implies \phi(D) \text{ has content.} \quad\text{(Theorem 3.14)}$$

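Lemma 5.5. Suppose that $K$ is the $n$-cube of side $2r$ centered at $O$, i.e.
$$K = \underbrace{[-r, r] \times \dots \times [-r, r]}_{n \text{ times}}.$$
Let $G$ be an open set in $\mathbb{R}^n$ containing $K$ and suppose we have a function $\phi : G \to \mathbb{R}^n$ such that
$$\phi \in C^1(G) \quad\text{and}\quad J_\phi(p) \neq 0, \quad \forall p \in K.$$
Then for $\alpha$, $0 < \alpha < 1/\sqrt{n}$,
$$|\phi(p) - p| < \alpha|p|, \quad \forall p \in K \implies \left(1 - \alpha\sqrt{n}\right)^n < \frac{\mu(\phi(K))}{\mu(K)} < \left(1 + \alpha\sqrt{n}\right)^n.$$
Proof. From the proof of Lemma 5.4, (5.1), we know that $\partial(\phi(K)) \subset \phi(\partial K)$. If $p = (x_1, \dots, x_n) \in \partial K$, then $r \le |p| \le \sqrt{n}\,r$, and consequently,
$$|\phi(p) - p| < \alpha|p| \le \alpha\sqrt{n}\,r$$
$$\implies r(1 - \alpha\sqrt{n}) \underbrace{\le}_{\text{some } j} |x_j| - \alpha\sqrt{n}\,r < |\phi_j(p)| < |x_j| + \alpha\sqrt{n}\,r \le r(1 + \alpha\sqrt{n}),$$
where the left most inequality is true for at least one $j$. Therefore, the boundary of $\phi(K)$ lies outside an $n$-cube of side $2(1 - \alpha\sqrt{n})r$ with center $O$ and inside an $n$-cube of side $2(1 + \alpha\sqrt{n})r$ with center $O$. Since $\mu(K) = (2r)^n$, this implies
$$\left(1 - \alpha\sqrt{n}\right)^n < \frac{\mu(\phi(K))}{\mu(K)} < \left(1 + \alpha\sqrt{n}\right)^n.$$

Figure 5.1. The cube $K$ is mapped to a region $\phi(K)$ whose boundary is captured between two cubes of side lengths $1 - \alpha\sqrt{n}$ and $1 + \alpha\sqrt{n}$.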
We next need a stronger version of Lemma 5.2.
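Lemma 5.6. Suppose that $L : \mathbb{R}^n \to \mathbb{R}^n$ is a linear map of full rank, and that $\phi : G \subset \mathbb{R}^n \to \mathbb{R}^n$, with $G$ open, is in $C^1(G)$ and $J_\phi(p) \neq 0$ for all $p \in G$. Then for any interval $I \subset G$,
$$\mu(L(\phi(I))) = |J_L|\,\mu(\phi(I)).$$
Proof. By Lemma 5.4, $\phi(I)$ has content. Therefore, for any interval $K$ with $\phi(I) \subset K$,
$$\mu(\phi(I)) = \int_K \chi_{\phi(I)}.$$
By the Cauchy Criterion for integrals, Corollary 3.7, and by the proper choice of Riemann sums, given any $\epsilon > 0$, there are finitely many intervals with non-overlapping interiors, $I_1, \dots, I_m, I_{m+1}, \dots, I_N$ such that
$$\bigcup_{j=1}^{m} I_j \subset \phi(I) \subset \bigcup_{j=1}^{N} I_j$$
and
$$\mu(\phi(I)) - \epsilon \le \sum_{j=1}^{m} \mu(I_j) \le \mu(\phi(I)) \le \sum_{j=1}^{N} \mu(I_j) \le \mu(\phi(I)) + \epsilon.$$
Then
$$L\Big(\bigcup_{j=1}^{m} I_j\Big) \subset L(\phi(I)) \subset L\Big(\bigcup_{j=1}^{N} I_j\Big)$$
and we have from Lemma 5.2
$$\mu\Big(L\Big(\bigcup_{j=1}^{m} I_j\Big)\Big) \le \mu(L(\phi(I))) \le \mu\Big(L\Big(\bigcup_{j=1}^{N} I_j\Big)\Big)$$
$$\implies \sum_{j=1}^{m} |J_L|\,\mu(I_j) \le \mu(L(\phi(I))) \le \sum_{j=1}^{N} |J_L|\,\mu(I_j)$$
$$\implies |J_L|\,(\mu(\phi(I)) - \epsilon) \le \mu(L(\phi(I))) \le |J_L|\,(\mu(\phi(I)) + \epsilon).$$
Since $\epsilon > 0$ is arbitrary, the lemma follows.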


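Theorem 5.7 (The Jacobian Theorem). Suppose $D$ is a compact subset of an open set $G \subset \mathbb{R}^n$ and
$$\phi : G \to \mathbb{R}^n, \quad \phi \in C^1(G) \quad\text{with}\quad J_\phi(p) \neq 0, \quad \forall p \in G.$$
Then, given $\epsilon > 0$, there is a $\delta(\epsilon) > 0$ such that for any $n$-cube, $K$, with center $p \in D$ and side length less than $2\delta$,
$$|J_\phi(p)|\,(1 - \epsilon)^n \le \frac{\mu(\phi(K))}{\mu(K)} \le |J_\phi(p)|\,(1 + \epsilon)^n.$$
Proof. Since $J_\phi(p) \neq 0$, $(D\phi(p))^{-1} =: L_p$ exists and is linear. Moreover, $\det[L_p] = 1/J_\phi(p)$. Since $D$ is compact and $\phi \in C^1(D)$, the partial derivatives of $\phi$ are uniformly continuous on $D$ and we can deduce the following:
(a) There is an $M > 0$ such that
$$|L_p(u)| \le M|u|, \quad \forall p \in D \text{ and } u \in \mathbb{R}^n. \quad\text{(Why?)}$$
(b) If $\epsilon > 0$ is given, there is a $\delta = \delta(\epsilon) > 0$ (independent of $p$) such that
$$(p \in D \text{ and } |u| < \delta) \implies |\phi(p + u) - \phi(p) - D\phi(p)(u)| \le \frac{\epsilon}{M\sqrt{n}}\,|u|.$$
Indeed, for any $p + u \in B(p, \delta) \subset G$ centered on $p$, by the Mean Value Theorem (Theorem 4.29), $\phi(p + u) - \phi(p) = D\phi(c)(u)$ for some $c$ on the line segment joining $p + u$ and $p$. The inequality follows for suitable $\delta$ by the uniform continuity of the partial derivatives when $|p - c| < \delta$.
For fixed $p$, define $\psi(u) = L_p(\phi(p + u) - \phi(p))$. Then
$$|\psi(u) - u| = |\psi(u) - L_p(D\phi(p))(u)| \le \frac{\epsilon}{\sqrt{n}}\,|u|, \quad\text{by (a) and (b)}.$$
Applying Lemma 5.5 with $\alpha = \epsilon/\sqrt{n}$ to $\psi$ on the cube $K - p$, which is centered at $O$ and has the same content as $K$, we find
$$(1 - \epsilon)^n \le \frac{\mu(\psi(K - p))}{\mu(K)} \le (1 + \epsilon)^n.$$
But, by Lemma 5.6 applied to $\tilde\phi$ and $K - p$ where $\tilde\phi(u) = \phi(p + u) - \phi(p)$, and from the fact that content is invariant under translation, we find
$$\mu(\psi(K - p)) = \mu(L_p(\tilde\phi(K - p))) = |\det[L_p]|\,\mu(\tilde\phi(K - p)) = \frac{\mu(\phi(K) - \phi(p))}{|J_\phi(p)|} = \frac{\mu(\phi(K))}{|J_\phi(p)|}$$
$$\implies |J_\phi(p)|\,(1 - \epsilon)^n \le \frac{\mu(\phi(K))}{\mu(K)} \le |J_\phi(p)|\,(1 + \epsilon)^n.$$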

Theorem 5.8 (Change of Variable Theorem). Suppose that
(i) $G$ is an open subset of $\mathbb{R}^n$;
(ii) $\phi : G \to \mathbb{R}^n$, $\phi \in C^1(G)$, $J_\phi(p) \neq 0$ for all $p \in G$;
(iii) $D$ is a compact, connected subset of $G$, $D$ has content, and $\phi$ is one-to-one on $D$.
Then $\phi(D)$ has content and for any $f$, $f : \phi(D) \to \mathbb{R}$, $f \in C(\phi(D))$, we have
$$\int_{\phi(D)} f = \int_D (f \circ \phi)\,|J_\phi|.$$
Proof. The set $\phi(D)$ has content by Lemma 5.4. Without loss of generality, we may assume both $f \ge 0$ and $J_\phi \ge 0$ (WHY?). By Theorem 3.12 both of the integrals in the theorem exist. Therefore, it only remains to show that they are equal.
For any $\epsilon$, $1 > \epsilon > 0$, we may choose a partition on $D$ consisting of $n$-cubes $K_j$, $j = 1, \dots, m$, which are sufficiently small so that
(a) $\left|\int_D (f \circ \phi) J_\phi - \sum_{j=1}^{m} (f \circ \phi)(q_j)\,J_\phi(p_j)\,\mu(K_j)\right| < \epsilon$, for any $q_j \in K_j$ and $p_j$ being the center of $K_j$;
(b) $J_\phi(p_j)(1 - \epsilon)^n \le \dfrac{\mu(\phi(K_j))}{\mu(K_j)} \le J_\phi(p_j)(1 + \epsilon)^n$,
(item (a) by the definition of the integral and the continuity of $J_\phi(p)$, and item (b) by Theorem 5.7).

Figure 5.2. A partition of $D$ and the distortion under the mapping $\phi$.

Now on the other hand
$$\int_{\phi(D)} f = \sum_{j=1}^{m} \int_{\phi(K_j)} f \quad\text{(since } \phi \text{ is one-to-one)}$$
$$= \sum_{j=1}^{m} f(p_j')\,\mu(\phi(K_j)), \quad p_j' \in \phi(K_j) \quad\text{(Theorem 3.15(e))}$$
$$= \sum_{j=1}^{m} f(\phi(q_j))\,\mu(\phi(K_j)), \quad\text{where } q_j \in K_j.$$
Therefore from item (b) above, we have
(c) $\displaystyle\sum_{j=1}^{m} f(\phi(q_j))\,J_\phi(p_j)\,\mu(K_j)(1 - \epsilon)^n \le \int_{\phi(D)} f \le \sum_{j=1}^{m} f(\phi(q_j))\,J_\phi(p_j)\,\mu(K_j)(1 + \epsilon)^n.$
Since $\epsilon > 0$ is arbitrary, (a) and (c) together imply
$$\int_D (f \circ \phi) J_\phi = \int_{\phi(D)} f.$$
Remarks: Strictly speaking, the Mean Value Theorem for Integrals, Theorem 3.15(e), is not applicable to those $K_j$ which intersect the boundary of $D$. But this presents no problem (why?).
You will notice that Theorem 5.8 is much more restrictive than the corresponding result in $\mathbb{R}^1$, Corollary 3.19, which states that
$$\int_{\phi(a)}^{\phi(b)} f = \int_a^b (f \circ \phi)\,\phi'$$
without the restrictions that $\phi' \neq 0$ and $\phi$ being one-to-one. These restrictions are necessary in $\mathbb{R}^n$ because of considerations derived from the notion of orientation which will be discussed later.
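Example 5.9 Compute the area of the region
$$\{(x, y) : 0 \le y,\ 0 < r^2 \le x^2 + y^2 \le R^2\} \subset \mathbb{R}^2.$$
Let $(x, y) = \phi(u, v) = (v\cos(u), v\sin(u))$. Then
$$J_\phi = \frac{\partial(x, y)}{\partial(u, v)} = \det\begin{pmatrix} -v\sin(u) & \cos(u) \\ v\cos(u) & \sin(u) \end{pmatrix}.$$
Therefore, $|J_\phi| = v$ and, with $D = \{(u, v) : 0 \le u \le \pi,\ r \le v \le R\}$, by Theorem 5.8
$$\mu(\phi(D)) = \int_{\phi(D)} \chi_{\phi(D)} = \int_D (\chi_{\phi(D)} \circ \phi)\,|J_\phi| = \int_D \chi_D\,|J_\phi| = \iint_D v\,du\,dv = \pi\int_r^R v\,dv = \frac{\pi}{2}\left(R^2 - r^2\right).$$

Figure 5.3. The regions $D$ and $\phi(D)$ for Example 5.9.

Alternatively, you may write this as
$$\int_{\phi(D)} 1\,dx\,dy = \int_D 1\,\left|\frac{\partial(x, y)}{\partial(u, v)}\right|\,du\,dv.$$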
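Both sides of the change of variables in Example 5.9 can be approximated independently. In the sketch below, which is illustrative only, the radii `r` and `R` and the grid sizes are hypothetical choices: the $(u, v)$ side is a midpoint rule for $\int_D v\,du\,dv$, while the $(x, y)$ side simply counts small grid cells whose centers fall in the half annulus.

```python
import math

r, R = 1.0, 2.0                      # hypothetical radii chosen for the check
exact = math.pi * (R * R - r * r) / 2

# (u, v) side: midpoint rule for the integral of |J| = v over D = [0, pi] x [r, R].
n = 200
du, dv = math.pi / n, (R - r) / n
uv_side = sum((r + (j + 0.5) * dv) * du * dv for _ in range(n) for j in range(n))

# (x, y) side: count grid cells whose centers lie in the half annulus.
h = 0.005
nx, ny = int(round(2 * R / h)), int(round(R / h))
count = 0
for i in range(nx):
    x = -R + (i + 0.5) * h
    for j in range(ny):
        y = (j + 0.5) * h
        if r * r <= x * x + y * y <= R * R:
            count += 1
xy_side = count * h * h
```

The midpoint rule is exact here up to rounding because the integrand is linear in $v$; the cell count is a much cruder estimate, so only rough agreement should be expected from it.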

Remark: The conditions $J_\phi \neq 0$ and $\phi$ is one-to-one etc., may fail to hold on a set $S \subset D$ without affecting the result of Theorem 5.8 provided $\mu(S) = \mu(\phi(S)) = 0$. All the integrals involved may be approximated as closely as we please by integrals over regions for which the conditions do hold. For instance, in Example 5.9, we may take $r = 0$ and get the correct formula for the area of the semicircle even though $J_\phi = 0$ on the $u$-axis and $\phi$ maps any segment of that axis onto the point $(0, 0)$ in the $(x, y)$-plane. The details are assigned as Exercise 4.
Example 5.10 Find the area of the region in the $(x, y)$-plane bounded by the curve $r = a + b\cos(\theta)$, $0 < b < a$, where $x = r\cos(\theta)$, $y = r\sin(\theta)$.

Figure 5.4. The region $D$ in the $(\theta, r)$-plane and its image, the inside of a cardioid in the $(x, y)$-plane for Example 5.10.

Now
$$(x, y) = \phi(r, \theta) = (r\cos(\theta), r\sin(\theta)), \qquad J_\phi = \frac{\partial(x, y)}{\partial(r, \theta)} = \begin{vmatrix} \cos(\theta) & -r\sin(\theta) \\ \sin(\theta) & r\cos(\theta) \end{vmatrix} = r.$$
Therefore,
$$\int_{\phi(D)} dx\,dy = \int_D \left|\frac{\partial(x, y)}{\partial(r, \theta)}\right| dr\,d\theta = \int_0^{2\pi} \int_0^{a + b\cos(\theta)} r\,dr\,d\theta$$
$$= \frac{1}{2}\int_0^{2\pi} (a + b\cos(\theta))^2\,d\theta = \frac{1}{2}\int_0^{2\pi} \left(a^2 + 2ab\cos(\theta) + b^2\cos^2(\theta)\right) d\theta$$
$$= \frac{1}{2}\int_0^{2\pi} \left(a^2 + \frac{1}{2}b^2(1 + \cos(2\theta))\right) d\theta = \pi\left(a^2 + \frac{1}{2}b^2\right).$$
Here again $J_\phi = 0$ on the $\theta$-axis ($r = 0$) and $\phi$ is not one-to-one on the lines $\theta = 0$, $\theta = 2\pi$, and $r = 0$. However, $\int_{\phi(D)}$ and $\int_D$ both exist and Theorem 5.8 is applicable to regions which differ from $\phi(D)$ and $D$ respectively by regions which have arbitrarily small content.
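The area formula of Example 5.10 can be compared with a direct polygonal approximation of the curve in the $(x, y)$-plane. This is a minimal sketch: the values of `a` and `b` are hypothetical (any $0 < b < a$ would do), and the area is estimated by the shoelace formula on a polygon inscribed in the curve $r = a + b\cos(\theta)$.

```python
import math

a, b = 2.0, 1.0                      # hypothetical values with 0 < b < a
exact = math.pi * (a * a + b * b / 2)

# Sample the curve r = a + b cos(theta) and form an inscribed polygon.
n = 20000
pts = []
for k in range(n):
    t = 2 * math.pi * k / n
    radius = a + b * math.cos(t)
    pts.append((radius * math.cos(t), radius * math.sin(t)))

# Shoelace formula for the area enclosed by the polygon.
shoelace = 0.0
for k in range(n):
    x0, y0 = pts[k]
    x1, y1 = pts[(k + 1) % n]
    shoelace += x0 * y1 - x1 * y0
polygon_area = abs(shoelace) / 2
```

Since $0 < b < a$ the radius never vanishes, so the curve is simple and the polygon area converges to the enclosed area as the number of sample points grows.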
Example 5.11 The function $(x, y) = \phi(u, v) = (u^{2/3} v^{1/3}, u^{1/3} v^{2/3})$ maps the triangle $D := \{(u, v) : 0 \le u,\ 0 \le v,\ u + v \le 1\}$ onto the area bounded by the loop of the curve $x^3 + y^3 = xy$. Observe that $J_\phi = \frac{1}{3}$ and that $\phi$ maps the $u$ and $v$ axes onto $(0, 0)$.

Figure 5.5. The triangle $D$ in the $(u, v)$ plane is transformed to the inside of the loop of the curve $x^3 + y^3 = xy$ in Example 5.11.

If $f$ is continuous on $\phi(D)$, then
$$\int_{\phi(D)} f = \int_D (f \circ \phi)\,|J_\phi|.$$
For example, if $f(x, y) = xy$, then $(f \circ \phi)(u, v) = (u^{2/3} v^{1/3})(u^{1/3} v^{2/3}) = uv$; hence
$$\int_{\phi(D)} f = \frac{1}{3}\int_0^1 \int_0^{1 - v} uv\,du\,dv = \frac{1}{6}\int_0^1 v(1 - v)^2\,dv = \frac{1}{72}.$$
Why is this valid? $\phi$ is not $C^1$ at $(0, 0)$ and is not one-to-one on the $u$ and $v$ axes.
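One way to gain confidence in the value $1/72$ is to compute the same integral by a different route. The sketch below is our own check, not the method of the notes: it uses polar coordinates in the $(x, y)$-plane, where the loop $x^3 + y^3 = xy$ becomes $\rho = \cos t \sin t/(\cos^3 t + \sin^3 t)$ for $0 < t < \pi/2$, and applies a midpoint rule in $t$. The step count `n` is an arbitrary choice.

```python
import math

def loop_radius(t):
    """Polar radius of the loop x^3 + y^3 = xy at angle t in (0, pi/2)."""
    c, s = math.cos(t), math.sin(t)
    return c * s / (c ** 3 + s ** 3)

n = 20000
dt = (math.pi / 2) / n
integral_xy = 0.0   # integral of f(x, y) = xy over the region inside the loop
area = 0.0          # content of the region inside the loop
for k in range(n):
    t = (k + 0.5) * dt
    rho = loop_radius(t)
    # The radial integrals are done in closed form:
    #   int_0^rho s^3 ds = rho^4 / 4   and   int_0^rho s ds = rho^2 / 2.
    integral_xy += math.cos(t) * math.sin(t) * rho ** 4 / 4 * dt
    area += rho ** 2 / 2 * dt
```

The same loop also yields the content of the region, which should match $\int_D |J_\phi| = \frac{1}{3} \cdot \frac{1}{2} = \frac{1}{6}$.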
The first three examples illustrated how the change of variable formula may be used to simplify the region of integration. It may also be used to simplify the integrand, which was its basic role in one dimension.

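Example 5.12 Find
$$\int_D \exp\left(\frac{x - y}{x + y}\right) dx\,dy$$
when $D := \{(x, y) : 0 \le x,\ 0 \le y,\ x + y \le 1\}$.
Let $(u, v) = (x - y, x + y) = \phi^{-1}(x, y)$.

Figure 5.6. The tip down triangle, $D^*$, in the $(u, v)$-plane is transformed to the right triangle $D$ in Example 5.12. (Labels in the sketch: the line $x + y = 1$ bounding $D$; the lines $u = v$ and $u = -v$ bounding $D^*$.)

Then
$$D^* = \phi^{-1}(D), \quad D = \phi(D^*),$$
$$(x, y) = \left(\frac{u + v}{2}, \frac{v - u}{2}\right) = \phi(u, v),$$
$$J_\phi = \frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ -\tfrac{1}{2} & \tfrac{1}{2} \end{vmatrix} = \frac{1}{2}.$$
Alternatively,
$$J_{\phi^{-1}} = \frac{\partial(u, v)}{\partial(x, y)} = \begin{vmatrix} 1 & -1 \\ 1 & 1 \end{vmatrix} = 2 \implies J_\phi = \frac{1}{2},$$
so in fact it was not necessary to find $\phi$. Then
$$\int_D \exp\left(\frac{x - y}{x + y}\right) dx\,dy = \int_{D^*} \exp\left(\frac{u}{v}\right) \frac{1}{2}\,du\,dv = \frac{1}{2}\int_0^1 \int_{-v}^{v} \exp\left(\frac{u}{v}\right) du\,dv$$
$$= \frac{1}{2}\int_0^1 \Big[v e^{u/v}\Big]_{u = -v}^{u = v} dv = \frac{1}{2}\int_0^1 v\left(e - \frac{1}{e}\right) dv = \frac{1}{4}\left(e - \frac{1}{e}\right) = \frac{1}{2}\sinh(1).$$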
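The value $\frac{1}{2}\sinh(1)$ in Example 5.12 can be compared with a direct approximation of the original integral over the triangle $D$, without any change of variables. The following is a rough sketch: the grid size `n` is an arbitrary choice, and cells cut by the hypotenuse $x + y = 1$ are counted with half weight.

```python
import math

exact = math.sinh(1) / 2

# Midpoint rule on a uniform grid over the triangle 0 <= x, 0 <= y, x + y <= 1.
n = 400
h = 1.0 / n
total = 0.0
for i in range(n):
    x = (i + 0.5) * h
    for j in range(n - i):
        y = (j + 0.5) * h
        value = math.exp((x - y) / (x + y))
        # Cells with i + j == n - 1 are cut in half by the line x + y = 1.
        weight = 0.5 if i + j == n - 1 else 1.0
        total += weight * value * h * h
```

The integrand is bounded between $e^{-1}$ and $e$ but has no limit at the origin, so this direct sum converges slowly there; the substitution $(u, v) = (x - y, x + y)$ removes that difficulty, which is the point of the example.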

See also the examples in Buck, pp. 306-311, and the exercises on pp. 311-313. In particular, exercises 10-12 indicate a proof of Theorem 5.8 based on the Implicit Function Theorem when $n = 2$.


Exercises
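5.1. For $f_1(x, y) = \dfrac{x^2}{a^2} + \dfrac{y^2}{b^2}$ and $f_2(x, y) = x^2 + y^2$, find
$$\int_D f_1 \quad\text{and}\quad \int_D f_2 \quad\text{where}\quad D = \left\{(x, y) : \frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1\right\}.$$
[Solution: $\pi ab/2$, $\pi ab(a^2 + b^2)/4$.]
5.2. Let $f$ be a real valued continuous function on $\mathbb{R}^2$. Show that
$$\int_0^1 \int_0^x f(x, y)\,dy\,dx = 2\int_0^{1/2} \int_\eta^{1 - \eta} f(\xi + \eta, \xi - \eta)\,d\xi\,d\eta.$$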
5.3. Do the following:
(i) Show that the ball $S = \{(x, y, z) : x^2 + y^2 + z^2 \le a^2\}$ has content $M = 4\pi a^3/3$ in $\mathbb{R}^3$. [Hint: Let $x = r\sin(\phi)\cos(\theta)$, $y = r\sin(\phi)\sin(\theta)$, and $z = r\cos(\phi)$, for $0 \le r \le a$, $0 \le \theta \le 2\pi$, $0 \le \phi \le \pi$.]
(ii) Let $S$ be as in part (i) and show
$$I_z := \int_S (x^2 + y^2)\,dx\,dy\,dz = \frac{2Ma^2}{5}.$$
[$I_z$ is the moment of inertia of a uniform spherical mass distribution (total mass $= M$) about a diameter. The quantity $\sqrt{2a^2/5}$ is called the radius of gyration about a diameter.]

5.4. (The case of the vanishing Jacobian) Suppose $\phi : \mathbb{R}^n \to \mathbb{R}^n$ is of class $C^1$ on an open set $G \subset \mathbb{R}^n$ and that $J_\phi \neq 0$ and $\phi$ is one-to-one except on a set $K$ with content zero. Suppose $D \subset G$ is compact and has content, and $f$ is a real valued continuous function on $\phi(D)$. Show that $\phi(D)$ has content and
$$\int_{\phi(D)} f = \int_D (f \circ \phi)\,|J_\phi|.$$
[Hint: For a given $\epsilon$, enclose $K$ in the union $U$ of a finite number of $n$-cubes such that $\mu(U) < \epsilon$. Apply Theorem 5.8 to $D \setminus U$.]
5.5. Show that
$$\int_K \left[(x - y)^2 + 2(x + y) + 1\right]^{-1/2} dx\,dy = 2\log(2) - \frac{1}{2}$$
where $K$ is the triangle bounded by $x = 0$, $y = 2$, and $x = y$. [Hint: $x = u(1 + v)$, $y = v(1 + u)$.]
5.6. Evaluate $\int_{D_1} f$ and $\int_{D_2} f$ where
$$f(x, y) = \frac{1}{(1 + x^2 + y^2)^2}$$
and (i) $D_1$ is the region bounded by one loop of the lemniscate
$$(x^2 + y^2)^2 - (x^2 - y^2) = 0;$$
and (ii) $D_2$ is the triangle with vertices $(0, 0)$, $(2, 0)$, $(1, \sqrt{3})$. [Solution: $\frac{\pi}{4} - \frac{1}{2}$ and $\frac{\sqrt{3}}{2}\arctan(1/2)$.]
5.7. Show that
$$\int_K |xyz|\,dx\,dy\,dz = \frac{a^2 b^2 c^2}{6} \quad\text{where}\quad K := \left\{(x, y, z) : \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} \le 1\right\}.$$
5.8. Let $\phi(x, y) = Ax^2 + 2Hxy + By^2 + 2Gx + 2Fy + C$.
(i) Show
$$\int_K \phi = \frac{1}{4}\pi ab\,(Aa^2 + Bb^2 + 4C) \quad\text{where}\quad K := \left\{(x, y) : \frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1\right\}.$$
Save yourself some work by using symmetry to show
$$\int_K xy\,dx\,dy = \int_K x\,dx\,dy = \int_K y\,dx\,dy = 0.$$
(ii) Show
$$\int_H \phi = \frac{1}{4}\pi ab\,(Aa^2 + Bb^2 + 4\phi(x_0, y_0)) \quad\text{where}\quad H := \left\{(x, y) : \frac{(x - x_0)^2}{a^2} + \frac{(y - y_0)^2}{b^2} \le 1\right\}.$$
(iii) Show
$$\int_S (2x^2 + y^2 + 3x - 2y + 4)\,dx\,dy = \frac{57\pi}{2}$$
where $S$ is inside $x^2 + 4y^2 - 2x + 8y + 1 = 0$.
(iv) If the plane $\ell x + my + nz = p$ intersects the elliptic paraboloid
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 2\frac{z}{c},$$
show that the volume of the solid bounded by these two surfaces is
$$\frac{\pi ab}{4c^3 n^4}\left(a^2\ell^2 + b^2 m^2 + 2pcn\right)^2.$$
5.9. Find
\[
\int_D \log(x^2 + y^2)\,dx\,dy
\quad\text{where}\quad
D := \{(x, y) : b^2 \le x^2 + y^2 \le a^2\},
\]
and show that the limit as $b \to 0$ of this expression is $2\pi a^2\left(\log(a) - \frac{1}{2}\right)$.


5.10. Show that $\int_D x^3 y^2\,dx\,dy = 248\tfrac{2}{5}$, where $D$ is the region bounded by the lines $y = \pm 3(x - 2)$, $y = \pm 3(x - 4)$. [Hint: Move the origin to the center of the region and use symmetry as much as you can.]
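5.11. Show that the area of the region in the first quadrant bounded by
\[
y^3 = a_1 x^2, \qquad y^3 = a_2 x^2, \qquad a_1 > a_2 > 0,
\]
and
\[
xy^2 = b_1^3, \qquad xy^2 = b_2^3, \qquad b_1 > b_2 \ge 0,
\]
is
\[
\text{Area} = \frac{7}{5}\left(b_1^{15/7} - b_2^{15/7}\right)\left(a_2^{-1/7} - a_1^{-1/7}\right).
\]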

5.12. Show that the area of the region bounded by the loop of the curve $x^3 + y^3 = 3axy$ is $3a^2/2$.
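5.13. The transformation $x = u^m \varphi(v)$, $y = u^n \psi(v)$ maps the rectangle $[u_1, u_2] \times [v_1, v_2]$ into a region in the $(x, y)$-plane. The area of this region is
\[
\left|\frac{u_2^{m+n} - u_1^{m+n}}{m + n}\int_{v_1}^{v_2} (m\varphi\psi' - n\varphi'\psi)\,dv\right| \quad\text{if } m \ne -n,
\]
and
\[
\left|m\big(\varphi(v_1)\psi(v_1) - \varphi(v_2)\psi(v_2)\big)\log(u_1/u_2)\right| \quad\text{if } m = -n.
\]
What conditions must $\varphi$ and $\psi$ satisfy if this statement is correct? Prove the statement under your conditions.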
5.14. Find $\int_D xyz\,dx\,dy\,dz$ where $D$ is the region in the first octant bounded by the cylinder $x^2 + y^2 = 16$ and the plane $z = 3$, that is,
\[
D = \{(x, y, z) : 0 \le x,\ 0 \le y,\ 0 \le z \le 3,\ x^2 + y^2 \le 16\}.
\]

5.15. Let $G$ be the region in the first quadrant which is bounded by the curves $xy = 1$, $xy = 3$, $x^2 - y^2 = 1$, $x^2 - y^2 = 4$. Find $\int_G F$ where $F(x, y) = xy$.
5.16. Show
\[
\int_D \varphi = \frac{1}{3}pq\,(ap^2 + bq^2) + 2pq\,\varphi(x_0, y_0)
\]
where $D$ is the region bounded by the four straight lines
\[
\pm\frac{x - x_0}{p} \pm \frac{y - y_0}{q} = 1 \qquad (p, q > 0)
\]
and $\varphi(x, y) = ax^2 + 2hxy + by^2 + 2gx + 2fy + c$.
5.17. Show the following:
(i) If $X = x + y + z + u$, $XY = y + z + u$, $XYZ = z + u$, $XYZU = u$, show that
\[
\frac{\partial(x, y, z, u)}{\partial(X, Y, Z, U)} = X^3 Y^2 Z.
\]
(ii) Show that
\[
\int_0^1\!\!\int_0^{1-u}\!\!\int_0^{1-z-u}\!\!\int_0^{1-y-z-u} (x + y + z + u)^n\,xyzu\,dx\,dy\,dz\,du
= \int_0^1 X^{n+7}\,dX \int_0^1 Y^5(1 - Y)\,dY \int_0^1 Z^3(1 - Z)\,dZ \int_0^1 U(1 - U)\,dU
= \frac{1}{(n + 8)\,7!}.
\]
5.18. Which is larger, $\int_0^1 x^x\,dx$ or $\int_0^1\!\int_0^1 (xy)^{xy}\,dx\,dy$?

5.2 Integration on Curves and Surfaces

To extend the notion of integration on subsets of $\mathbb{R}^n$ as introduced in Chapter III to subsets of $k$-dimensional objects in $\mathbb{R}^n$, it is necessary to introduce a definition of content for these objects. Rather than deal with integration on general $k$-dimensional objects, we will restrict our attention to 1- and 2-dimensional objects, i.e. curves and surfaces. First, a manifold segment is too general a concept for which to hope to define content; for example, the graph of $f(x) = \sin(1/x)$, $x \in (0, 1)$, or the graph on the right in Figure 5.7, could not be expected to have finite length. So we should probably restrict our attention to the images of fairly well-behaved compact subsets of the parameter space. On the other hand, the smoothness requirement on manifold segments is somewhat too restrictive in the sense that we should be able to assign a length to objects like those in the picture on the left in Figure 5.7. We shall therefore consider objects which behave like nice manifold segments piecewise.

Figure 5.7. The graph on the right consists of triangles of height one with bases on $[1/(k + 1), 1/k]$, $k = 1, 2, \ldots$, alternating up and down. Since the height is one, the length of the graph over each base is greater than 2. Hence, the total length cannot be finite. The graph on the left is piecewise smooth and intuitively does have length.

5.2.1 Curves (1-surfaces)

Definition 5.13. Let $U$ be an open connected subset of $\mathbb{R}$ and
\[
\sigma : U \to \mathbb{R}^n, \qquad \sigma(t) = (x_1(t), \ldots, x_n(t)), \qquad n \ge 1.
\]
(i) If $\sigma \in C(U)$, then $\sigma$ is a curve on $U$ in $\mathbb{R}^n$.
(ii) If $\sigma \in C^1(U)$ and $\operatorname{rank} \sigma'(t) = 1$ for all $t \in U$, then $\sigma$ is a smooth curve on $U$ in $\mathbb{R}^n$:
\[
\sigma'(t) = \begin{bmatrix} x_1'(t) \\ \vdots \\ x_n'(t) \end{bmatrix},
\qquad
\sigma \text{ smooth} \iff \sum_{i=1}^n x_i'(t)^2 > 0, \quad t \in U.
\]
(iii) Two smooth curves $\sigma$ and $\sigma^*$ on $U$ and $U^*$ respectively are called parametrically equivalent if there is a function $f : U^* \to U$, $f \in C^1(U^*)$, such that $f'(t) > 0$ and $\sigma^*(t) = \sigma(f(t))$ for all $t \in U^*$.
(iv) $\sigma(U)$ is the trace of the curve $\sigma$ in $\mathbb{R}^n$.
(v) A curve $\sigma : U \to \mathbb{R}^n$ is piecewise smooth if
(a) $\sigma$ is continuous on $U$;
(b) each compact subinterval of $U$ is the union of a finite number of intervals $I$ such that $\sigma : I \to \mathbb{R}^n$ is a smooth curve and $|\sigma'|$ is Riemann integrable on $I$.
We shall refer informally to both the curve $\sigma$ and its trace as the curve. We will see that this is unambiguous in the integration theory for parametrically equivalent curves. It is perhaps even more precise to consider curves as equivalence classes of functions $\sigma : \mathbb{R} \to \mathbb{R}^n$, the equivalence relation being parametric equivalence. However, you might consider this idea of a curve as too eccentric on your first encounter.
Example 5.14 A line in $\mathbb{R}^3$ through a point $p_0 = (x_0, y_0, z_0)$ may be parametrized by
\[
\sigma : \begin{cases} x = x_0 + at, \\ y = y_0 + bt, \\ z = z_0 + ct, \end{cases} \quad t \in \mathbb{R},
\qquad
\sigma'(t) = \begin{bmatrix} a \\ b \\ c \end{bmatrix}.
\]
If $a^2 + b^2 + c^2 > 0$ this is a smooth curve. The numbers $(a, b, c)$ are called the direction numbers of $\sigma$. They are not unique in the sense that $(a^*, b^*, c^*) = \lambda(a, b, c)$ define a parametrically equivalent line if $\lambda > 0$. The values
\[
(\ell, m, n) = \frac{1}{\sqrt{a^2 + b^2 + c^2}}\,(a, b, c), \quad\text{with}\quad \ell^2 + m^2 + n^2 = 1,
\]
are the direction cosines of $\sigma$. The triple $(\ell, m, n)$ consists of the cosines of the angles determined by the line and the directions $(1, 0, 0)$, $(0, 1, 0)$, $(0, 0, 1)$ respectively.
A line is determined by (i) a point $(x_0, y_0, z_0)$ and a direction $(a, b, c)$, or (ii) two points $(x_0, y_0, z_0)$ and $(x_1, y_1, z_1)$. In the latter case, a direction is determined as $(a, b, c) = (x_1 - x_0,\ y_1 - y_0,\ z_1 - z_0)$.
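The passage from two points to direction numbers and then to direction cosines can be sketched numerically. The following is a minimal illustration, not part of the original notes; the function names and the sample points are hypothetical, chosen so that the norm is the integer 3.

```python
import math

def direction_between(p0, p1):
    """Direction numbers (a, b, c) of the line through p0 and p1."""
    return tuple(q - p for p, q in zip(p0, p1))

def direction_cosines(a, b, c):
    """Direction cosines (l, m, n) = (a, b, c) / sqrt(a^2 + b^2 + c^2)."""
    norm = math.sqrt(a * a + b * b + c * c)
    if norm == 0.0:
        raise ValueError("direction numbers must not all vanish")
    return (a / norm, b / norm, c / norm)

# Hypothetical sample points; the direction numbers come out as (2, 2, 1).
abc = direction_between((1, 0, 2), (3, 2, 3))
l, m, n = direction_cosines(*abc)

# Scaling the direction numbers by a positive factor leaves the cosines unchanged.
scaled = direction_cosines(*(5 * c for c in abc))
```

Here `scaled` agrees with `(l, m, n)` up to rounding, which mirrors the remark that positive multiples of the direction numbers describe a parametrically equivalent line.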

Figure 5.8. The line through p0 = (x0 , y0 , z0 ) in the direction (a, b, c).
The direction $(a, b, c)$ is orthogonal to $(\alpha, \beta, \gamma)$ if
\[
0 = (a, b, c) \cdot (\alpha, \beta, \gamma) = a\alpha + b\beta + c\gamma.
\]
The set of points $(x, y, z)$ in $\mathbb{R}^3$ such that $(x - x_0, y - y_0, z - z_0)$ is orthogonal to $(a, b, c)$ is a plane through $(x_0, y_0, z_0)$ with normal $(a, b, c)$. The equation of this plane is
\[
0 = (x - x_0, y - y_0, z - z_0) \cdot (a, b, c) \quad\text{or}\quad a(x - x_0) + b(y - y_0) + c(z - z_0) = 0.
\]


Figure 5.9. The normal to the plane through p0 = (x0 , y0 , z0 ).

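Example 5.15 A portion of the parabola $y = x^2$ can be parametrized by
\[
\sigma : \begin{cases} x = t, \\ y = t^2, \end{cases} \quad 0 < t < 1,
\qquad
\sigma'(t) = \begin{bmatrix} 1 \\ 2t \end{bmatrix}.
\]
Then $x'(t)^2 + y'(t)^2 = 1 + 4t^2 > 0$ so $\sigma$ is a smooth curve.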

Figure 5.10. A parametrization of the parabola in Example 5.15.
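Example 5.16 Another example we've seen before is the helix:
\[
\sigma : \begin{cases} x = \cos(t) \\ y = \sin(t), \\ z = t \end{cases} \quad t \in \mathbb{R},
\qquad
\sigma'(t) = \begin{bmatrix} -\sin(t) \\ \cos(t) \\ 1 \end{bmatrix}.
\]
Again $x'(t)^2 + y'(t)^2 + z'(t)^2 = 2 > 0$, so $\sigma$ is a smooth curve.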


Figure 5.11. The spiral in Example 5.16.


We know that if $\sigma \in C^1$, then $\sigma(t_0) + D\sigma(t_0)(t - t_0)$ is the best affine approximation to $\sigma(t)$ near $t_0$. This motivates the following definition.
Definition 5.17. Let $\sigma : U \subset \mathbb{R} \to \mathbb{R}^n$ be a smooth curve.
(i) The tangent line to the smooth curve $\sigma$ at a point $t_0$ is
\[
p(t) = \sigma(t_0) + D\sigma(t_0)(t - t_0).
\]
For example, in $\mathbb{R}^3$, the tangent to the curve
\[
\sigma : \begin{cases} x = x(t) \\ y = y(t) \\ z = z(t) \end{cases}
\quad\text{is}\quad
\begin{cases} x = x_0 + x'(t_0)(t - t_0) \\ y = y_0 + y'(t_0)(t - t_0) \\ z = z_0 + z'(t_0)(t - t_0) \end{cases}
\]
where $x_0 = x(t_0)$, $y_0 = y(t_0)$, $z_0 = z(t_0)$.


(ii) The vector $v = \sigma'(t_0)$ is the tangent vector of a smooth curve $\sigma$ at a point $t_0$. The direction of a smooth curve at the point $t_0$ is
\[
\frac{v}{|v|} = \frac{\sigma'(t_0)}{|\sigma'(t_0)|}.
\]
For example in $\mathbb{R}^3$, the direction is
\[
\frac{1}{\sqrt{x'(t_0)^2 + y'(t_0)^2 + z'(t_0)^2}}\,\big(x'(t_0),\ y'(t_0),\ z'(t_0)\big).
\]
(iii) The normal plane at the point $t_0$ for a smooth curve in $\mathbb{R}^3$ is
\[
x'(t_0)(x - x_0) + y'(t_0)(y - y_0) + z'(t_0)(z - z_0) = 0
\quad\text{or}\quad
\begin{bmatrix} x'(t_0) & y'(t_0) & z'(t_0) \end{bmatrix}
\begin{bmatrix} x - x_0 \\ y - y_0 \\ z - z_0 \end{bmatrix} = 0.
\]
Question: What corresponds to this normal plane for a smooth curve in $\mathbb{R}^2$?

Figure 5.12. The plane normal to a curve as determined by the tangent vector $\sigma'$.
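The claim that the tangent line is the best affine approximation can be checked numerically on the helix of Example 5.16. The sketch below is illustrative only; the helper names and the base point $t_0 = 1$ are assumptions. It measures the distance between the curve and its tangent line at parameter $t_0 + h$, which should shrink like $h^2$.

```python
import math

def helix(t):
    """The helix of Example 5.16."""
    return (math.cos(t), math.sin(t), t)

def helix_prime(t):
    """Its derivative, the tangent vector."""
    return (-math.sin(t), math.cos(t), 1.0)

def tangent_line(t0):
    """Return p with p(t) = sigma(t0) + sigma'(t0) (t - t0)."""
    base, vel = helix(t0), helix_prime(t0)
    return lambda t: tuple(b + v * (t - t0) for b, v in zip(base, vel))

def gap(t0, h):
    """Distance between the curve and its tangent line at parameter t0 + h."""
    p = tangent_line(t0)
    diffs = (c - q for c, q in zip(helix(t0 + h), p(t0 + h)))
    return math.sqrt(sum(d * d for d in diffs))

# Shrinking h by a factor of 10 should shrink the gap by roughly a factor of 100.
ratio = gap(1.0, 0.1) / gap(1.0, 0.01)
```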

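Remark: Two parametrically equivalent curves have the same direction. That is, if $\sigma^*(t) = \sigma(f(t))$ with $f' > 0$, then
\[
\frac{\sigma^{*\prime}(t)}{|\sigma^{*\prime}(t)|}
= \frac{\sigma'(f(t))\,f'(t)}{|\sigma'(f(t))\,f'(t)|}
= \frac{\sigma'(f(t))}{|\sigma'(f(t))|}.
\]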
Definition 5.18. Let $[a, b] \subset U \subset \mathbb{R}$ and let $\sigma : U \to \mathbb{R}^n$ be a curve.
(i) The length of a smooth $\sigma([a, b])$ is defined as the quantity
\[
\ell(\sigma([a, b])) := \int_a^b |\sigma'(t)|\,dt = \int_a^b \sqrt{x_1'(t)^2 + \cdots + x_n'(t)^2}\,dt.
\]
Symbolically, we shall write $d\ell = |\sigma'(t)|\,dt$.
(ii) The curve $\sigma$ is piecewise smooth on $U$ if for any $[a, b] \subset U$ there are finitely many nonoverlapping subintervals $[a_i, b_i]$ with union $[a, b]$ such that $\sigma$ is smooth on $(a_i, b_i)$. In this case,
\[
\ell(\sigma([a, b])) = \sum_i \ell(\sigma([a_i, b_i])).
\]

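Remark: If $\sigma$ and $\sigma^*$ are parametrically equivalent curves with $f(a^*) = a$ and $f(b^*) = b$, then $\ell(\sigma^*([a^*, b^*])) = \ell(\sigma([a, b]))$ since
\[
\ell(\sigma^*) = \int_{a^*}^{b^*} |\sigma^{*\prime}(t)|\,dt
= \int_{a^*}^{b^*} |\sigma'(f(t))|\,f'(t)\,dt
= \int_a^b |\sigma'(t)|\,dt.
\]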
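The invariance of length under a change of parameter can be illustrated numerically. This is a sketch under stated assumptions: the curve is the parabola of Example 5.15, the change of parameter $f(s) = (s + s^2)/2$ is a hypothetical choice satisfying $f(0) = 0$, $f(1) = 1$, $f' > 0$, and the integrals are approximated with a midpoint rule.

```python
import math

def length(velocity, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of |velocity(t)| over [a, b]."""
    h = (b - a) / steps
    return sum(
        math.sqrt(sum(c * c for c in velocity(a + (k + 0.5) * h))) * h
        for k in range(steps)
    )

def sigma_prime(t):
    """Derivative of sigma(t) = (t, t^2)."""
    return (1.0, 2.0 * t)

def f(s):
    """Hypothetical change of parameter with f(0) = 0 and f(1) = 1."""
    return 0.5 * (s + s * s)

def f_prime(s):
    return 0.5 + s

def sigma_star_prime(s):
    """Chain rule: (sigma o f)'(s) = sigma'(f(s)) f'(s)."""
    return tuple(c * f_prime(s) for c in sigma_prime(f(s)))

len_sigma = length(sigma_prime, 0.0, 1.0)
len_sigma_star = length(sigma_star_prime, 0.0, 1.0)

# Closed form of the integral of sqrt(1 + 4 t^2) over [0, 1], for comparison.
exact = math.sqrt(5) / 2 + math.asinh(2) / 4
```

Both approximations should agree with `exact` to within the quadrature error.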

Example 5.19 The curve $y = f(x)$ can be parametrized simply by
\[
\sigma := \begin{cases} x = t \\ y = f(t), \end{cases}
\quad\text{with}\quad \sigma'(t) = (1, f'(t)).
\]
Then
\[
\ell(\sigma([a, b])) = \int_a^b \sqrt{1 + f'(t)^2}\,dt.
\]

Example 5.20 The circumference of a circle of radius $a$ is the length of a smooth curve:
\[
\sigma := \begin{cases} x = a\cos(t) \\ y = a\sin(t), \end{cases}
\qquad
\ell(\sigma([0, 2\pi])) = \int_0^{2\pi} \sqrt{a^2\sin^2(t) + a^2\cos^2(t)}\,dt = 2\pi a.
\]
NOTE: $\ell(\sigma([0, 4\pi])) = 2\,\ell(\sigma([0, 2\pi]))$ even though these 2 curves have the same trace.
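The circumference computation and the note about traversing the circle twice can be reproduced with a simple quadrature. The following sketch assumes a midpoint rule and a hypothetical radius of 2; it is an illustration, not a method taken from the notes.

```python
import math

def circle_length(a, t_end, steps=1000):
    """Midpoint-rule value of the integral of |sigma'(t)| over [0, t_end]
    for sigma(t) = (a cos t, a sin t)."""
    h = t_end / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        dx, dy = -a * math.sin(t), a * math.cos(t)
        total += math.hypot(dx, dy) * h
    return total

once = circle_length(2.0, 2 * math.pi)    # one trip around a circle of radius 2
twice = circle_length(2.0, 4 * math.pi)   # same trace, traversed twice
```

The value `once` approximates $2\pi a = 4\pi$, while `twice` is double that, although both parametrizations have the same trace.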

Motivation and Remarks: By way of motivation for the definition of the length of a curve, we offer the following:
(a) The length of the line segment $p(t) = p_0 + q_0 t$ for $t \in [t_1, t_2]$ is
\[
|p(t_2) - p(t_1)| = |q_0 (t_2 - t_1)| = \int_{t_1}^{t_2} |p'(t)|\,dt.
\]
(b) Now consider the segment $\sigma[a, b]$ and a partition $(t_k)_{k=0}^m$ of $[a, b]$. Suppose $\tau_k \in [t_{k-1}, t_k]$, $k = 1, \ldots, m$. One expects the sum of the lengths of the line segments $p(t) = \sigma(\tau_k) + \sigma'(\tau_k)(t - \tau_k)$, $t \in [t_{k-1}, t_k]$, $k = 1, \ldots, m$, to approximate the length of $\sigma[a, b]$ as closely as one pleases provided only that the partition is fine enough. This sum is
\[
\sum_{k=1}^m |\sigma'(\tau_k)|\,(t_k - t_{k-1}),
\]
which is a Riemann sum for $\int_a^b |\sigma'(t)|\,dt$.


Alternatively, the length of $\sigma[a, b]$ is often defined to be
\[
\sup\Big\{\sum_{k=1}^m |\sigma(t_k) - \sigma(t_{k-1})| : (t_k) \text{ partitions of } [a, b]\Big\}.
\]
It is not difficult to see by the Mean Value Theorem, that this is equivalent to the definition $\ell(\sigma[a, b]) = \int_a^b |\sigma'(t)|\,dt$ if $\sigma$ is $C^1$. However, this alternative definition is more general in that it pertains to any curve for which the supremum exists as a real number; in particular, $\sigma$ need not be $C^1$.

Figure 5.13. The two means to approximate length: lengths of tangent lines (top) and lengths of secant lines (bottom).
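The secant-line definition can be compared with the integral definition on the parabola of Example 5.15. The sketch below is illustrative; it uses uniform partitions (an assumption, since the supremum is over all partitions) and `math.dist`, which requires Python 3.8 or later.

```python
import math

def sigma(t):
    """The parabola of Example 5.15: sigma(t) = (t, t^2)."""
    return (t, t * t)

def polygonal_length(curve, a, b, m):
    """Sum of |curve(t_k) - curve(t_{k-1})| over the uniform partition with m pieces."""
    pts = [curve(a + (b - a) * k / m) for k in range(m + 1)]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

# Closed form of the integral of sqrt(1 + 4 t^2) over [0, 1].
exact = math.sqrt(5) / 2 + math.asinh(2) / 4

# Each partition refines the previous one, so the secant sums increase.
approximations = [polygonal_length(sigma, 0.0, 1.0, m) for m in (10, 100, 1000)]
```

The secant sums increase toward `exact` from below, consistent with the length being their supremum.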

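5.2.2 Surfaces (2-surfaces)

Definition 5.21. Let $U$ be an open connected subset of $\mathbb{R}^2$ and
\[
\sigma : U \to \mathbb{R}^n, \qquad \sigma(u, v) = (x_1(u, v), \ldots, x_n(u, v)), \qquad n \ge 2.
\]
(i) If $\sigma \in C(U)$, then $\sigma$ is a surface (2-surface) on $U$ in $\mathbb{R}^n$.
(ii) If $\sigma \in C^1(U)$ and $\operatorname{rank} \sigma'(u, v) = 2$ for all $(u, v) \in U$, then $\sigma$ is a smooth surface on $U$ in $\mathbb{R}^n$:
\[
\sigma' = \begin{bmatrix}
\frac{\partial x_1}{\partial u} & \frac{\partial x_1}{\partial v} \\
\vdots & \vdots \\
\frac{\partial x_n}{\partial u} & \frac{\partial x_n}{\partial v}
\end{bmatrix}
\quad\text{and $\sigma$ is smooth} \iff
\sum_{i,j=1}^n \left(\frac{\partial(x_i, x_j)}{\partial(u, v)}\right)^2 > 0.
\]
(iii) Two smooth surfaces $\sigma$ and $\sigma^*$ on $U$ and $U^*$ respectively are called parametrically equivalent if there is a function $f : U^* \to U$ with $f \in C^1(U^*)$, $J_f(p) > 0$ for all $p \in U^*$, $f$ is one-to-one from $U^*$ onto $U$ and $\sigma^*(p) = \sigma(f(p))$ for all $p \in U^*$.
(iv) $\sigma(U)$ is called the trace of the surface $\sigma$ in $\mathbb{R}^n$. Again, we will not worry excessively about distinguishing between a surface and its trace.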


(v) A surface $\sigma : U \to \mathbb{R}^n$ is piecewise smooth if
(a) $\sigma$ is continuous on $U$;
(b) each compact Jordan measurable (has content) subset of $U$ is the union of finitely many Jordan measurable subsets $D$ such that $D = \overline{D^\circ}$, $\sigma : D^\circ \to \mathbb{R}^n$ is a smooth surface, and the quantity
\[
\left(\sum_{i,j=1}^n \left(\frac{\partial(x_i, x_j)}{\partial(u, v)}\right)^2\right)^{1/2}
\]
is Riemann integrable on $D$.

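Example 5.22 The following surface is a plane in $\mathbb{R}^3$. The parametrization
\[
\sigma : \begin{cases} x = u + v \\ y = u - v, \\ z = u \end{cases} \quad (u, v) \in \mathbb{R}^2,
\qquad
\sigma' = \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 0 \end{bmatrix},
\]
describes a smooth plane since $\operatorname{rank} \sigma' = 2$, which may also be described as the solution set of
\[
x + y - 2z = 0, \quad\text{or}\quad
\begin{bmatrix} 1 & 1 & -2 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = 0.
\]
From the latter equation, we see that the direction numbers of the normal vector are $(1, 1, -2)$. Notice also that these direction numbers are in fact given by
\[
\left(\frac{\partial(y, z)}{\partial(u, v)},\ \frac{\partial(z, x)}{\partial(u, v)},\ \frac{\partial(x, y)}{\partial(u, v)}\right).
\]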
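The three Jacobian determinants in Example 5.22 can be evaluated directly from the rows of the derivative matrix. The sketch below is a small check with hypothetical helper names; each row is the pair of partial derivatives with respect to $u$ and $v$.

```python
def minor(row_i, row_j):
    """2x2 determinant formed from two rows (d/du, d/dv) of the derivative matrix."""
    return row_i[0] * row_j[1] - row_i[1] * row_j[0]

def normal_from_rows(dx, dy, dz):
    """(d(y,z)/d(u,v), d(z,x)/d(u,v), d(x,y)/d(u,v)) from the rows of a 3x2 matrix."""
    return (minor(dy, dz), minor(dz, dx), minor(dx, dy))

def plane(u, v):
    """The parametrization of Example 5.22."""
    return (u + v, u - v, u)

# Rows of the derivative matrix of Example 5.22.
normal = normal_from_rows((1, 1), (1, -1), (1, 0))

# Every point of the plane is orthogonal to the normal (the plane passes through 0).
dots = [sum(c * w for c, w in zip(normal, plane(u, v)))
        for u, v in [(0, 1), (2, -3), (5, 7)]]
```

The computed `normal` is $(1, 1, -2)$, matching the coefficients of $x + y - 2z = 0$.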

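Example 5.23 The unit sphere in $\mathbb{R}^3$, $x^2 + y^2 + z^2 = 1$, may be parametrized by
\[
\sigma : \begin{cases} x = \cos(\theta)\sin(\phi) \\ y = \sin(\theta)\sin(\phi), \\ z = \cos(\phi) \end{cases} \quad (\theta, \phi) \in \mathbb{R}^2,
\qquad
\sigma' = \begin{bmatrix}
-\sin(\theta)\sin(\phi) & \cos(\theta)\cos(\phi) \\
\cos(\theta)\sin(\phi) & \sin(\theta)\cos(\phi) \\
0 & -\sin(\phi)
\end{bmatrix},
\]
for which $\operatorname{rank} \sigma' = 2$ unless $\phi$ is an integer multiple of $\pi$. Hence, $\sigma$ is smooth on regions not containing points with $\phi = m\pi$, $m \in \mathbb{Z}$.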

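Example 5.24 In Example 5.22 we had a particular plane through the origin in $\mathbb{R}^3$. In this example, we take a longer look at the general form for a plane. Let $(x_0, y_0, z_0)$, $(a_1, a_2, a_3)$ and $(b_1, b_2, b_3)$ be given triples of real numbers, and consider the parametrization
\[
\sigma : \begin{cases} x = x_0 + a_1 u + b_1 v \\ y = y_0 + a_2 u + b_2 v, \\ z = z_0 + a_3 u + b_3 v \end{cases} \quad (u, v) \in \mathbb{R}^2,
\quad\text{or}\quad
\begin{bmatrix} x \\ y \\ z \end{bmatrix} =
\begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix} +
\begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \\ a_3 & b_3 \end{bmatrix}
\begin{bmatrix} u \\ v \end{bmatrix}.
\]
In vector form, $(x, y, z) = (x_0, y_0, z_0) + u(a_1, a_2, a_3) + v(b_1, b_2, b_3)$, this is a plane through the point $(x_0, y_0, z_0)$ provided the vectors $(a_1, a_2, a_3)$ and $(b_1, b_2, b_3)$ are linearly independent. Looking at the determinants of certain $2 \times 2$ submatrices of $\sigma'$, we define
\[
\frac{\partial(y, z)}{\partial(u, v)} = \begin{vmatrix} a_2 & b_2 \\ a_3 & b_3 \end{vmatrix} = A,
\qquad
\frac{\partial(z, x)}{\partial(u, v)} = \begin{vmatrix} a_3 & b_3 \\ a_1 & b_1 \end{vmatrix} = B,
\qquad
\frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix} = C.
\]
Therefore, $\sigma$ is a smooth surface if and only if $A^2 + B^2 + C^2 > 0$, that is, if and only if $\mathbf{a} = (a_1, a_2, a_3)$ and $\mathbf{b} = (b_1, b_2, b_3)$ are linearly independent (in particular $\mathbf{a} \ne \text{const}\cdot\mathbf{b}$).

To identify the plane in the form given in Example 5.14, consider
\[
\begin{aligned}
A(x - x_0) + B(y - y_0) + C(z - z_0)
&= A(a_1 u + b_1 v) + B(a_2 u + b_2 v) + C(a_3 u + b_3 v) \\
&= \begin{vmatrix} a_1 u + b_1 v & a_1 & b_1 \\ a_2 u + b_2 v & a_2 & b_2 \\ a_3 u + b_3 v & a_3 & b_3 \end{vmatrix} \\
&= u \begin{vmatrix} a_1 & a_1 & b_1 \\ a_2 & a_2 & b_2 \\ a_3 & a_3 & b_3 \end{vmatrix}
 + v \begin{vmatrix} b_1 & a_1 & b_1 \\ b_2 & a_2 & b_2 \\ b_3 & a_3 & b_3 \end{vmatrix} = 0.
\end{aligned}
\]
This is the plane through $(x_0, y_0, z_0)$ with normal
\[
(A, B, C) = \left(\frac{\partial(y, z)}{\partial(u, v)},\ \frac{\partial(z, x)}{\partial(u, v)},\ \frac{\partial(x, y)}{\partial(u, v)}\right).
\]
Notice that the normal vector is given by
\[
\mathbf{a} \times \mathbf{b} := \begin{vmatrix} e_1 & e_2 & e_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}
= e_1 \begin{vmatrix} a_2 & b_2 \\ a_3 & b_3 \end{vmatrix}
+ e_2 \begin{vmatrix} a_3 & b_3 \\ a_1 & b_1 \end{vmatrix}
+ e_3 \begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix}.
\]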


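The best affine approximation to any surface $\sigma(u, v)$ for $(u, v)$ near $(u_0, v_0)$ is $\sigma(u_0, v_0) + D\sigma(u_0, v_0)(u - u_0, v - v_0)$; hence the following definition.
Definition 5.25. For 2-surfaces
(i) the tangent plane for a smooth surface $\sigma$ at the point $(u_0, v_0)$ is
\[
p(u, v) = \sigma(u_0, v_0) + D\sigma(u_0, v_0)(u - u_0, v - v_0).
\]
For example, in $\mathbb{R}^3$, the tangent to the surface
\[
\sigma : \begin{cases} x = x(u, v) \\ y = y(u, v) \\ z = z(u, v) \end{cases}
\quad\text{is}\quad
\begin{cases}
x = x_0 + \left.\frac{\partial x}{\partial u}\right|_{p_0}(u - u_0) + \left.\frac{\partial x}{\partial v}\right|_{p_0}(v - v_0) \\[4pt]
y = y_0 + \left.\frac{\partial y}{\partial u}\right|_{p_0}(u - u_0) + \left.\frac{\partial y}{\partial v}\right|_{p_0}(v - v_0) \\[4pt]
z = z_0 + \left.\frac{\partial z}{\partial u}\right|_{p_0}(u - u_0) + \left.\frac{\partial z}{\partial v}\right|_{p_0}(v - v_0),
\end{cases}
\]
where $p_0 = (u_0, v_0)$, $x_0 = x(u_0, v_0)$, $y_0 = y(u_0, v_0)$, $z_0 = z(u_0, v_0)$. As in Example 5.24, this is the plane
\[
\left.\frac{\partial(y, z)}{\partial(u, v)}\right|_{p_0}(x - x_0)
+ \left.\frac{\partial(z, x)}{\partial(u, v)}\right|_{p_0}(y - y_0)
+ \left.\frac{\partial(x, y)}{\partial(u, v)}\right|_{p_0}(z - z_0) = 0.
\]
(ii) the normal line to the surface at $p_0$ in $\mathbb{R}^3$ has the direction numbers
\[
n = \left(\left.\frac{\partial(y, z)}{\partial(u, v)}\right|_{p_0},\ \left.\frac{\partial(z, x)}{\partial(u, v)}\right|_{p_0},\ \left.\frac{\partial(x, y)}{\partial(u, v)}\right|_{p_0}\right).
\]
The unit normal direction is $n_1 = n/|n|$.

Question: In $\mathbb{R}^4$, what corresponds to the normal line discussed above for the surface in $\mathbb{R}^3$? If you cannot figure it out, ask.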

Exercises
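5.19. Show that two parametrically equivalent surfaces in $\mathbb{R}^3$ have the same unit normal $n_1$.
5.20. Let $\sigma : \mathbb{R}^2 \to \mathbb{R}^n$, ($x_i = x_i(u, v)$, $i = 1, \ldots, n$), be a smooth surface. Show that the vectors
\[
\mathbf{u} = \left.\left(\frac{\partial x_1}{\partial u}, \ldots, \frac{\partial x_n}{\partial u}\right)\right|_{p_0},
\qquad
\mathbf{v} = \left.\left(\frac{\partial x_1}{\partial v}, \ldots, \frac{\partial x_n}{\partial v}\right)\right|_{p_0}
\]
are tangent vectors to certain smooth curves in $\sigma$. Check that the condition $\operatorname{rank} \sigma'(p_0) = 2$ simply requires that $\mathbf{u}$ and $\mathbf{v}$ be linearly independent. Check that in the case $n = 3$, $n = \mathbf{u} \times \mathbf{v}$.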


We discuss the concept of surface area only for 2-surfaces in $\mathbb{R}^3$. It is hoped that the treatment of the problem of surface area in $\mathbb{R}^n$ should be clear from this. It should further indicate the procedure to be adopted in dealing with content of $k$-surfaces in $\mathbb{R}^n$.
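Definition 5.26.
(i) If $D$ is a compact Jordan measurable (has content) subset of $U \subset \mathbb{R}^2$ and $\sigma$ is a smooth surface on $U$ in $\mathbb{R}^3$,
\[
\sigma : \begin{cases} x = x(u, v) \\ y = y(u, v), \\ z = z(u, v) \end{cases} \quad (u, v) \in U,
\]
then
\[
A(\sigma(D)) := \int_D |n|\,du\,dv
= \int_D \sqrt{\left(\frac{\partial(y, z)}{\partial(u, v)}\right)^2 + \left(\frac{\partial(z, x)}{\partial(u, v)}\right)^2 + \left(\frac{\partial(x, y)}{\partial(u, v)}\right)^2}\,du\,dv.
\]
$A(\sigma(D))$ is called the area of $\sigma(D)$. Symbolically, $dA = |n|\,du\,dv$.
(ii) If $\sigma$ is a piecewise smooth surface on $U$, then
\[
A(\sigma(D)) := \sum_i A(\sigma(D_i))
\]
where $D_i$ are nonoverlapping Jordan measurable subsets of $D$ with union $D$ such that $\sigma$ is smooth on $D_i$.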
Remark: Two parametrically equivalent surfaces have the same area (more on this
later).
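Example 5.27 Here we compute the surface area for a surface given as the graph of a function of two variables, namely, $z = f(x, y)$, $(x, y) \in D$:
\[
\sigma : \begin{cases} x = u \\ y = v \\ z = f(u, v) \end{cases}
\quad\text{that is}\quad z = f(x, y), \quad f \in C^1(D).
\]
Then
\[
\sigma' = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ f_u & f_v \end{bmatrix}
\quad\text{and}\quad \operatorname{rank} \sigma' = 2,
\]
so $\sigma$ is smooth. The area formula reads
\[
A(\sigma(D)) = \int_D \sqrt{f_u^2 + f_v^2 + 1}\,du\,dv = \int_D \sqrt{f_x^2 + f_y^2 + 1}\,dx\,dy.
\]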
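The area formula for a graph can be tried out with a midpoint rule over a rectangle. The sketch below is an illustration under stated assumptions: the rectangular domain, the function name `graph_area`, and both test surfaces are hypothetical choices, and the partial derivatives are supplied by hand.

```python
import math

def graph_area(fx, fy, x0, x1, y0, y1, steps=200):
    """Midpoint-rule value of the integral of sqrt(fx^2 + fy^2 + 1) over a rectangle."""
    hx, hy = (x1 - x0) / steps, (y1 - y0) / steps
    total = 0.0
    for i in range(steps):
        x = x0 + (i + 0.5) * hx
        for j in range(steps):
            y = y0 + (j + 0.5) * hy
            total += math.sqrt(fx(x, y) ** 2 + fy(x, y) ** 2 + 1.0)
    return total * hx * hy

# Plane z = 2x + 2y + 1 over [0, 2] x [0, 1]: the integrand is sqrt(4 + 4 + 1) = 3,
# so the area is 3 times the area of the rectangle.
plane_area = graph_area(lambda x, y: 2.0, lambda x, y: 2.0, 0.0, 2.0, 0.0, 1.0)

# Paraboloid z = (x^2 + y^2)/2 over the unit square: fx = x, fy = y.
bowl_area = graph_area(lambda x, y: x, lambda x, y: y, 0.0, 1.0, 0.0, 1.0)
```

For the plane the result is 6 up to rounding; for the paraboloid the integrand lies between 1 and $\sqrt{3}$, so the area lies between those bounds.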


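Example 5.28 The surface area of a sphere of radius $a$ is easily computed when the sphere is described as the surface
\[
\sigma : \begin{cases} x = a\sin(\phi)\cos(\theta) \\ y = a\sin(\phi)\sin(\theta) \\ z = a\cos(\phi) \end{cases}
\quad\text{on}\quad D := \{(\theta, \phi) : 0 \le \theta \le 2\pi,\ 0 \le \phi \le \pi\}.
\]
Then $x^2 + y^2 + z^2 = a^2$ can easily be checked, and
\[
\sigma' = \begin{bmatrix}
-a\sin(\phi)\sin(\theta) & a\cos(\phi)\cos(\theta) \\
a\sin(\phi)\cos(\theta) & a\cos(\phi)\sin(\theta) \\
0 & -a\sin(\phi)
\end{bmatrix}.
\]
It follows that
\[
|n|^2 = a^4\sin^4(\phi)\cos^2(\theta) + a^4\sin^4(\phi)\sin^2(\theta)
+ a^4\big(\sin(\phi)\cos(\phi)\sin^2(\theta) + \sin(\phi)\cos(\phi)\cos^2(\theta)\big)^2,
\]
and
\[
\begin{aligned}
A(\sigma(D)) = \int_D |n|
&= \int_0^{2\pi}\!\!\int_0^{\pi} \sqrt{a^4\sin^2(\phi)\cos^2(\phi) + a^4\sin^4(\phi)}\,d\phi\,d\theta \\
&= \int_0^{2\pi}\!\!\int_0^{\pi} a^2\sin(\phi)\,d\phi\,d\theta
= -2\pi a^2\cos(\phi)\Big|_0^{\pi} = 4\pi a^2.
\end{aligned}
\]

Figure 5.14. The sphere in Example 5.28.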
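The area of the sphere can also be approximated directly from Definition 5.26, building $n$ from the $2 \times 2$ minors of the derivative matrix. This is a minimal sketch, assuming a midpoint rule and a hypothetical radius $a = 2$; the function names are illustrative.

```python
import math

def surface_area(d_u, d_v, u_range, v_range, steps=200):
    """Midpoint-rule value of the integral of |n| du dv, where n is built from the
    2x2 minors of the 3x2 derivative matrix with columns d_u and d_v."""
    (u0, u1), (v0, v1) = u_range, v_range
    hu, hv = (u1 - u0) / steps, (v1 - v0) / steps
    total = 0.0
    for i in range(steps):
        u = u0 + (i + 0.5) * hu
        for j in range(steps):
            v = v0 + (j + 0.5) * hv
            xu, yu, zu = d_u(u, v)
            xv, yv, zv = d_v(u, v)
            n = (yu * zv - yv * zu, zu * xv - zv * xu, xu * yv - xv * yu)
            total += math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
    return total * hu * hv

a = 2.0  # hypothetical radius

def d_theta(theta, phi):
    """Partial derivative of the sphere parametrization with respect to theta."""
    return (-a * math.sin(phi) * math.sin(theta), a * math.sin(phi) * math.cos(theta), 0.0)

def d_phi(theta, phi):
    """Partial derivative of the sphere parametrization with respect to phi."""
    return (a * math.cos(phi) * math.cos(theta), a * math.cos(phi) * math.sin(theta),
            -a * math.sin(phi))

area = surface_area(d_theta, d_phi, (0.0, 2 * math.pi), (0.0, math.pi))
```

The result should be close to $4\pi a^2 = 16\pi$; the remaining discrepancy is the quadrature error in the $\phi$ direction.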
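Example 5.29 The surface area of a torus. A torus can be parametrized by
\[
\sigma : \begin{cases} x = (R - a\cos(\phi))\cos(\theta) \\ y = (R - a\cos(\phi))\sin(\theta), \\ z = a\sin(\phi) \end{cases}
\quad\text{on}\quad D : \begin{cases} 0 \le \theta \le 2\pi \\ 0 \le \phi \le 2\pi \end{cases}, \quad 0 < a < R,
\]
and
\[
\sigma' = \begin{bmatrix}
-(R - a\cos(\phi))\sin(\theta) & a\sin(\phi)\cos(\theta) \\
(R - a\cos(\phi))\cos(\theta) & a\sin(\phi)\sin(\theta) \\
0 & a\cos(\phi)
\end{bmatrix}.
\]

Figure 5.15. The torus in Example 5.29.

Then
\[
\begin{aligned}
|n|^2 &= \big(a(R - a\cos(\phi))\cos(\theta)\cos(\phi)\big)^2 + \big(a(R - a\cos(\phi))\sin(\theta)\cos(\phi)\big)^2 \\
&\quad + a^2(R - a\cos(\phi))^2\big(\sin^2(\theta)\sin(\phi) + \cos^2(\theta)\sin(\phi)\big)^2 \\
&= a^2(R - a\cos(\phi))^2.
\end{aligned}
\]
Therefore,
\[
A(\sigma(D)) = \int_D |n| = \int_0^{2\pi}\!\!\int_0^{2\pi} a(R - a\cos(\phi))\,d\theta\,d\phi
= 2\pi a\int_0^{2\pi} (R - a\cos(\phi))\,d\phi = (2\pi)^2 aR.
\]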
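The expression $|n| = a(R - a\cos(\phi))$ and the resulting area can be checked without computing any derivative by hand. The sketch below approximates the partial derivatives by central differences, which is a swapped-in numerical technique rather than the calculation in the text; the radii $R = 3$, $a = 1$ and the step size are hypothetical choices.

```python
import math

R, a = 3.0, 1.0   # hypothetical radii with 0 < a < R

def torus(theta, phi):
    """The parametrization of Example 5.29."""
    return ((R - a * math.cos(phi)) * math.cos(theta),
            (R - a * math.cos(phi)) * math.sin(theta),
            a * math.sin(phi))

def normal_length(theta, phi, h=1e-5):
    """|n| from central-difference approximations to the partial derivatives."""
    p1, p0 = torus(theta + h, phi), torus(theta - h, phi)
    q1, q0 = torus(theta, phi + h), torus(theta, phi - h)
    xu, yu, zu = ((s - t) / (2 * h) for s, t in zip(p1, p0))
    xv, yv, zv = ((s - t) / (2 * h) for s, t in zip(q1, q0))
    n = (yu * zv - yv * zu, zu * xv - zv * xu, xu * yv - xv * yu)
    return math.sqrt(sum(c * c for c in n))

def torus_area(steps=100):
    """Midpoint rule for the integral of |n| over [0, 2 pi] x [0, 2 pi]."""
    h = 2 * math.pi / steps
    return sum(normal_length((i + 0.5) * h, (j + 0.5) * h)
               for i in range(steps) for j in range(steps)) * h * h
```

Calling `normal_length(0.7, 1.3)` should agree with `a * (R - a * math.cos(1.3))`, and `torus_area()` should be close to $(2\pi)^2 aR$.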

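As motivation for the definition of $A(\sigma)$ we offer the following two ideas.
(a) The area of a plane segment (see Figure 5.16). If the plane segment has area $A$ and its unit normal $(\ell, m, n)$ makes angles $\theta_1, \theta_2, \theta_3$ with the coordinate axes, then its projections onto the coordinate planes have areas
\[
A_{yz} = A\cos(\theta_1) = A\ell, \qquad A_{zx} = A\cos(\theta_2) = Am, \qquad A_{xy} = A\cos(\theta_3) = An,
\]
so that
\[
A^2 = A^2(\ell^2 + m^2 + n^2) = A_{yz}^2 + A_{zx}^2 + A_{xy}^2. \tag{5.2}
\]
Thus, $A^2$ is the sum of the squares of the areas of the projections onto the coordinate planes (yeah Pythagoras!).

Figure 5.16. The plane segment area and its projection onto the xy-plane.

Consider the affine function $\sigma : \mathbb{R}^2 \to \mathbb{R}^3$ and the projections $\sigma_{yz}$, $\sigma_{zx}$, $\sigma_{xy}$ onto the coordinate planes in $\mathbb{R}^3$:
\[
\sigma : \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix} + \begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \\ a_3 & b_3 \end{bmatrix}\begin{bmatrix} u \\ v \end{bmatrix},
\]
\[
\sigma_{yz} : \begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} y_0 \\ z_0 \end{bmatrix} + \begin{bmatrix} a_2 & b_2 \\ a_3 & b_3 \end{bmatrix}\begin{bmatrix} u \\ v \end{bmatrix},
\quad
\sigma_{zx} : \begin{bmatrix} z \\ x \end{bmatrix} = \begin{bmatrix} z_0 \\ x_0 \end{bmatrix} + \begin{bmatrix} a_3 & b_3 \\ a_1 & b_1 \end{bmatrix}\begin{bmatrix} u \\ v \end{bmatrix},
\quad
\sigma_{xy} : \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} + \begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix}\begin{bmatrix} u \\ v \end{bmatrix}.
\]
If $D$ is a subset of $\mathbb{R}^2$ with content, then from Lemma 5.2
\[
A(\sigma_{yz}(D)) = \left|\det\begin{bmatrix} a_2 & b_2 \\ a_3 & b_3 \end{bmatrix}\right| A(D) = \left|\frac{\partial(y, z)}{\partial(u, v)}\right| A(D), \quad\text{etc.,}
\]
so, by (5.2) above,
\[
\begin{aligned}
A(\sigma(D)) &= \sqrt{A(\sigma_{yz}(D))^2 + A(\sigma_{zx}(D))^2 + A(\sigma_{xy}(D))^2} \\
&= \sqrt{\left(\frac{\partial(y, z)}{\partial(u, v)}\right)^2 + \left(\frac{\partial(z, x)}{\partial(u, v)}\right)^2 + \left(\frac{\partial(x, y)}{\partial(u, v)}\right)^2}\;A(D) \\
&= |n|\,A(D) = \int_D |n|\,du\,dv \quad\text{(since $n$ is constant here).}
\end{aligned}
\]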
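Relation (5.2) can be checked on a parallelogram, the image of the unit square under an affine map. In the sketch below the area is computed independently of the projections, from the lengths of the edge vectors and the angle between them; the edge vectors and function names are hypothetical.

```python
import math

def parallelogram_area(a, b):
    """Area from lengths and angle: |a|^2 |b|^2 - (a . b)^2 = (|a| |b| sin(angle))^2."""
    aa = sum(x * x for x in a)
    bb = sum(x * x for x in b)
    ab = sum(x * y for x, y in zip(a, b))
    return math.sqrt(aa * bb - ab * ab)

def projected_areas(a, b):
    """Areas of the projections onto the yz-, zx- and xy-planes (absolute 2x2 minors)."""
    return (abs(a[1] * b[2] - a[2] * b[1]),
            abs(a[2] * b[0] - a[0] * b[2]),
            abs(a[0] * b[1] - a[1] * b[0]))

a_vec, b_vec = (1, 2, 2), (3, 0, 4)      # hypothetical edge vectors
A = parallelogram_area(a_vec, b_vec)
A_yz, A_zx, A_xy = projected_areas(a_vec, b_vec)
```

Here the projected areas are 8, 2 and 6, and $A^2 = 104 = 8^2 + 2^2 + 6^2$.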

(b) The area of a smooth surface. Let $\sigma : D \subset \mathbb{R}^2 \to \mathbb{R}^3$ be a smooth surface. If $D$ is partitioned into subsets $D_i$, then the sum of the areas of tangent plane segments $|n(p_i)|\,A(D_i)$, $p_i \in D_i$, should be expected to approximate the area of $\sigma(D)$ very closely, provided that the partition is fine enough. But $\sum_i |n(p_i)|\,A(D_i)$ is a Riemann sum for $\int_D |n|\,du\,dv$.

Figure 5.17. The area of a surface as approximated by tangent plane areas.

Now that the notion of content or measure has been introduced for curves and surfaces it is a simple matter to extend the idea of integration to such objects. For example in $\mathbb{R}^3$: If $\sigma : \mathbb{R} \to \mathbb{R}^3$ is a smooth curve and $I$ is a closed interval in $\mathbb{R}$, then a partition of $\sigma(I)$ is induced by a partition of $I$ into subintervals $\{I_i\}$. So, if $f$ is a real valued function on $\sigma(I)$, we may define Riemann sums
\[
\sum_i f(p_i)\,\ell(\sigma(I_i)) = \sum_i (f \circ \sigma)(t_i)\,\ell(\sigma(I_i)), \qquad p_i \in \sigma(I_i),\ t_i \in I_i,\ \sigma(t_i) = p_i.
\]
The resulting integral will be
\[
\int_{\sigma(I)} f = \int_{\sigma(I)} f\,d\ell := \int_I (f \circ \sigma)\,|\sigma'|\,dt
= \int_I f\big(x(t), y(t), z(t)\big)\sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2}\,dt.
\]
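The integral of a function along a curve can be approximated by the same kind of Riemann sum. The sketch below is illustrative only: it uses the helix of Example 5.16, a hypothetical integrand $f(x, y, z) = z$, and a midpoint rule. Along the helix $f \circ \sigma = t$ and $|\sigma'| = \sqrt{2}$, so the value over $[0, 2\pi]$ is $\sqrt{2}\cdot 2\pi^2$.

```python
import math

def line_integral(f, curve, velocity, a, b, steps=1000):
    """Midpoint-rule value of the integral over [a, b] of f(curve(t)) |velocity(t)| dt."""
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        t = a + (k + 0.5) * h
        speed = math.sqrt(sum(c * c for c in velocity(t)))
        total += f(*curve(t)) * speed * h
    return total

def helix(t):
    """The helix of Example 5.16."""
    return (math.cos(t), math.sin(t), t)

def helix_prime(t):
    """Its derivative."""
    return (-math.sin(t), math.cos(t), 1.0)

# Hypothetical integrand f(x, y, z) = z.
value = line_integral(lambda x, y, z: z, helix, helix_prime, 0.0, 2 * math.pi)
```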
